ML Lab Programs (1-12)

This document provides a table of contents for a Machine Learning Lab Manual. It lists 12 experiments covering topics like Bayes' rule, data extraction from databases using Python, k-nearest neighbors classification, k-means clustering, linear regression, naive Bayes classification, genetic algorithms, backpropagation neural networks, and several other machine learning algorithms. For each experiment, it provides a brief description of the problem and an outline of the steps to implement the given algorithm in Python.

Uploaded by

Shyam Naidu

Machine Learning Lab Manual

TABLE OF CONTENTS

S.No  Title

1. The probability that it is Friday and that a student is absent is 3%. Since there are 5 school days in a week, the probability that it is Friday is 20%. What is the probability that a student is absent given that today is Friday? Apply Bayes' rule in Python to get the result. (Ans: 15%)
2. Extract the data from a database using Python
3. Implement k-nearest neighbours classification using Python
4. Given the following data, which specify classifications for nine combinations of VAR1 and VAR2, predict a classification for a case where VAR1=0.906 and VAR2=0.606, using the result of k-means clustering with 3 means (i.e., 3 centroids)

VAR1   VAR2   CLASS
1.713  1.586  0
0.180  1.786  1
0.353  1.240  1
0.940  1.566  0
1.486  0.759  1
1.266  1.106  0
1.540  0.419  1
0.459  1.799  1
0.773  0.186  1

5. The following training examples map descriptions of individuals onto high, medium and low credit-worthiness.
medium skiing design single twenties no -> highRisk
high golf trading married forties yes -> lowRisk
low speedway transport married thirties yes -> medRisk
medium football banking single thirties yes -> lowRisk
high flying media married fifties yes -> highRisk
low football security single twenties no -> medRisk
medium golf media single thirties yes -> medRisk
medium golf transport married forties yes -> lowRisk
high skiing banking single thirties yes -> highRisk
low golf unemployed married forties yes -> highRisk

Input attributes are (from left to right) income, recreation, job, status, age-group, home-owner. Find the unconditional probability of 'golf' and the conditional probability of 'single' given 'medRisk' in the dataset.
6. Implement linear regression using Python
7. Implement Naïve Bayes theorem to classify English text
8. Implement an algorithm to demonstrate the significance of the genetic algorithm
9. Implement the finite words classification system using the Back-propagation algorithm
Additional Experiments:
10. Find-S Algorithm
11. Candidate Elimination Algorithm
12. K-Means Clustering Algorithm
Experiment 1

1. The probability that it is Friday and that a student is absent is 3%. Since there are 5 school days
in a week, the probability that it is Friday is 20%. What is the probability that a student is absent
given that today is Friday? Apply Bayes' rule in Python to get the result. (Ans: 15%)

ALGORITHM:

Step 1: Read P(F ∩ A), the probability that it is Friday and that a student is absent, and P(F), the
probability that it is Friday.

Step 2: Apply the definition of conditional probability, which is the form of Bayes' rule needed here:
P(A | F) = P(F ∩ A) / P(F).

Step 3: Print P(A | F). For the given values, 0.03 / 0.20 = 0.15.

Step 4: End.

PROGRAM:

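# Joint probability P(F ∩ A): it is Friday and the student is absent
PFIA = float(input("Enter probability that it is Friday and that a student is absent="))

# Marginal probability P(F): it is Friday
PF = float(input(" probability that it is Friday="))

# Conditional probability P(A | F) = P(F ∩ A) / P(F)
PABF = PFIA / PF

print("probability that a student is absent given that today is Friday using conditional probabilities=", PABF)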

OUTPUT:

Enter probability that it is Friday and that a student is absent= 0.03

probability that it is Friday= 0.2

probability that a student is absent given that today is Friday using conditional probabilities= 0.15
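
The calculation in this experiment is a direct application of the definition of conditional probability. A minimal sketch that wraps it in a reusable function with input checks is given here; the function name `conditional_probability` and the validation rules are illustrative assumptions, not part of the manual's program.

```python
def conditional_probability(p_joint: float, p_condition: float) -> float:
    """Return P(A | B) = P(A and B) / P(B)."""
    # P(B) must be a non-zero probability, otherwise the ratio is undefined
    if not 0.0 < p_condition <= 1.0:
        raise ValueError("P(B) must lie in (0, 1]")
    # The joint probability can never exceed the probability of B alone
    if not 0.0 <= p_joint <= p_condition:
        raise ValueError("P(A and B) must lie in [0, P(B)]")
    return p_joint / p_condition


# Values from the problem statement: P(Friday and absent) = 3%, P(Friday) = 20%
p_absent_given_friday = conditional_probability(0.03, 0.20)
print(f"P(absent | Friday) = {p_absent_given_friday:.2%}")
```

Rejecting a joint probability larger than P(B) catches the most common input mistake, which is swapping the two values.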
Experiment 2

2. Extract the data from database using python

Extracting data from an Excel sheet using Python

Step 1: First convert the dataset present in Excel to a CSV file using online resources, then execute the
following program.

Consider an Excel dataset that consists of 14 input columns and 3 output columns (C1, C2, C3), as follows:

Python Source Code:

import pandas as pd

# Load the CSV file into a DataFrame
dataset = pd.read_csv("Sample_Dataset.csv", delimiter=',')

print(dataset)  # Print entire dataset

# Select input and output columns by name and convert them to NumPy arrays
X = dataset[['AA', 'BB', 'CC', 'DD', 'EE', 'FF']].values

Y = dataset[['C1', 'C2', 'C3']].values

print(Y)  # Prints output values

print(X)  # Prints input values

X1 = dataset[['AA', 'BB', 'CC']].values

print(X1)  # Prints the first 3 columns of input values

print(X[0:5])  # Prints only the first 5 rows of input values
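
The program in this experiment reads a CSV export. When the data lives in an actual database, the same column and row selection can be expressed in SQL. The sketch here uses Python's built-in `sqlite3` module with an in-memory database; the table name `samples`, its rows, and the reduced set of columns are made up for illustration and do not come from the manual's dataset.

```python
import sqlite3

# Build a throwaway in-memory database (hypothetical table and values)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (AA REAL, BB REAL, C1 INTEGER)")
conn.executemany(
    "INSERT INTO samples (AA, BB, C1) VALUES (?, ?, ?)",
    [(1.0, 2.0, 0), (3.0, 4.0, 1), (5.0, 6.0, 1)],
)
conn.commit()

# Extract only the rows whose output column C1 equals 1.
# The "?" placeholder keeps the value out of the SQL string itself.
cursor = conn.execute(
    "SELECT AA, BB, C1 FROM samples WHERE C1 = ? ORDER BY AA", (1,)
)
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
conn.close()

print(columns)
for row in rows:
    print(row)
```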

OUTPUT SCREENS:
Experiment 3

3. Implement k-nearest neighbours classification using python

ALGORITHM:

Step 1: Load the data

Step 2: Initialize the value of k

Step 3: To get the predicted class, iterate over all training data points:

i) Calculate the distance between the test data and each row of training data. Here we use
Euclidean distance as our distance metric since it is the most popular method.
Other metrics that can be used are Chebyshev, cosine, etc.
ii) Sort the calculated distances in ascending order based on distance values.
iii) Get the top k rows from the sorted array.
iv) Get the labels of the selected k entries.
v) Return the prediction: if regression, return the mean of the k labels; if
classification, return the mode (most frequent value) of the k labels.
Step 4: End.

PROGRAM

import numpy as np
from sklearn import datasets

# Load the iris dataset: 150 samples, 4 features, 3 classes
iris = datasets.load_iris()
data = iris.data
labels = iris.target

# Look at a few samples
for i in [0, 79, 99, 101]:
    print(f"index: {i:3}, features: {data[i]}, label: {labels[i]}")

# Shuffle the indices reproducibly and hold out the last 12 samples for testing
np.random.seed(42)
indices = np.random.permutation(len(data))
n_training_samples = 12  # despite the name, this is the size of the held-out test set

learn_data = data[indices[:-n_training_samples]]
learn_labels = labels[indices[:-n_training_samples]]

test_data = data[indices[-n_training_samples:]]
test_labels = labels[indices[-n_training_samples:]]

print("The first samples of our learn set:")
print(f"{'index':7s}{'data':20s}{'label':3s}")
for i in range(5):
    print(f"{i:4d} {learn_data[i]} {learn_labels[i]:3}")

print("The first samples of our test set:")
print(f"{'index':7s}{'data':20s}{'label':3s}")
for i in range(5):
    print(f"{i:4d} {test_data[i]} {test_labels[i]:3}")

# The following code is only necessary to visualize the data of our learn set
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older Matplotlib versions

# Three coordinates per class: the first two features and the sum of the last two
X = []
for iclass in range(3):
    X.append([[], [], []])
    for i in range(len(learn_data)):
        if learn_labels[i] == iclass:
            X[iclass][0].append(learn_data[i][0])
            X[iclass][1].append(learn_data[i][1])
            X[iclass][2].append(sum(learn_data[i][2:]))

colours = ("r", "g", "y")

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for iclass in range(3):
    ax.scatter(X[iclass][0], X[iclass][1], X[iclass][2], c=colours[iclass])
plt.show()

#----------------------------------------------------

def distance(instance1, instance2):
    """Calculates the Euclidean distance between two instances."""
    return np.linalg.norm(np.subtract(instance1, instance2))


def get_neighbors(training_set, labels, test_instance, k, distance):
    """
    get_neighbors calculates a list of the k nearest neighbors of an instance 'test_instance'.
    The function returns a list of k 3-tuples. Each 3-tuple consists of (instance, dist, label).
    """
    distances = []
    for index in range(len(training_set)):
        dist = distance(test_instance, training_set[index])
        distances.append((training_set[index], dist, labels[index]))
    distances.sort(key=lambda x: x[1])
    neighbors = distances[:k]
    return neighbors


for i in range(5):
    neighbors = get_neighbors(learn_data, learn_labels, test_data[i], 3, distance=distance)
    print("Index: ", i, '\n',
          "Testset Data: ", test_data[i], '\n',
          "Testset Label: ", test_labels[i], '\n',
          "Neighbors: ", neighbors, '\n')
OUTPUT:
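
As a supplement to this experiment: the program stops at listing the k nearest neighbours, while the algorithm's final step is to return the mode of their labels. A minimal standard-library sketch of that voting step is given here; the function name `knn_predict` and the six toy points are assumptions made for illustration and are not taken from the iris data.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training label with its Euclidean distance to the query
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points and return the most frequent one
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]


# Toy data: two well-separated groups (hypothetical values)
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
classes = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, classes, (1.1, 1.0)))
print(knn_predict(points, classes, (5.1, 5.0)))
```

Choosing an odd k avoids tied votes in two-class problems; with more classes a tie-breaking rule is still needed.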
Experiment 4

4. Given the following data, which specify classifications for nine combinations of VAR1 and
VAR2, predict a classification for a case where VAR1=0.906 and VAR2=0.606, using the result of
k-means clustering with 3 means (i.e., 3 centroids)

ALGORITHM:

K-means Clustering Algorithm:

K-means is a centroid-based, unsupervised clustering technique. It partitions the dataset into k
clusters; the clusters need not contain equal numbers of points. Each cluster has a centroid,
which is the mean of the data points lying in that cluster.

The idea of the k-means algorithm is to find k centroid points such that every point in the dataset
belongs to the cluster whose centroid is nearest to it in Euclidean distance.

Step 1: Create the X array with [VAR1, VAR2] as each element from the given input.

Step 2: Create the y array with the CLASS attribute from the given input.
Step 3: Train the KMeans model on X. KMeans is unsupervised, so y is accepted by fit() but ignored.
Step 4: Predict the cluster of the new input. To turn the cluster into a classification, take the majority CLASS of the training points in that cluster.
Step 5: End

PROGRAM:

from sklearn.cluster import KMeans
import numpy as np

# Nine training points as [VAR1, VAR2]
X = np.array([[1.713, 1.586], [0.180, 1.786], [0.353, 1.240], [0.940, 1.566], [1.486, 0.759],
              [1.266, 1.106], [1.540, 0.419], [0.459, 1.799], [0.773, 0.186]])

# CLASS labels; KMeans is unsupervised and ignores y
y = np.array([0, 1, 1, 0, 1, 0, 1, 1, 1])

kmeans = KMeans(n_clusters=3, random_state=0).fit(X, y)

# Returns the index of the nearest centroid, which is a cluster id, not a CLASS value
kmeans.predict([[0.906, 0.606]])

OUTPUT:

array([0], dtype=int32)    # the given input belongs to cluster 0
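
A cluster index is not yet a classification. One common way to finish the task, sketched here under stated assumptions, is to assign the new case the majority CLASS of the training points in its cluster. The sketch uses a small pure-Python k-means (Lloyd's algorithm) instead of scikit-learn, and it initialises the centroids with the first three data points, which is an arbitrary choice made for reproducibility; other initialisations can produce different clusters and therefore a different answer.

```python
import math
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to `point` in Euclidean distance."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; centroids start at the first k points (assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        assignments = [nearest(p, centroids) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                # New centroid = coordinate-wise mean of the cluster members
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, [nearest(p, centroids) for p in points]


X = [(1.713, 1.586), (0.180, 1.786), (0.353, 1.240), (0.940, 1.566), (1.486, 0.759),
     (1.266, 1.106), (1.540, 0.419), (0.459, 1.799), (0.773, 0.186)]
y = [0, 1, 1, 0, 1, 0, 1, 1, 1]

centroids, assignments = kmeans(X, 3)
cluster = nearest((0.906, 0.606), centroids)

# Majority CLASS among the training points that share the query's cluster
cluster_classes = [label for label, a in zip(y, assignments) if a == cluster]
predicted_class = Counter(cluster_classes).most_common(1)[0][0]
print("cluster:", cluster, "predicted class:", predicted_class)
```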

Experiment 5

5. The following training examples map descriptions of individuals onto high, medium
and low credit-worthiness.
medium skiing design single twenties no -> highRisk
high golf trading married forties yes -> lowRisk
low speedway transport married thirties yes -> medRisk
medium football banking single thirties yes -> lowRisk
high flying media married fifties yes -> highRisk
low football security single twenties no -> medRisk
medium golf media single thirties yes -> medRisk
medium golf transport married forties yes -> lowRisk
high skiing banking single thirties yes -> highRisk
low golf unemployed married forties yes -> highRisk

Input attributes are (from left to right) income, recreation, job, status, age-group, home-owner.
Find the unconditional probability of 'golf' and the conditional probability of 'single' given
'medRisk' in the dataset.

PROGRAM:

totalRecords = 10

# Unconditional probability of 'golf': 4 of the 10 records list golf as recreation
numberGolfRecreation = 4
probGolf = numberGolfRecreation / totalRecords
print("Unconditional probability of golf: ={}".format(probGolf))

# Conditional probability of 'single' given 'medRisk'
# Bayes' formula:
#   p(single|medRisk) = p(medRisk|single) * p(single) / p(medRisk)
#   p(medRisk|single) = p(medRisk ∩ single) / p(single)
# so p(single|medRisk) = p(medRisk ∩ single) / p(medRisk)
numberMedRiskSingle = 2  # records that are both medRisk and single
numberMedRisk = 3        # records labelled medRisk
probMedRiskSingle = numberMedRiskSingle / totalRecords
probMedRisk = numberMedRisk / totalRecords
conditionalProbability = probMedRiskSingle / probMedRisk
print("Conditional probability of single given medRisk: = {}".format(conditionalProbability))

OUTPUT:

Unconditional probability of golf: =0.4

Conditional probability of single given medRisk: = 0.6666666666666667
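
The program hard-codes the counts 4, 2 and 3. A short sketch that derives them from the ten training examples instead is given here, so the counting itself can be checked; the tuple layout and the index constants `RECREATION`, `STATUS` and `RISK` are naming choices made for this illustration.

```python
# The ten training examples: income, recreation, job, status, age-group, home-owner, risk
records = [
    ("medium", "skiing", "design", "single", "twenties", "no", "highRisk"),
    ("high", "golf", "trading", "married", "forties", "yes", "lowRisk"),
    ("low", "speedway", "transport", "married", "thirties", "yes", "medRisk"),
    ("medium", "football", "banking", "single", "thirties", "yes", "lowRisk"),
    ("high", "flying", "media", "married", "fifties", "yes", "highRisk"),
    ("low", "football", "security", "single", "twenties", "no", "medRisk"),
    ("medium", "golf", "media", "single", "thirties", "yes", "medRisk"),
    ("medium", "golf", "transport", "married", "forties", "yes", "lowRisk"),
    ("high", "skiing", "banking", "single", "thirties", "yes", "highRisk"),
    ("low", "golf", "unemployed", "married", "forties", "yes", "highRisk"),
]
RECREATION, STATUS, RISK = 1, 3, 6  # column positions in each tuple

# Unconditional probability: fraction of all records with recreation == golf
p_golf = sum(r[RECREATION] == "golf" for r in records) / len(records)

# Conditional probability: restrict to medRisk records, then count the single ones
med_risk = [r for r in records if r[RISK] == "medRisk"]
p_single_given_med_risk = sum(r[STATUS] == "single" for r in med_risk) / len(med_risk)

print("P(golf) =", p_golf)
print("P(single | medRisk) =", p_single_given_med_risk)
```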
Experiment 6
6. Implement linear regression using python

ALGORITHM:

Step 1: Create a dataset for linear regression

Step 2: Define the hypothesis of linear regression, y = intercept + slope * x

Step 3: Train a linear regression model
Step 4: Evaluate the model
Step 5: Use the scikit-learn implementation

Step 6: End

PROGRAM:

# Importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Generate a random data set
np.random.seed(0)

# 2-D array with 100 rows, each row containing 1 random number in [0, 1)
x = np.random.rand(100, 1)

# Linear relationship y = 2 + 3x plus uniform noise in [0, 1)
y = 2 + 3 * x + np.random.rand(100, 1)

regression_model = LinearRegression()  # Model initialization
regression_model.fit(x, y)  # Fit the data (train the model)
y_predicted = regression_model.predict(x)  # Predict

# Model evaluation
# mean_squared_error returns the MSE, so take the square root to obtain the RMSE
rmse = np.sqrt(mean_squared_error(y, y_predicted))
r2 = r2_score(y, y_predicted)

# Printing values
print('Slope:', regression_model.coef_)
print('Intercept:', regression_model.intercept_)
print('Root mean squared error: ', rmse)
print('R2 score: ', r2)

# Plotting values
# Data points
plt.scatter(x, y, s=10)
plt.xlabel('x-Values from 0-1')
plt.ylabel('y-values from 2-5')

# Predicted values
plt.plot(x, y_predicted, color='r')
plt.show()
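
For a single input variable, the line that scikit-learn fits can also be computed in closed form by ordinary least squares: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch here shows that calculation with the standard library only; the function names and the four noise-free sample points are assumptions chosen so the result is easy to verify by hand.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: co-variation of x and y; Sxx: variation of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def root_mean_squared_error(xs, ys, slope, intercept):
    """Square root of the mean squared residual."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical noise-free points on the line y = 2 + 3x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0, 5.0, 8.0, 11.0]

slope, intercept = fit_line(xs, ys)
print("slope:", slope, "intercept:", intercept)
print("RMSE:", root_mean_squared_error(xs, ys, slope, intercept))
```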

OUTPUT:
Experiment 7

7. Implement Naive Bayes Theorem to Classify the English Text using python

The Naive Bayes algorithm

Naive Bayes classifiers are a collection of classification algorithms based on Bayes' Theorem. It is not a
single algorithm but a family of algorithms that share a common principle: every pair of
features being classified is assumed to be independent of each other, given the class.

The dataset is divided into two parts, namely, the feature matrix and the response/target vector.

• The feature matrix (X) contains all the vectors (rows) of the dataset, in which each vector
consists of the values of the features. The number of features is d, i.e. X = (x1, x2, ..., xd).
• The response/target vector (y) contains the value of the class/group variable for each row of
the feature matrix.

Now the "naïve" conditional independence assumption comes into play: assume that
all features in X are mutually independent, conditional on the category y:

P(x1, x2, ..., xd | y) = P(x1 | y) P(x2 | y) ... P(xd | y)

Dealing with text data

The values 0,1,2, encode the frequency of a word that appeared in the initial text data.

E.g. The first transformed row is [0 1 1 1 0 0 1 0 1] and the unique vocabulary is [‘and’, ‘document’,
‘first’, ‘is’, ‘one’, ‘second’, ‘the’, ‘third’, ‘this’], thus this means that the words “document”, “first”, “is”,
“the” and “this” appeared 1 time each in the initial text string (i.e. ‘This is the first document.’).

In our example, we will convert the collection of text documents (train and test sets) into a matrix of
TF-IDF features, that is, token counts reweighted by inverse document frequency, using TfidfVectorizer.

To implement that text transformation we will use the make_pipeline function. This will internally
transform the text data, and then the model will be fitted using the transformed data.

Source Code
print("NAIVE BAYES ENGLISH TEXT CLASSIFICATION")

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import confusion_matrix, accuracy_score

sns.set()  # use seaborn plotting style

# Load the dataset
data = fetch_20newsgroups()

# Get the text categories
text_categories = data.target_names

# Define the training set
train_data = fetch_20newsgroups(subset="train", categories=text_categories)

# Define the test set
test_data = fetch_20newsgroups(subset="test", categories=text_categories)

print("We have {} unique classes".format(len(text_categories)))
print("We have {} training samples".format(len(train_data.data)))
print("We have {} test samples".format(len(test_data.data)))

# Uncomment to look at one sample document
#print(test_data.data[5])

# Build the model
model = make_pipeline(TfidfVectorizer(), MultinomialNB())

# Train the model using the training data
model.fit(train_data.data, train_data.target)

# Predict the categories of the test data
predicted_categories = model.predict(test_data.data)

print(np.array(test_data.target_names)[predicted_categories])

# Plot the confusion matrix (transposed so rows are predictions and columns are true labels)
mat = confusion_matrix(test_data.target, predicted_categories)
sns.heatmap(mat.T, square=True, annot=True, fmt="d",
            xticklabels=train_data.target_names, yticklabels=train_data.target_names)
plt.xlabel("true labels")
plt.ylabel("predicted label")
plt.show()

print("The accuracy is {}".format(accuracy_score(test_data.target, predicted_categories)))
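
To make the conditional independence assumption concrete, here is a minimal multinomial Naive Bayes written from scratch. It is a sketch under stated assumptions: it works on raw word counts with add-one (Laplace) smoothing rather than the TF-IDF pipeline used in this experiment, it skips words never seen in training, and the four-document corpus and the function names `train_nb` and `predict_nb` are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(documents, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict_nb(model, text):
    """Return the class with the highest log P(y) + sum of log P(word | y)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the product
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus
docs = ["free prize money now", "win money free",
        "meeting schedule for monday", "project meeting notes"]
tags = ["spam", "spam", "ham", "ham"]

nb_model = train_nb(docs, tags)
print(predict_nb(nb_model, "free money"))
print(predict_nb(nb_model, "meeting notes"))
```

Summing logarithms instead of multiplying probabilities avoids floating-point underflow on long documents.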

OUTPUT:
Experiment 8
8. Implement an algorithm to demonstrate the significance of Genetic Algorithm in python

ALGORITHM:

1. Individuals in a population compete for resources and mates.

2. Those individuals who are successful (fittest) then mate to create more offspring than others.
3. Genes from the "fittest" parents propagate throughout the generation; sometimes parents
create offspring that are better than either parent.
4. Thus each successive generation is better suited to its environment.

Operators of Genetic Algorithms

Once the initial generation is created, the algorithm evolves the generation using the following operators:
1) Selection Operator: The idea is to give preference to the individuals with good fitness scores and
allow them to pass their genes to the successive generations.
2) Crossover Operator: This represents mating between individuals. Two individuals are selected using
the selection operator and crossover sites are chosen randomly. Then the genes at these crossover sites are
exchanged, thus creating a completely new individual (offspring).

3) Mutation Operator: The key idea is to insert random genes in offspring to maintain the diversity in
the population and avoid premature convergence.

Given a target string, the goal is to produce the target string starting from a random string of the same
length. In the following implementation, the following analogies are made:

• Characters A-Z, a-z, 0-9 and other special symbols are considered as genes
• A string generated by these characters is considered as a chromosome/solution/individual

The fitness score is the number of characters which differ from the characters in the target string at a
particular index. So an individual having a lower fitness value is given more preference.

Source Code

# Python3 program to create target string, starting from
# random string using Genetic Algorithm
import random

# Number of individuals in each generation
POPULATION_SIZE = 100

# Valid genes
GENES = '''abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOP
QRSTUVWXYZ 1234567890, .-;:_!"#%&/()=?@${[]}'''

# Target string to be generated
TARGET = "I love GeeksforGeeks"


class Individual(object):
    '''
    Class representing individual in population
    '''
    def __init__(self, chromosome):
        self.chromosome = chromosome
        self.fitness = self.cal_fitness()

    @classmethod
    def mutated_genes(self):
        '''
        create random genes for mutation
        '''
        global GENES
        gene = random.choice(GENES)
        return gene

    @classmethod
    def create_gnome(self):
        '''
        create chromosome or string of genes
        '''
        global TARGET
        gnome_len = len(TARGET)
        return [self.mutated_genes() for _ in range(gnome_len)]

    def mate(self, par2):
        '''
        Perform mating and produce new offspring
        '''
        # chromosome for offspring
        child_chromosome = []
        for gp1, gp2 in zip(self.chromosome, par2.chromosome):

            # random probability
            prob = random.random()

            # if prob is less than 0.45, insert gene
            # from parent 1
            if prob < 0.45:
                child_chromosome.append(gp1)

            # if prob is between 0.45 and 0.90, insert
            # gene from parent 2
            elif prob < 0.90:
                child_chromosome.append(gp2)

            # otherwise insert random gene (mutate),
            # for maintaining diversity
            else:
                child_chromosome.append(self.mutated_genes())

        # create new Individual (offspring) using
        # generated chromosome for offspring
        return Individual(child_chromosome)

    def cal_fitness(self):
        '''
        Calculate fitness score, it is the number of
        characters in string which differ from target string.
        '''
        global TARGET
        fitness = 0
        for gs, gt in zip(self.chromosome, TARGET):
            if gs != gt:
                fitness += 1
        return fitness


# Driver code
def main():
    global POPULATION_SIZE

    # current generation
    generation = 1

    found = False
    population = []

    # create initial population
    for _ in range(POPULATION_SIZE):
        gnome = Individual.create_gnome()
        population.append(Individual(gnome))

    while not found:

        # sort the population in increasing order of fitness score
        population = sorted(population, key=lambda x: x.fitness)

        # if the individual having the lowest fitness score, i.e.
        # 0, is found then we know that we have reached the target
        # and break the loop
        if population[0].fitness <= 0:
            found = True
            break

        # Otherwise generate new offspring for the new generation
        new_generation = []

        # Perform elitism, which means 10% of the fittest population
        # goes to the next generation
        s = int((10 * POPULATION_SIZE) / 100)
        new_generation.extend(population[:s])

        # From 50% of the fittest population, individuals
        # will mate to produce offspring
        s = int((90 * POPULATION_SIZE) / 100)
        for _ in range(s):
            parent1 = random.choice(population[:50])
            parent2 = random.choice(population[:50])
            child = parent1.mate(parent2)
            new_generation.append(child)

        population = new_generation

        print("Generation: {}\tString: {}\tFitness: {}".format(
            generation,
            "".join(population[0].chromosome),
            population[0].fitness))

        generation += 1

    print("Generation: {}\tString: {}\tFitness: {}".format(
        generation,
        "".join(population[0].chromosome),
        population[0].fitness))


if __name__ == '__main__':
    main()
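
The fitness and crossover-with-mutation operators can be exercised in isolation, which makes them easier to test than the full evolutionary loop. The sketch here restates them as plain functions on strings; the names `fitness` and `mate`, the explicit `rng` argument, and the `mutation_rate` parameter are assumptions introduced for this illustration (the program in this experiment fixes the split at 45% / 45% / 10% inside a class method).

```python
import random


def fitness(chromosome, target):
    """Number of positions where the chromosome differs from the target (lower is better)."""
    return sum(1 for gene, wanted in zip(chromosome, target) if gene != wanted)


def mate(parent1, parent2, genes, rng, mutation_rate=0.10):
    """Uniform crossover: each gene comes from either parent, or is mutated at random."""
    child = []
    for gene1, gene2 in zip(parent1, parent2):
        roll = rng.random()
        if roll < mutation_rate:
            child.append(rng.choice(genes))          # mutation keeps diversity
        elif roll < mutation_rate + (1 - mutation_rate) / 2:
            child.append(gene1)                      # inherit from parent 1
        else:
            child.append(gene2)                      # inherit from parent 2
    return "".join(child)


rng = random.Random(0)  # seeded generator so runs are repeatable
alphabet = "abcdefghijklmnopqrstuvwxyz "

offspring = mate("aaaa", "bbbb", alphabet, rng, mutation_rate=0.0)
print(offspring, fitness(offspring, "abab"))
```

Passing the random generator in as an argument, rather than using the module-level functions, lets a test fix the seed without affecting other code.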

OUTPUT:
Experiment 9
9. Implement an algorithm to demonstrate Back Propagation Algorithm in python

ALGORITHM:

It is the most widely used algorithm for training artificial neural networks.

In the simplest scenario, the architecture of a neural network consists of some sequential layers,
where the layer numbered i is connected to the layer numbered i+1. The layers can be classified
into 3 classes:
1. Input
2. Hidden
3. Output

Usually, each neuron in the hidden layer uses an activation function like sigmoid or rectified
linear unit (ReLU). This helps to capture the non-linear relationship between the inputs and their
outputs.

The neurons in the output layer also use activation functions, such as sigmoid (for outputs in the
range 0 to 1, e.g. binary classification) or SoftMax (for multi-class classification).
To train a neural network, there are 2 passes (phases):
• Forward
• Backward

The forward and backward phases are repeated for a number of epochs. In each epoch, the following
occurs:
1. The inputs are propagated from the input to the output layer.
2. The network error is calculated.
3. The error is propagated from the output layer to the input layer.

Knowing that there's an error, what should we do? We should minimize it. To minimize network
error, we must change something in the network. Remember that the only parameters we can
change are the weights and biases. We could try different weights and biases and then test our
network, but backpropagation is more systematic: it computes the gradient of the error with respect
to each weight and moves the weight a small step in the opposite direction.
Source Code:

import numpy
import matplotlib.pyplot as plt


def sigmoid(sop):
    # Activation applied to the sum of products (sop)
    return 1.0 / (1 + numpy.exp(-1 * sop))


def error(predicted, target):
    # Squared error
    return numpy.power(predicted - target, 2)


def error_predicted_deriv(predicted, target):
    # d(error) / d(predicted)
    return 2 * (predicted - target)


def sigmoid_sop_deriv(sop):
    # d(predicted) / d(sop)
    return sigmoid(sop) * (1.0 - sigmoid(sop))


def sop_w_deriv(x):
    # d(sop) / d(w) for the weight attached to input x
    return x


def update_w(w, grad, learning_rate):
    # Gradient descent step
    return w - learning_rate * grad


x1 = 0.1
x2 = 0.4
target = 0.7
learning_rate = 0.01

w1 = numpy.random.rand()
w2 = numpy.random.rand()

print("Initial W : ", w1, w2)

predicted_output = []
network_error = []

for k in range(80000):
    # Forward Pass
    y = w1 * x1 + w2 * x2
    predicted = sigmoid(y)
    err = error(predicted, target)

    predicted_output.append(predicted)
    network_error.append(err)

    # Backward Pass (chain rule)
    g1 = error_predicted_deriv(predicted, target)
    g2 = sigmoid_sop_deriv(y)

    g3w1 = sop_w_deriv(x1)
    g3w2 = sop_w_deriv(x2)

    gradw1 = g3w1 * g2 * g1
    gradw2 = g3w2 * g2 * g1

    w1 = update_w(w1, gradw1, learning_rate)
    w2 = update_w(w2, gradw2, learning_rate)
    #print(predicted)

plt.figure()
plt.plot(network_error)
plt.title("Iteration Number vs Error")
plt.xlabel("Iteration Number")
plt.ylabel("Error")
plt.show()

plt.figure()
plt.plot(predicted_output)
plt.title("Iteration Number vs Prediction")
plt.xlabel("Iteration Number")
plt.ylabel("Prediction")
plt.show()

OUTPUT:

Initial W : 0.08698924153243281 0.4532713230157145
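
A standard way to check a backward pass is to compare the chain-rule gradient with a numerical estimate obtained by nudging the weight slightly. The sketch here does this for the weight w1 of the same one-neuron network; the starting weights of 0.5, the step size `eps`, and the function names are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w1, w2, x1, x2, target):
    """Squared error of a single sigmoid neuron without bias."""
    return (sigmoid(w1 * x1 + w2 * x2) - target) ** 2


def analytic_grad_w1(w1, w2, x1, x2, target):
    """Chain rule: dE/dw1 = dE/dpred * dpred/dsop * dsop/dw1."""
    predicted = sigmoid(w1 * x1 + w2 * x2)
    return 2 * (predicted - target) * predicted * (1 - predicted) * x1


def numeric_grad_w1(w1, w2, x1, x2, target, eps=1e-6):
    """Central finite difference approximation of dE/dw1."""
    upper = loss(w1 + eps, w2, x1, x2, target)
    lower = loss(w1 - eps, w2, x1, x2, target)
    return (upper - lower) / (2 * eps)


# Same inputs and target as the experiment; the weights are arbitrary
args = (0.5, 0.5, 0.1, 0.4, 0.7)
analytic = analytic_grad_w1(*args)
numeric = numeric_grad_w1(*args)
print("analytic:", analytic)
print("numeric: ", numeric)
print("difference:", abs(analytic - numeric))
```

If the two values disagree by more than a tiny amount, one of the derivative functions in the backward pass is wrong.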

Experiment 10
10. Implementing FIND-S algorithm using python

Training Database

Algorithm

1. Initialize h to the most specific hypothesis in H

2. For each positive training instance x
     For each attribute constraint a_i in h
       If the constraint a_i is satisfied by x
         Then do nothing
       Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
-----------------------------------------------------------------------------------------------

Hypothesis Construction
Source Code:
import csv

a = []  # training instances, one list per CSV row
with open('enjoysport.csv', 'r') as csvfile:
    for row in csv.reader(csvfile):
        a.append(row)
    print(a)

print("\n The total number of training instances are : ", len(a))

# The last column holds the label, the others are attributes
num_attribute = len(a[0]) - 1

print("\n The initial hypothesis is : ")
hypothesis = ['0'] * num_attribute
print(hypothesis)

for i in range(0, len(a)):
    if a[i][num_attribute] == 'Yes':  # for each positive example only
        for j in range(0, num_attribute):
            if hypothesis[j] == '0' or hypothesis[j] == a[i][j]:
                hypothesis[j] = a[i][j]
            else:
                hypothesis[j] = '?'
    print("\n The hypothesis for the training instance {} is : \n".format(i + 1), hypothesis)

print("\n The Maximally specific hypothesis for the training instance is ")
print(hypothesis)
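
The training file `enjoysport.csv` is not reproduced in this manual. The sketch here runs Find-S on inline data so that it is self-contained; the four rows are the EnjoySport examples commonly used to teach the algorithm, and it is an assumption that the manual's CSV holds the same rows. The function name `find_s` is also a choice made for this illustration.

```python
def find_s(examples):
    """Return the maximally specific hypothesis consistent with the positive examples."""
    n_attributes = len(examples[0]) - 1
    hypothesis = ["0"] * n_attributes  # "0" = most specific constraint, matches nothing
    for *attributes, label in examples:
        if label != "Yes":
            continue  # Find-S ignores negative examples
        for j, value in enumerate(attributes):
            if hypothesis[j] == "0":
                hypothesis[j] = value  # first positive example: copy its value
            elif hypothesis[j] != value:
                hypothesis[j] = "?"    # conflict: generalise to "any value"
    return hypothesis


# Assumed contents of the training data (sky, air temp, humidity, wind, water, forecast, label)
training = [
    ["Sunny", "Warm", "Normal", "Strong", "Warm", "Same", "Yes"],
    ["Sunny", "Warm", "High", "Strong", "Warm", "Same", "Yes"],
    ["Rainy", "Cold", "High", "Strong", "Warm", "Change", "No"],
    ["Sunny", "Warm", "High", "Strong", "Cool", "Change", "Yes"],
]

final_hypothesis = find_s(training)
print(final_hypothesis)
```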

OUTPUT:
Experiment 11
11. Implementing Candidate Elimination algorithm using python

Training Database

Algorithm
Source Code:
import csv

with open("enjoysport.csv") as f:
    csv_file = csv.reader(f)
    data = list(csv_file)

print(data)
print("--------------------")
s = data[1][:-1]  # extracting one row or instance or record
g = [['?' for i in range(len(s))] for j in range(len(s))]

print(s)
print("--------------------")
print(g)
print("--------------------")

for i in data:
    if i[-1] == "Yes":  # For each positive training record or instance
        for j in range(len(s)):
            if i[j] != s[j]:
                s[j] = '?'
                g[j][j] = '?'

    elif i[-1] == "No":  # For each negative training record or example
        for j in range(len(s)):
            if i[j] != s[j]:
                g[j][j] = s[j]
            else:
                g[j][j] = "?"
    print("\nSteps of Candidate Elimination Algorithm", data.index(i) + 1)
    print(s)
    print(g)

# Keep only the general hypotheses that contain at least one specific constraint
gh = []
for i in g:
    for j in i:
        if j != '?':
            gh.append(i)
            break
print("\nFinal specific hypothesis:\n", s)
print("\nFinal general hypothesis:\n", gh)
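
Candidate Elimination rests on two tests: whether a hypothesis covers an instance, and whether one hypothesis is at least as general as another. A minimal sketch of both predicates for conjunctive hypotheses that use "?" as the wildcard is given here; the function names and the sample hypotheses are assumptions made for illustration, and the sketch does not handle the empty constraint that matches nothing.

```python
def matches(hypothesis, instance):
    """True if every constraint is the wildcard '?' or equals the instance's value."""
    return all(h == "?" or h == value for h, value in zip(hypothesis, instance))


def more_general_or_equal(h1, h2):
    """True if every instance covered by h2 is also covered by h1."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))


# Hypothetical specific and general boundary members
specific = ["Sunny", "Warm", "?", "Strong", "?", "?"]
general = ["Sunny", "?", "?", "?", "?", "?"]

positive = ["Sunny", "Warm", "High", "Strong", "Cool", "Change"]
negative = ["Rainy", "Cold", "High", "Strong", "Warm", "Change"]

print(matches(specific, positive))               # the specific boundary must cover positives
print(matches(general, negative))                # the general boundary must reject negatives
print(more_general_or_equal(general, specific))  # G members sit above S members
```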

OUTPUT:
Experiment 12

12. Implement K-Means Clustering using python

ALGORITHM:

Step 1: Read the given data sample into X

Step 2: Train on the dataset with K=5

Step 3: Find the optimal number of clusters (k) in the dataset using the Elbow method

Step 4: Train on the dataset with K=3 (optimal K value)

Step 5: Compare results

Step 6: End

PROGRAM:

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn import datasets

# Read dataset
df = datasets.load_iris()
x = df.data
y = df.target

print(x)
print(y)

# Let's try with k=5 initially
kmeans5 = KMeans(n_clusters=5)
y_kmeans5 = kmeans5.fit_predict(x)
print(y_kmeans5)
print(kmeans5.cluster_centers_)

# To find the optimal number of clusters (k) in a dataset:
# record the inertia (within-cluster sum of squared distances) for k = 1..10
Error = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i).fit(x)
    Error.append(kmeans.inertia_)

plt.plot(range(1, 11), Error)
plt.title('Elbow method')
plt.xlabel('No of clusters')
plt.ylabel('Error')
plt.show()

# Now try with k=3 finally
kmeans3 = KMeans(n_clusters=3)
y_kmeans3 = kmeans3.fit_predict(x)
print(y_kmeans3)

print(kmeans3.cluster_centers_)
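
The quantity plotted by the Elbow method is the inertia: the sum of squared distances from each point to its nearest centroid. A small sketch of that calculation with the standard library is given here; the function name `inertia`, the four toy points and the hand-picked centroids are assumptions for illustration only.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def inertia(points, centroids):
    """Sum over all points of the squared distance to the nearest centroid."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Two obvious groups, around x = 0 and x = 10 (hypothetical values)
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

inertia_k1 = inertia(toy_points, [(5.0, 1.0)])               # one centroid in the middle
inertia_k2 = inertia(toy_points, [(0.0, 1.0), (10.0, 1.0)])  # one centroid per group

print("k=1 inertia:", inertia_k1)
print("k=2 inertia:", inertia_k2)
```

The sharp drop from k=1 to k=2 is the "elbow": adding centroids beyond the true number of groups reduces inertia much more slowly.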

OUTPUT:
