
Gopalan College of Engineering and Management

(ISO 9001:2015 certified)


Approved by All India Council for Technical Education (AICTE), New Delhi

Affiliated to Visvesvaraya Technological University (VTU), Belagavi, Karnataka


Recognized by Govt. of Karnataka

Address: 181/1, 182/1, Sonnenahalli, Hoodi, K.R.Puram, Whitefield, Bangalore, Karnataka - 560 048
Phone No: (080) - 42229748 Email: [email protected] Website: www.gopalancolleges.com/gcem

Prepared by:              Reviewed by:                 Approved by:

Girish M                  Dr. J. Somasekar             Dr. N. Sengottaiyan
Asst. Professor           Head of the Department       Principal
Dept. of CSE              Dept. of CSE                 GCEM
GCEM                      GCEM

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


MACHINE LEARNING LABORATORY
LAB MANUAL - 17CSL76
[As per Choice Based Credit System (CBCS) scheme]
(Effective from the academic year 2017 -2018)
SEMESTER-VII
2020-2021

TABLE OF CONTENTS

1. SYLLABUS
2. STUDY EXPERIMENT
3. COURSE OBJECTIVE AND COURSE OUTCOME
4. COMPUTER LAB - DO'S AND DON'TS
5. LIST OF EXPERIMENTS


Prerequisites
• Programming experience in Python
• Knowledge of basic Machine Learning Algorithms
• Knowledge of common statistical methods and data analysis best practices.

Software Requirements
1. Python version 3.5 and above
2. Machine Learning packages
Scikit-Learn - machine learning algorithms
Numpy - matrices and linear algebra
Scipy - many numerical routines
Matplotlib - creating plots of data
Pandas - facilitates structured/tabular data manipulation and visualisation
Pomegranate - fast and flexible probabilistic models
3. An Integrated Development Environment (IDE) for Python Programming

Anaconda
A Python distribution bundles the Python interpreter together with a list of Python packages and tools such as editors. Anaconda is one such distribution: a Python and R distribution aimed at data science, developed by Anaconda Inc. (formerly Continuum Analytics). It ships with more than 100 packages and is widely used for scientific computing, data science, statistical analysis, and machine learning.

Operating System
Windows/Linux
The Anaconda Python distribution is compatible with both Linux and Windows.


COURSE OBJECTIVES
This course will enable students to:
• Make use of data sets in implementing the machine learning algorithms.
• Implement the machine learning concepts and algorithms in any suitable language of choice.

COURSE OUTCOMES
After studying this course, the students will be able to:
• Understand the implementation procedures for the machine learning algorithms.
• Design Java/Python programs for various learning algorithms.
• Apply appropriate data sets to the machine learning algorithms.
• Identify and apply machine learning algorithms to solve real-world problems.

COMPUTER LAB DO’S AND DON’Ts

DO’S

1. Know the location of the fire extinguisher and the first aid box and how to use them in
case of emergency.
2. Read and understand how to carry out an activity thoroughly before coming to the
laboratory.
3. Report fires or accidents to your lecturer/ laboratory technician immediately.
4. Report any broken plugs or exposed electrical wires to your lecturer/ laboratory
technician immediately.

DON’Ts

1. Do not eat or drink in the laboratory.


2. Avoid stepping on electrical wires or any other computer cables.
3. Do not open the system unit casing or monitor casing particularly when the power is
turned on. Some internal components hold electric voltages of up to 30,000 Volts
which can be fatal.
4. Do not insert metal objects such as clips, pins and needles into the computer casings.
They may cause fire.
5. Do not remove anything from the computer laboratory without permission.
6. Do not touch, connect or disconnect any plug or cable without your lecturer/
laboratory technician’s permission.
7. Do not misbehave in the computer laboratory.


LIST OF EXPERIMENTS

1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis.
2. Implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
3. Demonstrate the working of the decision tree based ID3 algorithm.
4. Implement the Backpropagation algorithm and test the same using appropriate data sets.
5. Implement the naïve Bayesian classifier. Compute the accuracy of the classifier, considering few test data sets.
6. Use the naïve Bayesian Classifier model. Calculate the accuracy, precision, and recall for your data set.
7. Program to construct a Bayesian network considering medical data.
8. Apply the EM algorithm to cluster a set of data. Use the same data set for clustering with the k-means algorithm. Compare the results of these two algorithms and comment on the quality of clustering.
9. Implement the k-Nearest Neighbour algorithm to classify the iris data set.
10. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points.


Program 1
Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

Objective: To find the most specific hypothesis in the hypothesis space that is consistent with the positive training examples.
Dataset: Tennis data set: contains example days on which playing tennis is possible or not, based on the attributes Sky, AirTemp, Humidity, Wind, Water and Forecast.
ML algorithm: Supervised Learning - FIND-S algorithm
Description: FIND-S is one of the simplest machine learning algorithms.

FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
   For each attribute constraint ai in h
      If the constraint ai is satisfied by x
         Then do nothing
      Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h

Program

import csv

a = []
with open('findsdataset.csv', 'r') as csvfile:  # Reading data from CSV file
    for row in csv.reader(csvfile):
        a.append(row)

print("\n The Given Training Data Set \n")
print(a)
print("\n The total number of training instances are : ", len(a))

num_attribute = len(a[0]) - 1
print("\n The initial hypothesis is : ")
hypothesis = ['0'] * num_attribute
print(hypothesis)


for i in range(0, len(a)):
    if a[i][num_attribute] == 'yes':  # consider only positive training instances
        for j in range(0, num_attribute):
            if hypothesis[j] == '0' or hypothesis[j] == a[i][j]:
                hypothesis[j] = a[i][j]
            else:
                hypothesis[j] = '?'
    print("\n The hypothesis for the training instance {} is :\n".format(i + 1), hypothesis)

print("\n The Maximally specific hypothesis for the training instance is ")
print(hypothesis)

Data Set:
sunny warm normal strong warm same yes
sunny warm high strong warm same yes
rainy cold high strong warm change no
sunny warm high strong cool change yes

Output:

The Given Training Data Set

['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes']
['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes']
['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no']
['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']

The total number of training instances are : 4

The initial hypothesis is :
['0', '0', '0', '0', '0', '0']

The hypothesis for the training instance 1 is :
['sunny', 'warm', 'normal', 'strong', 'warm', 'same']

The hypothesis for the training instance 2 is :
['sunny', 'warm', '?', 'strong', 'warm', 'same']

The hypothesis for the training instance 3 is :
['sunny', 'warm', '?', 'strong', 'warm', 'same']

The hypothesis for the training instance 4 is :
['sunny', 'warm', '?', 'strong', '?', '?']

The Maximally specific hypothesis for the training instance is
['sunny', 'warm', '?', 'strong', '?', '?']

Conclusion:

Thus the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples was implemented and demonstrated successfully.


Program 2
For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

Objective: To output a description of the version space, i.e. the set of all hypotheses consistent with the training examples.
Dataset: Tennis (EnjoySport) data set: contains example days on which playing tennis is possible or not, based on the attributes Sky, AirTemp, Humidity, Wind, Water and Forecast. The dataset has 4 instances.
ML algorithm: Supervised Learning - Candidate-Elimination algorithm
Description: The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.

CANDIDATE-ELIMINATION Learning Algorithm
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
• If d is a positive example
  • Remove from G any hypothesis inconsistent with d
  • For each hypothesis s in S that is not consistent with d
    • Remove s from S
    • Add to S all minimal generalizations h of s such that
      • h is consistent with d, and some member of G is more general than h
    • Remove from S any hypothesis that is more general than another hypothesis in S
• If d is a negative example
  • Remove from S any hypothesis inconsistent with d
  • For each hypothesis g in G that is not consistent with d
    • Remove g from G
    • Add to G all minimal specializations h of g such that
      • h is consistent with d, and some member of S is more specific than h
    • Remove from G any hypothesis that is less general than another hypothesis in G


Training Examples:

Example   Sky     AirTemp   Humidity   Wind     Water   Forecast   EnjoySport
1         Sunny   Warm      Normal     Strong   Warm    Same       Yes
2         Sunny   Warm      High       Strong   Warm    Same       Yes
3         Rainy   Cold      High       Strong   Warm    Change     No
4         Sunny   Warm      High       Strong   Cool    Change     Yes

Program

import numpy as np
import pandas as pd

data = pd.read_csv('enjoysport.csv')
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)

    for i, h in enumerate(concepts):
        print("For Loop Starts")
        if target[i] == "yes":
            print("If instance is Positive ")
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'

        if target[i] == "no":
            print("If instance is Negative ")
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'

        print(" steps of Candidate Elimination Algorithm", i + 1)
        print(specific_h)
        print(general_h)
        print("\n")
        print("\n")

    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)

print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")

Data Set:
sunny warm normal strong warm same yes
sunny warm high strong warm same yes
rainy cold high strong warm change no
sunny warm high strong cool change yes


output:
[['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
['sunny' 'warm' 'high' 'strong' 'warm' 'same']
['rainy' 'cold' 'high' 'strong' 'warm' 'change']
['sunny' 'warm' 'high' 'strong' 'cool' 'change']]
['yes' 'yes' 'no' 'yes']
initialization of specific_h and general_h
['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?',
'?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
For Loop Starts
If instance is Positive
steps of Candidate Elimination Algorithm 1
['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?',
'?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]

For Loop Starts


If instance is Positive
steps of Candidate Elimination Algorithm 2
['sunny' 'warm' '?' 'strong' 'warm' 'same']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?',
'?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]

For Loop Starts


If instance is Negative
steps of Candidate Elimination Algorithm 3
['sunny' 'warm' '?' 'strong' 'warm' 'same']
[['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', 'same']]

For Loop Starts


If instance is Positive
steps of Candidate Elimination Algorithm 4
['sunny' 'warm' '?' 'strong' '?' '?']
[['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]

Final Specific_h:
['sunny' 'warm' '?' 'strong' '?' '?']
Final General_h:
[['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]

Conclusion:

Thus the Candidate-Elimination algorithm, which outputs a description of the set of all hypotheses consistent with the training examples, was implemented and demonstrated successfully.


Program 3
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

Objective: To demonstrate the working of the decision tree based ID3 algorithm.
Dataset: PlayTennis data set: contains example days on which playing tennis is possible or not, based on the attributes Outlook, Temperature, Humidity and Wind.
ML algorithm: Supervised Learning - Decision Tree (ID3) algorithm
Description: A decision tree builds classification or regression models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.

ID3 Algorithm
ID3(Examples, Target_attribute, Attributes)

Examples are the training examples. Target_attribute is the attribute whose value is to be predicted by the tree. Attributes is a list of other attributes that may be tested by the learned decision tree. Returns a decision tree that correctly classifies the given Examples.

• Create a Root node for the tree
• If all Examples are positive, Return the single-node tree Root, with label = +
• If all Examples are negative, Return the single-node tree Root, with label = -
• If Attributes is empty, Return the single-node tree Root, with label = most common value of Target_attribute in Examples
• Otherwise Begin
  • A ← the attribute from Attributes that best* classifies Examples
  • The decision attribute for Root ← A
  • For each possible value, vi, of A,
    • Add a new tree branch below Root, corresponding to the test A = vi
    • Let Examples_vi be the subset of Examples that have value vi for A
    • If Examples_vi is empty
      • Then below this new branch add a leaf node with label = most common value of Target_attribute in Examples
      • Else below this new branch add the subtree ID3(Examples_vi, Target_attribute, Attributes - {A})
• End
• Return Root


Program:
import math
import csv

def load_csv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    headers = dataset.pop(0)
    return dataset, headers

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

def subtables(data, col, delete):
    dic = {}
    coldata = [row[col] for row in data]
    attr = list(set(coldata))

    counts = [0] * len(attr)
    r = len(data)
    c = len(data[0])
    for x in range(len(attr)):
        for y in range(r):
            if data[y][col] == attr[x]:
                counts[x] += 1

    for x in range(len(attr)):
        dic[attr[x]] = [[0 for i in range(c)] for j in range(counts[x])]
        pos = 0
        for y in range(r):
            if data[y][col] == attr[x]:
                if delete:
                    del data[y][col]
                dic[attr[x]][pos] = data[y]
                pos += 1
    return attr, dic

def entropy(S):
    attr = list(set(S))
    if len(attr) == 1:
        return 0

    counts = [0, 0]
    for i in range(2):
        counts[i] = sum([1 for x in S if attr[i] == x]) / (len(S) * 1.0)

    sums = 0
    for cnt in counts:
        sums += -1 * cnt * math.log(cnt, 2)
    return sums

def compute_gain(data, col):
    attr, dic = subtables(data, col, delete=False)

    total_size = len(data)
    entropies = [0] * len(attr)
    ratio = [0] * len(attr)

    total_entropy = entropy([row[-1] for row in data])
    for x in range(len(attr)):
        ratio[x] = len(dic[attr[x]]) / (total_size * 1.0)
        entropies[x] = entropy([row[-1] for row in dic[attr[x]]])
        total_entropy -= ratio[x] * entropies[x]
    return total_entropy

def build_tree(data, features):
    lastcol = [row[-1] for row in data]
    if (len(set(lastcol))) == 1:
        node = Node("")
        node.answer = lastcol[0]
        return node

    n = len(data[0]) - 1
    gains = [0] * n
    for col in range(n):
        gains[col] = compute_gain(data, col)
    split = gains.index(max(gains))

    node = Node(features[split])
    fea = features[:split] + features[split + 1:]

    attr, dic = subtables(data, split, delete=True)

    for x in range(len(attr)):
        child = build_tree(dic[attr[x]], fea)
        node.children.append((attr[x], child))
    return node

def print_tree(node, level):
    if node.answer != "":
        print(" " * level, node.answer)
        return

    print(" " * level, node.attribute)
    for value, n in node.children:
        print(" " * (level + 1), value)
        print_tree(n, level + 2)

def classify(node, x_test, features):
    if node.answer != "":
        print(node.answer)
        return
    pos = features.index(node.attribute)
    for value, n in node.children:
        if x_test[pos] == value:
            classify(n, x_test, features)

'''Main program'''
dataset, features = load_csv("id3.csv")
node1 = build_tree(dataset, features)

print("The decision tree for the dataset using ID3 algorithm is")
print_tree(node1, 0)
testdata, features = load_csv("id3_test.csv")

for xtest in testdata:
    print("The test instance:", xtest)
    print("The label for test instance:", end=" ")
    classify(node1, xtest, features)


Training Dataset:

Day Outlook Temperature Humidity Wind PlayTennis


D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No

Test Dataset:

Day Outlook Temperature Humidity Wind


T1 Rain Cool Normal Strong
T2 Sunny Mild Normal Strong

* The best attribute is the one with the highest information gain.

ENTROPY:
Entropy measures the impurity of a collection of examples:

    Entropy(S) = -p+ log2(p+) - p- log2(p-)

where p+ is the proportion of positive examples in S and p- is the proportion of negative examples in S.


INFORMATION GAIN:

• Information gain is the expected reduction in entropy caused by partitioning the examples according to an attribute.
• The information gain, Gain(S, A), of an attribute A relative to a collection of examples S, is defined as

    Gain(S, A) = Entropy(S) - Σ over v ∈ Values(A) of (|Sv| / |S|) * Entropy(Sv)
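
As a quick sanity check of these two formulas, the short sketch below (not part of the lab program; it hard-codes the PlayTennis and Wind columns of the 14-day training dataset above) reproduces the textbook values Entropy(S) ≈ 0.940 and Gain(S, Wind) ≈ 0.048.

import math

def entropy(labels):
    total = len(labels)
    ent = 0.0
    for c in set(labels):
        p = labels.count(c) / total
        ent -= p * math.log(p, 2)    # -p log2(p), summed over the classes
    return ent

# PlayTennis and Wind columns of the training dataset (D1..D14)
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
wind = ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Weak', 'Weak',
        'Strong', 'Strong', 'Weak', 'Strong']

gain = entropy(play) - sum(
    (wind.count(v) / len(wind)) * entropy([p for p, w in zip(play, wind) if w == v])
    for v in set(wind))

print(round(entropy(play), 3))   # 0.94
print(round(gain, 3))            # 0.048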

OUTPUT
The decision tree for the dataset using ID3 algorithm is
Outlook
rain
Wind
strong
no
weak
yes
sunny
Humidity
high
no
normal
yes
overcast
yes
The test instance: ['rain', 'cool', 'normal', 'strong']
The label for test instance: no
The test instance: ['sunny', 'mild', 'normal', 'strong']
The label for test instance: yes

Conclusion:
Thus the working of the decision tree based ID3 algorithm was demonstrated successfully.


Program 4
Build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.
Objective: To build an artificial neural network using the backpropagation algorithm.
Dataset: Data stored as a list having two features (number of hours slept, number of hours studied), with the test score as the target value.
ML algorithm: Supervised Learning - Backpropagation algorithm
Description: The neural network trained with backpropagation models a single hidden layer with two inputs, three hidden neurons and one output. The network predicts the score of an exam based on the number of hours studied and the number of hours slept the day before; the test score is the output.

BACKPROPAGATION Algorithm

BACKPROPAGATION (training_examples, η, n_in, n_out, n_hidden)

Each training example is a pair of the form (x, t), where x is the vector of network input values and t is the vector of target network output values.
η is the learning rate (e.g., 0.05). n_in is the number of network inputs, n_hidden the number of units in the hidden layer, and n_out the number of output units.
The input from unit i into unit j is denoted x_ji, and the weight from unit i to unit j is denoted w_ji.

• Create a feed-forward network with n_in inputs, n_hidden hidden units, and n_out output units.
• Initialize all network weights to small random numbers.
• Until the termination condition is met, Do
    For each (x, t) in training_examples, Do
        Propagate the input forward through the network:
        1. Input the instance x to the network and compute the output o_u of every unit u in the network.
        Propagate the errors backward through the network.


Assign all network inputs and outputs
Initialize all weights with small random numbers, typically between -1 and 1

repeat
    for every pattern in the training set
        Present the pattern to the network

        Propagate the input forward through the network:
            for each layer in the network
                for every node in the layer
                    Calculate the weighted sum of the inputs to the node
                    Add the threshold (bias) to the sum
                    Calculate the activation for the node
                end
            end

        Propagate the errors backward through the network:
            for every node in the output layer
                calculate the error signal
            end
            for all hidden layers
                for every node in the layer
                    Calculate the node's signal error
                    Update each node's weight in the network
                end
            end

        Calculate Global Error
            Calculate the Error Function
    end
while ((maximum number of iterations < than specified) AND (Error Function is > than specified))

The network used in this program has:
• An input layer with two input neurons
• One hidden layer with three neurons
• An output layer with a single neuron

Training Examples:

Example   Sleep   Study   Expected % in Exams
1         2       9       92
2         1       5       86
3         3       6       89

Normalize the input:

Example   Sleep               Study               Expected % in Exams
1         2/3 = 0.66666667    9/9 = 1             0.92
2         1/3 = 0.33333333    5/9 = 0.55555556    0.86
3         3/3 = 1             6/9 = 0.66666667    0.89


Program:

import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)  # maximum of X array longitudinally
y = y / 100

# Sigmoid Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of Sigmoid Function
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000               # Setting training iterations
lr = 0.1                   # Setting learning rate
inputlayer_neurons = 2     # number of features in data set
hiddenlayer_neurons = 3    # number of hidden layer neurons
output_neurons = 1         # number of neurons at output layer

# weight and bias initialization
# draws a random range of numbers uniformly of dim x*y
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)  # how much hidden layer wts contributed to error
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) * lr       # dot product of next layer error and current layer output
    # bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    # bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)


OUTPUT:
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.89533147]
[0.88083584]
[0.89416396]]

Conclusion:

Thus the Backpropagation algorithm was implemented and tested using an appropriate data set successfully.


Program 5
Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.

Objective: To implement a classification model for a sample training dataset and compute the accuracy of the classifier on test data.
Dataset: Pima Indian Diabetes dataset stored as .CSV. The attributes are Number of times pregnant, Plasma glucose concentration, Blood Pressure, Triceps skin fold thickness, Serum insulin, Body mass index, Diabetes pedigree function and Age.
ML algorithm: Supervised Learning - Naïve Bayes algorithm
Description: The Naïve Bayes classifier is a probabilistic classifier based on Bayes' theorem. The algorithm builds a model assuming that the attributes in the dataset are independent of each other.

Bayes' theorem is stated as:

    P(h|D) = P(D|h) * P(h) / P(D)

Where,
P(h|D) is the probability of hypothesis h given the data D. This is called the posterior probability.
P(D|h) is the probability of data D given that the hypothesis h was true.
P(h) is the probability of hypothesis h being true. This is called the prior probability of h.
P(D) is the probability of the data. This is called the prior probability of D.

After calculating the posterior probability for a number of different hypotheses h, we are interested in finding the most probable hypothesis h ∈ H given the observed data D. Any such maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis. Using Bayes' theorem to calculate the posterior probability of each candidate hypothesis, hMAP is a MAP hypothesis provided

    hMAP = argmax over h ∈ H of P(D|h) * P(h)

(ignoring P(D), since it is a constant independent of h).
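
Purely as an illustration of the two formulas above (the numbers here are hypothetical and are not taken from the diabetes dataset), the posterior probability can be computed directly:

# Hypothetical numbers, used only to illustrate Bayes' theorem.
p_h = 0.3              # prior P(h)
p_d_given_h = 0.8      # likelihood P(D|h)
p_d_given_not_h = 0.2  # likelihood P(D|not h)

p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)   # P(D) by total probability
posterior = p_d_given_h * p_h / p_d                     # P(h|D) ≈ 0.632
print(posterior)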


Gaussian Naive Bayes

A Gaussian Naive Bayes algorithm is a special type of Naïve Bayes algorithm. It is used when the features have continuous values, and it assumes that all the features follow a Gaussian (normal) distribution.

Representation for Gaussian Naive Bayes
We calculate the probabilities for input values for each class using a frequency. With real-valued inputs, we can calculate the mean and standard deviation of the input values (x) for each class to summarize the distribution. This means that in addition to the probabilities for each class, we must also store the mean and standard deviation of each input variable for each class.

Gaussian Naive Bayes Model from Data
The probability density function for the normal distribution is defined by two parameters (mean μ and standard deviation σ). For each class value, the mean and standard deviation of each input variable x are estimated, and the class-conditional likelihood is computed as

    P(x | class) = (1 / (σ * sqrt(2π))) * exp(-(x - μ)² / (2σ²))

Program

import csv
import random
import math

def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    # 67% training size
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        # generate indices for the dataset list randomly to pick elements for training data
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    separated = {}
    # creates a dictionary of classes 1 and 0 where the values are the instances belonging to each class
    for i in range(len(dataset)):
        vector = dataset[i]
        if (vector[-1] not in separated):
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        # summaries is a dict of tuples (mean, std) for each class value
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        # class and attribute information as mean and sd
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]  # take mean and sd of every attribute for class 0 and 1 separately
            x = inputVector[i]               # test vector's i-th attribute
            probabilities[classValue] *= calculateProbability(x, mean, stdev)  # use normal dist
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        # assigns that class which has the highest prob
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0

def main():
    filename = 'naivedata.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)

    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClass(trainingSet)
    # test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy of the classifier is : {0}%'.format(accuracy))

main()

OUTPUT:

Split 768 rows into train=514 and test=254 rows

Accuracy of the classifier is : 68.50393700787401%

Conclusion:

Thus the naïve Bayesian classifier was implemented and the accuracy of the classifier was computed successfully.


Program 6
Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.

Objective: To implement a binary classification model that classifies a set of documents and to calculate the accuracy, precision and recall for the dataset.
Dataset: Contains text as sentences labelled positive and negative. The dataset contains a total of 18 instances.
ML algorithm: Supervised Learning - Naïve Bayes algorithm
Packages: Scikit-learn (sklearn), pandas
Description: The Naïve Bayes classifier is a probabilistic classifier based on Bayes' theorem. The algorithm builds a model assuming that the attributes in the dataset are independent of each other.

Naive Bayes algorithms for learning and classifying text

LEARN_NAIVE_BAYES_TEXT (Examples, V)
Examples is a set of text documents along with their target values. V is the set of all possible target values. This function learns the probability terms P(wk|vj), describing the probability that a randomly drawn word from a document in class vj will be the English word wk. It also learns the class prior probabilities P(vj).

1. Collect all words, punctuation, and other tokens that occur in Examples
   • Vocabulary ← the set of all distinct words and other tokens occurring in any text document from Examples

2. Calculate the required P(vj) and P(wk|vj) probability terms
   • For each target value vj in V do
     • docsj ← the subset of documents from Examples for which the target value is vj
     • P(vj) ← |docsj| / |Examples|
     • Textj ← a single document created by concatenating all members of docsj
     • n ← total number of distinct word positions in Textj
     • for each word wk in Vocabulary
       • nk ← number of times word wk occurs in Textj
       • P(wk|vj) ← (nk + 1) / (n + |Vocabulary|)


CLASSIFY_NAIVE_BAYES_TEXT (Doc)

Return the estimated target value for the document Doc. ai denotes the word found in the ith position within Doc.

• positions ← all word positions in Doc that contain tokens found in Vocabulary
• Return vNB, where

    vNB = argmax over vj ∈ V of P(vj) * Π over i ∈ positions of P(ai|vj)
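
The lab program below relies on scikit-learn's CountVectorizer and MultinomialNB. Purely for illustration, a minimal from-scratch sketch of the two procedures above (hypothetical helper names and a tiny two-document corpus, not the lab dataset) could look like this:

import math
from collections import Counter

docs = [("I love this sandwich", "pos"), ("I do not like this restaurant", "neg")]
vocab = {w for text, _ in docs for w in text.lower().split()}

def learn(docs):
    priors, likelihoods = {}, {}
    for v in {label for _, label in docs}:
        class_docs = [text.lower().split() for text, label in docs if label == v]
        priors[v] = len(class_docs) / len(docs)               # P(vj)
        words = [w for d in class_docs for w in d]
        counts, n = Counter(words), len(words)
        # Laplace-smoothed estimate P(wk|vj) = (nk + 1) / (n + |Vocabulary|)
        likelihoods[v] = {w: (counts[w] + 1) / (n + len(vocab)) for w in vocab}
    return priors, likelihoods

def classify(doc, priors, likelihoods):
    scores = {}
    for v in priors:
        # log space is used here only to avoid numerical underflow of the product
        scores[v] = math.log(priors[v]) + sum(
            math.log(likelihoods[v][w]) for w in doc.lower().split() if w in vocab)
    return max(scores, key=scores.get)

priors, likelihoods = learn(docs)
print(classify("I love this place", priors, likelihoods))   # expected: pos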

Dataset/Examples:

Text Documents Label


1 I love this sandwich pos
2 This is an amazing place pos
3 I feel very good about these beers pos
4 This is my best work pos
5 What an awesome view pos
6 I do not like this restaurant neg
7 I am tired of this stuff neg
8 I can't deal with this neg
9 He is my sworn enemy neg
10 My boss is horrible neg
11 This is an awesome place pos
12 I do not like the taste of this juice neg
13 I love to dance pos
14 I am sick and tired of this place neg
15 What a great holiday pos
16 That is a bad locality to stay neg
17 We will have good fun tomorrow pos
18 I went to my enemy's house today neg


Program
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

msg = pd.read_csv('naivetext6.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)

msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum

# splitting the dataset into train and test data
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print('\n the total number of Training Data :', ytrain.shape)
print('\n the total number of Test Data :', ytest.shape)

# output the words or Tokens in the text documents
cv = CountVectorizer()
xtrain_dtm = cv.fit_transform(xtrain)
xtest_dtm = cv.transform(xtest)
print('\n The words or Tokens in the text documents \n')
print(cv.get_feature_names())
df = pd.DataFrame(xtrain_dtm.toarray(), columns=cv.get_feature_names())

# Training Naive Bayes (NB) classifier on training data
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

# printing accuracy, Confusion matrix, Precision and Recall
print('\n Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('\n Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('\n The value of Precision', metrics.precision_score(ytest, predicted))
print('\n The value of Recall', metrics.recall_score(ytest, predicted))


OUTPUT:

The dimensions of the dataset (18, 2)

the total number of Training Data : (13,)

the total number of Test Data : (5,)

The words or Tokens in the text documents

['about', 'an', 'awesome', 'bad', 'beers', 'boss', 'dance', 'do', 'enemy', 'feel', 'fun', 'good', 'great',
'have', 'he', 'holiday', 'horrible', 'house', 'is', 'juice', 'like', 'locality', 'love', 'my', 'not', 'of', 'place',
'restaurant', 'sandwich', 'stay', 'sworn', 'taste', 'that', 'the', 'these', 'this', 'to', 'today', 'tomorrow',
'very', 'view', 'we', 'went', 'what', 'will']

Accuracy of the classifier is 0.4

Confusion matrix
[[1 2]
[1 1]]

The value of Precision 0.3333333333333333

The value of Recall 0.5

Conclusion:

Thus the naïve Bayesian Classifier model was used to classify the documents, and the accuracy, precision, and recall for the data set were calculated successfully.


Program 7
Write a program to construct a Bayesian network considering medical data. Use this model
to demonstrate the diagnosis of heart patients using standard Heart Disease Data Set. You
can use Java/Python ML library classes/API.

import bayespy as bp
import numpy as np
import csv
from colorama import init
from colorama import Fore, Back, Style
init()

ageEnum = {'SuperSeniorCitizen': 0, 'SeniorCitizen': 1, 'MiddleAged': 2, 'Youth': 3, 'Teen': 4}
genderEnum = {'Male': 0, 'Female': 1}
familyHistoryEnum = {'Yes': 0, 'No': 1}
dietEnum = {'High': 0, 'Medium': 1, 'Low': 2}
lifeStyleEnum = {'Athlete': 0, 'Active': 1, 'Moderate': 2, 'Sedetary': 3}
cholesterolEnum = {'High': 0, 'BorderLine': 1, 'Normal': 2}
heartDiseaseEnum = {'Yes': 0, 'No': 1}

with open('heart_disease_data.csv') as csvfile:
    lines = csv.reader(csvfile)
    dataset = list(lines)
    data = []
    for x in dataset:
        data.append([ageEnum[x[0]], genderEnum[x[1]], familyHistoryEnum[x[2]], dietEnum[x[3]],
                     lifeStyleEnum[x[4]], cholesterolEnum[x[5]], heartDiseaseEnum[x[6]]])

data = np.array(data)
N = len(data)

p_age = bp.nodes.Dirichlet(1.0*np.ones(5))
age = bp.nodes.Categorical(p_age, plates=(N,))
age.observe(data[:, 0])

p_gender = bp.nodes.Dirichlet(1.0*np.ones(2))
gender = bp.nodes.Categorical(p_gender, plates=(N,))
gender.observe(data[:, 1])

p_familyhistory = bp.nodes.Dirichlet(1.0*np.ones(2))
familyhistory = bp.nodes.Categorical(p_familyhistory, plates=(N,))
familyhistory.observe(data[:, 2])

p_diet = bp.nodes.Dirichlet(1.0*np.ones(3))
diet = bp.nodes.Categorical(p_diet, plates=(N,))
diet.observe(data[:, 3])

p_lifestyle = bp.nodes.Dirichlet(1.0*np.ones(4))
lifestyle = bp.nodes.Categorical(p_lifestyle, plates=(N,))
lifestyle.observe(data[:, 4])

p_cholesterol = bp.nodes.Dirichlet(1.0*np.ones(3))
cholesterol = bp.nodes.Categorical(p_cholesterol, plates=(N,))
cholesterol.observe(data[:, 5])

p_heartdisease = bp.nodes.Dirichlet(np.ones(2), plates=(5, 2, 2, 3, 4, 3))
heartdisease = bp.nodes.MultiMixture([age, gender, familyhistory, diet, lifestyle, cholesterol],
                                     bp.nodes.Categorical, p_heartdisease)
heartdisease.observe(data[:, 6])
p_heartdisease.update()

m = 0
while m == 0:
    print("\n")
    res = bp.nodes.MultiMixture([int(input('Enter Age: ' + str(ageEnum))),
                                 int(input('Enter Gender: ' + str(genderEnum))),
                                 int(input('Enter FamilyHistory: ' + str(familyHistoryEnum))),
                                 int(input('Enter dietEnum: ' + str(dietEnum))),
                                 int(input('Enter LifeStyle: ' + str(lifeStyleEnum))),
                                 int(input('Enter Cholesterol: ' + str(cholesterolEnum)))],
                                bp.nodes.Categorical, p_heartdisease).get_moments()[0][heartDiseaseEnum['Yes']]
    print("Probability(HeartDisease) = " + str(res))
    m = int(input("Enter for Continue:0, Exit :1 "))

INPUT:

SuperSeniorCitizen,Male,Yes,Medium,Sedetary,High,Yes
SuperSeniorCitizen,Female,Yes,Medium,Sedetary,High,Yes
SeniorCitizen,Male,No,High,Moderate,BorderLine,Yes
Teen,Male,Yes,Medium,Sedetary,Normal,No
Youth,Female,Yes,High,Athlete,Normal,No
MiddleAged,Male,Yes,Medium,Active,High,Yes
Teen,Male,Yes,High,Moderate,High,Yes
SuperSeniorCitizen,Male,Yes,Medium,Sedetary,High,Yes
Youth,Female,Yes,High,Athlete,Normal,No
SeniorCitizen,Female,No,High,Athlete,Normal,Yes
Teen,Female,No,Medium,Moderate,High,Yes
Teen,Male,Yes,Medium,Sedetary,Normal,No
MiddleAged,Female,No,High,Athlete,High,No
MiddleAged,Male,Yes,Medium,Active,High,Yes
Youth,Female,Yes,High,Athlete,BorderLine,No
SuperSeniorCitizen,Male,Yes,High,Athlete,Normal,Yes
SeniorCitizen,Female,No,Medium,Moderate,BorderLine,Yes
Youth,Female,Yes,Medium,Athlete,BorderLine,No
Teen,Male,Yes,Medium,Sedetary,Normal,No

Note: install bayespy with the following command:

pip install bayespy


OUTPUT:

Enter Age: {'SuperSeniorCitizen': 0, 'SeniorCitizen': 1, 'MiddleAged': 2, 'Youth': 3, 'Teen': 4}


0

Enter Gender: {'Male': 0, 'Female': 1}0

Enter FamilyHistory: {'Yes': 0, 'No': 1}0

Enter dietEnum: {'High': 0, 'Medium': 1, 'Low': 2}1

Enter LifeStyle: {'Athlete': 0, 'Active': 1, 'Moderate': 2, 'Sedetary': 3}2

Enter Cholesterol: {'High': 0, 'BorderLine': 1, 'Normal': 2}1


Probability(HeartDisease) = 0.5

Enter for Continue:0, Exit :1

Conclusion:

Thus a Bayesian network considering medical data was constructed successfully.


Program 8
Apply EM algorithm to cluster a set of data stored in a .csv file. Use the same data set for
clustering k-means algorithm. Compare the results of these two algorithms and comment on
the quality of clustering. You can add Java/Python ML library classes/API in the program.

Objective: To group a set of unlabelled data into similar classes/clusters, label them, and compare the quality of the two clustering algorithms.
Dataset: Delivery fleet driver dataset in a .csv file with features "Driver_ID", "Distance_Feature", "Speeding_Feature", having more than 20 instances.
ML algorithm: EM algorithm, K-means algorithm - Unsupervised clustering
Packages: Scikit-learn (sklearn), pandas
Description: The EM algorithm (soft clustering) can be used for variables whose values are never directly observed, provided the general probability distribution governing these variables is known. The EM algorithm can also be used to train Bayesian belief networks as well as radial basis function networks.
K-Means (hard clustering) finds groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of the K groups based on the features that are provided. Data points are clustered based on feature similarity.

Algorithm (K-means)
1. Cluster the data into k groups, where k is predefined.
2. Select k points at random as cluster centers.
3. Assign objects to their closest cluster center according to the Euclidean distance function.
4. Calculate the centroid or mean of all objects in each cluster.
5. Repeat steps 3 and 4 until the same points are assigned to each cluster in consecutive rounds.


Program
from sklearn.cluster import KMeans
# from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv("kmeansdata.csv")
df1 = pd.DataFrame(data)
print(df1)
f1 = df1['Distance_Feature'].values
f2 = df1['Speeding_Feature'].values

X = np.matrix(list(zip(f1, f2)))
plt.plot()
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.ylabel('speeding_feature')
plt.xlabel('Distance_Feature')
plt.scatter(f1, f2)
plt.show()

# create new plot and data
plt.plot()
colors = ['b', 'g', 'r']
markers = ['o', 'v', 's']

# KMeans algorithm
# K = 3
kmeans_model = KMeans(n_clusters=3).fit(X)

plt.plot()
for i, l in enumerate(kmeans_model.labels_):
    plt.plot(f1[i], f2[i], color=colors[l], marker=markers[l], ls='None')
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.show()
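
The listing above covers only the k-means half of the experiment. A minimal sketch of the EM half, using scikit-learn's GaussianMixture on the same kmeansdata.csv (assuming, as below, that the file has the Distance_Feature and Speeding_Feature columns), could look like this; comparing its labels with the k-means labels is the comparison this experiment asks for:

from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
import numpy as np
import pandas as pd

data = pd.read_csv("kmeansdata.csv")
X = np.array(list(zip(data['Distance_Feature'].values, data['Speeding_Feature'].values)))

# EM fits a mixture of 3 Gaussians; predict() hardens the soft cluster memberships by argmax
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
print("EM (GMM) cluster labels :", gmm.predict(X))

# k-means labels on the same data, for the comparison asked for in this experiment
print("K-means cluster labels  :", KMeans(n_clusters=3, random_state=0).fit(X).labels_)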


INPUT/DATASET

Driver_ID Distance_Feature Speeding_Feature


3423311935 71.24 28
3423313212 52.53 25
3423313724 64.54 27
3423311373 55.69 22
3423310999 54.58 25
3423313857 41.91 10
3423312432 58.64 20
3423311434 52.02 8
3423311328 31.25 34
3423312488 44.31 19
3423311254 49.35 40
3423312943 58.07 45
3423312536 44.22 22
3423311542 55.73 19
3423312176 46.63 43
3423314176 52.97 32
3423314202 46.25 35
3423311346 51.55 27
3423310666 57.05 26
3423313527 58.45 30
3423312182 43.42 23
3423313590 55.68 37
3423312268 55.15 18


OUTPUT

Driver_ID Distance_Feature Speeding_Feature


0 3423311935 71.24 28
1 3423313212 52.53 25
2 3423313724 64.54 27
3 3423311373 55.69 22
4 3423310999 54.58 25
5 3423313857 41.91 10
6 3423312432 58.64 20
7 3423311434 52.02 8
8 3423311328 31.25 34
9 3423312488 44.31 19
10 3423311254 49.35 40
11 3423312943 58.07 45
12 3423312536 44.22 22
13 3423311542 55.73 19
14 3423312176 46.63 43
15 3423314176 52.97 32
16 3423314202 46.25 35
17 3423311346 51.55 27
18 3423310666 57.05 26
19 3423313527 58.45 30
20 3423312182 43.42 23
21 3423313590 55.68 37
22 3423312268 55.15 18


Conclusion:

Thus the EM algorithm was applied to cluster the data stored in the .csv file, the same data set was clustered with the k-means algorithm, and the results of the two algorithms were compared successfully.


Program 9
Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions. Java/Python ML library classes can be used for this problem.

Objective: To implement a classification model for the iris data set and report its confusion matrix and accuracy metrics.
Dataset: IRIS data set with features "petal_length", "petal_width", "sepal_length", "sepal_width", having 150 instances.
ML algorithm: Supervised Learning - Lazy learning (k-Nearest Neighbour) algorithm
Packages: Scikit-learn (sklearn), pandas
Description: When the training instances are received, no model is built; the instances are simply stored in memory. When a test instance is given, the algorithm attempts to find the closest (most neighbouring) instances in the instance space.

K-Nearest Neighbour (KNN) Algorithm

Training algorithm:
• For each training example (x, f(x)), add the example to the list of training examples.

Classification algorithm:
• Given a query instance xq to be classified,
• Let x1 ... xk denote the k instances from the training examples that are nearest to xq
• Return

    f(xq) ← the most common value of f(xi) among the k nearest training examples
    (for regression, f(xq) is instead the mean value of f(xi) over the k nearest training examples)

Steps
1. Load the data
2. Initialize the value of k
3. For getting the predicted class, iterate from 1 to total number of training
data points
1. Calculate the distance between test data and each row of training
data. Here we will use Euclidean distance as our distance metric since
it’s the most popular method. The other metrics that can be used are
Chebyshev, cosine, etc.
2. Sort the calculated distances in ascending order based on distance
values
3. Get top k rows from the sorted array
4. Get the most frequent class of these rows i.e Get the labels of the
selected K entries
5. Return the predicted class



• If regression, return the mean of the K labels.
• If classification, return the mode of the K labels.

A minimal from-scratch sketch of these steps is given below.
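
The sketch is illustrative only (the lab program further below uses scikit-learn's KNeighborsClassifier on the full iris data):

from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    # 1. compute the Euclidean distance from the query to every training point
    distances = [(sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, label)
                 for x, label in zip(train_X, train_y)]
    # 2./3. sort by distance and keep the top k rows
    k_nearest = sorted(distances, key=lambda d: d[0])[:k]
    # 4./5. return the most frequent class among the k nearest labels
    return Counter(label for _, label in k_nearest).most_common(1)[0][0]

# tiny hypothetical example with two iris-like points per class
train_X = [[1.4, 0.2], [1.3, 0.2], [4.7, 1.4], [4.5, 1.5]]
train_y = ['setosa', 'setosa', 'versicolor', 'versicolor']
print(knn_predict(train_X, train_y, [1.5, 0.3]))   # expected: setosa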

Confusion matrix:
Note,
• Class 1 : Positive
• Class 2 : Negative

• Positive (P) : Observation is positive (for example: is an apple).


• Negative (N) : Observation is not positive (for example: is not an apple).
• True Positive (TP) : Observation is positive, and is predicted to be positive.
• False Negative (FN) : Observation is positive, but is predicted negative.
(Also known as a "Type II error.")
• True Negative (TN) : Observation is negative, and is predicted to be negative.
• False Positive (FP) : Observation is negative, but is predicted positive.
(Also known as a "Type I error.")
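
The accuracy, precision and recall reported by Programs 6 and 9 follow directly from these counts. As a small check, the sketch below plugs in the confusion matrix printed by Program 6 ([[1 2], [1 1]]; with scikit-learn's convention of rows = actual and columns = predicted, this gives TN = 1, FP = 2, FN = 1, TP = 1) and reproduces the printed values:

# Counts taken from the Program 6 output above.
TN, FP, FN, TP = 1, 2, 1, 1

accuracy = (TP + TN) / (TP + TN + FP + FN)   # 0.4
precision = TP / (TP + FP)                   # 0.333...
recall = TP / (TP + FN)                      # 0.5
print(accuracy, precision, recall)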

Data Set:
Iris Plants Dataset: Dataset contains 150 instances (50 in each of three
classes) Number of Attributes: 4 numeric, predictive attributes and the Class

Program

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn import datasets

iris = datasets.load_iris()
x = iris.data
y = iris.target

print('sepal-length', 'sepal-width', 'petal-length', 'petal-width')
print(x)
print('class: 0-Iris-Setosa, 1- Iris-Versicolour, 2- Iris-Virginica')
print(y)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

# To train the model with nearest neighbours K=5
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)

# To make predictions on our test data
y_pred = classifier.predict(x_test)

print('Confusion Matrix')
print(confusion_matrix(y_test, y_pred))
print('Accuracy Metrics')
print(classification_report(y_test, y_pred))


OUTPUT
sepal-length sepal-width petal-length petal-width
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]
[4.6 3.4 1.4 0.3]
[5. 3.4 1.5 0.2]
[4.4 2.9 1.4 0.2]
[4.9 3.1 1.5 0.1]
[5.4 3.7 1.5 0.2]
[4.8 3.4 1.6 0.2]
[4.8 3. 1.4 0.1]
[4.3 3. 1.1 0.1]
[5.8 4. 1.2 0.2]
[5.7 4.4 1.5 0.4]
[5.4 3.9 1.3 0.4]
[5.1 3.5 1.4 0.3]
[5.7 3.8 1.7 0.3]
[5.1 3.8 1.5 0.3]
[5.4 3.4 1.7 0.2]
[5.1 3.7 1.5 0.4]
[4.6 3.6 1. 0.2]
[5.1 3.3 1.7 0.5]
[4.8 3.4 1.9 0.2]
[5. 3. 1.6 0.2]
[5. 3.4 1.6 0.4]
[5.2 3.5 1.5 0.2]
[5.2 3.4 1.4 0.2]
[4.7 3.2 1.6 0.2]
[4.8 3.1 1.6 0.2]
[5.4 3.4 1.5 0.4]
[5.2 4.1 1.5 0.1]
[5.5 4.2 1.4 0.2]
[4.9 3.1 1.5 0.1]
[5. 3.2 1.2 0.2]
[5.5 3.5 1.3 0.2]
[4.9 3.1 1.5 0.1]
[4.4 3. 1.3 0.2]
[5.1 3.4 1.5 0.2]
[5. 3.5 1.3 0.3]
[4.5 2.3 1.3 0.3]
[4.4 3.2 1.3 0.2]
[5. 3.5 1.6 0.6]
[5.1 3.8 1.9 0.4]
[4.8 3. 1.4 0.3]
[5.1 3.8 1.6 0.2]
[4.6 3.2 1.4 0.2]
[5.3 3.7 1.5 0.2]
[5. 3.3 1.4 0.2]
[7. 3.2 4.7 1.4]
[6.4 3.2 4.5 1.5]
[6.9 3.1 4.9 1.5]


[5.5 2.3 4. 1.3]


[6.5 2.8 4.6 1.5]
[5.7 2.8 4.5 1.3]
[6.3 3.3 4.7 1.6]
[4.9 2.4 3.3 1. ]
[6.6 2.9 4.6 1.3]
[5.2 2.7 3.9 1.4]
[5. 2. 3.5 1. ]
[5.9 3. 4.2 1.5]
[6. 2.2 4. 1. ]
[6.1 2.9 4.7 1.4]
[5.6 2.9 3.6 1.3]
[6.7 3.1 4.4 1.4]
[5.6 3. 4.5 1.5]
[5.8 2.7 4.1 1. ]
[6.2 2.2 4.5 1.5]
[5.6 2.5 3.9 1.1]
[5.9 3.2 4.8 1.8]
[6.1 2.8 4. 1.3]
[6.3 2.5 4.9 1.5]
[6.1 2.8 4.7 1.2]
[6.4 2.9 4.3 1.3]
[6.6 3. 4.4 1.4]
[6.8 2.8 4.8 1.4]
[6.7 3. 5. 1.7]
[6. 2.9 4.5 1.5]
[5.7 2.6 3.5 1. ]
[5.5 2.4 3.8 1.1]
[5.5 2.4 3.7 1. ]
[5.8 2.7 3.9 1.2]
[6. 2.7 5.1 1.6]
[5.4 3. 4.5 1.5]
[6. 3.4 4.5 1.6]
[6.7 3.1 4.7 1.5]
[6.3 2.3 4.4 1.3]
[5.6 3. 4.1 1.3]
[5.5 2.5 4. 1.3]
[5.5 2.6 4.4 1.2]
[6.1 3. 4.6 1.4]
[5.8 2.6 4. 1.2]
[5. 2.3 3.3 1. ]
[5.6 2.7 4.2 1.3]
[5.7 3. 4.2 1.2]
[5.7 2.9 4.2 1.3]
[6.2 2.9 4.3 1.3]
[5.1 2.5 3. 1.1]
[5.7 2.8 4.1 1.3]
[6.3 3.3 6. 2.5]
[5.8 2.7 5.1 1.9]
[7.1 3. 5.9 2.1]
[6.3 2.9 5.6 1.8]
[6.5 3. 5.8 2.2]
[7.6 3. 6.6 2.1]
[4.9 2.5 4.5 1.7]
[7.3 2.9 6.3 1.8]
[6.7 2.5 5.8 1.8]


[7.2 3.6 6.1 2.5]


[6.5 3.2 5.1 2. ]
[6.4 2.7 5.3 1.9]
[6.8 3. 5.5 2.1]
[5.7 2.5 5. 2. ]
[5.8 2.8 5.1 2.4]
[6.4 3.2 5.3 2.3]
[6.5 3. 5.5 1.8]
[7.7 3.8 6.7 2.2]
[7.7 2.6 6.9 2.3]
[6. 2.2 5. 1.5]
[6.9 3.2 5.7 2.3]
[5.6 2.8 4.9 2. ]
[7.7 2.8 6.7 2. ]
[6.3 2.7 4.9 1.8]
[6.7 3.3 5.7 2.1]
[7.2 3.2 6. 1.8]
[6.2 2.8 4.8 1.8]
[6.1 3. 4.9 1.8]
[6.4 2.8 5.6 2.1]
[7.2 3. 5.8 1.6]
[7.4 2.8 6.1 1.9]
[7.9 3.8 6.4 2. ]
[6.4 2.8 5.6 2.2]
[6.3 2.8 5.1 1.5]
[6.1 2.6 5.6 1.4]
[7.7 3. 6.1 2.3]
[6.3 3.4 5.6 2.4]
[6.4 3.1 5.5 1.8]
[6. 3. 4.8 1.8]
[6.9 3.1 5.4 2.1]
[6.7 3.1 5.6 2.4]
[6.9 3.1 5.1 2.3]
[5.8 2.7 5.1 1.9]
[6.8 3.2 5.9 2.3]
[6.7 3.3 5.7 2.5]
[6.7 3. 5.2 2.3]
[6.3 2.5 5. 1.9]
[6.5 3. 5.2 2. ]
[6.2 3.4 5.4 2.3]
[5.9 3. 5.1 1.8]]

class: 0-Iris-Setosa, 1- Iris-Versicolour, 2- Iris-Virginica


[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2
2 2]


Confusion Matrix
[[15 0 0]
[ 0 14 0]
[ 0 0 16]]
Accuracy Metrics
             precision    recall  f1-score   support

          0       1.00      1.00      1.00        15
          1       1.00      1.00      1.00        14
          2       1.00      1.00      1.00        16

avg / total       1.00      1.00      1.00        45
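
The overall accuracy can also be cross-checked directly from the confusion matrix above (correct predictions lie on the diagonal). A minimal sketch, reusing the y_test and y_pred variables from the program and the matrix values taken from this run:

from sklearn.metrics import accuracy_score
import numpy as np

print('Accuracy:', accuracy_score(y_test, y_pred))       # overall accuracy from predictions

cm = np.array([[15, 0, 0], [0, 14, 0], [0, 0, 16]])       # confusion matrix from this run
print('Accuracy from matrix:', np.trace(cm) / cm.sum())   # 45 correct out of 45 = 1.0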

Conclusion:

Thus the k-Nearest Neighbour algorithm to classify the iris data set was implemented and
executed successfully.


Program 10
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select an appropriate data set for your experiment and draw graphs.

Objective: To implement the Locally Weighted Regression algorithm to fit the given data points.
Dataset: The dataset contains billing information with the attributes
total_bill, tip, sex, smoker, day, time and size.
ML Algorithm: Locally Weighted Regression – an instance-based learning algorithm.
Description: Regression means approximating a real-valued target function. Given a
new query instance Xq, the general approach is to construct an approximation
function F that fits the training examples in the neighbourhood surrounding Xq. This
approximation is then used to estimate the target value F(Xq).

Regression:

• Regression is a technique from statistics that is used to predict values of a
  desired target quantity when the target quantity is continuous.
• In regression, we seek to identify (or estimate) a continuous variable y
  associated with a given input vector x.
• y is called the dependent variable.
• x is called the independent variable.
• For example, predicting the tip (y) from the total bill (x) in the dataset given
  below is a regression problem.

Loess/Lowess Regression: Loess regression is a nonparametric technique that uses
locally weighted regression to fit a smooth curve through the points in a scatter
plot.

Lowess Algorithm: Locally weighted regression is a very powerful non-parametric
model used in statistical learning. Given a dataset X, y, we attempt to find the
model parameters β(x) that minimize the residual sum of weighted squared errors.
The weights are given by a kernel function (k or w), which can be chosen arbitrarily.
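
In symbols (a brief sketch, assuming the Gaussian kernel that the program below uses; τ is the smoothening parameter and x0 the query point):

w^{(i)} = \exp\left( -\frac{(x^{(i)} - x_0)^2}{2\tau^2} \right)

\hat{\beta}(x_0) = \arg\min_{\beta} \sum_i w^{(i)} \left( y^{(i)} - \beta^{T} x^{(i)} \right)^2 = (X^{T} W X)^{-1} X^{T} W y

\hat{y}(x_0) = x_0^{T} \hat{\beta}(x_0)

where W = diag(w^{(1)}, ..., w^{(m)}). These are exactly the quantities computed in steps 4–6 of the algorithm and in the kernel and localWeight functions of the program below.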


Algorithm
1. Read the given data sample into X and the target values (the linear or non-linear curve) into Y.
2. Set the value of the smoothening (free) parameter τ.
3. Set the point of interest / query point x0, drawn from X.
4. Determine the diagonal weight matrix W using the kernel:
   w(j, j) = exp( -(x(j) - x0)(x(j) - x0)^T / (2 τ^2) )
5. Determine the value of the model parameter β using:
   β = (X^T W X)^-1 (X^T W Y)
6. Prediction = x0 * β

Program:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np1


def kernel(point, xmat, k):
    # build the diagonal weight matrix W for one query point using a Gaussian kernel
    m, n = np1.shape(xmat)
    weights = np1.mat(np1.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np1.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights


def localWeight(point, xmat, ymat, k):
    # solve the weighted least squares problem: beta = (X^T W X)^-1 (X^T W y)
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W


def localWeightRegression(xmat, ymat, k):
    # predict a value for every training point from its own locally weighted model
    m, n = np1.shape(xmat)
    ypred = np1.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred


# load data points
data = pd.read_csv('data10.csv')
bill = np1.array(data.total_bill)
tip = np1.array(data.tip)

# prepare the design matrix: a column of ones (bias term) followed by the bill values
mbill = np1.mat(bill)
mtip = np1.mat(tip)
m = np1.shape(mbill)[1]
one = np1.mat(np1.ones(m))
X = np1.hstack((one.T, mbill.T))

# set the smoothening parameter k (tau) here
ypred = localWeightRegression(X, mtip, 2)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]

# plot the data points (green) and the fitted curve (red)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
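
The value 2 passed to localWeightRegression is the smoothening parameter τ: small values make the curve follow the data closely (and can over-fit), while large values flatten it towards an ordinary straight-line fit. A minimal sketch of how a few bandwidths could be compared on the same axes (the k values are only illustrative; variable names follow the program above):

# compare several bandwidths on one plot (illustrative values of k)
plt.scatter(bill, tip, color='green')
for k, col in zip([0.5, 2, 10], ['red', 'blue', 'black']):
    ypred_k = localWeightRegression(X, mtip, k)
    plt.plot(xsort[:, 1], ypred_k[SortIndex], color=col, label='k = ' + str(k))
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.legend()
plt.show()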


INPUT DATASET:

total_bill tip sex smoker day time size

16.99 1.01 Female No Sun Dinner 2
10.34 1.66 Male No Sun Dinner 3
21.01 3.5 Male No Sun Dinner 3
23.68 3.31 Male No Sun Dinner 2
24.59 3.61 Female No Sun Dinner 4
25.29 4.71 Male No Sun Dinner 4
8.77 2 Male No Sun Dinner 2
26.88 3.12 Male No Sun Dinner 4
15.04 1.96 Male No Sun Dinner 2
14.78 3.23 Male No Sun Dinner 2
10.27 1.71 Male No Sun Dinner 2
35.26 5 Female No Sun Dinner 4
15.42 1.57 Male No Sun Dinner 2
18.43 3 Male No Sun Dinner 4
14.83 3.02 Female No Sun Dinner 2
21.58 3.92 Male No Sun Dinner 2
10.33 1.67 Female No Sun Dinner 3
16.29 3.71 Male No Sun Dinner 3
16.97 3.5 Female No Sun Dinner 3
20.65 3.35 Male No Sat Dinner 3
17.92 4.08 Male No Sat Dinner 2
20.29 2.75 Female No Sat Dinner 2
15.77 2.23 Female No Sat Dinner 2
39.42 7.58 Male No Sat Dinner 4
19.82 3.18 Male No Sat Dinner 2
17.81 2.34 Male No Sat Dinner 4
13.37 2 Male No Sat Dinner 2
12.69 2 Male No Sat Dinner 2
21.7 4.3 Male No Sat Dinner 2
19.65 3 Female No Sat Dinner 2
9.55 1.45 Male No Sat Dinner 2
18.35 2.5 Male No Sat Dinner 4
15.06 3 Female No Sat Dinner 2
20.69 2.45 Female No Sat Dinner 4
17.78 3.27 Male No Sat Dinner 2
24.06 3.6 Male No Sat Dinner 3
16.31 2 Male No Sat Dinner 3
16.93 3.07 Female No Sat Dinner 3
18.69 2.31 Male No Sat Dinner 3
31.27 5 Male No Sat Dinner 3
16.04 2.24 Male No Sat Dinner 3
17.46 2.54 Male No Sun Dinner 2

13.94 3.06 Male No Sun Dinner 2
9.68 1.32 Male No Sun Dinner 2
30.4 5.6 Male No Sun Dinner 4
18.29 3 Male No Sun Dinner 2
22.23 5 Male No Sun Dinner 2
32.4 6 Male No Sun Dinner 4
28.55 2.05 Male No Sun Dinner 3
18.04 3 Male No Sun Dinner 2
12.54 2.5 Male No Sun Dinner 2
10.29 2.6 Female No Sun Dinner 2
34.81 5.2 Female No Sun Dinner 4
9.94 1.56 Male No Sun Dinner 2
25.56 4.34 Male No Sun Dinner 4
19.49 3.51 Male No Sun Dinner 2
38.01 3 Male Yes Sat Dinner 4
26.41 1.5 Female No Sat Dinner 2
11.24 1.76 Male Yes Sat Dinner 2
48.27 6.73 Male No Sat Dinner 4
20.29 3.21 Male Yes Sat Dinner 2
13.81 2 Male Yes Sat Dinner 2
11.02 1.98 Male Yes Sat Dinner 2
18.29 3.76 Male Yes Sat Dinner 4
17.59 2.64 Male No Sat Dinner 3
20.08 3.15 Male No Sat Dinner 3
16.45 2.47 Female No Sat Dinner 2
3.07 1 Female Yes Sat Dinner 1
20.23 2.01 Male No Sat Dinner 2
15.01 2.09 Male Yes Sat Dinner 2
12.02 1.97 Male No Sat Dinner 2
17.07 3 Female No Sat Dinner 3
26.86 3.14 Female Yes Sat Dinner 2
25.28 5 Female Yes Sat Dinner 2
14.73 2.2 Female No Sat Dinner 2
10.51 1.25 Male No Sat Dinner 2
17.92 3.08 Male Yes Sat Dinner 2
27.2 4 Male No Thur Lunch 4
22.76 3 Male No Thur Lunch 2
17.29 2.71 Male No Thur Lunch 2
19.44 3 Male Yes Thur Lunch 2
16.66 3.4 Male No Thur Lunch 2
10.07 1.83 Female No Thur Lunch 1
32.68 5 Male Yes Thur Lunch 2
15.98 2.03 Male No Thur Lunch 2
34.83 5.17 Female No Thur Lunch 4
13.03 2 Male No Thur Lunch 2
18.28 4 Male No Thur Lunch 2

24.71 5.85 Male No Thur Lunch 2
21.16 3 Male No Thur Lunch 2
28.97 3 Male Yes Fri Dinner 2
22.49 3.5 Male No Fri Dinner 2
5.75 1 Female Yes Fri Dinner 2
16.32 4.3 Female Yes Fri Dinner 2
22.75 3.25 Female No Fri Dinner 2
40.17 4.73 Male Yes Fri Dinner 4
27.28 4 Male Yes Fri Dinner 2
12.03 1.5 Male Yes Fri Dinner 2
21.01 3 Male Yes Fri Dinner 2
12.46 1.5 Male No Fri Dinner 2
11.35 2.5 Female Yes Fri Dinner 2
15.38 3 Female Yes Fri Dinner 2
44.3 2.5 Female Yes Sat Dinner 3
22.42 3.48 Female Yes Sat Dinner 2
20.92 4.08 Female No Sat Dinner 2
15.36 1.64 Male Yes Sat Dinner 2
20.49 4.06 Male Yes Sat Dinner 2
25.21 4.29 Male Yes Sat Dinner 2
18.24 3.76 Male No Sat Dinner 2
14.31 4 Female Yes Sat Dinner 2
14 3 Male No Sat Dinner 2
7.25 1 Female No Sat Dinner 1
38.07 4 Male No Sun Dinner 3
23.95 2.55 Male No Sun Dinner 2
25.71 4 Female No Sun Dinner 3
17.31 3.5 Female No Sun Dinner 2
29.93 5.07 Male No Sun Dinner 4
10.65 1.5 Female No Thur Lunch 2
12.43 1.8 Female No Thur Lunch 2
24.08 2.92 Female No Thur Lunch 4
11.69 2.31 Male No Thur Lunch 2
13.42 1.68 Female No Thur Lunch 2
14.26 2.5 Male No Thur Lunch 2
15.95 2 Male No Thur Lunch 2
12.48 2.52 Female No Thur Lunch 2
29.8 4.2 Female No Thur Lunch 6
8.52 1.48 Male No Thur Lunch 2
14.52 2 Female No Thur Lunch 2
11.38 2 Female No Thur Lunch 2
22.82 2.18 Male No Thur Lunch 3
19.08 1.5 Male No Thur Lunch 2
20.27 2.83 Female No Thur Lunch 2
11.17 1.5 Female No Thur Lunch 2
12.26 2 Female No Thur Lunch 2

18.26 3.25 Female No Thur Lunch 2
8.51 1.25 Female No Thur Lunch 2
10.33 2 Female No Thur Lunch 2
14.15 2 Female No Thur Lunch 2
16 2 Male Yes Thur Lunch 2
13.16 2.75 Female No Thur Lunch 2
17.47 3.5 Female No Thur Lunch 2
34.3 6.7 Male No Thur Lunch 6
41.19 5 Male No Thur Lunch 5
27.05 5 Female No Thur Lunch 6
16.43 2.3 Female No Thur Lunch 2
8.35 1.5 Female No Thur Lunch 2
18.64 1.36 Female No Thur Lunch 3
11.87 1.63 Female No Thur Lunch 2
9.78 1.73 Male No Thur Lunch 2
7.51 2 Male No Thur Lunch 2
14.07 2.5 Male No Sun Dinner 2
13.13 2 Male No Sun Dinner 2
17.26 2.74 Male No Sun Dinner 3
24.55 2 Male No Sun Dinner 4
19.77 2 Male No Sun Dinner 4
29.85 5.14 Female No Sun Dinner 5
48.17 5 Male No Sun Dinner 6
25 3.75 Female No Sun Dinner 4
13.39 2.61 Female No Sun Dinner 2
16.49 2 Male No Sun Dinner 4
21.5 3.5 Male No Sun Dinner 4
12.66 2.5 Male No Sun Dinner 2
16.21 2 Female No Sun Dinner 3
13.81 2 Male No Sun Dinner 2
17.51 3 Female Yes Sun Dinner 2
24.52 3.48 Male No Sun Dinner 3
20.76 2.24 Male No Sun Dinner 2
31.71 4.5 Male No Sun Dinner 4
10.59 1.61 Female Yes Sat Dinner 2
10.63 2 Female Yes Sat Dinner 2
50.81 10 Male Yes Sat Dinner 3
15.81 3.16 Male Yes Sat Dinner 2
7.25 5.15 Male Yes Sun Dinner 2
31.85 3.18 Male Yes Sun Dinner 2
16.82 4 Male Yes Sun Dinner 2
32.9 3.11 Male Yes Sun Dinner 2
17.89 2 Male Yes Sun Dinner 2
14.48 2 Male Yes Sun Dinner 2
9.6 4 Female Yes Sun Dinner 2
34.63 3.55 Male Yes Sun Dinner 2

34.65 3.68 Male Yes Sun Dinner 4
23.33 5.65 Male Yes Sun Dinner 2
45.35 3.5 Male Yes Sun Dinner 3
23.17 6.5 Male Yes Sun Dinner 4
40.55 3 Male Yes Sun Dinner 2
20.69 5 Male No Sun Dinner 5
20.9 3.5 Female Yes Sun Dinner 3
30.46 2 Male Yes Sun Dinner 5
18.15 3.5 Female Yes Sun Dinner 3
23.1 4 Male Yes Sun Dinner 3
15.69 1.5 Male Yes Sun Dinner 2
19.81 4.19 Female Yes Thur Lunch 2
28.44 2.56 Male Yes Thur Lunch 2
15.48 2.02 Male Yes Thur Lunch 2
16.58 4 Male Yes Thur Lunch 2
7.56 1.44 Male No Thur Lunch 2
10.34 2 Male Yes Thur Lunch 2
43.11 5 Female Yes Thur Lunch 4
13 2 Female Yes Thur Lunch 2
13.51 2 Male Yes Thur Lunch 2
18.71 4 Male Yes Thur Lunch 3
12.74 2.01 Female Yes Thur Lunch 2
13 2 Female Yes Thur Lunch 2
16.4 2.5 Female Yes Thur Lunch 2
20.53 4 Male Yes Thur Lunch 4
16.47 3.23 Female Yes Thur Lunch 3
26.59 3.41 Male Yes Sat Dinner 3
38.73 3 Male Yes Sat Dinner 4
24.27 2.03 Male Yes Sat Dinner 2
12.76 2.23 Female Yes Sat Dinner 2
30.06 2 Male Yes Sat Dinner 3
25.89 5.16 Male Yes Sat Dinner 4
48.33 9 Male No Sat Dinner 4
13.27 2.5 Female Yes Sat Dinner 2
28.17 6.5 Female Yes Sat Dinner 3
12.9 1.1 Female Yes Sat Dinner 2
28.15 3 Male Yes Sat Dinner 5
11.59 1.5 Male Yes Sat Dinner 2
7.74 1.44 Male Yes Sat Dinner 2
30.14 3.09 Female Yes Sat Dinner 4
12.16 2.2 Male Yes Fri Lunch 2
13.42 3.48 Female Yes Fri Lunch 2
8.58 1.92 Male Yes Fri Lunch 1
15.98 3 Female No Fri Lunch 3
13.42 1.58 Male Yes Fri Lunch 2
16.27 2.5 Female Yes Fri Lunch 2

10.09 2 Female Yes Fri Lunch 2
20.45 3 Male No Sat Dinner 4
13.28 2.72 Male No Sat Dinner 2
22.12 2.88 Female Yes Sat Dinner 2
24.01 2 Male Yes Sat Dinner 4
15.69 3 Male Yes Sat Dinner 3
11.61 3.39 Male No Sat Dinner 2
10.77 1.47 Male No Sat Dinner 2
15.53 3 Male Yes Sat Dinner 2
10.07 1.25 Male No Sat Dinner 2
12.6 1 Male Yes Sat Dinner 2
32.83 1.17 Male Yes Sat Dinner 2
35.83 4.67 Female No Sat Dinner 3
29.03 5.92 Male No Sat Dinner 3
27.18 2 Female Yes Sat Dinner 2
22.67 2 Male Yes Sat Dinner 2
17.82 1.75 Male No Sat Dinner 2
18.78 3 Female No Thur Dinner 2

OUTPUT: a scatter plot of Total bill versus Tip (green points) with the fitted locally
weighted regression curve drawn in red.

Conclusion:
Thus the non-parametric Locally Weighted Regression algorithm to fit data points was
implemented and executed successfully.


VIVA Questions

1. What is machine learning?
2. Define supervised learning.
3. Define unsupervised learning.
4. Define semi-supervised learning.
5. Define reinforcement learning.
6. What do you mean by hypotheses?
7. What is clustering?
8. Define precision, accuracy and recall.
9. Define entropy.
10. Define regression.
11. How is KNN different from k-means clustering?
12. What is concept learning?
13. Define specific boundary and general boundary.
14. Define target function.
15. Define decision tree.
16. What is ANN?
17. Explain gradient descent approximation.
18. State Bayes theorem.
19. Define Bayesian belief networks.
20. Differentiate hard and soft clustering.
21. Define variance.
22. What is inductive machine learning?
23. Why is the K-nearest neighbour algorithm called a lazy learning algorithm?
24. Why is naïve Bayes called naïve?
25. Mention classification algorithms.
26. Define pruning.
27. Differentiate clustering and classification.
28. Mention clustering algorithms.
29. Define bias.
30. What is learning rate? Why is it needed?
