
Machine Learning Lab (IT804) Jan-Jun 2021

Experiment No: 9
Objective: Write a program to implement k-Nearest Neighbor algorithm to classify the iris
data set. Print both correct and wrong predictions. Java/Python ML library classes can be used
for this problem.

Description:
K-Nearest Neighbour (K-NN) is a supervised learning technique. The algorithm assumes that a new case is similar to the available cases and puts the new case into the category most similar to the existing categories. K-NN stores all the available data and classifies a new data point based on its similarity to the stored points, so new data can be readily assigned to a well-suited category.
K-NN can be used for regression as well as classification, but it is mostly used for classification problems. It is a non-parametric algorithm, which means it makes no assumption about the underlying data. It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and performs the computation only at classification time.
At the training phase the K-NN algorithm just stores the dataset; when it receives new data, it classifies that data into the category most similar to it.
The working of K-NN can be explained by the following algorithm (a minimal from-scratch sketch follows the list):
1. Select the number K of neighbours.
2. Calculate the Euclidean distance from the query point to every training point.
3. Take the K nearest neighbours as per the calculated Euclidean distances.
4. Among these K neighbours, count the number of data points in each category.
5. Assign the new data point to the category with the maximum number of neighbours.
6. The model is ready.
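As referenced above, the steps can be illustrated with a short from-scratch sketch. This is a minimal illustration only, not the scikit-learn implementation used in the program below; the helper name knn_predict and the tiny sample points are invented for this example.

import numpy as np
from collections import Counter

def knn_predict(query, X_train, y_train, k=3):
    # step 2: Euclidean distance from the query to every training point
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    # step 3: indices of the k nearest neighbours
    nearest = np.argsort(distances)[:k]
    # steps 4-5: majority vote among the k nearest labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# tiny invented example: two 2-D classes
X_train = np.array([[1.0, 1.1], [1.2, 0.9], [4.0, 4.2], [3.9, 4.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(np.array([1.1, 1.0]), X_train, y_train, k=3))  # -> 0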

Training Algorithm
• For each training example (x, f(x)), add the example to the list of training examples.
Classification Algorithm
• Given a query instance xq to be classified:
• Let x1, x2, …, xk denote the k instances from the training examples that are nearest to xq.

• Return the estimate
      f̂(xq) ← ( Σᵢ₌₁ᵏ f(xᵢ) ) / k
• where f(xi) is the target value of the i-th nearest training example, i.e. f̂(xq) is the mean of the k nearest target values (for a discrete-valued target, the most common value among the k neighbours is returned instead).
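A regression variant of the from-scratch sketch above computes exactly this mean. This is a minimal illustration; knn_regress is a hypothetical helper, and it assumes the numpy arrays from the previous sketch.

def knn_regress(query, X_train, y_train, k=3):
    # distance to every training point, then mean target of the k nearest
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    return y_train[nearest].mean()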

Data Set:
Iris plants data set: the data set contains 150 instances (50 in each of three classes).
Number of attributes: 4 numeric, predictive attributes plus the class.
S. No sepal_length sepal_width petal_length petal_width class
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa

Program:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn import datasets

iris=datasets.load_iris()

x = iris.data
y = iris.target

print('Features: sepal-length, sepal-width, petal-length, petal-width')
print(x)
print('class: 0 - Iris-Setosa, 1 - Iris-Versicolour, 2 - Iris-Virginica')
print(y)

x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.3)

24
Laboratory File
Machine Learning Lab (IT804) Jan-Jun 2021

# train the model with K = 5 nearest neighbours
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)

# make predictions on the test data
y_pred = classifier.predict(x_test)

print('Confusion Matrix')
print(confusion_matrix(y_test,y_pred))
print('Accuracy Metrics')
print(classification_report(y_test,y_pred))
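The objective also asks to print both correct and wrong predictions, which the listing above does not yet do. A minimal addition that does this (a sketch, assuming the variables defined in the program above) is:

# print each test sample with its true and predicted class,
# marking the prediction as Correct or Wrong
for sample, true, pred in zip(x_test, y_test, y_pred):
    status = 'Correct' if true == pred else 'Wrong'
    print(sample, 'true:', iris.target_names[true],
          'predicted:', iris.target_names[pred], '-', status)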

Output:
Confusion matrix is as follows
[[11 0 0]
 [ 0 9 1]
 [ 0 1 8]]
Accuracy metrics
              precision  recall  f1-score  support
0                  1.00    1.00      1.00       11
1                  0.90    0.90      0.90       10
2                  0.89    0.89      0.89        9
avg / total        0.93    0.93      0.93       30
(Overall accuracy = (11 + 9 + 8) / 30 ≈ 0.93; exact numbers vary with the random train/test split.)


Experiment No: 10
Objective: Implement the non-parametric Locally Weighted Regression algorithm in order to
fit data points. Select appropriate data set for your experiment and draw graphs.
Description:
Locally Weighted Regression Algorithm
Regression:
 Regression is a technique from statistics that is used to predict values of a desired target
quantity when the target quantity is continuous.
 In regression, we seek to identify (or estimate) a continuous variable y associated with a
given input vector x.
 y is called the dependent variable.
 x is called the independent variable.
Loess/Lowess Regression:
Loess regression is a nonparametric technique that uses local weighted regression to fit a smooth
curve through points in a scatter plot.
Lowess Algorithm:
• Locally weighted regression is a very powerful nonparametric model used in statistical learning.
• Given a dataset X, y, we attempt to find model parameters β(x) that minimize the residual sum of weighted squared errors.
• The weights are given by a kernel function (k or w), which can be chosen arbitrarily.
Algorithm (a minimal numpy sketch of steps 4-6 follows the list):
1. Read the given data sample into X and the curve (linear or non-linear) into Y.
2. Set the value of the smoothening (free) parameter τ.
3. Set the bias/point of interest x0, which is taken from the domain of X.
4. Determine the weight of each training point using the Gaussian kernel:
      w(x, x0) = exp( −(x − x0)² / (2τ²) )
5. Determine the value of the model parameter β using the weighted normal equations, where W is the diagonal matrix of the weights w(x, x0):
      β̂(x0) = (XᵀWX)⁻¹ XᵀWy
6. Prediction: ŷ = x0 · β̂(x0)
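As referenced above, here is a minimal numpy sketch of steps 4-6 for a single point of interest x0. This is illustrative only; the function name lwr_predict, the 1-D data, and the usage values are assumptions for this example, and the lab programs below use their own variants.

import numpy as np

def lwr_predict(x0, X, y, tau):
    # add a bias term to the query point and a bias column to the data
    x0b = np.r_[1, x0]
    Xb = np.c_[np.ones(len(X)), X]
    # step 4: Gaussian kernel weights w(x, x0) = exp(-(x - x0)^2 / (2 tau^2))
    w = np.exp(-np.sum((Xb - x0b) ** 2, axis=1) / (2 * tau * tau))
    W = np.diag(w)
    # step 5: weighted normal equations, beta = (X^T W X)^-1 X^T W y
    beta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    # step 6: prediction at x0
    return x0b @ beta

# usage sketch on invented data
Xs = np.linspace(-3, 3, 100)
ys = np.sin(Xs)
print(lwr_predict(0.5, Xs, ys, tau=0.5))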


Program:
import numpy as np
from bokeh.plotting import figure, show
from bokeh.layouts import gridplot

def local_regression(x0, X, Y, tau):
    # add bias term: prepend 1 so the intercept is learned
    x0 = np.r_[1, x0]
    X = np.c_[np.ones(len(X)), X]

    # fit model: normal equations with kernel weights
    xw = X.T * radial_kernel(x0, X, tau)      # X^T * W
    beta = np.linalg.pinv(xw @ X) @ xw @ Y    # @ is matrix multiplication

    # predict value at x0
    return x0 @ beta

# weight (radial kernel) bias function
def radial_kernel(x0, X, tau):
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))


n = 1000

# generate dataset
X = np.linspace(-3, 3, num=n)
print("The Data Set (10 Samples) X:\n", X[:10])
Y = np.log(np.abs(X ** 2 - 1) + .5)
print("The Fitting Curve Data Set (10 Samples) Y:\n", Y[:10])

# jitter X with Gaussian noise
X += np.random.normal(scale=.1, size=n)
print("Jittered (10 Samples) X:\n", X[:10])

domain = np.linspace(-3, 3, num=300)
print("Xo Domain Space (10 Samples):\n", domain[:10])

def plot_lwr(tau):
    # prediction through local regression at each point of the domain
    prediction = [local_regression(x0, X, Y, tau) for x0 in domain]
    plot = figure(plot_width=400, plot_height=400)
    plot.title.text = 'tau=%g' % tau
    plot.scatter(X, Y, alpha=.3)
    plot.line(domain, prediction, line_width=2, color='red')
    return plot

show(gridplot([
    [plot_lwr(10.), plot_lwr(1.)],
    [plot_lwr(0.1), plot_lwr(0.01)]]))

Output:
(A 2×2 grid of scatter plots with the fitted curve in red, for tau = 10, 1, 0.1 and 0.01; larger tau gives a smoother, nearly linear fit, while smaller tau follows the local structure of the data more closely.)

Alternative Program (locally weighted regression on the tips dataset):

# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np1

def kernel(point, xmat, k):
    m, n = np1.shape(xmat)
    weights = np1.mat(np1.eye(m))
    for j in range(m):
        diff = point - xmat[j]  # use the xmat argument, not the global X
        weights[j, j] = np1.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np1.shape(xmat)
    ypred = np1.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

# load data points
data = pd.read_csv('tips.csv')
bill = np1.array(data.total_bill)
tip = np1.array(data.tip)

# prepare X: prepend a column of ones (the bias term) to the bill values
mbill = np1.mat(bill)
mtip = np1.mat(tip)      # np1.mat converts the 1-D arrays to 2-D matrix form
m = np1.shape(mbill)[1]  # number of data points (244 in the tips dataset)
one = np1.mat(np1.ones(m))
X = np1.hstack((one.T, mbill.T))

# set the smoothing parameter k here
ypred = localWeightRegression(X, mtip, 0.3)

# sort by total bill so the fitted curve plots as a smooth line
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
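Note: tips.csv (the standard restaurant-tips dataset, 244 rows with total_bill and tip columns) is assumed to be present in the working directory. If it is not, a copy is commonly mirrored in the seaborn-data repository; loading it from there is one option (the URL is an assumption about an external mirror):

# assumption: public mirror of the tips dataset from the seaborn-data repository
data = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv')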
