Experiment No: 9
Objective: Write a program to implement the k-Nearest Neighbour algorithm to classify the Iris
data set. Print both correct and wrong predictions. Java/Python ML library classes can be used
for this problem.
Description:
K-Nearest Neighbour (K-NN) is based on the Supervised Learning technique. The algorithm assumes
similarity between the new case and the available cases, and puts the new case into the category
that is most similar to the available categories. K-NN stores all the available data and
classifies a new data point based on that similarity, so whenever new data appears it can be
easily assigned to a well-suited category.
K-NN can be used for Regression as well as Classification, but it is mostly used for
Classification problems. It is a non-parametric algorithm, which means it makes no
assumption about the underlying data. It is also called a lazy learner algorithm because it does
not learn from the training set immediately; instead, it stores the dataset and performs the
computation only at classification time.
Thus, at the training phase the K-NN algorithm just stores the dataset, and when it receives new
data it classifies that data into the category most similar to it.
The working of K-NN can be explained by the following steps (a minimal from-scratch sketch
follows the list):
1. Select the number K of neighbours.
2. Calculate the Euclidean distance from the query point to every training point.
3. Take the K nearest neighbours as per the calculated Euclidean distances.
4. Among these K neighbours, count the number of data points in each category.
5. Assign the new data point to the category for which the number of neighbours is
maximum.
6. The model is ready.
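To make these steps concrete, here is a minimal from-scratch sketch of steps 2 to 5 in Python
(the names knn_predict, X_train, y_train and the choice k = 3 are illustrative; this is not the
library-based program given below):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # steps 2-3: Euclidean distance from the query to every training
    # point, then the indices of the k nearest points
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    # steps 4-5: majority vote among the k neighbours
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]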
Training Algorithm:
For each training example (x, f(x)), add the example to the list training_examples.
Classification Algorithm:
Given a query instance x_q to be classified, let x_1, x_2, ..., x_k denote the k instances
from training_examples that are nearest to x_q, and
return f̂(x_q) ← (1/k) · ∑_{i=1}^{k} f(x_i)
where f̂(x_q) is the mean of the target values f(x_i) of the k nearest training examples; for
example, with k = 3 and neighbour values 4, 5 and 6, f̂(x_q) = 5. (For a discrete class label,
the majority vote of step 5 above is used instead of the mean.)
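A minimal sketch of this averaging rule in Python (the names knn_regress and f_train are ours,
for illustration only; it differs from the knn_predict sketch above only in the final step):

import numpy as np

def knn_regress(X_train, f_train, x_query, k=3):
    # distances from the query to every stored training example
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # mean target value of the k nearest examples
    nearest = np.argsort(distances)[:k]
    return f_train[nearest].mean()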
Data Set:
Iris plants data set: the data set contains 150 instances (50 in each of three classes).
Number of attributes: 4 numeric, predictive attributes and the class.
S. No sepal_length sepal_width petal_length petal_width class
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
Program:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn import datasets
iris=datasets.load_iris()
x = iris.data
y = iris.target
# split the data into training and test sets; an 80/20 split
# matches the 30 test samples in the output below
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
# fit a k-NN classifier (k = 5 assumed) and predict on the test set
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)
print('Confusion Matrix')
print(confusion_matrix(y_test,y_pred))
print('Accuracy Metrics')
print(classification_report(y_test,y_pred))
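The objective also asks for the correct and wrong predictions to be printed. A minimal way to
do that with the variables above (this loop is our addition, not part of the original listing):

for i in range(len(y_test)):
    predicted = iris.target_names[y_pred[i]]
    actual = iris.target_names[y_test[i]]
    if y_pred[i] == y_test[i]:
        print('Correct prediction:', x_test[i], '->', predicted)
    else:
        print('Wrong prediction:', x_test[i], '->', predicted, '(actual:', actual, ')')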
Output:
Confusion matrix is as follows
[[11 0 0]
[0 9 1]
[0 1 8]]
Accuracy metrics
              precision    recall  f1-score   support
           0       1.00      1.00      1.00        11
           1       0.90      0.90      0.90        10
           2       0.89      0.89      0.89         9
 avg / total       0.93      0.93      0.93        30
Experiment No: 10
Objective: Implement the non-parametric Locally Weighted Regression algorithm in order to
fit data points. Select appropriate data set for your experiment and draw graphs.
Description:
Locally Weighted Regression Algorithm
Regression:
Regression is a technique from statistics that is used to predict values of a desired target
quantity when the target quantity is continuous.
In regression, we seek to identify (or estimate) a continuous variable y associated with a
given input vector x.
y is called the dependent variable.
x is called the independent variable.
Loess/Lowess Regression:
Loess regression is a nonparametric technique that uses local weighted regression to fit a smooth
curve through points in a scatter plot.
Lowess Algorithm:
Locally weighted regression is a very powerful nonparametric model used in statistical
learning.
Given a dataset X, y, we attempt to find a model parameter β(x) that minimizes the
residual sum of weighted squared errors.
The weights are given by a kernel function (k or w), which can be chosen arbitrarily.
Algorithm
1. Read the given data sample into X and the target curve (linear or non-linear) into Y.
2. Set the value of the smoothing (free) parameter, say τ.
3. Set the point of interest x0, drawn from the domain of X.
4. Determine the weight matrix using:
   w(x, x0) = exp(−(x − x0)² / (2τ²))
5. Determine the value of the model parameter β using the weighted normal equation:
   β(x0) = (XᵀWX)⁻¹ XᵀWy
6. The prediction at the point of interest is then ŷ = x0 · β(x0).
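Putting the weight formula and the normal equation together, a single locally weighted
prediction can be sketched in a few lines of NumPy (the function lwr_predict and its variable
names are ours, for illustration; this is not the lab program itself):

import numpy as np

def lwr_predict(x0, X, y, tau):
    # Gaussian weight of every training point relative to the query x0
    w = np.exp(-(X - x0) ** 2 / (2 * tau ** 2))
    W = np.diag(w)
    # design matrix with a bias column, then beta = (X^T W X)^(-1) X^T W y
    A = np.c_[np.ones(len(X)), X]
    beta = np.linalg.pinv(A.T @ W @ A) @ A.T @ W @ y
    # prediction at the query point
    return np.array([1.0, x0]) @ beta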
Program:
import numpy as np
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from bokeh.io import push_notebook
def local_regression(x0, X, Y, tau):
    # add a bias (intercept) term to the query point and the design matrix
    x0 = np.r_[1, x0]
    X = np.c_[np.ones(len(X)), X]
    # fit the weighted model: beta = (X^T W X)^(-1) X^T W Y
    xw = X.T * radial_kernel(x0, X, tau)
    beta = np.linalg.pinv(xw @ X) @ xw @ Y
    # predict value
    return x0 @ beta  # @ Matrix Multiplication or Dot Product for prediction

def radial_kernel(x0, X, tau):
    # Gaussian kernel weights: exp(-||x - x0||^2 / (2 tau^2))
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))
# generate dataset of n points (the value of n is assumed)
n = 1000
X = np.linspace(-3, 3, num=n)
print("The Data Set ( 10 Samples) X :\n",X[1:10])
Y = np.log(np.abs(X ** 2 - 1) + .5)
print("The Fitting Curve Data Set (10 Samples) Y :\n",Y[1:10])
# jitter X with Gaussian noise
X += np.random.normal(scale=.1, size=n)
print("Jittered (10 Samples) X :\n",X[1:10])
domain = np.linspace(-3, 3, num=300)
print(" Xo Domain Space(10 Samples) :\n",domain[1:10])
def plot_lwr(tau):
    # fit the locally weighted model over the whole domain and plot it
    prediction = [local_regression(x0, X, Y, tau) for x0 in domain]
    plot = figure(title='tau=%g' % tau)
    plot.scatter(X, Y, alpha=.3)
    plot.line(domain, prediction, line_width=2, color='red')
    return plot

output_notebook()
show(gridplot([
    [plot_lwr(10.), plot_lwr(1.)],
    [plot_lwr(0.1), plot_lwr(0.01)]]))
Output:
[Four Bokeh plots of the jittered sample data with the fitted locally weighted regression
curve, one per smoothing value: tau = 10, 1, 0.1 and 0.01.]
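An alternative version of this program, written for the restaurant tips data set with
matplotlib, also appears in the lab file, but only its core functions survived the page breaks.
The imports and the kernel() helper below are a reconstruction of the standard form of this
program, so treat them as an assumption rather than the original listing.
Alternative Program (tips data set):

import numpy as np1
import pandas as pd
import matplotlib.pyplot as plt

def kernel(point,xmat,k):
    # build a diagonal matrix of Gaussian weights for one query point
    m,n = np1.shape(xmat)
    weights = np1.mat(np1.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j,j] = np1.exp(diff*diff.T/(-2.0*k**2))
    return weights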
def localWeight(point,xmat,ymat,k):
    wei = kernel(point,xmat,k)
    # weighted normal equation: beta = (X^T W X)^(-1) X^T W y
    W = (xmat.T*(wei*xmat)).I*(xmat.T*(wei*ymat.T))
    return W
def localWeightRegression(xmat,ymat,k):
    m,n = np1.shape(xmat)
    ypred = np1.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i]*localWeight(xmat[i],xmat,ymat,k)
    return ypred
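The data-loading and plotting code that originally preceded the final two lines is also
missing; below is a reconstruction under the usual setup for this program (the file name
10-dataset.csv and the smoothing value k = 0.5 are assumptions):

# load the tips data set (file name assumed)
data = pd.read_csv('10-dataset.csv')
bill = np1.array(data.total_bill)
tip = np1.array(data.tip)
# design matrix with a bias column
mbill = np1.mat(bill)
mtip = np1.mat(tip)
m = np1.shape(mbill)[1]
one = np1.mat(np1.ones(m))
X = np1.hstack((one.T, mbill.T))
# run locally weighted regression and plot the fitted curve
ypred = localWeightRegression(X, mtip, 0.5)
SortIndex = X[:,1].argsort(0)
xsort = X[SortIndex][:,0]
plt.scatter(bill, tip, color='green')
plt.plot(xsort[:,1], ypred[SortIndex], color='red', linewidth=2)
plt.xlabel('Total bill')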
plt.ylabel('Tip')
plt.show()