Module3-Similarity-based Learning-11Mar2024

Chapter 4 discusses similarity-based learning, also known as instance-based learning, which classifies test instances based on their similarity to training instances without constructing a global model. It highlights the k-Nearest Neighbor (k-NN) algorithm as a key method, explaining its lazy learning approach, the importance of distance metrics, and the impact of choosing the right number of neighbors (k). Additionally, it introduces the weighted k-NN variant, which assigns different weights to neighbors based on their distance to improve classification accuracy.

Chapter 4

Similarity-based Learning
Introduction to Similarity or Instance-based Learning
 Similarity-based classifiers use similarity measures to locate the nearest
neighbors and classify a test instance.
 It is also called Instance-based learning or just-in-time learning, as it does not
build an abstract model of the training instances and performs lazy learning
when classifying a new instance.
 This learning mechanism simply stores all data and uses it only when it needs to
classify an unseen instance.
 The advantage of this learning is that processing occurs only when a request to
classify a new instance is given.
 This methodology is useful when the whole dataset is not available at the
beginning but is collected incrementally.
 The drawback of this learning is that it requires a large memory to store the data
since a global abstract model is not constructed initially with the training data.
 Classification of instances is done based on the measure of similarity in the form
of distance functions over data instances.
 Several distance metrics are used to estimate the similarity or dissimilarity
between instances, as required for clustering, nearest neighbor classification,
anomaly detection, and so on.
 Popular distance metrics include Hamming distance, Euclidean distance,
Manhattan distance, Minkowski distance, Cosine similarity, Mahalanobis
distance, Pearson’s correlation (correlation similarity), Mean squared
difference, Jaccard coefficient, Tanimoto coefficient, etc.; a few of these are
sketched in code below.
 Similarity-based classification problems represent the features of the test
instance and the training instances as points in Euclidean space in order to
learn the similarity or dissimilarity between instances.
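As a quick illustration, the short Python sketch below (written for these notes, not taken from the chapter) computes a few of the metrics named above; the example vectors are arbitrary.

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def cosine_similarity(a, b):
    # Cosine of the angle between the two vectors.
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai ** 2 for ai in a))
    norm_b = math.sqrt(sum(bi ** 2 for bi in b))
    return dot / (norm_a * norm_b)

def hamming(a, b):
    # For categorical vectors: number of positions where the values differ.
    return sum(1 for ai, bi in zip(a, b) if ai != bi)

x, y = [9.2, 85.0, 8.0], [6.1, 40.0, 5.0]          # arbitrary numeric vectors
print(euclidean(x, y), manhattan(x, y), cosine_similarity(x, y))
print(hamming(["red", "M", "yes"], ["red", "S", "yes"]))   # -> 1
```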

Differences Between Instance-based and Model-based Learning:
 An instance is an entity or an example in the training dataset.
 It is described by a set of features or attributes.
 One attribute describes the class label or category of an instance.
 Instance-based methods learn or predict the class label of a test instance only
when a new instance is given for classification; until then, they delay
processing of the training dataset.
 It is also referred to as a lazy learning method, since it does not generalize any
model from the training dataset but just keeps the training dataset as a
knowledge base until a new instance is given.
 In contrast, model-based learning, also called eager learning, tries to
generalize the training data into a model before receiving test instances.
 Model-based learning describes all assumptions about the problem domain in
the form of a model.

 These algorithms learn in two phases, called the training phase and the testing
phase.
 In the training phase, a model is built from the training dataset, and it is used to
classify a test instance during the testing phase.
 Some examples of models constructed are decision trees, neural networks and
Support Vector Machines (SVM), etc.
 The differences between instance-based learning and model-based learning are
listed below.

Instance-based Learning | Model-based Learning
Lazy learners | Eager learners
Processing of training instances is done only during the testing phase | Processing of training instances is done during the training phase
No model is built with the training instances before a test instance is received | Generalizes a model with the training instances before a test instance is received
Predicts the class of the test instance directly from the training data | Predicts the class of the test instance from the model built
Slow in the testing phase | Fast in the testing phase
Learns by making many local approximations | Learns by creating a global approximation

 Instance-based learning also comes under the category of memory-based
models, which compare the given test instance with the training instances
stored in memory.
 Memory-based models classify a test instance by checking its similarity with
the training instances.

 Examples of Instance-based learning algorithms are:
1. k-Nearest Neighbor (k-NN)
2. Variants of Nearest Neighbor learning
3. Locally Weighted Regression
4. Learning Vector Quantization (LVQ)
5. Self-Organizing Map (SOM)
6. Radial Basis Function (RBF) networks
 Instance-based methods are sensitive to the ranges of the feature values, so
features usually need to be scaled to comparable ranges.
 They are also sensitive to irrelevant and correlated features, which can lead to
misclassification of instances.

Nearest-Neighbor Learning
 k-Nearest-Neighbors (k-NN) is a natural approach used in similarity-based
classification.
 It is a non-parametric method used for both classification and regression problems.
 It is a simple and powerful algorithm that predicts the category of the test instance
from the ‘k’ training samples that are closest to it, assigning the category that has
the largest probability (i.e., the majority class) among those neighbors.
 A visual representation of this learning is shown in Figure 4.1.
 There are two classes of objects called C1 and C2 in the given figure.
 For any given test instance T, the category of this test instance is determined by
looking at the class of k = 3 nearest neighbors.
 Thus, the class of this test instance T is predicted as C2.
Note:
 Nonparametric methods, or distribution-free methods, are statistical methods that do not
rely on assumptions that the data are drawn from a given probability distribution.
 Nonparametric methods are often applied when less is known about the data.

 The algorithm relies on the assumption that similar objects are close to each
other in the feature space.
 k-NN performs instance-based learning: it just stores the training data
instances and learns from them case by case.
 This model is also memory-based as it uses training data at the time when
predictions need to be made.

 It is a lazy learning algorithm since no prediction model is built earlier with
training instances and classification happens only after getting the test instance.
 The algorithm classifies a new instance by determining the ‘k’ most similar
instances (i.e., k nearest neighbors) and summarizing the output of those ‘k’
instances.
 If the target variable is discrete, then it is a classification problem.
 So, it selects the most common class value among the ‘k’ instances by a majority
vote.
 If the target variable is continuous, then it is a regression problem, and hence
the mean output variable of the ‘k’ instances is the output of the test instance.
 Euclidean distance, the most popular distance measure, is used in k-NN to
determine the ‘k’ instances that are most similar to the test instance.
 The value of ‘k’ is determined by tuning with different ‘k’ values and choosing the
‘k’ that classifies the test instance most accurately; a library-based sketch of this
procedure is shown below.
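For reference, here is a minimal sketch of the same procedure using scikit-learn's KNeighborsClassifier (this assumes scikit-learn is installed; the tiny arrays are made-up placeholders, not data from this chapter).

```python
from sklearn.neighbors import KNeighborsClassifier

# Placeholder training data: each row is one instance, labels are the classes.
X_train = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]
y_train = ["C1", "C1", "C2", "C2"]

# k = 3 nearest neighbors; the default metric is Euclidean distance.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

x_test = [[5.5, 8.5]]
print(knn.predict(x_test))   # majority class among the 3 nearest neighbors
```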

Example 4.1:
 Consider the student performance training dataset of 8 data instances shown in Table 4.2
which describes the performance of individual students in a course and their CGPA
obtained in the previous semesters. The independent attributes are CGPA, Assessment
and Project. The target variable is ‘Result’ which is a discrete valued variable that takes
two values ‘Pass’ or ‘Fail’. Based on the performance of a student, classify whether a
student will pass or fail in that course.
Sl. No. CGPA Assessment Project Submitted Result
1 9.2 85 8 Pass
2 8 80 7 Pass
3 8.5 81 8 Pass
4 6 45 5 Fail
5 6.5 50 4 Fail
6 8.2 72 7 Pass
7 5.8 38 5 Fail
8 8.9 91 9 Pass
Table 4.2: Training Dataset T

Solution:
 Given a test instance (6.1, 40, 5) and a set of categories {Pass, Fail}, also
called classes, we need to use the training set to classify the test
instance using Euclidean distance.
 The task of classification is to assign a category or class to an arbitrary
instance.
 Assign k = 3.
Step 1:
 Calculate the Euclidean distance between the test instance (6.1, 40, 5)
and each of the training instances, as shown in Table 4.3.

Sl. No. | Euclidean Distance to the test instance (6.1, 40, 5)
1 | √((9.2 − 6.1)² + (85 − 40)² + (8 − 5)²) = 45.2063
2 | √((8 − 6.1)² + (80 − 40)² + (7 − 5)²) = 40.0950
3 | √((8.5 − 6.1)² + (81 − 40)² + (8 − 5)²) = 41.1796
4 | √((6 − 6.1)² + (45 − 40)² + (5 − 5)²) = 5.001
5 | √((6.5 − 6.1)² + (50 − 40)² + (4 − 5)²) = 10.05783
6 | √((8.2 − 6.1)² + (72 − 40)² + (7 − 5)²) = 32.1311
7 | √((5.8 − 6.1)² + (38 − 40)² + (5 − 5)²) = 2.022375
8 | √((8.9 − 6.1)² + (91 − 40)² + (9 − 5)²) = 51.2332
Table 4.3: Euclidean Distance Calculation
Step 2:
 Sort the distances in the ascending order and select the first 3 nearest training
data instances to the test instance.
 The selected nearest neighbors are shown below.

Instance | Euclidean Distance | Class
7 | 2.022375 | Fail
4 | 5.001 | Fail
5 | 10.05783 | Fail
Table 4.4: Nearest Neighbors
 Here, the 3 nearest neighbors are instances 4, 5 and 7, which have the smallest
distances.
Step 3:
 Predict the class of the test instance by majority voting.
 The class of the test instance is predicted as ‘Fail’. These steps are reproduced in the short script below.
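The following is a minimal Python sketch written for this example (the helper names are our own); it reproduces the three steps above.

```python
import math
from collections import Counter

# Training dataset T from Table 4.2: (CGPA, Assessment, Project Submitted) -> Result
train = [
    ((9.2, 85, 8), "Pass"), ((8.0, 80, 7), "Pass"), ((8.5, 81, 8), "Pass"),
    ((6.0, 45, 5), "Fail"), ((6.5, 50, 4), "Fail"), ((8.2, 72, 7), "Pass"),
    ((5.8, 38, 5), "Fail"), ((8.9, 91, 9), "Pass"),
]
test = (6.1, 40, 5)
k = 3

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Step 1: distance from the test instance to every training instance.
distances = [(euclidean(x, test), label) for x, label in train]

# Step 2: sort in ascending order and keep the k nearest neighbors.
neighbors = sorted(distances)[:k]   # [(2.0224, 'Fail'), (5.001, 'Fail'), (10.0578, 'Fail')]

# Step 3: majority vote among the k neighbors.
print(Counter(label for _, label in neighbors).most_common(1)[0][0])   # -> 'Fail'
```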
 Data normalization/standardization is required when the features have different
ranges (or a wide range of possible values), so that all features are transformed
to a comparable range before distances are computed.
 This is required to eliminate the undue influence of one feature over another
(i.e., to give all features an equal chance); a normalization sketch is given after this list.
 For example, if one feature has values in the range [0–1] and another feature
has values in the range [0–100], then even a small variation in the second
feature will influence the distance more than the first feature.
 The performance of the k-NN classifier is affected by three factors: the number
of nearest neighbors (i.e., the selection of k), the distance metric and the decision rule.
 If the selected k value is too small, the classifier may overfit and be less stable; if
it is too big, the neighborhood may include many irrelevant points from other classes.
 The choice of the distance metric also plays a major role, and it depends on the
type of the independent attributes in the training dataset.
 The k-NN classification algorithm best suits for lower dimensional data as in a
high-dimensional space the nearest neighbors may not be very close at all.
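A minimal sketch of min-max normalization is given below; it rescales every feature column to the range [0, 1] before distances are computed (the three rows used here are only an illustration).

```python
def min_max_normalize(rows):
    """Rescale each column of a list of equal-length numeric rows to [0, 1]."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

data = [[9.2, 85, 8], [6.0, 45, 5], [5.8, 38, 5]]
print(min_max_normalize(data))
# Every column now lies in [0, 1], so no single feature dominates the distance.
```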
Weighted K-Nearest-Neighbor Algorithm
 The weighted k-NN is an extension of k-NN.
 It chooses the neighbors by using the weighted distance.
 The k-Nearest Neighbor (k-NN) algorithm has some limitations as its
performance is dependent on choosing the k nearest neighbors, the
distance metric used, and the decision rule.
 In weighted k-NN, neighbors that are closer to the test instance are assigned
a higher weight in the decision than neighbors that are farther away from the
test instance.
 The weights are inversely proportional to the distances.
 The selected k nearest neighbors can be assigned with uniform weights,
which means all the instances in each neighborhood have equal weights,
or the weights can be assigned by the inverse of their distance.
 In the second case, closer neighbors of a query point will have a greater
influence than neighbors which are farther away.

Algorithm 4.2: Weighted k-NN
Inputs: Training dataset ‘T’, Distance metric ‘d’, Weighting function w(i), Test
instance ‘t’, the number of nearest neighbors ‘k’.
Output: Predicted class or category
Prediction: For test instance t,
1. For each instance ‘i’ in the training dataset T, compute the distance between the
test instance t and instance ‘i’ using a distance metric (Euclidean
distance).
[Continuous attributes: the Euclidean distance between two points in the plane
with coordinates (x1, y1) and (x2, y2) is given as:
dist((x1, y1), (x2, y2)) = √((x2 − x1)² + (y2 − y1)²)]
[Categorical attributes (Binary): Hamming Distance: If the values of two
instances are the same, the distance d will be equal to 0. Otherwise, d = 1.]
2. Sort the distances in the ascending order and select the first ‘k’ nearest
training data instances to the test instance.

3. Predict the class of the test instance by the weighted voting technique
(weighting function w(i)) applied to the k selected nearest instances:
 Compute the inverse of each distance of the ‘k’ selected nearest
instances.
 Find the sum of the inverses.
 Compute the weight by dividing each inverse distance by the sum.
(Each weight is a vote for its associated class).
 Add the weights of the same class.
 Predict the class by choosing the class with the maximum vote.
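The steps above can be turned into a short Python sketch (an illustrative implementation written for these notes; the function and variable names are our own, not from the textbook).

```python
import math
from collections import defaultdict

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def weighted_knn_predict(train, test, k=3):
    """train: list of (feature_tuple, class_label) pairs; test: feature tuple."""
    # Step 1: distance from the test instance to every training instance.
    dists = [(euclidean(x, test), label) for x, label in train]
    # Step 2: sort in ascending order and keep the k nearest neighbors.
    nearest = sorted(dists)[:k]
    # Step 3: inverse-distance weights, normalized so that they sum to 1.
    inverses = [1.0 / d for d, _ in nearest]        # assumes all distances > 0
    total = sum(inverses)
    votes = defaultdict(float)
    for inv, (_, label) in zip(inverses, nearest):
        votes[label] += inv / total                 # each weight is a vote for its class
    # Predict the class with the maximum total weight.
    return max(votes, key=votes.get)
```

An exact match (distance 0) would need special handling before the inverses are taken. Applied to the dataset of Table 4.2 with the test instance (7.6, 60, 8), this sketch reproduces the ‘Fail’ prediction derived in Example 4.2 below.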

Example 4.2:
 Consider the same training dataset given in Table 4.2. Use Weighted k-NN and
determine the class.
Solution:
Step 1:
 Given a test instance (7.6, 60, 8) and a set of classes {Pass, Fail}, use the
training dataset to classify the test instance using Euclidean distance and
weighting function.
 Assign k = 3.
 The distance calculation is shown in Table 4.5.

Sl. No. | CGPA | Assessment | Project Submitted | Result | Euclidean Distance
1 | 9.2 | 85 | 8 | Pass | √((9.2 − 7.6)² + (85 − 60)² + (8 − 8)²) = 25.05115
2 | 8 | 80 | 7 | Pass | √((8 − 7.6)² + (80 − 60)² + (7 − 8)²) = 20.02898
3 | 8.5 | 81 | 8 | Pass | √((8.5 − 7.6)² + (81 − 60)² + (8 − 8)²) = 21.01928
4 | 6 | 45 | 5 | Fail | √((6 − 7.6)² + (45 − 60)² + (5 − 8)²) = 15.38051
5 | 6.5 | 50 | 4 | Fail | √((6.5 − 7.6)² + (50 − 60)² + (4 − 8)²) = 10.82636
6 | 8.2 | 72 | 7 | Pass | √((8.2 − 7.6)² + (72 − 60)² + (7 − 8)²) = 12.05653
7 | 5.8 | 38 | 5 | Fail | √((5.8 − 7.6)² + (38 − 60)² + (5 − 8)²) = 22.27644
8 | 8.9 | 91 | 9 | Pass | √((8.9 − 7.6)² + (91 − 60)² + (9 − 8)²) = 31.04336
Table 4.5: Euclidean Distance Calculation
Step 2:
 Sort the distances in the ascending order and select the first 3 nearest
training data instances to the test instance.
 The selected nearest neighbors are shown below (Table 4.6).

Instance | Euclidean Distance | Class
5 | 10.82636 | Fail
6 | 12.05653 | Pass
4 | 15.38051 | Fail
Table 4.6: Nearest Neighbors
Step 3:
 Predict the class of the test instance by weighted voting technique from
the 3 selected nearest instances.
 Compute the inverse of each distance of the 3 selected nearest instances,
as shown in Table 4.7.

Instance | Euclidean Distance | Inverse Distance | Class
5 | 10.82636 | 1/10.82636 = 0.09237 | Fail
6 | 12.05653 | 1/12.05653 = 0.08294 | Pass
4 | 15.38051 | 1/15.38051 = 0.06502 | Fail
Table 4.7: Inverse Distances

 Find the sum of the inverses.
Sum = 0.06502 + 0.092370 + 0.08294 = 0.24033
 Compute the weight by dividing each inverse distance by the sum, as shown
below.

Instance | Weight | Class
5 | 0.09237 / 0.24033 = 0.384347 | Fail
6 | 0.08294 / 0.24033 = 0.345109 | Pass
4 | 0.06502 / 0.24033 = 0.270545 | Fail

 Add the weights of the same class.


Fail = 0.270545 + 0.384347 = 0.654892
Pass = 0.345109
 Predict the class by choosing the class with the maximum vote.
 Therefore, the class is predicted as ‘Fail’.
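The weighted vote can be checked with a few lines of Python (a small sketch that starts from the three nearest neighbors and classes read off Table 4.5).

```python
# Nearest neighbors from Table 4.5: instance 5 (Fail), instance 6 (Pass), instance 4 (Fail).
neighbors = [(10.82636, "Fail"), (12.05653, "Pass"), (15.38051, "Fail")]

inverses = [1.0 / d for d, _ in neighbors]     # 0.09237, 0.08294, 0.06502
total = sum(inverses)                          # 0.24033

weights = {"Pass": 0.0, "Fail": 0.0}
for inv, (_, label) in zip(inverses, neighbors):
    weights[label] += inv / total

print(weights)   # {'Pass': 0.345..., 'Fail': 0.654...} -> predicted class 'Fail'
```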

Nearest Centroid Classifier
 Another variant of k-NN classifiers used for similarity-based classification is the
Nearest Centroid Classifier.
 It is a simple classifier, also called the Mean Difference classifier.
 In this method, we classify a test instance to the class whose centroid/mean is
closest to that instance.

Example 4.3:
 Consider the sample data shown in Table 4.9 with two features X and Y. The
target classes are ‘A’ or ‘B’. Predict the class using Nearest Centroid Classifier.

Solution:
Step 1:
 Compute the mean/centroid of each class.
 In this example, we have two classes ‘A’ and ‘B’.
Centroid of class ‘A’ = ((3 + 5 + 4)/3, (1 + 2 + 3)/3) = (12/3, 6/3) = (4, 2)
Centroid of class ‘B’ = ((7 + 6 + 8)/3, (6 + 7 + 5)/3) = (21/3, 18/3) = (7, 6)
 Now for the given test instance (6, 5), we can predict the class as explained
below.
Step 2:
 Calculate the Euclidean distance between the test instance (6, 5) and each of the
centroids.
Euc_Dist[(6, 5), (4, 2)] = √((6 − 4)² + (5 − 2)²) = √13 = 3.6
Euc_Dist[(6, 5), (7, 6)] = √((6 − 7)² + (5 − 6)²) = √2 = 1.414
 The test instance has a smaller distance to the centroid of class B.
 Hence, the class of this test instance is predicted as ‘B’.
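The computation can be reproduced with the sketch below; the exact pairing of the X and Y values within each class is assumed here, and any pairing consistent with the class centroids (4, 2) and (7, 6) gives the same result.

```python
import math

# Assumed class points, consistent with the centroids used in Example 4.3.
class_points = {
    "A": [(3, 1), (5, 2), (4, 3)],
    "B": [(7, 6), (6, 7), (8, 5)],
}
test = (6, 5)

def centroid(points):
    # Component-wise mean of the points of one class.
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

centroids = {label: centroid(pts) for label, pts in class_points.items()}
# {'A': (4.0, 2.0), 'B': (7.0, 6.0)}

predicted = min(centroids, key=lambda label: euclidean(test, centroids[label]))
print(predicted)   # -> 'B' (distance 1.414 to B versus 3.6 to A)
```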

Locally Weighted Regression (LWR)
 Locally Weighted Regression (LWR) is a non-parametric supervised learning
algorithm that performs local regression by combining a regression model with
a nearest-neighbor model.
 LWR is also referred to as a memory-based method, as it requires the training
data at prediction time but uses only the training instances that lie locally
around the point of interest.
 Using the nearest-neighbor algorithm, we find the ‘k’ instances that are closest
to a test instance and fit a linear function to those ‘k’ nearest instances in the
local regression model.
 Because the linear functions are approximated locally around each query point
so as to minimize the error, the overall prediction is no longer a straight line but
a curve.
 Ordinary linear regression finds a linear relationship between the input x and the
output y.
 For the given training dataset T, the hypothesis function hβ(x) gives the predicted
target output as a linear function of x, where β0 is the intercept and β1 is the
coefficient of x.

 It is given by,
hβ(x) = β0 + β1x
 The cost function minimizes the error difference between the predicted value
hβ(x) and the true value y, and it is given by,
J(β) = (1/2) Σi=1..m (hβ(xi) − yi)²
 where m is the number of instances in the training dataset.
 Now the cost function is modified for locally weighted linear regression to include
weights only for the nearest-neighbor points.
 Hence, the cost function is given by,
J(β) = (1/2) Σi=1..k wi (hβ(xi) − yi)²
 where wi is the weight associated with each nearest-neighbor instance xi.


 The weight function used is a Gaussian kernel, which gives a higher value for
instances that are close to the test instance and tends to zero for instances that
are far away, but never becomes exactly zero.
 wi is computed as,
wi = exp(−(xi − x)² / (2τ²))
 where τ is called the bandwidth parameter and controls the rate at which wi
reduces to zero as xi moves away from the query point x.
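A one-function sketch of this weighting scheme is shown below (τ is a user-chosen bandwidth; the value 0.5 is only an illustration).

```python
import math

def gaussian_weight(x_i, x_query, tau):
    """Gaussian kernel: near the query point the weight is close to 1, far away it tends to 0."""
    return math.exp(-((x_i - x_query) ** 2) / (2 * tau ** 2))

print(gaussian_weight(2.0, 2.0, tau=0.5))   # 1.0 at the query point itself
print(gaussian_weight(5.0, 2.0, tau=0.5))   # ~1.5e-08: effectively zero, but never exactly zero
```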

Example 4.4:
 Consider a simple example with four instances shown in Table 4.10 and apply locally
weighted regression.
Sl. No Salary (in Lakhs) Expenditure (in Thousands)
1 5 25
2 1 5
3 2 7
4 1 8
Table 4.10 Sample Table
Solution:
 Using a linear regression model, assume we have computed the parameters:
β0 = 4.72, β1 = 0.62.
 Given a test instance with x = 2, the predicted y′ is:
y′ = β0 + β1x = 4.72 + 0.62 × 2 = 5.96.
 Applying the nearest neighbor model, we choose k = 3 closest instances.
 The Euclidean distance calculation for the training instances is shown below.

Sl. No | x = Salary (in Lakhs) | y = Expenditure (in Thousands) | Euclidean Distance
1 | 5 | 25 | √((5 − 2)²) = 3
2 | 1 | 5 | √((1 − 2)²) = 1
3 | 2 | 7 | √((2 − 2)²) = 0
4 | 1 | 8 | √((1 − 2)²) = 1
Table 4.11 Euclidean Distance Calculation
 Instances 2, 3 and 4 are closer with smaller distances.
 The mean value of the target y of these 3 neighbors = (5 + 7 + 8) / 3 = 20/3 = 6.67.
 Weights for the closest instances are computed using the Gaussian kernel,
wi = exp(−(xi − x)² / (2τ²))

 Hence the weights of the closest instances are computed as follows:
 Weight of Instance 2 is:
w2 = exp(−(x2 − x)² / (2τ²)) = exp(−(1 − 2)² / (2τ²)) = 0.043 (for the bandwidth τ used in this example)
 Weight of Instance 3 is:
w3 = exp(−(x3 − x)² / (2τ²)) = exp(−(2 − 2)² / (2τ²)) = e⁰ = 1 [x3 coincides with the query point, hence w3 gets the highest weight value.]
 Weight of Instance 4 is:
w4 = exp(−(x4 − x)² / (2τ²)) = exp(−(1 − 2)² / (2τ²)) = 0.043
 The predicted outputs for the 3 closer instances are given as follows:
 The predicted output of Instance 2 is:
y2′ = hβ(x2) = β0 + β1x2 = 4.72 + 0.62 × 1 = 5.34
 The predicted output of Instance 3 is:
y3′ = hβ(x3) = β0 + β1x3 = 4.72 + 0.62 × 2 = 5.96

 The predicted output of Instance 4 is:
y4′ = hβ(x4) = β0 + β1x4 = 4.72 + 0.62 × 1 = 5.34
 The error value is calculated as:
J(β) = (1/2) Σi wi (hβ(xi) − yi)²
= (1/2)(0.043(5.34 − 5)² + 1(5.96 − 7)² + 0.043(5.34 − 8)²) = 0.6953.
 Now, the parameters are adjusted to minimize this weighted cost function and
obtain the optimal local fit.
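As a rough check, the sketch below reproduces the main numbers of this example; the bandwidth is not stated above, so τ = 0.4 is assumed here because it gives weights close to the 0.043 used in the example.

```python
import math

# Table 4.10: x = Salary (in lakhs), y = Expenditure (in thousands).
data = [(5, 25), (1, 5), (2, 7), (1, 8)]
beta0, beta1 = 4.72, 0.62        # linear-regression parameters given in the example
x_query = 2
tau = 0.4                        # assumed bandwidth (not stated in the example)

def h(x):
    return beta0 + beta1 * x     # hβ(x) = β0 + β1·x

def weight(x_i):
    return math.exp(-((x_i - x_query) ** 2) / (2 * tau ** 2))

# k = 3 nearest neighbors of x_query by distance |x_i − x_query|: instances 2, 3 and 4.
neighbors = sorted(data, key=lambda p: abs(p[0] - x_query))[:3]

cost = 0.5 * sum(weight(x) * (h(x) - y) ** 2 for x, y in neighbors)
print([round(weight(x), 3) for x, _ in neighbors])   # [1.0, 0.044, 0.044]
print(round(cost, 4))                                # 0.6988, close to the 0.6953 obtained above
```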
