CS8082U4L01 - k-Nearest Neighbour Learning

This document discusses k-nearest neighbor (k-NN) learning, an instance-based machine learning algorithm. k-NN stores the training examples and classifies new examples based on the classes of their k nearest neighbors. The document also discusses locally weighted regression, which generalizes k-NN by constructing a local approximation of the target function for each query instance from nearby, distance-weighted training examples. Key points covered include how k-NN handles discrete-valued and continuous-valued targets, distance-weighted k-NN, and locally weighted linear regression.

CS8082 – Machine Learning Techniques

Unit 4 – Instance-Based Learning

k-Nearest Neighbor Learning


Unit Objectives

• Understand the need for machine learning in solving various kinds of problems
• Study the various supervised, semi-supervised, and unsupervised learning
algorithms in machine learning

Unit Outcomes

At the end of the course, the student should be able to:


• CO 1: Differentiate between supervised, unsupervised, and semi-supervised machine
learning approaches
• CO 2: Apply a specific supervised or unsupervised machine learning algorithm to a
particular problem
• CO 3: Analyze and suggest the appropriate machine learning approach for various
types of problems
• CO 4: Design and make modifications to existing machine learning algorithms to suit
an individual application
• CO 5: Provide useful case studies on advanced machine learning algorithms

Prerequisite

 Understanding of the k-NN algorithm
 Familiarity with distance-based instance similarity measures
 General idea of gradient descent
What is instance-based learning?

 In contrast to learning methods that construct a general, explicit description of
the target function when training examples are provided, instance-based learning
methods simply store the training examples.
 Instance-based learning includes nearest neighbor and locally weighted regression
methods, which assume instances can be represented as points in a Euclidean space.
 Instance-based methods are sometimes referred to as "lazy" learning methods
because they delay processing until a new instance must be classified.
Instance-based learning

 Learning in these algorithms consists of simply storing the presented training
data.
 When a new query instance is encountered, a set of similar related instances is
retrieved from memory and used to classify the new query instance.
 Many techniques construct only a local approximation to the target function,
rather than one that holds over the entire instance space.
1. k-Nearest Neighbor Learning

 Assumes all instances correspond to points in the n-dimensional space.
 The nearest neighbors of an instance are defined in terms of the standard
Euclidean distance.
 Let an arbitrary instance x be described by the feature vector

    \langle a_1(x), a_2(x), \ldots, a_n(x) \rangle

where a_r(x) denotes the value of the r-th attribute of instance x.
 Then the distance between two instances x_i and x_j is defined to be d(x_i, x_j),
where

    d(x_i, x_j) = \sqrt{ \sum_{r=1}^{n} ( a_r(x_i) - a_r(x_j) )^2 }
k-Nearest Neighbor Learning

Target function: discrete-valued or real-valued.

 CASE 1: Discrete-valued target function

The algorithm assigns to the query instance x_q the most common target value among
its k nearest training examples:

    \hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))

where V is the finite set of target values, and \delta(a, b) = 1 if a = b and 0
otherwise.
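
To make the procedure concrete, here is a minimal Python sketch of the
discrete-valued case; the helper names (euclidean, knn_classify) and the toy
dataset are illustrative choices, not part of the original slides.

    import math
    from collections import Counter

    def euclidean(xi, xj):
        """Standard Euclidean distance between two feature vectors."""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

    def knn_classify(training, xq, k=3):
        """Classify xq as the most common label among its k nearest neighbors.

        training: list of (feature_vector, label) pairs.
        """
        # Rank the stored examples by distance to the query point and keep k.
        neighbors = sorted(training, key=lambda ex: euclidean(ex[0], xq))[:k]
        # Majority vote over the labels of the k nearest neighbors.
        return Counter(label for _, label in neighbors).most_common(1)[0][0]

    # Toy usage (illustrative values):
    data = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
            ((5.0, 5.0), "B"), ((5.2, 4.8), "B")]
    print(knn_classify(data, (1.1, 1.0), k=3))  # -> A

Note that "learning" here is just storing data; all the work happens at query time,
which is why such methods are called lazy.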
k-Nearest Neighbor Learning

 The accompanying figure shows the shape of the decision surface induced by 1-NN
over the entire instance space.
 The decision surface is a combination of convex polyhedra surrounding each of the
training examples.
 For every training example, the polyhedron indicates the set of query points whose
classification will be completely determined by that training example.
 Query points outside the polyhedron are closer to some other training example.
 This kind of diagram is often called the Voronoi diagram of the set of training
examples.
k-Nearest Neighbor Learning

 CASE 2: Continuous-valued target function

 The algorithm calculates the mean value of the k nearest training examples rather
than their most common value:

    \hat{f}(x_q) \leftarrow \frac{1}{k} \sum_{i=1}^{k} f(x_i)
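
The continuous-valued case needs only a one-line change to the earlier sketch,
replacing the majority vote with a mean (reusing the euclidean helper defined
above; names are again illustrative):

    def knn_regress(training, xq, k=3):
        """Estimate f(xq) as the mean target value of the k nearest neighbors.

        training: list of (feature_vector, real_value) pairs.
        """
        neighbors = sorted(training, key=lambda ex: euclidean(ex[0], xq))[:k]
        return sum(y for _, y in neighbors) / len(neighbors)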
Distance-Weighted Nearest Neighbor Algorithm

A refinement of the k-NN algorithm:

 Weight the contribution of each of the k neighbors according to its distance from
the query point, giving greater weight to closer neighbors.
 For example, weight the vote of each neighbor according to the inverse square of
its distance from x_q:

    \hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i)),
    \quad \text{where } w_i \equiv \frac{1}{d(x_q, x_i)^2}
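
A sketch of the distance-weighted vote follows (again reusing the euclidean helper
from the first example; the zero-distance shortcut is one common way to handle a
query that exactly matches a training point):

    def weighted_knn_classify(training, xq, k=3):
        """Distance-weighted k-NN vote: closer neighbors count for more.

        Each neighbor's vote is weighted by the inverse square of its
        distance to the query point.
        """
        neighbors = sorted(training, key=lambda ex: euclidean(ex[0], xq))[:k]
        votes = {}
        for x, label in neighbors:
            d = euclidean(x, xq)
            if d == 0.0:
                return label  # exact match: adopt its label directly
            votes[label] = votes.get(label, 0.0) + 1.0 / d ** 2
        return max(votes, key=votes.get)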
Distance-Weighted Nearest Neighbor Algorithm

Distance-weighted instances with a real-valued target function:

    \hat{f}(x_q) \leftarrow \frac{ \sum_{i=1}^{k} w_i f(x_i) }{ \sum_{i=1}^{k} w_i },
    \quad \text{where } w_i \equiv \frac{1}{d(x_q, x_i)^2}

 The only disadvantage of considering all examples instead of just the k nearest is
that the classifier will run more slowly.
 If all training examples are considered when classifying a new query instance, we
call the algorithm a global method.
 If only the nearest training examples are considered, we call it a local method.
 When the above rule is applied as a global method, using all training examples, it
is known as Shepard's method.
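
A sketch covering both variants: passing k selects the local method, while k=None
uses every training example, i.e. Shepard's method (the function name and defaults
are our own choices):

    def weighted_knn_regress(training, xq, k=None):
        """Distance-weighted estimate of f(xq).

        With k=None, all training examples are used (a global method,
        Shepard's method); otherwise only the k nearest are used (local).
        """
        examples = sorted(training, key=lambda ex: euclidean(ex[0], xq))
        if k is not None:
            examples = examples[:k]
        num, den = 0.0, 0.0
        for x, y in examples:
            d = euclidean(x, xq)
            if d == 0.0:
                return y  # query coincides with a training point
            w = 1.0 / d ** 2  # inverse-square distance weight
            num += w * y
            den += w
        return num / den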
2. Locally weighted regression

 A generalization of nearest-neighbor approaches: instead of approximating the
target function f(x) only at the single query point x = x_q, it constructs an
explicit approximation to f over a local region surrounding x_q.
 Locally weighted regression uses nearby or distance-weighted training examples to
form this local approximation to f.
Locally weighted regression
 local because the function is approximated based only on data near the query
point,
 weighted because the contribution of each training example is weighted by its
distance from the query point, and
 regression because this is the term used widely in the statistical learning
community for the problem of approximating real-valued functions.
Understanding Locally weighted regression
Locally weighted linear regression

Consider approximating the target function with a linear function of the form

    \hat{f}(x) = w_0 + w_1 a_1(x) + \cdots + w_n a_n(x)

 For a global approximation to the target function, we derive methods that choose
the weights w_j to minimize the squared error summed over the set D of training
examples,

    E = \frac{1}{2} \sum_{x \in D} ( f(x) - \hat{f}(x) )^2,

which yields the gradient descent training rule

    \Delta w_j = \eta \sum_{x \in D} ( f(x) - \hat{f}(x) ) \, a_j(x)

where \eta is a small constant (the learning rate).
Locally weighted linear regression
 How shall we modify this procedure to derive a local approximation rather than a
global one?
 The simple way is to redefine the error criterion E to emphasize fitting the local
training examples.
 Three possible criteria are given below, where K(d) is a kernel function that
decreases with the distance d:

    1. Minimize the squared error over just the k nearest neighbors:
       E_1(x_q) = \frac{1}{2} \sum_{x \in k \text{ nearest nbrs of } x_q} ( f(x) - \hat{f}(x) )^2

    2. Minimize the squared error over the entire set D of training examples, while
       weighting the error of each example by a kernel of its distance from x_q:
       E_2(x_q) = \frac{1}{2} \sum_{x \in D} ( f(x) - \hat{f}(x) )^2 \, K(d(x_q, x))

    3. Combine 1 and 2:
       E_3(x_q) = \frac{1}{2} \sum_{x \in k \text{ nearest nbrs of } x_q} ( f(x) - \hat{f}(x) )^2 \, K(d(x_q, x))
Locally weighted linear regression

 Rederiving the gradient descent rule for criterion E_3 gives

    \Delta w_j = \eta \sum_{x \in k \text{ nearest nbrs of } x_q} K(d(x_q, x)) \, ( f(x) - \hat{f}(x) ) \, a_j(x)

 The contribution of instance x to the weight update is now multiplied by the
distance penalty K(d(x_q, x)), and the error is summed over only the k nearest
training examples.
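
The following is a minimal sketch of locally weighted linear regression using the
gradient descent rule above. It reuses the euclidean helper from the first example;
the Gaussian kernel, the parameter defaults, and all names are illustrative
assumptions rather than choices prescribed by the slides.

    import math

    def gaussian_kernel(d, tau=1.0):
        # Distance penalty K(d); tau controls the kernel width (assumed choice).
        return math.exp(-(d ** 2) / (2 * tau ** 2))

    def lwr_predict(training, xq, k=5, eta=0.01, steps=500):
        """Fit a local linear model around xq by gradient descent, then
        evaluate it at xq.

        training: list of (feature_vector, real_value) pairs.
        """
        neighbors = sorted(training, key=lambda ex: euclidean(ex[0], xq))[:k]
        n = len(xq)
        w = [0.0] * (n + 1)  # w[0] is the intercept term w_0

        def f_hat(x):
            return w[0] + sum(w[j + 1] * x[j] for j in range(n))

        for _ in range(steps):
            # Delta w_j = eta * sum over k-NN of K(d) * (f(x) - f_hat(x)) * a_j(x)
            grad = [0.0] * (n + 1)
            for x, y in neighbors:
                kd = gaussian_kernel(euclidean(x, xq))
                err = y - f_hat(x)
                grad[0] += kd * err            # a_0(x) = 1 for the intercept
                for j in range(n):
                    grad[j + 1] += kd * err * x[j]
            for j in range(n + 1):
                w[j] += eta * grad[j]

        return f_hat(xq)

Note that a fresh set of weights is fit for every query point, which is what makes
the approximation local.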
Summary
 k-NN is an instance-based algorithm for approximating real-valued or
discrete-valued target functions, assuming instances correspond to points in an
n-dimensional Euclidean space.
 The target function value for a new query is estimated from the known values of
the k nearest training examples.
 Locally weighted regression methods are a generalization of k-NN in which an
explicit local approximation to the target function is constructed for each query
instance.
 The local approximation to the target function may be based on a variety of
functional forms, such as constant, linear, or quadratic functions, or on spatially
localized kernel functions.
References

TEXT BOOKS:

1. Tom M. Mitchell, "Machine Learning", McGraw-Hill Education (India) Private
Limited, 2013.

REFERENCES:

2. Ethem Alpaydin, "Introduction to Machine Learning" (Adaptive Computation and
Machine Learning), The MIT Press, 2004.
3. Stephen Marsland, "Machine Learning: An Algorithmic Perspective", CRC Press, 2009.
