
Module - 5

INSTANCE-BASED LEARNING

 Instance-based learning methods simply store the training examples.


 Generalizing beyond these examples is postponed until a new instance must be classified.
 Each time a new query instance is encountered, its relationship to the previously stored
examples is examined in order to assign a target function value for the new instance.
 Instance-based learning includes nearest neighbor and locally weighted regression methods that
assume instances can be represented as points in a Euclidean space.
 It also includes case-based reasoning methods that use more complex, symbolic representations
for instances.
 Instance-based methods are sometimes referred to as "lazy" learning methods because they
delay processing until a new instance must be classified.
 A key advantage of this kind of delayed, or lazy, learning is that instead of estimating the target
function once for the entire instance space, these methods can estimate it locally and differently
for each new instance to be classified.

INTRODUCTION

Instance-based learning methods such as nearest neighbor and locally weighted regression are
conceptually straightforward approaches to approximating real-valued or discrete-valued target
functions.

Learning in these algorithms consists of simply storing the presented training data.

When a new query instance is encountered, a set of similar related instances is retrieved from memory
and used to classify the new query instance.

One key difference between these approaches and other methods is that instance-based approaches
can construct a different approximation to the target function for each distinct query instance that must
be classified.

In fact, many techniques construct only a local approximation to the target function that applies in the
neighborhood of the new query instance, and never construct an approximation designed to perform
well over the entire instance space.

This has significant advantages when the target function is very complex, but can still be described by a
collection of less complex local approximations.

Instance-based methods can also use more complex, symbolic representations for instances. In case-
based learning, instances are represented in this fashion and the process for identifying "neighboring"
instances is elaborated accordingly.

Case-based reasoning has been applied to tasks such as storing and reusing past experience at a help
desk, reasoning about legal cases by referring to previous cases, and solving complex scheduling
problems by reusing relevant portions of previously solved problems.

Disadvantages of instance-based approaches


1. The cost of classifying new instances can be high. This is because nearly all computation takes place at classification time rather than when the training examples are first encountered.

2. Nearest-neighbor approaches in particular typically consider all attributes of the instances when attempting to retrieve similar training examples from memory.

If the target concept depends on only a few of the many available attributes, then the instances that are
truly most "similar" may well be a large distance apart.

k-NEAREST NEIGHBOR LEARNING

The most basic instance-based method is the k-NEAREST NEIGHBOR algorithm. This algorithm assumes all instances correspond to points in the n-dimensional space ℝⁿ.

The nearest neighbors of an instance are defined in terms of the standard Euclidean distance. More precisely, let an arbitrary instance x be described by the feature vector

$$\big\langle a_1(x),\, a_2(x),\, \ldots,\, a_n(x) \big\rangle$$

where a_r(x) denotes the value of the rth attribute of instance x. The distance between two instances x_i and x_j is then defined to be d(x_i, x_j), where

$$d(x_i, x_j) \equiv \sqrt{\sum_{r=1}^{n} \big(a_r(x_i) - a_r(x_j)\big)^2}$$

In nearest-neighbor learning the target function may be either discrete-valued or real-valued.

Let us first consider learning discrete-valued target functions of the form f : ℝⁿ → V, where V is the finite set {v_1, . . . , v_s}. The k-NEAREST NEIGHBOR algorithm for approximating a discrete-valued target function is given in Table 8.1.

This function returns the most common value of f among the k training examples nearest to x_q.
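A minimal Python sketch of this basic k-NN classifier may help make the procedure concrete. It follows the description above (Euclidean distance, majority vote among the k nearest stored examples); the function name knn_classify and the toy data are illustrative, not from the original text.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_q, k=3):
    """Minimal k-nearest-neighbor classifier: majority vote among the
    k training examples closest to the query point x_q (Euclidean distance)."""
    # Euclidean distance from the query to every stored training example
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
    # Indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # Return the most common class label among those neighbors
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Example usage with a toy 2-D data set
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_classify(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> "A"
```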
Distance-Weighted NEAREST NEIGHBOR Algorithm

One refinement to the k-NEAREST NEIGHBOR algorithm is to weight the contribution of each of the k neighbors according to its distance from the query point x_q, giving greater weight to closer neighbors.

For example, we might weight the vote of each neighbor according to the inverse square of its distance from x_q:

$$\hat{f}(x_q) \leftarrow \underset{v \in V}{\arg\max} \sum_{i=1}^{k} w_i \,\delta\big(v, f(x_i)\big)$$

where

$$w_i \equiv \frac{1}{d(x_q, x_i)^2}$$

and δ(a, b) = 1 if a = b and 0 otherwise.

To accommodate the case where the query point x_q exactly matches one of the training instances x_i, so that the denominator d(x_q, x_i)² is zero, we assign f̂(x_q) to be f(x_i) in this case. If there are several such training examples, we assign the majority classification among them.
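The distance-weighted vote can be sketched in the same style. This is an illustrative Python sketch, assuming the inverse-square weights above; for brevity it returns the target value of the first exact match rather than taking a majority among several exact matches.

```python
import numpy as np
from collections import defaultdict

def distance_weighted_knn_classify(X_train, y_train, x_q, k=5):
    """Distance-weighted k-NN vote: each of the k nearest neighbors votes for
    its class with weight w_i = 1 / d(x_q, x_i)^2; an exact match wins outright."""
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    votes = defaultdict(float)
    for i in nearest:
        if dists[i] == 0.0:               # query coincides with a stored instance
            return y_train[i]
        votes[y_train[i]] += 1.0 / dists[i] ** 2
    return max(votes, key=votes.get)
```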

We can distance-weight the instances for real-valued target functions in a similar fashion, replacing the final line of the algorithm in this case by

$$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} w_i\, f(x_i)}{\sum_{i=1}^{k} w_i}$$

Note that with distance weighting there is no harm in allowing all training examples, not just the k nearest, to influence the classification, because very distant examples have very little effect on f̂(x_q). The only disadvantage of considering all examples is that the classifier will run more slowly. If all training examples are considered when classifying a new query instance, we call the algorithm a global method. If only the nearest training examples are considered, we call it a local method.
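For real-valued target functions, a hedged sketch of the distance-weighted (local, k-neighbor) version might look as follows; the name distance_weighted_knn is illustrative.

```python
import numpy as np

def distance_weighted_knn(X_train, y_train, x_q, k=5):
    """Distance-weighted k-NN for a real-valued target: each of the k nearest
    neighbors contributes in proportion to the inverse square of its distance."""
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # If the query coincides with a stored instance, return its target value directly
    if dists[nearest[0]] == 0.0:
        return y_train[nearest[0]]
    w = 1.0 / dists[nearest] ** 2          # w_i = 1 / d(x_q, x_i)^2
    return np.dot(w, y_train[nearest]) / w.sum()
```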

Locally weighted Linear Regression


Prerequisite: Linear Regression

Linear regression is a supervised learning algorithm used for computing linear relationships between
input (X) and output (Y).
The steps involved in ordinary linear regression are, in outline: fit a single set of coefficients to the entire training set by minimizing the sum of squared errors between the predicted and actual outputs, and then use that one global linear model to predict the output for any new input.
Such a single global line cannot make good predictions when there exists a non-linear relationship between X and Y. In such cases, locally weighted linear regression is used.
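As a point of reference, a minimal sketch of ordinary (global) linear regression is shown below, using NumPy's least-squares solver; the helper name fit_linear_regression and the toy data are illustrative.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares: choose theta minimizing sum((y - X @ theta)^2).
    X is an (m, n) design matrix that already includes a column of ones for the bias."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

# Toy usage: y is roughly 2*x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])      # add bias column
y = np.array([1.1, 2.9, 5.2, 6.8])
theta = fit_linear_regression(X, y)
print(theta)          # approximately [1.1, 1.9]
print(X @ theta)      # predictions for the training inputs
```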
Locally Weighted Linear Regression:

Locally weighted linear regression fits a separate linear model for each query point, giving more weight to training examples that lie close to the query and less weight to those that lie far from it. For example, a query that falls in a curved region of the data is fit using only the nearby, approximately linear portion of that curve.
Points to remember:
 Locally weighted linear regression is a supervised learning algorithm.
 It is a non-parametric algorithm.
 There is no training phase: all of the work is done during the testing phase, i.e. while making predictions.

LOCALLY WEIGHTED REGRESSION

 The nearest-neighbor approach approximates the target function f(x) at the single query point x = x_q. Locally weighted regression is a generalization of this approach: it constructs an explicit approximation to f over a local region surrounding x_q.
 Locally weighted regression uses nearby or distance-weighted training examples to form this local approximation to f. For example, we might approximate the target function in the neighborhood surrounding x_q using a linear function, a quadratic function, a multilayer neural network, or some other functional form.
 The phrase "locally weighted regression" is explained as follows:
local - because the function is approximated based only on data near the query point,
weighted - because the contribution of each training example is weighted by its distance from the query point,
regression - because this is the term used widely in the statistical learning community for the problem of approximating real-valued functions.
Given a new query instance x_q, the general approach in locally weighted regression is to construct an approximation f̂ that fits the training examples in the neighborhood surrounding x_q. This approximation is then used to calculate the value f̂(x_q), which is output as the estimated target value for the query instance. The description of f̂ may then be deleted, because a different local approximation will be calculated for each distinct query instance.

Locally Weighted Linear Regression


Let us consider the case of locally weighted regression in which the target function f is approximated near x_q using a linear function of the form

$$\hat{f}(x) = w_0 + w_1 a_1(x) + \cdots + w_n a_n(x)$$

where a_i(x) denotes the value of the ith attribute of the instance x.

Gradient descent can be used to find the coefficients w_0, . . . , w_n that minimize the error in fitting such linear functions to a given set of training examples. We choose weights that minimize the squared error summed over the set D of training examples:

$$E \equiv \frac{1}{2} \sum_{x \in D} \big(f(x) - \hat{f}(x)\big)^2$$

for which the gradient descent training rule is

$$\Delta w_j = \eta \sum_{x \in D} \big(f(x) - \hat{f}(x)\big)\, a_j(x)$$

where η is a constant learning rate.
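A small sketch of this batch gradient descent rule, applied to ordinary (global) linear regression, is shown below; the function name and learning-rate value are illustrative, and η must be small enough for the updates to converge.

```python
import numpy as np

def gradient_descent_linear_fit(A, y, eta=0.01, n_iters=5000):
    """Fit w for f_hat(x) = w . a(x) by repeatedly applying the training rule
    delta_w_j = eta * sum_x (f(x) - f_hat(x)) * a_j(x).
    A is the (m, n+1) matrix of attribute vectors including a leading 1 for w_0."""
    w = np.zeros(A.shape[1])
    for _ in range(n_iters):
        error = y - A @ w              # f(x) - f_hat(x) for every training example
        w += eta * A.T @ error         # apply delta_w to all coefficients at once
    return w

# Toy usage: y is roughly 2*x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
A = np.column_stack([np.ones_like(x), x])
y = np.array([1.1, 2.9, 5.2, 6.8])
print(gradient_descent_linear_fit(A, y))   # approximately [1.1, 1.9]
```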


To obtain a local approximation, a simple way is to redefine the error criterion E to emphasize fitting the local training examples. Three possible criteria are given below. Note that we write the error E(x_q) to emphasize the fact that the error is now being defined as a function of the query point x_q.

1. Minimize the squared error over just the k nearest neighbors:

$$E_1(x_q) \equiv \frac{1}{2} \sum_{x \in\, k \text{ nearest nbrs of } x_q} \big(f(x) - \hat{f}(x)\big)^2$$

2. Minimize the squared error over the entire set D of training examples, weighting the error of each example by some decreasing kernel function K of its distance from x_q:

$$E_2(x_q) \equiv \frac{1}{2} \sum_{x \in D} \big(f(x) - \hat{f}(x)\big)^2\, K\big(d(x_q, x)\big)$$

3. Combine 1 and 2:

$$E_3(x_q) \equiv \frac{1}{2} \sum_{x \in\, k \text{ nearest nbrs of } x_q} \big(f(x) - \hat{f}(x)\big)^2\, K\big(d(x_q, x)\big)$$
Criterion three is a good approximation to criterion two and has the advantage that
computational cost is independent of the total number of training examples; its cost depends
only on the number k of neighbors considered.
If we choose criterion three above and rederive the gradient descent rule, we obtain

$$\Delta w_j = \eta \sum_{x \in\, k \text{ nearest nbrs of } x_q} K\big(d(x_q, x)\big)\, \big(f(x) - \hat{f}(x)\big)\, a_j(x)$$
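Putting the pieces together, the sketch below fits a locally weighted linear model at a single query point using gradient descent on criterion three, with a Gaussian kernel K(d) = exp(−d²/(2τ²)) over the k nearest neighbors. The function name lwr_predict, the kernel choice, and the centring of the inputs at x_q are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

def lwr_predict(X_train, y_train, x_q, k=20, tau=0.5, eta=0.01, n_iters=2000):
    """Locally weighted linear regression at a single query point x_q:
    fit w by gradient descent on criterion three, i.e. squared error over the
    k nearest neighbors weighted by a Gaussian kernel K(d) = exp(-d^2 / (2 tau^2))."""
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    K = np.exp(-dists[nearest] ** 2 / (2 * tau ** 2))           # kernel weights
    # Attribute vectors a(x) = (1, x - x_q): centring at x_q keeps the
    # gradient steps well conditioned and makes f_hat(x_q) equal to w_0.
    A = np.column_stack([np.ones(len(nearest)), X_train[nearest] - x_q])
    y = y_train[nearest]

    w = np.zeros(A.shape[1])
    for _ in range(n_iters):
        error = y - A @ w
        # delta_w_j = eta * sum_x K(d(x_q, x)) * (f(x) - f_hat(x)) * a_j(x)
        w += eta * A.T @ (K * error)
    return w[0]                                                  # f_hat(x_q)

# Toy usage on a 1-D non-linear target
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
print(lwr_predict(X, y, np.array([3.0])))    # roughly sin(3.0)
```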

Remarks on Locally Weighted Regression


Above we considered using a linear function to approximate f in the neighborhood of the query instance x_q. In most cases, the target function is approximated by a constant, linear, or quadratic function. More complex functional forms are not often used because (1) the cost of fitting more complex functions for each query instance is prohibitively high, and (2) these simple approximations model the target function quite well over a sufficiently small subregion of the instance space.

Radial Basis Functions (RBF)

One approach to function approximation that is closely related to distance-weighted regression and also
to artificial neural networks is learning with radial basis functions.

In this approach, the learned hypothesis is a function of the form

$$\hat{f}(x) = w_0 + \sum_{u=1}^{k} w_u\, K_u\big(d(x_u, x)\big) \qquad (1)$$

where each x_u is an instance from the instance space and the kernel function K_u(d(x_u, x)) is defined so that it decreases as the distance d(x_u, x) increases. A common choice is a Gaussian kernel centered at x_u with some width (variance) σ_u².
The function given by Equation (1) can be viewed as describing a two-layer network where the first layer of units computes the values of the various K_u(d(x_u, x)) and the second layer computes a linear combination of these first-layer unit values. RBF networks are typically trained in two stages: first, the number k of kernel units and each unit's center x_u and width σ_u are chosen; second, the weights w_u are trained to fit the training data. Because the kernel functions are held fixed during this second stage, the linear weight values w_u can be trained very efficiently.
Thus, radial basis function networks provide a global approximation to the target function,
represented by a linear combination of many local kernel functions. The value for any given
kernel function is non-negligible only when the input x falls into the region defined by its
particular center and width.
Thus, the network can be viewed as a smooth linear combination of many local approximations
to the target function. One key advantage of RBF networks is that they can be trained much more efficiently than feedforward networks trained with BACKPROPAGATION. This follows from the fact that the input layer and the output layer of an RBF network are trained separately.
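A minimal RBF-network sketch in the same two-stage spirit is given below: the kernel centers are simply a random subset of the training inputs with a shared width σ (one common, but not the only, way to fix them), and the linear output weights are then trained by least squares. Names such as train_rbf_network are illustrative.

```python
import numpy as np

def train_rbf_network(X, y, n_hidden=10, sigma=1.0, seed=0):
    """Two-stage RBF training: (1) fix the kernel centers (here, a random
    subset of the training inputs) and a shared width sigma; (2) solve for
    the linear output weights w by least squares, since the kernels are fixed."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_hidden, replace=False)]

    def hidden_layer(Xq):
        # Gaussian kernel activations K_u(d(x_u, x)) for every query row
        d2 = ((Xq[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        K = np.exp(-d2 / (2 * sigma ** 2))
        return np.column_stack([np.ones(len(Xq)), K])   # prepend bias unit w_0

    H = hidden_layer(X)
    w, *_ = np.linalg.lstsq(H, y, rcond=None)            # train output weights only
    return lambda Xq: hidden_layer(Xq) @ w

# Toy usage on a 1-D non-linear target
X = np.linspace(0, 6, 80).reshape(-1, 1)
y = np.sin(X[:, 0])
rbf = train_rbf_network(X, y, n_hidden=12, sigma=0.8)
print(rbf(np.array([[3.0]])))   # should be close to sin(3.0)
```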
