Locally Weighted Linear Regression
INSTANCE-BASED LEARNING
INTRODUCTION
Instance-based learning methods such as nearest neighbor and locally weighted regression are
conceptually straightforward approaches to approximating real-valued or discrete-valued target
functions.
Learning in these algorithms consists of simply storing the presented training data.
When a new query instance is encountered, a set of similar instances is retrieved from memory
and used to classify the new query instance.
One key difference between these approaches and other methods is that instance-based approaches
can construct a different approximation to the target function for each distinct query instance that must
be classified.
In fact, many techniques construct only a local approximation to the target function that applies in the
neighborhood of the new query instance, and never construct an approximation designed to perform
well over the entire instance space.
This has significant advantages when the target function is very complex, but can still be described by a
collection of less complex local approximations.
Instance-based methods can also use more complex, symbolic representations for instances. In case-
based learning, instances are represented in this fashion and the process for identifying "neighboring"
instances is elaborated accordingly.
Case-based reasoning has been applied to tasks such as storing and reusing past experience at a help
desk, reasoning about legal cases by referring to previous cases, and solving complex scheduling
problems by reusing relevant portions of previously solved problems.
One practical disadvantage of nearest neighbor approaches is that they typically consider all attributes of the instances when
attempting to retrieve similar training examples from memory.
If the target concept depends on only a few of the many available attributes, then the instances that are
truly most "similar" may well be a large distance apart.
The most basic instance-based method is the k-NEAREST NEIGHBOR algorithm. This algorithm assumes
all instances correspond to points in the n-dimensional space \mathbb{R}^n.
The nearest neighbors of an instance are defined in terms of the standard Euclidean distance. More
precisely, let an arbitrary instance x be described by the feature vector

\langle a_1(x), a_2(x), \ldots, a_n(x) \rangle

where a_r(x) denotes the value of the rth attribute of instance x. The distance between two instances x_i and x_j is defined to be d(x_i, x_j), where

d(x_i, x_j) = \sqrt{ \sum_{r=1}^{n} (a_r(x_i) - a_r(x_j))^2 }
Let us first consider learning discrete-valued target functions of the form f : \mathbb{R}^n \to V, where V is the
finite set {v_1, . . . , v_s}. Given a query instance x_q, the k-NEAREST NEIGHBOR algorithm first identifies the k training examples x_1, . . . , x_k nearest to x_q and then returns

\hat{f}(x_q) \leftarrow \underset{v \in V}{\arg\max} \sum_{i=1}^{k} \delta(v, f(x_i))

where \delta(a, b) = 1 if a = b and \delta(a, b) = 0 otherwise. That is, this function returns the most common value of f among the k training examples nearest to x_q.
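As a concrete illustration, here is a minimal Python sketch of this discrete-valued rule (function and variable names are our own):

import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_q, k=3):
    """Return the most common label among the k training examples nearest x_q."""
    # Standard Euclidean distance from the query to every stored instance
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]        # indices of the k closest examples
    return Counter(y_train[nearest]).most_common(1)[0][0]

Note that learning here consists only of storing X_train and y_train; all computation happens at query time.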
Distance-Weighted NEAREST NEIGHBOR Algorithm
One refinement to the k-NEAREST NEIGHBOR algorithm is to weight the contribution of each of the k neighbors according to its distance from the query
point x_q, giving greater weight to closer neighbors.
For example, we might weight the vote of each neighbor according to the inverse square of its distance from x_q:

\hat{f}(x_q) \leftarrow \underset{v \in V}{\arg\max} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))

where

w_i \equiv \frac{1}{d(x_q, x_i)^2}
To accommodate the case where the query point x_q exactly matches one of the training instances x_i, so that
the denominator d(x_q, x_i)^2 is zero, we assign \hat{f}(x_q) to be f(x_i) in this case. If there are several
such training examples, we assign the majority classification among them.
We can distance-weight the instances for real-valued target functions in a similar fashion, replacing the
final line of the algorithm in this case by

\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}

where the denominator normalizes the sum of the weights.
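A minimal Python sketch of this distance-weighted prediction for a real-valued target (names are our own; exact matches are handled as described above):

import numpy as np

def dw_knn_predict(X_train, y_train, x_q, k=5):
    """Distance-weighted k-NN prediction of a real-valued target at x_q."""
    # Euclidean distance from the query to every training instance
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    d = dists[nearest]
    # Exact match: return the stored target value(s) directly
    if np.any(d == 0):
        return y_train[nearest][d == 0].mean()
    w = 1.0 / d ** 2                       # inverse-square distance weights
    return np.dot(w, y_train[nearest]) / w.sum()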
The only disadvantage of considering all examples is that our classifier will run more slowly. If all training
examples are considered when classifying a new query instance, we call the algorithm a global method.
If only the nearest training examples are considered, we call it a local method.
Linear regression is a supervised learning algorithm used to model a linear relationship between an
input (X) and an output (Y).
The steps involved in ordinary linear regression are: fit the coefficients of a single linear function by minimizing the squared error summed over all training examples, then use that one fitted line to predict the output for any query input.
Because a single line is fit to the entire data set, this algorithm cannot be used for making accurate predictions when there
exists a non-linear relationship between X and Y. In such cases, locally weighted linear regression is
used.
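For reference, a minimal ordinary-least-squares sketch in Python (names are our own):

import numpy as np

def fit_linear_regression(X, y):
    """Fit one global line: theta minimizing the squared error ||Xb @ theta - y||^2."""
    # Prepend a column of ones so theta[0] serves as the intercept
    Xb = np.column_stack([np.ones(len(X)), X])
    theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return theta

def predict_linear(theta, x_q):
    """Predict the output for a query input using the single fitted line."""
    return theta[0] + np.dot(theta[1:], np.ravel(x_q))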
Locally Weighted Linear Regression:
Locally weighted linear regression fits a separate linear model for every query point, weighting each training example by how close it lies to the query. A common choice is the Gaussian weighting function

w^{(i)} = \exp\left( -\frac{(x^{(i)} - x_q)^2}{2\tau^2} \right)

where the bandwidth parameter \tau controls how quickly the weight falls off with distance. For example, with a small \tau only training points very close to x_q influence the fit, while a large \tau makes the fit approach ordinary linear regression.
Points to remember:
Locally weighted linear regression is a supervised learning algorithm.
It is a non-parametric algorithm.
There is no training phase: all the work is done during the testing phase, while making
predictions (see the sketch after this list).
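A minimal Python sketch of that prediction-time work, assuming the Gaussian weighting above and a default bandwidth chosen only for illustration:

import numpy as np

def lwlr_predict(X_train, y_train, x_q, tau=0.5):
    """Fit a weighted linear model centered on x_q and predict f(x_q).

    X_train: (m, n) array of inputs; y_train: (m,) targets.
    tau: kernel bandwidth (illustrative default; tune per data set).
    """
    m = len(X_train)
    Xb = np.column_stack([np.ones(m), X_train])    # add intercept column
    xq = np.concatenate([[1.0], np.ravel(x_q)])
    # Gaussian weights: training points near the query dominate the local fit
    w = np.exp(-((X_train - x_q) ** 2).sum(axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # Weighted normal equations: theta = (Xb^T W Xb)^{-1} Xb^T W y
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y_train)
    return float(xq @ theta)

Note that a fresh theta is solved for every query, which is exactly why there is no separate training phase.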
The nearest-neighbor approach approximates the target function f(x) at the single query point
x = x_q.
Locally weighted regression is a generalization of this approach. It constructs an explicit
approximation to f over a local region surrounding x_q.
Locally weighted regression uses nearby or distance-weighted training examples to form this
local approximation to f. For example, we might approximate the target function in the
neighborhood surrounding x_q using a linear function, a quadratic function, a multilayer neural
network, or some other functional form.
The phrase "locally weighted regression" is called
local - because the function is approximated based a only on data near the query point,
weighted - because the contribution of each training example is weighted by its distance from
the query point
regression - because this is the term used widely in the statistical learning community for the
problem of approximating real-valued functions.
Given a new query instance x_q, the general approach in locally weighted regression is to
construct an approximation \hat{f} that fits the training examples in the neighborhood surrounding
x_q.
This approximation is then used to calculate the value \hat{f}(x_q), which is output as the estimated
target value for the query instance. The description of \hat{f} may then be deleted, because a
different local approximation will be calculated for each distinct query instance.
Locally Weighted Linear Regression
Let us consider the case of locally weighted regression in which the target function f is approximated near x_q using a linear function of the form

\hat{f}(x) = w_0 + w_1 a_1(x) + \cdots + w_n a_n(x)

where a_r(x) denotes the value of the rth attribute of the instance x. Gradient descent is used to find the coefficients w_0, . . . , w_n that minimize the error in fitting
such linear functions to a given set of training examples.
We need to choose weights that minimize the squared error summed over the set D of training
examples, weighting the error contribution of each training example by some decreasing function K of its distance from x_q:

E(x_q) = \frac{1}{2} \sum_{x \in D} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))

Minimizing this criterion with gradient descent yields the training rule

\Delta w_j = \eta \sum_{x \in D} K(d(x_q, x)) \, (f(x) - \hat{f}(x)) \, a_j(x)

where \eta is a small learning-rate constant.
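A minimal sketch of this gradient-descent fit, assuming a Gaussian kernel K and illustrative values for the bandwidth, learning rate, and iteration count:

import numpy as np

def lwlr_fit_gd(X_train, y_train, x_q, tau=0.5, eta=0.01, iters=500):
    """Fit local coefficients w by gradient descent on the
    distance-weighted squared error around the query x_q."""
    m = len(X_train)
    Xb = np.column_stack([np.ones(m), X_train])    # intercept + attributes a_j(x)
    # Kernel K(d(x_q, x)): Gaussian in the squared Euclidean distance
    K = np.exp(-((X_train - x_q) ** 2).sum(axis=1) / (2 * tau ** 2))
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        err = y_train - Xb @ w                     # f(x) - f_hat(x) for each x in D
        w += eta * (Xb.T @ (K * err))              # the delta-w_j rule above
    return w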
One approach to function approximation that is closely related to distance-weighted regression and also
to artificial neural networks is learning with radial basis functions.
In this approach, the learned hypothesis is a function of the form

\hat{f}(x) = w_0 + \sum_{u=1}^{k} w_u \, K_u(d(x_u, x))    -------------(1)

where each x_u is an instance from the instance space and the kernel function K_u(d(x_u, x)) is defined so that it decreases as the distance d(x_u, x) increases. It is common to choose each K_u to be a Gaussian function centered at x_u with variance \sigma_u^2:

K_u(d(x_u, x)) = \exp\left( -\frac{d^2(x_u, x)}{2\sigma_u^2} \right)
The function given by Equation (1) can be viewed as describing a two-layer network where the first layer
of units computes the values of the various kernel functions K_u(d(x_u, x)) and
the second layer computes a linear combination of these first-layer unit values.
[Figure: an example radial basis function (RBF) network.]
RBF networks are typically trained in two stages: first, the number k of hidden units is determined and each kernel center x_u and width \sigma_u is chosen; second, the weights w_u are trained to fit the training data. Because the kernel functions are held fixed
during this second stage, the linear weight values w_u can be trained very efficiently.
Thus, radial basis function networks provide a global approximation to the target function,
represented by a linear combination of many local kernel functions. The value for any given
kernel function is non-negligible only when the input x falls into the region defined by its
particular center and width.
Thus, the network can be viewed as a smooth linear combination of many local approximations
to the target function. One key advantage of RBF networks is that they can be trained much
more efficiently than feedforward networks trained with BACKPROPAGATION. This follows from
the fact that the input layer and the output layer of an RBF network are trained separately.
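To make the two-stage training concrete, here is a minimal Python sketch assuming Gaussian kernels with a shared width and centers chosen in advance (for example, by sampling training instances); names and defaults are our own:

import numpy as np

def rbf_features(X, centers, sigma=1.0):
    """First layer: Gaussian activations K_u(d(x_u, x)) for each center x_u."""
    # Squared Euclidean distance between every input and every kernel center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

def train_rbf(X_train, y_train, centers, sigma=1.0):
    """Second stage: kernels held fixed, so the weights w_u are a linear fit."""
    Phi = np.column_stack([np.ones(len(X_train)),
                           rbf_features(X_train, centers, sigma)])
    w, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)
    return w

def rbf_predict(w, x, centers, sigma=1.0):
    """Global approximation: w_0 plus a linear combination of local kernels."""
    phi = rbf_features(np.atleast_2d(x), centers, sigma)[0]
    return w[0] + np.dot(w[1:], phi)

Because the second stage reduces to a linear least-squares problem, it avoids the iterative, nonlinear optimization that BACKPROPAGATION requires.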