
k-NN Algorithm Overview


Instance-Based Learning Overview


Concept:
Instance-based learning methods, such as nearest neighbor and locally weighted
regression, do not construct a single global model. Instead, they store the training data
and use it to approximate the target function when a new query instance is
encountered.

How It Works:

When a new query comes in, similar instances are retrieved from the stored data.

A local approximation (or prediction) is made using these retrieved instances.

This approach is beneficial for complex target functions since a collection of simpler,
local models can better capture the variations.

Types & Applications:

k-Nearest Neighbor (k-NN):


Classifies a query based on the majority label of the k closest stored instances.

Locally Weighted Regression:


Constructs a weighted model in the local region around a query.

Radial Basis Function (RBF) Networks:


A hybrid method combining instance-based learning and neural networks.

Case-Based Reasoning (CBR):


Uses symbolic representations to retrieve and reuse past cases, useful in domains
such as help desks, legal reasoning, and scheduling.

Advantages:

Highly Flexible:
Adapts dynamically to each query without assuming a fixed function.

Effective for Complex Functions:
Works well when the global function is too complex to capture accurately.

Disadvantages:

High Classification Cost:


Most computation occurs at query time, making it inefficient for very large datasets.

Sensitive to Irrelevant Features:


Methods like nearest neighbor treat all attributes equally, even if only a few are
important.

Key Comparisons:

Lazy Learning vs. Eager Learning:

Lazy Learning (e.g., k-NN, CBR):


Delays computation until classification time, making it adaptive but memory-intensive.

Eager Learning (e.g., decision trees, neural networks):


Builds a general model upfront, enabling faster predictions but less adaptability.

k-Nearest Neighbor (k-NN) Algorithm


Instance Representation
Each instance $x$ is represented as a point in an $n$-dimensional space by its feature vector:

$$x = \langle a_1(x), a_2(x), \ldots, a_n(x) \rangle.$$

Euclidean Distance
The distance between two instances $x_i$ and $x_j$ is calculated using:

$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \bigl(a_r(x_i) - a_r(x_j)\bigr)^2}.$$
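As a quick illustration, this distance is a one-liner with NumPy; the function name below is an illustrative sketch, not a library call:

import numpy as np

def euclidean_distance(x_i, x_j):
    # d(x_i, x_j) = square root of the sum of squared attribute differences
    return np.sqrt(np.sum((np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)) ** 2))

# Example: two 3-dimensional instances
print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 5.0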

Algorithm Phases

Training Phase:
Simply store all training examples $(x, f(x))$ without constructing an explicit general model.

Classification Phase (Discrete-Valued Targets):

1. Distance Calculation:
For a query instance $x_q$, compute its distance from every training example.

2. Neighbor Selection:
Identify the $k$ training examples with the smallest distances to $x_q$.

3. Majority Voting:
For each class label $v$ in the set

$$V = \{v_1, v_2, \ldots, v_m\},$$

count the number of times $v$ appears among the $k$ nearest neighbors. This is expressed as:

$$\hat{f}(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} \mathbb{1}\bigl(f(x_i) = v\bigr),$$

where $\mathbb{1}(\text{condition})$ equals 1 if the condition is true and 0 otherwise.

Special Case (k = 1):

Simply assign

$$\hat{f}(x_q) = f(x_1),$$

where $x_1$ is the single nearest neighbor.

Adaptation for Continuous-Valued Targets:

Instead of voting, compute the mean of the target values of the $k$ nearest neighbors:

$$\hat{f}(x_q) = \frac{1}{k} \sum_{i=1}^{k} f(x_i).$$
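Both phases are short enough to sketch directly in code. The following is a minimal NumPy implementation of plain k-NN for discrete (majority-vote) and continuous (mean) targets; the function and variable names are illustrative, not from any particular library:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_q, k=3, discrete=True):
    # Training phase is just storage; all work happens here at query time.
    X_train = np.asarray(X_train, dtype=float)
    x_q = np.asarray(x_q, dtype=float)

    # 1. Distance calculation: Euclidean distance to every training example
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))

    # 2. Neighbor selection: indices of the k smallest distances
    nearest = np.argsort(dists)[:k]

    if discrete:
        # 3. Majority voting over the labels of the k nearest neighbors
        votes = Counter(y_train[i] for i in nearest)
        return votes.most_common(1)[0][0]
    # Continuous targets: mean of the k nearest target values
    return float(np.mean([y_train[i] for i in nearest]))

# Toy examples
X = [[1, 1], [1, 2], [6, 6], [7, 7]]
print(knn_predict(X, ['A', 'A', 'B', 'B'], [2, 2], k=3))                      # 'A'
print(knn_predict(X, [1.0, 1.5, 6.0, 7.5], [6.5, 6.5], k=2, discrete=False))  # 6.75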

Decision Boundaries (Voronoi Diagram):


In 1-NN, the decision surface forms a Voronoi diagram. Each training example has an
associated Voronoi cell (a convex polyhedron) containing all points closer to it than to
any other training example.

Key Characteristics:

Lazy Learning:
No explicit model is built during training; all processing is done at query time.

Non-Parametric:
No assumptions are made about the data distribution.

Sensitivity to Feature Scaling:


Since distances are computed using all attributes, proper scaling is essential.
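Because every attribute enters the distance, an attribute measured on a large scale can dominate the others. A common remedy is to standardize each attribute before computing distances; the snippet below is a minimal sketch with illustrative names:

import numpy as np

def standardize(X_train, X_query):
    # Z-score each attribute using statistics from the training data only
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0.0] = 1.0        # guard against constant attributes
    scale = lambda X: (np.asarray(X, dtype=float) - mean) / std
    return scale(X_train), scale(X_query)

# Example: income (large scale) no longer swamps age (small scale)
X_tr, X_q = standardize([[25, 40000], [50, 90000]], [[30, 45000]])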

Distance-Weighted Nearest Neighbor Algorithm

Idea
Instead of treating each of the k nearest neighbors equally, assign a weight based on the distance to the query point $x_q$. Closer neighbors have a higher influence.

Weight Calculation
A common weight function is the inverse square of the distance:

$$w_i = \frac{1}{d(x_q, x_i)^2}.$$

For Discrete-Valued Targets


Prediction:
Choose the class that maximizes the sum of weighted votes:

$$\hat{f}(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} w_i \cdot \mathbb{1}\bigl(f(x_i) = v\bigr).$$

Special Case:
If $x_q$ exactly matches a training instance $x_i$ (i.e., $d(x_q, x_i) = 0$), assign that instance's class (or use a majority vote if multiple exact matches exist).

For Continuous-Valued Targets

Prediction:
Compute the weighted average:

$$\hat{f}(x_q) = \frac{\sum_{i=1}^{k} w_i \, f(x_i)}{\sum_{i=1}^{k} w_i}.$$
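A compact sketch covering both the weighted vote and the weighted average is given below. As before, the names are illustrative, and the exact-match handling is one simple choice rather than the only one:

import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x_q, k=3, discrete=True):
    X_train = np.asarray(X_train, dtype=float)
    x_q = np.asarray(x_q, dtype=float)

    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]

    # Special case: exact match (distance 0). This sketch returns the first
    # matching instance's target; a majority vote over all exact matches is
    # another option when several exist.
    for i in nearest:
        if dists[i] == 0.0:
            return y_train[i]

    weights = 1.0 / dists[nearest] ** 2          # w_i = 1 / d(x_q, x_i)^2

    if discrete:
        # Weighted vote: the class with the largest total weight wins
        totals = defaultdict(float)
        for w, i in zip(weights, nearest):
            totals[y_train[i]] += w
        return max(totals, key=totals.get)

    # Weighted average of the neighbors' target values
    values = np.array([y_train[i] for i in nearest], dtype=float)
    return float(np.sum(weights * values) / np.sum(weights))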

Local vs. Global Methods


Local Method:
Use only the k nearest neighbors.

Global Method:
Consider all training examples, where distant ones have minimal impact due to very
small weights. When the global approach is used with this weighted rule, it is known as
Shepard’s Method.

Locally Weighted Regression (LWR) – Detailed Exam Answer

Overview & Concept

Definition:
Locally Weighted Regression builds an explicit model to approximate the target function $f(x)$ in the vicinity of a given query point $x_q$. Unlike global regression, LWR constructs a separate model for each query using only nearby training data.

Key Idea:
The method focuses on data points that are "close" to $x_q$ by assigning them higher weights. This local model is then used to predict the target value at $x_q$.

Local Model Construction


Model Choice:
The local model can be a simple function (e.g., constant, linear, or quadratic).
Example (Linear Model):

$$\hat{f}(x) = w_0 + w_1 a_1(x) + \cdots + w_n a_n(x),$$

where $a_1(x), a_2(x), \ldots, a_n(x)$ are the attributes of $x$.

Weighting Mechanism:

Purpose:
Ensure that training examples closer to $x_q$ have more influence.

Weight Function:
A kernel function $K(d(x_q, x))$ is used to assign weights based on the distance $d(x_q, x)$.

Example (Gaussian Kernel):

$$K\bigl(d(x_q, x)\bigr) = \exp\!\left(-\frac{d(x_q, x)^2}{2\tau^2}\right),$$

where $\tau$ is the bandwidth parameter controlling the rate of decay with distance.
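The role of $\tau$ is easy to see numerically; a tiny sketch with illustrative values:

import numpy as np

def gaussian_kernel(d, tau):
    # K(d) = exp(-d^2 / (2 * tau^2)); smaller tau makes weights decay faster
    return np.exp(-(d ** 2) / (2.0 * tau ** 2))

d = np.array([0.0, 1.0, 2.0, 4.0])
print(gaussian_kernel(d, tau=1.0))   # ~[1.00, 0.61, 0.14, 0.0003]
print(gaussian_kernel(d, tau=4.0))   # ~[1.00, 0.97, 0.88, 0.61]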

Error Criterion & Model Fitting


Weighted Error Function:
The parameters $w_0, w_1, \ldots, w_n$ are determined by minimizing the weighted squared error:

$$E(x_q) = \sum_{x \in D} K\bigl(d(x_q, x)\bigr)\,\bigl(f(x) - \hat{f}(x)\bigr)^2,$$

where $D$ is the set of training examples (or a subset, such as the $k$ nearest neighbors).

Fitting Process:

1. Distance Calculation:
For the query point $x_q$, calculate $d(x_q, x)$ for each training example.

2. Weight Computation:
Compute the weight $K(d(x_q, x))$ for each example.

3. Solve Weighted Least Squares:
Determine the local coefficients $w_0, w_1, \ldots, w_n$.

4. Prediction:
Use the local model to predict the target value at $x_q$: $\hat{f}(x_q)$.

Note:
The local model is discarded after the prediction is made.
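The whole process reduces to one weighted least-squares solve per query. Below is a minimal NumPy sketch of LWR with a local linear model (the function name and the toy data are illustrative); the fitted coefficients are used once and then discarded, exactly as described:

import numpy as np

def lwr_predict(X_train, y_train, x_q, tau=1.0):
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_q = np.asarray(x_q, dtype=float)

    # 1. Distances from the query to every training example
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))

    # 2. Gaussian kernel weights K(d) = exp(-d^2 / (2 tau^2))
    weights = np.exp(-(dists ** 2) / (2.0 * tau ** 2))

    # 3. Weighted least squares: prepend a bias column, scale each row by
    #    sqrt(weight), then solve for the local coefficients w_0, ..., w_n
    A = np.hstack([np.ones((len(X_train), 1)), X_train])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y_train * sw.ravel(), rcond=None)

    # 4. Prediction with the local model; the model is then thrown away
    return float(coef[0] + coef[1:] @ x_q)

# Example: approximate y = x^2 locally around x_q = 1.5
X = np.linspace(-3, 3, 61)[:, None]
y = X.ravel() ** 2
print(lwr_predict(X, y, np.array([1.5]), tau=0.2))   # ~2.29 (true value 1.5**2 = 2.25)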

Gradient Descent Alternative:
Instead of solving directly, update the coefficients using gradient descent:

$$w_j \leftarrow w_j - \eta \, \frac{\partial E(x_q)}{\partial w_j},$$

where $\eta$ is the learning rate.
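A sketch of this iterative alternative is shown below, using the weighted error $E(x_q)$ defined above. The learning rate and iteration count are arbitrary assumptions for illustration, and the kernel weights are assumed to be precomputed as in the earlier sketch:

import numpy as np

def lwr_fit_gd(X_train, y_train, kernel_weights, eta=0.001, n_iters=1000):
    # Fit local linear coefficients by gradient descent on the weighted error
    A = np.hstack([np.ones((len(X_train), 1)), np.asarray(X_train, dtype=float)])
    y = np.asarray(y_train, dtype=float)
    w = np.zeros(A.shape[1])                 # coefficients w_0, w_1, ..., w_n

    for _ in range(n_iters):
        residual = y - A @ w                 # f(x) - f_hat(x) for each example
        # dE/dw_j = -2 * sum_x K(d(x_q, x)) * residual(x) * a_j(x)
        grad = -2.0 * A.T @ (kernel_weights * residual)
        w -= eta * grad                      # w_j <- w_j - eta * dE/dw_j
    return w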

Advantages
Local Adaptation:
Captures variations in $f(x)$ that differ across regions.

Noise Mitigation:
The weighting reduces the impact of noisy, distant data points.

Flexibility:
Various functional forms (linear, quadratic) can be chosen based on the local complexity.

Considerations & Challenges


Computational Cost:
A separate model is fitted for each query, which can be expensive for large datasets.

Parameter Tuning:
Choosing the kernel bandwidth τ or the number of nearest neighbors is critical.

Risk of Overfitting:
Overly complex models may fit noise in the small local dataset if not regularized
properly.

Summary
Locally Weighted Regression approximates the target function $f(x)$ by fitting a simple, local model around each query point $x_q$.

It uses a weighting function $K(d(x_q, x))$ to emphasize nearby training examples.


The model parameters are determined by minimizing a weighted error function, either
via weighted least squares or gradient descent.

LWR is particularly effective when the target function exhibits local variations that a
single global model might miss.

This explanation provides a comprehensive overview of instance-based learning, k-NN (including its distance-weighted variation), and locally weighted regression.

