K-NN Algorithm Overview
How It Works:
When a new query comes in, the most similar stored instances are retrieved and used to form
the prediction. This approach is beneficial for complex target functions, since a collection of
simpler local models can capture local variation better than a single global model.
Advantages:
Highly Flexible:
Adapts dynamically to each query without assuming a fixed function.
Effective for Complex Functions:
Works well when the global function is too complex to capture accurately.
Disadvantages:
Key Comparisons:
Euclidean Distance
The distance between two instances x_i and x_j is calculated using:
d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \bigl(a_r(x_i) - a_r(x_j)\bigr)^2}.
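For concreteness, a minimal NumPy sketch of this distance computation (the function name euclidean_distance and the toy vectors are illustrative, not part of the original material):

```python
import numpy as np

def euclidean_distance(x_i, x_j):
    # d(x_i, x_j) = sqrt of the sum over attributes r of (a_r(x_i) - a_r(x_j))^2
    x_i, x_j = np.asarray(x_i, dtype=float), np.asarray(x_j, dtype=float)
    return np.sqrt(np.sum((x_i - x_j) ** 2))

# Two instances described by n = 3 attributes
print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 5.0
```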
Algorithm Phases
Training Phase:
Simply store all training examples (x, f (x)) without constructing an explicit general
model.
Classification Phase:
1. Distance Calculation:
For a query instance x_q, compute its distance to every training example.
2. Neighbor Selection:
Identify the k training examples with the smallest distances to xq .
3. Majority Voting:
For each class label v in the set V = \{v_1, v_2, \ldots, v_m\}, count the number of times
v appears among the k nearest neighbors. This is expressed as:
f(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} \mathbb{1}\bigl(f(x_i) = v\bigr),
where \mathbb{1}(\cdot) equals 1 when its argument is true and 0 otherwise.
For k = 1, the query simply receives the class of its single nearest neighbor:
f(x_q) = f(x_1).
For continuous-valued target functions, the prediction is instead the mean value of the
k nearest neighbors:
f(x_q) = \frac{1}{k} \sum_{i=1}^{k} f(x_i).
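A compact sketch of all three phases, plus the mean-value variant for real-valued targets, assuming plain Euclidean distance (the helper name knn_predict and the toy data are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_q, k=3):
    X_train, x_q = np.asarray(X_train, dtype=float), np.asarray(x_q, dtype=float)
    # 1. Distance calculation: Euclidean distance from x_q to every training example
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
    # 2. Neighbor selection: the k training examples with the smallest distances
    nearest = np.argsort(dists)[:k]
    # 3. Majority voting over the neighbors' class labels
    #    (for a continuous target, return np.mean([y_train[i] for i in nearest]) instead)
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X = [[1, 1], [1, 2], [6, 5], [7, 7]]
y = ["A", "A", "B", "B"]
print(knn_predict(X, y, [2, 1], k=3))  # "A": two of the three nearest neighbors are class A
```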
Key Characteristics:
Lazy Learning:
No explicit model is built during training; all processing is done at query time.
Non-Parametric:
No assumptions are made about the data distribution.
Weight Calculation
A common weight function is the inverse square of the distance:
w_i = \frac{1}{d(x_q, x_i)^2}.
Each neighbor's vote is then scaled by its weight:
f(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} w_i \cdot \mathbb{1}\bigl(f(x_i) = v\bigr).
Special Case:
If x_q exactly matches a training instance (i.e., d(x_q, x_i) = 0), assign that instance's class
(or use a majority vote if multiple matches exist).
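A sketch of distance-weighted voting with the exact-match special case handled first (the name weighted_knn_predict and the toy data are illustrative):

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x_q, k=3):
    X_train, x_q = np.asarray(X_train, dtype=float), np.asarray(x_q, dtype=float)
    dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # Special case: d(x_q, x_i) = 0 -- use the exact matches' majority class
    exact = [i for i in nearest if dists[i] == 0.0]
    if exact:
        labels = [y_train[i] for i in exact]
        return max(set(labels), key=labels.count)
    # Otherwise each neighbor votes with weight w_i = 1 / d(x_q, x_i)^2
    scores = defaultdict(float)
    for i in nearest:
        scores[y_train[i]] += 1.0 / dists[i] ** 2
    return max(scores, key=scores.get)

X = [[1, 1], [1, 3], [6, 5], [7, 7]]
y = ["A", "A", "B", "B"]
print(weighted_knn_predict(X, y, [2, 1], k=3))  # "A"
```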
Prediction:
For continuous-valued target functions, compute the weighted average of the neighbors' values:
f(x_q) = \frac{\sum_{i=1}^{k} w_i \, f(x_i)}{\sum_{i=1}^{k} w_i}.
Global Method:
Consider all training examples, where distant ones have minimal impact due to very
small weights. When the global approach is used with this weighted rule, it is known as
Shepard’s Method.
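A sketch of this global, distance-weighted average for a real-valued target, i.e., Shepard's method with w_i = 1/d^2 (the name shepard_predict and the toy data are illustrative):

```python
import numpy as np

def shepard_predict(X_train, y_train, x_q):
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_q = np.asarray(x_q, dtype=float)
    d2 = ((X_train - x_q) ** 2).sum(axis=1)        # squared distances to every example
    if np.any(d2 == 0.0):                          # exact match: return that target value
        return float(y_train[d2 == 0.0].mean())
    w = 1.0 / d2                                   # distant points get tiny weights
    return float(np.sum(w * y_train) / np.sum(w))  # weighted average over ALL examples

X = [[0.0], [1.0], [2.0]]
y = [0.0, 1.0, 4.0]
print(shepard_predict(X, y, [1.5]))  # ~2.37, dominated by the two nearest points
```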
Locally Weighted Regression (LWR)
Local Method:
Unlike the global approach, locally weighted regression constructs an explicit,
separate model for each query using only nearby training data.
Key Idea:
The method focuses on data points that are "close" to xq by assigning them higher
weights. This local model is then used to predict the target value at xq .
Local Linear Model:
A common choice is to approximate f near the query point with a linear function:
\hat{f}(x) = w_0 + w_1 a_1(x) + w_2 a_2(x) + \cdots + w_n a_n(x),
where a_1(x), a_2(x), \ldots, a_n(x) are the attributes of x.
Weighting Mechanism:
Purpose:
Ensure that training examples closer to xq have more influence.
Weight Function:
A kernel function K(d(x_q, x)) is used to assign weights based on the distance
d(x_q, x). A common choice is the Gaussian kernel:
K(d(x_q, x)) = \exp\!\left(-\frac{d(x_q, x)^2}{2\tau^2}\right),
where \tau is the bandwidth parameter controlling the rate of decay with distance.
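As a quick illustration of how \tau controls the decay (the helper name gaussian_kernel is illustrative):

```python
import numpy as np

def gaussian_kernel(d, tau=1.0):
    # K(d) = exp(-d^2 / (2 * tau^2)); a larger tau keeps distant points influential longer
    return np.exp(-d ** 2 / (2 * tau ** 2))

print(gaussian_kernel(0.0), gaussian_kernel(1.0), gaussian_kernel(3.0))
# 1.0  ~0.61  ~0.011  -- nearby points dominate the fit
```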
Weighted Error Criterion:
The local model is fit by minimizing the locally weighted squared error:
E(x_q) = \sum_{x \in D} K(d(x_q, x)) \bigl(f(x) - \hat{f}(x)\bigr)^2,
where D is the set of training examples (or a subset, such as the k-nearest neighbors).
Fitting Process:
1. Distance Calculation:
For the query point xq , calculate d(xq , x) for each training example.
2. Weight Computation:
Compute the weight K(d(xq , x)) for each example.
3. Model Fitting:
Determine the coefficients that minimize the weighted error E(x_q), for example by
weighted least squares.
4. Prediction:
Use the fitted local model to predict the target value at the query point, \hat{f}(x_q).
Note:
The local model is discarded after the prediction is made.
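Putting the pieces together, a minimal sketch of the whole fitting process, assuming a linear local model, the Gaussian kernel above, and a weighted least-squares solve (the function name lwr_predict and the toy data are illustrative):

```python
import numpy as np

def lwr_predict(X_train, y_train, x_q, tau=1.0):
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_q = np.asarray(x_q, dtype=float)
    # 1. Distance calculation and 2. weight computation with the Gaussian kernel
    d = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
    w = np.exp(-d ** 2 / (2 * tau ** 2))
    # 3. Model fitting: minimize E(x_q) for f_hat(x) = w0 + w1*a1(x) + ... + wn*an(x)
    #    (weighted least squares via row-scaling by sqrt(w))
    A = np.hstack([np.ones((len(X_train), 1)), X_train])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(sw * A, sw[:, 0] * y_train, rcond=None)
    # 4. Prediction at the query point; the local model is then discarded
    return float(np.concatenate(([1.0], x_q)) @ coef)

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.8, 0.9, 0.1]                  # roughly sin(x)
print(lwr_predict(X, y, [1.5], tau=0.5))  # local linear fit around x_q = 1.5
```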
Gradient Descent Alternative:
Instead of solving directly, update the coefficients using gradient descent:
w_j \leftarrow w_j - \eta \cdot \frac{\partial E(x_q)}{\partial w_j},
where \eta is the learning rate.
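A sketch of this update for the linear local model, where the gradient of E(x_q) is computed analytically (the helper lwr_gradient_step, the toy design matrix, and the fixed kernel weights are illustrative):

```python
import numpy as np

def lwr_gradient_step(A, y, kernel_w, coef, eta=0.05):
    # E(x_q) = sum_i K_i * (y_i - A_i . coef)^2, so dE/dcoef = -2 * A^T (K * residual)
    residual = y - A @ coef
    grad = -2.0 * A.T @ (kernel_w * residual)
    return coef - eta * grad

# Design matrix with an intercept column, targets, and fixed kernel weights K(d(x_q, x))
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 2.0])
K = np.array([0.2, 1.0, 0.2])
coef = np.zeros(2)
for _ in range(500):
    coef = lwr_gradient_step(A, y, K, coef, eta=0.05)
print(coef)  # approaches the weighted least-squares solution, here ~[0, 1]
```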
Advantages
Local Adaptation:
Captures variations in f (x) that differ across regions.
Noise Mitigation:
The weighting reduces the impact of noisy, distant data points.
Flexibility:
Various functional forms (linear, quadratic) can be chosen based on the local complexity.
Disadvantages
Parameter Tuning:
Choosing the kernel bandwidth τ or the number of nearest neighbors is critical.
Risk of Overfitting:
Overly complex models may fit noise in the small local dataset if not regularized
properly.
Summary
Locally Weighted Regression approximates the target function f (x) by fitting a simple,
local model around each query point xq .
The model parameters are determined by minimizing a weighted error function, either
via weighted least squares or gradient descent.
LWR is particularly effective when the target function exhibits local variations that a
single global model might miss.