K-Nearest Neighbor Learning
Different Learning Methods
Eager Learning
Builds an explicit description of the target function from the whole training set
Instance-based Learning
Learning = storing all the training instances
Classification = evaluating the target function for a new instance when it arrives
Referred to as “Lazy” learning
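To make the contrast concrete, here is a minimal Python sketch (the class names and the toy mean-based “rule” are our own illustration, not a standard API): the eager learner does its work in fit(), while the lazy learner defers all work to predict().

```python
import numpy as np

class EagerLearner:
    """Eager: builds an explicit model of the target function at training time."""
    def fit(self, X, y):
        # Toy rule: summarize each class by its mean (an explicit description).
        self.class_means = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
    def predict(self, x):
        # The training data is no longer needed; only the learned summary is used.
        return min(self.class_means, key=lambda c: np.linalg.norm(x - self.class_means[c]))

class LazyLearner:
    """Lazy (instance-based): learning = storing; work is deferred to query time."""
    def fit(self, X, y):
        self.X, self.y = X, y          # just store the training instances
    def predict(self, x):
        # The target function is evaluated only now, for this particular query.
        return self.y[np.argmin(np.linalg.norm(self.X - x, axis=1))]
```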
Different Learning Methods (cartoon illustration)
Eager Learning: “Any random movement => it’s a mouse! I saw a mouse!” (a rule generalized in advance is applied to the new observation)
Instance-based Learning: “It’s very similar to a desktop!!” (the new object is judged by its similarity to stored instances)
Instance-based Learning
K-Nearest Neighbor Algorithm
Locally Weighted Regression
Case-based Reasoning
•K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
•The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category most similar to the available categories.
•The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can be easily classified into a well-suited category using the K-NN algorithm.
•The K-NN algorithm can be used for Regression as well as for Classification, but it is mostly used for Classification problems.
•K-NN is a non-parametric algorithm, which means it makes no assumptions about the underlying data.
•It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and performs an action on it only at classification time.
•At the training phase, the KNN algorithm just stores the dataset; when it gets new data, it classifies that data into the category most similar to the stored examples.
Why K-NN?
K-Nearest Neighbor
Features
All instances correspond to points in an n-dimensional Euclidean space
Classification is delayed until a new instance arrives
Classification is done by comparing the feature vectors of the different points
The target function may be discrete- or real-valued
1-Nearest Neighbor
3-Nearest Neighbor
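The two slides above contrast the same query under k = 1 and k = 3. A tiny sketch with made-up data where the two disagree, showing why the choice of k matters:

```python
import numpy as np
from collections import Counter

X = np.array([[1.0, 0.0], [1.5, 0.0], [0.0, 1.6]])   # toy training points
y = np.array(["B", "A", "A"])
query = np.array([0.0, 0.0])

dist = np.linalg.norm(X - query, axis=1)              # Euclidean distances
order = np.argsort(dist)

print(y[order[0]])                                    # 1-NN prediction: 'B'
print(Counter(y[order[:3]]).most_common(1)[0][0])     # 3-NN prediction: 'A'
```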
How does K-NN work?
Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance from the new data point to each training point.
Step-3: Take the K nearest neighbors according to the calculated Euclidean distance.
Step-4: Among these K neighbors, count the number of data points in each category.
Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
Step-6: Our model is ready (see the sketch below).
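A minimal Python sketch of these six steps (the function name and data layout are our own; this is not any particular library’s API):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=5):
    # Step 1: K is the `k` parameter, chosen by the caller.
    # Step 2: Euclidean distance from the new point to every training point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 3: take the K nearest neighbors.
    nearest = np.argsort(distances)[:k]
    # Step 4: count the data points in each category among those neighbors.
    votes = Counter(y_train[nearest])
    # Step 5: assign the new point to the category with the most neighbors.
    return votes.most_common(1)[0][0]
```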
•Firstly, we choose the number of neighbors: we will take k = 5.
•Next, we calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. For two points (x1, y1) and (x2, y2) it can be calculated as:
d = sqrt((x2 − x1)² + (y2 − y1)²)
As we can see, 3 of the 5 nearest neighbors are from Category A (and 2 from Category B), hence the new data point belongs to Category A.
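A runnable version of this walkthrough, with hypothetical coordinates chosen so that 3 of the 5 nearest neighbors fall in Category A, as described above:

```python
import numpy as np
from collections import Counter

# Hypothetical coordinates, chosen so that 3 of the 5 nearest neighbors
# belong to Category A and 2 to Category B, matching the walkthrough.
X_train = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],   # Category A
                    [3.0, 3.0], [3.5, 2.5], [8.0, 8.0]])  # Category B
y_train = np.array(["A", "A", "A", "B", "B", "B"])
x_new = np.array([2.5, 2.0])

d = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))   # Euclidean distances
top5 = y_train[np.argsort(d)[:5]]                   # the 5 nearest labels
print(Counter(top5))                                # Counter({'A': 3, 'B': 2})
print(Counter(top5).most_common(1)[0][0])           # -> 'A'
```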
How to choose k?
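A common way to choose k (a general practice, offered here as a suggestion rather than the author’s prescription) is to compare candidate values on held-out data, e.g. with leave-one-out accuracy:

```python
import numpy as np
from collections import Counter

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy of k-NN: a simple way to compare candidate k."""
    correct = 0
    for i in range(len(X)):
        d = np.sqrt(((X - X[i]) ** 2).sum(axis=1))
        d[i] = np.inf                          # exclude the held-out point itself
        nearest = np.argsort(d)[:k]
        pred = Counter(y[nearest]).most_common(1)[0][0]
        correct += pred == y[i]
    return correct / len(X)

# Pick the k with the best score (odd k avoids ties with two classes), e.g.:
# for k in (1, 3, 5, 7): print(k, loo_accuracy(X_train, y_train, k))
```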
When to use KNN
Problem to solve
K-Nearest Neighbor
An arbitrary instance x is represented by the feature vector (a1(x), a2(x), ..., an(x)), where ar(x) denotes the r-th feature of x.
Euclidean distance between two instances:
d(xi, xj) = sqrt( Σ r=1..n (ar(xi) − ar(xj))² )
For a continuous-valued target function, the prediction is the mean value of the k nearest training examples.
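For continuous targets this becomes k-NN regression: predict the mean of the k nearest training values. A minimal sketch (toy data and function name are our own):

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    """Continuous-valued target: return the mean of the k nearest examples."""
    d = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))  # d(xi, xj) as above
    return y_train[np.argsort(d)[:k]].mean()

# Toy 1-D example: y is roughly 2*x, so the prediction near x=2 is about 4.
X = np.array([[1.0], [2.0], [3.0], [10.0]])
y = np.array([2.1, 3.9, 6.2, 20.0])
print(knn_regress(X, y, np.array([2.0])))  # mean of y at x=1,2,3 -> ~4.07
```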
Distance-Weighted Nearest Neighbor Algorithm
Assign weights to the neighbors based on their distance from the query point
A common choice is the inverse square of the distance: wi = 1 / d(xq, xi)²
If all training points (not just the k nearest) are allowed to influence the query, this global variant is known as Shepard’s method
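A sketch of distance-weighted voting with inverse-square weights; passing k=None lets every training point vote, which corresponds to the global (Shepard’s) variant. The names and the zero-distance handling are our own choices:

```python
import numpy as np
from collections import defaultdict

def weighted_knn_classify(X_train, y_train, x_query, k=None):
    """Distance-weighted k-NN; k=None lets ALL points vote (Shepard's method)."""
    d = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    idx = np.argsort(d) if k is None else np.argsort(d)[:k]
    votes = defaultdict(float)
    for i in idx:
        if d[i] == 0.0:                       # exact match: return its label
            return y_train[i]
        votes[y_train[i]] += 1.0 / d[i] ** 2  # weight = inverse square distance
    return max(votes, key=votes.get)
```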
Remarks
Efficient memory indexing (e.g., a kd-tree) can be used to retrieve the stored training examples quickly
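For example, scikit-learn’s KDTree can index the stored examples once and then answer nearest-neighbor queries quickly (this assumes scikit-learn is available; the data here is random for illustration):

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X_train = rng.random((1000, 3))          # 1000 stored training instances in 3-D

tree = KDTree(X_train)                   # build the index once, at "training" time
dist, ind = tree.query(np.array([[0.5, 0.5, 0.5]]), k=3)  # retrieve 3 nearest
print(ind[0], dist[0])                   # indices and distances of the neighbors
```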