
Instance Based Learning

Instance-based learning methods store training examples instead of learning an explicit target function. When a new instance is encountered, its relationship to the stored examples is examined to assign a target value. K-nearest neighbors is an instance-based learning algorithm that classifies new examples by the majority class of their k nearest neighbors, where a distance function measures similarity. KNN is a simple yet powerful algorithm that can classify well given enough training data, though it requires tuning of the k parameter and its classification time grows with the size of the dataset.


Instance Based Learning

•Instance-based learning methods simply store the training examples instead of learning an explicit description of the target function.
•Generalizing beyond the examples is postponed until a new instance must be classified.
•When a new instance is encountered, its relationship to the stored examples is examined in order to assign a target function value to the new instance.
•Instance-based learning includes nearest neighbor, locally weighted regression, and case-based reasoning methods.
•Instance-based methods are sometimes referred to as lazy learning methods because they delay processing until a new instance must be classified.
•A key advantage of lazy learning is that instead of estimating the target function once for the entire instance space, these methods can estimate it locally and differently for each new instance to be classified.
KNN – Different names
• K-Nearest Neighbors
• Memory-Based Reasoning
• Example-Based Reasoning
• Instance-Based Learning
• Lazy Learning
What is KNN?

• A powerful classification algorithm used in pattern recognition.
• K-nearest neighbors stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function).
• One of the top data mining algorithms in use today.
• A non-parametric, lazy learning algorithm (an instance-based learning method).
KNN: Classification Approach
• An object (a new instance) is classified by a majority vote among its neighbors' classes.
• The object is assigned to the most common class amongst its K nearest neighbors, as measured by a distance function. A minimal sketch of this procedure follows.
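To make the voting procedure concrete, here is a minimal sketch in Python (the function names and data layout are illustrative, not from the slides):

```python
import math
from collections import Counter

def euclidean(x, y):
    # Straight-line distance between two numeric feature vectors.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_classify(train, new_point, k=3):
    # train: list of (feature_vector, class_label) pairs.
    # Keep the k training cases closest to the new point...
    neighbors = sorted(train, key=lambda ex: euclidean(ex[0], new_point))[:k]
    # ...and return the majority class among them.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```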
Distance measure for Continuous Variables

[Slide figure: common distance functions for continuous variables. The Euclidean distance, used throughout these slides, is defined on the next slide.]
Distance Between Neighbors
• Calculate the distance between the new example (E) and all examples in the training set.
• Euclidean distance between two examples:
  – X = [x1, x2, x3, ..., xn]
  – Y = [y1, y2, y3, ..., yn]
  – The Euclidean distance between X and Y is defined as:

    D(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
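As a quick numeric check of the formula, a vectorized version with NumPy (assuming NumPy is available; the vectors are taken from the 3-KNN example below):

```python
import numpy as np

X = np.array([35.0, 35.0, 3.0])
Y = np.array([37.0, 50.0, 2.0])

# D(X, Y) = sqrt(sum_i (x_i - y_i)^2)
D = np.sqrt(np.sum((X - Y) ** 2))
print(D)  # ~15.166, which the example below reports as 15.16
```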
K-Nearest Neighbor Algorithm
• All the instances correspond to points in an n-dimensional feature space.
• Each instance is represented by a set of numerical attributes.
• Each training example consists of a feature vector and the class label associated with it.
• Classification is done by comparing the feature vector of the new example with those of the K nearest points.
• Select the K nearest examples to E in the training set.
3-KNN: Example (1)

Distances from the new example E = (37, 50, 2) to each training example:

sqrt[(35 - 37)² + (35 - 50)² + (3 - 2)²] = 15.16
sqrt[(22 - 37)² + (50 - 50)² + (2 - 2)²] = 15
sqrt[(63 - 37)² + (200 - 50)² + (1 - 2)²] = 152.23
sqrt[(59 - 37)² + (170 - 50)² + (1 - 2)²] = 122
sqrt[(25 - 37)² + (40 - 50)² + (4 - 2)²] = 15.74

The three smallest distances (15, 15.16, 15.74) identify the 3 nearest neighbors of E, and E is assigned the majority class among them.
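The same computation in Python (the training points and the query E are read off the example above; the class labels are not recoverable from the slide, so only distances are computed):

```python
import math

E = (37, 50, 2)  # new example from the slide
train = [(35, 35, 3), (22, 50, 2), (63, 200, 1), (59, 170, 1), (25, 40, 4)]

for point in train:
    d = math.sqrt(sum((p - e) ** 2 for p, e in zip(point, E)))
    print(point, round(d, 2))
# 15.17, 15.0, 152.24, 122.0, 15.75 -- matching the slide up to rounding
```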
How to choose K?

• If K is too small, the classifier is sensitive to noise points.
• A larger K works well, but too large a K may include majority points from other classes.
• A rule of thumb is K < sqrt(n), where n is the number of training examples. A cross-validation sketch follows.
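In practice K is often chosen by cross-validation; a sketch with scikit-learn (assuming scikit-learn is installed; the iris dataset is just a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try odd values of K up to sqrt(n), keeping the best cross-validated accuracy.
best_k, best_score = 1, 0.0
for k in range(1, int(len(X) ** 0.5) + 1, 2):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score
print(best_k, round(best_score, 3))
```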
[Figure: three panels labeled (a) 1-nearest neighbor, (b) 2-nearest neighbor, and (c) 3-nearest neighbor, each marking the neighborhood of a query point x.]

The K-nearest neighbors of a record x are the data points that have the k smallest distances to x.
KNN Feature Weighting

• Scale each feature by its importance for classification.
• Can use prior knowledge about which features are more important.
• Can learn the weights w_k using cross-validation (to be covered later). A weighted-distance sketch follows.
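A minimal sketch of a weighted Euclidean distance (the weight values are hypothetical; in practice they would come from prior knowledge or cross-validation):

```python
import math

def weighted_euclidean(x, y, w):
    # Each squared feature difference is scaled by that feature's weight w_k.
    return math.sqrt(sum(wk * (xk - yk) ** 2 for wk, xk, yk in zip(w, x, y)))

# Example: the second feature is treated as twice as important as the others.
print(weighted_euclidean((35, 35, 3), (37, 50, 2), w=(1.0, 2.0, 1.0)))
```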
Nominal/Categorical Data
• Distance works naturally with numerical attributes.
• Binary categorical attributes can be encoded as 1 or 0, after which the numeric distance applies. A sketch of this convention follows.
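One common convention (an illustrative sketch, not prescribed by the slides) treats a mismatch between categorical values as distance 1 and a match as 0, which coincides with the 0/1 encoding for binary attributes:

```python
def categorical_distance(x, y):
    # Overlap (Hamming-style) distance: 0 where attributes match, 1 where they differ.
    return sum(0 if xi == yi else 1 for xi, yi in zip(x, y))

print(categorical_distance(("red", "yes"), ("blue", "yes")))  # 1
```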
KNN Classification – Distance

Age   Loan       Default   Distance
25    $40,000    N         102000
35    $60,000    N          82000
45    $80,000    N          62000
20    $20,000    N         122000
35    $120,000   N          22000
52    $18,000    N         124000
23    $95,000    Y          47000
40    $62,000    Y          80000
60    $100,000   Y          42000
48    $220,000   Y          78000
33    $150,000   Y           8000

New case: Age = 48, Loan = $142,000, Default = ?

D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}

With K = 1, the nearest neighbor is (Age 33, Loan $150,000, Default Y) at distance 8000, so the new case is classified as Default = Y.
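Reproducing the table in Python (values from the slide; with K = 1, the prediction follows from the smallest distance):

```python
import math

cases = [
    (25, 40000, "N"), (35, 60000, "N"), (45, 80000, "N"),
    (20, 20000, "N"), (35, 120000, "N"), (52, 18000, "N"),
    (23, 95000, "Y"), (40, 62000, "Y"), (60, 100000, "Y"),
    (48, 220000, "Y"), (33, 150000, "Y"),
]
new = (48, 142000)  # Age and Loan of the new case

def dist(case):
    # Euclidean distance over (Age, Loan).
    return math.sqrt((case[0] - new[0]) ** 2 + (case[1] - new[1]) ** 2)

nearest = min(cases, key=dist)
print(nearest, round(dist(nearest)))  # (33, 150000, 'Y') 8000 -> Default = Y
```

Because Loan is on a much larger scale than Age, it dominates the distance, which is one reason the feature scaling and weighting discussed earlier matter in practice.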
KNN Classification

[Scatter plot: Loan ($0 to $250,000) versus Age (0 to 70), with Default and Non-Default cases marked.]
Strengths of KNN
• Very simple and intuitive.
• Can be applied to data from any distribution.
• Gives good classification if the number of samples is large enough.

Weaknesses of KNN
• Takes more time to classify a new example: the distance from the new example to every stored example must be calculated and compared.
• Choosing k may be tricky.
• Needs a large number of samples for accuracy.
