K Nearest Neighbour Classifier
Eager Learners vs Lazy Learners
Eager learners, when given a set of training tuples, will
construct a generalization model before receiving new
(e.g., test) tuples to classify.
Lazy learners simply store the training data (or do only minor
processing) and wait until a test tuple arrives before doing any
generalization (see the sketch below).
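To make the contrast concrete, here is a minimal sketch (class and variable names are my own, not from the lecture) of the lazy style: fit() only memorises the tuples, and all distance work is deferred to prediction time, shown here with a simple 1-nearest-neighbour rule on the paper-tissue data introduced just below.

class LazyClassifier:
    def fit(self, X, y):
        # Lazy learning: no model is built here, the tuples are simply memorised.
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, query):
        # All work happens at query time: return the label of the closest tuple (1-NN).
        dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in self.X]
        return self.y[dists.index(min(dists))]

clf = LazyClassifier().fit([(7, 7), (7, 4), (3, 4), (1, 4)],
                           ["Bad", "Bad", "Good", "Good"])
print(clf.predict((3, 7)))  # closest tuple is (3, 4), squared distance 9 -> "Good"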
Example: A factory tests paper tissues on two attributes, X1 = acid
durability (seconds) and X2 = strength (kg/square meter), and records
whether each tissue is classified Good or Bad. The four training samples are:

X1 = Acid Durability (seconds)   X2 = Strength (kg/square meter)   Y = Classification
7                                7                                 Bad
7                                4                                 Bad
3                                4                                 Good
1                                4                                 Good
Now the factory produces a new paper tissue that passes the
laboratory test with X1 = 3 and X2 = 7. Guess the classification of
this new tissue.
Step 1 : Initialize and define k.
Let's say k = 3.
(Choose k as an odd number when the number of classes is even,
e.g. two classes, to avoid a tie in the class prediction.)
Step 2 : Compute the distance between the input sample and every
training sample.
- The coordinates of the input sample are (3, 7).
- Instead of the Euclidean distance we compute the squared
Euclidean distance: the square root is monotonic, so ranking
neighbours by squared distance gives the same order while saving
the root computation.
X1 = Acid Durability (seconds)   X2 = Strength (kg/square meter)   Squared Euclidean distance to (3, 7)
7                                7                                 (7-3)² + (7-7)² = 16
7                                4                                 (7-3)² + (4-7)² = 25
3                                4                                 (3-3)² + (4-7)² = 9
1                                4                                 (1-3)² + (4-7)² = 13
Step 3 : Sort the distances and determine the nearest neighbours
based on the k-th minimum distance:

X1   X2   Squared distance   Rank (minimum distance)   Included in 3 nearest neighbours?
7    7    16                 3                         Yes
7    4    25                 4                         No
3    4    9                  1                         Yes
1    4    13                 2                         Yes
Step 4 : Take the 3 nearest neighbours and gather their
category Y:

X1   X2   Squared distance   Rank   Included in 3-NN?   Y = Category
7    7    16                 3      Yes                 Bad
7    4    25                 4      No                  -
3    4    9                  1      Yes                 Good
1    4    13                 2      Yes                 Good
Step 5 : Apply a simple majority vote over the categories of the
nearest neighbours. We have 2 votes for Good and 1 vote for Bad,
so the new paper tissue is classified as Good.
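Putting Steps 2 to 5 together, here is a short Python sketch (helper names are my own, not a prescribed implementation) of the full 3-nearest-neighbour majority vote on this example:

from collections import Counter

training = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
query, k = (3, 7), 3

# Step 2: squared Euclidean distance from the query to every training tuple.
scored = [((x1 - query[0]) ** 2 + (x2 - query[1]) ** 2, label)
          for (x1, x2), label in training]

# Steps 3-4: sort by distance and gather the categories of the k nearest.
neighbours = sorted(scored)[:k]    # [(9, 'Good'), (13, 'Good'), (16, 'Bad')]
votes = Counter(label for _, label in neighbours)

# Step 5: simple majority vote.
print(votes.most_common(1)[0][0])  # -> 'Good'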
C4.5 Algorithm
  Advantages:
  - Models built can be easily interpreted
  - Easy to implement
  - Can use both discrete and continuous values
  - Deals with noise
  Disadvantages:
  - Small variation in data can lead to different decision trees
  - Does not work very well on a small training dataset
  - Over-fitting

ID3 Algorithm
  Advantages:
  - It produces more accuracy than C4.5
  - Detection rate is increased and space consumption is reduced
  Disadvantages:
  - Requires large searching time
  - Sometimes it may generate very long rules which are difficult to prune
  - Requires a large amount of memory to store the tree