
INTRODUCTION TO CLASSIFICATION
K - NEAREST NEIGHBOUR
Prepared By
Fariha Jahan
Lecturer, Department of Computer Science & Engineering
Daffodil International University (DIU)
Nearest Neighbour
• Mainly used when all attribute values are continuous
• It can be modified to deal with categorical attributes
• The idea is to estimate the classification of an unseen instance using the classification of the instance or instances that are closest to it, in some sense that we need to define (i.e. it classifies new cases based on a similarity measure)
Nearest Neighbour
[Example: two labelled training instances with six attribute values each, together with an unseen instance]
• What should its classification be?
• Even without knowing what the six attributes represent, it seems intuitively obvious that the unseen instance is nearer to the first instance than to the second.
K - Nearest Neighbour (KNN)
• In practice there are likely to be many more instances in the
training set but the same principle applies.
• It is usual to base the classification on those of the k nearest neighbours, not just the nearest one.
• The method is then known as k-Nearest Neighbour or just k-NN classification.
KNN
• We can illustrate k-NN classification diagrammatically when the
dimension (i.e. the number of attributes) is small.
• Next we will see an example which illustrates the case where the dimension is just 2.
• In real-world data mining applications it can of course be considerably larger.
KNN
• A training set with 20 instances, each giving
the values of two attributes and an associated
classification
• How can we estimate the classification for an
‘unseen’ instance where the first and second
attributes are 9.1 and 11.0, respectively?
KNN
• For this small number of attributes we can
represent the training set as 20 points on a
two-dimensional graph with values of the first
and second attributes measured along the
horizontal and vertical axes, respectively.
• Each point is labelled with a + or − symbol to
indicate that the classification is positive or
negative, respectively.
KNN

• A circle has been added to enclose the five nearest neighbours of the unseen instance, which is shown as a small circle close to the centre of the larger one.
KNN

• The five nearest neighbours are labelled with three + signs and two − signs.
• So a basic 5-NN classifier would classify the unseen instance as ‘positive’ by a form of majority voting.
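This majority-voting step can be sketched in a few lines of Python (an illustrative sketch, not code from the slides; the function names and the (point, label) data layout are ours):

from collections import Counter
from math import sqrt

def euclidean(a, b):
    """Straight-line distance between two points with the same number of attributes."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(training_set, unseen, k=5):
    """Classify 'unseen' by a majority vote among its k nearest neighbours.

    training_set is a list of (point, label) pairs, e.g. ((9.0, 10.5), '+').
    """
    # Sort the training instances by their distance from the unseen instance.
    nearest = sorted(training_set, key=lambda item: euclidean(item[0], unseen))
    # Vote among the labels of the k closest instances.
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

With k = 5 and the circled neighbours above (three ‘+’ and two ‘−’), the vote returns ‘+’.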
KNN
• We can represent two points in two dimensions (‘in two-
dimensional space’ is the usual term) as (a1, a2) and (b1, b2)
• When there are three attributes we can represent the points by
(a1, a2, a3) and (b1, b2, b3)
• When there are n attributes, we can represent the instances by the
points (a1, a2, . . . , an) and (b1, b2, . . . , bn) in ‘n-dimensional
space’
Distance Measures
• There are many possible ways of measuring the distance between
two instances with n attribute values, or equivalently between two
points in n-dimensional space.
• A distance measure is usually required to satisfy three conditions (let dist(X, Y) denote the distance between two points X and Y):
• The distance of any point A from itself is zero, i.e. dist(A, A) = 0
• The distance from A to B is the same as the distance from B to A, i.e. dist(A, B) = dist(B, A) (the symmetry condition)
• The third condition is called the triangle inequality. It corresponds to the intuitive idea that ‘the shortest distance between any two points is a straight line’. The condition says that for any points A, B and Z:
dist(A, B) ≤ dist(A, Z) + dist(Z, B)
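As a small illustrative check (not from the slides), any candidate dist function can be spot-tested against these three conditions on a sample of points:

def check_distance_conditions(dist, points, eps=1e-12):
    """Spot-check the three conditions for a distance measure on sample points."""
    for a in points:
        assert dist(a, a) == 0                       # dist(A, A) = 0
        for b in points:
            assert dist(a, b) == dist(b, a)          # symmetry: dist(A, B) = dist(B, A)
            for z in points:
                # triangle inequality: dist(A, B) <= dist(A, Z) + dist(Z, B)
                assert dist(a, b) <= dist(a, z) + dist(z, b) + eps
    return True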
Distance Measures
• There are many possible distance measures:
• Euclidean Distance
• Manhattan Distance or City Block Distance
• Hamming Distance …
Distance Measures: Euclidean Distance
• If we denote an instance in the training set by (a1, a2) and the unseen instance by (b1, b2), the length of the straight line joining the points is
√((a1 − b1)² + (a2 − b2)²)
• If there are two points (a1, a2, a3) and (b1, b2, b3) in a three-dimensional space the corresponding formula is
√((a1 − b1)² + (a2 − b2)² + (a3 − b3)²)
• The formula for Euclidean distance between points (a1, a2, . . . , an) and (b1, b2, . . . , bn) in n-dimensional space is a generalisation of these two results. The Euclidean distance is given by the formula
√((a1 − b1)² + (a2 − b2)² + . . . + (an − bn)²)
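A direct translation of the n-dimensional formula into Python (a sketch; the function name is ours):

from math import sqrt

def euclidean_distance(a, b):
    """Euclidean distance between (a1, ..., an) and (b1, ..., bn)."""
    return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Two dimensions: distance between (4, 2) and (12, 9)
print(euclidean_distance((4, 2), (12, 9)))   # sqrt(8**2 + 7**2) ≈ 10.63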
Distance Measures: Manhattan Distance
• The City Block (Manhattan) distance is the sum of the absolute differences of the attribute values, i.e. |a1 − b1| + |a2 − b2| + . . . + |an − bn|
• For example, the City Block distance between the points (4, 2) and (12, 9) is (12 − 4) + (9 − 2) = 8 + 7 = 15
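The corresponding City Block calculation as a sketch (again, the function name is ours):

def manhattan_distance(a, b):
    """City Block (Manhattan) distance: the sum of absolute attribute differences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

print(manhattan_distance((4, 2), (12, 9)))   # |12 - 4| + |9 - 2| = 15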
KNN
• A training set with 20 instances, each giving
the values of two attributes and an associated
classification
• How can we estimate the classification for an
‘unseen’ instance where the first and second
attributes are 9.1 and 11.0, respectively?
• Use Euclidean Distance
Normalisation
• A major problem when using the Euclidean distance formula (and many other distance measures) is that large values frequently swamp the small ones.
• For example, suppose each instance describes a car by attributes such as its mileage (in miles), its age (in years) and its number of doors.
• When the distance of these instances from an unseen one is calculated, the mileage attribute will almost certainly contribute a value of several thousands squared, i.e. several millions, to the sum of squares total.
Normalisation
• It is clear that in practice the only attribute that will matter when deciding which neighbours are the nearest using the Euclidean distance formula is the mileage.
• We could have chosen an alternative unit of distance travelled such as millimetres or perhaps light years. Similarly we might have measured age in some other unit such as milliseconds or millennia. The units chosen should not affect the decision on which are the nearest neighbours.
Normalisation
• To overcome this problem we generally normalise the values of continuous attributes.
• The idea is to make the values of each attribute run from 0 to 1.
• In general, if the lowest value of attribute A is min and the highest value is max, we convert each value of A, say a, to (a − min)/(max − min).
• Using this approach all continuous attributes are converted to small numbers from 0 to 1, so the effect of the choice of unit of measurement on the outcome is greatly reduced.
Normalisation
• Note that it is possible that an unseen instance may have a value of A that is less than min or greater than max. If we want to keep the adjusted numbers in the range from 0 to 1 we can simply convert any values of A that are less than min or greater than max to 0 or 1, respectively.
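A minimal sketch of this min-max normalisation, including the clipping of out-of-range values just described (the function name is ours; the example range matches the Loan attribute used in the exercises below):

def normalise(value, min_val, max_val):
    """Scale an attribute value into [0, 1] using (a - min) / (max - min).

    Unseen values below min or above max are clipped to 0 or 1 respectively.
    """
    scaled = (value - min_val) / (max_val - min_val)
    return min(1.0, max(0.0, scaled))

# Loan values in the training data range from 18000 to 220000
print(normalise(40000, 18000, 220000))    # ≈ 0.11
print(normalise(250000, 18000, 220000))   # out of range, clipped to 1.0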
Normalisation
• Another issue that occurs with measuring the distance between
two points is the weighting of the contributions of the different
attributes.
• We may believe that the mileage of a car is more important than
the number of doors it has.
• To achieve this we can adjust the formula for Euclidean distance to
√(w1(a1 − b1)² + w2(a2 − b2)² + . . . + wn(an − bn)²)
where w1, w2, . . . , wn are the weights. It is customary to scale the weight values so that the sum of all the weights is one.
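A sketch of the weighted formula (the weights shown are made up for illustration and sum to one):

from math import sqrt

def weighted_euclidean(a, b, weights):
    """Euclidean distance with per-attribute weights w1, ..., wn."""
    return sqrt(sum(w * (ai - bi) ** 2 for w, ai, bi in zip(weights, a, b)))

# Weight the first (normalised) attribute three times as heavily as the second.
print(weighted_euclidean((0.4, 0.5), (0.6, 0.25), (0.75, 0.25)))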
Dealing with Categorical Attributes
• One of the weaknesses of the nearest neighbour approach to
classification is that there is no entirely satisfactory way of dealing
with categorical attributes.
• One possibility is to say that the difference between any two
identical values of the attribute is zero and that the difference
between any two different values is 1. (Hamming Distance)
• Effectively this amounts to saying (for a colour attribute) red − red = 0, red − blue = 1, blue − green = 1, etc.
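This 0/1 rule (a Hamming-style difference) can be written down as a trivial sketch:

def categorical_difference(a, b):
    """Difference between two categorical values: 0 if identical, 1 otherwise."""
    return 0 if a == b else 1

print(categorical_difference('red', 'red'))    # 0
print(categorical_difference('red', 'blue'))   # 1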
Dealing with Categorical Attributes
• Sometimes there is an ordering (or a partial ordering) of the values
of an attribute (Ordinal Attribute), for example we might have
values good, average and bad.
• We could treat the difference between good and average or
between average and bad as 0.5 and the difference between good
and bad as 1.
• This still does not seem completely right, but may be the best we can do in practice.
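One way to code this (a sketch; the good < average < bad ordering and the 0/0.5/1 spacing are the assumptions stated above):

# Assumed ordering: good < average < bad, spaced evenly over [0, 1]
ORDINAL_POSITION = {'good': 0.0, 'average': 0.5, 'bad': 1.0}

def ordinal_difference(a, b):
    """Difference between two ordinal values based on their assumed positions."""
    return abs(ORDINAL_POSITION[a] - ORDINAL_POSITION[b])

print(ordinal_difference('good', 'average'))   # 0.5
print(ordinal_difference('good', 'bad'))       # 1.0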
Exercise-1
Compute the distance of each training instance from the unseen instance (Age = 48, Loan = 142000) using the raw, unnormalised values, and estimate its class.

Age   Loan     Default   Distance
25    40000    N
35    60000    N
45    80000    N
20    20000    N
35    120000   N
52    18000    N
23    95000    Y
40    62000    Y
60    100000   Y
48    220000   Y
33    150000   Y
48    142000   ??
Exercise-1
Age   Loan     Default   Distance
25    40000    N         102000
35    60000    N         82000
45    80000    N         62000
20    20000    N         122000
35    120000   N         22000
52    18000    N         124000
23    95000    Y         47000
40    62000    Y         80000
60    100000   Y         42000
48    220000   Y         78000
33    150000   Y         8000
48    142000   ??
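A sketch that reproduces the Exercise-1 distances. Because the Loan values are so much larger than the Age values, the Euclidean distance is effectively |Loan − 142000|, which is exactly the swamping problem described earlier:

from math import sqrt

training = [
    (25, 40000, 'N'), (35, 60000, 'N'), (45, 80000, 'N'), (20, 20000, 'N'),
    (35, 120000, 'N'), (52, 18000, 'N'), (23, 95000, 'Y'), (40, 62000, 'Y'),
    (60, 100000, 'Y'), (48, 220000, 'Y'), (33, 150000, 'Y'),
]
unseen_age, unseen_loan = 48, 142000

for age, loan, label in training:
    d = sqrt((age - unseen_age) ** 2 + (loan - unseen_loan) ** 2)
    # First row: sqrt(23**2 + 102000**2) ≈ 102000 -- Loan completely swamps Age.
    print(label, round(d))

On these unnormalised values the single nearest neighbour is the Y instance at distance 8000, so a 1-NN classifier would predict Y.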
Exercise-2
Repeat the calculation using the same data with Age and Loan normalised to the range 0 to 1.

Age     Loan   Default   Distance
0.125   0.11   N
0.375   0.21   N
0.625   0.31   N
0       0.01   N
0.375   0.5    N
0.8     0      N
0.075   0.38   Y
0.5     0.22   Y
1       0.41   Y
0.7     1      Y
0.325   0.65   Y
0.7     0.61   ??
Exercise-2
Age     Loan   Default   Distance
0.125   0.11   N         0.762
0.375   0.21   N         0.5154
0.625   0.31   N         0.3092
0       0.01   N         0.922
0.375   0.5    N         0.3431
0.8     0      N         0.6181
0.075   0.38   Y         0.666
0.5     0.22   Y         0.4383
1       0.41   Y         0.3606
0.7     1      Y         0.39
0.325   0.65   Y         0.3771
0.7     0.61   ??
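A sketch that verifies the Exercise-2 distances on the normalised data and applies a k-NN vote (the choice of k = 3 is ours, for illustration):

from collections import Counter
from math import sqrt

training = [
    (0.125, 0.11, 'N'), (0.375, 0.21, 'N'), (0.625, 0.31, 'N'), (0.0, 0.01, 'N'),
    (0.375, 0.5, 'N'), (0.8, 0.0, 'N'), (0.075, 0.38, 'Y'), (0.5, 0.22, 'Y'),
    (1.0, 0.41, 'Y'), (0.7, 1.0, 'Y'), (0.325, 0.65, 'Y'),
]
unseen = (0.7, 0.61)

distances = sorted(
    (sqrt((a - unseen[0]) ** 2 + (l - unseen[1]) ** 2), label)
    for a, l, label in training
)
print(distances[:3])                                                 # three nearest
print(Counter(label for _, label in distances[:3]).most_common(1))   # 3-NN vote

The three nearest neighbours are at 0.3092 (N), 0.3431 (N) and 0.3606 (Y), so a 3-NN classifier predicts N on the normalised data; note that a 5-NN vote (N, N, Y, Y, Y) would instead predict Y.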
