Introduction To Machine Learning: K-Nearest Neighbor Algorithm

The document discusses the k-nearest neighbors (KNN) machine learning algorithm. KNN is a simple classification algorithm that classifies new data based on the majority class of its k nearest neighbors. It does not learn until a new data point needs to be classified, at which point it finds the k nearest neighbors in the training data and assigns the new point the majority class among those neighbors. The document covers how KNN works, examples, pseudocode, choosing k, similarity/distance measures, and the pros and cons of KNN.


INTRODUCTION TO MACHINE LEARNING

K-NEAREST NEIGHBOR ALGORITHM

Mingon Kang, PhD


Department of Computer Science @ UNLV
KNN
 K-Nearest Neighbors (KNN)
 Simple, but a very powerful classification algorithm
 Classifies based on a similarity measure
 Non-parametric
 Lazy learning
 Does not “learn” until the test example is given
 Whenever we have a new data point to classify, we find its
K nearest neighbors in the training data

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
KNN: Classification Approach
 Classified by a “MAJORITY VOTE” of its neighbors’
classes
 Assigned to the most common class amongst its K
nearest neighbors (by measuring “distance” between
data points); see the usage sketch below

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
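To make the majority-vote step concrete, here is a minimal usage sketch; the toy 2-D data and the use of scikit-learn's KNeighborsClassifier are illustrative choices, not part of the original slides:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two well-separated clusters labeled 0 and 1
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# K = 3: each query is assigned the majority class among its 3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)  # lazy learning: fit() essentially just stores the data

print(knn.predict([[2.0, 2.0], [6.0, 7.0]]))  # -> [0 1]
```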
KNN: Example

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
KNN: Pseudocode

Ref: https://www.slideshare.net/PhuongNguyen6/text-categorization
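The pseudocode figure from the referenced slides is not reproduced here; a from-scratch sketch of the usual procedure (measure the distance from the query to every training point, take the K closest, return the majority label), assuming NumPy, might look like this:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training example
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k training points with the smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over those neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]
```

Ties (for example with an even k) are broken arbitrarily here; production implementations usually handle them more carefully.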
KNN: Example

Ref: http://www.scholarpedia.org/article/K-nearest_neighbor
KNN: Euclidean distance matrix

Ref: http://www.scholarpedia.org/article/K-nearest_neighbor
Decision Boundaries
 Voronoi diagram
 Describes the areas that are nearest to any given point,
given a set of data.
 Each line segment is equidistant between two points of
opposite class

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
Decision Boundaries
 With a large number of examples and possible noise
in the labels, the decision boundary can become
nasty!
 “Overfitting” problem

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
Effect of K
 A larger K produces a smoother decision boundary
 When K == N, KNN always predicts the majority class

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
Discussion
 Which model is better, K = 1 or K = 15?
 Why?
How to choose k?
 Empirically optimal k? (see the sketch below)

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
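As one illustrative way to answer the "empirically optimal k" question, k can be chosen by cross-validation over a range of candidates; the Iris dataset, the odd-k grid, and 5-fold CV below are arbitrary choices used only for the sketch:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validation accuracy for each odd K (odd values avoid voting ties in binary problems)
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in range(1, 22, 2)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```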
Pros and Cons
 Pros
 Learning and implementation are extremely simple and
intuitive
 Flexible decision boundaries

 Cons
 Irrelevant or correlated features have a high impact and
must be eliminated
 Typically difficult to handle high dimensionality
 Computational costs: memory and classification-time
computation

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
Similarity and Dissimilarity
 Similarity
 Numerical measure of how alike two data objects are.
 Is higher when objects are more alike.
 Often falls in the range [0,1]

 Dissimilarity
 Numerical measure of how different two data objects are
 Lower when objects are more alike
 Minimum dissimilarity is often 0
 Upper limit varies

 Proximity refers to a similarity or dissimilarity


Euclidean Distance
 Euclidean Distance

$$\mathrm{dist} = \sqrt{\sum_{k=1}^{p} (a_k - b_k)^2}$$

where p is the number of dimensions (attributes) and a_k and b_k are, respectively, the
k-th attributes (components) of data objects a and b.

 Standardization is necessary if scales differ.


Euclidean Distance
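A minimal NumPy sketch of the Euclidean distance formula above; the example vectors are made up:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 8.0])

# dist = square root of the sum of squared attribute differences
dist = np.sqrt(np.sum((a - b) ** 2))
print(dist)  # 7.0710..., identical to np.linalg.norm(a - b)
```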
Minkowski Distance
 Minkowski Distance is a generalization of Euclidean
Distance

$$\mathrm{dist} = \left( \sum_{k=1}^{p} |a_k - b_k|^r \right)^{1/r}$$

where r is a parameter, p is the number of dimensions (attributes), and a_k and b_k are,
respectively, the k-th attributes (components) of data objects a and b.
Minkowski Distance: Examples
 r = 1. City block (Manhattan, taxicab, L1 norm) distance.
 A common example of this is the Hamming distance, which is just
the number of bits that are different between two binary vectors

 r = 2. Euclidean distance

 r → ∞. “supremum” (L_max norm, L_∞ norm) distance.


 This is the maximum difference between any component of the
vectors

 Do not confuse r with p, i.e., all these distances are defined


for all numbers of dimensions.
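A sketch of the three special cases above, assuming NumPy; the helper name and vectors are made up for illustration:

```python
import numpy as np

def minkowski(a, b, r):
    """Minkowski distance of order r between vectors a and b."""
    return np.sum(np.abs(a - b) ** r) ** (1.0 / r)

a = np.array([0.0, 3.0])
b = np.array([4.0, 0.0])

print(minkowski(a, b, 1))     # r = 1: city block / Manhattan -> 7.0
print(minkowski(a, b, 2))     # r = 2: Euclidean -> 5.0
print(np.max(np.abs(a - b)))  # r -> infinity: supremum / L_inf -> 4.0
```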
Cosine Similarity
 If d1 and d2 are two document vectors,

$$\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\lVert d_1 \rVert \, \lVert d_2 \rVert}$$

where · indicates the vector dot product and ||d|| is the length of vector d.

 Example:
d1 = (3, 2, 0, 5, 0, 0, 0, 2, 0, 0)
d2 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 2)

d1 · d2 = 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5
||d1|| = (3*3 + 2*2 + 0*0 + 5*5 + 0*0 + 0*0 + 0*0 + 2*2 + 0*0 + 0*0)^0.5 = 42^0.5 ≈ 6.481
||d2|| = (1*1 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 1*1 + 0*0 + 2*2)^0.5 = 6^0.5 ≈ 2.449
cos(d1, d2) = 5 / (6.481 * 2.449) ≈ 0.315
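The arithmetic above can be reproduced with a short NumPy sketch:

```python
import numpy as np

d1 = np.array([3, 2, 0, 5, 0, 0, 0, 2, 0, 0], dtype=float)
d2 = np.array([1, 0, 0, 0, 0, 0, 0, 1, 0, 2], dtype=float)

# cos(d1, d2) = (d1 . d2) / (||d1|| * ||d2||)
cos = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
print(round(cos, 4))  # 0.315
```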
Cosine Similarity

$$\cos(d_1, d_2) = \begin{cases} 1 & \text{exactly the same} \\ 0 & \text{orthogonal} \\ -1 & \text{exactly opposite} \end{cases}$$
Feature scaling
 Standardize the range of independent variables
(features of the data)
 A.k.a. normalization or standardization
Standardization
 Standardization or Z-score normalization
 Rescale the data so that the mean is zero and the
standard deviation is one

$$x_{norm} = \frac{x - \mu}{\sigma}$$

where μ is the mean and σ is the standard deviation; the rescaled values are the standard
scores (z-scores).
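A column-wise z-score sketch, assuming NumPy; the toy matrix is made up:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Subtract each column's mean and divide by its standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0), X_std.std(axis=0))  # approximately [0, 0] and [1, 1]
```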
Min-Max scaling
 Scale the data to a fixed range, between 0 and 1

$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$$
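And the corresponding min-max sketch, mapping each column into [0, 1] (same made-up matrix):

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Each column's minimum maps to 0 and its maximum maps to 1
X_mm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_mm)
```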
Efficient implementation
 Represent the data as a matrix or a vector
 Matrix/vector computation is much more efficient
than computing with explicit loops, as sketched below
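For example, all test-to-training Euclidean distances can be computed with broadcasting instead of nested loops; a sketch assuming NumPy, with arbitrary random data and shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.random((1000, 10))  # 1000 training points, 10 features
X_test = rng.random((200, 10))    # 200 query points

# One vectorized expression replaces a double loop over test and training points
diff = X_test[:, None, :] - X_train[None, :, :]   # shape (200, 1000, 10)
dists = np.sqrt((diff ** 2).sum(axis=2))          # shape (200, 1000)
print(dists.shape)
```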
Discussion
 Can we use KNN for regression problems?
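One possible answer: yes, KNN extends to regression by averaging (or distance-weighting) the target values of the K nearest neighbors instead of voting. A minimal sketch, assuming scikit-learn's KNeighborsRegressor and made-up data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = np.array([0.0, 1.1, 1.9, 3.2, 3.9])

# The prediction is the mean target value of the 2 nearest neighbors
reg = KNeighborsRegressor(n_neighbors=2)
reg.fit(X_train, y_train)
print(reg.predict([[2.5]]))  # average of the targets at x = 2 and x = 3 -> [2.55]
```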
