Introduction To Machine Learning: K-Nearest Neighbor Algorithm

The document discusses the k-nearest neighbors (KNN) machine learning algorithm. KNN is a simple classification algorithm that classifies new data based on the majority class of its k nearest neighbors. It does not learn until a new data point needs to be classified, at which point it finds the k nearest neighbors in the training data and assigns the new point the majority class among those neighbors. The document covers how KNN works, examples, pseudocode, choosing k, similarity/distance measures, and the pros and cons of KNN.


INTRODUCTION TO MACHINE LEARNING

K-NEAREST NEIGHBOR ALGORITHM

Mingon Kang, PhD


Department of Computer Science @ UNLV
KNN
 K-Nearest Neighbors (KNN)
 Simple, but a very powerful classification algorithm
 Classifies based on a similarity measure
 Non-parametric
 Lazy learning
 Does not “learn” until the test example is given
 Whenever we have a new data point to classify, we find its
K nearest neighbors in the training data

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
KNN: Classification Approach
 Classified by a “MAJORITY VOTE” of its neighbors’
classes
 Assigned to the most common class amongst its K
nearest neighbors (by measuring “distance” between
data points); see the usage sketch below

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
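To make the majority-vote step concrete, here is a minimal usage sketch; the toy 2-D data and the use of scikit-learn's KNeighborsClassifier are illustrative choices, not part of the original slides:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two well-separated clusters labeled 0 and 1
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# K = 3: each query is assigned the majority class among its 3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)  # lazy learning: fit() essentially just stores the data

print(knn.predict([[2.0, 2.0], [6.0, 7.0]]))  # -> [0 1]
```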
KNN: Example

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
KNN: Pseudocode

Ref: https://www.slideshare.net/PhuongNguyen6/text-categorization
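The pseudocode figure from the referenced slides is not reproduced here; a from-scratch sketch of the usual procedure (measure the distance from the query to every training point, take the K closest, return the majority label), assuming NumPy, might look like this:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training example
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k training points with the smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over those neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]
```

Ties (for example with an even k) are broken arbitrarily here; production implementations usually handle them more carefully.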
KNN: Example

Ref: http://www.scholarpedia.org/article/K-nearest_neighbor
KNN: Euclidean distance matrix

Ref: http://www.scholarpedia.org/article/K-nearest_neighbor
Decision Boundaries
 Voronoi diagram
 Describes the areas that are nearest to any given point,
given a set of data.
 Each line segment is equidistant between two points of
opposite class

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
Decision Boundaries
 With a large number of examples and possible noise
in the labels, the decision boundary can become
nasty!
 “Overfitting” problem

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
Effect of K
 A larger K produces a smoother decision boundary
 When K == N, KNN always predicts the majority class

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
Discussion
 Which model is better, K = 1 or K = 15?
 Why?
How to choose k?
 Empirically optimal k? (see the sketch below)

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
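As one illustrative way to answer the "empirically optimal k" question, k can be chosen by cross-validation over a range of candidates; the Iris dataset, the odd-k grid, and 5-fold CV below are arbitrary choices used only for the sketch:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validation accuracy for each odd K (odd values avoid voting ties in binary problems)
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in range(1, 22, 2)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```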
Pros and Cons
 Pros
 Learning and implementation are extremely simple and
intuitive
 Flexible decision boundaries

 Cons
 Irrelevant or correlated features have a high impact and
must be eliminated
 Typically difficult to handle high dimensionality
 Computational costs: memory and classification-time
computation

Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
Similarity and Dissimilarity
 Similarity
 Numerical measure of how alike two data objects are.
 Is higher when objects are more alike.
 Often falls in the range [0,1]

 Dissimilarity
 Numerical measure of how different two data objects are
 Lower when objects are more alike
 Minimum dissimilarity is often 0
 Upper limit varies

 Proximity refers to a similarity or dissimilarity


Euclidean Distance
 Euclidean Distance

$$\mathrm{dist} = \sqrt{\sum_{k=1}^{p} (a_k - b_k)^2}$$

where p is the number of dimensions (attributes) and a_k and b_k are, respectively, the
k-th attributes (components) of data objects a and b.

 Standardization is necessary if scales differ.


Euclidean Distance
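A minimal NumPy sketch of the Euclidean distance formula above; the example vectors are made up:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 8.0])

# dist = square root of the sum of squared attribute differences
dist = np.sqrt(np.sum((a - b) ** 2))
print(dist)  # 7.0710..., identical to np.linalg.norm(a - b)
```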
Minkowski Distance
 Minkowski Distance is a generalization of Euclidean
Distance

$$\mathrm{dist} = \left( \sum_{k=1}^{p} |a_k - b_k|^r \right)^{1/r}$$

where r is a parameter, p is the number of dimensions (attributes), and a_k and b_k are,
respectively, the k-th attributes (components) of data objects a and b.
Minkowski Distance: Examples
 r = 1. City block (Manhattan, taxicab, L1 norm) distance.
 A common example of this is the Hamming distance, which is just
the number of bits that are different between two binary vectors

 r = 2. Euclidean distance

 r → ∞. “supremum” (L_max norm, L_∞ norm) distance.


 This is the maximum difference between any component of the
vectors

 Do not confuse r with p, i.e., all these distances are defined


for all numbers of dimensions.
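A sketch of the three special cases above, assuming NumPy; the helper name and vectors are made up for illustration:

```python
import numpy as np

def minkowski(a, b, r):
    """Minkowski distance of order r between vectors a and b."""
    return np.sum(np.abs(a - b) ** r) ** (1.0 / r)

a = np.array([0.0, 3.0])
b = np.array([4.0, 0.0])

print(minkowski(a, b, 1))     # r = 1: city block / Manhattan -> 7.0
print(minkowski(a, b, 2))     # r = 2: Euclidean -> 5.0
print(np.max(np.abs(a - b)))  # r -> infinity: supremum / L_inf -> 4.0
```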
Cosine Similarity
 If d1 and d2 are two document vectors,

$$\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\lVert d_1 \rVert \, \lVert d_2 \rVert}$$

where · indicates the vector dot product and ||d|| is the length of vector d.

 Example:
d1 = (3, 2, 0, 5, 0, 0, 0, 2, 0, 0)
d2 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 2)

d1 · d2 = 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5
||d1|| = (3*3 + 2*2 + 0*0 + 5*5 + 0*0 + 0*0 + 0*0 + 2*2 + 0*0 + 0*0)^0.5 = 42^0.5 ≈ 6.481
||d2|| = (1*1 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 1*1 + 0*0 + 2*2)^0.5 = 6^0.5 ≈ 2.449
cos(d1, d2) = 5 / (6.481 * 2.449) ≈ 0.315
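The arithmetic above can be reproduced with a short NumPy sketch:

```python
import numpy as np

d1 = np.array([3, 2, 0, 5, 0, 0, 0, 2, 0, 0], dtype=float)
d2 = np.array([1, 0, 0, 0, 0, 0, 0, 1, 0, 2], dtype=float)

# cos(d1, d2) = (d1 . d2) / (||d1|| * ||d2||)
cos = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
print(round(cos, 4))  # 0.315
```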
Cosine Similarity

$$\cos(d_1, d_2) = \begin{cases} 1 & \text{exactly the same} \\ 0 & \text{orthogonal} \\ -1 & \text{exactly opposite} \end{cases}$$
Feature scaling
 Standardize the range of independent variables
(features of the data)
 A.k.a. normalization or standardization
Standardization
 Standardization or Z-score normalization
 Rescale the data so that the mean is zero and the
standard deviation is one

$$x_{norm} = \frac{x - \mu}{\sigma}$$

where μ is the mean and σ is the standard deviation; the rescaled values are the standard
scores (z-scores).
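A column-wise z-score sketch, assuming NumPy; the toy matrix is made up:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Subtract each column's mean and divide by its standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0), X_std.std(axis=0))  # approximately [0, 0] and [1, 1]
```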
Min-Max scaling
 Scale the data to a fixed range, between 0 and 1

$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$$
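And the corresponding min-max sketch, mapping each column into [0, 1] (same made-up matrix):

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Each column's minimum maps to 0 and its maximum maps to 1
X_mm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_mm)
```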
Efficient implementation
 Represent the data as a matrix or a vector
 Matrix/vector computation is much more efficient
than computing with explicit loops, as sketched below
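For example, all test-to-training Euclidean distances can be computed with broadcasting instead of nested loops; a sketch assuming NumPy, with arbitrary random data and shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.random((1000, 10))  # 1000 training points, 10 features
X_test = rng.random((200, 10))    # 200 query points

# One vectorized expression replaces a double loop over test and training points
diff = X_test[:, None, :] - X_train[None, :, :]   # shape (200, 1000, 10)
dists = np.sqrt((diff ** 2).sum(axis=2))          # shape (200, 1000)
print(dists.shape)
```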
Discussion
 Can we use KNN for regression problems?
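One possible answer: yes, KNN extends to regression by averaging (or distance-weighting) the target values of the K nearest neighbors instead of voting. A minimal sketch, assuming scikit-learn's KNeighborsRegressor and made-up data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = np.array([0.0, 1.1, 1.9, 3.2, 3.9])

# The prediction is the mean target value of the 2 nearest neighbors
reg = KNeighborsRegressor(n_neighbors=2)
reg.fit(X_train, y_train)
print(reg.predict([[2.5]]))  # average of the targets at x = 2 and x = 3 -> [2.55]
```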
