Lecture 2: k-Nearest Neighbors
Lecture Notes
Sebastian Raschka
Department of Statistics
University of Wisconsin–Madison
http://stat.wisc.edu/~sraschka/teaching/stat479-fs2018/
Fall 2018
2.1 Introduction
Nearest neighbor algorithms are among the “simplest” supervised machine learning algorithms and have been well studied in the field of pattern recognition over the last century. While nearest neighbor algorithms are not as popular as they once were, they are still widely used in practice, and I highly recommend that you at least consider the k-Nearest Neighbor algorithm as a predictive performance benchmark in classification projects when you are trying to develop more sophisticated models.
In this lecture, we will primarily talk about two different algorithms, the Nearest Neighbor (NN) algorithm and the k-Nearest Neighbor (kNN) algorithm. NN is just a special case of kNN, where k = 1. To avoid making this text unnecessarily convoluted, we will only use the abbreviation NN if we talk about concepts that do not apply to kNN in general. Otherwise, we will use kNN to refer to nearest neighbor algorithms in general, regardless of the value of k.
The kNN algorithm does not involve any model fitting; the training examples are merely stored during the training phase. For this reason, kNN is also called a lazy learning algorithm. What it means to be a lazy learning algorithm is that the processing of the training examples is postponed until making predictions1 – again, the training consists literally of just storing the training data.
1 When you are reading recent literature, note that the prediction step is now often called “inference.”
Then, to make a prediction (class label or continuous target), the kNN algorithm finds the k nearest neighbors of a query point and computes the class label (classification) or continuous target (regression) based on the k nearest (most “similar”) points. The exact mechanics will be explained in the next sections. However, the overall idea is that instead of approximating the target function f(x) = y globally, during each prediction, kNN approximates the target function locally. In practice, it is easier to learn to approximate a function locally than globally.
Figure 1: Illustration of the nearest neighbor classification algorithm in two dimensions (features x1 and x2). In the left subpanel, the training examples are shown as blue dots, and a query point that we want to classify is shown as a question mark. In the right subpanel, the class labels of the training examples are shown, and the dashed line indicates the nearest neighbor of the query point, assuming a Euclidean distance metric. The predicted class label is the class label of the closest data point in the training set (here: class 0).
In the previous lecture, we learned about different kinds of categorization schemes, which
may be helpful for understanding and distinguishing different types of machine learning
algorithms.
To recap, the categories we discussed were
• eager vs lazy;
• batch vs online;
• parametric vs nonparametric;
• discriminative vs generative.
Since kNN does not have an explicit training step and defers all of the computation until prediction, we already determined that kNN is a lazy algorithm.
Further, instead of devising one global model or approximation of the target function, for each different data point, there is a different local approximation, which depends on the data point itself as well as the training data points. Since the prediction is based on a comparison of a query point with data points in the training set (rather than a global model), kNN is also categorized as an instance-based (or “memory-based”) method. While kNN is a lazy instance-based learning algorithm, an example of an eager instance-based learning algorithm would be the support vector machine, which will be covered later in this course.
Lastly, because we do not make any assumption about the functional form of the kNN prediction, a kNN model is also considered a nonparametric model. However, categorizing kNN as either a discriminative or a generative model is less clear-cut.
While neural networks are gaining popularity in the computer vision and pattern recognition field, one area where k-nearest neighbor models are still commonly and successfully used is the intersection between computer vision, pattern classification, and biometrics (e.g., to make predictions based on extracted geometrical features2).
Other common use cases include recommender systems (via collaborative filtering3 ) and
outlier detection4 .
2.2 Nearest Neighbor Algorithm
After introducing the overall concept of the nearest neighbor algorithms, this section provides a more formal or technical description of the 1-nearest neighbor (NN) algorithm.
Training algorithm:
for i = 1, ..., n in the training dataset D, where |D| = n:
• store training example ⟨x[i], f(x[i])⟩
Prediction algorithm5:
closest_point := None
closest_distance := ∞
• for i = 1, ..., n:
  – current_distance := d(x[i], x[q])
  – if current_distance < closest_distance:
    closest_distance := current_distance; closest_point := x[i]
• prediction h(x[q]) := f(closest_point)
2 … “geometrical features extraction”. In: Procedia Computer Science 65 (2015), pp. 529–537.
3 Youngki Park et al. “Reversed CF: A fast collaborative filtering algorithm using a k-nearest neighbor graph”. In: Expert Systems with Applications 42.8 (2015), pp. 4022–4028.
4 Guilherme O. Campos et al. “On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study”. In: Data Mining and Knowledge Discovery 30.4 (2016), pp. 891–927.
5 We use “:=” as an assignment operator.
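To make the pseudocode above concrete, the following is a minimal Python sketch of the 1-NN prediction step (the names nn_predict, X_train, and y_train are hypothetical; NumPy is assumed for the distance computation):

import numpy as np

def nn_predict(X_train, y_train, x_query):
    """Return the target value of the single closest training example (1-NN)."""
    closest_idx, closest_dist = None, np.inf
    for i, x_i in enumerate(X_train):
        dist = np.sqrt(np.sum((x_i - x_query) ** 2))  # Euclidean distance
        if dist < closest_dist:
            closest_dist, closest_idx = dist, i
    return y_train[closest_idx]

# toy usage: two training examples from two classes
X_train = [np.array([1.0, 1.0]), np.array([5.0, 5.0])]
y_train = [0, 1]
print(nn_predict(X_train, y_train, np.array([1.5, 0.5])))  # -> 0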
2.3 Nearest Neighbor Decision Boundary
In this section, we will build some intuition for the decision boundary of the NN classification model. Assuming a Euclidean distance metric, the decision boundary between any two training examples a and b is a straight line. If a query point is located on the decision boundary, this means it is equidistant from both training examples a and b.
While the decision boundary between a pair of points is a straight line, the decision boundary
of the NN model on a global level, considering the whole training set, is a set of connected,
convex polyhedra. All points within a polyhedron are closest to the training example inside,
and all points outside the polyhedron are closer to a different training example.
Figure 3: Illustration of the nearest neighbor decision boundary as the union of the polyhedra of
training examples belonging to the same class.
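As a quick numerical illustration of the equidistance property described above (with hypothetical coordinates), the following sketch verifies that a point on the perpendicular bisector of two training examples a and b has the same Euclidean distance to both, i.e., it lies exactly on the pairwise decision boundary:

import numpy as np

a = np.array([0.0, 0.0])   # training example a
b = np.array([4.0, 0.0])   # training example b
midpoint = (a + b) / 2.0

# move away from the midpoint along the perpendicular bisector
query = midpoint + np.array([0.0, 3.0])

dist_a = np.linalg.norm(query - a)
dist_b = np.linalg.norm(query - b)
print(dist_a, dist_b)  # both sqrt(13) ~ 3.61, so the query is equidistant from a and b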
2.4 k-Nearest Neighbors
Previously, we described the NN algorithm, which makes a prediction by assigning the class label or continuous target value of the most similar training example to the query point (where similarity is typically measured using the Euclidean distance metric for continuous features).
Instead of basing the prediction on the single most similar training example, kNN considers the k nearest neighbors when predicting a class label (in classification) or a continuous target value (in regression).
2.4.1 Classification
In the classification setting, the simplest incarnation of the kNN model is to predict the target class label as the class label that is most often represented among the k most similar training examples for a given query point. In other words, the class label can be considered as the “mode” of the k training labels or the outcome of a “plurality voting.” Note that in the literature, kNN classification is often described as a “majority voting.” While the authors usually mean the right thing, the term “majority voting” is a bit unfortunate as it typically refers to a reference value of >50% for making a decision. In the case of binary predictions (classification problems with two classes), there is always a majority or a tie. Hence, a majority vote is also automatically a plurality vote. However, in multi-class settings, we do not require a majority to make a prediction via kNN. For example, in a three-class setting, a frequency greater than 1/3 (approx. 33.3%) could already be enough to assign a class label.
Figure 4: Illustration of majority versus plurality voting among the labels y of the k nearest neighbors. In example (A), the plurality vote coincides with the majority vote; in example (B), there is no majority (majority vote: None), yet a plurality vote still yields a prediction.
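To make the distinction between majority and plurality voting concrete, here is a minimal Python sketch (the helper names plurality_vote and majority_vote are hypothetical) applied to a three-class situation like the one described above:

from collections import Counter

def plurality_vote(labels):
    """Return the most frequent label among the k nearest neighbors."""
    return Counter(labels).most_common(1)[0][0]

def majority_vote(labels):
    """Return the most frequent label only if it exceeds 50% of the votes, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

neighbor_labels = [0, 1, 2, 1, 2, 2, 0]      # k = 7 neighbors, three classes
print(plurality_vote(neighbor_labels))        # -> 2 (3 of 7 votes, ~43%)
print(majority_vote(neighbor_labels))         # -> None (no class exceeds 50%)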
Remember that the NN prediction rule (recall that we defined NN as the special case of kNN with k = 1) is the same for both classification and regression. However, in kNN we have two distinct prediction algorithms:
• plurality voting among the class labels of the k nearest neighbors (classification);
• averaging the continuous target values of the k nearest neighbors (regression).
More formally, assume we have a target function f (x) = y that assigns a class label y ∈
{1, . . . , t} to a training example,
f : \mathbb{R}^m \rightarrow \{1, ..., t\}.   (3)
(Usually, we use the letter k to denote the number of classes in this course, but in the context
of k-NN, it would be too confusing.)
Assuming we identified the k nearest neighbors (D_k ⊆ D) of a query point x[q],

h(x^{[q]}) = \arg\max_{y \in \{1, \dots, t\}} \sum_{i=1}^{k} \delta\big(y, f(x^{[i]})\big).   (5)
Here, δ denotes the Kronecker delta function

\delta(a, b) = \begin{cases} 1, & \text{if } a = b, \\ 0, & \text{if } a \neq b. \end{cases}   (6)
Or, in simpler notation, if you remember the “mode” from introductory statistics classes:
h(x^{[q]}) = \mathrm{mode}\big(f(x^{[1]}), \dots, f(x^{[k]})\big).   (7)
A common distance metric to identify the k nearest neighbors D_k is the Euclidean distance measure,

d(x^{[a]}, x^{[b]}) = \sqrt{\sum_{j=1}^{m} \left(x_j^{[a]} - x_j^{[b]}\right)^2},   (8)
which is a pairwise distance metric that computes the distance between two data points x[a]
and x[b] over the m input features.
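Putting the pieces together, a minimal kNN classification sketch based on Euclidean distances and a plurality vote (hypothetical names knn_classify, X_train, y_train; NumPy assumed) could look like this:

import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_query, k=3):
    """Predict the class of x_query by plurality vote among its k nearest neighbors."""
    # Euclidean distances between the query point and all training examples, Eq. (8)
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # indices of the k smallest distances, i.e., the neighbor set D_k
    nearest = np.argsort(dists)[:k]
    # plurality vote over the neighbors' class labels, Eqs. (5)-(7)
    return Counter(y_train[nearest]).most_common(1)[0][0]

# toy usage with a hypothetical 2D dataset
X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(X_train, y_train, np.array([1.5, 1.5]), k=3))  # -> 0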
2.4.2 Regression
The general concept of k NN for regression is the same as for classification: first, we find
the k nearest neighbors in the dataset; second, we make a prediction based on the labels
of the k nearest neighbors. However, in regression, the target function is a real- instead of
discrete-valued function,
f : \mathbb{R}^m \rightarrow \mathbb{R}.   (9)
A common approach for computing the continuous target is to compute the mean or average
target value over the k nearest neighbors,
h(x^{[q]}) = \frac{1}{k} \sum_{i=1}^{k} f(x^{[i]}).   (10)
As an alternative to averaging the target values of the k nearest neighbors to predict the
label of a query point, it is also not uncommon to use the median instead.
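Finally, a minimal sketch of kNN regression following Equation (10), with the median as the alternative aggregation mentioned above (hypothetical names knn_regress, X_train, y_train; NumPy assumed):

import numpy as np

def knn_regress(X_train, y_train, x_query, k=3, aggregate="mean"):
    """Predict a continuous target as the mean (or median) of the k nearest neighbors' targets."""
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                          # neighbor set D_k
    neighbor_targets = y_train[nearest]
    if aggregate == "median":
        return np.median(neighbor_targets)
    return np.mean(neighbor_targets)  # Eq. (10)

# toy usage with a hypothetical 1D dataset
X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
y_train = np.array([1.1, 1.9, 3.2, 9.8])
print(knn_regress(X_train, y_train, np.array([2.5]), k=3))                      # mean -> ~2.07
print(knn_regress(X_train, y_train, np.array([2.5]), k=3, aggregate="median"))  # median -> 1.9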