
DM & BDA

k-Nearest Neighbours
Content

1 Motivation

2 Nearest Neighbours for Classification

3 Nearest Neighbours for Regression

4 The Curse of Dimensionality

5 Advantages & Shortcomings

6 Real-Life Applications

Reading: Shmueli et al., §7


Lantz, §3
Motivation

“Dark dining” restaurants:

You are served a tomato.

How do you recognise what you are eating?


Motivation

Previous food experience:


Ingredient   Sweetness   Crunchiness   Food Type
…            10          9             fruit
…            1           4             protein
…            10          1             fruit
…            7           10            vegetable
…            3           10            vegetable
…            1           1             protein
…            …           …             …
Motivation

Scatterplot in the sweetness-crunchiness plane:


[Scatterplot: the foods plotted with sweetness on the x-axis and crunchiness on the y-axis]
Motivation

Similar food types tend to be clustered together:


[Scatterplot: sweetness vs. crunchiness, with fruits, vegetables and proteins forming separate clusters]
Motivation

Let’s look at tomatoes:

[Scatterplot: the tomato and its 1st, 2nd and 3rd nearest neighbours in the sweetness-crunchiness plane]

1-Nearest Neighbour: the tomato is closest to oranges; oranges are fruits, hence tomatoes are fruits.

2-Nearest Neighbour: the tomato is closest to oranges and grapes; oranges and grapes are fruits, hence tomatoes are fruits.

3-Nearest Neighbour: the tomato is closest to oranges, grapes and nuts; 2/3 of the nearest neighbours are fruits, hence tomatoes are fruits.
Content

1 Motivation

2 Nearest Neighbours for Classification

3 Nearest Neighbours for Regression

4 The Curse of Dimensionality

5 Advantages & Shortcomings

6 Real-Life Applications

Reading: Shmueli et al., §7.1


Lantz, §3
k-Nearest Neighbours for Classification

Input
n training samples (Xi1, …, Xip; Yi), i = 1, …, n, with Xi1, …, Xip
predictors and Yi the categorical outcome.
New data point (X1, …, Xp) whose outcome should be found.

Algorithm
1 Find the k training samples with the smallest distance
dist((Xi1 , . . . , Xip ), (X1 , . . . , Xp )).

2 Assign to the new data point (X1, …, Xp) the majority category
among these k training samples.
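To make the two steps concrete, here is a minimal Python sketch of the algorithm (not part of the original note); it uses Euclidean distance via math.dist and the (sweetness, crunchiness) values from the food table, with function and variable names chosen purely for illustration.

```python
from collections import Counter
import math

def knn_classify(train_X, train_y, new_x, k=3):
    """Classify new_x by majority vote among its k nearest training samples."""
    # Distance from new_x to every training sample (Euclidean; math.dist needs Python 3.8+).
    distances = [(math.dist(x, new_x), label) for x, label in zip(train_X, train_y)]
    # Step 1: keep the k training samples with the smallest distance.
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Step 2: assign the majority category among these k samples.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy version of the food example: (sweetness, crunchiness) -> food type
train_X = [(10, 9), (1, 4), (10, 1), (7, 10), (3, 10), (1, 1)]
train_y = ["fruit", "protein", "fruit", "vegetable", "vegetable", "protein"]

print(knn_classify(train_X, train_y, new_x=(6, 4), k=3))  # prediction for a new food
```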
k-Nearest Neighbours for Classification

Input
n training samples (Xi1, …, Xip; Yi), i = 1, …, n, with Xi1, …, Xip predictors and Yi the outcome.
New data point (X1, …, Xp) whose category should be found.

Algorithm
1 Find the k training samples with the smallest distance dist((Xi1, …, Xip), (X1, …, Xp)).
2 Assign to the new data point (X1, …, Xp) the majority category among these k training samples.

Questions:
1 Which distance function should we use?
2 What about binary/categorical predictors?
3 How should we choose k?
(1) Distance Functions

Typically, distances are measured by a q-norm:

$$\mathrm{dist}_q\big((X_{i1},\dots,X_{ip}),(X_1,\dots,X_p)\big) = \Big(\sum_{j=1}^{p} |X_{ij}-X_j|^q\Big)^{1/q}$$

q = 1 (Manhattan/taxicab distance):

$$\mathrm{dist}_1\big((X_{i1},\dots,X_{ip}),(X_1,\dots,X_p)\big) = \sum_{j=1}^{p} |X_{ij}-X_j|$$

q = 2 (Euclidean distance):

$$\mathrm{dist}_2\big((X_{i1},\dots,X_{ip}),(X_1,\dots,X_p)\big) = \sqrt{\sum_{j=1}^{p} (X_{ij}-X_j)^2}$$

q = ∞ (maximum distance):

$$\mathrm{dist}_\infty\big((X_{i1},\dots,X_{ip}),(X_1,\dots,X_p)\big) = \max\{|X_{ij}-X_j| : j=1,\dots,p\}$$
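As a quick illustration (not from the note), the three distances can be computed directly from their definitions; the vectors below reuse the (sweetness, crunchiness) encoding and are purely illustrative.

```python
def manhattan(x, z):
    # q = 1: sum of absolute coordinate differences
    return sum(abs(xj - zj) for xj, zj in zip(x, z))

def euclidean(x, z):
    # q = 2: square root of the sum of squared differences
    return sum((xj - zj) ** 2 for xj, zj in zip(x, z)) ** 0.5

def maximum(x, z):
    # q = infinity: largest absolute coordinate difference
    return max(abs(xj - zj) for xj, zj in zip(x, z))

x, z = (10, 9), (6, 4)      # e.g. (sweetness, crunchiness) of two foods
print(manhattan(x, z))      # 9
print(euclidean(x, z))      # about 6.40
print(maximum(x, z))        # 5
```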
(1) Distance Functions

[Figure: illustrations of the Manhattan distance, the Euclidean distance and the maximum distance]
(1) Scaling: Motivation

Nearest Neighbour methods are sensitive to scaling:

Imagine we add a “spiciness” dimension to our food example.
We measure spiciness according to the Scoville scale, whose values range from 0 into the hundreds of thousands, far beyond the 1 to 10 range of sweetness and crunchiness.

What would happen to our Nearest Neighbour classifier?
(1) Scaling: Methods

1 Min-max normalization: works well if the predictor is roughly uniform

$$X_{ij}^{\text{new}} = \frac{X_{ij}^{\text{old}} - \min\{X_{ij}^{\text{old}} : i=1,\dots,n\}}{\max\{X_{ij}^{\text{old}} : i=1,\dots,n\} - \min\{X_{ij}^{\text{old}} : i=1,\dots,n\}}$$

2 Z-score normalization: good in the presence of outliers

$$X_{ij}^{\text{new}} = \frac{X_{ij}^{\text{old}} - \mu_j}{\sigma_j} \quad\text{with}\quad \mu_j = \frac{1}{n}\sum_{i=1}^{n} X_{ij}^{\text{old}} \quad\text{and}\quad \sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(X_{ij}^{\text{old}} - \mu_j\big)^2}$$

The same transformation must be applied to the new data point (X1, …, Xp)!
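A minimal sketch of both normalizations for a single feature column, assuming plain Python lists (the helper names are invented for this example); note that the parameters estimated on the training column are reused for the new data point, exactly as the slide demands.

```python
def min_max_params(column):
    return min(column), max(column)

def min_max_scale(value, lo, hi):
    # Maps the training range [lo, hi] onto [0, 1].
    return (value - lo) / (hi - lo)

def z_score_params(column):
    n = len(column)
    mu = sum(column) / n
    sigma = (sum((v - mu) ** 2 for v in column) / n) ** 0.5   # population std, as in the formula
    return mu, sigma

def z_score_scale(value, mu, sigma):
    return (value - mu) / sigma

sweetness = [10, 1, 10, 7, 3, 1]                      # training column
lo, hi = min_max_params(sweetness)
scaled_train = [min_max_scale(v, lo, hi) for v in sweetness]
scaled_new = min_max_scale(6, lo, hi)                 # SAME lo/hi applied to the new data point
```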
(2) Binary and Categorical Predictors

Consider the binary predictor “male/female”:

The predictor can be replaced with 1 dummy numerical variable:

$$\text{male} = \begin{cases} 1 & \text{if male,} \\ 0 & \text{if female.} \end{cases}$$

Consider a 3-category temperature predictor “hot/medium/cold”:

1 The predictor can be replaced with 1 numerical variable:

$$\text{temp} = \begin{cases} 3 & \text{if hot,} \\ 2 & \text{if medium,} \\ 1 & \text{if cold.} \end{cases}$$

2 The predictor can be replaced with 2/3 dummy numerical variables:

$$\text{is\_hot} = \begin{cases} 1 & \text{if hot,} \\ 0 & \text{otherwise,} \end{cases} \qquad \text{is\_med} = \begin{cases} 1 & \text{if medium,} \\ 0 & \text{otherwise.} \end{cases}$$
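To make the encodings concrete, a small Python sketch (not from the note); the mapping and the dummy names is_hot/is_med mirror the definitions above and are otherwise arbitrary.

```python
# Option 1: one ordered numerical variable
temp_as_number = {"hot": 3, "medium": 2, "cold": 1}

# Option 2: dummy (indicator) variables; the remaining category is implied by all zeros
def temp_as_dummies(value):
    return {
        "is_hot": 1 if value == "hot" else 0,
        "is_med": 1 if value == "medium" else 0,
    }

print(temp_as_number["medium"])   # 2
print(temp_as_dummies("cold"))    # {'is_hot': 0, 'is_med': 0}
```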

(3) How to Choose k:
The Bias/Variance Trade-Off
Consider the dataset:

[Figure: the resulting k-NN classifier for k = 1 and for k = 99]

What happens if k approaches n?


(3) How to Choose k:
The Bias/Variance Trade-Off

[Plot: reducible error against model complexity (from large k on the left to small k on the right). Bias falls and variance rises with model complexity; the total error is minimised at an intermediate, optimal k.]

bias: large k converges towards the “training average” (underfitting)

variance: small k reacts to noise/outliers (overfitting)

rule of thumb: k = √n often gives good performance
(3) How to Choose k:
The Bias/Variance Trade-Off
The bias/variance trade-off in machine learning

[Figure: illustrations of the four combinations of low/high bias and low/high variance]
(3) How to Choose k:
Validation and Test Sets
Naïve idea: choose k that performs best on the training data
• For each k = 1, …, n (or a suitable subset), check what percentage of the training samples is correctly classified by the k-Nearest Neighbours method over all training data.
• Choose the parameter k that gives the best result.

[Plot: classification error on the training data (y-axis, 5% to 25%) against the parameter k (x-axis, 120 down to 0); the best estimator for the training data is the training data itself!]
(3) How to Choose k:
Validation and Test Sets
Better idea: split samples into training and validation sets!

[Diagram: the samples are split into 80% training data and 20% validation data]

Remove e.g. 20%, 1/3 or 1/2 of your training samples and use them as validation samples.

For each k = 1, …, n (or a suitable subset), check what percentage of the validation samples is correctly classified by the k-Nearest Neighbours method.

Choose the parameter k that gives the best result on the validation set.
(3) How to Choose k:
Validation and Test Sets
Better idea: split samples into training and validation sets!

[Diagram: 80% training data / 20% validation data. Plot: classification error on the validation data against the parameter k; the optimum lies somewhere at an intermediate k]
(3) How to Choose k:
Validation and Test Sets
Better idea: split samples into training and validation sets!

[Diagram: 80% training data / 20% validation data. Plot: classification error on the validation data against the parameter k; the optimum lies somewhere at an intermediate k]

Can we expect the same performance on new data?
(3) How to Choose k:
Validation and Test Sets
Estimate of generalization error: use test set!

[Diagram: the samples are split into 60% training data, 20% validation data and 20% test data]

Remove e.g. 40% of your training samples; use one half as validation samples, the other half as test samples.

For each k = 1, …, n (or a suitable subset), check what percentage of the validation samples is correctly classified by the k-Nearest Neighbours method.

Choose the parameter k that gives the best result.

For the optimal k, check what percentage of the test samples is correctly classified; this gives the generalization error.
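A sketch of this train/validation/test procedure (not from the note); it reuses the hypothetical knn_classify function from the earlier classification sketch, and the 60/20/20 split plus the candidate values of k are assumptions chosen for illustration.

```python
import random

def accuracy(train_X, train_y, eval_X, eval_y, k):
    hits = sum(knn_classify(train_X, train_y, x, k) == y
               for x, y in zip(eval_X, eval_y))
    return hits / len(eval_y)

def split_60_20_20(X, y, seed=0):
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    a, b = int(0.6 * len(idx)), int(0.8 * len(idx))
    take = lambda ids: ([X[i] for i in ids], [y[i] for i in ids])
    return take(idx[:a]), take(idx[a:b]), take(idx[b:])

def choose_k(X, y, candidate_ks=(1, 3, 5, 7, 9)):
    (tr_X, tr_y), (va_X, va_y), (te_X, te_y) = split_60_20_20(X, y)
    # Pick the k with the best accuracy on the validation set ...
    best_k = max(candidate_ks, key=lambda k: accuracy(tr_X, tr_y, va_X, va_y, k))
    # ... and estimate the generalization error on the untouched test set.
    generalization_error = 1 - accuracy(tr_X, tr_y, te_X, te_y, best_k)
    return best_k, generalization_error
```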
(3) How to Choose k:
Validation and Test Sets
Estimate of generalization error: use test set!

[Diagram: 60% training data / 20% validation data / 20% test data. Plot: classification error against the parameter k, with the estimate of the generalization error (from the test set) marked at the chosen k]
k-Nearest Neighbours: Algorithm Variants

1 Propensities (“confidence values”) for predictions:

• The k-NN method assigns the majority category Y among the k nearest training samples to a new data point.
• We can define the following propensity for this assignment:

$$100\% \cdot \frac{\#\text{ of nearest neighbours with category } Y}{k} \;\in\; (0\%, 100\%]$$

2 Cutoff values (“don’t know’s”):

• We can specify a minimum confidence under which the k-NN method does not assign any category (“I don’t know!”).
• Useful if wrong categorization is costly (e.g. HIV diagnosis).
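A sketch combining both variants (not from the note): it returns the majority category together with its propensity and withholds the prediction when the propensity falls below a cutoff; names and defaults are illustrative.

```python
from collections import Counter
import math

def knn_classify_with_propensity(train_X, train_y, new_x, k=3, cutoff=0.0):
    # k nearest training samples by Euclidean distance
    nearest = sorted((math.dist(x, new_x), label)
                     for x, label in zip(train_X, train_y))[:k]
    votes = Counter(label for _, label in nearest)
    category, count = votes.most_common(1)[0]
    propensity = count / k              # share of the k neighbours carrying the majority category
    if propensity < cutoff:
        return None, propensity         # "I don't know!"
    return category, propensity
```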
Content

1 Motivation

2 Nearest Neighbours for Classification

3 Nearest Neighbours for Regression

4 The Curse of Dimensionality

5 Advantages & Shortcomings

6 Real-Life Applications

Reading: Shmueli et al., §7.2


Lantz, §3
k-Nearest Neighbours for Regression

Input
n training samples (Xi1, …, Xip; Yi), i = 1, …, n, with Xi1, …, Xip
predictors and Yi the numerical outcome.
New data point (X1, …, Xp) whose outcome should be found.

Algorithm
1 Find the k training samples with the smallest distance
dist((Xi1 , . . . , Xip ), (X1 , . . . , Xp )).

2 Assign to the new data point (X1, …, Xp) the average of the
numerical outcomes among these k training samples.
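A minimal sketch of the regression variant (not from the note); it is identical to the earlier classification sketch except that step 2 averages the numerical outcomes, and the names are again illustrative.

```python
import math

def knn_regress(train_X, train_y, new_x, k=3):
    # Step 1: the k training samples closest to new_x (Euclidean distance).
    nearest = sorted((math.dist(x, new_x), y)
                     for x, y in zip(train_X, train_y))[:k]
    # Step 2: predict the average numerical outcome of these k samples.
    return sum(y for _, y in nearest) / k
```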
Choice of k in Regression Problems

Choice of parameter k:

The parameter k can be chosen in the same way as for a classification problem (using a validation set or validation/test sets).

We replace the classification error with the mean square error:

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m} \big(\hat{y}_i - y_i\big)^2$$

where m is the number of samples in the validation or test set, $\hat{y}_i$ is the predicted numerical response for sample i, and $y_i$ is the actual numerical response for sample i.
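The MSE criterion in code form, a sketch that assumes the hypothetical knn_regress function above and an already prepared training/validation split; the candidate values of k are arbitrary.

```python
def mse(y_pred, y_true):
    # Mean square error over the m validation (or test) samples.
    m = len(y_true)
    return sum((yh - y) ** 2 for yh, y in zip(y_pred, y_true)) / m

def choose_k_regression(tr_X, tr_y, va_X, va_y, candidate_ks=(1, 3, 5, 7, 9)):
    def validation_mse(k):
        preds = [knn_regress(tr_X, tr_y, x, k) for x in va_X]
        return mse(preds, va_y)
    # Keep the k with the smallest MSE on the validation set.
    return min(candidate_ks, key=validation_mse)
```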
Example: Pandora
music streaming & recommendation service that adapts to your taste

musicians characterize songs by up to 450 features on a 0-5 scale (the Music Genome Project)

Using a modified k-NN algorithm, Pandora recommends songs that are (dis-)similar to the ones previously (dis-)liked by the user
Content

1 Motivation

2 Nearest Neighbours for Classification

3 Nearest Neighbours for Regression

4 The Curse of Dimensionality

5 Advantages & Shortcomings

6 Real-Life Applications
Reading: Lantz, §3
The Curse of Dimensionality (1): Fixed-Size
Training Sets Don’t Cover the Space
Assumptions:

5,000 training samples with p numerical features in [0,1]


training samples are uniformly distributed on [0,1]^p
we measure distances by the maximum norm

Question: How close are the 4 nearest neighbours of a point?

Approximate answer: We need to find the smallest hypercube that covers 1/1,000 of the volume of the [0,1]^p hypercube.

the [0,1]^p hypercube has volume 1

a hypercube with side length c has volume c^p

$$c^p = \frac{1}{1000} \quad\Longrightarrow\quad c = \Big(\frac{1}{1000}\Big)^{1/p}$$
The Curse of Dimensionality (1): Fixed-Size
Training Sets Don’t Cover the Space
$$c = \Big(\frac{1}{1000}\Big)^{1/p}$$

[Plot: the required side length c (y-axis, 0 to 1) against the number of dimensions p (x-axis, 0 to 100)]

In 3 dimensions, we need 10% side length

In 100 dimensions, we need 93% side length
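A quick numerical check of the formula (a sketch, not from the note) reproduces the two values quoted on the chart.

```python
def side_length(p, volume_fraction=1/1000):
    # Side length of the hypercube that covers the given fraction of [0,1]^p.
    return volume_fraction ** (1 / p)

for p in (1, 2, 3, 10, 100):
    print(p, round(side_length(p), 3))
# p = 3   -> 0.1   (10% side length)
# p = 100 -> 0.933 (about 93% side length)
```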
The Curse of Dimensionality (1): Fixed-Size
Training Sets Don’t Cover the Space

Filling 50% of the volume takes side lengths of…

50% in R^1, 71% in R^2, 80% in R^3
The Curse of Dimensionality (2):
Similarity Breaks Down in High Dimensions
The volume of a high-dimensional orange
is concentrated in the skin, not the pulp
Volume of an n-dimensional ball of radius R:

$$V_n(R) = \frac{\pi^{n/2}}{\Gamma\big(\frac{n}{2}+1\big)}\, R^n,$$

where Γ is Euler’s gamma function. For a skin of thickness ε:

volume outside (skin): $\dfrac{\pi^{n/2}}{\Gamma(\frac{n}{2}+1)}\,\big[R^n - (R-\varepsilon)^n\big]$

volume inside (pulp): $\dfrac{\pi^{n/2}}{\Gamma(\frac{n}{2}+1)}\,(R-\varepsilon)^n$

$$\frac{\frac{\pi^{n/2}}{\Gamma(\frac{n}{2}+1)}\big[R^n-(R-\varepsilon)^n\big]}{\frac{\pi^{n/2}}{\Gamma(\frac{n}{2}+1)}\,R^n} = 1 - \Big(1-\frac{\varepsilon}{R}\Big)^n \;\longrightarrow\; 1 \quad \text{as } n \to \infty.$$
The Curse of Dimensionality (2):
Similarity Breaks Down in High Dimensions
The volume of a high-dimensional orange is concentrated in the skin, not the pulp

Volume of an n-dimensional ball of radius R: $V_n(R) = \dfrac{\pi^{n/2}}{\Gamma(\frac{n}{2}+1)}\,R^n$, where Γ is Euler’s gamma function.

Consequence: If the training data is high-dimensional and each feature is uniformly distributed, the samples in fact live in a thin shell away from the centre!
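A short numerical illustration of the "orange" argument (a sketch, not from the note): the constant factor cancels, so the fraction of the ball's volume in the skin is 1 - (1 - ε/R)^n; the skin thickness of 5% of the radius is an arbitrary choice for this example.

```python
def skin_fraction(n, eps_over_R=0.05):
    # Fraction of an n-dimensional ball's volume within distance eps of the surface.
    return 1 - (1 - eps_over_R) ** n

for n in (2, 10, 50, 200):
    print(n, round(skin_fraction(n), 3))
# n = 2   -> 0.098
# n = 10  -> 0.401
# n = 50  -> 0.923
# n = 200 -> 1.0 (to three decimals)
```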
Content

1 Motivation

2 Nearest Neighbours for Classification

3 Nearest Neighbours for Regression

4 The Curse of Dimensionality

5 Advantages & Shortcomings

6 Real-Life Applications

Reading: Shmueli et al., §7.3


Lantz, §3
Advantages & Shortcomings

Advantages:

Very simple but often surprisingly effective

Fast training phase (just need to store the training set)

Non-parametric approach that can make use of large amounts of data

Shortcomings:

Does not produce a model that offers insights into the relationship between features and response

Slow classification phase (requires determination of the k nearest neighbours); k-NN is therefore called a lazy learner

Requires choice of a suitable k

Preprocessing required for scaling, binary/categorical features and missing values

Suffers from the curse of dimensionality
Content

1 Motivation

2 Nearest Neighbours for Classification

3 Nearest Neighbours for Regression

4 The Curse of Dimensionality

5 Advantages & Shortcomings

6 Real-Life Applications
Real-Life Applications

1 Optical Character Recognition

2 Face Recognition

3 Recommender Systems
