
Lecture #12: kNN Classification and Missing Data

Data Science 1
CS 109A, STAT 121A, AC 209A, E-109A

Pavlos Protopapas, Kevin Rader, Margo Levine, Rahul Dave
Lecture Outline

ROC Curves

k-NN Revisited

Dealing with Missing Data

Types of Missingness

Imputation Methods

2
ROC Curves

3
ROC Curves

The ROC curve illustrates the trade-off between the two types of error (or correct classification) across all possible classification thresholds.

The vertical axis displays the true positive rate and the horizontal axis displays the false positive rate.

What is the shape of an ideal ROC curve?

See next slide for an example.

4
ROC Curve Example

5
ROC Curve for measuring classifier performance

The overall performance of a classifier, calculated over all possible thresholds, is given by the area under the ROC curve (‘AUC’). Let T be the threshold false positive rate and let TPR(T) be the corresponding true positive rate at T. Then the AUC is the integral

AUC = ∫₀¹ TPR(T) dT

What is the worst case scenario for AUC? What is the best case? What is the AUC if we just flip a coin independently to perform classification?

The AUC can then be used to compare various approaches to classification: logistic regression, LDA (to come), kNN, etc.

6
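As an illustration of how the ROC curve and AUC are computed in practice, here is a minimal sketch using scikit-learn's roc_curve and roc_auc_score; the toy data and variable names are made up for the example.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve, roc_auc_score

    # Hypothetical toy data standing in for a real train/test split.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 2))
    y_train = (X_train[:, 0] + rng.normal(size=200) > 0).astype(int)
    X_test = rng.normal(size=(100, 2))
    y_test = (X_test[:, 0] + rng.normal(size=100) > 0).astype(int)

    model = LogisticRegression().fit(X_train, y_train)
    p_hat = model.predict_proba(X_test)[:, 1]        # predicted P(Y = 1)

    fpr, tpr, thresholds = roc_curve(y_test, p_hat)  # one (FPR, TPR) point per threshold
    print(roc_auc_score(y_test, p_hat))              # area under the ROC curve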
k-NN Revisited

7
k-Nearest Neighbors

We’ve already seen the k-NN method for predicting a quantitative response (it was the very first method we introduced). How was k-NN implemented in the regression setting (quantitative response)?

The approach was simple: to predict an observation’s response, use the other available observations that are most similar to it.

For a specified value of k, each observation’s outcome is predicted to be the average of the k closest observations, as measured by some distance on the predictor(s).

With one predictor, the method was easily implemented.

8
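For concreteness, a minimal sketch of k-NN regression with scikit-learn's KNeighborsRegressor on hypothetical one-predictor data (the data and variable names below are made up for illustration):

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    # Hypothetical one-predictor training data.
    rng = np.random.default_rng(1)
    x_train = np.sort(rng.uniform(-6, 6, size=100)).reshape(-1, 1)
    y_train = np.sin(x_train).ravel() + rng.normal(scale=0.3, size=100)

    # Each new point is predicted as the average response of its k nearest neighbors.
    knn = KNeighborsRegressor(n_neighbors=10).fit(x_train, y_train)
    x_new = np.linspace(-6, 6, 200).reshape(-1, 1)
    y_hat = knn.predict(x_new)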
Review: Choice of k

How well the predictions perform is related to the choice of k.

What will the predictions look like if k is very small? What if it is very large?

More specifically, what will the predictions be for new observations if k = n?

A picture is worth a thousand words...

9
Choice of k Matters

[Figure: k-NN regression fits to simulated training data (x_train on the horizontal axis, y_train on the vertical axis) for several values of k, including k = 2, 10, 100, 500, and 1000.]
10
k-NN for Classification

How can we modify the k-NN approach for classification?

The approach here is the same as for k-NN regression: use the other available observations that are most similar to the observation we are trying to predict (classify into a group) based on the predictors at hand.

How do we classify which category a specific observation should be in based on its nearest neighbors?

The category that shows up the most among the nearest neighbors.

11
k-NN for Classification formal definition

The k-NN classifier first identifies the k points in the training data that are closest to x0, represented by N0. It then estimates the conditional probability for class j as the fraction of points in N0 whose response values equal j:

P(Y = j | X = x0) = (1/k) Σ_{i ∈ N0} I(y_i = j)

Then the k-NN classifier applies Bayes rule and classifies the test observation, x0, to the class with the largest probability.

12
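As a small illustration, scikit-learn's KNeighborsClassifier exposes exactly these estimated class fractions via predict_proba; the data and values below are hypothetical.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Hypothetical two-class training data.
    rng = np.random.default_rng(2)
    X_train = rng.normal(size=(150, 2))
    y_train = (X_train[:, 0] + X_train[:, 1] + rng.normal(size=150) > 0).astype(int)

    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

    x0 = np.array([[0.3, -0.1]])
    print(knn.predict_proba(x0))  # fraction of the 5 nearest neighbors in each class
    print(knn.predict(x0))        # class with the largest estimated probability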
k-NN for Classification (cont.)

There are some issues that may arise:

▶ How can we handle a tie?
  With a coin flip!

▶ What could be a major problem with always classifying to the most common group amongst the neighbors?
  If one category is much more common than the others then all the predictions may be the same!

▶ How can we handle this?
  Rather than classifying with the most likely group, use a biased coin flip to decide which group to classify to!

13
k-NN with Multiple Predictors

How could we extend k-NN (both regression and classification) when there are multiple predictors?

We would need to define a measure of distance between observations in order to determine which are most similar to the observation we are trying to predict.

Euclidean distance is a good option. To measure the distance of a new observation, x0, from each observation, x_i, in the data set:

D²(x_i, x0) = Σ_{j=1..P} (x_{i,j} − x_{0,j})²

14
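A minimal numpy sketch of this distance calculation and of picking the k nearest neighbors (all names and data below are hypothetical):

    import numpy as np

    # Hypothetical data: 100 observations with P = 3 predictors, plus one new observation x0.
    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 3))
    x0 = rng.normal(size=3)

    # Squared Euclidean distance from x0 to every row of X: sum over j of (x_ij - x_0j)^2.
    d2 = ((X - x0) ** 2).sum(axis=1)

    k = 5
    nearest = np.argsort(d2)[:k]  # indices of the k closest observations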
k-NN with Multiple Predictors

But what must we be careful about when measuring distance?

1. Differences in variability in our predictors!
2. Having a mixture of quantitative and categorical predictors.

So what should be good practice? To determine the closest neighbors when P > 1, you should first standardize the predictors! And you can even standardize the binaries if you want to include them.

How else could we determine closeness in this multi-dimensional setting?

15
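One common way to build this standardization into a k-NN fit is scikit-learn's StandardScaler inside a Pipeline; a small sketch on hypothetical data where the two predictors sit on very different scales:

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier

    # Hypothetical data where the second predictor has a much larger scale.
    rng = np.random.default_rng(4)
    X = np.column_stack([rng.normal(scale=1.0, size=200),
                         rng.normal(scale=1000.0, size=200)])
    y = (X[:, 0] > 0).astype(int)

    # Standardizing first keeps the large-scale predictor from dominating the distance.
    knn_pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ])
    knn_pipeline.fit(X, y)
    print(knn_pipeline.predict(X[:5]))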
Dealing with Missing Data

16
What is missing data?

Oftentimes when data are collected, there are some missing values apparent in the dataset. This leads to a few questions to consider:

1. How does this show up in pandas?
2. How do pandas and sklearn handle these NaNs?
3. How does this affect our modeling?

17
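A quick sketch of what missingness looks like in pandas, using a hypothetical data frame (the column names and values are made up):

    import numpy as np
    import pandas as pd

    # Hypothetical small data frame with missing entries stored as NaN.
    df = pd.DataFrame({"age": [34, np.nan, 52, 41],
                       "income": [72000, 58000, np.nan, np.nan]})

    print(df.isnull().sum())  # count of NaNs per column
    print(df.dropna())        # rows containing any NaN removed
    # Note: many sklearn estimators will raise an error if the input contains NaNs.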
Naively handling missingness

What is the simplest way to handle missing data?


1. Impute the mean (if quantitative) or most common class (if categorical) for all missing values.
2. Simply drop the observations (rows) that contain any missing values.

What are some consequences of handling missingness in this fashion?

18
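A sketch of this naive imputation with pandas, on a hypothetical data frame with one quantitative and one categorical column:

    import numpy as np
    import pandas as pd

    # Hypothetical data frame with missing entries in both columns.
    df = pd.DataFrame({"income": [72000, np.nan, 58000, np.nan, 61000],
                       "os": ["Mac", "PC", np.nan, "PC", "PC"]})

    # Naive imputation: the mean for the quantitative column,
    # the most common class for the categorical column.
    df["income"] = df["income"].fillna(df["income"].mean())
    df["os"] = df["os"].fillna(df["os"].mode()[0])
    print(df)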
Types of Missingness

19
Sources of Missingness

Missing data can arise in various ways:

▶ A survey was conducted and values were just randomly missed when being entered into the computer.
▶ A respondent chooses not to respond to a question like ‘Have you ever done cocaine?’.
▶ You decide to start collecting a new variable (like Mac vs. PC) partway through the data collection of a study.
▶ You want to measure the speed of meteors, and some observations are just ‘too quick’ to be measured properly.

The source of the missing values can lead to the major types of missingness:

20
Types of Missingness

There are 3 major types of missingness to be concerned about:

1. Missing Completely at Random (MCAR) - the probability of missingness in a variable is the same for all units. Like randomly poking holes in a data set.
2. Missing at Random (MAR) - the probability of missingness in a variable depends only on available information (in other predictors).
3. Missing Not at Random (MNAR) - the probability of missingness depends on information that has not been recorded, and this information also predicts the missing values.

What are examples of each of these 3 types?

21
Missing completely at random (MCAR)

Missing Completely at Random is the best case scenario, and the easiest to handle:

▶ Examples: a coin is flipped to determine whether an entry is removed, or values were just randomly missed when being entered into the computer.
▶ Effect if you ignore: there is no effect on inferences (estimates of beta).
▶ How to handle: lots of options, but it is best to impute (more on imputation methods later).

22
Missing at random (MAR)

Missing at random is still a case that can be handled:

▶ Example(s): men and women respond to the question ‘have you ever felt harassed at work?’ at different rates (and may be harassed at different rates).
▶ Effect if you ignore: inferences are biased (estimates of beta) and predictions are usually worsened.
▶ How to handle: use the information in the other predictors to build a model and ‘impute’ a value for the missing entry.

Key: we can fix any biases by modeling and imputing the missing values based on what is observed!

23
Missing Not at Random (MNAR)

Missing Not at Random is the worst case scenario, and essentially impossible to handle fully:

▶ Example(s): patients drop out of a study because they experience some really bad side effect that was not measured. Or cheaters are less likely to respond when asked if they have ever cheated.
▶ Effect if you ignore: inferences are biased (estimates of beta) and predictions are usually worsened.
▶ How to handle: you can ’improve’ things by dealing with it like it is MAR, but you may never completely fix the bias.

24
What type of missingness is present?

Can you ever tell based on your data what type of missingness is actually present?

Since we asked the question, the answer must be no. It generally cannot be determined whether data really are missing at random, or whether the missingness depends on unobserved predictors or on the missing data themselves. The problem is that these potential ‘lurking variables’ are unobserved (by definition) and so can never be completely ruled out.

In practice, a model with as many predictors as possible is used so that the ‘missing at random’ assumption is reasonable.

25
Imputation Methods

26
Handling missing data

When encountering missing data, the approach to handling it depends on:

1. whether the missing values are in the response or in the predictors. Generally speaking, it is much easier to handle missingness in predictors.
2. whether the variable is quantitative or categorical.
3. how much missingness is present in the variable. If there is too much missingness, you may be doing more damage than good.

Generally speaking, it is a good idea to attempt to impute (or ‘fill in’) entries for missing values in a variable (assuming your method of imputation is a good one).

27
Imputation methods

There are several different approaches to imputing missing values:

1. Plug in the mean (quantitative) or most common class (categorical) for all missing values in a variable.
2. Create a new variable that is an indicator of missingness, and include it in any model to predict the response (also plug in zero or the mean in the actual variable).
3. Hot deck imputation: for each missing entry, randomly select an observed entry in the variable and plug it in.
4. Model the imputation: plug in predicted values (ŷ) from a model based on the other observed predictors.
5. Model the imputation with uncertainty: plug in predicted values plus randomness (ŷ + ε) from a model based on the other observed predictors.

What are the advantages and disadvantages of each approach?

28
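As a rough sketch of approaches 2 and 3 (a missingness indicator and hot deck imputation) in pandas, on a hypothetical data frame with missing incomes:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(5)

    # Hypothetical data frame with missing incomes.
    df = pd.DataFrame({"income": [72000, np.nan, 58000, np.nan, 61000, 55000]})

    # Approach 2: an indicator of missingness that can be included as a predictor.
    df["income_missing"] = df["income"].isnull().astype(int)

    # Approach 3 (hot deck): plug a randomly chosen observed entry into each missing one.
    observed = df.loc[df["income"].notnull(), "income"].to_numpy()
    n_missing = df["income"].isnull().sum()
    df.loc[df["income"].isnull(), "income"] = rng.choice(observed, size=n_missing)
    print(df)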
Schematic: imputation through modeling

How do we use models to fill in missing data?

29
Schematic: imputation through modeling

How do we use models to fill in missing data? Using kNN for k = 2?

31
Schematic: imputation through modeling

How do we use models to fill in missing data? Using linear regression?

Where m and b are computed from the observations (rows) that do not have missingness (we should call them b = β0 and m = β1).

33
Imputation through modeling with uncertainty

The schematic in the last few slides ignores the issue of imputing with uncertainty. What happens if you ignore this and just use the ‘best’ model to impute values based solely on ŷ?

The distribution of the imputed values will be too narrow and will not represent real data (see next slide for an illustration). The goal is to impute values that include the uncertainty of the model.

How can this be done in practice in kNN? In linear regression? In logistic regression?

34
Imputation through modeling with uncertainty: an illustration

35
Imputation through modeling with uncertainty: linear regression

Recall the probabilistic model in linear regression:

Y = β0 + β1 X1 + ... + βp Xp + ε,   where ε ∼ N(0, σ²).

How can we take advantage of this model to impute with uncertainty?

It’s a 3 step process:
1. Fit a model to predict the predictor variable with missingness from all the other predictors.
2. Predict the missing values from the model in the previous part.
3. Add in a measure of uncertainty to this prediction by randomly sampling from a N(0, σ̂²) distribution, where σ̂² is the mean squared error (MSE) from the model.

36
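A rough sketch of these three steps, assuming a hypothetical data frame where the column x1 has missing entries and x2, x3 are fully observed (all names and data are made up):

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(6)

    # Hypothetical data: x1 has missing entries; x2 and x3 are fully observed.
    n = 200
    df = pd.DataFrame({"x2": rng.normal(size=n), "x3": rng.normal(size=n)})
    df["x1"] = 1.5 * df["x2"] - 0.5 * df["x3"] + rng.normal(scale=0.8, size=n)
    df.loc[rng.choice(n, size=40, replace=False), "x1"] = np.nan

    obs = df["x1"].notnull()

    # Step 1: fit a model for x1 from the other predictors, using the complete rows.
    model = LinearRegression().fit(df.loc[obs, ["x2", "x3"]], df.loc[obs, "x1"])

    # Step 2: predict the missing x1 values from that model.
    pred = model.predict(df.loc[~obs, ["x2", "x3"]])

    # Step 3: add noise from N(0, sigma_hat^2), where sigma_hat^2 is the MSE on the observed rows.
    resid = df.loc[obs, "x1"] - model.predict(df.loc[obs, ["x2", "x3"]])
    sigma_hat = np.sqrt(np.mean(resid ** 2))
    df.loc[~obs, "x1"] = pred + rng.normal(scale=sigma_hat, size=len(pred))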
Imputation through modeling with uncertainty: k-NN regression

How can we use k-NN regression to impute values that mimic the error in our observations?

Two ways:
1. Use k = 1.
2. Use any other k, but randomly select from the nearest neighbors in N0. This can be done with equal probability or with some weighting (inverse to the distance measure used).

37
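A small sketch of the second option, using scikit-learn's NearestNeighbors to find the k closest complete cases and then drawing one of them at random (the data and variable names are hypothetical):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(7)

    # Hypothetical complete cases (X_obs, v_obs) for the variable v being imputed,
    # and rows X_mis where v is missing.
    X_obs = rng.normal(size=(100, 2))
    v_obs = X_obs[:, 0] + rng.normal(scale=0.5, size=100)
    X_mis = rng.normal(size=(10, 2))

    k = 5
    nn = NearestNeighbors(n_neighbors=k).fit(X_obs)
    _, idx = nn.kneighbors(X_mis)  # indices of the k nearest complete cases for each row

    # Draw one of the k neighbors at random (equal probability) instead of averaging,
    # so the imputed values keep some of the spread of the real data.
    v_imputed = np.array([v_obs[rng.choice(row)] for row in idx])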
Imputation through modeling with uncertainty: classifiers

For classifiers, this imputation with uncertainty/randomness is a little easier. How can it be implemented?

If a classification model (logistic, kNN, etc.) is used to predict the variable with missingness from the observed predictors, then all you need to do is flip a ‘biased coin’ (or roll a multi-sided die) whose probability for each class equals the predicted probability from the model.

Warning: do not just classify blindly using the predict command in sklearn!

38
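A minimal sketch of this ‘biased coin’ idea: predict class probabilities for the rows with missingness and sample each imputed class from those probabilities, rather than calling predict. The data and names below are hypothetical.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(8)

    # Hypothetical data: a binary variable z is missing for some rows; x is fully observed.
    X_obs = rng.normal(size=(200, 2))
    z_obs = (X_obs[:, 0] + rng.normal(size=200) > 0).astype(int)
    X_mis = rng.normal(size=(20, 2))

    clf = LogisticRegression().fit(X_obs, z_obs)
    probs = clf.predict_proba(X_mis)  # one row of class probabilities per missing entry

    # Sample each imputed class from its predicted probabilities ('biased coin'),
    # instead of always taking the most likely class via clf.predict().
    z_imputed = np.array([rng.choice(clf.classes_, p=p) for p in probs])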
Imputation across multiple variables

If only one variable has missing entries, life is easy. But what if all the predictor variables have a little bit of missingness (with some observations having multiple entries missing)? How can we handle that?

It’s an iterative process. Impute X1 based on X2, ..., Xp. Then impute X2 based on X1 and X3, ..., Xp. And continue down the line.

Any issues? Yes: not all of the missing values may be imputed with just one ’run’ through the data set, so you will have to repeat these ’runs’ until you have a completely filled in data set.

39
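scikit-learn ships an (experimental) round-robin implementation of this idea, IterativeImputer; a minimal sketch on a hypothetical array with a little missingness in every column:

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401, activates IterativeImputer
    from sklearn.impute import IterativeImputer

    rng = np.random.default_rng(9)

    # Hypothetical data with roughly 10% missingness scattered across all three columns.
    X = rng.normal(size=(100, 3))
    X[rng.random(X.shape) < 0.1] = np.nan

    # Each variable is imputed in turn from the others, cycling through several passes.
    # sample_posterior=True adds randomness to the imputations instead of plain predictions.
    imputer = IterativeImputer(sample_posterior=True, random_state=0)
    X_filled = imputer.fit_transform(X)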
Multiple imputation: beyond this class

What is an issue with treating your now ‘complete’ data set (a mixture of actually observed values and imputed values) as if it were simply all observed values?

Any inferences or predictions carried out will be tuned, and potentially overfit, to the random entries imputed for the missing values. How can we prevent this phenomenon?

By performing multiple imputation: rerun the imputation algorithm many times, refit the model on the response many times (once for each imputed data set), and then ’average’ the predictions or estimates of the β coefficients to perform inferences (also incorporating the uncertainty involved).

Note: this is beyond what we would expect in this class, but it is generally a good thing to be aware of.

40
