Lecture 12: k-NN Classification and Missingness (Module 2)
Data Science 1
CS 109A, STAT 121A, AC 209A, E-109A
ROC Curves
k-NN Revisited
Types of Missingness
Imputation Methods
ROC Curves
ROC Curve Example
ROC Curve for measuring classifier performance
k-NN Revisited
k-Nearest Neighbors
Review: Choice of k
Choice of k Matters
[Figure: y_train plotted against x_train, illustrating several values of k]
k-NN for Classification
k-NN for Classification: formal definition
Then, the k-NN classifier applies Bayes' rule and classifies the
test observation, x_0, to the class with the largest probability.
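The classification rule above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: it assumes the class probability at x_0 is estimated as the fraction of the k nearest training points that belong to each class, and that "nearest" means Euclidean distance. The function names and the toy data are hypothetical.

```python
from collections import Counter
import math


def knn_class_probabilities(X_train, y_train, x0, k):
    """Estimate P(Y = c | X = x0) as the share of the k nearest neighbors in class c.

    Assumption: neighbors are ranked by Euclidean distance to x0.
    """
    # Pair each training label with its distance to the test observation.
    distances = [(math.dist(x, x0), label) for x, label in zip(X_train, y_train)]
    # Sort by distance; Python's sort is stable, so ties keep training order.
    distances.sort(key=lambda pair: pair[0])
    neighbors = distances[:k]
    counts = Counter(label for _, label in neighbors)
    return {label: count / k for label, count in counts.items()}


def knn_classify(X_train, y_train, x0, k):
    """Assign x0 to the class with the largest estimated probability."""
    probs = knn_class_probabilities(X_train, y_train, x0, k)
    return max(probs, key=probs.get)


# Hypothetical toy data: two well-separated classes in two dimensions.
X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 5.0), (7.0, 6.5)]
y_train = ["a", "a", "a", "b", "b", "b"]
x0 = (2.0, 2.0)

probs_k5 = knn_class_probabilities(X_train, y_train, x0, k=5)
label_k3 = knn_classify(X_train, y_train, x0, k=3)
print(probs_k5, label_k3)
```

With k = 5, three of the five nearest neighbors are in class "a", so the estimated probabilities are 0.6 and 0.4 and the observation is assigned to "a".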
k-NN for Classification (cont.)
k-NN with Multiple Predictors
$$D^2(x_i, x_0) = \sum_{j=1}^{P} (x_{i,j} - x_{0,j})^2$$
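As a quick check of the squared Euclidean distance over P predictors, the sum can be computed directly. The function name and the example vectors below are illustrative only.

```python
def squared_distance(x_i, x_0):
    """Squared Euclidean distance: sum over j of (x_{i,j} - x_{0,j})^2."""
    if len(x_i) != len(x_0):
        raise ValueError("both observations need the same number of predictors")
    return sum((a - b) ** 2 for a, b in zip(x_i, x_0))


# Hypothetical observations with P = 3 predictors.
d2 = squared_distance((1.0, 2.0, 3.0), (4.0, 6.0, 3.0))
print(d2)  # 3^2 + 4^2 + 0^2 = 25.0
```

Because every predictor contributes on its own scale, a predictor measured in large units can dominate the sum.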
Dealing with Missing Data
What is missing data?
Naively handling missingness
Types of Missingness
Sources of Missingness
Types of Missingness
Missing completely at random (MCAR)
Missing at random (MAR)
Missing not at random (MNAR)
What type of missingness is present?
Imputation Methods
Handling missing data
Imputation methods
Schematic: imputation through modeling
Imputation through modeling with uncertainty
Imputation through modeling with uncertainty: an illustration
Imputation through modeling with uncertainty: linear regression
$$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \varepsilon$$
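One way to read "imputation with uncertainty" for this model is sketched below. It fits the regression on the rows where Y is observed, predicts the missing entries, and adds a random draw for the error term instead of imputing the fitted value alone. This is an assumption about what the slide intends, not a transcription of it. For brevity the sketch uses a single predictor (p = 1) and draws the noise from a normal distribution with the estimated residual standard deviation; it ignores the uncertainty in the estimated coefficients. All names and data are hypothetical.

```python
import random


def impute_with_linear_regression(x, y, seed=0):
    """Fill None entries of y with beta0_hat + beta1_hat * x plus random noise.

    Assumptions: one fully observed predictor x, normal errors, and at least
    three observed (x, y) pairs so the residual variance can be estimated.
    """
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(observed)

    # Ordinary least squares for a single predictor.
    x_bar = sum(xi for xi, _ in observed) / n
    y_bar = sum(yi for _, yi in observed) / n
    sxx = sum((xi - x_bar) ** 2 for xi, _ in observed)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in observed)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar

    # Residual standard deviation with n - 2 degrees of freedom.
    rss = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in observed)
    sigma_hat = (rss / (n - 2)) ** 0.5

    # Observed values are kept; missing ones get prediction plus a noise draw.
    imputed = [
        yi if yi is not None else beta0 + beta1 * xi + rng.gauss(0.0, sigma_hat)
        for xi, yi in zip(x, y)
    ]
    return imputed, (beta0, beta1, sigma_hat)


# Hypothetical data: y is missing for the third and sixth observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, 9.8, None]
y_imputed, (b0, b1, s) = impute_with_linear_regression(x, y)
print(y_imputed)
```

Adding the noise term keeps the imputed values from sitting exactly on the fitted line, which would otherwise understate the variability of Y.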
Imputation through modeling with uncertainty: k-NN regression
Imputation through modeling with uncertainty: classifiers
Imputation across multiple variables
If only one variable has missing entries, life is easy. But what
if all the predictor variables have a little bit of missingness
(with some observations having multiple entries missing)?
How can we handle that?
It's an iterative process. Impute X_1 based on X_2, ..., X_p. Then
impute X_2 based on X_1 and X_3, ..., X_p. And continue down
the line.
Any issues? Yes, not all of the missing values may be
imputed with just one 'run' through the data set. So you will
have to repeat these 'runs' until you have a completely filled-in
data set.
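A minimal sketch of the variable-by-variable process follows. It makes two choices the slide does not specify, so treat both as assumptions: every missing entry is first filled with its column mean so that each model always has complete predictors, and each variable is then re-imputed with k-NN regression (the average of the k nearest rows where that variable was observed). The passes are repeated a fixed number of times. Function and variable names are hypothetical.

```python
import math


def iterative_knn_impute(rows, k=2, n_passes=5):
    """Impute None entries column by column, repeating several passes.

    Assumptions: every column has at least one observed value, the initial
    fill is the column mean, and the per-column model is k-NN regression.
    """
    n, p = len(rows), len(rows[0])
    missing = [[value is None for value in row] for row in rows]
    filled = [list(row) for row in rows]

    # Initial fill: column mean of the observed values.
    for j in range(p):
        observed = [row[j] for row in rows if row[j] is not None]
        mean_j = sum(observed) / len(observed)
        for i in range(n):
            if missing[i][j]:
                filled[i][j] = mean_j

    def distance(a, b, skip):
        # Euclidean distance over all columns except the one being imputed.
        return math.sqrt(
            sum((filled[a][c] - filled[b][c]) ** 2 for c in range(p) if c != skip)
        )

    for _ in range(n_passes):
        for j in range(p):  # impute X_1 from the rest, then X_2, and so on
            donors = [i for i in range(n) if not missing[i][j]]
            for i in range(n):
                if missing[i][j]:
                    nearest = sorted(donors, key=lambda d: distance(d, i, j))[:k]
                    filled[i][j] = sum(filled[d][j] for d in nearest) / len(nearest)
    return filled


# Hypothetical data: each column has some missingness; one row is missing two entries.
data = [
    [1.0, 2.0, 3.0],
    [2.0, None, 6.1],
    [3.0, 6.0, None],
    [None, 8.1, 12.0],
    [5.0, 10.0, 15.2],
    [None, None, 18.0],
]
completed = iterative_knn_impute(data)
for row in completed:
    print(row)
```

Only the originally missing entries are ever overwritten; observed values pass through unchanged, and later passes refine the earlier imputations using the updated values in the other columns.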
Multiple imputation: beyond this class