Lab 4
Keri Hu
Today: Logistic regression in R
Variables in the dataset
Create training and testing sets
Install and load new package
Split dataset
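A minimal sketch of a stratified train/test split, written in base R so it runs without extra packages. The slides do not name the package they load; `sample.split()` from the caTools package is one common choice for this step, but that is an assumption. The data frame `quality`, its columns, the simulated values, and the 75/25 ratio are all hypothetical.

```r
# Hypothetical data: 132 patients, 32 of whom received poor care (PoorCare = 1).
set.seed(88)
quality <- data.frame(
  OfficeVisits = rpois(132, 13),
  Narcotics    = rpois(132, 4),
  PoorCare     = rep(c(0, 1), times = c(100, 32))
)

# Stratified split: sample 75% of the row indices within each outcome class,
# so both sets keep about the same share of poor-care cases.
rows_by_class <- split(seq_len(nrow(quality)), quality$PoorCare)
train_idx <- unlist(lapply(rows_by_class,
                           function(idx) sample(idx, round(0.75 * length(idx)))))

qualityTrain <- quality[train_idx, ]
qualityTest  <- quality[-train_idx, ]

# Compare the share of poor-care cases in each set.
c(train = mean(qualityTrain$PoorCare), test = mean(qualityTest$PoorCare))
```

Stratifying on the outcome matters when one class is rare: a purely random split could leave the test set with too few poor-care cases to evaluate the model.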
Build a logistic regression model
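A hedged sketch of fitting the model with `glm()`. The slides do not show the formula or the variable names, so the predictors `OfficeVisits` and `Narcotics`, the outcome `PoorCare`, and the simulated training data below are assumptions made for illustration.

```r
# Simulated training data (hypothetical): poor care becomes more likely
# as both predictors increase.
set.seed(1)
n <- 200
train <- data.frame(OfficeVisits = rpois(n, 13), Narcotics = rpois(n, 4))
p <- plogis(-3 + 0.08 * train$OfficeVisits + 0.25 * train$Narcotics)
train$PoorCare <- rbinom(n, 1, p)

# family = binomial requests logistic regression (logit link by default).
QualityLog <- glm(PoorCare ~ OfficeVisits + Narcotics,
                  data = train, family = binomial)
summary(QualityLog)  # coefficients, standard errors, AIC

# type = "response" returns probabilities rather than log-odds.
predictTrain <- predict(QualityLog, type = "response")
summary(predictTrain)
```

Without `type = "response"`, `predict()` returns values on the log-odds scale, which cannot be compared directly with a probability threshold such as 0.5.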
Result of the model
Evaluate performance of the model
Plot predictions
Example: Classification/confusion matrix
• The prediction is FALSE if the probability is less than (or equal to)
0.5, and TRUE if the probability is greater than 0.5.
$$\text{Accuracy} = \frac{71 + 11}{(71 + 11) + (3 + 14)} = 82.83\%$$
• 3 false positive errors: predict poor care but actually good care
• 14 false negative errors: predict good care but actually poor care
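The accuracy figure can be checked in R from the four counts given on this slide. The sketch below also derives sensitivity and specificity, which the slide does not report; they are included only because they follow directly from the same counts. The object names are hypothetical.

```r
# Counts from the slide at threshold 0.5. Rows: actual care; columns: prediction.
conf <- matrix(c(71,  3,
                 14, 11),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual    = c("good", "poor"),
                               predicted = c("FALSE", "TRUE")))

accuracy    <- sum(diag(conf)) / sum(conf)                   # (71 + 11) / 99
sensitivity <- conf["poor", "TRUE"]  / sum(conf["poor", ])   # 11 / (11 + 14)
specificity <- conf["good", "FALSE"] / sum(conf["good", ])   # 71 / (71 + 3)

round(100 * c(accuracy = accuracy,
              sensitivity = sensitivity,
              specificity = specificity), 2)
```

With a fitted model, a table of this shape would typically come from something like `table(actual, predicted_probability > 0.5)`.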
Different threshold values
$$\text{Accuracy} = \frac{67 + 13}{(67 + 13) + (7 + 12)} = 80.81\%$$
• 7 false positive errors: predict poor care but actually good care
• 12 false negative errors: predict good care but actually poor care
Different threshold values
$$\text{Accuracy} = \frac{73 + 6}{(73 + 6) + (1 + 19)} = 79.80\%$$
• 1 false positive error: predict poor care but actually good care
• 19 false negative errors: predict good care but actually poor care
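The trade-off across the three tables can be reproduced in a small sketch: raising the threshold can only reduce false positives and increase false negatives. The slides do not say which thresholds produced the two alternative tables, so the values 0.3, 0.5 and 0.7, the simulated probabilities, and the helper `errors_at()` are all illustrative assumptions.

```r
# Hypothetical training outcomes: 74 good-care and 25 poor-care cases.
set.seed(2)
actual <- rep(c(0, 1), times = c(74, 25))
# Simulated predicted probabilities, higher on average for poor-care cases.
prob <- plogis(rnorm(99, mean = ifelse(actual == 1, 0, -1.5)))

# Count errors when predicting TRUE (poor care) for prob > t.
errors_at <- function(t) {
  pred <- prob > t
  c(threshold = t,
    false_pos = sum(pred & actual == 0),    # predicted poor, actually good
    false_neg = sum(!pred & actual == 1),   # predicted good, actually poor
    accuracy  = mean(pred == (actual == 1)))
}

res <- t(sapply(c(0.3, 0.5, 0.7), errors_at))
res
```

Which threshold is preferable depends on which error is more costly, not on accuracy alone.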
ROC curve for the training set
Example: ROC curve
Add threshold labels and calculate AUC
• plot(ROCCurve, colorize=TRUE,
print.cutoffs.at=seq(0,1,0.1), text.adj=c(-0.2,0.7))
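A hedged sketch of the AUC calculation. The arguments in the `plot()` call above match the plotting method of the ROCR package, so the sketch assumes `ROCCurve` is a ROCR performance object; the slides do not show how it was created. The AUC is first computed in base R so the example runs even without ROCR. The data and every object name other than `ROCCurve` are hypothetical.

```r
# Hypothetical outcomes and simulated predicted probabilities.
set.seed(3)
actual <- rep(c(0, 1), times = c(74, 25))
prob   <- plogis(rnorm(99, mean = ifelse(actual == 1, 0, -1.5)))

# AUC in base R (Mann-Whitney form): the probability that a randomly chosen
# poor-care case has a higher predicted probability than a good-care case.
r   <- rank(prob)
n1  <- sum(actual == 1)
n0  <- sum(actual == 0)
auc <- (sum(r[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
auc

# If ROCR is installed, build the curve, label thresholds, and get the AUC.
if (requireNamespace("ROCR", quietly = TRUE)) {
  library(ROCR)
  ROCRpred <- prediction(prob, actual)
  ROCCurve <- performance(ROCRpred, "tpr", "fpr")
  plot(ROCCurve, colorize = TRUE,
       print.cutoffs.at = seq(0, 1, 0.1), text.adj = c(-0.2, 0.7))
  auc_rocr <- performance(ROCRpred, "auc")@y.values[[1]]
}
```

An AUC of 0.5 corresponds to random guessing and 1 to perfect separation of the two classes.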
Classification/confusion matrix for the test set
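A minimal sketch of scoring held-out data, assuming the model was fit on the training set only. The slides do not show this code; the helper `make_data()`, the variable names, and the simulated data are hypothetical.

```r
# Simulate hypothetical training and test sets from the same process.
set.seed(4)
make_data <- function(n) {
  d <- data.frame(OfficeVisits = rpois(n, 13), Narcotics = rpois(n, 4))
  d$PoorCare <- rbinom(n, 1,
                       plogis(-3 + 0.08 * d$OfficeVisits + 0.25 * d$Narcotics))
  d
}
train <- make_data(99)
test  <- make_data(32)

QualityLog <- glm(PoorCare ~ OfficeVisits + Narcotics,
                  data = train, family = binomial)

# newdata = test scores the held-out rows; without it, predict() returns
# fitted values for the training data.
predictTest <- predict(QualityLog, type = "response", newdata = test)

# Fixed factor levels keep the table 2 x 2 even if a class never appears.
confTest <- table(actual    = factor(test$PoorCare, levels = c(0, 1)),
                  predicted = factor(predictTest > 0.5, levels = c(FALSE, TRUE)))
confTest
sum(diag(confTest)) / sum(confTest)  # test-set accuracy
```

Evaluating on the test set gives an estimate of out-of-sample performance, since none of these rows were used to fit the coefficients.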
Example: Classification matrix for the test set
ROC curve and AUC of the test set