Lab 4

Lab lecture notes for the R language.


Logistic Regression in R

MACC7006 Accounting Data and Analytics

Keri Hu

Faculty of Business and Economics

Today: Logistic regression in R

By the end of today’s lecture, you should be able to:

• Create training and testing sets
• Build a logistic regression model
• Evaluate the model

We will work with the dataset: Healthcare.csv

• Predict whether a patient receives poor quality care, based on information in his/her medical claims history

Variables in the dataset

Create training and testing sets

• Training dataset: used to build model

• Testing dataset: used to test the model’s out-of-sample accuracy

• If there is no chronological order on the observations, we randomly assign observations to the training set or the testing set.

Install and load new package

1. Install the package: install.packages("caTools")

2. Load into your current R session: library(caTools)


• When you use this package in the future, you will not need to
re-install it, but you will need to load it with the library function.

Split dataset

1. To replicate results by the same random number:


set.seed(any number)
• Restores "the seed" from a previous session, enabling us to reuse the same set of random values

2. Randomly group data points:


sample.split(dependent variable, fraction of data in training set)
• Produces a TRUE/FALSE vector that randomly splits the data into two pieces according to the SplitRatio value (the % of training data)

3. Split data into training set or testing set:


subset(data frame, spl==TRUE/FALSE)
• If spl is TRUE, put the corresponding observation in the training set;
if spl is FALSE, put the corresponding observation in the testing set.
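The three steps above can be sketched on a toy data frame. In the lecture, step 2 is done with sample.split() from caTools, which also preserves the class balance; since that package may not be loaded, the sketch below substitutes a plain base-R draw of row indices (an assumption for illustration, not the lecture's exact call):

```r
# Toy stand-in for Healthcare.csv: 100 patients, 26 with poor care
quality <- data.frame(PoorCare = rep(c("N", "Y"), times = c(74, 26)))

set.seed(88)                                   # step 1: fix the random seed

# Step 2: sample.split(quality$PoorCare, SplitRatio = 0.75) would return
# this kind of TRUE/FALSE vector; here we draw 75% of row indices directly
spl <- rep(FALSE, nrow(quality))
spl[sample(nrow(quality), size = round(0.75 * nrow(quality)))] <- TRUE

# Step 3: route each observation by its flag
qualityTrain <- subset(quality, spl == TRUE)   # 75 rows
qualityTest  <- subset(quality, spl == FALSE)  # 25 rows
```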

Build a logistic regression model

1. Change the type/class of variables if needed using as.factor(), as.numeric(), as.character(), etc.
• Here, PoorCare = Y means quality is poor and N otherwise.

2. Generalized linear model:


glm(dependent variable ~ sum of independent variables, data = training set, family = binomial)
• Used for many different types of models
• family = binomial indicates that we are building a logistic
regression model
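As a minimal self-contained sketch (the data are simulated, and the column names OfficeVisits and Narcotics are stand-ins for variables in Healthcare.csv):

```r
set.seed(1)
# Simulated stand-in for the training set
qualityTrain <- data.frame(
  PoorCare     = as.factor(sample(c("N", "Y"), 80, replace = TRUE)),
  OfficeVisits = rpois(80, lambda = 10),
  Narcotics    = rpois(80, lambda = 2)
)

# The ~ separates the dependent variable from the sum of predictors;
# family = binomial requests a logistic regression
QualityLog <- glm(PoorCare ~ OfficeVisits + Narcotics,
                  data = qualityTrain, family = binomial)
summary(QualityLog)   # coefficients are on the log-odds scale
```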

Result of the model

Evaluate performance of the model

If we want to calculate accuracy on the training set with threshold 0.5:

1. Prediction for the training set:


PredictTrain <- predict(logistic model, type="response")
• The type="response" option tells R to output probabilities of the form Pr(Y = 1 | X), as opposed to other information such as the logit.
• If no new data is specified within predict(), then probabilities are
computed for the training data used to fit the logistic regression.

2. Create a classification/confusion matrix for a threshold of 0.5:


table(training set$dependent variable, PredictTrain > 0.5)
• table() counts observations in each class of the variable(s).
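Putting the two steps together on simulated data (the model and variable names are illustrative, not the lecture's actual dataset):

```r
set.seed(1)
# Simulated stand-in for the training set (names are illustrative)
qualityTrain <- data.frame(
  PoorCare     = as.factor(sample(c("N", "Y"), 80, replace = TRUE)),
  OfficeVisits = rpois(80, lambda = 10)
)
QualityLog <- glm(PoorCare ~ OfficeVisits, data = qualityTrain,
                  family = binomial)

# Step 1: fitted probabilities Pr(PoorCare = Y) for the training data
PredictTrain <- predict(QualityLog, type = "response")

# Step 2: rows = actual class, columns = predicted class at threshold 0.5
confusion <- table(qualityTrain$PoorCare, PredictTrain > 0.5)

# Overall accuracy: fraction of rows whose prediction matches the actual class
accuracy <- mean((PredictTrain > 0.5) == (qualityTrain$PoorCare == "Y"))
```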

Plot predictions

1. Add the vector of predictions to the data set:


training set$Predict <- PredictTrain
2. Plot the predictions (for the training set)

Example: Classification/confusion matrix

Threshold value = 0.5:

                       FALSE (predicted good care)   TRUE (predicted poor care)
N (actual good care)              71                            3
Y (actual poor care)              14                           11

• The prediction is FALSE if the probability is less than (or equal to)
0.5, and TRUE if the probability is greater than 0.5.

Accuracy = (71 + 11) / [(71 + 11) + (3 + 14)] = 82.83%

• 3 false positive errors: predict poor care but actually good care
• 14 false negative errors: predict good care but actually poor care
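The accuracy arithmetic from this matrix can be checked directly in R:

```r
# Counts from the threshold-0.5 confusion matrix above
TN <- 71; FP <- 3    # actual good care: predicted FALSE / TRUE
FN <- 14; TP <- 11   # actual poor care: predicted FALSE / TRUE

accuracy <- (TN + TP) / (TN + TP + FP + FN)
round(100 * accuracy, 2)   # 82.83
```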

Different threshold values

Threshold value = 0.3:

                       FALSE (predicted good care)   TRUE (predicted poor care)
N (actual good care)              67                            7
Y (actual poor care)              12                           13

Accuracy = (67 + 13) / [(67 + 13) + (7 + 12)] = 80.81%

• 7 false positive errors: predict poor care but actually good care
• 12 false negative errors: predict good care but actually poor care

Different threshold values

Threshold value = 0.7:

                       FALSE (predicted good care)   TRUE (predicted poor care)
N (actual good care)              73                            1
Y (actual poor care)              19                            6

Accuracy = (73 + 6) / [(73 + 6) + (1 + 19)] = 79.80%

• 1 false positive error: predict poor care but actually good care
• 19 false negative errors: predict good care but actually poor care

ROC curve for the training set

1. Install and load the ROCR package:


install.packages("ROCR"), library(ROCR)

2. Generate an ROC curve:


2.1 Create a prediction object that the ROCR package can understand:
ROCRpred <- prediction(PredictTrain, training set$dependent variable)
2.2 Calculate performance metrics for the ROC curve:
ROCCurve <- performance(ROCRpred, "tpr", "fpr")
• "tpr": true positive rate
• "fpr": false positive rate

2.3 Plot the ROC curve:


plot(ROCCurve)
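prediction() and performance() compute the true and false positive rates at every threshold internally. A base-R sketch of one such point, on made-up labels and probabilities, shows what "tpr" and "fpr" mean:

```r
# Made-up actual classes (1 = poor care) and fitted probabilities
labels <- c(0, 0, 0, 1, 1, 0, 1, 0, 1, 0)
probs  <- c(0.1, 0.2, 0.3, 0.8, 0.6, 0.4, 0.9, 0.2, 0.7, 0.5)

roc_point <- function(threshold) {
  pred <- probs > threshold
  c(tpr = sum(pred & labels == 1) / sum(labels == 1),  # caught positives
    fpr = sum(pred & labels == 0) / sum(labels == 0))  # false alarms
}

roc_point(0.5)   # one point on the ROC curve: tpr = 1, fpr = 0
```

Sweeping the threshold from 1 down to 0 traces the whole curve from (0, 0) to (1, 1).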

Example: ROC curve

Where is the threshold, say 0.5, on the curve?

Add threshold labels and calculate AUC

• plot(ROCCurve, colorize=TRUE,
print.cutoffs.at=seq(0,1,0.1), text.adj=c(-0.2,0.7))

• AUC of the training set


as.numeric(performance(ROCRpred, "auc")@y.values)
[1] 0.7945946
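The "auc" value has a rank interpretation: it is the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case. A base-R check on made-up scores (not the lecture's data, whose AUC is 0.7945946):

```r
labels <- c(0, 0, 0, 1, 1, 0, 1, 0, 1, 0)   # 1 = poor care
probs  <- c(0.1, 0.2, 0.3, 0.8, 0.6, 0.4, 0.9, 0.2, 0.7, 0.5)

pos <- probs[labels == 1]
neg <- probs[labels == 0]

# Fraction of positive/negative pairs ranked correctly (ties count half)
auc <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
auc   # every positive outranks every negative here, so auc = 1
```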
Prediction for the test set

• We should make out-of-sample predictions.

• This can be done on the test set by adding newdata:


PredictTest = predict(logistic model, type = "response", newdata = testing set)
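A self-contained sketch with simulated train/test frames (the names and data are illustrative):

```r
set.seed(2)
train <- data.frame(y = rbinom(60, 1, 0.3), x = rnorm(60))
test  <- data.frame(x = rnorm(20))   # newdata needs only the predictors

fit <- glm(y ~ x, data = train, family = binomial)

# Adding newdata= switches predict() from in-sample fitted values
# to out-of-sample probabilities, one per row of the test set
PredictTest <- predict(fit, type = "response", newdata = test)
length(PredictTest)   # 20
```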

Classification/confusion matrix for the test set

Threshold value = 0.5:


table(testing set$dependent variable, PredictTest > 0.5)

Example: Classification matrix for the test set

                       FALSE (predicted good care)   TRUE (predicted poor care)
N (actual good care)              23                            1
Y (actual poor care)               3                            5

• Accuracy on the test set = (23 + 5) / [(23 + 5) + (1 + 3)] = 87.5%


• 1 false positive prediction
• 3 false negative predictions

ROC curve and AUC of the test set

• Plot ROC curve


• ROCRpredtest = prediction(PredictTest, testing set$dependent variable)
• ROCCurvetest = performance(ROCRpredtest, "tpr", "fpr")
• plot(ROCCurvetest, colorize=TRUE, print.cutoffs.at=seq(0,1,0.1), text.adj=c(-0.2,0.7))

• AUC of the test set


as.numeric(performance(ROCRpredtest, "auc")@y.values)
[1] 0.875

