Live Session
Module 3
Logistic Regression
DS610
Advanced Applied
Statistics for Data Science
Instructor Name
Module 3 Overview
This module covers logistic regression. Logistic
regression is used for classification, where the
outcome can be binary (two classes) or have more than
two classes. You will use R to perform logistic
regression, estimate parameter values, and obtain
their confidence intervals.
Chapter 8 (Discovering Statistics Using R)
Module 3 Learning Outcomes
1. Understand when to perform logistic regression
and the underlying assumptions.
2. Perform binomial logistic regression using R.
3. Perform multinomial logistic regression using R.
Logistic Regression
Aims
• When and why do we use logistic
regression?
– Binary
– Multinomial
• Theory behind logistic regression
– Assessing the model
– Assessing predictors
– Things that can go wrong
• Interpreting logistic regression
When and Why
• To predict an outcome variable that is
categorical from one or more categorical or
continuous predictor variables.
• Used because having a categorical
outcome variable violates the assumption
of linearity in normal regression.
With One Predictor
• Outcome
– We predict the probability of the outcome
occurring
• b0 and b1
– Can be thought of in much the same way as in
multiple regression
– Note the normal regression equation forms part
of the logistic regression equation
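For reference, a standard way of writing the model with one predictor (using the b0 and b1 above) is:
$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1)}}$$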
With Several Predictors
• Outcome
– We still predict the probability of the outcome
occurring
• Differences
– Note the multiple regression equation forms
part of the logistic regression equation
– This part of the equation expands to
accommodate additional predictors
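With several predictors the linear combination in the exponent simply expands:
$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n)}}$$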
Assessing the Model
$$\text{log-likelihood} = \sum_{i=1}^{N} \left[ Y_i \ln\bigl(P(Y_i)\bigr) + (1 - Y_i)\ln\bigl(1 - P(Y_i)\bigr) \right]$$
• The log-likelihood statistic
– Analogous to the residual sum of squares in
multiple regression
– It is an indicator of how much unexplained
information there is after the model has been
fitted.
– Large values indicate poorly fitting statistical
models.
Assessing Changes in Models
• It’s possible to calculate a log-likelihood for
different models and to compare these
models by looking at the difference
between their log-likelihoods.
$$\chi^2 = 2\bigl[LL(\text{new}) - LL(\text{baseline})\bigr], \qquad df = k_{\text{new}} - k_{\text{baseline}}$$
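A minimal R sketch of this comparison, assuming two nested glm() fits stored in model.baseline and model.new (hypothetical names):
chi.change <- deviance(model.baseline) - deviance(model.new)        # 2[LL(new) - LL(baseline)]
df.change  <- df.residual(model.baseline) - df.residual(model.new)  # k_new - k_baseline
pchisq(chi.change, df.change, lower.tail = FALSE)                   # p-value for the improvement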
Assessing Predictors: The Wald
Statistic
• Similar to t-statistic in regression.
• Tests the null hypothesis that b = 0.
• Is biased when b is large.
• Better to look at likelihood ratio statistics.
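The Wald statistic is the estimate divided by its standard error (reported as the z value in glm() output); squaring it gives the Wald chi-square with 1 df used later in these slides:
$$z = \frac{b}{SE_b}, \qquad \text{Wald } \chi^2 = \left(\frac{b}{SE_b}\right)^2$$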
Assessing Predictors:
The Odds Ratio
• Indicates the change in odds resulting from
a unit change in the predictor.
– OR > 1: Predictor ↑, Probability of outcome
occurring ↑.
– OR < 1: Predictor ↑, Probability of outcome
occurring ↓.
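In terms of the model, the odds are P(event)/(1 − P(event)), and the odds ratio for a predictor is the exponential of its coefficient, which is why exp() is applied to the coefficients later in this module:
$$OR = e^{b}$$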
Methods of Regression
• Forced entry: all variables entered simultaneously.
• Hierarchical: variables entered in blocks.
– Blocks should be based on past research or the theory
being tested; this is generally a good method.
• Stepwise: variables entered on the basis of
statistical criteria (i.e. relative contribution to
predicting outcome).
– Should be used only for exploratory analysis.
Things That Can Go Wrong
• Assumptions from linear regression:
– Linearity
– Independence of errors
– Multicollinearity
• Unique problems
– Incomplete information
– Complete separation
– Overdispersion
Incomplete Information from the
Predictors
• Categorical predictors:
– Predicting cancer from smoking and eating tomatoes.
– We don’t know what happens when non-smokers eat
tomatoes because we have no data in this cell of the design.
• Continuous variables
– Will your sample include an 80-year-old, highly anxious,
Buddhist, left-handed lesbian?
Complete Separation
• When the outcome variable can be perfectly
predicted.
– E.g. predicting whether someone is a burglar, your
teenage son or your cat based on weight.
– Weight is a perfect predictor of cat/burglar unless
you have a very fat cat indeed!
Overdispersion
• Overdispersion is where the variance is
larger than expected from the model.
• This can be caused by violating the
assumption of independence.
• This problem makes the standard errors too
small!
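A rough check in R (a sketch, assuming a fitted binomial glm stored in model, a hypothetical name): the ratio of the residual deviance to its degrees of freedom should be close to 1, and values well above 1 suggest overdispersion.
model$deviance / model$df.residual   # dispersion estimate; well above 1 hints at overdispersion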
An Example
• Predictors of a treatment intervention.
• Participants
– 113 adults with a medical problem
• Outcome:
– Cured (1) or not cured (0).
• Predictors:
– Intervention: intervention or no treatment.
– Duration: the number of days before treatment that
the patient had the problem.
Basic Logistic Regression Analysis Using R Commander
[Figure: Reordering a factor in R Commander]
Basic Logistic Regression Analysis Using R Commander
[Figure: Dialog box for generalized linear models in R Commander]
Basic Logistic Regression Analysis Using R
newModel<-glm(outcome ~ predictor(s), data =
dataFrame, family = name of a distribution,
na.action = an action)
Hierarchical Regression Using R
• Model 1:
eelModel.1 <- glm(Cured ~ Intervention, data =
eelData, family = binomial())
• Model 2:
eelModel.2 <- glm(Cured ~ Intervention +
Duration, data = eelData, family = binomial())
summary(eelModel.1)
summary(eelModel.2)
Output Model 1: Intervention Only
Call:
glm(formula = Cured ~ Intervention, family = binomial(), data = eelData)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.5940 -1.0579 0.8118 0.8118 1.3018
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.2877 0.2700 -1.065 0.28671
InterventionIntervention 1.2287 0.3998 3.074 0.00212 **
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 154.08 on 112 degrees of freedom
Residual deviance: 144.16 on 111 degrees of freedom
AIC: 148.16
Improvement: Model 1
• Find the improvement:
modelChi <- eelModel.1$null.deviance - eelModel.1$deviance
modelChi
[1] 9.926201
• Degrees of freedom:
chidf <- eelModel.1$df.null - eelModel.1$df.residual
chidf
[1] 1
• To calculate the probability associated with this chi-square statistic we
can use the pchisq() function.
chisq.prob <- 1 - pchisq(modelChi, chidf)
chisq.prob
[1] 0.001629425
Writing a Function to Compute R2
logisticPseudoR2s <- function(LogModel) {
dev <- LogModel$deviance                    # residual deviance of the fitted model
nullDev <- LogModel$null.deviance           # deviance of the intercept-only (baseline) model
modelN <- length(LogModel$fitted.values)    # sample size
R.l <- 1 - dev / nullDev                            # Hosmer and Lemeshow R^2
R.cs <- 1 - exp(-(nullDev - dev) / modelN)          # Cox and Snell R^2
R.n <- R.cs / (1 - (exp(-(nullDev / modelN))))      # Nagelkerke R^2
cat("Pseudo R^2 for logistic regression\n")
cat("Hosmer and Lemeshow R^2 ", round(R.l, 3), "\n")
cat("Cox and Snell R^2 ", round(R.cs, 3), "\n")
cat("Nagelkerke R^2 ", round(R.n, 3), "\n")
}
Writing a Function to Compute R2
• To use the function on our model, we simply
place the name of the logistic regression model
(in this case eelModel.1) in the function and
execute:
logisticPseudoR2s(eelModel.1)
• The output will be:
Pseudo R^2 for logistic regression
Hosmer and Lemeshow R^2 0.064
Cox and Snell R^2 0.084
Nagelkerke R^2 0.113
Calculating the Odds Ratio
• We can also calculate the odds ratio as the exponential of the
b coefficient for the predictor variables by executing:
exp(eelModel.1$coefficients)
(Intercept) InterventionIntervention
0.750000 3.416667
• To get the confidence intervals execute:
exp(confint(eelModel.1))
2.5 % 97.5 %
(Intercept) 0.4374531 1.268674
InterventionIntervention 1.5820127 7.625545
Output Model 2: Intervention and
Duration as Predictors
Call:
glm(formula = Cured ~ Intervention + Duration, family = binomial(),
data = eelData)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6025 -1.0572 0.8107 0.8161 1.3095
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.234660 1.220563 -0.192 0.84754
InterventionIntervention 1.233532 0.414565 2.975 0.00293 **
Duration -0.007835 0.175913 -0.045 0.96447
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 154.08 on 112 degrees of freedom
Residual deviance: 144.16 on 110 degrees of freedom
AIC: 150.16
Improvement: Model 2
• We can compare the models by finding the difference in
the deviance statistics as before.
• Or we can use the anova() function:
anova(eelModel.1, eelModel.2)
Analysis of Deviance Table
Model 1: Cured ~ Intervention
Model 2: Cured ~ Intervention + Duration
Resid. Df Resid. Dev Df Deviance
1 111 144.16
2 110 144.16 1 0.0019835
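To obtain the p-value for this comparison directly, anova() for glm models accepts a test argument; equivalently, the change in deviance shown above can be passed to pchisq():
anova(eelModel.1, eelModel.2, test = "Chisq")   # adds a Pr(>Chi) column to the table
pchisq(0.0019835, df = 1, lower.tail = FALSE)   # p-value for the change in deviance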
Summary
• The overall fit of the final model is shown by the deviance statistic and its
associated chi-square statistic.
– If the significance of the chi-square statistic is less than .05, then the model is a
significant fit to the data.
• Check the table labelled coefficients to see which variables significantly
predict the outcome.
• For each variable in the model, look at the z-statistic and its significance
(which again should be below .05).
• Use the odds ratio for interpretation. You can obtain this using
exp(model$coefficients), where model is the name of your model.
– If the value is greater than 1 then as the predictor increases, the odds of the
outcome occurring increase.
– A value less than 1 indicates that as the predictor increases, the odds of the
outcome occurring decrease.
– For the aforementioned interpretation to be reliable the confidence interval of
the odds ratio should not cross 1!
Reporting the Analysis
Multinomial Logistic Regression
• Logistic regression to predict membership of more than two
categories.
• It (basically) works in the same way as binary logistic regression.
• The analysis breaks the outcome variable down into a series of
comparisons between two categories.
– E.g., if you have three outcome categories (A, B and C), then the
analysis will consist of two comparisons that you choose:
• compare everything against your first category (e.g. A vs. B and A vs. C),
• or your last category (e.g. A vs. C and B vs. C),
• or a custom category (e.g. B vs. A and B vs. C).
• The important parts of the analysis and output are much the
same as we have just seen for binary logistic regression.
I May Not Be Fred Flintstone …
• How successful are chat-up lines?
• The chat-up lines used by 348 men and 672 women in a
nightclub were recorded.
• Outcome:
– Whether the chat-up line resulted in one of the following three events:
• the person got no response or the recipient walked away;
• the person obtained the recipient’s phone number;
• the person left the nightclub with the recipient.
• Predictors:
– The content of the chat-up lines was rated for:
• funniness (0 = not funny at all, 10 = the funniest thing that I have ever heard);
• sexuality (0 = no sexual content at all, 10 = very sexually direct);
• moral values (0 = the chat-up line does not reflect good characteristics, 10 =
the chat-up line is very indicative of good characteristics).
– Gender of recipient
Multinomial Logistic Regression in R
• We can use the mlogit.data() function to
convert our data into the correct format (this
function and mlogit() live in the mlogit package,
so load it first):
library(mlogit)
newDataframe <- mlogit.data(oldDataFrame,
choice = "outcome variable", shape =
"wide"/"long")
Restructuring the Data
• Therefore, to restructure the current data
we could execute:
mlChat <- mlogit.data(chatData, choice =
"Success", shape = "wide")
Running Multinomial Regression
• Now we are ready to run the multinomial logistic
regression, using the mlogit() function:
newModel<-mlogit(outcome ~ predictor(s), data =
dataFrame, na.action = an action, reflevel = a number
representing the baseline category for the outcome)
• We can, therefore, create the model by executing:
chatModel <- mlogit(Success ~ 1 | Good_Mate + Funny +
Gender + Sex + Gender:Sex + Funny:Gender, data =
mlChat, reflevel = 3)
summary(chatModel)
Interpretation
• To help with the interpretation we can
exponentiate the coefficients:
exp(chatModel$coefficients)
• We can make the output nicer by asking R
to print the variable as a dataframe:
data.frame(exp(chatModel$coefficients))
Exponentiated Coefficients
Confidence Intervals
• We can get confidence intervals for these
coefficients using the confint() function:
exp(confint(chatModel))
Confidence Intervals
Interpretation:
Phone Number vs. No Response
• Good_Mate: Whether the chat-up line showed signs of good moral fibre
significantly predicted whether you got a phone number or no response/walked
away, b = 0.13, Wald χ2(1) = 6.02, p < .05.
• Funny: Whether the chat-up line was funny did not significantly predict whether
you got a phone number or no response, b = 0.14, Wald χ2(1) = 1.60, p > .05.
• Gender: The gender of the person being chatted up significantly predicted
whether they gave out their phone number or gave no response, b = −1.65, Wald
χ2(1) = 4.27, p < .05.
• Sex: The sexual content of the chat-up line significantly predicted whether you got
a phone number or no response/walked away, b = 0.28, Wald χ2(1) = 9.59, p < .01.
• Funny × Gender: The success of funny chat-up lines depended on whether they
were delivered to a man or a woman because in interaction these variables
predicted whether or not you got a phone number, b = 0.49, Wald χ2(1) = 12.37, p
< .001.
• Sex × Gender: The success of chat-up lines with sexual content depended on
whether they were delivered to a man or a woman because in interaction these
variables predicted whether or not you got a phone number, b = −0.35, Wald χ2(1)
= 10.82, p < .01.
Interpretation:
Going Home vs. No Response
• Good_Mate: Whether the chat-up line showed signs of good moral fibre did not
significantly predict whether you went home with the date or got a slap in the face,
b = 0.13, Wald χ2(1) = 2.42, p > .05.
• Funny: Whether the chat-up line was funny significantly predicted whether you went
home with the date or no response, b = 0.32, Wald χ2(1) = 6.46, p < .05.
• Gender: The gender of the person being chatted up significantly predicted whether
they went home with the person or gave no response, b = −5.63, Wald χ2(1) = 17.93,
p < .001.
• Sex: The sexual content of the chat-up line significantly predicted whether you went
home with the date or got a slap in the face, b = 0.42, Wald χ2(1) = 11.68, p < .01.
• Funny × Gender: The success of funny chat-up lines depended on whether they were
delivered to a man or a woman because in interaction these variables predicted
whether or not you went home with the date, b = 1.17, Wald χ2(1) = 34.63, p < .001.
• Sex × Gender: The success of chat-up lines with sexual content depended on
whether they were delivered to a man or a woman because in interaction these
variables predicted whether or not you went home with the date, b = −0.48, Wald
χ2(1) = 8.51, p < .01.
Reporting the Results
References
Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. London: SAGE Publications. Chapter 8.