Logistic Regression Notes

Uploaded by

moira142560
Copyright © All Rights Reserved

Logistic Regression

By:
Dr. Elya Nabila Abdul Bahri

1
Objectives

– Explain the concepts of logistic regression.
– Fit a binary logistic regression model using the Logistic Regression task.
– Explain effect and reference cell coding.
– Define and explain the odds ratio.
– Explain the standard output from the Logistic Regression task.

2
Overview

Response       Analysis
Continuous     Linear Regression Analysis
Categorical    Logistic Regression Analysis

3
Types of Logistic Regression

Response Variable                          Type of Logistic Regression
Two categories (YES / NO)                  Binary
Three or more categories, nominal          Nominal
Three or more categories, ordinal          Ordinal

4
What Does Logistic Regression Do?
The logistic regression model uses the predictor variables, which
can be categorical or continuous, to predict the probability of
specific outcomes.
In other words, logistic regression is designed to describe
probabilities associated with the values of the response variable.
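As a minimal sketch of that idea, the Python function below turns a linear predictor into a probability using the inverse of the logit (the logistic function). The function name `predicted_probability` and the coefficients b0 = -5.5 and b1 = 0.5 are hypothetical, chosen only for illustration; they are not estimates from these notes.

```python
import math

def predicted_probability(b0: float, b1: float, x: float) -> float:
    """P(event) for predictor value x under logit(p) = b0 + b1 * x."""
    eta = b0 + b1 * x  # linear predictor on the logit scale
    # Numerically stable inverse logit: avoids overflow in exp() for large |eta|
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# Hypothetical coefficients (illustration only)
b0, b1 = -5.5, 0.5
for x in (1, 6, 11, 16, 21):
    print(x, round(predicted_probability(b0, b1, x), 3))
```

Every result lies strictly between 0 and 1, and the probability is exactly 0.5 at x = 11, where b0 + b1 * x = 0.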

5
Logistic Regression Curve
[Figure: logistic regression curve; vertical axis: Probability (0.0 to 1.0); horizontal axis: x (1 to 21)]

6
Logit Transformation

$\operatorname{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right)$

Logistic regression models the transformed probabilities, called logits.

where
$i$ indexes all cases (observations).
$p_i$ is the probability that the event (a sale, for example) occurs in the $i$th case.
$\log$ is the natural log (to the base $e$).
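A direct Python translation of the logit definition is sketched below; the name `logit` is our own choice, not a library function. The loop also checks that applying the inverse transformation 1 / (1 + e^(-logit)) recovers the original probability.

```python
import math

def logit(p: float) -> float:
    """Natural log of the odds p / (1 - p); defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

# p = 0.5 maps to a logit of 0; p above 0.5 gives a positive logit.
for p in (0.1, 0.42, 0.5, 0.9):
    z = logit(p)
    back = 1.0 / (1.0 + math.exp(-z))  # inverse logit recovers p
    print(p, round(z, 4), round(back, 4))
```

Because $p_i$ is confined to (0, 1) while the logit can take any real value, the logit scale is the one on which a linear model in the predictors can be fitted without producing impossible probabilities.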

7
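Assumption

[Figure: Logit transform. Left panel: $p_i$ plotted against a predictor. Right panel: $\operatorname{logit}(p_i)$ plotted against the same predictor.]

The model assumes that $\operatorname{logit}(p_i)$, not $p_i$ itself, is a linear function of the predictor.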

8
Logistic Regression Model

$\operatorname{logit}(p_i) = \beta_0 + \beta_1 X_1 + \varepsilon_i$

where
$\operatorname{logit}(p_i)$   logit transformation of the probability of the event
$\beta_0$   intercept of the regression line
$\beta_1$   slope of the regression line
$\varepsilon_i$   error (residual) associated with each observation
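The coefficients of this model are usually estimated by maximum likelihood. The sketch below is a hypothetical helper, `fit_simple_logistic`, written for illustration rather than taken from any package; it maximizes the Bernoulli log-likelihood of logit(p_i) = b0 + b1 * x_i by Newton-Raphson for a single predictor. It does not handle complete separation, where the estimates do not exist.

```python
import math

def fit_simple_logistic(x, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood estimates (b0, b1) for logit(p_i) = b0 + b1 * x_i,
    found by Newton-Raphson. x: predictor values; y: 0/1 responses."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step = inverse(information) times score, 2x2 inverse written out
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1

# Group A (x = 1): 101 Yes, 139 No; Group B (x = 0): 61 Yes, 130 No
x = [1] * 240 + [0] * 191
y = [1] * 101 + [0] * 139 + [1] * 61 + [0] * 130
b0, b1 = fit_simple_logistic(x, y)
print(round(b0, 4), round(b1, 4), round(math.exp(b1), 2))
```

The data are the Group A / Group B counts from the Probability of Outcome table in these notes, with Group A coded 1 and Group B coded 0. With one binary predictor, b0 equals the log odds in Group B and b1 equals the log of the sample odds ratio, so exp(b1) is about 1.55.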

9
Open file logit_multi

10
Reference Cell Coding: Two Levels

                     Design Variables
Class      Value     1
purchase   Yes       1
           No        0

11
Reference Cell Coding: Three Levels

                                             Design Variables
Class      Value   Label                     1     2
inclevel   3       High Income (dh)          1     0
           2       Medium Income (dm)        0     1
           1       Low Income                0     0
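The coding rule in these two tables can be sketched in Python as follows. `reference_cell_coding` is a hypothetical helper for illustration; statistical software builds these design variables internally.

```python
def reference_cell_coding(levels, reference):
    """Map each level to its 0/1 design variables: one design variable per
    non-reference level; the reference level is coded 0 on all of them."""
    others = [lv for lv in levels if lv != reference]
    return {lv: tuple(1 if lv == other else 0 for other in others) for lv in levels}

print(reference_cell_coding(["High", "Medium", "Low"], reference="Low"))
# {'High': (1, 0), 'Medium': (0, 1), 'Low': (0, 0)}
print(reference_cell_coding(["Yes", "No"], reference="No"))
# {'Yes': (1,), 'No': (0,)}
```

A variable with k levels always produces k - 1 design variables, because the reference level is identified by all of them being 0.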

12
Quick Quiz: Reference Cell Coding

$\operatorname{logit}(p) = \beta_0 + \beta_1 D_{\text{High income}} + \beta_2 D_{\text{Medium income}}$

Write the following in your workbook:
– the equation for the logit when Income = High
– the equation for the logit when Income = Medium

13
Reference Cell Coding: An Example
$\operatorname{logit}(p) = \beta_0 + \beta_1 D_{\text{High income}} + \beta_2 D_{\text{Medium income}}$

$\beta_0$ = the value of the logit when income is Low
$\beta_1$ = the difference between the logits for High and Low income
$\beta_2$ = the difference between the logits for Medium and Low income
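These interpretations can be checked numerically by plugging each level's design variables into the equation. The coefficients below are hypothetical values chosen for illustration.

```python
# Hypothetical coefficients, for illustration only
b0, b1, b2 = -1.0, 0.8, 0.3

# Design variables (D_High, D_Medium) under reference cell coding; Low is the reference
design = {"High": (1, 0), "Medium": (0, 1), "Low": (0, 0)}
logits = {level: b0 + b1 * d_high + b2 * d_medium
          for level, (d_high, d_medium) in design.items()}

# Low gives b0; High - Low gives b1; Medium - Low gives b2 (up to float rounding)
print(logits)
print(logits["High"] - logits["Low"], logits["Medium"] - logits["Low"])
```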

14
Effect Coding: Two Levels

                     Design Variables
Class      Value     1
gender     Female    1
           Male     -1

15
Effect Coding: Three Levels

                                        Design Variables
Class      Value   Label                1     2
inclevel   1       Low Income          -1    -1
           2       Medium Income        0     1
           3       High Income          1     0
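A sketch of the effect coding rule follows; `effect_coding` is a hypothetical helper. The only change from reference cell coding is that the reference level (the one without its own design variable) is coded -1 instead of 0 on every design variable, so each design variable sums to zero across the levels.

```python
def effect_coding(levels, reference):
    """Map each level to its effect-coded design variables: one design variable
    per non-reference level; the reference level is coded -1 on all of them."""
    others = [lv for lv in levels if lv != reference]
    codes = {}
    for lv in levels:
        if lv == reference:
            codes[lv] = tuple(-1 for _ in others)
        else:
            codes[lv] = tuple(1 if lv == other else 0 for other in others)
    return codes

print(effect_coding(["Female", "Male"], reference="Male"))
# {'Female': (1,), 'Male': (-1,)}
```

For three levels, pass whichever level should be coded -1 on both design variables as `reference`.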

16
Effect Coding: An Example
$\operatorname{logit}(p) = \beta_0 + \beta_1 D_{\text{High income}} + \beta_2 D_{\text{Medium income}}$

$\beta_0$ = the average value of the logit across all categories
$\beta_1$ = the difference between the logit for High income and the average logit
$\beta_2$ = the difference between the logit for Medium income and the average logit
$-(\beta_1 + \beta_2)$ = the difference between the logit for Low income and the average logit
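Starting from a logit for each income level, the effect-coded coefficients can be recovered and the Low-income statement verified. This sketch assumes Low income is coded -1 on both design variables, as the equation above implies; the level logits are hypothetical.

```python
def effect_coefficients(logit_high, logit_medium, logit_low):
    """Coefficients of logit(p) = b0 + b1*D_High + b2*D_Medium under effect
    coding with Low coded (-1, -1), given the logit for each income level."""
    b0 = (logit_high + logit_medium + logit_low) / 3  # average logit
    b1 = logit_high - b0                              # High vs. average
    b2 = logit_medium - b0                            # Medium vs. average
    return b0, b1, b2

# Hypothetical level logits, for illustration only
high, medium, low = 0.9, 0.2, -0.5
b0, b1, b2 = effect_coefficients(high, medium, low)
# Low income has D_High = D_Medium = -1, so its logit is b0 - b1 - b2,
# which makes -(b1 + b2) the Low-income logit minus the average logit.
print(b0 - b1 - b2, -(b1 + b2), low - b0)
```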

17
Binary Logistic Regression

This demonstration illustrates fitting a simple logistic regression model using the Logistic Regression task.

18
Binary Logistic Regression Task
Analyze > Regression > Binary Logistic

19
What Is an Odds Ratio?

An odds ratio indicates how much more likely, in terms of odds, a certain event is to occur in one group than in another group.
Example: How much more likely are females than males to purchase $100 or more worth of items?

20
Frequency

21
Probability of Outcome

              Outcome
              Yes     No      Total
Group A       101     139     240
Group B        61     130     191
Total         162     269     431

Probability of a YES outcome in Group A = 101/240 (0.42)
Probability of a NO outcome in Group A = 139/240 (0.58)

Probability of a YES outcome in Group B = 61/191 (0.32)
Probability of a NO outcome in Group B = 130/191 (0.68)
22
Odds
Odds of Outcome in Group A

Odds = (probability of a Yes outcome in Group A) ÷ (probability of a No outcome in Group A)

0.42 ÷ 0.58 = 0.72

23
Odds
Odds of Outcome in Group B

Odds = (probability of a Yes outcome in Group B) ÷ (probability of a No outcome in Group B)

0.32 ÷ 0.68 = 0.47

24
Odds Ratio
Odds Ratio of Group A to Group B

Odds ratio = (odds of outcome in Group A) ÷ (odds of outcome in Group B)

0.72 ÷ 0.47 = 1.53
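The same odds and odds ratio can be computed directly from the counts in the Probability of Outcome table. The slides round the probabilities to two decimals before dividing; working from the counts gives odds of about 0.73 and 0.47 and an odds ratio of about 1.55, which equals (101 × 130) / (139 × 61). The helper name `odds` is our own.

```python
def odds(yes: int, no: int) -> float:
    """Odds of a Yes outcome: P(Yes) / P(No). The group total cancels,
    so this equals yes / no."""
    return yes / no

odds_a = odds(101, 139)  # Group A
odds_b = odds(61, 130)   # Group B
odds_ratio = odds_a / odds_b
print(round(odds_a, 2), round(odds_b, 2), round(odds_ratio, 2))  # 0.73 0.47 1.55
```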

25
Properties of the Odds Ratio

Odds ratio between 0 and 1: Group B more likely
Odds ratio = 1: no association
Odds ratio greater than 1: Group A more likely


26
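Odds Ratio Calculation from the Current Logistic Regression Model
Logistic regression model:

$\operatorname{logit}(\hat{p}) = \log(\text{odds}) = \beta_0 + \beta_1 \cdot \text{gender}$

Odds ratio (females to males):

$\text{odds}_{\text{females}} = e^{\beta_0 + \beta_1}$
$\text{odds}_{\text{males}} = e^{\beta_0}$

$\text{odds ratio} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$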
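The cancellation of $e^{\beta_0}$ can be checked numerically. The values of b0 and b1 below are hypothetical, and gender is assumed to be coded 1 for females and 0 for males, the coding implied by the two odds expressions.

```python
import math

# Hypothetical estimates; gender is coded 1 for females and 0 for males
b0, b1 = -0.76, 0.44

odds_females = math.exp(b0 + b1)
odds_males = math.exp(b0)
odds_ratio = odds_females / odds_males

# The e**b0 factor cancels, so the odds ratio depends only on b1
print(round(odds_ratio, 4), round(math.exp(b1), 4))
```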

27
1 Independent Variable: Gender

28
Goodness-of-Fit Statistics for Logistic Regression Model

29
Compare Means: Independent-Samples T Test

30
Compare Means of Gender

31
Comparing Pairs
To find concordant, discordant, and tied pairs, compare everyone who had the outcome of interest against everyone who did not.
Outcome groups: < $100 and $100+

32
Concordant Pair
Compare a woman who bought more than $100 worth of goods from the catalog and a man who did not.

Man (< $100): P(100+) = .32     Woman ($100+): P(100+) = .42

The actual sorting agrees with the model: the customer who bought $100+ has the higher predicted probability.
This is a concordant pair.

33
Discordant Pair
Compare a man who bought more than $100 worth
of goods from the catalog and a woman who did not.
< $100 $100 +

P(100+) = .72 P(100+) = .47

The actual sorting disagrees with the model.


This is a discordant pair.

34
Tied Pair
Compare two women. One bought more than $100 worth of goods from the catalog, but the other did not.

Woman (< $100): P(100+) = .42     Woman ($100+): P(100+) = .42

The model cannot distinguish between the two: both have the same predicted probability.
This is a tied pair.

35
Concordant versus Discordant

                                              Customer Purchasing Over $100
                                              Females (P = 0.42)    Males (P = 0.32)
Customer Purchasing    Females (P = 0.42)     Tied pair             Discordant pair
Less Than $100         Males (P = 0.32)       Concordant pair       Tied pair
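The pair counting can be sketched as follows, using the standard definitions of the association statistics that appear in the Comparing Models output: with n_c concordant, n_d discordant, and n_t tied pairs out of t = n_c + n_d + n_t, Somers' D = (n_c - n_d)/t, gamma = (n_c - n_d)/(n_c + n_d), tau-a = (n_c - n_d)/(0.5 N(N - 1)), and c = (n_c + 0.5 n_t)/t. `pair_statistics` is a hypothetical helper, and comparing predicted probabilities exactly (no rounding into bins) is a simplifying assumption. Only the ordering of the predictions matters, so using the odds (0.72 and 0.47) instead of the probabilities would give the same counts.

```python
from itertools import product

def pair_statistics(probs, outcomes):
    """Concordant/discordant/tied pair summaries for predicted probabilities
    `probs` and 0/1 `outcomes`, pairing every event with every non-event."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probs, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    nc = nd = nt = 0
    for p_event, p_nonevent in product(events, nonevents):
        if p_event > p_nonevent:
            nc += 1  # model ranks the actual event higher: concordant
        elif p_event < p_nonevent:
            nd += 1  # model ranks the non-event higher: discordant
        else:
            nt += 1  # equal predictions: tied
    t = nc + nd + nt  # number of pairs with different outcomes
    n = len(outcomes)
    return {
        "concordant": nc / t,
        "discordant": nd / t,
        "tied": nt / t,
        "somers_d": (nc - nd) / t,
        "gamma": (nc - nd) / (nc + nd) if nc + nd else float("nan"),
        "tau_a": (nc - nd) / (0.5 * n * (n - 1)),
        "c": (nc + 0.5 * nt) / t,
    }

# Group A (predicted P = 0.42): 101 Yes, 139 No; Group B (0.32): 61 Yes, 130 No
probs = [0.42] * 240 + [0.32] * 191
outcomes = [1] * 101 + [0] * 139 + [1] * 61 + [0] * 130
for name, value in pair_statistics(probs, outcomes).items():
    print(name, round(value, 3))
```

Here there are 162 × 269 = 43,578 pairs: 101 × 130 concordant, 61 × 139 discordant, and the remaining same-group pairs tied, matching the cells of the table.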

36
Model: Concordant, Discordant, and Tied Pairs

37
Multiple Logistic Regression

38
Multiple Logistic Regression

This demonstration illustrates fitting a logistic regression model with more than one explanatory variable.

39
Objectives

– Fit a multiple logistic regression model using the backward elimination method.
– Fit a multiple logistic regression model with interactions.

40
Multiple Logistic Regression

Purchase ← Gender, Income, Age

$\operatorname{logit}(p_i) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$

41
Backward Elimination Method
Full Model: Purchase ← Gender, Income, Age
Reduced Model: Purchase ← ?

42
Adjusted Odds Ratio
Predictor: Gender → Outcome: Purchase
Controlling for: Income, Age
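In a main-effects model, the gender odds ratio adjusted for income and age is $e^{\beta_{\text{gender}}}$, the same at every fixed combination of income and age. The sketch below checks this with hypothetical coefficients; `logit_p` is a hypothetical helper, and income is treated as a single numeric score purely to keep the example short.

```python
import math

def logit_p(b, gender, income, age):
    """Hypothetical main-effects model:
    logit(p) = b0 + b1*gender + b2*income + b3*age."""
    return b[0] + b[1] * gender + b[2] * income + b[3] * age

b = (-2.0, 0.45, 0.3, 0.02)  # hypothetical coefficients; gender: 1 = female, 0 = male

# Hold income and age fixed at several settings and compare females with males
for income, age in [(1, 25), (2, 40), (3, 60)]:
    odds_f = math.exp(logit_p(b, 1, income, age))
    odds_m = math.exp(logit_p(b, 0, income, age))
    print(income, age, round(odds_f / odds_m, 6))  # same each time: exp(b[1])
```

If a Gender*Income interaction were added, the female-to-male ratio would change with income, so a single adjusted odds ratio would no longer summarize the gender effect.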

43
Multiple Logistic Regression

44
Multiple Logistic Regression with Interactions

This demonstration illustrates adding interaction terms to a main effects model and using backward elimination to select the best model.

45
Multiple Logistic Regression

Purchase ← Gender, Income, Age

46
Backward Elimination Method

Full Model: Purchase ← Gender, Income, Age, and interaction terms

Reduced Model: Purchase ← ?
47
Comparing Models

Statistic                        Gender + Income    Gender + Income + (Gender * Income)
AIC                              44.257             39.260
SIC                              60.251             63.657
-2 Log likelihood                36.257             27.260
Concordant                       54.0%              54.8%
Discordant                       29.4%              28.6%
Ties                             16.6%              16.6%
Somers' D                        0.246              0.261
Goodman and Kruskal's Gamma      0.295              0.314
Kendall's Tau-a                  0.116              0.123
Concordance Index c              0.623              0.631
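The AIC row can be reproduced from the -2 log likelihood row with AIC = -2 log L + 2k, where k is the number of estimated parameters. Assuming that standard definition, the table implies k = 4 for the main-effects model (intercept, gender, and two income design variables) and k = 6 once the two Gender*Income design variables are added; these counts are inferred, not printed in the output. Smaller AIC is better, so AIC favors the interaction model, whereas SIC, which penalizes each parameter more heavily in all but very small samples, favors the main-effects model (60.251 versus 63.657).

```python
def aic(minus2_log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: -2 log L + 2k; smaller is better."""
    return minus2_log_likelihood + 2 * n_params

# -2 log L values from the Comparing Models table; parameter counts are inferred
main_effects = aic(36.257, 4)       # intercept, gender, 2 income design variables
with_interaction = aic(27.260, 6)   # plus 2 Gender*Income design variables
print(round(main_effects, 3), round(with_interaction, 3))  # 44.257 39.26
```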

48
Graph Plot

49
