Logistic Reg
References:
1. Agresti, A. Categorical Data Analysis. Wiley Series in Probability, 1990.
3. Hair et al. Multivariate Data Analysis, 6th ed. Pearson Education, pp. 359-401.
Logistic Regression/ Binary Logit Model
Problems with Usual Regression
Solution with Logistic Regression
Relation between Probability, Odds & Logit
Probability   Odds   Logit = Log(Odds)
0             0      NC
0.1           0.11   -2.20
0.2           0.25   -1.39
0.3           0.43   -0.85
0.4           0.67   -0.41
0.5           1.00    0.00
0.6           1.50    0.41
0.7           2.33    0.85
0.8           4.00    1.39
0.9           9.00    2.20
1             NC     NC
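The table can be reproduced with a few lines of Python. This is a minimal sketch: the function names `odds` and `logit` are illustrative, and returning `None` where the table shows NC is an assumption made here for convenience.

```python
import math

def odds(p):
    """Odds = p / (1 - p). Not computable at p = 1 (division by zero)."""
    if p >= 1.0:
        return None
    return p / (1.0 - p)

def logit(p):
    """Logit = natural log of the odds. Not computable at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return None
    return math.log(p / (1.0 - p))

def fmt(value):
    """Format a number to two decimals, or NC when it is not computable."""
    return "NC" if value is None else f"{value:.2f}"

for step in range(11):
    p = step / 10
    print(f"{p:<12}{fmt(odds(p)):<8}{fmt(logit(p))}")
```

The logit is symmetric around p = 0.5: probabilities p and 1 - p give logits of equal size and opposite sign.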
Binary Logit Model: Multiple Regressors
$$\log\frac{P}{1-P} = a_0 + a_1 X_1 + \dots + a_k X_k$$
i.e., with $X_0 = 1$,
$$P = \frac{\exp\left(\sum_{i=0}^{k} a_i X_i\right)}{1 + \exp\left(\sum_{i=0}^{k} a_i X_i\right)}$$
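As an illustration of the formula for P, the sketch below evaluates the probability for given coefficients and regressor values. The function name `logit_probability` and the coefficient values in the usage lines are hypothetical; they are not estimates from any dataset in these slides.

```python
import math

def logit_probability(coeffs, x):
    """Return P = exp(z) / (1 + exp(z)), where z = a0 + a1*X1 + ... + ak*Xk.

    coeffs is [a0, a1, ..., ak]; x is [X1, ..., Xk].
    """
    if len(coeffs) != len(x) + 1:
        raise ValueError("coeffs must have one more entry than x (the intercept)")
    z = coeffs[0] + sum(a * xi for a, xi in zip(coeffs[1:], x))
    # Two algebraically equal forms, chosen to avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical coefficients, for illustration only.
print(logit_probability([-1.0, 0.5, 0.25], [2.0, 4.0]))
```

Whatever the coefficient values, the result always lies strictly between 0 and 1, which is the property that ordinary linear regression lacks.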
Death Risk from Heart Disease
• Outcome: death from heart disease in the next 5 years.
• Three risk factors (age, sex, and blood cholesterol level) are used to predict the risk of death from heart disease:
x1 = age in years, less 50
x2 = sex, where 0 is male and 1 is female
x3 = cholesterol level in mmol/L, less 5.0
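The coding of the three regressors can be sketched as follows. The function name `code_risk_factors` and its argument names are illustrative assumptions; only the transformations themselves come from the definitions above.

```python
def code_risk_factors(age_years, is_female, cholesterol_mmol_l):
    """Code raw patient values into the regressors x1, x2, x3."""
    x1 = age_years - 50              # age in years, less 50
    x2 = 1 if is_female else 0       # 0 is male, 1 is female
    x3 = cholesterol_mmol_l - 5.0    # cholesterol in mmol/L, less 5.0
    return x1, x2, x3

print(code_risk_factors(62, True, 6.5))
print(code_risk_factors(50, False, 5.0))
```

With this coding, a 50-year-old male with cholesterol 5.0 mmol/L has x1 = x2 = x3 = 0, so his logit equals the intercept of the model.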
Dataset (100 data points)
Sex = “1” means female
SPSS Windows: Logit Analysis
1. Select ANALYZE from the SPSS menu bar.
6. Click OK.
Output (3 explanatory variables)
Output (2 explanatory variables)
Properties of the Logit Model
With
$$p_i = \frac{\exp(\beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki})}{1 + \exp(\beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki})},$$
the likelihood L is a function of $\beta_0, \beta_1, \dots, \beta_k$.
Estimation (contd.)
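$$L = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{(1 - y_i)},$$
$$\ln\left(\frac{y_i + \tfrac{1}{2}}{1 - y_i + \tfrac{1}{2}}\right) = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}$$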
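To make the likelihood concrete, the sketch below evaluates ln L for a given coefficient vector. It uses the identity ln L = Σ [y_i z_i − ln(1 + exp(z_i))], with z_i the linear predictor, which follows from taking logs of the product above. The function name and the four-observation dataset are made up for illustration.

```python
import math

def log_likelihood(beta, X, y):
    """ln L for the binary logit model.

    beta is [b0, b1, ..., bk]; each row of X holds [x1, ..., xk]; y holds 0/1.
    """
    total = 0.0
    for row, yi in zip(X, y):
        z = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
        # ln(1 + exp(z)) computed in an overflow-safe way.
        log1p_exp = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += yi * z - log1p_exp
    return total

# Hypothetical toy data: one regressor, four observations.
X_demo = [[1.0], [2.0], [3.0], [4.0]]
y_demo = [0, 0, 1, 1]

print(log_likelihood([0.0, 0.0], X_demo, y_demo))    # every p_i equals 0.5
print(log_likelihood([-2.5, 1.0], X_demo, y_demo))   # fits the toy data better
```

Maximum likelihood estimation searches for the coefficient vector that makes this quantity as large as possible.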
Model Fit Measures
Cox & Snell R square:
$$R^2 = 1 - \left(\frac{L_0}{L}\right)^{2/n}$$
where L0 = maximized likelihood of the intercept-only model and
L = maximized likelihood under the specified model.
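Software usually reports log-likelihoods rather than likelihoods, so it helps to rewrite the measure as 1 − exp((2/n)(ln L0 − ln L)). A minimal sketch follows; the function name is illustrative and the model log-likelihood in the usage lines is a hypothetical value.

```python
import math

def cox_snell_r2(loglik_null, loglik_model, n):
    """Cox & Snell R^2 = 1 - (L0 / L)^(2/n), computed from log-likelihoods."""
    return 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)

# For 30 observations split 15/15, the intercept-only model has ln L0 = 30*ln(0.5).
ll_null = 30 * math.log(0.5)
ll_model = -12.0   # hypothetical value, for illustration only
print(cox_snell_r2(ll_null, ll_model, 30))
```

The measure is 0 when the specified model does no better than the intercept-only model, and it stays below 1 even for a perfect fit.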
Other Measures
• Akaike’s Information Criterion:
AIC = -2*ln(L) + 2*k,
where k = # of model parameters
• Corrected AIC (for small samples):
AICc = AIC + 2*k*(k+1)/(n-k-1),
where n = # of observations
• Bayesian Information Criterion:
BIC = -2*ln(L) + ln(n)*k
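The three criteria are simple functions of the maximized log-likelihood. The sketch below uses the standard form in which the log-likelihood term enters as −2 ln(L), so smaller values indicate a better trade-off between fit and complexity. Function names and the example numbers are illustrative.

```python
import math

def aic(loglik, k):
    """Akaike's Information Criterion: -2 ln(L) + 2k."""
    return -2.0 * loglik + 2.0 * k

def aicc(loglik, k, n):
    """Small-sample corrected AIC: AIC + 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)

def bic(loglik, k, n):
    """Bayesian Information Criterion: -2 ln(L) + k ln(n)."""
    return -2.0 * loglik + math.log(n) * k

# Hypothetical log-likelihood for a 3-parameter model on 30 observations.
print(aic(-10.0, 3), aicc(-10.0, 3, 30), bic(-10.0, 3, 30))
```

When comparing two models fitted to the same data, the one with the lower criterion value is preferred; BIC penalizes extra parameters more heavily than AIC once ln(n) exceeds 2.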
Interpretation of Coefficients
$$\ln\left(\frac{p}{1-p}\right) = a_0 + a_1 x_1 + \dots + a_k x_k, \quad \text{i.e.,} \quad p = \frac{\exp(a_0 + a_1 x_1 + \dots + a_k x_k)}{1 + \exp(a_0 + a_1 x_1 + \dots + a_k x_k)}$$
If Xi is increased by one unit, the log odds change by ai units when the values of the other independent variables are held constant.
The sign of ai determines whether the probability increases (if the sign is positive) or decreases (if the sign is negative) by some amount.

Probability   Odds   Logit = Log(Odds)
0             0      NC
0.1           0.11   -2.20
0.2           0.25   -1.39
0.3           0.43   -0.85
0.4           0.67   -0.41
0.5           1.00    0.00
0.6           1.50    0.41
0.7           2.33    0.85
0.8           4.00    1.39
0.9           9.00    2.20
1             NC     NC
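Because a one-unit increase in Xi adds ai to the log odds, it multiplies the odds by exp(ai). The change in probability depends on the starting probability, which the sketch below makes explicit. Function names are illustrative and the coefficient used is hypothetical.

```python
import math

def odds_ratio(a_i, delta=1.0):
    """Factor by which the odds are multiplied when Xi increases by delta."""
    return math.exp(a_i * delta)

def updated_probability(p, a_i, delta=1.0):
    """Probability after Xi increases by delta, starting from probability p."""
    new_odds = p / (1.0 - p) * odds_ratio(a_i, delta)
    return new_odds / (1.0 + new_odds)

a = math.log(1.5)   # hypothetical coefficient: odds rise by 50% per unit
for p in (0.1, 0.5, 0.9):
    print(p, updated_probability(p, a))
```

Starting from p = 0.5 (odds 1.00), this coefficient moves the odds to 1.50 and the probability to 0.6, in line with the table; the same coefficient produces smaller probability changes near 0 or 1.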
Example: Insurance Requirement
Data (2-level Response)
SPSS Windows: Logit Analysis
1. Select ANALYZE from the SPSS menu bar.
6. Click OK.
Output (2-level response)
Output (2-level response; w/o Income)
Example: Brand Loyalty
Malhotra & Dash, Table 18.6, p.598
Seq# Brand Product Shopping Loyalty Seq# Brand Product Shopping Loyalty
1 4 3 5 1 16 3 1 3 0
2 6 4 4 1 17 4 6 2 0
3 5 2 4 1 18 2 5 2 0
4 7 5 5 1 19 5 2 4 0
5 6 3 4 1 20 4 1 3 0
6 3 4 5 1 21 3 3 4 0
7 5 5 5 1 22 3 4 5 0
8 5 4 2 1 23 3 6 3 0
9 7 5 4 1 24 4 4 2 0
10 7 6 4 1 25 6 3 6 0
11 6 7 2 1 26 3 6 3 0
12 5 6 4 1 27 4 3 2 0
13 7 3 3 1 28 3 5 2 0
14 5 1 4 1 29 5 5 3 0
15 7 5 5 1 30 1 3 2 0
Explaining Brand Loyalty
No. Loyalty Brand Product Shopping
1 1 4 3 5
2 1 6 4 4
3 1 5 2 4
4 1 7 5 5
5 1 6 3 4
6 1 3 4 5
7 1 5 5 5
8 1 5 4 2
9 1 7 5 4
10 1 7 6 4
11 1 6 7 2
12 1 5 6 4
13 1 7 3 3
14 1 5 1 4
15 1 7 5 5
16 0 3 1 3
17 0 4 6 2
18 0 2 5 2
19 0 5 2 4
20 0 4 1 3
21 0 3 3 4
22 0 3 4 5
23 0 3 6 3
24 0 4 4 2
25 0 6 3 6
26 0 3 6 3
27 0 4 3 2
28 0 3 5 2
29 0 5 5 3
30 0 1 3 2
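The binary logit model for these 30 observations can also be fitted outside SPSS. The sketch below is a plain Newton-Raphson maximization of the log-likelihood using only the Python standard library. It is not the SPSS algorithm, and the function names are illustrative. The data rows are copied from the table above in the order (Loyalty, Brand, Product, Shopping).

```python
import math

DATA = [
    (1, 4, 3, 5), (1, 6, 4, 4), (1, 5, 2, 4), (1, 7, 5, 5), (1, 6, 3, 4),
    (1, 3, 4, 5), (1, 5, 5, 5), (1, 5, 4, 2), (1, 7, 5, 4), (1, 7, 6, 4),
    (1, 6, 7, 2), (1, 5, 6, 4), (1, 7, 3, 3), (1, 5, 1, 4), (1, 7, 5, 5),
    (0, 3, 1, 3), (0, 4, 6, 2), (0, 2, 5, 2), (0, 5, 2, 4), (0, 4, 1, 3),
    (0, 3, 3, 4), (0, 3, 4, 5), (0, 3, 6, 3), (0, 4, 4, 2), (0, 6, 3, 6),
    (0, 3, 6, 3), (0, 4, 3, 2), (0, 3, 5, 2), (0, 5, 5, 3), (0, 1, 3, 2),
]

def sigmoid(z):
    """exp(z) / (1 + exp(z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - rest) / M[r][r]
    return x

def log_likelihood(beta, X, y):
    """ln L = sum of y*z - ln(1 + exp(z)) over observations."""
    total = 0.0
    for row, yi in zip(X, y):
        z = sum(b * v for b, v in zip(beta, row))
        total += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total

def fit_logit(X, y, iterations=25):
    """Newton-Raphson: beta <- beta + H^{-1} g, starting from all zeros."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, row)))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * row[a]
                for c in range(k):
                    hess[a][c] += w * row[a] * row[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

# Design matrix with a leading 1 for the intercept.
X = [[1.0, brand, product, shopping] for (_, brand, product, shopping) in DATA]
y = [row[0] for row in DATA]

beta = fit_logit(X, y)
ll_model = log_likelihood(beta, X, y)
# With 15 loyal and 15 non-loyal cases, the intercept-only fit has p = 0.5.
ll_null = log_likelihood([0.0, 0.0, 0.0, 0.0], X, y)

print("coefficients (const, Brand, Product, Shopping):", beta)
print("-2 ln L, fitted model:", -2.0 * ll_model)
print("-2 ln L, intercept only:", -2.0 * ll_null)
```

The printed coefficients and -2 ln L can be compared against the SPSS output for the same regressors; small differences in the last digits may arise from different convergence criteria.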
SPSS Windows
SPSS Windows: Logit Analysis
1. Select ANALYZE from the SPSS menu bar.
6. Click OK.
Results of Logistic Regression
Model Summary
Classification Table
a. Variable(s) entered on step 1: Brand, Product, Shopping.
Regressor: Brand
Regressors: Brand, Shopping
Bankruptcy Example
(Applied Multivariate Statistical Analysis by Johnson & Wichern)
Seq# CF/TD NI/TD CA/CL CA/NS Bankrupt
17 0.51 0.1 2.49 0.54 1
18 0.08 0.02 2.01 0.53 1
19 0.38 0.11 3.27 0.35 1
20 0.19 0.05 2.25 0.33 1
21 0.32 0.07 4.24 0.63 1
22 0.31 0.05 4.45 0.69 1
23 0.12 0.05 2.52 0.69 1
24 -0.02 0.02 2.05 0.35 1
25 0.22 0.08 2.35 0.4 1
26 0.17 0.07 1.8 0.52 1
27 0.15 0.05 2.17 0.55 1
28 -0.1 -0.01 2.5 0.58 1
29 0.14 -0.03 0.46 0.26 1
30 0.14 0.07 2.61 0.52 1
31 0.15 0.06 2.23 0.56 1
32 0.16 0.05 2.31 0.2 1
33 0.29 0.06 1.84 0.38 1
34 0.54 0.11 2.33 0.48 1
35 -0.33 -0.09 3.01 0.47 1
36 0.48 0.09 1.24 0.18 1
SPSS Windows: Logit Analysis
1. Select ANALYZE from the SPSS menu bar.
6. Click OK.
Regressors: CFTD, CACL, NITD, CANS
Regressors: CFTD, CACL
Example: Resort Visits
(Malhotra & Dash, Table 18.2, p.581)
Sample of 30 households
• VISIT = 1 if visited in last 2 years; = 2 otherwise
• INCOME
• TRAVEL (Attitude toward travel, on a 9-point scale)
• VACATION (importance attached to family vacation)
• HOUSEHOLD SIZE
• AGE (of head of household)
……………………………………………………………..
• AMOUNT (has 3 levels of spending on family vacation)
Resort Visits
Columns: No.; Resort Visit; Annual Family Income ($000); Attitude Toward Travel; Importance Attached to Family Vacation; Household Size; Age of Head of Household; Amount Spent on Family Vacation
1 1 50.2 5 8 3 43 M (2)
2 1 70.3 6 7 4 61 H (3)
3 1 62.9 7 5 6 52 H (3)
4 1 48.5 7 5 5 36 L (1)
5 1 52.7 6 6 4 55 H (3)
6 1 75.0 8 7 5 68 H (3)
7 1 46.2 5 3 3 62 M (2)
8 1 57.0 2 4 6 51 M (2)
9 1 64.1 7 5 4 57 H (3)
10 1 68.1 7 6 5 45 H (3)
11 1 73.4 6 7 5 44 H (3)
12 1 71.9 5 8 4 64 H (3)
13 1 56.2 1 8 6 54 M (2)
14 1 49.3 4 2 3 56 H (3)
15 1 62.0 5 6 2 58 H (3)
Information on Resort Visits: Analysis Sample (Table 18.2, cont.)
Columns: No.; Resort Visit; Annual Family Income ($000); Attitude Toward Travel; Importance Attached to Family Vacation; Household Size; Age of Head of Household; Amount Spent on Family Vacation
16 2 32.1 5 4 3 58 L (1)
17 2 36.2 4 3 2 55 L (1)
18 2 43.2 2 5 2 57 M (2)
19 2 50.4 5 2 4 37 M (2)
20 2 44.1 6 6 3 42 M (2)
21 2 38.3 6 6 2 45 L (1)
22 2 55.0 1 2 2 57 M (2)
23 2 46.1 3 5 3 51 L (1)
24 2 35.0 6 4 5 64 L (1)
25 2 37.3 2 7 4 54 L (1)
26 2 41.8 5 1 3 56 M (2)
27 2 57.0 8 3 2 36 M (2)
28 2 33.4 6 8 2 50 L (1)
29 2 37.5 3 2 3 48 L (1)
30 2 41.3 3 3 2 42 L (1)
Regressors: ALL
Most Important Regressors?
Regressors: Income, Hsize
Multinomial Logistic Regression
Sample of 30 households
• INCOME
• TRAVEL (Attitude toward travel, on a 9-point scale)
• VACATION (importance attached to family vacation)
• HOUSEHOLD SIZE
• AGE (of head of household)
……………..
• AMOUNT (has 3 levels of spending on family
vacation)
Multinomial Logit Model
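$$\log\frac{P_j}{P_m} = a_{0j} + a_{1j} X_1 + \dots + a_{kj} X_k, \quad j = 1, \dots, (m-1)$$
i.e., with $X_0 = 1$,
$$P_j = \frac{\exp\left(\sum_{i=0}^{k} a_{ij} X_i\right)}{1 + \sum_{l=1}^{m-1} \exp\left(\sum_{i=0}^{k} a_{il} X_i\right)}, \quad j = 1, \dots, (m-1)$$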
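The sketch below computes all m category probabilities from the (m − 1) sets of logit coefficients, treating level m as the reference whose logit is fixed at 0. The function name is illustrative, and the coefficients in the usage line are hypothetical.

```python
import math

def multinomial_probabilities(coeffs, x):
    """Return [P1, ..., Pm] for the multinomial logit model.

    coeffs holds m-1 lists [a0j, a1j, ..., akj], one per non-reference level j;
    x holds [X1, ..., Xk]. Level m is the reference level.
    """
    logits = [a[0] + sum(ai * xi for ai, xi in zip(a[1:], x)) for a in coeffs]
    # Shift all logits (including the reference logit 0) to avoid overflow;
    # the shift cancels in the ratio.
    shift = max([0.0] + logits)
    exps = [math.exp(z - shift) for z in logits]
    ref = math.exp(-shift)
    denom = ref + sum(exps)
    return [e / denom for e in exps] + [ref / denom]

# Hypothetical coefficients for m = 3 levels and k = 2 regressors.
print(multinomial_probabilities([[0.5, -0.2, 0.1], [1.0, 0.3, -0.4]], [2.0, 1.0]))
```

The probabilities always sum to 1, and with m = 2 the formula reduces to the binary logit model.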
Spending on Travel/Vacation
• Amount Spent has 3 levels (m=3) and there are 5
explanatory variables (k=5).
• We need two equations for modeling logits for
level j = 1, 2, with respect to “reference level” m
$$\log\frac{P_j}{P_m} = a_{0j} + a_{1j} X_1 + \dots + a_{kj} X_k, \quad j = 1, 2$$
SPSS Windows: Logit Analysis
1. Select ANALYZE from the SPSS menu bar.
2. Click REGRESSION and then MULTINOMIAL
LOGISTIC.
3. Move “amount” into the DEPENDENT VARIABLE box.
4. Move “income”, “travel”, “vacation”, “hsize” & “age” into
the COVARIATES box.
5. Under STATISTICS: under MODEL select everything
except Monotonicity measures, under PARAMETERS select
Estimates, Likelihood ratio tests & then ENTER
6. Click OK.
Multinomial Logistic Regression 1
Multinomial Logistic Regression 2
Multinomial Logistic Regression 3
Logit1 = z1 = 35.771 - 0.854*income + 1.702*hsize
Logit2 = z2 = 14.594 - 0.265*income + 0.125*hsize
p1 = exp(z1)/(1+exp(z1)+exp(z2))
p2 = exp(z2)/(1+exp(z1)+exp(z2))
p3 = 1/(1+exp(z1)+exp(z2))
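The fitted equations can be applied directly to a household. The sketch below uses the coefficients reported above and, as an example, household No. 1 from the resort data (income 50.2, household size 3). The function name `amount_probabilities` is illustrative.

```python
import math

def amount_probabilities(income, hsize):
    """Return (p1, p2, p3) for the three spending levels; level 3 is the reference."""
    z1 = 35.771 - 0.854 * income + 1.702 * hsize
    z2 = 14.594 - 0.265 * income + 0.125 * hsize
    denom = 1.0 + math.exp(z1) + math.exp(z2)
    p1 = math.exp(z1) / denom
    p2 = math.exp(z2) / denom
    p3 = 1.0 / denom
    return p1, p2, p3

# Household No. 1: income 50.2 ($000), household size 3.
p1, p2, p3 = amount_probabilities(50.2, 3)
print(p1, p2, p3)
```

For this household z2 is positive and z1 is negative, so level 2 has the largest probability; the household's recorded amount in the data is M (2).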
Example: Insurance Requirement
Data (3-level Response)
SPSS Windows: Logit Analysis
1. Select ANALYZE from the SPSS menu bar.
2. Click REGRESSION and then MULTINOMIAL
LOGISTIC.
3. Move “Resp3level” into the DEPENDENT VARIABLE
box.
4. Move “Age”, “Dependent” and “Income” into the
COVARIATES box.
5. Under STATISTICS: under MODEL select everything
except Monotonicity measures, under PARAMETERS select
Estimates, Likelihood ratio tests & then ENTER
6. Click OK.
Output (3-level response)
Output (3-level response; w/o Income)