Lecture 8

University of Windsor, Winter 2025

Actuarial Regression and Time Series


ACSC-4200/ACSC-8200

Dr. Poonam S. Malakar

Categorical Dependent Variables

- A categorical variable (also called a qualitative variable) is a variable that can take one of a limited set of possible outcomes.
- Models with a categorical dependent variable allow one to predict whether an observation is a member of a distinct group or category.
- Categorical variables with more than two groups are known as multicategory outcomes. Multicategory outcomes can be ordered or unordered.

Binary Dependent Variables:


- Binary variables (dichotomous variables) are categorical variables that can take exactly two values.
- They are a special type of discrete variable that can be used to indicate whether a subject has a characteristic of interest,
- e.g., occurrence of an accident, having lung cancer.
- A model with a binary dependent variable allows one to predict whether an event has occurred or a subject has a characteristic of interest.

Example: Medical Expenditure Panel Survey (MEPS)

This is a database on hospitalization utilization and expenditures. Here,

y_i = 1 if the ith person was hospitalized during the sample period; y_i = 0 otherwise.

There are n = 2000 persons in this sample, distributed as:

                            Male          Female
Not hospitalized (y = 0)    902 (95.3%)   941 (89.3%)
Hospitalized (y = 1)        44 (4.7%)     113 (10.7%)
Total                       946           1054

Table 1

In limited circumstances, linear regression can be used with binary dependent variables; this application is known as the linear probability model.

Linear Probability Model:

Denote the probability that the response equals 1 as

π_i = Pr(y_i = 1).

A binary random variable has a Bernoulli distribution. We have

E(y_i) = 0 × Pr(y_i = 0) + 1 × Pr(y_i = 1) = π_i

Var(y_i) = E(y_i²) − [E(y_i)]² = 0² × Pr(y_i = 0) + 1² × Pr(y_i = 1) − π_i² = π_i(1 − π_i)

Consider the linear model of the form

y_i = x_i'β + ε_i,

known as a linear probability model. Assuming that E(ε_i) = 0, we have

E(y_i) = x_i'β = π_i.

Since y_i has a Bernoulli distribution, Var(y_i) = x_i'β(1 − x_i'β).

Drawbacks of the Linear Probability Model:

• Fitted values can be poor. The expected response is a probability and thus must lie between 0 and 1. However, the linear combination x_i'β can vary between negative and positive infinity. This mismatch implies, for example, that fitted values may be unreasonable.

• Heteroscedasticity. Linear models assume homoscedasticity (constant variance), yet the variance of the response depends on the mean, which varies over observations. This problem of varying variability is known as heteroscedasticity.

• Residual analysis is meaningless. The response must be either 0 or 1, although regression models typically regard the distribution of the error term as continuous. This mismatch implies, for example, that the usual residual analysis in regression modeling is meaningless. (In regular regression models, residuals are the differences between the observed values of the dependent variable and the predicted values from the regression equation. Residual analysis is commonly used to assess the goodness of fit of a regression model and to check the assumptions of the regression analysis, such as linearity, normality of residuals, and constant variance. However, when working with binary dependent variables, such as those that take on values of 0 and 1 (representing categories like “success” and “failure”), the nature of the data is inherently different. Binary outcomes do not have the continuous variation of numeric variables, and the assumptions of traditional regression analysis may not hold.)
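The first drawback is easy to demonstrate. Below is a minimal R sketch (simulated data with made-up coefficients, not the MEPS file) in which the fitted values of a linear probability model can escape the [0, 1] range:

set.seed(1)
n <- 200
x <- rnorm(n, mean = 0, sd = 2)
p <- 1 / (1 + exp(-(0.5 + 1.5 * x)))    # true success probabilities
y <- rbinom(n, size = 1, prob = p)      # binary response

lpm <- lm(y ~ x)                        # linear probability model
range(fitted(lpm))                      # fitted "probabilities" can fall below 0 or above 1
mean(fitted(lpm) < 0 | fitted(lpm) > 1) # share of unreasonable fitted values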

Using Nonlinear Functions of Explanatory Variables:


Logistic regression is a statistical modeling technique used to analyze the re-
lationship between a binary dependent variable and one or more independent
variables. It is a powerful tool for predicting and understanding categorical
outcomes in various fields, including social sciences, medicine, marketing, and
finance.

Unlike linear regression, which is designed for continuous dependent variables, logistic regression is specifically suited for situations where the dependent variable takes on binary values, such as “yes” or “no”, “success” or “failure”, or “1” or “0”. It allows us to estimate the probability of the binary outcome based on the values of the independent variables.

The name “logistic” refers to the logistic function, also known as the sig-
moid function, which is the key component of logistic regression. This function
transforms a linear combination of the independent variables into a probability
value between 0 and 1. By applying a suitable threshold, we can then classify
observations into one of the two categories.

Three commonly used link functions:


To ensure that the fitted response mean is always between 0 and 1, we should use an appropriate link function g(·) to connect π and the linear combination of predictors. The model equation is

g(π_i) = x_i'β, or equivalently, π_i = g^{-1}(x_i'β).

• Logit Link Function: The most commonly used link function in logistic regression is the logit link. Its inverse applies the logistic transformation (sigmoid function) to the linear predictor; the link itself is the log-odds of the probability of the binary outcome:

g(π) = logit(π) = log( π / (1 − π) )
• Probit Link Function: The probit link function is an alternative to the logit link. It transforms the probability using the inverse of the cumulative distribution function (CDF) of the standard normal distribution:

g(π) = Φ^{-1}(π),

where Φ is the distribution function of the standard normal and Φ^{-1} is its inverse, so that π = Φ(x'β).

• Complementary Log-log Link Function: The complementary log-log (cloglog) link function is another option for binary outcomes. It applies the complementary log-log transformation to the probability:

g(π) = ln[−ln(1 − π)], or equivalently, π = 1 − exp(−e^η),

where η = x'β is the linear predictor.

Link                     Distribution function     Distribution
Probit                   Φ(z)                      Normal
Logit                    1 / (1 + e^(−z))          Logistic
Complementary log-log    1 − exp(−e^z)             Extreme-value (Gumbel)

Table 2: How the binary response mean can be written as an appropriate distribution function in the probit, logit, and complementary log-log models. Here z is the argument of the distribution function.

Fig. 1: Comparison of the distribution functions for the probit, logit, and complementary log-log cases.
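A comparison like the one in Fig. 1 can be reproduced with a few lines of R (a sketch; each curve is the distribution function from Table 2):

z <- seq(-4, 4, by = 0.01)
probit  <- pnorm(z)              # normal CDF
logit   <- 1 / (1 + exp(-z))     # logistic CDF, same as plogis(z)
cloglog <- 1 - exp(-exp(z))      # extreme-value (Gumbel) CDF

plot(z, probit, type = "l", ylab = "pi")
lines(z, logit, lty = 2)
lines(z, cloglog, lty = 3)
legend("topleft", legend = c("probit", "logit", "cloglog"), lty = 1:3)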

Example 1

Determine which of the following pairs of distribution and link function is the most appropriate for modeling whether a person is hospitalized or not.
(A) Normal distribution, Identity link function
(B) Normal distribution, logit link function
(C) Binomial distribution, linear link function
(D) Binomial distribution, logit link function
(E) It cannot be determined from the information given

Comments: The term “linear link” is not a standard one. Presumably it means “identity link”.
Solution: Answer D. Whether a person is hospitalized or not is a binary variable, which is best modeled by a binomial (more precisely, Bernoulli) distribution, leaving only answers (C) and (D). The link function should be one that restricts the Bernoulli response mean to the zero-one range. Of the identity and logit links, only the logit link has this property.

Threshold Interpretation:
Suppose that there exists an underlying linear model

y_i* = x_i'β + ε_i*.

Here, we do not observe the response y_i*, yet we interpret it as the propensity to possess a characteristic. For example, we might think of the financial strength of an insurance company as a measure of its propensity to become insolvent (no longer capable of meeting its financial obligations). Under the threshold interpretation, we do not observe the propensity, but we do observe when the propensity crosses a threshold. It is customary to assume that this threshold is 0, for simplicity. Thus, we observe

y_i = 0 if y_i* ≤ 0, and y_i = 1 if y_i* > 0.

To see how the logit case is derived from the threshold model, assume a logistic distribution function for the disturbances, so that

Pr(ε_i* ≤ a) = 1 / (1 + exp(−a)).

Like the normal distribution, one can verify by calculating the density that the logistic distribution is symmetric about zero. Thus, −ε_i* has the same distribution as ε_i*, and so

π_i = Pr(y_i = 1 | x_i) = Pr(y_i* > 0) = Pr(ε_i* > −x_i'β) = Pr(ε_i* ≤ x_i'β) = 1 / (1 + exp(−x_i'β)).

This establishes the threshold interpretation for the logit case. The development for the other two cases is similar and is omitted.
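To make the threshold mechanism concrete, here is a small R simulation (made-up coefficients; rlogis draws standard logistic disturbances) showing that a logit fit to the observed binary responses recovers the latent-model coefficients:

set.seed(42)
n <- 5000
x <- rnorm(n)
ystar <- -0.5 + 1.2 * x + rlogis(n)  # latent propensity with logistic disturbances
y <- as.integer(ystar > 0)           # we observe only whether the threshold 0 is crossed

coef(glm(y ~ x, family = binomial(link = logit)))  # estimates close to (-0.5, 1.2)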

Logistic Regression:
• Logistic regression is another phrase used to describe the logit case.

• logit(p) = ln( p / (1 − p) ) is the logit function.

• With a logistic regression model, we represent the linear combination of explanatory variables as the logit of the success probability; that is,

x_i'β = logit(π_i).

Odds Interpretation
- When the response y is binary, knowing only p = Pr(y = 1) summarizes the entire distribution.
- The odds are given by p / (1 − p).
- For example, suppose that y indicates whether a horse wins a race and that p is the probability of the horse winning. If p = 0.25, then the odds of the horse winning are 0.25/(1.00 − 0.25) = 0.3333.

-We might say that the odds of winning are 0.3333 to 1, or 1 to 3.

- Equivalently, we say that the probability of not winning is 1 − p = 0.75, so that the odds of the horse not winning are 0.75/(1 − 0.75) = 3 and the odds against the horse are 3 to 1.

-Odds have a useful interpretation from a betting standpoint.


-Suppose that we are playing a fair game and that we place a bet of $1 with
one to three odds.
-If the horse wins, then we get our $1 back plus winnings of $3.
-If the horse loses, then we lose our bet of $1.
-It is a fair game in the sense that the expected value of the game is zero because
we win $3 with probability p = 0.25 and lose $1 with probability 1 − p = 0.75.
-From an economic standpoint, the odds provide the important numbers (bet
of $1 and winnings of $3), not the probabilities.
-Of course, if we know p, then we can always calculate the odds. Similarly, if
we know the odds, we can always calculate the probability p. The logit is the
logarithmic odds function, also known as the log odds.
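Because the logit is just the log odds, converting between probabilities and odds takes one line of R each; a small sketch using the horse-race numbers:

odds_from_p <- function(p) p / (1 - p)
p_from_odds <- function(o) o / (1 + o)

odds_from_p(0.25)        # 0.3333, the horse-race example
p_from_odds(1 / 3)       # 0.25
log(odds_from_p(0.25))   # the log odds (logit); same as qlogis(0.25)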

Odds Ratio Interpretation

To interpret the regression coefficients in the logistic regression model, β = (β_0, β_1, ..., β_k)', we begin by assuming that the jth explanatory variable, x_ij, is either zero or one. Then, with the notation x_i = (x_i0, ..., x_ij, ..., x_ik)', we may interpret

β_j = (x_i0, ..., 1, ..., x_ik)'β − (x_i0, ..., 0, ..., x_ik)'β
    = ln[ Pr(y_i = 1 | x_ij = 1) / (1 − Pr(y_i = 1 | x_ij = 1)) ] − ln[ Pr(y_i = 1 | x_ij = 0) / (1 − Pr(y_i = 1 | x_ij = 0)) ].

Thus,

exp(β_j) = [ Pr(y_i = 1 | x_ij = 1) / (1 − Pr(y_i = 1 | x_ij = 1)) ] / [ Pr(y_i = 1 | x_ij = 0) / (1 − Pr(y_i = 1 | x_ij = 0)) ].

- This shows that exp(β_j) can be expressed as the ratio of two odds, known as the odds ratio.

- That is, the numerator of this expression is the odds when x_ij = 1, whereas the denominator is the odds when x_ij = 0.

- Thus, we can say that the odds when x_ij = 1 are exp(β_j) times as large as the odds when x_ij = 0.

- To illustrate, suppose β_j = 0.693, so that exp(β_j) = 2. From this, we say that the odds (for y = 1) are twice as great for x_ij = 1 as for x_ij = 0.

Similarly, assuming that the jth explanatory variable is continuous (differentiable), we have

β_j = ∂(x_i'β)/∂x_ij = ∂/∂x_ij ln[ Pr(y_i = 1 | x_ij) / (1 − Pr(y_i = 1 | x_ij)) ]
    = { ∂/∂x_ij [ Pr(y_i = 1 | x_ij) / (1 − Pr(y_i = 1 | x_ij)) ] } / { Pr(y_i = 1 | x_ij) / (1 − Pr(y_i = 1 | x_ij)) }.

Case of one predictor:

ln(odds) = β_0 + β_1 x, or equivalently, odds = exp(β_0 + β_1 x).

• If x is continuous, then

β_1 = ∂ ln(odds)/∂x = [∂(odds)/∂x] / odds,

which is the proportional change (i.e., the absolute change divided by the current value) in the odds.

• If x is a binary variable equal to either zero or one, then

odds = exp(β_0) when x = 0, and odds = exp(β_0 + β_1) when x = 1.

Then

exp(β_1) = (odds when x = 1) / (odds when x = 0),

which is the ratio of two odds, called an odds ratio. Equivalently, the odds when x = 1 are exp(β_1) times as large as the odds when x = 0. If β_1 > 0 (resp. β_1 < 0), the odds are higher when x = 1 (resp. x = 0).
Example 2

You are given the following information:

• A statistician uses two models to predict the probability of success, π, of a binomial random variable.

• One of the models uses a logistic link function and the other uses a probit link function.

• There is one predictor variable, X, and an intercept in each model.

• Both models happen to produce the same coefficient estimates β̂_0 = 0.02 and β̂_1 = 0.3.

• You are interested in the predicted probabilities of success at x = 4.

Calculate the absolute difference in the predicted values from the two models
at x = 4.
(A) less than 0.1
(B) At least 0.1, but less than 0.2
(C) At least 0.2, but less than 0.3
(D) At least 0.3, but less than 0.4
(E) At least 0.4

Solution: Answer B

For both links, the estimated linear predictor at x = 4 is 0.02 + 0.3 × 4 = 1.22.

- For the logit link, π̂ = 1/[1 + exp(−1.22)] = 0.7721.

- For the probit link, π̂ = Φ(1.22) = 0.8888.

The absolute difference between these two predicted values is 0.8888 − 0.7721 = 0.1167.
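In R, the two predicted values are built-in one-liners (plogis and pnorm are the logistic and standard normal distribution functions):

eta <- 0.02 + 0.3 * 4          # linear predictor at x = 4
plogis(eta)                    # logit link: 0.7721
pnorm(eta)                     # probit link: 0.8888
abs(pnorm(eta) - plogis(eta))  # 0.1167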
Example 3
A statistician uses a logistic model to predict the probability of success, π, of a
binomial random variable.
You are given the following information:

• There is one predictor variable, X, and an intercept in the model.

• The estimates of π at x = 4 and 6 are 0.88877 and 0.96562, respectively.

Calculate the estimated intercept coefficient, b_0, in the logistic model that produced the above probability estimates.
(A) less than -1.
(B) At least -1, but less than 0.
(C) At least 0, but less than 1
(D) At least 1, but less than 2.
(E) At least 2
Solution: In class (Answer B)
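The solution is worked in class; as a sketch, one way to back out the intercept in R is to apply the logit (qlogis) to both fitted probabilities and solve the two resulting linear equations:

eta1 <- qlogis(0.88877)  # = b0 + 4 * b1
eta2 <- qlogis(0.96562)  # = b0 + 6 * b1
b1 <- (eta2 - eta1) / 2  # slope: about 0.63
b0 <- eta1 - 4 * b1      # intercept: about -0.44, so answer (B)
c(b0 = b0, b1 = b1)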

Example 4
You are given the following information about an insurance policy:

• The probability of a policy renewal, p(X), follows a logistic model with an intercept and one explanatory variable.

• β0 = 5

• β1 = −0.65

Calculate the odds of renewal at x = 5.


(A) less than 2
(B) At least 2, but less than 4
(C) At least 4, but less than 6
(D) At least 6, but less than 8
(E) At least 8

Solution:
The odds of renewal at x = 5 are

exp(β_0 + β_1 x) = exp(5 − 0.65 × 5) = 5.7546.

(Answer: (C))

Example 5: MEPS Expenditures, Continued


- Table 1 above shows that the percentage of women who were hospitalized is 10.7%.
- The odds of a woman being hospitalized are p/(1 − p) = 0.107/(1 − 0.107) = 0.120.
- For men, the percentage hospitalized is 4.7%, and the odds are 0.047/(1 − 0.047) = 0.0493.
- Odds ratio = 0.120/0.0493 = 2.434; women are more than twice as likely to be hospitalized as men.
- From a logistic regression fit, the coefficient associated with sex is 0.734. Given this model, we say that women are e^0.734 = 2.083 times as likely as men to be hospitalized. The regression estimate of the odds ratio controls for additional variables (e.g., age, education), unlike the basic calculation based on raw frequencies.
R output

> MEPS <- read.csv(file.choose())   ## choose your data file
> df <- MEPS
> # Create a binary variable based on the expenditure for inpatient care
> df$POSEXP <- ifelse(df$EXPENDIP > 0, 1, 0)
> PosExpglmFull <- glm(POSEXP ~ AGE + GENDER + factor(RACE) + factor(REGION)
+     + factor(EDUC) + factor(PHSTAT) + factor(ANYLIMIT) + factor(INCOME)
+     + factor(insure), family = binomial(link = logit), data = df)
> summary(PosExpglmFull)

Call:
glm(formula = POSEXP ~ AGE + GENDER + factor(RACE) + factor(REGION) +
    factor(EDUC) + factor(PHSTAT) + factor(ANYLIMIT) + factor(INCOME) +
    factor(insure), family = binomial(link = logit), data = df)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.3846  -0.4211  -0.3161  -0.2341   2.9673

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)
(Intercept)             -4.784829   0.736136  -6.500 8.04e-11 ***
AGE                     -0.001252   0.007346  -0.170 0.864699
GENDER                   0.734233   0.192474   3.815 0.000136 ***
factor(RACE)BLACK        0.217677   0.570656   0.381 0.702869
factor(RACE)NATIV        0.824219   0.835982   0.986 0.324168
factor(RACE)OTHER        0.007304   0.847974   0.009 0.993127
factor(RACE)WHITE        0.221817   0.532313   0.417 0.676895
factor(REGION)NORTHEAST  0.086940   0.280031   0.310 0.756207
factor(REGION)SOUTH     -0.186302   0.237347  -0.785 0.432493
factor(REGION)WEST      -0.518257   0.274666  -1.887 0.059179 .
factor(EDUC)HIGHSCH     -0.062887   0.229231  -0.274 0.783822
factor(EDUC)LHIGHSC     -0.068306   0.267349  -0.255 0.798342
factor(PHSTAT)FAIR       0.114993   0.357955   0.321 0.748020
factor(PHSTAT)GOOD       0.370522   0.263039   1.409 0.158948
factor(PHSTAT)POOR       1.668471   0.368824   4.524 6.08e-06 ***
factor(PHSTAT)VGOO       0.174648   0.267145   0.654 0.513269
factor(ANYLIMIT)1        0.553917   0.208929   2.651 0.008020 **
factor(INCOME)LINCOME    0.506546   0.303061   1.671 0.094636 .
factor(INCOME)MINCOME    0.311229   0.252860   1.231 0.218384
factor(INCOME)NPOOR      0.711665   0.399682   1.781 0.074981 .
factor(INCOME)POOR       0.910737   0.295533   3.082 0.002058 **
factor(insure)1          1.232008   0.304847   4.041 5.31e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1100.36  on 1999  degrees of freedom
Residual deviance:  977.42  on 1978  degrees of freedom
AIC: 1021.4

Number of Fisher Scoring iterations: 6

Parameter Estimation:
The customary method of estimation for logistic and probit models is maximum likelihood, described in further detail in Section 11.9. To provide intuition, we outline the ideas in the context of binary dependent variable regression models.
The likelihood is the observed value of the probability mass function. For a single observation, the likelihood is

1 − π_i if y_i = 0, and π_i if y_i = 1.

The objective of maximum likelihood estimation is to find the parameter values that produce the largest likelihood. Finding the maximum of the logarithm of a function yields the same solution as finding the maximum of the function itself. Because it is generally computationally simpler, we work with the logarithmic (or log-) likelihood, written as

ln(1 − π_i) if y_i = 0, and ln(π_i) if y_i = 1.

More compactly, the log-likelihood of a single observation is

y_i ln(π_i) + (1 − y_i) ln(1 − π_i).

Assuming independence among observations, the likelihood of the dataset is the product of the likelihoods of the individual observations. Taking logarithms, the log-likelihood of the dataset is the sum of the log-likelihoods of the single observations.
Given the data {(x_i, y_i)}, i = 1, ..., n, we form the likelihood function as

∏_{i=1}^n π_i^{y_i} (1 − π_i)^{1 − y_i}.

The log-likelihood is viewed as a function of the parameters, with the data held fixed. In contrast, the joint probability mass function is viewed as a function of the realized data, with the parameters held fixed. The log-likelihood function is

L(β) = Σ_{i=1}^n [ y_i ln(π_i) + (1 − y_i) ln(1 − π_i) ]
     = Σ_{i=1}^n [ y_i ln(π(x_i'β)) + (1 − y_i) ln(1 − π(x_i'β)) ],

where we used the fact that π_i = π(x_i'β).

The method of maximum likelihood involves finding the values of β that maximize the log-likelihood. The customary way to find the maximum is to take partial derivatives with respect to the parameters of interest and find the roots of the resulting equations. In this case, taking partial derivatives with respect to β yields the score equations:

∂L(β)/∂β = Σ_{i=1}^n x_i [ y_i − π(x_i'β) ] π'(x_i'β) / { π(x_i'β)[1 − π(x_i'β)] } = 0,

where π' is the derivative of π. The solution of these equations, denoted b_MLE, is the maximum likelihood estimator. For the logit function, the score equations reduce to

∂L(β)/∂β = Σ_{i=1}^n x_i [ y_i − π(x_i'β) ] = 0,    (1)

where π(z) = 1/(1 + exp(−z)).
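As a sanity check on the score equations, here is a hedged R sketch (simulated data with hypothetical names) that maximizes the logit log-likelihood numerically with optim and compares the answer with glm, which solves the same equations by Fisher scoring:

set.seed(7)
n <- 1000
x <- rnorm(n)
X <- cbind(1, x)                      # design matrix with an intercept
beta_true <- c(-0.3, 0.8)
y <- rbinom(n, 1, plogis(X %*% beta_true))

negloglik <- function(beta) {
  eta <- drop(X %*% beta)
  -sum(y * log(plogis(eta)) + (1 - y) * log(1 - plogis(eta)))
}

opt <- optim(c(0, 0), negloglik, method = "BFGS")
opt$par                                            # b_MLE by direct maximization
coef(glm(y ~ x, family = binomial(link = logit)))  # essentially identical estimates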

Example 6:

Consider a logistic regression model for a Bernoulli response variable y with


a mean of π:

Solution: In class (Answer (C))

Additional Inference:
An estimator of the large-sample variance of β may be calculated by taking partial derivatives of the score equations. Specifically, the term

I(β) = −E[ ∂²L(β) / (∂β ∂β') ]

is the information matrix. As a special case, using the logit function and equation (1), straightforward calculations show that the information matrix is

I(β) = Σ_{i=1}^n σ_i² x_i x_i',   where σ_i² = π(x_i'β)[1 − π(x_i'β)].

The square root of the (j + 1)st diagonal element of the inverse of this matrix, evaluated at β = b_MLE, yields the standard error of b_{j,MLE}, denoted se(b_{j,MLE}).

Likelihood Ratio Test:

To assess the overall model fit, it is customary to cite likelihood ratio test statistics in nonlinear regression models. To test overall model adequacy,

H_0: β_1 = β_2 = ··· = β_k = 0 (all slope coefficients are zero),

we use the statistic

LRT = 2[ L(b_MLE) − L_0 ],

where L_0 is the maximized log-likelihood of the model with only an intercept term. Under the null hypothesis H_0, this statistic has a chi-square distribution with k degrees of freedom.
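With a fitted glm, the LRT for overall adequacy is the drop in deviance from the intercept-only model. A sketch using the MEPS output above (null deviance 1100.36 on 1999 df, residual deviance 977.42 on 1978 df):

LRT <- 1100.36 - 977.42                  # = 2*(L(b_MLE) - L_0) = 122.94
k <- 1999 - 1978                         # 21 slope parameters
qchisq(0.95, df = k)                     # 5% critical value: 32.67
pchisq(LRT, df = k, lower.tail = FALSE)  # p-value is essentially zero: reject H_0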

Example 7:

Solution: In class (Answer (E))
Example 8:
A practitioner built a GLM to predict claim frequency. She used a Poisson error
structure with a log link. You are given the following information regarding the
model summary statistics:

• The model has 5 parameters, including an intercept.

• The likelihood ratio test statistic for testing the overall model adequacy
is 11.601

Determine the best statement of what we can conclude from the value of the
likelihood ratio test statistic.

Solution: (Answer: (A))

H_0: β_1 = β_2 = β_3 = β_4 = 0

Under the null hypothesis, the likelihood ratio test statistic (LRT) has a χ²_4 distribution.

LRT = 11.601 > 9.48773 = χ²_{4, 0.05}

We reject the null hypothesis at α = 5% and conclude that at least one of the four slope parameters is nonzero.

Nominal Regression:
- Consider a c-level categorical response variable, taking values 1, 2, ..., c.
- If there is no natural order on the different values of the response variable, then the response is called nominal.
- Examples: color (red, green, yellow, purple, ...), type of household insurance claim (theft, fire, storm damage, etc.).
- For nominal response variables, one may pursue a generalized logit model by selecting one level, say the last category or the first category, as the baseline category, relative to which the log odds of the multinomial probabilities, π_j = Pr(y = j) for j = 1, 2, ..., c − 1, are modeled in terms of predictors:

ln( π_j / π_c ) = x'β_j = β_{0j} + β_{1j} x_1 + ··· + β_{kj} x_k,   j = 1, 2, ..., c − 1

(note that the denominator is π_c, not 1 − π_j), with each level j having a separate set of parameters β_j = (β_{0j}, β_{1j}, ..., β_{kj})'.

By construction, β_c = 0, since ln( π_c / π_c ) = 0.

In the special case of c = 2, π_1 + π_2 = 1, and the model equation reduces to

ln( π_1 / π_2 ) = ln( π_1 / (1 − π_1) ) = β_0 + β_1 x_1 + ··· + β_k x_k,

which is nothing but ordinary logistic regression.

To determine an explicit expression for π_j, we rewrite the model equation as π_j = π_c exp(x'β_j), for j = 1, 2, ..., c. Summing over all j and using the facts that π_1 + π_2 + ··· + π_c = 1 and β_c = 0, we have

π_j = exp(x'β_j) / Σ_{k=1}^{c} exp(x'β_k) = exp(x'β_j) / [ 1 + Σ_{k=1}^{c−1} exp(x'β_k) ],   j = 1, 2, ..., c.

In particular,

π_c = 1 / Σ_{k=1}^{c} exp(x'β_k) = 1 / [ 1 + Σ_{k=1}^{c−1} exp(x'β_k) ].

As soon as the parameter estimates β̂_k are available, we can estimate the multinomial probabilities at explanatory variables x as

π̂_j = exp(x'β̂_j) / Σ_{k=1}^{c} exp(x'β̂_k),   j = 1, 2, ..., c.
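In R, a generalized logit model of this form can be fit with multinom from the nnet package (a sketch on the built-in iris data, whose 3-level Species plays the role of the nominal response; the first level is taken as the baseline):

library(nnet)  # recommended package shipped with R

fit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris)
summary(fit)       # one row of coefficients per non-baseline level
head(fitted(fit))  # estimated multinomial probabilities for each observation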

Example 9:
In a study, 100 subjects were asked to choose one of three election candidates (A, B, or C). The subjects were organized into four age categories: (18-30, 31-45, 45-61, 61+).

A logistic regression was fitted to the subject responses to predict their preferred candidate, with age group (18-30) and Candidate A as the reference categories.

For age group (18-30), the log-odds for preference of Candidate B and Can-
didate C were -0.535 and -1.489 respectively.

Calculate the modeled probability of someone from age group (18-30) pre-
ferring Candidate B.

(A) Less than 20 %


(B) At least 20 %, but less than 40 %
(C) At least 40 %, but less than 60 %
(D) At least 60 %, but less than 80 %
(E) At least 80 %
Solution: In class (Answer (B): π_B = exp(−0.535) / [1 + exp(−0.535) + exp(−1.489)] = 0.3233)

Example 10:
You are given the following information about a generalized logit model:

Level 3 is selected as the baseline category.


Calculate the estimated probability that a man in age group 3 gives a rating of 1.

(A) 0.2
(B) 0.3
(C) 0.4
(D) 0.5

(E) 0.6
Solution: In class (Answer (B): π̂_1 = 0.3203)

Ordinal Dependent Variable:


The second type of multinomial regression model, known as ordinal regression, is devoted to ordinal categorical responses: categorical variables whose values can be naturally ordered. Examples include rating your health (good, moderate, poor) and teaching effectiveness (excellent, fair, poor). These responses arise naturally in market research and opinion polls.

Example 11:

I. For a binary response variable with a continuous explanatory variable, logistic regression is an inappropriate method of statistical analysis.

II. Ordinal variables are a type of continuous explanatory variable.

III. ANOVA is a useful approach for analyzing the means of groups of contin-
uous response variables, where the groups are categorical.

(Note: Ignore Statement III as it is beyond the SRM exam syllabus.)

Determine which of the above statements are true.

(A) I only

(B) II only

(C) III only

(D) I, II, and III

(E) The answer is not given by (A), (B), (C), or (D).

Solution:

I. False. Logistic regression allows for continuous or categorical explanatory variables. Only the response variable has to be binary.
II. False. Ordinal variables are a type of categorical (or qualitative) explanatory variable.
III. True. This ANOVA is not the ANOVA table we have seen in Sections 1.2 and 2.1 and is not required for Exam SRM. (Answer: (C))

Ordinal Regression:
To fix ideas, consider an ordinal response variable y with c ordered values which, without loss of generality, we label 1, 2, ..., c. The distribution of y is described by the probabilities

π_j = Pr(y = j),   j = 1, 2, ..., c,

among which only c − 1 probabilities are free, due to the condition Σ_{j=1}^{c} π_j = 1, and require modeling in terms of predictors.
There are different ways to extend ordinary logistic regression to ordinal responses.

• Cumulative logit model: The most common ordinal regression model uses the logit link to explain the cumulative probabilities τ_j = π_1 + π_2 + ··· + π_j, for j = 1, 2, ..., c − 1, in terms of explanatory variables, leading to the model equation

ln( τ_j / (1 − τ_j) ) = ln[ (π_1 + π_2 + ··· + π_j) / (π_{j+1} + π_{j+2} + ··· + π_c) ] = x'β_j,   j = 1, 2, ..., c − 1,

or

τ_j = 1 / [ 1 + exp(−x'β_j) ].

For example, suppose c = 3 and there are three (continuous) explanatory variables x_1, x_2, x_3. Then we have the two equations:

ln( π_1 / (π_2 + π_3) ) = x'β_1 = β_{01} + β_{11} x_1 + β_{21} x_2 + β_{31} x_3,

ln( (π_1 + π_2) / π_3 ) = x'β_2 = β_{02} + β_{12} x_1 + β_{22} x_2 + β_{32} x_3.

This is the cumulative logit model.

• Proportional Odds Model: An important special case of the cumulative logit model arises when the intercept term β_{0j} varies with j (the category) but the other regression coefficients do not depend on j. The model equation becomes

logit(τ_j) = ln( τ_j / (1 − τ_j) ) = β_{0j} + β_1 x_1 + β_2 x_2 + ··· + β_k x_k,   j = 1, 2, ..., c − 1,

where the slopes β_1, ..., β_k do not depend on j, or

τ_j = 1 / ( 1 + exp[ −(β_{0j} + β_1 x_1 + β_2 x_2 + ··· + β_k x_k) ] ).

These c − 1 equations have different intercepts but the same slope with respect to each explanatory variable. As a result, the explanatory variables exert the same effect on all c − 1 cumulative probabilities. For this reason, this model is known as the proportional odds model, which is the default and most important form of ordinal logistic regression.
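In R, the proportional odds model is fit by polr in the MASS package. A sketch on the built-in housing data, whose response Sat is ordered Low < Medium < High (note that polr's sign convention for the linear predictor differs slightly from the equation above):

library(MASS)

fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
summary(fit)    # c - 1 = 2 intercepts (Low|Medium, Medium|High), common slopes
exp(coef(fit))  # slope coefficients on the odds-ratio scale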

Example 12:
You are given the following information about a proportional odds model for the degree of vehicle crash, classified on a three-point scale: 1, 2, and 3.

• The model uses two categorical explanatory variables: Age and Sex.

• An interaction term between Age and Sex is included.

• The parameters of the model are given:

Parameter                 β̂
Intercept 1 (Degree 1)    0.450
Intercept 2 (Degree 2)    5.089
Age: Junior               0.179
Age: Senior               0.000
Sex: F                   -0.172
Sex: M                    0.000
Age × Sex                -0.129

Calculate the ratio of the odds of having degree of crash 2 or lower for a
junior female to that for a senior male.

(A) Less than 0.8

(B) At least 0.8, but less than 0.9

(C) At least 0.9, but less than 1.0

(D) At least 1.0, but less than 1.1

(E) At least 1.1

(Note that “senior male” is the reference level.)

Solution: In class (Answer (B): the intercepts cancel in the odds ratio, so the ratio is exp(0.179 − 0.172 − 0.129) = exp(−0.122) = 0.885)
