Lecture 8
Table 1: Hospitalization counts by sex (column percentages in parentheses)

                         Male          Female
Not hospitalized (y=0)   902 (95.3%)   941 (89.3%)
Hospitalized (y=1)        44 (4.7%)    113 (10.7%)
Total                    946           1054
πi = Pr(yi = 1)

E(yi) = xi′β = πi
• Residual analysis is meaningless. The response must be either 0 or 1, although regression models typically regard the distribution of the error term as continuous. This mismatch implies, for example, that the usual residual analysis in regression modeling is meaningless. (In ordinary regression models, residuals are the differences between the observed values of the dependent variable and the predicted values from the regression equation. Residual analysis is commonly used to assess the goodness of fit of a regression model and to check the assumptions of the regression analysis, such as linearity, normality of residuals, and constant variance. However, when working with binary dependent variables, which take on values of 0 and 1 (representing categories like “success” and “failure”), the nature of the data is inherently different. Binary outcomes do not vary continuously like numeric variables, and the assumptions of traditional regression analysis may not hold.)
The name “logistic” refers to the logistic function, also known as the sig-
moid function, which is the key component of logistic regression. This function
transforms a linear combination of the independent variables into a probability
value between 0 and 1. By applying a suitable threshold, we can then classify
observations into one of the two categories.
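To illustrate, here is a minimal R sketch (with made-up coefficients and data) that applies the sigmoid to a linear predictor and classifies with a 0.5 cutoff:

sigmoid <- function(z) 1 / (1 + exp(-z))   # inverse logit: maps the real line to (0, 1)

beta0 <- -2; beta1 <- 0.8                  # hypothetical coefficients
x <- 0:5                                   # hypothetical predictor values
p <- sigmoid(beta0 + beta1 * x)            # fitted probabilities
yhat <- as.integer(p > 0.5)                # classify with a 0.5 threshold
cbind(x, p, yhat)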
• Logit Link Function: The most commonly used link function in logistic regression is the logit link. It equals the log-odds of the probability of the binary outcome, and its inverse applies the logistic transformation (sigmoid function) to the linear predictor. The logit link function is given by:

g(π) = logit(π) = log( π / (1 − π) ).
• Probit Link Function: The probit link function is an alternative to the logit link function. Its inverse uses the cumulative distribution function (CDF) of the standard normal distribution to transform the linear predictor into a probability; the link itself is the standard normal quantile (probit) function. The probit link function is given by:

g(π) = Φ⁻¹(π).
Table 2: How the binary response mean can be written as an appropriate distribution function in the Probit, Logit, and Complementary log-log models. Here z is the argument of the distribution function.

Model                     π as a function of z
Probit                    π = Φ(z)
Logit                     π = 1/(1 + exp(−z))
Complementary log-log     π = 1 − exp(−exp(z))
Fig. 1: Comparison of the distribution functions for the probit, logit, and complementary log-log cases.
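A comparison like Fig. 1 can be drawn with a short R sketch that plots the three distribution functions from Table 2:

z <- seq(-4, 4, by = 0.01)
probit  <- pnorm(z)              # standard normal CDF
logit   <- plogis(z)             # logistic CDF: 1/(1 + exp(-z))
cloglog <- 1 - exp(-exp(z))      # complementary log-log case
matplot(z, cbind(probit, logit, cloglog), type = "l", lty = 1:3, col = 1:3,
        xlab = "z", ylab = "pi")
legend("topleft", c("probit", "logit", "cloglog"), lty = 1:3, col = 1:3)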
Example 1
Determine which of the following pairs of distribution and link function is the most appropriate to model whether a person is hospitalized or not.
(A) Normal distribution, Identity link function
(B) Normal distribution, logit link function
(C) Binomial distribution, linear link function
(D) Binomial distribution, logit link function
(E) It cannot be determined from the information given
Comments: The term “linear link” is not a standard one; presumably it means “identity link”.
Solution: Answer (D). Whether a person is hospitalized or not is a binary variable, which is best modeled by the Binomial (more precisely, Bernoulli) distribution, leaving only answers (C) and (D). The link function should be one that restricts the Bernoulli response mean to the zero-one range. Between the identity and logit links, only the logit link has this property.
Threshold Interpretation:
Suppose that there exists an underlying linear model,

yi* = xi′β + εi*.

Here, we do not observe the response yi*, yet we interpret it to be the propensity to possess a characteristic. For example, we might think about the financial strength of an insurance company as a measure of its propensity to become insolvent (no longer capable of meeting its financial obligations). Under the threshold interpretation, we do not observe the propensity, but we do observe when the propensity crosses a threshold. It is customary to assume that this threshold is 0, for simplicity. Thus, we observe

yi = 0 if yi* ≤ 0,   and   yi = 1 if yi* > 0.
To see how the logit case is derived from the threshold model, assume a logistic distribution function for the disturbances, so that

Pr(εi* ≤ a) = 1 / (1 + exp(−a)).

Like the normal distribution, one can verify by calculating the density that the logistic distribution is symmetric about zero. Thus, −εi* has the same distribution as εi*, and so

πi = Pr(yi = 1) = Pr(xi′β + εi* > 0) = Pr(εi* > −xi′β) = Pr(−εi* < xi′β) = Pr(εi* ≤ xi′β) = 1 / (1 + exp(−xi′β)).

This establishes the threshold interpretation for the logit case. The development for the other two cases is similar and is omitted.
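A quick simulation (a minimal sketch with made-up coefficients) illustrates the threshold interpretation: generate latent propensities with logistic disturbances, record only the threshold crossings, and a logistic regression on the observed 0/1 outcomes recovers the latent-model coefficients.

set.seed(1)
n <- 10000
x <- runif(n, -2, 2)
beta0 <- 0.5; beta1 <- 1.5                # hypothetical latent-model coefficients
ystar <- beta0 + beta1 * x + rlogis(n)    # latent propensity with logistic errors
y <- as.integer(ystar > 0)                # we observe only the threshold crossing
coef(glm(y ~ x, family = binomial(link = logit)))  # estimates close to (0.5, 1.5)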
Logistic Regression:
• Logistic regression is another phrase used to describe the logit case.
• logit(p) = ln( p / (1 − p) ) is the logit function.
Odds Interpretation
- When the response y is binary, knowing only p = Pr(y = 1) summarizes the entire distribution.
- The odds are given by p / (1 − p).
- For example, suppose that y indicates whether a horse wins a race and that p is the probability of the horse winning. If p = 0.25, then the odds of the horse winning are 0.25/(1.00 − 0.25) = 0.3333.
- We might say that the odds of winning are 0.3333 to 1, or 1 to 3.
Thus,

exp(βj) = [ Pr(yi = 1 | xij = 1) / (1 − Pr(yi = 1 | xij = 1)) ] / [ Pr(yi = 1 | xij = 0) / (1 − Pr(yi = 1 | xij = 0)) ].

- This shows that exp(βj) can be expressed as the ratio of two odds, known as the odds ratio.
- That is, the numerator of this expression is the odds when xij = 1, whereas the denominator is the odds when xij = 0.
- Thus, we can say that the odds when xij = 1 are exp(βj) times as large as the odds when xij = 0.
Similarly, assuming that the jth explanatory variable is continuous (differentiable), we have

βj = ∂/∂xij (xi′β) = ∂/∂xij ln[ Pr(yi = 1 | xij) / (1 − Pr(yi = 1 | xij)) ]
   = [ ∂/∂xij ( Pr(yi = 1 | xij) / (1 − Pr(yi = 1 | xij)) ) ] / [ Pr(yi = 1 | xij) / (1 − Pr(yi = 1 | xij)) ].

In particular, with a single explanatory variable x,

β1 = ∂/∂x ln(odds) = (∂(odds)/∂x) / odds,

which is the proportional change (i.e., absolute change divided by the current value) in the odds.
For a binary explanatory variable x,

odds = exp(β0) when x = 0,   and   odds = exp(β0 + β1) when x = 1.

Then

exp(β1) = (odds when x = 1) / (odds when x = 0),

which is the ratio of two odds, called an odds ratio. Equivalently, the odds when x = 1 are exp(β1) times as large as the odds when x = 0. If β1 > 0 (resp. β1 < 0), the odds are higher when x = 1 (resp. x = 0).
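In practice, the odds-ratio interpretation is applied to a fitted model by exponentiating its coefficients; a minimal sketch, where fit is any glm object with a logit link (for example, the one from the simulation above):

fit <- glm(y ~ x, family = binomial(link = logit))
exp(coef(fit))             # each exp(beta_j) is an odds ratio
exp(confint.default(fit))  # Wald confidence intervals on the odds-ratio scale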
Example 2
• One of the models uses a logistic link function and the other uses a probit link function.
• Both models happen to produce the same coefficient estimates β̂0 = 0.02 and β̂1 = 0.3.

Calculate the absolute difference in the predicted values from the two models at x = 4.
(A) less than 0.1
(B) At least 0.1, but less than 0.2
(C) At least 0.2, but less than 0.3
(D) At least 0.3, but less than 0.4
(E) At least 0.4
Solution: Answer (B).
At x = 4, the linear predictor is 0.02 + 0.3 × 4 = 1.22. The probit model predicts Φ(1.22) = 0.8888, while the logit model predicts 1/(1 + exp(−1.22)) = 0.7721. The absolute difference between these two predicted values is 0.8888 − 0.7721 = 0.1167.
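The two predictions are easy to verify in R, where plogis is the logistic CDF and pnorm the standard normal CDF:

eta <- 0.02 + 0.3 * 4      # linear predictor at x = 4
pnorm(eta)                 # probit prediction: 0.8888
plogis(eta)                # logit prediction: 0.7721
pnorm(eta) - plogis(eta)   # absolute difference: 0.1167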
Example 3
A statistician uses a logistic model to predict the probability of success, π, of a
binomial random variable.
You are given the following information:
Example 4
You are given the following information about an insurance policy:
• β0 = 5
• β1 = −0.65
Solution:
The odds of renewal at x = 5 are
exp(β0 + β1x) = exp(5 − 0.65 × 5) = exp(1.75) = 5.7546.
(Answer: (C))
> summary(PosExpglmFull)

Call:
glm(formula = POSEXP ~ AGE + GENDER + factor(RACE) + factor(REGION) +
    factor(EDUC) + factor(PHSTAT) + factor(ANYLIMIT) + factor(INCOME) +
    factor(insure), family = binomial(link = logit), data = df)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.3846  -0.4211  -0.3161  -0.2341   2.9673

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)
(Intercept)               -4.784829   0.736136  -6.500 8.04e-11 ***
AGE                       -0.001252   0.007346  -0.170 0.864699
GENDER                     0.734233   0.192474   3.815 0.000136 ***
factor(RACE)BLACK          0.217677   0.570656   0.381 0.702869
factor(RACE)NATIV          0.824219   0.835982   0.986 0.324168
factor(RACE)OTHER          0.007304   0.847974   0.009 0.993127
factor(RACE)WHITE          0.221817   0.532313   0.417 0.676895
factor(REGION)NORTHEAST    0.086940   0.280031   0.310 0.756207
factor(REGION)SOUTH       -0.186302   0.237347  -0.785 0.432493
factor(REGION)WEST        -0.518257   0.274666  -1.887 0.059179 .
factor(EDUC)HIGHSCH       -0.062887   0.229231  -0.274 0.783822
factor(EDUC)LHIGHSC       -0.068306   0.267349  -0.255 0.798342
factor(PHSTAT)FAIR         0.114993   0.357955   0.321 0.748020
factor(PHSTAT)GOOD         0.370522   0.263039   1.409 0.158948
factor(PHSTAT)POOR         1.668471   0.368824   4.524 6.08e-06 ***
factor(PHSTAT)VGOO         0.174648   0.267145   0.654 0.513269
factor(ANYLIMIT)1          0.553917   0.208929   2.651 0.008020 **
factor(INCOME)LINCOME      0.506546   0.303061   1.671 0.094636 .
factor(INCOME)MINCOME      0.311229   0.252860   1.231 0.218384
factor(INCOME)NPOOR        0.711665   0.399682   1.781 0.074981 .
factor(INCOME)POOR         0.910737   0.295533   3.082 0.002058 **
factor(insure)1            1.232008   0.304847   4.041 5.31e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Number of Fisher Scoring iterations: 6
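As a follow-up, the fitted coefficients can be read on the odds-ratio scale. For example, exp(1.232008) ≈ 3.43 for factor(insure)1 says the odds of the modeled outcome are about 3.4 times as large when insure = 1, holding the other variables fixed:

exp(coef(PosExpglmFull))              # coefficients on the odds-ratio scale
anova(PosExpglmFull, test = "Chisq")  # sequential likelihood ratio tests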
Parameter Estimation:
The customary method of estimation for logistic and probit models is maximum
likelihood, described in further detail in Section 11.9. To provide intuition, we
outline the ideas in the context of binary dependent variable regression models.
The likelihood is the observed value of the probability function. For a single observation, the likelihood is

1 − πi if yi = 0,   and   πi if yi = 1.

The objective of maximum likelihood estimation is to find the parameter values that produce the largest likelihood. Finding the maximum of the logarithm of a function yields the same solution as finding the maximum of the function itself. Because it is generally computationally simpler, we consider the logarithmic (or log-) likelihood, written as

ln(1 − πi) if yi = 0,   and   ln(πi) if yi = 1.
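To make the maximization concrete, here is a minimal sketch on simulated data (all names and values are illustrative) that codes the Bernoulli log-likelihood directly and maximizes it numerically; the estimates agree with those from glm:

set.seed(2)
X <- cbind(1, runif(200, -1, 1))                # design matrix with an intercept column
y <- rbinom(200, 1, plogis(X %*% c(-0.5, 1)))   # simulate from a logit model

# log-likelihood: ln(pi_i) when y_i = 1 and ln(1 - pi_i) when y_i = 0
loglik <- function(beta) {
  p <- plogis(X %*% beta)
  sum(y * log(p) + (1 - y) * log(1 - p))
}

opt <- optim(c(0, 0), loglik, control = list(fnscale = -1))  # maximize
opt$par                                         # numerical MLE; compare with:
coef(glm(y ~ X - 1, family = binomial))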
Taking partial derivatives of the log-likelihood L(β) with respect to β yields the score equations:

∂/∂β L(β) = Σ_{i=1}^{n} xi (yi − π(xi′β)) π′(xi′β) / [ π(xi′β)(1 − π(xi′β)) ] = 0,

where π′ is the derivative of π. The solution of these equations, denoted as bMLE, is the maximum likelihood estimator. For the logit function the score equations reduce to

∂/∂β L(β) = Σ_{i=1}^{n} xi (yi − π(xi′β)) = 0,     (1)

where π(z) = 1/(1 + exp(−z)).
Example 6:
Additional Inference:
An estimator of the large-sample variance of β may be calculated by taking partial derivatives of the score equations. Specifically, the term

I(β) = −E[ ∂²L(β) / ∂β ∂β′ ]

is the information matrix. As a special case, using the logit function and equation (1), straightforward calculations show that the information matrix is

I(β) = Σ_{i=1}^{n} σi² xi xi′,

where σi² = π(xi′β)(1 − π(xi′β)). The square root of the (j + 1)th diagonal element of this matrix evaluated at β = bMLE yields the standard error for bj,MLE, denoted as se(bj,MLE).
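Continuing the simulated-data sketch above, the logit information matrix can be assembled directly, and the square roots of the diagonal of its inverse reproduce the standard errors reported by summary(glm):

b_mle <- coef(glm(y ~ X - 1, family = binomial))
p_hat <- as.vector(plogis(X %*% b_mle))
I_hat <- t(X) %*% (X * (p_hat * (1 - p_hat)))   # I(beta) = sum of sigma_i^2 x_i x_i'
sqrt(diag(solve(I_hat)))                        # standard errors se(b_j,MLE)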
To test the overall adequacy of the model, consider the null hypothesis

H0: β1 = β2 = ··· = βk = 0.

The likelihood ratio test statistic is

LRT = 2(L(bMLE) − L0),

where L0 is the maximized log-likelihood of the intercept-only model; under H0, the LRT has an approximate chi-square distribution with k degrees of freedom.
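A minimal sketch of the likelihood ratio test on the same simulated data, with L0 taken from the intercept-only fit:

fit_full <- glm(y ~ X - 1, family = binomial)
fit_null <- glm(y ~ 1, family = binomial)        # intercept-only model gives L0
LRT <- as.numeric(2 * (logLik(fit_full) - logLik(fit_null)))
LRT
pchisq(LRT, df = 1, lower.tail = FALSE)          # p-value; df = number of slopes (here 1)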
Example 7:
Solution: In class (Answer (E))
Example 8:
A practitioner built a GLM to predict claim frequency. She used a Poisson error
structure with a log link. You are given the following information regarding the
model summary statistics:
• The likelihood ratio test statistic for testing the overall model adequacy
is 11.601
Determine the best statement of what we can conclude from the value of the
likelihood ratio test statistic.
Solution: (Answer:(A))
H0: β1 = β2 = β3 = β4 = 0.
Under the null hypothesis, the likelihood ratio test statistic (LRT) has an approximate χ²₄ distribution. Since 11.601 exceeds the 95th percentile of this distribution (9.488) but not the 99th percentile (13.277), the overall model is significant at the 5% level but not at the 1% level.
Nominal Regression:
- Consider a c-level categorical response variable, taking values 1, 2, . . . , c.
- If there is no natural order on the different values of the response variable, then the response is called nominal.
- For example, color (red, green, yellow, purple, . . . ), type of household insurance claim (theft, fire, storm damage, etc.).
- For nominal response variables, one may pursue a generalized logit model by selecting one level, say the last category or the first category, as the baseline category, relative to which the log-odds of the multinomial probabilities πj = Pr(y = j) for j = 1, 2, . . . , c − 1 are modeled in terms of predictors:

ln( πj / πc ) = x′βj = β0j + β1j x1 + ··· + βkj xk,   j = 1, 2, . . . , c − 1,

(note that the denominator is πc, not 1 − πj),
with each level of j having a separate set of parameters βj = (β0j, β1j, . . . , βkj). In particular,
πc = 1 / Σ_{k=1}^{c} exp(x′βk) = 1 / ( 1 + Σ_{k=1}^{c−1} exp(x′βk) ),

with βc = 0 for the baseline category.
Once the parameter estimates β̂k are available, we can estimate the multinomial probabilities at explanatory variables x as

π̂j = exp(x′β̂j) / Σ_{k=1}^{c} exp(x′β̂k),   j = 1, 2, . . . , c.
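In R, a generalized logit model of this kind can be fit with multinom from the nnet package; a minimal sketch in which the data frame df_nom, the nominal response claim_type, and the predictors x1 and x2 are hypothetical names:

library(nnet)
# baseline-category (generalized) logit model; the first factor level is the baseline
fit_nom <- multinom(claim_type ~ x1 + x2, data = df_nom)
summary(fit_nom)                         # one row of estimates beta_j per non-baseline level
head(predict(fit_nom, type = "probs"))   # fitted multinomial probabilities pi_hat_j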
Example 9:
In a study, 100 subjects were asked to choose one of three election candidates (A, B, or C). The subjects were organized into four age categories: (18-30, 31-45, 45-61, 61+).
For age group (18-30), the log-odds for preference of Candidate B and Candidate C were -0.535 and -1.489, respectively.
Calculate the modeled probability of someone from age group (18-30) preferring Candidate B.
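Assuming Candidate A is the baseline category (so its log-odds are 0), the modeled probabilities follow from the formula for π̂j above; a quick check in R:

logodds <- c(A = 0, B = -0.535, C = -1.489)   # log-odds relative to the baseline A
exp(logodds) / sum(exp(logodds))              # pi_hat; Candidate B comes out near 0.323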
Example 10:
You are given the following information about a generalized logit model:
(A) 0.2
(B) 0.3
(C) 0.4
(D) 0.5
(E) 0.6
Solution: In class. (Answer: (B), π̂1 = 0.3203)
Example 11:
III. ANOVA is a useful approach for analyzing the means of groups of continuous response variables, where the groups are categorical.
(A) I only
(B) II only
Solution:
Ordinal Regression:
To fix ideas, consider an ordinal response variable y with c ordered values which, without loss of generality, we label as 1, 2, . . . , c. The distribution of y is described by the probabilities

πj = Pr(y = j),   j = 1, 2, . . . , c,

among which only c − 1 probabilities are free due to the condition Σ_{j=1}^{c} πj = 1, and these require modeling in terms of predictors.
There are different ways to extend ordinary logistic regression to ordinal responses.
• Cumulative logit model: The most common ordinal regression model uses the “logit” link to explain the “cumulative” probabilities τj = π1 + π2 + ··· + πj for j = 1, 2, . . . , c − 1 in terms of explanatory variables, leading to the model equation

ln( τj / (1 − τj) ) = ln( (π1 + π2 + ··· + πj) / (πj+1 + πj+2 + ··· + πc) ) = x′βj,   j = 1, 2, . . . , c − 1,

or

τj = 1 / (1 + exp(−x′βj)).
For example, with c = 3 ordered levels and three explanatory variables,

ln( π1 / (π2 + π3) ) = x′β1 = β01 + β11 x1 + β21 x2 + β31 x3,
ln( (π1 + π2) / π3 ) = x′β2 = β02 + β12 x1 + β22 x2 + β32 x3.

This is the cumulative logit model.
A common simplification lets the slopes be the same for every j:

logit(τj) = ln( τj / (1 − τj) ) = β0j + β1 x1 + β2 x2 + ··· + βk xk,   j = 1, 2, . . . , c − 1,

where the slopes β1, . . . , βk do not depend on j, or

τj = 1 / (1 + exp[−(β0j + β1 x1 + β2 x2 + ··· + βk xk)]).
20
These c − 1 equations have different intercepts but the same slope with respect to each explanatory variable. As a result, the explanatory variables exert the same effect on all c − 1 cumulative probabilities. For this reason, this specification is known as the proportional odds model, which is the default and most important form of ordinal logistic regression.
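In R, the proportional odds model is typically fit with polr from the MASS package; a minimal sketch with hypothetical names (df_ord, x1, x2), noting that polr parameterizes the model as logit(τj) = ζj − x′β, so its slope signs are flipped relative to the equation above:

library(MASS)
# proportional odds (cumulative logit) model; the response must be an ordered factor
fit_ord <- polr(factor(y, ordered = TRUE) ~ x1 + x2, data = df_ord,
                method = "logistic", Hess = TRUE)
summary(fit_ord)   # common slopes plus c - 1 intercepts (zeta)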
Example 12:
You are given the following information about a proportional odds model for the degree of vehicle crash classified on a three-point scale, 1, 2, and 3:
• The model uses two categorical explanatory variables: Age and Sex.

Parameter                  β̂
Intercept 1 (Degree 1)     0.450
Intercept 2 (Degree 2)     5.089
Age: Junior                0.179
Age: Senior                0.000
Sex: F                    -0.172
Sex: M                     0.000
Age × Sex                 -0.129
Calculate the ratio of the odds of having degree of crash 2 or lower for a
junior female to that for a senior male.
(Note that “senior male” is the reference level.)
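A sketch of the computation in R, assuming the parameterization logit(τj) = β0j + (Age effect) + (Sex effect) + (Age × Sex effect), so that the intercept cancels in the ratio:

# odds of degree 2 or lower: exp(beta_02 + effects); beta_02 cancels in the ratio
log_or <- 0.179 + (-0.172) + (-0.129)   # Junior + F + Age x Sex interaction
exp(log_or)                             # odds ratio vs. senior male: about 0.885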