
MDS5202/ Segment 3

BINARY LOGISTIC REGRESSION MODEL – ESTIMATION

Table of Contents

1. Simple Logistic Regression
1.1 Logistic Regression Analysis
1.2 Logistic Regression Estimation
1.3 Maximum Likelihood Estimation
1.4 Interpretation of Parameters
1.5 Categorical Independent Variable
1.6 Reference Cell Coding
1.7 Challenger Dataset
2. Multiple Logistic Regression
2.1 Estimation
3. Summary


Introduction
Linear regression can be used to predict a value that is continuous in nature. However, if the
values to be predicted are dichotomous, that is, they take only two values, linear regression
is unsuitable in its original form; the values must be transformed before the concept can be
applied. Binary logistic regression is a method to predict dichotomous variables.

Learning Objectives

At the end of this topic, you will be able to:

• Explain the use of the logistic regression model for dichotomous outcome variables
• Describe the maximum likelihood estimation of the parameters of the logistic
regression
• Interpret the regression coefficient, odds ratio, and confidence interval


1. Simple Logistic Regression


Regression analysis is a method to model the relationship between variables. It helps to infer
or predict one variable based on one or more other variables. The dependent variable is the
variable one wants to infer or predict, and an independent variable is a variable used for
predicting the dependent variable. In linear regression models, the dependent variable is
continuous, for example, the salary of employees, consumption of electricity, or weight of
children. Logistic regression models have a dichotomous dependent variable, that is, a
categorical variable with two categories, for example, whether a disease is present or absent,
or whether a customer buys a product or not. There are only two possible outcomes that the
outcome variable, or dependent variable, can take.

In the visualisation of data with continuous values, the dependent variable is plotted on
the Y-axis, whereas the independent variable is plotted on the X-axis, giving a scatter of
points as depicted in Figure 1. To fit the regression line, the least squares method is used:
the line is placed in the scatter so that the sum of squared distances from the points to the
line is as small as possible. This process fits a linear regression to the relationship between
two continuous variables.

Figure 1: Visualisation of Data


1.1 Logistic Regression Analysis


Logistic regression is similar to linear regression except that the Y-axis has only the values 0
and 1, for example, representing obesity status: 'is obese' and 'is not obese'. In the scatter
plot, there is a row of points against the value 0, which represents 'is not obese', and another
row of points against 'is obese', coded as 1.

Figure 2: Logistic Regression Analysis

The logistic function gives the probability of an event of interest, say 𝑝. The probability
ranges from 0 to 1, and we look at the odds of the event. The odds of an event are defined
as the ratio of the probability of occurrence to the probability of non-occurrence of the
event:

Odds = Probability of occurrence / (1 − Probability of occurrence) = 𝑝 / (1 − 𝑝)

When a probability ranging from 0 to 1 is converted to odds, the odds are 0 when 𝑝 is 0, and
as 𝑝 approaches 1, the odds go to positive infinity. So, the odds can take any value between
0 and infinity.

The log of odds brings the outcome variable from the '0 to 1' scale to a continuous scale,
like in linear regression. The log of 0 is negative infinity, so the log of odds translates the
scale of the 0-to-1 observed variable to one running from negative infinity to positive infinity.
The 'log of odds' link function thus converts the dichotomous '0 to 1' observation into a
continuous scale. Once this continuous scale is achieved, the log odds play the role of
the 𝑌 in the linear regression model.
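
As a minimal sketch of these transformations in R (assuming only base R), using an
illustrative probability:

# Probability of the event of interest (illustrative value)
p <- 0.8

# Odds: probability of occurrence over probability of non-occurrence
odds <- p / (1 - p)        # 4

# Log of odds (the logit); ranges over the whole real line
log_odds <- log(odds)      # about 1.39
qlogis(p)                  # same value: qlogis() is the logit function

# The inverse link maps any real number back into (0, 1)
plogis(log_odds)           # 0.8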


When the regression line is drawn with the probability of obesity on the Y-axis, we get a
curved line that best fits the scatter plot. Here, 𝑌 ranges from 0 to 1.

Figure 3: Regression Line

Suppose the probability of obesity is converted to the log of odds; the axis then moves
from the 0-to-1 probability scale to a scale running from negative to positive infinity. It now
becomes similar to 𝑌 as seen in the linear regression model.

Figure 4: Regression line

Figure 5 shows what the logistic regression looks like. Note that logistic regression is a
non-linear model; the link function converts the non-linear model into a linear one.


Figure 5: Logistic Regression

Suppose, in the logit transformation, the logit of the proportion is placed on the Y-axis and
the predictor on the X-axis; the sigmoid curve in Figure 6 is then converted into a straight
line. This is the whole process by which the link function, also called the logit
transformation, converts a sigmoid curve into a linear one.

Figure 6: Logit Transformation


1.2 Logistic Regression Estimation


Consider Figure 7, where weight is plotted against obesity. The point in Figure 7 indicates
the probability of obesity for a lightweight individual (the left side of the axis corresponds
to smaller weights). The probability that such a light individual is obese is very small
because the weight is low.

Figure 7: Probability at a Point

If the red point is shifted towards the middle, to an individual of intermediate weight, the
probability is around 0.5.

Figure 8: Probability at Mid-Point

If the red dot moves to the extreme right, the probability that the individual measured is
obese is high, which is logical because the weight is high.


Figure 9: Probability at Higher Values

So, the curve in a logistic regression gives the probability of the outcome at each value on
the X-axis: the red circle on the line is simply the probability for the observation at that point
on the X-axis. Using the predictors, logistic regression predicts the probability of the
outcome of interest. It thus classifies a subject into one of the two groups by giving the
probability of belonging to each category, based on the predictor values.
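
The following sketch reads such probabilities off a sigmoid curve, using made-up
coefficient values b0 and b1 (illustrative assumptions, not estimates from any real dataset):

# Hypothetical logistic curve for obesity versus weight (illustrative numbers)
b0 <- -10                          # assumed intercept
b1 <- 0.15                         # assumed slope per unit of weight
weights <- c(40, 67, 100)          # light, intermediate, heavy
p_obese <- plogis(b0 + b1 * weights)
round(p_obese, 3)                  # roughly 0.018, 0.512, 0.993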

Figure 10: Probability at Multiple Points

Linear regression models estimate the regression coefficients using the least squares
method, which makes the distance of each point from the line as small as possible.
Logistic regression instead uses the "maximum likelihood method".

To calculate the likelihood in logistic regression, take the first observation. Here, 𝑦 equals 0
when an individual 'is not obese' and 1 when an individual 'is obese'. If the probability of
being obese is denoted as 𝑝, then the likelihood contribution of a non-obese observation is
1 − 𝑝. In the same way, one can obtain the likelihood of every other point. Since it is a binary
outcome variable, the likelihood of each point is its respective probability, and the
likelihood of all points put together is the product of these probabilities.

Figure 13: Other Likelihood Observation

Next, shift the candidate curve and recalculate the likelihood of all the points. Compute
such likelihoods for each sigmoid curve that can be drawn through the scatter of points.
The curve that gives the maximum likelihood yields the estimated regression coefficients.

Figure 14: Likelihood Observation


In linear regression models, 𝑌 can take any value from negative to positive infinity, but in
logistic regression models, 𝑌 takes values between 0 and 1. Mathematically, it can be written
as,

𝑙𝑜𝑔𝑖𝑡(𝑝𝑖) = 𝛽₀ + 𝛽₁𝑋₁

Where,
𝑙𝑜𝑔𝑖𝑡(𝑝𝑖): Logit transformation of the probability of the event = log of the odds
𝛽₀: Intercept of the regression line
𝛽₁: Slope of the regression line
𝑋₁: The predictor variable

The distribution of each observation 𝑦𝑖 is given by the Bernoulli mass function,


𝑓𝑖(𝑦𝑖) = 𝜋𝑖^𝑦𝑖 (1 − 𝜋𝑖)^(1−𝑦𝑖),  𝑖 = 1, 2, …, 𝑛

The likelihood function is denoted as,


𝐿(𝒚, 𝛽) = ∏_{𝑖=1}^{𝑛} 𝑓𝑖(𝑦𝑖) = ∏_{𝑖=1}^{𝑛} 𝜋𝑖^𝑦𝑖 (1 − 𝜋𝑖)^(1−𝑦𝑖)

The log-likelihood is,

ln 𝐿(𝒚, 𝛽) = ln ∏_{𝑖=1}^{𝑛} 𝑓𝑖(𝑦𝑖) = ∑_{𝑖=1}^{𝑛} 𝑦𝑖 ln(𝜋𝑖 / (1 − 𝜋𝑖)) + ∑_{𝑖=1}^{𝑛} ln(1 − 𝜋𝑖)
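
A minimal sketch of evaluating this log-likelihood at a candidate (𝛽₀, 𝛽₁) in R, with made-up
data; maximising it numerically gives the MLE discussed in the next section:

# Bernoulli log-likelihood of a simple logistic model at candidate coefficients
log_lik <- function(beta, x, y) {
  pi_i <- plogis(beta[1] + beta[2] * x)   # pi_i = P(y_i = 1)
  sum(y * log(pi_i) + (1 - y) * log(1 - pi_i))
}

x <- c(40, 55, 67, 80, 100)   # made-up predictor values
y <- c(0, 1, 0, 1, 1)         # made-up binary outcomes
log_lik(c(0, 0), x, y)        # log-likelihood at beta = (0, 0)

# optim() minimises, so negate; the result matches coef(glm(y ~ x, family = binomial))
optim(c(0, 0), function(b) -log_lik(b, x, y))$par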

1.3 Maximum Likelihood Estimation


The maximum likelihood estimators (MLEs) of the model parameters are those values that
maximise the likelihood (or log-likelihood) function. ML often gives estimators that are
intuitively pleasing. Some properties of MLEs are:
• Are unbiased for large samples
• Have nearly minimum variance
• Have an approximate normal distribution when 𝑛 is large.

If we have 𝑛𝑖 trials at each observation, we can write the log-likelihood as,


ln 𝐿(𝒚, 𝛽) = 𝛽′𝑿′𝒚 − ∑_{𝑖=1}^{𝑛} 𝑛𝑖 ln[1 + exp(𝒙𝑖′𝛽)]


The derivative of the log-likelihood is,


∂ ln 𝐿(𝒚, 𝜷) / ∂𝜷 = 𝑿′𝒚 − ∑_{𝑖=1}^{𝑛} [𝑛𝑖 / (1 + exp(𝒙𝑖′𝜷))] exp(𝒙𝑖′𝜷) 𝒙𝑖

= 𝑿′𝒚 − ∑_{𝑖=1}^{𝑛} 𝑛𝑖𝜋𝑖𝒙𝑖

= 𝑿′𝒚 − 𝑿′𝝁  (because 𝜇𝑖 = 𝑛𝑖𝜋𝑖)

Setting this last result to zero gives the maximum likelihood score equations,

𝑿′(𝒚 − 𝝁) = 𝟎

These score equations also arise in the linear regression model,

𝒚 = 𝑿𝜷 + 𝜺 = 𝝁 + 𝜺

where ordinary least squares (OLS), or ML with normal errors, likewise gives 𝑿′(𝒚 − 𝝁) = 𝟎.

Since 𝝁 = 𝑿𝜷,

𝑿′(𝒚 − 𝝁) = 𝑿′(𝒚 − 𝑿𝜷̂) = 𝟎, so 𝑿′𝑿𝜷̂ = 𝑿′𝒚, and

𝜷̂ = (𝑿′𝑿)⁻¹𝑿′𝒚

the OLS or normal-theory MLE.

Solving the ML score equations in logistic regression is not easy, as,


𝜇𝑖 = 𝑛𝑖 / (1 + exp(−𝒙𝑖′𝛽)),  𝑖 = 1, 2, …, 𝑛

Logistic regression is a non-linear model, so the solution is computed by iteratively
reweighted least squares, or IRLS. This is an iterative procedure: the parameter estimates
are updated over several steps from an initial "guess". Because the variance of the
observations is not constant, weights are used, and these weights are functions of the
unknown parameters.
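
Here is a bare-bones sketch of the IRLS iteration for Bernoulli observations (𝑛𝑖 = 1),
assuming well-behaved made-up data; production software adds step-halving and
convergence checks:

# Iteratively reweighted least squares for logistic regression (n_i = 1)
irls_logit <- function(X, y, iter = 25) {
  beta <- rep(0, ncol(X))                   # initial "guess"
  for (k in 1:iter) {
    eta <- X %*% beta                       # linear predictor
    mu  <- plogis(eta)                      # current fitted probabilities
    w   <- as.vector(mu * (1 - mu))         # weights depend on the unknown parameters
    z   <- eta + (y - mu) / w               # working response
    beta <- solve(t(X) %*% (w * X), t(X) %*% (w * z))   # weighted least squares step
  }
  beta
}

x <- c(40, 55, 67, 80, 100); y <- c(0, 1, 0, 1, 1)
X <- cbind(1, x)
irls_logit(X, y)     # agrees with coef(glm(y ~ x, family = binomial))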


1.4 Interpretation of Parameters


The log odds at 𝑥 is,

𝜂̂(𝑥) = ln[𝜋̂(𝑥) / (1 − 𝜋̂(𝑥))] = 𝛽̂₀ + 𝛽̂₁𝑥

The log odds at 𝑥 + 1 is,


𝜂̂(𝑥 + 1) = ln[𝜋̂(𝑥 + 1) / (1 − 𝜋̂(𝑥 + 1))] = 𝛽̂₀ + 𝛽̂₁(𝑥 + 1)

The difference in the log odds is,


𝜂̂ (𝑥 + 1) − 𝜂̂ (𝑥) = 𝛽̂1

The odds ratio is found by taking antilogs,


𝑂̂𝑅 = 𝑂𝑑𝑑𝑠_{𝑥+1} / 𝑂𝑑𝑑𝑠_𝑥 = 𝑒^𝛽̂₁

The odds ratio is interpreted as the estimated change in the odds of "success" associated
with a one-unit increase in the value of the predictor variable.
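
For instance, as a quick numerical sketch with an assumed slope estimate:

b1_hat <- 0.4        # assumed estimated slope (illustrative, not from any dataset)
exp(b1_hat)          # about 1.49: the odds multiply by ~1.49 per one-unit increase in x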

1.5 Categorical Independent Variable


Here the predictor variable is categorical, and the dependent variable is the binary outcome
coded as 0 and 1 (the predictor can also be multinomial). A reference category must be
chosen, because comparisons are made across categories to look at the change in the
probability of occurrence of the event. Generally, the reference is the highest- or
lowest-coded category (depending on the software used). This uses the concept of dummy
variables: an indicator variable is created for each non-reference category.


1.6 Reference Cell Coding


Design Variables
Here is an example of identifying the dummy variables and the reference cell coding when
a categorical independent or predictor variable must be included.

Table 1: Design variables

The design variable income has three categories, low, medium, and high, numerically coded
as 1, 2, and 3. To represent it, two dummy (indicator) variables are required: the number of
dummy variables equals one less than the number of categories. Here, the income variable
has three categories, so two dummy variables are needed. Table 1 shows that for the
low-income category, the first dummy variable is 1 and the second is 0, signifying
membership of the low category.

Similarly, for the medium category, the first dummy variable is 0 and the second is 1. For the
high category, both dummy variables are 0, making it the reference category, which is
indicated by 0 across all the dummy variables. When this variable is entered into a
regression model, the software gives the coefficients for low and medium compared to high.
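
A small sketch of reference cell coding in R, using a hypothetical income factor;
model.matrix() displays the dummy variables that R would create:

# Hypothetical income variable with three categories
income <- factor(c("low", "medium", "high", "low", "high"),
                 levels = c("low", "medium", "high"))

# By default R takes the first level as the reference; relevel() makes
# "high" the reference, matching Table 1
income <- relevel(income, ref = "high")

# Three categories produce two dummy variables (plus the intercept column)
model.matrix(~ income)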


1.7 Challenger Dataset


Here is an example of fitting a simple logistic regression and interpreting the output
generated from the analysis.

Figure 15: Dataset of Rockets

This is the Challenger dataset on solid rocket launches. Figure 15 lists the variables: the
number of O-ring failures at the field joints, the number of failures at the nozzles, and
indicator variables for whether a failure occurred, where 0 means no and 1 means yes. The
recorded values also include the temperature at launch and the pressures at the field joint
and the nozzle. A failure occurred when an O-ring in the field joints or the nozzles of the
solid rocket was damaged; so, 1 codes the incidence of a failure and 0 its absence, and the
objective is to predict what determines the failure of the rocket. The focus is mostly on the
O-rings of the field joint as the main determinant of the accident. The temperature on the
day of launch is measured in degrees Fahrenheit, and the leak-check pressure tests of the
O-rings ensure that the rings would seal the joints of the field and the nozzle.

Running the analysis in R, the output reports 24 records, of which 7 were failures and 17
were successes; the predictor considered is temperature. From the standard errors of the
coefficients we get the Z test, which is the ratio of each coefficient to its standard error,
along with the corresponding P-values. The odds-ratio column shows the effect of a unit
increase in X: the difference in the log odds is 𝛽̂₁, and exponentiating it gives 𝑒^𝛽̂₁, the
odds ratio. Written out, the fitted regression model is,

𝑦̂ = exp(10.875 − 0.17132𝑥) / (1 + exp(10.875 − 0.17132𝑥))

Figure 16: Challenger Dataset

The regression coefficient 𝛽̂ for temperature is −0.17132, and the odds ratio is,

𝑂̂𝑅 = 𝑒^(−0.17132) = 0.84

This implies that every decrease of one degree in temperature increases the odds of O-ring
failure by a factor of about 1/0.84 = 1.19, or 19 per cent.

The odds ratio corresponding to a 22-degree increase in temperature is,

𝑂̂𝑅 = 𝑒^(22 × (−0.17132)) = 0.0231

The temperature at the Challenger launch was 22 degrees below the lowest observed launch
temperature, which increases the odds of failure by a factor of 1/0.0231 = 43.34, or about
4200 per cent! This extrapolation implied a serious risk for the launch.
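
As a sketch, the fit above could be reproduced in R along these lines, assuming a data
frame named challenger with columns fail (0/1) and temp (these names are placeholders,
not part of the original output):

# Simple logistic regression of O-ring failure on launch temperature
fit <- glm(fail ~ temp, data = challenger, family = binomial)
summary(fit)                   # coefficients, standard errors, z tests, p-values

exp(coef(fit)["temp"])         # odds ratio per one-degree increase, ~0.84
1 / exp(coef(fit)["temp"])     # odds multiplier per one-degree decrease, ~1.19
exp(22 * coef(fit)["temp"])    # odds ratio for a 22-degree increase, ~0.0231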

To generate the 95% CI for 𝛽, the general rule for a confidence interval is,

Statistic ± Confidence coefficient × SE(Statistic)

It is assumed that the statistic has a normal distribution.


95% CI for 𝛽: 𝛽̂ ± 𝑍_{1−𝛼/2} × SE(𝛽̂)

95% CI for the odds ratio exp(𝛽): exp{𝛽̂ ± 𝑍_{1−𝛼/2} × SE(𝛽̂)}

In the Challenger data, 𝛽̂ = −0.17132 and SE(𝛽̂) = 0.08344, so the

95% CI for exp(𝛽) is: exp{−0.17132 ± 1.96 × 0.08344} = (0.72, 0.99)

For an odds ratio, the null value is 1. If the confidence interval contains 1, the null
hypothesis cannot be rejected; if the interval lies entirely below or above 1, the odds of
failure significantly decrease or increase, respectively. Here the interval (0.72, 0.99)
excludes 1, so temperature is a significant predictor of failure.
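
A short sketch of the Wald computation, using the estimates quoted above:

# Wald 95% CI for the Challenger slope and for the corresponding odds ratio
beta <- -0.17132
se   <- 0.08344
ci_beta <- beta + c(-1, 1) * qnorm(0.975) * se
ci_beta          # about (-0.335, -0.008)
exp(ci_beta)     # about (0.72, 0.99), matching the interval above

# For a fitted glm object, confint.default(fit) gives the same Wald intervals
# (confint(fit) gives profile-likelihood intervals instead)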

2. Multiple Logistic Regression


Multiple logistic regression is a statistical method used to predict a single binary variable
using two or more other variables. It is used when the variable you want to predict (the
dependent variable) is binary and there are multiple independent variables.

For example, whether a purchase is made can be answered with yes or no; this can be the
dependent variable, with consumer income and age as independent variables.

In another example, the dependent variable is coded 0 for no depression and 1 for
depression, and the independent variables are: smoking (smoker = 1, non-smoker = 0), age
(continuous), and gender (female = 0, male = 1).

The equation for it is,


𝑙𝑜𝑔𝑖𝑡(𝑝𝑖) = 𝑎 + 𝑏₁𝑥₁ + 𝑏₂𝑥₂ + … + 𝑏𝑘𝑥𝑘

ln[𝑝(depression) / (1 − 𝑝(depression))] = 𝑎 + 𝑏₁(smoking) + 𝑏₂(age) + 𝑏₃(gender)
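
A sketch of fitting this model in R, assuming a hypothetical data frame df containing the
coded variables:

# df is assumed to have depression (0/1), smoking (0/1), age, gender (0/1)
fit <- glm(depression ~ smoking + age + gender,
           data = df, family = binomial)
summary(fit)             # coefficients and standard errors
exp(coef(fit))           # odds ratios for each predictor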


2.1 Estimation
Table 2: Estimation of Output

Table 2 shows the output for a particular study in which a multiple logistic regression was
run. As in linear regression, the output gives the 𝛽 coefficients and their standard errors.
The exponent of each coefficient is the odds ratio, and the 95% confidence interval for the
odds ratio is also given in Table 2. The odds ratio for gender is 1.132; this is called an
adjusted odds ratio because other predictor variables, or other independent variables, are
present in the model. It says that males have about 13% higher odds of being depressed
than females; female, coded as 0, is the reference category. The confidence interval for this
odds ratio includes 1, so gender is not considered a significant predictor of depression.
With 95% confidence, the odds of being depressed for males relative to females range from
0.823 to 1.556 when adjusted for all the other variables in the model.

Similarly, from the table, for a one-unit increase in age the odds of being depressed remain
essentially unchanged; hence age also does not help predict the odds of depression. For
smokers, the odds ratio is 2.38, so a smoker is estimated to have 2.38 times the odds of
being depressed compared with a non-smoker, adjusted for age and gender. The odds ratio
generated from a simple logistic regression is referred to as the crude odds ratio, and the
odds ratios generated for the predictors from a multiple logistic regression are referred to
as adjusted odds ratios.
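
As a sketch on the same hypothetical data frame, the two kinds of odds ratio come from
two different fits:

# Crude odds ratio: simple logistic regression with the single predictor
crude <- exp(coef(glm(depression ~ smoking, data = df, family = binomial)))

# Adjusted odds ratio: the same predictor alongside the other covariates
adj_fit <- glm(depression ~ smoking + age + gender, data = df, family = binomial)
adjusted <- exp(coef(adj_fit))

crude["smoking"]; adjusted["smoking"]       # crude vs adjusted OR for smoking
exp(confint.default(adj_fit))               # Wald 95% CIs for the adjusted ORs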


3. Summary
In this topic, we discussed:
• Need for the use of the logistic regression model for dichotomous outcome variables
• Maximum likelihood estimation of the parameters of the logistic regression
• Interpretation of the regression coefficient, odds ratio and confidence interval
