RM - Binary Logistic Regression Model - Estimation

BINARY LOGISTIC REGRESSION MODEL – ESTIMATION
Table of Contents
2.1 Estimation
3. Summary
©COPYRIGHT 2023 (VER. 1.0), ALL RIGHTS RESERVED. MANIPAL ACADEMY OF HIGHER EDUCATION 2/20
Introduction
Linear regression can be used to predict a value that is continuous in nature. However, if the values to be predicted are dichotomous, that is, they take only two values, linear regression is unsuitable in its original form; the values must be transformed before the concept can be applied. Binary logistic regression is a method for predicting dichotomous variables.
Learning Objectives
• Explain the use of the logistic regression model for dichotomous outcome variables
• Describe the maximum likelihood estimation of the parameters of the logistic
regression
• Interpret the regression coefficient, odds ratio, and confidence interval
In the visualisation of data with continuous values, the dependent variable is plotted on the Y-axis and the independent variable on the X-axis, producing a certain scatter of points, as depicted in Figure 1. To fit the regression line, the least squares method is used: a line is placed through the scatter so that the points are balanced on either side of it and the line passes as close as possible to all of them. This process fits a linear regression to the relationship between two continuous variables.
The logistic function gives the probability of an event of interest, say 𝑝. The probability ranges from 0 to 1, and from it we look at the odds of the event. The odds of an event are defined as the ratio of the probability of occurrence to the probability of non-occurrence of the event:

Odds = Probability of occurrence / (1 − Probability of occurrence) = 𝑝 / (1 − 𝑝)
When the probability, ranging from 0 to 1, is converted to odds, the odds are 0 when 𝑝 is 0 and go to positive infinity as 𝑝 approaches 1. So the odds can take any value between 0 and infinity.
The log of odds brings the outcome from the 0-to-1 probability scale to a continuous scale, like in linear regression. The log of 0 is negative infinity, so the log of odds translates the 0-to-1 scale of the observed variable to the range from negative infinity to positive infinity. This 'log of odds' link function converts the dichotomous observation into a continuous scale; once that is achieved, the log odds play the same role as 𝑌 in the linear regression model.
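The probability-to-odds-to-log-odds chain described above can be sketched in a few lines. This is an illustrative sketch; the function names are our own, and any 𝑝 strictly between 0 and 1 works:

```python
import math

# Probability -> odds -> log-odds (the logit link); illustrative only.
def odds(p):
    return p / (1.0 - p)

def logit(p):
    return math.log(odds(p))

# p = 0.5 gives odds of 1 and log-odds of 0; as p -> 1 the odds grow
# without bound, and as p -> 0 the log-odds head to negative infinity.
```

For example, odds(0.5) is 1.0 and logit(0.5) is 0, while logit(0.8) and logit(0.2) are equal in size but opposite in sign, reflecting the symmetry of the logit about 𝑝 = 0.5.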
When the regression line is drawn with the probability of obesity on the Y-axis, we get a curve: the curved line best fits the scatterplot, and 𝑌 ranges from 0 to 1. If the probability of obesity is converted to the log of odds, the axis moves from the 0-to-1 probability scale to the range from negative to positive infinity, and it becomes similar to 𝑌 as seen in the linear regression model.
Figure 5 shows what the logistic regression looks like. Note that logistic regression is not a linear model but a non-linear one; the link function converts the non-linear model into a linear one.
If, after the logit transformation, the logit of the proportion is placed on the Y-axis and the predictor on the X-axis, the sigmoid curve in Figure 6 becomes a straight line. This is the whole process by which the link function, also called the logit transformation, converts a sigmoid curve into a linear one.
If the red point is shifted to an intermediate position and an individual of intermediate size is weighed, the probability is around 0.5. If the red dot moves to the extreme right, the probability that the new individual is obese is higher, which is logical because that individual's weight is higher.
So, the curve in a logistic regression gives the probability at each value on the X-axis: the red circle on the line is simply the probability for the observation at that point on the X-axis. Using the predictors, logistic regression predicts the probability of the outcome of interest; it classifies the subject into one of two groups by giving the probability of the subject belonging to either of the two categories.
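The classification idea just described can be sketched as follows. The coefficients here are made-up illustrative values for a weight predictor, not estimates from any real obesity data:

```python
import math

# The inverse of the logit: maps a linear predictor back to a probability.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, made-up coefficients -- purely illustrative.
b0, b1 = -10.0, 0.12

def prob_obese(weight_kg):
    return sigmoid(b0 + b1 * weight_kg)

def classify(weight_kg, threshold=0.5):
    # Assign the subject to one of the two groups by its predicted probability.
    return 1 if prob_obese(weight_kg) >= threshold else 0
```

With these toy coefficients, a heavy individual (say 100 kg) falls in the high-probability group and a light one (say 60 kg) in the low-probability group, mirroring the red-dot discussion above.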
Linear regression models estimate the regression coefficients using the least squares method, which makes the distance of each point from the line as small as possible. Logistic regression instead uses the "maximum likelihood method".
To calculate the likelihood in logistic regression, take the first observation. Here, 𝑦 equals 0 when an individual is not obese and 1 when an individual is obese. If the probability of being obese is denoted as 𝑝, then the probability for a non-obese point is 1 − 𝑝. In the same way, one can get the likelihood of all the other points. Since the outcome variable is binary, the likelihood of each point is its respective probability, and the likelihood of all points put together is the product of these probabilities.
Next, shift the line and recalculate the likelihood of all the points. Compute such likelihoods for every sigmoid curve that can be drawn through the scatter of points, and identify the curve that gives the maximum likelihood; its coefficients are the estimated regression coefficients.
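A minimal sketch of this likelihood comparison, on a toy data set with made-up values. Real software searches the coefficient space far more thoroughly than the three candidate curves compared here:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: predictor x (say, weight) and outcome y (1 = obese, 0 = not).
x = [55, 60, 70, 80, 90, 95]
y = [0, 0, 0, 1, 1, 1]

def likelihood(b0, b1):
    # Product over all points: p_i where y_i = 1, (1 - p_i) where y_i = 0.
    L = 1.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        L *= p if yi == 1 else (1.0 - p)
    return L

# Maximum likelihood keeps the candidate curve with the largest likelihood.
candidates = [(0.0, 0.0), (-7.5, 0.1), (-15.0, 0.2)]
best = max(candidates, key=lambda b: likelihood(*b))
```

On this toy data the steepest candidate wins, because its sigmoid separates the 0s and 1s most cleanly and therefore assigns the highest probability to what was actually observed.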
In linear regression models, 𝑌 can take any value from negative to positive infinity, but in logistic regression models, 𝑌 takes values between 0 and 1. Mathematically it can be written as,

logit(𝑝ᵢ) = 𝛽₀ + 𝛽₁𝑋₁

Where,
logit(𝑝ᵢ): Logit transformation of the probability of the event = log of the odds
𝛽₀: Intercept of the regression line
𝛽₁: Slope of the regression line
𝑋₁: The predictor variable
The derivative of the log-likelihood with respect to 𝛽 works out to

𝑋′𝑦 − 𝑋′𝜇   (because 𝜇ᵢ = 𝑛ᵢ𝜋ᵢ)

Setting this last result to 0 gives the maximum likelihood score equations,

𝑋′(𝑦 − 𝜇) = 0

If 𝜇 = 𝑋𝛽, as in the linear model, the solution is

𝛽̂ = (𝑋′𝑋)⁻¹𝑋′𝑦
Logistic regression, however, is a non-linear model, so the solution is calculated by iteratively reweighted least squares, or IRLS. This is an iterative procedure: the parameter estimates are updated over several steps from an initial "guess". Since the variance of the observations is not constant, weights are used, and the weights are functions of the unknown parameters.
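A bare-bones sketch of the IRLS procedure just described, using NumPy and toy data. The names are our own, and a real implementation would add a convergence check and safeguards against extreme weights:

```python
import numpy as np

# Iteratively reweighted least squares for logistic regression
# (binary outcome, one observation per row); X includes an intercept column.
def irls_logistic(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])            # initial "guess"
    for _ in range(n_iter):
        eta = X @ beta                     # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))    # fitted probabilities
        w = mu * (1.0 - mu)                # weights: the non-constant variances
        z = eta + (y - mu) / w             # working response
        # Weighted least-squares update: solve (X'WX) beta = X'W z
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

# Toy, made-up data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
beta_hat = irls_logistic(X, y)
probs = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
```

Each pass refits a weighted least-squares problem with weights recomputed from the current estimates, which is exactly why the weights are "functions of the unknown parameters".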
The regression coefficient is interpreted as the estimated increase in the log odds of "success" associated with a one-unit increase in the value of the predictor variable.
The design variable income has three categories, low, medium, and high, numerically coded as 1, 2, and 3. To represent it in the model, two dummy (indicator) variables are required: the number of dummy variables is one less than the number of categories in the variable. Here the income variable has three categories, so two dummy variables are needed. Table 1 shows that for the low-income category the first dummy variable is 1 and the second is 0, which signifies membership in the low category.
Similarly, for the medium category the first dummy variable is 0 and the second is 1. For the high category, both dummy variables are 0; it becomes the reference category, indicated by 0 across all the dummy variables. When this variable is entered into the regression model in software, it gives coefficients for low and medium compared to high.
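The coding scheme in Table 1 can be written out directly; the dictionary and variable names below are our own:

```python
# Dummy (indicator) coding for the three-category income variable,
# with "high" as the reference category (0 across all dummies).
INCOME_DUMMIES = {
    "low":    (1, 0),   # first dummy 1, second 0
    "medium": (0, 1),   # first dummy 0, second 1
    "high":   (0, 0),   # reference category
}

# Three categories -> 3 - 1 = 2 dummy variables per observation.
incomes = ["low", "high", "medium", "medium", "low"]
design_columns = [INCOME_DUMMIES[c] for c in incomes]
```

Fitting software then reports one coefficient per dummy, each comparing that category to the reference ("high").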
This is the Challenger data set on rockets launched into space. Among the variables shown in Figure 15 are the number of O-ring failures at the field joints, the number of failures at the nozzles, and an indicator of whether a field-joint failure occurred, where 0 means no and 1 means yes, along with the launch temperature, the leak-check pressure at the field joints, and the leak-check pressure at the nozzles. A failure occurred when an O-ring in the field joints or the nozzles of the solid rocket burned through. So, 1 codes the occurrence of a failure and 0 its absence, and the objective is to predict what determines the failure of the rocket. The focus is mostly on the O-rings of the field joints as the main determinant of the accidents. The temperature on the day of launch is measured in degrees Fahrenheit, and the leak-check pressure tests of the O-rings ensure that the rings would seal the joints of the field and the nozzle.
Running the model in R, the output reports 24 launches, of which 7 had failures and 17 were successful, with temperature as the predictor. From the standard errors of the coefficients we get the Z test, which is the ratio of each coefficient to its standard error, together with the corresponding P-values. For a unit increase in X, the difference in the log odds is 𝛽̂₁, and exponentiating it gives the odds ratio. Written out, the fitted regression model is

𝑦̂ = exp(10.875 − 0.17132𝑥) / (1 + exp(10.875 − 0.17132𝑥))
The regression coefficient 𝛽̂₁ for temperature is −0.17132, and the odds ratio is

ÔR = e^(−0.17132) = 0.84

This implies that every one-degree decrease in temperature increases the odds of O-ring failure by about 1/0.84 = 1.19, or 19 percent. The temperature at the Challenger launch was 22 degrees below the lowest observed launch temperature, which increases the odds of failure by 1/0.0231 = 43.34 (since 0.84²² ≈ 0.0231), or about 4,200 percent!
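The fitted model and the odds calculations above can be reproduced numerically. The coefficients are those quoted in the text; the launch temperature of 31 and lowest observed temperature of 53 degrees Fahrenheit are the commonly reported values for the Challenger data, not figures stated here:

```python
import math

# Coefficients of the fitted Challenger model quoted in the text
b0, b1 = 10.875, -0.17132

def p_failure(temp):
    """Predicted probability of O-ring failure at a given launch temperature."""
    z = b0 + b1 * temp
    return math.exp(z) / (1.0 + math.exp(z))

odds_ratio = math.exp(b1)           # per one-degree increase: about 0.84
per_degree_drop = 1.0 / odds_ratio  # per one-degree decrease: about 1.19
factor_22deg = math.exp(-b1 * 22)   # 22-degree drop: odds multiply by ~43.3
```

At the commonly reported launch temperature of 31 degrees, the predicted failure probability from this model is very close to 1, which is what makes the 22-degree extrapolation so alarming.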
To generate the 95% CI for 𝛽, the general rule for a confidence interval is,

Statistic ± Confidence coefficient × SE(Statistic)
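Applying the general rule to 𝛽 can be sketched as follows. The standard error used here is a hypothetical value for illustration only; the actual SE from the Challenger fit is not quoted in the text:

```python
import math

# Statistic ± Confidence coefficient × SE(Statistic), applied to beta.
beta_hat = -0.17132
se_beta = 0.08        # HYPOTHETICAL standard error, for illustration only
z_crit = 1.96         # 95% confidence coefficient

lo = beta_hat - z_crit * se_beta
hi = beta_hat + z_crit * se_beta

# The CI for the odds ratio comes from exponentiating the endpoints.
or_lo, or_hi = math.exp(lo), math.exp(hi)
```

Exponentiating the endpoints works because the logit scale is monotone: the interval for 𝛽 maps directly onto an interval for the odds ratio.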
For the 𝐶𝐼 of a ratio, the key question is whether 1 lies in the interval. If the interval contains 1, the hypothesis cannot be rejected; if the interval lies entirely above or entirely below 1, the chance of failure increases or decreases, respectively.
For example, whether a purchase was made can be answered yes or no; this can be the dependent variable, with consumer income and age as independent variables. In another example, the dependent variable is coded 0 for no depression and 1 for depression, and the independent variables are smoking (smoker yes = 1, no = 0), age (continuous), and gender (female = 0, male = 1).
2.1 Estimation
Table 2: Estimation of Output
Table 2 shows the output for a study in which a multiple logistic regression was run. As in linear regression, the output includes the 𝛽 coefficients and the standard error of each 𝛽. The exponent of a coefficient is the odds ratio, and the 95% confidence interval for the odds ratio is also given in Table 2. The odds ratio for gender is 1.132; this is called the adjusted odds ratio because other predictor variables, or independent variables, are present in the model. It suggests that males are 13% more likely to be depressed than females. Male is coded 1 and female 0, so female is the reference category. The confidence interval contains 1, so gender is not considered a significant predictor of depression. With 95% confidence, the odds of being depressed for males relative to females range from 0.823 to 1.556 when adjusted for all the other variables in the model.
Similarly, from the table, for a one-unit increase in age the odds of being depressed are essentially unchanged (the odds ratio is close to 1), so age does not help predict depression either. For smokers, the odds ratio is 2.38: a smoker is 2.38 times more likely to be depressed than a non-smoker when adjusted for age and gender. The odds ratio generated from a simple logistic regression is referred to as the crude odds ratio, and the odds ratios generated for the predictors from a multiple logistic regression are referred to as adjusted odds ratios.
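The "does the 95% CI exclude 1?" check used in this interpretation can be written as a small helper; the values are those quoted from Table 2 in the text:

```python
# Significance check for an odds ratio via its 95% confidence interval:
# the ratio is statistically significant only when 1 lies outside the interval.
def ci_excludes_one(lower, upper):
    return upper < 1 or lower > 1

# Gender (Table 2): adjusted OR = 1.132, 95% CI (0.823, 1.556).
# The interval contains 1, so gender is not a significant predictor.
gender_significant = ci_excludes_one(0.823, 1.556)
```

The same check would flag an interval such as (1.2, 2.0) as significant, since it lies entirely above 1.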
3. Summary
In this topic, we discussed:
• Need for the use of the logistic regression model for dichotomous outcome variables
• Maximum likelihood estimation of the parameters of the logistic regression
• Interpretation of the regression coefficient, odds ratio and confidence interval