What Is Logistic Regression?
This type of statistical model (also known as a logit model) is often used for classification and predictive analytics. Logistic regression estimates the probability of an event occurring, such as voted or didn't vote, based on a given dataset of independent variables. Since the outcome is a probability, the dependent variable is bounded between 0 and 1. In logistic regression, a logit transformation is applied to the odds, that is, the probability of success divided by the probability of failure. This is also commonly known as the log odds, or the natural logarithm of odds, and the logit and its inverse, the logistic function, are represented by the following formulas:
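$$\operatorname{logit}(p_i) = \ln\!\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

$$p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}$$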
In this logistic regression equation, logit(p_i) is the dependent or response variable and the x's are the independent variables. The beta parameters, or coefficients, in this model are commonly estimated via maximum likelihood estimation (MLE).
This method tests different values of beta through multiple iterations, searching for the coefficients that best fit the observed log odds. Each iteration evaluates the log likelihood function at the current candidate values, and logistic regression seeks to maximize this function to find the best parameter estimates.
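To make the estimation step concrete, the sketch below fits a logistic regression by maximizing the log likelihood directly, using NumPy and SciPy's general-purpose optimizer on synthetic data; the variable names, the simulated dataset, and the choice of the BFGS solver are illustrative rather than prescriptive.

```python
# A minimal sketch of maximum likelihood estimation for logistic regression.
# The data and solver choice are illustrative; real software uses specialized routines.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))              # two independent variables
true_beta = np.array([0.5, 1.0, -2.0])     # intercept plus two coefficients
p = 1.0 / (1.0 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
y = rng.binomial(1, p)                     # binary outcome (0/1)

def neg_log_likelihood(beta, X, y):
    """Negative log likelihood; minimizing it is equivalent to maximizing the likelihood."""
    z = beta[0] + X @ beta[1:]
    log_p = -np.log1p(np.exp(-z))          # log of the predicted probability of 1
    log_1mp = -np.log1p(np.exp(z))         # log of the predicted probability of 0
    return -np.sum(y * log_p + (1 - y) * log_1mp)

result = minimize(neg_log_likelihood, x0=np.zeros(3), args=(X, y), method="BFGS")
print("estimated coefficients:", result.x)  # should land near true_beta
```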
Once the optimal coefficient (or coefficients, if there is more than one independent variable) is found, the conditional probability of the outcome can be calculated for each observation, yielding a predicted probability; logging and summing these conditional probabilities is what produces the log likelihood that the fit maximizes.
For binary classification, a probability less than 0.5 predicts class 0, while a probability greater than 0.5 predicts class 1.
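As a small illustration of that cutoff, the sketch below fits scikit-learn's LogisticRegression on a toy dataset and converts its predicted probabilities into 0/1 predictions at the 0.5 threshold; the data and the choice of library are assumptions made for the example.

```python
# Converting predicted probabilities into 0/1 class predictions at the 0.5 cutoff.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.2], [0.8], [1.5], [2.3], [3.1], [3.9]])  # one independent variable
y = np.array([0, 0, 0, 1, 1, 1])                          # observed binary outcomes

model = LogisticRegression().fit(X, y)
probabilities = model.predict_proba(X)[:, 1]    # estimated probability of class 1
predictions = (probabilities > 0.5).astype(int) # 1 if the probability exceeds 0.5, else 0
print(probabilities.round(3), predictions)
```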
After the model has been computed, it's best practice to evaluate how well the model predicts the dependent variable, which is called goodness of fit. The Hosmer–Lemeshow test is a popular method to assess model fit.
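For readers who want to see what that test computes, here is a rough sketch of the Hosmer–Lemeshow statistic; it assumes the observed 0/1 outcomes and the model's predicted probabilities are already available as NumPy arrays, and the function name and the ten-group convention are illustrative choices.

```python
# A rough sketch of the Hosmer–Lemeshow goodness-of-fit statistic.
# Assumes y_true holds observed 0/1 outcomes and y_prob holds predicted probabilities.
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_true, y_prob, groups=10):
    # Sort observations by predicted probability and split them into roughly equal groups.
    order = np.argsort(y_prob)
    y_true, y_prob = y_true[order], y_prob[order]
    statistic = 0.0
    for idx in np.array_split(np.arange(len(y_prob)), groups):
        n = len(idx)
        observed_events = y_true[idx].sum()      # observed 1s in the group
        expected_events = y_prob[idx].sum()      # expected 1s: sum of predicted probabilities
        observed_nonevents = n - observed_events
        expected_nonevents = n - expected_events
        statistic += (observed_events - expected_events) ** 2 / expected_events
        statistic += (observed_nonevents - expected_nonevents) ** 2 / expected_nonevents
    # The statistic is compared against a chi-square with (groups - 2) degrees of freedom.
    p_value = chi2.sf(statistic, df=groups - 2)
    return statistic, p_value
```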