In statistics, the logistic model (or logit model) models the probability of an
event taking place by having the log-odds for the event be a linear combination
of one or more independent variables. In regression analysis, logistic
regression[1] (or logit regression) estimates the parameters of a logistic model
(the coefficients in the linear combination). Formally, in
binary logistic regression there is a single binary dependent variable, coded by an
indicator variable, where the two values are labeled "0" and "1", while the
independent variables can each be a binary variable (two classes, coded by an
indicator variable) or a continuous variable (any real value). The corresponding
probability of the value labeled "1" can vary between 0 (certainly the value "0")
and 1 (certainly the value "1"), hence the labeling;[2] the function that converts
log-odds to probability is the logistic function, hence the name. The unit of
measurement for the log-odds scale is called a logit, from logistic unit, hence the
alternative names. See § Background and § Definition for formal mathematics, and §
Example for a worked example.
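The conversion between log-odds and probability described above can be sketched in Python; the coefficient and input values below are hypothetical, chosen purely for illustration:

```python
import math

def logistic(log_odds: float) -> float:
    """Convert log-odds (a logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def logit(p: float) -> float:
    """Convert a probability in (0, 1) back to log-odds; inverse of logistic()."""
    return math.log(p / (1.0 - p))

# A linear combination of the independent variables gives the log-odds,
# and the logistic function maps it to the probability of the "1" label.
# beta_0, beta_1, and x are hypothetical illustration values.
beta_0, beta_1 = -1.5, 0.8
x = 2.0
p = logistic(beta_0 + beta_1 * x)  # probability strictly between 0 and 1
```

Note that `logistic(0.0)` is exactly 0.5: log-odds of zero means the two outcomes are equally likely.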
Analogous linear models for binary variables with a different sigmoid function
instead of the logistic function (to convert the linear combination to a
probability) can also be used, most notably the probit model; see § Alternatives.
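The substitution of a different sigmoid can be sketched by comparing the two link functions side by side; the probit model uses the standard normal CDF in place of the logistic function (function names here are illustrative):

```python
import math

def logistic_cdf(z: float) -> float:
    """Logistic sigmoid: the inverse link used by logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z: float) -> float:
    """Standard normal CDF, computed via the error function:
    the inverse link used by the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Both map any real-valued linear combination to a probability in (0, 1)
# and are symmetric about 0.5 at z = 0; they differ in tail behavior.
```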
The defining characteristic of the logistic model is that increasing one of the
independent variables multiplicatively scales the odds of the given outcome at a
constant rate, with each independent variable having its own parameter; for a
binary dependent variable this generalizes the odds ratio. More abstractly, the
log-odds is the natural parameter of the Bernoulli distribution, and the
logistic function, its inverse, is in this sense the "simplest" way to convert a
real number to a probability. In particular, it maximizes entropy (adds the
least information), and in this sense makes the fewest assumptions about the
data being modeled; see § Maximum entropy.
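The constant multiplicative effect on the odds can be checked numerically: increasing an independent variable by one unit multiplies the odds by exp(β) for that variable's coefficient, no matter where the increase starts. The coefficients below are hypothetical illustration values:

```python
import math

def odds(p: float) -> float:
    """Odds of the outcome labeled '1' given its probability."""
    return p / (1.0 - p)

beta_0, beta_1 = -1.0, 0.7  # hypothetical intercept and coefficient

def p_of(x: float) -> float:
    """Logistic-model probability for a single independent variable x."""
    return 1.0 / (1.0 + math.exp(-(beta_0 + beta_1 * x)))

# Increasing x by 1 multiplies the odds by exp(beta_1), regardless of
# the starting value of x: both ratios equal exp(0.7) up to rounding.
ratio_a = odds(p_of(2.0)) / odds(p_of(1.0))
ratio_b = odds(p_of(5.0)) / odds(p_of(4.0))
```

This constant odds ratio, exp(β₁) here, is what the article means by the odds scaling "at a constant rate" with each unit increase in an independent variable.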