90784-Origin of Logit
Abstract The logistic regression model, as compared to the probit, Tobit, and
complementary log–log models, is worth revisiting based upon the work of Cramer
(2002; http://ssrn.com/abstract=360300 or http://dx.doi.org/10.2139/ssrn.360300) and
Cramer (2003; Logit models from economics and other fields, Cambridge University
Press, Cambridge, England, pp. 149–158). The ability to model the odds has
made the logistic regression model a popular method of statistical analysis. The
logistic regression model can be used for prospective, retrospective, or
cross-sectional data, while the probit, Tobit, and complementary log–log models can
only be used with prospective data because they model the probability of the event.
This chapter provides a summary of that work (Cramer, 2002, 2003).
More than 175 years after the advent of the growth curve, we have fully embraced
the logistic regression model as a viable tool for binary data. Today, the logistic
regression model is one of the most widely used binary models in the analysis of
categorical data. The logistic regression model is based on modeling the odds of an
outcome, and the idea of odds (as used commonly by the average person) has broad
appeal. Many people are familiar with the odds of certain outcomes, whether in
sports, illness, or almost anything else. Additionally, it is quite interesting
from a statistical point of view that, whether the data were obtained from
prospective, retrospective, or cross-sectional sampling, the covariate's impact on
the odds of the binary outcome will be the same.
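The invariance of the odds ratio to the sampling design can be checked with a
small numerical sketch. The counts below are hypothetical and chosen only for
illustration; the case–control sample is formed by keeping every case and
exactly one tenth of the non-cases, which idealizes what random sampling of
controls would give on average. Under these assumptions the odds ratio is
unchanged, while a probability-based measure such as the risk ratio is not.

```python
from fractions import Fraction


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: exposed cases,   b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    return Fraction(a * d, b * c)


def risk_ratio(a, b, c, d):
    """Ratio of the case proportions in the exposed and unexposed rows."""
    return Fraction(a, a + b) / Fraction(c, c + d)


# Hypothetical population counts (not taken from the chapter).
population = (40, 960, 20, 1980)

# Idealized case-control sample: all cases, one tenth of the non-cases.
a, b, c, d = population
case_control = (a, b // 10, c, d // 10)

or_population = odds_ratio(*population)
or_case_control = odds_ratio(*case_control)
rr_population = risk_ratio(*population)
rr_case_control = risk_ratio(*case_control)

print("odds ratio, population:  ", float(or_population))
print("odds ratio, case-control:", float(or_case_control))
print("risk ratio, population:  ", float(rr_population))
print("risk ratio, case-control:", float(rr_case_control))
```

Exact fractions are used so that the equality of the two odds ratios is not
blurred by floating-point rounding.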
Since this book concentrates on fitting logistic regression models, it is
reasonable to spend time elaborating on the history and the origination of those
models. The advent of the logistic regression model, as compared to the probit,
Tobit, log–log, and complementary log–log models, is worth revisiting (Cramer, 2002, 2003).
The ability to model the odds has made it very attractive since the logistic
regression relies on the odds, and the odds can always be computed whether the
2.2.1 Notation
2.2.2 Definition
conducting a study of patients with AIDS and whether or not they had used dirty
needles or other common practices.
A case–control study is a non-experimental research design in which researchers
collect information on persons who have experienced the outcome of interest (the
cases) and compare it with information on a group of persons who have not (the
controls). The two groups are matched for age, sex, and other personal data, and
are then examined to determine which possible factor (e.g., cigarette smoking,
watching television) may account for the increase or decrease in the outcome in
the case group.
A Tobit model is also referred to as a censored regression model. The Tobit
model is best suited to cases in which the response variable is left- or
right-censored and we are interested in the linear relationships between variables.
For example, in the 1980s there was a time when the law restricted speedometer
readings to at most 85 mph. In an experiment predicting a vehicle's top speed
from a combination of horsepower and engine size, the largest recorded speed
would therefore be 85 mph, regardless of how fast the vehicle was actually
traveling. This is a perfect example of right-censoring (censoring from above) the
data. The one thing we are certain about is that those vehicles recorded as
traveling at 85 mph were traveling at least 85 mph (Introduction to SAS. UCLA:
Statistical Consulting Group. http://www.ats.ucla.edu/stat/sas/notes2/, accessed
November 24, 2007).
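The speedometer example can be mimicked with simulated data. The following is a
minimal sketch, not a Tobit fit: the relation between horsepower and top speed,
the noise level, and the sample size are all hypothetical assumptions made for
illustration. It shows only how right-censoring at 85 mph alters the recorded
response, and how an ordinary least-squares slope computed from the censored
readings differs from the slope computed from the true speeds.

```python
import random

SPEEDOMETER_LIMIT = 85.0  # mph, the cap described in the example above


def record_speed(true_speed, limit=SPEEDOMETER_LIMIT):
    """Return the reading of a speedometer that cannot show more than `limit`."""
    return min(true_speed, limit)


def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical data-generating process (an assumption, not from the chapter).
horsepower = [rng.uniform(80, 300) for _ in range(2000)]
true_speed = [50 + 0.2 * hp + rng.gauss(0, 5) for hp in horsepower]

# What the capped speedometer would record.
recorded_speed = [record_speed(s) for s in true_speed]

n_censored = sum(s >= SPEEDOMETER_LIMIT for s in true_speed)
slope_true = ols_slope(horsepower, true_speed)
slope_recorded = ols_slope(horsepower, recorded_speed)

print("censored observations:", n_censored, "of", len(true_speed))
print("slope using true speeds:    ", round(slope_true, 3))
print("slope using recorded speeds:", round(slope_recorded, 3))
```

In this simulation the slope from the recorded speeds is smaller than the slope
from the true speeds, which is the kind of distortion a censored regression
model is meant to address.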
The logistic regression model is a tool for modeling the relation between a binary
response or a multinomial response and several predictors. It is widely used
in the fields of health and education, as well as with elections, credit
card companies, mortgages, and other cases where there is a need to profile the
sampling unit (Fig. 2.1).
[Fig. 2.1 (labels recovered from the figure): Input (binary, continuous,
categorical); Model; Output produced (0,1); One observation]
Some example questions to guide a study might be as follows:
1. How do education, ideology, race, and gender predict a vote in favor or not in
favor of a US Senator?
2. What factors predict the type of registered voters who would support the
reelection of a President or a Governor?
3. What are the characteristics of the consumer who should be offered a credit
card?
4. What are the characteristics of a traveler that will make him or her choose one
mode of transportation over another (rail, bus, car, or plane)?
The origin of the logistic regression model is in bioassay and some other
disciplines. The logistic function was invented for the purpose of describing
population growth, and it was given its name by a Belgian mathematician, Verhulst.
Figure 2.2 provides a description of the function:

$$P_t = \frac{e^{\beta_0 + \beta_1 t}}{1 + e^{\beta_0 + \beta_1 t}}$$

This figure shows how the proportion $P_t$ changes as time increases. Let the linear
relation be

$$\operatorname{logit}[P_t] = \beta_0 + \beta_1 t,$$

where $\beta_0$ denotes the value of $\operatorname{logit}[P_t]$ at time equal to
zero, $\beta_1$ denotes the rate of change of $\operatorname{logit}[P_t]$ with
regard to time, and

$$\operatorname{logit}[P_t] = \log\left(\frac{P_t}{1 - P_t}\right).$$
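The relation between the logistic function and the logit can be checked
numerically. The sketch below uses hypothetical values for the intercept and the
slope (they do not come from the chapter) and verifies that applying the logit
to the logistic proportions returns the linear predictor, so that the logit
changes by the slope for each unit of time.

```python
import math


def logistic(t, beta0, beta1):
    """Proportion P_t from the logistic function with linear predictor beta0 + beta1*t."""
    eta = beta0 + beta1 * t
    return math.exp(eta) / (1.0 + math.exp(eta))


def logit(p):
    """Log odds of a proportion p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# Hypothetical parameter values, chosen only for illustration.
beta0, beta1 = -2.0, 0.5
times = [0, 2, 4, 6, 8]

proportions = [logistic(t, beta0, beta1) for t in times]
logits = [logit(p) for p in proportions]

for t, p, l in zip(times, proportions, logits):
    print(f"t={t}  P_t={p:.4f}  logit[P_t]={l:.4f}")
```

The proportions follow the S-shaped growth curve, while the logits lie on a
straight line in time.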
Our analyses of binary data with logistic regression models will be done mostly
with SAS, SPSS, and R. Each of these packages offers several procedures for
modeling binary responses under varying conditions and certain assumptions. We
attempt to use the most common procedures as we demonstrate the fit of logistic
regression models to correlated data with and without time-dependent covariates
and with fixed and random effects. In a few chapters, we were unable
to duplicate the fit of the model in all three statistical packages.
2.6 Conclusions
easily relate to such findings. In contrast, the probit and complementary
log–log models are appropriate only for modeling prospective data, as they rely on
probabilities.
References
Berkson, J. (1944). Applications of the logistic function to bioassay. Journal of the American
Statistical Association, 39, 357–365.
Berkson, J. (1951). Why I prefer logits to probits. Biometrics, 7(4), 327–339.
Bishop, Y. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis: Theory
and practice. Cambridge, MA: MIT Press.
Bliss, C. I. (1934a). The method of probits. Science, 79, 38–39.
Bliss, C. I. (1934b). The method of probits. Science, 79, 409–410.
Cornfield, J. (1951). A method of estimating comparative rates from clinical data. Journal of the
National Cancer Institute, 11, 1269–1275.
Cornfield, J. (1956). A statistical problem arising from retrospective studies. In J. Neyman (Ed.),
Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability
(pp. 135–148). Berkeley, CA: University of California Press.
Cox, D. R. (1969). Analysis of binary data. London: Chapman and Hall.
Cramer, J. S. (2002). The origins of logistic regression (Tinbergen Institute Working Paper
No. 2002-119/4). Retrieved from SSRN: http://ssrn.com/abstract=360300 or
http://dx.doi.org/10.2139/ssrn.360300
Cramer, J. S. (2003). The origins and development of the logit model. In J. S. Cramer (Ed.), Logit
models from economics and other fields (pp. 149–158). Cambridge, England: Cambridge
University Press.
Gurland, J., Lee, I., & Dahm, P. A. (1960). Polychotomous quantal response in biological assay.
Biometrics, 16, 382–398.
Hosmer, D. W., & Lemeshow, S. (1989). Applied logistic regression. New York: Wiley.
Mantel, N. (1966). Models for complex contingency tables and polychotomous response curves.
Biometrics, 22, 83–110.
Reed, L. J., & Berkson, J. (1929). The application of the logistic function to experimental data.
Journal of Physical Chemistry, 33(5), 760–779.
Theil, H. (1969). A multinomial extension of the linear logit model. International Economic
Review, 10(3), 251–259.
Wilson, E. B. (1925). The logistic or autocatalytic grid. Proceedings of the National Academy of
Sciences, 11, 431–456.
Winsor, C. P. (1932). A comparison of certain symmetrical growth curves. Journal of the
Washington Academy of Sciences, 22, 73–84.
http://www.springer.com/978-3-319-23804-3