An R Companion to Applied Regression, Second Edition
John Fox, McMaster University
Sanford Weisberg, University of Minnesota
SAGE

5.3 GLMs for Binary-Response Data

Table 5.4  Variables in the Mroz data set.

Variable   Description                                Remarks
lfp        wife's labor-force participation           factor: no, yes
k5         number of children ages 5 and younger      0-3, few 3s
k618       number of children ages 6 to 18
age        wife's age in years                        30-60, single years
wc         wife's college attendance                  factor: no, yes
hc         husband's college attendance               factor: no, yes
lwg        log of wife's estimated wage rate          see text
inc        family income excluding wife's income      $1,000s

To illustrate logistic regression, we turn to an example from Long (1997), which draws on data from the 1976 U.S. Panel Study of Income Dynamics, and in which the response variable is married women's labor-force participation. The data were originally used in a different context by Mroz (1987), and the same data appear in Berndt (1991) as an exercise in logistic regression. The data are in the data frame Mroz in the car package; printing 10 of the n = 753 observations at random:

> library(car)
> some(Mroz)  # sample 10 rows
    lfp k5 k618 age  wc  hc     lwg   inc
43  yes  1    2  31 yes yes  0.9450 22.50
127 yes  0    3  45 yes yes -0.9606 23.67
194 yes  0    3  31  no  no  1.4971 18.00
232 yes  0    0  52  no  no  1.2504 10.40
277 yes  0    3  36  no yes  1.6032 16.40
351 yes  0    0  46  no  no  1.3069 28.00
362 yes  0    0  54 yes  no  2.1893 18.22
408 yes  1    3  36  no  no  3.2189 21.00
415 yes  0    3  36 yes yes  0.5263 32.00
607  no  1    1  44  no  no  0.9905  9.80
> nrow(Mroz)
[1] 753

The definitions of the variables in the Mroz data set are shown in Table 5.4. With the exception of lwg, these variables are straightforward. The log of each woman's estimated wage rate, lwg, is based on her actual earnings if she is in the labor force; if the woman is not in the labor force, then this variable is imputed (i.e., filled in) as the predicted value from a regression of log wages on the other predictors for women in the labor force. As we will see later (in Section 6.6.3), this definition of expected earnings creates a problem for the logistic regression.
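To make the two-step construction of lwg concrete, here is a minimal sketch of the kind of imputation just described. It is an illustration, not the procedure actually used to build the Mroz data; in particular, the predictor set in the wage equation below is an assumption.

# Sketch of the imputation idea described in the text; the wage-equation
# predictors are hypothetical, chosen only for illustration.
library(car)  # provides the Mroz data frame

working <- Mroz$lfp == "yes"

# First-stage wage equation, fit only to women in the labor force
wage.mod <- lm(lwg ~ k5 + k618 + age + wc + inc, data=Mroz,
               subset=working)

# Keep observed log wages for participants; fill in fitted values
# from the wage equation for everyone else
lwg.filled <- Mroz$lwg
lwg.filled[!working] <- predict(wage.mod, newdata=Mroz[!working, ])

Because the filled-in values are exact functions of the other predictors, lwg behaves differently for participants and non-participants, which is the source of the problem previewed above for Section 6.6.3.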
5.3.1 FITTING THE BINARY LOGISTIC-REGRESSION MODEL

The variable lfp is a factor with two levels, and if we use this variable as the response, then the first level, no, corresponds to failure (0) and the second level, yes, to success (1).

> mroz.mod <- glm(lfp ~ k5 + k618 + age + wc + hc + lwg + inc,
+     family=binomial, data=Mroz)

The only features that differentiate this command from fitting a linear model are the change of function from lm to glm and the addition of the family argument. The family argument is set to the family-generator function binomial. The first argument to glm, the model formula, specifies the linear predictor for the logistic regression, not the mean function directly, as it did in linear regression. Because the link function is not given explicitly, the default logit link is used; the command is therefore equivalent to

> mroz.mod <- glm(lfp ~ k5 + k618 + age + wc + hc + lwg + inc,
+     family=binomial(link=logit), data=Mroz)

The model summary for a logistic regression is very similar to that for a linear regression:

> summary(mroz.mod)

Call:
glm(formula = lfp ~ k5 + k618 + age + wc + hc + lwg + inc,
    family = binomial, data = Mroz)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-2.106  -1.090   0.598   0.972   2.189

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.18214    0.64438    4.94  7.9e-07
k5          -1.46291    0.19700   -7.43  1.1e-13
k618        -0.06457    0.06800   -0.95  0.34234
age         -0.06287    0.01278   -4.92  8.7e-07
wcyes        0.80727    0.22998    3.51  0.00045
hcyes        0.11173    0.20604    0.54  0.58762
lwg          0.60469    0.15082    4.01  6.1e-05
inc         -0.03445    0.00821   -4.20  2.7e-05

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1029.75  on 752  degrees of freedom
Residual deviance:  905.27  on 745  degrees of freedom
AIC: 921.3

Number of Fisher Scoring iterations: 4

The Wald tests, given by the ratio of the coefficient estimates to their standard errors, are now labeled as z values because the large-sample reference distribution for the tests is the normal distribution, not the t distribution as in a linear model. The dispersion parameter, φ = 1, for the binomial family is noted in the output. Additional output includes the null deviance and degrees of freedom, which are for a model with all parameters apart from the intercept set to 0; the residual deviance and degrees of freedom for the model actually fit to the data; and the AIC, an alternative measure of fit sometimes used for model selection (see Section 4.5). Finally, the number of iterations required to obtain the maximum-likelihood estimates is printed.²

²The iterative algorithm employed by glm to maximize the likelihood is described in Section 5.12.

5.3.2 PARAMETER ESTIMATES FOR LOGISTIC REGRESSION

The estimated logistic-regression model is given by

$$\log_e \left[ \frac{\hat{p}(x)}{1 - \hat{p}(x)} \right] = b_0 + b_1 x_1 + \cdots + b_k x_k$$

If we exponentiate both sides of this equation, we get

$$\frac{\hat{p}(x)}{1 - \hat{p}(x)} = \exp(b_0) \times \exp(b_1 x_1) \times \cdots \times \exp(b_k x_k)$$

where the left-hand side of the equation, $\hat{p}(x) / [1 - \hat{p}(x)]$, gives the fitted odds of success, that is, the fitted probability of success divided by the fitted probability of failure. Exponentiating the model removes the logarithms and changes it from a model that is additive on the log-odds scale to one that is multiplicative on the odds scale. For example, increasing the age of a woman by 1 year, holding the other predictors constant, multiplies the odds of her being in the workforce by exp(b3) = exp(-0.06287) = 0.9391; that is, it reduces the odds of her working by about 6%. The exponentials of the coefficient estimates are generally called risk factors (or odds ratios), and they can be viewed all at once, along with their confidence intervals, by the command

> round(exp(cbind(Estimate=coef(mroz.mod), confint(mroz.mod))), 2)
            Estimate 2.5 % 97.5 %
(Intercept)    24.10  6.94  87.03
k5              0.23  0.16   0.34
k618            0.94  0.82   1.07
age             0.94  0.92   0.96
wcyes           2.24  1.43   3.54
hcyes           1.12  0.75   1.68
lwg             1.83  1.37   2.48
inc             0.97  0.95   0.98

Compared with a woman who did not attend college, for example, a college-educated woman with all other predictors the same has odds of working about 2.24 times higher, with a 95% confidence interval of 1.43 to 3.54.

The confint function provides confidence intervals for GLMs based on profiling the log-likelihood rather than on the Wald statistics used for linear models (Venables and Ripley, 2002, sec. 8.4). Confidence intervals for GLMs based on the log-likelihood take longer to compute but tend to be more accurate than those based on the Wald statistic. Even before exponentiation, the log-likelihood-based confidence intervals need not be symmetric about the estimated coefficients.
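To see the contrast between the two kinds of intervals directly, the following brief sketch, assuming mroz.mod has been fit as above, computes both: confint.default() gives the Wald intervals, which are symmetric about the estimates on the log-odds scale, while confint() profiles the log-likelihood.

# Compare Wald and profile-likelihood confidence intervals for the
# fitted logistic regression; assumes mroz.mod from above
wald.ci <- confint.default(mroz.mod)  # estimate +/- normal quantile * SE
prof.ci <- confint(mroz.mod)          # profile likelihood; may be asymmetric

# Exponentiate both to the odds scale and view side by side
round(exp(cbind(wald.ci, prof.ci)), 2)

# The z values reported by summary() are simply estimate/SE
round(coef(mroz.mod) / sqrt(diag(vcov(mroz.mod))), 2)

If the two sets of intervals nearly coincide, the Wald approximation is adequate; where they diverge, the profile-likelihood intervals are generally the ones to report.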
