10/3/2021 Regression - Binary Logit - Q

Regression - Binary Logit


The Binary Logit is a form of regression analysis that models a binary dependent variable (e.g., yes/no, pass/fail, win/lose). It is also known as logistic regression or binomial regression.

Data format
The key requirement for a binary logit regression is that the dependent variable is binary. In Displayr, the best data format for the dependent variable is “Nominal: Mutually exclusive categories”, with values of “0” and “1”.

The independent variables can be continuous, categorical, or binary, just as with any other regression model.
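The data layout this implies can be sketched in Python: a 0/1 outcome, plus one indicator column per non-reference category of each categorical predictor. R's glm, which Displayr uses, performs this dummy coding of categorical predictors automatically, so the sketch only illustrates what the model receives. The helper names, the "Yes" label and the data are hypothetical.

```python
def encode_outcome(values, positive="Yes"):
    """Recode a two-category outcome as 1 (the positive category) or 0."""
    categories = set(values)
    if len(categories) > 2:
        raise ValueError(f"outcome must have two categories, got {sorted(categories)}")
    return [1 if v == positive else 0 for v in values]


def dummy_code(values, reference):
    """Expand a categorical predictor into 0/1 indicator columns.

    The reference category gets no column: its effect is absorbed by the
    intercept, so each remaining coefficient is a contrast with it.
    """
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical survey responses.
ever_eaten = ["Yes", "No", "Yes", "Yes"]
gender = ["Male", "Female", "Female", "Male"]

y = encode_outcome(ever_eaten)                         # [1, 0, 1, 1]
gender_columns = dummy_code(gender, reference="Male")  # {'Female': [0, 1, 1, 0]}
```

This is why a categorical predictor such as age band appears in the output as several rows, one per non-reference category.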

Interpretation
Variable statistics measure the impact and significance of individual variables within a model, while overall statistics apply
to the model as a whole. Both are shown in the binary logit output.

Variable statistics
Estimate: The magnitude of the coefficient indicates the size of the change in the log-odds of the dependent variable (the log of the odds that it equals 1) associated with a one-unit change in the independent variable. A positive number indicates a direct relationship (the probability that y = 1 increases as x increases), and a negative number indicates an inverse relationship (the probability that y = 1 decreases as x increases).
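Because the estimates are on the log-odds scale, they are easier to read after a transformation: exponentiating a coefficient gives an odds ratio, and the inverse logit converts log-odds into a probability. A minimal Python sketch (the function names are illustrative, not part of Displayr; 0.70 is simply a sample coefficient value):

```python
import math


def odds_ratio(coefficient):
    """Factor by which the odds of y = 1 change for a one-unit increase in x."""
    return math.exp(coefficient)


def inverse_logit(log_odds):
    """Convert log-odds to a probability, avoiding overflow for large |log_odds|."""
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


print(round(odds_ratio(0.70), 2))  # 2.01: the odds roughly double
print(odds_ratio(0.0))             # 1.0: a zero coefficient leaves the odds unchanged
print(inverse_logit(0.0))          # 0.5: log-odds of 0 mean even odds
```

A positive coefficient always gives an odds ratio above 1 and a negative one an odds ratio below 1, which is why the sign alone indicates the direction of the relationship.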

The coefficient is colored if the variable is statistically significant at the 5% level.

Standard Error: Measures the precision of the estimate. The smaller the standard error, the more precisely the coefficient is estimated.

Z-value: The estimate divided by the standard error. The larger its magnitude (whether positive or negative), the more significant the variable. The values are highlighted based on their magnitude.

P-value: The probability, if the true coefficient were zero, of obtaining a z-value at least as extreme as the one observed. A p-value under 0.05 means that the variable is statistically significant at the 5% level; a p-value under 0.01 means that the variable is statistically significant at the 1% level. P-values under 0.05 are shown in bold.
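The link between estimate, standard error, z-value and p-value can be sketched as follows, assuming the usual Wald test against a standard normal distribution (what R's summary of an unweighted glm logit reports; this page does not spell out the formula, and weighted models may differ). Because the example table below shows rounded estimates and standard errors, dividing them will not reproduce the printed z-values exactly, so the usage lines start from printed z-values instead. The function names are hypothetical.

```python
import math


def z_value(estimate, standard_error):
    """Wald z statistic: the estimate divided by its standard error."""
    return estimate / standard_error


def two_sided_p_value(z):
    """Two-sided p-value of z against a standard normal distribution.

    erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), where Phi is the normal CDF.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


def significance(p):
    """Classify a p-value using the 1% and 5% thresholds."""
    if p < 0.01:
        return "significant at the 1% level"
    if p < 0.05:
        return "significant at the 5% level"
    return "not significant at the 5% level"


# z-values printed in the example table (S1 Age: 25 to 29 and S1 Age: 30 to 34).
for z in (2.73, 1.54):
    p = two_sided_p_value(z)
    print(f"z = {z:.2f}, p = {p:.3f}, {significance(p)}")
```

Rounded to three decimals, these give the p-values of .006 and .124 shown in the table.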

https://wiki.q-researchsoftware.com/wiki/Regression_-_Binary_Logit 1/7

Overall statistics
n: The sample size used to estimate the model.

McFadden’s rho-squared: Assesses the goodness of fit of the model. A larger number indicates that the model explains the dependent variable better, relative to a model with no predictors.

AIC: The Akaike information criterion, a measure of the quality of the model that balances goodness of fit against the number of parameters. When comparing models fitted to the same data, the model with the lower AIC is preferred.
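The page does not give formulas for these statistics. The sketch below uses the standard definitions, McFadden's rho-squared = 1 - LL(model) / LL(null) and AIC = 2k - 2 LL(model), where LL is a log-likelihood, the null model contains only an intercept, and k is the number of estimated parameters. The outcomes and fitted probabilities are hypothetical, and Displayr's calculations (for example, with weights) may differ in detail.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given probabilities p of y = 1."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def mcfadden_rho_squared(ll_model, ll_null):
    """One minus the ratio of the model and intercept-only log-likelihoods."""
    return 1.0 - ll_model / ll_null


def aic(ll_model, n_parameters):
    """Akaike information criterion: 2k - 2 * log-likelihood; lower is better."""
    return 2.0 * n_parameters - 2.0 * ll_model


# Hypothetical outcomes and fitted probabilities from a model with an
# intercept and one predictor (k = 2).
y = [1, 0, 1, 1, 0, 0, 1, 0]
p_model = [0.8, 0.3, 0.6, 0.7, 0.4, 0.2, 0.9, 0.35]

# The intercept-only (null) model predicts the sample proportion for every case.
share = sum(y) / len(y)
ll_model = log_likelihood(y, p_model)
ll_null = log_likelihood(y, [share] * len(y))
print(f"McFadden's rho-squared = {mcfadden_rho_squared(ll_model, ll_null):.3f}")
print(f"AIC = {aic(ll_model, n_parameters=2):.2f}")
```

The 2k term is what lets AIC penalize extra parameters, which is why it can favour a simpler model even when a larger model has a slightly higher log-likelihood.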

See also Regression Diagnostics.

Example
The example below is a model that predicts a survey respondent’s likelihood of having consumed a fast-food product based
on characteristics like age, gender, and work status.

Binary Logit: Q3 Ever Eaten: Burger Shack

                                       Estimate  Standard Error      z       p
(Intercept)                               -0.37            0.15  -2.45    .014
S1 Age: 19 to 24                           0.70            0.12   5.83  < .001
S1 Age: 25 to 29                           0.39            0.14   2.73    .006
S1 Age: 30 to 34                           0.21            0.14   1.54    .124
S1 Age: 35 to 39                           0.00            0.14  -0.03    .976
S1 Age: 40 to 44                          -0.17            0.14  -1.26    .208
S1 Age: 45 to 49                          -0.62            0.14  -4.31  < .001
S2 Gender: Female                         -0.12            0.06  -1.87    .062
C2 Work Status: Full time employment       0.62            0.10   6.34  < .001
C2 Work Status: Part time employment       0.13            0.11   1.21    .228
C2 Work Status: Unemployed                 0.14            0.18   0.78    .434
C2 Work Status: Student/not working        0.19            0.12   1.57    .117
C2 Work Status: Retired                   -1.16            0.77  -1.49    .136

n = 4,853 cases used in estimation; R-squared: 0.03892; Correct predictions: 59.86%; McFadden's rho-squared: 0.0381; AIC: 6,497; multiple comparisons correction: None
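To translate the table into probabilities, add the estimates for a respondent's categories to the intercept and apply the inverse logit. The sketch below does this with the rounded estimates shown, so the results are approximate. Predictors not mentioned are taken to be at their reference categories (the categories with no row in the table, which the output does not name). The function name is illustrative.

```python
import math

# Rounded estimates copied from the example table (only the terms used here).
COEFFICIENTS = {
    "(Intercept)": -0.37,
    "S1 Age: 19 to 24": 0.70,
    "S2 Gender: Female": -0.12,
    "C2 Work Status: Full time employment": 0.62,
}


def predicted_probability(active_terms, coefficients=COEFFICIENTS):
    """Inverse logit of the intercept plus the estimates of the active dummy terms.

    A categorical predictor with no active term is at its reference category,
    which contributes nothing to the log-odds.
    """
    log_odds = coefficients["(Intercept)"] + sum(coefficients[t] for t in active_terms)
    return 1.0 / (1.0 + math.exp(-log_odds))


# All predictors at their reference categories: the intercept alone.
baseline = predicted_probability([])
# Aged 19 to 24 and in full-time employment, reference gender category.
young_full_time = predicted_probability(
    ["S1 Age: 19 to 24", "C2 Work Status: Full time employment"]
)
print(f"{baseline:.2f} {young_full_time:.2f}")  # 0.41 0.72
```

Because the model is linear on the log-odds scale rather than the probability scale, the same coefficient shifts the probability by different amounts depending on the starting point.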

Create a Binary Logit Model in Displayr

1. Go to Insert > Regression > Binary Logit


2. Under Inputs > Outcome, select your dependent variable
3. Under Inputs > Predictor(s), select your independent variables
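Displayr estimates the model with R's glm (see Acknowledgements). To show what that estimation does, here is a minimal sketch of maximum-likelihood fitting by Newton-Raphson in plain Python, limited to one numeric predictor so the 2 x 2 matrix algebra stays explicit. For the logit link, Newton-Raphson gives the same updates as the iteratively reweighted least squares used by glm. The function names and data are hypothetical, and this is not the code Displayr runs.

```python
import math


def _score_and_information(b0, b1, x, y):
    """Gradient of the log-likelihood and the 2 x 2 information matrix at (b0, b1)."""
    g0 = g1 = 0.0
    i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)


def fit_logit(x, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))).

    Returns the coefficients and their standard errors.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (i00, i01, i11) = _score_and_information(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step: solve (information matrix) * step = gradient.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information matrix.
    _, (i00, i01, i11) = _score_and_information(b0, b1, x, y)
    det = i00 * i11 - i01 * i01
    return (b0, b1), (math.sqrt(i11 / det), math.sqrt(i00 / det))


# Hypothetical data in which the two outcome groups overlap on x
# (if x separated them perfectly, no finite estimate would exist).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
(b0, b1), (se0, se1) = fit_logit(x, y)
print(f"intercept {b0:.3f} (SE {se0:.3f}), slope {b1:.3f} (SE {se1:.3f})")
```

At convergence the gradient is zero, which means the fitted probabilities sum to the number of observed 1s.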


Object Inspector Options


Outcome The variable to be predicted by the predictor variables.

Predictors The variable(s) to predict the outcome.

Algorithm The fitting algorithm. Defaults to Regression but may be changed to other machine learning methods.

Type: You can use this option to toggle between different types of regression models, but note that certain types are not appropriate for certain types of outcome variable.

Linear: Appropriate for a continuous outcome variable. See Regression - Linear Regression.
Binary Logit: Appropriate if the outcome is binary (i.e., falls in one of two categories). See Regression - Binary Logit.
Ordered Logit: Appropriate for a discrete outcome where the categories have a natural order (e.g., Low, Medium, High). See Regression - Ordered Logit.
Multinomial Logit: Appropriate for a discrete outcome with unordered categories. See Regression - Multinomial Logit.
Poisson: Appropriate for count outcomes (i.e., outcomes that take only non-negative integer values). See Regression - Poisson Regression.
Quasi-Poisson: Appropriate for count outcomes. See Regression - Quasi-Poisson Regression.
NBD: Appropriate for count outcomes. See Regression - NBD Regression.

Robust standard errors Computes standard errors that are robust to violations of the assumption of constant variance
(i.e., heteroscedasticity). See Robust Standard Errors. This is only available when Type is Linear.

Missing data See Missing Data Options.

Output

Summary The default; as shown in the example above.


Detail Typical R output, with some additional information compared to Summary, but without the formatting.
ANOVA Analysis of variance table containing the results of Chi-squared likelihood ratio
tests for each predictor.
Relative Importance Analysis The results of a relative importance analysis. See here
and the references for more information. This option is not available for Multinomial Logit.
Note that categorical predictors are not converted to be numeric, unlike in Driver
(Importance) Analysis - Relative Importance Analysis.
Shapley Regression See here and the references for more information. This option is
only available for Linear Regression. Note that categorical predictors are not converted to
be numeric, unlike in Driver (Importance) Analysis - Shapley.
Jaccard Coefficient Computes the relative importance of the predictor variables against the outcome variable with the Jaccard Coefficients. See Driver (Importance) Analysis - Jaccard Coefficient. This option requires the outcome variable and all predictor variables to be binary.
Correlation Computes the relative importance of the predictor variables against the
outcome variable via the bivariate Pearson product moment correlations. See Driver
(Importance) Analysis - Correlation and references therein for more information.
Effects Plot Plots the relationship between each of the Predictors and the Outcome. Not
available for Multinomial Logit.
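The Jaccard Coefficient and Correlation outputs rest on two simple statistics, sketched below with their standard definitions for binary variables. The predictor names and data are hypothetical, and any rescaling Displayr applies to present these statistics as importance scores is not shown.

```python
import math


def jaccard(a, b):
    """Jaccard coefficient of two 0/1 variables: cases where both are 1,
    divided by cases where at least one is 1."""
    both = sum(1 for ai, bi in zip(a, b) if ai == 1 and bi == 1)
    either = sum(1 for ai, bi in zip(a, b) if ai == 1 or bi == 1)
    return both / either if either else 0.0


def pearson(a, b):
    """Pearson product moment correlation of two numeric variables."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical binary outcome and binary predictors.
outcome = [1, 0, 1, 1, 0, 1]
predictors = {
    "Likes burgers": [1, 0, 1, 0, 0, 1],
    "Lives nearby": [0, 1, 1, 1, 0, 0],
}
for name, values in predictors.items():
    print(f"{name}: Jaccard {jaccard(values, outcome):.2f}, r {pearson(values, outcome):.2f}")
```

Unlike the correlation, the Jaccard coefficient ignores cases where both variables are 0, so the two measures can rank predictors differently.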


Correction The multiple comparisons correction applied when computing the p-values of the post-hoc comparisons.

Variable names Displays Variable Names in the output instead of labels.

Absolute importance scores Whether the absolute value of Relative Importance Analysis scores should be displayed.

Auxiliary variables Variables to be used when imputing missing values (in addition to all the other variables in the
model).

Weight. Where a weight has been set for the R Output, it will automatically be applied when the model is estimated. By default, the weight is assumed to be a sampling weight, and the standard errors are estimated using Taylor series linearization (by contrast, in the Legacy Regression, weight calibration is used). See Weights, Effective Sample Size and Design Effects.

Filter The data is automatically filtered using any filters prior to estimating the model.

Crosstab Interaction Optional variable to test for interaction with other variables in the model. The interaction variable is treated as a categorical variable. Coefficients in the table are computed by creating separate regressions for each level of the interaction variable. To evaluate whether a coefficient is significantly higher (blue) or lower (red), we perform a t-test of the coefficient compared with the coefficient estimated from the remaining data, as described in Driver Analysis. P-values are corrected for multiple comparisons across the whole table (excluding the NET column). The P-value in the sub-title is calculated using the likelihood ratio test between the pooled model with no interaction variable and a model where all predictors interact with the interaction variable.
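Both the ANOVA output and the sub-title P-value above rely on likelihood ratio tests. A minimal sketch under the standard definition: the statistic is 2 (LL_full - LL_restricted), compared with a chi-squared distribution whose degrees of freedom equal the number of extra parameters in the full model. The log-likelihoods in the usage lines are hypothetical, and `chi2_sf` is a hand-written helper that only handles integer degrees of freedom.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-squared distribution with integer df >= 1.

    Uses Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1) for the regularized
    upper incomplete gamma function Q, with x = stat / 2 and a = df / 2, starting
    from Q(1/2, x) = erfc(sqrt(x)) or Q(1, x) = exp(-x).
    """
    if stat <= 0:
        return 1.0
    x = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < df / 2.0:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(ll_restricted, ll_full, df):
    """LR statistic 2 * (LL_full - LL_restricted) and its chi-squared p-value."""
    stat = 2.0 * (ll_full - ll_restricted)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods: a pooled model versus a model with
# interaction terms that adds 3 parameters.
stat, p = likelihood_ratio_test(ll_restricted=-3240.0, ll_full=-3235.0, df=3)
print(f"LR statistic = {stat:.1f}, p = {p:.4f}")
```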

Automated outlier removal percentage A numeric value between 0 and 50 (including 0 but not 50) that specifies the percentage of the data to be removed from the analysis as outliers. All regression types except Multinomial Logit support this feature. If zero is selected, no outlier removal is performed and a standard regression output for the entire (possibly filtered) dataset is returned. If a non-zero value is selected, the regression model is fitted twice. The first regression model uses the entire dataset (after filters have been applied) and identifies the observations that generate the largest residuals. The user-specified percentage of cases with the largest residuals is then removed. The regression model is refitted on this reduced dataset and its output is returned. The specific residual used varies depending on the regression Type.

Linear: The studentized residual in an unweighted regression and the Pearson residual in
a weighted regression. The Pearson residual in the weighted case adjusts appropriately
for the provided survey weights.

Binary Logit and Ordered Logit: A type of surrogate residual from the sure R package
(see Greenwell, McCarthy, Boehmke and Liu (2018) for more details). In Binary Logit it
uses the resids function with the jitter parametrization. In Ordered Logit it uses the resids
function with the latent parametrization to exploit the ordered logit structure.

NBD Regression, Poisson Regression: A studentized deviance residual in an unweighted regression and the Pearson residual in a weighted regression.

Quasi-Poisson Regression: A type of quasi-deviance residual via the rstudent function in an unweighted regression and the Pearson residual in a weighted regression.

The studentized residual computes the distance between the observed and fitted value for each point and standardizes (adjusts) it based on the influence of the point and an externally adjusted variance calculation. The studentized deviance residual computes the contribution the fitted point makes to the likelihood and standardizes (adjusts) it based on the influence of the point and an externally adjusted variance calculation. The Pearson residual in the weighted case computes the distance between the observed and fitted value and adjusts appropriately for the provided survey weights. See the rstudent function in R and Davison and Snell (1991) for more details of the calculations.
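The trimming step of the two-pass procedure can be sketched in Python. For simplicity, the sketch scores cases with Pearson residuals, (y - p) / sqrt(p (1 - p)), rather than the surrogate residuals Displayr uses for Binary Logit. It also assumes that "largest" means largest in absolute value and that the number of removed cases is rounded down. The function names and data are hypothetical; after trimming, the model would be refitted on the retained cases.

```python
import math


def pearson_residual(y, p):
    """Pearson residual for a 0/1 outcome y with fitted probability p."""
    return (y - p) / math.sqrt(p * (1.0 - p))


def trim_largest_residuals(cases, residuals, percent):
    """Drop the given percentage of cases with the largest absolute residuals.

    percent must satisfy 0 <= percent < 50, like the option itself. The number of
    cases removed is rounded down (an assumption). Retained cases keep their order.
    """
    if not 0 <= percent < 50:
        raise ValueError("percent must be at least 0 and less than 50")
    n_remove = int(len(cases) * percent / 100)
    ranked = sorted(range(len(cases)), key=lambda i: abs(residuals[i]), reverse=True)
    removed = set(ranked[:n_remove])
    return [case for i, case in enumerate(cases) if i not in removed]


# Hypothetical first-pass fit: case labels, outcomes and fitted probabilities.
cases = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
y = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
p = [0.9, 0.2, 0.1, 0.8, 0.3, 0.95, 0.7, 0.4, 0.6, 0.85]

residuals = [pearson_residual(yi, pi) for yi, pi in zip(y, p)]
kept = trim_largest_residuals(cases, residuals, percent=20)
print(kept)  # the model would then be refitted on these cases only
```

Cases "c" (observed 1, fitted 0.1) and "f" (observed 0, fitted 0.95) are the ones dropped: they are the observations the first fit explains worst.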

Stack data Whether the input data should be stacked before analysis. Stacking can be desirable when each individual in the data set has multiple cases and an aggregate model is desired. More information is available at Stacking Data Files. If this option is chosen, the Outcome needs to be a single Question with a Multi type structure suitable for regression, such as a Pick One - Multi, Pick Any or Number - Multi (or a Variable Set with a Multi type structure suitable for regression, such as a Binary - Multi, Nominal - Multi, Ordinal - Multi or Numeric - Multi). Similarly, the Predictor(s) need to be a single Question with a Grid type structure, such as a Pick Any - Grid or a Number - Grid (or a Variable Set with a Grid type structure, such as a Binary - Grid or a Numeric - Grid). In the process of stacking, the data reduction is inspected. Any constructed NETs are removed unless they are composed of source values that are mutually exclusive to other codes, such as the result of merging two categories.
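As a rough illustration of what stacking does to the data layout (not of Displayr's own stacking code, which also handles NETs as described above), the sketch below reshapes hypothetical wide data into long form. Each respondent starts with one outcome per brand and a brand-by-attribute predictor grid, and ends up as one row per respondent and brand, ready for a single aggregate model. The brand and attribute names are hypothetical.

```python
def stack(outcomes, grids):
    """Turn one row per respondent into one row per respondent-brand pair.

    outcomes: one dict per respondent mapping brand -> 0/1 outcome.
    grids: one dict per respondent mapping brand -> {attribute: value}.
    """
    rows = []
    for respondent, (outcome, grid) in enumerate(zip(outcomes, grids)):
        for brand, value in outcome.items():
            rows.append({"respondent": respondent, "brand": brand, "y": value, **grid[brand]})
    return rows


# Hypothetical wide data: two respondents, two brands, two attributes.
outcomes = [
    {"Brand A": 1, "Brand B": 0},
    {"Brand A": 0, "Brand B": 1},
]
grids = [
    {"Brand A": {"Tasty": 1, "Cheap": 0}, "Brand B": {"Tasty": 0, "Cheap": 1}},
    {"Brand A": {"Tasty": 0, "Cheap": 0}, "Brand B": {"Tasty": 1, "Cheap": 1}},
]

for row in stack(outcomes, grids):
    print(row)
```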

Random seed Seed used to initialize the (pseudo)random number generator for the model fitting algorithm. Different
seeds may lead to slightly different answers, but should normally not make a large difference.

Increase allowed output size Check this box if you encounter a warning message "The R output had size XXX MB, exceeding the 128 MB limit..." and you need to reference the output elsewhere in your document; e.g., to save predicted values to a Data Set or examine diagnostics.

Maximum allowed size for output (MB). This control only appears if Increase allowed output size is checked.
Use it to set the maximum allowed size for the regression output in megabytes. The warning about the R output size referred to above states the minimum size you need to set to return the full output. Note that having many large outputs in one document or page may slow down the performance of your document and increase load times.

Additional options are available by editing the code.

DIAGNOSTICS
Plot - Cook's Distance Creates a line/rug plot showing Cook's Distance for each observation.

Plot - Cook's Distance vs Leverage Creates a scatterplot showing Cook's distance vs leverage for each observation.

Plot - Influence Index Creates index plots of studentized residuals, hat values, and Cook's distance.

Multicollinearity Table (VIF) Creates a table containing variance inflation factors (VIF) to diagnose multicollinearity.

Plot - Normal Q-Q Creates a normal Quantile-Quantile (QQ) plot to reveal departures of the residuals from normality.

Prediction-Accuracy Table Creates a table showing the observed and predicted values, as a heatmap.

Test Residual Heteroscedasticity Conducts a heteroscedasticity test on the residuals.

Test Residual Normality (Shapiro-Wilk) Conducts a Shapiro-Wilk test of normality on the (deviance) residuals.

Plot - Residuals vs Fitted Creates a scatterplot of residuals versus fitted values.

Plot - Residuals vs Leverage Creates a plot of residuals versus leverage values.

Plot - Scale-Location Creates a plot of the square root of the absolute standardized residuals by fitted values.

Test Residual Serial Correlation (Durbin-Watson) Conducts a Durbin-Watson test of serial correlation (autocorrelation) on the residuals.
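The Prediction-Accuracy Table and the "Correct predictions" figure in the example output both compare observed and predicted categories. A minimal sketch, assuming a case is predicted as 1 when its fitted probability exceeds 0.5 (the page does not state the cutoff Displayr uses). The function names and data are hypothetical.

```python
def confusion_matrix(observed, probabilities, cutoff=0.5):
    """Count (observed, predicted) pairs, predicting 1 when the probability exceeds cutoff."""
    counts = {(o, p): 0 for o in (0, 1) for p in (0, 1)}
    for obs, prob in zip(observed, probabilities):
        counts[(obs, 1 if prob > cutoff else 0)] += 1
    return counts


def percent_correct(counts):
    """Share of cases whose predicted category matches the observed one, in percent."""
    total = sum(counts.values())
    return 100.0 * (counts[(0, 0)] + counts[(1, 1)]) / total


# Hypothetical observed outcomes and fitted probabilities.
observed = [1, 0, 1, 1, 0, 0]
fitted = [0.8, 0.3, 0.4, 0.7, 0.6, 0.1]

counts = confusion_matrix(observed, fitted)
for (obs, pred), n in sorted(counts.items()):
    print(f"observed {obs}, predicted {pred}: {n}")
print(f"Correct predictions: {percent_correct(counts):.2f}%")
```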

SAVE VARIABLE(S)
Fitted Values Creates a new variable containing fitted values for each case in the data.

Predicted Values Creates a new variable containing predicted values for each case in the data.

Residuals Creates a new variable containing residual values for each case in the data.

More information
What is Logistic Regression? (https://www.displayr.com/what-is-logistic-regression/)

How to do Logistic Regression in Displayr (https://www.displayr.com/how-to-do-logistic-regression-in-displayr/)

How to Interpret Logistic Regression Outputs (https://www.displayr.com/how-to-interpret-logistic-regression-outputs/)

How to Interpret Logistic Regression Coefficients (https://www.displayr.com/how-to-interpret-logistic-regression-coefficients/)

Acknowledgements
Uses the glm function from the stats R package. If weights are supplied, the svyglm function from the survey (https://cran.r-project.org/web/packages/survey/index.html) R package is used. Also uses the resids function from the sure (https://cran.r-project.org/web/packages/sure/index.html) R package. See also Regression - Generalized Linear Model.

References
Greenwell, B. M., McCarthy, A. J., Boehmke, B. C. and Liu, D. (2018). "Residuals and Diagnostics for Binary and Ordinal Regression Models: An Introduction to the sure Package", The R Journal, 10(1), 381-394, doi:10.32614/RJ-2018-004 (https://doi.org/10.32614/RJ-2018-004)

Yap, J. (2018, August 22). What is logistic regression? [Blog post]. Accessed from https://www.displayr.com/what-is-logistic-regression/ (https://www.displayr.com/what-is-logistic-regression/).

For relative importance analysis: Johnson, J. W. (2000). A heuristic method for estimating the relative weight of predictor variables in multiple regression. Multivariate Behavioral Research, 35(1), 1-19.


Retrieved from ‘https://wiki.q-researchsoftware.com/index.php?title=Regression_-_Binary_Logit&oldid=54808’

This page was last modified on 7 July 2021, at 04:35.



Copyright © Displayr 2010-2021.
