Pooled and Exact Logistics

Exact logistic regression is recommended over classical logistic regression when sample sizes are small and some cells have zero observations. An example analysis found that students who took AP calculus had odds of admission that were 28.2 times greater than students who did not take the calculus course. Exact logistic regression conditions on sufficient statistics, while classical logistic regression relies on asymptotic results and is not appropriate for small sample sizes with empty cells.

Uploaded by

seif

Pooled and Exact Logistic Regression
Mary-Winnie EAB2
Outline
•What is Pooled Logistic Regression?
•When it is applicable
•Pooled vs Panel Data (Longitudinal Data)
•Importance of Pooled Logistic Regression
•Modelling Pooled Data
•Example
•Limitations of Pooled Logistic Regression
•Summary
•Introduction to Exact Logistic Regression
•Example
•Why use Exact Logistic Regression instead of Classical Logistic Regression
•Example
•Exact logistic limitation
•Summary
•Conclusion
Scenario
Pooled Logistic Regression
• An analysis that incorporates all repeated observations using a
person-years technique, i.e. pooling of repeated observations.
• Once an individual has an event in a particular interval, all subsequent
intervals from that individual are excluded from the analysis.
• The time-dependent covariates are assumed to remain constant between
examination times.
• The method considers all the time-dependent measurements
recorded at repeated intervals when evaluating the relationship
between risk factor profile and outcome.
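The pooling step described above can be sketched in a few lines of Python. This is only an illustration: the record layout (keys `id`, `covariates`, `event_at`) and the two example subjects are hypothetical, not taken from the slides.

```python
def to_person_period(subjects):
    """Expand one record per subject into one record per subject-interval.

    Each subject is a dict with hypothetical keys:
      'id'         : subject identifier
      'covariates' : covariate value measured at the start of each interval
      'event_at'   : 1-based interval in which the event occurred, or None
    """
    rows = []
    for s in subjects:
        for interval, x in enumerate(s["covariates"], start=1):
            event = 1 if s["event_at"] == interval else 0
            rows.append({"id": s["id"], "interval": interval, "x": x, "event": event})
            if event:
                # Intervals after the event are excluded from the pooled data.
                break
    return rows


subjects = [
    {"id": "A", "covariates": [120, 130, 140], "event_at": 2},
    {"id": "B", "covariates": [110, 115, 118], "event_at": None},
]
rows = to_person_period(subjects)
for r in rows:
    print(r)
```

Subject A contributes two records (the third interval is dropped because the event occurred in the second), while subject B contributes all three.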
Pooled repeated observations Vs survival
Applicability
• Pooled sample to obtain an interval incidence rate of disease
• The distinction between the classic person-years approach and the
PLR method is that the latter updates the risk factors at the beginning
of each observation interval.
Pooled vs Panel Data (Longitudinal Data)

• Pooled data occur when we have a “time series of cross sections,” but
the observations in each cross section do not necessarily refer to the
same unit.
• Panel data refers to samples of the same cross-sectional units
observed at multiple points in time.
• A balanced panel has every observation from 1 to N observable in
every period 1 to T.
• An unbalanced panel has missing data.
• Panel data commands in Stata start with xt, as in xtreg.
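As a small illustration of the balanced/unbalanced distinction, the Python sketch below checks whether every unit is observed in every period. The function name and the (unit, period) representation are assumptions made for this example.

```python
def is_balanced(records):
    """Return True if every unit appears in every period.

    records: iterable of (unit, period) pairs, one per observation.
    """
    records = set(records)
    units = {unit for unit, _ in records}
    periods = {period for _, period in records}
    # A balanced panel has exactly N * T distinct unit-period observations.
    return len(records) == len(units) * len(periods)


balanced = [(u, t) for u in ("a", "b", "c") for t in (1, 2)]
unbalanced = balanced[:-1]  # unit "c" is missing in period 2

print(is_balanced(balanced))
print(is_balanced(unbalanced))
```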
Importance of Pooled Logistic Regression

• It produces results approximately equivalent to those of a survival
model when the disease of interest is a rare event, i.e. few events
occur in an interval relative to the size of the population, and when
follow-up periods are short.
• Conditional logistic regression model and pooled logistic regression
are equivalent when the length of time interval tends towards zero
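A short numerical sketch of why short intervals and rare events matter. It assumes, purely for illustration, a constant hazard within each interval, so the probability of an event in an interval of length d is 1 - exp(-hazard * d). The hazards 0.02 and 0.01 are made-up values.

```python
import math


def interval_odds(hazard, length):
    """Odds of an event within one interval, assuming a constant hazard."""
    p = 1.0 - math.exp(-hazard * length)
    return p / (1.0 - p)


def interval_odds_ratio(hazard_1, hazard_0, length):
    """Odds ratio that a logistic model on interval data would target."""
    return interval_odds(hazard_1, length) / interval_odds(hazard_0, length)


hazard_ratio = 0.02 / 0.01
for length in (10.0, 1.0, 0.1):
    print(length, round(interval_odds_ratio(0.02, 0.01, length), 4), hazard_ratio)
```

As the interval shrinks, the interval odds ratio moves towards the hazard ratio, which is the quantity a survival model estimates.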
Importance…
• The odds in PLR describe the conditional probability of event
occurrence, where the conditioning is on the individual surviving
until that particular time period. This allows all records within
the person-period dataset to be considered as conditionally
independent.
• With multiple records for intervals within each individual there is no
inflation of test statistics resulting from a lack of independence.
Modelling Pooled Data
• Pooled logistic regression is used to link predictors to the event
outcome.
• The outcome is an event indicator, which records whether an event
occurs in the interval or not and does not account for when the event
occurs within the interval.
• A response occurring near the beginning of a follow-up period is
treated the same in analysis as one occurring at the end of that
period. This model relates the probability of an event occurring in an
interval to a logistic function of the risk factors
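The last point can be written as a small Python function. The intercept, coefficients and risk-factor values below are hypothetical; in a fitted model they would come from the pooled person-period data.

```python
import math


def interval_event_probability(intercept, coefficients, risk_factors):
    """Probability of an event in an interval as a logistic function
    of the risk factors measured at the start of that interval."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, risk_factors))
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical values: intercept -4, two risk factors.
p = interval_event_probability(-4.0, [0.03, 0.5], [50, 1])
print(round(p, 4))
```

The function returns the same probability wherever in the interval the event falls, which is the simplification described above.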
Example
• logit binbmi_overweight_y i.psum_unemployed_total_gwave_y
i.own_education_y i.medical_card_y i.employment_y
i.maritalstatus_y i.ord_age_y
i.year i.elec_div_y1 if gender==0, vce(cluster elec_div_y)
• estimates store pooled1
• estimates table pooled1, star stats(N r2_p)
Example
Limitations of Pooled Logistic Regression

• It does not account for censoring. In long-term follow-up studies
this property of logistic regression becomes a liability, since
some of the losses are study participants who die of causes other
than the disease endpoint of interest.
• The model does not utilize information for the point in time during
the interval at which an event occurs or the exact time in an interval
that an individual is lost to follow-up. Thus, the contribution of the
risk factor to disease is dependent on the length of follow-up period.
Limitations
• The PLR method also has larger standard errors than the Cox
model when the event rate is high, while in models with low event
rates the standard errors for all methods are larger, as expected.
• Time-adjusted PLR has a positive bias in the time-dependent (e.g. age)
effect, with reduced bias when the event rate is low. The PLR methods
showed a negative bias in the fixed covariate (e.g. sex) effect
compared to the other methods and had higher estimates for the
fixed effect compared to the other methods.
Summary
• Pooled logistic regression is useful when adjusted for time variables
• It is not superior to time-dependent Cox regression
• Under certain conditions it gives results similar to survival
regression and conditional logistic regression
Exact logistic regression
• Exact logistic regression is used to model binary outcome variables in
which the log odds of the outcome is modeled as a linear
combination of the predictor variables. 
• It is used when the sample size is too small for a regular logistic
regression (which uses the standard maximum-likelihood-based
estimator) and/or when some of the cells formed by the outcome and
categorical predictor variable have no observations. 
• The estimates given by exact logistic regression do not depend on
asymptotic results. (Stata command: exlogistic)
Other methods for rare events
• Bias when modelling rare events with logistic regression can also be
reduced by:
• The bias correction method proposed by King and Zeng (2001a, 2001b)
(Stata command: relogit)
• Penalized maximum likelihood estimation (PMLE) proposed by Firth
(1993) (Stata command: firthlogit)
Exact Logistic Regression
• Principle: exact computation of parameter estimates
• Exact logistic regression is only applicable when:
• n is (very) small (<200)
• covariates are discrete (best: dichotomous)
• # of covariates is small
Example
• Suppose that we are interested in the factors that influence whether
or not a high school senior is admitted into a very competitive
engineering school. 
• The outcome variable is binary (0/1): admit or not admit.  The
predictor variables of interest include student gender and whether or
not the student took Advanced Placement calculus in high school. 
• The data for the analysis include the number of students admitted
(admit), the total number of applicants broken down by gender (female),
and whether or not they had taken AP calculus (apcalc).
STATA
Methods for analysis
• Exact logistic regression – This technique is appropriate because the
outcome variable is binary, the sample size is small, and some cells
are empty.
• Regular logistic regression – Due to the small sample size and the
presence of cells with no subjects, regular logistic regression is not
advisable, and it might not even be estimable.
• Two-way contingency tables – You may need to use the exact option
to get the Fisher’s exact test due to small expected values.
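For the two-way table option, a minimal Python sketch of a two-sided Fisher's exact test on a 2x2 table is shown below. The function name and the example counts are hypothetical. The two-sided p-value here sums the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties are not lost to rounding.
    return sum(
        prob(k)
        for k in range(smallest, largest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Hypothetical table: rows = took AP calculus (yes/no), columns = admitted (yes/no).
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```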
The odds of admission for an applicant who had taken AP calculus
were about 28.2 times the odds for one who had not taken the course.
• Exact logistic regression is an alternative to conditional logistic
regression if you have stratification, since both condition on the
number of positive outcomes within each stratum. 
• The estimates from these two analyses will be different
because clogit conditions only on the intercept term,
while exlogistic conditions on the sufficient statistics of the other
regression parameters as well as the intercept term.
Why use Exact Logistic Regression instead
of Classical Logistic Regression?
• Classical logistic regression is applicable when every cell has more
than 5 observations…
Example
. exlogistic admit female apcalc, coef

Enumerating sample-space combinations:
observation 1: enumerations = 2
observation 2: enumerations = 3
observation 3: enumerations = 6
observation 4: enumerations = 9
observation 5: enumerations = 16
observation 6: enumerations = 19
observation 7: enumerations = 20
observation 8: enumerations = 13

Exact logistic regression               Number of obs =          8
                                        Model score   =          0
                                        Pr >= score   =     1.0000

admit     Coef.       Suff.   2*Pr(Suff.)   [95% Conf. Interval]
female    4.16e-17    2       1.0000        -3.137592   3.137592
apcalc    4.16e-17    2       1.0000        -3.137592   3.137592

. exlogistic

Exact logistic regression               Number of obs =          8
                                        Model score   =          0
                                        Pr >= score   =     1.0000

admit     Odds Ratio   Suff.   2*Pr(Suff.)   [95% Conf. Interval]
female    1            2       1.0000        .0433872   23.04829
apcalc    1            2       1.0000        .0433872   23.04829

. estat se

admit     Odds Ratio   Std. Err.
female    1            1.224745
apcalc    1            1.224745

. logistic admit female apcalc, or

Logistic regression                     Number of obs =          8
                                        LR chi2(2)    =       0.00
                                        Prob > chi2   =     1.0000
Log likelihood = -5.5451774             Pseudo R2     =     0.0000

admit     Odds Ratio   Std. Err.   z      P>|z|   [95% Conf. Interval]
female    1            1.414214    0.00   1.000   .0625488   15.98751
apcalc    1            1.414214    0.00   1.000   .0625488   15.98751
_cons     1            1.224745    0.00   1.000   .0906766   11.0282

Note: _cons estimates baseline odds.
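To make "conditioning on the sufficient statistics" concrete, the Python sketch below enumerates the conditional distribution behind the exlogistic output for female. The eight-row data set is an assumption: the slides do not list the data, so a layout was chosen that is consistent with the output above (8 observations, sufficient statistics of 2, one admitted and one rejected applicant in each female x apcalc cell). The confidence limit uses a tail probability of 0.025 on each side, which is also an assumption about the method.

```python
from itertools import combinations
from math import exp

# Hypothetical data, columns are (admit, female, apcalc).
data = [
    (1, 0, 0), (0, 0, 0),
    (1, 0, 1), (0, 0, 1),
    (1, 1, 0), (0, 1, 0),
    (1, 1, 1), (0, 1, 1),
]
ADMIT, FEMALE, APCALC = 0, 1, 2


def conditional_counts(rows, target, nuisance):
    """Count outcome vectors by the sufficient statistic of `target`,
    holding the number admitted and the sufficient statistic of
    `nuisance` fixed at their observed values."""
    n_admitted = sum(r[ADMIT] for r in rows)
    s_nuisance = sum(r[ADMIT] * r[nuisance] for r in rows)
    counts = {}
    for admitted in combinations(range(len(rows)), n_admitted):
        if sum(rows[i][nuisance] for i in admitted) != s_nuisance:
            continue
        t = sum(rows[i][target] for i in admitted)
        counts[t] = counts.get(t, 0) + 1
    return counts


def upper_tail(counts, t_obs, beta):
    """P(T >= t_obs) under the conditional distribution with coefficient beta."""
    weights = {t: c * exp(beta * t) for t, c in counts.items()}
    return sum(w for t, w in weights.items() if t >= t_obs) / sum(weights.values())


def lower_limit(counts, t_obs, alpha=0.05):
    """Solve P(T >= t_obs; beta) = alpha / 2 for beta by bisection."""
    lo, hi = -20.0, 20.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if upper_tail(counts, t_obs, mid) < alpha / 2.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


counts = conditional_counts(data, FEMALE, APCALC)
t_obs = sum(r[ADMIT] * r[FEMALE] for r in data)
total = sum(counts.values())
p_upper = sum(c for t, c in counts.items() if t >= t_obs) / total
p_lower = sum(c for t, c in counts.items() if t <= t_obs) / total
p_two_sided = min(1.0, 2.0 * min(p_upper, p_lower))
lower = lower_limit(counts, t_obs)
print(counts, p_two_sided, round(lower, 4))
```

Under these assumptions the two-sided p-value is 1 and the lower limit on the coefficient scale lands close to the -3.137592 reported in the coef table. No large-sample approximation is involved, which is the point of the exact method.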


Exact limitation
• A major drawback of exact testing in logistic regression, however, is
the computational complexity of the problem, including its memory
requirements.
• Computing the exact p-value becomes exponentially more difficult with
larger sample sizes, greater numbers of nuisance parameters, greater
imbalance in the data, or inclusion of covariates which are less
discrete.
Summary
• Exact logistic regression is useful for sparse data with small sample sizes
• It is also useful when some cells have no observations
• However, it is not superior to penalized maximum likelihood estimation
Conclusion
• Classical logistic regression can be replaced by pooled or exact
logistic regression, depending on the nature of the data.
• However, there are other alternatives to logistic regression, which we
will see in the coming lecture.
References:
Cupples, L. A. et al. (1988) ‘Comparison of baseline and repeated
measure covariate techniques in the Framingham heart study’,
Statistics in Medicine, 7(1–2), pp. 205–218. doi:
10.1002/sim.4780070122.
Collett, D. Modeling Binary Data, Second Edition. Boca Raton:
Chapman and Hall.
Cox, D. R. and Snell, E. J. (1989). Analysis of Binary Data, Second
Edition. Boca Raton: Chapman and Hall.
Hirji, K. F. (2005). Exact Analysis of Discrete Data. Boca Raton:
Chapman and Hall.
Example, E. and For, D. (2020) Stata Data Analysis Examples Ordinal Logistic
Regression. Available at:
https://stats.idre.ucla.edu/stata/dae/exact-logistic-regression/
(Accessed: 25 February 2020).
Firth, D. (1993): Bias reduction of maximum likelihood estimates. In:
Biometrika 80: 27-38.
King, G./Zeng, L. (2001a): Logistic Regression in Rare Events Data. In:
Political Analysis 9: 137-163.
King, G./Zeng, L. (2001b): Explaining Rare Events in international Relations.
In: International Organization 55: 693-715.
Ngwa, J. S. et al. (2016) ‘A comparison of time dependent Cox regression,
pooled logistic regression and cross sectional pooling with simulations and
an application to the Framingham Heart Study’, BMC Medical Research
Methodology, pp. 1–12. doi: 10.1186/s12874-016-0248-6.
