Logistic Regression Course Note
Lecture 1: LOGISTIC REGRESSION
Aims
To introduce the logistic regression model for modelling binary outcome data.
Objectives
By the end of this session you should be able to:
1. Understand the difference between a linear model and a logistic regression model;
2. Interpret the coefficients of variables in a logistic regression model;
3. Calculate the likelihood ratio test statistic for a variable, or group of variables, in a logistic
regression equation;
4. Write a SAS program to fit a logistic regression model and interpret the output.
Linear Models
The Mantel‐Haenszel method could be used to examine the association between two binary variables
after stratifying by another variable. That is, the method was used to examine the effect of a binary
study factor on a binary outcome variable, while adjusting for a single confounder or effect modifier.
If we want to adjust for more than one variable, it is simpler to use a regression model. We have seen in
this course that, when the outcome is a continuous variable, multiple linear regression is used. The
continuous outcome variable is linearly related to the explanatory variables, and the errors are assumed
to be Normally distributed. This type of model is known as a linear model. In multiple linear regression,
the expected value of the outcome variable (y) is modelled as a function of
multiple explanatory variables (x1, x2, x3, …), and the regression equation is represented as

y = α + β1 x1 + β2 x2 + ⋯ + βk xk + ε
Logistic Regression
If we apply a linear regression model when the outcome variable has two possible values (e.g. diseased
or not diseased) then there may be problems with the resulting model. When the outcome is binary it
would commonly be coded as 0 (outcome not present) and 1 (outcome present) and the purpose of
fitting a regression model is to identify which factors are associated with the probability (P) of having the
outcome of interest. This probability can only take values between 0 and 1. A linear regression model
fitted to such data would have the following form:
P = α + β1 x1 + β2 x2 + ⋯ + βk xk
Unfortunately such a model may produce estimates of the probability (P) that are outside the bounds of
0 and 1. To analyse data with a binary outcome we instead use the logistic regression model. The rest of
this course will focus on how to fit logistic regression models and interpret the results.
In order to apply the logistic regression model we need to make a couple of changes to the linear
regression model described above. Firstly, we require a transformation such that the left hand side of the
equation (the outcome) can take any value. Secondly, if the outcome is binary then the errors will not be
Normally distributed so we will need to specify a distribution for the errors that would be appropriate for
binary data.
Logit Transformation
The transformation that is applied to the outcome in a logistic regression model is known as the logit
function.
logit(P) = ln( P / (1 − P) )

The logit (or logistic) function transforms P as follows:

When P = 0,    logit(P) = ln( 0 / 1 ) = ln(0) = −∞
When P = 0.5,  logit(P) = ln( 0.5 / 0.5 ) = ln(1) = 0
When P = 1,    logit(P) = ln( 1 / 0 ) = +∞

Thus logit(P) can range from −∞ to +∞.
If logit(P) is linearly related to x, then the plot of P against x will be S‐shaped (sigmoidal).
[Figure: S‐shaped (sigmoidal) plot of Probability (P), vertical axis, against x, horizontal axis.]
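A curve like this can be generated directly from the inverse of the logit. The short SAS sketch below is purely illustrative (the values alpha = −4 and beta = 0.8, and the plotting range, are arbitrary choices, not taken from any model in these notes); it plots P against x for a model in which logit(P) is linear in x.

* Illustrative only: plot P against x when logit(P) = alpha + beta*x;
DATA sigmoid;
  alpha = -4;    * assumed intercept;
  beta  = 0.8;   * assumed slope;
  DO x = -2 TO 13 BY 0.1;
    p = EXP(alpha + beta*x) / (1 + EXP(alpha + beta*x));  * inverse logit;
    OUTPUT;
  END;
RUN;

PROC SGPLOT DATA=sigmoid;
  SERIES X=x Y=p;
RUN;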
Interpreting logits
For a binary outcome (positive vs negative), we can represent the probability of the “positive” outcome for individuals in group A by P_A, and the probability of the positive outcome for individuals in group B by P_B. The log odds of a positive outcome in each group is then given by:

logit(P_A) = ln( P_A / (1 − P_A) )   and   logit(P_B) = ln( P_B / (1 − P_B) )
The difference between the logits for the two groups A and B is

logit(P_B) − logit(P_A) = ln( P_B / (1 − P_B) ) − ln( P_A / (1 − P_A) )

                        = ln[ ( P_B / (1 − P_B) ) / ( P_A / (1 − P_A) ) ]

                        = ln(OR)                                        (3.1)

That is, the difference between the logit of P_B and the logit of P_A is the natural logarithm of the odds ratio for the event for group B relative to group A.
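As a quick numerical illustration (the probabilities here are invented): if P_B = 0.4 and P_A = 0.25, the odds in group B are 0.4/0.6 = 2/3 and the odds in group A are 0.25/0.75 = 1/3, so the odds ratio is 2 and the difference of the logits is ln(2/3) − ln(1/3) = ln(2) ≈ 0.69.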
Multiple logistic regression
The equation of a multiple logistic regression model can be represented as:
ln( P / (1 − P) ) = α + β1 x1 + β2 x2 + ⋯ + βk xk                        (3.2)
The error distribution chosen for the logistic regression model is the Binomial distribution as the
outcome is binary.
The inverse of the logit transformation can be used to express P (rather than logit P) as a function of the
explanatory variables. Using this inverse transformation, equation (3.2) may also be written in the form:
P = exp(α + β1 x1 + ⋯ + βk xk) / [ 1 + exp(α + β1 x1 + ⋯ + βk xk) ]       (3.3)
This form is useful for calculating the probability of an individual having the outcome, given the values of
their x's (explanatory variables) after having estimated the values of the coefficients (β's).
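As a small worked illustration (the coefficient values here are made up, not taken from any model in these notes), suppose a fitted model has α = −2 and β1 = 0.5, and an individual has x1 = 3. Then

logit(P) = −2 + 0.5 × 3 = −0.5
P = exp(−0.5) / (1 + exp(−0.5)) = 0.607 / 1.607 ≈ 0.38

so the estimated probability of the outcome for that individual is about 0.38.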
Interpretation of coefficients
In the multiple logistic regression equation (3.2), βi represents the additive increase in the loge(odds), also denoted as ln(odds), for an increase of 1 unit in xi, with all the other x's fixed.
As in multiple linear regression, the explanatory variables in a multiple logistic regression model may be
either continuous or categorical. Using a simple logistic regression model with only one covariate for the
purposes of illustration, we can consider the interpretation of the coefficient when the explanatory
variable is (i) dichotomous or (ii) continuous.
(i) dichotomous explanatory variable x (e.g. 0=male and 1=female)
For males:   x = 0 and hence logit(P) = α
For females: x = 1 and hence logit(P) = α + β

Taking the difference: (α + β) − α = β

Hence, β represents the log odds ratio for females vs males. The odds ratio for females vs males is given by exp(β) (i.e. e^β).
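As a quick illustration with an invented value: if the estimated coefficient were b = 0.7, the odds ratio for females vs males would be e^0.7 ≈ 2.0.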
SAS example
– GLOW dataset from Hosmer and Lemeshow
– Outcome: Fracture (coded 1=yes, 0 = no)
– Study factor: Prior fracture (coded 1=yes, 0 = no)
– Explanatory variable: Age (years)
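A program along the following lines could be used to fit this model (the dataset and variable names glow, fracture, priorfrac and age are assumed here; use whatever names appear in your copy of the data):

PROC LOGISTIC DATA=glow;
  MODEL fracture (EVENT='1') = priorfrac age;   * prior fracture and age as explanatory variables;
RUN;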
Analysis of Maximum Likelihood Estimates
Odds Ratio Estimates
Effect          Point Estimate          95% Wald Confidence Limits
Interpretation: Those with a prior fracture have 2.31 times the odds of a subsequent fracture
compared to those without a prior fracture, after adjusting for age.
(ii) Continuous explanatory variable x (e.g. age)
For a given age x = a:    logit(P) = α + βa
For age x = a + 1:        logit(P) = α + β(a + 1)

Taking the difference: [α + β(a + 1)] − [α + βa] = β

Hence, β represents the log odds ratio for a 1 year increase in age. The odds ratio for a 1 year increase in age is given by exp(β) (i.e. e^β).
SAS example continued:
Odds Ratio Estimates
Effect          Point Estimate          95% Wald Confidence Limits
Interpretation: For a one year increase in age the odds of a subsequent fracture are increased by a factor
of 1.04, after adjusting for prior fracture.
(iii) Categorical explanatory variables with more than two levels
As demonstrated above, for a dichotomous variable, xj, exp(βj) represents the odds ratio due to the
presence of the characteristic (assuming xj is coded so that 0=absence and 1=presence of the
characteristic). For categorical variables with more than 2 levels, dummy variables must be created as for
multiple linear regression. One level must be chosen as the reference category, with OR=1
(corresponding to β=0). Then exp(β) for the dummy variables gives the OR for the other levels relative to
the reference category.
For example, if we grouped countries of birth into three categories (Australia/NZ, Asia, Elsewhere) then
we would need to create 2 dummy variables. The choice of coding of these dummy variables then
determines which is the reference category.
Dummy variable 1 Dummy variable 2
Australia/NZ 0 0
Asia 1 0
Elsewhere 0 1
In the coding above, Australia/NZ would be the reference category and the resulting output would contain two odds ratios: the first compares the odds of the outcome for people born in Asia with those born in Australia/NZ, and the second compares the odds of the outcome for people born Elsewhere with those born in Australia/NZ.
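As a minimal sketch of how this coding might be set up (the dataset name mydata, a character variable cob holding the country-of-birth categories, and the outcome variable outcome are all assumptions for illustration), the two dummy variables in the table above could be created in a DATA step and then included in the model:

DATA mydata2;
  SET mydata;
  cob_asia = (cob = 'Asia');        * 1 if born in Asia, 0 otherwise;
  cob_else = (cob = 'Elsewhere');   * 1 if born elsewhere, 0 otherwise;
RUN;

PROC LOGISTIC DATA=mydata2;
  MODEL outcome (EVENT='1') = cob_asia cob_else;
RUN;

Alternatively, the CLASS statement described in the SAS notes later in this lecture can create the dummy variables automatically.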
Fitting a Logistic Regression Model using Maximum Likelihood Estimation
In linear regression we fit models using the method of least squares. Logistic regression models are fitted
using the method of maximum likelihood; a general method that is applied to all other types of
generalised linear models. The likelihood, L, of the observed data is a measure of the
fit of a model. It reflects the probability of observing our data, if the given model was true. Therefore, the
larger the likelihood, the better the fit of the model.
The logistic regression equation can be fitted to a set of data using the method of maximum likelihood.
The computations are even more complex than for a classical regression fitted by least squares, but are
readily carried out using a computer package.
As a very simple illustration, imagine we have a study with 10 participants of whom 4 have the disease of interest. Then in this study the odds of having disease is 4/6 = 0.667.
We could fit a logistic regression model to estimate the odds of having disease. The model would be
Logit(disease) = α
(Note: in this model we have no explanatory variables and so the intercept provides the estimate of the
log odds of the outcome).
The method of maximum likelihood postulates a value for α and then calculates how likely it is that this
model would generate the observed data we have (in this case 4 participants with disease and 6
participants without disease) and searches through all possible values of α to find the value that
maximises the likelihood.
In the graph below, the likelihood is plotted on the vertical axis (strictly, the log‐likelihood is plotted: when we apply the method of maximum likelihood we usually work with the log of the likelihood, as it is easier to handle mathematically). On the horizontal axis we have the postulated values of α (here labelled as _cons). We can see that the maximum of the graph is at approximately −0.4. This is the maximum likelihood estimate of α.
Hence our estimate of Logit(disease) = −0.4, and taking the exponential gives exp(−0.4) ≈ 0.67, as expected from the observed data.
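A plot like the one described can be produced with a few lines of SAS. The sketch below is a rough illustration of the idea (it evaluates the log-likelihood of the 4-out-of-10 data over a grid of postulated values of α) rather than how SAS actually maximises the likelihood:

DATA loglik;
  DO alpha = -2 TO 1 BY 0.01;
    p = EXP(alpha) / (1 + EXP(alpha));    * probability implied by this alpha;
    loglik = 4*LOG(p) + 6*LOG(1 - p);     * 4 with disease, 6 without;
    OUTPUT;
  END;
RUN;

PROC SGPLOT DATA=loglik;
  SERIES X=alpha Y=loglik;                * maximum is near alpha = -0.4;
RUN;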
The same principles apply when we extend the model to have explanatory variables. The method of
maximum likelihood searches through all possible estimates for the parameters and finds the values that
maximise the likelihood of observing the data we have. For example, if males are at greater risk of having
the disease than females then a model that postulates that males have only half the risk of females
would be very unlikely to produce the data we have observed.
Both classical regression and logistic regression are particular members of the wider class of models
called generalised linear models. The method of least squares is an example of the more general method
of maximum likelihood applied to a Normally distributed variable. Thus logistic regression is conceptually
similar to classical regression and the rationale of fitting different models to a data set is the same.
Hypothesis testing
There are two methods of constructing a hypothesis test:
1. The likelihood ratio test, based on the log likelihood, is used for testing any set of regression
coefficients. (This corresponds to an F test in linear regression.) This test uses the transformation
‐2Ln(L) (minus twice the log likelihood). Note that as the likelihood increases (i.e. the fit of the model gets better), ‐2Ln(L) becomes smaller. Changes in ‐2Ln(L) are test statistics, and can be
referred to as deviance differences, where the deviance is the analogue of residual sum of
squares in classical regression. When variables are added into a model the difference between
‐2Ln(L) of the two models represents the effects of the added variables. This reduction in ‐2Ln(L)
is distributed approximately as a chi‐squared statistic with degrees of freedom equal to the
number of variables added.
Consider two models (a) and (b), where model (b) contains q extra terms. The likelihood ratio test statistic is the change in ‐2Ln(L),

‐2Ln(L_a) − (‐2Ln(L_b))

which is referred to the chi‐squared distribution with q degrees of freedom.
2. The Wald z or chi‐squared test may be used to assess the evidence for a single variable. (This corresponds to the t test of a single regression coefficient in classical regression.) Squaring the Wald z (which has an approximate Normal distribution) gives the Wald chi‐squared statistic, which has an approximate chi‐squared distribution with 1 DF. Although the likelihood ratio test is
usually preferred, the Wald test is very convenient to use because it is included in the output
from many programs such as PROC LOGISTIC in SAS. The Wald test for a set of k dummy variables
or interaction terms can be obtained using PROC LOGISTIC; the test statistic has an approximate
chi‐squared distribution with k degrees of freedom. However, the likelihood ratio test is more
reliable than the Wald test particularly when the overall number of observations is small or the
number of observations in a variable category is small.
Confidence interval estimation
The maximum likelihood method gives the estimates, b_i, with their standard errors, SE(b_i), from which confidence intervals may be constructed. As the sampling error for each b_i is assumed to be Normally distributed, the 95% confidence interval for the coefficient is

b_i ± 1.96 × SE(b_i)

and the 95% confidence interval for the odds ratio is

exp[ b_i ± 1.96 × SE(b_i) ]
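As a small worked example with made-up numbers (b = 0.60 and SE(b) = 0.16 are illustrative values, not output from any model in these notes):

95% CI for the coefficient:  0.60 ± 1.96 × 0.16 = (0.286, 0.914)
95% CI for the odds ratio:   exp(0.286) to exp(0.914) = 1.33 to 2.49

so the estimated odds ratio would be reported as exp(0.60) = 1.82 (95% CI 1.33 to 2.49).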
Logistic regression vs Mantel‐Haenszel method
Logistic regression can be used as an alternative to the computationally simpler method of Mantel‐
Haenszel. Logistic regression has the advantage that it gives more information than the Mantel‐
Haenszel method. This point can be illustrated using the asbestos exposure and lung cancer case control
study. Fitting both asbestos and smoking variables in the model gave:
The resulting odds ratio for lung cancer due to asbestos allowing for smoking was 1.78 (95% CI: 1.30 to
2.45), and the odds ratio due to smoking allowing for asbestos was 3.34 (95% CI: 2.10 to 5.31). Hence,
the variables, asbestos and smoking, are represented on an equal basis in the model. By contrast, the
Mantel‐Haenszel analysis does not provide an estimate of the smoking effect allowing for asbestos,
because smoking is the stratification variable. To obtain an estimate of the effect of smoking, adjusted
for asbestos exposure, it would be necessary to repeat the Mantel‐Haenszel analysis after re‐arranging
the data into tables showing the association between smoking and lung cancer after stratifying for
asbestos.
If one factor is regarded as the study factor and the other as a confounder, the above feature confers no
advantage on logistic regression. Often, however, both factors may be regarded as study factors and
both as confounders with respect to the other. Then the logistic regression approach has the advantage
of estimating the effects of both factors simultaneously, each effect being estimated after allowing for
the other.
The real advantage of logistic regression occurs when there are many risk factors to be considered. Even
if all except one were regarded as confounders, the Mantel‐Haenszel method could only be applied after
stratifying for all these factors simultaneously. This would lead to a large number of strata and few
subjects in each, and the stratification would result in loss of information on the effect of the study
factor. Note, in particular, if a stratum contained only one subject it would contribute no information, or
if it contained 2 or 3 subjects, say, but they were all cases, or all controls, or all exposed to the study
factor, or all unexposed, there would be no information. Therefore, in situations where there are several
confounders, stratification will usually lead to a lower power than a logistic regression. Moreover, effect
modification can be explored more readily using logistic regression by including product terms of study
factor and confounder. (This will be discussed further in the next session.)
Another point is that the Mantel‐Haenszel method requires that all variables be categorical, whereas in
logistic regression the risk factors could be either continuous or categorical.
To summarise, logistic regression is a much more flexible method which can deal with more complex
situations.
Further reading: H&L Chapters 1.1 to 1.4, 2 and 3. SMMR §14.2. Applied linear models with SAS Chapter
8, Applied medical statistics using SAS Chapter 8, Chapter 9.
Lecture 1: SAS NOTES
Procedures
PROC LOGISTIC
Logistic regression models can be fitted using the LOGISTIC procedure. All that is required is a MODEL
statement with the appropriate syntax. The MODEL statement will be different depending on whether the records in your
dataset correspond to single observations or grouped observations.
If each record in your dataset corresponds to a single observation or person then the following syntax
would be used:
PROC LOGISTIC [DATA=dataset];
[CLASS variable(s) [/options];]
MODEL outcome_var (EVENT='value') = explanatory variables [/options];
outcome_var is the outcome variable and value is the value this variable takes for the outcome of
interest. For example, MODEL case (EVENT='1'). If you do not include (EVENT='value') in your
program, SAS, by default, takes the lowest value of the outcome_var as the outcome of interest. Note if
you have assigned a format to the outcome variable then the value in the quotes should be the
corresponding format label. For example, if you have assigned a format label of Yes to the value 1 and No
to the value 0 then the statement would read (EVENT='Yes').
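As a brief sketch of the situation just described (the format name yesno, the dataset mydata and the variables case, x1 and x2 are assumptions for illustration):

PROC FORMAT;
  VALUE yesno 1 = 'Yes' 0 = 'No';
RUN;

PROC LOGISTIC DATA=mydata;
  FORMAT case yesno.;
  MODEL case (EVENT='Yes') = x1 x2;   * EVENT= uses the formatted label, not the raw value;
RUN;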
If each record corresponds to grouped observations then the following syntax would be used:
PROC LOGISTIC [DATA=dataset];
[CLASS variable(s) [/options];]
MODEL events/trials = explanatory variables [/options];
events is the variable containing the count of the number of people with the outcome of interest, and
trials is the variable containing the total (those with and without the outcome of interest) number of
people.
Note that only one MODEL statement can be used for each LOGISTIC procedure.
For each term in the model, the following are shown in the output:
Under the heading "Analysis of Maximum Likelihood Estimates":
Estimate the coefficient of that term, b
Standard Error the standard error of the parameter estimate, SE(b)
Chi‐Square = z², where z = b/SE(b), with an approximate chi‐squared distribution with 1 DF
Pr > ChiSq the P‐value for the Wald chi‐square
Under the heading "Odds Ratio Estimates":
Point Estimate = exp(b) = e^b
The LOGISTIC procedure also prints the value of ‐2Ln(L) for the intercept only, for the intercept and
covariates, and the difference which gives an approximate chi‐squared test for the joint effect of all the
explanatory variables (covariates) in the model, with DF equal to the number of covariates. The effect of
adding a variable to a model may be tested using the likelihood ratio test by calculating the change in ‐
2Ln(L) (or equivalently, the change in the chi‐square for covariates) between models with and without
the additional variable. This likelihood ratio (LR) test is a preferable alternative to the Wald test. The
Wald test statistic for a group of variables (e.g. a set of dummy variables) can be obtained using the TEST
statement.
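As a rough sketch of the two-model approach (mydata, outcome, x1, x2 and x3 are placeholder names, not from any dataset used in these notes):

* model without the additional variable x3;
PROC LOGISTIC DATA=mydata;
  MODEL outcome (EVENT='1') = x1 x2;
RUN;

* model with x3 added - the likelihood ratio statistic is the change in
  -2 Log L between the two models, referred to chi-squared with 1 DF;
PROC LOGISTIC DATA=mydata;
  MODEL outcome (EVENT='1') = x1 x2 x3;
RUN;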
Options
Some of the options for MODEL are:
ALPHA=value sets the significance level for the confidence limits. The value must be between 0 and 1.
The default value 0.05 results in the calculation of a 95% CI.
PARMLABEL displays the labels (see the LABEL statement) of the parameters in the
"Analysis of Maximum Likelihood Estimates" table.
CLASS Statement
CLASS variable(s) [/options];
Or, to assign different options to variables, use
CLASS variable (options) variable (options) …;
The CLASS statement can be used to define explanatory variables to be categorical. SAS will automatically
create dummy variables for any variables included in the CLASS statement, like in PROC GLM. The default
in SAS is to use effect coding for the dummy variables (this will not be covered in the course). To use
coding consistent with that adopted in Multiple Regression and in this course, we use the following
options for each variable:
PARAM=keyword Specifies the coding for the dummy (design) variables. The keyword REF specifies
that a referent category be used.
REF=level Specifies which level (i.e. value of the variable) is to be the referent category e.g. REF=0.
Alternatively, REF=FIRST requests that the first level of an ordered set of categories be
used as the referent group. REF=LAST, which is the default, uses the last level as the
referent group.
For example:
* BASE MODEL;
PROC LOGISTIC DATA = cin_ca;
  CLASS smok npart agefi educ / PARAM=ref REF=first;
  MODEL status (EVENT = '1') = smok npart agefi educ age;
RUN;
Notes:
a) All of the explanatory variables, except age, are treated as categorical in the above model.
b) Dummy variables are created using the first level for each variable as the referent category.
c) If value labels have been assigned to a categorical variable, the order of the categories will be
determined by the alphabetical order of the labels.
d) The Wald chi‐square, its degrees of freedom and P‐value will be reported for all variables when the CLASS statement is used. This is very useful when screening variables for inclusion in the model.
Lecture 1: EXAMPLES IN SAS
Example 3: Association between Lung Cancer and Asbestos Exposure: Logistic Regression
Modelling
The case‐control data may be analysed for the asbestos effect on lung cancer, taking account of
smoking. The data are:
asb   smok   cases   controls
 1     0       17        98
 0     0        6       103
 1     1      131       274
 0     1       69       240
As shown earlier (equation 3.1), the difference of two logits is the log of an odds ratio. Since the odds ratio can be
estimated in a case‐control study, logistic regression can be used to analyse the data from studies such as
Example 3. The outcome (P) represents the proportion of (cases + controls) that are cases for each unit of
data. A SAS program is given below that shows how logistic regression models can be fitted for these
data. The corresponding output and a brief discussion of this output follows.
It should be noted that the parameter α has no useful meaning for a case‐control study, as it is
influenced by the sampling fractions of cases and controls. Thus the fitted model cannot be converted
back to give meaningful estimates of P. This corresponds to the well‐known fact that incidences cannot
be estimated from a case‐control study but the odds ratio can.
SAS Program:
* author : me
  date   : today
  purpose: logistic regression of asbestos and lung cancer data
           adjusted for smoking;
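A minimal sketch of a program that would fit the three models discussed in the edited output below (the dataset name asbestos and the column names cases and controls are assumptions; asb and smok match the effect names in the output) is:

DATA asbestos;
  INPUT asb smok cases controls;
  total = cases + controls;         * denominator for the events/trials syntax;
  DATALINES;
1 0 17 98
0 0 6 103
1 1 131 274
0 1 69 240
;
RUN;

* MODEL 1: asbestos only;
PROC LOGISTIC DATA=asbestos;
  MODEL cases/total = asb;
RUN;

* MODEL 2: smoking only;
PROC LOGISTIC DATA=asbestos;
  MODEL cases/total = smok;
RUN;

* MODEL 3: asbestos adjusted for smoking;
PROC LOGISTIC DATA=asbestos;
  MODEL cases/total = asb smok;
RUN;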
Edited Output
MODEL 1:
SC (Schwarz Criterion): Intercept Only = 1035.756; Intercept and Covariates = 1028.194

MODEL 2:
SC (Schwarz Criterion): Intercept Only = 1035.756; Intercept and Covariates = 1008.832

MODEL 3:
SC (Schwarz Criterion): Intercept Only = 1035.756; Intercept and Covariates = 1002.688
P = proportion of total that are cases.
(a) MODEL 1: can be used to estimate the unadjusted effect of asbestos exposure, ignoring smoking.
logit(P) = −1.5202 + 0.5986 asb

The “Estimate” for asb (0.5986) gives the estimated coefficient for asb in the model, i.e. the estimated log odds ratio. SAS uses this estimate in conjunction with its standard error to compute the odds ratio and its 95% confidence interval (1.82, 95% CI 1.33 to 2.49).
Notes:
(ii) the Wald chi‐square statistic is 13.94, 1DF, P=0.0002. This also indicates very strong evidence of
an association between asb and lung cancer.
(b) MODEL 2: can be used to estimate the effect of smoking exposure, not adjusted for asbestos
logit(P) = −2.1676 + 1.2237 smok

The “Estimate” for smoking (1.2237) gives the estimated coefficient for smoking in the model, i.e. the estimated log odds ratio. The odds ratio and its 95% confidence interval are 3.40 (95% CI 2.14 to 5.39).
Notes:
(i) Smoking is a risk factor for lung cancer
(ii) the Wald chi‐square statistic is 27.03, 1DF, P<0.0001 also indicates very strong evidence of an
association between smoking and lung cancer.
(c) MODEL 3: can be used to estimate the effect of asbestos exposure, adjusted for smoking.
logit(P) = −2.4968 + 0.5776 asb + 1.2070 smok
The −2 Log L for the intercept only is again 1028.912, but the −2 Log L for the intercept and covariates is 982.157. Since there are two covariates in the model, subtracting in the same way as we did above gives a chi‐square statistic of 1028.912 − 982.157 = 46.76 with 2 DF, a test of the joint effect of asb and smoking. To test for the effect of asbestos after allowing for smoking we must use the change in −2 Log L between MODEL 3 and MODEL 2, referred to a chi‐squared distribution with 1 DF.
Note: for this analysis to be complete, we would also need to test for effect modification by testing for
interaction between asb and smoking in the model (This will be covered in future lectures).
Lecture 1: PRACTICAL
(a) To obtain a feel of the logit transformation, consider the following data showing the relationship
between coronary heart disease (CHD) and age in a sample of 1000 men:
Frequency Table of Age Group by CHD
(i) Calculate the proportion of men with CHD in each age group, P, and plot this proportion against
age (HINT: use the mid‐point of each age group).
(ii) Calculate logit P and plot it against age.
(iii) Describe the effect of the logit transformation on these data.
(b) The first phase of the Blue Mountains Eye Study was a cross‐sectional study of people aged 49 years
or more living in the Blue Mountains. Subjects had photographs taken of their lenses, which were
then graded for the presence of different types of cataract. Information on possible risk factors for
posterior subcapsular cataract (PSC) was collected using an interviewer‐administered questionnaire.
These data included age and smoking status. A total of 3065 study participants had complete data for
these variables, and information on the coding of the variables is given below.
These data are recorded in Tutorial_2.xlsx. Using these data:
(i) Produce a frequency table for the variable PSC. How many of the 3065 participants had a
posterior subcapsular cataract?
(ii) Fit a logistic regression model to the data to determine whether age is associated with
posterior subcapsular cataract.
(iii) Interpret the estimate for age
(iv) What is the estimated risk of posterior subcapsular cataract for a person aged 60 years?
(v) Is smoking associated with posterior subcapsular cataract after adjusting for age?
(vi) Interpret the coefficients for smoking in this model.
Appendix: Percentage points of the χ2 distribution
From ARMITAGE P, BERRY G, MATTHEWS J.N.S. Statistical methods in medical research. 4th edition. Oxford: Blackwell Scientific Publications, 2001.