SC968 Lecture 5 Worksheet
5.0 OBJECTIVES
To be able to summarise time-to-event data
To become familiar with Cox regression models in Stata
To use the appropriate Stata commands to check the proportional hazards assumption
To examine the results critically
If you have not already done so, download the data from the course website,
http://www.iser.essex.ac.uk/iser/teaching/module-sc968/course-materials,
and save it in your working directory as “survival.dta”.
In order to download the data, you will need the username and password,
which you were given in the first practical.
Try to keep a “clean” copy of the data in your working directory, and save any
other data files that you produce under different names.
Open Stata and set your working directory to your home directory:
cd m:
Open the teaching dataset
use survival, clear
The dataset is a subset of records from our training.dta dataset, which contains a
20 percent random sample of the British Household Panel Survey. Data from
wave 1 (1991) to wave 15 (2005) are included. The file is organised in “long”
(person-wave) format, so there is one record for each wave in which a respondent
participated in the survey. For survival.dta, only respondents who were single
(never married) at wave 1 are selected, and only the variables needed for the
practical have been kept. For simplicity, missing values on the independent
variables have been filled in using the last recorded value.
The first task is to get to know the data by looking at some descriptive
statistics. In preparation for the practical, some variables have been derived
from ones you are already familiar with.
describe
summarize
Then comes the most important part before any survival modelling: preparing
the data.
First, the data must be declared as survival-time data using the stset
command.
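This is a minimal sketch of the declaration. It assumes the person identifier is called pid, the wave number is called wave, and the event indicator is called cohab. The indicator is taken to equal 1 in the wave in which the respondent is first observed cohabiting and 0 otherwise. All three names are hypothetical, so check the output of describe for the actual ones.

```stata
* Hypothetical names: pid (person id), wave (1-15), cohab (0/1 event flag)
* Each person-wave record covers the interval (previous wave, current wave];
* id() links together the records belonging to the same respondent.
stset wave, id(pid) failure(cohab==1)
```

By default, stset ends each respondent's follow-up at the first failure. Any later records are therefore excluded from the analysis (_st = 0).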
Look at the variables that stset creates (_st, _d, _t0 and _t) for the first
100 records, and check that they have been set correctly.
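One way to do this check is sketched below. As above, pid and cohab are hypothetical names for the person identifier and event indicator.

```stata
* _st = 1 if the record is used, _d = 1 if the event occurs in the record,
* _t0 and _t = start and end of the interval at risk covered by the record.
* sepby(pid) draws a separator line between respondents.
list pid wave cohab _st _d _t0 _t in 1/100, sepby(pid)
```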
stdes
Are there any gaps?
The next task is to produce a summary of survival times and rates for the
complete sample.
stsum
stsum, by(sex)
You can plot the survivor functions for men and women using Kaplan-Meier
curves.
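Kaplan-Meier curves are drawn with sts graph. The sketch below assumes the data have been stset as described above and that sex is the gender variable used elsewhere in the worksheet. The second command is an optional addition. It plots a kernel-smoothed estimate of the hazard itself, rather than the survivor function.

```stata
* Kaplan-Meier survivor functions: proportion not yet cohabiting, by gender
sts graph, by(sex)

* Optional: kernel-smoothed hazard estimates, by gender
sts graph, hazard by(sex)
```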
Now test for a difference in the time to cohabitation between men and women
using the log-rank test.
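A sketch of the test, assuming the data are stset as above. The logrank option is the default for sts test and is written out here only for clarity.

```stata
* Log-rank test of the equality of the survivor functions for men and women
sts test sex, logrank
```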
Now let’s fit some Cox proportional hazards models to the data. The command
for this is stcox. For example, to predict time to partnership from gender:
xi:stcox i.sex
First look at gender and age group. Run the Cox proportional hazards models
for these and fill in the values in Table 1 below:
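A possible pair of models for the table is sketched below. It assumes agegroup is a categorical variable with non-negative integer codes, as the i. factor-variable prefix requires.

```stata
* stcox reports hazard ratios by default (add the nohr option for coefficients)
* Gender only
stcox i.sex
* Gender adjusted for age group
stcox i.sex i.agegroup
```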
What happens to the hazard ratio for gender when you adjust for age group?
Why do you think the hazard ratio changes?
Adjusted for age group and gender
Adjusted for age group, gender and other SEP measures
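The three sets of models behind these columns and the questions below could be run as follows. This is a sketch under assumptions: the SEP variable names sep1, sep2 and sep3 are hypothetical placeholders for the actual measures in survival.dta, and the measures are assumed to be categorical.

```stata
* Hypothetical names: replace sep1 sep2 sep3 with the real SEP measures
* Unadjusted: one SEP measure at a time
foreach v of varlist sep1 sep2 sep3 {
    stcox i.`v'
}

* Each SEP measure adjusted for age group and gender
foreach v of varlist sep1 sep2 sep3 {
    stcox i.`v' i.sex i.agegroup
}

* All SEP measures together, adjusted for age group and gender
stcox i.sep1 i.sep2 i.sep3 i.sex i.agegroup
```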
Which measure of SEP has the highest hazard ratios in the unadjusted
models? And which the lowest?
Are they all significant predictors of time to cohabitation?
How would you interpret the differences between the unadjusted hazard ratios
and the hazard ratios adjusted for age and sex?
Does each SEP measure still predict time to cohabitation when you control for
the other SEP measures?
What do you conclude from this?
5.4 THE PROPORTIONAL HAZARDS ASSUMPTION
stcoxkm, by(sex)
What do you notice? Are the observed lines close to the predicted lines?
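A second graphical check, offered here as an optional addition, is the log-log plot produced by stphplot. Under proportional hazards, the curves for men and women should be roughly parallel.

```stata
* Plots -ln(-ln(S(t))) against ln(analysis time), one curve per gender
stphplot, by(sex)
```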
You can also carry out a formal test of the proportional hazards assumption
for gender. One way to do this is to introduce an interaction term between
gender and time. This can be done with the options tvc and texp in the stcox
command.
The tvc option specifies that gender should interact with a function of time, and
the texp option specifies that this function is log(time). This allows the effect of
gender on survival to vary with time since entry to the study.
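A sketch of the formal test, assuming sex is coded with two consecutive integer values (for example 1 and 2). With that coding, the tvc coefficient is the gender by log(time) interaction. Here sex enters tvc() as a plain variable.

```stata
* Main effect of gender plus an interaction of gender with ln(analysis time).
* The output has a "main" block (usual hazard ratio) and a "tvc" block
* (the time interaction).
stcox i.sex, tvc(sex) texp(ln(_t))
```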
You can see that the output gives you two rows for sex. The second relates to
the interaction of sex with log(time). If the interaction is significant, there is
evidence that the effect of gender varies by time. Does it?
Is there any evidence that the hazards are non-proportional for any of the other
covariates?
Run the graphical and formal checks of the proportional hazards assumption for
the variable agegroup.
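Because agegroup has several categories, one approach is to interact an indicator for each non-reference group with log(time). This is a sketch only. The indicator prefix agd is a hypothetical choice and must not clash with existing variable names.

```stata
* Graphical checks by age group
stcoxkm, by(agegroup)
stphplot, by(agegroup)

* Formal check: one 0/1 indicator per age group (agd1, agd2, ...)
capture drop agd*
tabulate agegroup, generate(agd)
* Drop the first indicator so the first age group is the reference
drop agd1
stcox i.sex agd*, tvc(agd*) texp(ln(_t))
```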
If there is evidence that the proportional hazards assumption is not met, then
re-run the analyses treating the age groups as separate strata. This estimates
a separate baseline hazard function for each group, thus allowing them to be
non-proportional. To do this, use the strata option.
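For example, for the gender model (other covariates can be added in the usual way):

```stata
* A separate baseline hazard for each age group; no hazard ratio
* is estimated for agegroup itself
stcox i.sex, strata(agegroup)
```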
If the proportional hazards assumption for the SEP measures had been violated,
one option would have been to re-run the analysis using the time-varying
versions of the SEP measures.
Some respondents may have withdrawn from the survey before finding a
partner. We are now going to investigate predictors of drop-out.
The variable wdrawn takes the value 1 if the respondent dropped out of the
survey before wave 15 and 0 otherwise.
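The data then need to be re-declared with drop-out as the failure event. This is a sketch that again uses the hypothetical names pid and wave. It assumes wdrawn may be constant across a respondent's records. If it were used directly as the failure variable, stset would end follow-up at that respondent's first record. The hypothetical variable dropout therefore flags the event only on the last observed record.

```stata
* Flag drop-out only on each respondent's last observed record
capture drop dropout
bysort pid (wave): generate byte dropout = (wdrawn == 1) & (_n == _N)

* Re-declare the data with drop-out as the failure event
stset wave, id(pid) failure(dropout==1)
```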
Once the data have been re-declared with stset using drop-out as the failure
event, use list, stdes and stsum to check that you have set up the survival
variables correctly.
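The checks, and one possible first model for predictors of drop-out, are sketched below. The name pid is hypothetical, and the choice of covariates is only an example.

```stata
* Checks on the drop-out declaration
list pid wave wdrawn _st _d _t0 _t in 1/100, sepby(pid)
stdes
stsum

* Example model: gender and age group as predictors of drop-out
stcox i.sex i.agegroup
```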