A practical introduction
to nordpred
Michael Otterstatter
Cancer Surveillance
Centre for Chronic Disease Prevention and Control
Public Health Agency of Canada
March 7-9, 2011
Projections
Accurate projections are essential for planning…
…but how best to predict future events?
Many models are available
• Poisson regression (classic APC model)
• other generalized linear models (power models)
• generalized additive models
• simple average methods
• univariate (ARIMA) & multivariate (VAR) time series
• state-space models (American Cancer Society)
• functional data analysis
• Bayesian models (versions of all of the above)
Practical introduction to nordpred
This presentation is a practical introduction to one approach
to projections: the nordpred package for cancer projections
[Diagram: observed values → NORDPRED → projected values]
Theoretical and philosophical considerations are important,
but will not be covered here
The nordpred package
• R software package that predicts trends in cancer incidence
using a version of the traditional age-period-cohort model
• written by Bjørn Møller (Cancer Registry of Norway) and
Harald Fekjaer
• background and example:
• Møller B, et al. 2003. Prediction of cancer incidence in the Nordic
countries: Empirical comparison of different approaches. Statistics in
Medicine 22:2751-2766
• Møller B, et al. 2002. Prediction of cancer incidence in the Nordic
countries up to the year 2020. European Journal of Cancer Prevention
11, suppl. 1
The nordpred package
Available free-of-charge (but see license agreement):
www.kreftregisteret.no/en/Research/Projects/Nordpred/Nordpred-software/
# Nordpred: R (www.r-project.org) & S-PLUS
# (www.insightful.com) functions
# for prediction of cancer incidence (as used in the
# Nordpred project)
# Written by: Bjørn Møller and Harald Fekjaer, 2000-2003
# Version 1.1. updated to correct for wrong estimates
# in young age groups
# License: GNU version 2
Recent examples of nordpred
• Coupland, V. H. et al. 2010. The future burden of cancer in London compared
with England. Journal of Public Health, 32:83.
• Mork, J. et al. 2010. Time trends in pharyngeal cancer incidence in Norway
1981-2005. Cancer Causes and Control, 21:1397
• Parkin, D. M. et al. 2009. The potential for prevention of colorectal cancer in
the UK. European Journal of Cancer Prevention, 18:179.
• Aitken, R. et al. 2008. Cancer incidence and mortality projections in New
South Wales, 2007 to 2011. Cancer Institute NSW.
• Olsen, A. H. et al. 2008. Cancer mortality in the United Kingdom: Projections
to the year 2025. British Journal of Cancer, 99:549.
• Parkin, D. M. et al. 2008. Predicting the impact of the screening programme
for colorectal cancer in the UK. Journal of Medical Screening, 15:163.
• Ferlay, J. et al. 2007. Estimates of the cancer incidence and mortality in
Europe in 2006. Annals of Oncology, 18:581.
• Møller, H. et al. 2007. The future burden of cancer in England: Incidence and
numbers of new patients in 2020. British Journal of Cancer, 96:1484.
• Quinn, M. J. et al. 2003. Cancer mortality trends in the EU and acceding
countries up to 2015. Annals of Oncology, 14:1148.
Basic steps for using nordpred
1. Reading the nordpred package
2. Input data
3. Generate projections
4. Get results
5. Plot results
6. Explore options
1. Reading the nordpred package
Download the ‘nordpred.s’ file to your preferred working directory and
read the package into R
setwd("C:/Documents and Settings/My Documents/nordpred")
source("nordpred.s")
Example #1: Colon cancer in Norway*
* Data and code for this example available at nordpred website
2. Input data: structure
Two tables with the same layout: numbers of cancer cases, and population sizes
Columns: five-year periods
58-62 63-67 68-72 73-77 78-82 83-87 88-92 93-97
Rows: five-year age groups
0-4 5-9 10-14 15-19 20-24 25-29 30-34 35-39 40-44
45-49 50-54 55-59 60-64 65-69 70-74 75-79 80-84 85+
Note: there must be 18 five-year age groups in both the cases/deaths
and population data files
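As a rough illustration of this layout, the sketch below builds a hypothetical cases table in base R, with 18 five-year age groups as rows and the eight observed periods as columns, and then checks the dimensions. The object name `cases_demo` and the simulated counts are assumptions made for illustration; they are not the Norwegian data used in the example.

```r
# Hypothetical cases table in the layout described above (simulated, not real data)
age_groups <- c(paste(seq(0, 80, by = 5), seq(4, 84, by = 5), sep = "-"), "85+")
periods <- c("58-62", "63-67", "68-72", "73-77", "78-82", "83-87", "88-92", "93-97")

set.seed(1)
cases_demo <- matrix(
  rpois(length(age_groups) * length(periods), lambda = 20),
  nrow = length(age_groups),
  dimnames = list(age_groups, periods))

# Check the structure before passing a table like this on to a projection routine
stopifnot(nrow(cases_demo) == 18)
print(dim(cases_demo))
print(head(cases_demo, 3))
```

A population table would be checked the same way, since both inputs need the same 18 rows.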
2. Input data: reading in files
Cancer cases or deaths, observed
indata <- read.table(
  "colon-men-Norway.txt",   # observed counts
  header = TRUE,
  sep = ",",
  row.names = 1)
2. Input data: reading in files (continued...)
Population sizes, observed and projected
inpop1 <- read.table(
  "men-Norway.txt",         # person-years during observed period
  header = TRUE,
  sep = ",",
  row.names = 1)
inpop2 <- read.table(
  "men-Norway-pred.txt",    # person-years during projection period
  header = TRUE,
  sep = ",",
  row.names = 1)
inpop <- cbind(inpop1, inpop2)   # merge population files into a single table
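The merge step relies on both population tables listing the same age groups in the same row order, because cbind() pairs rows by position. A minimal base R sketch with hypothetical person-years (the objects `pop_obs` and `pop_proj` and all values are invented for illustration):

```r
# Shortened to three age groups for illustration; values are hypothetical
age_groups <- c("0-4", "5-9", "10-14")
pop_obs  <- data.frame(X88.92 = c(700, 650, 640),
                       X93.97 = c(720, 690, 650),
                       row.names = age_groups)
pop_proj <- data.frame(X98.02 = c(730, 715, 690),
                       X03.07 = c(725, 730, 715),
                       row.names = age_groups)

# cbind() pairs rows by position, so confirm the row labels agree first
stopifnot(identical(rownames(pop_obs), rownames(pop_proj)))
pop_all <- cbind(pop_obs, pop_proj)
print(pop_all)
```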
3. Producing projections: overview
• Producing projections is a two-step process
1. fitting a model to the observed data
2. generating projections (unknown future values) based on the
fitted model
• Nordpred allows these steps to be separate or combined
1. Separate: use nordpred.estimate and
nordpred.prediction functions in sequence
2. Combined: use single nordpred function
3. Producing projections: overview (continued...)
• Nordpred fits an APC regression model to the observed data
Cases = (Age) + (Period) + (Cohort) + Period term
Age, Period and Cohort enter as categorical variables; the final
Period term is a continuous variable (the ‘drift’ or ‘trend’)
• Due to collinearity among age, period and cohort, linear
effects of period and cohort cannot be estimated
simultaneously; a common linear trend (‘drift’) is
estimated instead
• Predictions assume future cohort and period effects are
equal to last estimated effect in the fitted model
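To make the collinearity point concrete, here is a minimal base R sketch that fits an age-period-cohort Poisson regression with a continuous drift term to simulated data. It is an illustration under stated assumptions, not nordpred's internal code: the data, the variable names and the use of a log link with a person-years offset are all choices made for this example.

```r
# Simulated data: 6 age groups x 5 periods (hypothetical values)
set.seed(42)
d <- expand.grid(age = 1:6, period = 1:5)
d$cohort <- max(d$age) - d$age + d$period   # cohort index implied by age and period
d$pyr <- 1e5                                # person-years per cell
d$cases <- rpois(nrow(d),
                 lambda = d$pyr * exp(-8 + 0.3 * d$age + 0.05 * d$period))

# Age, period and cohort as categorical terms, plus period as a continuous drift term
fit <- glm(cases ~ factor(age) + period + factor(period) + factor(cohort) +
             offset(log(pyr)),
           family = poisson(), data = d)

# Collinearity shows up as aliased (NA) coefficients among the categorical terms
aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)

# The common linear trend ('drift') is still estimated
print(coef(fit)["period"])
```

R drops the redundant period and cohort terms on its own, which is one way of seeing why only a single shared linear trend can be identified.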
3. Producing projections: fit model
a. Fit APC regression model to input data
est <- nordpred.estimate(
  cases = indata,       # input data: cases
  pyr = inpop,          # input data: person-years
  noperiod = 4,         # number of observed 5-year periods to use in model fit (at least 3)
  startestage = 5,      # youngest age group to include in regression model
  linkfunc = "power5")  # "power5" or "poisson" link
# optional: show model information
print(est$glm)
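The choice between the two links matters mainly when a trend is extended into the future. The sketch below uses link objects from base R's stats package to extend the same one-period change linearly on a log scale and on a fifth-root (power 1/5) scale. The two starting rates are hypothetical, and this is only an illustration of how the link scales behave, not a reproduction of nordpred's fitting code.

```r
# Link objects from base R's stats package
log_link    <- make.link("log")
power5_link <- power(1/5)          # g(mu) = mu^(1/5)

# Hypothetical rates per 100,000 in two successive observed periods
rate_prev <- 40
rate_last <- 44

# Extend the same one-period change linearly on each link scale
steps <- 1:5
extend <- function(link) {
  slope <- link$linkfun(rate_last) - link$linkfun(rate_prev)
  link$linkinv(link$linkfun(rate_last) + steps * slope)
}
proj_log    <- extend(log_link)
proj_power5 <- extend(power5_link)

print(round(rbind(log = proj_log, power5 = proj_power5), 1))
```

With an increasing trend, the log scale implies exponential growth, while the power scale grows more slowly.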
3. Producing projections: predict
b. Generate projected values from fitted model
res <- nordpred.prediction(
  est,                                  # fitted model
  startuseage = 6,                      # youngest age group using fitted model for projections*
  cuttrend = c(0, .25, .5, .75, .75),   # proportional reduction in drift, for each
                                        # successive projection period (vector of values)
  recent = TRUE)                        # use average trend across all observed periods (FALSE)
                                        # or only from last 10 years (TRUE)
# optional: summary of numbers and model
summary(res)
*age groups younger than the ‘startuseage’ cut-off are projected using the
average rate from the most recent 10 years of observed data
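One plausible reading of "proportional reduction in drift" is that each projection period keeps a share (1 - cuttrend) of the estimated drift, and that these shares accumulate over successive periods. The base R sketch below works through that arithmetic for the cuttrend vector used above. The accumulation rule is an assumption made for illustration, not a transcription of nordpred's code.

```r
cuttrend <- c(0, 0.25, 0.5, 0.75, 0.75)

# Share of the estimated drift retained in each successive projection period
retained <- 1 - cuttrend

# Cumulative drift (in units of one period's drift) beyond the last observed period
cumulative_drift <- cumsum(retained)

print(data.frame(period = seq_along(cuttrend), cuttrend, retained, cumulative_drift))
```

Under this reading the trend is never reversed, only progressively flattened.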
3. Producing projections: prediction output
Observed and predicted values
X58.62 X63.67 X68.72 X73.77 X78.82 X83.87 X88.92 X93.97 X98.02 X03.07 X08.12 X13.17 X18.22
0-4 1 0 0 0 0 0 0 0 0 0 0 0 0
5-9 0 0 0 0 0 0 0 0 0 0 0 0 0
10-14 1 0 1 1 0 0 0 0 0 0 0 0 0
15-19 1 0 0 1 3 0 1 3 2 2.2 2.5 2.6 2.5
20-24 1 4 2 3 3 2 4 3 3.6 3.1 3.4 3.9 4
25-29 4 1 6 6 6 11 7 7 9.6 9.5 8.9 9.9 11.8
30-34 13 11 6 12 15 26 14 10 19.9 20.1 19.3 17.5 19.4
35-39 27 16 18 15 29 22 31 18 24.2 31.1 30.6 28.5 25.9
40-44 32 37 32 28 40 50 64 71 59.5 51 62.3 59.5 55.4
45-49 54 75 73 73 75 72 122 112 163 120.5 104.7 122.5 117
50-54 66 103 128 139 125 134 185 190 265.7 278.7 210 183.7 212.4
55-59 105 124 197 210 257 254 230 269 448.5 444 453.9 347.7 311.5
60-64 189 212 252 283 369 442 446 436 612.6 742.8 738.5 739.5 585.4
65-69 197 258 303 358 484 603 683 668 785.1 867.2 1047.7 1045.1 1045.4
70-74 223 286 376 403 554 663 754 880 1003.5 1001 1104.7 1336.8 1356.1
75-79 192 262 312 402 502 640 775 913 1100.2 1118.9 1121.9 1246 1535.7
80-84 146 160 215 278 337 415 513 615 780.3 894.6 921.4 937.8 1066.7
85+ 81 95 122 151 209 278 328 379 549.3 636.6 755.6 830.1 891.6
Columns X58.62 to X93.97 are observed; columns X98.02 to X18.22 are predicted
3. Producing projections: prediction output
Model information
Prediction done with:
Number of periods predicted (nopred): 5
Trend used in predictions (cuttrend): 0 , 0.25 , 0.5 , 0.75 , 0.75
Number of periods used in estimate (noperiod): 8
P-value for goodness of fit: 0.7
Used recent (recent): TRUE
P-value for recent: 0.0292
First age group used (startuseage): 6
First age group estimated (startestage): 5
4. Get results: basic output
a. Projected numbers and rates, with options
pred_num <- nordpred.getpred(
  res,
  incidence = FALSE,     # numbers (FALSE) or rates (TRUE)
  standpop = NULL,       # crude (NULL) or age-stand. rates (vector of weights)
  excludeobs = TRUE,     # show (FALSE) or omit (TRUE) observed values
  byage = TRUE,          # output for all ages only (FALSE) or by age groups (TRUE)
  agegroups = c(1:18))   # output for specified age groups
4. Get results: basic output (continued…)
b. Generate crude and age-standardized rates
predrates_crude <- nordpred.getpred(
res, incidence=TRUE, standpop=NULL,
excludeobs=TRUE, byage=FALSE,
agegroups=c(1:18))
Canada1991 <- c(.0694,.0694,.0680,.0684,.0750,.0899,
.0924,.0833,.0760,.0595,.0476,.0440,.0423,.0385,
.0296,.0221,.0135,.0102)
predrates_stnd <- nordpred.getpred(
res, incidence=TRUE, standpop=Canada1991,
excludeobs=TRUE, byage=FALSE,
agegroups=c(1:18))
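For readers who want to see what separates the crude rate from the age-standardized rate, here is a small base R sketch of direct standardization. The counts, person-years and weights are hypothetical and cover only three age groups; the sketch does not call nordpred and makes no claim about how nordpred.getpred computes its rates internally.

```r
# Hypothetical counts and person-years for three age groups in one period
cases <- c(12, 85, 240)
pyr   <- c(400000, 300000, 150000)

age_rates <- cases / pyr * 1e5               # age-specific rates per 100,000

# Crude rate: all cases over all person-years
crude_rate <- sum(cases) / sum(pyr) * 1e5

# Hypothetical standard population weights (chosen to sum to 1)
std_weights <- c(0.5, 0.3, 0.2)

# Directly age-standardized rate: weighted average of age-specific rates
std_rate <- sum(std_weights * age_rates)

print(c(crude = crude_rate, standardized = std_rate))
```

If the weights are set to the population's own age shares, pyr / sum(pyr), the standardized rate equals the crude rate, which is a quick sanity check.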
4. Get results: basic output (continued…)
[Output: crude rates and age-standardized rates]
4. Get results: basic output (continued…)
c. Generate projected values and export to a file
# GENERATE PROJECTED VALUES
pred <- nordpred.getpred (res, incidence=FALSE,
standpop=NULL, excludeobs=TRUE,
byage=TRUE, agegroups=c(1:18))
# EXPORT PROJECTED VALUES TO TEXT FILE
write.table (pred, file="Norway_males_CRC.txt",
sep=",", na="NA", row.names=TRUE,
col.names=TRUE)
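A quick way to confirm that an export like this can be read back is a round trip through a temporary file. The base R sketch below uses four values taken from the projected table in these slides (age groups 15-19 and 20-24, periods X98.02 and X03.07); the object names and the temporary file are assumptions for illustration.

```r
# Small table of projected counts (rows: age groups, columns: periods)
pred_demo <- matrix(c(2.04, 3.59, 2.19, 3.12),
                    nrow = 2,
                    dimnames = list(c("15-19", "20-24"), c("X98.02", "X03.07")))

out_file <- tempfile(fileext = ".txt")   # temporary file, so nothing is overwritten
write.table(pred_demo, file = out_file, sep = ",", na = "NA",
            row.names = TRUE, col.names = TRUE)

# The header has one field fewer than the data rows, so read.table()
# takes the first column as row names
pred_back <- read.table(out_file, header = TRUE, sep = ",")
print(pred_back)
```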
4. Get results: basic output (continued…)
c. Generate projected values and export to a file
Rows: five-year age groups; columns: five-year periods
X98.02 X03.07 X08.12 X13.17 X18.22
0-4 0 0 0 0 0
5-9 0 0 0 0 0
10-14 0 0 0 0 0
15-19 2.04 2.19 2.53 2.6 2.55
20-24 3.59 3.12 3.35 3.88 3.97
25-29 8.72 8.18 7.38 8.06 9.43
30-34 16.8 16.17 15.01 13.41 14.62
35-39 18.55 22.61 21.57 19.83 17.77
40-44 56.4 44.87 52.83 49.85 45.83
45-49 153.31 110.05 90.47 103.89 98.04
50-54 265.49 272.73 202.67 171.14 194.16
55-59 461.84 458.36 463.53 354.79 308.74
60-64 658.68 789.19 793.43 792.46 628.49
65-69 828.27 958.51 1154 1172.51 1169.38
70-74 1008.7 1033.45 1198.54 1454.99 1500.87
75-79 1100.34 1140.9 1179.74 1380.22 1706.57
80-84 742.88 866.72 914.87 964.69 1153.69
85+ 529.27 609.17 739.58 836.86 930.5
5. Plot results
a. Generate plots of crude and/or standardized rates
plot(res,
  incidence = TRUE,             # numbers (FALSE) or rates (TRUE)
  standpop = NULL,              # crude (NULL) or age-stand. rates (vector of weights)
  agegroups = "all",
  new = TRUE,                   # overlay on current plot (FALSE) or create a new plot (TRUE)
  lty = c(1, 2),                # line types (1=solid, 2=dashed, etc.)
  col = c(1, 1),                # colours (1=black, 2=red, etc.)
  main = "Colorectal Cancer",   # plot title
  xlab = "Period",              # x-axis label
  ylab = "Incidence rate")      # y-axis label
5. Plot results (continued…)
[Figure: "Colorectal Cancer". Incidence rate (y-axis, 0 to 50) by period (x-axis, X58.62 to X18.22)]
5. Plot results (continued…)
b. Overlaying plots
plot(res,
  incidence = TRUE,
  standpop = NULL,
  agegroups = "all",
  new = TRUE,
  lty = c(1, 2),
  col = c(1, 1),
  main = "Colorectal Cancer",
  xlab = "Period",
  ylab = "Incidence rate")
plot(res, incidence = TRUE, standpop = Canada1991,
  new = FALSE, lty = c(1, 2), col = c(2, 2))
# US2000: vector of 18 standard-population weights, defined in the same way as
# Canada1991 (its values are not shown in these slides)
plot(res, incidence = TRUE, standpop = US2000,
  new = FALSE, lty = c(1, 2), col = c(4, 4))
legend(9, 10, c("Crude rate", "Canada 1991", "US 2000"),
  text.col = c(1, 2, 4))
5. Plot results (continued…)
[Figure: "Colorectal Cancer". Incidence rate (y-axis, 0 to 50) by period (x-axis, X58.62 to X18.22), with overlaid lines for the crude rate and for rates standardized to Canada 1991 and US 2000]
Producing projections: using a single function
• The two steps (fitting a model to the observed data, then
generating projections from the fitted model) can be combined
• Combined: use the single nordpred function in place of
nordpred.estimate followed by nordpred.prediction
6. Exploring nordpred options: overview
projections <- nordpred(
  cases = ,        # input data: cases
  pyr = ,          # input data: person-years
  startestage = ,  # youngest age group to include in model fit
  startuseage = ,  # youngest age group to use for projections
  noperiods = ,    # fixed number of observed periods in the projection base (e.g., 5),
                   # or let nordpred choose from a range (e.g., 4:6) using a goodness-of-fit test
  recent = ,       # trend: historical average (FALSE), only from last 10 years (TRUE),
                   # or nordpred decides (NULL) based on curvature of trend
  cuttrend = ,     # proportional reduction in drift, for each successive
                   # projection period (vector of values)
  linkfunc = )     # "power5" or "poisson" link
6. Exploring nordpred options: observed periods
[Figure: projections based on 3 (minimum), 7 and 10 observed periods]
6. Exploring nordpred options: trend
Compare projections using recent vs. historical trend
proj_recent_trend <- nordpred(
  cases = indata, pyr = inpop, startestage = 5, startuseage = 6,
  noperiods = 8, recent = TRUE,
  cuttrend = c(0, .25, .5, .75, .75), linkfunc = "power5")
proj_historical_trend <- nordpred(
  cases = indata, pyr = inpop, startestage = 5, startuseage = 6,
  noperiods = 8, recent = FALSE,
  cuttrend = c(0, .25, .5, .75, .75), linkfunc = "power5")
# CODE FOR PRODUCING OVERLAID PLOTS
plot(proj_historical_trend, incidence = TRUE, standpop = NULL, agegroups = "all",
  new = TRUE, lty = c(1, 2), col = c(1, 1),
  main = "Incidence Rate of Colorectal Cancer Among Norwegian Males",
  xlab = "Period", ylab = "Incidence rate")
plot(proj_recent_trend, incidence = TRUE, standpop = NULL, agegroups = "all",
  new = FALSE, lty = c(1, 2), col = c(2, 2))
legend(9, 10, c("Historical trend", "Recent trend"), text.col = c(1, 2))
6. Exploring nordpred options: trend
Compare projections using recent vs. historical trend
[Figure: "Incidence Rate of Colorectal Cancer Among Norwegian Males". Incidence rate (y-axis, 0 to 50) by period (x-axis, X58.62 to X18.22); lines for historical trend and recent trend]
6. Exploring nordpred options: link function
Compare projections using power5 vs. Poisson link
proj_power5 <- nordpred(
  cases = indata, pyr = inpop, startestage = 5, startuseage = 6,
  noperiods = 8, recent = FALSE,
  cuttrend = c(0, .25, .5, .75, .75), linkfunc = "power5")
proj_poisson <- nordpred(
  cases = indata, pyr = inpop, startestage = 5, startuseage = 6,
  noperiods = 8, recent = FALSE,
  cuttrend = c(0, .25, .5, .75, .75), linkfunc = "poisson")
# CODE FOR PRODUCING OVERLAID PLOTS
plot(proj_poisson, incidence = TRUE, standpop = NULL, agegroups = "all",
  new = TRUE, lty = c(1, 2), col = c(1, 1),
  main = "Incidence Rate of Colorectal Cancer Among Norwegian Males",
  xlab = "Period", ylab = "Incidence rate")
plot(proj_power5, incidence = TRUE, standpop = NULL, agegroups = "all",
  new = FALSE, lty = c(1, 2), col = c(2, 2))
legend(9, 10, c("Poisson", "Power5"), text.col = c(1, 2))
6. Exploring nordpred options: link function
Compare projections using power5 vs. Poisson link
[Figure: "Incidence Rate of Colorectal Cancer Among Norwegian Males". Incidence rate (y-axis, 0 to 60) by period (x-axis, X58.62 to X18.22); lines for Poisson and Power5 links]
6. Exploring nordpred options: choose carefully!
[Figure: number of new colorectal cancer cases (y-axis, 0 to 10000) by period (x-axis, X58.62 to X18.22), with one line per option setting: Poisson link, power link, omit oldest obs., use recent trend, use cut-trend, increase cut-trend]
Acknowledgment and disclaimer
The Cancer Surveillance and Epidemiology Networks have been
made possible through a financial contribution from Health Canada,
provided by the Canadian Partnership Against Cancer.
The views expressed herein do not necessarily represent the views of
the Canadian Partnership Against Cancer or those of Health Canada.
Supplemental information
Calculation of predicted rates in nordpred
Projections: 1. linear trend component
2. non-linear cohort component
3. fix period to last estimated value
Observed rates Predicted rates
1995-1999 2000-2004 2005-2009
exp(A1+5⋅D+C12+P5) exp(A1+6⋅D+C13+P6) exp(A1+7⋅D+C13+P6)
exp(A2+5⋅D+C11+P5) exp(A2+6⋅D+C12+P6) exp(A2+7⋅D+C13+P6)
exp(A3+5⋅D+C10+P5) exp(A3+6⋅D+C11+P6) exp(A3+7⋅D+C12+P6)
exp(A4+5⋅D+C9+P5) exp(A4+6⋅D+C10+P6) exp(A4+7⋅D+C11+P6)
exp(A5+5⋅D+C8+P5) exp(A5+6⋅D+C9+P6) exp(A5+7⋅D+C10+P6)
exp(A6+5⋅D+C7+P5) exp(A6+6⋅D+C8+P6) exp(A6+7⋅D+C9+P6)
exp(A7+5⋅D+C6+P5) exp(A7+6⋅D+C7+P6) exp(A7+7⋅D+C8+P6)
exp(A8+5⋅D+C5+P5) exp(A8+6⋅D+C6+P6) exp(A8+7⋅D+C7+P6)
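The index pattern in the table above can be sketched in a few lines of base R: the drift multiplier follows the period index, while the cohort and period effects stop advancing once they reach the highest index shown (C13 and P6). The function name `rate_expression` and the three constants are assumptions read off the table for illustration. The sketch only rebuilds the expressions and does not compute rates, since no effect estimates are given.

```r
n_age       <- 8    # age groups A1..A8 in the table
last_cohort <- 13   # highest cohort index appearing in the table
last_period <- 6    # highest period index appearing in the table

rate_expression <- function(a, p) {
  cohort <- min(n_age - a + p, last_cohort)   # hold cohort effect at the last index
  period <- min(p, last_period)               # hold period effect at the last index
  sprintf("exp(A%d+%d*D+C%d+P%d)", a, p, cohort, period)
}

# Rebuild the three columns of the table (period indices 5, 6 and 7)
tab <- sapply(5:7, function(p) sapply(1:n_age, rate_expression, p = p))
print(tab, quote = FALSE)
```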