Computing Direct and Indirect Standardized Rates and Risks With The STDRATE Procedure
Paper 423-2013
ABSTRACT
In epidemiological and health care studies, a common goal is to establish relationships between various
factors and event outcomes. But outcome measures such as rates or risks can be biased by confounding.
You can control for confounding by dividing the population into homogeneous strata and estimating rate or
risk based on a weighted average of stratum-specific rate or risk estimates. This paper reviews the concepts
of standardized rate and risk and introduces the STDRATE procedure, which is new in SAS/STAT® 12.1.
PROC STDRATE computes directly standardized rates and risks by using Mantel-Haenszel estimates, and
it computes indirectly standardized rates and risks by using standardized morbidity/mortality ratios (SMR).
PROC STDRATE also provides stratum-specific summary statistics, such as rate and risk estimates and
confidence limits.
INTRODUCTION
Epidemiology is the study of the occurrence and distribution of health-related states or events in specified
populations. It is also the study of causal mechanisms for health phenomena in populations (Friis and
Sellers 2009, p. 5). A goal of epidemiology is to establish relationships between various factors (such as
exposure to a specific chemical) and event outcomes (such as incidence of disease). Two commonly used
event frequency measures are rate and risk, which are defined as follows:
• An event rate in a defined population is a measure of the frequency with which an event occurs in a
specified period of time. That is, an event rate is the number of new events divided by population-time
(for example, person-years) over the time period (Kleinbaum, Kupper, and Morgenstern 1982, p. 100).
• An event risk in a defined population is the probability that an event occurs in a specified time period.
That is, an event risk is the number of events divided by the population size in the time period.
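The two definitions differ only in the denominator. As a minimal sketch (written in Python rather than SAS; the function names and the cohort numbers are hypothetical and serve only as an illustration), they can be expressed as:

```python
def event_rate(events, person_time):
    """Number of new events divided by population-time (for example, person-years)."""
    return events / person_time


def event_risk(events, population_size):
    """Number of events divided by the population size in the time period."""
    return events / population_size


# Hypothetical cohort: 12 new events among 1,000 people who were
# followed for a combined 4,000 person-years.
rate = event_rate(12, 4000)   # events per person-year
risk = event_risk(12, 1000)   # probability of an event in the period
print(f"rate = {rate:.4f} per person-year, risk = {risk:.4f}")
```

The rate has units of events per unit of population-time, whereas the risk is a unitless probability.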
Event rates and risks can be biased by confounding, which occurs when other variables that are associated
with exposure influence the outcome. For example, when event rates vary for different age groups of a
population, the crude rate for the population (unadjusted for age structure) might not be a meaningful
summary statistic. In particular, the crude rate might be misleading when it is used to compare two
populations that differ in their age structures.
A common strategy for controlling confounding is stratification. You begin by subdividing the population into
several strata that are defined by levels of the confounding variables, such as age. You estimate the effect of
exposure on the event outcome within each stratum, and then you combine the resulting stratum-specific
effect estimates into an overall estimate.
Standardized overall rate and risk estimates that are based on stratum-specific estimates adjust for the
effects of confounding variables. These estimates provide meaningful summary statistics and allow valid
comparisons of populations. There are two types of standardization:
• Direct standardization uses the weights from a standard or reference population to compute the
weighted average of stratum-specific rate or risk estimates in the study population. When you use the
same reference population to compute directly standardized estimates for two populations, you can
also compare the resulting estimates.
SAS Global Forum 2013 Statistics and Data Analysis
• Indirect standardization uses the stratum-specific rate or risk estimates in the reference population to
compute the expected number of events in the study population. The ratio of the observed number
of events to the computed expected number of events in the study population is the standardized
morbidity ratio (SMR). SMR is also the standardized mortality ratio if the event is death; you can use it
to compare rates or risks between the study and reference populations.
The STDRATE (pronounced “standard rate”) procedure provides both directly standardized and indirectly
standardized rate and risk estimates. In addition, if an effect (such as the rate difference between two
populations) is homogeneous across strata, PROC STDRATE also provides the Mantel-Haenszel method
(Greenland and Rothman 2008, p. 271) to compute a pooled estimate of the effect that is based on these
stratum-specific effect estimates.
Note: The term standardization has different meanings in other statistical applications. For example, the
STANDARD procedure standardizes numeric variables in a SAS data set to a given mean and standard
deviation.
The following three sections describe the main features of PROC STDRATE: direct standardization, Mantel-
Haenszel estimation, and indirect standardization and SMR. Each section includes an example. A summary
section follows.
DIRECT STANDARDIZATION
Direct standardization uses the weights from a standard or reference population to compute the weighted
average of stratum-specific estimates in the study population. The directly standardized rate is computed as
$$\hat{\lambda}_{ds} = \frac{\sum_j T_{rj}\,\hat{\lambda}_{sj}}{T_r}$$
where $\hat{\lambda}_{sj}$ is the rate in the jth stratum of the study population, $T_{rj}$ is the population-time in the jth stratum of
the reference population, and $T_r = \sum_k T_{rk}$ is the total population-time in the reference population.
The standardized risk can also be computed similarly.
The direct standardization is applicable when the study population is large enough to provide stable stratum-
specific estimates. The directly standardized estimate is the overall crude estimate in the study population if
it has the same strata distribution as the reference population.
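The weighted average can be sketched in a few lines of Python. This is an illustration of the formula, not the PROC STDRATE implementation; the function name and the three-stratum data are hypothetical. The sketch also checks the property stated above: when the reference population has the same strata distribution as the study population, the directly standardized rate equals the crude rate.

```python
def directly_standardized_rate(study_events, study_time, ref_time):
    """Weighted average of study stratum rates, weighted by reference population-time."""
    total_ref_time = sum(ref_time)
    weighted = sum(
        t_ref * (d / t)
        for d, t, t_ref in zip(study_events, study_time, ref_time)
    )
    return weighted / total_ref_time


# Hypothetical study population: events and person-years in three strata.
events = [10, 40, 150]
pyears = [5000, 4000, 1000]

# A reference population with a different strata distribution.
reference = [2000, 4000, 4000]
dsr = directly_standardized_rate(events, pyears, reference)

# A reference population with the same strata distribution as the study
# population (the study person-years scaled by 10).
same_shape = [10 * t for t in pyears]
dsr_same = directly_standardized_rate(events, pyears, same_shape)

crude = sum(events) / sum(pyears)
print(f"crude = {crude:.4f}, standardized = {dsr:.4f}, same distribution = {dsr_same:.4f}")
```

Because the hypothetical reference population puts more weight on the high-rate stratum, the standardized rate is larger than the crude rate in this example.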
When you use the same reference population to derive standardized estimates for different populations, you
can also use the estimated difference and estimated ratio statistics to compare the resulting estimates.
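The following Alaska data set contains the stratum-specific mortality information for the state of Alaska
(Alaska Bureau of Vital Statistics 2000a, b):
data Alaska;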
State='Alaska';
input Sex $ Age $ Death PYear comma9.;
datalines;
Male 00-14 37 81,205
Male 15-34 68 93,662
Male 35-54 206 108,615
Male 55-74 369 35,139
Male 75+ 556 5,491
Female 00-14 78 77,203
Female 15-34 181 85,412
Female 35-54 395 100,386
Female 55-74 555 32,118
Female 75+ 479 7,701
;
The variables Sex and Age are the grouping variables that form the strata in the standardization, and the
variables Death and PYear indicate the number of events and person-years, respectively. The COMMA9.
informat is specified in the DATA step to read numerical values that contain commas in PYear.
The following Florida data set contains the corresponding stratum-specific mortality information for the state
of Florida (Florida Department of Health 2000, 2012):
data Florida;
State='Florida';
input Sex $ Age $ Death :comma8. PYear :comma11.;
datalines;
Male 00-14 1,189 1,505,889
Male 15-34 2,962 1,972,157
Male 35-54 10,279 2,197,912
Male 55-74 26,354 1,383,533
Male 75+ 42,443 554,632
Female 00-14 906 1,445,831
Female 15-34 1,234 1,870,430
Female 35-54 5,630 2,246,737
Female 55-74 18,309 1,612,270
Female 75+ 53,489 868,838
;
The crude rate for Alaska (2924/626932 = 0.004664) is less than the crude rate for Florida (76455/15577105
= 0.004908). However, because the age distributions in the two states differ widely, these crude rates might
not provide a valid comparison.
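A crude rate is the total number of events divided by the total population-time, ignoring the strata. The following Python sketch (an illustration outside of SAS, with the rows copied from the Alaska data set above) reproduces the Alaska figure:

```python
# (sex, age group, deaths, person-years), copied from the Alaska data set
alaska = [
    ("Male", "00-14", 37, 81205),
    ("Male", "15-34", 68, 93662),
    ("Male", "35-54", 206, 108615),
    ("Male", "55-74", 369, 35139),
    ("Male", "75+", 556, 5491),
    ("Female", "00-14", 78, 77203),
    ("Female", "15-34", 181, 85412),
    ("Female", "35-54", 395, 100386),
    ("Female", "55-74", 555, 32118),
    ("Female", "75+", 479, 7701),
]

deaths = sum(row[2] for row in alaska)
pyears = sum(row[3] for row in alaska)
crude_rate = deaths / pyears   # deaths per person-year

print(f"{deaths}/{pyears} = {crude_rate:.6f}")
```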
To compare standardized rates for the two populations, you can combine the two data sets to form a single
data set to be used in the DATA= option. The following TwoStates data set contains the data sets Alaska and
Florida, where the variable State identifies the two states:
data TwoStates;
length State $ 7.;
set Alaska Florida;
run;
The following US data set contains the stratum-specific person-years information for the United States (U.S.
Bureau of the Census 2011):
data US;
input Sex $ Age $ PYear comma12.;
datalines;
Male 00-14 30,854,207
Male 15-34 40,199,647
Male 35-54 40,945,028
Male 55-74 19,948,630
Male 75+ 6,106,351
Female 00-14 29,399,168
Female 15-34 38,876,268
Female 35-54 41,881,451
Female 55-74 22,717,040
Female 75+ 10,494,416
;
The following statements invoke PROC STDRATE and compute the direct standardized rates for the states
of Florida and Alaska by using the United States as the reference population:
[Table: Standardization Information]
The EFFECT option in the STRATA statement and the STAT=RATE option in the PROC STDRATE statement
display the “Strata Rate Effect Estimates” table, as shown in Figure 2. The EFFECT=RATIO option in the
PROC STDRATE statement requests that the stratum-specific rate ratio statistics be displayed.
The “Strata Rate Effect Estimates” table shows that except for the age group 75+, Alaska has lower mortality
rates than Florida for male groups and higher mortality rates for female groups. For the age group 75+,
Alaska has much higher mortality rates than Florida for both male and female groups.
With ODS Graphics enabled, the PLOTS(ONLY)=EFFECT option displays only the strata effect plot; the
default strata rate plot is not displayed. The strata effect measure plot includes the stratum-specific effect
measures and their associated confidence limits, as shown in Figure 3. The STAT=RATE option and the
EFFECT=RATIO option request that the strata rate ratios be displayed. By default, confidence limits are
generated at a 95% confidence level. This plot displays the stratum-specific rate ratios that are shown in the
“Strata Rate Effect Estimates” table in Figure 2.
The “Directly Standardized Rate Estimates” table in Figure 4 displays directly standardized rates and related
statistics.
[Figure 4: Directly Standardized Rate Estimates table (columns include State and the standardized rate
estimate, standard error, and 95% normal confidence limits)]
The MULT=1000 suboption in the STAT=RATE option requests that rates per 1,000 person-years be displayed.
The table in Figure 4 shows that although the crude rate in the Florida population (4.908) is higher than the
crude rate in the Alaska population (4.664), the resulting standardized rate in the Florida population (4.0385)
is actually lower than the standardized rate in the Alaska population (4.2289).
The EFFECT=RATIO option requests that the “Rate Effect Estimates” table in Figure 5 display the log rate
ratio statistics of the two directly standardized rates.
[Figure 5: Rate Effect Estimates table (columns: State (Alaska, Florida), Rate Ratio, Log Rate Ratio,
Standard Error, Z, Pr > |Z|)]
The table in Figure 5 shows that the rate ratio statistic is 1.047 and that the resulting p-value is 0.0335,
indicating that the death rate is significantly higher in Alaska than in Florida at the 5% significance level.
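The rate ratio and the log rate ratio are two scales for the same comparison. As a small Python sketch, using only the two standardized rates quoted in the text for Figure 4 (the standard error is not reproduced here, so the p-value is not recomputed):

```python
import math

# Directly standardized rates per 1,000 person-years, as quoted in the text
alaska_std = 4.2289
florida_std = 4.0385

rate_ratio = alaska_std / florida_std    # Alaska relative to Florida
log_rate_ratio = math.log(rate_ratio)    # scale on which the Z test is reported

print(f"rate ratio = {rate_ratio:.3f}, log rate ratio = {log_rate_ratio:.3f}")
```

A rate ratio of 1 corresponds to a log rate ratio of 0, which is the null value for the test.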
MANTEL-HAENSZEL ESTIMATION
Assuming that an effect, such as the rate difference between two populations, is homogeneous across strata,
each stratum provides an estimate of the same effect. You can derive a pooled estimate of the effect from
these stratum-specific effect estimates, and you can use the Mantel-Haenszel method to estimate such an
effect. For a homogeneous rate difference effect between two populations, the Mantel-Haenszel estimate is
identical to the difference between two directly standardized rates, but it uses weights that are derived from
the two populations instead of from an explicitly specified reference population.
That is, for population k, k=1 and k=2, the standardized rates are
$$\hat{\lambda}_k = \frac{\sum_j w_j\,\hat{\lambda}_{kj}}{\sum_j w_j}$$
where $\hat{\lambda}_{kj}$ is the rate in the jth stratum of population k and the weights are derived from the two population-
times,
$$w_j = \frac{T_{1j}\,T_{2j}}{T_{1j} + T_{2j}}$$
where $T_{kj}$ is the population-time in the jth stratum of population k.
The Mantel-Haenszel difference statistic is then given by
$$\hat{\lambda}_1 - \hat{\lambda}_2$$
You can also apply the Mantel-Haenszel method to other homogeneous effects between populations, such
as the rate ratio, risk difference, and risk ratio.
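The following School data set contains the number of students who have respiratory symptoms and the
total number of students, by household smoking status, household pets, and grade:
data School;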
input Smoking $ Pet $ Grade $ Case Student;
datalines;
Yes Yes K-1 109 807
Yes Yes 2-3 106 791
Yes Yes 4-5 112 868
Yes No K-1 168 1329
Yes No 2-3 162 1337
Yes No 4-5 183 1594
No Yes K-1 284 2403
No Yes 2-3 266 2237
No Yes 4-5 273 2279
No No K-1 414 3398
No No 2-3 372 3251
No No 4-5 382 3270
;
The variables Pet and Grade are the grouping variables that form the strata in the standardization, and
the variable Smoking identifies students who have smokers in their households. The variables Case and
Student indicate the number of students who have respiratory symptoms and the total number of students,
respectively.
The following statements invoke PROC STDRATE and compute the Mantel-Haenszel risk difference statistic
between students who have smokers in their household and students who do not:
The METHOD=MH option requests the Mantel-Haenszel estimation, and the STAT=RISK option specifies
the risk statistic for standardization. When you specify the EFFECT=DIFF option, PROC STDRATE uses the
default risk difference statistics to compute the risk effect between the study populations.
The POPULATION statement specifies the options that are related to the study populations. The EVENT=
option specifies the variable for the number of cases, the TOTAL= option specifies the variable for the
number of students, and the GROUP=SMOKING option specifies the variable Smoking, which identifies the
smoking groups in the DATA= data set.
The STRATA statement names the variables, Pet and Grade, that form the strata in the standardization. The
ORDER=DATA option sorts the strata by order of their appearance in the input data set, and the EFFECT
option displays the strata effects.
The “Standardization Information” table in Figure 6 displays the standardization information.
[Figure 6: Standardization Information]
With ODS Graphics enabled, PROC STDRATE displays the strata risk plot by default. As shown in Figure 7,
this plot displays the stratum-specific risk estimates and their confidence limits, together with the overall
crude risks, for the two study populations. By default, strata levels are displayed on the vertical axis.
When you specify the STAT=RISK option in the PROC STDRATE statement, the EFFECT option in the
STRATA statement displays the “Strata Risk Effect Estimates” table, as shown in Figure 8. The EFFECT=DIFF
option in the PROC STDRATE statement requests that strata risk differences be displayed.
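Figure 8 (excerpt): 95% normal confidence limits for the stratum-specific risk differences

Stratum Index    Lower        Upper
1                -0.043766    0.010001
2                -0.042366    0.012169
3                -0.035225    0.016740
4                -0.025554    0.016405
5                -0.027373    0.013892
6                -0.017120    0.021148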
The “Strata Risk Effect Estimates” table shows that for the stratum of students in Grade 4–5 who have no
pets in their households, the risk is higher for students who have no smokers in their households than for
students who do have smokers in their households. For all other strata, the risk is lower for students without
household smokers than for students with household smokers. The difference is not significant in any
stratum because the null value 0 lies between the lower and upper confidence limits in every stratum.
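The confidence limits for a stratum-specific risk difference can be checked by hand. The following Python sketch assumes a normal approximation with binomial variances for the two risks; that variance formula and the function name are assumptions made for this illustration, although the result agrees with the limits listed for stratum 1 (Pet=Yes, Grade=K-1).

```python
from statistics import NormalDist


def risk_difference_ci(d1, n1, d2, n2, alpha=0.05):
    """Risk difference (group 1 minus group 2) with normal confidence limits."""
    p1, p2 = d1 / n1, d2 / n2
    diff = p1 - p2
    # Assumed binomial variance of each stratum-specific risk estimate
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, diff - z * se, diff + z * se


# Stratum 1 of the School data set: households without smokers (284 of 2403)
# compared with households with smokers (109 of 807)
diff, lower, upper = risk_difference_ci(284, 2403, 109, 807)
print(f"difference = {diff:.6f}, 95% limits = ({lower:.6f}, {upper:.6f})")
```

Because the interval contains 0, the difference in this stratum is not significant at the 5% level.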
With ODS Graphics enabled, the PLOTS=EFFECT option displays the plot that includes the stratum-specific
risk effect measures and their associated confidence limits, as shown in Figure 9. The EFFECT=DIFF
option requests that the risk difference be displayed. By default, confidence limits are generated with a 95%
confidence level. This plot displays the stratum-specific risk differences in the “Strata Risk Effect Estimates”
table in Figure 8.
The “Mantel-Haenszel Standardized Risk Estimates” table in Figure 10 displays the Mantel-Haenszel
standardized risks and related statistics.
[Figure 10: Mantel-Haenszel Standardized Risk Estimates table (columns include Smoking and the
standardized risk estimate, standard error, and 95% normal confidence limits)]
The EFFECT=DIFF option requests that the “Risk Effect Estimates” table display the risk difference statistic
for the two Mantel-Haenszel standardized risks, as shown in Figure 11.
The table in Figure 11 shows that although the standardized risk for students without household smokers
is lower than the standardized risk for students with household smokers, the difference (–0.00698) is not
significant at the 5% significance level (p-value = 0.1418).
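The Mantel-Haenszel risk difference can be reproduced from the School data set. The following Python sketch applies the Mantel-Haenszel weights with the number of students in place of population-time, which is an assumption made here for the risk statistic; the function name is hypothetical, and the sketch does not compute the standard error or the p-value.

```python
# (pet, grade, cases with smokers, students with smokers,
#  cases without smokers, students without smokers), from the School data set
school = [
    ("Yes", "K-1", 109, 807, 284, 2403),
    ("Yes", "2-3", 106, 791, 266, 2237),
    ("Yes", "4-5", 112, 868, 273, 2279),
    ("No", "K-1", 168, 1329, 414, 3398),
    ("No", "2-3", 162, 1337, 372, 3251),
    ("No", "4-5", 183, 1594, 382, 3270),
]


def mh_standardized_risks(strata):
    """Return Mantel-Haenszel standardized risks (without smokers, with smokers)."""
    sum_no = sum_yes = sum_w = 0.0
    for _pet, _grade, d_yes, n_yes, d_no, n_no in strata:
        w = n_yes * n_no / (n_yes + n_no)   # Mantel-Haenszel weight for the stratum
        sum_no += w * d_no / n_no
        sum_yes += w * d_yes / n_yes
        sum_w += w
    return sum_no / sum_w, sum_yes / sum_w


risk_no, risk_yes = mh_standardized_risks(school)
mh_difference = risk_no - risk_yes
print(f"without smokers = {risk_no:.5f}, with smokers = {risk_yes:.5f}, "
      f"difference = {mh_difference:.5f}")
```

The computed difference agrees with the value of -0.00698 that is reported for Figure 11.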
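INDIRECT STANDARDIZATION AND SMR
Indirect standardization applies the stratum-specific rates in the reference population to the population-time
in the study population to compute the expected number of events,
$$E = \sum_j T_{sj}\,\hat{\lambda}_{rj}$$
where $T_{sj}$ is the population-time in the jth stratum of the study population and $\hat{\lambda}_{rj}$ is the rate in the jth stratum
of the reference population.
With the expected number of events, E, SMR is
$$R_{sm} = \frac{D}{E}$$
where D is the observed number of events.
With the computed $R_{sm}$, you compute an indirectly standardized rate for the study population as
$$\hat{\lambda}_{is} = R_{sm}\,\hat{\lambda}_r$$
where $\hat{\lambda}_r$ is the overall crude rate in the reference population.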
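The following Florida_C43 data set contains the stratum-specific skin cancer mortality information for the
state of Florida:
data Florida_C43;
input Age $ Event PYear :comma11.;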
datalines;
00-04 0 953,785
05-14 0 1,997,935
15-24 4 1,885,014
25-34 14 1,957,573
35-44 43 2,356,649
45-54 72 2,088,000
55-64 70 1,548,371
65-74 126 1,447,432
75-84 136 1,087,524
85+ 73 335,944
;
Age is a grouping variable that forms the strata in the standardization, and the variables Event and PYear
identify the number of events and total person-years, respectively. The COMMA11. informat is specified in the
DATA step to read numerical values that contain commas in PYear.
The following US_C43 data set contains the corresponding stratum-specific mortality information for the
United States in 2000 (Miniño et al. 2002; U.S. Bureau of the Census 2011):
data US_C43;
input Age $ Event :comma7. PYear :comma12.;
datalines;
00-04 0 19,175,798
05-14 1 41,077,577
15-24 41 39,183,891
25-34 186 39,892,024
35-44 626 45,148,527
45-54 1,199 37,677,952
55-64 1,303 24,274,684
65-74 1,637 18,390,986
75-84 1,624 12,361,180
85+ 803 4,239,587
;
The following statements invoke PROC STDRATE and request indirect standardization to compare death
rates between Florida and the United States:
The DATA= and REFDATA= options name the study data set and reference data set, respectively. The
METHOD=INDIRECT option requests indirect standardization. The STAT=RATE option specifies the rate as
the frequency measure for standardization, and the MULT=100000 suboption (which is the default) displays
the rates per 100,000 person-years in the table output and graphics output. The PLOTS=ALL option requests
all plots that are appropriate for indirect standardization.
The POPULATION and REFERENCE statements specify the options that are related to the study and
reference populations, respectively. The EVENT= and TOTAL= options specify variables for the number of
events and person-years in the populations, respectively.
The STRATA statement lists the variable, Age, that forms the strata. The STATS option requests a strata
information table that contains stratum-specific statistics such as crude rates, and the SMR option requests
a strata SMR estimates table.
The “Standardization Information” table in Figure 12 displays the standardization information.
[Figure 12: Standardization Information]
The STATS option in the STRATA statement requests that the “Indirectly Standardized Strata Statistics”
table in Figure 13 display the strata information and expected number of events at each stratum. The
MULT=100000 suboption in the STAT=RATE option requests that crude rates per 100,000 person-years be
displayed. The Expected Events column displays the expected number of events when the stratum-specific
rates in the reference data set are applied to the corresponding person-years in the study data set.
[Figure 13: Indirectly Standardized Strata Statistics table (study population columns include Stratum Index,
Age, Observed Events, Population-Time Value and Proportion, Crude Rate, and Standard Error)]
With ODS Graphics enabled, the PLOTS=ALL option displays all appropriate plots. When you request
indirect standardization and a rate statistic, these plots include the strata distribution plot, the strata rate plot,
and the strata SMR plot. By default, strata levels are displayed on the vertical axis for these plots.
The strata distribution plot displays proportions for stratum-specific person-years in the study and reference
populations, as shown in Figure 14.
The strata distribution plot displays the proportions in the “Indirectly Standardized Strata Statistics” table in
Figure 13. In the plot in Figure 14, the proportions of the study population are identified by the blue lines,
and the proportions of the reference population are identified by the red lines. The plot shows that the
study population has higher proportions of person-years in older age groups and lower proportions in
younger age groups than the reference population.
The strata rate plot displays stratum-specific rate estimates in the study and reference populations, as shown
in Figure 15. This plot displays the rate estimates in the “Indirectly Standardized Strata Statistics” table in
Figure 13. In addition, the plot displays the confidence limits for the rate estimates in the study population
and the overall crude rates for the two populations.
The SMR option in the STRATA statement requests that the “Strata SMR Estimates” table display the strata
SMR at each stratum. (See Figure 16.) The MULT=100000 suboption in the STAT=RATE option requests
that the reference rates per 100,000 person-years be displayed. The table shows that SMR is less than 1 at
three age strata (55–64, 65–74, and 75–84).
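Figure 16 (excerpt): 95% normal confidence limits for the stratum-specific SMR

Stratum Index    Lower     Upper
1                .         .
2                .         .
3                0.0406    4.0154
4                0.7304    2.3373
5                0.9226    1.7093
6                0.8333    1.3339
7                0.6449    1.0395
8                0.8072    1.1487
9                0.7919    1.1118
10               0.8841    1.4104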
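The stratum-specific SMR values and their limits can be checked against the two data sets. The following Python sketch assumes a normal approximation in which the standard error of the SMR is the square root of the observed events divided by the expected events; that formula is an assumption of this illustration, although it agrees with the limits listed for Figure 16. Strata that have no observed events in the study population are skipped, mirroring the missing limits for strata 1 and 2.

```python
from statistics import NormalDist

# (age, Florida events, Florida person-years, U.S. events, U.S. person-years)
strata = [
    ("00-04", 0, 953785, 0, 19175798),
    ("05-14", 0, 1997935, 1, 41077577),
    ("15-24", 4, 1885014, 41, 39183891),
    ("25-34", 14, 1957573, 186, 39892024),
    ("35-44", 43, 2356649, 626, 45148527),
    ("45-54", 72, 2088000, 1199, 37677952),
    ("55-64", 70, 1548371, 1303, 24274684),
    ("65-74", 126, 1447432, 1637, 18390986),
    ("75-84", 136, 1087524, 1624, 12361180),
    ("85+", 73, 335944, 803, 4239587),
]

z = NormalDist().inv_cdf(0.975)   # two-sided 95% limits
limits = {}
below_one = []
for age, d, t_study, d_ref, t_ref in strata:
    if d == 0:
        continue   # assumption: no limits are computed without observed events
    expected = t_study * d_ref / t_ref   # reference rate applied to study person-years
    smr = d / expected
    se = d ** 0.5 / expected             # assumed normal-approximation standard error
    limits[age] = (smr - z * se, smr + z * se)
    if smr < 1:
        below_one.append(age)

print("SMR below 1:", below_one)
```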
The strata SMR plot displays stratum-specific SMR estimates and their confidence limits, as shown in
Figure 17. The plot displays the SMR estimates in the “Strata SMR Estimates” table in Figure 16.
The METHOD=INDIRECT option requests that the “Standardized Morbidity/Mortality Ratio” table be
displayed. (See Figure 18.) The table displays the SMR, its confidence limits, and the test for the null hypothesis
H0: SMR = 1. The default ALPHA=0.05 option requests that 95% confidence limits be constructed.
The 95% normal confidence limits contain the null hypothesis value SMR=1, and the hypothesis of SMR=1
is not rejected at the α=0.05 level from the normal test.
The “Indirectly Standardized Rate Estimates” table in Figure 19 displays the indirectly standardized rate and
related statistics.
[Figure 19: Indirectly Standardized Rate Estimates table (columns include the standardized rate estimate,
standard error, and 95% normal confidence limits)]
The indirectly standardized rate estimate is the product of the SMR and the crude rate estimate for the
reference population. The table in Figure 19 shows that although the crude rate in the state of Florida
(3.4359) is much higher than the crude rate in the United States (2.6366), the resulting standardized rate
(2.6829) is close to the crude rate in the United States.
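These figures follow from the two data sets. The following Python sketch (an illustration outside of SAS; the variable names are hypothetical) computes the expected number of events, the SMR, and the indirectly standardized rate per 100,000 person-years:

```python
# (events, person-years) by age stratum, copied from Florida_C43 and US_C43
florida = [
    (0, 953785), (0, 1997935), (4, 1885014), (14, 1957573), (43, 2356649),
    (72, 2088000), (70, 1548371), (126, 1447432), (136, 1087524), (73, 335944),
]
us = [
    (0, 19175798), (1, 41077577), (41, 39183891), (186, 39892024),
    (626, 45148527), (1199, 37677952), (1303, 24274684), (1637, 18390986),
    (1624, 12361180), (803, 4239587),
]

observed = sum(d for d, _ in florida)
# Expected events: reference stratum rates applied to study person-years
expected = sum(t_s * d_r / t_r for (_, t_s), (d_r, t_r) in zip(florida, us))
smr = observed / expected

mult = 100000   # report rates per 100,000 person-years
florida_crude = mult * observed / sum(t for _, t in florida)
us_crude = mult * sum(d for d, _ in us) / sum(t for _, t in us)
indirect_rate = smr * us_crude   # SMR times the reference crude rate

print(f"expected = {expected:.2f}, SMR = {smr:.4f}")
print(f"crude: Florida = {florida_crude:.4f}, U.S. = {us_crude:.4f}")
print(f"indirectly standardized rate = {indirect_rate:.4f}")
```

Because the SMR is close to 1, the indirectly standardized rate stays close to the crude rate of the reference population.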
SUMMARY
In comparing the outcome measure of rate or risk between two populations, the use of the overall crude
rate or risk might not be appropriate because of confounding. You can derive directly standardized and
indirectly standardized rate or risk estimates based on stratum-specific estimates by removing the effects of
confounding variables. These estimates provide useful summary statistics and allow valid comparison of the
populations.
Although standardization provides useful summary standardized statistics, it is not a substitute for individual
comparisons of stratum-specific estimates. The STDRATE procedure provides summary statistics, such as
rate and risk estimates and their confidence limits, in each stratum. PROC STDRATE also displays these
stratum-specific statistics by using ODS Graphics.
REFERENCES
Alaska Bureau of Vital Statistics (2000a), “2000 Annual Report, Appendix I: Population Overview,” Accessed
February 2012.
URL http://www.hss.state.ak.us/dph/bvs/PDFs/2000/annual_report/Appendix_I.pdf
Alaska Bureau of Vital Statistics (2000b), “2000 Annual Report: Deaths,” Accessed February 2012.
URL http://www.hss.state.ak.us/dph/bvs/PDFs/2000/annual_report/Death_chapter.pdf
Florida Department of Health (2000), “Florida Vital Statistics Annual Report 2000,” Accessed February 2012.
URL http://www.flpublichealth.com/VSBOOK/pdf/2000/Population.pdf
Florida Department of Health (2012), “Florida Death Query System,” Accessed February 2012.
URL http://www.floridacharts.com/charts/DeathQuery.aspx
Friis, R. H. and Sellers, T. A. (2009), Epidemiology for Public Health Practice, 4th Edition, Sudbury, MA:
Jones & Bartlett.
Greenland, S. and Rothman, K. J. (2008), “Introduction to Stratified Analysis,” in K. J. Rothman, S. Greenland,
and T. L. Lash, eds., Modern Epidemiology, 3rd Edition, Philadelphia: Lippincott Williams & Wilkins.
Kleinbaum, D. G., Kupper, L. L., and Morgenstern, H. (1982), Epidemiologic Research: Principles and
Quantitative Methods, Research Methods Series, New York: Van Nostrand Reinhold.
Miniño, A. M., Arias, E., Kochanek, K. D., Murphy, S. L., and Smith, B. L. (2002), “Deaths: Final Data for
2000,” Accessed February 2012.
URL http://www.cdc.gov/nchs/data/nvsr/nvsr50/nvsr50_15.pdf
U.S. Bureau of the Census (2011), “Age and Sex Composition: 2010,” Accessed February 2012.
URL http://www.census.gov/prod/cen2010/briefs/c2010br-03.pdf
ACKNOWLEDGMENT
The author is grateful to Bob Rodriguez of the SAS Advanced Analytics Division for his valuable assistance
in the preparation of this paper.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author:
Yang Yuan
SAS Institute Inc.
111 Rockville Pike, Suite 1000
Rockville, MD 20850
301-838-7030
310-838-7410 (Fax)
[email protected]
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.