Concordance
1 Concordance
\begin{align}
\tau_a &= \frac{c-d}{c+d+t_x+t_y+t_{xy}} \tag{1} \\
\tau_b &= \frac{c-d}{\sqrt{(c+d+t_x)(c+d+t_y)}} \tag{2} \\
\gamma &= \frac{c-d}{c+d} \tag{3} \\
D &= \frac{c-d}{c+d+t_x} \tag{4} \\
C &= (D+1)/2 = \frac{c+t_x/2}{c+d+t_x} \tag{5}
\end{align}
where $c$, $d$, $t_x$, $t_y$, and $t_{xy}$ are the number of pairs of observations that are concordant, discordant, tied on $x$ (but not $y$), tied on $y$ (but not $x$), and tied on both, respectively.
Kendall's tau-a (1) is the most conservative; ties shrink the value towards zero.
Somers' D (4) treats ties in y as incomparable; pairs that are tied in x (but not y) score
as 1/2, as we can see from equation (5), which is the concordance statistic.
Kendall's tau-b (2) can be viewed as a version of Somers' D that is symmetric in x and y.
The first four statistics range from -1 to 1, similar to the correlation coefficient r. The concordance (5) ranges from 0 to 1, which matches the scale for a probability.
Why is C defined using Somers' D rather than one of the other three?
If y is a 0/1 variable, then C = AUROC, the area under the receiver operating characteristic curve, which is well established for binary outcomes. (Proving this simple theorem is harder than it looks, but the result is well known.)
For survival data, this choice will agree with Harrell's C. More importantly, as we will see
below, it has strong connections to standard tests for equality of survival curves.
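The binary-outcome identity is easy to check numerically. The following minimal sketch (the simulated data and the names ybin and xscore are invented for illustration) compares concordance() with the AUC computed through its Mann-Whitney form:

library(survival)
set.seed(1)
ybin   <- rbinom(100, 1, 0.4)          # a 0/1 outcome
xscore <- ybin + rnorm(100)            # a noisy predictor of that outcome
cfit   <- concordance(ybin ~ xscore)   # C for a binary response
# AUROC via the Mann-Whitney U statistic: U / (n1 * n0)
U   <- wilcox.test(xscore[ybin == 1], xscore[ybin == 0])$statistic
auc <- U / (sum(ybin == 1) * sum(ybin == 0))
c(C = unname(coef(cfit)), AUC = unname(auc))   # the two values agree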
Direct
n= 11
Concordance= 0.7818 se= 0.1255
concordant discordant tied.x tied.y tied.xy
43 12 0 0 0
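As a small worked check using the counts just above: with 43 concordant pairs, 12 discordant pairs, and no ties among the $11 \cdot 10/2 = 55$ pairs, equation (4) gives $D = (43-12)/55 = 0.564$ and equation (5) gives $C = 43/55 = 0.782$, matching the printed value; with no ties at all, $\tau_a$, $\tau_b$, and $\gamma$ all reduce to the same value as $D$.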
Logistic regression
n= 150
Concordance= 0.8258 se= 0.03279
concordant discordant tied.x tied.y tied.xy
4129 871 0 6174 1
Linear regression
n= 11
Concordance= 0.7818 se= 0.1255
concordant discordant tied.x tied.y tied.xy
43 12 0 0 0
> sqrt(summary(fit2)$r.squared) # R
[1] 0.891425
Parametric survival
n= 137
Concordance= 0.7122 se= 0.02232
concordant discordant tied.x tied.y tied.xy
6263 2527 14 39 0
Cox regression
n= 137
concordance se
fit4 0.7119 0.0224
fit5 0.7384 0.0210
fit6 0.7359 0.0212
As shown in the last example, the concordance for multiple fits can be obtained from a single call. The variance-covariance matrix for all three concordance values is available using vcov(ctest); this is used in Section 3.1 to formally test the equality of two concordance values. The above also shows that the addition of another variable to a fitted model can decrease the concordance. The larger model will have higher correlation between the linear predictor $X\beta$ and the response y, by definition, but this does not guarantee a greater association between $\mathrm{rank}(X\beta)$ and $\mathrm{rank}(y)$.
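A sketch of the call involved, using the fits fit4, fit5, and fit6 whose results appear above (the contrast-based comparison anticipates Section 3.1; the exact code used for the display above is not shown here):

ctest <- concordance(fit4, fit5, fit6)   # one call, three models
coef(ctest)                              # the three concordance values
vcov(ctest)                              # their 3 x 3 variance-covariance matrix
# a Wald-style comparison of the first two fits
contr <- c(-1, 1, 0)
dhat  <- sum(contr * coef(ctest))
se    <- sqrt(as.numeric(t(contr) %*% vcov(ctest) %*% contr))
c(difference = dhat, z = dhat/se)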
\begin{align}
c - d &= \frac{1}{2}\sum_{i=1}^{n} \sum_{j=1}^{n} \operatorname{sign}(y_i - y_j)\, \operatorname{sign}(x_i - x_j) \tag{6}\\
      &= \sum_{i=1}^{n} \; \sum_{y_j > y_i} \operatorname{sign}(x_j - x_i) \tag{7}
\end{align}
The first equation is the simple definition of concordance as a sum over all $n^2$ possible pairs, where sign is the R sign function. Equation (7) makes the obvious simplification of counting each pair only once, by taking advantage of the fact that y is sorted. The key coding insight is to store the x values as part of a balanced binary tree; an example of such a tree is shown in Figure 1.

[Figure 1: a balanced binary tree holding the values 1, 2, 6, 8, 9, 12, 14, 18, 19, 21, 23, 24, and 27; the root is 18, its children 8 and 24, their children 2, 12, 21, 27, and the leaves 1, 6, 9, 14, 19, 23.]

The basic algorithm is to:
1. Create a balanced binary tree for all n $x_i$ values. This can be done in $O(n \log_2 n)$ steps. The final tree will have a node for each unique x value. Each node contains the value, along with counts for the number of observations at that value, for left hand children, and for right hand children. Initialize all the counts to 0.
2. Walk through the observations in decreasing order of y. For each observation:
(a) Use the stored counts to find how many of the x values already added (those belonging to observations with a larger y) are greater than, less than, or equal to the current x value; these give the concordant, discordant, and tied-on-x pairs involving this observation.
(b) Add this observation to the tree. Each addition will update the count for its node, then walk up the tree updating child counts of the parent, grandparent, etc.
If there are tied y values, do all the counts for a set of ties first, and then add their x values to
the tree.
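The counting itself is done in C by the package. As a rough R illustration of the same $O(n \log n)$ idea for uncensored, untied data, the sketch below uses a Fenwick (binary indexed) tree in place of the balanced tree just described; the function countpairs() is invented here for illustration and is not part of the survival package.

countpairs <- function(x, y) {
    # walk from the largest y to the smallest; at each step the tree holds the
    # x ranks of all observations with a larger y (compare equation (7))
    n    <- length(x)
    xr   <- as.integer(rank(x))[order(y)]   # x ranks, listed in increasing order of y
    tree <- integer(n)                      # Fenwick tree of counts, all zero
    lowbit <- function(i) i - bitwAnd(i, i - 1L)
    add <- function(i) while (i <= n) { tree[i] <<- tree[i] + 1L; i <- i + lowbit(i) }
    qry <- function(i) { s <- 0L; while (i > 0L) { s <- s + tree[i]; i <- i - lowbit(i) }; s }
    conc <- disc <- 0
    for (k in n:1) {
        stored <- n - k                        # number of x values added so far
        conc <- conc + stored - qry(xr[k])     # stored values larger than the current x
        disc <- disc + qry(xr[k] - 1L)         # stored values smaller than the current x
        add(xr[k])
    }
    c(concordant = conc, discordant = disc)
}

For untied data the result should match the concordant and discordant counts printed by concordance(y ~ x).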
> concordance(fit4)
Call:
concordance.coxph(object = fit4)
n= 137
Concordance= 0.7119 se= 0.02235
concordant discordant tied.x tied.y tied.xy
6261 2529 14 39 0
> # Concordance using predictions from a Cox model
> concordance(Surv(time, status) ~ predict(fit4), data = veteran, reverse = TRUE)
Call:
concordance.formula(object = Surv(time, status) ~ predict(fit4),
data = veteran, reverse = TRUE)
n= 137
Concordance= 0.7119 se= 0.02235
concordant discordant tied.x tied.y tied.xy
6261 2529 14 39 0
Stratified models
Stratified models present a further variation: if observations i and j are in different strata, the survival curves for those strata might cross; $S(t; x_i)$ and $S(t; x_j)$ no longer have a simple ordering. A solution is to use a stratified concordance, which compares all pairs within each stratum, and then adds up the results. In the example below there is a separate count for each stratum; the final concordance is based on the column sums. The same issue, and solution, applies to stratified survreg models. (The fact that strata names are not retained as labels for the counts matrix is a deficiency in the routine.)
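A sketch of the general form of the call behind the output below; the covariates karno and age are only a guess for illustration, since the exact model used for these numbers is not shown:

sfit <- coxph(Surv(time, status) ~ karno + age + strata(celltype), data = veteran)
concordance(sfit)    # one row of counts per stratum, combined via the column sums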
n= 137
Concordance= 0.6986 se= 0.02679
concordant discordant tied.x tied.y tied.xy
squamous 357 161 0 1 0
smallcell 728 361 3 9 0
adeno 275 65 1 1 0
large 240 102 0 0 0
> table(veteran$celltype)
squamous smallcell adeno large
35 48 27 27
2.1 Time-weighted concordance
Look again at equation (7), rewriting it for survival with ti as the response in order to more
closely match standard notation for survival. Watson and Therneau [10] show that this can be
further rewritten as
\begin{align}
c - d &= \sum_{i=1}^{n} \delta_i \sum_{t_j > t_i} \operatorname{sign}(x_i - x_j) \notag \\
      &= \sum_{i=1}^{n} \delta_i \sum_{t_j \ge t_i} \operatorname{sign}(x_i - x_j) \tag{8}\\
      &= 2 \sum_i \delta_i\, n(t_i) \left[ r_i(t_i) - \bar{r} \right] \tag{9}
\end{align}
Peto and Peto [7] point out that $n(t) \approx n(0)S(t-)G(t-)$, where S is the survival distribution and G the censoring distribution. They argue that S(t-) would be a better weight, since G may have features that are irrelevant to the question being tested. For a particular dataset, Prentice [8] later showed that these concerns were indeed justified, and most software now uses the Peto-Wilcoxon variant.
Schemper et al [9] argue for a weight of S(t)/G(t) in the Cox model. When proportional hazards does not hold, the coefficient from the Cox model is an average hazard ratio, and they show that using S/G leads to a value that remains interpretable in terms of an underlying population model. The same argument would also apply to the concordance, since our goal is an assumption-free assessment of association.
Uno et al [11] recommend the use of $n/G^2$ as a weight, based on a consistency argument. If we assume that the concordance value that would be obtained after full follow-up of all subjects (no censoring) is the right one, and proportional hazards does not hold, then the standard concordance will not consistently estimate this target quantity when there is censoring.
In practice, weights need to be based on the left-continuous versions of the survival curves, $S(t-)$ and $G(t-)$, and extra care needs to be exercised in the computation of G. Consider the aml dataset as an example; the first few lines of the relevant survival curve are shown below.
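A minimal sketch of one way to look at the two curves for the aml data: G is estimated by the "reverse" Kaplan-Meier, i.e., a Kaplan-Meier with the status indicator flipped (this simple version ignores the tie-handling detail for G(t-) that is discussed in Section 4, and is not necessarily the code used for the display):

Sfit <- survfit(Surv(time, status) ~ 1, data = aml)      # survival curve S(t)
Gfit <- survfit(Surv(time, 1 - status) ~ 1, data = aml)  # censoring curve G(t)
summary(Gfit, times = sort(unique(aml$time))[1:6])       # the first few lines of G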
> cord1 <- concordance(colonfit, timewt="n", ranks=TRUE)
> cord2 <- concordance(colonfit, timewt="S", ranks=TRUE)
> cord3 <- concordance(colonfit, timewt="S/G", ranks=TRUE)
> cord4 <- concordance(colonfit, timewt="n/G2", ranks=TRUE)
> temp <- c("n(t)"= coef(cord1), S=coef(cord2), "S/G"= coef(cord3),
"n/G2"= coef(cord4))
> round(temp,5) # 4 different concordance estimates
n(t) S S/G n/G2
0.65559 0.65437 0.65357 0.65357
> # Plot the weights over time using the first 3 approaches
> matplot(cord1$ranks$time/365.25, cbind(cord1$ranks$timewt,
cord2$ranks$timewt,
cord3$ranks$timewt),
type= "l", lwd=2, col=c(1,2,4),
xlab="Years since enrollment", ylab="Weight")
> legend(1, 3000, c("n(t)", "nS(t-)", "nS(t-)/G(t-)"), lwd=2,
col=c(1,2,4), lty=1:3, bty="n")
> # Note that n/G2 and S/G are identical
> all.equal(cord3$ranks$timewt,cord4$ranks$timewt)
[1] TRUE
[Figure: the weight functions n(t), nS(t-), and nS(t-)/G(t-) plotted against years since enrollment, as produced by the matplot call above.]
[Figure 2 appears here.]
Figure 2: Survival (black) and censoring (red) curves for 8 datasets found in the survival package. The final panel shows a proportional hazards evaluation for the age variable, in a fit of age + male to the NAFLD data.
Two features of a dataset are needed for the choice of weight to make a practical difference. First, sufficient censoring that the two weights differ for a reasonable fraction of the data, that is, G(t) is low and S(t) has not flattened (deaths are still occurring). Second, per the arguments in Schemper and in Uno, the potential presence of non-proportional hazards in the fit.
Figure 2 shows survival and censoring curves for 8 different datasets found in the survival package. Based on these, the dataset with the greatest potential for an S/G difference is the NAFLD data; that dataset also has some early non-proportionality for age. The code below shows the calculation of the S/G difference. Surprisingly, the four weightings still yield very similar concordance values; the Harrell (n) and Uno (n/G2) weightings differ only in the second decimal place.
> nfit <- coxph(Surv(futime/365.25, status) ~ age + male, data = nafld1)
> ncord1 <- concordance(nfit, timewt = "n")
> ncord2 <- concordance(nfit, timewt = "S")
> ncord3 <- concordance(nfit, timewt = "S/G")
> ncord4 <- concordance(nfit, timewt = "n/G2")
> temp <- c(n = coef(ncord1), S = coef(ncord2),
"S/G" = coef(ncord3), "n/G2" = coef(ncord4))
> round(temp,6)
n S S/G n/G2
0.823254 0.821457 0.805438 0.805438
The concordance function provides a ranks=TRUE argument which can be used for further exploration of the weights. If set, the output will include a data frame that contains one row for each event, giving the time point, the observation's relative rank in the risk set $(r_i - \bar r)$, the case weight for the observation, and the time weight at that time point. The relative ranks are comparable to Schoenfeld residuals, and their weighted sum will equal $c - d$. We can plot them over time and apply a smooth, as sketched below. Figure 3, for the veteran dataset, shows a precipitous drop to 0. The veteran cancer data is perhaps an extreme of this pattern: due to the very rapid progression of disease, baseline measurements soon lose their meaning.
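A sketch of how a plot in the spirit of Figure 3 can be produced; this assumes the scaled rank is stored in a column named rank of the ranks data frame (check the returned object if the name differs):

vcord <- concordance(fit4, ranks = TRUE)   # fit4 is the veteran Cox model used earlier
rdat  <- vcord$ranks
plot(rdat$time / 365.25, rdat$rank, xlab = "Years", ylab = "Rank residual")
lines(lowess(rdat$time / 365.25, rdat$rank), col = 2, lwd = 2)
abline(h = 0, lty = 3)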
[Figure 3 appears here: rank residual versus years.]
Figure 3: Schoenfeld residuals for the scaled ranks, from a fit to the veteran dataset.
This argues for restricting the concordance computation to a limited time range. We sometimes receive push-back on this, with the argument that one should use all the data. We disagree, and think that the target of the validation is critical. Predictions become less accurate the further out we reach in time; this is true for everything from weather forecasts to the stock market, and survival models are not immune. Reaching too far into the future may return an overly pessimistic value of C.
Another reason for using an upper limit is that 1/G can become unstable as the sample size
becomes small (large jumps in the KM), or unreasonably large as G approaches 0. Most authors
suggest an upper limit for this purely technical reason.
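A sketch of imposing such a limit through the ymax argument, using fit4 (the veteran Cox model from earlier); follow-up in that dataset is recorded in days, so 730 corresponds to roughly two years:

concordance(fit4, ymax = 730)                    # restrict comparisons to the first two years
concordance(fit4, timewt = "n/G2", ymax = 730)   # the Uno weight benefits most from capping 1/G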
Argument could also be made for a lower limit, though this would be uncommon for censored
data. Many laboratory values, for instance, treat all results less than some threshold as identical.
However, the ability to implement a lower limit correctly is constrained by censoring. Say for
instance that there were values of 5, 8+, and 9, and a lower limit of 10 were chosen. The approach
used for non-censored data is to treat 5 and 9 as tied values, but this logic does not correctly
extend to the censored value. This issue and possible solutions will be discussed more fully in
the external validation vignette.
2.3 Synthetic C
As another method of addressing censoring, Gönen and Heller [3] show that if the statistical model is correct, and if proportional hazards holds, then for any pair of covariate vectors
$$ P(y_i > y_j) = \frac{1}{1 + e^{\eta_j - \eta_i}} $$
They then order the $\eta$ values from a fitted model, and take an average over all $n(n-1)/2$ ordered pairs. The authors argue that this is an estimate that is independent of censoring, and therefore preferable to Harrell's C. (The estimate can be obtained by using the royston function.)
The biggest problem with this approach is that it gives an estimate of concordance under the assumption that the model is exactly correct. Our goal, rather, is to assess how well the model performs, for our needs, knowing that it will be imperfect. The Gönen and Heller formula answers a question that we did not ask, over a time range of $(0, \infty)$ which is not of interest. The calculation is also $O(n^2)$, so it will be slow for large sample sizes.
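For illustration, the formula can be evaluated directly from the linear predictors of a fit. The short function below is a simplified version of the idea (pairs with tied values of eta count as 1/2 here) and is not the packaged routine:

ghC <- function(eta) {
    # average, over all n(n-1)/2 pairs, of the model-implied probability that
    # the pair is concordant, which is 1/(1 + exp(-|eta_i - eta_j|))
    d <- abs(outer(eta, eta, "-"))
    mean(1 / (1 + exp(-d[lower.tri(d)])))
}
ghC(predict(fit4, type = "lp"))    # fit4: the veteran Cox model from earlier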
Several practical points about the choice of weights and time range deserve mention:
1. An important issue that has not been sorted out is how to extend 1/G weighting arguments to datasets that are subject to delayed entry, e.g., when using age as the time scale instead of time since enrollment. There is, in this case, no natural estimate available for G. It is also not possible for the coxph routine to reliably tell the difference between such left truncation and simple time-dependent covariates or strata. The default action of the routine is to use the safe choice of n(t).
2. Consider setting a time (y) restriction using the ymax option, based on careful thought about the proper range of interest. This often has a larger practical effect than the choice of time weight.
3. Safety. Compared to the usual Gehan-Wilcoxon weight of n(t), the Peto-Wilcoxon variant S(t) would appear advantageous, particularly if there is differential censoring for some subjects.
4. Equality vs. efficiency. On one hand we would like to treat each data pair equally, but in our quest for ever sharper p-values we want to be efficient. The first argues for n(t) as the weight and the second for using equal weights, since the variances of each ranking term are nearly identical. This is exactly the argument between the Gehan-Wilcoxon and the log-rank tests.
Our current opinion is that the point of the concordance is to evaluate the model in a more non-parametric way, so a log-rank type of focus on ideal p-values is misplaced. This suggests using either S or S/G as the weight. Both give more prominence to the later time points as compared to the default n(t) choice, but if time limits have been thought through carefully the difference between these three will almost always be ignorable.
We most definitely disagree with Uno's unstated assumption that the C statistic one would obtain with infinite follow-up and no censoring is the proper target of estimation, and that the ordinary concordance is therefore biased. That target will never be attainable, and we would argue that it would be largely irrelevant even if it were. Proportional hazards is never true over the long term, simply because it is almost impossible to predict events that are a decade or more away, and thus the rank residuals shown above will eventually tend to 0. The starting point should always be to think through exactly what one wants to estimate. As stated by Yogi Berra, "If you don't know where you are going, you'll end up someplace else."
3 Variance
The variance of the statistic is estimated in two ways. The first is to use the variance of the equivalent Cox model score statistic. As pointed out by Watson, this estimate is both correct and efficient under $H_0: C = .5$, and so it forms a valid test of $H_0$. However, when the concordance is over .7 or so, this estimator systematically overestimates the true variance. An alternative that remains unbiased is the infinitesimal jackknife (IJ) variance
\begin{align*}
V &= \sum_{i=1}^{n} w_i U_i^2 \\
U_i &= \frac{\partial C}{\partial w_i}
\end{align*}
The concordance routine calculates an influence matrix U with one row per subject and columns that contain derivatives for the 5 individual counts: concordant, discordant, tied on x, tied on y, and tied on xy pairs. From this it is straightforward to derive the influence of each subject on the concordance, or on any of the other possible association measures such as the $\tau_a$ mentioned earlier. The IJ variance is printed by default, but the PH variance is also returned; an earlier survConcordance function only computed the PH variance.
The concordance function does not compute Kendall's $\tau_a$ or $\tau_b$, nor Goodman's gamma. However, since all of the necessary components for those values are returned, along with the IJ influence for each, it can be used as the computational engine for those measures and their variances, should someone wish to do so.
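As a sketch, Kendall's tau-a of equation (1) can be recovered from the returned counts, assuming the five printed counts are stored in the object's count component in the printed order (concordant, discordant, tied.x, tied.y, tied.xy); its IJ variance could be built in the same way from the influence matrix, though that step is not shown:

cfit   <- concordance(fit4)
counts <- cfit$count
taua   <- (counts[1] - counts[2]) / sum(counts)   # equation (1)
taua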
The variance computation accounts for the influence of each subject on both the numerator and denominator of C; this agrees with the parallel development used by Newson [6] and implemented in Stata. An alternate approach is to treat the total number of comparable pairs as an ancillary statistic, thus $\mathrm{var}(C) = \mathrm{var}(c-d)/(4m^2)$, where m is the number of comparable pairs ($n(n-1)/2$ for uncensored, untied data). Arguments about which aspects of a dataset can or should be treated as ancillary are as old as statistics, e.g., treating the margins of a 2x2 table as ancillary leads to Fisher's exact test. In this case we anticipate that the differences will be quite small, but have done no formal exploration.
n= 137
concordance se
fit4 0.7119 0.0224
fit5 0.7384 0.0210
fit6 0.7359 0.0212
The tree-based computations used in the concordance function might well address the speed issue, but have not been implemented.
Here we pursue another avenue, which is to consider a transformation-based confidence interval, in much the same way as is done for confidence intervals of a survival curve. That is, we use
$$ g^{-1}\left[\, g(C) \pm z\, \sigma\{g(C)\} \,\right] $$
for some transformation function g. For survival curves, the g functions $\log(p)$, $\log(p/(1-p))$, $\log(-\log(1-p))$ and $\arcsin(p)$ have all been found to be superior to the simple interval.
For the concordance, consider the Fisher z-transform, widely used for the correlation coefficient r:
\begin{equation}
z = \frac{1}{2} \log\left(\frac{1+r}{1-r}\right) \tag{10}
\end{equation}
Since Somers' D and r are targeted at similar concepts, we might hazard that a similar transformation of Somers' D, which also ranges from -1 to 1, would also be close to equivariant. Since $D = 2C - 1$ we have
\begin{align*}
z_c &= \frac{1}{2} \log \frac{1 + (2C-1)}{1 - (2C-1)} \\
    &= \frac{1}{2} \log \frac{C}{1-C}
\end{align*}
which we recognize, apart from the factor of 1/2, as the inverse of the logistic function, i.e., the logit link used in glm models.
We can get the standard error of $z_c$ by retrieving the individual dfbeta values and performing a transformation. The dfbeta value is defined as $d_i = C - C_{(-i)}$, where the latter is the C statistic computed with observation i omitted.
zci <- function(fit, p = 0.95) {   # function head reconstructed, not verbatim from the source
    ilogist <- function(x) log(x/(1-x));  logistic <- function(x) exp(x)/(1 + exp(x))
    temp   <- concordance(fit, influence = 1)  # influence=1 returns the per-subject dfbeta values
    old.sd <- sqrt(sum(temp$dfbeta^2))         # IJ standard error of C
    new.sd <- sqrt(sum((ilogist(temp$concordance) - ilogist(temp$concordance - temp$dfbeta))^2))
    z <- qnorm((1-p)/2)
    old.ci <- temp$concordance + c(z, -z)*old.sd
    new.ci <- logistic(ilogist(temp$concordance) + c(z, -z)* new.sd)
    rbind(old = old.ci, new= new.ci)
}
> round(zci(colonfit), 4)
[,1] [,2]
old 0.6302 0.6810
new 0.6298 0.6805
The two intervals hardly differ, which is what we would expect for a value far from 1. As a second example, create a small dataset with a concordance that is close to 1. As shown below, the z-transform shifts the CI towards zero, as it should, but also avoids the out-of-bounds endpoint.
> set.seed(1953)
> ytest <- matrix(rexp(20), ncol=2) %*% chol(matrix(c(1, .98, .98, 1), 2))
> cor(ytest)
[,1] [,2]
[1,] 1.0000000 0.9422072
[2,] 0.9422072 1.0000000
> lfit <- lm(ytest[,1] ~ ytest[,2])
> zci(lfit)
[,1] [,2]
old 0.8419721 1.0246946
new 0.7253801 0.9867027
4 Details
This section documents a few details - most readers can skip it.
The usual convention for survival data is to assume that censored values come after deaths,
even if they are recorded on the same day. This corresponds to the common case that a subject
who is censored on day 200, say, was actually seen on that day. That is, their survival is strictly
greater than 200. As a consequence, censoring weights G actually use G(t−) in the code: if 10
subjects are censored at day 100, and these are the first censorings in the study, then an event
on day 100 should not be given a larger weight. (Both the Uno and Schemper papers ignore this
detail.)
When using weights of S(t), the program actually uses a weight of nS(t-), where n is the number of observations in the dataset. The reason is that for a stratified model the weighted number of concordant, discordant and tied pairs is calculated separately for each stratum, and then added together. If one stratum were much smaller or larger than the others we want to preserve this fact in the sum.
References
[1] D. G. Altman and P. Royston. What do we mean by validating a prognostic model? Stat. in Medicine, 19:453-73, 2000.
[2] F. J. Anscombe. On estimating binomial response relations. Biometrika, 43:461-464, 1956.
[3] M. Gönen and G. Heller. Concordance probability and discriminatory power in proportional hazards regression. Biometrika, 92:965-970, 2005.
[4] E. L. Korn and R. Simon. Measures of explained variation for survival data. Stat. in Medicine, 9:487-503, 1990.
[5] R. G. Newcombe. Confidence intervals for an effect size measure based on the Mann-Whitney statistic. Part 2: asymptotic methods and evaluation. Stat. in Medicine, pages 559-73, 2006.
[6] R. Newson. Confidence intervals for rank statistics: Somers' D and extensions. Stata Journal, 6(3):309-334, 2006.
[7] R. Peto and J. Peto. Asymptotically efficient rank invariant test procedures (with discussion). J. Royal Stat. Soc. A, 135(2):185-206, 1972.
[8] R. L. Prentice and P. Marek. A qualitative discrepancy between censored data rank tests. Biometrics, 35(4):861-867, 1979.
[9] M. Schemper, S. Wakounig, and G. Heinze. The estimation of average hazard ratios by weighted Cox regression. Stat. in Medicine, 28(19):2473-2489, 2009.
[10] T. M. Therneau and D. A. Watson. The concordance statistic and the Cox model. Technical Report 85, Department of Health Science Research, Mayo Clinic, 2015.
[11] H. Uno, T. Cai, M. J. Pencina, R. B. D'Agostino, and L. J. Wei. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat. in Medicine, 30(10):1105-1117, 2011.