Lin 1992
International Biometric Society is collaborating with JSTOR to digitize, preserve and extend access to Biometrics.
BIOMETRICS 48, 599-604
June 1992
Assay Validation Using the Concordance Correlation Coefficient
Lawrence I-Kuei Lin
Baxter Healthcare Corporation, P.O. Box 490/WG1-3N, Round Lake, Illinois 60073, U.S.A.
SUMMARY
A new reproducibility index, called the concordance correlation coefficient, has been proposed (Lin, 1989, Biometrics 45, 255-268) to evaluate the reproducibility of an assay, method, or instrument. This index includes measurements of precision and accuracy. Based on certain criteria about the allowable losses in precision and accuracy, one can compute the sample size requirement. One can also meaningfully validate the reproducibility using the confidence interval approach. For illustration, an application using actual data is given.
1. Introduction
In an assay validation or an instrument validation process, the reproducibility properties can be characterized by a concordance correlation coefficient (Lin, 1989). For example, if a new assay can reproduce the outcome of the "gold-standard" assay, the plot of the new assay's results versus the standard assay's results should fall closely on the 45° line through the origin (the concordance line), as shown in Figure 1.
Departure from standard can be measured by how far the observations deviate from the concordance line in the scale of 1 (perfect agreement) to 0 (no agreement) to -1 (perfect reversed agreement). This departure from standard consists of a measure of precision (not correctable) multiplied by a measure of accuracy (correctable, for example, by calibration), and is referred to as $\rho_c$.
The measure of precision evaluates how far the observations deviate from the best-fit straight line. The usual Pearson correlation coefficient ($\rho$) is used. The measure of accuracy ($C_b$) evaluates how far the best-fit line deviates from the concordance line in the scale of 1 (no deviation) to (but not including) 0 (very far away). This bias consists of a scale shift (ratio of 2 standard deviations, denoted by $v$) and a location shift (squared difference in means relative to the product of 2 standard deviations, denoted by $u^2$).
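Let us assume that pairs of samples $(Y_{i1}, Y_{i2})$, $i = 1, 2, \ldots, n$, are independently selected from a bivariate population, with means $\mu_1$ and $\mu_2$ and covariance matrix
$$\begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.$$
Then
$$u^2 = \frac{(\mu_1 - \mu_2)^2}{\sigma_1\sigma_2}, \qquad v = \frac{\sigma_1}{\sigma_2}, \qquad C_b = 2\left[v + \frac{1}{v} + u^2\right]^{-1}.$$
Finally, $\rho_c = \rho C_b$.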
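As a rough numerical illustration of these definitions, the Python sketch below evaluates $u^2$, $v$, $C_b$, and $\rho_c$ from assumed population parameters. The function name `population_ccc` and the parameter values are hypothetical and were chosen only for illustration; they are not taken from the paper.

```python
def population_ccc(mu1, mu2, sigma1, sigma2, rho):
    """Return (u_squared, v, c_b, rho_c) for given population parameters.

    Hypothetical helper, not from the paper: it only evaluates
    u^2 = (mu1 - mu2)^2 / (sigma1 * sigma2), v = sigma1 / sigma2,
    C_b = 2 / (v + 1/v + u^2), and rho_c = rho * C_b.
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    u_squared = (mu1 - mu2) ** 2 / (sigma1 * sigma2)
    v = sigma1 / sigma2
    c_b = 2.0 / (v + 1.0 / v + u_squared)
    return u_squared, v, c_b, rho * c_b


# Made-up parameters: a location shift of a quarter of a standard
# deviation (u = 0.25), no scale shift (v = 1), Pearson rho = 0.98.
u2, v, c_b, rho_c = population_ccc(mu1=10.5, mu2=10.0,
                                   sigma1=2.0, sigma2=2.0, rho=0.98)
print(f"u^2={u2:.4f}  v={v:.1f}  C_b={c_b:.3f}  rho_c={rho_c:.3f}")
```

In this example the correlation is high, yet the location shift alone pulls $\rho_c$ below $\rho$, which is the sense in which accuracy and precision enter the index separately.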
Keywords: Accuracy; Concordance; Precision; Reproducibility.
Figure 1. Reproducibility of new assay against gold-standard assay (axes: new versus gold-standard, with the 45° concordance line).
The sample counterpart of $\rho_c$ is
$$\hat{\rho}_c = \frac{2 S_{12}}{S_1^2 + S_2^2 + (\bar{Y}_1 - \bar{Y}_2)^2},$$
where
$$\bar{Y}_j = \frac{1}{n}\sum_{i=1}^{n} Y_{ij}, \qquad S_j^2 = \frac{1}{n}\sum_{i=1}^{n} (Y_{ij} - \bar{Y}_j)^2, \qquad j = 1, 2;$$
and
$$S_{12} = \frac{1}{n}\sum_{i=1}^{n} (Y_{i1} - \bar{Y}_1)(Y_{i2} - \bar{Y}_2).$$
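The sample estimator can be sketched in a few lines of Python. This is a minimal illustration, assuming the variances and covariance use the divisor $n$ rather than $n - 1$; the function name `sample_ccc` and the readings below are made up and are not the paper's data.

```python
def sample_ccc(y1, y2):
    """Sample concordance correlation coefficient for paired observations.

    Assumption: variances and covariance are divided by n (not n - 1).
    """
    n = len(y1)
    if n != len(y2) or n < 2:
        raise ValueError("need two equally long samples with n >= 2")
    mean1 = sum(y1) / n
    mean2 = sum(y2) / n
    s1_sq = sum((a - mean1) ** 2 for a in y1) / n
    s2_sq = sum((b - mean2) ** 2 for b in y2) / n
    s12 = sum((a - mean1) * (b - mean2) for a, b in zip(y1, y2)) / n
    return 2.0 * s12 / (s1_sq + s2_sq + (mean1 - mean2) ** 2)


# Made-up readings, for illustration only.
new_assay = [0.42, 0.55, 0.61, 0.70, 0.83]
gold_standard = [0.40, 0.56, 0.60, 0.72, 0.80]
print(round(sample_ccc(new_assay, gold_standard), 4))

# A constant offset keeps the Pearson correlation at 1 but lowers the
# concordance, because the points leave the 45-degree line.
shifted = [g + 0.10 for g in gold_standard]
print(round(sample_ccc(shifted, gold_standard), 4))
```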
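It has been shown (Lin, 1989) that for bivariate normal samples, using the inverse hyperbolic tangent transformation (Z transformation),
$$\hat{Z}_c = \tanh^{-1}(\hat{\rho}_c) = \frac{1}{2}\ln\frac{1 + \hat{\rho}_c}{1 - \hat{\rho}_c}$$
yields a distribution asymptotically normal with mean
$$Z_c = \tanh^{-1}(\rho_c) = \frac{1}{2}\ln\frac{1 + \rho_c}{1 - \rho_c}$$
and variance
$$\sigma_{\hat{Z}_c}^2 = \frac{1}{n - 2}\left[\frac{(1 - \rho^2)\rho_c^2}{(1 - \rho_c^2)\rho^2} + \frac{4\rho_c^3(1 - \rho_c)u^2}{\rho(1 - \rho_c^2)^2} - \frac{2\rho_c^4 u^4}{\rho^2(1 - \rho_c^2)^2}\right].$$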
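A one-sided lower confidence limit can be obtained by working on the Z scale and transforming back. The sketch below is only illustrative: the function name `ccc_lower_limit` is hypothetical, sample estimates are plugged in for the unknown parameters, and the variance expression is the one given in this section, which should be checked against the primary source before any real use.

```python
import math
from statistics import NormalDist


def ccc_lower_limit(rho_c_hat, r, u_sq, n, alpha=0.05):
    """One-sided lower 100(1 - alpha)% confidence limit for rho_c.

    Plug-in inputs: rho_c_hat (sample concordance correlation),
    r (sample Pearson correlation), u_sq (sample squared location shift).
    Assumption: the asymptotic variance is the expression stated in the
    surrounding text, evaluated at the sample estimates.
    """
    if n <= 2 or not (0 < abs(r) < 1) or not (-1 < rho_c_hat < 1):
        raise ValueError("need n > 2, 0 < |r| < 1 and |rho_c_hat| < 1")
    one_minus = 1.0 - rho_c_hat ** 2
    bracket = ((1.0 - r ** 2) * rho_c_hat ** 2 / (one_minus * r ** 2)
               + 4.0 * rho_c_hat ** 3 * (1.0 - rho_c_hat) * u_sq
               / (r * one_minus ** 2)
               - 2.0 * rho_c_hat ** 4 * u_sq ** 2
               / (r ** 2 * one_minus ** 2))
    if bracket <= 0:
        raise ValueError("variance expression is not positive here")
    se_z = math.sqrt(bracket / (n - 2))
    z_hat = math.atanh(rho_c_hat)               # Z transformation
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # standard normal quantile
    return math.tanh(z_hat - z_crit * se_z)     # back to the rho_c scale


# Made-up plug-in values with n = 30 pairs.
print(round(ccc_lower_limit(rho_c_hat=0.95, r=0.96, u_sq=0.01, n=30), 4))
```

Building the limit on the Z scale and then applying `tanh` keeps the result inside (-1, 1), which a symmetric interval built directly on $\hat{\rho}_c$ would not guarantee.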
2.1 Rationale
In the traditional hypothesis testing setting, the practice has been to minimize the Type II error rate while controlling the Type I error rate. In the sense of justice, it is parallel to the idea of maintaining that one is innocent until proven (beyond a reasonable doubt) guilty. This practice is sensible when we try to prove that one thing is different from the other. However, in the assay validation process, we are trying to prove that one thing is equivalent to the other. In this situation, the practice should be to minimize the Type I error rate while controlling the Type II error rate, that is, maintaining that one is guilty until proven innocent. In the former situation, when assays are not equivalent, the smaller the sample size (weaker evidence), the easier it is to accept the hypothesis of equivalence. This logic contradicts intuition. In the latter situation, the practice is equivalent to reversing the (one-tailed) null and alternative hypotheses. This is parallel to the following approach using the concept of confidence intervals, which is straightforward and much easier to understand.
After the collection of data, compute the sample concordance correlation coefficient and its $100(1 - \alpha)\%$ lower (one-tailed) confidence limit. Also compute the least acceptable $\rho_c$, namely, $\rho_{c,a}$, assuming we can accept $100x\%$ loss in precision ($\rho$ can be dropped to $\sqrt{\rho^2 - x}$), $100u\%$ location shift per standard deviation, and $100(1 - v)\%$ scale shift. We would accept an assay's reproducibility if the $100(1 - \alpha)\%$ lower confidence limit is greater than or equal to $\rho_{c,a}$. Note that in using this approach, when assays are reproducible, the smaller the sample size, the smaller the lower confidence limit, and therefore, the less likely one would accept the assay. This is also parallel to the concept of testing the null hypothesis $H_0\colon \rho_c \le \rho_{c,a}$ against the one-tailed alternative hypothesis $H_a\colon \rho_c > \rho_{c,a}$. We would accept the assay's reproducibility if the null hypothesis is rejected. In this case, if the power $1 - \beta$ at the alternative $\rho_c = \rho$ is required, the sample size is given by the formula
$$n = \left[\frac{\Phi^{-1}(1 - \alpha) + \Phi^{-1}(1 - \beta)}{Z - Z_{c,a}}\right]^2 + 2,$$
where $\Phi^{-1}$ is the inverse cumulative normal function,
$$Z = \frac{1}{2}\ln\frac{1 + \rho}{1 - \rho}, \qquad Z_{c,a} = \frac{1}{2}\ln\frac{1 + \rho_{c,a}}{1 - \rho_{c,a}}.$$
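The sample size calculation can be sketched as follows. This is a minimal illustration, assuming the formula takes the form $n = [(\Phi^{-1}(1-\alpha) + \Phi^{-1}(1-\beta))/(Z - Z_{c,a})]^2 + 2$ and that the result is rounded up to the next integer; the function name `sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(rho, rho_ca, alpha=0.05, beta=0.05):
    """Sample size from n = [(z_(1-alpha) + z_(1-beta)) / (Z - Z_ca)]^2 + 2.

    rho    : Pearson correlation under the alternative (rho_c = rho)
    rho_ca : least acceptable concordance correlation, rho_ca < rho
    Assumption: the result is rounded up to the next integer.
    """
    if not (-1 < rho_ca < rho < 1):
        raise ValueError("need -1 < rho_ca < rho < 1")
    norm = NormalDist()
    z_sum = norm.inv_cdf(1.0 - alpha) + norm.inv_cdf(1.0 - beta)
    z_alt = math.atanh(rho)      # Z    = (1/2) ln[(1 + rho) / (1 - rho)]
    z_ca = math.atanh(rho_ca)    # Z_ca = (1/2) ln[(1 + rho_ca) / (1 - rho_ca)]
    return math.ceil((z_sum / (z_alt - z_ca)) ** 2 + 2)


# Precision rho^2 = .95 with an allowed precision loss x = .01 and no bias,
# so rho_ca = sqrt(.95 - .01).
print(sample_size(rho=math.sqrt(0.95), rho_ca=math.sqrt(0.94)))
```

The result should land close to the corresponding Table 1 entry, but exact agreement is not guaranteed: the tabulated values may reflect rounded normal quantiles or variance terms that this simple form does not include. The calculation also shows why tight acceptance limits are expensive, since $n$ grows with the inverse square of $Z - Z_{c,a}$.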
2.2 Example
A study was conducted at the microbiological laboratory of Baxter Healthcare Corp. to assess the reproducibility of a turbidometric (TB) assay, and to compare it to the traditional agar diffusion (AD) assay. These two microbiological assays are used to measure the
Table 1
Sample size needed given precision (ρ²) to detect bias (u and/or v) and/or precision loss (x); α = β = .05
Precision (ρ²):        .95         .96         .97         .98         .99
Ha: ρc = ρ:            .975        .980        .985        .990        .995
x    u       v    C_b  ρc,a     n  ρc,a     n  ρc,a     n  ρc,a     n  ρc,a     n
.00  .000   .9   .994  .969 1,029  .974   669  .979   389  .984   186  .989    58
.00  .000   .8   .976  .951    72  .956    50  .961    33  .966    19  .971     9
.00  .125  1.0   .992  .967   653  .972   440  .977   269  .982   138  .987    48
.00  .125   .9   .987  .962   246  .967   170  .972   108  .977    60  .982    25
.00  .125   .8   .968  .944    56  .949    41  .954    28  .958    18  .963    10
.00  .250  1.0   .970  .945    69  .950    50  .955    33  .960    20  .965    10
.00  .250   .9   .965  .940    55  .945    40  .950    28  .955    18  .960    10
.00  .250   .8   .947  .923    31  .928    24  .932    18  .937    12  .942     8
.01  .000  1.0  1.000  .970 1,234  .975   833  .980   507  .985   259  .990    91
.01  .000   .9   .994  .964   324  .969   221  .974   138  .979    74  .984    30
.01  .000   .8   .976  .946    57  .951    41  .956    28  .961    17  .966     9
.01  .125  1.0   .992  .962   271  .967   190  .972   123  .977    69  .982    30
.01  .125   .9   .987  .957   146  .962   104  .967    69  .972    41  .977    20
.01  .125   .8   .968  .939    47  .944    35  .949    25  .954    16  .958    10
.01  .250  1.0   .970  .940    57  .945    42  .950    30  .955    19  .960    10
.01  .250   .9   .965  .935    47  .940    35  .945    25  .950    17  .955    10
.01  .250   .8   .947  .918    29  .923    22  .928    17  .932    12  .937     8
.02  .000  1.0  1.000  .964   362  .970   253  .975   162  .980    90  .985    38
.02  .000   .9   .994  .959   171  .964   121  .969    79  .974    46  .979    21
.02  .000   .8   .976  .941    47  .946    35  .951    25  .956    16  .961     9
.02  .125  1.0   .992  .957   158  .962   114  .967    76  .972    46  .977    22
.02  .125   .9   .987  .952   101  .957    74  .962    51  .967    31  .972    16
.02  .125   .8   .968  .934    40  .939    31  .944    22  .949    15  .954     9
.02  .250  1.0   .970  .935    49  .940    37  .945    27  .950    18  .955    10
.02  .250   .9   .965  .930    41  .935    32  .940    23  .945    16  .950     9
.02  .250   .8   .947  .913    27  .918    21  .923    16  .928    12  .932     8
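The $C_b$ and least acceptable $\rho_c$ columns of Table 1 can be cross-checked from the definitions in Section 1 and the acceptance criteria in Section 2.1. The sketch below is illustrative only; the helper name `least_acceptable_ccc` is hypothetical, and it assumes the least acceptable value is $\sqrt{\rho^2 - x}\,C_b$ with $C_b$ evaluated at the allowed $u$ and $v$.

```python
import math


def least_acceptable_ccc(rho_sq, x, u, v):
    """Return (c_b, rho_ca) for precision rho_sq and allowed losses.

    x : allowed loss in precision (rho may drop to sqrt(rho_sq - x))
    u : allowed location shift per standard deviation
    v : allowed scale shift (ratio of standard deviations)
    Assumption: rho_ca = sqrt(rho_sq - x) * c_b.
    """
    if rho_sq - x <= 0 or v <= 0:
        raise ValueError("need rho_sq > x and v > 0")
    c_b = 2.0 / (v + 1.0 / v + u ** 2)
    return c_b, math.sqrt(rho_sq - x) * c_b


# Recompute three rows of the precision = .95 column.
for x, u, v in [(0.00, 0.000, 0.9), (0.01, 0.125, 0.9), (0.02, 0.250, 0.8)]:
    c_b, rho_ca = least_acceptable_ccc(0.95, x, u, v)
    print(f"x={x:.2f} u={u:.3f} v={v:.1f}  C_b={c_b:.3f}  rho_ca={rho_ca:.3f}")
```

For these three rows the recomputed values agree with the tabulated $C_b$ and least acceptable $\rho_c$ entries to three decimals.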
Figure 2. Reproducibility of turbidometric assay against diffusion assay (o: Observation; *: Actual concentration). Vertical axis: turbidometric assay; horizontal axis: diffusion assay. The concordance correlation coefficient is .994; its lower 95% confidence limit is .986 [acceptable (> .95)].
3. Conclusion
The traditional hypothesis testing approach, which accepts the equivalence of the methods when the null hypothesis of no difference cannot be rejected, makes little sense for reproducibility validation. Tests of equivalent means such as the paired t-test, or least squares analysis of testing a null intercept and unit slope for reproducibility validation, present even more paradoxical results (Lin, 1989). The use of these approaches is widespread in the literature, especially in nonstatistical journals. Not only do they confound large imprecision with the ability to detect bias, but they also confuse the Type I and Type II errors.
The concordance correlation coefficient, which evaluates the agreement of paired samples, can be used to validate the reproducibility of an assay, instrument, or method. It is meaningful and easy to perform. The proposed guidelines for such validation require the specification of allowable losses in precision and accuracy. The sample size requirement for effective validation can be computed based on the same principle.
The concordance correlation coefficient can potentially be an excellent tool in many types of goodness-of-fit evaluations, by simply examining how well the observed outcomes concord with the hypothesized values. These possibilities are under investigation.
ACKNOWLEDGEMENTS
RÉSUMÉ
We propose a new index for evaluating the reproducibility of an assay, a method, or an instrument: the concordance correlation coefficient. This index simultaneously encompasses measures of precision and of bias. Sample size calculations can be carried out from criteria defined on the acceptable losses of precision and the acceptable bias. The confidence interval approach can also provide a sound way to validate reproducibility. By way of illustration, an example based on real data is presented.
REFERENCES
Bross, I. D. (1985). Reader Viewpoint: Why proof of safety is much more difficult than proof of hazard. Biometrics 41, 785-793.
Dunnett, C. W. and Gent, M. (1977). Significance testing to establish equivalence between treatments, with special reference to data in the form of 2 x 2 tables. Biometrics 33, 593-602.
Lin, L. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics 45, 255-268.
Rodary, C., Com-Nougue, C., and Tournade, M. F. (1989). How to establish equivalence between treatments: A one-sided clinical trial in pediatric oncology. Statistics in Medicine 8, 593-598.