accurate technique, for example, substitution. When a laboratory uses a technique at the same level of accuracy as the original characterization, an approach based on the normalized error has been utilized by many credible laboratories.
Concept of Metrological Compatibility
Metrological compatibility is a concept defined in the 2008 version of the International Vocabulary of Metrology (VIM). Two measurement results are metrologically comparable only if they are traceable to the same reference; therefore, this concept applies only to results traceable to the same reference. Metrological compatibility of measurement results replaces the traditional
same reference. Metrological compatibility of measurement results replaces the traditional
concept of staying within the error, as it represents the criterion for deciding whether two
measurement results refer to the same measurand or not. If in a set of measurements of a
measurand, thought to be constant, a measurement result is not compatible with the others, either
the measurement was not correct (e.g. its measurement uncertainty was assessed as being too
small) or the measured quantity changed between measurements.
Correlation between the measurements influences metrological compatibility of measurement
results. If the measurements are completely uncorrelated, the standard measurement uncertainty
of their difference is equal to the root mean square sum of their standard measurement
uncertainties, while it is lower for positive covariance or higher for negative covariance.
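As a concrete illustration, the following Python sketch (with purely hypothetical values and an assumed covariance) computes the standard uncertainty of the difference between two results and applies a simple compatibility check at a coverage factor of k = 2.

```python
import math

# Hypothetical results for the same measurand from two laboratories
x1, u1 = 10.02, 0.03      # result 1 and its standard uncertainty
x2, u2 = 10.08, 0.04      # result 2 and its standard uncertainty
cov12 = 0.0005            # assumed covariance between the two results

# Standard uncertainty of the difference; a positive covariance lowers it,
# a negative covariance raises it
u_diff = math.sqrt(u1**2 + u2**2 - 2.0 * cov12)

# Simple metrological compatibility check at a chosen coverage factor
k = 2.0
compatible = abs(x1 - x2) <= k * u_diff
print(f"u(x1 - x2) = {u_diff:.4f}, compatible at k=2: {compatible}")
```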
Definition of Normalized Error, En
In a normal comparison of two measurements, the Normalized error, En is an appropriate
criterion. En numbers are calculated using Equation (1)
$$E_n = \frac{x - X}{\sqrt{U_{lab}^{2} + U_{ref}^{2}}} \qquad (1)$$
where
$U_{lab}$ is the expanded uncertainty of a participant's result;
$U_{ref}$ is the expanded uncertainty of the reference laboratory's assigned value.
This formula is correct only if x and X are independent. It will only be meaningful if the
uncertainty estimates are determined in a consistent manner by all participants.
When $|E_n| \le 1$, the performance is satisfactory with respect to the reference value and its associated uncertainty. This leads to the conclusion that the two measurement results are equivalent.
When $|E_n| > 1$, the two measurement results are not equivalent.
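A minimal Python sketch of this calculation, using hypothetical values and assuming both expanded uncertainties are reported at the same coverage factor:

```python
import math

def normalized_error(x, X, U_lab, U_ref):
    """Normalized error per Equation (1): x and X are the participant and
    reference values; U_lab and U_ref are their expanded uncertainties."""
    return (x - X) / math.sqrt(U_lab**2 + U_ref**2)

# Hypothetical cal-factor comparison at one frequency
En = normalized_error(x=98.5, X=99.2, U_lab=1.1, U_ref=0.9)
verdict = "satisfactory" if abs(En) <= 1.0 else "unsatisfactory"
print(f"En = {En:+.2f} -> {verdict}")
```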
Quantifying Demonstrated Equivalence
In [2], a Quantifying Demonstrated Equivalence (QDE) equation is introduced. The final outcome of this analysis can lead to a statement like the following:
On the basis of comparison measurements performed in the period of date-to-date, the results of similar measurements made at lab 1 and lab 2 can be expected to agree to within [QDE$_{0.95}$], with 95% confidence.
For each laboratory, the comparison result (its mean and its uncertainty budget) is used to construct a probability density function for that laboratory's measurement. After correcting for correlations and convolving the probability density functions, the range established by this analysis can be said to have demonstrated the confidence level of the claimed equivalence.
When $u(x_r)$ is not negligible but the value $x_r$ is known independently of the nominal value $x_i$,

$$E_n = \frac{x_i - x_r}{\sqrt{u^2(x_i) + u^2(x_r)}}$$

This equation provides simple laboratory-by-laboratory consistency testing against an external standard.
As indicated by [2], using $E_n$ as the single parameter for quantifying equivalence can be misleading. The confidence associated with an equivalence interval has not previously been available for different values of $E_n$, and the confidence levels for agreement are low, much less than 95%.
$$QDE_{0.95} \cong |m_2 - m_1| + \{1.645 + 0.3295\,\exp[-4.05\,|m_2 - m_1|/u_P]\}\,u_P$$
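The sketch below evaluates this approximation for a hypothetical pair of results and cross-checks it against a Monte Carlo estimate of the 95th percentile of the absolute pair difference. Here $u_P$ is taken to be the root-sum-of-squares of the two standard uncertainties, which is an assumption since $u_P$ is not defined explicitly above.

```python
import math
import numpy as np

def qde_95(m1, m2, u_p):
    """Closed-form approximation of QDE_0.95 quoted above for a pair of
    results m1, m2 with pair standard uncertainty u_p."""
    d = abs(m2 - m1)
    return d + (1.645 + 0.3295 * math.exp(-4.05 * d / u_p)) * u_p

# Hypothetical pair of comparison results (value, standard uncertainty)
m1, u1 = 99.10, 0.40
m2, u2 = 99.60, 0.55
u_p = math.sqrt(u1**2 + u2**2)   # assumed: root-sum-of-squares pair uncertainty

approx = qde_95(m1, m2, u_p)

# Monte Carlo cross-check: 95th percentile of |difference| when the pair
# difference is modeled as normal with mean (m2 - m1) and std dev u_p
rng = np.random.default_rng(0)
diffs = rng.normal(m2 - m1, u_p, 1_000_000)
mc = np.percentile(np.abs(diffs), 95)

print(f"QDE_0.95: approximation = {approx:.3f}, Monte Carlo = {mc:.3f}")
```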
Care is required when claimed equivalence is tested: in any aggregation, a Pair-Difference (PD) is only eligible for inclusion if both participants claim that their answers should be the same. The uncertainties required are exactly those required when any measurement is used in the simplest possible way: in comparison with another measurement.
Hypothesis Testing and Statistical Significance
Hypothesis testing and experimental design are popular tools for interpreting the equivalence of experiments. They are useful for statistically determining the validity of
calibration results. However, a result that is statistically significant is not necessarily practically
significant. For example, suppose we wish to determine if the mean absolute errors in the new and standard methods of measurement are related by $\mu_{new} < \mu_{standard}$ (where $\mu$ denotes the mean absolute error), with a view to preferring the new method over the standard method if the data support the decision. No matter how small the mean
absolute errors are, an alternative hypothesis is always rejected. Any statistical method should be
fit for its purpose. Without a clear view of the practical purpose of the task, there will not be
unanimity over the statistical approach to be taken [3].
In testing the power sensor cal factor and uncertainty, we are interested in possible differences in
the mean response for two measurements. However, it is the comparison of variability in the
uncertainty that is important. Unlike the tests on means, the procedures for tests on variances are
rather sensitive to the normality assumption. Suppose we wish to test the hypothesis that the
uncertainty of a normal distribution equals a constant, for example, $\sigma_0^2$:

$$H_0: \sigma^2 = \sigma_0^2, \qquad H_1: \sigma^2 \neq \sigma_0^2$$

The test statistic is

$$\chi_0^2 = \frac{SS}{\sigma_0^2} = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{\sigma_0^2}$$
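A short sketch of this single-variance test with SciPy; the readings and the claimed value $\sigma_0$ are hypothetical, and normality is assumed as noted above.

```python
import numpy as np
from scipy import stats

# Hypothetical repeated cal-factor readings and a claimed standard deviation
y = np.array([98.9, 99.3, 99.1, 99.4, 98.8, 99.2, 99.0, 99.3])
sigma0 = 0.15

n = y.size
ss = np.sum((y - y.mean())**2)      # SS = corrected sum of squares
chi2_0 = ss / sigma0**2             # test statistic with n - 1 degrees of freedom

# Two-sided p-value for H0: sigma^2 = sigma0^2
cdf = stats.chi2.cdf(chi2_0, df=n - 1)
p_value = 2 * min(cdf, 1 - cdf)
print(f"chi2_0 = {chi2_0:.2f}, p = {p_value:.3f}")
```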
Now consider testing the equality of the uncertainties of two normal populations. The test
statistic is the ratio of the sample variances [5].
$$F_0 = \frac{S_1^2}{S_2^2}$$
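A companion sketch for the two-variance F test, again with hypothetical readings:

```python
import numpy as np
from scipy import stats

# Hypothetical readings of the same cal factor from two laboratories
lab1 = np.array([99.0, 99.2, 98.8, 99.3, 99.1, 98.9])
lab2 = np.array([99.1, 99.4, 98.7, 99.6, 98.8, 99.2])

s1_sq = lab1.var(ddof=1)            # sample variances
s2_sq = lab2.var(ddof=1)
F0 = s1_sq / s2_sq                  # test statistic F0 = S1^2 / S2^2

# Two-sided p-value with (n1 - 1, n2 - 1) degrees of freedom
df1, df2 = lab1.size - 1, lab2.size - 1
cdf = stats.f.cdf(F0, df1, df2)
p_value = 2 * min(cdf, 1 - cdf)
print(f"F0 = {F0:.2f}, p = {p_value:.3f}")
```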
Given two means, $\bar{x}_A$ and $\bar{x}_B$, and variances $\sigma_A^2$ and $\sigma_B^2$ from normal distributions, the critical value of the t statistic for two independent variables is $t_{0.025,\nu} = 1.962$ (for a 95% confidence interval), so the comparison criterion is

$$\frac{\bar{x}_A - \bar{x}_B}{\sqrt{\sigma_A^2 + \sigma_B^2}} < t_{0.025,\nu} \approx 2 \qquad (2)$$
There are two important notes for this equation, drawn from the operating characteristic curves [4].
1. The greater the difference in means, the smaller the probability of type II error (false
reject) for a given sample size. That is, the test will detect large differences more easily
than small ones.
2. As the sample size gets larger, the probability of type II error (false reject) gets smaller
for a given difference in means. That is, to detect a specified difference, we may make
the test more powerful by increasing the sample size.
The second point is critical when applying this method to determine the out-of-tolerance
conditions of the power sensor cal factor.
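The second point can be illustrated with a rough power calculation for a two-sided, two-sample z test (a normal approximation with a known, common standard deviation; all numbers are hypothetical): the probability of detecting a fixed difference in means grows with the sample size.

```python
from scipy import stats

def power_two_sample(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample z test for detecting a
    true difference in means `delta` with common standard deviation `sigma`
    and n observations per group (the tiny opposite-tail term is ignored)."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    standardized = abs(delta) / (sigma * (2.0 / n) ** 0.5)
    return stats.norm.cdf(standardized - z_crit)

# Power grows with sample size for a fixed difference in means
for n in (5, 10, 20, 50):
    print(f"n = {n:2d}  power = {power_two_sample(delta=0.3, sigma=0.5, n=n):.3f}")
```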
The denominator of Eq. (2) should include the covariance terms if two variables are not
independent. The Bayesian posterior predictive p-value [1] of the realized discrepancy measure $\bar{x}_A - \bar{x}_B$ is therefore

$$p_p = \Pr\!\left[\, Z \ge \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\sigma_A^2 + \sigma_B^2 - 2\rho_{AB}\,\sigma_A\sigma_B}} \,\right]$$

where $\rho_{AB}$ is the correlation coefficient between the presumed normal sampling probability density functions.
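A sketch of this correlation-adjusted comparison; the means, standard deviations, and correlation coefficient below are hypothetical, and the one-sided tail is used to match the expression above (a two-sided assessment would double it).

```python
import math
from scipy import stats

# Hypothetical results from laboratories A and B
xA, sA = 99.65, 0.40
xB, sB = 99.10, 0.35
rho = 0.3                          # assumed correlation between the two results

# Standard deviation of the difference including the covariance term
s_diff = math.sqrt(sA**2 + sB**2 - 2.0 * rho * sA * sB)

# Posterior predictive p-value of the realized discrepancy (one-sided tail
# as written above; a two-sided assessment would double this value)
z = (xA - xB) / s_diff
p_p = 1.0 - stats.norm.cdf(z)
print(f"z = {z:.2f}, p_p = {p_p:.3f}")
```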
Extension of Normalized Error to the Out-of-tolerance Decision
When two laboratories make measurements and apply the normalized error to assess the out-of-tolerance condition, we suggest that a supplemental test using the false acceptance and rejection rates of the two measurement results, based on the statistical recommendations in the ANSI/NCSLI Z540.3 handbook, might be used.
Microwave Power Sensor Case Study
In this case study, Anritsu power sensors, MA2445D, were used and tested by three independent
laboratories. Two laboratories use the same measurement method and the third uses a different
different technique. After each measurement, the normalized error was calculated per equation (1)
and reported in Figure 1.
Figure 1: Normalized Error results among the three labs (normalized error ratio versus frequency, 0-50 GHz, for B Lab vs. Ref, A Lab vs. Ref, and B Lab vs. A Lab).
Seven unsatisfactory comparisons were noted. In order to understand the failure modes, we
further examine the expanded uncertainties from each lab. The uncertainty values are very close, as shown in Figure 2.
Figure 2: Expanded Uncertainty reported for each lab (cal factor expanded uncertainty versus frequency, 0-50 GHz, for B Lab and A Lab).
After comparing the unsatisfactory normalized errors, we noted that apparently inconsistent decisions were made when based solely on the normalized errors. We therefore constructed four scenarios, shown in Figure 3. The decisions of the two laboratories differ when an uncertainty window larger than the target is reported, and the likelihood of consistent decisions increases when both laboratories report uncertainty values close to the target. This result can be explained by calculating the false accept/reject rates for the compliance decision.
Figure 3: Four cases simulated for the pass/fail decision (dotted green circle: Anritsu; blue circle: External Lab; dashed green circle: target uncertainty).
This leads us to analyze the false accept/reject rates using the techniques specified in the ANSI/NCSLI Z540.3 handbook. Table 1 summarizes the results for the 19 GHz cal factor of the MA2445D. Case A shows a higher false reject rate; a large false reject rate implies that an out-of-tolerance judgment might be statistically false.
Table 1: False accept and reject analysis for three cases.

Scenario   Tol Limit   Cal Unc   False Accept Rate   False Reject Rate   Wrong Decision Rate
Case A     2.65        3.3       0.99%               18.56%              19.55%
Case B     5.3         3.3       3.62%               7.64%               11.27%
Case C     5.3         2.65      3.27%               5.76%               9.03%
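For illustration only, the following Monte Carlo sketch estimates false accept and false reject rates under assumptions that are not stated in the paper: the unit's true cal-factor error and the calibration error are modeled as independent zero-mean normals, the expanded uncertainty is interpreted at k = 2, the acceptance limit equals the tolerance limit, and the process spread is set so that roughly 95% of units are in tolerance. Case A's nominal numbers are used as inputs, but because of these assumptions the results will not reproduce Table 1 exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000

T = 2.65            # tolerance limit (Case A, same units as the cal-factor error)
U_cal = 3.3         # expanded calibration uncertainty (assumed k = 2)
sigma_m = U_cal / 2.0
sigma_p = T / 2.0   # assumed process spread: roughly 95% of units in tolerance

e = rng.normal(0.0, sigma_p, N)         # true deviation of the unit under test
y = e + rng.normal(0.0, sigma_m, N)     # observed value = true deviation + measurement error

in_tol = np.abs(e) <= T
accepted = np.abs(y) <= T               # acceptance limit assumed equal to the tolerance limit

false_accept = np.mean(accepted & ~in_tol)   # accepted but actually out of tolerance
false_reject = np.mean(~accepted & in_tol)   # rejected but actually in tolerance
print(f"False accept: {false_accept:.2%}  False reject: {false_reject:.2%}")
```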
Conclusions
We have discussed the need for a measure for determining out-of-tolerance conditions for certain measurements. The normalized error and other statistical techniques may be used together to make a correct decision, and cautions on En are given for cases where covariance cannot be ignored. When the measurement techniques are at the same level of accuracy, adding the false accept and reject analysis helps ensure proper interpretation leading to the out-of-tolerance decision.
References
1. Raghu N. Kacker, Rüdiger Kessel, and Klaus-Dieter Sommer, "Assessing Differences Between Results Determined According to the Guide to the Expression of Uncertainty in Measurement," Journal of Research of the National Institute of Standards and Technology, Vol. 115, No. 6, Nov.-Dec. 2010, pp. 453-459.
2. A. G. Steele and R. J. Douglas, "Extending En for measurement science," Metrologia 43 (2006) S235-S243.
3. B. Wood and R. Douglas, "Quantifying Demonstrated Equivalence," IEEE Trans. Instrumentation and Measurement, Vol. 48, No. 2, April 1999, pp. 162-165.
4. R. Willink, "Principles of probability and statistics for metrology," Metrologia 43 (2006) S211-S219.
5. Douglas Montgomery, Design and Analysis of Experiments, 6th Edition, Wiley.
6. ISO/IEC 17043:2010, Conformity assessment - General requirements for proficiency testing.