accurate technique, for example, substitution. When a laboratory uses a technique at the same level of accuracy as the original characterization, an approach based on the normalized error has been utilized by many credible laboratories.
Concept of Metrological Compatibility
Metrological compatibility is a concept defined in the 2008 version of the International Vocabulary of Metrology (VIM). Two measurement results are metrologically comparable only if they are traceable to the same reference; therefore, this concept applies only to results traceable to the same reference. Metrological compatibility of measurement results replaces the traditional
same reference. Metrological compatibility of measurement results replaces the traditional
concept of staying within the error, as it represents the criterion for deciding whether two
measurement results refer to the same measurand or not. If in a set of measurements of a
measurand, thought to be constant, a measurement result is not compatible with the others, either
the measurement was not correct (e.g. its measurement uncertainty was assessed as being too
small) or the measured quantity changed between measurements.
Correlation between the measurements influences metrological compatibility of measurement
results. If the measurements are completely uncorrelated, the standard measurement uncertainty
of their difference is equal to the root mean square sum of their standard measurement
uncertainties, while it is lower for positive covariance or higher for negative covariance.
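As a concrete illustration, the following Python sketch (with purely hypothetical values and an assumed covariance) computes the standard uncertainty of the difference between two results and applies a simple compatibility check at a coverage factor of k = 2.

```python
import math

# Hypothetical results for the same measurand from two laboratories
x1, u1 = 10.02, 0.03      # result 1 and its standard uncertainty
x2, u2 = 10.08, 0.04      # result 2 and its standard uncertainty
cov12 = 0.0005            # assumed covariance between the two results

# Standard uncertainty of the difference; a positive covariance lowers it,
# a negative covariance raises it
u_diff = math.sqrt(u1**2 + u2**2 - 2.0 * cov12)

# Simple metrological compatibility check at a chosen coverage factor
k = 2.0
compatible = abs(x1 - x2) <= k * u_diff
print(f"u(x1 - x2) = {u_diff:.4f}, compatible at k=2: {compatible}")
```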
Definition of Normalized Error, En
In a normal comparison of two measurements, the Normalized error, En is an appropriate
criterion. En numbers are calculated using Equation (1)
$$E_n = \frac{x - X}{\sqrt{U_{lab}^{2} + U_{ref}^{2}}} \qquad (1)$$
where
$U_{lab}$ is the expanded uncertainty of a participant's result;
$U_{ref}$ is the expanded uncertainty of the reference laboratory's assigned value.
This formula is correct only if x and X are independent. It will only be meaningful if the
uncertainty estimates are determined in a consistent manner by all participants.
When $|E_n| \le 1$, the performance is satisfactory with respect to the reference value and its associated uncertainty. This leads to the conclusion that the two measurement results are equivalent.
When $|E_n| > 1$, the two measurement results are not equivalent.
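A minimal Python sketch of this calculation, using hypothetical values and assuming both expanded uncertainties are reported at the same coverage factor:

```python
import math

def normalized_error(x, X, U_lab, U_ref):
    """Normalized error per Equation (1): x and X are the participant and
    reference values; U_lab and U_ref are their expanded uncertainties."""
    return (x - X) / math.sqrt(U_lab**2 + U_ref**2)

# Hypothetical cal-factor comparison at one frequency
En = normalized_error(x=98.5, X=99.2, U_lab=1.1, U_ref=0.9)
verdict = "satisfactory" if abs(En) <= 1.0 else "unsatisfactory"
print(f"En = {En:+.2f} -> {verdict}")
```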
Quantifying Demonstrated Equivalence
In [2], a Quantifying Demonstrated Equivalence (QDE) equation is introduced. The final outcome of this analysis can lead to a statement like the following:
On the basis of comparison measurements performed in the period of date-to-date, the results of similar measurements made at lab 1 and lab 2 can be expected to agree to within [QDE$_{0.95}$], with 95% confidence.
For each laboratory, the comparison result (its mean and its uncertainty budget) is used to construct a probability density function for that laboratory's measurement. After correcting for correlations and convolving the probability density functions, the range established by this analysis can be said to have demonstrated the confidence level of the claimed equivalence.
When $u(x_r)$ is not negligible but the value $x_r$ is known independently of the nominal value $x_i$,

$$E_n = \frac{x_i - x_r}{\sqrt{u^2(x_i) + u^2(x_r)}}$$

This equation provides simple laboratory-by-laboratory consistency testing against an external standard.
As indicated by [2], using $E_n$ as the single parameter for quantifying equivalence can be misleading. The confidence associated with an equivalence interval has not previously been available for different values of $E_n$, and the confidence levels for agreement are low, much less than 95%.
$$QDE_{0.95} \cong |m_2 - m_1| + \{1.645 + 0.3295\,\exp[-4.05\,|m_2 - m_1|/u_P]\}\,u_P$$
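The sketch below evaluates this approximation for a hypothetical pair of results and cross-checks it against a Monte Carlo estimate of the 95th percentile of the absolute pair difference. Here $u_P$ is taken to be the root-sum-of-squares of the two standard uncertainties, which is an assumption since $u_P$ is not defined explicitly above.

```python
import math
import numpy as np

def qde_95(m1, m2, u_p):
    """Closed-form approximation of QDE_0.95 quoted above for a pair of
    results m1, m2 with pair standard uncertainty u_p."""
    d = abs(m2 - m1)
    return d + (1.645 + 0.3295 * math.exp(-4.05 * d / u_p)) * u_p

# Hypothetical pair of comparison results (value, standard uncertainty)
m1, u1 = 99.10, 0.40
m2, u2 = 99.60, 0.55
u_p = math.sqrt(u1**2 + u2**2)   # assumed: root-sum-of-squares pair uncertainty

approx = qde_95(m1, m2, u_p)

# Monte Carlo cross-check: 95th percentile of |difference| when the pair
# difference is modeled as normal with mean (m2 - m1) and std dev u_p
rng = np.random.default_rng(0)
diffs = rng.normal(m2 - m1, u_p, 1_000_000)
mc = np.percentile(np.abs(diffs), 95)

print(f"QDE_0.95: approximation = {approx:.3f}, Monte Carlo = {mc:.3f}")
```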
Care is required when claimed equivalence is tested: in any aggregation, a Pair-Difference (PD) is only eligible for inclusion if both participants claim that their answers should be the same. The uncertainties required are exactly those required when any measurement is used in the simplest possible way: in comparison with another measurement.
Hypothesis Testing and Statistical Significance
Hypothesis testing and experimental design are popular tools for interpreting the equivalence of experiments. They are useful for statistically determining the validity of
calibration results. However, a result that is statistically significant is not necessarily practically
significant. For example, suppose we wish to determine if the mean absolute errors in the new and standard methods of measurement are related by $\mu_{new} < \mu_{standard}$ (where $\mu$ denotes the mean absolute error), with a view to preferring the new method over the standard method if the data support the decision. No matter how small the mean
absolute errors are, an alternative hypothesis is always rejected. Any statistical method should be
fit for its purpose. Without a clear view of the practical purpose of the task, there will not be
unanimity over the statistical approach to be taken [3].
In testing the power sensor cal factor and uncertainty, we are interested in possible differences in
the mean response for two measurements. However, it is the comparison of variability in the
uncertainty that is important. Unlike the tests on means, the procedures for tests on variances are
rather sensitive to the normality assumption. Suppose we wish to test the hypothesis that the
uncertainty of a normal distribution equals a constant, for example, $\sigma_0^2$:

$$H_0: \sigma^2 = \sigma_0^2, \qquad H_1: \sigma^2 \neq \sigma_0^2$$

The test statistic is

$$\chi_0^2 = \frac{SS}{\sigma_0^2} = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{\sigma_0^2}$$
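A short sketch of this single-variance test with SciPy; the readings and the claimed value $\sigma_0$ are hypothetical, and normality is assumed as noted above.

```python
import numpy as np
from scipy import stats

# Hypothetical repeated cal-factor readings and a claimed standard deviation
y = np.array([98.9, 99.3, 99.1, 99.4, 98.8, 99.2, 99.0, 99.3])
sigma0 = 0.15

n = y.size
ss = np.sum((y - y.mean())**2)      # SS = corrected sum of squares
chi2_0 = ss / sigma0**2             # test statistic with n - 1 degrees of freedom

# Two-sided p-value for H0: sigma^2 = sigma0^2
cdf = stats.chi2.cdf(chi2_0, df=n - 1)
p_value = 2 * min(cdf, 1 - cdf)
print(f"chi2_0 = {chi2_0:.2f}, p = {p_value:.3f}")
```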
Now consider testing the equality of the uncertainties of two normal populations. The test
statistic is the ratio of the sample variances [5].
$$F_0 = \frac{S_1^2}{S_2^2}$$
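A companion sketch for the two-variance F test, again with hypothetical readings:

```python
import numpy as np
from scipy import stats

# Hypothetical readings of the same cal factor from two laboratories
lab1 = np.array([99.0, 99.2, 98.8, 99.3, 99.1, 98.9])
lab2 = np.array([99.1, 99.4, 98.7, 99.6, 98.8, 99.2])

s1_sq = lab1.var(ddof=1)            # sample variances
s2_sq = lab2.var(ddof=1)
F0 = s1_sq / s2_sq                  # test statistic F0 = S1^2 / S2^2

# Two-sided p-value with (n1 - 1, n2 - 1) degrees of freedom
df1, df2 = lab1.size - 1, lab2.size - 1
cdf = stats.f.cdf(F0, df1, df2)
p_value = 2 * min(cdf, 1 - cdf)
print(f"F0 = {F0:.2f}, p = {p_value:.3f}")
```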
Given two means, $\bar{x}_A$ and $\bar{x}_B$, and variances $\sigma_A^2$ and $\sigma_B^2$ from normal distributions, the critical value of the t statistic for two independent variables is $t_{0.025,\nu} = 1.962$ (for a 95% confidence interval), so the comparison criterion is

$$\frac{\bar{x}_A - \bar{x}_B}{\sqrt{\sigma_A^2 + \sigma_B^2}} < t_{0.025,\nu} \approx 2 \qquad (2)$$
There are two important notes for this equation, drawn from the operating characteristic curves [4].
1. The greater the difference in means, the smaller the probability of type II error (false
reject) for a given sample size. That is, the test will detect large differences more easily
than small ones.
2. As the sample size gets larger, the probability of type II error (false reject) gets smaller
for a given difference in means. That is, to detect a specified difference, we may make
the test more powerful by increasing the sample size.
The second point is critical when applying this method to determine the out-of-tolerance
conditions of the power sensor cal factor.
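The second point can be illustrated with a rough power calculation for a two-sided, two-sample z test (a normal approximation with a known, common standard deviation; all numbers are hypothetical): the probability of detecting a fixed difference in means grows with the sample size.

```python
from scipy import stats

def power_two_sample(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample z test for detecting a
    true difference in means `delta` with common standard deviation `sigma`
    and n observations per group (the tiny opposite-tail term is ignored)."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    standardized = abs(delta) / (sigma * (2.0 / n) ** 0.5)
    return stats.norm.cdf(standardized - z_crit)

# Power grows with sample size for a fixed difference in means
for n in (5, 10, 20, 50):
    print(f"n = {n:2d}  power = {power_two_sample(delta=0.3, sigma=0.5, n=n):.3f}")
```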
The denominator of Eq. (2) should include the covariance terms if two variables are not
independent. The Bayesian posterior predictive p-value [1] of the realized discrepancy measure $\bar{x}_A - \bar{x}_B$ is therefore

$$p_p = \Pr\!\left[\, Z \ge \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\sigma_A^2 + \sigma_B^2 - 2\rho_{AB}\,\sigma_A\sigma_B}} \,\right]$$

where $\rho_{AB}$ is the correlation coefficient between the presumed normal sampling probability density functions.
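A sketch of this correlation-adjusted comparison; the means, standard deviations, and correlation coefficient below are hypothetical, and the one-sided tail is used to match the expression above (a two-sided assessment would double it).

```python
import math
from scipy import stats

# Hypothetical results from laboratories A and B
xA, sA = 99.65, 0.40
xB, sB = 99.10, 0.35
rho = 0.3                          # assumed correlation between the two results

# Standard deviation of the difference including the covariance term
s_diff = math.sqrt(sA**2 + sB**2 - 2.0 * rho * sA * sB)

# Posterior predictive p-value of the realized discrepancy (one-sided tail
# as written above; a two-sided assessment would double this value)
z = (xA - xB) / s_diff
p_p = 1.0 - stats.norm.cdf(z)
print(f"z = {z:.2f}, p_p = {p_p:.3f}")
```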
Extension of Normalized Error to the Out-of-tolerance Decision
When two laboratories make measurements and apply the normalized error to assess the out-of-tolerance condition, we suggest that a supplemental test using the false acceptance and rejection rates of the two measurement results, based on the statistical recommendations in the ANSI/NCSLI Z540.3 handbook, might be used.
Microwave Power Sensor Case Study
In this case study, Anritsu power sensors, MA2445D, were used and tested by three independent
laboratories. Two laboratories use the same measurement method and the third uses a different
different technique. After each measurement, the normalized error was calculated per equation (1)
and reported in Figure 1.
Figure 1: Normalized Error results among the three labs (normalized error ratio versus frequency, 0-50 GHz, for B Lab vs. Ref, A Lab vs. Ref, and B Lab vs. A Lab).
Seven unsatisfactory comparisons were noted. In order to understand the failure modes, we
further examine the expanded uncertainties from each lab. The uncertainty values are very close, as shown in Figure 2.
Figure 2: Expanded Uncertainty reported for each lab (cal factor expanded uncertainty versus frequency, 0-50 GHz, for B Lab and A Lab).
After comparing the unsatisfactory normalized errors, we noted that apparently inconsistent decisions were made when based solely on the normalized errors. We therefore constructed four scenarios, shown in Figure 3. The decisions of the two laboratories differ when an uncertainty window larger than the target is reported, and the likelihood of consistent decisions increases when both laboratories report uncertainty values close to the target. This result can be explained by calculating the false accept/reject rates for the compliance decision.
Figure 3: Four cases simulated for the pass/fail decision (dotted green circle: Anritsu; blue circle: External Lab; dashed green circle: target uncertainty).
This leads us to analyze the false accept/reject rates using the techniques specified in the ANSI/NCSLI Z540.3 handbook. Table 1 summarizes the results for the 19 GHz cal factor of the MA2445D. Case A shows a higher false reject rate; a large false reject rate implies that an out-of-tolerance judgment might be statistically false.
Table 1: False accept and reject analysis for three cases.

Scenario   Tol Limit   Cal Unc   False Accept Rate   False Reject Rate   Wrong Decision Rate
Case A     2.65        3.3       0.99%               18.56%              19.55%
Case B     5.3         3.3       3.62%               7.64%               11.27%
Case C     5.3         2.65      3.27%               5.76%               9.03%
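For illustration only, the following Monte Carlo sketch estimates false accept and false reject rates under assumptions that are not stated in the paper: the unit's true cal-factor error and the calibration error are modeled as independent zero-mean normals, the expanded uncertainty is interpreted at k = 2, the acceptance limit equals the tolerance limit, and the process spread is set so that roughly 95% of units are in tolerance. Case A's nominal numbers are used as inputs, but because of these assumptions the results will not reproduce Table 1 exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000

T = 2.65            # tolerance limit (Case A, same units as the cal-factor error)
U_cal = 3.3         # expanded calibration uncertainty (assumed k = 2)
sigma_m = U_cal / 2.0
sigma_p = T / 2.0   # assumed process spread: roughly 95% of units in tolerance

e = rng.normal(0.0, sigma_p, N)         # true deviation of the unit under test
y = e + rng.normal(0.0, sigma_m, N)     # observed value = true deviation + measurement error

in_tol = np.abs(e) <= T
accepted = np.abs(y) <= T               # acceptance limit assumed equal to the tolerance limit

false_accept = np.mean(accepted & ~in_tol)   # accepted but actually out of tolerance
false_reject = np.mean(~accepted & in_tol)   # rejected but actually in tolerance
print(f"False accept: {false_accept:.2%}  False reject: {false_reject:.2%}")
```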
Conclusions
We have discussed the need for a measure for determining out-of-tolerance conditions for certain measurements. The normalized error and other statistical techniques may be used together to make a correct decision, and cautions on En are given for cases where covariance cannot be ignored. When the measurement techniques are at the same level of accuracy, adding the false accept and reject analysis helps ensure proper interpretation leading to the out-of-tolerance decision.
References
1. Raghu N. Kacker, Rüdiger Kessel, and Klaus-Dieter Sommer, "Assessing Differences Between Results Determined According to the Guide to the Expression of Uncertainty in Measurement," Journal of Research of the National Institute of Standards and Technology, Vol. 115, No. 6, Nov.-Dec. 2010, pp. 453-459.
2. A. G. Steele and R. J. Douglas, "Extending En for measurement science," Metrologia 43 (2006) S235-S243.
3. B. Wood and R. Douglas, "Quantifying Demonstrated Equivalence," IEEE Trans. Instrumentation and Measurement, Vol. 48, No. 2, April 1999, pp. 162-165.
4. R. Willink, "Principles of probability and statistics for metrology," Metrologia 43 (2006) S211-S219.
5. Douglas Montgomery, Design and Analysis of Experiments, 6th Edition, Wiley.
6. ISO/IEC 17043:2010, Conformity assessment - General requirements for proficiency testing.