Testing Theory

P.J.G. Teunissen

This work is licensed under a Creative Commons Attribution 4.0 International license

December, 2006
July, 2024
Foreword
This book is based on the lecture notes of the course 'Testing theory' (Inleiding Toetsingstheorie)
as it has been offered since 1989 by the Department of Mathematical Geodesy and Positioning
(MGP) of the Delft University of Technology. This course is a standard requirement and is given
in the second year. The prerequisites are a solid knowledge of adjustment theory together with
linear algebra, statistics and calculus at the undergraduate level. The theory and application of
least-squares adjustments are treated in the lecture notes Adjustment theory (Delft University
Press, 2000). The material of the present course is a follow-up on this course on adjustment
theory. Its main goal is to convey the knowledge necessary to be able to judge and validate the
outcome of an adjustment. As in other physical sciences, measurements and models are used in
Geodesy to describe (parts of) physical reality. It may happen however, that some of the
measurements or some parts of the model are biased or in error. The measurements, for instance,
may be corrupted by blunders, or the chosen model may fail to give an adequate enough
description of physical reality. These mistakes can and will occasionally happen, despite the fact
that every geodesist will try his or her best to avoid making such mistakes. It is therefore of
importance to have ways of detecting and identifying such mistakes. It is the material of the
present lecture notes that provides the necessary statistical theory and testing procedures for
resolving situations like these.
Following the Introduction, the basic concepts of statistical testing are presented in Chapter 1.
In Chapter 2 the necessary theory is developed for testing simple hypotheses. As opposed to its
composite counterpart, a simple hypothesis is one which is completely specified, both in its
functional form as well as in the values of its parameters. Although simple hypotheses rarely
occur in geodetic practice, the material of this chapter serves as an introduction to the chapters
following. In Chapter 3, the generalized likelihood ratio principle is used to develop the theory
for testing composite hypotheses. This theory is then worked out in detail in Chapter 4, for the
important case of linear(ized) models. Both the parametric form (observation equations) and the
implicit form (condition equations) of linear models are treated. Five different expressions are
given for the uniformly most powerful invariant teststatistic. As an additional aid in
understanding the basic principles involved, a geometric interpretation is given throughout. This
chapter also introduces the important concept of reliability. The internal and external reliability
measures given enable a user to determine in advance (i.e. at the design stage, before the
actual measurements are collected) the size of the minimal detectable biases and the size of their
potential impact on the estimated parameters of interest.
Many colleagues of the Department of Mathematical Geodesy and Positioning whose assistance
made the completion of this book possible are gratefully acknowledged. C.C.J.M. Tiberius took care
of the editing, while the typing was done by Mrs. J. van der Bijl and Mrs. M.P.M. Scholtes. The
drawings were made by Mr. A.B. Smits and the statistical tables were generated by Mrs. M.
Roselaar. Various lecturers have taught the book’s material over the past years. In particular the
feedback and valuable recommendations of G.J. Husti, F. Kenselaar and N.F. Jonkman are
acknowledged.
P.J.G. Teunissen
June, 2000
Contents
Introduction 1
Appendices 124
Literature 137
Index 139
Introduction
The present lecture notes are a follow-up on the book Adjustment theory (TU Delft Open Publishing,
2024). Adjustment theory deals with the optimal combination of redundant measurements together
with the estimation of unknown parameters. There are two main reasons for performing redundant
measurements. First, the wish to increase the accuracy of the results computed. Second, the
requirement to be able to check for mistakes or errors. The present book addresses this second topic.
In order to be able to adjust redundant observations, one first needs to choose a mathematical
model. This model consists of two parts, the functional model and the stochastic model. The
functional model contains the set of functional relations the observables are assumed to obey. For
instance, when the three angles of a triangle are observed and when it is assumed that the laws of
planar Euclidean geometry apply, the three angles should add up to π. However, since measurements
are intrinsically uncertain (perfect measurements do not exist), one should also take the unavoidable
variability of the measurements into account. This is done by means of a stochastic model in which
the measurement uncertainty is captured through the use of stochastic (or random) variables. In
most geodetic applications it is assumed that the results of measurement, the observations, are
independent samples drawn from a normal (or Gaussian) distribution.
Once the mathematical model is specified, one can proceed with the adjustment. Although different
methods of adjustment exist, one of the leading principles is the principle of least-squares (for a brief
account on the early history of adjustment, see Appendix G of the book Adjustment theory). Apart
from the fact that properly weighted (linear) least-squares estimators are relatively easy to compute,
they also possess two important properties, namely the property of unbiasedness and the property of
minimum variance. In layman's terms one could say that least-squares solutions coincide with their
target value on the average (property of unbiasedness), while the sum of squares of their
unavoidable, individual variations about this target value will be the smallest possible on the average
(property of minimum variance). These two properties only hold true, however, under the
assumption that the mathematical model is correct. They fail to hold in case the mathematical model
is misspecified. Errors or misspecifications in the functional model generally result in least-squares
estimators that are biased (off target). Similarly, misspecifications in the stochastic model will
generally result in least-squares estimators that are less precise (larger variations).
Although one always will try one’s best to avoid making mistakes, they can and will
occasionally happen. It is therefore of importance to have ways of detecting and identifying such
mistakes. In this book we will restrict ourselves and concentrate only on developing methods
for detecting and identifying errors in the functional model. Hence, throughout this book the
stochastic model is assumed to be specified correctly. This restriction is a legitimate one for many
geodetic applications. From past experience we know that if modelling errors occur, they usually
occur in the functional model and not so much in the stochastic model. Putting the exceptions
aside, one is usually quite capable of making a justifiable choice for the stochastic model.
Moreover, mistakes made in the functional model usually have more serious consequences for the
results computed than errors made in the stochastic modelling.
Mistakes or errors in the functional model can come in many different guises. At this point it
is important to realize that, since every model is a caricature of reality, every model has its
shortcomings. Hence, strictly speaking, every model is already in error to begin with. This shows
that the notion of a modelling error or a model misspecification has to be considered with some
care. In order to understand this notion, it helps if one accepts that the presence of modelling
errors can only be felt in the confrontation between data and model. We therefore speak of a
modelling error when the discrepancies between the observations and the model are such that
they cannot be explained by, or attributed to, the unavoidable measurement uncertainty. Such
discrepancies can have many different causes. They could be caused by mistakes made by the
observer, or by the fact that defective instruments are used, or by wrong assumptions about the
functional relations between the observables. For instance, in case of levelling, it could happen
that the observer made a mistake when reading off the levelling rod, or in case of direction
measurements, it could happen that the observer accidentally aimed the theodolite at the wrong
point. These types of mistakes affect individual observations and are usually referred to as
blunders or gross errors. Instead of a few individual observations, whole sets of observations may
become affected by errors as well. This happens in case defective instruments are used, or when
mistakes are made in formulating the functional relations between the observables. Errors with
a common cause that affect whole sets of observations are sometimes referred to as systematic
errors.
The goal of this book is to convey the necessary knowledge for judging the validity of the model
used. Typical questions that will be addressed are: ’How to check the validity of a model? How
to search for certain mistakes or errors? How well can errors be traced? How do undetected
errors affect the final results?’ As to the detection and identification of errors, the general steps
involved are as follows:
(i) One starts with a model which is believed to give an adequate enough description of
reality. It is usually the simplest model possible which on the basis of past experience has
proven itself in similar situations. Since one will ordinarily assume that the measurements
and the modelling are done with the utmost care, one is generally not willing, at this
stage, to already make allowances for possible mistakes or errors. This is of course an
assumption or an hypothesis. This first model is therefore referred to as the null
hypothesis.
(ii) Since one can never be sure about the absence of mistakes or errors, it is always wise to
check the validity of the null hypothesis once it has been selected. Hence, one would like
to be able to detect an untrustworthy null hypothesis. This is possible in principle, when
redundant measurements are available. From the adjustment of the redundant
measurements, (least-squares) residuals can be computed. These residuals are a measure
of how well the measurements fit the model of the null hypothesis. Large residuals are
often indicative of a poor fit, while smaller residuals tend to correspond with a better fit.
These residuals are therefore used as input for deciding whether or not one is willing to
accept the null hypothesis.
(iii) If one decides to reject the null hypothesis, one implicitly states that the
measurements do not seem to support the assumption that the model under the null
hypothesis gives an adequate enough description of reality. One will therefore have to
look for an alternative model or an alternative hypothesis. It very seldom happens
however, that one knows beforehand which alternative to consider. After all, many
different errors could have led to the rejection of the null hypothesis. This implies that
in practice, instead of considering a single alternative, usually various alternatives will
have to be considered. And since different types of errors may occur in different
situations, the choice of these alternatives very much depends on the particular situation
at hand.
(iv) Once it has been decided which alternatives to consider, one can commence with the
process of identifying the most likely alternative. This in fact boils down to a search of
the alternative hypothesis which best fits the measurements. Since each alternative
hypothesis describes a particular mistake or modelling error, the most likely mistake
corresponds with the most likely hypothesis. Once one is confident that the modelling
errors have been identified, the last step consists of an adaptation of the data and/or
model. This implies either a re-measurement of the erroneous data or the inclusion of
additional parameters in the model such that the modelling errors are accounted for.
It will be intuitively clear that not all errors can be traced equally well. Some errors are better
traceable than others. Apart from being able to execute the above steps for the detection and
identification of modelling errors, one would therefore also like to know how well these errors
can be traced. This depends on the following factors. It depends on the model used (the null
hypothesis), on the type and size of the error (the alternative hypothesis), and on the decision
procedure used for accepting or rejecting the null hypothesis. Since these decisions are based on
uncertain measurements, their outcomes will be to some degree uncertain as well. As a
consequence, two kinds of wrong decisions can be made. One can decide to reject the null
hypothesis, while in fact it is true (wrong decision of the 1st kind), or one can decide to accept
the null hypothesis, although it is false (wrong decision of the 2nd kind). In the first case, one
wrongly believes that a mistake or modelling error has been made. This might then lead to an
unnecessary re-measurement of the data. In the second case, one wrongly believes that mistakes
or modelling errors are absent. As a consequence, one would then obtain biased adjustment
results. These issues and how to cope with them, will also be discussed in this book. Once
mastered, they will enable one to formulate guidelines for the reliable design of measurement
set-ups.
1 Basic concepts of hypothesis testing
1.1 Statistical hypotheses
Many social, technical and scientific problems result in the question whether a particular theory
or hypothesis is true or false. In order to answer this question one can try to design an
experiment such that its outcome can also be predicted by the postulated theory. After performing
the experiment one can then confront the experimental outcome with the theoretically predicted
value and on the basis of this comparison try to conclude whether the postulated theory or
hypothesis should be rejected. That is, if the outcome of the experiment disagrees with the
theoretically predicted value, one could conclude that the postulated theory or hypothesis should
be rejected. On the other hand, if the experimental outcome is in agreement with the theoretically
predicted value, one could conclude that as yet no evidence is available to reject the postulated
theory or hypothesis.
Example 1
According to the postulated theory or hypothesis the three points 1, 2 and 3 of Figure 1.1 lie on
one straight line. In order to test or verify this hypothesis we need to design an experiment such
that its outcome can be compared with the theoretically predicted value.
If the postulated hypothesis is correct, the three distances l12, l23 and l13 should satisfy the
relation:
(1) $l_{12} + l_{23} - l_{13} = 0$.
To denote a hypothesis, we will use a capital H followed by a colon that in turn is followed by
the assertion that specifies the hypothesis. As an experiment we can now measure the three
distances l12, l23 and l13, compute l12 + l23 − l13 and verify whether this computed value agrees or
disagrees with the theoretically predicted value of H. If it agrees, we are inclined to accept the
hypothesis that the three points lie on one straight line. In case of disagreement we are inclined
to reject hypothesis H.
It will be clear that in practice the testing of hypotheses is complicated by the fact that
experiments (in particular experiments where measurements are involved) in general do not give
outcomes that are exact. That is, experimental outcomes are usually affected by an amount of
uncertainty, due for instance to measurement errors. In order to take care of this uncertainty, we
will, in analogy with our derivation of estimation theory in "Adjustment theory", model the
uncertainty by making use of the results from the theory of random variables. The verification
or testing of postulated hypotheses will therefore be based on the testing of hypotheses of
random variables of which the probability distribution depends on the theory or hypothesis
postulated. From now on we will therefore consider statistical hypotheses, written in the general form:

(2) $H: \underline{y} \sim p_{\underline{y}}(y \mid x)$.

This statistical hypothesis should be read as follows: according to $H$ the scalar or vector
observable random variable $\underline{y}$ has a probability density function given by $p_{\underline{y}}(y \mid x)$. The scalar,
vector or matrix parameter $x$ used in the notation of $p_{\underline{y}}(y \mid x)$ indicates that the probability density
function of $\underline{y}$ is known except for the unknown parameter $x$. Thus, by specifying (either fully
or partially) the parameter $x$, an assertion or conjecture about the density function of $\underline{y}$ is made.
In order to see how a statistical hypothesis for a particular problem can be formulated, let us
continue with our Example 1.
Example 1 (continued)
We know from experience that in many cases the uncertainty in geodetic measurements can be
adequately modelled by the normal distribution. We therefore model the three distances between
the three points 1, 2 and 3 as normally distributed random variables.¹ If we also assume that
the three distances are uncorrelated and all have the same known variance $\sigma^2$, the simultaneous
probability density function of the three distance observables becomes:

(3) $H: \underline{l} \sim N(E\{\underline{l}\}, Q_l)$, with $Q_l = \sigma^2 I_3$ and $\underline{l} = (\underline{l}_{12}, \underline{l}_{23}, \underline{l}_{13})^T$.

¹ Note that strictly speaking distances can never be normally distributed. A distance is
always nonnegative, whereas the normal distribution, due to its infinite tails, admits
negative sample values.
Statement (3) could already be considered a statistical hypothesis, since it has the same structure
as (2). Statement (3) asserts that the three distance observables are indeed normally distributed
with unknown mean, but with known variance matrix $Q_l$. Statement (3) is however not yet the
statistical hypothesis we are looking for. What we are looking for is a statistical hypothesis of
which the probability density function depends on the theory or hypothesis postulated. For our
case this means that we have to incorporate in some way the hypothesis that the three points lie
on one straight line. We know mathematically that this assertion implies that:

(4) $l_{12} + l_{23} - l_{13} = 0$.

However, we cannot make this relation hold for the random variables $\underline{l}_{12}$, $\underline{l}_{23}$ and $\underline{l}_{13}$. This is
simply because of the fact that random variables cannot be equal to a constant. Thus, a statement
like $\underline{l}_{12} + \underline{l}_{23} - \underline{l}_{13} = 0$ is nonsensical. What we can do is assume that relation (4) holds for the
expected values of the random variables $\underline{l}_{12}$, $\underline{l}_{23}$ and $\underline{l}_{13}$:

(5) $E\{\underline{l}_{12}\} + E\{\underline{l}_{23}\} - E\{\underline{l}_{13}\} = 0$.

For the hypothesis considered this relation makes sense. It can namely be interpreted as stating
that if the measurement experiment were to be repeated a great number of times, then on the
average the measurements will satisfy (5). With (3) and (5) we can now state our statistical
hypothesis as:

(6) $H: \underline{l} \sim N(E\{\underline{l}\}, \sigma^2 I_3)$, with $E\{\underline{l}_{12}\} + E\{\underline{l}_{23}\} - E\{\underline{l}_{13}\} = 0$.

This hypothesis has the same structure as (2), with the three means playing the role of the
parameter $x$.
In many hypothesis-testing problems two hypotheses are discussed: The first, the hypothesis
being tested, is called the null hypothesis and is denoted by H0 . The second is called the
alternative hypothesis and is denoted by HA . The thinking is that if the null hypothesis H0 is
false, then the alternative hypothesis HA is true, and vice versa. We often say that H0 is tested
against, or versus, HA . In studying hypotheses it is also convenient to classify them into one of
two types by means of the following definition: if a hypothesis completely specifies the
distribution, that is, if it specifies its functional form as well as the values of its parameters, it
is called a simple hypothesis; otherwise it is called a composite hypothesis.
Example 1 (continued)
In our example (6) is the hypothesis to be tested. Thus, the null hypothesis reads in our case:

(7) $H_0: \underline{l} \sim N(E\{\underline{l}\}, \sigma^2 I_3)$, with $E\{\underline{l}_{12}\} + E\{\underline{l}_{23}\} - E\{\underline{l}_{13}\} = 0$.

Since we want to find out whether $E\{\underline{l}_{12}\} + E\{\underline{l}_{23}\} - E\{\underline{l}_{13}\} = 0$ or not, we could take as alternative
the inequality $E\{\underline{l}_{12}\} + E\{\underline{l}_{23}\} - E\{\underline{l}_{13}\} \neq 0$. However, we know from the geometry of our problem
that the left hand side of the inequality can never be negative. The alternative should therefore
read $E\{\underline{l}_{12}\} + E\{\underline{l}_{23}\} - E\{\underline{l}_{13}\} > 0$. Our alternative hypothesis takes therefore the form:

(8) $H_A: \underline{l} \sim N(E\{\underline{l}\}, \sigma^2 I_3)$, with $E\{\underline{l}_{12}\} + E\{\underline{l}_{23}\} - E\{\underline{l}_{13}\} > 0$.
When comparing (7) and (8) we see that the type of the distribution of the observables and their
variance matrix are not in question. They are assumed to be known and identical under both H0
and HA . Both of the above hypotheses, H0 and HA , are examples of composite hypotheses. The
above null hypothesis H0 would become a simple hypothesis if the individual expectations of
the observables were assumed known.
After the statistical hypotheses $H_0$ and $H_A$ have been formulated, one would like to test them
in order to find out whether $H_0$ should be rejected or not. A test of a statistical hypothesis
is a rule or procedure in which a random sample of $\underline{y}$ is used for deciding whether to reject or
not reject $H_0$. A test of a statistical hypothesis is completely specified by the so-called critical
region (kritiek gebied), which will be denoted by $K$.

The critical region $K$ of a test is the set of sample values of $\underline{y}$ for which $H_0$ is to be rejected.
Thus, $H_0$ is rejected if $y \in K$.
It will be obvious that we would like to choose a critical region so as to obtain a test with
desirable properties, that is, a test that is "best" in a certain sense. Criteria for comparing tests
and the theory for obtaining "best" tests will be developed in the next and following sections.
But let us first have a look at a simple testing problem for which, on more or less intuitive
grounds, an acceptable critical region can be found.
Example 2
Let us assume that a geodesist measures a scalar variable, and that this measurement can be
modelled as a random variable $\underline{y}$ with density function:

(9) $p_{\underline{y}}(y \mid x) = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(y - x)^2\}$.
Thus, it is assumed that y has a normal distribution with unit variance. Although this assumption
constitutes a statistical hypothesis, it will not be tested here because the geodesist is quite certain
of the validity of this assumption. The geodesist is however not certain about the value of the
expectation of $\underline{y}$. His assumption is that the value of $E\{\underline{y}\}$ is $x_0$. This assumption is the statistical
hypothesis to be tested. Denote this hypothesis by $H_0$. Then:

(10) $H_0: E\{\underline{y}\} = x_0$ versus $H_A: E\{\underline{y}\} \neq x_0$.
Thus the problem is one of testing the simple hypothesis $H_0$ against the composite hypothesis
$H_A$. To test $H_0$, a single observation on the random variable $\underline{y}$ is made. In real-life problems one
usually takes several observations, but to avoid complicating the discussion at this stage only one
observation is taken here. On the basis of the value of $\underline{y}$ obtained, denoted by $y$, a decision will
be made either to accept $H_0$ or reject it. The latter decision, of course, is equivalent to accepting
$H_A$. The problem then is to determine what values of $y$ should be selected for accepting $H_0$ and
what values for rejecting $H_0$. If a choice has been made of the values of $y$ that will correspond
to rejection, then the remaining values of $y$ will necessarily correspond to acceptance. As defined
above, the rejection values of $y$ constitute the critical region $K$ of the test. Figure 1.2 shows the
distribution of $\underline{y}$ under $H_0$ and under two possible alternatives $H_{A_1}$ and $H_{A_2}$.
Figure 1.2: The distribution of $\underline{y}$ under $H_0$ and under two possible alternatives $H_{A_1}$ and $H_{A_2}$.
Looking at this figure, it seems reasonable to reject $H_0$ if the observation $y$ is remote enough
from $E\{\underline{y}\} = x_0$. If $H_0$ is true, the probability of a sample of $\underline{y}$ falling in a region remote from
$E\{\underline{y}\} = x_0$ is namely small. And if $H_A$ is true, this probability may be large. Thus the critical
region $K$ should contain those sample values of $y$ that are remote enough from $E\{\underline{y}\} = x_0$. Also,
since the alternative hypothesis can be located on either side of $E\{\underline{y}\} = x_0$, it seems obvious to
have one portion of $K$ located in the left tail of $H_0$ and one portion of $K$ located in the right tail
of $H_0$. Finally, one can argue that since the distribution is symmetric about its mean value, the
critical region $K$ should also be symmetric about $E\{\underline{y}\} = x_0$. This as a result gives the form of the
critical region $K$ as shown in Figure 1.3. Although this critical region has been found on more
or less intuitive grounds, it can be shown that it possesses some desirable properties. We will
return to this matter in a later section.
We have seen that a test of a statistical hypothesis is completely specified once the critical region
$K$ of the test is given. The null hypothesis $H_0$ is rejected if the sample value or observation of
$\underline{y}$ falls in the critical region, i.e. if $y \in K$. Otherwise, i.e. if $y \notin K$, the null hypothesis $H_0$ is
accepted. With this kind of thinking two types of errors can be made. Table 1.1 shows the
decision table with the type I and II errors.

Table 1.1: Decision table for testing $H_0$ against $H_A$.

                               $H_0$ true              $H_0$ false
  Reject $H_0$ ($y \in K$)     wrong: type I error     correct
  Accept $H_0$ ($y \notin K$)  correct                 wrong: type II error
The size of a type I error is defined as the probability that a sample value of $\underline{y}$ falls in the
critical region when in fact $H_0$ is true. This probability is denoted by $\alpha$ and is called the size of
the test or the level of significance of the test (onbetrouwbaarheid van de test). Thus:

(12) $\alpha = P(\underline{y} \in K \mid H_0) = \int_K p_{\underline{y}}(y \mid H_0)\, dy$.

The size of the test, $\alpha$, can be computed once the critical region $K$ and the probability density
function of $\underline{y}$ under $H_0$ are known. The size of a type II error is defined as the probability that
a sample value of $\underline{y}$ falls outside the critical region when in fact $H_0$ is false. This probability
is denoted by $\beta$. Thus:

(13) $\beta = P(\underline{y} \notin K \mid H_A) = \int_{\mathbb{R}^m \setminus K} p_{\underline{y}}(y \mid H_A)\, dy$.

The size of a type II error, $\beta$, can be computed once the critical region $K$ and the probability
density function of $\underline{y}$ under $H_A$ are known.
Example 3
In the present example the hypotheses considered are $H_0: E\{\underline{y}\} = x_0$ versus $H_A: E\{\underline{y}\} = x_A$,
with $x_A > x_0$ and known variance $\sigma^2$ (cf. Figure 1.6). Since the alternative hypothesis $H_A$ is
located on the right of the null hypothesis $H_0$, it seems intuitively appealing to choose the
critical region $K$ right-sided. Figures 1.5a and 1.5b show two possible right-sided critical regions
$K$. They also show the size of the test, $\alpha$, which corresponds to the area under the graph of the
distribution of $\underline{y}$ under $H_0$ over the interval of the critical region $K$.
The size of the test, $\alpha$, can be computed once the probability density function of $\underline{y}$ under $H_0$ is
known and the form and location of the critical region $K$ are known. In the present example the
form of the critical region has been chosen right-sided. Its location is determined by the value
of $k_\alpha$, the so-called critical value (kritieke waarde) of the test. Thus, for the present example the
size of the test can be computed as $\alpha = P(\underline{y} \geq k_\alpha \mid H_0)$, or, since $\underline{y} \sim N(x_0, \sigma^2)$ under $H_0$, as:

(17) $\alpha = \int_{k_\alpha}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\{-\tfrac{1}{2\sigma^2}(y - x_0)^2\}\, dy$.
When one is dealing with one-dimensional normally distributed random variables, one can
usually compute the size of the test, $\alpha$, from tables given for the standard normal distribution
(see Appendix B). In order to compute (17) with the help of such a table, we first have to apply
a transformation of variables. Since $\underline{y}$ is normally distributed under $H_0$ with mean $x_0$ and
variance $\sigma^2$, it follows that the random variable $\underline{z}$, defined as:

(18) $\underline{z} = (\underline{y} - x_0)/\sigma$,

has the standard normal distribution $N(0,1)$ under $H_0$. Since:

(19) $P(\underline{y} \geq k_\alpha \mid H_0) = P\left(\underline{z} \geq \frac{k_\alpha - x_0}{\sigma}\right)$,

we can use the last expression of (19) for computing $\alpha$. Application of the change of variables
(18) to (17) gives:

(20) $\alpha = \int_{(k_\alpha - x_0)/\sigma}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}z^2\}\, dz$.
We can now make use of the table of the standard normal distribution. Table 1.2 shows some
typical values of $\alpha$ and $k_\alpha$ for the case that $x_0 = 1$ and $\sigma = 2$.

Table 1.2: Typical values of $\alpha$, the critical value $k_\alpha$ and the standardized value $(k_\alpha - x_0)/\sigma$, for $x_0 = 1$ and $\sigma = 2$.
As we have seen, the location of the critical region $K$ is determined by the value chosen for $k_\alpha$,
the critical value of the test. But what value should we choose for $k_\alpha$? Here the geodesist should
base his judgement on his experience. Usually one first makes a choice for the size of the test,
$\alpha$, and then by using (20) or Table 1.2 determines the corresponding critical value $k_\alpha$. For
instance, if one fixes $\alpha$ at $\alpha = 0.01$, the corresponding critical value $k_\alpha$ (for the present example
with $x_0 = 1$ and $\sigma = 2$) reads $k_\alpha = 5.65$. The choice of $\alpha$ is based on the probability of a type
I error one is willing to accept. For instance, if one chooses $\alpha = 0.01$, one is willing to
accept that 1 out of 100 experiments leads to rejection of $H_0$ when in fact $H_0$ is true.
Let us now consider the size of a type II error, $\beta$. Figure 1.6 shows for the present example the
size of a type II error, $\beta$. It corresponds to the area under the graph of the distribution of $\underline{y}$
under $H_A$ over the interval complementary to the critical region $K$.

Figure 1.6: The sizes of the type I and type II errors, $\alpha$ and $\beta$, for testing
$H_0: E\{\underline{y}\} = x_0$ versus $H_A: E\{\underline{y}\} = x_A > x_0$.
The size of a type II error, $\beta$, can be computed once the probability density function of $\underline{y}$ under $H_A$
is known and the critical region $K$ is known. Thus, for the present example the size of the type
II error can be computed as $\beta = P(\underline{y} < k_\alpha \mid H_A)$, or, since $\underline{y} \sim N(x_A, \sigma^2)$ under $H_A$, as:

(21) $\beta = \int_{-\infty}^{k_\alpha} \frac{1}{\sigma\sqrt{2\pi}} \exp\{-\tfrac{1}{2\sigma^2}(y - x_A)^2\}\, dy$.

Also this value can be computed with the help of the table of the standard normal distribution.
But first some transformations are needed. It will be clear that the probability that a sample or
observation of $\underline{y}$ falls in the critical region $K$ when $H_A$ is true is identical to 1 minus the
probability that the sample does not fall in the critical region when $H_A$ is true. Thus:

(22) $1 - \beta = P(\underline{y} \in K \mid H_A) = 1 - P(\underline{y} \notin K \mid H_A)$,

or:

(23) $1 - \beta = \int_{k_\alpha}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\{-\tfrac{1}{2\sigma^2}(y - x_A)^2\}\, dy$.

This formula has the same structure as (17). The value $1 - \beta$ can therefore be computed in exactly
the same manner as the size of the test, $\alpha$, was computed. And from $1 - \beta$ it is trivial to compute
$\beta$, the size of the type II error.
Figure 1.7 gives the probability $1 - \beta$ of rejecting $H_0$ when indeed $H_A$ is true, as a function of the
unknown mean $x_A$ under $H_A$. When this probability is required to be at least $1 - \beta = 0.80$, the
unknown mean under $H_A$ has to be at least $x_A = 7.34$. We return to the probability $\gamma = 1 - \beta$, the
power, in Section 4.5 on reliability. The size of the test was fixed at $\alpha = 0.01$.
Figure 1.7: The power $\gamma = 1 - \beta$ as a function of the mean $x_A$ under $H_A$ ($x_0 = 1$, $\sigma = 2$, $\alpha = 0.01$).
We have seen that two types of errors are involved when testing a null hypothesis $H_0$ against
an alternative hypothesis $H_A$: (1) the rejection of $H_0$ when in fact $H_0$ is true (type I error); (2)
the acceptance of $H_0$ when in fact $H_0$ is false (type II error). One might reasonably use the sizes
of the two types of errors, $\alpha$ and $\beta$, to set up criteria for defining a best test. If this is possible,
it would automatically give us a method of choosing a critical region $K$. A good test should be
a test for which $\alpha$ is small (ideally 0) and $\beta$ is small (ideally 0). It would therefore be nice if we
could define a test, i.e. define a critical region $K$, that simultaneously minimizes both $\alpha$ and $\beta$.
Unfortunately this is not possible. As we decrease $\alpha$, we tend to increase $\beta$, and vice versa. The
Neyman-Pearson principle provides a workable solution to this situation. This principle says that
we should fix the size of the type I error, $\alpha$, and minimize the size of the type II error, $\beta$. Thus:

A testing principle (Neyman and Pearson, 1933): Among all tests or critical regions possessing the same
size of type I error, $\alpha$, choose one for which the size of the type II error, $\beta$, is as small as possible.
The justification for fixing the size of the type I error at $\alpha$ (usually small and often taken as
0.05 or 0.01) seems to arise from those testing situations where the two hypotheses, $H_0$ and $H_A$,
are formulated in such a way that one type of error is more serious than the other. The
hypotheses are stated so that the type I error is the more serious, and hence one wants to be
certain that it is small. Testing principles other than the above given one can of course easily be
suggested: for example, minimizing the sum of the sizes of the two types of error, $\alpha + \beta$. However,
the Neyman-Pearson principle has proved to be very useful in practice. In this book we will
therefore base our method of finding tests on this principle. Now let us consider a testing
problem from the point of view of the Neyman-Pearson principle.
Example 4
(24)²

Contrary to our Example 3, it is now not that obvious how to choose the form of the critical
region $K$. Let us first consider the case of a right-sided critical region $K$. Thus:

(26)

(27)

For the right-sided critical region (26) this gives for the size of the type I error:

(28)

² Prove yourself that this function is indeed a probability density function.
(29)

Next, consider a left-sided critical region $K'$:

(30)

For this critical region the size of the type I error becomes:

(31)

(32)

Let us now compare the two tests, that is, the one with the right-sided critical region $K$ and the
one with the left-sided critical region $K'$. We will base this comparison on the Neyman-Pearson
principle. According to this principle, both tests must have the same size of type I error. Thus:

(33) $\alpha = \alpha'$

(34)

Using (29) and (32) this equation can be expressed in terms of $\beta$ and $\beta'$. Hence:

(35)

Figure 1.9 shows the graph of this function. It clearly shows that:

(36) $\beta < \beta'$.

The conclusion reads therefore that of the two tests, the one having the right-sided critical region
$K$ is the best in the sense of the Neyman-Pearson principle.
Thus far we have discussed the basic concepts underlying most hypothesis-testing problems.
The same concepts and guidelines will provide the basis for solving the more complicated
hypothesis-testing problems treated in the next chapters. Here we summarize the main steps
in testing hypotheses about a general probability model.

(a) From the nature of the experimental data and the consideration of the assertions that are
to be examined, identify the appropriate null hypothesis $H_0$ and alternative hypothesis $H_A$.

(b) Choose the form of the critical region $K$ that is likely to give the best test. Use the
Neyman-Pearson principle to make this choice.

(c) Specify the size of the type I error, $\alpha$, that one wishes to assign to the testing process.
Use tables to determine the location of the critical region $K$ from $\alpha = P(\underline{y} \in K \mid H_0)$.

(e) After the test has been explicitly formulated, determine whether the sample or observation
$y$ of $\underline{y}$ falls in the critical region $K$ or not. Reject $H_0$ if $y \in K$, and accept $H_0$ if $y \notin K$.
Never claim, however, that the hypotheses have been proved false or true by the testing.
2 Testing of simple hypotheses
2.1 The simple likelihood ratio test
In this chapter we consider testing a simple null hypothesis H0 against a simple alternative
hypothesis HA . This case is actually not very useful in practical applications, but it will serve
the purpose of developing some theory of testing hypotheses. We will assume that the m×1
vector random variable $\underline{y}$ is distributed as:

(1) $\underline{y} \sim p_{\underline{y}}(y \mid x)$, where the parameter $x$ takes one of the two known values $x_0$ or $x_A$.
Our objective is, given an observation $y$ on $\underline{y}$, to determine from which distribution the
observation came: from $p_{\underline{y}}(y \mid x_0)$ or from $p_{\underline{y}}(y \mid x_A)$? In this section we will give a general
method for solving this testing problem. The method is closely related to the maximum likelihood
principle as discussed in Adjustment theory.
For a fixed value of $x$ the function $p_{\underline{y}}(y \mid x)$ is a function of $y$, and for different values of $x$ the
function $p_{\underline{y}}(y \mid x)$ may take different forms (see Figure 2.1).
In the context of estimation theory the objective was to determine or estimate the unknown
parameter $x$ on the basis of the observation vector $y$. In the present context of hypothesis
testing, the objective is to decide between $H_0$ and $H_A$. In both cases, that is, in the case of
estimation theory and in the case of hypothesis testing, one could say that one would like to
determine the correct value of the parameter $x$ that produced the observed $y$. This suggests
considering for each possible $x$ how probable the observed $y$ would be if $x$ were the true value.
The higher this probability, the more one is attracted to the explanation that the $x$ in question
produced $y$, and the more likely the value of $x$ appears. In estimation theory, where no
constraints were put on $x$, this principle resulted in the maximum likelihood method. This
method chooses as an estimate of $x$ that value which maximizes $p_{\underline{y}}(y \mid x)$ for the given observed
$y$. For the problem of testing the two simple hypotheses $H_0$ and $H_A$ we can now apply the same
principle. But instead of maximizing $p_{\underline{y}}(y \mid x)$ as a function of $x$, we only need to compare the two
likelihood values $p_{\underline{y}}(y \mid x_0)$ and $p_{\underline{y}}(y \mid x_A)$. We decide that the observation $y$ came from $H_0$ if
$p_{\underline{y}}(y \mid x_0) > p_{\underline{y}}(y \mid x_A)$ and, conversely, decide that the observation $y$ came from $H_A$ if
$p_{\underline{y}}(y \mid x_0) < p_{\underline{y}}(y \mid x_A)$. This simple method of obtaining a test for testing $H_0$ against $H_A$ can be
expanded into a family of tests that, as we will see, will contain some good tests.
The simple likelihood ratio test is defined as:

(3) reject $H_0$ if $\dfrac{p_{\underline{y}}(y \mid x_0)}{p_{\underline{y}}(y \mid x_A)} \leq a$,

where $a$ is a fixed positive constant. For each different value of $a$ we have a different test. For a
fixed value of $a$ the test says to reject $H_0$ if the ratio of likelihoods is small; that is, reject $H_0$ if it
is more likely that the observation came from $p_{\underline{y}}(y \mid x_A)$ than from $p_{\underline{y}}(y \mid x_0)$. Let us consider
some examples to see how the simple likelihood ratio test works.
Example 1
Consider testing the two simple hypotheses:

(5) $H_0: \underline{y} \sim N(0, \sigma_0^2 I_m)$ versus $H_A: \underline{y} \sim N(0, \sigma_A^2 I_m)$, with $\sigma_A^2 > \sigma_0^2$.

Figure 2.2 shows the distribution of $\underline{y}$ under $H_0$ and $H_A$ for $m = 1$. For $m = 1$, it seems intuitively
appealing to reject $H_0$ if the observation $y$ is remote from the zero mean value. Due to the
symmetry of the distribution of $\underline{y}$, it also seems intuitively appealing to choose the critical region
$K$ symmetric about 0. Thus, based on these two intuitive arguments we would choose to reject $H_0$
if (see Figure 2.2):

(6) $y^2 \geq k_\alpha$.
Figure 2.3 shows the contour lines of equal density of the distribution of $\underline{y}$ under $H_0$ and $H_A$ for
$m > 1$. As a generalization of (6), it seems in this case intuitively appealing to reject $H_0$ if (see
Figure 2.3):

(7) $y^T y \geq k_\alpha$.

Figure 2.3: Contour lines of equal density of $N(0, \sigma_0^2 I_m)$ and $N(0, \sigma_A^2 I_m)$, with $\sigma_A^2 > \sigma_0^2$.
Now let us apply the simple likelihood ratio test to this particular example, and see how it
compares with (6) and (7) respectively. With:

$p_{\underline{y}}(y \mid H_0) = (2\pi\sigma_0^2)^{-m/2} \exp\{-\tfrac{1}{2\sigma_0^2} y^T y\}$ and $p_{\underline{y}}(y \mid H_A) = (2\pi\sigma_A^2)^{-m/2} \exp\{-\tfrac{1}{2\sigma_A^2} y^T y\}$,

it follows that:

(8) $\dfrac{p_{\underline{y}}(y \mid H_0)}{p_{\underline{y}}(y \mid H_A)} = \left(\dfrac{\sigma_A^2}{\sigma_0^2}\right)^{m/2} \exp\left\{-\tfrac{1}{2}\left(\tfrac{1}{\sigma_0^2} - \tfrac{1}{\sigma_A^2}\right) y^T y\right\}$.

The simple likelihood ratio test (3) therefore tells us to reject $H_0$ if:
(9) $\left(\dfrac{\sigma_A^2}{\sigma_0^2}\right)^{m/2} \exp\left\{-\tfrac{1}{2}\left(\tfrac{1}{\sigma_0^2} - \tfrac{1}{\sigma_A^2}\right) y^T y\right\} \leq a$.

In order to compare (9) with (7), we first transform (9) into a simpler inequality. The inequality
of (9) can also be written as:

(10) $y^T y \geq \dfrac{2\sigma_0^2 \sigma_A^2}{\sigma_A^2 - \sigma_0^2} \ln\left(\dfrac{1}{a}\left(\dfrac{\sigma_A^2}{\sigma_0^2}\right)^{m/2}\right)$.

If we denote the right-hand side of this inequality by $k_\alpha$, we see that the simple likelihood ratio
test gives a critical region $K$ which is identical to the one chosen earlier (see (7)) on intuitive
grounds. Thus, for this particular example the simple likelihood ratio test is:

(11) reject $H_0$ if $\underline{y}^T\underline{y} \geq k_\alpha$.
In order to perform or execute this test, we still need to choose a particular value for the critical
value $k_\alpha$. The critical value $k_\alpha$ can be computed once the size of the type I error, $\alpha$, has been
fixed, and once the distribution of $\underline{y}^T\underline{y}$ is known under $H_0$. Since $\underline{y}$ is distributed as $N(0, \sigma_0^2 I_m)$
under $H_0$, it follows (see Appendix A) that $\underline{y}^T\underline{y}$ is distributed under $H_0$ as a central $\sigma_0^2\chi^2$-
distribution with m degrees of freedom. In this case there are no unknown parameters, n = 0,
and hence m − n = m. Thus:

(12) $\underline{y}^T\underline{y}/\sigma_0^2 \sim \chi^2(m, 0)$ under $H_0$,

and the size of the test follows as:

(13) $\alpha = P(\underline{y}^T\underline{y} \geq k_\alpha \mid H_0)$.
Since:

(14) $P(\underline{y}^T\underline{y} \geq k_\alpha \mid H_0) = P(\underline{y}^T\underline{y}/\sigma_0^2 \geq k_\alpha/\sigma_0^2 \mid H_0)$,

and since $\underline{y}^T\underline{y}/\sigma_0^2$ is distributed as $\chi^2(m, 0)$ under $H_0$, we can use a table of the $\chi^2$-distribution
(see Appendix B) to compute the critical value $k_\alpha$ from the chosen size of type I error, $\alpha$. Table
2.1 shows some typical values of $\alpha$ and $k_\alpha$ for the case $\sigma_0^2 = 2$, for $m = 1$ (on the left) and
$m = 4$ (on the right).
Table 2.1: Typical values of $\alpha$, $k_\alpha/\sigma_0^2$ and $k_\alpha$ for $\sigma_0^2 = 2$, with $m = 1$ (left) and $m = 4$ (right).
From $k_\alpha$ and the distribution of $\underline{y}^T\underline{y}$ under $H_A$, we can also compute the size of the type II error,
$\beta$. Since $\underline{y}^T\underline{y}/\sigma_A^2$ is distributed as $\chi^2(m, 0)$ under $H_A$, we may use:

(15) $\beta = P(\underline{y}^T\underline{y} < k_\alpha \mid H_A) = P(\underline{y}^T\underline{y}/\sigma_A^2 < k_\alpha/\sigma_A^2 \mid H_A)$

and the table of the $\chi^2$-distribution to compute $\beta$ from $k_\alpha$. Table 2.2 shows some typical values
of $k_\alpha$ and $\beta$ for the case $m = 1$ and $\sigma_A^2 = 4$.
Table 2.2: Typical values of $k_\alpha$, $k_\alpha/\sigma_A^2$ and the corresponding type II error sizes for $m = 1$ and $\sigma_A^2 = 4$.
Table 2.3 shows some typical values of $k_\alpha$ and $\beta$ for the case $m = 4$ and $\sigma_A^2 = 4$.

Table 2.3: Typical values of $k_\alpha$, $k_\alpha/\sigma_A^2$ and the corresponding type II error sizes for $m = 4$ and $\sigma_A^2 = 4$.
Upon comparing Table 2.2 and Table 2.3 we note that at the same size of type I error and thus
at the same critical value $k_\alpha$, the $\beta$ for the case $m = 4$ is less than the $\beta$ for the case $m = 1$. This
is also what one would expect, since by increasing the number of observations one would expect
to have a higher probability of correctly accepting $H_A$. Show for yourself that the $\beta$-values of
Table 2.3 will increase if instead of $H_A: \sigma_A^2 = 4$ we have the alternative $H_A: \sigma_A^2 = 3$.
Example 2
Consider testing the simple null hypothesis $H_0$ against the simple alternative $H_A$:

(17) $H_0: \underline{y} \sim N(x_0, \sigma^2)$ versus $H_A: \underline{y} \sim N(x_A, \sigma^2)$, with $x_A > x_0$.

With:

$p_{\underline{y}}(y \mid H_0) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\{-\tfrac{1}{2\sigma^2}(y - x_0)^2\}$ and $p_{\underline{y}}(y \mid H_A) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\{-\tfrac{1}{2\sigma^2}(y - x_A)^2\}$,

it follows that:

(18) $\dfrac{p_{\underline{y}}(y \mid H_0)}{p_{\underline{y}}(y \mid H_A)} = \exp\left\{-\tfrac{1}{2\sigma^2}\left[2(x_A - x_0)y + x_0^2 - x_A^2\right]\right\}$.

The ratio (18) is at most $a$ if:

(19) $2(x_A - x_0)y + x_0^2 - x_A^2 \geq -2\sigma^2 \ln a$,

or:

$y \geq \dfrac{x_0 + x_A}{2} - \dfrac{\sigma^2 \ln a}{x_A - x_0}$.

If we denote the right-hand side of this inequality by $k_\alpha$, we see that the simple likelihood ratio
test for this particular example reduces to:

(20) reject $H_0$ if $\underline{y} \geq k_\alpha$.
The corresponding critical region $K$ of this test is shown in Figure 2.4. Note that it is identical
to the critical region of Example 3 of the previous chapter, the one which was chosen on more
or less intuitive grounds. In Example 3 of the previous chapter we noted that a transformation
of $\underline{y}$ to the standard normal distribution was useful for computing the sizes $\alpha$ and $\beta$. We might
therefore just as well write test (20) in terms of this transformed random variable. This gives:

(21) reject $H_0$ if $\dfrac{\underline{y} - x_0}{\sigma} \geq \dfrac{k_\alpha - x_0}{\sigma}$.
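Numerically, test (21) amounts to a single comparison against a standard normal critical value. A minimal Python sketch, with hypothetical numbers:

```python
from scipy.stats import norm

x0, sigma, alpha = 1.0, 2.0, 0.05   # hypothetical test parameters
y = 4.2                              # a hypothetical observation

w = (y - x0) / sigma                 # transformed variable of test (21)
critical = norm.ppf(1.0 - alpha)     # right-sided critical value for N(0,1)
print(w >= critical)                 # False: w = 1.6 < 1.645, H0 is accepted
```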
Example 3
With:
and
(24)
(25)
(26)
If we denote the right-hand side of this inequality by $k_\alpha$, we see that the simple likelihood ratio
test for this particular example is given by:
(27)
In Section 1.4 of the previous chapter we presented the Neyman-Pearson testing principle. This
principle says to choose, among all tests possessing the same size $\alpha$, the one for which the size
of the type II error, $\beta$, is as small as possible. This statement is expressed in terms of $\beta$, the
probability that the sample will fall in the non-critical region when in fact $H_A$ is true. It is
usually, however, more convenient to work exclusively with the critical region $K$. It is therefore
customary to calculate $1 - \beta$, which is the probability that the sample will fall in the critical
region $K$ when in fact $H_A$ is true. The probability $1 - \beta$ is called the power of the test and it is
denoted by $\gamma$. Thus:

The power $\gamma$ of a test is the probability of correctly rejecting $H_0$. The power can be calculated
as:

(28) $\gamma = 1 - \beta = P(\underline{y} \in K \mid H_A)$.

We can now rephrase the Neyman-Pearson testing principle in terms of the power $\gamma$. This gives
the following definition of a most powerful test: a test with critical region $K$ is a most powerful
test of size $\alpha$ if:

(i) $P(\underline{y} \in K \mid H_0) = \alpha$,

and

(ii) $P(\underline{y} \in K \mid H_A) \geq P(\underline{y} \in K' \mid H_A)$ for every other critical region $K'$ of size $\alpha$.
So far we have seen in our examples that the simple likelihood ratio test produces critical regions
that are indeed intuitively appealing. We have however not yet considered the question of
optimality of the simple likelihood ratio test. The following important theorem, by Neyman and
Pearson, shows that the simple likelihood ratio test is a most powerful test.

Neyman-Pearson theorem: Let $y$ be a sample or observation from $p_{\underline{y}}(y \mid x)$, where $x$ is one of two
known values $x_0$ and $x_A$, and let $0 < \alpha < 1$ be fixed. Let $a$ be a positive constant and $K$ be a
subset of the sample space which satisfies:

(i) $P(\underline{y} \in K \mid x_0) = \alpha$,

(ii) $\dfrac{p_{\underline{y}}(y \mid x_0)}{p_{\underline{y}}(y \mid x_A)} \leq a$ for $y \in K$, and $\dfrac{p_{\underline{y}}(y \mid x_0)}{p_{\underline{y}}(y \mid x_A)} \geq a$ for $y \notin K$.

Then the test corresponding to the critical region $K$, that is, the simple likelihood ratio test, is
a most powerful test of size $\alpha$ for testing $H_0: x = x_0$ versus $H_A: x = x_A$.
Proof

To prove the Neyman-Pearson theorem, let $K'$ be any other critical region of size $\alpha$. The
regions $K$ and $K'$ may be represented geometrically as the regions interior to the indicated
closed surfaces in Figure 2.5. Denote by 1 the part of $K$ outside $K'$, by 2 the common part of
$K$ and $K'$, and by 3 the part of $K'$ outside $K$. Since both critical regions have size $\alpha$:

(29) $\int_{K} p_{\underline{y}}(y \mid x_0)\, dy = \int_{K'} p_{\underline{y}}(y \mid x_0)\, dy$.

But, from Figure 2.5 it is clear that the integral over 2, which is the common part of $K$ and $K'$,
will cancel from both sides of (29) and reduce it to the form:

(30) $\int_{1} p_{\underline{y}}(y \mid x_0)\, dy = \int_{3} p_{\underline{y}}(y \mid x_0)\, dy$.

Since the power of a test is given by the probability that the sample will fall inside the critical
region when $H_A$ is true, we have for the two critical regions $K$ and $K'$ the powers
$\gamma = \int_{K} p_{\underline{y}}(y \mid x_A)\, dy$ and $\gamma' = \int_{K'} p_{\underline{y}}(y \mid x_A)\, dy$.
Since the integral over the common part cancels, their difference reduces to:

(31) $\gamma - \gamma' = \int_{1} p_{\underline{y}}(y \mid x_A)\, dy - \int_{3} p_{\underline{y}}(y \mid x_A)\, dy$.

Since region 1 lies in $K$, it follows from (ii) of the theorem that every point $y$ of 1 satisfies
the inequality $p_{\underline{y}}(y \mid x_A) \geq a^{-1} p_{\underline{y}}(y \mid x_0)$. Hence:

(32) $\int_{1} p_{\underline{y}}(y \mid x_A)\, dy \geq a^{-1} \int_{1} p_{\underline{y}}(y \mid x_0)\, dy$.

Similarly, since 3 lies outside $K$, it follows from (ii) of the theorem that every point $y$ of 3
satisfies the inequality $p_{\underline{y}}(y \mid x_A) \leq a^{-1} p_{\underline{y}}(y \mid x_0)$. Hence:

(33) $\int_{3} p_{\underline{y}}(y \mid x_A)\, dy \leq a^{-1} \int_{3} p_{\underline{y}}(y \mid x_0)\, dy$.

When the results (32) and (33) are used in (31), it follows that:

$\gamma - \gamma' \geq a^{-1}\left(\int_{1} p_{\underline{y}}(y \mid x_0)\, dy - \int_{3} p_{\underline{y}}(y \mid x_0)\, dy\right)$.

But from (30), the right side of this inequality must be equal to zero, hence $\gamma \geq \gamma'$. Since
$\gamma'$ is the power of the test using any other critical region $K'$ of size $\alpha$, the preceding
analysis proves that the test corresponding to the critical region $K$ is indeed a most powerful test
of size $\alpha$.

End of proof.
Although the theorem does not explicitly say how to find the constant $a$ and the region $K$,
implicitly it does, since the form of the test, that is, the critical region $K$, is given by (ii) of the
theorem. In practice it is often, as shown in previous examples, not necessary to find $a$. Instead,
the inequality of (ii) of the theorem for $y \in K$ is manipulated into an equivalent form that is easier
to work with, and the actual test is then expressed in terms of the new inequality. The following
example should make this clear.
Example 4
We will now consider the multi-dimensional generalization of Example 2. Assume therefore that
$\underline{y}$ is an m×1 random vector which is distributed as:

(34) $\underline{y} \sim N(x, \sigma^2 I_m)$,

with known variance $\sigma^2$. The following two simple hypotheses are considered:

(35) $H_0: x = x_0$ versus $H_A: x = x_A$.

The situation is sketched in Figure 2.6. Figure 2.6 shows the location of the two simple
hypotheses $H_0$ and $H_A$ in the sample space $\mathbb{R}^m$. It also shows the contours of constant density
of the distribution of $\underline{y}$, and it shows the location of the sample point $y$.
In order to apply the simple likelihood ratio test we need to know the density functions $p_{\underline{y}}(y \mid x_0)$
and $p_{\underline{y}}(y \mid x_A)$. They read:

$p_{\underline{y}}(y \mid x_0) = (2\pi\sigma^2)^{-m/2} \exp\{-\tfrac{1}{2\sigma^2}(y - x_0)^T(y - x_0)\}$ and $p_{\underline{y}}(y \mid x_A) = (2\pi\sigma^2)^{-m/2} \exp\{-\tfrac{1}{2\sigma^2}(y - x_A)^T(y - x_A)\}$.

Hence:

(36) $\dfrac{p_{\underline{y}}(y \mid x_0)}{p_{\underline{y}}(y \mid x_A)} = \exp\left\{-\tfrac{1}{2\sigma^2}\left[(y - x_0)^T(y - x_0) - (y - x_A)^T(y - x_A)\right]\right\}$,

and the simple likelihood ratio test (3) says to reject $H_0$ if:

(37) $\exp\left\{-\tfrac{1}{2\sigma^2}\left[(y - x_0)^T(y - x_0) - (y - x_A)^T(y - x_A)\right]\right\} \leq a$.
We will now transform this inequality into an inequality that can be considered as the multi-
dimensional generalization of the inequality of (21). After taking the logarithm and multiplying
with $-2\sigma^2$, the inequality of (37) takes the form:

$(y - x_0)^T(y - x_0) - (y - x_A)^T(y - x_A) \geq -2\sigma^2 \ln a$,

or:

$2(x_A - x_0)^T y + x_0^T x_0 - x_A^T x_A \geq -2\sigma^2 \ln a$.

By adding and subtracting $2(x_A - x_0)^T x_0$, this can also be written as:

(38) $2(x_A - x_0)^T(y - x_0) - (x_A - x_0)^T(x_A - x_0) \geq -2\sigma^2 \ln a$,

or, dividing by $2\sigma \|x_A - x_0\|$ with $\|x_A - x_0\| = [(x_A - x_0)^T(x_A - x_0)]^{1/2}$, as:

(39) $\dfrac{(x_A - x_0)^T(y - x_0)}{\sigma \|x_A - x_0\|} \geq \dfrac{\|x_A - x_0\|}{2\sigma} - \dfrac{\sigma \ln a}{\|x_A - x_0\|}$.

If we denote the right-hand side of this inequality by $k_\alpha$, we see that the simple likelihood ratio
test for this particular example reduces to:

(40) reject $H_0$ if $\underline{w} = \dfrac{(x_A - x_0)^T(\underline{y} - x_0)}{\sigma \|x_A - x_0\|} \geq k_\alpha$.
This test can be considered the multi-dimensional generalization of test (21) of Example 2. The
form of the critical region $K$ corresponding to test (40) is shown in Figure 2.7. For the case
shown we have $y \in K$, implying that $H_0$ is rejected.

Under $H_0$ the teststatistic $\underline{w}$ of (40) is distributed as $N(0, 1)$, and under $H_A$ as $N(\nabla/\sigma, 1)$, with
$\nabla = \|x_A - x_0\|$. The power of the test therefore follows from:

(41) $\gamma = P(\underline{w} \geq k_\alpha \mid H_A)$,

(42) $\gamma = \int_{k_\alpha}^{\infty} N(\nabla/\sigma, 1)\, dw$,

which can be transformed into an integral of the standard normal distribution as:

(43) $\gamma = \int_{k_\alpha - \nabla/\sigma}^{\infty} N(0, 1)\, dw$.

Note that the power $\gamma$, for a fixed critical value $k_\alpha$, is a monotone increasing function of $\nabla/\sigma$.
Thus $\gamma$ gets larger if $\nabla$ gets larger. This is what one would expect. The further $H_0$ and $H_A$ are
apart (see Figure 2.7), the higher one would expect the power $\gamma$ to be. The power $\gamma$ also gets
larger if the standard deviation $\sigma$ gets smaller. This is also what one would expect. The better
the precision of the observations, the higher one would expect the power $\gamma$ to be.
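The monotone dependence of the power on $\nabla/\sigma$ is easily tabulated from (43). In the Python sketch below the ratios $\nabla/\sigma$ are illustrative values:

```python
from scipy.stats import norm

alpha = 0.01
k_alpha = norm.ppf(1.0 - alpha)          # size alpha for the standardized test

for r in (0.5, 1.0, 2.0, 4.0):           # illustrative values of nabla/sigma
    gamma = 1.0 - norm.cdf(k_alpha - r)  # (43): power of test (40)
    print(r, round(gamma, 3))            # gamma increases with nabla/sigma
```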
Consider the linear model of observation equations:

(44) $E\{\underline{y}\} = Ax$, with rank A = n.

Let us now try to find out if and how the theory of hypothesis testing, as developed in the
previous sections, can be applied for testing a model like (44). First of all we have to assume
a probability distribution for $\underline{y}$. Since the normal distribution is adequate for most of the geodetic
applications, we assume that the m×1 random vector $\underline{y}$ is normally distributed with mean
$E\{\underline{y}\} = Ax$ and variance matrix $D\{\underline{y}\} = Q_y$. Our null hypothesis $H_0$ reads therefore:

(45) $H_0: \underline{y} \sim N(Ax, Q_y)$, with $x \in \mathbb{R}^n$.

Note that the n×1 parameter vector $x$ in (45) is unspecified. Hence, the above null hypothesis $H_0$
is a composite hypothesis. It seems therefore that our theory, which so far only holds for simple
hypotheses, cannot be applied. The theory can be applied, however, if we are able to transform
(45) into a simple hypothesis. Recall from Adjustment theory the linear model of condition
equations:

(46) $B^T E\{\underline{y}\} = 0$, with rank B = b = m − n.

As we know, this model is completely equivalent to (44). We also know that the two matrices
A and B, respectively of (44) and (46), satisfy the relation:

(47) $B^T A = 0$.

In terms of condition equations the null hypothesis can thus be written as:

(48) $H_0: \underline{y} \sim N(E\{\underline{y}\}, Q_y)$, with $B^T E\{\underline{y}\} = 0$.

This hypothesis is equivalent to the null hypothesis of (45), just like (46) is equivalent to (44).
$H_0$ of (48) is of course still a composite hypothesis. This follows since only b < m linear
functions of $E\{\underline{y}\}$ are specified. Now consider the transformed random vector:

(49) $\underline{t} = B^T \underline{y}$.
Recall from Adjustment theory that this is the vector of misclosures (tegenspraken). Under the
null hypothesis of (48) the random vector $\underline{t}$ is normally distributed with mean $E\{\underline{t}\} = 0$ and
variance matrix $Q_t = B^T Q_y B$. Thus under $H_0$ we have:

(50) $\bar{H}_0: \underline{t} \sim N(0, Q_t)$.

Note that this null hypothesis $\bar{H}_0$ is a simple hypothesis. But also note that (50) is not
equivalent to (48). That is, the hypothesis $\bar{H}_0$ follows from $H_0$, but $H_0$ does not follow from
$\bar{H}_0$. This is due to the fact that the matrix $B^T$ of (49) is not invertible. Although the simple
hypothesis $\bar{H}_0$ is not equivalent to the composite hypothesis $H_0$, we will settle for $\bar{H}_0$ and try
to test it against an alternative hypothesis. Then, if $\bar{H}_0$ gets rejected, $H_0$ should be rejected too.
This is because $H_0$ cannot be true while $\bar{H}_0$ is false. On the other hand, if $\bar{H}_0$ gets accepted, one
should be very careful in accepting $H_0$. $H_0$ can namely be false while $\bar{H}_0$ is true. The following
example makes this clear.
Example 5
Suppose that the distribution of $\underline{y}$ is in fact given by:

(51) $H_{true}: \underline{y} \sim N(Ax + \nabla y,\ Q_y + \nabla Q_y)$.

If $\nabla y \neq 0$ and/or $\nabla Q_y \neq 0$, this hypothesis is clearly different from $H_0$ of (45). Now consider
the effect of $H_{true}$ on the distribution of $\underline{t} = B^T\underline{y}$. For the mean of $\underline{t}$ under $H_{true}$ we have
$E\{\underline{t}\} = B^T(Ax + \nabla y) = B^T \nabla y$, and for its variance matrix $B^T(Q_y + \nabla Q_y)B$. Thus:

(52) $H_{true}: \underline{t} \sim N(B^T \nabla y,\ B^T(Q_y + \nabla Q_y)B)$.

In general this hypothesis differs from $\bar{H}_0$ of (50). But if the vector $\nabla y$ and the columns of the
matrix $\nabla Q_y$ lie in the null space of $B^T$, that is, $\nabla y \in N(B^T)$ and $R(\nabla Q_y) \subset N(B^T)$, then $B^T \nabla y = 0$
and $B^T \nabla Q_y B = 0$. In this case $H_{true}$ of (52) becomes identical to $\bar{H}_0$ of (50), while $H_{true}$ of (51)
still differs from $H_0$ of (45). This shows that $H_0$ can be false while $\bar{H}_0$ is true.
Now let us have a look at an alternative hypothesis for $H_0$. Many different types of alternative
hypotheses may be considered. For instance, the alternative hypothesis may specify that $\underline{y}$ has
a mean $Ax$, a variance matrix $Q_y$, but a distribution that differs from the normal distribution. Or,
the alternative hypothesis may specify that $\underline{y}$ is normally distributed with mean $Ax$, but with a
variance matrix that differs from $Q_y$. In these lecture notes, however, we will primarily be
concerned with alternative hypotheses that differ from the null hypothesis in the mean of $\underline{y}$. The
reason is that in most geodetic applications the alternative hypotheses are used to model errors
or blunders in the observations. For instance, if we want to find out whether the i-th observation
is erroneous or not, we may model the alternative hypothesis as:

(53) $H_A: \underline{y} \sim N(Ax + \nabla y, Q_y)$,

with:

(54) $\nabla y = c_y \nabla$, $c_y = (0, \ldots, 0, 1, 0, \ldots, 0)^T$, the 1 being the i-th entry.

In this case the scalar $\nabla$ is the error or blunder in the observation, and the vector $c_y$ models the
error $\nabla$ to be in the i-th observation. The vector $\nabla y$ in (53) may also be used to model systematic
errors in the observations. For instance, if all observations contain a systematic error of $\nabla$, the
vector $\nabla y$ of (53) takes instead of (54) the form:

(55) $\nabla y = (1, 1, \ldots, 1)^T \nabla$.

These two examples show that one can model different types of errors in the observations
through an appropriate choice of the vector $c_y$. Now let us consider the effect of $H_A$ on the
distribution of $\underline{t} = B^T\underline{y}$. It follows that the distribution of $\underline{t}$ under $H_A$ is given by:

(56) $\underline{t} \sim N(B^T c_y \nabla, Q_t)$, with $Q_t = B^T Q_y B$.

With the notation:

(57) $c_t = B^T c_y$,

the corresponding alternative hypothesis in terms of $\underline{t}$ reads:

(58) $\bar{H}_A: \underline{t} \sim N(c_t \nabla, Q_t)$.
For the present application of the theory we have to assume, however, that (58) is a simple
hypothesis and therefore that $\nabla$ is known and positive. Now that we have formulated the two simple
hypotheses $\bar{H}_0$ and $\bar{H}_A$, we are in the position to apply our theory of hypothesis testing. In order
to apply the simple likelihood ratio test we need to know the probability density functions of
$\underline{t}$ under $\bar{H}_0$ and $\bar{H}_A$ respectively. They read:

$p_{\underline{t}}(t \mid \bar{H}_0) = (2\pi)^{-b/2} (\det Q_t)^{-1/2} \exp\{-\tfrac{1}{2} t^T Q_t^{-1} t\}$ and $p_{\underline{t}}(t \mid \bar{H}_A) = (2\pi)^{-b/2} (\det Q_t)^{-1/2} \exp\{-\tfrac{1}{2} (t - c_t\nabla)^T Q_t^{-1} (t - c_t\nabla)\}$.

Hence:

(59) $\dfrac{p_{\underline{t}}(t \mid \bar{H}_0)}{p_{\underline{t}}(t \mid \bar{H}_A)} = \exp\{-c_t^T Q_t^{-1} t\, \nabla + \tfrac{1}{2} c_t^T Q_t^{-1} c_t\, \nabla^2\}$.

The simple likelihood ratio test (3) says to reject $\bar{H}_0$ if this ratio is at most $a$, that is, if:

(60) $c_t^T Q_t^{-1} t\, \nabla \geq \tfrac{1}{2} c_t^T Q_t^{-1} c_t\, \nabla^2 - \ln a$,

or, since $\nabla > 0$:

(61) $c_t^T Q_t^{-1} t \geq \tfrac{1}{2} c_t^T Q_t^{-1} c_t\, \nabla - \dfrac{\ln a}{\nabla}$.

Since $c_t^T Q_t^{-1} \underline{t}$ is distributed under $\bar{H}_0$ as $N(0, c_t^T Q_t^{-1} c_t)$, we may bring (61) into the standard
normal form by dividing by $(c_t^T Q_t^{-1} c_t)^{1/2}$. This gives:
(62) $\dfrac{c_t^T Q_t^{-1} t}{(c_t^T Q_t^{-1} c_t)^{1/2}} \geq \dfrac{\tfrac{1}{2} c_t^T Q_t^{-1} c_t\, \nabla - \ln a / \nabla}{(c_t^T Q_t^{-1} c_t)^{1/2}}$.

If we denote the right-hand side of this inequality by $k_\alpha$, and define the random variable $\underline{w}$ as:

(63) $\underline{w} = \dfrac{c_t^T Q_t^{-1} \underline{t}}{(c_t^T Q_t^{-1} c_t)^{1/2}}$,

the simple likelihood ratio test reads:

(64) reject $\bar{H}_0$ if $\underline{w} \geq k_\alpha$.

Note that the random variable $\underline{w}$ is distributed under $\bar{H}_0$ ($H_0$) and $\bar{H}_A$ ($H_A$) as:

(65) $\underline{w} \sim N(0, 1)$ under $\bar{H}_0$, and $\underline{w} \sim N\left(\nabla (c_t^T Q_t^{-1} c_t)^{1/2},\ 1\right)$ under $\bar{H}_A$.

The random variable $\underline{w}$ is called the w-teststatistic (w-toetsgrootheid) and, as we will see in later
chapters, it plays a very important role in hypothesis testing for geodetic applications.
It is very illustrative if we interpret the simple likelihood ratio test (64) and the w-teststatistic
(63) geometrically. In order to do so we define the following inner product in the space $\mathbb{R}^b$:

(66) $(u, v) = u^T Q_t^{-1} v$.

The norm (or length) of a vector in $\mathbb{R}^b$ and the inner product of two vectors in $\mathbb{R}^b$ can be written
as:

(67) $\|u\| = (u^T Q_t^{-1} u)^{1/2}$, $(u, v) = \|u\|\,\|v\| \cos \phi$.

With this notation the w-teststatistic of (63) reads $\underline{w} = (\underline{t}, c_t)/\|c_t\| = \|\underline{t}\| \cos \phi$. This shows that
$\underline{w}$ is the orthogonal projection of $\underline{t}$ onto the line with direction vector $c_t$ (see Figure 2.8).

In a similar way we may now also illustrate test (64) geometrically. This is done in Figure 2.9.
For the case shown we have $t \notin K$, implying that $\bar{H}_0$ gets accepted.
The w-teststatistic (63) has been formulated in terms of $\underline{t}$, $Q_t$ and $c_t$. We may however also
express $\underline{w}$ in terms of the original quantities $\underline{y}$, $Q_y$ and $c_y$. Substitution of:

(68) $\underline{t} = B^T \underline{y}$, $Q_t = B^T Q_y B$, $c_t = B^T c_y$

into (63) is one way to do this. Now, recall from Adjustment theory that the least-squares residual
vector $\underline{\hat{e}}$ and its variance matrix $Q_{\hat{e}}$, expressed in quantities belonging to the model of condition
equations, read:

$\underline{\hat{e}} = Q_y B (B^T Q_y B)^{-1} B^T \underline{y}$ and $Q_{\hat{e}} = Q_y B (B^T Q_y B)^{-1} B^T Q_y$,

so that the w-teststatistic can also be written as:

(69) $\underline{w} = \dfrac{c_y^T Q_y^{-1} \underline{\hat{e}}}{(c_y^T Q_y^{-1} Q_{\hat{e}}\, Q_y^{-1} c_y)^{1/2}}$.

This shows that the w-teststatistic can be computed directly from the results of the least-squares
adjustment of either the model of observation equations (44) or the model of condition equations
(46). Also expression (69) can be interpreted geometrically. Recall from Adjustment theory that:

(70) $\underline{\hat{e}} = P_A^{\perp} \underline{y}$, with $P_A^{\perp} = I_m - A(A^T Q_y^{-1} A)^{-1} A^T Q_y^{-1}$,

which gives for the w-teststatistic:

(71) $\underline{w} = \dfrac{c_y^T Q_y^{-1} P_A^{\perp} \underline{y}}{(c_y^T Q_y^{-1} P_A^{\perp} c_y)^{1/2}}$.
Note that this expression has the same structure as (63). The geometric interpretation is therefore
very similar to the one given previously. We define the following inner product in the sample space
$\mathbb{R}^m$:

(72) $(u, v) = u^T Q_y^{-1} v$.

With respect to this inner product, (71) reads $\underline{w} = (P_A^{\perp}\underline{y}, P_A^{\perp}c_y)/\|P_A^{\perp}c_y\|$. This shows that $\underline{w}$ is
the orthogonal projection of $P_A^{\perp}\underline{y}$ onto the line with direction vector $P_A^{\perp}c_y$.
Note that $P_A^{\perp}\underline{y}$ and $P_A^{\perp}c_y$ are both the orthogonal projections of $\underline{y}$ and $c_y$ respectively on $R(A)^{\perp}$,
the orthogonal complement of the range space, R(A), of A. Figure 2.10 gives a sketch of test (64)
in terms of quantities that are located in the sample space $\mathbb{R}^m$.

Figure 2.10: Critical region $K \subset \mathbb{R}^m$ for test (64).
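Expression (69) lends itself to a compact implementation on top of a standard least-squares adjustment. The Python sketch below assumes the model of observation equations (44); the numbers in the usage example are hypothetical (three observations of a single unknown, with a suspected blunder in the third observation):

```python
import numpy as np

def w_from_residuals(A, Qy, y, cy):
    """w-teststatistic via (69): w = cy' Qy^{-1} e / (cy' Qy^{-1} Qe Qy^{-1} cy)^{1/2}."""
    Qy_inv = np.linalg.inv(Qy)
    N = A.T @ Qy_inv @ A                       # normal matrix
    x_hat = np.linalg.solve(N, A.T @ Qy_inv @ y)
    e_hat = y - A @ x_hat                      # least-squares residuals
    Qe = Qy - A @ np.linalg.solve(N, A.T)      # Qe = Qy - A N^{-1} A'
    num = cy @ Qy_inv @ e_hat
    den = np.sqrt(cy @ Qy_inv @ Qe @ Qy_inv @ cy)
    return num / den

A = np.array([[1.0], [1.0], [1.0]])            # hypothetical design matrix
Qy = 0.01 * np.eye(3)
y = np.array([1.02, 0.98, 1.15])
cy = np.array([0.0, 0.0, 1.0])                 # error assumed in observation 3
print(w_from_residuals(A, Qy, y, cy))          # ~1.22 for these numbers
```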
In order to see the theory at work, we will now consider a typical geodetic example.
Example 6
Figure 2.11 shows a typical levelling network of four points with two loops.
If we assume that the height $x_0$ of point 0 is known and equal to zero, the linear model of
observation equations reads:

(73)

Note that we have assumed that the variance matrix of the observables is equal to a scaled
identity matrix, $Q_y = \sigma^2 I$. We will also assume that the observables are normally distributed. The
linear model can of course also be expressed in terms of condition equations. In terms of
condition equations we get:

(74)

The models (73) or (74), together with the assumption of normally distributed observables,
constitute our null hypothesis $H_0$. Let us now consider the alternative hypothesis $H_A$. For this
particular example we assume to know that if $H_0$ is false, an error in observation $y_2$ has
been made of a known amount $\nabla$. The alternative hypothesis in terms of observation equations
reads therefore:

(75)

and in terms of condition equations:

(76)
With (74) and (76) we are now in the position to compute the quantities which are needed in the
w-teststatistic (63). The vector of misclosures, $\underline{t} = B^T\underline{y}$, and its variance matrix, $Q_t = B^T Q_y B$,
follow from (74) as:

(77)

(78)

which gives for the w-teststatistic:

(80)

With this result and a computed value for $k_\alpha$ we are now able to execute the simple likelihood
ratio test. The power of this test follows from $\gamma = P(\underline{w} \geq k_\alpha \mid \bar{H}_A)$, or, since $\underline{w}$ is distributed
under $\bar{H}_A$ as $\underline{w} \sim N\left(\dfrac{\nabla}{\sigma\sqrt{2}},\ 1\right)$, from:

(81) $\gamma = \int_{k_\alpha - \nabla/(\sigma\sqrt{2})}^{\infty} N(0, 1)\, dw$.
Again we note that the power $\gamma$ gets larger if $\nabla$ gets larger or $\sigma$ gets smaller. Thus the
probability of detecting an error of size $\nabla$ in the observable $y_2$ gets larger if the size of the error
gets larger or when the precision of the observables gets better. But apart from these two effects,
the power $\gamma$ can also be shown to depend on the design or structure of the levelling network. In
the case of Figure 2.11 the observable $y_2$ occurs in both levelling loops. Hence we have two
linearly independent condition equations with which a possible error in the observation can be
detected. One would expect that the power decreases if $y_2$ would occur in only one condition
equation. In order to verify this we consider the situation as sketched in Figure 2.12. In this case
$y_2$ occurs in only one levelling loop.
Following the same kind of derivation as above, one can show that the w-teststatistic for detecting an error of size ∇ in the observation y2 reads:
(82)
Since w is distributed under the alternative hypothesis as w ~ N(∇/(σ√3), 1), the power of the test becomes:
(83)
A comparison of (83) with (81) clearly shows that the power of (81) is larger than the power of (83). Thus a simple likelihood ratio test of size α based on the configuration of Figure 2.11 has a higher probability of detecting an error of size ∇ in the observation y2 than a simple likelihood ratio test of size α based on the configuration of Figure 2.12. This conclusion shows how important it is, when designing geodetic networks, to make sure that each observation occurs in enough condition equations.
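The comparison of (81) and (83) can also be made numerically. The following sketch, assuming a one-sided test of size α and the noncentrality values ∇/(σ√2) and ∇/(σ√3) derived above, computes the two powers with scipy; the numbers chosen for α, ∇ and σ are purely illustrative.

```python
# A numerical sketch of (81) and (83): power of the one-sided w-test of
# size alpha for the two-loop and one-loop configurations (values illustrative).
from scipy.stats import norm

alpha, nabla, sigma = 0.05, 0.03, 0.01        # size, error size, st.dev.
k_alpha = norm.ppf(1 - alpha)                 # critical value under H0

for label, factor in [("two loops", 2 ** 0.5), ("one loop", 3 ** 0.5)]:
    shift = nabla / (sigma * factor)          # mean of w under HA
    gamma = norm.sf(k_alpha - shift)          # power: P(w > k_alpha | HA)
    print(f"{label}: gamma = {gamma:.3f}")
```

With these illustrative numbers the two-loop configuration indeed yields the larger power.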
In the previous example we have seen that the power of the simple likelihood ratio test of size α depends on: the size ∇ of the error, the precision of the observables, and the design or structure of the network. It is important to realize that this is not only valid for the case considered in the previous example, but that it is also valid for the likelihood ratio test (64) in general. This can be seen as follows. We know that the power of the simple likelihood ratio test of size α (64) can be computed as:
or that:
(84)
This shows that γ, for a fixed size α, decreases if ∇ decreases or ct*Qt^-1 ct decreases. The precision of the observations and the structure of the network are contained in the scalar ct*Qt^-1 ct. This can be seen if we write ct*Qt^-1 ct as:
(85)
The structure of the network is reflected in matrix B and the precision of the observations in
matrix Qy .
To conclude this section we have summarized in Table 2.4 the various steps which were followed in deriving the w-teststatistic.
Table 2.4: From composite to simple hypotheses.
H0: E{y} = Ax, rank A = n, or B*E{y} = 0, rank B = b, b = m − n (composite)
HA: E{y} = Ax + cy∇, or B*E{y} = B*cy∇ (composite)
⇓ Transformation: t = B*y, E{t} = B*E{y}; Qt = B*QyB ⇓
H0: E{t} = 0 (simple)
HA: E{t} = ct∇, with ct = B*cy (simple)
In the previous section it was shown that the simple likelihood ratio test of size α for testing:
(86)
with
(88)
The simple hypotheses of (86) were obtained from the composite hypotheses:
(89)
through the transformation t = B*y with B*A = 0. It was also shown that the w-teststatistic (88) could be expressed in terms of quantities located in the sample space ℝ^m as:
(90)
Furthermore it was pointed out that rejection of the simple hypothesis H0 of (86) implies rejection of the composite hypothesis H0 of (89), but that acceptance of the former does not necessarily imply that one should accept the latter. Finally an example was given, showing how the theory can be applied for detecting errors of known size in the observations.
In this section we consider a testing problem that, although mathematically equivalent to the testing problem given above, occurs when one wants to test the significance of parameters. We will derive the appropriate simple likelihood ratio test of size α and the corresponding v-teststatistic. Let us assume as before that the m×1 random vector y is normally distributed with full rank variancematrix Qy. The following two hypotheses are considered:
(91)
The two hypotheses H0 and HA differ in the sense that under H0 it is assumed that the linear function of x, b*x, is identical to zero, whereas under HA it is assumed that this function is identical to the known scalar ∇ ≠ 0. Thus, what we would like to find out is whether b*x = 0 or b*x = ∇. Note that the hypotheses of (91) are of the mixed model type which was discussed in Chapter 5.3 of Adjustment theory. In order to be able to apply the theory of the previous section we will first show how to rewrite the above H0 and HA in such a form that their structure is equivalent to that of the hypotheses H0 and HA of (89).
Consider therefore the inhomogeneous system of linear equations:
(92)
We know that its solution is given by the sum of a particular solution and the homogeneous solution. A particular solution of (92) is:
(93)
(94)
We denote the n×(n−1) matrix whose columnvectors are orthogonal to b by b^⊥. Then:
(95)
With (95) the parametric representation of the homogeneous equation (94) becomes:
(96)
The general solution of the inhomogeneous equation (92) is therefore given by the sum of (93) and (96):
(97)
Now, since (96) is equivalent to (94) and (97) is equivalent to (92), the hypotheses of (91) may also be written as:
(98)
Comparison of (98) with (89) shows the equivalence in structure. That is, the matrix Ab^⊥ of (98) plays the role of the matrix A in (89), and the vector Ab(b*b)^-1 of (98) plays the role of the vector cy in (89). Because of this equivalence in structure of the hypotheses, the simple likelihood ratio test for the present testing problem will have the same structure as the test developed in the previous section. The corresponding teststatistic, which will be denoted by v, then follows if we replace cy in (90) by Ab(b*b)^-1:
(99)
The least-squares residual vector ê and its variancematrix Qê in formula (99) correspond to the least-squares solution of model H0 in (91). Recall from Chapter 5.3 of Adjustment theory that the least-squares solution of the mixed model:
(100)
reads:
(101)
(102)
In (99) we need A*Qy^-1 ê and A*Qy^-1 Qê Qy^-1 A. With (102) this gives:
(103)
Substitution of these results into (99) gives the following simple expression for the v-teststatistic:
(104) .
The corresponding simple likelihood ratio test of size a for the testing problem (91) reads
therefore:
(105)
Note that this test is also intuitively appealing. For instance, if b = (0 … 0 1 0 … 0)*, with the 1 as its ith entry, then b*x̂A = x̂A,i and b*Qx̂A b = σ²x̂A,i, so that the v-teststatistic reduces to v = x̂A,i/σx̂A,i, implying that
(106)
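As a numerical illustration, the following sketch evaluates the v-teststatistic for the significance of a single parameter; the estimates, variancematrix and one-sided critical value are illustrative assumptions, not values from the text.

```python
# A sketch of the v-test (104)/(105) for the significance of the i-th
# parameter; estimates, variancematrix and critical value are illustrative.
import numpy as np
from scipy.stats import norm

x_hat = np.array([0.012, -0.004])             # least-squares estimates under HA
Q_x   = np.array([[4e-6, 1e-6],
                  [1e-6, 9e-6]])              # their variancematrix
i, alpha = 0, 0.05

v = x_hat[i] / np.sqrt(Q_x[i, i])             # v = x_hat_i / sigma_x_hat_i
k = norm.ppf(1 - alpha)                       # one-sided critical value
print(f"v = {v:.2f}  reject H0: {v > k}")
```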
Example 7
We assume that the observables are normally distributed, uncorrelated and have equal variance
s2 . The following two hypotheses are considered:
(107)
Thus, what we would like to find out is whether the height difference between the points 1 and 2 equals zero or equals ∇. In order to compute the teststatistic v of (104) we need the vector b, the vector x̂A, and the variancematrix Qx̂A. According to (107) the vector b reads b = (1 −1)*. The vector x̂A and its variancematrix Qx̂A follow from the least-squares solution under HA; this gives:
and
3 Testing of composite hypotheses
In the previous chapter we considered testing a simple hypothesis against a simple alternative.
We return now to the more general hypotheses-testing problem, that of testing composite
hypotheses. We will assume that the vector random variable is distributed as:
(1)
where F is a set of possible values the vector may take. The following two composite
hypotheses are considered:
(2)
(3)
Note that the ratio of (3) lies in the closed interval [0,1]. The ratio is greater than or equal to zero since we have a ratio of nonnegative quantities, and the ratio is less than or equal to one since the maximum taken in the denominator is over a larger set of parameter values than that in the numerator; hence the denominator cannot be smaller than the numerator. Also note that although (3) resembles the simple likelihood ratio test (see (3) of Chapter 2), it does not reduce to the simple likelihood ratio test for F = {x0, xA}. The simple likelihood ratio is namely not restricted to the closed interval [0,1]. The nonnegative constant α is taken to lie in the open interval (0,1). The value α = 0 is excluded, since we would like to reject H0 if the ratio in (3) equals zero. And the value α = 1 is excluded, since we would like to accept H0 if the ratio in (3) equals one. The generalized likelihood ratio test makes good intuitive sense, since the ratio in (3) will tend to be small when H0 is not true: the denominator of the ratio then tends to be larger than the numerator. In general (but not always), a generalized likelihood ratio test will be a good test. One possible drawback of the test is that it is sometimes difficult to find max py(y|x); another is that it can be difficult to find the probability distribution of the ratio, which is required to evaluate the size α and the power γ of the test.
Example 1
Assume that the scalar random variable y has the following probability density function:
(4)
and
(6)
The second maximum is a bit more complicated to derive. Let us first consider the maximum
problem without the restrictions on x :
(7)
From Elementary calculus you know that a necessary condition for xmax to be a solution of (7) is:
(8)
From Elementary calculus you also know that xmax corresponds to a maximum if:
Substitution of (8) shows that this inequality is indeed fulfilled. Thus xmax = 1/y maximizes x e^(−yx).
We know that xmax = 1/y produces the maximum for the case without restrictions. But if xmax = 1/y ≤ x0, it will also produce the maximum for the case with the restrictions. Hence:
(9)
Let us now consider what happens if 1/y > x0. Figure 3.1 shows a sketch of x e^(−yx) with its maximum at xmax = 1/y:
Figure 3.1: Sketch of the graph of x e^(−yx).
This shows that for the case 0 < x ≤ x0 and xmax > x0, the maximum of py(y|x) is reached at x = x0. Thus:
(10)
(11)
Since α ∈ (0,1) we may restrict ourselves to the second equation of (11). This gives with (3) the generalized likelihood ratio test:
(12)
Write:
(13)
and note that the function z e^(−(z−1)) has its maximum at z = 1 (prove this yourself). Hence z ≥ 1, and z e^(−(z−1)) < α if and only if z > kα, where kα is a constant satisfying kα > 1 (see Figure 3.2). We see therefore that the generalized likelihood ratio test reduces to:
or
(14)
Figure 3.2: Graph of z e^(−(z−1)).
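Finding the constant kα of (14) from a chosen size α amounts to solving z e^(−(z−1)) = α on z > 1. A minimal numerical sketch (the value of α is illustrative):

```python
# Solving z * exp(-(z - 1)) = alpha for the constant k_alpha > 1 of (14).
from math import exp
from scipy.optimize import brentq

alpha = 0.05
k_alpha = brentq(lambda z: z * exp(-(z - 1)) - alpha, 1.0, 50.0)
print(f"k_alpha = {k_alpha:.4f}")   # the function decreases from 1 to 0 on z >= 1
```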
Example 2
Assume that the scalar random variable y is normally distributed with variance σ². The following two hypotheses are considered:
(15)
(16a)
(16b)
(17)
(18)
(19)
or simply (y − x0)/σ > kα with kα > 0. The generalized likelihood ratio test reduces therefore to:
Note that (y − x0)/σ has a standard normal distribution under H0. Compare the above result with
Example 3 of Chapter 1.
Example 3
Again it is assumed that the scalar random variable y is normally distributed with variance σ².
The following two hypotheses are considered:
(21)
(22)
(23)
(24)
If we denote −2 ln α by kα, this reduces to:
or to:
(25)
Example 4
It is assumed that the m×1 vector random variable y has a probability density function:
(26)
(27)
(28)
(29)
Let us first consider the unrestricted maximum of py(y|x0, σ²). The following holds (prove this yourself):
(30)
and
(31)
Setting (30) equal to zero gives:
(32)
Substitution of (32) into (31) shows that the second derivative of py(y|x0, σ²) at σ² = σ²max is indeed negative. Thus:
(33)
For the case that σ²max ≤ σ²0 it follows from Figure 3.4 that:
(34)
We may now collect our results. From (28), (32), (33) and (34) it then follows that:
(35)
(36)
Write:
(37)
and note that the function z^(m/2) e^(−m(z−1)/2) has its maximum at z = 1 (prove this yourself). Hence z ≥ 1, and z^(m/2) e^(−m(z−1)/2) < α if and only if z > kα, where kα is a constant satisfying kα ≥ 1 (see Figure 3.5). We see therefore that the generalized likelihood ratio test (36) reduces to:
(38)
Figure 3.5: The graph of z^(m/2) e^(−m(z−1)/2).
Compare this result with Example 1 of Chapter 2. Note that (y − ex0)*(y − ex0)/σ²0 is distributed under H0 as χ²(m, 0).
Example 5
It is assumed that y has the same probability density function as in the previous example. The following two hypotheses are considered:
(39)
(40)
The denominator of the likelihood ratio is given by the unrestricted maximum of py(y|x0, σ²).
From the previous example we know that:
(41)
(42)
(43)
(44)
with
(45)
Since the function in (44) has a maximum of 1 at z = 1, it follows that the generalized likelihood
ratio test may also be written as:
(46)
where k1 ≤ 1 and k2 ≥ 1.
A sketch of the critical region K of this test is given in Figure 3.6. Compare this figure with
Figure 2.3 of Chapter 2.
Recall that the power γ of a test is defined as the probability of correctly rejecting H0. In case of a simple alternative hypothesis HA the power can be calculated as (see (28) of Section 2.2):
(47)
For more general classes of alternative hypotheses, the power will depend on the particular
alternative value of the parameter x being considered. In order to determine how good the chosen
test may be, compared to a competing test, it is therefore necessary to compare the power for
all possible alternative values of x rather than for just one alternative value as in (47). For this
purpose, it is necessary to consider the calculation of the power as a function of x. This leads
to the concept of the powerfunction g(x).
The powerfunction g(x) of a test is the function of the parameter x that gives the probability that
the sample or observation will fall in the critical region of the test when x is the true value of
the parameter.
(48) .
In terms of the powerfunction we now define an optimum property that a test may possess. Let γ(x) be the powerfunction corresponding to the critical region K, and let γ*(x) be the powerfunction corresponding to the critical region K*. A test of H0: x ∈ F0 versus HA: x ∈ F∖F0, with critical region K, is defined to be a uniformly most powerful test of size α if and only if:
(i) max over x ∈ F0 of γ(x) = α, and
(ii) γ(x) ≥ γ*(x) for all x ∈ F∖F0 and for any test with critical region K* and size α = max over x ∈ F0 of γ*(x).
The adverb uniformly in the above definition refers to all alternative x values. As we will see, a uniformly most powerful test does not exist for all testing problems, but when one does exist it is quite a nice test, since among all tests of size α it has the greatest chance of rejecting H0 whenever it should.
In some cases when H0 is simple and HA is composite it is possible to find a uniformly most
powerful test with the help of the Neyman-Pearson theorem. Assume:
(49)
Now choose a particular x, say x1, from F∖{x0}. Then according to the Neyman-Pearson theorem
the simple likelihood ratio test:
(50)
(51)
Now, if it is possible to show that the same test (50) follows when x1 ∈ F∖{x0} is replaced by another arbitrary parameter from F∖{x0}, then this test is a uniformly most powerful test. If this is not possible, then no uniformly most powerful test for testing (49) exists.
Example 6
From Example 3 of Chapter 2 we know that the simple likelihood ratio test for testing:
(53)
reads:
(54)
(55)
(56)
On the other hand, inequality (55) reduces for all xA > x0 to:
(59)
Since the two tests (57) and (60), which correspond to (58) and (61) respectively, are not identical, it follows that no uniformly most powerful test exists for testing:
(62)
Thus the generalized likelihood ratio test for testing (62) cannot be a uniformly most powerful
test.
Example 7
Assume that the scalar random variable y has a χ²(m, λ) distribution. Its probability density function then reads:
(63)
In order to derive a uniformly most powerful test, we first consider the following two simple
hypotheses:
(65)
or
This function is clearly a monotone decreasing function of y. From this it follows that py(y|0)/py(y|λA) < α if and only if y > kα, where kα is some positive constant. Hence, the most powerful test for testing (65) is:
(66)
Since the inequality y > kα is independent of λA > 0, it follows that (66) is the uniformly most powerful test for testing (64).
Example 8
Assume that the scalar random variable y has an F(m, n, λ) distribution. Its probability density function then reads:
(67)
In order to derive a uniformly most powerful test, we first consider the following two simple
hypotheses:
(69)
or
or
This function is clearly a monotone decreasing function of y. From this it follows that py(y|0)/py(y|λA) < α if and only if y > kα, where kα is some positive constant. Hence the most powerful test for testing (69) is:
(70)
Since the inequality y > kα is independent of λA > 0, it follows that (70) is the uniformly most powerful test for testing (68).
In the above Example 6 we discussed a testing problem for which no uniformly most powerful
test exists. Unfortunately there are many such hypothesis-testing problems for which no
uniformly most powerful test exists. In fact, this is the case for all testing problems that will be
considered in the remaining part of these lecture notes. The reason why a uniformly most powerful test does not exist for a particular testing problem is usually due to the fact that one considers a class of critical regions which is too large. The idea is therefore to restrict the class
of critical regions and to search for a uniformly most powerful test in this restricted class. One
way to restrict the class of critical regions is based on the principle of invariance. The following
example should make this idea clear.
Example 9
It is assumed that the m×1 random vector y is distributed under H0 and HA as:
(71)
(72)
(73)
Now we note that if RR* = Im (R is an orthogonal matrix), then (73) can be written as:
(74)
Comparison of (74) with (71) shows the equivalence of the two testing problems. We say that
the testing problem (71) is invariant under the transformation (72) if matrix R is orthogonal
(RR* = Im). Because of the equivalence of (71) and (74), we would of course also like to have the same test for both problems. This implies that if K is the critical region for the test of (71), K should also be the critical region for the test of (74). Thus, if y ∈ K then also v ∈ K, and if y ∉ K then also v ∉ K. But since v = Ry, this implies that K should be invariant under this transformation. From (72) it follows with RR* = I, or R* = R^-1, that:
(75)
From this it follows that the critical region K must have a (hyper)spherical shape with its centre at 0. Hence this leaves us with the following two possibilities:
(76)
Within this restricted class of critical regions we may now try to find a uniformly most powerful
test. If it exists, it is called the uniformly most powerful invariant test. The scalar random variable y*y has a χ²-distribution and is distributed under H0 and HA as:
(77)
From Example 7 we know that the critical region K2 of (76) gives the most power. Hence, the
uniformly most powerful invariant test of (71) reads:
(78)
Now let us have a look at the generalized likelihood ratio test of (71). It is given as:
But this inequality reduces to the same inequality of (78). We have reached therefore the
important conclusion that the generalized likelihood ratio test of (71) is a uniformly most
powerful invariant test.
Without proof we now state that all generalized likelihood ratio tests of the next chapters are in
fact uniformly most powerful invariant tests (for a proof see (Arnold, 1981)).
4 Hypothesis testing in linear models
4.1 The models of condition and observation equations
In this chapter we will derive and discuss the generalized likelihood ratio test for the important
case of linear models. In this section we consider the linear models of both condition equations
and observation equations.
We assume that the m×1 vector of observables y is normally distributed with known variancematrix Qy:
(1)
It is assumed that matrix Qy is of full rank. The hypotheses that will be considered in this chapter are all hypotheses on the mean, E{y}, of y. The following two hypotheses are considered:
(2)
It is assumed that rank B = b, rank Ct = q and that the q×1 vector ∇ is unknown under HA. Note
that both the hypotheses H0 and HA are composite if b < m. If b = m, then the hypothesis H0
reduces to a simple hypothesis. The hypotheses of (2) are formulated in terms of condition
equations. As we know a completely equivalent formulation is possible in terms of observation
equations. In order to transform (2) into observation equations we consider the inhomogeneous
system of linear equations:
(3)
We know that the solution of this inhomogeneous system is given by the sum of a particular
solution and the solution of the homogeneous system. If the m×q matrix Cy is defined such that
it satisfies B*Cy = Ct, the particular solution of (3) is given by:
(4)
(5)
Taking the sum of (4) and (5) gives the solution to (3):
(6)
(7) .
Both the matrices A and Cy are of full rank. Thus rank A = n and rank Cy = q. Furthermore rank(A Cy) = n + q. In practical applications the formulation of the hypotheses of (2) and (7) is
usually achieved in the following way. In geodetic practice one generally has a good idea of how
to model a particular problem in terms of either condition equations or observation equations.
This then results in the null hypothesis H0. However, while formulating the null hypothesis H0,
usually a number of assumptions are made. For instance, one assumes that the data are free from
blunders, or that the effect of refraction is negligible, or that the points of a geodetic network lie
in a two dimensional Euclidean plane etc. In order to find out whether these assumptions are
valid or not, one opposes the null hypothesis H0 to a more relaxed alternative hypothesis HA in
which more explanatory variables, namely ∇ in (2) and (7), are introduced. The explanatory variables ∇ are then supposed to model those effects which were assumed absent in H0. For instance, through ∇ one may model the presence of one or more blunders in the data, or the
presence of refraction, etc. The test of H0 versus HA informs us then on whether or not the
additional explanatory variables — should be taken into account. That is, the test should then
inform us on whether for instance blunders in the data are absent or not. However, referring to
the two types of errors one can make in testing, the type I and the type II error, and to the fact
that every model is only an approximation, one should never forget that the result of a test is
only indicative and never a proof of the correctness of one model over another!
Now let us derive the generalized likelihood ratio test for testing H0 against HA . From the
previous chapter (see (3) in Section 3.1) we know that this test can be computed from the
probability density function of y under H0 and HA . The probability density function of y under H0
reads:
(8)
(9)
The numerator of the generalized likelihood ratio test is given by max over x ∈ ℝ^n of py(y|x). Let us denote the value of x that maximizes py(y|x) by x̂0. The index "0" is used to indicate that the density function of y under H0 is taken. Since x̂0 maximizes py(y|x), we have:
(10)
Recall from Adjustment theory that x̂0 is the maximum likelihood estimate of x and that the
maximum likelihood estimate, in case of a normal distribution, is identical to the least-squares
estimate of x. Since the least-squares residual vector is given by:
(11)
it follows from (8) and (10) that the numerator of the generalized likelihood ratio test is given
by:
(12)
Now let us have a look at the denominator of the generalized likelihood ratio test. It is given by max over x ∈ ℝ^n and ∇ ∈ ℝ^q of py(y|x, ∇). Let us denote the values of x and ∇ that maximize py(y|x, ∇) by x̂A and ∇̂ respectively. Then:
(13)
(14)
it follows from (9) and (13) that the denominator of the generalized likelihood ratio test is given
by:
(15)
Since this ratio is less than a positive constant if and only if the term within the brackets [...] is
larger than a positive constant, it follows that the generalized likelihood ratio test for testing H0
against HA reads:
(16) .
The left-hand side of the inequality in (16) is expressed in terms of ê0 and êA . It is also possible
however to express the left-hand side of the inequality in (16) solely in terms of:
(17)
or that:
(18)
The second term on the right-hand side of (18) can be written with the help of (17) as:
(19)
In order to see this, recall from Adjustment theory that one of the properties of the least-squares method is that the least-squares residual vector is orthogonal to the columns of the designmatrix. In the present context this means that êA is orthogonal to the columnvectors of the matrix (A Cy), where orthogonality is "measured" with respect to the Qy^-1 metric. This implies that:
(21)
With (21), equation (20) follows from (19). Substitution of (20) into (18) gives with (16):
(22) .
Note that intuitively this test makes sense. One would expect to reject H0 if ŷA differs considerably from ŷ0, that is, one would expect to reject H0 if (ŷ0 − ŷA)*Qy^-1(ŷ0 − ŷA) is large. Also note that since the left-hand side of the inequality in (22) is always non-negative, ê0*Qy^-1 ê0 must always be larger than or equal to êA*Qy^-1 êA. This corresponds with our earlier remark in the previous chapter that the denominator of the generalized likelihood ratio is always larger than or equal to the numerator. It seems, from the way in which (16) and (22) are formulated, that we need both ê0 and êA, or ŷ0 and ŷA, in order to perform the test. This would imply that a least-squares computation under both H0 and HA is needed. Fortunately this is not the case. We will show that êA in (16) or ŷA in (22) is not explicitly needed in order to perform the test. In order to show this, we will first write ŷA in terms of ŷ0 and ∇̂. Consider therefore the two systems of normal equations that correspond to H0 and HA:
(23)
and
(24)
These systems of equations have a unique solution since rank A = n and rank(A Cy) = n + q.
Substitution of (23) into (24) gives:
(25)
Pre-multiplication of this system of equations with the square and full rank matrix:
gives:
(26)
(27)
(28)
or that:
(29)
(30)
(31)
(verify this yourself). Substitution of (31) into (30) gives together with (22) the test:
(32) .
(34) .
This result shows that êA, ŷA and ∇̂ are not explicitly needed to perform the generalized likelihood ratio test for testing H0 against HA. So far we have seen four different expressions for the generalized likelihood ratio test, namely (16), (22), (32) and (34). There is however also a fifth useful expression. This expression is in particular useful if the hypotheses are formulated in terms of condition equations as in (2). The expression is formulated in terms of t, the vector of misclosures. Recall that ê0 and Qê0 may be written in terms of t = B*y and Qt = B*QyB as:
(35)
(36) .
The random variable defined by the left-hand side of the inequalities in (16), (22), (32), (34) and (36) will be denoted by Tq. Thus in terms of the expression in (36) we have:
(37)
Now in order to compute the critical value kα from the size α of the test, we need the distribution of Tq. Substitution of:
(38)
(39)
(40)
with
(41)
(42) .
To conclude this section a summary of the important results is given in Table 4.1.
Table 4.1: Summary of the hypotheses, the teststatistic Tq and its distribution.
Hypotheses
Teststatistic Tq:
Distribution of Tq:
(43)
In the present section the equality of these expressions will be shown geometrically. Let us first
consider the hypotheses H0 and HA:
(44)
Since it was assumed that rank A = n and rank(A Cy) = n + q, the dimensions of the range spaces of A and (A Cy) are respectively dim R(A) = n and dim R(A Cy) = n + q. Since the matrices A and (A Cy) have m rows, it follows that the columnvectors of these matrices are elements of ℝ^m. Thus R(A) ⊂ ℝ^m and R(A Cy) ⊂ ℝ^m. Since the columns of matrix A can be written as linear combinations of the columns of matrix (A Cy), it follows that R(A) ⊂ R(A Cy). Thus the rangespace of A is a linear subspace of the rangespace of (A Cy). The equation of H0 in (44) states that E{y} under H0 can be written as a linear combination of the columnvectors of matrix A. This implies that E{y|H0} ∈ R(A). Similarly the equation of HA in (44) can be translated into E{y|HA} ∈ R(A Cy). The above results can be summarized as:
(45)
Recall from Adjustment theory that the method of least-squares can be interpreted geometrically
as a method of orthogonal projection. That is, ŷ0 follows from the orthogonal projection of y onto the rangespace of A, and ŷA follows from the orthogonal projection of y onto the rangespace of (A Cy). Thus:
(45)
Figure 4.2: y ∈ ℝ^m, ŷ0 = PA y ∈ R(A), ŷA = P(A Cy) y ∈ R(A Cy).
This is shown in Figure 4.2. Recall that orthogonality is "measured" with respect to the Qy^-1 metric. This means that the innerproduct and norm in ℝ^m are defined as:
(46)
Since ŷ0 is the orthogonal projection of y onto R(A), it follows that y − ŷ0 is orthogonal to ŷ0. Thus (y − ŷ0, ŷ0) = 0, see also Figure 4.2. Since ŷA is the orthogonal projection of y onto R(A Cy), it follows that y − ŷA is orthogonal to R(A Cy) and thus also orthogonal to ŷA ∈ R(A Cy). Thus (y − ŷA, ŷA) = 0, see also Figure 4.2. Since y − ŷA is orthogonal to R(A Cy), it is also orthogonal to R(A) ⊂ R(A Cy). But ŷ0 ∈ R(A). Hence, y − ŷA is also orthogonal to ŷ0. Thus (y − ŷA, ŷ0) = 0. Summarizing we have:
(47)
The four orthogonality relations of (47) and (48) are shown in Figure 4.3.
The right-angled triangle y, ŷ0, ŷA of Figure 4.3 is shown again in Figure 4.4.
(49) .
In terms of the matrix Qy^-1 this can be written as:
(50)
Let us now consider the third equation of (43). We know that ŷA can be written as:
(51)
The vector Cy∇̂ can be further decomposed into a part that lies in the rangespace of A, R(A), and a part that lies in the orthogonal complement of R(A), R(A)^⊥. This gives:
(52)
(53)
We know that ŷA − ŷ0 ∈ R(A)^⊥ and ŷ0 ∈ R(A). From this it follows that PA^⊥(ŷA − ŷ0) = ŷA − ŷ0 and PA^⊥(ŷA − ŷ0) = PA^⊥ ŷA, and thus that:
(54)
or:
(55)
(56) ŷ0 − ŷA = −PA^⊥ Cy∇̂.
Figure 4.7:
or as:
(57)
(58)
(59)
Compare this result with the second and third equation of (43).
Let us now consider the fourth equation of (43). According to (55), ŷA − ŷ0 = PA^⊥ Cy∇̂. From this it follows that:
(60)
Since êA ∈ R(A Cy)^⊥ and R(PA^⊥ Cy) ⊂ R(A Cy), it follows that:
(61)
(62)
(63) .
Figure 4.8: ê0 = PA^⊥ Cy∇̂ + êA ; the projection of ê0 onto PA^⊥ Cy equals PA^⊥ Cy∇̂.
This is shown in Figure 4.8. The right-hand side of (63) can be written in terms of matrices as:
(64)
or as:
(65)
The matrix P_{PA^⊥ Cy} is given as:
(66)
(67)
Since PA^⊥ ê0 = ê0, this finally gives:
(68)
Let us now consider the fifth and last equation of (43). The geometry of this equation is quite
different from the geometry of the previous four equations. Note namely that the first four
quadratic forms of (43) are all expressed in terms of vectors that are elements of ℝ^m. That is: ê0 ∈ ℝ^m, êA ∈ ℝ^m, ŷ0 ∈ ℝ^m, ŷA ∈ ℝ^m, and Cy∇̂ ∈ ℝ^m. The fifth quadratic form of (43) is expressed however in the vector of misclosures, t, which is an element of ℝ^b. Thus t ∈ ℝ^b and t ∉ ℝ^m. If we consider ℝ^b to have an innerproduct defined by the Qt^-1 matrix, it is still possible to interpret the fifth quadratic form of (43) geometrically. In fact:
(69)
and
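The numerical equivalence of these expressions is easy to verify. The following sketch builds a small random linear model and checks that (16), (22) and the misclosure form give the same value of Tq; the misclosure expression is assumed here in its standard shape Tq = t*Qt^-1 Ct(Ct*Qt^-1 Ct)^-1 Ct*Qt^-1 t, and all numbers are purely illustrative.

```python
# A numerical check (illustrative data): the expressions (16), (22) and the
# misclosure form of T_q coincide.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
m, n, q = 6, 2, 1
A  = rng.standard_normal((m, n))
Cy = rng.standard_normal((m, q))
Qy = np.diag(rng.uniform(0.5, 2.0, m)); W = np.linalg.inv(Qy)
y  = rng.standard_normal(m)

def P(M):                                   # orthogonal projector onto R(M)
    return M @ np.linalg.solve(M.T @ W @ M, M.T @ W)   # in the Qy^-1 metric

e0, eA = y - P(A) @ y, y - P(np.hstack([A, Cy])) @ y   # residuals under H0, HA
T16 = e0 @ W @ e0 - eA @ W @ eA                        # expression (16)
d   = (P(A) - P(np.hstack([A, Cy]))) @ y
T22 = d @ W @ d                                        # expression (22)

B  = null_space(A.T)                        # B*A = 0, so t = B*y are misclosures
t, Qt, Ct = B.T @ y, B.T @ Qy @ B, B.T @ Cy
Wt = np.linalg.inv(Qt)
u  = Ct.T @ Wt @ t
T36 = u @ np.linalg.solve(Ct.T @ Wt @ Ct, u)           # misclosure form
print(np.round([T16, T22, T36], 10))                   # all three agree
```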
In the previous two sections we have seen that the generalized likelihood ratio test for testing:
(70)
or
(71)
Because it was assumed that rank(A Cy) = n + q, it follows that q can never be larger than m − n. If q were larger than m − n, then rank(A Cy) would be larger than m, which is impossible since the matrix (A Cy) only has m rows. The value of q can also not be chosen equal to zero. If q = 0, then the matrix Cy would not exist and the two hypotheses H0 and HA would be identical. Thus we may conclude that the range of q is given by:
(73)
In this section we consider the case q = 1. For this case the following three expressions of Tq are of interest:
(74)
We have dropped the index "0", because it will be clear by now that the least-squares residual vector ê belongs to model H0. If q = 1, the b×q matrix Ct and the m×q matrix Cy reduce to b×1 and m×1 vectors respectively. In order to accentuate this, we will replace the capitals "Cy" and "Ct" by the small letters "cy" and "ct". In this case the first expression of (74) can be written as:
or as:
(75)
(78)
The estimator for the model error (33) reduces for q=1 to:
(79) .
With (75), (77) and (79) we have three expressions for the one-dimensional T-teststatistic, or the square of the w-teststatistic. The first two of them are the more useful ones, because they do not explicitly need the results of a least-squares computation under HA. The first expression (75) is useful when the hypotheses are formulated in terms of condition equations. The second expression is however the most commonly used expression in practice.
In order to model the presence of a blunder in the ith observation, the hypotheses take the form:
(80)
with
(81)
with
(83)
If test (82) leads to rejection of H0, a blunder or gross error in the ith observation is suspected. Checking and/or remeasurement will then be necessary. By taking i in the above test successively equal to 1,…,m, the whole vector of observations can be screened for observational blunders. This procedure is called datasnooping. Generally the observation with the largest value of (83), in an absolute sense, should be rejected.
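The datasnooping procedure is easily sketched in code. The following illustration, with an invented designmatrix and an artificially inserted blunder, computes the w-teststatistic in its common form w = ci*Qy^-1 ê / √(ci*Qy^-1 Qê Qy^-1 ci) for every observation and flags the largest |w|; all numbers are assumptions for the sake of the example.

```python
# A sketch of datasnooping (invented data): screen every observation with the
# w-teststatistic and suspect the one with the largest |w|.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
m, n, alpha = 8, 2, 0.05
A  = rng.standard_normal((m, n))
Qy = 0.01 * np.eye(m); W = np.linalg.inv(Qy)
y  = A @ np.array([1.0, 0.5]) + rng.multivariate_normal(np.zeros(m), Qy)
y[3] += 0.4                                    # insert a blunder in observation 4

N  = A.T @ W @ A
e  = y - A @ np.linalg.solve(N, A.T @ W @ y)   # least-squares residuals under H0
Qe = Qy - A @ np.linalg.inv(N) @ A.T           # variancematrix of the residuals

w = np.empty(m)
for i in range(m):
    c = np.eye(m)[:, i]                        # conventional alternative: blunder in y_i
    w[i] = (c @ W @ e) / np.sqrt(c @ W @ Qe @ W @ c)

k = norm.ppf(1 - alpha / 2)                    # two-sided critical value
print(np.round(w, 2), "suspect:", np.abs(w).argmax() + 1,
      "reject:", bool(np.abs(w).max() > k))
```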
(84)
Hypotheses
or
Datasnooping
w-Teststatistic
Distribution of w
(85)
or
(86)
In the previous section we considered the case q = 1. In this section we consider the other extreme, namely q = m − n. For this case the following two expressions of Tq are of interest:
(88)
Since it was assumed that rank(A Cy) = n + q, it follows that if q = m − n then rank(A Cy) = m. In this case matrix Cy is chosen such that the matrix (A Cy) is square and of full rank. But this means that no restrictions are placed on E{y} under HA. That is, since R(A Cy) = ℝ^m if q = m − n, we have E{y|HA} ∈ ℝ^m. In other words, by choosing q = m − n, the number of explanatory variables that are added to H0 in order to form HA is such that the redundancy (overtalligheid) of the linear model under HA equals zero! But this implies that:
(89)
(90) .
We have again dropped the index "0", because it will be clear that the least-squares residual vector ê of (90) belongs to model H0. Now let us see what happens with the second expression of (88) if q = m − n. If q = m − n, then the full rank matrix Ct of (86) has b rows and (m − n) columns. But we know that b = m − n. Hence, in case q = m − n the matrix Ct is square and of full rank. But this means that the matrix Ct is invertible and therefore gets eliminated from the second expression of (88). Thus if q = m − n, then:
(91)
With (90) or (91) the generalized likelihood ratio test for testing the hypotheses:
(92)
or
(93)
reads:
(94)
The distribution of T_{q=m−n} under H0 and HA follows from (42) as:
(95)
In many publications where the generalized likelihood ratio test for testing (92) or (93) is described, one will see that not the teststatistic T_{q=m−n} is used, but instead the teststatistic T_{q=m−n}/(m−n). This teststatistic is denoted by σ̂². Thus:
(96)
The distribution of σ̂² under H0 and HA is given as (see also Appendix A):
(98)
It will be clear of course that test (97) is completely identical to test (94). Hence there is no special reason why the teststatistic σ̂² should be used instead of T_{q=m−n}. However, there does exist a special reason why the notation "σ̂²" is used in (96). Recall from Adjustment theory (Section 2.4) that:
(99)
or with (96):
(100)
Hence, σ̂² can be considered an unbiased estimator of the variance factor of unit weight σ². This is the reason why the notation "σ̂²" is used in (96).
The practical importance of the above test ((94) or (97)) for testing (92) or (93) lies in the fact that no restrictions are imposed on the mean of y under HA, that is E{y|HA} ∈ ℝ^m. In other words, for the case q = m − n no matrix Cy or matrix Ct needs to be specified. This is in contrast to all those cases for which q < m − n. For all those cases one needs to specify Cy or Ct, and therefore one has to have some idea of what kind of misspecifications to expect in H0. In some cases this is possible. For instance, experience has shown that the class of conventional alternative hypotheses used in datasnooping is one class that should always be taken into account in geodetic network applications. But still this class may not cover the totality of misspecifications in H0 that occur in a particular application. In fact, one will never be able to completely specify the class of alternative hypotheses for a particular problem, simply because one never knows beforehand what misspecification has occurred in H0. In this light one should
see test (94) or (97) as an important safeguard. The test gives an indication of the validity of H0
without the need to specify the alternative hypothesis through Cy or Ct . As such it can be
considered an overall model test. Appendix C elaborates on the relation between the overall
model test and the w-test of the previous section. A summary of the results of this section is
given in Table 4.4.
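As a numerical sketch of the overall model test (94)/(97): with illustrative data, the quadratic form ê*Qy^-1 ê is compared with the χ² critical value, and σ̂² is formed as in (96).

```python
# A sketch of the overall model test (illustrative data): no alternative
# needs to be specified; H0 is judged by e*Qy^-1 e against chi^2_alpha(m-n,0).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
m, n, alpha = 8, 2, 0.05
A  = rng.standard_normal((m, n))
Qy = 0.01 * np.eye(m); W = np.linalg.inv(Qy)
y  = A @ np.array([1.0, 0.5]) + rng.multivariate_normal(np.zeros(m), Qy)

e  = y - A @ np.linalg.solve(A.T @ W @ A, A.T @ W @ y)  # residuals under H0
T  = e @ W @ e                                          # T_{q=m-n}, cf. (90)
s2 = T / (m - n)                                        # sigma_hat^2, cf. (96)
k  = chi2.ppf(1 - alpha, m - n)                         # critical value
print(f"T = {T:.2f}, chi2_crit = {k:.2f}, sigma2_hat = {s2:.3f}, reject H0: {T > k}")
```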
Hypotheses
or
Teststatistic T_{q=m−n}:
Distribution of T_{q=m−n}:
From Section 4.1 we know that the generalized likelihood ratio test for testing:
(101)
is given by:
(102)
where:
(103)
and
(104)
with
(105)
In (102) we have used the notation "χ²α(q,0)" for the critical value instead of the notation "kα". The notation "χ²α(q,0)" makes it clearer that the critical value should be computed from the central χ²-distribution with q degrees of freedom. Instead of test (102) we may also write:
(106)
where:
(107)
The two tests (102) and (106) are of course identical. In order to perform the generalized likelihood ratio test (102) or (106), one needs to compute the critical value, χ²α(q,0) or Fα(q,∞,0), for a chosen size α and a fixed number q of degrees of freedom. Let us denote the probability density functions of χ²(q,λ) and F(q,∞,λ) respectively by pχ²(x|q,λ) and pF(F|q,∞,λ). Then:
(108)
These relations can be used to compute the critical values χ²α(q,0) or Fα(q,∞,0) from α and q. Standard tables exist that give χ²α(q,0) or Fα(q,∞,0) for various values of α and q (see Appendix B). Some typical values of χ²α(q,0) and Fα(q,∞,0) are given in Table 4.5 and Table 4.6 respectively. Note from these tables that for a fixed number of degrees of freedom, the critical values χ²α(q,0) or Fα(q,∞,0) get smaller for larger α. This is also what one would expect. One would expect that if H0 is true, the occurrence of large values of Tq in (102) is less frequent than the occurrence of smaller values of Tq.
Also note from Table 4.5 that for a fixed α, the critical values χ²α(q,0) get larger for larger q. This is also what one would expect. Since the χ²-distribution is defined as a sum of squares of independent standard normal random variables, one would expect that the right tail of the χ²-distribution gets thicker for larger sums (see Figure 4.12). Note on the other hand from Table 4.6 that for a fixed size α, the critical values Fα(q,∞,0) get smaller for larger q. This is of course due to the division by q in (106).
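In place of the standard tables of Appendix B, the critical values can also be computed directly; a minimal sketch, using the relation Fα(q,∞,0) = χ²α(q,0)/q implied by the division by q in (106):

```python
# Computing critical values chi2_alpha(q,0) and F_alpha(q,inf,0) numerically;
# F(q,inf,0) equals chi2(q,0)/q.
from scipy.stats import chi2

for alpha in (0.10, 0.05, 0.01):
    for q in (1, 2, 4):
        c = chi2.ppf(1 - alpha, q)          # chi^2_alpha(q, 0)
        print(f"alpha={alpha:.2f}  q={q}:  chi2 = {c:6.2f}   F = {c/q:5.2f}")
```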
In Section 1.5 where the general steps for testing hypotheses were outlined, it was pointed out
that one should compute the size of the type II error in order to ensure that a reasonable
protection exists against type II errors. Since the size of a type II error equals one minus the
power of the test, we might as well compute the power γ. The power of test (102) or (106) follows as:
(109)
Note that the power γ depends on: 1) the chosen size α; 2) the number of degrees of freedom q; 3) the non-centrality parameter λ. In Table 4.7 some typical values of γ are given. Table 4.7 shows that the power γ gets larger if the size α of the test is chosen larger. This is also what one would expect. A larger size α implies a smaller critical value χ²α(q,0) or Fα(q,∞,0), and therefore with (109) a larger power γ. Table 4.7 also shows that the power γ gets smaller for larger q.
Table 4.7: The power of test (102) or (106) for different values of α, q and λ.
This is understandable if one thinks of q as the number of additional parameters in HA. The smaller q is, the fewer additional parameters are used in HA and therefore the more "information" is used in formulating HA. For such an alternative hypothesis one would expect that, if HA is true, the probability of accepting it is higher. Finally note that Table 4.7 shows that the power gets larger if the non-centrality parameter λ gets larger. This is understandable if one looks at the geometry of the testing problem. Substitution of:
(110)
Thus ‖Cy∇‖ is the separation or distance between H0 and HA (see Figure 4.13). Now, one would expect that the power of the test increases if the distance between H0 and HA, thus ‖Cy∇‖, increases. But ‖Cy∇‖ gets larger if λ of (111) gets larger. Hence, one would indeed expect that the power gets larger if λ gets larger.
Figure 4.13: E{y|HA} − E{y|H0} = Cy∇ ; λ gets larger if ‖PA^⊥ Cy∇‖² gets larger.
(i) The power γ of test (102) or (106) is monotonically increasing in α for fixed q and λ;
(ii) The power γ of test (102) or (106) is monotonically decreasing in q for fixed α and λ;
(iii) The power γ of test (102) or (106) is monotonically increasing in λ for fixed α and q.
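Both the power function (109) and the inverse step (128) discussed below, in which λ0 is recovered from a reference power γ0, lend themselves to a short numerical sketch via the noncentral χ²-distribution; the values of α, q and γ0 are illustrative.

```python
# Power gamma(alpha, q, lambda) of (109) via the noncentral chi-square, and
# the inverse step (128): lambda_0 from a reference power gamma_0.
from scipy.stats import chi2, ncx2
from scipy.optimize import brentq

alpha, q = 0.05, 1
k = chi2.ppf(1 - alpha, q)                     # central critical value
power = lambda lam: ncx2.sf(k, q, lam)         # gamma as a function of lambda
print(f"gamma(lambda = 10) = {power(10.0):.3f}")

gamma0 = 0.80
lam0 = brentq(lambda lam: power(lam) - gamma0, 1e-6, 100.0)
print(f"lambda_0 = {lam0:.2f}")                # about 7.85 for these choices
```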
Since the power γ of the test (102) or (106) depends on α, q and λ, it seems that we have three possibilities to construct a test which has a reasonable protection against type II errors. We could increase α. But increasing α implies increasing the probability of a type I error. The size α is therefore usually chosen at a fixed value. We could also decrease q. But usually we are not free in choosing q. The value of q depends on the particular alternative hypothesis against which one wants to test H0. Finally one could try to increase the non-centrality parameter λ. What possibilities do we have to increase λ? With:
(113)
(114)
we see that λ depends on:
(i) Qy, the variancematrix of the observables;
(ii) A, the designmatrix;
(iii) Cy∇, the difference of E{y|HA} and E{y|H0}.
Let us now investigate what the effect on λ is when either Qy, A or Cy∇ is changed.
ad i:
It will intuitively be clear that one can increase λ by increasing the precision of the observables. For instance, if one uses µQy, where µ is a positive scalar, instead of Qy, then the non-centrality parameter λµ becomes (see (113)):
This shows that the non-centrality parameter increases if µ decreases, that is, if the observables have a higher precision. Compare this with Example 4 of Section 2.2. The dependence of λ on Qy, and therefore the dependence of the power γ of the test on Qy, makes it possible to obtain a test with sufficient power if the variancematrix Qy is appropriately chosen. Since Qy depends on the precision of the measurement equipment, an appropriate choice of measurement equipment enables one to obtain a test with sufficient power.
ad ii:
In geodetic network applications matrix A depends on the structure of the network. Hence by changing the structure of the network one changes A and therefore also changes λ. This is an important result, because it shows that one can look for a design or structure of a network that is optimal in the sense that it gives a test with sufficient power. It will intuitively be clear that one can increase λ, and therefore also increase the power of the test, by increasing the number of observables. In order to prove this, let us consider the following two situations. We have a network for which the following model holds:
(115)
(116)
The two models, that is the two networks, differ in the sense that the second model consists of the first model plus one additional observation equation, namely E{z} = a*x. In terms of condition equations the two models can be written as:
(117)
and
(118)
with B*A = 0 and b1*A + b2 a* = 0. Note that the additional observation equation in (116) implies an additional condition equation in (118). We will now show that the non-centrality parameter of model (118), denoted by λb, is always larger than the non-centrality parameter of model (116), denoted by λ. The non-centrality parameter of model (116) reads (see (42)):
(119)
(120)
Since:
(121)
(123)
(124)
Since the quadratic form on the right-hand side of (125) is always non-negative, equation (125) shows that:
(126)
This shows that the power of the test indeed gets larger if the number of observations or the
number of condition equations gets larger. Compare this with Example 6 of Section 2.3.
ad iii:
Equation (113) shows that λ, and therefore the power of the test, can be changed by changing Cy∇ = E{y|HA} − E{y|H0}. From Figure 4.13 we learn that in general λ gets larger if the separation between E{y|HA} and E{y|H0} is increased. Note however that the component of Cy∇ which lies in R(A) has no effect on λ. In practice of course Cy∇ is unknown. Hence one will never be able to compute the actual power of the test. Still, by choosing some representative values for the separation Cy∇ = E{y|HA} − E{y|H0} between HA and H0, one can compute what the power of the test would be if Cy∇ were the "true" separation. In this way one can find out how well the test can detect a particular misspecification Cy∇ in H0. For instance, in blunder detection the scalar ∇ models the size of the blunder. By choosing a representative value for the blunder, one can compute through λ the probability that the test will detect a blunder of the chosen size ∇. If one considers this probability too low, one has two possibilities to increase it: either by changing Qy or by changing A.
So far we have been concentrating on the power γ of the test, that is, on the probability of rejecting H0 when in fact HA is true. We have seen that the power γ can be computed from the size of the test α, from the degrees of freedom q, and from the non-centrality parameter λ. Symbolically this may be written as:
(127)
In geodetic practice one is however not so much interested in the power of the test. One is much more interested in the misspecification or modelerror Cy∇ that generates γ. That is, one is much more interested in the model error that can be detected with a certain probability γ. The approach taken in geodetic practice is therefore to fix γ at a reference value γ0, for instance γ0 = 50%, or 60%, or 70%, but usually 80%. From α, q and the chosen reference value γ = γ0 one can then compute the corresponding value for the non-centrality parameter, symbolically:
(128)
The non-centrality parameter plays an important role in linking the overall model test and the w-test in Appendix C. From λ = λ0 one can now compute the corresponding modelerror Cy∇. This is done by solving the quadratic form (see (105)):
(129)
for ∇. Once ∇ is known, the modelerror ∇y = E{y|HA} − E{y|H0} follows as:
(130)
The m×1 vector ∇y is said to describe the internal reliability (inwendige betrouwbaarheid) of H0 with respect to HA. One should not confuse the geodetic usage of the word "betrouwbaarheid" with its usage in mathematical statistics. The internal reliability as described by ∇y is thus a measure of the model error that can be detected with a probability γ = γ0 by test (102) or (106). How can we compute the q×1 vector ∇ from (129)? Unfortunately (129) has no unique solution for ∇. We will consider the following two cases: q = 1 and 1 < q ≤ m − n.
The case q = 1: If q = 1, then the m×q matrix Cy reduces to the m×1 vector cy, and the q×1 vector ∇ reduces to the scalar ∇. For this case equation (129) can also be written as:
Note that one is only able to determine the size of ∇, but not its sign. In order to give a geometric interpretation to (131), recall that:
Hence:
(132)
and
(133)
where use is made of the cosine rule. From (132) and (133) it follows that:
(134)
Formula (134) shows that the denominator of (131) is small, and thus ∇ is large, if the angle θ is close to ½π. Thus ∇ gets smaller, and the internal reliability improves, the smaller the angle θ between cy and R(A)^⊥ gets. If θ = ½π, then cy ∈ R(A) and ∇ = ∞. This implies that the corresponding model error can never be detected by the test. The internal reliability is then said to be infinitely poor. Since 0 ≤ cos²θ ≤ 1, it follows from (134) and (131) that:
(135)
(136)
In many practical applications the variancematrix Qy is a diagonal matrix (see also (84)). If Qy is diagonal, it follows with the choice (136) that:
(138)
Substitution of (138) into (131) then gives for the minimal detectable bias:
(139)
This shows that ∇i is large if σ²ŷi is close to σ²yi, and that ∇i is small if σ²ŷi is small. The scalar
(140)
is called the ith local redundancy number. Note that since 0 ≤ σ²ŷi ≤ σ²yi, the ith local redundancy number lies between zero and one.
The reason why ri is called the ith local redundancy number follows from the fact that:
(142)
Thus the sum of the local redundancy numbers equals the total redundancy. The proof of (142) goes as follows. From (140) it follows that:
Hence:
(143)
From Linear algebra you know that the trace of a matrix equals the sum of its eigenvalues. Thus:
(144)
where λi, i = 1,…,m, are the m eigenvalues of PA^⊥. We know that PA^⊥ is an orthogonal projector with the properties:
(145)
Since dim R(A) = n and dim R(A)^⊥ = m − n, it follows from (145) that PA^⊥ has (m − n) eigenvalues that equal 1 and n eigenvalues that equal 0. This together with (144) and (143) shows that (142) must hold. Since the sum of the local redundancy numbers equals the total redundancy m − n, we may define the average redundancy r̄ as:
(146)
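Equations (139), (140) and (142) can be illustrated numerically. The sketch below assumes the common diagonal-Qy forms ri = (Qê Qy^-1)ii and ∇i = σyi √(λ0/ri), with λ0 = 7.85 (the value belonging to α = 0.05, q = 1, γ0 = 0.80); the model itself is invented for the illustration.

```python
# Local redundancy numbers r_i = (Qe Qy^-1)_ii, their sum m - n (142), and
# minimal detectable biases nabla_i = sigma_yi * sqrt(lambda_0 / r_i) for a
# diagonal Qy; lambda_0 = 7.85 belongs to alpha = 0.05, q = 1, gamma_0 = 0.80.
import numpy as np

rng = np.random.default_rng(4)
m, n, lam0 = 8, 2, 7.85
A  = rng.standard_normal((m, n))
Qy = np.diag(rng.uniform(0.5, 2.0, m)); W = np.linalg.inv(Qy)

Qe = Qy - A @ np.linalg.inv(A.T @ W @ A) @ A.T   # variancematrix of residuals
r  = np.diag(Qe @ W)                             # local redundancy numbers (140)
print("sum r_i =", round(r.sum(), 6), " m - n =", m - n)

mdb = np.sqrt(np.diag(Qy)) * np.sqrt(lam0 / r)   # minimal detectable biases (139)
print("MDB per observation:", np.round(mdb, 3))
```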
If we replace the local redundancy numbers in (139) by the average redundancy, we get the following rough approximation of ∇i:
(147)
The case 1 < q ≤ m − n: For this case the quadratic form (129) takes the form:
(148)
For q = 2, (148) describes an ellipse, for q = 3 it describes an ellipsoid and for q > 3 it describes a hyperellipsoid. In order to get a form that resembles formula (131), we parametrize the vector ∇ as:
(149)
(150)
By letting the vector d scan the unit sphere in ℝ^q, the vector ∇ of (150) scans the ellipsoid described by (148). If one is interested in the principal axes of the ellipsoid (148), one should choose d as one of the q eigenvectors of the matrix Cy*Qy^-1 Qê Qy^-1 Cy:
(151)
We have seen that the model error that can be detected with a probability γ = γ0 is given by the m×1 vector ∇y = Cy∇. In some practical applications however it can be rather cumbersome to evaluate ∇y. Note namely that the number of vectors ∇y that need to be evaluated equals the number of alternative hypotheses HA considered. This implies that one has to evaluate the m elements of ∇y for every alternative hypothesis considered. This amounts to a lot of evaluations and may therefore not be very practical. A notable exception occurs in case of datasnooping, where the vector ∇y has only one non-zero element. In order to reduce the number of evaluations one could try to replace the vectorial measure ∇y by a scalar measure. The measure λy defined below is a scalar measure that can be used as such. If we consider ∇y as a possibly non-detected "bias" in y and the variancematrix Qy as a description of the "noise" in y, we may define a scalar squared bias-to-noise ratio [Papoulis, 1985] for y as:
(152)
A large value of λy indicates that the model error ∇y is significant, and a small value of λy indicates that the model error is insignificant. Note that λy = ‖Cy∇‖². Thus λy is the squared separation between E{y|H0} and E{y|HA} (see Figure 4.14).
Substitution of ∇y = Cy∇ with (150), for the case 1 < q ≤ m − n, into (152) gives:
(153)
In case of datasnooping (the case q = 1 with a diagonal variancematrix Qy), formula (153) simplifies to:
(154)
Let us denote the maximum value of the ratio in (153) by λmax. Thus:
Recall from Linear algebra that λmax equals the largest eigenvalue of the generalized eigenvalue problem:
(155)
Example 1
Figure 4.15 shows a typical levelling network of four points with two loops.
We assume that the variance matrix of the normally distributed observables is equal to a scaled
identity matrix. The linear model of condition equations reads then:
(156)
(157)
(158)
(159)
Again we are interested in the minimal detectable bias ∇2 of y2. Computation of σ²ŷ2 gives:
(160)
Comparison of (158) with (160) shows that a blunder in the second observation is better
detectable with the two loop network than with the one loop network. Compare this with our
discussion in Example 6 of Section 2.3.
Hypotheses
or
Internal reliability
In the previous section internal reliability was defined as the model error that can be detected with the generalized likelihood ratio test with a probability γ = γ0. It is described by the m×1 vector ∇y = Cy∇. In the present section we consider the external reliability, that is, the influence of this model error on the estimated parameters. We will consider:
(i) the influence of ∇y on x̂;
(ii) the influence of ∇y on a part of x̂, namely x̂1;
(iii) the influence of ∇y on a linear function of x̂, namely θ̂ = a*x̂.
ad (i):
(161)
(162) .
This vector describes the influence of the model error ∇y on x̂. From (162) it follows that A∇x̂ = PA∇y. Therefore:
(163)
This orthogonal decomposition of the model error ∇y into R(A) and R(A)^⊥ is shown in Figure 4.17.
Figure 4.17: ∇y = A∇x̂ + PA^⊥ ∇y.
If we consider ∇x̂ of (162) as the possibly non-detected "bias" in x̂ and Qx̂ as a description of the "noise" in x̂, we may define a scalar squared bias-to-noise ratio for x̂ as:
(164)
A large value of λx̂ indicates that the influence of the model error ∇y on x̂ is significant, and a small value of λx̂ indicates that this influence is insignificant. Since Qx̂^-1 = A*Qy^-1 A, it follows from (164) that:
(165)
This is also shown in Figure 4.17. Using Pythagoras' theorem we may now relate λx̂ to λ0. Application of Pythagoras' theorem to (163) gives:
(166)
Since λ0 = ‖PA^⊥ ∇y‖², λx̂ = ‖PA ∇y‖² (see (165)), and λy = ‖∇y‖² (see (157)), it follows from (166) that (see Figure 4.17):
(167)
With (164) and (167) we have two ways of computing λx̂: either via ∇x̂ as in (164), or via λy as in (167). Since the computation of λy is rather straightforward (especially if the variancematrix Qy is diagonal), one usually uses (167) for computing λx̂. The scalar λx̂ may be used for constructing an upperbound for an individual element of ∇x̂. Let us assume that we are interested in the ith element ∇x̂i of ∇x̂. Then:
(168)
This is an inner product, which can be written with the help of the cosine rule as:
(169)
In this expression we recognize [∇y*Qy^-1 A(A*Qy^-1 A)^-1 A*Qy^-1 ∇y]^(1/2) as ‖PA∇y‖ = λx̂^(1/2). Since cos θi ≤ 1, the upperbound follows as:
(170)
In the previous section, for the case 1 < q ≤ m − n, the expression for ∇ of (150) was substituted into the expression for λy. Similarly we can substitute ∇ into the expression for λx̂. This gives:
(171)
which shows once again, with (153), that (167) holds. In case of datasnooping with a diagonal variancematrix Qy, formula (171) simplifies to:
(172)
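The propagation of an undetected model error into the estimates can be sketched numerically. Assuming the standard least-squares relation ∇x̂ = (A*Qy^-1 A)^-1 A*Qy^-1 ∇y of (162) and the decomposition (167), the following illustration (with invented numbers) computes ∇x̂, λx̂ and λ0:

```python
# External reliability (ad i): nabla_x_hat from (162), its bias-to-noise
# ratio lambda_x_hat (164), and the check lambda_y = lambda_x_hat + lambda_0
# of (167).  All numbers are invented for the illustration.
import numpy as np

rng = np.random.default_rng(5)
m, n = 8, 2
A  = rng.standard_normal((m, n))
Qy = np.diag(rng.uniform(0.5, 2.0, m)); W = np.linalg.inv(Qy)
ny = np.zeros(m); ny[3] = 0.25                # model error nabla_y = c_y * nabla

N  = A.T @ W @ A                              # normal matrix, Q_x_hat = N^-1
nx = np.linalg.solve(N, A.T @ W @ ny)         # nabla_x_hat, cf. (162)
lx = nx @ N @ nx                              # lambda_x_hat, cf. (164)
ly = ny @ W @ ny                              # lambda_y, cf. (152)
print("nabla_x_hat =", np.round(nx, 4),
      " lambda_x_hat =", round(lx, 3),
      " lambda_0 =", round(ly - lx, 3))       # lambda_0 via (167)
```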
ad (ii):
Let us partition x̂ as x̂ = (x̂1*, x̂2*)*. The partitioned system of normal equations then reads:
(173)
In order to find the solution for x̂1 , we premultiply (173) with the square and regular matrix:
This gives:
(174)
In this expression we recognize the orthogonal projector:
(175)
(176)
From this result it follows that we may write the least-squares estimator of x1 under H0 as:
(177)
(178)
Since:
and
(179)
If we use the abbreviations ∇y = Cy∇, ∇x̂1 = E{x̂1|HA} − E{x̂1|H0} and x1 = E{x̂1|H0}, we may write (179) as:
(180)
This vector describes the influence of the model error ∇y on x̂1. Compare this result with (162). From (180) it follows how A1∇x̂1 is obtained from ∇y by projection. What is the relation between A1∇x̂1 and A∇x̂ = PA∇y? It follows that:
(181)
(182)
(184)
This orthogonal decomposition of A∇x̂ = PA∇y into R(A1) and R(A2) is shown in Figure 4.18. Compare this with Figure 4.17.
If we consider ∇x̂1 of (180) as the possibly non-detected "bias" in x̂1 and Qx̂1 as a description of the "noise" in x̂1, we may define, analogous to (164), the squared "bias-to-noise" ratio for x̂1 as:
(185)
Since Qx̂1^-1 = A1*Qy^-1 A1 (see (177)), it follows that:
(186)
This is shown in Figure 4.18. Using Pythagoras' theorem we may now relate λx̂1 to λx̂. Application of Pythagoras' theorem gives (see Figure 4.18):
or that:
(188)
(189) .
Formula (170) gives an upperbound for the "bias-to-noise" ratio of an individual element of x̂. In a completely analogous way one can derive the following upperbound for the "bias-to-noise" ratio of an individual element x̂1i of x̂1:
(190)
Since λx̂1 ≤ λx̂, the bound of (190) is sharper than the bound of (170).
ad (iii):
(191)
Then:
(192)
(193) .
This shows how an arbitrary linear function of x̂ is influenced by model errors. If we write (193) as ∇θ̂ = a*Qx̂^(1/2) Qx̂^(−1/2) ∇x̂, application of the cosine rule gives:
(194)
In this expression we recognize σθ̂ = (a*Qx̂ a)^(1/2) and λx̂^(1/2) = (∇x̂*Qx̂^-1 ∇x̂)^(1/2). The upperbound follows then as:
(195)
This result shows that λx̂^(1/2) gives an upperbound for the "bias-to-noise" ratio of every arbitrary linear function of x̂.
Influence on x̂
Influence on x̂1
Influence on θ̂ = a*x̂
(196)
for which the variance matrix Qy is assumed to be diagonal. This means that in case of
datasnooping the following formulae of internal and external reliability may be applied:
(197)
(198)
The observables are assumed to be normally distributed. Since the observation equations are of the form E{yi} = x1 + ai x2, they describe the equation of a straight line with intercept x1 and slope x2. This is shown in Figure 4.19.
Figure 4.19: The line E{y} = x1 + a x2 with intercept x1 and slope tan φ = x2.
(199)
Since yi − x1 − ai x2 is the vertical distance from the point (ai, yi) to the straight line E{y} = x1 + a x2, the least-squares estimates x̂1 and x̂2 follow from a minimization of the sum of the squares of these vertical distances (see Figure 4.20).
Figure 4.20: x̂1 and x̂2 follow from the minimization over x1, x2 of (1/σ²) Σ from i = 1 to m of (yi − x1 − ai x2)².
Let us first derive the minimal detectable bias ∇i of the ith observable. According to (197a) one can compute ∇i from λ0, σ²yi and σ²ŷi. Since λ0 is fixed and σ²yi = σ², we only need to compute:
(200)
(201)
it follows that:
(202)
(203)
(204)
Then:
(205)
(206)
From the structure of the variance matrix of (206) three conclusions can be drawn:

1. The least-squares estimators $\hat{x}_1$ and $\hat{x}_2$ are uncorrelated if and only if $\bar{a} = 0$, that is, if the coordinates $a_i$, $i = 1, \ldots, m$, average to zero, as is the case when they are distributed symmetrically about $a = 0$.

2. The covariance between $\hat{x}_1$ and $\hat{x}_2$ is negative if and only if $\bar{a}$ is positive, that is, if the cluster of points $(a_i, y_i)$ is situated in the first or fourth quadrant. This means that if $\bar{a}$ is positive, an increase in $x_1$ implies a decrease in $x_2$ for an optimal fit (see Figure 4.21).

3. The closer the coordinates $a_j$, $j = 1, \ldots, m$, are to $\bar{a}$, the larger the variances of $\hat{x}_1$ and $\hat{x}_2$ get. In the extreme case that $a_j = \bar{a}$ for all $j = 1, \ldots, m$, the two columns of matrix $A$ of (201) are linearly dependent and the variances of $\hat{x}_1$ and $\hat{x}_2$ are infinite. Thus the closer the coordinates $a_j$, $j = 1, \ldots, m$, are to $\bar{a}$, the more difficult it becomes to estimate $x_1$ and $x_2$ (see Figure 4.22).
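These three conclusions are easily reproduced numerically. A minimal Matlab sketch (the coordinates $a_i$ and the standard deviation $\sigma$ are arbitrary illustration values):

% Variance matrix of intercept and slope for the line E{y_i} = x1 + a_i*x2.
a     = (1:5)';                 % abscissae a_i; their mean abar = 3 > 0
m     = length(a);
sigma = 0.1;                    % standard deviation of the observables
A     = [ones(m,1) a];          % design matrix of the straight-line model
Qx    = sigma^2*inv(A'*A);      % variance matrix of (x1hat, x2hat), cf. (206)
disp(Qx)                        % off-diagonal element is negative, since abar > 0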
(207)

(208)

(209) $\sigma_{\hat{y}_i}^2 = \sigma^2\left(\frac{1}{m} + \frac{(a_i - \bar{a})^2}{\sum_{j=1}^{m}(a_j - \bar{a})^2}\right)$

Substitution of (209) into (197a) gives, with $\sigma_{y_i}^2 = \sigma^2$, for the minimal detectable bias of the $i$th observable:

(210) $|\nabla_i| = \sigma\sqrt{\lambda_0}\left(1 - \frac{1}{m} - \frac{(a_i - \bar{a})^2}{\sum_{j=1}^{m}(a_j - \bar{a})^2}\right)^{-1/2}$.

Note that the rough approximation given in (147) of Section 4.5 corresponds for the present case to the approximation $|\nabla_i| \approx \sigma\sqrt{\lambda_0\,m/(m-2)}$.

It follows from (210) that $\nabla_i$ is smaller for points that have coordinates $a_i$ closer to $\bar{a}$. Hence, a blunder in the $i$th observable is better detectable if the corresponding point $(a_i, y_i)$ lies near the centre of the cluster $(a_j, y_j)$, $j = 1, \ldots, m$, than when it lies near the left or right edge of the cluster. A similar effect can be seen for $\lambda_{\hat{x}}$. Substitution of (209) into (197c) namely gives:

(211) $\lambda_{\hat{x}} = \lambda_0\,\frac{\frac{1}{m} + \frac{(a_i - \bar{a})^2}{\sum_{j}(a_j - \bar{a})^2}}{1 - \frac{1}{m} - \frac{(a_i - \bar{a})^2}{\sum_{j}(a_j - \bar{a})^2}}$.
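As an illustration, (210) and (211) may be evaluated in Matlab as follows; the coordinates, $\sigma$ and $\lambda_0$ are arbitrary illustration values ($\lambda_0 \approx 17.07$ is the value commonly associated with the choice $\alpha_1 = 0.001$, $\gamma_0 = 0.80$):

% Minimal detectable bias and bias-to-noise ratio per observation, cf. (210) and (211).
a       = (1:5)';
m       = length(a);
sigma   = 0.1;
lambda0 = 17.07;                              % reference non-centrality parameter
abar    = mean(a);
u       = (a - abar).^2/sum((a - abar).^2);
r       = 1 - 1/m - u;                        % local redundancy of each observation
MDB     = sigma*sqrt(lambda0./r);             % minimal detectable biases, cf. (210)
lam_x   = lambda0*(1/m + u)./r;               % bias-to-noise ratios, cf. (211)
disp([a MDB lam_x])                           % both are smallest near the centre abar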
Let us now consider the "bias-to-noise" ratios of the individual estimators $\hat{x}_1$ and $\hat{x}_2$. First we will compute $\lambda_{\hat{x}_1}$. With $Q_y = \sigma^2 I_m$ and $A_2 = (a_1, \ldots, a_m)^{T}$ it follows that:

(212)

(213) $\lambda_{\hat{x}_1} = \frac{\nabla_i^2\left(1 - m\bar{a}a_i/\sum_{j}a_j^2\right)^2}{\sigma^2\,m\left(1 - m\bar{a}^2/\sum_{j}a_j^2\right)}$.

This shows that $\lambda_{\hat{x}_1} = (\nabla\hat{x}_1 / \sigma_{\hat{x}_1})^2$ is small if $a_i$ is large and/or $a_i$ is close to $\bar{a}$. Thus the effect of a possibly non-detected blunder in the $i$th observable on the intercept estimator $\hat{x}_1$ is less significant for points with large coordinates $a_i$ than for points with smaller coordinates $a_i$. And it is even less significant if also $a_i$ is close to $\bar{a}$. With $Q_y = \sigma^2 I_m$ and $A_1 = (1, \ldots, 1)^{T}$ it follows that:

(214)

(215) $\lambda_{\hat{x}_2} = \frac{\nabla_i^2\,(a_i - \bar{a})^2}{\sigma^2\sum_{j=1}^{m}(a_j - \bar{a})^2}$.

This result shows that $\lambda_{\hat{x}_2} = 0$ if $a_i = \bar{a}$. Hence the effect of a possibly non-detected blunder in the $i$th observable on the slope estimator $\hat{x}_2$ is insignificant if $a_i$ is close enough to $\bar{a}$. This effect however increases the more $a_i$ differs from $\bar{a}$.
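The effect on the slope estimator is easily visualized with a few lines of Matlab (illustration values as before; the blunder size $\nabla_i$ is an arbitrary assumption), using the expression of (215):

% Effect of a non-detected blunder nabla on the slope estimator, cf. (215).
a     = (1:5)';
sigma = 0.1;
nabla = 0.5;                                  % assumed size of the blunder
abar  = mean(a);
lam_x2 = nabla^2*(a - abar).^2/(sigma^2*sum((a - abar).^2));
disp([a lam_x2])                              % zero at a_i = abar, grows towards the edges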
Appendix A
Some standard distributions
Definition: An n×1 random vector $x$ is said to be normally distributed if its probability density function, $p_x(x)$, is given as:

(1) $p_x(x) = \frac{1}{\sqrt{\det(2\pi Q)}}\exp\left\{-\frac{1}{2}(x - \mu)^{T}Q^{-1}(x - \mu)\right\}$

with $Q$ an n×n positive definite matrix, and $\mu$ an n×1 vector. Note that a normal distribution is completely specified once $Q$ and $\mu$ are given. The following notation will be used for an n×1 normally distributed vector $x$:

(2) $x \sim N_n(\mu, Q)$
Theorem: Let the expectation and dispersion of the random n×1 vector $x$ be given as $E\{x\} = \mu_x$ and $D\{x\} = Q_x$, and let the random m×1 vector $y$ be defined as $y = Ax + a$, with $A$ an m×n matrix and $a$ an m×1 vector. Then:

(4) $E\{y\} = A\mu_x + a, \qquad D\{y\} = AQ_xA^{T}$
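As a small numerical illustration of this propagation law (all numbers are arbitrary illustration values):

% Propagation of mean and dispersion under the linear map y = A*x + a.
mu_x = [1; 2];          Qx = [2 1; 1 3];      % E{x} and D{x}
A    = [1 0; 1 1; 0 2]; a  = [0; 1; 0];
mu_y = A*mu_x + a;                            % E{y} = A*E{x} + a
Qy   = A*Qx*A';                               % D{y} = A*Qx*A'
disp(mu_y), disp(Qy)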
Definition: A scalar random variable $x$ is said to have a noncentral Chi-square distribution with $n$ degrees of freedom and non-centrality parameter $\lambda$ if its probability density function, $p_x(x)$, is given as:

(6) $p_x(x) = \sum_{k=0}^{\infty}\frac{e^{-\lambda/2}(\lambda/2)^{k}}{k!}\;\frac{x^{\frac{n+2k}{2}-1}e^{-x/2}}{2^{\frac{n+2k}{2}}\,\Gamma\!\left(\frac{n+2k}{2}\right)}, \quad x > 0,$

with $\Gamma(\cdot)$ the gamma function.

The following notation will be used for a Chi-square distributed random variable $x$ with $n$ degrees of freedom and non-centrality parameter $\lambda$:

(7) $x \sim \chi^{2}(n, \lambda)$
(8)
Definition: A scalar random variable $x$ is said to have a non-central F-distribution with $m$ and $n$ degrees of freedom and non-centrality parameter $\lambda$ if its probability density function, $p_x(x)$, is given as:

(10)

The following notation will be used for an F-distributed random variable $x$ with $m$ and $n$ degrees of freedom and non-centrality parameter $\lambda$:

(11) $x \sim F(m, n, \lambda)$
Appendix B
Statistical tables
k 0 1 2 3 4 5 6 7 8 9
0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
Table B.1: Standard normal distribution N(0, 1); given is α, the probability in the right-hand tail, for critical values k; e.g. k = 1.96 yields α = 0.0250.

calculation in Matlab:
alpha = 1 - normcdf(k, 0, 1)
Table B.2: Chi-square distribution χ²(q, 0); given is k, the critical value, for α, the probability in the right-hand tail, and q, the degrees of freedom; e.g. α = 0.010 and q = 10 yield k = 23.21; k = χ²_α(q, 0) for test (102) in Section 4.5.

calculation in Matlab:
k = chi2inv(1 - alpha, q)
α = 0.10
q2\q1 1 2 3 4 5 6 8 10 20 100
1 39.86 49.50 53.59 55.83 57.24 58.20 59.44 60.19 61.74 63.01
2 8.526 9.000 9.162 9.243 9.293 9.326 9.367 9.392 9.441 9.481
3 5.538 5.462 5.391 5.343 5.309 5.285 5.252 5.230 5.184 5.144
4 4.545 4.325 4.191 4.107 4.051 4.010 3.955 3.920 3.844 3.778
5 4.060 3.780 3.619 3.520 3.453 3.405 3.339 3.297 3.207 3.126
6 3.776 3.463 3.289 3.181 3.108 3.055 2.983 2.937 2.836 2.746
8 3.458 3.113 2.924 2.806 2.726 2.668 2.589 2.538 2.425 2.321
10 3.285 2.924 2.728 2.605 2.522 2.461 2.377 2.323 2.201 2.087
20 2.975 2.589 2.380 2.249 2.158 2.091 1.999 1.937 1.794 1.650
100 2.756 2.356 2.139 2.002 1.906 1.834 1.732 1.663 1.494 1.293
• 2.706 2.303 2.084 1.945 1.847 1.774 1.670 1.599 1.421 1.185
α = 0.05
q2\q1 1 2 3 4 5 6 8 10 20 100
1 161.4 199.5 215.7 224.6 230.2 234.0 238.9 241.9 248.0 253.0
2 18.51 19.00 19.16 19.25 19.30 19.33 19.37 19.40 19.45 19.49
3 10.13 9.552 9.277 9.117 9.013 8.941 8.845 8.786 8.660 8.554
4 7.709 6.944 6.591 6.388 6.256 6.163 6.041 5.964 5.803 5.664
5 6.608 5.786 5.409 5.192 5.050 4.950 4.818 4.735 4.558 4.405
6 5.987 5.143 4.757 4.534 4.387 4.284 4.147 4.060 3.874 3.712
8 5.318 4.459 4.066 3.838 3.687 3.581 3.438 3.347 3.150 2.975
10 4.965 4.103 3.708 3.478 3.326 3.217 3.072 2.978 2.774 2.588
20 4.351 3.493 3.098 2.866 2.711 2.599 2.447 2.348 2.124 1.907
100 3.936 3.087 2.696 2.463 2.305 2.191 2.032 1.927 1.676 1.392
• 3.841 2.996 2.605 2.372 2.214 2.099 1.938 1.831 1.571 1.243
α = 0.01
q2\q1 1 2 3 4 5 6 8 10 20 100
1 4052. 5000. 5403. 5625. 5764. 5859. 5981. 6056. 6209. 6334.
2 98.50 99.00 99.17 99.25 99.30 99.33 99.37 99.40 99.45 99.49
3 34.12 30.82 29.46 28.71 28.24 27.91 27.49 27.23 26.69 26.24
4 21.20 18.00 16.69 15.98 15.52 15.21 14.80 14.55 14.02 13.58
5 16.26 13.27 12.06 11.39 10.97 10.67 10.29 10.05 9.553 9.130
6 13.75 10.92 9.780 9.148 8.746 8.466 8.102 7.874 7.396 6.987
8 11.26 8.649 7.591 7.006 6.632 6.371 6.029 5.814 5.359 4.963
10 10.04 7.559 6.552 5.994 5.636 5.386 5.057 4.849 4.405 4.014
20 8.096 5.849 4.938 4.431 4.103 3.871 3.564 3.368 2.938 2.535
100 6.895 4.824 3.984 3.513 3.206 2.988 2.694 2.503 2.067 1.598
• 6.635 4.605 3.782 3.319 3.017 2.802 2.511 2.321 1.878 1.358
Table B.3: Central F-distribution F(q1, q2, 0); given is k, the critical value, for q1 and q2, the degrees of freedom, for some values of α, the probability in the right-hand tail; e.g. α = 0.01, q1 = 10, q2 = ∞ yield k = 2.321; k = F_α(q, ∞, 0) for test (106) in Section 4.5.

calculation in Matlab:
k = finv(1 - alpha, q1, q2)
1 0.85 0.82 0.70 0.62 0.54 0.47 0.88 0.86 0.79 0.74 0.69 0.64
2 0.82 0.74 0.51 0.35 0.22 0.12 0.86 0.81 0.67 0.55 0.43 0.31
3 0.80 0.70 0.41 0.23 0.11 0.04 0.84 0.78 0.58 0.42 0.27 0.16
4 0.78 0.68 0.35 0.17 0.07 0.02 0.83 0.76 0.52 0.34 0.19 0.09
5 0.78 0.66 0.32 0.15 0.05 0.01 0.82 0.75 0.48 0.29 0.14 0.06
6 0.77 0.65 0.30 0.13 0.04 0.01 0.82 0.74 0.45 0.26 0.12 0.04
8 0.76 0.64 0.28 0.11 0.03 0.01 0.81 0.72 0.42 0.22 0.09 0.03
10 0.76 0.63 0.26 0.10 0.03 0.00 0.81 0.71 0.39 0.19 0.07 0.02
20 0.75 0.61 0.24 0.08 0.02 0.00 0.80 0.69 0.35 0.15 0.05 0.01
100 0.74 0.59 0.22 0.07 0.01 0.00 0.79 0.67 0.31 0.12 0.03 0.01
• 0.74 0.59 0.21 0.06 0.01 0.00 0.78 0.67 0.30 0.11 0.03 0.00
1 0.89 0.88 0.86 0.84 0.81 0.78 0.90 0.89 0.87 0.86 0.84 0.82
2 0.88 0.87 0.81 0.76 0.69 0.63 0.89 0.88 0.85 0.81 0.77 0.72
3 0.88 0.86 0.77 0.69 0.59 0.49 0.89 0.87 0.82 0.77 0.70 0.63
4 0.87 0.85 0.74 0.63 0.51 0.38 0.88 0.87 0.80 0.73 0.64 0.55
5 0.87 0.84 0.71 0.58 0.44 0.31 0.88 0.86 0.78 0.70 0.59 0.48
6 0.87 0.83 0.69 0.55 0.39 0.25 0.88 0.86 0.76 0.67 0.55 0.42
8 0.86 0.82 0.65 0.49 0.32 0.18 0.88 0.85 0.74 0.62 0.48 0.34
10 0.86 0.81 0.63 0.45 0.28 0.14 0.87 0.84 0.72 0.59 0.43 0.28
20 0.85 0.79 0.56 0.36 0.18 0.07 0.87 0.83 0.66 0.49 0.31 0.16
100 0.84 0.77 0.49 0.27 0.11 0.03 0.85 0.80 0.58 0.37 0.18 0.07
• 0.83 0.76 0.47 0.25 0.09 0.02 0.85 0.80 0.55 0.33 0.15 0.05
Table B.4: Non-central F-distribution F(q1, q2, λ); given is β, the probability in the left-hand tail of F(q1, q2, λ), for λ, the non-centrality parameter, and q2, the degrees of freedom, for α = 0.10, the probability in the right-hand tail of F(q1, q2, 0), for some values of q1, the degrees of freedom; e.g. q1 = 1, q2 = ∞ and α = 0.10 yield, with λ = 2, β = 0.59 and hence γ = 0.41; see also Table 4.7 in Section 4.5.

calculation in Matlab:
k = finv(1 - alpha, q1, q2)
beta = ncfcdf(k, q1, q2, lambda)
1 0.93 0.91 0.85 0.80 0.76 0.72 0.94 0.93 0.89 0.87 0.84 0.81
2 0.90 0.86 0.71 0.58 0.46 0.34 0.93 0.90 0.82 0.74 0.65 0.56
3 0.89 0.83 0.60 0.43 0.27 0.15 0.92 0.88 0.75 0.62 0.48 0.35
4 0.88 0.80 0.54 0.34 0.18 0.08 0.91 0.87 0.69 0.53 0.37 0.23
5 0.87 0.79 0.49 0.28 0.13 0.05 0.90 0.86 0.65 0.47 0.29 0.16
6 0.86 0.78 0.46 0.25 0.11 0.03 0.90 0.85 0.62 0.42 0.24 0.12
8 0.86 0.76 0.42 0.21 0.08 0.02 0.89 0.83 0.57 0.36 0.18 0.07
10 0.85 0.75 0.40 0.19 0.06 0.02 0.89 0.82 0.55 0.32 0.15 0.05
20 0.84 0.73 0.36 0.15 0.04 0.01 0.88 0.80 0.48 0.25 0.10 0.03
100 0.83 0.71 0.32 0.12 0.03 0.00 0.87 0.78 0.43 0.20 0.06 0.01
• 0.83 0.71 0.31 0.11 0.03 0.00 0.87 0.77 0.42 0.18 0.06 0.01
1 0.95 0.94 0.93 0.92 0.90 0.89 0.95 0.95 0.94 0.93 0.92 0.91
2 0.94 0.93 0.90 0.87 0.84 0.80 0.95 0.94 0.92 0.90 0.88 0.85
3 0.94 0.93 0.88 0.83 0.76 0.69 0.94 0.94 0.91 0.88 0.84 0.79
4 0.94 0.92 0.85 0.78 0.69 0.58 0.94 0.93 0.89 0.85 0.79 0.72
5 0.93 0.91 0.83 0.74 0.63 0.50 0.94 0.93 0.88 0.82 0.75 0.66
6 0.93 0.91 0.81 0.71 0.58 0.43 0.94 0.93 0.87 0.80 0.71 0.61
8 0.93 0.90 0.78 0.65 0.49 0.33 0.94 0.92 0.85 0.76 0.65 0.52
10 0.93 0.90 0.76 0.61 0.44 0.27 0.93 0.92 0.83 0.73 0.60 0.45
20 0.92 0.88 0.70 0.50 0.30 0.14 0.93 0.90 0.78 0.63 0.45 0.28
100 0.91 0.86 0.62 0.39 0.18 0.06 0.92 0.89 0.71 0.50 0.29 0.13
• 0.91 0.85 0.60 0.36 0.16 0.05 0.92 0.88 0.68 0.46 0.24 0.09
Table B.5: Non-central F-distribution F(q1, q2, λ); given is β, the probability in the left-hand tail of F(q1, q2, λ), for λ, the non-centrality parameter, and q2, the degrees of freedom, for α = 0.05, the probability in the right-hand tail of F(q1, q2, 0), for some values of q1, the degrees of freedom.

calculation in Matlab:
k = finv(1 - alpha, q1, q2)
beta = ncfcdf(k, q1, q2, lambda)
1 0.99 0.98 0.97 0.96 0.95 0.94 0.99 0.99 0.98 0.97 0.97 0.96
2 0.98 0.97 0.93 0.90 0.85 0.80 0.99 0.98 0.96 0.94 0.92 0.89
3 0.98 0.96 0.89 0.81 0.71 0.60 0.98 0.97 0.94 0.90 0.84 0.77
4 0.97 0.95 0.84 0.72 0.58 0.43 0.98 0.97 0.91 0.85 0.75 0.64
5 0.97 0.94 0.81 0.65 0.48 0.31 0.98 0.96 0.89 0.80 0.67 0.53
6 0.96 0.93 0.78 0.60 0.41 0.24 0.98 0.96 0.87 0.76 0.60 0.44
8 0.96 0.92 0.73 0.52 0.31 0.15 0.97 0.95 0.84 0.69 0.50 0.32
10 0.96 0.92 0.70 0.47 0.26 0.11 0.97 0.95 0.81 0.64 0.43 0.24
20 0.95 0.90 0.63 0.37 0.17 0.05 0.97 0.93 0.74 0.52 0.29 0.12
100 0.94 0.88 0.57 0.30 0.11 0.03 0.96 0.92 0.67 0.41 0.18 0.06
• 0.94 0.88 0.55 0.28 0.10 0.02 0.96 0.92 0.65 0.39 0.16 0.05
1 0.99 0.99 0.99 0.98 0.98 0.98 0.99 0.99 0.99 0.99 0.98 0.98
2 0.99 0.99 0.98 0.97 0.97 0.96 0.99 0.99 0.98 0.98 0.98 0.97
3 0.99 0.98 0.97 0.96 0.94 0.92 0.99 0.99 0.98 0.97 0.96 0.95
4 0.99 0.98 0.97 0.95 0.92 0.88 0.99 0.99 0.98 0.97 0.95 0.93
5 0.99 0.98 0.96 0.93 0.89 0.83 0.99 0.99 0.97 0.96 0.93 0.90
6 0.99 0.98 0.95 0.91 0.86 0.78 0.99 0.98 0.97 0.95 0.92 0.87
8 0.98 0.98 0.94 0.88 0.80 0.68 0.99 0.98 0.96 0.93 0.89 0.82
10 0.98 0.98 0.93 0.86 0.75 0.60 0.99 0.98 0.96 0.92 0.85 0.76
20 0.98 0.97 0.89 0.77 0.59 0.39 0.98 0.98 0.93 0.86 0.74 0.57
100 0.98 0.96 0.83 0.64 0.40 0.19 0.98 0.97 0.88 0.74 0.53 0.31
• 0.98 0.95 0.81 0.59 0.34 0.14 0.98 0.97 0.86 0.69 0.46 0.23
Table B.6: Non-central F-distribution F(q1, q2, λ); given is β, the probability in the left-hand tail of F(q1, q2, λ), for λ, the non-centrality parameter, and q2, the degrees of freedom, for α = 0.01, the probability in the right-hand tail of F(q1, q2, 0), for some values of q1, the degrees of freedom.

calculation in Matlab:
k = finv(1 - alpha, q1, q2)
beta = ncfcdf(k, q1, q2, lambda)
Appendix C
Detection, identification and adaptation
We have given the teststatistic for testing the null hypothesis H0 against a particular alternative hypothesis HA. In most practical applications, however, one is usually concerned not with just one model error, but quite often with many more than one. This implies that one needs a testing procedure for handling the various alternative hypotheses. In this appendix we will discuss a way of structuring such a testing procedure. It consists of the following three steps: detection, identification and adaptation.
Detection
Since one usually first wants to know whether one can have any confidence in the assumed null
hypothesis without the need to specify any particular alternative hypothesis, the first step consists
of a check on the overall validity of H0. This implies that one opposes the null hypothesis to the
most relaxed alternative hypothesis possible (see Section 4.4). The most relaxed alternative
hypothesis is the one that leaves the observables completely free. Hence, under this alternative
hypothesis no restrictions at all are imposed on the observables. We therefore have the situation:
(1) $H_0:\; E\{y\} = Ax$

(2) $H_A:\; E\{y\} \in R^{m}$

The appropriate teststatistic for testing the null hypothesis against the most relaxed alternative hypothesis is thus equal to the weighted sum-of-squares of the least-squares residuals. The null hypothesis will then be rejected when:

(3) $T_{q=m-n} = \hat{e}^{T}Q_y^{-1}\hat{e} > \chi_{\alpha}^{2}(m-n, 0)$
The σ̂² test: In the literature one often sees the above overall model test also formulated in a slightly different way. Let us use the factorization $D\{y\} = Q_y = \sigma^2 Q$, where $\sigma^2$ is the variance factor of unit weight and $Q$ is the corresponding cofactor matrix. It can be shown that:

$\hat{\sigma}^2 = \frac{\hat{e}^{T}Q^{-1}\hat{e}}{m-n}$

is an unbiased estimator of $\sigma^2$ (see also (100) in Section 4.4). Thus $E\{\hat{\sigma}^2\} = \sigma^2$. The test (3) can now also be formulated as:

$\frac{\hat{\sigma}^2}{\sigma^2} > F_{\alpha}(m-n, \infty, 0)$
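In Matlab the detection step may thus be sketched as follows; the design matrix, the observations and the level α are arbitrary illustration values:

% Detection step: overall model test of H0, cf. (3).
A  = [ones(5,1) (1:5)'];  Qy = eye(5);  iQ = inv(Qy);
y  = [1.1 1.9 3.2 3.9 5.1]';                    % illustrative observations
alpha = 0.05;
xhat = (A'*iQ*A)\(A'*iQ*y);                     % least-squares estimate under H0
ehat = y - A*xhat;                              % least-squares residuals
T    = ehat'*iQ*ehat;                           % teststatistic; Chi-square(m-n,0) under H0
k    = chi2inv(1-alpha, size(A,1)-size(A,2));   % critical value
reject = T > k                                  % reject H0 when T exceeds k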
Identification
In the detection phase, one tests the overall validity of the null hypothesis. If this leads to a
rejection of the null hypothesis, one has to search for possible model misspecifications. That is,
one will have to try to identify the model error which caused the rejection of the null hypothesis.
This implies that one will have to specify the type of likely model errors. This specification of
possible alternative hypotheses is application dependent and is one of the more difficult tasks in
hypothesis testing. It depends very much on one's experience.
The 1-dimensional case: In case the model error can be represented by a scalar, the alternative hypothesis takes the form:

(4) $H_A:\; E\{y\} = Ax + c_y\nabla$

The alternative hypothesis is specified once the vector $c_y$ is specified (see Section 4.3). The appropriate teststatistic for testing the null hypothesis against the above alternative hypothesis HA is given as:

(5) $w = \frac{c_y^{T}Q_y^{-1}\hat{e}}{\sqrt{c_y^{T}Q_y^{-1}Q_{\hat{e}}Q_y^{-1}c_y}}$

This teststatistic has a standard normal distribution N(0,1) under H0. The evidence on whether the model error as specified by (4) did or did not occur is based on the two-sided test:

(6) reject $H_0$ when $|w| > N_{\alpha/2}(0, 1)$
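Continuing the illustrative adjustment of the detection sketch above, the test of (5) and (6) may be computed as follows; the vector $c_y$ is an arbitrary illustration value:

% One-dimensional w-test for a specified error signature cy, cf. (5) and (6).
cy   = [0 0 1 0 0]';                        % hypothesized direction of the model error
Qe   = Qy - A*((A'*iQ*A)\A');               % variance matrix of the residuals
w    = (cy'*iQ*ehat)/sqrt(cy'*iQ*Qe*iQ*cy);
reject = abs(w) > norminv(1-alpha/2)        % two-sided test against N(0,1)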
Data snooping: Apart from the possibility of having a one-dimensional test as (6), it is standard practice in geodesy to always first check the individual observations for potential blunders. This implies that the alternative hypotheses take the form:

(7) $H_{A_i}:\; E\{y\} = Ax + c_{y_i}\nabla_i$

with

$c_{y_i} = (0, \ldots, 0, 1, 0, \ldots, 0)^{T}$

Thus $c_{y_i}$ is a unit vector having the 1 as its $i$th entry. The additional term $c_{y_i}\nabla_i$ models the presence of a blunder in the $i$th observation. The appropriate teststatistic for testing the null hypothesis against the above alternative hypothesis $H_{A_i}$ is again of the general form of (5), but now with the c-vector chosen as $c_{y_i}$, see also (83) in Section 4.3:

(8) $w_i = \frac{c_{y_i}^{T}Q_y^{-1}\hat{e}}{\sqrt{c_{y_i}^{T}Q_y^{-1}Q_{\hat{e}}Q_y^{-1}c_{y_i}}}$
This teststatistic of course also has a standard normal distribution N(0,1) under H0. By letting $i$ run from 1 up to and including $m$, one can screen the whole data set for the presence of potential blunders in the individual observations. The teststatistic $w_i$ which returns the largest value in absolute sense then pinpoints the observation which is most likely corrupted with a blunder. Its significance is measured by comparing the value of the teststatistic with the critical value. Thus the $j$th observation is suspected to have a blunder when:

(9) $|w_j| = \max_i |w_i| > N_{\alpha/2}(0, 1)$
This procedure of screening each individual observation for the presence of a blunder is known as data snooping.
In many applications in practice, the variance matrix $Q_y$ is diagonal. If that is the case, the expression of the above teststatistic simplifies considerably. With a diagonal $Q_y$-matrix, we have:

$w_i = \frac{\hat{e}_i}{\sigma_{\hat{e}_i}}$

The appropriate teststatistic is thus equal to the least-squares residual of the $i$th observation divided by the standard deviation of that residual.
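With a diagonal $Q_y$-matrix the whole data set can thus be screened with a few lines of Matlab; the sketch below continues the detection example given earlier:

% Data snooping with a diagonal Qy: w_i = ehat_i/sigma_ehat_i, cf. (8) and (9).
Qe = Qy - A*((A'*iQ*A)\A');                 % variance matrix of the residuals
w  = ehat./sqrt(diag(Qe));                  % w-teststatistics of all m observations
[wmax, j] = max(abs(w));                    % most suspect observation j
reject_j = wmax > norminv(1-alpha/2)        % blunder suspected in observation j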
The higher-dimensional case: It may happen that a particular model error cannot be represented by a single scalar. In that case $q > 1$ and $\nabla$ becomes a vector. The appropriate teststatistic is then the one we met earlier, namely:

(10) $T_q = \hat{e}^{T}Q_y^{-1}C_y(C_y^{T}Q_y^{-1}Q_{\hat{e}}Q_y^{-1}C_y)^{-1}C_y^{T}Q_y^{-1}\hat{e}$

It is through the matrix $C_y$ that one specifies the type of model error.
Adaptation
Once one or more likely model errors have been identified, corrective action needs to be undertaken in order to get the null hypothesis accepted. Here one of the two following approaches can be used in principle. Either one replaces the data, or part of the data, with new data such that the null hypothesis does get accepted, or one replaces the original null hypothesis with a new hypothesis that takes the identified model errors into account. The first approach amounts to a remeasurement of (part of) the data. This approach is feasible, for instance, when in the case of datasnooping some individual observations are identified as being potentially corrupted
by blunders. These are then the observations which get remeasured. In the second approach no
remeasurement is undertaken. Instead the model of the null hypothesis is enlarged by adding
additional parameters such that all identified model errors are taken care of. Thus with this
approach, the identified alternative hypothesis becomes the new null hypothesis.
Once the adaptation step is completed, one of course still has to verify whether the newly created situation is acceptable or not. This at least implies a repetition of the detection step. It is possible that a gross error in one observation masks the gross error in another observation. This may have as a consequence that the masked gross error fails to have a large enough effect on its w-teststatistic; in other words, this w-test is not rejected. It is therefore good practice, once an observation is rejected, to repeat the adjustment without the rejected observation and to apply the datasnooping procedure again to this result. In this way one can infer whether it is likely that any gross errors remained undetected in the first step. Of course, if the redundancy permits, one can repeat this again after the second step. This procedure is called iterative datasnooping.
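Iterative datasnooping may be sketched as the following Matlab loop (an illustration only; it continues the example above, considers removal rather than remeasurement, and assumes a diagonal $Q_y$):

% Iterative datasnooping: repeatedly remove the most suspect observation
% and re-adjust, as long as the redundancy permits.
idx = (1:length(y))';                       % bookkeeping of the remaining observations
k   = norminv(1-alpha/2);                   % critical value of the w-test
while length(idx) > size(A,2)               % redundancy must remain positive
    Ai = A(idx,:); yi = y(idx); iQi = inv(Qy(idx,idx));
    ei = yi - Ai*((Ai'*iQi*Ai)\(Ai'*iQi*yi));    % residuals of the current adjustment
    Qe = Qy(idx,idx) - Ai*((Ai'*iQi*Ai)\Ai');
    w  = ei./sqrt(diag(Qe));
    [wmax, j] = max(abs(w));
    if wmax <= k, break, end                % no suspect observation left
    idx(j) = [];                            % remove the identified observation and repeat
end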
When adaptation is applied, one also has to be aware of the fact that since the model has changed, the 'strength of the model' may have changed as well. In fact, when the model is adapted through the addition of more explanatory parameters, the model has become weaker in the sense that the teststatistics will now have less detection and identification power. That is, the reliability has become poorer. It depends on the particular application at hand whether this is considered acceptable or not.

When executing the above tests, choices need to be made for the testing parameters so as to control the errors of the first and second type. Although various approaches are possible, we briefly present only one such approach, namely the B-method of testing (Baarda, 1968). For a more detailed discussion of this topic, including the possible pitfalls involved, we refer to (Miller, 1966) and (Arnold, 1981).
In the B-method of testing, the $T_{q=m-n}$-test of the detection step and the w-test of the identification step are related to each other by a special choice of their testing parameters:

(11) $\lambda_0 = \lambda(\alpha_1, \gamma_0, q=1) = \lambda(\alpha, \gamma_0, q=m-n)$

The procedure is then to make a choice for $\alpha_1$ and $\gamma_0$, and to compute $\lambda_0$ and $\alpha$ from the above relation. This choice of equal values for the non-centrality parameter $\lambda = \lambda_0$ and power $\gamma = \gamma_0$ in both tests implies that a certain model error can be found with the same probability by both the $T_{q=m-n}$-test and the w-test. Both tests will therefore have the same reliability, i.e. the same values for the minimal detectable biases (MDB). Thus if the null hypothesis is accepted in the detection step, no further testing is necessary and the reliability of any 1-dimensional alternative hypothesis is given by its corresponding MDB, computed on the basis of the value $\lambda_0$.
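The coupling (11) may be evaluated numerically. The sketch below writes both tests in their equivalent Chi-square form and inverts the power relation with fzero; the values of $\alpha_1$, $\gamma_0$ and $q = m-n$ are illustration values:

% B-method of testing: from alpha1 and gamma0 of the 1-dimensional w-test,
% compute lambda0 and the level alpha of the (m-n)-dimensional detection test.
alpha1 = 0.001;  gamma0 = 0.80;  q = 20;        % q = m-n
k1      = chi2inv(1-alpha1, 1);                 % critical value of the (squared) w-test
lambda0 = fzero(@(l) 1 - ncx2cdf(k1,1,l) - gamma0, 17);          % power gamma0 for q = 1
kq      = fzero(@(k) 1 - ncx2cdf(k,q,lambda0) - gamma0, q + 17); % same power for q = m-n
alpha   = 1 - chi2cdf(kq, q)                    % level of significance of the detection test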
One consequence of the above coupling that one should be aware of is the dependence of α on the redundancy m−n. Due to this coupling the value of α will increase when the redundancy increases (see Figure C.1). For a large redundancy this may lead to too large a value of α, so that the null hypothesis too often gets falsely rejected. For such situations, Baarda proposes to carry out the adjustment and testing in steps.
Figure C.1: Level of significance α versus redundancy m−n according to the B-method of testing (11); α₁ = 0.001, γ₀ = 0.80.
Literature
1. Calculus and linear algebra

2. Probability theory

Breiman, L. (1969): Probability and stochastic processes: with a view toward applications. Houghton Mifflin Company, Boston.

Ghosh, B.K. (1973): Some monotonicity theorems for χ², F and t-distributions with applications. J. R. Statist. Soc., 35, pp. 480-492.

Neyman, J. and E.S. Pearson (1933): On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. Roy. Soc. London, Series A231, pp. 289-337.

Teunissen, P.J.G. (1986): Adjusting and testing with the models of the affine and similarity transformation. Manuscripta Geodaetica, 11(3), pp. 214-225.
Index
a
adaptation 134
adjustment history 137

b
bias-to-noise ratio 106, 111
B-method of testing 135

c
chi-square distribution 124, 127
conventional alternative hypothesis 88
critical region 9
critical value 13

d
data snooping 88, 103, 133
datasnooping, iterative 135
detection 132

e
error, estimated 76, 87

f
F-distribution 125, 128-131

g
generalized likelihood ratio test 53, 72-78

h
hypothesis 6
- null hypothesis 7
- alternative hypothesis 7
- composite hypothesis 8
- simple hypothesis 8

i
identification 133

l
least-squares residual 40
level of significance 11
linear model 35, 71
- observation equations 35, 72
- condition equations 35, 71
- mixed model 49

m
minimal detectable bias 102
misclosure 36

n
Neyman-Pearson principle 16, 29
non-central Chi-square distribution 124
non-central F-distribution 125
non-centrality parameter 96, 98, 101, 124, 125
normal distribution 124, 126

o
overall model test 93, 132

p
power 29, 96, 101
powerful test 29
- uniformly most powerful test 63-69
powerfunction 63

r
redundancy number 104
reliability, external 110
reliability, internal 102

s
σ̂²-test 92, 132
significance of parameters 48
simple likelihood ratio test 22
size of test 11
statistical tables 126

t
test 8
testing principle 16
testing procedure 132
teststatistic
- teststatistic w 39, 89
- teststatistic v 50
- teststatistic Tq 78, 85
- teststatistic σ̂² 92
type I error 10
type II error 10

v
variance factor of unit weight 92
v-test 50

w
w-test 39, 87, 134
Testing theory:
an introduction
These lecture notes are a follow-up on Adjustment theory. Adjustment theory deals
with the optimal combination of redundant measurements together with the estimation
of unknown parameters. There are two main reasons for performing redundant
measurements. First, the wish to increase the accuracy of the computed results. Second,
the requirement to be able to check for mistakes or errors. The present book addresses
this second topic. Although one will always try one's best to avoid making mistakes, they
can and will occasionally happen. It is therefore of importance to have ways of detecting
and identifying such mistakes. Mistakes or errors can come in many different guises.
They could be caused by mistakes made by the observer, by the use of defective
instruments, or by wrong assumptions about the functional relations between the
observables. When they pass unnoticed, these errors will deteriorate the final results.
The goal of this introductory course on testing theory is therefore to convey the necessary
knowledge for testing the validity of both the measurements and the mathematical
model. Typical questions that will be addressed are: ‘How to check the validity of the
mathematical model? How to search for certain mistakes or errors? How well can errors
be traced? And how do undetected errors affect the final results?’ The theory is worked
out in detail for the important case of linear(ized) models. Both the parametric form
(observation equations) and the implicit form (condition equations) of linear models are
treated. As an additional aid in understanding the basic principles involved, a geometric
interpretation is given throughout. Attention is also paid to the performance of the
testing procedures. The closely related concept of reliability is introduced and diagnostic
measures are given to determine the size of the minimal detectable biases. In this
introductory text the methodology of testing is emphasized, although various examples
are given to illustrate the theory. The methods discussed form the basis for geodetic
quality control and they provide the ingredients for the formulation of guidelines for the
reliable design of measurement set-ups.
P.J.G. Teunissen
Delft University of Technology
Faculty of Civil Engineering and Geosciences