The Probability Integral Transformation When Parameters Are Estimated From The Sample
Author(s): F. N. David and N. L. Johnson
Source: Biometrika, Vol. 35, No. 1/2 (May, 1948), pp. 182-190
Published by: Biometrika Trust
Stable URL: http://www.jstor.org/stable/2332638
∫_{-∞}^{∞} p(x) dx = 1.
Hence in the interval [0, 1] all values of y are equally likely, or in common parlance, y is rectangularly distributed in the interval [0, 1], no matter what the elementary probability law of x. If therefore we have n independent random variables x_j (j = 1, 2, ..., n) following a known continuous probability law which is completely specified by H_0, the hypothesis tested, then by means of the transformation

y_j = ∫_{-∞}^{x_j} p(x_j | H_0) dx_j,

the x's can be transformed into n independent random variables y_j which are rectangularly distributed.
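This uniformity is easy to exhibit numerically. The sketch below is a minimal illustration, assuming (purely for concreteness) an exponential law with known rate 2: the samples are pushed through their completely specified probability integral and the results binned over [0, 1].

```python
import math
import random

random.seed(1)

# Exponential law with known rate: CDF is F(x) = 1 - exp(-rate * x).
rate = 2.0
xs = [random.expovariate(rate) for _ in range(20000)]
ys = [1.0 - math.exp(-rate * x) for x in xs]  # probability integral transform

# Crude uniformity check: each decile of [0, 1] should hold about 10% of the y's.
counts = [0] * 10
for y in ys:
    counts[min(int(y * 10), 9)] += 1
proportions = [c / len(ys) for c in counts]
print(proportions)  # each entry should be close to 0.10
```

Any other completely specified continuous law would serve equally well, since the transformation uses only its own probability integral.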
2. The transformation which we have just summarized is useful statistically in that tests
based on a rectangular population can be made applicable to any variable of which the
elementary probability law is known. However, because the parameters of the elementary
probability law must be specified, it is clear that the range of application of any tests based
on this transformation will be very restricted, for cases are rare in statistical practice when
Ho is completely specified. It seemed interesting to us to investigate the effect on the trans-
formation of calculating estimates of the parameters from the data provided by the sample.
For example, if the mean of the probability law is estimated from a sample of n quantities X_1, X_2, ..., X_n, each of which is one observed value of the n random variables x_1, x_2, ..., x_n, the y's obtained by the probability integral transformation will no longer be independent, nor will they be rectangularly distributed. We are able to show that the generality of
the transformation in the case when the parameters are completely specified is lost as soon
as we begin replacing unknown parameters by the sample estimates, and, as is intuitively
obvious, the form of the probability law of y depends on the functional form of the common
probability law of the x's.
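The loss of rectangularity is easy to exhibit by simulation. The sketch below is illustrative only (the standard normal law and sample size n = 5 are arbitrary assumptions): each sample's mean is replaced by x̄ before transforming, and the variance of the resulting y's falls visibly below 1/12, the variance of a rectangular variable, because the y's are pulled towards 1/2.

```python
import math
import random

random.seed(2)

def Phi(z):
    """Standard normal probability integral."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n = 5
ys_est, ys_true = [], []
for _ in range(4000):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    ys_est.extend(Phi(x - xbar) for x in xs)   # mean estimated from the sample
    ys_true.extend(Phi(x) for x in xs)         # mean completely specified

def var(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / len(v)

# A rectangular variable on [0, 1] has variance 1/12 ~ 0.0833; the
# estimated-mean y's have smaller variance.
print(var(ys_true), var(ys_est))
```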
where, in the usual way, θ_1, θ_2, ..., θ_s are parameters descriptive of the population, all of which may or may not be specified. It may be assumed for generality that none are specified and that in place of the unknown parameters, θ, it is necessary to substitute functions of the sample values, say,

F_1(x_1, x_2, ..., x_n), F_2(x_1, x_2, ..., x_n), ..., F_s(x_1, x_2, ..., x_n),

where x_1, x_2, ..., x_n are the random variables of which the n observations which form the sample are the observed values. Thus we require to find the distribution of the variables
y_i = ∫_{-∞}^{x_i} f(t | F_1, F_2, ..., F_s) dt.

Writing a_ij = ∂y_i/∂x_j, we have

a_ij = Σ_{r=1}^{s} (∂F_r/∂x_j) ∫_{-∞}^{x_i} (∂f/∂F_r) dt   for i ≠ j,

and

a_ii = f(x_i | F_1, F_2, ..., F_s) + Σ_{r=1}^{s} (∂F_r/∂x_i) ∫_{-∞}^{x_i} (∂f/∂F_r) dt   for i = j.
In matrix notation we may write this

[∂(y_1, ..., y_n)/∂(x_1, ..., x_n)] = Λ + [∂F_j/∂x_k][∫_{-∞}^{x_j} (∂f/∂F_k) dt].

Here Λ is a diagonal matrix with diagonal elements f(x_i | F_1, ..., F_s); [∂F_j/∂x_k] is an n × s matrix, ∂F_j/∂x_k being the element in the kth row and the jth column; and [∫_{-∞}^{x_j} (∂f/∂F_k) dt] is an s × n matrix, ∫_{-∞}^{x_j} (∂f/∂F_k) dt being the element in the kth row and the jth column. The rank of the matrix [∂(y_1, ..., y_n)/∂(x_1, ..., x_n)] is not immediately obvious. In general it may be noted that it will be at least
n − s, and that it will be less than n. A study of particular cases leads us to believe that where the s sample estimates are algebraic functions each of the other, as for example the sample moment coefficients, then there will be s independent relationships between the y's, and the rank of the matrix will be n − s. Where the sample estimates are not functions of one another, as for example in the case of the median and the standard deviation, the matrix rank will be between n − s and n. We have not been able to prove this in general, but it should not be impossible to do so.
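The rank claim can be checked numerically for a simple case. The sketch below assumes a unit-scale normal law with its mean estimated by x̄ (so s = 1); the Jacobian of y_i = Φ(x_i − x̄) is formed by central differences and its rank computed by Gaussian elimination. The helper functions are illustrative, not from the paper.

```python
import math

def Phi(z):
    """Standard normal probability integral."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ys(xs):
    """y_i = Phi(x_i - xbar): unit-scale normal, mean estimated (s = 1)."""
    xbar = sum(xs) / len(xs)
    return [Phi(x - xbar) for x in xs]

def jacobian(xs, h=1e-6):
    """J[i][j] = dy_i/dx_j by central differences."""
    n = len(xs)
    cols = []
    for j in range(n):
        xp = list(xs); xp[j] += h
        xm = list(xs); xm[j] -= h
        yp, ym = ys(xp), ys(xm)
        cols.append([(yp[i] - ym[i]) / (2 * h) for i in range(n)])
    return [list(row) for row in zip(*cols)]  # transpose columns into rows

def rank(A, tol=1e-8):
    """Numerical rank by Gauss-Jordan elimination with partial pivoting."""
    A = [row[:] for row in A]
    m, n, r = len(A), len(A[0]), 0
    for col in range(n):
        pivot = max(range(r, m), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < tol:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(m):
            if i != r:
                f = A[i][col] / A[r][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == m:
            break
    return r

xs = [0.3, -1.2, 2.1, 0.7, -0.4]
J = jacobian(xs)
print(rank(J))  # n - s = 4 here, since one relationship binds the y's
```

Here ∂y_i/∂x_j = φ(z_i)(δ_ij − 1/n), so the matrix is a nonsingular diagonal matrix times (I − (1/n)11'), of rank n − 1, in agreement with the rank n − s conjectured above.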
We shall assume that there are s independent relationships between the variables y_1, y_2, ..., y_n. Under this last assumption we have

p(y_1, y_2, ..., y_{n−s}, F_1, ..., F_s) = Π_{i=1}^{n} f(x_i | θ_1, θ_2, ..., θ_s) / {[∂(F_1, ..., F_s)/∂(x_{n−s+1}, ..., x_n)] Π_{i=1}^{n−s} f(x_i | F_1, ..., F_s)}.
Alternative expressions for the joint probability law may be obtained by using the relationship

Π_{i=1}^{n} f(x_i | θ_1, ..., θ_s) = Δ p(x_1, ..., x_{n−s}, F_1, ..., F_s | θ_1, ..., θ_s),

where Δ = ∂(F_1, ..., F_s)/∂(x_{n−s+1}, ..., x_n), giving

p(y_1, ..., y_{n−s}, F_1, ..., F_s) = p(x_1, ..., x_{n−s}, F_1, ..., F_s | θ_1, ..., θ_s) / Π_{i=1}^{n−s} f(x_i | F_1, ..., F_s).
4. In the previous section formal solutions only of the problem have been set down. For any particular case the analysis becomes somewhat complicated. Accordingly, in order to obtain a clear idea of the kinds of distributions arising, we shall first confine ourselves to the discussion of (α) the special case where only location and scale parameters appear in the probability law of the x's, and (β) the distribution of a single y_i. Under (β) we may note that

y_i = ∫_{-∞}^{x_i} f(t | F_1, F_2, ..., F_s) dt = Z_i(x_i, F_1, ..., F_s), say;

then p(y_i) = p(z_i)/g(z_i), where g(z_i) = dy_i/dz_i.
5. It is not uncommon in statistical practice to find probability laws which are completely specified by a single parameter for location and a single parameter for scale. The normal curve is, of course, the classic example. Let θ be the parameter of location, σ the scale parameter, and write p(x) = f(x | θ, σ).
Either θ, or σ, or both may be estimated from the observed values of the random variables x.
We treat two distinct cases.
so that, writing z_i = (x_i − M(x))/σ, the joint probability law of y_1, y_2, ..., y_{n−1} is found by integrating out the estimate M(x):

p(y_1, y_2, ..., y_{n−1}) ∝ σ^{n−1} ∫_{-∞}^{∞} [Π_{i=1}^{n} f(x_i | θ, σ) / Π_{i=1}^{n−1} f(z_i)] dM(x).
The distribution of any individual y_i is simply obtained. For, since

y_i = ∫_{-∞}^{[x_i − M(x)]/D(x)} f(t) dt,

where the estimates M(x) and D(x) satisfy

(i) M(x_1 + a, x_2 + a, ..., x_n + a) = M(x_1, x_2, ..., x_n) + a,
    D(x_1 + a, x_2 + a, ..., x_n + a) = D(x_1, x_2, ..., x_n);

(ii) D(x, x, ..., x) = 0;

(iii) D(kx_1, kx_2, ..., kx_n) = |k| D(x_1, x_2, ..., x_n),

the distribution of y_i does not depend on the unknown θ and σ.
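Conditions (i)–(iii) can be verified directly for the usual estimates. A minimal sketch, taking M as the sample mean and D as the root-mean-square deviation (illustrative choices satisfying the conditions, not the only possible ones):

```python
import math

xs = [1.0, 2.5, 4.0, 0.5]
a, k = 3.7, -2.0  # arbitrary shift and scale

def M(v):
    """Sample mean: a location estimate satisfying (i)."""
    return sum(v) / len(v)

def D(v):
    """Root-mean-square deviation: a scale estimate satisfying (i)-(iii)."""
    m = M(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))

shifted = [x + a for x in xs]
scaled = [k * x for x in xs]

print(M(shifted) - (M(xs) + a))    # (i): M(x + a) = M(x) + a, so ~0
print(D(shifted) - D(xs))          # (i): D(x + a) = D(x), so ~0
print(D([2.0, 2.0, 2.0, 2.0]))     # (ii): D(x, x, ..., x) = 0
print(D(scaled) - abs(k) * D(xs))  # (iii): D(kx) = |k| D(x), so ~0
```

It is exactly these equivariance properties that make the standardized quantities [x_i − M(x)]/D(x), and hence the y_i, free of θ and σ.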
6. Example I. Let

p(x) = (1/(σ√(2π))) exp[−(x − θ)²/(2σ²)] = f(x | θ, σ),
and as in the previous section consider two cases.
For case (i) there are many good statistical reasons for choosing
M(x) = x̄
for the estimate of θ for this probability law. In the notation of § 4,
z_i = (x_i − x̄)/σ,

and it follows that

p(y_i) = {√(n/(2π(n − 1))) exp[−n z_i²/(2(n − 1))]} / {(1/√(2π)) exp[−½z_i²]} = √(n/(n − 1)) exp[−z_i²/(2(n − 1))],      (3)

where z_i = φ(y_i) is defined by

y_i = (1/√(2π)) ∫_{-∞}^{z_i} e^{−½t²} dt.
Clearly p(y_i) has a maximum value √(n/(n − 1)) at z_i = 0, i.e. when y_i = ½, and the probability law is symmetrical about this point. p(y_i) is zero at the points y_i = 0 (z_i = −∞) and y_i = 1 (z_i = +∞). A graph of the function for three different values of n is given in Fig. 1.
In order to compare p(yi) with the rectangular distribution we may find the points at which
the curve crosses it. This will be when p(y_i) = 1, or when

z_i²/(2(n − 1)) = −½ log(1 − 1/n).

Expanding the logarithm as a series we have

z_i² = 1 − 1/(2n) − 1/(6n²) − ...,

or, for n moderately large, z_i is nearly equal to ±1. It follows that p(y_i) = 1 when y_i ≈ 0.159 or 0.841.
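Equation (3) and the crossing points can be checked numerically. The sketch below takes n = 10 (an arbitrary illustrative choice) and uses the standard library's NormalDist for the normal probability integral and its inverse; it verifies that (3) integrates to 1 over (0, 1) and equals 1 exactly where z_i² = −(n − 1) log(1 − 1/n).

```python
import math
from statistics import NormalDist

N = NormalDist()  # standard normal: cdf and inv_cdf

def p_y(y, n):
    """Density (3) of a single y_i when the normal mean is estimated by xbar."""
    z = N.inv_cdf(y)
    return math.sqrt(n / (n - 1)) * math.exp(-z * z / (2 * (n - 1)))

n = 10

# (a) the density integrates to 1 over (0, 1) (midpoint rule)
m = 20000
total = sum(p_y((i + 0.5) / m, n) for i in range(m)) / m
print(total)

# (b) it crosses the rectangular density p(y) = 1 where
#     z^2 = -(n - 1) log(1 - 1/n), i.e. near y = 0.16 for n = 10
z_star = math.sqrt(-(n - 1) * math.log(1 - 1 / n))
print(N.cdf(-z_star), N.cdf(z_star), p_y(N.cdf(z_star), n))
```

For larger n the crossing points move towards the limiting values 0.159 and 0.841 quoted above.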
In case (ii), for the same p(x), assume

M(x) = x̄,   D(x) = s = [(1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)²]^{1/2}.

It follows that

p(y_i) = [√(2πn) / ((n − 1) B(½, ½(n − 2)))] (1 − n z_i²/(n − 1)²)^{(n−4)/2} e^{½z_i²},      (4)

where now z_i = (x_i − x̄)/s.
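As a check on (4), the sketch below (again with an arbitrary n = 10; the beta function B(½, ½(n − 2)) is computed from gamma functions) integrates the density numerically over (0, 1). Note that the density vanishes outside the attainable range z_i² ≤ (n − 1)²/n.

```python
import math
from statistics import NormalDist

N = NormalDist()  # standard normal: cdf and inv_cdf

def p_y(y, n):
    """Density (4): normal law with mean and scale both estimated; z = inv_cdf(y)."""
    z = N.inv_cdf(y)
    if z * z > (n - 1) ** 2 / n:  # outside the attainable range of z_i
        return 0.0
    B = math.gamma(0.5) * math.gamma((n - 2) / 2) / math.gamma((n - 1) / 2)
    c = math.sqrt(2 * math.pi * n) / ((n - 1) * B)
    return c * (1 - n * z * z / (n - 1) ** 2) ** ((n - 4) / 2) * math.exp(z * z / 2)

n = 10
m = 40000
total = sum(p_y((i + 0.5) / m, n) for i in range(m)) / m  # midpoint rule
print(total)  # should be close to 1
```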
Fig. 1. The probability integral transformation applied to the normal curve with estimated mean.

Fig. 2. The probability integral transformation applied to the normal curve with estimated mean and standard deviation.
A graph of this function, for the same sample sizes considered in case (i), is given in Fig. 2.
7. Example II. Let x be distributed as χ² with two degrees of freedom, i.e. let

p(x) = ½ e^{−x/2}   for x > 0,

so that

y_i = ∫_0^{x_i} ½ e^{−t/2} dt = 1 − e^{−x_i/2}.
and

Π_{i=1}^{n−1} f(x_i | x̄, σ) = (1/(σ√(2π)))^{n−1} exp[−(1/(2σ²)) Σ_{i=1}^{n−1} (x_i − x̄)²].

The joint probability law of y_1, y_2, ..., y_{n−1} and x̄ will be

p(y_1, ..., y_{n−1}, x̄) = (n/(σ√(2π))) exp[−n(x̄ − θ)²/(2σ²)] exp[−½(z_1 + z_2 + ... + z_{n−1})²].

It will be noted that this factorizes into the probability law of x̄ and the joint probability law of the y's, the latter being independent of both θ and σ.
9. The exponential law discussed in § 7 differs from the examples in which the normal law was used in that in this particular example a measure of location is used to estimate a scale parameter. We have

p(x) = (1/θ) e^{−x/θ},

with x̄ for the estimate of θ, so that y_i = 1 − e^{−x_i/x̄}; whence, since x_n = n x̄ − Σ_{i=1}^{n−1} x_i,
it follows that

p(x_1, ..., x_{n−1}, x̄) = (n/θ^n) exp(−n x̄/θ),

and

Π_{i=1}^{n−1} f(x_i | x̄) = (1/x̄^{n−1}) exp[−(1/x̄) Σ_{i=1}^{n−1} x_i].
The joint probability law of y_1, ..., y_{n−1} and x̄ follows in a straightforward way, namely

p(y_1, ..., y_{n−1}, x̄) = (n x̄^{n−1}/θ^n) exp(−n x̄/θ) Π_{i=1}^{n−1} (1 − y_i)^{−1},

where Σ_{i=1}^{n−1} [−log(1 − y_i)] < n, or Π_{i=1}^{n−1} (1 − y_i) > e^{−n}.
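The restriction on the y's can be confirmed by simulation. In the sketch below (θ = 1 and n = 6 are arbitrary illustrative choices) the scale is estimated by x̄; the transformed values then satisfy Σ_{i=1}^{n} [−log(1 − y_i)] = n exactly, so any n − 1 of them obey Π(1 − y_i) > e^{−n}.

```python
import math
import random

random.seed(7)

n = 6
violations = 0
for _ in range(2000):
    xs = [random.expovariate(1.0) for _ in range(n)]  # theta = 1, say
    xbar = sum(xs) / n
    ys = [1.0 - math.exp(-x / xbar) for x in xs]      # y_i = 1 - exp(-x_i/xbar)
    # -log(1 - y_i) = x_i/xbar, and these sum to exactly n
    total = sum(-math.log(1.0 - y) for y in ys)
    prod = math.prod(1.0 - y for y in ys[:n - 1])     # any n - 1 of the y's
    if not (abs(total - n) < 1e-9 and prod > math.exp(-n)):
        violations += 1
print(violations)  # expect 0
```

The exact linear relationship among the n transformed values is the single (s = 1) dependence introduced by estimating the one parameter.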
REFERENCES
FISHER, R. A. (1932). Statistical Methods for Research Workers, § 21.1.
NEYMAN, J. (1937). Skand. AktuarTidskr. 20, 149.
PEARSON, E. S. (1938). Biometrika, 30, 134.
PEARSON, K. (1933). Biometrika, 25, 379.