
Biometrika Trust

The Probability Integral Transformation When Parameters are Estimated from the Sample
Author(s): F. N. David and N. L. Johnson
Source: Biometrika, Vol. 35, No. 1/2 (May, 1948), pp. 182-190
Published by: Biometrika Trust
Stable URL: http://www.jstor.org/stable/2332638
Accessed: 17/06/2014 19:28



THE PROBABILITY INTEGRAL TRANSFORMATION WHEN PARAMETERS ARE ESTIMATED FROM THE SAMPLE
BY F. N. DAVID AND N. L. JOHNSON
1. The probability integral transformation for testing goodness of fit and combining tests of significance was introduced by R. A. Fisher in 1932. Fisher's object was the combination of independent tests of significance, but his method also proved applicable to a certain limited range of tests for goodness of fit, as can be seen in K. Pearson (1933), J. Neyman (1937) and E. S. Pearson (1938). The transformation may be summarized briefly in the following way. Assume that there is a continuous random variable $x$ whose elementary probability law is $p(x)$, whence obviously
$$\int_{-\infty}^{\infty} p(x)\,dx = 1.$$
Consider a new random variable, $y$, connected with $x$ by the relation
$$y = \int_{-\infty}^{x} p(t)\,dt.$$
$y$ is a monotonic non-decreasing function of $x$ and $0 \le y \le 1$. Further, if $p(y)$ is the elementary probability law of $y$, then
$$p(y) = p(x)\,\frac{dx}{dy} = 1.$$
Hence in the interval $[0,1]$ all values of $y$ are equally likely, or in common parlance, $y$ is rectangularly distributed in the interval $[0,1]$, no matter what the elementary probability law of $x$. If therefore we have $n$ independent random variables $x_j$ $(j = 1, 2, \ldots, n)$ following a known continuous probability law which is completely specified by $H_0$, the hypothesis tested, then by means of the transformation
$$y_j = \int_{-\infty}^{x_j} p(t \mid H_0)\,dt,$$
the $x$'s can be transformed into $n$ independent random variables $y$ which are rectangularly distributed.
2. The transformation which we have just summarized is useful statistically in that tests based on a rectangular population can be made applicable to any variable of which the elementary probability law is known. However, because the parameters of the elementary probability law must be specified, it is clear that the range of application of any tests based on this transformation will be very restricted, for cases are rare in statistical practice when $H_0$ is completely specified. It seemed interesting to us to investigate the effect on the transformation of calculating estimates of the parameters from the data provided by the sample. For example, if the mean of the probability law is estimated from a sample of $n$ quantities $X_1, X_2, \ldots, X_n$, each of which is one observed value of the $n$ random variables $x_1, x_2, \ldots, x_n$, the $y$'s obtained by the probability integral transformation will no longer be independent, nor will they be rectangularly distributed. We are able to show that the generality of the transformation in the case when the parameters are completely specified is lost as soon as we begin replacing unknown parameters by the sample estimates, and, as is intuitively obvious, the form of the probability law of $y$ depends on the functional form of the common probability law of the $x$'s.
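The loss of independence and rectangularity is easy to see numerically. The sketch below is our own illustration, not from the paper; the sample size, parameter values and seed are arbitrary choices. It draws normal samples, substitutes $\bar{x}$ for the mean with $\sigma$ known, and applies the transformation:

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)
n, reps = 5, 20_000
sigma = 2.0

# reps independent samples of size n from N(3, sigma^2)
x = rng.normal(loc=3.0, scale=sigma, size=(reps, n))
xbar = x.mean(axis=1, keepdims=True)

# Probability integral transformation with the mean replaced by xbar
Phi = np.vectorize(lambda z: 0.5 * (1 + erf(z / np.sqrt(2))))
y = Phi((x - xbar) / sigma)

# If the y's were rectangular, var(y_1) would be 1/12 = 0.0833...;
# estimating the mean shrinks it, and distinct y's become negatively correlated.
print(y[:, 0].var())
print(np.corrcoef(y[:, 0], y[:, 1])[0, 1])
```

With the mean known the printed variance would hover around $1/12$ and the correlation around zero; with $\bar{x}$ substituted the variance drops noticeably below $1/12$ and the correlation is clearly negative.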

3. We begin by stating the problem in a formal mathematical way and indicate the method whereby a general solution is reached. Assume that $p(x)$ is a single-valued continuous function of the form
$$p(x) = f(x \mid \theta_1, \theta_2, \ldots, \theta_s),$$
where, in the usual way, $\theta_1, \theta_2, \ldots, \theta_s$ are parameters descriptive of the population, all of which may or may not be specified. It may be assumed for generality that none are specified and that in place of the unknown parameters, $\theta$, it is necessary to substitute functions of the sample values, say,
$$F_1(x_1, x_2, \ldots, x_n),\quad F_2(x_1, x_2, \ldots, x_n),\quad \ldots,\quad F_s(x_1, x_2, \ldots, x_n),$$
where $x_1, x_2, \ldots, x_n$ are the random variables of which the $n$ observations which form the sample are the observed values. Thus we require to find the distribution of the variables
$$y_i = \int_{-\infty}^{x_i} f(t \mid F_1, F_2, \ldots, F_s)\,dt \quad \text{for } i = 1, 2, \ldots, n.$$

We have immediately that
$$\frac{\partial y_i}{\partial x_j} = \sum_{r=1}^{s} \frac{\partial F_r}{\partial x_j} \int_{-\infty}^{x_i} \frac{\partial f}{\partial F_r}\,dt \quad \text{for } i \neq j,$$
and
$$\frac{\partial y_i}{\partial x_i} = f(x_i \mid F_1, F_2, \ldots, F_s) + \sum_{r=1}^{s} \frac{\partial F_r}{\partial x_i} \int_{-\infty}^{x_i} \frac{\partial f}{\partial F_r}\,dt \quad \text{for } i = j.$$
In matrix notation we may write this
$$\left[\frac{\partial(y_1, \ldots, y_n)}{\partial(x_1, \ldots, x_n)}\right] = \Delta + \left[\frac{\partial F_j}{\partial x_k}\right]\left[\int_{-\infty}^{x_k} \frac{\partial f}{\partial F_j}\,dt\right].$$
$\Delta$ is a diagonal matrix with diagonal elements $f(x_i \mid F_1, \ldots, F_s)$; $[\partial F_j/\partial x_k]$ is an $n \times s$ matrix, $\partial F_j/\partial x_k$ being the element in the $k$th row and the $j$th column; $\left[\int_{-\infty}^{x_k} \partial f/\partial F_j\,dt\right]$ is an $s \times n$ matrix, $\int_{-\infty}^{x_k} \partial f/\partial F_j\,dt$ being the element in the $j$th row and the $k$th column. The rank of the matrix $\left[\partial(y_1, \ldots, y_n)/\partial(x_1, \ldots, x_n)\right]$ is not immediately obvious. In general it may be noted that it will be at least $n - s$, and that it will be less than $n$. A study of particular cases leads us to believe that where the $s$ sample estimates are algebraic functions each of the other, as for example the sample moment coefficients, then there will be $s$ independent relationships between the $y$'s, and the rank of the matrix will be $n - s$. Where the sample estimates are not functions of one another, as for example in the case of the median and the standard deviation, the matrix rank will be between $n - s$ and $n$. We have not been able to prove this in general, but it should not be impossible to do so.
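The rank statement can be checked numerically in a particular case. The sketch below is our own illustration (sample values, seed and tolerance are arbitrary): it builds the matrix $[\partial y_i/\partial x_j]$ by finite differences for a normal law with location and scale both estimated by sample moments ($s = 2$), where the conjecture predicts rank $n - 2$.

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def y_vec(x):
    # y_i with both parameters replaced by sample estimates:
    # location by the sample mean, scale by the sample s.d. (divisor n)
    xbar, s = x.mean(), x.std()
    return np.array([Phi((xi - xbar) / s) for xi in x])

rng = np.random.default_rng(1)
n = 7
x = rng.normal(size=n)

# Central-difference Jacobian d y_i / d x_j
eps = 1e-6
J = np.empty((n, n))
for j in range(n):
    xp, xm = x.copy(), x.copy()
    xp[j] += eps
    xm[j] -= eps
    J[:, j] = (y_vec(xp) - y_vec(xm)) / (2 * eps)

rank = np.linalg.matrix_rank(J, tol=1e-6)
print(rank)  # two singular values vanish, leaving rank n - 2
```

The two vanishing singular values correspond to the two identities $\sum_i (x_i - \bar{x})/s = 0$ and $\sum_i [(x_i - \bar{x})/s]^2 = n$, which hold for every sample.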
We shall assume that there are $s$ independent relationships between the variables $y_1, y_2, \ldots, y_n$. Under this last assumption we have
$$\frac{\partial(y_1, \ldots, y_{n-s}, F_1, \ldots, F_s)}{\partial(x_1, \ldots, x_n)} = \frac{\partial(F_1, \ldots, F_s)}{\partial(x_{n-s+1}, \ldots, x_n)} \prod_{i=1}^{n-s} f(x_i \mid F_1, \ldots, F_s)$$
(provided partial differentiation is, in fact, possible), whence, since
$$p(x_1, x_2, \ldots, x_n \mid \theta_1, \theta_2, \ldots, \theta_s) = \prod_{i=1}^{n} f(x_i \mid \theta_1, \theta_2, \ldots, \theta_s),$$
i=1

the joint probability law of $y_1, y_2, \ldots, y_{n-s}, F_1, \ldots, F_s$ may be written
$$p(y_1, y_2, \ldots, y_{n-s}, F_1, \ldots, F_s \mid \theta_1, \theta_2, \ldots, \theta_s) = \frac{\prod_{i=1}^{n} f(x_i \mid \theta_1, \ldots, \theta_s)}{\displaystyle\prod_{i=1}^{n-s} f(x_i \mid F_1, \ldots, F_s)\,\frac{\partial(F_1, \ldots, F_s)}{\partial(x_{n-s+1}, \ldots, x_n)}}.$$
Alternative expressions for the joint probability law may be obtained by using the relationship
$$\prod_{i=1}^{n-s} f(x_i \mid \theta_1, \ldots, \theta_s) = p(x_1, \ldots, x_{n-s} \mid \theta_1, \ldots, \theta_s).$$
Substituting in the right-hand side of the joint law we have
$$p(y_1, y_2, \ldots, y_{n-s}, F_1, F_2, \ldots, F_s \mid \theta_1, \theta_2, \ldots, \theta_s) = \frac{p(x_1, \ldots, x_{n-s} \mid \theta_1, \ldots, \theta_s)}{\displaystyle\prod_{i=1}^{n-s} f(x_i \mid F_1, F_2, \ldots, F_s)} \cdot \frac{\prod_{i=n-s+1}^{n} f(x_i \mid \theta_1, \ldots, \theta_s)}{\dfrac{\partial(F_1, \ldots, F_s)}{\partial(x_{n-s+1}, \ldots, x_n)}}.$$

4. In the previous section formal solutions only of the problem have been set down. For any particular case the analysis becomes somewhat complicated. Accordingly, in order to obtain a clear idea of the kinds of distributions arising, we shall first confine ourselves to the discussion of ($\alpha$) the special case where only location and scale parameters appear in the probability law of the $x$'s, and ($\beta$) the distribution of a single $y_i$. Under ($\beta$) we may note that
$$y_i = \int_{-\infty}^{x_i} f(t \mid F_1, F_2, \ldots, F_s)\,dt = Z_i(x_i, F_1, \ldots, F_s), \text{ say}.$$
If the distribution of $Z_i$ is known, the distribution of $y_i$ may be found immediately, and in particular if we can write
$$y_i = Z_i(x_i, F_1, \ldots, F_s) = \int_{-\infty}^{z_i} g(t)\,dt,$$
then
$$p(y_i) = p(z_i)/g(z_i).$$
5. It is not uncommon in statistical practice to find probability laws which are completely specified by a single parameter for location and a single parameter for scale. The normal curve is, of course, the classic example. Let $\theta$ be the parameter of location, $\sigma$ the scale parameter, and write $p(x) = f(x \mid \theta, \sigma)$. If
$$y = \int_{-\infty}^{x} f(t \mid \theta, \sigma)\,dt = \int_{-\infty}^{(x-\theta)/\sigma} f(t \mid 0, 1)\,dt = \int_{-\infty}^{(x-\theta)/\sigma} f(t)\,dt, \text{ say},$$
then we may write
$$\frac{x - \theta}{\sigma} = \phi(y).$$
Either $\theta$, or $\sigma$, or both may be estimated from the observed values of the random variables $x$. We treat two distinct cases.
Case (i): $\sigma$ known and $\theta$ estimated

We first suppose that the scaling parameter is known but that it is necessary to estimate a central measure of location. Suppose this to be a function $M(x_1, x_2, \ldots, x_n)$, which may be written for brevity $M(x)$. We have
$$y_i = \int_{-\infty}^{x_i} f(t \mid M(x), \sigma)\,dt = \int_{-\infty}^{[x_i - M(x)]/\sigma} f(t)\,dt, \tag{1}$$
so that
$$\frac{x_i - M(x)}{\sigma} = \phi(y_i).$$
It follows that
$$M[\phi(y)] = 0,$$
provided $M$ satisfies the usual conditions for a measure of location.* In this case therefore there is one relation between the $y_i$'s and $\left[\partial(y_1, \ldots, y_n)/\partial(x_1, \ldots, x_n)\right]$ is of rank $n-1$. Using the general formula obtained in §3, the joint probability law of $y_1, y_2, \ldots, y_{n-1}$ and $M$ is
$$p(y_1, y_2, \ldots, y_{n-1}, M \mid \theta, \sigma) = \frac{\prod_{i=1}^{n} f(x_i \mid \theta, \sigma)}{\displaystyle\prod_{i=1}^{n-1} f(x_i \mid M, \sigma)\,\frac{\partial M}{\partial x_n}}.$$
The distribution of any individual $y_i$ is simply obtained. For, since
$$y_i = \int_{-\infty}^{[x_i - M(x)]/\sigma} f(t)\,dt,$$
we have
$$p(y_i) = \left[f\!\left(\frac{x_i - M(x)}{\sigma}\right)\right]^{-1} p\!\left(\frac{x_i - M(x)}{\sigma}\right),$$
and if the distribution of $[x_i - M(x)]/\sigma$ is known, the distribution of $y_i$ follows immediately.
Case (ii): both $\theta$ and $\sigma$ estimated

Assume that $\theta$ is estimated as before by $M(x_1, x_2, \ldots, x_n) = M(x)$. Since now $\sigma$ is also unknown, suppose that it is estimated from the sample values by a measure of dispersion, say $D(x_1, x_2, \ldots, x_n) = D(x)$. We have
$$y_i = \int_{-\infty}^{x_i} f(t \mid M(x), D(x))\,dt,$$
and
$$\phi(y_i) = \frac{x_i - M(x)}{D(x)}.$$
Provided $D(x)$ is a function of the quantities $x_i - M(x)$ and satisfies the usual conditions for a measure of dispersion, the $y_i$'s must now satisfy the two conditions
$$M[\phi(y)] = 0, \qquad D[\phi(y)] = 1.$$
The matrix $\left[\partial(y_1, \ldots, y_n)/\partial(x_1, \ldots, x_n)\right]$ is then of rank $n-2$. The conditions are satisfied, for example, if
$$M(x) = \bar{x} \quad \text{and} \quad D(x) = \left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{\frac{1}{2}}.$$
By an argument precisely similar to that of case (i) we have that
$$\phi(y_i) = \frac{x_i - M(x)}{D(x)},$$
* The conditions are: $M(x_1 + a, x_2 + a, \ldots, x_n + a) = M(x_1, x_2, \ldots, x_n) + a$; $M(kx_1, kx_2, \ldots, kx_n) = kM(x_1, x_2, \ldots, x_n)$; and for $D$: (i) $D(x_1 + a, x_2 + a, \ldots, x_n + a) = D(x_1, x_2, \ldots, x_n)$; (ii) $D(x, x, \ldots, x) = 0$; (iii) $D(kx_1, kx_2, \ldots, kx_n) = |k|\,D(x_1, x_2, \ldots, x_n)$.

where
$$y_i = \int_{-\infty}^{[x_i - M(x)]/D(x)} f(t)\,dt. \tag{2}$$
It is seen that $y_i$ is rectangularly distributed, i.e. $p(y_i) = 1$, if and only if
$$f\!\left(\frac{x_i - M(x)}{D(x)}\right) = p\!\left(\frac{x_i - M(x)}{D(x)}\right).$$
This condition is not likely to be satisfied.


For both cases (i) and (ii) it may be noted that $p(y_i)$ is the ratio of two probability laws with a transformation of variables given by (1) and (2) respectively, and that neither of these two probability laws depends on $\theta$ or on $\sigma$.
6. Example I. Let
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\theta)^2}{2\sigma^2}\right] = f(x \mid \theta, \sigma),$$
and as in the previous section consider two cases. For case (i) there are many good statistical reasons for choosing
$$M(x) = \bar{x}$$
for the estimate of $\theta$ for this probability law. In the notation of §4,
$$z_i = \frac{x_i - \bar{x}}{\sigma},$$
and
$$p\!\left(\frac{x_i - \bar{x}}{\sigma}\right) = p(z_i) = \frac{1}{\sqrt{2\pi}}\sqrt{\frac{n}{n-1}} \exp\left[-\frac{n z_i^2}{2(n-1)}\right].$$
Applying the results of the preceding section we shall have
$$p(y_i) = \left(\frac{1}{\sqrt{2\pi}}\sqrt{\frac{n}{n-1}} \exp\left[-\frac{n z_i^2}{2(n-1)}\right]\right)\Bigg/\left(\frac{1}{\sqrt{2\pi}} \exp\left[-\frac{z_i^2}{2}\right]\right) = \sqrt{\frac{n}{n-1}} \exp\left[-\frac{z_i^2}{2(n-1)}\right], \tag{3}$$
where $z_i = \phi(y_i)$ is defined by
$$y_i = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z_i} e^{-\frac{1}{2}t^2}\,dt.$$
Clearly $p(y_i)$ has a maximum value $\sqrt{n/(n-1)}$ at $z_i = 0$, i.e. when $y_i = \tfrac{1}{2}$, and the probability law is symmetrical about this point. $p(y_i)$ is zero at the points $y_i = 0$ $(z_i = -\infty)$ and $y_i = 1$ $(z_i = +\infty)$. A graph of the function for three different values of $n$ is given in Fig. 1. In order to compare $p(y_i)$ with the rectangular distribution we may find the points at which the curve crosses it. This will be when $p(y_i) = 1$, or when
$$\frac{z_i^2}{2(n-1)} = -\tfrac{1}{2}\log\left(1 - \frac{1}{n}\right).$$
Expanding the logarithm as a series we have that
$$z_i^2 \doteq 1 - \frac{1}{2n} + \ldots,$$
or, for $n$ moderately large, $z_i$ is nearly equal to $\pm 1$. It follows that $p(y_i) = 1$ when $y_i \doteq 0.159$ or $0.841$.
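The quantities just derived can be confirmed numerically; the sketch below is our own check, with $n = 11$ an arbitrary choice. Parametrizing by $z = \phi(y)$ and weighting by the standard normal density turns integrals over $y$ into integrals over $z$.

```python
import numpy as np
from math import erf, sqrt, log

n = 11

# Density (3) as a function of z = phi(y_i); substituting y = Phi(z),
# dy = f(z) dz, converts the mass integral over y into one over z.
z = np.linspace(-8.0, 8.0, 200_001)
f = np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)       # standard normal density
p_y = np.sqrt(n / (n - 1)) * np.exp(-z**2 / (2 * (n - 1)))

total = np.sum(p_y * f) * (z[1] - z[0])          # total mass of (3): should be 1
peak = p_y.max()                                 # sqrt(n/(n-1)), attained at z = 0

# Crossing of the rectangular density p(y_i) = 1:
zc = sqrt((n - 1) * log(n / (n - 1)))            # close to 1 for moderate n
yc = 0.5 * (1 + erf(zc / sqrt(2)))               # close to 0.841; 1 - yc near 0.159
print(total, peak, zc, yc)
```

For $n = 11$ the crossing sits at $z_c \doteq 0.976$, $y_c \doteq 0.835$, already close to the large-$n$ values $\pm 1$ and $0.841$.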
In case (ii), for the same $p(x)$, assume
$$M(x) = \bar{x}, \qquad D(x) = s = \left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{\frac{1}{2}}.$$
As before, write
$$z_i = \frac{x_i - \bar{x}}{s};$$
then
$$p(z_i) = \frac{\sqrt{n}}{(n-1)\,B\!\left(\tfrac{1}{2}, \tfrac{1}{2}(n-2)\right)} \left(1 - \frac{n z_i^2}{(n-1)^2}\right)^{\frac{1}{2}(n-4)} \quad \text{for } -\frac{n-1}{\sqrt{n}} < z_i < +\frac{n-1}{\sqrt{n}}.$$
It follows that
$$p(y_i) = \frac{\sqrt{2\pi n}}{(n-1)\,B\!\left(\tfrac{1}{2}, \tfrac{1}{2}(n-2)\right)} \left(1 - \frac{n z_i^2}{(n-1)^2}\right)^{\frac{1}{2}(n-4)} e^{\frac{1}{2}z_i^2}, \tag{4}$$
where, as before, $z_i = \phi(y_i)$ is defined by $y_i = \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-\infty}^{z_i} e^{-\frac{1}{2}t^2}\,dt$.

A graph of this function, for the same sample sizes considered in case (i), is given in Fig. 2.

[Fig. 1. The probability integral transformation applied to the normal curve with estimated mean; $n = 6$, $11$, $21$.]

[Fig. 2. The probability integral transformation applied to the normal curve with estimated mean and standard deviation; $n = 6$, $11$, $21$.]

7. Example II. Let $x$ be distributed as $\chi^2$ with two degrees of freedom, i.e. let
$$p(x) = \frac{1}{\theta} e^{-x/\theta} \quad \text{for } x > 0.$$
If we estimate $\theta$ by $M(x) = \bar{x}$, then
$$y_i = \int_{0}^{x_i} \frac{1}{\bar{x}} e^{-t/\bar{x}}\,dt = 1 - e^{-x_i/\bar{x}}.$$
In this case we know, writing $u_i = x_i/\bar{x}$, that
$$p(u_i) = \frac{n-1}{n}\left(1 - \frac{u_i}{n}\right)^{n-2} \quad (0 < u_i < n).$$

Following the procedure of the previous sections, we have that
$$p(y_i) = \frac{n-1}{n} \cdot \frac{1}{1-y_i}\left[1 + \frac{1}{n}\log(1-y_i)\right]^{n-2} \quad \text{for } 0 < y_i < 1 - e^{-n}. \tag{5}$$
A graph of this function is given in Fig. 3.
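Density (5) can be checked directly; the sketch below is our own illustration ($n$, the scale $3.0$ and the seed are arbitrary). Its total mass should be 1, its maximum should sit near $1 - e^{-2}$, and simulated values $y_i = 1 - e^{-x_i/\bar{x}}$ should match it.

```python
import numpy as np

n = 11
rng = np.random.default_rng(2)

# Monte Carlo draws of y_i = 1 - exp(-x_i/xbar), theta estimated by xbar
x = rng.exponential(scale=3.0, size=(200_000, n))
y = 1 - np.exp(-x / x.mean(axis=1, keepdims=True))

# Density (5) on its support 0 < y < 1 - e^{-n}
grid = np.linspace(1e-6, 1 - np.exp(-n) - 1e-6, 400_001)
dy = grid[1] - grid[0]
p5 = (n - 1) / n * (1 + np.log(1 - grid) / n) ** (n - 2) / (1 - grid)

mass = np.sum(p5) * dy                    # should be close to 1
mode = grid[np.argmax(p5)]                # should be near 1 - e^{-2}

# Compare a Monte Carlo cell probability with the same integral of (5)
cell = (grid > 0.5) & (grid < 0.9)
frac_th = np.sum(p5[cell]) * dy
frac_mc = ((y[:, 0] > 0.5) & (y[:, 0] < 0.9)).mean()
print(mass, mode, frac_th, frac_mc)
```

Note that every simulated $y_i$ falls below $1 - e^{-n}$ automatically, since $x_i/\bar{x} < n$ whenever all the observations are positive.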
[Fig. 3. The probability integral transformation applied to the law $p(x) = \theta^{-1} e^{-x/\theta}$, $\theta$ being estimated from the data; $n = 6$, $11$, $21$. The maximum is at $1 - e^{-2} = 0.865$ whatever be $n$.]

8. The joint probability law of the $y_i$'s for Example I of §6 follows from an application of §3. If we are considering a normal distribution and if $M(x) = \bar{x}$, then
$$\phi(y_i) = \frac{x_i - \bar{x}}{\sigma}.$$
Since the quantities $x_i - \bar{x}$ are independent of $\bar{x}$, it follows that the $y$'s are also independent of $\bar{x}$. The most convenient formula to use would seem to be
$$p(y_1, \ldots, y_{n-1}, \bar{x} \mid \theta, \sigma) = \frac{p(x_1, \ldots, x_{n-1}, \bar{x} \mid \theta, \sigma)}{\prod_{i=1}^{n-1} f(x_i \mid \bar{x}, \sigma)}.$$
Since
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{and} \quad \sum_{i=1}^{n}(x_i - \theta)^2 = n(\bar{x} - \theta)^2 + \sum_{i=1}^{n}(x_i - \bar{x})^2,$$
it is clear that
$$p(x_1, \ldots, x_{n-1}, \bar{x} \mid \theta, \sigma) = \frac{n}{(\sigma\sqrt{2\pi})^n} \exp\left\{-\frac{1}{2\sigma^2}\left[n(\bar{x} - \theta)^2 + \sum_{i=1}^{n}(x_i - \bar{x})^2\right]\right\},$$
and
$$\prod_{i=1}^{n-1} f(x_i \mid \bar{x}, \sigma) = \frac{1}{(\sigma\sqrt{2\pi})^{n-1}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n-1}(x_i - \bar{x})^2\right].$$
The joint probability law of $y_1, y_2, \ldots, y_{n-1}$ and $\bar{x}$ will be
$$p(y_1, \ldots, y_{n-1}, \bar{x} \mid \theta, \sigma) = \frac{n}{\sigma\sqrt{2\pi}} \exp\left[-\frac{n(\bar{x} - \theta)^2}{2\sigma^2}\right] \exp\left[-\tfrac{1}{2}\left\{\sum_{i=1}^{n-1}\phi(y_i)\right\}^2\right],$$

$\phi(y_i)$ being defined as in (1). Integrating out for $\bar{x}$, we have
$$p(y_1, \ldots, y_{n-1} \mid \theta, \sigma) = \sqrt{n} \exp\left[-\tfrac{1}{2}\left\{\sum_{i=1}^{n-1}\phi(y_i)\right\}^2\right]. \tag{6}$$
It will be noted that this joint probability law is independent of both $\theta$ and $\sigma$.
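Result (6) can be checked for normalization; the sketch below is our own, with $n$ and the seed arbitrary. Substituting $y_i = \Phi(z_i)$ turns the integral of (6) over the unit cube into the expectation of $\sqrt{n}\,e^{-\frac{1}{2}S^2}$ with $S$ a sum of $n-1$ independent standard normal variables, which equals $\sqrt{n}/\sqrt{1 + (n-1)} = 1$.

```python
import numpy as np

n = 11
rng = np.random.default_rng(3)

# With y_i = Phi(z_i), dy_i = f(z_i) dz_i, the integral of (6) over the
# (n-1)-cube becomes E[ sqrt(n) * exp(-S^2/2) ], S = z_1 + ... + z_{n-1}.
z = rng.normal(size=(400_000, n - 1))
S = z.sum(axis=1)                       # S ~ N(0, n - 1)
vals = np.sqrt(n) * np.exp(-S**2 / 2)

print(vals.mean())   # should hover around 1
```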
9. The exponential law discussed in §7 differs from the examples in which the normal law was used in that in this particular example a measure of location is used to estimate a scale parameter. We have
$$p(x) = \frac{1}{\theta} e^{-x/\theta},$$
whence, since $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$,
$$p(x_1, x_2, \ldots, x_n \mid \theta) = \theta^{-n} \exp\left(-\frac{1}{\theta}\sum_{i=1}^{n} x_i\right) = \theta^{-n} \exp\left(-\frac{n\bar{x}}{\theta}\right).$$
It follows that
$$p(x_1, \ldots, x_{n-1}, \bar{x} \mid \theta) = n\,\theta^{-n} \exp\left(-\frac{n\bar{x}}{\theta}\right),$$
and
$$\prod_{i=1}^{n-1} f(x_i \mid \bar{x}) = \bar{x}^{-(n-1)} \exp\left(-\frac{1}{\bar{x}}\sum_{i=1}^{n-1} x_i\right).$$
The joint probability law of $y_1, \ldots, y_{n-1}$ and $\bar{x}$ follows in a straightforward way, namely
$$p(y_1, \ldots, y_{n-1}, \bar{x} \mid \theta) = n\,\theta^{-n}\,\bar{x}^{\,n-1} \exp\left(-\frac{n\bar{x}}{\theta}\right) \exp\left(\frac{1}{\bar{x}}\sum_{i=1}^{n-1} x_i\right).$$


Remembering that
$$y_i = 1 - e^{-x_i/\bar{x}}, \quad \text{or} \quad \frac{x_i}{\bar{x}} = -\log(1 - y_i),$$
the joint probability law may be rewritten
$$p(y_1, \ldots, y_{n-1}, \bar{x} \mid \theta) = n\,\theta^{-n}\,\bar{x}^{\,n-1} \exp\left(-\frac{n\bar{x}}{\theta}\right) \prod_{i=1}^{n-1} (1 - y_i)^{-1},$$
where
$$\sum_{i=1}^{n-1} [-\log(1 - y_i)] < n, \quad \text{or} \quad \prod_{i=1}^{n-1} (1 - y_i) > e^{-n}.$$

Integrating out with respect to $\bar{x}$,
$$p(y_1, \ldots, y_{n-1} \mid \theta) = \frac{(n-1)!}{n^{n-1}} \prod_{i=1}^{n-1} (1 - y_i)^{-1}, \qquad \prod_{i=1}^{n-1} (1 - y_i) > e^{-n}, \tag{7}$$
again a result which is independent of the parameter of the probability law.
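Result (7) can also be checked for normalization; the sketch below is our own, with $n = 6$ and the seed arbitrary. Substituting $u_i = -\log(1 - y_i)$ makes (7) a constant density $(n-1)!/n^{n-1}$ on the simplex $\{u_i > 0,\ \sum u_i < n\}$, whose volume $n^{n-1}/(n-1)!$ can be estimated by Monte Carlo.

```python
import numpy as np
from math import factorial

n = 6
rng = np.random.default_rng(4)

# Fraction of the cube [0, n]^{n-1} lying in the simplex {sum u_i < n}
reps = 2_000_000
u = rng.uniform(0, n, size=(reps, n - 1))
inside = (u.sum(axis=1) < n).mean()

volume = inside * n ** (n - 1)                   # Monte Carlo simplex volume
exact = n ** (n - 1) / factorial(n - 1)          # constant density integrates to 1
print(volume, exact)
```

Since the constant density times the simplex volume is exactly 1, agreement of the two printed numbers confirms that (7) is a proper probability law.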


10. The results of this investigation, which we have carried out partly in the general and partly in the particular, are obviously incomplete and should be succeeded by a fuller inquiry which would clear up the doubtful points which we have had to pass over, and possibly extend the general theory a little further. We feel that none of the questions which have been raised in the course of this inquiry are insoluble by algebraic analysis, but it is uncertain whether it is profitable to proceed with the fuller inquiry until some of the statistical implications of what has been done become more clear. For example, we have noted that given $n$ independent random variables, $x$, if $s$ sample moments are calculated from them and used as estimates of the parameters of the probability law, then it appears that there will be $s$ independent relationships between the $y$'s. Thus in this case the point $y_1, y_2, \ldots, y_n$ is constrained to move in an $(n-s)$-dimensioned space within an $n$-dimensioned cube, and we have the exact analogue of the loss of degrees of freedom with $\chi^2$ when the parameters have to be estimated from the data. What is not clear is how the $y$'s are constrained when the sample estimates of the parameters are not the sample moments, and while this situation may not often be met with in practice, yet it should be explored.
When the parameters of location and scale are estimated from the data it is clear that the distribution of any individual $y_i$, and the joint probability law of the $y$'s also, will not be dependent on these unknown parameters of the probability law of the $x$'s, but will depend on the functional form of that law. This result appears capable of extension to the case when higher sample moments are also used for estimating parameters. This being so, there would seem to be two ways in which the joint probability law of the $y$'s may be utilized in statistical applications. First, it should be possible mathematically to form certain broad classes of functions for each of which the joint probability laws of the $y$'s would be approximately the same; or second, one may seek some transformation of variables so that instead of the correlated $y_i$ we obtain $n - s$ new independent variables following distributions which are independent of the original $p(x)$. Both these methods of attack may lead to results which will only be valid for large samples, but provided the results in either case have sufficient algebraic simplicity they should make possible certain generalizations in statistical analysis of which Neyman's 'smooth' test for goodness of fit is only one important example.

REFERENCES
FISHER, R. A. (1932). Statistical Methods for Research Workers, §21.1.
NEYMAN, J. (1937). Skand. AktuarTidskr. 20, 149.
PEARSON, E. S. (1938). Biometrika, 30, 134.
PEARSON, K. (1933). Biometrika, 25, 379.
