
Chapter 3

Kolmogorov-Smirnov Tests

There are many situations where experimenters need to know the distribution of
the population of interest. For example, parametric tests typically assume that
the population under investigation is normal. In this chapter we consider
Kolmogorov-Smirnov tests for verifying that a sample comes from a population
with some known distribution, and also that two populations have the same
distribution.

3.1 The one-sample test


Let x_1, ..., x_n be observations on continuous i.i.d. r.vs X_1, ..., X_n with a
c.d.f. F. We want to test the hypothesis

H_0 : F(x) = F_0(x) for all x,   (3.1)

where F_0 is a known c.d.f.

The Kolmogorov-Smirnov test statistic D_n is defined by

D_n = \sup_{x \in \mathbb{R}} |\hat{F}(x) - F_0(x)|,   (3.2)

where \hat{F} is the empirical cumulative distribution function, defined as

\hat{F}(x) = \frac{\#\{i : x_i \le x\}}{n}.   (3.3)
Note that the supremum in (3.2) must occur at one of the observed values x_i or
immediately to the left of one of them, since \hat{F} jumps only at the x_i.

The null distribution of the statistic D_n can be obtained by simulation or,
for large samples, using the Kolmogorov-Smirnov distribution function.


[Figure 3.1: Continuous Cumulative Distribution Function. The plot shows
U = F(X) on the vertical axis against X on the horizontal axis, with the point
(x_u, u) marking u = F(x_u).]

3.1.1 Simulation of the null distribution


We may approximate the null distribution of Dn by simulation. For this we
use the standard uniform random variable.

Lemma 3.1
Let X be a continuous r.v. with a c.d.f. F and let U = F(X). Then

U ~ Uniform[0, 1].

Proof
Let u ∈ [0, 1]. Since X is continuous, there exists x_u ∈ \mathbb{R} such that
F(x_u) = u. See Figure 3.1.

Now, F_U(u) = P(U ≤ u) = P(F(X) ≤ F(x_u)) = P(X ≤ x_u) = F(x_u) = u.

So F_U(u) = u, and the r.v. U is uniform on [0, 1].
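Lemma 3.1 can be checked empirically. Below is a minimal sketch (not part of
the original notes), taking F = Φ, the standard normal c.d.f., and using NumPy
and SciPy:

```python
# Empirical check of Lemma 3.1 (the probability integral transform):
# if X is continuous with c.d.f. F, then U = F(X) is Uniform[0, 1].
# Here F = Phi, the standard normal c.d.f.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)   # X ~ N(0, 1)
u = norm.cdf(x)                # U = F(X)

# Uniform[0, 1] has mean 1/2 and variance 1/12 ~ 0.0833.
print(u.mean(), u.var())
```

The printed mean and variance should be close to 1/2 and 1/12, as the lemma
predicts.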

To simulate the null distribution of D_n, do the following:

- generate a random sample of size n from the standard uniform distribution
  U[0, 1], whose c.d.f. is F_0(u) = u;
- find the maximum absolute difference between F_0(u) = u and the empirical
  distribution function \hat{F}(u) for the generated sample;
- repeat this N times to obtain an approximate distribution of D_n.
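These steps can be sketched in Python (a sketch only; the notes use a GenStat
program, and the function name simulate_Dn is my own):

```python
import numpy as np

def simulate_Dn(n, N=1000, rng=None):
    """Simulate N values of the one-sample K-S statistic D_n under H0.

    By Lemma 3.1 it suffices to draw Uniform[0, 1] samples and test them
    against F_0(u) = u.
    """
    rng = rng if rng is not None else np.random.default_rng()
    ds = np.empty(N)
    for j in range(N):
        u = np.sort(rng.uniform(size=n))
        # The e.c.d.f. jumps at each order statistic, so the supremum of
        # |F_hat(u) - u| is attained at or just before one of the u_(i).
        ecdf_at = np.arange(1, n + 1) / n    # e.c.d.f. value at u_(i)
        ecdf_before = np.arange(0, n) / n    # e.c.d.f. value just left of u_(i)
        ds[j] = max(np.max(ecdf_at - u), np.max(u - ecdf_before))
    return ds

d5 = simulate_Dn(5, N=5000, rng=np.random.default_rng(1))
```

The simulated p-value of an observed value d_0 is then simply the proportion
of simulated statistics at least as large, e.g. np.mean(d5 >= d_0).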



Small Example
Five independent weighings of a standard weight (in gm × 10^{-6}) give the
following discrepancies from the supposed true weight:

-1.2, 0.2, -0.6, 0.8, -1.0.

Are the discrepancies sampled from N(0, 1)?


We set the null hypothesis as H_0 : F(x) = F_0(x), where F_0(x) = \Phi(x),
i.e., it is the c.d.f. of a standard normal r.v. X. To calculate the value of
the test statistic (3.2) we need the empirical c.d.f. of the data and also the
values of \Phi at the data points.

The empirical c.d.f. is

\hat{F}(x) = 0    for x < -1.2
             0.2  for -1.2 ≤ x < -1.0
             0.4  for -1.0 ≤ x < -0.6
             0.6  for -0.6 ≤ x < 0.2
             0.8  for 0.2 ≤ x < 0.8
             1    for x ≥ 0.8

Calculations (each x appears twice: first with the value of \hat{F} just to
the left of x, then with its value at x):

   x    \hat{F}(x)   \Phi(x)   |\hat{F}(x) - \Phi(x)|
 -1.2     0           0.115      0.115
 -1.2     0.2         0.115      0.085
 -1.0     0.2         0.159      0.041
 -1.0     0.4         0.159      0.241
 -0.6     0.4         0.274      0.126
 -0.6     0.6         0.274      0.326
 +0.2     0.6         0.580      0.020
 +0.2     0.8         0.580      0.220
 +0.8     0.8         0.788      0.012
 +0.8     1           0.788      0.212

Hence, the observed value of D_n, say d_0, is d_0 = 0.326.
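The table can be reproduced in a few lines (a sketch using SciPy's normal
c.d.f.; the variable names are mine):

```python
import numpy as np
from scipy.stats import norm

x = np.sort(np.array([-1.2, 0.2, -0.6, 0.8, -1.0]))  # the five discrepancies
n = len(x)
F0 = norm.cdf(x)                                # Phi at the sorted data points
d_plus = np.max(np.arange(1, n + 1) / n - F0)   # e.c.d.f. at x_(i) minus Phi
d_minus = np.max(F0 - np.arange(0, n) / n)      # Phi minus e.c.d.f. left of x_(i)
d0 = max(d_plus, d_minus)
print(round(d0, 3))   # 0.326, matching the table
```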

What is the null distribution of D_n?

We have n = 5. Suppose we have randomly generated the following five values
from the standard uniform distribution:

[Figure 3.2: Calculating d_max. The largest gap between the empirical c.d.f.
of the uniform sample and the line F_0(u) = u occurs at u = 0.6170, where the
e.c.d.f. equals 0.2.]

0.8830 0.6170 0.7431 0.9368 0.3070

D_5 takes the value d_max = |0.6170 - 0.2| = 0.417. See Figure 3.2.

Another random sample of 5 uniform r.vs will give another value of d_max.
Repeating this procedure we simulate a set of values of D_5. Then, having the
approximate null distribution of the test statistic, we may calculate the
p-value of the observed d_0.

Below is a simulated distribution of D_5 (N = 1000 simulations) obtained using
a GenStat program:

- 0.18 18 ****
0.18 - 0.24 107 *********************
0.24 - 0.30 220 ********************************************
0.30 - 0.36 231 **********************************************
0.36 - 0.42 169 **********************************
0.42 - 0.48 119 ************************
0.48 - 0.54 80 ****************
0.54 - 0.60 35 *******
0.60 - 0.66 16 ***
0.66 - 5 *

This shows that d0 = 0.326 is well in the middle of the distribution and
so the data do not contradict the null hypothesis that the discrepancies are
normally distributed with zero mean and variance equal to one.

3.1.2 Kolmogorov-Smirnov approximation of the null distribution
The approximation is given by the following theorem.

Theorem 3.1 Let F_0 be a continuous c.d.f., and let X_1, ..., X_n be a
sequence of i.i.d. r.vs with the c.d.f. F_0. Then

1. The null distribution of D_n does not depend on F_0; it depends only on n.

2. As n → ∞, the distribution of \sqrt{n} D_n is asymptotically the Kolmogorov
   distribution with the c.d.f.

   Q(x) = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2 k^2 x^2},   (3.4)

   that is,

   \lim_{n \to \infty} P(\sqrt{n} D_n \le x) = Q(x).
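The series in (3.4) converges very quickly, so it can be evaluated directly.
A sketch of my own (not from the notes):

```python
import math

def Q(x, terms=100):
    """Kolmogorov c.d.f.: Q(x) = 1 - 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2)."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return 1.0 - 2.0 * s

# Q(1.36) is close to 0.95, which is why 1.36 appears in K-S
# critical-value tables for the 5% level.
print(round(Q(1.36), 3))
```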

Example: Fire Occurrences

A natural reserve in Australia has had 15 fires since the beginning of this
year. The fires occurred on the following days of the year: 4, 18, 32, 37, 56,
64, 78, 89, 104, 134, 154, 178, 190, 220, 256. A researcher claims that the
time between the occurrences of fire in the reserve, say X, follows an
exponential distribution, i.e., X ~ Exp(λ), where λ = 0.009. Is the claim
justified?

Calculations will be shown during the lectures.
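One way to set up the calculation is sketched below (this is not the lecture
solution; in particular, whether to count the interval from day 0 to the first
fire as an observation is a modelling choice I make explicitly):

```python
import numpy as np

days = np.array([4, 18, 32, 37, 56, 64, 78, 89, 104, 134,
                 154, 178, 190, 220, 256])
# Waiting times between fires; prepending day 0 treats the time to the
# first fire as the first observation (an assumption, giving n = 15).
gaps = np.sort(np.diff(np.concatenate(([0], days))))
n = len(gaps)
lam = 0.009
F0 = 1.0 - np.exp(-lam * gaps)                  # c.d.f. of Exp(0.009)
d_plus = np.max(np.arange(1, n + 1) / n - F0)   # e.c.d.f. above F0
d_minus = np.max(F0 - np.arange(0, n) / n)      # e.c.d.f. below F0
Dn = max(d_plus, d_minus)
print(round(Dn, 3), round((n ** 0.5) * Dn, 3))
```

With these numbers D_n comes out large (around 0.72, so \sqrt{n} D_n ≈ 2.8 and
Q gives a p-value near 0): λ = 0.009 implies a mean waiting time of about 111
days, which is hard to reconcile with 15 fires in 256 days. The definitive
calculation is the one given in the lectures.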



3.2 The two-sample test


Let

- x_1, ..., x_m be observations on i.i.d. r.vs X_1, ..., X_m with a c.d.f. F_1,
- y_1, ..., y_n be observations on i.i.d. r.vs Y_1, ..., Y_n with a c.d.f. F_2.

We are interested in testing the null hypothesis

H_0 : F_1(x) = F_2(x) for all x

against the alternative

H_1 : F_1(x) ≠ F_2(x) for some x.

Theorem 3.2 Let X_1, ..., X_m and Y_1, ..., Y_n be i.i.d. r.vs with a common
continuous c.d.f., and let \hat{F}_1 and \hat{F}_2 be the empirical c.d.f.s of
the X's and Y's, respectively. Furthermore, let

D_{m,n} = \sup_t |\hat{F}_1(t) - \hat{F}_2(t)|.   (3.5)

Then we have

\lim_{m,n \to \infty} P\left( \sqrt{\frac{mn}{m+n}} \, D_{m,n} \le t \right) = Q(t),   (3.6)

where Q(t) is given by (3.4).

Hence, D_{m,n} may serve as a test statistic for our null hypothesis; suitably
scaled, it has an asymptotic Kolmogorov distribution.
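The statistic (3.5) can be computed directly from the two samples. A sketch
(the function name is mine):

```python
import numpy as np

def ks_two_sample(x, y):
    """Two-sample K-S statistic D_{m,n} = sup_t |F1_hat(t) - F2_hat(t)|."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    # The difference of the two e.c.d.f.s can only change at observed
    # points, so it suffices to evaluate both e.c.d.f.s there.
    pooled = np.concatenate((x, y))
    F1 = np.searchsorted(x, pooled, side="right") / len(x)
    F2 = np.searchsorted(y, pooled, side="right") / len(y)
    return np.max(np.abs(F1 - F2))

# Two samples from the same distribution should give a small D_{m,n}.
rng = np.random.default_rng(2)
print(ks_two_sample(rng.normal(size=50), rng.normal(size=60)))
```

Identical samples give D_{m,n} = 0, and completely separated samples give
D_{m,n} = 1.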

NOTES

- The Kolmogorov-Smirnov (K-S) two-sample test is an alternative to the MWW
  (Mann-Whitney-Wilcoxon) test.
- The MWW test is more powerful when H_1 is a location shift. The K-S test has
  reasonable power against a wide range of alternative hypotheses.
- For small samples we may simulate the null distribution of D_{m,n} using the
  standard uniform distribution U[0, 1].
- A "learning the mechanics" example will be given during the lectures.
