
Simple Random Sampling Without Replacement in a Finite Population.
Notes on the Variance of the Sample Mean.

Nicolas RAPIN
Personal Notes
[email protected]
February 7, 2022

The problem addressed in these notes, indicated in the title, is only detailed
and formalized in Subsection 1.4. In order to make these notes as
self-contained as possible, we first recall some basic notions.

0.1 Random Process, Universe, Probability, Random Variable
The set of possible results of a given random process is called a universe,
usually denoted Ω. An event of Ω is a subset of Ω. Intuitively, an event e ⊂ Ω
is realized by the random process if its result r belongs to e. Given a set F
of events of Ω, a probability measure P over F is a function which assigns a
value between 0 and 1, called its probability, to every event of F, and which
must satisfy two properties. First, the probability of a countable union of
mutually exclusive events (i.e. events with pairwise empty intersections) must
be equal to the countable sum of the probabilities of these events. Second, we
must have P(Ω) = 1. The triple (Ω, F, P) is called a probability space (see the
literature for details).

Intuitively, a random variable is a transformation of the results of the
random process, i.e. of the elements of Ω. Formally, a random variable Y is
a map from the universe Ω to a set, denoted |Y|, called the range of Y. By
an abuse of notation, Y is also used to denote a value of its range, i.e. in
place of Y(r) (for some r ∈ Ω). The event notion can be extended to Y:
an event relative to Y is a subset of |Y|. Considering an event e ⊆ |Y|, the
probability that e is realized, denoted P(Y ∈ e), is given by P(Y^{-1}(e)):

    P(Y ∈ e) =_{def} P({ω ∈ Ω | Y(ω) ∈ e})

If c ∈ |Y| we write P(Y = c) as an abbreviation of P(Y ∈ {c}).

In the remainder we will only consider finite and discrete universes.

Example 1. The random process consists in throwing two dice. The universe
is therefore [1, 6] × [1, 6], containing 36 pairs. Events are subsets of
[1, 6] × [1, 6] and basic events are singletons of the form {(a, b)} with
(a, b) ∈ [1, 6] × [1, 6]. Assuming the dice are unbiased, the probability P is
such that for any pair (a, b) of [1, 6] × [1, 6] we have P({(a, b)}) = 1/36.

We define the random variable S as the sum of the points appearing on
the top faces of the two dice. Formally, for any pair (a, b) of [1, 6] × [1, 6],
S((a, b)) = a + b. The possible values of S are |S| = {2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12}. For example we have:

• P(S ∈ {11}) = P(S^{-1}({11})) = P({(5, 6), (6, 5)}) = 2/36.
• P(S is odd) = P(S ∈ {3, 5, 7, 9, 11}) = (left to the reader)

Two variables Z and Y are said to have the same distribution if:

• |Z| = |Y|
• P(Z ∈ e) = P(Y ∈ e) for any e ⊆ |Z|.

Suppose that Y1 is the random variable associated to the result of the
first die and Y2 to the second. Formally, for (a, b) ∈ Ω, Y1(a, b) = a and
Y2(a, b) = b. It is clear that Y1 and Y2 have the same distribution, since
for any e ⊆ {1, . . . , 6} the measures assigned (by P) to Y1^{-1}(e) = {(a, b) ∈
Ω | a ∈ e} and to Y2^{-1}(e) = {(a, b) ∈ Ω | b ∈ e} are the same. Indeed
card(Y1^{-1}(e)) = card(Y2^{-1}(e)) and

    P(Y1^{-1}(e)) = \frac{1}{36} card(Y1^{-1}(e)),

    P(Y2^{-1}(e)) = \frac{1}{36} card(Y2^{-1}(e)).
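This example can also be checked mechanically. The following snippet is a minimal Python sketch, not part of the original notes (all names are illustrative); it enumerates the 36 outcomes and recovers the probabilities above.

    from itertools import product
    from fractions import Fraction

    # Universe of the two-dice process: all 36 ordered pairs (a, b).
    omega = list(product(range(1, 7), repeat=2))
    p = Fraction(1, len(omega))   # unbiased dice: each pair has probability 1/36

    # Random variable S((a, b)) = a + b.
    def S(outcome):
        a, b = outcome
        return a + b

    # Probabilities are obtained by summing P over the preimage of the event.
    p_S_11 = sum(p for w in omega if S(w) == 11)
    p_S_odd = sum(p for w in omega if S(w) % 2 == 1)
    print(p_S_11)    # 1/18, i.e. 2/36
    print(p_S_odd)   # 1/2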
Example 2. Consider now another process, consisting in drawing two
balls, numbered 1 to 6, from a bag (the first ball is not replaced in the bag
before the second is drawn). The universe is [1, 6] × [1, 6] \ {(a, a) | a ∈ [1, 6]}.
Once again we introduce the random variables Y1 and Y2 as follows: for (a, b) ∈ Ω,
Y1(a, b) = a and Y2(a, b) = b. Though Y1 and Y2 are not independent (the
first draw reduces the choices for the second), Y1 and Y2 have the same
distribution. It suffices to see that Ω is symmetric (for any pair (a, b) it also
contains the symmetric pair (b, a)). As we will see, our main problem is quite
similar to this latter example.
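This symmetry argument can also be verified by brute force. Here is a small Python sketch (not part of the original notes; names are illustrative) which enumerates the 30 outcomes and compares the two marginal distributions.

    from itertools import permutations
    from collections import Counter
    from fractions import Fraction

    # Universe of Example 2: ordered pairs of distinct balls numbered 1..6 (30 outcomes).
    omega = list(permutations(range(1, 7), 2))
    p = Fraction(1, len(omega))

    # Marginal distributions of Y1 (first ball) and Y2 (second ball).
    dist_Y1, dist_Y2 = Counter(), Counter()
    for a, b in omega:
        dist_Y1[a] += p
        dist_Y2[b] += p

    print(dist_Y1 == dist_Y2)   # True: Y1 and Y2 have the same distribution
    print(dist_Y1[1])           # 1/6, as for a single unbiased draw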

The following section introduces the context of our problem.

1 Finite Population of R
We consider a finite population P of real values. Formally, P is an
element of R^N with N ∈ N∗, which can be written as a finite tuple of elements
of R. For the remainder we assume that P is the tuple (x1, . . . , xN). Some
elements of this population can be repeated, i.e. we may have xi = xj with
i ≠ j. Thus the number M of distinct values present in P, i.e. the cardinality
of {v ∈ R | ∃i ∈ [1, N], v = xi}, is less than or equal to N. Let us denote by
ε1, . . . , εM the values present in P and by η1, . . . , ηM their respective
repetition counts in P (ηi = card({k ∈ [1, N] | xk = εi})).

Example: N = 4, P is (x1 = 1.2, x2 = 3.5, x3 = 4.2, x4 = 1.2).

There are M = 3 values represented in P, which are

    (ε1 = 1.2, ε2 = 3.5, ε3 = 4.2)

The repetition counts of those values are given by:

    (η1 = card({1, 4}) = 2, η2 = 1, η3 = 1)


1.1 Mean, Variance of P
The mean m of the population P is defined as follows:

    m = \frac{1}{N} \sum_{i=1}^{N} x_i

or equivalently:

    m = \frac{1}{N} \sum_{i=1}^{M} η_i ε_i

Introducing p_i = η_i / N we have:

    m = \sum_{i=1}^{M} p_i ε_i

The variance υ of P is:

    υ = \frac{1}{N} \sum_{i=1}^{N} (x_i − m)^2

By the König-Huygens theorem:

    υ = \left( \frac{1}{N} \sum_{i=1}^{N} x_i^2 \right) − m^2

Exploiting the repetition of elements we also have:

    υ = \left( \frac{1}{N} \sum_{i=1}^{M} ε_i^2 η_i \right) − m^2
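These three expressions of the variance agree, as can be checked on the running example with a short Python sketch (illustrative only, not part of the original notes):

    from collections import Counter

    # Running example population from Section 1.
    P = [1.2, 3.5, 4.2, 1.2]
    N = len(P)

    # Direct definitions of the mean and the variance.
    m = sum(P) / N
    v = sum((x - m) ** 2 for x in P) / N

    # Koenig-Huygens form, and the form using the repetition counts eta_i of the values eps_i.
    v_kh = sum(x ** 2 for x in P) / N - m ** 2
    eta = Counter(P)
    v_rep = sum(eps ** 2 * n for eps, n in eta.items()) / N - m ** 2

    print(m)               # 2.525
    print(v, v_kh, v_rep)  # 1.816875 three times (up to floating-point rounding)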

1.2 Process 1: Random Collection of One Element

We define a random process which consists in randomly collecting one
element of P. This process is in fact a random choice of an index in [1, N].
The universe Ω is [1, N] and we assume that for any k ∈ [1, N], P({k}) = 1/N.
From this process we can define two random variables, k and X. The random
variable k is the index of the collected element, so |k| = [1, N]. Notice that,
as a map, k is the identity. Hence for y ∈ [1, N]:

    P(k = y) = P(k^{−1}({y})) = P({y}) = \frac{1}{N}

The random variable X is the value of the collected element, so X = x_k
and hence |X| = {ε1, . . . , εM}.

Suppose that y_1, . . . , y_{η_i} are the indexes of the elements of P which are
equal to ε_i, for some i ∈ [1, M]. With the notations of the previous section
we have:

    P(X = ε_i) = P(k = y_1 \text{ or } \dots \text{ or } k = y_{η_i})
               = P(k = y_1) + \dots + P(k = y_{η_i})
               = \sum_{j=1}^{η_i} \frac{1}{N} = \frac{η_i}{N} = p_i

The expected value E[X] of X is

    E[X] = \frac{1}{N} \sum_{i=1}^{N} x_i = \sum_{i=1}^{M} p_i ε_i = m

and the variance var[X] of X is

    var[X] = \left( \frac{1}{N} \sum_{i=1}^{M} ε_i^2 η_i \right) − (E[X])^2 = υ
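As a sanity check of Process 1 on the running example, here is a Python sketch (not part of the notes; variable names are illustrative) enumerating the N equally likely indices:

    from fractions import Fraction
    from collections import Counter

    # Running example population; Process 1 draws one index uniformly in [1, N].
    P = [1.2, 3.5, 4.2, 1.2]
    N = len(P)

    # Distribution of X = x_k when k is uniform over the indices.
    pX = Counter()
    for k in range(N):
        pX[P[k]] += Fraction(1, N)
    print(pX)   # P(X = 1.2) = 1/2, P(X = 3.5) = 1/4, P(X = 4.2) = 1/4, i.e. the p_i

    # Expected value and variance of X: they equal m and v (population mean and variance).
    EX = sum(eps * p for eps, p in pX.items())
    varX = sum(eps ** 2 * p for eps, p in pX.items()) - EX ** 2
    print(EX, varX)   # 2.525 and 1.816875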

1.3 Process 2: Collection of n Elements Without Replacement

Let us consider this other process: one randomly collects an element of
P, removes this element from P, then repeats this operation on the population
of remaining elements until n elements have been set aside. This process will
be called SRSWOR (Simple Random Sampling WithOut Replacement). It
can be modeled by the choice of n different values (indexes) in [1, N], i.e. by
a sequence (s_1, . . . , s_n) of n values (indexes) taken in [1, N] and satisfying
s_i ≠ s_j whenever i ≠ j. The values collected from P are (x_{s_1}, . . . , x_{s_n}).

Let us introduce n random variables X_1, . . . , X_n associated to SRSWOR
as follows. For i ∈ [1, n]:

    X_i = x_{s_i}
From those variables we can define the random variable X̄ as follows:

    X̄ = \frac{1}{n} \sum_{i=1}^{n} X_i

X̄ is the mean of the sample (x_{s_1}, . . . , x_{s_n}).

From the linearity of the expected value we can easily prove that:

    E[X̄] = E[X] = m

Indeed

    E[X̄] = E\left[ \frac{1}{n} \sum_{i=1}^{n} X_i \right] = \frac{1}{n} \sum_{i=1}^{n} E[X_i]

Since every X_i has the same distribution as X (this is established later, in
Subsection 1.6), E[X_i] = E[X]. Thus we have:

    E[X̄] = \frac{1}{n} \sum_{i=1}^{n} E[X]
         = \frac{E[X]}{n} \sum_{i=1}^{n} 1
         = \frac{E[X]}{n} \cdot n = E[X] = m
1.4 What we want to prove
Concerning var[X̄], the variance of the sample mean, many books claim,
without providing a proof, that the following equality holds:

    var[X̄] = var[X] \cdot \frac{N − n}{n (N − 1)}

The variance of the sample mean X̄ is thus the variance υ of the population P
(or equivalently the variance var[X] of the variable X of Process 1) corrected
by a factor that only depends on N and n (and not on the particular values
of the elements of the population). The purpose of these notes is to provide
a proof of this equality, which is not straightforward for us. As we will see,
this proof is also based on linearity.
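Before proving it, the identity is easy to check numerically by exhausting all samples of a small population. Below is a Python sketch (illustrative, not part of the original notes; the population values and sample size are arbitrary assumptions):

    from itertools import permutations
    from statistics import mean

    # Small assumed example population and sample size (any values work for the check).
    P = [1.2, 3.5, 4.2, 1.2, 7.0]
    N, n = len(P), 3

    m = mean(P)
    v = sum((x - m) ** 2 for x in P) / N    # population variance (divided by N)

    # All N!/(N-n)! ordered samples without replacement are equally likely;
    # compute the exact variance of the sample mean over them.
    means = [mean(s) for s in permutations(P, n)]
    var_mean = sum((x - mean(means)) ** 2 for x in means) / len(means)

    print(var_mean)                       # exact variance of the sample mean
    print(v / n * (N - n) / (N - 1))      # claimed formula: same value up to rounding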

1.5 Sampling Sets, Stability by Permutation


Let us determine the number of possible samples. For the first index s_1
one has N choices. Since the corresponding element is then removed from P,
one has N − 1 choices for the second index, and so on. So the number of such
samples is:

    N × (N − 1) × \dots × (N − n + 1) = \frac{N!}{(N − n)!}

The previous computation assumes an order for building the samples: the
first collected element is placed at the first position in the sample, the second
at the second position, and so on. This order is in fact totally arbitrary: one
would obtain the same set of samples following another order.

More formally, this amounts to saying that the set of samples is preserved
by permutation. Suppose that s_i is permuted with s_j in all samples; the set
of samples is globally preserved. In particular, for any i ∈ [2, n], s_i can be
permuted with s_1, leaving the set of all samples unchanged.
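This count is easy to confirm for small N and n (a sketch with assumed values, not part of the notes):

    from itertools import permutations
    from math import factorial

    # Assumed small values of N and n for illustration.
    N, n = 6, 3

    # Ordered samples: sequences of n distinct indices taken from [1, N].
    num_samples = sum(1 for _ in permutations(range(1, N + 1), n))

    print(num_samples)                       # 120
    print(factorial(N) // factorial(N - n))  # N!/(N-n)! = 120 as well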
1.6 P(X1 = εi ), P(Xr = εi )
Let us consider the first element collected from P, which determines X_1.
This element can be any element of P, so s_1 can be any index k ∈ [1, N]. It
follows that for k ∈ [1, N]:

    P(s_1 = k) = \frac{1}{N}

This can be confirmed by considering the samples satisfying s_1 = k. Their
number is:

    1 × (N − 1) × \dots × (N − n + 1) = \frac{(N − 1)!}{(N − n)!}

So, as above,

    P(s_1 = k) = \frac{(N − 1)!/(N − n)!}{N!/(N − n)!} = \frac{(N − 1)! \, (N − n)!}{N! \, (N − n)!} = \frac{(N − 1)!}{N!} = \frac{1}{N}

Suppose that y_1, . . . , y_{η_i} are the indexes of the elements of P which are
equal to ε_i, for some i ∈ [1, M]. Then:

    P(X_1 = ε_i) = P(s_1 = y_1 \text{ or } \dots \text{ or } s_1 = y_{η_i}) = \sum_{j=1}^{η_i} \frac{1}{N} = \frac{η_i}{N}

Now, for r ∈ [2, n] and k ∈ [1, N], what about P(s_r = k)? It is in fact the
same as P(s_1 = k). This has been justified in the previous subsection: the
set of samples is preserved by the permutation of s_1 with s_r in all samples.
So for r ∈ [2, n]:

    P(X_r = ε_i) = P(s_r = y_1 \text{ or } \dots \text{ or } s_r = y_{η_i}) = \sum_{j=1}^{η_i} \frac{1}{N} = \frac{η_i}{N}

This shows that all the X_i's have the same distribution.
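On the running example this can be confirmed by enumerating all samples (a Python sketch, not part of the notes; the sample size n = 3 is an assumption):

    from itertools import permutations
    from fractions import Fraction
    from collections import Counter

    # Running example population and an assumed sample size.
    P = [1.2, 3.5, 4.2, 1.2]
    N, n = len(P), 3

    samples = list(permutations(range(N), n))   # all ordered index samples
    p = Fraction(1, len(samples))

    # Marginal distribution of each X_r (the value in position r of the sample).
    marginals = [Counter() for _ in range(n)]
    for s in samples:
        for r, idx in enumerate(s):
            marginals[r][P[idx]] += p

    print(all(d == marginals[0] for d in marginals))  # True: every X_r has the same distribution
    print(marginals[0][1.2])                          # 1/2 = eta_i / N for eps_i = 1.2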

Now let us consider the joint random variable (X_1, X_2). It is not difficult
to see that for q ≠ r,

    P(X_1 = ε_q, X_2 = ε_r) = \frac{η_q}{N} \cdot \frac{η_r}{N − 1}

and that:

    P(X_1 = ε_r, X_2 = ε_r) = \frac{η_r}{N} \cdot \frac{η_r − 1}{N − 1}

Since for any couple (X_i, X_j) with i ≠ j the following equality (named (A)) holds

    (A)   {(x_{s_1}, x_{s_2}) | s is a sample} = {(x_{s_i}, x_{s_j}) | s is a sample}

we also have:

    P(X_i = ε_r, X_j = ε_r) = \frac{η_r}{N} \cdot \frac{η_r − 1}{N − 1}

and for q ≠ r,

    P(X_i = ε_q, X_j = ε_r) = \frac{η_q}{N} \cdot \frac{η_r}{N − 1}

Let us introduce the constants f_{qr} as follows:

    f_{qr} = \frac{η_q}{N} \cdot \frac{η_r}{N − 1}

So for q ≠ r we have

    f_{qr} = P(X_1 = ε_q, X_2 = ε_r)

and we have

    f_{rr} = P(X_1 = ε_r, X_2 = ε_r) + \frac{η_r}{N (N − 1)}

thus

    P(X_1 = ε_r, X_2 = ε_r) = f_{rr} − \frac{η_r}{N (N − 1)}

1.7 Covariance
Definition:

    Cov[Z, Y] ≡ E[(Z − E[Z])(Y − E[Y])]

Useful properties:

    Cov[Z, Y] = E[Z·Y] − E[Z]·E[Y]

    Cov[Z, Y] = Cov[Y, Z]

    Cov[Z, Z] = E[(Z − E[Z])(Z − E[Z])] = E[(Z − E[Z])^2] = var[Z]

    Cov[a·Z, b·Y] = a·b·Cov[Z, Y]

    Cov\left[ \sum_i a_i Z_i, \sum_j b_j Y_j \right] = \sum_i \sum_j a_i b_j \, Cov[Z_i, Y_j]

Corollary:

    var[a·Z + b·Y] = Cov[a·Z + b·Y, a·Z + b·Y]
                   = a^2·Cov[Z, Z] + a·b·Cov[Z, Y] + b·a·Cov[Y, Z] + b^2·Cov[Y, Y]
                   = a^2·var[Z] + b^2·var[Y] + a·b·Cov[Z, Y] + b·a·Cov[Y, Z]

We deliberately do not simplify a·b·Cov[Z, Y] + b·a·Cov[Y, Z] to 2·a·b·Cov[Z, Y],
as we want to emphasize some symmetries when this result is extended to a
more general sum of variables, just below.

Sub-Corollary:

    var\left[ \frac{1}{n} \sum_{i=1}^{n} Z_i \right] = \frac{1}{n^2} \sum_{i=1}^{n} var[Z_i] + \frac{1}{n^2} \sum_{i ≠ j ∈ [1,n]} Cov[Z_i, Z_j]

Notice that i, j ∈ [1, n] with i ≠ j defines (n × n) − n terms (the cells of the
n × n square without its diagonal).

From equality (A) above we can deduce that Cov[X_i, X_j] = Cov[X_1, X_2]
provided i ≠ j. Combining this fact with the previous one we obtain:

    var[X̄] = \frac{1}{n^2} \sum_{i=1}^{n} var[X] + \frac{n^2 − n}{n^2} Cov[X_1, X_2]

    var[X̄] = \frac{n \cdot var[X]}{n^2} + \frac{n − 1}{n} Cov[X_1, X_2]

    var[X̄] = \frac{var[X]}{n} + \frac{n − 1}{n} Cov[X_1, X_2]
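This last decomposition can be checked numerically before the covariance is computed in closed form (a Python sketch with an assumed population, not part of the notes):

    from itertools import permutations
    from statistics import mean

    # Assumed small population and sample size for the check.
    P = [1.2, 3.5, 4.2, 1.2, 7.0]
    N, n = len(P), 3

    m = mean(P)
    v = sum((x - m) ** 2 for x in P) / N          # var[X]

    samples = list(permutations(P, n))

    # Left-hand side: variance of the sample mean over all equally likely samples.
    means = [mean(s) for s in samples]
    lhs = sum((x - mean(means)) ** 2 for x in means) / len(samples)

    # Right-hand side: var[X]/n + (n-1)/n * Cov[X1, X2], with Cov[X1, X2]
    # computed over the joint distribution of the first two sampled values.
    cov12 = sum(s[0] * s[1] for s in samples) / len(samples) - m * m
    rhs = v / n + (n - 1) / n * cov12

    print(lhs, rhs)   # equal up to floating-point rounding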

1.8 Core of the Proof


Now that the necessary notions have been introduced, we can establish the proof.

From the sub-corollary of the previous subsection we have:

    var[X̄] = \frac{1}{n^2} \left( \sum_{i=1}^{n} var[X_i] + \sum_{i ≠ j ∈ [1,n]} Cov[X_i, X_j] \right)

From Cov[Z, Y] = E[Z·Y] − E[Z]·E[Y], we have:

    Cov[X_1, X_2] = E[X_1·X_2] − E[X_1]·E[X_2]

As E[X_1] = E[X_2] = m we have:

    Cov[X_1, X_2] = E[X_1·X_2] − m^2


Let us develop E[X_1·X_2]:

    E[X_1·X_2] = \sum_{q,r ∈ [1,M]} ε_q ε_r \, P(X_1 = ε_q, X_2 = ε_r)

    = \sum_{q ∈ [1,M]} ε_q \, P(X_1 = ε_q) \left( \sum_{r ∈ [1,M]} ε_r \, \frac{P(X_1 = ε_q, X_2 = ε_r)}{P(X_1 = ε_q)} \right)

    = \sum_{q ∈ [1,M]} ε_q p_q \left( \sum_{r ∈ [1,M]} ε_r \, \frac{P(X_1 = ε_q, X_2 = ε_r)}{p_q} \right)

Using the last equality of Subsection 1.6 and the constants f_{qr}:

    E[X_1·X_2] = \sum_{q ∈ [1,M]} ε_q p_q \left( \left( \sum_{r ∈ [1,M]} ε_r \frac{f_{qr}}{p_q} \right) − ε_q \, \frac{η_q}{N (N−1) \, p_q} \right)

Notice that the term ε_q \frac{η_q}{N (N−1) \, p_q} has been subtracted above because f_{qq}
is not equal to P(X_1 = ε_q, X_2 = ε_q) (see Subsection 1.6). Then:

    = \sum_{q ∈ [1,M]} ε_q p_q \left( \left( \sum_{r ∈ [1,M]} ε_r \, \frac{\frac{η_q}{N} \frac{η_r}{N−1}}{\frac{η_q}{N}} \right) − ε_q \, \frac{\frac{η_q}{N (N−1)}}{\frac{η_q}{N}} \right)

    = \sum_{q ∈ [1,M]} ε_q p_q \left( \left( \sum_{r ∈ [1,M]} ε_r \frac{η_r}{N−1} \right) − \frac{ε_q}{N−1} \right)

Since η_r = p_r N,

    = \sum_{q ∈ [1,M]} ε_q p_q \left( \left( \sum_{r ∈ [1,M]} ε_r \frac{p_r N}{N−1} \right) − \frac{ε_q}{N−1} \right)

    = \sum_{q ∈ [1,M]} ε_q p_q \left( \frac{N}{N−1} \left( \sum_{r ∈ [1,M]} ε_r p_r \right) − \frac{ε_q}{N−1} \right)

    = \sum_{q ∈ [1,M]} ε_q p_q \left( \frac{N}{N−1} \, m − \frac{ε_q}{N−1} \right)

    = \frac{N m}{N−1} \left( \sum_{q ∈ [1,M]} ε_q p_q \right) − \frac{1}{N−1} \left( \sum_{q ∈ [1,M]} ε_q^2 p_q \right)

    = \frac{N m}{N−1} \cdot m − \frac{1}{N−1} \sum_{q ∈ [1,M]} ε_q^2 p_q

    = \frac{N m^2}{N−1} − \frac{1}{N−1} \sum_{q ∈ [1,M]} ε_q^2 \frac{η_q}{N}

    = \frac{N m^2}{N−1} − \frac{1}{N (N−1)} \sum_{q ∈ [1,M]} ε_q^2 η_q

In Subsection 1.1 we have seen that

    υ = \left( \frac{1}{N} \sum_{i=1}^{M} ε_i^2 η_i \right) − m^2

hence

    \sum_{i=1}^{M} ε_i^2 η_i = N (υ + m^2)

So

    E[X_1·X_2] = \frac{N m^2}{N−1} − \frac{N (υ + m^2)}{N (N−1)}
               = \frac{N m^2}{N−1} − \frac{υ + m^2}{N−1}
               = \frac{N m^2 − υ − m^2}{N−1}
               = \frac{(N−1) m^2 − υ}{N−1}
               = m^2 − \frac{υ}{N−1}
Finally

    Cov[X_1, X_2] = E[X_1·X_2] − m^2

    Cov[X_1, X_2] = m^2 − \frac{υ}{N−1} − m^2

    Cov[X_1, X_2] = − \frac{υ}{N−1}

    Cov[X_1, X_2] = − \frac{var[X]}{N−1}
Let us plug this into the last equality of Subsection 1.7:

    var[X̄] = \frac{var[X]}{n} + \frac{n−1}{n} Cov[X_1, X_2]

    var[X̄] = \frac{var[X]}{n} − \frac{n−1}{n} \cdot \frac{var[X]}{N−1}

    var[X̄] = \frac{var[X]}{n} \left( 1 − \frac{n−1}{N−1} \right)

    var[X̄] = \frac{var[X]}{n} \left( \frac{N − 1 − n + 1}{N−1} \right)

    var[X̄] = \frac{var[X]}{n} \left( \frac{N − n}{N−1} \right)

That's it!
