Probability - Notes2
Probability I
Notes 2
Autumn 2005
Sampling
I have four pens in my satchel; they are red, green, blue, and purple. I take out a pen
and lay it on the desk; each pen has the same chance of being selected. In this case,
S = {R, G, B, P}, where R means "red pen chosen" and so on. If A is the event "red or
green pen chosen", then
\[ P(A) = \frac{|A|}{|S|} = \frac{2}{4} = \frac{1}{2}. \]
More generally, if I have a set of N objects and choose one, with each one equally
likely to be chosen, then each of the N outcomes has probability 1/N, and an event
consisting of m of the outcomes has probability m/N.
What if we choose more than one pen? We have to be more careful to specify the
sample space.
First, we have to say whether we are
sampling with replacement, or
sampling without replacement.
Sampling with replacement means that we choose a pen, note its colour, put it back
and shake the satchel, then choose a pen again (which may be the same pen as before
or a different one), and so on until the required number of pens have been chosen. If
we choose two pens with replacement, the sample space is
{RR, RG, RB, RP,
 GR, GG, GB, GP,
 BR, BG, BB, BP,
 PR, PG, PB, PP}
The event "at least one red pen" is {RR, RG, RB, RP, GR, BR, PR}, and has probability
7/16.
In general, if we choose n items from a set of size N, and the sampling is done
with replacement, then the sample space S consists of all ordered n-tuples of the form
$(x_1, x_2, \ldots, x_n)$, where $x_i$ denotes the object taken out on the i-th occasion.
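The two-pen case with replacement can be checked by brute-force enumeration. The following is a minimal Python sketch, assuming every ordered pair is equally likely; the names `pens`, `sample_space` and `at_least_one_red` are illustrative and do not come from the notes.

```python
from fractions import Fraction
from itertools import product

pens = ["R", "G", "B", "P"]  # the four pen colours

# Sampling with replacement: every ordered pair, repeats allowed.
sample_space = ["".join(pair) for pair in product(pens, repeat=2)]

# Event: at least one red pen among the two draws.
at_least_one_red = [outcome for outcome in sample_space if "R" in outcome]

prob = Fraction(len(at_least_one_red), len(sample_space))
print(len(sample_space), len(at_least_one_red), prob)  # 16 7 7/16
```

Changing `repeat=2` to another value of n lists the ordered n-tuples for longer samples.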
Sampling without replacement means that we choose a pen but do not put it back,
so that our final selection cannot include two pens of the same colour. In this case, the
sample space for choosing two pens is
{RG, RB, RP,
 GR, GB, GP,
 BR, BG, BP,
 PR, PG, PB}
and the event "at least one red pen" is {RG, RB, RP, GR, BR, PR}, with probability
6/12 = 1/2.
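The same check works without replacement. This is a sketch under the assumption that all ordered pairs of distinct pens are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

pens = ["R", "G", "B", "P"]

# Sampling without replacement, order recorded: ordered pairs of distinct pens.
ordered_space = ["".join(pair) for pair in permutations(pens, 2)]

# Event: at least one red pen.
red_event = [outcome for outcome in ordered_space if "R" in outcome]

prob_ordered = Fraction(len(red_event), len(ordered_space))
print(len(ordered_space), len(red_event), prob_ordered)  # 12 6 1/2
```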
Now there is another issue, depending on whether we care about the order in which
the pens are chosen. We will only consider this in the case of sampling without
replacement. Sometimes it doesn't really matter whether we choose the pens one at a
time or simply take two pens out of the drawer; we are not always interested in which
pen was chosen first. If we are not interested then the sample space is
{{R, G}, {R, B}, {R, P}, {G, B}, {G, P}, {B, P}},
containing six elements. (Each element is written as a set since, in a set, we don't
care which element is first, only which elements are actually present. So the sample
space is a set of sets!) The event "at least one red pen" is {{R, G}, {R, B}, {R, P}},
with probability 3/6 = 1/2. We should not be surprised that this is the same as in the
previous case.
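One way to see why the two answers agree is that each unordered pair corresponds to exactly 2! = 2 ordered pairs, so the event and the sample space shrink by the same factor. The sketch below checks this by enumeration; it assumes equally likely outcomes, and the names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations

pens = ["R", "G", "B", "P"]

# Unordered samples without replacement: 2-element subsets of the pens.
unordered_space = [frozenset(pair) for pair in combinations(pens, 2)]
red_event = [subset for subset in unordered_space if "R" in subset]
prob_unordered = Fraction(len(red_event), len(unordered_space))

# Count how many ordered pairs collapse onto each unordered pair.
orderings = Counter(frozenset(pair) for pair in permutations(pens, 2))

print(len(unordered_space), prob_unordered, set(orderings.values()))  # 6 1/2 {2}
```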
If order is important, the sample space S still consists of ordered n-tuples of the
form $(x_1, x_2, \ldots, x_n)$, where $x_i$ is the object taken out on the i-th occasion,
but now all of the $x_i$ must be different. If order is not important
then S consists of all subsets of size n of the set of N objects.
There are formulae for the sample space size in these three cases. These involve
the following expressions:
\[ N! = N(N-1)(N-2)\cdots 1 \]
\[ {}^{N}P_n = N(N-1)(N-2)\cdots(N-n+1) \]
\[ {}^{N}C_n = {}^{N}P_n / n! \]
Note that N! is the product of all the whole numbers from 1 to N; and
\[ {}^{N}P_n = \frac{N!}{(N-n)!}, \]
so that
\[ {}^{N}C_n = \frac{N!}{n!\,(N-n)!}. \]
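As a worked check, take the pens with N = 4 and n = 2. The correspondence assumed here is inferred from the examples in these notes rather than stated at this point: $N^n$ for sampling with replacement, ${}^{N}P_n$ for ordered samples without replacement, and ${}^{N}C_n$ for unordered samples without replacement.

\[ N^n = 4^2 = 16, \qquad {}^{4}P_2 = 4 \cdot 3 = 12, \qquad {}^{4}C_2 = \frac{{}^{4}P_2}{2!} = \frac{12}{2} = 6. \]

These agree with the 16, 12 and 6 outcomes listed for the three two-pen sample spaces.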
Example Ten coins are tossed: each is equally likely to come down heads or tails.
What is the probability that we get exactly three heads?
This is equivalent to sampling from {H, T} with replacement, so $|S| = 2^{10} = 1024$.
Let A be the event "exactly three heads". Then |A| is equal to the number of ways
of choosing 3 things from 10 (the coins that come down heads), which is
\[ {}^{10}C_3 = \frac{10!}{3!\,7!} = \frac{10 \cdot 9 \cdot 8}{3 \cdot 2 \cdot 1} = 120. \]
If all outcomes are equally likely then $P(A) = 120/1024 = 15/128 \approx 0.117$.
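The count of 120 can be confirmed both by formula and by listing all 1024 head/tail sequences. This sketch assumes Python 3.8 or later (for `math.comb`); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

n_coins, n_heads = 10, 3

# Formula: choose which 3 of the 10 coins show heads, out of 2**10 outcomes.
p_formula = Fraction(comb(n_coins, n_heads), 2 ** n_coins)

# Brute force: enumerate every head/tail sequence and count those with 3 heads.
outcomes = list(product("HT", repeat=n_coins))
favourable = sum(1 for seq in outcomes if seq.count("H") == n_heads)
p_enumerated = Fraction(favourable, len(outcomes))

print(favourable, p_formula, float(p_formula))  # 120 15/128 0.1171875
```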
Example I have 10 coins in my pocket; 3 are copper and 7 are silver. I take out
4 coins, one after another. Let
D = 2 silver followed by 2 copper
E = all 4 are silver
F = 2 silver and 2 copper, in any order.
This is sampling without replacement.
For event D, the order matters, so we consider ordered samples. Then
$|S| = {}^{10}P_4 = 10 \cdot 9 \cdot 8 \cdot 7$. For event D we must choose an ordered
sample of 2 from the 7 silver coins followed by an ordered sample of 2 from the 3 copper
coins, so $|D| = {}^{7}P_2 \times {}^{3}P_2 = 7 \cdot 6 \cdot 3 \cdot 2$. Therefore
$P(D) = |D|/|S| = 1/20$.
For event E, we choose an ordered sample of 4 from the 7 silver coins, so
$|E| = {}^{7}P_4 = 7 \cdot 6 \cdot 5 \cdot 4$ and $P(E) = |E|/|S| = 1/6$.
Event F is like event D except that we have to choose which 2 of the 4 positions
should have the silver coins. There are ${}^{4}C_2$ ways of doing this, which is 6, so
$P(F) = 6P(D) = 6/20 = 3/10$.
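The three probabilities can be checked by listing every ordered draw of 4 coins. In this sketch the coins are given hypothetical labels (S0 to S6 for silver, C0 to C2 for copper) so that they are distinguishable, and every ordered draw is assumed equally likely.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical labels: 7 silver coins S0..S6 and 3 copper coins C0..C2.
coins = [f"S{i}" for i in range(7)] + [f"C{i}" for i in range(3)]

# Ordered samples of 4 coins without replacement: 10*9*8*7 = 5040 outcomes.
draws = list(permutations(coins, 4))

def pattern(draw):
    """Return the metal pattern of a draw, e.g. 'SSCC'."""
    return "".join(coin[0] for coin in draw)

def prob(event):
    """Probability of an event given as a predicate on the metal pattern."""
    hits = sum(1 for draw in draws if event(pattern(draw)))
    return Fraction(hits, len(draws))

p_D = prob(lambda p: p == "SSCC")        # 2 silver followed by 2 copper
p_E = prob(lambda p: p == "SSSS")        # all 4 silver
p_F = prob(lambda p: p.count("S") == 2)  # 2 silver and 2 copper, any order

print(p_D, p_E, p_F)  # 1/20 1/6 3/10
```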
If we didn't want to know about event D then we could use unordered samples.
Then $|S| = {}^{10}C_4$ and $|E| = {}^{7}C_4$, so
\[ P(E) = \frac{7!}{4!\,3!} \cdot \frac{6!\,4!}{10!} = \frac{1}{6}. \]
Also, $|F| = {}^{7}C_2 \times {}^{3}C_2$, because each choice of two silver coins can be
combined with each choice of two copper coins. Thus
\[ P(F) = \frac{7!}{2!\,5!} \cdot \frac{3!}{2!\,1!} \cdot \frac{4!\,6!}{10!} = \frac{3}{10}. \]
The results for E and F are the same for ordered and unordered samples, as they
should be.
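The unordered calculation reduces to three binomial coefficients, which can be evaluated directly. This is a small sketch assuming Python 3.8 or later for `math.comb`; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

silver, copper, drawn = 7, 3, 4

# Unordered samples of 4 coins from 10.
total = comb(silver + copper, drawn)

# E: all 4 coins silver. F: 2 silver combined with 2 copper.
p_E = Fraction(comb(silver, 4), total)
p_F = Fraction(comb(silver, 2) * comb(copper, 2), total)

print(total, p_E, p_F)  # 210 1/6 3/10
```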
Summary: In a sampling problem, you should first read the question carefully and
decide whether the sampling is with or without replacement. If it is without replacement, decide whether the sample is ordered (e.g. does the question say anything about
the first object drawn?). If so, then use the formula for ordered samples. If not,
then you can use either ordered or unordered samples, whichever is convenient; they
should give the same answer. (Usually it is easier to use unordered samples whenever
you can.) If the sample is with replacement, or if it involves throwing a die or coin
several times, then use the formula for sampling with replacement.
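The decision procedure in the summary can be sketched as a small helper. The function `sample_space_size` and its parameters are hypothetical, introduced only for illustration; it assumes the sizes $N^n$, ${}^{N}P_n$ and ${}^{N}C_n$ for the three cases, as used in the examples, and Python 3.8 or later for `math.perm` and `math.comb`.

```python
from math import comb, perm

def sample_space_size(N, n, replacement, ordered=True):
    """Number of equally likely outcomes when n objects are chosen from N."""
    if replacement:
        # The notes only treat ordered samples when sampling with replacement.
        if not ordered:
            raise ValueError("unordered sampling with replacement is not covered")
        return N ** n
    # Without replacement: ordered samples or unordered subsets.
    return perm(N, n) if ordered else comb(N, n)

print(sample_space_size(2, 10, replacement=True))                  # 1024
print(sample_space_size(10, 4, replacement=False, ordered=True))   # 5040
print(sample_space_size(10, 4, replacement=False, ordered=False))  # 210
```

The three calls reproduce the sample space sizes from the ten-coin example and from the ordered and unordered treatments of the pocket-coin example.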