STAT2011 Week3 2024
(Begin of Lecture 3.1 )
Recall the classical definition of probability. If there are n equally likely ways to perform a certain operation and a total of m of those satisfy some stated condition A, then

$$P(A) \overset{\text{Def}}{=} \frac{m}{n}.$$
Example 2.3.1 (Urn model). N = 8 chips, numbered 1, . . . , 8; sample k = 3 without replacement. Let A = 'chip 5 is selected together with two chips numbered 1–4' (i.e. 5 is the largest number drawn). Determine P(A).
The situation is visualised in Figure 2.7.1 of Larsen and Marx (2012).
Then,

• $n = \binom{8}{3} = 56$ (in R: choose(8,3), or equivalently factorial(8)/(factorial(3)*factorial(5)))

• $m = \binom{1}{1}\binom{4}{2} = 6$ (chip 5 must be selected, together with two other chips from the subpopulation 1–4)
Thus,

$$P(A) = \frac{\binom{1}{1}\binom{4}{2}}{\binom{8}{3}} = \frac{6}{56} \approx 0.11. \;\Box$$
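A quick numerical check in R (the simulation below is our own sketch; the seed and number of replications are arbitrary choices):

    # Exact: chip 5 together with two chips from {1,...,4}
    choose(1, 1) * choose(4, 2) / choose(8, 3)   # 6/56 = 0.107

    # Monte Carlo check: 3 chips sampled without replacement
    set.seed(2011)
    hits <- replicate(1e5, max(sample(1:8, 3)) == 5)
    mean(hits)                                   # close to 6/56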
Example 2.3.2 (Urn model, continued). N = 3n chips

• n red, numbered $r_1, \ldots, r_n$

• n white, numbered $w_1, \ldots, w_n$

• n blue, numbered $b_1, \ldots, b_n$

Sample k = 2 chips without replacement. Let $A_1$ = 'both chips have the same colour' and $A_2$ = 'both chips bear the same number'.

Note, $A_1$ and $A_2$ are mutually exclusive (recall, mutually exclusive events are not independent). Therefore, from Axiom 3,

$$P(A_1 \cup A_2) = P(A_1) + P(A_2).$$

Now, $P(A_1) = P(\text{two reds} \cup \text{two whites} \cup \text{two blues}) = 3\binom{n}{2}\big/\binom{3n}{2}$, since these three events are mutually exclusive. Similarly,

$$P(A_2) = P(\text{two 1's} \cup \ldots \cup \text{two n's}) = n\binom{3}{2}\Big/\binom{3n}{2};$$

these are mutually exclusive (why?). Thus,

$$P(A_1 \cup A_2) = \frac{3\binom{n}{2} + n\binom{3}{2}}{\binom{3n}{2}} = \frac{n+1}{3n-1}.$$
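In R, a quick check that the two expressions agree (our own sketch):

    n <- 2:10
    lhs <- (3 * choose(n, 2) + n * choose(3, 2)) / choose(3 * n, 2)
    rhs <- (n + 1) / (3 * n - 1)
    all.equal(lhs, rhs)   # TRUE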
Example 2.3.3 (Birthday problem). Select k people (at random) from the general pop-
ulation.
What are the chances that at least two of those k were born on the same day?
Assumptions: no birthday on 29 February, all birthdays are equally likely (plausible?)
Picture the k individuals as an ordered sequence of birthdays.
Using the multiplication rule, there are $365^k$ equally likely sequences. Then consider
A = 'at least two people have the same birthday'

$A^C$ = 'no two people have the same birthday'

A is difficult to enumerate but $A^C$ is much easier:
$$P(A) = 1 - P(A^C) = \frac{365^k - {}_{365}P_k}{365^k}, \qquad {}_{365}P_k = 365 \cdot 364 \cdots (365 - k + 1).$$
Thus,
k P (A)
15 0.253
22 0.476
40 0.891
50 0.970
70 0.999
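These values can be reproduced in R; working on the log scale avoids overflow of 365^k. (Base R also ships pbirthday() for exactly this calculation.)

    p_match <- function(k) 1 - exp(sum(log(365:(365 - k + 1))) - k * log(365))
    sapply(c(15, 22, 40, 50, 70), p_match)   # 0.253 0.476 0.891 0.970 0.999
    pbirthday(22)                            # same answer, built in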
Relaxed assumptions: all birthdays are equally likely, except 29 February, which is four
times less likely
We could proceed thinking about
L = # of people in the sample of size k with birthday on 29/2

Then,

$$P(A) = \sum_{l=0}^{k} P(A \mid L = l)\, P(L = l).$$
Of course, if L ≥ 2 then at least two people have their birthday on the same day, so the calculation of $P(A^C)$ will simplify considerably and is left as an exercise. □
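A Monte Carlo sketch of the relaxed model (our own illustration, not the exercise's exact answer): if each ordinary day has probability p and 29 February has p/4, then 365p + p/4 = 1 gives p = 4/1461.

    set.seed(2011)
    prob <- c(rep(4, 365), 1) / 1461   # day 366 represents 29 February
    same_bday <- function(k) {
      any(duplicated(sample(366, k, replace = TRUE, prob = prob)))
    }
    mean(replicate(1e5, same_bday(22)))   # estimate of P(A) for k = 22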
(Begin of Lecture 3.2 )
3 Random variables
3.1 Introduction
So far probabilities were assigned to events – that is to sets of sample outcomes.
Events were restricted to either a finite or a countably infinite number of sample outcomes.
One particular probability function encountered was the assignment of 1/n as the prob-
ability associated with each of the n points in a finite sample space.
This will now be generalised using the concept of random variables.
Example 3.1.1. A medical researcher tests 8 patients for their allergic reaction (yes/no) to a new drug.

• A sample outcome is a sequence of 8 responses, e.g. s = (y, n, n, y, n, n, y, n).

• Let X = number of patients who show an allergic reaction.

• Then X is said to be a random variable and the number x = 3 is the value of the random variable for the outcome (y, n, n, y, n, n, y, n). □
In general, random variables (RVs) are functions that associate numbers in R with some
attribute of a sample outcome that is deemed to be especially important.
for $s \in S$: $X(s) = t_s = t \in \mathbb{R}$, or $X : S \mapsto \mathbb{R}$. □
Example (continued). The RV X has nine possible values (reduced from the $2^8 = 256$ sample outcomes), the integers 0, 1, . . . , 8. □
All RVs fall into one of two broad categories:
Definition 3.2.1. A function whose domain is a sample space S and whose values form a finite or countably infinite set of real numbers is called a discrete random variable. □
For example, roll two fair dice and let X denote the sum of the two faces. The table below lists X for all 36 equally likely outcomes (rows: first die, columns: second die):

    + | 1  2  3  4  5  6
    --+-----------------
    1 | 2  3  4  5  6  7
    2 | 3  4  5  6  7  8
    3 | 4  5  6  7  8  9
    4 | 5  6  7  8  9 10
    5 | 6  7  8  9 10 11
    6 | 7  8  9 10 11 12

Thus

    k          2     3     4     5     6     7     8     9     10    11    12
    P(X = k)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
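The pmf can be generated in R directly from the 36 outcomes:

    sums <- outer(1:6, 1:6, `+`)   # the 6 x 6 grid of sums above
    table(sums) / 36               # P(X = k) for k = 2, ..., 12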
Let $A \subset S$ be any event. Then

$$P(A) = \sum_{s \in A} p(s)$$

(where $p(s)$ denotes the probability assigned to the sample outcome s) induces a probability function that satisfies the axioms of probability (proof left as an exercise).
Example 3.2.2 (Geometric distribution). Toss a coin until first head is observed. What
is the probability that this happens on an odd-numbered toss?
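A worked version of the calculation, assuming a fair coin and independent tosses (so the first head occurs on toss j with probability $(1/2)^j$):

$$P(\text{first head on an odd toss}) = \sum_{j=0}^{\infty} \left(\frac{1}{2}\right)^{2j+1} = \frac{1}{2}\sum_{j=0}^{\infty}\left(\frac{1}{4}\right)^{j} = \frac{1/2}{1 - 1/4} = \frac{2}{3}.$$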
The cdf $F_X(t) = P(X \le t)$ has many properties (proofs left as an exercise for now), such as: $F_X$ is non-decreasing; $\lim_{t \to -\infty} F_X(t) = 0$ and $\lim_{t \to \infty} F_X(t) = 1$; and $F_X$ is right-continuous.
(Begin of Lecture 3.3 )
Example 3.2.3 (Roulette). The following picture is from Wikimedia (retrieved on 1/2/2018) and shows the American roulette board.

There are n = 38 possible numbers: 18 are odd, 18 are even (zero does not count as even), and two (0 and 00) are neither odd nor even.
Bet $1 on 'odd' and let X denote the winnings. Thus,

$$\text{“Expected” winnings} = E(X) = (\$1)\frac{9}{19} + (-\$1)\frac{10}{19} = -5 \text{ cents.}$$
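In R, the expectation is a one-liner:

    sum(c(1, -1) * c(18, 20) / 38)   # -0.0526, about -5 cents per $1 bet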
See also Figure 3.5.2 in Larsen and Marx (2012).
Definition 3.2.4 (Expected value of a discrete RV X). Let discrete RV X have probability mass function (pmf) $p_X(k)$. The expected value of X, denoted by E(X) (or sometimes µ or $\mu_X$), is given by

$$E(X) = \sum_{\text{all } k} k \cdot p_X(k). \;\Box$$

(The above equation shows that the $p_X(k)$'s are weights when taking a weighted average of the k's.)
Comment. We assume that the sum in the definition above converges absolutely, that is,

$$\sum_{\text{all } k} |k|\, p_X(k) < \infty.$$
Example 3.2.4 (Equally likely outcomes). Here $p_X(k) = 1/n$ for all $k \in \mathcal{X}$ with $\#\mathcal{X} = n$. Then,

$$E(X) \overset{\text{Def}}{=} \sum_{\text{all } k} k \cdot p_X(k) = \sum_{\text{all } k} k \cdot \frac{1}{n} = \frac{1}{n} \sum_{\text{all } k} k,$$

the arithmetic mean of the possible values.
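For instance, for one fair die (n = 6), E(X) is simply the average of the faces. In R:

    k <- 1:6
    sum(k * (1 / 6))   # 3.5
    mean(k)            # the same: (1/n) * sum of the k's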
Definition 3.2.5 (Median of a discrete RV). If X is a discrete RV, the median m is that point for which

$$P(X < m) = P(X > m)$$

(or any point m that minimizes $|P(X < m) - P(X > m)|$). □

(For symmetric pmf's, E(X) and the median coincide; otherwise E(X) is drawn towards the longer tail, whereas the median 'stays'.)
3.2.3 Functions of RVs and their expected values
Let X, X1 , X2 , X3 , . . . denote RVs. We often want to learn something about their func-
tions.
Example 3.2.5.

b. $T_n = X_1 + X_2 + \ldots + X_n$

c. $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i = T_n/n$

d. $Y_2 = X^2$ □

Theorem 3.2.1. Let X be a discrete RV with pmf $p_X(k)$ and let g be a function. Then

$$E[g(X)] = \sum_{\text{all } k} g(k)\, p_X(k),$$

provided that $\sum_{\text{all } k} |g(k)|\, p_X(k) < \infty$.
Proof of Theorem 3.2.1. Let W = g(X). The set of k-values $\mathcal{X} = \{k_1, k_2, \ldots\}$ will give rise to a set of w-values $\mathcal{T} = \{w_1, w_2, \ldots\}$, where in general more than one of the k's may be associated with a given w.

Let $\mathcal{X}_j$ be the set of k's for which $g(k) = w_j$. Then $\cup_{\text{all } j}\, \mathcal{X}_j = \mathcal{X}$. Clearly, $P(W = w_j) = P(X \in \mathcal{X}_j)$. Thus,

$$E(W) \overset{\text{Def}}{=} \sum_{\text{all } j} w_j P(W = w_j) = \sum_{\text{all } j} w_j P(X \in \mathcal{X}_j) = \sum_{\text{all } j} w_j \sum_{k \in \mathcal{X}_j} p_X(k) = \sum_{\text{all } j} \sum_{k \in \mathcal{X}_j} w_j\, p_X(k) = \sum_{\text{all } j} \sum_{k \in \mathcal{X}_j} g(k)\, p_X(k) = \sum_{\text{all } k} g(k)\, p_X(k). \;\Box$$
Corollary 3.2.2. For any RV X, $E(aX + b) = a\,E(X) + b$. □
Example. Let X have pmf $p_X(-2) = 5/8$, $p_X(1) = 1/8$, $p_X(2) = 2/8$, and let $W = g(X) = X^2$, so that W takes the values 1 and 4:

    k    pX(k)        w    pW(w)
    -2   5/8          1    1/8
     1   1/8          4    7/8
     2   2/8

Note, $E(W) \overset{\text{Thm}}{=} \sum_{\text{all } k} g(k)\, p_X(k) = 4(\tfrac{5}{8}) + 1(\tfrac{1}{8}) + 4(\tfrac{2}{8}) = \tfrac{29}{8}$.

Also, $E(W) \overset{\text{Def}}{=} \sum_{\text{all } w} w\, p_W(w) = 1(\tfrac{1}{8}) + 4(\tfrac{7}{8}) = \tfrac{29}{8}$. □
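Verifying both routes to E(W) in R:

    k  <- c(-2, 1, 2); pX <- c(5, 1, 2) / 8
    sum(k^2 * pX)                     # Theorem 3.2.1: 29/8 = 3.625
    w  <- c(1, 4);     pW <- c(1, 7) / 8
    sum(w * pW)                       # Definition 3.2.4: same answer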
3.2.4 Variance
Location (e.g. expected value) is uninformative about the dispersion (spread) of RVs.
There are several ways to measure dispersion, for example through
Definition 3.2.6. The variance of a RV X is the expected value of its squared deviation from $E(X) = \mu$,

$$\text{Var}(X) = E[(X - \mu)^2],$$

and is only defined when $E(X^2)$ is finite. □
Theorem 3.2.3. Let X be any RV having mean µ and for which $E(X^2)$ is finite. Then,

$$\text{Var}(X) = E(X^2) - \mu^2. \;\Box$$
Proof of Theorem 3.2.3. Let $g(X) = (X - \mu)^2$. Then

$$\text{Var}(X) = E[(X-\mu)^2] = \sum_{\text{all } k} g(k)\, p_X(k) = \sum_{\text{all } k} (k-\mu)^2 p_X(k)$$
$$= \sum_{\text{all } k} k^2 p_X(k) - \sum_{\text{all } k} 2k\mu\, p_X(k) + \sum_{\text{all } k} \mu^2 p_X(k)$$
$$= E(X^2) - 2\mu \sum_{\text{all } k} k\, p_X(k) + \mu^2 \sum_{\text{all } k} p_X(k)$$
$$= E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2. \;\Box$$
Theorem 3.2.4. Let X be any RV with finite $\mu = E(X)$ and finite $E(X^2)$. Consider $Z = aX + b$. Then

$$\text{Var}(Z) = a^2\, \text{Var}(X). \;\Box$$
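A numerical check of Theorems 3.2.3 and 3.2.4 in R, reusing the pmf from the example above (the constants a and b are arbitrary):

    k  <- c(-2, 1, 2); pX <- c(5, 1, 2) / 8
    mu   <- sum(k * pX)
    varX <- sum(k^2 * pX) - mu^2           # Theorem 3.2.3
    a <- 3; b <- 10
    z <- a * k + b                         # Z = aX + b has pmf pX on these points
    varZ <- sum(z^2 * pX) - sum(z * pX)^2
    all.equal(varZ, a^2 * varX)            # TRUE (Theorem 3.2.4)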
One unfortunate consequence of the definition of the variance is that its units are the square of the units of the RV. Therefore, in statistics one often works with the square root of the variance, the standard deviation:

$$\text{SD}(X) = \sigma_X = \sqrt{\text{Var}(X)}.$$
References

Larsen RJ, Marx ML (2012). An Introduction to Mathematical Statistics and Its Applications, 5th edition. Boston: Pearson. Sections 2.7, 3.3, 3.5, 3.6.