STAT2011 Week1 2024
Semester 1 – 2024
STAT 2011:
Probability and Estimation Theory
Lecturer
Dr Clara Grazian
School of Mathematics & Statistics F07
University of Sydney
References 11
1.5 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 Combinatorics 17
2.1 Counting permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Counting combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
References 24
2.3 Combinatorial probability . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4 Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3 Random variables 28
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 Discrete random variables . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.1 Expected value . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.2 Other measures of location . . . . . . . . . . . . . . . . . . . . . . 32
3.2.3 Functions of RVs and their expected values . . . . . . . . . . . . . 33
3.2.4 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
References 35
3.2.5 Binomial random variables . . . . . . . . . . . . . . . . . . . . . . 36
3.2.6 Hypergeometric random variables . . . . . . . . . . . . . . . . . . 41
3.2.7 Poisson random variables . . . . . . . . . . . . . . . . . . . . . . . 44
References 48
3.2.8 Geometric random variables . . . . . . . . . . . . . . . . . . . . . 49
3.2.9 Negative binomial distribution random variables . . . . . . . . . . 52
3.3 Continuous random variables . . . . . . . . . . . . . . . . . . . . . . . . 54
References 59
3.3.1 Expected value . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.3.2 Other measures of location . . . . . . . . . . . . . . . . . . . . . . 62
3.3.3 Functions of RVs and their expected values . . . . . . . . . . . . . 62
3.3.4 The variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.3.5 Normal random variables . . . . . . . . . . . . . . . . . . . . . . . 65
3.3.6 Gamma random variables . . . . . . . . . . . . . . . . . . . . . . 70
References 74
3.4 Joint densities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.4.1 Discrete joint pmf’s . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.4.2 Continuous joint pdf’s . . . . . . . . . . . . . . . . . . . . . . . . 79
3.4.3 Joint cdf’s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.5 Transforming and combining RVs . . . . . . . . . . . . . . . . . . . . . . 85
3.5.1 Sums of RVs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
References 88
3.6 Further properties of the mean and variance . . . . . . . . . . . . . . . . 89
3.7 Order Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
References 100
3.8 Conditional Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.8.1 Conditional pmfs for discrete RVs . . . . . . . . . . . . . . . . . . 101
3.8.2 Conditional pdfs for continuous RVs . . . . . . . . . . . . . . . . 103
3.9 Moment-Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . 105
3.9.1 Using moment-generating function to find moments . . . . . . . . 107
3.9.2 Using moment-generating function to find variances . . . . . . . . 108
3.10 Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
References 112
4 Estimation 113
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.2 Point estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
4.2.1 Method of maximum likelihood estimation . . . . . . . . . . . . . 117
4.2.2 Method of moments . . . . . . . . . . . . . . . . . . . . . . . . . 122
References 125
4.3 Interval Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.3.1 Confidence intervals for the mean parameter µ of a normal distri-
bution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.3.2 Confidence intervals for the binomial parameter p . . . . . . . . . 130
4.3.3 Choosing sample sizes . . . . . . . . . . . . . . . . . . . . . . . . 131
References 132
4.4 Properties of estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.4.1 Unbiasedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.4.2 Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
4.4.3 Minimum-Variance Estimator: The Cramer-Rao Lower Bound . . 139
References 141
(Begin of Lecture 1.1)
1 Probability
1.1 Introduction
Probability is everywhere
• precision medicine
• climate change
• genetics
• finance
• sports
The definition of probability had to evolve because the problems being studied became increasingly complex mathematically.
Classical probability concept
Example 1.1.1. A fair die is rolled. Let A = "an even number is thrown". With m favourable outcomes among n equally likely outcomes,
$$P(A) = \frac{m}{n} = \frac{\#\{2, 4, 6\}}{\#\{1, 2, 3, 4, 5, 6\}} = \frac{3}{6} = \frac{1}{2}$$ □
Empirical probability concept
(Richard Von Mises, 1883-1953, Ukraine)
Example 1.1.2. A fair coin is tossed n times. Let $m_n$ = # of H(eads) in the n throws, and let A = {H}. Then
$$P(A) = \lim_{n \to \infty} \frac{m_n}{n}$$
(This definition has limitations because probabilities will always end up being approximations.)
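To illustrate, here is a minimal simulation sketch in Python (the function name and the fixed seed are ours, purely for illustration): the relative frequency $m_n/n$ settles near 1/2 as n grows, yet any finite run only approximates P(A).

import random

def relative_frequency_of_heads(n, seed=2024):
    """Toss a fair coin n times; return m_n / n, the relative frequency of heads."""
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    m_n = sum(rng.random() < 0.5 for _ in range(n))
    return m_n / n

# The approximation improves with n but never yields P(A) = 1/2 exactly.
for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9}: m_n/n = {relative_frequency_of_heads(n):.4f}")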
Enter Andrei Kolmogorov (1903-1987, Russia) and the axiomatic definition of probability. (We will see this in much more detail in future lectures.)
Recall,
1.2 Sample spaces and the algebra of sets
(We begin with some basic definitions, without numbering them.)
The number of sample outcomes associated with an experiment need not be finite.
Example 1.2.2 (Flipping a coin until the first tail).
$$S = \{T, HT, HHT, HHHT, HHHHT, \ldots\} \;\Rightarrow\; \#S = \infty$$ □
(In this example, all possible sample outcomes were characterized.)
(Begin of Lecture 1.2)
Example 1.2.4.
• S = {1, 2, 3, 4, 5, 6}
• A = {2, 4, 6}
• B = {1, 2, 3}
• Note, A ⊂ S and B ⊂ S □
Example (continued).
Definition 1.2.2 (Mutually exclusive events). A and B are mutually exclusive iff A ∩ B = ∅, the null or empty set. □
Theorem (De Morgan's laws; not in textbook, will be proved in the Tutorial of Week 2).
For any two events A and B:
$$(A \cap B)^C = A^C \cup B^C$$
$$(A \cup B)^C = A^C \cap B^C$$ □
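Although the proof is deferred to the tutorial, the laws are easy to verify numerically on a finite sample space. A small Python sketch using the sets of Example 1.2.4:

# Numerical check of De Morgan's laws on the sets of Example 1.2.4.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {1, 2, 3}

def complement(E):
    """Complement of the event E relative to the sample space S."""
    return S - E

assert complement(A & B) == complement(A) | complement(B)  # (A ∩ B)^C = A^C ∪ B^C
assert complement(A | B) == complement(A) & complement(B)  # (A ∪ B)^C = A^C ∩ B^C
print("Both of De Morgan's laws hold for these sets.")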
These notions extend to more than two events:
$$\bigcup_{i=1}^{k} A_i = A_1 \cup A_2 \cup \ldots \cup A_k = \{s \in S : \exists\, i \text{ s.t. } s \in A_i\}$$
$$\bigcap_{i=1}^{k} A_i = A_1 \cap A_2 \cap \ldots \cap A_k = \{s \in S : s \in A_i \ \forall\, i = 1, \ldots, k\}$$
Axiom 1. For any event A, P(A) ≥ 0.
Axiom 2. P(S) = 1.
Axiom 3. Let A and B be any two mutually exclusive events defined over S. Then
$$P(A \cup B) = P(A) + P(B).$$
Axiom 3'. Let $A_1, A_2, \ldots$ be an infinite sequence of mutually exclusive events defined over S. Then
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).$$
(In Larsen and Marx (2012), Axiom 3' is called Axiom 4, but the above is indeed the proper formulation of the axiomatic definition of probability: it requires exactly three axioms.)
Theorem 1.3.1 (Basic properties of the probability function).
(i) $P(A^C) = 1 - P(A)$
(ii) $P(\emptyset) = 0$
(iii) If $A \subset B$, then $P(A) \le P(B)$
(iv) For any event $A$, $P(A) \le 1$
(v) If $A_1, A_2, \ldots, A_n$ are mutually exclusive, then $P(A_1 \cup \ldots \cup A_n) = \sum_{i=1}^{n} P(A_i)$
(vi) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Selected proofs:
(iv) The proof follows immediately from (iii) because $A \subset S$ and $P(S) = 1$.
(vi) The Venn diagram for $A \cup B$ certainly suggests that the statement is true. More formally, from Axiom 3,
• $A = (A \cap B) \cup (A \cap B^C)$, and those two events are mutually exclusive.
Thus,
$$P(A) = P(A \cap B) + P(A \cap B^C)$$
and
$$P(A \cap B^C) + P(B \cap A^C) = 0.7$$ □
(Begin of Lecture 1.3)
We observe that knowing B has occurred is like shrinking the sample space from S to B, the conditional sample space.
More formally:
Let S be a finite sample space with n equally likely outcomes. Assume A and B are two
events with a and b outcomes, respectively, and let c = #{s ∈ A ∩ B}.
(This is Figure 2.4.2 in Larsen and Marx (2012).)
Note
$$P(A|B) = \frac{c}{b} = \frac{c/n}{b/n} = \frac{P(A \cap B)}{P(B)}$$
This motivates the following definition.
Definition 1.4.1 (Conditional probability). Let A, B ⊂ S such that P(B) > 0. The conditional probability of A given B is defined as
$$P(A|B) = \frac{P(A \cap B)}{P(B)} \quad \Rightarrow \quad P(A \cap B) = P(A|B)\,P(B)$$ □
Example 1.4.2 (Two boys problem). Consider the set of all families having exactly two children who are not twins. What is the probability of A = "both children are boys" given the event B = "at least one child is a boy"?
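A sketch of the standard solution: recording the children in birth order gives four equally likely outcomes, so
$$S = \{BB,\, BG,\, GB,\, GG\}, \qquad B = \{BB,\, BG,\, GB\}, \qquad A \cap B = \{BB\}$$
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{3/4} = \frac{1}{3}$$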
Definition 1.4.2. A set of events $A_1, A_2, \ldots, A_n$ is a partition of S if every s ∈ S belongs to one and only one of the $A_i$'s. □
Theorem 1.4.1 (Law of total probability). Let $A_1, A_2, \ldots, A_n$ be a partition of S with $P(A_i) > 0$ for each i. Then, for any event B,
$$P(B) = \sum_{i=1}^{n} P(B|A_i)\,P(A_i).$$
Proof of Theorem 1.4.1. Note that B can be written as the following union of mutually exclusive events:
$$B = (B \cap A_1) \cup (B \cap A_2) \cup \ldots \cup (B \cap A_n)$$
Thus,
$$P(B) = \sum_{i=1}^{n} P(B \cap A_i) = \sum_{i=1}^{n} P(B|A_i)\,P(A_i)$$ □
Example 1.4.3 (Standard poker deck). Cards are shuffled and the top card is removed. What is the probability of
B = "second card is an ace"?
Let $A_1$ = "top card was an ace" and $A_2 = A_1^C$.
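Then, by Theorem 1.4.1 (with 3 or 4 aces left among the 51 remaining cards, depending on whether the top card was an ace):
$$P(B) = P(B|A_1)\,P(A_1) + P(B|A_2)\,P(A_2) = \frac{3}{51} \cdot \frac{4}{52} + \frac{4}{51} \cdot \frac{48}{52} = \frac{204}{2652} = \frac{1}{13}$$
This matches the symmetry argument: with no information revealed about the top card, the second card is an ace with probability 4/52 = 1/13.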
Theorem 1.4.2 (Bayes' theorem). Let $A_1, A_2, \ldots, A_n$ be a partition of S with $P(A_i) > 0$ for each i, and let B be an event with $P(B) > 0$. Then
$$P(A_j|B) = \frac{P(B|A_j)\,P(A_j)}{\sum_{i=1}^{n} P(B|A_i)\,P(A_i)}.$$
Proof of Theorem 1.4.2. Follows from the definition of conditional probability and using Theorem 1.4.1, that is,
$$P(A_j|B) \overset{\text{Def}}{=} \frac{P(A_j \cap B)}{P(B)} \overset{\text{Thm}}{=} \frac{P(B|A_j)\,P(A_j)}{\sum_{i=1}^{n} P(B|A_i)\,P(A_i)}$$ □
Example 1.4.4. During a power blackout, 100 persons are arrested on suspicion of
looting. Each is given a polygraph test. From past experience it is known that the
polygraph is 90% reliable when administered to a guilty suspect and 98% reliable when
given to someone who is innocent. Suppose that of the one hundred persons taken into
custody, only twelve were actually involved in any wrongdoing. What is the probability
that a given suspect is innocent given that the polygraph says he is guilty?
Let
• $A_1$ = "suspect is guilty", so $P(A_1) = 12/100$
• $A_2 = A_1^C$ = "suspect is innocent", so $P(A_2) = 88/100$
• B = "polygraph says the suspect is guilty", so that
$$P(B|A_1) = 0.90 \quad \text{and} \quad P(B|A_2) = 0.02.$$
Substituting into Theorem 1.4.2 then shows that the probability a suspect is innocent given that the polygraph says he is guilty is approximately 0.14:
$$P(A_2|B) = \frac{P(B|A_2)\,P(A_2)}{P(B|A_1)\,P(A_1) + P(B|A_2)\,P(A_2)} = \frac{(0.02)(88/100)}{(0.90)(12/100) + (0.02)(88/100)} \approx 0.14$$
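A quick numerical check of this example (a Python sketch; the variable names are ours):

# Bayes' theorem check for the polygraph example (Example 1.4.4).
p_a1 = 12 / 100      # P(A1): suspect is guilty
p_a2 = 88 / 100      # P(A2): suspect is innocent
p_b_given_a1 = 0.90  # polygraph says "guilty" for a guilty suspect
p_b_given_a2 = 0.02  # polygraph says "guilty" for an innocent suspect

p_b = p_b_given_a1 * p_a1 + p_b_given_a2 * p_a2  # Theorem 1.4.1
p_a2_given_b = p_b_given_a2 * p_a2 / p_b         # Theorem 1.4.2
print(f"P(A2|B) = {p_a2_given_b:.4f}")           # prints 0.1401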
References
Larsen RJ, Marx ML (2012). An Introduction to Mathematical Statistics and Its Applications, 5th Edition. Boston: Pearson. Chapters 2.1-2.4.