
School of Mathematics and Statistics, University of Sydney

Semester 1 – 2024

Lecture Notes, 2024

STAT 2011:
Probability and Estimation Theory

Lecturer
Dr Clara Grazian
School of Mathematics & Statistics F07
University of Sydney

For any questions related to the course, please write to

[email protected]
Contents
1 Probability 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Sample spaces and the algebra of sets . . . . . . . . . . . . . . . . . . . . 3
1.3 The probability function . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Conditional probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

References 11
1.5 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2 Combinatorics 17
2.1 Counting permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Counting combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

References 24
2.3 Combinatorial probability . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4 Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3 Random variables 28
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 Discrete random variables . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.1 Expected value . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.2 Other measures of location . . . . . . . . . . . . . . . . . . . . . . 32
3.2.3 Functions of RVs and their expected values . . . . . . . . . . . . . 33
3.2.4 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

References 35
3.2.5 Binomial random variables . . . . . . . . . . . . . . . . . . . . . . 36
3.2.6 Hypergeometric random variables . . . . . . . . . . . . . . . . . . 41
3.2.7 Poisson random variables . . . . . . . . . . . . . . . . . . . . . . . 44

References 48
3.2.8 Geometric random variables . . . . . . . . . . . . . . . . . . . . . 49
3.2.9 Negative binomial distribution random variables . . . . . . . . . . 52
3.3 Continuous random variables . . . . . . . . . . . . . . . . . . . . . . . . 54

References 59
3.3.1 Expected value . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.3.2 Other measures of location . . . . . . . . . . . . . . . . . . . . . . 62
3.3.3 Functions of RVs and their expected values . . . . . . . . . . . . . 62
3.3.4 The variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.3.5 Normal random variables . . . . . . . . . . . . . . . . . . . . . . . 65
3.3.6 Gamma random variables . . . . . . . . . . . . . . . . . . . . . . 70

References 74
3.4 Joint densities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.4.1 Discrete joint pmf’s . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.4.2 Continuous joint pdf’s . . . . . . . . . . . . . . . . . . . . . . . . 79
3.4.3 Joint cdf’s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.5 Transforming and combining RVs . . . . . . . . . . . . . . . . . . . . . . 85
3.5.1 Sums of RVs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

References 88
3.6 Further properties of the mean and variance . . . . . . . . . . . . . . . . 89
3.7 Order Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

References 100
3.8 Conditional Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.8.1 Conditional pmfs for discrete RVs . . . . . . . . . . . . . . . . . . 101
3.8.2 Conditional pdfs for continuous RVs . . . . . . . . . . . . . . . . 103
3.9 Moment-Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . 105
3.9.1 Using moment-generating function to find moments . . . . . . . . 107
3.9.2 Using moment-generating function to find variances . . . . . . . . 108
3.10 Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

References 112

4 Estimation 113
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.2 Point estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
4.2.1 Method of maximum likelihood estimation . . . . . . . . . . . . . 117
4.2.2 Method of moments . . . . . . . . . . . . . . . . . . . . . . . . . 122

References 125
4.3 Interval Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.3.1 Confidence intervals for the mean parameter µ of a normal distri-
bution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.3.2 Confidence intervals for the binomial parameter p . . . . . . . . . 130
4.3.3 Choosing sample sizes . . . . . . . . . . . . . . . . . . . . . . . . 131

References 132
4.4 Properties of estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.4.1 Unbiasedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.4.2 Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
4.4.3 Minimum-Variance Estimator: The Cramer-Rao Lower Bound . . 139

References 141

(Begin of Lecture 1.1 )

1 Probability
1.1 Introduction
Probability is everywhere

• precision medicine

• climate change

• genetics

• finance

• sports

• gambling ⇐ origin of classical notion of probability

The definition of probability had to evolve because problems became (mathematically)
increasingly complex.

Classical definition of probability


(Gerolamo Cardano, 1501–1576, Italy, gambler)

(1) Number, n, of possible outcomes is finite

(2) All n outcomes are equally likely

Then, for an event A:

P (A) = #{favourable outcomes} / #{all outcomes} = m/n   (by definition)

As a first (trivial) example consider a 6-sided fair die.

Example 1.1.1. A = {2, 4, 6} = even number (in 1st roll).

P (A) = m/n = #{2, 4, 6} / #{1, 2, 3, 4, 5, 6} = 3/6 = 1/2   □
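(Aside: this ratio can be checked with a couple of lines of Python; the sketch below is illustrative only and not part of the course material.)

```python
# A minimal check of the classical definition on the die example.
# Variable names are illustrative only.
outcomes = {1, 2, 3, 4, 5, 6}        # all n equally likely outcomes
A = {2, 4, 6}                        # favourable outcomes (even number)

print(len(A) / len(outcomes))        # m/n = 3/6 = 0.5
```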

However, the classical definition is inadequate when

(a) Outcomes are not equally likely

(b) Number of outcomes is not finite

Empirical probability concept
(Richard von Mises, 1883–1953, Ukraine)

Assume an experiment is repeated n times, with n → ∞, presumably under identical
conditions.

Example 1.1.2. A fair coin is tossed n times. Let m_n = # of H(eads) in n throws. Let
A = {H}. Then

P (A) = lim_{n→∞} m_n / n

Figure 2.1.1 in Larsen and Marx (2012) shows a possible trajectory.

(This definition has limitations because probabilities can then only ever be approximated.)
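(Aside: the limiting relative frequency can be illustrated by simulation. The Python sketch below, assuming a fair coin and using the standard random module, prints m_n/n for increasing n; the function name and seed are illustrative only, not part of the notes.)

```python
import random

# Relative frequency m_n / n of heads in n fair-coin tosses, for
# increasing n; illustrates the limit P(A) = lim m_n / n.
random.seed(1)

def relative_frequency(n):
    heads = sum(random.random() < 0.5 for _ in range(n))   # m_n
    return heads / n

for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency(n))   # settles near 0.5 as n grows
```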

Enter Andrei Kolmogorov (1903–1987, Russia). (We will see this in much more detail in
future lectures.)

Axiomatic definition of probability


Kolmogorov showed in 1933 that a maximum of four axioms is necessary and sufficient
to define the way 'probability' must behave.

Recall,

• necessary: if axiom is missing then something is missing

• sufficient: no additional (5th) axiom is necessary

1.2 Sample spaces and the algebra of sets
(We begin with some basic definitions, without numbering them.)

An experiment is any procedure that


(1) can be repeated (theoretically infinitely often)

(2) has a well-defined set of possible outcomes


Each such outcome is referred to as a sample outcome, s; their totality is called the
sample space S.
Write s ∈ S.
An event, A, is a collection of “favourable” outcomes.
Example 1.2.1 (Three flips of a fair coin). Each s is an ordered triplet of the letters H(ead)
and T(ail).

S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT} ⇒ #S = 8

Let

A = majority of coins show heads = {HHH, HHT, HTH, THH}

Thus #A = 4.   □
(In this example, all possible sample outcomes were listed.)
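(Aside: for small experiments the sample space can be enumerated by computer. The Python sketch below lists S and A for this example; the 'H'/'T' string encoding is an arbitrary choice, not part of the notes.)

```python
from itertools import product

# Enumerate S for three coin flips and the event
# "majority of coins show heads".
S = [''.join(s) for s in product('HT', repeat=3)]
A = [s for s in S if s.count('H') >= 2]

print(len(S), S)   # 8 outcomes
print(len(A), A)   # 4 favourable outcomes
```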

The number of sample outcomes associated with an experiment need not be finite.
Example 1.2.2 (Flipping coin until first tail).
S = {T, HT, HHT, HHHT, HHHHT, . . .} ⇒ #S = ∞   □
(In this example, all possible sample outcomes were characterized.)

Example 1.2.3 (Solving equations). A programmer runs a subroutine to solve

ax^2 + bx + c = 0                                   (1)

They choose the three values a, b and c at random, so

S = {(a, b, c) : (a, b, c) ∈ R^3}

Let A be the event "Equation (1) has two (distinct real) roots". Then

A = {(a, b, c) ∈ R^3 : b^2 − 4ac > 0}   □

(In this example, all possible sample outcomes were stated in a mathematical formula.)
We conclude that there are three ways to state S
(i) List all s as in the first example

(ii) Characterize s as in the second example

(iii) State a mathematical formula as in the third example

(Begin of Lecture 1.2 )

Unions, intersections and complements


Let A and B be events in the sample space S.

Example 1.2.4.

• S = {1, 2, 3, 4, 5, 6}

• A = {2, 4, 6}

• B = {1, 2, 3}

• Note, A ⊂ S and B ⊂ S   □

Definition 1.2.1 (Intersection, union).


The intersection of A and B is A ∩ B = {s ∈ S : s ∈ A and s ∈ B}.
The union of A and B is A ∪ B = {s ∈ S : s ∈ A or s ∈ B}.   □

Example (continued).

• A ∩ B = {2, 4, 6} ∩ {1, 2, 3} = {2}

• A ∪ B = {2, 4, 6} ∪ {1, 2, 3} = {1, 2, 3, 4, 6}   □

(Draw Venn diagrams)

Definition 1.2.2 (Mutually exclusive events). A and B are mutually exclusive iff A ∩ B =
∅, the null or empty set.   □

Example (continued). A ∩ B = {2}; therefore A and B are not mutually exclusive.   □

Definition 1.2.3 (Complement, without). The complement of A is defined as
A^C = {s ∈ S : s ∉ A}. This is the same as writing S\A = A^C; similarly, A\B = {s ∈ A : s ∉ B}
means "A without B", and this is equivalent to A ∩ B^C.   □

Example (continued). A^C = {2, 4, 6}^C = {1, 3, 5} and A\B = {4, 6}.   □

Theorem (De Morgan's laws; not in textbook, will be proved in the Tutorial of Week 2).
For any two events A and B:

(i) The complement of their intersection is the union of their complements:

(A ∩ B)^C = A^C ∪ B^C

(ii) The complement of their union is the intersection of their complements:

(A ∪ B)^C = A^C ∩ B^C   □
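(Aside: these identities are easy to check on the finite sets of Example 1.2.4. The Python sketch below checks, but of course does not prove, both laws.)

```python
# Check De Morgan's laws on the sets of Example 1.2.4.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {1, 2, 3}

assert S - (A & B) == (S - A) | (S - B)   # (A ∩ B)^C = A^C ∪ B^C
assert S - (A | B) == (S - A) & (S - B)   # (A ∪ B)^C = A^C ∩ B^C
print("De Morgan's laws hold for this example")
```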

These notions extend to more than two events:

∪_{i=1}^{k} A_i = A_1 ∪ A_2 ∪ . . . ∪ A_k = {s ∈ S : ∃ i s.t. s ∈ A_i}

∩_{i=1}^{k} A_i = A_1 ∩ A_2 ∩ . . . ∩ A_k = {s ∈ S : s ∈ A_i ∀ i = 1, . . . , k}

Example 1.2.5 (Telescoping sets). Let A_1, A_2, . . . , A_k be intervals in R such that

A_i = {x : 0 ≤ x < 1/i},   i = 1, 2, . . . , k

Then, ∪_{i=1}^{k} A_i = [0, 1) and ∩_{i=1}^{k} A_i = [0, 1/k).   □

1.3 The probability function


Let A be any event defined on a sample space S.

• The symbol P (A) will denote the probability of A.

• P (A) is a function, mapping A ⊂ S to a value in [0, 1].

Kolmogorov showed that if S has a finite number of members, that is #S = n, then as few as
three axioms are necessary and sufficient for characterizing the probability function P:

Axiom 1. Let A be any event defined over S. Then P (A) ≥ 0.

Axiom 2. P (S) = 1.

Axiom 3. Let A and B be any two mutually exclusive events defined over S. Then

P (A ∪ B) = P (A) + P (B).

When S has an infinite number of members, Axiom 3 needs to be modified.

Axiom 3'. Let A_1, A_2, . . . ⊂ S. If A_i ∩ A_j = ∅ for every i ≠ j, then

P (∪_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} P (A_i)

(In Larsen and Marx (2012), Axiom 3' is called Axiom 4, but the above is the proper
formulation of the axiomatic definition of probability: it requires exactly three axioms.)

Theorem 1.3.1 (Basic properties of the probability function).
(i) P (A^C) = 1 − P (A)

(ii) P (∅) = 0

(iii) If A ⊂ B then P (A) ≤ P (B)

(iv) For any event A, P (A) ≤ 1

(v) Let A_1, A_2, . . . , A_k be (pairwise) mutually exclusive events, then

P (∪_{i=1}^{k} A_i) = Σ_{i=1}^{k} P (A_i)

(vi) P (A ∪ B) = P (A) + P (B) − P (A ∩ B)   □

Proof of Theorem 1.3.1(i)–(vi)

(i) Note S = A ∪ A^C and these are mutually exclusive. Thus, by Axioms 2 and 3,

1 = P (S) = P (A ∪ A^C) = P (A) + P (A^C).

(ii) Follows from (i), noting ∅ = S^C.

(iii) If A ⊂ B, then B = A ∪ (B ∩ A^C), the union of two mutually exclusive events
(draw Venn diagram). Thus, P (B) = P (A) + P (B ∩ A^C); the second term is ≥ 0
(from Axiom 1) and we conclude P (B) ≥ P (A).

(iv) The proof follows immediately from (iii) because A ⊂ S and P (S) = 1.

(v) Follows by induction, using Axiom 3 as the starting point.

(vi) The Venn diagram for A ∪ B certainly suggests that the statement is true. More
formally, from Axiom 3,

P (A) = P (A ∩ B^C) + P (A ∩ B) and P (B) = P (B ∩ A^C) + P (B ∩ A).

Adding these two equations gives

P (A) + P (B) = [P (A ∩ B^C) + P (A ∩ B) + P (B ∩ A^C)] + P (B ∩ A).

By (v), the sum in the brackets is P (A ∪ B), so we need only subtract P (A ∩ B) from
either side.   □
Example 1.3.1. Let A and B be any two events defined on S. Let P (A) = 0.4, P (B) =
0.5 and P (A ∩ B) = 0.1. What is the probability that A or B but not both occur? (Draw
a Venn diagram)
Note that we can write A and B as follows:

• A = (A ∩ B) ∪ (A ∩ B^C) and those two are mutually exclusive

• B = (B ∩ A) ∪ (B ∩ A^C) and those two are mutually exclusive

Thus,

• P (A) = P (A ∩ B) + P (A ∩ B^C), thus P (A ∩ B^C) = P (A) − P (A ∩ B) = 0.4 − 0.1 = 0.3

• P (B) = P (A ∩ B) + P (B ∩ A^C), thus P (B ∩ A^C) = P (B) − P (A ∩ B) = 0.5 − 0.1 = 0.4

and

P (A ∩ B^C) + P (B ∩ A^C) = 0.3 + 0.4 = 0.7   □
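(Aside: the arithmetic of this example can be reproduced in a few lines; the Python sketch below uses illustrative variable names only.)

```python
# Reproducing the arithmetic of Example 1.3.1.
p_A, p_B, p_AB = 0.4, 0.5, 0.1

p_A_only = p_A - p_AB          # P(A ∩ B^C) = 0.3
p_B_only = p_B - p_AB          # P(B ∩ A^C) = 0.4
print(p_A_only + p_B_only)     # P(A or B but not both) = 0.7 (up to rounding)
```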

(Begin of Lecture 1.3 )

1.4 Conditional probability


Learn from what is already known.
Any probability that is revised to take into account the (known) occurrence of other
events is said to be a conditional probability.
Example 1.4.1 (6-sided die). Let A = {6} and B = {2, 4, 6}; then P (A) = 1/6 and
P (B) = 1/2. What is the probability of A occurring, given that we know B has occurred?
Recall from Lecture 1.1 the "classical definition of probability":

P (A|B) = m′/n′.

Here, n′ = 3 equally likely (possible) outcomes (knowing B occurred, these are 2, 4 or
6); m′ = #{6} = 1. Thus, P (A|B) = 1/3.   □

We observe that knowing B is like shrinking the sample space from S to B, the conditional
sample space.
More formally:
Let S be a finite sample space with n equally likely outcomes. Assume A and B are two
events with a and b outcomes, respectively, and let c = #{s ∈ A ∩ B}.
This is Figure 2.4.2 in Larsen and Marx (2012)

Note

P (A|B) = c/b = (c/n)/(b/n) = P (A ∩ B)/P (B)

This motivates the following definition.

Definition 1.4.1 (Conditional probability). Let A, B ⊂ S such that P (B) > 0. The
conditional probability of A given B is defined as

P (A|B) = P (A ∩ B)/P (B)  ⇒  P (A ∩ B) = P (A|B)P (B)   □

Example 1.4.2 (Two boys problem). Consider the set of all families having exactly two chil-
dren who are not twins. What is the probability that A = "both children are boys",
given the event B = "at least one child is a boy"?

S = {(b, b), (b, g), (g, b), (g, g)},   B = {(b, b), (b, g), (g, b)}

and each of the four possible outcomes is equally likely, i.e. P (s) = 1/4 for all s ∈ S.
Note, A ⊂ B, therefore A ∩ B = A. By definition,

P (A|B) = P (A ∩ B)/P (B) = P (A)/P (B) = (1/4)/(3/4) = 1/3   (not 1/2!)   □
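(Aside: the answer 1/3 can also be approximated by simulation. The Python sketch below conditions on "at least one boy" by discarding the other families; the 'b'/'g' encoding, seed and number of trials are illustrative only.)

```python
import random

# Estimate P(both boys | at least one boy) by simulation.
random.seed(1)

trials = 100_000
both = at_least_one = 0
for _ in range(trials):
    children = (random.choice('bg'), random.choice('bg'))
    if 'b' in children:
        at_least_one += 1
        if children == ('b', 'b'):
            both += 1

print(both / at_least_one)   # close to 1/3, not 1/2
```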

Applying conditional probability to higher order intersections


Recall from Definition 1.4.1 (by symmetry)

P (A ∩ B) = P (A|B)P (B) = P (B|A)P (A)

Consider

(A ∩ B) ∩ C = D ∩ C,   where D = A ∩ B

Thus,

P (A ∩ B ∩ C) = P (C ∩ D) = P (C|D)P (D)
              = P (C|A ∩ B)P (A ∩ B) = P (C|A ∩ B)P (B|A)P (A)

Repeating the same argument for k events A_1, A_2, . . . , A_k we note

P (A_1 ∩ A_2 ∩ . . . ∩ A_k) = P (A_k | A_1 ∩ A_2 ∩ . . . ∩ A_{k−1})
                            × P (A_{k−1} | A_1 ∩ A_2 ∩ . . . ∩ A_{k−2})
                            × . . .
                            × P (A_2 | A_1) P (A_1)

“Unconditional” and “inverse” probabilities


Consider a partitioned sample space.
This is Figure 2.4.7 in Larsen and Marx (2012)

Definition 1.4.2. A set of events A_1, A_2, . . . , A_n is a partition of S if every s ∈ S belongs
to one and only one of the A_i's.   □

It is clear that such A_i's are mutually exclusive.

Theorem 1.4.1. Let {A_i}_{i=1}^{n} with A_i ⊂ S be such that S = ∪_{i=1}^{n} A_i and A_i ∩ A_j = ∅ for
i ≠ j, and P (A_i) > 0 for i = 1, . . . , n. For any event B,

P (B) = Σ_{i=1}^{n} P (B|A_i)P (A_i)   □

Proof of Theorem 1.4.1. Note that B can be written as the following union of mutually
exclusive events:
B = (B ∩ A_1) ∪ (B ∩ A_2) ∪ . . . ∪ (B ∩ A_n)

Thus,

P (B) = Σ_{i=1}^{n} P (B ∩ A_i) = Σ_{i=1}^{n} P (B|A_i)P (A_i)   □

Example 1.4.3 (Standard poker deck). Cards are shuffled and the top card is removed. What
is the probability of

B = "second card is an ace"?

Let A_1 = "top card was an ace" and A_2 = A_1^C. Then,

• P (B|A_1) = 3/51;  P (B|A_2) = 4/51

• P (A_1) = 4/52;  P (A_2) = P (A_1^C) = 48/52

Using Theorem 1.4.1, we calculate

P (B) = P (B|A_1)P (A_1) + P (B|A_2)P (A_2)
      = (3/51)(4/52) + (4/51)(48/52)
      = 4/52 = P (A_1)

What we knew didn't matter!   □
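(Aside: the calculation above can be reproduced exactly with rational arithmetic; the Python sketch below uses the standard fractions module, with illustrative variable names.)

```python
from fractions import Fraction

# Law of total probability (Theorem 1.4.1) for Example 1.4.3.
p_A1 = Fraction(4, 52)            # P(A_1): top card was an ace
p_A2 = 1 - p_A1                   # P(A_2) = 48/52
p_B_given_A1 = Fraction(3, 51)
p_B_given_A2 = Fraction(4, 51)

p_B = p_B_given_A1 * p_A1 + p_B_given_A2 * p_A2
print(p_B, p_B == p_A1)           # 1/13 True, i.e. P(B) = 4/52 = P(A_1)
```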

Why? (This will be the topic of the next lecture.)

Theorem 1.4.2 (Bayes). Let A_1, A_2, . . . , A_n be a partition of S. Let B ⊂ S with P (B) > 0. Then,

P (A_j|B) = P (B|A_j)P (A_j) / Σ_{i=1}^{n} P (B|A_i)P (A_i)   □

Proof of Theorem 1.4.2. Follows from the definition of conditional probability and
Theorem 1.4.1, that is,

P (A_j|B) = P (A_j ∩ B)/P (B) = P (B|A_j)P (A_j) / Σ_{i=1}^{n} P (B|A_i)P (A_i)   □

Example 1.4.4. During a power blackout, 100 persons are arrested on suspicion of
looting. Each is given a polygraph test. From past experience it is known that the
polygraph is 90% reliable when administered to a guilty suspect and 98% reliable when
given to someone who is innocent. Suppose that of the one hundred persons taken into
custody, only twelve were actually involved in any wrongdoing. What is the probability
that a given suspect is innocent given that the polygraph says he is guilty?

Let

• B = "Polygraph says suspect is guilty"

• A_1 = "Suspect is guilty"

• A_2 = "Suspect is not guilty"

Then, learning from the context:

• The polygraph is "90% reliable when administered to a guilty suspect", which means

P (B|A_1) = 0.90

• 98% reliability for innocent suspects implies that

P (B^C|A_2) = 0.98, or equivalently P (B|A_2) = 0.02

• We also know that P (A_1) = 12/100 and P (A_2) = 88/100.

Substituting into Theorem 1.4.2 then shows that the probability a suspect is innocent,
given that the polygraph says he is guilty, is about 0.14:

P (A_2|B) = P (B|A_2)P (A_2) / [P (B|A_1)P (A_1) + P (B|A_2)P (A_2)]
          = (0.02)(88/100) / [(0.90)(12/100) + (0.02)(88/100)]
          ≈ 0.14
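(Aside: the Python sketch below reproduces this Bayes calculation; the variable names are illustrative only.)

```python
# Bayes' theorem (Theorem 1.4.2) applied to Example 1.4.4.
p_guilty, p_innocent = 12 / 100, 88 / 100
p_pos_given_guilty, p_pos_given_innocent = 0.90, 0.02

p_pos = p_pos_given_guilty * p_guilty + p_pos_given_innocent * p_innocent
p_innocent_given_pos = p_pos_given_innocent * p_innocent / p_pos
print(round(p_innocent_given_pos, 2))   # 0.14
```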

References
Larsen RL, Marx ML (2012). Introduction to Mathematical Statistics and Its Applications, 5th Edition, Boston: Pearson. Chapters 2.1–2.4.
