Stat6201 ch1-3
1 Probability Theory
1.0 Basic concepts [X]
1.1 Set Theory [X]
1.2 Basics of Probability Theory
1.2.1 Axiomatic Foundations
Intuition:
1. A probability function is a measure of strength, like length, area, volume. The impossible event should
have probability 0. The certain event (sample space) should have probability 1.
2. A probability function should be additive whenever the two events have nothing in common.
• Two events are called mutually exclusive (or disjoint) if they have no outcome in common,
that is, the two events cannot occur simultaneously.
• A collection of events is called mutually exclusive (or pairwise disjoint) if every pair of events in it is mutually exclusive.
• Additivity of probability should also hold for any countably many disjoint events.
Therefore, we want to define probability on a collection of events that is large enough (it contains many useful
events) and well behaved enough (it reflects the intuitions above and is closed under the usual set operations).
This collection is called a sigma algebra, or σ-algebra, or Borel field (see Textbook Definition 1.2.1, p. 6).
Definition 2 (Kolmogorov Axioms, Textbook Definition 1.2.4, p. 7) Given a sample space S and
an associated sigma algebra B, a probability function is a function P with domain B that satisfies:
(i) P(A) ≥ 0 for all A ∈ B. (nonnegativity)
(ii) P(S) = 1. (total probability)
(iii) If A_1, A_2, . . . are mutually exclusive events, then P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i). (countable additivity)
Remark: Any function that satisfies the above three criteria is a valid probability function.
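As an illustration of the remark, the three axioms can be checked mechanically on a small finite sample space. The sketch below is not from the textbook. It assumes the sigma algebra is the full power set of S, and it uses the fact that on a finite space countable additivity reduces to additivity over disjoint pairs. The helper names power_set, is_probability_function and from_weights are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

def power_set(sample_space):
    """All subsets of a finite sample space, as frozensets."""
    items = list(sample_space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

def is_probability_function(P, sample_space):
    """Check axioms (i)-(iii) for P on the power set of a finite sample space."""
    S = frozenset(sample_space)
    events = power_set(S)
    nonnegative = all(P(A) >= 0 for A in events)   # axiom (i)
    total_one = P(S) == 1                          # axiom (ii)
    # Axiom (iii): on a finite space it suffices to check disjoint pairs.
    additive = all(
        P(A | B) == P(A) + P(B)
        for A in events for B in events if not (A & B)
    )
    return nonnegative and total_one and additive

def from_weights(weights):
    """Build P(A) = sum of the outcome weights in A (hypothetical helper)."""
    return lambda A: sum((weights[x] for x in A), Fraction(0))

fair = from_weights({"H": Fraction(1, 2), "T": Fraction(1, 2)})
biased = from_weights({"H": Fraction(7, 10), "T": Fraction(3, 10)})
broken = from_weights({"H": Fraction(7, 10), "T": Fraction(1, 2)})  # mass 6/5

print(is_probability_function(fair, "HT"))    # True
print(is_probability_function(biased, "HT"))  # True
print(is_probability_function(broken, "HT"))  # False
```

Exact fractions are used instead of floats so that the equality checks in axioms (ii) and (iii) are not affected by rounding.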
Example 1.2.1 1. If we toss a coin, S = {H, T}. If the coin is “fair” (balanced), how can we define a
valid probability function? What if the coin is unbalanced?
2. Classical definition for the finite sample space S = {1, . . . , N}. Show that P(A) = #A/N is a valid probability
function.
3. Geometric probability. Suppose events are described by regions of a plane, and P(A) = (area of A)/(total
area), e.g. the model for a dart hitting a dart board. Since area is additive, so is P. (In three dimensions,
area is replaced by volume.) This gives a sense of a uniform distribution of probability.
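The geometric model in item 3 can be explored numerically. The following is a minimal Monte Carlo sketch under assumptions that are not in the notes: the board is the square [-1, 1] × [-1, 1], the event A is the unit disc inside it, and darts land uniformly on the square, so P(A) = π/4. The function name estimate_hit_probability and the fixed seed are illustrative choices.

```python
import math
import random

def estimate_hit_probability(n_throws, seed=0):
    """Estimate P(dart lands in the unit disc) for a dart uniform on [-1, 1]^2."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_throws):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    return hits / n_throws

exact = math.pi / 4  # (area of disc) / (area of square)
estimate = estimate_hit_probability(100_000)
print(f"exact = {exact:.4f}, estimate = {estimate:.4f}")
```

The relative frequency of hits should be close to the area ratio, which is the sense in which this P is "uniform" over the board.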
Example 1.2.2 If A and B are two events, each with probability 0.95 of occurring, then the probability
that both occur is at least 0.90.
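One way to see the bound in Example 1.2.2 is sketched below. It assumes the identity P(A ∪ B) = P(A) + P(B) − P(A ∩ B), which follows from the axioms but is not derived in this section, together with P(A ∪ B) ≤ 1:

```latex
\begin{align*}
P(A \cap B) &= P(A) + P(B) - P(A \cup B) \\
            &\ge 0.95 + 0.95 - 1 \\
            &= 0.90 .
\end{align*}
```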
Definition 3 The events A_1, A_2, . . . are pairwise disjoint (or mutually exclusive) if A_i ∩ A_j = ∅ for
all i ≠ j.
Remark: Note that any collection of events can be disjointified: set A*_1 = A_1, A*_2 = A_2 ∩ A_1^c = A_2 \ A_1,
and in general A*_i = A_i \ (∪_{j=1}^{i−1} A_j). Then ∪_{i=1}^n A*_i = ∪_{i=1}^n A_i for all n = 1, 2, . . . , ∞ (including ∞),
and the A*_i's are disjoint.
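The construction in the remark can be sketched for a finite list of events as follows. This is an illustration only: the function name disjointify and the example sets are hypothetical, and the infinite case is not covered.

```python
def disjointify(events):
    """Return the list A*_i = A_i minus (A_1 union ... union A_{i-1})."""
    seen = set()      # running union of the events processed so far
    result = []
    for A in events:
        A = set(A)
        result.append(A - seen)  # keep only outcomes not seen in earlier events
        seen |= A
    return result

events = [{1, 2, 3}, {2, 3, 4}, {1, 5}, {4, 5}]
stars = disjointify(events)
print(stars)  # [{1, 2, 3}, {4}, {5}, set()]
```

Note that some A*_i may be empty, as in the last entry. This does not affect disjointness or the equality of the partial unions.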
Acknowledgement
The lecture notes of this course are based on the textbook and Prof. Huixia Judy Wang’s lecture slides. The
instructor thanks Prof. Wang for kindly sharing them.