01_ProbabilityModelsFilled
ECE 3077 Notes by M. Davenport, J. Romberg, C. Rozell, and M. Wakin. Last updated 15:13, August 25, 2023
Basic probability models
A probability model consists of an experiment which produces exactly
one out of several mutually exclusive outcomes. The essential
elements are:
1. The sample space Ω. This is simply the collection of all possible
outcomes.
2. A probability law P(·), which assigns a “likelihood” to different
events. More on this later.
An event is a set of outcomes, i.e., a subset A ⊆ Ω, and the
probability law assigns it a number P(A).
The probability law P (·) must obey certain properties, which we will
get to soon, but first let’s look at two simple examples.
Example. Consider a single roll of a six-sided die. The sample space is
Ω = {1, 2, 3, 4, 5, 6}.
Under this definition of Ω, events include (but of course are not limited
to)
{1}, i.e., the result of the roll is a “1”
{1, 3, 5}, i.e., the result is odd
{2, 4, 6}, i.e., the result is even
{1, 2, 3}, i.e., the result is less than or equal to “3”
etc.
Example. Consider a fair coin that we will toss twice. When we have
repeated actions like this, we will often consider them to be a single
experiment with a single sample space; here we have
Ω = {HH, HT, TH, TT}.
Events include
{HH, HT, TH}, i.e., there is at least one “heads” (or at most
one “tails”)
{HT, TH}, i.e., there is exactly one “heads” (or exactly one
“tails”)
etc.
Since the coin is fair, a natural probability law is to assign each of
the four outcomes a probability of 1/4, and so
P(at least one “heads”) = P({HH, HT, TH}) = 3/4,
P(exactly one “heads”) = P({HT, TH}) = 1/2,
and so on.
Note that an alternative line of reasoning would be to say that there
are three possibilities: (i) no “heads”, (ii) one “heads”, and (iii)
two “heads”, and since we have a fair coin, all three possibilities
are equally likely, so the first probability is 2/3 and the latter is 1/3.
Unfortunately, this is wrong because these three possibilities are not
really equally likely. While this is kind of obvious here, in more
complicated situations this is an easy mistake to make.
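As a quick check of the reasoning above, here is a minimal Python sketch that enumerates the sample space for two tosses and applies the equal-probability law. The names omega, prob, and heads_count_law are just illustrative choices, not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses: every ordered pair of H/T.
omega = ["".join(t) for t in product("HT", repeat=2)]  # ['HH', 'HT', 'TH', 'TT']

def prob(event):
    """Equal-probability law on omega: each outcome gets 1/|omega|."""
    return Fraction(len(event), len(omega))

at_least_one_heads = {w for w in omega if w.count("H") >= 1}
exactly_one_heads = {w for w in omega if w.count("H") == 1}

# Probability of seeing 0, 1, or 2 heads; these are NOT all equal to 1/3.
heads_count_law = {k: prob({w for w in omega if w.count("H") == k}) for k in range(3)}

print(prob(at_least_one_heads))  # 3/4
print(prob(exactly_one_heads))   # 1/2
print(heads_count_law)
```

The head counts 0, 1, 2 come out with probabilities 1/4, 1/2, 1/4, which is why treating them as three equally likely possibilities gives the wrong answer.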
Given that our foundational tool for understanding the fundamentals
of probability is the mathematical notion of sets, it will be critical
to have a good understanding of basic set operations. If you need
a refresher, this set of notes has a review of basic set notation and
operations at the end.
Kolmogorov’s probability axioms
We will build up a theory of probability based on axioms that all
probability laws must obey in order to be consistent with common sense.
This abstraction allows us to develop definitive mathematical rules
that stand apart from the philosophical questions about what the
probability really represents. Specifically, we will require a
probability law to assign a number to every possible event A such that
1. Nonnegativity: P(A) ≥ 0 for every event A
2. Additivity: If A and B are disjoint, i.e., if A ∩ B = ∅, then
P(A ∪ B) = P(A) + P(B).
3. Normalization: P(Ω) = 1, that is, the probability that “something
happens” is 1.
There are many properties that can be immediately derived from
these three axioms. For example, since Ω and ∅ are disjoint and
Ω ∪ ∅ = Ω, the normalization and additivity axioms tell us that
1 = P(Ω) = P(Ω ∪ ∅) = P(Ω) + P(∅) = 1 + P(∅),
which simplifies to
P(∅) = 0,
i.e., the probability that “nothing happens” is 0. Also, for any event
A,
1 = P(Ω) = P(A ∪ A^c) = P(A) + P(A^c),
and so
P(A^c) = 1 − P(A).
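Applying the additivity axiom repeatedly extends it to any finite
number of events: if A1, A2, . . . , An are n disjoint events, then
P(A1 ∪ A2 ∪ · · · ∪ An) = P(A1) + P(A2) + · · · + P(An).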
Here are some additional properties that you should prove at home:
Let A, B, C be arbitrary events and let P(·) be a probability law
satisfying the Kolmogorov axioms. Then
1. If A ⊆ B, then P(A) ≤ P(B)
2. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
3. P(A ∪ B) ≤ P(A) + P(B)
4. P(A ∪ B ∪ C) = P(A) + P(A^c ∩ B) + P(A^c ∩ B^c ∩ C)
Proving these will help provide a good review of basic set theory.
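Before proving these properties, it can help to see them hold on a concrete case. The sketch below checks them for one roll of a six-sided die, assuming (for illustration only) that every face is equally likely. The events A, B, C and the helper names are arbitrary choices, and a numerical check on one example is of course not a proof.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # one roll of a six-sided die

def prob(event):
    """Assumed equal-probability law: P(A) = |A| / |omega|."""
    return Fraction(len(event), len(omega))

def complement(event):
    return omega - event

A = frozenset({1, 3, 5})  # odd
B = frozenset({1, 2, 3})  # at most 3
C = frozenset({2, 4, 6})  # even

checks = {
    "complement": prob(complement(A)) == 1 - prob(A),
    "monotone": prob(A & B) <= prob(B),  # uses A ∩ B ⊆ B
    "inclusion_exclusion": prob(A | B) == prob(A) + prob(B) - prob(A & B),
    "union_bound": prob(A | B) <= prob(A) + prob(B),
    "three_sets": prob(A | B | C)
    == prob(A) + prob(complement(A) & B) + prob(complement(A) & complement(B) & C),
}
print(checks)  # every value is True
```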
Exercise: Out of the students in a class, 60% love soda, 70% love
pizza, and 40% love both soda and pizza. What is the probability
that a randomly selected student loves neither soda nor pizza?
Ans. If A is the event that a student loves soda and B is the event that
a student loves pizza, then P(A) = 0.6, P(B) = 0.7, and P(A ∩ B) = 0.4.
Thus, using DeMorgan’s laws and the probability axioms, we have that
P(A^c ∩ B^c) = P((A ∪ B)^c) = 1 − P(A ∪ B)
= 1 − (P(A) + P(B) − P(A ∩ B)) = 1 − (0.6 + 0.7 − 0.4) = 0.1.
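The arithmetic for this exercise can be checked with a short sketch that uses exact fractions. The variable names are illustrative, and the two steps are the inclusion-exclusion property and the complement rule from the list above.

```python
from fractions import Fraction

p_soda = Fraction(60, 100)   # P(A)
p_pizza = Fraction(70, 100)  # P(B)
p_both = Fraction(40, 100)   # P(A ∩ B)

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
p_either = p_soda + p_pizza - p_both

# DeMorgan: A^c ∩ B^c = (A ∪ B)^c, followed by the complement rule.
p_neither = 1 - p_either

print(p_either, p_neither)  # 9/10 1/10
```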
Where do probability laws come from?
That is a good question, and this is where the “modeling” comes in.
A probability law can potentially be based on factors such as:
relative frequencies in past occurrences (i.e., “data driven”)
physical laws
subjective belief based on experience
a careful and thorough polling of the public
etc.
Discrete models vs. continuous models
When there are a finite number of possible outcomes in Ω, defining
all of the possible events does not require too much imagination. If
|Ω| = n,
where |Ω| denotes the size or number of elements in Ω, then there are
2^n different subsets.
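To make the count of 2^n concrete, here is a small sketch that lists every subset (every possible event) of a finite sample space. The function name all_events is a hypothetical helper, not something defined in these notes.

```python
from itertools import chain, combinations

def all_events(omega):
    """Return every subset of a finite sample space as a list of frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

events = all_events({"HH", "HT", "TH", "TT"})
print(len(events))  # 16, i.e., 2**4
```

The empty set and Ω itself are both included, which is why the count is 2^n rather than 2^n − 2.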
Example. You flip a coin until you see “tails.” The outcome of the
experiment is how many times the coin gets flipped. This could be
any natural number, i.e.,
Ω = {1, 2, 3, . . .} = N.
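One way to put a probability law on this countably infinite sample space is sketched below. It assumes a fair coin and independent flips, which the example does not state explicitly, so that exactly k flips requires k − 1 heads followed by one tails. The function names are illustrative.

```python
from fractions import Fraction

def p_flips(k, p_tails=Fraction(1, 2)):
    """P(first tails appears on flip k), assuming independent flips."""
    return (1 - p_tails) ** (k - 1) * p_tails

def partial_sum(n):
    """Total probability assigned to the outcomes 1, 2, ..., n."""
    return sum(p_flips(k) for k in range(1, n + 1))

print(p_flips(1), p_flips(2), p_flips(3))  # 1/2 1/4 1/8
print(partial_sum(10))                     # 1023/1024
```

Under these assumptions the partial sums approach 1 as n grows, which is consistent with the normalization axiom.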
1 Note that we are slightly abusing notation here by letting k stand for both “the
number of flips until ‘tails’” as well as “the event that there are k flips until
‘tails.’”
2 See the Wikipedia page on Cantor’s “diagonal argument” for a more complete
understanding of the difference between “countably infinite” and “uncountably
infinite.”
For example, suppose I choose a point at random from the interval
[0, 1]. The natural probability law would define the probability of
any particular point p to be zero. After all, what are the chances
that you would draw √2/2 = 0.70710678118 . . . or 1/5 = 0.20000000000 . . .
exactly? However, if I define the event A to be picking a point
between 1/3 and 2/3, i.e., A = [1/3, 2/3], then in this case
P(A) = Length(A) = 1/3,
and similarly for any other “typical” subset A ⊆ Ω.
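The length-based law on [0, 1] can be sketched in a few lines. The helper interval_prob is a hypothetical name, and the Monte Carlo check uses an arbitrary seed and sample size, so its result is only an approximation.

```python
import random
from fractions import Fraction

def interval_prob(a, b):
    """P([a, b]) for a uniform point on [0, 1]: length of [a, b] clipped to [0, 1]."""
    lo, hi = max(a, 0), min(b, 1)
    return max(hi - lo, 0)

exact = interval_prob(Fraction(1, 3), Fraction(2, 3))         # 1/3
single_point = interval_prob(Fraction(1, 5), Fraction(1, 5))  # a single point has length 0

# Monte Carlo sanity check: fraction of random draws that land in [1/3, 2/3].
rng = random.Random(0)
n = 100_000
hits = sum(1 for _ in range(n) if 1 / 3 <= rng.random() <= 2 / 3)
estimate = hits / n

print(exact, single_point, estimate)
```

The estimate should land close to 1/3, while any single point such as 1/5 gets probability zero.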
However, there are some subsets for which the “length” of the subset is
not well-defined—these are called “non-measurable sets”. I’d give you
an example, but it’s not really worth it—these sets are so unusual that
they rarely (if ever) play a role in our understanding of probability.
The discrete uniform law
The most basic probability law is simply that every outcome has the
same probability. If Ω is finite with |Ω| = n, this simply means that
for any A ⊆ Ω,
P(A) = |A| / n.
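Example. Suppose we flip a coin three times, and let A be the event
that exactly two of the flips are heads.
Since Ω contains eight possible outcomes and each has equal probability
(assuming a fair coin), each of these outcomes has a probability
of 1/8. There are only three outcomes that have exactly two heads:
A = {HHT, HTH, THH},
and so
P(A) = 3/8.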
Exercise: Given three flips of a fair coin, what is the probability of:
1. at least two heads: Ans. P({HHH, HHT, HTH, THH}) = 1/2
2. odd number of heads: Ans. P({HHH, HTT, THT, TTH}) = 1/2
3. all tails: Ans. P({TTT}) = 1/8
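These answers can be checked by brute-force enumeration. The sketch below lists the eight outcomes of three flips and counts those in each event. Describing events by a predicate function is just one convenient choice.

```python
from fractions import Fraction
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]  # 8 outcomes

def prob(predicate):
    """Discrete uniform law: P(A) = |A| / |omega|, with A given by a predicate."""
    return Fraction(sum(1 for w in omega if predicate(w)), len(omega))

p_exactly_two = prob(lambda w: w.count("H") == 2)
p_at_least_two = prob(lambda w: w.count("H") >= 2)
p_odd_heads = prob(lambda w: w.count("H") % 2 == 1)
p_all_tails = prob(lambda w: w == "TTT")

print(p_exactly_two, p_at_least_two, p_odd_heads, p_all_tails)  # 3/8 1/2 1/2 1/8
```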
Exercise: We roll two fair six-sided dice; call the outcomes D1 and
D2. There are now 6^2 = 36 possibilities, each with equal probability.
Here is a graphical depiction of some events:
[Figure: 6 × 6 grid of outcomes with D1 on the horizontal axis and D2 on
the vertical axis, marking the events {D2 = 5}, {D1 + D2 = 8}, and
{max(D1, D2) ≤ 3}.]
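The grid of 36 equally likely outcomes is easy to enumerate. As a sketch, the code below computes the probabilities of two of the events labeled in the figure, {D2 = 5} and {D1 + D2 = 8}, by counting grid points. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # (d1, d2) pairs, 36 in total

def prob(predicate):
    """Discrete uniform law on the 36 outcomes."""
    return Fraction(sum(1 for d1, d2 in omega if predicate(d1, d2)), len(omega))

p_d2_is_5 = prob(lambda d1, d2: d2 == 5)        # one full row of the grid
p_sum_is_8 = prob(lambda d1, d2: d1 + d2 == 8)  # one anti-diagonal of the grid

print(p_d2_is_5, p_sum_is_8)  # 1/6 5/36
```

Any other event on the grid can be handled the same way by swapping in a different predicate.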
The continuous uniform law
When Ω is a continuum of outcomes, the statement “every outcome is
equally likely” becomes trickier, since the probability of any
particular outcome is zero.
In many cases, it will be natural to take Ω as an interval on the real
line R, or as a subset of the plane R^2, or as a subset of the space
R^3, etc.
For example, suppose I throw a dart at a dartboard and ask what
angle (in radians) the result makes with respect to the x-axis.
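[Figure: a dart on a dartboard, with its angle measured from the x-axis.]
When Ω is an interval, the uniform law assigns each event A ⊆ Ω its
relative length:
P(A) = Length(A) / Length(Ω).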
Example. Suppose Ω is the unit square [0, 1]^2 = [0, 1] × [0, 1], i.e.,
Ω = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}.
[Figure: the unit square Ω = [0, 1]^2.]
P(A) = Area(A) / Area(Ω) = Area(A).
Exercise: Han and Chewbacca have arranged to meet at the cantina
at noon. Unfortunately Han gets delayed by a bounty hunter and
Chewbacca loses his watch, so they both are running late. Suppose
that they both arrive with delays of anywhere from zero to two hours
(with all possible delay combinations equally likely). Whoever gets
there first will have a drink, wait for 20 minutes, and will leave if
the other has not yet arrived. What is the probability that Han and
Chewbacca meet? (Hint: start by sketching the event A.)
Ans. We need to calculate the area of the set of arrival times for which
Han and Chewy arrive within 20 minutes of each other, and find what
fraction of the total area that is. The total area is 120^2 = 14400,
depicted by a square with sides of length 120 (assuming we are measuring
in minutes). The easiest way to find the area where they meet is to
subtract from the total area the area where they do not meet (two
simple triangles, each of area (1/2) · 100 · 100). So,
P(A) = Area(A) / Area(Ω)
= (Area(Ω) − Area(A^c)) / Area(Ω)
= (14400 − 100^2) / 14400 = 4400 / 14400 ≈ 0.306.
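The area calculation can be cross-checked in two ways: exactly, with fractions, and approximately, by simulating many pairs of arrival times. This is only a sketch. The constants WINDOW and WAIT, the function simulate, the seed, and the number of trials are all illustrative choices.

```python
import random
from fractions import Fraction

WINDOW = 120  # range of possible delays, in minutes
WAIT = 20     # how long the first arrival waits, in minutes

# Exact answer by area: remove the two corner triangles where |x - y| > WAIT.
area_total = WINDOW ** 2
area_miss = (WINDOW - WAIT) ** 2  # two triangles, each of area (1/2) * 100 * 100
p_meet = Fraction(area_total - area_miss, area_total)

def simulate(trials, seed=0):
    """Estimate P(meet) by drawing independent uniform delays for both."""
    rng = random.Random(seed)
    met = 0
    for _ in range(trials):
        han = rng.uniform(0, WINDOW)
        chewie = rng.uniform(0, WINDOW)
        if abs(han - chewie) <= WAIT:
            met += 1
    return met / trials

estimate = simulate(200_000)
print(p_meet, round(float(p_meet), 3))  # 11/36 0.306
print(estimate)
```

The simulated frequency should land close to 11/36, which matches 4400/14400 after simplification.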
Background and Review: Basic set operations
As we have seen, it is very natural to talk about sample spaces,
outcomes and events in terms of set operations. This section serves as a
quick brush-up on the basics.
A set is just a collection of objects. For example
Z = {. . . , −2, −1, 0, 1, 2, 3, . . .}
Q = {a/b for all a, b ∈ Z with b ≠ 0},
are examples of countably infinite sets. Finally, sets like
For everything we do in this class, all sets of interest will be subsets of
a sample space Ω; you can think of Ω as the “universe” associated
with a particular experiment.
Set operations
Union: Simply combine the elements of the two sets. Easy example:
[Figure: Venn diagram with A ∪ B shaded.]
[Figure: Venn diagram with A ∩ B shaded.]
[Figure: two disjoint sets, A ∩ B = ∅.]
[Figure: Venn diagram with A^c shaded.]
The difference between A and B is everything in A which is not in
B, i.e., A\B = A ∩ B^c.
[Figure: Venn diagram with A\B shaded.]
DeMorgan’s Laws
Two simple rules of set algebra come in handy from time to time.
1. (A ∪ B)^c = A^c ∩ B^c.
[Figure: Venn diagram with (A ∪ B)^c shaded.]
2. (A ∩ B)^c = A^c ∪ B^c.
[Figure: Venn diagram with (A ∩ B)^c shaded.]
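The set operations in this review map directly onto Python's built-in set type, which makes it easy to experiment. The sketch below uses a six-element universe and two arbitrary example sets to check the difference identity and both of DeMorgan's laws. It illustrates the laws on one example and does not prove them.

```python
omega = set(range(1, 7))  # the "universe" for this example
A = {1, 3, 5}
B = {1, 2, 3}

def complement(s):
    """Everything in omega that is not in s."""
    return omega - s

union = A | B         # A ∪ B
intersection = A & B  # A ∩ B
difference = A - B    # A\B

demorgan_1 = complement(A | B) == complement(A) & complement(B)
demorgan_2 = complement(A & B) == complement(A) | complement(B)
difference_identity = (A - B) == A & complement(B)

print(union, intersection, difference)
print(demorgan_1, demorgan_2, difference_identity)  # True True True
```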