Introduction to Probability

ECE 3077 Notes by M. Davenport, J. Romberg, C. Rozell, and M. Wakin. Last updated 15:13, August 25, 2023
Basic probability models
A probability model consists of an experiment which produces exactly one out of several mutually exclusive outcomes. The essential elements are:
1. The sample space $\Omega$. This is simply the collection of all possible outcomes.
2. A probability law $P(\cdot)$, which assigns a “likelihood” to different events. More on this later.

An event $A$ is simply a collection of possible outcomes, i.e., $A$ is a subset of $\Omega$. We denote the probability that $A$ occurs as
$$P(A).$$

The probability law P (·) must obey certain properties, which we will
get to soon, but first let’s look at two simple examples.

Example. Consider a fair six-sided die. The experiment is rolling the die. The sample space (possible outcomes) is given by
$$\Omega = \{1, 2, 3, 4, 5, 6\},$$
where each number stands for the face showing that many pips.

Note that this definition of $\Omega$ involves a certain amount of idealization. In particular, we omit possibilities such as “the die balances perfectly on a corner or edge” and “the die rolls off the table”. The determination of the sample space often involves such judgement calls.

Under this definition of $\Omega$, events include (but of course are not limited to)
$\{1\}$, i.e., the result of the roll is a “1”
$\{1, 3, 5\}$, i.e., the result is odd
$\{2, 4, 6\}$, i.e., the result is even
$\{1, 2, 3\}$, i.e., the result is less than or equal to “3”
etc.

In this case, there are $2^6 = 64$ different possible events, assuming we allow $A = \emptyset$ and $A = \Omega$ (which can be interpreted as the events that “nothing happens” and that “something happens”) to qualify as events. This might seem a little strange, but both $\emptyset$ and $\Omega$ are subsets of $\Omega$, and we will shortly see that there are good reasons for letting them count as events.
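To make the count concrete, here is a small Python sketch that enumerates every subset of $\Omega = \{1, 2, 3, 4, 5, 6\}$, with each face labeled by its number of pips. The helper name `all_events` is ours, purely for illustration; the sketch simply confirms that there are $2^6 = 64$ events, with $\emptyset$ and $\Omega$ included.

```python
from itertools import chain, combinations

def all_events(omega):
    """Return every subset of the finite sample space omega (its power set)."""
    items = sorted(omega)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

omega = {1, 2, 3, 4, 5, 6}  # die faces, labeled by number of pips
events = all_events(omega)
print(len(events))          # 64
print(set() in events)      # True: the empty event, "nothing happens"
print(omega in events)      # True: the certain event, "something happens"
```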

Since the die is “fair”, a natural probability law is to assign each of the six possible outcomes the same value, i.e.,
$$P(\{1\}) = P(\{2\}) = P(\{3\}) = P(\{4\}) = P(\{5\}) = P(\{6\}) = \frac{1}{6}.$$
It is then straightforward to compute the corresponding probability of different events, e.g.,
$$P(\{1, 3, 5\}) = \frac{1}{2}, \qquad P(\{2, 4, 6\}) = \frac{1}{2}, \qquad P(\{1, 2\}) = \frac{1}{3},$$
etc.
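One way to see the computation above is to store the probability law as a table of outcome probabilities and compute $P(A)$ by summing over the outcomes in $A$. The Python sketch below assumes the faces are labeled 1 through 6 and uses exact fractions; `law` and `prob` are illustrative names, not anything defined in these notes.

```python
from fractions import Fraction

# Probability law for a fair die: each of the six outcomes gets 1/6.
law = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event, law):
    """P(event) = sum of the probabilities of the outcomes in the event."""
    return sum((law[s] for s in event), Fraction(0))

print(prob({1, 3, 5}, law))  # 1/2  (result is odd)
print(prob({2, 4, 6}, law))  # 1/2  (result is even)
print(prob({1, 2}, law))     # 1/3
```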

Example. Consider a fair coin that we will toss twice. When we have repeated actions like this, we will often consider the whole sequence as a single experiment with its own sample space; here we have
$$\Omega = \{HH, HT, TH, TT\}.$$
Events include
$\{HH, HT, TH\}$, i.e., there is at least one “heads” (or at most one “tails”)
$\{HT, TH\}$, i.e., there is exactly one “heads” (or exactly one “tails”)
etc.
Since the coin is fair, a natural probability law is to assign each of the four outcomes a probability of 1/4, and so
$$P(\text{at least one heads}) = P(\{HH, HT, TH\}) = 3/4,$$
$$P(\text{exactly one heads}) = P(\{HT, TH\}) = 1/2,$$
and so on.
Note that an alternative line of reasoning would be to say that there are three possibilities: (i) no “heads”, (ii) one “heads”, and (iii) two “heads”, and since we have a fair coin, all three possibilities are equally likely, so the first probability is $\frac{2}{3}$ and the latter is $\frac{1}{3}$. Unfortunately, this is wrong because these three possibilities are not really equally likely. While this is kind of obvious here, in more complicated situations this is an easy mistake to make.
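A short enumeration makes the pitfall above visible. This Python sketch (variable names are illustrative) lists the four equally likely outcomes of two tosses and tallies how many contain 0, 1, or 2 heads; the resulting probabilities are 1/4, 1/2, and 1/4, not 1/3 each.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two tosses: all length-2 sequences of H/T, equally likely.
omega = ["".join(seq) for seq in product("HT", repeat=2)]  # ['HH', 'HT', 'TH', 'TT']
p = Fraction(1, len(omega))

# Probability of each possible *number* of heads.
num_heads = Counter(seq.count("H") for seq in omega)
dist = {k: count * p for k, count in sorted(num_heads.items())}
print(dist)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# The event "at least one heads" contains three of the four outcomes.
at_least_one = sum(1 for seq in omega if "H" in seq) * p
print(at_least_one)  # 3/4
```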
Since our foundational tool for understanding probability is the mathematical notion of a set, it will be critical to have a good understanding of basic set operations. If you need a refresher, a review of basic set notation and operations appears at the end of these notes.

Kolmogorov’s probability axioms
We will build up a theory of probability based on axioms that all probability laws must obey in order to be consistent with common sense. This abstraction allows us to develop definitive mathematical rules that stand apart from the philosophical questions about what probability really represents. Specifically, we will require a probability law to assign a number to every possible event $A$ such that
1. Nonnegativity: $P(A) \ge 0$ for every event $A$.
2. Additivity: If $A$ and $B$ are disjoint, i.e., if $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.
3. Normalization: $P(\Omega) = 1$, that is, the probability that “something happens” is 1.
There are many properties that can be immediately derived from these three axioms. For example, the normalization and additivity axioms tell us that
$$1 = P(\Omega) = P(\Omega \cup \emptyset) = P(\Omega) + P(\emptyset) = 1 + P(\emptyset),$$
which simplifies to
$$P(\emptyset) = 0,$$
i.e., the probability that “nothing happens” is 0. Also, for any event $A$,
$$1 = P(\Omega) = P(A \cup A^c) = P(A) + P(A^c),$$
and so
$$P(A^c) = 1 - P(A).$$

Another useful property that follows from additivity is that if $A_1, A_2, \ldots, A_n$ are $n$ pairwise disjoint events, then
$$\begin{aligned} P(A_1 \cup A_2 \cup \cdots \cup A_n) &= P(A_1) + P(A_2 \cup A_3 \cup \cdots \cup A_n) \\ &= P(A_1) + P(A_2) + P(A_3 \cup \cdots \cup A_n) \\ &\;\;\vdots \\ &= P(A_1) + P(A_2) + \cdots + P(A_n). \end{aligned}$$

Here are some additional properties that you should prove at home: Let $A, B, C$ be arbitrary events and let $P(\cdot)$ be a probability law satisfying the Kolmogorov axioms. Then
1. If $A \subseteq B$, then $P(A) \le P(B)$.
2. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
3. $P(A \cup B) \le P(A) + P(B)$.
4. $P(A \cup B \cup C) = P(A) + P(A^c \cap B) + P(A^c \cap B^c \cap C)$.
Proving these will help provide a good review of basic set theory.
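Before proving the properties above, it can help to check them numerically. The sketch below is not a proof: it assumes a four-outcome sample space with randomly chosen (non-uniform) weights, builds the corresponding probability law exactly with Python fractions, and asserts each property for every choice of events. The helper names `power_set`, `P`, and `comp` are hypothetical.

```python
import random
from fractions import Fraction
from itertools import chain, combinations

def power_set(omega):
    """All subsets of a finite sample space, as frozensets."""
    items = sorted(omega)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

# An arbitrary (non-uniform) law: random positive weights, normalized to sum to 1.
random.seed(0)
omega = frozenset(range(4))
weights = {s: random.randint(1, 10) for s in omega}
total = sum(weights.values())
law = {s: Fraction(w, total) for s, w in weights.items()}

def P(event):
    return sum((law[s] for s in event), Fraction(0))

def comp(event):
    """Complement relative to omega."""
    return omega - event

events = power_set(omega)
assert P(frozenset()) == 0 and P(omega) == 1
for A in events:
    assert P(comp(A)) == 1 - P(A)                      # complement rule
    for B in events:
        if A <= B:
            assert P(A) <= P(B)                        # property 1
        assert P(A | B) == P(A) + P(B) - P(A & B)      # property 2
        assert P(A | B) <= P(A) + P(B)                 # property 3
        for C in events:                               # property 4
            assert P(A | B | C) == P(A) + P(comp(A) & B) + P(comp(A) & comp(B) & C)
print("all properties hold on", len(events), "events")
```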
Exercise: Out of the students in a class, 60% love soda, 70% love pizza, and 40% love both soda and pizza. What is the probability that a randomly selected student loves neither soda nor pizza?
Ans. If $A$ is the event that a student loves soda and $B$ is the event that a student loves pizza, then $P(A) = 0.6$, $P(B) = 0.7$, and $P(A \cap B) = 0.4$. Thus, using DeMorgan's laws and the probability axioms, we have that
$$\begin{aligned} P(A^c \cap B^c) = P((A \cup B)^c) &= 1 - P(A \cup B) \\ &= 1 - \big(P(A) + P(B) - P(A \cap B)\big) \\ &= 1 - (0.6 + 0.7 - 0.4) = 0.1. \end{aligned}$$
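The arithmetic in this answer can be sketched in a few lines of Python. Exact fractions are a choice of this sketch, since subtracting the floating-point value 0.6 + 0.7 − 0.4 from 1 may not produce exactly 0.1.

```python
from fractions import Fraction

# Given probabilities, as exact fractions to avoid floating-point round-off.
P_A = Fraction(6, 10)   # loves soda
P_B = Fraction(7, 10)   # loves pizza
P_AB = Fraction(4, 10)  # loves both

P_A_or_B = P_A + P_B - P_AB   # inclusion-exclusion
P_neither = 1 - P_A_or_B      # DeMorgan's law + complement rule
print(P_A_or_B, P_neither)    # 9/10 1/10
```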

Where do probability laws come from?
That is a good question, and this is where the “modeling” comes in.
A probability law can potentially be based on factors such as:
relative frequencies in past occurrences (i.e., “data driven”)
physical laws
subjective belief based on experience
a careful and thorough polling of the public
etc.

Examples where these different approaches can be exploited include:
“What is the probability that LeBron James makes his next free throw?”
“What is the probability that more than $10^3$ photons hit the detector in 1 µs?”
“What is the probability that my wife will be mad at me when
I get home?”
“What is the probability that Donald Trump will be elected
president in 2024?”

Discrete models vs. continuous models
When there is a finite number of possible outcomes in $\Omega$, defining all of the possible events does not require too much imagination. If
$$|\Omega| = n,$$
where $|\Omega|$ denotes the size or number of elements in $\Omega$, then there are $2^n$ different subsets.

In many situations $\Omega$ can be huge but still easy to describe:
The number of 13-card bridge hands you could be dealt
The number of possible license plates you could potentially receive
The number of possible outcomes for all teams for the entirety of one Major League Baseball season

Here, the probability of an event is simply the sum of the probabilities of the outcomes that make up that event. Thus, if $A = \{s_1, s_2, \ldots, s_m\}$ then
$$P(A) = P(s_1) + P(s_2) + \cdots + P(s_m).$$

Moving from a finite number of discrete outcomes to an infinite (but countable) number of discrete outcomes doesn't cause too many mathematical difficulties.

Example. You flip a coin until you see “tails.” The outcome of the experiment is how many times the coin gets flipped. This could be any natural number, i.e.,
$$\Omega = \{1, 2, 3, \ldots\} = \mathbb{N}.$$

If the coin is “fair,” a natural probability law is¹
$$P(k) = P(\{k \text{ flips until tails}\}) = \left(\frac{1}{2}\right)^k.$$
It is easy to check that
$$\sum_{k=1}^{\infty} P(k) = 1.$$
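To check the sum, note that the partial sums form a geometric series: $\sum_{k=1}^{n} (1/2)^k = 1 - (1/2)^n$, which tends to 1 as $n \to \infty$. The Python sketch below (illustrative, using exact fractions) verifies this partial-sum formula for a few values of $n$ and prints how close each partial sum is to 1.

```python
from fractions import Fraction

def P(k):
    """P(k flips until the first tails) for a fair coin: (1/2)^k."""
    return Fraction(1, 2) ** k

# Partial sums P(1) + ... + P(n) equal 1 - (1/2)^n exactly,
# so they approach 1 as n grows.
for n in (1, 2, 5, 10, 20):
    partial = sum(P(k) for k in range(1, n + 1))
    assert partial == 1 - Fraction(1, 2) ** n
    print(n, float(partial))
```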

In contrast to the discrete case, when there is a continuum of possible outcomes (“uncountably infinite” in the language of set theory²), then there are some very technical considerations about what subsets of $\Omega$ can constitute an event.

¹ Note that we are slightly abusing notation here by letting $k$ stand for both “the number of flips until ‘tails’” as well as “the event that there are $k$ flips until ‘tails.’”
² See the Wikipedia page on Cantor's “diagonal argument” for a more complete understanding of the difference between “countably infinite” and “uncountably infinite.”

For example, suppose I choose a point at random from the interval $[0, 1]$. The natural probability law would define the probability of any particular point $p$ to be zero. After all, what are the chances that you would draw $\frac{\sqrt{2}}{2} = 0.70710678118\ldots$ or $\frac{1}{5} = 0.20000000000\ldots$ exactly? However, if I define the event $A$ to be picking a point between $\frac{1}{3}$ and $\frac{2}{3}$, i.e., $A = [\frac{1}{3}, \frac{2}{3}]$, then in this case
$$P(A) = \text{Length}(A) = \frac{1}{3},$$
and similarly for any other “typical” subset $A \subseteq \Omega$.
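A Monte Carlo simulation gives an empirical feel for the length rule. The sketch below assumes Python's `random.Random` generator as the source of uniform draws (its `random()` method returns values in $[0, 1)$, which is harmless here since single points have probability zero) and estimates $P(A)$ for $A = [1/3, 2/3]$ as the fraction of draws landing in $A$. `estimate_prob` is a hypothetical helper name.

```python
import random

def estimate_prob(a, b, trials=100_000, seed=0):
    """Monte Carlo estimate of P(a <= X <= b) for X drawn uniformly from [0, 1]."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if a <= rng.random() <= b)
    return hits / trials

print(estimate_prob(1/3, 2/3))  # close to Length([1/3, 2/3]) = 1/3
print(estimate_prob(0.5, 0.5))  # a single point: essentially never hit
```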

However, there are some subsets for which the “length” of the subset is
not well-defined—these are called “non-measurable sets”. I’d give you
an example, but it’s not really worth it—these sets are so unusual that
they rarely (if ever) play a role in our understanding of probability.

This issue, although seemingly arcane, is important to resolve to put probability on a firm mathematical footing. Fortunately, this has been done in an area of mathematics called “measure theory”. This is a topic for first-year graduate students; in this class, it is enough to know that assigning probabilities only to well-defined subsets avoids any major difficulties.

The discrete uniform law
The most basic probability law is simply that every outcome has the same probability. If $\Omega$ is finite with $|\Omega| = n$, this simply means that for any $A \subseteq \Omega$,
$$P(A) = \frac{|A|}{|\Omega|} = \frac{\text{the number of elements in } A}{n}.$$

Example. A fair six-sided die is rolled; call the outcome $D$. What is $P(\{D < 5\})$?
In this case, $A = \{1, 2, 3, 4\}$, and so $P(A) = \frac{4}{6} = \frac{2}{3}$.

Example. A fair coin is flipped three times. What is the probability that exactly two “heads” occur?
In this case we are actually dealing with a sequence of outcomes. We will talk more about ways to handle such problems later on, but in this case we can simply expand our notion of the sample space $\Omega$ to include all possible sequences of outcomes, i.e., we can consider
$$\Omega = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}.$$
Since $\Omega$ contains eight possible outcomes and each has equal probability (assuming a fair coin), each of these outcomes has a probability of $\frac{1}{8}$. There are only three outcomes that have exactly two heads:
$$A = \{HHT, HTH, THH\},$$
and so
$$P(A) = \frac{3}{8}.$$

Exercise: Given three flips of a fair coin, what is the probability of:
1. at least two heads: Ans. $P(\{HHH, HHT, HTH, THH\}) = \frac{1}{2}$
2. an odd number of heads: Ans. $P(\{HHH, HTT, THT, TTH\}) = \frac{1}{2}$
3. all tails: Ans. $P(\{TTT\}) = \frac{1}{8}$
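The counting rule $P(A) = |A| / |\Omega|$ can be sketched in Python by listing all eight sequences and counting those that satisfy a condition. The helper `P` below takes a predicate on an outcome string; it is an illustrative name, and the sketch reproduces both the 3/8 from the example above and the three exercise answers.

```python
from fractions import Fraction
from itertools import product

# All 2^3 = 8 equally likely sequences of three fair coin flips.
omega = ["".join(seq) for seq in product("HT", repeat=3)]

def P(predicate):
    """Discrete uniform law: |A| / |Omega|, where A = outcomes satisfying predicate."""
    A = [s for s in omega if predicate(s)]
    return Fraction(len(A), len(omega))

print(P(lambda s: s.count("H") == 2))      # 3/8  exactly two heads
print(P(lambda s: s.count("H") >= 2))      # 1/2  at least two heads
print(P(lambda s: s.count("H") % 2 == 1))  # 1/2  odd number of heads
print(P(lambda s: s == "TTT"))             # 1/8  all tails
```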

Exercise: We roll two fair six-sided dice; call the outcomes $D_1$ and $D_2$. There are now $6^2 = 36$ possibilities, each with equal probability. Here is a graphical depiction of some events:

[Figure: the 6 × 6 grid of outcomes, with $D_1$ on the horizontal axis and $D_2$ on the vertical axis, showing the events $\{D_2 = 5\}$, $\{D_1 + D_2 = 8\}$, and $\{\max(D_1, D_2) \le 3\}$.]

Calculate the probability that
1. the first roll is larger than the second, i.e., $P(D_1 > D_2)$
Ans. $P(D_1 > D_2) = \frac{15}{36}$
2. the first roll is equal to half of the second, i.e., $P\left(D_1 = \frac{1}{2} D_2\right)$
Ans. $P\left(D_1 = \frac{1}{2} D_2\right) = \frac{3}{36} = \frac{1}{12}$
3. at least one roll is a four, i.e., $P(\{D_1 = 4\} \cup \{D_2 = 4\})$
Ans. $P(\{D_1 = 4\} \cup \{D_2 = 4\}) = \frac{11}{36}$
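To check these answers, the Python sketch below enumerates all 36 ordered pairs $(D_1, D_2)$ and applies the discrete uniform law. Two choices of this sketch are worth noting: the event $D_1 = \frac{1}{2} D_2$ is tested as `2 * d1 == d2` to stay in integer arithmetic, and `Fraction` reduces automatically, so 15/36 prints as 5/12. The helper name `P` is illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered pairs (D1, D2) from two fair dice.
omega = list(product(range(1, 7), repeat=2))

def P(predicate):
    """Discrete uniform law: count favorable pairs, divide by 36."""
    return Fraction(sum(1 for d1, d2 in omega if predicate(d1, d2)), len(omega))

print(P(lambda d1, d2: d1 > d2))             # 5/12  (= 15/36)
print(P(lambda d1, d2: 2 * d1 == d2))        # 1/12  (= 3/36)
print(P(lambda d1, d2: d1 == 4 or d2 == 4))  # 11/36
```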

The continuous uniform law
When $\Omega$ is a continuum of outcomes, the statement “every outcome is equally likely” becomes trickier, since the probability of any particular outcome is zero.
In many cases, it will be natural to take $\Omega$ as an interval on the real line $\mathbb{R}$, or as a subset of the plane $\mathbb{R}^2$, or as a subset of the space $\mathbb{R}^3$, etc.
For example, suppose I throw a dart at a dartboard and ask what angle $\theta$ (in radians) the result makes with respect to the x-axis.

[Figure: dart on a dartboard.]

In this case, we can take $\Omega = [0, 2\pi]$ (or $[-\pi, \pi]$).

Then events $A$ are subsets of $\Omega$, and the uniform law assigns
$$P(A) = \frac{\text{Length}(A)}{\text{Length}(\Omega)}.$$
In the dartboard example
$$P\left(\theta \in \left(\frac{\pi}{4}, \frac{\pi}{2}\right)\right) = \frac{\pi/4}{2\pi} = \frac{1}{8}.$$

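Example. Suppose $\Omega$ is the unit square $[0, 1]^2 = [0, 1] \times [0, 1]$, i.e.,
$$\Omega = \{(x, y) : 0 \le x \le 1,\ 0 \le y \le 1\}.$$

[Figure: the unit square $\Omega = [0, 1]^2$.]

Then events $A$ are subsets of $[0, 1]^2$ and
$$P(A) = \frac{\text{Area}(A)}{\text{Area}(\Omega)} = \text{Area}(A).$$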

Exercise: Let $\Omega = [0, 1]^2$ and $A = \{(x, y) : \max(x, y) \le \frac{1}{3}\}$. What is $P(A)$?
Ans. The event $A$ corresponds to a square of size $\frac{1}{3} \times \frac{1}{3}$, which has an area of $\frac{1}{9}$, so that $P(A) = \frac{1}{9}$.

Exercise: Han and Chewbacca have arranged to meet at the cantina
at noon. Unfortunately Han gets delayed by a bounty hunter and
Chewbacca loses his watch, so they both are running late. Suppose
that they both arrive with delays of anywhere from zero to two hours
(with all possible delay combinations equally likely). Whoever gets
there first will have a drink, wait for 20 minutes, and will leave if
the other has not yet arrived. What is the probability that Han and
Chewbacca meet? (Hint: start by sketching the event A.)
Ans. We need to calculate the area of the region of arrival times for which Han and Chewy arrive within 20 minutes of each other, and find what fraction of the total area that is. The total area is $120^2 = 14400$, depicted by a square with sides of length 120 (assuming we are measuring in minutes). The easiest way to find the area of the region where they overlap is to subtract from the total area the region where they don't (which consists of two triangles, each of area $\frac{1}{2} \cdot 100 \cdot 100$). So,
$$P(A) = \frac{\text{Area}(A)}{\text{Area}(\Omega)} = \frac{\text{Area}(\Omega) - \text{Area}(A^c)}{\text{Area}(\Omega)} = \frac{14400 - 100^2}{14400} = \frac{4400}{14400} \approx 0.306.$$
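The geometric answer can be cross-checked by simulation. This Python sketch assumes both delays are independent and uniform on [0, 120] minutes, which is how we read “all possible delay combinations equally likely”; it computes the exact area ratio and then estimates the same probability by counting simulated pairs of arrival times that fall within 20 minutes of each other. The variable names are illustrative.

```python
import random

T = 120     # arrival delays range over [0, T] minutes
WAIT = 20   # whoever arrives first waits this many minutes

# Exact answer from the geometry: remove the two corner triangles,
# each with legs of length T - WAIT, from the T x T square.
exact = (T**2 - (T - WAIT)**2) / T**2

# Monte Carlo check: draw both delays uniformly and count meetings.
rng = random.Random(1)
trials = 200_000
meet = sum(1 for _ in range(trials)
           if abs(rng.uniform(0, T) - rng.uniform(0, T)) <= WAIT)
print(round(exact, 3), meet / trials)  # first value is 0.306; the estimate should be close
```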

Background and Review: Basic set operations
As we have seen, it is very natural to talk about sample spaces, outcomes and events in terms of set operations. This section serves as a quick brush-up on the basics.³
A set is just a collection of objects. For example
$$D = \{\text{Biden}, \text{Obama}, \text{Clinton}\},$$
$$R = \{\text{Trump}, \text{Bush}\},$$
$$\mathbb{Z}_8 = \{0, 1, 2, 3, 4, 5, 6, 7\},$$
are examples of finite sets, whereas the following
$$\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, 3, \ldots\},$$
$$\mathbb{Q} = \left\{\frac{a}{b} : a, b \in \mathbb{Z},\ b \neq 0\right\},$$
are examples of countably infinite sets. Finally, sets like
$$\mathbb{R} = \{\text{all real numbers}\},$$
$$U = \{x \in \mathbb{R} : 0 \le x \le 1\},$$
are examples of uncountably infinite sets.

The so-called empty set $\emptyset = \{\,\}$ is the set which contains nothing.

A set $B$ is a subset of another set $A$ if everything in $B$ is also in $A$:
$$B \subset A \text{ if and only if for every } x \in B \text{ we also have } x \in A.$$

The empty set $\emptyset$ is a subset of every set.


³ See also en.wikipedia.org/wiki/Set_operations_(Boolean).

For everything we do in this class, all sets of interest will be subsets of a sample space $\Omega$; you can think of $\Omega$ as the “universe” associated with a particular experiment.

Example: Suppose that A is a finite set with n elements.


1. How many subsets of A have exactly one element?
2. How many subsets of A have exactly two elements?
3. How many subsets of A are there total?

Set operations
Union: Simply combine the elements of the two sets. Easy example:
$$\{1, 2, 3\} \cup \{2, 3, 4\} = \{1, 2, 3, 4\}.$$

[Figure: $A \cup B$ shaded.]

Intersection: Find the common elements between two sets. Easy example:
$$\{1, 2, 3\} \cap \{2, 3, 4\} = \{2, 3\}.$$

[Figure: $A \cap B$ shaded.]

We say that $A$ and $B$ are disjoint or mutually exclusive if they have no elements in common:
$$A \cap B = \emptyset.$$

The complement $A^c$ of $A$ is everything in $\Omega$ that is not in $A$. Easy example:
$$\Omega = \{1, 2, 3, 4, 5, 6\}, \quad A = \{1, 2\}, \quad A^c = \{3, 4, 5, 6\}.$$

[Figure: $A^c$ shaded.]

The difference between $A$ and $B$ is everything in $A$ which is not in $B$, i.e., $A \setminus B = A \cap B^c$.

[Figure: $A \setminus B$ shaded.]

Obviously, $A \setminus B \neq B \setminus A$ in general.

DeMorgan’s Laws
Two simple rules of set algebra come in handy from time to time.
1. $(A \cup B)^c = A^c \cap B^c$.

[Figure: $(A \cup B)^c$ shaded.]

2. $(A \cap B)^c = A^c \cup B^c$.

[Figure: $(A \cap B)^c$ shaded.]

Exercise: Suppose our sample space is $\Omega = \{2, 4, 6, 8, 10, 12\}$, and let
$$A = \{2, 4, 6\}, \quad B = \{8, 10, 12\}, \quad C = \{2, 6, 8, 12\}.$$
Find
1. $A \cup B$
2. $B \cup C$
3. $A \cap C$
4. $(B \cup C)^c$
5. $(A \cup (B^c \cup (B \cap C)^c))^c$
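Python's built-in `set` type supports these operations directly: `|` is union, `&` is intersection, and `-` is set difference, so a complement relative to $\Omega$ can be written as `omega - S`. The sketch below (the helper `comp` is our own, for illustration) evaluates the five expressions from this exercise and spot-checks DeMorgan's laws, so you can compare against your hand-worked answers.

```python
# Sample space and events from the exercise, as Python sets.
omega = {2, 4, 6, 8, 10, 12}
A = {2, 4, 6}
B = {8, 10, 12}
C = {2, 6, 8, 12}

def comp(S):
    """Complement relative to the sample space omega."""
    return omega - S

answers = {
    "A ∪ B": A | B,
    "B ∪ C": B | C,
    "A ∩ C": A & C,
    "(B ∪ C)^c": comp(B | C),
    "(A ∪ (B^c ∪ (B ∩ C)^c))^c": comp(A | (comp(B) | comp(B & C))),
}
for name, S in answers.items():
    print(name, "=", sorted(S))

# Spot-check DeMorgan's laws on these events.
assert comp(A | B) == comp(A) & comp(B)
assert comp(A & C) == comp(A) | comp(C)
```

With these definitions, the last expression evaluates to [8, 12].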
