
MAT 271 Probability and Statistics

Lecture 2: Sample Space and Probability

Asst. Prof. N. Kemal Ure

Istanbul Technical University


[email protected]

February 18, 2020


Overview

1 Introduction

2 Sets

3 Probabilistic Models

4 Conditional Probability

5 Total Probability Theorem and Bayes’ Rule

6 Independence

7 Counting

8 Summary
Introduction
Introduction

▶ What is probability? There are two different schools of thought:

▶ Frequentist perspective: Probability of an event is a measure of its


frequency of occurrence

∎ If I flip a coin a lot of times, around 50% of the results will be heads.

▶ Bayesian perspective: Probability of an event is a measure of our


belief regarding its uncertainty

∎ The Iliad and the Odyssey were written by the same person with probability 90%.

▶ Both views have their advantages and disadvantages in practice; more about that when we move into statistics.
Introduction

▶ In this lecture we will start laying out the foundational material we are going to build our course on:
∎ Sets, sample space, basic properties of probability measuring functions
∎ We will see that probability theory is all about measuring size of sets.

▶ We will also learn about some techniques that will allow us to estimate probabilities of some simple events:
∎ Coin toss, die roll and deck shuffling problems
∎ We will see that most of the problems above are counting problems.

▶ We will also learn some of the most popular and useful probability
theorems: Total Probability Theorem and Bayes’ Rule.
Sets
Sets

▶ Why study set theory?


∎ Because later we will see that estimating probabilities of events is
equivalent to measuring the size of subsets of certain spaces.
∎ We will measure the size of discrete sets by counting the elements
∎ For continuous sets we will use integration
∎ But first, we need to have a good understanding of basic set theory.

▶ A set S is a collection of objects, which are elements of the set.


▶ A set can be defined by simply listing its elements
∎ Set of outcomes for a die roll: S = {1, 2, 3, 4, 5, 6}
∎ Set of outcomes for a coin flip: S = {H, T }
∎ The empty set ∅ is the set with no elements

▶ Or we can write a property that defines the elements of a set


∎ S = {x ∈ Z ∣ 1 ≤ x ≤ 6}
Sets

▶ If a set has infinitely many elements that can be enumerated in a list:

S = {x1 , x2 , . . . }
∎ Sets that can be written like this are called countable sets.
∎ The set of integers Z, the set of even integers, and the set of rational
numbers Q are all countable.

▶ If an infinite set can’t be enumerated, the set is called uncountable


∎ The set S = {x∣0 ≤ x ≤ 1} is uncountable.

▶ We say set S is a subset of T or S ⊂ T if x ∈ S Ô⇒ x ∈ T


∎ Equivalently, the superset relation can be written as T ⊃ S.
∎ Two sets S and T are equal, that is S = T if statements S ⊂ T and T ⊂ S
are both true at the same time.

▶ We denote the universal set with Ω. For every set S, we have S ⊂ Ω


Set Operations

▶ The complement of a set S is denoted by S c and is defined as the set of all elements that do not belong to S:

S c = {x∣x ∉ S}

∎ Note that Ωc = ∅

▶ The union of two sets S and T , denoted S ∪ T , is the set that contains all
of their elements:

S ∪ T = {x∣x ∈ S or x ∈ T }

▶ The intersection of two sets S and T , denoted S ∩ T , is the set that contains the elements that belong to both:

S ∩ T = {x∣x ∈ S and x ∈ T }
Set Operations

▶ The union and intersection operations can be extended to several, or even infinitely many, sets. Let S1 , S2 , . . . be an infinite collection of sets:

⋃_{n} Sn = S1 ∪ S2 ∪ ⋯ = {x ∣ x ∈ Sn for some n}

⋂_{n} Sn = S1 ∩ S2 ∩ ⋯ = {x ∣ x ∈ Sn for all n}

▶ Sets S, T are called disjoint if their intersection is empty; S ∩ T = ∅


∎ A collection of sets Sn is said to be a partition of a set S if the sets are
disjoint and their union is S.

▶ If x, y are two objects, (x, y) denotes the ordered pair of x and y.


∎ The set of all ordered pairs of real numbers (the plane) is R2
∎ The set of all ordered triplets of real numbers is R3
Set Operations
Algebra of Sets

▶ Set operations have some intuitive properties that can be easily derived from the definitions:

S∪T = T ∪S
S ∩ (T ∩ U ) = (S ∩ T ) ∩ U
S ∩ (T ∪ U ) = (S ∩ T ) ∪ (S ∩ U )
S ∪ (T ∩ U ) = (S ∪ T ) ∩ (S ∪ U )
(S c )c = S
S ∩ Sc = ∅
S∪Ω = Ω
S∩Ω = S

▶ Two particularly useful properties are known as De Morgan’s Laws


(⋃_{n} Sn )^c = ⋂_{n} Sn^c ,   (⋂_{n} Sn )^c = ⋃_{n} Sn^c
Probabilistic Models
Probabilistic Models

▶ A probabilistic model is a mathematical description of an uncertain situation. The two main ingredients of a probabilistic model are listed below:

Definition (Elements of a Probabilistic Model)


▶ The Sample Space Ω, which is the set of all possible outcomes of an
experiment.
▶ The probability law P , which assigns to a set A of possible outcomes
(also called an event) a nonnegative number P (A) (called the
probability of A) that encodes our knowledge/belief about the
collective likelihood of the elements of A. The probability law should
satisfy certain properties, to be introduced shortly
Probabilistic Models
Sample Spaces and Events

▶ All models have an underlying process called an experiment


∎ An experiment produces exactly one out of several outcomes
∎ The set of all possible outcomes is called the sample space
∎ A subset of the sample space, i.e., a collection of outcomes, is called an event

▶ There is no restriction on what can be an experiment


∎ Could be a single coin toss, several tosses, or an infinite number of tosses
∎ However, there is only one experiment. Hence a sequence of tosses is
still a single experiment.

▶ Sample spaces can have a finite or infinite number of elements


∎ Usually finite spaces are much simpler to deal with.
Sample Spaces and Events

▶ All elements of a sample space should be mutually exclusive


∎ That is, each outcome should correspond to exactly one element
∎ Example: In a die roll experiment, the sample space cannot contain both
the elements "1 or 4" and "1 or 3", since the outcome 1 would belong to both

▶ A sample space should be collectively exhaustive


∎ No matter what the outcome is, there should always be a corresponding
element in the sample space.

▶ Example: What are the appropriate sample spaces for the


experiments below?
∎ We toss a coin ten times. We receive $1 each time head comes up.
∎ We toss a coin ten times. We receive $1 for the first time head comes up.
Then the received amount is doubled each time a head comes up.
Sequential Models

▶ Many experiments are inherently sequential


∎ Tossing a coin three times
∎ Observing the value of a stock on five successive days
∎ Receiving eight successive digits at a communication receiver.

▶ In those cases it makes sense to describe the sample space in terms of a tree-based sequential description:
Probability Laws

▶ Now suppose that we have selected a sample space. The next step is to specify the probability law P

Definition (Probability Axioms)


1 (Nonnegativity) P (A) ≥ 0 for every event A.
2 (Additivity) If A and B are two disjoint events, then the probability of their
union is:
P (A ∪ B) = P (A) + P (B).
In general, if A1 , A2 , . . . is a sequence of disjoint events, then,

P (A1 ∪ A2 ∪ . . . ) = P (A1 ) + P (A2 ) + . . .

3 (Normalization) The probability of the entire sample space Ω is 1, that is,

P (Ω) = 1.
Probability Laws

▶ As an analogy, think of a unit of mass spread over the sample space.

∎ You can think of P (A) as the collective mass that was assigned to the elements
of A. Then the additivity axiom becomes intuitive.

▶ Based on the probability axioms, the following results can be derived


∎ P (∅) = 0.
∎ P (A1 ∪ A2 ∪ A3 ) = P (A1 ) + P (A2 ) + P (A3 ), A1 , A2 , A3 are disjoint.
∎ More properties will be derived later.

▶ As an example, let’s derive a probability law for the coin toss experiment


∎ Probability law for a single coin toss
∎ Probability law for three coin tosses
Discrete Models

▶ Generalizing the previous example, we get a probability law:

Definition (Discrete Probability Law)


If the sample space consists of a finite number of possible outcomes,
then the probability law is specified by the probabilities of the events that
consist of a single element. In particular, the probability of any event
{s1 , s2 , . . . , sn } is the sum of the probabilities of its elements:

P ({s1 , s2 , . . . , sn }) = P (s1 ) + P (s2 ) + ⋅ ⋅ ⋅ + P (sn )

∎ Note that we are abusing notation here by writing P (si ) for P ({si })
Discrete Models

▶ For the special case where each outcome is equally likely we have:

Definition (Discrete Uniform Probability Law)


If the sample space consists of n different outcomes, which are all
equally likely (i.e., P (si ) = 1/n), then probability of event A is

P (A) = (number of elements in A) / n.

▶ Example: Rolling a pair of 4−sided dice.


∎ P ({Sum of rolls is even}) = 1/2
∎ P ({Sum of rolls is odd}) = 1/2
∎ P ({First roll is equal to second}) = 1/4

∎ P ({First roll is larger than second}) = 3/8

∎ P ({At least one roll is equal to 4}) = 7/16
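Since the two 4-sided dice give 16 equally likely pairs, these values can be checked by brute-force enumeration. Below is a minimal Python sketch (not part of the original slides; the helper name prob is introduced only for this illustration):

```python
# A minimal enumeration check of the 4-sided dice values above
# (assumes both dice are fair and independent, so all 16 pairs are equally likely).
from itertools import product

outcomes = list(product(range(1, 5), repeat=2))   # the 16 equally likely pairs

def prob(event):
    """P(A) = (number of elements in A) / (number of elements in Omega)."""
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

print(prob(lambda o: (o[0] + o[1]) % 2 == 0))   # 0.5     (sum is even)
print(prob(lambda o: (o[0] + o[1]) % 2 == 1))   # 0.5     (sum is odd)
print(prob(lambda o: o[0] == o[1]))             # 0.25    (first equals second)
print(prob(lambda o: o[0] > o[1]))              # 0.375   (first larger than second)
print(prob(lambda o: 4 in o))                   # 0.4375  (at least one roll is 4)
```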


Continuous Models

▶ Probabilistic models with continuous sample spaces differ from


discrete models:
∎ Probability of a single element is usually not enough to determine the
whole probability law
▶ Example: Consider a wheel of fortune, where the results are
calibrated from 0 to 1.
∎ Hence the sample space is Ω = [0, 1]
∎ But we can’t assign P (x) > 0 to every x ∈ Ω; then by additivity we would get P (Ω) = ∞, contradicting normalization
∎ We can work around this problem by assigning probabilities to intervals

rather than single elements, such as

P ([a, b]) = b − a, [a, b] ⊂ [0, 1]

▶ Example: Throwing darts


∎ It makes sense to define the probability of hitting a certain region as the
area of that region (given that the dartboard has unit area)
Continuous Models

▶ Example: Romeo and Juliet have a date and each arrive with a delay
of 0 to 1 hour. The first to arrive waits for at most 15 minutes and
then leaves. What is the probability that they will meet?
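The standard way to solve this is to draw the square [0, 1] × [0, 1] of arrival delays and measure the area of the band where the delays differ by at most 1/4 hour, which gives 1 − (3/4)^2 = 7/16. A Monte Carlo sketch (assuming independent, uniform delays; the 0.25-hour threshold comes from the 15-minute wait) can be used to sanity-check that value:

```python
# A Monte Carlo sketch of the Romeo-and-Juliet problem, assuming independent
# delays uniform on [0, 1] hours and a 15-minute (0.25-hour) waiting time.
import random

trials = 1_000_000
meets = sum(abs(random.random() - random.random()) <= 0.25 for _ in range(trials))
print(meets / trials)   # ~ 0.4375; the exact answer is the area 1 - (3/4)**2 = 7/16
```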
Continuous Models

▶ Why are discrete and continuous models so different?

∎ Because the real line is uncountable, while discrete models are finite or countable.
∎ That is why the probability of a single element P (x) can be positive in the
discrete setting, whereas it should be equal to zero in the continuous setting.

▶ In general, probability of event A is measured by the integral ∫A dt.


∎ This integral is the ”area of the set”, in a general sense.
∎ However, it turns out that this integral is not well defined for all sets.
∎ Analyzing which sets are measurable is a very deep question. This is
addressed by a branch of mathematics called measure theory, which is out
of scope for this course.
∎ For the problems in this course, all sets will be measurable, so we will not
have to deal with the non-existence of integrals or areas.
Properties of Probability Laws

▶ Using the axioms, we can prove the following properties

Theorem (Properties of Probability Laws)


Consider a probability law P and events A, B, C.
1 If A ⊂ B, then P (A) ≤ P (B).
2 P (A ∪ B) = P (A) + P (B) − P (A ∩ B).
3 P (A ∪ B) ≤ P (A) + P (B).
4 P (A ∪ B ∪ C) = P (A) + P (Ac ∩ B) + P (Ac ∩ B c ∩ C).
Models and Reality

▶ Probability theory can be used to analyze a wide variety of uncertain


physical processes. This is done in two stages:
∎ Stage I, Constructing the model: In this stage we assign an appropriate
sample space and probability law to the physical process
Sometimes, this is done via domain expertise and past experiences.
The most popular way is to use past data for building the probability
law. This is the main objective of statistics, which we will learn more
about later on.
∎ Stage II, Analyze the model: In this stage we work with our model to find
probabilities of certain events or deduce some interesting properties.
This part can be very hard depending on the mathematical tools and
algorithms you use.
One of the objectives of this course is to teach you efficient methods
for analyzing your probabilistic models.
Conditional Probability
Conditional Probability

▶ Conditional probability allows us to compute probabilities of events based on partial information. Consider the following problems:
∎ In an experiment with rolls of 2 dice, you are told that the sum of the rolls is 9.
What is the probability that the first roll was a 6?
∎ In a word guessing game, the first letter of a word is ’t’. What is the

likelihood that second letter is ’h’ ?


∎ How likely is it that a person has a certain disease given that a medical

test was negative?


∎ A spot shows up on a radar screen. How likely is it an aircraft?

▶ Assume we already have a sample space and a probability law. We


want to compute the probability of an event A, given that the
outcome is within some event B. We need a new probability law
called the conditional probability law, denoted as:

P (A∣B)

∎ How can we derive this probability law from the given P (A), P (B), etc.?
Conditional Probability

▶ Let’s start simple. Consider a fair die roll, we already have a


probability law for that.
∎ What is the probability that the roll is 6?
∎ What is the probability that the roll is 6, given that it is even?

P (outcome is 6∣outcome is even) = 1/3

▶ This thought experiment suggests the following formula:


P (A∣B) = (number of elements in A ∩ B) / (number of elements in B)

▶ Generalizing the argument, we get the following definition:

P (A∣B) = P (A ∩ B) / P (B)

∎ We assume P (B) > 0. Otherwise conditional probability is undefined.
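A tiny enumeration sketch (illustrative only) makes the die-roll computation concrete: with equally likely outcomes, P (A∣B) reduces to counting the elements of A ∩ B and of B.

```python
# A tiny enumeration sketch of the fair-die computation above, using
# P(A|B) = |A ∩ B| / |B| for equally likely outcomes.
omega = range(1, 7)
B = [x for x in omega if x % 2 == 0]     # conditioning event: the roll is even
A_and_B = [x for x in B if x == 6]       # outcomes in both A and B
print(len(A_and_B) / len(B))             # 1/3
```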


Conditional Probability

▶ Does conditional probability specify a probability law? We need to


check the axioms. It is easy to verify the following
∎ P (A∣B) ≥ 0 (Nonnegativity)
∎ P (Ω∣B) = 1 (Normalization)
∎ For A1 , A2 disjoint, P (A1 ∪ A2 ∣B) = P (A1 ∣B) + P (A2 ∣B), (Additivity)

▶ The answer is positive! Hence conditional probability automatically


satisfies the other properties such as
∎ For A ⊂ C, P (A∣B) ≤ P (C∣B)
∎ P (A ∪ C∣B) ≤ P (A∣B) + P (C∣B)

▶ Also note that P (B∣B) = 1 and P (A∣B) = 0 for A ∩ B = ∅. So we


can discard all the outcomes outside B and accept B as our new
universe.
Conditional Probability

▶ Let’s summarize our findings with a definition

Definition (Conditional Probability)


∎ The conditional probability of an event A, given an event B with P (B) > 0,
is defined by
P (A∣B) = P (A ∩ B) / P (B) ,

and specifies a new probability law on the sample space Ω. In particular, all
properties of probability laws remain valid for conditional probability laws.
∎ Conditional probabilities can also be viewed as a probability law on a new
universe B, because all of the probability is concentrated there.
∎ If the possible outcomes are finitely many and equally likely, then:

P (A∣B) = (number of elements in A ∩ B) / (number of elements in B)
Conditional Probability

▶ Example: Consider tossing a fair coin three times. Let A be the event that
more tails than heads come up and let B be the event that the 1st toss is a
head. Compute P (A∣B).

▶ Example: A conservative design team (call it C) and an innovative


design team (call it N ) are separately asked to design a new product
within a month. From past experience we know that

∎ The probability that team C is successful is 2/3.

∎ The probability that team N is successful is 1/2.

∎ The probability that at least one of them is successful is 3/4


Assuming only one successful design is produced, what is the
probability that it was designed by team N ?
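One way to work this out is with inclusion-exclusion: P (C ∩ N ) = P (C) + P (N ) − P (C ∪ N ) = 5/12, so P (only N succeeds) = 1/12 and P (exactly one success) = 4/12, giving 1/4. The short sketch below (an illustration, not part of the slides) reproduces the arithmetic with exact fractions:

```python
# A sketch of the design-team example with exact fractions, assuming only the
# three probabilities given above: P(C) = 2/3, P(N) = 1/2, P(C or N) = 3/4.
from fractions import Fraction

P_C, P_N, P_union = Fraction(2, 3), Fraction(1, 2), Fraction(3, 4)
P_both = P_C + P_N - P_union            # inclusion-exclusion: P(C and N) = 5/12
P_only_N = P_N - P_both                 # only N succeeds: 1/12
P_exactly_one = P_union - P_both        # exactly one success: 1/3
print(P_only_N / P_exactly_one)         # P(designed by N | exactly one success) = 1/4
```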
Using Conditional Probability for Modeling

▶ In some problems it is easier to specify conditional probabilities


rather than the standard probabilities of events.

∎ In this case we can start with P (B) and P (A∣B), and then use the
following formula to compute P (A ∩ B)

P (A ∩ B) = P (A∣B)P (B)

▶ Example (Radar Detection): If an aircraft is present in a certain area,


radar detects it with probability 0.99. If an aircraft is not present,
radar generates a false alarm with probability 0.1. We assume that
an aircraft is present with probability 0.05. What is the probability of
no aircraft presence and a false alarm? What is the probability of
aircraft presence and no detection?
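A hedged sketch of the computation, using the formula P (A ∩ B) = P (B)P (A∣B) with the numbers stated in the example:

```python
# A sketch of the radar example using P(A ∩ B) = P(B) P(A|B)
# with the probabilities stated above.
p_present = 0.05
p_detect_given_present = 0.99
p_false_alarm_given_absent = 0.10

p_false_alarm = (1 - p_present) * p_false_alarm_given_absent   # no aircraft, alarm raised
p_missed = p_present * (1 - p_detect_given_present)            # aircraft present, no detection
print(p_false_alarm)   # 0.095
print(p_missed)        # 0.0005
```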
Using Conditional Probability for Modeling

▶ It is a good idea to attack such problems with tree-based models


Using Conditional Probability for Modeling

▶ Generalizing the argument in the example, we have:

Theorem (Multiplication Rule)


Assuming all conditioning events have positive probability, we have
P (A1 ∩ A2 ∩ ⋯ ∩ An ) = P (A1 )P (A2 ∣A1 )P (A3 ∣A1 ∩ A2 ) ⋯ P (An ∣A1 ∩ ⋯ ∩ An−1 )

∎ Most problems involving conditional probabilities can be solved by first


converting the problem to a tree, and applying the multiplication rule.
Using Conditional Probability for Modeling

▶ Example: Three cards are drawn from an ordinary 52−card deck


without replacement. Find the probability that none of the three
cards is a heart.

▶ Example: A class containing 4 graduate and 12 undergraduate


students is randomly divided into 4 groups of 4. What is the
probability that each group includes a graduate student?

▶ Example (Monty Hall): You are in a TV show, and you are told that
the grand prize is equally likely to be found behind any of the three closed
doors. You point to one of the doors, and then the host opens one
of the remaining two doors, after making sure the prize is not behind
it. Then the host gives you a chance to switch. Would you stick to
your initial choice, or switch to the unopened door? What is the best
strategy? (See the simulation sketch below.)
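A Monte Carlo sketch of the Monty Hall game (illustrative; it assumes the host always opens a non-chosen, non-prize door) suggests the answer: switching wins about 2/3 of the time, sticking only about 1/3.

```python
# A Monte Carlo sketch of the Monty Hall game (assumes the host always opens
# a door that is neither your choice nor the prize door).
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        choice = random.randrange(3)
        # the host opens some door that hides no prize and was not chosen
        opened = next(d for d in range(3) if d != choice and d != prize)
        if switch:
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += (choice == prize)
    return wins / trials

print(play(switch=False))   # ~ 1/3
print(play(switch=True))    # ~ 2/3: switching is the better strategy
```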
Total Probability Theorem and Bayes’ Rule
Total Probability Theorem

▶ The following is very useful for computing certain probabilities:


Theorem (Total Probability Theorem)
Let A1 , . . . , An be disjoint events that form a partition of the sample
space. Assume that P (Ai ) > 0 for all i. Then for any event B, we have:

P (B) = P (A1 ∩ B) + ⋅ ⋅ ⋅ + P (An ∩ B)


= P (A1 )P (B∣A1 ) + ⋅ ⋅ ⋅ + P (An )P (B∣An )
Total Probability Theorem

▶ Example: You enter a chess tournament where the probability of you
winning is 0.3 against half of the players (call them Type I), 0.4
against a quarter of the players (call them Type II) and 0.5 against the
remaining quarter of the players (call them Type III). You play against
a randomly chosen opponent. What is the probability of winning?

▶ Example: Alice is taking probability classes and at the end of each


week she is either up-to-date or has fallen behind. If she is
up-to-date in a given week, the probability that she will be
up-to-date (or behind) in the next week is 0.8 (or 0.2 respectively). If
she is behind in a given week, the probability that she will be
up-to-date (or behind) in the next week is 0.4 (or 0.6 respectively).
What is the probability that she is up-to-date after three weeks?
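A short numeric sketch of both examples (for Alice, the slide does not state an initial condition, so the sketch assumes she starts the first week up-to-date):

```python
# A numeric sketch of both examples, using only the probabilities stated above.
# Chess: condition on the opponent's type (total probability theorem).
p_win = 0.5 * 0.3 + 0.25 * 0.4 + 0.25 * 0.5
print(p_win)   # 0.375

# Alice: condition, week by week, on whether she is currently up-to-date.
# (Assumption: she starts up-to-date, since the slide gives no initial condition.)
p_up = 1.0
for _ in range(3):
    p_up = p_up * 0.8 + (1 - p_up) * 0.4
print(p_up)    # 0.688 under this assumption
```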
Inference and Bayes’ Rule

▶ Total probability theorem is usually used with the following famous


theorem, which enables us to pass from P (B∣A) to P (A∣B)

Theorem (Bayes’ Rule)


Let Ai be disjoint events that form a partition of the sample space, and
assume that P (Ai ) > 0 for all i. Then for any event B such that
P (B) > 0, we have

P (Ai ∣B) = P (Ai )P (B∣Ai ) / P (B)
         = P (Ai )P (B∣Ai ) / ( P (A1 )P (B∣A1 ) + ⋯ + P (An )P (B∣An ) )

∎ This theorem is tremendously useful, because in many applications we


have access to P (Ai ) and P (B∣Ai ), but what we really need is P (Ai ∣B).
More on that on the next slide.
Inference and Bayes’ Rule

▶ Bayes’ Rule is generally used for inference

∎ There are a number of causes (Ai ) which may explain a certain effect (B).
∎ By using domain expertise or heuristics, we can often find P (Ai ) (the overall
probability of occurrence of that cause) and P (B∣Ai ) (the probability of the
effect occurring given that the cause has happened).
∎ Then by using Bayes’ Rule, we can compute P (Ai ∣B) (given that the effect is
observed, what is the probability that this particular cause produced it?).
∎ Finally, by comparing the different P (Ai ∣B), we can infer the most
probable cause of the effect.
∎ We call P (Ai ) the prior probability and P (Ai ∣B) the posterior probability.


Inference and Bayes’ Rule

▶ Example: Return to the aircraft example. What is the probability


that an aircraft is present, given that the radar generated an alarm?

▶ Example: Return to the chess example. Given that you won the
match, what is the probability that your opponent was type I?

▶ Example(False-Positive Puzzle): A test for a certain rare disease is


assumed to be correct 95% of the time. If a person has the disease,
test results are positive with probability 0.95 and if the person does
not have the disease, test results are negative with probability 0.95.
A random person has probability 0.001 of having the disease. Given
that this person has tested positive, what is the probability of having
the disease?
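Plugging the numbers into Bayes' rule (a sketch; all values come from the statement above):

```python
# A sketch of the false-positive puzzle via Bayes' rule, with the stated numbers:
# P(disease) = 0.001, P(positive | disease) = 0.95, P(positive | no disease) = 0.05.
p_d = 0.001
p_pos_given_d = 0.95
p_pos_given_not_d = 0.05

p_pos = p_d * p_pos_given_d + (1 - p_d) * p_pos_given_not_d   # total probability
print(p_d * p_pos_given_d / p_pos)   # ~ 0.0187: the disease is still unlikely
```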
Independence
Independence

▶ We have introduced P (A∣B) to capture the partial information that event B provides about event A. What if B has no effect on A?

P (A∣B) = P (A)

∎ This special case is so important that it needs to be studied separately.


∎ Event A is independent of B, if occurrence of B does not have any effect
on the probability of occurrence of A.
▶ Using the definitions, independence can also be formulated as:

P (A ∩ B) = P (A)P (B)

∎ This definition is preferred; it also allows the use of events with P (B) = 0.


∎ Note that if A is independent of B, B is also independent of A. Hence
we can simply refer to A and B as independent events.
Independence

▶ Independence is easy to understand intuitively

∎ If two physical processes are not interacting with each other, events
caused by them are usually independent.

∎ For instance, outcomes of successive coin tosses/die rolls are usually


independent.

▶ On the other hand, independence is not easy to visualize in the


sample space

∎ Disjoint events are not usually independent! On the contrary, two


disjoint events with P (A) > 0, P (B) > 0 are never independent.

∎ Note that A and Ac are disjoint, but knowing one of them has occurred,
determines the other exactly.
Independence

▶ Example: Consider an experiment involving two successive 4−sided


die rolls. Determine the independence of following events:

∎ Ai = {1st roll results in i}, Bj = {2nd roll results in j}

∎ A = {1st roll is a 1}, B = {sum of two rolls is a 5}

∎ A = {max of two rolls is 2}, B = {min of two rolls is 2}

▶ Also note that if occurrence of B does not tell anything about A,


non-occurrence of B also should not give any information on A.

∎ Thus, if A and B are independent, so are A and B c .


Conditional Independence

▶ Since conditional probability is also a probability law, we can extend the notion of independence to the conditional setting.
∎ Given an event C, two events A, B are conditionally independent if

P (A ∩ B∣C) = P (A∣C)P (B∣C)

▶ By working with the definitions (and assuming P (B ∩ C) > 0), we can show that:

P (A∣B ∩ C) = P (A∣C)

∎ Thus, when C occurs, occurrence of B does not have any effect on


probability of occurrence of A.

▶ Interestingly, conditional independence and unconditional independence do not imply each other. Let’s verify this through examples.
Conditional Independence

▶ Consider two fair coin tosses. Let H1 = {1st toss is a head},


H2 = {2nd toss is a head} and D = {results of tosses are different}.
Show that although H1 and H2 are independent, they are not
conditionally independent when conditioned on D.

▶ Consider two coins, a blue one and a red one. We choose one at
random and proceed with two independent tosses. The coins are biased: the
probability of a head is 0.99 with the blue coin and 0.01 with the red coin. Let
B be the event that blue coin is chosen and let Hi be the event that
ith toss results in a head. Show that H1 and H2 are dependent, but
they become conditionally independent conditioned on B.
Independence

Definition (Independence)
▶ Two events A and B are said to be independent if

P (A ∩ B) = P (A)P (B).

If in addition P (B) > 0, an equivalent definition is

P (A∣B) = P (A)

▶ Two events A and B are said to be conditionally independent given


an event C with P (C) > 0, if

P (A ∩ B∣C) = P (A∣C)P (B∣C).

If in addition P (B ∩ C) > 0, an equivalent definition is

P (A∣B ∩ C) = P (A∣C)
Independence of Multiple Events

▶ Definition of independence can be extended to multiple events

Definition (Independence of Several Sets)


We say that events A1 , A2 , . . . An are independent if

P ( ⋂_{i∈S} Ai ) = ∏_{i∈S} P (Ai ),   for every subset S of {1, 2, . . . , n}

▶ To check independence of A1 , A2 , A3 , we check four conditions

P (A1 ∩ A2 ) = P (A1 )P (A2 )
P (A1 ∩ A3 ) = P (A1 )P (A3 )
P (A2 ∩ A3 ) = P (A2 )P (A3 )
P (A1 ∩ A2 ∩ A3 ) = P (A1 )P (A2 )P (A3 )

∎ First three conditions are known as pairwise independence


∎ The fourth condition does not follow from the first three, and vice versa.
Independence of Multiple Events

▶ Example (Pairwise Independence does not imply Independence):


Consider two fair coin tosses. Let H1 = {1st toss is a head},
H2 = {2nd toss is a head} and D = {results of tosses are different}.
Show that H1 and H2 , H1 and D, and H2 and D are independent,
although H1 , H2 , D are not.

▶ Example (Equality P (A1 ∩ A2 ∩ A3 ) = P (A1 )P (A2 )P (A3 ) does not


imply pairwise independence): Consider two independent rolls of a
fair die, and the following events
∎ A = {First roll is 1,2 or 3}
∎ B = {First roll is 3,4 or 5}
∎ C = {Sum of rolls is 9}
Show that although P (A ∩ B ∩ C) = P (A)P (B)P (C), no two
events are independent.
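The first example (two fair coin tosses) can be checked by enumerating the four equally likely outcomes; a small sketch, illustrative only:

```python
# An enumeration sketch for the first example above:
# two fair coin tosses, four equally likely outcomes.
from itertools import product

omega = list(product("HT", repeat=2))
P = lambda event: sum(1 for o in omega if event(o)) / len(omega)

H1 = lambda o: o[0] == "H"                 # 1st toss is a head
H2 = lambda o: o[1] == "H"                 # 2nd toss is a head
D  = lambda o: o[0] != o[1]                # results of the tosses are different

print(P(lambda o: H1(o) and H2(o)) == P(H1) * P(H2))   # True  (pairwise)
print(P(lambda o: H1(o) and D(o))  == P(H1) * P(D))    # True  (pairwise)
print(P(lambda o: H2(o) and D(o))  == P(H2) * P(D))    # True  (pairwise)
print(P(lambda o: H1(o) and H2(o) and D(o))
      == P(H1) * P(H2) * P(D))                         # False (not jointly independent)
```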
Reliability

▶ Independence has major applications in reliability analysis. In
complex systems, the behavior of individual components is usually
modeled as independent, which simplifies the analysis
▶ Example: A computer network connects two nodes A and B through
intermediate nodes C, D, E, F . For every pair of nodes i, j,
pij is the probability that the link between them is up. We assume links fail
independently of each other. What is the probability that there is a
path connecting A to B?
Independent Trials and Binomial Probabilities

▶ If an experiment involves several independent but identical stages, we


say that we have a sequence of independent trials.

▶ In particular, if each trial has only two outcomes, we call them


Bernoulli Trials
∎ The two outcomes can be anything (”it rains”, ”it doesn’t rain”, etc.)
∎ But we will usually think in terms of coin tosses, so the outcomes are heads
(H) and tails (T).

▶ Consider an experiment with n independent tosses of a coin. Let


p ∈ [0, 1] be the probability of heads.
∎ Let Ai = {ith toss is a head}; in this context, the events A1 , A2 , . . . , An are
independent.
Independent Trials and Binomial Probabilities

▶ We can visualize the Bernoulli trials in tree form

▶ In general, the probability of having a particular sequence with k


heads out of n tosses is

p^k (1 − p)^(n−k)
Independent Trials and Binomial Probabilities

▶ What about the total probability of having k heads?

p(k) = P (k heads come up in an n-toss sequence)

∎ In total, the number of such sequences is (n choose k), known as the binomial
coefficient, thus

p(k) = (n choose k) p^k (1 − p)^(n−k),   where (n choose k) = n! / (k!(n − k)!)
∎ Don’t panic if you forgot about permutations, combinations etc. We will
review them in the next subsection.
▶ Using the fact that these probabilities should sum up to 1, we obtain
the Binomial formula
∑_{k=0}^{n} (n choose k) p^k (1 − p)^(n−k) = 1
Independent Trials and Binomial Probabilities

▶ Of course, Bernoulli trials are not only used to analyze coin tosses,
let’s look at a more realistic example.
▶ An Internet service provider has installed c modems to serve n
customers. It is estimated that each customer will need a connection
with probability p. What is the probability that more than c
customers will simultaneously need a connection?
▶ The answer is:
∑_{k=c+1}^{n} p(k) = ∑_{k=c+1}^{n} (n choose k) p^k (1 − p)^(n−k)

∎ For n = 100, p = 0.1 and c = 15, this probability is around 0.0399

▶ This type of computation is common in sizing problems. Note that
this method allows us to compute the number of modems we need to
install based on the properties of the customer population.
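A sketch of the computation (the number of simultaneous connections is Binomial(n, p) under the independence assumption above; math.comb requires Python 3.8+):

```python
# A sketch of the modem-sizing computation.
from math import comb

def tail(n, p, c):
    """P(more than c of the n customers need a connection at the same time)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1, n + 1))

print(tail(100, 0.1, 15))   # ~ 0.04, matching the figure quoted above
```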
Counting
Counting

▶ Counting is a critical tool in probability theory. For a lot of problems,


computing probabilities boils down to counting sets of elements.

∎ For instance, in finite sample spaces with equally likely outcomes:


P (A) = (number of elements in A) / (number of elements in Ω).

▶ Although counting might look simple, it can get frustratingly difficult


for complicated problems.

∎ There is an entire field of mathematics called combinatorics, which


focuses on counting.
∎ In this subsection we will refresh our memory on basic principles of
counting and derive some well-known formulas
The Counting Principle

▶ The counting principle is basically a divide and conquer approach,


where we divide the counting into stages
▶ Consider an experiment with two stages.
∎ Possible results of the first stage are ai , i = 1, . . . , n
∎ Possible results of the second stage are bj , j = 1, . . . , m
∎ The possible results of the overall experiment are all ordered pairs (ai , bj ).
The number of such pairs is nm

▶ Generalizing this argument:

Theorem (Counting Principle)


Consider a process of r stages. Suppose that there are n1 possible
results at the first stage, and that for each possible result of the first
i − 1 stages, there are ni possible results at the ith stage. Then the total
number of possible outcomes is:
n1 n2 ⋯ nr
Counting

▶ Example (Number of Telephone Numbers): A local telephone
number is a 7-digit sequence, but the first digit cannot be 0 or 1.
How many different phone numbers are there?
▶ Example (Number of Subsets): Consider a set with n elements. How
many different subsets does it have, including itself and the empty set?
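Both counts follow directly from the counting principle; a two-line sketch (with n = 5 chosen arbitrarily for the subset example):

```python
# Sketches of the two counting-principle examples above.
# Phone numbers: 7 digits, the first from {2,...,9}, the remaining six from {0,...,9}.
print(8 * 10**6)    # 8,000,000 possible local numbers

# Subsets of an n-element set: each element is either in or out (2 choices each).
n = 5               # n chosen arbitrarily for illustration
print(2**n)         # 32 subsets, including the set itself and the empty set
```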
k-permutations

▶ For the rest of the subsection, we will focus on three different types
of counting that involve selecting k objects out of n:
∎ If the order of the elements matters, it is called permutation.
∎ Otherwise, it is called combination.
∎ Finally, we will discuss a more general type of counting that involves
partition of a collection into multiple subsets.

▶ We start with n distinct objects and let k ≤ n. We wish to count the
number of different ways we can pick k out of n objects when the order
of selection matters (k-permutations).
∎ Using the counting principle, it is easy to show that the above number is

n(n − 1)⋯(n − k + 1) = [n(n − 1)⋯(n − k + 1)(n − k)⋯1] / [(n − k)⋯1] = n! / (n − k)!

∎ For the special case n = k, this number is called a permutation:

n(n − 1)⋯1 = n!
k−permutations

▶ Example: Count the number of words that consist of four distinct letters

▶ The count for permutations can be combined with the Counting


Principle to solve some complicated problems.

▶ Example: You have n1 classical music CDs, n2 rock music CDs, and
n3 country music CDs. In how many different ways can you arrange
them so that CDs of the same type are next to each other?
Combination

▶ Think of a problem like selecting a committee of k people from a


group of n. How many different groups are there?
∎ Note that the order of the committee members is irrelevant.
∎ Hence this is not a permutation problem, it is a combination problem
▶ For example, 2−permutations of the letters A, B, C, D:
AB, BA, AC, CA, AD, DA, BC, CB, BD, DB, CD, DC;
▶ Whereas the 2-element combinations of these four letters are:
AB, AC, AD, BC, BD, CD
▶ Hence there are fewer combinations. In particular, each
combination corresponds to k! duplicate k−permutations, hence
the number of combinations is:

n! / (k!(n − k)!)

∎ Can you see why this is also the formula for the binomial coefficient (n choose k)?
Combination

▶ It is worth noting that counting arguments sometimes lead to


formulas that are difficult to derive algebraically.
▶ Example: Binomial formula:
∑_{k=0}^{n} (n choose k) p^k (1 − p)^(n−k) = 1

∎ For the special case p = 1/2 we get

∑_{k=0}^{n} (n choose k) = 2^n

∎ The result is the number of subsets of an n-element set. Can you see why?
▶ Example: Consider a group of n people. Consider clubs that consist
of a club leader and a number of (possibly zero) additional club
members. What is the number of possible clubs of this type?
∎ Note that the solution leads to an interesting identity:

∑_{k=1}^{n} k (n choose k) = n 2^(n−1)
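The identity can be sanity-checked numerically: counting clubs by choosing the leader first and then any subset of the remaining n − 1 people gives n 2^(n−1), while summing over club sizes k gives the left-hand side. A quick sketch (n = 10 chosen arbitrarily):

```python
# A numeric check of the club identity.
from math import comb

n = 10   # n chosen arbitrarily for illustration
lhs = sum(k * comb(n, k) for k in range(1, n + 1))
rhs = n * 2**(n - 1)
print(lhs, rhs, lhs == rhs)   # 5120 5120 True
```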
Partitions

▶ Note that combination partitions the set into two pieces.


∎ One piece with k members, and another piece with (n − k) members.

▶ We can generalize this idea by considering a partition of a set into r
pieces with n1 , n2 , . . . , nr elements respectively, where n1 + n2 + ⋯ + nr = n.

▶ The number of such partitions is known as the multinomial coefficient, and its formula is:

(n choose n1 , n2 , . . . , nr ) = n! / (n1 ! n2 ! ⋯ nr !)

∎ This formula is easy to derive using the counting principle.


∎ Notice how we recover the binomial coefficient formula when we set r = 2,
n1 = k and n2 = n − k.
Partitions

▶ Example (Anagrams): How many different words can be obtained by
rearranging the letters in the word TATTOO?
▶ Example: A class containing 4 graduate and 12 undergraduate
students is randomly divided into 4 groups of 4. What is the
probability that each group includes a graduate student? Let’s solve
it this time using counting arguments.
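Quick sketches of both counts (illustrative; for the groups example we treat the four groups as labeled, which does not change the resulting probability):

```python
# Sketches of the two partition examples above.
from math import factorial

# Anagrams of TATTOO: 6 letters with multiplicities T=3, A=1, O=2.
print(factorial(6) // (factorial(3) * factorial(1) * factorial(2)))   # 60

# Groups example (labeled groups): favorable partitions place exactly
# one graduate student in each of the four groups of four.
total = factorial(16) // factorial(4)**4                      # all 4-4-4-4 partitions
favorable = factorial(4) * (factorial(12) // factorial(3)**4)
print(favorable / total)                                      # ~ 0.1407
```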
▶ Here is a summary of the counting results from Bertsekas’ textbook.
Summary
Summary

▶ This lecture:

∎ Foundations: sets, sample spaces and probability laws


∎ How to use conditional probability for both analysis and modeling
∎ Two very important theorems: Total probability and Bayes’ Rule
∎ The concept of independence
∎ Counting tools

▶ What is next?

∎ We are going to introduce a fundamental concept in probability theory:


random variables.
