
MAT 271 Probability and Statistics

Lecture 2: Sample Space and Probability

Asst. Prof. N. Kemal Ure

Istanbul Technical University


[email protected]

February 18, 2020


Overview

1 Introduction

2 Sets

3 Probabilistic Models

4 Conditional Probability

5 Total Probability Theorem and Bayes’ Rule

6 Independence

7 Counting

8 Summary
Introduction
Introduction

▶ What is probability? There are two different schools of thought:

▶ Frequentist perspective: Probability of an event is a measure of its


frequency of occurrence

∎ If I flip a coin a lot of times, around 50% of the results will be heads.

▶ Bayesian perspective: Probability of an event is a measure of our


belief regarding its uncertainty

∎ The Iliad and the Odyssey were written by the same person with probability 90%.

▶ Both views have their advantages and disadvantages in practice; more about that when we move into statistics.
Introduction

▶ In this lecture we will start laying out the foundational material we are going to build our course on:
∎ Sets, sample space, basic properties of probability measuring functions
∎ We will see that probability theory is all about measuring size of sets.

▶ We will also learn about some techniques that will allow us to estimate probabilities of some simple events:
∎ Coin toss, die roll and deck shuffling problems
∎ We will see that most of the problems above are counting problems.

▶ We will also learn some of the most popular and useful probability
theorems: Total Probability Theorem and Bayes’ Rule.
Sets
Sets

▶ Why study set theory?


∎ Because later we will see that estimating probabilities of events is
equivalent to measuring the size of subsets of certain spaces.
∎ We will measure the size of discrete sets by counting the elements
∎ For continuous sets we will use integration
∎ But first, we need to have a good understanding of basic set theory.

▶ A set S is a collection of objects, which are elements of the set.


▶ A set can be defined by simply listing its elements
∎ Set of outcomes for a die roll: S = {1, 2, 3, 4, 5, 6}
∎ Set of outcomes for a coin flip: S = {H, T }
∎ The empty set ∅ is the set with no elements

▶ Or we can write a property that defines the elements of a set


∎ S = {x ∈ Z ∣ 1 ≤ x ≤ 6}
Sets

▶ If a set has infinitely many elements that can be enumerated in a list:

S = {x1 , x2 , . . . }
∎ Sets that can be written like this are called countable sets.
∎ The set of integers Z, the set of even integers, and the set of rational
numbers Q are all countable.

▶ If an infinite set can’t be enumerated, the set is called uncountable


∎ The set S = {x∣0 ≤ x ≤ 1} is uncountable.

▶ We say set S is a subset of T or S ⊂ T if x ∈ S Ô⇒ x ∈ T


∎ Equivalently, the superset relation can be written as T ⊃ S.
∎ Two sets S and T are equal, that is S = T if statements S ⊂ T and T ⊂ S
are both true at the same time.

▶ We denote the universal set with Ω. For every set S, we have S ⊂ Ω


Set Operations

▶ The complement of a set S is denoted by S c and is defined as the set of all elements that do not belong to S:

S c = {x∣x ∉ S}

∎ Note that Ωc = ∅

▶ The union of two sets S and T , denoted S ∪ T , is the set that contains all
of their elements:

S ∪ T = {x∣x ∈ S or x ∈ T }

▶ The intersection of two sets S and T , denoted S ∩ T , is the set that contains the elements that belong to both:

S ∩ T = {x∣x ∈ S and x ∈ T }
Set Operations

▶ The union and intersection operations can be extended to several, or even infinitely many, sets. Let S1 , S2 , . . . be an infinite collection of sets:

⋃_{n} Sn = S1 ∪ S2 ∪ ⋯ = {x ∣ x ∈ Sn for some n}

⋂_{n} Sn = S1 ∩ S2 ∩ ⋯ = {x ∣ x ∈ Sn for all n}

▶ Sets S, T are called disjoint if their intersection is empty; S ∩ T = ∅


∎ A collection of sets Sn is said to be a partition of a set S if the sets are
disjoint and their union is S.

▶ If x, y are two objects, (x, y) denotes the ordered pair of x and y.


∎ The set of all ordered pairs of real numbers (the plane) is R2
∎ The set of all ordered triplets of real numbers is R3
Set Operations
Algebra of Sets

▶ Set operations have some intuitive properties that can be easily derived from the definitions:

S∪T = T ∪S
S ∩ (T ∩ U ) = (S ∩ T ) ∩ U
S ∩ (T ∪ U ) = (S ∩ T ) ∪ (S ∩ U )
S ∪ (T ∩ U ) = (S ∪ T ) ∩ (S ∪ U )
(S c )c = S
S ∩ Sc = ∅
S∪Ω = Ω
S∩Ω = S

▶ Two particularly useful properties are known as De Morgan’s Laws


(⋃_{n} Sn )^c = ⋂_{n} Sn^c ,   (⋂_{n} Sn )^c = ⋃_{n} Sn^c
Probabilistic Models
Probabilistic Models

▶ A probabilistic model is a mathematical description of an uncertain situation. The two main ingredients of a probabilistic model are listed below:

Definition (Elements of a Probabilistic Model)


▶ The Sample Space Ω, which is the set of all possible outcomes of an
experiment.
▶ The probability law P , which assigns to a set A of possible outcomes
(also called an event) a nonnegative number P (A) (called the
probability of A) that encodes our knowledge/belief about the
collective likelihood of the elements of A. The probability law should
satisfy certain properties, to be introduced shortly
Probabilistic Models
Sample Spaces and Events

▶ All models have an underlying process called an experiment


∎ An experiment produces exactly one out of several outcomes
∎ The set of all possible outcomes is called the sample space
∎ A subset of the sample space, i.e., a collection of outcomes, is called an event

▶ There is no restriction on what can be an experiment


∎ Could be a single coin toss, several tosses, or an infinite number of tosses
∎ However, there is only one experiment. Hence a sequence of tosses is
still a single experiment.

▶ Sample spaces can have a finite or infinite number of elements


∎ Usually finite spaces are much simpler to deal with.
Sample Spaces and Events

▶ All elements of a sample space should be mutually exclusive


∎ That is, each outcome should correspond to exactly one element
∎ Example: In a die roll experiment, the sample space cannot contain both
the elements "1 or 4" and "1 or 3", since the outcome 1 would belong to both

▶ A sample space should be collectively exhaustive


∎ No matter what the outcome is, there should always be a corresponding
element in the sample space.

▶ Example: What are the appropriate sample spaces for the


experiments below?
∎ We toss a coin ten times. We receive $1 each time head comes up.
∎ We toss a coin ten times. We receive $1 for the first time head comes up.
Then the received amount is doubled each time a head comes up.
Sequential Models

▶ Many experiments are inherently sequential


∎ Tossing a coin three times
∎ Observing the value of a stock on five successive days
∎ Receiving eight successive digits at a communication receiver.

▶ In those cases it makes sense to describe the sample space in terms of a tree-based sequential description:
Probability Laws

▶ Now suppose that we have selected a sample space. The next step is to specify the probability law P

Definition (Probability Axioms)


1 (Nonnegativity) P (A) ≥ 0 for every event A.
2 (Additivity) If A and B are two disjoint events, then the probability of their
union is:
P (A ∪ B) = P (A) + P (B).
In general, if A1 , A2 , . . . is a sequence of disjoint events, then,

P (A1 ∪ A2 ∪ . . . ) = P (A1 ) + P (A2 ) + . . .

3 (Normalization) The probability of the entire sample space Ω is 1, that is,

P (Ω) = 1.
Probability Laws

▶ As an analogy, think of a unit of mass spread over the sample space.

∎ You can think of P (A) as the collective mass that was assigned to the elements
of A. Then the additivity axiom becomes intuitive.

▶ Based on the probability axioms, the following results can be derived


∎ P (∅) = 0.
∎ P (A1 ∪ A2 ∪ A3 ) = P (A1 ) + P (A2 ) + P (A3 ), A1 , A2 , A3 are disjoint.
∎ More properties will be derived later.

▶ As an example, let’s derive a probability law for the coin toss experiment


∎ Probability law for a single coin toss
∎ Probability law for three coin tosses
Discrete Models

▶ Generalizing the previous example, we get a probability law:

Definition (Discrete Probability Law)


If the sample space consists of a finite number of possible outcomes,
then the probability law is specified by the probabilities of the events that
consist of a single element. In particular, the probability of any event
{s1 , s2 , . . . , sn } is the sum of the probabilities of its elements:

P ({s1 , s2 , . . . , sn }) = P (s1 ) + P (s2 ) + ⋅ ⋅ ⋅ + P (sn )

∎ Note that we are abusing notation here by writing P (si ) for P ({si })
Discrete Models

▶ For the special case where each outcome is equally likely we have:

Definition (Discrete Uniform Probability Law)


If the sample space consists of n different outcomes, which are all
equally likely (i.e., P (si ) = 1/n), then probability of event A is

P (A) = (number of elements in A) / n.

▶ Example: Rolling a pair of 4−sided dice.


∎ P ({Sum of rolls is even}) = 1/2
∎ P ({Sum of rolls is odd}) = 1/2
∎ P ({First roll is equal to second}) = 1/4

∎ P ({First roll is larger than second}) = 3/8

∎ P ({At least one roll is equal to 4}) = 7/16
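Since the two 4-sided dice give 16 equally likely pairs, these values can be checked by brute-force enumeration. Below is a minimal Python sketch (not part of the original slides; the helper name prob is introduced only for this illustration):

```python
# A minimal enumeration check of the 4-sided dice values above
# (assumes both dice are fair and independent, so all 16 pairs are equally likely).
from itertools import product

outcomes = list(product(range(1, 5), repeat=2))   # the 16 equally likely pairs

def prob(event):
    """P(A) = (number of elements in A) / (number of elements in Omega)."""
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

print(prob(lambda o: (o[0] + o[1]) % 2 == 0))   # 0.5     (sum is even)
print(prob(lambda o: (o[0] + o[1]) % 2 == 1))   # 0.5     (sum is odd)
print(prob(lambda o: o[0] == o[1]))             # 0.25    (first equals second)
print(prob(lambda o: o[0] > o[1]))              # 0.375   (first larger than second)
print(prob(lambda o: 4 in o))                   # 0.4375  (at least one roll is 4)
```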


Continuous Models

▶ Probabilistic models with continuous sample spaces differ from


discrete models:
∎ Probability of a single element is usually not enough to determine the
whole probability law
▶ Example: Consider a wheel of fortune, where the results are
calibrated from 0 to 1.
∎ Hence the sample space is Ω = [0, 1]
∎ But we can’t assign P (x) > 0 to every x ∈ Ω; then by additivity we would get P (Ω) = ∞, contradicting normalization
∎ We can work around this problem by assigning probabilities to intervals

rather than single elements, such as

P ([a, b]) = b − a, [a, b] ⊂ [0, 1]

▶ Example: Throwing darts


∎ It makes sense to define the probability of hitting a certain region as the
area of that region (given that the dartboard has unit area)
Continuous Models

▶ Example: Romeo and Juliet have a date and each arrive with a delay
of 0 to 1 hour. The first to arrive waits for at most 15 minutes and
then leaves. What is the probability that they will meet?
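The standard way to solve this is to draw the square [0, 1] × [0, 1] of arrival delays and measure the area of the band where the delays differ by at most 1/4 hour, which gives 1 − (3/4)^2 = 7/16. A Monte Carlo sketch (assuming independent, uniform delays; the 0.25-hour threshold comes from the 15-minute wait) can be used to sanity-check that value:

```python
# A Monte Carlo sketch of the Romeo-and-Juliet problem, assuming independent
# delays uniform on [0, 1] hours and a 15-minute (0.25-hour) waiting time.
import random

trials = 1_000_000
meets = sum(abs(random.random() - random.random()) <= 0.25 for _ in range(trials))
print(meets / trials)   # ~ 0.4375; the exact answer is the area 1 - (3/4)**2 = 7/16
```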
Continuous Models

▶ Why are discrete and continuous models so different?

∎ Because the real line is uncountable, while discrete models are finite or countable.
∎ That is why the probability of a single element P (x) can be positive in the
discrete setting, whereas it should be equal to zero in the continuous setting.

▶ In general, probability of event A is measured by the integral ∫A dt.


∎ This integral is the ”area of the set”, in a general sense.
∎ However, it turns out that this integral is not well defined for all sets.
∎ Analyzing which sets are measurable is a very deep question. This is
addressed by a branch of mathematics called measure theory, which is out
of scope for this course.
∎ For the problems in this course, all sets will be measurable, so we will not
have to deal with the non-existence of integrals or areas.
Properties of Probability Laws

▶ Using the axioms, we can prove the following properties

Theorem (Properties of Probability Laws)


Consider a probability law P and events A, B, C.
1 If A ⊂ B, then P (A) ≤ P (B).
2 P (A ∪ B) = P (A) + P (B) − P (A ∩ B).
3 P (A ∪ B) ≤ P (A) + P (B).
4 P (A ∪ B ∪ C) = P (A) + P (Ac ∩ B) + P (Ac ∩ B c ∩ C).
Models and Reality

▶ Probability theory can be used to analyze a wide variety of uncertain


physical processes. This is done in two stages:
∎ Stage I, Constructing the model: In this stage we assign an appropriate
sample space and probability law to the physical process
Sometimes, this is done via domain expertise and past experiences.
The most popular way is to use past data for building the probability
law. This is the main objective of statistics, which we will learn more
about later on.
∎ Stage II, Analyze the model: In this stage we work with our model to find
probabilities of certain events or deduce some interesting properties.
This part can be very hard depending on the mathematical tools and
algorithms you use.
One of the objectives of this course is to teach you efficient methods
for analyzing your probabilistic models.
Conditional Probability
Conditional Probability

▶ Conditional probability allows us to compute probabilities of events based on partial information. Consider the following problems:
∎ In an experiment with rolls of 2 dice, you are told that the sum of the rolls is 9.
What is the probability that the first roll was a 6?
∎ In a word guessing game, the first letter of a word is ’t’. What is the

likelihood that second letter is ’h’ ?


∎ How likely is it that a person has a certain disease given that a medical

test was negative?


∎ A spot shows up on a radar screen. How likely is it an aircraft?

▶ Assume we already have a sample space and a probability law. We


want to compute the probability of an event A, given that the
outcome is within some event B. We need a new probability law
called the conditional probability law, denoted as:

P (A∣B)

∎ How can we derive this probability law from the given P (A), P (B), etc.?
Conditional Probability

▶ Let’s start simple. Consider a fair die roll, we already have a


probability law for that.
∎ What is the probability that the roll is 6?
∎ What is the probability that the roll is 6, given that it is even?

P (outcome is 6∣outcome is even) = 1/3

▶ This thought experiment suggests the following formula:


P (A∣B) = (number of elements in A ∩ B) / (number of elements in B)

▶ Generalizing the argument, we get the following definition:

P (A∣B) = P (A ∩ B) / P (B)

∎ We assume P (B) > 0. Otherwise conditional probability is undefined.
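A tiny enumeration sketch (illustrative only) makes the die-roll computation concrete: with equally likely outcomes, P (A∣B) reduces to counting the elements of A ∩ B and of B.

```python
# A tiny enumeration sketch of the fair-die computation above, using
# P(A|B) = |A ∩ B| / |B| for equally likely outcomes.
omega = range(1, 7)
B = [x for x in omega if x % 2 == 0]     # conditioning event: the roll is even
A_and_B = [x for x in B if x == 6]       # outcomes in both A and B
print(len(A_and_B) / len(B))             # 1/3
```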


Conditional Probability

▶ Does conditional probability specify a probability law? We need to


check the axioms. It is easy to verify the following
∎ P (A∣B) ≥ 0 (Nonnegativity)
∎ P (Ω∣B) = 1 (Normalization)
∎ For A1 , A2 disjoint, P (A1 ∪ A2 ∣B) = P (A1 ∣B) + P (A2 ∣B), (Additivity)

▶ The answer is positive! Hence conditional probability automatically


satisfies the other properties such as
∎ For A ⊂ C, P (A∣B) ≤ P (C∣B)
∎ P (A ∪ C∣B) ≤ P (A∣B) + P (C∣B)

▶ Also note that P (B∣B) = 1 and P (A∣B) = 0 for A ∩ B = ∅. So we


can discard all the outcomes outside B and accept B as our new
universe.
Conditional Probability

▶ Let’s summarize our findings with a definition

Definition (Conditional Probability)


∎ The conditional probability of an event A, given an event B with P (B) > 0,
is defined by
P (A∣B) = P (A ∩ B) / P (B) ,

and specifies a new probability law on the sample space Ω. In particular, all
properties of probability laws remain valid for conditional probability laws.
∎ Conditional probabilities can also be viewed as a probability law on a new
universe B, because all of the probability is concentrated there.
∎ If the possible outcomes are finitely many and equally likely, then:

P (A∣B) = (number of elements in A ∩ B) / (number of elements in B)
Conditional Probability

▶ Example: Consider tossing a fair coin three times. Let A be the event that
more tails than heads come up and let B be the event that the 1st toss is a
head. Compute P (A∣B).

▶ Example: A conservative design team (call it C) and an innovative


design team (call it N ) are separately asked to design a new product
within a month. From past experience we know that

∎ The probability that team C is successful is 2/3.

∎ The probability that team N is successful is 1/2.

∎ The probability that at least one of them is successful is 3/4


Assuming only one successful design is produced, what is the
probability that it was designed by team N ?
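One way to work this out is with inclusion-exclusion: P (C ∩ N ) = P (C) + P (N ) − P (C ∪ N ) = 5/12, so P (only N succeeds) = 1/12 and P (exactly one success) = 4/12, giving 1/4. The short sketch below (an illustration, not part of the slides) reproduces the arithmetic with exact fractions:

```python
# A sketch of the design-team example with exact fractions, assuming only the
# three probabilities given above: P(C) = 2/3, P(N) = 1/2, P(C or N) = 3/4.
from fractions import Fraction

P_C, P_N, P_union = Fraction(2, 3), Fraction(1, 2), Fraction(3, 4)
P_both = P_C + P_N - P_union            # inclusion-exclusion: P(C and N) = 5/12
P_only_N = P_N - P_both                 # only N succeeds: 1/12
P_exactly_one = P_union - P_both        # exactly one success: 1/3
print(P_only_N / P_exactly_one)         # P(designed by N | exactly one success) = 1/4
```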
Using Conditional Probability for Modeling

▶ In some problems it is easier to specify conditional probabilities


rather than the standard probabilities of events.

∎ In this case we can start with P (B) and P (A∣B), and then use the
following formula to compute P (A ∩ B)

P (A ∩ B) = P (A∣B)P (B)

▶ Example (Radar Detection): If an aircraft is present in a certain area,


radar detects it with probability 0.99. If an aircraft is not present,
radar generates a false alarm with probability 0.1. We assume that
an aircraft is present with probability 0.05. What is the probability of
no aircraft presence and a false alarm? What is the probability of
aircraft presence and no detection?
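A hedged sketch of the computation, using the formula P (A ∩ B) = P (B)P (A∣B) with the numbers stated in the example:

```python
# A sketch of the radar example using P(A ∩ B) = P(B) P(A|B)
# with the probabilities stated above.
p_present = 0.05
p_detect_given_present = 0.99
p_false_alarm_given_absent = 0.10

p_false_alarm = (1 - p_present) * p_false_alarm_given_absent   # no aircraft, alarm raised
p_missed = p_present * (1 - p_detect_given_present)            # aircraft present, no detection
print(p_false_alarm)   # 0.095
print(p_missed)        # 0.0005
```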
Using Conditional Probability for Modeling

▶ It is a good idea to attack such problems with tree-based models


Using Conditional Probability for Modeling

▶ Generalizing the argument in the example, we have:

Theorem (Multiplication Rule)


Assuming all conditioning events have positive probability, we have
P (A1 ∩ A2 ∩ ⋯ ∩ An ) = P (A1 )P (A2 ∣A1 )P (A3 ∣A1 ∩ A2 ) ⋯ P (An ∣A1 ∩ ⋯ ∩ An−1 )

∎ Most problems involving conditional probabilities can be solved by first


converting the problem to a tree, and applying the multiplication rule.
Using Conditional Probability for Modeling

▶ Example: Three cards are drawn from an ordinary 52−card deck


without replacement. Find the probability that none of the three
cards is a heart.

▶ Example: A class containing 4 graduate and 12 undergraduate


students is randomly divided into 4 groups of 4. What is the
probability that each group includes a graduate student?

▶ Example (Monty Hall): You are in a TV show, and you are told that
the grand prize is equally likely to be found behind any of the three closed
doors. You point to one of the doors, and then the host opens one
of the remaining two doors, after making sure the prize is not behind
it. Then the host gives you a chance to switch. Would you stick to
your initial choice, or switch to the unopened door? What is the best
strategy? (See the simulation sketch below.)
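A Monte Carlo sketch of the Monty Hall game (illustrative; it assumes the host always opens a non-chosen, non-prize door) suggests the answer: switching wins about 2/3 of the time, sticking only about 1/3.

```python
# A Monte Carlo sketch of the Monty Hall game (assumes the host always opens
# a door that is neither your choice nor the prize door).
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        choice = random.randrange(3)
        # the host opens some door that hides no prize and was not chosen
        opened = next(d for d in range(3) if d != choice and d != prize)
        if switch:
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += (choice == prize)
    return wins / trials

print(play(switch=False))   # ~ 1/3
print(play(switch=True))    # ~ 2/3: switching is the better strategy
```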
Total Probability Theorem and Bayes’ Rule
Total Probability Theorem

▶ The following is very useful for computing certain probabilities:


Theorem (Total Probability Theorem)
Let A1 , . . . , An be disjoint events that form a partition of the sample
space. Assume that P (Ai ) > 0 for all i. Then for any event B, we have:

P (B) = P (A1 ∩ B) + ⋅ ⋅ ⋅ + P (An ∩ B)


= P (A1 )P (B∣A1 ) + ⋅ ⋅ ⋅ + P (An )P (B∣An )
Total Probability Theorem

▶ Example: You enter a chess tournament where the probability of you
winning is 0.3 against half of the players (call them Type I), 0.4
against a quarter of the players (call them Type II) and 0.5 against the
remaining quarter of the players (call them Type III). You play against
a randomly chosen opponent. What is the probability of winning?

▶ Example: Alice is taking probability classes and at the end of each


week she is either up-to-date or has fallen behind. If she is
up-to-date in a given week, the probability that she will be
up-to-date (or behind) in the next week is 0.8 (or 0.2 respectively). If
she is behind in a given week, the probability that she will be
up-to-date (or behind) in the next week is 0.4 (or 0.6 respectively).
What is the probability that she is up-to-date after three weeks?
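A short numeric sketch of both examples (for Alice, the slide does not state an initial condition, so the sketch assumes she starts the first week up-to-date):

```python
# A numeric sketch of both examples, using only the probabilities stated above.
# Chess: condition on the opponent's type (total probability theorem).
p_win = 0.5 * 0.3 + 0.25 * 0.4 + 0.25 * 0.5
print(p_win)   # 0.375

# Alice: condition, week by week, on whether she is currently up-to-date.
# (Assumption: she starts up-to-date, since the slide gives no initial condition.)
p_up = 1.0
for _ in range(3):
    p_up = p_up * 0.8 + (1 - p_up) * 0.4
print(p_up)    # 0.688 under this assumption
```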
Inference and Bayes’ Rule

▶ Total probability theorem is usually used with the following famous


theorem, which enables us to pass from P (B∣A) to P (A∣B)

Theorem (Bayes’ Rule)


Let Ai be disjoint events that form a partition of the sample space, and
assume that P (Ai ) > 0 for all i. Then for any event B such that
P (B) > 0, we have

P (Ai ∣B) = P (Ai )P (B∣Ai ) / P (B)
         = P (Ai )P (B∣Ai ) / ( P (A1 )P (B∣A1 ) + ⋯ + P (An )P (B∣An ) )

∎ This theorem is tremendously useful, because in many applications we


have access to P (Ai ) and P (B∣Ai ), but what we really need is P (Ai ∣B).
More on that on the next slide.
Inference and Bayes’ Rule

▶ Bayes’ Rule is generally used for inference

∎ There are a number of causes (Ai ) which may explain a certain effect (B).
∎ By using domain expertise or heuristics, we can often find P (Ai ) (the overall
probability of occurrence of that cause) and P (B∣Ai ) (the probability of the
effect occurring given that the cause has happened).
∎ Then by using Bayes’ Rule, we can compute P (Ai ∣B) (given that the effect is
observed, what is the probability that this particular cause produced it?).
∎ Finally, by comparing the different P (Ai ∣B), we can infer the most
probable cause of the effect.
∎ We call P (Ai ) the prior probability and P (Ai ∣B) the posterior probability.


Inference and Bayes’ Rule

▶ Example: Return to the aircraft example. What is the probability


that an aircraft is present, given that the radar generated an alarm?

▶ Example: Return to the chess example. Given that you won the
match, what is the probability that your opponent was type I?

▶ Example(False-Positive Puzzle): A test for a certain rare disease is


assumed to be correct 95% of the time. If a person has the disease,
test results are positive with probability 0.95 and if the person does
not have the disease, test results are negative with probability 0.95.
A random person has probability 0.001 of having the disease. Given
that this person has tested positive, what is the probability of having
the disease?
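Plugging the numbers into Bayes' rule (a sketch; all values come from the statement above):

```python
# A sketch of the false-positive puzzle via Bayes' rule, with the stated numbers:
# P(disease) = 0.001, P(positive | disease) = 0.95, P(positive | no disease) = 0.05.
p_d = 0.001
p_pos_given_d = 0.95
p_pos_given_not_d = 0.05

p_pos = p_d * p_pos_given_d + (1 - p_d) * p_pos_given_not_d   # total probability
print(p_d * p_pos_given_d / p_pos)   # ~ 0.0187: the disease is still unlikely
```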
Independence
Independence

▶ We have introduced P (A∣B) to capture the partial information that event B provides about event A. What if B has no effect on A?

P (A∣B) = P (A)

∎ This special case is so important that it needs to be studied separately.


∎ Event A is independent of B, if occurrence of B does not have any effect
on the probability of occurrence of A.
▶ Using the definitions, independence can also be formulated as:

P (A ∩ B) = P (A)P (B)

∎ This definition is preferred; it also allows the use of events with P (B) = 0.


∎ Note that if A is independent of B, B is also independent of A. Hence
we can simply refer to A and B as independent events.
Independence

▶ Independence is easy to understand intuitively

∎ If two physical processes are not interacting with each other, events
caused by them are usually independent.

∎ For instance, outcomes of successive coin tosses/die rolls are usually


independent.

▶ On the other hand, independence is not easy to visualize in the


sample space

∎ Disjoint events are not usually independent! On the contrary, two


disjoint events with P (A) > 0, P (B) > 0 are never independent.

∎ Note that A and Ac are disjoint, but knowing one of them has occurred,
determines the other exactly.
Independence

▶ Example: Consider an experiment involving two successive 4−sided


die rolls. Determine the independence of following events:

∎ Ai = {1st roll results in i}, Bj = {2nd roll results in j}

∎ A = {1st roll is a 1}, B = {sum of two rolls is a 5}

∎ A = {max of two rolls is 2}, B = {min of two rolls is 2}

▶ Also note that if occurrence of B does not tell anything about A,


non-occurrence of B also should not give any information on A.

∎ Thus, if A and B are independent, so are A and B c .


Conditional Independence

▶ Since conditional probability is also a probability law, we can extend the notion of independence to the conditional setting.
∎ Given an event C, two events A, B are conditionally independent if

P (A ∩ B∣C) = P (A∣C)P (B∣C)

▶ By working with the definitions (and assuming P (B ∩ C) > 0), we can show that:

P (A∣B ∩ C) = P (A∣C)

∎ Thus, when C occurs, occurrence of B does not have any effect on


probability of occurrence of A.

▶ Interestingly, conditional independence and unconditional independence do not imply each other. Let’s verify this through examples.
Conditional Independence

▶ Consider two fair coin tosses. Let H1 = {1st toss is a head},


H2 = {2nd toss is a head} and D = {results of tosses are different}.
Show that although H1 and H2 are independent, they are not
conditionally independent when conditioned on D.

▶ Consider two coins, a blue one and a red one. We choose one at
random and proceed with two independent tosses. The coins are biased: the
probability of a head is 0.99 with the blue coin and 0.01 with the red coin. Let
B be the event that blue coin is chosen and let Hi be the event that
ith toss results in a head. Show that H1 and H2 are dependent, but
they become conditionally independent conditioned on B.
Independence

Definition (Independence)
▶ Two events A and B are said to be independent if

P (A ∩ B) = P (A)P (B).

If in addition P (B) > 0, an equivalent definition is

P (A∣B) = P (A)

▶ Two events A and B are said to be conditionally independent given


an event C with P (C) > 0, if

P (A ∩ B∣C) = P (A∣C)P (B∣C).

If in addition P (B ∩ C) > 0, an equivalent definition is

P (A∣B ∩ C) = P (A∣C)
Independence of Multiple Events

▶ Definition of independence can be extended to multiple events

Definition (Independence of Several Sets)


We say that events A1 , A2 , . . . An are independent if

P ( ⋂_{i∈S} Ai ) = ∏_{i∈S} P (Ai ),   for every subset S of {1, 2, . . . , n}

▶ To check independence of A1 , A2 , A3 , we check four conditions

P (A1 ∩ A2 ) = P (A1 )P (A2 )
P (A1 ∩ A3 ) = P (A1 )P (A3 )
P (A2 ∩ A3 ) = P (A2 )P (A3 )
P (A1 ∩ A2 ∩ A3 ) = P (A1 )P (A2 )P (A3 )

∎ First three conditions are known as pairwise independence


∎ The fourth condition does not follow from the first three, and vice versa.
Independence of Multiple Events

▶ Example (Pairwise Independence does not imply Independence):


Consider two fair coin tosses. Let H1 = {1st toss is a head},
H2 = {2nd toss is a head} and D = {results of tosses are different}.
Show that H1 and H2 , H1 and D, and H2 and D are independent,
although H1 , H2 , D are not.

▶ Example (Equality P (A1 ∩ A2 ∩ A3 ) = P (A1 )P (A2 )P (A3 ) does not


imply pairwise independence): Consider two independent rolls of a
fair die, and the following events
∎ A = {First roll is 1,2 or 3}
∎ B = {First roll is 3,4 or 5}
∎ C = {Sum of rolls is 9}
Show that although P (A ∩ B ∩ C) = P (A)P (B)P (C), no two
events are independent.
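The first example (two fair coin tosses) can be checked by enumerating the four equally likely outcomes; a small sketch, illustrative only:

```python
# An enumeration sketch for the first example above:
# two fair coin tosses, four equally likely outcomes.
from itertools import product

omega = list(product("HT", repeat=2))
P = lambda event: sum(1 for o in omega if event(o)) / len(omega)

H1 = lambda o: o[0] == "H"                 # 1st toss is a head
H2 = lambda o: o[1] == "H"                 # 2nd toss is a head
D  = lambda o: o[0] != o[1]                # results of the tosses are different

print(P(lambda o: H1(o) and H2(o)) == P(H1) * P(H2))   # True  (pairwise)
print(P(lambda o: H1(o) and D(o))  == P(H1) * P(D))    # True  (pairwise)
print(P(lambda o: H2(o) and D(o))  == P(H2) * P(D))    # True  (pairwise)
print(P(lambda o: H1(o) and H2(o) and D(o))
      == P(H1) * P(H2) * P(D))                         # False (not jointly independent)
```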
Reliability

▶ Independence has major applications in reliability analysis. In
complex systems, the behavior of individual components is usually
modeled as independent, which simplifies the analysis
▶ Example: A computer network connects two nodes A and B through
intermediate nodes C, D, E, F . For every pair of nodes i, j,
pij is the probability that the link between them is up. We assume links fail
independently of each other. What is the probability that there is a
path connecting A to B?
Independent Trials and Binomial Probabilities

▶ If an experiment involves several independent but identical stages, we


say that we have a sequence of independent trials.

▶ In particular, if each trial has only two outcomes, we call them


Bernoulli Trials
∎ The two outcomes can be anything (”it rains”, ”it doesn’t rain”, etc.)
∎ But we will usually think in terms of coin tosses, so the outcomes are heads
(H) and tails (T).

▶ Consider an experiment with n independent tosses of a coin. Let


p ∈ [0, 1] be the probability of heads.
∎ Let Ai = {ith toss is a head}; in this context, the events A1 , A2 , . . . , An are
independent.
Independent Trials and Binomial Probabilities

▶ We can visualize the Bernoulli trials in tree form

▶ In general, the probability of having a particular sequence with k


heads out of n tosses is

p^k (1 − p)^(n−k)
Independent Trials and Binomial Probabilities

▶ What about the total probability of having k heads?

p(k) = P (k heads come up in an n-toss sequence)

∎ In total, the number of such sequences is (n choose k), known as the binomial
coefficient, thus

p(k) = (n choose k) p^k (1 − p)^(n−k),   where (n choose k) = n! / (k!(n − k)!)
∎ Don’t panic if you forgot about permutations, combinations etc. We will
review them in the next subsection.
▶ Using the fact that these probabilities should sum up to 1, we obtain
the Binomial formula
∑_{k=0}^{n} (n choose k) p^k (1 − p)^(n−k) = 1
Independent Trials and Binomial Probabilities

▶ Of course, Bernoulli trials are not only used to analyze coin tosses,
let’s look at a more realistic example.
▶ An Internet service provider has installed c modems to serve n
customers. It is estimated that each customer will need a connection
with probability p. What is the probability that more than c
customers will simultaneously need a connection?
▶ The answer is:
∑_{k=c+1}^{n} p(k) = ∑_{k=c+1}^{n} (n choose k) p^k (1 − p)^(n−k)

∎ For n = 100, p = 0.1 and c = 15, this probability is around 0.0399

▶ This type of computation is common in sizing problems. Note that
this method allows us to compute the number of modems we need to
install based on the properties of the customer population.
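A sketch of the computation (the number of simultaneous connections is Binomial(n, p) under the independence assumption above; math.comb requires Python 3.8+):

```python
# A sketch of the modem-sizing computation.
from math import comb

def tail(n, p, c):
    """P(more than c of the n customers need a connection at the same time)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1, n + 1))

print(tail(100, 0.1, 15))   # ~ 0.04, matching the figure quoted above
```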
Counting
Counting

▶ Counting is a critical tool in probability theory. For a lot of problems,


computing probabilities boils down to counting sets of elements.

∎ For instance, in finite sample spaces with equally likely outcomes:


P (A) = (number of elements in A) / (number of elements in Ω).

▶ Although counting might look simple, it can get frustratingly difficult


for complicated problems.

∎ There is an entire field of mathematics called combinatorics, which


focuses on counting.
∎ In this subsection we will refresh our memory on basic principles of
counting and derive some well-known formulas
The Counting Principle

▶ The counting principle is basically a divide and conquer approach,


where we divide the counting into stages
▶ Consider an experiment with two stages.
∎ Possible results of the first stage are ai , i = 1, . . . , n
∎ Possible results of the second stage are bj , j = 1, . . . , m
∎ The possible results of the overall experiment are all ordered pairs (ai , bj ).
The number of such pairs is nm

▶ Generalizing this argument:

Theorem (Counting Principle)


Consider a process of r stages. Suppose that there are n1 possible
results at the first stage, and that for each possible result of the first
i − 1 stages, there are ni possible results at the ith stage. Then the total
number of possible outcomes is:
n1 n2 ⋯ nr
Counting

▶ Example (Number of Telephone Numbers): A local telephone
number is a 7-digit sequence, but the first digit cannot be 0 or 1.
How many different phone numbers are there?
▶ Example (Number of Subsets): Consider a set with n elements. How
many different subsets does it have, including itself and the empty set?
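Both counts follow directly from the counting principle; a two-line sketch (with n = 5 chosen arbitrarily for the subset example):

```python
# Sketches of the two counting-principle examples above.
# Phone numbers: 7 digits, the first from {2,...,9}, the remaining six from {0,...,9}.
print(8 * 10**6)    # 8,000,000 possible local numbers

# Subsets of an n-element set: each element is either in or out (2 choices each).
n = 5               # n chosen arbitrarily for illustration
print(2**n)         # 32 subsets, including the set itself and the empty set
```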
k-permutations

▶ For the rest of the subsection, we will focus on three different types
of counting that involve selecting k objects out of n:
∎ If the order of the elements matters, it is called permutation.
∎ Otherwise, it is called combination.
∎ Finally, we will discuss a more general type of counting that involves
partition of a collection into multiple subsets.

▶ We start with n distinct objects and let k ≤ n. We wish to count the
number of different ways we can pick k out of n objects when the order
of selection matters (k-permutations).
∎ Using the counting principle, it is easy to show that the above number is

n(n − 1)⋯(n − k + 1) = [n(n − 1)⋯(n − k + 1)(n − k)⋯1] / [(n − k)⋯1] = n! / (n − k)!

∎ For the special case n = k, this number is called a permutation:

n(n − 1)⋯1 = n!
k−permutations

▶ Example: Count the number of words that consist of four distinct letters

▶ The count for permutations can be combined with the Counting


Principle to solve some complicated problems.

▶ Example: You have n1 classical music CDs, n2 rock music CDs, and
n3 country music CDs. In how many different ways can you arrange
them so that CDs of the same type are next to each other?
Combination

▶ Think of a problem like selecting a committee of k people from a


group of n. How many different groups are there?
∎ Note that the order of the committee members is irrelevant.
∎ Hence this is not a permutation problem, it is a combination problem
▶ For example, 2−permutations of the letters A, B, C, D:
AB, BA, AC, CA, AD, DA, BC, CB, BD, DB, CD, DC;
▶ Whereas the 2-element combinations of these four letters are:
AB, AC, AD, BC, BD, CD
▶ Hence there are fewer combinations. In particular, each
combination corresponds to k! duplicate k−permutations, hence
the number of combinations is:

n! / (k!(n − k)!)

∎ Can you see why this is also the formula for the binomial coefficient (n choose k)?
Combination

▶ It is worth noting that counting arguments sometimes lead to


formulas that are difficult to derive algebraically.
▶ Example: Binomial formula:
∑_{k=0}^{n} (n choose k) p^k (1 − p)^(n−k) = 1

∎ For the special case p = 1/2 we get

∑_{k=0}^{n} (n choose k) = 2^n

∎ The result is the number of subsets of an n-element set. Can you see why?
▶ Example: Consider a group of n people. Consider clubs that consist
of a club leader and a number of (possibly zero) additional club
members. What is the number of possible clubs of this type?
∎ Note that the solution leads to an interesting identity:

∑_{k=1}^{n} k (n choose k) = n 2^(n−1)
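The identity can be sanity-checked numerically: counting clubs by choosing the leader first and then any subset of the remaining n − 1 people gives n 2^(n−1), while summing over club sizes k gives the left-hand side. A quick sketch (n = 10 chosen arbitrarily):

```python
# A numeric check of the club identity.
from math import comb

n = 10   # n chosen arbitrarily for illustration
lhs = sum(k * comb(n, k) for k in range(1, n + 1))
rhs = n * 2**(n - 1)
print(lhs, rhs, lhs == rhs)   # 5120 5120 True
```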
Partitions

▶ Note that combination partitions the set into two pieces.


∎ One piece with k members, and another piece with (n − k) members.

▶ We can generalize this idea by considering a partition of a set into r
pieces with n1 , n2 , . . . , nr elements respectively, where n1 + n2 + ⋯ + nr = n.

▶ The number of such partitions is known as the multinomial coefficient, and its formula is:

(n choose n1 , n2 , . . . , nr ) = n! / (n1 ! n2 ! ⋯ nr !)

∎ This formula is easy to derive using the counting principle.


∎ Notice how we recover the binomial coefficient formula when we set r = 2,
n1 = k and n2 = n − k.
Partitions

▶ Example (Anagrams): How many different words can be obtained by
rearranging the letters in the word TATTOO?
▶ Example: A class containing 4 graduate and 12 undergraduate
students is randomly divided into 4 groups of 4. What is the
probability that each group includes a graduate student? Let’s solve
it this time using counting arguments.
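Quick sketches of both counts (illustrative; for the groups example we treat the four groups as labeled, which does not change the resulting probability):

```python
# Sketches of the two partition examples above.
from math import factorial

# Anagrams of TATTOO: 6 letters with multiplicities T=3, A=1, O=2.
print(factorial(6) // (factorial(3) * factorial(1) * factorial(2)))   # 60

# Groups example (labeled groups): favorable partitions place exactly
# one graduate student in each of the four groups of four.
total = factorial(16) // factorial(4)**4                      # all 4-4-4-4 partitions
favorable = factorial(4) * (factorial(12) // factorial(3)**4)
print(favorable / total)                                      # ~ 0.1407
```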
▶ Here is a summary of the counting results from Bertsekas’ textbook.
Summary
Summary

▶ This lecture:

∎ Foundations: sets, sample spaces and probability laws


∎ How to use conditional probability for both analysis and modeling
∎ Two very important theorems: Total probability and Bayes’ Rule
∎ The concept of independence
∎ Counting tools

▶ What is next?

∎ We are going to introduce a fundamental concept in probability theory:


random variables.
