
Lecture 2

Conditional Probability


2. Conditional probability
Conditional probability
Bayes’ Theorem
Independence

Objectives:
◮ To understand what conditioning means and how it reduces the sample space
◮ To use conditional probability, the Law of Total Probability and Bayes’ Theorem
◮ To understand and use independence and conditional independence


Conditional probability
Often one is given partial information on the outcome of an
experiment. This changes our view of likelihoods for various
outcomes. We shall build a mathematical model to handle this
issue.
Example
We roll two dice. What is the probability that the sum of the
numbers is 8? And if we know that the first die shows a 5?
The first question is by now easy: 5 cases out of 36, so the answer is 5/36. Now, given the fact that the first die shows 5, we get a sum of 8 if and only if the second die shows 3. This has probability 1/6, being the answer to the second question.

We see that partial information can change the probability of the outcomes of our experiment.


1. The reduced sample space

What happened was that we have reduced our world to the event F = {first die shows 5} = {(5, 1), (5, 2), . . . , (5, 6)} that was given to us.

Definition
The event that is given to us is also called a
reduced sample space. We can simply work in this set to figure
out the conditional probabilities given this event.

The event F has 6 equally likely outcomes. Only one of them, (5, 3), provides a sum of 8. Therefore, the conditional probability is 1/6.


2. The formal definition


Let us also name the event

E = {the sum is 8} = {(2, 6), (3, 5), . . . , (6, 2)}.

The above question can be reformulated as: “In what proportion of the cases in F will E also occur?” or, equivalently, “How does the probability of both E and F compare to the probability of F only?”

Definition
Let F be an event with P{F } > 0 (we’ll assume this from now
on). Then the conditional probability of E, given F is defined as

P{E | F } := P{E ∩ F } / P{F }.


2. The formal definition

To answer the question we began with, with the formal definition we can now write E ∩ F = {(5, 3)} (the sum is 8 and the first die shows 5), and

P{E | F } := P{E ∩ F } / P{F } = (1/36) / (1/6) = 1/6.
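This reduced-sample-space calculation is easy to check numerically. Below is a small Python sketch (not part of the original slides; the function name estimate_conditional is ours) that estimates P{E | F } as a relative frequency within F :

import random

def estimate_conditional(trials=1_000_000):
    # Estimate P{sum is 8 | first die shows 5} by restricting to the reduced sample space F.
    count_F, count_EF = 0, 0
    for _ in range(trials):
        d1, d2 = random.randint(1, 6), random.randint(1, 6)
        if d1 == 5:                 # the outcome lies in F
            count_F += 1
            if d1 + d2 == 8:        # ... and also in E
                count_EF += 1
    return count_EF / count_F

print(estimate_conditional())       # should be close to 1/6 ≈ 0.1667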


3. It’s well-behaved
Proposition
The conditional probability P{· |F } is a proper probability (it
satisfies the axioms):
1. the conditional probability of any event is non-negative:
P{E | F } ≥ 0;
2. the conditional probability of the sample space is one:
P{Ω | F } = 1;
3. for finitely or countably infinitely many mutually exclusive events E1 , E2 , . . .,

P{∪i Ei | F } = Σi P{Ei | F }.


3. It’s well-behaved

Corollary
All statements remain valid for P{· | F }. E.g.
◮ P{E c | F } = 1 − P{E | F }.
◮ P{∅ | F } = 0.
◮ P{E | F } = 1 − P{E c | F } ≤ 1.
◮ P{E ∪ G | F } = P{E | F } + P{G | F } − P{E ∩ G | F }.
◮ If E ⊆ G, then P{G − E | F } = P{G | F } − P{E | F }.
◮ If E ⊆ G, then P{E | F } ≤ P{G | F }.

Remark
BUT: Don’t change the condition! E.g., P{E | F } and P{E | F c }
have nothing to do with each other.


4. Multiplication rule
Proposition (Multiplication rule)
For E1 , E2 , . . . , En events,

P{E1 ∩ · · · ∩ En } = P{E1 } · P{E2 | E1 } · P{E3 | E1 ∩ E2 } · · · P{En | E1 ∩ · · · ∩ En−1 }.

Proof.

Just write out each conditional probability as a ratio of intersections; the product telescopes to P{E1 ∩ · · · ∩ En }.

Example
An urn contains 6 red and 5 blue balls. We draw three balls at
random, at once (that is, without replacement). What is the
chance of drawing one red and two blue balls?

4. Multiplication rule

Solution (with order)


Let Ri (respectively Bi ) be the event that the i-th draw is red (respectively blue). We need

P{R1 ∩ B2 ∩ B3 } + P{B1 ∩ R2 ∩ B3 } + P{B1 ∩ B2 ∩ R3 }


= P{R1 } · P{B2 | R1 } · P{B3 | R1 ∩ B2 }
+ P{B1 } · P{R2 | B1 } · P{B3 | B1 ∩ R2 }
+ P{B1 } · P{B2 | B1 } · P{R3 | B1 ∩ B2 }
= 6/11 · 5/10 · 4/9 + 5/11 · 6/10 · 4/9 + 5/11 · 4/10 · 6/9 = 4/11.
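As a cross-check (not part of the slides), the same probability can be computed exactly in Python, once via the multiplication rule over the three orderings and once by counting unordered draws:

from fractions import Fraction
from math import comb

# Multiplication rule, summed over the orderings RBB, BRB, BBR:
p_ordered = (Fraction(6, 11) * Fraction(5, 10) * Fraction(4, 9)
             + Fraction(5, 11) * Fraction(6, 10) * Fraction(4, 9)
             + Fraction(5, 11) * Fraction(4, 10) * Fraction(6, 9))

# Counting argument with unordered draws:
p_counting = Fraction(comb(6, 1) * comb(5, 2), comb(11, 3))

print(p_ordered, p_counting)        # both equal 4/11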


Bayes’ Theorem

The aim here is to say something about P{F | E}, once we know P{E | F } (and other things. . . ). This will be very useful, and serve as a fundamental tool in probability and statistics.


1. The Law of Total Probability

Theorem (Law of Total Probability; aka. Partition Thm.)


For any events E and F ,

P{E} = P{E | F } · P{F } + P{E | F c } · P{F c }.

(As usual, we assume that the conditionals exist.)

Proof.
Write E = (E ∩ F ) ∪ (E ∩ F c ), a disjoint union, so P{E} = P{E ∩ F } + P{E ∩ F c }, and apply the definition of conditional probability to each term.

1. The Law of Total Probability

Example
According to an insurance company,
◮ 30% of the population are accident-prone; each of them will have an accident in any given year with probability 0.4;
◮ the remaining 70% of the population will have an accident in any given year with probability 0.2.
Accepting this model, what is the probability that a new
customer will have an accident in 2016?


1. The Law of Total Probability

Solution
Define the following events:
◮ F : = {new customer is accident-prone};
◮ A2016 : = {new customer has accident in 2016}.
Given are: P{F } = 0.3, P{A2016 | F } = 0.4, P{A2016 | F c } = 0.2.
Therefore,

P{A2016 } = P{A2016 | F } · P{F } + P{A2016 | F c } · P{F c }


= 0.4 · 0.3 + 0.2 · 0.7 = 0.26.

Notice that this is a weighted average of 0.4 and 0.2, with weights 30% and 70%.
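A minimal Python sketch of this weighted average (our own illustration, with invented variable names) mirrors the calculation:

p_F = 0.3                      # accident-prone
p_acc_given_F = 0.4
p_acc_given_not_F = 0.2

# Law of Total Probability: condition on whether the customer is accident-prone.
p_accident = p_acc_given_F * p_F + p_acc_given_not_F * (1 - p_F)
print(p_accident)              # 0.26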


1. The Law of Total Probability


Definition
Finitely or countably infinitely many events F1 , F2 , . . . form a complete system of events, or a partition of Ω, if Fi ∩ Fj = ∅ for i ≠ j and ∪i Fi = Ω.

Notice that exactly one of the Fi ’s occurs.

Theorem (Law of Total Probability; aka. Partition Thm.)


For any event E and a complete system F1 , F2 , . . . , we have

P{E} = Σi P{E | Fi } · P{Fi }.

For any event F , the pair F1 := F and F2 := F c form a complete system, and we are back to the previous version of the Theorem.

2. Bayes’ Theorem
Theorem (Bayes’ Theorem)
For any events E, F ,

P{F | E} = P{E | F } · P{F } / (P{E | F } · P{F } + P{E | F c } · P{F c }).

If {Fi }i is a complete system of events, then

P{Fi | E} = P{E | Fi } · P{Fi } / Σj P{E | Fj } · P{Fj }.

Proof.
Combine the definition of the conditional with the Law of Total
Probability.


2. Bayes’ Theorem
Let us go back to the insurance company. Imagine it’s the 1st
January 2017.

Example
We learn that the new customer did have an accident in 2016.
Now what is the chance that (s)he is accident-prone?

According to Bayes’ Theorem,

P{F | A2016 } = P{A2016 | F } · P{F } / (P{A2016 | F } · P{F } + P{A2016 | F c } · P{F c })
= 0.4 · 0.3 / (0.4 · 0.3 + 0.2 · 0.7) = 6/13 ≃ 0.46.

C.f. the unconditioned probability P{F } = 0.3.
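The same posterior can be reproduced with a few lines of Python (an illustrative sketch of our own, not part of the lecture):

p_F = 0.3
p_acc_given_F, p_acc_given_not_F = 0.4, 0.2

p_acc = p_acc_given_F * p_F + p_acc_given_not_F * (1 - p_F)   # 0.26, Law of Total Probability
p_F_given_acc = p_acc_given_F * p_F / p_acc                   # Bayes' Theorem

print(p_F_given_acc)           # 6/13 ≈ 0.4615, up from the prior P{F} = 0.3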


Independence

In some special cases partial information on an experiment does not change the likelihood of an event. In this case we talk about independence.

Definition
Events E and F are independent if P{E | F } = P{E}.
Notice that, except for some degenerate cases, this is equivalent to P{E ∩ F } = P{E} · P{F }, and to P{F | E} = P{F }.

Don’t confuse independence with mutually exclusive events.

Independence is usually either trivial or rather tricky.


Independence
Proposition
Let E and F be independent events. Then E and F c are also
independent.

Proof.
P{E ∩ F c } = P{E} − P{E ∩ F } = P{E} − P{E} · P{F } = P{E} · (1 − P{F }) = P{E} · P{F c }.
Example
Rolling two dice, let E be the event that the sum of the numbers is 6 and F the event that the first die shows 3. These events are not independent:

1/36 = P{E ∩ F } ≠ P{E} · P{F } = 5/36 · 1/6.


Independence

Example
Rolling two dice, let E be the event that the sum of the numbers is 7 and F the event that the first die shows 3. These events are independent (!):

1/36 = P{E ∩ F } = P{E} · P{F } = 6/36 · 1/6 = 1/36.
Equivalently,
1/6 = P{E | F } = P{E}.
Or,
1/6 = P{F | E} = P{F }.


Independence
Example
Rolling two dice, let
◮ E be the event that the sum of the numbers is 7,
◮ F the event that the first die shows 3,
◮ G the event that the second die shows 4.

1/36 = P{E ∩ F } = P{E} · P{F } = 6/36 · 1/6 = 1/36.
1/36 = P{E ∩ G} = P{E} · P{G} = 6/36 · 1/6 = 1/36.
1/36 = P{F ∩ G} = P{F } · P{G} = 1/6 · 1/6 = 1/36.
E, F , G are pairwise independent.

But we have a bad feeling about this. . .


Independence
Example
◮ E is the event that the sum of the numbers is 7,
◮ F the event that the first die shows 3,
◮ G the event that the second die shows 4.
Are these events independent?

1 = P{E | F ∩ G} ≠ P{E} = 1/6 !

Or, equivalently,

1/36 = P{E ∩ F ∩ G} ≠ P{E} · P{F ∩ G} = 1/6 · 1/36 !

Recall P{F ∩ G} = P{F } · P{G} from the previous page.



Independence

Definition
Three events E, F , G are (mutually) independent, if

P{E ∩ F } = P{E} · P{F },
P{E ∩ G} = P{E} · P{G},
P{F ∩ G} = P{F } · P{G},
P{E ∩ F ∩ G} = P{E} · P{F } · P{G}.

And, for more events the definition is that any (finite) collection
of events have this factorisation property.
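Since the two-dice sample space is tiny, pairwise vs. mutual independence of E, F , G from the previous example can be checked by brute-force enumeration. A Python sketch (our own; the helpers P and both are invented for the illustration):

from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))          # 36 equally likely outcomes

def P(A):
    return Fraction(sum(1 for w in omega if A(w)), len(omega))

def both(A, B):
    return lambda w: A(w) and B(w)

E = lambda w: w[0] + w[1] == 7     # sum is 7
F = lambda w: w[0] == 3            # first die shows 3
G = lambda w: w[1] == 4            # second die shows 4

print(P(both(E, F)) == P(E) * P(F))                    # True: pairwise independent
print(P(both(E, G)) == P(E) * P(G))                    # True
print(P(both(F, G)) == P(F) * P(G))                    # True
print(P(both(both(E, F), G)) == P(E) * P(F) * P(G))    # False: not mutually independent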


Independence
Below, 0 < p < 1 is a probability parameter.

Example
n independent experiments are performed, each of which
succeeds with probability p. What is the probability that every
single experiment succeeds?

Easy: p^n −→ 0 as n → ∞.

Example (Murphy’s Law)


. . . What is the probability that at least one experiment
succeeds?

Looking at the complement, 1 − P{each fails} = 1 − (1 − p)^n −→ 1 as n → ∞.


Independence

Example
n independent experiments are performed, each of which
succeeds with probability p. What is the probability that exactly
k of them succeed?
 
(n choose k) · p^k · (1 − p)^(n−k).
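These formulas can be checked against a small simulation; the Python sketch below (illustrative only, with arbitrary n, p, k) compares the exact binomial probability with an empirical frequency:

from math import comb
import random

n, p, k = 10, 0.3, 4

p_all_succeed = p**n                                   # every experiment succeeds
p_at_least_one = 1 - (1 - p)**n                        # complement of "each fails"
p_exactly_k = comb(n, k) * p**k * (1 - p)**(n - k)     # exactly k successes

trials = 200_000
hits = sum(1 for _ in range(trials)
           if sum(random.random() < p for _ in range(n)) == k)

print(p_all_succeed, p_at_least_one, p_exactly_k, hits / trials)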

Conditional independence
A black box contains two coins: a fair coin (H, T) and a fake coin (H, H). We randomly take a coin from the box and then flip it twice. Suppose that we observe heads both times.
Question: Are the two events of obtaining heads independent?
(They are not: the first head makes the fake coin more likely, which in turn makes a second head more likely. Given which coin was drawn, however, the two flips are independent; this is the idea formalised below.)

▶ Conditional independence
• Let A, B and C be events, P(C ) > 0. A and B are said to be
conditionally independent given C if and only if:

P(A | B, C ) = P(A | C )

• An equivalent definition:

P(A, B | C ) = P(A | C )P(B | C )
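A short Python sketch (our own illustration of the coin-box example, using exact fractions) makes both points explicit: the two heads are not independent, but they factorise once we condition on the coin:

from fractions import Fraction

p_fake = Fraction(1, 2)                   # probability the fake (H,H) coin is drawn
p_H_given_fake = Fraction(1)
p_H_given_fair = Fraction(1, 2)

# Law of Total Probability over the coin that was drawn:
p_H1 = p_H_given_fake * p_fake + p_H_given_fair * (1 - p_fake)                # 3/4 (same for H2)
p_H1_and_H2 = p_H_given_fake**2 * p_fake + p_H_given_fair**2 * (1 - p_fake)   # 5/8

print(p_H1_and_H2 == p_H1 * p_H1)   # False: 5/8 != 9/16, so H1 and H2 are dependent
# Given the coin, the flips factorise (1 · 1 and 1/2 · 1/2), i.e. conditional independence.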


Conditional independence

Back to the insurance company and to the 1st January 2017 again.

Example
We learn that the new customer did have an accident in 2016.
Now what is the chance that (s)he will have one in 2017?

The question is P{A2017 | A2016 }. We again consider F (being accident-prone):


Conditional independence
Example (. . . cont’d)

P{A2017 | A2016 } = P{A2017 ∩ A2016 } / P{A2016 }
= P{A2017 ∩ A2016 ∩ F } / P{A2016 } + P{A2017 ∩ A2016 ∩ F c } / P{A2016 }
= [P{A2017 ∩ A2016 ∩ F } / P{A2016 ∩ F }] · [P{A2016 ∩ F } / P{A2016 }]
  + [P{A2017 ∩ A2016 ∩ F c } / P{A2016 ∩ F c }] · [P{A2016 ∩ F c } / P{A2016 }]
= P{A2017 | A2016 ∩ F } · P{F | A2016 } + P{A2017 | A2016 ∩ F c } · P{F c | A2016 }.

Conditional Law of Total Probability, very useful.



Conditional independence
Example (. . . cont’d)
Now, we have conditional independence:

P{A2017 | A2016 ∩ F } = P{A2017 | F } = 0.4 and
P{A2017 | A2016 ∩ F c } = P{A2017 | F c } = 0.2.

Thus,

P{A2017 | A2016 }
= P{A2017 | F } · P{F | A2016 } + P{A2017 | F c } · P{F c | A2016 }
= 0.4 · 6/13 + 0.2 · 7/13 ≃ 0.29.

C.f. P{A2017 } = P{A2016 } = 0.26 from before.


A2016 and A2017 are dependent!
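The whole two-year model fits in a few lines of Python (a sketch of our own, not from the slides); it reproduces 0.26 and ≈ 0.29 and thereby exhibits the dependence:

p_F = 0.3
p_acc = {True: 0.4, False: 0.2}           # yearly accident probability, given the type F / F^c
weight = {True: p_F, False: 1 - p_F}

p_A2016 = sum(p_acc[f] * weight[f] for f in (True, False))           # 0.26
p_both_years = sum(p_acc[f]**2 * weight[f] for f in (True, False))   # uses conditional independence given the type
p_A2017_given_A2016 = p_both_years / p_A2016

print(p_A2016, p_A2017_given_A2016)       # 0.26 and ≈ 0.2923 > 0.26: the two years are dependent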
