02 Probability, Bayes Theorem and The Monty Hall Problem


Probability, Bayes Theorem and the Monty Hall Problem

Probability Distributions
A random variable is a variable whose value is uncertain. For example, the height of a randomly selected person in this class is a random variable: I won't know its value until the person is selected. Note that we are not completely uncertain about most random variables.
For example, we know that height will probably be in the 5′–6′ range. In addition, 5′6″ is more likely than 5′0″ or 6′0″ (for women).

The function that describes the probability of each possible value of the random variable is called a probability distribution.

PSYC 6130, PROF. J. ELDER

Probability Distributions
Probability distributions are closely related to frequency distributions.


Probability Distributions
Dividing each frequency by the total number of scores and multiplying by 100 yields a percentage distribution.


Probability Distributions
Dividing each frequency by the total number of scores yields a probability distribution.


Probability Distributions
For a discrete distribution, the probabilities over all possible values of the random variable must sum to 1.


Probability Distributions
For a discrete distribution, we can talk about the probability of a particular score occurring, e.g., p(Province = Ontario) = 0.36. We can also talk about the probability of any one of a subset of scores occurring, e.g., p(Province = Ontario or Quebec) = 0.50. In general, we refer to these occurrences as events.
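The frequency-to-probability recipe and these event probabilities can be sketched in a few lines of Python. The provincial counts below are made up for illustration; only the 0.36 for Ontario and the 0.50 subtotal mirror the slide's numbers.

```python
# A sketch of turning a frequency distribution into a probability
# distribution; the counts below are hypothetical, not the survey data.
freq = {"Ontario": 36, "Quebec": 14, "Alberta": 30, "BC": 20}
total = sum(freq.values())

# Divide each frequency by the total number of scores.
prob = {province: n / total for province, n in freq.items()}

# Probabilities over all possible values must sum to 1.
assert abs(sum(prob.values()) - 1.0) < 1e-9

# Probability of an event: any one of a subset of scores occurring.
p_ontario_or_quebec = prob["Ontario"] + prob["Quebec"]
print(p_ontario_or_quebec)  # 0.5
```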


Probability Distributions
For a continuous distribution, the probabilities over all possible values of the random variable must integrate to 1 (i.e., the area under the curve must be 1). Note that the height of a continuous distribution can exceed 1!
[Figure: three normal curves with shaded areas of 0.683, 0.954, and 0.997.]
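Both claims can be checked numerically. In this sketch the narrow standard deviation (0.1) is made up to show a density whose peak exceeds 1 while its area remains 1.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# With a small standard deviation, the peak height exceeds 1 ...
peak = normal_pdf(0.0, 0.0, 0.1)  # ≈ 3.99

# ... but the area under the curve is still 1 (crude numerical integration
# over ±20 standard deviations).
dx = 0.001
area = sum(normal_pdf(i * dx, 0.0, 0.1) for i in range(-2000, 2001)) * dx
print(peak, area)
```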


Continuous Distributions
For continuous distributions, it does not make sense to talk about the probability of an exact score.
e.g., what is the probability that your height is exactly 65.485948467 inches?
[Figure: Normal approximation to the probability distribution for the height of Canadian females (mean = 5′3.8″, s = 2.6″; parameters from General Social Survey, 1991). Probability density plotted against height in inches, 55–75.]

Continuous Distributions
It does make sense to talk about the probability of observing a score that falls within a certain range.
e.g., what is the probability that your height is between 5′3″ and 5′7″? e.g., what is the probability that you are less than 5′10″?

Valid events

[Figure: the same normal approximation to the height distribution (mean = 5′3.8″, s = 2.6″; General Social Survey, 1991), with the areas corresponding to these events shaded.]
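These range probabilities can be sketched with the normal CDF, using the parameters given on the slide (mean 5′3.8″ = 63.8 in, s = 2.6 in). This uses the standard relation between the normal CDF and the error function.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal random variable, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = 63.8, 2.6  # height of Canadian females, in inches (from the slide)

# p(5'3" <= height <= 5'7"): area between 63 and 67 inches
p_range = normal_cdf(67, mu, sigma) - normal_cdf(63, mu, sigma)

# p(height < 5'10"): area below 70 inches
p_below = normal_cdf(70, mu, sigma)
print(p_range, p_below)
```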


Probability of Combined Events


Let p(A) represent the probability of event A, where 0 ≤ p(A) ≤ 1.
If A and B are disjoint (mutually exclusive) events, then p(A or B) = p(A) + p(B).
Example: in the context of the Community Health Survey: Let A represent the event that the respondent lives in Alberta. Let B represent the event that the respondent lives in BC.

Then p(A) = 0.087, p(B) = 0.106, and p(A or B) = 0.087 + 0.106 = 0.193.



Probability of Combined Events


More generally, if A and B are not mutually exclusive, p(A or B) = p(A) + p(B) − p(A and B).

Example: Canadian Community Health Survey, Sleeping Habits

Let A = event that respondent sleeps less than 6 hours per night.
Let B = event that respondent reports trouble sleeping most or all of the time

p(A) = 0.139, p(B) = 0.152, p(A and B) = 0.061. Thus p(A or B) = 0.139 + 0.152 − 0.061 = 0.230.
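The addition rule with the sleep figures quoted above can be checked directly:

```python
# Addition rule for events that are not mutually exclusive, using the
# sleep figures quoted from the Canadian Community Health Survey.
p_A = 0.139          # sleeps less than 6 hours per night
p_B = 0.152          # trouble sleeping most or all of the time
p_A_and_B = 0.061    # both

p_A_or_B = p_A + p_B - p_A_and_B
print(p_A_or_B)  # ≈ 0.230
```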

Exhaustive Events
Two or more events are said to be exhaustive if at least one of them must occur. For example, if A is the event that the respondent sleeps less than 6 hours per night and B is the event that the respondent sleeps at least 6 hours per night, then A and B are exhaustive. (Although A is probably the more exhausted!!)


Independence
Two events are independent if the occurrence of one in no way affects the probability of the other. If events A and B are independent, then p(A and B) = p(A)p(B). If events A and B are not independent, then p(A and B) = p(A)p(B | A). Example: pick a card, any card.
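A sketch of the card example (the specific events chosen here, hearts and queens, are an assumption; the slide just says "pick a card"):

```python
from fractions import Fraction

# "Pick a card, any card": a standard 52-card deck.
# A = card is a heart, B = card is a queen.
p_A = Fraction(13, 52)   # 13 hearts
p_B = Fraction(4, 52)    # 4 queens

# A and B are independent: exactly one card (queen of hearts) is both.
p_A_and_B = Fraction(1, 52)
assert p_A_and_B == p_A * p_B

# Without replacement, successive draws are NOT independent:
# p(A and B) = p(A) p(B | A), e.g., drawing two hearts in a row.
p_B_given_A = Fraction(12, 51)   # 12 hearts left among 51 cards
p_both_hearts = p_A * p_B_given_A
print(p_both_hearts)  # 1/17
```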


An Example: The Monty Hall Problem


Problem History
When the problem first appeared in Parade, approximately 10,000 readers, including 1,000 PhDs, wrote in claiming the solution was wrong. In a study of 228 subjects, only 13% chose to switch.


Intuition
Before Monty opens any doors, there is a 1/3 probability that the car lies behind the door you selected (Door 1), and a 2/3 probability it lies behind one of the other two doors. Thus with 2/3 probability, Monty will be forced to open a specific door (e.g., the car lies behind Door 2, so Monty must open Door 3). This concentrates all of the 2/3 probability in the remaining door (e.g., Door 2).
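This intuition can be checked empirically. A minimal simulation sketch (not from the slides), assuming the standard rules: Monty always opens a goat door other than yours.

```python
import random

# A minimal Monty Hall simulation.
def play(switch, trials=100_000, seed=0):
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)    # door hiding the car
        pick = rng.randrange(3)   # player's initial choice
        # Monty opens a goat door that is not the player's pick
        # (choosing at random when the player's pick hides the car).
        monty = rng.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            # Switch to the one remaining unopened door.
            pick = next(d for d in range(3) if d != pick and d != monty)
        wins += (pick == car)
    return wins / trials

print(play(switch=True), play(switch=False))  # ≈ 2/3 and ≈ 1/3
```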



Analysis
Player initially picks Door 1.
Car hidden behind Door 1 (p = 1/3): the host opens either Door 2 or Door 3. Switching loses (probability 1/6 + 1/6 = 1/3).
Car hidden behind Door 2 (p = 1/3): the host must open Door 3. Switching wins (probability 1/3).
Car hidden behind Door 3 (p = 1/3): the host must open Door 2. Switching wins (probability 1/3).
Overall, switching wins with probability 2/3 and loses with probability 1/3.

Notes
It is important that:
Monty must open a door that reveals a goat.
Monty cannot open the door you selected.

These rules mean that your choice may constrain what Monty does.
If you initially selected a door concealing a goat, then there is only one door Monty can open.

One can rigorously account for the Monty Hall problem using a Bayesian analysis.


End of Lecture 2
Sept 17, 2008

Conditional Probability
To understand Bayesian inference, we first need to understand the concept of conditional probability. What is the probability I will roll a 12 with a pair of (fair) dice? What if I first roll one die and get a 6? What now is the probability that when I roll the second die they will sum to 12?
Let A be the state of die 1 Let B be the state of die 2 Let C be the sum of die 1 and 2

p(A = 6) = __?  p(B = 6) = __?  p(C = 12) = __?  p(C = 12 | A = 6) = __?


Probability of C given A
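The blanks above can be checked by enumerating the 36 equally likely outcomes for the two dice; a brief sketch (not part of the original slides):

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes for a pair of fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def p(event):
    """Probability of an event, given as a predicate on (die1, die2)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_A = p(lambda o: o[0] == 6)              # p(A = 6)
p_C = p(lambda o: o[0] + o[1] == 12)      # p(C = 12)
# p(C = 12 | A = 6) = p(C = 12 and A = 6) / p(A = 6)
p_C_given_A = p(lambda o: o[0] == 6 and o[0] + o[1] == 12) / p_A
print(p_A, p_C, p_C_given_A)  # 1/6 1/36 1/6
```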


Conditional Probability
The conditional probability of A given B is the joint probability of A and B, divided by the marginal probability of B.
p(A | B) = p(A, B) / p(B)

Thus if A and B are statistically independent,


p(A | B) = p(A, B) / p(B) = p(A)p(B) / p(B) = p(A).

However, if A and B are statistically dependent, then


p(A | B) ≠ p(A).

Bayes Theorem
Bayes Theorem is simply a consequence of the definition of conditional probabilities:
p(A | B) = p(A, B) / p(B)   ⇒   p(A, B) = p(A | B)p(B)
p(B | A) = p(A, B) / p(A)   ⇒   p(A, B) = p(B | A)p(A)
Thus p(A | B)p(B) = p(B | A)p(A), and so
p(A | B) = p(B | A)p(A) / p(B)

Bayes Theorem
Bayes theorem is most commonly used to estimate the state of a hidden, causal variable H based on the measured state of an observable variable D:
p(H | D) = p(D | H)p(H) / p(D)
where p(H | D) is the posterior, p(D | H) is the likelihood, p(H) is the prior, and p(D) is the evidence.


Bayesian Inference
Whereas the posterior p(H|D) is often difficult to estimate directly, reasonable models of the likelihood p(D|H) can often be formed. This is typically because H is causal on D. Thus Bayes theorem provides a means for estimating the posterior probability of the causal variable H based on observations D.


Marginalizing
To calculate the evidence p(D) in Bayes equation, we typically have to marginalize over all possible states of the causal variable H.
p(H | D) = p(D | H)p(H) / p(D)

p(D) = p(D, H1) + p(D, H2) + ⋯ + p(D, Hn)
     = p(D | H1)p(H1) + p(D | H2)p(H2) + ⋯ + p(D | Hn)p(Hn)
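A sketch of this marginalization with three hypothetical hypotheses (the priors and likelihoods below are made up for illustration):

```python
from fractions import Fraction

# Marginalizing over the hidden variable H to get the evidence p(D).
# All numbers here are hypothetical.
prior = {"H1": Fraction(1, 2), "H2": Fraction(1, 4), "H3": Fraction(1, 4)}
likelihood = {"H1": Fraction(1, 10), "H2": Fraction(3, 10), "H3": Fraction(1, 2)}

# p(D) = sum over all H of p(D | H) p(H)
p_D = sum(likelihood[h] * prior[h] for h in prior)

# Posterior for each hypothesis via Bayes theorem.
posterior = {h: likelihood[h] * prior[h] / p_D for h in prior}
assert sum(posterior.values()) == 1  # posteriors over exhaustive H sum to 1
print(p_D, posterior["H3"])  # 1/4 1/2
```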


The Full Monty


Let's get back to the Monty Hall Problem. Let's assume you initially select Door 1. Suppose that Monty then opens Door 2 to reveal a goat. We want to calculate the posterior probability that the car lies behind Door 1 after Monty has provided these new data.


The Full Monty


Let Ci represent the state that the car lies behind Door i, i ∈ {1, 2, 3}. Let Mi represent the event that Monty opens Door i, revealing a goat.
We seek p(C1 | M2) = p(M2 | C1)p(C1) / p(M2).


The Full Monty


Since p(C2 | M2) = 0, we can obtain p(C3 | M2) by subtracting p(C1 | M2) from 1. (Remember that the probabilities of exhaustive events sum to 1!) However, we can also calculate p(C3 | M2) directly:

p(C3 | M2) = p(M2 | C3)p(C3) / p(M2)
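Putting the pieces together numerically (a sketch; the 1/2 likelihood assumes Monty opens Door 2 or Door 3 at random when the car is behind your Door 1):

```python
from fractions import Fraction

# Bayesian analysis of Monty Hall after the player picks Door 1 and
# Monty opens Door 2 to reveal a goat.
prior = {c: Fraction(1, 3) for c in ("C1", "C2", "C3")}
lik_M2 = {"C1": Fraction(1, 2),  # car behind Door 1: Monty opens 2 or 3
          "C2": Fraction(0),     # Monty never reveals the car
          "C3": Fraction(1)}     # car behind Door 3: Monty must open 2

# Evidence p(M2), marginalizing over where the car is.
p_M2 = sum(lik_M2[c] * prior[c] for c in prior)

# Posteriors: sticking with Door 1 wins with probability 1/3,
# switching to Door 3 wins with probability 2/3.
posterior = {c: lik_M2[c] * prior[c] / p_M2 for c in prior}
print(posterior["C1"], posterior["C3"])  # 1/3 2/3
```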


But we're not on Let's Make a Deal!


Why is the Monty Hall Problem Interesting?
It reveals limitations in human cognitive processing of uncertainty. It provides a good illustration of many concepts of probability. It gets us to think more carefully about how we deal with and express uncertainty as scientists.

What else is Bayes theorem good for?


Clinical Example
Christiansen et al. (2000) studied the mammogram results of 2,227 women at health centers of Harvard Pilgrim Health Care, a large HMO in the Boston metropolitan area. The women received a total of 9,747 mammograms over 10 years. Their ages ranged from 40 to 80. Ninety-three different radiologists read the mammograms, and overall they diagnosed 634 mammograms as suspicious that turned out to be false positives, a false positive rate of 6.5%. The false negative rate has been estimated at 10%.


Clinical Example
There are about 58,500,000 women between the ages of 40 and 80 in the US. The incidence of breast cancer in the US is about 184,200 per year, i.e., roughly 1 in 318.


Clinical Example
Let C0 represent the absence of cancer. Let C1 represent the presence of cancer.
Let M0 represent a negative mammogram result. Let M1 represent a positive mammogram result.
Suppose your friend receives a positive mammogram result. What quantity do you want to compute?

Remember: p(C1 | M1) ≠ p(M1 | C1)!
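A sketch of the posterior your friend actually cares about, using the rates quoted in these slides (treating 1 − false negative rate as the sensitivity, and the quoted incidence as the prior):

```python
# Bayes theorem applied to the mammogram example; rates from the slides.
p_C1 = 1 / 318                 # prior: presence of cancer (incidence)
p_C0 = 1 - p_C1                # prior: absence of cancer
p_M1_given_C1 = 1 - 0.10       # sensitivity = 1 - false negative rate
p_M1_given_C0 = 0.065          # false positive rate

# Evidence: marginalize over C0 and C1.
p_M1 = p_M1_given_C1 * p_C1 + p_M1_given_C0 * p_C0

# p(C1 | M1) = p(M1 | C1) p(C1) / p(M1)
p_C1_given_M1 = p_M1_given_C1 * p_C1 / p_M1
print(p_C1_given_M1)  # ≈ 0.04: a positive result still means only ~4% chance
```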


