
Bayes’ Theorem
Authors: Blume, Greevy -- Bios 311 Lecture Notes

Probability Review

Probabilities apply to processes with unpredictable outcomes ("random experiments").

The probability of a particular result, or outcome, measures the tendency of the process to produce that result.

Probability model
(a mathematical representation of the process)

(1) Random variable X (the result, or outcome)
(2) Sample space S (the set of all possible outcomes)
(3) Probability distribution over S

When a probability model like this represents the experiment, an "event" is represented by a set of the points in S.

The probability of an event (set) A, P(A), is the sum of the probabilities of all the points that are in A.


Example:

Suppose we select one student at random from those registered for this class and determine the number of teeth in that person's head. The result of this process will be a number -- call it X. This is our random variable.

The sample space is S = {0, 1, 2, …, 30, 31, 32}?

Let P(X=0) be the proportion of students with no teeth,
P(X=1) be the proportion of students with one tooth,
P(X=2) etc.

The event "selected student has at least 26 teeth" is represented by the set

A = {26, 27, 28, 29, 30, 31, 32}

And the probability of this event is

P(A) = P(26) + P(27) + … + P(32) (Why?)


The event "selected student has an even number of teeth" is represented by the set B = {0, 2, 4, …, 30, 32}. Its probability is:

P(B) = P(0) + P(2) + ... + P(32)

The event "Selected student has at least 26 teeth or


has an even number of teeth" is represented by the
set

A or B = {26, 27, 28, 29, 30, 31, 32 , 0, 2, ..., 22, 24}

Its probability is

P(A) + P(B) - P(AB) = [P(26) + P(27) + ... + P(32)]

+ [P(0) + P(2) + ... + P(32)]

- [P(26) + P(28) + P(30) + P(32)]
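
To see this arithmetic with actual numbers, suppose (purely hypothetically, since the notes give no class data) that all 33 outcomes in S are equally likely, so each has probability 1/33. Then A contains 7 points, B contains 17, and A and B share the four even values 26, 28, 30, 32. A quick Stata check that inclusion-exclusion agrees with counting A or B directly:

    * hypothetical uniform distribution over S = {0, 1, ..., 32}
    display 7/33 + 17/33 - 4/33    // P(A) + P(B) - P(A and B)
    display 20/33                  // direct sum over the 20 points in A or B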


Properties of Probabilities

For events A and B in a sample space S:

• 0 ≤ P(A) ≤ 1

• P(S) = 1

• P(A) = 1 - P(Ac)

• P(A or B) = P(A) + P(B) - P(A and B)

  If A ∩ B = ∅, then P(A and B) = 0.
  (The intersection of A and B is the empty set:
  A and B are "mutually exclusive" or "disjoint".)

• If P(A and B) = P(A)P(B), then A and B are "independent events"

• The conditional probability of A, given B, is defined as

  P(A|B) = P(A and B)/P(B)

• P(A and B) = P(A|B)P(B) = P(B|A)P(A)
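
A quick numeric check of the last two rules, using a hypothetical single roll of a fair die (our example, not the notes'): let A = "roll is even" and B = "roll is 5 or 6". Only the roll "6" is in both events, so P(A and B) = 1/6:

    display 3/6 * 2/6             // P(A)P(B) = 1/6 = P(A and B): here A, B are independent
    display (1/6) / (2/6)         // P(A|B) = 1/2 = P(A)
    display ((1/6)/(3/6)) * (3/6) // P(B|A)P(A) recovers P(A and B) = 1/6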


• If A and B are independent, then

  P(A|B) = P(A and B)/P(B) = P(A)P(B)/P(B) = P(A)

This says that the probability of A and the probability of A given B are the same: the probability of A is unaffected by B.

On the other hand, if P(A|B) ≠ P(A), then A and B are not independent events. The occurrence of B changes the probability that A will occur.

• If A and B are disjoint (mutually exclusive) events that both have a positive probability of occurring, then they are not independent.

To show this, simply note that A ∩ B = ∅, so

  P(A and B) = P(∅) = 0 ≠ P(A)P(B)

Alternatively (in terms of conditional probabilities), if A and B are disjoint and B occurs, then A cannot, so that

  P(A|B) = 0 ≠ P(A)

Therefore A and B are not independent.


Problem:
(source: Parade Magazine 7/27/97 – Ask Marilyn)

"A woman and a man (unrelated) each have two children. At least one of the woman's children is a boy, and the man's older child is a boy. Do the chances that the woman has two boys equal the chances that the man has two boys?"

Marilyn says: "The chances that the woman has two boys are 1 in 3, and the chances that the man has two boys are 1 in 2."

Many people write in to tell Marilyn that she is horribly wrong and a disgrace to the human race. Obviously the chances are equal. Who is correct?


To answer the question we need to set up some notation. For any family, the probability of a boy on any one birth is 1/2, and births are independent.

Our sample space is S = {(0,0), (0,1), (1,0), (1,1)}, where 1 indicates a boy and 0 a girl.

Let our events be

A = {older birth is a boy} = {(0,1), (1,1)}
C = {exactly one boy in two births} = {(1,0), (0,1)}
D = {exactly two boys in two births} = {(1,1)}

So that P(A)=1/2, P(C)=1/2, and P(D)=1/4.

• We are given that the man's older child is a boy. What is the probability of two boys, given the older is a boy?

P(D|A) = ?

We know that

P(D|A) = P(D and A)/P(A) = P(D)/P(A) = (1/4)/(1/2) = 1/2

The probability that the man has two boys is 1/2.


• We are also given that the woman has at least one boy. What is the probability of two boys, given at least one boy?

The event "at least one boy" is the set

{(1,0), (0,1), (1,1)} = C or A

So the question is to find P(D|{C or A}):

P(D|C or A) = P(D and {C or A})/P(C or A) = P(D)/P(C or A) = (1/4)/(3/4) = 1/3

The probability that the woman has two boys is only 1/3. (Marilyn is correct.)
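
A simulation makes the asymmetry easy to believe. Below is a minimal Stata sketch (ours, not part of the original notes; the seed and variable names are arbitrary):

    clear
    set obs 100000
    set seed 311
    generate byte older   = runiform() < 0.5   // 1 = boy on the older birth
    generate byte younger = runiform() < 0.5   // 1 = boy on the younger birth
    generate byte twoboys = older & younger
    summarize twoboys if older == 1            // man's case: mean is near 1/2
    summarize twoboys if older | younger       // woman's case: mean is near 1/3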


Bayes’ Theorem

First we need to be familiar with the Law of Total Probability.

Suppose the sample space is divided into any number of disjoint sets, say A1, A2, …, An, so that Ai ∩ Aj = ∅ (for i ≠ j) and A1 ∪ A2 ∪ … ∪ An = S.

[Diagram: the sample space S partitioned into disjoint sets A1, A2, A3, A4, A5, with event B cutting across the partition into pieces BA1, BA2, BA3, BA4.]

In this case we can write

P(B) = P(BA1 ∪ BA2 ∪ … ∪ BAn)
     = P(B and A1) + P(B and A2) + … + P(B and An)
     = P(B|A1)P(A1) + P(B|A2)P(A2) + … + P(B|An)P(An)

Or more generally:

(LOTP)   P(B) = Σ(i=1 to n) P(B|Ai)P(Ai)

Example:

Suppose that we have only two disjoint sets, A1 and A2, so that A1 ∪ A2 = S.

Then by the Law of Total Probability we have

P(B) = P(B|A1)P(A1) + P(B|A2)P(A2)
     = P(B and A1) + P(B and A2)
     = P(B)

(because S is divided into only two sets)

By itself, the Law of Total Probability is not very interesting. However, in conjunction with the definition of conditional probability, we have:

P(A1|B) = P(A1 and B)/P(B)
        = P(B|A1)P(A1)/P(B)
        = P(B|A1)P(A1) / {P(B|A1)P(A1) + P(B|A2)P(A2)}

or in general

P(Ak|B) = P(B|Ak)P(Ak) / Σ(i=1 to n) P(B|Ai)P(Ai)


Formally stated, Bayes' Theorem says:

For mutually disjoint sets A1, A2, …, An that comprise the total sample space (A1 ∪ A2 ∪ … ∪ An = S), we have:

P(Ak|B) = P(B|Ak)P(Ak) / Σ(i=1 to n) P(B|Ai)P(Ai)

In its simplest form, for two events A and B, we have

P(A|B) = P(B|A)P(A) / {P(B|A)P(A) + P(B|Ac)P(Ac)}


Example

Suppose that 5% of men and 0.25% of women are color-blind, in a population that consists of equal numbers of men and women. A person is chosen at random, and that person proves to be color-blind. What is the probability that the person is male?

Solution:
Let A be the event "selected person is male" and let B be the event "selected person is color-blind." We want the conditional probability of A, given B.

We are given that P(B|A) = 0.05, P(B|Ac) = 0.0025, and P(A) = 0.5. Thus

P(A|B) = P(A and B)/P(B)
       = P(B|A)P(A) / {P(B|A)P(A) + P(B|Ac)P(Ac)}
       = (0.05)(0.5) / {(0.05)(0.5) + (0.0025)(0.5)}
       = (0.05)/(0.0525)
       = 0.95238

Here B represents strong evidence supporting A over Ac, i.e., male over female. Before B is observed, the probability ratio is P(A)/P(Ac) = (1/2)/(1/2) = 1. Observing B increases it to P(A|B)/P(Ac|B) = 0.95238/(1 - 0.95238) = 20.
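
A one-line check of this arithmetic in Stata (our commands, not part of the notes):

    display (0.05*0.5) / (0.05*0.5 + 0.0025*0.5)   // P(A|B) = .95238095
    display 0.95238095 / (1 - 0.95238095)          // posterior odds of male vs female, about 20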


Important Aside:

Bayes' theorem shows how to "turn the conditional probabilities around".

It is a simple fact, which has been made controversial because of attempts to apply probability theory to problems where A represents a scientific hypothesis (call it H1) and B represents a body of observed data (call it D, for data).

It says that in those problems, if you know the probability of observing D when the hypothesis H1 is true, P(D|H1), and the probability of observing D when it isn't, P(D|not H1), and if you can also assign a probability, P(H1), to the truth of H1 before D is observed, then you can calculate the probability P(H1|D) that hypothesis H1 is true, given the data D. How? Because:

P(H1|D) = P(D|H1)P(H1)/P(D)
        = P(D|H1)P(H1) / {P(D|H1)P(H1) + P(D|not H1)P(not H1)}

The controversial part concerns when and how one might determine the "prior" probability, P(H1), that hypothesis H1 is true. Some have argued that when you have no knowledge of whether H1 is true or not, you should use P(H1) = 1/2. This has been strongly criticized.


Diagnostic tests

Sensitivity, specificity, and positive and negative predictive value are all related via Bayes' Theorem. We use Bayes' theorem every time we calculate these values from a 2x2 table, even though it does not feel like it.

Example:
Suppose the probability (prevalence) of a disease,
say cooties, in a population of interest is 10%. A test
is developed to detect the disease in its early stages.
When the test was applied to 1000 randomly
selected second graders, 170 tested positive. Of
those who tested positive, only 80 were confirmed to
have cooties.

              cooties   no cooties   total
    Test +       80          90        170
    Test -       20         810        830
    total       100         900       1000

From the table it is easy to see that:

1) P(cooties) = 100/1000 = 10%
2) P(Test+ | cooties) = (80/1000)/(100/1000) = 80/100 = 80%
3) P(Test- | no cooties) = (810/1000)/(900/1000) = 810/900 = 90%
4) P(cooties | Test+) = (80/1000)/(170/1000) = 80/170 = 47%
5) P(no cooties | Test-) = (810/1000)/(830/1000) = 810/830 = 97.6%
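
The same five quantities, computed directly from the counts in Stata (our sketch, not part of the notes):

    display 100/1000   // 1) prevalence = .1
    display 80/100     // 2) sensitivity = .8
    display 810/900    // 3) specificity = .9
    display 80/170     // 4) positive predictive value = .47058824
    display 810/830    // 5) negative predictive value = .97590361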


These quantities have special names:

1) Prevalence
2) Sensitivity = P( Test+ | Disease )
3) Specificity = P( Test- | No Disease )
4) Positive predictive value = P( Disease | Test+ )
5) Negative predictive value = P( No Disease | Test- )

Bayes' theorem is used implicitly in the 2x2 table:

Let D+ represent the event "having disease cooties" and T+ represent the event "testing positive for cooties".

Bayes' theorem tells us that

P(D+|T+) = P(T+|D+)P(D+) / {P(T+|D+)P(D+) + P(T+|D-)P(D-)}

or

PPV = (sens × prev) / {sens × prev + (1 - spec) × (1 - prev)}

And in our example

PPV = (0.8)(0.1)/((0.8)(0.1)+(0.1)(0.9)) = 47%
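
Checking this in Stata, together with the analogous formula for the NPV. The NPV line is our extrapolation of the same algebra, NPV = spec × (1 - prev) / {spec × (1 - prev) + (1 - sens) × prev}; it reproduces the 97.6% found from the table:

    display (0.8*0.1) / (0.8*0.1 + 0.1*0.9)   // PPV = .47058824
    display (0.9*0.9) / (0.9*0.9 + 0.2*0.1)   // NPV = .97590361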


Thus, to calculate the PPV, we would need the sensitivity, specificity, and prevalence.

Or, to calculate the sensitivity, we would need the PPV, NPV, and prevalence.

Notice that both calculations depend on the prevalence!

In an objective situation such as a diagnostic test, the prevalence can always, in theory, be specified, because we can learn about the prevalence.


Example:

An insurance company has three types of customers: high risk, medium risk, and low risk. Twenty percent of its customers are high risk, 30% are medium risk, and 50% are low risk. Also, the probability that a high risk customer has at least one accident in the current year is 0.25, while the probability for medium risk is 0.16, and for low risk it is only 0.10. If a randomly selected customer has at least one accident during the year, what is the probability that he is in the high risk group?

Solution:
Let A1 be "high risk group",
    A2 be "medium risk",
    A3 be "low risk", and
    B be the event "has at least one accident".


We are asked to find P(A1|B), given that

P(A1) = 0.20, P(A2) = 0.30, P(A3) = 0.50, and
P(B|A1) = 0.25, P(B|A2) = 0.16, P(B|A3) = 0.10

Bayes' Rule gives the solution:

P(A1|B) = P(B|A1)P(A1) / {P(B|A1)P(A1) + P(B|A2)P(A2) + P(B|A3)P(A3)}

        = (0.25)(0.20) / {(0.25)(0.20) + (0.16)(0.30) + (0.10)(0.50)}

        = 0.05 / (0.05 + 0.048 + 0.05)

        = 0.338
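
The denominator is just the Law of Total Probability giving P(B) = 0.148. A one-line Stata check (ours):

    display (0.25*0.20) / (0.25*0.20 + 0.16*0.30 + 0.10*0.50)   // = .33783784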


Extended Reading: Expanded Diagnostic Test

MRI is often used to assess whether a tumor might be cancerous. After looking at a scan, the radiologist grades the tumor on the following scale:

1 = definitely not cancerous
2 = probably not cancerous
3 = inconclusive
4 = probably cancerous
5 = definitely cancerous

The following table presents some fake data for an experiment designed to assess the accuracy of MRI for grading tumors.

                      MRI Assessment
                 1     2     3     4     5   Total
    Malignant    7    13    22    45    91     178
    Benign      78    56    60     5     2     201
    Total       85    69    82    50    93     379

• How do we assess the accuracy of this test?


If this were a 2x2 table we could calculate the sensitivity and specificity. In fact, we can do something analogous here by collapsing the above table into a series of 2x2 tables.

If the patient received an MRI score of 4 or 5, we'll say that they are eligible for surgery. The properties of such an assessment can be summarized in the following table:

                   MRI Assessment
                  1-3    4-5   Total
    Malignant      42    136     178
    Benign        194      7     201
    Total         236    143     379

Sensitivity = P(4 or 5 | M) = 136/178 = 0.764
Specificity = P(1, 2, or 3 | B) = 194/201 = 0.9652


But the cutoff was somewhat arbitrary; why not 1-2 versus 3-5, just to be sure?

• We can calculate the properties for all score combinations!

                      MRI Assessment
                 1     2     3     4     5   Total
    Malignant    7    13    22    45    91     178
    Benign      78    56    60     5     2     201
    Total       85    69    82    50    93     379

    Sensitivity*     1    0.96  0.89  0.76  0.51
    Specificity*     0    0.39  0.67  0.97  0.99

*for having an MRI score that high or higher.

Example: properties for basing surgery on a 2-5 MRI score.

Sensitivity = P(2-5|M) = (13+22+45+91)/178 = 0.96
Specificity = P(1|B) = 78/201 = 0.39
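
The same arithmetic in Stata (our check):

    display (13+22+45+91)/178   // sensitivity for the 2-5 cutoff = .96067416
    display 78/201              // specificity for the 2-5 cutoff = .3880597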


But instead of looking at a bunch of numbers, we can graph sensitivity versus 1 - specificity to get a Receiver Operating Characteristic (ROC) curve for the MRI assessment.


These data were entered as two columns and 379 rows. The first column is the MRI call for each individual (an integer in [1,5]) and the second is the tumor status (1 = malignant, 0 = benign). Here are the Stata commands to do this:
. use rocex
. list in 1/10

+--------------+
| mri cancer |
|--------------|
1. | 1 1 |
2. | 1 1 |
3. | 1 1 |
4. | 1 1 |
5. | 1 1 |
|--------------|
6. | 1 1 |
7. | 1 1 |
8. | 2 1 |
9. | 2 1 |
10. | 2 1 |
+--------------+
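
If the rocex file is not at hand, the same 379 observations can be rebuilt from the table above. A minimal sketch (the counts come from the table; the input/expand approach and variable names are ours):

    clear
    input mri cancer n
    1 1  7
    2 1 13
    3 1 22
    4 1 45
    5 1 91
    1 0 78
    2 0 56
    3 0 60
    4 0  5
    5 0  2
    end
    expand n    // one row per patient: 379 in all
    drop n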


. roctab cancer mri, table summary detail graph aspectratio(1)


| mri
cancer | 1 2 3 4 5 | Total
-----------+-------------------------------------------------------+----------
0 | 78 56 60 5 2 | 201
1 | 7 13 22 45 91 | 178
-----------+-------------------------------------------------------+----------
Total | 85 69 82 50 93 | 379

Detailed report of Sensitivity and Specificity


------------------------------------------------------------------------------
Correctly
Cutpoint Sensitivity Specificity Classified LR+ LR-
------------------------------------------------------------------------------
( >= 1 ) 100.00% 0.00% 46.97% 1.0000
( >= 2 ) 96.07% 38.81% 65.70% 1.5699 0.1013
( >= 3 ) 88.76% 66.67% 77.04% 2.6629 0.1685
( >= 4 ) 76.40% 96.52% 87.07% 21.9390 0.2445
( >= 5 ) 51.12% 99.00% 76.52% 51.3792 0.4937
( > 5 ) 0.00% 100.00% 53.03% 1.0000
------------------------------------------------------------------------------

              ROC                  -- Asymptotic Normal --
    Obs          Area     Std. Err.    [95% Conf. Interval]
    --------------------------------------------------------
    379        0.9028       0.0161      0.87133     0.93433
[Figure: ROC curve for the MRI data, plotting sensitivity against 1 - specificity. Area under ROC curve = 0.9028]


An ROC curve visually displays the trade-off between sensitivity and specificity. This can be quite useful for determining a cutoff for the diagnostic variable.

Example:
(Fisher and Van Belle, p. 236)
For blood samples collected after a test meal, three different blood tests gave the following data:

                          Type of Test
    Blood sugar   Somogyi-Nelson     Folin-Wu       Anthrone
    (mg/100ml)     Sens    Spec    Sens    Spec    Sens    Spec
        70         100      0.0    100      8.2    100      2.7
        80         100      1.6     97.1   22.4    100      9.4
        90         100      8.8     97.1   39.0    100     22.4
       100          98.6   21.4     95.7   57.3     98.6   37.3
       110          98.6   38.4     92.9   70.6     94.3   54.3
       120          97.1   55.9     88.6   83.3     88.6   67.1
       130          92.9   70.2     78.6   90.6     81.4   80.6
       140          85.7   81.4     68.6   95.1     74.3   88.2
       150          80.0   90.4     57.1   97.8     64.3   92.7
       160          74.3   94.3     52.9   99.4     58.6   96.3
       170          61.4   97.8     47.1   99.6     51.4   98.6
       180          52.9   99.0     40.0   99.8     45.7   99.2
       190          44.3   99.8     34.3  100       40.0   99.8
       200          40.0   99.8     28.6  100       35.7   99.8

Authors: Blume, Greevy Bios 311 Lecture Notes Page 24 of 26


Bayes’ Theorem

The corresponding ROC curves are shown below.

[Figure: ROC Curve for Blood Sugar Test -- sensitivity versus 1 - specificity, with one curve each for Somogyi-Nelson (SN), Folin-Wu (FW), and Anthrone (AN).]
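
To draw one of these curves yourself, the sensitivity/specificity pairs can be typed in and plotted. A minimal Stata sketch for the Somogyi-Nelson column (our code; the numbers are the table's, entered as percentages):

    clear
    input sens spec
    100   0.0
    100   1.6
    100   8.8
     98.6 21.4
     98.6 38.4
     97.1 55.9
     92.9 70.2
     85.7 81.4
     80.0 90.4
     74.3 94.3
     61.4 97.8
     52.9 99.0
     44.3 99.8
     40.0 99.8
    end
    generate tpr = sens/100          // sensitivity as a proportion
    generate fpr = (100 - spec)/100  // 1 - specificity as a proportion
    sort fpr
    twoway connected tpr fpr, ytitle("Sensitivity") xtitle("1 - Specificity")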

• Which test looks most promising, and why?

• (teaser) How shall we compare these curves?

1) Best operational point?
2) Smoothest curve?
3) Area under the curve?


Aside: Independent Trials, Independent Events, and Independent Random Variables

If we repeat the trial or experiment, and the outcome of each trial is not influenced by the outcomes of any of the others, then they are independent trials.

If the trials are independent, and if A1 is an event that depends only on the result of one trial, and A2 is an event that depends only on the result of another, then A1 and A2 are independent events.

If X1, X2, ... represent the results of the different trials, and the trials are independent, then X1, X2, ... are independent random variables.

Independent random variables can arise in other ways as well, but the prime example of independent random variables is this one: variables that represent the results of independent trials.
