L2 Biostatistics Probability
Probability &
Bayes’ Theorem
2
What is Probability?
◦Uncertainty comes from an underlying random
process
◦Random processes can be quantified by probability
◦So statistics is using probability to understand the
uncertainty from randomness
◦Each has its own mathematical definition coming
next.
3
Random Processes
◦ A random process is a situation in which we know what
outcomes could happen, but we don't know which particular
outcome will happen.
◦ Examples: coin tosses, die rolls, iTunes shuffle, whether the stock
market goes up or down tomorrow, etc.
◦ Probability is a numerical representation of the chance of each
outcome of a random process.
4
Probability
There are several possible interpretations of probability
but they (almost) completely agree on the mathematical
rules probability must follow.
P(A) = Probability of event A
0 ≤ P(A) ≤ 1
5
Probability
Frequentist interpretation:
The probability of an outcome is the proportion of times the outcome would
occur if we observed the random process an infinite number of times.
6
Numerical Probability
Probability can often be defined logically by recognition of symmetry
Answer: for a fair die, no side has a higher or lower chance of landing face up, and since there
are 6 sides in total, the probability of seeing the side “2” is 1/6
7
Law of Large Numbers
The law of large numbers states that as more observations are collected, the
proportion of occurrences with a particular outcome, p̂_n, converges to the
probability of that outcome, p.
The proportion tends to get closer to the probability 1/6 = 0.167 as the
number of rolls increases.
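A minimal simulation sketch of this convergence, assuming a fair six-sided die simulated with Python's standard `random` module. The function name `running_proportion`, the fixed seed, and the chosen roll counts are illustrative assumptions, not part of the lecture.

```python
import random

def running_proportion(n_rolls, target=2, seed=0):
    """Proportion of the first n_rolls of a simulated fair die that equal `target`."""
    rng = random.Random(seed)  # fixed seed so every call replays the same roll sequence
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == target)
    return hits / n_rolls

# The proportion should drift toward 1/6 (about 0.167) as the number of rolls grows.
for n in (10, 100, 10_000, 100_000):
    print(n, round(running_proportion(n), 4))
```

With few rolls the proportion can be far from 1/6; the law of large numbers only describes what happens as the number of rolls grows.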
8
Law of Large Numbers
9
Law of Large Numbers
When tossing a fair coin, if heads comes up
on each of the first 10 tosses, what do you
think the chance is that a head will come
up on the next toss? 0.5, less than 0.5, or
more than 0.5?
The probability is still 0.5, or there is still a 50% chance that a head
will come up on the next toss.
11
Basic Notations in Math/Statistics
◦ Sample Space is a set of all possible outcomes, commonly denoted by capital
Greek letters, such as Ω
Die example: Ω = {1,2,3,4,5,6}
◦ An Event is a subset of the Sample Space commonly denoted by a capital letter
“event of an even side”: A={2,4,6}
◦ For convenience, ∅ is used to represent the null (or empty) event of “nothing
happens”.
12
Operators on Events
The sample space and events are defined as sets whose elements are outcomes of an
experiment, so it is important to understand some basic operators on sets, much like the
basic operations on numbers. Most set operations are best illustrated graphically with a
Venn diagram.
13
Union
A∪B is the combined area of A and B
A = {1, 2, 3, 4, 5} and B = {1, 3, 9, 12}.
A∪B = {1, 2, 3, 4, 5, 9, 12}
Intersection
A∩B is the common (overlap) area of A and B
A = {1, 2, 3, 4, 5} and B = {1, 3, 9, 12}.
A∩B = {1, 3}
15
Intersection and Union
If A∩B = ∅, then the area of A∪B is the sum of the area of A and the area of B
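The union and intersection examples can be checked with Python's built-in `set` type. This is only a sketch: the sets A and B are the ones from the slides, while the set C is a hypothetical disjoint set added for illustration.

```python
A = {1, 2, 3, 4, 5}
B = {1, 3, 9, 12}

union = A | B          # elements in A, in B, or in both
intersection = A & B   # elements common to A and B

print(sorted(union))         # [1, 2, 3, 4, 5, 9, 12]
print(sorted(intersection))  # [1, 3]

# The overlap is counted twice in len(A) + len(B), so subtract it once.
assert len(union) == len(A) + len(B) - len(intersection)

# Hypothetical set C, disjoint from A: the sizes simply add.
C = {7, 8}
assert A & C == set()
assert len(A | C) == len(A) + len(C)
```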
16
Intersection and Union…
A St. Croix Story
Union – U faces towards you – your heart is
taken; The union of two sets contains all the
elements contained in either set (or both
sets)
17
Relationship
Between Events
- Recap
Intersection
◦ Denoted ∩
◦ Both A and B occur
Union
◦ Denoted ∪
◦ Either A or B (or both) occurs
Complement
◦ Denoted by a superscript C (e.g. A^C)
◦ A does not occur
◦ We will talk about this more
soon… get excited!
18
19
Some Probability Rules
The probability of a union of mutually exclusive events is the sum of their probabilities
◦ P(A ∪ B) = P(A) + P(B), if A and B are mutually exclusive
◦ Called the addition rule, allows probabilities to be summed
◦ P(A1 ∪ ⋯ ∪ Ak) = P(A1) + ⋯ + P(Ak) if A1, …, Ak are pairwise disjoint (mutually exclusive) events
In general, the probability of the union of A and B is equal to the sum of the probability
of A and the probability of B, minus the probability of the intersection of A and B
◦ P(A∪B) = P(A) + P(B) – P(A∩B)
20
Union and Intersection –
Can happen for more than 2 events!
Intersection and union can be defined for more than two events:
◦ A∩B∩C∩D
◦ A∪B∪C∪D
In general intersection “distributes”
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Union:
A∪B∪C is the event which is true if any of
A, B or C is true. In a Venn diagram, it is
the combined area of the three events.
A∪B ∪C={a,b,c,d,e,f,g,h}
21
Union and Intersection –
Can happen for more than 2 events!
Intersection:
A∩B∩C is the event that is common to A, B and C, that is, it is true only when A, B and C
are all true. In Venn diagram, it is the common area belonging to A, B and C.
A∩B∩C = {f}
The complement of A
will be all the gray!
If A∩B = ∅, events A and B are disjoint, or mutually exclusive
24
Union of Non-Disjoint Events
What is the probability of drawing a jack or a red card from a
well shuffled full deck?
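One way to work this question is to enumerate a standard 52-card deck and apply the general addition rule. The sketch below assumes every card is equally likely to be drawn; the helper name `prob` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

jack = {card for card in deck if card[0] == "J"}
red = {card for card in deck if card[1] in ("hearts", "diamonds")}

def prob(event):
    """Probability of an event when every card is equally likely."""
    return Fraction(len(event), len(deck))

# General addition rule: the two red jacks are in both events, so subtract them once.
p_jack_or_red = prob(jack) + prob(red) - prob(jack & red)
assert p_jack_or_red == prob(jack | red)
print(p_jack_or_red)  # 7/13
```

Here 4/52 + 26/52 - 2/52 = 28/52 = 7/13.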
27
The Addition Rule
P(A ∪ B) = P(A) + P(B), if A and B are mutually exclusive
Using a die as an example, the probability of rolling each outcome is 1/6.
◦ P(1) = 1/6
◦ P(2) = 1/6
◦ P(1 or 2) = P(1) + P(2) = 1/6 + 1/6 = 1/3
28
Recap
General addition rule
P(A or B) = P(A) + P(B) - P(A and B)
Note: For disjoint events P(A and B) = 0, so the above formula simplifies to P(A or B) =
P(A) + P(B)
29
Probabilities
31
Probability Distribution
A probability distribution lists all possible events and the probabilities with which they
occur.
The probability distribution for the sex of one kitten:
32
Sample Space and Complements
Sample space is the collection of all possible outcomes of a
trial.
◦ A cat has one kitten, what is the sample space for the sex of
the kitten? S = {M, F}
◦ A cat has two kittens, what is the sample space for the sex of
these kittens? S = {MM, FF, FM, MF}
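The two-kitten sample space can be generated mechanically, and a complement is just a set difference from the sample space. In this sketch the event names `all_male` and `at_least_one_female` are illustrative assumptions.

```python
from itertools import product

# Every ordered pair of sexes for two kittens.
sample_space = {"".join(pair) for pair in product("MF", repeat=2)}
print(sorted(sample_space))  # ['FF', 'FM', 'MF', 'MM']

# The complement of an event is everything in the sample space not in the event.
all_male = {"MM"}
at_least_one_female = sample_space - all_male
print(sorted(at_least_one_female))  # ['FF', 'FM', 'MF']
```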
34
Independence
Two processes are independent if knowing the
outcome of one provides no useful information about
the outcome of the other.
◦ Knowing that the coin landed on a head on the first toss does not provide any useful
information for determining what the coin will land on in the second toss.
>> Outcomes of two tosses of a coin are independent.
◦ Knowing that the first card drawn from a deck is an ace does provide useful information for
determining the probability of drawing an ace in the second draw.
>> Outcomes of two draws from a deck of cards (without replacement) are dependent.
35
Independence
How can we define independence?
Events A and B are independent if P(A∩B) = P(A)P(B)
Generalized: P(A1∩⋯∩An) = P(A1)⋯P(An)
Also called the multiplication rule
Equivalently, when P(B) > 0: P(A|B) = P(A)
Example of independence: using two dice
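A sketch of the two-dice example, assuming two fair dice so that all 36 ordered pairs are equally likely. The specific events chosen here (first die even, second die six, sum equal to twelve) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely (first, second) pairs

def prob(event):
    """Probability of an event given as a true/false function of an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_is_even(o):
    return o[0] % 2 == 0

def second_is_six(o):
    return o[1] == 6

def sum_is_twelve(o):
    return o[0] + o[1] == 12

# Independent: the first die tells us nothing about the second.
p_both = prob(lambda o: first_is_even(o) and second_is_six(o))
assert p_both == prob(first_is_even) * prob(second_is_six)  # 1/12 = 1/2 * 1/6

# Dependent, for contrast: a sum of twelve forces the second die to be a six.
p_dep = prob(lambda o: second_is_six(o) and sum_is_twelve(o))
assert p_dep != prob(second_is_six) * prob(sum_is_twelve)   # 1/36 vs 1/216
```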
36
Marginal, Joint, and
Conditional Probabilities
37
Marginal, Joint, and
Conditional Probabilities
Marginal probability: the probability of an event occurring, p(A). It may be thought of as an
unconditional probability: it is not conditioned on another event. Example: the probability that a
card drawn is red (p(red) = 0.5). Another example: the probability that a card drawn is a 4
(p(four) = 1/13).
Joint probability: p(A and B). The probability of event A and event B occurring. It is the probability
of the intersection of two or more events. The probability of the intersection of A and B may be
written p(A ∩ B). Example: the probability that a card is a four and red =p(four and red) = 2/52=1/26.
(There are two red fours in a deck of 52, the 4 of hearts and the 4 of diamonds).
Conditional probability: p(A|B) is the probability of event A occurring, given that event B occurs.
Example: given that you drew a red card, what’s the probability that it’s a four? Out of the
26 red cards, two are fours, so p(four|red) = 2/26 = 1/13.
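The three card examples can be reproduced by counting in an enumerated deck. This is a sketch assuming a standard 52-card deck with each card equally likely; the variable names are illustrative.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

red = [card for card in deck if card[1] in ("hearts", "diamonds")]
four = [card for card in deck if card[0] == "4"]
four_and_red = [card for card in red if card[0] == "4"]

p_red = Fraction(len(red), len(deck))                    # marginal: 1/2
p_four = Fraction(len(four), len(deck))                  # marginal: 1/13
p_four_and_red = Fraction(len(four_and_red), len(deck))  # joint: 1/26

# Conditional = joint / marginal of the conditioning event.
p_four_given_red = p_four_and_red / p_red                # 1/13
print(p_red, p_four, p_four_and_red, p_four_given_red)
```

Since p(four|red) equals p(four), being red carries no information about being a four.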
P(relapsed) = 48 / 72 ~ 0.67
51
Let’s Practice! - RECALL
Example: Relapse – Conditional Probability
53
Independence and Conditional
Probabilities
Consider the following (hypothetical) distribution of gender
and major of students in an introductory statistics class:
The probability that a randomly selected student is a social science major is?
60/100 = 0.6
The probability that a randomly selected student is a social science major
given that they are female is 30/50 = 0.6, or 60%
Since P(SS | M) also equals 0.6, a student's major in this class does not
depend on their gender: P(SS | F) = P(SS | M) = P(SS).
56
Independence and Conditional
Probabilities
Generically, if P(A | B) = P(A), then the events A and B are said to be independent.
• Conceptually: knowing that B occurred doesn’t tell us anything about A.
• Formally: P(A | B) = P(A), or equivalently P(A ∩ B) = P(A)P(B)
57
Inverting Probabilities
And Bayes’ Theorem
58
Inverting Probabilities
When a patient goes through breast cancer screening there are
two competing claims: the patient has cancer and the patient doesn't
have cancer. If a mammogram yields a positive result, what is
the probability that the patient actually has cancer?
◦ American Cancer Society estimates that about 1.7% of women have breast cancer.
http://www.cancer.org/cancer/cancerbasics/cancer-prevalence
◦ Susan G. Komen For The Cure Foundation states that mammography correctly
identifies about 78% of women who truly have breast cancer.
http://ww5.komen.org/BreastCancer/AccuracyofMammograms.html
◦ An article published in 2003 suggests that up to 10% of all mammograms result in
false positives for patients who do not have cancer.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1360940
Note: Tree diagrams are useful for inverting probabilities: we are given P(+|C) and asked for P(C|+).
62
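The tree-diagram calculation can be sketched in a few lines using the three figures quoted for the mammogram example (1.7% prevalence, 78% correct detection, 10% false positives). The function name `posterior` and its parameter names are illustrative assumptions.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test), following the two branches of a tree diagram."""
    true_positive = prior * sensitivity                 # has cancer AND tests positive
    false_positive = (1 - prior) * false_positive_rate  # no cancer AND tests positive
    return true_positive / (true_positive + false_positive)

p_cancer_given_positive = posterior(prior=0.017, sensitivity=0.78, false_positive_rate=0.10)
print(round(p_cancer_given_positive, 3))  # 0.119
```

Under these figures, only about 12% of positive mammograms correspond to actual cancer, because the disease is rare relative to the false positive rate.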
Inverting Probability and Bayes’ Rule
There is a formal rule for inverting probabilities: it is called Bayes’ Theorem.
63
Bayes’ Theorem
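Bayes’ theorem is a means to “invert” a conditional probability
Bayes’ theorem applies to events with positive probability (P(A) > 0 and P(B) > 0)
P(A|B) = P(B|A)*P(A)/P(B)
Law of total probability
◦ If A1, …, Ak are mutually exclusive and together cover the sample space, then P(B) = P(B|A1)*P(A1) + ⋯ + P(B|Ak)*P(Ak)
◦ This gives the generalization of Bayes’ theorem: P(Ai|B) = P(B|Ai)*P(Ai)/P(B), with P(B) expanded as above
◦ Example of application = diagnosis of patients given a set of symptoms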
64
Application: Inverting Probabilities
A common epidemiological model for the spread of diseases is the SIR model,
where the population is partitioned into three groups: Susceptible, Infected, and
Recovered. This assumes infection occurs only once (e.g. chicken pox).
Imagine a population in the midst of an epidemic where 60% of the population is
susceptible, 10% is infected, and 30% is recovered. The only test for the disease
is accurate 95% of the time for susceptible individuals, 99% for infected
individuals, but 65% for recovered individuals. (Accurate means a negative result
for susceptible and recovered individuals and a positive result for infected
individuals).
-probability tree?
-If test is positive, what is the probability of having disease?
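A possible way to work the two questions above is to treat the three groups as a partition, get P(+) from the law of total probability, and then invert with Bayes’ theorem. The dictionary names below are illustrative assumptions; the percentages are the ones stated in the problem.

```python
# Population shares for Susceptible, Infected, Recovered.
prior = {"S": 0.60, "I": 0.10, "R": 0.30}

# P(positive | group): an "accurate" result is negative for S and R, positive for I.
p_positive_given = {"S": 1 - 0.95, "I": 0.99, "R": 1 - 0.65}

# Law of total probability: sum over the branches of the probability tree.
p_positive = sum(prior[g] * p_positive_given[g] for g in prior)

# Bayes' theorem for each group.
posterior = {g: prior[g] * p_positive_given[g] / p_positive for g in prior}

print(round(p_positive, 3))      # 0.234
print(round(posterior["I"], 3))  # 0.423
```

So a positive test corresponds to a currently infected individual only about 42% of the time, largely because of the low accuracy among recovered individuals.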
65
Application: Inverting Probabilities
66
Screening Tests
Generally, less invasive procedures are used to screen (or predict) the presence of a
disease or condition
◦ PSA test (prostate cancer)
◦ Blood pressure (hypertension)
◦ Mammography (breast cancer)
67
Test Validity
• Validity of a test is defined as its ability to distinguish
between who has a disease and who does not. Simply…
– ability of a measuring instrument to give a true measure
68
Sensitivity and Specificity
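Validity has two components: Sensitivity & Specificity
• Sensitivity = the ability of the test to identify correctly those who have the disease
– percentage of people with the disease who were identified as having the disease by
the test
• Specificity = the ability of the test to identify correctly those who do not have the disease
– percentage of people without the disease who were identified as having no disease by
the test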
69
Sample 2x2 Table for Assessing the Validity of the Test with
Dichotomous Results
                                Gold Standard
Test Result     Disease (+)                 No Disease (-)               Total
Positive (+)    a (True Positives, TP)      b (False Positives, FP)      a + b
Negative (-)    c (False Negatives, FN)     d (True Negatives, TN)       c + d
Total           a + c                       b + d                        a + b + c + d
71
PV+ and Sensitivity
PV+ (Predictive Value Positive)
◦ The probability of disease given a positive screening test
P(B|A) or P(D|T+)
Sensitivity (Se)
◦ The probability of a positive screening test given disease
P(A|B) or P(T+|D)
72
PV- and Specificity
(Predictive Value Negative) PV-
◦ The probability of NO disease given a negative screening test
P(B̄|Ā) or P(ND|T-)
Specificity (Sp)
◦ The probability of a negative screening test, given NO disease
P(Ā|B̄) or P(T-|ND)
73
PV+, PV-, Specificity, and Sensitivity
When a person has a disease, the screening test should have
◦ High PV+
◦ High Se
Notice that the sign in the notation (+ or -) is the result of the test; “true” or “false”
indicates whether the result of the test matches the real disease status
False positive
◦ Positive test, but person is free of disease
◦ P(False+) = 1 – Specificity
◦ P(F+) = P(T+|ND) = 1 − P(T-|ND)
77
Let’s Try It!
PSA and Prostate Cancer Association.
◦ Calculate the: Sensitivity & Specificity
◦ Can you also find the probability of a false positive and of a false negative?
                    Prostate Cancer
PSA test result     D+      D-      Total
T+                  92      27      119
T-                  46      72      118
Total               138     99      237
78
Check Yourself!
Sensitivity = P(T+|D+)
= P(T+ and D+)/P(D+)
= 92/138 = 0.667
Specificity = P(T-|D-)
= P(T- and D-)/P(D-)
= 72/99 = 0.727
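The same numbers can be checked in code, and the false positive and false negative probabilities follow from the same counts. This sketch uses the counts shown in the answers above (92 of 138 with disease test positive, 72 of 99 without disease test negative); the variable names are illustrative assumptions.

```python
# Counts from the PSA example.
true_positives, diseased = 92, 138
true_negatives, not_diseased = 72, 99

# The remaining cells follow by subtraction from the column totals.
false_negatives = diseased - true_positives      # 46
false_positives = not_diseased - true_negatives  # 27

sensitivity = true_positives / diseased              # P(T+|D+)
specificity = true_negatives / not_diseased          # P(T-|D-)
false_positive_rate = false_positives / not_diseased # P(T+|D-) = 1 - specificity
false_negative_rate = false_negatives / diseased     # P(T-|D+) = 1 - sensitivity

print(round(sensitivity, 3), round(specificity, 3))                  # 0.667 0.727
print(round(false_positive_rate, 3), round(false_negative_rate, 3))  # 0.273 0.333
```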
79
Let’s Try Another One!
Example: Hypertension
Suppose that 84% of hypertensive people and 23% of non-hypertensive people are
classified as hypertensive by a given test
If the prevalence of hypertension in the population is 20%, what is the quality of the test?
Draw a Table!!!
80
Table!
Example: Hypertension
                 Individual has    Individual does not have
                 Hypertension      Hypertension
Positive Test    84%               23%
Negative Test    16%               77%
Total            100%              100%
81
Example: Hypertension Answers
Sensitivity = P(+|D) = 0.84
Specificity = P(−|D̄) = 0.77
P(D) = 0.20
82
Predictive Value Positive
PV+ = P(D|+) = P(+|D)P(D) / [P(+|D)P(D) + P(+|D̄)P(D̄)]
    = (0.84)(0.2) / [(0.84)(0.2) + (1 − 0.77)(0.8)] = 0.477
The positive test result is not very predictive since we are less than 50% sure that a person
has hypertension if he/she has a positive test based on this machine.
Hey! We are using Bayes’ Theorem!
83
Predictive Value Negative
PV− = P(D̄|−) = P(−|D̄)P(D̄) / [P(−|D̄)P(D̄) + P(−|D)P(D)]
    = (1 − 0.23)(0.8) / [(1 − 0.23)(0.8) + (1 − 0.84)(0.2)] = 0.95
The negative test result is very informative since we are 95% sure that a person with a negative
result is not hypertensive.
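Both predictive values for the hypertension example can be computed from sensitivity, specificity, and prevalence in one small function. This is a sketch of the Bayes’ theorem calculation; the function name `predictive_values` is an illustrative assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PV+, PV-) via Bayes' theorem."""
    true_pos = sensitivity * prevalence               # P(+|D) P(D)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(+|no D) P(no D)
    true_neg = specificity * (1 - prevalence)         # P(-|no D) P(no D)
    false_neg = (1 - sensitivity) * prevalence        # P(-|D) P(D)
    pv_pos = true_pos / (true_pos + false_pos)
    pv_neg = true_neg / (true_neg + false_neg)
    return pv_pos, pv_neg

# Hypertension example: 84% sensitivity, 77% specificity, 20% prevalence.
pv_pos, pv_neg = predictive_values(sensitivity=0.84, specificity=0.77, prevalence=0.20)
print(round(pv_pos, 3), round(pv_neg, 3))  # 0.477 0.951
```

Changing `prevalence` while holding the test fixed shows how strongly the predictive values depend on how common the condition is.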
84