Probability
STATISTICS
Dr Rajeswari
Probabilities are numbers that reflect the likelihood that a
particular event will occur.
                      Age (years)
         5     6     7     8     9     10    Total
Boys     432   379   501   410   420   418   2,560
Girls    408   513   412   436   461   500   2,730
Totals   840   892   913   846   881   918   5,290
Unconditional Probability
If we select a child at random (by simple random sampling), then each child has
the same probability (equal chance) of being selected, and the probability is 1/N,
where N=the population size.
Thus, the probability that any child is selected is 1/5,290 = 0.0002.
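For example, what is the probability of selecting a boy? Or of selecting a child 7 years of age?
The following formula can be used to compute probabilities of selecting
individuals with specific attributes or characteristics:
P(characteristic) = # persons with characteristic / N
Thus, P(boy) = 2,560/5,290 = 0.484 and P(7 years of age) = 913/5,290 = 0.173.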
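The unconditional probabilities for the age-by-sex table can be checked with a short Python sketch. The counts are copied from the table; the variable names and the helper `probability` are illustrative assumptions, not part of the lecture.

```python
# Counts by age (5 to 10 years), copied from the age-by-sex table.
ages = [5, 6, 7, 8, 9, 10]
boys = [432, 379, 501, 410, 420, 418]
girls = [408, 513, 412, 436, 461, 500]

n_total = sum(boys) + sum(girls)  # N = 5,290


def probability(count, n):
    """P(characteristic) = number of persons with the characteristic / N."""
    return count / n


p_any_child = probability(1, n_total)          # one specific child
p_boy = probability(sum(boys), n_total)        # any boy
i7 = ages.index(7)
p_age_7 = probability(boys[i7] + girls[i7], n_total)  # any 7 year old

print(f"P(one specific child) = {p_any_child:.4f}")  # 0.0002
print(f"P(boy)                = {p_boy:.3f}")        # 0.484
print(f"P(7 years of age)     = {p_age_7:.3f}")      # 0.173
```

In every case the denominator is the whole population, which is what makes these probabilities unconditional.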
Conditional Probability
Sometimes we want to focus on a particular subset of the population.
For example, suppose we are interested only in the girls and ask:
what is the probability of selecting a 9 year old from the
sub-population of girls?
There is a total of N_G = 2,730 girls (here N_G refers to the population of
girls), and the probability of selecting a 9 year old from the
sub-population of girls is written as follows:
P(9 year old | girls) = # girls who are 9 years old / N_G
The concept of probability can be illustrated in the context of a study of obesity in children
5-10 years of age who are seeking medical care at a particular pediatric practice. The population
(sampling frame) includes all children who were seen in the practice in the past 12 months and
is summarized below.
                      Age (years)
         5     6     7     8     9     10    Total
Boys     432   379   501   410   420   418   2,560
Girls    408   513   412   436   461   500   2,730
Totals   840   892   913   846   881   918   5,290
In the expression P(9 year old | girls), the notation | girls
indicates that we are conditioning the question on a specific
subgroup, i.e., the subgroup specified to the right of the
vertical line.
The conditional probability is computed using the same
approach we used to compute unconditional probabilities.
In this case:
P(9 year old | girls) = 461/2,730 = 0.169.
This also means that 16.9% of the girls are 9 years of age.
Note that this is not the same as the probability of selecting
a 9-year old girl from the overall population, which is P(girl
who is 9 years of age) = 461/5,290 = 0.087.
Similarly, we can condition on age: P(boy | 6 years of age) =
379/892 = 0.425. Thus 42.5% of the 6 year olds are boys
(and 57.5% of the 6 year olds are girls).
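The difference between conditioning on a subgroup and using the whole population can be made concrete with a small Python sketch. The function names below are hypothetical helpers chosen for illustration; only the counts come from the table.

```python
# Counts by age (5 to 10 years), copied from the age-by-sex table.
ages = [5, 6, 7, 8, 9, 10]
counts = {
    "boys": [432, 379, 501, 410, 420, 418],
    "girls": [408, 513, 412, 436, 461, 500],
}


def p_age_given_sex(age, sex):
    """P(age | sex): the denominator is the size of that sex subgroup."""
    row = counts[sex]
    return row[ages.index(age)] / sum(row)


def p_sex_given_age(sex, age):
    """P(sex | age): the denominator is the number of children of that age."""
    i = ages.index(age)
    return counts[sex][i] / sum(row[i] for row in counts.values())


def p_sex_and_age(sex, age):
    """P(sex and age): the denominator is the whole population."""
    n = sum(sum(row) for row in counts.values())
    return counts[sex][ages.index(age)] / n


print(f"P(9 year old | girls)    = {p_age_given_sex(9, 'girls'):.3f}")  # 0.169
print(f"P(girl who is 9)         = {p_sex_and_age('girls', 9):.3f}")    # 0.087
print(f"P(boy | 6 years of age)  = {p_sex_given_age('boys', 6):.3f}")   # 0.425
```

The numerator (461) is the same in the first two lines of output; only the denominator changes, from the 2,730 girls to all 5,290 children.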
Independence
Example
A sample of 120 men underwent a new (hypothetical) prostate test and also had a biopsy.
The test results and the biopsy results are summarized below.
Prostate Test Risk   Prostate Cancer   No Prostate Cancer   Total
Low                  10                50                   60
Moderate             6                 30                   36
High                 4                 20                   24
Total                20                100                  120
• The probability that a man has prostate cancer given he has a low risk is:
P(Prostate Cancer | Low Risk) = 10/60 = 0.167.
• The probability that a man has prostate cancer given he has a moderate risk is:
P(Prostate Cancer | Moderate Risk) = 6/36 = 0.167.
• The probability that a man has prostate cancer given he has a high risk is:
P(Prostate Cancer | High Risk) = 4/24 = 0.167.
Note that regardless of whether the hypothetical Prostate Test was low, moderate, or high, the
probability that a subject had cancer was 0.167. In other words, knowing a man's prostate test
result does not affect the likelihood that he has prostate cancer in this example.
In this case, the probability that a man has prostate cancer is independent of his prostate test
result.
Demonstrating Independence
Consider two events, call them A and B (e.g., A might be a low risk based on the "prostate test",
and B is a diagnosis of prostate cancer). These two events are independent if P(A | B) = P(A) or if
P(B | A) = P(B).
To check independence, we compare a conditional and an unconditional probability: P(A | B) =
P(Low Risk | Prostate Cancer) = 10/20 = 0.50 and P(A) = P(Low Risk) = 60/120 = 0.50. The equality
of the conditional and unconditional probabilities indicates independence.
Independence can also be checked in the other direction by comparing P(B | A) = P(Prostate Cancer |
Low Risk) = 10/60 = 0.167 with P(B) = P(Prostate Cancer) = 20/120 = 0.167. In other words, the
probability of the patient having a diagnosis of prostate cancer given a low risk "prostate test"
(the conditional probability) is the same as the overall probability of having a diagnosis of
prostate cancer (the unconditional probability).
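Both directions of the independence check can be reproduced in a few lines of Python. This is a minimal sketch: the dictionary layout and variable names are assumptions made for illustration, and exact fractions are used so that the equality tests are not affected by floating-point rounding.

```python
from fractions import Fraction

# (prostate cancer, no prostate cancer) counts for each test result,
# copied from the prostate test table.
table = {"Low": (10, 50), "Moderate": (6, 30), "High": (4, 20)}

n = sum(cancer + no_cancer for cancer, no_cancer in table.values())  # 120
n_cancer = sum(cancer for cancer, _ in table.values())               # 20

# A = low risk, B = prostate cancer
p_a = Fraction(sum(table["Low"]), n)                      # P(A)     = 60/120
p_b = Fraction(n_cancer, n)                               # P(B)     = 20/120
p_a_given_b = Fraction(table["Low"][0], n_cancer)         # P(A | B) = 10/20
p_b_given_a = Fraction(table["Low"][0], sum(table["Low"]))  # P(B | A) = 10/60

print("P(A | B) == P(A):", p_a_given_b == p_a)  # True
print("P(B | A) == P(B):", p_b_given_a == p_b)  # True

# The same comparison for every risk category.
for risk, (cancer, no_cancer) in table.items():
    p_cancer_given_risk = Fraction(cancer, cancer + no_cancer)
    print(f"{risk}: {float(p_cancer_given_risk):.3f}",
          p_cancer_given_risk == p_b)
```

Every row reduces to the same fraction, 1/6, which is why the test result carries no information about cancer status in this example.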
Example:
The following table contains information on a population of N=6,732 individuals who are
classified as having or not having prevalent cardiovascular disease (CVD). Each individual is
also classified in terms of having a family history of cardiovascular disease. In this analysis,
family history is defined as a first degree relative (parent or sibling) with diagnosed
cardiovascular disease before age 60.
                           Prevalent CVD   Free of CVD   Total
Family History of CVD      491             368           859
No Family History of CVD   152             5,721         5,873
Total                      643             6,089         6,732
Are family history and prevalent CVD independent? Is there a relationship between family history and
prevalent CVD? This is a question of independence of events.
Let A = Prevalent CVD and B = Family History of CVD. (Note that it does not matter how we define A and B;
for example, we could have defined A = No Family History and B = Free of CVD, and the conclusion would be identical.)
We now must check whether P(A | B) = P(A) or if P(B | A) = P(B). Again, it makes no difference which
definition is used; the conclusion will be identical. We will compare the conditional probability to the
unconditional probability as follows:
Conditional Probability
P(A | B) = P(Prevalent CVD | Family History of CVD) = 491/859 = 0.572
The probability of prevalent CVD given a family history is 57.2% (as compared to 2.6% among
patients with no family history).
Unconditional Probability
P(A) = P(Prevalent CVD) = 643/6,732 = 0.096
In the overall population, the probability of prevalent CVD is 9.6% (or 9.6% of the population
has prevalent CVD).
Since these probabilities are not equal, family history and prevalent
CVD are not independent. Individuals with a family history of CVD
are much more likely to have prevalent CVD.
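The comparison for the CVD table can be written as a small reusable check for any 2x2 table. The function name `outcome_probabilities` and the argument order are assumptions made for this sketch.

```python
import math


def outcome_probabilities(a, b, c, d):
    """For a 2x2 table with rows = factor present / absent and
    columns = outcome present / absent:

        a  b
        c  d

    return (P(outcome | factor), P(outcome | no factor), P(outcome)).
    """
    n = a + b + c + d
    return a / (a + b), c / (c + d), (a + c) / n


# Family history (rows) by prevalent CVD (columns), copied from the table.
p_cvd_given_fh, p_cvd_given_no_fh, p_cvd = outcome_probabilities(491, 368, 152, 5721)

print(f"P(CVD | family history)    = {p_cvd_given_fh:.3f}")     # 0.572
print(f"P(CVD | no family history) = {p_cvd_given_no_fh:.3f}")  # 0.026
print(f"P(CVD)                     = {p_cvd:.3f}")              # 0.096

# Independence would require the conditional and unconditional values to agree.
print("independent:", math.isclose(p_cvd_given_fh, p_cvd))      # False
```

This check describes a fully enumerated population. With sample data, a difference between the conditional and unconditional proportions would also have to be weighed against sampling variability.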
Bayes's Theorem
"A patient goes to see a doctor. The doctor performs a test with 99 percent
reliability--that is, 99 percent of people who are sick test positive and 99
percent of the healthy people test negative. The doctor knows that only 1
percent of the people in the country are sick. Now the question is: if the patient
tests positive, what are the chances the patient is sick?"
The solution to this question can be calculated using Bayes's theorem.
Bayes, a reverend who lived from 1702 to 1761, stated that the
probability that you test positive AND are sick is the product of the likelihood
that you test positive GIVEN that you are sick and the "prior" probability that
you are sick (the prevalence in the population).
Bayes's theorem allows one to compute a conditional probability based on the
available information.
          Diseased   Not Diseased   Total
Test +    99         99             198
Test -    1          9,801          9,802
Total     100        9,900          10,000
Let A = the patient is sick (diseased) and B = the patient tests positive. Bayes's
theorem then gives P(A | B) = P(B | A) x P(A) / P(B).
The prevalence is 1%, i.e., P(A) = 0.01. Therefore, in a population of 10,000 there
will be 100 diseased people and 9,900 non-diseased people.
We also know the sensitivity of the test is 99%, i.e., P(B | A) = 0.99; therefore,
among the 100 diseased people, 99 will test positive. The specificity is also 99%,
i.e., there is a 1% error rate in non-diseased people.
Therefore, among the 9,900 non-diseased people, 99 will have a positive test.
From these numbers, it follows that the unconditional probability of a
positive test is 198/10,000 = 0.0198; this is P(B).
Thus, P(A | B) = (0.99 x 0.01) / 0.0198 = 0.50 = 50%.
From the table above, we can also see that given a positive test (subjects in the
Test + row), the probability of disease is 99/198 = 0.50 = 50%.
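The same calculation can be sketched in Python, once from the three rates and once from the counts in a population of 10,000. The function name `posterior_probability` and its parameter names are assumptions for illustration; here A is "the patient is sick" and B is "the patient tests positive".

```python
def posterior_probability(prevalence, sensitivity, specificity):
    """P(A | B) = P(B | A) * P(A) / P(B), where
    P(B) = P(B | A) * P(A) + P(B | not A) * P(not A)."""
    p_positive = (sensitivity * prevalence
                  + (1 - specificity) * (1 - prevalence))
    return sensitivity * prevalence / p_positive


# Rates from the doctor's question: 1% prevalence, 99% sensitivity and specificity.
p_sick_given_positive = posterior_probability(0.01, 0.99, 0.99)
print(f"P(sick | positive test) = {p_sick_given_positive:.2f}")  # 0.50

# The same answer from counts in a population of 10,000.
population = 10_000
diseased = round(population * 0.01)             # 100
healthy = population - diseased                 # 9,900
true_positives = round(diseased * 0.99)         # 99
false_positives = round(healthy * (1 - 0.99))   # 99
print(true_positives / (true_positives + false_positives))       # 0.5
```

Even with a 99% reliable test, half of the positive results come from healthy people, because healthy people outnumber sick people 99 to 1.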