Probability

Uploaded by

Renuka

PROBABILITY

STATISTICS

Dr Rajeswari
Probabilities are numbers that reflect the likelihood that a
particular event will occur.

We hear about probabilities in many everyday situations, ranging
from weather forecasts (probability of rain or snow) to the lottery
(probability of hitting the big jackpot).

In biostatistical applications, it is probability theory that
underlies statistical inference.
An experiment is the process of making an observation or
taking a measurement.
An event is an outcome, or a set of outcomes, of the experiment.
According to the classical approach, probability is defined as
the ratio of the number of favourable cases to the total number of
equally likely cases.
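The classical definition can be illustrated with a minimal sketch. The fair six-sided die and the name `classical_probability` are assumptions chosen for illustration; they do not come from the slides.

```python
from fractions import Fraction

# Hypothetical experiment (not from the slides): one roll of a fair die.
# Every face is assumed to be equally likely.
sample_space = {1, 2, 3, 4, 5, 6}

# Event of interest: the roll shows an even number.
event_even = {face for face in sample_space if face % 2 == 0}

def classical_probability(event, space):
    """Number of favourable cases divided by the number of equally likely cases."""
    return Fraction(len(event & space), len(space))

p_even = classical_probability(event_even, sample_space)
print(f"P(even) = {p_even} = {float(p_even):.2f}")
```

`Fraction` keeps the ratio exact, which makes the "favourable over total" reading explicit before it is converted to a decimal.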
A probability is a number that reflects the chance or likelihood that
a particular event will occur.

Probabilities can be expressed as proportions that range from 0 to
1, and they can also be expressed as percentages ranging from 0%
to 100%.

A probability of 0 indicates that there is no chance that a particular
event will occur, whereas a probability of 1 indicates that an event
is certain to occur.

A probability of 0.45 (45%) indicates that there are 45 chances out
of 100 of the event occurring.
The concept of probability can be illustrated in the context of a study of obesity in children
5-10 years of age who are seeking medical care at a particular pediatric practice. The population
(sampling frame) includes all children who were seen in the practice in the past 12 months and
is summarized below.

                        Age (years)
            5     6     7     8     9    10    Total
Boys      432   379   501   410   420   418    2,560
Girls     408   513   412   436   461   500    2,730
Totals    840   892   913   846   881   918    5,290
Unconditional Probability
If we select a child at random (by simple random sampling), then each child has
the same probability (equal chance) of being selected, and the probability is 1/N,
where N=the population size.
Thus, the probability that any child is selected is 1/5,290 = 0.0002.

In most sampling situations we are generally not concerned with sampling a
specific individual but instead we concern ourselves with the probability of
sampling certain types of individuals.

For example, what is the probability of selecting a boy or a child 7 years of age?
The following formula can be used to compute probabilities of selecting
individuals with specific attributes or characteristics.

P(characteristic) = # persons with characteristic / N


If we select a child at random, the probability that we
select a boy is computed as follows:
P(boy) = 2,560/5,290 = 0.484 or 48.4%.

The probability of selecting a child who is 7 years
of age is P(7 years of age) = 913/5,290 = 0.173.

P(boy who is 10 years of age) = 418/5,290 = 0.079.

P(at least 8 years of age) = (846 + 881 + 918)/5,290 = 2,645/5,290 = 0.500.
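The four unconditional probabilities above can be reproduced with a short sketch. The counts are copied from the age-by-sex table; the variable names and the helper `p` are illustrative choices, not part of the slides.

```python
# Counts copied from the age-by-sex table (children aged 5 to 10).
ages = [5, 6, 7, 8, 9, 10]
counts = {
    "boys":  [432, 379, 501, 410, 420, 418],
    "girls": [408, 513, 412, 436, 461, 500],
}

# Population size N is the sum of every cell in the table.
N = sum(sum(row) for row in counts.values())

# Column totals: number of children of each age, regardless of sex.
age_totals = {
    age: counts["boys"][i] + counts["girls"][i] for i, age in enumerate(ages)
}

def p(n_with_characteristic, total=N):
    """P(characteristic) = # persons with characteristic / N."""
    return n_with_characteristic / total

p_boy = p(sum(counts["boys"]))
p_age7 = p(age_totals[7])
p_boy_age10 = p(counts["boys"][ages.index(10)])
p_at_least_8 = p(sum(n for age, n in age_totals.items() if age >= 8))

print(f"N = {N}")
print(f"P(boy) = {p_boy:.3f}")
print(f"P(7 years of age) = {p_age7:.3f}")
print(f"P(boy who is 10 years of age) = {p_boy_age10:.3f}")
print(f"P(at least 8 years of age) = {p_at_least_8:.3f}")
```

Every probability uses the same denominator N, which is what makes them unconditional.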
Conditional Probability
Each of the probabilities computed in the previous section (e.g.,
P(boy), P(7 years of age)) is an unconditional probability, because
the denominator for each is the total population size (N=5,290)
reflecting the fact that everyone in the entire population is eligible to
be selected. However, sometimes it is of interest to focus on a
particular subset of the population (e.g., a sub-population).

For example, suppose we are interested just in the girls and ask the question:
what is the probability of selecting a 9 year old from the
sub-population of girls?
There is a total of N_G = 2,730 girls (here N_G refers to the population of
girls), and the probability of selecting a 9 year old from the
sub-population of girls is written as follows:
P(9 year old | girls) = # girls who are 9 years old / N_G
where | girls indicates that we are conditioning the question
to a specific subgroup, i.e., the subgroup specified to the
right of the vertical line.
The conditional probability is computed using the same
approach we used to compute unconditional probabilities.
In this case:
P(9 year old | girls) = 461/2,730 = 0.169.
This also means that 16.9% of the girls are 9 years of age.
Note that this is not the same as the probability of selecting
a 9-year old girl from the overall population, which is P(girl
who is 9 years of age) = 461/5,290 = 0.087.
P(boy | 6 years of age) = 379/892 = 0.425. Thus 42.5% of
the 6 year olds are boys (57.5% of the 6 year olds are
girls).
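A minimal sketch of the conditional calculations, using the counts from the age-by-sex table. The only change from the unconditional case is the denominator, which becomes the size of the subgroup being conditioned on. Variable names are illustrative.

```python
# Counts by age, copied from the age-by-sex table.
boys = {5: 432, 6: 379, 7: 501, 8: 410, 9: 420, 10: 418}
girls = {5: 408, 6: 513, 7: 412, 8: 436, 9: 461, 10: 500}

N_G = sum(girls.values())          # size of the sub-population of girls
N = N_G + sum(boys.values())       # size of the whole population

# Conditional: restrict the denominator to the girls.
p_9_given_girl = girls[9] / N_G

# Unconditional (joint): a 9 year old girl out of the whole population.
p_girl_and_9 = girls[9] / N

# Conditional on age: restrict the denominator to the 6 year olds.
p_boy_given_6 = boys[6] / (boys[6] + girls[6])

print(f"P(9 year old | girls) = {p_9_given_girl:.3f}")
print(f"P(girl who is 9 years of age) = {p_girl_and_9:.3f}")
print(f"P(boy | 6 years of age) = {p_boy_given_6:.3f}")
```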
Independence

In probability, two events are said to be independent if the probability of one is
not affected by the occurrence or non-occurrence of the other.

Example

A sample of 120 men underwent a hypothetical new prostate test and also had a biopsy. The data
from the biopsy results are summarized below.

Prostate Test Risk   Prostate Cancer   No Prostate Cancer   Total
Low                         10                  50             60
Moderate                     6                  30             36
High                         4                  20             24
Total                       20                 100            120

• The probability that a man has prostate cancer given he has a low risk is:
  P(Prostate Cancer | Low Risk) = 10/60 = 0.167.
• The probability that a man has prostate cancer given he has a moderate risk is:
  P(Prostate Cancer | Moderate Risk) = 6/36 = 0.167.
• The probability that a man has prostate cancer given he has a high risk is:
  P(Prostate Cancer | High Risk) = 4/24 = 0.167.

Note that regardless of whether the hypothetical Prostate Test was low, moderate, or high, the
probability that a subject had cancer was 0.167. In other words, knowing a man's prostate test
result does not affect the likelihood that he has prostate cancer in this example.
In this case, the probability that a man has prostate cancer is independent of his prostate test
result.
Demonstrating Independence

Consider two events, call them A and B (e.g., A might be a low risk based on the "prostate test",
and B is a diagnosis of prostate cancer). These two events are independent if P(A | B) = P(A) or if
P(B | A) = P(B).
To check independence, we compare a conditional and an unconditional probability: P(A | B) =
P(Low Risk | Prostate Cancer) = 10/20 = 0.50 and P(A) = P(Low Risk) = 60/120 = 0.50. The equality
of the conditional and unconditional probabilities indicates independence.

Independence can also be tested by examining whether P(B | A) = P(Prostate Cancer | Low Risk)
= 10/60 = 0.167 and P(B) = P(Prostate Cancer) = 20/120 = 0.167. In other words, the probability of
the patient having a diagnosis of prostate cancer given a low risk "prostate test" (the conditional
probability) is the same as the overall probability of having a diagnosis of prostate cancer (the
unconditional probability).
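The independence check can be sketched as a comparison of each conditional probability with the unconditional one. The counts come from the prostate test table; `Fraction` is used here so that equality is tested exactly rather than on rounded decimals, which is a choice made for this sketch.

```python
from fractions import Fraction

# (prostate cancer, no prostate cancer) counts per test risk level.
table = {
    "low":      (10, 50),
    "moderate": (6, 30),
    "high":     (4, 20),
}

total = sum(cancer + no_cancer for cancer, no_cancer in table.values())
total_cancer = sum(cancer for cancer, _ in table.values())

# Unconditional probabilities.
p_cancer = Fraction(total_cancer, total)          # P(B)
p_low = Fraction(sum(table["low"]), total)        # P(A)

# Conditional probabilities.
p_low_given_cancer = Fraction(table["low"][0], total_cancer)   # P(A | B)
p_cancer_given_risk = {
    risk: Fraction(cancer, cancer + no_cancer)                 # P(B | risk)
    for risk, (cancer, no_cancer) in table.items()
}

# Independent if conditioning on the test result never changes P(cancer).
independent = all(p == p_cancer for p in p_cancer_given_risk.values())

print(f"P(cancer) = {float(p_cancer):.3f}")
for risk, prob in p_cancer_given_risk.items():
    print(f"P(cancer | {risk} risk) = {float(prob):.3f}")
print(f"P(low risk | cancer) = {float(p_low_given_cancer):.2f}, "
      f"P(low risk) = {float(p_low):.2f}")
print("independent:", independent)
```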
Example:
The following table contains information on a population of N=6,732 individuals who are
classified as having or not having prevalent cardiovascular disease (CVD). Each individual is
also classified in terms of having a family history of cardiovascular disease. In this analysis,
family history is defined as a first degree relative (parent or sibling) with diagnosed
cardiovascular disease before age 60.
                            Prevalent CVD   Free of CVD   Total
Family History of CVD            491            368         859
No Family History of CVD         152          5,721       5,873
Total                            643          6,089       6,732

Are family history and prevalent CVD independent? Is there a relationship between family history and
prevalent CVD? This is a question of independence of events.

Let A = Prevalent CVD and B = Family History of CVD. (Note that it does not matter how we define A and B.
For example, we could have defined A = No Family History and B = Free of CVD; the conclusion would be identical.)

We now must check whether P(A | B) = P(A) or if P(B | A) = P(B). Again, it makes no difference which
definition is used; the conclusion will be identical. We will compare the conditional probability to the
unconditional probability as follows:
Conditional Probability
P(A | B) = P(Prevalent CVD | Family History of CVD) = 491/859 = 0.572
The probability of prevalent CVD given a family history is
57.2% (as compared to 2.6% among patients with no
family history).

Unconditional Probability
P(A) = P(Prevalent CVD) = 643/6,732 = 0.096
In the overall population, the probability of prevalent
CVD is 9.6% (or 9.6% of the population has prevalent
CVD).

Since these probabilities are not equal, family history and prevalent
CVD are not independent. Individuals with a family history of CVD
are much more likely to have prevalent CVD.
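The same comparison for the CVD example can be sketched as follows. The four cell counts are copied from the family history table; the variable names are illustrative.

```python
# Cell counts from the family history by CVD table.
cvd_fh, free_fh = 491, 368          # family history of CVD
cvd_no_fh, free_no_fh = 152, 5721   # no family history of CVD

N = cvd_fh + free_fh + cvd_no_fh + free_no_fh

# Unconditional probability of prevalent CVD, P(A).
p_cvd = (cvd_fh + cvd_no_fh) / N

# Conditional probabilities of prevalent CVD, with and without family history.
p_cvd_given_fh = cvd_fh / (cvd_fh + free_fh)              # P(A | B)
p_cvd_given_no_fh = cvd_no_fh / (cvd_no_fh + free_no_fh)

print(f"P(CVD) = {p_cvd:.3f}")
print(f"P(CVD | family history) = {p_cvd_given_fh:.3f}")
print(f"P(CVD | no family history) = {p_cvd_given_no_fh:.3f}")

# The events are independent only if P(A | B) equals P(A).
print("independent:", p_cvd_given_fh == p_cvd)
```

Here the comparison is between exact population proportions. With sample data, a formal test would be needed before concluding that a difference is more than sampling variation.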
Bayes's Theorem
"A patient goes to see a doctor. The doctor performs a test with 99 percent
reliability--that is, 99 percent of people who are sick test positive and 99
percent of the healthy people test negative. The doctor knows that only 1
percent of the people in the country are sick. Now the question is: if the patient
tests positive, what are the chances the patient is sick?

The intuitive answer is 99 percent, but the correct answer is 50 percent...."

The solution to this question can easily be calculated using Bayes's theorem.
Bayes, a reverend who lived from 1702 to 1761, stated that the
probability that you test positive AND are sick is the product of the likelihood that
you test positive GIVEN that you are sick and the "prior" probability that you
are sick (the prevalence in the population).
Bayes's theorem allows one to compute a conditional probability based on the
available information.
Bayes's Theorem

P(A | B) = P(B | A) x P(A) / P(B)

where
P(A) is the probability of event A,
P(B) is the probability of event B,
P(A | B) is the probability of observing event A if B is true, and
P(B | A) is the probability of observing event B if A is true.
Wiggins's explanation can be summarized with the help of the
following table which illustrates the scenario in a hypothetical
population of 10,000 people:

          Diseased   Not Diseased    Total

Test +        99           99          198
Test -         1        9,801        9,802
Total        100        9,900       10,000

In this scenario P(A) is the unconditional probability of disease; here it is 100/10,000 = 0.01.
P(B) is the unconditional probability of a positive test; here it is 198/10,000 = 0.0198.
What we want to know is P (A | B), i.e., the probability of disease (A), given that
the patient has a positive test (B).

We know that the prevalence of disease (the unconditional probability of disease)
is 1% or 0.01; this is represented by P(A).

Therefore, in a population of 10,000 there will be 100 diseased people and 9,900
non-diseased people.
We also know the sensitivity of the test is 99%, i.e., P(B | A) = 0.99; therefore,
among the 100 diseased people, 99 will test positive. The
specificity is also 99%, i.e., there is a 1% error rate in non-diseased people.
Therefore, among the 9,900 non-diseased people, 99 will have a positive test.
And from these numbers, it follows that the unconditional probability of a
positive test is 198/10,000 = 0.0198; this is P(B).
Thus, P(A | B) = (0.99 x 0.01) / 0.0198 = 0.50 = 50%.
From the table above, we can also see that given a positive test (subjects in the
Test + row), the probability of disease is 99/198 = 0.50 = 50%.
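The Bayes calculation can be sketched in a few lines. The function name `posterior_probability` and its parameter names are illustrative choices; P(B) is obtained by adding true positives and false positives, which is the same counting the table performs for 10,000 people.

```python
def posterior_probability(prevalence, sensitivity, specificity):
    """Return (P(B), P(A | B)) for A = diseased and B = positive test.

    prevalence  = P(A)
    sensitivity = P(B | A)
    specificity = P(test negative | not diseased)
    """
    true_positive = sensitivity * prevalence               # P(B and A)
    false_positive = (1 - specificity) * (1 - prevalence)  # P(B and not A)
    p_positive = true_positive + false_positive            # P(B)
    return p_positive, true_positive / p_positive          # Bayes's theorem

# Values from the example: 1% prevalence, 99% sensitivity, 99% specificity.
p_positive, p_disease_given_positive = posterior_probability(0.01, 0.99, 0.99)

print(f"P(B) = {p_positive:.4f}")
print(f"P(A | B) = {p_disease_given_positive:.2f}")

# Cross-check against the counts in the 10,000-person table.
p_from_table = 99 / (99 + 99)
print(f"From the table: {p_from_table:.2f}")
```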
