621labex2 2018 Answers


Biostatistics 140.621
First Term, 2018-2019
Laboratory Exercise 2
Probability Concepts and Binomial Distribution

Answer Key
1. Below find a table from a US National Medical Expenditure Survey (NMES) showing the
joint distribution of smoking and chronic obstructive pulmonary disease (COPD) status.

                 Smoking status
COPD      Never    Former   Current     Total
No        5,030     3,282     2,876    11,188
Yes          20        98        65       183
Total     5,050     3,380     2,941    11,371

a) If a person is chosen at random from the 11,371 individuals in the NMES sample,
what is the probability that he or she will be:

i) a current smoker

P(CS) = 2,941/11,371 = 0.26

ii) a current and former smoker

P(CS and FS) = 0

iii) have COPD

P(COPD) = 183/11,371 = 0.02

iv) have COPD and be a current smoker

P(COPD and CS) = 65/11,371 = 0.0057

v) have COPD given he/she is a current smoker

P(COPD|CS) = 65/2,941 = 0.022
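The arithmetic in 1.a) can be checked with a short script. This is only a sketch: the dictionary layout and variable names are illustrative choices, not part of the original exercise, and the counts are copied from the NMES table above.

```python
# Counts copied from the NMES table: counts[copd_status][smoking_status]
counts = {
    "No": {"Never": 5030, "Former": 3282, "Current": 2876},
    "Yes": {"Never": 20, "Former": 98, "Current": 65},
}

# Grand total of the table (should be 11,371)
total = sum(sum(row.values()) for row in counts.values())

# Marginal counts
n_current = counts["No"]["Current"] + counts["Yes"]["Current"]
n_copd = sum(counts["Yes"].values())

p_cs = n_current / total                               # i)   marginal
p_copd = n_copd / total                                # iii) marginal
p_copd_and_cs = counts["Yes"]["Current"] / total       # iv)  joint
p_copd_given_cs = counts["Yes"]["Current"] / n_current # v)   conditional

print(f"P(CS)          = {p_cs:.4f}")
print(f"P(COPD)        = {p_copd:.4f}")
print(f"P(COPD and CS) = {p_copd_and_cs:.4f}")
print(f"P(COPD | CS)   = {p_copd_given_cs:.4f}")
```

The joint probability in iv) equals the conditional probability in v) multiplied by the marginal in i), which is a quick consistency check on the three answers.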


Biostatistics 140.621 Laboratory Exercise 2 Answer Key 2

b) Calculate the probability that a COPD patient is a current or former smoker using
Bayes theorem.

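\[
\Pr(\text{Smoke} \mid \text{COPD}) = \frac{\Pr(\text{COPD} \mid \text{Smoke})\,\Pr(\text{Smoke})}{\Pr(\text{COPD} \mid \text{Smoke})\,\Pr(\text{Smoke}) + \Pr(\text{COPD} \mid \text{no Smoke})\,\Pr(\text{no Smoke})}
\]

\[
\Pr(\text{Smoke} \mid \text{COPD}) = \frac{\left(\dfrac{98+65}{3380+2941}\right)\left(\dfrac{3380+2941}{11{,}371}\right)}{\left(\dfrac{98+65}{3380+2941}\right)\left(\dfrac{3380+2941}{11{,}371}\right) + \left(\dfrac{20}{5050}\right)\left(\dfrac{5050}{11{,}371}\right)} = \frac{163}{183} = 0.89
\]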

c) What is meant by "probability" in question 1.a)?


Because the person is chosen at random from the population, each person has an equal
chance of being chosen. Under the classical definition of probability, the probability
of a condition is therefore equal to its relative frequency in the population. Note that
variation among individuals in the population produces uncertainty about any one person.

d) Estimate the three conditional probabilities of having COPD given smoking status is
never, former, and current. Make a table of these probabilities that allows the viewer
to observe the association between smoking and the risk of COPD.

P(COPD|never smoker) = 20/5050 = 0.004

P(COPD|former smoker) = 98/3380 = 0.029

P(COPD|current smoker) = 65/2941 = 0.022
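One way to lay these three conditional probabilities out as a table is sketched below. The counts come from the NMES table in question 1; the column headings and formatting are an illustrative choice.

```python
# COPD cases and column totals by smoking status, from the NMES table
copd_yes = {"Never": 20, "Former": 98, "Current": 65}
totals = {"Never": 5050, "Former": 3380, "Current": 2941}

# Conditional probability of COPD within each smoking group
risk = {status: copd_yes[status] / totals[status] for status in totals}

print(f"{'Smoking status':<16}{'P(COPD | status)':>18}")
for status, p in risk.items():
    print(f"{status:<16}{p:>18.3f}")
```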

e) Hypothesize about the biological and behavioral processes that might have given rise
to the data summarized in step 1.d).

We observe that the probability of COPD is higher in former smokers than in current
smokers. Possible reasons include: former smokers may be older; former smokers may
have quit smoking because of illness; and these data are cross-sectional, so they
provide no information on changes over time.

© 2018 Johns Hopkins University Department of Biostatistics 08/26/18



2. The following 3x2 tables derived from the Nepal mortality data show mortality at 16 months
of follow-up for different ages of girls in both treatment groups.
-> sex = Female, trt = Placebo

Each cell shows the count, with the row percentage beneath it.

    Age of |     Vital status     |
     child |     Alive       Dead |     Total
-----------+----------------------+----------
       < 1 |      1219         69 |      1288
           |     94.64       5.36 |    100.00
-----------+----------------------+----------
       1-2 |      2615         72 |      2687
           |     97.32       2.68 |    100.00
-----------+----------------------+----------
       3-4 |      2542         25 |      2567
           |     99.03       0.97 |    100.00
-----------+----------------------+----------
     Total |      6376        166 |      6542
           |     97.46       2.54 |    100.00

-> sex = Female, trt = Vitamin

    Age of |     Vital status     |
     child |     Alive       Dead |     Total
-----------+----------------------+----------
       < 1 |      1291         54 |      1345
           |     95.99       4.01 |    100.00
-----------+----------------------+----------
       1-2 |      2724         52 |      2776
           |     98.13       1.87 |    100.00
-----------+----------------------+----------
       3-4 |      2529         15 |      2544
           |     99.41       0.59 |    100.00
-----------+----------------------+----------
     Total |      6544        121 |      6665
           |     98.18       1.82 |    100.00
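The row percentages in these tables, and the value p = 0.018 used in question 3, can be recomputed from the counts. The sketch below assumes that p is the overall 16-month mortality among Vitamin A-treated girls (121/6665), which matches the 1.82% in the Total row; the function and variable names are illustrative.

```python
# (alive, dead) counts by age group, copied from the two tables above
placebo = {"<1": (1219, 69), "1-2": (2615, 72), "3-4": (2542, 25)}
vitamin = {"<1": (1291, 54), "1-2": (2724, 52), "3-4": (2529, 15)}

def mortality(alive, dead):
    """Proportion who died among all children in the group."""
    return dead / (alive + dead)

def overall(table):
    """Mortality pooled over all age groups in one treatment arm."""
    alive = sum(a for a, _ in table.values())
    dead = sum(d for _, d in table.values())
    return mortality(alive, dead)

for age in placebo:
    print(f"{age:>4}  placebo {100 * mortality(*placebo[age]):5.2f}%"
          f"  vitamin {100 * mortality(*vitamin[age]):5.2f}%")

print(f"Overall placebo: {100 * overall(placebo):.2f}%")
print(f"Overall vitamin: {100 * overall(vitamin):.2f}%")
```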

3. Suppose 3 girls (call them J, R and Y) were randomly selected from the Vitamin A-treated
group. Define the random variable of interest, X, as the number out of the 3 who die during
follow-up.

a) What are the possible outcomes (values) that may be observed for this random
variable?

The observed outcome, x, may take on the values 0,1,2, or 3 deaths.


Each probability P(X=x) lies between 0 and 1, and the probabilities of these 4
mutually exclusive outcomes sum to 1.


b) What is the probability that only the first girl (J) would die during follow-up?

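P(only the first girl dies) = P(1st dies)*P(2nd survives)*P(3rd survives)

= (0.018)*(0.982)*(0.982) = 0.017

Here P(dies) = 121/6665 = 0.018 is the overall mortality among Vitamin A-treated girls
in question 2, and P(survives) = 1 - 0.018 = 0.982.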

c) What is the probability that only one (exactly one) girl would die during follow-up?

P(only one girl dies) = P(only the 1st dies)+P(only the 2nd dies)+P(only the 3rd dies)
= 0.017+0.017+0.017 = 0.052 (computed from the unrounded value 0.01736 for each term)

Note: This is the same as

\[
P(X=1) = \binom{n}{x} p^x q^{n-x} = \binom{3}{1} (0.018)^1 (0.982)^{3-1} = 0.052
\]

d) What is the probability that only the first two girls (J and R) would die during follow-up?

\[
\begin{aligned}
P(\text{only the first 2 girls die}) &= P(\text{the first 2 girls die and the third girl survives}) \\
&= P(\text{girl dies})^2 \, P(\text{girl survives})^1 \\
&= P(D \cap D \cap S) \\
&= (0.018)^2 (0.982)^1 = 0.0003
\end{aligned}
\]

Note: This is the same as

\[
p^x q^{n-x} \quad \text{with } x = 2,\ n = 3
\]

e) What is the probability that exactly two girls would die during follow-up?

P(only 2 girls die) = P(1st and 2nd die)+P(1st and 3rd die) + P(2nd and 3rd die)
= 0.0003 + 0.0003 + 0.0003 = 0.0009

Note: This is the same as

\[
P(X = 2) = \binom{n}{x} p^x q^{n-x} = \binom{3}{2} (0.018)^2 (0.982)^1 = 0.0009
\]
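The counting argument in parts b) through e) can be illustrated by listing all 2^3 = 8 ordered outcomes for J, R and Y and adding up the ones that match each event. This is a sketch that assumes independent outcomes with P(death) = 0.018 for each girl, as in the answers above; the helper name `prob` is illustrative. Note that the unrounded probability of exactly two deaths is about 0.00095, while the 0.0009 in the answer is the sum of three terms each rounded to 0.0003.

```python
from itertools import product

p = 0.018      # P(death) used in the answer key
q = 1 - p      # P(survival)

def prob(outcome):
    """Probability of one ordered outcome such as ('D', 'S', 'S'),
    assuming the three girls' outcomes are independent."""
    result = 1.0
    for status in outcome:
        result *= p if status == "D" else q
    return result

# All ordered outcomes for (J, R, Y): D = dies, S = survives
outcomes = list(product("DS", repeat=3))

p_only_first = prob(("D", "S", "S"))                                 # part b)
p_exactly_one = sum(prob(o) for o in outcomes if o.count("D") == 1)  # part c)
p_first_two = prob(("D", "D", "S"))                                  # part d)
p_exactly_two = sum(prob(o) for o in outcomes if o.count("D") == 2)  # part e)

print(f"b) only J dies:        {p_only_first:.6f}")
print(f"c) exactly one dies:   {p_exactly_one:.6f}")
print(f"d) only J and R die:   {p_first_two:.6f}")
print(f"e) exactly two die:    {p_exactly_two:.6f}")
```

Parts c) and e) each sum three equally likely orderings, which is where the binomial coefficient in the formula comes from.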

f) Describe the probability distribution for the number of deaths of Vitamin A-treated
girls during 16 months of follow-up by filling in the table below:

Using n=3 and p=0.018:

Number of deaths (x)    P(X = x) = Prob of x deaths
0                       0.947
1                       0.052
2                       0.0009
3                       ~0
Total                   1.0

\[
P(X = 0) = \binom{n}{x} p^x q^{n-x} = \binom{3}{0} (0.018)^0 (0.982)^3 = 0.947
\]

\[
P(X = 1) = \binom{n}{x} p^x q^{n-x} = \binom{3}{1} (0.018)^1 (0.982)^2 = 0.052
\]

\[
P(X = 2) = \binom{n}{x} p^x q^{n-x} = \binom{3}{2} (0.018)^2 (0.982)^1 = 0.0009
\]

\[
P(X = 3) = \binom{n}{x} p^x q^{n-x} = \binom{3}{3} (0.018)^3 (0.982)^0 \approx 0
\]


g) What probability distribution is this? What assumptions are made?

Binomial Probability Distribution with n=3 and p= 0.018

Assumptions:
1. Assume that the n girls are independent individuals; the outcome observed in
a girl does not influence the outcome of other girls.
2. Assume that the probability of death = p = 0.018.
3. Assume that the probability of death is the same for all 3 girls.
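Under these assumptions, the probabilities in 3.f) follow from the binomial formula. A minimal sketch using Python's standard library is given below; `binomial_pmf` is a helper name chosen here for illustration, not something from the original exercise.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials
    and success (here, death) probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 3, 0.018
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(pmf):
    print(f"P(X = {x}) = {prob:.6f}")
print(f"Total    = {sum(pmf):.6f}")
```

Printing six decimal places shows that P(X = 3) is small but not exactly zero, and that the four probabilities sum to 1.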
