Textbook Practice Problems 1


Unit 2:

Lesson 2:

A man runs 1 mile approximately once per weekend. He records his time
over an 18-week period. The individual times and summary statistics are
given in Table 2.14.

2.9 What is the standard deviation of the 1-mile running time over 18 weeks?

Solution:
d1 <- c(12.80,11.57,12.20,11.73,12.25,12.67,12.18,11.92,11.53,11.67,12.47,11.80,
12.30,12.33,12.08,12.55,11.72,11.83)

sd(d1)   # 0.3874181
_________________________________________________________________

Suppose we construct a new variable called time_100 = 100 × time (e.g.,
for week 1, time_100 = 1280).

2.10 What are the mean and standard deviation of time_100?

Solution:
d1 <- c(12.80,11.57,12.20,11.73,12.25,12.67,12.18,11.92,11.53,11.67,12.47,11.80,
12.30,12.33,12.08,12.55,11.72,11.83)
d2 <- 100 * d1
d2
mean(d2)   # 1208.889
sd(d2)     # 38.74181
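
The answer to 2.10 follows from a general rule: multiplying every observation by a constant c multiplies the mean by c and the standard deviation by |c|. The sketch below checks this numerically. The names `times`, `scale_factor` and `times_100` are illustrative and do not come from the textbook.

```r
# Weekly 1-mile times from Table 2.14, as entered in the solution
times <- c(12.80, 11.57, 12.20, 11.73, 12.25, 12.67, 12.18, 11.92, 11.53,
           11.67, 12.47, 11.80, 12.30, 12.33, 12.08, 12.55, 11.72, 11.83)

scale_factor <- 100                  # illustrative name for the constant c
times_100 <- scale_factor * times

# Both comparisons should be TRUE (up to floating-point tolerance)
mean_scales <- isTRUE(all.equal(mean(times_100), scale_factor * mean(times)))
sd_scales   <- isTRUE(all.equal(sd(times_100), abs(scale_factor) * sd(times)))
c(mean_scales = mean_scales, sd_scales = sd_scales)
```

The same rule explains why the variance of time_100 would be 100^2 times the variance of the original times.
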
2.11 Construct a stem and leaf plot of time_100 using the first 3 most
significant digits for the stem and the least significant digit for the leaf.
So, for week 1, time_100 = 1280 which has a stem = 128 and a leaf = 0.

Solution:
stem (d2, scale=3)
115 | 37
116 | 7
117 | 23
118 | 03
119 | 2
120 | 8
121 | 8
122 | 05
123 | 03
124 | 7
125 | 5
126 | 7
127 |
128 | 0

Hypertension:
In an experiment that examined the effect of body position on blood
pressure [8], 32 participants had their blood pressures measured while
lying down with their arms at their sides and again standing with their
arms supported at heart level. The data are given in Table 2.16.
# SBP taken in recumbent position
rec_sbp <- c(99,126,108,122,104,108,116,106,118,92,110,138,120,142,118,134,
118,126,108,136,110,120,108,132,102,118,116,118,110,122,106,146)

# DBP taken in recumbent position
rec_dbp <- c(71,74,72,68,64,60,70,74,82,58,78,80,70,88,58,
76,72,78,78,86,78,74,74,92,68,70,76,80,74,72,62,90)

# SBP taken in standing position
st_sbp <- c(105,124,102,114,96,96,106,106,120,88,102,124,118,136,92,126,108,
114,94,144,100,106,94,128,96,102,88,100,96,118,94,138)

# DBP taken in standing position
st_dbp <- c(79,76,68,72,62,56,70,76,90,60,80,75,84,90,58,68,68,76,70,88,64,
70,74,88,64,68,60,84,70,78,56,94)

# Difference in systolic BP (recumbent - standing)
diff_sbp <- rec_sbp - st_sbp

# Difference in diastolic BP (recumbent - standing)
diff_dbp <- rec_dbp - st_dbp

diff_sbp
[1] -6 2 6 8 8 12 10 0 -2 4 8 14 2 6 26 8 10 12 14 -8 10 14 14 4 6 16 28
[28] 18 14 4 12 8

diff_dbp
[1] -8 -2 4 -4 2 4 0 -2 -8 -2 -2 5 -14 -2 0 8 4 2 8 -2
[21] 14 4 0 4 4 2 16 -4 4 -6 6 -4

2.20 Construct stem-and-leaf and box plots for the difference scores for
each type of blood pressure.

> stem(diff_sbp)

-0 | 86
-0 | 2
0 | 022444
0 | 66688888
1 | 00022244444
1 | 68
2 |
2 | 68

> stem(diff_dbp)

-1 | 4
-0 | 886
-0 | 444222222
0 | 0002224444444
0 | 5688
1 | 4
1 | 6

> boxplot(diff_sbp)

> boxplot(diff_dbp)
2.21 Based on your answers to Problems 2.19 and 2.20, comment on the
effect of body position on the levels of systolic and diastolic blood
pressure.

Systolic blood pressure clearly seems to be higher in the recumbent position than in
the standing position. Diastolic blood pressure appears to be comparable in the two
positions. The distributions are each reasonably symmetric.

2.22 Orthostatic hypertension is sometimes defined based on an unusual
change in blood pressure after changing position. Suppose we define a
normal range for change in systolic blood pressure (SBP) based on change
in SBP from the recumbent to the standing position in Table 2.16 that is
between the upper and lower decile. What should the normal range be?

# The normal range of diff_sbp (10th to 90th percentile):
quantile(diff_sbp, probs = c(0.1, 0.9), na.rm = TRUE, type = 2)

10% 90%
  0  16

# The same range for diff_dbp:
quantile(diff_dbp, probs = c(0.1, 0.9), na.rm = TRUE, type = 2)

10% 90%
 -6   8

The normal range for the change in SBP is therefore 0 to 16 mm Hg.
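
The `type = 2` argument selects one of the nine sample-quantile definitions that `quantile()` supports. The default, `type = 7`, interpolates between order statistics and can give slightly different cutoffs on a sample this small. A minimal sketch comparing the two on the SBP differences follows; the vector is re-entered so the block runs on its own, and `range_type2` and `range_type7` are illustrative names.

```r
# SBP differences (recumbent - standing), copied from the printed diff_sbp
diff_sbp <- c(-6, 2, 6, 8, 8, 12, 10, 0, -2, 4, 8, 14, 2, 6, 26, 8, 10, 12, 14,
              -8, 10, 14, 14, 4, 6, 16, 28, 18, 14, 4, 12, 8)

# type = 2: no interpolation except averaging at exact ties of position
range_type2 <- quantile(diff_sbp, probs = c(0.1, 0.9), type = 2)

# type = 7: R's default, linear interpolation between order statistics
range_type7 <- quantile(diff_sbp, probs = c(0.1, 0.9), type = 7)

rbind(type2 = range_type2, type7 = range_type7)
```

Whichever definition is used, it is worth stating it alongside the reported range, since the deciles of 32 observations are sensitive to the choice.
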

Lesson 3:

# Preparing data

# 1. Set working directory:
setwd("~/Desktop/3rd Semester/Methods I/Datasets")

# 2. Load the data set (header defaults to FALSE, so set it explicitly):
lead.df <- read.table(file = "LEAD.DAT.txt", header = TRUE, sep = ",")

# 3. View the file:
View(lead.df)
dim(lead.df)    # number of rows, number of columns
names(lead.df)  # column names
head(lead.df)
str(lead.df)    # structure of the variables

# 4. Create a list of the variables that you want to change to factor:
fac_var <- c("area", "sex", "iq_type", "lead_grp", "Group", "fst2yrs",
"pica", "colic", "irrit", "convul")

lead.df[, fac_var] <- lapply(lead.df[, fac_var], factor)

str(lead.df)

2.31 Compare the exposed and control groups regarding age and gender,
using appropriate numeric and graphic descriptive measures.

#2.31:
# We want a numerical (e.g., mean, median, five-number summary, sd)
# and graphical (e.g., boxplot, histogram, density plot) summary:
# 1* comparing the ages of the individuals in the exposed and control groups
# 2* comparing the distribution of sex in the exposed and control groups

# 1. Comparison of ages by group:
names(lead.df)   # "ageyrs" "Group"

require(mosaic)
# Numeric:
favstats(~ageyrs | Group, data = lead.df)
# Graph:
bwplot(Group ~ ageyrs, data = lead.df)

# 2. Comparison of sex by group:

# Numeric:
xtabs(~Group + sex, data = lead.df)
install.packages("tigerstats")   # only needed once
require("tigerstats")
rowPerc(xtabs(~Group + sex, data = lead.df))

# Graph:
bargraph(~sex, group = Group, data = lead.df)                    # counts
bargraph(~sex, group = Group, data = lead.df, type = "percent")  # percentages

2.32 Compare the exposed and control groups regarding verbal and
performance IQ, using appropriate numeric and graphic descriptive
measures.

# We want a numerical (e.g., mean, median, five-number summary, sd)
# and graphical (e.g., boxplot, histogram, density plot) summary:
# 1* comparing the verbal IQ in the exposed and control groups
# 2* comparing performance IQ in the exposed and control groups

names(lead.df)
str(lead.df)

# "iqv" , "iqp" , "Group"


# 1. Comparison of verbal IQ "iqv" by group:
# Numeric:
favstats(~iqv | Group, data = lead.df)
  Group min Q1 median Q3 max     mean       sd  n missing
1     1  57 74     85 95 126 85.14103 14.68609 78       0
2     2  51 76     83 91 116 83.84783 11.56809 46       0

# Graph:
bwplot(Group ~ iqv, data = lead.df)

# 2. Comparison of performance IQ "iqp" by group:
# Numeric:
favstats(~iqp | Group, data = lead.df)
  Group min    Q1 median    Q3 max      mean       sd  n missing
1     1  51 92.00    101 113.0 149 102.70513 16.78675 78       0
2     2  51 85.25     97 105.5 121  94.93478 13.34733 46       0

# Graph:
bwplot(Group ~ iqp, data = lead.df)

The exposed children have somewhat lower mean and median IQ scores than
the unexposed children, but the differences don't appear to be very large.

Microbiology

A study was conducted to demonstrate that soybeans inoculated with nitrogen-fixing
bacteria yield more and grow adequately without expensive environmentally deleterious
synthesized fertilizers. The trial was conducted under controlled conditions with uniform
amounts of soil. The initial hypothesis was that inoculated plants would outperform their
uninoculated counterparts. This assumption is based on the facts that plants need nitrogen
to manufacture vital proteins and amino acids and that nitrogen-fixing bacteria would
make more of this substance available to plants, increasing their size and yield. There were
8 inoculated plants (I) and 8 uninoculated plants (U). The plant yield as measured by pod
weight for each plant is given in Table 2.20.

2.35 Compute appropriate descriptive statistics for I and U plants.

I <- c(1.76,1.45,1.03,1.53,2.34,1.96,1.79,1.21)
U <- c(0.49,0.85,1.00,1.54,1.01,0.75,2.11,0.92)

> favstats(I)
min Q1 median Q3 max mean sd n missing
1.03 1.39 1.645 1.8325 2.34 1.63375 0.4198958 8 0

> favstats(U)
min Q1 median Q3 max mean sd n missing
0.49 0.825 0.96 1.1425 2.11 1.08375 0.5097881 8 0
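2.36 Use graphic methods to compare the two groups.
Solution:
# I and U are two independent groups of plants (not paired measurements),
# so side-by-side box plots compare them more directly than a scatterplot
boxplot(list(I = I, U = U), ylab = "Pod weight")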

2.37 What is your overall impression concerning the pod weight in the two
groups?

Inoculated plants (I) tend to have higher pod weight than uninoculated plants (U)
(mean 1.63 vs. 1.08; median 1.65 vs. 0.96).
Unit 3:

Mental Health

Estimates of the prevalence of Alzheimer’s disease have recently been provided by
Pfeffer et al. [8]. The estimates are given in Table 3.5.

Suppose an unrelated 77-year-old man, 76-year-old woman, and 82-year-old
woman are selected from a community.

A = event that the 77-year-old man has Alzheimer’s disease, P(A) = 0.049

B = event that the 76-year-old woman has Alzheimer’s disease, P(B) = 0.023

C = event that the 82-year-old woman has Alzheimer’s disease, P(C) = 0.078


3.19 What is the probability that exactly one of the three people has Alzheimer’s
disease?

Because the three people are unrelated, the events are treated as independent:

P(A∩B̄∩C̄) + P(Ā∩B∩C̄) + P(Ā∩B̄∩C)
= (.049 × .977 × .922) + (.951 × .023 × .922) + (.951 × .977 × .078)
= .04414 + .02017 + .07247 = .1368

3.20 Suppose we know one of the three people has Alzheimer’s
disease, but we don’t know which one. What is the conditional
probability that the affected person is a woman?

[P(Ā∩B∩C̄) + P(Ā∩B̄∩C)] / [P(A∩B̄∩C̄) + P(Ā∩B∩C̄) + P(Ā∩B̄∩C)]
= (.02017 + .07247) / .1368 = .677

3.21 Suppose we know two of the three people have Alzheimer’s
disease. What is the conditional probability that they are both
women?

P(Ā∩B∩C) / [P(A∩B∩C̄) + P(A∩B̄∩C) + P(Ā∩B∩C)]
= (.951 × .023 × .078) / [(.049 × .023 × .922) + (.049 × .977 × .078) + (.951 × .023 × .078)]
= .001706 / .006479 = .263
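
Problems 3.19 to 3.21 can also be checked by listing all eight disease patterns for the three people and adding up the relevant probabilities. The sketch below assumes independence between the three people and takes the three prevalences from the solution (Table 3.5 itself is not reproduced here). The object names are illustrative. Working from unrounded products, the 3.21 ratio comes out at about .263.

```r
# Prevalences for the man (A) and the two women (B, C), from the solution
p <- c(A = 0.049, B = 0.023, C = 0.078)

# All 2^3 disease patterns: 1 = affected, 0 = unaffected
outcomes <- expand.grid(A = 0:1, B = 0:1, C = 0:1)

# Probability of each pattern under independence
outcomes$prob <- apply(outcomes[, c("A", "B", "C")], 1,
                       function(x) prod(ifelse(x == 1, p, 1 - p)))
outcomes$n_affected <- outcomes$A + outcomes$B + outcomes$C

# 3.19: exactly one person affected
p_exactly_one <- sum(outcomes$prob[outcomes$n_affected == 1])

# 3.20: the affected person is a woman, given exactly one affected
p_woman_given_one <-
  sum(outcomes$prob[outcomes$n_affected == 1 & outcomes$A == 0]) / p_exactly_one

# 3.21: both affected people are women, given exactly two affected
p_two_women_given_two <-
  sum(outcomes$prob[outcomes$n_affected == 2 & outcomes$A == 0]) /
  sum(outcomes$prob[outcomes$n_affected == 2])

round(c(p_exactly_one, p_woman_given_one, p_two_women_given_two), 4)
```

Enumerating the patterns keeps the conditioning explicit: each conditional probability is a ratio of two sums over rows of the same table.
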

--------------------------------------------------------------------------------------------

Suppose the probability that both members of a married couple,
each of whom is 75–79 years of age, will have Alzheimer’s disease
is .0015.

A = the husband (male, 75–79 years of age) has Alzheimer’s disease, P(A) = 0.049

B = the wife (female, 75–79 years of age) has Alzheimer’s disease, P(B) = 0.023

P(A ∩ B) = 0.0015
3.25 What is the probability that at least one member of the couple is affected?

P (A ∪ B) = P(A) + P(B) – P (A ∩ B)

= 0.049 + 0.023 – 0.0015 = 0.0705

3.26 What is the expected overall prevalence of Alzheimer’s
disease in the community if the prevalence estimates in Table 3.5
for specific age–gender groups hold?
Let A={Alzheimer’s}.
P(A) = P(A | 65-69 y.o.male) x P(65-69 y.o.male)
+ P(A | 65-69 y.o.female) x P(65-69 y.o.female)
+ P(A | 70-74 y.o.male) x P(70-74 y.o.male)
+ P(A | 70-74 y.o.female) x P(70-74 y.o.female)
+ P(A | 75-79 y.o.male) x P(75-79 y.o.male)
+ P(A | 75-79 y.o.female) x P(75-79 y.o.female)
+ P(A | 80-84 y.o.male) x P(80-84 y.o.male)
+ P(A | 80-84 y.o.female) x P(80-84 y.o.female)
+ P(A | 85+ y.o.male) x P(85+ y.o.male)
+ P(A | 85+ y.o.female) x P(85+ y.o.female)
= (.05 x .016) + (.10 x 0.0) + (.09 x 0.0) + (.17 x .022) + (.11 x .049) + (.18 x .023)
+ (.08 x .086) + (.12 x .078) + (.04 x .35) + (.06 x .279) = .061

The expected overall prevalence is 6.1% (or, 6.1 per 100 population).

--------------------------------------------------------------------------------------------
Hypertension

Laboratory measures of cardiovascular reactivity are receiving increasing attention. Much of the
expanded interest is based on the belief that these measures, obtained under challenge from
physical and psychological stressors, may yield a more biologically meaningful index of
cardiovascular function than more traditional static measures. Typically, measurement of
cardiovascular reactivity involves the use of an automated blood-pressure monitor to examine the
changes in blood pressure before and after a stimulating experience (such as playing a video
game). For this purpose, blood-pressure measurements were made with the Vita-Stat
blood-pressure machine both before and after playing a video game. Similar measurements were
obtained using manual methods for measuring blood pressure. A person was classified as a
“reactor” if his or her DBP increased by 10 mm Hg or more after playing the game and as a
nonreactor otherwise. The results are given in Table 3.11.

3.78 If the population tested is representative of the general population, then what are the PV+
and PV− using this test?

PV+ = P(D | S+) = P(D ∩ S+) / P(S+) = (6/79) / (21/79) = 6/21 = .286

PV− = P(no D | S−) = P(no D ∩ S−) / P(S−) = (51/79) / (58/79) = 51/58 = .879


Mental Health

The Chinese Mini-Mental Status Test (CMMS) consists of 114 items intended to identify people
with Alzheimer’s disease and senile dementia among people in China [14]. An extensive clinical
evaluation of this instrument was performed, whereby participants were interviewed by
psychiatrists and nurses and a definitive diagnosis of dementia was made. Table 3.13 shows the
results obtained for the subgroup of people with at least some formal education.

Suppose a cutoff value of ≤ 20 on the test is used to identify people with dementia.

3.87 What is the sensitivity of the test?

Sensitivity = P(test+ | disease) = 12/16 = 0.75

3.88 What is the specificity of the test?

Specificity = P(test− | no disease) = 34/46 = 0.739
3.89 The cutoff value of 20 on the CMMS used to identify people with dementia is arbitrary.
Suppose we consider changing the cutoff. What are the sensitivity and specificity if cutoffs of 5,
10, 15, 20, 25, or 30 are used? Make a table of your results.

CMMS cutoff   Specificity      Sensitivity      False positive rate (1 – specificity)
 5            46/46 = 1        2/16 = 0.125     1 – 1 = 0
10            46/46 = 1        3/16 = 0.188     1 – 1 = 0
15            43/46 = 0.935    7/16 = 0.438     1 – 0.935 = 0.065
20            34/46 = 0.739    12/16 = 0.750    1 – 0.739 = 0.261
25            18/46 = 0.391    15/16 = 0.938    1 – 0.391 = 0.609
30            0/46 = 0         16/16 = 1        1 – 0 = 1

3.90 Construct a ROC curve based on the table constructed in Problem 3.89.
3.91 Suppose we want both the sensitivity and specificity to be at least 70%. Use the ROC curve
to identify the possible value(s) to use as the cutoff for identifying people with dementia, based
on these criteria.

Both the sensitivity and the specificity must be at least 70%. From the table in
Problem 3.89, only the cutoff CMMS score ≤ 20 meets both conditions: its sensitivity
is 0.750 and its specificity is 0.739. On the ROC curve from Problem 3.90 this is the
point (0.261, 0.750). Hence the cutoff for identifying people with dementia under
these criteria is a CMMS score ≤ 20.
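3.92 Calculate the area under the ROC curve. Interpret what this area means in words in the
context of this problem.

Using the trapezoidal rule on the points from Problem 3.89:

Area = 0.5 [ (0.188+0.438)(0.065 – 0) + (0.438+0.750)(0.261 – 0.065) + (0.750+0.938)(0.609 – 0.261)
+ (0.938+1)(1 – 0.609) ] = 0.809

The area estimates the probability that a randomly selected person with dementia has a
lower CMMS score than a randomly selected person without dementia (counting ties as one
half), which is about 81% here.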
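
Problems 3.90 to 3.92 can be reproduced in a few lines of R. The sketch below takes the counts from the table in Problem 3.89 (16 people with dementia, 46 without), draws the ROC curve, picks out the cutoffs that satisfy the 70% criterion, and applies the trapezoidal rule. The variable names are illustrative.

```r
# Counts from the table in Problem 3.89
cutoff      <- c(5, 10, 15, 20, 25, 30)
sensitivity <- c(2, 3, 7, 12, 15, 16) / 16
specificity <- c(46, 46, 43, 34, 18, 0) / 46
fpr         <- 1 - specificity           # false positive rate

# 3.90: ROC curve, starting from the origin
roc_x <- c(0, fpr)
roc_y <- c(0, sensitivity)
plot(roc_x, roc_y, type = "b", xlab = "1 - specificity", ylab = "Sensitivity",
     main = "ROC curve for the CMMS")
abline(0, 1, lty = 2)                    # chance line

# 3.91: cutoffs with sensitivity and specificity both at least 70%
acceptable <- cutoff[sensitivity >= 0.70 & specificity >= 0.70]

# 3.92: trapezoidal rule, sum of width times average height
auc <- sum(diff(roc_x) * (head(roc_y, -1) + tail(roc_y, -1)) / 2)

acceptable
round(auc, 3)
```

The first three ROC points share the same horizontal coordinate, so they add nothing to the area. That is why the hand calculation can start at the point for cutoff 10.
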

Suppose a birth defect has a recessive form of inheritance. In a
study population, the recessive gene (a) initially has a prevalence
of 25%. A subject has the birth defect if both maternal and
paternal genes are of type a.

3.115 In the general population, what is the probability that an
individual will have the birth defect, assuming that maternal and
paternal genes are inherited independently?

**Answer**

The recessive gene (a) initially has a prevalence of 25%.

P(maternal gene is a) = 25/100 = 0.25

P(paternal gene is a) = 25/100 = 0.25

P(birth defect) = P(maternal gene is a) * P(paternal gene is a)

= 0.25 * 0.25 = 0.0625

**Supporting Work**

The recessive gene (a) has a prevalence of 25% in the study population, so
the maternal gene is of type a with probability 0.25 and so is the paternal
gene. Because the two genes are inherited independently, P(A∩B) = P(A)*P(B)
gives the probability that both are of type a. The result is 0.0625, so about
6.25% of individuals in the general population are expected to have the birth defect.

A further study finds that after 10 generations (≈200 years) a lot
of inbreeding has taken place in the population. Two
subpopulations (populations A and B), consisting of 30% and 70%
of the general population, respectively, have formed. Within
population A, prevalence of the recessive gene is 40%, whereas in
population B it is 10%.

P(A) = 0.30, P(B) = 0.70

P(A)*P(B) = 0.30 X 0.70 = 0.21

3.116 Suppose that in 25% of marriages both people are from
population A, in 65% both are from population B, and in 10%
there is one partner from population A and one from population B.
What is the probability of a birth defect in the next generation?

*Answer*
Let PRE-A = 0.40 and PRE-B = 0.10 be the prevalence of the recessive gene
in populations A and B. For each type of marriage:

25% (both from A): PRE-A * PRE-A * 0.25 = 0.16 * 0.25 = 0.04

65% (both from B): PRE-B * PRE-B * 0.65 = 0.01 * 0.65 = 0.0065

10% (one from each): PRE-A * PRE-B * 0.10 = 0.40 * 0.10 * 0.10 = 0.004

Probability of a birth defect in the next generation = 0.04 +
0.0065 + 0.004 = 0.0505

3.117 Suppose that a baby is born with a birth defect, but the
baby’s ancestry is unknown. What is the posterior probability that
the baby will have both parents from population A, both parents
from population B, or mixed ancestry, respectively? (Hint: Use
Bayes’ rule.)

## Problem 3.117

The Prevalence of Population A (PRE-A) = 0.40

The Prevalence of Population B (PRE-B) = 0.10

The Probability of Both Parents from Population A = P(A) = 0.25

The Probability of Both Parents from Population B = P(B) = 0.65

The Probability of One from A and B = P(AB) = 0.10

The Probability of a defect birth in the next generation = P(D) = 0.0505

### Both Parents from Population A:

**Answer**

The Posterior Probability = 0.792

**Supporting Work**
The posterior probability that the baby will have both parents
from population A = ((PRE-A) ^2 * P(A)) /P(D)

The posterior probability = ((0.40) ^2 * 0.25) /0.0505 = 0.792

### Both Parents from Population B:

**Answer**

The Posterior Probability = 0.1287

**Supporting Work**

The posterior probability that the baby will have both parents
from population B = ((PRE-B) ^2 * P(B)) / P(D)

The posterior probability = ((0.10) ^2 * 0.65)/0.0505 = 0.1287

### Mixed Ancestry:

**Answer**

The Posterior Probability = 0.0792

**Supporting Work**

The posterior probability that both parents from mixed ancestry

= (PRE-A * PRE-B * P(AB))/P(D)

The posterior probability = (0.40 * 0.10 * 0.10)/ 0.0505 = 0.0792
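
The three posterior probabilities are a single application of Bayes' rule, so they can be computed together as vectors. This is a minimal sketch using the quantities defined in the solution; the object names `prior`, `likelihood` and `posterior` are illustrative.

```r
# Gene prevalence within each population (from the problem statement)
pre_a <- 0.40
pre_b <- 0.10

# Prior: share of marriages of each type
prior <- c(both_A = 0.25, both_B = 0.65, mixed = 0.10)

# Likelihood: P(birth defect | marriage type), genes inherited independently
likelihood <- c(both_A = pre_a^2, both_B = pre_b^2, mixed = pre_a * pre_b)

# Bayes' rule: posterior = prior * likelihood / total probability of a defect
p_defect  <- sum(prior * likelihood)
posterior <- prior * likelihood / p_defect

round(p_defect, 4)
round(posterior, 4)
```

Because the three marriage types are exhaustive, the posteriors must sum to 1, which is a quick check on the hand calculation.
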

Unit 4:
Let X be the random variable representing the number of
hypertensive adults in Example 3.12.

Hypertension Genetics. Suppose we are conducting a
hypertension-screening program in the home. Consider all
possible pairs of DBP measurements of the mother and father
within a given family, assuming that the mother and father are
not genetically related. This sample space consists of all pairs of
numbers of the form (X, Y) where X > 0, Y > 0. Certain specific
events might be of interest in this context. In particular, we might
be interested in whether the mother or father is hypertensive,
which is described, respectively, by events A = {mother’s DBP ≥
90}, B = {father’s DBP ≥ 90}. These events are diagrammed in
Figure 3.4.

Suppose we know that Pr(A) = .1, Pr(B) = .2. What can we say
about Pr(A ∩ B) = Pr(mother’s DBP ≥ 90 and father’s DBP ≥ 90) =
Pr(both mother and father are hypertensive)? We can say nothing
unless we are willing to make certain assumptions.

4.1 Derive the probability-mass function for X.

Let X be the number of hypertensive adults among the mother and father.

Events A and B are as defined above: A = {mother is hypertensive} and
B = {father is hypertensive}, with P(A) = 0.1 and P(B) = 0.2. Because the
mother and father are not genetically related, A and B are assumed to be
independent.

X can take the values 0, 1, and 2:

- "0" means that neither the mother nor the father is affected,
- "1" means that one adult is affected and the other is not, and
- "2" means that both the mother and the father are affected.

Outcome   Neither hypertensive   One hypertensive   Both hypertensive
X         0                      1                  2
P(X=x)    0.72                   0.26               0.02
CDF       0.72                   0.98               1

1. When X = 0, the probability mass function is as follows:

P(neither is affected) = (1 - P(A)) * (1 - P(B)) = (1-0.1) * (1-0.2) = 0.72

2. When X = 1, the probability mass function is as follows:

P(exactly one is affected) = (P(A)*(1-P(B))) + (P(B)*(1-P(A)))

= (0.1 * (1-0.2)) + (0.2 * (1-0.1)) = 0.26

3. When X = 2, the probability mass function is as follows:

P(both are affected) = P(A) * P(B) = 0.1 * 0.2 = 0.02

4.2 What is its expected value?

E(X) = μX = Σ x · P(X = x)

E(X) = (0*0.72) + (1*0.26) + (2*0.02) = 0.3

4.3 What is its variance?

Var(X) = σX² = E(X^2) − [E(X)]^2

E(X^2) = ((0^2) * 0.72) + ((1^2) * 0.26) + ((2^2) * 0.02) = 0.34

Var(X) = 0.34 -(0.3^2) = 0.25

4.4 What is its cumulative-distribution function?


FX(x) = P(X ≤ x)

F(x) = 0 for x < 0

F(0) = P(X≤0) = P(X=0) = 0.72

F(1) = P(X≤1) = P(X=0) + P(X=1) = 0.72 + 0.26 = 0.98

F(2) = P(X≤2) = P(X=0) + P(X=1) + P(X=2) = 0.72 + 0.26 + 0.02 = 1.0
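
The answers to Problems 4.1 to 4.4 can be generated from the two probabilities in one pass. The sketch below assumes, as the solution does, that the mother's and father's hypertension status are independent. The variable names are illustrative.

```r
p_mother <- 0.1   # Pr(A): mother is hypertensive
p_father <- 0.2   # Pr(B): father is hypertensive

# 4.1: probability-mass function of X = number of hypertensive parents
x   <- 0:2
pmf <- c((1 - p_mother) * (1 - p_father),
         p_mother * (1 - p_father) + p_father * (1 - p_mother),
         p_mother * p_father)

# 4.2 and 4.3: expected value and variance
expected_value <- sum(x * pmf)
variance       <- sum(x^2 * pmf) - expected_value^2

# 4.4: cumulative-distribution function at x = 0, 1, 2
cdf <- cumsum(pmf)

data.frame(x = x, pmf = pmf, cdf = cdf)
c(mean = expected_value, variance = variance)
```

Changing `p_mother` or `p_father` regenerates the whole table, which makes it easy to see how the distribution responds to the inputs.
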

Suppose we want to check the accuracy of self-reported
diagnoses of angina by getting further medical records on
a subset of the cases.

4.5 If we have 50 reported cases of angina and we want to
select 5 for further review, then how many ways can we
select these cases if order of selection matters?

nPk = n! / (n-k)!

50P5 = 50! / (50-5)! = (50*49*48*47*46) * 45! / 45! = 50*49*48*47*46

= 254,251,200

4.6 Answer Problem 4.5 assuming order of selection does
not matter.

nCk = n! / (k! * (n-k)!)

50C5 = 50! / (5! * (50-5)!) = 2,118,760
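
Both counts are available in base R without evaluating 50! directly. A minimal sketch, with illustrative variable names:

```r
n_cases  <- 50
n_review <- 5

# 4.6: order does not matter, so count combinations
n_combinations <- choose(n_cases, n_review)

# 4.5: order matters, and each combination can be arranged in 5! ways
n_permutations <- n_combinations * factorial(n_review)

c(permutations = n_permutations, combinations = n_combinations)
```

Using `choose()` avoids the very large intermediate values that `factorial(50) / factorial(45)` would create, although that expression also works in double precision for numbers of this size.
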

4.9 Suppose 6 of 15 students in a grade-school class
develop influenza, whereas 20% of grade-school students
nationwide develop influenza. Is there evidence of an
excessive number of cases in the class? That is, what is
the probability of obtaining at least 6 cases in this class if
the nationwide rate holds true?

X ~ Binom (n=15, p=0.20)

P(X ≥ 6) = 1 - P(X ≤ 5)

= 1 - (P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5))

= 1 - pbinom(q = 5, size = 15, prob = 0.20) = 0.061

4.10 What is the expected number of students in the class
who will develop influenza?

E(X) = n*p = 15*0.20 = 3

Hypertension

A national study found that treating people appropriately
for high blood pressure reduced their overall mortality by
20%. Treating people adequately for hypertension has
been difficult because it is estimated that 50% of
hypertensives do not know they have high blood pressure,
50% of those who do know are inadequately treated by
their physicians, and 50% who are appropriately treated
fail to follow this treatment by taking the right number of
pills.

4.30 What is the probability that among 10 true
hypertensives at least 50% are being treated
appropriately and are complying with this treatment?

p = 0.5*0.5*0.5 = 0.125

X ~ Binom (n=10, p=0.125)

P(X ≥ 5) = 1 - P(X ≤ 4)

= 1 - pbinom(q = 4, size = 10, prob = 0.125) = 0.00445

4.31 What is the probability that at least 7 of the 10
hypertensives know they have high blood pressure?

X ~ Binom (n=10, p=0.5)

P(X ≥ 7) = 1 - P(X ≤ 6)

= 1 - pbinom(q = 6, size = 10, prob = 0.5) = 0.172

4.32 If the preceding 50% rates were each reduced to 40%
by a massive education program, then what effect would
this change have on the overall mortality rate among true
hypertensives; that is, would the mortality rate decrease
and, if so, what percentage of deaths among
hypertensives could be prevented by the education
program?

Cardiovascular Disease

An article was published [13] concerning the incidence of
cardiac death attributable to the earthquake in Los
Angeles County on January 17, 1994. In the week before
the earthquake there were an average of 15.6 cardiac
deaths per day in Los Angeles County. On the day of the
earthquake, there were 51 cardiac deaths.

4.64 What is the exact probability of 51 deaths occurring
on one day if the cardiac death rate in the previous week
continued to hold on the day of the earthquake?

X ~ Pois (μ = 15.6)

μ = λt = 15.6*1 = 15.6

P(X=51) = dpois(x = 51, lambda = 15.6) = 7.650953e-13

4.65 Is the occurrence of 51 deaths unusual? (Hint: Use
the same methodology as in Example 4.32.)

P(X ≥ 51) = 1 - P(X ≤ 50)

1 - ppois(q = 50, lambda = 15.6) = 1.089351e-12

This probability is far below .05, so 51 deaths in one day is very unusual.

4.66 What is the maximum number of cardiac deaths that
could have occurred on the day of the earthquake to be
consistent with the rate of cardiac deaths in the past
week? (Hint: Use a cutoff probability of .05 to determine
the maximum number.)

Find the largest x such that P(X ≥ x) = 1 - P(X ≤ x-1) > 0.05.

myx <- seq(20, 22, 1)

1 - ppois(q = myx, lambda = 15.6)   # P(X ≥ 21), P(X ≥ 22), P(X ≥ 23)

P(X ≥ 22) is still above 0.05 but P(X ≥ 23) is not, so x = 22.
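
The search in Problem 4.66 can be written so that the cutoff is found automatically rather than by inspecting three values. The sketch below scans a range of candidate counts and keeps the largest one whose upper-tail probability is still above .05. The range 16 to 40 and the variable names are assumptions made for illustration.

```r
lambda <- 15.6     # mean cardiac deaths per day in the week before
alpha  <- 0.05     # cutoff probability from the hint

candidates <- 16:40
# P(X >= x) taken directly from the upper tail: P(X > x - 1)
upper_tail <- ppois(candidates - 1, lambda = lambda, lower.tail = FALSE)

# Largest count that is still consistent with the previous week's rate
max_consistent <- max(candidates[upper_tail > alpha])
max_consistent
```

Using `lower.tail = FALSE` instead of `1 - ppois(...)` makes no practical difference here, but it avoids loss of precision when the tail probability is extremely small, as in Problem 4.65.
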

Hospital Epidemiology

Suppose the number of admissions to the emergency
room at a small hospital follows a Poisson distribution but
the incidence rate changes on different days of the week.
On a weekday there are on average two admissions per
day, while on a weekend day there is on average one
admission per day.
4.90 What is the probability of at least one admission on a
Saturday?

X ~ Pois (μ = 1)

P (X ≥ 1) = 1- P (X ≤ 0)

1- ppois (q=0, lambda=1) = 0.6321206

4.91 What is the probability of having 0, 1, and 2+
admissions for an entire week, if the results for different
days during the week are assumed to be independent?

μ = 2+2+2+2+2 = 10 for the five weekdays, μ = 1+1 = 2 for the weekend

1. P(X = 0) = dpois(x=0, lambda=10) * dpois(x=0, lambda=2)
= 6.144212e-06

2. P(X = 1) =
dpois(x=1, lambda=10) * dpois(x=0, lambda=2) + dpois(x=1,
lambda=2) * dpois(x=0, lambda=10) = 7.373055e-05

3. P(X ≥ 2) = 1 - [P(X = 0) + P(X = 1)] = 1 - (6.144212e-06 +
7.373055e-05)
= 0.9999201
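
A shortcut for Problem 4.91: the sum of independent Poisson counts is itself Poisson, with mean equal to the sum of the means. The weekly total is therefore Poisson with μ = 10 + 2 = 12, and the three probabilities follow directly. A minimal sketch with illustrative variable names:

```r
lambda_weekdays <- 5 * 2   # five weekdays at 2 admissions per day
lambda_weekend  <- 2 * 1   # two weekend days at 1 admission per day
lambda_week     <- lambda_weekdays + lambda_weekend

p0      <- dpois(0, lambda = lambda_week)
p1      <- dpois(1, lambda = lambda_week)
p2_plus <- ppois(1, lambda = lambda_week, lower.tail = FALSE)

c(p0 = p0, p1 = p1, p2_plus = p2_plus)
```

These agree with the products of weekday and weekend probabilities in the solution, because splitting the week into two parts and recombining them is a special case of the same additivity property.
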

Obstetrics
Suppose the incidence of a specific birth defect in a high
socioeconomic status (SES) census tract is 50 cases per
100,000 births.

4.92 If there are 5000 births in the census tract in 1 year,
then what is the probability that there will be exactly 5
cases of the birth defect during the year (census tract A in
Table 4.21)?

p = 50 / 100,000 = 0.0005

n = 5000, μ = np = 5000 * 0.0005 = 2.5

X ~ Pois (μ = 2.5)

P(X = 5) = dpois(x=5, lambda=2.5) = 0.06680094

Suppose the incidence of the same birth defect in a low
SES census tract is 100 cases per 100,000 births.

4.93 If there are 12,000 births in the census tract in 1
year, then what is the probability that there will be at
least 8 cases of the birth defect during the year (census
tract B in Table 4.21)?

p = 100 / 100,000 = 0.001

n = 12,000, μ = np = 12000 * 0.001 = 12

X ~ Pois (μ = 12)

P(X ≥ 8) = 1 - P(X ≤ 7)

= 1 - ppois(q=7, lambda=12) = 0.9104955


Suppose a city is divided into eight census tracts as shown
in Table 4.21.

4.94 Suppose a child is born with the birth defect but the
address of the mother is unknown. What is the probability
that the child comes from a low SES census tract?

High SES μ = 10000*0.0005 = 5

Low SES μ = (12000+10000+8000+7000+20000+3000) * 0.001 = 60

Total = 65

P(low SES | birth defect) = 60 / 65 = 0.9230769

4.95 What is the expected number of cases over 1 year in
the city?

High SES μ = 10000*0.0005 = 5

Low SES μ = (12000+10000+8000+7000+20000+3000) * 0.001 = 60

The expected number of cases over 1 year = High SES μ + Low SES μ

= 5 + 60 = 65
Unit 5:
Cardiovascular Disease

Because serum cholesterol is related to age and sex, some
investigators prefer to express it in terms of z-scores. If
X = raw serum cholesterol, then Z = (X − μ) / σ, where μ is the
mean and σ is the standard deviation of serum cholesterol
for a given age–gender group. Suppose Z is regarded as a
standard normal random variable.

5.1 What is Pr (Z < 0.5)?

Z ~ N (0, 1)

P (Z < 0.5) = pnorm (q=0.5, mean=0, sd=1) = 0.6914625

5.2 What is Pr (Z > 0.5)?

Z ~ N (0, 1)

P (Z > 0.5) = 1- P (Z < 0.5) = 1- pnorm (q=0.5, mean=0, sd=1) = 0.3085375

5.3 What is Pr (−1.0 < Z < 1.5)?

Z ~ N (0, 1)

P (−1.0 < Z < 1.5) = P (Z < 1.5) - P (Z < -1) = P (Z < 1.5) - (1 - P (Z < 1))

pnormGC (bound = c(-1, 1.5), region = "between", mean = 0, sd = 1, graph = TRUE) = 0.7745375

Suppose a person is regarded as having high cholesterol if
Z > 2.0 and borderline cholesterol if 1.5 < Z < 2.0.

5.4 What proportion of people have high cholesterol?

Z ~ N (0, 1)

P (people have high cholesterol) = P (Z > 2.0) = 1- P (Z < 2)

= 1- pnorm (q=2, mean=0, sd=1) = 0.02275013

= pnormGC (bound = 2, region = "above", mean = 0, sd = 1, graph = TRUE) = 0.02275013

5.5 What proportion of people have borderline
cholesterol?

Z ~ N (0, 1)

P (people have borderline cholesterol) = P (1.5 < Z < 2.0)

P (1.5 < Z < 2.0) = P (Z < 2.0) - P (Z < 1.5)

= pnorm (q=2, mean=0, sd=1) - pnorm (q=1.5, mean=0, sd=1)
= 0.04405707

= pnormGC (bound = c(1.5, 2.0), region = "between", mean = 0, sd = 1, graph = TRUE) = 0.04405707

Cardiovascular Disease

Serum cholesterol is an important risk factor for coronary
disease. We can show that serum cholesterol is
approximately normally distributed, with mean = 219
mg/dL and standard deviation = 50 mg/dL.

5.14 If the clinically desirable range for cholesterol is <
200 mg/dL, what proportion of people have clinically
desirable levels of cholesterol?

X ~ N (mean = 219, sd = 50)

P (people have clinically desirable levels of cholesterol) = P (X < 200)

= pnorm (q=200, mean=219, sd=50) = 0.3519727

5.15 Some investigators believe that only cholesterol
levels over 250 mg/dL indicate a high-enough risk for
heart disease to warrant treatment. What proportion of
the population does this group represent?

X ~ N (mean = 219, sd = 50)

P (high-enough risk for heart disease to warrant treatment) = P (X > 250)

= 1- P (X < 250) = 1- pnorm (q=250, mean=219, sd=50) = 0.2676289

5.16 What proportion of the general population has
borderline high-cholesterol levels, that is, > 200 but <
250 mg/dL?

X ~ N (mean = 219, sd = 50)

P (population has borderline high-cholesterol levels) = P (200 < X < 250)

P (200 < X < 250) = P (X < 250) - P (X < 200)

pnormGC (bound = c(200, 250), region = "between", mean = 219, sd = 50, graph = TRUE) = 0.3803984

Hepatic Disease

Suppose we observe 84 alcoholics with cirrhosis of the
liver, of whom 29 have hepatomas (that is, liver-cell
carcinoma). Suppose we know, based on a large sample,
that the risk of hepatoma among alcoholics without
cirrhosis of the liver is 24%.

5.50 What is the probability that we observe exactly 29
alcoholics with cirrhosis of the liver who have hepatomas
if the true rate of hepatoma among alcoholics (with or
without cirrhosis of the liver) is .24?

X ~ Binom (n=84, p=0.24)

P (X = 29) = dbinom (x = 29, size = 84, prob = 0.24) = 0.008730478

5.51 What is the probability of observing at least 29
hepatomas among the 84 alcoholics with cirrhosis of the
liver under the assumptions in Problem 5.50?

X ~ Binom (n=84, p=0.24)

P (X ≥ 29) = 1- P (X ≤ 28)

= 1- pbinom (q = 28, size = 84, prob = 0.24) = 0.01935102

5.52 What is the smallest number of hepatomas that
would have to be observed among the alcoholics with
cirrhosis of the liver for the hepatoma experience in this
group to differ from the hepatoma experience among
alcoholics without cirrhosis of the liver? (Hint: Use a 5%
probability of getting a result at least as extreme to
denote differences between the hepatoma experiences of
the two groups.)

Find the smallest x such that P(X ≥ x) = 1 - P(X ≤ x-1) ≤ 0.05.

myx <- seq (26, 28, 1)

1 - pbinom (q=myx, size = 84, prob = 0.24)   # P(X ≥ 27), P(X ≥ 28), P(X ≥ 29)

P(X ≥ 27) is above 0.05 but P(X ≥ 28) is not, so x = 28.

Environmental Health

5.58 A study was conducted relating particulate air
pollution and daily mortality in Steubenville, Ohio [4]. On
average over the past 10 years there have been 3 deaths
per day in Steubenville. Suppose that on 90 high-pollution
days (days in which the total suspended particulates are in the
highest quartile among all days) the death rate is 3.2
deaths per day, or 288 deaths observed over the 90
high-pollution days. Are there an unusual number of deaths on
high-pollution days?

If the usual rate holds, the expected number of deaths over 90 days is
μ = 3 * 90 = 270.

X ~ Pois (μ = 270)

Since μ ≥ 10, X is approximately Y ~ N (mean = 270, sd = sqrt (270))

P (X ≥ 288) ≈ P (Y ≥ 287.5) (with continuity correction)

= 1 - pnorm (q=287.5, mean=270, sd=sqrt (270)) ≈ 0.14 > 0.05

Not unusual

5.106 What is the 40th percentile of a normal distribution
with mean = 5 and variance = 9?

X ~ N (mean = 5, variance = 9)

X ~ N (mean = 5, SD = sqrt (9) = 3)

x0.40 = qnorm (p=0.40, mean=5, sd=3) = 4.24

qnormGC (area = 0.40, region = "below", mean = 5, sd = 3,
graph = TRUE) = 4.24

5.108 What is z 0.90?

Z ~ N (0,1)

z0.90 = qnorm (p=0.90, mean=0, sd=1) = 1.28

qnormGC (area = 0.90, region = "below", mean = 0, sd = 1,
graph = TRUE) = 1.28
Hypertension

Blood pressure readings are known to be highly variable.
Suppose we have mean SBP for one individual over n visits
with k readings per visit (Xn,k). The variability of Xn,k
depends on n and k and is given by the formula σw² =
σA²/n + σ²/(nk), where σA² = between-visit variability
and σ² = within-visit variability. For 30- to 49-year-old
Caucasian females, σA² = 42.9 and σ² = 12.8. For one
individual, we also assume that Xn,k is normally
distributed about their true long-term mean = μ with
variance = σw².

n = visits, k = readings per visit, σA² = 42.9, σ² = 12.8

σw² = σA²/n + σ²/(nk)

mean = μ with variance = σw²

5.123 Suppose a woman is measured at two visits with
two readings per visit. If her true long-term SBP = 130 mm
Hg, then what is the probability that her observed mean
SBP is ≥140 mm Hg? (Ignore any continuity correction.)
(Note: By true mean SBP we mean the average SBP over a
large number of visits for that subject.)

σw² = σA²/n + σ²/(nk) = (42.9/2) + (12.8/(2*2)) = 24.65

mean = μ = 130, with variance = σw² = 24.65

P (SBP is ≥140) = P (X ≥140) = 1- P (X < 140)

= 1- pnorm (q=140, mean=130, sd=sqrt (24.65)) = 0.02199696
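
The variance formula is easy to wrap in a small helper, so that different visit schedules can be compared. This is a sketch under the assumptions stated in the text (variance components 42.9 and 12.8, normally distributed observed mean). The function name `observed_mean_variance` and its argument names are hypothetical.

```r
# Variance of the observed mean SBP over n visits with k readings per visit,
# using the formula from the text; names are illustrative
observed_mean_variance <- function(n_visits, k_readings,
                                   var_between = 42.9, var_within = 12.8) {
  var_between / n_visits + var_within / (n_visits * k_readings)
}

true_mean <- 130
var_w <- observed_mean_variance(n_visits = 2, k_readings = 2)

# 5.123: P(observed mean SBP >= 140), ignoring any continuity correction
p_at_least_140 <- pnorm(140, mean = true_mean, sd = sqrt(var_w),
                        lower.tail = FALSE)

# Variance for three schedules: 2 visits x 2, 4 visits x 2, 2 visits x 4
c(var_w,
  observed_mean_variance(4, 2),
  observed_mean_variance(2, 4))
p_at_least_140
```

With these variance components, doubling the number of visits reduces the variance far more than doubling the readings per visit, because the between-visit component is divided only by n.
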


It is also known that over a large number of 30- to 49-
year-old Caucasian women, their true mean SBP is
normally distributed with mean = 120 mm Hg and
standard deviation = 14 mm Hg. Also, over a large number
of African American 30- to 49-year-old women, their true
mean SBP is normal with mean = 130 mm Hg and standard
deviation = 20 mm Hg.

5.125 Suppose we select a random 30- to 49-year-old
Caucasian woman and a random 30- to 49-year-old African
American woman. What is the probability that the African
American woman has a higher true SBP?

Hint: Use Equation 5.10 (on page 133).

30- to 49-year-old Caucasian woman = X1…

mean = 120 mm Hg, sd = 14 mm Hg

African American 30- to 49-year-old women = X2…

mean = 130 mm Hg, sd = 20 mm Hg

Ld = X2 – X1

E(Ld) = E(X2) - E(X1) = 130 – 120 = 10

Var (Ld) = ∑ ci^2 Var (Xi ) = (20^2) + (14^2) = 596

P (Ld > 0) = 1- P (Ld < 0)

= 1- pnorm (q=0, mean=10, sd=sqrt (596)) = 0.6589562
