Textbook Practice Problems 1
Lesson 2:
A man runs 1 mile approximately once per weekend. He records his time
over an 18-week period. The individual times and summary statistics are
given in Table 2.14.
2.9 What is the standard deviation of the 1-mile running time over 18 weeks?
Solution:
d1 <- c(12.80,11.57,12.20,11.73,12.25,12.67,12.18,11.92,11.53,11.67,12.47,11.80,
12.30,12.33,12.08,12.55,11.72,11.83)
sd(d1)  # 0.3874181
_________________________________________________________________
Solution:
d1 <- c(12.80,11.57,12.20,11.73,12.25,12.67,12.18,11.92,11.53,11.67,12.47,11.80,
12.30,12.33,12.08,12.55,11.72,11.83)
d2 <- 100 * d1
d2
mean(d2)  # 1208.889
sd(d2)    # 38.74181
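As a quick check of the two results above, the following sketch confirms that multiplying every time by 100 multiplies both the mean and the standard deviation by 100. The names d1 and d2 follow the solution; scale_ok is a name introduced here for illustration.

```r
# Weekly 1-mile times (minutes), as listed in the solution above
d1 <- c(12.80, 11.57, 12.20, 11.73, 12.25, 12.67, 12.18, 11.92, 11.53,
        11.67, 12.47, 11.80, 12.30, 12.33, 12.08, 12.55, 11.72, 11.83)
d2 <- 100 * d1

# Rescaling by a constant rescales the mean and the sd by the same constant
scale_ok <- c(
  mean = isTRUE(all.equal(mean(d2), 100 * mean(d1))),
  sd   = isTRUE(all.equal(sd(d2), 100 * sd(d1)))
)
print(scale_ok)
```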
2.11 Construct a stem and leaf plot of time_100 using the first 3 most
significant digits for the stem and the least significant digit for the leaf.
So, for week 1, time_100 = 1280 which has a stem = 128 and a leaf = 0.
Solution:
stem (d2, scale=3)
115 | 37
116 | 7
117 | 23
118 | 03
119 | 2
120 | 8
121 | 8
122 | 05
123 | 03
124 | 7
125 | 5
126 | 7
127 |
128 | 0
Hypertension:
In an experiment that examined the effect of body position on blood
pressure [8], 32 participants had their blood pressures measured while
lying down with their arms at their sides and again standing with their
arms supported at heart level. The data are given in Table 2.16.
# SBP taken in recumbent position
rec_sbp <- c(99,126,108,122,104,108, 116,106,118,92,110,138,120,142,118,134,
118,126,108,136,110,120,108,132,102,118,116,118,110,122,106,146)
diff_sbp
[1] -6 2 6 8 8 12 10 0 -2 4 8 14 2 6 26 8 10 12 14 -8 10 14 14 4 6 16 28
[28] 18 14 4 12 8
diff_dbp
[1] -8 -2 4 -4 2 4 0 -2 -8 -2 -2 5 -14 -2 0 8 4 2 8 -2
[21] 14 4 0 4 4 2 16 -4 4 -6 6 -4
2.20 Construct stem-and-leaf and box plots for the difference scores for
each type of blood pressure.
> stem(diff_sbp)
-0 | 86
-0 | 2
0 | 022444
0 | 66688888
1 | 00022244444
1 | 68
2 |
2 | 68
> stem(diff_dbp)
-1 | 4
-0 | 886
-0 | 444222222
0 | 0002224444444
0 | 5688
1 | 4
1 | 6
> boxplot(diff_sbp)
> bwplot(diff_dbp)  # bwplot() is from the lattice package
2.21 Based on your answers to Problems 2.19 and 2.20, comment on the
effect of body position on the levels of systolic and diastolic blood
pressure.
Systolic blood pressure clearly seems to be higher in the recumbent position than in
the standing position. Diastolic blood pressure appears to be comparable in the two
positions. The distributions are each reasonably symmetric.
10th and 90th percentiles of diff_sbp:
10% 90%
0 16
10th and 90th percentiles of diff_dbp:
10% 90%
-6 8
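The two pairs of values above look like the 10th and 90th percentiles of diff_sbp and diff_dbp. A minimal sketch that reproduces them, assuming the percentile is taken as the next ordered observation when n·p is not an integer (R's `type = 1`; the default `type = 7` interpolates and gives slightly different values). The names dec_sbp and dec_dbp are introduced here for illustration.

```r
# Difference scores as printed above
diff_sbp <- c(-6, 2, 6, 8, 8, 12, 10, 0, -2, 4, 8, 14, 2, 6, 26, 8, 10, 12, 14,
              -8, 10, 14, 14, 4, 6, 16, 28, 18, 14, 4, 12, 8)
diff_dbp <- c(-8, -2, 4, -4, 2, 4, 0, -2, -8, -2, -2, 5, -14, -2, 0, 8, 4, 2, 8,
              -2, 14, 4, 0, 4, 4, 2, 16, -4, 4, -6, 6, -4)

# Lower and upper deciles without interpolation (assumed definition)
dec_sbp <- quantile(diff_sbp, probs = c(0.10, 0.90), type = 1)
dec_dbp <- quantile(diff_dbp, probs = c(0.10, 0.90), type = 1)
print(dec_sbp)
print(dec_dbp)
```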
Lesson 3:
#preparing data
#4. Create a list of the variables that you want to change to factor:
fac_var <- c ("area", "sex", "iq_type", "lead_grp","Group", "fst2yrs",
"pica", "colic", "irrit", "convul")
2.31 Compare the exposed and control groups regarding age and gender,
using appropriate numeric and graphic descriptive measures.
#2.31:
# We want a numerical summary (e.g., mean, median, five-number summary, sd)
# and a graphical summary (e.g., boxplot, histogram, density plot):
# 1* comparing the ages of the individuals in the exposed and control groups
# 2* comparing the distribution of sex in the exposed and control groups
#Num:
xtabs(~Group+sex,data=lead.df)
install.packages("tigerstats")
require("tigerstats")
rowPerc(xtabs(~Group+sex,data=lead.df))
#Graph:
bargraph(~sex,groups=Group,data=lead.df) #for count
bargraph(~sex,groups=Group,data=lead.df,type="percent") #for percentage
2.32 Compare the exposed and control groups regarding verbal and
performance IQ, using appropriate numeric and graphic descriptive
measures.
# We want a numerical summary (e.g., mean, median, five-number summary, sd)
# and a graphical summary (e.g., boxplot, histogram, density plot):
# 1* comparing the verbal IQ in the exposed and control groups
# 2* comparing performance IQ in the exposed and control groups
names(lead.df)
str(lead.df)
#Graphic:
bwplot(Group~iqv,data=lead.df)
Microbiology
A study was conducted to demonstrate that soybeans inoculated with nitrogen-fixing
bacteria yield more and grow adequately without expensive environmentally deleterious
synthesized fertilizers. The trial was conducted under controlled conditions with uniform
amounts of soil. The initial hypothesis was that inoculated plants would outperform their
uninoculated counterparts. This assumption is based on the facts that plants need nitrogen
to manufacture vital proteins and amino acids and that nitrogen-fixing bacteria would
make more of this substance available to plants, increasing their size and yield. There were
8 inoculated plants (I) and 8 uninoculated plants (U). The plant yield as measured by pod
weight for each plant is given in Table 2.20.
I <- c(1.76,1.45,1.03,1.53,2.34,1.96,1.79,1.21)
U <- c(0.49,0.85,1.00,1.54,1.01,0.75,2.11,0.92)
> favstats(I)
  min   Q1 median     Q3  max    mean        sd n missing
 1.03 1.39  1.645 1.8325 2.34 1.63375 0.4198958 8       0
> favstats(U)
  min    Q1 median     Q3  max    mean        sd n missing
 0.49 0.825   0.96 1.1425 2.11 1.08375 0.5097881 8       0
2.36 Use graphic methods to compare the two groups.
Solution:
cor(I ~ U)
0.0266867
xyplot(I ~ U)
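Because the inoculated and uninoculated plants are two independent samples rather than paired measurements, a side-by-side boxplot may compare the groups more directly than a scatterplot. A minimal sketch in base R; the object name bp and the axis labels are introduced here for illustration.

```r
# Pod weights as given above
I <- c(1.76, 1.45, 1.03, 1.53, 2.34, 1.96, 1.79, 1.21)
U <- c(0.49, 0.85, 1.00, 1.54, 1.01, 0.75, 2.11, 0.92)

# Side-by-side boxplots of pod weight by group
bp <- boxplot(list(Inoculated = I, Uninoculated = U),
              ylab = "Pod weight", main = "Pod weight by group")

# Group medians (third row of the boxplot statistics)
print(bp$stats[3, ])
```

The boxes can then be read against the favstats() summaries above: the inoculated group has the higher median and quartiles.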
2.37 What is your overall impression concerning the pod weight in the two
groups?
Inoculated plants (I) tend to have higher pod weight than uninoculated plants (U).
Unit 3:
Mental Health
P(Ā ∩ B ∩ C) / [P(A ∩ B ∩ C̄) + P(A ∩ B̄ ∩ C) + P(Ā ∩ B ∩ C)]
= (.951 × .023 × .078) / [(.049 × .023 × .922) + (.049 × .977 × .078) + (.951 × .023 × .078)]
= .00171 / .00648 = .2639
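The arithmetic can be checked directly. In this sketch pA, pB, and pC are names introduced here for the three marginal probabilities (.049, .023, .078), and the events are treated as independent, as the products in the formula imply. Carrying full precision gives about .2633; the .2639 above comes from rounding the numerator and denominator to .00171 and .00648 before dividing.

```r
# Marginal probabilities read off the formula above (names are illustrative)
pA <- 0.049
pB <- 0.023
pC <- 0.078

# Numerator: only B and C affected
num <- (1 - pA) * pB * pC

# Denominator: exactly two of the three affected
den <- pA * pB * (1 - pC) + pA * (1 - pB) * pC + (1 - pA) * pB * pC

p_cond <- num / den
print(round(c(numerator = num, denominator = den, ratio = p_cond), 5))
```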
--------------------------------------------------------------------------------------------
P(A) = P(a male 75–79 years of age has Alzheimer’s disease) = 0.049
P(B) = P(a female 75–79 years of age has Alzheimer’s disease) = 0.023
P(A ∩ B) = 0.0015
3.25 What is the probability that at least one member of the couple is affected?
P(A ∪ B) = P(A) + P(B) – P(A ∩ B) = 0.049 + 0.023 – 0.0015 = 0.0705
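Plugging in the values given above, a short sketch of the addition rule. It also prints the product P(A)·P(B) that would equal P(A ∩ B) under independence; that product is smaller than the stated 0.0015, so the two events are not independent under these figures. The variable names are illustrative.

```r
p_a  <- 0.049    # P(A): male affected
p_b  <- 0.023    # P(B): female affected
p_ab <- 0.0015   # P(A and B): both affected (given)

# Addition rule: at least one member of the couple affected
p_union <- p_a + p_b - p_ab
print(p_union)

# Value P(A and B) would take if A and B were independent
p_indep <- p_a * p_b
print(p_indep)
```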
The expected overall prevalence is 6.1% (or, 6.1 per 100 population).
--------------------------------------------------------------------------------------------
Hypertension
Laboratory measures of cardiovascular reactivity are receiving increasing attention. Much of the
expanded interest is based on the belief that these measures, obtained under challenge from
physical and psychological stressors, may yield a more biologically meaningful index of
cardiovascular function than more traditional static measures. Typically, measurement of
cardiovascular reactivity involves the use of an automated blood-pressure monitor to examine the
changes in blood pressure before and after a stimulating experience (such as playing a video
game). For this purpose, blood-pressure measurements were made with the Vita-Stat
blood-pressure machine both before and after playing a video game. Similar measurements were
obtained using manual methods for measuring blood pressure. A person was classified as a
“reactor” if his or her DBP increased by 10 mm Hg or more after playing the game and as a
nonreactor otherwise. The results are given in Table 3.11.
3.78 If the population tested is representative of the general population, then what are the PV+
and PV− using this test?
The Chinese Mini-Mental Status Test (CMMS) consists of 114 items intended to identify people
with Alzheimer’s disease and senile dementia among people in China [14]. An extensive clinical
evaluation of this instrument was performed, whereby participants were interviewed by
psychiatrists and nurses and a definitive diagnosis of dementia was made. Table 3.13 shows the
results obtained for the subgroup of people with at least some formal education.
Suppose a cutoff value of ≤ 20 on the test is used to identify people with dementia.
Sensitivity = P(test+ | disease) = 12/16 = 0.75
Specificity = P(test− | no disease) = 34/46 = 0.739
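The same two quantities can be computed from counts. This sketch assumes 16 participants with dementia (12 scoring ≤ 20) and 46 without dementia (34 scoring > 20); the total of 46 is an assumption chosen to be consistent with the specificity of 0.739 quoted in the discussion of Problem 3.91. Variable names are illustrative.

```r
# Assumed 2x2 counts at a cutoff of CMMS <= 20
tp <- 12          # dementia, test positive
fn <- 16 - tp     # dementia, test negative
tn <- 34          # no dementia, test negative
fp <- 46 - tn     # no dementia, test positive

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)

# 1 - specificity is the x-coordinate of this cutoff on a ROC curve
print(round(c(sensitivity = sensitivity,
              specificity = specificity,
              false_positive_rate = 1 - specificity), 3))
```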
3.89 The cutoff value of 20 on the CMMS used to identify people with dementia is arbitrary.
Suppose we consider changing the cutoff. What are the sensitivity and specificity if cutoffs of 5,
10, 15, 20, 25, or 30 are used? Make a table of your results.
3.90 Construct a ROC curve based on the table constructed in Problem 3.89.
3.91 Suppose we want both the sensitivity and specificity to be at least 70%. Use the ROC curve
to identify the possible value(s) to use as the cutoff for identifying people with dementia, based
on these criteria.
Both the sensitivity and the specificity must be at least 70%. From Problem 3.89, a cutoff of
CMMS score ≤ 20 gives a sensitivity of 0.750 and a specificity of 0.739, both above 70%. On
the ROC curve from Problem 3.90, this cutoff corresponds to the point (0.261, 0.750), that is,
(1 − specificity, sensitivity). Hence, by both the table and the ROC curve, a cutoff of CMMS
score ≤ 20 for identifying people with dementia satisfies the criteria.
3.92 Calculate the area under the ROC curve. Interpret what this area means in words in the
context of this problem.
For marriage:
3.117 Suppose that a baby is born with a birth defect, but the
baby’s ancestry is unknown. What is the posterior probability that
the baby will have both parents from population A, both parents
from population B, or mixed ancestry, respectively? (Hint: Use
Bayes’ rule.)
## Problem 3.117
**Answer**
**Supporting Work**
The posterior probability that the baby will have both parents
from population A = ((PRE-A) ^2 * P(A)) /P(D)
**Answer**
**Supporting Work**
The posterior probability that the baby will have both parents
from population B = ((PRE-B) ^2 * P(B)) / P(D)
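The two formulas above follow the usual Bayes' rule pattern: prior times likelihood, divided by the total probability of the defect P(D). The sketch below illustrates that pattern only. All numeric inputs are hypothetical placeholders, not values from the problem, and the likelihood for mixed ancestry (PRE-A × PRE-B) is an assumption.

```r
# Hypothetical inputs, for illustration only (not taken from the text)
pre_a <- 0.40                                   # gene prevalence, population A
pre_b <- 0.10                                   # gene prevalence, population B
prior <- c(A = 0.50, B = 0.30, mixed = 0.20)    # share of each type of couple

# P(defect | ancestry): both parents must pass on the gene
# (the mixed-ancestry term is an assumption)
lik <- c(A = pre_a^2, B = pre_b^2, mixed = pre_a * pre_b)

# Total probability of the defect, P(D)
p_d <- sum(lik * prior)

# Bayes' rule: posterior probability of each ancestry given the defect
posterior <- lik * prior / p_d
print(round(posterior, 4))
```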
Unit 4:
Let X be the random variable representing the number of
hypertensive adults in Example 3.12.
Suppose we know that Pr(A) = .1, Pr(B) = .2. What can we say
about Pr(A ∩ B) = Pr(mother’s DBP ≥ 90 and father’s DBP ≥ 90) =
Pr(both mother and father are hypertensive)? We can say nothing
unless we are willing to make certain assumptions.
"0" represents that both male and female are not affected,
"1" represents one adult is affected and one adult is not
affected, and
"2" represents that both male and female are affected.
X 0 1 2
E(X) = μX = Σ x · P(X = x)
F(x) = P(X ≤ x) = 0 for x < 0
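A minimal sketch of the expected-value and cumulative-distribution formulas, assuming X counts the affected members of the couple from Unit 3 (P(A) = 0.049, P(B) = 0.023, P(A ∩ B) = 0.0015). That link to the earlier problem is an assumption, since the probability row of the table is not reproduced above.

```r
# Assumed inputs, borrowed from the Unit 3 couple problem
p_a  <- 0.049
p_b  <- 0.023
p_ab <- 0.0015

x  <- 0:2
px <- c(1 - (p_a + p_b - p_ab),   # X = 0: neither affected
        p_a + p_b - 2 * p_ab,     # X = 1: exactly one affected
        p_ab)                     # X = 2: both affected

ex  <- sum(x * px)                # E(X) = sum of x * P(X = x)
cdf <- cumsum(px)                 # F(x) = P(X <= x) at x = 0, 1, 2

print(data.frame(x = x, p = px, cdf = cdf))
print(ex)
```

Under these inputs E(X) equals P(A) + P(B), as expected for a count of two indicator events.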
nPk = n! / (n-k)!
254251200
nCk = n! / (k! * (n-k)!)
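Both counts can be evaluated in R without forming large factorials. In this sketch n_perm and n_comb are helper names introduced here. The values n = 50 and k = 5 are chosen only because 50P5 = 254251200 equals the figure quoted above; the original n and k are not stated.

```r
# Number of combinations: n! / (k! * (n - k)!)
n_comb <- function(n, k) choose(n, k)

# Number of permutations: n! / (n - k)! = nCk * k!
n_perm <- function(n, k) choose(n, k) * factorial(k)

p_50_5 <- n_perm(50, 5)
c_50_5 <- n_comb(50, 5)
print(c(permutations = p_50_5, combinations = c_50_5))
```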
P(X>= 6) = 1- P(X ≤ 5)
Hypertension
P=0.5*0.5*0.5=0.125
P (X>= 7) = 1- P (X ≤ 6)
Cardiovascular Disease
x= 22
Hospital Epidemiology
X ~ Pois (μ = 1)
P (X ≥ 1) = 1- P (X ≤ 0)
3. P (X ≥ 2) = 1 – [P (X = 0) + P (X = 1)] = 1 – (6.144212e-06 +
7.373055e-05)
= 0.9999201
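The complement rule used above can be sketched with R's Poisson functions. The mean mu = 12 is inferred from the two probabilities shown (exp(-12) ≈ 6.144212e-06) and is an assumption about which problem this line belongs to.

```r
mu <- 12                      # assumed Poisson mean, inferred from the values above

p0 <- dpois(0, mu)            # P(X = 0)
p1 <- dpois(1, mu)            # P(X = 1)

# P(X >= 2) = 1 - [P(X = 0) + P(X = 1)]
p_ge2 <- 1 - (p0 + p1)

# Same quantity from the upper tail of the distribution: P(X > 1)
p_ge2_alt <- ppois(1, mu, lower.tail = FALSE)

print(c(p0 = p0, p1 = p1, p_ge2 = p_ge2))
```

The same pattern covers the other tail probabilities in this unit, for example P(X ≥ 8) = 1 − P(X ≤ 7).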
Obstetrics
Suppose the incidence of a specific birth defect in a high
socioeconomic status (SES) census tract is 50 cases per
100,000 births.
p = 50 / 100,000 = 0.0005
X ~ Pois (μ = 2.5)
X ~ Pois (μ = 12)
P (X ≥ 8) = 1- P (X ≤ 7)
4.94 Suppose a child is born with the birth defect but the
address of the mother is unknown. What is the probability
that the child comes from a low SES census tract?
Total = 60 + 5 = 65
Unit 5:
Cardiovascular Disease
Z ~ N (0, 1)
P (−1.0 < Z < 1.5) = P (Z < 1.5) - P (Z < -1) = P (Z < 1.5) - (1 - P (Z < 1))
pnormGC(bound = c(-1, 1.5), region = "between", mean = 0, sd = 1, graph = TRUE)  # 0.7745375
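The same probability can be obtained with base R's pnorm(), without the tigerstats helper. A minimal sketch; the names p_between and p_sym are introduced here.

```r
# P(-1 < Z < 1.5) as a difference of two cumulative probabilities
p_between <- pnorm(1.5) - pnorm(-1)

# Equivalent form using symmetry: P(Z < -1) = 1 - P(Z < 1)
p_sym <- pnorm(1.5) - (1 - pnorm(1))

print(p_between)
```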
Z ~ N (0, 1)
Cardiovascular Disease
X ~ N (219, 50)
Hepatic Disease
P (X ≥ 29) = 1- P (X ≤ 28)
x= 28
Environmental Health
X ~ Pois (μ = 288)
Not unusual
X ~ N (mean = 5, variance = 9)
Z ~ N (0,1)
Ld = X2 – X1