Midterm BioStat 2023 Ans
SPRING 2021
Midterm Exam
Please read the questions carefully and be sure to explain the reasoning behind your answers.
The method used to derive answers will be worth more points than the final numerical solutions.
There is usually sufficient space to write a full-credit answer, but you are free to add
additional pages to complete your answers. Please upload your answers to the E3 course website
after completing your exam.
Good Luck!!
Name:__徐衍揚________________________________
Student ID:__110901003_____________________________
1) ________
2) ________
3) ________
4) ________
5) ________
total: ________
(Only for students who are also taking the R practice course) 6) ________
1. (30 points)
The Chinese Mini-Mental Status Test (CMMS) consists of 114 items intended to identify people
with Alzheimer’s disease and senile dementia among people in China. An extensive clinical
evaluation of this instrument was performed, whereby participants were interviewed by
psychiatrists and nurses and a definitive diagnosis of dementia was made. The table below shows the
results obtained for the subgroup of people with at least some formal education.
Suppose a cutoff value of ≤ 20 on the test is used to identify people with dementia.
Ans: With a score of ≤ 20 counted as a positive test, 12 of the 16 demented people test
positive. The sensitivity, i.e. the probability of a positive test given dementia, is therefore
12/16 = 0.75.
Ans: Of the 46 nondemented people, 12 score ≤ 20 and are wrongly flagged as demented, leaving
34 true negatives. The specificity, i.e. the probability of a negative test given no dementia,
is therefore 34/46 ≈ 0.74.
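The two ratios above can be checked with a short Python sketch. The counts are the ones quoted in the answers (16 demented, 46 nondemented, 12 in each group scoring ≤ 20); the variable names are illustrative only.

```python
# Counts quoted in the answers above for the cutoff "CMMS score <= 20".
demented_total = 16
demented_positive = 12       # demented and scoring <= 20 (true positives)
nondemented_total = 46
nondemented_positive = 12    # nondemented but scoring <= 20 (false positives)

# Sensitivity: P(test positive | demented).
sensitivity = demented_positive / demented_total
# Specificity: P(test negative | nondemented).
specificity = (nondemented_total - nondemented_positive) / nondemented_total

print(f"sensitivity = {sensitivity:.2f}")  # 0.75
print(f"specificity = {specificity:.2f}")  # 0.74
```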
c) The cutoff value of 20 on the CMMS used to identify people with dementia is arbitrary.
Suppose we consider changing the cutoff. What are the sensitivity and specificity if cutoffs
of 5, 10, 15, 20, 25, or 30 are used? Make a table of your results. (10 pts)
[Figure: ROC curve for the CMMS cutoffs. Vertical-axis tick labels run from 0 to 0.8, with the values 0.125, 0.1875, 0.4375 and 0.75 marked; the horizontal axis runs from 0 to 1.]
e) Suppose we want both the sensitivity and specificity to be at least 70%. Use the ROC curve
to identify the possible value(s) to use as the cutoff for identifying people with dementia,
based on these criteria. (2 pts)
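The point marked with an arrow on the ROC curve is the cutoff at which both the sensitivity and
the specificity are at least 70%. The cutoff of ≤ 20 meets this criterion, with sensitivity 0.75
and specificity about 0.74 from parts a) and b).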
f) Calculate the area under the ROC curve. Interpret what this area means in words in the
context of this problem. (3 pts)
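One common way to obtain this area is the trapezoidal rule applied to the ROC points. The sketch below is a hedged illustration, not the graded answer: `roc_auc_trapezoid` is a hypothetical helper, and the only point taken from this exam is the cutoff ≤ 20 (sensitivity 12/16, false-positive rate 12/46). A full answer would use all six cutoff points from part c), which are not reproduced here. The area can be read as the probability that a randomly chosen demented person scores lower on the CMMS than a randomly chosen nondemented person, with ties counted as one half.

```python
def roc_auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve.

    points: iterable of (1 - specificity, sensitivity) pairs.
    The corner points (0, 0) and (1, 1) are added automatically.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between neighbours
    return area

# Illustration with the single point known from parts a) and b).
# With one interior point the area equals (sensitivity + specificity) / 2.
single_point = [(12 / 46, 12 / 16)]
print(round(roc_auc_trapezoid(single_point), 3))
```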
2. (20 points)
A random variable X is said to be a Poisson random variable if its probability
mass function is given by:
$$\Pr(X = k) = \frac{e^{-\mu}\mu^{k}}{k!}$$
Please derive the mathematical expressions for the expected value, E[X], and variance, Var(X),
of the Poisson random variable. Please clearly express each step of the derivation.
Ans:
1. By definition, $E[X] = \sum_{k=0}^{\infty} k \Pr(X = k)$.
2. Substituting the pmf gives $E[X] = \sum_{k=0}^{\infty} k\,\frac{e^{-\mu}\mu^{k}}{k!}$.
3. The k = 0 term is zero, and $k/k! = 1/(k-1)!$ for $k \ge 1$, so
$E[X] = e^{-\mu}\sum_{k=1}^{\infty}\frac{\mu^{k}}{(k-1)!} = \mu\,e^{-\mu}\sum_{k=1}^{\infty}\frac{\mu^{k-1}}{(k-1)!}$.
4. The remaining sum is the Taylor series expansion of $e^{\mu}$, and $e^{-\mu}\,e^{\mu} = 1$.
5. Therefore, we have $E[X] = \mu$ (equivalently $\lambda t$ when $\mu = \lambda t$).
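The question also asks for Var(X), which the answer above does not reach. One standard route, sketched here as an assumption about the intended method, goes through the factorial moment E[X(X-1)]. Here μ is the Poisson mean from the problem statement (the same quantity as λt).

```latex
\begin{aligned}
E[X(X-1)] &= \sum_{k=0}^{\infty} k(k-1)\,\frac{e^{-\mu}\mu^{k}}{k!}
           = \mu^{2} e^{-\mu} \sum_{k=2}^{\infty} \frac{\mu^{k-2}}{(k-2)!}
           = \mu^{2}, \\
E[X^{2}]  &= E[X(X-1)] + E[X] = \mu^{2} + \mu, \\
\operatorname{Var}(X) &= E[X^{2}] - \bigl(E[X]\bigr)^{2}
           = \mu^{2} + \mu - \mu^{2} = \mu .
\end{aligned}
```

The k = 0 and k = 1 terms vanish because of the factor k(k-1), and the remaining sum is again the Taylor series of $e^{\mu}$.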
3. (20 points)
A study was performed of different predictors of low birthweight deliveries among 32,520
women in the Nurses’ Health Study.
The data in Table 2 were presented concerning the distribution of birthweight in the study:
a) If 20 women are randomly chosen from the study, what is the probability that exactly 2 will
have a low birthweight delivery (defined as < 2500 g)? (5 pts)
Ans:
Let X be the number of low birthweight deliveries among the 20 women. Each woman has a low
birthweight delivery with probability p = 5.7%, so X is binomial with n = 20 and p = 0.057:
$$\Pr(X = 2) = \binom{20}{2}(0.057)^{2}(0.943)^{18} \approx 0.215$$
b) What is the probability that at least 2 women will have a low birthweight delivery?(5 pts)
Ans:
The probability that at least 2 women have a low birthweight delivery is
1 − Pr(exactly 1) − Pr(none):
$$\Pr(X \ge 2) = 1 - \binom{20}{1}(0.057)^{1}(0.943)^{19} - \binom{20}{0}(0.057)^{0}(0.943)^{20} \approx 0.317$$
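The two binomial probabilities in parts a) and b) can be reproduced with the short Python sketch below. It assumes n = 20 and p = 0.057 as used in the answers; `binom_pmf` is an illustrative helper name.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.057  # 20 women, 5.7% chance of a low birthweight delivery

p_exactly_2 = binom_pmf(2, n, p)                             # part a)
p_at_least_2 = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)   # part b)

print(f"P(X = 2)  = {p_exactly_2:.3f}")   # 0.215
print(f"P(X >= 2) = {p_at_least_2:.3f}")  # 0.317
```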
An important risk factor for low birthweight delivery is maternal smoking during pregnancy
(MSMOK). The data in Table 5.6 were presented relating MSMOK to birthweight.
c) If 50 women are selected from the < 2500 g group, then what is the probability that at least 5
of them will have smoked during pregnancy? (5 pts)
Ans:
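Let X be the number of smokers among the 50 women. X is binomial with n = 50 and p = 0.4, so
$$\Pr(X \ge 5) = 1 - \binom{50}{0}0.6^{50}0.4^{0} - \binom{50}{1}0.6^{49}0.4^{1} - \binom{50}{2}0.6^{48}0.4^{2} - \binom{50}{3}0.6^{47}0.4^{3} - \binom{50}{4}0.6^{46}0.4^{4} \approx 1$$
The five subtracted terms sum to less than $10^{-6}$.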
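A quick numerical check of this tail probability, assuming the smoking probability 0.4 for the < 2500 g group that the answer uses:

```python
from math import comb

n, p = 50, 0.4  # 50 women from the < 2500 g group, P(smoked) = 0.4

# P(X <= 4): the five terms subtracted from 1 in the answer.
p_at_most_4 = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(5))
p_at_least_5 = 1 - p_at_most_4

print(f"P(X <= 4) = {p_at_most_4:.2e}")
print(f"P(X >= 5) = {p_at_least_5:.7f}")
```

The lower tail is tiny because the expected number of smokers is np = 20, far above 5.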
d) What is the probability that a woman has a low birthweight delivery if she smokes during
pregnancy? (Hint: Use Bayes’ rule.) (5 pts)
Ans:
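Let A be the event that a woman smokes during pregnancy and B the event that she has a low
birthweight delivery (< 2500 g). By Bayes' rule,
$$\Pr(B \mid A) = \frac{\Pr(A \mid B)\Pr(B)}{\Pr(A)} = \frac{1850 \times 0.40}{1850 \times 0.40 + 6289 \times 0.34 + 13537 \times 0.25 + 8572 \times 0.19 + 2272 \times 0.15} \approx 0.0899$$
Each product is (number of women in a birthweight group) × (proportion of that group who
smoked); the common factor 1/32,520 cancels between numerator and denominator.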
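The Bayes' rule arithmetic can be sketched in Python as follows. The group sizes and smoking proportions are the ones appearing in the answer's expression; the list layout and names are illustrative assumptions.

```python
# (number of women, proportion who smoked) per birthweight group,
# in the order used in the answer; the first entry is the < 2500 g group.
groups = [(1850, 0.40), (6289, 0.34), (13537, 0.25), (8572, 0.19), (2272, 0.15)]

total_women = sum(count for count, _ in groups)
expected_smokers = [count * rate for count, rate in groups]

# P(low birthweight | smoker) = smokers in the < 2500 g group / all smokers.
p_low_given_smoker = expected_smokers[0] / sum(expected_smokers)
# Unconditional P(low birthweight), for comparison.
p_low = groups[0][0] / total_women

print(f"total women          = {total_women}")            # 32520
print(f"P(low bw | smoker)   = {p_low_given_smoker:.4f}")  # 0.0899
print(f"P(low bw) overall    = {p_low:.4f}")               # 0.0569
```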
4. (10 points)
A study is conducted to test the hypothesis that people with glaucoma have higher-than-average
blood pressure. The study includes 200 people with glaucoma whose mean SBP is 140 mm Hg
with a standard deviation of 25 mm Hg.
a) Construct a 95% CI for the true mean SBP among people with glaucoma.
Ans:
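To construct the 95% CI for the true mean SBP, we use the following formula:
$$95\%\ \text{CI} = \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
where $\bar{x}$ is the sample mean, σ is the standard deviation of the sample, n is the sample
size, and $z_{\alpha/2}$ is the upper α/2 percentile of the standard normal distribution. In this
example $\bar{x}$ is 140 mm Hg, σ is 25 mm Hg, and n is 200 people. For a 95% CI, α/2 = 0.025,
so $z_{\alpha/2} = 1.96$. This gives 140 ± 1.96 × 25/√200 ≈ 140 ± 3.5, i.e. (136.5, 143.5) mm Hg.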
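The interval can be evaluated numerically with the sketch below, which assumes the normal-based formula and the values stated in the answer (mean 140, standard deviation 25, n = 200, z = 1.96).

```python
from math import sqrt

mean_sbp = 140.0   # sample mean, mm Hg
sd_sbp = 25.0      # sample standard deviation, mm Hg
n = 200            # sample size
z = 1.96           # upper 2.5% point of the standard normal

half_width = z * sd_sbp / sqrt(n)
lower, upper = mean_sbp - half_width, mean_sbp + half_width

print(f"95% CI = ({lower:.1f}, {upper:.1f}) mm Hg")  # (136.5, 143.5)
```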
b) If the average SBP for people of comparable age is 130 mm Hg, is there an association
between glaucoma and blood pressure?
Ans:
To determine if there is an association between glaucoma and blood pressure, we can use the
confidence interval to see if the average SBP for people of comparable age, which is 130 mm
Hg in this case falls in the interval. If 130 mm Hg is within the confidence interval, we cannot
rule out the possibility that the average blood pressure for people with glaucoma is the same as
for people of comparable age. If 130 mm Hg is outside the confidence interval, we can reject the
null hypothesis that the mean SBP for people with glaucoma is the same as for people of
comparable age at the 5% significance level.
The interval from a) is (136.5, 143.5), and 130 mm Hg lies below its lower limit. We therefore
conclude that the mean SBP among people with glaucoma is higher than the average level for
people of comparable age; that is, there is an association between glaucoma and blood pressure.
5. (20 points)
The estimation of allele probabilities is essential for the closer quantitative identification of
inheritance. It requires the probabilistic formulation of the applied model of inheritance. The
hereditary disease phenylketonuria (PKU) is a useful example. PKU follows a recessive form of
inheritance. Suppose there are two alleles at a gene locus denoted by a and A where the possible
genotypes are (aa), (aA), and (AA). An individual will only be affected if the genotype aa
appears (i.e., a recessive form of inheritance).
a) Suppose the probability of an a allele is p. If people mate randomly, then what is the
probability of the (aa) genotype?
Ans:
If people mate randomly, the probability of the (aa) genotype is $p \times p = p^{2}$.
b) Provide a point estimate and 95% CI for the probability of having the PKU phenotype.
Ans:
To obtain a point estimate for the probability of having the PKU phenotype, we use the
proportion of sampled individuals with the PKU clinical phenotype: $\hat{p}$ = 11/10,000 = 0.0011.
To obtain a 95% confidence interval for this estimate, we use the normal approximation to the
binomial distribution. The standard error of the proportion is
$SE = \sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p}$ is the point estimate and n is the sample
size (10,000), so SE ≈ 0.00033. The 95% confidence interval is $\hat{p} \pm 1.96 \times SE$.
Thus, the 95% confidence interval for the probability of having the PKU phenotype is
approximately (0.0005, 0.0017).
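The normal-approximation interval can be evaluated directly from the stated formula. The sketch below assumes 11 affected people out of 10,000, as in the answer; `wald_ci` is an illustrative helper name.

```python
from math import sqrt

def wald_ci(successes, trials, z=1.96):
    """Point estimate and normal-approximation (Wald) CI for a proportion."""
    p_hat = successes / trials
    se = sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat, p_hat - z * se, p_hat + z * se

p_hat, lower, upper = wald_ci(11, 10_000)
print(f"p_hat = {p_hat:.4f}")                    # 0.0011
print(f"95% CI = ({lower:.5f}, {upper:.5f})")    # (0.00045, 0.00175)
```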
confidence interval for this estimate, we can use the same method as in b). The standard error of
the proportion is $\sqrt{p(1-p)/n}$. Then, the 95% confidence interval is $p \pm 1.96 \times SE$,
where SE is the standard error; the 95% confidence interval for the a allele frequency is
(0.027, 0.037).
As an experiment, 10,000 people are completely genotyped, of whom 10 have the (aa) genotype,
630 have the (aA) genotype [i.e., either (aA) or (Aa)], and 9360 have the (AA) genotype.
d) Assuming the two alleles of an individual are independent random variables, provide a point
estimate and a 95% CI for the a allele frequency p.
Ans:
We use the proportion of a alleles among all sampled alleles as the estimate of p. Each person
carries two alleles, so the point estimate is (2 × number of (aa) genotypes + number of (aA)
genotypes) / (2 × total number of individuals) = (2 × 10 + 630) / (2 × 10,000) = 650/20,000 =
0.0325. To obtain a 95% confidence interval for this estimate, we can use the same method as in
c). The standard error of the proportion is $SE = \sqrt{\hat{p}(1-\hat{p})/n}$. Because the two
alleles of each individual are assumed independent, n = 20,000 alleles, so SE ≈ 0.00125. The 95%
confidence interval is $\hat{p} \pm 1.96 \times SE$, so the 95% confidence interval for the a
allele frequency is approximately (0.030, 0.035).
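The allele-count estimate can be evaluated with the sketch below. It assumes, as the question states, that the two alleles of each person are independent, so the 10,000 genotyped people contribute 20,000 allele observations; variable names are illustrative.

```python
from math import sqrt

n_aa, n_aA, n_AA = 10, 630, 9360           # genotype counts from the experiment

alleles_total = 2 * (n_aa + n_aA + n_AA)   # two alleles per person
a_alleles = 2 * n_aa + n_aA                # each (aa) has two a alleles, each (aA) has one

p_hat = a_alleles / alleles_total
se = sqrt(p_hat * (1 - p_hat) / alleles_total)
lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"p_hat = {p_hat:.4f}")                  # 0.0325
print(f"95% CI = ({lower:.4f}, {upper:.4f})")  # (0.0300, 0.0350)
```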
e) Does genotyping a population provide more accurate estimates of p than obtained by only
having the clinical phenotype? Why or why not?
Ans:
Genotyping a population provides more accurate estimates of p than are obtained from the
clinical phenotype alone. Genotyping counts the a alleles directly, including those carried by
unaffected (aA) heterozygotes, so p can be estimated without assuming random mating. With only
the clinical phenotype, just the frequency of the (aa) genotype is observed, and p must be
inferred as the square root of that frequency under the random-mating assumption in a), which
may not hold. The phenotype-based estimate also rests on a small number of affected people, so
it is less precise.
6. (100 points)
a) Draw 1000 random samples from a binomial distribution with parameters n = 100 and p
= .01. Also draw another 1000 random samples from a Poisson distribution whose expected
number of events over n trials is μ = λt = np. Consider an approximation to both
distributions by a normal distribution with mean = np = 1 and variance = np(1-p). Draw 100
random samples from the normal approximation (using the rnorm function). Plot the three
frequency distributions on the same graph, and compare the results. Do you think the normal
approximation is adequate here?
b) Answer the same question as described above with the same value of p while n is now equal
to 1000.
The R functions that can be used in this question are listed below:
Binomial distribution:
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Binomial.html
Poisson distribution:
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Normal.html
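The question asks for a simulation in R. As a language-independent cross-check, the Python sketch below compares the exact binomial and Poisson probabilities with the probability a normal distribution of the same mean and variance assigns to each integer (with a continuity correction). It does not replace the requested R simulation or plot; the function names and the choice of printed range are illustrative assumptions.

```python
from math import comb, erf, exp, factorial, sqrt

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def pois_pmf(k, mu):
    return exp(-mu) * mu**k / factorial(k)

def normal_cdf(x, mean, var):
    return 0.5 * (1 + erf((x - mean) / sqrt(2 * var)))

def compare(n, p, k_max):
    """Rows of (k, binomial, Poisson, normal mass on [k-0.5, k+0.5]),
    plus the normal mass lying below -0.5 (impossible for a count)."""
    mu, var = n * p, n * p * (1 - p)
    rows = []
    for k in range(k_max + 1):
        normal_mass = normal_cdf(k + 0.5, mu, var) - normal_cdf(k - 0.5, mu, var)
        rows.append((k, binom_pmf(k, n, p), pois_pmf(k, mu), normal_mass))
    return rows, normal_cdf(-0.5, mu, var)

for n, k_max in [(100, 5), (1000, 20)]:
    rows, below_zero = compare(n, 0.01, k_max)
    print(f"n = {n}: normal mass below zero = {below_zero:.4f}")
    for k, b, po, no in rows:
        print(f"  k={k:2d}  binom={b:.4f}  pois={po:.4f}  normal={no:.4f}")
```

For n = 100 the mean is only 1, so the normal curve puts noticeable probability on negative counts and misses the skewness. For n = 1000 the mean is 10 and that misplaced mass becomes very small, while the binomial and Poisson values stay close to each other in both cases.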