Worksheet 5 BI 343: Analysis of Biological Data
BI 343: Analysis of Biological Data
Worksheet 5
PROBLEM 1: Hypothesis Testing
Can parents distinguish their own children by smell alone? To investigate, Porter and Moore (1981) gave
new T-shirts to children of nine mothers. Each child wore his or her shirt to bed for three consecutive
nights. During the day, from waking until bedtime, the shirts were kept in individually sealed plastic bags.
No scented soaps or perfumes were used during the study. Each mother was then given the shirt of her
child and that of another, randomly chosen child and asked to identify her own by smell. Eight of nine
mothers identified their children correctly. Use this study to answer the following questions, using a
two-sided test and a significance level of α = 0.05.
a. What is the appropriate null hypothesis? What is the alternative hypothesis?
The null hypothesis is that mothers cannot identify their own child's shirt by smell, so each mother
picks the correct shirt with probability 0.5 (H0: p = 0.5). The alternative hypothesis is that the
probability of a correct identification differs from 0.5 (HA: p ≠ 0.5).
b. What test statistic should you use?
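The test statistic is the observed number of correct identifications, X = 8 of n = 9 mothers, which is
compared with the binomial null distribution. A normal approximation gives a corresponding Z score,
although with n = 9 it is only rough:
p̂ = 8/9
p0 = 4.5/9 = 0.5
Z = (p̂ − p0) / sqrt(p0 (1 − p0) / n) = (8/9 − 0.5) / sqrt(0.5 (1 − 0.5) / 9)
Z ≈ 2.33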
c. The following figure shows the null distribution for the number of mothers out of nine guessing
correctly. The probability of each outcome is given above the bars.
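The null distribution described in part c can be recomputed as a minimal sketch, assuming each of the nine mothers guesses independently with probability 0.5 under the null hypothesis. The function name `binomial_null` is illustrative and not part of the worksheet.

```python
from math import comb

def binomial_null(n, p0):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Binomial(n, p0)."""
    return [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]

# Null distribution for the number of correct guesses out of nine mothers.
null_probs = binomial_null(9, 0.5)
for k, prob in enumerate(null_probs):
    print(f"{k} correct: {prob:.3f}")
```

The rounded probabilities for 0, 1, 8, and 9 correct guesses are the ones used in part d.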
d. What is the P-value for the test? Show your work.
P-Value = P[8]+P[9] + P[1]+P[0]
P-Value = 0.018 + 0.002 + 0.018 + 0.002
P-Value = 0.04
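The sum above can be checked with a short sketch. It assumes p0 = 0.5, so the null distribution is symmetric and the lower tail mirrors the upper tail; `two_sided_p_value` is a hypothetical helper name, not something defined in the worksheet.

```python
from math import comb

def two_sided_p_value(x, n):
    """Two-sided binomial P-value for p0 = 0.5 (symmetric null only)."""
    probs = [comb(n, k) * 0.5**n for k in range(n + 1)]
    hi = max(x, n - x)   # observed count, folded into the upper tail
    lo = n - hi          # mirror-image cutoff in the lower tail
    if hi == lo:         # observed value sits at the centre of the distribution
        return 1.0
    return sum(probs[hi:]) + sum(probs[:lo + 1])

# Eight of nine mothers correct: P[8] + P[9] + P[1] + P[0].
p_value = two_sided_p_value(8, 9)
print(f"P = {p_value:.4f}")
```

The unrounded value is slightly below the 0.04 obtained from the rounded bar heights, and it remains below α = 0.05.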
e. What is the appropriate conclusion?
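Because P = 0.04 is less than α = 0.05, we reject the null hypothesis. The data indicate that mothers
identify their own child's shirt by smell more often than expected by chance.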
f. As part of the analysis of these data, why would it be a good idea to calculate a 95% confidence
interval for the true proportion of correct identifications?
The P-value does not give the magnitude of the effect. A 95% confidence interval gives the range of
plausible values for the true proportion of correct identifications, so it shows both the estimated size
of the effect and how precisely the study measured it.
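As an illustration, the interval could be computed as follows. The worksheet does not say which method to use; this sketch assumes the Agresti-Coull approximation (add 2 successes and 2 failures) with z = 1.96, and the function name is hypothetical.

```python
from math import sqrt

def agresti_coull_ci(x, n, z=1.96):
    """Approximate 95% CI for a proportion (Agresti-Coull, assumed method)."""
    p_adj = (x + 2) / (n + 4)                  # adjusted proportion
    se = sqrt(p_adj * (1 - p_adj) / (n + 4))   # standard error of the adjusted proportion
    lower = max(0.0, p_adj - z * se)           # clip to the valid range [0, 1]
    upper = min(1.0, p_adj + z * se)
    return lower, upper

# Eight of nine mothers identified their child's shirt correctly.
lower, upper = agresti_coull_ci(8, 9)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

With only nine mothers the interval is wide, which is the kind of information the P-value alone does not convey.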
g. If you wanted to be 80% certain that the result obtained was properly rejecting the null hypothesis
(i.e., if you wanted Power to be 0.8), what sample size of mothers should have been tested? Show
your work. For the purposes of this calculation, use the difference between the proportion
correctly guessed under the null hypothesis (p0) and the observed proportion correctly guessed, as
D in the following formula:
n ≈ 8 p0 (1 − p0) / D^2
p0 = 4.5/9 = 0.5
D = observed proportion − p0 = 8/9 − 0.5 ≈ 0.389
n ≈ 8 p0 (1 − p0) / D^2
n ≈ 8 (0.5) (1 − 0.5) / (0.389)^2
n ≈ 2 / 0.151
n ≈ 13.2
Rounding up, at least 14 mothers should have been tested.
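The formula given in the question can also be evaluated directly. This is a minimal sketch that takes D as the observed proportion minus p0, as the question instructs; the function name is illustrative only.

```python
from math import ceil

def sample_size_for_power_80(p0, p_observed):
    """Approximate n for 80% power: n ≈ 8 p0 (1 - p0) / D^2."""
    d = p_observed - p0            # D: observed proportion minus null proportion
    return 8 * p0 * (1 - p0) / d**2

n_exact = sample_size_for_power_80(0.5, 8 / 9)
n_needed = ceil(n_exact)           # round up to a whole number of mothers
print(f"n ≈ {n_exact:.1f}, so test at least {n_needed} mothers")
```

Rounding up rather than to the nearest integer keeps the power at or above the 0.8 target.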
PROBLEM 2: P-values in Perspective (to be completed in Recitation Meeting)
1. Define the “P-value”:
The P-value is the probability, calculated from the null distribution, of obtaining a result at least as
extreme as the one observed if the null hypothesis were true.
2. After reading the ASA Statement on P-values, discuss with your table group what your take-away
messages are regarding P-values and statistical hypothesis testing. Write them down here (and
don’t just copy and paste every recommendation from the ASA statement):
A P-value does not show whether a hypothesis is true. Relying on a P-value threshold gives researchers
an incentive to engage in “P-hacking”. A P-value also tells you little by itself; it is only useful in the
context of the other parts of the study.
3. The next time you see/write the statement “there is a statistically significant difference…”, what
will you also want to look at?
The magnitude of the effect and the sample size.
PROBLEM 3: One-sided tests
A study is designed to test whether daughters resemble their fathers. In each trial of the study a participant
examines a photo of one girl and photos of two adult men, one of whom is the girl’s father. The participant
must guess which man is the father. If there is no daughter-father resemblance, then the probability that
the participant guesses correctly is only ½.
1. Unusually, the authors concluded that they should use a one-sided test to analyse these data (most
hypothesis testing is two-sided). Explain why you think they concluded this, and what the null
and alternative hypotheses would be in this circumstance.
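A one-sided test is appropriate because only one direction of departure from the null is plausible. If
daughters resemble their fathers, participants should guess correctly more than half the time, and
there is no reasonable way for resemblance to make them guess correctly less than half the time. The
null hypothesis is that participants pick the father with probability ½ (H0: p = ½). The alternative
hypothesis is that they pick the father with probability greater than ½ (HA: p > ½).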
2. It is recommended that one-sided tests should be used sparingly. Why do you think that is?
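A one-sided test puts all of α in one tail, so it is easier to reject the null hypothesis in the
predicted direction, and an effect in the opposite direction cannot be detected at all. It is only
justified when a result in the other direction is implausible or meaningless, and the direction has to
be chosen before looking at the data. In most studies either direction is possible, so a two-sided test
is the safer default.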