R Assignment
Problem set policies. Please provide concise, clear answers for each question. Note that only writing the result of a calculation
(e.g., "SD = 3.3") without explanation is not sufficient. For problems involving R, include the code in your solution, along
with any plots.
Please submit your problem set via Canvas as a PDF, along with the R Markdown source file.
We encourage you to discuss problems with other students (and, of course, with the course head and the TFs), but you must
write your final answer in your own words. Solutions prepared "in committee" are not acceptable. If you do collaborate with
classmates on a problem, please list your collaborators on your solution.
Unit 4
Problem 1. (20 points)
In vertebrates, sweet and savory (“umami”) tastes are sensed by receptors termed T1Rs. Most vertebrates
have three T1Rs, with T1R2 and T1R3 receptors working together to detect sugars (carbohydrates) and
artificial sweeteners, while the T1R1-T1R3 heterodimer mediates umami taste. However, even though birds
lack T1R2 genes, several avian species display high behavioral affinity for nectar or sweet fruit. Receptor
expression studies in hummingbirds revealed that the ancestral umami receptor (T1R1-T1R3) has been
repurposed to detect sugars.1
Researchers investigated whether T1R1-T1R3 function would dictate hummingbird taste behavior. In a
series of field tests, hummingbirds were presented simultaneously with two filled containers, one
containing test stimuli and a second containing sucrose. The test stimuli included aspartame, erythritol,
water, and sucrose. Aspartame is an artificial sweetener that tastes sweet to humans, but is not detected by
hummingbird T1R1-T1R3, while erythritol is an artificial sweetener that is known to activate T1R1-T1R3.
Data on how long a hummingbird drank from a particular container for a given trial, measured in
seconds, is in the file hummingbirds.Rdata. Variable names ending in 1 correspond to the test stimuli, while
names ending in 2 correspond to sucrose. For example, in the first field test comparing aspartame and
sucrose, a hummingbird drank from the aspartame container for 0.54 seconds and from the sucrose
container for 3.21 seconds.
Do the data suggest that T1R1-T1R3 play the described role in hummingbird taste behavior?
To answer this question, analyze the data for each set of trials: aspartame versus sucrose, erythritol versus
sucrose, water versus sucrose, and sucrose versus sucrose. Let α = 0.05. Write a conclusion summarizing
and interpreting the results, referencing numerical results (such as p-values) where appropriate.
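One plausible approach treats the two durations recorded in each trial as a pair and tests whether the mean difference is zero. The sketch below uses simulated numbers rather than the contents of hummingbirds.Rdata, and the names asp1 and asp2 are hypothetical stand-ins that follow the "1 = test stimulus, 2 = sucrose" naming rule. Whether a paired t-test is appropriate depends on the distribution of the differences, which should be checked on the real data.

```r
# Simulated stand-in for one set of trials (NOT the real data).
# asp1 / asp2 are hypothetical names: 1 = test stimulus, 2 = sucrose.
set.seed(1)
n_trials <- 20
asp1 <- rexp(n_trials, rate = 1 / 0.6)  # seconds at the test-stimulus container
asp2 <- rexp(n_trials, rate = 1 / 3.0)  # seconds at the sucrose container

# Each trial yields one duration per container, so the observations are paired.
paired_test <- t.test(asp1, asp2, paired = TRUE, conf.level = 0.95)

# Equivalent one-sample test on the within-trial differences.
diffs <- asp1 - asp2
one_sample_test <- t.test(diffs, mu = 0)

print(paired_test)
```

The same call would be repeated for each of the four comparisons, with each p-value compared against α = 0.05.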
a) How many new enrollees do they need for each group (old or new interface) to detect an effect size of
0.5 surveys per enrollee, if the desired power level is 80%? Let α = 0.05.
b) Explain the effect of increasing α on the power of the test. What is one disadvantage to increasing
α, from a decision-making standpoint?
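A sample size calculation of this kind can be sketched with power.t.test() from base R. The sketch assumes a two-sided, two-sample t-test and uses a placeholder standard deviation of 1, since the value to use comes from the problem statement. It also compares power at two significance levels for a fixed sample size.

```r
# ASSUMPTION: sd = 1 is a placeholder; substitute the standard deviation
# given in the problem. A two-sample, two-sided t-test is also assumed.
assumed_sd <- 1

size_calc <- power.t.test(delta = 0.5, sd = assumed_sd, sig.level = 0.05,
                          power = 0.80, type = "two.sample",
                          alternative = "two.sided")
n_per_group <- ceiling(size_calc$n)  # round up to whole enrollees per group

# Power at a fixed sample size for a given significance level.
power_at <- function(alpha) {
  power.t.test(n = n_per_group, delta = 0.5, sd = assumed_sd,
               sig.level = alpha)$power
}
power_05 <- power_at(0.05)
power_10 <- power_at(0.10)

cat("n per group:", n_per_group, "\n")
cat("power at alpha = 0.05:", round(power_05, 3), "\n")
cat("power at alpha = 0.10:", round(power_10, 3), "\n")
```

Rounding n up rather than to the nearest integer keeps the achieved power at or above the target.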
Unit 5
Problem 3. (60 points)
Caffeine is the world’s most widely used stimulant, with approximately 80% consumed in the form of
coffee. Suppose a study was conducted to investigate the relationship between coffee consumption and
exercise. Participants were randomly recruited from the undergraduate and graduate student
populations of universities in the Boston/Cambridge area. Participants were asked to report the number
of hours they spent per week on moderate (e.g., brisk walking) and vigorous (e.g., strenuous sports and
jogging) exercise. Based on these data, the researchers estimated the total hours of metabolic equivalent
tasks (MET) per week, a value always greater than 0. The file coffee_exercise.Rdata contains simulated
MET data for the study participants, based on the amount of coffee consumed. The consumption groups are
labeled A - E.
– A: 1 cup or less of caffeinated coffee consumed per week
– B: 2 to 6 cups of caffeinated coffee consumed per week
– C: 1 cup of caffeinated coffee consumed per day
– D: 2 to 3 cups of caffeinated coffee consumed per day
– E: 4 or more cups of caffeinated coffee consumed per day
a) Create a plot that shows the association between MET score and coffee consumption. Describe
what you see.
b) Conduct an analysis to determine whether the average physical activity level varies among the
different levels of coffee consumption.
i. Assess whether the assumptions for the analysis method are reasonably satisfied.
ii. Summarize the conclusions and comment on the generalizability of the study results.
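Comparing a numerical response across five groups is commonly handled with side-by-side boxplots followed by a one-way ANOVA. The sketch below uses simulated data, and the column names met and group are hypothetical because the variable names in coffee_exercise.Rdata are not given here. It shows the plot, the F-test, two common assumption checks, and Bonferroni-adjusted pairwise comparisons.

```r
# Simulated stand-in for coffee_exercise.Rdata (NOT the real data).
# Column names `met` and `group` are hypothetical.
set.seed(2)
groups <- factor(rep(LETTERS[1:5], each = 30))
coffee <- data.frame(
  group = groups,
  met   = rlnorm(150, meanlog = 3 + 0.05 * as.integer(groups), sdlog = 0.4)
)

# (a) Side-by-side boxplots of MET by consumption group.
boxplot(met ~ group, data = coffee,
        xlab = "Coffee consumption group", ylab = "MET (hours per week)")

# (b) One-way ANOVA: H0 is that all five group means are equal.
fit <- aov(met ~ group, data = coffee)
anova_table <- summary(fit)[[1]]
p_value <- anova_table[["Pr(>F)"]][1]

# (b)(i) Assumption checks: similar group SDs, roughly normal residuals.
group_sds <- tapply(coffee$met, coffee$group, sd)
sd_ratio  <- max(group_sds) / min(group_sds)
qqnorm(residuals(fit))
qqline(residuals(fit))

# Follow-up pairwise comparisons, only meaningful if the F-test rejects H0.
pairwise <- pairwise.t.test(coffee$met, coffee$group,
                            p.adjust.method = "bonferroni")
print(anova_table)
```

If the residuals are strongly right-skewed, which is plausible for a strictly positive measure such as MET, a log transformation of the response is one option to consider.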
Unit 7
Problem 5. (200 points)
In Units 6 and 7, you have become familiar with the Prevention of REnal and Vascular END-stage Disease
(PREVEND) study, which took place between 2003 and 2006 in the Netherlands. Clinical and
demographic information for 500 individuals are stored as prevend.samp in the oibiostat package.
The PREVEND data were mainly used throughout the Unit 7 lectures to demonstrate one application of
multiple regression: estimating the association between a response variable and primary predictor of interest
while adjusting for confounders. Unit 7, Lab 3 discusses a model for the association of RFFT score
with statin use that adjusts for age, educational level, and presence of cardiovascular disease. This question
uses the PREVEND data in the context of explanatory model building.
Suppose that you have accepted a request to do some consulting work for a friend. Your task is to develop
a prediction model for RFFT score based on the following possible predictor variables and the data in
prevend.samp.
Variable    Description
Age         age in years
Gender      gender, coded 0 for males and 1 for females
Education   highest level of education
DM          diabetes status, coded 0 for absent and 1 for present
Statin      statin use, coded 0 for non-users and 1 for users
Smoking     smoking, coded 0 for non-smokers and 1 for smokers
BMI         body mass index, in kg/m^2
FRS         Framingham risk score, measure of risk for cardiovascular event within 10 years
The variable Education is coded 0 for primary school, 1 for lower secondary education, 2 for higher
secondary school, and 3 for university. A higher FRS indicates higher risk of a cardiovascular event.
Your friend has requested that your final model have no more than two predictor variables. Additionally,
your friend would like you to predict the mean RFFT score for a female individual of age 55 with a
university education, no diabetes, no statin use, who is not a smoker, has BMI of 24, and FRS of 5. Use
only the information necessary to make a prediction from your model.
In your solution, briefly explain the work done at each step of developing the final model and evaluate the
final model’s strengths and weaknesses.
Please consider the following sections for your solution:
Data Exploration
Initial Model Fitting
Model Comparison
Model Assessment
Conclusions
R code and visualizations
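The mechanics of comparing candidate models and predicting a mean response can be sketched as follows. The data are simulated, not taken from prevend.samp, and the response column name RFFT is an assumption. The pair Age and Education is used purely for illustration and is not a recommendation for the final model. Education is treated as a factor because its codes are categories.

```r
# Simulated stand-in for prevend.samp (NOT the real data).
# The predictor pair Age + Education is illustrative only.
set.seed(3)
n <- 500
sim <- data.frame(
  Age       = round(runif(n, 35, 80)),
  Education = factor(sample(0:3, n, replace = TRUE), levels = 0:3)
)
sim$RFFT <- 100 - 1.0 * sim$Age +
  8 * as.integer(as.character(sim$Education)) + rnorm(n, sd = 20)

# Candidate models with one and two predictors.
fit_one <- lm(RFFT ~ Age, data = sim)
fit_two <- lm(RFFT ~ Age + Education, data = sim)

# Adjusted R-squared penalizes added predictors, so it suits model comparison.
adj_r2 <- c(one = summary(fit_one)$adj.r.squared,
            two = summary(fit_two)$adj.r.squared)

# Residuals versus fitted values, to check linearity and constant variance.
plot(fitted(fit_two), residuals(fit_two),
     xlab = "Fitted RFFT", ylab = "Residual")
abline(h = 0, lty = 2)

# Predicted mean response; only the model's own predictors are supplied.
new_person <- data.frame(Age = 55, Education = factor(3, levels = 0:3))
pred <- predict(fit_two, newdata = new_person, interval = "confidence")
print(adj_r2)
print(pred)
```

Using interval = "confidence" gives an interval for the mean RFFT score at those predictor values, which matches a request to predict a mean rather than an individual score.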
Unit 8
Problem 6. (200 points)
Biological ornamentation refers to features that are primarily decorative, such as the elaborate tail feathers
of a peacock. The evolution of ornamentation in males has been extensively researched; there are many
studies exploring how male ornamentation functions as a signal of phenotypic and/or genetic quality to
potential mates. In contrast, there are few studies investigating female ornamentation.2
Some biologists have hypothesized that there is strong natural selection against overly conspicuous female
ornaments. Bright or colorful plumage in females might be expected to increase the incidence of predation
on nests for species in which females incubate eggs. Female ornamentation might also undergo positive
selection, functioning in sexual signaling like male ornamentation, and indicating desirable qualities such
as high immune function.
The data in the file rubythroats.Rdata are from a study of 83 female rubythroats, a bird species in which
both males and females exhibit a brightly colored red patch on the throat and breast (referred to as a “bib”).
In rubythroats, females incubate the eggs, while males provide food to females to facilitate uninterrupted
incubation.
– survival: records whether the bird survived to return to the nesting site the subsequent year, yes
if the female was observed and no if the female was not observed
– weight: weight of the bird, measured in grams
– wing.length: wing length of the bird, measured in millimeters
– tarsus.length: tarsus (i.e., leg) length of the bird, measured in millimeters
– first.clutch.size: number of eggs in the first clutch laid during the first year that the bird was
observed
– nestling.fate: whether the nestlings from the first clutch survived to fledging (Fledged) or were lost
to predation (Predated)
– second.clutch: whether the bird laid a second clutch during the first year that the bird was observed,
recorded as Yes for laying a second clutch and No for otherwise
– carotenoid.chroma: a measure of the abundance of red carotenoid pigment in feathers, as measured
from a sample of four feathers taken from the center of the bird’s bib. Larger numbers indicate higher
levels of pigment in the feathers and a more saturated red color.
– bib.area: the total area of the bird’s bib, measured in millimeters squared
– total.brightness: a measure of bib brightness, calculated from spectrometer analyses. Larger num-
bers indicate a brighter red color.
You will be conducting an analysis of the results in order to investigate how bib attributes and other
phenotypic characteristics of female birds are associated with measures of fitness.
a) Fit a model to predict nestling fate from female bib characteristics (carotenoid chroma, bib area, total
brightness) and female body characteristics (weight, wing length, tarsus length). Identify the slope
coefficients significant at α = 0.10, and provide an interpretation of these coefficients in the
context of the data.
b) Investigate the factors associated with whether a female lays a second clutch during the first year that
she was observed.
i. Is there evidence of a significant association between nestling fate and whether a female lays a second
clutch? If so, report the direction of association.
2 Freeman-Gallant, et al., J. Evol. Biol. (2014) 27: 982-991, doi:10.1111/jeb.12369.
ii. Fit a model to predict whether a female lays a second clutch from nestling fate and bib
characteristics. Identify the two predictors that are most statistically significantly associated
with the response variable.
The two predictors most statistically significantly associated with laying a second clutch are
total brightness (p = 0.030) and nestling fate (p = 0.0015).
iii. Fit a new model to predict whether a female lays a second clutch using the two predictors
identified in part ii. and their interaction. Interpret the model coefficients in the context of
the data.
c) Investigate the factors associated with whether a female survives to return to the nesting site the
subsequent year.
i. Fit a model to predict survival from bib characteristics, female body characteristics, first clutch size,
and whether a second clutch was laid. Identify factors that are positively associated with survival
for the observed birds.
ii. Fit a new model with only the significant predictors from the previous model; let α = 0.10.
Comment on whether this model is preferable to the one fit in part i.
For parts iii. and iv., use the better parsimonious model of the ones fit in parts i. and ii.
iii. Compare the odds of survival for a female who laid 5 eggs in her first clutch to the odds of
survival for a female who laid 3 eggs in her first clutch, if the females are physically identical
and both laid a second clutch.
iv. Suppose female A has bib area 350 mm^2, total brightness of 35, carotenoid chroma 0.90,
tarsus length of 19.5 mm, wing length 51 mm, weighs 10.8 g, lays 4 eggs in her first clutch,
and lays a second clutch. Female B has bib area 300 mm^2, total brightness of 20, carotenoid
chroma 0.85, tarsus length of 19.0 mm, wing length 50 mm, weighs 10.9 g, lays 3 eggs in her
first clutch, and lays a second clutch. Compare the odds of survival for females A and B.
d) Biological fitness refers to how successful an organism is at surviving and reproducing. Based on
the results of your analysis, briefly discuss whether female ornamentation seems beneficial for
fitness in this bird species. Limit your response to at most ten sentences. You do not need to reference
specific numerical results/models from the analysis.
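Binary responses such as survival are typically modeled with logistic regression, where each slope is a change in log-odds and exponentiating gives an odds ratio. The sketch below uses simulated data, not rubythroats.Rdata. It borrows the variable names listed above, but the two-predictor model and all coefficient values are illustrative assumptions.

```r
# Simulated stand-in for the rubythroats data (NOT the real data).
# The choice of predictors and the true coefficients are illustrative only.
set.seed(4)
n <- 83
sim <- data.frame(
  first.clutch.size = sample(3:6, n, replace = TRUE),
  total.brightness  = rnorm(n, mean = 30, sd = 6)
)
log_odds <- -2 + 0.4 * sim$first.clutch.size + 0.01 * sim$total.brightness
sim$survival <- factor(ifelse(runif(n) < plogis(log_odds), "yes", "no"),
                       levels = c("no", "yes"))

# glm() models the probability of the SECOND factor level ("yes" here).
fit <- glm(survival ~ first.clutch.size + total.brightness,
           data = sim, family = binomial)

# Odds ratio per one-unit increase in each predictor, others held fixed.
odds_ratios <- exp(coef(fit))

# Odds ratio for 5 eggs versus 3 eggs: a two-unit difference.
or_5_vs_3 <- exp(2 * coef(fit)["first.clutch.size"])

# Same quantity from predicted log-odds for two otherwise identical birds.
bird_a <- data.frame(first.clutch.size = 5, total.brightness = 30)
bird_b <- data.frame(first.clutch.size = 3, total.brightness = 30)
log_or <- predict(fit, bird_a, type = "link") -
  predict(fit, bird_b, type = "link")

print(summary(fit)$coefficients)
print(or_5_vs_3)
```

When two birds differ on several predictors, the difference in predicted log-odds from predict(..., type = "link") gives the log odds ratio directly, without summing coefficient terms by hand.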