Assignment 2 - HLTH 605b - Fall 2020 (100 Marks)

This document outlines an assignment for a public health statistics course. It contains three problems involving analysis of data using R and interpretation of results. Problem 1 involves analyzing a figure to make arguments about the relationship between age and blood pressure. Problem 2 addresses conceptual questions about linear regression and categorical variables. Problem 3 uses the whiteside dataset to: [1] plot and compare scatterplots; [2] fit and interpret a linear regression model; [3] overlay the regression line on a scatterplot and compare fits between groups; [4] obtain and interpret confidence intervals for a predicted value and for the slope. Students must include their R code as part of their answers.

Uploaded by David Okoduwa
Copyright © All Rights Reserved
Assignment 2 - HLTH 605b - Fall 2020 (100 marks)

Write up your own answers to the following questions. Also, where asked to use R, you must do so. Include
your R code as part of your answer, similar to how the code and subsequent results are presented in the
module notes. Submit your answers to the Assignment 2 dropbox in .pdf, .doc, or .docx format.

Problem 1 (20 marks)

Look at Fig. 3.1 on p. 35 of the Vittinghoff et al. course textbook.
a. (6 marks) Without utilizing any other evidence, such as statistical analysis results/output elsewhere in
the chapter or medical knowledge outside the class, provide a brief plausible argument on why age is
positively associated with systolic blood pressure (SBP) based only on what you see in Fig. 3.1.
b. (6 marks) Again, without utilizing any other evidence, now provide a brief plausible argument on why
age is not positively associated with systolic blood pressure (SBP) based only on what you see in Fig.
3.1.
c. (8 marks) What additional statistical evidence would you need, beyond Fig. 3.1, to make a more
convincing argument in the direction of “is positively associated” or, alternatively, “is not positively
associated” for the relationship between age and SBP?

Problem 2 (18 marks)

Answer the following conceptual questions connected to linear regression.
a. (10 marks)
(5 points) What do we mean by a curvilinear association between a numeric response and numeric
predictor, say as indicated in a scatterplot?
(5 points) Explain how we can still specify a linear regression model when such a curved relationship
exists.
b. (8 marks) Try to explain the difference between a nominal (unordered) categorical variable and an
ordinal (ordered) categorical variable, using one example from public health in each case to help support
your answer (and these examples should not be ones mentioned in your module notes or assigned
reading).

Problem 3 (62 marks)

Let’s go back to the whiteside dataset partially explored in the Module 2b notes. For this entire question,
for any plotting or computation you are asked to do, focus only on the part of this dataset where
the variable Insul is equal to ‘After’.
a. (12 marks) (4 points) Using R and the R function ggplot, produce one scatterplot with Temp on
the x-axis and Gas on the y-axis. (Again, remember to use only the data where Insul is ‘After’.) As
you did in Assignment 1, add your own centered title to this plot. (8 points) Next, compare what you
see in this plot with the scatterplot created for the ‘Before’ data in FIGURE 2b.1 in the Module 2b notes,
reporting any similarities and differences you observe between the two plots.

b. (18 marks)
(4 points) Using R, fit a simple linear regression of Gas (Y) on Temp (X).
(3 points) Write out the fitted simple linear regression model, and (5 points) then interpret it.
(6 points) Specifically, what would happen to the estimate of the slope of your fitted regression line
if the temperature data (Temp) were reported in degrees Fahrenheit instead of degrees Celsius?
c. (10 marks)
(5 points) Using R, re-create the plot from (3.a), but now overlay the fitted regression line. Again,
add a centered title to the plot.
(5 points) In words, explain how well the line fits the data points, and also describe how the fit for the
‘After’ data compares with that of the regression line fit to the ‘Before’ data shown in FIGURE 2b.2 in the
Module 2b notes.
d. (22 marks)
(13 marks) Using R, obtain (i) a 95% confidence interval for the model’s predicted value of Gas when
Temp has a value of 6.0 degrees Celsius, and (ii) a 99% confidence interval for the model’s slope.
(9 marks) Briefly interpret each of these two interval estimates.
