STAT 31631 - Statistical Modeling - Assignment01
University of Kelaniya
Academic Year - 2022/2023
STAT 31613 – Statistical Modeling
Assignment 01
Submit all outputs and R Markdown files with your answer sheet. Create a separate
R Markdown file for each question. The answer script should be submitted as a PDF
document.
1. Use the "mtcars" dataset, which is available in R. This dataset comprises fuel consumption
and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
i. Load the "mtcars" dataset and display the first few rows. Summarize the key statistics of
the dataset.
ii. Plot a scatterplot of mpg against hp. Add a title and labels for the x and y axes.
iii. Describe the observed relationship between mpg and hp based on the scatterplot.
iv. Fit a simple linear regression model with mpg as the dependent variable and hp as the
independent variable.
v. Display the summary of the regression model. Interpret the coefficients, R-squared, and
the p-value.
vi. Calculate and interpret the Root Mean Squared Error (RMSE) of the model.
vii. Conduct an independent two-sample t-test to compare the mean mpg of cars with
automatic (am = 0) and manual (am = 1) transmissions, and interpret the result.
viii. Conduct a one-way ANOVA to compare the mean mpg across different numbers of
cylinders (4, 6, and 8), and interpret the result.
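The steps in parts i to viii can be sketched in R roughly as follows. This is a minimal starting point under stated assumptions, not a model answer. The object names (fit, rmse, tt, aov_fit) and the plot labels are illustrative choices. RMSE is assumed here to mean the square root of the mean squared residual; the residual standard error printed by summary() divides by n - 2 instead, so the two values differ slightly. t.test() is left at its default Welch (unequal-variance) form.

```r
# Sketch for Question 1 using the built-in mtcars data.
data(mtcars)

# i. First few rows and summary statistics
head(mtcars)
summary(mtcars)

# ii. Scatterplot of mpg against hp (title and labels are illustrative)
plot(mtcars$hp, mtcars$mpg,
     main = "Fuel consumption vs. horsepower",
     xlab = "Gross horsepower (hp)",
     ylab = "Miles per gallon (mpg)")

# iv-v. Simple linear regression: mpg as response, hp as predictor
fit <- lm(mpg ~ hp, data = mtcars)
summary(fit)  # coefficients, R-squared, p-values

# vi. RMSE, assumed here to be sqrt(mean(residual^2))
rmse <- sqrt(mean(residuals(fit)^2))
rmse

# vii. Two-sample t-test of mpg by transmission type (Welch test by default;
#      pass var.equal = TRUE for the pooled-variance version)
tt <- t.test(mpg ~ am, data = mtcars)
tt

# viii. One-way ANOVA of mpg across cylinder counts; cyl is converted to a
#       factor so that it is treated as a grouping variable, not a number
aov_fit <- aov(mpg ~ factor(cyl), data = mtcars)
summary(aov_fit)
```

Whether the pooled or the Welch t-test is appropriate depends on whether equal variances in the two groups is a reasonable assumption; state the choice in the interpretation.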
i. Describe any observed relationships between the Score and each predictor variable.
ii. Fit a multiple linear regression model with Score as the dependent variable and
Hours_studied, Attendance, Study_style, and Sleep_hours as independent variables.
iii. Display the summary of the regression model. Interpret the coefficients, R-squared, and
p-values.
iv. Conduct t-tests to assess the significance of individual predictor variables in predicting
student scores in a multiple linear regression model.
v. Conduct an ANOVA test to determine if there is a significant effect of the predictor
variables on the response variable.
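The workflow in parts i to v can be sketched in R as below. The data set for this question is not reproduced here, so the data frame students is a simulated placeholder that only makes the sketch runnable. Its values, the sample size, and the Study_style levels ("Group", "Solo") are assumptions and should be replaced by the data supplied with the assignment. Only the column names come from the question.

```r
# PLACEHOLDER DATA: simulated purely so the sketch runs end to end.
# Replace `students` with the data set supplied for this question.
set.seed(1)
n <- 60
students <- data.frame(
  Hours_studied = runif(n, 0, 20),
  Attendance    = runif(n, 50, 100),
  Study_style   = factor(sample(c("Group", "Solo"), n, replace = TRUE)),
  Sleep_hours   = runif(n, 4, 9)
)
students$Score <- 30 + 2 * students$Hours_studied +
  0.2 * students$Attendance + rnorm(n, sd = 5)

# i. Relationships between Score and each predictor
num_vars <- c("Score", "Hours_studied", "Attendance", "Sleep_hours")
cor(students[, num_vars])
pairs(students[, num_vars])
boxplot(Score ~ Study_style, data = students)

# ii-iii. Multiple linear regression and its summary
fit <- lm(Score ~ Hours_studied + Attendance + Study_style + Sleep_hours,
          data = students)
summary(fit)

# iv. t-tests for the individual coefficients (estimate, SE, t, p-value)
coef(summary(fit))

# v. ANOVA: overall F-test against the intercept-only model
null_fit <- lm(Score ~ 1, data = students)
anova(null_fit, fit)
```

Calling anova(fit) on the full model alone gives sequential (Type I) F-tests, one term at a time in the order written in the formula, which answers a different question from the overall F-test shown above.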
a) The last line corresponds to creating a linear model in which 𝑦 is a function of 𝑥1 and 𝑥2.
Write out the form of the linear model.
What are the regression coefficients?
b) What is the correlation between 𝑥1 and 𝑥2? Create a scatterplot displaying the relationship
between the variables.
c) Using this data, fit a least squares regression to predict 𝑦 using 𝑥1 and 𝑥2. Describe the
results obtained. What are 𝛽̂0, 𝛽̂1, and 𝛽̂2? How do these relate to the true 𝛽0, 𝛽1, and 𝛽2?
Can you reject the null hypothesis 𝐻0: 𝛽1 = 0? How about the null hypothesis 𝐻0: 𝛽2 = 0?
d) Now fit a least squares regression to predict 𝑦 using only 𝑥1. Comment on your results. Can
you reject the null hypothesis 𝐻0: 𝛽1 = 0?
e) Now fit a least squares regression to predict 𝑦 using only 𝑥2. Comment on your results. Can
you reject the null hypothesis 𝐻0: 𝛽1 = 0?
f) Do the results obtained in (c)–(e) contradict each other? Explain your answer.
g) Now suppose we obtain one additional observation, which was unfortunately mismeasured.
Re-fit the linear models from (c) to (e) using this new data. What effect does this new
observation have on each of the models? In each model, is this observation an outlier? A
high-leverage point? Both? Explain your answers.
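Parts b) to g) can be approached along the lines of the R sketch below. The commands that generate 𝑥1, 𝑥2, and 𝑦 and the values of the additional observation are not reproduced here, so everything in the data-generating part (seed, sample size, coefficients, and the added point) is a hypothetical stand-in and should be replaced by the code given with the question. The helper names fit_models and diagnose_last are also illustrative.

```r
# HYPOTHETICAL DATA: stand-in for the assignment's own simulation code.
set.seed(42)
n <- 50
d <- data.frame(x1 = runif(n))
d$x2 <- 0.5 * d$x1 + rnorm(n, sd = 0.1)  # x2 is collinear with x1 by construction
d$y  <- 1 + d$x1 + d$x2 + rnorm(n)

# b) Correlation and scatterplot of the two predictors
cor(d$x1, d$x2)
plot(d$x1, d$x2, xlab = "x1", ylab = "x2", main = "x1 vs. x2")

# c)-e) Fit the three models: both predictors, x1 only, x2 only
fit_models <- function(dat) {
  list(both    = lm(y ~ x1 + x2, data = dat),
       x1_only = lm(y ~ x1, data = dat),
       x2_only = lm(y ~ x2, data = dat))
}
before <- fit_models(d)
lapply(before, function(f) coef(summary(f)))  # estimates, SEs, t, p-values

# g) Append one additional observation (hypothetical values) and refit
d_new <- rbind(d, data.frame(x1 = 0.2, x2 = 1.0, y = 5))
after <- fit_models(d_new)
lapply(after, function(f) coef(summary(f)))

# Leverage and studentized residual of the last (added) observation.
# Hat values sum to the number of coefficients p, so their mean is p / n.
diagnose_last <- function(fit) {
  n_obs <- nobs(fit)
  p <- length(coef(fit))
  c(leverage          = unname(hatvalues(fit)[n_obs]),
    mean_leverage     = p / n_obs,
    studentized_resid = unname(rstudent(fit)[n_obs]))
}
diag_tbl <- sapply(after, diagnose_last)
diag_tbl
```

As a common rule of thumb, a leverage value far above the mean leverage p/n suggests a high-leverage point, and a studentized residual larger than about 3 in absolute value suggests an outlier. The same observation can be classified differently in each of the three models.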