Assignment 1
Please show your work clearly to receive full credit.
3) Create simulated data and fit simple linear regression models to it. Make sure to use
set.seed(1) prior to starting part (a) to ensure consistent results.
(a) Using the rnorm() function, create a vector, x, containing 100 observations
drawn from a N(0, 1) distribution. This represents a feature, X.
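For reference, a minimal sketch of this step in R (assuming the seed is set immediately beforehand, as instructed above):
> set.seed(1)        # fix the random seed so results are reproducible
> x = rnorm(100)     # 100 draws from N(0, 1); rnorm() defaults to mean = 0, sd = 1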
(b) Using the rnorm() function, create a vector, eps, containing 100 observations
drawn from a N(0, 0.25) distribution, i.e., a normal distribution with mean zero and
variance 0.25.
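Note that rnorm() is parameterized by the standard deviation, not the variance, so a variance of 0.25 corresponds to sd = 0.5. A sketch:
> eps = rnorm(100, mean = 0, sd = sqrt(0.25))   # variance 0.25, i.e. sd = 0.5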
(c) Using x and eps, generate a vector y according to the model
Y = -1 + 0.5X + ε.
What is the length of the vector y? What are the values of β0 and β1 in this linear
model?
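A sketch of this step, assuming the model stated above (the intercept -1 and slope 0.5 are taken from the standard version of this exercise):
> y = -1 + 0.5 * x + eps    # population model: intercept -1, slope 0.5, error eps
> length(y)                 # x and eps each have 100 elements, so y has length 100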
(d) Create a scatterplot displaying the relationship between x and
y. Comment on what you observe.
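One way to produce the plot in base R (the title is optional):
> plot(x, y, main = "y versus x")   # scatterplot of the simulated data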
(e) Fit a least squares linear model to predict y using x. Comment on the model
obtained. How do β̂0 and β̂1 compare to β0 and β1?
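A sketch of the fit (the object name lm.fit is a placeholder):
> lm.fit = lm(y ~ x)     # least squares fit of y on x
> summary(lm.fit)        # coefficient estimates, standard errors, and p-values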
(f) Display the least squares line on the scatterplot obtained in (d). Draw the
population regression line on the plot, in a different color. Use the legend() command to
create an appropriate legend.
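One way to overlay both lines, assuming the fit from (e) is stored in lm.fit and the population intercept and slope are those of the stated model (the colors and legend position are arbitrary choices):
> plot(x, y)
> abline(lm.fit, col = "red")              # least squares line
> abline(a = -1, b = 0.5, col = "blue")    # population regression line
> legend("topleft", legend = c("Least squares", "Population"),
+        col = c("red", "blue"), lty = 1)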
(g) Now fit a polynomial regression model that predicts y using x and x². Is there
evidence that the quadratic term improves the model fit? Explain your answer.
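A sketch of the quadratic fit, assuming lm.fit from (e) is still in the workspace; I() protects x^2 inside the formula, and poly(x, 2) is an equivalent alternative:
> lm.fit2 = lm(y ~ x + I(x^2))   # add a quadratic term
> summary(lm.fit2)               # check the p-value on the x^2 coefficient
> anova(lm.fit, lm.fit2)         # or compare the two nested models directly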
(h) Repeat (a)–(f) after modifying the data generation process in such a way that
there is less noise in the data. The model should remain the same. You can do this by
decreasing the variance of the normal distribution used to generate the error term in
(b). Describe your results.
(i) Repeat (a)–(f) after modifying the data generation process in such a way that
there is more noise in the data. The model should remain the same. You can do this by
increasing the variance of the normal distribution used to generate the error term in (b).
Describe your results.
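For both (h) and (i), only the error standard deviation from step (b) changes; the model and the vector x are reused as before. A sketch, with the sd values chosen purely for illustration:
> eps_low = rnorm(100, sd = 0.05)     # less noise than the original sd = 0.5
> y_low = -1 + 0.5 * x + eps_low
> lm.fit_low = lm(y_low ~ x)
>
> eps_high = rnorm(100, sd = 1)       # more noise than the original
> y_high = -1 + 0.5 * x + eps_high
> lm.fit_high = lm(y_high ~ x)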
(j) What are the confidence intervals for β0 and β1 based on the original data set,
the noisier data set, and the less noisy data set? Comment on your results.
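Confidence intervals for the coefficients can be read off with confint(), applied to each of the three fits (object names as in the sketches above):
> confint(lm.fit)         # original data
> confint(lm.fit_low)     # less noisy data
> confint(lm.fit_high)    # noisier data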
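The R commands that generate x1, x2, and y for the following problem are not reproduced above. Parts (a)–(g) below appear to assume a setup along the following lines; the specific distributions and coefficients shown here are assumptions for illustration, not necessarily the ones used in the original commands:
> set.seed(1)
> x1 = runif(100)                          # assumed: x1 uniform on (0, 1)
> x2 = 0.5 * x1 + rnorm(100) / 10          # assumed: x2 built from x1, so the two are correlated
> y = 2 + 2 * x1 + 0.3 * x2 + rnorm(100)   # "the last line": y as a linear function of x1 and x2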
(a) The last line corresponds to creating a linear model in which y is a function of x1
and x2. Write out the form of the linear model. What are the regression
coefficients?
(b) What is the correlation between x1 and x2? Create a scatterplot displaying the
relationship between the variables.
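A sketch of the two commands needed here:
> cor(x1, x2)      # sample correlation between the two predictors
> plot(x1, x2)     # scatterplot of x2 against x1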
(c) Using this data, fit a least squares regression to predict y using x1 and x2.
Describe the results obtained. What are β̂0, β̂1, and β̂2? How do these relate to the
true β0, β1, and β2? Can you reject the null hypothesis H0 : β1 = 0? How about the null
hypothesis H0 : β2 = 0?
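A sketch of the joint fit (the object name fit.both is a placeholder):
> fit.both = lm(y ~ x1 + x2)
> summary(fit.both)    # estimates, standard errors, and p-values for both predictors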
(d) Now fit a least squares regression to predict y using only x1. Comment on your
results. Can you reject the null hypothesis H0 : β1 = 0?
(e) Now fit a least squares regression to predict y using only x2. Comment on your
results. Can you reject the null hypothesis H0 : β1 = 0?
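Sketches of the two single-predictor fits for (d) and (e) (object names are placeholders):
> fit.x1 = lm(y ~ x1)
> summary(fit.x1)
> fit.x2 = lm(y ~ x2)
> summary(fit.x2)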
(f) Do the results obtained in (c)–(e) contradict each other? Explain your answer.
(g) Now suppose we obtain one additional observation, which was unfortunately
mismeasured.
> x1 = c(x1, 0.1)
> x2 = c(x2, 0.8)
> y = c(y, 6)
Re-fit the linear models from (c) to (e) using this new data. What effect does this new
observation have on each of the models? In each model, is this observation an
outlier? A high-leverage point? Both? Explain your answers.
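After appending the new observation, the three models can be re-fit as in (c)–(e); the leverage and studentized residual of the added point (the last element, e.g. the 101st if there were 100 original observations) help decide whether it is an outlier, a high-leverage point, or both. A sketch:
> fit.both = lm(y ~ x1 + x2)
> fit.x1 = lm(y ~ x1)
> fit.x2 = lm(y ~ x2)
> n = length(y)                  # index of the added observation
> hatvalues(fit.both)[n]         # leverage in the joint model
> rstudent(fit.both)[n]          # studentized residual in the joint model
> par(mfrow = c(2, 2)); plot(fit.both)   # standard diagnostic plots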