
RMI 8300

Assignment 1
Please show your work clearly to get full credit.

1) Use the Auto data set to answer the following. (A sketch of the relevant R commands appears after the question parts.)


(a) Use the lm() function to perform a simple linear regression with mpg as the
response and horsepower as the predictor. Use the summary() function to print the
results. Comment on the output.
(b) Plot the response and the predictor. Use the abline() function to display the least
squares regression line.
(c) Use the plot() function to produce diagnostic plots of the least squares
regression fit. Comment on any problems you see with the fit.
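
A minimal R sketch of the commands this question calls for, assuming the Auto data frame comes from the ISLR package (with variables mpg and horsepower as in that data set); it is a starting point, not a complete answer:

library(ISLR)                               # provides the Auto data frame (assumed source)

fit1 <- lm(mpg ~ horsepower, data = Auto)   # (a) simple linear regression
summary(fit1)                               # coefficient estimates, R-squared, F-statistic

plot(Auto$horsepower, Auto$mpg,
     xlab = "horsepower", ylab = "mpg")     # (b) response against the predictor
abline(fit1, col = "red")                   # least squares regression line

par(mfrow = c(2, 2))                        # (c) 2 x 2 grid of diagnostic plots
plot(fit1)                                  # residuals vs. fitted, Q-Q, scale-location, leverage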

2) Use the Auto data set to answer the following. (A sketch of the relevant R commands appears after the question parts.)


(a) Produce a scatterplot matrix which includes all of the variables
in the data set.
(b) Compute the matrix of correlations between the variables using the cor() function. You will need to exclude the name variable, which is qualitative.
(c) Use the lm() function to perform a multiple linear regression with mpg as the
response and all other variables except name as
the predictors. Use the summary() function to print the results. Comment on the
output.
(d) Use the plot() function to produce diagnostic plots of the linear regression fit.
Comment on any problems you see with the fit. Do the residual plots suggest any
unusually large outliers? Does the leverage plot identify any observations with unusually
high leverage?
(e) Use the * and : symbols to fit linear regression models with interaction effects.
Do any interactions appear to be statistically significant?
(f) Try a few different transformations of the variables, such as log(X), the square root of X, etc. Comment on your findings.
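
A minimal R sketch for this question, again assuming the ISLR Auto data frame; the particular interaction and transformation terms in (e) and (f) are illustrative choices, not required ones:

pairs(Auto)                                      # (a) scatterplot matrix of all variables

cor(subset(Auto, select = -name))                # (b) correlations, qualitative name excluded

fit2 <- lm(mpg ~ . - name, data = Auto)          # (c) mpg on all predictors except name
summary(fit2)

par(mfrow = c(2, 2))
plot(fit2)                                       # (d) diagnostic plots

# (e) '*' expands to main effects plus the interaction; ':' gives the interaction only
summary(lm(mpg ~ horsepower * weight, data = Auto))
summary(lm(mpg ~ year + origin + year:origin, data = Auto))

# (f) example transformations of a predictor
summary(lm(mpg ~ log(horsepower), data = Auto))
summary(lm(mpg ~ sqrt(horsepower), data = Auto))
summary(lm(mpg ~ horsepower + I(horsepower^2), data = Auto))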

3) Create simulated data and fit simple linear regression models to it. Make sure to use set.seed(1) prior to starting part (a) to ensure consistent results. (A sketch of the relevant R commands appears after the question parts.)
(a) Using the rnorm() function, create a vector, x, containing 100 observations
drawn from a N(0, 1) distribution. This represents a feature, X.
(b) Using the rnorm() function, create a vector, eps, containing 100 observations
drawn from a N(0, 0.25) distribution i.e. a normal distribution with mean zero and
variance 0.25.
(c) Using x and eps, generate a vector y according to the model

Y = −1 + 0.5X + ε.

What is the length of the vector y? What are the values of β0 and β1 in this linear model?
(d) Create a scatterplot displaying the relationship between x and
y. Comment on what you observe.
(e) Fit a least squares linear model to predict y using x. Comment on the model obtained. How do β̂0 and β̂1 compare to β0 and β1?
(f) Display the least squares line on the scatterplot obtained in (d). Draw the
population regression line on the plot, in a different color. Use the legend() command to
create an appropriate legend.
(g) Now fit a polynomial regression model that predicts y using x and x². Is there evidence that the quadratic term improves the model fit? Explain your answer.
(h) Repeat (a)–(f) after modifying the data generation process in such a way that
there is less noise in the data. The model should remain the same. You can do this by
decreasing the variance of the normal distribution used to generate the error term in
(b). Describe your results.
(i) Repeat (a)–(f) after modifying the data generation process in such a way that
there is more noise in the data. The model should remain the same. You can do this by
increasing the variance of the normal distribution used to generate the error term in (b).
Describe your results.
(j) What are the confidence intervals for β0 and β1 based on the original data set,
the noisier data set, and the less noisy data set? Comment on your results.
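
A minimal R sketch for parts (a) through (g) and (j), assuming the model written in part (c), Y = −1 + 0.5X + ε, so that β0 = −1 and β1 = 0.5; note that a variance of 0.25 in (b) corresponds to a standard deviation of 0.5:

set.seed(1)
x   <- rnorm(100)                          # (a) X drawn from N(0, 1)
eps <- rnorm(100, mean = 0, sd = 0.5)      # (b) variance 0.25, so sd = 0.5
y   <- -1 + 0.5 * x + eps                  # (c) beta0 = -1, beta1 = 0.5 (assumed model)
length(y)

plot(x, y)                                 # (d) scatterplot of x against y

fit3 <- lm(y ~ x)                          # (e) least squares fit
summary(fit3)

abline(fit3, col = "red")                  # (f) fitted least squares line
abline(a = -1, b = 0.5, col = "blue")      # population regression line
legend("topleft", legend = c("least squares", "population"),
       col = c("red", "blue"), lty = 1)

summary(lm(y ~ x + I(x^2)))                # (g) add a quadratic term

# (h)/(i): repeat the steps above with a smaller / larger sd in the rnorm() call for eps

confint(fit3)                              # (j) confidence intervals for beta0 and beta1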

4) Perform the following commands in R (a sketch of the follow-up fits appears after the question parts):


> set.seed(1)
> x1 = runif(100)
> x2 = 0.5 * x1 + rnorm(100) / 10
> y = 2 + 2 * x1 + 0.3 * x2 + rnorm(100)

(a) The last line corresponds to creating a linear model in which y is a function of x1
and x2. Write out the form of the linear model. What are the regression
coefficients?
(b) What is the correlation between x1 and x2? Create a scatterplot displaying the
relationship between the variables.
(c) Using this data, fit a least squares regression to predict y using x1 and x2.
Describe the results obtained. What are β̂0, β̂1, and β̂2? How do these relate to the true β0, β1, and β2? Can you reject the null hypothesis H0: β1 = 0? How about the null hypothesis H0: β2 = 0?
(d) Now fit a least squares regression to predict y using only x1. Comment on your results. Can you reject the null hypothesis H0: β1 = 0?
(e) Now fit a least squares regression to predict y using only x2. Comment on your results. Can you reject the null hypothesis H0: β1 = 0?
(f) Do the results obtained in (c)–(e) contradict each other? Explain your answer.
(g) Now suppose we obtain one additional observation, which was unfortunately
mismeasured.

> x1 = c(x1, 0.1)
> x2 = c(x2, 0.8)
> y = c(y, 6)
Re-fit the linear models from (c) to (e) using this new data. What effect does this new observation have on each of the models? In each model, is this observation an outlier? A high-leverage point? Both? Explain your answers.
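
A minimal sketch of the fits this question asks for, run after the commands shown above (and re-run after the extra observation in (g) is appended):

cor(x1, x2)                     # (b) correlation between the two predictors
plot(x1, x2)                    # scatterplot of x1 against x2

fit.both <- lm(y ~ x1 + x2)     # (c) regression on both predictors
summary(fit.both)

fit.x1 <- lm(y ~ x1)            # (d) regression on x1 only
summary(fit.x1)

fit.x2 <- lm(y ~ x2)            # (e) regression on x2 only
summary(fit.x2)

par(mfrow = c(2, 2))            # (g) diagnostics after re-fitting with the new observation;
plot(lm(y ~ x1 + x2))           #     the leverage plot helps flag the added point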
