Assignment 3

Uploaded by hsarpong15
Copyright © All Rights Reserved
RMI 8300

Assignment 3
Please show your work clearly to get full credit.

1. We will now consider the Boston housing data set, from the MASS library.

(a) Based on this data set, provide an estimate for the population mean of medv.
Call this estimate $\hat{\mu}$.

(b) Provide an estimate of the standard error of $\hat{\mu}$. Interpret this result.


Hint: We can compute the standard error of the sample mean by dividing the sample
standard deviation by the square root of the number of observations.
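A minimal sketch of (a) and (b) in R, assuming the Boston data frame from the MASS package (a recommended package bundled with standard R installations). The names mu.hat and se.hat are illustrative.

```r
library(MASS)  # provides the Boston data frame

# (a) Sample mean of medv as the estimate of the population mean
mu.hat <- mean(Boston$medv)

# (b) Standard error of the sample mean: sample SD divided by sqrt(n)
n <- nrow(Boston)
se.hat <- sd(Boston$medv) / sqrt(n)

mu.hat
se.hat
```

Since medv is recorded in thousands of dollars, a standard error of roughly 0.41 would mean the sample mean is expected to differ from the population mean by about $410 in a typical sample of this size.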

(c) Now estimate the standard error of 𝜇̂ using the bootstrap. How does this
compare to your answer from (b)?
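One way to carry out (c), sketched with boot() from the boot package. The statistic function mean.fn is a hypothetical helper; boot() calls it with the data and a vector of resampled row indices. The seed and R = 1000 replicates are arbitrary choices.

```r
library(MASS)  # Boston data
library(boot)  # boot(): nonparametric bootstrap

# Statistic for boot(): mean of medv over the resampled row indices
mean.fn <- function(data, index) mean(data$medv[index])

set.seed(1)  # arbitrary seed so the resamples are reproducible
boot.mean <- boot(Boston, mean.fn, R = 1000)

# Bootstrap SE = standard deviation of the R bootstrap replicates
se.boot <- sd(boot.mean$t[, 1])
se.boot
boot.mean  # the printed summary also reports this as "std. error"
```

The bootstrap value should land close to the formula-based answer from (b); any difference reflects Monte Carlo variation from the finite number of resamples.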

(d) Based on your bootstrap estimate from (c), provide a 95% confidence interval
for the mean of medv. Compare it to the results obtained using t.test(Boston$medv).
Hint: You can approximate a 95% confidence interval using the formula
$[\hat{\mu} - 2\,\mathrm{SE}(\hat{\mu}),\ \hat{\mu} + 2\,\mathrm{SE}(\hat{\mu})]$.
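A sketch of (d) under the same assumptions as a bootstrap-SE step (hypothetical mean.fn helper, arbitrary seed, R = 1000): it forms the approximate interval from the hint and prints the interval from t.test() alongside it.

```r
library(MASS)
library(boot)

mean.fn <- function(data, index) mean(data$medv[index])
set.seed(1)
boot.mean <- boot(Boston, mean.fn, R = 1000)

mu.hat <- boot.mean$t0            # statistic evaluated on the original data
se.boot <- sd(boot.mean$t[, 1])   # bootstrap standard error
ci.boot <- c(lower = mu.hat - 2 * se.boot, upper = mu.hat + 2 * se.boot)
ci.boot

# t-based 95% interval for comparison
t.test(Boston$medv)$conf.int
```

t.test() uses a t quantile of about 1.965 rather than 2, so the two intervals should nearly coincide, differing mainly through the estimated SE.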

(e) Based on this data set, provide an estimate, $\hat{\mu}_{med}$, for the median value of medv
in the population.

(f) We now would like to estimate the standard error of $\hat{\mu}_{med}$. Unfortunately, there
is no simple formula for computing the standard error of the median. Instead, estimate
the standard error of the median using the bootstrap. Comment on your findings.
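A sketch of (e) and (f): median() gives the point estimate, and the same boot() pattern, with a hypothetical median.fn helper, gives a bootstrap standard error where no simple formula exists. Seed and replicate count are arbitrary.

```r
library(MASS)
library(boot)

# (e) Sample median of medv
mu.med <- median(Boston$medv)

# (f) Bootstrap SE of the median
median.fn <- function(data, index) median(data$medv[index])
set.seed(1)
boot.med <- boot(Boston, median.fn, R = 1000)
se.med <- sd(boot.med$t[, 1])

mu.med
se.med
```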

(g) Based on this data set, provide an estimate for the tenth percentile of medv in
Boston suburbs. Call this quantity $\hat{\mu}_{0.1}$. (You can use the quantile() function.)

(h) Use the bootstrap to estimate the standard error of $\hat{\mu}_{0.1}$. Comment on your
findings.
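(g) and (h) follow the same pattern. This sketch uses quantile() with probs = 0.1 both for the point estimate and inside a hypothetical q10.fn statistic passed to boot(); quantile() is left at its default interpolation type.

```r
library(MASS)
library(boot)

# (g) Tenth percentile of medv
mu.0.1 <- quantile(Boston$medv, probs = 0.1)

# (h) Bootstrap SE of the tenth percentile
q10.fn <- function(data, index) quantile(data$medv[index], probs = 0.1)
set.seed(1)
boot.q10 <- boot(Boston, q10.fn, R = 1000)
se.q10 <- sd(boot.q10$t[, 1])

mu.0.1
se.q10
```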

2. Consider the use of a logistic regression model to predict the probability of
default using income and balance on the Default data set. In particular, we will now
compute estimates for the standard errors of the income and balance logistic regression
coefficients in two different ways: (1) using the bootstrap, and (2) using the standard
formula for computing the standard errors in the glm() function. Do not forget to set a
random seed before beginning your analysis.

(a) Using the summary() and glm() functions, determine the estimated standard
errors for the coefficients associated with income and balance in a multiple
logistic regression model that uses both predictors.
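A sketch of (a), assuming the ISLR package (which provides the Default data set, with columns default, income and balance) is installed. The seed is set because the question asks for it; glm() itself involves no randomness.

```r
library(ISLR)  # assumed installed; provides the Default data set

set.seed(1)  # seed requested by the question
glm.fit <- glm(default ~ income + balance, data = Default, family = binomial)

# The "Std. Error" column of the coefficient table holds the formula-based SEs
se.glm <- summary(glm.fit)$coefficients[c("income", "balance"), "Std. Error"]
se.glm
```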

(b) Write a function, boot.fn(), that takes as input the Default data set as well as an
index of the observations, and that outputs the coefficient estimates for income
and balance in the multiple logistic regression model.

(c) Use the boot() function together with your boot.fn() function to estimate the
standard errors of the logistic regression coefficients for income and balance.
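A sketch of (b) and (c) together, under the same ISLR assumption. boot.fn follows the signature the question describes (the data set plus an index of observations); R = 100 is a hypothetical choice to keep the run short, and a larger R would give more stable estimates.

```r
library(ISLR)  # assumed installed; provides Default
library(boot)

# (b) Refit the model on the resampled rows; return the two coefficients
boot.fn <- function(data, index) {
  fit <- glm(default ~ income + balance, data = data, family = binomial,
             subset = index)
  coef(fit)[c("income", "balance")]
}

# (c) Bootstrap SEs: column-wise SD of the R replicate coefficient vectors
set.seed(1)
boot.out <- boot(Default, boot.fn, R = 100)
se.boot <- apply(boot.out$t, 2, sd)
names(se.boot) <- c("income", "balance")
se.boot
```

For (d), se.boot can be compared directly with the standard errors reported by summary() in part (a).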

(d) Comment on the estimated standard errors obtained using the glm() function
and using your bootstrap function.

3. We saw that the cv.glm() function can be used in order to compute the LOOCV test
error estimate. Alternatively, one could compute those quantities using just the glm()
and predict.glm() functions, and a for loop. You will now take this approach in order to
compute the LOOCV error for a simple logistic regression model on the Weekly data set.

(a) Fit a logistic regression model that predicts Direction using Lag1 and Lag2.

(b) Fit a logistic regression model that predicts Direction using Lag1 and Lag2 using
all but the first observation.
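A sketch of (a) and (b), assuming the ISLR package supplies Weekly. Because Direction is a factor with levels Down and Up, glm() with family = binomial models the probability of the second level, Up.

```r
library(ISLR)  # assumed installed; provides Weekly

# (a) Logistic regression using every observation
glm.all <- glm(Direction ~ Lag1 + Lag2, data = Weekly, family = binomial)

# (b) Same model with the first observation left out
glm.minus1 <- glm(Direction ~ Lag1 + Lag2, data = Weekly[-1, ], family = binomial)

coef(glm.all)
coef(glm.minus1)
```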

(c) Use the model from (b) to predict the direction of the first observation. You can
do this by predicting that the first observation will go up if
$P(\mathrm{Direction} = \mathrm{Up} \mid \mathrm{Lag1}, \mathrm{Lag2}) > 0.5$. Was this observation correctly classified?
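A sketch of (c) under the same assumptions: predict() with type = "response" returns the fitted probability that observation 1 moves up, which is then thresholded at 0.5.

```r
library(ISLR)  # assumed installed; provides Weekly

glm.minus1 <- glm(Direction ~ Lag1 + Lag2, data = Weekly[-1, ], family = binomial)

# Posterior probability of "Up" for the held-out first observation
p1 <- predict(glm.minus1, newdata = Weekly[1, ], type = "response")
pred1 <- if (p1 > 0.5) "Up" else "Down"

pred1
Weekly$Direction[1]            # true direction
pred1 == Weekly$Direction[1]   # TRUE if correctly classified
```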

(d) Write a for loop from i = 1 to i = n, where n is the number of observations in the
data set, that performs each of the following steps:

i. Fit a logistic regression model using all but the ith observation to predict
Direction using Lag1 and Lag2.
ii. Compute the posterior probability of the market moving up for the ith
observation.
iii. Use the posterior probability for the ith observation in order to predict
whether or not the market moves up.
iv. Determine whether or not an error was made in predicting the direction
for the ith observation. If an error was made, then indicate this as a 1, and
otherwise indicate it as a 0.

(e) Take the average of the n numbers obtained in (d) iv in order to obtain the
LOOCV estimate for the test error. Comment on the results.
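The loop in (d) and the average in (e) can be sketched as follows, again assuming ISLR supplies Weekly. Each pass refits the model without observation i, so the loop performs n separate fits; the comments map to steps i to iv.

```r
library(ISLR)  # assumed installed; provides Weekly

n <- nrow(Weekly)
errors <- integer(n)  # errors[i] will be 1 if observation i is misclassified

for (i in seq_len(n)) {
  # i. fit on all but the ith observation
  fit <- glm(Direction ~ Lag1 + Lag2, data = Weekly[-i, ], family = binomial)
  # ii. posterior probability that the market moves up for observation i
  p <- predict(fit, newdata = Weekly[i, ], type = "response")
  # iii. predict "Up" when that probability exceeds 0.5
  pred <- if (p > 0.5) "Up" else "Down"
  # iv. record 1 for an error, 0 otherwise
  errors[i] <- as.integer(pred != as.character(Weekly$Direction[i]))
}

# (e) LOOCV estimate of the test error rate
loocv.error <- mean(errors)
loocv.error
```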

4. Perform cross-validation on a simulated data set.

(a) Generate a simulated data set as follows:

set.seed(100)
x <- rnorm(100)                 # predictor: 100 draws from N(0, 1)
y <- x - 2 * x^2 + rnorm(100)   # response: quadratic in x plus N(0, 1) noise

In this data set, what is n and what is p? Write out the model used to generate the data
in equation form.
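Read directly off the generating code, and offered only as a sketch of the equation form: rnorm() draws from a standard normal by default, so both X and the noise term are N(0, 1), there are 100 draws, and the response is quadratic in X.

```latex
Y = X - 2X^{2} + \varepsilon, \qquad X \sim N(0, 1), \quad \varepsilon \sim N(0, 1), \quad n = 100
```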

(b) Create a scatterplot of X against Y. Comment on what you find.
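A possible sketch for (b), regenerating the data as in (a) with x assigned explicitly. Because the true mean of Y is X - 2X^2, one would expect a curved, inverted-U pattern rather than a straight line.

```r
set.seed(100)
x <- rnorm(100)
y <- x - 2 * x^2 + rnorm(100)

# X on the horizontal axis, Y on the vertical axis
plot(x, y, xlab = "X", ylab = "Y", main = "Simulated data")
```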

(c) Set a random seed, and then compute the LOOCV errors that result from fitting
the following four models using least squares:

Note: you may find it helpful to use the data.frame() function to create a single
data set containing both X and Y.
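The four models themselves are not listed in this copy of the assignment. Assuming, as in the ISLR version of this exercise, that they are least squares polynomial fits of degree 1 to 4 in X, a sketch using cv.glm() from the boot package might look like this. glm() with its default gaussian family reproduces the least squares fit, and cv.glm() with its default K equal to n performs LOOCV.

```r
library(boot)  # provides cv.glm()

# Regenerate the data from (a) and store it in one data frame
set.seed(100)
x <- rnorm(100)
y <- x - 2 * x^2 + rnorm(100)
sim <- data.frame(x = x, y = y)

set.seed(1)  # the seed requested in (c)
loocv <- numeric(4)
for (d in 1:4) {
  # Assumed model d: polynomial of degree d in x, fitted by least squares
  fit <- glm(y ~ poly(x, d, raw = TRUE), data = sim)
  # delta[1] is the raw LOOCV estimate of the test MSE
  loocv[d] <- cv.glm(sim, fit)$delta[1]
}
names(loocv) <- paste("degree", 1:4)
loocv
```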

(d) Repeat (c) using another random seed, and report your results. Are your results
the same as what you got in (c)? Why?
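A sketch of (d) under the same polynomial-degree assumption: it repeats the LOOCV computation under two arbitrary seeds and compares the results. LOOCV holds out every observation exactly once, so no random split is involved and the estimates should agree up to floating-point rounding.

```r
library(boot)  # provides cv.glm()

# Same simulated data as in (a)
set.seed(100)
x <- rnorm(100)
y <- x - 2 * x^2 + rnorm(100)
sim <- data.frame(x = x, y = y)

seeds <- c(1, 2)  # two arbitrary seeds to compare
loocv.by.seed <- matrix(NA_real_, nrow = length(seeds), ncol = 4,
                        dimnames = list(paste("seed", seeds), paste("degree", 1:4)))
for (s in seq_along(seeds)) {
  set.seed(seeds[s])
  for (d in 1:4) {
    fit <- glm(y ~ poly(x, d, raw = TRUE), data = sim)
    loocv.by.seed[s, d] <- cv.glm(sim, fit)$delta[1]
  }
}
loocv.by.seed

# TRUE when the two rows agree up to floating-point tolerance
all.equal(loocv.by.seed[1, ], loocv.by.seed[2, ])
```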

(e) Which of the models in (c) had the smallest LOOCV error? Is this what you
expected? Explain your answer.

(f) Comment on the statistical significance of the coefficient estimates that result
from fitting each of the models in (c) using least squares. Do these results agree with the
conclusions drawn based on the cross-validation results?
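For (f), a sketch that again assumes degree 1 to 4 polynomial fits. It uses lm(), which gives the same least squares estimates as glm() with the gaussian family, and collects the p-value column from each summary().

```r
# Same simulated data as in (a)
set.seed(100)
x <- rnorm(100)
y <- x - 2 * x^2 + rnorm(100)
sim <- data.frame(x = x, y = y)

# p-values for every coefficient in the degree 1-4 least squares fits
pvals <- lapply(1:4, function(d) {
  fit <- lm(y ~ poly(x, d, raw = TRUE), data = sim)
  summary(fit)$coefficients[, "Pr(>|t|)"]
})
names(pvals) <- paste("degree", 1:4)
pvals
```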
