Demo0 Sol1
Name:
Student ID:
Signature:
Statistics Exam: 00001 2
1. Problem
Which of the following are TRUE?
(a) Let V(Y | X) = σ². Then, the estimate σ̂ measures the proportion of the variation explained by the regression model.
(b) As long as the mean function of a linear regression model includes the intercept, then ∑_{i=1}^n e_i = 0, where e_i are the regression residuals.
(d) Given β̂0, the ordinary least squares estimate of the intercept, its variance is V(β̂0 | X) = σ² x̄² / SS_x, where x̄ is the mean of the predictor X, SS_x is the sum of squares of X and σ² is the residual variance.
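As a sanity check for option (b), a minimal numerical sketch with simulated data (numpy assumed; the data are hypothetical): when the design matrix contains an intercept column, the OLS residuals sum to zero up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 2.0 + 1.5 * x + rng.normal(size=30)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

# Sum of residuals is 0 (up to numerical precision) because of the intercept
print(abs(residuals.sum()) < 1e-8)  # True
```

Dropping the intercept column from X breaks this property: the residuals are then only orthogonal to x, not to the constant vector.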
2. Problem
A multiple regression model of the following form is fitted to a data set:
y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + ε, ε ∼ N(0, σ²).
The model is fitted using the software R and the following summary output is obtained:
3. Problem
Which of the following are TRUE?
(a) If you have estimated coefficients from a fit of E(Y | X1 ) = β0 + β1 x1 , then you know
the sign of β1 in E(Y | X1 , X2 ) = β0 + β1 x1 + β2 x2 .
(b) We have a regression with mean function E(Y | X1 , X2 ) = β0 + β1 x1 + β2 x2 . Suppose
that the two terms X1 and X2 have sample correlation equal to 0. Then the value of
the slope of the regression for X2 on X1 is 0.
(c) If you fit a multiple regression with 15 data points and 3 predictors (including the intercept), then the “hat” matrix is 3 × 3.
        x0      x1      x2      x3
x0   0.524  −0.507  −0.19   −0.059
x1  −0.507   0.652   0.262   0.167
x2  −0.19    0.262   0.283   0.079
x3  −0.059   0.167   0.079   0.212
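For option (c) above, a minimal sketch with simulated data (numpy assumed): the hat matrix H = X(X⊤X)⁻¹X⊤ has one row and column per observation, so with 15 data points and 3 design columns it is 15 × 15, not 3 × 3 (only its trace equals the number of columns).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 15
# Design matrix: intercept plus two predictors (3 columns in total)
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])

# Hat matrix: projects y onto the column space of X
H = X @ np.linalg.inv(X.T @ X) @ X.T
print(H.shape)  # (15, 15)
```

The trace of H equals the number of design columns (here 3), which is where the “3” in the statement actually appears.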
4. Problem
Which of the following are TRUE?
(a) You fitted a regression model y = β0 + β1 x1 + β2 x2 + ε and plan to add a new regressor to improve it. Given the plots below (X3 against X2, and X4 against X2), adding X3 to the model might increase multicollinearity, whereas adding X4 might decrease multicollinearity issues:
[Scatter plots: X3 versus X2 and X4 versus X2]
(b) Consider the linear regression model E[Y | X1 , X2 , X3 ] = β0 + β1 x1 + β2 x2 + β3 x3 . To
compute the VIF of β̂1 you need the R2 of the regression model E[X1 | X2 , X3 ].
(c) In a regression model with a numeric predictor, a dummy predictor and without any
interaction term, there can be more than one slope, but only one intercept.
(d) Let X be a categorical variable with 3 levels (a, b, c) and consider the representation
using the corresponding dummies x1 , x2 and x3 . Consider the fitted linear regression
y = 0.85 − 0.5x2 − 0.35x3 + ε, where level “a” is the baseline. Then, −0.5 represents
the difference in the average Y between level “b” and “a”.
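Option (b) above describes the variance inflation factor. A minimal sketch with simulated data (numpy assumed; the correlation structure is hypothetical): VIF(β̂1) = 1/(1 − R²), where R² comes from regressing X1 on the remaining predictors.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
# x1 is correlated with x2 and x3, so its VIF should exceed 1
x1 = 0.8 * x2 + 0.5 * x3 + rng.normal(size=n)

# Regress x1 on the other predictors (with intercept) and take the R^2
Z = np.column_stack([np.ones(n), x2, x3])
coef, *_ = np.linalg.lstsq(Z, x1, rcond=None)
resid = x1 - Z @ coef
r2 = 1 - (resid @ resid) / ((x1 - x1.mean()) @ (x1 - x1.mean()))

vif = 1 / (1 - r2)
print(vif > 1)  # True: x1 is partly explained by x2 and x3
```

If x1 were uncorrelated with the other predictors, R² would be near 0 and the VIF near 1.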
5. Problem
Which of the following is TRUE?
(a) The leverage of the observation i is x_i⊤ (X⊤X)⁻¹ x_i, where X is the design matrix with rows x_i⊤.
(b) High leverage points are the observations that do not fit the model.
(c) Let V(Y | X) = Σ. Then, the Generalized Least Squares estimate is β̂ = (X⊤ΣX)⁻¹ X⊤Σy, where X is the design matrix and y the observed response vector.
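For option (a) above, a minimal numerical sketch with simulated data (numpy assumed): the quadratic form x_i⊤(X⊤X)⁻¹x_i coincides with the i-th diagonal entry of the hat matrix, and the leverages sum to the number of design columns.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
# Simple design: intercept plus one predictor
X = np.column_stack([np.ones(n), rng.normal(size=n)])
XtX_inv = np.linalg.inv(X.T @ X)

# Leverage of observation i: x_i^T (X^T X)^{-1} x_i
h = np.array([X[i] @ XtX_inv @ X[i] for i in range(n)])

# Same values as the diagonal of the hat matrix H = X (X^T X)^{-1} X^T
H = X @ XtX_inv @ X.T
print(np.allclose(h, np.diag(H)))  # True
```

High leverage measures how unusual x_i is among the predictor values; it says nothing by itself about whether the point fits the model, which is what the residual measures.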
6. Problem
Given the residual plots below, which of the following is FALSE?
[Residual plots (i) and (ii), each showing the residuals with a horizontal reference line at 0]
(a) Plot (i) suggests that the assumption of constant variance is not consistent with ob-
served data.
(b) Plot (ii) indicates some nonlinearity.
(c) Neither
7. Problem
Which of the following is TRUE?
(a) The backward stepwise selection searches through 2ᵖ possible models.
(b) Models with many parameters are always better for prediction than simple models with just a few parameters.
(c) The forward stepwise selection searches through 1 + p(p + 1)/2 models.
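The model counts in options (a) and (c) can be checked directly. A minimal sketch (pure Python; the function names are mine): best subset selection examines every subset of the p predictors, while forward stepwise fits the null model and then p − k candidate additions at step k.

```python
def best_subset_count(p):
    # Best subset considers every subset of the p predictors: 2^p models
    return 2 ** p

def forward_stepwise_count(p):
    # Null model, then p - k candidate additions at step k = 0, ..., p - 1,
    # which sums to 1 + p(p + 1)/2
    return 1 + sum(p - k for k in range(p))

print(best_subset_count(10))      # 1024
print(forward_stepwise_count(10)) # 56
```

Stepwise searches (forward or backward) visit only on the order of p² models, which is why they scale to large p where the 2ᵖ models of best subset selection are infeasible.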
8. Problem
Consider the output of the variable selection procedure carried out using the R function
regsubsets(). Which of the following is TRUE?
(a) Neither.
(b) The best 2-variable model contains only x1 and x2.
(c) The best model is the one including all the 6 predictors.
9. Problem
Which of the following is TRUE?
(a) We consider the logistic regression model with linear predictor 0.3 + 0.6x1 . For x1 = 0,
the estimated probability of success is smaller than 0.58.
(b) The standard error of the logistic regression coefficient βj is given by √(θ̂(1 − θ̂)), where θ̂ is the estimated probability of success.
(c) We consider the logistic regression model with linear predictor β0 +β1 x. Let [−0.39; 0.87]
be the 95%-confidence interval for β1 . In this case, a z-Test with significance level 1%
rejects the null hypothesis H0 : β1 = 0.
(d) We consider the logistic regression model with linear predictor 0.6 + 0.3x1 . For x1 = 2,
the estimated probability of success is higher than 0.64.
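The probabilities in options (a) and (d) follow from the inverse logit. A minimal sketch (standard library only): plug each linear predictor into 1/(1 + e^{−η}).

```python
import math

def sigmoid(eta):
    # Inverse logit: estimated success probability for linear predictor eta
    return 1 / (1 + math.exp(-eta))

# (a) linear predictor 0.3 + 0.6*x1 at x1 = 0 gives eta = 0.3
print(sigmoid(0.3) < 0.58)  # True: sigmoid(0.3) ≈ 0.574

# (d) linear predictor 0.6 + 0.3*x1 at x1 = 2 gives eta = 1.2
print(sigmoid(1.2) > 0.64)  # True: sigmoid(1.2) ≈ 0.769
```

A quick anchor point: η = 0 always maps to probability 0.5, and the function is increasing in η.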
10. Problem
Derive the global F-statistic for a multiple linear regression model and show that, when only one predictor is included, the F-statistic is equal to the square of the t-statistic of β̂1.
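The identity asked for in Problem 10 can be verified numerically. A minimal sketch with simulated data (numpy assumed): in simple linear regression, F = (SS_tot − SS_res)/(SS_res/(n − 2)) equals t² for the slope.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

# OLS fit of y on an intercept and x
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

df_res = n - 2
sigma2_hat = resid @ resid / df_res

# t-statistic for the slope beta_1
se_beta1 = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])
t_stat = beta_hat[1] / se_beta1

# Global F-statistic: (SS_reg / 1) / (SS_res / (n - 2))
ss_tot = (y - y.mean()) @ (y - y.mean())
ss_res = resid @ resid
f_stat = (ss_tot - ss_res) / (ss_res / df_res)

print(np.isclose(f_stat, t_stat ** 2))  # True
```

The equality is algebraic, not a coincidence of this data set: with one predictor the regression sum of squares has one degree of freedom and reduces to β̂1² times the corrected sum of squares of x, which is exactly the numerator of t².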