Econometrics Mock Exam - Solutions
Exercise 1
Consider the linear model with k regressors: Y = Xβ + e.
Under the strong OLS hypotheses:
a) Find 𝛽̂ , the OLS estimator of the parameters β.
b) Find the relation between the residuals 𝜀̂ and the errors e.
c) Prove that 𝛽̂ and 𝜀̂ are independent.
d) Where in a), b), c) do you need the hypothesis of normal errors? Describe in detail a procedure to test for this hypothesis.
Solution
For a), b), c) see the book and question 2 from problem set 1 (the key results are 𝛽̂ = (X′X)⁻¹X′Y, 𝜀̂ = Me with M = I − X(X′X)⁻¹X′, and Cov(𝛽̂, 𝜀̂) = 0). For d): the hypothesis of normal errors is needed in point c), because it makes 𝛽̂ and 𝜀̂ jointly normal, and jointly normal variables are independent if and only if they are uncorrelated. A procedure to test this hypothesis is the Jarque-Bera test, which checks whether the skewness and kurtosis of the OLS residuals are compatible with a normal distribution.
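As an illustration of the procedure asked for in d), a minimal R sketch (the data frame mydata and the regressors x1, x2 are hypothetical; the test is jarque.bera.test from the tseries package):

    library(tseries)                       # provides jarque.bera.test

    fit <- lm(y ~ x1 + x2, data = mydata)  # hypothetical OLS regression
    res <- residuals(fit)                  # OLS residuals

    # H0: the residuals are normally distributed (skewness 0, excess kurtosis 0).
    # A small p-value leads to rejecting the hypothesis of normal errors.
    jarque.bera.test(res)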
Exercise 2
Given the model
Y = Xβ + e
Under the weak OLS hypotheses and assuming Rβ = r, consider the restricted OLS estimator:
𝛽̂_RLS = 𝛽̃ = 𝛽̂ + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − R𝛽̂)
a) Find the expected value and variance of the restricted OLS estimator.
b) Is the variance larger, equal or smaller than that of the OLS estimator? Would your answer
change if 𝑅𝛽 ≠ 𝑟?
Solution
See question 1 from problem set 2. In b) the variance is (weakly) smaller regardless of whether Rβ = r holds, since the restriction was not used when deriving the variance; only the expected value, and hence the unbiasedness of the restricted estimator, depends on it.
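For reference, a short sketch of the standard results behind a) and b), assuming (as part of the weak OLS hypotheses used here) homoskedastic errors with Var(e|X) = σ²I:

\[
E[\tilde{\beta}] = \beta + (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}(r - R\beta)
\]
\[
Var(\tilde{\beta}) = \sigma^{2}\left[(X'X)^{-1} - (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}\right]
\]

The matrix subtracted in the variance is positive semidefinite and does not involve r, so Var(𝛽̃) is (weakly) smaller than Var(𝛽̂) = σ²(X′X)⁻¹ whether or not Rβ = r; only the expected value depends on the restriction, and 𝛽̃ is unbiased only if Rβ = r.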
Exercise 3
Explain what each command in the following R code does, what the output means, and what the code as a whole does. (You don’t need to explain the econometric theory behind the procedures, only which procedures are being implemented; 1 or 2 lines per command should be sufficient, plus comments on the output.)
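The code listing itself is not reproduced in these solutions; the sketch below is a plausible reconstruction based on the command-by-command description in the solution that follows (the URL is a placeholder, and the formula salary ~ . and object names such as reg.full are assumptions):

    remove(list = ls())                     # clear all previously defined objects

    library(foreign)                        # data import commands (read.dta, ...)
    library(leaps)                          # best subset selection (regsubsets, ...)

    # placeholder URL for the ceosal1 Stata file read from a web page
    ceosal1 <- read.dta("http://example.com/ceosal1.dta")

    sum(is.na(ceosal1))                     # total number of missing values, here 0

    # best subset selection with salary as the response
    reg.full <- regsubsets(salary ~ ., data = ceosal1)
    reg.full.summary <- summary(reg.full)
    reg.full.summary                        # star table: variables in the best model of each size

    reg.full.summary$bic                    # BIC of the best model of each size
    index.full <- which.max(reg.full.summary$bic)
    coef(reg.full, index.full)              # coefficients of the selected model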
Solution
“remove” clears from memory all the variables previously defined.
“library(package)” loads the package “package”. In this example the commands load the packages “foreign”, which contains commands to import data, and “leaps”, which contains the commands to perform variable selection.
“read.dta(…)” is a command to read Stata data files (.dta), here from a web page. We save the data as “ceosal1”.
“is.na” is applied to the dataset and returns a logical object of the same dimensions, with TRUE (1) where an observation is missing and FALSE (0) otherwise.
“sum” adds up the elements of its argument; summing the logical values returned by “is.na” therefore gives the total number of missing values in the dataset, in this case 0.
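A small toy illustration of how these two commands interact (the data frame below is made up, not ceosal1):

    df <- data.frame(a = c(1, NA, 3), b = c(4, 5, NA))
    is.na(df)       # logical matrix: TRUE where a value is missing
    sum(is.na(df))  # TRUEs count as 1, so this returns the number of missing values: 2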
“regsubsets” is the command to perform best subset variable selection. It takes as input the model formula and the dataset. Here the response is “salary”, which means that we are considering all the other variables in the dataset as potential explanatory variables for “salary”. The dataset is “ceosal1”, which we defined above.
“summary” prints the results of the best subsets estimation, in a table with the number of variables in the model on the rows and the variables on the columns. A star in row n for variable X means that the best model with n explanatory variables contains X.
“reg.full.summary$bic” yields the vector of BIC values for the best models with 1 up to the maximum number of variables.
“which.max” finds the index of the maximum of a vector; in this case, the number of variables corresponding to the highest BIC value. We save the index as “index.full”.
“coef” prints the coefficients of the regression for the model with “index.full” variables. We see that the selected model has only 4 coefficients (3 explanatory variables plus the intercept): utility, lsalary and lsales.