Problem Set Solution
X: Family income.
P: Price index.
Two different regressions are estimated, with the following results
(standard errors are in brackets and the sample size is 500):
Find and discuss the specification error from which the first model suffers.
Explain it using the estimation results in the table above.
The specification error in the first regression model is the omission of a
relevant explanatory variable (X). Omitting X from the first regression
biases the OLS estimator of the coefficient on P upward: the estimated effect
of P on Y is larger than it should be. In fact, $\hat{\beta}_2 = 2.462$ in the
first model, whereas $\hat{\beta}_2 = -0.739$ in the second, so omitting X even
reverses the sign of the estimated effect. Additionally, the OLS estimator is
less efficient in the first model than in the second (compare the standard
errors of $\hat{\beta}_2$ in both models). Finally, the coefficient of
determination of model one is unreliable: because a relevant factor is
omitted, the explanatory power of the model is lower than when X is included.
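The mechanism behind this bias can be illustrated with a small simulation. The true coefficients, the data-generating process and the positive correlation between P and X below are hypothetical choices, not the data behind the table; only the sample size (500) comes from the problem. The sketch also checks the in-sample omitted-variable identity: the slope on P in the short regression equals the slope on P in the long regression plus the long-regression coefficient on X times the slope from regressing X on P.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients [intercept, slope_1, ...] of y on a constant and the regressors."""
    A = np.column_stack([np.ones_like(y)] + list(regressors))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 500                                  # sample size used in the problem
beta2, beta3 = -0.7, 6.0                 # hypothetical true effects of P and X on Y
x = rng.normal(size=n)                   # hypothetical family income (standardized)
p = x + rng.normal(size=n)               # price index, positively correlated with income
y = 1.0 + beta2 * p + beta3 * x + rng.normal(size=n)

long_coef = ols(y, p, x)   # correctly specified: includes X
short_coef = ols(y, p)     # misspecified: omits X
delta = ols(x, p)[1]       # slope from the auxiliary regression of X on P

# In-sample omitted-variable identity: short slope = long slope + (coef. on X) * delta
implied_short = long_coef[1] + long_coef[2] * delta
print(f"beta2_hat with X:    {long_coef[1]:.3f}")
print(f"beta2_hat without X: {short_coef[1]:.3f} (identity gives {implied_short:.3f})")
```

Because X has a positive effect on Y and is positively correlated with P, the product of the two is positive, which is exactly the direction of the upward bias discussed above.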
DATA ANALYSIS FOR ECONOMICS
Figure 1: Y plotted against X
Figure 2: residuals plotted against fitted values
a- Discuss, using the two graphs, whether the model may suffer from a
non-linearity problem.
According to the first figure, the relationship between Y and X appears
to be non-linear. Moreover, there seem to be decreasing returns: as the
distance to the city center increases, the negative effect of distance
on office rental prices seems to diminish.
When the residuals are plotted against the fitted values (Figure 2), they
show a systematic pattern rather than a random scatter around zero.
Because OLS residuals are uncorrelated with the fitted values by
construction, this pattern cannot be a linear relationship; it is a
non-linear dependence, which signals that the linear specification misses
the curvature in the data. Under a correct specification we want the
residuals to be independent of the predicted values of the dependent
variable. This figure is consistent with the analysis of the first figure.
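A simulated example can make the residual-plot argument concrete. The distance variable, the logarithmic "true" relationship and all parameter values are hypothetical. The sketch fits a regression that is linear in distance and one that is linear in log(distance), then computes the correlation between the residuals and the squared (centered) fitted values, a rough numerical analogue of the curvature visible in a residual plot (similar in spirit to, but not the same as, Ramsey's RESET test).

```python
import numpy as np

def fit_linear(y, x):
    """OLS of y on a constant and x; returns (fitted values, residuals)."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    return fitted, y - fitted

def curvature(fitted, resid):
    """Correlation between residuals and squared (centered) fitted values."""
    return np.corrcoef(resid, (fitted - fitted.mean()) ** 2)[0, 1]

rng = np.random.default_rng(1)
n = 200
dist = rng.uniform(0.5, 20.0, size=n)   # hypothetical distance to the city center
# Hypothetical "true" model: rent falls with distance, but at a decreasing rate
rent = 50.0 - 10.0 * np.log(dist) + rng.normal(scale=2.0, size=n)

fit_lin, res_lin = fit_linear(rent, dist)          # misspecified: linear in distance
fit_log, res_log = fit_linear(rent, np.log(dist))  # linear in log(distance)

cov_lin = np.cov(res_lin, fit_lin)[0, 1]   # ~0: OLS makes residuals orthogonal to fitted values
print(f"cov(resid, fitted), linear model: {cov_lin:.2e}")
print(f"curvature, linear model: {curvature(fit_lin, res_lin):.2f}")
print(f"curvature, log model:    {curvature(fit_log, res_log):.2f}")
```

The linear covariance is zero in both fits, so it cannot reveal the problem; the curvature measure is clearly positive only for the misspecified linear model.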
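a- Find the assumption that does not hold in this model and explain why.
The model violates the assumption of no perfect collinearity (it suffers
from a perfect multicollinearity problem) because, for every individual,
the hours spent on the four activities add up to the number of hours in a
week:
$\text{study}_i + \text{sleep}_i + \text{work}_i + \text{leisure}_i = 168 \quad \forall i$
Hence, with an intercept in the model, any one of these regressors is an
exact linear function of the other three and the constant, so the OLS
estimator cannot be computed.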
b- How would you rewrite the model in order to solve the problem?
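The rank problem can be checked numerically. This is a minimal illustration with hypothetical simulated hours (the original data and dependent variable are not given), assuming the model includes an intercept. One common remedy, stated here as an assumption rather than as the original solution, is to drop one activity (leisure below), which then acts as the reference category: each remaining coefficient measures the effect of moving one hour from leisure to that activity, holding the other activities fixed.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
# Hypothetical weekly hours; ranges chosen so that leisure stays positive
study = rng.uniform(10, 40, size=n)
sleep = rng.uniform(40, 60, size=n)
work = rng.uniform(0, 30, size=n)
leisure = 168.0 - study - sleep - work   # identity: the four activities fill the 168-hour week

X_full = np.column_stack([np.ones(n), study, sleep, work, leisure])
X_drop = np.column_stack([np.ones(n), study, sleep, work])  # leisure dropped as the reference activity

# A design matrix with fewer independent columns than columns makes X'X singular
rank_full = np.linalg.matrix_rank(X_full)
rank_drop = np.linalg.matrix_rank(X_drop)
print(f"full model:    {X_full.shape[1]} columns, rank {rank_full}")
print(f"leisure dropped: {X_drop.shape[1]} columns, rank {rank_drop}")
```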
4 We have representative data on 30-year-old people in the US.
Levine, Gustafson and Velenchik (1997) estimated a wage equation using
the following variables:
Y = log(wage)
ED = years of education
Model 1: … standard error (0.031); R-squared = 0.35
Model 2: … standard errors (0.021) (0.0004); R-squared = 0.68
Compare the two fitted models and explain what happens when we omit
one relevant variable (in this case, years of education).
When years of education are omitted (Model 1), the negative effect of
smoking on wages is overestimated compared with the second regression
model: the smoking coefficient is more negative in Model 1 than in Model 2.
In addition, the standard error of the smoking coefficient is higher in the
first model than in the second. That is, omitting education makes the
Model 1 estimators less efficient than those of Model 2.
Finally, comparing the explanatory power of the two models through their
adjusted coefficients of determination shows that Model 2 is better than
Model 1: including education in the second model helps explain more of the
variability in wages. Furthermore, a test of the individual significance of
education would reject the null hypothesis, meaning that education is a
statistically significant variable for explaining wages. All of the above
indicates that Model 1 suffers from the omission of a relevant factor (years
of education).
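The adjusted coefficient of determination, $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$ with $k$ slope regressors, penalizes extra variables, so it is the fair way to compare models of different size. Neither the sample size nor the number of regressors of each model is recoverable from the text, so the values of n and k below are hypothetical; the sketch only shows that, with R-squared values of 0.35 and 0.68, Model 2 comes out ahead across a range of sample sizes.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k slope regressors (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical regressor counts: smoker only vs. smoker plus education-related terms
K1, K2 = 1, 3
for n in (30, 100, 500):   # hypothetical sample sizes
    a1 = adjusted_r2(0.35, n, K1)
    a2 = adjusted_r2(0.68, n, K2)
    print(f"n={n}: Model 1 adjusted R2 = {a1:.3f}, Model 2 adjusted R2 = {a2:.3f}")
```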
5 We have the following information on the annual growth rates (%) of stock
prices (Y) and consumer prices (X) in different countries:
Country        Stock prices (Y)   Consumer prices (X)   Predicted Y   Estimation residuals
Australia      5                  4.3
Austria        11.1               4.6
Belgium        3.2                2.4
Canada         7.9                2.4
Denmark        3.8                4.2
Finland        11.1               5.5
France         9.9                4.7
Germany        13.5               2.2
India          1.5                4
Ireland        6.4                4
Israel         8.9                8.4
Italy          8.1                3.3
Japan          13.5               4.7
Mexico         4.7                5.2
Netherlands    7.5                3.6
New Zealand    4.7                3.6
Sweden         8                  4
UK             7.5                3.9
USA            9                  2.1
Knowing that:
$\hat{y}_i = 6.83 + 0.201\, x_i$
b- Show, both graphically and formally, whether the above data suffer from
an outlier problem.
Figure: scatter plot of stock prices (Y) against consumer prices (X)
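The table's predicted and residual columns and a formal outlier check can be sketched as follows, using the fitted line given in the problem. The two cut-offs (an absolute standardized residual above 2 for unusual values of Y, and leverage above 2k/n for unusual values of X) are common rules of thumb assumed here, not taken from the original solution; the standardized residual is simply the residual divided by the standard error of the regression.

```python
import numpy as np

countries = ["Australia", "Austria", "Belgium", "Canada", "Denmark", "Finland",
             "France", "Germany", "India", "Ireland", "Israel", "Italy", "Japan",
             "Mexico", "Netherlands", "New Zealand", "Sweden", "UK", "USA"]
y = np.array([5, 11.1, 3.2, 7.9, 3.8, 11.1, 9.9, 13.5, 1.5, 6.4, 8.9, 8.1,
              13.5, 4.7, 7.5, 4.7, 8, 7.5, 9])   # stock prices (Y)
x = np.array([4.3, 4.6, 2.4, 2.4, 4.2, 5.5, 4.7, 2.2, 4, 4, 8.4, 3.3,
              4.7, 5.2, 3.6, 3.6, 4, 3.9, 2.1])  # consumer prices (X)

n, k = len(y), 2                          # k = estimated parameters (intercept + slope)
y_hat = 6.83 + 0.201 * x                  # fitted line given in the problem
resid = y - y_hat
s = np.sqrt(np.sum(resid**2) / (n - k))   # standard error of the regression
std_resid = resid / s

# Leverage of each observation in a simple regression with an intercept
leverage = 1 / n + (x - x.mean())**2 / np.sum((x - x.mean())**2)

# Assumed rules of thumb: |standardized residual| > 2, leverage > 2k/n
outliers_y = [c for c, r in zip(countries, std_resid) if abs(r) > 2]
high_leverage = [c for c, h in zip(countries, leverage) if h > 2 * k / n]

for c, yh, e in zip(countries, y_hat, resid):
    print(f"{c:12s} predicted={yh:6.3f} residual={e:6.3f}")
print("Large standardized residuals:", outliers_y)
print("High leverage:", high_leverage)
```

With these data, Israel, whose consumer-price growth (8.4%) lies far from the rest of the sample, is the only country flagged as a high-leverage point, while no standardized residual exceeds 2 in absolute value.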
a- We do not have data on poverty, but the variable lnchprg gives the
percentage of students eligible for a programme subsidising school
lunches. Why is this variable a sensible proxy for poverty?
b- The table below shows the OLS estimates with and without the
inclusion of lnchprg:
Explanatory variables    (1)    (2)
Explain why the effects of spending and enrol are greater in the first
model than in the second one. What happens when we compare the standard
errors of the two models?