ProblemSet Solution

The document discusses various econometric models and their estimation problems, including specification errors due to omitted variables and issues of multicollinearity. It analyzes the impact of different variables on food expenditure, office rental prices, wages, stock prices, and educational outcomes, highlighting the importance of including relevant factors for accurate predictions. Additionally, it addresses potential non-linearity and outlier problems in the data, providing insights into model efficiency and explanatory power.

Uploaded by

4317elyafi


DATA ANALYSIS FOR ECONOMICS

PROBLEM SET 5: ESTIMATION PROBLEMS - SOLUTION

1 We have the following variables:

Y: Food expenditure in USA.

X: Family income.

P: Price index.

Two different regressions are estimated with the following estimation results
(standard errors are in brackets and sample size is 500):

Regression    Coefficient for X    Coefficient for P    Adjusted R-squared
Y / P                              2.462                0.614
                                   (0.407)
Y / X; P      0.112                -0.739               0.978
              (0.003)              (0.114)

Find and discuss the specification error affecting the first model. Explain it
using the estimation results in the table above.

The estimation problem affecting the first regression model is the
omission of a relevant explanatory variable (X). Omitting X biases the OLS
estimator of the effect of P upwards: the estimated effect of P on Y is
greater than it should be, to the point that its sign changes. In fact,
$\hat{\beta}_2 = 2.462$ in the first model whereas $\hat{\beta}_2 = -0.739$
in the second model. Additionally, the OLS estimator is less efficient in the
first model than in the second (compare the standard errors of
$\hat{\beta}_2$ in both models: 0.407 versus 0.114). Finally, the coefficient
of determination of the first model is unreliable because a relevant factor is
omitted, which makes the explanatory power of the model lower than it is once
X is included (adjusted R-squared of 0.614 versus 0.978).
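
The direction of the bias can be made explicit with the standard omitted-variable decomposition. In the sketch below, the tilde marks the estimate from the short regression (the first model), and $\hat{\delta}_1$ is the slope of an auxiliary regression of X on P; both symbols are introduced here for illustration and do not appear in the problem set. The numerical value of $\hat{\delta}_1$ is only implied by the two reported regressions, assuming both were estimated on the same 500 observations.

```latex
% Long regression (second model):  \hat{Y} = \hat\beta_0 + \hat\beta_1 X + \hat\beta_2 P
% Short regression (first model):  \tilde{Y} = \tilde\beta_0 + \tilde\beta_2 P
% Auxiliary regression (assumed):  \hat{X} = \hat\delta_0 + \hat\delta_1 P
\begin{align*}
\tilde{\beta}_2 &= \hat{\beta}_2 + \hat{\beta}_1 \hat{\delta}_1 \\
2.462 &= -0.739 + 0.112\,\hat{\delta}_1
\quad\Longrightarrow\quad
\hat{\delta}_1 = \frac{2.462 + 0.739}{0.112} \approx 28.6
\end{align*}
```

Under this reading, income raises food expenditure ($\hat{\beta}_1 > 0$) and income moves together with the price index ($\hat{\delta}_1 > 0$), so the short regression attributes part of the income effect to P.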

2 We have estimated a simple linear regression model (SLRM) explaining office
rental prices in the city of Madrid (Y) with the information contained in
distance to the city center (X). The following two graphs, Figure 1 (Y versus
X) and Figure 2 (residuals versus fitted values of Y), are related to the
above model.


[Figure 1: Y versus X]
[Figure 2: residuals versus fitted values of Y]

a- Discuss, according to the two graphs, whether the model may suffer from a
non-linearity problem.
According to the first figure, the relationship between Y and X appears
to be non-linear. Moreover, there seem to be decreasing returns. That
is, as the distance to the city center increases, the negative effect
of distance on office rental prices seems to decrease.

When plotting the residuals versus the fitted values (Figure 2), there
seems to be a systematic relationship between them. This is a sign
that the model suffers from a non-linearity problem, as the residuals
should show no pattern against the predicted values of the dependent
variable. This figure is consistent with the analysis of the first
figure.

b- Provide an economic reason explaining the possible non-linearity in
the above relationship.
As you move further away from the city center, you do not expect the
negative effect of distance on rental prices to be as large as when
you are very close to the city center. This is why the relationship
between rental prices and distance may exhibit decreasing returns.

c- What should Figure 2 look like if the relationship between office rental
prices and distance were linear?

If the relationship were linear, Figure 2 should be a random cloud of
points, signaling that the residuals show no systematic pattern against
the predicted values of the dependent variable and that the linearity
assumption is therefore satisfied.
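
The residual pattern described in parts a and c can be reproduced with a small sketch. The data below are hypothetical (they are not the Madrid data behind Figures 1 and 2): rent is generated without noise as a logarithmic function of distance, which is an assumed functional form chosen only because it falls at a decreasing rate. A straight line is then fitted by OLS and the residuals are listed against the fitted values.

```python
import math


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: rent falls with distance, but at a decreasing rate.
distance = [float(d) for d in range(1, 21)]            # km to the center
rent = [40.0 - 10.0 * math.log(d) for d in distance]   # noise-free for clarity

# Fit the (misspecified) straight line and compute its residuals.
intercept, slope = ols_line(distance, rent)
fitted = [intercept + slope * d for d in distance]
residuals = [r - f for r, f in zip(rent, fitted)]

for f, e in zip(fitted, residuals):
    print(f"fitted = {f:7.3f}   residual = {e:7.3f}")
```

In this sketch the residuals are positive at both ends of the distance range and negative in the middle, which is the kind of systematic shape that a linear specification should not leave behind. With a truly linear relationship plus noise, the same listing would show no such ordering.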


3 There is an econometric study at IE University which relates the
average grade in Econometrics to the time students spend on different
activities during the week. Some students are asked how many hours
they spend on four different activities: study, sleep, work and leisure. Every
activity must be included in one of these four categories, so that the time
spent on the four activities adds up to 168 hours for each student.

The model is the following:

$GRADE = \beta_0 + \beta_1\, study + \beta_2\, sleep + \beta_3\, work + \beta_4\, leisure + u$

a- Find the assumption that does not hold in this model and explain why.
The above model suffers from a perfect multicollinearity problem because:
$study_i + sleep_i + work_i + leisure_i = 168 \quad \forall i$

That is, there is a perfect linear relationship among the explanatory
variables included in the regression model (together with the constant
term). Therefore, the OLS minimization problem has no unique solution
and the coefficients cannot be estimated.

b- How would you rewrite the model in order to solve the problem?

One possible solution would be to drop one regressor from the model,
for example:

$GRADE = \beta_0 + \beta_1\, study + \beta_2\, sleep + \beta_3\, work + u$
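
The rank argument behind parts a and b can be checked numerically. The following is a minimal sketch with made-up weekly hours for six hypothetical students (none of these numbers come from the study); `matrix_rank` is a small helper written here for illustration, using exact fractions so that the result does not depend on floating-point tolerance.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via exact Gaussian elimination on fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Look for a non-zero pivot in this column, below the rows already used.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


# Hypothetical (study, sleep, work) hours; leisure is whatever is left of 168.
hours = [(20, 56, 0), (30, 49, 10), (10, 60, 20),
         (25, 50, 5), (15, 55, 30), (40, 45, 0)]

# Columns: constant, study, sleep, work, leisure.
full_design = [[1, s, sl, w, 168 - s - sl - w] for s, sl, w in hours]
# Same design without the leisure column.
reduced_design = [row[:4] for row in full_design]

print("columns:", len(full_design[0]), "rank:", matrix_rank(full_design))
print("columns:", len(reduced_design[0]), "rank:", matrix_rank(reduced_design))
```

The full design has five columns but fewer independent ones, so the normal equations do not pin down a unique coefficient vector. Once leisure is dropped, the number of columns equals the rank, and each remaining slope is read as the effect of one more hour in that activity at the expense of one hour of leisure.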

4 We have representative data for 30-year-old people in the US.
Levine, Gustafson and Velenchik (1997) estimated a wage equation using
the following variables:

Y = log(wage)

F = a dummy variable that takes a value of 1 for smokers and 0, otherwise

ED = years of education

Two specifications are considered:

MODEL 1 (omitting education): Y = 5.61 - 0.176 F

Standard error of the coefficient on F: (0.031)

R-squared = 0.35

MODEL 2 (including education): Y = 3.78 - 0.080 F + 0.070 ED

Standard errors of the coefficients on F and ED: (0.021) and (0.0004)


R-squared = 0.68

Compare the two fitted models and explain what happens when we omit
one relevant variable (in this case, years of education).

When years of education are omitted in the first model, the negative effect
of smoking on wages is overestimated compared with the second regression
model (the coefficient in Model 1, -0.176, is more negative than in
Model 2, -0.080). In addition, the standard error associated with the effect
of smoking is higher in the first model (0.031) than in the second (0.021).
That is, omitting education makes the estimator in Model 1 less efficient
than in Model 2. Finally, if we compare the two regression models in terms of
explanatory power using their R-squared, we can see that Model 2 is better
than Model 1 (0.68 versus 0.35). That is, including education in the second
model helps to explain the variability in wages better than the first model.
Furthermore, if you were to test the individual significance of education,
you would reject the null hypothesis, meaning that education is a
statistically significant variable for explaining the behavior of wages. All
of the above indicates that Model 1 suffers from the omission of a relevant
factor (years of education).
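
The individual significance test mentioned above can be sketched directly from the reported coefficients and standard errors. The sample size is not given in the problem, so the critical value 1.96 (two-sided test at the 5% level under a large-sample normal approximation) is an assumption, and `t_ratio` is a helper name introduced here for illustration.

```python
def t_ratio(coefficient, standard_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error


# Assumption: large-sample normal approximation, two-sided test at 5%.
CRITICAL_VALUE = 1.96

# Coefficients and standard errors as reported for Model 1 and Model 2.
t_smoker_model1 = t_ratio(-0.176, 0.031)
t_smoker_model2 = t_ratio(-0.080, 0.021)
t_education = t_ratio(0.070, 0.0004)

for name, t in [("F, Model 1", t_smoker_model1),
                ("F, Model 2", t_smoker_model2),
                ("ED, Model 2", t_education)]:
    verdict = "reject H0" if abs(t) > CRITICAL_VALUE else "do not reject H0"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```

Under that assumption, the ratio for education is far beyond the critical value, which is consistent with treating years of education as a relevant variable.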

5 We have the following information on the annual growth rates (%) of
stock prices (Y) and consumer prices (X) in different countries:

Country        Stock prices (Y)   Consumer prices (X)   Predicted Y   Estimation residuals
Australia      5                  4.3
Austria        11.1               4.6
Belgium        3.2                2.4
Canada         7.9                2.4
Denmark        3.8                4.2
Finland        11.1               5.5
France         9.9                4.7
Germany        13.5               2.2
India          1.5                4
Ireland        6.4                4
Israel         8.9                8.4
Italy          8.1                3.3
Japan          13.5               4.7
Mexico         4.7                5.2
Netherlands    7.5                3.6
New Zealand    4.7                3.6
Sweden         8                  4
UK             7.5                3.9
USA            9                  2.1

Knowing that: $\hat{y}_i = 6.83 + 0.201\, x_i$

Answer the following questions:

a- Complete the missing values in the above table.

For example, for Australia: $\hat{y}_{AUSTRALIA} = 6.83 + 0.201\,(4.3) = 7.694$

Country        Stock prices (Y)   Consumer prices (X)   Predicted Y   Residual   Normalised residual
Australia      5                  4.3                   7.694         -2.694     -0.823
Austria        11.1               4.6                   7.755         3.345      0.995
Belgium        3.2                2.4                   7.312         -4.112     -1.245
Canada         7.9                2.4                   7.312         0.588      0.170
Denmark        3.8                4.2                   7.674         -3.874     -1.178
Finland        11.1               5.5                   7.936         3.165      0.938
France         9.9                4.7                   7.775         2.125      0.627
Germany        13.5               2.2                   7.272         6.228      1.870
India          1.5                4                     7.634         -6.134     -1.858
Ireland        6.4                4                     7.634         -1.234     -0.383
Israel         8.9                8.4                   8.518         0.382      0.092
Italy          8.1                3.3                   7.493         0.607      0.174
Japan          13.5               4.7                   7.775         5.725      1.712
Mexico         4.7                5.2                   7.875         -3.175     -0.970
Netherlands    7.5                3.6                   7.554         -0.054     -0.026
New Zealand    4.7                3.6                   7.554         -2.854     -0.869
Sweden         8                  4                     7.634         0.366      0.099
UK             7.5                3.9                   7.614         -0.114     -0.045
USA            9                  2.1                   7.252         1.748      0.521

b- Show both graphically and formally if the above data suffers from an
outlier problem.


[Scatter plot: stock prices (Y, vertical axis) against consumer prices (X, horizontal axis)]

According to the scatter plot, there may be an outlier problem related
to the observation for Israel, which behaves slightly differently from
the rest of the countries: it presents the highest growth in consumer
prices, equal to 8.4.

Formally, we have to compare each of the normalized estimation
residuals with the critical values:

If $z > 2.06$ or $z < -2.06$ (critical values leaving a probability of
2% in each of the right and left tails of the normal distribution),
then the data point associated with that estimation residual can be
considered an outlier.

Note that the standard deviation of the estimation residuals is 3.3.

See the normalized residuals in the last column of the table in
section (a).

None of the normalized estimation residuals satisfies the above
conditions and therefore our model does not suffer from a significant
outlier problem.

c- If the answer to b is positive, please explain any strategy you would
follow in order to solve the problem.

No strategy is required, as none of the residuals is a significant
outlier.
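
The formal check in part b can be sketched as follows, using the data and the fitted line given in the problem. The normalisation used here (subtract the mean of the residuals and divide by their sample standard deviation) is an assumption about how the last column of the table was built, so the values may differ slightly from the table in the second or third decimal.

```python
from statistics import mean, stdev

# (stock price growth Y, consumer price growth X) as given in the problem.
data = {
    "Australia": (5.0, 4.3), "Austria": (11.1, 4.6), "Belgium": (3.2, 2.4),
    "Canada": (7.9, 2.4), "Denmark": (3.8, 4.2), "Finland": (11.1, 5.5),
    "France": (9.9, 4.7), "Germany": (13.5, 2.2), "India": (1.5, 4.0),
    "Ireland": (6.4, 4.0), "Israel": (8.9, 8.4), "Italy": (8.1, 3.3),
    "Japan": (13.5, 4.7), "Mexico": (4.7, 5.2), "Netherlands": (7.5, 3.6),
    "New Zealand": (4.7, 3.6), "Sweden": (8.0, 4.0), "UK": (7.5, 3.9),
    "USA": (9.0, 2.1),
}

INTERCEPT, SLOPE = 6.83, 0.201   # fitted line given in the problem
THRESHOLD = 2.06                 # critical value used in the solution

# Residual = observed Y minus predicted Y.
residuals = {c: y - (INTERCEPT + SLOPE * x) for c, (y, x) in data.items()}

# Assumed normalisation: centre on the mean, scale by the sample std. deviation.
center = mean(list(residuals.values()))
spread = stdev(list(residuals.values()))
z_scores = {c: (e - center) / spread for c, e in residuals.items()}

outliers = [c for c, z in z_scores.items() if abs(z) > THRESHOLD]

print(f"standard deviation of residuals: {spread:.2f}")
for country, z in z_scores.items():
    print(f"{country:12s} residual = {residuals[country]:7.3f}   z = {z:6.3f}")
print("flagged as outliers:", outliers)
```

The list of flagged countries comes out empty, in line with the conclusion above. Note that Israel stands out in X rather than in its residual, which is why the residual-based rule does not flag it.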

6 We have data for a sample of high schools in Vietnam where the
variable math denotes the percentage of students who passed a math test.
We want to estimate the effect that spending per student has on the
outcomes of this test and propose the following model:


$\log(math) = \beta_0 + \beta_1 \log(spend) + \beta_2 \log(enroll) + \beta_3\, poverty + u$

where poverty describes the percentage of students living below the
poverty line, spend denotes spending per student and enroll is the number
of students enrolled in the high school.

a- We do not have data for the poverty variable, but the variable lnchprg
describes the percentage of students eligible for a programme
subsidising school lunches. Why is this variable a sensible proxy
variable for poverty?

Since we do not have data for the poverty variable, we need to find a
proxy (a similar variable that captures the same effect). lnchprg is a
good proxy because students living below the poverty line will, on
average, be the students eligible for the programme subsidising school
lunches.
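
The proxy argument can be written out as a short derivation. The auxiliary relation between poverty and lnchprg, together with the symbols $\delta_0$, $\delta_1$ and $v$, is an assumption introduced here for illustration; the problem set does not state it.

```latex
% Assumed linear link between the unobserved variable and its proxy:
\begin{align*}
poverty &= \delta_0 + \delta_1\, lnchprg + v \\
% Substituting into the model for log(math):
\log(math) &= (\beta_0 + \beta_3\delta_0) + \beta_1 \log(spend) + \beta_2 \log(enroll)
            + \beta_3\delta_1\, lnchprg + (u + \beta_3 v)
\end{align*}
```

Under this assumption, the coefficient on lnchprg estimates $\beta_3\delta_1$ rather than $\beta_3$ itself, while $\beta_1$ and $\beta_2$ keep their original meaning as long as $v$ is unrelated to spend and enroll.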

b- The table below shows the OLS estimates with and without the
inclusion of lnchprg (standard errors in brackets):

Explanatory variables    (1)        (2)
log(spend)               0.13       0.75
                         (0.30)     (0.04)
log(enroll)              0.022      -0.66
                         (0.615)    (0.58)
lnchprg                             -0.324
                                    (0.036)
intercept                -0.24      -0.14
                         (0.74)     (0.99)
n                        408        408
R-squared                0.0293     0.1893

Explain why the effects of spending and enrollment are greater in the
first model than in the second one. What happens if we compare the
standard errors between the two models?

In the above table we have a problem of omission of a relevant
explanatory variable. The first model omits the lnchprg variable (a
significant explanatory variable in the second model). One
consequence of omitting relevant explanatory variables is that the
OLS coefficients are biased. In our example, the coefficients
associated with the spend and enroll variables are biased (greater
values than the ones in the second model, and therefore
overestimating the effect of both variables on the dependent
variable). In addition, comparing the standard errors associated with
each of the explanatory variables, we can see that the estimators in
the first model are less efficient (their standard errors are greater
than in the second model).

c- What conclusions can you derive when comparing both models?

We can conclude that the second model is a better specification than
the first one because it includes an additional relevant and significant
explanatory variable, the signs of the coefficients are the expected
ones, the standard errors are smaller than in the first model and it
has greater explanatory power than the first model (R-squared of
0.1893 versus 0.0293).
