DOC-20191127-WA0017. Ecs4863
NB: Please contact me (Jeka: 073 768 7330) if you need data sets to practice.
May/June 2017
ANSWERS
QUESTION 1: [55]
Question 1.1 (10)
Calculate South Africa’s investment to GDP ratio for the period 1990Q1 to 2016Q2, using the data
provided in the worksheet. Name the variable: inv_per_gdp (Hint: you will have to generate the
variable using GFCF and GDP)
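As a minimal sketch of the calculation (outside EViews), the ratio is simply GFCF divided by GDP, expressed as a percentage. The function name and the sample figures below are hypothetical and are not taken from the exam worksheet.

```python
def investment_to_gdp(gfcf, gdp):
    """Return the investment-to-GDP ratio in percent for each period."""
    if len(gfcf) != len(gdp):
        raise ValueError("GFCF and GDP must cover the same periods")
    return [100.0 * i / y for i, y in zip(gfcf, gdp)]

# Hypothetical quarterly values (both series in the same currency units).
gfcf_sample = [50.0, 60.0, 45.0]
gdp_sample = [250.0, 300.0, 300.0]
inv_per_gdp = investment_to_gdp(gfcf_sample, gdp_sample)
print(inv_per_gdp)
```

In EViews the same series would typically be generated with a single statement along the lines of series inv_per_gdp = (gfcf/gdp)*100, assuming the workfile series are named gfcf and gdp.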
[Figure: INV_PER_GDP, 1990 to 2016]
(b) Comment on the trends you observe for the periods: 1996 – 2001 and 2002 – 2016.
Do they confirm the World Bank’s findings?
Comment:
Between 1996 and 2001 the investment-to-GDP ratio fluctuated frequently. A significant drop occurred
in the 1st quarter of 1999; the ratio started to increase slowly in the 2nd quarter
and then dropped again in the 1st quarter of 2002.
From the 2nd quarter of 2002 the investment-to-GDP ratio increased sharply until 2009. This is in line with
a 2011 World Bank report, which indicated that the real returns to capital in South Africa
rose sharply during the same period. However, the ratio declined abruptly after the
1st quarter of 2009, which can be attributed to the global economic crisis.
Next you need to build an econometric model for investment in South Africa
Calculate the following variables by using the formulas as provided:
(a) On a single graph show the variables GFCF and RGFCF. Comment on what you
observe as far as their likely stationarity is concerned. (4)
[Figure: GFCF and RGFCF, 1990 to 2016]
Comment:
By merely inspecting the shapes of the two graphs you should be able to tell that:
RGFCF seems to be stationary, as its mean and variance seem to be constant over time; it is thus
possibly I(0).
GFCF seems to be non-stationary, as its mean seems to rise over time. Either its variance is
constant around the trend, in which case it is possibly I(1), or its mean changes rapidly over time, in which case it is possibly I(2).
(b) Use the ADF test to test for unit roots. Provide your results for the following
variables in the table below (Hint: please remember to log variables, where
appropriate, before performing the tests): (8)
[Note: You do not have to include D(INF) in normal scenarios, given the already highly significant results
for INF in level. Please also read section 21.8, p.748-754, which provides for different ways of testing
(i) Estimate the following long-run cointegration equation and use your results
to complete the table. (Remember to include an intercept term.) (4)
LRGFCF = f (LRGDP)
Variable Coefficient
LRGDPt 1.178005
C -3.929171
Comment:
It is evident that LRGFCF and LRGDP are positively related (i.e. LRGDP has a
positive sign), which is correct according to our a priori (economic theory)
expectations. As for the magnitude, the coefficient is
not between 0 and 1: if RGDP increases by 1%, we expect RGFCF to increase by about 1.18%.
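The elasticity reading of a log-log coefficient can be checked with a short calculation. The sketch below (the function name is hypothetical) computes the exact percentage change in the dependent variable implied by the estimated coefficient; for a small change such as 1% the result is very close to the coefficient itself.

```python
def pct_change_loglog(elasticity, pct_change_x):
    """Exact % change in Y implied by a log-log coefficient for a given % change in X."""
    return 100.0 * ((1.0 + pct_change_x / 100.0) ** elasticity - 1.0)

# Long-run coefficient on LRGDP from the table above.
effect = pct_change_loglog(1.178005, 1.0)
print(round(effect, 3))
```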
(iii) Generate the residual series with the command PROC/make residual series.
Name the series: RES_GFCF. Perform a unit root test on RES_GFCF and
report your answers in the table below. (2)
Statistically significant at the: 10% level (*), 5% level (**), 1% level (***)
(iv) Can we conclude that the variables in the long-run equation are indeed
cointegrated? Explain. (2)
Comment:
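The results indicate that the residual series RES_GFCF is stationary: the unit root test statistic is statistically significant at the 1% level (i.e. three stars),
which means that we can reject the null hypothesis of no cointegration. The variables in the long-run equation are therefore cointegrated.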
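The reasoning behind this conclusion is the Engle-Granger two-step procedure, sketched below in generic notation. The symbols $\hat{u}_t$, $\rho$ and $e_t$ are introduced here for illustration only and do not appear in the exam paper.

```latex
\begin{aligned}
\text{Step 1:}\quad & LRGFCF_t = \beta_0 + \beta_1\, LRGDP_t + u_t,
  \qquad \hat{u}_t = \text{RES\_GFCF}_t \\
\text{Step 2:}\quad & \Delta \hat{u}_t = \rho\, \hat{u}_{t-1} + e_t,
  \qquad H_0\colon \rho = 0 \ \text{(unit root in the residuals, no cointegration)}
\end{aligned}
```

Rejecting $H_0$ means the residuals are I(0), so the I(1) variables move together in the long run. Strictly speaking, a residual-based test calls for Engle-Granger critical values, which are more negative than the standard ADF ones.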
(d) Build an ECM for investment, using the variables and lag lengths as provided in
the table below. Complete the following table: (10 x ½ = 5)
(ii) Given your conclusions on the diagnostic check of the ECM, do you think
that this is an acceptable model? (Please provide reasons, no marks will be
awarded for only stating yes/no.) (4)
Comment:
The model is unacceptable. There is general inconsistency in the model.
There is evidence of 1st-order serial correlation, hence the inconsistency. (Data are often of a “cyclical” nature.
When the errors associated with observations from different time periods are related to each
other, we refer to the errors as being serially correlated.)
There is heteroscedasticity.
There is heteroscedasticity up to order 2.
(f) Regardless of the results you obtained in question 1(d), you still decide to create
a model to combine your long run and ECM. Provide your EViews model statement
(6 marks), and a copy of the graph depicting the actual and modelled values (2
marks), in the space provided.
(8)
lrgfcf = 0.446465400297*d(lrgdp) + 0.181357706001*d(lprime) - 0.00576476633367*inf - 0.123823577887*res_gfcf(-1) + 0.0146753964222 + lrgfcf(-1)
rgfcf = exp(lrgfcf)
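To see what the model statement does, note that the ECM explains the change in the log of real investment, so the log level is the previous level plus the modelled change. The sketch below reproduces that one-step calculation with the estimated coefficients; the function name and the input values are hypothetical and are not taken from the exam data.

```python
import math

def ecm_level(prev_level, d_lrgdp, d_lprime, inf, prev_residual):
    """One-step ECM solution for the log level: previous level plus the modelled change."""
    change = (0.446465400297 * d_lrgdp
              + 0.181357706001 * d_lprime
              - 0.00576476633367 * inf
              - 0.123823577887 * prev_residual
              + 0.0146753964222)
    return prev_level + change

# Hypothetical inputs for a single quarter.
lrgfcf_t = ecm_level(prev_level=11.0, d_lrgdp=0.01, d_lprime=0.0,
                     inf=5.0, prev_residual=0.02)
rgfcf_t = math.exp(lrgfcf_t)  # back-transform from logs to levels
```

A positive lagged residual (investment above its long-run value) pulls the next value down, which is the error-correction mechanism at work.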
QUESTION 2:
(a) Explain your a priori/hypothesized expectations regarding the relationship between the dependent
variable (i.e. Admitted to Graduate Program) and the explanatory variables (i.e. Q and V). (2)
Economic theory/expectations:
The Aptitude Test score (Quantitative) and the Aptitude Test score (Verbal) are positively related
to the qualitative variable (ADM), that is, whether the student is admitted to the graduate
programme. The higher a student's verbal and quantitative aptitude scores, the higher the
probability of being admitted to the programme.
Although our statistical results look satisfactory, the LPM is not a satisfactory
model because of, among other problems, the non-normality and heteroscedasticity of the error term.
Weighted least squares (WLS) is the procedure to be used to obtain more efficient estimates.
Each observation in the LPM is divided by the square root of its weight, where the weight is the
estimated variance of the error term, and OLS is then applied to the transformed model. (2)
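The WLS transformation for an LPM can be sketched as follows, assuming the usual weight w_i = p_i(1 - p_i), where p_i is the fitted probability from a first-stage OLS regression. The function name and the sample data are hypothetical, and dropping fitted values outside (0, 1) is only one of the possible ways of handling them.

```python
import math

def lpm_wls_transform(y, x_rows, fitted):
    """Divide each observation by sqrt(w_i), with w_i = p_i * (1 - p_i).

    Observations whose fitted probability lies outside (0, 1) are dropped,
    because their estimated variance would be zero or negative.
    """
    transformed = []
    for y_i, x_i, p_i in zip(y, x_rows, fitted):
        if not 0.0 < p_i < 1.0:
            continue
        scale = math.sqrt(p_i * (1.0 - p_i))
        # The intercept column (1) is scaled too, so it becomes 1/sqrt(w_i).
        row = [1.0 / scale] + [x / scale for x in x_i]
        transformed.append((y_i / scale, row))
    return transformed

# Hypothetical data: admission outcome, (Q, V) scores, first-stage fitted values.
adm = [1, 0, 1]
scores = [(700.0, 650.0), (500.0, 480.0), (760.0, 720.0)]
p_hat = [0.8, 0.2, 1.1]   # the third fitted value is outside (0, 1)
result = lpm_wls_transform(adm, scores, p_hat)
```

OLS on the transformed observations (without adding a new intercept) then gives the WLS estimates.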
[10]
QUESTION 3: [20]
(a) Explain the concept of linearity. Also comment on what is meant by an intrinsically linear
regression model?
(i) $\ln Y_i = \beta_1 + \beta_2 \left( \frac{1}{X_i} \right) + \varepsilon_i$
Linear in parameters equation: that is, the parameters are raised to the first power only.
(ii) $Y_i = \beta_1 + \beta_2^{3} X_i + \varepsilon_i$
Non-linear in parameters equation: the parameter $\beta_2$ is raised to the 3rd power.
NB: The term “linear” regression will always mean a regression that is linear in the parameters;
the β’s (that is, the parameters) are raised to the first power only. It may or may not be
linear in the explanatory variables, the X’s (see Gujarati 2008:38).
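A model that is linear in the parameters can be estimated by OLS even when it is non-linear in X: the regressor is transformed, not the parameters. The sketch below illustrates this for a model of the form ln Y = b1 + b2(1/X), using hypothetical noise-free data and a hypothetical helper function.

```python
def ols_simple(z, y):
    """Closed-form OLS for y = b1 + b2 * z; returns (b1, b2)."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    b2 = s_zy / s_zz
    return y_bar - b2 * z_bar, b2

# Hypothetical noise-free data generated from ln Y = 2 + 3 * (1 / X).
x = [1.0, 2.0, 4.0, 5.0]
ln_y = [2.0 + 3.0 / xi for xi in x]
z = [1.0 / xi for xi in x]   # transform the regressor, then run ordinary OLS
b1, b2 = ols_simple(z, ln_y)
```

No such transformation of the variables exists for a parameter that enters as a power, which is why that case is non-linear in the parameters.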
(c) Discuss the following statement: “Researchers should always keep in mind that their
results are only as good as the data they are working with”
(4)
When researchers find that the results of their research are unsatisfactory, the
cause may be not that they used the wrong model but that the quality of the data was
poor. Unfortunately, because of the non-experimental nature of the data used in most
social science studies, researchers very often have no choice but to depend on the
available data.
But they should always keep in mind that the data used may not be the best and should
try not to be too dogmatic about the results obtained from a given study, especially when
the quality of the data is suspect.
Because of the reasons listed below, and many other problems, the researcher should
always keep in mind that the results of research are only as good as the quality of the
data.
o Reasons:
i) There is the possibility of observational errors, either of omission or commission,
since the data are non-experimental.
ii) Even in experimentally collected data, errors of measurement arise from
approximations and round-offs.
iii) In questionnaire-type surveys, the problem of nonresponse can be serious,
which causes selectivity bias.
iv) The sampling methods used in obtaining the data may vary so widely that it is
often difficult to compare the results obtained from the various samples.
d) In your own words explain the differences between linear probability models (LPM’s)
and logistic (Logit) models. (10)
QUESTION 4: [15]
$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 I_t + \varepsilon_{1t}$
(a) Which of the variables are endogenous and which are exogenous? Make sure to also
include the lagged variables in your answer.
Endogenous variables
Y, I, C and Q
Exogenous variables.
𝑌𝑡−1 , 𝑄𝑡−1 , 𝐶𝑡−1 ,𝑄𝑡−2 , R and P
Compiled by Jeka: 073 768 7330 www.jekanomics.com
(10 x ½ = 5)
(b) Explain the method of Indirect Least Squares (ILS) and comment if it will be suited for this
model?
(5)
(c) Explain the concept of unilateral causal dependence, and if it is relevant to the above
model?
Unilateral causal dependence is the defining feature of a recursive system, in which OLS can be
applied to each equation separately. That is not the case here: we
do have a simultaneous-equation problem in this situation. From the structure of the
system, it is clear that there is interdependence among the endogenous variables.
The second and third equations contain both exogenous and
endogenous variables on the right-hand side.
Thus, the equations do not exhibit unilateral causal dependence.
(3)
(d) In your own words, explain the problem of an equation being underidentified? (2)
Underidentification means that too little information is available: numerical values for the structural
parameters cannot be obtained from the estimated reduced-form coefficients.
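A quick way to check for underidentification is the order condition, which compares the number of predetermined variables excluded from an equation (K - k) with the number of endogenous variables in that equation less one (m - 1). The function below is a hypothetical sketch of that rule; the order condition is necessary but not sufficient, so the rank condition should still be verified.

```python
def order_condition(K, k, m):
    """Classify an equation by the order condition, K - k versus m - 1.

    K: predetermined variables in the whole model
    k: predetermined variables included in this equation
    m: endogenous variables included in this equation
    """
    excluded = K - k
    if excluded < m - 1:
        return "underidentified"
    if excluded == m - 1:
        return "exactly identified (by the order condition)"
    return "overidentified (by the order condition)"

# Hypothetical example: no predetermined variable is excluded from the equation.
print(order_condition(K=2, k=2, m=2))
```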
End
QUESTION 1: [55]
(a) Draw a graph of real imports of goods and services and paste it in the space provided. (1)
[Figure: REAL IMPORTS, 1970 to 2015]
(b) To what do you attribute the sudden drop in imports around 2009? (1)
[Figure: IMP_GDP Ratio, 1970 to 2015]
Comment:
i. Before South Africa's transition to democracy in 1994, imports were on average
below 20% of Gross Domestic Product, possibly because
South Africa was a relatively closed economy owing to the sanctions against
apartheid.
ii. In the period 2009-2015 the relative size of imports is about 29% of GDP,
which implies that the South African economy was by then integrated with the
international community (open economy).
(a) Test the variables for stationarity. Provide your results for the following variables in the table
below: (8)
You can assume that the variables LPZ, LRELPZ, LRAND and LCPI are non-stationary and
integrated of order one, I(1).
i) Estimate the following long-run cointegration equation and use your results to complete
the table. (4)
Variable Coefficient
LGDPt 1.372653
LRELPZt -0.491880
DUMt 0.220729
C -7.060542
(ii) Evaluate the potential long-run equation. Do the estimated coefficients correspond
to your a priori expectations in terms of size and sign? Explain. (6)
Economic Evaluation.
i. To evaluate our (potential) cointegration equation, it is evident that LGDP is
positively related to LIMP (i.e. it has a positive sign), which is correct
according to our a priori (economic theory) expectations. As for the
magnitude, the coefficient is not
between 0 and 1: if GDP increases by 1%, we expect imports to increase
by about 1.4%.
ii. It is also evident that LRELPZ is negatively related to LIMP (i.e. it has a
negative sign), which is correct according to our a priori (economic theory)
expectations. In absolute value the coefficient (0.49) lies
between 0 and 1: if relative prices decrease
by 1%, we expect imports to increase by about 0.49%.
iii. Dummy: Other things being equal, real imports were on average about
22% higher after 1994 (a coefficient of 0.220729 in log terms), which is in line with a priori economic theory expectations.
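Because the dependent variable is in logs, reading the dummy coefficient directly as a percentage is an approximation. The sketch below (hypothetical function name) compares that approximation with the exact percentage effect, 100(e^b - 1), for the estimated coefficient of 0.220729.

```python
import math

def dummy_pct_effect(coef):
    """Exact % difference implied by a dummy coefficient when the dependent variable is in logs."""
    return 100.0 * (math.exp(coef) - 1.0)

approx_effect = 100.0 * 0.220729            # coefficient read directly as a percentage
exact_effect = dummy_pct_effect(0.220729)   # exact transformation
print(round(approx_effect, 2), round(exact_effect, 2))
```

The gap between the two numbers grows with the size of the coefficient, so the exact form is the safer one to report for large dummy effects.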
ii) Generate the residual series: RES_IMP. Test the residual series for stationarity and complete
Statistically significant at the: 10% level (*), 5% level (**), 1% level (***)
iii) Can we conclude that the variables in the long-run equation are indeed cointegrated?
Explain. (2).
Our results show that the residual series RES_IMP is stationary: the test statistic is statistically significant at the 5% level (i.e. two stars),
which means that we can reject the null hypothesis of no cointegration.
Adjusted R²:0.765795
IMP = EXP(LIMP)
[Figure: imports (IMP), 1970 to 2015]
f) Supply your estimated value for imports (IMP^) for the year 2016. (2)
774080.70
g) Discuss the performance of the model (relative to the actual values). In your
discussion focus on two specific periods: 1970-2010 and 2011-2016. (4)
The actual and fitted values of real imports are reasonably close to each
other in the period 1970 to 2010. In the period 2011 to 2016, however, the
model seems to over- or underestimate the actual values.
(b) In this question, you need to analyse data related to housing loan applications of 40
individuals.
You are provided the following information in sheet “ECS4863 JanFeb 19 Question 2b” of
the MS EXCEL file “ECS4863_Jan Feb 2019 exam data.xls”.
$L_i = \ln\left( \frac{P_i}{1 - P_i} \right) = \beta_0 + \beta_1 Dep_i + \beta_2 IL_i + \varepsilon_i$
(ii) Estimate the preceding model using logistic modelling (Logit). Write down
the estimated results. (3)
(b)
(i) We expect a positive relationship between the loan application outcome and
deposits, because the more deposits a financial institution has in its coffers, the
more likely it is to lend them out.
We also expect a positive relationship between the loan application outcome
and the income-to-loan ratio, because the higher the income of an individual, the more
likely they are to pay back the loan. The bank will therefore grant the loan, as
the applicant can afford to repay it.
(iii) The estimated slope coefficient suggests that for a unit increase in deposits, the
weighted log of the odds in favour of the loan being approved goes up by 0.15
units. Similarly, for a unit increase in individual’s income relative to the loan
amount, the weighted log of the odds in favour of the loan being approved goes up
by 0.19 units. Both variables are statistically significant at 5% level of significance.
[15]
(a) Suppose you have monthly data over a number of years. How many dummy variables will you
introduce to test the following hypotheses (provide an answer for both a present and
suppressed intercept term)?
(b) Discuss the following measurement scales of variables and provide your own example of
each: (4)
An interval scale variable satisfies the last two properties of the ratio scale variable but
not the first. Thus, the distance between two time periods, say (2000–1995) is
meaningful, but not the ratio of two time periods (2000/1995). At 11:00 a.m. PST on
August 11, 2007, Portland, Oregon, reported a temperature of 60 degrees Fahrenheit
while Tallahassee, Florida, reached 90 degrees. Temperature is not measured on a
ratio scale since it does not make sense to claim that Tallahassee was 50 percent
warmer than Portland. This is mainly due to the fact that the Fahrenheit scale does not
use 0 degrees as a natural base.
A variable belongs to this category only if it satisfies the third property of the ratio scale
(i.e., natural ordering). Examples are grading systems (A, B, C grades) or income class
(upper, middle, lower).
For these variables the ordering exists but the distances between the categories cannot
be quantified. Students of economics will recall the indifference curves between two
goods. Each higher indifference curve indicates a higher level of utility, but one cannot
quantify by how much one indifference curve is higher than the others.
c) Classical linear regression relies strongly on various assumptions underlying the method
of least squares (OLS).
Assumption 1: The regression model is linear in the coefficients and the error term: This
assumption addresses the functional form of the model.
Assumption 2: The error term has the population mean of zero. Under this assumption
the error term accounts for the variation in the dependent variable that the independent
variable does not explain.
ii) Comment on how realistic these assumptions are in practice and how you can apply
them when reviewing research done by others. (3)
The reality of assumptions is an age-old question in the philosophy of economic science.
Some argue that it does not matter whether the assumptions are realistic. What matters
are the predictions based on those assumptions.
Notable among the proponents of the irrelevance-of-assumptions thesis is Milton Friedman. To him,
unreality of assumptions is a positive advantage: “to be important . . . a hypothesis must
be descriptively false in its assumptions.”
One may not subscribe to this viewpoint fully, but in any scientific study we make certain
assumptions because they facilitate the development of the subject matter in gradual
steps, not because they are necessarily realistic in the sense that they replicate reality.
Answers
(a)
(i) False because a behavioural equation can also include endogenous variables
as explanatory variables especially in a multi-equation model.
(ii) True, reduced-form coefficients are also known as impact, or short-run
multipliers, because they measure the immediate impact on the endogenous
variable of a unit change in the value of the exogenous variable.
(iii) False, they are not often under the control of the government.
(iv) False, R² is also calculated for simultaneous-equation models.
(b) Consider the following basic linear Keynesian macroeconomic model of the
South African economy:
$Y_t = C_t + I_t + G_t + NX_t$
$YD_t = Y_t - T_t$
$I_t = \beta_3 + \beta_4 Y_t + \beta_5 r_{t-1} + \varepsilon_{2t}$
Where:
i) Which of the variables are endogenous and which are exogenous? Make sure to
also include the lagged variables in your answer. (8 x ½ = 4)
ii) Which single-equation estimation method are you most likely to use to estimate the
reduced form equations? Why? (2)
END
MAY/JUNE 2019 – EXAMINATION.
MEMO
QUESTION 1: [55]
You are employed as the chief economist for the BRICS New Development Bank (NDB). Your task is to
prepare for a seminar at the African Regional Centre of the BRICS NDB in Sandton, Johannesburg. The
focus of the seminar is on possible drivers of economic growth in emerging economies that make up the
BRICS (Brazil, Russia, India, China and South Africa). You turn to theory and find that GDP is negatively
correlated with inflation and interest rates. The causality regarding unemployment is however difficult to
determine.
You decide to begin with a model for Brazil (with the intention of replicating the study for the other
countries in the group). You manage to get quarterly data for Brazil, for the period 1994Q1 to 2012Q4.
The variables of interest are:
Use the information as provided by the BRICS NDB and explain how you will calculate the
following (You may use mathematical/statistical notation, but make sure to explain its
components. Note that you only need to explain how you will go about calculating the variables,
no actual calculations are required):
$RGDP_t = \left( \frac{GDP_t}{CPI_t} \right) \times 100$
(b) Inflation Rate (using the quarter on same quarter of previous year method) (2)
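$INF_t = \left( \frac{CPI_t}{CPI_{t-4}} - 1 \right) \times 100$, where $CPI_{t-4}$ is the CPI in the same quarter of the previous year (four quarters earlier in quarterly data).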
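The two calculations can be sketched as follows. With quarterly data, the same quarter of the previous year lies four observations back, which is why the lag defaults to 4 here. The function names and the sample series are hypothetical.

```python
def real_gdp(nominal_gdp, cpi):
    """Deflate nominal GDP by the CPI (base period = 100)."""
    return [100.0 * g / p for g, p in zip(nominal_gdp, cpi)]

def yoy_inflation(cpi, periods_per_year=4):
    """% change in the CPI against the same quarter one year earlier."""
    # No rate can be computed for the first year of data.
    rates = [None] * min(periods_per_year, len(cpi))
    for t in range(periods_per_year, len(cpi)):
        rates.append(100.0 * (cpi[t] / cpi[t - periods_per_year] - 1.0))
    return rates

# Hypothetical quarterly series (five quarters).
cpi_sample = [100.0, 101.0, 102.0, 103.0, 110.0]
gdp_sample = [500.0, 505.0, 510.0, 515.0, 550.0]
rgdp = real_gdp(gdp_sample, cpi_sample)
inf = yoy_inflation(cpi_sample)
```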
(c) Real Interest rates (use money market rates as proxy for interest rates) (1)
After cleaning and renaming the data, you have the following variables:
BRA_GDP = Brazil Real Gross Domestic Product (GDP), (National currency, 2010 = 100)
BRA_INFLA = Brazil Inflation Rate, Percent (quarter on same quarter of previous year)
BRA_INT = Brazil Real interest rate, (percentage per annum)
BRA_UNEMP = Brazil Unemployment rate (percent, %)
BRA_DUM = Dummy variable for inflation targeting (introduced in 1999, thus dummy = 0
before 1999; and 1 from 1999Q1)
Quarterly data for the above variables is available in sheet “ECS4863 MayJun 19 Question 1” of the
MS EXCEL file “ECS4863_May June 2019 exam data.xls”.
(a) Test the variables for stationarity. Provide your results for the following variables in the
table below: (8)
Assume that the other variables (i.e. BRA_INFLA, BRA_INT, BRA_DUM) are non-stationary and
integrated of order one, I(1).
(i) Estimate the following long-run cointegration equation and use your results to
complete the table. (2)
Variable Coefficient
LBRA_INFLAt -0.095827
LBRA_UNEMPt -0.635853
BRA_DUMt 0.504946
C 28.37804
When BRA_DUM = 1, LBRA_GDP is higher by 0.504946, other things being equal. The coefficient is positive.
(iv) Generate the residual series: RES_BRAZIL. Perform a unit root test on
RES_BRAZIL and report your answers in the table below. (2)
Statistically significant at the: 10% level (*), 5% level (**), 1% level (***)
(v) Can we conclude that the variables in the long-run equation are indeed
cointegrated? Explain. (2)
The results indicate that the residual series RES_BRAZIL is stationary: the test statistic is statistically significant at the 1% level (i.e. three stars),
which means that we can reject the null hypothesis of no cointegration.
(c) Build an ECM for the GDP in Brazil, using the variables as provided in the table below.
(d) Perform diagnostic checks on the ECM.
LBRA_GDP = -0.378687080036*D(LBRA_GDP(-1)) -
0.0461748136591*D(LBRA_INFLA) - 0.295188744854*D(LBRA_UNEMP) -
2.78187891141e-05*D(BRA_INT) - 0.0797509566805*RES_BRAZIL(-1) +
0.0159654498461+LBRA_GDP(-1)
BRA_GDP = EXP(LBRA_GDP)
(ii) Draw the graph of the actual and modelled values. (4)
[Figure: actual and modelled values, 1994 to 2012]
(i) A graph showing the impact of a temporary increase of 20% in the inflation rate,
during 2005 (that is 2005Q1 to 2005Q4). (5)
[Figure: BRA_INFLA and BRA_INFLA_TS, 1994 to 2012]
(ii) A graph showing the impact of a permanent worsening of 10% in the unemployment
rate, starting from 2000Q1. (5)
[Figure: BRA_UNEMP and BRA_UNEMP_PS, 1994 to 2012]
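The two scenario series differ only in how long the shock lasts: a temporary shock is applied over a closed window, a permanent shock from its start date to the end of the sample. The sketch below builds both, interpreting the shocks as relative (multiplicative) changes, which is an assumption. The function name, the period labels and the flat sample values are hypothetical.

```python
def apply_shock(series, periods, pct, start, end=None):
    """Scale `series` by (1 + pct/100) from `start` to `end` inclusive.

    If `end` is None the shock is permanent and runs to the last period.
    """
    i0 = periods.index(start)
    i1 = periods.index(end) if end is not None else len(series) - 1
    factor = 1.0 + pct / 100.0
    return [v * factor if i0 <= i <= i1 else v for i, v in enumerate(series)]

# Hypothetical quarterly sample, 1999Q1 to 2006Q4, with flat baseline values.
periods = [f"{year}Q{q}" for year in range(1999, 2007) for q in range(1, 5)]
infla = [5.0] * len(periods)
unemp = [10.0] * len(periods)

infla_ts = apply_shock(infla, periods, 20.0, "2005Q1", "2005Q4")  # temporary shock
unemp_ps = apply_shock(unemp, periods, 10.0, "2000Q1")            # permanent shock
```

Solving the model once with the baseline series and once with the shocked series, and plotting the two solutions together, gives the kind of impact graph asked for in the question.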
QUESTION 2: [20]
a) Briefly explain the main differences between a Linear Probability model, Probit and logit.
b) Discuss the use of pseudo R-squared as measure of goodness of fit in binary regressand
models.
The pseudo R-squared is a measure of goodness of fit for some common nonlinear regression
models. For example, the Cox and Snell R-squared is the analogue of the usual R-squared of linear
regression, but it is based on the likelihoods of the models with and without predictors.
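Two common pseudo R-squared measures, together with the LR statistic, can be computed directly from the log-likelihoods of the fitted model and of the intercept-only (null) model. The sketch below uses the standard McFadden and Cox and Snell formulas; the log-likelihood values and the sample size are hypothetical.

```python
import math

def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo R-squared: 1 - lnL(model) / lnL(null)."""
    return 1.0 - ll_model / ll_null

def cox_snell_r2(ll_model, ll_null, n):
    """Cox and Snell pseudo R-squared: 1 - (L_null / L_model)^(2/n), from log-likelihoods."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def lr_statistic(ll_model, ll_null):
    """Likelihood-ratio statistic for the joint significance of the predictors."""
    return 2.0 * (ll_model - ll_null)

# Hypothetical log-likelihoods for a sample of 100 observations.
ll_null, ll_model, n_obs = -60.0, -45.0, 100
print(mcfadden_r2(ll_model, ll_null),
      cox_snell_r2(ll_model, ll_null, n_obs),
      lr_statistic(ll_model, ll_null))
```

Neither measure can be read as a share of explained variance, and values well below those typical of linear regression are common in binary models.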
Comment: Qualitatively, the results of the logit model:
Individually, 3 of the coefficients are statistically significant and 3 are statistically
insignificant; collectively they are significant, since the value of the LR statistic is 35.10005. The value of is not very
large. Of course in most empirical research typically one could not hope to find predictors which
are strong enough to give predicted probabilities so close to 0 or 1, and so one shouldn't be
surprised.
iv) Based on these results, what recommendations do you make about cancer to the
Minister of Health. (4).
More physical examinations: a doctor may feel areas of the body for lumps that may indicate a tumour,
and laboratory tests, such as urine and blood tests, may help the doctor identify abnormalities
that can be caused by cancer in South Africa.
How these effects interact with breast cancer risk depends on a woman's age. Women who give
birth to their first child at age 35 or younger tend to get a protective benefit from pregnancy.
Breast cancer risk is increased for about 10 years after a first birth.
The evidence also shows that, in general, the more weight people gain as adults, the higher the
risk of postmenopausal breast cancer. In contrast, the evidence shows that, in general, the more
excess weight people have as young adults, the lower the risk of breast cancer
QUESTION 3: [15]
a) Log(Wage) =β0 + β1Educ + β2Exper + β3 Gender + β4 Marij + u.
(i) Based on your results, what is the difference in monthly salary between Marijuana
smokers and non-smokers? (4)
Dependent Variable: WAGE
Method: Least Squares
Date: 11/27/19 Time: 12:22
Sample: 1 935
Included observations: 935
Prob(F-statistic) 0.000000
(4)
Model:
(ii) Do your results change when you consider the squared values of experience
(Exper2)? How would you justify such a specification?
(3)
o The results will change. One simple way to capture diminishing returns is to add
a quadratic term to a linear relationship. With a positive coefficient on Exper and a
negative coefficient on Exper2, each additional year of experience
increases the wage by less than the previous year, reflecting a diminishing
marginal return to experience. Beyond a certain level of experience, the estimated effect of an
additional year even becomes negative. This is not very realistic, but it is one of the
consequences of using a quadratic function to capture a diminishing marginal
effect: at some point, the function must reach a maximum and curve downward.
For practical purposes, the point at which this happens is often large enough to
be inconsequential, but not always.
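The point at which the quadratic turns downward follows directly from the two experience coefficients. In the sketch below the marginal effect is b_exper + 2 * b_exper2 * exper and the turning point is -b_exper / (2 * b_exper2). The coefficient values are hypothetical, since the estimation output is not reproduced here.

```python
def marginal_effect(b_exper, b_exper2, exper):
    """Change in log(wage) from one more year of experience, at a given experience level."""
    return b_exper + 2.0 * b_exper2 * exper

def turning_point(b_exper, b_exper2):
    """Experience level at which the quadratic reaches its maximum (needs b_exper2 < 0)."""
    if b_exper2 >= 0:
        raise ValueError("no interior maximum unless the squared term is negative")
    return -b_exper / (2.0 * b_exper2)

# Hypothetical coefficients on Exper and Exper squared.
b1, b2 = 0.04, -0.0008
print(turning_point(b1, b2), marginal_effect(b1, b2, 10.0))
```

If the turning point lies beyond the range of experience observed in the sample, the downward-sloping part of the curve has no practical relevance.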
b) Specify a model that would allow you to test whether drug usage has different effects
on earnings for men and women.
c) Hypothesising that marijuana usage varies across individuals, you decide to categorise
people as follows: (i) Non-user; (ii) Light user (1 to 5 times per month); (iii) Moderate
user (6 to 10 times per month); and (iv) Heavy user (more than 10 times per month).
How does your model specification change?
o Take the base group to be non-users. Dummy variables are then needed for
the other three groups: lghtuser, moduser and hvyuser. Assuming no interactive
effect with gender, the model would be:
Log(Wage) = β0 + β1Educ + β2Exper + β3Gender + δ1lghtuser + δ2moduser + δ3hvyuser + u.
d) Explain in detail what a “dummy variable trap” is. How can you overcome this challenge?
The dummy variable trap arises when a set of dummy variables is perfectly
collinear, so that OLS cannot identify the parameters of the model.
That happens mainly if you include all dummies from a certain variable, e.g. you have
3 dummies for education: "no degree", "high school", and "college". If you include all
dummies in the regression together with an intercept (a vector of ones), then this set of
dummies will be linearly dependent with the intercept and OLS cannot be computed.
The solution to the dummy variable trap is to drop one of the dummy variables (or,
alternatively, to drop the intercept): if there are m categories, use m - 1
dummies in the model. The category left out can be thought of as the reference category, and the
coefficients of the remaining categories represent the difference from that reference.
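The linear dependence behind the trap is easy to see with a small example. In the sketch below (hypothetical function name and data), the full set of dummies sums to one in every row, which reproduces the intercept column exactly; dropping the first category removes the dependence.

```python
def dummy_columns(categories, levels, drop_first=True):
    """Build 0/1 dummy columns; by default the first level is dropped as the reference group."""
    used = levels[1:] if drop_first else levels
    return {lvl: [1 if c == lvl else 0 for c in categories] for lvl in used}

# Hypothetical education data with three categories.
levels = ["no degree", "high school", "college"]
educ = ["college", "no degree", "high school", "college"]

all_dummies = dummy_columns(educ, levels, drop_first=False)
# Summing the dummies row by row reproduces the intercept column of ones: the trap.
row_sums = [sum(col[i] for col in all_dummies.values()) for i in range(len(educ))]

safe_dummies = dummy_columns(educ, levels)   # m - 1 = 2 dummies, "no degree" is the base
```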
QUESTION 4: [10]
You are employed as the Chief Economist in the Department of Energy where there is
increasing need to understand the energy demand and supply factors. After your extensive
literature search, you decide to follow the work of Halvorsen (1975) and specify the following
model:
Electricity supply
$\log P = \beta_1 + \beta_2 \log Q + \beta_3 \log L + \beta_4 \log IPP + \beta_5 \log F + \beta_6 \log R + \beta_7 \log I + \beta_8 \log T + u$
Where:
Q = Residential electricity sales
P = Price of residential electricity
Y = Annual income
G = Price for residential gas
D = Heating Degree Days (a measure of how cold the temperature was on a given
day or over a period of days)
J = Average June temperature
R = Percentage of population in rural areas
H = Household size (number of people in the household)
T = Time trend variable
L = Labour cost
IPP = Percentage generated by Independent Power Producers
F = Fuel cost per Kilowatt-hour generation
I = Ratio of industrial sales to residential sales
(a) Which of the variables are endogenous and which are exogenous? (4)
Endogenous variables
log Q and log P.
Exogenous variables.
logY, logL, logG, logD, logJ, logR, logH, logIPP, logF, logI and logT
(c) Describe in detail how you would estimate this model. (4)
End
JANUARY/FEBRUARY 2016.
MEMO.
QUESTION 1: [65]
For Question 1 you need to estimate a demand function for skilled labour in South Africa. In theory
the production function may be used to derive the demand for labour within a framework of profit
maximization. The theoretical specification of the demand for skilled labour (NS) may be specified as:
Where:
Variable | Description
NS = demand for skilled labour | Labour: Employment in the non-agricultural sectors: Grand total (Seasonally adjusted, 2010=100 (Period))
GDP = real output (real GDP at market prices) | Gross domestic product at market prices (GDP)
LC = nominal unit labour cost | Labour: Labour costs in the non-agricultural sectors: Nominal unit labour costs (Seasonally adjusted, 2010=100 (Period))
LP = labour productivity | Labour: Labour costs in the non-agricultural sectors: Labour productivity (Seasonally adjusted, 2010=100 (Period))
Annual data for the above variables is available on sheet “ECS4863 JanFeb16 Question 1” of the MS
EXCEL file “ECS4863_Jan Feb 2016 exam_data.xls” for the period 1970-2014 (i.e. 45 years).
(Hint: create a workfile in EViews using the regular frequency/annual options and for the relevant years.
Then copy the data from EXCEL into EViews.)
(a) Use the ADF test to test all four variables for unit roots. Provide your answers in the table below
(Hint: please remember to log variables before performing the tests): (16)
(i) Estimate the following long-run cointegration equation and use your results to
complete the table. (Remember to include an intercept term.) (4)
Variable Coefficient
LGDPt 1.298990
LLCt -0.129727
(ii) Interpret the coefficients of the long-run equation. Do the coefficients correspond
to your a priori expectations in terms of their size and sign? (8)
To evaluate our (potential) cointegration equation, it is evident that LGDP is positively related
to LNS (i.e. it has a positive sign), which is correct according to our a priori (economic theory)
expectations. As for the magnitude, the coefficient is not between 0 and 1:
if GDP increases by 1%, we expect NS to increase by about 1.30%.
It is also evident that LLC is negatively related to LNS (i.e. it has a negative sign), which is correct
according to our a priori (economic theory) expectations. In absolute value the
coefficient lies between 0 and 1: if LC decreases
by 1%, we expect NS to increase by about 0.13%.
(iii) Generate the residual series with the command GENR: RESNS = RESID. Perform a unit
root test on RESNS and report your answers in the table below (2)
Statistically significant at the: 10% level (*), 5% level (**), 1% level (***)
(iv) Can we conclude that the variables in the long run equation are indeed
cointegrated? Explain. (2)
The results indicate that the residual series RESNS is stationary: the test statistic is statistically significant at the 1% level (i.e. three stars),
which means that we can reject the null hypothesis of no cointegration.
(c) Build an Error Correction Model (ECM) for the demand for skilled labour.
(ii) Can we interpret the signs and size of the coefficients in the ECM? (2)
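The coefficient of the lagged residual (RESNS(-1)) is -0.036667. It is negative and significant and lies
between -1 and 0, as required for error correction: about 3.7% of the previous period's deviation from the long-run equilibrium is corrected in each period.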
(d) Perform diagnostic checks on the ECM.
Test Null hypothesis Test statistic P-value Conclusion
(ii) Given your conclusions on the diagnostic check of the ECM, do you think that this is an
acceptable model? (Please provide reasons, no marks will be earned for
only stating yes/no.) (3)
NO:
Reasons:
o Residuals are not normally distributed
o There is autocorrelation up to the order 2.
o There is heteroscedasticity up to the order 2.
(e) Regardless of the results you obtained in question 1(d), suppose you still decide to create a model
statement in EViews to combine your long run and ECM.
(i) Provide the missing values/variables in the model statement (please write your
answer next to the correct option in the space provided below the statement): (5)
NS = EXP((e)--------- )
(a) LNS (b) 1.2989899682 (c) LNS (d) 0.970114223164
(e) LNS
(iii) Graph the actual and estimated values for the demand for labour. Comment on the fit
you observe. (Hint: copy/paste your graph of NS and NS_0.)
[Figure: NS and NS (Baseline), 1970 to 2010]
Comment: A very good fit is observed. However, in practice this is not always the case. Keep in
mind what the purpose of the model is (e.g. academic publication, scenario analysis, forecasting,
etc.) and also how good/reliable the data are that you are working with. But if you went through
all the checks/steps of the modelling process and you got good results up to here, the model results
will usually be acceptable. (3)
QUESTION 2: [20]
(i) List and briefly explain two possible problems when using OLS to estimate linear
probability models (LPM’s).
Non-normality of the error term: The assumption that the error is normally distributed is
critical for performing hypothesis tests after estimating your econometric model. In an LPM the
dependent variable takes only the values 0 and 1, so for given values of the regressors the error
term can take only two values and cannot be normally distributed.
Heteroscedasticity: The classical linear regression model (CLRM) assumes that the error
term is homoscedastic. The assumption of homoscedasticity is required to prove that the
OLS estimators are efficient (or best), which is an
important component of the Gauss-Markov theorem. In an LPM the error variance depends on the
values of the regressors, so the errors are heteroscedastic: the Gauss-Markov conditions are
violated and the OLS estimators are no longer efficient.
ii) What other measures of goodness of fit, apart from R², are available in binary regressand
models? List any two and briefly explain how they work. (2)
The pseudo R-squared is a measure of goodness of fit for some common nonlinear
regression models.
The Cox and Snell R-squared: the analogue of the usual R-squared of linear regression; it depends on the
likelihoods of the models with and without predictors.
(b) For Question 2 (b) you need to estimate a logistic regression (logit) function to explain the
determinants of being accepted into an honours module at university, using hypothetical
data obtained from 200 students
HON = if a student is accepted into an honours module or not, where 1 = yes and 0 = no.
READ = reading mark obtained (out of 100)
MATH = mathematics mark obtained (out of 100)
The variables are provided in the sheet named: ECS4863 JanFeb16 Question 2.
(i) Hypothesise the expected relationship with the two explanatory variables, i.e.
READ and MATH.
H0: B1 = B2 =0
For every one-unit increase in reading mark obtained (so, for every additional point on the
reading test), we expect an increase in the log-odds of HON
For every one-unit increase in MATH, we expect an increase in the log-odds of HON.
(ii) Import the data into EViews and estimate the following function (Hint: remember
to change your estimation setting to binary/logit): (2)
MATH: The coefficient of 0.118779 attached to MATH is to be interpreted as follows: take its
antilog, subtract one from it and multiply the result by 100. Thus, antilog(0.118779) = 1.12612;
subtracting one from this and multiplying the difference by 100 gives 12.61%. This means that
if MATH (the mathematics mark) increases by one unit, the odds in favour of HON (a student being
accepted into an honours module) go up by 12.61%.
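The antilog calculation described above can be reproduced in a few lines. The function name is hypothetical; the coefficient is the one reported for MATH.

```python
import math

def odds_pct_change(logit_coef):
    """Percentage change in the odds for a one-unit increase in the regressor."""
    return 100.0 * (math.exp(logit_coef) - 1.0)

math_effect = odds_pct_change(0.118779)
print(round(math_effect, 2))
```

Note that this is a change in the odds, not in the probability itself: the effect on the probability depends on the level at which it is evaluated.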
(a) Explain simultaneous-equation bias and why, or in which instances, OLS may not be
applied in a system of simultaneous equations.
Given the following equations representing a simultaneous equations model:
$J_1 = \alpha_1 J_2 + \beta_1 Z_1 + u_1$
$J_2 = \alpha_2 J_1 + \beta_2 Z_2 + u_2$
Whenever an explanatory variable is also an endogenous variable, the ordinary least squares
(OLS) estimator of its coefficient is biased. This arises when one or
more of the explanatory variables are jointly determined with the dependent variable.
In the first equation, OLS suffers from simultaneous-equation bias because 𝐽2 is correlated with 𝑢1 as a result of
simultaneity.
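The source of the correlation can be shown by solving the two equations for $J_2$. The derivation below is a sketch that writes the system as $J_1 = \alpha_1 J_2 + \beta_1 Z_1 + u_1$ and $J_2 = \alpha_2 J_1 + \beta_2 Z_2 + u_2$, with the coefficient symbols $\alpha$ and $\beta$ assumed for illustration. It further assumes that $Z_1$ and $Z_2$ are exogenous, that $u_1$ and $u_2$ are uncorrelated, that $\alpha_1 \alpha_2 \neq 1$, and that $\sigma_1^2$ denotes the variance of $u_1$.

```latex
\begin{aligned}
J_2 &= \alpha_2\,(\alpha_1 J_2 + \beta_1 Z_1 + u_1) + \beta_2 Z_2 + u_2 \\
(1 - \alpha_1\alpha_2)\, J_2 &= \alpha_2\beta_1 Z_1 + \beta_2 Z_2 + \alpha_2 u_1 + u_2 \\
\operatorname{Cov}(J_2, u_1) &= \frac{\alpha_2\,\sigma_1^2}{1 - \alpha_1\alpha_2} \neq 0
  \quad \text{whenever } \alpha_2 \neq 0
\end{aligned}
```

Because $J_2$ contains $u_1$ through the feedback from $J_1$, the regressor and the error term of the first equation are correlated, which violates the OLS assumption that regressors are uncorrelated with the error.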
QUESTION 4: [8]
You have recently been employed as economic advisor to the Department of Agriculture. As
part of your primary duties, your line manager asks you to build an econometric model to
explain the value added by the agriculture, forestry and fishing sector to the overall GDP of
South Africa.
(a) Before you start working on the model, you are requested to provide a brief overview of the
methodology that you plan to apply. Your overview should include items such as the various
steps to be followed, a discussion of potential explanatory variables (including a priori
expectations) and the estimation technique(s) to be employed. (Hint: please do not provide any
regression or other output.) Your answer should not exceed 2 pages.
Discuss:
Statement of Economic Theory or Hypothesis.
Specification of the Mathematical Model of Consumption (single-equation model)
Specification of the Econometric Model.
Obtaining Data.
Estimation of the Econometric Model
Hypothesis Testing
Forecasting or Prediction
Use of the Model for Control or Policy Purposes
End.