Assignment of Econometrics
Statistics Econometrics Worksheet (assignment)
1. Explain briefly each of the following terms. (assignment)
a. Autocorrelation
b. Multicollinearity
c. Errors in variables
d. Model misspecification
e. Qualitative variables
f. Lagged variables
a. Autocorrelation: Autocorrelation refers to the correlation between observations of a variable at
different points in time in time series data. It measures the degree to which the current value of a
variable is correlated with its own past values. In a regression model, the term usually describes
correlation among the error terms of different periods, which violates the classical assumption of
uncorrelated errors. Positive autocorrelation indicates persistence in the data, while negative
autocorrelation suggests an alternating pattern. Autocorrelation matters in time series analysis
because it affects forecasting and the validity of the usual standard errors.
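The difference between positive and negative autocorrelation can be illustrated with a minimal sketch
of the sample lag-1 autocorrelation coefficient. The two series and the function name are invented
for illustration and are not part of the assignment data.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    # Numerator: cross-products of deviations that are `lag` periods apart.
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


# Hypothetical series, invented for illustration only.
persistent = [1, 2, 3, 4, 5, 6, 7, 8]        # smooth upward movement
alternating = [1, -1, 1, -1, 1, -1, 1, -1]   # sign flips every period

r_persistent = autocorrelation(persistent)    # positive: neighbours move together
r_alternating = autocorrelation(alternating)  # negative: neighbours move oppositely
print(r_persistent, r_alternating)
```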
c. Errors in variables: Errors in variables, also known as measurement errors, refer to the situation
where the values of one or more variables in a regression model are measured with error. Random
measurement error in the dependent variable leaves the OLS coefficients unbiased but inflates their
variance. Measurement error in an independent variable makes OLS biased and inconsistent: with
classical (random) error in a single regressor, the estimated coefficient is biased toward zero, which
is called attenuation bias. With several regressors the bias can go in either direction, so ignoring
measurement error can lead to misleading conclusions.
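A small simulation can make the effect of measurement error in a regressor concrete. This is only a
sketch under assumed values: the true slope of 2.0, the sample size, the noise variances, and the seed
are all chosen for illustration.

```python
import random


def ols_slope(x, y):
    """Slope of the simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 5000
true_slope = 2.0         # assumed value for the simulation

x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 + true_slope * x + rng.gauss(0, 0.5) for x in x_true]
# Observed regressor = true regressor + measurement error of equal variance.
x_observed = [x + rng.gauss(0, 1) for x in x_true]

slope_clean = ols_slope(x_true, y)       # close to the true slope
slope_noisy = ols_slope(x_observed, y)   # shrunk toward zero
print(slope_clean, slope_noisy)
```

Because the measurement error here has the same variance as the true regressor, the slope estimated
from the noisy regressor should land near half of the true slope.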
d. Model misspecification: Model misspecification occurs when the assumed functional form or
structure of a statistical model does not accurately represent the true relationship between the
variables being studied. It can arise from omitting relevant variables, including irrelevant variables, or
incorrectly specifying the functional form of the relationship. Omitting a relevant variable or using
the wrong functional form generally biases the parameter estimates, while including an irrelevant
variable leaves them unbiased but less efficient. It is therefore important to assess and validate the
assumptions and specification of a model.
f. Lagged variables: Lagged variables, in the context of time series analysis, are variables that are
shifted or delayed in time relative to another variable. A lagged variable represents the value of a
variable at a previous time period. Lagged variables are used to capture the influence of past values
on the current value of a variable and are commonly employed in forecasting and modeling dynamic
relationships in time series data. For example, a lag of one in a monthly sales series represents the
sales value from the previous month.
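As a minimal sketch of how a lagged variable is built, the helper below shifts a series by k periods.
The function name and the monthly sales figures are hypothetical.

```python
def lag(series, k=1):
    """Return `series` shifted by k periods.

    The first k entries have no earlier observation, so they are set to None.
    """
    if k <= 0:
        return list(series)
    return [None] * k + list(series[:-k])


# Hypothetical monthly sales, invented for illustration only.
monthly_sales = [120, 135, 128, 150, 160]
sales_lag1 = lag(monthly_sales, 1)   # previous month's sales
sales_lag2 = lag(monthly_sales, 2)   # sales from two months earlier

for current, previous in zip(monthly_sales, sales_lag1):
    print(current, previous)
```

Observations whose lag is None are usually dropped before estimation, which is why each additional
lag costs one usable observation.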
2. List and discuss the stages to be followed in econometric modeling.
2. Stages in Econometric Modeling:
a) Problem Formulation: In this stage, the researcher defines the research question or problem to be
addressed. It involves clearly stating the objectives, identifying the variables of interest, and
formulating hypotheses.
b) Data Collection: The researcher collects relevant data that are necessary to test the hypotheses and
answer the research question. Data can be collected from various sources such as surveys,
government databases, or existing datasets.
c) Data Preparation: In this stage, the collected data is organized, cleaned, and prepared for analysis.
This includes checking for missing values, outliers, and inconsistencies, as well as transforming
variables if needed.
d) Model Specification: The researcher specifies the functional form and structure of the econometric
model. This involves selecting the appropriate variables, determining their relationship, and
specifying the mathematical form of the model.
e) Estimation: In this stage, statistical techniques are applied to estimate the parameters of the
econometric model. The most commonly used method is Ordinary Least Squares (OLS), but other
estimation techniques like Maximum Likelihood Estimation (MLE) or Generalized Method of
Moments (GMM) can be used depending on the model and data characteristics.
f) Model Evaluation: The estimated model is evaluated in terms of its goodness-of-fit, statistical
significance of variables, and model assumptions. Various diagnostic tests, such as checking for
heteroscedasticity, autocorrelation, or multicollinearity, are performed to assess the model's validity.
g) Hypothesis Testing: Based on the estimated model, hypotheses are tested using appropriate
statistical tests. This involves testing the significance of individual coefficients, overall model
significance, and other specific hypotheses relevant to the research question.
h) Interpretation and Policy Implications: The final stage involves interpreting the estimated
coefficients and drawing conclusions about the relationships between variables. The results are then
used to make policy recommendations or draw insights for decision-making.
3. Write the advantage of using the Breusch–Godfrey (BG) test over that of the Durbin-Watson test.
(assignment)
3. Advantages of the Breusch-Godfrey (BG) test over the Durbin-Watson test:
The Breusch-Godfrey (BG) test and the Durbin-Watson (DW) test are both used to test for
autocorrelation in a regression model. However, the BG test has some advantages over the DW test:
a) Higher-Order Autocorrelation: The BG test can detect autocorrelation of any chosen order p, while
the DW test is designed only for first-order, AR(1), autocorrelation. The BG test also remains valid
when the regressors include lagged values of the dependent variable, and it allows for moving-average
as well as autoregressive error structures.
b) More General Test Statistic: The BG test uses a Lagrange multiplier (LM) statistic, (n - p)R2, taken
from an auxiliary regression of the OLS residuals on the original regressors and p lagged residuals.
The statistic is asymptotically chi-square with p degrees of freedom, so the test always yields a clear
decision. In contrast, the DW test requires a model with an intercept and no lagged dependent
variable, and it has an inconclusive zone between dl and du in which no decision can be reached.
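The BG procedure can be sketched in plain Python as follows. This is an illustrative implementation
under stated assumptions, not a reference one: it handles a single regressor plus intercept, uses the
(n - p)R2 form of the statistic, and runs on simulated data with AR(1) errors (rho = 0.8, seed 1). The
helper names solve, ols and breusch_godfrey are made up for this example, and the 5% chi-square
critical value for 2 degrees of freedom (5.991) is hard-coded from standard tables.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols(rows, y):
    """OLS via the normal equations; returns (coefficients, residuals)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    return beta, resid


def breusch_godfrey(x, y, p):
    """LM statistic (n - p) * R^2 for autocorrelation up to order p."""
    n = len(y)
    _, e = ols([[1.0, xi] for xi in x], y)
    # Auxiliary regression: e_t on (1, x_t, e_{t-1}, ..., e_{t-p}).
    rows = [[1.0, x[t]] + [e[t - j] for j in range(1, p + 1)] for t in range(p, n)]
    target = e[p:]
    _, u = ols(rows, target)
    mean_t = sum(target) / len(target)
    tss = sum((v - mean_t) ** 2 for v in target)
    r2 = 1.0 - sum(v * v for v in u) / tss
    return (n - p) * r2


# Simulated data (assumption): y = 1 + 2x + u, with u following an AR(1) process.
rng = random.Random(1)
n, rho = 200, 0.8
x = [rng.gauss(0, 1) for _ in range(n)]
u, prev = [], 0.0
for _ in range(n):
    prev = rho * prev + rng.gauss(0, 1)
    u.append(prev)
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

CHI2_CRIT_2DF = 5.991   # 5% critical value, chi-square with 2 df (from tables)
lm = breusch_godfrey(x, y, p=2)
print(lm, lm > CHI2_CRIT_2DF)
```

Unlike the DW statistic, the LM value is compared with a single critical value, so there is no
inconclusive region.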
a) Calculate the OLS (ordinary least squares) estimates (β^0 and β^1).
b) Calculate the estimate of the variance σ^2 of β^0 and β^1 and interpret.
Given the equation: Yi = β0 + β1Xi + εi
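a) Using the provided data:
X1 = 2, X2 = 1, X3 = 0
Y1 = 2, Y2 = 4, Y3 = 3
the sample means are X̄ = 1 and Ȳ = 3. The OLS estimates (β^0 and β^1) follow from the formulas:
β^1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)^2
= [(1)(-1) + (0)(1) + (-1)(0)] / [1 + 0 + 1]
= -1 / 2
= -0.5
β^0 = Ȳ - β^1 X̄
= 3 - (-0.5)(1)
= 3.5
The denominator Σ(Xi - X̄)^2 = 2 is nonzero, so both estimates are well defined. The fitted line is
Y^i = 3.5 - 0.5Xi: a one-unit increase in X is associated with a 0.5-unit decrease in Y, and the
predicted value of Y at X = 0 is 3.5.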
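The arithmetic for the three observations can be checked with a short sketch. The residual variance
and coefficient variances at the end assume that part (b) asks for the usual OLS variance estimates;
the question text is garbled in the source, so that reading is an assumption.

```python
X = [2, 1, 0]
Y = [2, 4, 3]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n

# Sums of cross-products and squares of deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
s_xx = sum((x - x_bar) ** 2 for x in X)

beta1_hat = s_xy / s_xx                  # slope
beta0_hat = y_bar - beta1_hat * x_bar    # intercept

# Assumed reading of part (b): residual variance and coefficient variances.
residuals = [y - (beta0_hat + beta1_hat * x) for x, y in zip(X, Y)]
rss = sum(e ** 2 for e in residuals)
sigma2_hat = rss / (n - 2)               # two estimated parameters
var_beta1 = sigma2_hat / s_xx
var_beta0 = sigma2_hat * sum(x ** 2 for x in X) / (n * s_xx)

print(beta0_hat, beta1_hat, sigma2_hat, var_beta0, var_beta1)
```

With only three observations there is a single residual degree of freedom, so these variance
estimates are very imprecise.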
5. Given the following expenditure (Yt) and income (Xt) data for a group of people:
Yi 2.3 2.3 2.0 2.5 1.1 3.0 3.3 2.5 2.5 3.8
Xi 2.8 2.0 1.4 2.2 0.6 1.5 1.0 0.2 0.7 1.8
After ordering the observations according to the X values, remove the two central values (c = 2) and,
by applying OLS to each subset, use the Goldfeld-Quandt test to test for heteroscedasticity in the
model Yt = b0 + b1Xt + ut and give your conclusions. Use α = 0.05.
(assignment)
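Step 1: Order the observations based on increasing values of X.
X = [0.2, 0.6, 0.7, 1.0, 1.4, 1.5, 1.8, 2.0, 2.2, 2.8]
Y = [2.5, 1.1, 2.5, 3.3, 2.0, 3.0, 3.8, 2.3, 2.5, 2.3]
Step 2: Remove the two central observations (X = 1.4 and X = 1.5), leaving two subsets of four
observations each.
Subset 1 (lower X values): X = [0.2, 0.6, 0.7, 1.0], Y = [2.5, 1.1, 2.5, 3.3]
Subset 2 (higher X values): X = [1.8, 2.0, 2.2, 2.8], Y = [3.8, 2.3, 2.5, 2.3]
Step 3: Apply OLS to each subset separately.
Subset 1:
Y1 = b0 + b1*X1 + u1
Subset 2:
Y2 = b0 + b1*X2 + u2
Step 4: Compute the residuals and squared residuals, using each subset's own estimates (the
superscript on b0 and b1 denotes the subset).
Subset 1:
e1 = Y1 - (b0^1 + b1^1*X1)
e1^2 = e1 * e1
Subset 2:
e2 = Y2 - (b0^2 + b1^2*X2)
e2^2 = e2 * e2
Step 5: Compute the test statistic
F = (RSSR / df) / (RSSL / df)
where:
RSSR = Sum of squared residuals for the subset with higher X values
RSSL = Sum of squared residuals for the subset with lower X values
df = (n - c)/2 - k = (10 - 2)/2 - 2 = 2, with n = 10 observations, c = 2 omitted central
observations, and k = 2 estimated parameters in each regression
For these data, RSSL ≈ 2.167 and RSSR ≈ 0.925, so F ≈ 0.925 / 2.167 ≈ 0.43.
Step 6: Using the F-distribution with (2, 2) degrees of freedom, the critical value for the specified
significance level (α = 0.05) is 19.00.
If the test statistic is greater than the critical value, we reject the null hypothesis of
homoscedasticity and conclude that heteroscedasticity exists. If the test statistic is less
than or equal to the critical value, we fail to reject the null hypothesis.
Conclusion: Since F ≈ 0.43 is less than 19.00, we fail to reject the null hypothesis of
homoscedasticity. At the 5% level there is no evidence of heteroscedasticity in the model, although
with only two degrees of freedom in each subset the test has very little power.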
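The Goldfeld-Quandt steps can be reproduced with a short sketch on the data of this question. The
helper name ols_rss is made up for the example, and the 5% critical value of the F-distribution with
(2, 2) degrees of freedom (19.00) is hard-coded from standard tables rather than computed.

```python
def ols_rss(x, y):
    """Residual sum of squares from the simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_yy = sum((b - mean_y) ** 2 for b in y)
    # RSS = total variation minus the variation explained by the fitted line.
    return s_yy - s_xy ** 2 / s_xx


X = [2.8, 2.0, 1.4, 2.2, 0.6, 1.5, 1.0, 0.2, 0.7, 1.8]
Y = [2.3, 2.3, 2.0, 2.5, 1.1, 3.0, 3.3, 2.5, 2.5, 3.8]
c = 2   # central observations to drop
k = 2   # parameters per regression (intercept and slope)

pairs = sorted(zip(X, Y))        # order by increasing X
m = (len(pairs) - c) // 2        # observations per subset
low, high = pairs[:m], pairs[-m:]

rss_low = ols_rss([p[0] for p in low], [p[1] for p in low])
rss_high = ols_rss([p[0] for p in high], [p[1] for p in high])

df = m - k
F_stat = (rss_high / df) / (rss_low / df)

F_CRIT = 19.00   # F(2, 2) at alpha = 0.05, taken from tables
reject = F_stat > F_CRIT
print(rss_low, rss_high, F_stat, reject)
```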
6. In studying the movement in the production workers' share in the value added (i.e., labor's
share), the following models were considered by Gujarati* (given: if n = 16 and k = 1, dl = 1.106
and du = 1.371; if n = 16 and k = 2, dl = 0.98 and du = 1.54). (assignment)
Model A: Yt = β0 + β1t + ut
Model B: Yt = α0 + α1t + α2t^2 + ut
where Y = labor's share and t = time. Based on annual data for 1949–1964 (n = 16), the
following results were obtained for the primary metal industry:
Model A: Y^t = 0.4529 − 0.0041t R2 = 0.5284 d = 0.8252
Model B: Y^t = 0.4786 − 0.0127t + 0.0005t^2 R2 = 0.6629 d = 1.82
a) Is there serial correlation in Model A? In Model B?
b) What accounts for the serial correlation?
a) To determine if there is serial correlation in Model A and Model B, we can examine the Durbin-
Watson statistic (d). The Durbin-Watson statistic measures the presence of autocorrelation in the
residuals of a regression model.
For Model A (n = 16, k = 1), the critical bounds are dl = 1.106 and du = 1.371. The Durbin-Watson
statistic d = 0.8252 is below dl, so we reject the null hypothesis of no autocorrelation and conclude
that there is positive serial correlation in Model A.
For Model B (n = 16, k = 2), the critical bounds are dl = 0.98 and du = 1.54. The Durbin-Watson
statistic d = 1.82 lies between du = 1.54 and 4 - du = 2.46, so we do not reject the null hypothesis
and conclude that there is no evidence of first-order serial correlation in Model B.
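The Durbin-Watson decision rule compares d with the tabulated bounds dl and du rather than with 2
alone. A minimal sketch of the standard five-region rule, applied to the values given in the question,
is shown below; the function name is made up for the example.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic using the tabulated bounds."""
    if d < d_lower:
        return "positive autocorrelation"
    if d <= d_upper:
        return "inconclusive"
    if d < 4 - d_upper:
        return "no autocorrelation"
    if d <= 4 - d_lower:
        return "inconclusive"
    return "negative autocorrelation"


# Values given in the question (n = 16).
model_a = durbin_watson_decision(0.8252, d_lower=1.106, d_upper=1.371)  # k = 1
model_b = durbin_watson_decision(1.82, d_lower=0.98, d_upper=1.54)      # k = 2
print(model_a)
print(model_b)
```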
b) Serial correlation, also known as autocorrelation, occurs when there is a correlation between the
error terms (ut) of a regression model at different time periods. In the context of these models, serial
correlation implies that the error terms at one time period are dependent on the error terms at previous
time periods.
In Model A, the serial correlation most plausibly reflects specification error rather than pure
autocorrelation in the errors. Model A omits the quadratic trend term t^2 that appears in Model B, so
the effect of that term is absorbed into the error term and produces systematically patterned
residuals.
In Model B, the d statistic no longer indicates serial correlation once t^2 is included. This suggests
that the quadratic trend captures the movement in labor's share more adequately than the linear trend
alone, and that the autocorrelation found in Model A was a symptom of the omitted t^2 term.
It's important to note that pure serial correlation leaves the OLS coefficient estimates unbiased but
inefficient and makes the usual standard errors and test statistics unreliable. When the serial
correlation results from an omitted variable or wrong functional form, the coefficient estimates can
also be biased. Therefore, it is crucial to address serial correlation if present, first by checking
the specification of the model and then, if needed, by using estimation methods that account for it.
7. Klein and Goldberger attempted to fit the following regression model to the U.S. economy:
Yi = β1 + β2X2i + β3X3i + β4X4i + ui
where Y = consumption, X2 = wage income, X3 = nonwage, nonfarm income, and X4 = farm income. But
since X2, X3, and X4 are expected to be highly collinear, they obtained estimates of β3 and β4 from
cross-sectional analysis as follows: β3 = 0.75β2 and β4 = 0.625β2. Using these estimates, they
reformulated their consumption function as follows:
Yi = β1 + β2(X2i + 0.75X3i + 0.625X4i) + ui
= β1 + β2Zi + ui
where Zi = X2i + 0.75X3i + 0.625X4i. (assignment)
a. Fit the modified model to the data in Table 10.11 and obtain estimates of β1 to β4.
b. How would you interpret the variable Z?
Table 10.11
Year Y X2 X3 X4
1936 62.8 43.41 17.10 3.96
1937 65.0 46.44 18.65 5.48
1938 63.9 44.35 17.09 4.37
1939 67.5 47.82 19.28 4.51
1940 71.3 51.02 23.24 4.88
1941 76.6 58.71 28.11 6.37
1945* 86.3 87.69 30.29 8.96
1946 86.3 76.73 28.26 9.76
1947 98.3 75.91 27.91 9.31
1948 100.3 77.62 32.30 9.85
1949 103.2 78.01 31.39 7.21
1950 108.9 83.57 35.61 7.39
1951 108.5 90.59 37.58 7.98
1952 111.4 95.47 35.17 7.42
*The data for the war years 1942–1944 are missing.
a) To fit the modified model to the data and obtain estimates of β1 to β4, we can
perform a regression analysis using the given consumption function:
Yi = β1 + β2Zi + ui
Using the data provided in Table 10.11, we can calculate Zi for each observation using
the formula:
Zi = X2i + 0.75X3i + 0.625X4i
Then, we can estimate the coefficients β1 and β2 by regressing Y on Z. The coefficients β3 and β4
are not estimated separately because they are tied to β2 by the cross-sectional restrictions
β3 = 0.75β2 and β4 = 0.625β2.
Here are the calculations:
Year Y X2 X3 X4 Zi
1936 62.8 43.41 17.10 3.96 58.71000
1937 65.0 46.44 18.65 5.48 63.85250
1938 63.9 44.35 17.09 4.37 59.89875
1939 67.5 47.82 19.28 4.51 65.09875
1940 71.3 51.02 23.24 4.88 71.50000
1941 76.6 58.71 28.11 6.37 83.77375
1945 86.3 87.69 30.29 8.96 116.00750
1946 86.3 76.73 28.26 9.76 104.02500
1947 98.3 75.91 27.91 9.31 102.66125
1948 100.3 77.62 32.30 9.85 108.00125
1949 103.2 78.01 31.39 7.21 106.05875
1950 108.9 83.57 35.61 7.39 114.89625
1951 108.5 90.59 37.58 7.98 123.76250
1952 111.4 95.47 35.17 7.42 126.48500
Regressing Y on Zi by OLS gives the estimates of β1 and β2 in the modified consumption function.
Multiplying the estimate of β2 by 0.75 and by 0.625 then gives the estimates of β3 and β4.
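The construction of Z and the regression of Y on Z can be sketched as follows, using the data of
Table 10.11. The variable names are chosen for this example, and the estimates are printed rather
than quoted here.

```python
# (Year, Y, X2, X3, X4) from Table 10.11; the war years 1942-1944 are missing.
table = [
    (1936, 62.8, 43.41, 17.10, 3.96), (1937, 65.0, 46.44, 18.65, 5.48),
    (1938, 63.9, 44.35, 17.09, 4.37), (1939, 67.5, 47.82, 19.28, 4.51),
    (1940, 71.3, 51.02, 23.24, 4.88), (1941, 76.6, 58.71, 28.11, 6.37),
    (1945, 86.3, 87.69, 30.29, 8.96), (1946, 86.3, 76.73, 28.26, 9.76),
    (1947, 98.3, 75.91, 27.91, 9.31), (1948, 100.3, 77.62, 32.30, 9.85),
    (1949, 103.2, 78.01, 31.39, 7.21), (1950, 108.9, 83.57, 35.61, 7.39),
    (1951, 108.5, 90.59, 37.58, 7.98), (1952, 111.4, 95.47, 35.17, 7.42),
]

Y = [row[1] for row in table]
# Composite income variable: Z = X2 + 0.75*X3 + 0.625*X4.
Z = [x2 + 0.75 * x3 + 0.625 * x4 for _, _, x2, x3, x4 in table]

n = len(Y)
z_bar, y_bar = sum(Z) / n, sum(Y) / n
s_zy = sum((z - z_bar) * (y - y_bar) for z, y in zip(Z, Y))
s_zz = sum((z - z_bar) ** 2 for z in Z)

beta2_hat = s_zy / s_zz                 # slope on Z
beta1_hat = y_bar - beta2_hat * z_bar   # intercept
# Remaining coefficients follow from the cross-sectional restrictions.
beta3_hat = 0.75 * beta2_hat
beta4_hat = 0.625 * beta2_hat

print(beta1_hat, beta2_hat, beta3_hat, beta4_hat)
```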
b) The variable Z, which is calculated as Zi = X2i + 0.75X3i + 0.625X4i, represents a
linear combination of the wage income (X2), non-wage, nonfarm income (X3), and
farm income (X4) variables. The coefficients 0.75 and 0.625 indicate the weights
assigned to X3 and X4, respectively, in the linear combination.
Interpreting Z, we can say that it represents a composite income variable that combines
wage income, non-wage, nonfarm income, and farm income, with each component
weighted according to the specified coefficients. In the context of the modified
consumption function, Z serves as an independent variable that captures the combined
effect of these income components on consumption (Y).
8. Consider the following model:
Yt = β1 + β2Xt + β3Xt−1 + β4Xt−2 + β5Xt−3 + β6Xt−4 + ut
where Y = consumption, X = income, and t = time. The preceding model postulates that consumption
expenditure at time t is a function not only of income at time t but also of income through previous
periods. Thus, consumption expenditure in the first quarter of 2000 is a function of income in that
quarter and the four quarters of 1999.
a) Would you expect multicollinearity in such models and why?
b) If collinearity is expected, how would you resolve the problem?
a) Yes, multicollinearity is expected in such models. Multicollinearity occurs when there is a high
correlation between independent variables in a regression model. In this case, the model includes
income at time t (Xt) and income through previous periods (Xt-1, Xt-2, Xt-3, and Xt-4). Since
income in each period is likely to be highly correlated with income in the preceding periods,
multicollinearity is anticipated.
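The point can be illustrated by correlating a trending series with its own first lag. The income
figures below are hypothetical and invented for the example; the variance inflation factor is computed
with the two-regressor formula 1 / (1 - r^2).

```python
def correlation(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical quarterly income with an upward trend (illustration only).
income = [100, 103, 107, 110, 116, 121, 125, 131, 138, 144, 150, 157]

x_t = income[1:]       # income in period t
x_lag1 = income[:-1]   # income in period t-1, aligned with x_t

r = correlation(x_t, x_lag1)
vif = 1.0 / (1.0 - r ** 2)   # variance inflation factor with two regressors
print(r, vif)
```

A correlation close to 1 and a large variance inflation factor indicate that the separate effects of
current and lagged income are hard to estimate precisely.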