Assignment of Econometrics


Department of Statistics
Econometrics Worksheet (Assignment)
1. Explain briefly each of the following terms. (assignment)

a. Autocorrelation
b. Multicollinearity
c. Errors in variables
d. Model misspecification
e. Qualitative variables
f. Lagged variables
a. Autocorrelation: Autocorrelation refers to the statistical relationship between observations of a
variable at different time points in a time series data. It measures the degree to which the current
value of a variable is correlated with its past values. Positive autocorrelation indicates a pattern of
similarity or persistence in the data, while negative autocorrelation suggests an alternating pattern.
Autocorrelation is important to consider in time series analysis and can have implications for
forecasting and model estimation.

b. Multicollinearity: Multicollinearity occurs when two or more independent variables in a regression
model are highly correlated with each other. It poses challenges in regression analysis because,
although OLS only rules out perfect collinearity, strong correlation among predictors makes it
difficult to estimate the individual effects of the correlated variables accurately and can lead to
unstable and unreliable coefficient estimates. It is typically assessed using correlation matrices or
variance inflation factor (VIF) values.

c. Errors in variables: Errors in variables refers to the situation where the values of one or more
variables in a regression model are measured with error. Such measurement error can bias the
estimated regression coefficients; in the classical case of a single mismeasured regressor, the slope
estimate is biased toward zero (attenuation bias). Errors in variables can therefore distort the
estimated relationships between variables and result in misleading conclusions.

d. Model misspecification: Model misspecification occurs when the assumed functional form or
structure of a statistical model does not accurately represent the true relationship between the
variables being studied. It can arise from omitting relevant variables, including irrelevant variables, or
incorrectly specifying the functional form of the relationship. Model misspecification can lead to
biased and inefficient parameter estimates and inaccurate predictions. It is important to carefully
assess and validate the assumptions and specifications of a model.

e. Qualitative variables: Qualitative variables, also known as categorical variables or nominal
variables, are variables that represent different categories or groups rather than numerical values.
They do not have inherent numerical meaning or order. Examples of qualitative variables include
gender (male/female), marital status (single/married/divorced), or types of vehicles
(sedan/SUV/truck). In statistical analysis, qualitative variables are often encoded as dummy variables
or indicator variables to include them in regression models.

f. Lagged variables: Lagged variables, in the context of time series analysis, are variables that are
shifted or delayed in time relative to another variable. A lagged variable represents the value of a
variable at a previous time period. Lagged variables are used to capture the influence of past values
on the current value of a variable and are commonly employed in forecasting and modeling dynamic
relationships in time series data. For example, a lag of one in a monthly sales series represents the
sales value from the previous month.
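As a small illustration of the monthly-sales example (the figures below are hypothetical), a one-period lag can be built simply by shifting the series:

```python
# Hypothetical monthly sales series; lag1[t] holds the sales of the previous month.
sales = [100, 120, 115, 130, 140]
lag1 = [None] + sales[:-1]  # no previous value exists for the first month

for cur, prev in zip(sales, lag1):
    print(cur, prev)
```

Each row pairs the current month's sales with the lagged value, which is exactly the regressor one would include to model dynamic dependence on the past.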
2. List and discuss the stages to be followed in econometric modeling.
2. Stages in Econometric Modeling:

a) Problem Formulation: In this stage, the researcher defines the research question or problem to be
addressed. It involves clearly stating the objectives, identifying the variables of interest, and
formulating hypotheses.

b) Data Collection: The researcher collects relevant data that are necessary to test the hypotheses and
answer the research question. Data can be collected from various sources such as surveys,
government databases, or existing datasets.

c) Data Preparation: In this stage, the collected data is organized, cleaned, and prepared for analysis.
This includes checking for missing values, outliers, and inconsistencies, as well as transforming
variables if needed.

d) Model Specification: The researcher specifies the functional form and structure of the econometric
model. This involves selecting the appropriate variables, determining their relationship, and
specifying the mathematical form of the model.

e) Estimation: In this stage, statistical techniques are applied to estimate the parameters of the
econometric model. The most commonly used method is Ordinary Least Squares (OLS), but other
estimation techniques like Maximum Likelihood Estimation (MLE) or Generalized Method of
Moments (GMM) can be used depending on the model and data characteristics.

f) Model Evaluation: The estimated model is evaluated in terms of its goodness-of-fit, statistical
significance of variables, and model assumptions. Various diagnostic tests, such as checking for
heteroscedasticity, autocorrelation, or multicollinearity, are performed to assess the model's validity.

g) Hypothesis Testing: Based on the estimated model, hypotheses are tested using appropriate
statistical tests. This involves testing the significance of individual coefficients, overall model
significance, and other specific hypotheses relevant to the research question.

h) Interpretation and Policy Implications: The final stage involves interpreting the estimated
coefficients and drawing conclusions about the relationships between variables. The results are then
used to make policy recommendations or draw insights for decision-making.

3. Write the advantages of using the Breusch–Godfrey (BG) test over the Durbin–Watson test.
(assignment)
3. Advantages of the Breusch-Godfrey (BG) test over the Durbin-Watson test:

The Breusch-Godfrey (BG) test and the Durbin-Watson (DW) test are both used to test for
autocorrelation in a regression model. However, the BG test has some advantages over the DW test:

a) Higher-Order Autocorrelation: The BG test can detect higher-order autocorrelation (AR(p) or even
MA(q) error processes), while the DW test is only applicable to first-order autocorrelation. The BG
test also remains valid when the model includes lagged dependent variables, making it more flexible
in capturing autocorrelation patterns.

b) More Generalized Test Statistic: The BG test employs the LM (Lagrange Multiplier) test statistic,
which is more general and can handle different types of model specifications. In contrast, the DW
statistic is derived for models with no lagged dependent variables and relies on a specific set of
assumptions.

c) No Inconclusive Region: The DW bounds test has an inconclusive region between its lower and
upper critical values in which no decision can be reached. The BG test, judged against a chi-square
(or F) critical value, always delivers a definite accept/reject decision.
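The BG procedure is simple enough to sketch directly: regress y on X, regress the OLS residuals on X together with p lagged residuals, and use n·R² from that auxiliary regression as the LM statistic. The helper below is an illustrative sketch, not a library implementation; missing pre-sample lags are set to zero, a common convention, and the example data are simulated.

```python
import numpy as np

def breusch_godfrey_lm(y, x, p):
    """LM form of the Breusch-Godfrey test for AR(p) serial correlation (a sketch)."""
    X = np.column_stack([np.ones(len(y)), x])        # add intercept
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b                                    # OLS residuals
    n = len(e)
    # Auxiliary regression: e_t on X_t and e_{t-1}, ..., e_{t-p} (pre-sample lags = 0)
    lags = np.column_stack([np.concatenate([np.zeros(j), e[:-j]]) for j in range(1, p + 1)])
    Z = np.column_stack([X, lags])
    g, *_ = np.linalg.lstsq(Z, e, rcond=None)
    u = e - Z @ g
    r2 = 1.0 - (u @ u) / (e @ e)                     # residuals have mean zero, so TSS = e'e
    return n * r2                                    # compare with a chi-square(p) critical value

# Simulated example with strongly AR(1) errors: the statistic should far
# exceed the chi-square(1) critical value of 3.84 at the 5% level.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
eps = np.zeros(200)
for t in range(1, 200):
    eps[t] = 0.9 * eps[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + eps
lm = breusch_godfrey_lm(y, x, p=1)
```

With independent errors the same statistic would typically fall below the critical value, illustrating the accept/reject decision without any inconclusive region.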
4. Suppose that Yi = β0 + β1Xi + εi. Additionally, let our data be: X1 = 2, X2 = 1, X3 = 0 and Y1 = 2,
Y2 = 4, and Y3 = 3. Then; (assignment)

a) Calculate the OLS (ordinary least squares) estimates β^0 and β^1.

b) Calculate the estimate of σ^2 and interpret.

4. Calculation of OLS estimates and σ^2:

Given the equation: Yi = β0 + β1Xi + εi

a) To calculate the OLS estimates (β^0 and β^1), we can use the following formulas:

β^1 = Σ((Xi - X̄)(Yi - Ȳ)) / Σ((Xi - X̄)^2)

β^0 = Ȳ - β^1X̄

Using the provided data:


X1 = 2, X2 = 1, X3 = 0
Y1 = 2, Y2 = 4, Y3 = 3

Calculating the means:


X̄ = (2 + 1 + 0) / 3 = 1
Ȳ = (2 + 4 + 3) / 3 = 3

Calculating the OLS estimates:


β^1 = ((2-1)(2-3) + (1-1)(4-3) + (0-1)(3-3)) / ((2-1)^2 + (1-1)^2 + (0-1)^2)
= (-1 + 0 + 0) / (1 + 0 + 1)
= -1 / 2
= -0.5

β^0 = 3 - (-0.5)(1)
= 3.5

Therefore, the OLS estimates are:


β^0 = 3.5
β^1 = -0.5

b) To calculate the estimate of σ^2, we can use the following formula:

σ^2 = Σ(Ŷi - Yi)^2 / (n - k - 1)

where n is the number of observations and k = 1 is the number of regressors, so the denominator
accounts for the two estimated parameters (β^0 and β^1).

Given the equation: Yi = β0 + β1Xi + εi

Using the provided data:
X1 = 2, X2 = 1, X3 = 0
Y1 = 2, Y2 = 4, Y3 = 3

Calculating the predicted values Ŷi:


Ŷ1 = 3.5 + (-0.5)(2) = 2.5
Ŷ2 = 3.5 + (-0.5)(1) = 3
Ŷ3 = 3.5 + (-0.5)(0) = 3.5

Calculating the residuals (Ŷi - Yi):


Ŷ1 - Y1 = 2.5 - 2 = 0.5
Ŷ2 - Y2 = 3 - 4 = -1
Ŷ3 - Y3 = 3.5 - 3 = 0.5

Calculating the sum of squared residuals:


Σ(Ŷi - Yi)^2 = (0.5)^2 + (-1)^2 + (0.5)^2 = 0.25 + 1 + 0.25 = 1.5

Calculating the estimate of σ^2:


σ^2 = Σ(Ŷi - Yi)^2 / (n - k - 1)
= 1.5 / (3 - 1 - 1)
= 1.5

Interpretation: the estimated variance of the error term is 1.5. With only one degree of freedom left
(n = 3 observations minus 2 estimated parameters), this estimate is very imprecise, which is to be
expected in such a small sample.
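The hand calculation above can be checked with a few lines of plain Python, dividing the residual sum of squares by n - 2 since both the intercept and the slope are estimated:

```python
# Question 4 data
X = [2, 1, 0]
Y = [2, 4, 3]
n = len(X)

xbar, ybar = sum(X) / n, sum(Y) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sum((x - xbar) ** 2 for x in X)
b0 = ybar - b1 * xbar
rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
sigma2_hat = rss / (n - 2)  # degrees of freedom: 3 observations minus 2 estimated parameters

print(b0, b1, sigma2_hat)   # 3.5 -0.5 1.5
```

The script reproduces β^0 = 3.5, β^1 = -0.5, and σ^2 = 1.5 exactly.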
5. Given the following expenditure (Yt) and income (Xt) data for a group of people:

Yi: 2.3 2.3 2.0 2.5 1.1 3.0 3.3 2.5 2.5 3.8
Xi: 2.8 2.0 1.4 2.2 0.6 1.5 1.0 0.2 0.7 1.8

After ordering observations according to X values, remove the two central values (c = 2) and, by
applying OLS to each subset, use the Goldfeld-Quandt test to test for heteroscedasticity in the model
Yt = b0 + b1Xt + ut and give your conclusions. Use α = 0.05. (assignment)

To apply the Goldfeld-Quandt test for heteroscedasticity, we need to partition the data into two
subsets, remove the central values, and estimate separate OLS models for each subset. Here are the
steps to perform the analysis:

Step 1: Order the observations based on increasing values of X.

X = [0.2, 0.6, 0.7, 1.0, 1.4, 1.5, 1.8, 2.0, 2.2, 2.8]

Y = [2.5, 1.1, 2.5, 3.3, 2.0, 3.0, 3.8, 2.3, 2.5, 2.3]

Step 2: Remove the two central observations (X = 1.4 and X = 1.5), leaving four observations in each
subset.

Subset 1 (lower X values):

X1 = [0.2, 0.6, 0.7, 1.0]

Y1 = [2.5, 1.1, 2.5, 3.3]

Subset 2 (higher X values):

X2 = [1.8, 2.0, 2.2, 2.8]

Y2 = [3.8, 2.3, 2.5, 2.3]

Step 3: Estimate separate OLS models for each subset.

Subset 1:

Y1 = b0 + b1*X1 + u1

Estimate b0^1 and b1^1.

Subset 2:

Y2 = b0 + b1*X2 + u2

Estimate b0^2 and b1^2.

Step 4: Calculate the squared residuals for each subset.


Subset 1:

e1 = Y1 - (b0^1 + b1^1*X1)

e1^2 = e1 * e1

Subset 2:

e2 = Y2 - (b0^2 + b1^2*X2)

e2^2 = e2 * e2

Step 5: Determine the test statistic for the Goldfeld-Quandt test.

F = (RSS2 / (n2 - k)) / (RSS1 / (n1 - k))

where:

RSS2 = sum of squared residuals for the subset with higher X values

RSS1 = sum of squared residuals for the subset with lower X values

n1 = n2 = 4 = number of observations in each subset

k = 2 = number of estimated parameters (intercept and slope), so each residual sum of squares has
4 - 2 = 2 degrees of freedom

Step 6: Compare the test statistic to the critical value.

Using the F-distribution with (2, 2) degrees of freedom, the critical value at the specified
significance level (α = 0.05) is F0.05(2, 2) = 19.0.

Step 7: Draw conclusions based on the test result.

If the test statistic is greater than the critical value, we reject the null hypothesis of
homoscedasticity and conclude that heteroscedasticity exists. If the test statistic is less
than or equal to the critical value, we fail to reject the null hypothesis, suggesting that
homoscedasticity is present.

Carrying out these computations for the given data gives RSS1 ≈ 2.17 and RSS2 ≈ 0.92, so
F ≈ 0.43. Since 0.43 is far below the critical value of 19.0, we fail to reject the null
hypothesis of homoscedasticity and conclude that there is no evidence of heteroscedasticity in
the model at the 5% level.
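The steps above can be carried out numerically with a short sketch (using NumPy's least-squares routine; the critical value F0.05(2, 2) = 19.0 is taken from standard F tables):

```python
import numpy as np

def ols_rss(x, y):
    """Residual sum of squares from regressing y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return float(e @ e)

# Observations ordered by X, with the two central points (X = 1.4, 1.5) removed
x_lo = np.array([0.2, 0.6, 0.7, 1.0]); y_lo = np.array([2.5, 1.1, 2.5, 3.3])
x_hi = np.array([1.8, 2.0, 2.2, 2.8]); y_hi = np.array([3.8, 2.3, 2.5, 2.3])

rss_lo, rss_hi = ols_rss(x_lo, y_lo), ols_rss(x_hi, y_hi)
df = 4 - 2                              # 4 observations minus 2 estimated parameters
F = (rss_hi / df) / (rss_lo / df)       # roughly 0.43
print(F < 19.0)                         # below F0.05(2, 2) = 19.0: fail to reject
```

Because the higher-X subset actually has the smaller residual sum of squares, the statistic is well under the critical value, supporting the conclusion of homoscedasticity.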
6. In studying the movement in the production workers' share in the value added (i.e., labor's
share), the following models were considered by Gujarati (given: if n = 16 and k = 1,
dL = 1.106, dU = 1.371; and if n = 16 and k = 2, dL = 0.98, dU = 1.54). (assignment)

Model A: Yt = β0 + β1t + ut    Model B: Yt = α0 + α1t + α2t^2 + ut

where Y = labor's share and t = time. Based on annual data for 1949–1964 (n = 16), the
following results were obtained for the primary metal industry:

Model A: Ŷt = 0.4529 - 0.0041t    R2 = 0.5284    d = 0.8252
Model B: Ŷt = 0.4786 - 0.0127t + 0.0005t^2    R2 = 0.6629    d = 1.82

a) Is there serial correlation in Model A? In Model B?
b) What accounts for the serial correlation?
a) To determine if there is serial correlation in Model A and Model B, we can examine the Durbin-
Watson statistic (d). The Durbin-Watson statistic measures the presence of autocorrelation in the
residuals of a regression model.

For Model A (k = 1), the Durbin-Watson statistic is d = 0.8252. Since d = 0.8252 < dL = 1.106, we
reject the null hypothesis of no autocorrelation and conclude that there is positive serial correlation
in Model A.

For Model B (k = 2), the Durbin-Watson statistic is d = 1.82. Since dU = 1.54 < d = 1.82 < 4 - dU = 2.46,
we fail to reject the null hypothesis and conclude that there is no evidence of serial correlation in
Model B.

b) Serial correlation, also known as autocorrelation, occurs when there is a correlation between the
error terms (ut) of a regression model at different time periods. In the context of these models, serial
correlation implies that the error terms at one time period are dependent on the error terms at previous
time periods.

In Model A, the presence of serial correlation most likely reflects model misspecification: Model A
omits the quadratic term t^2 that Model B includes, and the systematic curvature left in the residuals
shows up as positive autocorrelation. More generally, serial correlation of this kind can indicate
omitted variables or an incorrect functional form.

In Model B, the absence of serial correlation suggests that the error terms in the labor's share model
are not dependent on the error terms from previous years. This indicates that the model adequately
captures the relationship between the labor's share and the time variable, and there are no significant
omitted variables or other factors influencing the labor's share that are not accounted for in the model.

It's important to note that serial correlation can lead to biased and inefficient coefficient estimates,
affecting the reliability of the model's predictions and statistical inferences. Therefore, it is crucial to
address serial correlation if present and consider alternative modeling strategies or including
additional variables to improve the model's performance.
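The bounds decision applied above can be written out explicitly as a small helper (a simplified sketch of the textbook Durbin-Watson decision regions):

```python
def dw_decision(d, dl, du):
    """Durbin-Watson bounds test decision (sketch of the standard rule)."""
    if d < dl:
        return "positive serial correlation"
    if d < du:
        return "inconclusive"
    if d <= 4 - du:
        return "no serial correlation"
    if d <= 4 - dl:
        return "inconclusive"
    return "negative serial correlation"

print(dw_decision(0.8252, 1.106, 1.371))  # Model A: positive serial correlation
print(dw_decision(1.82, 0.98, 1.54))      # Model B: no serial correlation
```

Feeding in the two models' statistics and their respective bounds reproduces the conclusions drawn in part (a).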
7. Klein and Goldberger attempted to fit the following regression model to the U.S. economy:

Yi = β1 + β2X2i + β3X3i + β4X4i + ui

where Y = consumption, X2 = wage income, X3 = nonwage, nonfarm income, and X4 = farm income. But
since X2, X3, and X4 are expected to be highly collinear, they obtained estimates of β3 and β4 from
cross-sectional analysis as follows: β3 = 0.75β2 and β4 = 0.625β2. Using these estimates, they
reformulated their consumption function as follows:

Yi = β1 + β2(X2i + 0.75X3i + 0.625X4i) + ui = β1 + β2Zi + ui; where Zi = X2i + 0.75X3i + 0.625X4i.
(assignment)

a. Fit the modified model to the data in Table 10.11 and obtain estimates of β1 to β4.
b. How would you interpret the variable Z?
TABLE 10.11

Year Y X2 X3 X4
1936 62.8 43.41 17.10 3.96
1937 65.0 46.44 18.65 5.48
1938 63.9 44.35 17.09 4.37
1939 67.5 47.82 19.28 4.51
1940 71.3 51.02 23.24 4.88
1941 76.6 58.71 28.11 6.37
1945* 86.3 87.69 30.29 8.96
1946 86.3 76.73 28.26 9.76
1947 98.3 75.91 27.91 9.31
1948 100.3 77.62 32.30 9.85
1949 103.2 78.01 31.39 7.21
1950 108.9 83.57 35.61 7.39
1951 108.5 90.59 37.58 7.98
1952 111.4 95.47 35.17 7.42
*The data for the war years 1942–1944 are missing.

a) To fit the modified model to the data and obtain estimates of β1 to β4, we can
perform a regression analysis using the given consumption function:

Yi = β1 + β2Zi + ui

Using the data provided in Table 10.11, we can calculate Zi for each observation using
the formula:

Zi = X2i + 0.75X3i + 0.625X4i

Then, we can estimate the coefficients β1 and β2 by OLS. Estimates of β3 and β4 follow directly from
the cross-sectional restrictions: β^3 = 0.75β^2 and β^4 = 0.625β^2.
Here are the calculations:

Year Y X2 X3 X4 Zi
1936 62.8 43.41 17.10 3.96 58.71
1937 65.0 46.44 18.65 5.48 63.85
1938 63.9 44.35 17.09 4.37 59.90
1939 67.5 47.82 19.28 4.51 65.10
1940 71.3 51.02 23.24 4.88 71.50
1941 76.6 58.71 28.11 6.37 83.77
1945 86.3 87.69 30.29 8.96 116.01
1946 86.3 76.73 28.26 9.76 104.03
1947 98.3 75.91 27.91 9.31 102.66
1948 100.3 77.62 32.30 9.85 108.00
1949 103.2 78.01 31.39 7.21 106.06
1950 108.9 83.57 35.61 7.39 114.90
1951 108.5 90.59 37.58 7.98 123.76
1952 111.4 95.47 35.17 7.42 126.49

(For example, Z1936 = 43.41 + 0.75 × 17.10 + 0.625 × 3.96 = 58.71.)
Performing a regression analysis of Y on Zi, we can estimate β1 and β2; β^3 = 0.75β^2 and
β^4 = 0.625β^2 then follow from the imposed restrictions.
b) The variable Z, which is calculated as Zi = X2i + 0.75X3i + 0.625X4i, represents a
linear combination of the wage income (X2), non-wage, nonfarm income (X3), and
farm income (X4) variables. The coefficients 0.75 and 0.625 indicate the weights
assigned to X3 and X4, respectively, in the linear combination.
Interpreting Z, we can say that it represents a composite income variable that combines
wage income, non-wage, nonfarm income, and farm income, with each component
weighted according to the specified coefficients. In the context of the modified
consumption function, Z serves as an independent variable that captures the combined
effect of these income components on consumption (Y).
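A quick numerical sketch with NumPy's least-squares routine: build Z from the table and regress Y on a constant and Z, then recover β^3 and β^4 from the restrictions. The exact coefficient values depend on the data transcription, so treat the output as illustrative rather than authoritative.

```python
import numpy as np

# Data transcribed from Table 10.11 (1942-1944 missing)
Y  = np.array([62.8, 65.0, 63.9, 67.5, 71.3, 76.6, 86.3, 86.3,
               98.3, 100.3, 103.2, 108.9, 108.5, 111.4])
X2 = np.array([43.41, 46.44, 44.35, 47.82, 51.02, 58.71, 87.69, 76.73,
               75.91, 77.62, 78.01, 83.57, 90.59, 95.47])
X3 = np.array([17.10, 18.65, 17.09, 19.28, 23.24, 28.11, 30.29, 28.26,
               27.91, 32.30, 31.39, 35.61, 37.58, 35.17])
X4 = np.array([3.96, 5.48, 4.37, 4.51, 4.88, 6.37, 8.96, 9.76,
               9.31, 9.85, 7.21, 7.39, 7.98, 7.42])

Z = X2 + 0.75 * X3 + 0.625 * X4             # composite income variable
A = np.column_stack([np.ones(len(Z)), Z])
(b1, b2), *_ = np.linalg.lstsq(A, Y, rcond=None)

b3 = 0.75 * b2                              # recovered via the cross-sectional restriction
b4 = 0.625 * b2
print(b1, b2, b3, b4)
```

Only two parameters are actually estimated; the restrictions convert the slope on Z into implied coefficients on all three income components.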
8. Consider the following model: Yt = β1 + β2Xt + β3Xt−1 + β4Xt−2 + β5Xt−3 + β6Xt−4 + ut

where Y = consumption, X = income, and t = time. The preceding model postulates that consumption
expenditure at time t is a function not only of income at time t but also of income through previous
periods. Thus, consumption expenditure in the first quarter of 2000 is a function of income in that
quarter and the four quarters of 1999.

a) Would you expect multicollinearity in such models and why?
b) If collinearity is expected, how would you resolve the problem?
a) Yes, multicollinearity is expected in such models. Multicollinearity occurs when there is a high
correlation between independent variables in a regression model. In this case, the model includes
income at time t (Xt) and income through previous periods (Xt-1, Xt-2, Xt-3, and Xt-4). Since
income in each period is likely to be highly correlated with income in the preceding periods,
multicollinearity is anticipated.

b) To resolve the problem of multicollinearity, several approaches can be considered:


1. Remove one or more of the correlated variables: If the inclusion of all lagged income variables is
not necessary, you can remove some of them from the model. This can help alleviate
multicollinearity.
2. Combine correlated variables: Instead of including each lagged income variable separately, you
can create a composite variable that represents the average or sum of the lagged income values. This
can help reduce multicollinearity.
3. Use differenced variables: Instead of using the levels of income, you can take the differences
between the current income and lagged income variables. By differencing the variables, you are
focusing on the changes in income rather than the absolute levels, which can help mitigate
multicollinearity.
4. Increase the sample size: Multicollinearity can be more problematic with smaller sample sizes.
Increasing the sample size can help reduce the impact of multicollinearity.
5. Regularization techniques: Regularization methods like Ridge regression or Lasso regression can
be employed to deal with multicollinearity. These techniques introduce a penalty term that helps to
reduce the impact of multicollinearity and improve the stability of the regression coefficients.
It is important to note that the specific approach to address multicollinearity depends on the context
of the data and the goals of the analysis. Therefore, it is recommended to carefully examine the data
and consider the consequences of each approach before making a decision.
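To make the diagnosis concrete, here is a small sketch using simulated data (a hypothetical income series assumed to follow an AR(1) process with coefficient 0.98): the variance inflation factor of the current-income regressor is computed from the R² of an auxiliary regression on the four lags.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated, highly persistent "income" series (hypothetical data, rho = 0.98)
T = 300
income = np.zeros(T)
for t in range(1, T):
    income[t] = 0.98 * income[t - 1] + rng.normal()

# Design matrix: [X_t, X_{t-1}, ..., X_{t-4}], as in the model of Question 8
M = np.array([income[t - np.arange(5)] for t in range(4, T)])

def vif(M, j):
    """VIF of column j: 1 / (1 - R^2) from regressing it on the other columns."""
    y = M[:, j]
    others = np.delete(M, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    u = y - A @ b
    r2 = 1.0 - (u @ u) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

print(vif(M, 0))  # large: current income is nearly a linear function of its lags
```

Because income in adjacent periods is almost collinear, the VIF comes out large, illustrating why remedies such as differencing or combining the lags are worth considering.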
