
Econometric Theory and Application

This document presents the results of statistical analyses performed on economic data from South Korea from 1990 to 2019. Descriptive statistics show the mean, median, maximum, minimum, standard deviation, skewness, kurtosis, and other measures for GDP, capital formation, labor force, CO2 emissions, and energy use. Tests of normality, including skewness-kurtosis and Jarque-Bera, find the variables are approximately symmetric with thinner tails than a normal distribution. Pairwise correlation and multiple linear and log-log regression analyses are conducted to understand the relationships between GDP and the explanatory variables. Post-estimation tests examine multicollinearity, heteroscedasticity, autocorrelation, and model misspecification.

Uploaded by Abdul Rehman

Contents

Introduction
    Model
Descriptive Statistics and Normality Test
    Skewness – Kurtosis
    Jarque – Bera Test
Line Graphs
Pairwise Correlation
Multiple Regression Analysis (Linear)
    Interpretation of Variables
Multiple Regression Analysis (log-log)
    Interpretation of Variables
Post Estimation Tests
    Multicollinearity
    Heteroscedasticity
    Autocorrelation
    Misspecification
Conclusion

Introduction
The data for this project has been gathered from the World Development Indicators (WDI) and will be used to perform different tests and analyses to draw interpretations. The assigned country is South Korea. All the data is in one currency and covers the past 30 years.

GDP C L CO2 E
3.64199E+11 1.42447E+11 19182772 246943.114 2167.339608
4.03453E+11 1.70352E+11 19697871 261482.769 2306.643472
4.28462E+11 1.71989E+11 20089632 284280.508 2534.341028
4.57929E+11 1.8075E+11 20351324 321951.599 2814.188231
5.00373E+11 2.0786E+11 20904536 344037.94 2958.40773
5.48481E+11 2.26944E+11 21397788 374771.067 3210.132036
5.9176E+11 2.5251E+11 21798898 403780.704 3454.854851
6.28275E+11 2.49789E+11 22334662 430032.757 3726.157766
5.96048E+11 1.80905E+11 22026281 364833.497 3377.635074
6.64397E+11 2.27674E+11 22326933 399864.348 3708.705213
7.24597E+11 2.60418E+11 22812596 447561.017 4002.671284
7.59757E+11 2.64881E+11 23182387 450193.923 4033.318968
8.18449E+11 2.85851E+11 23638175 465631.993 4170.476147
8.44208E+11 3.01526E+11 23662545 466215.046 4233.487325
8.88085E+11 3.14021E+11 24117623 482276.506 4332.667741
9.26349E+11 3.20981E+11 24280376 462918.413 4364.219091
9.75115E+11 3.37034E+11 24521665 470655.783 4412.527118
1.03167E+12 3.54001E+11 24773836 495675.724 4564.98783
1.06275E+12 3.53075E+11 24926023 507589.807 4629.777921
1.07118E+12 3.18165E+11 24926774 508862.256 4649.765742
1.14407E+12 3.72418E+11 25260618 566716.515 5045.487749
1.18623E+12 3.84418E+11 25683904 589400.577 5216.587809
1.21473E+12 3.74296E+11 26084471 583966.083 5248.519891
1.25318E+12 3.7807E+11 26423221 592499.192 5231.67546
1.29331E+12 3.91499E+11 27174554 587156.373 5289.275832
1.32964E+12 4.16984E+11 27525319 597354.3 5413.347857
1.36882E+12 4.43126E+11 27812568 620302.386 ..
1.41207E+12 4.91417E+11 28111033 .. ..
1.4497E+12 4.82372E+11 28272711 .. ..
1.47918E+12 4.70322E+11 28410822 .. ..

GDP = GDP in US$ (constant 2010)
C = Gross capital formation in US$ (constant 2010)
L = Labor force, total
CO2 = CO2 emissions (kt)
E = Energy use (kg of oil equivalent per capita)

Model
We will use a multivariable model in this project, which is as follows:

GDP = f(C, L, CO2, E)

Yt = β0 + β1X1 + β2X2 + β3X3 + β4X4 + μt

Y = GDP
X1 = C (reported as C01 in the regression output)
X2 = L
X3 = CO2
X4 = E

Descriptive Statistics and Normality Test


First, we will evaluate the descriptive statistics of each variable individually.

Sample: 1990 – 2019

              GDP        C01        L          CO2        E
Mean          9.14E+11   3.11E+11   24057064   456553.9   4042.200
Median        9.07E+11   3.16E+11   24199000   465632.0   4201.982
Maximum       1.48E+12   4.91E+11   28410822   620302.4   5413.348
Minimum       3.64E+11   1.42E+11   19182772   246943.1   2167.340
Std. Dev.     3.48E+11   9.88E+10   2704942.   107508.9   963.9967
Skewness      0.035619   0.102570  -0.029328  -0.291645  -0.375174
Kurtosis      1.727009   2.065159   2.010075   2.202191   2.139998
Jarque-Bera   2.031976   1.145013   1.229239   1.098817   1.411178
Probability   0.362044   0.564110   0.540847   0.577291   0.493818
Sum           2.74E+13   9.33E+12   7.22E+08   12326954   105097.2
Sum Sq. Dev.  3.51E+24   2.83E+23   2.12E+14   3.01E+11   23232240
Observations  30         30         30         27         26

Skewness – Kurtosis
Skewness is asymmetry in a statistical distribution, in which the curve appears distorted or skewed either to the left (negative) or to the right (positive). Skewness quantifies the extent to which a distribution differs from a normal distribution; it describes the asymmetry of the distribution, and the skewness of a normal distribution is zero.

Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution; it describes the peakedness of the distribution. The kurtosis of a normal distribution is equal to 3.

Now we will interpret the skewness and kurtosis of our variables as reported in the descriptive statistics above.

i. For variable Y, our GDP, the skewness is 0.03, which is near zero and shows that the distribution is approximately symmetric. The kurtosis of 1.72 is below 3, so the distribution is platykurtic: flatter than a normal distribution, with thinner tails.
ii. For variable X1, the skewness is 0.1, which shows the distribution is approximately symmetric. The kurtosis of 2.06 is also below 3, so this distribution too has thinner tails and is platykurtic.
iii. For variable X2, the skewness is -0.02, near zero, showing that the distribution is approximately symmetric. The kurtosis of 2.01 again indicates a platykurtic distribution.
iv. Variable X3 has a skewness of -0.29, so its spread is approximately symmetric. Its kurtosis of 2.2 is below 3, making it platykurtic as well.
v. For variable X4, the spread is approximately symmetric with a skewness of -0.38, and its kurtosis of 2.1 shows that it, too, is platykurtic.
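As a cross-check, the skewness and kurtosis readings above can be reproduced with any statistics package. A minimal sketch with SciPy, on illustrative toy data rather than the WDI series:

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Illustrative symmetric toy data, not the WDI series used above.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

s = skew(x)                    # 0 for perfectly symmetric data
k = kurtosis(x, fisher=False)  # Pearson definition, so normal = 3

# Same reading rule as in the text: kurtosis below 3 -> platykurtic.
shape = "platykurtic" if k < 3 else "leptokurtic or normal"
```

Note that `fisher=False` is needed to get kurtosis on the normal-equals-3 scale used in this report; SciPy's default reports excess kurtosis (normal = 0).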

Jarque – Bera Test


In statistics, the Jarque–Bera (JB) test is a goodness-of-fit test of whether sample data have skewness and kurtosis matching a normal distribution. If the probability (p-value) for the JB test is less than 0.05 (5%), we reject the null hypothesis and accept the alternative hypothesis; if the probability is greater than 0.05, we accept the null hypothesis, which means the data is normally distributed.
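The decision rule above can be sketched in code. SciPy's `jarque_bera` implements the statistic JB = n/6 · (S² + (K − 3)²/4); the sample below is a random stand-in, not one of the actual series:

```python
import numpy as np
from scipy.stats import jarque_bera, skew, kurtosis

rng = np.random.default_rng(0)
x = rng.normal(size=30)        # random stand-in for a 30-observation series

stat, pvalue = jarque_bera(x)

# The statistic is n/6 * (S^2 + (K - 3)^2 / 4), with K on the normal = 3 scale.
n = len(x)
manual = n / 6 * (skew(x) ** 2 + (kurtosis(x, fisher=False) - 3) ** 2 / 4)

# Decision rule used throughout this section.
verdict = "accept H0: normal" if pvalue > 0.05 else "reject H0: not normal"
```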

1. GDP (Y)
Ho: Data is normally distributed
H1: Data is not normally distributed

By the value of the Jarque-Bera test, we can conclude that we will accept H0. As per the rule, Probability (JB) > 0.05, here 0.36, depicting that the data is normally distributed.

Probability (JB) = 0.36, accept Ho

[Histogram of GDP with descriptive statistics: Sample 1990 2019, Observations 30, Jarque-Bera 2.031976, Probability 0.362044]
2. C (X1)
Ho: Data is normally distributed
H1: Data is not normally distributed

By the value of Jarque-Bera test, we can conclude that we will accept Ho. As per rule, Probability (JB)
> 0.05, which is 0.56, concluding that the data is normally distributed.

Probability (JB) = 0.56, accept Ho


[Histogram of C01 with descriptive statistics: Sample 1990 2019, Observations 30, Jarque-Bera 1.145013, Probability 0.564110]
3. L (X2)
Ho: Data is normally distributed
H1: Data is not normally distributed

By the value of the Jarque-Bera test, we can conclude that we will accept H0. As per the rule, Probability (JB) > 0.05, here 0.54, depicting that the data is normally distributed.

Probability (JB) = 0.54, accept Ho

[Histogram of L with descriptive statistics: Sample 1990 2019, Observations 30, Jarque-Bera 1.229239, Probability 0.540847]

4. CO2 (X3)
Ho: Data is normally distributed
H1: Data is not normally distributed

By the value of Jarque-Bera test, we can conclude that we will accept Ho. As per rule, Probability (JB)
> 0.05, which is 0.57, depicting that the Data is normally distributed.

Probability (JB) = 0.57, accept Ho

[Histogram of CO2 with descriptive statistics: Sample 1990 2019, Observations 27, Jarque-Bera 1.098817, Probability 0.577291]
5. E (X4)
Ho: Data is normally distributed
H1: Data is not normally distributed

By the value of Jarque-Bera test, we can conclude that we will accept Ho. As per rule, Probability (JB)
> 0.05, which is 0.49, depicting that the Data is normally distributed.

Probability (JB) = 0.49, accept Ho

[Histogram of E with descriptive statistics: Sample 1990 2019, Observations 26, Jarque-Bera 1.411178, Probability 0.493818]

Line Graphs
Below is the combined line graph of our model.

[Combined line graph of GDP, C, L, CO2 and E, 1990-2018]

These graphs depict that:

1. GDP (Y) shows a positive trend starting in 1990; a sudden shock is measured in 1998, but after 2000 the units rise again.
2. C (X1) shows a negative shock in 1997-98. From 1999 onward it forms a steady positive trend at the same capacity. A similar situation occurred in 2009-10, with a sudden shock followed by a steady positive trend.
3. L (X2), as displayed in the graph, shows a continuous positive trend in the labour force, except for a minor shock in 1997.
4. CO2 (X3) shows something of a structural break, with a negative trend in 1996-97 and a steady positive trend the next year, followed by a smooth rise. Later, in 2009-10, the trend shows a sudden rise.
5. E (X4) shows a positive trend at the start, followed by a shock in 1997; in the following year it moved in the positive direction with the same capacity as it had fallen.

The shocks, structural breaks and negative trends in the late 1990s are due to the severe financial crisis South Korea faced in 1997. After the regulatory and economic reforms of 2008-09, the economy bounced back, with the country marking growth and apparently recovering from the global recession.

[Individual line graphs of GDP, C, L, CO2 and E, 1990-2019]

Pairwise Correlation
Pairwise correlation describes the strength of the relationship between variables and can be computed for more than two variables at a time. Following are the relationships between the above five variables, including the dependent and independent variables:

      GDP       C01       L         CO2       E
GDP   1.000000  0.979027  0.990526  0.971690  0.980130
C     0.979027  1.000000  0.975524  0.970970  0.974017
L     0.990526  0.975524  1.000000  0.979390  0.988699
CO2   0.971690  0.970970  0.979390  1.000000  0.994922
E     0.980130  0.974017  0.988699  0.994922  1.000000

We have the following criteria to evaluate the intensity of the relationship between two variables:

- If the values are > 0.8, there is a strong relationship between two variables
- If the values are < 0.8, there is a weak or moderate relationship between two variables

First, we will observe the relationship between dependent and independent variables.

 The relationship between GDP and C shows a strong positive correlation as the value
approaches +1. The figure estimated here is 0.979, depicting that if GDP increases, the C will
also increase and if GDP decreases, the C will also decrease.
 The relationship between GDP and L shows a strong positive correlation as the value
approaches +1. The figure estimated here is 0.99, depicting that if the variable L increases,
the GDP will also increase and vice versa.
 The relationship between GDP and CO2 shows a strong positive correlation as the value
approaches +1. The figure estimated here is 0.971, depicting that CO2 and GDP are directly
proportional.
 The relationship between GDP and E shows a strong positive correlation as the value
approaches +1. The figure estimated here is 0.980, illustrating that both variables are
directly proportional and will move in the same direction.

Now, we will observe the relationships among the independent variables themselves.

 The relationship between C and L shows a strong positive correlation as the value reaches
+1. The figure estimated here is 0.975, depicting that if C increases, the L will also increase
and if C decreases, the L will also decrease.
 The relationship between C and CO2 shows a strong positive correlation as the value reaches
+1. The figure estimated here is 0.975, illustrating that C and CO 2 are directly proportional
and will move in the same direction.
 The relationship between CO2 and E shows a strong positive correlation as the value reaches
+1. The figure estimated here is 0.994, depicting that if CO2 increases, the E will also increase
and vice versa.
 The relationship between L and E shows a strong positive correlation as the value reaches
+1. The figure estimated here is 0.988, depicting that if L increases, the E will also increase
and if L decreases, the E will also decrease.
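A pairwise correlation matrix like the one above can be produced in a single call. The frame below uses illustrative placeholder numbers, not the WDI data:

```python
import pandas as pd

# Placeholder values only; the real analysis uses the 1990-2019 WDI series.
df = pd.DataFrame({
    "GDP": [3.6e11, 4.0e11, 4.3e11, 4.6e11, 5.0e11],
    "C":   [1.4e11, 1.7e11, 1.7e11, 1.8e11, 2.1e11],
    "L":   [19.2e6, 19.7e6, 20.1e6, 20.4e6, 20.9e6],
})

corr = df.corr()           # Pearson pairwise correlations
strong = corr.abs() > 0.8  # the "strong relationship" cutoff used above
```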

Multiple Regression Analysis (Linear)


Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. Multiple regression is an extension of simple linear (OLS) regression, which uses just one explanatory variable.

Yt = β0 + β1X1 + β2X2 + β3X3 + β4X4 + μt

Where:

Y = GDP in US $
X1 = Gross capital formation in US $
X2 = Labor force, Total
X3 = CO2 emissions
X4 = Energy use

10
Following is the composite hypothesis of the variables as we are unaware of the values and signs:

H0: β1 = β2 = β3 = β4 = 0 (null)

H1: at least one βj ≠ 0 (alternative)

Below is the linear regression analysis of our model:

Dependent Variable: GDP
Method: Least Squares
Date: 08/05/20 Time: 15:02
Sample (adjusted): 1990 2015
Included observations: 26 after adjustments

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -1.77E+12     3.86E+11     -4.585552     0.0002
C01         1.073253     0.492447      2.179431     0.0408
L           102166.7     26854.53      3.804451     0.0010
CO2        -183778.0     828989.5     -0.221689     0.8267
E          -3300340.     1.18E+08     -0.027879     0.9780

R-squared            0.984680    Mean dependent var     8.35E+11
Adjusted R-squared   0.981761    S.D. dependent var     3.02E+11
S.E. of regression   4.08E+10    Akaike info criterion  51.87446
Sum squared resid    3.50E+22    Schwarz criterion      52.11640
Log likelihood      -669.3680    Hannan-Quinn criter.   51.94413
F-statistic          337.4303    Durbin-Watson stat     0.415560
Prob(F-statistic)    0.000000

Interpretation of Variables
1. C – Constant
The constant term in linear regression analysis seems to be such a simple thing. Also known as the y
intercept, it is simply the value at which the fitted line crosses the y-axis.

If all the other variables are assumed to be zero, what would be the GDP (Y) of the country? For this we suppose Y = C. The coefficient of the constant term is -1.77E+12 (-1,770,000,000,000), which means that if C, L, CO2 and E are all zero, the predicted GDP would be -1.77E+12 units.

As far as the standard error criterion is concerned, the lower it is, the better. Here it is 3.86E+11, which is a large value, but that depends on the data and sample size. The standard error of the regression (S), also known as the standard error of the estimate, represents the average distance that the observed values fall from the regression line; conveniently, it tells you how wrong the regression model is on average, in the units of the response variable.

Coming to the t-statistic, it shows the significance of a single variable. The value -4.58 depicts that the variable is significant because it fulfils the decision criterion (|t| > 2) at the 5% level of significance. The same holds for the p-value of 0.0002, which also qualifies under the decision criterion (p < 0.05).
*A negative t-statistic shows that the estimate lies to the left of the mean.

Therefore, we conclude that the variable C, the constant, is significant. This suggests that the variable Y depends strongly on the regressors. We accept H1, the alternative hypothesis, as the constant turns out to be significant.

2. C01 – X1
If the variable X1 increases by 1 unit, the GDP (Y) would increase by 1.073 units. This result is
interpreted by the value of coefficient.

As for the standard error, we know the criterion: it depends on the spread of the variable. Here the value is 0.492, which is very low, so the estimate lies close to the fitted line.

We analyze the t-statistic for the significance of a single variable; its value of 2.17 is significant as it is greater than 2. So the variable X1 is significant for Y, i.e. Y depends on X1. Similarly, the p-value fulfils the criterion and is significant, i.e. 0.04 < 0.05.

Here we conclude that the X1 – C01 (gross capital formation) variable is significant and GDP (Y) depends on it. We accept H1, the alternative hypothesis, as it turns out to be a significant variable.

3. L – X2
If the variable X2 increases by 1 unit, the GDP (Y) would increase by 102166.7 units. This result is
interpreted by the value of coefficient.

As for the standard error, it depends on the spread of the variable and how much it deviates from the mean. Here the value of 26854.53 states that the observations are far from the regression line, which might be because of the sample size; as the labor force is large in level terms and rising, the standard error is large as well.

For the significance of a single variable, the t-statistic of 3.80 is significant, fulfilling the criterion (|t| > 2). This means X2 is significant for Y and GDP (Y) depends on X2 (L).

Similar is the scenario with the p-value, as it is lower than 0.05 with a value of 0.001.

Hence, we conclude that the X2 – L (labor force) variable is significant and GDP (Y) depends on it. Therefore, we accept H1, the alternative hypothesis, as it turns out to be a significant variable.

4. CO2 – X3
If the variable X3 increases by 1 unit, the GDP (Y) would decrease by 183778.0 units. This result is interpreted from the negative value of the coefficient.

As for the standard error, the value measured here is 828989.5, which is far from the regression line, possibly because the sample size is not large. If the sample size were very large, for example greater than 1,000, then virtually any statistical result calculated on that sample would be statistically significant.

Analyzing the t-statistic, variable X3 seems to be insignificant with a value of -0.221689, which is less than 2 in absolute value; the same holds for the p-value, as 0.83 > 0.05 does not fulfil the significance criterion.

Here we conclude that the CO2 – X3 variable is insignificant: on its own, it does not explain GDP once the other independent variables are present.

5. E – X4
If the variable X4 increases by 1 unit, the GDP (Y) would decrease by 3300340 units. This result is interpreted from the negative value of the coefficient.

As for the standard error, the value measured here is 1.18E+08, which is far from the regression line; when the standard error is large relative to the statistic, the statistic will typically be non-significant.

Analyzing the t-statistic, variable X4 seems to be insignificant with a value of -0.027879, which is less than 2 in absolute value; the same holds for the p-value, as 0.98 > 0.05 does not fulfil the significance criterion. Hence, we conclude that the variable X4 is insignificant. When a predictor in a regression is statistically insignificant, its numerical value may not be zero, but the predictor does not carry significant information about the response variable.

As for the other tests under the CLRM, the interpretation is as follows:

1. R – Squared
R-squared is a measure of the closeness of fit: it describes how well the explanatory variables capture the dependent variable and measures how close the data are to the fitted regression line. Generally, a higher R-squared indicates a better fit for the model.

Criteria for this test is as under:

a. R2 > 0.8 (Accept)


b. R2 < 0.8 (Reject)

As per this criterion, acceptance means that the variation is well explained, and rejection means that the model does not qualify under the goodness-of-fit criterion. Here the value is 0.984, which is above the criterion and significantly good, capturing the dependent variable (Y).

2. Adjusted R – Squared
The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of
predictors in the model. The adjusted R-squared increases only if the new term improves the model
more than would be expected by chance. It decreases when a predictor improves the model by less
than expected by chance.

In a multiple regression model, we give importance to the adjusted R-squared because R-squared is a non-decreasing function of the number of regressors, so it sometimes overstates the significance of the variables; the adjusted version is also needed when comparing two R² values.

The criteria for this is similar to the R – squared

a. R2 > 0.8 (Accept)


b. R2 < 0.8 (Reject)

Here, the value of 0.981 depicts that the model explains much of the variability of the response data around its mean. Accounting for the degrees of freedom (d.f.), the variables capture the dependent variable (Y): about 98% of the variation in GDP is explained by C, L, CO2 and E.
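The reported adjusted R-squared can be verified by hand from the R-squared, the sample size, and the number of regressors (values taken from the linear regression output above):

```python
# adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
r2 = 0.984680   # R-squared from the output
n = 26          # included observations
k = 4           # regressors: C01, L, CO2, E

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
# adj_r2 ~ 0.981762, matching the reported 0.981761
```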

3. F – Stats
This test shows the overall significance of the model. The criterion used is: if the F value > 4, the model is significant. The measured value for our model is 337.4, which is above the criterion. Similar is the case with Prob(F-statistic), which should be less than 0.05 (5% level of significance); our measured value is 0.000 < 0.05. Hence, we reject the null hypothesis.

4. Durbin – Watson Stat


The Durbin Watson (DW) statistic is a test for autocorrelation in the residuals from a statistical
regression analysis. The Durbin-Watson statistic will always have a value between 0 and 4. A value of
2.0 means that there is no autocorrelation detected in the sample. Values from 0 to less than 2
indicate positive autocorrelation and values from 2 to 4 indicate negative autocorrelation.

The value measured in our model is 0.415 which indicates the presence of positive autocorrelation
or serial correlation.
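The statistic itself is simple to compute from the residuals, DW = Σ(e_t − e_{t−1})² / Σe_t². A minimal numpy sketch on toy residual series (not the model's actual residuals):

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum of squared first differences over sum of squared residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

dw_positive = durbin_watson(np.ones(10))        # identical residuals -> 0
dw_negative = durbin_watson([1.0, -1.0] * 5)    # alternating residuals -> near 4
```

The value of 0.415 reported above sits near the positive-autocorrelation end of the 0-4 scale, consistent with the text's reading.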

5. Akaike Info Criterion


The Akaike information criterion (AIC) is an estimator of out-of-sample prediction error and thereby of the relative quality of statistical models for a given data set. The value measured here is 51.87; AIC values are only meaningful relative to other models, with lower values preferred when comparing specifications on the same data.

The Schwarz criterion and Hannan-Quinn criterion have approximately similar values, and the model is considered a good fit among the alternatives.

In statistics, the Hannan–Quinn information criterion (HQC) is a criterion for model selection, an alternative to the Akaike information criterion.

Multiple Regression Analysis (log-log)


log(Yt) = β0 + β1 log(X1) + β2 log(X2) + β3 log(X3) + β4 log(X4) + μt

Where:

Y = GDP in US $
X1 = Gross capital formation in US $
X2 = Labor force, Total
X3 = CO2 emissions
X4 = Energy use

The multiple regression analysis (log-log) is used to predict and forecast in a better way than the classical linear analysis. When a log transformation is executed, we can read each relationship as a percent change. As the logarithm is applied to both sides of the model, the results are easier to interpret, giving a better prediction model.
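The percent-change reading can be illustrated on an exact power law: if y = c · x^b, then regressing log(y) on log(x) recovers the elasticity b. A small numpy sketch with made-up data:

```python
import numpy as np

x = np.linspace(1.0, 10.0, 30)
y = 5.0 * x ** 2.0                 # constructed so the true elasticity is 2

# Fit log(y) = intercept + slope * log(x); the slope is the elasticity.
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
# slope ~ 2.0: a 1% increase in x goes with roughly a 2% increase in y
```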

Hypothesis Criteria

Following is the composite hypothesis of the variables as we are unaware of the values and signs:

H0: β1 = β2 = β3 = β4 = 0 (null)

H1: at least one βj ≠ 0 (alternative)

Below is the log – log regression analysis of the variables Y, X1, X2, X3 and X4:

Dependent Variable: LOG(GDP)
Method: Least Squares
Date: 08/05/20 Time: 15:25
Sample (adjusted): 1990 2015
Included observations: 26 after adjustments

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -18.91680     5.290762     -3.575441     0.0018
LOG(C01)    0.281523     0.101458      2.774776     0.0114
LOG(L)      2.375041     0.382622      6.207285     0.0000
LOG(CO2)   -0.713698     0.292611     -2.439068     0.0237
LOG(E)      0.950154     0.316628      3.000847     0.0068

R-squared            0.994124    Mean dependent var      27.38050
Adjusted R-squared   0.993005    S.D. dependent var      0.393368
S.E. of regression   0.032900    Akaike info criterion  -3.819662
Sum squared resid    0.022730    Schwarz criterion      -3.577720
Log likelihood       54.65560    Hannan-Quinn criter.   -3.749991
F-statistic          888.2485    Durbin-Watson stat      0.564841
Prob(F-statistic)    0.000000

Interpretation of Variables
In the log-log regression model, we will only interpret the coefficients.

1. C – Constant
If all the other variables are assumed to be zero, what would be the GDP (Y) of the country? For this we suppose Y = C. The coefficient of the constant term is -18.91, which means that when the logs of C, L, CO2 and E are all zero, the predicted log(GDP) would be -18.91.

Coming to the t-statistic, it shows the significance of a single variable. The value -3.57 depicts that the variable is significant because it fulfils the decision criterion (|t| > 2) at the 5% level of significance. The same holds for the p-value of 0.0018, which also qualifies under the decision criterion (p < 0.05).

2. C01 – X1
If the variable X1 increases by 1%, the GDP (Y) would increase by 0.28% (coefficient).

For the significance of a single variable, we analyze the t-statistic, which is 2.77, a significant value as it is greater than 2. So the variable X1 is significant for Y, i.e. Y depends on X1. The same holds for the p-value, which fulfils the criterion and is significant, i.e. 0.01 < 0.05.

Here, we conclude that the X1 – C01 variable is significant and GDP (Y) depends on it. We accept H1, the alternative hypothesis, as it turns out to be a significant variable.

3. L – X2
If the variable X2 increases by 1%, the GDP (Y) would increase by 2.37%. This result is interpreted by
the value of coefficient.

For the significance of a single variable, the t-statistic of 6.20 is significant, fulfilling the criterion (|t| > 2). This means X2 is significant for Y and GDP (Y) depends on X2 (L).

Similar is the scenario with the p-value, as it is lower than 0.05 with a value of 0.0000.

4. CO2 – X3
If the variable X3 increases by 1%, the GDP (Y) would decrease by about 0.71%. This result is interpreted from the negative value of the coefficient.

Analyzing the t-statistic, variable X3 is significant with a value of -2.439068, which is greater than 2 in absolute value; similar is the case with the p-value, as 0.024 < 0.05 fulfils the significance criterion.

5. E – X4
If the variable X4 increases by 1%, the GDP (Y) would increase by about 0.95%. This result is interpreted from the positive value of the coefficient.

Analyzing the t-statistic, variable X4 is significant with a value of 3.000847, which is more than 2, and the same holds for the p-value, as 0.007 < 0.05 fulfils the criterion of a significant variable. Hence, we conclude that the variable X4 is significant and we accept H1, the alternative hypothesis.

The major difference that we have observed in this model is that all the variables turn out to be significant under the t-statistic and p-value criteria, unlike in the linear regression model. Hence, this shows that the double-log multiple regression forecasts in a much better way than the classical linear analysis for these data.

Post Estimation Tests


After regression analysis, we will perform a variety of follow-up tests:

Multicollinearity
Multicollinearity refers to a situation in which two or more explanatory variables in a multiple
regression model are highly linearly related.

If the centered VIF > 10, multicollinearity exists: the auxiliary R² is high, indicating a strong linear relationship among the independent variables. This violates assumption 10, which requires that there be no perfect multicollinearity — no exact linear combination — among the independent variables.

In the presence of severe multicollinearity, the individual coefficients cannot be estimated precisely and the standard errors become very large.
Variance Inflation Factors
Date: 08/05/20   Time: 15:34
Sample: 1990 2019
Included observations: 30

Variable   Coefficient Variance   Uncentered VIF   Centered VIF

C          1.49E+23               2322.177         NA
C01        0.242504               333.1576         23.61332
L          7.21E+08               6230.006         58.21693
CO2        6.87E+11               2284.859         112.3819
E          1.40E+16               3765.744         195.2583

In our model, X3 (CO2) and X4 (energy use) are highly correlated: whenever energy use rises, CO2 emissions rise with it.

The most common remedy for this violation is to drop a highly correlated independent variable. We tried dropping variables, but given the economic structure of the data we still observed a degree of multicollinearity.

Another remedy is to combine two collinear independent variables into a single regressor.

Heteroscedasticity
In regression analysis, heteroscedasticity concerns the residuals or error term. Assumption 4 of the CLRM states that the variance of the error terms must be constant, i.e. the errors should be homoscedastic.

Homoscedasticity means "having the same scatter"; the opposite is heteroscedasticity ("different or unequal scatter"), where points lie at widely varying distances from the regression line. Under the assumption,

E(μ_i²) = σ²

When this assumption is violated, the variances of successive error terms become unequal: the spread is disturbed and grows with the values.

E(μ_i²) = σ_i²

This is the equation obtained when assumption 4 is violated; the subscript i on σ² indicates that the spreads differ across observations.

Hypothesis Testing

H0: There is no Heteroscedasticity (Homo)


H1: There is Heteroscedasticity

Decision Criteria

If Prob. Chi-Square < 0.05, reject null hypothesis


If Prob. Chi-Square > 0.05, fail to reject the null hypothesis

Heteroskedasticity Test: White


Null hypothesis: Homoskedasticity

F-statistic 0.874105     Prob. F(14,11) 0.6006


Obs*R-squared 13.69229     Prob. Chi-Square(14) 0.4729
Scaled explained SS 4.423142     Prob. Chi-Square(14) 0.9923

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 08/05/20 Time: 16:05
Sample: 1990 2015
Included observations: 26

Variable Coefficient Std. Error t-Statistic Prob.  

C -8.71E+23 8.43E+23 -1.033786 0.3234


C01^2 -1.215492 1.417408 -0.857545 0.4094
C01*L 29904.45 118555.6 0.252240 0.8055
C01*CO2 2811586. 3302889. 0.851251 0.4128
C01*E -2.18E+08 4.61E+08 -0.473820 0.6449
C01 -4.09E+11 1.84E+12 -0.222268 0.8282
L^2 -4.87E+09 4.18E+09 -1.165864 0.2683
L*CO2 -4.84E+11 2.80E+11 -1.727424 0.1120
L*E 7.55E+13 4.44E+13 1.698269 0.1175
L 1.32E+17 1.17E+17 1.123133 0.2853
CO2^2 -1.40E+12 3.18E+12 -0.439679 0.6687
CO2*E 1.24E+15 1.12E+15 1.105982 0.2923
CO2 6.77E+18 3.79E+18 1.784022 0.1020
E^2 -1.49E+17 1.06E+17 -1.407541 0.1869
E -1.05E+21 6.00E+20 -1.757036 0.1067

R-squared 0.526626     Mean dependent var 1.35E+21


Adjusted R-squared -0.075849     S.D. dependent var 1.37E+21
S.E. of regression 1.42E+21     Akaike info criterion 100.5381
Sum squared resid 2.21E+43     Schwarz criterion 101.2639
Log likelihood -1291.995     Hannan-Quinn criter. 100.7471
F-statistic 0.874105     Durbin-Watson stat 1.359738
Prob(F-statistic) 0.600645

The White heteroscedasticity test (including cross terms) gives a Prob. Chi-square of 0.4729, which is greater than 0.05, so we fail to reject the null hypothesis of homoskedasticity.

Heteroscedasticity is found mainly in cross-sectional data, so its absence in our time-series data is not surprising. OLS therefore remains BLUE and retains its properties — unbiased and efficient, with minimum variance among linear unbiased estimators — and no correction for heteroscedasticity is needed.

The t-statistics in the auxiliary (White test) regression are all insignificant, which further supports the conclusion that the chance of heteroscedasticity is very low.

Autocorrelation
Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of the delay. Informally, it is the similarity between observations as a function of the time lag between them.

Autocorrelation is assessed by the Durbin-Watson test in regression analysis, as performed in the CLRM section.

We have the following decision criteria for Durbin Watson Test:

 If the value is near 0, positive autocorrelation is present
 If the value is near 4, negative autocorrelation is present
 If the value is near 2, there is no autocorrelation

The value measured in our model is 0.415, which indicates positive autocorrelation among the error terms over time. Since our data are a time series, autocorrelation is to be expected.
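The Durbin-Watson reading can be sketched on simulated AR(1) errors (assuming statsmodels; the persistence parameter 0.8 is an illustrative choice, not estimated from the report's data):

```python
# Sketch: DW statistic on residuals with strong positive lag-1 correlation.
# DW is roughly 2(1 - rho), so rho = 0.8 should yield a value near 0.4,
# similar in spirit to the report's DW of 0.415.
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
n = 200
e = np.zeros(n)
for t in range(1, n):                 # AR(1) errors: e_t = 0.8 e_{t-1} + v_t
    e[t] = 0.8 * e[t - 1] + rng.normal()

dw = durbin_watson(e)
print(dw)  # well below 2 -> positive autocorrelation
```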

Hypothesis Testing

Ho: There is no autocorrelation


H1: There is autocorrelation

Decision Criteria

If Prob. Chi-Square < 0.05, reject null hypothesis


If Prob. Chi-Square > 0.05, fail to reject the null hypothesis

Breusch-Godfrey Serial Correlation LM Test:


Null hypothesis: No serial correlation at up to 2 lags

F-statistic 15.01190     Prob. F(2,19) 0.0001


Obs*R-squared 15.92326     Prob. Chi-Square(2) 0.0003

Test Equation:
Dependent Variable: RESID
Method: Least Squares
Date: 08/05/20 Time: 16:12
Sample: 1990 2015
Included observations: 26
Presample missing value lagged residuals set to zero.

Variable Coefficient Std. Error t-Statistic Prob.  

C 1.66E+11 2.75E+11 0.603707 0.5532


C01 -0.308412 0.327667 -0.941238 0.3584
L -10796.07 19005.47 -0.568051 0.5767
CO2 -1179853. 681354.4 -1.731628 0.0995
E 1.74E+08 98688535 1.767470 0.0932
RESID(-1) 0.819785 0.212606 3.855884 0.0011
RESID(-2) 0.092750 0.251828 0.368305 0.7167

R-squared 0.612433     Mean dependent var 0.000315


Adjusted R-squared 0.490044     S.D. dependent var 3.74E+10
S.E. of regression 2.67E+10     Akaike info criterion 51.08044
Sum squared resid 1.36E+22     Schwarz criterion 51.41916
Log likelihood -657.0457     Hannan-Quinn criter. 51.17798
F-statistic 5.003966     Durbin-Watson stat 1.435217
Prob(F-statistic) 0.003135

Here the Prob. Chi-Square is 0.0003, which is less than 0.05. Therefore, we reject the null hypothesis and conclude that autocorrelation exists — as expected for time-series data. OLS is then still unbiased but no longer efficient: its variances are larger than those of alternative estimators such as MLE, so the confidence intervals, R², and t-statistics all become unreliable.

Misspecification
Misspecification means the model estimated by regression analysis is in error — it does not account for everything it should. Mis-specified models can have biased coefficients and error terms, and tend to produce biased parameter estimates. Misspecification arises from four types of specification bias: omission of a relevant variable, inclusion of an irrelevant variable, functional misspecification, and errors of measurement.

The Ramsey RESET test is executed to judge the model specification, as reported below:

Hypothesis Testing

H0: The model is correctly specified
H1: The model is mis-specified

Decision Criteria

F-statistic > 4 — reject the null
Prob < 0.05 — reject the null
Ramsey RESET Test
Equation: UNTITLED
Omitted Variables: Squares of fitted values
Specification: GDP C C01 L CO2 E

Value df Probability
t-statistic  4.419628  20  0.0003
F-statistic  19.53311 (1, 20)  0.0003
Likelihood ratio  17.71656  1  0.0000

F-test summary:
Mean
Sum of Sq. df Squares
Test SSR  1.73E+22  1  1.73E+22
Restricted SSR  3.50E+22  21  1.67E+21
Unrestricted SSR  1.77E+22  20  8.86E+20

LR test summary:
Value
Restricted LogL -669.3680
Unrestricted LogL -660.5097

Unrestricted Test Equation:


Dependent Variable: GDP
Method: Least Squares
Date: 08/05/20 Time: 16:14
Sample: 1990 2015
Included observations: 26

Variable Coefficient Std. Error t-Statistic Prob.  

C 9.83E+11 6.83E+11 1.438616 0.1657


C01 0.158156 0.414354 0.381693 0.7067
L -58655.61 41318.10 -1.419610 0.1711
CO2 -1428620. 666623.6 -2.143069 0.0446
E 3.63E+08 1.20E+08 3.034488 0.0065
FITTED^2 4.55E-13 1.03E-13 4.419628 0.0003

R-squared 0.992249     Mean dependent var 8.35E+11


Adjusted R-squared 0.990312     S.D. dependent var 3.02E+11
S.E. of regression 2.98E+10     Akaike info criterion 51.26998
Sum squared resid 1.77E+22     Schwarz criterion 51.56031

Log likelihood -660.5097     Hannan-Quinn criter. 51.35358
F-statistic 512.0845     Durbin-Watson stat 1.053177
Prob(F-statistic) 0.000000

The results show that the F-statistic, 19.53, is greater than 4 and the probability, 0.0003, is less than 0.05. Therefore, we reject the null hypothesis and conclude that the model is mis-specified. This can be due to the four types of specification bias discussed above.

Omission of explanatory variables: there might be other factors (beyond the included Xt) affecting Yt that have been left out of the equation.

Model specification: the model might be mis-specified in its structure. For example, Yt might not be affected by Xt but by the value of X in the previous period (i.e. X(t-1)).

Functional misspecification: the relationship between X and Y might be non-linear.

Measurement errors: if one or more variables are measured incorrectly, errors enter the relationship and contribute to the disturbance term.

However, we can rerun the Ramsey RESET test on the multiple regression in log-log form and check whether the misspecification persists, since the log specification provides better estimates and results.

Ramsey RESET Test


Equation: UNTITLED
Omitted Variables: Squares of fitted values
Specification: LOG(GDP) C LOG(C01) LOG(L) LOG(CO2) LOG(E)

Value df Probability
t-statistic  2.173621  20  0.0619
F-statistic  3.724629 (1, 20)  0.0619
Likelihood ratio  4.513758  1  0.0189

F-test summary:
Mean
Sum of Sq. df Squares
Test SSR  0.004344  1  0.004344
Restricted SSR  0.022730  21  0.001082
Unrestricted SSR  0.018387  20  0.000919

LR test summary:
Value
Restricted LogL  54.65560
Unrestricted LogL  57.41248

Unrestricted Test Equation:


Dependent Variable: LOG(GDP)
Method: Least Squares
Date: 08/07/20 Time: 16:35
Sample: 1990 2015
Included observations: 26

Variable Coefficient Std. Error t-Statistic Prob.  

C 360.5800 174.6600 2.064468 0.0522


LOG(C01) -2.809821 1.425280 -1.971418 0.0627
LOG(L) -25.19911 12.69071 -1.985634 0.0610

LOG(CO2) 6.851157 3.490733 1.962670 0.0637
LOG(E) -8.709847 4.453767 -1.955614 0.0646
FITTED^2 0.201870 0.092873 2.173621 0.0419

R-squared 0.995247     Mean dependent var 27.38050


Adjusted R-squared 0.994059     S.D. dependent var 0.393368
S.E. of regression 0.030321     Akaike info criterion -3.954806
Sum squared resid 0.018387     Schwarz criterion -3.664476
Log likelihood 57.41248     Hannan-Quinn criter. -3.871202
F-statistic 837.5779     Durbin-Watson stat 1.034304
Prob(F-statistic) 0.000000

Hence, applying the RESET test to the log-log multiple regression, the misspecification disappears: the F-statistic, 3.724, is less than 4, and the probability, 0.0619, is greater than 0.05.

Therefore, we fail to reject the null hypothesis and conclude that the model is correctly specified.

Conclusion
All data were gathered from the WDI and all tests were executed, without any plagiarism, using the EViews software. We have thoroughly examined the data through a range of techniques, including descriptive statistics, normality tests, graphs, pairwise correlation, regression analysis (linear and log-log), and several post-estimation tests. The interpretations are made within the concepts and context of econometrics and the results generated.
