ECONOMETRICS (BT22203)
SEMESTER 1, 2019/2020
GROUP ASSIGNMENT
Submitted to:
Mr. Mohd Safri Saiman
Prepared by:
NO NAME MATRIC NO SIGNATURE
1
2
3
4
5
6
7
8
9
10
1.0 Introduction
2.0 Multicollinearity
3.0 Variance Inflation Factor (VIF)
A widely used formal method of detecting the presence of multicollinearity is the
variance inflation factor (VIF). The VIF measures how much the variance of an
estimated regression coefficient is inflated compared with the case where the
independent variables are not linearly related.
First, the standard deviations for SMC and RGDP are 58.90506 and 134983.4
respectively, with n − 1 = 29 degrees of freedom. The standard errors for SMC and
RGDP are 0.000197893 and 0.000000086358 respectively, and the overall standard
error is 0.057529142. We used these statistics to compute the VIF, since PPP is
explained by SMC and RGDP.
PPP = Purchasing Power Parity over GDP for Malaysia, National Currency Units per US
Dollar, Annual, Not Seasonally Adjusted.
SMC = Stock Market Capitalization to GDP for Malaysia, Percent, Annual, Not Seasonally
Adjusted.
RGDP = Real GDP at Constant National Prices for Malaysia, Millions of 2011 US
Dollars, Annual, Not Seasonally Adjusted.
                     rgdpo      pop        hc
Mean                 63559.44   8.671697   1.108736
Median               64568.95   8.76689    1.088122
Mode                 #N/A       #N/A       #N/A
Standard Deviation   17556.75   0.287817   0.072869
Minimum              31171.07   7.997787   1.024963
Maximum              93267.44   8.975291   1.258523
Count
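As an illustration only (the original table was produced in a spreadsheet), the same summary statistics can be computed in Python; the series below is a hypothetical stand-in for the pop column, not the actual data:

```python
import numpy as np

def describe(x):
    """Summary statistics matching the table above (sample standard
    deviation, i.e. the n - 1 denominator)."""
    x = np.asarray(x, dtype=float)
    return {
        "Mean": x.mean(),
        "Median": float(np.median(x)),
        "Standard Deviation": x.std(ddof=1),
        "Minimum": x.min(),
        "Maximum": x.max(),
        "Count": int(x.size),
    }

# Hypothetical stand-in series
pop = [7.99, 8.21, 8.48, 8.70, 8.90]
stats = describe(pop)
```

On the actual 31-observation series this would reproduce the means, medians, and sample standard deviations shown above.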
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.954869832
R Square            0.911776396
Adjusted R Square   0.905474711
Standard Error      5397.816609
Observations        31

ANOVA
             df   SS            MS            F             Significance F
Regression    2   8431363913    4215681956    144.6876918   1.73056E-15
Residual     28   815819876.2   29136424.15
Total        30   9247183789

            Coefficients    Standard Error   t Stat         P-value       Standard Deviation
Intercept   -903160.8054    58099.54688      -15.54505765   2.66478E-15
pop         72564.84147     4784.408316      15.16694159    4.94728E-15   0.287817
hc          304364.4461     18897.43769      16.10612249    1.08752E-15   0.072869
Y = β0 + β1(X1) + β2(X2) + … + βn(Xn)
Y = 1.817912779 + (-0.000813025)(X1) + (0.0000005027749432872)(X2)
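The fitted equation above comes from the spreadsheet's regression output. As a sketch of the same computation, the coefficients of a multiple linear regression can be estimated with ordinary least squares in Python; the data below are simulated stand-ins, not the actual PPP, SMC, and RGDP series:

```python
import numpy as np

# Hypothetical simulated data: x1 plays the role of the first regressor
# (SMC-like) and x2 the second (RGDP-like); y is built from known
# coefficients plus noise so the fit can be checked.
rng = np.random.default_rng(0)
n = 31
x1 = rng.normal(60, 20, n)
x2 = rng.normal(64000, 17000, n)
y = 1.8 - 0.0008 * x1 + 0.0000005 * x2 + rng.normal(0, 0.01, n)

# Design matrix with an intercept column: Y = b0 + b1*X1 + b2*X2
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = beta  # estimated intercept and slopes
```

With the actual data, the estimated (b0, b1, b2) would match the equation above.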
4.0 Multiple Linear Regression Model
Firstly, my team constructed a multiple linear regression model to get a general
idea of the variables. Multiple linear regression is the most common form of linear
regression analysis (predictive analysis); it is used to explain the relationship
between one continuous dependent variable and two or more independent variables,
which can be continuous or categorical. In this case, the model is used to explain
the relationship between PPP, the continuous dependent variable, and SMC and RGDP,
the independent variables.
There are 3 major uses for multiple linear regression analysis. First, it might be used to
identify the strength of the effect that the independent variables have on a dependent
variable.
Second, it can be used to forecast the effects or impacts of changes. That is,
multiple linear regression analysis helps us to understand how much the dependent
variable will change when we change the independent variables. For instance, a
multiple linear regression can tell how much PPP is expected to increase (or
decrease) for every one-point increase (or decrease) in SMC and RGDP.
Third, multiple linear regression analysis predicts trends and future values. The multiple
linear regression analysis can be used to get point estimates. An example question may be
“what will the PPP be 6 months from now?”
4.2 Assumptions
There are some assumptions required for multiple linear regression analysis to be accurate.
The regression residuals must be normally distributed. A linear relationship is assumed
between the dependent variable and the independent variables. The residuals are
homoscedastic and approximately rectangular-shaped. The absence of multicollinearity is
assumed in the model, meaning that the independent variables are not too highly
correlated.
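These assumptions can be checked informally from the residuals. A rough sketch of such diagnostics (an illustration with hypothetical checks, not the formal tests used in this assignment):

```python
import numpy as np

def residual_checks(y, y_hat):
    """Rough diagnostics for the assumptions above: residual normality
    (via sample skewness) and homoscedasticity (via the residual spread
    in the lower vs upper half of the fitted values)."""
    resid = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    z = (resid - resid.mean()) / resid.std(ddof=1)
    skewness = (z ** 3).mean()          # near 0 for symmetric residuals
    order = np.argsort(y_hat)
    half = len(resid) // 2
    ratio = (resid[order[half:]].std(ddof=1)
             / resid[order[:half]].std(ddof=1))  # near 1 if homoscedastic
    return skewness, ratio
```

A skewness far from zero suggests non-normal residuals, and a spread ratio far from one suggests heteroscedasticity; either would call the regression results into question.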
Multicollinearity occurs when one independent variable in a multiple regression
model can be linearly predicted from the others with a substantial degree of
accuracy. The VIF estimates how much the variance of a regression coefficient is
inflated due to multicollinearity in the model.
VIFs range from 1 upwards, and the numerical value tells by what percentage the
variance of each coefficient is inflated. For example, a VIF of 1.9 tells that the
variance of a coefficient is 90% bigger than would be expected if there were no
multicollinearity, that is, no correlation with the other predictors. A rule of
thumb for interpreting the variance inflation factor:
1 = not correlated
1–5 = moderately correlated
>5 = highly correlated
Exactly how large a VIF must be before it causes issues is a subject of debate.
What is known is that the more the VIF increases, the less reliable the regression
results are going to be. In general, a VIF above 10 indicates high correlation and
is cause for concern. Some authors suggest a more conservative level of 2.5 or
above.
Sometimes a high VIF is no cause for concern at all. For example, my team can get a
high VIF by including products or powers of other variables in the regression, like
x and x². High VIFs for dummy variables representing nominal variables with three
or more categories are usually not a problem either.
The second step is to determine whether the model is free from the
multicollinearity problem. Thus, my team used the VIF to check whether the model is
void of the multicollinearity problem. The standard deviation of the data needed to
be calculated before the VIF, because it is needed in the VIF formula.
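The conventional definition is VIF = 1 / (1 − R²), where R² comes from regressing one explanatory variable on the other(s). As an illustration only (the original computation was done in a spreadsheet), a minimal Python sketch for the two-predictor case:

```python
import numpy as np

def vif_two_predictors(x1, x2):
    """VIF via the conventional formula VIF = 1 / (1 - R^2). With only
    two predictors, the R^2 of regressing x1 on x2 is simply the squared
    correlation, so both predictors share the same VIF."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    r = np.corrcoef(x1, x2)[0, 1]
    return 1.0 / (1.0 - r ** 2)
```

A correlation of about 0.4 between SMC and RGDP gives 1 / (1 − 0.16) ≈ 1.19, the same order as the VIF reported for this model.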
The VIF on both variables is 1.190662489, which is moderately correlated; this
means the model is not totally free from the multicollinearity problem. However,
the multicollinearity is not severe enough to make the regression model inaccurate.
Thus, the multiple regression model can be used, but it is bound to have some
inaccurate values. The model can be used to predict trends and future values, but
it is impossible to get a perfectly accurate value.
4.5 Correlation between explanatory variables
rgdpo=f(pop)
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.307287
R Square            0.094425
Adjusted R Square   0.063198
Standard Error      16992.92
Observations        31

ANOVA
             df   SS         MS         F          Significance F
Regression    1   8.73E+08   8.73E+08   3.023856   0.092655
Residual     29   8.37E+09   2.89E+08
Total        30   9.25E+09

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   -98986.3       93524.74         -1.0584    0.298611   -290266     92293.24
pop         18744.4        10779.31         1.738924   0.092655   -3301.77    40790.57
Purchasing Power Parity (PPP) on Stock Market Capitalization (SMC)
rgdpo = f(pop)
rgdpo = β0 + β1(pop)
β0 = -98986.3
β1 = 18744.4
R² = 0.094425
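The estimates β0, β1, and R² above can be obtained from the standard simple-OLS formulas; a minimal sketch (with a hypothetical toy series, not the actual data):

```python
import numpy as np

def simple_ols(x, y):
    """Intercept, slope and R^2 for y = b0 + b1*x via ordinary least
    squares: b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    b0 = y.mean() - b1 * x.mean()
    r2 = np.corrcoef(x, y)[0, 1] ** 2   # for simple OLS, R^2 = r^2
    return b0, b1, r2
```

On a perfectly linear toy series y = 2 + 3x this recovers b0 = 2, b1 = 3, and R² = 1; on the actual series it would reproduce the β0, β1, and R² reported above.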
As for PPP on SMC, we can see that the explanatory power is at a low level: SMC
explains only about 5.8% of the variation in PPP. The F-statistic of about 1.7320
measures whether SMC in the regression model significantly influences PPP. The
intercept of about 1.8991 is highly significant, with a t-value of about 51.58 at
the 1% significance level; it represents that when the SMC rate is zero, PPP will
still average about 1.899. The coefficient of SMC, about -0.0004, has a t-value of
about -1.3161; it represents that when SMC increases by 1%, PPP decreases by an
average of 0.0004%. Thus, there is a negative relationship between PPP and SMC.
The negative relationship between PPP and SMC follows from the sign of the slope.
The intercept β0 is highly significant, with a t-value of about 51.58 and a p-value
that is practically zero. The coefficient β1, however, has a t-value of only about
-1.3161, so its p-value exceeds 0.05 and we cannot reject the null hypothesis that
SMC has no effect on PPP. Overall, the estimated relationship between SMC and PPP
is negative but weak.
4.6 Simple Regression Model (SRM)
rgdpo = f(hc)
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.432399
R Square            0.186969
Adjusted R Square   0.158933
Standard Error      16101.24
Observations        31

ANOVA
             df   SS         MS         F          Significance F
Regression    1   1.73E+09   1.73E+09   6.668984   0.015126
Residual     29   7.52E+09   2.59E+08
Total        30   9.25E+09
Purchasing Power Parity (PPP) to Real GDP (RGDP)
rgdpo = f(hc)
rgdpo = β0 + β1(hc)
β0 = -51949.4
β1 = 104180.6
R² = 0.186969
The regression above shows that RGDP explains the behaviour of PPP. The R² is
0.321, which means RGDP explains about 32% of PPP. The F-statistic of about 13.2622
indicates that RGDP is statistically significant at the 1% level in the regression
model and influences PPP. The intercept of about 1.7556 is highly significant, with
a t-value of about 57.8731 at the 1% significance level; it represents that when
RGDP is zero, PPP will still average about 1.7556. The coefficient of RGDP, about
0.000000361, is highly significant, with a t-value of about 3.6417 at the 1%
significance level; it represents that when RGDP increases by 1%, PPP increases by
an average of 3.608%. Thus, there is a positive relationship between PPP and RGDP.
The positive relationship between PPP and RGDP follows from the sign of the slope.
The intercept β0 is highly significant, with a t-value of about 57.8731 and a
p-value that is almost zero. The coefficient β1 is also highly significant, with a
t-value of about 3.6417. We therefore reject the null hypothesis and accept the
alternative hypothesis for RGDP, because the p-value is smaller than 0.05. Overall,
the relationship between RGDP and PPP is positive.
5.0 Conclusion
We identified the correlation between SMC and RGDP. The closer the correlated of SMC and
RGDP, the severer the multicollinearity. If the multicollinearity is severed, we need to drop
one of the explanatory variables to fix the multicollinearity. Since we found that is 0.4 which
is 40% correlated between SMC and RGDP, hence we decided to drop whether SMC or
RGDP for explain the PPP.
The R square for SMC to PPP is 0.058 and the R square for RGDP to PPP is 0.32.
Since the R square for SMC to PPP is lower than that for RGDP to PPP, we decided to
drop SMC and chose RGDP to explain PPP.