
FACULTY OF BUSINESS, ECONOMICS AND ACCOUNTANCY

UNIVERSITI MALAYSIA SABAH

ECONOMETRICS (BT22203)
SEMESTER 1, 2019/2020

GROUP ASSIGNMENT

Submitted to:
Mr. Mohd Safri Saiman

Prepared by:
NO NAME MATRIC NO SIGNATURE
1
2
3
4
5
6
7
8
9
10
1.0 Introduction

Econometrics is the application of statistical methods to economic data in order to give empirical content to economic relationships. It sits at the junction of economic theory and actual measurement, using the theory and techniques of statistical inference as a bridge between the two. In this sense, econometrics is defined as "economic measurement": the social science in which the tools of statistics are applied to economic phenomena. Given the nature of the econometric approach, the first step of every econometric study is the specification of the model, a set of mathematical equations. A model is called a simple regression model (SRM) when there is just one independent variable in the model. The independent variables are also referred to as predictor or explanatory variables, while the dependent variable is also referred to as the response. A multiple regression model (MRM) is an extension of simple linear regression; it is used to predict the value of a variable based on the values of two or more other variables. The variable to be predicted is called the dependent variable. To check whether the model is free from a multicollinearity problem, we use the correlation between the explanatory variables; because the multicollinearity problem arises in the MRM, this correlation determines the choice of the final model.

2.0 Multicollinearity

Multicollinearity refers to very high inter-correlations or inter-associations among the independent variables: a situation in which two or more explanatory variables in a multiple regression model are highly linearly related. Such a correlation between variables causes confusion in a study because the variables are too closely related, and it makes the magnitudes of the regression coefficients change from one sample to another. In the presence of high multicollinearity, the confidence intervals of the coefficients tend to become very wide and the t statistics tend to be very small, which makes it difficult to reject the null hypothesis. There are two basic kinds of multicollinearity: structural multicollinearity and data multicollinearity. Structural multicollinearity occurs when we create a model term using other terms; in other words, it is a by-product of the model that we specify rather than being present in the data itself. For example, if you square the term X to model curvature, there is clearly a correlation between X and X². Data multicollinearity is present in the data itself rather than being an artefact of our model; observational studies are more likely to exhibit this kind of multicollinearity. We applied two methods to detect multicollinearity: the Variance Inflation Factor (VIF) and the correlation between the explanatory variables.
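As an illustration of these two detection methods, here is a minimal sketch in Python on synthetic data (the variables and values are hypothetical, not the assignment's dataset); it computes the pairwise correlation and the conventional VIF via an auxiliary regression:

```python
import numpy as np

# Hypothetical illustration with synthetic data: x2 is built to be
# moderately correlated with x1, mimicking data multicollinearity.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.4 * x1 + rng.normal(size=100)

# Method 1: pairwise correlation between the explanatory variables.
r = np.corrcoef(x1, x2)[0, 1]

# Method 2: conventional VIF -- regress one regressor on the other(s)
# and take VIF = 1 / (1 - R^2) of that auxiliary regression.
X = np.column_stack([np.ones_like(x2), x2])
beta, *_ = np.linalg.lstsq(X, x1, rcond=None)
resid = x1 - X @ beta
r2_aux = 1 - resid.var() / x1.var()
vif = 1 / (1 - r2_aux)

print(f"correlation = {r:.3f}, VIF = {vif:.3f}")
```

With only two explanatory variables the two methods coincide, since VIF = 1 / (1 − r²).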

3.0 Variance Inflation Factor (VIF)

A widely used formal method of detecting the presence of multicollinearity is the VIF. The VIF measures how much the variances of the estimated regression coefficients are inflated compared to when the independent variables are not linearly related.

Firstly, the standard deviations for SMC and RGDP are 58.90506 and 134983.4 respectively. Next, we need n − 1, which is 29, and the standard errors for SMC and RGDP, which are 0.000197893 and 0.000000086358 respectively. The overall standard error is 0.057529142. We used all of these statistics to compute the VIF, since PPP is explained by SMC and RGDP.

PPP = Purchasing Power Parity over GDP for Malaysia, National Currency Units per US
Dollar, Annual, Not Seasonally Adjusted.

SMC = Stock Market Capitalization to GDP for Malaysia, Percent, Annual, Not Seasonally
Adjusted.

RGDP = Real GDP at Constant National Prices for Malaysia, Millions of 2011 US Dollars, Annual, Not Seasonally Adjusted.

                     rgdpo      pop        hc
Mean                 63559.44   8.671697   1.108736
Median               64568.95   8.76689    1.088122
Mode                 #N/A       #N/A       #N/A
Standard Deviation   17556.75   0.287817   0.072869
Minimum              31171.07   7.997787   1.024963
Maximum              93267.44   8.975291   1.258523
Count                31         31         31

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.954869832
R Square            0.911776396
Adjusted R Square   0.905474711
Standard Error      5397.816609
Observations        31

ANOVA
             df   SS            MS            F             Significance F
Regression   2    8431363913    4215681956    144.6876918   1.73056E-15
Residual     28   815819876.2   29136424.15
Total        30   9247183789

            Coefficients   Standard Error   t Stat         P-value       VIF   Standard Deviation
Intercept   -903160.8054   58099.54688      -15.54505765   2.66478E-15
pop         72564.84147    4784.408316      15.16694159    4.94728E-15         0.287817
hc          304364.4461    18897.43769      16.10612249    1.08752E-15         0.072869

VIF = [Standard Deviation² × (n − 1) × Standard Error²] / Overall Standard Error²

Y = Constant + B1(X1) + B2(X2) + … + Bn(Xn)
Y = 1.817912779 + (-0.000813025)(X1) + (0.0000005027749432872)(X2)
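Plugging values into this fitted equation is straightforward; the sketch below evaluates it (the input values are illustrative only, reusing the standard deviations quoted earlier in the text, not actual observations):

```python
def predict_ppp(smc, rgdp):
    """Evaluate the fitted equation from the text:
    Y = 1.817912779 - 0.000813025*X1 + 0.0000005027749432872*X2."""
    return 1.817912779 + (-0.000813025) * smc + 0.0000005027749432872 * rgdp

# Illustrative inputs only (the standard deviations quoted earlier).
print(predict_ppp(58.90506, 134983.4))
```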

4.0 Multi Linear Regression Model

4.1 Major Uses

Firstly, my team will construct a multiple linear regression model to get a general idea of the variables. Multiple linear regression is the most common form of linear regression analysis (predictive analysis) and is used to explain the relationship between one continuous dependent variable and two or more independent variables. The independent variables can be continuous or categorical. In this case, the model is used to explain the relationship between PPP, the continuous dependent variable, and SMC and RGDP, the independent variables.

There are three major uses for multiple linear regression analysis. First, it might be used to identify the strength of the effect that the independent variables have on a dependent variable.

Second, it can be used to forecast the effects or impacts of changes. That is, multiple linear regression analysis helps us to understand how much the dependent variable will change when we change the independent variables. For instance, a multiple linear regression can tell how much PPP is expected to increase (or decrease) for every one-point increase (or decrease) in SMC and RGDP.

Third, multiple linear regression analysis predicts trends and future values.  The multiple
linear regression analysis can be used to get point estimates.  An example question may be
“what will the PPP be 6 months from now?”

4.2 Assumptions

There are some assumptions required for multiple linear regression analysis to be accurate. The regression residuals must be normally distributed. A linear relationship is assumed between the dependent variable and the independent variables. The residuals must be homoscedastic and approximately rectangular-shaped. The absence of multicollinearity is assumed in the model, meaning that the independent variables are not too highly correlated.
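These assumptions can be checked informally from the residuals. A rough sketch on synthetic data (not the assignment's dataset) that looks at residual skewness, for the normality assumption, and at whether the residual spread is stable across the range of x, for homoscedasticity:

```python
import numpy as np

# Synthetic data satisfying the assumptions: linear mean, i.i.d. normal errors.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, 200)

# Fit by OLS and form the residuals.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Normality check (rough): skewness should be near 0.
skew = ((resid - resid.mean()) ** 3).mean() / resid.std() ** 3

# Homoscedasticity check (rough): residual spread should be similar in the
# lower and upper halves of x.
ratio = resid[x < 5].std() / resid[x >= 5].std()

print(f"skewness = {skew:.2f}, spread ratio = {ratio:.2f}")
```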

4.3 VIF Range

A variance inflation factor, or VIF, helps to detect multicollinearity in regression analysis. Multicollinearity is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. The VIF estimates how much the variance of a regression coefficient is inflated due to multicollinearity in the model.

VIFs range from 1 upwards, and the numerical value tells you by what percentage the variance is inflated for each coefficient. For example, a VIF of 1.9 tells you that the variance of that coefficient is 90% bigger than what would be expected if there were no multicollinearity, that is, no correlation with other predictors. A rule of thumb for interpreting the variance inflation factor:

• 1 = not correlated
• 1–5 = moderately correlated
• >5 = highly correlated

Exactly how large a VIF must be before it causes issues is a subject of debate. What is
known is that the more VIF increases, the less reliable the regression results are going to
be. In general, a VIF above 10 indicates high correlation and is cause for concern. Some
authors suggest a more conservative level of 2.5 or above.

Sometimes a high VIF is no cause for concern at all. For example, my team can get a high VIF by including products or powers of other variables in the regression, like x and x². High VIFs for dummy variables representing nominal variables with three or more categories are also usually not a problem.
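The rule of thumb above can be written directly as a small helper (a sketch; the thresholds are the ones quoted in this section, not a universal standard):

```python
def interpret_vif(vif):
    """Classify a VIF using the rule of thumb quoted in the text."""
    if vif <= 1.0:
        return "not correlated"
    if vif <= 5.0:
        return "moderately correlated"
    return "highly correlated"

print(interpret_vif(1.19))  # → moderately correlated (the magnitude found for this model)
```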

4.4 Determining Whether the Model Is Free from Multicollinearity

The second step is to determine whether the model is free from the multicollinearity problem. Thus, my team used the VIF to check whether the model is void of the multicollinearity problem. The standard deviation of the data needs to be calculated before the VIF because it appears in the VIF formula:

VIF = [Standard Deviation² × (n − 1) × Standard Error²] / Overall Standard Error²

The VIF for both variables is 1.190662489, which is moderately correlated; this means the model is not totally free from the multicollinearity problem. However, the multicollinearity is not severe enough to make the regression model inaccurate. Thus, the multiple regression model can be used, but it is bound to have some inaccurate values: it can be used to predict trends and future values, but it is impossible to get a fully accurate value.

Y = -903160.8054 + (72564.84147)(X1) + (304364.4461)(X2)
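One quick internal check on this fitted equation: for OLS with an intercept, the fitted value at the sample means of the regressors equals the sample mean of the dependent variable. Using the means from the descriptive table earlier (pop = 8.671697, hc = 1.108736, and rgdpo ≈ 63559.44), a sketch:

```python
# Evaluate the fitted equation at the sample means of pop and hc.
b0, b1, b2 = -903160.8054, 72564.84147, 304364.4461
pred_at_means = b0 + b1 * 8.671697 + b2 * 1.108736

# For OLS, this should approximately recover the sample mean of rgdpo.
print(round(pred_at_means, 1))  # ≈ 63559.3, close to the mean of rgdpo
```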

4.5 Correlation between explanatory variables

Single Regression Model (SRM)

rgdpo=f(pop)

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.307287
R Square            0.094425
Adjusted R Square   0.063198
Standard Error      16992.92
Observations        31

ANOVA
             df   SS         MS         F          Significance F
Regression   1    8.73E+08   8.73E+08   3.023856   0.092655
Residual     29   8.37E+09   2.89E+08
Total        30   9.25E+09

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   -98986.3       93524.74         -1.0584    0.298611   -290266     92293.24    -290266       92293.24
pop         18744.4        10779.31         1.738924   0.092655   -3301.77    40790.57    -3301.77      40790.57
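As a quick sanity check on the printout above, each t statistic is simply the coefficient divided by its standard error; a sketch using the pop row's values:

```python
# t statistic = coefficient / standard error (values from the pop row above).
coef, se = 18744.4, 10779.31
t_stat = coef / se
print(round(t_stat, 4))  # → 1.7389, matching the reported t Stat
```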

Purchasing Power Parity (PPP) on Stock Market Capitalization (SMC)

The equation of rgdpo on pop is as below:

rgdpo = f(pop)

rgdpo = β0 + β1pop

β0 = -98986.3

β1 = 18744.4

rgdpo = -98986.3 + 18744.4 pop

R² = 0.094425

As for PPP on SMC, we can see that it is at the lower level: SMC explains about 5.8% of the variation in PPP. The F statistic of about 1.7320 indicates that SMC is statistically significant at the 1% level in the regression model and will influence PPP. The coefficient of PPP of about 1.8991 is highly significant, with the p-value of obtaining a t-value for this coefficient of as much as about 51.58 at the 1% significance level. The coefficient of PPP indicates that when the SMC rate is zero, PPP will still increase by an average of 1.899. The coefficient of SMC of about -0.0004 is highly significant, with the p-value of obtaining a t-value for this coefficient of as much as about -1.3161 at the 1% significance level. The coefficient of SMC indicates that when SMC increases by 1%, PPP decreases by an average of 0.0004%. Thus, there is a negative relationship between PPP and SMC.

The negative value between PPP and SMC arises because the β0 coefficient is highly significant, with the p-value of obtaining a t-value for this coefficient of as much as about 51.58, while the β1 coefficient is highly significant, with the p-value of obtaining a t-value of about -1.3161, which is practically zero. So we reject the null hypothesis and accept the alternative hypothesis for SMC because the p-value is smaller than 0.05. Overall, the relationship between SMC and PPP is negative.

4.6 Single Regression Model (SRM)

rdgpo=f(hc)

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.432399
R Square            0.186969
Adjusted R Square   0.158933
Standard Error      16101.24
Observations        31

ANOVA
             df   SS         MS         F          Significance F
Regression   1    1.73E+09   1.73E+09   6.668984   0.015126
Residual     29   7.52E+09   2.59E+08
Total        30   9.25E+09

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   -51949.4       44822.01         -1.15902   0.255905   -143621     39721.86    -143621       39721.86
hc          104180.6       40341.98         2.582438   0.015126   21672.03    186689.3    21672.03      186689.3
Purchasing Power Parity (PPP) to Real GDP (RGDP)

The equation of rgdpo on hc is as below:

rgdpo = f(hc)

rgdpo = β0 + β1hc

β0 = -51949.4

β1 = 104180.6

rgdpo = -51949.4 + 104180.6 hc

R² = 0.186969
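A small check of this fitted line: evaluating it at the sample mean of hc from the descriptive table (1.108736) should approximately recover the sample mean of rgdpo (≈ 63559.44), an identity of OLS with an intercept:

```python
def predict_rgdpo(hc):
    """Evaluate the fitted simple regression rgdpo = -51949.4 + 104180.6*hc."""
    return -51949.4 + 104180.6 * hc

print(round(predict_rgdpo(1.108736), 1))  # ≈ 63559.4, the sample mean of rgdpo
```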

The regression above shows that RGDP explains the behaviour of PPP. The R² is 0.321, which means RGDP explains about 32% of PPP. The F statistic of about 13.2622 indicates that SMC is statistically significant at the 1% level in the regression model and will influence PPP. The coefficient of PPP of about 1.7556 is highly significant, with the p-value of obtaining a t-value for this coefficient of as much as about 57.8731 at the 1% significance level. The coefficient of PPP indicates that when the SMC rate is zero, PPP will still increase by an average of 1.7555. The coefficient of SMC of about 0.000000361 is highly significant, with the p-value of obtaining a t-value for this coefficient of as much as about 3.6417 at the 1% significance level. The coefficient of SMC indicates that when SMC increases by 1%, PPP increases by an average of 3.608%. Thus, there is a positive relationship between PPP and RGDP.

The relationship between PPP and RGDP is positive because the coefficient of β0 is highly significant, with the p-value of obtaining a t-value for this coefficient of as much as about 57.8731, which is almost zero. The β1 coefficient is likewise highly significant, with the p-value of obtaining a t-value of about 3.6417. We therefore reject the null hypothesis and accept the alternative hypothesis for RGDP because the p-value is smaller than 0.05. Overall, the relationship between RGDP and PPP is positive.

5.0 Conclusion

We identified the correlation between SMC and RGDP. The more closely correlated SMC and RGDP are, the more severe the multicollinearity. If the multicollinearity were severe, we would need to drop one of the explanatory variables to fix it. Since we found that SMC and RGDP are 0.4, or 40%, correlated, we decided to drop either SMC or RGDP for explaining PPP.

The R square for SMC on PPP is 0.058 and the R square for RGDP on PPP is 0.32. Since the R square for SMC on PPP is lower than that for RGDP on PPP, we decided to drop SMC and chose RGDP to explain PPP.
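The selection rule used in this conclusion (keep the explanatory variable whose simple regression has the higher R²) can be sketched as:

```python
# R-squared values reported in the text for each simple regression on PPP.
r_squared = {"SMC": 0.058, "RGDP": 0.32}

# Keep the variable that explains more of the variation in PPP.
chosen = max(r_squared, key=r_squared.get)
print(chosen)  # → RGDP
```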
