Econometric Project - Linear Regression Model
ECONOMETRICS PROJECT
12/01/2015
TABLE OF CONTENTS
I. INTRODUCTION
II. HYPOTHESIS TESTING
III. LINEAR SIMPLE REGRESSION
IV. LINEAR MULTIPLE REGRESSION
V. CONCLUSION
VI. ANNEXES
I. INTRODUCTION
II. HYPOTHESIS TESTING
Hypothesis 1
Rumors on the internet claim that the U.S. monthly average
unemployment rate is 12%. To test this claim, a sample of 70 months was
gathered, from January 2007 to October 2012. The sample mean was 7.7%,
with a standard deviation of 1.96. At a 95% confidence level, decide
whether the claim is consistent with the sample evidence.
Step 1: Define the null and alternative hypotheses
H0: π = 12%; Null hypothesis
H1: π ≠ 12%; Alternative hypothesis; two-tailed test.
The null hypothesis states that the U.S. monthly average unemployment
rate is 12%.
The alternative hypothesis states that the U.S. monthly average
unemployment rate is different from 12%.
We are in the case of a two-tailed test.
Step 2: Establish the level of significance
Because the confidence level is 95%, the significance level is α = 5%.
Because we are using a two-sided test, the probability of committing a
Type I error is split between the two tails (2.5% each), so we will have
two cut-off values.
Step 3: Cut off values and Rejection Region
Cut-off values = ± 1.96
Rejection Region = ( - ∞; -1.96) U (1.96; ∞)
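Step 4: Compute z calculated.
$$ z_{calc} = \frac{\text{Sample evidence} - \text{Claim}}{\text{Standard Error}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{7.7 - 12}{1.96 / \sqrt{70}} = \frac{-4.3}{1.96 / 8.3666} = \frac{-4.3}{0.23426} \approx -18.36 $$
Because -18.36 lies in the rejection region (-∞; -1.96), the null
hypothesis is rejected at the 5% significance level: the sample evidence
does not support a 12% monthly average unemployment rate.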
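As a cross-check, the z statistic can be recomputed from the figures quoted in the problem statement (sample mean 7.7, claimed mean 12, standard deviation 1.96, n = 70). The sketch below is only an illustration in Python, not the Excel workflow used in the project; the function name `z_test_two_tailed` is an assumption introduced here.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_tailed(sample_mean, claimed_mean, std_dev, n, alpha=0.05):
    """Return (z, critical value, reject H0?) for a two-tailed z test."""
    standard_error = std_dev / sqrt(n)
    z = (sample_mean - claimed_mean) / standard_error
    # Two-tailed test: alpha is split equally between the two tails.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_crit, abs(z) > z_crit


# Inputs taken from the statement of Hypothesis 1.
z, z_crit, reject = z_test_two_tailed(7.7, 12.0, 1.96, 70)
print(f"z = {z:.2f}, cut-off = +/-{z_crit:.2f}, reject H0: {reject}")
```

With these inputs the numerator is 7.7 - 12 = -4.3 and the statistic is about -18.36, far inside the left rejection region.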
Hypothesis 2
The United States believes that the monthly average inflation rate over
the 70 months in the database, from January 2007 to October 2012, is
2.16%, with a standard deviation of 1.75. There are rumors that the U.S.
monthly average inflation rate is 5%. Does this result lend support to the
United States' opinion at the 5% level of significance?
Step 1: Define the null and alternative hypotheses.
H0: π=2.16%
H1: π>2.16%
The null hypothesis states that the U.S. monthly average inflation rate
calculated over 70 months is 2.16%.
The alternative hypothesis states that the U.S. monthly average inflation
rate calculated over 70 months is greater than 2.16%.
We are in the case of a right-tailed test.
Step 2: Set the significance level.
We set the significance level at 5%, so the confidence level is 95%.
Step 3: Establish, according to the significance level, the cut-off
values, the Rejection Region (RR), and the Acceptance Region (AR).
Cut-off value = +1.645
Rejection Region = (1.645; ∞)
Acceptance Region = (-∞; 1.645]
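The cut-off values used in both hypotheses come from the standard normal distribution. A minimal sketch of how they can be obtained is given below; the helper name `rejection_region` and its `tail` argument are assumptions made for this illustration.

```python
from statistics import NormalDist


def rejection_region(alpha, tail):
    """Return the rejection region of a z test as a list of (low, high) intervals."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    inf = float("inf")
    if tail == "right":
        return [(std_normal.inv_cdf(1 - alpha), inf)]
    if tail == "left":
        return [(-inf, std_normal.inv_cdf(alpha))]
    if tail == "two":
        cut = std_normal.inv_cdf(1 - alpha / 2)
        return [(-inf, -cut), (cut, inf)]
    raise ValueError("tail must be 'right', 'left' or 'two'")


right = rejection_region(0.05, "right")  # Hypothesis 2: cut-off near 1.645
two = rejection_region(0.05, "two")      # Hypothesis 1: cut-offs near +/-1.96
print(right, two)
```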
Step 4: Compute Zcalculated:
III. LINEAR SIMPLE REGRESSION
A country's GDP varies considerably because several factors influence
it. Using the available data, we will try to deduce the influence of the
independent variable (the monthly unemployment rate) on the dependent
variable (the monthly GDP).
The following data has been extracted from an Excel file. By analyzing
it, our purpose is to find the regression equation, which has the
following form: Y = α0 + α1*X + ε.
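For readers who want to see how α0 and α1 are estimated, here is a minimal ordinary least squares sketch. The project itself relies on Excel's regression tool; the function name `fit_simple_ols` and the toy data are assumptions used only for illustration and are not taken from the project's database.

```python
def fit_simple_ols(x, y):
    """Estimate (intercept, slope) of y = a0 + a1*x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data lying exactly on the line y = 1 + 2x (illustrative only).
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [3.0, 5.0, 7.0, 9.0]
alpha0, alpha1 = fit_simple_ols(x_demo, y_demo)
print(alpha0, alpha1)
```

Replacing the toy lists with the unemployment and GDP columns of the annex should give estimates comparable to the Excel output, up to rounding.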
The first table, called 'Summary Output', reports the coefficient of
correlation (Multiple R), the coefficient of determination (R Square) and
the adjusted coefficient of determination (Adjusted R Square).
Regression Statistics
Multiple R 0.290097746
R Square 0.084156702
Adjusted R Square 0.070688419
Standard Error 0.567157441
Observations 70
Table 1. Summary Output.
ANOVA
             df   SS            MS         F         Significance F
Regression    1   2.009942898   2.009943   6.24851   0.01484804
Residual     68   21.87339425   0.321668
Total        69   23.88333714
Table 2. ANOVA table for Simple Regression.
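The entries of the ANOVA table are linked by simple identities (MS = SS / df, F = MS Regression / MS Residual, R Square = SS Regression / SS Total, Standard Error = square root of MS Residual). The sketch below recomputes them from the sums of squares reported in the table; the function name `anova_summary` is an assumption made for this illustration.

```python
from math import sqrt


def anova_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Recompute the derived ANOVA quantities from sums of squares and df."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "ms_regression": ms_regression,
        "ms_residual": ms_residual,
        "f_stat": ms_regression / ms_residual,
        "r_squared": ss_regression / (ss_regression + ss_residual),
        "standard_error": sqrt(ms_residual),
    }


# Sums of squares and degrees of freedom from the simple regression table.
simple = anova_summary(2.009942898, 21.87339425, 1, 68)
for name, value in simple.items():
    print(f"{name}: {value:.6f}")
```

The recomputed values should agree with the table (F near 6.2485, R Square near 0.0842, Standard Error near 0.5672).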
The model is valid because Significance F (0.0148) is below 0.05.
This value is the probability of declaring the model valid when in reality
it is not, that is, the chance of wrongly rejecting the null hypothesis of
the validity test.
Defining the hypotheses:
H0: All predicted monthly GDP values have the same value.
H1: In 95% of cases, at least 2 estimated values of the monthly GDP are
different.
The value of this statistic is calculated by Excel and is found in the
table above: F = 6.24851.
The decision over the test:
The calculated value of the F test is compared with the theoretical one
for a significance level of 5%.
F calculated > F(5%; 1, 68) => it falls into the rejection region. The
chance of wrongly rejecting H0 is smaller than 5%, so there is enough
sample evidence to reject H0 and accept H1.
Because the chance of wrongly rejecting H0 (Significance F = 0.0148) is
smaller than α (0.05), we reject H0 and conclude that the model is valid.
Next, the analysis will be done for the following table's coefficients,
that will help us develop our model's equation:
Table 3
different from 0. It means that in 95% of cases we will correctly reject the
null hypothesis on the slope.
For a one-unit increase in the monthly unemployment rate, the monthly GDP
is expected to change by at least 0.0175 and at most 0.1560.
The P-value represents the computed risk of deciding that the slope is
significantly different from 0 when in reality it is 0. Because the P-value
is lower than 5%, there is a low risk of making a wrong decision. The
probability of committing a Type I error is 0.01 (P-value), which is less
than 5%.
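The coefficient table itself is not visible in this copy, but some of its entries can be approximately recovered from the quoted interval (0.0175; 0.1560) and from F = 6.24851. The sketch below assumes the interval is symmetric around the slope estimate and uses the identity t² = F, which holds only for a regression with a single regressor. All derived numbers are approximations, because the interval bounds are rounded.

```python
from math import sqrt

ci_lower, ci_upper = 0.0175, 0.1560  # 95% interval quoted in the text
f_stat = 6.24851                     # F from the simple regression ANOVA table

slope = (ci_lower + ci_upper) / 2         # centre of the symmetric interval
half_width = (ci_upper - ci_lower) / 2
t_stat = sqrt(f_stat)                     # t^2 = F with one regressor; slope > 0
std_error = slope / t_stat                # since t = slope / standard error
implied_t_crit = half_width / std_error   # critical t implied by the interval

print(f"slope ~ {slope:.5f}, t ~ {t_stat:.4f}, "
      f"SE ~ {std_error:.5f}, implied t critical ~ {implied_t_crit:.3f}")
```

The implied critical value comes out close to 2, which is what one would expect for a two-sided 95% interval with 68 degrees of freedom, so the quoted interval and the ANOVA table are mutually consistent.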
In the U.S. Unemployment Rate – Line Fit Plot chart, we can see a
typical heteroscedastic pattern: the variance of the errors increases as
the monthly unemployment rate increases, so the variance is not constant.
In practice, a different functional relation between the monthly GDP
and the monthly unemployment rate would be chosen.
[Figure: The U.S. Unemployment Rate % Line Fit Plot; series: GDP U.S, Predicted GDP U.S]
[Figure: GDP U.S plotted against Sample Percentile; trendline R² = 0.94]
IV. LINEAR MULTIPLE REGRESSION
variables contribute decisively to the change in the value of monthly GDP.
We will repeat the same analysis as for the simple regression.
As we can see from the image extracted from the Excel file, this is what
the regression model looks like. From the following available data, our
purpose is to find the regression equation, which has the following form:
Y = α0 + α1*X1 + α2*X2 + ε.
The first table we analyze is called 'Summary Output' and reports the
coefficient of correlation (Multiple R), the coefficient of determination
(R Square) and the adjusted coefficient of determination (Adjusted R
Square).
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.59552604
R Square 0.354651264
Adjusted R Square 0.335387123
Standard Error 0.479631099
Observations 70
Table 4. Summary output for Multiple Regression.
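Adjusted R Square penalizes R Square for the number of regressors, and Multiple R is the square root of R Square. The sketch below recomputes both from the reported R Square with n = 70 observations and k = 2 regressors; the function name `adjusted_r_squared` is an assumption made for this illustration.

```python
from math import sqrt


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


r2 = 0.354651264  # R Square from the summary output
adj_r2 = adjusted_r_squared(r2, n_obs=70, n_predictors=2)
multiple_r = sqrt(r2)
print(f"Adjusted R Square = {adj_r2:.6f}, Multiple R = {multiple_r:.6f}")
```

Both results should match the summary output (about 0.3354 and 0.5955).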
Next, we wish to test the validity of the model, by interpreting data from
the table called 'ANOVA table':
             df   SS            MS            F             Significance F
Regression    2   8.470255713   4.235127857   18.40991807   4.24732E-07
Residual     67   15.41308143   0.230045991
Total        69   23.88333714
Table 5. ANOVA method for Multiple Regression.
H0: All predicted monthly GDP values have the same value.
H1: In 95% of cases, at least 2 estimated values of the monthly GDP are
different.
The calculated value of the F test is compared with the theoretical one
for a significance level of 5%.
F calculated > F(5%; 2, 67) => it falls into the rejection region. The
chance of wrongly rejecting H0 is smaller than 5%, so there is enough
sample evidence to reject H0 and accept H1.
Because the chance of wrongly rejecting H0 (Significance F = 4.24732E-07,
which tends to 0) is smaller than α (0.05), we reject H0 and conclude that
the model is valid.
Using the first-degree equation Y = α0 + α1*X1 + α2*X2 + ε, the specific
model for the sample will be:
Monthly GDP = 12.75 + 0.18*Monthly Unemployment Rate
+ 0.20*Monthly Inflation Rate + ε.
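To illustrate how the estimated equation is used, the sketch below evaluates the fitted part of the model (without the error term) for one month. It assumes that the annex columns are month, GDP, unemployment rate and inflation rate, in that order, since the column header is not visible; the function name `predict_gdp` is also an assumption. The coefficients are the rounded ones quoted above, so the result is approximate.

```python
def predict_gdp(unemployment_rate, inflation_rate,
                intercept=12.75, b_unemployment=0.18, b_inflation=0.20):
    """Fitted monthly GDP from the rounded coefficients quoted in the text."""
    return (intercept
            + b_unemployment * unemployment_rate
            + b_inflation * inflation_rate)


# Oct-12 row of the annex, read as GDP 15.81, unemployment 7.9, inflation 2
# (column order is an assumption).
observed = 15.81
fitted = predict_gdp(7.9, 2.0)
residual = observed - fitted
print(f"fitted = {fitted:.3f}, residual = {residual:.3f}")
```

The difference between the observed and the fitted value is the residual for that month.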
Durbin-Watson statistic:
$$ DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{SS_{Residual}} = \frac{1.5196}{15.413} \approx 0.099 $$
0.099 belongs to (0; 0.95) => positive autocorrelation
According to the result obtained after applying the Durbin Watson test,
the model is affected by positive autocorrelation.
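The Durbin-Watson statistic divides the sum of squared differences between consecutive residuals by the sum of squared residuals (SS Residual). A minimal sketch follows; the residual lists are made-up toy values, not the residuals of the project's model, and the function name `durbin_watson` is an assumption.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Toy residuals (illustrative only).
persistent = [0.5, 0.6, 0.55, 0.4, -0.3, -0.45, -0.5]  # slowly changing sign
alternating = [1.0, -1.0, 1.0, -1.0]                   # sign flips every step

print(durbin_watson(persistent))   # well below 2: positive autocorrelation
print(durbin_watson(alternating))  # above 2: negative autocorrelation
```

Values near 2 suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation.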
[Figure: The U.S. Inflation Rate % Residual Plot; x-axis: The U.S. Inflation Rate %, y-axis: Residuals]
regression output (0.35) exceeds the R-squared of any independent variable
regressed on the other independent variables (0.26).
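The comparison above corresponds to the rule of thumb usually attributed to Klein: multicollinearity is considered a concern when the R-squared of an auxiliary regression (one independent variable regressed on the others) exceeds the R-squared of the main regression. The sketch below applies that comparison to the two figures quoted in the text. The variance inflation factor (VIF) is added as a related diagnostic that does not appear in the original analysis; function names and dictionary keys are assumptions.

```python
def klein_rule_flags(overall_r2, auxiliary_r2s):
    """Flag each regressor whose auxiliary R^2 exceeds the overall R^2."""
    return {name: r2 > overall_r2 for name, r2 in auxiliary_r2s.items()}


def variance_inflation_factor(auxiliary_r2):
    """VIF = 1 / (1 - R^2 of the auxiliary regression)."""
    return 1.0 / (1.0 - auxiliary_r2)


overall_r2 = 0.35  # R-squared of the multiple regression (rounded, from the text)
# With two regressors, both auxiliary regressions share the same R-squared.
auxiliary = {"unemployment_rate": 0.26, "inflation_rate": 0.26}

flags = klein_rule_flags(overall_r2, auxiliary)
vif = variance_inflation_factor(0.26)
print(flags, round(vif, 3))
```

No regressor is flagged here because 0.26 is below 0.35, and the VIF is close to 1, which points in the same direction.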
V. CONCLUSION
VI. ANNEXES
Mar-07 13.8 4.4 2.78
Apr-07 13.98 4.5 2.57
May-07 14.02 4.5 2.69
Jun-07 14.03 4.5 2.69
Jul-07 14.04 4.6 2.36
Aug-07 14.15 4.6 1.97
Sep-07 14.29 4.7 2.76
Oct-07 14.22 4.7 3.54
Nov-07 14.27 4.7 4.31
Dec-07 14.38 5 4.08
Jan-08 14.41 4.9 4.28
Feb-08 14.25 4.8 4.03
Mar-08 14.33 5.1 3.98
Apr-08 14.39 5 3.94
May-08 14.43 5.5 4.18
Jun-08 14.6 5.5 5.02
Jul-08 14.58 5.7 5.6
Aug-08 14.45 6.1 5.37
Sep-08 14.42 6.1 4.94
Oct-08 14.31 6.5 3.66
Nov-08 14.28 6.7 1.07
Dec-08 13.99 7.2 0.09
Jan-09 14.07 7.6 0.03
Feb-09 14.06 8.1 0.24
Mar-09 14.02 8.5 -0.38
Apr-09 14.04 8.9 -0.74
May-09 14.06 9.4 -1.28
Jun-09 13.86 9.5 -1.43
Jul-09 13.85 9.4 -2.1
Aug-09 13.94 9.7 -1.48
Sep-09 13.97 9.8 -1.29
Oct-09 14.14 10.1 -1.3
Nov-09 14.08 9.9 -0.2
Dec-09 14.04 9.9 1.8
Jan-10 14.2 9.7 2.7
Feb-10 14.27 9.8 2.6
Mar-10 14.36 9.8 2.1
Apr-10 14.6 9.9 2.3
May-10 14.48 9.6 2.2
Jun-10 14.45 9.4 2
Jul-10 14.54 9.5 1.1
Aug-10 14.55 9.6 1.2
Sep-10 14.64 9.5 1.1
Oct-10 14.71 9.5 1.1
Nov-10 14.69 9.8 1.2
Dec-10 14.81 9.3 1.1
Jan-11 14.72 9.1 1.5
Feb-11 14.74 9 1.6
Mar-11 14.99 8.9 2.1
Apr-11 15.04 9 2.7
May-11 15.04 9 3.2
Jun-11 14.93 9.1 3.6
Jul-11 15.13 9 3.6
Aug-11 15.21 9 3.6
Sep-11 15.14 9 3.8
Oct-11 15.38 8.9 3.9
Nov-11 15.29 8.6 3.5
Dec-11 15.29 8.5 3.4
Jan-12 15.42 8.3 3
Feb-12 15.55 8.3 2.9
Mar-12 15.46 8.2 2.9
Apr-12 15.54 8.1 2.7
May-12 15.59 8.2 2.3
Jun-12 15.65 8.2 1.7
Jul-12 15.81 8.2 1.7
Aug-12 15.77 8.1 1.4
Sep-12 15.86 7.8 1.7
Oct-12 15.81 7.9 2
Table 7. Database