
An Investigation into a Regression Model Using EViews
Seven assumptions of a good regression model

1. The regression line must fit the data well.
2. Most of the independent variables should be individually
significant in explaining the dependent variable.
3. The independent variables should be jointly significant in
influencing or explaining the dependent variable.
4. No serial or auto-correlation in the residual (u).
5. The variance of the residual (u) should be constant,
i.e., homoscedasticity.
6. The residual (u) should be normally distributed.
7. The signs of the coefficients should follow economic
theory, expectations, experience, or intuition.
(Assumption no. 1)

The regression line must fit the data well (goodness of fit)

***
Guideline: R² ≥ 60 percent (0.60) is better.
Goodness of fit

• The data must be fitted reasonably well.

• That is, the value of R² should be reasonably
high, more than 60 percent.

• The higher the R², the better the fit (see the sketch below).
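The slides do this check in EViews. Purely as an illustration, here is a minimal Python/statsmodels sketch of the same check on simulated data (the seed, data and coefficients are hypothetical, not the slide data):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.normal(size=(35, 3))                # hypothetical X1, X2, X3
    y = 1.0 + X @ np.array([0.5, -0.7, 0.2]) + rng.normal(scale=0.3, size=35)

    fit = sm.OLS(y, sm.add_constant(X)).fit()   # regress Y on a constant and X1-X3
    print(fit.rsquared)                         # rule of thumb from the slides: >= 0.60 is better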
(Assumption no. 2)

Most of the independent variables should
be individually significant

**
t-test

The t-test is done to find out whether each
independent variable (X1, X2 and X3 here) is
individually significant in influencing the
dependent variable, Y here.
Individual significance of the variables

• Most of the independent variables should be individually
significant.

• This can be checked using the t-test.

• If the p-value of the t-statistic is less than 5 percent (0.05), we
can reject the null and accept the alternative hypothesis.

• If we can reject the null hypothesis, it means that the particular
independent variable is significant in influencing the dependent
variable in the population.
For example:
Variables:
We have four variables: Y, X1, X2, X3.
Here Y is dependent and X1, X2, X3 are independent.

Population regression model

Y = B0 + B1X1 + B2X2 + B3X3 + u

Sample regression model

Y = b0 + b1X1 + b2X2 + b3X3 + e

Here, the sample regression line is an estimator of the population regression
line. Our target is to estimate the population regression line (which is almost
impossible, or too costly in time and money, to estimate directly) from the
sample regression line. For example, small b1, b2 and b3 are estimators of
big B1, B2 and B3.

Here, u is the residual of the population regression line while e is the residual
of the sample regression line; e is the estimator of u. We want to learn the
nature of u from e.
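The slides estimate this model in EViews; as a hedged illustration, the same estimation and residual recovery can be sketched in Python/statsmodels on hypothetical data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    X = sm.add_constant(rng.normal(size=(35, 3)))   # constant plus hypothetical X1, X2, X3
    y = X @ np.array([1.0, 0.5, -0.7, 0.2]) + rng.normal(scale=0.3, size=35)

    fit = sm.OLS(y, X).fit()          # b0, b1, b2, b3 estimate B0, B1, B2, B3
    e = y - fit.fittedvalues          # sample residuals e; identical to fit.resid
    print(fit.params)
    print(e[:5])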
Tips
If the sample is collected according to
statistical guidelines (proper random procedures),
then the sample regression line can be
representative of the population regression line.

Our target is to estimate the population
regression line from a sample regression line.
Setting hypotheses for the t-test: an example

Null hypothesis: B0 = 0
Alternative hypothesis: B0 ≠ 0

Null hypothesis: B1 = 0
Alternative hypothesis: B1 ≠ 0

Null hypothesis: B2 = 0
Alternative hypothesis: B2 ≠ 0

Null hypothesis: B3 = 0
Alternative hypothesis: B3 ≠ 0
Hypothesis setting
Null hypothesis: B1 = 0
Alternative hypothesis: B1 ≠ 0

• Since the direction of the alternative hypothesis is ≠, we assume
that there exists a relationship between the independent variable (X1
here) and the dependent variable (Y here) in the population. But this cannot
say whether the relationship is negative or positive. The direction ≠ gives a
two-tailed hypothesis.

Null hypothesis: B1 = 0
Alternative hypothesis: B1 < 0

• But if we set the hypothesis as above, then we assume that in the population
there exists a negative relationship between X1 and Y, as the direction in the
alternative hypothesis is <. This requires a one-tailed test (see the sketch below).
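Note that regression software (EViews, statsmodels) reports two-tailed p-values by default. Here is a small sketch, on hypothetical data, of turning the reported p-value into a one-tailed p-value for H1: B1 < 0:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x1 = rng.normal(size=35)
    y = 1.0 - 0.8 * x1 + rng.normal(scale=0.5, size=35)   # true slope is negative here

    fit = sm.OLS(y, sm.add_constant(x1)).fit()
    b1, p_two = fit.params[1], fit.pvalues[1]         # two-tailed p-value for B1 = 0 vs B1 != 0
    p_one = p_two / 2 if b1 < 0 else 1 - p_two / 2    # one-tailed p-value for B1 < 0
    print(b1, p_two, p_one)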
(Assumption no. 3)

Joint significance
The independent variables should be jointly significant
in explaining the dependent variable.
**
F-test

ANOVA
(Analysis of Variance)
Joint significance

• The independent variables should be jointly
significant in explaining Y. This can be checked
using the F-test.

• If the p-value of the F-statistic is less than 5 percent
(0.05), we can reject the null and accept the
alternative hypothesis.

• If we can reject the null hypothesis, it means that all
the independent variables (X1, X2, X3) jointly
influence the dependent variable, Y here.
Joint hypothesis setting

Null hypothesis H0: B1 = B2 = B3 = 0

Alternative H1: not all B's are simultaneously equal to
zero

Here B0 is dropped as it is not associated with any
variable (a sketch of this joint test follows).
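As an illustration on simulated data (not the slide output), statsmodels reports this overall F-test directly, and the restriction B1 = B2 = B3 = 0 can also be stated explicitly:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    X = sm.add_constant(rng.normal(size=(35, 3)))
    y = X @ np.array([1.0, 0.5, -0.7, 0.2]) + rng.normal(scale=0.3, size=35)

    fit = sm.OLS(y, X).fit()
    print(fit.fvalue, fit.f_pvalue)          # overall F-test: all slopes jointly zero
    print(fit.f_test("x1 = x2 = x3 = 0"))    # the same joint restriction, stated explicitly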
A few things

• Residual (u or e) = actual Y − estimated (fitted) Y

• "Residual", "error term" and "disturbance term" mean the same thing.

• So do "serial correlation" and "auto-correlation".
Residual analysis
(Assumption no. 4)

No serial or auto-correlation in the residual (u).

**
Breusch-Godfrey serial correlation LM test (BG test)
Serial correlation

• Serial correlation is a statistical term used to
describe the situation when the residual is
correlated with lagged values of itself.

• In other words, if the residuals are correlated, we
call this situation serial correlation, which is not
desirable.
How can serial correlation arise in the
model?
• Incorrect model specification,
• omitted variables,
• incorrect functional form,
• incorrectly transformed data.
Detection of serial correlation

• There are many ways to detect the existence of serial
correlation in the model.

• One approach to detecting serial correlation is the
Breusch-Godfrey serial correlation LM test (BG
test).
Hypothesis setting

Null hypothesis H0: no serial correlation (no
correlation between residuals ui and uj)

Alternative hypothesis H1: serial correlation
(correlation between residuals ui and uj)
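The slides run the BG test from the EViews menu; a minimal statsmodels sketch on hypothetical data (using 2 lags, as in the slide output later) is:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import acorr_breusch_godfrey

    rng = np.random.default_rng(4)
    X = sm.add_constant(rng.normal(size=(35, 3)))
    y = X @ np.array([1.0, 0.5, -0.7, 0.2]) + rng.normal(scale=0.3, size=35)

    fit = sm.OLS(y, X).fit()
    lm, lm_p, fstat, f_p = acorr_breusch_godfrey(fit, nlags=2)   # BG LM test with 2 lags
    print(lm_p)   # p > 0.05: cannot reject "no serial correlation"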
(Assumption no. 5)

The variance of the residual (u) is
constant (homoscedasticity)
***
• Heteroscedasticity is a term used to
describe the situation when the variance of the
residuals from a model is not constant.

• When the variance of the residuals is
constant, we call it homoscedasticity.
Homoscedasticity is desirable.

• If the residuals do not have constant variance, we
call it heteroscedasticity, which is not desirable.
How may heteroscedasticity arise?
– Incorrect model specification,
– incorrectly transformed data.
Hypothesis setting for heteroscedasticity

– Null hypothesis H0: homoscedasticity (the
variance of the residual (u) is constant)

– Alternative hypothesis H1: heteroscedasticity
(the variance of the residual (u) is not constant)
Detection of heteroscedasticity

• There are many tests available to detect
heteroscedasticity, for example (both are sketched below):

• Breusch-Pagan-Godfrey test
• White heteroscedasticity test
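Both tests are run from the EViews menu in the slides; as an illustration, the statsmodels equivalents on hypothetical data are:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan, het_white

    rng = np.random.default_rng(5)
    X = sm.add_constant(rng.normal(size=(35, 3)))
    y = X @ np.array([1.0, 0.5, -0.7, 0.2]) + rng.normal(scale=0.3, size=35)

    fit = sm.OLS(y, X).fit()
    bp_lm, bp_p, _, _ = het_breuschpagan(fit.resid, fit.model.exog)   # Breusch-Pagan LM test
    w_lm, w_p, _, _ = het_white(fit.resid, fit.model.exog)            # White test (with cross terms)
    print(bp_p, w_p)   # p > 0.05: cannot reject homoscedasticity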
(Assumption no. 6)

Residuals (u) should be normally distributed

**
Jarque-Bera statistic
Setting the hypothesis:

• Null hypothesis H0: normal distribution (the
residual (u) follows a normal distribution)
• Alternative hypothesis H1: not a normal distribution
(the residual (u) does not follow a normal distribution)

Detecting residual normality:

• Histogram normality test (perform the Jarque-Bera
statistic).

• If the p-value of the Jarque-Bera statistic is less than
5 percent (0.05), we can reject the null and accept the
alternative, that is, the residuals (u) are not normally
distributed (see the sketch below).
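As an illustration (hypothetical data, not the slide output), the Jarque-Bera check can be sketched with statsmodels:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import jarque_bera

    rng = np.random.default_rng(6)
    X = sm.add_constant(rng.normal(size=(35, 3)))
    y = X @ np.array([1.0, 0.5, -0.7, 0.2]) + rng.normal(scale=0.3, size=35)

    fit = sm.OLS(y, X).fit()
    jb, jb_p, skew, kurt = jarque_bera(fit.resid)   # JB statistic plus skewness and kurtosis
    print(jb, jb_p)   # p < 0.05 would reject normality of the residuals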
(Assumption no. 7)

The signs of the coefficients should follow
economic theory, expectations, the experiences
of others (literature review), or intuition.
An Empirical
Model Development
Our hypothetical model
Variables:

We have four variables: Y, X1, X2, X3.

Here Y is dependent and X1, X2 and X3 are independent.

Population regression model

Y = B0 + B1X1 + B2X2 + B3X3 + u

Sample regression line

Y = b0 + b1X1 + b2X2 + b3X3 + e
DATA

The sample size is 35, taken from the
population.
DATA
obs RESID X1 X2 X3 Y YF
1 0.417167 1700 1.2 20000 1.2 0.782833
2 -0.27926 1200 1.03 18000 0.65 0.929257
3 -0.17833 2100 1.2 19000 0.6 0.778327
4 0.231419 937.5 1 15163 1.2 0.968581
5 -0.33278 7343.3 0.97 21000 0.5 0.832781
6 0.139639 837.9 0.88 15329 1.2 1.060361
7 -0.01746 1648 0.91 16141 1 1.017457
8 -0.14573 739.1 1.2 21876 0.65 0.795733
9 0.480882 2100 0.89 17115 1.5 1.019118
10 -0.0297 274.6 0.23 23400 1.5 1.529701
11 -0.32756 231 0.87 16127 0.75 1.077562
12 0.016113 1879.1 0.94 17688 1 0.983887
13 -0.34631 1941 0.99 17340 0.6 0.946315
14 0.485755 2317.6 0.87 21000 1.5 1.014245
15 0.972181 471.4 0.93 16000 2 1.027819
16 -0.22757 678 0.79 16321 0.9 1.127572
17 -0.2685 7632.9 0.93 18027 0.6 0.868503
18 -0.41902 510.1 0.93 18023 0.6 1.019018
19 -0.4259 630.6 0.93 15634 0.6 1.0259
20 0.076632 1500 1.03 17886 1 0.923368
DATA
obs RESID X1 X2 X3 Y YF
21 -0.37349949 1618.3 1.1 16537 0.5 0.873499
22 0.183799347 2009.8 0.96 17655 1.15 0.966201
23 0.195832507 1562.4 0.96 23100 1.15 0.954167
24 -0.46138707 1200 0.88 13130 0.6 1.061387
25 0.309577968 13103 1 20513 1 0.690422
26 -0.21073204 3739.6 0.92 17409 0.75 0.960732
27 -0.08351157 324 1.2 14525 0.75 0.833512
28 -0.02060854 2385.8 0.89 15207 1 1.020609
29 0.14577644 1698.5 0.93 15409 1.15 1.004224
30 -0.06000649 544 0.87 18900 1 1.060006
31 -0.50510204 1769.1 0.45 17677 0.85 1.355102
32 0.870370225 1065 0.65 15092 2.1 1.22963
33 0.274774344 803.1 0.98 18014 1.25 0.975226
34 -0.1496757 1616.7 1 28988 0.75 0.899676
35 0.062732149 210 1.2 21786 0.87 0.807268

Y, X1, X2 and X3 are actual sample data collected from the population.

YF = estimated, forecasted or predicted Y
RESID (e) = residuals of the sample regression line; that is, e = actual Y − predicted (fitted) Y
Regression Output
Dependent Variable: Y   Method: Least Squares
Included observations: 35

Variable Coefficient Std. Error t-Statistic Prob.
C 1.800 0.4836 3.72 0.0008
X1 -2.11E-05 2.58E-05 -0.820 0.4183
X2 -0.7527 0.3319 -2.267 0.0305
X3 -3.95E-06 2.08E-05 -0.189 0.8509

R-squared 0.1684   Mean dependent var 0.9834
Adjusted R-squared 0.087   S.D. dependent var 0.3912
S.E. of regression 0.3736   Akaike info criterion 0.9762
Sum squared resid 4.328   Schwarz criterion 1.15
Log likelihood -13.08   F-statistic 2.093
Durbin-Watson stat 2.184   Prob(F-statistic) 0.1213
A few things

• t-statistic = coefficient / standard error (see the sketch below)

• The t-statistic (in absolute value) and the p-value always
move in opposite directions.
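A quick sketch on hypothetical data confirming this identity with statsmodels:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    X = sm.add_constant(rng.normal(size=(35, 3)))
    y = X @ np.array([1.0, 0.5, -0.7, 0.2]) + rng.normal(scale=0.3, size=35)

    fit = sm.OLS(y, X).fit()
    print(fit.tvalues)             # reported t-statistics
    print(fit.params / fit.bse)    # identical: coefficient divided by standard error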
Output
Actual Y, fitted Y, residuals and their plot
obs Actual Fitted Residual Residual Plot
1 1.2 0.782832991 0.417167009 | . | .* |
2 0.65 0.92925722 -0.27925722 | .* | . |
3 0.6 0.778327375 -0.178327375 | .*| . |
4 1.2 0.96858115 0.23141885 | . | *. |
5 0.5 0.8327808 -0.3327808 | * | . |
6 1.2 1.060360549 0.139639451 | . |*. |
7 1 1.017457055 -0.017457055 | . * . |
8 0.65 0.79573323 -0.14573323 | .*| . |
9 1.5 1.019118163 0.480881837 | . | .* |
10 1.5 1.529701243 -0.029701243 | . * . |
11 0.75 1.077562408 -0.327562408 | * | . |
12 1 0.983887019 0.016112981 | . * . |
13 0.6 0.946314864 -0.346314864 | * | . |
14 1.5 1.014244939 0.485755061 | . | .* |
15 2 1.027819105 0.972180895 | . | . *
16 0.9 1.127572088 -0.227572088 | .* | . |
17 0.6 0.868503447 -0.268503447 | .* | . |
18 0.6 1.019018495 -0.419018495 | *. | . |
19 0.6 1.025899595 -0.425899595 | *. | . |
20 1 0.923368304 0.076631696 | . |* . |
Output
Actual Y, fitted Y, residuals and their plot (continued)
obs Actual Fitted Residual Residual Plot
21 0.5 0.873499486 -0.373499486 | * | . |
22 1.15 0.966200653 0.183799347 | . |*. |
23 1.15 0.954167493 0.195832507 | . |*. |
24 0.6 1.061387074 -0.461387074 | *. | . |
25 1 0.690422032 0.309577968 | . | * |
26 0.75 0.960732042 -0.210732042 | .*| . |
27 0.75 0.833511567 -0.083511567 | . *| . |
28 1 1.020608541 -0.020608541 | . * . |
29 1.15 1.00422356 0.14577644 | . |*. |
30 1 1.060006494 -0.060006494 | . *| . |
31 0.85 1.355102042 -0.505102042 | *. | . |
32 2.1 1.229629775 0.870370225 | . | . *|
33 1.25 0.975225656 0.274774344 | . | *. |
34 0.75 0.899675696 -0.149675696 | .*| . |
35 0.87 0.807267851 0.062732149 | . |* . |
[Figure: actual Y, fitted Y and residuals plotted against observation number (1-35).]

[Figure: sample residuals (Y residuals) plotted against observation number (1-35).]
(Assumption no. 1)
Goodness of fit

R-squared: 0.1684

• This means that 16.84 percent of the variation in Y can
be explained jointly by the three independent
variables X1, X2 and X3. The remaining
83.16 percent of the variation in Y is explained
by the residual, i.e., by variables other than X1, X2
and X3.
(Assumption no. 2)
Independent variable significance
• For X1, p-value: 0.4183

Null hypothesis: B1 = 0
Alternative hypothesis: B1 ≠ 0

Since the p-value is more than 5 percent (0.05), we cannot reject the null, meaning
we accept the null, B1 = 0. In other words, X1 does not influence Y in the
population.

• For X2, p-value: 0.0305 (3.05 percent)

Null hypothesis: B2 = 0
Alternative hypothesis: B2 ≠ 0

Since the p-value (0.0305) is less than 5 percent, we can reject the null and
accept the alternative hypothesis. It means that variable X2 influences variable Y in
the population, but we cannot say in which direction, as the alternative hypothesis is ≠.

• For X3, p-value: 0.8509. So X3 is not significant in explaining Y.
(Assumption no. 3)

Joint hypothesis: F-statistic

F-statistic: 2.093 and Prob. 0.1213

Null hypothesis H0: B1 = B2 = B3 = 0

Alternative H1: not all B's are simultaneously equal to zero

Since the p-value is more than 5 percent (here 12.13 percent),
we cannot reject the null. In other words, the
independent variables (here X1, X2 and X3) cannot jointly
explain or influence Y in the population.
Assumption no. 4
Serial or auto-correlation
Breusch-Godfrey Serial Correlation LM Test:

F-statistic 1.01   Prob. F(2,29) 0.3751
Obs*R-squared 2.288   Prob. Chi-Square(2) 0.3185

Null hypothesis: no serial correlation in the residuals (u)
Alternative: there is serial correlation in the residuals (u)

Since the p-value (0.3185) of Obs*R-squared is more than 5 percent (p > 0.05), we cannot
reject the null hypothesis, meaning that the residuals (u) are not serially correlated, which is desirable.
Assumption no. 5
Heteroscedasticity test

Breusch-Pagan-Godfrey test (B-P-G test)

F-statistic 1.84   Probability 0.3316
Obs*R-squared 3.600   Probability 0.3080

Null hypothesis: residuals (u) are homoscedastic
Alternative: residuals (u) are heteroscedastic

The p-value of Obs*R-squared shows that we cannot reject the null. So the residuals
do have constant variance, which is desirable, meaning that the residuals are homoscedastic.

The B-P-G test is normally done for large samples.
Assumption no. 6
Residual (u) normality test

[Figure: histogram of the residuals with summary statistics]
Series: Residuals; Sample 1 35; Observations 35
Mean 1.15e-16; Median -0.029701
Maximum 0.972181; Minimum -0.505102
Std. Dev. 0.356788; Skewness 0.880996; Kurtosis 3.508042
Jarque-Bera 4.903965; Probability 0.086123

Null hypothesis: residuals (u) are normally distributed
Alternative: not normally distributed

The Jarque-Bera statistic is 4.904 and the corresponding p-value is 0.0861. Since the p-value
is more than 5 percent, we accept the null, meaning that the population residual (u) is
normally distributed, which fulfills the assumption of a good regression line.
Assumption no. 7
Signs of the coefficients

Our sample model:
Y = b0 + b1X1 + b2X2 + b3X3 + e

The signs we expected after estimation are as follows:
Y = b0 − b1X1 + b2X2 − b3X3

Decision: the outcome did not match our expectations.
So assumption 7 is violated.
Evaluation of our model on the basis of the
assumptions
1. R-squared is very low (bad sign).
2. Only X2 is significant out of the three variables (bad sign).
3. The independent variables cannot jointly influence Y
(bad sign).
4. There is no serial correlation (good sign).
5. There is no heteroscedasticity problem (good sign).
6. The residuals are normally distributed (good sign).
7. The signs are not as expected (bad sign).
