Econometrics For Accounting Students
METHODOLOGY OF ECONOMETRICS
Introduction
Theories suggest many relationships among variables. For instance, in microeconomics we learn about various functional relationships, and in macroeconomics we study the 'investment function' and the 'consumption function'.
ECONOMETRICS
Econometrics is divided into theoretical econometrics and applied econometrics. Applied econometrics covers:
i) Analysis
ii) Forecasting, i.e., using the numerical estimates of the coefficients in order to forecast the future values of economic magnitudes.
Formulating a model (cause and effect): C = f(Inc), written as Ct = β1 + β2Inct + ut
Testing the hypothesis: H0: β2 > 0, i.e., whether or not the relationship is positive.
Regression models may be simple (one explanatory variable) or multiple (several explanatory variables), and each may be linear or non-linear.
Two Variable Linear Regression Model
A stochastic relationship with one explanatory variable is called a simple linear regression model.
The true relationship which connects the variables involved has two parts: a deterministic part and a part represented by the random error term ui:
Yi = α + βXi + ui
This equation is called the population regression function (PRF) because Y and X represent their respective population values, and α and β are called the true population parameters.
The parameters estimated from the sample values of Y and X are called the estimators of the true parameters and are symbolized as α̂ and β̂.
E(ui uj) = 0 for i ≠ j, i.e., the disturbance terms are uncorrelated with one another.
Methods of Estimation
• Specifying the model and stating its underlying
assumptions are the first stage of any econometric
application.
• The next step is the estimation of the numerical values of the
parameters of economic relationships.
• The parameters of the simple linear regression model can be
estimated by various methods.
• The commonly used methods are:
– Ordinary least square method (OLS)
– Maximum likelihood method (MLM)
– Method of moments (MM)
1. Ordinary Least Square Estimation (OLS)
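The OLS formulas are not reproduced above, but as a rough illustration the following Python sketch (using made-up data; the numbers and names are purely illustrative) computes the OLS estimates of the two-variable model Yi = α + βXi + ui from the usual formulas β̂ = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and α̂ = Ȳ − β̂X̄:

import numpy as np

# Hypothetical sample data (for illustration only)
X = np.array([2.0, 3.0, 5.0, 7.0, 9.0, 11.0])   # explanatory variable
Y = np.array([4.1, 5.0, 7.2, 8.9, 11.1, 12.8])  # dependent variable

x_bar, y_bar = X.mean(), Y.mean()

# OLS slope: sum of cross deviations over sum of squared deviations of X
beta_hat = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
# OLS intercept: alpha_hat = Y_bar - beta_hat * X_bar
alpha_hat = y_bar - beta_hat * x_bar

print("beta_hat  =", round(beta_hat, 4))
print("alpha_hat =", round(alpha_hat, 4))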
2. Statistical Properties of OLS Estimators
The statistical properties of OLS are based on classical regression
assumptions.
We would like the OLS estimates to be as close as possible to the values of the true population parameters, compared with the estimates produced by other econometric methods.
The 'closeness' of the OLS estimate to the true population parameter is measured by the mean and variance of the sampling distribution of the estimates obtained by the different econometric methods.
We assume that we draw a very large number of samples, each of size n; we compute the estimate from each sample for each econometric method, and we form their sampling distributions.
We then compare the expected values and the variances of these distributions and choose the estimator whose distribution is concentrated as closely as possible around the true parameter.
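A small simulation sketch of this idea (the true parameter values, sample size, and number of replications below are arbitrary choices for illustration): we repeatedly draw samples, compute the OLS slope estimate from each, and then look at the mean and variance of its sampling distribution.

import numpy as np

rng = np.random.default_rng(10)
true_alpha, true_beta = 2.0, 0.8      # hypothetical true population parameters
n, n_samples = 50, 5000               # sample size and number of repeated samples

beta_hats = np.empty(n_samples)
for s in range(n_samples):
    X = rng.uniform(0, 10, n)
    u = rng.normal(0, 1, n)
    Y = true_alpha + true_beta * X + u
    # OLS slope estimate from this sample
    beta_hats[s] = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)

print("Mean of beta_hat over samples:", round(beta_hats.mean(), 4))  # close to the true 0.8
print("Variance of beta_hat:", round(beta_hats.var(), 6))            # concentration around the true value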
Gauss-Markov Theorem
Given the classical regression assumptions, the OLS estimators have the minimum variance in the class of linear unbiased estimators, i.e., the OLS estimators are BLUE (best linear unbiased estimators). The theorem provides the theoretical justification for the popularity of OLS.
An estimator is called BLUE if it is:
1. Linear: the parameter estimate is a linear function of the dependent variable.
2. Unbiased: its expected value is equal to the true value of the parameter.
3. Minimum variance: it has the minimum variance in the class of linear unbiased estimators.
An unbiased estimator with the least variance is known as a best (efficient) estimator.
Hypothesis Testing
Derivation of the F (ANOVA) test for the regression (a test of the coefficient of determination, R²)
The largest value that R2 can assume is 1 (in which case all
observations fall on the regression line), and the smallest it
can assume is zero.
A low value of R² is an indication that:
– X is a poor explanatory variable, in the sense that variation in X leaves Y unaffected, or
– while X is a relevant variable, its influence on Y is weak compared with some other variables that are omitted from the regression equation, or
– the regression equation is mis-specified (for example, an exponential relationship might be more appropriate).
Thus, a small value of R² casts doubt on the usefulness of the regression equation.
We do not, however, pass final judgment on the equation until it has
been subjected to an objective statistical test.
Such a test is accomplished by means of ANOVA, which tests the significance of R² (i.e., the adequacy of the linear regression model).
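As a rough sketch of that test (the sample size, number of estimated parameters, and R² below are made-up illustrative values), the F statistic can be computed as F = [R²/(k−1)] / [(1−R²)/(n−k)] and compared with the critical F value:

import scipy.stats as st

n, k = 30, 2         # hypothetical sample size and number of estimated parameters
r_squared = 0.85     # hypothetical coefficient of determination

# F statistic for the overall significance of the regression
F = (r_squared / (k - 1)) / ((1 - r_squared) / (n - k))

# critical value at the 5% level with (k-1, n-k) degrees of freedom
F_crit = st.f.ppf(0.95, k - 1, n - k)

print("F =", round(F, 2), " F critical =", round(F_crit, 2))
print("Reject H0: R-squared is significant" if F > F_crit else "Do not reject H0")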
Example of regression output (quantity demanded QD regressed on Price):

------------------------------------------------------------------------------
          QD |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       Price |  -1.263914    .076193   -16.59   0.000    -1.423989   -1.103838
       _cons |   11.09874   .4233685    26.22   0.000     10.20928    11.98821
------------------------------------------------------------------------------
B. Tests of individual significance (testing the significance of the OLS parameters)
• Since sampling errors are inevitable in all estimates, it is necessary to apply tests of significance in order to:
measure the size of the error, and
determine the degree of confidence in the validity of these estimates.
• This can be done by using various tests. The most common ones are:
i) Standard error test
ii) Student's t-test
iii) Confidence interval test
• All of these testing procedures lead to the same conclusion.
Standard error test
• This test helps us decide whether the estimates are significantly different from zero.
• Formally, we test the null hypothesis H0: βi = 0 against the alternative H1: βi ≠ 0.
If SE(β̂i) > ½β̂i, accept the null hypothesis and reject the alternative hypothesis; we conclude that β̂i is statistically insignificant.
If SE(β̂i) < ½β̂i, reject the null hypothesis and accept the alternative hypothesis; we conclude that β̂i is statistically significant.
Numerical example: Suppose that from a sample of size n = 30 we estimate the following supply function:
Q = 120 + 0.6p + ei
SE:      (1.7)   (0.025)
Test the significance of the slope parameter at the 5% level of significance using the standard error test.
Solution: half of the slope estimate is 0.6/2 = 0.3. Since SE(β̂) = 0.025 < 0.3, we reject the null hypothesis and conclude that the slope parameter is statistically significant.
Student’s t-test
• Like the standard error test, this test is also important to test the
significance of the parameters.
• We can compute the t-value of an OLS estimate as t* = (β̂ − β) / SE(β̂), which under H0: β = 0 becomes t* = β̂ / SE(β̂).
Since we have two parameters in the simple linear regression model with an intercept different from zero, our degrees of freedom are n − 2.
Like the standard error test, we formally test the hypothesis H0: β = 0 against the alternative H1: β ≠ 0 for the slope parameter, and H0: α = 0 against the alternative H1: α ≠ 0 for the intercept.
Step 4: Obtain the critical value of t, called tc, at the α/2 level of significance and n − 2 degrees of freedom for a two-tailed test.
Step 5: Compare t* (the computed value of t) with tc (the critical value of t).
If |t*| > tc, reject H0 and accept H1; the conclusion is that β̂ is statistically significant.
Numerical example: the values in brackets are standard errors. We want to test the null hypothesis H0: βi = 0 against the alternative H1: βi ≠ 0 for the slope parameter:
t* = (β̂ − 0) / SE(β̂) = 0.70 / 0.21 ≈ 3.3
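A minimal Python sketch of this decision rule (the estimate, standard error, sample size, and significance level are illustrative assumptions rather than values from a real data set):

import scipy.stats as st

beta_hat = 0.70   # hypothetical slope estimate
se_beta = 0.21    # hypothetical standard error
n = 30            # hypothetical sample size
alpha = 0.05      # significance level

t_star = (beta_hat - 0) / se_beta            # computed t-value under H0: beta = 0
t_crit = st.t.ppf(1 - alpha / 2, df=n - 2)   # two-tailed critical value

print("t* =", round(t_star, 2), " t critical =", round(t_crit, 2))
print("beta_hat is statistically significant" if abs(t_star) > t_crit
      else "beta_hat is statistically insignificant")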
Confidence interval test
• In order to determine how close the estimate is to the true parameter, we must construct a confidence interval for the true parameter; in other words, we must establish limiting values around the estimate within which the true parameter is expected to lie with a certain 'degree of confidence'.
• We choose a probability in advance and refer to it as the confidence level (confidence coefficient).
• It is customary in econometrics to choose the 95% confidence level,
• i.e., the confidence limits computed from the sample would include the true population parameter in 95% of the cases; in the remaining 5% of the cases the population parameter will fall outside the confidence interval.
The limits within which the true β lies at the (1 − α)100% degree of confidence are: β̂ ± tα/2 · SE(β̂), where tα/2 is the critical value of t at the α/2 level of significance with n − 2 degrees of freedom.
We test H0: β = 0 against H1: β ≠ 0.
Decision rule: If the hypothesized value of β in the null hypothesis is within the confidence interval, accept H0 and reject H1; the implication is that β̂ is statistically insignificant. If the hypothesized value of β in the null hypothesis is outside the limits, reject H0 and accept H1; this indicates that β̂ is statistically significant.
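A short sketch of the confidence interval test, reusing the illustrative numbers from the t-test example above (all values are assumptions for demonstration only):

import scipy.stats as st

beta_hat, se_beta, n, alpha = 0.70, 0.21, 30, 0.05   # hypothetical values

t_crit = st.t.ppf(1 - alpha / 2, df=n - 2)
lower = beta_hat - t_crit * se_beta
upper = beta_hat + t_crit * se_beta

print(f"95% confidence interval for beta: [{lower:.3f}, {upper:.3f}]")
# if 0 lies outside this interval, reject H0: beta = 0 (beta_hat is significant)
print("significant" if not (lower <= 0 <= upper) else "insignificant")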
Regression Analysis: Multiple Regression [Cross-Sectional Data]
Simple Regression
A statistical model that utilizes one quantitative or qualitative independent variable X to predict the quantitative dependent variable Y;
i.e., it considers the relation between a single explanatory variable and the response variable:
Yi = β0 + β1Xi + εi
Multiple Regression
A statistical model that utilizes two or more quantitative and qualitative explanatory variables (X1, ..., Xk) to predict a quantitative dependent variable Y.
Caution: there must be at least two explanatory variables.
Multiple regression simultaneously considers the influence of several explanatory variables on the response variable Y:
Yi = β0 + β1X1i + β2X2i + ... + βkXki + εi
Simple vs. Multiple
Simple regression:
• β represents the unit change in Y per unit change in X.
• Does not take into account any other variable besides the single independent variable.
• r²: proportion of variation in Y explained by the single X.
Multiple regression:
• βi represents the unit change in Y per unit change in Xi.
• Takes into account the effect of other independent variables.
• R²: proportion of variation in Y explained jointly by all the X variables.
Common specifications include: linear, dummy variable, interaction, polynomial, square root, log, reciprocal, and exponential models.
Building the Multiple Linear Regression Model
Yi = β0 + β1X1i + β2X2i + ... + βkXki + εi
• The coefficients of the multiple regression model are
estimated using sample data with k independent
variables
Ŷi = b0 + b1X1i + b2X2i + ... + bkXki
where Ŷi is the estimated (or predicted) value of Y, b0 is the estimated intercept, and b1, ..., bk are the estimated slope coefficients.
• Interpretation of the Slopes:
– b1 = the change in the mean of Y per unit change in X1, taking into account the effect of X2 (or net of X2).
– b0 = the Y intercept; its interpretation is the same as in simple regression.
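As an illustrative sketch (the data below are simulated and the variable names are hypothetical), such a model can be estimated with the statsmodels library in Python; the fitted coefficients correspond to the b's described above:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X1 = rng.normal(10, 2, n)           # hypothetical explanatory variable 1
X2 = rng.normal(5, 1, n)            # hypothetical explanatory variable 2
u = rng.normal(0, 1, n)             # disturbance term
Y = 2 + 1.5 * X1 - 0.8 * X2 + u     # simulated dependent variable

X = sm.add_constant(np.column_stack([X1, X2]))   # adds the intercept (b0) column
model = sm.OLS(Y, X).fit()

print(model.params)      # b0, b1, b2
print(model.rsquared)    # R-squared
print(model.summary())   # full output: coefficients, std. errors, t-values, p-values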
ASSUMPTIONS
• Linear regression model: The regression model is linear in the
parameters, though it may or may not be linear in variables.
• The X variable is independent of the error term. This means
that we require zero covariance between ui and each X variables.
cov(ui, X1i) = cov(ui, X2i) = ... = cov(ui, Xki) = 0
• Zero mean value of disturbance ui. Given the value of Xi, the
mean, or the expected value of the random disturbance term ui is
zero.
E(ui)= 0 for each i
• Homoscedasticity or constant variance of ui . This implies that
the variance of the error term is the same, regardless of the value
of X.
var (ui) = σ2
• No auto-correlation between the disturbance terms.
cov(ui, uj) = 0 for i ≠ j
This implies that the observations are sampled independently.
• The number of observations n must be greater than the number of parameters to be estimated.
• There must be variation in the values of the X variables.
Technically, var(X) must be a positive number.
• No exact collinearity between the X variables. i.e.
No strong multicollinearity: No exact linear relationship exists
between any of the explanatory variables.
Are Individual Variables Significant?
• Use t-tests of individual variable slopes
• Shows if there is a linear relationship between the variable Xi
and Y; Hypotheses:
• H0: βi = 0 (no linear relationship)
• H1: βi ≠ 0 (linear relationship does exist between Xi and Y)
• Test statistic: t* = (bi − 0) / Sbi
• Confidence interval: bi ± tc · Sbi
Assumptions and Procedures to Conduct Multiple
Linear Regression
When you choose to analyze your data using multiple
regression, make sure that the data you want to analyze can
actually be analyzed using multiple regression.
It is only appropriate to use multiple regression if your
data "passes" eight assumptions that are required for
multiple regression to give you a valid result.
Let's take a look at these eight assumptions:
Assumption #1:
Your dependent variable should be measured on a continuous scale.
Assumption #2:
You should have two or more IVs, which can be either
continuous or categorical or dummy.
Assumption #3:
You should have independence of residuals, which you can
easily check using the Durbin-Watson statistic.
Assumption #4:
There needs to be a linear relationship between the DV and each of your independent variables.
Assumption #5:
Your data needs to show homoscedasticity, which is where the
variances along the line of best fit remain similar as you move
along the line.
Assumption #6:
Your data must not show multi-collinearity, which occurs
when you have two or more IVs that are highly correlated with
each other.
Assumption #7:
There should be no significant outliers.
Outliers can distort the output that the statistical software produces and reduce the predictive accuracy of your results as well as their statistical significance.
Assumption #8:
Finally, you need to check that the residuals (errors) are
normally distributed.
You can check assumptions #3, #4, #5, #6, #7 and #8 using
SPSS.
Assumptions #1 and #2 should be checked first, before
moving onto assumptions #3, #4, #5, #6, #7 and #8.
Just remember that if you do not run the statistical tests on
these assumptions correctly, the results you get when
running multiple regression might not be valid.
Given the assumptions and data on Y and the set of IVs (X1, ..., Xk), the following are suggested procedures/steps to conduct multiple linear regression:
1. Select variables that you believe are linearly related to the
dependent variable.
2. Use statistical software to generate the coefficients and the statistics used to assess the model.
3. Diagnose violations of required conditions/ assumptions.
If there are problems, attempt to remedy them.
4. Assess the model's fit.
5. If we are satisfied with the model's fit and the required conditions are met, we can test and interpret the coefficients.
6. We use the model to predict values of the DV.
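Continuing in the same illustrative spirit (hypothetical variable names and invented data), steps 2 to 6 can be sketched in Python with statsmodels; the final lines show step 6, predicting the DV for new values of the IVs:

import pandas as pd
import statsmodels.formula.api as smf

# hypothetical data set (names and numbers are for illustration only)
df = pd.DataFrame({
    "y":  [10.2, 12.1, 14.3, 15.8, 18.0, 20.1],
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.0, 1.5, 2.5, 2.0, 3.0, 2.5],
})

model = smf.ols("y ~ x1 + x2", data=df).fit()   # steps 1-2: specify and estimate
print(model.rsquared, model.pvalues)            # steps 4-5: assess fit, test coefficients

# step 6: predict the DV for new values of the IVs
new_data = pd.DataFrame({"x1": [7.0], "x2": [2.2]})
print("Predicted y:", round(float(model.predict(new_data).iloc[0]), 3))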
Dummy independent Variables
Describing Qualitative Information
• In regression analysis the dependent variable can be
influenced by variables that are essentially qualitative in
nature,
such as sex, race, color, religion, nationality, geographical
region, political upheavals, and party affiliation.
For example, D2 = 1 if the person has a high school education, and 0 otherwise; D3 = 1 if the person has a college education, and 0 otherwise.
The 'less than high school education' category is taken as the base category.
Therefore, the intercept will reflect the intercept for this category.
• The mean health care expenditure functions for the three levels of education, namely less than high school, high school, and college, are:
E(Yi | D2=0, D3=0, Xi) = β1 + β4Xi
E(Yi | D2=1, D3=0, Xi) = (β1 + β2) + β4Xi
E(Yi | D2=0, D3=1, Xi) = (β1 + β3) + β4Xi
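A hedged sketch of how such dummy variables might be set up in Python (the expenditure, income, and education values below are invented; statsmodels' C() notation creates the dummies and drops the chosen base category automatically):

import pandas as pd
import statsmodels.formula.api as smf

# hypothetical data: health care expenditure, income, and education level
df = pd.DataFrame({
    "expend": [320, 410, 500, 610, 720, 830, 560, 640, 900, 980],
    "income": [20, 25, 30, 35, 40, 45, 32, 38, 50, 55],
    "educ":   ["less_hs", "less_hs", "hs", "hs", "college", "college",
               "hs", "less_hs", "college", "hs"],
})

# 'less_hs' is used as the base (reference) category
model = smf.ols("expend ~ income + C(educ, Treatment(reference='less_hs'))",
                data=df).fit()
print(model.params)   # intercept = base category; dummy coefficients = differential intercepts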
Log-level model: log(Y) = β0 + β1X + u; 100·β1 is approximately the percentage change in Y for a one-unit change in X.
Level-log model: Y = β0 + β1log(X) + u; it arises less often in practice.
HETEROSCEDASTICITY
Introduction
In both the simple and multiple regression models, we assumed that the disturbance terms have a constant variance (homoscedasticity). When this assumption is violated, the disturbances are said to be heteroscedastic.
var(β̂) under heteroscedasticity will be greater than its variance under homoscedasticity.
As such, the t-value associated with it will be underestimated, which might lead to the conclusion that β̂ is statistically insignificant (which in fact may not be true).
Moreover, our inferences and predictions about the population coefficients would be incorrect.
Detecting Heteroscedasticity
There are two methods of testing or detecting heteroscedasticity:
i. Informal method
ii. Formal method
i. Informal method
It is called informal because it does not undertake formal testing procedures such as the χ²-test, the F-test, and the like.
To see whether a given data set exhibits heteroscedasticity or not, we look at whether there is a systematic relation between the squared residuals e²i and the predicted values of Y (or Xi).
In figure (a), we see no systematic pattern between the two variables, suggesting that no heteroscedasticity is present in the data.
Figures (b) to (e), however, exhibit definite patterns.
ii. Formal methods
There are several formal methods of testing for heteroscedasticity, all based on formal testing procedures.
Some of the major ways of detecting heteroscedasticity are:
a. White test
b. Breusch – Pagan Test
c. Goldfeld–Quandt test
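As a rough illustration (simulated data with an error variance that grows with the regressor; the variable names are hypothetical), the Breusch–Pagan and White tests are available in statsmodels, and a small p-value suggests heteroscedasticity:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1, 10, n)
u = rng.normal(0, x)           # error variance grows with x -> heteroscedastic
y = 3 + 2 * x + u

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

bp_stat, bp_pval, _, _ = het_breuschpagan(res.resid, X)
w_stat, w_pval, _, _ = het_white(res.resid, X)
print("Breusch-Pagan p-value:", round(bp_pval, 4))
print("White test p-value:   ", round(w_pval, 4))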
Remedial measures for the problems of heteroscedasticity
1. Transforming the model
2. Using robust (heteroscedasticity-consistent) standard errors when estimating the regression
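A brief sketch of the second remedy with statsmodels (again using simulated data; HC1 is one of several heteroscedasticity-consistent covariance options):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
y = 3 + 2 * x + rng.normal(0, x)    # heteroscedastic errors

X = sm.add_constant(x)
ols_res = sm.OLS(y, X).fit()                    # usual (non-robust) standard errors
robust_res = sm.OLS(y, X).fit(cov_type="HC1")   # heteroscedasticity-robust standard errors

print("OLS standard errors:   ", ols_res.bse.round(3))
print("Robust standard errors:", robust_res.bse.round(3))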
CHAPTER 5
MULTICOLLINEARITY
The Nature of Multicollinearity (MC)
One of the assumptions of the CLRM is that no exact linear relationship exists between any of the explanatory variables.
When this assumption is violated, we speak of perfect multicollinearity.
Reasons for Multicollinearity
1. The data collection method employed
2. Constraint over the model or in the population being sampled.
3. Over determined model:
This happens when the model has more explanatory variables
than the number of observations. This could happen in medical
research where there may be a small number of patients about
whom information is collected on a large number of variables.
Consequences of Multicollinearity
If MC is perfect, the regression coefficients are indeterminate and their standard errors are infinite.
If MC is less than perfect (i.e., near or high multicollinearity),
the regression coefficients are determinate.
OLS coefficient estimates are still unbiased.
OLS coefficient estimates will have large variances(or the
variances will be inflated).
The regression model may do well, that is, R2 may be quite
high.
Because of the large variance of the estimators, which means large standard errors, the confidence intervals tend to be much wider, leading to the acceptance of the 'zero null hypothesis'.
Because of the large standard errors of the estimators, the computed t-ratios will be very small, leading one or more of the coefficients to appear statistically insignificant when tested individually.
Although the t-ratio of one or more of the coefficients is very
small (which makes the coefficients statistically insignificant
individually), R2 can be very high.
The OLS estimators and their standard errors can be sensitive to small changes in the data.
Detection of Multicollinearity
Multicollinearity almost always exists in most applications.
So the question is not whether it is present or not; it is a
question of degree!
Also MC is not a statistical problem; it is a data
(sample) problem.
Therefore, we do not 'test for MC'; rather, we measure its degree in any particular sample by:
1. Using the variance inflation factor (VIF) (for continuous IVs)
2. High R² but few significant t-ratios
3. High pairwise correlations among regressors
4. Tests using eigenvalues and the condition index
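A minimal sketch of the first measure (simulated data with two deliberately correlated regressors; VIF values above roughly 10 are commonly read as a sign of serious multicollinearity):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.1, n)    # x2 is highly correlated with x1
x3 = rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i in range(1, X.shape[1]):     # skip the constant column
    print(f"VIF for x{i}: {variance_inflation_factor(X, i):.2f}")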
Remedial measures
The following corrective procedures have been suggested if the problem of multicollinearity is found to be serious:
1. Increase the size of the sample
2. Introduce additional equation in the model
3. Dropping Variables
4. Transforming variables
Chapter 6
AUTOCORRELATION
The nature of Autocorrelation
In our discussion of the simple and multiple regression models, one of the assumptions of the classicalists is that the disturbance terms are not correlated with one another, i.e., cov(ui, uj) = 0 for i ≠ j. When this assumption is violated, the disturbances are said to be autocorrelated (serially correlated).
Graphical representation of Autocorrelation
Reasons for Autocorrelation
There are several reasons why serial correlation or autocorrelation arises. Some of these are:
a. Cyclical fluctuations
Time series such as GNP, price indices, production, employment, and unemployment exhibit business cycles.
b. Specification bias
This arises because of the following:
i. Exclusion of variables from the regression model
ii. Incorrect functional form of the model
iii. Neglecting lagged terms from the regression model
Effect of Autocorrelation on OLS Estimators
1. OLS estimates are unbiased
2. The OLS estimates are inefficient.
The variance of the estimates will be biased downwards (i.e., underestimated) when the u's are autocorrelated.
3. Wrong testing procedure
If var(β̂) is underestimated, SE(β̂) is also underestimated, which makes the t-ratio large.
This large t-ratio may make β̂ appear statistically significant while it may not be.
Detection (Testing) of Autocorrelation
1. Graphic method
2. Formal testing method
a. Durbin-Watson (DW) test
b. Breusch-Godfrey (BG) Test
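An illustrative sketch of both formal tests in statsmodels (simulated AR(1) disturbances; a Durbin–Watson statistic near 2 suggests no first-order autocorrelation, and a small Breusch–Godfrey p-value suggests autocorrelation):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(4)
n = 200
x = rng.normal(0, 1, n)
e = np.zeros(n)
for t in range(1, n):                  # AR(1) disturbances: e_t = 0.7*e_{t-1} + v_t
    e[t] = 0.7 * e[t - 1] + rng.normal(0, 1)
y = 1 + 2 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print("Durbin-Watson statistic:", round(durbin_watson(res.resid), 3))
bg_stat, bg_pval, _, _ = acorr_breusch_godfrey(res, nlags=1)
print("Breusch-Godfrey p-value:", round(bg_pval, 4))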
Remedial Measures for the problems of Autocorrelation
1. If it is pure autocorrelation, one can use appropriate
transformation of the original model so that in the
transformed model we do not have the problem of (pure)
autocorrelation.
As in the case of heteroscedasticity, we have to use some type of generalized least-squares (GLS) method.
2. In some situations we can continue to use the OLS method.
3. In large samples, we can use the Newey–West method to
obtain standard errors of OLS estimators that are corrected for
autocorrelation.
This method is actually an extension of White's heteroscedasticity-consistent standard errors method.
4. Run a Cochrane-Orcutt regression
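A short sketch of remedy 3 (Newey–West / HAC standard errors) with statsmodels, again on simulated autocorrelated data; the maxlags value is an arbitrary illustrative choice:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
x = rng.normal(0, 1, n)
e = np.zeros(n)
for t in range(1, n):                  # autocorrelated disturbances
    e[t] = 0.7 * e[t - 1] + rng.normal(0, 1)
y = 1 + 2 * x + e

X = sm.add_constant(x)
hac_res = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print("Newey-West (HAC) standard errors:", hac_res.bse.round(3))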