Econometrics For Accounting Students

Contents of the course

1. Introduction (Methodology of Econometrics)


2. Simple linear regression
3. Multiple linear regression
4. Heteroscedasticity
5. Multicollinearity
6. Autocorrelation
Chapter 1

METHODOLOGY OF ECONOMETRICS
Introduction
 Theories suggest many relationships among variables. For
instance,

 in microeconomics we learn

– demand and supply models;

 in macroeconomics, we study:

 the 'investment function'

 the 'consumption function'

 Each such specification involves a relationship among
variables.
Introduction cont'd
 As accountants, we may be interested in questions such as:
 If one variable changes by a certain magnitude, by how much
will another variable change?
 Also, given the value of one variable, can we forecast or
predict the corresponding value of another?
 The purpose of studying the relationships among variables and
attempting to answer questions of this type is to help us
understand the real world we live in.
 However, theories that postulate relationships between
variables have to be checked against data obtained from the real
world:
• If empirical data verify the relationship proposed by the theory,
we accept the theory as valid.
Introduction cont'd
• If the theory is incompatible with the observed
behavior, we either reject the theory or, in light of
the empirical evidence, modify it.
Having given this background, we may now formally
define what econometrics is.
WHAT IS ECONOMETRICS?
• The term 'Econometrics' is formed from two words of Greek
origin: "oikonomia" (economy) and "metron" (measure).

 Literally, Econometrics means "economic measurement",

 but the scope of econometrics is much broader, as
described by leading econometricians.

• In short, Econometrics can be considered as the integration
of economics, mathematics and statistics for the purpose of:

 providing numerical values for the parameters of
economic relationships and

 verifying economic theories.


"Econometrics may be defined as the social science in
which the tools of economic theory, mathematics,
and statistical inference are applied to the analysis
of economic phenomena" (Goldberger, 1964).
TYPES OF ECONOMETRICS

 Theoretical econometrics: classical and Bayesian
 Applied econometrics: classical and Bayesian
Theoretical Econometrics:

 is concerned with the development of appropriate methods for
measuring economic relationships specified by models.
 One of the methods used extensively in this course is least
squares.
 Theoretical econometrics must spell out the assumptions of this
method, its properties, and what happens to these properties when
one or more of the assumptions of the method are not fulfilled.
Applied Econometrics: -
 In applied econometrics we use the tools of theoretical
econometrics to study some special fields of economics and
business, such as the production function, investment function,
demand and supply functions, etc.
WHY IS ECONOMETRICS A SEPARATE DISCIPLINE?
 Economic theory makes statements or hypotheses that are
mostly qualitative in nature (e.g., the law of demand); it does not
provide any numerical measure of the relationships.
 This job is done by the econometrician.
 The main concern of mathematical economics is to express
economic theory in mathematical form without regard to
measurability or empirical verification of the theory.
 Econometrics is mainly interested in the empirical
verification of economic theories/models.
 Economic statistics is mainly concerned with collecting,
processing and presenting economic data in the form of charts
and tables.
 It does not go any further. That further step is taken by the
econometrician.
Goals of Econometrics
Three main goals of Econometrics are identified:

i) Analysis

 i.e. testing economic theory

ii) Forecasting
 i.e. using the numerical estimates of the coefficients in order
to forecast the future values of economic magnitudes.

iii) Policy making


 i.e. Using numerical estimates of the coefficients of
economic relationships for policy simulations.
Financial Econometrics
 The main techniques employed for studying economic
problems are of equal importance in financial
applications.
 Financial econometrics may be defined as the application of
statistical techniques to problems in finance.
 Financial econometrics can be useful for:
 testing theories in finance,
 determining asset prices or returns,
 testing hypotheses concerning the relationships between variables,
 examining the effect on financial markets of changes in
economic conditions,
 forecasting future values of financial variables, and
 financial decision-making.
Financial Econometrics
 A list of possible examples of where econometrics may be useful:
1. Testing whether the Capital Asset Pricing Model (CAPM) or
Arbitrage Pricing Theory (APT) represents the superior model for
the determination of returns on risky assets
2. Measuring and forecasting the volatility of bond returns
3. Explaining the determinants of bond credit ratings used by the
ratings agencies
4. Modelling long-term relationships between prices and exchange rates
5. Testing the hypothesis that earnings or dividend announcements
have no effect on stock prices
6. Testing whether spot or futures markets react more rapidly to news
7. Forecasting the correlation between the stock indices of two
countries.
Is financial econometrics different from 'economic
econometrics'?

 As previously stated, the tools commonly used in financial
applications are fundamentally the same as those used in
economic applications,
 although the emphasis and the sets of problems that are
likely to be encountered when analysing data are somewhat
different.

 Financial data often differ from macroeconomic data in


terms of their frequency, accuracy, seasonality and other
properties.
METHODOLOGY OF ECONOMETRICS
• Broadly speaking, econometric methodology proceeds
along the following lines:
1. Statement of economic theory or hypothesis.
2. Specification of the mathematical model of the theory.
3. Specification of the statistical or econometric model.
4. Collecting the data.
5. Estimation of the parameters of the econometric model.
6. Hypothesis testing.
7. Forecasting or prediction.
8. Using the model for control or policy purposes.
1. Statement of Economic Theory or Hypothesis:

• Statement of a theory is a definite or clear expression of


the theory.
• For example, Keynes postulated that:
 on average, consumers increase their consumption as their
income increases, but not by as much as the increase in their
income;
 that is, the marginal propensity to consume (MPC), the rate of
change of consumption for a unit (say, a dollar) change in
income, is greater than zero but less than 1.
2. Specification of the Mathematical Model:
• The second step is to express the relationship in mathematical
form, that is, to specify the model with which the economic
phenomenon will be explored empirically:
Y = β1 + β2X, (0 < β2 < 1)
Y = consumption expenditure (dependent variable)
X = income (independent or explanatory variable)
β1 = the intercept coefficient
β2 = the slope coefficient
• This equation, which states that consumption is linearly
related to income, is an example of a mathematical model of
the relationship between consumption and income; it is called the
consumption function.
• The slope coefficient β2 measures the MPC.
Geometrically, the consumption function is a straight line with intercept β1 and slope β2 (figure omitted).
3. Specification of the Econometrics model:
• The relationships between economic variables are
generally inexact.
 Here in addition to income, other variables affect
consumption expenditure.
 For example - size of family, Age structure, Religion,
Income distribution, traditions, psychological and
sociological factors etc, are likely to exert some influence
on consumption.
• To allow for the inexact relationships between economic
variables, the consumption function is modified as follows:
Y = β1 + β2X + u
Cont...
• Where u, known as the disturbance or error term, is a
random variable that has well-defined probabilistic
properties.
 The disturbance term u may well represent all those
factors that affect consumption but are not taken into
account explicitly.
• The econometric consumption function hypothesizes that the
dependent variable is linearly related to the explanatory
variable, but that the relationship between the two is not
exact; it is subject to individual variations.
The econometric model of the consumption function can be
depicted graphically (figure omitted).
4. Obtaining Data
 To estimate the econometric model or to obtain the
numerical values of β1 and β2 the data used may be of
various types:
A. Time series: Time series data give information about the
numerical values of variables from period to period.
 A time series data set consists of observations on a variable
or several variables over time.
B. Cross-section data: These data give information on the
variables concerning individual agents(consumers and
producers) at a given point of time.
 For example, a cross-section sample of various family budgets
tells not only about expenditure patterns but also family
income, family composition, and other demographic, social
or financial characteristics.
(Example of such data: a data set on minimum wages in which avgmin refers to the
average minimum wage for the year, avgcov is the average coverage rate (the
percentage of workers covered by the minimum wage law), unemp is the
unemployment rate, and gnp is the gross national product.)
Cont..
C. Panel data: These are repeated surveys of a single sample
(cross-section) in different periods of time.
 They record the behaviour of the same set of
individual microeconomic units over time.
D. Engineering data: These data give information about the
technical requirements of the methods of production
employed for producing a certain commodity.
 These are collected from the producers of the
commodity and are used in studies of production
(production functions, input-output relationships, etc.).
E. Data constructed by the econometrician: Dummy variables
 In many cases some factors affecting the dependent variable
cannot be measured by any of the above conventional data types,
because they are qualitative factors. For example, profession,
religion and sex are factors affecting consumption of particular
items like bread, meat, cosmetics, etc.
 Such variables can be approximated by the introduction of a
'dummy variable'. For example, if we study the
demand for bread with cross-section data, the factor sex could
be represented by a dummy variable, which might be assigned
the value of:
• one when the individual is male;
• zero when the consumer is female.
 In this case the coefficient of the dummy variable will be
positive if, in the real world, females consume less bread.
5. Estimation of the parameters of the econometric model:
• After obtaining the data, the next step is to estimate the
parameters of the given function (here the consumption
function).
• The numerical estimates of the parameters give
empirical content to the consumption function.
• For this we estimate the model on the basis of the
collected data using appropriate tools and techniques.
• The main tools used to obtain the estimates are the
statistical techniques of regression analysis, such as OLS,
maximum likelihood, logit, and probit.
6. Hypothesis Testing/Evaluation of the estimates and the model:
• This stage enables the econometrician to evaluate the results
of the calculations and determine the reliability of the results.
• The evaluation consists of deciding whether the estimates of
the parameters are theoretically meaningful and
statistically satisfactory.
• For this purpose we use various criteria which may be
classified into three groups:
i. Economic a priori criteria:
 These criteria are determined by economic theory and
 refer to the size and sign of the parameters of economic
relationships.
i. Economic a priori criteria:
 These criteria are determined by economic theory and
 refer to the size and sign of the parameters of economic
relationships.
ii. Statistical criteria (first-order tests):
 These are determined by statistical theory and
 aim at the evaluation of the statistical reliability of the
estimates of the parameters of the model.
 Correlation coefficient test, standard error test, t-test, F-test, and
R2-test are some of the most commonly used statistical tests.
iii. Econometric criteria (second-order tests):
 aim at detecting violation or confirming validity of the
assumptions of the various econometric techniques.
 They serve as tests of the statistical tests, i.e. they determine
the reliability of the statistical criteria; they help us establish
whether the estimates have the desirable properties of BLUE.
HYPOTHESIS TESTING (Evaluation of
the forecasting power of the model)
 If the model fits the data well:
 use the model for prediction or forecasting;
 use the predicted or forecasted values for policy formulation.
 If the model does not fit the data well:
 respecify the economic theory and the mathematical and
econometric model;
 recollect fresh data for the variables specified in the model.
7. FORECASTING OR PREDICTION
• Forecasting is one of the aims of econometric research.

• If the chosen model does not refute the hypothesis or
theory under consideration, we may use it to predict
the future value(s) of the dependent, or forecast,
variable Y on the basis of the known or expected future
value of the explanatory, or predictor, variable X.
8. USE OF THE MODEL FOR CONTROL OR POLICY
PURPOSES

• On the basis of the calculations, an estimated
model may be used for control or policy
purposes.
• By appropriate fiscal or monetary policy mix, the
government can manipulate the control variable X
to produce the desired level of the target variable
Y.
Economic Empirical Study

1. Economic theory; past experience and studies: C = f(Inc)
2. Formulating a model (cause-effect): Ct = β1 + β2·Inct + ut
3. Gathering data: monthly, quarterly or yearly statistics
4. Estimating the model: simple OLS method or other, more advanced methods
5. Testing the hypothesis (H0: β2 > 0): positive relationship or not?
If not true, return to the formulation stage.
6. Interpreting the results
7. Forecasting; policy implications and decisions

Chapter 2

Two-Variable Regression Model:


Assumptions, Properties,
Estimation and Hypothesis Testing
Introduction
• Economic theories are mainly concerned with the
relationships among various economic variables.
• These relationships can predict the effect of one variable on
another.
• The functional relationships of these variables define the
dependence of one variable upon the other variable in the
specific form.
• The specific functional forms may be linear, quadratic,
logarithmic, exponential, or any other form.
• Regression analysis is concerned with the study of the
dependence of one variable, the dependent variable, on one
or more other variables, the explanatory variables,
 estimating and/or predicting the (population) mean or
average value of the former in terms of the known or fixed
values of the latter.
Types of Regression Models

 Simple regression models: one explanatory variable
 linear or non-linear
 Multiple regression models: two or more explanatory variables
 linear or non-linear
Two Variable Linear Regression Model
 The stochastic relationship with one explanatory variable is called
the simple linear regression model.
 The true relationship which connects the variables involved has
two parts: a deterministic part and a part represented by the
random error term:
Yi = β1 + β2Xi + Ui
 This is called the population regression function (PRF) because
Y and X represent their respective population values, and β1 and β2
are called the true population parameters.
 The parameters estimated from the sample values of Y and X are
called the estimators of the true parameters and are symbolized as
β̂1 and β̂2.
 Yi = β̂1 + β̂2Xi + ei is called the SRF (it shows the estimated relationship
between Y and X).
 ei represents the sample residual counterpart of Ui.
U may be a cumulative effect of the following
factors:
• Omission of variables from the function
• Random behavior of human beings
• Imperfect specification of the mathematical
form of the model
• Error of aggregation
• Error of measurement
Cont…
Scatter diagram representation of a function
• The scatter of observations represents the true
relationship between Y and X.
• The line represents the exact part of the relationship and
the deviation of the observation from the line represents
the random component of the relationship.
- In Yi = (β̂1 + β̂2Xi) + ei, the first component in the bracket is the part of Y explained by the changes
in X and the second is the part of Y not explained by X, that is to say the
change in Y due to the random influence of ui.
Assumptions
 The error terms are uncorrelated across observations (no autocorrelation).
Algebraically,
Cov(ui, uj) = E[(ui − E(ui))(uj − E(uj))]
            = E(ui uj) = 0   for i ≠ j
Methods of Estimation
• Specifying the model and stating its underlying
assumptions are the first stage of any econometric
application.
• The next step is the estimation of the numerical values of the
parameters of economic relationships.
• The parameters of the simple linear regression model can be
estimated by various methods.
• commonly used methods are:
– Ordinary least square method (OLS)
– Maximum likelihood method (MLM)
– Method of moments (MM)
1. Ordinary Least Squares Estimation (OLS)
 OLS chooses the estimates β̂1 and β̂2 that minimize the residual sum of
squares Σei² = Σ(Yi − β̂1 − β̂2Xi)².
 Solving the resulting normal equations gives (writing xi = Xi − X̄ and
yi = Yi − Ȳ for deviations from the sample means):
β̂2 = Σxiyi / Σxi²   and   β̂1 = Ȳ − β̂2X̄
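 As an illustration, here is a minimal Python sketch of these formulas (the data are invented for illustration; numpy is assumed to be available):

import numpy as np

# Hypothetical consumption (Y) and income (X) data
X = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 23.0, 25.0, 28.0])
Y = np.array([9.0, 10.5, 12.0, 14.0, 15.5, 17.0, 18.5, 20.0])

x = X - X.mean()                       # deviations from the mean
y = Y - Y.mean()
b2 = np.sum(x * y) / np.sum(x ** 2)    # slope: sum(xy) / sum(x^2)
b1 = Y.mean() - b2 * X.mean()          # intercept: Ybar - b2*Xbar
e = Y - (b1 + b2 * X)                  # residuals
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, sum of residuals = {e.sum():.2e}")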
2. Statistical Properties of OLS Estimators
 The statistical properties of OLS are based on classical regression
assumptions.
 We would like OLS estimates, as compared to estimates from other
econometric methods, to be as close as possible to the values of the
true population parameters.
 ‗Closeness‘ of OLS estimate to the true population parameter is
measured by the mean and variance of the sampling distribution of
the estimate of the different econometric methods.
 we assume that we get a very large number of samples each of size
‗n‘; we compute the estimate from each sample, and for each
econometric method and we form their distribution.
 We next compare the expected values and the variances of these
distributions and we choose among the alternative estimates
whose distribution is concentrated as close as possible around the
true parameter.
Gauss-Markov Theorem
 Given the assumptions of the classical regression model, the
OLS estimators, in the class of linear and unbiased estimators,
have the minimum variance, i.e. the OLS estimators are BLUE
(best linear unbiased estimators). This theorem gives the theoretical
justification for the popularity of OLS.
 An estimator is called BLUE if:
1. Linear: The parameter estimates are a linear function of the
dependent variable.

 For example, the parameters β0 and β1 are a linear function of
the dependent variable Y in the simple regression
model Yi = β0 + β1Xi + ui.
2. Unbiased:
 The estimated parameters are unbiased estimators of the
population parameters.
 So the expected or mean values of the estimated parameters are
equal to the true population parameters:
E(β̂0) = β0 and E(β̂1) = β1.
3. Minimum variance:
 It has a minimum variance in the class of linear and unbiased
estimators.
 An unbiased estimator with the least variance is known as best/
efficient estimator.
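 The unbiasedness property can be made concrete with a small Monte Carlo sketch (hypothetical true parameters β1 = 2 and β2 = 0.8; numpy assumed): across many samples, the average of the OLS estimates is close to the true values.

import numpy as np

rng = np.random.default_rng(0)
beta1, beta2, n, reps = 2.0, 0.8, 50, 5000   # assumed true parameters
X = rng.uniform(0, 10, n)                    # regressor values, held fixed
estimates = np.empty((reps, 2))
for r in range(reps):
    u = rng.normal(0, 1, n)                  # classical disturbances
    Y = beta1 + beta2 * X + u
    x, y = X - X.mean(), Y - Y.mean()
    b2 = (x @ y) / (x @ x)                   # OLS slope for this sample
    estimates[r] = [Y.mean() - b2 * X.mean(), b2]
print(estimates.mean(axis=0))                # approximately [2.0, 0.8]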
Hypothesis Testing
The F (ANOVA) test for the regression (testing the
coefficient of determination, R2)
 The largest value that R2 can assume is 1 (in which case all
observations fall on the regression line), and the smallest it
can assume is zero.
 A low value of R2 is an indication that:
 X is a poor explanatory variable in the sense that
– variation in X leaves Y unaffected, or
– while X is a relevant variable, its influence on Y is weak as
compared to some other variables that are omitted from the
regression equation, or
 the regression equation is mis-specified (for example, an
exponential relationship might be more appropriate).
 Thus, a small value of R2 casts doubt about the usefulness of the
regression equation.
 We do not, however, pass final judgment on the equation until it has
been subjected to an objective statistical test.
 Such a test is accomplished by means of ANOVA which tests the
significance of R2 (i.e., the adequacy of the linear regression model)

 The ANOVA table for simple linear regression is given below:


 Any sum of squares is associated with its df:
 TSS has n-1 df because we lose 1 df in computing the sample mean,𝑌 ̅.
 RSS has n-k = n - 2 df and
 ESS has k-1 = 2-1=1 df due to the fact that 𝐸𝑆𝑆 is a function of 𝛽 ̂ only.
 Let us arrange the various sums of squares and their associated df.
Then, the F-distribution can be calculated:
Example Stata output for a regression of quantity demanded (QD) on Price:

Source       |       SS       df       MS           Number of obs =      20
-------------+------------------------------       F(  1,    18) =  275.17
Model        |  88.9795332     1  88.9795332       Prob > F      =  0.0000
Residual     |  5.82046679    18  .323359266       R-squared     =  0.9386
-------------+------------------------------       Adj R-squared =  0.9352
Total        |        94.8    19  4.98947368       Root MSE      =  .56865

------------------------------------------------------------------------------
          QD |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       Price |  -1.263914    .076193   -16.59   0.000    -1.423989   -1.103838
       _cons |   11.09874   .4233685    26.22   0.000     10.20928    11.98821
------------------------------------------------------------------------------
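Output of this kind can be reproduced with any regression package. Below is a rough Python counterpart (the data are made up, so the numbers will not match the table exactly; statsmodels is assumed to be installed):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
price = rng.uniform(1, 10, 20)                       # hypothetical prices
qd = 11.1 - 1.26 * price + rng.normal(0, 0.57, 20)   # hypothetical demand

X = sm.add_constant(price)     # adds the intercept column
results = sm.OLS(qd, X).fit()
print(results.summary())       # coefficients, SEs, t-values, F, R-squared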
B. Tests of individual significance (Testing the significance of
OLS parameters )
• Since sampling errors are inevitable in all estimates, it is
necessary to apply test of significance in order to:
 measure the size of the error and
determine the degree of confidence in order to measure
the validity of these estimates.
• This can be done using various tests. The most common
ones are:
i) standard error test
ii) Student's t-test
iii) confidence interval test
• All of these testing procedures reach the same conclusion.
Standard error test
• This test helps us decide whether the estimates are
significantly different from zero.
• Formally, we test the null hypothesis H0: βi = 0 against the
alternative H1: βi ≠ 0.

• The standard error test may be outlined as follows:

• First: compute the standard errors of the parameters, SE(β̂i).

• Second: compare the standard errors with the numerical
values of the estimates β̂i.
Decision rule:

 If SE(β̂i) > ½β̂i, accept the null hypothesis and reject the alternative hypothesis. We conclude
that β̂i is statistically insignificant.

 If SE(β̂i) < ½β̂i, reject the null hypothesis and accept the alternative hypothesis. We conclude
that β̂i is statistically significant.

 Numerical example: Suppose that from a sample of size n = 30, we estimate the following
supply function:
Q = 120 + 0.6P + ei
SE:      (1.7)  (0.025)
Test the significance of the slope parameter at the 5% level of significance using the standard error test.
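 Solution sketch: here β̂ = 0.6 and SE(β̂) = 0.025. Since 0.025 < ½(0.6) = 0.3, we reject the null hypothesis and conclude that the slope parameter is statistically significant.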
Student’s t-test
• Like the standard error test, this test is also important to test the
significance of the parameters.
• We can derive the t-value of the OLS estimates:

 Since we have two parameters in simple linear regression with an intercept different from zero, our
degrees of freedom are n − 2.

 Like the standard error test, we formally test the hypothesis H0: βi = 0 against the alternative
H1: βi ≠ 0 for the slope parameter, and H0: α = 0 against the alternative H1: α ≠ 0 for the
intercept.

 To undertake the above test we follow the following steps.

Step 1: Compute t*, which is called the computed value of t,
by taking the value of β in the null hypothesis. In our case β = 0,
so t* becomes:

t* = (β̂ − 0) / SE(β̂) = β̂ / SE(β̂)
Step 2: Choose the level of significance.

• The level of significance is the probability of making a 'wrong'
decision, i.e. the probability of rejecting the null hypothesis when
it is true (the probability of committing a Type I error).
• It is usual in econometric research to choose the 5% or the
1% level of significance.
 This means that in making our decision we allow (tolerate)
five (or one) times out of a hundred to be 'wrong'.
Step 3: Check whether it is a one-tail or a two-tail test.
• If the inequality sign in H1 is ≠, then it implies a two-tail test; we
divide the chosen level of significance by two and determine the
critical value of t, called tc.
• But if the inequality sign is either > or <, then it indicates a one-tail
test, and there is no need to divide the chosen level of significance
by two to obtain the critical value from the t-table.

Step 4: Obtain critical value of t, called tc at  2 and n-2 degree of freedom for two tail test.
Step 5: Compare t* (the computed value of t) and tc (critical value of t)
 If t*> tc , reject H0 and accept H1. The conclusion is ˆ is statistically significant.

 If t*< tc , accept H0 and reject H1. The conclusion is ˆ is statistically insignificant.


Numerical Example:
Suppose that from a sample size n=20 we estimate the following consumption function:
C  100  0.70  e
(75.5) (0.21)

The values in the brackets are standard errors. We want to test the null hypothesis: H 0 :  i  0 against

the alternative H 1 :  i  0 using the t-test at 5% level of significance.


a. the t-value for the test statistic is:

ˆ  0 ˆ 0.70
t*   =  3 .3
ˆ ˆ
SE(  ) SE(  ) 0.21
Confidence interval test
• In order to define how close the estimate is to the true parameter,
we must construct a confidence interval for the true parameter;
 in other words, we must establish limiting values around the
estimate within which the true parameter is expected to lie
with a certain "degree of confidence".
• We choose a probability in advance and refer to it as the
confidence level (confidence coefficient).
• It is customary in econometrics to choose the 95% confidence
level,
• i.e., the confidence limits, computed from the sample, would
include the true population parameter in 95% of the cases.
 In 5% of the cases the population parameter will fall outside the
confidence interval.
The limits within which the true β lies at the 100(1 − α)% level of confidence are:

[β̂ − SE(β̂)·tc, β̂ + SE(β̂)·tc], where tc is the critical value of t at the α/2
level of significance and n − 2 degrees of freedom.

The test procedure is outlined as follows:
H0: β = 0
H1: β ≠ 0

Decision rule: If the hypothesized value of β in the null hypothesis is within the confidence interval,
accept H0 and reject H1. The implication is that β̂ is statistically insignificant; while if the hypothesized
value of β in the null hypothesis is outside the limits, reject H0 and accept H1. This indicates that β̂ is
statistically significant.
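Worked illustration (using the earlier consumption-function estimates, β̂ = 0.70, SE(β̂) = 0.21, n = 20, tc ≈ 2.101): the 95% confidence interval is 0.70 ± 2.101 × 0.21 = (0.26, 1.14). Since the hypothesized value 0 lies outside this interval, we reject H0; β̂ is statistically significant.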
Chapter 3

Regression Analysis:
Multiple Regression
[ Cross-Sectional Data ]
Simple Regression
 A statistical model that utilizes one quantitative or
qualitative independent variable "X" to predict the
quantitative dependent variable "Y",
 i.e., it considers the relation between a single explanatory
variable and the response variable.
Multiple Regression
 A statistical model that utilizes two or more quantitative and
qualitative explanatory variables (X1, ..., Xp) to predict a
quantitative dependent variable Y.
Caution: have at least two or more explanatory variables.
 Multiple regression simultaneously considers the influence of
multiple explanatory variables on a response variable Y.
Simple vs. Multiple
 Simple regression:
• β represents the unit change in Y per unit change in X.
• Does not take into account any other variable besides the
single independent variable.
• r²: proportion of variation in Y predictable from X.
 Multiple regression:
• βi represents the unit change in Y per unit change in Xi.
• Takes into account the effect of other independent variables.
• R²: proportion of variation in Y predictable from the set of X's.
Multiple Regression Models
 Linear models: linear, dummy variable, interaction
 Non-linear models: polynomial, square root, log, reciprocal, exponential
The Multiple Linear Regression Model Building

 Idea: examine the linear relationship between one
dependent variable (Y) and two or more independent variables (Xi).

 Multiple regression model with k independent variables:

Yi = β0 + β1X1i + β2X2i + … + βkXki + εi

(β0 is the Y-intercept, β1, …, βk are the population slopes, and εi is the random error.)
• The coefficients of the multiple regression model are
estimated using sample data with k independent
variables:

Ŷi = b0 + b1X1i + b2X2i + … + bkXki

(b0 is the estimated intercept, b1, …, bk are the estimated slope coefficients, and Ŷi is the estimated or predicted value of Y.)
• Interpretation of the slopes:
– b1 = the change in the mean of Y per unit change in X1,
taking into account the effect of X2 (or net of X2).
– b0 = the Y intercept. It is interpreted as in simple regression.
ASSUMPTIONS
• Linear regression model: The regression model is linear in the
parameters, though it may or may not be linear in variables.
• The X variables are independent of the error term. This means
that we require zero covariance between ui and each X variable:
cov(ui, X1i) = cov(ui, X2i) = … = cov(ui, Xki) = 0
• Zero mean value of disturbance ui. Given the value of Xi, the
mean, or the expected value of the random disturbance term ui is
zero.
E(ui)= 0 for each i
• Homoscedasticity or constant variance of ui . This implies that
the variance of the error term is the same, regardless of the value
of X.
var (ui) = σ2
• No auto-correlation between the disturbance terms.
cov ( ui, uj) = 0 i≠j
 This implies that the observations are sampled independently.
• the number of observations n must be greater than the number
of parameters to be estimated.
• There must be variation in the values of the X variables.
Technically, var(X) must be a positive number.
• No exact collinearity between the X variables. i.e.
No strong multicollinearity: No exact linear relationship exists
between any of the explanatory variables.
Are Individual Variables Significant?
• Use t-tests of individual variable slopes.
• Shows whether there is a linear relationship between the variable Xi
and Y. Hypotheses:
• H0: βi = 0 (no linear relationship)
• H1: βi ≠ 0 (a linear relationship does exist between Xi and Y)
• Test statistic:

t* = (bi − 0) / SE(bi)

• Confidence interval for the population slope βi:

bi ± tc·SE(bi)
Assumptions and Procedures to Conduct Multiple
Linear Regression
 When you choose to analyze your data using multiple
regression, make sure that the data you want to analyze can
actually be analyzed using multiple regression.
 It is only appropriate to use multiple regression if your
data "passes" eight assumptions that are required for
multiple regression to give you a valid result.
let's take a look at these eight assumptions:
Assumption #1:
 Your dependent variable should be measured on a
continuous scale .
Assumption #2:
 You should have two or more IVs, which can be either
continuous or categorical or dummy.
Assumption #3:
 You should have independence of residuals, which you can
easily check using the Durbin-Watson statistic.
Assumption #4:
 There needs to be a linear relationship between :
 the DV and each of your independent variables
Assumption #5:
 Your data needs to show homoscedasticity, which is where the
variances along the line of best fit remain similar as you move
along the line.
Assumption #6:
 Your data must not show multi-collinearity, which occurs
when you have two or more IVs that are highly correlated with
each other.
Assumption #7:
 There should be no significant outliers.
 Outliers can distort the output that any statistical package
produces and reduce the predictive accuracy of your results as
well as their statistical significance.
Assumption #8:
 Finally, you need to check that the residuals (errors) are
normally distributed.
You can check assumptions #3, #4, #5, #6, #7 and #8 using
SPSS.
Assumptions #1 and #2 should be checked first, before
moving onto assumptions #3, #4, #5, #6, #7 and #8.
 Just remember that if you do not run the statistical tests on
these assumptions correctly, the results you get when
running multiple regression might not be valid.
Given the assumptions and data on Y and a set of IVs (X1, ...,
Xk), the following are suggested procedures/steps to
conduct multiple linear regression:
1. Select variables that you believe are linearly related to the
dependent variable.
2. Use software to generate the coefficients and the statistics
used to assess the model.
3. Diagnose violations of the required conditions/assumptions.
 If there are problems, attempt to remedy them.
4. Assess the model's fit.
5. If we are satisfied with the model's fit and the required
conditions are met, we can test and interpret the coefficients.
6. We use the model to predict a value of the DV (as sketched below).
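A minimal Python sketch of steps 2 to 6 (hypothetical variable names and data; pandas and statsmodels assumed):

import pandas as pd
import statsmodels.api as sm

# Hypothetical data: salary explained by experience and education
df = pd.DataFrame({
    "salary":     [30, 35, 41, 46, 50, 55, 61, 66],
    "experience": [1, 2, 4, 5, 7, 8, 10, 11],
    "education":  [12, 12, 14, 14, 16, 16, 18, 18],
})

X = sm.add_constant(df[["experience", "education"]])
model = sm.OLS(df["salary"], X).fit()   # step 2: generate the coefficients

print(model.rsquared)                   # step 4: assess the model's fit
print(model.tvalues)                    # step 5: test individual coefficients
print(model.predict(X.iloc[:1]))        # step 6: predict a value of the DV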
Dummy Independent Variables
Describing Qualitative Information
• In regression analysis the dependent variable can be
influenced by variables that are essentially qualitative in
nature,
 such as sex, race, color, religion, nationality, geographical
region, political upheavals, and party affiliation.

• One way we could "quantify" such attributes is by
constructing artificial variables that take on values of 1 or 0,
 1 indicating the presence (or possession) of that attribute and
0 indicating the absence of that attribute.

• Variables that assume such 0 and 1 values are called dummy/
indicator/ binary/ categorical/ dichotomous variables.
Example 1:
Yi = α1 + α2Di + ui
where Y = annual salary of a college professor
Di = 1 if male college professor
   = 0 otherwise (i.e., female professor)

 The model may enable us to find out whether sex makes any
difference in a college professor's salary, assuming, of course,
that all other variables such as age, degree attained, and years of
experience are held constant.
 Mean salary of female college professor: E(Yi | Di = 0) = α1
 Mean salary of male college professor: E(Yi | Di = 1) = α1 + α2
 α2 tells by how much the mean salary of a male college professor
differs from the mean salary of his female counterpart.
 A test of the null hypothesis that there is no sex discrimination
(H0: α2 = 0) can easily be made by finding out whether, on the basis
of the t-test, the estimated α2 is statistically significant.
Example 2:
Yi = α1 + α2Di + βXi + ui
where Xi = years of teaching experience
Mean salary of female college professor: E(Yi | Xi, Di = 0) = α1 + βXi
Mean salary of male college professor: E(Yi | Xi, Di = 1) = (α1 + α2) + βXi
 The male and female college professors' salary functions in relation to
the years of teaching experience have the same slope (β) but
different intercepts:
 Male intercept = α1 + α2
 Female intercept = α1
 Difference = α2
Note: If a qualitative variable has 'm' categories, introduce only 'm − 1'
dummy variables (see the sketch after this note).
 The group, category, or classification that is assigned the value of 0
is often referred to as the base, benchmark, control, comparison,
reference, or omitted category.
Example 3: a qualitative variable with more than two classes
 Regress the annual expenditure on health care by an
individual on the income and education of the individual:
Yi = α1 + α2D2i + α3D3i + βXi + ui

where Yi = annual expenditure on health care
Xi = annual income
D2 = 1 if high school education
   = 0 otherwise
D3 = 1 if college education
   = 0 otherwise

 "Less than high school education" is the base
category.
 Therefore, the intercept α1 will reflect the intercept for this
category.
• The mean health care expenditure functions for the three
levels of education, namely, less than high school, high
school, and college, are:

E(Yi | D2 = 0, D3 = 0, Xi) = α1 + βXi
E(Yi | D2 = 1, D3 = 0, Xi) = (α1 + α2) + βXi
E(Yi | D2 = 0, D3 = 1, Xi) = (α1 + α3) + βXi
Log-Level:

 In the log-level model log(wage) = β0 + β1·educ + u, an estimate β̂1 = 0.083
means that wage increases by about 8.3 percent for every additional year of education.

Log-Log:

 In the log-log model log(salary) = β0 + β1·log(sales) + u, the coefficient of
log(sales) is the estimated elasticity of salary with respect to sales.
• An estimate of 0.257 implies that a 1 percent increase in firm sales increases salary
by about 0.257 percent (the usual interpretation of an elasticity).

Level-Log:
 This form arises less often in practice:
Y = β0 + β1·log(x) + u

 Example: given Ŷ = 110 + 12·log(x), what is the change in Ŷ?
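 Answer sketch: in a level-log model, ΔŶ ≈ (β1/100) × (% change in x); so for a 1 percent increase in x, Ŷ rises by about 12/100 = 0.12 units.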


CHAPTER 4

HETEROSCEDASTICITY
Introduction
In both the simple and multiple regression models, we made
assumptions.

 Now, we are going to address the following questions:

 What if the error variance is not constant over all observations?

 What if the different errors are correlated?

 What if the explanatory variables are correlated?

 What are the consequences of such violations for the estimators?

 How do we detect their presence?

 What are the remedial measures?


 In general we could encounter any combination of 3
problems:
 the coefficient estimates are wrong
 the associated standard errors are wrong
 the distribution that we assumed for the test statistics will
be inappropriate.
Heteroscedasticity
The nature of Heteroscedasticity
 The assumption of homoscedasticity states that the
variation of each ui around its zero mean does not depend on
the value of the explanatory variable.
 Mathematically, σu² is not a function of X; i.e. σu² ≠ f(Xi).

 When the variance of the error term differs for
different values of X, you have heteroscedasticity.

 Thus, we say that the U's are homoscedastic when:

var(ui) = σu² (a constant),

and heteroscedastic when:

var(ui) = σui² (a value that varies with i).
Reasons for Heteroscedasticity
 Error learning model:
 It states that as people learn, their errors of behavior
become smaller over time.
 In this case σi² is expected to decrease over time.
 As data collection techniques improve, σi² is likely to decrease.
 Heteroscedasticity can also arise as a result of the presence of outliers.
 An outlier is an observation that is much different
(either very small or very large) in relation to the other
observations in the sample.
Consequences of Heteroscedasticity for the OLS estimates
 The OLS estimators will have no bias,
 i.e., the estimates are unbiased even under heteroscedasticity.
 The variances of the OLS coefficients will be incorrect:
 the estimators do not have the smallest variance in the class of
unbiased estimators and, therefore, they are not efficient.

 var(β̂) under heteroscedasticity will be greater than its variance under homoscedasticity.

 As a result, the true standard error of β̂ will be larger than the usual homoscedastic formula suggests.

 The t-value computed with the correct standard error will therefore be smaller, which might lead to the conclusion
that β̂ is statistically insignificant (which in fact may not be true).

 Moreover, our inference and prediction about the population coefficients would be incorrect.
Detecting Heteroscedasticity
 There are two methods of testing or detecting heteroscedasticity:
i. informal methods
ii. formal methods
i. Informal method
 It is called informal because it does not use formal testing
procedures such as the χ²-test, F-test and the like.
 To see whether a given data set exhibits heteroscedasticity, we look at whether
there is a systematic relation between the squared residuals ei²
and the predicted values of Y or Xi (diagnostic figures omitted).
 In panel (a) of such plots, there is no systematic pattern between the
two variables, suggesting that no heteroscedasticity is present in the data.
 Panels (b) to (e), however, exhibit definite patterns.
ii. Formal methods
 There are several formal methods of testing heteroscedasticity which are
based on formal testing procedures.
 Some of the major ways of detecting heteroscedasticity:
a. White test
b. Breusch-Pagan test
c. Goldfeld-Quandt test
Remedial measures for the problems of heteroscedasticity
1. Transforming the model
2. Taking a robust standard error during regression (see the sketch below)
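A minimal Python sketch of remedy 2 together with a formal detection test (hypothetical model in which the error variance grows with X; statsmodels assumed):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 100)
y = 2 + 0.5 * x + rng.normal(0, 0.3 * x)         # error variance grows with x

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()

lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(ols.resid, X)
print(f"Breusch-Pagan p-value: {lm_pval:.4f}")   # small p => heteroscedasticity

robust = sm.OLS(y, X).fit(cov_type="HC1")        # remedy 2: robust (White) SEs
print(robust.bse)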
CHAPTER 5

MULTICOLLINEARITY
The nature of Multicollinearity
 One of the assumptions of the CLRM is that no exact
linear relationship exists between any of the explanatory
variables.
 When this assumption is violated, we speak of perfect multicollinearity.
Reasons for Multicollinearity
1. The data collection method employed
2. Constraints on the model or on the population being sampled.
3. Over determined model:
 This happens when the model has more explanatory variables
than the number of observations. This could happen in medical
research where there may be a small number of patients about
whom information is collected on a large number of variables.
Consequences of Multicollinearity
 If multicollinearity is perfect, the regression coefficients are indeterminate and
their standard errors are infinite.
 If multicollinearity is less than perfect (i.e. near or high multicollinearity):
 the regression coefficients are determinate;
 OLS coefficient estimates are still unbiased;
 OLS coefficient estimates will have large variances (the
variances will be inflated);
 the regression model may still do well, that is, R2 may be quite
high.
 Because of the large variances of the estimators, which mean
large standard errors, the confidence intervals tend to be much
wider, leading to the acceptance of the "zero null hypothesis".
 Because of the large standard errors of the estimators, the
computed t-ratios will be very small, leading one or more of the
coefficients to be statistically insignificant when tested
individually.
 Although the t-ratios of one or more coefficients may be very
small (making those coefficients statistically insignificant
individually), R2 can still be very high.
 The OLS estimators and their standard errors can be
sensitive to small changes in the data.
Detection of Multicollinearity
 Multicollinearity almost always exists in most applications.
 So the question is not whether it is present or not; it is a
question of degree!
 Also, multicollinearity is not a statistical problem; it is a data
(sample) problem.
 Therefore, we do not "test for multicollinearity"; rather we measure its
degree in any particular sample, for example (see the sketch below):
1. using the variance inflation factor (VIF, for continuous IVs)
2. high R2 but few significant t-ratios
3. high pairwise correlations among regressors
4. tests using eigenvalues and the condition index
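A minimal Python sketch of measure 1, the variance inflation factor (hypothetical regressors; statsmodels assumed). A common rule of thumb flags VIF values above about 10:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(0, 1, 200)
x2 = 0.9 * x1 + rng.normal(0, 0.2, 200)   # x2 highly correlated with x1
X = sm.add_constant(np.column_stack([x1, x2]))

for i in range(1, X.shape[1]):            # skip the constant column
    print(f"VIF(x{i}) = {variance_inflation_factor(X, i):.2f}")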
Remedial measures
 The following corrective procedures have been suggested if
the problem of multicollinearity is found to be serious:
1. Increase the size of the sample
2. Introduce an additional equation into the model
3. Dropping variables
4. Transforming variables
Chapter 6

AUTOCORRELATION

The nature of Autocorrelation
 In our discussion of simple and multiple regression models,
one of the assumptions of the classicalists is that

cov(ui, uj) = E(ui uj) = 0 for i ≠ j,

 which implies that successive values of the disturbance term U
are independent.
 This means that when observations are made over time, the
effect of a disturbance occurring at one period does not carry
over into another period.
 Hence, autocorrelation is a special case of correlation which
refers to the relationship between successive values of the same
variable.

Graphical representation of Autocorrelation (figures omitted)
Reasons for Autocorrelation
 There are several reasons why serial or autocorrelation
arises. Some of these are:
a. Cyclical fluctuations
 Time series such as GNP, price indices, production,
employment and unemployment exhibit business cycles.
b. Specification bias
This arises because of the following:
i. exclusion of variables from the regression model
ii. incorrect functional form of the model
iii. neglecting lagged terms from the regression model
128
Effect of Autocorrelation on OLS Estimators
1. OLS estimates are unbiased.
2. The variance of the OLS estimates is inefficient:
 the variance of the estimates will be biased downwards
(i.e. underestimated) when the u's are autocorrelated.
3. Wrong testing procedure:
 If var(β̂) is underestimated, SE(β̂) is also underestimated; this makes the t-ratio large.
 This large t-ratio may make β̂ appear statistically significant while it may not be.
4. The wrong testing procedure will lead to wrong prediction and
inference about the characteristics of the population.
Detection (Testing) of Autocorrelation
1. Graphic method (inspecting plots of the residuals; figures omitted)
2. Formal testing methods (see the sketch below):
a. Durbin-Watson (DW) test
b. Breusch-Godfrey (BG) test
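A minimal Python sketch of both formal tests (hypothetical model with AR(1) errors; statsmodels assumed). A DW statistic near 2 suggests no first-order autocorrelation:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(4)
n = 100
u = np.zeros(n)
for t in range(1, n):                    # AR(1) errors: u_t = 0.7*u_{t-1} + e_t
    u[t] = 0.7 * u[t - 1] + rng.normal()
x = np.arange(n, dtype=float)
y = 1.0 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
print(f"Durbin-Watson: {durbin_watson(res.resid):.2f}")   # well below 2 here

lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=1)
print(f"Breusch-Godfrey p-value: {lm_pval:.4f}")          # small p => autocorrelation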

Remedial Measures for the problems of Autocorrelation
1. If it is pure autocorrelation, one can use an appropriate
transformation of the original model so that in the
transformed model we do not have the problem of (pure)
autocorrelation.
 As in the case of heteroscedasticity, we have to use some type
of generalized least squares (GLS) method.
2. In some situations we can continue to use the OLS method.
3. In large samples, we can use the Newey-West method to
obtain standard errors of OLS estimators that are corrected for
autocorrelation (see the sketch below).
 This method is actually an extension of White's
heteroscedasticity-consistent standard errors method.
4. Run a Cochrane-Orcutt regression.
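A minimal Python sketch of remedy 3, Newey-West (HAC) standard errors (hypothetical model with AR(1) errors; statsmodels assumed):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 100
u = np.zeros(n)
for t in range(1, n):                       # AR(1) disturbances
    u[t] = 0.7 * u[t - 1] + rng.normal()
x = np.arange(n, dtype=float)
y = 1.0 + 0.5 * x + u

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                    # conventional OLS standard errors
nw = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(ols.bse)   # typically understated under positive autocorrelation
print(nw.bse)    # Newey-West (HAC) corrected standard errors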