Econometrics - Lecture Notes

Semester – V

Major Based Elective III: Econometrics – 21USTM3

UNIT – I: Introduction to Econometrics

1. Definition: Economic model


An economic model is a simplified description of reality, designed to yield hypotheses about
economic behavior that can be tested. An important feature of an economic model is that it is
necessarily subjective in design because there are no objective measures of economic outcomes.

2. Definition: Econometrics
Econometrics, the result of a certain outlook on the role of economics, consists of the application
of mathematical statistics to economic data to lend empirical support to the models constructed by
mathematical economics and to obtain numerical results.
Econometrics refers to the application of economic theory and statistical techniques for the purpose of testing hypotheses and estimating and forecasting economic phenomena. Literally interpreted, econometrics means “economic measurement.”

3. Scope of Econometrics
 Developing statistical methods for the estimation of economic relationships
 Testing economic theories and hypotheses
 Evaluating and applying economic policies
 Forecasting
 Collecting and analyzing non-experimental or observational data.

4. Aim of Econometrics
i) Formulation and specification of econometric models
The economic models are formulated in an empirically testable form. Several econometric models can be derived from a single economic model. Such models differ due to different choices of functional form, specification of the stochastic structure of the variables, etc.

ii) Estimation and testing of models


The models are estimated on the basis of observed set of data and are tested for their suitability.
This is the part of statistical inference of the modelling. Various estimation procedures are used to know
the numerical values of the unknown parameters of the model. Based on various formulations of
statistical models, a suitable and appropriate model is selected.

iii) Use of models


The obtained models are used for forecasting and policy formulation which is an essential part in
any policy decision. Such forecasts help the policy makers to judge the goodness of fitted model and
take necessary measures in order to re-adjust the relevant economic variables.

5. Methodology of Econometrics
a) Statement of theory or hypothesis.
b) Specification of the mathematical model of the theory
c) Specification of the statistical, or econometric, model
d) Obtaining the data
e) Estimation of the parameters of the econometric model
f) Hypothesis testing
g) Forecasting or prediction
h) Using the model for control or policy purposes.

a. Statement of theory or Hypothesis


Keynes postulated that the marginal propensity to consume (MPC), the rate of change of consumption for a unit change in income, is greater than zero but less than one, i.e., 0 < MPC < 1.

b. Specification of the Mathematical Model of Consumption


Keynes postulated a positive relationship between consumption and income.
 Keynesian consumption function: Y = β1 + β2X, 0 < β2 < 1
 Y = consumption expenditure
 X = income
 β1 and β2 are known as the parameters of the model and are, respectively, the intercept and the slope coefficient.
 The slope coefficient β2 measures the MPC.
 The equation shows an exact or deterministic relationship between consumption and income.
 The equation states that consumption is linearly related to income (an example of a mathematical model of the relationship between consumption and income, called the consumption function in economics).
 A model with a single equation is known as a single-equation model; a model with more than one equation is known as a multiple-equation model.

c. Specification of the Econometric Model of Consumption


 Because the relationship between economic variables is inexact, the econometrician modifies the deterministic consumption function as Y = β1 + β2X + u.
 This equation is an example of an econometric model; more technically, it is an example of a linear regression model.
 The disturbance term u represents all those factors that affect consumption but are not taken into account explicitly.
 The econometric consumption function hypothesizes that the dependent variable Y (consumption) is linearly related to the explanatory variable X (income), but that the relationship between the two is not exact; it is subject to individual variation.

d. Obtaining the Data
 To estimate the econometric model Y = β1 + β2X + u, that is, to obtain the numerical values of β1 and β2, we need data.
 The Y variable is aggregate (for the economy as a whole) personal consumption expenditure (PCE) and the X variable is gross domestic product (GDP), a measure of aggregate income.
 The data are in “real” terms; that is, they are measured in constant prices.
Note: MPC is the change in consumption for a unit change in real income.

e. Estimation of the Econometric Model


 Now that we have the data, our next task is to estimate the parameters of the consumption
function.
 The numerical estimates of the parameters give empirical content to the consumption function.
 The statistical technique of regression analysis is the main tool used to obtain the estimates (a short sketch follows below).
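A minimal sketch, assuming Python with numpy and statsmodels, of how such a consumption function could be estimated by regression analysis; the income and consumption figures below are illustrative placeholders, not the actual PCE/GDP data referred to in these notes.

```python
# Sketch: estimating the consumption function Y = b1 + b2*X by OLS.
# The figures below are illustrative placeholders, not real PCE/GDP data.
import numpy as np
import statsmodels.api as sm

X = np.array([2447.1, 2476.9, 2503.7, 2619.4, 2746.1, 2865.8])  # income (illustrative)
Y = np.array([1651.2, 1706.5, 1757.1, 1941.3, 2077.3, 2290.6])  # consumption (illustrative)

X_const = sm.add_constant(X)          # adds the intercept column for b1
model = sm.OLS(Y, X_const).fit()      # least-squares estimation

b1, b2 = model.params                 # b2 is the estimated MPC
print(model.summary())
print("Estimated MPC (slope):", round(b2, 3))
```

With real aggregate data the estimated slope would be interpreted directly as the MPC and checked against the Keynesian restriction 0 < MPC < 1.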

f. Hypothesis Testing
 According to “positive” economists like Milton Friedman, a theory or hypothesis that is not
verifiable by appeal to empirical evidence may not be admissible as a part of scientific enquiry.
 Keynes expected the MPC to be positive but less than 1.
 Confirmation or refutation of economic theories on the basis of sample evidence rests on a branch of statistical theory known as statistical inference (hypothesis testing).

g. Forecasting or Prediction
 If the chosen model does not refute the hypothesis or theory under consideration, we may use it to predict the future value(s) of the dependent, or forecast, variable Y on the basis of known or expected future value(s) of the explanatory, or predictor, variable X.
 As macroeconomic theory shows, the change in income following a change in investment expenditure is given by the income multiplier M.

h. Use of the Model for control or Policy purpose


 Milton Friedman developed a model of consumption known as the permanent income hypothesis.
 Robert Hall developed a model of consumption known as the life-cycle permanent income hypothesis.

6. Endogenous variable (Jointly determined variables)


 An endogenous variable is a variable in a statistical model that's changed or determined by its
relationship with other variables within the model. In other words, an endogenous variable is
synonymous with a dependent variable, meaning it correlates with other factors within the system
being studied.
 Endogenous variables are important in econometrics and economic modeling because they show
whether a variable causes a particular effect.
 Economists employ causal modeling to explain outcomes by analyzing dependent variables based
on a variety of factors.
Note:
The variables which are explained by the functioning of system and values of which are
determined by the simultaneous interaction of the relations in the model are endogenous variables or
jointly determined variables.

Examples:
i. In a model studying supply and demand, the price of a good is an endogenous factor because the
price can be changed by the producer (supplier) in response to consumer demand.
ii. Personal income to personal consumption, since a higher income typically leads to increases in
consumer spending.
iii. Rainfall to plant growth is correlated and studied by economists since the amount of rainfall is
important to commodity crops such as corn and wheat.
iv. Education obtained to future income levels because there's a correlation between education and
higher salaries or wages.

7. Exogenous variable (Predetermined variables)


 The variables that contribute to provide explanations for the endogenous variables and values of
which are determined from outside the model are exogenous variables or predetermined variables.
 Exogenous variables help in explaining the variations in endogenous variables.
 It is customary to include past values of endogenous variables in the predetermined group. Since exogenous variables are predetermined, they are independent of the disturbance term in the model. They satisfy the assumptions which explanatory variables satisfy in the usual regression model.
 Exogenous variables influence the endogenous variables but are not themselves influenced by them. A variable which is endogenous in one model can be an exogenous variable in another model.
 Note that in the linear regression model, the explanatory variables influence the study variable but not vice versa, so the relationship is one-sided.

Note 1:
i. The variables that contribute to provide explanations for the endogenous variables and values of
which are determined from outside the model are exogenous variables or predetermined variables.
ii. Exogenous variables are independent, and endogenous variables are dependent. Therefore, if the
variable does not depend on variables within the model, it's an exogenous variable. If the variable
depends on variables within the model, though, it's endogenous.

Note 2: Before we proceed to a formal analysis of regression theory, let us dwell briefly on terminology and notation. In the literature the terms dependent variable and explanatory variable are described variously; representative pairs of synonyms include explained and explanatory variable, regressand and regressor, and predictand and predictor.

8. (i) Structural form and reduced form of Endogenous and Exogenous variables
(ii) UNIT – V.

Refer IIT, Kanpur notes…

9. Ordinary least squares estimates and their properties


The method of ordinary least squares is attributed to Carl Friedrich Gauss, a German mathematician.
Under certain assumptions, the method of least squares has some very attractive statistical properties that have made it one of the most powerful and popular methods of regression analysis.
To understand this method, we first explain the least-squares principle.
Recall the two-variable population regression function (PRF):

Yi = β1 + β2Xi + ui

However, the PRF is not directly observable. We estimate it from the sample regression function (SRF):

Yi = β̂1 + β̂2Xi + ûi = Ŷi + ûi

where Ŷi = β̂1 + β̂2Xi is the estimated (conditional mean) value of Yi.

From the SRF the residuals are determined as

ûi = Yi − Ŷi = Yi − β̂1 − β̂2Xi

which shows that the ûi (the residuals) are simply the differences between the actual and estimated Y values.
Now given n pairs of observations on Y and X, we would like to determine the SRF in such a manner that it is as close as possible to the actual Y. To this end, we may adopt the following criterion: choose the SRF in such a way that the sum of the residuals

Σûi = Σ(Yi − Ŷi)

is as small as possible.

If we adopt the criterion of minimizing Σûi, the residuals û1 and û2 as well as the residuals û3 and û4 receive the same weight in the sum Σûi, although the first two residuals are much closer to the SRF than the latter two.
In other words, all the residuals receive equal importance no matter how close or how widely scattered the individual observations are from the SRF. A consequence of this is that it is quite possible that the algebraic sum of the ûi is small (even zero) although the ûi are widely scattered about the SRF.

To see this, let û1, û2, û3, and û4 assume the values 10, −2, +2, and −10, respectively. The algebraic sum of these residuals is zero although û1 and û4 are scattered more widely around the SRF than û2 and û3.

We can avoid this problem if we adopt the least-squares criterion, which states that the SRF can be fixed in such a way that

Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂1 − β̂2Xi)²

is as small as possible, where ûi² are the squared residuals. By squaring ûi, this method gives more weight to residuals such as û1 and û4 than to residuals such as û2 and û3.

As noted previously, under the minimum-Σûi criterion the sum can be small even though the ûi are widely spread about the SRF. But this is not possible under the least-squares procedure, for the larger the ûi (in absolute value), the larger the ûi².


A further justification for the least-squares method lies in the fact that the estimators obtained by it have some very desirable statistical properties.
It is obvious from the above equation that

Σûi² = f(β̂1, β̂2)

that is, the sum of the squared residuals is some function of the estimators β̂1 and β̂2. For any given set of data, choosing different values for β̂1 and β̂2 will give different û's and hence different values of Σûi².

We could, in principle, try all conceivable values of β̂1 and β̂2 and pick the pair that makes Σûi² smallest. But since time, and certainly patience, is generally in short supply, we need to consider some shortcut to this trial-and-error process.
Fortunately, the method of least squares provides us such a shortcut. The principle, or method, of least squares chooses β̂1 and β̂2 in such a manner that, for a given sample or set of data, Σûi² is as small as possible.
In other words, for a given sample, the method of least squares provides us with unique estimates of β̂1 and β̂2 that give the smallest possible value of Σûi².


This is a straightforward exercise in differential calculus: differentiating Σûi² with respect to β̂1 and β̂2 and setting the derivatives equal to zero yields the following equations for estimating β̂1 and β̂2:

ΣYi = n β̂1 + β̂2 ΣXi
ΣXiYi = β̂1 ΣXi + β̂2 ΣXi²

where n is the sample size. These simultaneous equations are known as the normal equations.
Solving the normal equations simultaneously, we obtain

β̂2 = (n ΣXiYi − ΣXi ΣYi) / (n ΣXi² − (ΣXi)²)
β̂1 = Ȳ − β̂2 X̄

where X̄ and Ȳ are the sample means of X and Y and where we define

xi = Xi − X̄   and   yi = Yi − Ȳ

Henceforth, we adopt the convention of letting the lowercase letters denote deviations from mean values.

The estimator β̂2 can be alternatively expressed in deviation form as

β̂2 = Σxiyi / Σxi²

The estimators obtained previously are known as the least-squares estimators, for they are derived from the least-squares principle.
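A minimal numerical sketch, assuming Python with numpy, of the deviation-form formulas just derived; the X and Y values are purely illustrative.

```python
# Sketch: computing the least-squares estimates from the deviation-form
# formulas beta2_hat = sum(x*y)/sum(x**2) and beta1_hat = Ybar - beta2_hat*Xbar.
import numpy as np

X = np.array([80., 100., 120., 140., 160., 180., 200., 220., 240., 260.])  # illustrative
Y = np.array([70., 65., 90., 95., 110., 115., 120., 140., 155., 150.])     # illustrative

x = X - X.mean()            # deviations x_i = X_i - Xbar
y = Y - Y.mean()            # deviations y_i = Y_i - Ybar

beta2_hat = (x * y).sum() / (x ** 2).sum()     # slope
beta1_hat = Y.mean() - beta2_hat * X.mean()    # intercept

residuals = Y - (beta1_hat + beta2_hat * X)
print("beta1_hat:", beta1_hat, "beta2_hat:", beta2_hat)
print("Sum of residuals (should be ~0):", residuals.sum())
```

The printed sum of residuals being (numerically) zero illustrates property (c) of the fitted regression line discussed below.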
Properties of estimators obtained by the method of OLS
Numerical properties are those that hold as a consequence of the use of ordinary least squares,
regardless of how the data were generated.
We will also consider the statistical properties of OLS estimators, that is, properties “that hold
only under certain assumptions about the way the data were generated”.
 The OLS estimators are expressed solely in terms of the observable (i.e., sample) quantities (i.e.,
X and Y). Therefore, they can be easily computed.
 They are point estimators; that is, given the sample, each estimator will provide only a single
(point) value of the relevant population parameter.
 Once the OLS estimates are obtained from the sample data, the sample regression line can be
easily obtained. The regression line thus obtained has the following properties:
a) It passes through the sample means of Y and X. This fact is obvious from

β̂1 = Ȳ − β̂2 X̄

for the latter can be written as

Ȳ = β̂1 + β̂2 X̄

b) The mean value of the estimated Y, that is Ŷi, is equal to the mean value of the actual Y,
for

Ŷi = β̂1 + β̂2Xi = (Ȳ − β̂2X̄) + β̂2Xi = Ȳ + β̂2(Xi − X̄)

Summing both sides of this last equality over the sample values and dividing through by
the sample size n gives the mean of the Ŷi equal to Ȳ,

where use is made of the fact that Σ(Xi − X̄) = 0.

c) The mean value of the residuals ûi is zero. The first normal equation is

Σ(Yi − β̂1 − β̂2Xi) = 0

But since ûi = Yi − β̂1 − β̂2Xi,

the preceding equation reduces to

Σûi = 0

whence the mean residual Σûi / n = 0.

As a result of the preceding property, the sample regression

Yi = β̂1 + β̂2Xi + ûi

can be expressed in an alternative form where both Y and X are expressed as deviations
from their mean values.

To see this, sum the above equation on both sides to give

ΣYi = n β̂1 + β̂2 ΣXi + Σûi = n β̂1 + β̂2 ΣXi   (since Σûi = 0)

Dividing the above equation through by n, we obtain

Ȳ = β̂1 + β̂2 X̄

which is the same as the regression line evaluated at the sample means.

Subtracting the above two equations, we obtain

Yi − Ȳ = β̂2(Xi − X̄) + ûi

or

yi = β̂2 xi + ûi

where yi and xi, following our convention, are deviations from their respective (sample) mean
values.

The equation

yi = β̂2 xi + ûi

is known as the deviation form.

Notice that the intercept term β̂1 is no longer present in it. But the intercept term can always be
estimated by

β̂1 = Ȳ − β̂2 X̄

that is, from the fact that the sample regression line passes through the sample means of Y and
X. An advantage of the deviation form is that it often simplifies computing formulas.

In passing, note that in the deviation form, the SRF can be written as

ŷi = β̂2 xi

whereas in the original units of measurement it was

Ŷi = β̂1 + β̂2 Xi

d) The residuals ûi are uncorrelated with the predicted Yi. This statement can be verified
as follows: using the deviation form, we can write

Σŷiûi = β̂2 Σxiûi = β̂2 Σxi(yi − β̂2xi) = β̂2 Σxiyi − β̂2² Σxi² = β̂2² Σxi² − β̂2² Σxi² = 0

where use is made of the fact that β̂2 = Σxiyi / Σxi².

e) The residuals ûi are uncorrelated with Xi; that is, Σûi Xi = 0.

10. Ordinary Least Squares (OLS) estimators and their properties
Linear regression models have several applications in real life. In econometrics, Ordinary Least
Squares (OLS) method is widely used to estimate the parameters of a linear regression model. For the
validity of OLS estimates, there are assumptions made while running linear regression models.

Assumptions:
1. The linear regression model is “linear in parameters.”
2. There is a random sampling of observations.
3. The conditional mean should be zero.
4. There is no multi-collinearity (or perfect collinearity).
5. Spherical errors: There is homoscedasticity and no auto-correlation
6. Optional Assumption: Error terms should be normally distributed.

These assumptions are extremely important because violation of any of these assumptions would make
OLS estimates unreliable and incorrect. Specifically, a violation would result in incorrect signs of OLS
estimates, or the variance of OLS estimates would be unreliable, leading to confidence intervals that are
too wide or too narrow.

The Gauss-Markov Theorem


The Gauss-Markov Theorem is named after Carl Friedrich Gauss and Andrey Markov.
Let the regression model be:
Y = β0 + β1X + ε

Let b0 and b1 be the OLS estimators of β0 and β1.


According to the Gauss-Markov Theorem, under assumptions 1 to 5 of the linear regression model, the OLS estimators b0 and b1 are the Best Linear Unbiased Estimators (BLUE) of β0 and β1.
In other words, the OLS estimators b0 and b1 have the minimum variance among all linear and unbiased estimators of β0 and β1.
BLUE summarizes the properties of OLS regression. These properties of OLS in econometrics are extremely important, making OLS one of the strongest and most widely used estimators for unknown parameters.
This theorem tells us that one should use the OLS estimators not only because they are unbiased but also because they have minimum variance among the class of all linear and unbiased estimators.
11. Properties of OLS Regression Estimators

Property 1: Linear
This property is more concerned with the estimator rather than the original equation that is being
estimated. In assumption A1, the focus was that the linear regression should be “linear in parameters.”
However, the linear property of OLS estimator means that OLS belongs to that class of estimators,
which are linear in Y, the dependent variable. Note that OLS estimators are linear only with respect to
the dependent variable and not necessarily with respect to the independent variables.
The linear property of OLS estimators doesn’t depend only on assumption A1 but on all assumptions A1 to A5.

Property 2: Unbiasedness
If we look at the regression equation, we will find an error term associated with the regression
equation that is estimated. This makes the dependent variable also random. If an estimator uses the
dependent variable, then that estimator would also be a random number. Therefore, before describing
what unbiasedness is, it is important to mention that unbiasedness property is a property of the estimator
and not of any sample.
Unbiasedness is one of the most desirable properties of any estimator. The estimator should
ideally be an unbiased estimator of true parameter/population values.

Example: Suppose there is a population of size 1000, and we are taking samples of 50 from this population to estimate the population parameters. Every time we take a sample, it will have a different set of 50 observations and, hence, we would estimate different values of b0 and b1. The unbiasedness property of the OLS method says that when we take samples of 50 repeatedly, the average of all the b0 and b1 estimates from the samples will equal the actual (or population) values of β0 and β1.
Mathematically,
E(b0) = β0
E(b1) = β1
Here, ‘E’ is the expectation operator.

In layman’s term, if we take out several samples, keep recording the values of the estimates, and then
take an average, we will get very close to the correct population value. If our estimator is biased, then
the average will not equal the true parameter value in the population.
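A minimal simulation sketch, assuming Python with numpy, of this repeated-sampling idea: draw many samples of 50 from a population generated with known parameters, estimate by OLS each time, and average the estimates. All numbers (population size, true β0 and β1, error variance) are illustrative assumptions.

```python
# Sketch: Monte Carlo illustration of the unbiasedness of the OLS estimators.
import numpy as np

rng = np.random.default_rng(0)
beta0_true, beta1_true = 2.0, 0.8
population_x = rng.uniform(10, 100, size=1000)      # a population of size 1000

estimates = []
for _ in range(5000):                                # repeated samples of size 50
    idx = rng.choice(1000, size=50, replace=False)
    x = population_x[idx]
    y = beta0_true + beta1_true * x + rng.normal(0, 5, size=50)
    b1 = np.cov(x, y, bias=True)[0, 1] / x.var()     # OLS slope
    b0 = y.mean() - b1 * x.mean()                    # OLS intercept
    estimates.append((b0, b1))

b0_avg, b1_avg = np.mean(estimates, axis=0)
print("average b0:", round(b0_avg, 3), "average b1:", round(b1_avg, 3))
# Both averages land very close to the true values 2.0 and 0.8.
```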
The unbiasedness property of OLS in Econometrics is the basic minimum requirement to be
satisfied by any estimator. However, it is not sufficient for the reason that most times in real-life
applications, we will not have the luxury of taking out repeated samples. In fact, only one sample will
be available in most cases.
Property 3: Best: Minimum Variance
First, let us look at what efficient estimators are. The efficient property of any estimator says that
the estimator is the minimum variance unbiased estimator. Therefore, if we take all the unbiased
estimators of the unknown population parameter, the estimator will have the least variance. The
estimator that has less variance will have individual data points closer to the mean.
As a result, they will be more likely to give better and accurate results than other estimators
having higher variance. In short:

 If the estimator is unbiased but doesn’t have the least variance – it’s not the best!
 If the estimator has the least variance but is biased – it’s again not the best!
 If the estimator is both unbiased and has the least variance – it’s the best estimator.

Now, talking about OLS, OLS estimators have the least variance among the class of all linear
unbiased estimators. So, this property of OLS regression is less strict than efficiency property.
Efficiency property says least variance among all unbiased estimators, and OLS estimators have the
least variance among all linear and unbiased estimators.

Mathematically,
Let b0 be the OLS estimator, which is linear and unbiased. Let b0* be any other estimator of β0, which is also linear and unbiased. Then,
Var(b0) < Var(b0*)
Let b1 be the OLS estimator, which is linear and unbiased. Let b1* be any other estimator of β1, which is also linear and unbiased. Then,
Var(b1) < Var(b1*)

The above three properties of OLS model makes OLS estimators BLUE as mentioned in the
Gauss-Markov theorem.

It is worth spending time on some other estimators’ properties of OLS in econometrics. The
properties of OLS described below are asymptotic properties of OLS estimators. So far, finite sample
properties of OLS regression were discussed. These properties tried to study the behavior of the OLS
estimator under the assumption that we can have several samples and, hence, several estimators of the
same unknown population parameter. In short, the properties were that the average of these estimators
in different samples should be equal to the true population parameter (unbiasedness), or the average
distance to the true parameter value should be the least (efficient). However, in real life, we will often
have just one sample. Hence, asymptotic properties of OLS model are discussed which studies how
OLS estimators behave as sample size increases. Keep in mind that sample size should be large.
Property 4: Asymptotic Unbiasedness
This property of OLS says that as the sample size increases, the biasedness of OLS estimators
disappears.

Property 5: Consistency
An estimator is said to be consistent if its value approaches the actual, true parameter (population)
value as the sample size increases. An estimator is consistent if it satisfies two conditions:
a) It is asymptotically unbiased
b) Its variance converges to 0 as the sample size increases.
Both these hold true for OLS estimators and, hence, they are consistent estimators. For an estimator to
be useful, consistency is the minimum basic requirement. Since there may be several such estimators,
asymptotic efficiency also is considered.
Asymptotic efficiency is the sufficient condition that makes OLS estimators the best estimators.

Note:
Applications and How it Relates to Study of Econometrics
OLS estimators, because of such desirable properties discussed above, are widely used and find
several applications in real life.

Example:
Consider a bank that wants to predict the exposure of a customer at default. The bank can take the
exposure at default to be the dependent variable and several independent variables like customer level
characteristics, credit history, type of loan, mortgage, etc. The bank can simply run OLS regression and
obtain the estimates to see which factors are important in determining the exposure at default of a
customer. OLS estimators are easy to use and understand. They are also available in various statistical
software packages and can be used extensively.
OLS regressions form the building blocks of econometrics. Any econometrics class will start with
the assumption of OLS regressions. It is one of the favorite interview questions for jobs and university
admissions. Based on the building blocks of OLS, and relaxing the assumptions, several different
models have come up like GLM (generalized linear models), general linear models, heteroscedastic
models, multi-level regression models, etc.
Research in Economics and Finance are highly driven by Econometrics. OLS is the building
block of Econometrics. However, in real life, there are issues, like reverse causality, which render OLS
irrelevant or not appropriate. However, OLS can still be used to investigate the issues that exist in cross-
sectional data. Even if OLS method cannot be used for regression, OLS is used to find out the problems,
the issues, and the potential fixes.

Link:
https://www.albert.io/blog/ultimate-properties-of-ols-estimators-guide/
UNIT – II

1. Assumptions of classical linear regression model (CLRM)


Econometric techniques are used to estimate economic models, which ultimately allow us to explain how various factors affect some outcome of interest or to forecast future events. The ordinary least squares (OLS) technique is the most popular method of performing regression analysis and estimating econometric models, because in standard situations (meaning the model satisfies a series of statistical assumptions) it produces optimal (the best possible) results.
If our objective is to estimate β1 and β2 only, the method of OLS will suffice. But in regression analysis our objective is not only to obtain β̂1 and β̂2 but also to draw inferences about the true β1 and β2.

For example, we would like to know how close β̂1 and β̂2 are to their counterparts in the population, or how close Ŷi is to the true E(Y | Xi).


To that end, we must not only specify the functional form of the model, but also make certain
assumptions about the manner in which Yi are generated.
To see why this requirement is needed, look at the

PRF: Yi = β1 + β2Xi + ui

It shows that Yi depends on both Xi and ui.

Therefore, unless we are specific about how Xi and ui are created or generated, there is no way we
can make any statistical inference.
In this lesson, we will study the various methods through which regression models draw inferences about the various parameters. Basically, there are three approaches:

1. The classical linear regression model (CLRM)


2. Generalized least square (GLS).
3. Maximum Likelihood estimation (ML)

The proof that OLS generates the best results is known as the Gauss-Markov theorem, but the
proof requires several assumptions. These assumptions, known as the classical linear regression model
(CLRM) assumptions, are the following:
Assumption 1: Linear regression model.
The regression model is linear in the parameters,
Yi = β1 + β2Xi + ui

Assumption 2: X values are fixed in repeated sampling.


Values taken by the regressor X are considered fixed in repeated samples. More technically, X is
assumed to be nonstochastic.

Assumption 3: Zero mean value of disturbance ui.


Given the value of X, the mean, or expected, value of the random disturbance term ui is zero.
Technically, the conditional mean value of ui, is zero. Symbolically, we have
E(ui |Xi)= 0

Assumption 4: Homoscedasticity or equal variance of ui.


Given the value of X, the variance of ui is the same for all observations. That is, the conditional variances of ui are identical. Symbolically, we have
var(ui | Xi) = E[ui − E(ui | Xi)]²
            = E(ui² | Xi)   because of Assumption 3
            = σ²
where var stands for variance.

Assumption 5: No autocorrelation between the disturbances.


Given any two X values, Xi and Xj (i ≠ j), the correlation between any two ui and uj (i ≠ j) is zero.
Symbolically,
cov(ui, uj | Xi, Xj) = E{[ui − E(ui)] | Xi}{[uj − E(uj)] | Xj}
                    = E(ui | Xi) E(uj | Xj)
                    = 0
where i and j are two different observations and where cov means covariance.

Assumption 6: Zero covariance between ui and Xi, or E(uiXi) = 0.


Formally,
cov(ui, Xi) = E[ui − E(ui)][Xi − E(Xi)]
            = E[ui (Xi − E(Xi))]        since E(ui) = 0
            = E(uiXi) − E(Xi)E(ui)      since E(Xi) is nonstochastic
            = E(uiXi)                   since E(ui) = 0
            = 0                         by assumption
Assumption 7: The number of observations n must be greater than the number of parameters to be estimated.
Alternatively, the number of observations n must be greater than the number of explanatory variables.

Assumption 8: Variability in X values.


The X values in a given sample must not all be the same.
Technically, var (X) must be a finite positive number.

Assumption 9: The regression model is correctly specified.


Alternatively, there is no specification bias or error in the model used in empirical analysis.

Assumption 10: There is no perfect multicollinearity.


That is, there are no perfect linear relationships among the explanatory variables.

2. Meaning of Heteroskedasticity
One of the assumptions made about residuals/errors in OLS regression is that the errors have the
same but unknown variance. This is known as constant variance or homoscedasticity. When this
assumption is violated, the problem is known as heteroscedasticity.

3. Detection of Heteroskedasticity - Tests for detecting Heteroscedasticity


There are a large number of tests available to test for heteroscedasticity. However, some of those
tests are used more often than others.
The most important heteroscedasticity tests, their limitations and their uses are as follows:

Heteroscedasticity test: Graphical method
Limitations: This method involves eyeballing the graphs. The pattern between residuals and independent variable/fitted values may not always be clear. Moreover, it might be affected by the subjective opinions of researchers.
When to use: It is generally advised to implement the residual vs fitted plots or residual vs independent variable graphs after every regression. These graphs can provide important insights into the behaviour of residuals and heteroscedasticity.

Heteroscedasticity test: Breusch-Pagan-Godfrey test
Limitations: It is very sensitive to the assumption of normal distribution of the residuals.
When to use: This test can be employed if the error term or residuals are normally distributed. Therefore, it is advisable to check the normality of residuals before implementing this test.

Heteroscedasticity test: White’s heteroscedasticity test
Limitations: It generally requires a large sample because the number of variables, including the squared and cross-product terms, can be huge, which can restrict degrees of freedom. This is also a test of specification errors; therefore, the test statistic may be significant due to specification error rather than heteroscedasticity.
When to use: This test does not depend on the normality of error terms and it does not require choosing ‘c’ or ordering observations. Hence, it is easy to implement. However, it is important to keep the limitations in mind.

Other important tests

Heteroscedasticity test: Goldfeld-Quandt test
Limitations: ‘c’, the number of central observations to be omitted, must be carefully chosen; a wrong choice may lead to unreliable results. For multiple independent variables, it becomes difficult to know beforehand which variable should be chosen to order the observations. Separate testing is required to determine which variable is appropriate.
When to use: This test is preferred over the Park test and Glejser’s test. Goldfeld and Quandt suggested that c = 8 and c = 16 are usually appropriate for around n = 30 and n = 60 observations respectively.

Heteroscedasticity test: Park test
Limitations: The error term within the test itself may be heteroscedastic, leading to unreliable results.
When to use: This test can be used as a precursor to the Goldfeld-Quandt test, to determine which variable is appropriate to order observations.

Heteroscedasticity test: Glejser’s heteroscedasticity test
Limitations: Similar to the Park test, the error term within the test may be heteroscedastic.
When to use: This test has been observed to give satisfactory results in large samples. However, the most important usage of this test is to determine the functional form of heteroscedasticity before implementing Weighted Least Squares (WLS). It can be used to determine the weights in WLS, depending on the relationship between the variance of residuals and independent variables.

Note: Detection of Heteroskedasticity (OR) Test for Heteroskedasticity

i. RESET by Ramsey (1969):


Steps: 1. For the given data, fit the model yi = βxi + ui.
2. Estimate ŷi and obtain ŷi², ŷi³, … together with the residuals ûi.
3. Regress ûi on ŷi², ŷi³, …
4. If the coefficients are significant, heteroskedasticity is indicated.

ii. White Test (1980):


Steps: 1. Consider yi = α + β1x1 + β2x2 + β3x3 + ui.
2. Estimate the coefficients.
3. Compute the residuals ûi.
4. Regress ûi² on x1, x2, x3, x1², x2², x3², x1x2, x2x3, and x3x1.
5. If the coefficients are jointly significant, heteroskedasticity is indicated.

iii. Breusch and Pagan Test (1979):


Steps: 1. In the above data table, form the first group from the 6th, 11th, 9th, 4th, 14th, 15th, 19th, 20th, 1st and 16th observations.
2. The remaining observations form the second group.
3. For the first group fit Y = a + bX.
4. For the second group fit Y = a + bX.
5. Compute F = σ̂2² / σ̂1², the ratio of the residual variances of the two fitted models.
6. If F > Ftab, reject H0: the data are homoscedastic.
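A minimal sketch, assuming Python with statsmodels, of how the Breusch-Pagan and White tests can be run in practice; the data are simulated with an error spread that grows with one regressor, so both tests should flag heteroskedasticity.

```python
# Sketch: Breusch-Pagan and White tests on a simulated heteroscedastic model.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(1, 10, size=n)
x2 = rng.uniform(0, 5, size=n)
u = rng.normal(0, 0.5 * x1)                 # error spread grows with x1 -> heteroscedastic
y = 3.0 + 2.0 * x1 - 1.5 * x2 + u

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

bp_lm, bp_pval, bp_f, bp_f_pval = het_breuschpagan(res.resid, res.model.exog)
w_lm, w_pval, w_f, w_f_pval = het_white(res.resid, res.model.exog)

# Small p-values mean the null of homoscedasticity is rejected.
print("Breusch-Pagan LM p-value:", round(bp_pval, 4))
print("White test LM p-value:   ", round(w_pval, 4))
```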
4. Consequences of Heteroskedasticity
 The OLS estimators and regression predictions based on them remains unbiased and consistent.
 The OLS estimators are no longer the BLUE (Best Linear Unbiased Estimators) because they are
no longer efficient, so the regression predictions will be inefficient too.
 Because of the inconsistency of the covariance matrix of the estimated regression coefficients, the
tests of hypotheses, (t-test, F-test) are no longer valid.

The major consequences of heteroscedasticity can be summarized as follows:

 Validity of statistical inference


 Standard errors, confidence intervals, p-values and other tests of significance are no longer
reliable in the presence of heteroscedasticity. This is because OLS standard errors assume
constant variance of residuals.
 The tests of significance are based on these standard errors. In heteroscedasticity, error variance
is non-constant, therefore, OLS standard errors are not applicable. As a result, it is not advisable
to rely on confidence intervals and p-values.

 The variance of estimates


 OLS estimates no longer have minimum variance property because the variance of residuals is
not constant.
 The coefficients end up having larger standard errors and lower precision in the presence of
heteroscedasticity. Hence, OLS estimators become inefficient in the presence of
heteroscedasticity.

 Predictions
 The forecasted or predicted values of the dependent variable based on a heteroscedastic model
will have high variance. This is because the OLS estimates are no longer efficient.
 The variance of residuals is not minimum in presence of heteroscedasticity due to which the
variance of predictions is also high.

 Biasedness
 The unbiasedness property of OLS estimates does not require a constant variance of residuals.
 However, the predictions from a heteroscedastic model can still end up being biased, especially
in the case of observations with large residuals.
Note:
In the presence of heteroskedasticity, the least squares estimators are affected in the following manner.
i. The least squares estimators are still unbiased but inefficient.
ii. The estimates of their variances are also biased.

Consider a very simple model with no constant term:

yi = β xi + ui    …(1)
V(ui) = σi²

The least squares estimator of β is

β̂ = Σxiyi / Σxi² = β + Σxiui / Σxi²

If E(ui) = 0 and the ui are independent of the xi, we have E(Σxiui / Σxi²) = 0 and hence E(β̂) = β.

Thus, β̂ is unbiased.

If the ui are mutually independent, denoting Σxi² by Sxx, we can write

V(β̂) = V[(x1/Sxx)u1 + (x2/Sxx)u2 + … + (xn/Sxx)un]
     = (1/Sxx²)(x1²σ1² + x2²σ2² + … + xn²σn²)    …(2)
     = Σxi²σi² / (Σxi²)²

Suppose that we write σi² = σ²zi², where the zi are known, that is, we know the variances up to a multiplicative constant. Then, dividing (1) by zi, we have the model

yi/zi = β(xi/zi) + vi    …(3)

where vi = ui/zi has a constant variance σ². Since we are “weighting” the ith observation by 1/zi, the OLS estimation of (3) is called weighted least squares (WLS). If β* is the WLS estimator of β, we have

β* = Σ(yi/zi)(xi/zi) / Σ(xi/zi)² = β + Σ(xi/zi)vi / Σ(xi/zi)²

and, since the latter term has expectation zero, we have E(β*) = β.


Thus the WLS estimator β* is also unbiased.

We will show that β* is more efficient than the OLS estimator β̂.


We have

V(β*) = σ² / Σ(xi/zi)²

and substituting σi² = σ²zi² into equation (2), we have

V(β̂) = σ² Σxi²zi² / (Σxi²)²

so that

V(β*) / V(β̂) = (Σxi²)² / [Σ(xi/zi)² · Σxi²zi²]

This expression is of the form (Σaibi)² / (Σai² Σbi²), where ai = xizi and bi = xi/zi.

By the Cauchy-Schwarz inequality, it is less than 1 and is equal to 1 only if ai and bi are proportional, that is, xizi and xi/zi are proportional, or zi² is a constant, which is the case if the errors are homoscedastic.
Thus the OLS estimator is unbiased but less efficient (has a higher variance) than the WLS estimator.

As for the estimation of the variance of β̂, it is usually estimated by

RSS / [(n − 1) Σxi²]

where RSS is the residual sum of squares from the OLS model. But

E(RSS) = E[Σ(yi − β̂xi)²] = Σσi² − Σxi²σi² / Σxi²

Note that, if σi² = σ² for all i, this reduces to (n − 1)σ². Thus we would be estimating the variance of β̂ by an expression whose expected value is

(Σxi² Σσi² − Σxi²σi²) / [(n − 1)(Σxi²)²]

whereas the true variance is

Σxi²σi² / (Σxi²)²

Thus the estimated variances are also biased.
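A minimal sketch, assuming Python with statsmodels, of this comparison: the error standard deviation is taken to be proportional to a known zi (here zi = xi, an illustrative assumption), and the model has no constant term, as above.

```python
# Sketch: OLS vs weighted least squares when sigma_i = sigma * z_i with z_i known.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=300)
z = x                                      # assumed known form of the heteroscedasticity
u = rng.normal(0, 1.0, size=300) * z       # heteroscedastic errors
y = 2.5 * x + u                            # model with no constant term

ols_res = sm.OLS(y, x).fit()                       # unbiased but inefficient here
wls_res = sm.WLS(y, x, weights=1.0 / z**2).fit()   # weights proportional to 1/sigma_i^2

print("OLS beta:", round(ols_res.params[0], 3), " std err:", round(ols_res.bse[0], 3))
print("WLS beta:", round(wls_res.params[0], 3), " std err:", round(wls_res.bse[0], 3))
# The WLS standard error is typically much smaller, illustrating its greater efficiency.
```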


5. Nature of Heteroscedasticity
 Error-learning models: as people learn, their errors of behaviour become smaller over time.
 As incomes grow, people have more discretionary income and hence more scope for choice about the disposition of their income.
 As data-collecting techniques improve, the error variance σi² is likely to decrease.
 It can also arise as a result of the presence of collinearity among regressors.
 Skewness in the distribution of one or more regressors included in the model.
 Incorrect data transformation.
 Incorrect functional form.

6. Causes of Heteroscedasticity

i. Outliers
 Outliers are specific values within a sample that are extremely different (very large or small)
from other values.
 Outliers can alter the results of regression models and cause heteroscedasticity.
 Outlying observations can often lead to a non-constant variance of residuals.

ii. Mis-specification of the model


Incorrect specification can lead to heteroscedastic residuals. For example, if an important variable
is excluded from the model, its effects get captured in the error terms. In such a case, the residuals
might exhibit non-constant variance because they end up accounting for the omitted variable.

iii. Wrong Functional form


 Misspecification of the model’s functional form can cause heteroscedasticity.
 Suppose, the actual relationship between the variables is non-linear in nature.
 If we estimate a linear model for such variables, we might observe its effects in the residuals in
the form of heteroscedasticity.

iv. Error-learning
 Let us consider an example for this case. Errors in human behaviour become smaller over time
with more practice or learning of an activity.
 In such a case, errors will tend to decrease. For example, with the skill development of labour,
their error will decrease leading to lower defective products in the manufacturing process or a
rise in their productivity. Hence, error variance will decrease in such a setup.

v. Nature of variables
 For instance, an increase in income is accompanied by an increase in choices to spend that extra
income.
 This leads to discretionary expenditure. In such a case, error variance will increase with an
increase in income.
 A model with consumption as a dependent variable and income as an independent variable can
have an increasing error variance. Hence, the nature of variables and their relationships can play
a huge role in this phenomenon.

7. Solutions to the problem of Heteroscedasticity


i. Robust Standard Errors
 The usual OLS standard errors assume constant variance of residuals and cannot be used in
heteroscedasticity.
 Instead, robust standard errors can be employed in such cases. These allow a non-constant
variance of residuals to estimate the standard errors of coefficients.
 The HC3 robust standard errors have been observed to perform well under heteroscedasticity (a short sketch follows below).
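A minimal sketch, assuming Python with statsmodels, of requesting heteroscedasticity-consistent (HC3) standard errors instead of the usual OLS ones; the data are simulated purely for illustration.

```python
# Sketch: classical vs HC3 robust standard errors for the same OLS fit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, size=150)
y = 1.0 + 0.5 * x + rng.normal(0, 0.3 * x)    # heteroscedastic errors (illustrative)

X = sm.add_constant(x)
usual = sm.OLS(y, X).fit()                     # classical (non-robust) standard errors
robust = sm.OLS(y, X).fit(cov_type="HC3")      # heteroscedasticity-consistent standard errors

print("usual SEs:", np.round(usual.bse, 4))
print("HC3 SEs:  ", np.round(robust.bse, 4))
```

The point estimates are identical in both fits; only the standard errors, and hence the t-statistics and p-values, change.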

ii. Weighted Least Squares


 The Weighted Least Squares technique is a special application of Generalized Least Squares.
 The variables in the model are transformed by assigning weights in such a manner that the
variance of residuals becomes constant.
 The weights are determined by understanding the underlying relationship or form of
heteroscedasticity.
 Glejser’s heteroscedasticity test can be used to determine the heteroscedastic relationship
between residuals and independent variables.

iii. Transforming the variables


 Transforming the variables can help reduce or even completely eliminate heteroscedasticity. For example, using a log transformation reduces the scale of all the variables.
 As a result, the scale of residual variance also decreases which helps mitigate the problem of
heteroscedasticity.
 In some cases, the problem of heteroscedasticity may be totally eliminated. In addition, log
transformation provides its own benefits in economics by letting us study the elasticities of
variables.
iv. Non-parametric methods
 Non-parametric regression techniques make no assumptions about the relationship between
dependent and independent variables.
 In OLS, the assumption of constant residual variance is violated by heteroscedasticity. However,
there are no assumptions to violate in the case of non-parametric methods.
 Estimation techniques such as Kernel regression, Splines and Random Forest fall under the
category of non-parametric methods. These provide a lot more flexibility in estimating complex
relationships.

Note 1: Solutions to the Heteroskedasticity: Box– Cox Test


The Box Cox transformation is named after statisticians George Box and Sir David Roxbee
Cox who collaborated on a 1964 paper and developed the technique.
A Box Cox transformation is a transformation of non-normal dependent variables into a normal
shape. Normality is an important assumption for many statistical techniques; if our data isn’t normal,
applying a Box-Cox means that we are able to run a broader number of tests.
Box-Cox transformation is a statistical technique that transforms our target variable so that our
data closely resembles a normal distribution.
In many statistical techniques, we assume that the errors are normally distributed. This
assumption allows us to construct confidence intervals and conduct hypothesis tests.
The problem of heteroskedasticity will often be reduced if we consider log Y instead of Y, i.e., instead of Y = a + bX, use log Y = a + b log X.
To compare the linear and log-linear forms, R² cannot be used directly. One solution is to use the Box-Cox test.
Box and Cox consider the regression model yi(λ) = βxi + ui, where ui ~ N(0, σ²).

ML Method suggested by Box-Cox


1. Divide each y by the geometric mean of the y’s.
2. Compute Y(λ) for different values of λ.
3. Regress Y(λ) on X.
4. Compute the residual sum of squares.
5. Choose the value of λ for which the residual sum of squares is minimum. This is the MLE of λ.
6. The formula for computing Y(λ) is
   Y(λ) = (y^λ − 1)/λ   for λ ≠ 0
   Y(λ) = log y         for λ = 0

Common Box-Cox Transformations


Lambda value (λ) : Transformed data (Y′)
-3   : Y^(-3) = 1/Y^3
-2   : Y^(-2) = 1/Y^2
-1   : Y^(-1) = 1/Y
-0.5 : Y^(-0.5) = 1/√Y
0    : log(Y)**
0.5  : Y^(0.5) = √Y
1    : Y^1 = Y
2    : Y^2
3    : Y^3
**Note: the transformation used for λ = 0 is log(Y); otherwise all data would transform to Y^0 = 1.
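A minimal sketch, assuming Python with numpy, of the grid-search version of the ML procedure listed above: scale y by its geometric mean, transform it for a range of λ values, regress on X each time, and keep the λ with the smallest residual sum of squares. The data and the λ grid are illustrative assumptions.

```python
# Sketch: choosing the Box-Cox lambda by minimizing the residual sum of squares.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, size=100)
y = np.exp(0.3 * x + rng.normal(0, 0.2, size=100))   # positive, right-skewed response

def boxcox_transform(vals, lam):
    return np.log(vals) if abs(lam) < 1e-12 else (vals**lam - 1.0) / lam

gm = np.exp(np.mean(np.log(y)))          # geometric mean of y
y_scaled = y / gm                        # step 1: divide by the geometric mean

X = np.column_stack([np.ones_like(x), x])
best_lam, best_rss = None, np.inf
for lam in np.arange(-2.0, 2.01, 0.1):                   # step 2: candidate lambdas
    yt = boxcox_transform(y_scaled, lam)                 # step 2: compute Y(lambda)
    beta, rss, *_ = np.linalg.lstsq(X, yt, rcond=None)   # step 3: regress Y(lambda) on X
    rss = rss[0] if rss.size else np.sum((yt - X @ beta) ** 2)   # step 4: residual sum of squares
    if rss < best_rss:
        best_lam, best_rss = lam, rss                    # step 5: keep the minimizing lambda

print("lambda with minimum RSS:", round(best_lam, 2))    # a value near 0 suggests a log transform
```

scipy.stats.boxcox offers a ready-made maximum-likelihood estimate of λ, so in practice this manual grid search is mainly useful for seeing how the procedure works.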

Note 2:
Heteroskedasticity is Greek for data with a different dispersion. For example, in statistics, If a
sequence of random variables has the same finite variance, it is called homoskedastic dispersion; if a
sequence does not have the same variance, it is known as heteroscedastic dispersion.
Dispersion is a means of describing the extent of distribution of data around a central value or
point. Lower dispersion indicates higher precision in data measurements, whereas higher dispersion
means lower accuracy.

A residual spread is plotted on a graph to visualize the correlation between data and a particular
statistical model. If the dispersion is heteroscedastic, it is seen as a problem. Researchers look for
homoskedasticity. The data points should have a constant variance to satisfy the researcher’s
assumption.
In residual plots, heteroskedasticity in regression is cone-shaped. In scatter plots, variance
increases with the increase in fitted value. For cross-sectional studies like income, the range is from
poverty to high-income citizens; when plotted on a graph, the data is heteroskedastic.
Heteroskedasticity is categorized into two types.
 Pure heteroskedasticity
 Impure heteroskedasticity
In pure heteroskedastic dispersion, the chosen statistical model is correct. With impure (residual) heteroskedastic dispersion, errors are observed; as a result, the statistical model is incorrect for the given data. These errors cause the variance.
Methods
There are three methods to fix heteroskedasticity and improve the model –
 Redefining variables
 Weighted regression
 Transform the dependent variable
In the first method, the analyst can redefine the variables to improve the model and get desired
results with accuracy. In the second method, the regression analysis is appropriately weighted. Finally,
the third approach is to interchange the working in every model. For example, there is a dependent
variable and a predictor variable; by changing the dependent variable, the whole model gets revamped.
Thus, it is an important approach to move forward.
Independent variables used in regression analyses are referred to as the predictor variable. The
predictor variable provides information on an associated dependent variable regarding a particular
outcome.
It is imperative to determine whether data exhibits pure heteroskedastic dispersion or impure
heteroskedastic dispersion—the approach varies for each subtype. Furthermore, improving the variables
used in the impure form is important. If ignored, these variables cause bias in coefficient accuracy,
and p-values become smaller than they should be.

Causes
Heteroskedastic dispersion is caused due to the following reasons.
 It occurs in data sets with large ranges and oscillates between the largest and smallest
values.
 It occurs due to a change in factor proportionality.
 Among other reasons, the nature of the variable can be a major cause.
 It majorly occurs in cross-sectional studies.
 Some regression models are prone to heteroskedastic dispersion.
 An improper selection of regression models can cause it.
 It can also be caused by data set formations and inefficiency of calculations as well.

Examples
Example: 1
The most basic heteroskedastic example is household consumption. The variance in consumption
increases with an increase in income—directly proportional. Because when the income is low, the
variance in consumption is also low. Low-income people spend predominantly on necessary items and
bills—less variance. In contrast, with the increase in income, people tend to buy luxurious items and
develop a plethora of habits—less predictable.
Example: 2
A coach correlating runs scored by each player with time spent in training is an example of
homoskedasticity. In this case, the scores would become the dependent variable, and training time is the
predictor variable.

Example: 3
The basic application of heteroskedasticity is in the stock market—variance in stock is compared for
different dates. In addition, investors use heteroskedastic dispersion in regression models to track
securities and portfolios. Heteroskedastic dispersion can or cannot be predicted depending on the
particular situation taken into a study.

For example, when the prices of a product are studied at the launch of a new model, heteroskedasticity
is predictable. But for rainfall or income comparisons, the nature of dispersion cannot be predicted.

Note 3: Testing for Homogeneity of Variance


Tests that we can run to check our data meets this assumption include:
 Bartlett’s Test,
 Box’s M Test
 Brown-Forsythe Test
 Hartley’s Fmax test
 Levene’s Test

8. Testing the linear versus log linear functional form


The choice between a linear regression model (the regressand is a linear function of the
regressors) or a log–linear regression model (the log of the regressand is a function of the logs of the
regressors) is a perennial question in empirical analysis.
We can use a test proposed by MacKinnon, White, and Davidson, which for brevity we call the
MWD test, to choose between the two models.

To illustrate this test, assume the following

H0: Linear Model: Y is a linear function of regressors, the X’s.


H1: Log–Linear Model: ln Y is a linear function of logs of regressors, the logs of X’s.

where, as usual, H0 and H1 denote the null and alternative hypotheses.

The MWD test involves the following steps:


Step I   : Estimate the linear model and obtain the estimated Y values. Call them Yf (i.e., Ŷi).
Step II  : Estimate the log-linear model and obtain the estimated ln Y values; call them ln f (i.e., ln Ŷi).
Step III : Obtain Z1 = (ln Yf − ln f).
Step IV  : Regress Y on the X’s and the Z1 obtained in Step III. Reject H0 if the coefficient of Z1 is statistically significant by the usual ‘t’ test.
Step V   : Obtain Z2 = (antilog of ln f − Yf).
Step VI  : Regress ln Y on the logs of the X’s and Z2. Reject H1 if the coefficient of Z2 is statistically significant by the usual ‘t’ test.

Although the MWD test seems involved, the logic of the test is quite simple. If the linear model is
in fact the correct model, the constructed variable Z 1 should not be statistically significant in Step IV,
for in that case the estimated Y values from the linear model and those estimated from the log–linear
model (after taking their antilog values for comparative purposes) should not be different. The same
comment applies to the alternative hypothesis H1.
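A minimal sketch, assuming Python with statsmodels, of the MWD steps above on simulated data; because the data here are generated from a log-linear relationship, the test should tend to reject the linear specification (H0) and not the log-linear one (H1).

```python
# Sketch: MacKinnon-White-Davidson (MWD) test for linear vs log-linear form.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, size=200)
y = np.exp(1.0 + 0.7 * np.log(x) + rng.normal(0, 0.1, size=200))        # log-linear truth

lin_fit = sm.OLS(y, sm.add_constant(x)).fit()                  # Step I: linear model -> Yf
log_fit = sm.OLS(np.log(y), sm.add_constant(np.log(x))).fit()  # Step II: log-linear model -> ln f

Yf = lin_fit.fittedvalues
lnf = log_fit.fittedvalues

Z1 = np.log(Yf) - lnf                                          # Step III
step4 = sm.OLS(y, sm.add_constant(np.column_stack([x, Z1]))).fit()
print("t-stat on Z1 (reject H0 if significant):", round(step4.tvalues[-1], 2))   # Step IV

Z2 = np.exp(lnf) - Yf                                          # Step V
step6 = sm.OLS(np.log(y), sm.add_constant(np.column_stack([np.log(x), Z2]))).fit()
print("t-stat on Z2 (reject H1 if significant):", round(step6.tvalues[-1], 2))   # Step VI
```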

UNIT – III

1. Meaning of Autocorrelation
Autocorrelation refers to the degree of correlation of the same variables between two successive
time intervals. It measures how the lagged version of the value of a variable is related to the original
version of it in a time series.
Autocorrelation, as a statistical concept, is also known as serial correlation. It is often used with
the autoregressive-moving-average model (ARMA) and autoregressive-integrated-moving-average
model (ARIMA). The analysis of autocorrelation helps to find repeating periodic patterns, which can be
used as a tool for technical analysis in the capital markets.
Autocorrelation gives information about the trend of a set of historical data so that it can be useful
in the technical analysis for the equity market.
However, autocorrelation can also occur in cross-sectional data when the observations are related
in some other way.
The value of autocorrelation ranges from -1 to 1. A value between -1 and 0 represents negative
autocorrelation. A value between 0 and 1 represents positive autocorrelation.

Examples:
 In a survey, for instance, one might expect people from nearby geographic locations to provide more
similar answers to each other than people who are more geographically distant. Similarly, students
from the same class might perform more similarly to each other than students from different classes.
 Assume an investor is looking to discern if a stock's returns in her portfolio exhibit autocorrelation;
the stock's returns are related to its returns in previous trading sessions. If the returns do exhibit
autocorrelation, the stock could be characterized as a momentum stock; its past returns seem to
influence its future returns. The investor runs a regression with two prior trading sessions' returns as
the independent variables and the current return as the dependent variable. She finds that returns one
day prior have a positive autocorrelation of 0.7, while the returns two days prior have a positive
autocorrelation of 0.3. Past returns seem to influence future returns, and she can adjust her portfolio
to take advantage of the autocorrelation and resulting momentum.

Note 1: Autocorrelation function (ACF)


The coefficient of correlation between two values in a time series is called the autocorrelation
function (ACF).

For example the ACF for a time series is given by:


Corr ( yt , yt−k ), k = 1, 2, ....
This value of ‘k’ is the time gap being considered and is called the lag.
A lag 1 autocorrelation (i.e., k = 1 in the above) is the correlation between values that are one time
period apart.
More generally, a lag ‘k’ autocorrelation is the correlation between values that are ‘k’ time periods
apart.
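A minimal sketch, assuming Python with pandas, of computing lag-k autocorrelations of a series; the mildly persistent series below is simulated purely for illustration.

```python
# Sketch: lag-k autocorrelations of a series using pandas.
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
noise = pd.Series(rng.normal(0, 1, size=250))
series = noise.rolling(3).mean().dropna()     # smoothing induces some autocorrelation

for k in (1, 2, 3):
    print(f"lag-{k} autocorrelation: {series.autocorr(lag=k):.3f}")
```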
Note 2: Types of Autocorrelation
 The most common form of autocorrelation is first-order serial correlation, which can either be
positive or negative.
 Positive serial correlation is where a positive error in one period carries over into a positive error for
the following period.
 Negative serial correlation is where a negative error in one period carries over into a negative error
for the following period.
 Second-order serial correlation is where an error affects data two time periods later. This can happen when our data has seasonality. Orders higher than second order do occur, but they are rare.

2. Sources of Autocorrelation
Source of autocorrelation Some of the possible reasons for the introduction of autocorrelation in
the data are as follows:
 Carryover of effect, at least in part, is an important source of autocorrelation. For example, the
monthly data on expenditure on household is influenced by the expenditure of preceding month.
The autocorrelation is present in cross-section data as well as time-series data. In the cross-
section data, the neighbouring units tend to be similar with respect to the characteristic under
study. In time-series data, time is the factor that produces autocorrelation. Whenever some
ordering of sampling units is present, the autocorrelation may arise.
 Another source of autocorrelation is the effect of deletion of some variables. In regression
modeling, it is not possible to include all the variables in the model. There can be various
reasons for this, e.g., some variable may be qualitative, sometimes direct observations may not
be available on the variable etc. The joint effect of such deleted variables gives rise to
autocorrelation in the data.
 The misspecification of the form of the relationship can also introduce autocorrelation in the data. It is assumed that the form of the relationship between the study and explanatory variables is linear. If there are log or exponential terms present in the model, so that the linearity of the model is questionable, this also gives rise to autocorrelation in the data.
 The difference between the observed and true values of a variable is called measurement error or errors-in-variables. The presence of measurement errors in the dependent variable may also introduce autocorrelation in the data.
3. Consequences of Autocorrelation
Refer IIT Notes…

The important consequences of autocorrelation in OLS estimation, with reference to a two-variable model, are as follows:
i. The least squares estimators are still linear and unbiased.
ii. But they are not efficient (i.e., do not have minimum variance) compared to procedures that take autocorrelation into account. It means that the usual OLS estimators are not best linear unbiased estimators (BLUE).
iii. The estimated variances of the OLS estimators are biased.
iv. Therefore, the usual t and F tests are not generally reliable.
v. As a consequence, the conventionally computed R² may be an unreliable measure of the true R².

4. Tests for Autocorrelation


i. Plot of residuals:
Plot the residuals e_t against t and look for clusters of successive residuals on one side of the zero line; a sketch of such a plot follows.
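
A minimal sketch of such a plot, assuming residuals from a simple OLS fit on made-up data:

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Made-up data: y depends linearly on x plus noise (purely illustrative)
rng = np.random.default_rng(0)
x = np.arange(100, dtype=float)
y = 2.0 + 0.5 * x + rng.normal(size=100)

res = sm.OLS(y, sm.add_constant(x)).fit()

# Plot e_t against t; long runs of residuals on one side of zero suggest autocorrelation
plt.plot(res.resid, marker="o")
plt.axhline(0, color="black", linewidth=0.8)
plt.xlabel("t")
plt.ylabel("residual e_t")
plt.show()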

ii. Durbin-Watson test:


Refer IIT Notes…

Note:
The Durbin-Watson statistic d ranges from 0 to 4:
 Close to 0: strong positive autocorrelation
 Around 2: little or no autocorrelation
 Close to 4: strong negative autocorrelation
(A sketch of the computation follows.)
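
A sketch of the computation with the durbin_watson function in statsmodels, applied to an OLS fit on simulated data whose errors follow an AR(1) process (all numbers are illustrative):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
x = np.arange(100, dtype=float)
e = np.zeros(100)
for t in range(1, 100):
    e[t] = 0.7 * e[t - 1] + rng.normal()      # positively autocorrelated errors
y = 2.0 + 0.5 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))               # a value well below 2 signals positive autocorrelation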

iii. A Lagrange Multiplier test:


The Lagrange Multiplier test is used for detecting autocorrelation of a more general form, such as 2nd- or 4th-order autocorrelation. The test is executed as follows:
a) First decide on the order of autocorrelation to be tested.
b) Run the usual OLS regression of y against the explanatory variable x,
y_t = α + β x_t + u_t
and save the residuals û_t.
c) Run a regression with the residuals from step (b) as the dependent variable against the explanatory variable x_t (as in step (b)) and lagged values of û (the number of lags depending on the order of autocorrelation being tested, in this case 2 lags):
û_t = δ_0 + δ_1 x_t + δ_2 û_{t−1} + δ_3 û_{t−2} + ε_t
d) Calculate TR² for this regression (the number of observations multiplied by the R² value). Under the null hypothesis of no autocorrelation, this statistic has a chi-squared distribution with s (the number of lags on the error term) degrees of freedom (in this case 2, which has a 5% critical value of 5.99).
There are two important points regarding the Lagrange Multiplier test: firstly, it is a large-sample test, so caution is needed in interpreting results from a small sample; secondly, it detects not only autoregressive autocorrelation but also moving-average autocorrelation. Again, caution is needed in interpreting the results. (A sketch of the test in statsmodels follows.)
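
This LM test is implemented in statsmodels as the Breusch-Godfrey test; a sketch on simulated data whose errors carry second-order autocorrelation (coefficients and sample size are illustrative):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(2)
x = rng.normal(size=200)
e = np.zeros(200)
for t in range(2, 200):
    e[t] = 0.5 * e[t - 1] + 0.3 * e[t - 2] + rng.normal()   # 2nd-order autocorrelated errors
y = 1.0 + 2.0 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=2)
print(lm_stat, lm_pvalue)   # lm_stat is TR^2; compare with the chi-square(2) critical value 5.99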

iv. Ljung Box test:


It is also known as the modified Box-Pierce, Ljung–Box Q test, Box test, or Portmanteau
test. It tests for the absence of serial autocorrelation for a given lag “k.” In addition, it tests for
randomness and independence. If the autocorrelations of the residuals are very small, the model is
fine; that is, the model does not exhibit a significant lack of fit.
Purpose: Perform a Ljung-Box test for randomness.

Description:
There are a large number of tests of randomness (e.g., the runs test). Autocorrelation plots are one common method of testing for randomness, and the Ljung-Box test is based on the autocorrelation plot.
However, instead of testing randomness at each distinct lag, it tests the "overall" randomness
based on a number of lags.
For this reason, it is often referred to as a "portmanteau" test.
More formally, the Ljung-Box test can be defined as follows.
H0: The data are random (Or) The residuals are independently distributed
H1: The data are not random. (Or) The residuals are not independently distributed; they exhibit
serial correlation.

The test statistic is:

Q_LB = n(n + 2) Σ_{j=1}^{h} ρ̂_j² / (n − j)

where n is the sample size, ρ̂_j is the sample autocorrelation at lag j, and h is the number of lags being tested.
Significance Level: α
Critical Region: The test statistic Q_LB follows a chi-square distribution with h degrees of freedom; that is, Q_LB ~ χ²(h).
We reject the null hypothesis and conclude that the residuals of the model are not independently distributed if Q_LB > χ²_{1−α, h},
where χ²_{1−α, h} is the (1 − α) quantile (percent point function) of the chi-square distribution with h degrees of freedom.
The Ljung-Box test is commonly used in ARIMA modeling. Note that it is applied to the
residuals of a fitted ARIMA model, not the original series.
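
A sketch using acorr_ljungbox from statsmodels, applied to the residuals of a fitted ARIMA model (the series and the model order are illustrative):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(3)
series = rng.normal(size=300).cumsum()        # illustrative nonstationary series
fit = ARIMA(series, order=(1, 1, 1)).fit()    # fit an ARIMA(1,1,1)

# The test is applied to the residuals of the fitted model, not the original series
print(acorr_ljungbox(fit.resid, lags=[10]))   # a small p-value rejects H0 of no autocorrelation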

v. A correlogram:
A pattern in the correlogram is an indication of autocorrelation. Spikes that fall outside the plotted confidence bands (roughly ±1.96/√n for a white-noise series) should be looked at with suspicion; a sketch follows.
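
A correlogram with confidence bands can be drawn with plot_acf from statsmodels; a sketch on simulated autocorrelated residuals (the AR coefficient is illustrative):

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(4)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.6 * e[t - 1] + rng.normal()      # residuals with positive autocorrelation

# Bars extending beyond the shaded band (about +/-1.96/sqrt(n)) indicate significant lags
plot_acf(e, lags=20)
plt.show()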

vi. The Moran’s I statistic, which is similar to a correlation coefficient:


This tool measures spatial autocorrelation using feature locations and feature values
simultaneously. The spatial autocorrelation tool utilizes multidimensional and multi-directional
factors.
The Moran’s I index takes values between -1 and +1. Positive spatial autocorrelation shows up as clustered values, negative spatial autocorrelation as dispersed values, and values near zero indicate a spatially random pattern.
The tool generates a Z-score and p-value which helps evaluate the significance of the
Moran’s index.
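
For a small example, Moran's I can be computed by hand; the morans_i helper, the weight matrix, and the values below are all invented for illustration:

import numpy as np

def morans_i(x, w):
    # Moran's I for values x (length n) and a spatial weight matrix w (n x n)
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    n = x.size
    num = np.sum(w * np.outer(z, z))          # sum_ij w_ij (x_i - xbar)(x_j - xbar)
    den = np.sum(z ** 2)
    return (n / w.sum()) * num / den

# Four locations on a line; adjacent locations get weight 1 (illustrative)
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([2.0, 2.5, 7.0, 7.5])            # similar values sit next to each other
print(morans_i(x, w))                          # positive value: spatial clustering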
5. Estimation in levels versus first differences
Refer IIT Notes…

6. Meaning and Definition of Correlogram


A correlogram (also called Autocorrelation Function ACF Plot or Autocorrelation plot) is a
visual way to show serial correlation in data that changes over time (i.e. time series data). Serial
correlation (also called autocorrelation) is where an error at one point in time travels to a subsequent
point in time.
For example, we might overestimate the value of our stock market investments for the first quarter,
leading to an overestimate of values for following quarters.

One simple test of stationarity is based on the so-called autocorrelation function (ACF).
The ACF at lag k, denoted by ρ_k, is defined as

ρ_k = γ_k / γ_0 = (covariance at lag k) / (variance)

where the covariance at lag k, γ_k, and the variance, γ_0, are as defined before.
Note that if k = 0, ρ_0 = 1.
Since both covariance and variance are measured in the same units of measurement, ρ_k is a unitless, or pure, number. It lies between −1 and +1, as any correlation coefficient does. If we plot ρ_k against k, the graph we obtain is known as the population correlogram.
Since in practice we only have a realization (i.e., a sample) of a stochastic process, we can only compute the sample autocorrelation function (SACF), ρ̂_k. To compute this, we must first compute the sample covariance at lag k, γ̂_k, and the sample variance, γ̂_0, which are defined as:

γ̂_k = (1/n) Σ_{t=1}^{n−k} (y_t − ȳ)(y_{t+k} − ȳ)
γ̂_0 = (1/n) Σ_{t=1}^{n} (y_t − ȳ)²

where n is the sample size and ȳ is the sample mean.

Therefore, the sample autocorrelation function at lag k is:

ρ̂_k = γ̂_k / γ̂_0

which is simply the ratio of the sample covariance (at lag k) to the sample variance. A plot of ρ̂_k against k is known as the sample correlogram.
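
A sketch that computes ρ̂_k from the formulas above and checks it against the acf function in statsmodels (the series is simulated purely for illustration):

import numpy as np
from statsmodels.tsa.stattools import acf

def sample_acf(y, k):
    # Sample autocorrelation at lag k: gamma_hat_k / gamma_hat_0
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar = y.mean()
    gamma_k = np.sum((y[:n - k] - ybar) * (y[k:] - ybar)) / n
    gamma_0 = np.sum((y - ybar) ** 2) / n
    return gamma_k / gamma_0

rng = np.random.default_rng(5)
y = rng.normal(size=200).cumsum()             # illustrative series
print(sample_acf(y, 1), acf(y, nlags=1)[1])   # the two lag-1 values agree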

7. Correlogram of moving averages


A moving average term in a time series model is a past error (multiplied by a coefficient).
 Theoretical Properties of a Time Series with an MA(1) Model

 Proof of Properties of MA(1)

Note:
The only nonzero value in the theoretical ACF is at lag 1; all other autocorrelations are 0. Thus a sample ACF with a significant autocorrelation only at lag 1 is an indicator of a possible MA(1) model.
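
For reference, writing the MA(1) model as x_t = μ + w_t + θ_1 w_{t−1} with w_t ~ iid N(0, σ_w²) (this sign convention is an assumption; some textbooks attach a minus sign to θ), the standard results are:

Mean: E(x_t) = μ
Variance: Var(x_t) = σ_w² (1 + θ_1²)
ACF: ρ_1 = θ_1 / (1 + θ_1²), and ρ_h = 0 for h ≥ 2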
Example:
 Theoretical Properties of a Time Series with an MA(2) Model

Note:
The only nonzero values in the theoretical ACF are for lags 1 and 2. Autocorrelations for higher
lags are 0. So, a sample ACF with significant autocorrelations at lags 1 and 2, but non-significant
autocorrelations for higher lags indicates a possible MA(2) model.

Example:

 ACF for General MA(q) Models


A property of MA(q) models in general is that there are nonzero autocorrelations for the first q
lags and autocorrelations = 0 for all lags > q.
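
A sketch that checks this cutoff property by simulation, using ArmaProcess from statsmodels with an illustrative MA(2):

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf

# MA(2) process x_t = w_t + 0.6 w_{t-1} + 0.4 w_{t-2} (coefficients are illustrative)
proc = ArmaProcess(ar=[1], ma=[1, 0.6, 0.4])

print(proc.acf(lags=6))                       # theoretical ACF: zero beyond lag q = 2

sample = proc.generate_sample(nsample=5000)
print(acf(sample, nlags=6))                   # sample ACF: close to zero beyond lag 2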

 Invertibility of MA models
An MA model is said to be invertible if it is algebraically equivalent to a converging infinite-order AR model. By converging, we mean that the AR coefficients decrease to 0 as we move back in time.
Invertibility is a restriction programmed into time series software used to estimate the coefficients of models with MA terms; it is not something that we check for in the data analysis. We demonstrate invertibility for the MA(1) model below.
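
A sketch of that demonstration, assuming the mean-zero form x_t = w_t + θ_1 w_{t−1} with |θ_1| < 1:

From x_t = w_t + θ_1 w_{t−1} we get w_t = x_t − θ_1 w_{t−1}. Substituting repeatedly for the lagged error gives
w_t = x_t − θ_1 x_{t−1} + θ_1² x_{t−2} − θ_1³ x_{t−3} + …
so that
x_t = θ_1 x_{t−1} − θ_1² x_{t−2} + θ_1³ x_{t−3} − … + w_t,
an infinite-order AR representation whose coefficients ±θ_1^j shrink to 0 as j grows precisely because |θ_1| < 1.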

8. Correlogram of auto-regressive series


An autoregressive (AR) model forecasts future behavior based on past behavior data. This type of
analysis is used when there is a correlation between the time series values and their preceding and
succeeding values.
For example, if we know that the stock market has been going up for the past few days, we might
expect it to continue going up in the future.

Stationary series:
For an autocorrelation function (ACF) to make sense, the series must be a weakly
stationary series. This means that the autocorrelation for any particular lag is the same regardless of
where we are in time.
Stationary (Weakly) Series:

A series x_t is said to be (weakly) stationary if it satisfies the following properties:

 The mean E(x_t) is the same for all t.
 The variance of x_t is the same for all t.
 The covariance (and also the correlation) between x_t and x_{t−h} is the same for all t at each lag h = 1, 2, 3, etc.

Autocorrelation Function (ACF)

Let x_t denote the value of a time series at time t. The ACF of the series gives the correlations between x_t and x_{t−h} for h = 1, 2, 3, etc. Theoretically, the autocorrelation between x_t and x_{t−h} equals

Corr(x_t, x_{t−h}) = Cov(x_t, x_{t−h}) / [Std(x_t) · Std(x_{t−h})] = Cov(x_t, x_{t−h}) / Var(x_t)

The denominator in the second expression arises because the standard deviation of a stationary series is the same at all times.
The last property of a weakly stationary series says that the theoretical value of autocorrelation of
particular lag is the same across the whole series. An interesting property of a stationary series is that
theoretically it has the same structure forwards as it does backward.
Many stationary series have recognizable ACF patterns. Most series that we encounter in practice, however, are not stationary. A continual upward trend, for example, is a violation of the requirement that the mean is the same for all t. Distinct seasonal patterns also violate that requirement. Strategies for dealing with nonstationary series are taken up later.

The First-order Autoregression Model


We’ll now look at theoretical properties of the AR(1) model. The 1st order autoregression model
is denoted as AR(1).
In this model, the value of x at time t is a linear function of the value of x at time t − 1. The algebraic expression of the model is as follows:

x_t = δ + φ_1 x_{t−1} + w_t

Assumptions
 The errors w_t are independently and identically distributed with mean 0 and constant variance: w_t ~ iid N(0, σ_w²).
 Each w_t is independent of the earlier values of the series x_{t−1}, x_{t−2}, ….
 The series is (weakly) stationary; for an AR(1) this requires |φ_1| < 1.

Properties of the AR(1)

Formulas for the mean, variance, and ACF of a time series process with an AR(1) model follow:

Mean: E(x_t) = μ = δ / (1 − φ_1)
Variance: Var(x_t) = σ_w² / (1 − φ_1²)
ACF: ρ_h = φ_1^h for h = 1, 2, 3, …

This defines the theoretical ACF for a time series variable with an AR(1) model.

Note:
φ_1 is the slope in the AR(1) model, and we now see that it is also the lag-1 autocorrelation. (A quick simulation check follows.)
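
A quick simulation check of this point (φ_1 = 0.6, δ = 1.0, and the sample size are illustrative choices):

import numpy as np
from statsmodels.tsa.stattools import acf

# Simulate x_t = delta + phi1 * x_{t-1} + w_t
rng = np.random.default_rng(6)
phi1, delta, n = 0.6, 1.0, 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = delta + phi1 * x[t - 1] + rng.normal()

print(acf(x, nlags=3))   # lag-1 value is close to phi1 = 0.6, lag-h values close to phi1**h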

Derivations of Properties of AR(1)


The derivations below start from the algebraic expression of the AR(1) model, x_t = δ + φ_1 x_{t−1} + w_t.
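
A sketch of the derivations, assuming weak stationarity so that E(x_t) = E(x_{t−1}) = μ and Var(x_t) = Var(x_{t−1}):

Taking expectations of x_t = δ + φ_1 x_{t−1} + w_t gives μ = δ + φ_1 μ, so μ = δ / (1 − φ_1).
Taking variances (w_t is independent of x_{t−1}) gives Var(x_t) = φ_1² Var(x_{t−1}) + σ_w², so Var(x_t) = σ_w² / (1 − φ_1²).
Multiplying x_t − μ = φ_1 (x_{t−1} − μ) + w_t by (x_{t−1} − μ) and taking expectations gives Cov(x_t, x_{t−1}) = φ_1 Var(x_t), so ρ_1 = φ_1; repeating the argument for longer lags gives ρ_h = φ_1^h.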
Note 1: AR vs MA model time series
The AR part involves regressing the variable on its own lagged (i.e., past) values. The MA part
involves modeling the error term as a linear combination of error terms occurring contemporaneously
and at various times in the past.

Note 2: Difference between regression and autoregressive


Multiple regression models forecast a variable using a linear combination of predictors, whereas
autoregressive models use a combination of past values of the variable.
