Econometrics - Lecture Notes
2. Definition: Econometrics
Econometrics, the result of a certain outlook on the role of economics, consists of the application
of mathematical statistics to economic data to lend empirical support to the models constructed by
mathematical economics and to obtain numerical results.
Econometrics refers to the application of economic theory and statistical techniques for the
purpose of testing hypotheses and of estimating and forecasting economic phenomena. Literally
interpreted, econometrics means “economic measurement.”
3. Scope of Econometrics
Developing statistical methods for the estimation of economic relationships
Testing economic theories and hypotheses
Evaluating and applying economic policies
Forecasting
Collecting and analyzing non-experimental or observational data.
4. Aim of Econometrics
i) Formulation and specification of econometric models
Economic models are formulated in an empirically testable form. Several econometric models
can be derived from a single economic model; such models differ in their choice of functional form,
in the specification of the stochastic structure of the variables, and so on.
5. Methodology of Econometrics
a) Statement of theory or hypothesis.
b) Specification of the mathematical model of the theory
c) Specification of the statistical, or econometric, model
d) Obtaining the data
e) Estimation of the parameters of the econometric model
f) Hypothesis testing
g) Forecasting or prediction
h) Using the model for control or policy purposes.
d. Original Data
To estimate the econometric model $Y = \beta_1 + \beta_2 X + u$, that is, to obtain the numerical
values of β1 and β2, we need data.
The Y variable is the aggregate (for the economy as a whole) personal consumption expenditure
(PCE) and the X variable is gross domestic product (GDP), a measure of aggregate income.
Therefore, the data are in “real” terms; that is, they are measured in constant prices.
Note: MPC (marginal propensity to consume) is the change in consumption per unit change in real income; in the model above it is the slope coefficient β2.
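To make the estimation step concrete, here is a minimal sketch in Python with statsmodels of fitting the consumption function by OLS. The GDP and PCE numbers below are hypothetical placeholders, not the actual series behind these notes.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical placeholder data (not the notes' actual PCE/GDP series)
gdp = np.array([2708.0, 2797.0, 2817.0, 2896.0, 3080.0, 3240.0])  # X: real GDP
pce = np.array([1804.0, 1884.0, 1961.0, 2004.0, 2078.0, 2164.0])  # Y: real PCE

X = sm.add_constant(gdp)       # adds the intercept column for beta_1
model = sm.OLS(pce, X).fit()   # least-squares estimation
print(model.params)            # [beta1_hat, beta2_hat]; beta2_hat estimates the MPC
```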
f. Hypothesis Testing
According to “positive” economists like Milton Friedman, a theory or hypothesis that is not
verifiable by appeal to empirical evidence may not be admissible as a part of scientific enquiry.
Keynes expected the MPC to be positive but less than 1.
Confirming or refuting economic theories on the basis of sample evidence is the task of a branch of
statistical theory known as statistical inference (hypothesis testing).
g. Forecasting or Prediction
If the chosen model does not refute the hypothesis or theory under consideration, we may use it to
predict the future value(s) of the dependent, or forecast, variable Y on the basis of known or
expected future value(s) of the explanatory, or predictor, variable X.
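Continuing the estimation sketch above, prediction from the fitted model is one line; the future GDP value of 3500 is purely hypothetical.

```python
# Predict consumption at a hypothetical future GDP of 3500;
# the row is [1, X] to match the intercept-plus-slope design matrix.
print(model.predict([[1.0, 3500.0]]))
```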
Macroeconomic theory shows that the change in income following a change in investment expenditure
is given by the income multiplier M.
Examples:
i. In a model studying supply and demand, the price of a good is an endogenous factor because the
price can be changed by the producer (supplier) in response to consumer demand.
ii. Personal income to personal consumption, since a higher income typically leads to increases in
consumer spending.
iii. Rainfall to plant growth: the two are correlated and are studied by economists, since the amount of
rainfall is important to commodity crops such as corn and wheat.
iv. Education obtained to future income levels because there's a correlation between education and
higher salaries or wages.
Note 1:
i. The variables that help explain the endogenous variables, and whose values are determined outside
the model, are exogenous variables or predetermined variables.
ii. Exogenous variables are independent, and endogenous variables are dependent. Therefore, if the
variable does not depend on variables within the model, it's an exogenous variable. If the variable
depends on variables within the model, though, it's endogenous.
Note 2: Before we proceed to a formal analysis of regression theory, let us dwell briefly on the matter
of terminology and notation. In the literature the terms dependent variable and explanatory variable are
described variously. A representative list is:
8. (i) Structural form and reduced form of Endogenous and Exogenous variables
However, the PRF is not directly observable. We estimate it from the sample regression function (SRF)

$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i, \qquad \hat{u}_i = Y_i - \hat{Y}_i,$

which shows that the $\hat{u}_i$ (the residuals) are simply the differences between the actual and
estimated Y values.
Now given ‘n’ pairs of observations on Y and X, we would like to determine the SRF in such a
manner that it is as close as possible to the actual Y. To this end, we may adopt the following criterion:
Choose the SRF in such a way that the sum of the residuals $\sum \hat{u}_i = \sum (Y_i - \hat{Y}_i)$
is as small as possible.
If we adopt the criterion of minimizing $\sum \hat{u}_i$, all the residuals receive the same weight
in the sum: the residuals $\hat{u}_2$ and $\hat{u}_3$ count the same as the residuals $\hat{u}_1$ and
$\hat{u}_4$, even when the first two lie much closer to the SRF than the latter two.
In other words, all the residuals receive equal importance no matter how close or how widely
scattered the individual observations are from the SRF. A consequence of this is that it is quite possible
that the algebraic sum of the $\hat{u}_i$ is small (even zero) although the $\hat{u}_i$ are widely
scattered about the SRF. To see this, let $\hat{u}_1, \hat{u}_2, \hat{u}_3,$ and $\hat{u}_4$ assume the
values 10, −2, +2, and −10, respectively. The algebraic sum of these residuals is zero although
$\hat{u}_1$ and $\hat{u}_4$ are scattered more widely around the SRF than $\hat{u}_2$ and $\hat{u}_3$.
We can avoid this problem if we adopt the least-squares criterion, which states that the SRF should
be fixed in such a way that

$\sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$

is as small as possible, where $\hat{u}_i^2$ are the squared residuals. By squaring $\hat{u}_i$, this
method gives more weight to large residuals such as $\hat{u}_1$ and $\hat{u}_4$ than to small ones
such as $\hat{u}_2$ and $\hat{u}_3$.
As noted previously, under the minimum $\sum \hat{u}_i$ criterion, the sum can be small even
though the $\hat{u}_i$ are widely spread about the SRF. But this is not possible under the
least-squares procedure, for the larger the $\hat{u}_i$ (in absolute value), the larger the
$\sum \hat{u}_i^2$. Note also that $\sum \hat{u}_i^2 = f(\hat{\beta}_1, \hat{\beta}_2)$; that is, the
sum of the squared residuals is some function of the estimators $\hat{\beta}_1$ and $\hat{\beta}_2$.
For any given set of data, choosing different values for $\hat{\beta}_1$ and $\hat{\beta}_2$ will give
different $\hat{u}$'s and hence different values of $\sum \hat{u}_i^2$.
In principle we could try all conceivable values of $\hat{\beta}_1$ and $\hat{\beta}_2$ and pick
the pair that makes $\sum \hat{u}_i^2$ smallest. But since time, and certainly patience, is generally
in short supply, we need to consider some shortcuts to this trial-and-error process.
Fortunately, the method of least squares provides us such a shortcut. The principle or the method
of least squares chooses $\hat{\beta}_1$ and $\hat{\beta}_2$ in such a manner that, for a given sample
or set of data, $\sum \hat{u}_i^2$ is as small as possible.
In other words, for a given sample, the method of least squares provides us with unique estimates

$\hat{\beta}_2 = \dfrac{\sum x_i y_i}{\sum x_i^2}, \qquad \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X},$

where $\bar{X}$ and $\bar{Y}$ are the sample means of X and Y and where we define
$x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$.
Henceforth, we adopt the convention of letting the lowercase letters denote deviations from mean
values.
The estimators obtained previously are known as the least-squares estimators, for they are derived
from the least-squares principle.
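As a sketch, the two least-squares formulas above translate directly into a few lines of NumPy (the function name is ours, chosen for illustration):

```python
import numpy as np

def ols_estimates(X, Y):
    """Least-squares estimates for Y = b1 + b2*X + u via the deviation-form formulas."""
    x = X - X.mean()                      # x_i = X_i - Xbar
    y = Y - Y.mean()                      # y_i = Y_i - Ybar
    b2 = (x * y).sum() / (x ** 2).sum()   # b2_hat = sum(x_i y_i) / sum(x_i^2)
    b1 = Y.mean() - b2 * X.mean()         # b1_hat = Ybar - b2_hat * Xbar
    return b1, b2
```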
Properties of estimators obtained by the method of OLS
Numerical properties are those that hold as a consequence of the use of ordinary least squares,
regardless of how the data were generated.
We will also consider the statistical properties of OLS estimators, that is, properties “that hold
only under certain assumptions about the way the data were generated”.
The OLS estimators are expressed solely in terms of the observable (i.e., sample) quantities (i.e.,
X and Y). Therefore, they can be easily computed.
They are point estimators; that is, given the sample, each estimator will provide only a single
(point) value of the relevant population parameter.
Once the OLS estimates are obtained from the sample data, the sample regression line can be
easily obtained. The regression line thus obtained has the following properties:
a) It passes through the sample means of Y and X. This fact is obvious from
$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2\bar{X}$, which can be rewritten as
$\bar{Y} = \hat{\beta}_1 + \hat{\beta}_2\bar{X}$.
b) The mean value of the estimated Y ($= \hat{Y}_i$) is equal to the mean value of the actual Y,
for
$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i = (\bar{Y} - \hat{\beta}_2\bar{X}) + \hat{\beta}_2 X_i = \bar{Y} + \hat{\beta}_2 (X_i - \bar{X}).$
Summing both sides of this last equality over the sample values and dividing through by
the sample size n gives $\bar{\hat{Y}} = \bar{Y}$, since $\sum (X_i - \bar{X}) = 0$.
c) The mean value of the residuals is zero. From the first normal equation,
$\sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) = 0$. But since
$\hat{u}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$, it follows that $\sum \hat{u}_i = 0$,
whence $\bar{\hat{u}} = 0$.
As a result, the SRF $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$ can be expressed in an
alternative form where both Y and X are expressed as deviations from their mean values. Summing the
SRF over the sample and dividing by n gives $\bar{Y} = \hat{\beta}_1 + \hat{\beta}_2\bar{X}$ (since
$\sum \hat{u}_i = 0$); subtracting this from the SRF gives
$y_i = \hat{\beta}_2 x_i + \hat{u}_i,$
where $y_i$ and $x_i$, following our convention, are deviations from their respective (sample) mean
values.
This equation is known as the deviation form. Notice that the intercept term $\hat{\beta}_1$ is no
longer present in it. But the intercept term can always be estimated by
$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2\bar{X}$, that is, from the fact that the sample regression
line passes through the sample means of Y and X. An advantage of the deviation form is that it often
simplifies computing formulas. In passing, note that in the deviation form, the SRF can be written as
$\hat{y}_i = \hat{\beta}_2 x_i$.
d) The residuals $\hat{u}_i$ are uncorrelated with the predicted $Y_i$. This statement can be verified
as follows: using the deviation form, we can write
$\sum \hat{y}_i \hat{u}_i = \hat{\beta}_2 \sum x_i \hat{u}_i = \hat{\beta}_2 \sum x_i (y_i - \hat{\beta}_2 x_i) = \hat{\beta}_2 \sum x_i y_i - \hat{\beta}_2^2 \sum x_i^2 = \hat{\beta}_2^2 \sum x_i^2 - \hat{\beta}_2^2 \sum x_i^2 = 0.$
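These numerical properties are easy to verify on any data set; a quick check in NumPy (the numbers below are arbitrary illustrations):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x = X - X.mean()
b2 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
b1 = Y.mean() - b2 * X.mean()
Y_hat = b1 + b2 * X
u_hat = Y - Y_hat

print(np.isclose(u_hat.sum(), 0.0))          # (c) residuals sum to zero
print(np.isclose(Y_hat.mean(), Y.mean()))    # (b) mean of fitted Y = mean of actual Y
print(np.isclose((u_hat * Y_hat).sum(), 0))  # (d) residuals uncorrelated with fitted values
```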
Assumptions:
1. The linear regression model is “linear in parameters.”
2. There is a random sampling of observations.
3. The conditional mean should be zero.
4. There is no multi-collinearity (or perfect collinearity).
5. Spherical errors: There is homoscedasticity and no auto-correlation
6. Optional Assumption: Error terms should be normally distributed.
These assumptions are extremely important because violation of any of these assumptions would make
OLS estimates unreliable and incorrect. Specifically, a violation would result in incorrect signs of OLS
estimates, or the variance of OLS estimates would be unreliable, leading to confidence intervals that are
too wide or too narrow.
Property 1: Linear
This property is more concerned with the estimator rather than the original equation that is being
estimated. In assumption A1, the focus was that the linear regression should be “linear in parameters.”
However, the linear property of OLS estimator means that OLS belongs to that class of estimators,
which are linear in Y, the dependent variable. Note that OLS estimators are linear only with respect to
the dependent variable and not necessarily with respect to the independent variables.
The linear property of OLS estimators doesn't depend on assumption A1 alone, but on all assumptions
A1 to A5.
Property 2: Unbiasedness
If we look at the regression equation, we will find an error term associated with the regression
equation that is estimated. This makes the dependent variable also random. If an estimator uses the
dependent variable, then that estimator would also be a random number. Therefore, before describing
what unbiasedness is, it is important to mention that unbiasedness property is a property of the estimator
and not of any sample.
Unbiasedness is one of the most desirable properties of any estimator. The estimator should
ideally be an unbiased estimator of true parameter/population values.
Example: Suppose there is a population of size 1000, and we take samples of 50 from this
population to estimate the population parameters. Every time we take a sample, it will contain a
different set of 50 observations and, hence, we would estimate different values of βo and βi. The
unbiasedness property of the OLS method says that if we draw samples of 50 repeatedly, the average
of all the βo and βi estimates from the samples will equal the actual (or the population) values of
βo and βi.
Mathematically,
E(bo) = βo
E(bi) = βi
Here, ‘E’ is the expectation operator.
In layman’s terms, if we take out several samples, keep recording the values of the estimates, and then
take an average, we will get very close to the correct population value. If our estimator is biased, then
the average will not equal the true parameter value in the population.
The unbiasedness property of OLS in Econometrics is the basic minimum requirement to be
satisfied by any estimator. However, it is not sufficient for the reason that most times in real-life
applications, we will not have the luxury of taking out repeated samples. In fact, only one sample will
be available in most cases.
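The repeated-sampling idea above can be illustrated with a small Monte Carlo sketch (all numbers are ours, chosen for illustration):

```python
import numpy as np

# Monte Carlo sketch of unbiasedness: repeated samples of size 50 from a
# process with known intercept b0 = 2.0 and slope b1 = 0.5.
rng = np.random.default_rng(0)
b0_true, b1_true = 2.0, 0.5

estimates = []
for _ in range(5000):
    x = rng.normal(10.0, 2.0, size=50)
    y = b0_true + b1_true * x + rng.normal(0.0, 1.0, size=50)
    xd = x - x.mean()
    b1_hat = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
    estimates.append((y.mean() - b1_hat * x.mean(), b1_hat))

print(np.mean(estimates, axis=0))  # averages land very close to (2.0, 0.5)
```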
Property 3: Best: Minimum Variance
First, let us look at what efficient estimators are. The efficient property of any estimator says that
the estimator is the minimum variance unbiased estimator. Therefore, if we take all the unbiased
estimators of the unknown population parameter, the estimator will have the least variance. The
estimator that has less variance will have individual data points closer to the mean.
As a result, they will be more likely to give better and accurate results than other estimators
having higher variance. In short:
If the estimator is unbiased but doesn’t have the least variance – it’s not the best!
If the estimator has the least variance but is biased – it’s again not the best!
If the estimator is both unbiased and has the least variance – it’s the best estimator.
Now, talking about OLS, OLS estimators have the least variance among the class of all linear
unbiased estimators. So, this property of OLS regression is less strict than efficiency property.
Efficiency property says least variance among all unbiased estimators, and OLS estimators have the
least variance among all linear and unbiased estimators.
Mathematically,
Let bo be the OLS estimator, which is linear and unbiased, and let bo∗ be any other linear and
unbiased estimator of βo. Then,
Var (bo) < Var (bo∗)
Likewise, let bi be the OLS estimator, and let bi∗ be any other linear and unbiased estimator of βi.
Then,
Var (bi) < Var (bi∗)
The above three properties make the OLS estimators BLUE, as stated in the
Gauss-Markov theorem.
It is worth spending time on some further properties of OLS estimators in econometrics. The
properties described below are asymptotic properties of OLS estimators. So far, finite-sample
properties of OLS regression were discussed. These properties studied the behavior of the OLS
estimator under the assumption that we can draw several samples and, hence, obtain several estimators
of the same unknown population parameter. In short, the properties were that the average of these
estimators across samples should equal the true population parameter (unbiasedness), and that the
spread around the true parameter value should be the least (efficiency). However, in real life, we will
often have just one sample. Hence, the asymptotic properties of the OLS model are discussed, which
study how OLS estimators behave as the sample size increases. Keep in mind that the sample size
should be large.
Property 4: Asymptotic Unbiasedness
This property of OLS says that as the sample size increases, the biasedness of OLS estimators
disappears.
Property 5: Consistency
An estimator is said to be consistent if its value approaches the actual, true parameter (population)
value as the sample size increases. An estimator is consistent if it satisfies two conditions:
a) It is asymptotically unbiased
b) Its variance converges to 0 as the sample size increases.
Both these hold true for OLS estimators and, hence, they are consistent estimators. For an estimator to
be useful, consistency is the minimum basic requirement. Since there may be several such estimators,
asymptotic efficiency also is considered.
Asymptotic efficiency is the sufficient condition that makes OLS estimators the best estimators.
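A sketch of consistency in the same spirit: the sampling variance of the OLS slope shrinks as n grows (an illustrative simulation, not a proof):

```python
import numpy as np

# Consistency sketch: variance of the OLS slope estimate falls toward zero
# as the sample size grows (illustrative data-generating process).
rng = np.random.default_rng(1)
for n in (20, 200, 2000):
    slopes = []
    for _ in range(1000):
        x = rng.normal(10.0, 2.0, size=n)
        y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)
        xd = x - x.mean()
        slopes.append((xd * (y - y.mean())).sum() / (xd ** 2).sum())
    print(n, np.var(slopes))  # variance shrinks as n increases
```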
Note:
Applications and How it Relates to Study of Econometrics
OLS estimators, because of such desirable properties discussed above, are widely used and find
several applications in real life.
Example:
Consider a bank that wants to predict the exposure of a customer at default. The bank can take the
exposure at default to be the dependent variable and several independent variables like customer level
characteristics, credit history, type of loan, mortgage, etc. The bank can simply run OLS regression and
obtain the estimates to see which factors are important in determining the exposure at default of a
customer. OLS estimators are easy to use and understand. They are also available in various statistical
software packages and can be used extensively.
OLS regressions form the building blocks of econometrics. Any econometrics class will start with
the assumption of OLS regressions. It is one of the favorite interview questions for jobs and university
admissions. Based on the building blocks of OLS, and relaxing the assumptions, several different
models have come up like GLM (generalized linear models), general linear models, heteroscedastic
models, multi-level regression models, etc.
Research in economics and finance is highly driven by econometrics, and OLS is the building
block of econometrics. However, in real life there are issues, like reverse causality, which render OLS
irrelevant or not appropriate. Even so, OLS can still be used to investigate the issues that exist in
cross-sectional data. And even when the OLS method cannot be used for the final regression, OLS is
used to find out the problems, the issues, and the potential fixes.
Link:
https://www.albert.io/blog/ultimate-properties-of-ols-estimators-guide/
UNIT – II
In regression analysis our objective is not only to obtain $\hat{\beta}_1$ and $\hat{\beta}_2$ but
also to draw inferences about the true β1 and β2.
For example, we would like to know how close $\hat{\beta}_1$ and $\hat{\beta}_2$ are to their
counterparts in the PRF: Yi = β1 + β2Xi + ui
Therefore, unless we are specific about how Xi and ui are created or generated, there is no way we
can make any statistical inference.
In this lesson, we will study the various methods through which regression models draw
inferences about the various parameters. Basically, there are three methods through which we do this.
The proof that OLS generates the best results is known as the Gauss-Markov theorem, but the
proof requires several assumptions. These assumptions, known as the classical linear regression model
(CLRM) assumptions, are the following:
Assumption 1: Linear regression model.
The regression model is linear in the parameters,
$Y_i = \beta_1 + \beta_2 X_i + u_i$
2. Meaning of Heteroskedasticity
One of the assumptions made about residuals/errors in OLS regression is that the errors have the
same but unknown variance. This is known as constant variance or homoscedasticity. When this
assumption is violated, the problem is known as heteroscedasticity.
Heteroscedasticity test: Breusch-Pagan-Godfrey test
Limitations: It is very sensitive to the assumption of normal distribution of the residuals.
When to use: This test can be employed if the error term or residuals are normally distributed.
Therefore, it is advisable to check the normality of residuals before implementing this test.
5. Compute $F = \hat{\sigma}_2^2 / \hat{\sigma}_1^2$.
6. If $F > F_{tab}$, reject $H_0$: the data are homoscedastic.
4. Consequences of Heteroskedasticity
The OLS estimators and regression predictions based on them remain unbiased and consistent.
The OLS estimators are no longer BLUE (Best Linear Unbiased Estimators) because they are
no longer efficient, so the regression predictions will be inefficient too.
Because the usual estimator of the covariance matrix of the estimated regression coefficients is
inconsistent, tests of hypotheses (t-test, F-test) are no longer valid.
Predictions
The forecasted or predicted values of the dependent variable based on a heteroscedastic model
will have high variance. This is because the OLS estimates are no longer efficient.
The variance of the residuals is not at its minimum in the presence of heteroscedasticity, due to
which the variance of the predictions is also high.
Biasedness
The unbiasedness property of OLS estimates does not require a constant variance of residuals.
However, the predictions from a heteroscedastic model can still end up being biased, especially
in the case of observations with large residuals.
Note:
In the presence of heteroskedasticity, the least squares estimators are affected in the following
manner.
i. The least squares estimators are still unbiased but inefficient.
ii. The estimates of the variances are also biased.
Consider the model $y_i = \beta x_i + u_i$ with $V(u_i) = \sigma_i^2$ ...(1). The OLS estimator is

$\hat{\beta} = \dfrac{\sum x_i y_i}{\sum x_i^2} = \beta + \dfrac{\sum x_i u_i}{\sum x_i^2}.$

Thus $\hat{\beta}$ is unbiased. Writing $S_{xx} = \sum x_i^2$, its variance is

$V(\hat{\beta}) = V\left(\dfrac{x_1}{S_{xx}}u_1 + \dfrac{x_2}{S_{xx}}u_2 + \cdots + \dfrac{x_n}{S_{xx}}u_n\right) = \dfrac{1}{S_{xx}^2}\left(x_1^2\sigma_1^2 + x_2^2\sigma_2^2 + \cdots + x_n^2\sigma_n^2\right) = \dfrac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}. \quad ...(2)$
Suppose that we write $\sigma_i^2 = \sigma^2 z_i^2$, where the $z_i$ are known; that is, we know the
variances up to a multiplicative constant. Then, dividing (1) by $z_i$, we have the model

$\dfrac{y_i}{z_i} = \beta \dfrac{x_i}{z_i} + v_i, \quad ...(3)$

where $v_i = u_i/z_i$ has a constant variance $\sigma^2$. Since we are “weighting” the ith observation
by $1/z_i$, the resulting estimation of $\beta$ is called weighted least squares (WLS). If $\beta^*$
is the WLS estimator of $\beta$, we have

$\beta^* = \dfrac{\sum (y_i/z_i)(x_i/z_i)}{\sum (x_i/z_i)^2} = \beta + \dfrac{\sum (x_i/z_i)\,v_i}{\sum (x_i/z_i)^2}, \qquad V(\beta^*) = \dfrac{\sigma^2}{\sum (x_i/z_i)^2}.$
Substituting $\sigma_i^2 = \sigma^2 z_i^2$ into equation (2), we have

$V(\hat{\beta}) = \sigma^2 \dfrac{\sum x_i^2 z_i^2}{\left(\sum x_i^2\right)^2},$

so that

$\dfrac{V(\beta^*)}{V(\hat{\beta})} = \dfrac{\left(\sum x_i^2\right)^2}{\sum (x_i/z_i)^2 \cdot \sum x_i^2 z_i^2}.$

This expression is of the form $\left(\sum a_i b_i\right)^2 / \left(\sum a_i^2 \sum b_i^2\right)$,
where $a_i = x_i z_i$ and $b_i = x_i/z_i$, so that $\sum a_i b_i = \sum x_i^2$.
By the Cauchy-Schwarz inequality, it is less than 1 and is equal to 1 only if $a_i$ and $b_i$ are
proportional, that is, $x_i z_i$ and $x_i/z_i$ are proportional, or $z_i^2$ is a constant, which is
the case if the errors are homoscedastic.
Thus the OLS estimator is unbiased but less efficient (has a higher variance) than the WLS
estimator.
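A minimal WLS sketch matching the derivation above, assuming z_i = x_i purely for illustration; statsmodels' WLS takes weights proportional to 1/Var(u_i):

```python
import numpy as np
import statsmodels.api as sm

# y_i = beta*x_i + u_i with Var(u_i) = sigma^2 * z_i^2; z_i = x_i is an
# illustrative assumption, not part of the notes.
rng = np.random.default_rng(3)
x = rng.uniform(1.0, 10.0, size=200)
z = x
y = 2.0 * x + z * rng.normal(0.0, 1.0, size=200)   # u_i = z_i * v_i

ols = sm.OLS(y, x).fit()                       # unbiased but inefficient
wls = sm.WLS(y, x, weights=1.0 / z**2).fit()   # weights proportional to 1/Var(u_i)
print(ols.bse, wls.bse)                        # WLS standard error is smaller
```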
As for the estimation of the variance of $\hat{\beta}$, it is usually estimated by

$\dfrac{RSS}{(n-1)\sum x_i^2},$

where RSS is the residual sum of squares from the OLS model. But

$E(RSS) = E\left[\sum \left(y_i - \hat{\beta} x_i\right)^2\right] = \sum \sigma_i^2 - \dfrac{\sum x_i^2 \sigma_i^2}{\sum x_i^2}.$

Note that, if $\sigma_i^2 = \sigma^2$ for all i, this reduces to $(n-1)\sigma^2$. Thus we would be
estimating the variance of $\hat{\beta}$ by an expression whose expected value is

$\dfrac{\sum x_i^2 \sum \sigma_i^2 - \sum x_i^2 \sigma_i^2}{(n-1)\left(\sum x_i^2\right)^2},$

whereas the true variance is

$\dfrac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}.$
6. Causes of Heteroscedasticity
i. Outliers
Outliers are specific values within a sample that are extremely different (very large or small)
from other values.
Outliers can alter the results of regression models and cause heteroscedasticity.
Outlying observations can often lead to a non-constant variance of residuals.
iv. Error-learning
Let us consider an example for this case. Errors in human behaviour become smaller over time
with more practice or learning of an activity.
In such a case, errors will tend to decrease. For example, with the skill development of labour,
their error will decrease leading to lower defective products in the manufacturing process or a
rise in their productivity. Hence, error variance will decrease in such a setup.
v. Nature of variables
For instance, an increase in income is accompanied by an increase in choices to spend that extra
income.
This leads to discretionary expenditure. In such a case, error variance will increase with an
increase in income.
A model with consumption as a dependent variable and income as an independent variable can
have an increasing error variance. Hence, the nature of variables and their relationships can play
a huge role in this phenomenon.
6. The formula for computing $Y^{(\lambda)}$ is

$Y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \\ \log y & \text{for } \lambda = 0 \end{cases}$
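This Box-Cox family is implemented in scipy, which also picks λ by maximum likelihood; a short sketch on simulated positive data:

```python
import numpy as np
from scipy.stats import boxcox

# scipy's boxcox applies (y**lam - 1)/lam and chooses lam by maximum
# likelihood; the data must be positive (illustrative skewed sample below).
y = np.random.default_rng(4).lognormal(0.0, 0.5, size=200)
y_transformed, lam = boxcox(y)
print(lam)   # a lambda near 0 suggests a log transformation
```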
Note 2:
Heteroskedasticity is Greek for data with a different dispersion. In statistics, if a sequence of
random variables has the same finite variance, it is called homoskedastic; if a sequence does not have
the same variance, it is known as heteroskedastic.
Dispersion is a means of describing the extent of distribution of data around a central value or
point. Lower dispersion indicates higher precision in data measurements, whereas higher dispersion
means lower precision.
A residual spread is plotted on a graph to visualize the correlation between data and a particular
statistical model. If the dispersion is heteroscedastic, it is seen as a problem. Researchers look for
homoskedasticity. The data points should have a constant variance to satisfy the researcher’s
assumption.
In residual plots, heteroskedasticity in regression is cone-shaped. In scatter plots, variance
increases with the increase in fitted value. For cross-sectional studies like income, the range is from
poverty to high-income citizens; when plotted on a graph, the data is heteroskedastic.
Heteroskedasticity is categorized into two types.
Pure heteroskedasticity
Impure heteroskedasticity
In pure heteroskedastic dispersion, the chosen statistical model is correct. With impure
(residual) heteroskedastic dispersion, specification errors are observed; as a result, the statistical
model is incorrect for the given data, and these errors produce the non-constant variance.
Methods
There are three methods to fix heteroskedasticity and improve the model –
Redefining variables
Weighted regression
Transform the dependent variable
In the first method, the analyst can redefine the variables to improve the model and get the desired
results with accuracy. In the second method, the regression analysis is appropriately weighted. Finally,
the third approach is to transform the dependent variable: for example, replacing the dependent
variable with its logarithm revamps the whole model and can stabilize the error variance, as shown in
the sketch below.
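A minimal sketch of that third method on simulated data (the data-generating process is ours): refitting with log(y) as the dependent variable.

```python
import numpy as np
import statsmodels.api as sm

# Refit with log(y) as the dependent variable; y must be positive.
rng = np.random.default_rng(5)
x = rng.uniform(1.0, 10.0, size=200)
y = np.exp(0.2 + 0.3 * x + rng.normal(0.0, 0.2, size=200))

res_log = sm.OLS(np.log(y), sm.add_constant(x)).fit()
print(res_log.params)   # coefficients of the log-linear model
```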
Independent variables used in regression analyses are referred to as the predictor variable. The
predictor variable provides information on an associated dependent variable regarding a particular
outcome.
It is imperative to determine whether data exhibits pure heteroskedastic dispersion or impure
heteroskedastic dispersion—the approach varies for each subtype. Furthermore, improving the variables
used in the impure form is important. If ignored, these variables cause bias in coefficient accuracy,
and p-values become smaller than they should be.
Causes
Heteroskedastic dispersion arises for the following reasons.
It occurs in data sets with a large range between the largest and smallest observed
values.
It occurs due to a change in factor proportionality.
Among other reasons, the nature of the variable can be a major cause.
It majorly occurs in cross-sectional studies.
Some regression models are prone to heteroskedastic dispersion.
An improper selection of regression models can cause it.
It can also be caused by data set formations and inefficiency of calculations as well.
Examples
Example: 1
The most basic heteroskedastic example is household consumption. The variance in consumption
increases with income. When income is low, the variance in consumption is also low: low-income
people spend predominantly on necessary items and bills, so there is less variance. In contrast, as
income increases, people tend to buy luxurious items and develop a plethora of habits, so their
spending becomes less predictable.
Example: 2
A coach correlating runs scored by each player with time spent in training is an example of
homoskedasticity. In this case, the scores would become the dependent variable, and training time is the
predictor variable.
Example: 3
The basic application of heteroskedasticity is in the stock market—variance in stock is compared for
different dates. In addition, investors use heteroskedastic dispersion in regression models to track
securities and portfolios. Heteroskedastic dispersion may or may not be predictable, depending on the
particular situation under study.
For example, when the prices of a product are studied at the launch of a new model, heteroskedasticity
is predictable. But for rainfall or income comparisons, the nature of the dispersion cannot be predicted.
Although the MWD test seems involved, the logic of the test is quite simple. If the linear model is
in fact the correct model, the constructed variable Z1 should not be statistically significant in Step IV,
for in that case the estimated Y values from the linear model and those estimated from the log-linear
model (after taking their antilog values for comparative purposes) should not be different. The same
comment applies to the alternative hypothesis H1.
UNIT – III
1. Meaning of Autocorrelation
Autocorrelation refers to the degree of correlation of the same variables between two successive
time intervals. It measures how the lagged version of the value of a variable is related to the original
version of it in a time series.
Autocorrelation, as a statistical concept, is also known as serial correlation. It is often used with
the autoregressive-moving-average model (ARMA) and autoregressive-integrated-moving-average
model (ARIMA). The analysis of autocorrelation helps to find repeating periodic patterns, which can be
used as a tool for technical analysis in the capital markets.
Autocorrelation gives information about the trend of a set of historical data so that it can be useful
in the technical analysis for the equity market.
However, autocorrelation can also occur in cross-sectional data when the observations are related
in some other way.
The value of autocorrelation ranges from -1 to 1. A value between -1 and 0 represents negative
autocorrelation. A value between 0 and 1 represents positive autocorrelation.
Examples:
In a survey, for instance, one might expect people from nearby geographic locations to provide more
similar answers to each other than people who are more geographically distant. Similarly, students
from the same class might perform more similarly to each other than students from different classes.
Assume an investor is looking to discern whether a stock's returns in her portfolio exhibit
autocorrelation, that is, whether the stock's returns are related to its returns in previous trading
sessions. If the returns do exhibit
autocorrelation, the stock could be characterized as a momentum stock; its past returns seem to
influence its future returns. The investor runs a regression with two prior trading sessions' returns as
the independent variables and the current return as the dependent variable. She finds that returns one
day prior have a positive autocorrelation of 0.7, while the returns two days prior have a positive
autocorrelation of 0.3. Past returns seem to influence future returns, and she can adjust her portfolio
to take advantage of the autocorrelation and resulting momentum.
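A sketch of the investor's regression, with random placeholder data standing in for actual daily returns:

```python
import numpy as np
import statsmodels.api as sm

# Current return regressed on the returns of the two prior sessions
rng = np.random.default_rng(6)
returns = rng.normal(0.0, 0.01, size=500)   # placeholder for real returns

y = returns[2:]                                        # current return
X = sm.add_constant(np.column_stack([returns[1:-1],    # one day prior
                                     returns[:-2]]))   # two days prior
res = sm.OLS(y, X).fit()
print(res.params[1:])   # estimated lag-1 and lag-2 coefficients
```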
2. Sources of Autocorrelation
Some of the possible reasons for the introduction of autocorrelation in the data are as follows:
Carryover of effect, at least in part, is an important source of autocorrelation. For example,
monthly data on household expenditure are influenced by the expenditure of the preceding month.
The autocorrelation is present in cross-section data as well as time-series data. In the cross-
section data, the neighbouring units tend to be similar with respect to the characteristic under
study. In time-series data, time is the factor that produces autocorrelation. Whenever some
ordering of sampling units is present, the autocorrelation may arise.
Another source of autocorrelation is the effect of deletion of some variables. In regression
modeling, it is not possible to include all the variables in the model. There can be various
reasons for this, e.g., some variable may be qualitative, sometimes direct observations may not
be available on the variable etc. The joint effect of such deleted variables gives rise to
autocorrelation in the data.
The misspecification of the form of the relationship can also introduce autocorrelation in the
data. It is assumed that the form of the relationship between the study and explanatory variables
is linear. If there are log or exponential terms present in the model, so that the linearity of
the model is questionable, then this also gives rise to autocorrelation in the data.
The difference between the observed and true values of a variable is called measurement error
or errors-in-variables. The presence of measurement errors in the dependent variable may also
introduce autocorrelation in the data.
3. Consequences of Autocorrelation
Refer IIT Notes…
Note:
The Durbin-Watson test gives an output ranging from 0 to 4. The autocorrelation will be
Closer to 0: strong and positive
Around 2: low or none
Closer to 4: strong and negative
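The statistic is available in statsmodels; a minimal sketch on simulated data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Durbin-Watson statistic computed from OLS residuals (illustrative data)
rng = np.random.default_rng(7)
x = rng.uniform(0.0, 10.0, size=200)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, size=200)
res = sm.OLS(y, sm.add_constant(x)).fit()

print(durbin_watson(res.resid))   # ~0 positive, ~2 none, ~4 negative
```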
Description:
There are a large number of tests of randomness (e.g., the runs test). Autocorrelation plots
are one common method of testing for randomness. The Ljung-Box test is based on the autocorrelation
plot.
However, instead of testing randomness at each distinct lag, it tests the "overall" randomness
based on a number of lags.
For this reason, it is often referred to as a "portmanteau" test.
More formally, the Ljung-Box test can be defined as follows.
H0: The data are random (or: the residuals are independently distributed).
H1: The data are not random (or: the residuals are not independently distributed; they exhibit
serial correlation).
Test statistic:

$Q_{LB} = n(n+2)\sum_{j=1}^{h} \dfrac{\hat{\rho}_j^2}{n-j},$

where n is the sample size, $\hat{\rho}_j$ is the autocorrelation at lag j, and h is the number of
lags being tested.
Significance Level: α
Critical Region: The test statistic follows a chi-square distribution with h degrees of
freedom; that is, $Q_{LB} \sim \chi^2(h)$.
We reject the null hypothesis and say that the residuals of the model are
not independently distributed if $Q_{LB} > \chi^2_{1-\alpha,\,h}$,
where $\chi^2_{1-\alpha,\,h}$ is the 1−α quantile (percent point function) of the chi-square
distribution with h degrees of freedom.
The Ljung-Box test is commonly used in ARIMA modeling. Note that it is applied to the
residuals of a fitted ARIMA model, not the original series.
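A minimal sketch of the Ljung-Box test in statsmodels, applied here to white noise standing in for fitted-model residuals, with h = 10 lags:

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

# Ljung-Box test on (stand-in) model residuals, testing 10 lags jointly
resid = np.random.default_rng(8).normal(size=300)
print(acorr_ljungbox(resid, lags=[10], return_df=True))  # lb_stat, lb_pvalue
```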
v. A correlogram:
A pattern in the results is an indication of autocorrelation. Any values significantly above zero
should be looked at with suspicion.
One simple test of stationarity is based on the so-called autocorrelation function (ACF).
The ACF at lag k, denoted by $\rho_k$, is defined as

$\rho_k = \dfrac{\gamma_k}{\gamma_0} = \dfrac{\text{covariance at lag } k}{\text{variance}},$

where the covariance at lag k and the variance are as defined before.
Note that if k = 0, $\rho_0 = 1$.
Since both covariance and variance are measured in the same units of measurement, $\rho_k$ is a
unitless, or pure, number. It lies between −1 and +1, as any correlation coefficient does. If we plot
$\rho_k$ against k, the graph we obtain is known as the population correlogram.
Since in practice we only have a realization (i.e., sample) of a stochastic process, we can only
compute the sample autocorrelation function (SACF), $\hat{\rho}_k$. To compute this, we must first
compute the sample covariance at lag k, $\hat{\gamma}_k = \sum (Y_t - \bar{Y})(Y_{t+k} - \bar{Y})/n$,
and the sample variance, $\hat{\gamma}_0 = \sum (Y_t - \bar{Y})^2/n$. Then

$\hat{\rho}_k = \dfrac{\hat{\gamma}_k}{\hat{\gamma}_0},$

which is simply the ratio of sample covariance (at lag k) to sample variance. A plot of $\hat{\rho}_k$
against k is known as the sample correlogram.
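A sample correlogram can be drawn with statsmodels' plot_acf; a short sketch on a simulated random walk:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Sample ACF (rho_k-hat) plotted against lag k for a simulated series
series = np.random.default_rng(9).normal(size=200).cumsum()  # a random walk
plot_acf(series, lags=20)
plt.show()
```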
Note:
The only nonzero value in the theoretical ACF is at lag 1; all other autocorrelations are 0.
Thus a sample ACF with a significant autocorrelation only at lag 1 is an indicator of a possible MA(1)
model.
Example:
Theoretical Properties of a Time Series with an MA(2) Model
Note:
The only nonzero values in the theoretical ACF are for lags 1 and 2. Autocorrelations for higher
lags are 0. So, a sample ACF with significant autocorrelations at lags 1 and 2, but non-significant
autocorrelations for higher lags indicates a possible MA(2) model.
Invertibility of MA models
An MA model is said to be invertible if it is algebraically equivalent to a converging infinite order
AR model. By converging, we mean that the AR coefficients decrease to 0 as we move back in time.
Invertibility is a restriction programmed into time series software used to estimate the coefficients
of models with MA terms. It’s not something that we check for in the data analysis.
Restating: an invertible MA model is one that can be written as an infinite-order AR model whose
coefficients converge to 0 as we move infinitely back in time. We'll demonstrate invertibility for
the MA(1) model.
Stationary series:
For an autocorrelation function (ACF) to make sense, the series must be a weakly
stationary series. This means that the autocorrelation for any particular lag is the same regardless of
where we are in time.
Stationary (Weakly) Series:
Let $x_t$ denote the value of a time series at time t. The ACF of the series gives correlations
between $x_t$ and $x_{t-h}$ for h = 1, 2, 3, etc. Theoretically, the autocorrelation between $x_t$
and $x_{t-h}$ equals

$\dfrac{\mathrm{Cov}(x_t, x_{t-h})}{\mathrm{Std}(x_t)\,\mathrm{Std}(x_{t-h})} = \dfrac{\mathrm{Cov}(x_t, x_{t-h})}{\mathrm{Var}(x_t)}.$

The denominator in the second formula occurs because the standard deviation of a stationary
series is the same at all times.
The last property of a weakly stationary series says that the theoretical value of autocorrelation of
particular lag is the same across the whole series. An interesting property of a stationary series is that
theoretically it has the same structure forwards as it does backward.
Many stationary series have recognizable ACF patterns. Most series that we encounter in practice,
however, are not stationary. A continual upward trend, for example, is a violation of the requirement
that the mean is the same for all t. Distinct seasonal patterns also violate that requirement. The
strategies for dealing with nonstationary series will unfold during the first three weeks of the
semester.
Assumptions
Note:
The coefficient $\phi_1$ is the slope in the AR(1) model, and we now see that it is also the lag 1
autocorrelation.