Econometrics Notes

Unit 1:
(Basic Econometrics: Chapters 1-9)
(Econometrics by Example: Chapters 1-3)
. Introduction to Econometrics and overview of its applications
. Simple Regression with Classical Assumptions
. Least Squares Estimation and BLUE, Properties of Estimators, Multiple
Regression Model and Hypothesis Testing Related to Parameters:
Simple and Joint
. Functional Forms of Regression Models

Unit 2:
(Basic Econometrics: Chapters 10-13 and Chapters 21-22)
(Econometrics by Example: Chapters 4-7 and Chapter 13)
. Violations of classical assumptions
. Multicollinearity, heteroscedasticity, autocorrelation and model
specification errors, their identification, their impact on
parameters
. Tests related to parameters and impact on the reliability and the
validity of inferences in cases of violations of assumptions
. Methods to take care of violations of assumptions, goodness of fit
. Time Series econometrics:
. Stationary stochastic processes, non-stationary stochastic
processes, unit root stochastic processes, trend stationary and
difference stationary stochastic processes.
. Tests of stationarity: graphical analysis, the autocorrelation
function (ACF) and correlogram, statistical significance of
autocorrelation coefficients. Unit root test - the augmented Dickey-
Fuller (ADF) test.
. Transforming non-stationary financial time series - difference
stationary processes and trend-stationary processes

Unit 3:
(Basic Econometrics: Chapter 16)
(Econometrics by Example: Chapters 17, 19)

Panel Data regression Models: Importance of panel data, Pooled OLS


regression of the charity function, the fixed effects least squares dummy variable
(LSDV) model, limitations of fixed effects LSDV, the fixed effects within group
(WG) estimation, random effects model (REM) or error components model
(ECM), fixed effects models vs random effects models and properties of various
estimators.

Stochastic Regressors and the method of instrumental variables - the problem


of endogeneity, the problem with stochastic regressors, reasons for correlation
between regressor and the error term and the method of instrumental variables
(2SLS).

Unit 4:
(Basic Econometrics: Chapter 9, 15)
(Econometrics by Example: Chapters 3, 8)

Dummy variables: intercept dummy variables, slope dummy variables,


interactive dummy variables. Use of dummy variables to model qualitative/
binary/structural changes, other functional forms, qualitative response
regression models or regression models with limited dependent variables - use
of logit and probit models.

Most common topics:

Theory:
. Assumptions of CLRM
. Any one specific assumption, how to detect, how to fix, implications
on the model
. Trend stationary and difference stationary
. Unit root stationary

Practical:
. Multicollinearity
. Autocorrelation
. Interpretation of the model
. Dummy variables
. Log-Lin models
. Logit/probit (not that common)
. Heteroscedasticity
. Regression table: p-value, t-stat, F-stat, coefficients etc.
. Prediction
. Calculation of B1 and B2 in regression using the formula
. Cobb-Douglas

UNIT 1

Desirable Properties of Estimators


. Efficiency: Var (Beta hat) should be minimum

Beta hat must have the minimum amount of variance. If the variance of Beta hat is
too large, then it may not accurately capture the value of the true
population parameter Beta, and hence the inferences we draw from the model will
not be precise.

Assume that your true population parameter is represented by Beta. It is shown


by the big black dot in the picture. Sometimes our estimate of this Beta (given
by Beta hat) will be close to the true population parameter, and sometimes they
will be far off. They are represented by the small black dots in the figure. We
want this variance to be minimum. Because if the variance is too large then the
model is unreliable.

. Unbiasedness

When we draw samples from the population, the Beta hat calculated will not
necessarily be equal to the population parameter Beta. Sometimes it will be
higher and sometimes it will be lower.
However, if we draw a large number of samples and compute Beta hat for each
of them, the mean of all of those Beta hats should be equal to the true
population parameter Beta.

. Consistency

The symbol delta (δ) used below denotes a very small positive quantity.


The property of consistency states that the difference between Beta hat and
Beta should become negligible as the sample grows: it should be less than delta,
where delta is a very small quantity.
The probability of getting a Beta hat whose difference from Beta is very
small (less than delta) is equal to 1 - epsilon, where epsilon is also a very
small quantity, and this holds as n (the sample size) tends to infinity.

The consistency property is also called an asymptotic property because it is
satisfied only with a large sample size.
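Stated formally (standard definition, added here for reference), consistency means:

    \lim_{n \to \infty} \Pr\!\left( \lvert \hat{\beta} - \beta \rvert < \delta \right) = 1
    \quad \text{for any small } \delta > 0,
    \qquad \text{i.e.} \quad \operatorname{plim}_{n \to \infty} \hat{\beta} = \beta .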

These three properties of a good estimator hold only when certain
assumptions, on which the model is estimated, are satisfied.

When one of these assumptions is violated, one or more of these
properties no longer hold.

CLASSICAL LINEAR REGRESSION MODEL (CLRM)

Regression:
Regression is an analysis of the dependence of a dependent variable on one or more
independent variables, with the objective of predicting the average
value of the dependent variable given a specific value of the independent
variable.
Suppose this is our regression equation.

Yi is dependent variable, xi is independent variable. We can say this model tries


to predict consumption as a function of income. So consumption is the
dependent variable, income is the independent variable.

This represents the prediction given by the regression function. A regression


analysis gives the AVERAGE value of the dependent variable given a specific
value of the independent variable.

PRF is the population regression function. It is in stochastic form because it has


an error term which is a random variable.

The E(Yi|Xi) function gives us the PRF in deterministic form because it has no
error term. This is because E(ui|Xi) = 0 based on an assumption of the CLRM.
Hence, the expected value of the error term for any given Xi is always 0.

This is the SAMPLE REGRESSION FUNCTION.

Beta interpretation: for a one-unit change in Xi, Yi changes by Beta ON


AVERAGE. The reference point of this change is always from the sample
average.

Ui: ERROR TERM


Ui is the difference between the actual value of the observation and the
predicted value of the observation.
ASSUMPTIONS OF CLRM:

If we follow the assumptions of CLRM, then our model will exhibit the properties
of efficiency, unbiasedness, and consistency.

. Linearity
The regression model must be linear in its parameters.
Linearity is required not for the variables, but only for the parameters.

Although the variable Xi^2 is not linear, this is still a linear model because the
parameters (Alpha and Beta) are linear.

Here, one of the parameters enters non-linearly (as B^2), hence the model is not linear in the parameters.
Such models cannot be evaluated using CLRM. We will have to use non-linear
models.

. There should be enough variation in Xi to be qualified as an


explanatory/independent variable.

For example, suppose we want to model the consumption level taking


income as the independent variable. If everyone in the sample has an
income of Rs 10,000, then there is no variation in Xi. Hence, the
difference in their consumption levels cannot be explained by Xi.
Hence, we assume that there is enough variation in Xi. Otherwise we
cannot include it as an explanatory variable in the model.

. The mean value of the stochastic error term is zero and it follows a
normal distribution.

Mean value of error term is zero.


(This is REQUIRED for estimation)

Error term follows a normal distribution:

(This is required for INFERENCE making/hypothesis testing)

. The values of Xis are fixed over repeated sampling.

. There should be no endogeneity.


The covariance between the independent variable and the error term
is zero.

. Number of parameters to be estimated (k) from the model should be


much less than the total number of observations in the sample (n)

So, n >> k

As a rule of thumb n should be equal to 20 times k or higher.


So if you have 2 parameters to be estimated, then your n should be at
least 40.

. There should be no model misspecification.


The econometric model should be correctly specified. That means
there should not be any model misspecification.

Two types of model misspecification:


. Due to improper functional form
If the true relationship is non-linear and we use a linear model to
specify the model, then it is a case of improper functional form.
Hence, the model will sometimes underestimate the value and in
some cases it will overestimate the value.

. Due to inclusion of an irrelevant variable or exclusion of a relevant


variable.

In case A, there is inclusion of an irrelevant variable and in case B, there is


exclusion of a relevant variable.

. Homoskedasticity
Variance of the error term is constant.

So variance of Ui, Uj, Uk must all be the same.

If this is the case, then there is a situation of homoskedasticity. Otherwise, it is
a situation of heteroskedasticity. There should be no heteroskedasticity.

Even at higher levels of Xi, the variance of the error term remains constant.
. There should not be any autocorrelation.
This means that the covariance between Ut and Ut-1 should be zero. If
that is not the case, then it is a situation of autocorrelation.

In cross sectional data, if two error terms do not have zero covariance, then it is
a situation of SPATIAL CORRELATION

In time series data, if two error terms for consecutive time periods do not have
zero covariance, then it is a situation of AUTOCORRELATION OR SERIAL
CORRELATION.

Why does it arise?


It may happen due to the influence of a variable that is not included in the
model. For example, let us say weather affects the consumption patterns of
individuals. Then it may influence the value of consumption in the same
manner for two different individuals experiencing similar weather conditions, and
hence their error terms may be correlated.

. Multicollinearity
There should not be a high degree of correlation between any two
independent variables. If the correlation is high, then we say that the
model has high multicollinearity.

This shows a situation of multicollinearity

ORDINARY LEAST SQUARES METHOD (OLS)


We cannot minimise sum of errors because the negative and positive errors will
cancel each other out. Hence, the summation of the squared errors must be
minimised.

Formula for Beta hat:
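Since the formula itself is not reproduced above, here is a minimal sketch of the two-variable OLS formulas in Python; the income/consumption numbers are hypothetical placeholders, not from the notes.

    import numpy as np

    # Hypothetical sample: income (X) and consumption (Y)
    x = np.array([80, 100, 120, 140, 160], dtype=float)
    y = np.array([70, 85, 95, 110, 120], dtype=float)

    # Beta2 hat (slope) = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
    beta2_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # Beta1 hat (intercept) = Ybar - Beta2 hat * Xbar
    beta1_hat = y.mean() - beta2_hat * x.mean()

    print(beta1_hat, beta2_hat)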


Properties of Beta hat:

. It depends solely on the sample observations


. It is the point estimate of the true population parameter.
. The regression line passes through the sample means of X and Y and
has a slope equal to Beta hat

Interpretation of parameters:

. Beta hat: For a unit change in income (independent var), consumption


(dependent var) changes by Beta hat units ON AN AVERAGE.

Limitations:
Under OLS, we are giving equal weightage to all error terms. However, we
should give more weightage to points closer to the regression line and lesser
weightage to the points away from the regression line, but that does not
happen in the case of the OLS method. When we square these errors, the
points that are farther away from the line are actually contributing MORE
to the RMSE and hence are given more importance than the points that are
closer to the line.
We resolve this using the Weighted Least Squares method.

Population Regression Curve


a population regression curve is simply the locus of the conditional means of
the dependent variable for the fixed values of the explanatory variable(s). More
simply, it is the curve connecting the means of the subpopulations of Y
corresponding to the given values of the regressor X.

Error term
ui is known as the stochastic disturbance or stochastic error term.
the disturbance term ui is a surrogate for all those variables that are omitted
from the model but that collectively affect Y.

The stochastic error term is a catchall that includes all those variables that
cannot be readily quantified. It may represent variables that cannot be included
in the model for lack of data availability, or errors of measurement in the data,
or intrinsic randomness in human behavior. Whatever the source of the random
term u, it is assumed that the average effect of the error term on the
regressand is marginal at best

Estimator and Estimate


an estimator, also known as a (sample) statistic, is simply a rule or formula or
method that tells how to estimate the population parameter from the
information provided by the sample at hand. A particular numerical value
obtained by the estimator in an application is known as an estimate

BLUE
BEST LINEAR UNBIASED ESTIMATOR

An estimator, say the OLS Estimator Beta hat, is said to be a best linear
unbiased estimator (BLUE) if the following hold:

. It is linear
It is a linear function of a random variable such as the dependent
variable Y in the regression model

. It is unbiased
The average or expected value of the estimator is equal to the true
value.
E(Beta hat) = Beta

. It is efficient
It has minimum variance in the class of all such linear unbiased
estimators.
An unbiased estimator with the least variance is known as an efficient
estimator.

As long as the assumptions of CLRM are ensured, the OLS estimator will be
BLUE.

COEFFICIENT OF DETERMINATION
The overall goodness of fit of the regression model is measured by the
coefficient of determination, r2. It tells what proportion of the variation in the
dependent variable, or regressand, is explained by the explanatory variable, or
regressor. This r2 lies between 0 and 1; the closer it is to 1, the better is the fit.

TSS: TOTAL SUM OF SQUARES


total variation of the actual Y values about their sample mean which may be
called the total sum of squares (TSS)
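For reference, the standard decomposition behind r2 is:

    \text{TSS} = \sum_i (Y_i - \bar{Y})^2, \qquad
    \text{TSS} = \text{ESS} + \text{RSS}, \qquad
    r^2 = \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}}

where ESS is the explained (regression) sum of squares and RSS is the residual sum of squares.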
Limitations of R2 and Adjusted R2

METHODS OF TESTING THE NORMALITY ASSUMPTION


. Histogram of Residuals
We plot a histogram of the residuals and see if the shape of the
histogram resembles a bell-curve.

. Normal Probability Plot


A comparatively simple graphical device to study the shape of the
probability density function (PDF) of a random variable is the normal
probability plot (NPP), which makes use of normal probability paper,
a specially designed graph paper. On the horizontal, or X, axis, we plot
values of the variable of interest (say, OLS residuals, ui ), and on the
vertical, or Y, axis, we show the expected value of this variable if it
were normally distributed. Therefore, if the variable is in fact from the
normal population, the NPP will be approximately a straight line.

if the fitted line in the NPP is approximately a straight line, one can
conclude that the variable of interest is normally distributed.

. Jarque-Bera (JB) Test of Normality


The JB test of normality is a large sample test. It is also based on the
OLS residuals. The test first computes the skewness (S) and kurtosis
(K) measures of the residuals.

For a normally distributed variable, S=0, K=3. The JB normality test is


a test of joint hypothesis that S and K are 0 and 3 respectively. We use
the following test to compute the JB test statistic.

where n = sample size, S = skewness coefficient, K = kurtosis coefficient.

Hence, if the error terms are normally distributed, the JB test should give a
test statistic close to zero.

If the computed p-value of the JB test statistic is sufficiently low, which will
happen if the value of the computed JB test statistic is significantly different
from 0, then we can reject the null hypothesis that the error terms are normally
distributed.
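The JB statistic is JB = n[S^2/6 + (K - 3)^2/24], which follows a chi-square distribution with 2 degrees of freedom under the normality null. A minimal sketch in Python; the residuals are simulated here only so the snippet runs on its own.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    resid = rng.normal(size=500)             # stand-in for OLS residuals

    n = len(resid)
    S = stats.skew(resid)                    # skewness
    K = stats.kurtosis(resid, fisher=False)  # kurtosis (normal distribution has K = 3)

    JB = n * (S**2 / 6 + (K - 3)**2 / 24)
    p_value = stats.chi2.sf(JB, df=2)        # JB ~ chi-square with 2 df under H0
    print(JB, p_value)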

MULTIPLE REGRESSION MODELS


Interpretation of B1: average value of Y when X2 and X3 are set equal to zero.

The coefficients β2 and β3 are called the partial regression coefficients

The meaning of partial regression coefficient is as follows: β2 measures the


change in the mean value of Y, E(Y), per unit change in X2, holding the value of
X3 constant.

We may also want to know the proportion of the variation in Y explained by the
variables X2 and X3 jointly. The quantity that gives this information is known as
the multiple coefficient of determination and is denoted by R2; conceptually it is
akin to r2.

HYPOTHESIS TESTING OF PARAMETERS

SIMPLE

Under hypothesis testing for parameters, we check to see if the true population
parameter of a given regression coefficient is zero.

In practice, a low p value suggests that the estimated coefficient is statistically


significant. This would suggest that the particular variable under consideration
has a statistically significant impact on the regressand, holding all other
regressor values constant.

Interval Estimate for Regression Coefficient

JOINT
If the COMPUTED F-VALUE is greater than the CRITICAL F-VALUE at the given
level of significance, then we reject the null hypothesis and say that AT LEAST
ONE of the regressors is statistically significant.

It is very important to note that the use of the t and F tests is explicitly
based on the assumption that the error term, Ui, is normally distributed

FUNCTIONAL FORMS OF REGRESSION MODELS

. Log-linear or double log models, where the regressand and regressors
are both in logarithmic form
. Log-lin models, in which the regressand is logarithmic but the
regressors can be in log or linear form
. Lin-log models, in which the regressand is in linear form but one or
more regressors are in log form
. Reciprocal models, in which the regressors are in inverse form
. Standardised variable regression models

Log-Linear Model/Double Log/Constant Elasticity models

Where the regressand and regressors are all in logarithmic form

Equation 2.3 is linear in the parameters A, B2, and B3 and hence is a linear
equation although it is non linear in the variables.

An interesting feature of the log-linear model is that the slope coefficients can
be interpreted as elasticities.

Specifically, B2 is the (partial) elasticity of output with respect to the labor


input, holding all other variables constant (here capital, or K). That is, it gives
the percentage change in output for a percentage change in the labor input,
ceteris paribus. Similarly, B3 gives the (partial) elasticity of output with respect
to the capital input, holding all other inputs constant. Since these elasticities
are constant over the range of observations, the double-log model is also
known as constant elasticity model.

Let us assume the output table for equation 2.4 is as follows


The coefficient of 0.47 on ln(LABOUR) indicates that if we increase the labour
input by 1%, the output goes up by 0.47% on average, holding the capital input
constant.

Similarly, if we increase the capital input by 1%, then the output goes up by
0.5% on average, holding the labour input constant.

In the two-variable model, the simplest way to decide whether the log-linear
model fits the data is to plot the scattergram of ln Yi against ln Xi and see if the
scatter points lie approximately on a straight line
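A minimal sketch of estimating a double-log (Cobb-Douglas) function with statsmodels; the data frame and the column names output, labor and capital are hypothetical placeholders, not the table discussed above.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Placeholder data: one row per firm/observation
    df = pd.DataFrame({"output":  [100, 150, 210, 320, 450],
                       "labor":   [20, 28, 35, 50, 70],
                       "capital": [50, 60, 90, 120, 180]})

    # ln(output) = b1 + b2*ln(labor) + b3*ln(capital) + u
    model = smf.ols("np.log(output) ~ np.log(labor) + np.log(capital)", data=df).fit()
    print(model.params)   # b2 and b3 are the (partial) elasticities
    # Sum of the slope estimates gauges returns to scale (next subsection)
    print(model.params["np.log(labor)"] + model.params["np.log(capital)"])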

Sum of Partial Slope Coefficients

Another interesting property of the CD function is that the sum of the partial
slope coefficients, (B2 + B3), gives information about returns to scale, that is,
the response of output to a proportional change in the inputs. If this sum is 1,
then there are constant returns to scale - that is, doubling the inputs will double
the output, tripling the inputs will triple the output, and so on. If this sum is less
than 1, then there are decreasing returns to scale - that is, doubling the inputs
less than doubles the output. Finally, if this sum is greater than 1, there are
increasing returns to scale - that is, doubling the inputs more than doubles the
output.

Comparing R2

If we were to restate the CD function in linear terms, then it would look


something like:

We can run a linear regression and get a value of the model parameters, an R2
value and so on. However, we cannot compare the R2 value for this linear
regression model with the R2 value for the log-linear model. This is because in
order to compare R2, the dependent variable must be the same in both models.
However, in the log-linear model, the dependent variable was the natural log of
the output whereas in the linear regression model, the dependent variable is
the value of the output.

Log-Lin or Growth Models


So, if B2 = 0.03:
holding all other factors constant, if t goes up by 1 unit, then the RGDP goes up
by 3% on average.

The results of the regression are given as follows:

The time coefficient is 0.03149 or 3.15%. Hence, for every one unit increase in
time (i.e. every year), holding all other factors constant, the RGDP increased by
3.15%.

If we multiply the relative change in Y by 100, then it will then give the
percentage change, or the growth rate, in Y for an absolute change in X, the
regressor. That is, 100 times β2 gives the growth rate in Y; 100 times β2 is
known in the literature as the semielasticity of Y with respect to X.
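A minimal sketch of the log-lin (growth) model, regressing the log of a series on a time trend; the RGDP numbers below are invented purely for illustration.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({"rgdp": [2501, 2586, 2673, 2763, 2857, 2954],
                       "t":    [1, 2, 3, 4, 5, 6]})

    growth_model = smf.ols("np.log(rgdp) ~ t", data=df).fit()
    b2 = growth_model.params["t"]
    print(100 * b2)                     # semielasticity: instantaneous growth rate in percent
    print(100 * (np.exp(b2) - 1))       # compound (year-over-year) growth rate in percent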
Linear Trend Model

Lin-Log Models
Reciprocal Models
Properties of Reciprocal Models:

As X increases indefinitely, the term B2(1/X) approaches 0 and Y approaches
the limiting or asymptotic value B1.

Polynomial Regression Models

Chow Test

When we use a regression model involving time series data, it may happen that
there is a structural change in the relationship between the regressand Y and
the regressors. By structural change, we mean that the values of the
parameters of the model do not remain the same through the entire time
period. Sometimes the structural change may be due to external forces (e.g.,
the oil embargoes imposed by the OPEC oil cartel in 1973 and 1979 or the Gulf
War of 1990–1991), policy changes (such as the switch from a fixed exchange-
rate system to a flexible exchange-rate system around 1973), actions taken by
Congress (e.g., the tax changes initiated by President Reagan in his two terms
in office or changes in the minimum wage rate), or a variety of other causes.
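A minimal sketch of the Chow test F-statistic under stated assumptions: rss_pooled is the residual sum of squares from the full-period regression, rss_1 and rss_2 come from the two sub-period regressions, k is the number of estimated parameters, and n1, n2 are the sub-sample sizes. All numbers below are placeholders.

    from scipy import stats

    def chow_test(rss_pooled, rss_1, rss_2, n1, n2, k):
        """F-statistic for a structural break: H0 = no break (parameters stable)."""
        num = (rss_pooled - (rss_1 + rss_2)) / k
        den = (rss_1 + rss_2) / (n1 + n2 - 2 * k)
        F = num / den
        p_value = stats.f.sf(F, k, n1 + n2 - 2 * k)
        return F, p_value

    # Illustrative values only
    print(chow_test(rss_pooled=1785.0, rss_1=420.0, rss_2=850.0, n1=14, n2=12, k=2))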
QUALITATIVE VARIABLES

In regression analysis we often encounter variables that are essentially


qualitative in nature, such as gender, race, color, religion, nationality,
geographical region, party affiliation, and political upheavals. These
qualitative variables are essentially nominal scale variables which have no
particular numerical values. But we can "quantify" them by creating so-called
dummy variables, which take values of 0 and 1, with 0 indicating the absence of an
attribute and 1 indicating its presence.
. If an intercept is included in the model and if a qualitative variable has
m categories, then introduce only (m - 1) dummy variables. For
example, gender has only two categories; hence we introduce only
one dummy variable for gender. This is because if a female gets a
value of 1, ipso facto a male gets a value of zero.

If, for example, we consider political affiliation as a choice among
Democratic, Republican, and Independent parties, we can have at
most two dummy variables to represent the three parties. If we do not
follow this rule, we will fall into what is called the dummy variable trap,
that is, the situation of perfect collinearity. Thus, if we have three
dummies for the three political parties and an intercept, the sum of the
three dummies will be 1, which will then be equal to the common
intercept value of 1, leading to perfect collinearity.

. If a qualitative variable has m categories, you may include m dummies,
provided you do not include the (common) intercept in the model.
This way we do not fall into the dummy variable trap.

. The category that gets the value of 0 is called the reference, benchmark
or comparison category. All comparisons are made in relation to the
reference category, as we will show with our example.

. Since dummy variables take values of 1 and 0, we cannot take their
logarithms. That is, we cannot introduce the dummy variables in log
form.

. If the sample size is relatively small, do not introduce too many dummy
variables. Remember that each dummy coefficient will cost one
degree of freedom.

Interpretation of Dummy Variables


Interpretation of coefficient of the gender dummy variable which is equal to
-3.0748.
This means that holding all other factors constant, the average salary of a
female worker is $3.07 less than that of a male worker.

Similarly, holding all other factors constant, the salary of a non-white worker is
1.56 dollars less on average than that of a white worker.

Dummy coefficients are often called differential intercept dummies, for they
show the differences in the intercept values of the category that gets the value
of 1 as compared to the reference category.

The common intercept C denotes the average hourly wage of a white,
non-union male worker.

The interpretation of the quantitative regressors is straightforward. For


example, the education coefficient of 1.37 suggests that holding all other
factors constant, for every additional year of schooling the average hourly wage
goes up by about $1.37. Similarly, for every additional year of work experience,
the average hourly wage goes up by about $0.17, ceteris paribus.
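A minimal sketch of a wage regression with differential intercept dummies; the tiny data frame and the 0/1 dummy columns (female, nonwhite) are hypothetical placeholders, not the data behind the coefficients quoted above.

    import pandas as pd
    import statsmodels.formula.api as smf

    # wage: hourly wage; female, nonwhite: 0/1 dummies; educ, exper: years
    df = pd.DataFrame({
        "wage":     [12.5, 9.8, 15.2, 11.1, 14.0, 10.3, 16.8, 9.1],
        "female":   [0, 1, 0, 1, 0, 1, 0, 1],
        "nonwhite": [0, 0, 1, 1, 0, 0, 1, 1],
        "educ":     [12, 12, 14, 13, 16, 11, 17, 10],
        "exper":    [5, 3, 8, 6, 10, 4, 12, 2],
    })

    m = smf.ols("wage ~ female + nonwhite + educ + exper", data=df).fit()
    # Coefficients on female/nonwhite are differential intercepts relative to
    # the reference category (white male); the intercept is the reference wage.
    print(m.params)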

Interpretation of Interaction Variables

Being a female lowers the average salary by about $3.24, being a nonwhite
lowers the average salary by about $2.16, and being both lowers the average salary
by about $4.30 (= -3.24 - 2.16 + 1.10). In other words, compared to the
reference category, a nonwhite female earns a lower average wage than being
a female alone or being a nonwhite alone.

Interpretation of Dummy Variables in a Log-Lin Model

Dependent Var: Log of Wages

Use of Dummy Variables for Structural Change


These results shows that the MPI is about 1.10, meaning that if GPS increases
by a dollar, the average GPI goes up by about $1.10.
The recession dummy coefficient is not significant, suggesting that there has
been no statistically visible change in the level of investment pre- and
post-1981 recession. In other words, the results would suggest that there is no
structural break in the US economy. We have to accept this conclusion
cautiously, for it is quite likely that not only the intercept but the slope of the
investment-savings regression might have changed. To allow for this possibility,
we can introduce both differential intercept and differential slope dummies. So
we estimate the following model
The results of this regression are shown in Table 3.9. The results in this table are
quite different from those in Table 3.8: now both the differential intercept and
slope coefficients are statistically significant. This means that the investment-
savings relationship has undergone a structural change since the recession of 1981.

Using Dummy Variables in Seasonal Data

An interesting feature of many economic time series based on weekly, monthly,


and quarterly data is that they exhibit seasonal patterns (oscillatory
movements). Some frequently encountered examples are sales at Christmas
time, demand for money by households at vacation times, demand for cold
drinks in the summer, demand for air travel at major holidays such as
Thanksgiving and Christmas, and demand for chocolate on Valentine's Day.

The process of removing the seasonal component from a time series is called
deseasonalization or seasonal adjustment, and the resulting time series is called
a deseasonalized or seasonally adjusted time series.
UNIT 2

VIOLATION OF ASSUMPTIONS

Assumption: Multicollinearity
Description: There exists a perfect or near-perfect linear relationship
among two or more explanatory variables.
Causes: Data collection method, model specification, overdetermined
model, constraints in the population/model.
Impact of violation: Estimators are still BLUE, even though variances are
higher.
How to identify / How to fix: (discussed below)

MULTICOLLINEARITY: DEFINITION

One of the assumptions of the CLRM is that there is no exact linear relationship
among the independent variables (regressors). If there are one or more such
relationships among the regressors, we call it multicollinearity.

Perfect Collinearity^

Imperfect Collinearity ^

Perfect vs Imperfect Collinearity

Perfect Collinearity

Imperfect Collinearity

If there is perfect multicollinearity:


the regression coefficients of the X variables are indeterminate and their
standard errors are infinite.

If multicollinearity is less than perfect:


the regression coefficients, although determinate, possess large standard
errors, which means the coefficients cannot be estimated with great precision
or accuracy.

CAUSES

. Data collection method


If we sample over a limited range of values taken by the regressors in
the population, it can lead to multicollinearity

. Model specification
If we introduce polynomial terms into the model, especially when the
values of the explanatory variables are small, it can lead to
multicollinearity

. Constraint on the model or in the population

For example, if we try to regress electricity expenditure on house size
and income, it may suffer from multicollinearity as there is a constraint
in the population: people with higher incomes typically have bigger
houses.

. Overdetermined model
If we have more explanatory variables than the number of
observations, then it could lead to multicollinearity. Often happens in
medical research when you only have a limited number of patients
about whom a large amount of information is collected.

IMPACT OF MULTICOLLINEARITY

In short, when regressors are collinear, statistical inference becomes unreliable,


especially if there is near-collinearity. This is because if two variables are highly
collinear, it can become difficult to isolate the impact of each variable
separately on the dependent variable.

. OLS estimators remain unbiased.


. OLS Estimators still have minimum variance in the class of all linear
unbiased estimators, hence they are still BLUE.
. However, OLS estimators now have large variances and covariances
which makes precise estimation difficult
. As a result, the confidence intervals tend to be wider. Hence, we may
not reject the "zero null hypothesis" (which states that the true
population coefficient is zero)

. Because of the larger variances, the t-statistics of one or more coefficients tend to be
statistically insignificant
. Even though some regression coefficients are statistically
insignificant, the R2 values may be very high
. OLS estimators and their standard errors can be very sensitive to
small changes in the data
. Adding a collinear variable to the chosen regression model can alter
the coefficient values of the other variables in the model

Beta hat will still be unbiased: E(Beta hat) = Beta.
Beta hat will still be consistent: |Beta hat - Beta| becomes arbitrarily small (with
probability approaching 1) as n approaches infinity.
Beta hat will also still be efficient in the BLUE sense: even though its variance
increases, it still has the minimum variance among all linear unbiased estimators.

Why is that?

VIF is the variance-inflating factor. It is a measure of the degree to which the
variance of the OLS estimator is inflated because of multicollinearity.

In case we have more than 2 independent variables in the model, the formula
for calculating the variance of the coefficients becomes:
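The standard expression (reconstructed here for reference, since the original formula is not reproduced above) is:

    \operatorname{var}(\hat{\beta}_j)
    = \frac{\sigma^2}{\sum_i (X_{ji} - \bar{X}_j)^2} \cdot \frac{1}{1 - R_j^2}
    = \frac{\sigma^2}{\sum_i (X_{ji} - \bar{X}_j)^2} \cdot \text{VIF}_j,
    \qquad \text{VIF}_j = \frac{1}{1 - R_j^2}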
where R2j is the R-squared value of the regression model obtained when we regress Xj on
all the other explanatory variables. Hence, if variable Xj has high multicollinearity
with the other variables, R2j will be high, which means that the variance of
Bj gets inflated and Bj can no longer be estimated precisely.

When variance increases, the standard error of Bj will also increase.

If the standard error increases, the value of the t-statistic will
fall.
This leads to a higher chance that the coefficient will appear
statistically insignificant.

If your model shows a high R-squared but some of the independent variables
are statistically insignificant, then it is a good indication that the model suffers
from multicollinearity.

HOW TO DETECT MULTICOLLINEARITY


. Check pairwise correlation among the explanatory variables
If the pairwise correlation between any two explanatory variables is very
high (as a rule of thumb, greater than 0.8), then your model suffers
from multicollinearity. But a low pairwise correlation does not
necessarily indicate the absence of multicollinearity.

. VIF and TOL

where R2j is the R-squared of the regression model in which variable Xj is
regressed on all the other explanatory variables in the model. This is also called an
AUXILIARY REGRESSION.

If VIF > 10, then variable Xj shows multicollinearity.

With the use of VIF, we can also calculate the tolerance factor (TOL):

TOL = 1/VIF = 1 - R2j

(A code sketch for computing VIF and TOL appears after this list.)

TOL close to zero indicates perfect collinearity whereas TOL close to 1


indicates no collinearity.

. Many of the explanatory variables are individually insignificant


because their t-statistic values are statistically insignificant, but the R
squared value is high. In that case, there is a possibility that your
model suffers from multicollinearity.

. Auxiliary Regressions
To find out which of the regressors are highly collinear with the other
regressors included in the model, we can regress each regressor on
the remaining regressors and obtain the auxiliary regressions
mentioned earlier.

Since we have 15 regressors, there will be 15 auxiliary regressions. We


can test the overall significance of each regression by the F test. The
null hypothesis is that all regressor coefficients in the auxiliary
regression are zero. If we reject this hypothesis for one or more of the
auxiliary regressions, then we can conclude that the regressands of
the auxiliary regressions with significant F values are collinear with
other variables in the model.
. Partial Correlation Coefficients
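A minimal sketch of computing VIF and TOL with statsmodels, as referenced in the VIF bullet above; the data frame of regressors and its column names are placeholders.

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # X: DataFrame of explanatory variables (placeholder data)
    X = pd.DataFrame({"income": [40, 55, 62, 71, 80, 95, 101, 120],
                      "wealth": [120, 160, 190, 210, 245, 280, 300, 360],
                      "age":    [25, 31, 36, 40, 44, 50, 53, 60]})
    Xc = sm.add_constant(X)

    for i, name in enumerate(Xc.columns):
        if name == "const":
            continue
        vif = variance_inflation_factor(Xc.values, i)   # auxiliary regression of column i
        print(name, "VIF =", round(vif, 2), "TOL =", round(1 / vif, 3))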

HOW TO FIX MULTICOLLINEARITY

. Increasing sample size.


Let us say our dependent variable is consumption expenditure and
independent variables are income and wealth.

It may be the case that in the population, wealth and income are not
correlated. However, in the sample that we have collected, it may be
the case that individuals who have high income also have high levels
of wealth. Hence, we can rectify the problem by increasing the sample
size and including more individuals who have high income but low wealth,
and low income but high wealth.

Furthermore, as we increase the sample size, the variance of the


coefficient decreases and hence it becomes a more efficient estimator
and severity of multicollinearity goes down.

However, it may not always be feasible to increase the sample size due
to difficulty in collection of data, unavailability of data, high costs etc.

. Dropping non-essential variables


If we have father and mother’s education both in the model, it is
possible that they will be highly correlated. Furthermore, the age of a
wife and husband would also be highly correlated. Hence, if we
exclude variables such as the father’s age or the mother’s years of
education, then it is possible to improve the performance of the model
and the collinearity problem may not be as serious as before.

However, this may not be the best strategy. If the variable is


theoretically justified to be included in the model, dropping such a
variable would lead to a case of model misspecification.

Principal Component Analysis

A statistical method, known as principal component analysis (PCA), can
transform correlated variables into orthogonal or uncorrelated variables. The
orthogonal variables thus obtained are called the principal components.

The basic idea behind PCA is simple. It groups the correlated variables into
sub-groups so that variables belonging to any sub-group have a "common" factor
that moves them together. This common factor may be skill, ability, intelligence,
ethnicity, or any such factor. That common factor, which is not always easy to
identify, is what we call a principal component. There is one PC for each
common factor. Hopefully, these common factors or PCs are fewer in number
than the original number of regressors.

The first part of the above table gives the estimated 15 PCs. PC1, the first
principal component, has a variance (eigenvalue) of 3.5448 and accounts for
24% of the total variation in all the regressors. PC2, the second principal
component, has a variance of 2.8814, accounting for 19% of the total variation
in all 15 regressors. These two PCs account for 42% of the total variation. In this
manner you will see the first six PCs cumulatively account for 74% of the total
variation in all the regressors. So although there are 15 PCs, only six seem to be
quantitatively important. This can be seen more clearly in Figure 4.1 obtained
from Minitab 15.
Now look at the second part of Table 4.7. For each PC it gives what are called
loadings or scores or weights, that is, how much each of the original
regressors contributes to that PC. For example, take PC1: education, family
income, father's education, mother's education, husband's education,
husband's wage, and MTR load heavily on this PC. But if you take PC4 you will
see that husband's hours of work contribute heavily to this PC.
Although mathematically elegant, the interpretation of PCs is subjective. For
instance, we could think of PC1 as representing the overall level of education, for
that variable loads heavily on this PC.

The method of principal components is a useful way of reducing the number of


correlated regressors into a few components that are uncorrelated. As a result,
we do not face the collinearity problem.

Since there is no such thing as a free lunch, this simplification comes at a cost
because we do not know how to interpret the PCs in a meaningful way in
practical applications. If we can identify the PCs with some economic variables,
the principal components method would prove very useful in identifying
multicollinearity and also provide a solution for it.
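A minimal sketch of extracting principal components from correlated regressors with scikit-learn (not the Minitab output discussed above); the regressor matrix here is simulated around a single common factor purely for illustration.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    base = rng.normal(size=(100, 1))
    # Three correlated regressors built from a common factor (placeholder data)
    X = np.hstack([base + 0.1 * rng.normal(size=(100, 1)) for _ in range(3)])

    Z = StandardScaler().fit_transform(X)      # standardize before PCA
    pca = PCA().fit(Z)
    print(pca.explained_variance_ratio_)       # share of total variation per PC
    pcs = pca.transform(Z)                     # uncorrelated components usable as regressors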

HETEROSKEDASTICITY

One of the problems commonly encountered in cross-sectional data is
heteroscedasticity (unequal variance) in the error term.

There are various reasons for heteroscedasticity, such as:


. the presence of outliers in the data
. incorrect functional form of the regression model
. incorrect transformation of data
. mixing observations with different measures of scale (e.g. mixing
high-income households with low-income households) etc.

Consequences of Heteroskedasticity:

. It does not alter the unbiasedness and consistency properties of OLS
estimators.
. However, OLS estimators no longer have minimum variance, i.e. they
are not efficient. So they are not BLUE; they are only LUE.
. As a result, the t and F statistics under the standard assumptions of
CLRM may not be reliable and may give erroneous conclusions about
the statistical significance of the estimated regression coefficients.
. In the presence of heteroscedasticity, BLUE estimators are provided
by the method of weighted least squares (WLS).
How to detect heteroskedasticity?

. Breusch-Pagan (BP) Test

Steps:
. Estimate the OLS regression and obtain the squared OLS residuals
from this regression (ei^2).
. Estimate an auxiliary linear regression model: regress ei^2 on the
regressors of the original model. Save the R2 from this model as
R2aux.
. Check the F-statistic from the auxiliary model. The null hypothesis is
that all the slope coefficients in the auxiliary regression are zero, i.e.
none of the regressors help explain the squared residuals
(homoskedasticity). If the computed F-statistic is statistically
significant, then we reject the null hypothesis of homoskedasticity. If
it is not, we may not reject the null hypothesis.
. Instead, you can also multiply R2aux by the number of
observations to get the chi-square statistic and compare this with
the critical chi-square value for the given degrees of freedom.
(A code sketch of the BP test appears after this list of tests.)

. White’s test
(Do later)

. Goldfeld-Quandt Test
This popular method is applicable if one assumes that the
heteroscedastic variance, σi^2, is positively related to one of the
explanatory variables in the regression models.
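A minimal sketch of the BP test with statsmodels, as referenced in the first bullet above; the data are simulated so that heteroskedasticity is actually present, and all names are placeholders.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    rng = np.random.default_rng(2)
    x = rng.uniform(1, 10, size=200)
    y = 2 + 0.5 * x + rng.normal(scale=x, size=200)   # error variance grows with x

    X = sm.add_constant(x)
    res = sm.OLS(y, X).fit()

    lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, res.model.exog)
    print(lm_stat, lm_pvalue)   # LM = n * R2_aux, compared against chi-square critical value
    print(f_stat, f_pvalue)     # F version of the same test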
Remedial Measures

. Weighted Least Squares

If we could observe the true heteroscedastic variances σi^2, then we
could obtain BLUE estimators by dividing each observation by the
heteroscedastic σi and estimating the transformed model by OLS. This
is known as the weighted least squares regression.

Unfortunately, the true σi^2 is rarely ever known and hence we cannot
implement the WLS technique in this exact form in practice. (A code
sketch of WLS under an assumed variance structure appears after this
list of remedies.)
. Transformation

If the true error variance is proportional to the square of one of the
regressors, we can divide both sides of the equation by that variable
and run the transformed regression.

Suppose the error variance is proportional to the square of income;
then we divide both sides of the regression equation by income and
estimate this regression. We can then subject this regression to
heteroscedasticity tests such as the BP test and White test. If these
tests indicate that there is no evidence of heteroscedasticity, we may
then assume that the transformed error term is homoscedastic.

. Square Root Transformation

If the true error variance is proportional to one of the regressors, we
can use the so-called square root transformation, that is, we divide both
sides of the equation by the square root of the chosen regressor. We then
estimate the regression thus transformed and subject that
regression to heteroscedasticity tests. If these tests are satisfactory,
we may rely on this regression.

. Log Transformation

We can regress the logarithm of the dependent variable on the
regressors, which may be in linear or in log form. The reason for this is
that the log transformation compresses the scales in which the
variables are measured, thereby reducing a tenfold difference between
two values to a twofold difference. For example, the number 80 is 10
times the number 8, but ln 80 (= 4.3820) is about twice as large as ln
8 (= 2.0794).
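A minimal sketch of weighted least squares with statsmodels, as referenced in the first remedy above. It assumes the error variance is proportional to income squared, so the WLS weights are 1/income^2; the data and names are simulated placeholders.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    income = rng.uniform(10, 100, size=150)
    consumption = 5 + 0.8 * income + rng.normal(scale=0.05 * income, size=150)

    X = sm.add_constant(income)
    ols_res = sm.OLS(consumption, X).fit()

    # If var(u_i) is proportional to income^2, the WLS weights are 1/income^2
    wls_res = sm.WLS(consumption, X, weights=1.0 / income**2).fit()
    print(ols_res.params, wls_res.params)
    print(ols_res.bse, wls_res.bse)   # compare standard errors under OLS vs WLS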

AUTOCORRELATION

One of the assumptions of the classical linear regression model (CLRM) is that
the error terms, Ut, are uncorrelated; that is, the error term at time t is not
correlated with the error term at time (t - 1) or any other error term in the past.

Reasons:

Consequences of Autocorrelation

. OLS estimators are still unbiased and consistent.


. They are still normally distributed in large samples.
. But they are no longer efficient. That is, they are no longer BLUE. In a
case of autocorrelation, standard errors are UNDERESTIMATED. This
means that the t-values are OVERESTIMATED. Hence, variables that
may not be statistically significant erroneously appear to
be statistically significant, with high t-values.
. The hypothesis testing procedure is not reliable as the standard errors
are erroneous, even with large samples. Therefore, the F and t tests
may not be valid.

Detection:

. Graphical Method
. Durbin Watson test
. Breusch-Godfrey test

. GRAPHICAL METHOD
. DURBIN WATSON d TEST

Most commonly used.

This is the ratio of the sum of squared differences in successive residuals to


the residual sum of squares.

The d value always lies between 0 and 4.

Assumptions behind d-test:

. Regression model has an intercept term


. Explanatory variables are fixed in repeated sampling
. Error terms follow the first-order autoregressive scheme:
. Error term is normally distributed
. Regressors do not include lagged values of dependent variable Yt.
That is regressors do not include Yt-1, Yt-2 and other lagged terms of
Y.

the d value lies between 0 and 4. The closer it is to zero, the greater is the
evidence of positive autocorrelation, and the closer it is to 4, the greater is the
evidence of negative autocorrelation. If d is about 2, there is no evidence of
positive or negative (first-) order autocorrelation.

. BG (Breusch-Godfrey) TEST FOR AUTOCORRELATION
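A minimal sketch computing both the Durbin-Watson d statistic and the BG (LM) test with statsmodels; the data are simulated with AR(1) errors so that autocorrelation is actually present, and all names are placeholders.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson
    from statsmodels.stats.diagnostic import acorr_breusch_godfrey

    rng = np.random.default_rng(4)
    n = 200
    x = rng.normal(size=n)
    u = np.zeros(n)
    for t in range(1, n):                 # AR(1) errors: u_t = 0.7*u_{t-1} + e_t
        u[t] = 0.7 * u[t - 1] + rng.normal()
    y = 1 + 2 * x + u

    res = sm.OLS(y, sm.add_constant(x)).fit()

    print(durbin_watson(res.resid))       # d near 2 means no first-order autocorrelation
    lm, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)
    print(lm, lm_pval)                    # small p-value: reject H0 of no autocorrelation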

Remedial Measures

. First difference transformation


Suppose the autocorrelation is of the AR(1) type.
We can rewrite this as:

UNIT 2: TIME SERIES

Stationary
A time series is stationary if its mean and variance are constant over time and
the value of the covariance between two time periods depends only on the
distance or gap between the two periods and not on the actual time at which the
covariance is computed.

Importance of Stationarity

. Limited applicability
If a time series is non stationary, then we can study its behaviour only
for the period under consideration. As a result, it is not possible to
generalise it to other time periods. Therefore, for forecasting
purposes, non stationary time series will be of little value.

. Spurious Regression
If we have two or more non stationary time series, then the regression
analysis involving such time series may lead to the phenomenon of
spurious regression or nonsense regression.

If you regress a non stationary time series on one or more non


stationary time series, you may obtain a high R2 and some or all
regression coefficients may be statistically significant based on the t
and F tests. Unfortunately, in cases of non stationary time series,
these tests are not reliable because they assume the underlying time
series to be stationary

Tests of Stationarity
. Graphical Analysis
. Correlogram
. Unit root analysis

Graphical Analysis
In this, we simply plot the time series with the time periods on the x-axis and
the outcome variable on the y-axis. Such informal analysis will give us an initial
clue about whether a time series is stationary or not.

ACF and Correlogram


https://www.youtube.com/watch?v=8SjFXT3BAys

ACF refers to the autocorrelation function. It measures the correlation between


the original time series and a lagged time series, lagged by k periods. Hence,
the ACF at lag k is given by:

This simply measures the correlation between y(t) and y(t-k). So when k = 0,
this correlation is perfect and the value of the ACF is 1.

We calculate the ACF coefficients for lags up to k. There is no standard length


of k, but usually we use 1/4th to 1/3rd of the length of the time series. So if we
have 1,000 observations, we can calculate ACF coefficients for lags up to 250.

We then plot the ACF coefficient for lag k on the y-axis and k on the x-axis. This
is called a correlogram.

We use this correlogram to evaluate whether the ACF coefficients that we have
calculated are statistically significant or not.

To calculate the critical values, we set H0 as: the time series is purely
random. A purely random time series is one which has constant mean,
constant variance, and is serially uncorrelated. If a time series is purely random,
then the sample autocorrelations follow a normal distribution with mean 0
and variance equal to 1/n, where n is the sample size.
Using this, we calculate the critical values of ACF at a given level of
significance.

Hence, if we observe that several of our ACF coefficients are statistically


significant, then it is very strong evidence that the given time series is non
stationary.
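A minimal sketch of computing ACF coefficients and checking them against the ±1.96/√n band implied by the pure-randomness null; the random-walk series is simulated so that many significant lags appear.

    import numpy as np
    from statsmodels.tsa.stattools import acf

    rng = np.random.default_rng(5)
    y = np.cumsum(rng.normal(size=400))        # random walk: nonstationary

    n = len(y)
    lags = n // 4                              # roughly one quarter of the series length
    acf_vals = acf(y, nlags=lags)

    crit = 1.96 / np.sqrt(n)                   # 5% band under H0: purely random series
    significant = [k for k, r in enumerate(acf_vals[1:], start=1) if abs(r) > crit]
    print(crit, significant[:10])              # many significant lags -> evidence of nonstationarity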

Q-test

Instead of assessing the statistical significance of an individual coefficient, we


can also find out whether the sum of the squared autocorrelation coefficients is
statistically significant or not. This is done with the help of the Q-statistic.

We set the null hypothesis H0 that the time series is purely random i.e. all ACF
coefficients are zero.

The test statistic Q approximately follows the chi-square distribution with m


(total number of lags) degrees of freedom.

If the computed Q statistic exceeds the critical Q statistic for m dfs at a given
significance level, then we reject the null hypothesis that all the true ACF are 0.
At least some of them must be non zero. In that situation, it is strong evidence
that the time series is non stationary.
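A minimal sketch of the Q test; statsmodels implements the Ljung-Box variant of this statistic. The series is a simulated placeholder, and m is the chosen number of lags.

    import numpy as np
    from statsmodels.stats.diagnostic import acorr_ljungbox

    rng = np.random.default_rng(6)
    y = np.cumsum(rng.normal(size=400))   # placeholder nonstationary series

    m = 25                                # number of lags included in Q
    result = acorr_ljungbox(y, lags=[m])  # returns the Q statistic and its p-value
    print(result)                         # reject H0 (all ACF = 0) if the p-value is small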

Unit Root test of Stationarity


Hence, we regress the first order difference of the dependent variable on the
trend variable (time t) and the one period lagged value of the dependent
variable.

The null hypothesis is that B3, the coefficient of the one period lagged variable,
is zero. This is called the unit root hypothesis. The alternate hypothesis is that
B3 <0.

H0: B3 = 0 (time series is nonstationary)


H1: B3 < 0 (time series is stationary)

A rejection of the null hypothesis would suggest that the time series is
stationary.

Here, we cannot use the usual t-test to test the significance of B3. This is
because the t test is valid only if the underlying time series is stationary.

Instead we use the Dickey-Fuller Test

In the Dickey-Fuller Test, we run the regression equation using OLS and look at
the routinely calculated t-value. However, we do not compare this with the
critical t-value. We compare it with the critical DF values to find out if it exceeds
the critical DF values.

If the computed t (tau) value exceeds the critical DF value in absolute terms
(i.e. it is more negative than the critical value), we reject the null
hypothesis and conclude that the time series is stationary. If it does not exceed
the critical DF value in absolute terms, we do not reject the null hypothesis, and
it is likely that the time series is nonstationary.
Augmented Dickey-Fuller Test

In the DF test, it was assumed that the error term ut in the random walk models
(13.5, 13.6, 13.7) is uncorrelated. However, if it is correlated, which is likely to be
the case with a random walk with drift and a deterministic trend, then we use
the Augmented Dickey-Fuller (ADF) test.
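A minimal sketch of the ADF test with statsmodels; regression="ct" includes both a constant and a trend, matching the specification with drift and deterministic trend described above. The series is simulated.

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(7)
    y = np.cumsum(rng.normal(size=400))     # random walk, so we expect a unit root

    adf_stat, p_value, usedlag, nobs, crit_values, icbest = adfuller(y, regression="ct")
    print(adf_stat, p_value)
    print(crit_values)                      # compare adf_stat with these DF critical values
    # Fail to reject H0 (unit root) if adf_stat is not more negative than the critical value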

How to make a non stationary time series into a stationary time series.

Trend Stationary
Difference Stationary
Integrated Time Series
UNIT 3
Stochastic Regressors and the method of instrumental variables - the problem
of endogeneity, the problem with stochastic regressors, reasons for correlation
between regressor and the error term and the method of instrumental variables
(2SLS).

Panel Data regression models


models that study the same group of entities over time.

Importance of Panel Data

. Since panel data deals with the same group of entities over time, there
is bound to be heterogeneity in these units which may often be
unobservable. Panel data estimation techniques can take such
heterogeneity explicitly into account by allowing for subject
specific variables.
. Combining time series and cross sectional data gives us more
informative data, more variability, less collinearity, more degrees of
freedom and more efficiency.
. By studying repeated cross-sections of observations, panel data are
better suited to study the dynamics of change. Unemployment, job
turnover, duration of unemployment are better studied with panel Data
. Can better detect and measure effects that can’t be observed in pure
cross-sectional data or pure-time series data. So, if the effects of the
minimum wage on employment have to be studied, it can be better
studied if we follow successive waves of increases in minimum wages.
. Economies of scale and technological change can be better studied
with panel data

Balanced/Unbalanced Panel Data
When the number of time observations is the same for each individual, then we
say that it is a balanced panel data. When this is not the case, it is unbalanced
panel data.
Short/Long Panel Data
When the number of cross-sectional or individual units N is greater than the
number of time periods T, it is called a short panel data.
When the number of cross-sectional units N is less than the number of time
periods T, it is called a long panel data.

Main types of models/estimators for Panel Data:

. Pooled OLS
. Fixed Effects least-squares dummy variables (LSDV)
. Within group estimator
. Random effects model

Pooled OLS

Under Pooled OLS, we simply pool all the cross-sectional and time series data
together and estimate a “grand” regression, neglecting the cross-section and
time series nature of the data.

We put subscripts i and t representing the cross-sectional unit i and the time
period t.
Correction: B1 will not have an i subscript.

Limitations of Pooled OLS:

. By lumping the data together, we camouflage the heterogeneity that
may exist among the individual cross-sectional units. Hence, the
individuality of each cross-sectional unit is subsumed under the error
term uit, and it is likely that the error term may be correlated with
some of the regressors included in the model.

Fixed Effects LSDV

This is known as the Fixed Effects Regression Model (FEM).


The term “fixed effects” is due to the act that each taxpayer’s intercept,
although different from the intercept of the other taxpayers, does not vary over
time. It is time invariant. Hence, it does not have a t subscript.
Such a model is a ONE-WAY FIXED EFFECTS MODEL beacause we have allowed
the intercepts to differ across cross-sections but not over time. We can
introduced nine dummy variables to represent the 10 time periods of the model
along with 46 cross section dummies to represent the 47 cross-sectional units.
In that case the model becomes aTWO-WAY FIXED EFFECTS model.

Furthermore, in the ONE-WAY FEM, we have assumed that the slope


coefficients of the charity function remain the same across cross-sectional
units. But it is possible that these slope coefficients differ across the 47
individuals. To allow for this possibility, we can introduce differential slope
coefficients, where we interact the 5 explanatory variables with the 46 differential
intercept dummies, which will consume another 230 degrees of freedom.

Similarly, we can also interact the 9 time dummy variables with the five
explanatory variables, which consumes an additional 45 degrees of freedom. So
we will be left with very few degrees of freedom to perform meaningful
analysis.
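A minimal sketch contrasting pooled OLS with one-way and two-way LSDV fixed-effects regressions using entity/time dummies via statsmodels' formula API; the tiny panel, the entity identifier id, and the regressor names are placeholders, not the charity data discussed above.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Placeholder panel: 4 entities observed over 3 years
    df = pd.DataFrame({
        "id":   [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
        "year": [1, 2, 3] * 4,
        "y":    [10, 12, 13, 20, 22, 25, 8, 9, 11, 15, 18, 19],
        "x":    [1.0, 1.5, 1.8, 3.0, 3.2, 3.9, 0.5, 0.7, 1.1, 2.0, 2.6, 2.9],
    })

    pooled = smf.ols("y ~ x", data=df).fit()                    # pooled OLS (ignores heterogeneity)
    lsdv   = smf.ols("y ~ x + C(id)", data=df).fit()            # one-way FE via entity dummies
    twoway = smf.ols("y ~ x + C(id) + C(year)", data=df).fit()  # two-way FE (entity + time dummies)

    print(pooled.params["x"], lsdv.params["x"], twoway.params["x"])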

Limitations of Fixed Effects LSDV

. Every additional dummy variable will cost an additional degree of
freedom. So, if the sample is not very large, introducing too many
dummies will leave very few dfs to perform meaningful statistical
analysis.
. Too many additive and multiplicative dummies may lead to the
possibility of multicollinearity, which makes precise estimation of one
or more parameters difficult.
Fixed Effects WG estimator

The WG estimator eliminates the fixed effects term B1i in the LSDV model
by expressing both the regressand and the regressors as deviations from their
respective (group) mean values and running the regression on the mean-
corrected variables.

The OLS estimators so obtained are known as WG estimators (within group)


because they use the time variation within each cross-sectional unit.
The WG estimators are consistent, although they are not efficient (they have
large variances).

The estimators obtained in WG are exactly the same as those obtained in LSDV.

Drawbacks of WG:

By removing the fixed effects variable B1i, it also removes the effect of time-
invariant regressors that may be present in the model. For example in a panel
data regression of wages on work experience, age, gender, education, and
race, the effect of gender and race will be wiped out in the mean-corrected
values as they remain constant over time. So we cannot assess the impact of
time-invariant variables in the WG model

Random effects model

In the random effects model, it is assumed that the individual specific


coefficient B1i is a random variable with mean value B1

In the random effects model we have a composite error term. The composite
error term has 2 components: epsilon(i) which is the individual-specific error
component and uit which is the combined time series and cross-section error
component.
What if variance of epsilon is zero?

Fixed effects vs random effect model

If we assume that epsilon(i) and the regressors are uncorrelated, REM may be
appropriate. Because in this case, we have to estimate fewer parameters. But, if
they are correlated, then FEM may be appropriate.

Hausman Test
The Hausman test is used to decide whether FEM or REM is more appropriate
H0: FEM and REM do not differ substantially
Test statistics has a large sample chi square distribution with df equal to
number of regressors in the model.

If computed chi square is greater than the critical chi square for a given df and
level of significance, then we can conclude that REM is not appropriate
because the random error terms are probably correlated with one or more
regressors. In this case FEM is preferred to REM.
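A minimal sketch of the Hausman statistic, assuming you have already obtained the FE and RE coefficient vectors and their covariance matrices (restricted to the time-varying regressors common to both models) from whichever panel estimation package you use; the numbers below are placeholders.

    import numpy as np
    from scipy import stats

    # Placeholder FE and RE estimates for two regressors
    b_fe = np.array([0.85, 1.20]); V_fe = np.array([[0.040, 0.002], [0.002, 0.050]])
    b_re = np.array([0.70, 1.05]); V_re = np.array([[0.025, 0.001], [0.001, 0.030]])

    diff = b_fe - b_re
    H = diff @ np.linalg.inv(V_fe - V_re) @ diff   # Hausman test statistic
    dof = len(diff)                                 # df = number of regressors compared
    p_value = stats.chi2.sf(H, dof)
    print(H, p_value)   # small p-value: reject H0, prefer FEM over REM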

Properties of estimators

. Pooled Estimators:
If the slope coefficients are constant across subjects and if the error
term is uncorrelated with the regressors, pooled estimators are
consistent. However, it is very likely that the error terms are correlated
over time for a given subject. Therefore, we must use panel-corrected
standard errors for hypothesis testing. Otherwise, the routinely
computed standard errors may be underestimated.

. Fixed effects estimators:


Even if the underlying model is pooled or random-effects, the fixed
effects estimators are always consistent

. Random effect estimators:


The random effects model is consistent even if the true model is
pooled. But if the true model is fixed effects, then REM estimators are
inconsistent.

UNIT 4

Linear Probability Model (LPM)


Limitations of LPM:
. The LPM assumes that the probability of the dependent variable moves
linearly with the value of the explanatory variable, no matter how large
or small that value is.
. Probability must lie between 0 and 1, but there is no guarantee that
the estimates given by the LPM will lie within these bounds.
. The assumption that the error term is normally distributed
cannot hold when the dependent variable only takes values of 0 and 1.
. Moreover, the error term in the LPM is heteroscedastic, so the significance
tests are no longer reliable.

Logit Model
A logit model is one where the dependent variable is the log of the odds ratio of
the probability of the outcome and it is a linear function of the explanatory
variable.
Measures of Goodness of Fit

Conventional measures of goodness of fit such as the R2 are not applicable in


this case because the dependent variable can only take values 0 or 1. Hence,
other measures similar to R2 are used such as McFadden R2 or Count R2.

Statistical Significance of Coefficients


Error Term
The error term in Logit has a logistic distribution

PROBIT

Error Term
Error term in probit has a normal distribution

Coefficients
Therefore, if we multiply the probit coefficient by about 1.81, you will get
approximately the logit coefficient.

This is because logit follows logistic distribution with mean 0, variance (pi^2)/3
and probit follows standard normal distribution with mean zero and variance 1
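A minimal sketch fitting logit and probit models with statsmodels on a simulated binary outcome; it also prints the ratio of the two coefficient vectors, which should come out roughly in line with the scaling noted above. All data and names are placeholders.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    n = 1000
    x = rng.normal(size=n)
    X = sm.add_constant(x)
    p = 1 / (1 + np.exp(-(0.5 + 1.5 * x)))     # true logistic probabilities
    y = rng.binomial(1, p)                      # binary outcome

    logit_res = sm.Logit(y, X).fit(disp=0)
    probit_res = sm.Probit(y, X).fit(disp=0)

    print(logit_res.params / probit_res.params)  # roughly the logit/probit scaling factor
    print(logit_res.prsquared)                   # McFadden pseudo-R2 (goodness of fit)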

Interpretation of Coefficients

Logit vs Probit
