Econometrics Notes
Unit 1:
(Basic Econometrics: Chapter 1-9)
(Econometrics by Example: Chapters 1-3)
. Introduction to Econometrics and overview of its applications
. Simple Regression with Classical Assumptions
. Least Square Estimation and BLUE, Properties of Estimators, Multiple
Regression Model and Hypothesis Testing Related to Parameters:
Simple and Joint
. Functional Forms of Regression Models
Unit 2:
(Basic Econometrics: Chapter 10-13 and Chapter 21,22)
(Econometrics by Example: Chapters 4-7 and Chapter 13)
. Violations of classical assumptions
. multicollinearity, heteroscedasticity, autocorrelation and model
specification errors, their identification, their impact on
parameters
. Tests related to parameters and impact on the reliability and the
validity of inferences in cases of violations of Assumptions
. Method to take care of violations of assumptions, goodness of fit
. Time Series econometrics:
. stationary stochastic processes, non stationary stochastic
processes, unit root stochastic processes, trend stationary and
difference stationary stochastic processes.
. Tests of stationarity: graphical analysis, the autocorrelation function
(ACF) and correlogram, statistical significance of autocorrelation
coefficients. Unit root test - the augmented Dickey-Fuller (ADF) test.
. Transforming non stationary financial time series - difference
stationary processes and trend-stationary processes
Unit 3:
(Basic Econometrics: Chapter 16)
(Econometrics by Example: Chapters 17,19)
Unit 4:
(Basic Econometrics: Chapter 9, 15)
(Econometrics by Example: Chapters 3,8)
Theory:
. Assumptions of CLRM
. Any one specific assumption, how to detect, how to fix, implications
on the model
. Trend stationary and difference stationary
. Unit root stationary
Practical:
. Multicollinearity
. Autocorrelation
. Interpretation of model
. Dummy variables
. Log-Lin models
. Logit/probit (not that common)
. Heteroscedasticity
. Regression Table: p-value, t-stat, F-stat, coefficients etc.
. Prediction
. Calculation of B1 and B2 in regression using formula
. Cobb Douglas
UNIT 1
. Efficiency
Beta hat must have minimum variance. If the variance of Beta hat is too
large, then it may not accurately capture the value of the true population
parameter Beta, and hence the inferences we draw from the model will not be
precise.
. Unbiasedness
When we draw samples from the population, the Beta hat calculated will not
necessarily be equal to the population parameter Beta. Sometimes it will be
higher and sometimes it will be lower.
However, if we draw a large number of samples and compute Beta hat for each
of them, the mean of all of those Beta hats should be equal to the true
population parameter Beta.
. Consistency
These 3 properties of a good estimator are met only when we ensure certain
assumptions based on which our model is estimated.
When one of these assumptions is violated, then one or more of these
properties do not hold.
Regression:
Regression is an analysis of the dependence of the dependent variable on
independent variables, with the objective of predicting the average value of
the dependent variable given a specific value of the independent variable.
Suppose this is our regression equation.
The E(Yi|Xi) function gives us the PRF in deterministic form because this has no
error term. This is because E(ui|Xi) = 0 based on an assumption of the CLRM.
Hence, the expected value of the error term for any given Xi is always 0.
If we follow the assumptions of CLRM, then our model will exhibit the properties
of efficiency, unbiasedness, and consistency.
. Linearity
The regression model must be linear in its parameters.
Linearity is required not for the variables, but only for the parameters.
Although the variable Xi^2 is not linear, this is still a linear model because the
parameters (Alpha and Beta) are linear.
Here, one of the parameters, B^2, is not linear; hence the model is not linear.
Such models cannot be evaluated using CLRM. We will have to use non-linear
models.
. The mean value of the stochastic error term is zero and it follows a
normal distribution.
. The number of observations n must be greater than the number of
parameters to be estimated (so n >> k).
. Homoskedasticity
Variance of the error term is constant.
Even at higher levels of Xi, the variance of the error term remains constant.
. There should not be any autocorrelation.
This means that the covariance between Ut and Ut-1 should be zero. If
that is not the case, then it is a situation of autocorrelation.
In cross sectional data, if two error terms do not have zero covariance, then it is
a situation of SPATIAL CORRELATION
In time series data, if two error terms for consecutive time periods do not have
zero covariance, then it is a situation of AUTOCORRELATION OR SERIAL
CORRELATION.
. Multicollinearity
There should not be a high degree of correlation between any two
independent variables. If the correlation is high, then we say that the
model has high multicollinearity.
Interpretation of parameters:
Limitations:
Under OLS, we are giving equal weightage to all error terms. However, we
should give more weightage to points closer to the regression line and lesser
weightage to the points away from the regression line, but that does not
happen in the case of the OLS method. When we square these errors, the
points that are farther away from the line actually contribute MORE to the
residual sum of squares and hence are given more importance than the points
that are closer to the line.
We resolve this using the Weighted Least Squares method.
Error term
ui is known as the stochastic disturbance or stochastic error term.
the disturbance term ui is a surrogate for all those variables that are omitted
from the model but that collectively affect Y.
The stochastic error term is a catchall that includes all those variables that
cannot be readily quantified. It may represent variables that cannot be included
in the model for lack of data availability, or errors of measurement in the data,
or intrinsic randomness in human behavior. Whatever the source of the random
term u, it is assumed that the average effect of the error term on the
regressand is marginal at best
BLUE
BEST LINEAR UNBIASED ESTIMATOR
An estimator, say the OLS Estimator Beta hat, is said to be a best linear
unbiased estimator (BLUE) if the following hold:
. It is linear
It is a linear function of a random variable such as the dependent
variable Y in the regression model
. It is unbiased
The average or expected value of the estimator is equal to the true
value.
E(Beta hat) = Beta
. It is efficient
It has minimum variance in the class of all such linear unbiased
estimators.
An unbiased estimator with the least variance is known as an efficient
estimator.
As long as the assumptions of CLRM are ensured, the OLS estimator will be
BLUE.
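A minimal sketch of computing the OLS estimates from the textbook formulas (B2 hat from the deviation form, B1 hat from the sample means) and cross-checking them against statsmodels; the data are hypothetical, generated with numpy.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
X = rng.uniform(1, 10, size=50)        # hypothetical regressor
u = rng.normal(0, 1, size=50)          # stochastic error term
Y = 2.0 + 0.8 * X + u                  # true B1 = 2.0, B2 = 0.8

# B2_hat = sum(x*y) / sum(x^2) with x, y in deviation form; B1_hat = Ybar - B2_hat * Xbar
x, y = X - X.mean(), Y - Y.mean()
b2_hat = (x * y).sum() / (x ** 2).sum()
b1_hat = Y.mean() - b2_hat * X.mean()
print(b1_hat, b2_hat)

results = sm.OLS(Y, sm.add_constant(X)).fit()
print(results.params)                  # should match b1_hat and b2_hat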
COEFFICIENT OF DETERMINATION
The overall goodness of fit of the regression model is measured by the
coefficient of determination, r2. It tells what proportion of the variation in the
dependent variable, or regressand, is explained by the explanatory variable, or
regressor. This r2 lies between 0 and 1; the closer it is to 1, the better the fit.
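A small sketch with hypothetical simulated data showing that the r2 reported by statsmodels equals 1 minus RSS/TSS:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.uniform(1, 10, 40)
Y = 3.0 + 1.5 * X + rng.normal(0, 2, 40)     # hypothetical data

res = sm.OLS(Y, sm.add_constant(X)).fit()
rss = (res.resid ** 2).sum()                 # residual sum of squares
tss = ((Y - Y.mean()) ** 2).sum()            # total sum of squares
print(1 - rss / tss, res.rsquared)           # both give the coefficient of determination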
If the fitted line in the NPP (normal probability plot) is approximately a
straight line, one can conclude that the variable of interest is normally
distributed.
Hence, if the error terms are normally distributed, the JB test statistic
should be close to zero.
If the computed p-value of the JB test statistic is sufficiently low, which will
happen if the value of the computed JB test statistic is significantly different
from 0, then we can reject the null hypothesis that the error terms are normally
distributed.
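A sketch of the JB test on OLS residuals using the jarque_bera helper in statsmodels; the data are hypothetical with normal errors, so the statistic should be small and the p-value large.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, 100)
Y = 1.0 + 0.5 * X + rng.normal(0, 1, 100)    # hypothetical data with normal errors

res = sm.OLS(Y, sm.add_constant(X)).fit()
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(res.resid)
print(jb_stat, jb_pvalue)
# A sufficiently low p-value would lead us to reject normality of the error terms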
SIMPLE
Under hypothesis testing for parameters, we check to see if the true population
parameter of a given regression coefficient is zero.
JOINT
If the COMPUTED F-VALUE is greater than the CRITICAL F-VALUE at the given
level of significance, then we reject the null hypothesis and say that AT LEAST
ONE of the regressors is statistically significant.
It is very important to note that the use of the t and F tests is explicitly
based on the assumption that the error term, Ui, is normally distributed
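A sketch of the simple (t) and joint (F) tests with hypothetical simulated data; statsmodels reports the overall F statistic and its p-value directly.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 80
X1, X2 = rng.normal(size=n), rng.normal(size=n)
Y = 2.0 + 0.7 * X1 + 0.0 * X2 + rng.normal(size=n)    # hypothetical data

X = sm.add_constant(np.column_stack([X1, X2]))
res = sm.OLS(Y, X).fit()
print(res.tvalues, res.pvalues)     # simple tests: is each individual coefficient zero?
print(res.fvalue, res.f_pvalue)     # joint test: H0 is that all slope coefficients are zero
# If f_pvalue is below the chosen significance level, at least one regressor is significant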
Equation 2.3 is linear in the parameters A, B2, and B3 and hence is a linear
equation although it is non linear in the variables.
An interesting feature of the log-linear model is that the slope coefficients can
be interpreted as elasticities.
Similarly, if we increase the capital input by 1%, then the output goes up by
0.5% on average, holding the labour input constant.
In the two-variable model, the simplest way to decide whether the log-linear
model fits the data is to plot the scattergram of ln Yi against ln Xi and see if the
scatter points lie approximately on a straight line
Another interesting property of the CD function is that the sum of the partial
slope coefficients, (B2 + B3), gives information about returns to scale, that is,
the response of output to a proportional change in the inputs. If this sum is 1,
then there are constant returns to scale - that is, doubling the inputs will double
the output, tripling the inputs will triple the output, and so on. If this sum is less
than 1, then there are decreasing returns to scale - that is, doubling the inputs
less than doubles the output. Finally, if this sum is greater than 1, there are
increasing returns to scale - that is, doubling the inputs more than doubles the
output.
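A sketch of estimating a Cobb-Douglas function in log-linear form on hypothetical simulated data; the fitted slope coefficients are the elasticities, and their sum indicates returns to scale.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 60
df = pd.DataFrame({"L": rng.uniform(10, 100, n), "K": rng.uniform(10, 100, n)})
df["Q"] = np.exp(0.5) * df["L"] ** 0.7 * df["K"] ** 0.5 * np.exp(rng.normal(0, 0.05, n))

# ln Q = B1 + B2 ln L + B3 ln K + u
res = smf.ols("np.log(Q) ~ np.log(L) + np.log(K)", data=df).fit()
print(res.params)      # slope coefficients are the output elasticities of labour and capital
rts = res.params["np.log(L)"] + res.params["np.log(K)"]
print(rts)             # > 1 increasing, = 1 constant, < 1 decreasing returns to scale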
Comparing R2
We can run a linear regression and get a value of the model parameters, an R2
value and so on. However, we cannot compare the R2 value for this linear
regression model with the R2 value for the log-linear model. This is because in
order to compare R2, the dependent variable must be the same in both models.
However, in the log-linear model, the dependent variable was the natural log of
the output whereas in the linear regression model, the dependent variable is
the value of the output.
The time coefficient is 0.03149 or 3.15%. Hence, for every one unit increase in
time (i.e. every year), holding all other factors constant, the RGDP increased by
3.15%.
If we multiply the relative change in Y by 100, then it will then give the
percentage change, or the growth rate, in Y for an absolute change in X, the
regressor. That is, 100 times β2 gives the growth rate in Y; 100 times β2 is
known in the literature as the semielasticity of Y with respect to X.
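A sketch of the log-lin (growth) model on hypothetical data whose true growth rate is set to about 3% per period; 100 times the time coefficient recovers that growth rate.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
t = np.arange(1, 41)                                  # hypothetical annual time index
rgdp = 1000 * np.exp(0.03 * t + rng.normal(0, 0.01, t.size))

# ln(RGDP) = B1 + B2*t + u
res = sm.OLS(np.log(rgdp), sm.add_constant(t)).fit()
b2 = res.params[1]
print(b2, 100 * b2)    # 100*B2 is the growth rate of RGDP per period, close to 3% here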
Linear Trend Model
Lin-Log Models
Reciprocal Models
Properties of Reciprocal Models:
Chow Test
When we use a regression model involving time series data, it may happen that
there is a structural change in the relationship between the regressand Y and
the regressors. By structural change, we mean that the values of the
parameters of the model do not remain the same through the entire time
period. Sometimes the structural change may be due to external forces (e.g.,
the oil embargoes imposed by the OPEC oil cartel in 1973 and 1979 or the Gulf
War of 1990–1991), policy changes (such as the switch from a fixed exchange-
rate system to a flexible exchange-rate system around 1973), actions taken by
Congress (e.g., the tax changes initiated by President Reagan in his two terms
in office or changes in the minimum wage rate), or a variety of other causes.
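A sketch of the Chow test computed by hand on hypothetical data with a built-in break: fit the pooled (restricted) regression and the two sub-period (unrestricted) regressions, then form the F statistic from the residual sums of squares.

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
n1, n2, k = 40, 40, 2                                  # two sub-periods, k parameters each
x1, x2 = rng.uniform(0, 10, n1), rng.uniform(0, 10, n2)
y1 = 1.0 + 0.5 * x1 + rng.normal(0, 1, n1)             # before the hypothetical break
y2 = 3.0 + 1.2 * x2 + rng.normal(0, 1, n2)             # after the break

rss_pooled = sm.OLS(np.r_[y1, y2], sm.add_constant(np.r_[x1, x2])).fit().ssr
rss_1 = sm.OLS(y1, sm.add_constant(x1)).fit().ssr
rss_2 = sm.OLS(y2, sm.add_constant(x2)).fit().ssr

F = ((rss_pooled - (rss_1 + rss_2)) / k) / ((rss_1 + rss_2) / (n1 + n2 - 2 * k))
p_value = stats.f.sf(F, k, n1 + n2 - 2 * k)
print(F, p_value)      # a small p-value suggests a structural break between the periods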
QUALITATIVE VARIABLES
. if the sample size is relatively small, do not introduce too many dummy
variables. Remember that each dummy coefficient will cost one
degree of freedom.
Similarly, holding all other factors constant, the salary of a non-white worker is
1.56 dollars less on average than that of a white worker.
dummy coefficients are often called differential intercept dummies, for they
show the differences in the intercept values of the category that gets the value
of 1 as compared to the reference category.
The common intercept C denotes the average hourly wage of a white, non
union male worker.
Being a female has a lower average salary by about $3.24, being a nonwhite
has a lower average salary by about $2.16, and being both has an average salary
lower by about $4.30 (-3.24 - 2.16 + 1.10). In other words, compared to the
reference category, a nonwhite female earns a lower average wage than being
a female alone or being a nonwhite alone.
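A sketch with hypothetical simulated wage data (the numbers loosely mirror the example above) showing differential intercept dummies and an interaction dummy:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 200
df = pd.DataFrame({"female": rng.integers(0, 2, n), "nonwhite": rng.integers(0, 2, n)})
df["wage"] = (12 - 3.2 * df["female"] - 2.1 * df["nonwhite"]
              + 1.1 * df["female"] * df["nonwhite"] + rng.normal(0, 1, n))

res = smf.ols("wage ~ female + nonwhite + female:nonwhite", data=df).fit()
print(res.params)
# Intercept = average wage of the reference category (white male worker);
# each dummy coefficient = the difference in intercept relative to that reference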
The process of removing the seasonal component from a time series is called
deseasonalization or seasonal adjustment and the resulting time series is called
a deseasonalized or seasonally adjusted time series
UNIT 2
VIOLATION OF ASSUMPTIONS
MULTICOLLINEARITY: DEFINITION
One of the assumptions of the CLRM is that there is no exact linear relationship
among the independent variables (regressors). If there are one or more such
relationships among the regressors, we call it multicollinearity.
Perfect Collinearity
Imperfect Collinearity
CAUSES
. Model specification
If we introduce polynomial terms into the model, especially when the
values of the explanatory variables are small, it can lead to
multicollinearity
. Overdetermined model
If we have more explanatory variables than the number of
observations, then it could lead to multicollinearity. Often happens in
medical research when you only have a limited number of patients
about whom a large amount of information is collected.
IMPACT OF MULTICOLLINEARITY
Why is that?
In case we have more than 2 independent variables in the model, the formula
for calculating the variance of the coefficients becomes:
var(Bj hat) = σ^2 / (Σ xj^2) * 1 / (1 - R2j), with xj in deviation form,
where R2j is the R squared value of the regression model when we regress J on
all other explanatory variables. Hence, if variable J has high multicollinearity
with other variables, R2j will be high, which means that the variance of
Bj will get inflated. Hence, it will no longer exhibit the property of efficiency.
If the standard error increases, the value of the t-statistic will fall.
This increases the chance that the coefficient will appear statistically
insignificant.
If your model shows a high R squared, but some of the independent variables
are statistically insignificant, then it is a good indication that the model suffers
from multicollinearity.
With the use of VIF, we can also calculate the tolerance factors (TOL).
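A sketch of computing VIF and TOL for each regressor with statsmodels, using hypothetical data in which wealth is deliberately constructed to be collinear with income:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
n = 100
income = rng.normal(50, 10, n)
wealth = 5 * income + rng.normal(0, 5, n)     # deliberately collinear with income
age = rng.normal(40, 8, n)
X = sm.add_constant(pd.DataFrame({"income": income, "wealth": wealth, "age": age}))

# VIF_j = 1 / (1 - R2j), where R2j comes from the auxiliary regression of regressor j on the rest
for j, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, j)
    print(name, "VIF =", round(vif, 2), "TOL =", round(1 / vif, 3))
# VIF above about 10 (TOL below about 0.1) is a common rule of thumb for serious multicollinearity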
. Auxiliary Regressions
To find out which of the regressors are highly collinear with the other
regressors included in the model, we can regress each regressor on
the remaining regressors and obtain the auxiliary regressions
mentioned earlier.
it may be the case that in the population, wealth and income are not
correlated. However, in the sample that we have collected, it may be
the case that individuals who have high income also have high levels
of wealth. Hence, we can rectify the problem by increasing the sample
size and taking more individuals who have high income but less wealth,
and less income but high wealth.
However, it may not always be feasible to increase the sample size due
to difficulty in collection of data, unavailability of data, high costs etc.
The basic idea behind PCA is simple. It groups the correlated variables into sub-
groups so that variables belonging to any sub-group have a "common" factor
that moves them together. This common factor may be skill, ability, intelligence,
ethnicity, or any such factor. That common factor, which is not always easy to
identify, is what we call a principal component. There is one PC for each
common factor. Hopefully, these common factors or PCs are fewer in number
than the original number of regressors.
The first part of the above table gives the estimated 15 PCs. PC1, the first
principal component, has a variance (eigenvalue) of 3.5448 and accounts for
24% of the total variation in all the regressors. PC2, the second principal
component, has a variance of 2.8814, accounting for 19% of the total variation
in all 15 regressors. These two PCs account for 42% of the total variation. In this
manner you will see the first six PCs cumulatively account for 74% of the total
variation in all the regressors. So although there are 15 PCs, only six seem to be
quantitatively important. This can be seen more clearly in Figure 4.1 obtained
from Minitab 15.
Now look at the second part of Table 4.7. For each PC it gives what are called
loadings or scores or weights, that is, how much each of the original
regressors contributes to that PC. For example, take PC1: education, family
income, father's education, mother's education, husband's education,
husband's wage, and MTR load heavily on this PC. But if you take PC4 you will
see that husband's hours of work contribute heavily to this PC.
Although mathematically elegant, the interpretation of PCs is subjective. For
instance, we could think of PC1 as representing the overall level of education,
for that variable loads heavily on this PC.
Since there is no such thing as a free lunch, this simplification comes at a cost
because we do not know how to interpret the PCs in a meaningful way in
practical applications. If we can identify the PCs with some economic variables,
the principal components method would prove very useful in identifying
multicollinearity and also provide a solution for it.
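A sketch of principal components on a set of regressors using scikit-learn (an assumed choice; any PCA routine would do), with hypothetical data driven by one unobserved common factor:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
n = 200
skill = rng.normal(size=n)                    # unobserved "common factor"
X = np.column_stack([
    skill + rng.normal(0, 0.3, n),            # e.g. education
    skill + rng.normal(0, 0.3, n),            # e.g. family income
    rng.normal(size=n),                       # an unrelated regressor
])

pca = PCA().fit(StandardScaler().fit_transform(X))
print(pca.explained_variance_ratio_)   # share of total variation captured by each PC
print(pca.components_)                 # loadings: how much each regressor contributes to each PC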
HETEROSKEDASTICITY
Consequences of Heteroskedasticity:
Steps:
. Estimate the OLS regression and obtain the squared OLS residuals
from this regression (ei^2)
. Estimate an auxiliary linear regression model. Regress ei^2 on the
regressors of the original model. Save R2 from this model as
R2aux.
. Check the F-statistic from the auxiliary model. The null hypothesis is
that all the slope coefficients in the auxiliary regression are zero, i.e.
none of the regressors are related to the squared residuals. If the
computed F-statistic is statistically significant, then we reject the null
hypothesis of homoskedasticity. If it is not, we may not reject the
null hypothesis.
. Alternatively, you can multiply R2aux by the number of
observations to get the chi-square statistic and compare this with
the critical chi-square value for the given degrees of freedom. (A
code sketch of these heteroscedasticity tests follows the
Goldfeld-Quandt item below.)
. White’s test
(Do later)
. Goldfeld-Quandt Test
This popular method is applicable if one assumes that the
heteroscedastic variance, σi^2, is positively related to one of the
explanatory variables in the regression models.
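A sketch of the Breusch-Pagan steps and the Goldfeld-Quandt test using the ready-made statsmodels diagnostics, on hypothetical data whose error variance grows with the regressor:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_goldfeldquandt

rng = np.random.default_rng(9)
n = 200
x = rng.uniform(1, 10, n)
u = rng.normal(0, 0.5 * x)                    # error variance grows with x (heteroskedastic)
y = 2 + 0.5 * x + u
X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

# Breusch-Pagan: auxiliary regression of squared residuals on the regressors; n*R2aux and F forms
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print(lm_stat, lm_pvalue)

# Goldfeld-Quandt: order the data by the suspect regressor and compare sub-sample residual variances
gq_stat, gq_pvalue, _ = het_goldfeldquandt(y, X, idx=1)
print(gq_stat, gq_pvalue)      # small p-values reject homoskedasticity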
Remedial Measures
Unfortunately, the true σi^2 is rarely ever known and hence we cannot
implement the WLS technique in practice.
. Transformation
. Log Transformation
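A sketch of two remedial measures on hypothetical heteroskedastic data: WLS under the assumption that the error variance is proportional to Xi^2, and a log transformation of the regressand.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 200
x = rng.uniform(1, 10, n)
y = (2 + 0.5 * x) * np.exp(rng.normal(0, 0.15, n))   # multiplicative error keeps y positive

X = sm.add_constant(x)
# WLS, assuming var(u_i) is proportional to x_i^2: weight each observation by 1/x_i^2
wls_res = sm.WLS(y, X, weights=1.0 / x ** 2).fit()
print(wls_res.params, wls_res.bse)

# A log transformation of the regressand often compresses the scale of the variance
log_res = sm.OLS(np.log(y), X).fit()
print(log_res.params)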
AUTOCORRELATION
one of the assumptions of the classical linear regression model (CLRM) is that
the error terms, Ut, are uncorrelated, that is, the error term at time t is not
correlated with the error term at time (t-1) or any other error term in the past.
Reasons:
Consequences of Autocorrelation
Detection:
. Graphical Method
. Durbin Watson test
. Breusch-Godfrey test
. GRAPHICAL METHOD
. DURBIN WATSON d TEST
the d value lies between 0 and 4. The closer it is to zero, the greater is the
evidence of positive autocorrelation, and the closer it is to 4, the greater is the
evidence of negative autocorrelation. If d is about 2, there is no evidence of
positive or negative (first-order) autocorrelation.
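A sketch of the Durbin-Watson and Breusch-Godfrey tests on hypothetical data with AR(1) errors, using the statsmodels helpers:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(11)
n = 200
x = rng.uniform(0, 10, n)
u = np.zeros(n)
for t in range(1, n):                       # AR(1) errors: u_t = 0.7*u_(t-1) + e_t
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))             # well below 2: evidence of positive autocorrelation

lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=2)
print(lm_stat, lm_pvalue)                   # a small p-value rejects the null of no serial correlation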
Remedial Measures
Stationarity
A time series is stationary if its mean and variance are constant over time and
the value of the covariance between two time periods depends only on the
distance or the gap between the two periods and not the actual time at which the
covariance is computed.
Importance of Stationarity
. Limited applicability
If a time series is non stationary, then we can study its behaviour only
for the period under consideration. As a result, it is not possible to
generalise it to other time periods. Therefore, for forecasting
purposes, non stationary time series will be of little value.
. Spurious Regression
If we have two or more non stationary time series, then regression
analysis involving such time series may lead to the phenomenon of
spurious regression or nonsense regression.
Tests of Stationarity
. Graphical Analysis
. Correlogram
. Unit root analysis
Graphical Analysis
In this, we simply plot the time series with the time periods on the x-axis and
the outcome variable on the y-axis. Such informal analysis will give us an initial
clue about whether a time series is stationary or not.
The ACF at lag k simply measures the correlation between y(t) and y(t-k). So
when k = 0, this correlation is perfect and the value of the ACF is 1.
We then plot the ACF coefficient for lag k on the y-axis and k on the x-axis. This
is called a correlogram.
We use this correlogram to evaluate whether the ACF coefficients that we have
calculated are statistically significant or not.
To calculate the critical values, we set the H0 as that the time series is purely
random. A purely random time series is one which has constant mean,
constant variance, and is serially uncorrelated. If a time series is purely random,
then the sample autocorrelation is given by a normal distribution with mean 0
and variance equal to 1/n where n is the sample size.
Using this, we calculate the critical values of ACF at a given level of
significance.
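A sketch of the sample ACF and its approximate 5% critical band (1.96/sqrt(n) under the null of a purely random series), computed on a hypothetical random walk:

import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(12)
n = 200
y = np.cumsum(rng.normal(size=n))            # hypothetical random walk (nonstationary)

rho = acf(y, nlags=20, fft=True)             # sample ACF; the value at lag 0 is 1 by construction
crit = 1.96 / np.sqrt(n)                     # approx. 5% critical value under H0: purely random series
for k, r in enumerate(rho):
    flag = "*" if k > 0 and abs(r) > crit else ""
    print(k, round(r, 3), flag)              # many significant coefficients: likely nonstationary
# statsmodels.graphics.tsaplots.plot_acf(y, lags=20) draws the correlogram with these bands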
Q-test
We set the null hypothesis H0 that the time series is purely random i.e. all ACF
coefficients are zero.
If the computed Q statistic exceeds the critical Q statistic for m dfs at a given
significance level, then we reject the null hypothesis that all the true ACF are 0.
At least some of them must be non zero. In that situation, it is strong evidence
that the time series is non stationary.
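A sketch of the Ljung-Box Q test with statsmodels on the same kind of hypothetical random walk; small p-values reject the null that all ACF coefficients up to the chosen lag are zero.

import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(13)
y = np.cumsum(rng.normal(size=200))          # hypothetical random walk

print(acorr_ljungbox(y, lags=[10, 20]))      # Q statistics and p-values for m = 10 and m = 20 lags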
The null hypothesis is that B3, the coefficient of the one period lagged variable,
is zero. This is called the unit root hypothesis. The alternate hypothesis is that
B3 <0.
A rejection of the null hypothesis would suggest that the time series is
stationary.
In the Dickey-Fuller Test, we run the regression equation using OLS and look at
the routinely calculated t-value. However, we do not compare this with the
critical t-value. We compare it with the critical DF values to find out if it exceeds
the critical DF values.
If the computed t (tau) value exceeds the critical DF value in absolute terms
(i.e. it is more negative than the critical value), we reject the null
hypothesis and conclude that the time series is stationary. If it does not, we
do not reject the null hypothesis and it is likely that the time series is
nonstationary.
Augmented Dickey-Fuller Test
In the DF test, it was assumed that the error term ut in the random walk models
(13.5, 13.6, 13.7) is uncorrelated. However, if it is correlated, which is likely to be
the case with a random walk with drift and a deterministic trend, then we use
the Augmented Dickey-Fuller (ADF) test.
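A sketch of the ADF test with statsmodels on a hypothetical random walk, before and after first-differencing:

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(14)
y = np.cumsum(rng.normal(size=300))          # hypothetical random walk, so a unit root is present

adf_stat, p_value, usedlag, nobs, crit_values, _ = adfuller(y, autolag="AIC")
print(adf_stat, p_value, crit_values)
# The statistic is compared with the DF critical values, not the usual t table;
# failing to reject H0 (high p-value) points to a unit root, i.e. nonstationarity.

adf_stat_d, p_value_d, *_ = adfuller(np.diff(y), autolag="AIC")
print(adf_stat_d, p_value_d)                 # first-differencing a random walk usually makes it stationary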
How to make a non stationary time series into a stationary time series.
Trend Stationary
Difference Stationary
Integrated Time Series
UNIT 3
Stochastic Regressors and the method of instrumental variables - the problem
of endogeneity, the problem with stochastic regressors, reasons for correlation
between regressor and the error term and the method of instrumental variables
(2SLS).
. Since panel data deals with the same group of entities over time, there
is bound to be heterogeneity in these units which may often be
unobservable. Panel data estimation techniques can take such
heterogeneity explicitly into account by allowing for subject
specific variables.
. Combining time series and cross sectional data gives us more
informative data, more variability, less collinearity, more degrees of
freedom and more efficiency.
. By studying repeated cross-sections of observations, panel data are
better suited to study the dynamics of change. Unemployment, job
turnover, duration of unemployment are better studied with panel data.
. Can better detect and measure effects that can’t be observed in pure
cross-sectional data or pure-time series data. So, if the effects of the
minimum wage on employment have to be studied, it can be better
studied if we follow successive waves of increases in minimum wages.
. Economies of scale and technological change can be better studied
with panel data
Balanced/Unbalanced Panel Data
When the number of time observations is the same for each individual, we say
that it is a balanced panel. When this is not the case, it is an unbalanced
panel.
Short/Long Panel Data
When the number of cross-sectional or individual units N is greater than the
number of time periods T, it is called a short panel.
When the number of cross-sectional units N is less than the number of time
periods T, it is called a long panel.
. Pooled OLS
. Fixed Effects least-squares dummy variables (LSDV)
. Within group estimator
. Random effects model
Pooled OLS
Under Pooled OLS, we simply pool all the cross-sectional and time series data
together and estimate a “grand” regression, neglecting the cross-section and
time series nature of the data.
We put subscripts i and t representing the cross-sectional unit i and the time
period t.
Correction: B1 will not have an i subscript.
Similarly, we can also interact the 9 time dummy variables with the five
explanatory variables, which consume an additional 50 degrees of freedom. So,
we will be left with very few degrees of freedom to perform meaningful
analysis.
The WG estimator model eliminated the fixed effects variable B1i in the LSDV
by expressing the regressand and the regressor both as deviations from their
respective (group) mean values and running the regression on the mean-
corrected variables.
The estimators obtained in WG are exactly the same as those obtained in LSDV.
Drawbacks of WG:
By removing the fixed effects variable B1i, it also removes the effect of time-
invariant regressors that may be present in the model. For example in a panel
data regression of wages on work experience, age, gender, education, and
race, the effect of gender and race will be wiped out in the mean-corrected
values as they remain constant over time. So we cannot assess the impact of
time-invariant variables in the WG model
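A sketch of the within-group estimator on hypothetical panel data: demean the regressand and the regressor by subject, then run OLS on the mean-corrected values.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(15)
ids = np.repeat(np.arange(20), 5)                     # 20 subjects, 5 periods each
alpha_i = np.repeat(rng.normal(0, 2, 20), 5)          # unobserved subject-specific effects
x = rng.uniform(0, 10, ids.size)
y = alpha_i + 0.6 * x + rng.normal(0, 1, ids.size)
df = pd.DataFrame({"id": ids, "x": x, "y": y})

df["y_dm"] = df["y"] - df.groupby("id")["y"].transform("mean")
df["x_dm"] = df["x"] - df.groupby("id")["x"].transform("mean")
wg = sm.OLS(df["y_dm"], df["x_dm"]).fit()             # no constant: the fixed effects are swept out
print(wg.params)                                      # close to the true slope of 0.6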
In the random effects model we have a composite error term. The composite
error term has 2 components: epsilon(i) which is the individual-specific error
component and uit which is the combined time series and cross-section error
component.
What if variance of epsilon is zero?
If we assume that epsilon(i) and the regressors are uncorrelated, REM may be
appropriate. Because in this case, we have to estimate fewer parameters. But, if
they are correlated, then FEM may be appropriate.
Hausman Test
The Hausman test is used to decide whether FEM or REM is more appropriate
H0: FEM and REM do not differ substantially
Test statistics has a large sample chi square distribution with df equal to
number of regressors in the model.
If computed chi square is greater than the critical chi square for a given df and
level of significance, then we can conclude that REM is not appropriate
because the random error terms are probably correlated with one or more
regressors. In this case FEM is preferred to REM.
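A sketch of the FEM/REM comparison with a hand-computed Hausman statistic, assuming the third-party linearmodels package and hypothetical data in which the regressor is correlated with the individual effect (so FEM should be preferred):

import numpy as np
import pandas as pd
from scipy import stats
from linearmodels.panel import PanelOLS, RandomEffects

rng = np.random.default_rng(16)
n_id, n_t = 50, 6
idx = pd.MultiIndex.from_product([range(n_id), range(n_t)], names=["id", "year"])
alpha = np.repeat(rng.normal(0, 1, n_id), n_t)
x = rng.uniform(0, 10, n_id * n_t) + 0.5 * alpha      # regressor correlated with the effect
y = alpha + 0.6 * x + rng.normal(0, 1, n_id * n_t)
df = pd.DataFrame({"y": y, "x": x}, index=idx)

fe = PanelOLS(df["y"], df[["x"]], entity_effects=True).fit()
re = RandomEffects(df["y"], df[["x"]]).fit()

# Hausman statistic: (b_FE - b_RE)' [V_FE - V_RE]^(-1) (b_FE - b_RE), chi-square with k df
b_diff = fe.params - re.params
v_diff = fe.cov - re.cov
h = float(b_diff.T @ np.linalg.inv(v_diff) @ b_diff)
print(h, stats.chi2.sf(h, len(b_diff)))
# A small p-value favours FEM, since the effects appear correlated with the regressors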
Properties of estimators
. Pooled Estimators:
if slope coefficients are constant across subjects and if the error
term is uncorrelated with the regressors, pooled estimators are
consistent. However, it is very likely that the error terms are correlated
over time for a given subject. Therefore, we must use panel-corrected
standard errors for hypothesis testing. Otherwise, the routinely
computed standard errors may be underestimated.
UNIT 4
Logit Model
A logit model is one where the dependent variable is the log of the odds ratio of
the probability of the outcome and it is a linear function of the explanatory
variable.
Measures of Goodness of Fit
PROBIT
Error Term
Error term in probit has a normal distribution
Coefficients
Therefore, if we multiply the probit coefficient by about 1.81, we will get
approximately the logit coefficient.
This is because logit follows logistic distribution with mean 0, variance (pi^2)/3
and probit follows standard normal distribution with mean zero and variance 1
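A sketch fitting logit and probit to the same hypothetical binary data with statsmodels; scaling the probit coefficients by about 1.81 brings them close to the logit ones, and marginal effects are usually what we interpret.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(17)
n = 500
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))       # hypothetical true logit model
y = rng.binomial(1, p)
X = sm.add_constant(x)

logit_res = sm.Logit(y, X).fit(disp=0)
probit_res = sm.Probit(y, X).fit(disp=0)
print(logit_res.params)
print(probit_res.params * 1.81)              # roughly matches the logit coefficients
print(logit_res.get_margeff().summary())     # average marginal effects of x on P(y = 1)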
Interpretation of Coefficients
Logit vs Probit