Econometrics
Study Material
Course Objectives:
1. To understand the econometric theory and arguments used in the economics literature.
2. To apply econometric techniques to real-world economic issues, demonstrating the ability to use econometrics as a tool for
empirical analysis and policy evaluation.
3. To construct econometric models from the economic model, and to estimate the model’s parameters using regression
analysis starting from the ordinary least squares (OLS) estimation method.
4. To introduce students to the statistical software packages used to estimate regression models.
Course Outcomes:
CO1: Students will learn to specify and formulate economic models, including choosing the appropriate functional forms and
variables to represent economic relationships.
CO2: Students will learn to test the empirical validity of economic theory and models using empirical data and forecast future
trends.
CO3: Students will be able to estimate and interpret linear regression models and to distinguish between economic and statistical importance.
CO4: Students will be able to use a statistical/econometric computer package to estimate an econometric model and to report the results of their work in a non-technical and literate manner.
1. Nature and Scope of Econometrics
   1.1 Distinction between Economic Model and Econometric Model (6)
Learning Resources
Principles of Econometrics: A Modern Approach Using EViews, Sankar Kumar Bhaumik, Oxford University Press.
CO-PO Mapping:
               PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8
COBBABA603.1    3    3    2    1    1    1    1    1
COBBABA603.2    3    3    3    1    1    1    1    1
COBBABA603.3    3    3    3    1    1    1    1    1
COBBABA603.4    3    3    3    1    1    1    1    1
This study material is only for the use of IEM & UEM students.
Topics: Distinction between Economic Model and Econometric Model, Steps in formulating an Econometric Model (Specification, Estimation, Testing of Hypotheses, Forecasting), Structure of Economic Data (cross-section, time series, pooled, panel), Application of Econometrics in Management, The nature of regression analysis: regression versus causation; regression versus correlation.
METHODOLOGY OF ECONOMETRICS
Regression analysis is the main tool of econometrics. It is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.
Types of Data
Three types of data may be available for empirical analysis: time series, cross-section, and pooled (i.e., a combination of time series and cross-section) data.
A time series is a set of observations on the values that a variable takes at different times. Cross-section data are data on one or more variables collected at the same point in time. Pooled, or combined, data contain elements of both time series and cross-section data. Panel data are a special type of pooled data in which the same cross-sectional unit (say, a family or a firm) is surveyed over time.
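As an illustration, the four data structures map naturally onto objects in Python's pandas library. The sketch below is minimal and hypothetical (all variable names and numbers are made up):

```python
import pandas as pd

# Time series: one unit (say, a country) observed at different times.
gdp = pd.Series([2.1, 2.4, 2.6], index=[2001, 2002, 2003], name="gdp_growth")

# Cross-section: many units (here, firms) observed at one point in time.
firms = pd.DataFrame({"firm": ["A", "B", "C"], "sales_2003": [10, 7, 12]})

# Panel data: the SAME units observed repeatedly over time, naturally
# stored with a (unit, year) MultiIndex; pooled data combine time-series
# and cross-section observations without this strict repeated structure.
panel = pd.DataFrame(
    {"firm": ["A", "A", "B", "B"], "year": [2002, 2003, 2002, 2003],
     "sales": [9, 10, 6, 7]}
).set_index(["firm", "year"])

print(panel)
```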
Questions:
1. Suppose you were to develop an economic model of criminal activities, say, the hours spent in criminal activities (e.g., selling
illegal drugs). What variables would you consider in developing such a model? (BL: 6, Create)
2. Suppose you want to test whether a training program in your company will increase the productivity of the workers. Discuss the various steps you would follow in the econometric analysis. (BL: 5, Evaluate)
3. How does an econometric model differ from a mathematical and a statistical model? (BL: 4, Analyse)
4. Explain the need for the inclusion of an error term in an econometric relationship. (BL: 4, Analyse)
Topics: Definition of the Simple Linear Regression Model (SLRM), The classical assumptions (basic interpretation), Concepts of the population regression function and the sample regression function, Estimation of the model by the method of ordinary least squares, Economic interpretation of the estimated model, Properties of the least squares estimators (BLUE) in the SLRM: the Gauss-Markov theorem.
The Concepts of Population Regression Function (PRF) and Sample Regression Function (SRF):
The key concept underlying regression analysis is the concept of the conditional expectation function (CEF), or population
regression function (PRF), E(Y | Xi) = f (Xi), where f (Xi) denotes some function of the explanatory variable X.
A population regression curve is simply the locus of the conditional means of the dependent variable for the fixed values of the
explanatory variable(s).
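In the two-variable case, the PRF and its stochastic specification are conventionally written as follows (standard textbook notation, with β1 the intercept, β2 the slope, and ui a random disturbance):

```latex
E(Y \mid X_i) = \beta_1 + \beta_2 X_i      % the PRF: conditional mean of Y given X_i
Y_i = \beta_1 + \beta_2 X_i + u_i          % stochastic form: u_i is the disturbance
```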
The PRF is an idealized concept, since in practice one rarely has access to the entire population of interest. Usually, one has a sample
of observations from the population. Therefore, one uses the stochastic sample regression function (SRF) to estimate the PRF.
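A minimal sketch of estimating the SRF by ordinary least squares in Python (hypothetical data; the formulas are the usual two-variable OLS estimators, with the slope computed from deviations about the sample means):

```python
import numpy as np

# Hypothetical sample: weekly income (X) and consumption expenditure (Y)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)

x_dev = X - X.mean()           # deviations of X from its sample mean
y_dev = Y - Y.mean()           # deviations of Y from its sample mean

b2 = (x_dev * y_dev).sum() / (x_dev ** 2).sum()   # OLS slope estimator
b1 = Y.mean() - b2 * X.mean()                     # OLS intercept estimator

Y_hat = b1 + b2 * X            # fitted values: the estimated SRF
print(f"SRF: Y_hat = {b1:.4f} + {b2:.4f} X")
```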
Questions:
1. Consider the relationship between the nominal exchange rate and relative prices. From annual observations from 1985 to 2005, the following regression results were obtained, where Y = exchange rate of the Canadian dollar to the U.S. dollar (CD$) and X = ratio of the U.S. consumer price index to the Canadian consumer price index; that is, X represents the relative prices in the two countries:
Interpret this regression. How would you interpret R2? (BL: 6, Create)
2. You are given the scattergram in Figure 2.8 along with the regression line. What general conclusion do you draw from this
diagram? Is the regression line sketched in the diagram a population regression line or the sample regression line? (BL: 5,
Evaluate)
3. (BL: 4, Analyse)
4.
Topics: Use of the standard normal, chi-square, t, and F statistics in the linear regression model, Testing of hypotheses: single tests (t-test and chi-square test) and joint tests (F-test), Goodness of fit (in terms of R2, adjusted R2 and the F statistic), Statistical significance and economic importance.
Statistical significance: Statistical significance indicates that an effect you observe in a sample is unlikely to be the product of
chance. For statistically significant results, you can conclude that an effect you observe in a sample also exists in the population.
Null hypothesis and alternative hypothesis: In the context of statistical analysis, we often talk about the null hypothesis and the alternative hypothesis. If we are to compare method A with method B in terms of superiority, and we proceed on the assumption that both methods are equally good, then this assumption is termed the null hypothesis. If, instead, we suppose that method A is superior (or that method B is inferior), we are stating what is termed the alternative hypothesis. The null hypothesis is generally symbolised as H0 and the alternative hypothesis as Ha.
Type I and Type II errors: In the context of testing of hypotheses, there are basically two types of errors we can make: we may reject H0 when H0 is true, or we may accept H0 when in fact H0 is not true. The former is known as a Type I error and the latter as a Type II error. In other words, a Type I error means rejecting a hypothesis that should have been accepted, and a Type II error means accepting a hypothesis that should have been rejected.
Two-tailed and One-tailed tests: In the context of hypothesis testing, these two terms are quite important and must be clearly
understood. A two-tailed test rejects the null hypothesis if, say, the sample mean is significantly higher or lower than the
hypothesised value of the mean of the population. Such a test is appropriate when the null hypothesis is some specified value and
the alternative hypothesis is a value not equal to the specified value of the null hypothesis. A one-tailed test would be used when we
are to test, say, whether the population mean is either lower than or higher than some hypothesised value.
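A minimal sketch contrasting the two kinds of test in Python (hypothetical data; scipy is assumed as the software):

```python
import numpy as np
from scipy import stats

sample = np.array([52.1, 49.8, 53.4, 51.0, 50.2, 54.1, 52.7, 50.9])
mu0 = 50.0  # hypothesised population mean

# Two-tailed test: H0: mu = 50 against Ha: mu != 50
t_two, p_two = stats.ttest_1samp(sample, mu0, alternative="two-sided")

# One-tailed test: H0: mu <= 50 against Ha: mu > 50
t_one, p_one = stats.ttest_1samp(sample, mu0, alternative="greater")

print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
# For a positive t statistic, the one-tailed p-value is half the two-tailed one,
# so a one-tailed test rejects H0 more readily in the hypothesised direction.
```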
Important Tests
The important parametric tests are: (1) the z-test; (2) the t-test; (3) the chi-square test; and (4) the F-test. All these tests are based on the assumption of normality, i.e., the source of the data is considered to be normally distributed. In some cases the population may not be normally distributed, yet the tests remain applicable because we mostly deal with samples, and the sampling distributions closely approach normal distributions.

The z-test is based on the normal probability distribution and is used for judging the significance of several statistical measures, particularly the mean. The relevant test statistic, z, is worked out and compared with its probable value at a specified level of significance for judging the significance of the measure concerned. This is one of the most frequently used tests in research studies. The test is used even when the binomial distribution or the t-distribution is applicable, on the presumption that such a distribution tends to approximate the normal distribution as n becomes larger. The z-test is generally used for comparing the mean of a sample with some hypothesised mean for the population in the case of a large sample, or when the population variance is known. It is also used for judging the significance of the difference between the means of two independent samples in the case of large samples, or when the population variance is known, and for comparing a sample proportion with a theoretical value of the population proportion, or for judging the difference in proportions of two independent samples when n happens to be large. Besides, this test may be used for judging the significance of the median, mode, coefficient of correlation and several other measures.

The t-test is based on the t-distribution and is considered an appropriate test for judging the significance of a sample mean, or of the difference between the means of two samples, in the case of small sample(s) when the population variance is not known (in which case we use the variance of the sample as an estimate of the population variance). In case the two samples are related, we use the paired t-test (or what is known as the difference test) for judging the significance of the mean of the differences between the two related samples. The t-test can also be used for judging the significance of the coefficients of simple and partial correlation. The relevant test statistic, t, is calculated from the sample data and then compared with its probable value based on the t-distribution at a specified level of significance for the relevant degrees of freedom, for accepting or rejecting the null hypothesis. It may be noted that the t-test applies only in the case of small sample(s) when the population variance is unknown.

The chi-square test is based on the chi-square distribution and, as a parametric test, is used for comparing a sample variance with a theoretical population variance. The F-test is based on the F-distribution and is used to compare the variances of two independent samples.
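In the regression setting, these statistics are produced routinely by estimation software. The sketch below is a minimal Python illustration (simulated data; the statsmodels package is used here as a stand-in for EViews): it reads off the t statistics and p-values for single tests of each coefficient, the F statistic for the joint test of all slope coefficients, and R2 and adjusted R2 as measures of goodness of fit.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)  # simulated "true" model

X = sm.add_constant(np.column_stack([x1, x2]))  # add the intercept column
model = sm.OLS(y, X).fit()

print(model.tvalues)     # t statistics: single tests of H0: beta_k = 0
print(model.pvalues)     # corresponding two-tailed p-values
print(model.fvalue)      # F statistic: joint test that all slopes are zero
print(model.rsquared, model.rsquared_adj)   # R2 and adjusted R2
```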
Questions:
a. Is the estimated intercept coefficient significant at the 5 percent level of significance? What is the null hypothesis you are
testing?
b. Is the estimated slope coefficient significant at the 5 percent level? What is the underlying null hypothesis?
c. Establish a 95 percent confidence interval for the true slope coefficient.
d. What is the mean forecast value of cell phones demanded if the per capita income is $9,000? What is the 95 percent
confidence interval for the forecast value?
2. (BL: 6, Create)
Multicollinearity
If two or more independent variables have an exact linear relationship between them, then we have perfect multicollinearity; for example:
Yi = β0 + β1X1i + β2X2i + εi
X1i = α0 + α1X2i
Imperfect (or Near) Multicollinearity
When we use the word multicollinearity, we are usually talking about severe imperfect multicollinearity. When the explanatory variables are approximately linearly related, we have
Yi = β0 + β1X1i + β2X2i + εi
X1i = α0 + α1X2i + ui
Consequences of Multicollinearity
1. Although BLUE, the OLS estimators have large variances and covariances, making precise estimation difficult.
2. Because of consequence 1, the confidence intervals tend to be much wider, leading to the acceptance of the “zero null hypothesis”
(i.e., the true population coefficient is zero) more readily.
3. Also because of consequence 1, the t ratio of one or more coefficients tends to be statistically insignificant.
4. Although the t ratio of one or more coefficients is statistically insignificant, R2, the overall measure of goodness of fit, can be very
high.
5. The OLS estimators and their standard errors can be sensitive to small changes in the data.
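A minimal sketch in Python (simulated data; statsmodels assumed) illustrating consequence 1, inflated standard errors, together with the variance inflation factor (VIF), a standard detection diagnostic not covered above:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x1 = 2.0 + 3.0 * x2 + rng.normal(scale=0.05, size=n)  # x1 nearly collinear with x2
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
print(fit.bse)   # standard errors of the slope estimates are inflated

# A VIF above roughly 10 is a common rule of thumb for serious multicollinearity
for k in (1, 2):
    print(f"VIF for regressor {k}: {variance_inflation_factor(X, k):.1f}")
```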
Heteroscedasticity
A critical assumption of the classical linear regression model is that the disturbances ui all have the same variance, σ2. If this assumption is not satisfied, there is heteroscedasticity.
Consequences of Heteroscedasticity
Heteroscedasticity does not destroy the unbiasedness and consistency properties of the OLS estimators. But these estimators are no longer minimum variance, or efficient; that is, they are not BLUE.
In the presence of heteroscedasticity, the variances of OLS estimators are not provided by the usual OLS formulas. But if we persist
in using the usual OLS formulas, the t and F tests based on them can be highly misleading, resulting in erroneous conclusions.
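A minimal sketch in Python (simulated data; statsmodels assumed) of detecting heteroscedasticity with the Breusch-Pagan test and of computing heteroscedasticity-robust (White-type) standard errors, a standard remedy; neither is named above, but both follow directly from the consequences just listed:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(1, 10, size=n)
u = rng.normal(scale=0.5 * x, size=n)   # error variance grows with x
y = 2.0 + 1.5 * x + u

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Breusch-Pagan test: a small p-value signals heteroscedasticity
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.4f}")

# Robust standard errors give valid t and F tests despite heteroscedasticity
robust = fit.get_robustcov_results(cov_type="HC1")
print(robust.bse)
```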
Autocorrelation
If the assumption of the classical linear regression model that the errors or disturbances entering into the population regression function (PRF) are random or uncorrelated is violated, the problem of serial correlation, or autocorrelation, arises.
Consequences of Autocorrelation
Although in the presence of autocorrelation the OLS estimators remain unbiased, consistent, and asymptotically normally distributed,
they are no longer efficient. As a consequence, the usual t, F, and χ2 tests cannot be legitimately applied.
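A minimal sketch in Python (simulated AR(1) disturbances; statsmodels assumed) of detecting first-order autocorrelation with the Durbin-Watson statistic, a standard diagnostic not named above; values near 2 suggest no autocorrelation, while values well below 2 suggest positive autocorrelation:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)

# Build AR(1) disturbances: u_t = 0.8 * u_{t-1} + e_t
u = np.zeros(n)
e = rng.normal(size=n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + e[t]

y = 1.0 + 2.0 * x + u
fit = sm.OLS(y, sm.add_constant(x)).fit()

dw = durbin_watson(fit.resid)
print(f"Durbin-Watson statistic: {dw:.2f}")  # well below 2 here: positive autocorrelation
```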
Questions:
1. Consider the set of hypothetical data in the table below. (BL:6, Create)
a. “Since the zero-order correlations are very high, there must be serious multicollinearity.” Comment.
b. Would you drop variables Xi2 and Xi3 from the model?
c. If you drop them, what will happen to the value of the coefficient of Xi?
3. (BL: 4, Analyse)
4. In studying the movement in the production workers’ share in the value added (i.e., labor’s share), the following models
were considered: (BL: 4, Analyse)
where Y = labor’s share and t = time. Based on annual data for 1949–1964, the following results were obtained for the
primary metal industry: