Linear Models and Econometrics Chapter 1-9-2
Javed Anwar
Basic Econometrics
Introduction: What is Econometrics?
Definition 1: Economic measurement.

Definition 4: The social science which applies economics, mathematics and statistical inference to the analysis of economic phenomena (Arthur S. Goldberger, 1964).

Definition 5: The empirical determination of economic laws (H. Theil, 1971).
Definition 6: A conjunction of economic theory and actual measurements, using the theory and technique of statistical inference as a bridge pier (T. Haavelmo, 1944).
[Figure: Econometrics at the intersection of economic theory, mathematical economics, economic statistics, and mathematical statistics]
Why a separate discipline?

Economic theory makes statements that are mostly qualitative in nature, while econometrics gives empirical content to most economic theory.

Mathematical economics expresses economic theory in mathematical form without empirical verification of the theory, while econometrics is mainly interested in the latter.
Economic statistics is mainly concerned with collecting, processing, and presenting economic data. It is not concerned with using the collected data to test economic theories.

Mathematical statistics provides many of the tools for economic studies, but econometrics supplies the latter with many special methods of quantitative analysis based on economic data.
Methodology of Econometrics

(1) Statement of theory or hypothesis.
(2) Specification of the mathematical model of the theory:
Y = β1 + β2X,  0 < β2 < 1
where Y = consumption expenditure and X = income; β1 and β2 are parameters: β1 is the intercept and β2 the slope coefficient.
(3) Specification of the econometric model of the theory:
Y = β1 + β2X + u,  0 < β2 < 1
where Y = consumption expenditure; X = income; β1 and β2 are parameters (β1 the intercept, β2 the slope coefficient); u is the disturbance or error term, a random (stochastic) variable.
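A minimal numerical sketch of steps (2) and (3) in Python. All data are simulated; the parameter values 250 and 0.72 are illustrative assumptions, not figures from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" parameters (illustrative assumptions only)
beta1, beta2 = 250.0, 0.72                # intercept and MPC, with 0 < beta2 < 1

X = rng.uniform(1000, 6000, size=50)      # income
u = rng.normal(0, 100, size=50)           # stochastic disturbance term
Y = beta1 + beta2 * X + u                 # econometric model: Y = b1 + b2*X + u

# OLS fit of the sample counterpart of the model
b2_hat, b1_hat = np.polyfit(X, Y, 1)      # returns [slope, intercept] for degree 1
print(f"estimated intercept = {b1_hat:.2f}, estimated slope (MPC) = {b2_hat:.4f}")
```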
In the textbook illustration, Y = personal consumption expenditure and X = gross domestic product (GDP), both in billions of US dollars.
(4) Obtaining data
[Table: annual observations on Year, X, and Y; the data themselves are omitted from these slides]
(8) Using the model for control or policy purposes
Suppose the estimated model is Y = -231.8 + 0.7194X and the target expenditure is Y = 4000. Then 4000 = -231.8 + 0.7194X gives X = (4000 + 231.8)/0.7194 ≈ 5882: with MPC = 0.72, an income of about $5882 billion will produce an expenditure of about $4000 billion. Through fiscal and monetary policy, the government can manipulate the control variable X to attain the desired level of the target variable Y.
Methodology of Econometrics
Figure 1.4: Anatomy of econometric modelling
• 1) Economic Theory
• 2) Mathematical Model of Theory
• 3) Econometric Model of Theory
• 4) Data
• 5) Estimation of Econometric Model
• 6) Hypothesis Testing
• 7) Forecasting or Prediction
• 8) Using the Model for control or policy purposes
[Figure: flow from economic theory through estimation and hypothesis testing to forecasting and application in control or policy studies]
Basic Econometrics
Chapter 1:
THE NATURE OF REGRESSION ANALYSIS
1-1. Historical origin of the term “Regression”
1-2. Modern Interpretation of Regression Analysis
Regression analysis is concerned with the study of the dependence of one variable (the dependent variable) on one or more other variables (the explanatory variables), with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.
Examples: (pages 16-19)
Dependent variable Y; explanatory variable(s) X:
1. Y = son’s height; X = father’s height
2. Y = height of boys; X = age of boys
3. Y = personal consumption expenditure; X = personal disposable income
4. Y = demand; X = price
5. Y = rate of change of wages; X = unemployment rate
6. Y = money/income; X = inflation rate
7. Y = % change in demand; X = % change in the advertising budget
8. Y = crop yield; Xs = temperature, rainfall, sunshine, fertilizer
1-3. Statistical vs. Deterministic Relationships
1-4. Regression vs. Causation
Regression does not necessarily imply causation. A statistical relationship cannot logically imply causation. “A statistical relationship, however strong and however suggestive, can never establish causal connection: our ideas of causation must come from outside statistics, ultimately from some theory or other” (M. G. Kendall and A. Stuart, The Advanced Theory of Statistics).
1-5. Regression vs. Correlation
Correlation analysis: the primary objective is to measure the strength or degree of linear association between two variables (both assumed to be random).
Regression analysis: we try to estimate or predict the average value of one variable (the dependent variable, assumed to be stochastic) on the basis of the fixed values of other variables (the independent variables, assumed non-stochastic).
1-6. Terminology and Notation
Dependent variable   | Explanatory variable(s)
Explained variable   | Independent variable(s)
Predictand           | Predictor(s)
Regressand           | Regressor(s)
Response             | Stimulus or control variable(s)
Endogenous variable  | Exogenous variable(s)
1-7. The Nature and Sources of Data for Econometric Analysis
1) Types of data: time series data; cross-sectional data; pooled data
2) The sources of data
3) The accuracy of data
Time Series Data
A time series is a set of observations on the values that a variable
takes at different times. Such data may be collected at regular time
intervals, such as
daily (e.g., stock prices, weather reports),
weekly (e.g., money supply figures),
monthly [e.g., the unemployment rate, the Consumer Price Index
(CPI)],
quarterly (e.g., GDP),
annually (e.g., government budgets),
quinquennially, that is, every 5 years (e.g., the census of
manufactures),
decennially (e.g., the census of population).
Cross-Section Data Cross-section data are data on one or more
variables collected at the same point in time, such as the census of
population conducted by the Census Bureau every 10 years (the
latest being in year 2000), the surveys of consumer expenditures
conducted by the University of Michigan, and, of course, the
opinion polls by Gallup and umpteen other organizations.
A concrete example of cross-sectional data is given in Table 1.1.
This table gives data on egg production and egg prices for the 50
states in the union for 1990 and 1991. For each year the data on the
50 states are cross-sectional data. Thus, in Table 1.1 we have two
cross-sectional samples.
Pooled Data
Pooled, or combined, data contain elements of both time series and cross-section data. The data in Table 1.1 are an example of pooled data.
For each year we have 50 cross-sectional observations and for each
state we have two time series observations on prices and output of
eggs, a total of 100 pooled (or combined) observations.
Likewise, the data given in exercise 1.1 are pooled data in that the
Consumer Price Index (CPI) for each country for 1973–1997 is
time series data, whereas the data on the CPI for the seven
countries for a single year are cross-sectional data. In the pooled
data we have 175 observations—25 annual observations for each of
the seven countries.
Panel, Longitudinal, or Micropanel Data
This is a special type of pooled data in which the same cross-sectional unit (say, a family or a firm) is surveyed over time.
For example, the U.S. Department of Commerce carries out a
census of housing at periodic intervals.
At each periodic survey the same household (or the people living at
the same address) is interviewed to find out if there has been any
change in the housing and financial conditions of that household
since the last survey.
By interviewing the same household periodically, panel data provide very useful information on the dynamics of household behavior.
The Sources of Data
The data used in empirical analysis may be collected by
a governmental agency (e.g., the Department of Commerce),
an international agency (e.g., the International Monetary Fund (IMF) or
the World Bank),
a private organization (e.g., the Standard & Poor’s Corporation),
or an individual.
Literally, there are thousands of such agencies collecting data for one
purpose or another.
Pakistan – Data Sources
Pakistan Economic Survey
SBP Bulletins
PBS Published Data
Etc.
A Note on the Measurement Scales of Variables
The variables that we will generally encounter fall into four broad
categories: ratio scale, interval scale, ordinal scale, and nominal
scale. It is important that we understand each.
Ratio Scale
For a variable X, taking two values, X1 and X2, the ratio X1/X2 and the
distance (X2 − X1) are meaningful quantities.
Also, there is a natural ordering (ascending or descending) of the
values along the scale.
Therefore, comparisons such as X2 ≤ X1 or X2 ≥ X1 are meaningful.
Most economic variables belong to this category. Thus, it is
meaningful to ask how big is this year’s GDP compared with the
previous year’s GDP.
Interval Scale
An interval scale variable satisfies the last
two properties of the ratio scale variable but
not the first.
Thus, the distance between two time periods,
say (2000–1995) is meaningful, but not the
ratio of two time periods (2000/1995).
Ordinal Scale
A variable belongs to this category only if it satisfies the third
property of the ratio scale (i.e., natural ordering).
Examples are grading systems (A, B, C grades) or income class
(upper, middle, lower).
For these variables the ordering exists but the distances between the
categories cannot be quantified.
Students of economics will recall indifference curves between two goods: each higher indifference curve indicates a higher level of utility, but one cannot quantify by how much one indifference curve is higher than another.
Nominal Scale
Variables in this category have none of the
features of the ratio scale variables.
Variables such as gender (male, female) and
marital status (married, unmarried, divorced,
separated) simply denote categories.
1-8. Summary and Conclusions
1) The key idea behind regression analysis is the statistical dependence of one variable on one or more other variable(s).
2) The objective of regression analysis is to estimate and/or predict the mean or average value of the dependent variable on the basis of the known (or fixed) values of the explanatory variable(s).
3) The success of regression analysis depends on the availability of appropriate data.
4) The researcher should clearly state the sources of the data used in the analysis, their definitions, their methods of collection, any gaps or omissions, and any revisions in the data.
Basic Econometrics
Chapter 2:
TWO-VARIABLE REGRESSION ANALYSIS: Some Basic Ideas
In Chapter 1 we discussed the concept of regression in broad
terms. In this chapter we approach the subject somewhat
formally. Specifically, this and the following two chapters
introduce the reader to the theory underlying the simplest
possible regression analysis, namely, the bivariate, or two
variable, regression in which the dependent variable (the
regressand) is related to a single explanatory variable (the
regressor).
This case is considered first, not because of its practical
adequacy, but because it presents the fundamental ideas of
regression analysis as simply as possible and some of these
ideas can be illustrated with the aid of two-dimensional
graphs. Moreover, as we shall see, the more general multiple
regression analysis in which the regressand is related to one or
more regressors is in many ways a logical extension of the
two-variable case.
2-1. A Hypothetical Example
As noted in Section 1.2, regression analysis is largely
concerned with estimating and/or predicting the
(population) mean value of the dependent variable on the
basis of the known or fixed values of the explanatory
variable(s). To understand this, consider the data given in
Table 2.1. The data in the table refer to a total population
of 60 families in a hypothetical community and their
weekly income (X) and weekly consumption expenditure
(Y), both in dollars. The 60 families are divided into 10
income groups (from $80 to $260) and the weekly
expenditures of each family in the various groups are as
shown in the table. Therefore, we have 10 fixed values of X
and the corresponding Y values against each of the X
values; so to speak, there are 10 Y subpopulations.
Table 2-2: Weekly family income X ($) and weekly consumption expenditure Y ($). [Only the column totals of Y survive here: 325, 462, 445, 707, 678, 750, 685, 1043, 966, 1211 for the ten income groups.]
2-2. The Concept of the Population Regression Function (PRF)
E(Y | Xi) = β1 + β2Xi   (2.2.2)
2-3. THE MEANING OF THE TERM LINEAR
Since this text is concerned primarily with linear models like (2.2.2), it is essential to know what the term linear really means, for it can be interpreted in two different ways.
Linearity in the Variables
The first and perhaps more “natural” meaning of linearity is that the conditional expectation of Y is a linear function of Xi, as in (2.2.2). Geometrically, the regression curve in this case is a straight line. In this interpretation, a regression function such as
E(Y | Xi) = β1 + β2Xi²
is not a linear function, because the variable X appears with a power of 2.
Linearity in the Parameters
The second interpretation of linearity is that the conditional expectation of Y, E(Y | Xi), is a linear function of the parameters, the β’s; it may or may not be linear in the variable X. In this interpretation, E(Y | Xi) = β1 + β2Xi² is a linear (in the parameters) regression model. To see this, suppose X takes the value 3; then E(Y | X = 3) = β1 + 9β2, which is obviously linear in β1 and β2. All the models shown in Figure 2.3 are thus linear regression models, that is, models linear in the parameters.
2-4. Stochastic Specification of the PRF
It is clear from Figure 2.1 that, as family income increases, family consumption expenditure on the average increases too. But what about the consumption expenditure of an individual family in relation to its (fixed) level of income? It is obvious from Table 2.1 and Figure 2.1 that an individual family’s consumption expenditure does not necessarily increase as the income level increases.
For example, from Table 2.1 we observe that corresponding to the income level of $100 there is one family whose consumption expenditure of $65 is less than the consumption expenditures of two families whose weekly income is only $80. But notice that the average consumption expenditure of families with a weekly income of $100 is greater than the average consumption expenditure of families with a weekly income of $80 ($77 versus $65).
What, then, can we say about the relationship between an individual family’s consumption expenditure and a given level of income? We see from Figure 2.1 that, given the income level Xi, an individual family’s consumption expenditure is clustered around the average consumption of all families at that Xi, that is, around its conditional expectation. Therefore, we can express the deviation of an individual Yi around its expected value as follows:
ui = Yi − E(Y | X = Xi),  or  Yi = E(Y | X = Xi) + ui   (2.4.1)
ui is the stochastic disturbance or stochastic error term. It is the nonsystematic component, and it can take positive or negative values. The component E(Y | X = Xi) is systematic or deterministic: it is the mean consumption expenditure of all the families with the same level of income. The assumption that the regression line passes through the conditional means of Y implies that E(ui | Xi) = 0.
2-5. The Significance of the Stochastic Disturbance Term
ui, the stochastic disturbance term, is a surrogate for all the variables that are omitted from the model but that collectively affect Y. The reasons for omitting such variables from the model are listed below.
If E(Y | Xi) is assumed to be linear in Xi, as in (2.2.2), Eq. (2.4.1) may be written as:
Yi = β1 + β2Xi + ui   (2.4.2)
Why not include as many variables as possible in the model (the reasons for using ui)?
+ Vagueness of theory
+ Unavailability of data
+ Core variables vs. peripheral variables
+ Intrinsic randomness in human behavior
+ Poor proxy variables
+ Principle of parsimony
+ Wrong functional form
2-6. The Sample Regression Function (SRF)
Plotting the data of Tables 2.4 and 2.5, we obtain the scattergram given in Figure 2.4. In the scattergram two sample regression lines are drawn so as to “fit” the scatter reasonably well: SRF1 is based on the first sample, and SRF2 is based on the second sample. Which of the two regression lines represents the “true” population regression line? If we avoid the temptation of looking at Figure 2.1, which purportedly represents the PRF, there is no way we can be absolutely sure that either of the regression lines shown in Figure 2.4 represents the true population regression line (or curve). The regression lines in Figure 2.4 are known as sample regression lines.
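The point can be reproduced numerically: two samples drawn from the same PRF give two different SRFs. A minimal sketch with simulated data (the PRF parameters 17 and 0.6 are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(80, 260, 10)            # ten fixed income levels

def draw_srf():
    # One sample from the (assumed) PRF E(Y|X) = 17 + 0.6*X
    Y = 17 + 0.6 * X + rng.normal(0, 8, size=X.size)
    slope, intercept = np.polyfit(X, Y, 1)
    return intercept, slope

print("SRF1: b1, b2 =", draw_srf())
print("SRF2: b1, b2 =", draw_srf())     # differs from SRF1 by sampling variation
```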
2-7. Summary and Conclusions
The key concept underlying regression analysis is the concept of the population regression function (PRF). This book deals with linear PRFs: linear in the unknown parameters. They may or may not be linear in the variables.
For empirical purposes, it is the stochastic PRF that matters. The stochastic disturbance term ui plays a critical role in estimating the PRF. The PRF is an idealized concept, since in practice one rarely has access to the entire population of interest. Generally, one has a sample of observations from the population and uses the stochastic sample regression function (SRF) to estimate the PRF.
Basic Econometrics
Chapter 3:
TWO-VARIABLE REGRESSION MODEL: The Problem of Estimation
3-1. The Method of Ordinary Least Squares (OLS)
Least-squares criterion: choose the estimators β̂1 and β̂2 so as to minimize
Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂1 − β̂2Xi)²   (3.1.2)
Setting up the normal equations and solving them for β̂1 and β̂2 yields the least-squares estimators [see (3.1.6) and (3.1.7)]:
β̂2 = Σxiyi / Σxi²,   β̂1 = Ȳ − β̂2X̄
where xi = Xi − X̄ and yi = Yi − Ȳ are deviations from the sample means. These estimators have attractive numerical and statistical properties.
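A sketch of the closed-form estimators in deviation form, on simulated data (the true parameters 3.0 and 1.5 are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, 30)
Y = 3.0 + 1.5 * X + rng.normal(0, 1, 30)     # assumed true model

x, y = X - X.mean(), Y - Y.mean()            # deviations from sample means
b2 = (x * y).sum() / (x ** 2).sum()          # slope: Sum(x*y) / Sum(x^2)
b1 = Y.mean() - b2 * X.mean()                # intercept: Ybar - b2*Xbar

resid = Y - (b1 + b2 * X)
print("b1 =", b1, " b2 =", b2)
print("residuals sum to ~0:", resid.sum())   # one numerical property of OLS
```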
3-1. (continued) Numerical properties of the OLS estimators:
+ OLS estimators are expressed solely in terms of observable quantities; they are point estimators.
+ The sample regression line passes through the sample means of X and Y.
+ The mean value of the estimated Ŷi is equal to the mean value of the actual Yi.
+ The mean value of the residuals ûi is zero.
+ The residuals ûi are uncorrelated with the predicted Ŷi and with the Xi.
3-2. The Assumptions Underlying the Method of Least Squares
The classical linear regression model rests on assumptions such as: the model is linear in the parameters; the X values are fixed in repeated sampling; the disturbance has zero mean, E(ui | Xi) = 0; homoscedasticity, Var(ui) = σ²; no autocorrelation between the disturbances; zero covariance between ui and Xi; the number of observations exceeds the number of parameters; there is variability in the X values; and the model is correctly specified.
3-3. Precision or Standard Errors of Least-Squares Estimates
In statistics the precision of an estimate is measured by its standard error (SE):
var(β̂2) = σ² / Σxi²   (3.3.1)
se(β̂2) = √var(β̂2)   (3.3.2)
var(β̂1) = σ² ΣXi² / (n Σxi²)   (3.3.3)
se(β̂1) = √var(β̂1)   (3.3.4)
σ̂² = Σûi² / (n − 2)   (3.3.5)
σ̂ = √σ̂² is the standard error of the estimate.
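A numerical sketch of (3.3.1)-(3.3.5), again on simulated data with assumed parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
X = rng.uniform(0, 10, n)
Y = 2.0 + 0.8 * X + rng.normal(0, 1.5, n)    # assumed true model

x = X - X.mean()
b2 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
b1 = Y.mean() - b2 * X.mean()
u_hat = Y - b1 - b2 * X

sigma2_hat = (u_hat ** 2).sum() / (n - 2)                     # (3.3.5)
var_b2 = sigma2_hat / (x ** 2).sum()                          # (3.3.1)
var_b1 = sigma2_hat * (X ** 2).sum() / (n * (x ** 2).sum())   # (3.3.3)
print("se(b1) =", np.sqrt(var_b1), " se(b2) =", np.sqrt(var_b2))
```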
3-3. (continued) Features of the variances:
+ var(β̂2) is proportional to σ² and inversely proportional to Σxi².
+ var(β̂1) is proportional to σ² and to ΣXi², but inversely proportional to Σxi² and to the sample size n.
+ cov(β̂1, β̂2) = −X̄ var(β̂2), which shows the dependence between β̂1 and β̂2.
3-4. Properties of Least-Squares Estimators: The Gauss-Markov Theorem
An OLS estimator is said to be BLUE if:
+ It is linear, that is, a linear function of a random variable, such as the dependent variable Y in the regression model.
+ It is unbiased, that is, its average or expected value, E(β̂2), is equal to the true value, β2.
+ It has minimum variance in the class of all such linear unbiased estimators.
Gauss-Markov Theorem: Given the assumptions of the classical linear regression model, the least-squares estimators have minimum variance in the class of linear unbiased estimators; that is, they are BLUE.
3-5. The Coefficient of Determination r²: A Measure of “Goodness of Fit”
r² = ESS/TSS = 1 − RSS/TSS measures the proportion of the total variation in Y explained by the regression; 0 ≤ r² ≤ 1.
Chapter 4:
THE NORMALITY ASSUMPTION: Classical Normal Linear Regression Model (CNLRM)
4-2. The Normality Assumption
The CNLRM assumes that each ui is distributed normally, ui ~ N(0, σ²), with:
Mean: E(ui) = 0   (Assumption 3)
Variance: E(ui²) = σ²   (Assumption 4)
Covariance: Cov(ui, uj) = E(ui uj) = 0 for i ≠ j   (Assumption 5)
4-3. Properties of OLS Estimators under the Normality Assumption
With the normality assumption the OLS estimators β̂1, β̂2, and σ̂² have the following properties:
1. They are unbiased.
2. They have minimum variance. Combining 1 and 2, they are efficient estimators.
3. Consistency: as the sample size increases indefinitely, the estimators converge to their true population values.
4. β̂1 is normally distributed, β̂1 ~ N(β1, σβ̂1²), and Z = (β̂1 − β1)/σβ̂1 is N(0, 1).
5. β̂2 is normally distributed, β̂2 ~ N(β2, σβ̂2²), and Z = (β̂2 − β2)/σβ̂2 is N(0, 1).
6. (n − 2) σ̂²/σ² is distributed as χ²(n − 2).
7. β̂1 and β̂2 are distributed independently of σ̂². They have minimum variance in the entire class of unbiased estimators, whether linear or not; they are best unbiased estimators (BUE).
8. If ui is N(0, σ²), then Yi is N[E(Yi), var(Yi)] = N[β1 + β2Xi, σ²].
Some Last Points of Chapter 4
4-4. The Method of Maximum Likelihood (ML)
ML is a point-estimation method with some stronger theoretical properties than OLS (Appendix 4.A, pages 110-114). The OLS and ML estimators of the coefficients β are identical.
ML estimator of σ²: Σûi²/n (a biased estimator).
OLS estimator of σ²: Σûi²/(n − 2) (an unbiased estimator).
As the sample size n gets larger, the two estimators of σ² tend to be equal.
4-5. Probability distributions related to the normal distribution: the t, χ², and F distributions. See Section 4.5 (pages 107-108), with 8 theorems, and Appendix A (pages 755-776).
4-6. Summary and Conclusions: see the 10 conclusions on pages 109-110.
Basic Econometrics
Chapter 5:
TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-1. Statistical Prerequisites
See Appendix A for key concepts such as probability, probability distributions, Type I error, Type II error, level of significance, power of a statistical test, and confidence interval.
A Type I error (false positive) occurs if an investigator rejects a null hypothesis that is actually true in the population; a Type II error (false negative) occurs if the investigator fails to reject a null hypothesis that is actually false in the population.
The significance level, also known as alpha or α, is a measure of the strength of the evidence that must be present in the sample before one rejects the null hypothesis and concludes that the effect is statistically significant. The researcher sets the significance level before conducting the experiment.
Statistical power, or the power of a hypothesis test, is the probability that the test correctly rejects a false null hypothesis, that is, the probability of a true positive result.
5-5. Hypothesis Testing: General Comments
The stated hypothesis is known as the null hypothesis, H0. H0 is tested against an alternative hypothesis, H1.
Two-sided or two-tail test: H0: β2 = β2* versus H1: β2 ≠ β2*
Confidence interval: β̂2 − tα/2 se(β̂2) ≤ β2 ≤ β̂2 + tα/2 se(β̂2)

Type of hypothesis | H0        | H1        | Reject H0 if
Two-tail           | β2 = β2*  | β2 ≠ β2*  | |t| > tα/2,df
Right-tail         | β2 ≤ β2*  | β2 > β2*  | t > tα,df
Left-tail          | β2 ≥ β2*  | β2 < β2*  | t < −tα,df
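A sketch of the two-tail t-test in Python. The slope value matches the consumption function (3.6.2) quoted in section 5-10; the standard error and degrees of freedom are assumed here for illustration:

```python
from scipy import stats

# Slope and standard error from some OLS fit (slope as in (3.6.2);
# se and df are assumed illustrative values)
b2_hat, se_b2, df = 0.5091, 0.036, 8
beta2_star = 0.0                               # H0: beta2 = beta2*

t = (b2_hat - beta2_star) / se_b2
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)        # two-tail critical value
p_value = 2 * stats.t.sf(abs(t), df)           # two-tail p-value

print(f"t = {t:.3f}, critical t = {t_crit:.3f}, p = {p_value:.6f}")
print("reject H0" if abs(t) > t_crit else "do not reject H0")
```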
5-7. Hypothesis Testing: The Test-of-Significance Approach
Testing the significance of σ²: the χ² test. Under the normality assumption we have:
χ² = (n − 2) σ̂² / σ² ~ χ²(n−2)   (5.4.1)
From (5.4.2) and (5.4.3) on page 520 it follows that:
Table 5-2: A summary of the χ² test
H0        | H1        | Reject H0 if
σ² = σ0²  | σ² > σ0²  | df·σ̂²/σ0² > χ²α,df
σ² = σ0²  | σ² < σ0²  | df·σ̂²/σ0² < χ²(1−α),df
σ² = σ0²  | σ² ≠ σ0²  | df·σ̂²/σ0² > χ²α/2,df or < χ²(1−α/2),df
5-8. Hypothesis Testing: Some Practical Aspects
1) The meaning of “accepting” or “rejecting” a hypothesis
2) The null hypothesis and the “2-t” rule of thumb
3) Forming the null and alternative hypotheses
4) Choosing α, the level of significance
5) The exact level of significance: the p-value (see page 132)
6) Statistical significance versus practical significance
7) The choice between confidence-interval and test-of-significance approaches to hypothesis testing
[Warning: read carefully pages 117-134]
5-9. Regression Analysis and Analysis of Variance
TSS = ESS + RSS
F = [MSS of ESS] / [MSS of RSS] = β̂2² Σxi² / σ̂²   (5.9.1)
If the ui are normally distributed and H0: β2 = 0 holds, then F follows the F distribution with 1 and n − 2 degrees of freedom.
The F ratio provides a test statistic for the null hypothesis that the true β2 is zero: compare the computed F with the critical F obtained from the F tables at the chosen level of significance, or obtain the p-value of the computed F statistic to make the decision.
Table 5-3. ANOVA table for the two-variable regression model
Source of variation      | SS                  | df     | MSS
Due to regression (ESS)  | Σŷi² = β̂2² Σxi²     | 1      | β̂2² Σxi²
Due to residuals (RSS)   | Σûi²                | n − 2  | σ̂² = Σûi²/(n − 2)
Total (TSS)              | Σyi²                | n − 1  |
5-10. Application of Regression Analysis: The Problem of Prediction
From the data of Table 3-2 we obtained the sample regression (3.6.2):
Ŷi = 24.4545 + 0.5091 Xi
where Ŷi is the estimator of the true E(Yi). There are two kinds of prediction, as follows:
Mean prediction: prediction of the conditional mean value of Y corresponding to a chosen X, say X0, that is, of the point on the population regression line itself (see pages 137-138 for details).
Individual prediction: prediction of an individual Y value corresponding to X0.
5-11. Reporting the Results of Regression Analysis
An illustration is given in the text: the estimated equation is reported with standard errors, t ratios, and p-values beneath the coefficients, together with r², the degrees of freedom, and the F statistic.
5-12. Normality Tests (continued): The Jarque-Bera Test
Under the null hypothesis H0 that the residuals are normally distributed, Jarque and Bera showed that in large samples (asymptotically) the JB statistic given in (5.12.12) follows the chi-square distribution with 2 df. If the p-value of the computed chi-square statistic in an application is sufficiently low, one can reject the hypothesis that the residuals are normally distributed. But if the p-value is reasonably high, one does not reject the normality assumption.
5-13. Summary and Conclusions
1. Estimation and hypothesis testing constitute the two main branches of classical statistics.
2. Hypothesis testing answers this question: is a given finding compatible with a stated hypothesis or not?
3. There are two mutually complementary approaches to answering this question: confidence interval and test of significance.
5. The test-of-significance procedure develops a test statistic that follows a well-defined probability distribution (such as the normal, t, F, or chi-square). Once a test statistic is computed, its p-value can easily be obtained.
The p-value of a test is the lowest significance level at which we would reject H0. It gives the exact probability of obtaining the estimated test statistic under H0. If the p-value is small, one can reject H0; if it is large, one may not.
6. A Type I error is the error of rejecting a true hypothesis; a Type II error is the error of accepting a false hypothesis. In practice, one should be careful in fixing the level of significance α, the probability of committing a Type I error (at arbitrary values such as 1%, 5%, 10%). It is better to quote the p-value of the test statistic.
Basic Econometrics
Chapter 6:
EXTENSIONS OF THE TWO-VARIABLE LINEAR REGRESSION MODEL
6-2. Scaling and Units of Measurement
Let Yi = β̂1 + β̂2Xi + ûi   (6.2.1)
Define Yi* = w1Yi and Xi* = w2Xi. Then:
β̂2* = (w1/w2) β̂2   (6.2.15)
β̂1* = w1 β̂1   (6.2.16)
σ̂*² = w1² σ̂²   (6.2.17)
From one scale of measurement one can derive the results based on another scale of measurement. If w1 = w2, the intercept and its standard error are both multiplied by w1. If w2 = 1 and the scale of Y is changed by w1, then all coefficients and standard errors are multiplied by w1. If w1 = 1 and the scale of X is changed by w2, then only the slope coefficient and its standard error are multiplied by 1/w2. Transformation from one scale to another does not affect the properties of the OLS estimators.
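A quick numerical check of (6.2.15) and (6.2.16) on simulated data (the scale factors w1 and w2 are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.uniform(1, 10, 50)
Y = 2.0 + 3.0 * X + rng.normal(0, 1, 50)   # assumed model

def ols(Xv, Yv):
    slope, intercept = np.polyfit(Xv, Yv, 1)
    return intercept, slope

w1, w2 = 1000.0, 10.0                 # assumed scale factors
b1, b2 = ols(X, Y)
b1s, b2s = ols(w2 * X, w1 * Y)        # regress Y* = w1*Y on X* = w2*X

print(b2s, (w1 / w2) * b2)            # (6.2.15): the two slopes agree
print(b1s, w1 * b1)                   # (6.2.16): the two intercepts agree
```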
6-3. Functional Forms of Regression Models
+ The log-linear model
+ Semilog models
+ Reciprocal models
6-4. How to Measure Elasticity: The Log-Linear Model
Exponential regression model:
Yi = β1 Xi^β2 e^ui   (6.4.1)
Taking the log to the base e of both sides:
ln Yi = ln β1 + β2 ln Xi + ui
and setting α = ln β1:
ln Yi = α + β2 ln Xi + ui   (6.4.3)
(the log-log, double-log, or log-linear model)
This can be estimated by OLS by letting Yi* = ln Yi and Xi* = ln Xi. The slope coefficient β2 measures the elasticity of Y with respect to X.
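A sketch of estimating (6.4.3) by OLS on logs (simulated data; the true elasticity 0.8 is an assumed value):

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.uniform(1, 100, 200)
Y = 5.0 * X ** 0.8 * np.exp(rng.normal(0, 0.1, 200))    # assumed model (6.4.1)

# OLS on lnY = alpha + b2*lnX: the slope is the elasticity estimate
slope, intercept = np.polyfit(np.log(X), np.log(Y), 1)
print("estimated elasticity b2 =", slope)               # close to the assumed 0.8
```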
6-5. Semilog Models: Log-lin and Lin-log Models
6-6. Reciprocal Models
Basic Econometrics
Chapter 7:
MULTIPLE REGRESSION ANALYSIS: The Problem of Estimation
7-1. The Three-Variable Model: Notation and Assumptions
Yi = β1 + β2X2i + β3X3i + ui   (7.1.1)
β2 and β3 are partial regression coefficients. The model carries the following assumptions:
+ Zero mean value of ui: E(ui | X2i, X3i) = 0 for each i   (7.1.2)
+ No serial correlation: Cov(ui, uj) = 0, i ≠ j   (7.1.3)
+ Homoscedasticity: Var(ui) = σ²   (7.1.4)
+ Zero covariance between ui and each X: Cov(ui, X2i) = Cov(ui, X3i) = 0   (7.1.5)
+ No specification bias (the model is correctly specified)   (7.1.6)
+ No exact collinearity between the X variables   (7.1.7)
  (no multicollinearity; if an exact linear relationship exists, the X variables are said to be linearly dependent)
+ The model is linear in the parameters
7-2. Interpretation of Multiple Regression
7-3. The Meaning of Partial Regression Coefficients
Yi = β1 + β2X2i + β3X3i + … + βsXsi + ui
βk measures the change in the mean value of Y per unit change in Xk, holding the remaining explanatory variables constant. It gives the “direct” effect of a unit change in Xk on E(Yi), net of the effects of the Xj (j ≠ k). How does one hold the other variables constant so as to obtain the “true” effect of a unit change in Xk on Y? (See the text.)
7-4. OLS and ML Estimation of the Partial Regression Coefficients
This section (pages 197-201) provides:
1. The OLS estimators for the three-variable regression Yi = β1 + β2X2i + β3X3i + ui
2. Variances and standard errors of the OLS estimators
3. Eight properties of the OLS estimators (pp. 199-201)
4. An understanding of the ML estimators
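A sketch of OLS for the three-variable model via the normal equations in matrix form, (X′X)b = X′Y (simulated data; the true parameters are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100
X2 = rng.uniform(0, 10, n)
X3 = rng.uniform(0, 5, n)
Y = 1.0 + 2.0 * X2 - 1.5 * X3 + rng.normal(0, 1, n)   # assumed model (7.1.1)

X = np.column_stack([np.ones(n), X2, X3])             # design matrix with intercept
b = np.linalg.solve(X.T @ X, X.T @ Y)                 # solve (X'X)b = X'Y

u_hat = Y - X @ b
sigma2 = (u_hat ** 2).sum() / (n - X.shape[1])        # df = n - k
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
print("b  =", b)
print("se =", se)
```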
7-5. The Multiple Coefficient of Determination R² and the Multiple Coefficient of Correlation R
This section provides:
1. The definition of R² in the context of multiple regression, analogous to r² in the two-variable case.
2. R = √R², the coefficient of multiple correlation, which measures the degree of association between Y and all the explanatory variables jointly.
3. The variance of a partial regression coefficient:
Var(β̂k) = (σ² / Σxk²) · (1 / (1 − Rk²))   (7.5.6)
where β̂k is the partial regression coefficient of regressor Xk and Rk² is the R² of the regression of Xk on the remaining regressors.
7-6. Example 7.1: The Expectations-Augmented Phillips Curve for the US (1970-1982)
7-7. Simple Regression in the Context of Multiple Regression: Introduction to Specification Bias
7-8. R² and the Adjusted R²
R² is a non-decreasing function of the number of explanatory variables: an additional X variable will not decrease R².
R² = ESS/TSS = 1 − RSS/TSS = 1 − Σûi² / Σyi²   (7.8.1)
This can push model building in the wrong direction, by rewarding the addition of irrelevant variables, and it motivates the adjusted R² (R̄²), which takes degrees of freedom into account:
R̄² = 1 − [Σûi²/(n − k)] / [Σyi²/(n − 1)]   (7.8.2)
or equivalently R̄² = 1 − σ̂²/S²Y, where S²Y is the sample variance of Y and k is the number of parameters including the intercept term. Substituting (7.8.1) into (7.8.2) gives
R̄² = 1 − (1 − R²)(n − 1)/(n − k)   (7.8.4)
For k > 1, R̄² < R²; thus, as the number of X variables increases, R̄² increases by less than R², and R̄² can even be negative.
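A sketch of (7.8.1) and (7.8.4), showing how an irrelevant regressor raises R² but can lower R̄² (simulated data, assumed parameters):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 60
X2 = rng.uniform(0, 10, n)
X_irrel = rng.normal(0, 1, n)                  # an irrelevant regressor
Y = 1.0 + 2.0 * X2 + rng.normal(0, 2, n)       # assumed model; X_irrel plays no role

def r2_pair(X, Y):
    n, k = X.shape
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    rss = ((Y - X @ b) ** 2).sum()
    tss = ((Y - Y.mean()) ** 2).sum()
    r2 = 1 - rss / tss                          # (7.8.1)
    r2_bar = 1 - (1 - r2) * (n - 1) / (n - k)   # (7.8.4)
    return r2, r2_bar

X_small = np.column_stack([np.ones(n), X2])
X_big = np.column_stack([np.ones(n), X2, X_irrel])
print(r2_pair(X_small, Y))   # adding X_irrel below raises R^2 ...
print(r2_pair(X_big, Y))     # ... but can lower the adjusted R^2
```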
7-10. Example 7.3: The Cobb-Douglas Production Function
More on functional form:
ln Yi = β0 + β2 ln X2i + β3 ln X3i + ui   (7.10.2)
The data set is in Table 7.3; the results are reported on page 216.
7-11. Polynomial Regression Models
Yi = β0 + β1Xi + β2Xi² + … + βkXi^k + ui
7-12. Summary and Conclusions (page 221)
Basic Econometrics
Chapter 8:
MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-3. Hypothesis Testing in Multiple Regression
+ Testing hypotheses about an individual partial regression coefficient
+ Testing the overall significance of the estimated multiple regression model, that is, finding out whether all the partial slope coefficients are simultaneously equal to zero
+ Testing that two or more coefficients are equal to one another
+ Testing that the partial regression coefficients satisfy certain restrictions
+ Testing the stability of the estimated regression model over time or in different cross-sectional units
+ Testing the functional form of regression models
8-4. Hypothesis Testing about Individual Partial Regression Coefficients
With the assumption that ui ~ N(0, σ²), we can use the t-test to test a hypothesis about any individual partial regression coefficient:
H0: β2 = 0
H1: β2 ≠ 0
If the computed |t| exceeds the critical t value at the chosen level of significance, we may reject the null hypothesis; otherwise, we may not reject it.
8-5. Testing the Overall Significance of a Multiple Regression: The F-Test
For Yi = β1 + β2X2i + β3X3i + … + βkXki + ui, to test the hypothesis H0: β2 = β3 = … = βk = 0 (all slope coefficients are simultaneously zero) versus H1: not all slope coefficients are simultaneously zero, compute
F = (ESS/df) / (RSS/df) = [ESS/(k − 1)] / [RSS/(n − k)]   (8.5.7)
(k = total number of parameters to be estimated, including the intercept)
Alternatively, if the p-value of the F obtained from (8.5.7) is sufficiently low, one can reject H0.
An important relationship between R² and F:
F = [ESS/(k − 1)] / [RSS/(n − k)] = [R²/(k − 1)] / [(1 − R²)/(n − k)]   (8.5.13)
8-5. (continued) Testing the overall significance of a multiple regression in terms of R²: for Yi = β1 + β2X2i + β3X3i + … + βkXki + ui, to test H0: β2 = β3 = … = βk = 0 versus H1: not all slope coefficients are simultaneously zero, compute
F = [R²/(k − 1)] / [(1 − R²)/(n − k)]   (8.5.13)
(k = total number of parameters to be estimated, including the intercept)
Alternatively, if the p-value of the F obtained from (8.5.13) is sufficiently low, one can reject H0.
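A sketch of (8.5.13) computed directly from R² (the R², n, and k values here are assumed illustrative numbers):

```python
from scipy import stats

# Assumed illustrative numbers: R^2 from a fitted model with k parameters, n obs.
R2, n, k = 0.85, 50, 4

F = (R2 / (k - 1)) / ((1 - R2) / (n - k))    # (8.5.13)
p_value = stats.f.sf(F, k - 1, n - k)
print(f"F = {F:.2f}, p = {p_value:.2e}")     # low p-value: reject H0
```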
8-8. Comparing Two Regressions: Testing for Structural Stability of Regression Models
Step 4: Given the assumptions of the Chow test, it can be shown that
F = [S5/k] / [S4/(n1 + n2 − 2k)]   (8.8.4)
follows the F distribution with df = (k, n1 + n2 − 2k).
Decision rule: if the F computed from (8.8.4) exceeds the critical F at the chosen level of significance α, reject the hypothesis that the regressions (8.8.1) and (8.8.2) are the same, that is, reject the hypothesis of structural stability. Equivalently, one can use the p-value of the F obtained from (8.8.4) and reject H0 if that p-value is reasonably low.
+ Apply this to the data in Table 8.8.
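A sketch of the full Chow procedure on simulated data. Steps 1-3 are not shown on these slides; the code follows the standard convention in which S1 is the pooled (restricted) RSS, S4 is the sum of the two subsample RSSs, and S5 = S1 − S4 (the two regimes below are assumed for illustration):

```python
import numpy as np
from scipy import stats

def rss(X, Y):
    # Residual sum of squares from an OLS fit with intercept
    A = np.column_stack([np.ones(len(X)), X])
    b = np.linalg.lstsq(A, Y, rcond=None)[0]
    return ((Y - A @ b) ** 2).sum()

rng = np.random.default_rng(11)
n1 = n2 = 40
X1, X2 = rng.uniform(0, 10, n1), rng.uniform(0, 10, n2)
Y1 = 1.0 + 0.5 * X1 + rng.normal(0, 1, n1)       # regime 1 (assumed)
Y2 = 3.0 + 0.9 * X2 + rng.normal(0, 1, n2)       # regime 2: structural break

k = 2                                            # parameters per regression
S1 = rss(np.r_[X1, X2], np.r_[Y1, Y2])           # pooled (restricted) RSS
S4 = rss(X1, Y1) + rss(X2, Y2)                   # sum of the subsample RSSs
S5 = S1 - S4

F = (S5 / k) / (S4 / (n1 + n2 - 2 * k))          # (8.8.4)
print("F =", F, " p =", stats.f.sf(F, k, n1 + n2 - 2 * k))
```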
8-9. Testing the Functional Form of Regression: Choosing between Linear and Log-Linear Regression Models (the MWD Test)
H0: linear model (Y is a linear function of the regressors, the Xs)
H1: log-linear model (ln Y is a linear function of the logs of the regressors, the ln Xs)
Step 1: Estimate the linear model and obtain the estimated Y values; call them Yf (i.e., Ŷ). Take ln(Yf).
Step 2: Estimate the log-linear model and obtain the estimated ln Y values; call them lnf (i.e., the fitted ln Y).
Step 3: Obtain Z1 = ln(Yf) − lnf.
Step 4: Regress Y on the Xs and Z1; reject H0 if the coefficient of Z1 is statistically significant by the usual t-test.
Step 5: Obtain Z2 = (antilog of lnf) − Yf.
Step 6: Regress ln Y on the ln Xs and Z2; reject H1 if the coefficient of Z2 is statistically significant by the usual t-test.
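A sketch of the six MWD steps on simulated data (here the data are generated from an assumed log-linear truth, so the test should tend to reject H0 and retain H1):

```python
import numpy as np

def ols(X, Y):
    # OLS coefficients and their t ratios
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ b
    s2 = (resid ** 2).sum() / (len(Y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return b, b / se

rng = np.random.default_rng(12)
n = 100
X = rng.uniform(1, 10, n)
Y = 5.0 * X ** 0.7 * np.exp(rng.normal(0, 0.1, n))   # assumed log-linear truth

A_lin = np.column_stack([np.ones(n), X])
A_log = np.column_stack([np.ones(n), np.log(X)])

Yf = A_lin @ ols(A_lin, Y)[0]                 # Step 1: fitted Y, linear model
lnf = A_log @ ols(A_log, np.log(Y))[0]        # Step 2: fitted lnY, log-linear model

Z1 = np.log(Yf) - lnf                         # Step 3
_, t_lin = ols(np.column_stack([A_lin, Z1]), Y)            # Step 4
Z2 = np.exp(lnf) - Yf                         # Step 5
_, t_log = ols(np.column_stack([A_log, Z2]), np.log(Y))    # Step 6

print("t on Z1 (tests H0: linear):", t_lin[-1])
print("t on Z2 (tests H1: log-linear):", t_log[-1])
```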
8-12. Summary and Conclusions