Multiple Regression Analysis (Three Variables) (1)
No exact linear relationship between X2 and X3 ................................................................(7.1.9)
In Section 7.7, we will spend more time discussing the final assumption.
9. There is no specification bias.
The model is correctly specified. ....................................................................................(7.1.10)
7.4 OLS Estimation of the Partial Regression Coefficients
OLS Estimators:
To find the OLS estimators, let us first write the sample regression function (SRF) corresponding to the PRF of Eq. (7.1.1) as follows:

$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \hat{u}_i \qquad (7.4.1)$$

where $\hat{u}_i$ is the residual term, the sample counterpart of the stochastic disturbance term $u_i$. The OLS procedure consists of choosing the values of the unknown parameters so that the residual sum of squares (RSS), $\sum \hat{u}_i^2$, is as small as possible. Symbolically,

$$\min \sum \hat{u}_i^2 = \sum \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \hat{\beta}_3 X_{3i}\right)^2 \qquad (7.4.2)$$
The most straightforward procedure to obtain the estimators that will minimize Eq. (7.4.2) is to
differentiate it with respect to the unknowns, set the resulting expressions to zero, and solve them
simultaneously. This procedure gives the following normal equations:

$$\bar{Y} = \hat{\beta}_1 + \hat{\beta}_2 \bar{X}_2 + \hat{\beta}_3 \bar{X}_3 \qquad (7.4.3)$$

$$\sum Y_i X_{2i} = \hat{\beta}_1 \sum X_{2i} + \hat{\beta}_2 \sum X_{2i}^2 + \hat{\beta}_3 \sum X_{2i} X_{3i} \qquad (7.4.4)$$

$$\sum Y_i X_{3i} = \hat{\beta}_1 \sum X_{3i} + \hat{\beta}_2 \sum X_{2i} X_{3i} + \hat{\beta}_3 \sum X_{3i}^2 \qquad (7.4.5)$$
Solving these normal equations simultaneously, and writing lowercase letters for deviations from the sample means (e.g., $y_i = Y_i - \bar{Y}$, $x_{2i} = X_{2i} - \bar{X}_2$, $x_{3i} = X_{3i} - \bar{X}_3$), we obtain

$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3 \qquad (7.4.6)$$

$$\hat{\beta}_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_i x_{3i}\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2} \qquad (7.4.7)$$

$$\hat{\beta}_3 = \frac{\left(\sum y_i x_{3i}\right)\left(\sum x_{2i}^2\right) - \left(\sum y_i x_{2i}\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2} \qquad (7.4.8)$$

which give the OLS estimators of the population partial regression coefficients β2 and β3, respectively.
In passing, note the following: (1) Equations (7.4.7) and (7.4.8) are symmetrical in nature because one can be obtained from the other by interchanging the roles of X2 and X3; (2) the denominators of these two equations are identical; and (3) the three-variable case is a natural extension of the two-variable case.
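As a numerical illustration of Eqs. (7.4.6)–(7.4.8), the following sketch computes the three coefficients from the deviation-form sums. It assumes only NumPy and uses the production–fertilizer–irrigation data of Ex-01 at the end of this handout; the variable names are chosen purely for illustration.

```python
# Sketch of Eqs. (7.4.6)-(7.4.8): partial regression coefficients from
# deviation-form sums, using the Ex-01 data given at the end of this handout.
import numpy as np

Y  = np.array([64, 72, 50, 96, 102, 130, 125, 136], dtype=float)   # production
X2 = np.array([ 9, 10,  8, 13,  15,  18,  19,  20], dtype=float)   # fertilizer
X3 = np.array([48, 50, 45, 56,  58,  63,  60,  65], dtype=float)   # irrigation

# Lowercase letters denote deviations from the sample means.
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()

den = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2                  # common denominator
b2  = ((y @ x2) * (x3 @ x3) - (y @ x3) * (x2 @ x3)) / den     # Eq. (7.4.7)
b3  = ((y @ x3) * (x2 @ x2) - (y @ x2) * (x2 @ x3)) / den     # Eq. (7.4.8)
b1  = Y.mean() - b2 * X2.mean() - b3 * X3.mean()              # Eq. (7.4.6)
print(b1, b2, b3)
```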
Variances and Standard Errors of OLS Estimators
Having obtained the OLS estimators of the partial regression coefficients, we can derive
the variances and standard errors of these estimators. As in the two-variable case, we need the
standard errors for two main purposes: to establish confidence intervals and to test statistical
hypotheses. The relevant formulas are as follows:

$$\operatorname{var}(\hat{\beta}_1) = \left[\frac{1}{n} + \frac{\bar{X}_2^2 \sum x_{3i}^2 + \bar{X}_3^2 \sum x_{2i}^2 - 2\,\bar{X}_2 \bar{X}_3 \sum x_{2i} x_{3i}}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}\right]\sigma^2 \qquad (7.4.9)$$

$$\operatorname{se}(\hat{\beta}_1) = \sqrt{\operatorname{var}(\hat{\beta}_1)} \qquad (7.4.10)$$
$$\operatorname{var}(\hat{\beta}_2) = \frac{\sum x_{3i}^2}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}\,\sigma^2 \qquad (7.4.11)$$

or, equivalently,

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{23}^2)} \qquad (7.4.12)$$

where $r_{23}$ is the sample coefficient of correlation between X2 and X3.

$$\operatorname{se}(\hat{\beta}_2) = \sqrt{\operatorname{var}(\hat{\beta}_2)} \qquad (7.4.13)$$

$$\operatorname{var}(\hat{\beta}_3) = \frac{\sum x_{2i}^2}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}\,\sigma^2 \qquad (7.4.14)$$

or, equivalently,

$$\operatorname{var}(\hat{\beta}_3) = \frac{\sigma^2}{\sum x_{3i}^2\,(1 - r_{23}^2)} \qquad (7.4.15)$$

$$\operatorname{se}(\hat{\beta}_3) = \sqrt{\operatorname{var}(\hat{\beta}_3)} \qquad (7.4.16)$$

$$\operatorname{cov}(\hat{\beta}_2, \hat{\beta}_3) = \frac{-r_{23}\,\sigma^2}{(1 - r_{23}^2)\sqrt{\sum x_{2i}^2}\,\sqrt{\sum x_{3i}^2}} \qquad (7.4.17)$$
An unbiased estimator of $\sigma^2$ is given by

$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 3} \qquad (7.4.18)$$
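Continuing the sketch above (it reuses Y, X2, X3, the deviation vectors y, x2, x3, and the estimates b1, b2, b3 defined there), the quantities in Eqs. (7.4.12), (7.4.15), (7.4.17), and (7.4.18) can be computed as follows; this is only one convenient way to organize the arithmetic.

```python
# Continuation: residuals, unbiased sigma^2, variances, standard errors,
# and the covariance of the slope estimators.
uhat   = Y - (b1 + b2 * X2 + b3 * X3)                  # residuals of the SRF
n      = len(Y)
sigma2 = (uhat @ uhat) / (n - 3)                       # Eq. (7.4.18)

r23      = (x2 @ x3) / np.sqrt((x2 @ x2) * (x3 @ x3))  # corr(X2, X3)
var_b2   = sigma2 / ((x2 @ x2) * (1 - r23**2))         # Eq. (7.4.12)
var_b3   = sigma2 / ((x3 @ x3) * (1 - r23**2))         # Eq. (7.4.15)
cov_b2b3 = -r23 * sigma2 / ((1 - r23**2)               # Eq. (7.4.17)
                            * np.sqrt(x2 @ x2) * np.sqrt(x3 @ x3))
se_b2, se_b3 = np.sqrt(var_b2), np.sqrt(var_b3)
print(sigma2, se_b2, se_b3, cov_b2b3)
```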
The Multiple Coefficient of Determination R²
Now, by definition,

$$R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}_2 \sum y_i x_{2i} + \hat{\beta}_3 \sum y_i x_{3i}}{\sum y_i^2} \qquad (7.4.19)$$

or, equivalently,

$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum \hat{u}_i^2}{\sum y_i^2} \qquad (7.4.20)$$

If we adjust each sum of squares for its degrees of freedom, we obtain

$$\bar{R}^2 = 1 - \frac{\sum \hat{u}_i^2/(n - k)}{\sum y_i^2/(n - 1)} \qquad (7.4.21)$$

where k = the number of parameters in the model including the intercept term. The R² thus defined is known as the adjusted R², denoted by $\bar{R}^2$. It can also be written as

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k} \qquad (7.4.22)$$
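Continuing the same sketch, R² and the adjusted R² of Eqs. (7.4.20) and (7.4.22) follow directly (k = 3 here because the model has an intercept and two slopes):

```python
# Continuation: coefficient of determination and its adjusted version.
k      = 3                                   # parameters incl. the intercept
TSS    = y @ y
RSS    = uhat @ uhat
R2     = 1 - RSS / TSS                       # Eq. (7.4.20)
R2_adj = 1 - (1 - R2) * (n - 1) / (n - k)    # Eq. (7.4.22)
print(R2, R2_adj)
```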
Multiple Correlation:
In problems of multiple correlation, we deal with situations that involve three or more variables. For example, we may consider the association between the yield of wheat per acre and both the amount of rainfall and the average daily temperature. We try to estimate the value of one of these variables on the basis of the values of all the others. The variable whose value we are trying to estimate is called the dependent variable, and the other variables on which our estimates are based are known as independent variables.
The coefficient of multiple correlation can be expressed in terms of $r_{12}$, $r_{13}$, and $r_{23}$ as follows:

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12} r_{13} r_{23}}{1 - r_{23}^2}} \qquad (7.4.23)$$
Coefficient of multiple determination: The coefficient of multiple determination is the square of the coefficient of multiple correlation. If, for three variables, the multiple correlation coefficient is $R_{1.23}$, then the coefficient of multiple determination is defined as

$$R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2\,r_{12} r_{13} r_{23}}{1 - r_{23}^2} \qquad (7.4.24)$$
Partial Correlation:
In partial correlation we recognize more than two variables, but consider only two of them to be influencing each other, the effect of the other influencing variables being kept constant. For example, if rice production ($X_1$) depends on rainfall ($X_2$) and irrigation ($X_3$), then the correlation between production and rainfall when irrigation is kept constant is called a partial correlation. It is denoted by $r_{12.3}$ and is defined as

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} \qquad (7.4.25)$$

Similarly, the partial correlation between $X_1$ and $X_3$, holding $X_2$ constant, is

$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}} \qquad (7.4.26)$$
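A small sketch of Eqs. (7.4.23)–(7.4.26), again with the Ex-01 data and NumPy only; here variable 1 is production (Y), variable 2 is fertilizer (X2), and variable 3 is irrigation (X3).

```python
# Multiple and partial correlation coefficients from the pairwise correlations.
import numpy as np

Y  = np.array([64, 72, 50, 96, 102, 130, 125, 136], dtype=float)
X2 = np.array([ 9, 10,  8, 13,  15,  18,  19,  20], dtype=float)
X3 = np.array([48, 50, 45, 56,  58,  63,  60,  65], dtype=float)

R = np.corrcoef(np.vstack([Y, X2, X3]))       # 3 x 3 correlation matrix
r12, r13, r23 = R[0, 1], R[0, 2], R[1, 2]

R1_23 = np.sqrt((r12**2 + r13**2 - 2*r12*r13*r23) / (1 - r23**2))  # Eq. (7.4.23)
r12_3 = (r12 - r13*r23) / np.sqrt((1 - r13**2) * (1 - r23**2))     # Eq. (7.4.25)
r13_2 = (r13 - r12*r23) / np.sqrt((1 - r12**2) * (1 - r23**2))     # Eq. (7.4.26)
print(R1_23, r12_3, r13_2)
```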
Hypothesis Testing: Individual Partial Regression Coefficients
To test the significance of an individual partial regression coefficient, say $\beta_2$, the hypotheses are

$$H_0: \beta_2 = 0, \qquad H_1: \beta_2 \neq 0$$

Test statistic:

$$t = \frac{\hat{\beta}_2 - \beta_2}{\operatorname{se}(\hat{\beta}_2)} \sim t_{(n-3)} \qquad (8.3)$$
Similarly, for $\beta_3$:

$$H_0: \beta_3 = 0, \qquad H_1: \beta_3 \neq 0$$

Test statistic:

$$t = \frac{\hat{\beta}_3 - \beta_3}{\operatorname{se}(\hat{\beta}_3)} \sim t_{(n-3)} \qquad (8.5)$$
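Continuing the earlier sketch (b2, b3, se_b2, se_b3, and n were computed above), the two t statistics and their two-sided p-values can be obtained as follows; SciPy is assumed only for the t distribution.

```python
# t tests of H0: beta2 = 0 and H0: beta3 = 0 (Eqs. 8.3 and 8.5).
from scipy import stats

t2 = b2 / se_b2                              # Eq. (8.3) with beta2 = 0 under H0
t3 = b3 / se_b3                              # Eq. (8.5) with beta3 = 0 under H0
p2 = 2 * stats.t.sf(abs(t2), df=n - 3)       # two-sided p-value
p3 = 2 * stats.t.sf(abs(t3), df=n - 3)
print(t2, p2, t3, p3)
```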
Testing the Overall Significance of the Regression
$$H_0: \beta_2 = \beta_3 = 0$$
$$H_1: \beta_2 \text{ and } \beta_3 \text{ are not both zero}$$
Test statistic:

$$F = \frac{\left[\hat{\beta}_2 \sum y_i x_{2i} + \hat{\beta}_3 \sum y_i x_{3i}\right]/2}{\sum \hat{u}_i^2/(n - 3)} = \frac{ESS/\text{df}}{RSS/\text{df}} \sim F_{(2,\,n-3)} \qquad (8.8)$$
ANOVA Table for the Three-Variable Regression

Source of variation (SV) | SS | df | MSS
Due to regression (ESS) | $\hat{\beta}_2 \sum y_i x_{2i} + \hat{\beta}_3 \sum y_i x_{3i}$ | 2 | $\left[\hat{\beta}_2 \sum y_i x_{2i} + \hat{\beta}_3 \sum y_i x_{3i}\right]/2$
Due to residuals (RSS) | $\sum \hat{u}_i^2$ | n − 3 | $\sum \hat{u}_i^2/(n - 3) = \hat{\sigma}^2$
Total (TSS) | $\sum y_i^2$ | n − 1 |
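Continuing the sketch once more, the ANOVA quantities and the overall F statistic of Eq. (8.8) are:

```python
# Overall F test of H0: beta2 = beta3 = 0 (Eq. 8.8); SciPy gives the p-value.
from scipy import stats

ESS = b2 * (y @ x2) + b3 * (y @ x3)          # explained sum of squares
F   = (ESS / 2) / (RSS / (n - 3))            # Eq. (8.8)
pF  = stats.f.sf(F, 2, n - 3)                # upper-tail p-value
print(F, pF)
```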
The Chow Test: Testing for Structural or Parameter Stability
To illustrate, consider the relationship between savings (Y) and disposable personal income, DPI (X), estimated over two subperiods and over the full period:

Time period 1970–1981: Yt = λ1 + λ2 Xt + u1t,  n1 = 12 .......... (8.7.1)
Time period 1982–1995: Yt = γ1 + γ2 Xt + u2t,  n2 = 14 .......... (8.7.2)
Time period 1970–1995: Yt = α1 + α2 Xt + ut,  n = (n1 + n2) = 26 .......... (8.7.3)
Regression (8.7.3) assumes that there is no difference between the two time periods and therefore estimates the relationship between savings and DPI for the entire time period, consisting of 26 observations. In other words, this regression assumes that the intercept as well as the slope
coefficient remains the same over the entire period; that is, there is no structural change. If this is
in fact the situation, then α1 = λ1 = γ1 and α2 = λ2 = γ2. Regressions (8.7.1) and (8.7.2) assume
that the regressions in the two time periods are different; that is, the intercept and the slope
coefficients are different, as indicated by the subscripted parameters. In the preceding regressions,
the u’s represent the error terms and the n’s represent the number of observations.
For the data given in Table 8.9, the empirical counterparts of the preceding three regressions are as follows:

Ŷt = 1.0161 + 0.0803 Xt
t = (0.0873) (9.6015) .......... (8.7.1a)
R² = 0.9021, RSS1 = 1785.032, df = 10

Ŷt = 153.4947 + 0.0148 Xt
t = (4.6922) (1.7707) .......... (8.7.2a)
R² = 0.2971, RSS2 = 10,005.22, df = 12

Ŷt = 62.4226 + 0.0376 Xt
t = (4.8917) (8.8937) .......... (8.7.3a)
R² = 0.7672, RSS3 = 23,248.30, df = 24
This is where the Chow test comes in handy. This test assumes that:
1. u1t ∼ N(0, σ2) and u2t ∼ N(0, σ2). That is, the error terms in the subperiod regressions are
normally distributed with the same (homoscedastic) variance σ2.
2. The two error terms u1t and u2t are independently distributed.
The mechanics of the Chow test are as follows:
H0: There is no structural change or break (the two subperiod regressions are statistically the same).
H1: There is a structural change or break (the two subperiod regressions are not statistically the same).
1. Estimate regression (8.7.3), which is appropriate if there is no parameter instability, and obtain
RSS3 with df = (n1 + n2 − k), where k is the number of parameters estimated, 2 in the present
case. For our example RSS3 = 23,248.30. We call RSS3 the restricted residual sum of squares
(RSSR) because it is obtained by imposing the restrictions that λ1 = γ1 and λ2 = γ2, that is, the
subperiod regressions are not different.
2. Estimate Eq. (8.7.1) and obtain its residual sum of squares, RSS1, with df = (n1 − k). In our
example, RSS1 = 1785.032 and df = 10.
3. Estimate Eq. (8.7.2) and obtain its residual sum of squares, RSS2, with df = (n2 − k). In our
example, RSS2 = 10,005.22 with df = 12.
4. Since the two sets of samples are deemed independent, we can add RSS1 and RSS2 to obtain
what may be called the unrestricted residual sum of squares (RSSUR), that is, RSSUR = RSS1 +
RSS2 with df = (n1 + n2 − 2k).
In the present case,
RSS(UR) = (1785.032 + 10,005.22) = 11,790.252
5. Now the idea behind the Chow test is that if in fact there is no structural change (i.e., regressions [8.7.1] and [8.7.2] are essentially the same), then RSS_R and RSS_UR should not be statistically different. Therefore, we form the following ratio:

$$F = \frac{(RSS_R - RSS_{UR})/k}{RSS_{UR}/(n_1 + n_2 - 2k)} \sim F_{[k,\,(n_1 + n_2 - 2k)]}$$

6. If the computed F value exceeds the critical F value at the chosen level of significance, we reject the null hypothesis of no structural change; otherwise, we do not reject it.
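Using only the residual sums of squares quoted above (RSS1, RSS2, and the restricted RSS3), the Chow F ratio can be reproduced as in the following sketch; SciPy is assumed for the F distribution.

```python
# Chow test for the savings-DPI example, from the RSS values reported above.
from scipy import stats

RSS_R, RSS1, RSS2 = 23248.30, 1785.032, 10005.22
n1, n2, k = 12, 14, 2

RSS_UR = RSS1 + RSS2                                       # unrestricted RSS
F = ((RSS_R - RSS_UR) / k) / (RSS_UR / (n1 + n2 - 2 * k))  # Chow F ratio
p = stats.f.sf(F, k, n1 + n2 - 2 * k)
print(F, p)   # F is about 10.69, so H0 of no structural break is rejected
```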
The Troika of Hypothesis Tests: The Likelihood Ratio (LR), Wald (W), and Lagrange Multiplier (LM) Tests:
We have, by and large, used the t, F, and chi-square tests to test a variety of hypotheses in the context of linear (in-parameter) regression models. But once we go beyond the somewhat comfortable world of linear regression models, we need methods to test hypotheses that can handle regression models, linear or not. The well-known trinity of likelihood ratio, Wald, and Lagrange multiplier tests can accomplish this purpose. The interesting thing to note is that asymptotically (i.e., in large samples) all three tests are equivalent, in that the test statistic associated with each of them follows the chi-square distribution.
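As one small illustration of this trio (not taken from the text), a likelihood ratio test of H0: β3 = 0 in the three-variable model can be computed from the maximized log-likelihoods of the restricted and unrestricted OLS fits; statsmodels and SciPy are assumed, and the data are again those of Ex-01.

```python
# Likelihood ratio test of H0: beta3 = 0; asymptotically chi-square with
# 1 degree of freedom (one restriction).
import numpy as np
import statsmodels.api as sm
from scipy import stats

Y  = np.array([64, 72, 50, 96, 102, 130, 125, 136], dtype=float)
X2 = np.array([ 9, 10,  8, 13,  15,  18,  19,  20], dtype=float)
X3 = np.array([48, 50, 45, 56,  58,  63,  60,  65], dtype=float)

unrestricted = sm.OLS(Y, sm.add_constant(np.column_stack([X2, X3]))).fit()
restricted   = sm.OLS(Y, sm.add_constant(X2)).fit()

LR = 2 * (unrestricted.llf - restricted.llf)   # likelihood ratio statistic
p  = stats.chi2.sf(LR, df=1)
print(LR, p)
```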
Testing the Functional Form of Regression: Choosing between Linear and Log–Linear
Regression Models
The choice between a linear regression model (the regressand is a linear function of the regressors)
or a log–linear regression model (the log of the regressand is a function of the logs of the
regressors) is a perennial question in empirical analysis. We can use a test proposed by
MacKinnon, White, and Davidson, which for brevity we call the MWD test, to choose between
the two models.
To illustrate this test, assume the following:
H0: Linear Model: Y is a linear function of regressors, the X’s.
H1: Log–Linear Model: ln Y is a linear function of logs of regressors, the logs of X’s.
The MWD test involves the following steps:
Step I: Estimate the linear model and obtain the estimated Y values; call them Yf (i.e., Ŷ).
Step II: Estimate the log–linear model and obtain the estimated ln Y values; call them ln f.
Step III: Obtain Z1 = (ln Yf − ln f), the difference between the log of the fitted Y from the linear model and the fitted ln Y from the log–linear model.
Step IV: Regress Y on the X's and Z1 obtained in Step III. Reject H0 if the coefficient of Z1 is statistically significant by the usual t test.
Step V: Obtain Z2 = (antilog of ln f − Yf).
Step VI: Regress ln Y on the logs of the X's and Z2. Reject H1 if the coefficient of Z2 is statistically significant by the usual t test.
Although the MWD test seems involved, the logic of the test is quite simple. If the linear model is
in fact the correct model, the constructed variable Z1 should not be statistically significant in Step
IV, for in that case the estimated Y values from the linear model and those estimated from the log–
linear model (after taking their antilog values for comparative purposes) should not be different.
The same comment applies to the alternative hypothesis H1.
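The steps translate almost line by line into code. The sketch below uses statsmodels with the Ex-01 data purely for illustration; the regressor set and variable names are assumptions, not part of the original example.

```python
# A sketch of the MWD steps (Steps I-VI) using statsmodels.
import numpy as np
import statsmodels.api as sm

Y  = np.array([64, 72, 50, 96, 102, 130, 125, 136], dtype=float)
X2 = np.array([ 9, 10,  8, 13,  15,  18,  19,  20], dtype=float)
X3 = np.array([48, 50, 45, 56,  58,  63,  60,  65], dtype=float)

X_lin = sm.add_constant(np.column_stack([X2, X3]))                 # linear model
X_log = sm.add_constant(np.column_stack([np.log(X2), np.log(X3)])) # log-linear model

Yf  = sm.OLS(Y, X_lin).fit().fittedvalues           # Step I: fitted Y
lnf = sm.OLS(np.log(Y), X_log).fit().fittedvalues   # Step II: fitted ln Y

Z1     = np.log(Yf) - lnf                                        # Step III
stepIV = sm.OLS(Y, np.column_stack([X_lin, Z1])).fit()           # Step IV

Z2     = np.exp(lnf) - Yf                                        # Step V
stepVI = sm.OLS(np.log(Y), np.column_stack([X_log, Z2])).fit()   # Step VI

print(stepIV.tvalues[-1], stepVI.tvalues[-1])   # t statistics on Z1 and Z2
```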
Ex-01: On the basis of observations made on agricultural production, the use of fertilizers, and the use of irrigation, the following data were obtained.

Production | Use of fertilizer | Use of irrigation
64 | 9 | 48
72 | 10 | 50
50 | 8 | 45
96 | 13 | 56
102 | 15 | 58
130 | 18 | 63
125 | 19 | 60
136 | 20 | 65
(j) Set up the ANOVA table and test the overall significance of the multiple regression.
(k) Test the functional form of the regression, i.e., choose between the linear and log–linear regression models.
(l) How would you test the hypothesis that the error term in the population regression is normally distributed? Show the necessary calculations.
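A possible starting point for parts (j) and (l), assuming statsmodels and SciPy are available: fit the three-variable model, read off the overall F test, and apply the Jarque–Bera test to the residuals as one common way to check normality.

```python
# Sketch for parts (j) and (l): overall significance and residual normality.
import numpy as np
import statsmodels.api as sm
from scipy import stats

production = np.array([64, 72, 50, 96, 102, 130, 125, 136], dtype=float)
fertilizer = np.array([ 9, 10,  8, 13,  15,  18,  19,  20], dtype=float)
irrigation = np.array([48, 50, 45, 56,  58,  63,  60,  65], dtype=float)

X   = sm.add_constant(np.column_stack([fertilizer, irrigation]))
fit = sm.OLS(production, X).fit()

print(fit.summary())                      # coefficient table and ANOVA inputs
print(fit.fvalue, fit.f_pvalue)           # (j) overall F test, cf. Eq. (8.8)

jb_stat, jb_p = stats.jarque_bera(fit.resid)   # (l) Jarque-Bera normality test
print(jb_stat, jb_p)
```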