4 Regression Models

4.2 Simple Linear Regression Model

Consider the simple linear regression model
yi = β0 + β1 xi + εi , i = 1, . . . , n,
where the error terms εi are assumed to be independent N(0, σ²) random variables. We wish to test the null hypothesis of no change,

\[ H_0 : \mu_{y_i} = \beta_0 + \beta_1 x_i, \quad i = 1, \ldots, n, \]

against the single change point alternative

\[
H_1 : \mu_{y_i} =
\begin{cases}
\beta_0^1 + \beta_1^1 x_i, & i = 1, \ldots, k, \\
\beta_0^* + \beta_1^* x_i, & i = k+1, \ldots, n,
\end{cases}
\]

where k, k = 2, . . . , n − 2, is the location of the change point, and β0, β1, β0^1, β1^1, β0^*, and β1^* are unknown regression parameters. In the following sections we study several methods for locating the change point k.
The likelihood function under H0 is

\[
L_0(\beta_0, \beta_1, \sigma^2)
= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y_i - \beta_0 - \beta_1 x_i)^2 / 2\sigma^2}
= \frac{1}{(\sqrt{2\pi\sigma^2})^{n}} \exp\left\{ -\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 / 2\sigma^2 \right\},
\]
and the MLEs of β0, β1, and σ² are

\[
b_1 \equiv \hat\beta_1 = \frac{S_{xy}}{S_x}, \qquad
b_0 \equiv \hat\beta_0 = \bar{y} - b_1 \bar{x}, \qquad
\hat\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2,
\]

where

\[
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i, \qquad
S_x = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).
\]
Clearly, the MLEs obtained above coincide with the least squares estimates of β0, β1, and σ².
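As a quick sanity check of this equivalence, the following sketch (Python with NumPy; the simulated data and variable names are illustrative, not from the book) computes b0, b1, and σ̂² from the closed-form expressions above and compares the slope and intercept with a library least squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 1.5 + 0.8 * x + rng.normal(0, 1.0, size=50)   # simulated data, no change point

# Closed-form MLEs (identical to the least squares estimates)
x_bar, y_bar = x.mean(), y.mean()
Sx = np.sum((x - x_bar) ** 2)
Sxy = np.sum((x - x_bar) * (y - y_bar))
b1 = Sxy / Sx
b0 = y_bar - b1 * x_bar
sigma2_hat = np.mean((y - b0 - b1 * x) ** 2)       # MLE uses divisor n, not n - 2

# Compare with a library least squares fit
b1_ls, b0_ls = np.polyfit(x, y, deg=1)             # returns slope, then intercept
assert np.allclose([b0, b1], [b0_ls, b1_ls])
print(b0, b1, sigma2_hat)
```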
Therefore, the maximum likelihood under H0 is

\[
L_0(b_0, b_1, \hat\sigma^2) = \frac{1}{(\sqrt{2\pi\hat\sigma^2})^{n}}\, e^{-n/2}.
\]

Under H1, the likelihood function is

\[
L_1(\beta_0^1, \beta_1^1, \beta_0^*, \beta_1^*, \sigma^2)
= \prod_{i=1}^{k} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y_i - \beta_0^1 - \beta_1^1 x_i)^2 / 2\sigma^2}
  \prod_{i=k+1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y_i - \beta_0^* - \beta_1^* x_i)^2 / 2\sigma^2}
\]
\[
= \frac{1}{(\sqrt{2\pi\sigma^2})^{n}}
  \exp\left\{ -\sum_{i=1}^{k} (y_i - \beta_0^1 - \beta_1^1 x_i)^2 / 2\sigma^2 \right\}
  \cdot \exp\left\{ -\sum_{i=k+1}^{n} (y_i - \beta_0^* - \beta_1^* x_i)^2 / 2\sigma^2 \right\}.
\]
Similar calculations give the MLEs of β0^1, β1^1, β0^*, β1^*, and σ² as, respectively,

\[
b_1^1 \equiv \hat\beta_1^1 = \frac{{}_k S_{xy}}{{}_k S_x}, \qquad
b_0^1 \equiv \hat\beta_0^1 = \bar{y}_k - b_1^1 \bar{x}_k, \qquad
b_1^* \equiv \hat\beta_1^* = \frac{{}_{n-k} S_{xy}}{{}_{n-k} S_x}, \qquad
b_0^* \equiv \hat\beta_0^* = \bar{y}_{n-k} - b_1^* \bar{x}_{n-k},
\]
\[
\hat\sigma_1^2 = \frac{1}{n} \left[ \sum_{i=1}^{k} (y_i - b_0^1 - b_1^1 x_i)^2 + \sum_{i=k+1}^{n} (y_i - b_0^* - b_1^* x_i)^2 \right],
\]

where

\[
\bar{x}_k = \frac{1}{k} \sum_{i=1}^{k} x_i, \qquad
\bar{y}_k = \frac{1}{k} \sum_{i=1}^{k} y_i, \qquad
\bar{x}_{n-k} = \frac{1}{n-k} \sum_{i=k+1}^{n} x_i, \qquad
\bar{y}_{n-k} = \frac{1}{n-k} \sum_{i=k+1}^{n} y_i,
\]
\[
{}_k S_x = \sum_{i=1}^{k} (x_i - \bar{x}_k)^2, \qquad
{}_k S_{xy} = \sum_{i=1}^{k} (x_i - \bar{x}_k)(y_i - \bar{y}_k),
\]
\[
{}_{n-k} S_x = \sum_{i=k+1}^{n} (x_i - \bar{x}_{n-k})^2, \qquad
{}_{n-k} S_{xy} = \sum_{i=k+1}^{n} (x_i - \bar{x}_{n-k})(y_i - \bar{y}_{n-k}).
\]
Under H0 the model has three free parameters (β0, β1, σ²), and under H1 it has five; hence the Schwarz information criterion takes the values

\[
\mathrm{SIC}(n) = -2 \log L_0(b_0, b_1, \hat\sigma^2) + 3 \log n
= n \log \left[ \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \right] + n \log 2\pi + n + 3 \log n - n \log n
\]

under H0, and

\[
\mathrm{SIC}(k) = -2 \log L_1(b_0^1, b_1^1, b_0^*, b_1^*, \hat\sigma_1^2) + 5 \log n
= n \log \left[ \sum_{i=1}^{k} (y_i - b_0^1 - b_1^1 x_i)^2 + \sum_{i=k+1}^{n} (y_i - b_0^* - b_1^* x_i)^2 \right]
+ n \log 2\pi + n + 5 \log n - n \log n
\]

under H1, for k = 2, . . . , n − 2. H0 is rejected when min_{2 ≤ k ≤ n−2} SIC(k) < SIC(n), and the change point location is estimated by the k̂ that attains this minimum.
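To illustrate the procedure, here is a minimal sketch (Python with NumPy; the function names and the simulated data are my own illustrations, not from the book) that evaluates SIC(n) and SIC(k) for k = 2, . . . , n − 2 from segment-wise residual sums of squares and reports the minimizing k.

```python
import numpy as np

def rss(x, y):
    """Residual sum of squares of a least squares line fitted to (x, y)."""
    x_bar, y_bar = x.mean(), y.mean()
    Sx = np.sum((x - x_bar) ** 2)
    b1 = np.sum((x - x_bar) * (y - y_bar)) / Sx
    b0 = y_bar - b1 * x_bar
    return np.sum((y - b0 - b1 * x) ** 2)

def sic_change_point(x, y):
    """SIC(n), the array of SIC(k) for k = 2, ..., n-2, and the minimizing k."""
    n = len(y)
    const = n * (np.log(2 * np.pi) + 1) - n * np.log(n)   # common additive term
    sic_n = n * np.log(rss(x, y)) + const + 3 * np.log(n)
    ks = np.arange(2, n - 1)                               # k = 2, ..., n-2
    sic_k = np.array([n * np.log(rss(x[:k], y[:k]) + rss(x[k:], y[k:]))
                      + const + 5 * np.log(n) for k in ks])
    return sic_n, sic_k, ks[np.argmin(sic_k)]

# Simulated example: the regression line changes after the 30th observation
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 60)
y = np.where(np.arange(60) < 30, 1.0 + 0.5 * x, 4.0 - 0.3 * x) + rng.normal(0, 0.3, 60)
sic_n, sic_k, k_hat = sic_change_point(x, y)
print("SIC(n) =", round(sic_n, 2), "min SIC(k) =", round(sic_k.min(), 2), "k_hat =", k_hat)
```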
The following general vague prior probability densities π0(·) are assigned to the parameters: the change point position k receives a discrete uniform prior over k = 2, . . . , n − 2, the regression parameters β0, β1, β0^*, β1^* receive flat (improper uniform) priors, and σ² receives the noninformative prior π0(σ²) ∝ 1/σ².
Now, integrating π1(k, β0, β1, β0*, β1*, σ²) with respect to β0, β1, β0*, β1*, and σ², we obtain the posterior density of the change point location k as

\[
\pi_1(k) = f(k \mid y_1, \ldots, y_n)
= \int_{0}^{\infty}\!\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}
  \pi_1(k, \beta_0, \beta_1, \beta_0^*, \beta_1^*, \sigma^2)\, d\beta_0\, d\beta_1\, d\beta_0^*\, d\beta_1^*\, d\sigma^2
\]
\[
\propto \int_{0}^{\infty}\!\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}
  \left( \frac{1}{\sigma^2} \right)^{n/2+1}
  \exp\left\{ -\sum_{i=1}^{k} (y_i - \beta_0 - \beta_1 x_i)^2 / 2\sigma^2 \right\}
\]
\[
\cdot \exp\left\{ -\sum_{i=k+1}^{n} (y_i - \beta_0^* - \beta_1^* x_i)^2 / 2\sigma^2 \right\}
  d\beta_0\, d\beta_1\, d\beta_0^*\, d\beta_1^*\, d\sigma^2
\equiv I.
\]
Each of the β integrals is a Gaussian integral obtained by completing the square; integrating with respect to β1*, for example, gives

\[
\int_{-\infty}^{\infty} \frac{1}{\sqrt{\sigma^2}}
\exp\left\{ -\left[ {}_{n-k}S_x \left( \beta_1^* - \hat\beta_1^* \right)^2 - {}_{n-k}S_x\, \hat\beta_1^{*2} + {}_{n-k}S_y \right] / 2\sigma^2 \right\} d\beta_1^*
= \sqrt{\frac{2\pi}{{}_{n-k}S_x}}
\exp\left\{ -\left[ {}_{n-k}S_y - {}_{n-k}S_x\, \hat\beta_1^{*2} \right] / 2\sigma^2 \right\},
\]

where

\[
{}_k S_y = \sum_{i=1}^{k} (y_i - \bar{y}_k)^2, \qquad
{}_{n-k} S_y = \sum_{i=k+1}^{n} (y_i - \bar{y}_{n-k})^2.
\]
Then, I reduces to

\[
I = \frac{(2\pi)^2}{\sqrt{k(n-k)\, {}_k S_x\, {}_{n-k}S_x}}
\int_{0}^{\infty} \left( \frac{1}{\sigma^2} \right)^{n/2-1}
\exp\left\{ -\left[ {}_k S_y - \hat\beta_1^2\, {}_k S_x + {}_{n-k}S_y - {}_{n-k}S_x\, \hat\beta_1^{*2} \right] / 2\sigma^2 \right\} d\sigma^2,
\]

where β̂0, β̂1 (respectively β̂0*, β̂1*) denote the least squares estimates based on the first k (respectively last n − k) observations. Let

\[
D \equiv {}_k S_y - \hat\beta_1^2\, {}_k S_x + {}_{n-k}S_y - {}_{n-k}S_x\, \hat\beta_1^{*2}
= \sum_{i=1}^{k} \left( y_i - \hat{y}_{i(1,k)} \right)^2 + \sum_{i=k+1}^{n} \left( y_i - \hat{y}_{i(k+1,n)} \right)^2,
\]

where

\[
\hat{y}_{i(1,k)} = \hat\beta_0 + \hat\beta_1 x_i, \quad i = 1, \ldots, k,
\qquad \text{and} \qquad
\hat{y}_{i(k+1,n)} = \hat\beta_0^* + \hat\beta_1^* x_i, \quad i = k+1, \ldots, n.
\]

Therefore,

\[
I = \frac{(2\pi)^2}{\sqrt{k(n-k)\, {}_k S_x\, {}_{n-k}S_x}}
\int_{0}^{\infty} \left( \frac{1}{\sigma^2} \right)^{n/2-1} \exp\{ -D/2\sigma^2 \}\, d\sigma^2.
\]
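The identity D = Σ(y_i − ŷ_{i(1,k)})² + Σ(y_i − ŷ_{i(k+1,n)})² can be checked numerically; the short sketch below (NumPy, with simulated data chosen purely for illustration) computes both sides for one candidate value of k.

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0, 10, 40)
y = 2.0 + 0.7 * x + rng.normal(0, 0.5, 40)
k = 15                                                # an arbitrary candidate change point

def seg_stats(xs, ys):
    """Return Sx, Sy, and the slope estimate for one segment."""
    Sx = np.sum((xs - xs.mean()) ** 2)
    Sy = np.sum((ys - ys.mean()) ** 2)
    b1 = np.sum((xs - xs.mean()) * (ys - ys.mean())) / Sx
    return Sx, Sy, b1

kSx, kSy, b1_first = seg_stats(x[:k], y[:k])
nkSx, nkSy, b1_second = seg_stats(x[k:], y[k:])
D = kSy - b1_first ** 2 * kSx + nkSy - b1_second ** 2 * nkSx   # left-hand side

rss = 0.0                                              # right-hand side: segment-wise RSS
for xs, ys in [(x[:k], y[:k]), (x[k:], y[k:])]:
    b1 = np.sum((xs - xs.mean()) * (ys - ys.mean())) / np.sum((xs - xs.mean()) ** 2)
    b0 = ys.mean() - b1 * xs.mean()
    rss += np.sum((ys - b0 - b1 * xs) ** 2)

print(D, rss)   # the two values agree up to floating point error
```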
Note that, to be able to build two regression models, it is required that both n > 2 and n − 2 > 2; hence, we have n ≥ 5.
For n = 2m, with m = 3, 4, . . .,

\[
\int_{0}^{\infty} \left( \frac{1}{\sigma^2} \right)^{n/2-1} \exp\{ -D/2\sigma^2 \}\, d\sigma^2
= \frac{(m-3)!}{(D/2)^{m-2}}
= \frac{(n/2-3)!}{(D/2)^{(n-4)/2}}
\propto D^{-(n-4)/2},
\]

and for n = 2m − 1, with m = 3, 4, . . .,

\[
\int_{0}^{\infty} \left( \frac{1}{\sigma^2} \right)^{n/2-1} \exp\{ -D/2\sigma^2 \}\, d\sigma^2
= \frac{\sqrt{2\pi}\,(2m-7)!!}{D^{m-5/2}}
\propto D^{-(n-4)/2}.
\]
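These closed forms are easy to check numerically; the sketch below (assuming SciPy is available; the values of n and D are arbitrary illustrations) compares the σ² integral with the factorial expression for an even sample size.

```python
import numpy as np
from math import factorial
from scipy.integrate import quad

# Numerical check of the closed-form sigma^2 integral for an even sample size n = 2m.
n, D = 10, 3.7                                    # illustrative values; here m = 5
m = n // 2
# Integrand in the variable s = sigma^2, written in log form for numerical stability.
integrand = lambda s: np.exp(-(n / 2 - 1) * np.log(s) - D / (2 * s))
numeric, _ = quad(integrand, 0, np.inf)
closed_form = factorial(m - 3) / (D / 2) ** (m - 2)
print(numeric, closed_form)                       # agree up to quadrature error
```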
Finally, writing D(k) for the value of D at the candidate change point k, we obtain

\[
\pi_1(k) \propto \left[ k(n-k)\, {}_k S_x\, {}_{n-k}S_x \right]^{-1/2} D(k)^{-(n-4)/2}, \qquad k = 2, \ldots, n-2;
\]

that is, after normalizing over the admissible values of k,

\[
\pi_1(k) = \frac{\left[ k(n-k)\, {}_k S_x\, {}_{n-k}S_x \right]^{-1/2} D(k)^{-(n-4)/2}}
{\displaystyle \sum_{j=2}^{n-2} \left[ j(n-j)\, {}_j S_x\, {}_{n-j}S_x \right]^{-1/2} D(j)^{-(n-4)/2}}.
\]
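A minimal sketch of this posterior computation (Python with NumPy; the function name and simulated data are illustrative, not from the book) evaluates D(k) and the prefactor for each admissible k and then normalizes:

```python
import numpy as np

def change_point_posterior(x, y):
    """Posterior pi_1(k) over k = 2, ..., n-2 for the two-phase simple regression."""
    n = len(y)
    ks = np.arange(2, n - 1)              # admissible change points
    log_post = np.empty(len(ks))
    for j, k in enumerate(ks):
        D, log_pref = 0.0, 0.0
        for xs, ys in [(x[:k], y[:k]), (x[k:], y[k:])]:
            Sx = np.sum((xs - xs.mean()) ** 2)
            b1 = np.sum((xs - xs.mean()) * (ys - ys.mean())) / Sx
            b0 = ys.mean() - b1 * xs.mean()
            D += np.sum((ys - b0 - b1 * xs) ** 2)                # segment RSS
            log_pref += -0.5 * (np.log(len(xs)) + np.log(Sx))    # [k(n-k) kSx n-kSx]^(-1/2)
        log_post[j] = log_pref - (n - 4) / 2 * np.log(D)
    post = np.exp(log_post - log_post.max())    # stabilize before normalizing
    return ks, post / post.sum()

# Example with a slope change after the 30th observation
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 60)
y = np.where(np.arange(60) < 30, 1.0 + 0.5 * x, 4.0 - 0.3 * x) + rng.normal(0, 0.3, 60)
ks, post = change_point_posterior(x, y)
print("posterior mode at k =", ks[np.argmax(post)])
```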
Holbert (1982) studied switching simple linear regression models and the switching linear model from a Bayesian point of view. He assigned vague prior densities to the unknown position of the change point and to the unknown parameters of the model, and obtained the posterior density of the change point. To illustrate the estimation of the change point in two-phase regression, he analyzed a dataset of stock market sales volumes by calculating the posterior density of the change point. He found that the maximum posterior density occurred at position 24, which corresponded to the calendar month of December 1968, and concluded that the change was caused by the abolition of give-ups (commission splitting) in December 1968.
We now take the same data that Holbert used and apply the SIC method to locate the switching change point in linear regression. The monthly dollar volume of sales (in millions) on the Boston Stock Exchange (BSE) is taken as the response variable, and the combined New York–American Stock Exchange (NYAMSE) dollar volume is taken as the regressor. The computed SIC values are listed in Table 4.1 along with the original BSE and NYAMSE values given in Holbert (1982). The starred SIC value in this table is the minimum SIC value, which corresponds to time point 23; hence the change in the regression model starts at time point 24, which is December 1968. This conclusion coincides with the one drawn by Holbert using his method. As the reader may notice, the minimum SIC principle leads us firmly to this conclusion about the change point. Although Holbert (1982) found the same change
4.3 Multiple Linear Regression Model

Consider now the linear regression model with p regressor variables. The hypothesis of no change in the regression parameters can be written as

\[
H_0 : \mu_y = X\beta,
\]

where

\[
\mu_y = (\mu_{y_1}, \mu_{y_2}, \ldots, \mu_{y_n})',
\]

X is the n × (p + 1) design matrix whose ith row is (1, x_{1i}, . . . , x_{pi}), and β = (β0, β1, . . . , βp)' is the vector of unknown regression coefficients.
Obviously, the likelihood function under H0 in matrix notation is

\[
L_0(\beta, \sigma^2) = f(y_1, y_2, \ldots, y_n; \beta, \sigma^2)
= (2\pi)^{-n/2} (\sigma^2)^{-n/2} \exp\{ -(y - X\beta)'(y - X\beta) / 2\sigma^2 \},
\]
and the MLEs of β and σ² are

\[
b \equiv \hat\beta = (X'X)^{-1} X'y, \qquad
\hat\sigma^2 = \frac{1}{n} (y - Xb)'(y - Xb).
\]
Then, the maximum likelihood under H0 is

\[
L_0(\hat\beta, \hat\sigma^2) \equiv L_0(b, \hat\sigma^2)
= (2\pi)^{-n/2} \left[ \frac{1}{n} (y - Xb)'(y - Xb) \right]^{-n/2} e^{-n/2}.
\]
Hence,

\[
\mathrm{SIC}(n) = -2 \log L_0(b, \hat\sigma^2) + (p + 2) \log n
= n \log\left[ (y - Xb)'(y - Xb) \right] + n(\log 2\pi + 1) + (p + 2 - n) \log n.
\]
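In code, SIC(n) needs only the residual sum of squares of the full-data least squares fit; the following sketch (Python with NumPy, using an illustrative design matrix) computes it directly from the formula above.

```python
import numpy as np

def sic_null(X, y):
    """SIC(n) for the no-change model y = X beta + eps; X includes the intercept column."""
    n, p_plus_1 = X.shape
    p = p_plus_1 - 1
    b, *_ = np.linalg.lstsq(X, y, rcond=None)       # least squares estimate = MLE of beta
    rss = np.sum((y - X @ b) ** 2)                   # (y - Xb)'(y - Xb)
    return n * np.log(rss) + n * (np.log(2 * np.pi) + 1) + (p + 2 - n) * np.log(n)

# Illustrative use with p = 2 regressors
rng = np.random.default_rng(3)
Z = rng.normal(size=(40, 2))
X = np.column_stack([np.ones(40), Z])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0, 0.4, 40)
print(sic_null(X, y))
```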
Let

\[
y_1 = (y_1, y_2, \ldots, y_k)', \qquad y_2 = (y_{k+1}, y_{k+2}, \ldots, y_n)',
\]

\[
X_1 = \begin{pmatrix}
1 & x_{11} & \cdots & x_{p1} \\
1 & x_{12} & \cdots & x_{p2} \\
\vdots & \vdots & & \vdots \\
1 & x_{1k} & \cdots & x_{pk}
\end{pmatrix}
\equiv \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix},
\qquad
X_2 = \begin{pmatrix}
1 & x_{1(k+1)} & \cdots & x_{p(k+1)} \\
1 & x_{1(k+2)} & \cdots & x_{p(k+2)} \\
\vdots & \vdots & & \vdots \\
1 & x_{1n} & \cdots & x_{pn}
\end{pmatrix}
\equiv \begin{pmatrix} x_{k+1} \\ x_{k+2} \\ \vdots \\ x_n \end{pmatrix},
\]

\[
\beta_1 = (\beta_0^1, \beta_1^1, \ldots, \beta_p^1)', \qquad
\beta_2 = (\beta_0^*, \beta_1^*, \ldots, \beta_p^*)', \qquad
\beta = (\beta_0, \beta_1, \ldots, \beta_p)',
\]

where x_i denotes the ith row (1, x_{1i}, . . . , x_{pi}) of the design matrix. The alternative hypothesis of a change after the kth observation can then be written as

\[
H_1 : \mu_{y_1} = X_1 \beta_1 \quad \text{and} \quad \mu_{y_2} = X_2 \beta_2,
\]

where

\[
\mu_{y_1} = (\mu_{y_1}, \mu_{y_2}, \ldots, \mu_{y_k})', \qquad
\mu_{y_2} = (\mu_{y_{k+1}}, \mu_{y_{k+2}}, \ldots, \mu_{y_n})'.
\]
In this case, the likelihood function is found to be

\[
L_1(\beta_1, \beta_2, \sigma^2) = f(y_1, y_2, \ldots, y_n; \beta_1, \beta_2, \sigma^2)
= (2\pi)^{-n/2} (\sigma^2)^{-n/2} \exp\{ -(y_1 - X_1\beta_1)'(y_1 - X_1\beta_1) / 2\sigma^2 \}
\cdot \exp\{ -(y_2 - X_2\beta_2)'(y_2 - X_2\beta_2) / 2\sigma^2 \},
\]
and the corresponding maximum likelihood is

\[
L_1(\hat\beta_1, \hat\beta_2, \hat\sigma_1^2) \equiv L_1(b_1, b_2, \hat\sigma_1^2)
= (2\pi)^{-n/2} \left[ \frac{1}{n} \left( (y_1 - X_1 b_1)'(y_1 - X_1 b_1) + (y_2 - X_2 b_2)'(y_2 - X_2 b_2) \right) \right]^{-n/2} e^{-n/2},
\]

where b_j = (X_j' X_j)^{-1} X_j' y_j, j = 1, 2, are the segment-wise least squares estimates.
Hence,

\[
\mathrm{SIC}(k) = -2 \log L_1(b_1, b_2, \hat\sigma_1^2) + (2p + 3) \log n
= n \log\left[ (y_1 - X_1 b_1)'(y_1 - X_1 b_1) + (y_2 - X_2 b_2)'(y_2 - X_2 b_2) \right]
+ n(\log 2\pi + 1) + (2p + 3 - n) \log n,
\]

for k = p + 1, . . . , n − p. The change point location is then estimated by the k̂ for which

\[
\mathrm{SIC}(\hat{k}) = \min_{p+1 \le k \le n-p} \mathrm{SIC}(k).
\]
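The change point scan can then be written compactly; below is a self-contained sketch (Python with NumPy; function names and simulated data are illustrative) that evaluates SIC(k) for k = p + 1, . . . , n − p and returns the minimizing k̂.

```python
import numpy as np

def seg_rss(X, y):
    """Residual sum of squares of a least squares fit on one segment."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ b) ** 2)

def sic_scan(X, y):
    """SIC(k) for k = p+1, ..., n-p and the minimizing change point estimate k_hat."""
    n, p_plus_1 = X.shape
    p = p_plus_1 - 1
    const = n * (np.log(2 * np.pi) + 1) + (2 * p + 3 - n) * np.log(n)
    ks = np.arange(p + 1, n - p + 1)        # k = p+1, ..., n-p
    sic = np.array([n * np.log(seg_rss(X[:k], y[:k]) + seg_rss(X[k:], y[k:])) + const
                    for k in ks])
    return ks, sic, ks[np.argmin(sic)]

# Example: the coefficients change after the 25th observation (p = 2 regressors)
rng = np.random.default_rng(4)
Z = rng.normal(size=(50, 2))
X = np.column_stack([np.ones(50), Z])
beta1, beta2 = np.array([1.0, 2.0, -0.5]), np.array([0.0, -1.0, 1.5])
y = np.where(np.arange(50) < 25, X @ beta1, X @ beta2) + rng.normal(0, 0.3, 50)
ks, sic, k_hat = sic_scan(X, y)
print("estimated change point k_hat =", k_hat)
```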
For a Bayesian analysis of this change point problem, let

\[
y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \qquad
R = \frac{1}{\sigma^2},
\]

where y_1, y_2, β_1, and β_2 are defined in the previous section. We first assign a discrete uniform prior to the change point position k:

\[
\pi_0(k) =
\begin{cases}
\dfrac{1}{n - 2p}, & k = p + 1, \ldots, n - p, \\[4pt]
0, & \text{otherwise}.
\end{cases}
\]
We also assume that the 2(p + 1)-dimensional parameter vector β and the parameter R are jointly independent of the change point position k. Finally, we assume that R has a gamma prior distribution with parameters a and b, and that the conditional prior of β given R = r is a 2(p + 1)-dimensional normal distribution with mean vector β* and covariance matrix (1/r)τ⁻¹, where τ is a 2(p + 1) × 2(p + 1) positive definite matrix; that is,
\[
\pi_0(R) =
\begin{cases}
\dfrac{b^a}{\Gamma(a)}\, r^{a-1} e^{-br}, & r > 0, \\[4pt]
0, & \text{otherwise},
\end{cases}
\]

and

\[
\pi_0(\beta \mid R = r) = \frac{r^{p+1} |\tau|^{1/2}}{(2\pi)^{p+1}}
\exp\left\{ -\frac{r}{2} (\beta - \beta^*)'\, \tau\, (\beta - \beta^*) \right\}.
\]
Therefore, the joint prior of β and R can be written as

\[
\pi_0(\beta, R) = \pi_0(\beta \mid R = r)\, \pi_0(R)
= \frac{b^a\, |\tau|^{1/2}}{\Gamma(a)\,(2\pi)^{p+1}}\, r^{a+p}\, e^{-br}
\exp\left\{ -\frac{r}{2} (\beta - \beta^*)'\, \tau\, (\beta - \beta^*) \right\}, \qquad r > 0.
\]

In matrix notation, the likelihood function under H1 is

\[
L_1(\beta, R, k) = L_1(\beta_1, \beta_2, R, k) = f(y_1, y_2, \ldots, y_n; \beta_1, \beta_2, R, k)
= (2\pi)^{-n/2} r^{n/2} \exp\{ -r (y_1 - X_1\beta_1)'(y_1 - X_1\beta_1)/2 \}
\cdot \exp\{ -r (y_2 - X_2\beta_2)'(y_2 - X_2\beta_2)/2 \}
\]
\[
= (2\pi)^{-n/2} r^{n/2} \exp\{ -r (y - X(k)\beta)'(y - X(k)\beta)/2 \},
\]

where X(k) = diag(X_1, X_2) denotes the block-diagonal design matrix determined by the change point position k.
Combining the likelihood with the priors and integrating out β and R, the posterior density of the change point position k is obtained as

\[
\pi_1(k) = f(k \mid y_1, y_2, \ldots, y_n)
\propto D(k)^{-a^*}\, \left| X(k)'X(k) + \tau \right|^{-1/2}, \qquad k = p + 1, \ldots, n - p,
\]

where

\[
a^* = a + 1 + n/2,
\]
\[
D(k) = b + \frac{1}{2} \left\{ [y - \hat{y}(k)]'[y - \hat{y}(k)] + [\hat\beta(k) - \beta^*]'\, w(k)\, [\hat\beta(k) - \beta^*] \right\},
\]
\[
w(k) = X(k)'X(k) \left[ X(k)'X(k) + \tau \right]^{-1} \tau,
\]
\[
\hat\beta(k) = \left[ X(k)'X(k) \right]^{-1} X(k)'y, \qquad \hat{y}(k) = X(k)\hat\beta(k).
\]
Interested readers are referred to Chin Choy (1977), and Chin Choy and
Broemeling (1980) for the details.
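To make the computation concrete, here is a sketch (Python with NumPy and SciPy; the prior hyperparameters a, b, β*, τ and the simulated data are illustrative choices, not values from the book) that evaluates the unnormalized posterior π1(k) ∝ D(k)^{−a*} |X(k)'X(k) + τ|^{−1/2} over the admissible values of k.

```python
import numpy as np
from scipy.linalg import block_diag

def log_posterior_k(X, y, a, b, beta_star, tau):
    """Unnormalized log pi_1(k) for k = p+1, ..., n-p under the stated priors."""
    n, p_plus_1 = X.shape
    p = p_plus_1 - 1
    a_star = a + 1 + n / 2
    ks = np.arange(p + 1, n - p + 1)
    logs = np.empty(len(ks))
    for j, k in enumerate(ks):
        Xk = block_diag(X[:k], X[k:])                    # X(k): block-diagonal design
        XtX = Xk.T @ Xk
        beta_hat, *_ = np.linalg.lstsq(Xk, y, rcond=None)   # beta_hat(k); robust near the edges
        resid = y - Xk @ beta_hat
        w = XtX @ np.linalg.solve(XtX + tau, tau)        # w(k)
        diff = beta_hat - beta_star
        D = b + 0.5 * (resid @ resid + diff @ w @ diff)  # D(k)
        logs[j] = -a_star * np.log(D) - 0.5 * np.linalg.slogdet(XtX + tau)[1]
    return ks, logs

# Illustrative data and weak prior settings (p = 2 regressors, change after observation 25)
rng = np.random.default_rng(5)
Z = rng.normal(size=(50, 2))
X = np.column_stack([np.ones(50), Z])
y = np.where(np.arange(50) < 25, X @ [1.0, 2.0, -0.5], X @ [0.0, -1.0, 1.5]) + rng.normal(0, 0.3, 50)
ks, logs = log_posterior_k(X, y, a=0.1, b=0.1,
                           beta_star=np.zeros(6), tau=0.01 * np.eye(6))
post = np.exp(logs - logs.max())
post /= post.sum()
print("posterior mode at k =", ks[np.argmax(post)])
```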