
Chapter 4

Regression Models

4.1 Literature Review

Regression analysis is an important statistical application employed in many


disciplines. Before the change point hypothesis was introduced into regression analysis,
statisticians were sometimes unable to establish a satisfactory regression model for an
observed dataset: if the data structure has changed after a certain point in time, then a
single regression model fitted to all of the data leaves part of the data unfitted or poorly
explained. Ever since the change point hypothesis was introduced into statistical analysis,
the study of switching regression models has become part of regression analysis, and
regression models that previously fitted some datasets poorly fit them well once the change
point is located.
In the literature, many authors have studied the change point problem
associated with regression models. Quandt (1958, 1960) studied the estimation of, and
derived a likelihood-ratio-based test for, a linear regression system obeying two separate
regimes. Ferreira (1975) studied a switching regression
model from the Bayesian point of view with the assumption of a known
number of regimes. Brown, Durbin, and Evans (1975) introduced a method
of recursive residuals to test change points in multiple regression models.
Hawkins (1989) used a union-intersection approach to test for changes in
a linear regression model. Kim (1994) considered a test for a change point
in linear regression by using the likelihood ratio statistic, and studied the
asymptotic behavior of the LRT statistic.
In this chapter, we present the change point problem in regression models
by combining the work of Quandt (1958, 1960), Ferreira (1975), Hawkins
(1989), Brown, Durbin, and Evans (1975), Kim (1994), and Chen (1998).
Specifically, we discuss the change point problem for the simple linear
regression model, as well as for the multiple linear regression model mainly
by using the Schwarz information criterion, and by a Bayesian approach.


4.2 Simple Linear Regression Model

Let (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ) be a sequence of observations obtained in a


practical situation. The researcher might be interested in fitting a linear regression model
to these data,

yi = β0 + β1 xi + εi , i = 1, . . . , n,

where xi , i = 1, . . . , n, is a nonstochastic variable, β0 and β1 are unknown


regression parameters, εi , i = 1, . . . , n, is a random error distributed as
N (0, σ 2 ), with σ 2 unknown, and εi s are uncorrelated from observation to
observation. That is, yi , i = 1, . . . , n, is a random variable distributed as
N(β0 + β1 xi, σ²). The nature of the data might lead the researcher to suspect that the
regression coefficients have changed after a certain observation, say the kth, and therefore
the following hypothesis test is of interest. That is, we want to test the null hypothesis:

$$
H_0:\ \mu_{y_i} = \beta_0 + \beta_1 x_i, \qquad \text{for } i = 1,\ldots,n,
$$

versus the alternative hypothesis:

$$
H_1:\ \mu_{y_i} = \beta_0^1 + \beta_1^1 x_i, \ \ \text{for } i = 1,\ldots,k,
\qquad\text{and}\qquad
\mu_{y_i} = \beta_0^* + \beta_1^* x_i, \ \ \text{for } i = k+1,\ldots,n,
$$

where k, k = 2, . . . , n−2, is the location of the change point; β0 , β1 , β01 , β11 , β0∗ ,
and β1∗ are unknown regression parameters. In the following sections we study
several methods of locating the change point k.

4.2.1 Informational Approach

Again, if we use an information criterion such as SIC, we can locate the


change point position by using the minimum SIC principle.
Under H0 , the likelihood function is
$$
L_0(\beta_0,\beta_1,\sigma^2) = \prod_{i=1}^{n} f_{Y_i}(y_i;\beta_0,\beta_1,\sigma^2)
= \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y_i-\beta_0-\beta_1 x_i)^2/2\sigma^2}
= \frac{1}{(\sqrt{2\pi\sigma^2})^{n}}\exp\left\{-\sum_{i=1}^{n}\left(y_i-\beta_0-\beta_1 x_i\right)^2\big/2\sigma^2\right\},
$$

and the maximum likelihood estimates of β0 , β1 , and σ 2 are, respectively,


$$
b_1 \equiv \hat\beta_1 = \frac{S_{xy}}{S_x}, \qquad
b_0 \equiv \hat\beta_0 = \bar y - b_1\,\bar x, \qquad
\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - b_0 - b_1 x_i\right)^2,
$$

where

$$
\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
S_x = \sum_{i=1}^{n}\left(x_i - \bar x\right)^2, \qquad
S_{xy} = \sum_{i=1}^{n}\left(x_i - \bar x\right)\left(y_i - \bar y\right).
$$

It is easy to see that the MLEs obtained above coincide with the least squares
estimates of β0, β1, and σ².
Therefore, the maximum likelihood under H0 is

$$
\sup L_0(\beta_0,\beta_1,\sigma^2) = L_0(\hat\beta_0,\hat\beta_1,\hat\sigma^2)
= \frac{n^{n/2}\, e^{-n/2}}{\left(\sqrt{2\pi}\right)^{n}\left(\displaystyle\sum_{i=1}^{n}\left(y_i - b_0 - b_1 x_i\right)^2\right)^{n/2}},
$$

and the SIC under H0 , denoted by SIC(n), is obtained as

$$
\mathrm{SIC}(n) = -2\log L_0(\hat\beta_0,\hat\beta_1,\hat\sigma^2) + 3\log n
= n\log 2\pi + n\log\sum_{i=1}^{n}\left(y_i - b_0 - b_1 x_i\right)^2 + n + 3\log n - n\log n.
$$

Under H1 , the likelihood function is

$$
L_1(\beta_0^1,\beta_1^1,\beta_0^*,\beta_1^*,\sigma^2)
= \prod_{i=1}^{n} f_{Y_i}(y_i;\beta_0^1,\beta_1^1,\beta_0^*,\beta_1^*,\sigma^2)
= \prod_{i=1}^{k}\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y_i-\beta_0^1-\beta_1^1 x_i)^2/2\sigma^2}
\prod_{i=k+1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y_i-\beta_0^*-\beta_1^* x_i)^2/2\sigma^2}
$$
$$
= \frac{1}{(\sqrt{2\pi\sigma^2})^{n}}
\exp\left\{-\sum_{i=1}^{k}\left(y_i-\beta_0^1-\beta_1^1 x_i\right)^2\big/2\sigma^2\right\}
\cdot\exp\left\{-\sum_{i=k+1}^{n}\left(y_i-\beta_0^*-\beta_1^* x_i\right)^2\big/2\sigma^2\right\}.
$$

Similar calculations give the MLEs of β01 , β11 , β0∗ , β1∗ , and σ 2 as the following,
respectively.

$$
b_1^1 \equiv \hat\beta_1^1 = \frac{{}_kS_{xy}}{{}_kS_x}, \qquad
b_0^1 \equiv \hat\beta_0^1 = \bar y_k - b_1^1\,\bar x_k, \qquad
b_1^* \equiv \hat\beta_1^* = \frac{{}_{n-k}S_{xy}}{{}_{n-k}S_x}, \qquad
b_0^* \equiv \hat\beta_0^* = \bar y_{n-k} - b_1^*\,\bar x_{n-k},
$$
$$
\hat\sigma^2 = \frac{1}{n}\left[\sum_{i=1}^{k}\left(y_i - b_0^1 - b_1^1 x_i\right)^2 + \sum_{i=k+1}^{n}\left(y_i - b_0^* - b_1^* x_i\right)^2\right],
$$

where

$$
\bar x_k = \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad
\bar y_k = \frac{1}{k}\sum_{i=1}^{k} y_i, \qquad
\bar x_{n-k} = \frac{1}{n-k}\sum_{i=k+1}^{n} x_i, \qquad
\bar y_{n-k} = \frac{1}{n-k}\sum_{i=k+1}^{n} y_i,
$$
$$
{}_kS_x = \sum_{i=1}^{k}\left(x_i - \bar x_k\right)^2, \qquad
{}_kS_{xy} = \sum_{i=1}^{k}\left(x_i - \bar x_k\right)\left(y_i - \bar y_k\right),
$$
$$
{}_{n-k}S_x = \sum_{i=k+1}^{n}\left(x_i - \bar x_{n-k}\right)^2, \qquad
{}_{n-k}S_{xy} = \sum_{i=k+1}^{n}\left(x_i - \bar x_{n-k}\right)\left(y_i - \bar y_{n-k}\right).
$$

Hence, we obtain the SIC under H1 , denoted by SIC(k), for k = 2, . . . , n − 2,


as follows.

$$
\mathrm{SIC}(k) = -2\log L_1(\hat\beta_0^1,\hat\beta_1^1,\hat\beta_0^*,\hat\beta_1^*,\hat\sigma^2) + 5\log n
= n\log 2\pi + n\log\left[\sum_{i=1}^{k}\left(y_i - b_0^1 - b_1^1 x_i\right)^2 + \sum_{i=k+1}^{n}\left(y_i - b_0^* - b_1^* x_i\right)^2\right] + n + 5\log n - n\log n.
$$

With the implementation of this information criterion, SIC, we have transformed the
task of hypothesis testing into a model selection process. The null hypothesis H0
corresponds to a regression model with no change in the parameters, and the alternative
hypothesis H1 is represented by n − 3 regression models with a change point at position
2, or 3, . . . , or n − 2. Therefore the decision rule for selecting one of these n − 2
candidate models is: select the model with no change (i.e., accept H0) if SIC(n) < SIC(k)
for all k = 2, . . . , n − 2; otherwise, select the model whose change point k attains the
minimum of SIC(k) over k = 2, . . . , n − 2, which is then smaller than SIC(n).
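To make the decision rule concrete, the following is a minimal Python sketch (not from the text; the function name and the use of NumPy are our own choices). It fits a least squares line to the whole series and to the two segments induced by each candidate k, evaluates SIC(n) and SIC(k) as above, and returns the minimizing k.

```python
import numpy as np

def sic_change_point_simple(x, y):
    """Minimum-SIC search for one change point in simple linear regression.

    Returns (SIC(n), {k: SIC(k) for k = 2, ..., n-2}, k_hat).
    Assumes each segment contains at least two distinct x values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    def rss(xs, ys):
        # Residual sum of squares of the least squares line fitted to (xs, ys).
        sx = np.sum((xs - xs.mean()) ** 2)
        sxy = np.sum((xs - xs.mean()) * (ys - ys.mean()))
        b1 = sxy / sx
        b0 = ys.mean() - b1 * xs.mean()
        return np.sum((ys - b0 - b1 * xs) ** 2)

    const = n * np.log(2 * np.pi) + n - n * np.log(n)

    # SIC(n): one regression over all n observations, 3 free parameters.
    sic_n = const + n * np.log(rss(x, y)) + 3 * np.log(n)

    # SIC(k): separate regressions on observations 1..k and k+1..n, 5 parameters.
    sic_k = {}
    for k in range(2, n - 1):                     # k = 2, ..., n-2
        sic_k[k] = const + n * np.log(rss(x[:k], y[:k]) + rss(x[k:], y[k:])) + 5 * np.log(n)

    k_hat = min(sic_k, key=sic_k.get)
    return sic_n, sic_k, k_hat
```

A change is then declared whenever the minimum of the returned SIC(k) values is smaller than SIC(n), with the estimated change point given by k_hat.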

4.2.2 Bayesian Approach

In addition to the classical and informational approaches to the switching


simple linear regression model, several authors have proposed solutions to such
problems from a Bayesian point of view. Chin Choy and Broemeling (1980)
studied Bayesian inference for a switching linear model. Holbert (1982) inves-
tigated the switching simple linear regression model and multiple regression
model by employing Bayesian methodology. We give a detailed presentation
of the Bayesian approach on the basis of Holbert’s work.
In the Bayesian setting, a change in the regression model is assumed to
have taken place. The problem is to find where this change point is located.
It is more appropriate to present the problem in the following setting.

$$
\mu_{y_i} = \beta_0 + \beta_1 x_i, \ \ \text{for } i = 1,\ldots,k,
\qquad\text{and}\qquad
\mu_{y_i} = \beta_0^* + \beta_1^* x_i, \ \ \text{for } i = k+1,\ldots,n,
$$

for k = 2, . . . , n − 2, where β0 , β1 , β0∗ , and β1∗ are unknown regression para-


meters. Our goal here is to find the value of k, or to estimate it, from the
information in the data.

The following general vague prior probability densities π0 (·) are assigned
to the parameters.

$$
\pi_0(\beta_0,\beta_1,\beta_0^*,\beta_1^* \mid k,\sigma^2) \propto \text{constant}, \qquad -\infty < \beta_0,\beta_1,\beta_0^*,\beta_1^* < \infty,
$$
$$
\pi_0(k) = \begin{cases} \dfrac{1}{n-3}, & k = 2,\ldots,n-2, \\[4pt] 0, & \text{otherwise,} \end{cases}
\qquad\qquad
\pi_0(\sigma^2 \mid k) \propto \begin{cases} \dfrac{1}{\sigma^2}, & 0 < \sigma^2 < \infty, \\[4pt] 0, & \text{otherwise.} \end{cases}
$$

Because Yi , i = 1, . . . , k, ∼ iid N (β0 + β1 xi , σ 2 ), Yj , j = k + 1, . . . , n,


∼ iid N (β0∗ + β1∗ xj , σ 2 ), the joint density function (likelihood function) of
the data given the parameters is

$$
L(\beta_0,\beta_1,\beta_0^*,\beta_1^*,k,\sigma^2) = f(y_1,\ldots,y_n \mid \beta_0,\beta_1,\beta_0^*,\beta_1^*,k,\sigma^2)
$$
$$
= \frac{1}{(\sqrt{2\pi\sigma^2})^{n}}
\exp\left\{-\sum_{i=1}^{k}\left(y_i-\beta_0-\beta_1 x_i\right)^2\big/2\sigma^2\right\}
\cdot\exp\left\{-\sum_{i=k+1}^{n}\left(y_i-\beta_0^*-\beta_1^* x_i\right)^2\big/2\sigma^2\right\}.
$$

Then, the joint posterior density of all the parameters is

$$
\pi_1(k,\beta_0,\beta_1,\beta_0^*,\beta_1^*,\sigma^2)
= f(k,\beta_0,\beta_1,\beta_0^*,\beta_1^*,\sigma^2 \mid y_1,\ldots,y_n)
= \frac{f(y_1,\ldots,y_n \mid \beta_0,\beta_1,\beta_0^*,\beta_1^*,k,\sigma^2)\, f(\beta_0,\beta_1,\beta_0^*,\beta_1^*,k,\sigma^2)}{f(y_1,\ldots,y_n)}
$$
$$
\propto f(y_1,\ldots,y_n \mid \beta_0,\beta_1,\beta_0^*,\beta_1^*,k,\sigma^2)\, f(\beta_0,\beta_1,\beta_0^*,\beta_1^*,k,\sigma^2)
= \pi_0(k)\,\pi_0(\sigma^2\mid k)\,\pi_0(\beta_0,\beta_1,\beta_0^*,\beta_1^*\mid k,\sigma^2)\, L(\beta_0,\beta_1,\beta_0^*,\beta_1^*,k,\sigma^2)
$$
$$
\propto \frac{1}{n-3}\cdot\frac{1}{\sigma^2}\cdot\frac{1}{(\sqrt{2\pi\sigma^2})^{n}}
\exp\left\{-\sum_{i=1}^{k}\left(y_i-\beta_0-\beta_1 x_i\right)^2\big/2\sigma^2\right\}
\exp\left\{-\sum_{i=k+1}^{n}\left(y_i-\beta_0^*-\beta_1^* x_i\right)^2\big/2\sigma^2\right\}
$$
$$
\propto \left(\frac{1}{\sigma^2}\right)^{n/2+1}
\exp\left\{-\sum_{i=1}^{k}\left(y_i-\beta_0-\beta_1 x_i\right)^2\big/2\sigma^2\right\}
\exp\left\{-\sum_{i=k+1}^{n}\left(y_i-\beta_0^*-\beta_1^* x_i\right)^2\big/2\sigma^2\right\}.
$$

Now, integrating π1 (k, β0 , β1 , β0∗ , β1∗ , σ 2 ) with respect to β0 , β1 , β0∗ , β1∗ , and
σ 2 , we obtain the posterior density of the change point location k as

$$
\pi_1(k) = f(k \mid y_1,\ldots,y_n)
= \int_{0}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}
\pi_1(k,\beta_0,\beta_1,\beta_0^*,\beta_1^*,\sigma^2)\, d\beta_0\, d\beta_1\, d\beta_0^*\, d\beta_1^*\, d\sigma^2
$$
$$
\propto \int_{0}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}
\left(\frac{1}{\sigma^2}\right)^{n/2+1}
\exp\left\{-\sum_{i=1}^{k}\left(y_i-\beta_0-\beta_1 x_i\right)^2\big/2\sigma^2\right\}
\exp\left\{-\sum_{i=k+1}^{n}\left(y_i-\beta_0^*-\beta_1^* x_i\right)^2\big/2\sigma^2\right\}
d\beta_0\, d\beta_1\, d\beta_0^*\, d\beta_1^*\, d\sigma^2
\;\equiv\; I.
$$

To simplify the expression I, we calculate the following.


$$
\int_{-\infty}^{\infty} \frac{1}{\sqrt{\sigma^2}}
\exp\left\{-\sum_{i=1}^{k}\left(y_i-\beta_0-\beta_1 x_i\right)^2\big/2\sigma^2\right\} d\beta_0
$$
$$
= \int_{-\infty}^{\infty} \frac{1}{\sqrt{\sigma^2}}
\exp\left\{-\left[\sqrt{k}\,\beta_0 - \frac{1}{\sqrt{k}}\left(\sum_{i=1}^{k}y_i - \beta_1\sum_{i=1}^{k}x_i\right)\right]^2\bigg/2\sigma^2\right\}
\exp\left\{-\left[\left(\sqrt{{}_kS_x}\,\beta_1 - \sqrt{{}_kS_x}\,\hat\beta_1\right)^2 - \hat\beta_1^2\,{}_kS_x + {}_kS_y\right]\Big/2\sigma^2\right\} d\beta_0
$$
$$
= \sqrt{\frac{2\pi}{k}}\,\exp\left\{-\left[\left(\sqrt{{}_kS_x}\,\beta_1 - \sqrt{{}_kS_x}\,\hat\beta_1\right)^2 - \hat\beta_1^2\,{}_kS_x + {}_kS_y\right]\Big/2\sigma^2\right\},
$$

and

$$
\int_{-\infty}^{\infty} \frac{1}{\sqrt{\sigma^2}}\,\sqrt{\frac{2\pi}{k}}\,
\exp\left\{-\left[\left(\sqrt{{}_kS_x}\,\beta_1 - \sqrt{{}_kS_x}\,\hat\beta_1\right)^2 - \hat\beta_1^2\,{}_kS_x + {}_kS_y\right]\Big/2\sigma^2\right\} d\beta_1
= \frac{2\pi}{\sqrt{k\,{}_kS_x}}\,\exp\left\{-\left({}_kS_y - \hat\beta_1^2\,{}_kS_x\right)\big/2\sigma^2\right\}.
$$
Moreover,
  

$$
\int_{-\infty}^{\infty} \frac{1}{\sqrt{\sigma^2}}
\exp\left\{-\sum_{i=k+1}^{n}\left(y_i-\beta_0^*-\beta_1^* x_i\right)^2\big/2\sigma^2\right\} d\beta_0^*
= \sqrt{\frac{2\pi}{n-k}}\,\exp\left\{-\left[\left(\sqrt{{}_{n-k}S_x}\,\beta_1^* - \sqrt{{}_{n-k}S_x}\,\hat\beta_1^*\right)^2 - {}_{n-k}S_x\,\hat\beta_1^{*2} + {}_{n-k}S_y\right]\Big/2\sigma^2\right\},
$$

and

$$
\int_{-\infty}^{\infty} \frac{1}{\sqrt{\sigma^2}}\,\sqrt{\frac{2\pi}{n-k}}\,
\exp\left\{-\left[\left(\sqrt{{}_{n-k}S_x}\,\beta_1^* - \sqrt{{}_{n-k}S_x}\,\hat\beta_1^*\right)^2 - {}_{n-k}S_x\,\hat\beta_1^{*2} + {}_{n-k}S_y\right]\Big/2\sigma^2\right\} d\beta_1^*
= \frac{2\pi}{\sqrt{(n-k)\,{}_{n-k}S_x}}\,\exp\left\{-\left({}_{n-k}S_y - {}_{n-k}S_x\,\hat\beta_1^{*2}\right)\big/2\sigma^2\right\},
$$

where kSx and n−kSx were given in the previous section, and

$$
\hat\beta_1 = \frac{{}_kS_{xy}}{{}_kS_x}, \qquad
\hat\beta_1^* = \frac{{}_{n-k}S_{xy}}{{}_{n-k}S_x}, \qquad
{}_kS_y = \sum_{i=1}^{k}\left(y_i - \bar y_k\right)^2, \qquad
{}_{n-k}S_y = \sum_{i=k+1}^{n}\left(y_i - \bar y_{n-k}\right)^2
$$

are the least squares slope estimates for the two segments and the corresponding sums of squares of the yi.

Then, I reduces to

$$
I = \frac{(2\pi)^2}{\sqrt{k(n-k)\,{}_kS_x\,{}_{n-k}S_x}}
\int_{0}^{\infty}\left(\frac{1}{\sigma^2}\right)^{n/2-1}
\exp\left\{-\left[{}_kS_y - \hat\beta_1^2\,{}_kS_x + {}_{n-k}S_y - {}_{n-k}S_x\,\hat\beta_1^{*2}\right]\Big/2\sigma^2\right\} d\sigma^2.
$$

After some algebraic simplifications, we obtain

$$
D \equiv {}_kS_y - \hat\beta_1^2\,{}_kS_x + {}_{n-k}S_y - {}_{n-k}S_x\,\hat\beta_1^{*2}
= \sum_{i=1}^{k}\left(y_i - \hat y_{i(1,k)}\right)^2 + \sum_{i=k+1}^{n}\left(y_i - \hat y_{i(k+1,n)}\right)^2,
$$

where

$$
\hat y_{i(1,k)} = \hat\beta_0 + \hat\beta_1 x_i, \quad \text{for } i = 1,\ldots,k,
\qquad\text{and}\qquad
\hat y_{i(k+1,n)} = \hat\beta_0^* + \hat\beta_1^* x_i, \quad \text{for } i = k+1,\ldots,n,
$$

are the fitted values from the two segment least squares lines.
Therefore,
$$
I = \frac{(2\pi)^2}{\sqrt{k(n-k)\,{}_kS_x\,{}_{n-k}S_x}}
\int_{0}^{\infty}\left(\frac{1}{\sigma^2}\right)^{n/2-1} e^{-D/2\sigma^2}\, d\sigma^2.
$$

Note that to be able to build two regression models, it is required that both
n > 2, and n − 2 > 2; hence, we have n ≥ 5.
Making the substitution u = 1/σ² turns the remaining integral into a gamma integral,
whose value is given below for even and odd n. For n = 2m, with m = 3, 4, . . . ,

$$
\int_{0}^{\infty}\left(\frac{1}{\sigma^2}\right)^{n/2-1} e^{-D/2\sigma^2}\, d\sigma^2
= \frac{(m-3)!}{(D/2)^{m-2}} = \frac{(n/2-3)!}{(D/2)^{(n-4)/2}}
\propto D^{-(n-4)/2}.
$$

For n = 2m − 1, with m = 3, 4, . . . ,

$$
\int_{0}^{\infty}\left(\frac{1}{\sigma^2}\right)^{n/2-1} e^{-D/2\sigma^2}\, d\sigma^2
= \frac{\sqrt{2\pi}\,(2m-7)!!}{D^{\,m-5/2}}
\propto D^{-(n-4)/2}.
$$

Finally, we obtain

$$
I \propto \left[k(n-k)\,{}_kS_x\,{}_{n-k}S_x\right]^{-1/2} D^{-(n-4)/2}, \qquad \text{for } k = 2,\ldots,n-2;
$$

that is,

$$
\pi_1(k) \propto \left[k(n-k)\,{}_kS_x\,{}_{n-k}S_x\right]^{-1/2} D^{-(n-4)/2}, \qquad \text{for } k = 2,\ldots,n-2.
$$

From the values of π1(k), a change point is located at the value of k that maximizes
π1(k) over k = 2, . . . , n − 2.
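As a computational illustration (a sketch, not part of the text; the function and helper names are our own), π1(k) can be evaluated directly from the two segment least squares fits, since D is just the combined residual sum of squares of the two fitted lines:

```python
import numpy as np

def bayes_posterior_simple(x, y):
    """Posterior pi1(k), k = 2, ..., n-2, for the switching simple linear
    regression under the vague priors of this section, normalized over k."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    def segment(xs, ys):
        # Returns (S_x, residual sum of squares) of the least squares line.
        sx = np.sum((xs - xs.mean()) ** 2)
        b1 = np.sum((xs - xs.mean()) * (ys - ys.mean())) / sx
        b0 = ys.mean() - b1 * xs.mean()
        return sx, np.sum((ys - b0 - b1 * xs) ** 2)

    ks = range(2, n - 1)                           # k = 2, ..., n-2
    log_p = []
    for k in ks:
        sx1, rss1 = segment(x[:k], y[:k])
        sx2, rss2 = segment(x[k:], y[k:])
        d = rss1 + rss2                            # D for this k
        # log of [k(n-k) * kS_x * (n-k)S_x]^(-1/2) * D^(-(n-4)/2)
        log_p.append(-0.5 * np.log(k * (n - k) * sx1 * sx2) - 0.5 * (n - 4) * np.log(d))

    p = np.exp(np.array(log_p) - max(log_p))       # stabilize before exponentiating
    p /= p.sum()                                   # normalize over the candidate k
    return dict(zip(ks, p))
```

The estimated change point is then the k with the largest posterior value.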

4.2.3 Application to Stock Market Data

Holbert (1982) studied the switching simple linear regression models and
switching linear model from a Bayesian point of view. He assigned some
vague prior densities to the unknown position of the change point and to
the unknown parameters of the model, and obtained the posterior density
of the change point. He analyzed the dataset on stock market sales volumes
to illustrate the estimation of the change point in two-phase regression by
calculating the posterior density of the change point. He found that the
maximum posterior density occurred at position 24, which corresponds to
the calendar month of December 1968, and concluded that the change was
caused by the abolition of give-ups (commission splitting) in December
1968.

We take the same data that Holbert used to illustrate the SIC method
for locating the switching change point in linear regression. The monthly
dollar volume of sales (in millions) on the Boston Stock Exchange (BSE) is
considered as the response variable, and the combined New York-American
Stock Exchange (NYAMSE) dollar volume of sales is considered as the regressor. The computed
SIC values are listed in Table 4.1 along with the original BSE and NYAMSE
values given in Holbert (1982). The starred SIC value in this table is the
minimum SIC value, which corresponds to time point 23, hence the regression
model change starts at the time point 24, which is December of 1968. This
conclusion coincides with the one drawn by Holbert using his method. As the
reader may notice, the minimum SIC principle leads us firmly to the conclu-
sion on the change point. Although Holbert (1982) found the same change

Table 4.1 NYAMSE and BSE Values, Computed SIC Values


Time Point Calendar Month NYAMSE BSE SIC
1 Jan. 1967 10581.6 78.8 —
2 Feb. 1967 10234.3 69.1 368.5736
3 Mar. 1967 13299.5 87.6 368.0028
4 Apr. 1967 10746.5 72.8 367.9975
5 May 1967 13310.7 79.4 366.8166
6 Jun. 1967 12835.5 85.6 366.1827
7 Jul. 1967 12194.2 75.0 365.3197
8 Aug. 1967 12860.4 85.3 364.4143
9 Sep. 1967 11955.6 86.9 364.0418
10 Oct. 1967 13351.5 107.8 364.0670
11 Nov. 1967 13285.9 128.7 365.1320
12 Dec. 1967 13784.4 134.5 365.8783
13 Jan. 1968 16336.7 148.7 365.7791
14 Feb. 1968 11040.5 94.2 366.0318
15 Mar. 1968 11525.3 128.1 367.1252
16 Apr. 1968 16056.4 154.1 367.2805
17 May 1968 18464.3 191.3 367.4632
18 Jun. 1968 17092.2 191.9 367.6615
19 Jul. 1968 15178.8 159.6 367.8082
20 Aug. 1968 12774.8 185.5 368.8873
21 Sep. 1968 12377.8 178.0 368.9790
22 Oct. 1968 16856.3 271.8 364.2126
23 Nov. 1968 14635.3 212.3 359.3774*
24 Dec. 1968 17436.9 139.4 362.7803
25 Jan. 1969 16482.2 106.0 366.7591
26 Feb. 1969 13905.4 112.1 367.4118
27 Mar. 1969 11973.7 103.5 367.6757
28 Apr. 1969 12573.6 92.5 368.4138
29 May 1969 16566.8 116.9 370.6948
30 Jun. 1969 13558.7 78.9 372.0000
31 Jul. 1969 11530.9 57.4 372.1517
32 Aug. 1969 11278.0 75.9 371.8513
33 Sep. 1969 11263.7 109.8 372.1726
34 Oct. 1969 15649.5 129.2 —
35 Nov. 1969 12197.1 115.1 361.4956

point as here, his conclusion is less affirmative. As he pointed out, the Bayesian
posterior density has a tendency to show relative maxima at the endpoints.
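For illustration, if the NYAMSE and BSE columns of Table 4.1 were loaded into two arrays, the SIC sketch from Section 4.2.1 could be applied directly (hypothetical usage; the variable names are ours and the function is the sketch given earlier):

```python
# nyamse and bse are assumed to hold the 35 monthly values from Table 4.1.
sic_n, sic_k, k_hat = sic_change_point_simple(nyamse, bse)
if min(sic_k.values()) < sic_n:
    # With these data the minimum should occur at time point 23 (Nov. 1968),
    # so the new regime starts at time point 24 (Dec. 1968).
    print("change detected; first regime ends at time point", k_hat)
else:
    print("no change detected")
```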

4.3 Multiple Linear Regression Model

As an analogue to the simple linear regression model, we consider the


switching multiple linear regression model in this section. The model we
discuss is
$$
y_i = x_i'\beta + \varepsilon_i, \qquad i = 1,\ldots,n,
$$
where xi, i = 1, . . . , n, is a nonstochastic (p + 1)-vector with xi′ = (1, x1i, x2i, . . . , xpi);
β′ = (β0, β1, . . . , βp) is a (p + 1)-dimensional vector of unknown regression parameters;
εi, i = 1, . . . , n, is a random error distributed as N(0, σ²), with σ² unknown; and the
εi's are uncorrelated from observation to observation. That is, yi, i = 1, . . . , n, is a
random variable distributed as N(xi′β, σ²). Situations arise in which we would like to
check whether there is a change at location k in the regression model. Then, we test the
null hypothesis:

$$
H_0:\ \mu_{y_i} = x_i'\beta \quad \text{for } i = 1,\ldots,n,
$$

versus the alternative hypothesis:

$$
H_1:\ \mu_{y_i} = x_i'\beta_1 \ \ \text{for } i = 1,\ldots,k,
\qquad\text{and}\qquad
\mu_{y_i} = x_i'\beta_2 \ \ \text{for } i = k+1,\ldots,n,
$$

where k, k = p + 1, . . . , n − p, is the location of the change point, and β, β1,
and β2 are unknown regression parameter vectors. In the following sections we give
a method to locate the change point k.

4.3.1 Informational Approach

An alternate approach for this multiple regression change point problem


is to use the Schwarz information criterion, SIC. It is a simple approach
that reduces computation in comparison with the likelihood-ratio procedure.
Let

$$
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad
X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{p1} \\ 1 & x_{12} & \cdots & x_{p2} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1n} & \cdots & x_{pn} \end{pmatrix}
\equiv \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix}, \qquad\text{and}\qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix};
$$

then the null hypothesis H0 corresponds to the model

$$
\mu_y = X\beta,
$$

where

$$
\mu_y = \begin{pmatrix} \mu_{y_1} \\ \mu_{y_2} \\ \vdots \\ \mu_{y_n} \end{pmatrix}.
$$
Obviously, the likelihood function under H0 in matrix notation is

$$
L_0(\beta,\sigma^2) = f(y_1,y_2,\ldots,y_n;\beta,\sigma^2)
= (2\pi)^{-n/2}(\sigma^2)^{-n/2}\exp\{-(y - X\beta)'(y - X\beta)/2\sigma^2\},
$$

and the MLEs of β and σ² are, respectively,

$$
b \equiv \hat\beta = (X'X)^{-1}X'y, \qquad
\hat\sigma^2 = \frac{1}{n}(y - Xb)'(y - Xb).
$$

Then the maximum likelihood under H0 is

$$
L_0(\hat\beta,\hat\sigma^2) \equiv L_0(b,\hat\sigma^2)
= (2\pi)^{-n/2}\left[\frac{1}{n}(y - Xb)'(y - Xb)\right]^{-n/2} e^{-n/2}.
$$

Therefore, under H0 the Schwarz information criterion, denoted by SIC(n),


is obtained as

$$
\mathrm{SIC}(n) = -2\log L_0(b,\hat\sigma^2) + (p+2)\log n
= n\log[(y - Xb)'(y - Xb)] + n(\log 2\pi + 1) + (p + 2 - n)\log n.
$$

Let

$$
y_1 = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{pmatrix}, \qquad
y_2 = \begin{pmatrix} y_{k+1} \\ y_{k+2} \\ \vdots \\ y_n \end{pmatrix},
$$
$$
X_1 = \begin{pmatrix} 1 & x_{11} & \cdots & x_{p1} \\ 1 & x_{12} & \cdots & x_{p2} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1k} & \cdots & x_{pk} \end{pmatrix}
\equiv \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_k' \end{pmatrix}, \qquad
X_2 = \begin{pmatrix} 1 & x_{1(k+1)} & \cdots & x_{p(k+1)} \\ 1 & x_{1(k+2)} & \cdots & x_{p(k+2)} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1n} & \cdots & x_{pn} \end{pmatrix}
\equiv \begin{pmatrix} x_{k+1}' \\ x_{k+2}' \\ \vdots \\ x_n' \end{pmatrix},
$$
$$
\beta_1 = \begin{pmatrix} \beta_0^1 \\ \beta_1^1 \\ \vdots \\ \beta_p^1 \end{pmatrix}, \qquad
\beta_2 = \begin{pmatrix} \beta_0^* \\ \beta_1^* \\ \vdots \\ \beta_p^* \end{pmatrix}, \qquad\text{and}\qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix},
$$

for k = p + 1, . . . , n − p; then the alternative hypothesis H1 corresponds to
the following models:

$$
\mu_{y_1} = X_1\beta_1 \qquad\text{and}\qquad \mu_{y_2} = X_2\beta_2, \qquad \text{for } k = p+1,\ldots,n-p,
$$

where

$$
\mu_{y_1} = \begin{pmatrix} \mu_{y_1} \\ \mu_{y_2} \\ \vdots \\ \mu_{y_k} \end{pmatrix} \qquad\text{and}\qquad
\mu_{y_2} = \begin{pmatrix} \mu_{y_{k+1}} \\ \mu_{y_{k+2}} \\ \vdots \\ \mu_{y_n} \end{pmatrix}.
$$
In this case, the likelihood function is found to be

$$
L_1(\beta_1,\beta_2,\sigma^2) = f(y_1,y_2,\ldots,y_n;\beta_1,\beta_2,\sigma^2)
= (2\pi)^{-n/2}(\sigma^2)^{-n/2}\exp\{-(y_1 - X_1\beta_1)'(y_1 - X_1\beta_1)/2\sigma^2\}
\cdot\exp\{-(y_2 - X_2\beta_2)'(y_2 - X_2\beta_2)/2\sigma^2\},
$$

and the MLEs of the parameters are, respectively,

$$
b_1 \equiv \hat\beta_1 = (X_1'X_1)^{-1}X_1'y_1, \qquad
b_2 \equiv \hat\beta_2 = (X_2'X_2)^{-1}X_2'y_2,
$$
$$
\hat\sigma^2 = \frac{1}{n}\left[(y_1 - X_1b_1)'(y_1 - X_1b_1) + (y_2 - X_2b_2)'(y_2 - X_2b_2)\right].
$$

Then the maximum likelihood is

$$
L_1(\hat\beta_1,\hat\beta_2,\hat\sigma^2) \equiv L_1(b_1,b_2,\hat\sigma^2)
= (2\pi)^{-n/2}\left\{\frac{1}{n}\left[(y_1 - X_1b_1)'(y_1 - X_1b_1) + (y_2 - X_2b_2)'(y_2 - X_2b_2)\right]\right\}^{-n/2} e^{-n/2}.
$$

Therefore, under H1 the Schwarz information criterion, denoted by SIC(k)


for k = p + 1, . . . , n − p, is obtained as

$$
\mathrm{SIC}(k) = -2\log L_1(b_1,b_2,\hat\sigma^2) + (2p+3)\log n
= n\log\left[(y_1 - X_1b_1)'(y_1 - X_1b_1) + (y_2 - X_2b_2)'(y_2 - X_2b_2)\right] + n(\log 2\pi + 1) + (2p + 3 - n)\log n.
$$

According to the principle of information criteria in model selection, H0 will
be accepted if SIC(n) ≤ min over k = p + 1, . . . , n − p of SIC(k), and H1 will be
accepted if SIC(n) exceeds that minimum. When H1 is accepted, the estimated
position of the change in the switching linear model is the value of k that attains
the minimum, i.e.,

$$
\mathrm{SIC}(\hat k) = \min_{p+1\le k\le n-p} \mathrm{SIC}(k).
$$
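A minimal matrix version of this search might look as follows (a sketch, not from the text; the function name and the use of numpy.linalg.lstsq are our own choices). Here X is the n × (p + 1) design matrix with a leading column of ones.

```python
import numpy as np

def sic_change_point_mlr(X, y):
    """Minimum-SIC search for one change point in multiple linear regression.

    Returns (SIC(n), {k: SIC(k) for k = p+1, ..., n-p}, k_hat)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, q = X.shape                       # q = p + 1
    p = q - 1

    def rss(Xs, ys):
        # Residual sum of squares of the least squares fit on (Xs, ys).
        b, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        r = ys - Xs @ b
        return float(r @ r)

    const = n * (np.log(2 * np.pi) + 1)

    # SIC(n): one regression over all n observations, p + 2 free parameters.
    sic_n = n * np.log(rss(X, y)) + const + (p + 2 - n) * np.log(n)

    # SIC(k): two regressions, 2p + 3 free parameters, k = p+1, ..., n-p.
    sic_k = {}
    for k in range(p + 1, n - p + 1):
        d = rss(X[:k], y[:k]) + rss(X[k:], y[k:])
        sic_k[k] = n * np.log(d) + const + (2 * p + 3 - n) * np.log(n)

    k_hat = min(sic_k, key=sic_k.get)
    return sic_n, sic_k, k_hat
```

As in the simple regression case, H1 is accepted when the minimum SIC(k) falls below SIC(n).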

4.3.2 Bayesian Approach

Holbert (1982) also investigated the switching multiple linear regression


models from a Bayesian point of view. In the Bayesian setting, a change
in the regression model is assumed to have taken place. The problem is to
find where this change point is located. It is more appropriate to present
the problem in the following setting.

$$
\mu_{y_i} = x_i'\beta_1 \ \ \text{for } i = 1,\ldots,k,
\qquad\text{and}\qquad
\mu_{y_i} = x_i'\beta_2 \ \ \text{for } i = k+1,\ldots,n,
$$

for k = p + 1, . . . , n − p.
Let

$$
y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \qquad\text{and}\qquad
R = \frac{1}{\sigma^2},
$$

where y1, y2, β1, and β2 are defined in the previous section. We first assign
a discrete uniform prior to the change point position k:

$$
\pi_0(k) = \begin{cases} \dfrac{1}{n-2p}, & k = p+1,\ldots,n-p, \\[4pt] 0, & \text{otherwise.} \end{cases}
$$

We also assume that the 2(p + 1)-dimensional parameter vector β and the parameter R are
jointly independent of the change point position k. Finally, we assume that
the parameter R has a gamma prior distribution with parameters a and b, and that
the conditional prior of β given R = r is a 2(p + 1)-dimensional normal distribution
with mean vector β* and covariance matrix (1/r)τ⁻¹, where τ is a
2(p + 1) × 2(p + 1) positive definite matrix; that is,

$$
\pi_0(R) = \begin{cases} \dfrac{b^a}{\Gamma(a)}\, r^{a-1} e^{-br}, & r > 0, \\[6pt] 0, & \text{otherwise,} \end{cases}
$$

and

$$
\pi_0(\beta \mid R = r) = \frac{r^{p+1}\,|\tau|^{1/2}}{(2\pi)^{p+1}}
\exp\left\{-\frac{r}{2}(\beta - \beta^*)'\,\tau\,(\beta - \beta^*)\right\}.
$$
Therefore, the joint prior of β and R can be written as

$$
\pi_0(\beta, R) = \frac{b^a}{\Gamma(a)}\, r^{a-1} e^{-br} \cdot
\frac{r^{p+1}\,|\tau|^{1/2}}{(2\pi)^{p+1}}
\exp\left\{-\frac{r}{2}(\beta-\beta^*)'\,\tau\,(\beta-\beta^*)\right\}
\propto r^{a+p}\exp\left\{-r\left[b + \frac{1}{2}(\beta-\beta^*)'\,\tau\,(\beta-\beta^*)\right]\right\}.
$$

By introducing the n × (2p + 2) matrix X(k):


 
$$
X(k) = \begin{pmatrix} X_1 & 0_1 \\ 0_2 & X_2 \end{pmatrix},
$$

where X1 and X2 are defined in the previous section, 01 is a k × (p + 1) zero
matrix, and 02 is an (n − k) × (p + 1) zero matrix, the likelihood function can be
written as

$$
L_1(\beta,R,k) = L_1(\beta_1,\beta_2,R,k) = f(y_1,y_2,\ldots,y_n;\beta_1,\beta_2,R,k)
= (2\pi)^{-n/2}\, r^{n/2}\exp\{-r(y_1 - X_1\beta_1)'(y_1 - X_1\beta_1)/2\}\cdot\exp\{-r(y_2 - X_2\beta_2)'(y_2 - X_2\beta_2)/2\}
$$
$$
= (2\pi)^{-n/2}\, r^{n/2}\exp\{-r\,(y - X(k)\beta)'(y - X(k)\beta)/2\},
$$

hence, the joint posterior density of the parameters is

$$
\pi_1(\beta,R,k) = f(\beta,R,k \mid y_1,y_2,\ldots,y_n)
\propto L_1(\beta,R,k)\,\pi_0(\beta,R)\,\pi_0(k)
$$
$$
\propto r^{a+p+n/2}
\exp\left\{-r\left[b + \frac{1}{2}(\beta-\beta^*)'\,\tau\,(\beta-\beta^*) + \frac{1}{2}\bigl(y - X(k)\beta\bigr)'\bigl(y - X(k)\beta\bigr)\right]\right\}.
$$

Integrating π1 (β, R, k) with respect to β and R, and simplifying, we obtain


the posterior density of the change point k as
$$
\pi_1(k) = f(k \mid y_1, y_2, \ldots, y_n)
\propto D(k)^{-a^*}\,\bigl|X(k)'X(k) + \tau\bigr|^{-1/2}, \qquad \text{for } k = p+1,\ldots,n-p,
$$

where

$$
a^* = a + \frac{n}{2},
$$
$$
D(k) = b + \frac{1}{2}\Bigl\{[y - \hat y(k)]'[y - \hat y(k)] + [\hat\beta(k) - \beta^*]'\,w(k)\,[\hat\beta(k) - \beta^*]\Bigr\},
$$
$$
w(k) = X(k)'X(k)\,[X(k)'X(k) + \tau]^{-1}\tau,
$$
$$
\hat\beta(k) = [X(k)'X(k)]^{-1}X(k)'y,
\qquad
\hat y(k) = X(k)\hat\beta(k).
$$

Interested readers are referred to Chin Choy (1977), and Chin Choy and
Broemeling (1980) for the details.
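For completeness, here is a sketch (not from the text) of how π1(k) could be evaluated numerically once the prior hyperparameters a, b, τ (a 2(p + 1) × 2(p + 1) positive definite matrix), and β* (a 2(p + 1)-vector) have been chosen. The function name and the use of a pseudoinverse for computing β̂(k) are our own choices, and a* follows the expression given above.

```python
import numpy as np

def bayes_posterior_mlr(X, y, a, b, tau, beta_star):
    """Posterior pi1(k), k = p+1, ..., n-p, for the switching multiple linear
    regression with a Gamma(a, b) prior on R = 1/sigma^2 and a normal prior
    N(beta_star, (1/r) tau^{-1}) on the stacked coefficient vector."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, q = X.shape                       # q = p + 1
    p = q - 1
    a_star = a + n / 2                   # exponent of D(k) in the posterior

    ks, log_p = [], []
    for k in range(p + 1, n - p + 1):    # k = p+1, ..., n-p
        # Block-diagonal design X(k) = [[X1, 0], [0, X2]].
        Xk = np.zeros((n, 2 * q))
        Xk[:k, :q] = X[:k]
        Xk[k:, q:] = X[k:]

        XtX = Xk.T @ Xk
        beta_hat = np.linalg.pinv(XtX) @ (Xk.T @ y)      # beta_hat(k)
        resid = y - Xk @ beta_hat                        # y - y_hat(k)
        w = XtX @ np.linalg.inv(XtX + tau) @ tau         # w(k)
        diff = beta_hat - beta_star
        d = b + 0.5 * (resid @ resid + diff @ w @ diff)  # D(k)

        _, logdet = np.linalg.slogdet(XtX + tau)
        log_p.append(-a_star * np.log(d) - 0.5 * logdet)
        ks.append(k)

    post = np.exp(np.array(log_p) - max(log_p))          # stabilize, then normalize
    post /= post.sum()
    return dict(zip(ks, post))
```

As in the simple regression case, the change point estimate is the k with the largest posterior value.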
