Chapter 7 Dynamic Econometric Models
The sum of the lag coefficients, ∑𝑘 𝛽𝑘, is known as the long-run, or total, distributed-lag multiplier, provided the sum exists (𝛽0, the coefficient of the current 𝑋𝑡, is correspondingly the short-run, or impact, multiplier).
7.2 Reasons for Lags
Psychological reasons: habits (inertia) take time to change, e.g. people do not change their consumption habits immediately following a price or income increase or decrease.
Technological reasons: people wait (a gestation period) for improvements before they buy a product. Another example: if a drop in price is expected to be temporary, firms may not rush to substitute capital for labour.
Institutional reasons: contractual obligations may prevent firms (institutions) from making immediate changes, such as switching suppliers of raw materials or even labour.
7.3 Estimation of Distributed-Lag Models
𝑌𝑡 = 𝛼 + 𝛽0𝑋𝑡 + 𝛽1𝑋𝑡−1 + 𝛽2𝑋𝑡−2 + ⋯ + 𝜇𝑡 is called an infinite (lag) distributed-lag model; if we specify how far back into the past we go, the model is called a finite (lag) distributed-lag model.
7.3.1 Ad Hoc Estimation of Distributed-Lag Model
Assumption: the explanatory variable 𝑋𝑡 is assumed to be nonstochastic.
Procedure:
Run OLS sequentially
• First regress 𝑌𝑡 on 𝑋𝑡 .
• Regress 𝑌𝑡 on 𝑋𝑡 and 𝑋𝑡−1 .
• Regress 𝑌𝑡 on 𝑋𝑡, 𝑋𝑡−1 and 𝑋𝑡−2, and so on, until the regression coefficients of the lagged variables start to become statistically insignificant or the coefficient of at least one of the variables changes sign (from + to − or from − to +). A SAS sketch follows the example below.
e.g.
𝑌̂𝑡 = 8.37 + 0.171𝑋𝑡
𝑌̂𝑡 = 8.27 + 0.111𝑋𝑡 + 0.064𝑋𝑡−1 .
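A minimal SAS sketch of this sequential procedure, assuming an input data set mydata containing the variables Y and X (both names are hypothetical):

data adhoc;
   set mydata;        /* hypothetical data set containing Y and X */
   x1 = lag1(X);      /* X(t-1) */
   x2 = lag2(X);      /* X(t-2) */
run;

proc reg data=adhoc;
   model Y = X;         /* step 1 */
   model Y = X x1;      /* step 2 */
   model Y = X x1 x2;   /* step 3: keep adding lags until they turn
                           insignificant or a coefficient changes sign */
run;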
7.3.2 The Koyck Approach to Distributed-Lag Models
Suppose
𝑌𝑡 = 𝛼 + 𝛽0𝑋𝑡 + 𝛽1𝑋𝑡−1 + 𝛽2𝑋𝑡−2 + ⋯ + 𝜇𝑡, (7.1)
where 𝜇𝑡 ~ 𝑁(0, 𝜎²).
Assuming that the 𝛽′𝑠 are all of the same sign, Koyck assumes that the 𝛽′𝑠 decline geometrically as follows:
𝛽𝑘 = 𝛽0𝜆^𝑘, 𝑘 = 0, 1, 2, …, (7.2)
where 𝜆, such that 0 < 𝜆 < 1, is known as the rate of decline, or decay, of the distributed lag, and 1 − 𝜆 is known as the speed of adjustment.
(7.2) postulates that each successive 𝛽 coefficient is numerically less than each preceding 𝛽; e.g. with 𝛽0 = 0.4 and 𝜆 = 0.5, the lag coefficients are 0.4, 0.2, 0.1, 0.05, ….
Since 𝛽𝑘 = 𝛽0 𝜆𝑘
⇒ 𝛽0 = 𝛽0 𝜆0 = 𝛽0,
𝛽1 = 𝜆𝛽0,
𝛽2 = 𝜆2 𝛽0,
𝛽3 = 𝜆3 𝛽0.
∴ (7.1) becomes
𝑌𝑡 = 𝛼 + 𝛽0𝑋𝑡 + 𝛽0𝜆𝑋𝑡−1 + 𝛽0𝜆²𝑋𝑡−2 + ⋯ + 𝜇𝑡. (7.3)
Lag (7.3) by one period:
𝑌𝑡−1 = 𝛼 + 𝛽0𝑋𝑡−1 + 𝛽0𝜆𝑋𝑡−2 + 𝛽0𝜆²𝑋𝑡−3 + ⋯ + 𝜇𝑡−1. (7.4)
Multiply (7.4) by 𝜆:
𝜆𝑌𝑡−1 = 𝜆𝛼 + 𝛽0𝜆𝑋𝑡−1 + 𝛽0𝜆²𝑋𝑡−2 + 𝛽0𝜆³𝑋𝑡−3 + ⋯ + 𝜆𝜇𝑡−1. (7.5)
Subtract (7.5) from (7.3):
𝑌𝑡 − 𝜆𝑌𝑡−1 = 𝛼(1 − 𝜆) + 𝛽0𝑋𝑡 + 𝜇𝑡 − 𝜆𝜇𝑡−1,
𝑌𝑡 = 𝛼(1 − 𝜆) + 𝛽0𝑋𝑡 + 𝜆𝑌𝑡−1 + 𝑣𝑡, (7.6)
where 𝑣𝑡 = 𝜇𝑡 − 𝜆𝜇𝑡−1, a moving average of 𝜇𝑡 and 𝜇𝑡−1.
Note that:
The Koyck approach:
• Converts distributed lag models to autoregressive models.
• Reduces the number of parameters to be estimated to three: 𝛼, 𝛽0 and 𝜆.
• 𝑌𝑡−1 replaces the lagged 𝑋′𝑠, removing the multicollinearity among 𝑋𝑡−1, 𝑋𝑡−2, ….
Remarks
Since 𝑌𝑡−1 is stochastic like 𝑌𝑡, we must check whether this stochastic explanatory variable is distributed independently of the stochastic error term 𝑣𝑡.
If an explanatory variable in a regression model is correlated with the stochastic error term, the OLS estimators are not only biased but also inconsistent, i.e. even if the sample size is increased indefinitely, the estimators do not approach their true population values. Therefore, estimation of the Koyck model by the usual OLS may yield seriously misleading results.
Since 𝑣𝑡 = 𝜇𝑡 − 𝜆𝜇𝑡−1,
𝐸(𝑣𝑡𝑣𝑡−1) = 𝐸[(𝜇𝑡 − 𝜆𝜇𝑡−1)(𝜇𝑡−1 − 𝜆𝜇𝑡−2)] = −𝜆𝐸(𝜇𝑡−1²) = −𝜆𝜎².
Therefore, 𝑣𝑡 is serially correlated. Moreover, since 𝑌𝑡−1 appears in the Koyck model as an explanatory variable, it is bound to be correlated with 𝑣𝑡:
𝐶𝑜𝑣(𝑌𝑡−1, 𝜇𝑡 − 𝜆𝜇𝑡−1) = −𝜆𝜎² ≠ 0.
The presence of 𝑌𝑡−1 violates the assumptions of the Durbin–Watson d test; Durbin's h test (introduced below) is an alternative test for serial correlation in this setting.
To estimate the Koyck model consistently, the most popular method is the method of instrumental variables. The instrumental variable is a proxy for the lagged regressand, with the property that it is uncorrelated with the error term.
Koyck Practical Example
The Model:
𝑌𝑡 = 𝛼(1 − 𝜆) + 𝛽0 𝑋𝑡 + 𝜆𝑌𝑡−1 + 𝑣𝑡
data koyck;
   input Y X;
   z = lag(Y);   /* Y(t-1), the lagged regressand */
   datalines;
8776 9685
8873 9735
8873 9901
9170 10227
9412 10455
9839 11061
10331 11594
10793 12065
10994 12457
11510 12892
11820 13163
11955 13563
12256 14001
12868 14512
13371 15345
13148 15094
13320 15291
13919 15738
14364 16128
14837 16704
15030 16931
14816 16940
14879 17217
14944 17418
15656 17828
16343 19011
17040 19476
17570 19906
17994 20072
18554 20740
18898 21120
19067 21281
18848 21109
19208 21548
19593 21493
20082 21812
20382 22153
20835 22546
21365 23065
22183 24131
23050 24564
23860 24469
24205 25687
24612 26217
25043 26535
24711 27232
25277 27436
26828 28005
;
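run;

The procedure step that produced the output summarized below is not shown in the notes; a minimal PROC REG sketch that fits the model above:

proc reg data=koyck;
   model Y = X z;   /* fits Y(t) = alpha(1-lambda) + beta0*X(t) + lambda*Y(t-1) */
run;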
SAS OUTPUT
[Analysis of Variance and Parameter Estimates tables omitted; the relevant estimates, used below, are the coefficient of 𝑋𝑡, 𝛽̂0 ≈ 0.2400, and the coefficient of 𝑌𝑡−1, 𝜆̂ = 0.7680.]
• The estimated value of 𝜆 is 0.7680, which can be used to compute the lag coefficients.
• If 𝛽0 ≈ 0.2400, then 𝛽1 ≈ (0.2400)(0.7680) ≈ 0.1843, 𝛽2 ≈ (0.2400)(0.7680)² ≈ 0.1416, and so on; these are the short- and medium-term multipliers.
• The long-run multiplier, that is, the total impact of a change in income on consumption after all lagged effects are taken into account, is
∑_{𝑘=0}^{∞} 𝛽𝑘 = 𝛽0 (1/(1 − 𝜆)) = 0.2400 (1/(1 − 0.7680)) ≈ 1.0344.
This means that a sustained increase of one rand in PPDI will eventually lead to an increase of about R1,03 in PPCE, the immediate, or short-run, impact being about 24 cents.
• The long-run consumption function can be written as:
𝑃𝑃𝐶𝐸𝑡 = −1132.58 + 1.0344𝑃𝑃𝐷𝐼𝑡
This is obtained by dividing the short-run consumption function from the SAS output through by (1 − 𝜆̂) = 0.2320 on both sides and dropping the lagged 𝑃𝑃𝐶𝐸 term (e.g. the slope becomes 0.2400/0.2320 ≈ 1.0344). A small sketch computing the implied multipliers follows.
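A small SAS data-step sketch, using the estimates above, that computes the implied lag coefficients and the long-run multiplier:

data multipliers;
   b0 = 0.2400;  lambda = 0.7680;       /* estimates from the fit above */
   longrun = b0 / (1 - lambda);         /* total multiplier, about 1.0345 */
   do k = 0 to 5;
      beta_k = b0 * lambda**k;          /* short- and medium-term multipliers */
      output;
   end;
run;

proc print data=multipliers;
   var k beta_k longrun;
run;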
Rationalisation of the Koyck Model
The Koyck model is an ad hoc since it was obtained algebraically. It lacks any
theoretical underpinning. This gap is filled if we start from a different perspective.
The Adaptive Expectations Model
Suppose
𝑌𝑡 = 𝛽0 + 𝛽1 𝑋𝑡∗ + 𝑢𝑡 , (7.7)
where 𝑋𝑡∗ is the equilibrium, or expected, value of the explanatory variable.
Since 𝑋𝑡∗ the expectational variable is not directly observable, let us propose the
following hypothesis about how expectations are formed.
𝑋𝑡∗ − 𝑋𝑡−1∗ = 𝛾(𝑋𝑡 − 𝑋𝑡−1∗), (7.8)
where 𝛾, such that 0 < 𝛾 ≤ 1, is known as the coefficient of expectation. (7.8) is known as the adaptive expectations, progressive expectations or error-learning hypothesis.
(7.8) can be written as
𝑋𝑡∗ = 𝛾𝑋𝑡 + (1 − 𝛾)𝑋𝑡−1∗. (7.9)
Therefore
𝑌𝑡 = 𝛽0 + 𝛽1[𝛾𝑋𝑡 + (1 − 𝛾)𝑋𝑡−1∗] + 𝑢𝑡
= 𝛽0 + 𝛽1𝛾𝑋𝑡 + 𝛽1(1 − 𝛾)𝑋𝑡−1∗ + 𝑢𝑡. (7.10)
From (7.7),
𝑌𝑡−1 = 𝛽0 + 𝛽1𝑋𝑡−1∗ + 𝑢𝑡−1. (7.11)
Multiply by 1 − 𝛾:
(1 − 𝛾)𝑌𝑡−1 = (1 − 𝛾)𝛽0 + 𝛽1(1 − 𝛾)𝑋𝑡−1∗ + (1 − 𝛾)𝑢𝑡−1. (7.12)
(7.10) − (7.12):
𝑌𝑡 − (1 − 𝛾)𝑌𝑡−1 = 𝛽0 − (1 − 𝛾)𝛽0 + 𝛽1𝛾𝑋𝑡 + 𝑢𝑡 − (1 − 𝛾)𝑢𝑡−1,
𝑌𝑡 = 𝛾𝛽0 + 𝛽1𝛾𝑋𝑡 + (1 − 𝛾)𝑌𝑡−1 + 𝑣𝑡, (7.13)
where 𝑣𝑡 = 𝑢𝑡 − (1 − 𝛾)𝑢𝑡−1.
In practice, we fit equation (7.13); since the coefficient of the lagged 𝑌 estimates (1 − 𝛾), we obtain 𝛾̂ = 1 − (coefficient of 𝑌𝑡−1), and then easily compute 𝛽̂1 = (coefficient of 𝑋𝑡)/𝛾̂, as in the sketch below.
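A data-step sketch of this recovery; the fitted coefficient values below are purely illustrative placeholders, not estimates from any data set in these notes:

data recover;
   c_int  = 1.50;            /* fitted intercept        = gamma*beta0 (assumed) */
   c_x    = 0.30;            /* fitted coef of X        = beta1*gamma (assumed) */
   c_ylag = 0.60;            /* fitted coef of lagged Y = 1 - gamma   (assumed) */
   gamma = 1 - c_ylag;
   beta1 = c_x   / gamma;
   beta0 = c_int / gamma;
   put gamma= beta1= beta0=;  /* writes the implied structural parameters to the log */
run;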
The Stock Adjustment or Partial Adjustment Model
The adaptive expectations model is one way of rationalizing the Koyck model. Another rationalisation is called the stock adjustment, or partial adjustment, model. To illustrate this model, consider the flexible accelerator model of economic theory, which assumes that there is an equilibrium, optimal, desired or long-run amount of capital stock needed to produce a given output under a given state of technology, rate of interest, etc. For simplicity, assume this desired level of capital 𝑌𝑡∗ is a linear function of output 𝑋.
Consider
𝑌𝑡∗ = 𝛽0 + 𝛽1 𝑋𝑡 + 𝑢𝑡 , (7.14)
where 𝑌𝑡∗ is the desired level of capital and is not directly observable.
The partial adjustment hypothesis postulated by Nerlove is:
𝑌𝑡 − 𝑌𝑡−1 = 𝛿(𝑌𝑡∗ − 𝑌𝑡−1 ), (7.15)
where 𝛿, such that 0 < 𝛿 ≤ 1 is known as the coefficient of adjustment and where
𝑌𝑡 − 𝑌𝑡−1 =actual change and (𝑌𝑡∗ − 𝑌𝑡−1 ) =desired change.
But 𝑌𝑡 − 𝑌𝑡−1, the change in capital stock between two periods, is investment, 𝐼𝑡.
Therefore
(7.15) can be written as
𝐼𝑡 = 𝛿(𝑌𝑡∗ − 𝑌𝑡−1), (7.16)
(7.15) postulates that the actual change in stock (investment) in any given time period
𝑡 is some fraction 𝛿 of the desired change for that period.
But (7.15) can also be written as
𝑌𝑡 = 𝛿𝑌𝑡∗ + (1 − 𝛿)𝑌𝑡−1, (7.17)
i.e. observed capital stock at time t is a weighted average of the desired capital stock
at that time and the capital stock existing in the previous time period.
Show that combining (7.14) and (7.17) gives
𝑌𝑡 = 𝛿𝛽0 + 𝛿𝛽1 𝑋𝑡 + (1 − 𝛿)𝑌𝑡−1 + 𝛿𝑢𝑡 , (7.18)
This is the partial adjustment model.
Combination of Adaptive Expectations and Partial adjustment Models
Consider the following model:
𝑌𝑡∗ = 𝛽0 + 𝛽1 𝑋𝑡∗ + 𝑢𝑡 , (7.14)
where 𝑌𝑡∗ = desired stock of capital and 𝑋𝑡∗ =expected level of output. 𝑋𝑡∗ and 𝑌𝑡∗ are
not directly observable. Show that
𝑌𝑡 = 𝛽0𝛾𝛿 + 𝛽1𝛾𝛿𝑋𝑡 + [(1 − 𝛾) + (1 − 𝛿)]𝑌𝑡−1 − (1 − 𝛿)(1 − 𝛾)𝑌𝑡−2 + 𝛿[𝜇𝑡 − (1 − 𝛾)𝜇𝑡−1]
= 𝛼0 + 𝛼1𝑋𝑡 + 𝛼2𝑌𝑡−1 + 𝛼3𝑌𝑡−2 + 𝑣𝑡,
where 𝑣𝑡 = 𝛿[𝜇𝑡 − (1 − 𝛾)𝜇𝑡−1].
7.4 Estimation of Autoregressive Models
OLS will not work due to:
• Presence of stochastic explanatory variables.
• Possibility of serial correlation (𝑌𝑡−1 tends to be correlated with the error term).
Method of Instrumental Variables
Use a proxy variable for 𝑌𝑡−1 that is highly correlated with 𝑌𝑡−1 but uncorrelated with 𝑣𝑡, the error term in the Koyck or adaptive expectations model. Such a proxy is called an instrumental variable (IV). Liviatan suggests using 𝑋𝑡−1 as the instrumental variable for 𝑌𝑡−1 and further suggests that the parameters of
𝑌𝑡 = 𝛼0 + 𝛼1𝑋𝑡 + 𝛼2𝑌𝑡−1 + 𝑣𝑡, (7.19)
(the common form of all the models above) can be obtained by solving the normal equations:
∑ 𝑌𝑡 = 𝑛𝛼̂0 + 𝛼̂1 ∑ 𝑋𝑡 + 𝛼̂2 ∑ 𝑌𝑡−1 ,
∑ 𝑌𝑡 𝑋𝑡 = 𝛼̂0 ∑ 𝑋𝑡 + 𝛼̂1 ∑ 𝑋𝑡2 + 𝛼̂2 ∑ 𝑌𝑡−1 𝑋𝑡 ,
∑ 𝑌𝑡 𝑋𝑡−1 = 𝛼̂0 ∑ 𝑋𝑡−1 + 𝛼̂1 ∑ 𝑋𝑡 𝑋𝑡−1 + 𝛼̂2 ∑ 𝑌𝑡−1 𝑋𝑡−1 .
Try to find the OLS normal equations of (7.19) and compare them with the above. A 2SLS sketch follows.
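Liviatan's suggestion amounts to instrumental-variables (two-stage least squares) estimation with instruments 1, 𝑋𝑡 and 𝑋𝑡−1. A sketch using SAS/ETS PROC SYSLIN, reusing the koyck data set from the earlier example (variable names assumed):

data iv;
   set koyck;
   ylag = lag(Y);    /* Y(t-1), treated as endogenous */
   xlag = lag(X);    /* X(t-1), the instrument for Y(t-1) */
run;

proc syslin data=iv 2sls;
   endogenous ylag;
   instruments X xlag;
   model Y = X ylag;   /* alpha0, alpha1, alpha2 of (7.19) */
run;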
Detecting Autocorrelation in Autoregressive Models
The Durbin h test is a large-sample test of first-order autocorrelation in autoregressive models. The ℎ statistic is given by
ℎ = 𝜌̂ √( 𝑛 / (1 − 𝑛·𝑣𝑎𝑟(𝛼̂2)) ),
where
𝑛 is the sample size,
𝑣𝑎𝑟(𝛼̂2) is the variance of the coefficient of the lagged 𝑌 (= 𝑌𝑡−1) in (7.19), and
𝜌̂ is an estimate of the first-order serial correlation 𝜌.
For a large sample, under 𝐻0: 𝜌 = 0, the ℎ statistic follows the standard normal distribution:
ℎ ~asy 𝑁(0, 1).
In practice, 𝜌̂ can be obtained from the Durbin–Watson statistic 𝑑 as
𝜌̂ ≈ 1 − 𝑑/2.
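In SAS, Durbin's h is reported by PROC AUTOREG when the lagged dependent variable is flagged with the LAGDEP= option; a sketch reusing the iv data set constructed above:

proc autoreg data=iv;
   model Y = X ylag / lagdep=ylag;   /* prints Durbin h for H0: rho = 0 */
run;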
7.5 The Almon Approach to Distributed-Lag Models
Assumption:
The 𝛽′𝑠 are approximated by an 𝑚-degree polynomial in 𝑖, the length of the lag:
𝛽𝑖 = 𝑎0 + 𝑎1𝑖 + 𝑎2𝑖² + 𝑎3𝑖³ + ⋯ + 𝑎𝑚𝑖^𝑚, (7.20)
e.g.
𝛽𝑖 = 𝑎0 + 𝑎1𝑖 + 𝑎2𝑖² (quadratic, i.e. second-degree polynomial),
𝛽𝑖 = 𝑎0 + 𝑎1𝑖 + 𝑎2𝑖² + 𝑎3𝑖³ (third-degree polynomial).
Consider a finite distributed-lag model
𝑌𝑡 = 𝛼 + 𝛽0𝑋𝑡 + 𝛽1𝑋𝑡−1 + 𝛽2𝑋𝑡−2 + ⋯ + 𝛽𝑘𝑋𝑡−𝑘 + 𝜇𝑡, (7.21)
which can be written as
𝑌𝑡 = 𝛼 + ∑𝑘𝑖=0 𝛽𝑖 𝑋𝑡−𝑖 + 𝜇𝑡 , (7.22)
To explain how the Almon scheme operates let us assume that the 𝛽 ′ 𝑠 follow a
quadratic pattern:
𝛽𝑖 = 𝑎0 + 𝑎1 𝑖 + 𝑎2 𝑖 2, (7.23)
Substitute into (7.22)
𝑌𝑡 = 𝛼 + ∑𝑘𝑖=0(𝑎0 + 𝑎1 𝑖 + 𝑎2 𝑖 2 )𝑋𝑡−𝑖 + 𝜇𝑡 ,
Defining 𝑍0𝑡 = ∑_{𝑖=0}^{𝑘} 𝑋𝑡−𝑖, 𝑍1𝑡 = ∑_{𝑖=0}^{𝑘} 𝑖𝑋𝑡−𝑖 and 𝑍2𝑡 = ∑_{𝑖=0}^{𝑘} 𝑖²𝑋𝑡−𝑖, this becomes
𝑌𝑡 = 𝛼 + 𝑎0𝑍0𝑡 + 𝑎1𝑍1𝑡 + 𝑎2𝑍2𝑡 + 𝜇𝑡,
which can be estimated by OLS. The original lag coefficients are then recovered from 𝛽̂𝑖 = 𝑎̂0 + 𝑎̂1𝑖 + 𝑎̂2𝑖²:
𝛽̂0 = 𝑎̂0,
𝛽̂1 = 𝑎̂0 + 𝑎̂1 + 𝑎̂2,
𝛽̂2 = 𝑎̂0 + 2𝑎̂1 + 4𝑎̂2,
⋮
𝛽̂𝑘 = 𝑎̂0 + 𝑘𝑎̂1 + 𝑘²𝑎̂2.
Before we apply the Almon technique, we must resolve the following practical
problems.
• The maximum length of the lag, 𝑘, must be specified in advance. One can use the Akaike or Schwarz information criterion to choose an appropriate lag length.
• Having specified 𝑘, we must also specify the degree of the polynomial, 𝑚. Generally, the degree of the polynomial should be at least one more than the number of turning points in the curve relating 𝛽𝑖 to 𝑖.
• Once 𝑚 and 𝑘 are specified, the 𝑍s can easily be constructed (see the sketch after this list). For instance, if 𝑚 = 2 and 𝑘 = 5, then the 𝑍s are
𝑍0𝑡 = ∑_{𝑖=0}^{5} 𝑋𝑡−𝑖 = 𝑋𝑡 + 𝑋𝑡−1 + 𝑋𝑡−2 + 𝑋𝑡−3 + 𝑋𝑡−4 + 𝑋𝑡−5,
𝑍1𝑡 = ∑_{𝑖=0}^{5} 𝑖𝑋𝑡−𝑖 = 𝑋𝑡−1 + 2𝑋𝑡−2 + 3𝑋𝑡−3 + 4𝑋𝑡−4 + 5𝑋𝑡−5,
𝑍2𝑡 = ∑_{𝑖=0}^{5} 𝑖²𝑋𝑡−𝑖 = 𝑋𝑡−1 + 4𝑋𝑡−2 + 9𝑋𝑡−3 + 16𝑋𝑡−4 + 25𝑋𝑡−5.
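A SAS sketch of the Almon procedure for 𝑚 = 2, 𝑘 = 5, constructing the 𝑍s by hand; the input data set mydata with variables Y and X is assumed:

data almon;
   set mydata;                      /* hypothetical Y, X series */
   z0 = X + lag1(X) + lag2(X) + lag3(X) + lag4(X) + lag5(X);
   z1 = lag1(X) + 2*lag2(X) + 3*lag3(X) + 4*lag4(X) + 5*lag5(X);
   z2 = lag1(X) + 4*lag2(X) + 9*lag3(X) + 16*lag4(X) + 25*lag5(X);
run;

proc reg data=almon;
   model Y = z0 z1 z2;   /* OLS estimates of a0, a1, a2 in (7.23) */
run;

The first five observations are lost to the lags (PROC REG drops them automatically), and the 𝛽̂𝑖 are recovered as 𝛽̂𝑖 = 𝑎̂0 + 𝑎̂1𝑖 + 𝑎̂2𝑖². SAS/ETS also offers PROC PDLREG, where model Y = X(5,2); fits the same polynomial distributed lag directly.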
7.6 Bayesian Analysis of the Geometric Distributed-Lag Model
Consider the model
𝑌𝑡 = 𝛼 ∑_{𝑖=0}^{∞} 𝜆^𝑖 𝑋𝑡−𝑖 + 𝜇𝑡, −∞ < 𝛼 < ∞, 0 ≤ 𝜆 < 1,
which can be written out as
𝑌𝑡 = 𝛼𝑋𝑡 + 𝛼𝜆𝑋𝑡−1 + 𝛼𝜆2 𝑋𝑡−2 + ⋯ + 𝜇𝑡 , (7.26)
𝑌𝑡−1 = 𝛼𝑋𝑡−1 + 𝛼𝜆𝑋𝑡−2 + 𝛼𝜆2 𝑋𝑡−3 + ⋯ + 𝜇𝑡−1 , (7.27)
Multiply (7.27) by 𝜆
𝜆𝑌𝑡−1 = 𝛼𝜆𝑋𝑡−1 + 𝛼𝜆2 𝑋𝑡−2 + 𝛼𝜆3 𝑋𝑡−3 + ⋯ + 𝜆𝜇𝑡−1 , (7.28)
Subtract equation (7.28) from (7.26):
𝑌𝑡 − 𝜆𝑌𝑡−1 = 𝛼𝑋𝑡 + 𝜇𝑡 − 𝜆𝜇𝑡−1,
∴ 𝑌𝑡 = 𝛼𝑋𝑡 + 𝜆𝑌𝑡−1 + 𝜇𝑡 − 𝜆𝜇𝑡−1,
so that 𝜇𝑡 − 𝜆𝜇𝑡−1 = 𝑌𝑡 − 𝛼𝑋𝑡 − 𝜆𝑌𝑡−1.
𝐸(𝜇𝑡 − 𝜆𝜇𝑡−1) = 0, since 𝐸(𝜇𝑡) = 0,
𝑉𝑎𝑟(𝜇𝑡 − 𝜆𝜇𝑡−1) = 𝜎² + 𝜆²𝜎² = (1 + 𝜆²)𝜎²,
𝐶𝑜𝑣(𝜇𝑡, 𝜇𝑡−𝑖) = 0 for 𝑖 > 0.
∴ 𝜇𝑡 − 𝜆𝜇𝑡−1 ~ 𝑁(0, (1 + 𝜆²)𝜎²), or, stacking the 𝑇 observations in vector form,
𝝁 − 𝜆𝝁−𝟏 ~ 𝑁(𝟎, 𝜎²𝐺).
What is 𝐺? (From the moments above, 𝐺 is the 𝑇 × 𝑇 band matrix with 1 + 𝜆² on the main diagonal, −𝜆 on the first sub- and superdiagonals, and zeros elsewhere.)
To find the posterior of 𝜆 and 𝛼 we first find the likelihood.
𝑃(𝒚|𝜆, 𝛼, 𝜎, 𝑦0) ∝ (|𝐺|^(−1/2)/𝜎^𝑇) 𝐸𝑥𝑝[−(1/(2𝜎²))(𝒚 − 𝜆𝒚−𝟏 − 𝛼𝒙)′𝐺^(−1)(𝒚 − 𝜆𝒚−𝟏 − 𝛼𝒙)]. (7.30)
With the noninformative prior 𝑃(𝜆, 𝛼, 𝜎) ∝ 1/𝜎, the posterior is
𝑃(𝜆, 𝛼, 𝜎|𝒚, 𝑦0) ∝ (1/𝜎) · (|𝐺|^(−1/2)/𝜎^𝑇) 𝐸𝑥𝑝[−(1/(2𝜎²))(𝒚 − 𝜆𝒚−𝟏 − 𝛼𝒙)′𝐺^(−1)(𝒚 − 𝜆𝒚−𝟏 − 𝛼𝒙)]
= (|𝐺|^(−1/2)/𝜎^(𝑇+1)) 𝐸𝑥𝑝[−(1/(2𝜎²))(𝒚 − 𝜆𝒚−𝟏 − 𝛼𝒙)′𝐺^(−1)(𝒚 − 𝜆𝒚−𝟏 − 𝛼𝒙)]. (7.31)
Let 𝐴 = ½(𝒚 − 𝜆𝒚−𝟏 − 𝛼𝒙)′𝐺^(−1)(𝒚 − 𝜆𝒚−𝟏 − 𝛼𝒙) and 𝑢 = 1/𝜎²,
⇒ 𝜎 = 𝑢^(−1/2) and |𝑑𝜎/𝑑𝑢| = ½𝑢^(−3/2), i.e. 𝑑𝜎 = ½𝑢^(−3/2)𝑑𝑢.
Integrating 𝜎 out of (7.31):
𝑃(𝜆, 𝛼|𝒚, 𝑦0) = ½|𝐺|^(−1/2) ∫_0^∞ 𝑢^(𝑇/2 − 1) 𝑒^(−𝐴𝑢) 𝑑𝑢 = ½|𝐺|^(−1/2) Γ(𝑇/2)/𝐴^(𝑇/2),
𝑃(𝜆, 𝛼|𝒚, 𝑦0) ∝ |𝐺|^(−1/2) 2^(𝑇/2) / [(𝒚 − 𝜆𝒚−𝟏 − 𝛼𝒙)′𝐺^(−1)(𝒚 − 𝜆𝒚−𝟏 − 𝛼𝒙)]^(𝑇/2),
∴ 𝑃(𝜆, 𝛼|𝒚, 𝑦0) ∝ |𝐺|^(−1/2) [(𝒚 − 𝜆𝒚−𝟏 − 𝛼𝒙)′𝐺^(−1)(𝒚 − 𝜆𝒚−𝟏 − 𝛼𝒙)]^(−𝑇/2). (7.32)
As an elaboration of 𝑌𝑡 = 𝛼 ∑_{𝑖=0}^{∞} 𝜆^𝑖 𝑋𝑡−𝑖 + 𝜇𝑡, we may entertain the hypothesis that our data are generated by (7.34) below. Here we assume that the response to the current and lagged disturbance terms takes the same form as that to the current and lagged 𝑋′𝑠 and involves the same parameter 𝜆. Subtraction of 𝜆𝑌𝑡−1 from both sides yields
𝑌𝑡 = 𝜆𝑌𝑡−1 + 𝛼𝑋𝑡 + 𝜇𝑡, (7.33)
as can be shown as follows.
Now if
𝑌𝑡 = 𝛼 ∑_{𝑖=0}^{∞} 𝜆^𝑖 𝑋𝑡−𝑖 + ∑_{𝑖=0}^{∞} 𝜆^𝑖 𝜇𝑡−𝑖, (7.34)
with 𝜇𝑡 = 𝜌𝜇𝑡−1 + 𝜀𝑡 .
𝑌𝑡 = 𝛼𝑋𝑡 + 𝛼𝜆𝑋𝑡−1 + 𝛼𝜆2 𝑋𝑡−2 + ⋯ + 𝜇𝑡 + 𝜆𝜇𝑡−1 + 𝜆2 𝜇𝑡−2 + ⋯ (7.35)
𝑌𝑡−1 = 𝛼𝑋𝑡−1 + 𝛼𝜆𝑋𝑡−2 + 𝛼𝜆2 𝑋𝑡−3 + ⋯ + 𝜇𝑡−1 + 𝜆𝜇𝑡−2 + 𝜆2 𝜇𝑡−3 + ⋯ (7.36)
Multiply (7.36) by 𝜆
𝜆𝑌𝑡−1 = 𝛼𝜆𝑋𝑡−1 + 𝛼𝜆2 𝑋𝑡−2 + 𝛼𝜆3 𝑋𝑡−3 + ⋯ + 𝜆𝜇𝑡−1 + 𝜆2 𝜇𝑡−2 + 𝜆3 𝜇𝑡−3 + ⋯ (7.37)
Equation (7.35) − (7.37),
𝑌𝑡 − 𝜆𝑌𝑡−1 = 𝛼𝑋𝑡 + 𝜇𝑡 ,
𝑌𝑡 = 𝜆𝑌𝑡−1 + 𝛼𝑋𝑡 + 𝜇𝑡 (7.38)
𝑌𝑡−1 = 𝜆𝑌𝑡−2 + 𝛼𝑋𝑡−1 + 𝜇𝑡−1 (7.39)
Multiply (7.39) by 𝜌:
𝜌𝑌𝑡−1 = 𝜆𝜌𝑌𝑡−2 + 𝛼𝜌𝑋𝑡−1 + 𝜌𝜇𝑡−1 (7.40)
Equation (7.38) − (7.40),
𝑌𝑡 − 𝜌𝑌𝑡−1 = 𝜆𝑌𝑡−1 − 𝜆𝜌𝑌𝑡−2 + 𝛼𝑋𝑡 − 𝛼𝜌𝑋𝑡−1 + 𝜇𝑡 − 𝜌𝜇𝑡−1
𝑌𝑡 = (𝜌+𝜆)𝑌𝑡−1 − 𝜆𝜌𝑌𝑡−2 + 𝛼(𝑋𝑡 − 𝜌𝑋𝑡−1 ) + 𝜀𝑡 (7.41)
Since 𝜀𝑡 = 𝜇𝑡 − 𝜌𝜇𝑡−1 ~𝑁(0, 𝜏 2 )
𝜀𝑡 = 𝑌𝑡 − (𝜌 + 𝜆)𝑌𝑡−1 + 𝜆𝜌𝑌𝑡−2 − 𝛼(𝑋𝑡 − 𝜌𝑋𝑡−1).
We can also find the posterior pdf of 𝜆, 𝜌 and 𝛼.
Posterior ∝ prior × likelihood
𝑃(𝜆, 𝛼, 𝜌, 𝜏|𝒚) ∝ (1/𝜏) · (1/𝜏^𝑇) 𝐸𝑥𝑝[−(1/(2𝜏²)) ∑_{𝑡=1}^{𝑇}(𝑌𝑡 − (𝜌 + 𝜆)𝑌𝑡−1 + 𝜆𝜌𝑌𝑡−2 − 𝛼(𝑋𝑡 − 𝜌𝑋𝑡−1))²]
= (1/𝜏^(𝑇+1)) 𝐸𝑥𝑝[−(1/(2𝜏²)) ∑_{𝑡=1}^{𝑇}(𝑌𝑡 − (𝜌 + 𝜆)𝑌𝑡−1 + 𝜆𝜌𝑌𝑡−2 − 𝛼(𝑋𝑡 − 𝜌𝑋𝑡−1))²]. (7.42)
Now
𝑃(𝜆, 𝛼, 𝜌|𝒚) = ∫_0^∞ 𝑃(𝜆, 𝛼, 𝜌, 𝜏|𝒚) 𝑑𝜏.
Let 𝐴 = ½ ∑_{𝑡=1}^{𝑇}(𝑌𝑡 − (𝜌 + 𝜆)𝑌𝑡−1 + 𝜆𝜌𝑌𝑡−2 − 𝛼(𝑋𝑡 − 𝜌𝑋𝑡−1))²,
𝑢 = 1/𝜏², 𝜏 = 𝑢^(−1/2) and |𝑑𝜏/𝑑𝑢| = ½𝑢^(−3/2), i.e. 𝑑𝜏 = ½𝑢^(−3/2)𝑑𝑢.
𝑃(𝜆, 𝛼, 𝜌|𝒚) = ½ ∫_0^∞ 𝑢^((𝑇+1)/2) 𝑢^(−3/2) 𝑒^(−𝐴𝑢) 𝑑𝑢 = ½ ∫_0^∞ 𝑢^(𝑇/2 − 1) 𝑒^(−𝐴𝑢) 𝑑𝑢 = ½ Γ(𝑇/2)/𝐴^(𝑇/2).
𝑃(𝜆, 𝛼, 𝜌|𝒚) ∝ Γ(𝑇/2)/𝐴^(𝑇/2) ∝ 𝐴^(−𝑇/2),
∴ 𝑃(𝜆, 𝛼, 𝜌|𝒚) ∝ [∑_{𝑡=1}^{𝑇}(𝑌𝑡 − (𝜌 + 𝜆)𝑌𝑡−1 + 𝜆𝜌𝑌𝑡−2 − 𝛼(𝑋𝑡 − 𝜌𝑋𝑡−1))²]^(−𝑇/2).
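Since (𝜆, 𝛼, 𝜌) is low-dimensional, this marginal posterior can be examined by brute-force grid evaluation. A minimal SAS/IML sketch, assuming the series sit in a data set bayes with variables Y and X (the data set name and grid ranges are illustrative):

proc iml;
   use bayes;
   read all var {Y} into y;
   read all var {X} into x;
   close bayes;
   n = nrow(y);  T = n - 2;  nm1 = n - 1;  nm2 = n - 2;
   best = -1e300;                          /* running maximum of the log kernel */
   do lam = 0.05 to 0.95 by 0.05;
      do rho = -0.90 to 0.90 by 0.05;
         do a = 0.05 to 1.00 by 0.05;
            /* residuals of (7.41) for t = 3..n */
            e = y[3:n] - (rho+lam)*y[2:nm1] + lam*rho*y[1:nm2]
                - a*(x[3:n] - rho*x[2:nm1]);
            logpost = -(T/2)*log(ssq(e));  /* log of the posterior kernel above */
            if logpost > best then do;
               best = logpost;
               mode = lam || rho || a;
            end;
         end;
      end;
   end;
   print mode[colname={"lambda" "rho" "alpha"}];
quit;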