Intervention analysis

ST 434/534

Donald E.K. Martin


North Carolina St. U.

November 26, 2022

Donald E.K. Martin (North Carolina St. U.) Intervention analysis November 26, 2022 1 / 37
Outline

1 Intervention analysis
Introduction
Models for intervention analysis
Example: air pollution in Los Angeles

2 Outlier analysis

Introduction

Time series are often affected by special events or circumstances such as policy changes, strikes, advertising promotions, environmental regulations, or other intervention events.

Here we describe a method of intervention analysis to quantify the expected (or average) effect of interventions.

Transfer functions are used in the analysis. The input series will be a simple pulse or step indicator function that marks the presence or absence of the intervention event.
Models for intervention analysis

It is assumed that an intervention event has occurred at a known point in time T of a time series.

Of interest is whether there is evidence of an effect on the time series $Y_t$ associated with the event.

The transfer function model

$$Y_t = \frac{\omega(B) B^b}{\delta(B)} \xi_t + N_t$$

will be entertained.
The first part,

$$\frac{\omega(B) B^b}{\delta(B)} \xi_t,$$

represents the effects of the deterministic input $\xi_t$.

$N_t$ is the noise series, which represents the background observed series $Y_t$ without the intervention effects.

It is assumed that $N_t$ follows an ARIMA(p, d, q) model.

A seasonal ARIMA model is also possible, but won't be specifically discussed here.
Two types of deterministic input variables

There are two types of deterministic input variables $\xi_t$ that have been found useful for representing the impact of intervention events on time series.

One type is a step function at time T, given by

$$S_t^{(T)} = \begin{cases} 0 & \text{if } t < T, \\ 1 & \text{if } t \ge T. \end{cases}$$

This could represent the effects of an intervention that is expected to persist to some extent after time T.
The other type is a pulse function, given by

$$P_t^{(T)} = \begin{cases} 0 & \text{if } t \ne T, \\ 1 & \text{if } t = T. \end{cases}$$

This could represent the effects of an intervention that are temporary and die out completely after time T.

Notice that $(1 - B) S_t^{(T)} = P_t^{(T)}$, and thus any transfer function model that involves $S_t^{(T)}$ could equally well be represented in terms of $P_t^{(T)}$.
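The identity $(1 - B) S_t^{(T)} = P_t^{(T)}$ can be checked numerically. The following is a minimal Python sketch, assuming time is indexed t = 0, 1, ..., n - 1 and that the series is zero before its start; the helper names step, pulse and difference are hypothetical, not part of any library.

```python
def step(n, T):
    """Step indicator S_t^(T) for t = 0..n-1: 0 before T, 1 from T on."""
    return [1 if t >= T else 0 for t in range(n)]


def pulse(n, T):
    """Pulse indicator P_t^(T) for t = 0..n-1: 1 only at t = T."""
    return [1 if t == T else 0 for t in range(n)]


def difference(x):
    """Apply (1 - B): x_t - x_{t-1}, treating x_{-1} as 0."""
    return [x[t] - (x[t - 1] if t > 0 else 0) for t in range(len(x))]


S = step(8, 3)
print(S)                             # [0, 0, 0, 1, 1, 1, 1, 1]
print(difference(S))                 # [0, 0, 0, 1, 0, 0, 0, 0]
print(difference(S) == pulse(8, 3))  # True
```

Running the identity backwards (cumulative summation, $(1 - B)^{-1}$) turns a pulse back into a step, which is why either input can be used.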

These indicator input variables are used when the effects of the intervention cannot be represented as the response to a quantitative variable, because no such variable exists or it is impossible to obtain measurements on it. If we had a measured variable representing the intervention effects, we could use that variable to quantify the effects.
Determining the transfer function model

Since $\xi_t$ is deterministic, it cannot be prewhitened to help determine the transfer function model.

Instead, the form is postulated based on the mechanisms that may cause an effect from the known event, and on the form of the change that would be expected.

It is useful to keep in mind the form of the effect implied by various basic transfer function models.
General responses
Various responses can be produced through different combinations of step and pulse inputs at different time points.

In general, the response is represented through the rational transfer function

$$\frac{\omega(B) B^b}{\delta(B)},$$

where $\omega(B)$ models the initial effects, $\delta(B)$ describes the behavior of the permanent effect of the interventions, and b is the time delay before effects are felt.

The roots of $\delta(B) = 0$ are assumed to be on or outside the unit circle. For a step input, a unit root represents an impact that increases linearly, whereas a root outside the unit circle represents a phenomenon that has a gradual effect.
Commonly encountered effects

$\omega B^b S_t^{(T)}$: This gives a permanent shift, that is, a spike of height ω at every time point $t \ge T + b$. For $T \le t < T + b$ the effect is still zero, because $B^b S_t^{(T)} = S_{t-b}^{(T)}$ looks back b periods, to a time before $S_t^{(T)} = 1$.

$\omega B^b P_t^{(T)}$ produces a single spike of height ω at $t = T + b$.
Since

$$\frac{\omega B^b}{1 - \delta B} = \omega B^b (1 + \delta B + \delta^2 B^2 + \cdots) = \omega (B^b + \delta B^{b+1} + \delta^2 B^{b+2} + \cdots),$$

the model

$$\frac{\omega B^b}{1 - \delta B} S_t^{(T)}$$

produces spikes with increasing heights (when $0 < \delta < 1$), beginning with a spike of height ω at time T + b and, as $t \to \infty$, converging to a spike of height $\omega / (1 - \delta)$. The heights would increase without limit if δ = 1.

On the other hand,

$$\frac{\omega B^b}{1 - \delta B} P_t^{(T)}$$

produces spikes with decreasing heights (when $0 < \delta < 1$), beginning with a spike of height ω at time T + b and decreasing exponentially as powers of δ: the height at time T + b + k is $\omega \delta^k$.
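These responses can be traced numerically with the recursion $m_t = \delta m_{t-1} + \omega \xi_{t-b}$, which is the same as $m_t = \frac{\omega B^b}{1 - \delta B} \xi_t$. The following is a minimal Python sketch, assuming zero initial conditions and the illustrative values ω = 2, δ = 0.5, b = 1 (not taken from any data set); the helper name first_order_response is hypothetical. Setting δ = 0 reduces it to the simpler effects $\omega B^b S_t^{(T)}$ and $\omega B^b P_t^{(T)}$.

```python
def first_order_response(xi, omega, delta, b):
    """Response of omega*B^b / (1 - delta*B) to the input xi.

    Recursion: m_t = delta*m_{t-1} + omega*xi_{t-b}, with m and xi
    taken as 0 before the start of the series.
    """
    m = []
    for t in range(len(xi)):
        prev = m[t - 1] if t > 0 else 0.0
        lagged = xi[t - b] if t - b >= 0 else 0.0
        m.append(delta * prev + omega * lagged)
    return m


n, T = 30, 5
step_in = [1.0 if t >= T else 0.0 for t in range(n)]
pulse_in = [1.0 if t == T else 0.0 for t in range(n)]
omega, delta, b = 2.0, 0.5, 1

step_resp = first_order_response(step_in, omega, delta, b)
pulse_resp = first_order_response(pulse_in, omega, delta, b)
ramp = first_order_response(step_in, omega, 1.0, b)  # delta = 1: unit root

print(step_resp[T + b], round(step_resp[-1], 4))  # 2.0 4.0 (limit omega/(1-delta))
print(pulse_resp[T + b:T + b + 4])                # [2.0, 1.0, 0.5, 0.25]
print(ramp[T + b:T + b + 4])                      # [2.0, 4.0, 6.0, 8.0]
```

The last line shows the linear growth that a unit root in $\delta(B)$ produces for a step input.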

Representing multiple intervention events

For multiple intervention inputs, we have the following general class of models:

$$Y_t = \sum_{j=1}^{k} \frac{\omega_j(B) B^{b_j}}{\delta_j(B)} I_{jt} + \frac{\theta(B)}{\phi(B)} a_t,$$

where the $I_{jt}$ are intervention variables, which can be step, pulse, or other indicator functions.

Of interest is estimating the $\omega_j(B)$ and $\delta_j(B)$ that describe the intervention effects.
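For this general class, the deterministic part can be computed by solving $\delta_j(B) m_{jt} = \omega_j(B) I_{j,t-b_j}$ recursively for each input and summing the results. Below is a hedged Python sketch; the coefficient conventions ($\omega(B) = w_0 + w_1 B + \cdots$ and $\delta(B) = 1 - d_1 B - d_2 B^2 - \cdots$), the helper name transfer_response, and the two interventions in the usage lines are illustrative assumptions.

```python
def transfer_response(xi, omega, delta, b):
    """Response m_t of omega(B) B^b / delta(B) applied to the input xi.

    omega = [w0, w1, ...]  means omega(B) = w0 + w1*B + ...
    delta = [d1, d2, ...]  means delta(B) = 1 - d1*B - d2*B^2 - ...
    Solves delta(B) m_t = omega(B) xi_{t-b} recursively, taking all
    values before the start of the series to be 0.
    """
    def at(seq, i):
        return seq[i] if i >= 0 else 0.0

    m = []
    for t in range(len(xi)):
        value = sum(w * at(xi, t - b - j) for j, w in enumerate(omega))
        value += sum(d * at(m, t - i) for i, d in enumerate(delta, start=1))
        m.append(value)
    return m


n = 24
step_at_4 = [1.0 if t >= 4 else 0.0 for t in range(n)]
pulse_at_12 = [1.0 if t == 12 else 0.0 for t in range(n)]

# Two hypothetical interventions: a permanent shift of -1 from t = 4, and
# a decaying effect (omega = 3, delta = 0.6) of a pulse at t = 12, delayed by b = 2.
effect = [u + v for u, v in zip(
    transfer_response(step_at_4, [-1.0], [], 0),
    transfer_response(pulse_at_12, [3.0], [0.6], 2),
)]
print([round(e, 3) for e in effect[12:18]])  # [-1.0, -1.0, 2.0, 0.8, 0.08, -0.352]
```

With delta = [0.0] * 11 + [1.0], the same helper represents the seasonal factor $1/(1 - B^{12})$ used in the ozone example.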

Air pollution intervention in Los Angeles: background

Los Angeles is known for its air pollution problem.

The problem comes from substances produced by chemical reactions in sunlight among primary pollutants such as oxides of nitrogen and reactive hydrocarbons.

The products of these chemical reactions are responsible for LA smog and cause health hazards. Ozone is one measured product of the photochemical pollution.

Different methods have been used to ease the problem through the years, including the diversion of traffic in early 1960 by the opening of the Golden State Freeway, and the inception of a new law (Rule 63) that reduced the allowable proportion of reactive hydrocarbons in the gasoline sold locally.

Also, after 1966 special regulations were implemented requiring engine design changes in new cars to reduce the production of ozone.

In a 1975 paper, Box and Tiao introduced intervention analysis to study the LA ozone data.
The data from 1955 to 1960 were assumed to be free from intervention effects and were used to estimate the noise model $N_t$. The sample ACF for those data suggested the model

$$(1 - B^{12}) N_t = (1 - \theta B)(1 - \Theta B^{12}) a_t.$$

Box and Tiao (1975) suggested that the opening of the Golden State Freeway and Rule 63 in 1960 constitute an intervention $I_1$ that may be expected to produce a step change in the ozone level.

A second intervention was the 1966 regulations requiring engine changes in new cars. Their effect would most accurately have been represented by the proportion of new cars with the specified engine changes over time. However, no such data were available, so proxy variables were used.

Because of the differences in the intensity of sunlight and other meteorological conditions between summer and winter months, the second effect was broken into two effects: $I_2$ for the summer months and $I_3$ for the winter months.
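The final model of Box and Tiao (1975) was

$$Z_t = \omega_1 I_{1t} + \frac{\omega_2}{1 - B^{12}} I_{2t} + \frac{\omega_3}{1 - B^{12}} I_{3t} + \frac{(1 - \theta B)(1 - \Theta B^{12})}{1 - B^{12}} a_t,$$

where

$$I_{1t} = \begin{cases} 0 & \text{if } t < \text{Jan 1960}, \\ 1 & \text{if } t \ge \text{Jan 1960}, \end{cases}$$

$$I_{2t} = \begin{cases} 1 & \text{for summer months June-Oct beginning in 1966}, \\ 0 & \text{otherwise}; \end{cases}$$

$$I_{3t} = \begin{cases} 1 & \text{for winter months Nov-May beginning in 1966}, \\ 0 & \text{otherwise}. \end{cases}$$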
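Multiplying both sides of the final model by $(1 - B^{12})$ gives an equivalent form that matches how the inputs are handled in the SAS code: the ozone series and x1 are seasonally differenced, while summer and winter enter undifferenced. This is a short derivation from the model above, not an additional result reported by Box and Tiao (1975).

```latex
(1 - B^{12}) Z_t = \omega_1 (1 - B^{12}) I_{1t} + \omega_2 I_{2t} + \omega_3 I_{3t}
                   + (1 - \theta B)(1 - \Theta B^{12}) a_t
```

Here $(1 - B^{12}) I_{1t} = I_{1t} - I_{1,t-12}$ equals 1 for January through December 1960 and 0 otherwise.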
SAS code for determining the estimated coefficients is as follows:
title1 'Intervention Data for Ozone Concentration';
title2 '(Box and Tiao, JASA 1975 P.70)';
data air;
   input ozone @@;
   label ozone  = 'Ozone Concentration'
         x1     = 'Intervention for post 1960 period'
         summer = 'Summer Months Intervention'
         winter = 'Winter Months Intervention';
   /* Monthly dates starting in January 1955; _n_ counts observations */
   date = intnx( 'month', '31dec1954'd, _n_ );
   format date monyy.;
   month = month( date );
   year = year( date );
   x1 = year >= 1960;                              /* step at Jan 1960   */
   summer = ( 5 < month < 11 ) * ( year > 1965 );  /* Jun-Oct from 1966  */
   winter = ( year > 1965 ) - summer;              /* Nov-May from 1966  */
datalines;
2.63
...
;
run;

proc sgplot data=air;
   title ' ';
   series x=date y=ozone;
run;

proc arima data=air;
   /* Identify and seasonally difference the ozone series */
   identify var=ozone(12)
            crosscorr=( x1(12) summer winter );

   /* First fit a multiple regression on the seasonally differenced */
   /* series (no ARMA terms) by the maximum likelihood method       */
   estimate input=( x1 summer winter )
            noconstant method=ml;

   /* Then fit a multiple regression with a product MA model */
   /* by the maximum likelihood method                       */
   estimate q=(1)(12) input=( x1 summer winter )
            noconstant method=ml;

   /* Forecast 12 months ahead */
   forecast lead=12 id=date interval=month;
run;
LA ozone data 1955-1972

Figure 1.

Trend and correlation analysis of residual seasonally differenced LA ozone data 1955-1972
Estimated effects

Table 1

Parameter  Estimate   Std Error  t Value  Approx Pr > |t|  Lag  Variable  Shift
MA1,1      -0.26872   0.06698    -4.01    <.0001           1    ozone     0
MA2,1       0.77348   0.05947    13.01    <.0001           12   ozone     0
NUM1       -1.22250   0.18242    -6.70    <.0001           0    x1        0
NUM2       -0.22545   0.05661    -3.98    <.0001           0    summer    0
NUM3       -0.08081   0.04718    -1.71    0.0868           0    winter    0
Trend and correlation analysis of residual seasonally differenced LA ozone data 1955-1972 after q=(1)(12) model fit
Results

A model with moving average factors at lags 1 and 12 was fit, based on the spikes in the autocorrelation function after a seasonal difference.

The model fit appears to be adequate.

It appears that intervention $I_1$ reduces ozone significantly (NUM1 = -1.22250, t = -6.70).

There is a significant annual reduction in the summer months (NUM2 = -0.22545, t = -3.98), but much less of one in the winter months (NUM3 = -0.08081, t = -1.71, Approx Pr > |t| = 0.0868).
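Because $I_{2t}$ enters through $\omega_2 / (1 - B^{12})$, the estimated summer effect accumulates: in the m-th summer after the regulations began (m = 1 for 1966), the implied shift is $m\omega_2$. The Python sketch below assumes the monthly indexing of the SAS data step (index 0 = January 1955); the helper names are hypothetical, and the only numbers used are the NUM2 estimate and standard error from Table 1.

```python
START_YEAR = 1955  # index 0 = January 1955, as in the SAS data step


def summer_indicator(n_months):
    """I_2t: 1 for June-October from 1966 onward, 0 otherwise."""
    out = []
    for t in range(n_months):
        year, month = START_YEAR + t // 12, t % 12 + 1
        out.append(1.0 if 6 <= month <= 10 and year >= 1966 else 0.0)
    return out


def seasonal_accumulation(x, omega, s=12):
    """Apply omega / (1 - B^s) to x via m_t = m_{t-s} + omega * x_t."""
    m = []
    for t, value in enumerate(x):
        m.append((m[t - s] if t >= s else 0.0) + omega * value)
    return m


def july(year):
    """Index of July of the given year."""
    return 12 * (year - START_YEAR) + 6


omega2, se2 = -0.22545, 0.05661   # NUM2 (summer) from Table 1
n = 12 * (1972 - START_YEAR + 1)  # January 1955 .. December 1972
effect = seasonal_accumulation(summer_indicator(n), omega2)

print(round(effect[july(1965)], 5))  # 0.0: before the regulations
print(round(effect[july(1966)], 5))  # -0.22545: first summer
print(round(effect[july(1972)], 5))  # -1.57815: seventh summer
print(round(omega2 / se2, 2))        # -3.98, the t value in Table 1
```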

Models for outlier analysis

We consider two simple intervention models to represent two types of outliers that occur in practice: additive outliers (AO) and innovational outliers (IO).

Let $N_t$ denote the underlying time series process that is free of the impact of outliers, and let $Y_t$ denote the observed time series.

It is assumed that $N_t$ follows the ARIMA(p, d, q) model

$$\phi(B) N_t = \theta(B) a_t.$$
Then an additive outlier at time T is modeled as

$$Y_t = \omega P_t^{(T)} + N_t = \omega P_t^{(T)} + \frac{\theta(B)}{\phi(B)} a_t.$$

An innovational outlier at time T is modeled as

$$Y_t = \frac{\theta(B)}{\phi(B)} \left( \omega P_t^{(T)} + a_t \right) = \omega \frac{\theta(B)}{\phi(B)} P_t^{(T)} + N_t.$$
An AO affects the level of the observed time series only at time T, by an unknown amount ω, while an IO represents an extraordinary random shock at time T that affects all succeeding observations $Y_T, Y_{T+1}, \ldots$ through the dynamics of the system described by the model for $N_t$.
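The contrast is easiest to see when $N_t$ is an AR(1) process, so that $\theta(B) = 1$ and $\phi(B) = 1 - \phi B$. The minimal Python sketch below computes only the deterministic part that each outlier type adds to $N_t$; the values ω = 4, φ = 0.5 and T = 3 are illustrative assumptions, and outlier_effect is a hypothetical helper.

```python
def outlier_effect(n, T, omega, phi, kind):
    """Deterministic part added to N_t by one outlier at time T,
    assuming AR(1) noise (1 - phi*B) N_t = a_t.

    AO: omega * P_t^(T)
    IO: omega / (1 - phi*B) * P_t^(T), i.e. e_t = phi*e_{t-1} + omega*P_t
    """
    if kind not in ("AO", "IO"):
        raise ValueError("kind must be 'AO' or 'IO'")
    effect = []
    for t in range(n):
        pulse = 1.0 if t == T else 0.0
        if kind == "AO":
            effect.append(omega * pulse)
        else:
            prev = effect[t - 1] if t > 0 else 0.0
            effect.append(phi * prev + omega * pulse)
    return effect


print(outlier_effect(8, 3, omega=4.0, phi=0.5, kind="AO"))
# [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0]
print(outlier_effect(8, 3, omega=4.0, phi=0.5, kind="IO"))
# [0.0, 0.0, 0.0, 4.0, 2.0, 1.0, 0.5, 0.25]
```

The observed series would then be $Y_t = N_t + e_t$ for the corresponding effect $e_t$.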

Multiple outlier events at times $T_1, T_2, \ldots, T_k$ may be modeled using

$$Y_t = \sum_{j=1}^{k} \omega_j \nu_j(B) P_t^{(T_j)} + N_t,$$

where $\nu_j(B) = 1$ for an AO and $\nu_j(B) = \theta(B)/\phi(B)$ for an IO at time $T_j$.
Example of analysis of outliers

We illustrate an outlier analysis using Series C from the Box, Jenkins and Reinsel book.

The data are "uncontrolled" temperature readings taken every minute in a chemical process.

A plot of the data, along with the trend and correlation analysis, is shown on the next two slides.

Series C: Temperature readings every minute in a chemical process

Series C: Trend and correlation analysis for the raw data
Model fitting

Nonstationarity is a possibility here. An AR(2) (or AR(3)) model is suggested by the ACF and PACF plots. Running the augmented Dickey-Fuller test with one lagged difference (corresponding to an AR(2) model), the null hypothesis of a unit root is not quite rejected at the α = 0.05 level of significance (single-mean τ test, Pr < τ = 0.0556 in Table 2).

If, however, we fit an AR(2) model without differencing (which actually seems to fit well), the estimated parameters are $\phi_1 = 1.82068$ and $\phi_2 = -0.82079$. Their sum, 0.99989, is very close to one, indicating that a unit root is present. Also, the correlation between the parameter estimates is essentially one in magnitude. This strongly suggests that a difference should be taken.
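The unit-root argument can be checked from the reported estimates alone. This Python sketch finds the roots of the AR polynomial $1 - \phi_1 z - \phi_2 z^2$ and evaluates the large-sample correlation of $\hat\phi_1$ and $\hat\phi_2$ for an AR(2) model, $-\rho_1 = -\phi_1/(1 - \phi_2)$; plugging the estimates into that textbook formula is an assumption here, not output from SAS.

```python
import cmath

phi1, phi2 = 1.82068, -0.82079  # AR(2) fit without differencing

# Roots of phi(z) = 1 - phi1*z - phi2*z**2, written as a*z**2 + b*z + c = 0.
a, b, c = -phi2, -phi1, 1.0
disc = cmath.sqrt(b * b - 4 * a * c)
roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
moduli = sorted(abs(r) for r in roots)

# Large-sample correlation of the two estimates for an AR(2): -rho_1.
corr = -phi1 / (1 - phi2)

print(round(phi1 + phi2, 5))          # 0.99989
print([round(m, 4) for m in moduli])  # [1.0006, 1.2176]
print(round(corr, 4))                 # -0.9999
```

One root lies essentially on the unit circle, which is what taking a first difference removes.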

Augmented Dickey-Fuller unit root tests

Table 2

Type          Lags   ρ          Pr < ρ     τ       Pr < τ     F      Pr > F
Zero Mean     0      -0.3435    0.6041     -2.29   0.0217
              1      -0.5723    0.5539     -1.26   0.1901
Single Mean   0      -0.8976    0.8947     -0.53   0.8817     2.66   0.3924
              1      -14.8857   0.0384     -2.83   0.0556     4.55   0.0549
Trend         0      -0.9186    0.9889     -0.54   0.9810     0.19   0.9900
              1      -14.8760   0.1801     -2.82   0.1901     4.00   0.3775
Model fitting
After differencing, the trend and correlation analysis suggests an AR(1) model. The model that is fit is

$$(1 - 0.82017B)(1 - B) Y_t = a_t, \qquad \hat\sigma_a^2 = 0.018156.$$

The model fits well. Five outliers are detected.

Table 3

Obs   Type       Estimate   χ²      Approx Pr > χ²
58    Shift       0.70580   55.80   <.0001
163   Additive   -0.16580    9.18   0.0025
59    Shift       0.25381    7.22   0.0072
61    Shift      -0.25290    7.16   0.0074
41    Shift       0.25097    7.05   0.0079
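Expanding the fitted model $(1 - 0.82017B)(1 - B) Y_t = a_t$ shows that it is an AR(2) with the unit root imposed, and that its coefficients are close to the unrestricted estimates 1.82068 and -0.82079 obtained without differencing:

```latex
(1 - 0.82017B)(1 - B) = 1 - 1.82017B + 0.82017B^2
\quad\Longrightarrow\quad
Y_t = 1.82017\,Y_{t-1} - 0.82017\,Y_{t-2} + a_t
```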
SAS code, including adding in outliers (see the class Moodle page):
data seriesc;
   input time temp;
   cards;
1 26.6 ...
226 18.8
;
run;

proc sgplot data=seriesc;
   title ' ';
   series x=time y=temp;
run;

/* Exploratory steps (commented out): ADF tests, then an AR(1) fit */
/* to the differenced series with outlier detection               */
* proc arima data=seriesc;
*    identify var=temp stationarity=(adf=1); * run;
*    identify var=temp(1); * run;
*    estimate p=1 noint method=ml; * run;
*    outlier alpha=0.01; * run;

/* Indicators for the outlier times in Table 3: 1 at that minute, 0 otherwise */
data seriesc;
   set seriesc;
   if time = 163 then AO = 1;  else AO = 0;
   if time = 58  then LS1 = 1; else LS1 = 0;
   if time = 59  then LS2 = 1; else LS2 = 0;
   if time = 61  then LS3 = 1; else LS3 = 0;
   if time = 41  then LS4 = 1; else LS4 = 0;
run;
proc arima data=seriesc;
   /* First-difference temp and each indicator input */
   identify var=temp(1)
            crosscorr=( AO(1) LS1(1) LS2(1) LS3(1) LS4(1) )
            noprint;

   /* AR(1) for the differenced series plus the five indicator inputs */
   estimate p=1 noint input=( AO LS1 LS2 LS3 LS4 )
            method=ml plot;

   /* Check whether any outliers remain */
   outlier alpha=0.01;
run;

* forecast lead=36; * run;

quit;
Model fitting with outliers included

After introducing the outliers as inputs, the AR(1) parameter estimate shifts from 0.82017 to 0.86077.

$\hat\sigma_a^2$ is reduced from 0.018156 to 0.013906.

The model fits better, and no outliers are detected in the revised model.
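In relative terms, the reported values imply a reduction of about 23% in the estimated innovation variance:

```latex
\frac{0.018156 - 0.013906}{0.018156} = \frac{0.004250}{0.018156} \approx 0.234
```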
