Exercises 4: Autocorrelated Processes ARIMA Models

1) The qualitative analysis of the pH data showed non-stationarity, with a strong positive autocorrelation that decayed slowly. 2) Taking the first difference of the data resulted in a stationary time series with significant autocorrelation only at lag 1. 3) This indicates the pH process follows a random walk model, where the changes in pH at each time period are independent. The best fitting ARIMA model is (0,1,0), also known as a random walk.


EXERCISES 4

Autocorrelated processes
ARIMA models

ARIMA time-series models (EXTRA MATERIAL)

Quality Data Analysis


REMINDER (NON-RANDOM PROCESSES)

Distributional model (observations randomly drawn from a population):
$X \sim iid(\mu, \sigma^2)$
To make inference (e.g., hypothesis testing): $X \sim NID(\mu, \sigma^2)$

Time series model:
$Y = f(X, \theta) + \varepsilon$, with $\varepsilon \sim iid(0, \sigma^2)$
To make inference: $\varepsilon \sim NID(0, \sigma^2)$

Remember:
- Independence implies absence of autocorrelation.
- Absence of autocorrelation implies independence ONLY IF the data are normal.
REMINDER (NON-RANDOM PROCESSES)

Qualitative analysis of process data. Is it random?
- Is the overall process level constant over time?
- Is there a systematic pattern?
- Is the variation around the process level constant?
- Are there outliers?

Exercise 1
In a chemical process it is necessary to keep the pH of a compound constant.
Measurements are made every hour. Data acquired over the first 48 hours are
shown in the table (read from left to right and from top to bottom).

8.67 8.65 8.64 8.67 8.74 8.82 8.85 8.83 8.88 8.84 8.84 8.81 8.8 8.76 8.73 8.69
8.66 8.62 8.61 8.58 8.56 8.55 8.54 8.44 8.44 8.4 8.44 8.4 8.42 8.51 8.56 8.51
8.44 8.4 8.43 8.47 8.5 8.49 8.48 8.51 8.53 8.52 8.55 8.55 8.58 8.57 8.53 8.55

Identify and fit a model for the data.


In a future class:
a) Design a SCC control chart and a FVC control chart
b) Design an EWMA control chart for autocorrelated data
Exercise 1 (solution)

Qualitative (graphical) analysis and randomness tests.

[Figure: Time Series Plot of EXE1 — pH (8.4 to 8.9) vs. observation index (1 to 48)]
Exercise 1 (solution)

[Figure: Autocorrelation Function for EXE1 (with 5% significance limits), lags 1-12]
[Figure: Partial Autocorrelation Function for EXE1 (with 5% significance limits), lags 1-12]
[Figure: Scatterplot of EXE1 vs. lag1 — X(t) vs. X(t-1)]

- Strong positive correlation.
- Decay of the autocorrelation coefficients is not exponential.
- Based on the ACF analysis, we can state that the process is non-stationary.
EXERCISE 1 (SOLUTION)
REMINDER (previous class)

Autoregressive models: AR(p)
$X_t = \xi + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t$
- ACF "geometrically decays"
- PACF indicates the order p

Moving average models: MA(q)
$X_t = \mu - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q} + \varepsilon_t$
$\tilde{X}_t = X_t - \mu = -\theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q} + \varepsilon_t$
- PACF "geometrically decays"
- ACF indicates the order q
EXERCISE 1 (SOLUTION)
REMINDER

ARMA(p,q): both AR(p) and MA(q) terms are present


• It resembles an AR(p) after q lags
• Parsimony: low order models are preferred (easier to deal with)
• Model identification is often a trial and error problem

Homogeneous nonstationary ARMA (p,q) = ARIMA(p,d,q)

• ACF with slow (e.g. linear) decay, not a “geometrical decay”

Exercise 1 (solution)
The Box-Jenkins approach (ARIMA models):

1. Start from the time series x_t and compute the SACF (d = 0).
2. Is the series stationary? If not, set d := d + 1, apply the difference operator ∇^d, and go back to step 1.
3. Does the SACF decay exponentially? If yes, use the SACF and SPACF to choose an ARIMA model to test.
4. Estimate the coefficients and compute the residuals.
5. Are the residuals IID (NID)? If yes, STOP; if not, find a model for the residuals and iterate.
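The SACF step of the loop above can be sketched in plain Python. This is a minimal illustration, not Minitab's implementation; the helper name `sample_acf` is ours.

```python
# Minimal sketch of the SACF computation used in the Box-Jenkins loop.
def sample_acf(x, max_lag):
    """Sample autocorrelations r_k = c_k / c_0, where c_k is the
    lag-k sample autocovariance of the series x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    acf = []
    for k in range(1, max_lag + 1):
        ck = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf
```

For example, a perfectly alternating series gives a large negative r_1 (the signature of negative autocorrelation seen later in Exercise 4), while a trending series gives a slowly decaying positive SACF (the non-stationarity signature of Exercise 1).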
Exercise 1 (solution)

Apply the difference operator ∇:
MINITAB: STAT → TIME SERIES → DIFFERENCES

[Figure: Time Series Plot of DIFF1 — first differences (-0.10 to 0.10) vs. observation index (1 to 48)]
Exercise 1 (solution)

Apply the difference operator ∇:

[Figure: Autocorrelation Function for DIFF1 (with 5% significance limits), lags 1-12]

P-value = 0.046

The differences at lag 1 are NID.
The process is a RANDOM WALK:
$Y_t = Y_{t-1} + \varepsilon_t$
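The random-walk conclusion means the first difference recovers an i.i.d. noise sequence. A minimal sketch on a simulated walk (synthetic data, not the pH series; the starting level and noise scale are illustrative only):

```python
import random

# Sketch: for a random walk Y_t = Y_{t-1} + eps_t, the first
# difference exactly recovers the i.i.d. noise sequence.
random.seed(1)
eps = [random.gauss(0, 0.05) for _ in range(48)]

y = [8.67]                    # arbitrary starting level, like the pH series
for e in eps[1:]:
    y.append(y[-1] + e)       # random-walk accumulation

diff = [y[t] - y[t - 1] for t in range(1, len(y))]

# Differencing 48 points leaves 47 values, matching the exercise,
# and each difference equals the corresponding noise term.
assert len(diff) == 47
assert all(abs(d - e) < 1e-12 for d, e in zip(diff, eps[1:]))
```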
Exercise 1 (solution)

The model is adequate (normality is borderline: we
should reject at 95% but can accept at 99%).

But is it the only suitable model? Is it the best one?

E.g., we could try an AR(1).
Remember: parsimony.

Try with a regression model.


Exercise 1 (solution)

Let's try to remove the constant term.

Exercise 1 (solution)

[Minitab screenshot: uncheck this box to remove the constant term]
Exercise 1 (solution)

We have found the random walk model again!

Notice that the SSE is very close to 0, because almost
all the variability is explained by the regression
model (random walk). This is why R²(adj) is
approximately 100%.
Remember that DF(total) is now n, because we are not
including the mean term in the model. DF(residual)
is instead n − p = 47 − 1 = 46.

Exercise 1 (solution)

P-value: 0.048
Borderline condition:
reject at 95%, accept at 99%.
Exercise 1 (solution)

REMEMBER: Bartlett's test at lag 1.
Critical region:

- $\alpha = 5\%$: $|r_1| > \dfrac{z_{\alpha/2}}{\sqrt{n}} = \dfrac{1.96}{\sqrt{47}} = 0.286$; here $r_1 = 0.288 > 0.286$ (reject randomness).

- $\alpha = 1\%$: $|r_1| > \dfrac{z_{\alpha/2}}{\sqrt{n}} = \dfrac{2.576}{\sqrt{47}} = 0.376$; here $r_1 = 0.288 < 0.376$ (accept randomness).
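Bartlett's borderline case above can be checked in a few lines (the helper name `bartlett_limit` is ours):

```python
import math

# Bartlett's approximate bound for a sample autocorrelation under
# the white-noise hypothesis: |r_k| > z_{alpha/2} / sqrt(n) => significant.
def bartlett_limit(n, z):
    return z / math.sqrt(n)

n, r1 = 47, 0.288
lim_5 = bartlett_limit(n, 1.96)    # alpha = 5%
lim_1 = bartlett_limit(n, 2.576)   # alpha = 1%

# Borderline case from the exercise: reject randomness at 95%,
# but accept it at 99%.
assert r1 > lim_5 and r1 < lim_1
```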
Exercise 1 (solution)

In the random walk model, the lagged variable $Y_{t-1}$ is the estimate of $Y_t$.

[Figure: Time Series Plot of RESI — residuals (-0.10 to 0.10) vs. observation index (1 to 48)]

This is the time series plot of the residuals:
notice that they range from -0.1 to 0.1, which
means that their variability is very small.
This is why the SSE is very close to 0.
Exercise 1 (solution)
If we do not accept the random walk model (95% confidence), one possibility is to add one
additional term; e.g., LAG2
Exercise 2
In a process for the production of metal laminates, we collected 100 sequential
measurements of laminate width (time series 'A' from "Statistical Control by
Monitoring and Feedback Adjustment", Box & Luceño, J. Wiley).

Read from left to right and from top to bottom:

80 92 100 61 93 85 92 86 77 82 85 102 93 90 94
75 75 72 76 75 93 94 83 82 82 71 82 78 71 81
88 80 88 85 76 75 88 86 89 85 89 100 100 106 92
117 100 100 106 109 91 112 127 96 127 96 90 107 103 104
97 108 127 110 90 121 109 120 109 134 108 117 137 123 108
128 110 114 101 100 115 124 120 122 123 130 109 111 98 116
109 113 97 127 114 111 130 92 115 120

Identify and fit a model for the data.


In a future class:
a) Design a SCC control chart and a FVC control chart
b) Design an EWMA control chart for autocorrelated data
Exercise 2 (solution)

[Figure: Time Series Plot of EXE2 — width (60 to 140) vs. observation index (1 to 100)]
[Figure: Autocorrelation Function for EXE2 (with 5% significance limits), lags 2-24]

The process is non-stationary.
Exercise 2 (solution)

Apply the difference operator ∇.

[Figure: Time Series Plot of DIFF — first differences (-40 to 40) vs. observation index (1 to 100)]
Exercise 2 (solution)

ACF and PACF of the differenced series:

[Figure: Autocorrelation Function for DIFF (with 5% significance limits), lags 2-24]
[Figure: Partial Autocorrelation Function for DIFF (with 5% significance limits), lags 2-24]

After the differencing operation, the most suitable model seems to be:
MA(1) with positive parameter θ.
Exercise 2 (solution)

Command ARIMA:
MINITAB: STAT → TIME SERIES → ARIMA

Exercise 2 (solution)

ARIMA Model: EXE2, with constant term:
$Y_t - Y_{t-1} = \nabla Y_t = \mu - \theta_1 \varepsilon_{t-1} + \varepsilon_t$

Let's remove the constant term.

Degrees of freedom:
- for the model with constant term: $(n - d) - p - q - 1$
- for the model without constant term: $(n - d) - p - q$

Exercise 2 (solution)

ARIMA Model: EXE2, without constant term:
$Y_t - Y_{t-1} = \nabla Y_t = -\theta_1 \varepsilon_{t-1} + \varepsilon_t$
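Under the fitted IMA(1,1) model, the differenced series is MA(1), whose theoretical ACF is $\rho_1 = -\theta_1/(1+\theta_1^2)$ and $\rho_k = 0$ for $k > 1$ — exactly the single-spike ACF seen for DIFF. A minimal sketch (θ = 0.7 is an illustrative value, not the Minitab estimate):

```python
# Theoretical ACF of the differenced series under an IMA(1,1) model:
# grad(Y_t) = -theta * eps_{t-1} + eps_t is MA(1), so
# rho_1 = -theta / (1 + theta**2) and rho_k = 0 for k > 1.
def ma1_rho1(theta):
    return -theta / (1.0 + theta ** 2)

theta = 0.7   # illustrative value only, not the estimate from the slides
rho1 = ma1_rho1(theta)

# A positive theta gives a single negative spike at lag 1,
# the signature visible in the ACF of DIFF.
assert rho1 < 0
```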


Exercise 2 (solution)

Diagnosis (residual checking):

[Figure: ACF of Residuals for EXE2 (with 5% significance limits), lags 2-24]

P-value = 0.721

Exercise 2 (solution)

The model is adequate.
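The quoted P-value comes from a chi-square test on the residual autocorrelations (the modified Box-Pierce, i.e. Ljung-Box, statistic). A minimal stdlib-only sketch of the Q statistic itself; the helper name `ljung_box_q` is ours, and the chi-square p-value lookup is omitted:

```python
# Ljung-Box statistic Q = n(n+2) * sum_{k=1..m} r_k^2 / (n - k),
# compared against a chi-square with (m - p - q) degrees of freedom.
def ljung_box_q(residuals, m):
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((v - mean) ** 2 for v in residuals)
    q = 0.0
    for k in range(1, m + 1):
        ck = sum((residuals[t] - mean) * (residuals[t + k] - mean)
                 for t in range(n - k))
        rk = ck / c0          # lag-k sample autocorrelation
        q += rk * rk / (n - k)
    return n * (n + 2) * q

# A strongly alternating "residual" series is far from white noise:
# Q greatly exceeds the 95% chi-square(1) critical value 3.84.
assert ljung_box_q([1.0, -1.0] * 25, 1) > 3.84
```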


Exercise 3
In a chemical process, data related to the concentration of a given component are
measured every two hours. 197 consecutive observations are shown in the
table (time series 'A' from "Time Series Analysis", 3rd edition, Box, Jenkins &
Reinsel, Prentice Hall).

Estimate the most suitable ARIMA model.

Obs  Conc.   Obs  Conc.   Obs  Conc.   Obs  Conc.   Obs  Conc.
1    17.0    41   17.6    81   16.8    121  16.9    161  17.1
2    16.6    42   17.5    82   16.7    122  17.1    162  17.1
3    16.3    43   16.5    83   16.4    123  16.8    163  17.1
4    16.1    44   17.8    84   16.5    124  17.0    164  17.4
5    17.1    45   17.3    85   16.4    125  17.2    165  17.2
6    16.9    46   17.3    86   16.6    126  17.3    166  16.9
7    16.8    47   17.1    87   16.5    127  17.2    167  16.9
8    17.4    48   17.4    88   16.7    128  17.3    168  17.0
9    17.1    49   16.9    89   16.4    129  17.2    169  16.7
10   17.0    50   17.3    90   16.4    130  17.2    170  16.9
11   16.7    51   17.6    91   16.2    131  17.5    171  17.3
12   17.4    52   16.9    92   16.4    132  16.9    172  17.8
13   17.2    53   16.7    93   16.3    133  16.9    173  17.8
14   17.4    54   16.8    94   16.4    134  16.9    174  17.6
15   17.4    55   16.8    95   17.0    135  17.0    175  17.5
16   17.0    56   17.2    96   16.9    136  16.5    176  17.0
17   17.3    57   16.8    97   17.1    137  16.7    177  16.9
18   17.2    58   17.6    98   17.1    138  16.8    178  17.1
19   17.4    59   17.2    99   16.7    139  16.7    179  17.2
20   16.8    60   16.6    100  16.9    140  16.7    180  17.4
21   17.1    61   17.1    101  16.5    141  16.6    181  17.5
22   17.4    62   16.9    102  17.2    142  16.5    182  17.9
23   17.4    63   16.6    103  16.4    143  17.0    183  17.0
24   17.5    64   18.0    104  17.0    144  16.7    184  17.0
25   17.4    65   17.2    105  17.0    145  16.7    185  17.0
26   17.6    66   17.3    106  16.7    146  16.9    186  17.2
27   17.4    67   17.0    107  16.2    147  17.4    187  17.3
28   17.3    68   16.9    108  16.6    148  17.1    188  17.4
29   17.0    69   17.3    109  16.9    149  17.0    189  17.4
30   17.8    70   16.8    110  16.5    150  16.8    190  17.0
31   17.5    71   17.3    111  16.6    151  17.2    191  18.0
32   18.1    72   17.4    112  16.6    152  17.2    192  18.2
33   17.5    73   17.7    113  17.0    153  17.4    193  17.6
34   17.4    74   16.8    114  17.1    154  17.2    194  17.8
35   17.4    75   16.9    115  17.1    155  16.9    195  17.7
36   17.1    76   17.0    116  16.7    156  16.8    196  17.2
37   17.6    77   16.9    117  16.8    157  17.0    197  17.4
38   17.7    78   17.0    118  16.3    158  17.4
39   17.4    79   16.6    119  16.6    159  17.2
40   17.8    80   16.7    120  16.8    160  17.2
Exercise 3 (solution)

[Figure: Time Series Plot of EXE3 — concentration (16.0 to 18.5) vs. observation index (1 to 197)]
Exercise 3 (solution)

[Figure: Autocorrelation Function for EXE3 (with 5% significance limits), lags 1-45]
[Figure: Partial Autocorrelation Function for EXE3 (with 5% significance limits), lags 1-45]

- Non-stationary process?
Exercise 3 (solution)

Let's apply the difference operator ∇.

[Figure: Time Series Plot of DiffExe3 — first differences (-1.0 to 1.5) vs. observation index (1 to 197)]
Exercise 3 (solution)

After the differencing operation:

[Figure: Autocorrelation Function for DiffExe3 (with 5% significance limits), lags 1-45]
[Figure: Partial Autocorrelation Function for DiffExe3 (with 5% significance limits), lags 1-45]

MA(1)?

Remember: parsimony!
Exercise 3 (solution)
Let's try an IMA(1,1) model with constant term: $Y_t - Y_{t-1} = \nabla Y_t = \mu - \theta_1 \varepsilon_{t-1} + \varepsilon_t$

Remove the constant term.

Exercise 3 (solution)
Let's try an IMA(1,1) model without constant term: $Y_t - Y_{t-1} = \nabla Y_t = -\theta_1 \varepsilon_{t-1} + \varepsilon_t$
Exercise 3 (solution)
Diagnostic check on the residuals of the IMA(1,1) model:

[Figure: ACF of Residuals for EXE3 (with 5% significance limits), lags 1-45]

P-value = 0.319

[Figure: Time Series Plot of RESI3 — residuals (-1.0 to 1.0) vs. observation index (1 to 197)]

The model is adequate.
Exercise 4
In a chemical process, 70 consecutive measurements were made over time for
a given quality characteristic of interest. Data are reported in the table (read
from top to bottom and from left to right) (series 'F' from "Time Series Analysis",
3rd edition, Box, Jenkins & Reinsel, Prentice Hall).
47 71 51 50 48 38 68
64 35 57 71 55 59 38
23 57 50 56 45 55 50
71 40 60 74 57 41 60
38 58 45 50 50 53 39
64 44 57 58 62 49 59
55 80 50 45 44 34 40
41 55 45 54 64 35 57
59 37 25 36 43 54 54
48 74 59 54 52 45 23

Fit the most suitable ARIMA model


Exercise 4 (solution)

[Figure: Time Series Plot of EXE4 — values (20 to 80) vs. observation index (1 to 70)]

- Very high number of runs.
- REMEMBER: typical pattern of a negatively correlated process.
- Systematic variation of observations above and below the mean.
Exercise 4 (solution)

[Figure: Autocorrelation Function for EXE4 (with 5% significance limits), lags 2-18]
[Figure: Partial Autocorrelation Function for EXE4 (with 5% significance limits), lags 2-18]

- It looks stationary.
- Geometric decay of the ACF?
- The most suitable model may be an AR(1) (with negative coefficient).
Exercise 4 (solution)

Estimated model: $X_t = 73.091 - 0.4257\, X_{t-1} + e_t$

Exercise 4 (solution)

P-value: 0.160

RUNS TEST: P-value = 0.804

The AR(1) model is adequate.
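The runs test quoted above checks randomness by counting runs of observations above and below the process level. A minimal sketch of the large-sample z version (runs about the mean; Minitab's exact conventions may differ, and the helper name `runs_test_z` is ours):

```python
import math

# Runs test about the mean: too many runs suggests negative
# autocorrelation, too few suggests positive autocorrelation.
def runs_test_z(x):
    mean = sum(x) / len(x)
    signs = [v > mean for v in x if v != mean]   # drop values equal to the mean
    n1 = sum(signs)                              # observations above the mean
    n2 = len(signs) - n1                         # observations below the mean
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    return (runs - expected) / math.sqrt(var)

# An alternating series (many runs) gives a large positive z,
# the pattern of a negatively correlated process like EXE4.
assert runs_test_z([1.0, 2.0] * 10) > 3.0
```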


Exercise 4 (solution)
You could solve the same problem with a linear regression.

Regression Analysis: EXE4 versus LAG1_exe4

Analysis of Variance

Source       DF  Adj SS   Adj MS  F-Value  P-Value
Regression    1    1622  1621.84    13.34    0.001
LAG1_exe4     1    1622  1621.84    13.34    0.001
Error        67    8147   121.59
Lack-of-Fit  31    4942   159.41     1.79    0.047
Pure Error   36    3205    89.03
Total        68    9769

Model Summary

S        R-sq    R-sq(adj)  R-sq(pred)
11.0269  16.60%  15.36%     11.71%

Coefficients

Term       Coef    SE Coef  T-Value  P-Value  VIF
Constant   73.09   6.14     11.90    0.000
LAG1_exe4  -0.425  0.116    -3.65    0.001    1.00

Regression Equation

EXE4 = 73.09 - 0.425 LAG1_exe4

Estimated model: $X_t = 73.09 - 0.425\, X_{t-1} + e_t$
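The regression route above (regress X_t on X_{t-1}) can be reproduced with ordinary least squares in a few lines. A minimal sketch on simulated AR(1) data with a negative coefficient like Exercise 4 (synthetic data, not series 'F'; the helper name `fit_ar1` is ours):

```python
import random

# Fit an AR(1) model X_t = xi + phi * X_{t-1} + e_t by least squares
# on the (X_{t-1}, X_t) pairs, as in the regression solution above.
def fit_ar1(x):
    xs, ys = x[:-1], x[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    phi = sxy / sxx              # slope = AR(1) coefficient
    xi = my - phi * mx           # intercept = constant term
    return xi, phi

# Simulate an AR(1) with a negative coefficient, like Exercise 4.
random.seed(42)
phi_true, xi_true = -0.43, 73.0
x = [xi_true / (1 - phi_true)]   # start at the process mean
for _ in range(2000):
    x.append(xi_true + phi_true * x[-1] + random.gauss(0, 10))

xi_hat, phi_hat = fit_ar1(x)
assert abs(phi_hat - phi_true) < 0.1   # loose tolerance: sampling error only
```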


Conclusion

'All models are wrong, but some are useful.'

G.E.P. Box
