
Fung, D.-2006-Methods For The Estimation of Missing Values in Time Series

The document discusses different methods for estimating missing values in time series data. It examines deterministic models like least squares approximations and cubic spline interpolation. It also covers stochastic models such as Box-Jenkins ARIMA models and state space models. The thesis compares various techniques by applying them to simulated data and examining their effectiveness for different missing data patterns.

Edith Cowan University

Research Online
Theses: Doctorates and Masters Theses

2006

Methods for the Estimation of Missing Values in Time Series
David S. Fung
Edith Cowan University

Recommended Citation
Fung, D. S. (2006). Methods for the Estimation of Missing Values in Time Series. Retrieved from http://ro.ecu.edu.au/theses/63

This Thesis is posted at Research Online.

http://ro.ecu.edu.au/theses/63
 
Copyright Warning

You may print or download ONE copy of this document for the purpose of your own research or study.

The University does not authorize you to copy, communicate or otherwise make available electronically to any other person any copyright material contained on this site.
You are reminded of the following:

• Copyright owners are entitled to take legal action against persons who infringe their copyright.

• A reproduction of material that is protected by copyright may be a copyright infringement. Where the reproduction of such material is done without attribution of authorship, with false attribution of authorship or the authorship is treated in a derogatory manner, this may be a breach of the author’s moral rights contained in Part IX of the Copyright Act 1968 (Cth).

• Courts have the power to impose a wide range of civil and criminal sanctions for infringement of copyright, infringement of moral rights and other offences under the Copyright Act 1968 (Cth). Higher penalties may apply, and higher damages may be awarded, for offences and infringements involving the conversion of material into digital or electronic form.
USE OF THESIS

The Use of Thesis statement is not included in this version of the thesis.
Methods for the Estimation of Missing Values in Time Series

A thesis

Submitted to the

Faculty of Communications, Health and Science

Edith Cowan University

Perth, Western Australia

By

David Sheung Chi Fung

In Fulfillment of the Requirements

For the Degree

Of

Master of Science (Mathematics and Planning)

2006
Abstract

A time series is a sequential set of data measured over time. Examples of time series arise in a variety of areas, ranging from engineering to economics, and the analysis of time series data constitutes an important area of statistics. Because the data are records taken through time, missing observations in time series data are very common. An observation may not be made at a particular time owing to faulty equipment, lost records, or a mistake that cannot be rectified until later. When one or more observations are missing, it may be necessary to estimate the model and also to obtain estimates of the missing values. Including estimates of the missing values allows a better understanding of the nature of the data and more accurate forecasting.

Different series may require different strategies to estimate these missing values, and these strategies must be used effectively to obtain the best possible estimates. The objective of this thesis is to examine and compare the effectiveness of various techniques for the estimation of missing values in time series data models.

The process of estimating missing values in univariate time series data involves analysis and modelling. Traditional time series analysis is commonly directed toward scalar-valued data, which can be represented by the traditional Box-Jenkins autoregressive, moving average, or autoregressive-moving average models. In some cases these models can be used to obtain missing values using an interpolation approach. A more recent development in time series analysis has been to treat several variables simultaneously as vector-valued variables and to introduce an alternative way of representing the model, called state space modelling. Popular models that can be represented in state space form include structural time series models, in which the data are considered as a combination of level, trend and seasonal components. This approach, which uses Kalman filtering, can also be used to obtain smoothed estimates of missing values.

There are other common approaches for modelling time series data with missing values, such as time series decomposition, least squares approximation and numerical interpolation methods. Some of these approaches are quick, and their model building procedures are easy to carry out. They are often popular choices for professionals who use time series analysis in industry.

This thesis will examine each approach using a variety of computer software and simulated data sets for the estimation and forecasting of data with missing values. In addition, it will explore the advantages and disadvantages of structural state space modelling for different missing value data patterns by examining the mean absolute deviation of the residuals for the missing values.
Declaration

I certify that this thesis does not incorporate without acknowledgment any material previously submitted for a degree or diploma in any institution of higher education; and that to the best of my knowledge and belief it does not contain any material previously published or written by another person except where due reference is made in the text.

Signature ………………………..

Date …………………………….

Acknowledgments

I would like to express my special thanks and gratitude to Associate Professor James Cross, my supervisor, from the School of Engineering and Mathematics at Joondalup Campus at Edith Cowan University, for his expertise and assistance in clarifying questions I had regarding Time Series Analysis, and for the advice he gave, which was invaluable in enabling me to complete this thesis.

I am especially appreciative of my friend Grace Guadagnino, who provided me with advice and support throughout the writing of this dissertation.

Last, but not least, I would like to thank my mum and dad for their ongoing support and encouragement throughout my studies, particularly in the completion of this thesis.
Table of Contents

Abstract

Declaration

Acknowledgments

Table of Contents

List of Figures

List of Tables

Chapter

1 Introduction

1.1 About this Chapter

1.2 Background

1.2.1 Deterministic Modelling

1.2.2 Stochastic Modelling

1.2.3 State Space Modelling

1.3 Aim of Research

1.4 Significance of Research

1.5 Data

1.6 Research Methodology

1.7 Structure of the Dissertation

2 Literature Review

3 Deterministic Models for Missing Values

3.1 About this Chapter

3.2 Least Squares Approximations

3.3 Interpolating with a Cubic Spline

4 Stochastic Models for Missing Values

4.1 About this Chapter

4.2 Stationary Models

4.3 Non-stationary Models

4.4 Box-Jenkins Models

4.4.1 Autoregressive Models

4.4.2 Moving Average Models

4.4.3 Autoregressive Moving Average Models

4.4.4 Autoregressive Integrated Moving Average Models

4.5 Box-Jenkins Models and Missing Values

4.6 Least Squares Principle

4.7 Interpolation

4.7.1 Weighting for general ARIMA models

5 State Space Models for Missing Values

5.1 About this Chapter

5.2 State Space Models

5.3 Structural Time Series Models

5.4 The Kalman Filter

5.4.1 Prediction Stage

5.4.2 Updating Stage

5.4.3 Estimation of Parameters

5.4.4 The Expectation-Maximisation (EM) Algorithm

5.5 Missing Values

6 Analysis and Comparison of Time Series Models

6.1 About this Chapter

6.2 Applying Polynomial Curve Fitting to Box-Jenkins Models

6.3 Applying Cubic Spline to Box-Jenkins Models

6.4 ARIMA Interpolation

6.5 Applying Structural Time Series Analysis to Box-Jenkins Models

7 Conclusion

7.1 About this Chapter

7.2 Effectiveness of various approaches

7.3 Further comparison between various approaches

7.4 Future research direction

7.5 Conclusion

References

Appendix A

Appendix B
List of Figures

Figure 3.1 Estimate missing values using least squares approximations

Figure 3.2 Estimate missing values using cubic spline

Figure 4.1 Estimate missing value in AR(1) process by using Least Squares Principle

Figure 4.2 Estimate two missing values in AR(1) process by using Least Squares Principle

Figure 5.1 Plot of 71 observations on purse snatching in Chicago

Figure 5.2 Actual and Fitted Values on purse snatching in Chicago

Figure 5.3 Normalised Residual on purse snatching in Chicago

Figure 5.4 Plot on purse snatching in Chicago with a run of missing observations

Figure 5.5 Results of estimating missing values on purse snatching in Chicago

Figure 5.6 Actual values and fitted values on purse snatching in Chicago

Figure 6.1 AR1 missing at 49 S.D. 0.04

Figure 6.2 AR1 missing at 49 S.D. 0.4

Figure 6.3 AR1 missing at 7 S.D. 0.04

Figure 6.4 AR1 missing at 7 S.D. 0.4

Figure 6.5 AR1 missing at 91 S.D. 0.04

Figure 6.6 AR1 missing at 91 S.D. 0.4

Figure 6.7 MA1 missing at 49 S.D. 0.04

Figure 6.8 MA1 missing at 49 S.D. 0.4

Figure 6.9 MA1 missing at 7 S.D. 0.04

Figure 6.10 MA1 missing at 7 S.D. 0.4

Figure 6.11 MA1 missing at 91 S.D. 0.04

Figure 6.12 MA1 missing at 91 S.D. 0.4

Figure 6.13 Spline-AR1 Missing 49 S.D. 0.04

Figure 6.14 Spline-AR1 Missing 49 S.D. 0.4

Figure 6.15 Spline-AR1 Missing 7 S.D. 0.04

Figure 6.16 Spline-AR1 Missing 7 S.D. 0.4

Figure 6.17 Spline-AR1 Missing 91 S.D. 0.04

Figure 6.18 Spline-AR1 Missing 91 S.D. 0.4

Figure 6.19 Spline-MA1 Missing 49 S.D. 0.04

Figure 6.20 Spline-MA1 Missing 49 S.D. 0.4

Figure 6.21 Spline-MA1 Missing 7 S.D. 0.04

Figure 6.22 Spline-MA1 Missing 7 S.D. 0.4

Figure 6.23 Spline-MA1 Missing 91 S.D. 0.04

Figure 6.24 Spline-MA1 Missing 91 S.D. 0.4

Figure 6.25 Interpolation -AR1 Missing 49 S.D. 0.04

Figure 6.26 Interpolation -AR1 Missing 49 S.D. 0.4

Figure 6.27 Interpolation -AR1 Missing 7 S.D. 0.04

Figure 6.28 Interpolation -AR1 Missing 7 S.D. 0.4

Figure 6.29 Interpolation -AR1 Missing 91 S.D. 0.04

Figure 6.30 Interpolation -AR1 Missing 91 S.D. 0.4

Figure 6.31 Interpolation -MA1 Missing 49 S.D. 0.04

Figure 6.32 Interpolation -MA1 Missing 49 S.D. 0.4

Figure 6.33 Interpolation -MA1 Missing 7 S.D. 0.04

Figure 6.34 Interpolation -MA1 Missing 7 S.D. 0.4

Figure 6.35 Interpolation -MA1 Missing 91 S.D. 0.04

Figure 6.36 Interpolation -MA1 Missing 91 S.D. 0.4

Figure 6.37 STSA -AR1 Missing 49 S.D. 0.04

Figure 6.38 STSA -AR1 Missing 7 S.D. 0.04

Figure 6.39 STSA -AR1 Missing 91 S.D. 0.04

Figure 6.40 STSA -MA1 Missing 49 S.D. 0.04

Figure 6.41 STSA -MA1 Missing 7 S.D. 0.04

Figure 6.42 STSA -MA1 Missing 91 S.D. 0.04

Figure 7.1 Various Methods -AR1 Missing 49 S.D. 0.04

Figure 7.2 Various Methods -AR1 Missing 7 S.D. 0.04

Figure 7.3 Various Methods -AR1 Missing 91 S.D. 0.04

Figure 7.4 Various Methods -MA1 Missing 49 S.D. 0.04

Figure 7.5 Various Methods -MA1 Missing 7 S.D. 0.04

Figure 7.6 Various Methods -MA1 Missing 91 S.D. 0.04

Figure 7.7 STSA vs Various Methods -AR1 Missing 49 S.D. 0.04

Figure 7.8 STSA vs Various Methods -AR1 Missing 7 S.D. 0.04

Figure 7.9 STSA vs Various Methods -AR1 Missing 91 S.D. 0.04

Figure 7.10 STSA vs Various Methods -MA1 Missing 49 S.D. 0.04

Figure 7.11 STSA vs Various Methods -MA1 Missing 7 S.D. 0.04

Figure 7.12 STSA vs Various Methods -MA1 Missing 91 S.D. 0.04
List of Tables

Table 3.2.1 Seven data points with three consecutive missing values

Table 3.2.2 Estimate missing values using least squares approximations

Table 3.3.1 Estimate missing values using cubic spline

Table 4.6.1 AR(1) time series model with missing value when t = 6

Table 4.6.2 AR(1) time series model with missing values when t = 6 and 7

Table 4.7.1 Weightings $w_{1l}$ and $w_{2l}$ for 1 missing value in ARIMA model

Table 4.7.2 Weightings $w_{1l}$ and $w_{2l}$ for 1 missing value in ARIMA model

Table 5.5.1 Actual values and fitted values on purse snatching in Chicago

Table 6.1.1 Various time series models for simulation

Table 6.2.1 AR time series model with missing value at position 49

Table 6.2.2 Missing value at position 49 SD = 0.04

Table 6.2.3 Missing value at position 49 SD = 0.4

Table 6.2.4 AR time series model with missing value at position 7

Table 6.2.5 Missing value at position 7 SD = 0.04

Table 6.2.6 Missing value at position 7 SD = 0.4

Table 6.2.7 AR time series model with missing value at position 91

Table 6.2.8 Missing value at position 91 SD = 0.04

Table 6.2.9 Missing value at position 91 SD = 0.4

Table 6.2.10 MA time series model with missing value at position 49

Table 6.2.11 Missing value at position 49 SD = 0.04

Table 6.2.12 Missing value at position 49 SD = 0.4

Table 6.2.13 MA time series model with missing value at position 7

Table 6.2.14 Missing value at position 7 SD = 0.04

Table 6.2.15 Missing value at position 7 SD = 0.4

Table 6.2.16 MA time series model with missing value at position 91

Table 6.2.17 Missing value at position 91 SD = 0.04

Table 6.2.18 Missing value at position 91 SD = 0.4

Table 6.3.1 AR1 Missing at position 49 S.D. 0.04

Table 6.3.2 AR1 Missing at position 49 S.D. 0.4

Table 6.3.3 AR1 Missing at position 7 S.D. 0.04

Table 6.3.4 AR1 Missing at position 7 S.D. 0.4

Table 6.3.5 AR1 Missing at position 91 S.D. 0.04

Table 6.3.6 AR1 Missing at position 91 S.D. 0.4

Table 6.3.7 MA1 Missing at position 49 S.D. 0.04

Table 6.3.8 MA1 Missing at position 49 S.D. 0.4

Table 6.3.9 MA1 Missing at position 7 S.D. 0.04

Table 6.3.10 MA1 Missing at position 7 S.D. 0.4

Table 6.3.11 MA1 Missing at position 91 S.D. 0.04

Table 6.3.12 MA1 Missing at position 91 S.D. 0.4

Table 6.4.1 AR1 Missing at position 49 S.D. 0.04

Table 6.4.2 AR1 Missing at position 49 S.D. 0.4

Table 6.4.3 AR1 Missing at position 7 S.D. 0.04

Table 6.4.4 AR1 Missing at position 7 S.D. 0.4

Table 6.4.5 AR1 Missing at position 91 S.D. 0.04

Table 6.4.6 AR1 Missing at position 91 S.D. 0.4

Table 6.4.7 MA1 Missing at position 49 S.D. 0.04

Table 6.4.8 MA1 Missing at position 49 S.D. 0.4

Table 6.4.9 MA1 Missing at position 7 S.D. 0.04

Table 6.4.10 MA1 Missing at position 7 S.D. 0.4

Table 6.4.11 MA1 Missing at position 91 S.D. 0.04

Table 6.4.12 MA1 Missing at position 91 S.D. 0.4

Table 6.5.1 AR1 Missing at position 49 S.D. 0.04

Table 6.5.2 AR1 Missing at position 7 S.D. 0.04

Table 6.5.3 AR1 Missing at position 91 S.D. 0.04

Table 6.5.4 MA1 Missing at position 49 S.D. 0.04

Table 6.5.5 MA1 Missing at position 7 S.D. 0.04

Table 6.5.6 MA1 Missing at position 91 S.D. 0.04

Table 7.2.1 AR1 Missing at position 49 S.D. 0.04

Table 7.2.2 AR1 Missing at position 7 S.D. 0.04

Table 7.2.3 AR1 Missing at position 91 S.D. 0.04

Table 7.2.4 MA1 Missing at position 49 S.D. 0.04

Table 7.2.5 MA1 Missing at position 7 S.D. 0.04

Table 7.2.6 MA1 Missing at position 91 S.D. 0.04

Table 7.3.1 AR1 Missing at position 49 S.D. 0.04

Table 7.3.2 AR1 Missing at position 7 S.D. 0.04

Table 7.3.3 AR1 Missing at position 91 S.D. 0.04

Table 7.3.4 MA1 Missing at position 49 S.D. 0.04

Table 7.3.5 MA1 Missing at position 7 S.D. 0.04

Table 7.3.6 MA1 Missing at position 91 S.D. 0.04
CHAPTER 1

INTRODUCTION

1.1 About this Chapter

This chapter provides an overview of the research undertaken in this thesis. Section 1.2 briefly discusses the background to this study, while sections 1.2.1, 1.2.2 and 1.2.3 look at various approaches for modelling time series data. The aim and significance of this research are detailed in sections 1.3 and 1.4, and the data used in this thesis are described in section 1.5. Finally, the research methodology and a brief overview of each chapter are given in sections 1.6 and 1.7.

1.2 Background

In our society, we often have to analyse and make inferences using real data that are available for collection. Ideally, the data would be carefully collected and would have regular patterns with no outliers or missing values. In reality this does not always happen, so an important part of the initial examination of the data is to assess their quality and to consider modifications where necessary. A common problem frequently encountered in time series data is missing observations. Data that are known or suspected to have been observed erroneously may also be regarded as missing values. In addition, poor record keeping, lost records and uncooperative responses during data collection will also lead to missing observations in the series. One of the key steps in time series analysis is to identify and correct obvious errors and to fill in any missing observations, enabling comprehensive analysis and forecasting. This can sometimes be achieved using simple methods such as eyeballing or calculating an appropriate mean value. However, more complex methods may be needed, and they may require a deeper understanding of the time series data: we may have to discover the underlying patterns and seasonality. Once we understand the nature of the data, we can tackle the problem using common sense combined with various mathematical approaches. This is both an art and a science.

Sometimes we are required to forecast values beyond, or prior to, the range of known values. To complete this task successfully, we need a model that satisfactorily fits the available data even when missing values are present.

More complex methods for analysing time series data depend on the type of data being handled, but most of the time we use either a deterministic or a stochastic approach.

1.2.1 Deterministic Modelling (also called Numerical Analysis Modelling)

This method assumes that the time series data correspond to an unknown function, and we try to fit that function in an appropriate way. A missing observation can then be estimated by the value of the fitted function at the time of the missing observation. Unlike traditional time series approaches, this method ignores any relationship between observations over time. The approach is based on obtaining the “best fit” for the time series data and is usually computationally straightforward, although the criterion that defines the “best fit” must be clearly specified. A variety of curves can be used to fit the data; these are considered in more detail in Chapter 3.
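As a minimal sketch of the deterministic idea, the Python code below assumes the fitted function is a least-squares polynomial in time of a chosen degree (the helper name `estimate_missing_polyfit` and its default degree are illustrative assumptions, not the procedure used in the thesis). It fits the curve to the observed points only and evaluates it at the missing time index, ignoring any serial dependence, as described above.

```python
import numpy as np

def estimate_missing_polyfit(y, missing_idx, degree=3):
    """Estimate y[missing_idx] from a least-squares polynomial in time.

    y is a sequence with NaN at the single index missing_idx; only the
    observed points are used in the fit. degree is an illustrative choice.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)          # time index 0, 1, ..., n-1
    observed = np.ones(len(y), dtype=bool)
    observed[missing_idx] = False               # exclude the missing point from the fit
    coeffs = np.polyfit(t[observed], y[observed], degree)
    return float(np.polyval(coeffs, t[missing_idx]))

# Usage: a linear series with the value at index 3 missing.
series = [1.0, 2.0, 3.0, float("nan"), 5.0, 6.0, 7.0]
print(estimate_missing_polyfit(series, 3, degree=1))  # close to 4.0
```

Swapping the polynomial for another curve, for example a cubic spline built with `scipy.interpolate.CubicSpline` on the observed points, follows the same pattern: fit on the observed points, then evaluate at the gap.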

1.2.2 Stochastic Modelling (also called Time Series Modelling)

Another common approach to modelling time series data is to use Box-Jenkins Autoregressive Integrated Moving Average (ARIMA) models. ARIMA models are based on statistical concepts and principles and are able to model a wide range of time series patterns. The methodology uses a systematic approach to identify the patterns in the known data and then selects an appropriate model that can generate the kind of patterns identified.

Once an appropriate model has been identified, the known time series data can be used to estimate the parameters of the model. The Box-Jenkins methodology provides many statistical tests for verifying the validity of the chosen model, and the statistical theory behind ARIMA models allows the uncertainty in a forecast to be measured.

However, a disadvantage of Box-Jenkins ARIMA models is that they assume data are recorded for every time period. Time series data with missing values therefore often require us to apply some intuitive method or an appropriate interpolation technique to estimate those missing values before the Box-Jenkins ARIMA approach is applied.
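When the model is known, the ARIMA structure can itself supply the interpolated value (the table of contents lists interpolation for general ARIMA models as Section 4.7). For the simplest case, a stationary AR(1) process $x_t - \mu = \phi(x_{t-1} - \mu) + \varepsilon_t$ with $|\phi| < 1$, a standard result is that the minimum mean-square-error estimate of a single missing $x_t$, given all the other observations, depends only on its two neighbours: $\hat{x}_t = \mu + \frac{\phi}{1+\phi^2}\left[(x_{t-1}-\mu) + (x_{t+1}-\mu)\right]$. The Python sketch below assumes $\phi$ and $\mu$ are known; in practice both would be estimated from the observed data, and the function name `ar1_interpolate` is hypothetical.

```python
def ar1_interpolate(prev_value, next_value, phi, mu=0.0):
    """Interpolate one missing value of a stationary AR(1) process.

    Returns the conditional mean of x_t given x_{t-1} and x_{t+1}
    (for an AR(1), the other observations add no further information):
        mu + phi * ((x_{t-1} - mu) + (x_{t+1} - mu)) / (1 + phi**2)
    phi and mu are assumed known.
    """
    if abs(phi) >= 1:
        raise ValueError("stationarity requires |phi| < 1")
    weight = phi / (1.0 + phi * phi)     # same weight on both neighbours
    return mu + weight * ((prev_value - mu) + (next_value - mu))

print(ar1_interpolate(1.0, 1.0, phi=0.5))   # 0.8
```

As $\phi$ approaches 1 the weight approaches 1/2, so the estimate tends to the simple average of the two neighbours; for small $\phi$ it shrinks toward the mean $\mu$.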

1.2.3 State Space Modelling

A more recent approach to time series analysis is state space modelling. This approach can incorporate various time series models, such as Box-Jenkins ARIMA models and structural time series models. State space modelling emphasises the notion that a time series is made up of distinct components. Thus, we may assume that the observations relate to the mean level of the process through an observation equation, whereas one or more state equations describe how the individual components change through time. A related advantage of this approach is that observations can be added one at a time, and the estimating equations are then updated to produce new estimates (Kendall and Ord, p. 144). Kalman filtering and maximum likelihood estimation are the key procedures for handling state space models. The approach repeatedly performs estimation and smoothing calculations that depend only on the output of forward and backward recursions. With modifications to the maximum likelihood procedure, the approach can also estimate and forecast data with missing values.
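To make the missing-value mechanism concrete, here is a minimal Python sketch assuming the simplest structural model, a local level model (a random-walk level observed with noise), with known variances. The function name `local_level_filter`, its parameter names and the large starting variance `p0` (a rough stand-in for a diffuse initialisation) are illustrative assumptions. When an observation is missing, the updating stage is simply skipped, so the prediction is carried forward and its variance keeps growing.

```python
import math

def local_level_filter(y, obs_var, level_var, a0=0.0, p0=1e7):
    """Kalman filter for y_t = mu_t + e_t, mu_{t+1} = mu_t + eta_t.

    y may contain None or NaN for missing observations. obs_var and
    level_var are the (assumed known) variances of e_t and eta_t.
    Returns one-step predictions, filtered levels and filtered variances.
    """
    a, p = a0, p0
    predictions, levels, variances = [], [], []
    for obs in y:
        # Prediction stage: the level is a random walk, so the estimate is
        # carried forward and its variance grows by level_var.
        a_pred, p_pred = a, p + level_var
        predictions.append(a_pred)
        if obs is None or math.isnan(obs):
            # Missing observation: no updating stage, keep the prediction.
            a, p = a_pred, p_pred
        else:
            # Updating stage: weight the new observation by the Kalman gain.
            f = p_pred + obs_var            # prediction error variance
            k = p_pred / f                  # Kalman gain
            a = a_pred + k * (obs - a_pred)
            p = p_pred * (1.0 - k)
        levels.append(a)
        variances.append(p)
    return predictions, levels, variances

preds, levels, variances = local_level_filter([1.0, 1.2, None, 0.9], obs_var=0.5, level_var=0.1)
```

A full treatment would also run a backward smoothing pass, so that the gap is estimated from observations on both sides, and would estimate the variances by maximum likelihood; this sketch omits both.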

This research will investigate the above models to determine the most appropriate technique for modelling time series data with missing values. These models can then be used for estimating the missing values and for forecasting. In particular, different patterns and frequencies of missing values will be considered using a large number of simulated data sets.
1.3 Aim of Research

The research objectives are as follows:

a) to compare the application of deterministic and stochastic approaches to modelling time series data with missing values;

b) to compare various stochastic approaches and associated models for different missing data patterns; and

c) to compare traditional Box-Jenkins ARIMA and state space models for obtaining estimates of missing values and for forecasting.

1.4 Significance of the Research

The objective of this research is to provide a review of various modelling approaches for time series data with missing values. In addition, new insights will be provided regarding the most appropriate modelling techniques for different missing data patterns.

1.5 Data

To test the effectiveness of each estimation method, we require many data sets that represent different time series models. In this thesis, we have chosen Microsoft Excel, a popular spreadsheet program, and Minitab, a general-purpose statistical package, to create the simulations. We chose these packages because they are popular and easily accessible. Using macros, both packages can generate the required time series data sets without difficulty. The specific time series models generated are Autoregressive (AR) and Moving Average (MA) time series with various parameters.

All simulated data sets are stationary time series with missing values, based on the assumption that decomposition or transformation techniques can be used to convert non-stationary data to stationary data (Chatfield, 2003, p. 14).
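The thesis generated its data with Excel and Minitab macros. As an illustrative alternative only, the Python sketch below simulates the same two model families with the standard library's `random.gauss`. The function names, the burn-in length and the MA sign convention ($x_t = e_t + \theta e_{t-1}$; Box-Jenkins texts often write $e_t - \theta e_{t-1}$, which only flips the sign of $\theta$) are assumptions.

```python
import random

def simulate_ar1(n, phi, sd, burn_in=100, seed=None):
    """Zero-mean AR(1): x_t = phi * x_{t-1} + e_t, with e_t ~ N(0, sd^2)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, sd)
        if t >= burn_in:            # discard start-up values (x_0 = 0 is not a stationary draw)
            out.append(x)
    return out

def simulate_ma1(n, theta, sd, seed=None):
    """Zero-mean MA(1): x_t = e_t + theta * e_{t-1}, with e_t ~ N(0, sd^2)."""
    rng = random.Random(seed)
    e_prev = rng.gauss(0.0, sd)
    out = []
    for _ in range(n):
        e = rng.gauss(0.0, sd)
        out.append(e + theta * e_prev)
        e_prev = e
    return out

# Usage: one 100-point series of each type (parameter values are illustrative).
ar_series = simulate_ar1(100, phi=0.5, sd=0.4, seed=1)
ma_series = simulate_ma1(100, theta=0.5, sd=0.4, seed=1)
```

Fixing `seed` makes each simulated data set reproducible, which is convenient when several estimation methods must be compared on identical series.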

1.6 Research Methodology

To conduct our research, we place a single missing value at various positions in each data set. The aim is to compare the performance of each method for different time series models with missing values in different positions. For a 100-point data set, the missing value positions used for each estimation method are as follows:

Polynomial Curve Fitting: missing value at positions 7, 49 and 91.

Cubic Spline: missing value at positions 7, 49 and 91.

ARIMA Interpolation: missing value at positions 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84 and 91.

State Space Modelling: missing value at positions 7, 49 and 91.

In each estimation process, we calculate the absolute deviation between the estimated value and the original value. After repeating the process one hundred times, we determine the Mean Absolute Deviation (MAD) and its associated standard deviation.
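The repeat, estimate and score loop described above can be sketched as follows. This is a hypothetical harness, not the thesis's Excel or Minitab procedure: it assumes an AR(1) generator, 0-based positions (so `position=48` is the 49th point if the thesis counts from 1), and a simple neighbour-average estimator as a stand-in for any of the methods listed. Each estimator receives a copy of the series with the target value hidden as NaN.

```python
import math
import random
import statistics

def simulate_ar1(n, phi, sd, rng, burn_in=100):
    """Zero-mean AR(1) path of length n (start-up values discarded)."""
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, sd)
        if t >= burn_in:
            out.append(x)
    return out

def neighbour_mean(series, i):
    """Hypothetical baseline estimator: average of the two adjacent values.
    Requires 0 < i < len(series) - 1."""
    return 0.5 * (series[i - 1] + series[i + 1])

def mad_experiment(estimator, position, n=100, reps=100, phi=0.5, sd=0.4, seed=0):
    """Return the mean absolute deviation |estimate - actual| over reps
    simulated series, together with the standard deviation of those deviations."""
    rng = random.Random(seed)
    deviations = []
    for _ in range(reps):
        series = simulate_ar1(n, phi, sd, rng)
        actual = series[position]
        observed = list(series)
        observed[position] = math.nan          # hide the value from the estimator
        estimate = estimator(observed, position)
        deviations.append(abs(estimate - actual))
    return statistics.mean(deviations), statistics.stdev(deviations)

mad, mad_sd = mad_experiment(neighbour_mean, position=48)
```

Any other estimator with the same `(series, index)` signature can be dropped in, so different methods are scored on the same simulated series when they share a seed.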

In this thesis, our objective is to examine the accuracy of various estimation approaches for calculating missing values at different positions within a time series data set. With minor adjustments, most of the approaches can be adapted to data sets with multiple missing values. However, if a data set contains consecutive missing values, our estimates become less accurate because less information is available around the gap. In this thesis, we focus our analysis on a single missing value at various positions within the data set.

1.7 Structure of the Dissertation

Chapter 1 Introduction

Chapter 2 Literature Review

This chapter provides a summary of the articles and publications related to this thesis.

Chapter 3 Deterministic Approach

The deterministic approach usually refers to the use of numerical analysis techniques on time series data. The principle is to treat the pattern in the time series data as the behaviour of an unknown function. We try to identify the most appropriate function for that behaviour in order to estimate the missing values.

Chapter 4 Stochastic Approach

In time series analysis, we often consider future values as realisations from a probability distribution that is conditioned on knowledge of past values. In this setting, the expertise and experience of the analyst play an important role when analysing data. Two important types of stochastic models are stationary and non-stationary models.

Chapter 5 State Space Approach

State space modelling is a popular approach in time series analysis because of its ability to accommodate different time series models, such as Box-Jenkins ARIMA and structural time series models. Observations can be added one at a time, and the estimating equations are then updated to produce new estimates.

Chapter 6 Analysis and Comparison

Having examined various time series modelling approaches, we apply them to simulated data sets derived from different time series models. From this exercise, we should gain insight into how each method performs in different time series situations.

Chapter 7 Conclusion

In conclusion, we make appropriate comparisons between the methods and highlight their performance in estimating missing values for ARIMA models.
CHAPTER 2

Literature Review

This chapter covers the available research literature on time series with missing values.

During the research, we focused on linear time series models, as there is little reference in the literature to nonlinear time series data or to spatial data with missing values. One approach to spatial data with missing values was outlined by Gomez et al. (1995), who used the bootstrap method to impute missing natural resource inventory data. In addition, two articles on nonlinear time series models were identified. The more recent, by Volker Tresp and Reimar Hofmann (1998), develops techniques for nonlinear time series prediction with missing and noisy data. It extends their earlier article, Missing and Noisy Data in Nonlinear Time Series Prediction, published in 1995.

Traditionally, the approach to obtaining missing values for linear time series has involved curve fitting. Details of these approaches can be found in many books, such as Applied Numerical Analysis by Curtis F. Gerald & Patrick O. Wheatley (1994), The Analysis of Time Series: An Introduction by Chris Chatfield (2003), Time Series Forecasting Simulation & Application by Gareth Janacek and Louise Swift (1993), Forecasting, Structural Time Series Models and the Kalman Filter by Andrew C. Harvey (2001), Time Series: Theory and Methods by Peter J. Brockwell & Richard A. Davis (1991) and Time Series Analysis by James D. Hamilton (1994).

More recently, the focus has shifted towards more sophisticated approaches using Box-Jenkins ARIMA and state space modelling. A summary of relevant articles on linear time series with missing values is provided below.

One of the earliest articles relating to missing values in time series using state space modelling was written by Kalman (1960). In this article he outlined a new approach to the linear filtering and prediction problem. He investigated the classical filtering and prediction problem using the Bode-Shannon representation of random processes and the “state-transition” method of analysis of dynamic systems. As a result of this investigation, he obtained the following:

a) The formulation and method of solution of the problem apply without

modification to stationary and nonstationary statistics and to growing-memory

and infinite-memory filters.

b) A nonlinear difference (or differential) equation is derived for the covariance

matrix of the optimal estimation error. From the solution of this equation the

coefficients of the difference (or differential) equation of the optimal linear filter

are obtained without further calculations.

c) The filtering problem is shown to be the dual of the noise-free regulator

problem. The new method developed here is applied to two well-known

problems, confirming and extending earlier results.

(R. E. Kalman, 1960, p.35)

The approach developed by Kalman in this article was extended by Richard H. Jones (1980), who derived a method of calculating the exact likelihood function of a stationary autoregressive moving average (ARMA) time series based on Akaike’s Markovian representation combined with Kalman recursive estimation. This state space approach involves matrices and vectors with dimension max(p, q+1), where p is the order of the autoregression and q is the order of the moving average, rather than matrices with dimension equal to the number of observations. A key to the calculation of the exact likelihood function is the proper calculation of the initial state covariance matrix. The paper also included some discussion of observational error in the model and of the extension to missing observations. The use of a nonlinear optimization program gives the maximum likelihood estimates of the parameters and allows for model identification based on Akaike’s Information Criterion (AIC).
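As a minimal sketch of this state space idea, assume a zero-mean AR(1) process (so the state dimension max(p, q+1) is 1) with known parameters φ and σ². The exact Gaussian log-likelihood can then be accumulated by a scalar Kalman filter: the stationary variance σ²/(1 − φ²) serves as the initial state covariance, an optional observational error variance h can be included, and a missing observation (marked None) is handled by skipping the update step and only predicting. The function name `ar1_exact_loglik` is hypothetical; the sketch illustrates the general approach and is not Jones's own algorithm.

```python
import math


def ar1_exact_loglik(y, phi, sigma2, h=0.0):
    """Exact Gaussian log-likelihood of y_t = X_t + e_t, X_t = phi * X_{t-1} + E_t.

    Var(E_t) = sigma2 and Var(e_t) = h (observational error, 0 by default).
    Missing observations are given as None and contribute nothing.
    """
    if not abs(phi) < 1:
        raise ValueError("stationarity requires |phi| < 1")
    a = 0.0                          # predicted state mean E[X_t | observed past]
    p = sigma2 / (1.0 - phi ** 2)    # predicted state variance, starts at the stationary value
    loglik = 0.0
    for obs in y:
        if obs is not None:
            v = obs - a              # one-step prediction error (innovation)
            f = p + h                # variance of the innovation
            loglik -= 0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
            k = p / f                # Kalman gain
            a += k * v               # update the state estimate ...
            p *= 1.0 - k             # ... and its variance
        # prediction step: the only step taken when obs is None
        a = phi * a
        p = phi ** 2 * p + sigma2
    return loglik


# The middle value is missing; the filter simply predicts across the gap.
print(ar1_exact_loglik([0.5, None, -0.3], phi=0.6, sigma2=1.0))
```

With h = 0 the result equals the log of the bivariate normal density of the two observed values, whose covariance is σ²φ²/(1 − φ²) because they are two steps apart. No matrix of the size of the whole series is ever formed, which is the point made above.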

G. Gardner, A. C. Harvey and G. D. A. Phillips (1980) presented an algorithm that enables the exact likelihood function of a stationary autoregressive-moving average (ARMA) process to be calculated by means of the Kalman filter. It consists of two basic subroutines. The first, STARMA, casts the ARMA model into the “state space” form necessary for Kalman filtering and computes the covariance matrix associated with the initial value of the state vector. The second, KARMA, carries out the recursions and produces a set of standardized prediction errors, together with the determinant of the covariance matrix of the observations. These two quantities together yield the exact likelihood, which may be maximized by an iterative procedure based on a numerical optimization algorithm that does not require analytic derivatives. In particular, KARMA contains a device whereby the likelihood may be approximated to a level of accuracy under the control of the user. This enables a considerable amount of computing time to be saved with very little loss in precision. Finally, a third subroutine, KALFOR, may be used to compute predictions of future values of the series, together with the associated conditional mean square errors.

Four years later, A. C. Harvey and R. G. Pierse (1984) discussed two related problems involving time series with missing data. The first concerns the maximum likelihood estimation of the parameters in an ARIMA model when some of the observations are missing or subject to temporal aggregation. The second concerns the estimation of the missing observations themselves. They pointed out that both problems can be solved by setting up the model in state space form and applying the Kalman filter. In 1986, Robert Kohn and Craig F. Ansley showed how to define, and then compute efficiently, the marginal likelihood of an ARIMA model with missing observations. The computation uses the univariate version of the modified Kalman filter introduced by Ansley and Kohn (1985), which allows a partially diffuse initial state vector. They also showed how to predict and interpolate missing observations and obtain the mean squared error of the estimates. With the help of modern computers, state space modelling has become a much simpler process than before, and researchers continue to examine ways to develop it further.

There are many articles that focus on the use of ARIMA models, as well as other techniques, to determine missing values in time series. Eivind Damsleth (1979) developed a method to find the optimal linear combination of the forecast and backforecast for missing values in a time series that can be represented by an ARIMA model. W. Dunsmuir and P. M. Robinson (1981) derived a method for the estimation of models for discrete time series in the presence of missing data, and stated the advantages of this method over alternatives. They first assessed the performance of the method in estimating simple models by simulation, and then applied it to a time series of pollution levels containing some missing observations. Also in 1981, B. Abraham discussed a method based on forecasting techniques to estimate missing observations in time series, and compared its efficiency using the minimum mean square error criterion.

Another popular estimation procedure is exact maximum likelihood estimation. Michael A. Wincek and Gregory C. Reinsel (1986) developed an explicit procedure to obtain the exact maximum likelihood estimates of the parameters in a regression model with ARMA time series errors and possibly non-consecutive data. The method is based on an innovation transformation approach, from which an explicit recursive procedure is derived for the efficient calculation of the exact likelihood function and its derivatives. The innovations and their derivatives are used to develop a modified Newton-Raphson procedure for computing the estimates. A weighted nonlinear least squares interpretation of the estimator is also given, and a numerical example illustrates the method. At about the same time, P. M. Robinson (1985) demonstrated how to use the score principle to test for serial correlation in the residuals of static time series regression in the presence of missing data. The test is applied both to the likelihood conditional on the observation times and to an unconditional form of the likelihood. Asymptotic distributions of the test statistics are established under both the null hypothesis of no serial correlation and sequences of local, correlated alternatives, enabling an analytic comparison of efficiency.

Osvaldo Ferreiro (1987) discussed different alternatives for the estimation of missing observations in stationary time series for autoregressive moving average models. He indicated that missing observations are quite common in time series and that in many cases it is necessary to estimate them. The article offers a series of alternatives for estimating missing observations. Two years later, Yonina Rosen and Boaz Porat (1989) considered the estimation of the covariances of stationary time series with missing observations. They provided general formulas for the asymptotic second-order moments of the sample covariances, for either random or deterministic patterns of missing values, and derived closed-form expressions for the random Bernoulli pattern and for the deterministic periodic pattern of missing observations. These expressions are evaluated explicitly for autoregressive moving average time series, and the results are useful for constructing and analysing parameter or spectrum estimation algorithms based on the sample covariances of stationary time series with missing observations.

The same authors also considered the problem of spectral estimation through autoregressive moving average modelling of stationary processes with missing observations. They presented a class of estimators based on the sample covariances and proposed an asymptotically optimal estimator in this class. The proposed algorithm is based on a nonlinear least squares fit of the sample covariances computed from the data to the true covariances of the assumed ARMA model. The statistical properties of the algorithm are explored and used to show that it is asymptotically optimal, in the sense of achieving the smallest possible asymptotic variance. The performance of the algorithm is illustrated by some numerical examples. In the same year, Greta M. Ljung (1989) derived an expression for the likelihood function of the parameters in an autoregressive moving average model when there are missing values within the time series data. She also considered the estimation of the missing values, together with their mean squared errors, for stationary as well as nonstationary models. Daniel Pena and George C. Tiao (1991) demonstrated that missing values in time series can be treated either as unknown parameters, estimated by maximum likelihood, or as random variables, predicted by the expectation of the unknown values given the data. They provided examples to illustrate the difference between these two procedures.

They argued that the second procedure is, in general, more relevant for estimating missing values in time series. Thomas S. Shively (1992) constructed tests for autoregressive disturbances in a time series regression with missing observations, where the disturbance terms are generated by (i) an AR(1) process and (ii) an AR(p) process with a possible seasonal component. A point optimal invariant (POI) test is also constructed for each problem to check for efficiency. In addition, the paper shows how to compute exact small-sample p-values for the tests in O(n) operations, and gives a computationally efficient procedure for choosing the specific alternative against which the POI test is most powerful invariant. Steve Beveridge (1992) extended the concept of the minimum mean square error linear interpolator for missing values in time series to handle any pattern of non-consecutive observations. The paper uses simple ARMA models to discuss the usefulness of the nonparametric and parametric forms of the least squares interpolator. In more recent years, Fabio H. Nieto and Jorge Martinez (1996) demonstrated a linear recursive technique that does not use the Kalman filter to estimate missing observations in univariate time series. The series is assumed to follow an invertible ARIMA model. The procedure is based on the restricted forecasting approach, and the recursive linear estimators obtained are optimal in the minimum mean-square error sense. Alberto Luceno (1997) extended Ljung’s (1989) method for estimating missing values and evaluating the corresponding likelihood function from scalar time series to the vector case. The series is assumed to be generated by a possibly partially nonstationary and noninvertible vector autoregressive moving average process, and no particular pattern of missing data is assumed. To avoid initialisation problems, the author does not use Kalman filter iterations. The method also does not require the series to be differenced, and thus avoids complications caused by over-differencing. The estimators of the missing data are provided by the normal equations of an appropriate regression technique. These equations are adapted to cope with temporally aggregated data; the procedure parallels a matrix treatment of contour conditions in the analysis of variance.
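The second interpretation of Pena and Tiao (1991), in which a missing value is predicted by its conditional expectation given the observed data, can be illustrated with a short sketch. It assumes a zero-mean stationary Gaussian series whose autocovariance function is known (in practice it would be estimated); for such a series the conditional expectation coincides with the minimum mean square error linear interpolator discussed by Beveridge (1992). The helper names (`solve`, `conditional_mean_fill`, `ar1_acvf`) are hypothetical, and the small dense solver is only meant for short series.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def conditional_mean_fill(x, acvf):
    """Replace None entries of a zero-mean stationary Gaussian series by E[X_miss | X_obs].

    acvf(k) returns the autocovariance at lag k, assumed known here.
    """
    obs = [t for t, v in enumerate(x) if v is not None]
    miss = [t for t, v in enumerate(x) if v is None]
    cov_oo = [[acvf(abs(s - t)) for t in obs] for s in obs]
    w = solve(cov_oo, [x[t] for t in obs])       # Sigma_oo^{-1} x_obs
    filled = list(x)
    for m in miss:
        # E[X_m | X_obs] = Sigma_mo Sigma_oo^{-1} x_obs
        filled[m] = sum(acvf(abs(m - t)) * wt for t, wt in zip(obs, w))
    return filled


def ar1_acvf(k, phi=0.6, sigma2=1.0):
    """Autocovariance of a stationary AR(1): sigma2 * phi**k / (1 - phi**2)."""
    return sigma2 * phi ** k / (1.0 - phi ** 2)


series = [0.3, -1.2, None, 0.8, 0.5]
print(conditional_mean_fill(series, ar1_acvf)[2])
```

For an AR(1) the estimate depends only on the two neighbours, φ/(1 + φ²)(x_{t-1} + x_{t+1}), which here is about -0.1765. Several missing values, consecutive or not, are handled by the same formula.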

It can be seen from the literature that a variety of methods are available for estimating missing values in time series data. What is lacking in the literature, however, is a comparison of different methods for different types of data sets and different positions of the missing data. This thesis aims to provide such a comparison using a variety of simulated data sets with missing values in different locations.

CHAPTER 3

Deterministic Models for Missing Values

3.1 About this Chapter

In general, deterministic models for time series refer to the use of numerical analysis techniques for modelling time series data. A major advantage of numerical analysis is that a numerical answer can be obtained even when a problem has no “analytical” solution (p.2 Gerald and Wheatley 1994).

The principle of numerical analysis is to assume that the time series data pattern is a realisation of an unknown function. The aim is to identify the most appropriate function to represent the data in order to estimate the missing values. We assume that the behaviour of the time series data follows a polynomial function, or a combination of polynomial functions, and examine the time interval that contains the missing values. This is sometimes the most difficult part of the analysis: we have to examine all the factors involved and decide the appropriate length of time interval to be considered. We then find a polynomial that fits the selected set of points and assume that the polynomial and the function behave nearly the same over the interval in question. Values of the polynomial should then be reasonable estimates of the values of the unknown function (p.213 Gerald and Wheatley 1994). However, when the data appear to have local irregularities, we need to fit sub-regions of the data with different polynomials. These include special piecewise polynomials called splines.

For most time series data, we do not want a polynomial that fits the data exactly. Functions used to fit a set of real values often create discrepancies, or the data set may come from experimental measurements that are subject to error. A technique called least squares is normally used in such cases. Based on statistical theory, this method finds a polynomial that is more likely to approximate the true values (p.213 Gerald and Wheatley 1994).

3.2 Least Squares Approximations

In any curve fitting exercise, we try to minimize the deviations of the data points from the estimated curve. One possible goal is to make the magnitude of the maximum error a minimum (the minimax criterion), but this criterion is rarely used because the absolute-value function has no derivative at the origin. The usual approach is to minimize the sum of the squares of the errors for a polynomial of given degree: the “least-squares” principle.

In addition to giving a unique result for a given set of data, the least-squares method is

also in accord with the maximum-likelihood principle of statistics. If the measurement

errors are normally distributed and if the standard deviation is constant for all the data,

the line determined by minimizing the sum of squares can be shown to have values of

slope and intercept which have maximum likelihood of occurrence.

Time series data are rarely linear, so we need to fit the data set with functions other than a first-degree polynomial. As we use higher-degree polynomials, the deviations of the points from the curve decrease, until the degree of the polynomial equals one less than the number of data points, at which point the curve passes exactly through every point (assuming no two data points share the same x-value). We call this function an interpolating polynomial. The general procedure is illustrated below:

Consider fitting a polynomial of fixed degree m

$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m \qquad (1)$$

to n data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

By substituting these n values into equation (1), we should obtain a system of equations

$$\begin{aligned}
y_1 &= a_0 + a_1 x_1 + \cdots + a_m x_1^m \\
y_2 &= a_0 + a_1 x_2 + \cdots + a_m x_2^m \\
&\;\;\vdots \\
y_n &= a_0 + a_1 x_n + \cdots + a_m x_n^m.
\end{aligned} \qquad (2)$$

In order to solve the system of equations (2), we would use matrices to represent the system $y = Mv$, where

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
M = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix}, \quad
v = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix}.$$

Hence, the coefficients of the polynomial can be determined by the following:

$$\begin{aligned}
y &= Mv \\
M^t y &= M^t M v \\
(M^t M)^{-1} M^t y &= (M^t M)^{-1} (M^t M) v \\
v &= (M^t M)^{-1} M^t y
\end{aligned}$$

Example 3.1 Estimate missing values using least squares approximations.

Given the following data points:

t    True X(t)    Observed X(t)
1    1.700        1.700
2    1.285        1.285
3    (3.708)      missing
4    (6.001)      missing
5    (7.569)      missing
6    10.170       10.170
7    8.777        8.777

Table 3.2.1 Seven data points with three consecutive missing values.

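To estimate the missing values at t = 3, 4 and 5, we use the four observed data points

t = 1, x(t) = 1.7; t = 2, x(t) = 1.285; t = 6, x(t) = 10.17; t = 7, x(t) = 8.777

and fit a polynomial of degree m = 3, which passes exactly through all four points.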

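Hence,

$$\begin{aligned}
1.7 &= a_0 + 1a_1 + 1a_2 + 1a_3 \\
1.285 &= a_0 + 2a_1 + 4a_2 + 8a_3 \\
10.17 &= a_0 + 6a_1 + 36a_2 + 216a_3 \\
8.777 &= a_0 + 7a_1 + 49a_2 + 343a_3
\end{aligned}$$

$$y = \begin{bmatrix} 1.7 \\ 1.285 \\ 10.17 \\ 8.777 \end{bmatrix}, \quad
M = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \\ 1 & 6 & 36 & 216 \\ 1 & 7 & 49 & 343 \end{bmatrix}, \quad
v = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}, \quad
M^t = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 6 & 7 \\ 1 & 4 & 36 & 49 \\ 1 & 8 & 216 & 343 \end{bmatrix}$$

$$v = (M^t M)^{-1} M^t y$$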

$$v = \begin{bmatrix} 5.67 \\ -6.16 \\ 2.4 \\ -0.21 \end{bmatrix}$$

Hence, the polynomial of 3rd degree is:

$$y = 5.67 - 6.16x + 2.4x^2 - 0.21x^3$$

t    True X(t)    Observed X(t)    Estimate
1    1.700        1.700
2    1.285        1.285
3    (3.708)      missing          3.12
4    (6.001)      missing          5.99
5    (7.569)      missing          8.62
6    10.170       10.170
7    8.777        8.777

Table 3.2.2 Estimate missing values using least squares approximations.

[Figure: x(t) plotted against t = 1, ..., 7, showing the true values and the least squares estimates.]

Figure 3.1 Estimate missing values using least squares approximations.

Example 3.1 shows that the interpolating polynomial has several advantages. The method is easy to follow and quickly produces estimates; it is not restricted to estimating a single missing value; and it provides reasonable estimates compared with the original values. However, the approach does not take into account the nature of the time series, which can lead to unrealistic estimates.
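The calculation in Example 3.1 can be reproduced with a short sketch of the normal equations v = (M^t M)^{-1} M^t y. Rather than forming the inverse explicitly, the sketch solves M^t M v = M^t y by Gaussian elimination; the helper names (`gauss_solve`, `polyfit_normal_equations`, `polyval`) are hypothetical, and for larger or badly scaled problems a QR-based least squares solver is usually preferred to forming M^t M.

```python
def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def polyfit_normal_equations(t, y, degree):
    """Least-squares coefficients a_0..a_m, i.e. v = (M^t M)^{-1} M^t y."""
    M = [[float(ti) ** j for j in range(degree + 1)] for ti in t]   # rows 1, t, t^2, ..., t^m
    size = degree + 1
    MtM = [[sum(row[i] * row[j] for row in M) for j in range(size)] for i in range(size)]
    Mty = [sum(row[i] * yi for row, yi in zip(M, y)) for i in range(size)]
    return gauss_solve(MtM, Mty)            # solve M^t M v = M^t y instead of inverting


def polyval(coeffs, t):
    """Evaluate a_0 + a_1 t + ... + a_m t^m."""
    return sum(a * t ** j for j, a in enumerate(coeffs))


# Observed points of Example 3.1 (t = 3, 4, 5 are missing)
t_obs = [1, 2, 6, 7]
x_obs = [1.700, 1.285, 10.170, 8.777]
v = polyfit_normal_equations(t_obs, x_obs, degree=3)
print([round(a, 2) for a in v])
print([round(polyval(v, t), 2) for t in (3, 4, 5)])
```

The unrounded coefficients are approximately 5.670, -6.164, 2.402 and -0.208, which give estimates of about 3.17, 6.12 and 8.87 at t = 3, 4 and 5. The values 3.12, 5.99 and 8.62 in Table 3.2.2 correspond to evaluating the cubic with its coefficients rounded to two decimal places, so the rounding accounts for the difference.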

3.3 Interpolating with a Cubic Spline

A cubic spline is a common numerical curve fitting strategy that fits a “smooth curve” through the known data, using a separate cubic between each pair of adjacent points, joined so that the curve and its first two derivatives are continuous, and estimates each missing observation by the value of the spline. While splines can be of any degree, cubic splines are the most popular. We write the equation for the cubic in the ith interval as follows:

$$F_i(x) = A_i (x - x_i)^3 + B_i (x - x_i)^2 + C_i (x - x_i) + D_i \qquad (3)$$

Thus the cubic spline function we want is of the form

$$F(x) = F_i(x) \text{ on the interval } [x_i, x_{i+1}], \quad i = 0, 1, \ldots, n-1,$$

and meets these conditions:

$$\begin{aligned}
F_i(x_i) &= Y_i, \quad i = 0, 1, \ldots, n-1, \quad \text{and} \quad F_{n-1}(x_n) = Y_n; && (4)\\
F_i(x_{i+1}) &= F_{i+1}(x_{i+1}), \quad i = 0, 1, \ldots, n-2; && (5)\\
F_i'(x_{i+1}) &= F_{i+1}'(x_{i+1}), \quad i = 0, 1, \ldots, n-2; && (6)\\
F_i''(x_{i+1}) &= F_{i+1}''(x_{i+1}), \quad i = 0, 1, \ldots, n-2. && (7)
\end{aligned}$$

Equations (4), (5), (6) and (7) state that the cubic spline passes through each of the points (4), is continuous (5), and has continuous slope (6) and curvature (7) throughout the region. If there are n + 1 points, there are n intervals and n cubics $F_i(x)$, and thus 4n unknowns, namely $\{A_i, B_i, C_i, D_i\}$ for $i = 0, 1, \ldots, n-1$. Setting $x = x_i$ and $x = x_{i+1}$ in equation (3), we know that:

$$Y_i = D_i, \quad i = 0, 1, \ldots, n-1$$

$$\begin{aligned}
Y_{i+1} &= A_i (x_{i+1} - x_i)^3 + B_i (x_{i+1} - x_i)^2 + C_i (x_{i+1} - x_i) + Y_i \\
&= A_i H_i^3 + B_i H_i^2 + C_i H_i + Y_i, \quad i = 0, 1, \ldots, n-1
\end{aligned} \qquad (8)$$

where $H_i = x_{i+1} - x_i$ is the width of the ith interval.

To relate the slopes and curvatures of the joining splines, we differentiate equation (3):

$$F_i'(x) = 3A_i (x - x_i)^2 + 2B_i (x - x_i) + C_i, \qquad (9)$$

$$F_i''(x) = 6A_i (x - x_i) + 2B_i, \quad i = 0, 1, \ldots, n-1. \qquad (10)$$

If we let $S_i = F_i''(x_i)$ for $i = 0, 1, \ldots, n-1$, then

$$S_i = 6A_i (x_i - x_i) + 2B_i = 2B_i, \qquad (11)$$

and therefore

$$B_i = \frac{S_i}{2}. \qquad (12)$$

By the curvature condition (7), $S_{i+1} = F_{i+1}''(x_{i+1}) = F_i''(x_{i+1})$, so from (10)

$$S_{i+1} = 6A_i (x_{i+1} - x_i) + 2B_i = 6A_i H_i + 2B_i. \qquad (13)$$

After rearranging equation (13) and substituting (12) into (13):

$$A_i = \frac{S_{i+1} - S_i}{6H_i}. \qquad (14)$$

Substituting equations (12) and (14) into (8):

$$Y_{i+1} = \frac{S_{i+1} - S_i}{6H_i} H_i^3 + \frac{S_i}{2} H_i^2 + C_i H_i + Y_i.$$

As a result,

$$C_i = \frac{Y_{i+1} - Y_i}{H_i} - \frac{2H_i S_i + H_i S_{i+1}}{6}. \qquad (15)$$

Invoking the condition that the slopes of the two cubics that join at $(x_i, Y_i)$ are the same, we obtain the following from equation (9), where $x = x_i$:

$$\begin{aligned}
Y_i' &= 3A_i (x_i - x_i)^2 + 2B_i (x_i - x_i) + C_i = C_i \\
&= \frac{Y_{i+1} - Y_i}{H_i} - \frac{2H_i S_i + H_i S_{i+1}}{6},
\end{aligned}$$

which is equivalent to equation (15).

In the previous interval, from $x_{i-1}$ to $x_i$, the slope at its right end is, from equation (9):

$$\begin{aligned}
Y_i' &= 3A_{i-1}(x_i - x_{i-1})^2 + 2B_{i-1}(x_i - x_{i-1}) + C_{i-1} \\
&= 3A_{i-1}H_{i-1}^2 + 2B_{i-1}H_{i-1} + C_{i-1}.
\end{aligned}$$

When we substitute equations (12), (14) and (15), we have:

$$Y_i' = 3\,\frac{S_i - S_{i-1}}{6H_{i-1}} H_{i-1}^2 + 2\,\frac{S_{i-1}}{2} H_{i-1} + \frac{Y_i - Y_{i-1}}{H_{i-1}} - \frac{2H_{i-1}S_{i-1} + H_{i-1}S_i}{6}. \qquad (16)$$

Equating this slope to the slope $C_i$ of the cubic on the right and simplifying, we get, for $i = 1, 2, \ldots, n-1$:

$$H_{i-1}S_{i-1} + 2(H_{i-1} + H_i)S_i + H_i S_{i+1} = 6\left(\frac{Y_{i+1} - Y_i}{H_i} - \frac{Y_i - Y_{i-1}}{H_{i-1}}\right). \qquad (17)$$
H i −1 H i −1

If we write equation (17) in matrix form, we have the following, where $F[x_i, x_{i+1}] = (Y_{i+1} - Y_i)/H_i$ denotes the first divided difference:

$$\begin{bmatrix}
H_0 & 2(H_0 + H_1) & H_1 & & & & \\
 & H_1 & 2(H_1 + H_2) & H_2 & & & \\
 & & H_2 & 2(H_2 + H_3) & H_3 & & \\
 & & & \ddots & \ddots & \ddots & \\
 & & & & H_{n-2} & 2(H_{n-2} + H_{n-1}) & H_{n-1}
\end{bmatrix}
\begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \\ \vdots \\ S_{n-1} \\ S_n \end{bmatrix}
= 6 \begin{bmatrix}
F[x_1, x_2] - F[x_0, x_1] \\
F[x_2, x_3] - F[x_1, x_2] \\
F[x_3, x_4] - F[x_2, x_3] \\
\vdots \\
F[x_{n-1}, x_n] - F[x_{n-2}, x_{n-1}]
\end{bmatrix} \qquad (18)$$

Matrix (18) provides n − 1 equations in the n + 1 unknowns $S_0, \ldots, S_n$. We get two additional equations involving $S_0$ and $S_n$ when we specify conditions pertaining to the end intervals of the whole curve. There are four alternative conditions that are often used, and each of them makes slight changes to the coefficient matrix. These conditions are as follows:

(a) Make the end cubics approach linearity at their extremities. Hence, $S_0 = 0$ and $S_n = 0$. This condition is called a natural spline.

Coefficient Matrix:

$$\begin{bmatrix}
2(H_0 + H_1) & H_1 & & & \\
H_1 & 2(H_1 + H_2) & H_2 & & \\
 & H_2 & 2(H_2 + H_3) & H_3 & \\
 & & \ddots & \ddots & \ddots \\
 & & & H_{n-2} & 2(H_{n-2} + H_{n-1})
\end{bmatrix}$$

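(b) Force the slope at each end to assume specified values. If $F'(x_0) = A$ and $F'(x_n) = B$, we use the following relations (obtained by setting the slope $C_0$ of the first cubic at $x_0$, given by (15), equal to A, and the slope of the last cubic at $x_n$ equal to B):

At the left end: $2H_0 S_0 + H_0 S_1 = 6\,(F[x_0, x_1] - A)$

At the right end: $H_{n-1} S_{n-1} + 2H_{n-1} S_n = 6\,(B - F[x_{n-1}, x_n])$

Coefficient Matrix:

$$\begin{bmatrix}
2H_0 & H_0 & & & & \\
H_0 & 2(H_0 + H_1) & H_1 & & & \\
 & H_1 & 2(H_1 + H_2) & H_2 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & H_{n-2} & 2(H_{n-2} + H_{n-1}) & H_{n-1} \\
 & & & & H_{n-1} & 2H_{n-1}
\end{bmatrix}$$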

(c) Assume the end cubics approach parabolas at their extremities. Hence, $S_0 = S_1$ and $S_n = S_{n-1}$.

Coefficient Matrix:

$$\begin{bmatrix}
3H_0 + 2H_1 & H_1 & & & \\
H_1 & 2(H_1 + H_2) & H_2 & & \\
 & H_2 & 2(H_2 + H_3) & H_3 & \\
 & & \ddots & \ddots & \ddots \\
 & & & H_{n-2} & 2H_{n-2} + 3H_{n-1}
\end{bmatrix}$$

(d) Take $S_0$ as a linear extrapolation from $S_1$ and $S_2$, and $S_n$ as a linear extrapolation from $S_{n-1}$ and $S_{n-2}$. Only this condition gives cubic spline curves that match F(x) exactly when F(x) is itself a cubic. We use the following relations:

At the left end:

$$\frac{S_1 - S_0}{H_0} = \frac{S_2 - S_1}{H_1}, \qquad S_0 = \frac{(H_0 + H_1)S_1 - H_0 S_2}{H_1}$$

At the right end:

$$\frac{S_n - S_{n-1}}{H_{n-1}} = \frac{S_{n-1} - S_{n-2}}{H_{n-2}}, \qquad S_n = \frac{(H_{n-2} + H_{n-1})S_{n-1} - H_{n-1} S_{n-2}}{H_{n-2}}$$

(Note: after solving the set of equations, we have to use these relations again to calculate the values $S_0$ and $S_n$.)

Coefficient Matrix:

$$\begin{bmatrix}
\dfrac{(H_0 + H_1)(H_0 + 2H_1)}{H_1} & \dfrac{H_1^2 - H_0^2}{H_1} & & & \\
H_1 & 2(H_1 + H_2) & H_2 & & \\
 & H_2 & 2(H_2 + H_3) & H_3 & \\
 & & \ddots & \ddots & \ddots \\
 & & & \dfrac{H_{n-2}^2 - H_{n-1}^2}{H_{n-2}} & \dfrac{(H_{n-1} + H_{n-2})(H_{n-1} + 2H_{n-2})}{H_{n-2}}
\end{bmatrix}$$

After the $S_i$ values are obtained, we get the coefficients $A_i$, $B_i$, $C_i$ and $D_i$ for the cubics in each interval, and we can calculate the points on the interpolating curve.

(Curtis F. Gerald & Patrick O. Wheatley 1994)

Example 3.2 Estimate missing values using cubic spline:

We are going to estimate several missing values of the time series

$$f(t) = 2e^{(t-1)/5} + 3\left(\frac{t-1}{5}\right)^2 + E_t,$$

where $\{E_t\}$ is a sequence of independent identically distributed normal random variables with mean zero and constant variance.

t     True f(t)    Observed f(t)
1     1.700        1.700
2     1.285        1.285
3     (3.708)      missing
4     (6.001)      missing
5     (7.569)      missing
6     10.170       10.170
7     8.777        8.777
8     13.756       13.756
9     18.681       18.681
10    (20.733)     missing
11    (26.088)     missing
12    (30.880)     missing
13    37.479       37.479
14    46.230       46.230

Table 3.3.1 Estimate missing values using cubic spline.

First, we calculate the width $H_i$ of the ith interval:
H0 = 2-1 = 1
H1 = 6-2 = 4
H2 = 7-6 = 1
H3 = 8-7 = 1
H4 = 9-8 = 1
H5 = 13-9 = 4
H6 = 14-13 = 1

For a natural cubic spline, we use end condition (a) and solve

$$\begin{bmatrix}
10 & 4 & 0 & 0 & 0 & 0 \\
4 & 10 & 1 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & 0 & 0 \\
0 & 0 & 1 & 4 & 1 & 0 \\
0 & 0 & 0 & 1 & 10 & 4 \\
0 & 0 & 0 & 0 & 4 & 10
\end{bmatrix}
\begin{bmatrix} S_1 \\ S_2 \\ S_3 \\ S_4 \\ S_5 \\ S_6 \end{bmatrix}
= 6 \begin{bmatrix}
\frac{10.17 - 1.285}{4} - \frac{1.285 - 1.7}{1} \\
\frac{8.777 - 10.17}{1} - \frac{10.17 - 1.285}{4} \\
\frac{13.756 - 8.777}{1} - \frac{8.777 - 10.17}{1} \\
\frac{18.681 - 13.756}{1} - \frac{13.756 - 8.777}{1} \\
\frac{37.479 - 18.681}{4} - \frac{18.681 - 13.756}{1} \\
\frac{46.230 - 37.479}{1} - \frac{37.479 - 18.681}{4}
\end{bmatrix}$$

$$\begin{bmatrix}
10 & 4 & 0 & 0 & 0 & 0 \\
4 & 10 & 1 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & 0 & 0 \\
0 & 0 & 1 & 4 & 1 & 0 \\
0 & 0 & 0 & 1 & 10 & 4 \\
0 & 0 & 0 & 0 & 4 & 10
\end{bmatrix}
\begin{bmatrix} S_1 \\ S_2 \\ S_3 \\ S_4 \\ S_5 \\ S_6 \end{bmatrix}
= \begin{bmatrix} 15.8175 \\ -8.14933 \\ 4.225498 \\ -1.25552 \\ 20.04885 \\ 24.309 \end{bmatrix},$$

giving $S_0 = 0$, $S_1 = 2.35769$, $S_2 = -1.93987$, $S_3 = 1.81854$, $S_4 = -1.10882$, $S_5 = 1.3612$, $S_6 = 1.88642$ and $S_7 = 0$.

Using these S’s, we compute the coefficients of the individual cubic splines.

For interval [2, 6], we have $A_1 = -0.17907$, $B_1 = 1.17885$, $C_1 = 0.37091$ and $D_1 = 1.285$:

$$f_1(t) = -0.17907(t-2)^3 + 1.17885(t-2)^2 + 0.37091(t-2) + 1.285$$

For interval [9, 13], we have $A_5 = 0.02188$, $B_5 = 0.6806$, $C_5 = 1.62695$ and $D_5 = 18.681$:

$$f_5(t) = 0.02188(t-9)^3 + 0.6806(t-9)^2 + 1.62695(t-9) + 18.681$$

Figure 3.2 shows the estimated missing values using the cubic spline curve.

[Figure: Curve Fitting (cubic spline curve); f(t) plotted against t = 1, ..., 14, showing true values and estimated values.]

Figure 3.2 Estimate missing values using cubic spline.

In Example 3.2, we use $f_1(t)$ to find the following:

$f_1(3) = 2.656$ (true = 3.707907)
$f_1(4) = 5.310$ (true = 6.000711)
$f_1(5) = 8.173$ (true = 7.569432)

Also, we use $f_5(t)$ to find:

$f_5(10) = 21.01$ (true = 20.73259)
$f_5(11) = 24.832$ (true = 26.08791)
$f_5(12) = 30.278$ (true = 30.87959)

Once again, with the help of modern technology, this approach is easy to follow and produces results very quickly. It is a popular approach because the separate cubics join smoothly, with continuous slope and curvature, at each data point. In some cases, it is more accurate than other interpolating polynomial approaches. As with all numerical analysis methods, however, it does not examine the nature or the purpose of the data set. While this is not a problem for less complex data sets, unrealistic estimates could occur in some cases. Because the methods above can be applied without a full background understanding of time series analysis, the resulting estimates may at times be less convincing.
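A minimal sketch of the natural cubic spline (end condition (a)) follows. It builds the system (17) with $S_0 = S_n = 0$, solves it with the Thomas algorithm for tridiagonal systems (a standard choice assumed here, not one prescribed by the text), and then forms the coefficients from equations (12), (14) and (15) with $D_i = Y_i$. The function names `natural_cubic_spline` and `spline_eval` are hypothetical.

```python
import bisect


def natural_cubic_spline(x, y):
    """Natural cubic spline through (x_i, y_i), i = 0..n, with S_0 = S_n = 0.

    Returns (S, pieces) where pieces[i] = (x_i, A_i, B_i, C_i, D_i) describes
    F_i(x) = A_i (x - x_i)^3 + B_i (x - x_i)^2 + C_i (x - x_i) + D_i on [x_i, x_{i+1}].
    """
    n = len(x) - 1
    if n < 2:
        raise ValueError("need at least three points")
    H = [x[i + 1] - x[i] for i in range(n)]
    F = [(y[i + 1] - y[i]) / H[i] for i in range(n)]      # divided differences F[x_i, x_{i+1}]
    # Equation (17) for i = 1..n-1 with S_0 = S_n = 0: tridiagonal system in S_1..S_{n-1}
    sub = [H[i - 1] for i in range(1, n)]
    diag = [2.0 * (H[i - 1] + H[i]) for i in range(1, n)]
    sup = [H[i] for i in range(1, n)]
    rhs = [6.0 * (F[i] - F[i - 1]) for i in range(1, n)]
    # Thomas algorithm: forward elimination ...
    for k in range(1, n - 1):
        w = sub[k] / diag[k - 1]
        diag[k] -= w * sup[k - 1]
        rhs[k] -= w * rhs[k - 1]
    # ... then back substitution
    inner = [0.0] * (n - 1)
    inner[-1] = rhs[-1] / diag[-1]
    for k in range(n - 3, -1, -1):
        inner[k] = (rhs[k] - sup[k] * inner[k + 1]) / diag[k]
    S = [0.0] + inner + [0.0]
    pieces = []
    for i in range(n):
        A = (S[i + 1] - S[i]) / (6.0 * H[i])                      # equation (14)
        B = S[i] / 2.0                                            # equation (12)
        C = F[i] - (2.0 * H[i] * S[i] + H[i] * S[i + 1]) / 6.0    # equation (15)
        pieces.append((x[i], A, B, C, y[i]))                      # D_i = Y_i
    return S, pieces


def spline_eval(x, pieces, t):
    """Evaluate the spline at t using the cubic of the interval that contains t."""
    i = min(max(bisect.bisect_right(x, t) - 1, 0), len(pieces) - 1)
    xi, A, B, C, D = pieces[i]
    u = t - xi
    return ((A * u + B) * u + C) * u + D                          # Horner form of (3)


# Observed points of Example 3.2
t_obs = [1, 2, 6, 7, 8, 9, 13, 14]
f_obs = [1.700, 1.285, 10.170, 8.777, 13.756, 18.681, 37.479, 46.230]
S, pieces = natural_cubic_spline(t_obs, f_obs)
for t in (3, 4, 5, 10, 11, 12):
    print(t, round(spline_eval(t_obs, pieces, t), 3))
```

For the data of Example 3.2, the right-hand side built by this sketch directly from the divided differences is 15.8175, -21.6855, 38.232, -0.324, -1.353 and 24.309. Rows 2 to 5 differ from the numeric right-hand side printed in the example (-8.14933, 4.225498, -1.25552 and 20.04885), so the sketch's estimates, about 3.41, 6.96 and 9.89 at t = 3, 4, 5 and 22.49, 26.25 and 30.93 at t = 10, 11, 12, differ from the values reported above.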

CHAPTER 4

Box-Jenkins Models for Missing Values

4.1 About this Chapter

Data analysis is an extremely important process in our society, since it makes it possible to base decisions on facts. Time series analysis is a specific type of data analysis in which we must recognise that successive observations are usually not independent, and that the analysis must take into account the time order of the observations. In Chapter 3 we considered deterministic models, in which a time series can be predicted exactly. However, most time series are realisations of stochastic models: future values are only partly determined by past values, so exact predictions are impossible. We must therefore consider future values as realisations from a probability distribution that is conditioned on knowledge of past values. Hence, the expertise and experience of the analyst play an important role when analysing data. Two important types of stochastic models are stationary and non-stationary models, and these are discussed below.

4.2 Stationary Models

First, let us look at stationary time series from an intuitive point of view. If the properties of one section of the data are much like those of any other section, we can call the series “stationary”. In other words, the series has no systematic change in mean or variance, and any periodic variations have been removed.

In a stationary time series, values fluctuate uniformly about a fixed level over time; this fixed level is generally the mean of the series. “Most of the probability theory of time series is concerned with stationary time series, and for this reason time series analysis often requires one to turn a non-stationary series into a stationary one so as to use this theory.” (Chatfield, 1989 p.10)

Mathematically, an ARMA process of order (p, q) can be defined as follows:

$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + E_t + \theta_1 E_{t-1} + \cdots + \theta_q E_{t-q}$$

where $E_t$ (white noise) is a sequence of independent and identically distributed normal random variables with mean zero and variance $\sigma_z^2$, and the $\phi_i$'s and $\theta_j$'s are called the autoregressive and moving average parameters respectively.
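The ARMA(p, q) equation above can be turned directly into a generator of simulated series. The sketch below is an illustration under stated assumptions (Gaussian white noise, zero starting values and a burn-in period that is discarded); the function names `arma_recursion` and `simulate_arma` are hypothetical and are not the procedure used to produce the data sets in this thesis.

```python
import random


def arma_recursion(phi, theta, noise):
    """Apply X_t = sum_i phi_i X_{t-i} + E_t + sum_j theta_j E_{t-j} to a noise sequence.

    Values before the start of the sequence (both X and E) are taken as zero.
    """
    x = []
    for t, e in enumerate(noise):
        ar = sum(c * x[t - 1 - i] for i, c in enumerate(phi) if t - 1 - i >= 0)
        ma = sum(c * noise[t - 1 - j] for j, c in enumerate(theta) if t - 1 - j >= 0)
        x.append(ar + e + ma)
    return x


def simulate_arma(phi, theta, n, sigma=1.0, burn_in=200, seed=None):
    """n values of a Gaussian ARMA(p, q); the first burn_in values are discarded
    so that the zero starting values have little influence."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, sigma) for _ in range(n + burn_in)]
    return arma_recursion(phi, theta, noise)[burn_in:]


# An ARMA(1, 1) with phi_1 = 0.5 and theta_1 = 0.4
print(arma_recursion([0.5], [0.4], [1.0, 0.0, 0.0, 0.0]))   # response to a single shock
series = simulate_arma([0.5], [0.4], n=100, seed=1)
```

Feeding a single unit shock through the recursion gives 1, 0.9, 0.45, 0.225: the first response is φ₁ + θ₁ and each later one is φ₁ times the previous, which is a quick way to check that the recursion matches the equation.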

A time series is said to be strictly stationary if the joint distribution of $X(t_1), \ldots, X(t_n)$ is the same as the joint distribution of $X(t_1 + \tau), \ldots, X(t_n + \tau)$ for all $t_1, \ldots, t_n, \tau$. In other words, shifting the time origin by an amount $\tau$ has no effect on the joint distributions, which must therefore depend only on the intervals between $t_1, t_2, \ldots, t_n$. The above definition holds for any value of n (Chatfield, 2003 p.34). More information on stationary models can be found in Chatfield (2003, pp.34 and 35).

The following are two useful tools available to help identify different time series

models:

1 The autocorrelation function (ACF) is one of the most important tools in the identification stage of building time series models. It measures how strongly time series values a specified number of periods apart are correlated with each other over time. The number of periods apart is usually called the lag. Thus an autocorrelation at lag 1 measures how successive values are correlated with each other throughout the series, and an autocorrelation at lag 2 measures how values two periods apart are correlated throughout the series. Autocorrelation values range from negative one to positive one: a value close to positive one indicates a strong positive correlation, a value close to negative one indicates a strong negative correlation, and a value close to 0 indicates no correlation. Usually, a set of autocorrelation values is computed for a given time series over a range of lag values, i.e. 1, 2, 3, and so on.

The autocorrelation between X t and X t − m is defined as:

cov( X t , X t − m )
ρ i = Corr ( X t , X t −m ) =
var( X t ) var( X t − k )

48
2 The Partial-autocorrelation function (PACF) is another important tool,

similar to autocorrelation function. With PACF a set of partial-

autocorrelation values would be computed for a given time series

corresponding to a range of lag values which are used to evaluate

relationships between time series values. Partial-autocorrelations values

also range from negative one to positive one, and are very useful in some

situations where the autocorrelation patterns are hard to determine.

Both autocorrelations and partial autocorrelations are usually displayed either as a table of values or as a plot of correlation values called a correlogram. A correlogram plots the autocorrelations or partial autocorrelations of a given time series as a series of bars or spikes, one per lag. It is probably the most widely used method of visually displaying any patterns that may exist in the autocorrelations or partial autocorrelations of a stationary time series, and as a result it plays an important role in identifying the most appropriate model for time series data.
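The values plotted in a correlogram can be computed as in the sketch below. It is an illustration under stated assumptions, not a procedure taken from this thesis. The sample ACF uses the common estimator $r_k = \sum_t (x_t - \bar{x})(x_{t-k} - \bar{x}) / \sum_t (x_t - \bar{x})^2$. For the sample PACF the sketch uses the Durbin-Levinson recursion, a standard algorithm that the thesis does not prescribe. The function names are hypothetical.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (r_0 = 1)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    c0 = np.dot(d, d)
    return np.array([np.dot(d[k:], d[:len(d) - k]) / c0 for k in range(max_lag + 1)])

def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    phi_prev = np.array([])  # phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - np.dot(phi_prev, r[k - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, r[1:k])
        phi_kk = num / den
        # update phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, then append phi_kk
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k] = phi_kk
    return pacf

x = [1, 2, 3, 4, 5]
print(sample_acf(x, 2))   # r_1 = 0.4, r_2 = -0.1
print(sample_pacf(x, 2))  # lag 2: (-0.1 - 0.16) / (1 - 0.16), about -0.3095
```

The lag-1 partial autocorrelation always equals $r_1$. At higher lags the recursion removes the part of $r_k$ that is already explained by the shorter lags.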

4.3 Non-stationary Models

A time series is non-stationary if it appears to have no fixed level. The time series may also display some periodic fluctuations. Most time series tools apply only to stationary time series, so we have to examine the data and try to transform any non-stationary series into a stationary one. There are two main types of transformation used to make a series stationary. Firstly, the data can be transformed through differencing if the stochastic process has an unstable mean. This type of transformation is used to remove a polynomial trend exhibited by the data. Secondly, the logarithmic and square root transformations, which are special cases of the class of Box-Cox transformations, can be used to induce stationarity. These transformations are used if the series being examined has a non-constant mean and variance, and they produce a straighter curve when plotted (Box et al., 1976, pp. 7-8, 85).
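The following is a minimal sketch of the Box-Cox family for positive data, $(x^\lambda - 1)/\lambda$, with $\log x$ at $\lambda = 0$. The square root case corresponds to $\lambda = 0.5$ up to a linear rescaling. The function name `box_cox` and the example series are assumptions made for illustration. The example also shows how a transformation and a difference can work together.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform of positive data: (x**lam - 1) / lam, or log(x) when lam == 0."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("the Box-Cox transform requires strictly positive data")
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam

# A series growing by a constant factor: the log transform turns the growth
# into a linear trend, and one difference then removes that trend.
series = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
logged = box_cox(series, 0)
print(np.diff(logged))  # every entry equals log(2), about 0.693
```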

4.4 Box-Jenkins Models

In general, the Box-Jenkins models provide a common framework for time series forecasting. The framework emphasizes the importance of identifying an appropriate model through an iterative approach. In addition, it can cope with non-stationary series by the use of differencing.

Box-Jenkins’ method derives forecasts of a time series solely on the basis of the historical behaviour of the series itself. It is a univariate method, which means only one variable is forecast. The ideas are based on statistical concepts and principles that are able to model a wide spectrum of time series behaviour.

The basic Box-Jenkins’ models can be represented by a linear combination of past data and random variables. The sequence of random variables is called a white noise process. These random variables are uncorrelated and normal with mean zero and constant variance.
4.4.1 Autoregressive Models (AR models)

The most basic AR model is the model that contains only one AR parameter:

$$X_t = \phi X_{t-1} + E_t$$

where $\{E_t\}$ is a sequence of independent identically distributed normal random variables with mean zero and variance $\sigma_z^2$.

This means that any given value $X_t$ in the time series is directly proportional to the previous value $X_{t-1}$ plus some random error $E_t$.

As the number of AR parameters increases, $X_t$ becomes directly related to additional past values. The general autoregressive model with $p$ AR parameters can be written as:

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + E_t$$

where $\phi_1, \phi_2, \dots$ are the AR parameters. The subscripts on the $\phi$'s are called the orders of the AR parameters. The highest order $p$ is referred to as the order of the model. Hence, it is called an autoregressive model of order $p$ and is usually abbreviated AR($p$) (Box et al., 1976, pp. 7-8, 85). The values of $\phi$ which make the process stationary are such that the roots of $\phi(B) = 0$ lie outside the unit circle in the complex plane, where $B$ is the backward shift operator such that $B^j x_t = x_{t-j}$ and $\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ (Chatfield, 1989, p. 41).
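The stationarity condition can be checked numerically by finding the roots of $\phi(B)$. The sketch below is a minimal illustration that assumes NumPy's `np.roots`, which expects polynomial coefficients ordered from the highest power down to the constant term. The function name is hypothetical.

```python
import numpy as np

def ar_is_stationary(phi):
    """True if all roots of phi(B) = 1 - phi_1 B - ... - phi_p B^p = 0
    lie outside the unit circle."""
    # np.roots expects coefficients from the highest power down to the constant
    coeffs = [-c for c in reversed(phi)] + [1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(ar_is_stationary([0.7]))       # True: the single root is B = 1/0.7
print(ar_is_stationary([0.5, 0.3]))  # True
print(ar_is_stationary([0.5, 0.6]))  # False: one root lies inside the unit circle
```

For AR(1), the condition reduces to $|\phi| < 1$, because the only root is $B = 1/\phi$.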

4.4.2 Moving Average Models (MA models)

The second type of basic Box-Jenkins model is called the moving average model. Unlike the autoregressive model, the moving average model relates what happens in period $t$ only to the random errors that occurred in past time periods, i.e. to $E_{t-1}, E_{t-2}$, etc. The basic MA model with one parameter can be written as follows:

$$X_t = E_t - \theta_1 E_{t-1}$$

This means that any given $X_t$ in the time series is directly proportional only to the random error $E_{t-1}$ from the previous period plus some current random error $E_t$.

The general moving average model with $q$ MA parameters can be written as:

$$X_t = E_t - \theta_1 E_{t-1} - \theta_2 E_{t-2} - \dots - \theta_q E_{t-q}$$

where $\theta_1, \theta_2, \dots$ are the MA parameters. As in the autoregressive model, the subscripts on the $\theta$'s are called the orders of the MA parameters. The highest order $q$ is referred to as the order of the model. Hence, it is called a moving average model of order $q$ and is usually abbreviated MA($q$).

4.4.3 Autoregressive Moving Average Models (ARMA models)

In most cases, it is best to develop a mixed autoregressive moving average model when building a stochastic model to represent a stationary time series. The order of an ARMA model is expressed in terms of both $p$ and $q$. The model relates what happens in period $t$ both to past values and to the random errors that occurred in past time periods. A general ARMA model can be written as follows:

$$X_t = E_t + (\phi_1 X_{t-1} + \dots + \phi_p X_{t-p}) - (\theta_1 E_{t-1} + \dots + \theta_q E_{t-q}) \tag{19}$$

Equation (19) can be simplified using the backward shift operator $B$ to obtain

$$\phi(B) X_t = \theta(B) E_t$$

where $B$ is the backward shift operator such that $B^j x_t = x_{t-j}$,

$$\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p \quad \text{and} \quad \theta(B) = 1 - \theta_1 B - \dots - \theta_q B^q.$$
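The later sections on missing values rely on the expansions $X_t = (1 + \psi_1 B + \psi_2 B^2 + \dots)E_t$ and $E_t = (1 - \pi_1 B - \pi_2 B^2 - \dots)X_t$ of this model. The sketch below computes both sets of coefficients by matching powers of $B$ in $\phi(B)\psi(B) = \theta(B)$ and $\theta(B)c(B) = \phi(B)$ with $\pi_j = -c_j$. It assumes the sign convention of equation (19), and returns $\pi_0 = -1$, the convention used later in section 4.7. The function names are illustrative.

```python
import numpy as np

def psi_weights(phi, theta, n):
    """psi_0..psi_n in X_t = (1 + psi_1 B + psi_2 B^2 + ...) E_t for
    phi(B) X_t = theta(B) E_t, phi(B) = 1 - sum phi_i B^i, theta(B) = 1 - sum theta_j B^j."""
    psi = np.zeros(n + 1)
    psi[0] = 1.0
    for j in range(1, n + 1):
        ar = sum(phi[i - 1] * psi[j - i] for i in range(1, min(j, len(phi)) + 1))
        psi[j] = ar - (theta[j - 1] if j <= len(theta) else 0.0)
    return psi

def pi_weights(phi, theta, n):
    """pi_0..pi_n in E_t = (1 - pi_1 B - pi_2 B^2 - ...) X_t, with pi_0 = -1."""
    c = np.zeros(n + 1)  # coefficients of phi(B)/theta(B)
    c[0] = 1.0
    for j in range(1, n + 1):
        ma = sum(theta[i - 1] * c[j - i] for i in range(1, min(j, len(theta)) + 1))
        c[j] = ma - (phi[j - 1] if j <= len(phi) else 0.0)
    return -c

# ARMA(1,1) with phi = 0.5, theta = 0.3
print(psi_weights([0.5], [0.3], 3))  # 1, 0.2, 0.1, 0.05
print(pi_weights([0.5], [0.3], 3))   # -1, 0.2, 0.06, 0.018
```

For ARMA(1,1) these agree with the closed forms $\psi_k = \phi^{k-1}(\phi - \theta)$ and $\pi_k = \theta^{k-1}(\phi - \theta)$.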

4.4.4 Autoregressive Integrated Moving Average (ARIMA) Models

Generally, most time series are non-stationary, and Box-Jenkins’ method recommends that the user remove any non-stationary sources of variation and then fit a stationary model to the time series data. In practice, stationarity can often be achieved by applying regular differences to the original time series. These models are written in the same way as the basic models, except that the differenced (stationary) series $W_t$ is substituted for the original series $X_t$. In order to express ARIMA models, we need the difference operator. For example, the first difference of a series can be expressed as:

$$W_t = X_t - X_{t-1} \tag{20}$$

The symbol $\nabla$ (the difference operator, $\nabla = 1 - B$) is used to simplify equation (20). The first differences of the series $X_t$ can then be written as:

$$W_t = \nabla X_t$$

If we take the second consecutive difference of the original series $X_t$, the expression becomes:

$$W_t = \nabla^2 X_t = \nabla(X_t - X_{t-1}) = \nabla(\nabla X_t) = X_t - 2X_{t-1} + X_{t-2}$$

In general, the $d$-th consecutive difference is expressed as $\nabla^d X_t$ (Vandaele, 1983, pp. 52-53).

The general form of an ARIMA model can be written as follows:

$$W_t = \phi_1 W_{t-1} + \dots + \phi_p W_{t-p} + E_t - \theta_1 E_{t-1} - \dots - \theta_q E_{t-q} \tag{21}$$

Using the backward shift operator, we may write the ARIMA model (21) as:

$$\phi(B) W_t = \theta(B) E_t$$

or

$$\phi(B)(1 - B)^d X_t = \theta(B) E_t.$$

In ARIMA models, the term integrated, which is a synonym for summed, is used because the differencing process can be reversed to obtain the original time series values by summing the successive values of the differenced series (Hoff, 1983, p. 126).
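The following is a small sketch of differencing and its reversal by summation. It records the first value removed at each differencing step, because those values are exactly what cumulative summation needs to recover the original series. The helper names `difference` and `integrate` are hypothetical.

```python
import numpy as np

def difference(x, d=1):
    """Apply the difference operator d times (W_t = X_t - X_{t-1} each time).
    Returns the differenced series and the leading value removed at each step,
    which is exactly what is needed to undo the differencing."""
    w = np.asarray(x, dtype=float)
    heads = []
    for _ in range(d):
        heads.append(w[0])
        w = np.diff(w)
    return w, heads

def integrate(w, heads):
    """Reverse `difference` by cumulative summation ("integration")."""
    x = np.asarray(w, dtype=float)
    for h in reversed(heads):
        x = np.concatenate(([h], h + np.cumsum(x)))
    return x

x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])   # quadratic trend
w2, heads = difference(x, d=2)
print(w2)                                    # constant second differences: 2, 2, 2
print(integrate(w2, heads))                  # recovers 1, 4, 9, 16, 25
```

The quadratic trend disappears after two differences, which illustrates why $d = 2$ removes a second-degree polynomial trend.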

4.5 Box-Jenkins’ Models and Missing Values

Box-Jenkins’ method is a popular choice for analysts in time series modelling. It provides the following advantages:

i) The generalized Box-Jenkins’ models can model a wide variety of time series patterns.

ii) There is a systematic approach for identifying the correct model form.

iii) It provides many statistical tests for verifying model validity.

iv) The statistical theory behind Box-Jenkins’ method also allows statistical measures to be used to assess the accuracy of the forecast.

All the basic Box-Jenkins’ models have recognizable theoretical autocorrelation and partial autocorrelation patterns. To identify a Box-Jenkins’ model for a given time series, we compute the sample autocorrelations and partial autocorrelations (usually displayed as a correlogram) and compare them with the known theoretical ACF and PACF patterns. A summary of the theoretical ACF and PACF patterns associated with AR, MA, and ARMA models can be found in Hoff (1983, p. 71).

However, if missing values occur within the time series data, the sample autocorrelations and partial autocorrelations cannot be computed directly. For this reason, Box-Jenkins’ methods may not be the best choice, and they cannot be applied directly to time series that include missing values. To apply Box-Jenkins’ method to time series data with missing values, we have to consider the following:

i) How often do the missing values occur?

ii) Where are the missing values located in the time series?

iii) Is there sufficient data before, after or between the missing values to apply Box-Jenkins’ method to the remaining data?

It is possible to apply Box-Jenkins’ method indirectly to time series with missing values. In Chapter 3, numerical approaches were used to estimate missing values within a time series, and the accuracy of the results depends mainly on the type of time series. Once the missing values have been filled with estimates, Box-Jenkins’ method can then be applied.

Alternatively, if sufficient data exist before, between or after the missing values, Box-Jenkins’ method can be applied to the sections of data that do not contain missing values. In this case, the missing values can be estimated using forward forecasts, backward forecasts, or a combination of both.

4.6 Least Squares Principle

Applying the least squares principle to time series with missing values is a basic approach which can be incorporated into ARIMA modelling. As outlined in Ferreiro (1987), this method is intended to find missing values in stationary time series. By the use of differencing, it can be applied to time series with one or more gaps, provided the gaps are well separated and there are sufficient data to obtain an ARIMA model.

Consider an ARMA process of order (p, q), written as:

$$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + E_t + \theta_1 E_{t-1} + \dots + \theta_q E_{t-q} \tag{22}$$

Rearranging equation (22), we obtain

$$X_t - \phi_1 X_{t-1} - \dots - \phi_p X_{t-p} = E_t + \theta_1 E_{t-1} + \dots + \theta_q E_{t-q} \tag{23}$$

Substituting the backward shift operator into equation (23) gives

$$\phi(B) X_t = \theta(B) E_t, \qquad \phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p, \qquad \theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q \tag{24}$$

where $B$ is the backward shift operator, $B X_t = X_{t-1}$ and $B^i X_t = X_{t-i}$.

Rearranging equation (24), we have

$$E_t = \frac{\phi(B)}{\theta(B)} X_t$$

which we can write as

$$E_t = \Pi(B) X_t, \quad \text{where } \Pi(B) = 1 - \Pi_1 B - \Pi_2 B^2 - \dots = -\sum_{j=0}^{\infty} \Pi_j B^j, \quad \Pi_0 = -1. \tag{25}$$

From equation (25), we can express this as:

$$E_t = X_t - \Pi_1 X_{t-1} - \Pi_2 X_{t-2} - \dots$$

To calculate the sum of squares of the errors over the $2L + 1$ points $t = -L, \dots, L$, the following formulas are used:

$$SS = \sum_{t=-L}^{L} E_t^2 = \sum_{t=-L}^{L} \Big( -\sum_{i=0}^{\infty} \Pi_i X_{t-i} \Big)^2$$

To minimize the sum of squares with respect to the missing values, the following steps are required. Let $X_s$ be missing; then

$$\frac{\partial SS}{\partial X_s} = 0 \iff 2 \sum_{t=s}^{L} \Pi_{t-s} \Big( \sum_{i=0}^{\infty} \Pi_i X_{t-i} \Big) = 0 \tag{26}$$

If we let $L \to \infty$ and substitute $j = t - s$ into equation (26), then

$$\sum_{j=0}^{\infty} \Pi_j \Big( \sum_{i=0}^{\infty} \Pi_i X_{s+j-i} \Big) = 0,$$

that is,

$$-\sum_{j=0}^{\infty} \Pi_j E_{s+j} = \Pi(B^{-1}) E_s = 0, \quad \text{where} \quad E_{s+j} = \Pi(B) X_{s+j} = -\sum_{i=0}^{\infty} \Pi_i X_{s+j-i}. \tag{27}$$

From equation (27), we obtain the least squares equation for approximating the missing observations:

$$\Pi(B^{-1}) \Pi(B) X_s = 0$$

(Ferreiro, 1987, p. 66)
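For a pure AR(p) model, $\Pi(B) = \phi(B)$ is a finite polynomial, so $\Pi(B^{-1})\Pi(B)$ is a symmetric two-sided filter with $2p + 1$ coefficients. Setting it to zero at the missing time point gives the estimate directly. The sketch below covers only this pure AR case, with a single missing value and $p$ observed values on each side. MA terms would make $\Pi(B)$ infinite, and they are not handled here. The helper names are hypothetical.

```python
import numpy as np

def ls_filter(phi):
    """Coefficients of Pi(B^{-1}) Pi(B) for a pure AR(p), where
    Pi(B) = 1 - phi_1 B - ... - phi_p B^p. Returned for powers B^{-p}..B^{p}
    (the filter is symmetric, so the ordering is the same either way)."""
    a = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))
    return np.convolve(a, a[::-1])

def ls_interpolate_single(x, s, phi):
    """Least squares estimate of one missing value x[s] from
    Pi(B^{-1}) Pi(B) X_s = 0; needs p observed values on each side of s."""
    p = len(phi)
    c = ls_filter(phi)
    window = np.asarray(x[s - p: s + p + 1], dtype=float)
    centre = np.arange(2 * p + 1) == p
    neighbours = np.where(centre, 0.0, window)  # ignore the missing centre value
    return -np.dot(c, neighbours) / c[p]

# Data of Table 4.6.1 (t = 1..10); the value at t = 6 is treated as missing.
x = [-0.65712, -1.30722, -2.43663, -2.06851, -1.48044,
     np.nan, -1.02845, 1.474587, -0.71027, -1.23367]
print(ls_filter([0.7]))                       # -0.7, 1.49, -0.7
print(ls_interpolate_single(x, 5, [0.7]))     # about -1.17867
```

For AR(1) with $\phi = 0.7$, the filter reproduces equation (28) of the example that follows, $(1.49 - 0.7B - 0.7B^{-1})X_t = 0$.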

Example 4.6.1 Estimate missing value in AR(1) process by using the Least Squares Principle

$$X_t = 0.7 X_{t-1} + E_t$$

$$(1 - 0.7B) X_t = E_t$$

$$\Pi(B) = 1 - 0.7B, \qquad \Pi(B^{-1}) = 1 - 0.7B^{-1}$$

$$\Pi(B^{-1}) \Pi(B) X_s = 0$$

$$(1 - 0.7B)(1 - 0.7B^{-1}) X_t = 0$$

$$(1 - 0.7B - 0.7B^{-1} + 0.49) X_t = 0$$

$$(1.49 - 0.7B - 0.7B^{-1}) X_t = 0 \tag{28}$$

From equation (28), we can write

$$1.49 X_t = 0.7 X_{t+1} + 0.7 X_{t-1}$$

Hence,

$$\hat{X}_t = \frac{0.7}{1.49} X_{t+1} + \frac{0.7}{1.49} X_{t-1}$$

where $\hat{X}_t$ is the estimate for the missing value.

The data below were generated from the AR(1) process, with $E_t$ a purely random process with mean zero and variance one.

| t | $X_t$ | Estimate |
|---|---|---|
| 1 | -0.65712 | |
| 2 | -1.30722 | |
| 3 | -2.43663 | |
| 4 | -2.06851 | |
| 5 | -1.48044 | |
| 6 | (-1.00819) missing | -1.17867, from $\hat{X}_t = \frac{0.7}{1.49} X_{t+1} + \frac{0.7}{1.49} X_{t-1}$ |
| 7 | -1.02845 | |
| 8 | 1.474587 | |
| 9 | -0.71027 | |
| 10 | -1.23367 | |

Table 4.6.1 AR(1) time series model with missing value when t = 6.

Figure 4.1 Estimate missing value in AR(1) process by using the Least Squares Principle. [Plot titled "Least Square Estimation", showing the original data and the estimated data against t.]

Example 4.6.2 Estimate two missing values in AR(1) process by using the Least Squares Principle

| t | $X_t$ | Estimate |
|---|---|---|
| 1 | -0.65712 | |
| 2 | -1.30722 | |
| 3 | -2.43663 | |
| 4 | -2.06851 | |
| 5 | -1.48044 | |
| 6 | (-1.00819) missing | -0.30314, from $\hat{X}_t = \frac{0.7}{1.49} \hat{X}_{t+1} + \frac{0.7}{1.49}(-1.48044)$ |
| 7 | (-1.02845) missing | 0.83518, from $\hat{X}_{t+1} = \frac{0.7}{1.49}(1.474587) + \frac{0.7}{1.49} \hat{X}_t$ |
| 8 | 1.474587 | |
| 9 | -0.71027 | |
| 10 | -1.23367 | |

Table 4.6.2 AR(1) time series model with missing values when t = 6 and 7.
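When several values are missing, the equations from $\partial SS / \partial X_s = 0$ are coupled, and they have to be solved simultaneously. For a pure AR(p) model, each error $E_t = X_t - \phi_1 X_{t-1} - \dots - \phi_p X_{t-p}$ is linear in the unknowns. Minimizing $\sum_t E_t^2$ over a finite sample is therefore an ordinary linear least squares problem. The sketch below assumes the pure AR case and a finite sample ($t = p, \dots, n-1$), and uses NumPy's `np.linalg.lstsq`. The function name `ls_fill_ar` is hypothetical.

```python
import numpy as np

def ls_fill_ar(x, phi):
    """Fill NaN entries of x by minimising sum_t E_t^2 over the missing values,
    where E_t = X_t - phi_1 X_{t-1} - ... - phi_p X_{t-p} for t = p, ..., n-1.
    Each E_t is linear in the unknowns, so this is a linear least squares problem."""
    x = np.asarray(x, dtype=float)
    p, n = len(phi), len(x)
    missing = [int(s) for s in np.flatnonzero(np.isnan(x))]
    col = {s: k for k, s in enumerate(missing)}
    a = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))  # weights of X_t, X_{t-1}, ...
    rows, rhs = [], []
    for t in range(p, n):
        row = np.zeros(len(missing))
        known = 0.0
        for i, weight in enumerate(a):
            if t - i in col:
                row[col[t - i]] += weight
            else:
                known += weight * x[t - i]
        rows.append(row)   # E_t = row @ unknowns + known
        rhs.append(-known)
    solution, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    filled = x.copy()
    filled[missing] = solution
    return filled

# Data of Table 4.6.2 (t = 1..10) with t = 6 and t = 7 treated as missing.
x = [-0.65712, -1.30722, -2.43663, -2.06851, -1.48044,
     np.nan, np.nan, 1.474587, -0.71027, -1.23367]
filled = ls_fill_ar(x, [0.7])
print(filled[5], filled[6])   # about -0.4749 and 0.4697
```

Solved jointly in this way, the two equations in Table 4.6.2 give approximately $\hat{X}_6 \approx -0.4749$ and $\hat{X}_7 \approx 0.4697$. These values satisfy both equations in the table. The tabulated pair (-0.30314, 0.83518) satisfies the first equation but not the second: substituting $\hat{X}_t = -0.30314$ into the second gives about 0.5503. The tabulated estimates are therefore worth rechecking.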

Figure 4.2 Estimate two missing values in AR(1) process by using the Least Squares Principle. [Plot titled "Least Square Estimation", showing the original data and the estimated data against t.]
The examples above show how Box-Jenkins models can be used to estimate missing values. However, carrying out this method by hand is practical only for low-order ARMA models. For higher-order ARMA(p, q) models, calculating the missing values is more complicated, and the computations are usually carried out using appropriate software packages.

4.7 Interpolation

Interpolation is a concept which can be applied to time series data to determine missing values. First, we determine an appropriate ARIMA time series model for the known data. According to the position of the missing observations, we then calculate the appropriate weights for both the forward and backward forecasts. Finally, using a linear combination of the forward and backward forecasts with these weights, we can predict the values of the missing observations under the given conditions.

Let us consider the ARIMA(p, d, q) (autoregressive integrated moving average) model

$$\phi(B)(1 - B)^d X_t = \theta(B) E_t$$

where $B$ is the backward shift operator such that $B^j x_t = x_{t-j}$,

$$\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p, \qquad \theta(B) = 1 - \theta_1 B - \dots - \theta_q B^q,$$

with the roots of $\phi(B)\theta(B) = 0$ lying outside the unit circle, and $\{E_t\}$ is a sequence of independent identically distributed normal random variables with mean zero and variance $\sigma_z^2$.
Given the observations $x_q, x_{q-1}, \dots$, the error of the minimum mean square error forward forecast $\hat{x}_q(l)$ of $x_{q+l}$ $(l \ge 1)$ made at time $q$ is

$$\hat{e}_q(l) = \sum_{j=0}^{l-1} \psi_j E_{q+l-j}, \qquad \psi_0 = 1,$$

where the $\psi_j$ are defined by

$$\theta(B) = (1 + \psi_1 B + \psi_2 B^2 + \dots)\,\phi(B)(1 - B)^d.$$

In addition, the "backward" representation of our ARIMA(p, d, q) model can be expressed as:

$$\phi(F)(1 - F)^d x_t = \theta(F) c_t$$

where $F$ is the forward shift operator such that $F^j x_t = x_{t+j}$, and $\{c_t\}$ is a sequence of independent identically distributed normal random variables with mean zero and variance $\sigma_c^2$ ($\sigma_z^2 = \sigma_c^2$).

Now, given $x_{q+m+j}$ $(j \ge 1)$, the error of the minimum mean square error backward forecast $\tilde{x}_s(m+1-l)$ of $x_{q+l}$ made at time $s = q + m + 1$ is

$$\tilde{e}_s(m+1-l) = \sum_{j=0}^{m-l} \psi_j c_{q+l+j}, \qquad s = q + m + 1.$$

For stationary models, an optimal estimate $\hat{X}_l$ of $x_{q+l}$ may be obtained by finding the best linear combination of $\hat{x}_q(l)$ and $\tilde{x}_s(m+1-l)$. The estimate based on this linear combination may be given as $\hat{X}' = (\hat{x}_{q+1}, \dots, \hat{x}_{q+m})$ with
$$\hat{x}_{q+l} = w_{1l}\, \hat{x}_q(l) + w_{2l}\, \tilde{x}_s(m+1-l)$$

where $w_{1l}$ and $w_{2l}$ are the weights, yet to be calculated, for the forward and backward forecasts respectively, and

$$\hat{x}_q(l) = \sum_{j=1}^{\infty} \pi_j^{(l)} x_{q-j+1}, \qquad \tilde{x}_s(m+1-l) = \sum_{j=1}^{\infty} \pi_j^{(m+1-l)} x_{q+m+j},$$

$$\pi_j^{(l)} = \pi_{j+l-1} + \sum_{h=1}^{l-1} \pi_h \pi_j^{(l-h)}, \qquad \pi_j^{(1)} = \pi_j, \quad j = 1, 2, \dots$$

In order to calculate the weights $w_{1l}$ and $w_{2l}$, we use the following equation:

$$\begin{aligned} E[x_{q+l} - w_{1l}\hat{x}_q(l) - w_{2l}\tilde{x}_s(m+1-l)]^2 &= w_{1l}^2 \sigma_1^2 + w_{2l}^2 \sigma_2^2 + 2 w_{1l} w_{2l} \sigma_{12} + 2 w_{1l}(1 - w_{1l} - w_{2l})\sigma_{10} \\ &\quad + 2 w_{2l}(1 - w_{1l} - w_{2l})\sigma_{20} + (1 - w_{1l} - w_{2l})^2 \sigma_x^2 \end{aligned} \tag{29}$$

where

$$\sigma_x^2 = v(x_{q+l}) = \Big(\sum_{j=0}^{\infty} \psi_j^2\Big)\sigma_z^2,$$

$$\sigma_1^2 = v(\hat{e}) = \Big(\sum_{j=0}^{l-1} \psi_j^2\Big)\sigma_z^2 = \sigma_{10} = \operatorname{cov}(\hat{e}, x_{q+l}),$$

$$\sigma_2^2 = v(\tilde{e}) = \Big(\sum_{j=0}^{m-l} \psi_j^2\Big)\sigma_z^2 = \sigma_{20} = \operatorname{cov}(\tilde{e}, x_{q+l}),$$

$$\hat{e} = \hat{e}_q(l), \qquad \tilde{e} = \tilde{e}_s(m+1-l),$$

$$\sigma_{12} = \psi_1' D \psi_2 = \operatorname{cov}(\hat{e}, \tilde{e}),$$

and $\psi_1' = (1, \psi_1, \dots, \psi_{l-1})$, $\psi_2' = (1, \psi_1, \dots, \psi_{m-l})$, and $D = (d_{ij})$ is an $l \times (m+1-l)$ matrix with

$$d_{ij} = E(E_{q+l+1-i}\, c_{q+l-1+j}) = \sum_{h=0}^{\infty} (-\pi_h)\psi_{h+i+j-2}, \qquad \pi_0 = -1,$$

$(i = 1, 2, \dots, l;\ j = 1, 2, \dots, m+1-l;\ l = 1, 2, \dots, m)$.

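By taking partial derivatives of equation (29) with respect to $w_{1l}$ and $w_{2l}$, it can be shown that the minimizing values of $w_{1l}$ and $w_{2l}$ are given by

$$w_{1l} = L_{22}(L_{11} - L_{12})/H, \qquad w_{2l} = L_{11}(L_{22} - L_{12})/H$$

where

$$H = L_{11} L_{22} - L_{12}^2 \tag{30}$$

$$L_{11} = \sigma_x^2 - \sigma_1^2 \tag{31}$$

$$L_{22} = \sigma_x^2 - \sigma_2^2 \tag{32}$$

$$L_{12} = \sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2 \tag{33}$$

(Abraham, 1981, p. 1646)

As a result, we have the following equations:

$$w_{1l} = \frac{(\sigma_x^2 - \sigma_2^2)\big[(\sigma_x^2 - \sigma_1^2) - (\sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2)\big]}{(\sigma_x^2 - \sigma_1^2)(\sigma_x^2 - \sigma_2^2) - (\sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2)^2} \tag{34}$$

$$w_{2l} = \frac{(\sigma_x^2 - \sigma_1^2)\big[(\sigma_x^2 - \sigma_2^2) - (\sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2)\big]}{(\sigma_x^2 - \sigma_1^2)(\sigma_x^2 - \sigma_2^2) - (\sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2)^2} \tag{35}$$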
However, in the non-stationary case the variance $\sigma_x^2$ is not finite, so the mean square error in (29) cannot be minimized in this way. Abraham (1981) proposes the following solution to this problem:

$$w_{1l} = \frac{\sigma_2^2 - \sigma_{12}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}, \qquad w_{2l} = 1 - w_{1l} \tag{36}$$
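Equations (30) to (36) translate directly into two small functions. The sketch below assumes that the variances $\sigma_x^2, \sigma_1^2, \sigma_2^2$ and the covariance $\sigma_{12}$ have already been computed, for example with $\sigma_z^2 = 1$. The function names are illustrative only.

```python
def abraham_weights(var_x, var_1, var_2, cov_12):
    """Stationary case, equations (30)-(35): returns (w_1l, w_2l)."""
    l11 = var_x - var_1                    # (31)
    l22 = var_x - var_2                    # (32)
    l12 = var_x + cov_12 - var_1 - var_2   # (33)
    h = l11 * l22 - l12 ** 2               # (30)
    return l22 * (l11 - l12) / h, l11 * (l22 - l12) / h

def abraham_weights_nonstationary(var_1, var_2, cov_12):
    """Non-stationary case, equation (36): returns (w_1l, w_2l)."""
    w1 = (var_2 - cov_12) / (var_1 + var_2 - 2.0 * cov_12)
    return w1, 1.0 - w1

# AR(1) with phi = 0.6 and sigma_z^2 = 1, one missing value (m = l = 1):
# sigma_x^2 = 1/(1 - phi^2), sigma_1^2 = sigma_2^2 = 1, sigma_12 = 1 - phi^2.
phi = 0.6
print(abraham_weights(1 / (1 - phi**2), 1.0, 1.0, 1 - phi**2))  # both about 0.7353
print(abraham_weights_nonstationary(1.0, 1.0, 0.3))             # both 0.5
```

In the AR(1) example, both weights equal $1/(1 + \phi^2) \approx 0.7353$. Whenever $\sigma_1^2 = \sigma_2^2$, the non-stationary formula gives 0.5 for each weight.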

To gain further understanding of the principles detailed in this section, the above approach is applied to some simple time series models. When only one observation is missing, the forward and backward forecasts receive the same weighting. In addition, if the model is non-stationary with one difference (d = 1), the estimate of the missing value is the average of the forward and backward forecasts of the missing observation.

When a time series has one observation missing (m = 1, l = 1), we have the following variances and covariance, and hence the weightings satisfy equation (38):

$$\sigma_1^2 = \sigma_2^2 = \sigma_z^2, \qquad \sigma_{12} = \Big(1 - \sum_{h=1}^{\infty} \pi_h \psi_h\Big)\sigma_z^2 \tag{37}$$

$$w_{11} = w_{21} \tag{38}$$

4.7.1 Weighting for general ARIMA models

For this time series model, we have the following standard equation:

$$\phi(B) X_t = \theta(B) E_t \tag{39}$$

Rearranging equation (39), we have

$$X_t = \frac{\theta(B)}{\phi(B)} E_t = (1 + \psi_1 B + \psi_2 B^2 + \dots) E_t.$$

Equation (39) can also be rearranged as

$$E_t = \frac{\phi(B)}{\theta(B)} X_t = (1 - \pi_1 B - \pi_2 B^2 - \dots) X_t.$$

When one observation is missing (m = 1, so l = 1), we obtain the variances as follows:

$$\sigma_1^2 = v(\hat{e}) = \Big(\sum_{j=0}^{l-1} \psi_j^2\Big)\sigma_z^2 = \Big(\sum_{j=0}^{0} \psi_j^2\Big)\sigma_z^2 = \psi_0^2 \sigma_z^2 = \sigma_z^2 \tag{40}$$

$$\sigma_2^2 = v(\tilde{e}) = \Big(\sum_{j=0}^{m-l} \psi_j^2\Big)\sigma_z^2 = \Big(\sum_{j=0}^{0} \psi_j^2\Big)\sigma_z^2 = \psi_0^2 \sigma_z^2 = \sigma_z^2 \tag{41}$$
Example 4.7.1: Calculate weightings for ARIMA (1,0,0) model

The model is expressed as

$$(1 - \phi B) X_t = E_t \tag{42}$$

Rearranging equation (42), we obtain

$$X_t = \frac{1}{1 - \phi B} E_t = (1 + \phi B + \phi^2 B^2 + \dots) E_t$$

so that

$$\psi_1 = \phi, \quad \psi_2 = \phi^2, \quad \dots, \quad \psi_k = \phi^k. \tag{43}$$

Equation (42) can also be expressed as

$$E_t = \frac{1 - \phi B}{1} X_t,$$

so that $\pi_1 = \phi$ and $\pi_k = 0$ for $k > 1$.

Hence, substituting (43) into the expression for $\sigma_x^2$, we obtain:

$$\sigma_x^2 = v(x_{q+l}) = \Big(\sum_{j=0}^{\infty} \psi_j^2\Big)\sigma_z^2 = \Big(\sum_{j=0}^{\infty} \phi^{2j}\Big)\sigma_z^2 = (1 + \phi^2 + \phi^4 + \dots)\sigma_z^2 = \Big(\frac{1}{1 - \phi^2}\Big)\sigma_z^2 \tag{44}$$
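From equations (40) and (41), we have $\sigma_1^2 = \sigma_2^2 = \sigma_z^2$.

As stated in equation (31), $L_{11} = \sigma_x^2 - \sigma_1^2$. Substituting equation (44) into equation (31), we obtain:

$$L_{11} = \Big(\frac{1}{1 - \phi^2}\Big)\sigma_z^2 - \sigma_z^2 = \Big(\frac{\phi^2}{1 - \phi^2}\Big)\sigma_z^2 \tag{45}$$

From equation (32), $L_{22} = \sigma_x^2 - \sigma_2^2$. Substituting equation (44) into equation (32), we obtain:

$$L_{22} = \Big(\frac{1}{1 - \phi^2}\Big)\sigma_z^2 - \sigma_z^2 = \Big(\frac{\phi^2}{1 - \phi^2}\Big)\sigma_z^2 \tag{46}$$

Also, from equation (33), $L_{12} = \sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2$. Substituting equations (37), (40), (41) and (44) into (33), and using $\sum_{h=1}^{\infty} \pi_h \psi_h = \pi_1 \psi_1 = \phi^2$, we obtain

$$\begin{aligned} L_{12} &= \Big(\frac{1}{1 - \phi^2}\Big)\sigma_z^2 + \Big(1 - \sum_{h=1}^{\infty} \pi_h \psi_h\Big)\sigma_z^2 - \sigma_z^2 - \sigma_z^2 \\ &= \Big(\frac{1}{1 - \phi^2}\Big)\sigma_z^2 + \sigma_z^2 - \Big(\sum_{h=1}^{\infty} \pi_h \psi_h\Big)\sigma_z^2 - 2\sigma_z^2 \\ &= \Big(\frac{1}{1 - \phi^2}\Big)\sigma_z^2 + \sigma_z^2 - \phi^2 \sigma_z^2 - 2\sigma_z^2 \\ &= \Big(\frac{1 - \phi^2(1 - \phi^2) - (1 - \phi^2)}{1 - \phi^2}\Big)\sigma_z^2 \\ &= \Big(\frac{\phi^4}{1 - \phi^2}\Big)\sigma_z^2 \end{aligned} \tag{47}$$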
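Since $L_{11} = L_{22}$, and equation (30) states that

$$H = L_{11} L_{22} - L_{12}^2,$$

we can substitute equations (45), (46) and (47) into (30):

$$H = \Big(\frac{\phi^2}{1 - \phi^2}\Big)^2 (\sigma_z^2)^2 - \Big(\frac{\phi^4}{1 - \phi^2}\Big)^2 (\sigma_z^2)^2 = \frac{\phi^4 - \phi^8}{(1 - \phi^2)^2} (\sigma_z^2)^2$$

Now, equation (34) can be rewritten as:

$$\begin{aligned} w_{1l} &= \frac{\Big(\frac{\phi^2}{1 - \phi^2}\Big)\sigma_z^2 \Big(\Big(\frac{\phi^2}{1 - \phi^2}\Big)\sigma_z^2 - \Big(\frac{\phi^4}{1 - \phi^2}\Big)\sigma_z^2\Big)}{\frac{\phi^4 - \phi^8}{(1 - \phi^2)^2}(\sigma_z^2)^2} = \frac{\Big(\frac{\phi^2}{1 - \phi^2}\Big)\sigma_z^2 \Big(\frac{\phi^2 - \phi^4}{1 - \phi^2}\Big)\sigma_z^2}{\frac{\phi^4 - \phi^8}{(1 - \phi^2)^2}(\sigma_z^2)^2} \\ &= \frac{\phi^2(\phi^2 - \phi^4)}{(1 - \phi^2)^2} \times \frac{(1 - \phi^2)^2}{\phi^4 - \phi^8} = \frac{\phi^4 - \phi^6}{\phi^4 - \phi^8} = \frac{\phi^4 - \phi^6}{(\phi^4 - \phi^6)(1 + \phi^2)} = \frac{1}{1 + \phi^2} \end{aligned}$$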

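In order to calculate $w_{2l}$, we refer to equation (35):

$$w_{2l} = \frac{(\sigma_x^2 - \sigma_1^2)[(\sigma_x^2 - \sigma_2^2) - (\sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2)]}{(\sigma_x^2 - \sigma_1^2)(\sigma_x^2 - \sigma_2^2) - (\sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2)^2}$$

Since $\sigma_1^2 = \sigma_2^2 = \sigma_z^2$, equations (34) and (35) coincide. Therefore $w_{2l} = w_{1l}$.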

Example 4.7.2: Calculate weightings for ARIMA (0,0,1) model

In this example, the time series model can be expressed as

$$X_t = (1 - \theta B) E_t \tag{48}$$

and from the model we can establish the following:

$$\psi_1 = -\theta, \quad \psi_k = 0 \ (k > 1).$$

Equation (48) can also be expressed as

$$E_t = \frac{1}{1 - \theta B} X_t.$$

Hence,

$$E_t = (1 + \theta B + \theta^2 B^2 + \dots) X_t$$

and therefore

$$\pi_1 = -\theta, \quad \pi_2 = -\theta^2, \quad \dots, \quad \pi_k = -\theta^k.$$

From the definition of $\sigma_x^2$, we have:

$$\sigma_x^2 = v(x_{q+l}) = \Big(\sum_{j=0}^{\infty} \psi_j^2\Big)\sigma_z^2 = (1 + \theta^2)\sigma_z^2 \tag{49}$$

From the previous example, we have already established that $\sigma_1^2 = \sigma_2^2 = \sigma_z^2$, and equation (31) is $L_{11} = \sigma_x^2 - \sigma_1^2$. Substituting equation (49) into (31) gives

$$L_{11} = (1 + \theta^2)\sigma_z^2 - \sigma_z^2 = \theta^2 \sigma_z^2.$$

Also, equation (32) states that $L_{22} = \sigma_x^2 - \sigma_2^2$. To find $L_{22}$, we substitute equation (49) into (32) and obtain

$$L_{22} = (1 + \theta^2)\sigma_z^2 - \sigma_z^2 = \theta^2 \sigma_z^2.$$

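From equation (33), $L_{12} = \sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2$. Substituting equations (37) and (49) into (33), and using $\sum_{h=1}^{\infty} \pi_h \psi_h = \pi_1 \psi_1 = \theta^2$, we obtain

$$\begin{aligned} L_{12} &= (1 + \theta^2)\sigma_z^2 + \Big(1 - \sum_{h=1}^{\infty} \pi_h \psi_h\Big)\sigma_z^2 - \sigma_z^2 - \sigma_z^2 \\ &= (1 + \theta^2)\sigma_z^2 + \sigma_z^2 - \Big(\sum_{h=1}^{\infty} \pi_h \psi_h\Big)\sigma_z^2 - 2\sigma_z^2 \\ &= (1 + \theta^2)\sigma_z^2 + \sigma_z^2 - \theta^2 \sigma_z^2 - 2\sigma_z^2 \\ &= (1 + \theta^2 - \theta^2 - 1)\sigma_z^2 = 0 \end{aligned}$$

and from equation (30) we can establish the following:

$$H = (\theta^2)^2 (\sigma_z^2)^2 - 0^2 = \theta^4 (\sigma_z^2)^2$$

Equation (34) can then be evaluated as

$$w_{1l} = \frac{(\sigma_x^2 - \sigma_2^2)[(\sigma_x^2 - \sigma_1^2) - (\sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2)]}{(\sigma_x^2 - \sigma_1^2)(\sigma_x^2 - \sigma_2^2) - (\sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2)^2} = \frac{L_{22}(L_{11} - L_{12})}{H} = \frac{\theta^2 \sigma_z^2 (\theta^2 \sigma_z^2 - 0)}{\theta^4 (\sigma_z^2)^2} = 1$$

From equation (35) we know

$$w_{2l} = \frac{(\sigma_x^2 - \sigma_1^2)[(\sigma_x^2 - \sigma_2^2) - (\sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2)]}{(\sigma_x^2 - \sigma_1^2)(\sigma_x^2 - \sigma_2^2) - (\sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2)^2}$$

and $\sigma_1^2 = \sigma_2^2 = \sigma_z^2$. Therefore $w_{2l} = w_{1l}$.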

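Example 4.7.3: Calculate weightings for ARIMA (1,0,1) model

For this time series model, we have the following equation:

$$(1 - \phi B) X_t = (1 - \theta B) E_t \tag{50}$$

Equation (50) can be rearranged as

$$\begin{aligned} X_t &= \frac{1 - \theta B}{1 - \phi B} E_t = (1 + \phi B + \phi^2 B^2 + \dots)(1 - \theta B) E_t \\ &= (1 + \phi B - \theta B + \phi^2 B^2 - \phi\theta B^2 + \phi^3 B^3 - \phi^2\theta B^3 + \dots) E_t \\ &= (1 + (\phi - \theta)B + (\phi^2 - \phi\theta)B^2 + (\phi^3 - \phi^2\theta)B^3 + \dots) E_t \end{aligned}$$

Hence,

$$\psi_1 = \phi - \theta, \quad \psi_2 = \phi^2 - \phi\theta, \quad \dots, \quad \psi_k = \phi^k - \phi^{k-1}\theta \ (k \ge 1) \tag{51}$$

Equation (50) can also be rewritten as

$$\begin{aligned} E_t &= \frac{1 - \phi B}{1 - \theta B} X_t = (1 + \theta B + \theta^2 B^2 + \dots)(1 - \phi B) X_t \\ &= (1 + \theta B - \phi B + \theta^2 B^2 - \phi\theta B^2 + \theta^3 B^3 - \theta^2\phi B^3 + \dots) X_t \\ &= (1 + (\theta - \phi)B + (\theta^2 - \phi\theta)B^2 + (\theta^3 - \theta^2\phi)B^3 + \dots) X_t \end{aligned}$$

Hence, since $E_t = (1 - \pi_1 B - \pi_2 B^2 - \dots) X_t$,

$$\pi_1 = \phi - \theta, \quad \pi_2 = \phi\theta - \theta^2, \quad \dots, \quad \pi_k = \phi\theta^{k-1} - \theta^k \ (k \ge 1).$$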

Substituting equation (51) into the expression for $\sigma_x^2$, we have the following:

$$\begin{aligned} \sigma_x^2 = v(x_{q+l}) &= \Big(\sum_{j=0}^{\infty} \psi_j^2\Big)\sigma_z^2 \\ &= \big(1 + (\phi - \theta)^2 + (\phi^2 - \phi\theta)^2 + (\phi^3 - \phi^2\theta)^2 + \dots\big)\sigma_z^2 \\ &= \big(1 + (\phi - \theta)^2 + \phi^2(\phi - \theta)^2 + \phi^4(\phi - \theta)^2 + \dots\big)\sigma_z^2 \\ &= \big\{1 + (\phi - \theta)^2 (1 + \phi^2 + \phi^4 + \dots)\big\}\sigma_z^2 \\ &= \Big\{1 + (\phi - \theta)^2 \Big(\frac{1}{1 - \phi^2}\Big)\Big\}\sigma_z^2 \\ &= \Big(1 + \frac{(\phi - \theta)^2}{1 - \phi^2}\Big)\sigma_z^2 \end{aligned} \tag{52}$$

As $\sigma_1^2 = \sigma_2^2 = \sigma_z^2$ and $L_{11} = \sigma_x^2 - \sigma_1^2$, equation (31) becomes:

$$L_{11} = \Big(1 + \frac{(\phi - \theta)^2}{1 - \phi^2}\Big)\sigma_z^2 - \sigma_z^2 = \Big(\frac{(\phi - \theta)^2}{1 - \phi^2}\Big)\sigma_z^2$$

Also, equation (32), $L_{22} = \sigma_x^2 - \sigma_2^2$, becomes

$$L_{22} = \Big(1 + \frac{(\phi - \theta)^2}{1 - \phi^2}\Big)\sigma_z^2 - \sigma_z^2 = \Big(\frac{(\phi - \theta)^2}{1 - \phi^2}\Big)\sigma_z^2$$

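Consider:

$$\begin{aligned} \sum_{h=1}^{\infty} \pi_h \psi_h &= (\phi - \theta)(\phi - \theta) + (\phi^2 - \phi\theta)(\phi\theta - \theta^2) + (\phi^3 - \phi^2\theta)(\phi\theta^2 - \theta^3) + \dots \\ &= (\phi - \theta)^2 + \phi(\phi - \theta)\,\theta(\phi - \theta) + \phi^2(\phi - \theta)\,\theta^2(\phi - \theta) + \dots \\ &= (\phi - \theta)^2 + \phi\theta(\phi - \theta)^2 + \phi^2\theta^2(\phi - \theta)^2 + \dots \\ &= (\phi - \theta)^2 (1 + \phi\theta + \phi^2\theta^2 + \dots) \\ &= (\phi - \theta)^2 \Big(\frac{1}{1 - \phi\theta}\Big) = \frac{(\phi - \theta)^2}{1 - \phi\theta} \end{aligned}$$

As equation (33) is $L_{12} = \sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2$, we can substitute equations (37) and (52) into (33):

$$\begin{aligned} L_{12} &= \Big(1 + \frac{(\phi - \theta)^2}{1 - \phi^2}\Big)\sigma_z^2 + \Big(1 - \sum_{h=1}^{\infty} \pi_h \psi_h\Big)\sigma_z^2 - \sigma_z^2 - \sigma_z^2 \\ &= \Big(1 + \frac{(\phi - \theta)^2}{1 - \phi^2}\Big)\sigma_z^2 + \sigma_z^2 - \Big(\frac{(\phi - \theta)^2}{1 - \phi\theta}\Big)\sigma_z^2 - 2\sigma_z^2 \\ &= \Big(1 + \frac{(\phi - \theta)^2}{1 - \phi^2} - \frac{(\phi - \theta)^2}{1 - \phi\theta} - 1\Big)\sigma_z^2 \\ &= \Big(\frac{(\phi - \theta)^2 (1 - \phi\theta) - (\phi - \theta)^2 (1 - \phi^2)}{(1 - \phi^2)(1 - \phi\theta)}\Big)\sigma_z^2 \\ &= \Big(\frac{(\phi - \theta)^2 (1 - \phi\theta - 1 + \phi^2)}{(1 - \phi^2)(1 - \phi\theta)}\Big)\sigma_z^2 \\ &= \Big(\frac{(\phi - \theta)^2 (\phi^2 - \phi\theta)}{(1 - \phi^2)(1 - \phi\theta)}\Big)\sigma_z^2 \end{aligned}$$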

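and from equation (30)

$$\begin{aligned} H &= \Big(\frac{(\phi - \theta)^2}{1 - \phi^2}\Big)^2 (\sigma_z^2)^2 - \Big(\frac{(\phi - \theta)^2 (\phi^2 - \phi\theta)}{(1 - \phi^2)(1 - \phi\theta)}\Big)^2 (\sigma_z^2)^2 \\ &= \Big(\frac{(\phi - \theta)^4}{(1 - \phi^2)^2}\Big)(\sigma_z^2)^2 - \Big(\frac{(\phi - \theta)^4 (\phi^2 - \phi\theta)^2}{(1 - \phi^2)^2 (1 - \phi\theta)^2}\Big)(\sigma_z^2)^2 \\ &= \Big(\frac{(\phi - \theta)^4 (1 - \phi\theta)^2 - (\phi - \theta)^4 (\phi^2 - \phi\theta)^2}{(1 - \phi^2)^2 (1 - \phi\theta)^2}\Big)(\sigma_z^2)^2 \end{aligned}$$

Using equation (34) to find $w_{1l}$, we have

$$\begin{aligned} w_{1l} &= \frac{\Big(\frac{(\phi - \theta)^2}{1 - \phi^2}\Big)\sigma_z^2 \Big(\Big(\frac{(\phi - \theta)^2}{1 - \phi^2}\Big)\sigma_z^2 - \Big(\frac{(\phi - \theta)^2 (\phi^2 - \phi\theta)}{(1 - \phi^2)(1 - \phi\theta)}\Big)\sigma_z^2\Big)}{\Big(\frac{(\phi - \theta)^4 (1 - \phi\theta)^2 - (\phi - \theta)^4 (\phi^2 - \phi\theta)^2}{(1 - \phi^2)^2 (1 - \phi\theta)^2}\Big)(\sigma_z^2)^2} \\ &= \frac{(\phi - \theta)^4 (1 - \phi\theta) - (\phi - \theta)^4 (\phi^2 - \phi\theta)}{(1 - \phi^2)^2 (1 - \phi\theta)} \times \frac{(1 - \phi^2)^2 (1 - \phi\theta)^2}{(\phi - \theta)^4 (1 - \phi\theta)^2 - (\phi - \theta)^4 (\phi^2 - \phi\theta)^2} \\ &= \frac{(1 - \phi\theta) - (\phi^2 - \phi\theta)}{1} \times \frac{1 - \phi\theta}{(1 - \phi\theta)^2 - (\phi^2 - \phi\theta)^2} \\ &= \frac{1 - \phi\theta}{1 - \phi\theta + \phi^2 - \phi\theta} = \frac{1 - \phi\theta}{1 - 2\phi\theta + \phi^2} \end{aligned}$$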
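As we know that

$$w_{2l} = \frac{(\sigma_x^2 - \sigma_1^2)[(\sigma_x^2 - \sigma_2^2) - (\sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2)]}{(\sigma_x^2 - \sigma_1^2)(\sigma_x^2 - \sigma_2^2) - (\sigma_x^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2)^2}$$

and $\sigma_1^2 = \sigma_2^2 = \sigma_z^2$, therefore $w_{2l} = w_{1l}$.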

Example 4.7.4: Calculate weightings for ARIMA (p,1,q) model

For this class of model, $\phi(B)(1 - B) X_t = \theta(B) E_t$. In any non-stationary case,

$$w_{1l} = \frac{\sigma_2^2 - \sigma_{12}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}.$$

When one observation is missing (m = 1), we already know that $\sigma_1^2 = \sigma_2^2 = \sigma_z^2$. Therefore

$$w_{1l} = \frac{\sigma_z^2 - \sigma_{12}}{\sigma_z^2 + \sigma_z^2 - 2\sigma_{12}} = \frac{1}{2}.$$

Also $w_{2l} = 1 - w_{1l}$, therefore $w_{2l} = \frac{1}{2}$.
The weightings for the above cases are summarized in the following table:

| ARIMA model | Weightings $w_{1l} = w_{2l}$ |
|---|---|
| (1,0,0) | $\frac{1}{1 + \phi^2}$ |
| (0,0,1) | $1$ |
| (1,0,1) | $\frac{1 - \phi\theta}{1 + \phi^2 - 2\phi\theta}$ |
| (p,1,q) | $\frac{1}{2}$ |

Table 4.7.1 Weightings $w_{1l}$ and $w_{2l}$ for one missing value in ARIMA models.
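The stationary rows of Table 4.7.1 can be checked numerically with the general formulas. The sketch below computes $\psi_j$ and $\pi_j$ by matching powers of $B$ and truncates the infinite sums after a fixed number of terms. It then evaluates equations (30) to (34) with $\sigma_z^2 = 1$, using (37), (40) and (41). The truncation length, the unit noise variance and the function names are all assumptions made for this illustration.

```python
import numpy as np

def psi_pi(phi, theta, n):
    """psi_0..psi_n and pi_0..pi_n (pi_0 = -1) for phi(B) X_t = theta(B) E_t,
    with phi(B) = 1 - sum phi_i B^i and theta(B) = 1 - sum theta_j B^j."""
    psi = np.zeros(n + 1)
    c = np.zeros(n + 1)          # coefficients of phi(B)/theta(B); pi_j = -c_j
    psi[0] = c[0] = 1.0
    for j in range(1, n + 1):
        ar = sum(phi[i - 1] * psi[j - i] for i in range(1, min(j, len(phi)) + 1))
        ma = sum(theta[i - 1] * c[j - i] for i in range(1, min(j, len(theta)) + 1))
        psi[j] = ar - (theta[j - 1] if j <= len(theta) else 0.0)
        c[j] = ma - (phi[j - 1] if j <= len(phi) else 0.0)
    return psi, -c

def one_missing_weight(phi, theta, n_terms=500):
    """w_11 (= w_21) for one missing value in a stationary ARMA(p, q), sigma_z^2 = 1."""
    psi, pi = psi_pi(phi, theta, n_terms)
    var_x = np.sum(psi ** 2)                   # sigma_x^2 (series truncated)
    var_1 = var_2 = 1.0                        # equations (40) and (41)
    cov_12 = 1.0 - np.dot(pi[1:], psi[1:])     # equation (37)
    l11, l22 = var_x - var_1, var_x - var_2
    l12 = var_x + cov_12 - var_1 - var_2
    return l22 * (l11 - l12) / (l11 * l22 - l12 ** 2)

phi, theta = 0.5, 0.2
print(one_missing_weight([phi], []), 1 / (1 + phi**2))               # AR(1): 0.8
print(one_missing_weight([], [theta]), 1.0)                          # MA(1): 1
print(one_missing_weight([phi], [theta]),
      (1 - phi*theta) / (1 + phi**2 - 2*phi*theta))                  # ARMA(1,1)
```

Each printed pair should agree to within rounding and truncation error.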

When two observations are missing, the two forward forecasts should be weighted unequally, and the same applies to the two backward forecasts. However, the first forward forecast and the second backward forecast should be weighted equally, as should the second forward forecast and the first backward forecast (Abraham, 1981, p. 1647).

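For the situation where two observations are missing (m = 2), we have:

For $l = 1$:

$$\sigma_1^2 = \sigma_z^2, \qquad \sigma_2^2 = (1 + \psi_1^2)\sigma_z^2,$$

$$\sigma_{12} = \Big\{\Big(1 - \sum_{h=1}^{\infty} \psi_h \pi_h\Big) + \psi_1\Big(\psi_1 - \sum_{h=1}^{\infty} \pi_h \psi_{h+1}\Big)\Big\}\sigma_z^2$$

For $l = 2$:

$$\sigma_1^2 = (1 + \psi_1^2)\sigma_z^2, \qquad \sigma_2^2 = \sigma_z^2,$$

$$\sigma_{12} = \Big\{\Big(1 - \sum_{h=1}^{\infty} \psi_h \pi_h\Big) + \psi_1\Big(\psi_1 - \sum_{h=1}^{\infty} \pi_h \psi_{h+1}\Big)\Big\}\sigma_z^2$$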
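The weightings for these cases are summarized in the following table:

| ARIMA model | l | $w_{1l}$ | $w_{2l}$ |
|---|---|---|---|
| (1,0,0) | 1 | $\frac{1 + \phi^2}{1 + \phi^2 + \phi^4}$ | $\frac{1}{1 + \phi^2 + \phi^4}$ |
| (1,0,0) | 2 | $w_{21}$ | $w_{11}$ |
| (1,0,1) | 1 | $\frac{(1 + \phi^2 - \phi\theta)(1 - \phi\theta)}{(1 + \phi + \phi^2 - \phi\theta)(1 - \phi + \phi^2 - \phi\theta)}$ | $\frac{1 - \phi\theta}{(1 + \phi + \phi^2 - \phi\theta)(1 - \phi + \phi^2 - \phi\theta)}$ |
| (1,0,1) | 2 | $w_{21}$ | $w_{11}$ |
| (1,1,0) | 1 | $\frac{1 + (1 + \phi)(1 + \phi - \phi^3)}{2 + (1 + \phi)(1 + \phi - 2\phi^3)}$ | $1 - w_{11}$ |
| (1,1,0) | 2 | $1 - w_{11}$ | $w_{11}$ |
| (0,1,1) | 1 | $\frac{2 - \theta}{3 - \theta}$ | $1 - w_{11}$ |
| (0,1,1) | 2 | $1 - w_{11}$ | $w_{11}$ |

Table 4.7.2 Weightings $w_{1l}$ and $w_{2l}$ for two missing values in ARIMA models.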
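The stationary rows of Table 4.7.2 can be checked in the same way. The sketch below uses the ARMA(1,1) closed forms $\psi_k = \phi^{k-1}(\phi - \theta)$ and $\pi_k = \theta^{k-1}(\phi - \theta)$ from Example 4.7.3; setting $\theta = 0$ gives AR(1). It then evaluates the m = 2 variances and covariance given above together with equations (30) to (35). It assumes $\sigma_z^2 = 1$ and truncates the infinite sums. The function name is illustrative.

```python
import numpy as np

def two_missing_weights(phi, theta, l, n_terms=500):
    """(w_1l, w_2l) for m = 2 in a stationary ARMA(1,1) with sigma_z^2 = 1
    (theta = 0 gives AR(1)), using psi_k = phi^(k-1)(phi - theta) and
    pi_k = theta^(k-1)(phi - theta) for k >= 1."""
    k = np.arange(1, n_terms + 1)
    psi = np.concatenate(([1.0], phi ** (k - 1) * (phi - theta)))
    pi = np.concatenate(([-1.0], theta ** (k - 1) * (phi - theta)))
    var_x = np.sum(psi ** 2)
    s_pp = np.dot(pi[1:], psi[1:])        # sum_h pi_h psi_h
    s_pp1 = np.dot(pi[1:-1], psi[2:])     # sum_h pi_h psi_{h+1}
    cov_12 = (1.0 - s_pp) + psi[1] * (psi[1] - s_pp1)
    var_1, var_2 = (1.0, 1.0 + psi[1] ** 2) if l == 1 else (1.0 + psi[1] ** 2, 1.0)
    l11, l22 = var_x - var_1, var_x - var_2
    l12 = var_x + cov_12 - var_1 - var_2
    h = l11 * l22 - l12 ** 2
    return l22 * (l11 - l12) / h, l11 * (l22 - l12) / h

phi, theta = 0.5, 0.2
den = (1 + phi + phi**2 - phi*theta) * (1 - phi + phi**2 - phi*theta)
print(two_missing_weights(phi, theta, l=1))
print((1 + phi**2 - phi*theta) * (1 - phi*theta) / den, (1 - phi*theta) / den)  # table, (1,0,1)
print(two_missing_weights(phi, theta, l=2))   # the two weights swap for l = 2
```

For $\phi = 0.5$ and $\theta = 0.2$, both routes give approximately 0.9650 and 0.8392. Swapping the weights at $l = 2$ agrees with the pattern $w_{12} = w_{21}$, $w_{22} = w_{11}$ in the table.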

In 1979, Eivind Damsleth developed an alternative approach, which produces the same results as Abraham's, to find the optimal linear combination of the forward and backward forecasts for missing values in an ARIMA time series. Abraham was unaware of Damsleth's results at the time of writing his paper. The method is based on the idea of using the cross covariance to generate the parameters of a biased linear combination of the forward and backward forecasts with minimum error. The author states that the method requires the data to be long enough for the parameters and the model to be estimated correctly. With shorter data lengths the problem is more difficult, and it is not discussed in that paper. The advantage of this method is that it provides a step-by-step algorithm for the reader to follow. Using the notation of this thesis, the algorithm as presented by Damsleth (1979) is expressed as follows:

i) Calculate the optimal forecast $\hat{e}_q(l)$ using $\phi(B)(1 - B)^d X_t = \theta(B) E_t$ and the optimal backforecast $\tilde{e}_s(m+1-l)$ using $\phi(F)(1 - F)^d x_t = \theta(F) c_t$.

ii) Calculate the coefficients $\psi_j$ of $B^j$ in the polynomial expansion of $\theta(B)/\phi(B)$ for $j = 0, 1, \dots, \max(l-1, m-l)$.

iii) Calculate the cross covariance function $\gamma_{ae}(j)$ for $j = 0, 1, \dots, m-l$, using

$$\theta(F)\Big(1 - \sum_{i=1}^{p+j} \phi_i B^i\Big)\gamma_{ae}(j - s) = \pm\alpha_j \sigma^2, \qquad -p \le j \le 0,$$

$$\theta(B)\Big(1 - \sum_{i=1}^{q-j} \theta_i F^i\Big)\gamma_{ae}(j - s) = \pm\alpha_j \sigma^2, \qquad 0 \le j \le q.$$

This gives $p + q + 1$ linear equations in the $p + q + 1$ unknowns $\gamma_{ae}(-p-s), \gamma_{ae}(-p-s+1), \dots, \gamma_{ae}(q-s-1), \gamma_{ae}(q-s)$, which can be solved. The solutions provide starting values for the difference equation $\theta(F)\phi(B)\gamma_{ae}(j - s) = \pm\alpha_j \sigma^2$, $j = \dots, -1, 0, 1, \dots$, where $\alpha_j$ is the coefficient of $B^j$ in $\theta(B)\phi(B^{-1})$, given by the following:

$$\alpha_j = \begin{cases} 0, & j < -p \\ -\phi_{-j} + \sum_{i=1}^{\min(p+j,\,q)} \phi_{i-j}\theta_i, & -p \le j \le -1 \\ 1 + \sum_{i=1}^{\min(p,\,q)} \phi_i \theta_i, & j = 0 \\ -\theta_j + \sum_{i=1}^{\min(p,\,q-j)} \phi_i \theta_{i+j}, & 1 \le j \le q \\ 0, & j > q \end{cases}$$
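iv) Calculate

$$\sigma_1^2 = v(\hat{e}) = \Big(\sum_{j=0}^{l-1}\psi_j^2\Big)\sigma_z^2, \qquad \sigma_2^2 = v(\tilde{e}) = \Big(\sum_{j=0}^{m-l}\psi_j^2\Big)\sigma_z^2,$$

and

$$\sigma_{12} = \sum_{i=0}^{l-1}\sum_{j=0}^{m-l}\psi_i\psi_j\gamma_{ae}(i+j),$$

where $\gamma_{ae}(k)$ is the cross covariance function between $\{a_t\}$ and $\{e_t\}$.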
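Steps ii and iv can be sketched numerically as follows. The weights are computed by the standard recursion that follows from $\phi(B)\psi(B) = \theta(B)$; they are assumed here to be the same coefficients of $\theta(B)/\phi(B)$ that step ii calls $\pi_j$ and step iv calls $\psi_j$. The cross covariance $\gamma_{ae}$ is passed in as a function because solving step iii is not reproduced here. Function names are hypothetical.

```python
import numpy as np

def psi_weights(phi, theta, n):
    """First n coefficients of B^j in theta(B)/phi(B).

    Assumes phi(B) = 1 - sum(phi_i B^i) and theta(B) = 1 - sum(theta_i B^i),
    so psi_j = theta'_j + sum_i phi_i psi_{j-i}, with theta'_0 = 1 and
    theta'_j = -theta_j.
    """
    psi = np.zeros(n)
    for j in range(n):
        val = 1.0 if j == 0 else (-theta[j - 1] if j <= len(theta) else 0.0)
        for i in range(1, min(j, len(phi)) + 1):
            val += phi[i - 1] * psi[j - i]
        psi[j] = val
    return psi

def error_moments(phi, theta, sigma2_z, l, m, gamma_ae):
    """sigma_1^2, sigma_2^2 and sigma_12 from step iv.

    gamma_ae is a caller-supplied function k -> gamma_ae(k), for example the
    solution of step iii; it is an assumed input in this sketch.
    """
    psi = psi_weights(phi, theta, max(l, m - l + 1))  # j = 0 .. max(l-1, m-l)
    var1 = sigma2_z * np.sum(psi[:l] ** 2)            # j = 0 .. l-1
    var2 = sigma2_z * np.sum(psi[:m - l + 1] ** 2)    # j = 0 .. m-l
    cov12 = sum(psi[i] * psi[j] * gamma_ae(i + j)
                for i in range(l) for j in range(m - l + 1))
    return var1, var2, cov12

# AR(1) with phi = 0.5: psi_j = 0.5**j.
print(psi_weights([0.5], [], 4))
print(error_moments([0.5], [], 1.0, 2, 3, lambda k: 0.0))
```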

v) If $\{X_t\}$ is stationary, calculate $c$ and $d$ as the solutions to

$$(Ex^2 - \sigma_1^2)c + (Ex^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2)d = Ex^2 - \sigma_1^2,$$

$$(Ex^2 + \sigma_{12} - \sigma_1^2 - \sigma_2^2)c + (Ex^2 - \sigma_2^2)d = Ex^2 - \sigma_2^2.$$

In the non-stationary case, calculate

$$c = \frac{\sigma_2^2 - \sigma_{12}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}} \quad\text{and}\quad d = 1 - c.$$

vi) The optimal between-forecast of $x_{r+l}$ is then given by

$$c\big(\hat{e}_q(l) + Ex\big) + d\big(\tilde{e}_{q+m+1}(m+1-l) + Ex\big).$$
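Steps v and vi reduce to a 2 × 2 linear system in the stationary case and a closed-form weight in the non-stationary case. The following is a minimal sketch that assumes the forecasts from step i and the moments from step iv are already available. The function name and argument layout are illustrative, and step vi is evaluated exactly as written above.

```python
import numpy as np

def between_forecast(fwd, bwd, var1, var2, cov12, mean=0.0, ex2=None):
    """Weights c, d (step v) and the between-forecast (step vi).

    fwd, bwd : forward forecast and backforecast of the missing value
    var1, var2, cov12 : sigma_1^2, sigma_2^2 and sigma_12 from step iv
    mean : Ex;  ex2 : Ex^2 for a stationary series, or None to use the
    non-stationary rule d = 1 - c.
    """
    if ex2 is None:
        c = (var2 - cov12) / (var1 + var2 - 2.0 * cov12)
        d = 1.0 - c
    else:
        off = ex2 + cov12 - var1 - var2            # shared off-diagonal term
        A = np.array([[ex2 - var1, off], [off, ex2 - var2]])
        b = np.array([ex2 - var1, ex2 - var2])
        c, d = np.linalg.solve(A, b)
    estimate = c * (fwd + mean) + d * (bwd + mean)
    return float(c), float(d), float(estimate)

# Equal, uncorrelated forecast errors give equal weights in the
# non-stationary case: c = d = 0.5, estimate = 3.0.
print(between_forecast(2.0, 4.0, 1.0, 1.0, 0.0))
```

In the non-stationary case the weights are constrained to sum to one, so the forecast with the smaller error variance receives the larger weight.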

CHAPTER 5

State Space Modelling for Missing Values

5.1 About this chapter

Recently, state space modelling has become a popular approach in time series analysis. Its main advantage is that it can accommodate many different time series models, such as Box-Jenkins ARIMA models and structural time series models. The approach emphasises the notion that a time series is made up of distinct components. Thus, we may assume that the observations relate to the mean level of the process through an observation equation, whereas one or more state equations describe how the individual components change through time. In state space modelling, observations can be added one at a time, and the estimating equations are then updated to produce new estimates. In this chapter, we investigate this approach and determine the time series model(s) for which the technique is most suitable.

5.2 State Space Models

State space modelling is a general approach that was originally developed by control engineers for applications concerning multi-sensor systems such as tracking devices. It assumes that the state of the process summarises all the information from the past that is necessary to predict the future.
Suppose that the observed value of a time series at time $t$, $X_t$, is a function of one or more random variables $\theta_t^1, \ldots, \theta_t^d$ which also occur at time $t$ but which are not observed. These variables are called the state variables, and we can represent them by the vector $\theta_t = (\theta_t^1, \ldots, \theta_t^d)^T$.

The simplest model assumes that $X_t$ is a linear function of the $\theta_t^i$:

$$X_t = h_1\theta_t^1 + h_2\theta_t^2 + \cdots + h_d\theta_t^d + \varepsilon_t \qquad (53)$$

where each $h_i$ is a constant parameter and $\varepsilon_t$ represents the observation error.

Equation (53) is called the measurement or observation equation and can be written in matrix notation as

$$X_t = H^T\theta_t + \varepsilon_t,$$

where $H^T = (h_1, \ldots, h_d)$ is a $d$-dimensional row vector of parameters.

Although the state vector $\theta_t$ is not directly observable, it is assumed that we know how $\theta_t$ changes through time, and we are able to use the observations on $X_t$ to make inferences about $\theta_t$. The state (updating) equation is $\theta_t = G\theta_{t-1} + K\eta_t$, where $G$ is a $d \times d$ matrix of parameters and $\eta_t$ is a white noise vector with covariance matrix $W_t$.

The state variables at time $t$, $\theta_t = (\theta_t^1, \ldots, \theta_t^d)^T$, follow a recursive process that depends on the previous states $\theta_{t-1}, \theta_{t-2}, \ldots$. Consequently, a simple model for the state vector could be $\theta_t = G\theta_{t-1}$, where $G$ is a $d \times d$ matrix of parameters. If the state dimension is 2, the model is

$$\theta_t^1 = G_{11}\theta_{t-1}^1 + G_{12}\theta_{t-1}^2 \qquad (54)$$

$$\theta_t^2 = G_{21}\theta_{t-1}^1 + G_{22}\theta_{t-1}^2 \qquad (55)$$

From equations (54) and (55), we obtain the coefficient matrix

$$G = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix}.$$

As usual, we have to build random noise terms into each equation to make the model more flexible. To achieve this, we introduce the random noise vector $\eta_t$. For instance, where the state dimension is 2 and the noise vector $\eta_t = (\eta_t^1, \eta_t^2)^T$ consists of 2 independent zero-mean random variables, the model becomes

$$\theta_t^1 = G_{11}\theta_{t-1}^1 + G_{12}\theta_{t-1}^2 + K_{11}\eta_t^1 + K_{12}\eta_t^2$$

$$\theta_t^2 = G_{21}\theta_{t-1}^1 + G_{22}\theta_{t-1}^2 + K_{21}\eta_t^1 + K_{22}\eta_t^2$$

where the $K_{ij}$ are additional parameters.

In general, the equation $\theta_t = G\theta_{t-1} + K\eta_t$ is known as the system or transition equation. As $G$ describes the evolution from one state to the next, it is called the transition matrix, and $K$ is called the state noise coefficient matrix (Janacek and Swift, 1993).

The advantage of state space modelling is that many types of time series models can be put into the state space formulation. This is illustrated by the following three examples.
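Example 5.2.1 A simple state space model

$$X_t = (0.5 \;\; 0.5)\,\theta_t + \varepsilon_t$$

and

$$\theta_t = \begin{bmatrix} 0.7 & 0 \\ 2 & 0.8 \end{bmatrix}\theta_{t-1} + \begin{bmatrix} 1 \\ 0.5 \end{bmatrix}\eta_t$$

with $\operatorname{var}(\varepsilon_t) = 0.5$ and $\operatorname{var}(\eta_t) = 4$.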
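Example 5.2.1 can be simulated directly from its two equations. This sketch assumes Gaussian noise and a zero starting state $\theta_0$, neither of which the example specifies; the function name is hypothetical.

```python
import numpy as np

def simulate_example_521(n, seed=0):
    """Simulate X_t = H^T theta_t + eps_t with theta_t = G theta_{t-1} + K eta_t."""
    rng = np.random.default_rng(seed)
    H = np.array([0.5, 0.5])
    G = np.array([[0.7, 0.0],
                  [2.0, 0.8]])
    K = np.array([1.0, 0.5])
    theta = np.zeros(2)                       # assumed initial state theta_0 = 0
    x = np.empty(n)
    for t in range(n):
        eta = rng.normal(0.0, 2.0)            # var(eta_t) = 4, so sd = 2
        theta = G @ theta + K * eta           # transition equation
        eps = rng.normal(0.0, np.sqrt(0.5))   # var(eps_t) = 0.5
        x[t] = H @ theta + eps                # observation equation
    return x

x = simulate_example_521(200)
print(x[:5])
```

Because $G$ is lower triangular with diagonal entries 0.7 and 0.8, its eigenvalues lie inside the unit circle, so the simulated series settles into stationary behaviour.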

Note: the matrices $H$, $G$ and $K$ all contain parameters of the model. If these parameters are unknown they must be estimated. It is important to remember that, in general, the state variables are not observed. Notice also that all the information in the past of the series $\{X_t\}$ that can influence $X_t$ must be contained in the state variables $\theta_t$ alone.

State space modelling rests on the following assumptions (Harvey, 1989, pp. 115-116, pp. 101-102):

a) for weak stationarity, $E(X_t)$ and the autocorrelations of $X_t$ are independent of $t$; and

b) $\varepsilon_t^{(ss)}$ is a zero-mean observation error with variance $Z_t$, and $\eta_t$ is a vector white noise with variance matrix $Q_t$:

$$\begin{bmatrix} \varepsilon_t^{(ss)} \\ \eta_t \end{bmatrix} \sim \mathrm{NID}\left(0, \begin{bmatrix} Z_t & 0 \\ 0 & Q_t \end{bmatrix}\right).$$

Harvey (1981, pp. 101-102) specifies two further assumptions for the state space system:
c) The initial state vector $\theta_0$ has mean $a_0$ and covariance matrix $P_0$, that is, $E(\theta_0) = a_0$ and $\operatorname{Var}(\theta_0) = P_0$.

d) The observation errors $\varepsilon_t^{(ss)}$ and the state disturbances $\eta_t$ are uncorrelated with each other in all time periods, and uncorrelated with the initial state, that is,

$$E(\varepsilon_t^{(ss)}\eta_s') = 0 \quad \text{for all } s,\; t = 1, \ldots, N,$$

and

$$E(\varepsilon_t^{(ss)}\theta_0') = 0, \qquad E(\eta_t\theta_0') = 0 \qquad \text{for } t = 1, \ldots, N.$$

Example 5.2.2: Using state space modelling for the ARMA(p, p-1) model.

The state space representation of an ARMA(p, p-1) model is obtained by filling the elements of $K$ with the MA(p-1) coefficients. For $p = 3$, $K = (1, \beta_1, \beta_2)^T$ and $H^T = (1, 0, 0)$. This representation can be verified, as shown below, by repeatedly substituting the elements of the state vector.

To see how an ARMA model fits into the state space framework, consider the ARMA(3,2) process

$$X_t = \alpha_1X_{t-1} + \alpha_2X_{t-2} + \alpha_3X_{t-3} + Z_t + \beta_1Z_{t-1} + \beta_2Z_{t-2}.$$

Since it has only one equation and one noise series, it does not seem compatible with the state space model.
However, let us assume the following:

$$X_t = (1 \;\; 0 \;\; 0)\,\theta_t,$$

$$\theta_t = \begin{bmatrix} \alpha_1 & 1 & 0 \\ \alpha_2 & 0 & 1 \\ \alpha_3 & 0 & 0 \end{bmatrix}\theta_{t-1} + \begin{bmatrix} 1 \\ \beta_1 \\ \beta_2 \end{bmatrix}\eta_t \qquad (56)$$

where the state vector is $\theta_t = (X_t^1, X_t^2, X_t^3)^T$.

Beginning with the bottom row of (56), the state equation is $X_t^3 = \alpha_3X_{t-1}^1 + \beta_2\eta_t$.

From the second row of (56), the state equation is $X_t^2 = \alpha_2X_{t-1}^1 + X_{t-1}^3 + \beta_1\eta_t$.

Substituting $X_{t-1}^3$ into $X_t^2$ gives

$$X_t^2 = \alpha_2X_{t-1}^1 + \alpha_3X_{t-2}^1 + \beta_2\eta_{t-1} + \beta_1\eta_t.$$

Finally, the first row of (56) gives the state equation

$$X_t^1 = \alpha_1X_{t-1}^1 + X_{t-1}^2 + \eta_t.$$

Substituting $X_{t-1}^2$ into $X_t^1$ gives

$$X_t^1 = \alpha_1X_{t-1}^1 + \alpha_2X_{t-2}^1 + \alpha_3X_{t-3}^1 + \beta_2\eta_{t-2} + \beta_1\eta_{t-1} + \eta_t,$$

which is the ARMA(3,2) equation for the first state component $X_t^1 = X_t$, with $\eta_t$ playing the role of the noise term $Z_t$.
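The substitution argument can also be checked numerically. Driving the state equation (56) and the ARMA(3,2) recursion with the same noise sequence should give identical series. This sketch assumes a zero initial state and zero pre-sample values, and the coefficient values are arbitrary.

```python
import numpy as np

def arma32_via_state_space(alpha, beta, eta):
    """First state component X_t^1 from the state equation (56), zero initial state."""
    a1, a2, a3 = alpha
    b1, b2 = beta
    G = np.array([[a1, 1.0, 0.0],
                  [a2, 0.0, 1.0],
                  [a3, 0.0, 0.0]])
    K = np.array([1.0, b1, b2])
    theta = np.zeros(3)
    out = np.empty(len(eta))
    for t, e in enumerate(eta):
        theta = G @ theta + K * e        # theta_t = G theta_{t-1} + K eta_t
        out[t] = theta[0]                # X_t = (1 0 0) theta_t
    return out

def arma32_direct(alpha, beta, eta):
    """ARMA(3,2) recursion with zero pre-sample values and Z_t = eta_t."""
    x = np.zeros(len(eta))
    for t in range(len(eta)):
        ar = sum(alpha[i] * x[t - 1 - i] for i in range(3) if t - 1 - i >= 0)
        ma = sum(beta[i] * eta[t - 1 - i] for i in range(2) if t - 1 - i >= 0)
        x[t] = ar + eta[t] + ma
    return x

rng = np.random.default_rng(0)
eta = rng.normal(size=200)
alpha, beta = (0.5, -0.3, 0.1), (0.4, 0.2)
gap = np.max(np.abs(arma32_via_state_space(alpha, beta, eta)
                    - arma32_direct(alpha, beta, eta)))
print(gap)   # zero up to rounding error
```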

ARIMA models may also be represented in state space form. Harvey and Phillips (1979) derived an exact maximum likelihood estimation procedure for ARIMA models. One advantage of this approach is its ability to handle missing observations, since we may simply 'update' the estimates; that is, we may replace each missing observation by its one-step-ahead forecast.

Besides regression-ARMA models, state space modelling can also represent trend-and-seasonal models, for which exponential smoothing methods are thought to be appropriate. To apply state space modelling to ARIMA models, we need to know the parameters $H$ and $G$ in the model equations, as well as the variances and covariances of the disturbance terms. Suitable values for $H$ and $G$ may be chosen using a variety of aids, including external knowledge and preliminary examination of the data. In other words, state space modelling does not remove the usual problem of finding a suitable type of model (Chatfield, 1994). Initially we assume that such parameters are known; when they are not, they can be estimated by maximising the log-likelihood function, either directly or with the expectation-maximisation (EM) algorithm.

5.3 Structural Time Series Models

Structural time series models are a specific type of state space model. They are modelled as a sum of meaningful, separate components and are well suited to stock assessment.
A basic structural model represents the observed value in terms of one or more unobserved components, which form the state vector. These components can be divided into separate groups. For example, a seasonal component with period $s$ is

$$\gamma_t = -\sum_{j=1}^{s-1}\gamma_{t-j} + \eta_t.$$

The structural model can be represented in state space form: the state vector is $\theta_t = [\mu_t, \beta_t, \gamma_t, \gamma_{t-1}, \gamma_{t-2}, \ldots, \gamma_{t-s+2}]^T$, of dimension $s+1$, and the state noise vector, consisting of uncorrelated white noise, is $\eta_t = [\eta_t^{(1)}, \eta_t^{(2)}, \eta_t^{(3)}]^T$. For example, with $s = 4$, the basic structural model has the following observation and transition equations.

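Observation equation

$$X_t = \begin{bmatrix} 1 & 0 & 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} \mu_t \\ \beta_t \\ \gamma_t \\ \gamma_{t-1} \\ \gamma_{t-2} \end{bmatrix} + \varepsilon_t$$

Transition equation

$$\begin{bmatrix} \mu_t \\ \beta_t \\ \gamma_t \\ \gamma_{t-1} \\ \gamma_{t-2} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} \mu_{t-1} \\ \beta_{t-1} \\ \gamma_{t-1} \\ \gamma_{t-2} \\ \gamma_{t-3} \end{bmatrix} + \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} \eta_t^{(1)} \\ \eta_t^{(2)} \\ \eta_t^{(3)} \end{bmatrix}$$

The covariance matrix of $\eta_t$ is $\Sigma = \operatorname{diag}\{\operatorname{Var}(\eta_t^{(1)}), \operatorname{Var}(\eta_t^{(2)}), \operatorname{Var}(\eta_t^{(3)})\}$.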


(Janaceck and Swift, 1992, pp. 88-89).
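The matrices of the basic structural model can be built for any seasonal period $s$. The sketch below (the function name is hypothetical) reproduces the $s = 4$ matrices above. It also checks a property of the seasonal block: with no noise, four applications of the transition matrix return the seasonal pattern to its starting values. This follows because the seasonal effects over a full cycle sum to zero, which forces $\gamma_t = \gamma_{t-4}$.

```python
import numpy as np

def bsm_matrices(s=4):
    """H, G and K of the basic structural model with seasonal period s.

    State order: mu_t, beta_t, gamma_t, gamma_{t-1}, ..., gamma_{t-s+2}.
    """
    m = s + 1
    H = np.zeros(m)
    H[[0, 2]] = 1.0                       # X_t = mu_t + gamma_t + eps_t
    G = np.zeros((m, m))
    G[0, :2] = 1.0                        # mu_t = mu_{t-1} + beta_{t-1}
    G[1, 1] = 1.0                         # beta_t = beta_{t-1}
    G[2, 2:] = -1.0                       # gamma_t = -(gamma_{t-1} + ... + gamma_{t-s+1})
    for i in range(3, m):
        G[i, i - 1] = 1.0                 # shift each lagged seasonal down one place
    K = np.zeros((m, 3))
    K[0, 0] = K[1, 1] = K[2, 2] = 1.0     # noise enters level, slope and seasonal
    return H, G, K

H, G, K = bsm_matrices(4)
state = np.array([10.0, 0.0, 1.0, 2.0, 3.0])   # level 10, no slope, seasonal pattern
print(np.linalg.matrix_power(G, 4) @ state)    # back to the same state
```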

State space modelling can then be applied to determine the local level, trend and seasonal components. Without the seasonal component, the structural model is

$$X_t = \mu_t + \varepsilon_t \qquad (57)$$

$$\mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t^{(1)} \qquad (58)$$

$$\beta_t = \beta_{t-1} + \eta_t^{(2)} \qquad (59)$$

Equations (57), (58) and (59) are referred to as a linear growth model. The first equation is the observation equation. The state vector is $\theta_t = [\mu_t, \beta_t]^T$, where $\mu_t$ is the local level, which changes through time, and $\beta_t$ is the local trend, which may also evolve through time.

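In state space form, the observation equation is

$$X_t = \begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} \mu_t \\ \beta_t \end{bmatrix} + \varepsilon_t$$

and the transition equation is

$$\begin{bmatrix} \mu_t \\ \beta_t \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \mu_{t-1} \\ \beta_{t-1} \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \eta_t^{(1)} \\ \eta_t^{(2)} \end{bmatrix}, \quad\text{or}\quad \theta_t = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\theta_{t-1} + \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\eta_t$$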

(Chatfield, 1989, p.184).
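With all disturbances set to zero, the linear growth model reduces to a straight line, which gives a simple check on the matrices. The sketch below assumes Gaussian disturbances and treats the initial level and slope as known; the function and argument names are illustrative.

```python
import numpy as np

G = np.array([[1.0, 1.0],
              [0.0, 1.0]])     # level picks up the slope each period
K = np.eye(2)                 # separate disturbances for level and slope
H = np.array([1.0, 0.0])      # only the level is observed

def simulate_linear_growth(n, mu0, beta0, sd_eps, sd_level, sd_slope, seed=0):
    """Simulate equations (57)-(59) starting from the state (mu0, beta0) at time 0."""
    rng = np.random.default_rng(seed)
    theta = np.array([mu0, beta0], dtype=float)
    x = np.empty(n)
    for t in range(n):
        eta = rng.normal(0.0, [sd_level, sd_slope])   # (eta^(1), eta^(2))
        theta = G @ theta + K @ eta                   # transition equation
        x[t] = H @ theta + rng.normal(0.0, sd_eps)    # observation equation
    return x

print(simulate_linear_growth(5, 10.0, 2.0, 0.0, 0.0, 0.0))  # 12, 14, 16, 18, 20
```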

For all the main structural models, the observation equation involves a linear function of

the state variables and yet does not restrict the model to be constant through time.

Rather it allows local features such as trend and seasonality to be updated through time

using the transition equation.

5.4 The Kalman Filter

In state space modelling, the prime objective is to estimate the state vector $\theta_t$. Estimating future values of the state is called forecasting or prediction, updating the current estimate is called filtering, and estimating past values is called smoothing. The Kalman filter provides a set of equations that allows us to update the estimate of $\theta_t$ when new observations become available. The process involves two stages, called the prediction stage and the updating stage.

5.4.1 Prediction Stage

The Kalman filter provides two prediction equations.

First, consider the state equation of the state space model,

$$\theta_t = G\theta_{t-1} + K\eta_t,$$

where $\eta_t$ is still unknown at time $t-1$. The estimator of $\theta_t$ is therefore

$$\hat{\theta}_{t|t-1} = G\hat{\theta}_{t-1},$$

where $\hat{\theta}_{t-1}$ is the best (minimum mean square error) estimator of $\theta_{t-1}$, and $\hat{\theta}_{t|t-1}$ denotes the resulting estimator of $\theta_t$.

Second, the variance-covariance matrix $P_{t|t-1}$ of the estimation error of $\hat{\theta}_{t|t-1}$ is given by

$$P_{t|t-1} = GP_{t-1}G^T + KW_tK^T.$$

(Janaceck and Swift, 1992, pp. 97)

5.4.2 Updating Stage

When a new observation becomes available, the estimator of $\theta_t$ can be adjusted to take account of the additional information. Consider the observation equation

$$X_t = H^T\theta_t + \varepsilon_t.$$

The prediction error is $\varepsilon_t = X_t - H^T\hat{\theta}_{t|t-1}$, and it can be shown that the updating equations are

$$\hat{\theta}_t = \hat{\theta}_{t|t-1} + K_t\varepsilon_t$$

and

$$P_t = P_{t|t-1} - K_tH^TP_{t|t-1},$$

where

$$K_t = \frac{P_{t|t-1}H}{H^TP_{t|t-1}H + \sigma_n^2}$$

is called the Kalman gain and $\sigma_n^2$ is the variance of the observation error.

(Janaceck and Swift, 1992, pp. 97)
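One prediction-and-updating cycle can be written compactly. The following is a minimal sketch for a scalar observation, assuming $G$, $K$, $W$ and $\sigma_n^2$ are known and, as in the equations above, that $G$ does not vary with time. The function name is hypothetical.

```python
import numpy as np

def kalman_step(theta_prev, P_prev, x_t, G, K, W, H, sigma2_n):
    """One Kalman filter cycle for X_t = H^T theta_t + eps_t, theta_t = G theta_{t-1} + K eta_t."""
    # Prediction stage
    theta_pred = G @ theta_prev                   # theta_{t|t-1}
    P_pred = G @ P_prev @ G.T + K @ W @ K.T       # P_{t|t-1}
    # Updating stage
    innovation = x_t - H @ theta_pred             # prediction error
    f = H @ P_pred @ H + sigma2_n                 # variance of the prediction error
    gain = P_pred @ H / f                         # Kalman gain K_t
    theta_new = theta_pred + gain * innovation
    P_new = P_pred - np.outer(gain, H @ P_pred)   # P_t = P_{t|t-1} - K_t H^T P_{t|t-1}
    return theta_new, P_new

# Local level model: d = 1, G = K = H = 1, W = 1, sigma_n^2 = 2.
theta, P = kalman_step(np.array([0.0]), np.array([[1.0]]), 2.0,
                       np.eye(1), np.eye(1), np.eye(1), np.array([1.0]), 2.0)
print(theta, P)   # [1.] [[1.]]
```

Here $P_{t|t-1} = 2$, the prediction error variance is 4 and the gain is 0.5, so the estimate moves halfway from 0 towards the observation 2.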

A major advantage of the Kalman filter is that the calculations are recursive, so the current estimates are based on the whole past while only the latest estimates need to be stored. Also, the Kalman filter converges fairly quickly when the underlying model is constant, but it can also follow the movement of a system whose underlying model is evolving through time.

5.4.3 Estimation of Parameters

Schweppe (1965) developed the innovations form of the likelihood function, which is based on the prediction errors (innovations) $\varepsilon_t(\theta)$, treated as independent Gaussian random vectors with zero means. The likelihood function is defined by

$$-2\ln L_Y(\theta) = \sum_{t=1}^{n}\log\lvert\Sigma_t(\theta)\rvert + \sum_{t=1}^{n}\varepsilon_t(\theta)'\,\Sigma_t(\theta)^{-1}\varepsilon_t(\theta),$$

where $\Sigma_t(\theta)$ is the covariance matrix of $\varepsilon_t(\theta)$. The function is highly non-linear and complicated in the unknown parameters.

In this case, the most convenient approach is to use the Kalman filter recursions, a set of recursive equations that are used to estimate parameter values in a state space model.

The usual procedure is to set an initial value $x_0$ and then use the Kalman filter recursions to evaluate the log likelihood function and its first two derivatives. A Newton-Raphson algorithm can then be used successively to update the parameter values until the log likelihood is maximised.
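For the local level model ($G = K = H = 1$), the Kalman filter delivers the prediction errors $\varepsilon_t$ and their variances $\Sigma_t$, so $-2\ln L$ can be evaluated term by term. The sketch below assumes a large initial variance for the level and drops the first term, which that variance dominates. To keep it short, a coarse grid search stands in for the Newton-Raphson step described above. All names are hypothetical.

```python
import numpy as np

def neg2_loglik_local_level(x, q, r, a0=0.0, p0=1e6):
    """-2 ln L (constants dropped) for X_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t.

    q = var(eta_t), r = var(eps_t); a0, p0 = assumed initial mean and variance.
    """
    a, p, total = a0, p0, 0.0
    for t, xt in enumerate(x):
        p = p + q                    # prediction: level estimate unchanged, variance grows
        f = p + r                    # Sigma_t, variance of the prediction error
        e = xt - a                   # prediction error epsilon_t
        if t > 0:                    # first term is dominated by the assumed p0
            total += np.log(f) + e * e / f
        k = p / f                    # Kalman gain
        a, p = a + k * e, p - k * p  # updating stage
    return total

def fit_by_grid(x, grid):
    """Return the (q, r) pair on the grid with the smallest -2 ln L."""
    pairs = [(q, r) for q in grid for r in grid]
    return min(pairs, key=lambda qr: neg2_loglik_local_level(x, *qr))

rng = np.random.default_rng(0)
level = np.cumsum(rng.normal(0.0, 0.5, 150))      # true q = 0.25
x = level + rng.normal(0.0, 1.0, 150)             # true r = 1.0
print(fit_by_grid(x, [0.05, 0.1, 0.25, 0.5, 1.0, 2.0]))
```

A gradient-based optimiser such as Newton-Raphson would replace `fit_by_grid` in practice; the likelihood evaluation itself is unchanged.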

5.4.4 The Expectation-Maximisation (EM) Algorithm

In addition to Newton-Raphson, Shumway and Stoffer (1982) presented a conceptually simpler estimation procedure based on the EM (expectation-maximisation) algorithm (Dempster et al., 1977). It is a non-linear optimisation algorithm that is appropriate for time series applications involving unknown components.

Consider the unobserved signal process $\theta_t$ and the unobserved noise process $\varepsilon_t^{(ss)}$. Together they generate the observed series $X_t$, which is therefore an incomplete data set. The log likelihood $\log L(\theta, \Psi)$, where the parameters to be estimated are collected in the matrix $\Psi$, may be based on either the complete data set or the incomplete data set. The incomplete data likelihood must be maximised using one of the conventional non-linear optimisation techniques. In comparison, maximising the complete data likelihood is usually very easy, except that the values of $\theta_t$ and $\varepsilon_t^{(ss)}$ are unobserved (Shumway, 1988, pp. 200-201).

The EM algorithm can be used to estimate the unknown parameters in an unobserved components model. Consider the general time-invariant model

$$X_t = H^T\theta_t + \varepsilon_t^{(ss)},$$

$$\theta_t = T\theta_{t-1} + R\eta_t,$$

where $a_0$ and $P_0$ are known and $\operatorname{Var}(\eta_t) = Q$ is unrestricted. If the elements of the state vector were observed for $t = 0, \ldots, N$, the log-likelihood function for the $X_t$'s and $\theta_t$'s would be

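$$\begin{aligned} \log L(X, \theta) = {} & -\frac{N}{2}\log 2\pi - \frac{N}{2}\log h - \frac{1}{2h}\sum_{t=1}^{N}(X_t - H^T\theta_t)^2 \\ & - \frac{Nn}{2}\log 2\pi - \frac{N}{2}\log\lvert Q\rvert - \frac{1}{2}\sum_{t=1}^{N}(\theta_t - T\theta_{t-1})'Q^{-1}(\theta_t - T\theta_{t-1}) \\ & - \frac{n}{2}\log 2\pi - \frac{1}{2}\log\lvert P_0\rvert - \frac{1}{2}(\theta_0 - a_0)'P_0^{-1}(\theta_0 - a_0), \end{aligned}$$

where $h = \operatorname{Var}(\varepsilon_t^{(ss)})$ and $n$ is the dimension of the state vector.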

Each iteration of the EM algorithm proceeds by evaluating

$$E\left[\frac{\partial \log L}{\partial \Psi} \,\middle|\, X_N\right],$$

conditional on the latest estimate of $\Psi$. This expression is then set equal to a vector of zeros and solved to yield a new set of estimates of $\Psi$. Under suitable conditions, the likelihood remains the same or increases at each iteration, and the algorithm converges to a local maximum (Harvey, 1989, p. 188).

In general, if we had the complete data set, we could use results from multivariate normal theory to obtain the maximum likelihood estimates of $\Psi$ easily. With an incomplete data set, the EM algorithm gives an iterative method for finding the maximum likelihood estimates of $\Psi$ by successively maximising the conditional expectation of the complete data likelihood.

The overall procedure can be regarded as simply alternating between the Kalman filtering and smoothing recursions and the multivariate normal maximum likelihood estimators.

5.5 Missing Values

In practice, the Kalman filter equations cope easily with missing values. When a missing observation is encountered at time $t$, the prediction equations are processed as usual, but the updating equations cannot be processed and are replaced by

$$P_t = P_{t|t-1} \quad\text{and}\quad \hat{\theta}_t = \hat{\theta}_{t|t-1}.$$

If there is a second consecutive missing value, the prediction equations are processed again to give

$$\hat{\theta}_{t+1|t} = G\hat{\theta}_t = G^2\hat{\theta}_{t-1},$$

$$\begin{aligned} P_{t+1|t} &= GP_tG^T + KW_tK^T \\ &= GP_{t|t-1}G^T + KW_tK^T \\ &= G(GP_{t-1}G^T + KW_tK^T)G^T + KW_tK^T. \end{aligned}$$
This concept can be extended to any number of consecutive missing values. In addition, the resulting $\hat{\theta}_t$'s and $P_t$'s can be used in smoothing algorithms to obtain efficient estimates of the missing data themselves (Chatfield, 1994).
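The missing-value rule above amounts to skipping the updating stage whenever an observation is absent. The sketch below assumes that missing values are coded as NaN and that $G$, $K$, $W$, $H$ and $\sigma_n^2$ are known. The function name is hypothetical.

```python
import numpy as np

def kalman_filter_missing(x, G, K, W, H, sigma2_n, theta0, P0):
    """Filtered states and covariances, skipping the update when x[t] is NaN."""
    theta = np.asarray(theta0, dtype=float)
    P = np.asarray(P0, dtype=float)
    states, covs = [], []
    for xt in x:
        theta = G @ theta                          # prediction equations
        P = G @ P @ G.T + K @ W @ K.T
        if not np.isnan(xt):                       # updating equations
            f = H @ P @ H + sigma2_n
            gain = P @ H / f
            theta = theta + gain * (xt - H @ theta)
            P = P - np.outer(gain, H @ P)
        # when xt is missing: theta_t = theta_{t|t-1} and P_t = P_{t|t-1}
        states.append(theta.copy())
        covs.append(P.copy())
    return np.array(states), np.array(covs)

# AR(1)-type state with G = 0.5 and a two-step gap: during the gap the state
# estimate is multiplied by G each step and its variance increases.
one = np.eye(1)
s, c = kalman_filter_missing([1.0, np.nan, np.nan, 0.8], 0.5 * one, one, one,
                             np.array([1.0]), 1.0, [0.0], one)
print(s.ravel(), c.ravel())
```

The gap values in `s` are the one-step and two-step predictions $G\hat{\theta}_t$ and $G^2\hat{\theta}_t$, matching the equations above.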

Example 5.5.1 State Space Modelling

The PURSE data series consists of 71 observations on purse snatching in Chicago.

Estimating the model by hand is very complicated, so I have used the computer software STAMP (Structural Time series Analyser, Modeller and Predictor) to estimate the time series model. The program, based on the work of Professor Andrew Harvey and written primarily by Simon Peters, carries out the estimation and testing of structural time series models. The Kalman filter plays a key role in handling the model.
Figure 5.1 Plot of 71 observations on purse snatching in Chicago

From Figure 5.1, the data series appears to have the following properties:

a) The model has a level that evolves slowly over time according to a stochastic (random) mechanism.

b) The model has no slope.

c) The model has an irregular term (random walk plus noise).

Using STAMP to estimate the model gives the following results:

Figure 5.2 Actual and fitted values on purse snatching in Chicago

Figure 5.3 Normalised residuals on purse snatching in Chicago

The estimated model seems to fit the data quite well. The PURMISS data series is the PURSE series with a run of missing observations. I have used a similar technique to estimate its model.

Figure 5.4 Plot on purse snatching in Chicago with a run of missing observations.

From Figure 5.4, the data series appears to have the following properties:

a) The model has a level that evolves slowly over time according to a stochastic (random) mechanism.

b) The model has a stochastic slope.

c) The model has an irregular term (random walk plus noise).

Using STAMP to estimate the model gives the following results:

Figure 5.5 Results of estimating missing values on purse snatching in Chicago.

Figure 5.6 Actual values and fitted values on purse snatching in Chicago.

Note: the fitted values on the graph have been shifted forward by one time interval by the software.

Observation   Actual   Fitted    Residual (Fitted - Actual)
3             10       11.5687    1.5687
4             10        9.9609   -0.0391
5 (missing)   12        9.4909   -2.5091
6 (missing)   10        9.0209   -0.9791
7 (missing)    7        8.5509    1.5509
8 (missing)   17        8.0808   -8.9192
9 (missing)   10        7.6108   -2.3892
10            14       13.8880   -0.1120
11             8       10.3302    2.3302
Table 5.5.1 Actual values and fitted values on purse snatching in Chicago

Using STAMP, we have obtained optimal estimates of the missing observations by structural time series analysis.
CHAPTER 6

Analysis and Comparison of Time Series Model

6.1 About this Chapter

In this chapter, we apply various estimation methods to simulated data sets generated from different time series models. This exercise gives insight into how each method performs in different time series situations and allows appropriate comparisons between the methods.

To test the effectiveness of each estimation method, we require many data sets representing different time series models. One way to obtain such a large amount of data is by computer simulation. In this thesis, we have used Microsoft Excel and Minitab for data generation. Microsoft Excel is a popular spreadsheet program, and we chose it because of its popularity and easy access. Minitab is another popular, easily accessible multi-purpose statistics package. By creating macros, both packages can generate data sets from specific time series models without difficulty. The specific time series models we generate are AR and MA models with various parameter values.
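The thesis generates the series with Excel and Minitab macros, which are not reproduced here. As an illustrative stand-in, the sketch below generates AR(1) and MA(1) series in Python. The series length of 100, the burn-in period and the MA sign convention $X_t = Z_t + \theta Z_{t-1}$ are assumptions, not details taken from the macros.

```python
import numpy as np

def simulate_ar1(phi, sd, n=100, burn_in=100, seed=None):
    """X_t = phi X_{t-1} + Z_t with Z_t ~ N(0, sd^2); the burn-in is discarded."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, sd, n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + z[t]
    return x[burn_in:]

def simulate_ma1(theta, sd, n=100, seed=None):
    """X_t = Z_t + theta Z_{t-1} (assumed sign convention)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, sd, n + 1)
    return z[1:] + theta * z[:-1]

ar = simulate_ar1(0.6, 0.04, seed=1)
ma = simulate_ma1(-1.5, 0.4, seed=1)
print(ar.shape, ma.shape)   # (100,) (100,)
```

The burn-in lets the AR(1) recursion forget its zero starting value, so the retained series behaves like a draw from the stationary process.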

We chose these time series models because they are simple stationary Box-Jenkins models that allow easy comparison across different parameter values. The analysis also applies to non-stationary data sets, where transformations can be applied to obtain stationary data.

The data sets for the specific time series models are as follows, where the standard deviation (SD) refers to the purely random process:

AR order  Phi    SD     MA order  Theta  SD     MA order  Theta  SD
1          0.2   0.04   1         -2.5   0.04   1         0.5    0.4
1          0.4   0.04   1         -2     0.04   1         1      0.4
1          0.6   0.04   1         -1.5   0.04   1         1.5    0.4
1          0.8   0.04   1         -1     0.04   1         2      0.4
1         -0.2   0.04   1         -0.5   0.04
1         -0.4   0.04   1          0     0.04
1         -0.6   0.04   1          0.5   0.04
1         -0.8   0.04   1          1     0.04
1          0.2   0.4    1          1.5   0.04
1          0.4   0.4    1          2     0.04
1          0.6   0.4    1          2.5   0.04
1          0.8   0.4    1         -2     0.4
1         -0.2   0.4    1         -1.5   0.4
1         -0.4   0.4    1         -1     0.4
1         -0.6   0.4    1         -0.5   0.4
1         -0.8   0.4    1          0     0.4
Table 6.1.1 Various time series models for simulation.

For each data set, we remove a single value at various positions. With this setup, we can assess the performance of each method for different time series models with missing values in different positions. For the different estimation methods, the missing value positions are as follows:

Polynomial Curve Fitting: missing value at positions 7, 49 and 91.

Cubic Spline: missing value at positions 7, 49 and 91.

ARIMA interpolation: missing value at positions 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84 and 91.

State Space Modelling: missing value at positions 7, 49 and 91.

We first consider numerical analysis methods, namely least squares approximation (polynomial curve fitting) and cubic spline curve fitting. We then apply ARIMA modelling using forecasting and weighted means, and finally we examine state space modelling. After applying each of these methods, we compare their efficiency for the different time series models.

6.2 Applying Polynomial Curve Fitting (Least Squares Approximation) to Box-Jenkins Models

The objective of least squares approximation here is to fit a data set with polynomial functions of various degrees. The aim is to identify the polynomial degree that gives the minimum MAD, as defined previously.

We might expect this method to be best suited to time series with a highly irregular pattern. As users are not required to define a time series model for the data set, we believe this method is suitable for general curve fitting situations.

For this method, we have used an Excel plug-in written by "Advanced Systems Design and Development". Using the plug-in, we created a spreadsheet that fits polynomials to a set of data, automatically incrementing the polynomial degree.

The simulation was carried out in the following steps:

1. Generate data sets from a specific time series model for testing.

2. Remove a specific value from the data set and store it in another cell for comparison.

3. Apply the plug-in to fit the data set with polynomials of different degrees.

4. Calculate the absolute deviation between the fitted value and the true value at the missing position.

5. Repeat the process one hundred times, then compute the MAD and the standard deviation of the absolute deviations.
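Steps 1 to 5 can be reproduced outside Excel. The sketch below replaces the plug-in with NumPy's least squares `Polynomial.fit`; it is not the plug-in's own algorithm. It assumes a series of 100 values with the missing value at position 49 (1-based), and the names are illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial

def simulate_ar1(phi, sd, n, rng, burn_in=100):
    """Step 1: an AR(1) series of length n (burn-in discarded)."""
    z = rng.normal(0.0, sd, n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + z[t]
    return x[burn_in:]

def polyfit_mad(phi, sd, position=49, degrees=range(2, 8), reps=100, n=100, seed=0):
    """Return {degree: (MAD, SD of absolute deviations)} over reps simulations."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1)
    keep = t != position                              # step 2: remove one value
    devs = {d: [] for d in degrees}
    for _ in range(reps):                             # step 5: repeat
        x = simulate_ar1(phi, sd, n, rng)
        for d in degrees:                             # step 3: fit each degree
            fit = Polynomial.fit(t[keep], x[keep], d) # least squares on a scaled domain
            devs[d].append(abs(fit(position) - x[position - 1]))  # step 4
    return {d: (float(np.mean(v)), float(np.std(v, ddof=1))) for d, v in devs.items()}

for d, (mad, sd_dev) in polyfit_mad(0.6, 0.04).items():
    print(d, round(mad, 5), round(sd_dev, 5))
```

Because the fitted values are linear in the data and the simulated noise enters only through its scale, rerunning with the same seed and SD = 0.4 instead of 0.04 multiplies every MAD by 10. This is consistent with the proportionality between MAD and noise standard deviation reported for the tables in this section.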

For each model, the polynomial degree with the lowest MAD is highlighted.
The results for AR time series with the missing value at position 49 are as follows:

Table 6.2.1 AR time series model with missing value at position 49.
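Summary table. Columns: No., Case, Phi, SD (noise standard deviation), then results for polynomial degrees 2 to 7. For each case, the first row gives the MAD and the second row gives the standard deviation of the absolute deviations.

No.  Case  Phi  SD    2 deg     3 deg     4 deg     5 deg     6 deg     7 deg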
1 AR 1 0.2 0.04 0.032509 0.032494 0.03278 0.032797 0.033156 0.033352

0.025184 0.025266 0.024772 0.024793 0.02555 0.025638

2 AR 1 0.4 0.04 0.034602 0.034713 0.034254 0.034145 0.034098 0.034299

0.025992 0.025965 0.025494 0.025587 0.026418 0.026555

3 AR 1 0.6 0.04 0.038823 0.038954 0.038217 0.038189 0.037837 0.037848

0.028004 0.028073 0.026614 0.026513 0.026311 0.026666

4 AR 1 0.8 0.04 0.049147 0.049332 0.047293 0.047232 0.043769 0.04393

0.035254 0.035404 0.030055 0.029918 0.028215 0.028365

5 AR 1 -0.2 0.04 0.032931 0.032948 0.033385 0.033467 0.033954 0.034083

0.025688 0.025725 0.025557 0.025638 0.026221 0.026186

6 AR 1 -0.4 0.04 0.034486 0.034509 0.034828 0.034935 0.035533 0.035595

0.027485 0.027479 0.027736 0.027872 0.028394 0.028469

7 AR 1 -0.6 0.04 0.038281 0.038303 0.038782 0.038916 0.039842 0.039875

0.031618 0.031588 0.031867 0.032033 0.03238 0.03251

8 AR 1 -0.8 0.04 0.050979 0.050942 0.051375 0.051553 0.052445 0.0525

0.041172 0.041198 0.041867 0.041984 0.042656 0.042763

9 AR 1 0.2 0.4 0.322696 0.322863 0.324576 0.324694 0.327963 0.328942

0.242403 0.24278 0.241983 0.241762 0.250389 0.251987

10 AR 1 0.4 0.4 0.34329 0.344132 0.338851 0.337705 0.337069 0.337998

0.249531 0.24918 0.249007 0.249679 0.260219 0.26266

11 AR 1 0.6 0.4 0.387499 0.388197 0.376246 0.375955 0.373327 0.373937

0.274288 0.274582 0.263341 0.262117 0.261065 0.265094

12 AR 1 0.8 0.4 0.485087 0.486805 0.457708 0.457774 0.425059 0.427045

0.346653 0.34716 0.290995 0.28869 0.276716 0.278167

13 AR 1 -0.2 0.4 0.331757 0.331974 0.337168 0.337771 0.344922 0.346408

0.25648 0.256822 0.255652 0.256177 0.262099 0.262556

14 AR 1 -0.4 0.4 0.358198 0.358366 0.361859 0.362689 0.370687 0.371696

0.278948 0.279174 0.282284 0.283253 0.288926 0.290094

15 AR 1 -0.6 0.4 0.408827 0.409083 0.414854 0.415999 0.42554 0.426228

0.32573 0.325685 0.328349 0.329421 0.334402 0.335979

16 AR 1 -0.8 0.4 0.532708 0.532352 0.536292 0.537777 0.548274 0.548964

0.429338 0.42983 0.437352 0.437997 0.444895 0.446791

For the AR(1) model, the data fluctuate less as Phi approaches 1. The table shows that for Phi values of 0.4 and above, a higher degree polynomial (degree 6) gives the minimum MAD. When Phi is negative the data fluctuate more, and in this case a low-degree polynomial (degree 2, or degree 3 for Phi = -0.8) produces the best results.

According to our results, the mean absolute deviation is directly proportional to the standard deviation of the random noise: increasing the noise standard deviation by a factor of 10 increases the mean absolute deviation by approximately a factor of 10.

When Phi is positive, there is a positive relationship between Phi and the mean absolute deviation: as Phi increases, the mean absolute deviation increases.

Best MAD vs Phi Values

Table 6.2.2 Missing value at position 49, SD = 0.04.

Phi    Best MAD (AR(1), missing value at position 49)

0.2 0.032494

0.4 0.034098

0.6 0.037837

0.8 0.043769

-0.2 0.032931

-0.4 0.034486

-0.6 0.038281

-0.8 0.050942

Figure 6.1 AR1 missing at 49, S.D. 0.04.

Table 6.2.3 Missing value at position 49, SD = 0.4.

Phi    Best MAD (AR(1), missing value at position 49)

0.2 0.322696

0.4 0.337069

0.6 0.373327

0.8 0.425059

-0.2 0.331757

-0.4 0.358198

-0.6 0.408827

-0.8 0.532352

Figure 6.2 AR1 missing at 49, S.D. 0.4.

To show the effectiveness of polynomial curve fitting for time series data, the best MAD value for each Phi value is listed in Tables 6.2.2 and 6.2.3 and plotted in Figures 6.1 and 6.2. These results show that, when the missing value is at position 49, the method works best when Phi is close to 0. It should also be noted that the standard deviation of the absolute deviations decreases as Phi approaches 0. The same process and analysis are repeated for missing values near the beginning and the end of the data.

Table 6.2.4 summarizes the results for missing data at position 7.

Table 6.2.4 AR time series model with missing value at position 7.

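Summary table. Columns: No., Case, Phi, SD (noise standard deviation), then results for polynomial degrees 2 to 7. For each case, the first row gives the MAD and the second row gives the standard deviation of the absolute deviations.

No.  Case  Phi  SD    2 deg     3 deg     4 deg     5 deg     6 deg     7 deg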
1 AR 1 0.2 0.04 0.036244 0.035939 0.035862 0.036368 0.036935 0.036039

0.025805 0.02656 0.026535 0.027161 0.027009 0.02878

2 AR 1 0.4 0.04 0.040799 0.040113 0.040012 0.040518 0.04063 0.03967

0.025119 0.026202 0.0262 0.026852 0.026874 0.027502

3 AR 1 0.6 0.04 0.046022 0.044665 0.044584 0.045065 0.044461 0.041655

0.026366 0.026846 0.026725 0.028056 0.027208 0.027341

4 AR 1 0.8 0.04 0.051739 0.048197 0.047918 0.048992 0.045797 0.043019

0.033796 0.031025 0.03039 0.033843 0.0324 0.028947

5 AR 1 -0.2 0.04 0.031406 0.031844 0.031889 0.032457 0.033097 0.033677

0.026894 0.027239 0.027238 0.027497 0.02754 0.028729

6 AR 1 -0.4 0.04 0.034053 0.034506 0.034562 0.035184 0.035694 0.036964

0.025684 0.026046 0.026064 0.026251 0.026464 0.02717

7 AR 1 -0.6 0.04 0.039208 0.039684 0.03975 0.040465 0.041296 0.042754

0.028593 0.028799 0.028815 0.029039 0.029009 0.030307

8 AR 1 -0.8 0.04 0.054966 0.05519 0.055179 0.055851 0.05704 0.059501

0.041934 0.041932 0.042003 0.042765 0.04354 0.045529

9 AR 1 0.2 0.4 0.361587 0.357383 0.357426 0.362177 0.367519 0.356625

0.25993 0.269197 0.268742 0.274032 0.27472 0.295629

10 AR 1 0.4 0.4 0.395807 0.387567 0.387425 0.393592 0.395276 0.38571

0.260645 0.270618 0.270002 0.275132 0.274528 0.283117

11 AR 1 0.6 0.4 0.43581 0.422627 0.421694 0.429243 0.430122 0.408248

0.27889 0.2839 0.283254 0.293844 0.278698 0.277112

12 AR 1 0.8 0.4 0.494676 0.463275 0.459776 0.477532 0.453372 0.427964

0.339965 0.310894 0.304606 0.33535 0.316451 0.279882

13 AR 1 -0.2 0.4 0.308674 0.312725 0.313135 0.317858 0.322784 0.327035

0.253592 0.257608 0.257748 0.261652 0.263 0.27968

14 AR 1 -0.4 0.4 0.331096 0.335845 0.336394 0.341623 0.345099 0.354906

0.242903 0.246956 0.247221 0.250103 0.252723 0.265045

15 AR 1 -0.6 0.4 0.381146 0.386994 0.387604 0.393022 0.400983 0.410681

0.268414 0.270418 0.270641 0.273519 0.272342 0.291763

16 AR 1 -0.8 0.4 0.513513 0.516881 0.516597 0.521661 0.535363 0.559668

0.397752 0.397489 0.397986 0.405435 0.412941 0.431698

The simulation with the missing value at position 7 indicates that, when the missing value is near the beginning of an AR(1) data set with positive Phi, the polynomial degree has to be increased to achieve a smaller MAD. When Phi is negative, a high degree polynomial is not required; in fact, a 2 degree polynomial is sufficient to achieve the smallest MAD. Tables 6.2.5 and 6.2.6 show the best MAD value for each Phi value at the two noise standard deviations.

Best MAD vs Phi Values

Table 6.2.5 Missing value at position 7, SD = 0.04.

Phi    Best MAD (AR(1), missing value at position 7)

0.2 0.035862

0.4 0.03967

0.6 0.041655

0.8 0.043019

-0.2 0.031406

-0.4 0.034053

-0.6 0.039208

-0.8 0.054966

Figure 6.3 AR1 missing at 7, S.D. 0.04.

Table 6.2.6 Missing value at position 7, SD = 0.4.

Phi    Best MAD (AR(1), missing value at position 7)

0.2 0.356625

0.4 0.38571

0.6 0.408248

0.8 0.427964

-0.2 0.308674

-0.4 0.331096

-0.6 0.381146

-0.8 0.513513

Figure 6.4 AR1 missing at 7, S.D. 0.4

The graphs of best MAD values for the missing value at position 7 (Figures 6.3 and 6.4) are similar to those for the missing value at position 49, except that the MAD values are higher for positive Phi values less than 0.6.

The results of the AR1 simulation with the missing value at position 91 are given in Table 6.2.7.

Table 6.2.7 AR time series model with missing value at position 91.

No. Case Phi SD | MAD for polynomial degree 2, 3, 4, 5, 6, 7
(the second line of each case gives the SD of the absolute deviations for degrees 2 to 7)
1 AR 1 0.2 0.04 0.034568 0.034697 0.035206 0.035264 0.036281 0.036858

0.027338 0.0273 0.02704 0.02704 0.027694 0.026857

2 AR 1 0.4 0.04 0.033622 0.03381 0.03392 0.034116 0.035275 0.035591

0.028227 0.02807 0.028145 0.027396 0.028445 0.027092

3 AR 1 0.6 0.04 0.034753 0.034809 0.035304 0.035802 0.036484 0.035622

0.031921 0.031732 0.031831 0.028584 0.030417 0.028167

4 AR 1 0.8 0.04 0.039435 0.038794 0.039691 0.037417 0.0388 0.035638

0.036277 0.036278 0.037539 0.032604 0.0333 0.030342

5 AR 1 -0.2 0.04 0.034815 0.034802 0.034913 0.035778 0.037236 0.037368

0.027342 0.027376 0.027542 0.027694 0.028321 0.027762

6 AR 1 -0.4 0.04 0.039314 0.03932 0.039307 0.040092 0.041674 0.041671

0.029046 0.02908 0.029296 0.02995 0.030337 0.030162

7 AR 1 -0.6 0.04 0.046911 0.046854 0.046882 0.047771 0.049315 0.049438

0.032655 0.032847 0.033037 0.033893 0.034666 0.034465

8 AR 1 -0.8 0.04 0.060149 0.06013 0.060162 0.061223 0.062658 0.063043

0.040555 0.040867 0.041059 0.041531 0.042858 0.042335

9 AR 1 0.2 0.4 0.332383 0.333896 0.339567 0.338868 0.348111 0.351816

0.253989 0.252857 0.251004 0.250062 0.26117 0.252435

10 AR 1 0.4 0.4 0.329914 0.332421 0.336529 0.335699 0.346148 0.347783

0.275642 0.273191 0.275304 0.264378 0.275135 0.263784

11 AR 1 0.6 0.4 0.336848 0.33825 0.346359 0.347734 0.35022 0.347329

0.307381 0.303987 0.306044 0.268499 0.293686 0.275076

12 AR 1 0.8 0.4 0.402849 0.396261 0.406258 0.379638 0.385607 0.357084

0.354613 0.351208 0.365192 0.311109 0.32656 0.304052

13 AR 1 -0.2 0.4 0.345434 0.345201 0.346619 0.352779 0.367107 0.370501

0.262783 0.263076 0.2644 0.265587 0.270855 0.267331

14 AR 1 -0.4 0.4 0.382155 0.382501 0.382945 0.388513 0.40427 0.404309

0.279903 0.280431 0.282186 0.288667 0.29294 0.294504

15 AR 1 -0.6 0.4 0.448976 0.448745 0.448595 0.456317 0.470453 0.472184

0.318106 0.319996 0.322817 0.329978 0.340086 0.338896

16 AR 1 -0.8 0.4 0.591065 0.591226 0.592107 0.600547 0.616263 0.617846

0.403096 0.405935 0.40782 0.412773 0.427173 0.422727

The above simulation indicated that if the missing value is at the end of the AR1 data set, a second-degree polynomial should be used to obtain a reasonable estimate for positive Phi values up to 0.6 (for Phi = 0.8 the seventh-degree polynomial gave a slightly smaller MAD). This is because it is difficult to fit a high-degree polynomial curve to the given data.

The best MAD values for the different Phi values are given in Tables 6.2.8 and 6.2.9 for the two standard deviations.

Best MAD vs Phi Values

Table 6.2.8 Missing value at position 91, SD = 0.04.

Phi    Best MAD (AR1, missing value at position 91)
0.2    0.034568
0.4    0.033622
0.6    0.034753
0.8    0.035638
-0.2   0.034802
-0.4   0.039307
-0.6   0.046854
-0.8   0.06013

Figure 6.5 AR1 missing at 91, S.D. 0.04.

Table 6.2.9 Missing value at position 91, SD = 0.4.

Phi    Best MAD (AR1, missing value at position 91)
0.2    0.332383
0.4    0.329914
0.6    0.336848
0.8    0.357084
-0.2   0.345201
-0.4   0.382155
-0.6   0.448595
-0.8   0.591065

Figure 6.6 AR1 missing at 91, S.D. 0.4.

For an AR1 time series with many data points before the missing value, the graphs indicate that as the Phi value increases, the MAD becomes smaller than for data with the missing value at the beginning or in the middle. This method is therefore suitable for AR1 models with large Phi values.

It can be concluded that for any AR1 process with a sufficient amount of past data available, polynomials of degree two or three should give a reasonable estimate. If the missing value is at the beginning of the data set, we should either reverse the data set and estimate the missing value with a second-degree polynomial, or use a polynomial of degree five or six.
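The text does not state the fitting window or whether the polynomial uses observations on both sides of the gap. Reversing the series changes nothing for a polynomial fitted to all observations, so the reversal advice suggests a one-sided fit. The sketch below assumes exactly that: it fits a least-squares polynomial to the observations just before the gap and extrapolates one step. The 20-point `window` and the function names `polyfit_ls` and `estimate_from_one_side` are hypothetical choices for illustration, not the Excel plug-in's functions.

```python
def polyfit_ls(xs, ys, degree):
    """Least-squares coefficients c[0..degree] of sum(c[k] * x**k), obtained from
    the normal equations by Gaussian elimination with partial pivoting."""
    m = degree + 1
    a = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for k in range(col, m):
                a[r][k] -= factor * a[col][k]
            b[r] -= factor * b[col]
    coeffs = [0.0] * m
    for j in reversed(range(m)):  # back substitution
        coeffs[j] = (b[j] - sum(a[j][k] * coeffs[k] for k in range(j + 1, m))) / a[j][j]
    return coeffs


def estimate_from_one_side(series, t, degree=2, window=20, reverse=False):
    """Estimate series[t] (0-based index) by fitting a polynomial to up to `window`
    observations just before t and extrapolating one step ahead.
    With reverse=True the series is reversed first, so the observations after t
    are used instead (intended for a gap near the start of the data)."""
    if reverse:
        series = series[::-1]
        t = len(series) - 1 - t
    past = series[max(0, t - window):t]
    if len(past) <= degree:
        raise ValueError("too few observations on this side for the chosen degree")
    n = len(past)
    # Positions rescaled to [-1, 0) for numerical stability; the gap sits at x = 0,
    # so the extrapolated value is the constant coefficient.
    xs = [(i - n) / n for i in range(n)]
    return polyfit_ls(xs, past, degree)[0]


# Usage on an exact quadratic, where a one-sided quadratic fit extrapolates exactly.
demo = [0.5 * k * k - 3 * k + 2 for k in range(30)]
print(estimate_from_one_side(demo, 25))               # close to demo[25] = 239.5
print(estimate_from_one_side(demo, 3, reverse=True))  # close to demo[3] = -2.5
```

With the gap at index 3, only three earlier observations exist, so a fifth-degree fit on the original ordering raises `ValueError`. Reversing the series supplies the later observations instead.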

The results for the Moving Average time series with the missing value at position 49 are as follows.

Table 6.2.10 MA time series model with missing value at position 49.
No. Case Theta SD | MAD for polynomial degree 2, 3, 4, 5, 6, 7
(the second line of each case gives the SD of the absolute deviations for degrees 2 to 7)
1 MA 1 -2.5 0.04 0.087042 0.086985 0.086959 0.086967 0.086545 0.086399

0.065246 0.065331 0.065761 0.0654 0.067162 0.067345

2 MA 1 -2 0.04 0.073084 0.073048 0.073193 0.073238 0.073305 0.073239

0.0558 0.055803 0.056573 0.056391 0.057719 0.057772

3 MA 1 -1.5 0.04 0.05866 0.05864 0.05884 0.058888 0.059245 0.059188

0.04554 0.045527 0.046284 0.046252 0.046977 0.047004

4 MA 1 -1 0.04 0.045719 0.045712 0.046318 0.046383 0.046753 0.046736

0.036011 0.036005 0.036244 0.036308 0.036846 0.036841

5 MA 1 -0.5 0.04 0.036391 0.036408 0.036931 0.037025 0.037642 0.037663

0.028532 0.028534 0.028606 0.028687 0.029004 0.029041

6 MA 1 0 0.04 0.031826 0.031823 0.032035 0.03208 0.032677 0.032795

0.024494 0.024549 0.024523 0.024584 0.0251 0.025116

7 MA 1 0.5 0.04 0.035508 0.035562 0.034859 0.034744 0.035139 0.035176

0.026196 0.026209 0.026565 0.02658 0.027243 0.027446

8 MA 1 1 0.04 0.044308 0.044312 0.04331 0.043067 0.042989 0.042862

0.033209 0.033301 0.033438 0.033352 0.034258 0.034641

9 MA 1 1.5 0.04 0.055773 0.055764 0.054064 0.053767 0.05427 0.054287

0.04238 0.042429 0.04315 0.042875 0.042885 0.043151

10 MA 1 2 0.04 0.070009 0.069977 0.068238 0.06792 0.068459 0.068487

0.05456 0.054623 0.05472 0.054256 0.05333 0.053564

11 MA 1 2.5 0.04 0.085215 0.085162 0.083902 0.083564 0.083524 0.083678

0.065468 0.065548 0.064729 0.064117 0.063603 0.063672

12 MA 1 -2 0.4 0.753674 0.753497 0.753739 0.754142 0.751332 0.751543

0.568264 0.568062 0.577896 0.576009 0.583749 0.584024

13 MA 1 -1.5 0.4 0.613804 0.61381 0.61508 0.615143 0.618365 0.618692

0.458556 0.458293 0.468394 0.468435 0.471061 0.471316

14 MA 1 -1 0.4 0.480751 0.4808 0.486964 0.487393 0.491896 0.492257

0.361177 0.361119 0.364649 0.365235 0.369461 0.369804

15 MA 1 -0.5 0.4 0.376366 0.376434 0.382244 0.383024 0.39084 0.391171

0.284374 0.284596 0.286418 0.286944 0.290342 0.291192

16 MA 1 0 0.4 0.322103 0.321934 0.324064 0.324187 0.333521 0.334788

0.246727 0.247479 0.247367 0.247694 0.252288 0.253061

17 MA 1 0.5 0.4 0.295619 0.354114 0.354552 0.347119 0.345883 0.351764

0.25522 0.256482 0.256496 0.259646 0.259576 0.268797

18 MA 1 1 0.4 0.263776 0.452198 0.452169 0.441187 0.438885 0.439373

0.209129 0.335508 0.336313 0.338075 0.337745 0.344037

19 MA 1 1.5 0.4 0.397509 0.572864 0.57319 0.556198 0.553886 0.562387

0.294549 0.437996 0.438507 0.445526 0.443531 0.43549

20 MA 1 2 0.4 0.622893 0.703274 0.703945 0.691929 0.690322 0.700696

0.448908 0.555629 0.555685 0.556524 0.552773 0.538997

The MA1 time series model has more random noise within the data. For all positive Theta values, the simulation indicated that if the missing value is in the middle of the data set, a high-degree polynomial is required to give a reasonable estimate. Negative Theta values, on the other hand, produce more random noise, so the series behaves erratically, which suggests that a lower-degree polynomial is sufficient to achieve a small MAD. The standard deviation of the MAD is smaller when Theta is closer to 0. In addition, as with the AR data, the MAD values are directly proportional to the standard deviation of the random error.

Best MAD vs Theta Values
Table 6.2.11 Missing value at position 49, SD = 0.04.

Theta  Best MAD (MA1, missing value at position 49)
-2.5   0.086399
-2     0.073048
-1.5   0.05864
-1     0.045712
-0.5   0.036391
0      0.031823
0.5    0.034744
1      0.042862
1.5    0.053767
2      0.06792
2.5    0.083524

Figure 6.7 MA1 missing at 49, S.D. 0.04.

Table 6.2.12 Missing value at position 49, SD = 0.4.

Theta  Best MAD (MA1, missing value at position 49)
-2     0.751332
-1.5   0.613804
-1     0.480751
-0.5   0.376366
0      0.321934
0.5    0.345883
1      0.438885
1.5    0.553886
2      0.690322

Figure 6.8 MA1 missing at 49, S.D. 0.4.

The graphs of best MAD values for the missing value at position 49 indicate that the minimum MAD is achieved when Theta is zero, i.e. for a purely random process. For the MA process the standard deviation of the MAD also decreases as Theta approaches 0. For larger positive and negative Theta values the estimation is unreliable, indicating a need for caution when using least squares polynomials.
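A short calculation explains why the MAD bottoms out at Theta = 0, where the MA1 series is pure white noise. Assume the noise is Gaussian with mean zero and standard deviation σ (the SD column), and assume the estimate of x_t is a linear combination of the other observations, as a least squares polynomial is. Then the estimate is independent of x_t, and the deviation d_t satisfies:

```latex
\begin{aligned}
d_t &= \hat{x}_t - x_t, \qquad
\operatorname{Var}(d_t) = \sigma^2 + \operatorname{Var}(\hat{x}_t) \ \ge\ \sigma^2,\\
\mathrm{E}\,\lvert d_t \rvert &= \sqrt{\tfrac{2}{\pi}}\,\sqrt{\operatorname{Var}(d_t)}
\ \ge\ \sigma\sqrt{\tfrac{2}{\pi}} \approx 0.798\,\sigma .
\end{aligned}
```

For SD = 0.04 this bound is about 0.0319, and for SD = 0.4 it is about 0.319. Both are close to the best MADs at Theta = 0 in Tables 6.2.11 and 6.2.12 (0.031823 and 0.321934). The small shortfall in the first case is within the sampling error of 100 replications.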

The following table summarizes the results for missing data at position 7.

Table 6.2.13 MA time series model with missing value at position 7.


No. Case Theta SD | MAD for polynomial degree 2, 3, 4, 5, 6, 7
(the second line of each case gives the SD of the absolute deviations for degrees 2 to 7)
1 MA 1 -2.5 0.04 0.082637 0.082817 0.083119 0.08235 0.081078 0.081911

0.064073 0.064287 0.064266 0.063777 0.063961 0.06645

2 MA 1 -2 0.04 0.067658 0.068373 0.068626 0.067771 0.066444 0.067283

0.052299 0.052138 0.052152 0.052131 0.05255 0.054425

3 MA 1 -1.5 0.04 0.0543 0.054777 0.054951 0.054639 0.054124 0.055376

0.040874 0.04132 0.041412 0.041099 0.040992 0.04215

4 MA 1 -1 0.04 0.043392 0.043971 0.044107 0.044114 0.044014 0.045683

0.031561 0.032045 0.032088 0.032021 0.032362 0.033113

5 MA 1 -0.5 0.04 0.034744 0.035305 0.035392 0.035822 0.03605 0.037226

0.026195 0.026664 0.026683 0.026743 0.027177 0.028247

6 MA 1 0 0.04 0.032503 0.032417 0.032394 0.032696 0.033576 0.033221

0.027135 0.027888 0.027914 0.028496 0.028289 0.030078

7 MA 1 0.5 0.04 0.044486 0.043625 0.043533 0.044231 0.044431 0.043612

0.025214 0.026644 0.026687 0.026653 0.027181 0.028636

8 MA 1 1 0.04 0.058092 0.056922 0.056781 0.057619 0.058551 0.0577

0.030279 0.031736 0.031811 0.031144 0.030249 0.031194

9 MA 1 1.5 0.04 0.07122 0.070217 0.07014 0.070987 0.071984 0.071288

0.040131 0.04098 0.04093 0.040103 0.038967 0.039264

10 MA 1 2 0.04 0.085604 0.08461 0.084502 0.085166 0.086355 0.085423

0.051021 0.051614 0.051632 0.050668 0.049277 0.04978

11 MA 1 2.5 0.04 0.100132 0.099264 0.099148 0.099758 0.101085 0.099627

0.063327 0.06367 0.063708 0.062539 0.060954 0.061475

12 MA 1 -2 0.4 0.687483 0.694861 0.696926 0.688317 0.683059 0.693488

0.532277 0.530889 0.530468 0.531157 0.540558 0.553169


13 MA 1 -1.5 0.4 0.546551 0.5512 0.552439 0.55041 0.549663 0.565625

0.418326 0.422561 0.423214 0.419792 0.423034 0.428166

14 MA 1 -1 0.4 0.431226 0.437451 0.438655 0.438386 0.438717 0.454288

0.313608 0.318042 0.318482 0.317918 0.321098 0.326708

15 MA 1 -0.5 0.4 0.342795 0.347981 0.348716 0.352438 0.353868 0.3654

0.255902 0.261188 0.261468 0.262875 0.26757 0.281556

16 MA 1 0 0.4 0.321849 0.321842 0.321829 0.325788 0.33284 0.33185

0.25764 0.264584 0.264749 0.270763 0.270247 0.291456

17 MA 1 0.5 0.4 0.424102 0.418435 0.417596 0.427505 0.430485 0.430469

0.264631 0.273412 0.27397 0.271837 0.274013 0.283861

18 MA 1 1 0.4 0.555528 0.544837 0.54341 0.556715 0.572386 0.565164

0.337477 0.346988 0.348332 0.336567 0.317067 0.328613

19 MA 1 1.5 0.4 0.689781 0.678399 0.677652 0.69056 0.710286 0.703458

0.445516 0.4464 0.447238 0.431462 0.409893 0.420294

20 MA 1 2 0.4 0.836755 0.8251 0.824428 0.833766 0.852847 0.84368

0.56074 0.556685 0.558033 0.54221 0.520863 0.530577

With the missing value at position 7, it is difficult to predict the appropriate degree of polynomial for the different positive Theta values. One possible explanation is that there are insufficient data before the missing value. Negative Theta values, by contrast, produce more random noise, so the series behaves erratically, which suggests that a lower-degree polynomial is sufficient to achieve a small MAD.

Best MAD vs Theta Values
Table 6.2.14 Missing value at position 7, SD = 0.04.
Theta  Best MAD (MA1, missing value at position 7)
-2.5   0.081078
-2     0.066444
-1.5   0.054124
-1     0.043392
-0.5   0.034744
0      0.032394
0.5    0.043533
1      0.056781
1.5    0.07014
2      0.084502
2.5    0.099148

Figure 6.9 MA1 missing at 7, S.D. 0.04.

Table 6.2.15 Missing value at position 7, SD = 0.4.

Theta  Best MAD (MA1, missing value at position 7)
-2     0.683059
-1.5   0.546551
-1     0.431226
-0.5   0.342795
0      0.321829
0.5    0.417596
1      0.54341
1.5    0.677652
2      0.824428

Figure 6.10 MA1 missing at 7, S.D. 0.4.

The graphs of best MAD values for the missing value at position 7 (Figures 6.9 and 6.10) are similar to those for the missing value at position 49.

The results of the MA1 simulation with the missing value at position 91 are given in Table 6.2.16.

Table 6.2.16 MA time series model with missing value at position 91.
No. Case Theta SD | MAD for polynomial degree 2, 3, 4, 5, 6, 7
(the second line of each case gives the SD of the absolute deviations for degrees 2 to 7)
1 MA 1 -2.5 0.04 0.09355 0.093956 0.094435 0.094131 0.095678 0.097202

0.074511 0.074079 0.073789 0.076378 0.075497 0.078048

2 MA 1 -2 0.04 0.07737 0.077714 0.078137 0.07872 0.079846 0.081172

0.062783 0.062412 0.062154 0.063719 0.063159 0.065292

3 MA 1 -1.5 0.04 0.063922 0.064147 0.064633 0.065379 0.066653 0.067706

0.05014 0.04988 0.049399 0.05091 0.050421 0.052071

4 MA 1 -1 0.04 0.051664 0.051684 0.051745 0.05286 0.053952 0.054691

0.038051 0.038027 0.038049 0.038941 0.039011 0.039796

5 MA 1 -0.5 0.04 0.040358 0.040364 0.040478 0.041347 0.042599 0.042927

0.028995 0.029008 0.029079 0.0297 0.030231 0.030094

6 MA 1 0 0.04 0.03409 0.034156 0.034377 0.034793 0.035993 0.036116

0.025597 0.025541 0.025621 0.0257 0.026611 0.025833

7 MA 1 0.5 0.04 0.034356 0.03461 0.035063 0.034938 0.035975 0.036498

0.029063 0.028833 0.02879 0.028096 0.029118 0.028003

8 MA 1 1 0.04 0.043939 0.044048 0.04468 0.044773 0.045265 0.045763

0.035741 0.035712 0.035836 0.033762 0.034828 0.034722

9 MA 1 1.5 0.04 0.057532 0.057413 0.058117 0.057912 0.058933 0.059416

0.044639 0.044913 0.045113 0.042741 0.042777 0.043656

10 MA 1 2 0.04 0.072462 0.072301 0.073025 0.072442 0.074648 0.075559

0.055153 0.055525 0.055883 0.053525 0.052081 0.053438

11 MA 1 2.5 0.04 0.087979 0.087813 0.08883 0.087944 0.090233 0.092144

0.066786 0.067121 0.067213 0.064967 0.063387 0.064556

12 MA 1 -2 0.4 0.769816 0.773798 0.77969 0.783651 0.801013 0.813431

0.622222 0.618209 0.615102 0.632497 0.62809 0.647283

13 MA 1 -1.5 0.4 0.623993 0.626588 0.632059 0.638866 0.65411 0.6647

0.499489 0.496866 0.49282 0.506558 0.503956 0.516758

14 MA 1 -1 0.4 0.507052 0.507741 0.508857 0.517821 0.530361 0.536337

0.37024 0.370013 0.371226 0.37941 0.382699 0.390108

15 MA 1 -0.5 0.4 0.400369 0.400568 0.40117 0.407484 0.418318 0.423375

0.277809 0.278278 0.279708 0.286404 0.293576 0.293943

16 MA 1 0 0.4 0.334591 0.335409 0.336988 0.339803 0.348744 0.353259

0.247864 0.247668 0.248467 0.247148 0.258308 0.254792


17 MA 1 0.5 0.4 0.34107 0.343717 0.34751 0.344349 0.353927 0.362122

0.288474 0.28609 0.2848 0.275923 0.282989 0.27664

18 MA 1 1 0.4 0.435317 0.436679 0.439075 0.44013 0.445236 0.457182

0.359975 0.358765 0.359877 0.334325 0.345332 0.341083

19 MA 1 1.5 0.4 0.566686 0.564455 0.566995 0.557875 0.577015 0.586095

0.445821 0.449598 0.450525 0.428853 0.424614 0.428637

20 MA 1 2 0.4 0.716731 0.712746 0.714482 0.703024 0.729697 0.742758

0.549856 0.555674 0.558306 0.534468 0.522121 0.529317

When there are many data points before the missing value, this type of time series contains a lot of random noise, whether Theta is positive or negative. As a result, it is difficult to fit a high-degree polynomial to the data set. The second-degree polynomial should be used because in most cases it gives the smallest MAD.

Best MAD vs Theta Values


Table 6.2.17 Missing value at position 91, SD = 0.04.
Theta  Best MAD (MA1, missing value at position 91)
-2.5   0.09355
-2     0.07737
-1.5   0.063922
-1     0.051664
-0.5   0.040358
0      0.03409
0.5    0.034356
1      0.043939
1.5    0.057413
2      0.072301
2.5    0.087813

Figure 6.11 MA1 missing at 91, S.D. 0.04.

Table 6.2.18 Missing value at position 91, SD = 0.4.
Theta  Best MAD (MA1, missing value at position 91)
-2     0.769816
-1.5   0.623993
-1     0.507052
-0.5   0.400369
0      0.334591
0.5    0.34107
1      0.435317
1.5    0.557875
2      0.703024

Figure 6.12 MA1 missing at 91, S.D. 0.4.

From the results above, it can be concluded that for any MA1 process with a sufficient amount of past data available, a second-degree polynomial should give a reasonable estimate. If the missing value is at the beginning of the data set, we should either reverse the data set and estimate the missing value with a third-degree polynomial, or simply use a sixth-degree polynomial.

In conclusion, the results show that when the missing value is at position 49, this method is best suited to Theta values close to 0, where the minimum MAD is achieved. The same conclusion applies when the missing value appears at position 91. At position 7 the MAD still reaches its minimum when Theta is close to 0; however, when Theta is positive the MAD increases significantly. This method is therefore not suitable when Theta is large and positive and the missing value appears at the beginning of the data.

6.3 Applying Cubic Spline to Box-Jenkins Models

Another common strategy for fitting data points is “interpolating with a cubic spline”. It fits a cubic polynomial between each pair of adjacent known data points, uses cross-validation to set the degree of smoothing, and estimates the missing observation by the value of the spline at that point.

The Excel plug-in used to calculate the polynomial fit also contains a function for fitting cubic splines. This function was used to build a spreadsheet for examining the cubic spline simulation results.

We expected this method to require a substantial number of data points before the missing value for the estimate to be considered reliable.

We conducted the simulation as follows:

1. Generate a data set for a specific time series model.

2. Remove a specific value from the data set and store it in another cell for comparison.

3. Apply the plug-in to obtain an appropriate cubic spline fit.

4. Repeat the process one hundred times.

After repeating the process one hundred times, the MAD and standard deviation were

determined.
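The four steps above can be sketched in Python as an illustrative stand-in for the Excel plug-in. The sketch rests on several assumptions: an interpolating natural cubic spline (no cross-validated smoothing), a zero-mean AR1 series of length 100 with a 100-step burn-in, and the hypothetical function names `natural_cubic_spline`, `simulate_ar1` and `spline_mad`.

```python
import bisect
import random
import statistics


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys);
    xs must be strictly increasing."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the second derivatives m[0..n]; natural ends m[0] = m[n] = 0.
    sub, diag, sup, rhs = [0.0] * (n + 1), [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        sub[i], diag[i], sup[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    for i in range(1, n + 1):  # Thomas algorithm: forward elimination
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * (n + 1)
    for i in range(n, -1, -1):  # back substitution
        m[i] = (rhs[i] - (sup[i] * m[i + 1] if i < n else 0.0)) / diag[i]

    def spline(x):
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)  # interval [xs[i], xs[i+1]]
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return spline


def simulate_ar1(n, phi, sd, rng, burn_in=100):
    """Zero-mean AR(1): x_t = phi * x_{t-1} + e_t with e_t ~ N(0, sd^2)."""
    x, out = 0.0, []
    for _ in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, sd)
        out.append(x)
    return out[burn_in:]


def spline_mad(phi, sd, position, n=100, reps=100, seed=1):
    """MAD and SD of |spline estimate - true value| for one missing value at
    `position` (1-based), over `reps` simulated series."""
    rng = random.Random(seed)
    t = position - 1
    errors = []
    for _ in range(reps):
        series = simulate_ar1(n, phi, sd, rng)      # step 1: generate the data
        xs = [i for i in range(n) if i != t]       # step 2: hold the value out
        ys = [series[i] for i in xs]
        estimate = natural_cubic_spline(xs, ys)(t)  # step 3: spline estimate
        errors.append(abs(estimate - series[t]))
    return statistics.mean(errors), statistics.stdev(errors)  # after step 4


for phi in (0.8, -0.8):
    print(phi, spline_mad(phi, 0.04, 49))
```

The random draws and the spline's end conditions differ from the thesis set-up, so the numbers will not match the tables exactly. The sketch only illustrates the procedure.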

For the AR1 case with the missing value at position 49, 7 or 91, the following results were obtained.

Table 6.3.1 AR1 Missing at position 49, S.D. 0.04.


No.  Case  Phi    S.D.  MAD       SD of MAD
1    AR1   0.2    0.04  0.040865  0.031593
2    AR1   0.4    0.04  0.035252  0.027938
3    AR1   0.6    0.04  0.031162  0.024841
4    AR1   0.8    0.04  0.027783  0.022266
5    AR1   -0.2   0.04  0.056752  0.043447
6    AR1   -0.4   0.04  0.070114  0.053983
7    AR1   -0.6   0.04  0.092945  0.068056
8    AR1   -0.8   0.04  0.148629  0.104152

Figure 6.13 Spline-AR1 Missing 49 S.D. 0.04 (MAD plotted against Phi).

Table 6.3.2 AR1 Missing at position 49, S.D. 0.4.

No.  Case  Phi    S.D.  MAD       SD of MAD
1    AR1   0.2    0.4   0.40977   0.317756
2    AR1   0.4    0.4   0.353888  0.281916
3    AR1   0.6    0.4   0.310098  0.251666
4    AR1   0.8    0.4   0.279394  0.22515
5    AR1   -0.2   0.4   0.58708   0.434717
6    AR1   -0.4   0.4   0.731758  0.540099
7    AR1   -0.6   0.4   0.9714    0.706594
8    AR1   -0.8   0.4   1.497991  1.075714

Figure 6.14 Spline-AR1 Missing 49 S.D. 0.4 (MAD plotted against Phi).

Table 6.3.3 AR1 Missing at position 7, S.D. 0.04.

No.  Case  Phi    S.D.  MAD       SD of MAD
1    AR1   0.2    0.04  0.036933  0.029247
2    AR1   0.4    0.04  0.032707  0.026057
3    AR1   0.6    0.04  0.028611  0.023351
4    AR1   0.8    0.04  0.02631   0.021008
5    AR1   -0.2   0.04  0.053964  0.042607
6    AR1   -0.4   0.04  0.069433  0.054567
7    AR1   -0.6   0.04  0.09659   0.075585
8    AR1   -0.8   0.04  0.166033  0.122178

Figure 6.15 Spline-AR1 Missing 7 S.D. 0.04 (MAD plotted against Phi).

Table 6.3.4 AR1 Missing at position 7, S.D. 0.4.

No.  Case  Phi    S.D.  MAD       SD of MAD
1    AR1   0.2    0.4   0.382207  0.297965
2    AR1   0.4    0.4   0.33237   0.261229
3    AR1   0.6    0.4   0.297167  0.234628
4    AR1   0.8    0.4   0.264613  0.206865
5    AR1   -0.2   0.4   0.532985  0.41526
6    AR1   -0.4   0.4   0.669776  0.526743
7    AR1   -0.6   0.4   0.920359  0.72088
8    AR1   -0.8   0.4   1.547551  1.137891

Figure 6.16 Spline-AR1 Missing 7 S.D. 0.4 (MAD plotted against Phi).

Table 6.3.5 AR1 Missing at position 91, S.D. 0.04.

No.  Case  Phi    S.D.  MAD       SD of MAD
1    AR1   0.2    0.04  0.046382  0.033514
2    AR1   0.4    0.04  0.040234  0.029274
3    AR1   0.6    0.04  0.03546   0.025787
4    AR1   0.8    0.04  0.031523  0.023014
5    AR1   -0.2   0.04  0.064524  0.047818
6    AR1   -0.4   0.04  0.081325  0.059358
7    AR1   -0.6   0.04  0.110424  0.078182
8    AR1   -0.8   0.04  0.158685  0.117034

Figure 6.17 Spline-AR1 Missing 91 S.D. 0.04 (MAD plotted against Phi).

Table 6.3.6 AR1 Missing at position 91, S.D. 0.4.

No.  Case  Phi    S.D.  MAD       SD of MAD
1    AR1   0.2    0.4   0.454964  0.335259
2    AR1   0.4    0.4   0.391341  0.294215
3    AR1   0.6    0.4   0.343586  0.259942
4    AR1   0.8    0.4   0.310549  0.229897
5    AR1   -0.2   0.4   0.655446  0.470317
6    AR1   -0.4   0.4   0.813471  0.594619
7    AR1   -0.6   0.4   1.075999  0.788663
8    AR1   -0.8   0.4   1.574595  1.190278

Figure 6.18 Spline-AR1 Missing 91 S.D. 0.4 (MAD plotted against Phi).

The graphs in Figures 6.13-6.18 show the MAD obtained with the ‘interpolating cubic spline smooth curve’ method. Regardless of the position of the missing value (7, 49 or 91), this method produces similar MAD results. For negative Phi values the MAD and the SD of the MAD are significantly larger. Overall, as the Phi value changes from negative to positive, the MAD of the cubic spline method decreases roughly exponentially. This method is therefore best suited to AR1 time series with positive Phi values. In all the results, the mean absolute deviation is directly proportional to the standard deviation of the random data used to generate the series.

For the simulated MA1 data, the following results were obtained for the missing value at positions 49, 7 and 91.
Table 6.3.7 MA1 Missing at position 49, S.D. 0.04.
No.  Case  Theta  S.D.  MAD       SD of MAD
1    MA1   -2.5   0.04  0.163606  0.119362
2    MA1   -2     0.04  0.139247  0.101447
3    MA1   -1.5   0.04  0.114831  0.084722
4    MA1   -1     0.04  0.090936  0.068575
5    MA1   -0.5   0.04  0.068185  0.05218
6    MA1   0      0.04  0.047873  0.036378
7    MA1   0.5    0.04  0.029624  0.025675
8    MA1   1      0.04  0.02709   0.02149
9    MA1   1.5    0.04  0.041176  0.030189
10   MA1   2      0.04  0.063375  0.043824
11   MA1   2.5    0.04  0.086267  0.059502

Figure 6.19 Spline-MA1 Missing 49 S.D. 0.04 (MAD plotted against Theta).

Table 6.3.8 MA1 Missing at position 49, S.D. 0.4.
No.  Case  Theta  S.D.  MAD       SD of MAD
1    MA1   -2     0.4   1.432791  1.013634
2    MA1   -1.5   0.4   1.187897  0.845383
3    MA1   -1     0.4   0.946709  0.680004
4    MA1   -0.5   0.4   0.713525  0.516502
5    MA1   0      0.4   0.498291  0.359232
6    MA1   0.5    0.4   0.295619  0.25522
7    MA1   1      0.4   0.263776  0.209129
8    MA1   1.5    0.4   0.397509  0.294549
9    MA1   2      0.4   0.622893  0.448908

Figure 6.20 Spline-MA1 Missing 49 S.D. 0.4 (MAD plotted against Theta).

Table 6.3.9 MA1 Missing at position 7, S.D. 0.04.


No.  Case  Theta  S.D.  MAD       SD of MAD
1    MA1   -2.5   0.04  0.151776  0.114481
2    MA1   -2     0.04  0.129522  0.098249
3    MA1   -1.5   0.04  0.106392  0.082194
4    MA1   -1     0.04  0.086408  0.067235
5    MA1   -0.5   0.04  0.065325  0.051
6    MA1   0      0.04  0.045117  0.035172
7    MA1   0.5    0.04  0.028603  0.022762
8    MA1   1      0.04  0.027754  0.019848
9    MA1   1.5    0.04  0.042571  0.029776
10   MA1   2      0.04  0.062705  0.044422
11   MA1   2.5    0.04  0.085324  0.060348

Figure 6.21 Spline-MA1 Missing 7 S.D. 0.04 (MAD plotted against Theta).

Table 6.3.10 MA1 Missing at position 7, S.D. 0.4.
No.  Case  Theta  S.D.  MAD       SD of MAD
1    MA1   -2     0.4   1.347883  0.994823
2    MA1   -1.5   0.4   1.09674   0.813553
3    MA1   -1     0.4   0.860287  0.657785
4    MA1   -0.5   0.4   0.642886  0.498423
5    MA1   0      0.4   0.441653  0.341495
6    MA1   0.5    0.4   0.282628  0.22445
7    MA1   1      0.4   0.264801  0.197712
8    MA1   1.5    0.4   0.384928  0.301607
9    MA1   2      0.4   0.566299  0.4418

Figure 6.22 Spline-MA1 Missing 7 S.D. 0.4 (MAD plotted against Theta).

Table 6.3.11 MA1 Missing at position 91, S.D. 0.04.


No.  Case  Theta  S.D.  MAD       SD of MAD
1    MA1   -2.5   0.04  0.186318  0.134582
2    MA1   -2     0.04  0.158123  0.114928
3    MA1   -1.5   0.04  0.132833  0.095227
4    MA1   -1     0.04  0.104645  0.076989
5    MA1   -0.5   0.04  0.077879  0.057619
6    MA1   0      0.04  0.053612  0.039298
7    MA1   0.5    0.04  0.032274  0.024511
8    MA1   1      0.04  0.02616   0.020161
9    MA1   1.5    0.04  0.043808  0.031074
10   MA1   2      0.04  0.065944  0.050103
11   MA1   2.5    0.04  0.089776  0.070467

Figure 6.23 Spline-MA1 Missing 91 S.D. 0.04 (MAD plotted against Theta).

Table 6.3.12 MA1 Missing at position 91, S.D. 0.4.
No.  Case  Theta  S.D.  MAD       SD of MAD
1    MA1   -2     0.4   1.572588  1.152809
2    MA1   -1.5   0.4   1.305746  0.958832
3    MA1   -1     0.4   1.041798  0.766421
4    MA1   -0.5   0.4   0.786114  0.571236
5    MA1   0      0.4   0.538136  0.389602
6    MA1   0.5    0.4   0.322228  0.241346
7    MA1   1      0.4   0.274297  0.207247
8    MA1   1.5    0.4   0.439719  0.316477
9    MA1   2      0.4   0.661222  0.500881

Figure 6.24 Spline-MA1 Missing 91 S.D. 0.4 (MAD plotted against Theta).

The graphs above show the MAD achieved with the ‘interpolating cubic spline smooth curve’ method. Regardless of the position of the missing value (7, 49 or 91), this method produces a similar pattern of results: the MAD decreases as Theta approaches 1 and then increases, so the minimum MAD for MA1 occurs at Theta = 1. For Theta values less than 1, the MAD decreases approximately linearly as Theta increases. This method is therefore best suited to MA1 time series with positive Theta values. As before, the SD of the MAD is smallest at the minimum MAD, which corresponds to Theta = 1.
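One way to see why the spline does best near Theta = 1 is through the lag-1 autocorrelation, which measures how closely each value follows its neighbours. The sketch assumes the simulated model is x_t = e_t + θe_{t-1}; under the Box-Jenkins form x_t = e_t - θe_{t-1} the sign of ρ₁ flips. With that assumption, ρ₁ = θ/(1 + θ²), which peaks at θ = 1. The function name is illustrative.

```python
def ma1_lag1_autocorr(theta):
    """Lag-1 autocorrelation of x_t = e_t + theta * e_{t-1} (assumed sign convention)."""
    return theta / (1.0 + theta * theta)


thetas = [-2.5, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
for theta in thetas:
    print(f"theta = {theta:+.1f}: rho_1 = {ma1_lag1_autocorr(theta):+.3f}")

best = max(thetas, key=ma1_lag1_autocorr)
print("largest rho_1 at theta =", best)  # theta = 1.0, where rho_1 = 0.5
```

Because ρ₁(θ) = ρ₁(1/θ), Theta = 0.5 and Theta = 2 have the same correlation. The larger MAD at Theta = 2 is consistent with that model's larger series variance, (1 + θ²)σ².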

In all the results, the mean absolute deviation is directly proportional to the standard deviation of the random data generated. Increasing the standard deviation of the random data by a factor of 10 also increases the mean absolute deviation and its standard deviation by approximately a factor of 10. The results are therefore discussed in terms of the Phi and Theta values rather than the standard deviation of the time series data.
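The proportionality follows from linearity. Every simulated series is a linear function of its noise, and the polynomial and spline estimates are linear in the data. Fitted ARIMA coefficients also do not change when the data are rescaled. So scaling the noise SD by 10 scales every estimation error by 10. If the two SD settings used different random draws, the observed ratio is only approximately 10. The sketch below reuses the same draws and shows the ratio is then exactly 10, up to rounding. It uses a hypothetical neighbour-average estimator rather than one of the methods above; the function names are illustrative.

```python
import random
import statistics


def ar1_series(n, phi, sd, seed):
    """AR(1) series built from one fixed stream of standard-normal draws scaled by sd."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + sd * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def neighbour_average_mad(phi, sd, index, reps=100):
    """MAD and SD of |estimate - true| when x[index] is estimated by the mean
    of its two neighbours (a hypothetical linear estimator)."""
    errors = []
    for rep in range(reps):
        x = ar1_series(100, phi, sd, seed=rep)  # same draws for every sd
        estimate = 0.5 * (x[index - 1] + x[index + 1])
        errors.append(abs(estimate - x[index]))
    return statistics.mean(errors), statistics.stdev(errors)


small = neighbour_average_mad(0.6, 0.04, 48)  # position 49, SD = 0.04
large = neighbour_average_mad(0.6, 0.4, 48)   # position 49, SD = 0.4
print(large[0] / small[0], large[1] / small[1])  # both 10 up to rounding error
```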

6.4 ARIMA Interpolation

In general, time series can be represented by stochastic models. This means that future values are only partly determined by past values. Exact predictions are therefore impossible and must be replaced by the idea that future values have a probability distribution conditioned on knowledge of the past. In the time series approach, we therefore first have to determine the model for the data series.

In order to compare the effectiveness of each method, I applied the ARIMA

interpolation to the same data sets from previous simulations.

We conducted the simulation as follows:

1. Import the data sets for specific time series models.

2. Remove a specific value from the data set and store it in another cell for comparison.

3. Apply the Minitab macro for ARIMA interpolation.

4. Repeat the process one hundred times.

5. Calculate the mean absolute deviation and standard deviation for comparison.

Each case looks at a particular time series model (AR or MA) with a particular parameter value (Phi or Theta), which is used to generate the different simulations. To do this, a macro for the computer package Minitab automatically tests missing values at positions 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84 and 91. After one hundred repetitions, the macro produces a table summarising the mean absolute deviation (MAD) and the standard deviation (SD) of the residuals, and the results are then graphed. In each case we compare forward forecasting, backward forecasting and the combined interpolation method to determine which produces the most accurate results. These results are detailed in Appendix B.
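The Minitab macro itself is not reproduced here. The sketch below assumes a zero-mean AR1 with Phi known rather than fitted, which gives each of the three estimates a closed form. The forward forecast uses x_{t-1}. The backward forecast applies the same model to the reversed series, since a stationary Gaussian AR1 has the same correlation structure in reverse. The "combined" estimate is the standard conditional-mean interpolator for an AR1; the macro's own way of combining the two forecasts may differ. The function names are hypothetical.

```python
import random
import statistics


def simulate_ar1(n, phi, sd, rng, burn_in=50):
    """Zero-mean AR(1): x_t = phi * x_{t-1} + e_t with e_t ~ N(0, sd^2)."""
    x, out = 0.0, []
    for _ in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, sd)
        out.append(x)
    return out[burn_in:]


def three_estimates(x, t, phi):
    """Forward, backward and combined estimates of x[t] for a known phi."""
    forward = phi * x[t - 1]   # one-step forecast from the past
    backward = phi * x[t + 1]  # one-step forecast of the reversed series
    combined = phi * (x[t - 1] + x[t + 1]) / (1.0 + phi * phi)  # conditional mean
    return {"forward": forward, "backward": backward, "combined": combined}


def compare(phi, sd=0.04, position=49, reps=1000, seed=0):
    """Monte Carlo MAD of each estimate for one missing value at `position` (1-based)."""
    rng = random.Random(seed)
    t = position - 1
    errors = {"forward": [], "backward": [], "combined": []}
    for _ in range(reps):
        x = simulate_ar1(100, phi, sd, rng)
        for name, estimate in three_estimates(x, t, phi).items():
            errors[name].append(abs(estimate - x[t]))
    return {name: statistics.mean(e) for name, e in errors.items()}


for phi in (-0.8, -0.2, 0.2, 0.8):
    print(phi, compare(phi))
```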

As before, the mean absolute deviation is directly proportional to the standard deviation of the random data generated. Increasing the standard deviation of the random data by a factor of 10 increases the mean absolute deviation and its standard deviation by approximately a factor of 10. These results are therefore analysed in terms of the Phi and Theta values rather than the standard deviation of the time series data.

Our results also show how the three methods behave for an AR1 time series with a positive Phi value of at most 0.2. When the missing value is in the first half of the data set, the backward forecast produces a smaller MAD, and its MAD increases as the missing value moves towards the end of the data set. The simulations also show that the forward forecast produces more accurate results when the missing value appears in the second half of the data set. The combination of the backward and forward forecasts (i.e. the combined interpolation method) gave a MAD similar to that of forward forecasting when the missing value appears in the first half of the data set, and similar to that of backward forecasting in the second half.

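The following tables summarise the results for the missing value at positions 7, 49 and 91 for the AR(1) process. MAD R, MAD RF and MAD RB denote the MAD of the combined interpolation, the forward forecast and the backward forecast respectively.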

Table 6.4.1 AR1 Missing at position 49, S.D. 0.04.


Phi    MAD R      MAD RF     MAD RB
-0.8   0.023533   0.0322425  0.0323951
-0.6   0.0262682  0.0321396  0.0312858
-0.4   0.0291798  0.0325068  0.0306684
-0.2   0.0322498  0.0329494  0.0317469
0.2    0.0331776  0.0333826  0.0324672
0.4    0.0315432  0.0337797  0.0332746
0.6    0.0292484  0.0337051  0.0334359
0.8    0.0259562  0.0333102  0.0325889

Figure 6.25 Interpolation - AR1 Missing 49 S.D. 0.04 (MAD R, MAD RF and MAD RB plotted against Phi).

Table 6.4.2 AR1 Missing at position 49, S.D. 0.4.


Phi    MAD R      MAD RF     MAD RB
-0.8   0.237493   0.317936   0.335667
-0.6   0.264599   0.321298   0.322214
-0.4   0.293285   0.322981   0.31651
-0.2   0.31632    0.323948   0.316303
0.2    0.333057   0.330219   0.325874
0.4    0.316576   0.333048   0.330335
0.6    0.29102    0.334398   0.330437
0.8    0.258301   0.332153   0.323784

Figure 6.26 Interpolation - AR1 Missing 49 S.D. 0.4 (MAD R, MAD RF and MAD RB plotted against Phi).

Table 6.4.3 AR1 Missing at position 7, S.D. 0.04.
Phi    MAD R      MAD RF     MAD RB
-0.8   0.0273437  0.0378197  0.030672
-0.6   0.03215    0.0401809  0.0299092
-0.4   0.0367039  0.0417334  0.0299332
-0.2   0.0393637  0.0406404  0.0304429
0.2    0.0383548  0.0394706  0.0349698
0.4    0.0354171  0.0391116  0.0370822
0.6    0.0308411  0.0372982  0.0377745
0.8    0.0269882  0.0336106  0.0362041

Figure 6.27 Interpolation - AR1 Missing 7 S.D. 0.04 (MAD R, MAD RF and MAD RB plotted against Phi).

Table 6.4.4 AR1 Missing at position 7, S.D. 0.4.


Phi    MAD R      MAD RF     MAD RB
-0.8   0.269678   0.36523    0.304596
-0.6   0.318492   0.390068   0.301754
-0.4   0.353347   0.398411   0.295805
-0.2   0.376851   0.38709    0.302846
0.2    0.38139    0.392677   0.354507
0.4    0.347751   0.390727   0.364199
0.6    0.307943   0.37496    0.371735
0.8    0.259304   0.333315   0.355098

Figure 6.28 Interpolation - AR1 Missing 7 S.D. 0.4 (MAD R, MAD RF and MAD RB plotted against Phi).

Table 6.4.5 AR1 Missing at position 91, S.D. 0.04.


Phi     MAD R       MAD RF      MAD RB
-0.8    0.0261095   0.0324963   0.0352997
-0.6    0.0286188   0.0330489   0.0364775
-0.4    0.0315949   0.0331553   0.0372271
-0.2    0.0328277   0.0325427   0.0345711
0.2     0.0337177   0.0338195   0.0338952
0.4     0.0308536   0.0331826   0.0320738
0.6     0.0283289   0.0336288   0.0304446
0.8     0.0266075   0.0333797   0.0304982

Figure 6.29 Interpolation - AR1 Missing 91 S.D. 0.04 (plot of MAD R, MAD RF and MAD RB against Phi).

Table 6.4.6 AR1 Missing at position 91, S.D. 0.4.

Phi     MAD R       MAD RF      MAD RB
-0.8    0.255776    0.325213    0.349287
-0.6    0.276734    0.325343    0.350971
-0.4    0.3064      0.321163    0.356815
-0.2    0.333653    0.324851    0.344324
0.2     0.32465     0.326182    0.32928
0.4     0.296121    0.325865    0.313798
0.6     0.277903    0.321006    0.31019
0.8     0.264841    0.324714    0.312554

Figure 6.30 Interpolation - AR1 Missing 91 S.D. 0.4 (plot of MAD R, MAD RF and MAD RB against Phi).

For the AR(1) model with various Phi values, the tables and graphs show that for Phi values greater than about 0.4 in magnitude, the combined interpolation method produces a lower MAD than the two forecasting methods and consequently a more accurate result.

The forward and backward forecasting methods produced similar MAD values; however, as the Phi values approach negative one, the combined interpolation method achieved lower MAD values.

Whether the Phi values are positive or negative, the results indicate that the MAD follows the same pattern: positive Phi values of 0.2 through to 0.8 produce results similar to those for negative Phi values of -0.2 through to -0.8. The MAD value and standard deviation of the combined interpolation method decrease as the Phi value approaches positive or negative one, which indicates that there is less variation in the estimation of missing values for this method. In other words, for missing values in all positions, the combined interpolation method will provide a more accurate result when the magnitude of Phi is greater than approximately 0.2. In addition, when we compare the MAD of the combined interpolation method with that of the forward or backward forecasting methods, the difference between the MAD values increases as the Phi value approaches positive or negative one.
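The three approaches compared above can be sketched for a zero-mean AR(1) series x_t = Phi x_{t-1} + e_t under simplifying assumptions: Phi is treated as known rather than estimated, the forward estimate of a missing x_t is the one-step forecast Phi x_{t-1}, the backward estimate is the one-step backcast Phi x_{t+1} (a stationary AR(1) has the same autocorrelations in both time directions), and the combined estimate is taken to be the simple average of the two. The averaging rule, the series length of 100, the replication count and the function names are illustrative assumptions, not the Minitab procedure used in this thesis; index 48 corresponds to position 49.

```python
import random


def simulate_ar1(phi, n, sd, rng):
    """Simulate a zero-mean stationary AR(1) series x_t = phi * x_{t-1} + e_t."""
    # Start from the stationary distribution, whose variance is sd^2 / (1 - phi^2).
    x = [rng.gauss(0.0, sd / (1.0 - phi * phi) ** 0.5)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sd))
    return x


def estimates(x, t, phi):
    """Forward, backward and combined (simple average) estimates of x[t]."""
    forward = phi * x[t - 1]   # one-step forecast from the preceding value
    backward = phi * x[t + 1]  # one-step backcast from the following value
    combined = 0.5 * (forward + backward)  # assumed combination rule
    return forward, backward, combined


def mad_by_method(phi, t=48, n=100, sd=0.04, reps=1000, seed=1):
    """MAD between each estimate and the true value over `reps` simulated series."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(reps):
        x = simulate_ar1(phi, n, sd, rng)
        for i, est in enumerate(estimates(x, t, phi)):
            totals[i] += abs(est - x[t])
    names = ("forward", "backward", "combined")
    return {name: total / reps for name, total in zip(names, totals)}


if __name__ == "__main__":
    for phi in (-0.8, -0.2, 0.2, 0.8):
        mad = mad_by_method(phi)
        print(phi, {name: round(value, 4) for name, value in mad.items()})
```

Under these assumptions the forward and backward errors each have variance sd^2, while the averaged estimate has error variance (1 - Phi^2/2) sd^2, so its advantage grows as |Phi| approaches one. For a known Phi, the conditional-mean interpolator Phi (x_{t-1} + x_{t+1}) / (1 + Phi^2), with error variance sd^2 / (1 + Phi^2), would do slightly better still.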

The following tables summarise the results for missing data at positions 7, 49 and 91 for the MA(1) process.

Table 6.4.7 MA1 Missing at position 49, S.D. 0.04.


Theta   MAD R       MAD RF      MAD RB
-2.5    0.076197    0.081617    0.079699
-2      0.057751    0.066374    0.064237
-1.5    0.036482    0.050362    0.048376
-1      0.007523    0.033378    0.032155
-0.5    0.030472    0.034035    0.030166
0       0.033308    0.033066    0.032012
0.5     0.029565    0.032445    0.033615
1       0.008293    0.033518    0.031846
1.5     0.036514    0.049626    0.048187
2       0.055769    0.062781    0.065359
2.5     0.074283    0.078819    0.081196

Figure 6.31 Interpolation - MA1 Missing 49 S.D. 0.04 (plot of MAD R, MAD RF and MAD RB against Theta).

Table 6.4.8 MA1 Missing at position 49, S.D. 0.4.

Theta   MAD R       MAD RF      MAD RB
-2      0.592949    0.694544    0.646697
-1.5    0.379441    0.532821    0.486054
-1      0.074205    0.331129    0.326705
-0.5    0.293773    0.335526    0.300666
0       0.335296    0.330553    0.324336
0.5     0.298836    0.32456     0.339511
1       0.081639    0.330267    0.322867
1.5     0.363546    0.490106    0.486772
2       0.553836    0.61774     0.646211

Figure 6.32 Interpolation - MA1 Missing 49 S.D. 0.4 (plot of MAD R, MAD RF and MAD RB against Theta).

Table 6.4.9 MA1 Missing at position 7, S.D. 0.04.


Theta   MAD R       MAD RF      MAD RB
-2.5    0.093279    0.093005    0.079602
-2      0.070695    0.070766    0.064046
-1.5    0.045411    0.050762    0.04846
-1      0.02654     0.038548    0.03376
-0.5    0.032249    0.037254    0.028663
0       0.038905    0.038872    0.033066
0.5     0.035278    0.040064    0.038379
1       0.027176    0.051408    0.033683
1.5     0.048816    0.069313    0.047079
2       0.068489    0.087708    0.062467
2.5     0.085494    0.101122    0.077717

Figure 6.33 Interpolation - MA1 Missing 7 S.D. 0.04 (plot of MAD R, MAD RF and MAD RB against Theta).

Table 6.4.10 MA1 Missing at position 7, S.D. 0.4.

Theta   MAD R       MAD RF      MAD RB
-2      0.694264    0.703989    0.633501
-1.5    0.455329    0.506209    0.477089
-1      0.270955    0.384059    0.339572
-0.5    0.325232    0.368127    0.291597
0       0.377457    0.37886     0.324823
0.5     0.351274    0.389328    0.38042
1       0.27056     0.50292     0.342362
1.5     0.48309     0.673747    0.479479
2       0.69476     0.866275    0.635341

Figure 6.34 Interpolation - MA1 Missing 7 S.D. 0.4 (plot of MAD R, MAD RF and MAD RB against Theta).

Table 6.4.11 MA1 Missing at position 91, S.D. 0.04.


Theta   MAD R       MAD RF      MAD RB
-2.5    0.084683    0.085803    0.090963
-2      0.065363    0.070457    0.072767
-1.5    0.042868    0.055127    0.054559
-1      0.014676    0.034161    0.038218
-0.5    0.031162    0.032078    0.037918
0       0.035431    0.033098    0.035601
0.5     0.031088    0.033326    0.033442
1       0.01671     0.033931    0.037242
1.5     0.043292    0.047894    0.050749
2       0.062758    0.065344    0.068009
2.5     0.081887    0.082014    0.085603

Figure 6.35 Interpolation - MA1 Missing 91 S.D. 0.04 (plot of MAD R, MAD RF and MAD RB against Theta).

Table 6.4.12 MA1 Missing at position 91, S.D. 0.4.

Theta   MAD R       MAD RF      MAD RB
-2      0.652825    0.710873    0.712933
-1.5    0.415098    0.541671    0.522161
-1      0.146873    0.326743    0.37071
-0.5    0.310928    0.318738    0.37288
0       0.349438    0.31944     0.351491
0.5     0.309007    0.320209    0.338771
1       0.172205    0.343041    0.3803
1.5     0.428258    0.49429     0.504101
2       0.621556    0.661747    0.671736

Figure 6.36 Interpolation - MA1 Missing 91 S.D. 0.4 (plot of MAD R, MAD RF and MAD RB against Theta).

The results indicate that when the ARIMA interpolation methods are applied to the MA(1) model and the missing value appears in the middle of the data set, the combined interpolation method is preferable when the Theta value is greater than 0.5, and the backward forecasting method is preferable when the Theta value is less than 0.5. Where the missing value appears in the first half of the data set and the Theta value is between 0.5 and 2, the combined interpolation method produces a lower MAD than the two forecasting methods and consequently a more accurate result; otherwise, the backward forecasting method produces a smaller mean absolute deviation. Where the missing value appears in the second half of the data set and the Theta value is greater than 0.5, the combined interpolation method still performs better than the two forecasting methods; when the Theta value is less than 0.5, the forward forecasting method is preferable.

Whether the Theta values are positive or negative, the results follow the same pattern, i.e. they are almost symmetric: positive Theta values from 0 through to 2 produce results similar to those for negative Theta values from 0 through to -2.

For the combined interpolation method with positive Theta values, the MAD decreases until Theta equals one and increases thereafter. This is explained by the fact that the correlation between consecutive values is highest when Theta equals one. The graphs above show the same behaviour for negative Theta values, and it holds for each of the MA(1) cases examined.
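The explanation above can be checked directly. For an MA(1) process the lag-1 autocorrelation has magnitude |Theta| / (1 + Theta^2), whichever sign convention is used for Theta, and all autocorrelations beyond lag 1 are zero. The short sketch below (the function name is illustrative) evaluates this magnitude on the Theta grid used in the tables; it is symmetric in Theta and peaks at 0.5 when Theta is -1 or 1.

```python
def ma1_lag1_autocorrelation(theta):
    """Magnitude of the lag-1 autocorrelation of an MA(1) process.

    For x_t = e_t + theta * e_{t-1} (or e_t - theta * e_{t-1}) the lag-1
    autocorrelation is +/- theta / (1 + theta**2); only its size is used here.
    """
    return abs(theta) / (1.0 + theta * theta)


thetas = [-2.5, -2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2, 2.5]
for theta in thetas:
    print(f"Theta = {theta:5}: |rho_1| = {ma1_lag1_autocorrelation(theta):.3f}")

# The largest |rho_1| on this grid occurs at Theta = -1 and Theta = 1.
best = max(ma1_lag1_autocorrelation(t) for t in thetas)
print([t for t in thetas if ma1_lag1_autocorrelation(t) == best])  # [-1, 1]
```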

For the MA(1) model, when we compare the MAD of the combined interpolation method with that of the forward or backward forecasting methods, the difference between the MAD values is at its maximum when Theta equals one or negative one.

Note that where the missing value appears at the very beginning or end of the data, the combined interpolation method cannot be used, because forward and backward forecasts cannot both be obtained; only backward forecasting (for a value at the start) or forward forecasting (for a value at the end) is then available.

6.5 Applying Structural Time Series Analysis (STSA) to Box-Jenkins Models

In this section, we briefly examine the use of state space modelling in time series. It is a complex approach to analysing data, and we often have to rely on computer software to carry out the analysis efficiently and accurately. The advantage of this approach is that it can accommodate different time series models, such as the Box-Jenkins ARIMA and structural time series models. The approach emphasises the notion that a time series is made up of distinct components. Thus, we may assume that observations relate to the mean level of the process through an observation equation, whereas one or more state equations describe how the individual components change through time. In state space modelling, observations can be added one at a time, and the estimating equations are then updated to produce new estimates. During the analysis, a number of problems were encountered, particularly the limited availability of resources such as computer software: the only package available to us for structural time series analysis was STAMP. For this reason, we apply structural time series analysis to the ARIMA-simulated data to investigate state space modelling. Again, in order to compare the effectiveness of each method, we applied structural time series analysis to the same data sets as in the previous simulations.

We conducted the simulation as follows:

1. Select 5 data sets for each time series model from the previous simulation, with the standard deviation of the random variable equal to 0.04. The models chosen were AR models with Phi values -0.8, -0.2, 0.2 and 0.8, and MA models with Theta values -2, -1, 0, 1 and 2.
2. Remove a specific value from each data set to serve as the missing value.
3. Convert the data sets for each time series model into text files.
4. Import the data for each time series model into STAMP.
5. Apply structural time series analysis.
6. Repeat the process 5 times for each chosen model.
7. Calculate the mean absolute deviation.

When we apply structural time series analysis, the software has to be set up with the correct settings before the analysis can begin. The following settings were used:

1. Stochastic level
2. No slope
3. No seasonals
4. No cycle or autoregressive process
5. Irregular component with prespecified variance (standard deviation = 0.04)
6. No explanatory or intervention variables
7. Use all data points
8. Time domain estimation
9. The results generated by STAMP are not directly comparable with the other results in this thesis, so we exported them in text format and used a macro to import the data into EXCEL for further analysis.
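The settings above (stochastic level; no slope, seasonal or cycle; an irregular with prespecified standard deviation 0.04) correspond to the local level model y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t. The sketch below shows, under stated assumptions, how such a model can handle a single missing observation: the Kalman filter skips the update step at the missing time, and a fixed-interval (Rauch-Tung-Striebel) smoother then draws on the observations on both sides, so the smoothed level at the gap serves as the estimate. It is not STAMP's implementation. Both variances are assumed known here (STAMP would normally estimate the level variance from the data), and the diffuse-start approximation, the example data and the function name are illustrative assumptions.

```python
import math


def local_level_smoother(y, sigma_eps, sigma_eta, a0=0.0, p0=1e7):
    """Kalman filter plus fixed-interval (RTS) smoother for the local level model.

    y_t = mu_t + eps_t,       eps_t ~ N(0, sigma_eps^2)   (observation equation)
    mu_{t+1} = mu_t + eta_t,  eta_t ~ N(0, sigma_eta^2)   (state equation)

    Missing observations are given as None or NaN; the filter skips the update
    step there. Returns the smoothed level for every time point.
    """
    h, q = sigma_eps ** 2, sigma_eta ** 2
    n = len(y)
    a_pred, p_pred = [0.0] * n, [0.0] * n  # one-step-ahead predictions
    a_filt, p_filt = [0.0] * n, [0.0] * n  # filtered (updated) estimates
    a, p = a0, p0  # a large p0 approximates a diffuse initial level
    for t in range(n):
        a_pred[t], p_pred[t] = a, p
        obs = y[t]
        if obs is None or (isinstance(obs, float) and math.isnan(obs)):
            a_filt[t], p_filt[t] = a, p  # nothing observed: keep the prediction
        else:
            k = p / (p + h)  # Kalman gain
            a_filt[t] = a + k * (obs - a)
            p_filt[t] = (1.0 - k) * p
        a, p = a_filt[t], p_filt[t] + q  # random-walk step to time t + 1
    # Backward pass: revise each estimate using the observations that follow it.
    smooth = a_filt[:]
    for t in range(n - 2, -1, -1):
        j = p_filt[t] / p_pred[t + 1]
        smooth[t] = a_filt[t] + j * (smooth[t + 1] - a_pred[t + 1])
    return smooth


if __name__ == "__main__":
    # Illustrative data only: a level near 1.0 with the value at index 3 missing.
    y = [1.00, 1.02, 0.98, None, 1.01, 0.99, 1.03]
    level = local_level_smoother(y, sigma_eps=0.04, sigma_eta=0.01)
    print(round(level[3], 4))  # smoothed level used as the estimate of y[3]
```

Because the smoother uses information from both sides of the gap, it plays a role similar to the combined interpolation method, whereas the filtered value alone corresponds to a one-step-ahead forecast.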

The results of the analysis are as follows:

AR(1) time series model


Table 6.5.1 AR1 Missing at position 49, S.D. 0.04.

Phi     MAD
-0.8    0.157505
-0.2    0.018662
0.2     0.041506
0.8     0.018745

Figure 6.37 STSA - AR1 Missing 49 S.D. 0.04 (plot of MAD against Phi).

Table 6.5.2 AR1 Missing at position 7, S.D. 0.04.


Phi     MAD
-0.8    0.112873
-0.2    0.046072
0.2     0.031523
0.8     0.051045

Figure 6.38 STSA - AR1 Missing 7 S.D. 0.04 (plot of MAD against Phi).

Table 6.5.3 AR1 Missing at position 91, S.D. 0.04.


Phi     MAD
-0.8    0.072952
-0.2    0.039776
0.2     0.055622
0.8     0.039186

Figure 6.39 STSA - AR1 Missing 91 S.D. 0.04 (plot of MAD against Phi).

As we only have 5 data sets for each model, there is considerable variation within the residuals and it is difficult to identify specific patterns. However, at each of the three positions the structural time series model produced its largest MAD when Phi is -0.8. This suggests that the structural time series model is not an appropriate state space model for fitting an AR(1) data series when Phi is a large negative value. In addition, the MAD for the missing value at position 49 is lower than the MAD for missing values at positions 7 and 91 when Phi is -0.2 or 0.8.

MA(1) time series model


Table 6.5.4 MA1 Missing at position 49, S.D. 0.04.

Theta   MAD
-2      0.071275
-1      0.040845
0       0.0127
1       0.052433
2       0.068496

Figure 6.40 STSA - MA1 Missing 49 S.D. 0.04 (plot of MAD against Theta).

Table 6.5.5 MA1 Missing at position 7, S.D. 0.04.


Theta   MAD
-2      0.123219
-1      0.065875
0       0.042888
1       0.041206
2       0.085761

Figure 6.41 STSA - MA1 Missing 7 S.D. 0.04 (plot of MAD against Theta).

Table 6.5.6 MA1 Missing at position 91, S.D. 0.04.


Theta   MAD
-2      0.110861
-1      0.036654
0       0.037778
1       0.033753
2       0.07613

Figure 6.42 STSA - MA1 Missing 91 S.D. 0.04 (plot of MAD against Theta).

The graphs above show that the structural time series model produced its largest MAD values when Theta takes large positive or negative values. This suggests that the structural time series model is not an appropriate state space model for fitting an MA(1) data series when Theta is a large positive or negative value. In addition, when Theta is zero, the MAD for the missing value at position 49 (0.0127) is considerably lower than any other MAD in Tables 6.5.4 to 6.5.6.

CHAPTER 7

Conclusion

7.1 About this chapter

The purpose of this research is to investigate various modelling approaches and to determine the most appropriate technique for modelling time series data with missing values. We have examined each of the methods individually and gained some insight into their appropriateness for estimating missing values. In this chapter, we compare the MAD values from the different methods for the same simulated data sets.

7.2 Effectiveness of various approaches

The following tables compare the MAD between the estimated and actual values for all the non-state space modelling methods discussed in this thesis. We also graphed the values and identified the features of each method. The results are then analysed by commenting on the Phi and Theta values for each ARIMA model.
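For reference, the MAD reported in these tables is the mean of the absolute differences between the estimated and actual missing values. Writing x_k for the value removed from the k-th simulated data set, \hat{x}_k for its estimate and N for the number of data sets, the quantity being compared is:

```latex
\mathrm{MAD} = \frac{1}{N}\sum_{k=1}^{N}\left|\hat{x}_k - x_k\right|
```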

Case 1: AR Data

Table 7.2.1 AR1 Missing at position 49, S.D. 0.04.

Phi     Poly        Spline      MAD R       MAD RF      MAD RB
-0.8    0.050942    0.148629    0.023533    0.0322425   0.0323951
-0.6    0.038281    0.092945    0.0262682   0.0321396   0.0312858
-0.4    0.034486    0.070114    0.0291798   0.0325068   0.0306684
-0.2    0.032931    0.056752    0.0322498   0.0329494   0.0317469
0.2     0.032494    0.040865    0.0331776   0.0333826   0.0324672
0.4     0.034098    0.035252    0.0315432   0.0337797   0.0332746
0.6     0.037837    0.031162    0.0292484   0.0337051   0.0334359
0.8     0.043769    0.027783    0.0259562   0.0333102   0.0325889

Figure 7.1 Various Methods - AR1 Missing 49 S.D. 0.04 (plot of MAD against Phi for each method).

Table 7.2.2 AR1 Missing at position 7, S.D. 0.04.

Phi     Poly        Spline      MAD R       MAD RF      MAD RB
-0.8    0.054966    0.166033    0.0273437   0.0378197   0.030672
-0.6    0.039208    0.09659     0.03215     0.0401809   0.0299092
-0.4    0.034053    0.069433    0.0367039   0.0417334   0.0299332
-0.2    0.031406    0.053964    0.0393637   0.0406404   0.0304429
0.2     0.035862    0.036933    0.0383548   0.0394706   0.0349698
0.4     0.03967     0.032707    0.0354171   0.0391116   0.0370822
0.6     0.041655    0.028611    0.0308411   0.0372982   0.0377745
0.8     0.043019    0.02631     0.0269882   0.0336106   0.0362041

Figure 7.2 Various Methods - AR1 Missing 7 S.D. 0.04 (plot of MAD against Phi for each method).

Table 7.2.3 AR1 Missing at position 91, S.D. 0.04.

Phi     Poly        Spline      MAD R       MAD RF      MAD RB
-0.8    0.06013     0.158685    0.0261095   0.0324963   0.0352997
-0.6    0.046854    0.110424    0.0286188   0.0330489   0.0364775
-0.4    0.039307    0.081325    0.0315949   0.0331553   0.0372271
-0.2    0.034802    0.064524    0.0328277   0.0325427   0.0345711
0.2     0.034568    0.046382    0.0337177   0.0338195   0.0338952
0.4     0.033622    0.040234    0.0308536   0.0331826   0.0320738
0.6     0.034753    0.03546     0.0283289   0.0336288   0.0304446
0.8     0.035638    0.031523    0.0266075   0.0333797   0.0304982

Figure 7.3 Various Methods - AR1 Missing 91 S.D. 0.04 (plot of MAD against Phi for each method).

When the Phi value is negative, the cubic spline produces the largest MAD of all the methods compared. When the Phi value is greater than approximately 0.5, the cubic spline produces results similar to those of the combined interpolation method. When the Phi value is close to 0, the graphs indicate that a low-degree polynomial provides acceptable estimates of the missing value at all three positions in the data set.
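A cubic spline estimate of a single missing value can be sketched as follows, assuming integer time points, a natural spline (zero second derivative at both end points) passed through all the observed values, and evaluation of the spline at the missing time. The thesis used a plug-in whose end conditions are not stated here, so the boundary choice, the stdlib-only tridiagonal solver and the function names are illustrative assumptions.

```python
def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing; the second derivative is zero at both ends.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives at the knots; m[0] = m[n-1] = 0
    if n > 2:
        # Tridiagonal system for the interior second derivatives m[1..n-2].
        sub = [h[i - 1] for i in range(1, n - 1)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        sup = [h[i] for i in range(1, n - 1)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n - 1)]
        k = len(diag)
        for i in range(1, k):  # Thomas algorithm: forward elimination
            w = sub[i] / diag[i - 1]
            diag[i] -= w * sup[i - 1]
            rhs[i] -= w * rhs[i - 1]
        sol = [0.0] * k
        sol[-1] = rhs[-1] / diag[-1]
        for i in range(k - 2, -1, -1):  # back substitution
            sol[i] = (rhs[i] - sup[i] * sol[i + 1]) / diag[i]
        m[1:n - 1] = sol

    def evaluate(x):
        i = 0  # find the interval [xs[i], xs[i+1]] containing x
        while i < n - 2 and x > xs[i + 1]:
            i += 1
        a, b, hi = xs[i + 1] - x, x - xs[i], h[i]
        return (m[i] * a ** 3 / (6 * hi) + m[i + 1] * b ** 3 / (6 * hi)
                + (ys[i] / hi - m[i] * hi / 6) * a
                + (ys[i + 1] / hi - m[i + 1] * hi / 6) * b)

    return evaluate


def spline_fill(series, missing):
    """Estimate series[missing] from all other points with a natural cubic spline."""
    xs = [t for t in range(len(series)) if t != missing]
    ys = [series[t] for t in xs]
    return natural_cubic_spline(xs, ys)(missing)


if __name__ == "__main__":
    series = [0.31, 0.35, 0.30, 0.28, None, 0.36, 0.33, 0.34]  # illustrative values
    print(round(spline_fill(series, 4), 4))
```

Because the spline passes exactly through every observed point, its estimate at the gap depends strongly on the neighbouring values, which is one reason its behaviour differs so much between positive and negative Phi.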

Case 2: MA data

Table 7.2.4 MA1 Missing at position 49, S.D. 0.04.

Theta   Poly        Spline      MAD R       MAD RF      MAD RB
-2.5    0.086399    0.163606    0.076197    0.081617    0.079699
-2      0.073048    0.139247    0.057751    0.066374    0.064237
-1.5    0.05864     0.114831    0.036482    0.050362    0.048376
-1      0.045712    0.090936    0.007523    0.033378    0.032155
-0.5    0.036391    0.068185    0.030472    0.034035    0.030166
0       0.031823    0.047873    0.033308    0.033066    0.032012
0.5     0.034744    0.029624    0.029565    0.032445    0.033615
1       0.042862    0.02709     0.008293    0.033518    0.031846
1.5     0.053767    0.041176    0.036514    0.049626    0.048187
2       0.06792     0.063375    0.055769    0.062781    0.065359
2.5     0.083524    0.086267    0.074283    0.078819    0.081196

Figure 7.4 Various Methods - MA1 Missing 49 S.D. 0.04 (plot of MAD against Theta for each method).

Table 7.2.5 MA1 Missing at position 7, S.D. 0.04.

Theta   Poly        Spline      MAD R       MAD RF      MAD RB
-2.5    0.081078    0.151776    0.093279    0.093005    0.079602
-2      0.066444    0.129522    0.070695    0.070766    0.064046
-1.5    0.054124    0.106392    0.045411    0.050762    0.04846
-1      0.043392    0.086408    0.02654     0.038548    0.03376
-0.5    0.034744    0.065325    0.032249    0.037254    0.028663
0       0.032394    0.045117    0.038905    0.038872    0.033066
0.5     0.043533    0.028603    0.035278    0.040064    0.038379
1       0.056781    0.027754    0.027176    0.051408    0.033683
1.5     0.07014     0.042571    0.048816    0.069313    0.047079
2       0.084502    0.062705    0.068489    0.087708    0.062467
2.5     0.099148    0.085324    0.085494    0.101122    0.077717

Figure 7.5 Various Methods - MA1 Missing 7 S.D. 0.04 (plot of MAD against Theta for each method).

Table 7.2.6 MA1 Missing at position 91, S.D. 0.04.

Theta   Poly        Spline      MAD R       MAD RF      MAD RB
-2.5    0.09355     0.186318    0.084683    0.085803    0.090963
-2      0.07737     0.158123    0.065363    0.070457    0.072767
-1.5    0.063922    0.132833    0.042868    0.055127    0.054559
-1      0.051664    0.104645    0.014676    0.034161    0.038218
-0.5    0.040358    0.077879    0.031162    0.032078    0.037918
0       0.03409     0.053612    0.035431    0.033098    0.035601
0.5     0.034356    0.032274    0.031088    0.033326    0.033442
1       0.043939    0.02616     0.01671     0.033931    0.037242
1.5     0.057413    0.043808    0.043292    0.047894    0.050749
2       0.072301    0.065944    0.062758    0.065344    0.068009
2.5     0.087813    0.089776    0.081887    0.082014    0.085603

Figure 7.6 Various Methods - MA1 Missing 91 S.D. 0.04 (plot of MAD against Theta for each method).

For the MA-generated data, the results above indicate that if there is sufficient data before the missing value, a low-degree polynomial provides an acceptable estimate when the Theta value is close to 0, and when the Theta value is greater than approximately 0.5 the combined interpolation method produces a smaller MAD. However, where there is insufficient data before the missing value, the results indicate that the cubic spline provides an estimate similar to that of the combined interpolation method.
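A low-degree polynomial estimate can be sketched in the same spirit. This minimal version assumes a least-squares polynomial fitted only to the observed points within half_width points on each side of the gap, with time centred on the missing index so that the fitted intercept is the estimate. The window, the default degree of 2 and the function name are illustrative assumptions, since the thesis fitted polynomials of different degrees with a plug-in whose settings are not given here. The normal equations are solved by Gaussian elimination with partial pivoting to avoid external dependencies. Near the start of a series the window becomes one-sided, which corresponds to the case of little data before the missing value.

```python
def polyfit_fill(series, missing, degree=2, half_width=5):
    """Estimate series[missing] with a least-squares polynomial fitted to nearby points.

    Up to half_width observed points on each side of the gap are used. Time is
    centred on the missing index (u = t - missing), so the fitted intercept is
    the estimate of the missing value.
    """
    lo = max(0, missing - half_width)
    hi = min(len(series), missing + half_width + 1)
    pts = [(t - missing, series[t]) for t in range(lo, hi) if t != missing]
    if len(pts) <= degree:
        raise ValueError("not enough observed points for the requested degree")
    p = degree + 1
    # Normal equations (X^T X) c = X^T y for the basis 1, u, u^2, ..., u^degree.
    a = [[float(sum(u ** (r + c) for u, _ in pts)) for c in range(p)] for r in range(p)]
    b = [float(sum(y * u ** r for u, y in pts)) for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            for c in range(col, p):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution for the coefficients c_0, ..., c_degree.
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, p))) / a[r][r]
    return coef[0]


if __name__ == "__main__":
    # Illustrative series with the value at index 6 treated as missing.
    series = [0.31, 0.35, 0.30, 0.28, 0.33, 0.36, None, 0.34, 0.29, 0.32, 0.35]
    print(round(polyfit_fill(series, 6, degree=2), 4))
```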

From the above, ARIMA interpolation appears to be the most suitable method for estimating the missing value for most of the ARIMA time series models. Where the missing value occurs in the second half of the series, interpolation combining the forward and backward forecasts seems to give the best estimate. The results also indicate that if the missing value occurs in the first half of the series, the backward forecast can also give a good estimate. In addition, for AR(1) series with Phi values between 0.5 and 1, a cubic spline provides a good estimate. For the MA(1) model, the results indicate that the cubic spline approach only produces good estimates when the Theta value is between 1 and 2, so its usefulness for this model is limited.

7.3 Further comparison between various approaches

For state space modelling, the version of STAMP used in this thesis is an earlier one: it can only work on one time series data set at a time and has limited capacity to handle data. As a result, it is difficult to make a fair comparison between state space modelling and all the other methods mentioned above.

Although a fully fair comparison is difficult, we would still like to give some idea of how the MAD values from the structural time series analysis approach compare with those of the other methods. To compare the MAD values across methods, we used the same five data sets for each of the previously considered methods as well as for state space modelling. The outline of this simulation is as follows:

1. Use the same 5 data sets for each time series model from the previous simulation, with the standard deviation of the random variable equal to 0.04. The models chosen are AR models with Phi values -0.8, -0.2, 0.2 and 0.8, and MA models with Theta values -2, -1, 0, 1 and 2.
2. Remove a specific value from each data set to serve as the missing value.
3. Apply the plug-in to obtain an appropriate cubic spline fit.
4. Apply the plug-in and try to fit the data set with polynomials of different degrees.
5. Apply the Minitab macro for ARIMA interpolation.
6. Repeat the process 5 times for each chosen model.
7. Calculate the mean absolute deviation and standard deviation for comparison.

After the simulation, we obtained the following results:

Table 7.3.1 AR1 Missing at position 49 S.D. 0.04.


Phi     STSA        Spline      Poly        MAD R       MAD RF      MAD RB
-0.8    0.157505    0.1738362   0.0589081   0.00118     0.0015097   0.0021143
-0.2    0.018662    0.0191134   0.0204267   0.001586    0.0016312   0.0010177
0.2     0.041506    0.0380957   0.0236466   0.0012385   0.0013061   0.0014167
0.8     0.018745    0.0126503   0.0444992   0.0008653   0.001395    0.0008081

Figure 7.7 STSA vs Various Methods - AR1 Missing 49 S.D. 0.04 (plot of MAD against Phi values for each method).

Table 7.3.2 AR1 Missing at position 7 S.D. 0.04.


Phi     STSA        Spline      Poly        MAD R       MAD RF      MAD RB
-0.8    0.112873    0.1820264   0.0519243   0.0013213   0.0010278   0.0012215
-0.2    0.046072    0.0668026   0.053112    0.0028584   0.0029775   0.0026071
0.2     0.031523    0.0366474   0.0217782   0.0014148   0.0015059   0.0016381
0.8     0.051045    0.0361998   0.0461244   0.0017806   0.0025806   0.0022063

Figure 7.8 STSA vs Various Methods - AR1 Missing 7 S.D. 0.04 (plot of MAD against Phi values for each method).

Table 7.3.3 AR1 Missing at position 91 S.D. 0.04.

Phi     STSA        Spline      Poly        MAD R       MAD RF      MAD RB
-0.8    0.072952    0.1458479   0.0461086   0.0009614   0.0019591   0.0014623
-0.2    0.039776    0.048814    0.0354803   0.0014845   0.0020288   0.0013059
0.2     0.055622    0.0446829   0.0472979   0.0017608   0.0024536   0.0015462
0.8     0.039186    0.0240632   0.0348626   0.001044    0.0020468   0.0014848

Figure 7.9 STSA vs Various Methods - AR1 Missing 91 S.D. 0.04 (plot of MAD against Phi values for each method).

Table 7.3.4 MA1 Missing at position 49 S.D. 0.04.


Theta   STSA        Spline      Poly        MAD R       MAD RF      MAD RB
-2      0.071275    0.0562299   0.0347884   0.0037656   0.0026683   0.0025398
-1      0.040845    0.0292791   0.011921    0.0003525   0.0012226   0.0012351
0       0.0127      0.0161288   0.0328579   0.0019812   0.0019933   0.0016595
1       0.052433    0.0376355   0.0385145   0.0006725   0.001671    0.0017761
2       0.068496    0.0665744   0.0384918   0.0033223   0.0036853   0.0031114

Figure 7.10 STSA vs Various Methods - MA1 Missing 49 S.D. 0.04 (plot of MAD against Theta values for each method).

Table 7.3.5 MA1 Missing at position 7 S.D. 0.04.

Theta   STSA        Spline      Poly        MAD R       MAD RF      MAD RB
-2      0.123219    0.1481649   0.0828996   0.0037578   0.0036544   0.0050339
-1      0.065875    0.1032381   0.0458992   0.0007621   0.0024738   0.001082
0       0.042888    0.0607298   0.0475895   0.0025278   0.0024398   0.0025226
1       0.041206    0.0300274   0.0237      0.0009791   0.0016975   0.0009865
2       0.085761    0.0968157   0.0760574   0.0044391   0.0047032   0.0045801

Figure 7.11 STSA vs Various Methods - MA1 Missing 7 S.D. 0.04 (plot of MAD against Theta values for each method).

Table 7.3.6 MA1 Missing at position 91 S.D. 0.04.


Theta   STSA        Spline      Poly        MAD R       MAD RF      MAD RB
-2      0.110861    0.1499234   0.0695965   0.0034857   0.0032713   0.003762
-1      0.036654    0.0613627   0.0354714   0.0011023   0.001935    0.0014565
0       0.037778    0.0365399   0.0250365   0.0013624   0.0016608   0.0014419
1       0.033753    0.0258709   0.0520241   0.0014818   0.0020299   0.0026278
2       0.07613     0.0582578   0.084886    0.0031374   0.0035039   0.0037823

Figure 7.12 STSA vs Various Methods - MA1 Missing 91 S.D. 0.04 (plot of MAD against Theta values for each method).

Since structural time series analysis is a type of state space modelling, it has the advantage of being adaptable to various types of time series models, including non-stationary ones. It uses one-step-ahead forecasting to predict future values, which makes it particularly popular for handling missing values. Nonetheless, it is not the most user-friendly method for an average person to use in estimating missing values: it is difficult to identify the correct time series model and to incorporate the data into the state space framework. In addition, there may be more appropriate state space models for obtaining the missing value, such as an ARIMA state space model. However, such a model is very difficult to identify and apply, the calculations require complex algorithms that are impractical to check by hand, and the availability of software to handle such complicated analysis is limited. Hence, state space modelling may not be a feasible solution in some cases.

From the simulations above, we compared the MAD values from each of the approaches in this thesis to find the smallest. We found that applying structural time series analysis to ARIMA-simulated data does not produce the best estimate of the missing value. This may be due to the inappropriateness of converting an ARIMA model into a structural time series model. In addition, the data sets were generated from ARIMA models, which biases the comparison towards methods suited to stationary models. Finally, we have to be aware that, owing to software limitations, only five data sets were used for the comparison. Although there was considerable variation within the residuals, there was sufficient evidence to suggest that alternative methods are more appropriate for the kind of data examined in this thesis.

7.4 Future research direction

In this thesis we have examined only a limited range of time series data with missing values. To gain a better understanding and to estimate missing values more accurately, further research needs to be undertaken. For instance, the analysis methods discussed could be applied to non-stationary time series data to explore their effectiveness. The research should also be broadened to other types of data, e.g. multivariate time series. Cases in which consecutive values are missing also call for a study of the efficiency and accuracy of the various techniques. Furthermore, the accuracy of each method needs to be examined on real-life time series data. Lastly, time series models other than ARIMA require further research using more advanced methods, such as appropriate state space models.

7.5 Conclusion

The aim of this research was to review various modelling approaches for time series data with missing values. In addition, we investigated the most appropriate modelling techniques for different missing data patterns.

We have examined both deterministic and stochastic approaches for time series with missing values. To compare their performance, simulated data sets for AR and MA processes were used with each method and MAD values were calculated. The objective was to minimize the MAD between the actual and estimated missing values.

Owing to the limitations of the software used in this thesis, the testing was separated into two parts. Firstly, we examined non-state space modelling approaches. We graphed and commented on the relationship between the MAD and the AR/MA parameters for the different ARIMA models. For AR(1) data, our simulations indicated that the cubic spline is not suitable for any time series with negative values of Phi. As the Phi value approaches one, the combined interpolation and cubic spline methods produce similar results. Likewise, for MA(1) data with negative values of Theta, the cubic spline is not suitable. In general, combined interpolation gives acceptable results where there is sufficient data available before the missing value. When the Theta value is close to 0, polynomial curve fitting gives the best estimate. However, where there is insufficient data before the missing value and the Theta value is greater than 1, the cubic spline and combined interpolation methods provide the most suitable approach.

In addition, we examined structural time series analysis using the computer software STAMP. Although this approach is well known for its adaptability, the calculations are extremely complex and it is hard to identify and apply the time series model correctly. In our simulations, applying structural time series analysis to ARIMA-simulated data did not give the best estimate of the missing value.

In conclusion, our research has indicated that for most of the ARIMA time series models, ARIMA interpolation appears to be the most suitable method for estimating a missing value where there is sufficient data to obtain a reliable ARIMA model. In most cases, this method gives the smallest MAD. In some cases, similar results can be obtained from numerical methods; for example, for data fitting an AR model with high positive correlation at lag 1, a cubic spline or polynomial approximation provides an alternative method for fitting the data. Similarly, where there is insufficient data to fit an ARIMA model, it would be more appropriate to use a simple numerical approach or a state space model to fit the data.

In this thesis, our findings are limited to AR(1) and MA(1) models with a single missing value, and a considerable amount of research remains to be undertaken. Our future research would extend to consecutive missing values and non-stationary time series models.

REFERENCES

Abraham, B. (1981). Missing Observations in Time Series. Communications in Statistics, Theory A, 10, 1643-1653.

Beveridge, S. (1992). Least Squares Estimation of Missing Values in Time Series. Communications in Statistics, Theory A, 21(12), 3479-3496.

Brockwell, P. J., & Davis, R. A. (1991). Time Series: Theory and Methods. New York, USA: Springer-Verlag.

Chatfield, C. (2003). The Analysis of Time Series: An Introduction (6th ed.). New York, USA: John Wiley and Sons.

Damsleth, E. (1979). Interpolating Missing Values in a Time Series. Scand. J. Statist., 7, 33-39.

Ferreiro, O. (1987). Methodologies for the Estimation of Missing Observations in Time Series. Statistics & Probability Letters, 5(1), 65-69.

Gardner, G., Harvey, A. C., & Phillips, G. D. A. (1980). An Algorithm for Exact Maximum Likelihood Estimation of Autoregressive-Moving Average Models by Means of Kalman Filtering. Applied Statistics, 29, 311-322.

Gerald, C. F., & Wheatley, P. O. (1994). Applied Numerical Analysis.

Gomez, I. A., Burton, D. M., & Love, H. A. (1995). Imputing Missing Natural Resource Inventory Data and the Bootstrap. Natural Resource Modeling, 9(4), 299-328.

Hamilton, J. D. (1994). Time Series Analysis. New Jersey, USA: Princeton University Press.

Harvey, A. C. (2001). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge, UK: Cambridge University Press.

Harvey, A. C., & Pierse, R. G. (1984). Estimating Missing Observations in Economic Time Series. Journal of the American Statistical Association, 79(385), 125-131.

Janacek, G., & Swift, L. (1993). Time Series Forecasting Simulation & Application. West Sussex, England: Ellis Horwood Limited.

Jones, R. H. (1980). Maximum Likelihood Fitting of ARMA Models to Time Series with Missing Observations. Technometrics, 22(3), 389-395.

Kalman, R. E. (1960). A New Approach to Linear Filtering and Prediction Problems. Journal of Basic Engineering, 82, 35-45.

Kohn, R., & Ansley, C. F. (1986). Estimation, Prediction, and Interpolation for ARIMA Models with Missing Data. Journal of the American Statistical Association, 81(395), 751-761.

Ljung, G. M. (1989). A Note on the Estimation of Missing Values in Time Series. Communications in Statistics, Simulation, 18(2), 459-465.

Luceño, A. (1997). Estimation of Missing Values in Possibly Partially Nonstationary Vector Time Series. Biometrika, 84(2), 495-499.

Nieto, F. H., & Martínez, J. (1996). A Recursive Approach for Estimating Missing Observations in An Univariate Time Series. Communications in Statistics, Theory A, 25(9), 2101-2116.

Peña, D., & Tiao, G. C. (1991). A Note on Likelihood Estimation of Missing Values in Time Series. The American Statistician, 45(3), 212-213.

Robinson, P. M. (1985). Testing for Serial Correlation in Regression with Missing Observations. Journal of the Royal Statistical Society B, 47, 429-437.

Robinson, P. M., & Dunsmuir, W. (1981). Estimation of Time Series Models in the Presence of Missing Data. Journal of the American Statistical Association, 76(375), 560-568.

Rosen, Y., & Porat, B. (1989). Optimal ARMA Parameter Estimation Based on the Sample Covariances for Data with Missing Observations. IEEE Transactions on Information Theory, 35(2), 342-349.

Rosen, Y., & Porat, B. (1989). The Second-Order Moments of the Sample Covariances for Time Series with Missing Observations. IEEE Transactions on Information Theory, 35(2), 334-341.

Shively, T. S. (1992). Testing for Autoregressive Disturbances in a Time Series Regression with Missing Observations. Journal of Econometrics, 57(1), 233-255.

Shumway, R. H., & Stoffer, D. S. (1982). An Approach to Time Series Smoothing and Forecasting Using the EM Algorithm. Journal of Time Series Analysis, 3(4), 253-264.

Tresp, V., & Hofmann, R. (1998). Nonlinear Time-Series Prediction with Missing and Noisy Data. Neural Computation, 10, 731-747.

Wincek, M. A., & Reinsel, G. C. (1986). An Exact Maximum Likelihood Estimation Procedure for Regression-ARMA Time Series Models with Possibly Nonconsecutive Data. Journal of the Royal Statistical Society B, 48(3), 303-313.

Appendix A
During this study, I used macros at different stages to analyze time series data and to examine the effectiveness of different analysis methods.

In this thesis, I have written macros for the Microsoft Excel spreadsheet to generate data sets. I also used Excel to evaluate the accuracy of various prediction methods, such as interpolation with a cubic spline and least squares approximations. The following macros must be used in conjunction with excel_curvefit_analysis_book v4.5 and the plugin from XlXtrFunDistribution.

Excel Macros :
**********************************************************************
Option Explicit
Option Compare Text

Sub Analysis()
'declare variables
Dim Message, Title
Dim Phi1 As Double
Dim Phi2 As Double
Dim Theta1 As Double
Dim Theta2 As Double
Dim respond As Integer
Dim indexNum As Integer
Dim var1 As Double
Dim numberInList As Integer
Dim chosenValue As String

'unload the form in case it was left in memory
Unload UserForm1
'add items to the list box
With UserForm1.ListBox1
.AddItem "AR 1"
.AddItem "AR 2"
.AddItem "MA 1"
.AddItem "MA 2"
.AddItem "ARIMA 011"
.AddItem "ARIMA 110"
.AddItem "ARIMA 101"
.AddItem "ARIMA 111"
.AddItem "Seasonal"
End With
'display the list until an item is selected or cancel pressed
Do
UserForm1.Show
'check value of bOK that was set by the buttons on the form
'(bOK must be declared as a module-level or Public Boolean, since Option Explicit is set)
If Not bOK Then Exit Sub
'if an item is selected, exit the loop
If UserForm1.ListBox1.ListIndex > -1 Then Exit Do
'if no item selected, display a message
MsgBox "No selection was made"
Loop
'store the values of the listbox for later use
indexNum = UserForm1.ListBox1.ListIndex
numberInList = indexNum + 1
chosenValue = UserForm1.ListBox1.Value
'unload the userform
Unload UserForm1
'display a message box of what was selected
MsgBox "List index number* of item picked: " & indexNum & Chr(13) & _
Chr(13) & _
"Number of item in the list: " & numberInList & Chr(13) & _
Chr(13) & _
"list text of item picked: " & chosenValue & Chr(13) & _
Chr(13) & _
Chr(13) & _
"* Please note that the index number of the first item in a list is 0, not 1"

Message = "Enter the value for the variance ?"
Title = "Information for the random number generator"
var1 = InputBox(Message, Title)

Select Case indexNum

Case 0
Message = "Enter the value for the first constant ?"
Title = "Simulation Data Analysis for AR1 model"
Phi1 = InputBox(Message, Title)

Case 1
Message = "Enter the value for the first constant ?"
Title = "Simulation Data Analysis for AR2 model"
Phi1 = InputBox(Message, Title)
Message = "Enter the value for the second constant ?"
Title = "Simulation Data Analysis for AR2 model"
Phi2 = InputBox(Message, Title)

Case 2
Message = "Enter the value for the first constant ?"
Title = "Simulation Data Analysis for MA1 model"
Theta1 = InputBox(Message, Title)

Case 3
Message = "Enter the value for the first constant ?"
Title = "Simulation Data Analysis for MA2 model"
Theta1 = InputBox(Message, Title)
Message = "Enter the value for the second constant ?"
Title = "Simulation Data Analysis for MA2 model"
Theta2 = InputBox(Message, Title)

Case 4
Message = "Enter the value for the first MA constant ?"
Title = "Simulation Data Analysis for ARIMA 011 model"
Theta1 = InputBox(Message, Title)

Case 5
Message = "Enter the value for the first AR constant ?"
Title = "Simulation Data Analysis for ARIMA 110 model"
Phi1 = InputBox(Message, Title)

Case 6
Message = "Enter the value for the first AR constant ?"
Title = "Simulation Data Analysis for ARIMA 101 model"
Phi1 = InputBox(Message, Title)
Message = "Enter the value for the first MA constant ?"
Title = "Simulation Data Analysis for ARIMA 101 model"
Theta1 = InputBox(Message, Title)

Case 7
Message = "Enter the value for the first AR constant ?"
Title = "Simulation Data Analysis for ARIMA 111 model"
Phi1 = InputBox(Message, Title)
Message = "Enter the value for the first MA constant ?"
Title = "Simulation Data Analysis for ARIMA 111 model"
Theta1 = InputBox(Message, Title)

Case 8
'Seasonal Model
Message = "Enter the value for the constant ?"
Title = "Simulation Data Analysis for SARIMA (1,0,0)(0,1,0)12 model"
Phi1 = InputBox(Message, Title)

Case Else
'Extra option

End Select

'AddIns("Analysis ToolPak - VBA").Installed = True

Message = "Enter '1' for Excel to simulate data set or '0' for user's data set"
Title = "Information"
respond = InputBox(Message, Title)
If respond = 0 Then
Processing2 Phi1, Phi2, Theta1, Theta2, indexNum, var1
ElseIf respond = 1 Then
'Processing takes a seed offset (counter1) as its seventh argument; 1 matches the first pass of Summary_Analysis
Processing Phi1, Phi2, Theta1, Theta2, indexNum, var1, 1
End If
End Sub

Sub Summary_Analysis()

Dim row_index1 As Integer
Dim column_index1 As Integer
Dim row_index2 As Integer
Dim column_index2 As Integer
Dim row_index3 As Integer
Dim column_index3 As Integer
Dim countsheet As Integer
Dim counter1 As Integer
Dim counter2 As Integer
Dim Message, Title
Dim Phi1 As Double
Dim Phi2 As Double
Dim Theta1 As Double
Dim Theta2 As Double
Dim model As String
Dim respond As Integer
Dim indexNum As Integer
Dim var1 As Double

row_index1 = 1
row_index2 = 1
respond = 2

Message = "Enter '1' for Excel to simulate data set or '0' for user's data set"
Title = "Information"
respond = InputBox(Message, Title)

For counter1 = 1 To 20

column_index1 = 1
column_index2 = 1
indexNum = 99
Sheets("Summary").Select

model = ActiveSheet.Range("B7").Cells(row_index1, column_index1).Value
Phi1 = ActiveSheet.Range("B7").Cells(row_index1, column_index1 + 1).Value
Phi2 = ActiveSheet.Range("B7").Cells(row_index1, column_index1 + 2).Value
Theta1 = ActiveSheet.Range("B7").Cells(row_index1, column_index1 + 3).Value
Theta2 = ActiveSheet.Range("B7").Cells(row_index1, column_index1 + 4).Value
var1 = ActiveSheet.Range("B7").Cells(row_index1, column_index1 + 5).Value

If model = "AR 1" Then
indexNum = 0
ElseIf model = "AR 2" Then
indexNum = 1
ElseIf model = "MA 1" Then
indexNum = 2
ElseIf model = "MA 2" Then
indexNum = 3
ElseIf model = "ARIMA 011" Then
indexNum = 4
ElseIf model = "ARIMA 110" Then
indexNum = 5
ElseIf model = "ARIMA 101" Then
indexNum = 6
ElseIf model = "ARIMA 111" Then
indexNum = 7
ElseIf model = "Seasonal" Then
indexNum = 8
End If

If (respond = 0) And (indexNum <> 99) Then

'select the stored data set for this pass: "DataSets (2)" when counter1 = 1, up to "DataSets (21)" when counter1 = 20
Sheets("DataSets (" & (counter1 + 1) & ")").Select

Range("A3:CV102").Select
Selection.Copy
Sheets("DataSets").Select
Range("A3:CV102").Select
Selection.PasteSpecial Paste:=xlValues, Operation:=xlNone, SkipBlanks:= _
False, Transpose:=False
Application.CutCopyMode = False

Processing2 Phi1, Phi2, Theta1, Theta2, indexNum, var1

End If

If respond = 1 Then

If (indexNum >= 0) And (indexNum <= 8) Then
Processing Phi1, Phi2, Theta1, Theta2, indexNum, var1, counter1
Sheets("DataSets").Copy Before:=Sheets("Summary")
End If
End If

row_index1 = row_index1 + 2

For counter2 = 1 To 7

If (indexNum >= 0) And (indexNum <= 8) Then

Sheets("Summary").Select
ActiveSheet.Range("H7").Cells(row_index2, counter2).Value = _
Sheets("case2").Range("AF104").Cells(1, column_index2).Value
ActiveSheet.Range("H8").Cells(row_index2, counter2).Value = _
Sheets("case2").Range("AF105").Cells(1, column_index2).Value
ActiveSheet.Range("H52").Cells(row_index2, counter2).Value = _
Sheets("case3").Range("AF104").Cells(1, column_index2).Value
ActiveSheet.Range("H53").Cells(row_index2, counter2).Value = _
Sheets("case3").Range("AF105").Cells(1, column_index2).Value
ActiveSheet.Range("H97").Cells(row_index2, counter2).Value = _
Sheets("case4").Range("AF104").Cells(1, column_index2).Value
ActiveSheet.Range("H98").Cells(row_index2, counter2).Value = _
Sheets("case4").Range("AF105").Cells(1, column_index2).Value
Else
Sheets("Summary").Select
ActiveSheet.Range("H7").Cells(row_index2, counter2).Clear
ActiveSheet.Range("H8").Cells(row_index2, counter2).Clear
ActiveSheet.Range("H52").Cells(row_index2, counter2).Clear
ActiveSheet.Range("H53").Cells(row_index2, counter2).Clear
ActiveSheet.Range("H97").Cells(row_index2, counter2).Clear
ActiveSheet.Range("H98").Cells(row_index2, counter2).Clear
End If

column_index2 = column_index2 + 4

Next counter2

row_index2 = row_index2 + 2

Next counter1

End Sub
Sub Processing(Phi1, Phi2, Theta1, Theta2, indexNum, var1, counter1)
'declare variables

Dim M49, S49_1, S49_2, S49_3, P49_1, P49_2, P49_3 As Double
Dim M1, S1, P1 As Double
Dim M99, S99, P99 As Double
Dim M50, S50, P50 As Double
Dim M51, S51, P51 As Double

Dim rwindex As Integer
Dim rwindex2 As Integer
Dim colindex As Integer
Dim colindex2 As Integer
Dim rwindex3 As Integer
Dim colindex3 As Integer
Dim rwindex4 As Integer
Dim colindex4 As Integer
Dim rwindex5 As Integer
Dim colindex5 As Integer
Dim rwindex6 As Integer
Dim colindex6 As Integer
Dim rwindex7 As Integer
Dim colindex7 As Integer
Dim seed As Integer
Dim dcounter As Integer
Dim counter As Integer

colindex = 1
rwindex = 3
colindex2 = 1
rwindex2 = 4
colindex3 = 1
rwindex3 = 108
colindex6 = 1
rwindex6 = 212
colindex4 = 1
rwindex4 = 4
colindex5 = 1
rwindex5 = 108
colindex7 = 1
rwindex7 = 212

clear_data
Sheets("DataSets").Range("A3:cv102").Clear

AddIns("Analysis ToolPak - VBA").Installed = True

For counter = 0 To 99
seed = counter + counter1
Sheets.Add.Name = "Temp"

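'Analysis ToolPak Random: output at A5, 1 variable, 125 points, distribution 2 (Normal),
'then the seed, the mean (0) and the standard deviation (var1)
Application.Run "ATPVBAEN.XLA!Random", ActiveSheet.Range("$A$5"), 1, 125, _
2, seed, 0, var1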

Range("b5:b124").Select
Selection.ClearContents

Select Case indexNum

Case 0
ActiveSheet.Range("a1").Value = Phi1
ActiveSheet.Range("b4").Value = ActiveSheet.Range("a5").Value
ActiveCell.FormulaR1C1 = "=R1C1*R[-1]C+R[1]C[-1]"

Case 1
ActiveSheet.Range("a1").Value = Phi1
ActiveSheet.Range("a2").Value = Phi2
ActiveSheet.Range("b3").Value = ActiveSheet.Range("a5").Value
ActiveSheet.Range("b4").Value = ActiveSheet.Range("a6").Value
ActiveCell.FormulaR1C1 = "=r1c1*r[-1]c+r2c1*r[-2]c+r[2]c[-1]"

Case 2
ActiveSheet.Range("a1").Value = Theta1
ActiveSheet.Range("b4").Value = ActiveSheet.Range("a5").Value
ActiveCell.FormulaR1C1 = "=R1C1*RC[-1]+R[1]C[-1]"

Case 3
ActiveSheet.Range("a1").Value = Theta1
ActiveSheet.Range("a2").Value = Theta2
ActiveSheet.Range("b3").Value = ActiveSheet.Range("a5").Value
ActiveSheet.Range("b4").Value = ActiveSheet.Range("a6").Value
ActiveCell.FormulaR1C1 = "=r1c1*rc[-1]+r2c1*r[1]c[-1]+r[2]c[-1]"

Case 4
ActiveSheet.Range("a1").Value = Theta1
ActiveSheet.Range("b4").Value = ActiveSheet.Range("a5").Value
ActiveSheet.Range("c4").Value = ActiveSheet.Range("a5").Value
ActiveCell.FormulaR1C1 = "=R1C1*RC[-1]+R[1]C[-1]"
Range("c5").Select
ActiveCell.FormulaR1C1 = "=R[-1]C+RC[-1]"

Case 5
ActiveSheet.Range("a1").Value = Phi1
ActiveSheet.Range("b4").Value = ActiveSheet.Range("a5").Value
ActiveSheet.Range("c4").Value = ActiveSheet.Range("a5").Value
ActiveCell.FormulaR1C1 = "=R1C1*R[-1]C+R[1]C[-1]"
Range("c5").Select
ActiveCell.FormulaR1C1 = "=R[-1]C+RC[-1]"

Case 6
ActiveSheet.Range("a1").Value = Phi1
ActiveSheet.Range("a2").Value = Theta1
ActiveSheet.Range("b4").Value = ActiveSheet.Range("a5").Value
ActiveCell.FormulaR1C1 = "=r1c1*r[-1]c+r2c1*rc[-1]+r[1]c[-1]"

Case 7
ActiveSheet.Range("a1").Value = Phi1
ActiveSheet.Range("a2").Value = Theta1
ActiveSheet.Range("b4").Value = ActiveSheet.Range("a5").Value
ActiveSheet.Range("c4").Value = ActiveSheet.Range("a5").Value
ActiveCell.FormulaR1C1 = "=r1c1*r[-1]c+r2c1*rc[-1]+r[1]c[-1]"
Range("c5").Select
ActiveCell.FormulaR1C1 = "=R[-1]C+RC[-1]"

Case 8
'Seasonal Model
ActiveSheet.Range("a1").Value = Phi1
ActiveSheet.Range("b1").Value = ActiveSheet.Range("a5").Value
ActiveSheet.Range("b2").Value = ActiveSheet.Range("a6").Value
ActiveSheet.Range("b3").Value = ActiveSheet.Range("a7").Value
ActiveSheet.Range("b4").Value = ActiveSheet.Range("a8").Value
ActiveSheet.Range("b5").Value = ActiveSheet.Range("a9").Value
ActiveSheet.Range("b6").Value = ActiveSheet.Range("a10").Value
ActiveSheet.Range("b7").Value = ActiveSheet.Range("a11").Value
ActiveSheet.Range("b8").Value = ActiveSheet.Range("a12").Value
ActiveSheet.Range("b9").Value = ActiveSheet.Range("a13").Value
ActiveSheet.Range("b10").Value = ActiveSheet.Range("a14").Value
ActiveSheet.Range("b11").Value = ActiveSheet.Range("a15").Value
ActiveSheet.Range("b12").Value = ActiveSheet.Range("a16").Value
ActiveSheet.Range("b13").Value = ActiveSheet.Range("a17").Value
Range("b14").Select
ActiveCell.FormulaR1C1 = "=r1c1*r[-1]c-r1c1*r[-13]c+r[-12]c+r[4]c[-1]"

Case Else
'Extra option

End Select

Select Case indexNum

Case 0, 1, 2, 3, 4, 5, 6, 7

Range("B5:c5").Select
Selection.AutoFill Destination:=Range("B5:c108"), Type:=xlFillDefault

Case 8

Range("B14:c14").Select
Selection.AutoFill Destination:=Range("B14:c128"), Type:=xlFillDefault

Case Else
' extra option

End Select

Select Case indexNum

Case 0, 1, 2, 3, 6
Range("b5:b104").Select

Case 4, 5, 7
Range("c5:c104").Select

Case 8
Range("b14:b113").Select

Case Else
'Extra option

End Select

Selection.Copy
Sheets("case1").Select
Range("b6").Select
Selection.PasteSpecial Paste:=xlValues, Operation:=xlNone, SkipBlanks:= _
False, Transpose:=False

Sheets("DataSets").Select
Range("a1:cv105").Cells(rwindex, colindex).Select
Selection.PasteSpecial Paste:=xlValues, Operation:=xlNone, SkipBlanks:= _
False, Transpose:=False
Application.CutCopyMode = False
colindex = colindex + 1
Sheets("Temp").Select
Application.CutCopyMode = False
Application.DisplayAlerts = False
ActiveWindow.SelectedSheets.Delete
Sheets("case1").Select
Range("b6").Select

M49 = ActiveSheet.Range("B54").Value
Sheets("case2").Select
S49_1 = ActiveSheet.Range("G54").Value
Range("ad1").Cells(rwindex2, colindex2).Value = M49
Range("ae1").Cells(rwindex2, colindex2).Value = S49_1
Sheets("case1").Select

M1 = ActiveSheet.Range("B12").Value
Sheets("case3").Select
S1 = ActiveSheet.Range("G12").Value
Range("ad1").Cells(rwindex2, colindex2).Value = M1
Range("ae1").Cells(rwindex2, colindex2).Value = S1
Sheets("case1").Select

M99 = ActiveSheet.Range("B96").Value
Sheets("case4").Select
S99 = ActiveSheet.Range("G96").Value
Range("ad1").Cells(rwindex2, colindex2).Value = M99
Range("ae1").Cells(rwindex2, colindex2).Value = S99
Sheets("case1").Select

For dcounter = 2 To 7

Sheets("case1").Range("d3").Value = dcounter

Sheets("case2").Select
P49_1 = ActiveSheet.Range("F54").Value
Range("ai1").Cells(rwindex4, colindex4).Value = P49_1

Sheets("case3").Select
P1 = ActiveSheet.Range("F12").Value
Range("ai1").Cells(rwindex4, colindex4).Value = P1

Sheets("case4").Select
P99 = ActiveSheet.Range("F96").Value
Range("ai1").Cells(rwindex4, colindex4).Value = P99

colindex4 = colindex4 + 4
colindex5 = colindex5 + 4
colindex7 = colindex7 + 4

Next dcounter

rwindex2 = rwindex2 + 1
rwindex3 = rwindex3 + 1
rwindex4 = rwindex4 + 1
rwindex5 = rwindex5 + 1
rwindex6 = rwindex6 + 1
rwindex7 = rwindex7 + 1
colindex4 = 1
colindex5 = 1
colindex6 = 1
colindex7 = 1

Next counter

End Sub

Sub clear_data()

Sheets("case2").Range("Ad4:ae103,ai4:ai103,am4:am103,aq4:aq103,au4:au103,ay4:ay103,bc4:bc103").Clear
Sheets("case3").Range("Ad4:ae103,ai4:ai103,am4:am103,aq4:aq103,au4:au103,ay4:ay103,bc4:bc103").Clear
Sheets("case4").Range("Ad4:ae103,ai4:ai103,am4:am103,aq4:aq103,au4:au103,ay4:ay103,bc4:bc103").Clear

End Sub
Sub Processing2(Phi1, Phi2, Theta1, Theta2, indexNum, var1)
'declare variables

Dim M49, S49_1, S49_2, S49_3, P49_1, P49_2, P49_3 As Double
Dim M1, S1, P1 As Double
Dim M99, S99, P99 As Double
Dim M50, S50, P50 As Double
Dim M51, S51, P51 As Double

Dim rwindex As Integer
Dim rwindex2 As Integer
Dim colindex As Integer
Dim colindex2 As Integer
Dim rwindex3 As Integer
Dim colindex3 As Integer
Dim rwindex4 As Integer
Dim colindex4 As Integer
Dim rwindex5 As Integer
Dim colindex5 As Integer
Dim rwindex6 As Integer
Dim colindex6 As Integer
Dim rwindex7 As Integer
Dim colindex7 As Integer
Dim dcounter As Integer
Dim counter As Integer

colindex = 1
rwindex = 3
colindex2 = 1
rwindex2 = 4
colindex3 = 1
rwindex3 = 108
colindex6 = 1
rwindex6 = 212
colindex4 = 1
rwindex4 = 4
colindex5 = 1
rwindex5 = 108
colindex7 = 1
rwindex7 = 212

clear_data

For counter = 0 To 4

Sheets("DataSets").Select
Range(Cells(3, colindex), Cells(102, colindex)).Select
'Range("a1:cv105").Cells(3, colindex).Select
Selection.Copy

Sheets("case1").Select
Range("b6").Select
Selection.PasteSpecial Paste:=xlValues, Operation:=xlNone, SkipBlanks:= _
False, Transpose:=False
colindex = colindex + 1

Sheets("case1").Select
Range("b6").Select

M49 = ActiveSheet.Range("B54").Value
Sheets("case2").Select
S49_1 = ActiveSheet.Range("G54").Value
Range("ad1").Cells(rwindex2, colindex2).Value = M49
Range("ae1").Cells(rwindex2, colindex2).Value = S49_1
Sheets("case1").Select

M1 = ActiveSheet.Range("B12").Value
Sheets("case3").Select
S1 = ActiveSheet.Range("G12").Value
Range("ad1").Cells(rwindex2, colindex2).Value = M1
Range("ae1").Cells(rwindex2, colindex2).Value = S1
Sheets("case1").Select

M99 = ActiveSheet.Range("B96").Value
Sheets("case4").Select
S99 = ActiveSheet.Range("G96").Value
Range("ad1").Cells(rwindex2, colindex2).Value = M99
Range("ae1").Cells(rwindex2, colindex2).Value = S99
Sheets("case1").Select

For dcounter = 2 To 7

Sheets("case1").Range("d3").Value = dcounter

Sheets("case2").Select
P49_1 = ActiveSheet.Range("F54").Value
Range("ai1").Cells(rwindex4, colindex4).Value = P49_1

Sheets("case3").Select
P1 = ActiveSheet.Range("F12").Value
Range("ai1").Cells(rwindex4, colindex4).Value = P1

Sheets("case4").Select
P99 = ActiveSheet.Range("F96").Value
Range("ai1").Cells(rwindex4, colindex4).Value = P99

colindex4 = colindex4 + 4
colindex5 = colindex5 + 4
colindex7 = colindex7 + 4

Next dcounter

rwindex2 = rwindex2 + 1
rwindex3 = rwindex3 + 1
rwindex4 = rwindex4 + 1
rwindex5 = rwindex5 + 1
rwindex6 = rwindex6 + 1
rwindex7 = rwindex7 + 1
colindex4 = 1
colindex5 = 1
colindex6 = 1
colindex7 = 1

Next counter

End Sub
**********************************************************************
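The worksheet formulas written by Processing define each simulated series recursively from the normal noise in column A. The Python sketch below restates four of those recursions so that they can be checked outside Excel.

It takes the noise as an argument instead of calling the Analysis ToolPak, and the function name is hypothetical. The MA formulas on the worksheet add theta times the previous shock (x_t = e_t + theta e_{t-1}). The sign of theta therefore matters when comparing with formulas written in the Box-Jenkins convention x_t = e_t - theta e_{t-1}.

```python
from itertools import accumulate
from typing import List, Sequence


def simulate_from_noise(model: str, noise: Sequence[float],
                        phi: float = 0.0, theta: float = 0.0) -> List[float]:
    """Restate the worksheet recursions used by Processing for four of its models.

    noise[0] starts the series, as B4 = A5 does on the Temp sheet.
      "AR 1":      w[t] = phi * w[t-1] + e[t]
      "MA 1":      w[t] = e[t] + theta * e[t-1]
      "ARIMA 110": running total of the "AR 1" series
      "ARIMA 011": running total of the "MA 1" series
    """
    e = list(noise)
    if model in ("AR 1", "ARIMA 110"):
        w = [e[0]]
        for t in range(1, len(e)):
            w.append(phi * w[-1] + e[t])
    elif model in ("MA 1", "ARIMA 011"):
        w = [e[0]] + [e[t] + theta * e[t - 1] for t in range(1, len(e))]
    else:
        raise ValueError(f"model not covered by this sketch: {model}")
    # The ARIMA cases integrate once, as C5 = C4 + B5 does on the worksheet.
    return w if model in ("AR 1", "MA 1") else list(accumulate(w))
```

To generate noise comparable to the macro's 125 random numbers, one option is `rng = random.Random(1)` followed by `[rng.gauss(0.0, 0.04) for _ in range(125)]`. The seed here is hypothetical, and `gauss` takes a standard deviation.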

In addition, I have also written the following macros for Minitab to examine the accuracy of the interpolation method in the Box-Jenkins approach. During my study, I made numerous alterations to these macros, since fine tuning was sometimes required for them to suit different conditions. In this thesis, I have provided only the two macros that I considered the most commonly used, for reference.

Minitab Macros:
a) Generate time series data using minitab
**********************************************************************
GMACRO
ARIMA
#

NOTE What is the Autoregressive order p?
SET C91;
FILE 'TERMINAL';
NOBS=1.
LET K15=C91(1)
#
IF K15 =0
GOTO 1
ENDIF
NOTE What are the coefficients for the AR component?
SET C92;
FILE 'TERMINAL';
NOBS=K15.
MLABEL 1
#
NOTE What is the Moving Average order q?
SET C93;
FILE 'TERMINAL';
NOBS=1.
LET K16=C93(1)
IF K16 = 0
GOTO 2
ENDIF
#
NOTE What are the coefficients for the MA component?
SET C94;
FILE 'TERMINAL';
NOBS=K16.
MLABEL 2
#
DO k30=1:100
RANDOM 100 C1;
NORMAL 0 1.
COPY K15 K16 C100
MAXIMUM C100 K17
DO K2=1:K17
LET C2(K2)=C1(K2)
ENDDO
LET K18=K17+1
DO K2=K18:100
LET C2(K2)=C1(K2)
IF K15 = 0
GOTO 3
ENDIF
DO K3=1:K15
LET C2(K2)=C2(K2)+C92(K3)*C2(K2-K3)
ENDDO
MLABEL 3
IF K16=0
GOTO 4
ENDIF
DO K3=1:K16
LET C2(K2)=C2(K2)+C94(K3)*C1(K2-K3)
ENDDO
MLABEL 4
ENDDO
Note TSPLOT C2

Let k991=c91
Let k993=c93

Set c3
1( 1 : 100 / 1 )1
End.
Copy C2 c4;
Omit 49:100.
Copy C2 c110;
Omit 1:49.
Set c111
1( 1 : 51 / 1 )1
End.
Sort C110 c5;
By c111;
Descending c111.
ARIMA k991 0 k993 C4;
NoConstant;
Forecast 1 c116.
ARIMA K991 0 k993 c5;
NoConstant;
Forecast 1 c117.

If k991=1
if k993=0
Let K99=1/(1+c92*c92)
endif
if k993=1
Let k99=(1-c92*c94)/(1+c92*c92-2*c92*c94)
endif
endif
If k991=0
if k993=1
Let k99=1
endif
endif

Name c10='Estimate'
Name c12='Actual'
Name c14='Residual'
Name c15='ResiF'
Name c6='Forecast'
Name c7='Backcast'
Let c6(k30)=c116
Let c7(k30)=c117
Let c10(k30)=k99*(c116+c117)
Let c12(k30)=c2(49)
Let c114(k30)=c10(k30)-c12(k30)
Let c115(k30)=c6(k30)-c12(k30)
ENDDO
Let 'Residual'=c114
Let 'ResiF'=c115
Describe c114.
Histogram Residual;
MidPoint;
Bar.
Describe c115.
Histogram ResiF;
MidPoint;
Bar.
ENDMACRO
**********************************************************************
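Macro (a) combines the forecast and backcast through the factor K99. Macro (b) uses the same expression as K97, with the AR or MA coefficient set to zero when that order is 0.

The factor can be written as a single function. The sketch below assumes the Box-Jenkins sign convention (1 - phi B) x_t = (1 - theta B) e_t, and the function names are hypothetical. For a stationary, invertible ARMA(1,1) observed over a long record on both sides of the gap, this weight times the sum of the one-step forecast and backcast gives the minimum mean square error interpolator. This can be checked from the inverse autocorrelations of the model.

```python
def interpolation_weight(phi: float = 0.0, theta: float = 0.0) -> float:
    """Factor applied to (forecast + backcast), as K99/K97 in the Minitab macros.

    Assumes (1 - phi B) x[t] = (1 - theta B) e[t] with |phi| < 1 and |theta| < 1.
    theta = 0 gives the AR(1) weight 1 / (1 + phi^2); phi = 0 gives 1 for MA(1).
    """
    return (1.0 - phi * theta) / (1.0 + phi * phi - 2.0 * phi * theta)


def interpolate(forecast: float, backcast: float,
                phi: float = 0.0, theta: float = 0.0) -> float:
    """Combine a one-step forecast and backcast into an interpolated value."""
    return interpolation_weight(phi, theta) * (forecast + backcast)
```

A single expression covers all three branches of the IF blocks in macro (a), because setting theta or phi to zero reproduces the AR(1) and MA(1) cases.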

b) Import data sets from Excel, then forecast and evaluate the missing value using different interpolation methods.
**********************************************************************
GMACRO
ARIMA

Let k43=1
# Do k42=1:20

Erase C1-C1000
#Erase K1-K1000
Erase M1-M100
Let K998 = '*'
Let K999 = 2.7182818
Let K1000 = 3.14159265

Let c500(1)=k43

if c500(1)=1
XDGET 'excel' 'DataSets (2)' 'r3c1:r102c100'
endif
if c500(1)=2
XDGET 'excel' 'DataSets (3)' 'r3c1:r102c100'
endif
if c500(1)=3
XDGET 'excel' 'DataSets (4)' 'r3c1:r102c100'
endif
if c500(1)=4
XDGET 'excel' 'DataSets (5)' 'r3c1:r102c100'
endif
if c500(1)=5
XDGET 'excel' 'DataSets (6)' 'r3c1:r102c100'
endif
if c500(1)=6
XDGET 'excel' 'DataSets (7)' 'r3c1:r102c100'
endif
if c500(1)=7
XDGET 'excel' 'DataSets (8)' 'r3c1:r102c100'
endif
if c500(1)=8
XDGET 'excel' 'DataSets (9)' 'r3c1:r102c100'
endif
if c500(1)=9
XDGET 'excel' 'DataSets (10)' 'r3c1:r102c100'
endif
if c500(1)=10
XDGET 'excel' 'DataSets (11)' 'r3c1:r102c100'
endif
if c500(1)=11
XDGET 'excel' 'DataSets (12)' 'r3c1:r102c100'
endif
if c500(1)=12
XDGET 'excel' 'DataSets (13)' 'r3c1:r102c100'
endif
if c500(1)=13
XDGET 'excel' 'DataSets (14)' 'r3c1:r102c100'
endif
if c500(1)=14
XDGET 'excel' 'DataSets (15)' 'r3c1:r102c100'
endif
if c500(1)=15
XDGET 'excel' 'DataSets (16)' 'r3c1:r102c100'
endif
if c500(1)=16
XDGET 'excel' 'DataSets (17)' 'r3c1:r102c100'
endif
if c500(1)=17
XDGET 'excel' 'DataSets (18)' 'r3c1:r102c100'
endif
if c500(1)=18
XDGET 'excel' 'DataSets (19)' 'r3c1:r102c100'
endif
if c500(1)=19
XDGET 'excel' 'DataSets (20)' 'r3c1:r102c100'
endif
if c500(1)=20
XDGET 'excel' 'DataSets (21)' 'r3c1:r102c100'
endif

# Note What is the position of the missing value?
# SET c495;
# FILE 'TERMINAL';
# NOBS=1.
# Let K40=c495(1)

NOTE What is the Autoregressive order p? (1 or 0)
SET C491;
FILE 'TERMINAL';
NOBS=1.
#Let c491(1)=1
LET K15=C491(1)
Let c194(1)=c491(1)
IF K15 =0
Let c492(1)=0
GOTO 1
ENDIF

NOTE What are the coefficients for the AR component?
SET C492;
FILE 'TERMINAL';
NOBS=K15.
Let c195(1)=c492
MLABEL 1

NOTE What is the Moving Average order q? (1 or 0)
SET C493;
FILE 'TERMINAL';
NOBS=1.
#Let c493(1)=0
LET K16=C493(1)
Let c196(1)=c493(1)
IF K16 = 0
Let c494(1)=0
GOTO 2
ENDIF

NOTE What are the coefficients for the MA component?
SET C494;
FILE 'TERMINAL';
NOBS=K16.
Let c197(1)=c494
MLABEL 2

#Note Make sure you have copy data set from Excel
#Note Data set should start from "C2" column

Let k40=7
Do k45=1:13

Let K1 = 1

Do k30=1:100
Let c102=ck1

Set c103
1( 1 : 100 / 1 )1
End.

Copy C102 c104;
Omit k40:100.
Copy C102 c210;
Omit 1:k40.

Let K41=100-k40

Set c211
1( 1 : k41 / 1 )1
End.

Sort C210 c105;
By c211;
Descending c211.

ARIMA k15 0 k16 C104;
NoConstant;
Forecast 1 c218.
ARIMA k15 0 k16 c105;
NoConstant;
Forecast 1 c219.

Let c106(k30)=c218
Let c107(k30)=c219

Let k97=(1-c492*c494)/(1+c492*c492-2*c492*c494)

Let c110(k30)=k97*(c106(k30)+c107(k30))
Let c112(k30)=c102(k40)
Let c214(k30)=c110(k30)-c112(k30)
Let c215(k30)=c106(k30)-c112(k30)
Let c216(k30)=c107(k30)-c112(k30)

Let k1=k1+1

ENDDO

Let c114=c214
Let c115=c215
Let c116=c216

Absolute c114 c114.
Absolute c115 c115.
Absolute c116 c116.

sum c114 c201
Let c201(1)=c201(1)/100
StDev c114 c199.

sum c115 c202
Let c202(1)=c202(1)/100
StDev c115 c200.

sum c116 c203
Let c203(1)=c203(1)/100
StDev c116 c198.

Let c305(k45)=c201(1)
Let c306(k45)=c199(1)
Let c307(k45)=c202(1)
Let c308(k45)=c200(1)
Let c309(k45)=c203(1)
Let c310(k45)=c198(1)
Let c304(k45)=k40
Let k40=k40+7
enddo

Name c110='Estimate'
Name c112='Actual'
Name c114='Residual'
Name c115='Resid_F'
Name c106='Forecast'
Name c107='Backcast'
Name c201='C MAD R'
Name c202='C MAD RF'
Name c203='c MAD RB'
Name c305='MAD R'
Name c307='MAD RF'
Name c309='MAD RB'
Name c306='SD R'
Name c308='SD RF'
Name c310='sd RB'
Name c194='AR'
Name c195='Phi'
Name c196='MA'
Name c197='Theta'

# Let k43=k43+1

# Pause
# enddo

ENDMACRO
**********************************************************************
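Macro (b) repeats the forecast/backcast comparison over 100 series and summarizes the absolute errors by their mean (MAD) and standard deviation. The Python sketch below reproduces that bookkeeping for an AR(1) model under two simplifying assumptions:

- The true phi is used instead of a fitted ARIMA model.
- Each series is generated directly rather than imported from Excel.

The names are hypothetical. The `position` argument is 1-based, like k40.

```python
import random
import statistics
from typing import Dict, List


def mad_study(phi: float, sigma: float, position: int, n: int = 100,
              reps: int = 100, seed: int = 1) -> Dict[str, float]:
    """MAD and SD of absolute errors for combined, forecast and backcast estimates.

    Each replication simulates x[t] = phi * x[t-1] + e[t], e ~ N(0, sigma^2),
    treats x[position] (1-based) as missing and estimates it three ways.
    """
    if not 2 <= position <= n - 1:
        raise ValueError("position must leave at least one value on each side")
    rng = random.Random(seed)
    weight = 1.0 / (1.0 + phi * phi)        # K99 for an AR(1) model
    errors: Dict[str, List[float]] = {"R": [], "RF": [], "RB": []}
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma)]
        for _ in range(n - 1):
            x.append(phi * x[-1] + rng.gauss(0.0, sigma))
        i = position - 1                    # 0-based index of the missing value
        forecast = phi * x[i - 1]           # one-step forecast from the past
        backcast = phi * x[i + 1]           # one-step backcast from the future
        combined = weight * (forecast + backcast)
        errors["R"].append(abs(combined - x[i]))
        errors["RF"].append(abs(forecast - x[i]))
        errors["RB"].append(abs(backcast - x[i]))
    summary: Dict[str, float] = {}
    for key, values in errors.items():
        summary["MAD " + key] = statistics.mean(values)
        summary["SD " + key] = statistics.stdev(values)
    return summary
```

Looping `position` over `range(7, 92, 7)` mirrors the k40 = 7, 14, ..., 91 loop in the macro. Because the fitted model is replaced by the true phi, the values will differ somewhat from the tables in Appendix B.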

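Appendix B
In the tables below, Position is the location of the single missing value within each series of 100 observations. MAD R, MAD RF and MAD RB are the mean absolute deviations from the actual value of the combined interpolation estimate, the forecast and the backcast respectively, computed over 100 data sets. SD R, SD RF and SD RB are the standard deviations of the same absolute deviations.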
AR 1, Phi 0.2 and Var 0.04
[Figure: MAD R, MAD RF and MAD RB plotted against the position of the missing value (0 to 100)]

Position MAD R SD R MAD RF SD RF MAD RB SD RB
7 0.0383548 0.0300261 0.0394706 0.0293599 0.0349698 0.0266808
14 0.0342186 0.0261616 0.0348106 0.0267383 0.0325758 0.0243101
21 0.0348769 0.0258421 0.0347216 0.0250258 0.0333016 0.0231421
28 0.0308926 0.0247714 0.0308156 0.0241882 0.0317813 0.0255378
35 0.0327949 0.0253675 0.0335549 0.025669 0.0314575 0.0259499
42 0.0324695 0.0247884 0.0330283 0.0240163 0.032348 0.0258883
49 0.0331776 0.0254141 0.0333826 0.02569 0.0324672 0.0249714
56 0.0303941 0.023799 0.03146 0.0238795 0.0338147 0.0227298
63 0.0323239 0.0241004 0.0328842 0.0234854 0.0335653 0.0238895
70 0.0317275 0.0218221 0.031692 0.0221196 0.0321843 0.0234915
77 0.0327001 0.0241933 0.0326502 0.0244592 0.0336681 0.0244235
84 0.0343036 0.0269857 0.0324897 0.0261358 0.0351269 0.0276911
91 0.0337177 0.0266945 0.0338195 0.0268043 0.0338952 0.0272599

AR 1, Phi 0.4 and Var 0.04
[Figure: MAD R, MAD RF and MAD RB plotted against the position of the missing value (0 to 100)]

Position MAD R SD R MAD RF SD RF MAD RB SD RB
7 0.0354171 0.0297334 0.0391116 0.0306955 0.0370822 0.0279217
14 0.0334243 0.0240874 0.035565 0.0263585 0.0331941 0.0243238
21 0.0339633 0.0241433 0.0351538 0.0255282 0.033365 0.0228828
28 0.0301497 0.0239761 0.0311748 0.0241983 0.0330109 0.0251015
35 0.0307671 0.0237391 0.0334672 0.0254874 0.0324043 0.0253658
42 0.0305512 0.0236527 0.0326048 0.0238531 0.0320661 0.0258765
49 0.0315432 0.0235266 0.0337797 0.0258524 0.0332746 0.0228335
56 0.0297035 0.0214621 0.0314494 0.02384 0.0349084 0.0221234
63 0.0295532 0.0228912 0.0324862 0.0235806 0.0326278 0.0235692
70 0.0301491 0.0210751 0.0312146 0.0226036 0.0322883 0.0247577
77 0.0297082 0.0224657 0.0321325 0.0242214 0.0336161 0.0226056
84 0.0337879 0.0257229 0.0332094 0.0259616 0.036365 0.0280486
91 0.0308536 0.0248005 0.0331826 0.0262302 0.0320738 0.0269271

AR 1, Phi 0.6 and Var 0.04
[Figure: MAD R, MAD RF and MAD RB plotted against the position of the missing value (0 to 100)]

Position MAD R SD R MAD RF SD RF MAD RB SD RB
7 0.0308411 0.0260684 0.0372982 0.0300513 0.0377745 0.0281719
14 0.0303771 0.0225608 0.0355961 0.0256469 0.0345531 0.0235791
21 0.0310941 0.0215129 0.0348632 0.0253421 0.0330646 0.0238015
28 0.0276698 0.0221232 0.0306791 0.024297 0.0329367 0.0253303
35 0.0282684 0.0208651 0.0333109 0.0250419 0.0330475 0.0238976
42 0.0278348 0.0216977 0.0319963 0.0241664 0.0322576 0.0262096
49 0.0292484 0.0214444 0.0337051 0.0256392 0.0334359 0.0224409
56 0.0275844 0.0199176 0.0313756 0.02376 0.0334601 0.0239586
63 0.0263725 0.0209946 0.032266 0.0238397 0.0318277 0.0217346
70 0.0282767 0.0199047 0.0315978 0.023485 0.0326364 0.0253883
77 0.0269136 0.0207659 0.0316442 0.0242333 0.0337613 0.0215044
84 0.0312723 0.0234993 0.0330356 0.0259993 0.0363139 0.0268696
91 0.0283289 0.022088 0.0336288 0.0263108 0.0304446 0.0255336

AR 1, Phi 0.8 and Var 0.04
[Figure: MAD R, MAD RF and MAD RB plotted against the position of the missing value (0 to 100)]

Position MAD R SD R MAD RF SD RF MAD RB SD RB
7 0.0269882 0.0222189 0.0336106 0.0285526 0.0362041 0.0292764
14 0.0274999 0.0206343 0.0352364 0.0258857 0.0353344 0.0238631
21 0.0278477 0.0185621 0.0337556 0.0252875 0.0338925 0.0243914
28 0.0254697 0.019544 0.0310548 0.0242494 0.0327125 0.0266088
35 0.0251019 0.0180902 0.0325542 0.0246931 0.0316466 0.0227968
42 0.0246854 0.0197351 0.031174 0.0248393 0.0336359 0.0264669
49 0.0259562 0.0194871 0.0333102 0.0251805 0.0325889 0.0249635
56 0.0255703 0.0178921 0.0313895 0.0238622 0.0316395 0.0246192
63 0.0238685 0.0185199 0.0323986 0.0238678 0.0310729 0.0210768
70 0.0250436 0.0185335 0.031691 0.0233919 0.033619 0.0244415
77 0.0247438 0.0186809 0.0316576 0.024699 0.0317307 0.0244889
84 0.0275294 0.0211607 0.0333245 0.0257529 0.0352519 0.0256628
91 0.0266075 0.0201035 0.0333797 0.0261817 0.0304982 0.0261971

AR(1), Phi = -0.2 and Var = 0.04
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.0393637 0.0290693 0.0406404 0.0292991 0.0304429 0.0252375
14 0.0348382 0.0266245 0.0349406 0.0266255 0.0331783 0.0254741
21 0.0325761 0.022485 0.0327737 0.0232155 0.0312844 0.0237256
28 0.0298222 0.0242613 0.0317736 0.0248001 0.0308372 0.0242395
35 0.0329578 0.0257039 0.0326025 0.0261841 0.0329743 0.0234468
42 0.0314307 0.0226177 0.0314934 0.0241625 0.0320543 0.0236444
49 0.0322498 0.025088 0.0329494 0.0253438 0.0317469 0.0255546
56 0.0317826 0.0227359 0.0314411 0.0237394 0.0311429 0.0238142
63 0.0317711 0.0230747 0.031427 0.0229376 0.0317241 0.0245613
70 0.0312431 0.0215704 0.0316544 0.0225159 0.032103 0.0223449
77 0.0341926 0.0237252 0.0315139 0.0240538 0.0339225 0.0250538
84 0.031955 0.0239206 0.0326697 0.0264244 0.031094 0.0233598
91 0.0328277 0.02468 0.0325427 0.0245135 0.0345711 0.0264157

AR(1), Phi = -0.4 and Var = 0.04
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.0367039 0.0277173 0.0417334 0.0298271 0.0299332 0.0229485
14 0.0321065 0.0244413 0.03458 0.0258979 0.0319514 0.0241525
21 0.0297819 0.0200817 0.0320539 0.0224253 0.0307124 0.0235101
28 0.0282536 0.0212684 0.0314673 0.0247179 0.0313055 0.0216571
35 0.031381 0.0244073 0.0324355 0.0261273 0.0339827 0.0229245
42 0.0291552 0.0211027 0.0314841 0.0241835 0.0325112 0.0226263
49 0.0291798 0.0230239 0.0325068 0.0247367 0.0306684 0.0252703
56 0.0304406 0.0219023 0.0316775 0.023682 0.030964 0.0237773
63 0.0296556 0.0217812 0.0312926 0.0230607 0.031919 0.024703
70 0.0296972 0.0205506 0.0320261 0.0223398 0.03248 0.0235354
77 0.0323755 0.0223382 0.0315346 0.0237933 0.0330472 0.0252771
84 0.0289661 0.0215249 0.0324229 0.026474 0.0294632 0.0217413
91 0.0315949 0.0244153 0.0331553 0.0245652 0.0372271 0.0271421

AR(1), Phi = -0.6 and Var = 0.04
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.03215 0.0230155 0.0401809 0.0286293 0.0299092 0.0228674
14 0.0286479 0.0230971 0.034217 0.0258879 0.0316704 0.0246171
21 0.0267589 0.0193163 0.0321352 0.0226373 0.0306829 0.0229921
28 0.0265205 0.0180159 0.0311482 0.0244511 0.0321839 0.0199024
35 0.0291452 0.022325 0.0321507 0.0260637 0.0341174 0.0235595
42 0.0270454 0.0194439 0.0319063 0.024141 0.0327339 0.0226125
49 0.0262682 0.0208842 0.0321396 0.0246819 0.0312858 0.0247434
56 0.02818 0.0208051 0.0321308 0.0237188 0.0309509 0.0239553
63 0.02703 0.0203037 0.0310643 0.0232911 0.0325249 0.024774
70 0.0270764 0.0201432 0.031639 0.0226081 0.0309939 0.0256903
77 0.0296191 0.0204445 0.0320674 0.023601 0.0313216 0.0249268
84 0.0269059 0.0205453 0.0329022 0.026149 0.0318393 0.0214277
91 0.0286188 0.0219232 0.0330489 0.0245822 0.0364775 0.0274637

AR(1), Phi = -0.8 and Var = 0.04
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.0273437 0.0201957 0.0378197 0.0279011 0.030672 0.0266283
14 0.0261526 0.0212997 0.0331292 0.0259376 0.0327643 0.025278
21 0.0237121 0.0190576 0.0316114 0.0242461 0.0284656 0.0225416
28 0.0243595 0.0163797 0.0317932 0.0244646 0.0315411 0.0226966
35 0.0268144 0.0198046 0.0316644 0.025403 0.0342872 0.0241655
42 0.0247034 0.0176463 0.0325757 0.0247674 0.033676 0.0230906
49 0.023533 0.0186582 0.0322425 0.024522 0.0323951 0.0234074
56 0.0256256 0.0192427 0.0325035 0.0238847 0.0320328 0.0229931
63 0.0243151 0.0186746 0.0305981 0.0238163 0.0329033 0.0248569
70 0.0248755 0.0183766 0.0319791 0.0240259 0.030027 0.0249108
77 0.0273765 0.0196273 0.0334131 0.0258656 0.0301672 0.0252148
84 0.0245008 0.0190813 0.0321181 0.0254396 0.0318123 0.024127
91 0.0261095 0.0189957 0.0324963 0.024076 0.0352997 0.0266595

AR(1), Phi = 0.2 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.38139 0.306003 0.392677 0.29845 0.354507 0.271464
14 0.342032 0.253567 0.342878 0.257623 0.331252 0.2424
21 0.35172 0.272789 0.352419 0.264814 0.335419 0.247718
28 0.319149 0.253468 0.31756 0.250856 0.32715 0.261323
35 0.326863 0.254345 0.334778 0.256939 0.31368 0.260513
42 0.333907 0.25342 0.333702 0.243018 0.336508 0.265347
49 0.333057 0.249052 0.330219 0.250289 0.325874 0.24334
56 0.301564 0.237684 0.313363 0.238062 0.337911 0.229339
63 0.302716 0.245965 0.301037 0.237617 0.316545 0.243187
70 0.327173 0.241162 0.31764 0.233349 0.329869 0.254341
77 0.336631 0.248023 0.334287 0.260799 0.349892 0.248119
84 0.351592 0.274131 0.3299 0.262834 0.357879 0.281081
91 0.32465 0.247156 0.326182 0.245546 0.32928 0.253065

AR(1), Phi = 0.4 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.347751 0.299149 0.390727 0.306476 0.364199 0.279019
14 0.325776 0.238409 0.343638 0.255554 0.329434 0.243461
21 0.335923 0.258112 0.355247 0.269889 0.330048 0.242528
28 0.296795 0.243743 0.308151 0.250134 0.326396 0.260502
35 0.306189 0.237778 0.338347 0.252034 0.323778 0.254907
42 0.322171 0.241646 0.335509 0.241428 0.348009 0.26687
49 0.316576 0.234991 0.333048 0.251215 0.330335 0.228512
56 0.291376 0.216829 0.31243 0.237996 0.348512 0.223573
63 0.281327 0.232358 0.299327 0.241651 0.316732 0.236971
70 0.309928 0.232405 0.313543 0.236042 0.331856 0.26139
77 0.310007 0.229515 0.332554 0.259932 0.349474 0.23041
84 0.336253 0.263173 0.327152 0.262246 0.358995 0.289287
91 0.296121 0.234921 0.325865 0.247179 0.313798 0.251826

AR(1), Phi = 0.6 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.307943 0.263169 0.37496 0.301021 0.371735 0.278125
14 0.307428 0.223353 0.353835 0.251068 0.348612 0.237905
21 0.310812 0.233441 0.35712 0.269012 0.330795 0.252261
28 0.277113 0.221982 0.310103 0.248037 0.322081 0.260747
35 0.279484 0.209045 0.337314 0.247351 0.328698 0.240144
42 0.292355 0.223595 0.33452 0.24513 0.348842 0.270071
49 0.29102 0.21398 0.334398 0.250584 0.330437 0.21899
56 0.270524 0.200744 0.312004 0.237462 0.332042 0.238403
63 0.258704 0.208697 0.298976 0.245855 0.323089 0.213562
70 0.286486 0.218129 0.314066 0.236505 0.333545 0.265274
77 0.279932 0.209259 0.326321 0.255531 0.342333 0.218732
84 0.314422 0.237012 0.331568 0.26181 0.354175 0.276431
91 0.277903 0.217753 0.321006 0.242536 0.31019 0.247713

AR(1), Phi = 0.8 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.259304 0.216488 0.333315 0.277454 0.355098 0.285448
14 0.283741 0.212362 0.355396 0.255216 0.349753 0.242503
21 0.273872 0.202194 0.342586 0.265981 0.340852 0.242832
28 0.254003 0.197722 0.318236 0.244842 0.316621 0.259313
35 0.246502 0.180914 0.326819 0.243984 0.31211 0.22833
42 0.259429 0.203519 0.32567 0.249782 0.346875 0.272505
49 0.258301 0.19531 0.332153 0.249972 0.323784 0.241
56 0.25604 0.180322 0.31242 0.239142 0.315522 0.242324
63 0.235476 0.186571 0.301207 0.247027 0.322447 0.209751
70 0.254925 0.202237 0.310341 0.235972 0.335687 0.247956
77 0.256408 0.194872 0.323638 0.257339 0.322778 0.252672
84 0.276271 0.211405 0.331813 0.259193 0.344766 0.250204
91 0.264841 0.20182 0.324714 0.241684 0.312554 0.263736

AR(1), Phi = -0.2 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.376851 0.282834 0.38709 0.284361 0.302846 0.249494
14 0.340456 0.257038 0.346351 0.254526 0.316918 0.247548
21 0.339462 0.244912 0.336157 0.258111 0.32734 0.253803
28 0.300456 0.243357 0.313297 0.247524 0.309493 0.240488
35 0.320428 0.24916 0.32259 0.250828 0.327583 0.234496
42 0.319647 0.223714 0.321378 0.236195 0.320213 0.23148
49 0.31632 0.249957 0.323948 0.253216 0.316303 0.257331
56 0.306311 0.228813 0.316163 0.238816 0.29814 0.234448
63 0.303078 0.237807 0.292594 0.240185 0.303751 0.249762
70 0.316794 0.226523 0.322805 0.228581 0.321843 0.237338
77 0.339525 0.258597 0.319507 0.256247 0.335617 0.266224
84 0.316273 0.243989 0.325609 0.266298 0.30869 0.232996
91 0.333653 0.247213 0.324851 0.239101 0.344324 0.264374

AR(1), Phi = -0.4 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.353347 0.277411 0.398411 0.296959 0.295805 0.229466
14 0.321108 0.243649 0.347624 0.25344 0.311871 0.238452
21 0.308967 0.219191 0.326622 0.251906 0.316081 0.253699
28 0.280688 0.215046 0.309531 0.246578 0.313381 0.218947
35 0.300658 0.239316 0.321524 0.252695 0.331602 0.230981
42 0.289152 0.21309 0.314798 0.238053 0.311088 0.225711
49 0.293285 0.233508 0.322981 0.249213 0.31651 0.264321
56 0.286415 0.22025 0.317877 0.238057 0.291859 0.228057
63 0.290485 0.222906 0.293477 0.240523 0.304536 0.247413
70 0.29397 0.204657 0.321919 0.230321 0.311496 0.230369
77 0.320161 0.244248 0.31846 0.254783 0.322802 0.261699
84 0.285581 0.220509 0.329712 0.266236 0.291335 0.212617
91 0.3064 0.236447 0.321163 0.23502 0.356815 0.26816

AR(1), Phi = -0.6 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.318492 0.232906 0.390068 0.281441 0.301754 0.22885
14 0.289429 0.231579 0.340076 0.255049 0.308864 0.24586
21 0.278297 0.203964 0.327399 0.250274 0.315749 0.240975
28 0.26411 0.181755 0.316572 0.24572 0.319133 0.209282
35 0.279091 0.222068 0.32115 0.255672 0.331085 0.240164
42 0.26539 0.193807 0.312565 0.236759 0.313237 0.229651
49 0.264599 0.212274 0.321298 0.245609 0.322214 0.261174
56 0.262028 0.206829 0.320558 0.237875 0.292057 0.226194
63 0.268304 0.205826 0.294576 0.241118 0.310454 0.241307
70 0.266911 0.192708 0.323777 0.233137 0.292747 0.238368
77 0.292489 0.223597 0.32155 0.253854 0.303809 0.253525
84 0.254179 0.203272 0.326972 0.263609 0.294514 0.198106
91 0.276734 0.210698 0.325343 0.236015 0.350971 0.271477

AR(1), Phi = -0.8 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.269678 0.203483 0.36523 0.271904 0.304596 0.26233
14 0.262556 0.2094 0.324071 0.256902 0.319572 0.255121
21 0.244723 0.199921 0.320987 0.26346 0.294775 0.235084
28 0.23781 0.165338 0.315206 0.246977 0.311703 0.241057
35 0.260011 0.202407 0.327296 0.279894 0.335508 0.244695
42 0.238481 0.175593 0.309183 0.238977 0.327367 0.233924
49 0.237493 0.191511 0.317936 0.2431 0.335667 0.245734
56 0.237685 0.187732 0.323484 0.238011 0.308641 0.218936
63 0.246643 0.188793 0.293679 0.244376 0.325211 0.242987
70 0.24104 0.176551 0.32349 0.237861 0.298269 0.234203
77 0.263969 0.200307 0.327019 0.256515 0.297397 0.253414
84 0.244587 0.197774 0.331663 0.264574 0.297358 0.22662
91 0.255776 0.186003 0.325213 0.238515 0.349287 0.265774
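Since the tabulated figures are Monte Carlo summaries, the same kind of comparison can be reproduced by simulation. The sketch below is hypothetical and uses only the Python standard library: it generates stationary Gaussian AR(1) series of length 100, treats the value at one position as missing, and records the absolute error of three estimates that take φ as known (the two-sided conditional mean φ(x_{t−1} + x_{t+1})/(1 + φ²), a forward forecast φx_{t−1} and a backcast φx_{t+1}). These stand-in estimators, the function names and the reading of "Var" as the innovation standard deviation are our assumptions; the thesis's own R, RF and RB procedures may estimate the model from the data and so behave differently.

```python
import math
import random
import statistics


def simulate_ar1(n, phi, sigma, rng):
    """Stationary Gaussian AR(1): x[t] = phi * x[t-1] + e[t], e[t] ~ N(0, sigma**2)."""
    # Draw x[0] from the stationary distribution so no burn-in is needed.
    x = [rng.gauss(0.0, sigma / math.sqrt(1.0 - phi ** 2))]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def abs_error_summary(position, phi, sigma, n=100, reps=4000, seed=1):
    """Mean and SD of absolute errors when x[position] (0-based) is treated as missing."""
    if not 1 <= position <= n - 2:
        raise ValueError("position needs an observed neighbour on each side")
    rng = random.Random(seed)
    errors = {"two_sided": [], "forward": [], "backward": []}
    for _ in range(reps):
        x = simulate_ar1(n, phi, sigma, rng)
        before, true, after = x[position - 1], x[position], x[position + 1]
        errors["two_sided"].append(abs(phi * (before + after) / (1 + phi ** 2) - true))
        errors["forward"].append(abs(phi * before - true))   # one-step forecast
        errors["backward"].append(abs(phi * after - true))   # one-step backcast
    return {name: (statistics.mean(e), statistics.stdev(e)) for name, e in errors.items()}


if __name__ == "__main__":
    for name, (mad, sd) in abs_error_summary(49, 0.8, 0.04).items():
        print(f"{name:9s} MAD={mad:.4f} SD={sd:.4f}")
```

For φ = 0.8 and σ = 0.04, theory gives mean absolute errors of about 0.025 for the two-sided estimate and 0.032 for either one-sided estimate, and the simulated values should lie within Monte Carlo error of these.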

MA(1), Theta = 0.2 and Var = 0.04
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.0385988 0.0308767 0.0396807 0.0307202 0.0348594 0.0262824
14 0.0336716 0.0270373 0.034222 0.0270041 0.0331728 0.0245039
21 0.0340583 0.0248417 0.0334394 0.0244368 0.0335632 0.0228228
28 0.030936 0.0265798 0.031131 0.0249466 0.0312243 0.0263726
35 0.0322371 0.0254182 0.0331353 0.025865 0.0309033 0.0259661
42 0.0322804 0.0246727 0.0332459 0.0247309 0.0315449 0.0255425
49 0.0328478 0.0250577 0.0330999 0.0250918 0.0325235 0.0249681
56 0.031343 0.0230826 0.0317514 0.0236345 0.034077 0.0229055
63 0.0333329 0.0240432 0.033364 0.0234598 0.0338565 0.0240345
70 0.0307511 0.0227241 0.0318333 0.0218672 0.031548 0.0241048
77 0.0321622 0.0247661 0.0325592 0.0239951 0.0329076 0.0256682
84 0.0328968 0.0274106 0.032408 0.02629 0.0338359 0.0277199
91 0.0348789 0.0281762 0.0340066 0.0267368 0.0345848 0.028019

MA(1), Theta = 0.4 and Var = 0.04
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.0352953 0.0289647 0.0388875 0.0304686 0.0375354 0.0268733
14 0.0321077 0.0242567 0.0349263 0.0268522 0.0335647 0.0237625
21 0.0337663 0.0235949 0.0334864 0.0250469 0.0339687 0.0226985
28 0.030186 0.0259466 0.0315892 0.0247905 0.0319964 0.025846
35 0.0304925 0.0235124 0.0329755 0.0259252 0.0314192 0.0254668
42 0.0300947 0.0227491 0.0325546 0.024731 0.030746 0.0253337
49 0.0311608 0.0234025 0.0330913 0.0247844 0.0334705 0.0242442
56 0.0309698 0.0215449 0.0318397 0.0236161 0.0357144 0.023557
63 0.0302474 0.0223686 0.0330127 0.0232125 0.0332569 0.023073
70 0.0291299 0.0220816 0.0314902 0.0220266 0.0325226 0.0254594
77 0.027348 0.0222405 0.0322257 0.0235667 0.0317965 0.0237771
84 0.0317439 0.0252918 0.0328819 0.0259348 0.0351574 0.0278728
91 0.0332974 0.0255416 0.0333762 0.0259195 0.0341139 0.026922

MA(1), Theta = 0.6 and Var = 0.04
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.0340039 0.0271842 0.040184 0.0306512 0.039453 0.0272824
14 0.0290054 0.0218702 0.0355093 0.0271114 0.0333276 0.0213889
21 0.0298882 0.0217305 0.0338638 0.0257619 0.0317174 0.0232102
28 0.027171 0.0226906 0.0310661 0.0232594 0.0318467 0.0242996
35 0.0276286 0.0213796 0.032844 0.0261796 0.0323987 0.0250175
42 0.0267374 0.0203918 0.0326243 0.0244792 0.0301565 0.0263065
49 0.0275151 0.0201343 0.0327503 0.0243834 0.0335085 0.0241168
56 0.0280984 0.0206939 0.0322695 0.0235621 0.0356729 0.0264981
63 0.0278318 0.0200042 0.0328909 0.0234392 0.032261 0.023216
70 0.025194 0.018944 0.0319484 0.0230057 0.0322215 0.025216
77 0.0236167 0.018012 0.031681 0.0235896 0.0321066 0.0243048
84 0.028766 0.0210785 0.032733 0.0256838 0.0348732 0.0281259
91 0.0300849 0.0221525 0.0338671 0.0262404 0.0338975 0.0267939

MA(1), Theta = 0.8 and Var = 0.04
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.0290583 0.0273909 0.044563 0.0309699 0.0388214 0.028812
14 0.0244912 0.0200414 0.0354718 0.0266229 0.0325798 0.0230112
21 0.0214376 0.0160924 0.033697 0.0236568 0.0279409 0.0224326
28 0.0210452 0.0168441 0.031311 0.0234256 0.0301271 0.0216654
35 0.0218347 0.0178179 0.0330702 0.0262258 0.0322516 0.0248353
42 0.0194951 0.014962 0.0322019 0.0245005 0.0300264 0.026784
49 0.0210887 0.0155766 0.0326344 0.0240657 0.0342948 0.0232509
56 0.0221642 0.016528 0.0321025 0.0242082 0.0354868 0.0288111
63 0.0212525 0.0159207 0.032808 0.0234319 0.0322647 0.0219167
70 0.0186874 0.0142388 0.0316811 0.0237323 0.0331364 0.0253297
77 0.0181389 0.0147707 0.0318353 0.0237466 0.0319389 0.0242705
84 0.022215 0.0179306 0.0318199 0.0258917 0.0339141 0.027118
91 0.0260839 0.0205974 0.0344244 0.0270749 0.0358881 0.0289346
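In the Theta 0.8 table above, MAD R lies well below MAD RF and MAD RB away from the ends of the series. This is consistent with a standard result for interpolating a single value of an invertible process: if its AR(∞) form is Σ π_j x_{t−j} = e_t with π_0 = 1, the best linear interpolator using the whole past and future has error variance σ²/Σ π_j². For an MA(1), |π_j| = |θ|^j, so the error variance is σ²(1 − θ²), against σ² for a one-step forecast or backcast. The sketch below evaluates this assuming, as our own reading rather than the thesis's definitions, that R is the two-sided estimate, RF and RB are one-sided, and "Var" is the innovation standard deviation σ. It ignores finite-sample and edge effects, which may account for the higher MAD R rows near positions 7 and 91.

```python
import math


def ma1_error_sds(theta, sigma):
    """Error SDs for one missing value deep inside an invertible MA(1) series.

    Returns (two_sided, one_sided). The two-sided interpolation error variance
    is sigma**2 / sum_j pi_j**2 with |pi_j| = |theta|**j, i.e.
    sigma**2 * (1 - theta**2); a one-step forecast or backcast has sigma**2.
    """
    if abs(theta) >= 1:
        raise ValueError("MA(1) must be invertible: |theta| < 1")
    return sigma * math.sqrt(1.0 - theta ** 2), sigma


def mean_abs_gaussian(sd):
    """E|e| for a Gaussian error e ~ N(0, sd**2)."""
    return sd * math.sqrt(2.0 / math.pi)


if __name__ == "__main__":
    sigma = 0.04  # assumption: "Var 0.04" read as the innovation SD
    for theta in (0.2, 0.4, 0.6, 0.8):
        two_sided, one_sided = ma1_error_sds(theta, sigma)
        print(f"theta={theta:.1f}  two-sided MAD={mean_abs_gaussian(two_sided):.4f}  "
              f"one-sided MAD={mean_abs_gaussian(one_sided):.4f}")
```

For θ = ±0.8 and σ = 0.04 this gives about 0.0191 for the two-sided estimate against 0.0319 for a one-sided one; for θ = ±0.2 the two are close (about 0.0313 against 0.0319). Only θ² enters, so the sign of θ does not matter in this idealised calculation.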

MA(1), Theta = -0.2 and Var = 0.04
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.0384102 0.0257105 0.0383725 0.02605 0.0305345 0.0252273
14 0.033431 0.0270912 0.0340955 0.0268442 0.032694 0.0258565
21 0.0332083 0.0232394 0.0336205 0.0240432 0.0310729 0.0238828
28 0.0309238 0.0249945 0.032404 0.0253479 0.0313696 0.0241256
35 0.0336864 0.0256003 0.0332776 0.0258507 0.0326537 0.023715
42 0.032448 0.0235139 0.0317703 0.0239357 0.0329894 0.0242658
49 0.0335192 0.0255948 0.0335996 0.0254506 0.0321173 0.0257044
56 0.0323681 0.0229247 0.0316245 0.0236167 0.031603 0.0242178
63 0.0312506 0.0236646 0.0316226 0.023145 0.0308465 0.0245969
70 0.0301567 0.02236 0.0313841 0.0232527 0.0311265 0.0221162
77 0.0324724 0.0243971 0.0317101 0.0241789 0.0323908 0.0248439
84 0.0333974 0.0247861 0.0330342 0.0264072 0.0320871 0.0248244
91 0.034387 0.0266744 0.0323608 0.0250021 0.035926 0.027474

MA(1), Theta = -0.4 and Var = 0.04
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.0351231 0.0246511 0.0384694 0.026702 0.0287566 0.02267
14 0.0317687 0.0249296 0.0345635 0.0264959 0.0322584 0.0246515
21 0.030495 0.0212275 0.0336648 0.0240011 0.0306575 0.0233688
28 0.0281271 0.0217082 0.0319492 0.0252255 0.0309124 0.021554
35 0.0307866 0.0241351 0.0329644 0.0257205 0.0320268 0.0232217
42 0.0297624 0.0222733 0.0315405 0.0240591 0.0326454 0.0249022
49 0.0313248 0.0237694 0.033526 0.0257553 0.0302254 0.0253545
56 0.0302298 0.0215864 0.0315901 0.0231043 0.0311624 0.0240704
63 0.0291269 0.0223772 0.0310722 0.0237061 0.0306829 0.0237078
70 0.0276998 0.0221499 0.0315282 0.0234094 0.0311057 0.0240092
77 0.0314844 0.0233588 0.0319683 0.0240053 0.0335095 0.0249953
84 0.0317838 0.0225889 0.0331108 0.0261882 0.0313646 0.0254072
91 0.0343653 0.0256636 0.0328752 0.0253194 0.0388411 0.027594

MA(1), Theta = -0.6 and Var = 0.04
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.030461 0.0246031 0.0377911 0.0289662 0.0282651 0.0211742
14 0.0300374 0.0235605 0.034793 0.0270875 0.034323 0.0249579
21 0.0256411 0.0186823 0.0328074 0.0233332 0.0312448 0.0222147
28 0.0247913 0.019029 0.0312208 0.0255656 0.030332 0.0198875
35 0.0259724 0.0200421 0.0320598 0.0248516 0.0305786 0.0249031
42 0.0257339 0.0198106 0.0318664 0.0245488 0.0314841 0.0268013
49 0.0273979 0.0207157 0.0335161 0.0253331 0.0280172 0.0254204
56 0.0261856 0.0200333 0.031799 0.0237268 0.0317088 0.0239972
63 0.0269248 0.0206237 0.0309529 0.0242638 0.0305864 0.0240163
70 0.0239145 0.0196823 0.031407 0.0235089 0.0309078 0.0252262
77 0.0271865 0.0209445 0.0322787 0.0240723 0.032888 0.0261981
84 0.0266601 0.019723 0.0337867 0.026148 0.0314832 0.0240076
91 0.029576 0.0232837 0.032559 0.0253916 0.0372466 0.0289345

MA(1), Theta = -0.8 and Var = 0.04
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.0315254 0.0274856 0.0378521 0.0313679 0.0310246 0.0245616
14 0.0246491 0.0186035 0.035623 0.0252711 0.0352744 0.0260537
21 0.0196531 0.0155102 0.0334281 0.0242235 0.0294615 0.0209984
28 0.0184709 0.0159549 0.0314212 0.0245214 0.0291153 0.0183359
35 0.0173798 0.0140527 0.0314947 0.0244833 0.0306144 0.0246817
42 0.0183623 0.014978 0.0323489 0.0242187 0.0331517 0.0261697
49 0.0199387 0.015791 0.032992 0.025279 0.0291946 0.0242975
56 0.0208561 0.0157988 0.0322275 0.0238391 0.0330658 0.0232624
63 0.0208799 0.0171015 0.0307721 0.0245566 0.0299606 0.0239729
70 0.0173334 0.0149965 0.0315973 0.0238362 0.0325987 0.0246725
77 0.0213724 0.0166519 0.0332358 0.0267882 0.0323394 0.0259316
84 0.0214994 0.0152857 0.033381 0.0260322 0.0326923 0.0254503
91 0.0220972 0.0171824 0.0315524 0.0237744 0.0351633 0.027417

MA(1), Theta = 0.2 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.378455 0.306622 0.387695 0.302522 0.353246 0.269399
14 0.335494 0.264187 0.334643 0.261153 0.337764 0.245631
21 0.341613 0.267928 0.339255 0.263067 0.3388 0.243107
28 0.320216 0.270409 0.320961 0.256739 0.321355 0.269537
35 0.321281 0.254731 0.329256 0.259968 0.308499 0.261105
42 0.330477 0.251557 0.335319 0.248588 0.328015 0.261315
49 0.330293 0.246294 0.327108 0.245085 0.328332 0.2428
56 0.310213 0.231367 0.315908 0.235356 0.339558 0.231349
63 0.315584 0.244476 0.305062 0.236265 0.320989 0.245478
70 0.313903 0.241702 0.318592 0.231319 0.319475 0.253616
77 0.330492 0.258099 0.333377 0.256767 0.342126 0.265274
84 0.338258 0.28521 0.328155 0.265446 0.348213 0.289361
91 0.33476 0.256105 0.328037 0.244308 0.33542 0.25678

MA(1), Theta = 0.4 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.345273 0.284047 0.380909 0.293155 0.36909 0.27244
14 0.30906 0.241297 0.332621 0.26237 0.33389 0.240825
21 0.332783 0.257762 0.339825 0.270305 0.336622 0.236887
28 0.302616 0.262311 0.315625 0.253746 0.319131 0.267601
35 0.301443 0.23569 0.328226 0.261055 0.312752 0.256576
42 0.31761 0.232771 0.335891 0.24887 0.332896 0.266169
49 0.314172 0.234731 0.325225 0.242923 0.336326 0.242194
56 0.305298 0.217173 0.3168 0.235665 0.355045 0.237537
63 0.294716 0.224357 0.303218 0.239061 0.323329 0.231519
70 0.295959 0.234904 0.317597 0.231564 0.32509 0.26279
77 0.291985 0.232233 0.334195 0.25455 0.334949 0.249539
84 0.321646 0.270443 0.3234 0.262498 0.360186 0.298086
91 0.326635 0.24707 0.328822 0.244268 0.341656 0.260441

MA(1), Theta = 0.6 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.344216 0.272764 0.402938 0.306485 0.382658 0.277612
14 0.290161 0.211781 0.351347 0.259535 0.331317 0.218736
21 0.301543 0.237762 0.349069 0.281757 0.32275 0.245123
28 0.275408 0.229044 0.320107 0.240317 0.318805 0.251184
35 0.268765 0.211835 0.328646 0.262827 0.32573 0.253261
42 0.284223 0.215195 0.339304 0.249828 0.334627 0.283254
49 0.274653 0.205375 0.323187 0.24004 0.334966 0.246097
56 0.280337 0.205834 0.320877 0.2372 0.352512 0.267633
63 0.271855 0.191418 0.302611 0.240438 0.317746 0.233542
70 0.2577 0.199469 0.319828 0.233452 0.320111 0.256721
77 0.243765 0.190448 0.327593 0.251761 0.331028 0.252811
84 0.294136 0.230261 0.328347 0.258232 0.361337 0.295354
91 0.29593 0.220513 0.324867 0.240718 0.340759 0.258081

MA(1), Theta = 0.8 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.29041 0.274609 0.427648 0.30683 0.374506 0.289892
14 0.239003 0.182175 0.345714 0.251832 0.318086 0.222204
21 0.225036 0.171178 0.34839 0.261518 0.285822 0.228055
28 0.216327 0.167834 0.324095 0.241853 0.306212 0.215094
35 0.219071 0.170427 0.333893 0.262895 0.330387 0.252601
42 0.210357 0.161065 0.328377 0.251578 0.325274 0.294969
49 0.220122 0.165862 0.32073 0.241082 0.350069 0.241291
56 0.227681 0.164311 0.317471 0.243121 0.353227 0.2901
63 0.209156 0.153698 0.306184 0.239928 0.317578 0.22066
70 0.193118 0.148944 0.311852 0.241089 0.324846 0.254643
77 0.185647 0.152152 0.324577 0.250436 0.326919 0.254325
84 0.234461 0.191539 0.31599 0.257026 0.355076 0.282387
91 0.260783 0.205792 0.333215 0.246248 0.363858 0.275858

MA(1), Theta = -0.2 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.375456 0.244439 0.375168 0.24417 0.303383 0.250009
14 0.322415 0.260832 0.331555 0.257844 0.314095 0.25134
21 0.347552 0.24947 0.346393 0.261585 0.326732 0.254292
28 0.31329 0.25053 0.320794 0.252953 0.313455 0.238291
35 0.332687 0.25245 0.332987 0.252747 0.326512 0.236987
42 0.329451 0.232314 0.322074 0.231751 0.331394 0.238268
49 0.327088 0.250538 0.329921 0.251298 0.317728 0.256758
56 0.311843 0.227686 0.316329 0.237058 0.304199 0.237023
63 0.295632 0.244839 0.293447 0.243313 0.293821 0.25204
70 0.308504 0.231837 0.321316 0.234936 0.314723 0.234522
77 0.316936 0.259217 0.320688 0.256237 0.315346 0.261297
84 0.329951 0.251283 0.327155 0.266218 0.319178 0.247616
91 0.345233 0.265668 0.32341 0.243586 0.352874 0.272725

MA(1), Theta = -0.4 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.344096 0.245427 0.373023 0.261789 0.285203 0.232029
14 0.318402 0.24233 0.345176 0.257974 0.317053 0.244637
21 0.320233 0.224038 0.342921 0.25958 0.319901 0.250569
28 0.282587 0.221127 0.316255 0.251738 0.302903 0.214332
35 0.303357 0.238117 0.330929 0.253157 0.320014 0.232313
42 0.298518 0.223935 0.314461 0.23501 0.314685 0.246377
49 0.30763 0.238691 0.331571 0.254952 0.305457 0.264706
56 0.285048 0.216938 0.315507 0.231738 0.297279 0.233441
63 0.281742 0.228525 0.290257 0.247786 0.294496 0.241085
70 0.279031 0.220752 0.317242 0.240655 0.307019 0.235199
77 0.30898 0.252904 0.322582 0.254289 0.325677 0.264413
84 0.30689 0.220392 0.332878 0.263671 0.308928 0.247843
91 0.333814 0.247985 0.317341 0.240451 0.375892 0.266331

MA(1), Theta = -0.6 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.315328 0.258954 0.375947 0.294683 0.294188 0.222028
14 0.30432 0.228233 0.342832 0.259866 0.334481 0.249902
21 0.271156 0.1942 0.335578 0.252336 0.330231 0.2317
28 0.24858 0.194375 0.317915 0.255043 0.293467 0.200928
35 0.255353 0.199661 0.321597 0.246514 0.307645 0.249149
42 0.259222 0.204616 0.314691 0.242897 0.307433 0.26392
49 0.265699 0.209304 0.334024 0.254018 0.289889 0.265718
56 0.244545 0.191858 0.317228 0.236948 0.298807 0.229903
63 0.261604 0.209193 0.292367 0.250771 0.297351 0.242128
70 0.242282 0.190489 0.318631 0.241138 0.303264 0.233574
77 0.273429 0.228763 0.322946 0.255396 0.327962 0.269785
84 0.257431 0.192889 0.332658 0.262735 0.31134 0.23746
91 0.288064 0.221178 0.318 0.24245 0.369448 0.273836

MA(1), Theta = -0.8 and Var = 0.4
[Plot of MAD R, MAD RF and MAD RB against position (0 to 100)]

Position   MAD R   SD R   MAD RF   SD RF   MAD RB   SD RB


7 0.340851 0.281864 0.381003 0.315096 0.33118 0.256379
14 0.254114 0.187364 0.346643 0.254348 0.34585 0.257446
21 0.19874 0.162586 0.344719 0.256388 0.302306 0.21614
28 0.18957 0.160901 0.31005 0.245833 0.293874 0.181866
35 0.18517 0.144938 0.326239 0.278479 0.304074 0.253179
42 0.190175 0.153621 0.31301 0.237991 0.321941 0.255233
49 0.195208 0.154468 0.32982 0.253911 0.300794 0.249364
56 0.197138 0.15786 0.32194 0.23852 0.308494 0.233854
63 0.214619 0.172694 0.292953 0.251665 0.300516 0.249827
70 0.174406 0.146451 0.316203 0.239624 0.30197 0.23589
77 0.213287 0.170897 0.327076 0.265134 0.332153 0.261977
84 0.22023 0.163007 0.343387 0.266193 0.345173 0.257615
91 0.226075 0.163026 0.314155 0.234695 0.350704 0.264876
