TSA Assignment
2022-11-19
As I am from NB-Stream with even roll no. (MD2106), I have chosen Dataset-4 for the analysis.
The visitors dataset presents the monthly Australian short-term overseas visitors from May 1985 to April
2005 (in thousands). As the data is indexed by time, it is time series data. The plot of the data against
the corresponding time points is given below:
[Figure: time plot of visitors (in thousands)]
From the above plot, we can observe that the data has an approximately linear increasing trend. Also, the same
pattern repeats every 12 months, which indicates a seasonal component with period 12.
We choose a multiplicative time series model for the analysis, consistent with the decomposition shown below.
The trend and seasonal components can be easily observed in the following plot.
[Figure: Decomposition of multiplicative time series, with panels for data, trend, seasonal and remainder]
The trend component can be approximated by a line with positive slope, hence it is approximately linearly
increasing.
The seasonal component is very evident from the above plot, where the same pattern can be seen to
repeat every year.
The remainder is the random noise component, which we attempt to model as a stationary time series in
this analysis.
Since the decompose function uses moving averages to estimate the trend, the random component is NA for the first
and last 6 data points. Hence the length is reduced by 12. The following is a plot of the noise component.
From the plot below, we can see that the noise component is spread fairly randomly about a constant level
(close to 1, since the remainder of a multiplicative decomposition is a ratio).
Hence we do not have to apply any transformation to it in order to model the noise component as a stationary time
series.
[Figure: noise component plotted against Time]
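The loss of 6 points at each end can be reproduced with a small sketch of a centered moving average of period 12. This is an illustration in Python on a made-up series, not the code used for this report; the function name `centered_ma` and the toy data are assumptions introduced here.

```python
def centered_ma(x, period=12):
    """Centered moving average for an even period (a 2 x period MA).

    Returns a list as long as x, with None where the window does not
    fit, i.e. at the first and last period // 2 positions.
    """
    half = period // 2
    n = len(x)
    trend = [None] * n
    for t in range(half, n - half):
        window = x[t - half : t + half + 1]  # period + 1 values
        # The two end points get half weight, so the weights sum to `period`.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


# Toy series (made up, NOT the visitors data): 240 monthly points,
# a straight line plus a pattern that repeats every 12 months.
pattern = [3, -1, 4, -1, -5, 9, -2, -6, 5, -3, -5, 2]  # sums to 0
series = [100 + 2 * t + pattern[t % 12] for t in range(240)]

trend = centered_ma(series, period=12)
missing = sum(v is None for v in trend)
print(missing)    # 12 undefined values: 6 at the start and 6 at the end
print(trend[6])   # the seasonal pattern averages out, leaving the line 100 + 2t
```

Because the window spans exactly one full period, the repeating pattern cancels and only the trend survives, which is why a remainder can only be computed where the trend estimate exists.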
ACF & PACF plots of random component:
The Auto-Correlation Function and Partial Auto-Correlation Function of the random component at different
lags are plotted below:
[Figure: ACF of Series noise against Lag (0 to 20)]
[Figure: Partial ACF of Series noise against Lag (1 to 20)]
We know that for an AR(p) process the PACF cuts off after lag p, and for an MA(q) process the ACF cuts off
after lag q; for a mixed ARMA(p,q) process both tail off, so the plots only suggest candidate orders.
The PACF plot starts to tail off at lag 6, so p=6 is a plausible choice. Also, the
ACF becomes very small just after lag 0, but takes high values again at lags 6, 12 etc. This could be due
to randomness. It is possible that q=0 or q=6. Once we employ a model selection criterion, the values of p
and q will become clearer.
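To make the quantities in these plots concrete, the following sketch computes a sample ACF and derives the PACF from it with the Durbin-Levinson recursion. It is written in plain Python for illustration and is not the code behind the plots above; the function names and the simulated AR(1) series are assumptions introduced here.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag (divisor n at every lag)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + h] for t in range(n - h)) / n / c0
            for h in range(max_lag + 1)]


def pacf_from_acf(rho):
    """PACF at lags 1..len(rho)-1 via the Durbin-Levinson recursion."""
    pacf = []
    phi_prev = []  # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical AR(1) autocorrelations with coefficient 0.5: rho_h = 0.5 ** h.
# The PACF is 0.5 at lag 1 and 0 at every later lag (a cut-off after lag p = 1).
print(pacf_from_acf([0.5 ** h for h in range(6)]))

# Simulated AR(1) series (made-up data): sample values only roughly follow theory.
random.seed(1)
x = [0.0]
for _ in range(499):
    x.append(0.5 * x[-1] + random.gauss(0.0, 1.0))
acf = sample_acf(x, 10)
pacf = pacf_from_acf(acf)
bound = 1.96 / len(x) ** 0.5  # approximate 95% bounds under iid noise
print([round(v, 3) for v in pacf], round(bound, 3))
```

Sample PACF values that stay inside the approximate bounds are usually treated as zero, which is how a cut-off lag is read from the plot.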
Fitting ARMA model
We shall fit ARMA(6,0) and ARMA(6,6) models for now, find the coefficients and residuals, and compare them.
Later, we will use a model selection criterion.
We have fitted an ARMA(6,0) model to the noise data and the coefficients are as follows:
The results of the portmanteau tests for the hypothesis that the residuals are iid noise, together with the
ACF, PACF, residual and Normal Q-Q plots, are shown below:
[Figure: ACF and PACF of the residuals against Lag (0 to 40), with residual and Normal Q-Q plots]
From the Portmanteau tests, we see that 3 out of 5 tests agree with the hypothesis that the residuals are iid
noise.
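As an illustration of what a portmanteau test computes, the sketch below evaluates the Ljung-Box statistic Q = n(n+2) Σ ρ̂(h)²/(n−h) over lags 1 to H and compares it with a chi-square critical value. It is not the test output of this report: the two series are made up, H = 20 is an assumed choice, and the hard-coded value 31.410 is the 95% quantile of the chi-square distribution with 20 degrees of freedom. For residuals of a fitted ARMA(p,q) model the degrees of freedom are commonly reduced to H − p − q.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + h] for t in range(n - h)) / n / c0
            for h in range(max_lag + 1)]


def ljung_box(x, max_lag=20):
    """Ljung-Box Q statistic using lags 1..max_lag."""
    n = len(x)
    rho = sample_acf(x, max_lag)
    return n * (n + 2) * sum(rho[h] ** 2 / (n - h) for h in range(1, max_lag + 1))


CHI2_95_DF20 = 31.410  # 95% quantile of chi-square with 20 degrees of freedom

random.seed(0)
white = [random.gauss(0.0, 1.0) for _ in range(228)]  # made-up iid series
alternating = [1.0, -1.0] * 50                        # strongly dependent series

for name, series in [("white noise", white), ("alternating", alternating)]:
    q = ljung_box(series, 20)
    verdict = "reject iid" if q > CHI2_95_DF20 else "do not reject iid"
    print(name, round(q, 2), verdict)
```

A large Q means the residual autocorrelations are jointly too big to be consistent with iid noise, so a good fit should leave Q below the critical value.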
Fitting ARMA(6,6) model:
We have fitted an ARMA(6,6) model to the noise data and the coefficients are as follows:
The results of the portmanteau tests for the hypothesis that the residuals are iid noise, together with the
ACF, PACF, residual and Normal Q-Q plots, are shown below:
[Figure: ACF and PACF of the residuals against Lag (0 to 40), with residual and Normal Q-Q plots]
From the Portmanteau tests, we see that 4 out of 5 tests agree with the hypothesis that the residuals are iid
noise.
ARMA(6,0) is a better model than ARMA(6,6), as both the AIC and BIC are lower for the former
model. Now, based on different information criteria, we will select the best model.
Best model based on AIC:
Here, we select (p,q) such that the AIC of the fitted ARMA(p,q) model is lowest.
Similarly, for the BIC, we select (p,q) such that the BIC of the fitted ARMA(p,q) model is lowest.
[Figure: ACF and PACF of the residuals against Lag (0 to 40), with residual and Normal Q-Q plots]
## [1] "The AIC of the ARMA(5,0) model is -811.479590103413"
Note that both the AIC and BIC are lower for this model compared to previous models.
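For reference, the information criteria can be written as small functions of the log likelihood, and the preferred model is the one with the smallest value. The sketch below is illustrative Python, not the code used in this report. As a check of the formulas it plugs in the log likelihood printed in the SARIMA output of this report (-966.83); the parameter count k = 6 (five coefficients plus σ²) and the effective sample size n = 228 (240 months minus 12 lost to seasonal differencing) are assumptions made here, and the helper `best_model` is hypothetical.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * loglik


def aicc(loglik, k, n):
    """Small-sample corrected AIC."""
    return aic(loglik, k) + 2 * k * (k + 1) / (n - k - 1)


def bic(loglik, k, n):
    """Bayesian information criterion: k log n - 2 log L."""
    return k * math.log(n) - 2 * loglik


def best_model(scores):
    """Return the key with the SMALLEST criterion value (lower is better)."""
    return min(scores, key=scores.get)


loglik = -966.83  # log likelihood reported in the SARIMA output
k = 6             # assumption: ar1, ma1, sma1, sma2, drift and sigma^2
n = 228           # assumption: 240 observations minus 12 from seasonal differencing

print(round(aic(loglik, k), 2))      # 1945.66, matching the reported AIC
print(round(aicc(loglik, k, n), 2))  # 1946.04, matching the reported AICc
print(round(bic(loglik, k, n), 2))   # 1966.24, matching the reported BIC
```

Note that values from models fitted to different data (the decomposition remainder versus the raw visitors series) are on different scales, so the criteria are only comparable among models fitted to the same series.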
Without removing the trend and seasonality components, we try to model the visitors data in this section
using a SARIMA model. The following is the fitted SARIMA model:
## Series: visitors
## ARIMA(1,0,1)(0,1,2)[12] with drift
##
## Coefficients:
##          ar1      ma1     sma1    sma2   drift
##       0.8968  -0.3187  -0.7110  0.1461  1.4820
## s.e.  0.0379   0.0804   0.0753  0.0723  0.2667
##
## sigma^2 = 279.9: log likelihood = -966.83
## AIC=1945.66 AICc=1946.04 BIC=1966.24
Here {X_t} denotes the visitors data indexed by time and {Z_t} is a WhiteNoise(0, σ²) process. The
estimated coefficients and σ² are as displayed above.
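In backshift notation, the reported orders ARIMA(1,0,1)(0,1,2)[12] with drift correspond to a model of the following form. This is a sketch written from the output above, not a recovered original equation: the symbols φ, θ, Θ₁, Θ₂ and δ are labels introduced here for ar1, ma1, sma1, sma2 and drift, B is the backshift operator, and treating the drift as a linear time regressor with ARIMA errors is an assumption about how the fitting software parameterises it.

```latex
(1 - \phi B)\,(1 - B^{12})\,(X_t - \delta t)
  = (1 + \theta B)\,(1 + \Theta_1 B^{12} + \Theta_2 B^{24})\, Z_t,
\qquad Z_t \sim \mathrm{WN}(0, \sigma^2),
```

```latex
\hat\phi = 0.8968,\quad \hat\theta = -0.3187,\quad
\hat\Theta_1 = -0.7110,\quad \hat\Theta_2 = 0.1461,\quad
\hat\delta = 1.4820,\quad \hat\sigma^2 = 279.9 .
```

The factor (1 − B¹²) is the single seasonal difference (D = 1), and there is no non-seasonal difference (d = 0).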
To test the hypothesis that the residuals follow iid noise, the Portmanteau tests are used once again and the
results are as follows:
[Figure: ACF and PACF of the residuals against Lag (0 to 40); residuals against time (1985 to 2005); Normal Q-Q plot (Sample Quantiles)]
We see that 4 out of 5 tests agree that the residuals follow iid noise.
Conclusion