Basic Time Series With Python Code
Application –
• AR models can be used to describe the time paths of variables and capture the correlations between their current and past values; they are generally used for forecasting: past values are used to forecast future values.
• Below you can see an example of an ACF and PACF plot. These plots are
called “lollipop plots”
• Both the ACF and PACF start with a lag of 0, which is the correlation of
the time series with itself and therefore results in a correlation of 1.
• The difference between ACF and PACF is the inclusion or exclusion of
indirect correlations in the calculation.
• Additionally, you can see a blue area in the ACF and PACF plots. This blue area depicts the 95% confidence interval and serves as a significance threshold: anything within the blue area is statistically indistinguishable from zero, and anything outside it is statistically non-zero.
• To determine the order of the model, you check: “How many lollipops are above or below the confidence interval before the next lollipop enters the blue area?” (A plotting sketch follows this list.)
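For illustration, here is a minimal sketch of how such lollipop plots can be produced, assuming Python with statsmodels and matplotlib (the series x here is simulated white noise, used purely as a placeholder):

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    # Placeholder series: white noise; substitute your own time series here
    rng = np.random.default_rng(42)
    x = rng.normal(size=128)

    fig, axes = plt.subplots(2, 1, figsize=(8, 6))
    plot_acf(x, lags=20, ax=axes[0])    # ACF; shaded blue area = 95% confidence interval
    plot_pacf(x, lags=20, ax=axes[1])   # PACF; lag 0 always equals 1
    plt.tight_layout()
    plt.show()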
Auto Regressive Process (AR)
Assume that the current value of the series is linearly dependent upon its previous value, with some error. Then we have the linear relationship

X(t) = α_1 · X(t−1) + ε(t),

where ε(t) is a white noise time series. [That is, the ε(t) are a sequence of uncorrelated random variables (possibly, but not necessarily, normally distributed) with mean 0 and variance σ².] This model is called an autoregressive (AR) model, since X is regressed on itself. Here the lag of the autoregression is 1. More generally, we could have an autoregressive model of order p, an AR(p) model, defined by

X(t) = α_1 · X(t−1) + α_2 · X(t−2) + … + α_p · X(t−p) + ε(t).
AR(1) Process
The following time series is an AR(1) process with 128 timesteps and alpha_1 =
0.5. It meets the precondition of stationarity.
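As a sketch (assuming numpy and statsmodels), such a series could be simulated as follows; note that ArmaProcess expects lag-polynomial coefficients, so alpha_1 = 0.5 enters with a minus sign:

    import numpy as np
    from statsmodels.tsa.arima_process import ArmaProcess

    # AR(1): X(t) = 0.5 * X(t-1) + eps(t)
    ar = np.array([1, -0.5])   # lag polynomial 1 - 0.5B
    ma = np.array([1])
    np.random.seed(12)         # seed chosen arbitrarily for reproducibility
    series = ArmaProcess(ar, ma).generate_sample(nsample=128)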
We can make the following observations:
• There are several autocorrelations that are significantly non-zero. Therefore,
the time series is non-random.
• High degree of autocorrelation between adjacent observations (lag = 1) in the PACF plot
• Geometric decay in the ACF plot
Based on these observations, we can use an AR(1) model to model this process.
AR(2) Process
The following time series is an AR(2) process with 128 timesteps, alpha_1 = 0.5,
and alpha_2 = -0.5. It meets the precondition of stationarity.
We can make the following observations:
• There are several autocorrelations that are significantly non-zero. Therefore,
the time series is non-random.
• High degree of autocorrelation between adjacent (lag = 1) and near-adjacent (lag = 2) observations in the PACF plot
• Geometric decay in the ACF plot
As you can see, the fitted AR(2) model estimates alpha_1 = 0.5191 and alpha_2 = -0.5855, which is quite close to the alpha_1 = 0.5 and alpha_2 = -0.5 that we set.
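A sketch of simulating and fitting such an AR(2) process (assuming statsmodels; the exact estimates depend on the random draw, so they will not reproduce the numbers above exactly):

    import numpy as np
    from statsmodels.tsa.arima_process import ArmaProcess
    from statsmodels.tsa.ar_model import AutoReg

    # AR(2): X(t) = 0.5 * X(t-1) - 0.5 * X(t-2) + eps(t)
    ar = np.array([1, -0.5, 0.5])   # lag polynomial 1 - 0.5B + 0.5B^2
    series = ArmaProcess(ar, np.array([1])).generate_sample(nsample=128)

    # Fit an AR(2) without intercept and inspect the estimated alphas
    res = AutoReg(series, lags=2, trend='n').fit()
    print(res.params)   # estimates vary with the simulated draw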
ACF plot
• The sample ACF for the simulated data follows. The pattern is typical of situations where an MA(2) model may be useful: there are two statistically significant “spikes” at lags 1 and 2, followed by non-significant values at the other lags. Note that, due to sampling error, the sample ACF does not match the theoretical pattern exactly.
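A sketch of simulating an MA(2) process and inspecting its sample ACF (the coefficients theta_1 = 0.6 and theta_2 = 0.4 are assumptions chosen for illustration):

    import numpy as np
    from statsmodels.tsa.arima_process import ArmaProcess
    from statsmodels.graphics.tsaplots import plot_acf

    # MA(2) with illustrative coefficients; the MA terms enter with a plus sign
    ma = np.array([1, 0.6, 0.4])
    series = ArmaProcess(np.array([1]), ma).generate_sample(nsample=128)

    # Expect significant spikes at lags 1 and 2 only, then non-significant values
    plot_acf(series, lags=20)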
Autoregressive Moving Average Model: ARMA(p,q)
• Autoregressive moving average models are simply a
combination of an AR model and an MA model
• In this model, the impact of previous lags along with the residuals is considered for forecasting the future values of the time series:

Y(t) = c + β_1 · Y(t−1) + … + β_p · Y(t−p) + ε(t) + α_1 · ε(t−1) + … + α_q · ε(t−q)

Here β represents the coefficients of the AR model and α represents the coefficients of the MA model.
• Hence, this model can explain the relationship of a time series with both
random noise (moving average part) and itself at a previous step
(autoregressive part).
• Consider the above graphs, where the MA and AR values are plotted with their respective significant values. Let's assume that we take only 1 significant value from the AR model and, likewise, 1 significant value from the MA model. The ARMA model obtained by combining the orders of the other two models will then be of order ARMA(1,1).
The estimation function takes the following arguments (these are the arguments of the arma() function in R's tseries package):
1. X: the univariate time series or data to use for estimation. This corresponds to our variable Yt in our case.
2. order = c(1,1): a vector of 2 elements defining the lag orders of the AR(p) and MA(q) parts. In other words, p is the first element of the vector and q the second. To turn ARMA(p,q) into an AR(p) (or an MA(q)) model, we just set q = 0 (or p = 0).
3. lag = c(p,q): this option can be used as an alternative to "order" to define the values of p and q. It is a list defining the values of p and q for the AR(p) and MA(q) parts respectively.
4. coef: a vector of the coefficients of an ARMA model obtained from an initial estimation.
5. include.intercept: a boolean value saying whether we should include the intercept in our estimation or not.
6. series: the name of the series.
7. qr.tol: the tolerance used when computing the standard errors of the coefficients.
(A Python analogue is sketched below.)
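Since the options above belong to R's arma(), here is a rough Python analogue using statsmodels (an assumption, not the original code; the placeholder series y stands in for Yt):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    y = np.random.default_rng(0).normal(size=200)  # placeholder; substitute your Yt

    # order=(p, d, q) with d=0 gives an ARMA(p, q); set q=0 for a pure AR(p), p=0 for an MA(q)
    # trend='c' includes an intercept (like include.intercept = TRUE); trend='n' omits it
    res = ARIMA(y, order=(1, 0, 1), trend='c').fit()
    print(res.params)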
• Let us assume now that the monthly bitcoin data can be fitted by an ARIMA(2,1,2) model, meaning that it has an autoregressive pattern with 2 lags and a moving average with 2 lags. Let us then try to estimate the coefficients of this model specification and see what we obtain. Here is the code.
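The original code is not shown here; a minimal Python sketch with statsmodels could look like this (the file name bitcoin_monthly.csv is hypothetical):

    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Hypothetical file; replace with your own monthly bitcoin price series
    btc = pd.read_csv("bitcoin_monthly.csv", index_col=0, parse_dates=True).squeeze()

    res = ARIMA(btc, order=(2, 1, 2)).fit()   # p = 2, d = 1, q = 2
    print(res.summary())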
• Once the estimation is done, we can plot the original data and the predicted values together on the same figure to check how well we have done so far. Here is the follow-up code.
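Continuing the sketch above, the comparison plot could be produced as follows (assuming matplotlib):

    import matplotlib.pyplot as plt

    plt.plot(btc, color="black", label="original")              # solid black line
    plt.plot(res.fittedvalues, color="red", label="predicted")  # red line
    plt.legend()
    plt.show()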
• We then obtain the following figure. As you can see, we are doing pretty well, apart from a certain lag. The original series is the solid black line, while the predicted one is the red line.
Forecasting using ARMA model
• As we have already estimated the model, we can now forecast with it. For this purpose, we can use the predict() function. It is worth noting that the forecasted value of Bitcoin for the next month is evaluated as the conditional expectation of the true value, given its past values. Here is the code for the prediction.
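In Python (continuing the same sketch), the analogous one-step-ahead forecast could be obtained like this:

    # Conditional expectation of next month's value, given the past
    forecast = res.get_forecast(steps=1)
    print(forecast.predicted_mean)
    print(forecast.conf_int())   # uncertainty around the point forecast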
ARIMA(p,d,q)
Now let us see how these three parameters relate to one another and, lastly, the plots of ACF and PACF. The AR part uses lagged values of the series as predictors, and it works best when those predictors are not independent of each other. The model works on two important key concepts:
1. The data series used as input should be stationary.
2. As ARIMA takes past values to predict future output, the statistical properties of the input data must be time-invariant.
Implementation Steps:
• To do that, we first turn the values into a time series object and then look at the result of applying the diff function to the series. Here is what the code looks like.
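A minimal Python sketch of this step (the file name data.csv is hypothetical; in R this would be ts() followed by diff()):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Load the data as a time-indexed series (analogous to a ts object)
    y = pd.read_csv("data.csv", index_col=0, parse_dates=True).squeeze()

    diff_y = y.diff().dropna()   # first difference, analogous to diff() in R
    diff_y.plot()
    plt.show()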
• Based on this visualization, we can see that the differenced data looks somewhat more stationary, with no trend in it. Therefore, we can use this differenced data to find which AR and MA parameters are good for our ARIMA model.
• To find which AR and MA parameter values suit our ARIMA model, we can use the ACF and PACF functions to find the best values.
• Here is a function with which we can visualize the ACF and PACF plots, and here is the result.
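Continuing the sketch, the plots could be produced with statsmodels:

    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    plot_acf(diff_y, lags=20)    # tails off slowly -> suggests q = 1
    plot_pacf(diff_y, lags=20)   # tails off slowly -> suggests p = 1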
• Based on this plot, we know that both the ACF and the PACF tail off slowly and do not reach 0 quickly. Because of that, we can assume parameter p = 1 and parameter q = 1. Therefore, the ARMA model that suits this differenced series is ARMA(1,1).
Model Diagnostic
• ARIMA(1, 1, 1) will be our model if there are no significant differences from the other models. In this case, we will try ARIMA(2, 1, 1) and ARIMA(1, 1, 2) to see whether the extra parameter has a significant p-value.
• After we choose the best model for the data, we have to diagnose one more thing: the residuals. Below are the residual analysis results from the sarima function (a sketch follows).
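The sarima function mentioned here comes from R's astsa package; a rough Python analogue of its residual diagnostics (an assumption, continuing the sketch above) is:

    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.stats.diagnostic import acorr_ljungbox

    res = ARIMA(y, order=(1, 1, 1)).fit()
    res.plot_diagnostics(figsize=(10, 8))   # residuals, histogram, Q-Q plot, ACF

    # Ljung-Box test: large p-values indicate no significant residual autocorrelation
    print(acorr_ljungbox(res.resid, lags=[10]))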
• If we want to use this model, the residuals must be approximately normally distributed, with no correlation between the lags of their values. Based on this visualization, the residuals show no significant autocorrelation in the ACF plot, most of the points lie along the line in the Q-Q plot, and the Ljung-Box p-values do not cross the blue significance line. Therefore, we can use this model to do forecasting.
Forecasting
• If we want to do some forecasting, we can use the sarima.for function. Its parameters consist of the data, how many steps ahead we want to forecast, and the p, d, and q parameters that represent our ARIMA model (a sketch follows).
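sarima.for is also from R's astsa package; a rough Python analogue (continuing the same sketch) that draws the forecast with its uncertainty band:

    import matplotlib.pyplot as plt

    fc = res.get_forecast(steps=12)   # forecast 12 steps ahead
    ci = fc.conf_int()

    plt.plot(y, color="black")                      # observed data
    plt.plot(fc.predicted_mean, color="red")        # forecast
    plt.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1],
                     color="grey", alpha=0.3)       # uncertainty band
    plt.show()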
• As we can see here, the black dots represent our data and the red line represents the forecast result. The plot also shows a grey band around the red line, which indicates the standard error of the forecast. This band is not very wide, so the model produces reasonably precise forecasts.
ARCH and GARCH MODEL
ARCH and GARCH models are also called volatility models. They are used extensively in the finance industry, as many asset prices are conditionally heteroskedastic.
• Since we can only tell whether the ARCH model is appropriate by squaring the residuals and examining the correlogram, we also need to ensure that the mean of the residuals is zero.
• Crucially, ARCH should only ever be applied to series that have no trends or seasonal effects, i.e. no (evident) serial correlation. ARIMA (or even seasonal ARIMA) is often applied to such a series first, at which point ARCH may be a good fit for the residuals.
ARCH(1):
ε(t) = w(t) · √(⍺0 + ⍺1 · ε²(t−1))
• where w(t) is white noise with zero mean and unit variance.
• Similarly, ARCH(2):
ε(t) = w(t) · √(⍺0 + ⍺1 · ε²(t−1) + ⍺2 · ε²(t−2))
Interpretation:
• If the error is high during period (t−1), it is more likely that the error at period (t) is also high.
• Vice versa: if the error is low during period (t−1), the value inside the square root will be low, which results in a smaller error at (t).
• Remember, ⍺1 ≥ 0 so that the variance stays positive.
• For the stability condition to hold, ⍺1 < 1; otherwise ε(t) will be explosive (continue to increase over time).
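A minimal sketch of simulating an ARCH(1) process in Python (the values ⍺0 = 0.2 and ⍺1 = 0.5 are assumptions chosen for illustration):

    import numpy as np

    np.random.seed(1)
    n = 1000
    a0, a1 = 0.2, 0.5              # need a0 > 0 and 0 <= a1 < 1 for stability
    w = np.random.normal(size=n)   # white noise: zero mean, unit variance
    eps = np.zeros(n)
    for t in range(1, n):
        eps[t] = w[t] * np.sqrt(a0 + a1 * eps[t - 1] ** 2)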
GARCH(1,1):
σ²(t) = ⍺0 + ⍺1 · ε²(t−1) + β1 · σ²(t−1), with ε(t) = w(t) · σ(t)
• Similarly, GARCH(p,q):
σ²(t) = ⍺0 + ⍺1 · ε²(t−1) + … + ⍺q · ε²(t−q) + β1 · σ²(t−1) + … + βp · σ²(t−p)
Interpretation:
[output graph omitted]
3. ARCH(1) squared model:
[output graphs omitted]
Now, let's run the above model through an example using SPY returns:
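A minimal sketch (assuming the arch package for the model and yfinance as the data source; both are assumptions, not necessarily the original setup):

    import yfinance as yf          # assumed data source for SPY prices
    from arch import arch_model

    # Daily percentage returns; scaling by 100 helps the optimizer converge
    spy = yf.download("SPY")["Close"].squeeze().pct_change().dropna() * 100

    am = arch_model(spy, vol="GARCH", p=1, q=1)   # GARCH(1,1)
    res = am.fit(disp="off")
    print(res.summary())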