The Complete Guide To Time Series Analysis and Forecasting
Autocorrelation
Informally, autocorrelation is the similarity between observations as a function
of the time lag between them.
Trend
Trend is the general direction in which something is developing or changing. A
trend can be upward (an uptrend) or downward (a downtrend). The increase or
decrease need not be consistently in the same direction throughout a given
period.
Upward trend
Seasonality
Seasonality refers to periodic fluctuations. For example, electricity
consumption is high during the day and low during the night, and online sales
increase during Christmas before slowing down again.
Example of seasonality
As you can see above, there is a clear daily seasonality: every day there is a
peak towards the evening, and the lowest points are at the beginning and end
of each day.
Remember that seasonality can also be derived from an autocorrelation plot if
it has a sinusoidal shape: simply look at the period of the oscillation, and it
gives the length of the season.
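As a sketch of this idea (the synthetic hourly series below is made up for illustration), the autocorrelation can be computed lag by lag with pandas, and the lag with the strongest correlation gives the season length:

```python
import numpy as np
import pandas as pd

# Synthetic hourly series with a daily (24-step) seasonal pattern plus noise
rng = np.random.default_rng(0)
t = np.arange(24 * 14)  # two weeks of hourly observations
series = pd.Series(np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size))

# Autocorrelation as a function of the time lag
acf = [series.autocorr(lag) for lag in range(1, 37)]

# The lag with the highest autocorrelation reveals the season length
period = int(np.argmax(acf)) + 1
print(period)  # 24
```

The list of autocorrelations traces the same sinusoidal shape described above, and its first peak sits at lag 24, the daily period.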
Cyclicity
It is a medium-term variation caused by circumstances that repeat at
irregular intervals.
Example: 5 years of economic growth, followed by 2 years of economic
recession, followed by 7 years of economic growth, followed by 1 year of
economic recession.
Stationarity
Stationarity is an important characteristic of time series. A time series is said to
be stationary if its statistical properties do not change over time. In other
words, it has constant mean and variance, and covariance is independent of
time.
In the first plot, the mean varies (increases) with time, which results in an
upward trend.
In the second plot, there is no trend in the series, but the variance of the
series varies over time.
In the third plot, the spread becomes narrower as time increases, which
means that the covariance is varying over time.
In the last plot, all three properties are constant over time, which is what a
stationary time series looks like.
Example of a stationary process
Looking again at the same plot, we see that the process above is stationary. The
mean and variance do not vary over time.
Often, stock prices are not a stationary process, since we might see a growing
trend, or its volatility might increase over time (meaning that variance is
changing).
Ideally, we want a stationary time series for modelling. Of course, not all
series are stationary, but we can apply different transformations to make
them stationary.
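The most common such transformation is differencing, i.e. replacing each value with its change from the previous one. A minimal sketch with pandas (the trending series here is synthetic):

```python
import numpy as np
import pandas as pd

# A series with a linear upward trend: its mean changes over time
rng = np.random.default_rng(1)
trend_series = pd.Series(np.arange(100, dtype=float) + rng.standard_normal(100))

# First-order differencing removes the linear trend
differenced = trend_series.diff().dropna()

# The differenced series fluctuates around a constant mean (~1, the slope)
print(differenced.mean())
```

After differencing, the mean no longer drifts with time, which is exactly the property the stationarity tests below look for.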
If trend or seasonality is present in the data, the series is non-stationary.
Without going into the technicalities of the Dickey-Fuller test, it tests the null
hypothesis that a unit root is present.
As an example, the process below is not stationary. Notice how the mean is not
constant through time.
Example of a non-stationary process
We can use several methods to identify whether a time series is stationary
or not.
Visual test, which identifies stationarity simply by looking at the plot.
ADF (Augmented Dickey-Fuller) test, which is used to determine the
presence of a unit root in the series.
KPSS (Kwiatkowski-Phillips-Schmidt-Shin) test, whose null hypothesis is
that the series is stationary (the opposite of the ADF test).
Rolling Statistics
Rolling statistics help us in making a time series stationary. Rolling
statistics essentially calculate a moving average. To calculate the moving
average we need to define a window size, which is basically how many past
values are considered.
For example, if we take the window as 2, then at point T1 the moving average
is blank, at point T2 it is the mean of T1 and T2, at point T3 the mean of T2
and T3, and so on. After calculating all the moving averages, if you plot
them over the actual values, you can see that the resulting line is smooth.
This is one method of making a time series stationary; there are other
methods as well, such as exponential smoothing, which we will study below.
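The window=2 example above can be sketched with pandas (the values are made up):

```python
import pandas as pd

values = pd.Series([10.0, 12.0, 14.0, 13.0, 15.0])

# Rolling mean with window=2: the first point is blank (NaN),
# then each point is the mean of the current and previous value
rolling_mean = values.rolling(window=2).mean()

print(rolling_mean.tolist())  # [nan, 11.0, 13.0, 13.5, 14.0]
```

Note how the rolling series fluctuates less than the raw values, which is the smoothing effect described above.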
Moving average
The moving average model is probably the most naive approach to time series
modelling. This model simply states that the next observation is the mean of all
past observations.
Although simple, this model might be surprisingly good and it represents a
good starting point.
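The "mean of all past observations" idea can be sketched with a pandas expanding mean (the observations here are made up):

```python
import pandas as pd

observations = pd.Series([3.0, 5.0, 4.0, 6.0])

# At each step, the forecast is the mean of every observation seen so far
expanding_mean = observations.expanding().mean()

# The forecast for the next (unseen) observation is the last expanding mean
print(expanding_mean.iloc[-1])  # 4.5
```

This is the most naive baseline: every new forecast is just the running average, so it reacts very slowly to recent changes.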
Alternatively, the moving average can be used to identify interesting trends in
the data. We can define a window to apply the moving average model
to smooth the time series and highlight different trends.
df1['Moving Avg_24'] = df1['Energy_Production'].rolling(window=24).mean()
plt.figure(figsize=(17, 4))
plt.ylabel('Energy Production')
plt.title('Moving average window size=24')
plt.plot(df1[['Energy_Production', 'Moving Avg_24']]);
Example of a moving average on a 24h window
In the plot above, we applied the moving average model to a 24h window. The
green line smoothed the time series, and we can see that there are 2 peaks in
a 24h period.
Of course, the longer the window, the smoother the trend will be. Below is an
example of moving average on a smaller window.
Exponential smoothing
Exponential smoothing uses a similar logic to the moving average, but this
time, a different, decreasing weight is assigned to each observation. In other
words, less importance is given to observations as we move further from the
present.
Mathematically, simple exponential smoothing is expressed as the recursion
ŷ_t = α·x_t + (1 − α)·ŷ_(t−1), where x_t is the observation at time t, ŷ_t is the
smoothed value, and 0 < α ≤ 1 is the smoothing factor: the smaller α is, the
more slowly older observations are forgotten.
AR model
An Auto-Regressive (AR-only) model is one where the variable depends only
on its own lags.
MA model
A Moving Average model is one where the variable depends only on the lagged
forecast errors, which are the errors of the AR models at the respective lags.
ARMA model
The autoregressive moving-average process is the basic model for analysing a
stationary time series. The ARMA model merges the AR and MA models:
the AR part explains the momentum and mean-reversion effects, and the MA
part captures the shock effects observed in the white-noise terms. These shock
effects can be thought of as unexpected events affecting the observation
process, such as surprise earnings, wars, attacks, etc.
ARIMA model
Auto-Regressive Integrated Moving Average, aka ARIMA, is a class of models
based on a series' own lags and the lagged forecast errors. Any non-seasonal
time series that exhibits patterns and is not random white noise can be
modelled with ARIMA models.
ARIMA model is characterized by 3 terms:
p is the order of the AR term: the number of lags of Y to be used
as predictors.
q is the order of the MA term: the number of lagged forecast
errors that should go into the model.
d is the minimum number of differencing operations needed to make the
series stationary.
SARIMA model
In a seasonal ARIMA model, seasonal AR and MA terms predict using data
values and errors at times with lags that are multiples of m (the span of the
seasonality).
Non-seasonal terms (p, d, q): we can use ACF and PACF plots for these. Spikes
at the early lags of the ACF indicate the MA term (q); similarly, the PACF
indicates the AR term (p).
Seasonal terms (P, D, Q and m): for these we need to examine the patterns
across lags that are multiples of m. In most cases the first two or three
seasonal multiples are enough. Use the ACF and PACF in the same way.