
The Complete Guide to Time Series Analysis and Forecasting

Understand moving average, exponential smoothing, stationarity,
autocorrelation, SARIMA, and apply these techniques in two projects.

Whether we wish to predict trends in financial markets or in electricity
consumption, time is an important factor that must be considered in our
models. For example, it would be useful to forecast at what hour of the day
electricity consumption will peak, so as to adjust the price or the
production of electricity.
Enter time series. A time series is simply a series of data points ordered in time.
In a time series, time is often the independent variable and the goal is usually to
make a forecast for the future.
However, there are other aspects that come into play when dealing with time
series.
Is it stationary?
Is there seasonality?
Is the target variable autocorrelated?
In this post, I will introduce different characteristics of time series and how we
can model them to obtain forecasts that are as accurate as possible.
Forecasting, planning and goals
Forecasting is a common statistical task in business, where it helps to inform
decisions about the scheduling of production, transportation and personnel,
and provides a guide to long-term strategic planning. However, business
forecasting is often done poorly, and is frequently confused with planning
and goals. They are three different things.
Forecasting
is about predicting the future as accurately as possible, given all of the
information available, including historical data and knowledge of any
future events that might impact the forecasts.
Goals
are what you would like to have happen. Goals should be linked to
forecasts and plans, but this does not always occur. Too often, goals
are set without any plan for how to achieve them, and no forecasts for
whether they are realistic.
Planning
is a response to forecasts and goals. Planning involves determining the
appropriate actions that are required to make your forecasts match
your goals.
Forecasting should be an integral part of the decision-making activities of
management, as it can play an important role in many areas of a company.
Modern organisations require short-term, medium-term and long-term
forecasts, depending on the specific application.
Short-term forecasts
are needed for the scheduling of personnel, production and
transportation. As part of the scheduling process, forecasts of demand
are often also required.
Medium-term forecasts
are needed to determine future resource requirements, in order to
purchase raw materials, hire personnel, or buy machinery and
equipment.
Long-term forecasts
are used in strategic planning. Such decisions must take account of
market opportunities, environmental factors and internal resources.
Types of forecasting:
1) Quantitative forecasting
2) Qualitative forecasting
Let us look at each.
1) Quantitative forecasting
Quantitative forecasting is based on historical data, i.e. past and present
(mostly numerical) data. From this historical data we can apply statistical
methods, and so we can predict with less bias.
2) Qualitative forecasting
Qualitative forecasting is based on the opinions and judgment of subject
matter experts and customers. Why rely on judgment instead of data? Because
in some cases the past data are unavailable or unclear, so we depend on
judgment and opinions.

When Not To Use Time Series Forecasting?


 When values are constant over a period of time.
 When values can be represented by a certain (known) function.

Modelling and Evaluation Techniques


Modelling: Naive approach, Moving Average (MA), Simple Exponential
Smoothing, Double Exponential Smoothing, Triple Exponential Smoothing (Holt-
Winters method), linear trend model, Auto-Regressive Integrated Moving
Average (ARIMA), SARIMAX, etc.
Evaluation: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean
Absolute Percentage Error (MAPE), etc.
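
As a minimal sketch of the evaluation metrics above (the evaluate_forecast helper and the sample values are made up for illustration):

import numpy as np

def evaluate_forecast(y_true, y_pred):
    # Hypothetical helper computing MSE, RMSE and MAPE for a forecast
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)                     # Mean Squared Error
    rmse = np.sqrt(mse)                            # Root Mean Squared Error
    mape = np.mean(np.abs(errors / y_true)) * 100  # Mean Absolute Percentage Error, in %
    return mse, rmse, mape

print(evaluate_forecast([100, 110, 120], [98, 112, 118]))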
Structural breaks
A structural break is a sudden change in the time series data. Structural
breaks affect the reliability of results, so statistical methods should be
used to identify them.

Example of structural breaks

Autocorrelation
Informally, autocorrelation is the similarity between observations as a function
of the time lag between them.

Example of an autocorrelation plot


Above is an example of an autocorrelation plot. Looking closely, you realize that
the first value and the 24th value have a high autocorrelation. Similarly, the
12th and 36th observations are highly correlated. This means that we will find a
very similar value every 24 units of time.
Notice how the plot looks like a sinusoidal function. This is a hint
for seasonality, and you can find its value by finding the period in the plot
above, which would give 24h.
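
As a small sketch of computing autocorrelation at specific lags with pandas (the series here is synthetic, standing in for the hourly data above):

import numpy as np
import pandas as pd

# Synthetic hourly series with a 24h cycle, standing in for real data
t = np.arange(200)
series = pd.Series(np.sin(2 * np.pi * t / 24) + 0.1 * np.random.randn(200))

# High autocorrelation at multiples of 24 reveals the daily seasonality;
# lag 12 (half the period) is strongly negative for a sinusoidal pattern
for lag in (1, 12, 24, 36):
    print(lag, round(series.autocorr(lag=lag), 3))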
Time series Components
The time-series graph helps to highlight the trend and behaviour of the data
over time, which helps in building a more reliable model. To understand these
patterns, we should structure the data and break it down into several factors.
The components we use to break down the data are:
Trend, Seasonality, Cyclicity, Noise, Level

Trend
A trend is a general direction in which something is developing or changing. A
trend can be upward (an uptrend) or downward (a downtrend). The increase or
decrease is not necessarily consistent in the same direction throughout a
given period.

Upward trend
Seasonality
Seasonality refers to periodic fluctuations. For example, electricity
consumption is high during the day and low during the night, and online sales
increase during Christmas before slowing down again.

Example of seasonality

As you can see above, there is a clear daily seasonality. Every day, you see a
peak towards the evening, and the lowest points are the beginning and the end
of each day.
Remember that seasonality can also be derived from an autocorrelation plot if
it has a sinusoidal shape. Simply look at the period, and it gives the length of
the season.
Cyclicity
Cyclicity is a medium-term variation caused by circumstances which repeat at
irregular intervals.
Example: 5 years of economic growth, followed by 2 years of economic
recession, followed by 7 years of economic growth, followed by 1 year of
economic recession.

Unexpected Events/Irregular Variations/Noise


Unexpected events are dynamic changes that occur in an organization or in
the market and cannot be captured. For example, consider the current
pandemic: if you observe the Sensex or Nifty chart, there is a huge decrease
in stock prices, which is an unexpected event arising from the surroundings.
There are methods and algorithms with which we can capture seasonality and
trend, but unexpected events occur dynamically, so capturing them is very
difficult.

Stationarity
Stationarity is an important characteristic of time series. A time series is said to
be stationary if its statistical properties do not change over time. In other
words, it has constant mean and variance, and covariance is independent of
time.

Stationarity means that the statistical properties of a time series (mean, variance and covariance) do not change over time.

 In the first plot, the mean varies (increases) with time, which results in an
upward trend.

 In the second plot, there is no trend in the series, but the variance
varies over time.

 In the third plot, the spread becomes closer as time increases, which
means that the covariance varies over time.

In the last diagram, all three properties are constant over time; this is what
a stationary time series looks like.
Example of a stationary process

Looking again at the same plot, we see that the process above is stationary. The
mean and variance do not vary over time.

Often, stock prices are not a stationary process, since we might see a growing
trend, or its volatility might increase over time (meaning that variance is
changing).

Ideally, we want a stationary time series for modelling. Of course, not all
series are stationary, but we can apply different transformations to make
them stationary, such as differencing (sketched below).

If any time series components such as trend or seasonality are present in the data, then it is non-stationary.
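
A minimal differencing sketch with pandas (the series is a synthetic placeholder):

import numpy as np
import pandas as pd

# Synthetic trending series, standing in for real data
series = pd.Series(np.arange(100, dtype=float) + np.random.randn(100))

diff1 = series.diff().dropna()     # first difference y(t) - y(t-1): removes a linear trend
diff24 = series.diff(24).dropna()  # lag-24 difference y(t) - y(t-24): removes a daily seasonality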

How to test if a process is stationary


You may have noticed in the title of the plot above Dickey-Fuller. This is the
statistical test that we run to determine if a time series is stationary or not.

Without going into the technicalities of the Dickey-Fuller test, it tests the null
hypothesis that a unit root is present.

If the p-value is above a chosen threshold (commonly 0.05), we cannot reject
the null hypothesis, and the process is not considered stationary.

Otherwise, if the p-value is below the threshold, the null hypothesis is
rejected, and the process is considered to be stationary.

As an example, the process below is not stationary. Notice how the mean is not
constant through time.
Example of a non-stationary process
We can use several methods to identify whether a time series is stationary
or not (a test sketch follows the list).
 Visual test, which assesses the series simply by looking at the plot.
 ADF (Augmented Dickey-Fuller) Test, which is used to determine the
presence of a unit root in the series.
 KPSS (Kwiatkowski-Phillips-Schmidt-Shin) Test, which tests the null
hypothesis that the series is stationary.
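
A minimal sketch of the ADF and KPSS tests with statsmodels (the series is a synthetic placeholder):

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, kpss

series = pd.Series(np.random.randn(200))  # synthetic stationary series

# ADF: the null hypothesis is that a unit root is present (non-stationary)
adf_stat, adf_p = adfuller(series)[:2]
print('ADF p-value:', adf_p)    # p below 0.05 -> reject the null -> stationary

# KPSS: the null hypothesis is that the series is stationary (note the reversed null)
kpss_stat, kpss_p = kpss(series, nlags='auto')[:2]
print('KPSS p-value:', kpss_p)  # p below 0.05 -> reject the null -> non-stationary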
Rolling Statistics
Rolling statistics help us in making a time series stationary. Basically, rolling
statistics calculate a moving average. To calculate the moving average we need
to define the window size, that is, how many past values are to be considered.
For example, if we take the window as 2, then at point T1 the moving average
will be blank (there is not enough history), at point T2 it will be the mean of
T1 and T2, at point T3 the mean of T2 and T3, and so on. After calculating all
the moving averages, if you plot the actual values together with the calculated
moving averages, you can see that the moving-average line is smooth. A minimal
sketch follows.
This is one method of making a time series stationary; there are other methods
as well, such as exponential smoothing, which we will study next.
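
Here is the window=2 example from above as a tiny pandas sketch (the T1..T5 values are made up):

import pandas as pd

values = pd.Series([10, 12, 14, 13, 15])   # made-up values for T1..T5
print(values.rolling(window=2).mean())
# T1 is NaN (not enough history), T2 = mean(10, 12) = 11.0,
# T3 = mean(12, 14) = 13.0, and so on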

Additive and Multiplicative Time series


In the real world we meet different kinds of time series data. To handle them,
we must understand exponential smoothing, and for that we first need to study
the two types of time series data: additive and multiplicative. As we saw,
there are three components we need to capture: Trend (T), Seasonality (S), and
Irregularity (I).
An additive time series is a combination (addition) of trend, seasonality, and
irregularity: Y(t) = T(t) + S(t) + I(t). A multiplicative time series is the
multiplication of these three terms: Y(t) = T(t) × S(t) × I(t).
 An additive time series is one in which the magnitude of trend and
seasonality does not increase with time; they remain fairly constant.
 A multiplicative time series is one in which the magnitude of trend and
seasonality increases as the time period increases.

Moving average
The moving average model is probably the most naive approach to time series
modelling. This model simply states that the next observation is the mean of all
past observations.
Although simple, this model might be surprisingly good and it represents a
good starting point.
Otherwise, the moving average can be used to identify interesting trends in the
data. We can define a window to apply the moving average model
to smooth the time series, and highlight different trends.
import matplotlib.pyplot as plt

# Assumes df1 is a DataFrame with an 'Energy_Production' column
df1['Moving Avg_24'] = df1['Energy_Production'].rolling(window=24).mean()
plt.figure(figsize=(17,4))
plt.ylabel('Energy Production')
plt.title('Moving average window size=24')
plt.plot(df1[['Energy_Production','Moving Avg_24']][1:50]);
Example of a moving average on a 24h window

In the plot above, we applied the moving average model to a 24h window. The
green line smoothed the time series, and we can see that there are 2 peaks in
a 24h period.
Of course, the longer the window, the smoother the trend will be. Below is an
example of moving average on a smaller window.

# Same as above, with a smaller 12h window
df1['Moving Avg_12'] = df1['Energy_Production'].rolling(window=12).mean()
plt.figure(figsize=(17,4))
plt.ylabel('Energy Production')
plt.title('Moving average window size=12')
plt.plot(df1[['Energy_Production','Moving Avg_12']][1:50]);
Example of a moving average on a 12h window

Exponential smoothing
Exponential smoothing uses a similar logic to the moving average, but this
time a different, decreasing weight is assigned to each observation. In other
words, less importance is given to observations as we move further from the
present.
Mathematically, exponential smoothing is expressed as:

s(t) = α·x(t) + (1 − α)·s(t−1)    (exponential smoothing expression)


Here, alpha is a smoothing factor that takes values between 0 and 1. It
determines how fast the weight decreases for previous observations.

Example of exponential smoothing


From the plot above, the dark blue line represents the exponential smoothing
of the time series using a smoothing factor of 0.3, while the orange line uses a
smoothing factor of 0.05.
As you can see, the smaller the smoothing factor, the smoother the time series
will be. This makes sense, because as the smoothing factor approaches 0, we
approach the moving average model.
Double exponential smoothing
Double exponential smoothing is used when there is a trend in the time series.
In that case, we use this technique, which is simply a recursive application of
exponential smoothing, twice.
Mathematically:

s(t) = α·x(t) + (1 − α)·(s(t−1) + b(t−1))
b(t) = β·(s(t) − s(t−1)) + (1 − β)·b(t−1)    (double exponential smoothing expressions)


Here, beta is the trend smoothing factor, and it takes values between 0 and 1.
Below, you can see how different values of alpha and beta affect the shape of
the time series.

Example of double exponential smoothing


Triple exponential smoothing
If we need to capture both trend and seasonality, we use triple exponential
smoothing.
This method extends double exponential smoothing, by adding a seasonal
smoothing factor. Of course, this is useful if you notice seasonality in your time
series.
Mathematically, triple exponential smoothing (in its additive form) is expressed as:
s(t) = α·(x(t) − c(t−L)) + (1 − α)·(s(t−1) + b(t−1))
b(t) = β·(s(t) − s(t−1)) + (1 − β)·b(t−1)
c(t) = γ·(x(t) − s(t)) + (1 − γ)·c(t−L)
where gamma is the seasonal smoothing factor and L is the length of the
season.
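
A hedged sketch of triple exponential smoothing (Holt-Winters) with statsmodels; the synthetic series and the additive settings are illustrative assumptions:

import numpy as np
import pandas as pd
from statsmodels.tsa.api import ExponentialSmoothing

# Synthetic hourly series with an upward trend and a daily (L = 24) seasonality
t = np.arange(24 * 10)
data = pd.Series(10 + 0.05 * t + 3 * np.sin(2 * np.pi * t / 24))

# trend='add' and seasonal='add' assume additive components
fit = ExponentialSmoothing(data, trend='add', seasonal='add', seasonal_periods=24).fit()
forecast = fit.forecast(48)  # forecast the next two "days"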
Simple Exponential Smoothing
As we have seen, simple exponential smoothing has a parameter known as
alpha, which defines how much weight we want to give to recent observations.
We will fit two models, one with a high value of alpha and one with a low
value, and compare both.
from statsmodels.tsa.api import SimpleExpSmoothing

data = df[1:50]  # assumes df holds the series used above
# Low alpha: smoother, reacts slowly; high alpha: closely tracks recent values
fit1 = SimpleExpSmoothing(data).fit(smoothing_level=0.2, optimized=False)
fit2 = SimpleExpSmoothing(data).fit(smoothing_level=0.8, optimized=False)
plt.figure(figsize=(18, 8))
plt.plot(df[1:50], marker='o', color="black")
plt.plot(fit1.fittedvalues, marker="o", color="b")  # alpha = 0.2
plt.plot(fit2.fittedvalues, marker="o", color="r")  # alpha = 0.8
plt.xticks(rotation="vertical")
plt.show()
Holt method for exponential smoothing
Holt's method is a popular method for exponential smoothing and is also
known as linear exponential smoothing. It forecasts data with a trend. It
works on three separate equations that work together to generate the final
forecast. Let us apply it to our data and observe the changes. In the first
fit, we assume that there is a linear trend in the data, and in the second
fit, an exponential trend.

from statsmodels.tsa.api import Holt

fit1 = Holt(data).fit()                    # linear (additive) trend
fit2 = Holt(data, exponential=True).fit()  # exponential (multiplicative) trend
plt.plot(data, marker='o', color='black')
plt.plot(fit1.fittedvalues, marker='o', color='b')
plt.plot(fit2.fittedvalues, marker='o', color='r')
plt.xticks(rotation="vertical")
plt.show()
You can observe that the blue plot (linear trend) does not fit the original
plot closely, whereas the red plot is the exponential smoothing plot. This is
simple smoothing with the Holt method; we can also add parameters like alpha,
a trend component, and a seasonality component.
Decomposition of time-series data
from statsmodels.tsa.seasonal import seasonal_decompose
Now we will decompose the time series data using both additive and
multiplicative models, and visualize the seasonal and trend components that
they extract.
import matplotlib.pyplot as plt

# Additive decomposition: y(t) = trend + seasonality + residual
add_result = seasonal_decompose(DrugSalesData['Value'], model='additive', period=1)
# Multiplicative decomposition: y(t) = trend * seasonality * residual
mul_result = seasonal_decompose(DrugSalesData['Value'], model='multiplicative', period=1)
# Note: period=1 extracts no seasonality; use the true seasonal period for real data
add_result.plot().suptitle('Additive Decompose', fontsize=12)
mul_result.plot().suptitle('Multiplicative Decompose', fontsize=12)
plt.show()
Each decomposition produces four panels: the observed series, the trend, the
seasonality, and the residual. We can see that the trend is clearly present
with both methods, while the seasonal component is flat, since period=1
extracts no seasonality.

Auto Correlation Function


Correlation summarizes the strength of the relationship between two variables.
We can use Pearson's correlation coefficient for this purpose: a number
between -1 and 1 that describes a negative or a positive correlation
respectively.
We can calculate the correlation for time-series observations with previous
time steps, called lags. Since the correlation is calculated with values of the
same series at previous times, this is called a serial correlation or
autocorrelation.
A plot of the autocorrelation of a time series by lag is called
the Auto Correlation Function (ACF); this plot is also called a correlogram or
autocorrelation plot.

Partial Autocorrelation Function


The PACF describes the direct relationship between an observation and its lag.
It summarizes the relationship between an observation in a time series and
observations at prior time steps, with the relationships of intervening
observations removed.
The autocorrelation between an observation and an observation at a prior time
step consists of both direct and indirect correlations. The indirect
correlations are a linear function of the correlations of the observation at
intervening time steps. The partial autocorrelation function removes these
indirect correlations. A plotting sketch follows.
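
A minimal ACF/PACF plotting sketch with statsmodels (the series is a synthetic placeholder):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

series = pd.Series(np.random.randn(200)).cumsum()  # synthetic series, standing in for real data

fig, axes = plt.subplots(2, 1, figsize=(10, 6))
plot_acf(series, lags=48, ax=axes[0])   # total correlation at each lag (correlogram)
plot_pacf(series, lags=48, ax=axes[1])  # direct correlation at each lag
plt.show()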

AR model
An Auto-Regressive (AR-only) model is one where the model depends only on its
own lags.

MA model
A Moving Average (MA-only) model is one where the model depends only on the
lagged forecast errors, which are the errors of AR models at the respective lags.

ARMA model
The autoregressive moving average process is the basic model for analysing a
stationary time series. The ARMA model merges the AR and MA models.
The AR model explains the momentum and mean-reversion effects, and the MA
model captures the shock effects observed in the white-noise terms. These
shock effects can be thought of as unexpected events affecting the observation
process, such as surprise earnings, wars, attacks, etc.
ARIMA model
Auto-Regressive Integrated Moving Average, aka ARIMA, is a class of models
based on a series' own lags and on the lagged forecast errors. Any
non-seasonal time series that exhibits patterns and is not random white noise
can be modelled with ARIMA models.
An ARIMA model is characterized by 3 terms (a fitting sketch follows the list):
 p is the order of the AR term: the number of lags of Y to be used as
predictors.
 q is the order of the MA term: the number of lagged forecast errors that
should go into the model.
 d is the minimum number of differencing operations needed to make the
series stationary.
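
A hedged ARIMA fitting sketch with statsmodels; the order (2, 1, 2) and the synthetic series are illustrative assumptions, not tuned values:

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

series = pd.Series(np.random.randn(200)).cumsum()  # synthetic non-stationary series

model = ARIMA(series, order=(2, 1, 2))  # (p, d, q); choose from ACF/PACF plots or by AIC
result = model.fit()
print(result.summary())
forecast = result.forecast(steps=24)    # forecast the next 24 steps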

SARIMA model
In a seasonal ARIMA model, seasonal AR and MA terms predict using data
values and errors at times with lags that are multiples of m (the span of the
seasonality).

Non-seasonal terms (p, d, q): We can use the ACF and PACF plots for these. By
examining the spikes at early lags, the ACF indicates the MA term (q);
similarly, the PACF indicates the AR term (p).
Seasonal terms (P, D, Q and m): For these we need to examine the patterns
across lags that are multiples of m. In most cases the first two or three
seasonal multiples are enough. Use the ACF and PACF in the same way.
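
A hedged seasonal-ARIMA sketch using statsmodels' SARIMAX; the orders and m = 24 (daily seasonality in hourly data) are illustrative assumptions:

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic hourly series with a trend and daily seasonality
t = np.arange(24 * 10)
series = pd.Series(10 + 0.05 * t + 3 * np.sin(2 * np.pi * t / 24))

model = SARIMAX(series,
                order=(1, 1, 1),               # non-seasonal (p, d, q)
                seasonal_order=(1, 1, 1, 24))  # seasonal (P, D, Q, m)
result = model.fit(disp=False)
forecast = result.forecast(steps=24)  # forecast the next day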
