Diogo Resende - Time Series Forecasting Models in Python


Time Series Forecasting Models in Python
0 Introduction to forecasting
1 Seasonal Decomposition
2 Exponential Smoothing and Holt-Winters
3 TBATS
4 ARIMA, SARIMA and SARIMAX
5 Tensorflow Structural Time Series
6 Facebook Prophet
7 Facebook Prophet + XGBoost
8 Ensemble

Introduction to Forecasting

Predictions that were just wrong
• Thomas Watson, chairman of IBM (1943): "I think there is a world market for maybe five computers."
• John Maynard Keynes: "Three hour shifts or a fifteen-hour work week."
• Einstein: "There is not the slightest indication that nuclear energy will ever be obtainable. That would mean that the atom would have to be shattered at will."
• Steve Ballmer: "There's no chance that the iPhone is going to get any significant market share."
Analytics is key to drive Forecasting
1 Bringing science to a sometimes gut-feeling job
2 Barometer for the company -> quantifies direction
3 Understanding turning points
4 Can uncover opportunities

What is Time Series Data?
Key ideas
• Sequence of data points in time order (oldest to newest)
• Most commonly, it is data recorded at equally spaced time periods
• Type of panel data (multidimensional dataset)

Bike Sharing
Case Study Briefing – Demand Forecasting
How many rides are done per day?
1 Holidays and weather KPIs included
2 Time periods: 2011 and 2012
3 Forecast December 2012 to assess each forecasting model
[1] Fanaee-T, Hadi, and Gama, Joao, "Event labeling combining ensemble detectors and background knowledge", Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg, doi:10.1007/s13748-013-0040-3.
Seasonal Decomposition

Seasonal Decomposition: the actual values to be decomposed
Key ideas
A seasonal time series can be decomposed into:
• Trend
• Seasonality
• Error
We try to use external regressors to model the remaining error term.
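The decomposition idea can be sketched in plain Python. This is a minimal illustration, assuming an additive model and an odd seasonal period so that a simple centred moving average works; the function names and the toy series are hypothetical and not taken from the course material.

```python
def moving_average(values, window):
    """Centred moving average for an odd window; edge positions stay None."""
    half = window // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        out[i] = sum(values[i - half:i + half + 1]) / window
    return out


def decompose_additive(values, period):
    """Split a series into trend, seasonality and error (additive model)."""
    trend = moving_average(values, period)
    # Detrend, then average the detrended values at each seasonal position.
    buckets = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(values, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    seasonal_means = [sum(b) / len(b) for b in buckets]
    # Centre the seasonal pattern so that it sums to zero over one period.
    offset = sum(seasonal_means) / period
    pattern = [s - offset for s in seasonal_means]
    seasonal = [pattern[i % period] for i in range(len(values))]
    # Whatever trend and seasonality do not explain is the error term.
    error = [None if t is None else y - t - s
             for y, t, s in zip(values, trend, seasonal)]
    return trend, seasonal, error


# Toy series: a linear trend plus a repeating pattern of period 3.
series = [t + [1, 0, -1][t % 3] for t in range(12)]
trend, seasonal, error = decompose_additive(series, period=3)
```

On this noise-free toy series the error term is zero everywhere the trend is defined. With real data, the error is the part that external regressors would try to explain.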

Seasonal Decomposition: Trend
[Figure: trend component]

Seasonal Decomposition: Seasonality
[Figure: seasonality component, x-axis ticks Jan, Apr, Jul, Oct]

Seasonal Decomposition: Error
[Figure: error component from start to end of the series]
Additive vs. Multiplicative
Additive: $y(t) = T(t) + S(t) + e(t)$
Multiplicative: $y(t) = T(t) \cdot S(t) \cdot e(t)$
Key ideas
• If we talk about seasonality in terms of percentages, then we should consider multiplicative seasonality.
• If seasonality adds absolute values, then it is additive.
• If the trend is exponential, then it is multiplicative.
[Figure: two panels of Y against t, one additive and one multiplicative]
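One way to see the link between the two forms: taking logarithms of a multiplicative series turns the product into a sum, so it can be handled like an additive one. A small sketch with made-up numbers (the trend and seasonality values are assumptions chosen for illustration):

```python
import math

trend = [100.0, 110.0, 121.0, 133.1]   # grows by 10% per step (exponential)
seasonality = [1.2, 0.8, 1.2, 0.8]     # +20% / -20% around the trend
y = [t * s for t, s in zip(trend, seasonality)]

# log(y) = log(T) + log(S): the multiplicative model becomes additive.
log_y = [math.log(v) for v in y]
log_sum = [math.log(t) + math.log(s) for t, s in zip(trend, seasonality)]
```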

Forecasting is all about error modelling
• The essential part of forecasting
• Understanding what else can explain the error
• How? Usually in the form of external regressors
• High errors at the beginning of the dataset? Consider discarding that part of the data.
Data without patterns: Stocks
Key Idea
• If there is no pattern, you should not use forecasting models

• Forecasting models work best with consistent seasonality and trends
Trend
• Heavily dependent on the company
Seasonality
• Depends more on the industry, thus it is more predictable.

Exponential Smoothing & Holt-Winters

Let's imagine this is our full data set
Splitting between training and test enables an unbiased model assessment
[Figure: full data set divided into Training Set and Test Set, used for Model Assessment]

Training and Test set in Time series
[Figure: dataset along the time axis, Training set followed by Test set]
Key Ideas
• For forecasting models, the data is usually split into a pre and post period from a time perspective
• The Test Set should be the size of a real-world forecast
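A minimal sketch of a time-based split, assuming the observations are already sorted from oldest to newest. The helper name and the 36 monthly values are hypothetical.

```python
def train_test_split_by_time(values, test_size):
    """Keep the last `test_size` observations as the test set (no shuffling)."""
    if not 0 < test_size < len(values):
        raise ValueError("test_size must be between 1 and len(values) - 1")
    return values[:-test_size], values[-test_size:]


monthly = list(range(1, 37))   # 36 hypothetical monthly observations
# If the real-world forecast covers 12 months, hold out the last 12 months.
train, test = train_test_split_by_time(monthly, test_size=12)
```

Unlike a random split, the test set here always lies after the training set in time, which mirrors how a forecast is used in practice.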

What is Exponential Smoothing?
Key Ideas
Weighted averages of past observations, with the weights decaying exponentially as the observations get older
[Figure: importance of each observation along the time axis, up to today]
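A minimal sketch of simple exponential smoothing in plain Python, to make the decaying weights concrete. The smoothing factor `alpha` and the input values are illustrative assumptions.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed level after each observation."""
    level = values[0]          # start from the first observation
    fitted = [level]
    for y in values[1:]:
        # New level: weighted mix of the newest value and the previous level.
        level = alpha * y + (1 - alpha) * level
        fitted.append(level)
    return fitted


def observation_weights(alpha, n):
    """Weight given to the observation k steps back: alpha * (1 - alpha) ** k."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


smoothed = simple_exponential_smoothing([10, 12, 14, 16], alpha=0.5)
weights = observation_weights(alpha=0.5, n=4)
# smoothed -> [10, 11.0, 12.5, 14.25]
# weights  -> [0.5, 0.25, 0.125, 0.0625]
```

With `alpha=0.5` each step back in time halves the weight, which is the exponential decay described above.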

Holt-Winters is a Triple split Exponential Smoothing
Splits the time series into 3:
• Level
• Trend
• Seasonality
Key Ideas
• Performs Exponential Smoothing in each of the 3 components
• Holt-Winters is also called Triple Exponential Smoothing
• There are 2 variants: Additive and Multiplicative
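The three smoothing equations of the additive variant can be sketched as follows. This is a simplified illustration with a deliberately naive initialisation (level = mean of the first season, trend = 0); the function name, parameter values and toy data are assumptions, and library implementations choose starting values and smoothing factors more carefully.

```python
def holt_winters_additive(values, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters; assumes the series starts at seasonal position 0."""
    level = sum(values[:period]) / period
    trend = 0.0
    seasonals = [y - level for y in values[:period]]
    for i in range(period, len(values)):
        y = values[i]
        prev_level = level
        season = seasonals[i % period]
        # Smooth the level, the trend and the seasonal component in turn.
        level = alpha * (y - season) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[i % period] = gamma * (y - level) + (1 - gamma) * season
    n = len(values)
    return [level + (h + 1) * trend + seasonals[(n + h) % period]
            for h in range(horizon)]


history = [10, 20, 30, 40] * 3   # perfectly seasonal toy data, no trend
forecast = holt_winters_additive(history, period=4, alpha=0.5, beta=0.5,
                                 gamma=0.5, horizon=4)
```

On this noise-free series the forecast simply repeats the seasonal pattern, which is a quick sanity check that the three updates are wired together correctly.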

Mean Absolute Error (MAE) vs Root Mean Squared Error (RMSE)
• MAE and RMSE are performance indicators for regression models with continuous dependent variables
$MAE = \frac{\sum |y - \hat{y}|}{n}$
$RMSE = \sqrt{\frac{\sum (\hat{y} - y)^2}{n}}$
• RMSE is quite useful for models with extremes / outliers, because squaring penalises large errors more heavily
• MAE is more interpretable.
[Figure: model line and observed points (x), Y against time]

Mean Absolute Percent Error (MAPE)
• MAPE represents a very interpretable way of measuring errors
$MAPE = \frac{\sum \left| \frac{y - \hat{y}}{y} \right|}{n}$
• A clear downside is that every error has the same relevance, regardless of its magnitude, if the percent error is the same
• There is no universally good accuracy measure. It will depend on your problem and business need!
[Figure: model line and observed points (x), Y against X]
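The three measures can be written in a few lines of plain Python. A minimal sketch; the actual and predicted values are made up for illustration.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percent Error; undefined when an actual value is zero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [100, 200, 400]
predicted = [110, 180, 400]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

In this example the first two errors are 10 and 20 in absolute terms but both equal 10% in percentage terms, so MAPE treats them as equally relevant, which is the downside described above.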
Pros and Cons
1 Easy to apply | 1 Does not allow external regressors
2 Easy to understand | 2 Low flexibility
3 Better with low amount of time periods or frequency

Challenge
Use Holt-Winters to predict the amount of airmiles
1 Set Index frequency to Monthly. Use "MS"
2 Visualize data
3 Create Training and Test Set. Test Set should be 12 months
4 Create Holt-Winters Model
5 Predict 12 months and visualize them, together with the training and test set
6 Assess Model based on MAE
Dataset: TSA package

TBATS

Meaning of TBATS
1 Trigonometric seasonality
2 Box-Cox transformation
3 AutoRegressive Moving Average
4 Trend
5 Seasonality
Origin: Created in 2011. Similar to Exponential Smoothing.
Why: The math behind it has several similarities.

AutoRegressive components
Key Idea
Past values, the lags, contain information that helps predict future values
$Y_t = c + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \dots + \alpha_n Y_{t-n}$
[Figure: lagged values along the time axis, up to today]
How to determine how many lags?
We will do it automatically in the practice tutorials
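To make the autoregressive equation concrete, here is a minimal one-step forecast in plain Python. The intercept, the coefficients and the history are illustrative assumptions; in practice they are estimated from the data.

```python
def ar_predict(history, intercept, coefficients):
    """One-step forecast: c + a1*Y[t-1] + a2*Y[t-2] + ... + an*Y[t-n]."""
    if len(history) < len(coefficients):
        raise ValueError("not enough history for the requested number of lags")
    lags = history[::-1][:len(coefficients)]   # most recent value first
    return intercept + sum(a * y for a, y in zip(coefficients, lags))


history = [10.0, 12.0, 11.0]
# Two lags: 1.0 + 0.5 * 11.0 + 0.25 * 12.0 = 9.5
next_value = ar_predict(history, intercept=1.0, coefficients=[0.5, 0.25])
```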

Moving Average components
Methodological Framework
$y_t = c + \alpha_1 \varepsilon_{t-1} + \dots + \alpha_n \varepsilon_{t-n}$
What is it?
Past error lags contain information that helps predict future values
How to do it?
We will do it automatically in the practice tutorials
[Figure: visualization of the errors from start to end of the series]

Trigonometric seasonality
• Trigonometry is part of the modelling.
• The seasonality equation contains sine and cosine terms
• In practical terms, we do not need to do anything

Box-Cox
What is it?
Transforming the dependent variable so that it is closer to a normal distribution
Why do we care?
A normal distribution is a requirement or assumption of many statistical techniques
Key Idea
• Box-Cox is part of the modelling.
• In practical terms, we do not need to do anything
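Although the model applies the transformation for us, the formula itself is short. A minimal sketch of the Box-Cox transform and its inverse for a given lambda; the lambda values below are assumptions, since in practice lambda is estimated from the data.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a strictly positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam


def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    return math.exp(z) if lam == 0 else (lam * z + 1) ** (1 / lam)


transformed = box_cox(9.0, lam=0.5)           # (sqrt(9) - 1) / 0.5 = 4.0
restored = inverse_box_cox(transformed, 0.5)  # back to about 9.0
```

With `lam=0` the transform is the natural logarithm, and with `lam=1` it only shifts the data by one.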

Pros and Cons
1 Seasonality is allowed to change over time | 1 Prediction intervals often wide
2 Automated optimization | 2 Does not allow external regressors
3 Easy implementation | 3 Slow

Challenge
Use TBATS to predict weekly store footfall
1 Transform Index to have weekly frequency. Use "W"
2 Visualize data. Something will be off ;)
3 Create Training and Test Set. Test Set should be 5 weeks
4 Create TBATS Model
5 Predict 5 weeks and visualize them, together with the training and test set
6 Assess Model based on RMSE
Source: UK Government
ARIMA, SARIMA & SARIMAX

What does it all mean?
Acronym | Description
ARIMA | AutoRegressive Integrated Moving Average
SARIMA | Seasonal + ARIMA
SARIMAX | SARIMA + Exogenous variables

What is ARIMA?
Component | Description
AutoRegressive | The output is regressed on its own lagged values
Integrated | Number of times we need to do differencing to make our time series stationary
Moving Average | Instead of using the past values, the MA model uses past forecast errors.

ARMA recap
AutoRegressive: Past values, the lags, contain information that helps predict future values
Moving Average: Past error lags contain information that helps predict future values
Stationarity
Key idea
• Mean, variance and covariance are not time dependent
• Stationary Time Series have a clearly defined pattern
• Statistical test: Dickey-Fuller test. If the p-value is less than 0.05, the time series is considered stationary
[Figure: four panels of Y against t: Stationary Time Series, Time dependent mean, Time dependent variance, Time dependent covariance]
Making Data Stationary
Time Series | 1st differencing | 2nd differencing
5 | NA | NA
9 | 4 | NA
1 | -8 | -12
7 | 6 | 14
3 | -4 | -10
7 | 4 | 8
4 | -3 | -7
Key idea
• Making data stationary is simple, yet the concept is confusing.
• From a practical perspective, it is a check that we need to do
• The Auto.arima function does it automatically for us!
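The differencing columns in the table can be reproduced with a few lines of plain Python. A minimal sketch; the helper name is hypothetical, and the NA entries of the table correspond to the positions the shorter lists no longer contain.

```python
def difference(values):
    """First difference: each value minus the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


series = [5, 9, 1, 7, 3, 7, 4]
first = difference(series)    # [4, -8, 6, -4, 4, -3]
second = difference(first)    # [-12, 14, -10, 8, -7]
```

The number of times `difference` has to be applied before the series looks stationary is the d in ARIMA(p,d,q).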

SARIMAX
External Regressors
• The goal of the regressors is to model the remaining error.
• Information that is not recurrent over time or modifies itself.
Examples
• Moving seasonality: Events like Black Friday or seasonal holidays like Easter or Diwali do not fall on the same dates every year.
• Events outside the company's control: Factors like weather or corona interfere with the usual seasonality or trend, thus you need to model them in your forecast to decrease errors
• Events caused by the company: Major investment or strategy shifts affect the normal development of a KPI. You need to try to find a metric that represents any of these factors
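A moving event such as Black Friday can be turned into a 0/1 regressor. The sketch below uses only the standard library and assumes the US convention that Black Friday is the day after Thanksgiving, the fourth Thursday of November; the helper names and the date range are illustrative.

```python
from datetime import date, timedelta


def black_friday(year):
    """Day after the fourth Thursday of November."""
    first = date(year, 11, 1)
    days_to_thursday = (3 - first.weekday()) % 7   # Monday=0 ... Thursday=3
    thanksgiving = first + timedelta(days=days_to_thursday + 21)
    return thanksgiving + timedelta(days=1)


def event_dummy(days, event_days):
    """1 on event days, 0 otherwise: a simple external regressor column."""
    events = set(event_days)
    return [1 if d in events else 0 for d in days]


days = [date(2023, 11, 20) + timedelta(days=i) for i in range(7)]
dummy = event_dummy(days, [black_friday(2023)])
```

The resulting column can then be passed to a model that accepts exogenous variables, alongside other regressors such as weather.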

3 factors to optimize in ARIMA(p,d,q)
Order | Description | Explanation
p | Order of the Autoregressive part | Number of unknown terms that multiply your signal at past times
d | Degree of first Differencing involved | Number of differences to make time series stationary
q | Order of the Moving Average part | Number of unknown terms that multiply your forecast errors at past times
Key Idea
• p, d, and q are non-negative integers.
• No extra work, there are functions to optimize the factors automatically
6 factors to optimize in SARIMA
Time Series Data Type | Acronym | Factors
Seasonal Data | S | P, D, Q
Non-seasonal Data | ARIMA | p, d, q
Key Idea
• Despite having 3 more factors to optimize, they mirror the classic ARIMA (p, d, q)
• No extra work, there are functions to optimize the factors automatically

Akaike's Information Criterion (AIC) and Bayesian Information Criterion (BIC)
Key Ideas
• AIC and BIC provide a means to select a model
• Trade-off between simplicity and goodness of fit
• Deal with overfitting and underfitting
[Pseudo-visualization: goodness of fit plotted against simplicity]
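The trade-off can be made concrete with the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of parameters, n the number of observations and L the maximised likelihood. A minimal sketch; the two candidate models and their log-likelihoods are hypothetical numbers.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical candidates: (log-likelihood, number of parameters).
candidates = {"simple": (-120.0, 2), "complex": (-118.5, 6)}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
```

Here the complex model fits slightly better, but not by enough to justify four extra parameters, so the simple model has the lower AIC.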

Pros and Cons
1 Easy implementation | 1 Better with low amount of time periods or frequency
2 Automated optimization | 2 Low flexibility
3 Easy to understand

Challenge
Use SARIMAX to predict interest in Churrasco
1 Transform Index to have weekly frequency. Use "W"
2 Visualize data.
3 Create Training and Test Set. Test Set should be 10 weeks
4 Extract Exogenous Variables and Create SARIMAX model
5 Predict 10 weeks and visualize them, together with the training and test set
6 Assess Model based on MAPE
Source: Google Trends
Tensorflow Probability Structural Time Series

Structural Time Series
• Structural Time Series is the decomposition of the data into at least:
  • Trend
  • Seasonality
  • Exogenous impacts
  • Leftovers: noise
Methodological framework
$y(t) = c(t) + s(t) + x(t) + \epsilon$
[Figure: panels Data, Seasonality, Trend, Exogenous impacts]
Tensorflow Structural Time series
Decomposition
• Trend
• Seasonality - multiple
• Exogenous impacts
• AutoRegressive
• Noise
Seasonality
• Weekly
• Monthly
• Yearly
Autoregressive
• Focus on giving weight to recent information

Hamiltonian Monte Carlo
Simulation used for Bayesian Inference
Causal inference problem statement
We know what happened, but we do not know what led to it
Bayes Theorem
$P(\text{buy} \mid \text{impression}) = \frac{P(\text{impression} \mid \text{buy}) \cdot P(\text{buy})}{P(\text{impression})} = \frac{P(\text{impression} \mid \text{buy}) \cdot P(\text{buy})}{\int P(\text{impression} \mid \text{buy}) \cdot P(\text{buy}) \, d(\text{buy})}$
Problem statement
It is not possible to solve the equation and thus we simulate outcomes
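For intuition, Bayes' theorem can be evaluated directly when there are only two cases (buy or no buy), because the denominator is then a plain sum of two terms. The probabilities below are made-up assumptions; in the structural time series setting the denominator is an integral over many continuous parameters, which is why simulation is used instead.

```python
def posterior_buy(p_buy, p_impression_given_buy, p_impression_given_no_buy):
    """P(buy | impression) for the two-outcome case."""
    # Denominator P(impression): sum over both possible outcomes.
    evidence = (p_impression_given_buy * p_buy
                + p_impression_given_no_buy * (1 - p_buy))
    return p_impression_given_buy * p_buy / evidence


p = posterior_buy(p_buy=0.1,
                  p_impression_given_buy=0.8,
                  p_impression_given_no_buy=0.3)
# 0.08 / (0.08 + 0.27), roughly 0.23
```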
Tensorflow Structural Time Series Pros and Cons
1 Flexible | 1 Complex programming
2 Continuous regressors | 2 Very slow
3 Good with short-term dynamics
4 Intuitive
Challenge
Udemy Wikipedia page visits
1 Set as regressors Easter and Christmas variables
2 Split into training and test set and isolate Y
3 Create weekly and monthly seasonality objects
4 Create Trend and Autoregressive components
5 Create Tensorflow model and fit it with Hamiltonian Monte Carlo
6 Predict 30 days and add index to the predictions.
7 Visualize forecast, training and test data
Dataset: TSA package
Facebook Prophet

Facebook Prophet quick facts
1 Built by Facebook
2 Stan background - probabilistic programming language for statistical inference
3 Dynamic Holidays
4 Prophet forecasts are customizable in ways that are intuitive to non-experts
5 Built-in Cross Validation & Hyperparameter Tuning

Prophet Mechanics
Methodological framework
$y(t) = c(t) + s(t) + h(t) + x(t) + \epsilon$
Where:
• c(t): Trend
• s(t): Seasonality
• h(t): Holiday effects
• x(t): External regressors
• $\epsilon$: error
Dynamic Holidays – Valentine's example
Facebook Prophet: You state Valentine's as a key event and specify how many days before/after to quantify
Other models: You must create dummy variables for each day, if you believe they have different impacts
[Figure: chocolate demand from 11 to 15 February]
Facebook Prophet Model
Growth | Linear or Logistic
Holidays | Dataframe that we prepared
Seasonality | Yearly, weekly or daily. True or False
Seasonality_mode | Multiplicative or additive
Seasonality_prior_scale | Strength of the seasonality. Larger values allow the model to fit larger seasonal fluctuations
Holidays_prior_scale | Strength of the holiday effects. Larger values allow the model to fit larger holiday effects
Changepoint_prior_scale | Flexibility of the automatic changepoint selection

Cross Validation
[Figure: Training set and Test set]
Key Idea
Repeating the assessment of our model reinforces its evaluation
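Repeating the assessment means evaluating the model at several cutoffs in time, always testing on the period that follows the training data. The generator below is a generic rolling-origin sketch in plain Python, not Prophet's own routine, which has its own rules for placing cutoffs; the function name and the sizes are assumptions.

```python
def rolling_origin_splits(n_obs, initial, horizon, step):
    """Yield (train_indices, test_indices) with an expanding training window."""
    cutoff = initial
    while cutoff + horizon <= n_obs:
        # Train on everything before the cutoff, test on the next `horizon` points.
        yield range(0, cutoff), range(cutoff, cutoff + horizon)
        cutoff += step


splits = list(rolling_origin_splits(n_obs=10, initial=4, horizon=2, step=2))
for train_idx, test_idx in splits:
    print(list(train_idx), list(test_idx))
```

Averaging an error measure such as MAE over all splits gives a more robust picture than a single training/test split.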

Parameters to tune
Seasonality_prior_scale | Strength of the seasonality. Larger values allow the model to fit larger seasonal fluctuations
Holidays_prior_scale | Strength of the holiday effects. Larger values allow the model to fit larger holiday effects
Changepoint_prior_scale | Flexibility of the automatic changepoint selection

Pros and Cons
1 Built-in Cross Validation | 1 Can need intense optimization
2 Dynamic events | 2 Not good with short-term dynamics
3 Allows regressors | 3 Not good with non-linear regressors
Challenge
Demand for Shelter in New York City
1 Rename Dependent and Time Variable to y and ds
2 Declare Easter and Thanksgiving as holidays. Combine them. Use pd.concat
3 Create Prophet model. Christmas is a regressor
4 Cross Validation. Horizon = 31, initial = 2400. Assess via MAE
5 Create Parameter Grid for Tuning
6 Perform Hyperparameter Tuning. Use MAE as the KPI to optimize. Gather Results
Dataset: Open Data NYC initiative

Facebook Prophet + XGBoost

Prophet and XGBoost step by step
Tuned Prophet Model
Borrow Seasonality, Trend and other Variables
Prepare XGBoost Matrices
Set Parameters
Run XGBoost
Assess Model
XGBoost is a state-of-the-art Machine Learning Algorithm
1 Stands for Extreme Gradient Boosting
2 Can be constructed with a tree-based algorithm or linear (worse results)
3 It is an ensemble algorithm
4 Each new model is built upon the preceding one -> continuous improvement
5 Can be used for both Regression and Classification

XGBoost gives different weights depending on how difficult it is to predict
Outcome | Predictor | Weight (First Tree) | Weight (Second Tree) | Weight (Third Tree)
1 | X | 25% | 20% | 23%
0 | X | 25% | 20% | 15%
0 | X | 25% | 30% | 35%
1 | X | 25% | 30% | 27%

XGBoost looks at parts of the observations at a time
Outcome | Predictor | Weight (First Tree) | Weight (Second Tree) | Weight (Third Tree)
1 | X1 | 25% | 20% | 23%
0 | X2 | 25% | 20% | -
0 | X3 | - | 30% | 35%
1 | X4 | 25% | - | 27%
Key Idea
• XGBoost only looks at a fraction of the observations at a time
• Observations that are more difficult to predict are given a bigger weight
The logic is similar for Regression-based tasks
First Tree
Error | Outcome | Predictor | Weight
-5 | 15 | X1 | 33%
2 | 22 | X2 | 33%
4 | 34 | X4 | 33%
Second Tree
Error | Outcome | Predictor | Weight
-1 | 19 | X1 | 40%
-1 | 25 | X2 | 30%
3 | 35 | X4 | 35%

XGBoost also gives different weights to different predictors
First Tree
Error | Outcome | Weight
-5 | 15 | 33%
2 | 22 | 33%
4 | 34 | 33%
Second Tree
Error | Outcome | Weight
-1 | 19 | 40%
-1 | 25 | 30%
3 | 35 | 35%
Third Tree
Error | Outcome | Weight
1 | 21 | 35%
0 | 24 | 30%
2 | 36 | 40%
Predictor weights shown across the X1, X2, X3 columns: 50% and 50% in the first and second trees, 40% and 60% in the third tree
Key Idea
Predictors also have different weights if they yield different model results
XGBoost quirks
NA:
Unlike other regression models, XGBoost treats NAs as information
Non-linearity:
XGBoost is excellent at dealing with non-linear relationships between the dependent and the independent variables.

Which parameters are there?
Parameter | Description
Minimum Child weight | Relates to the sum of the weights of each observation. Low values can mean that maybe not a lot of observations are in the round
ETA | Learning Rate. How fast do you want the model to learn?
Max depth | How big should the tree be? Bigger trees go into more detail
Gamma | How readily should the tree be split? It is the minimum loss reduction required to split a node further
Subsample | Share of observations used in each tree
Colsample by tree | Share of predictors (columns) sampled for each tree
Number of rounds | How many times do we want the analysis to be run?
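As a configuration sketch, the parameters above map onto the names used by the XGBoost Python package as far as the author of this note is aware. All values are illustrative assumptions rather than tuned settings, and the `objective` entry is an addition that does not appear in the table.

```python
# Illustrative values only: assumptions, not tuned settings.
xgb_params = {
    "min_child_weight": 1,            # Minimum Child weight
    "eta": 0.1,                       # ETA, the learning rate
    "max_depth": 6,                   # Max depth of each tree
    "gamma": 0,                       # minimum loss reduction needed to split
    "subsample": 0.8,                 # share of observations sampled per tree
    "colsample_bytree": 0.8,          # share of predictors sampled per tree
    "objective": "reg:squarederror",  # assumption: regression on a continuous target
}
num_boost_round = 100                 # Number of rounds
```

A dictionary like this is typically passed to the training call together with the number of boosting rounds; the exact call depends on the interface used.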

Prophet + XGBoost Pros and Cons
1 Great with regressors | 1 Can need intense optimization
2 Decent with short-term dynamics

Challenge
Demand for Shelter in New York City
1 Create future DF with test set length. Add regressor
2 Forecast and create a DF with: trend, weekly, yearly, holidays, multiplicative_terms
3 Concatenate with df. Drop Easter and Thanksgiving
4 Generate Training and Test Set. Isolate X and Y and form XGBoost Matrices
5 Set Parameters and Create XGBoost model
6 Predict. Visualize Test Set and Predictions. Assess model using MAPE
Source: UK Government
Ensemble

Ensemble mechanism
Example
Date | Y | Holt-Winters | SARIMAX | TBATS | TFP | Prophet | XGBoost | Ensemble
t | 50 | 48 | 49 | 51 | 50.5 | 53 | 51 | 50.5
Key Idea
• Ensemble is an average of models. All models have flaws, but if you group all of them, then the errors of the individual models tend to average out
To consider:
• Dynamic average. You give more weight to models that have lower errors and punish the ones that are not performing as well.
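Both the plain average and the dynamic average can be sketched in a few lines. The forecasts are the ones from the example row; the past error values used for the weights are hypothetical, and inverse-error weighting is only one possible scheme.

```python
forecasts = {"Holt-Winters": 48, "SARIMAX": 49, "TBATS": 51,
             "TFP": 50.5, "Prophet": 53, "XGBoost": 51}


def simple_average(forecasts):
    """Plain ensemble: every model counts the same."""
    return sum(forecasts.values()) / len(forecasts)


def weighted_average(forecasts, errors):
    """Dynamic ensemble: weight each model by the inverse of its past error."""
    weights = {m: 1 / errors[m] for m in forecasts}   # errors must be > 0
    total = sum(weights.values())
    return sum(forecasts[m] * weights[m] for m in forecasts) / total


# Hypothetical past MAE per model, used only to build the weights.
past_mae = {"Holt-Winters": 4.0, "SARIMAX": 2.0, "TBATS": 2.0,
            "TFP": 1.0, "Prophet": 8.0, "XGBoost": 2.0}

plain = simple_average(forecasts)               # about 50.4
dynamic = weighted_average(forecasts, past_mae)
```

The plain mean of the six forecasts is about 50.4, close to the 50.5 shown in the example row. With these assumed errors, the model with the worst track record (Prophet here) pulls the dynamic average much less than it pulls the plain one.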

Why Ensemble
The research on combining forecasts to achieve better accuracy is extensive, persuasive, and consistent.
Deep dives
• Essam Mahmoud, "Accuracy in Forecasting: A Survey," Journal of Forecasting, April–June 1984, p. 139.
• Spyros Makridakis and Robert L. Winkler, "Averages of Forecasts: Some Empirical Results," Management Science, September 1983, p. 987.
• Victor Zarnowitz, "The Accuracy of Individual and Group Forecasts from Business Outlook Surveys," Journal of Forecasting, January–March 1984, p. 10.

Pros and Cons
1 Accuracy | 1 Lack of visibility
2 Preparation

