Temporal Fusion Transformer Slides

Uploaded by Ahmed Fakhry

Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting

Ahmed Fakhry Elnaggar


Multi-horizon Time Series Forecasting

[Figure: standard univariate (one-step) forecasting vs. multi-horizon forecasting]


The Problem of Heterogeneity
• Well-known approaches: early fusion of features, or late fusion via multi-branch / ensemble models
• Seq2Seq models handle only sequential inputs

• The paper solves this issue via:

• An encoder for static covariates
• A gating mechanism to filter out irrelevant features
• Temporal processing of past inputs and known future inputs
Static Covariate Encoder
• What is a static covariate?
• It is any feature that is constant over time (totally time-independent), such as a product ID, a product description, or a mathematical constant.

• The encoder:
• Integrates the static features into the network by converting them into four context vectors (C1, C2, C3, C4)
Static Covariate Encoder
• These context vectors are used for:
• Temporal variable selection, inside the variable selection blocks (C1)
• Local processing of input features: used as context for the sequence encoder (C2, C3)
• Static enrichment of temporal features: input to the GRNs before the multi-head attention layer (C4)
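The four context vectors can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the hidden size is arbitrary, random matrices stand in for trained weights, and a simplified gated residual transform (layer norm and biases omitted) plays the role of the paper's GRNs, one per context vector.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (illustrative assumption)

def elu(z):
    return np.where(z > 0, z, np.expm1(np.minimum(z, 0.0)))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_grn(rng, d):
    """One simplified gated residual transform (layer norm, biases omitted)."""
    W2, W1, W4, W5 = (rng.normal(scale=0.3, size=(d, d)) for _ in range(4))
    def grn(a):
        eta = elu(a @ W2) @ W1                 # nonlinear transform
        glu = sigmoid(eta @ W4) * (eta @ W5)   # gating (GLU)
        return a + glu                         # residual connection
    return grn

# The encoder: one GRN per context vector, all fed the same static embedding.
static_embedding = rng.normal(size=d)  # e.g. an embedded product ID
c1, c2, c3, c4 = (make_grn(rng, d)(static_embedding) for _ in range(4))
print(c1.shape)  # (8,)
```

Each context vector is then wired into a different part of the network, as listed above.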
Static Covariate Encoder
• Variable Selection Network
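The idea of the variable selection network can be sketched as follows: per-variable embeddings are flattened, combined with the static context C1 to produce softmax selection weights, and the (nonlinearly processed) variables are summed under those weights. Everything here is illustrative: the variable count, embedding size, and weights are random stand-ins, and a plain linear layer substitutes for the per-variable GRNs.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 3, 4  # 3 input variables, embedding size 4 (assumptions)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Embedded inputs at one time step, plus the static selection context C1.
xi = rng.normal(size=(m, d))   # one embedding per variable
c1 = rng.normal(size=d)        # static context from the covariate encoder

# Selection weights: flatten all embeddings, append C1, project to m logits.
W_sel = rng.normal(scale=0.1, size=(m * d + d, m))
weights = softmax(np.concatenate([xi.ravel(), c1]) @ W_sel)  # sums to 1

# Per-variable processing (a linear layer stands in for a GRN here),
# then a weighted combination across variables.
W_var = rng.normal(scale=0.1, size=(m, d, d))
processed = np.einsum('md,mde->me', xi, W_var)
combined = weights @ processed  # shape (d,)
print(weights.round(3))
```

The softmax weights are what later gets aggregated for variable-importance analysis.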
Gating Mechanism
• What is a gated neural network?
• Is it a gated recurrent neural network? No

• It is an extension of residual neural networks


Gating Mechanism
• Think of it as a coefficient that controls how much of the low-level features (the identity connection) versus the stacked nonlinear layers contributes to the output.
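A minimal NumPy sketch of this Gated Residual Network block, under illustrative assumptions (random matrices instead of trained weights, biases omitted, arbitrary hidden size): an ELU feed-forward transform is gated by a GLU, added back onto the identity path, and layer-normalized. When the sigmoid gate saturates near zero, the block collapses toward the identity connection.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8  # hidden size (assumption)

def elu(z):
    return np.where(z > 0, z, np.expm1(np.minimum(z, 0.0)))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_norm(z, eps=1e-5):
    return (z - z.mean()) / np.sqrt(z.var() + eps)

W2, W1, W4, W5 = (rng.normal(scale=0.3, size=(d, d)) for _ in range(4))

def grn(a):
    eta2 = elu(a @ W2)                     # nonlinear transform
    eta1 = eta2 @ W1
    glu = sigmoid(eta1 @ W4) * (eta1 @ W5) # gate near 0 suppresses the block
    return layer_norm(a + glu)             # residual path + layer norm

x = rng.normal(size=d)
print(grn(x).shape)  # (8,)
```

The gate is the "coefficient" described above: it lets the network skip the nonlinear layers entirely for inputs where they add nothing.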
Temporal Processing
• A seq2seq layer captures short-term temporal relationships in the inputs

(a) LSTM Encoder-Decoder Network


Temporal Processing
• Multi-head attention for long-term dependencies

(b) Multi-head Self-attention Layer
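The long-term part can be sketched as causal (masked) multi-head self-attention. As a rough nod to the paper's interpretable variant, the sketch shares one value projection across heads and averages the head outputs; all shapes and weights are illustrative stand-ins, not the actual implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
T, d, H = 5, 8, 2  # sequence length, model dim, heads (assumptions)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = rng.normal(size=(T, d))
Wv = rng.normal(scale=0.3, size=(d, d))  # value weights shared by all heads
Wq = rng.normal(scale=0.3, size=(H, d, d))
Wk = rng.normal(scale=0.3, size=(H, d, d))

# Causal mask: each position attends only to itself and the past.
mask = np.triu(np.full((T, T), -np.inf), k=1)

V = x @ Wv
head_out = []
for h in range(H):
    scores = (x @ Wq[h]) @ (x @ Wk[h]).T / np.sqrt(d)
    A = softmax(scores + mask)   # attention weights, each row sums to 1
    head_out.append(A @ V)
out = np.mean(head_out, axis=0)  # heads averaged over the shared values
print(out.shape)  # (5, 8)
```

Because the heads share values and are averaged, the attention weights themselves become interpretable, which the later slides exploit for visualizing temporal patterns.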


Putting All Blocks Together
Performance: Data

• The UCI Electricity Load Diagrams dataset (short-term forecasting: one week of history to predict one day)
• The UCI PEMS-SF Traffic dataset (short-term forecasting)
• The Favorita Grocery Sales dataset (short-term forecasting: 90 days of past data to predict 30 days)
• Data for 31 stock indices (medium-term forecasting: one year of history to predict one week)
Performance: Data Stats
Performance: Benchmarking
Performance: Benchmarking
Interpretability Use Cases
• Analyzing variable importance by aggregating the selection weights from the variable selection blocks (for example, on the retail dataset)
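The aggregation step can be sketched directly: collect the softmax selection weights over all samples and time steps, then average per variable. The weights and variable names below are hypothetical, invented only to show the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical selection weights from the variable selection block:
# (samples, time steps, variables), each row a softmax distribution.
logits = rng.normal(size=(100, 30, 3))
e = np.exp(logits - logits.max(axis=-1, keepdims=True))
weights = e / e.sum(axis=-1, keepdims=True)

# Variable importance = selection weight averaged over samples and time.
importance = weights.mean(axis=(0, 1))
for name, w in zip(["price", "promotions", "day_of_week"], importance):
    print(f"{name}: {w:.1%}")
```

Since each row of weights sums to one, the averaged importances also sum to one and can be read as percentages.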
Interpretability Use Cases
• Visualizing persistent temporal patterns:
• Using the attention weights to shed light on the most important past time steps.
Interpretability Use Cases
• Detecting anomalies by taking shifts in the attention patterns into account.
