TCTN: A 3D-Temporal Convolutional Transformer Network for Spatiotemporal Predictive Learning

Yang, Ziao; Yang, Xiangrui; Lin, Qifeng


arXiv:2112.01085v1 (cs)

[Submitted on 2 Dec 2021 (this version), latest version 3 Jun 2022 (v2)]

Abstract: Spatiotemporal predictive learning aims to generate future frames given a sequence of historical frames. Conventional algorithms are mostly based on recurrent neural networks (RNNs). However, RNNs suffer from a heavy computational burden, such as long training time and a long back-propagation process, due to the serial nature of the recurrent structure. Recently, Transformer-based methods have also been investigated in the form of an encoder-decoder or a plain encoder, but the encoder-decoder form requires very deep networks and the plain encoder lacks short-term dependencies. To tackle these problems, we propose an algorithm named 3D-temporal convolutional transformer (TCTN), in which a Transformer-based encoder with temporal convolutional layers is employed to capture both short-term and long-term dependencies. The proposed algorithm is easy to implement and trains much faster than RNN-based methods thanks to the parallel mechanism of the Transformer. To validate the algorithm, we conduct experiments on the MovingMNIST and KTH datasets, and show that TCTN outperforms state-of-the-art (SOTA) methods in both performance and training speed.
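The idea of pairing temporal convolution (short-term context) with self-attention (long-term context) can be illustrated with a toy sketch. This is not the authors' architecture: the abstract gives no layer details, so the function names (`causal_temporal_conv`, `causal_self_attention`, `toy_block`), the fixed kernel, the use of 1D convolution over per-frame feature vectors instead of 3D convolution over video tensors, and the absence of learned projections are all assumptions made purely for illustration.

```python
import math

def causal_temporal_conv(seq, kernel):
    """Causal 1D convolution over time (hypothetical helper).

    Output at step t mixes only frames t-k+1..t, so it captures
    short-term context without looking at future frames.
    """
    k = len(kernel)
    dim = len(seq[0])
    out = []
    for t in range(len(seq)):
        acc = [0.0] * dim
        for j, w in enumerate(kernel):
            src = t - (k - 1) + j  # kernel[-1] weights the current frame
            if src >= 0:           # implicit zero padding on the left
                for d in range(dim):
                    acc[d] += w * seq[src][d]
        out.append(acc)
    return out

def causal_self_attention(seq):
    """Scaled dot-product self-attention with a causal mask (hypothetical helper).

    Queries, keys, and values are the inputs themselves (no learned
    projections), and step t attends to steps 0..t only.
    """
    dim = len(seq[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for t, q in enumerate(seq):
        scores = [scale * sum(a * b for a, b in zip(q, seq[s]))
                  for s in range(t + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(x - m) for x in scores]
        total = sum(weights)
        ctx = [sum(w * seq[s][d] for s, w in enumerate(weights)) / total
               for d in range(dim)]
        out.append(ctx)
    return out

def toy_block(seq, kernel):
    """Short-term mixing by convolution, then long-term mixing by attention."""
    return causal_self_attention(causal_temporal_conv(seq, kernel))

# Four "frames", each reduced to a 2-dimensional feature vector (made-up data).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
features = toy_block(frames, kernel=[0.5, 0.5])
```

In this sketch every output step is computed from the inputs alone and never from an earlier output, so all time steps could be evaluated in parallel. That independence is the property the abstract credits for faster training compared with RNNs, where step t must wait for the hidden state of step t-1.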

Comments:	8 pages, 6 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Cite as:	arXiv:2112.01085 [cs.CV]
	(or arXiv:2112.01085v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2112.01085

Submission history

From: Ziao Yang
[v1] Thu, 2 Dec 2021 10:05:01 UTC (8,201 KB)
[v2] Fri, 3 Jun 2022 04:50:30 UTC (6,471 KB)
