Improve Video Representation with Temporal Adversarial Augmentation

Duan, Jinhao; Fan, Quanfu; Cheng, Hao; Shi, Xiaoshuang; Xu, Kaidi

Computer Science > Computer Vision and Pattern Recognition

arXiv:2304.14601 (cs)

[Submitted on 28 Apr 2023 (v1), last revised 14 May 2023 (this version, v2)]

Abstract: Recent works reveal that adversarial augmentation benefits the generalization of neural networks (NNs) when used appropriately. In this paper, we introduce Temporal Adversarial Augmentation (TA), a novel video augmentation technique that utilizes temporal attention. Unlike conventional adversarial augmentation, TA is specifically designed to shift the attention distributions of neural networks with respect to video clips by maximizing a temporal-related loss function. We demonstrate that TA obtains diverse temporal views, which significantly affect the focus of neural networks. Training with these examples remedies the flaw of unbalanced temporal information perception and enhances the ability to defend against temporal shifts, ultimately leading to better generalization. To leverage TA, we propose the Temporal Video Adversarial Fine-tuning (TAF) framework for improving video representations. TAF is a model-agnostic, generic, and interpretability-friendly training strategy. We evaluate TAF with four powerful models (TSM, GST, TAM, and TPN) over three challenging temporal-related benchmarks (Something-Something V1 & V2 and Diving48). Experimental results demonstrate that TAF effectively improves the test accuracy of these models by notable margins without introducing additional parameters or computational costs. As a byproduct, TAF also improves robustness under out-of-distribution (OOD) settings. Code is available at this https URL.
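The abstract does not give TA's exact loss, so the following is only a toy sketch of the general idea (an adversarial perturbation that shifts temporal attention), not the authors' method. It assumes a hypothetical model whose temporal attention is a softmax over per-frame scores `s_t = w . x_t`, and a hypothetical stand-in loss `L = -log a_{t*}`, where `t*` is the frame the model attends to most on the clean clip. Maximizing `L` pushes attention away from that frame. The names `temporal_attention` and `temporal_adversarial_augment` and the `eps`/`alpha`/`steps` values are illustrative assumptions. The ascent itself is standard PGD-style signed-gradient steps projected onto an L-infinity ball of radius `eps`.

```python
import math
from typing import List

Clip = List[List[float]]  # T frames, each a d-dimensional feature vector


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def temporal_attention(clip: Clip, w: List[float]) -> List[float]:
    """Hypothetical toy attention: softmax over per-frame scores s_t = w . x_t."""
    scores = [sum(wi * xi for wi, xi in zip(w, frame)) for frame in clip]
    return softmax(scores)


def temporal_adversarial_augment(clip: Clip, w: List[float],
                                 eps: float = 0.1, alpha: float = 0.03,
                                 steps: int = 5) -> Clip:
    """PGD-style ascent on the hypothetical loss L = -log a_{t*}.

    t* is the most-attended frame on the clean clip. Raising L lowers
    attention on t*, i.e. it shifts the model's temporal focus.
    """
    clean_attn = temporal_attention(clip, w)
    t_star = max(range(len(clip)), key=clean_attn.__getitem__)
    delta = [[0.0] * len(frame) for frame in clip]
    for _ in range(steps):
        adv = [[x + d for x, d in zip(f, df)] for f, df in zip(clip, delta)]
        attn = temporal_attention(adv, w)
        for t, frame_delta in enumerate(delta):
            # Softmax + log gradient: dL/ds_t = a_t - 1[t == t*]
            g_s = attn[t] - (1.0 if t == t_star else 0.0)
            for i, wi in enumerate(w):
                g = g_s * wi  # chain rule: ds_t/dx_{t,i} = w_i
                step = alpha * (1.0 if g > 0 else -1.0 if g < 0 else 0.0)
                # Signed ascent step, then project back into [-eps, eps]
                frame_delta[i] = max(-eps, min(eps, frame_delta[i] + step))
    return [[x + d for x, d in zip(f, df)] for f, df in zip(clip, delta)]


if __name__ == "__main__":
    clip = [[0.0, 0.0], [1.0, 0.5], [0.2, 0.1]]
    w = [1.0, 1.0]
    adv = temporal_adversarial_augment(clip, w)
    print("clean attention:", temporal_attention(clip, w))
    print("adversarial attention:", temporal_attention(adv, w))
```

In this toy setting, attention on frame 1 (the clean clip's focus) drops after augmentation while the perturbation stays within `eps`. A training loop in the spirit of TAF would then also fit the model on such views, but how the paper combines clean and adversarial examples is not stated in the abstract.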

Comments:	To appear in IJCAI 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2304.14601 [cs.CV]
	(or arXiv:2304.14601v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2304.14601

Submission history

From: Jinhao Duan
[v1] Fri, 28 Apr 2023 03:06:37 UTC (16,496 KB)
[v2] Sun, 14 May 2023 19:25:16 UTC (16,496 KB)