Action Recognition in Untrimmed Videos with Composite Self-Attention Two-Stream Framework

Cao, Dong; Xu, Lisha; Chen, HaiBo

doi:10.1007/978-3-030-41299-9_3

arXiv:1908.04353 (cs)

[Submitted on 4 Aug 2019 (v1), last revised 2 Sep 2019 (this version, v2)]

Abstract: With the rapid development of deep learning algorithms, action recognition in video has achieved many important research results. One problem in action recognition, Zero-Shot Action Recognition (ZSAR), which classifies new categories without any positive examples, has recently attracted considerable attention. Another difficulty is that untrimmed data may seriously degrade model performance. We propose a composite two-stream framework with a pre-trained model. The framework includes a classifier branch and a composite feature branch. A graph network model is adopted in each of the two branches, which effectively improves the feature extraction and reasoning ability of the framework. In the composite feature branch, a 3-channel self-attention model is constructed to weight each frame in the video and give more attention to the key frames. Each self-attention channel outputs a set of attention weights that focuses on a particular aspect of the video, and each set of attention weights corresponds to a one-dimensional vector. The 3-channel self-attention model can evaluate key frames from multiple aspects, and the output attention weight vectors together form an attention matrix, which effectively strengthens attention on the key frames that are strongly correlated with the action. The model can perform action recognition under zero-shot conditions and has good recognition performance on untrimmed video data. Experimental results on relevant datasets confirm the validity of our model.
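
The frame-weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how each channel scores a frame, so the linear scorer below, the softmax normalization, and all names (`attention_matrix`, `attend`, `channel_params`, the toy feature values) are assumptions made for illustration, not the authors' implementation. Each of the 3 channels produces one weight vector over the frames, and stacking the vectors gives a 3 x T attention matrix.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_matrix(frames, channel_params):
    """Return one attention weight vector over the frames per channel.

    frames: T per-frame feature vectors, each of length d.
    channel_params: one scoring vector of length d per channel
    (a hypothetical linear scorer; the paper's scorer may differ).
    """
    matrix = []
    for w in channel_params:
        # Score every frame with this channel, then normalize over frames.
        scores = [sum(wi * xi for wi, xi in zip(w, frame)) for frame in frames]
        matrix.append(softmax(scores))
    return matrix

def attend(frames, matrix):
    """Attention-weighted sum of frame features, one vector per channel."""
    d = len(frames[0])
    return [
        [sum(a * frame[j] for a, frame in zip(row, frames)) for j in range(d)]
        for row in matrix
    ]

# Toy input: 4 frames with 2-dimensional features (made-up values).
frames = [[0.1, 0.0], [2.0, 0.5], [0.2, 1.5], [0.0, 0.1]]
# Three channels, each looking at a different aspect of the features.
channel_params = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

A = attention_matrix(frames, channel_params)  # 3 x 4 attention matrix
video_repr = attend(frames, A)                # 3 x 2 weighted features
```

In this toy setup each row of `A` sums to 1, and different channels peak on different frames, which is the sense in which several channels can rate key frames from multiple aspects.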

Comments:	Accepted to ACPR 2019
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1908.04353 [cs.CV]
	(or arXiv:1908.04353v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1908.04353
Related DOI:	https://doi.org/10.1007/978-3-030-41299-9_3

Submission history

From: Dong Cao
[v1] Sun, 4 Aug 2019 02:44:37 UTC (509 KB)
[v2] Mon, 2 Sep 2019 05:36:11 UTC (500 KB)
