TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection

Min, Kyle; Corso, Jason J.

arXiv:1908.05786 (cs)

[Submitted on 15 Aug 2019]

Abstract: TASED-Net is a 3D fully-convolutional network architecture for video saliency detection. It consists of two building blocks: first, the encoder network extracts low-resolution spatiotemporal features from an input clip of several consecutive frames; then the prediction network decodes the encoded features spatially while aggregating all the temporal information. As a result, a single prediction map is produced from an input clip of multiple frames. Frame-wise saliency maps can be predicted by applying TASED-Net to a video in a sliding-window fashion. The proposed approach assumes that the saliency map of any frame can be predicted by considering a limited number of past frames. The results of our extensive experiments on video saliency detection validate this assumption and demonstrate that our fully-convolutional model with temporal aggregation is effective. TASED-Net significantly outperforms previous state-of-the-art approaches on all three major large-scale datasets for video saliency detection: DHF1K, Hollywood2, and UCFSports. After analyzing the results qualitatively, we observe that our model is particularly strong at attending to salient moving objects.
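
The sliding-window scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `sliding_window_saliency`, `predict_clip`, and `clip_len` are hypothetical names, `predict_clip` stands in for a trained network that maps one clip to one saliency map, and padding the first frames by repeating frame 0 is an assumption made here so that every frame receives a prediction.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
SaliencyMap = TypeVar("SaliencyMap")


def sliding_window_saliency(
    frames: Sequence[Frame],
    predict_clip: Callable[[List[Frame]], SaliencyMap],
    clip_len: int,
) -> List[SaliencyMap]:
    """Return one map per frame, each computed from that frame and
    the clip_len - 1 frames that precede it."""
    if clip_len < 1:
        raise ValueError("clip_len must be at least 1")

    maps: List[SaliencyMap] = []
    for t in range(len(frames)):
        start = t - clip_len + 1
        if start >= 0:
            # Enough history: use the clip that ends at frame t.
            clip = list(frames[start:t + 1])
        else:
            # Assumption (not from the abstract): pad missing history
            # by repeating the first frame.
            clip = [frames[0]] * (-start) + list(frames[:t + 1])
        # The model aggregates the whole clip into a single map,
        # which is assigned to the last frame of the clip.
        maps.append(predict_clip(clip))
    return maps
```

For a quick check without a real model, frames can be plain numbers and `predict_clip` can be the built-in `sum`: `sliding_window_saliency([1, 2, 3, 4], sum, 2)` evaluates the clips `[1, 1]`, `[1, 2]`, `[2, 3]`, and `[3, 4]` in turn.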

Comments:	ICCV 2019 camera ready (Supplementary material: on CVF soon)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1908.05786 [cs.CV]
	(or arXiv:1908.05786v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1908.05786

Submission history

From: Kyle Min
[v1] Thu, 15 Aug 2019 22:30:50 UTC (9,256 KB)
