Active Learning for Video Description With Cluster-Regularized Ensemble Ranking

Chan, David M.; Vijayanarasimhan, Sudheendra; Ross, David A.; Canny, John

arXiv:2007.13913 (cs)

[Submitted on 27 Jul 2020 (v1), last revised 2 Dec 2020 (this version, v3)]

Abstract: Automatic video captioning aims to train models to generate text descriptions for all segments in a video. However, the most effective approaches require large amounts of manual annotation, which is slow and expensive. Active learning is a promising way to efficiently build a training set for video captioning tasks while reducing the need to manually label uninformative examples. In this work, we explore various active learning approaches for automatic video captioning and show that a cluster-regularized ensemble strategy provides the best active learning approach for efficiently gathering training sets for video captioning. We evaluate our approaches on the MSR-VTT and LSMDC datasets using both transformer- and LSTM-based captioning models and show that our novel strategy can achieve high performance while using up to 60% less training data than strong state-of-the-art baselines.
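
The abstract names the strategy but does not spell out its mechanics, so the following is only a generic sketch of how ensemble-based ranking might be combined with a cluster-based diversity constraint; it is not the authors' algorithm. Everything in it is an assumption made for illustration: the function name `select_for_annotation`, the use of variance across ensemble members as the uncertainty score, the precomputed `cluster_ids`, and the per-cluster cap.

```python
import math
from collections import Counter
from statistics import pvariance


def select_for_annotation(member_scores, cluster_ids, budget):
    """Pick up to `budget` unlabeled sample indices to send for annotation.

    member_scores[m][i]: score that ensemble member m assigns to sample i
    cluster_ids[i]: precomputed cluster index of sample i (hypothetical input)
    """
    n = len(cluster_ids)
    # Assumption: disagreement (variance) across members is the uncertainty proxy.
    disagreement = [
        pvariance([scores[i] for scores in member_scores]) for i in range(n)
    ]
    ranked = sorted(range(n), key=lambda i: disagreement[i], reverse=True)

    # Assumption: "cluster regularization" is modeled as a cap on picks per cluster.
    cap = math.ceil(budget / len(set(cluster_ids)))
    taken = Counter()
    chosen, skipped = [], []
    for i in ranked:
        if len(chosen) == budget:
            break
        if taken[cluster_ids[i]] < cap:
            chosen.append(i)
            taken[cluster_ids[i]] += 1
        else:
            skipped.append(i)
    # If the cap left the budget unfilled, fall back to plain ranking order.
    chosen.extend(skipped[: budget - len(chosen)])
    return chosen


# Two ensemble members scoring four unlabeled clips; clips 0-2 share a cluster.
scores = [[0.0, 0.0, 0.0, 0.5], [1.0, 0.8, 0.6, 0.5]]
clusters = [0, 0, 0, 1]
print(select_for_annotation(scores, clusters, budget=2))
```

In this toy case, pure uncertainty ranking would pick clips 0 and 1 from the same cluster, while the capped version trades the second pick for a clip from the other cluster. That trade is the kind of redundancy reduction that motivates adding a diversity term to a ranking-based acquisition rule.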

Comments:	Published at the 15th Asian Conference on Computer Vision (ACCV 2020)
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2007.13913 [cs.CV]
	(or arXiv:2007.13913v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2007.13913

Submission history

From: David Chan
[v1] Mon, 27 Jul 2020 23:52:41 UTC (8,343 KB)
[v2] Wed, 29 Jul 2020 17:39:56 UTC (8,343 KB)
[v3] Wed, 2 Dec 2020 23:38:20 UTC (9,375 KB)
