Google Scholar

End-to-end video captioning

S Olivastri, G Singh, F Cuzzolin - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

… We believe we managed to set a new baseline for future work thanks to our principled
end-to-end architecture, providing an opportunity to take research in the field forward starting from …

Cited by 35

End-to-end dense video captioning with masked transformer

L Zhou, Y Zhou, JJ Corso… - Proceedings of the …, 2018 - openaccess.thecvf.com

… an end-to-end framework for doing dense video captioning that is … porates the semantics from
captions to the proposal module. … method with baselines on the ActivityNet Caption dataset. …

Cited by 752

End-to-end generative pretraining for multimodal video captioning

PH Seo, A Nagrani, A Arnab… - Proceedings of the …, 2022 - openaccess.thecvf.com

… end-to-end to generate a caption from raw pixels and transcribed speech directly. Our model
achieves state-of-the-art performance for multimodal video captioning on … a simple baseline, …

Cited by 236

End-to-end video captioning with multitask reinforcement learning

L Li, B Gong - 2019 IEEE winter conference on applications of …, 2019 - ieeexplore.ieee.org

… , this is the first video captioning model that is trained end-to-end from the raw video input to
the cap… Baseline is the sentence generated by our baseline model, MR stands for sentence …

Cited by 81

SwinBERT: End-to-end transformers with sparse attention for video captioning

K Lin, L Li, CC Lin, F Ahmed, Z Gan… - Proceedings of the …, 2022 - openaccess.thecvf.com

… First of all, we present a baseline that does not have any learnable attention mask, shown
in the first row of Table 4b. In the second row, we show another baseline which uses a learn…

Cited by 367

Hierarchical video-moment retrieval and step-captioning

A Zala, J Cho, S Kottur, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

… In step captioning, models generate a textual summary for each step. We also present … -specific
and end-to-end joint baseline models for our new benchmark. While the baseline models …

Cited by 85

Video caption based searching using end-to-end dense captioning and sentence embeddings

A Aggarwal, A Chauhan, D Kumar, M Mittal, S Roy… - Symmetry, 2020 - mdpi.com

… an end-to-end video captioning model and various sentence embedding techniques that
collectively help in building the proposed video-… can be applied with any baseline architecture. …

Cited by 14

End-to-end concept word detection for video captioning, retrieval, and question answering

Y Yu, H Ko, J Choi, G Kim - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com

… We also compare our model with a couple of baselines: (CT-SAN) outperforms the simple …
with significant margins from baselines. Our (CT-SAN) (Ensemble) obtains the video-sentence …

Cited by 274

End-to-end dense video captioning as sequence generation

W Zhu, B Pang, AV Thapliyal, WY Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

… For the random partition baseline, a video is randomly partitioned into n segments, with
n sampled uniformly from 1 to 15 (The mean number of segments in the ground-truth is 8). …

Cited by 59

Video captioning of future frames

M Hosseinzadeh, Y Wang - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com

… a baseline and an oracle method on the ActivityNetCaptions … the proposed method outperforms
the baseline and is com… , [38, 39] propose end-to-end video dense captioning systems by …

Cited by 26
