End-to-end video captioning

S Olivastri, G Singh, F Cuzzolin - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
… We believe we managed to set a new baseline for future work thanks to our principled
end-to-end architecture, providing an opportunity to take research in the field forward starting from …

End-to-end dense video captioning with masked transformer

L Zhou, Y Zhou, JJ Corso… - Proceedings of the …, 2018 - openaccess.thecvf.com
… an end-to-end framework for doing dense video captioning that is … porates the semantics from
captions to the proposal module. … method with baselines on the ActivityNet Caption dataset. …

End-to-end generative pretraining for multimodal video captioning

PH Seo, A Nagrani, A Arnab… - Proceedings of the …, 2022 - openaccess.thecvf.com
end-to-end to generate a caption from raw pixels and transcribed speech directly. Our model
achieves state-of-the-art performance for multimodal video captioning on … a simple baseline, …

End-to-end video captioning with multitask reinforcement learning

L Li, B Gong - 2019 IEEE winter conference on applications of …, 2019 - ieeexplore.ieee.org
… , this is the first video captioning model that is trained end-to-end from the raw video input to
the cap… Baseline is the sentence generated by our baseline model, MR stands for sentence …

SwinBERT: End-to-end transformers with sparse attention for video captioning

K Lin, L Li, CC Lin, F Ahmed, Z Gan… - Proceedings of the …, 2022 - openaccess.thecvf.com
… First of all, we present a baseline that does not have any learnable attention mask, shown
in the first row of Table 4b. In the second row, we show another baseline which uses a learn…

Hierarchical video-moment retrieval and step-captioning

A Zala, J Cho, S Kottur, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
… In step captioning, models generate a textual summary for each step. We also present … -specific
and end-to-end joint baseline models for our new benchmark. While the baseline models …

Video caption based searching using end-to-end dense captioning and sentence embeddings

A Aggarwal, A Chauhan, D Kumar, M Mittal, S Roy… - Symmetry, 2020 - mdpi.com
… an end-to-end video captioning model and various sentence embedding techniques that
collectively help in building the proposed video-… can be applied with any baseline architecture. …

End-to-end concept word detection for video captioning, retrieval, and question answering

Y Yu, H Ko, J Choi, G Kim - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
… We also compare our model with a couple of baselines: (CT-SAN) outperforms the simple …
with significant margins from baselines. Our (CT-SAN) (Ensemble) obtains the video-sentence …

End-to-end dense video captioning as sequence generation

W Zhu, B Pang, AV Thapliyal, WY Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
… For the random partition baseline, a video is randomly partitioned into n segments, with
n sampled uniformly from 1 to 15 (The mean number of segments in the ground-truth is 8). …

Video captioning of future frames

M Hosseinzadeh, Y Wang - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
… a baseline and an oracle method on the ActivityNet Captions … the proposed method outperforms
the baseline and is com… , [38, 39] propose end-to-end video dense captioning systems by …