End-to-end video captioning
S Olivastri, G Singh, F Cuzzolin - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
… We believe we managed to set a new baseline for future work thanks to our principled
end-to-end architecture, providing an opportunity to take research in the field forward starting from …
End-to-end dense video captioning with masked transformer
… an end-to-end framework for doing dense video captioning that is … porates the semantics from
captions to the proposal module. … method with baselines on the ActivityNet Caption dataset. …
End-to-end generative pretraining for multimodal video captioning
… end-to-end to generate a caption from raw pixels and transcribed speech directly. Our model
achieves state-of-the-art performance for multimodal video captioning on … a simple baseline, …
End-to-end video captioning with multitask reinforcement learning
… , this is the first video captioning model that is trained end-to-end from the raw video input to
the cap… Baseline is the sentence generated by our baseline model, MR stands for sentence …
Swinbert: End-to-end transformers with sparse attention for video captioning
… First of all, we present a baseline that does not have any learnable attention mask, shown
in the first row of Table 4b. In the second row, we show another baseline which uses a learn…
Hierarchical video-moment retrieval and step-captioning
… In step captioning, models generate a textual summary for each step. We also present … -specific
and end-to-end joint baseline models for our new benchmark. While the baseline models …
[HTML] Video caption based searching using end-to-end dense captioning and sentence embeddings
… an end-to-end video captioning model and various sentence embedding techniques that
collectively help in building the proposed video-… can be applied with any baseline architecture. …
End-to-end concept word detection for video captioning, retrieval, and question answering
… We also compare our model with a couple of baselines: (CTSAN) outperforms the simple …
with significant margins from baselines. Our (CT-SAN) (Ensemble) obtains the video-sentence …
End-to-end dense video captioning as sequence generation
… For the random partition baseline, a video is randomly partitioned into n segments, with
n sampled uniformly from 1 to 15 (The mean number of segments in the ground-truth is 8). …
Video captioning of future frames
M Hosseinzadeh, Y Wang - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
… a baseline and an oracle method on the ActivityNetCaptions … the proposed method outperforms
the baseline and is com… , [38, 39] propose end-to-end video dense captioning systems by …