2018 24th International Conference on Pattern Recognition (ICPR)

Abstract

Two-stream ConvNets for action recognition typically fuse the predictions of the two streams by weighted averaging. Fusion with fixed weights cannot adapt to individual action videos and requires trial-and-error tuning on the validation set. To make two-stream ConvNets more adaptive, this paper proposes an end-to-end trainable gated fusion method, the gating ConvNet, based on Mixture of Experts (MoE) theory. The gating ConvNet takes the combined convolutional-layer features of the spatial and temporal nets as input and outputs two fusion weights. To reduce over-fitting of the gating ConvNet caused by parameter redundancy, a new multi-task learning method is designed that jointly learns the gating fusion weights for the two streams and trains the gating ConvNet for action classification. With the proposed gated fusion method and multi-task learning approach, competitive performance is achieved on the video action dataset UCF101.
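The contrast between fixed-weight averaging and per-video gated fusion can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the example scores, and the softmax normalization of the two gate outputs are assumptions made for illustration (softmax gating is common in MoE models, but the abstract does not specify it). The `gate_logits` argument is a hypothetical stand-in for the two outputs that a gating network would compute from the streams' convolutional features.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fixed_fusion(spatial_scores, temporal_scores, w_spatial=0.5):
    """Baseline: one hand-tuned weight pair shared by every video."""
    w_temporal = 1.0 - w_spatial
    return [w_spatial * s + w_temporal * t
            for s, t in zip(spatial_scores, temporal_scores)]


def gated_fusion(spatial_scores, temporal_scores, gate_logits):
    """Per-video fusion with weights derived from two gate outputs.

    gate_logits is a hypothetical placeholder for the output of a
    gating network; softmax normalization is an assumption here.
    """
    w_spatial, w_temporal = softmax(gate_logits)
    fused = [w_spatial * s + w_temporal * t
             for s, t in zip(spatial_scores, temporal_scores)]
    return fused, (w_spatial, w_temporal)


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Made-up class scores for one video over three action classes.
spatial = [0.7, 0.2, 0.1]    # spatial (RGB) stream favors class 0
temporal = [0.1, 0.8, 0.1]   # temporal (optical flow) stream favors class 1

fused_fixed = fixed_fusion(spatial, temporal)
fused_gated, weights = gated_fusion(spatial, temporal, gate_logits=[2.0, 0.0])
```

In this toy case the fixed 0.5/0.5 average follows the temporal stream, while a gate that trusts the spatial stream for this particular video shifts the fused prediction toward the spatial stream's class. Because the gate weights depend on the input, they need no manual tuning on a validation set.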