Nov 5, 2019 · In this work, we investigate the use of RNN-T in applications that require a tunable latency budget at inference time. We also improve the decoding speed.
We show that the Ar-RNN-T loss provides a refined control to navigate the trade-offs between the token emission delays and the Word Error Rate (WER). The Ar-RNN ...
This work evaluates the proposed system on an English video ASR dataset and shows that neural RNN-T models can achieve comparable WER and better computational ...
• Improving RNN-T beam search. • Jain et al., RNN-T For Latency Controlled ASR With Improved Beam Search. • Contextualization: • Jain et al., Contextual RNN-T ...
Latency-controlled RNN-T: RNN-T For Latency Controlled ASR With Improved Beam Search (arXiv 2019); Transformer-equipped RNN-T: Self-Attention Transducers ...
RNN-T for ASR has three main components: the Audio Encoder, the Text Predictor and the Joiner. The Audio Encoder encodes the audio frames up to time t as an audio embedding.
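To make the three-component layout concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration and is not taken from any of the papers listed here: the dimensions, the random stand-in weights, the running-mean encoder, the averaging predictor, the additive tanh joiner, and all function names are hypothetical. It only shows how a causal audio embedding and a text embedding could be combined into a distribution over output tokens, with index 0 standing for the blank symbol.

```python
import math
import random

random.seed(0)

FEATURE_DIM = 3   # hypothetical size of one audio feature frame
EMBED_DIM = 4     # hypothetical embedding size shared by encoder and predictor
VOCAB_SIZE = 5    # hypothetical output size; index 0 stands for blank


def make_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def audio_encoder(frames, weights):
    """Toy causal encoder: the embedding at time t uses only frames 0..t."""
    running = [0.0] * len(frames[0])
    embeddings = []
    for count, frame in enumerate(frames, start=1):
        # Running mean of all frames seen so far (no look-ahead).
        running = [r + (x - r) / count for r, x in zip(running, frame)]
        embeddings.append([math.tanh(v) for v in matvec(weights, running)])
    return embeddings


def text_predictor(prev_tokens, embedding_table):
    """Toy predictor: averages the embeddings of previously emitted tokens."""
    if not prev_tokens:
        return [0.0] * EMBED_DIM
    vectors = [embedding_table[tok] for tok in prev_tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def joiner(audio_emb, text_emb, out_weights):
    """Combine both embeddings additively, then apply a softmax over the vocabulary."""
    hidden = [math.tanh(a + p) for a, p in zip(audio_emb, text_emb)]
    logits = matvec(out_weights, hidden)
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


enc_w = make_matrix(EMBED_DIM, FEATURE_DIM)
emb_table = make_matrix(VOCAB_SIZE, EMBED_DIM)
out_w = make_matrix(VOCAB_SIZE, EMBED_DIM)

frames = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]
audio_embs = audio_encoder(frames, enc_w)
probs = joiner(audio_embs[1], text_predictor([2, 3], emb_table), out_w)
```

Here `probs` is the token distribution at the second frame given two previously emitted tokens. A real system would replace the toy encoder and predictor with trained recurrent or attention-based networks.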
Abstract. Because of its streaming nature, recurrent neural network transducer (RNN-T) is a very promising end-to-end (E2E) model.
We show how factoring the RNN-T's output distribution can significantly reduce the computation cost and power consumption for on-device ASR inference with no ...
Abstract. We present a novel architecture with its decoding approach for improving recurrent neural network-transducer (RNN-T) performance.