User profiles for "Wayne Xiong"
Wayne Xiong, Microsoft. Verified email at microsoft.com. Cited 2918 times.
Achieving human parity in conversational speech recognition
Conversational speech recognition has served as a flagship speech recognition task since
the release of the Switchboard corpus in the 1990s. In this paper, we measure the human …
The Microsoft 2017 conversational speech recognition system
We describe the latest version of Microsoft's conversational speech recognition system for
the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to …
Toward human parity in conversational speech recognition
Conversational speech recognition has served as a flagship speech recognition task since
the release of the Switchboard corpus in the 1990s. In this paper, we measure a human error …
PyramidKV: Dynamic KV cache compression based on pyramidal information funneling
In this study, we investigate whether attention-based information flow inside large language
models (LLMs) is aggregated through noticeable patterns for long context processing. Our …
Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention
In this paper, we propose a deep convolutional neural network (CNN) with layer-wise context
expansion and location-based attention, for large vocabulary speech recognition. In our …
Z-Code++: A pre-trained language model optimized for abstractive summarization
This paper presents Z-Code++, a new pre-trained language model optimized for abstractive
text summarization. The model extends the state of the art encoder-decoder model using …
Advances in online audio-visual meeting transcription
This paper describes a system that generates speaker-annotated transcripts of meetings by
using a microphone array and a 360-degree camera. The hallmark of the system is its ability …
Progressive joint modeling in unsupervised single-channel overlapped speech recognition
Unsupervised single-channel overlapped speech recognition is one of the hardest problems
in automatic speech recognition (ASR). Permutation invariant training (PIT) is a state of the …
Momentum calibration for text generation
The input and output of most text generation tasks can be transformed to two sequences of
tokens and they can be modeled using sequence-to-sequence learning modeling tools such …