User profiles matching "Wayne Xiong"

Wayne Xiong

Microsoft
Verified email at microsoft.com
Cited by 2918

Achieving human parity in conversational speech recognition

W Xiong, J Droppo, X Huang, F Seide, M Seltzer… - arXiv preprint arXiv …, 2016 - arxiv.org
Conversational speech recognition has served as a flagship speech recognition task since
the release of the Switchboard corpus in the 1990s. In this paper, we measure the human …

The Microsoft 2017 conversational speech recognition system

W Xiong, L Wu, F Alleva, J Droppo… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
We describe the latest version of Microsoft's conversational speech recognition system for
the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to …

Toward human parity in conversational speech recognition

W Xiong, J Droppo, X Huang, F Seide… - … on Audio, Speech …, 2017 - ieeexplore.ieee.org
Conversational speech recognition has served as a flagship speech recognition task since
the release of the Switchboard corpus in the 1990s. In this paper, we measure a human error …

PyramidKV: Dynamic KV cache compression based on pyramidal information funneling

…, B Gao, Y Liu, Y Li, T Liu, K Lu, W Xiong… - arXiv preprint arXiv …, 2024 - arxiv.org
In this study, we investigate whether attention-based information flow inside large language
models (LLMs) is aggregated through noticeable patterns for long context processing. Our …

Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention

D Yu, W Xiong, J Droppo, A Stolcke, G Ye, J Li… - Interspeech, 2016 - isca-archive.org
In this paper, we propose a deep convolutional neural network (CNN) with layer-wise context
expansion and location-based attention, for large vocabulary speech recognition. In our …

Z-code++: A pre-trained language model optimized for abstractive summarization

…, R Xu, HH Awadalla, Y Shi, C Zhu, W Xiong… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents Z-Code++, a new pre-trained language model optimized for abstractive
text summarization. The model extends the state of the art encoder-decoder model using …

Advances in online audio-visual meeting transcription

…, A Vinnikov, L Wu, X Xiao, W Xiong… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
This paper describes a system that generates speaker-annotated transcripts of meetings by
using a microphone array and a 360-degree camera. The hallmark of the system is its ability …

Progressive joint modeling in unsupervised single-channel overlapped speech recognition

Z Chen, J Droppo, J Li, W Xiong - IEEE/ACM Transactions on …, 2017 - ieeexplore.ieee.org
Unsupervised single-channel overlapped speech recognition is one of the hardest problems
in automatic speech recognition (ASR). Permutation invariant training (PIT) is a state of the …

PyramidKV: Dynamic KV cache compression based on pyramidal information funneling

…, B Gao, Y Liu, Y Li, T Liu, K Lu, W Xiong… - arXiv e …, 2024 - ui.adsabs.harvard.edu
In this study, we investigate whether attention-based information flow inside large language
models (LLMs) is aggregated through noticeable patterns for long context processing. Our …

Momentum calibration for text generation

…, Y Liu, X Wang, P He, Y Yu, SQ Chen, W Xiong… - arXiv preprint arXiv …, 2022 - arxiv.org
The input and output of most text generation tasks can be transformed to two sequences of
tokens and they can be modeled using sequence-to-sequence learning modeling tools such …