Guangzhi Sun
2020 – today
- 2025
- [j5]Guangzhi Sun, Chao Zhang, Ivan Vulic, Pawel Budzianowski, Philip C. Woodland:
Knowledge-aware audio-grounded generative slot filling for limited annotated data. Comput. Speech Lang. 89: 101707 (2025)
- 2024
- [j4]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Graph Neural Networks for Contextual ASR With the Tree-Constrained Pointer Generator. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2407-2417 (2024)
- [j3]Yang Li, Cheng Yu, Guangzhi Sun, Weiqin Zu, Zheng Tian, Ying Wen, Wei Pan, Chao Zhang, Jun Wang, Yang Yang, Fanglei Sun:
Cross-Utterance Conditioned VAE for Speech Generation. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4263-4276 (2024)
- [c22]Guangzhi Sun, Shutong Feng, Dongcheng Jiang, Chao Zhang, Milica Gasic, Philip C. Woodland:
Speech-based Slot Filling using Large Language Models. ACL (Findings) 2024: 6351-6362
- [c21]Zhe Chen, Heyang Liu, Wenyi Yu, Guangzhi Sun, Hongcheng Liu, Ji Wu, Chao Zhang, Yu Wang, Yanfeng Wang:
M³AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset. ACL (1) 2024: 9041-9060
- [c20]Guangzhi Sun, Xiao Zhan, Jose Such:
Building Better AI Agents: A Provocation on the Utilisation of Persona in LLM-based Conversational Agents. CUI 2024: 35
- [c19]Nineli Lashkarashvili, Wen Wu, Guangzhi Sun, Philip C. Woodland:
Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation. ICASSP 2024: 10986-10990
- [c18]Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang:
Extending Large Language Models for Speech and Audio Captioning. ICASSP 2024: 11236-11240
- [c17]Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng:
Enhancing Quantised End-to-End ASR Models Via Personalisation. ICASSP 2024: 12426-12430
- [c16]Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang:
Connecting Speech Encoder and Large Language Model for ASR. ICASSP 2024: 12637-12641
- [c15]Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang:
SALMONN: Towards Generic Hearing Abilities for Large Language Models. ICLR 2024
- [c14]Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang:
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models. ICML 2024
- [c13]Shutong Feng, Guangzhi Sun, Nurul Lubis, Wen Wu, Chao Zhang, Milica Gasic:
Affect Recognition in Conversations Using Large Language Models. SIGDIAL 2024: 259-273
- [i41]Nineli Lashkarashvili, Wen Wu, Guangzhi Sun, Philip C. Woodland:
Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation. CoRR abs/2402.11747 (2024)
- [i40]Xiaoliang Luo, Akilles Rechardt, Guangzhi Sun, Kevin K. Nejad, Felipe Yáñez, Bati Yilmaz, Kangjoo Lee, Alexandra O. Cohen, Valentina Borghesani, Anton Pashkov, Daniele Marinazzo, Jonathan Nicholas, Alessandro Salatiello, Ilia Sucholutsky, Pasquale Minervini, Sepehr Razavi, Roberta Rocca, Elkhan Yusifov, Tereza Okalova, Nianlong Gu, Martin Ferianc, Mikail Khona, Kaustubh R. Patil, Pui-Shee Lee, Rui Mata, Nicholas E. Myers, Jennifer K. Bizley, Sebastian Musslick, Isil Poyraz Bilgin, Guiomar Niso, Justin M. Ales, Michael Gaebler, N. Apurva Ratan Murty, Leyla Loued-Khenissi, Anna Behler, Chloe M. Hall, Jessica Dafflon, Sherry Dongqi Bao, Bradley C. Love:
Large language models surpass human experts in predicting neuroscience results. CoRR abs/2403.03230 (2024)
- [i39]Zhe Chen, Heyang Liu, Wenyi Yu, Guangzhi Sun, Hongcheng Liu, Ji Wu, Chao Zhang, Yu Wang, Yanfeng Wang:
M3AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset. CoRR abs/2403.14168 (2024)
- [i38]Xiaoliang Luo, Guangzhi Sun, Bradley C. Love:
Matching domain experts by training from scratch on domain knowledge. CoRR abs/2405.09395 (2024)
- [i37]Guangzhi Sun, Potsawee Manakul, Adian Liusie, Kunat Pipatanakul, Chao Zhang, Philip C. Woodland, Mark J. F. Gales:
CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models. CoRR abs/2405.13684 (2024)
- [i36]Keqi Deng, Guangzhi Sun, Philip C. Woodland:
Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning. CoRR abs/2406.00522 (2024)
- [i35]Ziyun Cui, Ziyang Zhang, Wen Wu, Guangzhi Sun, Chao Zhang:
Bayesian WeakS-to-Strong from Text Classification to Generation. CoRR abs/2406.03199 (2024)
- [i34]Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Jun Zhang, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang:
Can Large Language Models Understand Spatial Audio? CoRR abs/2406.07914 (2024)
- [i33]Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang:
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models. CoRR abs/2406.15704 (2024)
- [i32]Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng:
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR. CoRR abs/2406.19706 (2024)
- [i31]Guangzhi Sun, Xiao Zhan, Jose Such:
Building Better AI Agents: A Provocation on the Utilisation of Persona in LLM-based Conversational Agents. CoRR abs/2407.11977 (2024)
- [i30]Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng:
Speaker Adaptation for Quantised End-to-End ASR Models. CoRR abs/2408.03979 (2024)
- [i29]Yiyang Zhao, Shuai Wang, Guangzhi Sun, Zehua Chen, Chao Zhang, Mingxing Xu, Thomas Fang Zheng:
Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models. CoRR abs/2408.15585 (2024)
- [i28]Yudong Yang, Zhan Liu, Wenyi Yu, Guangzhi Sun, Qiuqiang Kong, Chao Zhang:
Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement. CoRR abs/2409.09642 (2024)
- [i27]Potsawee Manakul, Guangzhi Sun, Warit Sirichotedumrong, Kasima Tharnpipitchai, Kunat Pipatanakul:
Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models. CoRR abs/2409.10999 (2024)
- [i26]Siyin Wang, Wenyi Yu, Yudong Yang, Changli Tang, Yixuan Li, Jimin Zhuang, Xianzhao Chen, Xiaohai Tian, Jun Zhang, Guangzhi Sun, Lu Lu, Chao Zhang:
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation. CoRR abs/2409.16644 (2024)
- 2023
- [j2]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Minimising Biasing Word Errors for Contextual ASR With the Tree-Constrained Pointer Generator. IEEE ACM Trans. Audio Speech Lang. Process. 31: 345-354 (2023)
- [c12]Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao:
TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for Pytorch. ASRU 2023: 1-9
- [c11]Evonne P. C. Lee, Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Spectral Clustering-Aware Learning of Embeddings for Speaker Diarisation. ICASSP 2023: 1-5
- [c10]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
End-to-End Spoken Language Understanding with Tree-Constrained Pointer Generator. ICASSP 2023: 1-5
- [c9]Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C. Woodland:
Can Contextual Biasing Remain Effective with Whisper and GPT-2? INTERSPEECH 2023: 1289-1293
- [i25]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator. CoRR abs/2305.18824 (2023)
- [i24]Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C. Woodland:
Can Contextual Biasing Remain Effective with Whisper and GPT-2? CoRR abs/2306.01942 (2023)
- [i23]Guangzhi Sun, Chao Zhang, Ivan Vulic, Pawel Budzianowski, Philip C. Woodland:
Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data. CoRR abs/2307.01764 (2023)
- [i22]Yang Li, Cheng Yu, Guangzhi Sun, Weiqin Zu, Zheng Tian, Ying Wen, Wei Pan, Chao Zhang, Jun Wang, Yang Yang, Fanglei Sun:
Cross-Utterance Conditioned VAE for Speech Generation. CoRR abs/2309.04156 (2023)
- [i21]Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng:
Enhancing Quantised End-to-End ASR Models via Personalisation. CoRR abs/2309.09136 (2023)
- [i20]Shutong Feng, Guangzhi Sun, Nurul Lubis, Chao Zhang, Milica Gasic:
Affect Recognition in Conversations Using Large Language Models. CoRR abs/2309.12881 (2023)
- [i19]Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang:
Connecting Speech Encoder and Large Language Model for ASR. CoRR abs/2309.13963 (2023)
- [i18]Theodor Nguyen, Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C. Woodland:
Conditional Diffusion Model for Target Speaker Extraction. CoRR abs/2310.04791 (2023)
- [i17]Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang:
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models. CoRR abs/2310.05863 (2023)
- [i16]Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang:
SALMONN: Towards Generic Hearing Abilities for Large Language Models. CoRR abs/2310.13289 (2023)
- [i15]Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis:
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch. CoRR abs/2310.17864 (2023)
- [i14]Guangzhi Sun, Shutong Feng, Dongcheng Jiang, Chao Zhang, Milica Gasic, Philip C. Woodland:
Speech-based Slot Filling using Large Language Models. CoRR abs/2311.07418 (2023)
- 2022
- [c8]Yang Li, Cheng Yu, Guangzhi Sun, Hua Jiang, Fanglei Sun, Weiqin Zu, Ying Wen, Yang Yang, Jun Wang:
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech. ACL (1) 2022: 391-400
- [c7]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition. INTERSPEECH 2022: 2043-2047
- [i13]Yang Li, Cheng Yu, Guangzhi Sun, Hua Jiang, Fanglei Sun, Weiqin Zu, Ying Wen, Yang Yang, Jun Wang:
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech. CoRR abs/2205.04120 (2022)
- [i12]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator. CoRR abs/2205.09058 (2022)
- [i11]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition. CoRR abs/2207.00857 (2022)
- [i10]Evonne P. C. Lee, Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Spectral Clustering-aware Learning of Embeddings for Speaker Diarisation. CoRR abs/2210.13576 (2022)
- [i9]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
End-to-end Spoken Language Understanding with Tree-constrained Pointer Generator. CoRR abs/2210.16554 (2022)
- 2021
- [j1]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Combination of deep speaker embeddings for diarisation. Neural Networks 141: 372-384 (2021)
- [c6]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Tree-Constrained Pointer Generator for End-to-End Contextual Speech Recognition. ASRU 2021: 780-787
- [c5]Guangzhi Sun, D. Liu, Chao Zhang, Philip C. Woodland:
Content-Aware Speaker Embeddings for Speaker Diarisation. ICASSP 2021: 7168-7172
- [c4]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Transformer Language Models with LSTM-Based Cross-Utterance Information Representation. ICASSP 2021: 7363-7367
- [i8]Guangzhi Sun, D. Liu, Chao Zhang, Philip C. Woodland:
Content-Aware Speaker Embeddings for Speaker Diarisation. CoRR abs/2102.06467 (2021)
- [i7]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Transformer Language Models with LSTM-based Cross-utterance Information Representation. CoRR abs/2102.06474 (2021)
- [i6]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Tree-constrained Pointer Generator for End-to-end Contextual Speech Recognition. CoRR abs/2109.00627 (2021)
- 2020
- [c3]Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Yonghui Wu:
Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis. ICASSP 2020: 6264-6268
- [c2]Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Andrew Rosenberg, Bhuvana Ramabhadran, Yonghui Wu:
Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior. ICASSP 2020: 6699-6703
- [i5]Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Yonghui Wu:
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis. CoRR abs/2002.03785 (2020)
- [i4]Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Andrew Rosenberg, Bhuvana Ramabhadran, Yonghui Wu:
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior. CoRR abs/2002.03788 (2020)
- [i3]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Cross-Utterance Language Models with Acoustic Error Sampling. CoRR abs/2009.01008 (2020)
- [i2]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Combination of Deep Speaker Embeddings for Diarisation. CoRR abs/2010.12025 (2020)
2010 – 2019
- 2019
- [c1]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Speaker Diarisation Using 2D Self-attentive Combination of Embeddings. ICASSP 2019: 5801-5805
- [i1]Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Speaker diarisation using 2D self-attentive combination of embeddings. CoRR abs/1902.03190 (2019)
last updated on 2024-10-23 20:32 CEST by the dblp team
all metadata released as open data under CC0 1.0 license