IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 30
Volume 30, 2022
- Qianying Liu, Wenyu Guan, Sujian Li, Fei Cheng, Daisuke Kawahara, Sadao Kurohashi:
RODA: Reverse Operation Based Data Augmentation for Solving Math Word Problems. 1-11
- Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, Minje Kim:
Scalable and Efficient Neural Speech Coding: A Hybrid Design. 12-25
- Sen Yang, Yang Liu, Dawei Feng, Dongsheng Li:
Text Generation From Data With Dynamic Planning. 26-34
- Stefan Liebich, Peter Vary:
Occlusion Effect Cancellation in Headphones and Hearing Devices - The Sister of Active Noise Cancellation. 35-48
- Zhuosheng Zhang, Haojie Yu, Hai Zhao, Masao Utiyama:
Which Apple Keeps Which Doctor Away? Colorful Word Representations With Visual Oracles. 49-59
- Zhenyu Wang, John H. L. Hansen:
Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition. 60-75
- Kengtao Zheng, Nankai Lin, Shengyi Jiang:
Unsupervised Character Embedding Correction and Candidate Word Denoising. 76-86
- Bing Ma, Haifeng Sun, Jingyu Wang, Qi Qi, Jianxin Liao:
Extractive Dialogue Summarization Without Annotation Based on Distantly Supervised Machine Reading Comprehension in Customer Service. 87-97
- Shengcai Liu, Ning Lu, Cheng Chen, Ke Tang:
Efficient Combinatorial Optimization for Word-Level Adversarial Textual Attack. 98-111
- Alessandro Terenzi, Nicola Ortolani, Inês Nolasco, Emmanouil Benetos, Stefania Cecchi:
Comparison of Feature Extraction Methods for Sound-Based Classification of Honey Bee Activity. 112-122
- Shuiyang Mao, P. C. Ching, Tan Lee:
Enhancing Segment-Based Speech Emotion Recognition by Iterative Self-Learning. 123-134
- Abdolreza Sabzi Shahrebabaki, Giampiero Salvi, Torbjørn Svendsen, Sabato Marco Siniscalchi:
Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models. 135-147
- Javier Jorge, Adrià Giménez, Joan Albert Silvestre-Cerdà, Jorge Civera, Alberto Sanchís, Alfons Juan:
Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models. 148-161
- P. V. Muhammed Shifas, Catalin Zorila, Yannis Stylianou:
End-to-End Neural Based Modification of Noisy Speech for Speech-in-Noise Intelligibility Improvement. 162-173
- Joon-Young Yang, Joon-Hyuk Chang:
VACE-WPE: Virtual Acoustic Channel Expansion Based on Neural Networks for Weighted Prediction Error-Based Speech Dereverberation. 174-189
- Chenpeng Du, Kai Yu:
Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis. 190-201
- Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee:
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning. 202-217
- Mixiao Hou, Zheng Zhang, Qi Cao, David Zhang, Guangming Lu:
Multi-View Speech Emotion Recognition Via Collective Relation Construction. 218-229
- Da-Rong Liu, Po-Chun Hsu, Yi-Chen Chen, Sung-Feng Huang, Shun-Po Chuang, Da-Yi Wu, Hung-yi Lee:
Learning Phone Recognition From Unpaired Audio and Phone Sequences Based on Generative Adversarial Network. 230-243
- Yuting Zhao, Mamoru Komachi, Tomoyuki Kajiwara, Chenhui Chu:
Word-Region Alignment-Guided Multimodal Neural Machine Translation. 244-259
- Zhuosheng Zhang, Yiqing Zhang, Hai Zhao:
Syntax-Aware Multi-Spans Generation for Reading Comprehension. 260-268
- Pengfei Zhu, Zhuosheng Zhang, Hai Zhao, Xiaoguang Li:
DUMA: Reading Comprehension With Transposition Thinking. 269-279
- Jiayuan Xie, Ningxin Peng, Yi Cai, Tao Wang, Qingbao Huang:
Diverse Distractor Generation for Constructing High-Quality Multiple Choice Questions. 280-291
- Jie Zhang, Guanghui Zhang:
A Parametric Unconstrained Beamformer Based Binaural Noise Reduction for Assistive Hearing. 292-304
- Luca Turchet, Johan Pauwels:
Music Emotion Recognition: Intention of Composers-Performers Versus Perception of Musicians, Non-Musicians, and Listening Machines. 305-316
- Wenxin Hou, Han Zhu, Yidong Wang, Jindong Wang, Tao Qin, Renjun Xu, Takahiro Shinozaki:
Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition. 317-329
- Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita:
Integrating Prior Translation Knowledge Into Neural Machine Translation. 330-339
- Keqi Deng, Gaofeng Cheng, Runyan Yang, Yonghong Yan:
Alleviating ASR Long-Tailed Problem by Decoupling the Learning of Representation and Classification. 340-354
- Zuchao Li, Junru Zhou, Hai Zhao, Kevin Parnow:
HPSG-Inspired Joint Neural Constituent and Dependency Parsing in O($n^3$) Time Complexity. 355-366
- Xuan Shi, Erica Cooper, Junichi Yamagishi:
Use of Speaker Recognition Approaches for Learning and Evaluating Embedding Representations of Musical Instrument Sounds. 367-377
- Zengwei Yao, Wenjie Pei, Fanglin Chen, Guangming Lu, David Zhang:
Stepwise-Refining Speech Separation Network via Fine-Grained Encoding in High-Order Latent Domain. 378-393
- Yanmin Qian, Zhikai Zhou:
Optimizing Data Usage for Low-Resource Speech Recognition. 394-403
- Narla John Metilda Sagaya Mary, Srinivasan Umesh, Sandesh Varadaraju Katta:
S-Vectors and TESA: Speaker Embeddings and a Speaker Authenticator Based on Transformer Encoder. 404-413
- Bengt J. Borgström:
Bayesian Estimation of PLDA in the Presence of Noisy Training Labels, With Applications to Speaker Verification. 414-428
- Menglong Lu, Zhen Huang, Binyang Li, Yunxiang Zhao, Zheng Qin, Dong Sheng Li:
SIFTER: A Framework for Robust Rumor Detection. 429-442
- Lantian Li, Dong Wang, Jiawen Kang, Renyu Wang, Jing Wu, Zhendong Gao, Xiao Chen:
A Principle Solution for Enroll-Test Mismatch in Speaker Recognition. 443-455
- Feiran Yang:
Analysis of Deficient-Length Partitioned-Block Frequency-Domain Adaptive Filters. 456-467
- Hui Jiang, Linfeng Song, Yubin Ge, Fandong Meng, Junfeng Yao, Jinsong Su:
An AST Structure Enhanced Decoder for Code Generation. 468-476
- Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi:
Optimizing Tandem Speaker Verification and Anti-Spoofing Systems. 477-488
- Xin Ni, Jia Ren:
FC-U2-Net: A Novel Deep Neural Network for Singing Voice Separation. 489-494
- Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi:
SoundStream: An End-to-End Neural Audio Codec. 495-507
- Wageesha Manamperi, Thushara D. Abhayapala, Jihui Zhang, Prasanga N. Samarasinghe:
Drone Audition: Sound Source Localization Using On-Board Microphones. 508-519
- Qian Li, Hao Peng, Jianxin Li, Jia Wu, Yuanxing Ning, Lihong Wang, Philip S. Yu, Zheng Wang:
Reinforcement Learning-Based Dialogue Guided Event Extraction to Exploit Argument Relations. 520-533
- Santiago Ruiz, Toon van Waterschoot, Marc Moonen:
Distributed Combined Acoustic Echo Cancellation and Noise Reduction in Wireless Acoustic Sensor and Actuator Networks. 534-547
- Lukas Grinewitschus, Peter Jung:
The Harmonic Shift Algorithm for Efficient Multi-Pitch Detection. 548-561
- Ziyao Lu, Xiang Li, Yang Liu, Chulun Zhou, Jianwei Cui, Bin Wang, Min Zhang, Jinsong Su:
Exploring Multi-Stage Information Interactions for Multi-Source Neural Machine Translation. 562-570
- Jingxuan Yang, Si Li, Sheng Gao, Jun Guo:
CorefDPR: A Joint Model for Coreference Resolution and Dropped Pronoun Recovery in Chinese Conversations. 571-581
- Timuçin Berk Atalay, Zühre Sü Gül, Enzo De Sena, Zoran Cvetkovic, Hüseyin Hacihabiboglu:
Scattering Delay Network Simulator of Coupled Volume Acoustics. 582-593
- Yi Zhang, Lei Li, Yunfang Wu, Qi Su, Xu Sun:
Alleviating the Knowledge-Language Inconsistency: A Study for Deep Commonsense Knowledge. 594-604
- Ke Tan, Zhong-Qiu Wang, DeLiang Wang:
Neural Spectrospatial Filtering. 605-621
- Qianren Mao, Jianxin Li, Chenghua Lin, Congwen Chen, Hao Peng, Lihong Wang, Philip S. Yu:
Adaptive Pre-Training and Collaborative Fine-Tuning: A Win-Win Strategy to Improve Review Analysis Tasks. 622-634
- Zifeng Cheng, Zhiwei Jiang, Yafeng Yin, Cong Wang, Qing Gu:
Learning to Classify Open Intent via Soft Labeling and Manifold Mixup. 635-645
- Xiaochun An, Frank K. Soong, Lei Xie:
Disentangling Style and Speaker Attributes for TTS Style Transfer. 646-658
- Zhuang Chen, Tieyun Qian:
Retrieve-and-Edit Domain Adaptation for End2End Aspect Based Sentiment Analysis. 659-672
- Jian Liu, Mengshi Yu, Yufeng Chen, Jinan Xu:
Cross-Domain Slot Filling as Machine Reading Comprehension: A New Perspective. 673-685
- Yongkang Liu, Qingbao Huang, Jing Li, Linzhang Mo, Yi Cai, Qing Li:
SSAP: Storylines and Sentiment Aware Pre-Trained Model for Story Ending Generation. 686-694
- Ying Zhou, Xuefeng Liang, Yu Gu, Yifei Yin, Longshan Yao:
Multi-Classifier Interactive Learning for Ambiguous Speech Emotion Recognition. 695-705
- Poul Hoang, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen:
Multichannel Speech Enhancement With Own Voice-Based Interfering Speech Suppression for Hearing Assistive Devices. 706-720
- Weijie Yu, Chen Xu, Jun Xu, Liang Pang, Ji-Rong Wen:
Distribution Distance Regularized Sequence Representation for Text Matching in Asymmetrical Domains. 721-733
- Heming Wang, DeLiang Wang:
Neural Cascade Architecture With Triple-Domain Loss for Speech Enhancement. 734-743
- Riccardo R. De Lucia, Antonio Canclini, Fabio Antonacci, Augusto Sarti:
Group Dictionary Equivalent Source Method for Sparse Nearfield Acoustic Holography. 744-757
- Tong Ma, Ying Wei, Xin Lou:
Reconfigurable Nonuniform Filter Bank for Hearing Aid Systems. 758-771
- Victoria Mingote, Antonio Miguel, Dayana Ribas, Alfonso Ortega, Eduardo Lleida:
aDCF Loss Function for Deep Metric Learning in End-to-End Text-Dependent Speaker Verification Systems. 772-784
- Quansheng Tu, Huawei Chen:
Theoretical Lower Bounds on the Performance of the First-Order Differential Microphone Arrays With Sensor Imperfections. 785-801
- Taihui Wang, Feiran Yang, Jun Yang:
Convolutive Transfer Function-Based Multichannel Nonnegative Matrix Factorization for Overdetermined Blind Source Separation. 802-815
- Yi Zhang, Guangyou Zhou, Zhiwen Xie, Jimmy Xiangji Huang:
HGEN: Learning Hierarchical Heterogeneous Graph Encoding for Math Word Problem Solving. 816-828
- Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra:
FSD50K: An Open Dataset of Human-Labeled Sound Events. 829-852
- Yi Lei, Shan Yang, Xinsheng Wang, Lei Xie:
MsEmoTTS: Multi-Scale Emotion Transfer, Prediction, and Control for Emotional Speech Synthesis. 853-864
- Tao Wang, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen:
NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation. 865-878
- Simon Stone, Yingming Gao, Peter Birkholz:
Articulatory Synthesis of Vocalized /r/ Allophones in German. 879-889
- Prashant Serai, Vishal Sunder, Eric Fosler-Lussier:
Hallucination of Speech Recognition Errors With Sequence to Sequence Learning. 890-900
- Bin Wu, Sakriani Sakti, Jinsong Zhang, Satoshi Nakamura:
Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-Resource ASR. 901-916
- Mi Zhang, Tieyun Qian, Bing Liu:
Exploit Feature and Relation Hierarchy for Relation Extraction. 917-930
- Wenxiang Jiao, Xing Wang, Shilin He, Zhaopeng Tu, Irwin King, Michael R. Lyu:
Exploiting Inactive Examples for Natural Language Generation With Data Rejuvenation. 931-943
- Youzhi Tu, Man-Wai Mak:
Aggregating Frame-Level Information in the Spectral Domain With Self-Attention for Speaker Embedding. 944-957
- Zhixing Tan, Zeyuan Yang, Meng Zhang, Qun Liu, Maosong Sun, Yang Liu:
Dynamic Multi-Branch Layers for On-Device Neural Machine Translation. 958-967
- Weiwei Lin, Man-Wai Mak:
Mixture Representation Learning for Deep Speaker Embedding. 968-978
- Peng Zhu, Dawei Cheng, Fangzhou Yang, Yifeng Luo, Dingjiang Huang, Weining Qian, Aoying Zhou:
Improving Chinese Named Entity Recognition by Large-Scale Syntactic Dependency Graph. 979-991
- Xiaobo Liang, Lijun Wu, Juntao Li, Tao Qin, Min Zhang, Tie-Yan Liu:
Multi-Teacher Distillation With Single Model for Neural Machine Translation. 992-1002
- Xiaofeng Chen, Guohua Wang, Haopeng Ren, Yi Cai, Ho-fung Leung, Tao Wang:
Task-Adaptive Feature Fusion for Generalized Few-Shot Relation Classification in an Open World Environment. 1003-1015
- Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo:
SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points. 1016-1031
- Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Naoyuki Kamo, Shoko Araki:
Switching Independent Vector Analysis and its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms. 1032-1047
- Jianhua Geng, Sifan Wang, Qinglai Liu, Xin Lou:
Multi-Level Time-Frequency Bins Selection for Direction of Arrival Estimation Using a Single Acoustic Vector Sensor. 1048-1060
- Qinzhuo Wu, Qi Zhang, Xuanjing Huang:
Automatic Math Word Problem Generation With Topic-Expression Co-Attention Mechanism and Reinforcement Learning. 1061-1072
- Michael Nigro, Sridhar Krishnan:
Multimodal System for Audio Scene Source Counting and Analysis. 1073-1082
- Yishu Peng, Sheng Zhang, Jiashu Zhang, Wei Xing Zheng:
Combined-Sample Multiband-Structured Subband Filtering Algorithms. 1083-1092
- Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng:
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks. 1093-1107
- Xudong Dang, Wen Ma, Emanuël A. P. Habets, Hongyan Zhu:
TDOA-Based Robust Sound Source Localization With Sparse Regularization in Wireless Acoustic Sensor Networks. 1108-1123
- Shan Gao, Jing Lin, Xihong Wu, Tianshu Qu:
Sparse DNN Model for Frequency Expanding of Higher Order Ambisonics Encoding Process. 1124-1135
- Giovanni Pepe, Leonardo Gabrielli, Stefano Squartini, Carlo Tripodi, Nicolo Strozzi:
Deep Optimization of Parametric IIR Filters for Audio Equalization. 1136-1149
- Moa Lee, Junmo Lee, Joon-Hyuk Chang:
Non-Autoregressive Fully Parallel Deep Convolutional Neural Speech Synthesis. 1150-1159
- Liam Barrett, Junchao Hu, Peter Howell:
Systematic Review of Machine Learning Approaches for Detecting Developmental Stuttering. 1160-1172
- Sang-Hoon Lee, Hyeong-Rae Noh, Woo-Jeoung Nam, Seong-Whan Lee:
Duration Controllable Voice Conversion via Phoneme-Based Information Bottleneck. 1173-1183
- Zhihong Shao, Zhongqin Wu, Minlie Huang:
AdvExpander: Generating Natural Language Adversarial Examples by Expanding Text. 1184-1196
- Dhanunjaya Varma Devalraju, Padmanabhan Rajan:
Multiview Embeddings for Soundscape Classification. 1197-1206
- Chengyu Wang, Suyang Dai, Yipeng Wang, Fei Yang, Minghui Qiu, Kehan Chen, Wei Zhou, Jun Huang:
ARoBERT: An ASR Robust Pre-Trained Language Model for Spoken Language Understanding. 1207-1218
- Jonah Ong, Ba-Tuong Vo, Sven Nordholm, Ba-Ngu Vo, Diluka Moratuwage, Changbeom Shim:
Audio-Visual Based Online Multi-Source Separation. 1219-1234
- Leyang Cui, Yafu Li, Yue Zhang:
Label Attention Network for Structured Prediction. 1235-1248
- Sarinah Sutojo, Tobias May, Steven van de Par:
Segmentation of Multitalker Mixtures Based on Local Feature Contrasts and Auditory Glimpses. 1249-1262
- Hao Gao, Xuelei Feng, Yong Shen:
Weighted Loudspeaker Placement Method for Sound Field Reproduction. 1263-1276
- Gongping Huang, Jacob Benesty, Israel Cohen, Jingdong Chen:
Kronecker Product Multichannel Linear Filtering for Adaptive Weighted Prediction Error-Based Speech Dereverberation. 1277-1289
- Takehiro Sugimoto:
Loudness-Level-Chasing Algorithm for Multiformat Live Audio Production. 1290-1304
- Junshuang Wu, Richong Zhang, Yongyi Mao, Jinpeng Huai:
Dealing With Hierarchical Types and Label Noise in Fine-Grained Entity Typing. 1305-1318
- Anton Ragni, Mark J. F. Gales, Oliver Rose, Katherine M. Knill, Alexandros Kastanos, Qiujia Li, Preben Ness:
Increasing Context for Estimating Confidence Scores in Automatic Speech Recognition. 1319-1329
- Zhongxin Bai, Jianyu Wang, Xiao-Lei Zhang, Jingdong Chen:
End-to-End Speaker Verification via Curriculum Bipartite Ranking Weighted Binary Cross-Entropy. 1330-1344
- Shang-Yi Chuang, Hsin-Min Wang, Yu Tsao:
Improved Lite Audio-Visual Speech Enhancement. 1345-1359
- Gaofeng Cheng, Haoran Miao, Runyan Yang, Keqi Deng, Yonghong Yan:
ETEH: Unified Attention-Based End-to-End ASR and KWS Architecture. 1360-1373
- Ashutosh Pandey, DeLiang Wang:
Self-Attending RNN for Speech Enhancement to Improve Cross-Corpus Generalization. 1374-1385
- Di Jin, Shuyang Gao, Seokhwan Kim, Yang Liu, Dilek Hakkani-Tür:
Towards Textual Out-of-Domain Detection Without In-Domain Labels. 1386-1395
- K. Mrinalini, P. Vijayalakshmi, T. Nagarajan:
SBSim: A Sentence-BERT Similarity-Based Evaluation Metric for Indian Language Neural Machine Translation Systems. 1396-1406
- Changhong Wang, Emmanouil Benetos, Vincent Lostanlen, Elaine Chew:
Adaptive Scattering Transforms for Playing Technique Recognition. 1407-1421
- Danwei Cai, Weiqing Wang, Ming Li:
Incorporating Visual Information in Audio Based Self-Supervised Speaker Recognition. 1422-1435
- Yu Luo, Lina Pu:
EC-ANC: Edge Case-Enhanced Active Noise Cancellation for True Wireless Stereo Earbuds. 1436-1447
- Tao Li, Xinsheng Wang, Qicong Xie, Zhichao Wang, Lei Xie:
Cross-Speaker Emotion Disentangling and Transfer for End-to-End Speech Synthesis. 1448-1460
- Yilin Zhao, Zhuosheng Zhang, Hai Zhao:
Reference Knowledgeable Network for Machine Reading Comprehension. 1461-1473
- Fu-Hao Yu, Kuan-Yu Chen, Ke-Han Lu:
Non-Autoregressive ASR Modeling Using Pre-Trained Language Models for Chinese Speech Recognition. 1474-1482
- Yiming Cui, Ting Liu, Wanxiang Che, Zhigang Chen, Shijin Wang:
Teaching Machines to Read, Answer and Explain. 1483-1492
- Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Paola García:
Encoder-Decoder Based Attractors for End-to-End Neural Diarization. 1493-1507
- Chenda Li, Zhuo Chen, Yanmin Qian:
Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation. 1508-1520
- Yu Tong, Jingzhi Guo, Jizhe Zhou:
Separation Inference: A Unified Framework for Word Segmentation in East Asian Languages. 1521-1530
- Hitoshi Suda, Daisuke Saito, Satoru Fukayama, Tomoyasu Nakano, Masataka Goto:
Singer Diarization for Polyphonic Music With Unison Singing. 1531-1545
- Xinnian Liang, Jing Li, Shuangzhi Wu, Mu Li, Zhoujun Li:
Improving Unsupervised Extractive Summarization by Jointly Modeling Facet and Redundancy. 1546-1557
- Sung-Feng Huang, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, Hung-yi Lee:
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech. 1558-1571
- Ziyi Xu, Maximilian Strake, Tim Fingscheidt:
Deep Noise Suppression Maximizing Non-Differentiable PESQ Mediated by a Non-Intrusive PESQNet. 1572-1585
- Lin Li, Fuchuan Tong, Qingyang Hong:
When Speaker Recognition Meets Noisy Labels: Optimizations for Front-Ends and Back-Ends. 1586-1599
- Vinay Kothapally, John H. L. Hansen:
SkipConvGAN: Monaural Speech Dereverberation Using Generative Adversarial Networks via Complex Time-Frequency Masking. 1600-1613
- Hiromu Yakura, Kento Watanabe, Masataka Goto:
Self-Supervised Contrastive Learning for Singing Voices. 1614-1623
- Chen Gong, Zhenghua Li, Min Zhang:
Neural Coupled Sequence Labeling for Heterogeneous Annotation Conversion. 1624-1636
- Fangfang Su, Yue Zhang, Fei Li, Donghong Ji:
Balancing Precision and Recall for Neural Biomedical Event Extraction. 1637-1649
- Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li:
Selective Listening by Synchronizing Speech With Lips. 1650-1664
- Qianren Mao, Jianxin Li, Hao Peng, Shizhu He, Lihong Wang, Philip S. Yu, Zheng Wang:
Fact-Driven Abstractive Summarization by Utilizing Multi-Granular Multi-Relational Knowledge. 1665-1678
- Neeraj Kumar, Ankur Narang, Brejesh Lall:
Zero-Shot Normalization Driven Multi-Speaker Text to Speech Synthesis. 1679-1693
- Carlos Tarjano, Valdecy Pereira:
An Efficient Algorithm for Segmenting Quasi-Periodic Digital Signals Into Pseudo Cycles: Application in Lossy Audio Compression. 1694-1703
- Han Wang, Hongling Sun, Jianfeng Guo, Ming Wu, Jun Yang:
Analysis of the Frequency Interference in the Narrowband Active Noise Control System. 1704-1717
- Maryam Hosseini, Luca Celotti, Eric Plourde:
End-to-End Brain-Driven Speech Enhancement in Multi-Talker Conditions. 1718-1733
- Mathieu Fontaine, Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii:
Generalized Fast Multichannel Nonnegative Matrix Factorization Based on Gaussian Scale Mixtures for Blind Source Separation. 1734-1748
- Thi Ngoc Tho Nguyen, Karn N. Watcharasupat, Ngoc Khanh Nguyen, Douglas L. Jones, Woon-Seng Gan:
SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection. 1749-1762
- Changfeng Gao, Gaofeng Cheng, Ta Li, Pengyuan Zhang, Yonghong Yan:
Self-Supervised Pre-Training for Attention-Based Encoder-Decoder ASR Model. 1763-1774
- Jiacheng Ye, Xiang Zhou, Xiaoqing Zheng, Tao Gui, Qi Zhang:
Uncertainty-Aware Sequence Labeling. 1775-1788
- Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li:
Decoding Knowledge Transfer for Neural Text-to-Speech Training. 1789-1802
- Weiquan Fan, Xiangmin Xu, Bolun Cai, Xiaofen Xing:
ISNet: Individual Standardization Network for Speech Emotion Recognition. 1803-1814
- Jianfeng Wu, Sijie Mai, Haifeng Hu:
Interpretable Multimodal Capsule Fusion. 1815-1826
- Qian Wang, Jiajun Zhang, Chengqing Zong:
Synchronous Inference for Multilingual Neural Machine Translation. 1827-1839
- Jilu Jin, Jacob Benesty, Gongping Huang, Jingdong Chen:
On Differential Beamforming With Nonuniform Linear Microphone Arrays. 1840-1852
- Chiranjibi Sitaula, Jinyuan He, Archana Priyadarshi, Mark B. Tracy, Omid Kavehei, Murray Hinder, Anusha Withana, Alistair Lee McEwan, Faezeh Marzbanrad:
Neonatal Bowel Sound Detection Using Convolutional Neural Network and Laplace Hidden Semi-Markov Model. 1853-1864
- Chao Pan, Jingdong Chen, Jacob Benesty:
Microphone Array Beamforming With High Flexible Interference Attenuation and Noise Reduction. 1865-1876
- Han Li, Kean Chen, Bernhard U. Seeber:
Gestalt Principles Emerge When Learning Universal Sound Source Separation. 1877-1891
- Juan Manuel Miramont, Marcelo Alejandro Colominas, Gastón Schlotthauer:
Emulating Perceptual Evaluation of Voice Using Scattering Transform Based Features. 1892-1901
- Lucas Ondel, Bolaji Yusuf, Lukás Burget, Murat Saraçlar:
Non-Parametric Bayesian Subspace Models for Acoustic Unit Discovery. 1902-1917
- Jialu Li, Mark Hasegawa-Johnson:
Autosegmental Neural Nets 2.0: An Extensive Study of Training Synchronous and Asynchronous Phones and Tones for Under-Resourced Tonal Languages. 1918-1926
- Yu Lu, Jiajun Zhang, Jiali Zeng, Shuangzhi Wu, Chengqing Zong:
Attention Analysis and Calibration for Transformer in Natural Language Generation. 1927-1938
- Minseung Kim, Jong Won Shin:
Improved Speech Enhancement Considering Speech PSD Uncertainty. 1939-1951
- Avital Kleiman, Israel Cohen, Baruch Berdugo:
Constant-Beamwidth Beamforming With Nonuniform Concentric Ring Arrays. 1952-1962
- Jacopo de Berardinis, Angelo Cangelosi, Eduardo Coutinho:
Measuring the Structural Complexity of Music: From Structural Segmentations to the Automatic Evaluation of Models for Music Generation. 1963-1976
- Dino Oglic, Zoran Cvetkovic, Peter Sollich, Steve Renals, Bin Yu:
Towards Robust Waveform-Based Acoustic Models. 1977-1992
- Bo Chen, Chenpeng Du, Kai Yu:
Neural Fusion for Voice Cloning. 1993-2001
- Saurabhchand Bhati, Jesús Villalba, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak:
Unsupervised Speech Segmentation and Variable Rate Representation Learning Using Segmental Contrastive Predictive Coding. 2002-2014
- Bo Yang, Lijun Wu, Jinhua Zhu, Bo Shao, Xiaola Lin, Tie-Yan Liu:
Multimodal Sentiment Analysis With Two-Phase Multi-Task Learning. 2015-2024
- Songbin Li, Jingang Wang, Peng Liu:
General Frame-Wise Steganalysis of Compressed Speech Based on Dual-Domain Representation and Intra-Frame Correlation Leaching. 2025-2035
- Yang Ai, Zhen-Hua Ling, Wei-Lu Wu, Ang Li:
Denoising-and-Dereverberation Hierarchical Neural Vocoder for Statistical Parametric Speech Synthesis. 2036-2048
- Alastair H. Moore, Sina Hafezi, Rebecca R. Vos, Patrick A. Naylor, Mike Brookes:
A Compact Noise Covariance Matrix Model for MVDR Beamforming. 2049-2061
- Leo McCormack, Archontis Politis, Raimundo Gonzalez, Tapio Lokki, Ville Pulkki:
Parametric Ambisonic Encoding of Arbitrary Microphone Arrays. 2062-2075
- Adriana Fernandez-Lopez, Federico M. Sukno:
End-to-End Lip-Reading Without Large-Scale Data. 2076-2090
- Jinwon An, Sungzoon Cho, Junseong Bang, Misuk Kim:
Domain-Slot Relationship Modeling Using a Pre-Trained Language Encoder for Multi-Domain Dialogue State Tracking. 2091-2102
- Roberto San Millán-Castillo, Luca Martino, Eduardo Morgado, Fernando Llorente:
An Exhaustive Variable Selection Study for Linear Models of Soundscape Emotions: Rankings and Gibbs Analysis. 2460-2474
- Kunkun SongGong, Wenwu Wang, Huawei Chen:
Acoustic Source Localization in the Circular Harmonic Domain Using Deep Learning Architecture. 2475-2491
- Yanjue Song, Nilesh Madhu:
Improved CEM for Speech Harmonic Enhancement in Single Channel Noise Suppression. 2492-2503
- Anderson Queiroz, Rosângela Coelho:
Noisy Speech Based Temporal Decomposition to Improve Fundamental Frequency Estimation. 2504-2513
- Myeongjun Jang, Thomas Lukasiewicz:
NoiER: An Approach for Training More Reliable Fine-Tuned Downstream Task Models. 2514-2525
- Jiayi Wang, Rongzhou Bao, Zhuosheng Zhang, Hai Zhao:
Rethinking Textual Adversarial Defense for Pre-Trained Language Models. 2526-2540
- Sungho Lee, Hyeong-Seok Choi, Kyogu Lee:
Differentiable Artificial Reverberation. 2541-2556
- Chuang Fan, Jiaming Li, Xuan Luo, Ruifeng Xu:
Enhancing Structure Preservation in Coreference Resolution by Constrained Graph Encoding. 2557-2567
- Richong Zhang, Qianben Chen, Yaowei Zheng, Samuel Mensah, Yongyi Mao:
Aspect-Level Sentiment Analysis via a Syntax-Based Neural Network. 2568-2583
- Moti Lugasi, Anjali Menon, Vladimir Tourbabin, Boaz Rafaely:
Spatial Audio Signal Enhancement by a Two-Stage Source - System Estimation With Frequency Smoothing for Improved Perception. 2584-2596
- Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng:
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition. 2597-2611
- Elior Hadad, Simon Doclo, Sven Nordholm, Sharon Gannot:
A Class of Pareto Optimal Binaural Beamformers. 2612-2628
- Guochen Yu, Andong Li, Hui Wang, Yutian Wang, Yuxuan Ke, Chengshi Zheng:
DBT-Net: Dual-Branch Federative Magnitude and Phase Estimation With Attention-in-Attention Transformer for Monaural Speech Enhancement. 2629-2644
- Weiqing Wang, Qingjian Lin, Danwei Cai, Ming Li:
Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization. 2645-2658
- Sunwoo Kim, Minje Kim:
Boosted Locality Sensitive Hashing: Discriminative, Efficient, and Scalable Binary Codes for Source Separation. 2659-2672
- Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura:
A Machine Speech Chain Approach for Dynamically Adaptive Lombard TTS in Static and Dynamic Noise Environments. 2673-2688
- Qiupu Chen, Guimin Huang, Yabing Wang:
The Weighted Cross-Modal Attention Mechanism With Sentiment Prediction Auxiliary Task for Multimodal Sentiment Analysis. 2689-2695
- Leilei Gan, Zhiyang Teng, Yue Zhang, Linchao Zhu, Fei Wu, Yi Yang:
SemGloVe: Semantic Co-Occurrences for GloVe From BERT. 2696-2704
- Suliang Bu, Yunxin Zhao, Tuo Zhao, Shaojun Wang, Mei Han:
Modeling Speech Structure to Improve T-F Masks for Speech Enhancement and Recognition. 2705-2715
- Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B., Andreas K. Maier:
Low Latency Speech Enhancement for Hearing Aids Using Deep Filtering. 2716-2728
- Zhihao Zhang, Yuan Zuo, Junjie Wu:
Aspect Sentiment Triplet Extraction: A Seq2Seq Approach With Span Copy Enhanced Dual Decoder. 2729-2742
- Leonardo Gabrielli, Stefano D'Angelo, Pier Paolo La Pastina, Stefano Squartini:
Antiderivative Antialiasing for Arbitrary Waveform Generation. 2743-2753
- Huadong Wang, Xin Shen, Mei Tu, Yimeng Zhuang, Zhiyuan Liu:
Improved Transformer With Multi-Head Dense Collaboration. 2754-2767
- Chuang Shi, Feiyu Du, Qianyang Wu:
A Digital Twin Architecture for Wireless Networked Adaptive Active Noise Control. 2768-2777
- Wenmeng Xiong, Changchun Bao, Mao-shen Jia, José Picheral:
Speech Enhancement With Robust Beamforming for Spatially Overlapped and Distributed Sources. 2778-2790
- Hassan Taherian, Ke Tan, DeLiang Wang:
Multi-Channel Talker-Independent Speaker Separation Through Location-Based Training. 2791-2800
- Han Zhang, Bin Liang, Min Yang, Hui Wang, Ruifeng Xu:
Prompt-Based Prototypical Framework for Continual Relation Extraction. 2801-2813
- Christof Weiß, Geoffroy Peeters:
Comparing Deep Models and Evaluation Strategies for Multi-Pitch Estimation in Music Recordings. 2814-2827
- Laura-Maria Dogariu, Jacob Benesty, Constantin Paleologu, Silviu Ciochina:
Identification of Room Acoustic Impulse Responses via Kronecker Product Decompositions. 2828-2841
- Yanmin Qian, Xun Gong, Houjun Huang:
Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition. 2842-2853
- Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie:
ParaTTS: Learning Linguistic and Prosodic Cross-Sentence Information in Paragraph-Based TTS. 2854-2864
- Soojoong Hwang, Minseung Kim, Jong Won Shin:
Dual Microphone Speech Enhancement Based on Statistical Modeling of Interchannel Phase Difference. 2865-2874
- Chao Pan, Jingdong Chen:
A Framework of Directional-Gain Beamforming and a White-Noise-Gain-Controlled Solution. 2875-2887
- Fei He, Xiaoyi Hu, Ce Zhu, Ying Li, Yipeng Liu:
Multi-Scale Spatial and Temporal Speech Associations to Swallowing for Dysphagia Screening. 2888-2899
- Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen Meng:
Bayesian Neural Network Language Modeling for Speech Recognition. 2900-2917
- Kang Xu, Fei Li, Dongdong Xie, Donghong Ji:
Revisiting Aspect-Sentiment-Opinion Triplet Extraction: Detailed Analyses Towards a Simple and Effective Span-Based Model. 2918-2927
- Koichi Saito, Tomohiko Nakamura, Kohei Yatabe, Hiroshi Saruwatari:
Sampling-Frequency-Independent Convolutional Layer and its Application to Audio Source Separation. 2928-2943
- Juliano G. C. Ribeiro, Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari:
Region-to-Region Kernel Interpolation of Acoustic Transfer Functions Constrained by Physical Properties. 2944-2954
- Ruixin Hong, Hongming Zhang, Xintong Yu, Changshui Zhang:
Learning Event Extraction From a Few Guideline Examples. 2955-2967
- Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic:
Acoustic Modelling From Raw Source and Filter Components for Dysarthric Speech Recognition. 2968-2980
- Gaku Kotani, Daisuke Saito, Nobuaki Minematsu:
Voice Conversion Based on Deep Neural Networks for Time-Variant Linear Transformations. 2981-2992
- Xiaoyu Bie, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin:
Unsupervised Speech Enhancement Using Dynamical Variational Autoencoders. 2993-3007
- Yi Luo:
A Time-Domain Real-Valued Generalized Wiener Filter for Multi-Channel Neural Separation Systems. 3008-3019
- Kaile Shi, Xiaoyan Cai, Libin Yang, Jintao Zhao, Shirui Pan:
StarSum: A Star Architecture Based Model for Extractive Summarization. 3020-3031
- Zexu Pan, Meng Ge, Haizhou Li:
USEV: Universal Speaker Extraction With Visual Cue. 3032-3045
- Yutao Xie, Qiyu Wu, Wei Chen, Tengjiao Wang:
Stable Contrastive Learning for Self-Supervised Sentence Embeddings With Pseudo-Siamese Mutual Learning. 3046-3059
- Lei Luo, Wenzhao Zhu:
An Optimized Zero-Attracting LMS Algorithm for the Identification of Sparse System. 3060-3073
- Gongping Huang, Jacob Benesty, Jingdong Chen:
Fundamental Approaches to Robust Differential Beamforming With High Directivity Factors. 3074-3088
- Xiaoqiang Wang, Yanqing Liu, Jinyu Li, Veljko Miljanic, Sheng Zhao, Hosam Khalil:
Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems. 3089-3097
- Jiaxin Zhong, Tao Zhuang, Ray Kirby, Mahmoud Karimi, Xiaojun Qiu, Haishan Zou, Jing Lu:
Low Frequency Audio Sound Field Generated by a Focusing Parametric Array Loudspeaker. 3098-3109
- Jens Ahrens, Hannes Helmholz, David Lou Alon, Sebastià V. Amengual Garí:
Spherical Harmonic Decomposition of a Sound Field Using Microphones on a Circumferential Contour Around a Non-Spherical Baffle. 3110-3119
- Yonggang Hu, Prasanga N. Samarasinghe, Sharon Gannot, Thushara D. Abhayapala:
Decoupled Multiple Speaker Direction-of-Arrival Estimator Under Reverberant Environments. 3120-3133
- Heming Wang, Xueliang Zhang, DeLiang Wang:
Fusing Bone-Conduction and Air-Conduction Sensors for Complex-Domain Speech Enhancement. 3134-3143
- Joon-Young Yang, Joon-Hyuk Chang:
Task-Specific Optimization of Virtual Channel Linear Prediction-Based Speech Dereverberation Front-End for Far-Field Speaker Verification. 3144-3159
- Jian Liu, Yufeng Chen, Jinan Xu:
MRCAug: Data Augmentation via Machine Reading Comprehension for Document-Level Event Argument Extraction. 3160-3172
- Wangyou Zhang, Xuankai Chang, Christoph Böddeker, Tomohiro Nakatani, Shinji Watanabe, Yanmin Qian:
End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party. 3173-3188
- Michele Ducceschi, Stefan Bilbao:
Non-Iterative Simulation Methods for Virtual Analog Modelling. 3189-3198
- Miguel Ferrer, Maria de Diego, Amin Hassani, Marc Moonen, Gema Piñero, Alberto González:
Multi-Tone Active Noise Equalizer With Spatially Distributed User-Selected Profiles. 3199-3213
- Gasper Begus, Alan Zhou:
Interpreting Intermediate Convolutional Layers of Generative CNNs Trained on Waveforms. 3214-3229
- Sixing Wu, Ying Li, Dawei Zhang, Zhonghai Wu:
Generating Rational Commonsense Knowledge-Aware Dialogue Responses With Channel-Aware Knowledge Fusing Network. 3230-3239
- Donghui Zhu, Ning Chen:
Multi-Source Domain Adaptation and Fusion for Speaker Verification. 2103-2116
- Daniel Yang, Thaxter Shaw, Timothy Tsai:
A Study of Parallelizable Alternatives to Dynamic Time Warping for Aligning Long Sequences. 2117-2127 - Yi Yu, Hongsen He, Rodrigo C. de Lamare, Badong Chen:
General Robust Subband Adaptive Filtering: Algorithms and Applications. 2128-2140 - Mahdie Karbasi, Steffen Zeiler, Dorothea Kolossa:
Microscopic and Blind Prediction of Speech Intelligibility: Theory and Practice. 2141-2155 - Andong Li, Chengshi Zheng, Guochen Yu, Juanjuan Cai, Xiaodong Li:
Filtering and Refining: A Collaborative-Style Framework for Single-Channel Speech Enhancement. 2156-2172 - Silin Gao, Ryuichi Takanobu, Antoine Bosselut, Minlie Huang:
End-to-End Task-Oriented Dialog Modeling With Semi-Structured Knowledge Management. 2173-2187 - Yingying Zhu, Haiquan Zhao, Xiaoqiong He, Zeliang Shu, Badong Chen:
Cascaded Random Fourier Filter for Robust Nonlinear Active Noise Control. 2188-2200 - Siyuan Wang, Zhongkun Liu, Wanjun Zhong, Ming Zhou, Zhongyu Wei, Zhumin Chen, Nan Duan:
From LSAT: The Progress and Challenges of Complex Reasoning. 2201-2216 - Cheng Lu, Yuan Zong, Wenming Zheng, Yang Li, Chuangao Tang, Björn W. Schuller:
Domain Invariant Feature Learning for Speaker-Independent Speech Emotion Recognition. 2217-2230 - Bo Zhang, Jian Wang, Hongfei Lin, Hui Ma, Bo Xu:
Exploiting Pairwise Mutual Information for Knowledge-Grounded Dialogue. 2231-2240 - Tao Wang, Jiangyan Yi, Ruibo Fu, Jianhua Tao, Zhengqi Wen:
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing. 2241-2254 - Ying-Ren Chien, Chih-Hsiang Yu, Hen-Wai Tsao:
Affine-Projection-Like Maximum Correntropy Criteria Algorithm for Robust Active Noise Control. 2255-2266 - Rui Wang, Zhihua Wei, Haoran Duan, Shouling Ji, Yang Long, Zhen Hong:
EfficientTDNN: Efficient Architecture Search for Speaker Recognition. 2267-2279 - Xiaoxue Gao, Chitralekha Gupta, Haizhou Li:
Automatic Lyrics Transcription of Polyphonic Music With Lyrics-Chord Multi-Task Learning. 2280-2294 - Jirí Málek, Jakub Janský, Zbynek Koldovský, Tomás Kounovský, Jaroslav Cmejla, Jindrich Zdánský:
Target Speech Extraction: Independent Vector Extraction Guided by Supervised Speaker Identification. 2295-2309 - Jung-Woo Choi, Franz Zotter, Byeongho Jo, Jae-Hyoun Yoo:
Multiarray Eigenbeam-ESPRIT for 3D Sound Source Localization With Multiple Spherical Microphone Arrays. 2310-2325 - Hao Zhang, DeLiang Wang:
Neural Cascade Architecture for Multi-Channel Acoustic Echo Suppression. 2326-2336 - Alessandro Opinto, Marco Martalò, Alessandro Costalunga, Nicolo Strozzi, Carlo Tripodi, Riccardo Raheli:
Experimental Analysis and Design Guidelines for Microphone Virtualization in Automotive Scenarios. 2337-2346 - Xing Tian, Jie Huang, Xuelei Feng, Yong Shen:
An Intermittent FxLMS Algorithm for Active Noise Control Systems With Saturation Nonlinearity. 2347-2356 - Haichao Zhu, Li Dong, Furu Wei, Bing Qin, Ting Liu:
Transforming Wikipedia Into Augmented Data for Query-Focused Summarization. 2357-2367 - Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii, Tatsuya Kawahara:
Autoregressive Moving Average Jointly-Diagonalizable Spatial Covariance Analysis for Joint Source Separation and Dereverberation. 2368-2382 - Brij Mohan Lal Srivastava, Mohamed Maouche, Md. Sahidullah, Emmanuel Vincent, Aurélien Bellet, Marc Tommasi, Natalia A. Tomashenko, Xin Wang, Junichi Yamagishi:
Privacy and Utility of X-Vector Based Speaker Anonymization. 2383-2395 - Luciana Ferrer, Diego Castán, Mitchell McLaren, Aaron Lawson:
A Discriminative Hierarchical PLDA-Based Model for Spoken Language Recognition. 2396-2410 - Xiaohuai Le, Tong Lei, Kai Chen, Jing Lu:
Inference Skipping for More Efficient Real-Time Speech Enhancement With Parallel RNNs. 2411-2421 - Chitralekha Gupta, Haizhou Li, Masataka Goto:
Deep Learning Approaches in Topics of Singing Information Processing. 2422-2451 - Atharva Anand Joshi, Harshavardhan Settibhaktini, Ananthakrishna Chintanpalli:
Modeling Concurrent Vowel Scores Using the Time Delay Neural Network and Multitask Learning. 2452-2459