Technical Paper Table of Contents
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 978-1-7281-6327-7/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICASSP49357.2023.10096024
Grand Challenges
6829: Multi-Speaker End-to-end Multi-modal Speaker Diarization System for the MISP 2022
CHALLENGE
Tao Liu (Shanghai Jiao Tong University)*; Zhengyang Chen (Shanghai Jiao Tong University); Yanmin
Qian (Shanghai Jiao Tong University); Kai Yu (Shanghai Jiao Tong University)
6848: The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge
Pengcheng Guo (Northwestern Polytechnical University)*; He Wang (NWPU); Bingshen Mu
(Northwestern Polytechnical University); Ao Zhang (Northwestern Polytechnical University); Peikun Chen
(Northwestern Polytechnical University)
6850: Personalized speech enhancement combining band-split RNN and speaker attentive module
Xiaohuai Le (Nanjing University; ByteDance)*; Li Chen (ByteDance); Yiqing Guo (ByteDance); Chao He
(ByteDance); Cheng Chen (ByteDance); Xianjun Xia (NA); Jing Lu (Nanjing University)
6853: Dialogue Context Modelling for Action Item Detection: Solution for ICASSP 2023 MUG
Challenge Track 5
Jie Huang (Harbin Institute of Technology); Xiachong Feng (Harbin Institute of Technology)*; Ye Yangfan
(HIT); Liang Zhao (HIT); Xiaocheng Feng (Harbin Institute of Technology); Bing Qin (Harbin Institute of
Technology); Ting Liu (Harbin Institute of Technology)
6857: INPLACE CEPSTRAL SPEECH ENHANCEMENT SYSTEM FOR THE ICASSP 2023 CLARITY
CHALLENGE
Jinjiang Liu (College of Computer Science, Inner Mongolia University)*; Xueliang Zhang (Inner Mongolia
University)
6858: A TAYLOR STYLE NEURAL NETWORK IN FULLBAND ECHO CANCELLATION
Xu Weiming (Northwest Polytechnic University)*; Guo Zhihao (elevoc)
6859: Multi-Head Attention and GRU for Improved Match-Mismatch Classification of Speech
Stimulus and EEG Response
Marvin Borsdorf (University of Bremen)*; Saurav Pahuja (University of Bremen); Gabriel Ivucic (University
of Bremen); Siqi Cai (National University of Singapore); Haizhou Li (The Chinese University of Hong
Kong, Shenzhen); Tanja Schultz (University of Bremen)
6861: RELATING EEG RECORDINGS TO SPEECH USING ENVELOPE TRACKING AND THE
SPEECH-FFR
Michael D Thornton (Imperial College London)*; Danilo Mandic (Imperial College London); Tobias
Reichenbach (FAU)
6863: S-FEATURE PYRAMID NETWORK AND ATTENTION MODEL FOR DRONE DETECTION
Pengcheng Dong (Shandong Normal University)*; Chuntao Wang (Shandong Normal University);
Zhenyong Lu (Shandong Normal University); Kai Zhang (Shandong Normal University); Wenbo Wan
(Shandong Normal University); Jiande Sun (Shandong Normal University)
6866: CONSEN: Complementary and Simultaneous Ensemble for Alzheimer's Disease Detection
and MMSE Score Prediction
Longbin Jin (Konkuk University)*; Yealim Oh (Konkuk University); Hyunseo Kim (Konkuk University);
Hyuntaek Jung (Konkuk University); Hyo Jin Jon (Konkuk University); Jung Eun Shin (Voinosis Inc.); Eun
Yi Kim (Konkuk University)
6867: The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge
Ming Cheng (Duke Kunshan University)*; Haoxu Wang (Wuhan University); Ziteng Wang (Alibaba
Group); Qiang Fu (Alibaba Group); Ming Li (Duke Kunshan University)
6869: A Low-Latency Deep Hierarchical Fusion Network for Fullband Acoustic Echo Cancellation
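Haoran Zhao (Kuaishou Technology)*; Nan Li (Beijing Dajia Internet Information Technology Co., Ltd.); Runqiang Han
(Kuaishou); Xiguang Zheng (Beijing Dajia Internet Information Technology Co., Ltd.); Chen Zhang (Beijing Dajia Internet
Information Technology Co., Ltd.)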
6875: DEEP LEARNING-BASED PATH LOSS PREDICTION FOR OUTDOOR WIRELESS
COMMUNICATION SYSTEMS
Kehai Qiu (University of Cambridge)*; Stefanos Bakirtzis (University of Cambridge); Hui Song (Ranplan
Wireless Network Design Ltd); Ian J Wassell (University of Cambridge); Jie Zhang (University of
Sheffield)
6877: Ensemble and personalized Transformer models for subject identification and relapse
detection in e-Prevention Challenge
Salvatore Calcagno (University of Catania)*; Raffaele Mineo (University of Catania); Daniela Giordano
(University of Catania); Concetto Spampinato (University of Catania)
6880: Half-temporal and half-frequency attention U2Net for speech signal improvement
Zehua Zhang (Harbin Institute of Technology (Shenzhen))*; Shiyun Xu (Harbin Institute of
Technology (Shenzhen)); Xuyi Zhuang (Harbin Institute of Technology (Shenzhen)); Yukun Qian (Harbin
Institute of Technology (Shenzhen)); Lianyu Zhou (Harbin Institute of Technology (Shenzhen)); Mingjiang
Wang (Harbin Institute of Technology (Shenzhen))
6885: A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing
toward STOP Quality Challenge
Siddhant Arora (Carnegie Mellon University)*; Hayato Futami (Sony Group Corporation); Shih-Lun Wu
(Carnegie Mellon University); Jessica Huynh (Carnegie Mellon University); Yifan Peng (Carnegie Mellon
University); Yosuke Kashiwagi (Sony); Emiru Tsunoo (Sony Group Corporation); Brian Yan (Carnegie
Mellon University); Shinji Watanabe (Carnegie Mellon University)
6887: Multi-Channel Speaker Extraction with Adversarial Training: the Wavlab submission to the
Clarity ICASSP 2023 Grand Challenge
Samuele Cornell (Università Politecnica delle Marche)*; Zhong-Qiu Wang (Carnegie Mellon University);
Yoshiki Masuyama (Tokyo Metropolitan University); Shinji Watanabe (Carnegie Mellon University);
Manuel Pariente (Pulse Audition); Nobutaka Ono (Tokyo Metropolitan University); Stefano Squartini
(Università Politecnica delle Marche)
6891: PMNet: Large-Scale Channel Prediction System for ICASSP 2023 First Pathloss Radio Map
Prediction Challenge
Ju-Hyung Lee (University of Southern California)*; Joohan Lee (University of Southern California); Seon-
Ho Lee (MCL, Korea University); Andreas Molisch (University of Southern California)
6892: The pipeline system of ASR and NLU with MLM-based data augmentation toward STOP low-
resource challenge
Hayato Futami (Sony Group Corporation)*; Jessica Huynh (Carnegie Mellon University); Siddhant Arora
(Carnegie Mellon University); Shih-Lun Wu (Carnegie Mellon University); Yosuke Kashiwagi (Sony); Yifan
Peng (Carnegie Mellon University); Brian Yan (Carnegie Mellon University); Emiru Tsunoo (Sony Group
Corporation); Shinji Watanabe (Carnegie Mellon University)
6897: VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with
Identity Preservation
Rohan Badlani (NVIDIA)*; Akshit Arora (NVIDIA); Subhankar Ghosh (NVIDIA); Rafael Valle (NVIDIA);
Kevin Shih (NVIDIA); João Felipe Santos (NVIDIA); Boris Ginsburg (NVIDIA); Bryan Catanzaro (NVIDIA)
6900: A person identification system for the ICASSP 2023 e-Prevention challenge
Jinting Wu (Samsung Research China-Beijing (SRC-B))*; Mei Tu (Samsung)
6902: HITsz TMG at ICASSP 2023 SPGC: Leveraging pre-training and distillation method for title
generation with limited resource
Tianxiao Xu (Harbin Institute of Technology, Shenzhen)*; Zihao Zheng (Harbin Institute of Technology,
Shenzhen); Xinshuo Hu (Harbin Institute of Technology, Shenzhen); Zetian Sun (Harbin Institute of
Technology, Shenzhen); Yu Zhao (Harbin Institute of Technology, Shenzhen); Baotian Hu (Harbin Institute
of Technology, Shenzhen)
6903: LeanSpeech: The Microsoft Lightweight Speech Synthesis System for LIMMITS Challenge
2023
Chen Zhang (Microsoft)*; Shubham Bansal (Microsoft); Aakash Lakhera (Microsoft); Jinzhu Li
(Microsoft); Gag Wang (Microsoft); Sandeep Kumar Satpal (Microsoft, India); Sheng Zhao (Microsoft); Lei
He (Microsoft Cloud and AI)
6904: The NPU-Elevoc Personalized Speech Enhancement System for ICASSP2023 DNS Challenge
Xiaopeng Yan (Northwestern Polytechnical University)*; Yindi Yang (Elevoc); Zhihao Guo (Elevoc);
Liangliang Peng (Elevoc); Lei Xie (NWPU)
6905: The XMU system for audio-visual diarization and recognition in MISP challenge 2022
Tao Li (Xiamen University)*; Haodong Zhou (Xiamen University); Jie Wang (Xiamen University); Qingyang
Hong (Xiamen University); Lin Li (Xiamen University)
6910: Tspeech-AI System Description to the 5th Deep Noise Suppression (DNS) Challenge
Jianwei Yu (Tencent AI lab)*; Hangting Chen (Tencent ASSP OTeam); Yi Luo (Tencent AI Lab); Rongzhi
Gu (Tencent); Chao Weng (Tencent AI Lab)
6911: SSI-Net: A MULTI-STAGE SPEECH SIGNAL IMPROVEMENT SYSTEM FOR ICASSP 2023 SSI
CHALLENGE
Weixin Zhu (Tencent)*; Zilin Wang (Tsinghua University); Jiuxin Lin (Tsinghua University); Chang Zeng
(National Institute of Informatics); Tao Yu (Tencent)
6915: THE AJMIDE TOPIC SEGMENTATION SYSTEM FOR THE ICASSP 2023 GENERAL MEETING
UNDERSTANDING AND GENERATION CHALLENGE
Beibei Hu (Ajmide Media)*; Qiang Li (Ajmide Media); Xianjun Xia (Ajmide Media)
6916: Signal Processing Grand Challenge 2023 - e-Prevention: Sleep Behavior as an Indicator of
Relapses in Psychotic Patients
Kleanthis Avramidis (University of Southern California)*; Kranti Adsul (University of Southern California);
Digbalay Bose (University of Southern California); Shrikanth Narayanan (USC)
6917: Dual-Path Dilated Convolutional Recurrent Network with Group Attention for Multi-Channel
Speech Enhancement
Jiaming Cheng (Southeast University)*; Cong Pang (Southeast University); Ruiyu Liang (Southeast
University); Jingjie Fan (Southeast University); Li Zhao (Southeast University)
6919: TWO-STAGE NEURAL NETWORK FOR ICASSP 2023 SPEECH SIGNAL IMPROVEMENT
CHALLENGE
Mingshuai Liu (NWPU)*; Shubo Lv (Shaanxi Provincial Key Laboratory of Speech and Image Information
Processing, School of Computer Science, Northwestern Polytechnical University); Zihan Zhang
(Northwestern Polytechnical University); Runduo Han (Northwestern Polytechnical University); Xiang Hao
(NWPU); Xianjun Xia (ByteDance); Li Chen (ByteDance); Yijian Xiao (ByteDance); Lei Xie (NWPU)
6923: THE NIO SYSTEM FOR AUDIO-VISUAL DIARIZATION AND RECOGNITION IN MISP
CHALLENGE 2022
Gaopeng Xu (NIO)*; Xianliang Wang (NIO); Sang Wang (NIO); Junfeng Yuan (NIO); Wei Guo (NIO); Wei Li
(NIO); Jie Gao (NIO)
6926: 3D audio signal processing systems for speech enhancement and sound localization and
detection
Jisheng Bai (School of Marine Science and Technology, Northwestern Polytechnical University)*; Siwei
Huang (JLESS); Han Yin (JLESS); Mou Wang (Northwestern Polytechnical University); Yafei Jia (School
of Marine Science and Technology, Northwestern Polytechnical University); Jianfeng Chen (School of
Marine Science and Technology, Northwestern Polytechnical University)
6929: THE NERCSLIP-USTC SYSTEM FOR THE L3DAS23 CHALLENGE TASK2: 3D SOUND EVENT
LOCALIZATION AND DETECTION (SELD)
Haoyin Yan (University of Science and Technology of China)*; Haitao Xu (University of Science and
Technology of China); Jie Zhang (University of Science and Technology of China); Qing Wang (University
of Science and Technology of China)
6930: Cross-Lingual Transfer Learning for Alzheimer’s Detection From Spontaneous Speech
Bastiaan Tamm (KU Leuven)*; Rik Vandenberghe (University of Leuven); Hugo Van hamme (KU Leuven)
7015: Lightweight Machine Learning for Seizure Detection on Wearable Devices
Baichuan Huang (Lund University)*; Azra Abtahi (Lund University); Amir Aminifar (Lund University)
Applied Signal Processing Systems
394: Robust Dominant Periodicity Detection for Time Series with Missing Data
Qingsong Wen (Alibaba DAMO Academy)*; Linxiao Yang (Machine Intelligence Technology, Alibaba
Group, Hangzhou, China); Liang Sun (Alibaba Group)
1603: TSPTQ-ViT: TWO-SCALED POST-TRAINING QUANTIZATION FOR VISION TRANSFORMER
Yu Shan Tai (National Taiwan University GIEE)*; Ming Guang Lin (National Taiwan University GIEE); An-
Yeu (Andy) Wu (National Taiwan University)
1724: Adaptive Noise Canceller Algorithm with SNR-Based Stepsize and Data-Dependent
Averaging
Akihiko K. Sugiyama (Yahoo Japan Corporation)*
3137: Real-time modelling of observation filter in the Remote Microphone Technique for an Active
Noise Control application
Chung Kwan Lai (Nanyang Technological University)*; Bhan Lam (NTU); Dongyuan Shi (NTU); Woon
Seng Gan (NTU)
3293: Single-anchor UWB Localization using Channel Impulse Response Distributions
Sitian Li (EPFL)*; Alexios Balatsoukas-Stimming (Eindhoven University of Technology); Andreas Burg
(EPFL)
3733: Finding Optimal Numerical Format for Sub-8-bit Post-Training Quantization of Vision
Transformers
Janghwan Lee (Hanyang University)*; Youngdeok Hwang (Baruch College - The City University of New
York (CUNY)); Jungwook Choi (Hanyang University)
3925: Low-Complexity Low-Rank Approximation SVD for Massive Matrix in Tensor Train Format
Jung-Chun Chi (National Tsing Hua University); Chiao-En Chen (National Chung Hsing University); Yuan-
Hao Huang (National Tsing Hua University)*
3961: A Multi-Channel Aggregation Framework for Object Detection in Large-Scale SAR Image
Chule Yang (Defense Innovation Institute (DII))*; Chao Zhang (College of Computer Science and
Technology, Harbin Engineering University); Zunlin Fan (National Innovation Institute of Defense
Technology, China); Zeting Yu (Defense Innovation Institute (DII)); Qianchong Sun (Defense
Innovation Institute (DII)); Mengyuan Dai (Defense Innovation Institute (DII))
4116: A Momentum Two-gradient Direction Algorithm with Variable Step Size Applied to Solve
Practical Output Constraint Issue for Active Noise Control
Xiaoyi Shen (Nanyang Technological University)*; Dongyuan Shi (Nanyang Technological University);
Zhengding Luo (Nanyang Technological University); Junwei Ji (Nanyang Technological University); Woon
Seng Gan (NTU)
4620: VAN-ICP: GPU-Accelerated Approximate Nearest Neighbor Search for ICP Registration via
Voxel Dilation
Weimin Wang (Dalian University of Technology)*; Qiong Chang (Tokyo Institute of Technology)
4851: Joint Angle and Respiration Estimation for Passive and Device-Free Respiration Monitoring
Gerrit Maus (University of Wuppertal)*; Dieter Brückmann (University of Wuppertal)
4985: Boosting the Accuracy of SRAM-Based In-Memory Architectures via Maximum Likelihood-
based Error Compensation Methods
Hyungyo Kim (University of Illinois at Urbana-Champaign)*; Naresh Shanbhag (University of Illinois at
Urbana-Champaign)
5231: TEFISTA-Net: GTD Parameter Estimation of Low-Frequency Ultra-Wideband Radar via
Model-Based Deep Learning
Rui Li (Tsinghua University)*; Xueqian Wang (Tsinghua University); Gang Li (Tsinghua University); Xiao-
Ping Zhang (Toronto Metropolitan University)
5441: Enhancing the Accuracy of Resistive In-memory Architectures using Adaptive Signal
Processing
Han-Mo Ou (University of Illinois Urbana-Champaign)*; Naresh Shanbhag (University of Illinois at
Urbana-Champaign)
5872: ClassA Entropy for the analysis of structural complexity of physiological signals
Hongjian Xiao (Imperial College London)*; Ling Li (City, University of London); Danilo P. Mandic
(Imperial College London, UK)
6301: Multiple Target Measurements: Bayesian Framework for Moving Object Detection in MIMO
Radar
Bastian Eisele (Friedrich-Alexander-Universität Erlangen-Nürnberg)*; Ali Bereyhi (Friedrich-Alexander-
Universität Erlangen-Nürnberg); Ralf Müller (Friedrich-Alexander-Universität Erlangen-Nürnberg)
6355: Causal discovery and causal inference based counterfactual fairness in machine learning
Yajing Wang (BNU-HKBU United International College)*; Zongwei Luo (BNU ZH)
6365: CAN2V: CAN-BUS DATA-BASED SEQ2SEQ MODEL FOR VEHICLE VELOCITY PREDICTION
Jae-Heung Cho (Hanyang University); Joon-Hyuk Chang (Hanyang University)*
6443: LMBAO: A Landmark Map for Bundle Adjustment Odometry in LiDAR SLAM
Letian Zhang (Sun Yat-sen University); Jinping Wang (Sun Yat-sen University); Jie Lu (Sun Yat-sen
University); Nanjie Chen (Sun Yat-sen University); Xiaojun Tan (Sun Yat-sen University)*; Duan Zhifei
(XPeng Inc)
6491: Modulo EEG Signal Recovery using Transformer
Tianyu Geng (Nanyang Technological University); Feng Ji (Nanyang Technological University); Pratibha
Rana (Agency for Science, Technology and Research); Wee Peng Tay (Nanyang Technological
University)*
Audio and Acoustic Signal Processing
249: Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage
Approach
Shih-Lun Wu (National Taiwan University)*; Yi-Hsuan Yang (Academia Sinica)
267: RNN-based step-size estimation for the RLS algorithm with application to acoustic echo
cancellation
Ofer Schwartz (CEVA Inc.)*; Ayal Schwartz (BIU)
317: Few-shot continual learning with weight alignment and positive enhancement for bioacoustic
event detection
Xiaoxiao Wu (Shanghai Normal University); Dongxing Xu (Unisound AI Technology Co., Ltd., Beijing);
Haoran Wei (University of Texas at Dallas); Yanhua Long (Shanghai Normal University)*
348: Soft Dynamic Time Warping for Multi-Pitch Estimation and Beyond
Michael Krause (International Audio Laboratories Erlangen)*; Christof Weiß (University of Würzburg);
Meinard Müller (International Audio Laboratories Erlangen)
447: MAID: A Conditional Diffusion Model For Long Music Audio Inpainting
Kaiyang Liu (Sichuan University)*; Wendong Gan (Wiz Holdings Pte Ltd); Chenchen Yuan (Sichuan
University)
647: Diverse and Vivid Sound Generation from Text Descriptions
Guangwei Li (Shanghai Jiao Tong University)*; Xuenan Xu (Shanghai Jiao Tong University); Lingfeng Dai
(Shanghai Jiao Tong University); Mengyue Wu (Shanghai Jiao Tong University); Kai Yu (Shanghai Jiao
Tong University)
897: Breaking the trade-off in personalized speech enhancement with cross-task knowledge
distillation
Hassan Taherian (The Ohio State University)*; Sefik Emre Eskimez (Microsoft); Takuya Yoshioka
(Microsoft)
984: Improving Weakly Supervised Sound Event Detection with Causal Intervention
Yifei Xin (Peking University)*; Dongchao Yang (Peking University); Fan Cui (Xiaomi); Yujun Wang (Xiaomi);
Yuexian Zou (Peking University)
1035: End-to-End Amp Modelling: From Data to Controllable Guitar Amplifier Models
Lauri Juvela (Aalto University)*; Eero-Pekka Damskägg (Neural DSP); Aleksi Peussa (Neural DSP);
Jaakko Mäkinen (Neural DSP); Thomas Sherson (Neural DSP); Stylianos I Mimilakis (Neural DSP);
Kimmo Rauhanen (Neural DSP); Athanasios Gotsopoulos (Neural DSP)
1066: Graph neural networks for sound source localization on distributed microphone networks
Eric Grinstein (Imperial College London)*; Mike Brookes (Imperial College London); Patrick A. Naylor
(Imperial College London)
1117: Audio Coding With Unified Noise Shaping And Phase Contrast Control
Byeongho Jo (Electronics and Telecommunications Research Institute)*; Seung-Kwon Beack (IEEE
Broadcast Technology Society (BTS)); Taejin Lee (ETRI)
1207: Frame-Level Multi-Label Playing Technique Detection Using Multi-Scale Network and Self-
Attention Mechanism
Dichucheng Li (Fudan University)*; Mingjin Che (Sichuan Conservatory of Music); Wen wu Meng
(Sichuan Conservatory of Music); Yulun Wu (Fudan University); Yi Yu (NII); Fan Xia (Sichuan
Conservatory of Music); Wei Li (Fudan University)
1329: Design Choices for Learning Embeddings from Auxiliary Tasks for Domain Generalization in
Anomalous Sound Detection
Kevin Wilkinghoff (Fraunhofer FKIE)*
1350: Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and
sound-source images
Hien Ohnaka (National Institute of Technology, Tokuyama College)*; Shinnosuke Takamichi (The
University of Tokyo); Keisuke Imoto (Doshisha University); Yuki Okamoto (Ritsumeikan University);
Kazuki Fujii (The University of Tokyo); Hiroshi Saruwatari (The University of Tokyo)
1359: Noise PSD Insensitive RTF Estimation in a Reverberant and Noisy Environment
Changheng Li (Delft University of Technology)*; Richard Hendriks (TU Delft)
1368: SARdBScene: Dataset and ResNet Baseline for Audio Scene Source Counting and Analysis
Michael Nigro (Toronto Metropolitan University)*; Sri Krishnan (Ryerson University)
1393: A FREQUENCY-DOMAIN RECURSIVE LEAST-SQUARES ADAPTIVE FILTERING ALGORITHM
BASED ON A KRONECKER PRODUCT DECOMPOSITION
Hongsen He (Southwest University of Science and Technology)*; Jingdong Chen (Northwestern
Polytechnical University); Jacob Benesty (INRS); Yi Yu (Southwest University of Science and Technology)
1468: TransAudio: Towards the Transferable Adversarial Audio Attack via Learning Contextualized
Perturbations
Gege Qi (Alibaba)*; Yuefeng Chen (Alibaba Group); Yao Zhu (Zhejiang University); Binyuan Hui (Alibaba
Group); Xiaodan Li (Alibaba Group); Xiaofeng Mao (Alibaba Group); Rong Zhang (Alibaba); Hui Xue
(Alibaba)
1493: NORD: Non-Matching Reference Based Relative Depth Estimation From Binaural Audio
Pranay Manocha (Princeton University)*; Israel D Gebru (Facebook); Anurag Kumar (Facebook
Research); Dejan Markovic (Facebook Reality Labs); Alexander Richard (Facebook Reality Labs)
1509: Weight-sharing Supernet for Searching Specialized Acoustic Event Classification Networks
Across Device Constraints
Guan-Ting Lin (National Taiwan University)*; Qingming Tang (Amazon, Alexa); Chieh-Chi Kao (Amazon);
Viktor Rozgic (Amazon Alexa); Chao Wang (Amazon)
1620: Global HRTF Interpolation via Learned Affine Transformation of Hyper-conditioned Features
Jin Woo Lee (Seoul National University)*; Sungho Lee (Seoul National University); Kyogu Lee (Seoul
National University)
1663: Masked Spectrogram Prediction for Self-supervised Audio Pre-training
DaDing Chong (Peking University); Helin Wang (Johns Hopkins University)*; Peilin Zhou (The Hong Kong
University of Science and Technology); Qingcheng Zeng (Northwestern University)
1720: Linear Microphone Array Parallel to the Driving Direction for In-Car Speech Enhancement
Masanori Tsujikawa (NEC Corporation); Akihiko K. Sugiyama (Yahoo Japan Corporation)*; Ken
Hanazawa (NEC Laboratories America, Inc.); Yoshinobu Kajikawa (Kansai University)
1825: HybridFormer: Improving SqueezeFormer with Hybrid Attention and NSR Mechanism
Yuguang Yang (Ximalaya Inc., Shanghai, China)*; Yu Pan (University of Alberta); Jingjing Yin (Ximalaya);
Jiangyu Han (Ximalaya); Lei Ma (University of Alberta); Heng Lu (Ximalaya Inc., Shanghai, China)
1860: Spatial active noise control method based on sound field interpolation from reference
microphone signals
Kazuyuki Arikawa (The University of Tokyo)*; Shoichi Koyama (The University of Tokyo); Hiroshi
Saruwatari (The University of Tokyo)
1884: Kernel interpolation of acoustic transfer functions with adaptive kernel for directed and
residual reverberations
Juliano G. C. Ribeiro (The University of Tokyo)*; Shoichi Koyama (The University of Tokyo); Hiroshi
Saruwatari (The University of Tokyo)
1958: Better Together: Dialogue Separation and Voice Activity Detection for Audio Personalization
in TV
Matteo Torcoli (International Audio Laboratories Erlangen)*; Emanuel Habets (AudioLabs Erlangen)
2121: Improving performance of real-time full-band blind packet-loss concealment with predictive
network
Nguyen Viet Anh (NamiTech JSC)*; Anh Nguyen (NamiTech JSC); Andy W H Khong (Nanyang
Technological University)
2305: ICCRN: INPLACE CEPSTRAL CONVOLUTIONAL RECURRENT NEURAL NETWORK FOR
MONAURAL SPEECH ENHANCEMENT
Jinjiang Liu (College of Computer Science, Inner Mongolia University)*; Xueliang Zhang (Inner Mongolia
University)
2376: Speaker Diaphragm Excursion Prediction: deep attention and online adaptation
Yuwei Ren (Qualcomm AI Research, QUALCOMM Wireless Communication Technologies (China)
Limited)*; Matt Zivney (Qualcomm AI Research, Qualcomm Technologies, Inc.); Yin Huang (Qualcomm);
Eddie Choy (Qualcomm AI Research, Qualcomm Technologies, Inc.); Chirag Patel (Qualcomm); Hao Xu
(Qualcomm AI Research, Qualcomm Technologies, Inc.)
2393: A DNN-based hearing-aid strategy for real-time processing: One size fits all
Fotios Drakopoulos (Ghent University)*; Arthur Van Den Broucke (Ghent University); Sarah Verhulst
(Ghent University)
2514: Fast Online Source Steering Algorithm for Tracking Single Moving Source Using Online
Independent Vector Analysis
Taishi Nakashima (Tokyo Metropolitan University)*; Rintaro Ikeshita (NTT); Nobutaka Ono (Tokyo
Metropolitan University); Shoko Araki (NTT Corporation); Tomohiro Nakatani (NTT Communication
Science Laboratories)
2656: SCA: STREAMING CROSS-ATTENTION ALIGNMENT FOR ECHO CANCELLATION
Yang Liu (Meta)*; Yangyang Shi (Facebook); Yun Li (Meta); Kaustubh Kalgaonkar (Meta); Sriram
Srinivasan (Meta); Xin Lei (Meta)
2866: Zero-shot Sound Event Classification Using a Sound Attribute Vector with Global and Local
Feature Learning
Yi-Han Lin (Kobe University)*; Xunquan Chen (Kobe University); Ryoichi Takashima (Kobe University);
Tetsuya Takiguchi (Kobe University)
2881: DEEPSPACE: DYNAMIC SPATIAL AND SOURCE CUE BASED SOURCE SEPARATION FOR
DIALOG ENHANCEMENT
Aaron S Master (Dolby Laboratories, Inc)*; Lie Lu (Dolby Laboratories); Jonas Samuelsson (Dolby
Laboratories, Inc); Heidi-Maria Lehtonen (Dolby Sweden AB); Scott Norcross (Dolby Laboratories, Inc);
Nathan Swedlow (Dolby Laboratories, Inc); Audrey Howard (Dolby Laboratories, Inc)
2895: AST-SED: an Effective Sound Event Detection Method Based on Audio Spectrogram
Transformer
Kang Li (University of Science and Technology of China, National Engineering Research Center of
Speech and Language Information Processing); Yan Song (USTC)*; Lirong Dai (University of Science
and Technology of China); Ian McLoughlin (Singapore Institute of Technology); Xin Fang (iFlytek
Research); Lin Liu (iFlytek Research)
3007: Ripple Sparse Self-Attention For Monaural Speech Enhancement
Qiquan Zhang (The University of New South Wales); Hongxu Zhu (Department of Electrical and
Computer Engineering, National University of Singapore); Qi Song (Alibaba)*; Xinyua Qian (Department
of Electrical and Computer Engineering, National University of Singapore); Zhaoheng Ni (Meta AI);
Haizhou Li (The Chinese University of Hong Kong, Shenzhen)
3088: Improving Text-Audio Retrieval by Text-aware Attention Pooling and Prior Matrix Revised
Loss
Yifei Xin (Peking University)*; Dongchao Yang (Peking University); Yuexian Zou (Peking University)
3196: Optimal Transport in Diffusion Modeling for Conversion Tasks in Audio Domain
Vadim Popov (Huawei Noah's Ark Lab); Amantur Amatov (Huawei); Mikhail Kudinov (Huawei Noah's Ark
Lab); Vladimir Gogoryan (Huawei Noah's Ark Lab)*; Tasnima Sadekova (Huawei Noah's Ark Lab); Ivan
Vovk (Huawei Noah's Ark Lab)
3426: Blind source counting and separation with relative harmonic coefficients
Huiyuan Sun (The Australian National University)*; Prasanga Samarasinghe (Australian National
University); Thushara Abhayapala (The Australian National University)
3513: The R3VIVAL dataset: Repository of room responses and 360 videos of a variable acoustics
room
Florian Klein (TU Ilmenau); Sebastia V. Amengual Garí (Reality Labs Research, Meta)*
3549: SingNet: A Real-time Singing Voice Beat and Downbeat Tracking System
Mojtaba Heydari (University of Rochester)*; Ju-Chiang Wang (TikTok); Zhiyao Duan (University of
Rochester)
3595: Active Noise control over 3D space: A realistic error microphone geometry design
Huiyuan Sun (The Australian National University)*; Prasanga Samarasinghe (Australian National
University); Thushara Abhayapala (The Australian National University)
3765: A study of audio mixing methods for piano transcription in violin-piano ensembles
Hyemi Kim (KAIST / ETRI)*; Jiyun Park (KAIST); Taegyun Kwon (KAIST); Dasaem Jeong (Sogang
University); Juhan Nam (KAIST)
3819: Show Me the Instruments: Musical Instrument Retrieval from Mixture Audio
Kyungsu Kim (Seoul National University)*; Minju Park (Seoul National University); Haesun Joung (Seoul
National University); Yunkee Chae (Seoul National University); Yeongbeom Hong (Seoul National
University); Seonghyeon Go (Seoul National University); Kyogu Lee (Seoul National University)
3908: GENERAL OR SPECIFIC? INVESTIGATING EFFECTIVE PRIVACY PROTECTION IN
FEDERATED LEARNING FOR SPEECH EMOTION RECOGNITION
Chao Tan (Kyoto University)*; Yang Cao (Hokkaido University); Sheng Li (National Institute of Information
& Communications Technology (NICT)); Masatoshi Yoshikawa (Kyoto University)
4002: Time-weighted Frequency Domain Audio Representation with GMM Estimator for
Anomalous Sound Detection
Jian Guan (Harbin Engineering University)*; Youde Liu (Harbin Institute of Technology); Qiaoxi Zhu
(University of Technology Sydney); Tieran Zheng (Harbin Institute of Technology); Jiqing Han (Harbin Institute of Technology);
Wenwu Wang (University of Surrey)
4094: Training sound event detection with soft labels from crowdsourced annotations
Irene Martin (Tampere University)*; Manu Harju (Tampere University); Paul Ahokas (Tampere University);
Annamaria Mesaros (Tampere University)
4119: Attention Mixup: An Accurate Mixup Scheme based on Interpretable Attention Mechanism
for Multi-label Audio Classification
Wuyang Liu (School of Cyber Science and Engineering, Wuhan University)*; Yanzhen Ren (Computer
School of Wuhan University); Jingru Wang (School of Cyber Science and Engineering, Wuhan University)
4152: Frequency bin-wise single channel speech presence probability estimation using multiple
DNNs
Shuai Tao (Aalborg University)*; Himavanth Reddy Pundla (Aalborg University); Jesper Rindom Jensen
(Aalborg University); Mads G. Christensen (Audio Analysis Lab., AD:MT, Aalborg University, Denmark)
4205: Dereverberation in Acoustic Sensor Networks using Weighted Prediction Error with
Microphone-dependent Prediction Delays
Anselm Lohmann (University of Oldenburg)*; Toon van Waterschoot (Department of Electrical
Engineering (ESAT-STADIUS/ETC)); Joerg Bitzer (Institute of Hearing Technology and Audiology, Jade
University of Applied Sciences, Oldenburg); Simon Doclo (University of Oldenburg)
4214: SPEAKERAUGMENT: DATA AUGMENTATION FOR GENERALIZABLE SOURCE SEPARATION
VIA SPEAKER PARAMETER MANIPULATION
Kai Wang (Xinjiang University); Yuhang Yang (School of Information Science and Engineering, Xinjiang
University, China); Hao Huang (Xinjiang University)*; Ying Hu (Xinjiang University); Sheng Li (National
Institute of Information & Communications Technology (NICT))
4215: SW-WaveNet: Learning Representation from Spectrogram and Wavegram Using WaveNet for
Anomalous Sound Detection
Haihui Chen (Huazhong University of Science and Technology)*; Likai Ran (Huazhong University of
Science and Technology); Xixia Sun (Nanjing University of Posts and Telecommunications); Chao Cai
(Huazhong University of Science and Technology)
4303: Geometry-aware DoA Estimation using a Deep Neural Network with mixed-data input
features
Ulrik Kowalk (Institute of Hearing Technology and Audiology, Jade University of Applied Sciences,
Oldenburg)*; Simon Doclo (University of Oldenburg); Joerg Bitzer (Institute of Hearing Technology and
Audiology, Jade University of Applied Sciences, Oldenburg)
4516: Disentangling the Horowitz factor: Learning content and style from expressive piano
performance
Huan Zhang (Queen Mary University of London)*; Simon Dixon (Queen Mary University of London)
4540: Pre-training strategies using contrastive learning and playlist information for music
classification and similarity
Pablo Alonso-Jiménez (Universitat Pompeu Fabra)*; Xavier Favory (Utopia Music); Hadrien
Foroughmand (Utopia Music); Grigoris Bourdalas (Utopia Music); Xavier Serra (Universitat Pompeu
Fabra); Thomas Lidy (Utopia Music); Dmitry Bogdanov (Universitat Pompeu Fabra)
4575: A Frequency-weighted Leaky FxLMS Algorithm with Application to Feedback Active Noise
Control Systems
Yu Tang (Southwest Jiaotong University)*; Hongwei Zhang (Harbin Institute of Tech. Shenzhen)
4586: STUDY AND DESIGN OF ROBUST PERSONAL SOUND ZONES WITH VAST USING LOW
RANK RIRs
Sankha Subhra Bhattacharjee (Audio Analysis Lab, CREATE, Aalborg University)*; Liming Shi (CIE,
Chongqing University of Posts and Telecommunications); Guoli Ping (Acoustic Engineering Lab, Huawei
Technologies Co., Ltd); Xiaoxiang Shen (Acoustic Engineering Lab, Huawei Technologies Co., Ltd); Mads
G. Christensen (Audio Analysis Lab., AD:MT, Aalborg University, Denmark)
4589: Online Binaural Speech Separation of Moving Speakers With a Wavesplit Network
Cong Han (Columbia University)*; Nima Mesgarani (Columbia University)
4670: Distributed Adaptive Norm Estimation for Blind System Identification in Wireless Sensor
Networks
Matthias Blochberger (KU Leuven)*; Filip Elvander (Aalto University); Randall Ali (KU Leuven); Jan
Ostergaard (Aalborg University); Jesper Jensen (Aalborg University); Marc Moonen (KU Leuven); Toon
van Waterschoot (Department of Electrical Engineering (ESAT-STADIUS/ETC))
4677: On the relevance of the differences between HRTF measurement setups for machine
learning
Johan Pauwels (Queen Mary University of London)*; Lorenzo Picinali (Imperial College London)
4702: HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical
Networks for Low-Resource Headphones
N Shashaank (Columbia University)*; Berker Banar (Queen Mary University of London); Mohammad Izadi
(BOSE); Jeremy Kemmerer (BOSE); Shuo Zhang (Bose); Chuan-Che Huang (BOSE)
4790: HRTF Field: Unifying Measured HRTF Magnitude Representation with Neural Fields
You Zhang (University of Rochester)*; Yuxiang Wang (University of Rochester); Zhiyao Duan (University of
Rochester)
4848: Acoustic source localization in the spherical harmonics domain exploiting low-rank
approximations
Maximo Cobos (Universitat de Valencia)*; Mirco Pezzoli (Politecnico di Milano); Fabio Antonacci
(Politecnico di Milano); Augusto Sarti (Politecnico di Milano)
4900: SPICE+: Evaluation of Automatic Audio Captioning Systems with Pre-trained Language
Models
Felix Gontier (INRIA)*; Romain Serizel (Université de Lorraine); Christophe Cerisara (CNRS)
4903: Neural-AFC: Learning-Based Step-Size Control for Adaptive Feedback Cancellation with
Closed-loop Model Training
Behrad Soleimani (Starkey Hearing Technologies)*; Henning Schepker (Starkey Hearing Technologies);
Majid Mirbagheri (Starkey Hearing Technologies)
4930: Exploiting speaker embeddings for improved microphone clustering and speech separation
in ad-hoc microphone arrays
Stijn Kindt (UGent)*; Jenthe Thienpondt (IDLab, Ghent University); Nilesh Madhu (IDLab, Ghent
University - imec)
4976: LEARNING ENVIRONMENTAL STRUCTURE USING ACOUSTIC PROBES WITH A DEEP
NEURAL NETWORK
Toros Arikan (MIT)*; Amir Weiss (Massachusetts Institute of Technology); Hari Vishnu (NUS); Grant
Deane (UCSD); Andrew C Singer (University of Illinois); Gregory W Wornell (MIT)
5089: TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker
Separation
Zhong-Qiu Wang (Carnegie Mellon University)*; Samuele Cornell (Università Politecnica delle Marche);
Shukjae Choi (Hyundai Motor Company); Younglo Lee (42dot); Byeong-Yeol Kim (42dot); Shinji
Watanabe (Carnegie Mellon University)
5098: Does a quieter city mean fewer complaints? The Sounds of New York City During COVID-19
Lockdown
Mark Cartwright (New Jersey Institute of Technology)*; Magdalena Fuentes (New York University);
Charlie Mydlarz (New York University); Fabio Miranda (University of Illinois, USA); Juan P Bello (New
York University)
5153: ESTIMATING ACOUSTIC DIRECTION OF ARRIVAL USING A SINGLE STRUCTURAL SENSOR
ON A RESONANT SURFACE
Tre DiPassio (University of Rochester)*; Michael Heilemann (University of Rochester); Benjamin
Thompson (University of Rochester); Mark Bocko (University of Rochester)
5278: Spherical sector harmonics based soundfield radial extrapolation and robustness analysis
Hanwen Bi (ANU)*; Thushara Abhayapala (The Australian National University); Fei Ma (Australian National
University); Prasanga Samarasinghe (Australian National University)
5297: LOSS FUNCTION DESIGN FOR DNN-BASED SOUND EVENT LOCALIZATION AND
DETECTION ON LOW-RESOURCE REALISTIC DATA
Qing Wang (University of Science and Technology of China); Jun Du (University of Science and
Technology of China)*; Zhaoxu Nian (University of Science and Technology of China); Shutong Niu
(University of Science and Technology of China); Li Chai (University of Science and Technology of
China); Huaxin Wu (iFlytek Research); Jia Pan (University of Science and Technology of China); Chin-Hui
Lee (Georgia Institute of Technology)
5316: FretNet: Continuous-Valued Pitch Contour Streaming for Polyphonic Guitar Tablature
Transcription
Frank Cwitkowitz (University of Rochester)*; Toni Hirvonen (Yousician); Anssi Klapuri (Yousician)
5328: IMPROVING ACOUSTIC ECHO CANCELLATION BY MIXING SPEECH LOCAL AND GLOBAL
FEATURES WITH TRANSFORMER
Yajie Liu (School of Computer Science, Wuhan University)*; Xinmeng Xu (Wuhan University); Weiping Tu
(Wuhan University); Yuhong Yang (Wuhan University); Li Xiao (School of Computer Science, Wuhan
University)
5425: Progressive Multi-stage Neural Audio Codec with Psychoacoustic Loss and Discriminator
Byeong Hyeon Kim (Yonsei University)*; Hyungseob Lim (Yonsei University); Jihyun Lee (Yonsei
University); Inseon Jang (Electronics and Telecommunications Research Institution); Hong-Goo Kang
(Yonsei University)
5442: LEARNING TO DETECT NOVEL AND FINE-GRAINED ACOUSTIC SEQUENCES USING
PRETRAINED AUDIO REPRESENTATIONS
Vasudha Kowtha (Apple)*; Miquel Espi (Apple); Jonathan J Huang (Apple); Yichi Zhang (Apple); Carlos
Avendano (Apple)
5468: Improving Music Genre Classification from Multi-Modal Properties of Music and Genre
Correlations Perspective
Ganghui Ru (Fudan University); Xulong Zhang (Ping An Technology (Shenzhen) Co., Ltd.); Jianzong
Wang (Ping An Technology (Shenzhen) Co., Ltd)*; Ning Cheng (Ping An Technology (Shenzhen) Co.,
Ltd); Jing Xiao (Ping An Insurance (Group) Company of China)
5545: NAS-DYMC: NAS-based Dynamic Multi-Scale Convolutional Neural Network for Sound Event
Detection
Wang Jun (Kuaishou Technology)*; Peng Yao (Kuaishou Inc.); Feng Deng (Kuaishou); Jianchao Tan
(Kwai Inc.); Chengru Song (Kuaishou); Xiaorui Wang (Kwai)
5588: JOINT NOISE REDUCTION AND LISTENING ENHANCEMENT FOR FULL-END SPEECH
ENHANCEMENT
Haoyu Li (National Institute of Informatics)*; Yun Liu (National Institute of Informatics); Junichi Yamagishi
(National Institute of Informatics)
5610: Partially Adaptive Multichannel Joint Reduction of Ego-noise and Environmental Noise
Huajian Fang (Universität Hamburg)*; Niklas Wittmer (Universität Hamburg); Johannes Twiefel
(Universität Hamburg); Stefan Wermter (University of Hamburg); Timo Gerkmann (Universität Hamburg)
5672: Multi-dimensional frequency dynamic convolution with confident mean teacher for sound
event detection
Shengchang Xiao (UCAS)*; Xueshuai Zhang (UCAS); Pengyuan Zhang (Institute of Acoustics, Chinese
Academy of Sciences)
5676: The Potential of Neural Speech Synthesis-Based Data Augmentation for Personalized
Speech Enhancement
Anastasia Kuznetsova (Indiana University)*; Aswin Sivaraman (Indiana University Bloomington); Minje
Kim (Indiana University)
5709: Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for
Speech Restoration
Jean-Marie Lemercier (Universität Hamburg)*; Julius Richter (Universität Hamburg); Simon Welker
(Universität Hamburg); Timo Gerkmann (Universität Hamburg)
5724: Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture
Models
Huajian Fang (Universität Hamburg)*; Timo Gerkmann (Universität Hamburg)
5734: Performance above all? Energy consumption vs. performance, a study on sound event
detection with heterogeneous data
Romain Serizel (Université de Lorraine)*; Samuele Cornell (Università Politecnica delle Marche); Nicolas
Turpault (Inria)
5759: Convolutive NTF for Ambisonic Source Separation Under Reverberant Conditions
Mateusz Guzik (AGH University of Science and Technology)*; Konrad Kowalczyk (AGH University of
Science and Technology)
5842: AUDIO SIGNAL ENHANCEMENT WITH LEARNING FROM POSITIVE AND UNLABELLED
DATA
Nobutaka Ito (UTokyo)*; Masashi Sugiyama (RIKEN/The University of Tokyo)
5884: NOTE AND PLAYING TECHNIQUE TRANSCRIPTION OF ELECTRIC GUITAR SOLOS IN REAL-
WORLD MUSIC PERFORMANCE
TungSheng Huang (Georgia Institute of Technology)*; Ping-Chung Yu (National Tsing Hua University); Li
Su (Academia Sinica)
5924: Immersive enhancement and removal of loudspeaker sound using wireless assistive
listening systems and binaural hearing devices
Ryan M Corey (University of Illinois Chicago)*; Andrew C Singer (University of Illinois)
6101: Wireless Deep Speech Semantic Transmission
Zixuan Xiao (Beijing University of Posts and Telecommunications); Shengshi Yao (Beijing University of
Posts and Telecommunications); Jincheng Dai (Beijing University of Posts and Telecommunications)*;
Sixian Wang (Beijing University of Posts and Telecommunications); Kai Niu (Beijing University of Posts
and Telecommunications); Ping Zhang (Beijing University of Posts and Telecommunications)
6119: Incorporating lip features into audio-visual multi-speaker DOA estimation by gated fusion
Ya Jiang (University of Science and Technology of China)*; Hang Chen (USTC); Jun Du (University of
Science and Technology of China); Qing Wang (University of Science and Technology of China); Chin-Hui
Lee (Georgia Institute of Technology)
6135: Lightweight Annotation and Class Weight Training for Automatic Estimation of Alarm
Audibility in Noise
François Effa (INRS)*; Romain Serizel (Université de Lorraine); Jean-Pierre Arz (INRS); Nicolas Grimault
(Université Lyon 1)
6341: Effectiveness of Inter- and Intra-Subarray Spatial Features for Acoustic Scene Classification
Takao Kawamura (Tokyo Metropolitan University)*; Yuma Kinoshita (Tokai University); Nobutaka Ono
(Tokyo Metropolitan University); Robin Scheibler (LINE Corporation)
6346: Piecewise position encoding in convolutional neural network for cough-based COVID-19
detection
Jiakun Shen (Institute of Acoustics, Chinese Academy of Sciences)*; XueShuai Zhang (University of
Chinese Academy of Sciences); Pengyuan Zhang (Institute of Acoustics, Chinese Academy of Sciences);
Yonghong Yan (Institute of Acoustics, Chinese Academy of Sciences); Shaoxing Zhang (Peking
University Third Hospital); Zhihua Huang (Xinjiang University); Yanfen Tang (Beijing Ditan Hospital Capital
Medical University); Yu Wang (Beijing Ditan Hospital Capital Medical University); Fujie Zhang (Beijing
Ditan Hospital Capital Medical University); Aijun Sun (Dalian Public Health Clinical Center)
6388: Aiding speech harmonic recovery in DNN-based single channel noise reduction using
cepstral excitation manipulation (CEM) components
Yanjue Song (Ghent University - imec)*; Nilesh Madhu (IDLab, Ghent University - imec)
6537: A Contrastive Embedding-based Domain Adaptation method for Lung Sound Recognition in
Children Community-Acquired Pneumonia
Dongmin Huang (Southern University of Science and Technology); Lingwei Wang (Shenzhen People's
Hospital); Hongzhou Lu (Department of Infectious Diseases, Shanghai Public Health Clinical Center,
Fudan University, Shanghai, China); Wenjin Wang (Southern University of Science and Technology)*
Biomedical Imaging and Signal Processing
133: Tensor-Based Complex-valued Graph Neural Network for Dynamic Coupling Multimodal Brain
Networks
Yanwu Yang (HIT at shenzhen)*; Guoqing Cai (Harbin Institute of Technology, Shenzhen); Chenfei Ye
(Harbin Institute of Technology at Shenzhen); Yang Xiang (Peng Cheng Laboratory); Ting Ma (Harbin
Institute of Technology, Shenzhen)
139: A new Semi-supervised classification method using a supervised autoencoder for biomedical
applications
Cyprien Gille (UMONS); Frederic Guyard (Orange Labs); Michel Barlaud (University of Nice)*
201: Towards simultaneous segmentation of liver tumors and intrahepatic vessels via cross-
attention mechanism
Haopeng Kuang (Fudan University); Dingkang Yang (Fudan University); Shunli Wang (Fudan University);
Xiaoying Wang (Zhongshan Hospital, Fudan University); Lihua Zhang (Fudan University)*
280: OCT image blind despeckling based on gradient guided filter with speckle statistical prior
Sanqian Li (Southern University of Science and Technology); Muxing Xiong (Southern University of
Science and Technology); Bing Yang (Southern University of Science and Technology); Xiaoqing Zhang
(Southern University of Science and Technology); Risa Higashita (Tomey Corporation)*; Jiang Liu
(Southern University of Science and Technology)
370: ViTASD: Robust Vision Transformer Baselines for Autism Spectrum Disorder Facial
Diagnosis
Xu Cao (NYU)*; Wenqian Ye (NYU); Elena Sizikova (FDA); Xue Bai (Shenzhen Children's Hospital);
Megan Coffee (NYU); Hongwu Zeng (Shenzhen Children's Hospital); Jianguo Cao (Shenzhen Children's
Hospital)
424: Cardiac Disease Diagnosis on Imbalanced Electrocardiography Data Through Optimal
Transport Augmentation
Jielin Qiu (Carnegie Mellon University)*; Jiacheng Zhu (Carnegie Mellon University); Mengdi Xu
(Carnegie Mellon University); Peide Huang (Carnegie Mellon University); Michael Rosenberg (University
of Colorado Denver - Anschutz Medical Campus); Douglas J Weber (Carnegie Mellon University);
Emerson Liu (Allegheny General Hospital); Ding Zhao (Carnegie Mellon University)
538: IDEAL: Improved DEnse LocAL Contrastive Learning for Semi-Supervised Medical Image
Segmentation
Hritam Basak (Stony Brook University)*; Soumitri Chattopadhyay (Jadavpur University); Rohit Kundu
(University of California, Riverside); Sayan Nag (University of Toronto); Rammohan Mallipeddi
(Kyungpook National University)
645: Exploiting Interactivity and Heterogeneity for Sleep Stage Classification via Heterogeneous
Graph Neural Network
Ziyu Jia (Beijing Jiaotong University); Youfang Lin (Beijing Jiaotong University); Yuhan Zhou (Beijing
Jiaotong University); Xiyang Cai (University of California, Los Angeles); Peng Zheng (Beijing Jiaotong
University); Qiang Li (RWTH Aachen University); Jing Wang (Beijing Jiaotong University)*
888: Wavelet2Vec: A Filter Bank Masked Autoencoder for EEG-based Seizure Subtype
Classification
Ruimin Peng (Huazhong University of Science and Technology); Changming Zhao (Huazhong University
of Science and Technology); Yifan Xu (Huazhong University of Science and Technology); Jun Jiang
(Wuhan Children's Hospital); Guangtao Kuang (Wuhan Children's Hospital); Jianbo Shao (Wuhan
Children's Hospital); Dongrui Wu (Huazhong University of Science and Technology)*
1049: EEG2IMAGE: Image Reconstruction from EEG Brain Signals
Prajwal Singh (Indian Institute of Technology Gandhinagar, Gujarat, India)*; Pankaj Pandey (Indian
Institute of Technology Gandhinagar); Krishna P Miyapuram (Indian Institute of
Technology, Gandhinagar, India); Shanmuganathan Raman (Indian Institute of Technology (IIT)
Gandhinagar)
1164: SCSGNet: Spatial-Correlated and Shape-Guided Network for Breast Mass Segmentation
Qingqiu Li (Fudan University)*; Jilan Xu (Fudan University); Runtian Yuan (Fudan University);
Yuejie Zhang (Fudan University); Rui Feng (Fudan University)
1539: A Mathematical Model for Neuronal Activity and Brain Information Processing Capacity
Yu Zheng (Michigan State University); David Zhu (Michigan State University); Jian Ren (Michigan State
University); Taosheng Liu (Michigan State University); Karl Friston (University College London); Tongtong
Li (Michigan State University)*
1667: Unbiased unsupervised stimulus reconstruction for EEG-based auditory attention decoding
Nicolas Heintz (KU Leuven)*; Simon Geirnaert (KU Leuven); Tom Francart (KU Leuven); Alexander
Bertrand (KU Leuven)
1771: This changes to that: Combining causal and non-causal explanations to generate disease
progression in capsule endoscopy
Anuja Vats (NTNU)*; Ahmed Mohammed (NTNU); Marius Pedersen (NTNU); Nirmalie Wiratunga (Robert
Gordon University)
1865: Domain Generalized Fundus Image Segmentation via Dual-Level Mixing
Xin Luo (College of Computer, National University of Defense Technology)*; Wei Chen (College of
Computer, National University of Defense Technology); Chen Li (National University of Defense
Technology); Bin Zhou (National University of Defense Technology); Yusong Tan (College of Computer,
National University of Defense Technology)
2029: Real-time Wireless ECG-derived Respiration Rate Estimation Using an Autoencoder with a
DCT Layer
Hongyi Pan (University of Illinois Chicago)*; Xin Zhu (UIC); Zhilu Ye (University of Illinois Chicago); Pai-
Yen Chen (University of Illinois Chicago); Ahmet E Cetin (University of Illinois at Chicago)
2097: Prototype Knowledge Distillation for Medical Segmentation with Missing Modality
Shuai Wang (Tsinghua University)*; Zipei Yan (The Hong Kong Polytechnic University); Daoan Zhang
(Southern University of Science and Technology); Haining Wei (Tsinghua University); Zhongsen Li
(Tsinghua University); Rui Li (Tsinghua University)
2116: HIERARCHICAL FILTERING WITH ONLINE LEARNED PRIORS FOR ECG DENOISING
Timur Locher (ETH Zurich); Guy Revach (ETH Zürich)*; Nir Shlezinger (Ben-Gurion University); Ruud J.
G. van Sloun (Technical University of Eindhoven); Rik Vullings (Technical University of Eindhoven)
2159: Assessing the Robustness of Deep Learning-Assisted Pathological Image Analysis under
Practical Variables of Imaging System
YUXUAN SUN (Westlake University)*; Chenglu Zhu (Westlake University); Yunlong Zhang (Westlake
University); Honglin Li (Westlake University); Pingyi Chen (Westlake University); Lin Yang (Westlake
University)
2213: BrainNetFormer: Decoding Brain Cognitive States With Spatial-Temporal Cross Attention
Leheng Sheng (Tsinghua University); Wenhan Wang (Southeast University); Zhiyi Shi (Carnegie Mellon
University); Jichao Zhan (Southeast University); Youyong Kong (Southeast University)*
2378: A Novel Heart Rate Estimation Method Exploiting Heartbeat Second Harmonic
Reconstruction via Millimeter Wave Radar
Tao Li (China University of Mining and Technology)*; Huayu Shou (China University of Mining and
Technology); Yuchen Deng (China University of Mining and Technology); Yu Zhou (China University of
Mining and Technology); Chenqi Shi (China University of Mining and Technology); Pengpeng Chen
(China University of Mining and Technology)
2466: ECG Artifact Removal from Single-Channel Surface EMG Using Fully Convolutional
Networks
Kuan-Chen Wang (National Taiwan University); Kai-Chun Liu (Academia Sinica); Sheng-Yu Peng
(National Taiwan University of Science and Technology); Yu Tsao (Academia Sinica)*
2673: A New Approach to Extract Fetal Electrocardiogram Using Affine Combination of Adaptive
Filters
Yu Xuan (University of California San Diego); Xiangyu Zhang (Johns Hopkins University)*; Shuyue Stella
Li (Johns Hopkins University); Zihan Shen (University of Chinese Academy of Sciences); Xin Xie
(University of California, San Diego); Paola Garcia (Johns Hopkins University); Roberto Togneri (The
University of Western Australia)
2854: LSSED: A robust segmentation network for inflamed appendix from CT images
Wing W.Y. Ng (South China University of Technology); Peixin Zheng (South China University of
Technology)*; Ting Wang (South China University of Technology); Jianjun Zhang (South China University
of Technology); Hui Zhou (The Sixth Affiliated Hospital of Guangzhou Medical University, Qingyuan
People’s Hospital); GuangMing Li (The Sixth Affiliated Hospital of Guangzhou Medical University,
Qingyuan People’s Hospital); Dan Liang (Guangzhou First People’s Hospital/The Second Affiliated
Hospital, South China University of Technology); Yinhao Liang (South China University of Technology);
Xinhua Wei (Department of Radiology, Guangzhou First People's Hospital, South China University of
Technology)
2926: Decoding musical pitch from human brain activity with automatic voxel-wise whole-brain
fMRI feature selection
Vincent K.M. Cheung (Sony Computer Science Laboratories, Inc.)*; Yueh-Po Peng (Institute of
Information Science, Academia Sinica); Jing-Hua Lin (Academia Sinica); Li Su (Academia Sinica)
2964: LightVessel: Exploring Lightweight Coronary Artery Vessel Segmentation via Similarity
Knowledge Distillation
Hao Dang (Henan University of Chinese Medicine)*; Yuekai Zhang (Beijing University of Posts and
Telecommunications); Xingqun Qi (University of Technology Sydney); Wanting Zhou (Beijing University of
Posts and Telecommunications); Muyi Sun (CRIPAC, Institute of Automation, Chinese Academy of
Sciences)
2984: BLOOD OXYGEN SATURATION ESTIMATION FROM FACIAL VIDEO VIA DC AND AC
COMPONENTS OF SPATIO-TEMPORAL MAP
Yusuke Akamatsu (NEC Corporation)*; Yoshifumi Onishi (NEC Corporation); Hitoshi Imaoka (NEC
Corporation)
3058: BIMODAL FUSION NETWORK FOR BASIC TASTE SENSATION RECOGNITION FROM
ELECTROENCEPHALOGRAPHY AND ELECTROMYOGRAPHY
Han Gao (Zhejiang University)*; Shuo Zhao (Zhejiang university); Huiyan Li (Zhejiang University); Li Liu
(Zhejiang University); You Wang (Zhejiang University); Ruifen Hu (Zhejiang University); Jin Zhang (Hunan
Normal University); Guang Li (Zhejiang University)
3124: Interpretable Nonnegative Incoherent Deep Dictionary Learning for fMRI data analysis
Manuel Morante (AAU)*; Jan Ostergaard (Aalborg University); Sergios Theodoridis (Aalborg University)
3205: SS-ADMM: STATIONARY AND SPARSE GRANGER CAUSAL DISCOVERY FOR CORTICO-
MUSCULAR COUPLING
Farwa Abbas (Imperial College London)*; Verity McClelland (King's College London); Zoran Cvetkovic
(King's College London); Wei Dai (Imperial College London)
3219: Time-Resolved fMRI Shared Response Model Using Gaussian Process Factor Analysis
MohammadReza Ebrahimi (University of Toronto)*; Navona Calarco (University of Toronto); Colin Hawco
(Centre for Addiction and Mental Health); Aristotle Voineskos (CAMH); Ashish Khisti (University of
Toronto)
3342: A non-contact SpO2 estimation using video magnification and infrared data
Thomas Stogiannopoulos (DUTH Dept. of Electrical Engineering); Grigorios-Aris Cheimariotis (DUTH
Dept. of Electrical Engineering); Nikolaos Mitianoudis (DUTH Dept. of Electrical Engineering)*
3424: MTDL-Net: Morphological and Temporal Discriminative Learning for Heartbeat Classification
Can Han (Shanghai Jiao Tong University); Suncheng Xiang (Shanghai Jiao Tong University)*; Dahong
Qian (Shanghai Jiao Tong University)
3445: ADHD Classification with biomarker identification using a triplet loss attention auto-
encoding network
Yibin Tang (Hohai University); Ying Chen (Changzhou University); Yuan Gao (Hohai University); Aimin
Jiang (Hohai University); Lin Zhou (Southeast University)*
3469: UNeXt: a Low-Dose CT denoising UNet model with the modified ConvNeXt block
Farzan Niknejad Mazandarani (Toronto Metropolitan University)*; Paul Babyn (Physician Executive,
Saskatchewan Health Authority, Saskatoon, S7K 0M7, Canada); Javad Alirezaie (Toronto Metropolitan
University, Dept of Electrical Eng.)
3757: Heart Rate Estimation and Performance Analysis using MIMO Radar with Dispersed
Antennas
PeiChao Wang (University of Electronic Science and Technology of China); Qian He (University of
Electronic Science and Technology of China)*
3910: Brain network features differentiate intentions from different emotional expressions of the
same text
Zhongjie Li (Tianjin University)*; Bin Zhao (Japan Advanced Institute of Science and Technology);
Gaoyan Zhang (Tianjin University); Jianwu Dang (Tianjin University)
4009: Pseudo Multi-Source Domain Extension and Selective Pseudo-labeling for Unsupervised
Domain Adaptive Medical Image Segmentation
Xiaokang Liu (Xiangtan University); Zhiqiang Wang (Xiangnan University); Kai Hu (Xiangtan University)*;
Xieping Gao (Hunan Normal University)
4279: Learning from single-expert annotated labels for automatic sleep staging
Zhiheng Luan (School of Cyber Science and Engineering, Wuhan University)*; Yanzhen Ren (Computer
School of Wuhan University); Li Peng (Wuhan University); Xiong Chen (Sleep Medicine Centre,
Zhongnan Hospital of Wuhan University); Xiuping Yang (Sleep Medicine Centre, Zhongnan Hospital of
Wuhan University); Weiping Tu (Wuhan University); Yuhong Yang (Wuhan University)
4393: MPS-AMS: Masked Patches Selection and Adaptive Masking Strategy Based Self-
Supervised Medical Image Segmentation
Xiangtao Wang (Hebei University of Technology); Ruizhi Wang (Hebei University of Technology); Tian
Biao (Hebei University of Technology); Jiaojiao Zhang (Hebei University of Technology); Shuo Zhang
(Hebei University of Technology); Junyang Chen (Shenzhen University); Thomas Lukasiewicz (University
of Oxford); Zhenghua Xu (Hebei University of Technology)*
4430: Exploiting Multi-Decision and Deep Refinement for Ultrasound Image Segmentation
Wenjing Liu (Xiangtan University); Xuanya Li (Baidu); Kai Hu (Xiangtan University)*; Xieping Gao (Hunan
Normal University)
4600: RETINAL BIOMARKERS FOR DETECTING DIABETIC RETINOPATHY USING SMARTPHONE-
BASED DEEP LEARNING FRAMEWORKS
Mahmut Karakaya (Kennesaw State University)*; Ramazan Aygun (Kennesaw State University)
4663: Smart Split-Federated Learning Over Noisy Channels for Embryo Image Segmentation
Zahra Hafezi Kafshgari (Simon Fraser University); Ivan Bajic (Simon Fraser University)*; Parvaneh
Saeedi (Simon Fraser University)
4927: BreathIE: Estimating Breathing Inhale Exhale Ratio Using Motion Sensor Data from
Consumer Earbuds
Nafiul Rashid (Samsung Research America)*; Md Mahbubur Rahman (Samsung Research America);
Tousif Ahmed (Samsung Research America, Inc.); Jilong Kuang (Samsung Research America); Jun Alex
Gao (Samsung Research America)
4982: Glacier: Glass-box Transformer for Interpretable Dynamic Neuroimaging
Usman Mahmood (Georgia State University)*; Zening Fu (Georgia State University); Vince Calhoun
(TReNDS); Sergey Plis (Georgia State University)
5007: Light-weighted CNN-Attention based architecture for Hand Gesture Recognition via
ElectroMyography
Soheil Zabihi (Concordia University); Elahe Rahimian (Concordia University); Amir Asif (York University);
Arash Mohammadi (Concordia University)*
5105: Active selection of source patients in transfer learning for epileptic seizure detection using
Riemannian Manifold
Toshiki Orihara (Tokyo University of Agriculture and Technology); Kazi Mahmudul Hassan (Tokyo
University of Agriculture and Technology)*; Toshihisa Tanaka (Tokyo University of Agriculture and
Technology)
5223: Multimodal microscopy image alignment using spatial and shape information and a branch-
and-bound algorithm
Shuonan Chen (Columbia University)*; Bovey Y Rao (Columbia University); Stephanie Herrlinger
(Columbia University); Attila Losonczy (Columbia University); Liam Paninski (Department of Statistics,
Columbia University); Erdem Varol (Columbia University)
5234: A New Personalized Efficacy Atlas for Pallidal Deep Brain Stimulation
Xiongbiao Luo (Xiamen University)*
5360: Representation Learning of Clinical Multivariate Time Series with Random Filter Banks
Alireza Keshavarzian (University of Toronto)*; Hojjat Salehinejad (Mayo Clinic); Shahrokh Valaee
(University of Toronto)
5648: Deep Triple-Supervision Learning Unannotated Surgical Endoscopic Video Data for
Monocular Dense Depth Estimation
Wenkang Fan (Xiamen University)*; KaiYun Zhang (Xiamen University); Hong Shi (Fujian Cancer
Hospital & Fujian Medical University Cancer Hospital); Jianhua Chen (Fujian Cancer Hospital & Fujian
Medical University Cancer Hospital); Yinran Chen (Xiamen University); Xiongbiao Luo (Xiamen
University)
5652: Local-Global Progressive U-Transformers for Accurate Hepatic and Portal Veins
Segmentation in Abdominal MR Images
Yu Wu (Xiamen University)*; Dongfang Shen (Xiamen University); Jiabao Jin (Xiamen University);
Guanping Xu (Xiamen University); Yinran Chen (Xiamen University); Xiongbiao Luo (Xiamen University)
5655: DB-UNet: MLP Based Dual Branch UNet for Accurate Vessel Segmentation in OCTA Images
Chengliang Wang (Chongqing University)*; Haojian Ning (Chongqing University); Xinrun Chen
(Chongqing University); Shiying Li (Xiamen University)
5722: IMPROVING HEART RATE AND HEART RATE VARIABILITY ESTIMATION FROM VIDEO
THROUGH A HR-RR-TUNED FILTER
Michael Chan (Georgia Institute of Technology)*; Li Zhu (Samsung Research America); Korosh
Vatanparvar (Samsung Research America); Hewon Jung (Georgia Institute of Technology); Jilong Kuang
(Samsung Research America); Alex Gao (Samsung Research America)
5926: CROSS-SUBJECT MENTAL FATIGUE DETECTION BASED ON SEPARABLE SPATIO-
TEMPORAL FEATURE AGGREGATION
Yalan Ye (University of Electronic Science and Technology of China)*; Yutuo He (University of Electronic
Science and Technology of China); Wanjing Huang (University of Electronic Science and Technology of
China); Qiaosen Dong (Sichuan University); Chong Wang (University of Electronic Science and
Technology of China); Guoqing Wang (University of Electronic Science and Technology of China)
6034: Spatio-Temporal Hybrid Fusion of CAE and SWIn Transformers for Lung Cancer Malignancy
Prediction
Sadaf Khademi (Concordia University); Shahin Heidarian (Concordia University); Parnian Afshar
(Concordia University); Farnoosh Naderkhani (Concordia University); Anastasia Oikonomou (University of
Toronto); Konstantinos N Plataniotis (UofT); Arash Mohammadi (Concordia University)*
6097: Hankel Structured Low Rank and Sparse Representation via L0-Norm Optimization for
Compressed Ultrasound Plane Wave Signal Reconstruction
Miaomiao Zhang (Capital Normal University); Ji Chen (Capital Normal University); Xiaoyan Fu (Capital
Normal University); Xin Ge (Beijing Jiaotong University); Jingzhi Zhang (Capital Normal University); Na
Jiang (Information Engineering College, Capital Normal University)*; Jan D'Hooge (KU Leuven)
6193: Synthesizing Speech from ECoG with a Combination of Transformer-based Encoder and
Neural Vocoder
Kai Shigemi (Tokyo University of Agriculture and Technology); Shuji Komeiji (Tokyo University of
Agriculture and Technology)*; Takumi Mitsuhashi (Juntendo University School of Medicine); Yasushi
Iimura (Juntendo University School of Medicine); Hiroharu Suzuki (Juntendo University School of
Medicine); Hidenori Sugano (Juntendo University School of Medicine); Koichi SHINODA (Tokyo Institute
of Technology); Kohei Yatabe (Tokyo University of Agriculture and Technology); Toshihisa Tanaka (Tokyo
University of Agriculture and Technology)
6227: Graph based semantic ensemble of Riemannian Neural Structured Learning for BCI-EEG
signal classification
KURUSETTI VINAY GUPTA (IIT KANPUR)*; Prof Laxmidhar Behera (IIT Kanpur); Tushar Sandhan
(Indian Institute of Technology Kanpur)
6265: MLCGAN: MULTI-LEAD ECG SYNTHESIS WITH MULTI LABEL CONDITIONAL GENERATIVE
ADVERSARIAL NETWORK
Jian Wu (East China Normal University); Liping Wang (ECNU)*; Hailin Pan (East China Normal
University); Binyu Wang (East China Normal University)
6402: New Interpretable Patterns and Discriminative Features from Brain Functional Network
Connectivity Using Dictionary Learning
Fateme Ghayem (UMBC)*; Hanlu Yang (University of Maryland, Baltimore County); Furkan Kantar
(UMBC); Seung-Jun Kim (University of Maryland, Baltimore County); Vince Calhoun (TReNDS); Tulay
Adali (University of Maryland, Baltimore County)
6504: MvCo-DoT: Multi-View Contrastive Domain Transfer Network for Medical Report Generation
Ruizhi Wang (Hebei University of Technology); Xiangtao Wang (Hebei University of Technology);
Zhenghua Xu (Hebei University of Technology)*; Wenting Xu (Hebei University of Technology); Junyang
Chen (Shenzhen University); Thomas Lukasiewicz (University of Oxford)
6506: MOTOR ACTIVITY RECOGNITION USING EEG DATA AND ENSEMBLE OF STACKED BLSTM-
LSTM NETWORK AND TRANSFORMER MODEL
Pallavi Kaushik (Indian Institute of Technology Roorkee)*; Ilina Tripathi (Thapar Institute of Engineering);
Dr. Partha Pratim Roy (IIT Roorkee)
Computational Imaging
1042: Unrolled Fourier Disparity Layer optimization for scene reconstruction from few-shots focal
stacks
Brandon Le Bon (Centre INRIA de l'Université de Rennes)*; Mikaël Le Pendu (InterDigital, Rennes);
Christine Guillemot (INRIA)
1287: CTTSR: A Hybrid CNN-Transformer Network for Scene Text Image Super-Resolution
Kaiwei Dai (Central South University); Nan Kang (Central South University); Li Kuang (Central South
University)*
1341: Long Range Imaging Using Multispectral Fusion of RGB and NIR Images
Hao Zhang (Xidian University); Lin Mei (Xidian University); Cheolkon Jung (Xidian University)*
1541: Attention Based Relation Network for Facial Action Units Recognition
Yao Wei (South China University of Technology); Haoxiang Wang (South China University of
Technology)*; Mingze Sun (South China University of Technology); Liu Jiawang (SCUT)
1641: A Targeted Sampling Strategy for Compressive Cryo Focused Ion Beam Scanning Electron
Microscopy
Daniel Nicholls (University of Liverpool)*; Jack Wells (University of Liverpool); Alex W Robinson
(University of Liverpool); Amirafshar Moshtaghpour (Rosalind Franklin Institute); Maryna Kobylynska
(King's College London); Roland Fleck (King's College London); Angus Kirkland (University of Oxford);
Nigel Browning (University of Liverpool)
1898: DEEP LOW LIGHT IMAGE ENHANCEMENT VIA MULTI-SCALE RECURSIVE FEATURE
ENHANCEMENT AND CURVE ADJUSTMENT
Haiyan Jin (Xi'an University of Technology); Dawei Wei (Xi'an University of Technology); Haonan Su
(Xi'an University of Technology)*
1996: Super-Resolution for Macro X-ray Fluorescence Data Collected from Old Master Paintings
Su Yan (Imperial College London)*; Herman Jadan (Imperial College London); Jun-Jie Huang (National
University of Defense Technology); Nathan S Daly (The Fitzwilliam Museum); Catherine Higgitt (The
National Gallery); Pier Luigi Dragotti (Imperial College London)
3323: Deep Adaptive Superpixels for Hadamard Single Pixel Imaging in Near-Infrared Spectrum
Brayan Monroy (Universidad Industrial de Santander)*; Jorge Bacca (Universidad Industrial de
Santander); Henry Arguello (Universidad Industrial Santander)
4292: Fast Multiscale 3D Reconstruction Using Single-Photon LiDaR Data
Sandor Plosz (Heriot-Watt University)*; Istvan Gyongy (University of Edinburgh); Jonathan Leach (Heriot-
Watt University); Stephen McLaughlin (School of Engineering, Heriot-Watt University); Gerald S. Buller
(Heriot-Watt University); Abderrahim Halimi (Heriot-Watt university)
5629: Alternating Phase Langevin Sampling with Implicit Denoiser Priors for Phase Retrieval
Rohun Agrawal (California Institute of Technology)*; Oscar Leong (California Institute of Technology)
Image, Video, and Multidimensional Signal Processing
142: HTNET: HUMAN TOPOLOGY AWARE NETWORK FOR 3D HUMAN POSE ESTIMATION
Jialun Cai (Peking university)*; Hong Liu (Peking University Shenzhen Graduate School); Runwei Ding
(Peking University Shenzhen Graduate School); Wenhao Li (Peking University); Jianbing Wu (Peking
University); Miaoju Ban (Peking University)
212: M2TSR: Multi-range and Mix-grained Transformer for Single Image Super-Resolution
Zhong-Han Niu (State Key Laboratory for Novel Software Technology, Nanjing University); Qinglong
Zhang (State Key Laboratory for Novel Software Technology, Nanjing University); Yi Fan (State Key
Laboratory for Novel Software Technology, Nanjing University); Yu-Bin Yang (State Key Laboratory for
Novel Software Technology, Nanjing University)*
217: ENHANCED GM-PHD FILTER FOR REAL TIME SATELLITE MULTI-TARGET TRACKING
Camilo G Aguilar (Inria)*; Mathias Ortner (Airbus); Josiane Zerubia (n/a)
273: DEHRFormer: Real-time Transformer for Depth Estimation and Haze Removal from
Varicolored Haze Scenes
Sixiang Chen (Jimei University)*; Tian Ye (Jimei University); Shi Jun (XinJiang University); Yun Liu
(Southwest University); JingXia Jiang (Jimei University); Erkang Chen (Jimei University); Peng Chen
(Jimei University)
295: A discriminative multi-channel noise feature representation method for image manipulation
localization
Yang Zhou (Sichuan University); Hongxia Wang (Sichuan University)*; Qiang Zeng (Sichuan University);
Rui Zhang (Sichuan University); Sijiang Meng (Sichuan University)
296: Group-wise Co-salient Object Detection with Siamese Transformers via Brownian Distance
Covariance Matching
Yang Wu (NUIST); Hao Zhang (NUIST); Lingyan Liang (Inspur); Yaqian Zhao (Inspur); Kaihua Zhang (Inspur,
NUIST)*
301: INTERWEAVED GRAPH AND ATTENTION NETWORK FOR 3D HUMAN POSE ESTIMATION
Ti Wang (Peking University Shenzhen Graduate School); Hong Liu (Peking University Shenzhen
Graduate School); Runwei Ding (Peking University Shenzhen Graduate School)*; Wenhao Li (Peking
University); Yingxuan You (Peking University); Xia Li (ETH Zurich)
302: OAFormer: Learning Occlusion Distinguishable Feature for Amodal Instance Segmentation
Zhixuan Li (Peking University); Ruohua Shi (Peking University); Tiejun Huang (Peking University);
Tingting Jiang (Peking University)*
320: LongShortNet: Exploring Temporal and Semantic Features Fusion in Streaming Perception
Chenyang Li (DAMO Academy, Alibaba Group); Zhi-Qi Cheng (Carnegie Mellon University); Jun-Yan He
(DAMO Academy, Alibaba Group); Pengyu Li (Alibaba Group); Bin Luo (DAMO Academy, Alibaba
Group)*; Hanyuan Chen (Alibaba); Yifeng Geng (Alibaba Group); Jin-Peng Lan (DAMO Academy, Alibaba
Group); Xuansong Xie (DAMO Academy, Alibaba Group)
335: Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence
Localization in Videos
Daizong Liu (Peking University)*; Pan Zhou (Huazhong University of Science and Technology)
337: Enhancing Multimodal Alignment with Momentum Augmentation for Dense Video Captioning
Yiwei Wei (Tianjin University)*; Shaozu Yuan (JD AI); Meng Chen (JD AI); Longbiao Wang (Tianjin
University)
347: Flow-Guided Deformable Alignment Network with Self-Supervision for Video Inpainting
Zhiliang Wu (Nanjing University of Science and Technology)*; Kang Zhang (Nanjing University of Science
and Technology); Changchang Sun (Illinois Institute of Technology); Hanyu Xuan (Anhui University); Yan
Yan (Illinois Institute of Technology)
362: A Flow-Guided Non-Local Alignment Network for Video Compressive Sensing Reconstruction
Chao Zhou (Nanjing University of Posts and Telecommunications)*; Can Chen (Nanjing University of
Posts and Telecommunications); Dengyin Zhang (School of Internet of Things Nanjing University of Posts
and Telecommunications Nanjing, China)
375: Tracking Objects and Activities with Attention for Temporal Sentence Grounding
Zeyu Xiong (Huazhong University of Science and Technology)*; Daizong Liu (Peking University); Pan
Zhou (Huazhong University of Science and Technology); Jiahao Zhu (Huazhong University of Science
and Technology)
377: LEARNING SCENE FLOW FROM 3D POINT CLOUDS WITH CROSS-TRANSFORMER AND
GLOBAL MOTION CUES
Mingliang Zhai (Nanjing University of Posts and Telecommunications)*; Kang Ni (Nanjing University of
Posts and Telecommunications); Jiucheng Xie (Nanjing University of Posts and Telecommunications);
Hao Gao (Nanjing University of Posts and Telecommunications)
382: MovieNet-PS: A Large-Scale Person Search Dataset in the Wild
Jie Qin (Nanjing University of Aeronautics and Astronautics)*; Peng Zheng (NUAA, MBZUAI, Aalto
University); Yichao Yan (Shanghai Jiao Tong University); Rong Quan (Nanjing University of Aeronautics
and Astronautics); Xiaogang CHENG (Nanjing University of Posts and Telecommunications); Bingbing Ni
(Shanghai Jiao Tong University)
406: WUDA: Unsupervised Domain Adaptation Based on Weak Source Domain Labels
Shengjie Liu (Beijing University of Posts and Telecommunications)*; Chuang Zhu (Beijing University of
Posts and Telecommunications); Yuan Li (Peking University); Wenqi Tang (Beijing University of Posts
and Telecommunications)
450: I-Tuning: Tuning Frozen Language Models with Image for Lightweight Image Captioning
Ziyang Luo (Hong Kong Baptist University)*; Zhipeng Hu (NetEase Fuxi AI Lab); Yadong Xi (Fuxi AI Lab,
Netease Inc.); Rongsheng Zhang (Fuxi AI Lab, Netease Inc.); Jing Ma (Hong Kong Baptist University)
453: Towards Reliable Image Outpainting: Learning Structure-Aware Multimodal Fusion with
Depth Guidance
Lei Zhang (Beijing Jiaotong University); Chunyu Lin (Beijing Jiaotong University)*; Kang Liao (Beijing
Jiaotong University); Yao Zhao (Beijing Jiaotong University)
476: Learning to Reconnect Interrupted Trajectories for Weakly Supervised Multi-Object Tracking
Yu-Lei Li (Xiamen University); Yang Lu (Xiamen University); Jie Li (Xidian University); Hanzi Wang
(Xiamen University)*
490: PRIME: 3D Human Pose and Body Shape Recovery with Perspective Projection
Baobei Xu (Hikvision Research Institute)*; Shukai Fang (Hikvision Research Institute); Zhaoyang Li
(Hikvision Research Institute); Shicai Yang (Hikvision Research Institute); Di Xie (Hikvision Research
Institute); Shiliang Pu (Hikvision Research Institute)
498: A parallel attention mechanism for image manipulation detection and localization
Qiang Zeng (Sichuan University); Hongxia Wang (Sichuan University)*; Yang Zhou (Sichuan University);
Rui Zhang (Sichuan University); Sijiang Meng (Sichuan University)
528: MABNet: Master Assistant Buddy Network with Hybrid Learning for Image Retrieval
Rohit Agarwal (UiT The Arctic University of Norway, Tromsø)*; Gyanendra Das (Indian Institute of
Technology, Dhanbad); Saksham Aggarwal (IIT (ISM) Dhanbad); Alexander Horsch (UiT The Arctic
University of Norway); Dilip K Prasad (UiT The Arctic University of Norway)
553: HPFTN: Hierarchical Progressive Fusion Transformer Network for Video Denoising
Shuaitao Zhang (Hikvision Research Institute); Yuan Zhang (Hikvision Research Institute); Zheng Zhao
(Hikvision Research Institute); Di Xie (Hikvision Research Institute); Shiliang Pu (Hikvision Research
Institute)*
619: ScaleMix: Intra- and inter-layer multiscale feature combination for change detection
Rui Huang (Civil Aviation University of China)*; Qingyi Zhao (Civil Aviation University of China); Ruofei
Wang (Civil Aviation University of China); Caihua Liu (College of Computer Science and Technology, Civil
Aviation University of China); Sihua Gao (Civil Aviation University of China); Yuxiang Zhang (Civil Aviation
University of China); Wei Fan (Civil Aviation University of China)
642: Efficient Feature Fusion for Learning-based Photometric Stereo
Yakun Ju (The Hong Kong Polytechnic University)*; Kin-Man Lam (The Hong Kong Polytechnic
University); Jun Xiao (The Hong Kong Polytechnic University); Cong Zhang (The Hong Kong Polytechnic
University); Cuixin Yang (The Hong Kong Polytechnic University); Junyu Dong (Ocean University of
China)
665: Continuous Learning for Blind Image Quality Assessment with Contrastive Transformer
Jifan Yang (National Engineering Research Center for Multimedia Software, School of Computer Science,
Wuhan University)*; Zhongyuan Wang (Wuhan University); Baojin Huang (National Engineering Research
Center for Multimedia Software, School of Computer Science, Wuhan University); Lianbing Deng
(Guangdong-Macau Joint Laboratory for Advanced and Intelligent Computing)
670: Composition of Motion From Video Animation Through Learning Local Transformations
Michalis Vrigkas (University of Western Macedonia)*; Virginia Tagka (University of Ioannina); Marina
Plissiti (University of Ioannina); Christophoros Nikou (University of Ioannina)
698: Encoder-Decoder Graph Convolutional Network for Automatic Timed-Up-and-Go and Sit-to-
Stand Segmentation
Bo Wen (University of California, San Diego)*; Chen Du (University of California, San Diego); Truong
Nguyen (UC San Diego)
737: Semantics-Guided Object Removal for Facial Images: with Broad Applicability and Robust
Style Preservation
Jookyung Song (Seoul National University)*; Yeonjin Chang (Seoul National University); SeongUk Park
(Seoul National University); Nojun Kwak (Seoul National University)
746: Context-Aware Face Clustering with Graph Convolutional Networks
Dafeng Zhang (Samsung Research China – Beijing (SRCB))*; Jiangbo Guo (Samsung Research China –
Beijing (SRCB)); Zhezhu Jin (Samsung Research Institute China – Beijing (SRC-B))
872: Nested Attention Network with Graph Filtering for Visual Question and Answering
Jing Lu (China University of Petroleum (East China)); Chunlei Wu (China University of Petroleum (East
China))*; Leiquan Wang (UPC); Shaozu Yuan (UPC); Jie Wu (China University of Petroleum)
890: ST360IQ: No-Reference Omnidirectional Image Quality Assessment with Spherical Vision
Transformers
Nafiseh Jabbari Tofighi (Koc University)*; Mohamed Hedi Elfkir (Hacettepe University); Nevrez Imamoglu
(AIST); Cagri Ozcinar (Samsung); Erkut Erdem (Hacettepe University); Aykut Erdem (Koc University)
914: Depth Estimation for a Single Omnidirectional Image with Reversed-gradient Warming-up
Thresholds Discriminator
Yihong Wu (University of Southampton)*; Yuwen Heng (University of Southampton); Mahesan Niranjan
(University of Southampton); Hansung Kim (University of Southampton)
916: A Template Matching Approach for Reference Picture Padding in Video Coding
Nicolas Horst (Institute of Imaging & Computer Vision, RWTH Aachen University)*; Priyanka Das (RWTH
Aachen University, Germany); Mathias Wien (RWTH Aachen University, Germany)
1031: ESTIMATION OF VISUAL CONTENTS FROM HUMAN BRAIN SIGNALS VIA VQA BASED ON
BRAIN-SPECIFIC ATTENTION
Ryo Shichida (Hokkaido University)*; Ren Togo (Hokkaido University); Keisuke Maeda (Hokkaido
University); Takahiro Ogawa (Hokkaido University); Miki Haseyama (Hokkaido University)
1033: COMBINING THE SILHOUETTE AND SKELETON DATA FOR GAIT RECOGNITION
Likai Wang (Tianjin University)*; Ruize Han (College of Intelligence and Computing, Tianjin University);
Wei Feng (College of Intelligence and Computing, Tianjin University, China)
1041: Robust Video Anomaly Detection Framework via Prior Knowledge and Multi-Path Frame
Prediction
Menghao Zhang (Beijing University of Posts and Telecommunications)*; Jingyu Wang (Beijing University
of Posts and Telecommunications); Jing Wang (Beijing University of Posts and Telecommunications); Qi
Qi (Beijing University of Posts and Telecommunications); Zirui Zhuang (Beijing University of Posts and
Telecommunications); Haifeng Sun (Beijing University of Posts and Telecommunications); Ning Xiao (Didi
Chuxing)
1077: PCSalMix: Gradient Saliency-based Mix Augmentation for Point Cloud Classification
Tao Hong (Peking University)*; Zeren Zhang (Peking University); Jinwen Ma (Peking University)
1119: Binary Image Fast Perfect Recovery From Sparse 2D-DFT Coefficients
Soo-Chang Pei (Department of Electrical Engineering, National Taiwan University); Kuo-Wei Chang
(Chunghwa Telecom)*
1139: Learning from the raw domain: cross modality distillation for compressed video action
recognition
Yufan Liu (Institute of Automation, Chinese Academy of Sciences)*; Jiajiong Cao (Ant Financial Service
Group); Weiming Bai (Chinese Academy of Sciences); Bing Li (National Laboratory of Pattern
Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences); Weiming Hu (Institute of
Automation, Chinese Academy of Sciences)
1175: HQP-MVS: A HIGH-QUALITY PLANE PRIOR ASSISTED MULTI-VIEW STEREO FOR LOW-
TEXTURED AREA
Zefan Tian (Peking University)*; Rongjie Wang (PCL); Zhenyu Wang (Shenzhen Graduate School, Peking
University); Ronggang Wang (Peking University)
1242: SANet: Spatial Attention Network with Global Average Contrast Learning for Infrared Small
Target Detection
Jiewen Zhu (UESTC); Shengjia Chen (University of Electronic Science and Technology of China); Lexiao Li
(UESTC); Luping Ji (UESTC)*
1267: Two-stream Decoder Feature Normality Estimating Network for Industrial Anomaly Detection
Chaewon Park (Yonsei University)*; Minhyeok Lee (Yonsei University); Suhwan Cho (Yonsei University);
Donghyeong Kim (Yonsei University); Sangyoun Lee (Yonsei University)
1310: A novel Cross-Component Context Model for End-to-End Wavelet Image Coding
Anna Meyer (Friedrich-Alexander-Universität Erlangen-Nürnberg)*; Andre Kaup (Friedrich-Alexander-
Universität Erlangen-Nürnberg)
1335: DivCon: Learning Concept Sequences for Semantically Diverse Image Captioning
Yue Zheng (Tsinghua University)*; Ya-Li Li (Tsinghua University); Shengjin Wang (Tsinghua University)
1380: Solving Jigsaw Puzzle of Large Eroded Gaps Using Puzzlet Discriminant Network
Xingke Song (University of Nottingham Ningbo China); Xiaoying Yang (University of Nottingham Ningbo
China); Jianfeng Ren (University of Nottingham Ningbo China)*; Ruibin Bai (University of Nottingham);
Xudong Jiang (Nanyang Technological University)
1446: Infrared and visible image fusion by using multi-scale transformation and fractional-order
gradient information
Shiwei Wu (Nanjing University of Science and Technology); Kang Zhang (Nanjing University of Science
and Technology); Xia Yuan (Nanjing University of Science and Technology)*; ChunXia Zhao (Nanjing
University of Science and Technology)
1553: Learning Hybrid Representations of Semantics and Distortion for Blind Image Quality
Assessment
Xiaoqi Wang (Nanjing University of Posts and Telecommunications); Jian Xiong (Nanjing University of
Posts and Telecommunications)*; Bo Li (Xihua University); Jinli Suo (Tsinghua University); Hao Gao
(Nanjing University of Posts and Telecommunications)
1561: Efficient Online Convolutional Dictionary Learning Using Approximate Sparse Components
Farshad G Veshki (Aalto University)*; Sergiy A. Vorobyov (Aalto University)
1562: Laryngeal Leukoplakia Classification via Dense Multiscale Feature Extraction in White Light
Endoscopy Images
Zhenzhen You (Xi'an University of Technology)*; Yan Yan (Second Affiliated Hospital of Medical College,
Xi'an Jiaotong University); Zhenghao Shi (School of Computer Science and Engineering, Xi'an
University of Technology); Minghua Zhao (Xi'an University of Technology); Jing Yan (Second Affiliated
Hospital of Medical College, Xi'an Jiaotong University); Haiqin Liu (Second Affiliated Hospital of Medical
College, Xi'an Jiaotong University); Xinhong Hei (Xi'an University of Technology); Xiaoyong Ren (Second
Affiliated Hospital of Medical College, Xi'an Jiaotong University)
1584: Two-Stage Video De-raining with Spatio-Temporal Fusion and Illumination-Invariant Detail
Preservation
Yufeng Tan (South China University of Technology)*; Youjun Xiang (South China University of
Technology); Lei Cai (South China University of Technology); Pengcheng Wang (South China University
of Technology); Ying Zhang (South China University of Technology); Yuli Fu (South China University of
Technology)
1646: GAITCOTR: improved spatial-temporal representation for gait recognition with a hybrid
convolution-transformer framework
Jingqi Li (Fudan University); Yuzhen Zhang (Fudan University); Hongming Shan (Fudan University);
Junping Zhang (Fudan University)*
1670: Line segment matching based on intersection-enhanced point correspondences
Zhiyu Liu (School of Computer Science and Technology, Soochow University); Baojiang Zhong (School of
Computer Science and Technology, Soochow University)*
1721: Facial Texture Perceiver: Towards High-Fidelity Facial Texture Recovery with Input-Level
Inductive Biased Perceiver IO
Seungeun Lee (UNIST)*
1764: IAST: Instance Association Relying on Spatio-temporal Features for Video Instance
Segmentation
Junhao Chen (Zhejiang University of Technology); Sheng Liu (Zhejiang University of Technology)*;
Ruixiang Chen (Zhejiang University of Technology); Bingnan Guo (Zhejiang University of Technology);
Feng Zhang (Zhejiang University of Technology)
1780: CANet: Curved Guide Line Network with Adaptive Decoder for Lane Detection
Zhongyu Yang (University of Electronic Science and Technology of China)*; Chen Shen (Didi Chuxing);
Wei Shao (Didi Chuxing); Tengfei Xing (Didi Chuxing); Runbo Hu (Didi Chuxing); Pengfei Xu (Didi
Chuxing); Hua Chai (Didi Chuxing); Ruini Xue (University of Electronic Science and Technology of China)
1791: SFEMGN: IMAGE DENOISING WITH SHALLOW FEATURE ENHANCEMENT NETWORK AND
MULTI-SCALE CONVGRU
Qidong Wang (China University of Mining and Technology); Lili Guo (China University of Mining and
Technology)*; Shifei Ding (China University of Mining and Technology); Jian Zhang (China University of
Mining and Technology); Xiao Xu (China University of Mining and Technology)
1794: FAPM: Fast Adaptive Patch Memory for Real-time Industrial Anomaly Detection
Donghyeong Kim (Yonsei University)*; Chaewon Park (Yonsei University); Suhwan Cho (Yonsei
University); Sangyoun Lee (Yonsei University)
1834: Continuous interaction with a smart speaker via low-dimensional embeddings of dynamic
hand pose
Songpei Xu (University of Glasgow)*; Chaitanya Kaul (University of Glasgow); Xuri Ge (University of
Glasgow); Roderick Murray-Smith (University of Glasgow)
1901: TAMFormer: Multi-Modal Transformer with Learned Attention Mask for Early Intent
Prediction
Nada Osman (University of Padova); Guglielmo Camporese (University of Padova); Lamberto Ballan
(University of Padova)*
1966: D2Q-DETR: Decoupling and Dynamic Queries for Oriented Object Detection with
Transformers
Qiang Zhou (Alibaba Group)*; Chaohui Yu (Alibaba Group); Zhibin Wang (Alibaba Group); Fan Wang
(Alibaba Group)
1974: N2MVSNet: Non-local Neighbors Aware Multi-View Stereo Network
Zhe Zhang (Peking University); Huachen Gao (Peking University); Yuxi Hu (The Chinese University of
Hong Kong, Shenzhen); Ronggang Wang (Peking University)*
1983: Look and Think: Intrinsic Unification of Self-attention and Convolution for Spatial-Channel
Specificity
Xiang Gao (South China University of Technology)*; Honghui Lin (South China University of Technology);
Yu Li (South China University of Technology); Ruiyan Fang (South China University of Technology); Xin
Zhang (South China University of Technology)
1984: SANDFORMER: CNN and Transformer under Gated Fusion for Sand Dust Image Restoration
Shi Jun (XinJiang University)*; Bingcai Wei (Shandong University of Technology); Gang Zhou (Xinjiang
University); Liye Zhang (Shandong University of Technology)
2055: Trust Your Partner's Friends: Hierarchical Cross-modal Contrastive Pre-training for Video-
Text Retrieval
Yuhan Xiang (Xiamen University)*; Kaijian Liu (SenseTime Group Limited); Shixiang Tang (The University
of Sydney); Lei Bai (Shanghai AI Laboratory); Feng Zhu (University of Science and Technology of China);
Rui Zhao (SenseTime Group Limited); Xianming Lin (Xiamen University)
2059: BiSVP: Building Footprint Extraction via Bidirectional Serialized Vertex Prediction
Mingming Zhang (Beihang University); Ye Du (Beihang University); Zhenghui Hu (Hangzhou Innovation
Institute, Beihang University); Qingjie Liu (State Key Laboratory of Virtual Reality Technology and System,
Beihang University, Beijing 100191, China)*; Yunhong Wang (State Key Laboratory of Virtual Reality
Technology and System, Beihang University, Beijing 100191, China)
2088: A NOVEL MODE SELECTION-BASED FAST INTRA PREDICTION ALGORITHM FOR SPATIAL
SHVC
Dayong Wang (Institute of Bioinformatics, Chongqing University of Posts & Telecommunications,
Chongqing, China)*; Yu Sun (University of Central Arkansas); Weisheng Li (Chongqing University of
Posts and Telecommunications); Lele Xie (Chongqing University of Posts & Telecommunications); Xin Lu
(De Montfort University); Frederic Dufaux (CNRS); Ce Zhu (University of Electronic Science &
Technology of China)
2133: ShaDocNet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal
Xuhang Chen (University of Macau)*; Xiaodong Cun (Tencent AI Lab); Chi-Man Pun (University of
Macau); Shuqiang Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
2134: Fine-grained Blind Face Inpainting with 3D Face Component Disentanglement
Yu Bai (Fudan University); Ruian He (Fudan University); Weimin Tan (Fudan University); Bo Yan (Fudan
University)*; Yangle Lin (Fudan University)
2139: Sample-Adapt Fusion Network for RGB-D Hand Detection in the Wild
Xingyu Liu (Beijing University of Posts and Telecommunications)*; Pengfei Ren (Beijing University of
Posts and Telecommunications); Yuchen Chen (Beijing University of Posts and Telecommunications);
Cong Liu (China Mobile); Jing Wang (Beijing University of Posts and Telecommunications); Haifeng Sun
(Beijing University of Posts and Telecommunications); Qi Qi (Beijing University of Posts and
Telecommunications); Jingyu Wang (Beijing University of Posts and Telecommunications)
2155: A fusion-based and multi-layer method for low light image enhancement
Xueyan Zhou (Nankai University)*; Jiacen Guo (Nankai University); Hao Liu (Nankai University); Chao
Wang (Nankai University)
2270: Spatial-Temporal Graph Convolutional Network boosted Flow-Frame Prediction for Video
Anomaly Detection
Kai Cheng (Fudan University)*; Xinhua Zeng (Fudan University); Yang Liu (Fudan University); Mengyang
Zhao (Fudan University); Pang Chengxin (Shanghai University of Electric Power); Xing Hu (University of
Shanghai for Science and Technology)
2303: IMAGE FUSION VIA SLICE-BASED CONVOLUTIONAL SPARSE REPRESENTATION
Jingchen Xu (Yanshan University); Yali Zhang (Yanshan University); Ze Li (Yanshan University); Jinjia
Wang (Yanshan University)*
2310: CAENet: Using Collaborative Attention Transformer and Add-Boost Strategy for Single
Image Deraining
Shengdi Qin (Beijing Jiaotong University); Shunli Zhang (Beijing Jiaotong University)*; Yu Zhang
(Beihang University); Haoyu Gao (Beijing Jiaotong University)
2361: PI-Trans: Parallel-ConvMLP and Implicit-Transformation Based GAN for Cross-View Image
Translation
Bin Ren (University of Trento)*; Hao Tang (ETH Zurich); Yiming Wang (Fondazione Bruno Kessler); Xia Li
(ETH Zurich); Wei Wang (EPFL); Nicu Sebe (University of Trento)
2363: Multi-level fusion for burst super-resolution with deep permutation-invariant conditioning
Martina Cilia (Politecnico di Torino); Diego Valsesia (Politecnico di Torino)*; Giulia Fracastoro (Polito);
Enrico Magli (POLITO)
2453: Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-temporal
Masked Transformers
Vandad Davoodnia (Queen's University)*; Ali Etemad (Queen's University)
2455: TrOMR: Transformer-based Polyphonic Optical Music Recognition
Yixuan Li (Hangzhou Netease cloud Music Technology Co., Ltd)*; Huaping Liu (Hangzhou Netease
cloud Music Technology Co., Ltd); Qiang Jin (Hangzhou Netease cloud Music Technology Co., Ltd);
Miaomiao Cai (Hangzhou Netease cloud Music Technology Co., Ltd); Peng Li (NetEase Cloud Music)
2465: MSFormer: Multi-Scale Transformer with Neighborhood Consensus for Feature Matching
Dongyue Li (Southeast University); Yaping Yan (Southeast University); Dong Liang (Nanjing University of
Aeronautics and Astronautics); Songlin Du (Southeast University)*
2505: LOCAL FEATURE ENHANCED ADVERSARIAL NETWORK FOR THE BLIND IMAGE QUALITY
ASSESSMENT
Xiaomei Shi (Northwest University); Min Zhang (Northwest University)*; Shou Hai Xia (Northwest
University); Ru Xue Zhang (Northwest University); Jun Feng (Northwest University)
2538: 2DSBG: A 2D SEMI BI-GAUSSIAN FILTER ADAPTED FOR ADJACENT AND MULTI-SCALE
LINE FEATURE DETECTION
Baptiste Magnier (IMT Mines Ales CERIS)*; Ghulam Sakhi Shokouh (IMT Mines Ales); Louis Berthier
(IMT Mines Ales CERIS); Marcel Pie-Tapia (IMT Mines Ales CERIS); Adrien Ruggiero (IMT Mines Ales
CERIS)
2553: RATE-DISTORTION OPTIMIZED VARIABLE-NODE-SIZE TRISOUP FOR POINT CLOUD
CODING
Kyohei Unno (KDDI Research)*; Kohei Matsuzaki (KDDI Research); Satoshi Komorita (KDDI Research,
Inc.); Kei Kawamura (KDDI Research)
2569: ERBNet: An Effective Representation Based Network for Unbiased Scene Graph Generation
Wenxi Ma (Xiamen University); Tianxiang Hou (Xiamen University); Qianji Di (Xiamen University);
Zhongang Qi (Tencent); Ying Shan (Tencent); Hanzi Wang (Xiamen University)*
2680: Deep3DSketch: 3D modeling from Free-hand Sketches with View- and Structural-Aware
Adversarial Training
Tianrun Chen (Zhejiang University)*; Chenglong Fu (Huzhou University); Lanyun Zhu (Singapore
University of Technology and Design); Mao Papa (Moxin (Huzhou) Technology Co., LTD); Ying Zang
(Huzhou University); Jia Zhang (Yangzhou Polytechnic College); Lingyun Sun (Zhejiang University)
2684: Frequency Reciprocal Action and Fusion for Single Image Super-Resolution
Shuting Dong (Tsinghua University)*; Feng Lu (Tsinghua University); Chun Yuan (Graduate School at
Shenzhen, Tsinghua University)
2710: iSmallNet: Densely Nested Network with Label Decoupling for Infrared Small Target
Detection
Zhiheng Hu (Nanjing University of Aeronautics and Astronautics); Yongzhen Wang (Nanjing University of
Aeronautics and Astronautics); Peng Li (Nanjing University of Aeronautics and Astronautics); Jie Qin
(Nanjing University of Aeronautics and Astronautics)*; Haoran Xie (Lingnan University); Mingqiang Wei
(Nanjing University of Aeronautics and Astronautics)
2714: A Two-branch Network for Video Anomaly Detection with Spatio-temporal Feature Learning
Guoqiu Li (Tsinghua Shenzhen International Graduate School, Tsinghua University)*; Shengjie Chen
(Tsinghua University); Yujiu Yang (Tsinghua University); Zhenhua Guo (Tianyi Traffic Technology)
2762: A Dual-branch Adaptive Distribution Fusion Framework for Real-world Facial Expression
Recognition
Shu Liu (Central South University)*; Yan Xu (Central South University); Tongming Wan (Central South
University); Xiaoyan Kui (Central South University)
2765: Face Recognition on Point Cloud with cGAN-TOP for Denoising
Junyu Liu (University of Nottingham Ningbo China); Jianfeng Ren (University of Nottingham Ningbo
China)*; Hong-liang Sun (UNNC); Xudong Jiang (Nanyang Technological University)
2770: SQA: STRONG GUIDANCE QUERY WITH SELF-SELECTED ATTENTION FOR HUMAN-
OBJECT INTERACTION DETECTION
Feng Zhang (Zhejiang University of Technology); Sheng Liu (Zhejiang University of Technology)*;
Bingnan Guo (Zhejiang University of Technology); Ruixiang Chen (Zhejiang University of Technology);
Junhao Chen (Zhejiang University of Technology)
2777: FCIR: RETHINK AERIAL IMAGE SUPER RESOLUTION WITH FOURIER ANALYSIS
Yan Zhang (Chongqing University of Posts and Telecommunications); Pengcheng Zheng (Chongqing
University of Posts and Telecommunications); Jianan Jiang (Chongqing University of Posts and
Telecommunications); Xiao PU (Chongqing University of Posts and Telecommunications); Xinbo Gao
(Chongqing University of Posts and Telecommunications)*
2799: Knowledge Distillation with Active Exploration and Self-attention based Inter-Class Variation
Transfer For Image Segmentation
Yifan Zhang (Shenzhen University); Shaojie Li (Shenzhen University); Xuan Yang (Shenzhen University)*
2824: Structure-Aware Multi-Feature Co-Learning for Dual Branch Face Super Resolution
Kangli Zeng (School of Computer Science, Wuhan University)*; Zhongyuan Wang (Wuhan University);
Tao Lu (Wuhan Institute of Technology); Jianyu Chen (Wuhan University)
2858: IFUNET++: ITERATIVE FEEDBACK UNET++ FOR INFRARED SMALL TARGET DETECTION
Zhangying Weng (Nanjing University of Aeronautics and Astronautics)*; Peng Li (Nanjing University of
Aeronautics and Astronautics); Xin Zhuang
(Beijing Aerospace Intelligent Manufacturing Technology Development Co., Ltd); Xuefeng Yan (Nanjing
University of Aeronautics and Astronautics); Lina Gong (Nanjing University of Aeronautics and
Astronautics); Haoran Xie (Lingnan University); Mingqiang Wei (Nanjing University of Aeronautics and
Astronautics)
2873: SEMI-SUPERVISED CONTRASTIVE LEARNING WITH SOFT MASK ATTENTION FOR FACIAL
ACTION UNIT DETECTION
Zhongling Liu (Fujitsu Research and Development Center); Rujie Liu (Fujitsu Research & Development
Center Co., Ltd.); Ziqiang Shi (Fujitsu Research & Development Center)*; Liu Liu (Fujitsu Research &
Development Center); Xiaoyu Mi (Fujitsu Laboratories Ltd.); Kentaro Murase (Fujitsu Laboratories Ltd.)
2898: $\psi$-Net: Point Structural Information Network for No-reference Point Cloud Quality
Assessment
Jian Xiong (Nanjing University of Posts and Telecommunications)*; Sifan Wu (Nanjing University of Posts
and Telecommunications); Wang Luo (Nanjing University of Posts and Telecommunications); Jinli Suo
(Tsinghua University); Hao Gao (Nanjing University of Posts and Telecommunications)
2917: JNDMix: JND-Based Data Augmentation for No-reference Image Quality Assessment
Jiamu Sheng (Fudan University); Jiayuan Fan (Fudan University)*; Peng Ye (Fudan University); Jianjian
Cao (Fudan University)
2993: A2SConv: Asymmetric Spectral-Spatial Neural Architecture Search for Hyperspectral Image
Classification
Zhan Lin (School of Information Science and Technology, Fudan); Jiayuan Fan (Fudan University)*; Peng
Ye (Fudan University); Cao Jianjian (Fudan University)
3032: YOLO-Based Lightweight Object Detection with Structure Simplification and Attention
Enhancement
Shuqi Sun (University of Jinan)*; Xiaohui Yang (University of Jinan); Jingliang Peng (University of Jinan)
3053: LEARNING TO EXPLAIN: A GRADIENT-BASED ATTRIBUTION METHOD FOR INTERPRETING
SUPER-RESOLUTION NETWORKS
Anni Yu (State Key Laboratory for Novel Software Technology, Nanjing University); Yu-Bin Yang (State
Key Laboratory for Novel Software Technology, Nanjing University)*
3073: Matrix Recovery using Deep Generative Priors with Low-Rank Deviations
Pengbin Yu (Southwest University)*; Jianjun Wang (Southwest University); Chen Xu (University of
Ottawa)
3115: MSNet: A Deep Architecture using Multi-Sentiment Semantics for Sentiment-Aware Image
Style Transfer
Shikun Sun (Tsinghua University)*; Jia Jia (Tsinghua University); Haozhe Wu (Tsinghua University); Zijie
Ye (Tsinghua University); Junliang Xing (Tsinghua University)
3167: Extracting the Brain-like Representation by an Improved Self-Organizing Map for Image
Classification
Jiahong Zhang (Communication University of China)*; Lihong Cao (Communication University of China);
Moning Zhang (Communication University of China ); Wenlong Fu (Communication University of China)
3206: LP-IOANET: EFFICIENT HIGH RESOLUTION DOCUMENT SHADOW REMOVAL
Kostas Georgiadis (CERTH/ITI); Mehmet Kerim Yücel (Samsung R&D UK )*; Evangelos Skartados
(Centre for Research and Technology, Hellas, Information Technologies Institute); Valia Dimaridou
(CERTH-ITI); Anastasios Drosou (CERTH-ITI); Albert Saà-Garriga (Samsung R&D UK); Bruno Manganelli
(Samsung Research UK)
3216: Non-convex approaches for low-rank tensor completion under tubal sampling
Zheng Tan (University of California, Los Angeles); Longxiu Huang (Michigan State University); HanQin
Cai (University of Central Florida ); Yifei Lou (University of Texas at Dallas)*
3287: Lit the Darkness: Three-stage zero-shot learning for low-light enhancement with multi-
neighbor enhancement factors
Mariam Saeed (Alexandria University); Marwan Torki (Alexandria University)*
3311: Data Augmentation based on Invariant Shape Blending for Deep Learning Classification
Emna Ghorbel (National School of Computer Science (ENSI))*; Mahmoud Ghorbel (National School of
Computer Science (ENSI)); Slim Mhiri (ENSI)
3327: Efficiently fusing sparse LiDAR for enhanced Self-supervised Monocular Depth Estimation
Yue Wang (University College London); Mingrong Gong (Shenzhen Institute of Advanced Technology,
Chinese Academy of Sciences); Lei Xia (Shenzhen Institute of Advanced Technology, Chinese Academy
of Sciences); Qieshi Zhang (Shenzhen Institute of Advanced Technology, Chinese Academy of
Sciences)*; Jun Cheng (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
3366: Bayesian Methods for Optical Flow Estimation Using a Variational Approximation, With
Applications to Ultrasound
Jan Dorazil (TU Wien)*; Bernard H. Fleury (TU Wien); Franz Hlawatsch (TU Wien)
3444: RD-NAS: Enhancing One-shot Supernet Ranking Ability via Ranking Distillation from Zero-
cost Proxies
Peijie Dong (School of Computer Science, National University of Defense Technology); Xin Niu (NUDT)*;
Lujun Li (Chinese Academy of Sciences); ZHILIANG TIAN (National University of Defense Technology);
Xiaodong Wang (National University of Defense Technology); Zimian Wei (School of Computer Science,
National University of Defense Technology); Hengyue Pan (National University of Defense Technology);
Dongsheng Li (School of Computer Science, National University of Defense Technology)
3477: TINYCOD: TINY AND EFFECTIVE MODEL FOR CAMOUFLAGED OBJECT DETECTION
Haozhe Xing (Fudan University); Shuyong Gao (Fudan University); Hao Tang (ETH Zurich); Tsui Qin Mok
(Fudan University); Yanlan Kang (Fudan University); Wenqiang Zhang (Fudan University)*
3532: BAUENet: Boundary-Aware Uncertainty Enhanced Network for Infrared Small Target
Detection
Tianxiang Chen (University of Science and Technology of China); Qi Chu (University of Science and
Technology of China)*; Zhentao Tan (Alibaba DAMO Academy); Bin Liu (University of Science and
Technology of China); Nenghai Yu (University of Science and Technology of China)
3537: BAGGING R-CNN: ENSEMBLE FOR OBJECT DETECTION IN COMPLEX TRAFFIC SCENES
Pengteng Li (Shenzhen University); Ying He (Shenzhen University); Dongfu Yin (Guangdong Laboratory
of Artificial Intelligence and Digital Economy (SZ)); F Richard Yu (Shenzhen University)*; Pinhao Song
(KU Leuven)
3555: LOCAL TO GLOBAL PRIOR LEARNING FOR BLIND UNSUPERVISED IMAGE SUPER
RESOLUTION
Kazuhiro Yamawaki (Yamaguchi University)*; Xian-Hua Han (Yamaguchi University)
3597: MTFD: Multi-teacher Fusion Distillation For Compressed Video Action Recognition
Jinxin Guo (Inner Mongolia University)*; Jiaqiang Zhang (Inner Mongolia University); Shaojie Li (Inner
Mongolia University); Xiaojing Zhang (Inner Mongolia University); Ming Ma (Inner Mongolia University)
3602: Mask Guided Selective Context Decoding for Handwritten Chinese Text Recognition
tao li (University of Science and Technology of China)*; shilian wu (University of Science and Technology
of China); Zengfu Wang ( Institute of Intelligent Machines, Chinese Academy of Sciences)
3696: Learning 3D Human Pose and Shape Estimation Using Uncertainty-Aware Body Part
Segmentation
Ziming Wang (Fudan University)*; Han Yu (Fudan University); Xiaoguang Zhu (Shanghai Jiao Tong
University); Zengwen Li (Chongqing Changan Automobile Co., Ltd.); Changxue Chen (Chongqing
Changan Automobile Co., Ltd.); Liang Song (Fudan University)
3711: A Simulation-Based Framework for Urban Road Accident Detection
Haohan Luo (East China Normal University); Feng Wang (East China Normal University)*
3745: TransWnet: Integrating Transformers into CNNs via Row and Column Attention for
Abdominal Multi-organ Segmentation
Yazhen Xie (Xiangtan University); Yanglin Huang (Xiangtan University); Yuan Zhang (Xiangtan
University); Xuanya Li (Baidu); Xiongjun Ye (Xiangtan University); Kai Hu (Xiangtan University)*
3841: Monocular 3D Human Pose Estimation Based on Global Temporal-Attentive and Joints-
Attention in Video
ruhan He (Wuhan Textile University); shanshan xiang (Wuhan Textile University)*; Tao Peng (Wuhan
Textile University); Yongsheng Yu (Wuhan University of Technology)
3874: Masked-AP: Attention Pyramid Convolutional Neural Network with mask for Cervical Cell
Classification
yu jin (Institute of Artificial Intelligence, School of Computer Science, Wuhan University); Juan Liu
(Institute of Artificial Intelligence, School of Computer Science, Wuhan University)*; Hua Chen (Institute of
Artificial Intelligence, School of Computer Science, Wuhan University); Wensi Duan (Institute of Artificial
Intelligence, School of Computer Science, Wuhan University); Dehua Cao (Landing Artificial Intelligence
Center for Pathological Diagnosis); Baochuan Pang (Landing Artificial Intelligence Center for Pathological
Diagnosis )
3888: DDN: Dynamic Aggregation Enhanced Dual-stream Network for Medical Image Classification
Lang Wang (Institute of Artificial Intelligence, School of Computer Science, Wuhan University); Juan Liu
(Institute of Artificial Intelligence, School of Computer Science, Wuhan University)*; Peng Jiang (Institute
of Artificial Intelligence, School of Computer Science, Wuhan University); Dehua Cao (Landing Artificial
Intelligence Center for Pathological Diagnosis); Baochuan Pang (Landing Artificial Intelligence Center for
Pathological Diagnosis )
3929: Image Inpainting with Semantic-aware Transformer
Shiyu Chen (Southwest University of Science and Technology); Wenxin Yu (Southwest University of
Science and Technology)*; Qi Wang (Southwest University of Science and Technology); Jun Gong
(Beijing Institute of Technology); Peng Chen (Chengdu Hongchengyun Technology Co., Ltd)
3972: Hierarchical Spatiotemporal Feature Fusion Network for Video Saliency Prediction
Yunzuo Zhang (Shijiazhuang Tiedao University)*; Tian Zhang (Shijiazhuang Tiedao University); Cunyu
Wu (Shijiazhuang Tiedao University); Yuxin Zheng (Shijiazhuang Tiedao University)
3983: BODY PRIOR GUIDED GRAPH CONVOLUTIONAL NEURAL NETWORK FOR SKELETON-
BASED ACTION RECOGNITION
Qianshuo Hu (Chongqing university of technology); Hong Liu (Peking University Shenzhen Graduate
School); Hua-qiu Wang ( Chongqing University of Technology); Mengyuan Liu (Peking University,
Shenzhen Graduate School)*
3986: A highly Interpretable Deep equilibrium network for hyperspectral image deconvolution
Alexandros Gkillas (University of Patras)*; Dimitris Ampeliotis (Digital Media and Communication
Department, Ionian University, Greece); Kostas Berberidis (University of Patras)
4036: D-3DLD: Depth-aware Voxel Space Mapping for Monocular 3D Lane Detection with
Uncertainty
Nayeon Kim (Samsung Electronics)*; Moonsub Byeon (Samsung Electronics); Daehyun Ji (Samsung
Electronics); Dokwan Oh (Samsung Electronics)
4037: Recurrent Fine-Grained Self-Attention Network for Video Crowd Counting
Jifan Zhang (School of Electronic and Computer Engineering, Peking University); Zhe Wu (Peng Cheng
Laboratory); xinfeng zhang (University of Chinese Academy of Sciences); Guoli Song (Peng Cheng
Laboratory); Yaowei Wang (PengCheng Laboratory); Jie Chen (Peking University)*
4090: Gated Enhanced RPN and Hybrid-View for Few-Shot Object Detection
Xujun Wei (Fudan University); Zechu Zhou (Academy of Engineering and Technology, Fudan University);
Pinxue Guo (Fudan University); Wenqiang Zhang (Fudan University)*
4168: DO-FAM: Disentangled Non-Linear Latent Navigation for Facial Attribute Manipulation
Yifan Yuan (Fudan University)*; Siteng Ma (Fudan University); Hongming Shan (Fudan University);
Junping Zhang (Fudan University)
4237: Time-Frequency Awareness Network for Human Mesh Recovery from Videos
Boyang Zhang (Ningxia University); Suping Wu (Ningxia University)*; Meining Jia (NingXia University)
4263: COLOR GUIDED DEPTH MAP SUPER-RESOLUTION WITH NONLOCAL AUTOREGRESSIVE
MODELING
Wei Xu (Faculty of Information Technology, Beijing University of Technology)*; Na Qi (Beijing University of
Technology); Qing Zhu (Beijing University of Technology); Jingzhong Qi (Beijing University of
Technology); Longlu Huang (Beijing University of Technology); Kun Cao (Beijing University of
Technology); Yuxin Bao (Beijing University of Technology); Qianwen Wang (Beijing university of
technology)
4310: Could the BubbleView metaphor be used to infer visual attention on 3D graphical content?
Alexandre Bruckert (Nantes Université)*; Mona Abid (Nantes université); Matthieu Perreira Da Silva
(Université de Nantes); Patrick Le Callet (Universite de Nantes, France)
4359: Test your samples jointly: Pseudo-reference for image quality evaluation
Marcelin Tworski (Telecom Paris)*; Stéphane Lathuilière (Telecom-Paris)
4420: Towards Realizing the Value of Labeled Target Samples: a Two-Stage Approach for Semi-
Supervised Domain Adaptation
Mengqun Jin (Tsinghua University); Kai Li (NEC LABORATORIES AMERICA, INC); SHUYAN LI
(University of Cambridge); Chunming He (Tsinghua University); Xiu Li (Tsinghua University)*
4440: Learning how to learn domain-invariant parameters for domain generalization
Feng Hou (University of Chinese Academy of Sciences)*; Yao Zhang (Shanghai AI Lab); Yang Liu
(Institute of Computing Technology, University of Chinese Academy of Sciences, Lenovo AI Lab); Jin
Yuan (Southeast University); Cheng Zhong (Lenovo Research, AI Lab); Yang Zhang (Lenovo Ltd);
zhongchao shi (lenovo company); Jianping Fan (Lenovo); Zhiqiang He (Lenovo Ltd.)
4544: Synthetic Pseudo Anomalies for Unsupervised Video Anomaly Detection: A Simple yet
Efficient Framework based on Masked Autoencoder
Xiangyu Huang (School of informatics Xiamen University)*; Caidan Zhao (School of Informatics Xiamen
University); Chenxing Gao (xiamen university); Chen Lvdong (xiamen university); Zhiqiang Wu (Wright
State University)
4690: Collaborative Audio-Visual Event Localization based on Sequential Decision and Cross-
modal Consistency
Yuqian Kuang (Harbin Institute of Technology)*; Xiaopeng Fan (Harbin Institute of Technology)
4696: LGViT: Local-Global Vision Transformer for Breast Cancer Histopathological Image
Classification
Lang Wang (Institute of Artificial Intelligence, School of Computer Science, Wuhan University); Juan Liu
(Institute of Artificial Intelligence, School of Computer Science, Wuhan University)*; Peng Jiang (Institute
of Artificial Intelligence, School of Computer Science, Wuhan University); Dehua Cao (Landing Artificial
Intelligence Center for Pathological Diagnosis); Baochuan Pang (Landing Artificial Intelligence Center for
Pathological Diagnosis)
4719: FEW BUT INFORMATIVE LOCAL HASH CODE MATCHING FOR IMAGE RETRIEVAL
Zechao Hu (University of York)*; Adrian Bors (University of York)
4755: SELF-SIMILARITY IS ALL YOU NEED FOR FAST AND LIGHT-WEIGHT GENERIC EVENT
BOUNDARY DETECTION
Sourabh Vasant Gothe (SAMSUNG R&D INSTITUTE BANGALORE, KARNATAKA, INDIA)*; Jayesh
Rajkumar Vachhani (Samsung R&D Institute Bengaluru); Rishabh Khurana (Samsung Research,
Bangalore); Pranay Kashyap (Samsung Research Institute Bangalore)
4803: Efficient Feature Extraction for Non-Maximum Suppression in Visual Person Detection
Charalampos Symeonidis (AUTH)*; Ioannis Mademlis (Department of Informatics, Aristotle University of
Thessaloniki); Ioannis Pitas (Aristotle University of Thessaloniki); Nikolaos Nikolaidis (Aristotle University
of Thessaloniki)
4836: Motion Matters: A Novel Motion Modeling For Cross-View Gait Feature Learning
Jingqi Li (Fudan University); Jiaqi Gao (Fudan University); Yuzhen Zhang (Fudan University); Hongming
Shan (Fudan University); Junping Zhang (Fudan University)*
4957: Soft 2D-to-3D Delivery Using Deep Graph Neural Networks for Holographic-Type
Communication
Takuya Fujihashi (Osaka University)*; Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories);
Takashi Watanabe (Osaka University)
5121: Efficient and Effective Multi-Camera Pose Estimation with Weighted M-Estimate Sample
Consensus
Xinyu Lin (University of Electronic Science and Technology of China)*; Yingjie Zhou (Sichuan University);
Xun Zhang (Institut superieur d’electronique de Paris - ISEP); Yipeng Liu (University of Electronic Science
and Technology of China); Ce Zhu (University of Electronic Science & Technology of China)
5144: Level-line Guided Edge Drawing for Robust Line Segment Detection
Xinyu Lin (University of Electronic Science and Technology of China)*; Yingjie Zhou (Sichuan University);
Yipeng Liu (University of Electronic Science and Technology of China); Ce Zhu (University of Electronic
Science & Technology of China)
5185: Multi-Modal Approach to Food Classification Diet Tracking System with spoken and visual
inputs
Shivani Gowda Kallappanahalli (Loyola Marymount University); Yifan Hu (Loyola Marymount University);
Mandy B Korpusik (Loyola Marymount University)*
5194: CROSS-MODAL MATCHING AND ADAPTIVE GRAPH ATTENTION NETWORK FOR RGB-D
SCENE RECOGNITION
Yuhui Guo (Renmin University of China)*; Xun Liang (Renmin University of China); james kwok (The
Hong Kong University of Science and Technology); Xiangping Zheng (Renmin University of China); Bo
Wu (Renmin University of China); Yuefeng Ma (Qufu Normal University)
5280: Boundary Cue Guidance and Contextual Feature Mining for Glass Segmentation
Qiquan Xiao (Xiangtan University); Yuan Zhang (Xiangtan University); Xuanya Li (Baidu); Kai Hu
(Xiangtan University)*
5283: Dynamic Local and Global Context Exploration For Small Object Detection
Ziji Zhang (Beijing University of Posts and Telecommunications)*; Ping Gong (Beijing University of Posts
and Telecommunications); Haotian Sun (Beijing University of Posts and Telecommunications); Pingping
Wu (Beijing University of Posts and Telecommunications); Xuanyuan Yang (Beijing University of Posts
and Telecommunications)
5479: A PROGRESSIVE IMAGE DEHAZING FRAMEWORK WITH INTER AND INTRA CONTRASTIVE
LEARNING
honglei xu (Harbin Institute of Technology); Shaohui Liu (Harbin Institute of Technology)*; Yan Shu (State
Key Laboratory of Communication Content Cognition, People`s Daily Online, Beijing, China; Harbin
Institute of Technology; Institute of Information Engineering, CAS ); Feng Jiang (Harbin Institute of
Technology, Harbin)
5489: DMFormer: Closing the Gap between CNN and Vision Transformers
Zimian Wei (School of Computer Science, National University of Defense Technology); Hengyue Pan
(National University of Defense Technology)*; Lujun Li (Chinese Academy of Sciences); MengLong Lu
(National University of Defense Technology); Xin Niu (NUDT); Peijie Dong (School of Computer Science,
National University of Defense Technology); Dongsheng Li (School of Computer Science, National
University of Defense Technology)
5538: real-time Human reconstruction based on human pose prior and epipolar refinement
Kuncheng Luo (Tsinghua University)*; Zhiheng Li (Tsinghua University)
5567: INFORMATION EXTRACTION FROM PILL BOTTLE IMAGES VIA TEXT STITCHING
Rahul Kumar Gupta (Walmart Global Tech)*; Shilka Roy (Walmart); Sujit Jos (Walmart Global Tech); Unni
V.S. (Walmart Global Tech); Lauren Lavoie (Walmart Global Tech); Frederic Medous (Walmart Global
Tech); Walter Smith (Walmart Global Tech)
5569: No reference quality assessment for screen content images based on entire and high-
influence regions
Zhuoran Xu (Anhui University); Yang Yang (Anhui University)*; Zhixiang Zhang (Hefei High-Dimensional
Data Technology Co.,Ltd); Weiming Zhang (University of Science and Technology of China)
5586: Pyramid Spatial Feature Transform And Shared-Offsets Deformable Alignment Based
Convolutional Network for HDR Imaging
Junda Liao (Nanjing University; Waseda University); Qin Liu (Nanjing University)*; Takeshi Ikenaga
(Waseda University)
5668: AMPose: Alternately Mixed Global-Local Attention Model for 3D Human Pose Estimation
Hong-Xin Lin (National Taiwan University)*; Yun-Wei Chiu (National Taiwan University); Pei-Yuan Wu
(National Taiwan University)
5674: Boosting Face Recognition Performance with Synthetic Data and Limited Real Data
Wenqing Wang (University of Macau)*; Lingqing Zhang (University of Macau); Chi-Man Pun (University of
Macau); Jiucheng Xie (Nanjing University of Posts and Telecommunications)
5684: A Deep Fusion Rule for Infrared and Visible Image Fusion: Feature Communication for
Importance Assessment
Xuran Lv (Qilu University of Technology (Shandong Academy of Sciences)); Jinyong Cheng (Qilu
University of Technology (Shandong Academy of Sciences))*; Guohua Lv (Qilu University of
Technology (Shandong Academy of Sciences)); Zhonghe Wei (Qilu University of Technology (Shandong
Academy of Sciences))
5699: STATIC-SCENE CONSTRAINED OPTIMIZATION FOR MATRIX/TENSOR-DECOMPOSITION-
FREE FOREGROUND-BACKGROUND SEPARATION
Kazuki Naganuma (Tokyo Institute of Technology)*; Shunsuke Ono (Tokyo Institute of Technology)
5708: KEPS-NET: Robust Parking Slot Detection based Keypoint Estimation for High Localization
Accuracy
Jaewoo Lee ( Samsung Electronics)*; Kapje Sung (Samsung Electronics); Daeul Park (Samsung
Electronics); Younghan Jeon (Seoul National University)
5713: Low in Resolution, High in Precision: UAV Detection with Super-Resolution and Motion
Information Extraction
Hanzhuo Wang (Zhejiang University)*; Xingjian Wang (Zhejiang University); Chengwei Zhou (Zhejiang
University); Wenchao Meng (Zhejiang University); Zhiguo Shi (Zhejiang University)
5750: FLOWPOSE: CONDITIONAL NORMALIZING FLOWS FOR 3D HUMAN POSE AND SHAPE
ESTIMATION FROM MONOCULAR VIDEOS
Yaoyao Du (Tsinghua University)*; Zixiao Zhang (Huawei); Zhihao Li (Huawei Noah's Ark Lab); Peng Wei
(Huawei Device BG); Qingmin Liao (Tsinghua University); Wenming Yang (Tsinghua University)
5827: Background Disturbance Mitigation for Video Captioning via Entity-Action Relocation
Zipeng Li (Wuhan University of Technology); Xian Zhong (Wuhan University of Technology); Shuqin Chen
(Hubei University of Education)*; Wenxuan Liu (Wuhan University of Technology); Wenxin Huang (Hubei
University); Lin Li (Wuhan University of Technology)
5841: Joint Multi-Level Feature Network for Lightweight Person Re-Identification
Yunzuo Zhang (Shijiazhuang Tiedao University)*; Weili Kang (Shijiazhuang Tiedao University); Yameng
Liu (Shijiazhuang Tiedao University); Pengfei Zhu (Shijiazhuang Tiedao University)
5844: ACTIVE PERCEPTION SYSTEM FOR ENHANCED VISUAL SIGNAL RECOVERY USING DEEP
REINFORCEMENT LEARNING
Gaurav Chaudhary (Indian Institute of Technology Kanpur, India)*; Prof Laxmidhar Behera (IIT Kanpur);
Tushar Sandhan (Indian Institute of Technology Kanpur)
5876: Classifying Pathological Images Based on Multi-Instance Learning and End-to-End Attention
Pooling
Yuqi Chen (Institute of Artificial Intelligence, School of Computer Science, Wuhan University); Juan Liu
(Institute of Artificial Intelligence, School of Computer Science, Wuhan University)*; Zhiqun Zuo (Institute
of Artificial Intelligence, School of Computer Science, Wuhan University); Peng Jiang (Institute of Artificial
Intelligence, School of Computer Science, Wuhan University); Yu Jin (Institute of Artificial Intelligence,
School of Computer Science, Wuhan University); Guangsheng Wu (School of Mathematics and Computer
Science, Xinyu University)
5955: YOLOX-B: A BETTER YOLOX MODEL FOR REAL-TIME DRIVER BEHAVIOR DETECTION
Xu Guo (Inner Mongolia University)*; Ming Ma (Inner Mongolia University); Jiaqiang Zhang (Inner
Mongolia University); Shaojie Li (Inner Mongolia University)
6015: Neighborhood Information-Based Label Refinement for Person Re-Identification with Label
Noise
Xian Zhong (Wuhan University of Technology); Shuaipeng Su (Wuhan University of Technology);
Wenxuan Liu (Wuhan University of Technology)*; Xuemei Jia (Wuhan University); Wenxin Huang (Hubei
University); Mengdie Wang (Wuhan University Of Technology)
6067: Classification-based Dynamic Network for Efficient Super-Resolution
Qi Wang (Beijing Jiaotong University); Weiwei Fang (Beijing Jiaotong University)*; Meng Wang (Beijing
Jiaotong University); Yusong Cheng (Beijing Jiaotong University)
6235: Towards Privacy and Utility in Tourette Tic Detection Through Pretraining Based on Publicly
Available Video Data of Healthy Subjects
Nele Sophie Brügge (Universität zu Lübeck)*; Esfandiar Mohammadi (Universität zu Lübeck); Alexander
Münchau (Universität zu Lübeck); Tobias Bäumer (Universität zu Lübeck); Christian Frings (Universität
Trier); Christian Beste (Technische Universität Dresden); Veit Roessner (Technische Universität Dresden);
Heinz Handels (University of Lübeck)
6322: RETRIEVAL-BASED NATURAL 3D HUMAN MOTION GENERATION
Zehan Tan (Fudan University)*; Weidong Yang (Fudan University); Shuai Wu (Fudan University)
6387: In-Sensor & Neuromorphic Computing are all you need for Efficient Computer Vision
Gourav Datta (University of Southern California)*; Zeyu Liu (University of Southern California); Md
Abdullah-Al Kaiser (University of Southern California); Souvik Kundu (Intel Labs); Joe Mathai (Information
Sciences Institute); Zihan Yin (USC); Ajey Jacob (USC); Akhilesh Jaiswal (USC); Peter A. Beerel
(University of Southern California)
Information Forensics and Security
276: SC-NET: SALIENT POINT AND CURVATURE BASED ADVERSARIAL POINT CLOUD
GENERATION NETWORK
Zihao Zhang (The University of Electronic Science and Technology of China); Nan Sang (UESTC);
Xupeng Wang (University of Electronic Science and Technology of China)*; Mumuxin Cai (University of
Electronic Science and Technology of China)
290: Audio Cross Verification Using Dual Alignment Likelihood Ratio Test
Heidi Lei (MIT); Arm Wonghirundacha (Pomona College); Irmak Bukey (Pomona College); Timothy Tsai
(Harvey Mudd College)*
344: GAPter: Gray-box Data Protector for Deep Learning Inference Services at User Side
Hao Wu (Nanjing University); Bo Yang (Nanjing University); Xiaopeng Ke (Nanjing University); Siyi He
(Nanjing University); Fengyuan Xu (Nanjing University)*; Sheng Zhong (Nanjing University)
376: Measure and Countermeasure of the Capsulation Attack against Backdoor-based Deep
Neural Network Watermarks
Fangqi Li (SEIEE, Shanghai Jiao Tong University)*; shilin wang (SEIEE, Shanghai Jiaotong University);
Yun Zhu (Shanghai Jiaotong University)
874: A Multi-modal Approach for Context-aware Network Traffic Classification
Bo Pang (Harbin Institute of Technology); Yongquan Fu (National University of Defense Technology)*; Siyuan Ren
(Department of Computer Science and Technology, Harbin Institute of Technology(Shenzhen)); Siqi Shen
(Xiamen University); Ye Wang (National University of Defense Technology); Qing Liao (Harbin Institute of
Technology (Shenzhen)); Yan Jia (National University of Defense Technology)
906: A study on the invariance in security whatever the dimension of images for the steganalysis
by deep-learning
Kévin Planolles (LIRMM (Montpellier)); Marc Chaumont (LIRMM (Montpellier), UNimes)*; Frédéric Comby
(LIRMM)
1665: Benchmarking Cross-Domain Face Recognition with Avatars, Caricatures and Sketches
Ahmad Foroughi (Hochschule Darmstadt); Christian Rathgeb (Hochschule Darmstadt)*; Mathias Ibsen
(Hochschule Darmstadt); Christoph Busch (Hochschule Darmstadt)
1917: EXPLOITING PRNU AND LINEAR PATTERNS IN FORENSIC CAMERA ATTRIBUTION UNDER
COMPLEX LENS DISTORTION CORRECTION
Andrea AM Montibeller (University of Trento)*; Fernando Perez-Gonzalez (Universidad de Vigo)
1950: WHICH COUNTRY IS THIS PICTURE FROM? NEW DATA AND METHODS FOR DNN-BASED
COUNTRY RECOGNITION
Omran Alamayreh (University of Siena )*; Giovanna Dimitri (University of Siena ); Jun Wang (University
of Siena); Benedetta Tondi (University of Siena); Mauro Barni (University of Siena)
2183: CPA: Compressed Private Aggregation for Scalable Federated Learning over Massive
Networks
Natalie Lang ( Ben-Gurion University of the Negev)*; Elad Sofer (Ben-Gurion University of the Negev); Nir
Shlezinger (Ben-Gurion University); Rafael D'Oliveira (Clemson University); Salim El Rouayheb (Rutgers
University)
2322: A Graph Neural Network Multi-task Learning-Based Approach for Detection and Localization
of Cyberattacks in Smart Grids
Abdulrahman Takiddin (Texas A&M University)*; Rachad Atat (Texas A&M University at Qatar);
Muhammad Ismail (Tennessee Tech University); Katherine Davis (Texas A&M University); Erchin
Serpedin ()
2345: UNTAG: Learning Generic Features for Unsupervised Type-Agnostic Deepfake Detection
Nesryne Mejri (Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of
Luxembourg)*; Enjie Ghorbel (SnT, University of Luxembourg); Djamila Aouada (SnT, University of
Luxembourg)
2407: Defense against black-box adversarial attacks via heterogeneous fusion features
Jiahuan Zhang (Hokkaido University)*; Keisuke Maeda (Hokkaido University); Takahiro Ogawa (Hokkaido
University); Miki Haseyama (Hokkaido University)
2449: Effect of Lossy Compression Algorithms on Face Image Quality and Recognition
Torsten Schlett (Hochschule Darmstadt)*; Sebastian Schachner (Hochschule Darmstadt); Christian
Rathgeb (Hochschule Darmstadt); Juan Tapia (hda); Christoph Busch (Hochschule Darmstadt)
2570: HE-GAN: Differentially Private GAN using Hamiltonian Monte Carlo based Exponential
Mechanism
Usman Hassan (University of Kentucky); Dongjie Chen (University of California, Davis)*; Sen-ching S
Cheung (University of Kentucky); Chen-Nee Chuah (University of California Davis)
3110: Backdoor Attack Against Automatic Speaker Verification Models in Federated Learning
Dan Meng (OPPO Research Institute); Xue Wang (Wuhan University); Jun Wang (OPPO Research
Institute)*
3529: Sparse Black-box Inversion Attack With Limited Information
Yixiao Xu (Institute of Computer Application, China Academy of Engineering Physics); Xiaolei Liu
(Institute of Computer Application, China Academy of Engineering Physics)*; Teng Hu (Institute of
Computer Application, China Academy of Engineering Physics); Bangzhou Xin (Institute of Computer
Application, China Academy of Engineering Physics); Run Yang (Institute of computer application,
Chinese Academy of Engineering Physics)
3608: Towards Practical Edge Inference Attacks against Graph Neural Networks
Kailai Li (Shanghai Jiao Tong University)*; Jiawei Sun (Shanghai Jiao Tong University); Ruoxin Chen
(Shanghai Jiao Tong University); Wei Ding (Shanghai Jiao Tong University); Kexue Yu (Shanghai Jiao
Tong University); Jie Li (Shanghai Jiao Tong University); Chentao Wu (Shanghai Jiao Tong University)
3662: Single Domain Dynamic Generalization for Iris Presentation Attack Detection
Yachun Li (Hikvision Research Institute)*; Jingjing Wang (Hikvision Research Institute); yuhui chen
(HIKVISION); Di Xie (Hikvision Research Institute); Shiliang Pu (Hikvision Research Institute)
3684: Learning Expressive and Generalizable Motion Features for Face Forgery Detection
Jingyi Zhang (Hikvision Research Institute)*; Peng Zhang (Hikvision Research Institute); Jingjing Wang
(Hikvision Research Institute); Di Xie (Hikvision Research Institute); Shiliang Pu (Hikvision Research
Institute)
4121: Efficient Privacy Preserving Graph Neural Network for Node Classification
Xinjun Pei (Central South University); Xiaoheng Deng (Central South University)*; Shengwei Tian
(Xinjiang University); Kaiping Xue (University of Science and Technology of China)
4206: Towards Adversarially Robust Continual Learning
Tao Bai (Nanyang Technological University)*; Chen Chen (Sony AI); Lingjuan Lyu (Sony AI); Jun Zhao
(Nanyang Technological University); Bihan Wen (Nanyang Technological University)
4490: Hearing and Seeing Abnormality: Self-supervised Audio-Visual Mutual Learning for
Deepfake Detection
ChangSung Sung (National Taiwan University)*; Jun-Cheng Chen (Academia Sinica); Chu-Song Chen
(National Taiwan University)
4514: Two-branch multi-scale deep neural network for generalized document recapture attack
detection
Li Jiaxing (City University of Hong Kong); Chenqi KONG (City University of Hong Kong); Shiqi Wang (City
University of Hong Kong); Haoliang Li (CityU)*
4877: CSM in Motion Vector Steganalysis: The Effect of Coders on Motion Vectors in H.264 Video
Encoding
Verena Lachner (ZITiS)*; Katharina Schaar (ZITiS); Ralf Zimmermann (ZITiS)
4920: Prosody is Not Identity: A Speaker Anonymization Approach Using Prosody Cloning
Sarina Meyer (University of Stuttgart)*; Florian Lux (University of Stuttgart); Julia Koch (University of
Stuttgart); Pavel Denisov (University of Stuttgart); Pascal Tilli (University of Stuttgart); Ngoc Thang Vu
(University of Stuttgart)
5189: A Role Engineering Approach based on Spectral Clustering Analysis for RESTful
Permissions in Cloud
Yutang Xia (Peking University)*; Yang Luo (Peking University); Wu Luo (Peking University); Qingni Shen
(Peking University); Yahui Yang (Peking University); Zhonghai Wu (Peking University)
5628: Styx: Adaptive Poisoning Attacks against Byzantine-Robust defenses in Federated Learning
Yuxin Wen (University of Maryland)*; Jonas A. Geiping (University of Maryland, College Park); Micah
Goldblum (University of Maryland); Tom Goldstein (University of Maryland, College Park)
5936: LEARNING SPARSE ALIGNMENTS VIA OPTIMAL TRANSPORT FOR CROSS-DOMAIN FAKE
NEWS DETECTION
Wei Tang (Beijing University of Posts and Telecommunications)*; zuyao ma (Beijing University of Posts
and Telecommunications)
5944: MAKE YOUR ENEMY YOUR FRIEND: IMPROVING IMAGE ROTATION ANGLE ESTIMATION
WITH HARMONICS
yu kun (School of Computer Science & Technology Southwest University of Science and Technology
Mianyang, China); Morteza Darvish Morshedi Hosseini (State University of New York at Binghamton);
Anjie Peng (Southwest University of Science and Technology); Hui Zeng (Southwest University of
Science and Technology)*; Miroslav Goljan (State University of New York at Binghamton)
6310: On the detection of synthetic images generated by diffusion models
Riccardo Corvi (University Federico II of Naples); Davide Cozzolino (University Federico II of Naples);
Giada Zingarini (University Federico II of Naples); Giovanni Poggi (University Federico II of Naples); Koki
Nagano (NVIDIA); Luisa Verdoliva (University Federico II of Naples)*
Machine Learning for Signal Processing
103: Overcoming the Seesaw in Monocular 3D Object Detection via Language Knowledge
Transferring
Weichen Xu (Peking University)*; Tianhao Fu (Peking University)
156: HDNet: Hierarchical Dynamic Network for Gait Recognition using Millimeter-Wave Radar
Yanyan Huang (Zhejiang University)*; Yong Wang (Zhejiang University); Kun Shi (Zhejiang University );
Chaojie Gu (Zhejiang University); Yu Fu (Zhejiang University); Cheng Zhuo (Zhejiang University); Zhiguo
Shi (Zhejiang University)
193: A Multi-Stage Triple-Path Method for Speech Separation in Noisy and Reverberant
Environments
Zhaoxi Mu (Xi'an Jiaotong University)*; Xinyu Yang (Xi'an Jiaotong University); Xiangyuan Yang (Xi'an
Jiaotong University); WenJing Zhu (DXM)
226: SD-PINN: Physics informed neural networks for spatially dependent PDEs
Ruixian Liu (University of California, San Diego)*; Peter Gerstoft (University of California San Diego)
287: Optimization for Robustness Evaluation beyond Lp Metrics
Hengyue Liang (University of Minnesota)*; Buyun Liang (University of Minnesota); Ying Cui (University of
Minnesota); Tim Mitchell (Queens College / CUNY); Ju Sun (University of Minnesota)
386: Preformer: Predictive Transformer with Multi-Scale Segment-wise Correlations for Long-Term
Time Series Forecasting
Dazhao Du (Institute of Software Chinese Academy of Sciences); Bing Su (Renmin University of China)*;
Zhewei Wei (Renmin University of China)
418: Hierarchical Hypergraph Recurrent Attention Network for Temporal Knowledge Graph
Reasoning
Jiayan Guo (Peking University)*; Meiqi Chen (Peking University); Yan Zhang (Peking University);
Jianqiang Huang (Meituan); zhiwei liu (meituan)
429: PointACL: Adversarial Contrastive Learning for Robust Point Clouds Representation under
Adversarial Attack
Junxuan Huang (University at Buffalo)*; Junsong Yuan (State University of New York at Buffalo, USA);
Chunming Qiao (University at Buffalo); yatong an (xmotors); Cheng Lu (Xiaopeng); Chen Bai (Xpeng
Motors)
560: DepthFormer: Multimodal Positional Encodings and Cross-Input Attention for Transformer-
Based Segmentation Networks
Francesco Barbato (University of Padova)*; Giulia Rizzoli (University of Padova); Pietro Zanuttigh
(University of Padova)
596: Adaptive Scale and Spatial Aggregation for Real-time Object Detection
Wei Chen (College of Computer, National University of Defense Technology); Yulin He (National
University of Defense Technology)*; Zhengfa Liang (Defense Innovation Institute); Yulan Guo (National
University of Defense Technology)
599: JOINT HUMAN ORIENTATION-ACTIVITY RECOGNITION USING WIFI SIGNALS FOR HUMAN-
MACHINE INTERACTION
Hojjat Salehinejad (Mayo Clinic)*; Navid Hasanzadeh (University of Toronto); Radomir Djogo (University
of Toronto); Shahrokh Valaee (University of Toronto)
624: Enhanced Low-resolution LiDAR-Camera Calibration Via Depth Interpolation and Supervised
Contrastive Learning
Zhikang Zhang (Arizona State University)*; Zifan Yu (Arizona State University); Suya You (U.S. Army
Research Laboratory); Raghuveer Rao (Army Research Laboratory); Sanjeev Agarwal (U.S. Army
DEVCOM C5ISR Center); Fengbo Ren (Arizona State University)
643: Learning to Generate 3D Representations of Building Roofs Using Single-View Aerial Imagery
Maxim Khomiakov (Technical University of Denmark)*; Alejandro Valverde Mahou (Technical University of
Denmark); Alba Reinders Sánchez (Technical University of Denmark ); Jes Frellsen (Technical University
of Denmark); Michael Andersen (Technical University of Denmark)
681: Stochastic Optimization of Vector Quantization Methods in Application to Speech and Image
Processing
Mohammad Hassan Vali (Aalto University)*; Tom Bäckström (Aalto University)
684: TENSOR COMPLETION FOR EFFICIENT AND ACCURATE HYPERPARAMETER
OPTIMISATION IN LARGE-SCALE STATISTICAL LEARNING
Aaman Rebello (Imperial College London); Kriton Konstantinidis (Imperial College London)*; Yao Lei Xu
(Imperial College London); Danilo P. Mandic (Imperial College of London, UK)
693: Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-
linked Inputs
Kenneth Ooi (Nanyang Technological University)*; Karn N Watcharasupat (Georgia Institute of
Technology); Bhan Lam (NTU); Zhen-Ting Ong (Nanyang Technological University); Woon Seng Gan
(NTU )
730: K2NN: Self-supervised Learning with Hierarchical Nearest Neighbors for Remote Sensing
Jianlong Yuan (Alibaba Group)*; Yuanhong Xu (Alibaba Group); Zhibin Wang (Alibaba Group)
815: KalmanBOT: KalmanNet and Bollinger Bands based Learned Trader for Pairs Trading
Haoran Deng (ETH Zürich); Guy Revach (ETH Zürich)*; Hai Morgenstern (BeyondMinds); Nir Shlezinger
(Ben-Gurion University)
816: COMPLEMENTARY LEARNING SYSTEM BASED INTRINSIC REWARD IN REINFORCEMENT
LEARNING
Zijian Gao (National University of Defense Technology); Kele Xu (National Key Laboratory of Parallel and
Distributed Processing (PDL))*; Hongda Jia (National University of Defense Technology); Tianjiao Wan
(National University of Defense Technology); Ding Bo (National University of Defense Technology); Dawei
Feng (National University of Defense Technology); Xinjun Mao (National University of Defense
Technology); Huaimin Wang (National University of Defense Technology)
841: Hierarchical Multi-Task Learning for Fabric Component Analysis Based on NIR Spectral
Signals
Joseph Kim (Fudan University); Dong Wu (Fudan University); Mingmin Chi (Fudan University)*; Gaoqi Xu
(Zhongshan PoolNet Technology Co. Ltd.)
903: CO-Net: Classification-oriented Point Cloud Sampling via Informative Feature Learning and
Non-overlapped Local Adjustment
Yanan Lin (Xiamen University)*; Keyu Chen (East China Normal University); Shihao Zhou (Xiamen
University); Yunan Huang (Xiamen University); Yunqi Lei (Xiamen university)
931: Wasserstein GAN synthesis for time series with complex temporal dynamics: Frugal
architectures and arbitrary sample-size generation
Thomas Beroud (Ecole Centrale Nantes); Patrice Abry (CNRS, Physics Department, Ecole Normale
Supérieure de Lyon)*; Yannick Malevergne (Univ. Paris1); Marc Senneret (Vivienne Investissement );
Gerald Perrin (Vivienne Investissement); Johan Macq (Vivienne Investissement)
944: TransAdapt: A Transformative Framework for Online Test Time Adaptive Semantic
Segmentation
Debasmit Das (Qualcomm AI Research)*; Shubhankar Borse (Qualcomm AI Research ); Hyojin Park
(Qualcomm AI Research); Kambiz Azarian (Qualcomm AI Research); Hong Cai (Qualcomm AI
Research); Risheek Garrepalli (Qualcomm AI Research); Fatih Porikli (Qualcomm AI Research)
1011: IoU-Aware Multi-Expert Cascade Network via Dynamic Ensemble for Long-tailed Object
Detection
Wan-Cyuan Fan (National Taiwan University)*; Cheng-Yao Hong (Academia Sinica); Yen-Chi Hsu
(Academia Sinica); Tyng-Luh Liu (Academia Sinica)
1126: FlowReg: Latent Space Regularization using Normalizing Flow for Limited Samples Learning
Chi Wang (Queen's University Belfast)*; Jian Gao (Queen's University Belfast); Yang Hua (Queen's
University Belfast); Hui Wang (Queen's University Belfast)
1135: Conditional LS-GAN based Skylight Polarization Image Restoration and Application in
Meridian Localization
Tian Yang (Hefei University of Technology); Hongbo Bo (University of Bristol); Xinyu Yang (Lancaster
University); Jun Gao (Hefei University of Technology); Zijian Shi (University of Bristol)*
1137: Towards Trustworthy Multi-label Sewer Defect Classification via Evidential Deep Learning
Chenyang Zhao (School of Software, Southeast University); Chuanfei Hu (University of Shanghai for
Science and Technology)*; Hang Shao (University of Shanghai for Science and Technology); Zhe Wang
(University of Shanghai for Science and Technology); Yongxiong Wang (University of Shanghai for
Science and Technology)
1190: Toward Asymptotic Optimality: Sequential Unsupervised Regression of Density Ratio for
Early Classification
Akinori F Ebihara (NEC Corporation)*; Taiki Miyagawa (NEC Corporation); Kazuyuki Sakurai (NEC
Biometrics Research Laboratories); Hitoshi Imaoka (NEC Corporation)
1266: T5-SR: A Unified Seq-to-Seq Decoding Strategy for Semantic Parsing
Yuntao Li (Peking University)*; Zhenpeng Su (University of Chinese Academy of Sciences); yutian li
(Meituan Group); Zhang Hanchu (meituan); Sirui Wang (Meituan); Wei Wu (Meituan-Dianping Group);
Yan Zhang (Peking University)
1269: AD-YOLO: YOU LOOK ONLY ONCE IN TRAINING MULTIPLE SOUND EVENT LOCALIZATION
AND DETECTION
Jin Sob Kim (Korea University)*; Hyun Joon Park (Korea University); Wooseok Shin (Korea University);
Sung Won Han (Korea University)
1271: Bag of Tricks with Quantized Convolutional Neural Networks for image classification
Jie Hu (Institute of Software Chinese Academy of Sciences)*; Mengze Zeng (Momenta); Enhua Wu
(SKLCS, Institute of Software, Chinese Academy of Sciences, Beijing, China;Faculty of Science and
Technology, University of Macau, Macao, China )
1379: Anomalous Sound Detection using Audio Representation with Machine ID based
Contrastive Learning Pretraining
Jian Guan (Harbin Engineering University)*; Feiyang Xiao (Harbin Engineering University); Youde Liu (
Harbin Institute of Technology); Qiaoxi Zhu (University of Technology Sydney); Wenwu Wang (University
of Surrey)
1400: Bias Identification with RankPix Saliency
salamata konate (QUT)*; Leo Lebrat (CSIRO); Rodrigo Santa Cruz (CSIRO); Clinton Fookes
(Queensland University of Technology); Andrew Bradley (Queensland University of Technology); Olivier
Salvado (CSIRO)
1421: Designing Transformer networks for sparse recovery of sequential data using deep
unfolding
Brent De Weerdt (Vrije Universiteit Brussel )*; Yonina Eldar (); Nikos Deligiannis (Vrije Universiteit Brussel
- imec)
1429: Framewise multiple sound source localization and counting using binaural spatial audio
signals
Lei Wang (Shanghai Jiao Tong University)*; Zhibin Jiao (Huawei Technologies Co., Ltd.); Qiyong Zhao
(Huawei Technologies Co., Ltd.); jie zhu (Shanghai Jiao Tong University); Yang Fu (Huawei Technologies
Co., Ltd.)
1442: An improved optimal transport kernel embedding method with gating mechanism for
singing voice separation and speaker identification
Weitao Yuan (Tiangong University)*; Yuren Bian (Tiangong University); Shengbei Wang (Tiangong
University); Masashi Unoki (JAIST); Wenwu Wang (University of Surrey)
1463: Not All Classes are Equal: Adaptively Focus-Aware Confidence for Semi-Supervised Object
Detection
Hui Zhu (Institute of Computing Technology, Chinese Academy of Sciences)*; Yongchun Lu (Mashang
Consumer Finance Co., Ltd.); hongyu zhao (Mashang Consumer Finance Company Ltd.); Guoqing Zhao
(Mashang Consumer Finance Co., Ltd); Xiaofang Zhao (Institute of Computing Technology, Chinese
Academy of Sciences; Institute of Intelligent Computing Technology, Suzhou, CAS)
1474: Int-GNN: a User Intention Aware Graph Neural Network for Session-Based Recommendation
Guangning Xu (Harbin Institute of Technology, Shenzhen)*; Jinyang Yang (Harbin Institute of
Technology, Shenzhen); Jinjin Guo (JD Intelligent Cities Research); Zhichao Huang (JD Intelligent Cities
Research); Bowen Zhang (Shenzhen Technology University)
1497: Interpretability in the Context of Sequential Cost-Sensitive Feature Acquisition
Yasitha Warahena Liyanage (Microsoft); Daphney-Stavroula Zois (University at Albany)*
1515: HEiMDaL: Highly Efficient Method for Detection and Localization of wake-words
Arnav Kundu (Apple)*; Mohammad Samragh (Apple); Minsik Cho (Apple ); Priyanka Padmanabhan
(Apple); Devang Naik (Apple)
1600: LEVERAGING SPARSITY WITH SPIKING RECURRENT NEURAL NETWORKS FOR ENERGY-
EFFICIENT KEYWORD SPOTTING
Manon Dampfhoffer (SPINTEC University Grenoble Alpes)*; Thomas Mesquida (CEA LIST); Emmanuel
Hardy (CEA-Leti); Alexandre Valentian (CEA-List); Lorena Anghel (SPINTEC University Grenoble Alpes)
1622: Efficient Compressed Video Action Recognition via Late Fusion with a Single Network
Hayato Terao (Hokkaido University)*; Wataru Noguchi (Hokkaido University); Hiroyuki Iizuka (Hokkaido
University); Masahito Yamamoto (Hokkaido University)
1649: Amicable Aid: Perturbing Images to Improve Classification Performance
Juyeop Kim (Yonsei University); Jun-Ho Choi (Yonsei University); Soobeom Jang (Yonsei University);
Jong-Seok Lee (Yonsei University, Korea)*
1658: Channel-driven decentralized Bayesian federated learning for trustworthy decision making
in D2D networks
Luca Barbieri (Politecnico di Milano)*; Osvaldo Simeone (King's College London); Monica Nicoli
(Politecnico di Milano University)
1662: Cross-device Federated Learning for Mobile Health Diagnostics: A First Study on COVID-19
Detection
Tong Xia (University of Cambridge)*; Jing Han (); Abhirup Ghosh (University of Cambridge); Cecilia
Mascolo (University of Cambridge)
1692: Independent Vector Analysis with multivariate Gaussian model: a scalable method by
multilinear regression
Ben Gabrielson (University of Maryland, Baltimore County)*; Mingyu Sun (University of Maryland,
Baltimore County); Mohammad Akhonda (UMBC); Vince Calhoun (TReNDS); Tulay Adali (University of
Maryland, Baltimore County)
1701: Sparse Mixture Once-for-all Adversarial Training for Efficient In-Situ Trade-Off Between
Accuracy and Robustness of DNNs
Souvik Kundu (University of Southern California)*; Sairam Sundaresan (Intel AI Lab); Sharath Nittur
Sridhar (Intel AI Lab); SHUNLIN LU (The Chinese University of Hong Kong); han Tang (University of
Southern California); Peter A. Beerel (University of Southern California)
1706: Cross Modality Knowledge Distillation for Robust Pedestrian Detection in Low Light and
Adverse Weather Conditions
Mazin Hnewa (Michigan State University)*; Alireza Rahimpour (Ford Motor Company- Palo Alto); Justin
Miller (Ford); Devesh Upadhyay (Ford Motor Co.); Hayder Radha (Michigan State University)
1738: Training Robust Spiking Neural Networks on Neuromorphic Data with Spatiotemporal
Fragments
Haibo Shen (Huazhong University of Science and Technology)*; Yihao Luo (Yichang Testing Technique
R&D Institute); Xiang Cao (School of Computer Science and Technology, Huazhong University of Science
and Technology); Liangqi Zhang (Huazhong University of Science and Technology); Juyu Xiao (Huazhong
University of Science and Technology); Tianjiang Wang (School of Computer Science and Technology,
Huazhong University of Science and Technology)
1810: Aleatoric Uncertainty Estimation of Overnight Sleep Statistics through Posterior Sampling
using Conditional Normalizing Flows
Hans van Gorp (Eindhoven University of Technology)*; Merel M. van Gilst (Eindhoven University of
Technology); Pedro Fonseca (Philips Research); Sebastiaan Overeem (Eindhoven University of
Technology); Ruud J. G. van Sloun (Technical university of Eindhoven)
1832: HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement
Pavel Andreev (Samsung AI Center Moscow)*; Aibek Alanov (Artificial Intelligence Research Institute);
Oleg Ivanov (Samsung AI Center Moscow); Dmitry P Vetrov (Higher School of Economics)
1837: Refined Pseudo Labeling for Source-free Domain Adaptive Object Detection
Siqi Zhang (Institute of Automation,Chinese Academy of Sciences)*; Lu Zhang (CASIA); Zhiyong Liu
(State Key Lab of Management and Control for Complex Systems, Institute of Automation, Chinese
Academy of Sciences)
1878: Jazznet: A Dataset of Fundamental Piano Patterns for Music Audio Machine Learning
Research
Tosiron Adegbija (University of Arizona)*
1961: The MBSTOI Binaural Intelligibility Metric Using a Close-Talking Microphone Reference
Pierre Guiraud (Imperial College London)*; Alastair H Moore (Imperial College London); Rebecca Vos
(Imperial College London); Patrick A. Naylor (Imperial College London); Mike Brookes (Imperial College
London)
1977: Batch Normalization damages Federated Learning on Non-IID data: Analysis and Remedy
Yanmeng Wang (The Chinese University of Hong Kong, Shenzhen)*; Qingjiang Shi (Tongji University);
Tsung-Hui Chang (The Chinese University of Hong Kong)
2070: Multilayer Subspace Learning with Self-sparse Robustness for Two-dimensional Feature
Extraction
Han Zhang (Xidian University)*; Maoguo Gong (Xidian University); Feiping Nie (Northwestern
Polytechnical University); Xuelong Li (Northwestern Polytechnical University)
2091: Deep Survival Analysis and Counterfactual Inference Using Balanced Representations
Muskan Gupta (Tata Consultancy Services - Research); Gokul Kannan (NITT); Ranjitha Prasad (IIIT
Delhi)*; Garima Gupta (TCS Innovation Labs, Delhi)
2117: Batch-Ensemble Stochastic Neural Networks for Out-of-Distribution Detection
Xiongjie Chen (University of Surrey)*; Yunpeng Li (University of Surrey); Yongxin Yang (Queen Mary
University of London)
2149: Dynamic Vehicle Graph Interaction for Trajectory Prediction based on Video Signals
Jian Chen (Sun Yat-sen University); Wei Wang (Shenzhen MSU-BIT University)*; Junxin Chen (Dalian
University of Technology); Ming Cai (School of Engineering, Sun Yat-sen University)
2161: LQGNet: Hybrid Model-Based and Data-Driven Linear Quadratic Stochastic Control
Solomon Goldgraber Casspi (Ben-Gurion University of the Negev); Oliver Husser (ETH Zurich); Guy
Revach (ETH Zürich); Nir Shlezinger (Ben-Gurion University)*
2195: WHC: Weighted Hybrid Criterion for Filter Pruning on Convolutional Neural Networks
Shaowu Chen (Shenzhen University)*; Weize Sun (Shenzhen University); Lei Huang (Shenzhen
University)
2231: SDG-L: A Semiparametric Deep Gaussian Process based Framework for Battery Capacity
Prediction
Hanbing Liu (Tsinghua University); Yanru Wu (Tsinghua University); Yang Li (Tsinghua-Berkeley
Shenzhen Institute, Tsinghua University); Ercan E Kuruoglu (Tsinghua-Berkeley Shenzhen Institute)*;
Xuan Zhang (Tsinghua University)
2246: POLICE: Provably Optimal Linear Constraint Enforcement for Deep Neural Networks
Randall Balestriero (Facebook AI Research)*; Yann LeCun (Facebook)
2275: Removing Radio Frequency Interference from Auroral Kilometric Radiation with Stacked
Autoencoders
Allen Chang (University of Southern California)*; Mary Knapp (Massachusetts Institute of Technology
Haystack Observatory); James LaBelle (Dartmouth College); John Swoboda (Massachusetts Institute of
Technology Haystack Observatory); Ryan Volz (Massachusetts Institute of Technology Haystack
Observatory); Philip Erickson (Massachusetts Institute of Technology Haystack Observatory)
2391: MCKD: Mutually Collaborative Knowledge Distillation for Federated Domain Adaptation and
Generalization
Ziwei Niu (Zhejiang University); Hongyi Wang (Zhejiang University); Hao Sun (Zhejiang University); Shuyi
Ouyang (Zhejiang University); Yen-Wei Chen (Ritsumeikan University); Lanfen Lin (Zhejiang University)*
2398: A BANDIT ONLINE CONVEX OPTIMIZATION APPROACH TO DISTRIBUTED ENERGY
MANAGEMENT IN NETWORKED SYSTEMS
Ioannis Tsetis (University of Tübingen)*; Xiaotong Cheng (University Tübingen); Setareh Maghsudi
(University of Tübingen)
2443: FFFN: Fashion Feature Fusion Network by Co-attention Model for Fashion Recommendation
Zhantu Lin (College of Computer Science and Software Engineering, Shenzhen University); Xiaoyan
Zhang (College of Computer Science and Software Engineering, Shenzhen University)*
2451: RØROS: Building a Responsive Online Recommender System via Meta-Gradients Updating
Xudong Pan (Fudan University)*; Mi Zhang (Fudan University); Duocai Wu (Ant Group)
2501: Learnable frontends that do not learn: Quantifying sensitivity to filterbank initialisation
Mark Anderson (Trinity College Dublin)*; Tomi H. Kinnunen (University of Eastern Finland); Naomi Harte
(Trinity College Dublin)
2596: Learned Kalman Filtering in Latent Space with High-Dimensional Data
Itay Buchnik (Ben Gurion University); Damiano Steger (ETH Zurich); Guy Revach (ETH Zürich); Ruud J.
G. van Sloun (Technical university of Eindhoven); Tirza S Routtenberg (Ben Gurion University of the
Negev); Nir Shlezinger (Ben-Gurion University)*
2626: Utility pole localization by learning from ambient traces on distributed acoustic sensing
Zhuocheng Jiang (NEC laboratories America, Inc. )*; Yue Tian (NEC laboratories America, Inc. ); Yangmin
Ding (NEC Labs America); Sarper Ozharar (NEC laboratories America, Inc.); Ting Wang (NEC
laboratories America, Inc.)
2651: Generative Modeling Based Manifold Learning for Adaptive Filtering Guidance
Karim Helwani (Amazon)*; Paris Smaragdis (University of Illinois at Urbana-Champaign); Michael M
Goodwin (AWS )
2707: Training Stronger Spiking Neural Networks with Biomimetic Adaptive Internal Association
Neurons
Haibo Shen (Huazhong University of Science and Technology)*; Yihao Luo (Yichang Testing Technique
R&D Institute); Xiang Cao (School of Computer Science and Technology, Huazhong University of Science
and Technology); Liangqi Zhang (Huazhong University of Science and Technology); Juyu Xiao (Huazhong
University of Science and Technology); Tianjiang Wang (School of Computer Science and Technology,
Huazhong University of Science and Technology)
2745: Robust Time Series Recovery and Classification Using Test-Time Noise Simulator Networks
Eun Som Jeon (Arizona State University)*; Suhas Lohit (Mitsubishi Electric Research Laboratories);
Rushil Anirudh (Lawrence Livermore National Laboratory); Pavan Turaga (Arizona State University)
2785: Improving Noisy Student Training on Non-target Domain Data for Automatic Speech
Recognition
YU CHEN (University of Hong Kong)*; Wen Ding (NVIDIA); Junjie Lai (NVIDIA)
2794: Strategies for Enhanced Signal Modulation Classifications Under Unknown Symbol Rates
and Noise Conditions
Ruixuan Wang (Villanova University); Yue Qi (Villanova University); Mojtaba Vaezi (Villanova University);
Xun Jiao (Villanova University)*; Moeness Amin (Villanova University)
2848: Regularized Deep Generative Model Learning for Real-time Massive MIMO Channel Tracking
Lixiang Lian (ShanghaiTech University ); Ben Wang (ShanghaiTech University)*
2853: PU-EdgeFormer: Edge Transformer for Dense Prediction in Point Cloud Upsampling
Dohoon Kim (Chung-Ang University); Minwoo Shin (Chungang University); Joonki Paik (Chungang
University)*
2886: Towards Real-Time Person Search with Invariant Feature Learning
Chengyou Jia (Xi'an Jiaotong University)*; Minnan Luo (School of Electronic and Information Engineering,
Xi'an Jiaotong University); Zhuohang Dang (Xi'an Jiaotong University); Xiaojun Chang (University of
Technology Sydney); Qinghua Zheng (Xi'an Jiaotong University)
2906: Select the Best: Enhancing Graph Representation with Adaptive Negative Sample Selection
Xiangping Zheng (Renmin University of China)*; Xun Liang (Renmin University of China); Bo Wu (Renmin
University of China)
3028: Constrained Dynamical Neural ODE for Time Series Modelling: A Case Study on Continuous
Emotion Prediction
Ting Dang (University of Cambridge)*; Antoni Dimitriadis (University of New South Wales); Jingyao Wu
(University of New South Wales); Vidhyasaharan Sethu (University of New South Wales); Eliathamby
Ambikairajah (The University of New South Wales)
3085: Voice Conversion Using Feature Specific Loss Function based Self-Attentive Generative
Adversarial Network
Sandipan Dhar (National Institute of Technology Durgapur); Padmanabha Banerjee (Jalpaiguri
Engineering College); Dr. Nanda Dulal Jana (NIT Durgapur); Swagatam Das (Indian Statistical Institute)*
3100: Self-supervised Facial Action Unit Detection with Region and Relation Learning
Juan Song (Tianjin University); Zhilei Liu (Tianjin University)*
3202: WordReg: Mitigating the Gap between Training and Inference with Worst-case Drop
Regularization
Jun Xia (Westlake University)*; Ge Wang (Westlake University); Bozhen Hu (Zhejiang University &
Westlake University); Cheng Tan (Zhejiang University & Westlake University); Jiangbin Zheng (Westlake
University); Yongjie Xu (Westlake University); Stan Z. Li (Westlake University)
3207: JOINT ANN-SNN CO-TRAINING FOR OBJECT DETECTION AND IMAGE SEGMENTATION
Marc J Baltes (Ohio University); Nidal Abuhajar (Ohio University); Ye Yue (Ohio University); Charles
Smith (University of Kentucky); Jundong Liu (Ohio University)*
3256: Enrollment Rate Prediction in Clinical Trials based on CDF Sketching and Tensor
Factorization tools
Magda Amiridi (University of Virginia)*; Cheng Qian (IQVIA); Nicholas D Sidiropoulos (University of
Virginia); Lucas Glass (IQVIA)
3267: Prune then Distill: Dataset Distillation with Importance Sampling
Anirudh S Sundar (Georgia Institute of Technology)*; Gokce Keskin (Amazon Inc.); Chander Chandak
(Amazon Inc.); I-Fan Chen (Amazon Inc.); Pegah Ghahremani (Amazon Inc.); Shalini Ghosh (Amazon
Alexa AI)
3380: Improving Electric Load Demand Forecasting with Anchor-based Forecasting Method
Maria Tzelepi (Aristotle University of Thessaloniki)*; Paraskevi Nousi (Aristotle University of Thessaloniki);
ANASTASIOS TEFAS (Aristotle University of Thessaloniki)
3392: Bayesian Optimization with Ensemble Learning Models and Adaptive Expected
Improvement
Konstantinos D. Polyzos (University of Minnesota)*; Qin Lu (University of Minnesota); Georgios B.
Giannakis (University of Minnesota)
3402: Active Subsampling Using Deep Generative Models by Maximizing Expected Information
Gain
Koen van de Camp (Eindhoven University of Technology)*; Hamdi Joudeh (Eindhoven University of
Technology); Duarte Antunes (Eindhoven University of Technology); Ruud J. G. van Sloun (Technical
university of Eindhoven)
3438: Zero-shot domain adaptation of anomalous samples for semi-supervised anomaly detection
Tomoya Nishida (Hitachi, Ltd.)*; Takashi Endo (Hitachi, Ltd.); Yohei Kawaguchi (Hitachi, Ltd.)
3492: Search for efficient deep visual-inertial odometry through neural architecture search
Yu Chen (University of Michigan)*; Mingyu Yang (University of Michigan); Hun Seok Kim (Nil)
3546: Multimodal Knowledge Distillation for Arbitrary-Oriented Object Detection in Aerial Images
Zhanchao Huang (Beijing Institute of Technology)*; Wei Li (Beijing Institute of Technology, Beijing, China);
Ran Tao (Beijing Institute of Technology)
3571: A Bayesian Perspective for Determinant Minimization Based Robust Structured Matrix
Factorization
Gokcan Tatli (University of Wisconsin-Madison)*; Alper Erdogan (Koc University)
3590: Large-Scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-
Caption Augmentation
Yusong Wu (Mila, University of Montreal)*; Ke Chen (University of California San Diego); Tianyu Zhang
(Mila, Université de Montréal); Yuchen Hui (Université de Montréal); Taylor Berg-Kirkpatrick (UCSD);
Shlomo Dubnov (UC San Diego)
3670: Convex Optimization of Deep Polynomial and ReLU Activation Neural Networks
Burak Bartan (Stanford University)*; Mert Pilanci (Stanford University)
3740: A MEMORY-FREE EVOLVING BIPOLAR NEURAL NETWORK FOR EFFICIENT MULTI-LABEL
STREAM LEARNING
Sourav Mishra (Indian Institute of Science, Bangalore)*; Suresh Sundaram (Indian Institute of Science)
3748: UAV Local Path Planning Based on Improved Proximal Policy Optimization Algorithm
Jiahao xu (Nanjing University of Aeronautics and Astronautics)*; Xuefeng Yan (Nanjing University of
Aeronautics and Astronautics ); Peng Cui (Dalian Naval Academy); Xinquan Wu (Nanjing University of
Aeronautics and Astronautics); Lipeng Gu (Nanjing University of Aeronautics and Astronautics); Yan biao
Niu (Nanjing University of Aeronautics and Astronautics)
3750: Does Your Model Think Like an Engineer? Explainable AI for Bearing Fault Detection with
Deep Learning
Thomas Decker (Siemens AG and Ludwig Maximilians University)*; Michael Lebacher (Siemens AG);
Volker Tresp (Siemens AG and Ludwig Maximilian University of Munich )
3766: Intent Does Matter! Propagating High-order Relations for Exploring Interest Preferences
Xiangping Zheng (Renmin University of China)*; Xun Liang (Renmin University of China); Bo Wu (Renmin
University of China); Junlan Feng (China Mobile Research Institute); Yuhui Guo (Renmin University of
China); Sensen Zhang (Renmin University of China)
3798: Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General
Sound Classification
Zuheng Kang (Ping An Technology (Shenzhen) Co., Ltd); Yayun He (Ping An Technology (Shenzhen)
Co., Ltd); Jianzong Wang (Ping An Technology (Shenzhen) Co., Ltd)*; Junqing Peng (Ping An Technology
(Shenzhen) Co., Ltd); Xiaoyang Qu (Ping An Technology (Shenzhen) Co., Ltd); Jing Xiao (Ping An
Insurance (Group) Company of China)
3817: Learning from Label Proportion with Online Pseudo-Label Decision by Regret Minimization
Shinnosuke Matsuo (Kyushu University)*; Seiichi Uchida (Kyushu University); Ryoma Bise (Kyushu
University); Daiki Suehiro (Kyushu University)
3861: Spatial Cross-Attention for Transformer-based Image Captioning
Khoa Anh Ngo (Seoul National University)*; Kyuhong Shim (Seoul National University); Byonghyo Shim
(Seoul National University)
4047: SIGNAL RECONSTRUCTION FOR FMCW RADAR INTERFERENCE MITIGATION USING DEEP
UNFOLDING
Jeroen Overdevest (NXP Semiconductors, Technical University of Eindhoven)*; Arie G.C. Koppelaar
(NXP Semiconductors); Marco J.G. Bekooij (NXP Semiconductors); Jihwan Youn (Technical University of
Eindhoven); Ruud J. G. van Sloun (Technical university of Eindhoven)
4064: Frequency and Scale Perspectives of Feature Extraction
Liangqi Zhang (Huazhong University of Science and Technology)*; Yihao Luo (Yichang Testing Technique
R&D Institute); Xiang Cao (School of Computer Science and Technology, Huazhong University of Science
and Technology); Haibo Shen (Huazhong University of Science and Technology); Tianjiang Wang (School
of Computer Science and Technology, Huazhong University of Science and Technology)
4072: Receptive Field Reliant Zero-Cost Proxies for Neural Architecture Search
Prateek Keserwani (Samsung Research Institute Bangalore)*; Srinivas S Miriyala (Samsung Research
Institute Bangalore); Vikram Nelvoy Rajendiran (Samsung Research Institute Bangalore); Pradeep
Nelahonne Shivamurthappa (Samsung R & D Institute Bangalore)
4120: MCNet: Measurement-Consistent Networks via a Deep Implicit Layer for Solving Inverse
Problems
Rahul Mourya (Heriot-Watt University)*; Joao F.C. Mota (Heriot-Watt University)
4129: Joint Unmixing and Demosaicing Methods for Snapshot Spectral Images
Kinan ABBAS (Univ. Littoral Côte d'Opale, LISIC); Matthieu PUIGT (Univ. Littoral Côte d'Opale, LISIC)*;
Gilles Delmaire (LISIC); Gilles Roussel (Univ. Littoral Côte d'Opale)
4142: Neural Source Coding for bandwidth-efficient brain-computer interfacing with wireless
neuro-sensor networks
Thomas Strypsteen (KU Leuven)*; Alexander Bertrand (KU Leuven)
4211: A principled approach to model validation in domain generalization
Boyang Lyu (Tufts University); Thuan Nguyen (Tufts University)*; Matthias Scheutz (Tufts University);
Prakash Ishwar (Boston University); Shuchin Aeron (Tufts University)
4276: On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM
Signals
Gary CF Lee (MIT)*; Amir Weiss (Massachusetts Institute of Technology); Alejandro Lancho (MIT); Yury
Polyanskiy (MIT); Gregory W Wornell (MIT)
4350: A Deep Temporal Factor Analysis Method for Large Scale Financial Portfolio Selection
Yao Zhou (Shanghai JiaoTong University)*; Ruidan Su (Shanghai Jiao Tong University); Shikui Tu
(Shanghai Jiao Tong University); Lei Xu (Shanghai Jiao Tong University)
4384: Investigating SINDy As a Tool For Causal Discovery In Time Series Signals
Andrew O'Brien (Drexel University )*; Rosina Weber (Drexel University); Edward Kim (Drexel University)
4442: Multi-Label Temporal Evidential Neural Networks for Early Event Detection
Xujiang Zhao (NEC Lab America)*; Xuchao Zhang (Microsoft); Chen Zhao (Kitware Inc.); Jin-Hee Cho
(Virginia Tech); Lance Kaplan (DEVCOM Army Research Laboratory); DONG HYUN JEONG (University
of the District of Columbia); Audun Jøsang (University of Oslo); Haifeng Chen (NEC Labs); Feng Chen
(UT Dallas)
4498: Towards low-power heart rate estimation based on user's demographics and activity level
for wearables
Andre GC Pacheco (Samsung)*; Frank Cabello (Samsung); Paula Rodrigues (Samsung); Paula Pinto
(Samsung); Adriana Fonoff (Samsung); Otávio Penatti (Samsung)
4530: SafeDeep: A Scalable Robustness Verification Framework for Deep Neural Networks
Anahita Baninajjar (Lund University)*; Kamran Hosseini (Linköping University); Ahmed Rezine (Linköping
University); Amir Aminifar (Lund University)
4551: Improving the Stochastic Gradient Descent's test accuracy by manipulating the $\ell_\infty$
norm of its gradient approximation
Paul Rodriguez (PUCP)*
4563: Hierarchical Graph Learning for Stock Market Prediction via a Domain-Aware Graph Pooling
Operator
Arie N Arya (Imperial College London); Yao Lei Xu (Imperial College London)*; Ljubisa Stankovic
(University of Montenegro); Danilo P. Mandic (Imperial College of London, UK)
4631: VPPT: Visual Pre-trained Prompt Tuning Framework for Few-Shot Image Classification
Zhao Song (National Innovation Institute of Defense Technology); Ke YANG (NIIDT)*; Naiyang Guan
(National Innovation Institute of Defense Technology; Tianjin Artificial Intelligence Innovation Center);
Junjie Zhu (NIIDT); Peng Qiao (NUDT); Qingyong Hu (University of Oxford)
4668: Bayesian Network Modeling and Prediction of Transitions within the Homelessness System
Khandker Sadia Rahman (University at Albany)*; Daphney-Stavroula Zois (University at Albany);
Charalampos Chelmis (University at Albany)
4727: Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming
Yun-Ning Hung (TikTok)*; Chao-Han Huck Yang (Georgia Institute of Technology); Pin-Yu Chen (IBM
Research); Alexander Lerch (Georgia Institute of Technology)
4736: SLICER: Learning universal audio representations using low-resource self-supervised pre-
training
Ashish Seth (IIT Madras)*; Sreyan Ghosh (University of Maryland, College Park); S Umesh (IIT Chennai);
Dinesh Manocha (University of Maryland at College Park)
4742: Data-Driven Graph Convolutional Neural Networks for Power System Contingency Analysis
Valentin Bolz (DIgSILENT GmbH & University of Tuebingen)*; Johannes Ruess (DIgSILENT GmbH);
Andreas Zell (University of Tuebingen)
4762: Improving Self-Supervised Learning for Audio Representations by Feature Diversity and
Decorrelation
Bac Nguyen (Sony Europe B.V.)*; Stefan Uhlich (Sony European Technology Center); Fabien Cardinaux
(Sony European Technology Center)
4763: Asymptotically Optimal Nonparametric Classification Rules for Spike Train Data
Mirosław Pawlak (University of Manitoba); Mateusz Pabian (AGH UST); Dominik Rzepka (AGH
University of Science and Technology)*
4840: Client Selection for Generalization in Accelerated Federated Learning: A Bandit Approach
Dan Ben Ami (Ben-Gurion University of the Negev)*; Kobi Cohen (Ben-Gurion University of the Negev);
Qing Zhao (Cornell University)
4843: Robust Monocular Localization of Drones by Adapting Domain Maps to Depth Prediction
Inaccuracies
Priyesh Shukla (University of Illinois Chicago)*; Sureshkumar Senthilkumar (University of Illinois at
Chicago); Alex C Stutts (University of Illinois Chicago); Sathya Ravi (University of Illinois at Chicago);
Theja Tulabandhula (UIC); Amit R Trivedi (University of Illinois at Chicago)
4871: Anomalous signal detection for cyber-physical systems using interpretable causal neural
network
Shuo Zhang (East China Normal University)*; Jing Liu (East China Normal University)
4914: Counterfactual explanation for multivariate times series using a contrastive variational
autoencoder
William Todo (Liebherr Aerospace)*; Merwann Selmani (Liebherr Aerospace Toulouse); Béatrice Laurent
(Institut de Mathématiques de Toulouse (UMR 5219), Université de Toulouse, INSA de Toulouse); Jean-
Michel Loubes (Université Toulouse Paul Sabatier Institut de Mathématiques de Toulouse)
4915: GAITMIXER: SKELETON-BASED GAIT REPRESENTATION LEARNING VIA WIDE-SPECTRUM
MULTI-AXIAL MIXER
Ekkasit Pinyoanuntapong (University of North Carolina at Charlotte)*; Ayman Ali (UNCC); Pu Wang
(UNCC); Minwoo Lee (University of North Carolina at Charlotte); Chen Chen (University of Central
Florida)
5012: ANALYSING THE MASKED PREDICTIVE CODING TRAINING CRITERION FOR PRE-
TRAINING A SPEECH REPRESENTATION MODEL
Hemant Yadav (MIDAS)*; Sunayana Sitaram (Microsoft Research); Rajiv Ratn Shah (IIIT Delhi)
5036: Identifiable Bounded Component Analysis via Minimum Volume Enclosing Parallelotope
Jingzhou Hu (University of Florida); Kejun Huang (University of Florida)*
5042: DEEP LEARNING FOR LAGRANGIAN DRIFT SIMULATION AT THE SEA SURFACE
Daria Botvynko (ENIB)*; Carlos Granero-Belinchon (IMT Atlantique); Simon van Gennip (Mercator Ocean
International); Abdesslam BENZINOU (ENIB); Ronan Fablet (IMT Atlantique)
5062: SMUG: Towards robust MRI reconstruction by smoothed unrolling
Hui Li (Huazhong University of Science and Technology); Jinghan Jia (Michigan State University)*; Shijun
Liang (Michigan State University); Yuguang Yao (Michigan State University); Saiprasad Ravishankar
(Michigan State University); Sijia Liu (Michigan State University)
5072: Towards a Robust and Efficient Classifier for Real World Radio Signal Modulation
Classification
Dancheng Liu (University of California, San Diego)*; Kazim Ergun (University of California San Diego);
Tajana S Rosing (University of California, San Diego)
5081: Online Model Compression for Federated Learning with Large Models
Tien-Ju Yang (Google)*; Yonghui Xiao (Google); Giovanni Motta (Google, Inc.); Françoise Beaufays
(Google); Rajiv Mathews (Google); Mingqing Chen (Google Inc.)
5109: gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window
Mocho Go (PKSHA Technology Inc.)*; Hideyuki Tachibana (PKSHA Technology)
5159: GraphMAD: Graph Mixup for Data Augmentation using Data-Driven Convex Clustering
Madeline Navarro (Rice University)*; Santiago Segarra (Rice University)
5201: Training Graph Neural Networks on Growing Stochastic Graphs
Juan Cervino (University of Pennsylvania)*; Luana Ruiz (University of Pennsylvania); Alejandro Ribeiro
(University of Pennsylvania)
5237: Modeling the Wave Equation Using Physics-Informed Neural Networks Enhanced with
Attention to Loss Weights
Shaikhah Alkhadhr (Pennsylvania State University)*; Mohamed Almekkawy (Pennsylvania State
University)
5246: DIRECTION AWARE POSITIONAL AND STRUCTURAL ENCODING FOR DIRECTED GRAPH
NEURAL NETWORKS
Yonas A Sium (Iowa State University)*; Georgios Kollias (IBM Research); Tsuyoshi Ide (IBM Research, T.
J. Watson Research Center); Payel Das (IBM Research); Naoki Abe (IBM Research); Aurelie Lozano
(IBM Research); Qi Li (Iowa State University)
5265: Clip4VideoCap: Rethinking CLIP for Video Captioning with Multiscale Temporal Fusion and
Commonsense Knowledge
Tanvir Mahmud (The University of Texas at Austin)*; Feng Liang (The University of Texas at Austin);
Yaling Qing (University of Texas at Austin); Diana Marculescu (The University of Texas at Austin)
5273: Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects
Junghyun Koo (Seoul National University)*; Marco A Martinez Ramirez (Sony Group Corporation);
WeiHsiang Liao (Sony Group Corporation); Stefan Uhlich (Sony European Technology Center); Kyogu
Lee (Seoul National University); Yuki Mitsufuji (Sony Group Corporation)
5294: ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance
Measurement
Chaojian Li (Georgia Institute of Technology)*; Wenwan Chen (Rice University); Jiayi Yuan (Rice
University); Yingyan (Celine) Lin (Georgia Tech); Ashutosh Sabharwal (Rice University)
5327: Fast and Exact Enumeration of Deep Networks Partitions Regions
Randall Balestriero (Facebook AI Research)*; Yann LeCun (Facebook)
5329: CNEG-VC: Contrastive Learning using Hard Negative Example in Non-parallel Voice
Conversion
Bima Prihasto (National Central University); YiXing Lin (National Central University); Le Phuong (National
Central University); CHIEN-LIN HUANG (NCKU); Jia-Ching Wang (National Central University)*
5372: Towards Scale Adaptive Underwater Detection through Refined Pyramid Grid
Xiaoheng Deng (Central South University)*; Lirong Liao (Xinjiang University); Ping Jiang (Central South
University); Yurong Qian (Xinjiang University)
5397: Spammer Detection on Short Video Applications: A New Challenge and Baselines
Muyang Yi (Shanghai Jiao Tong University)*; Dong Liang (ByteDance); Rui Wang (ByteDance AI Lab);
Yue Ding (Shanghai Jiao Tong University); Hongtao Lu (Shanghai Jiao Tong University)
5487: A Gaussian Latent Variable Model for Incomplete Mixed Type Data
Marzieh Ajirak (Stony Brook University)*; Petar Djuric ()
5498: ONLINE CACHING WITH FETCHING COST FOR ARBITRARY DEMAND PATTERN: A DRIFT-
PLUS-PENALTY APPROACH
Shashank P (IIT Dharwad); Bharath Bettagere (IIT Dharwad)*
5514: Dual Collaborative Visual-Semantic Mapping for Multi-Label Zero-Shot Image Recognition
Yunqing Hu (Zhejiang University); Xuan Jin (Alibaba Turing Lab, Alibaba Group); Xi Chen (Zhejiang
University); Yin Zhang (Zhejiang University)*
5551: Dual-Stage Graph Convolution Network With Graph Learning For Traffic Prediction
Li Zilong (Heilongjiang University); Qianqian Ren (Heilongjiang University)*; Long Chen (Heilongjiang
University); Jianguo Sun (Xidian University)
5578: A Contrastive Knowledge Transfer Framework for Model Compression and Transfer
Learning
Kaiqi Zhao (Arizona State University)*; Yitao Chen (Arizona State University); Ming Zhao (Arizona State
University)
5620: A Probabilistic Framework for Pruning Transformers via a Finite Admixture of Keys
Tan Minh Nguyen (University of California, Los Angeles)*; Tam Minh Nguyen (FPT Software); Long Minh
Bui (FPT Software); Hai Do (FPT Software); Duy Khuong Nguyen (FPT Software Ltd. - FPT Corporation);
Dung D. D. Le (College of Engineering and Computer Science, VinUniversity); Hung Tran-The (Deakin
University); Nhat Ho (University of Texas at Austin); Stanley Osher (UCLA); Richard Baraniuk (Rice
University)
5666: Forensics for Adversarial Machine Learning through Attack Mapping Identification
Allen H Yan (Oregon State University)*; Jinsub Kim ("); Raviv Raich (Oregon State University)
5679: Multi-task Bias-Variance Trade-off Through Functional Constraints
Juan Cervino (University of Pennsylvania)*; Juan Andres Bazerque (University of Pittsburgh); Miguel
Calvo-Fullana (Universitat Pompeu Fabra); Alejandro Ribeiro (University of Pennsylvania)
5703: A hybrid deep neural network for nonlinear causality analysis in complex industrial control
system
Tian Feng (Zhejiang University)*; Qiming Chen (DAMO Academy, Alibaba Group); Yao Shi (Zhejiang
University); Xun Lang (Yunnan University); Lei Xie (Zhejiang University); Hongye Su (Zhejiang
University)
5854: Learning Unbiased Rewards with Mutual Information in Adversarial Imitation Learning
LiHua Zhang (School of Computer Science and Technology, Soochow University)*; Quan Liu (School of
Computer Science and Technology, Soochow University); Zhigang Huang (School of Computer Science
and Technology, Soochow University, Suzhou, China); Lan Wu (School of Computer Science and
Technology, Soochow University)
5885: Differential Analysis for Networks Obeying Conservation Laws
Anirudh Rayas (Arizona State University)*; Rajasekhar Anguluri (Arizona State University); Jiajun Cheng
(Arizona State University); Gautam Dasarathy (Arizona State University)
5891: Training Robust Spiking Neural Networks with ViewPoint Transform and SpatioTemporal
Stretching
Haibo Shen (Huazhong University of Science and Technology)*; Juyu Xiao (Huazhong University of
Science and Technology); Yihao Luo (Yichang Testing Technique R&D Institute); Xiang Cao (School of
Computer Science and Technology, Huazhong University of Science and Technology); Liangqi Zhang
(Huazhong University of Science and Technology); Tianjiang Wang (School of Computer Science and
Technology, Huazhong University of Science and Technology)
5963: LE-DTA: Local Extrema convolution for Drug Target Affinity Prediction
Tanoj Langore (National Taiwan University); Te-Cheng Hsu (National Tsing Hua University); Yi Hsien
Hsieh (National Taiwan University); Che Lin (National Taiwan University)*
6053: Diffusion Probabilistic Modeling for Fine-Grained Urban Traffic Flow Inference With Relaxed
Structural Constraint
Xovee Xu (University of Electronic Science and Technology of China)*; Yutao Wei (University of
Electronic Science and Technology of China); Pengyu Wang (School of Information and Software
Engineering, University of Electronic Science and Technology of China); Xucheng Luo (University of
Electronic Science and Technology of China); Fan Zhou (School of Information and Software
Engineering, University of Electronic Science and Technology of China); Goce Trajcevski (Iowa State
University)
6100: Multi-aspect Interest Neighbor-augmented Network for Next-basket Recommendation
Zhiying Deng (Huazhong University of Science and Technology); Jianjun Li (School of Computer Science
and Technology, Huazhong University of Science and Technology)*; Zhiqiang Guo (School of Computer
Science and Technology, Huazhong University of Science and Technology); Guohui Li (School of
Computer Science and Technology Huazhong University of Science and Technology)
6138: Inv-SENet: Invariant Self Expression Network for clustering under biased data
Ashutosh Singh (Northeastern University)*; Ashish Singh (University of Massachusetts Amherst); Aria
Masoomi (Northeastern University); Tales Imbiriba (Northeastern University); Erik Learned-Miller
(University of Massachusetts, Amherst); Deniz Erdogmus (Northeastern University)
6204: Boosting Semi-Supervised Federated Learning with Model Personalization and Client-
Variance-Reduction
Shuai Wang (Singapore University of Technology and Design)*; Yanqing Xu (The Chinese University of
Hong Kong, Shenzhen); Yanli Yuan (Singapore University of Technology and Design); Xiuhua Wang
(Huazhong University of Science and Technology); Tony Quek (Singapore University of Technology and
Design)
6278: DPP-based Client Selection for Federated Learning with Non-IID Data
Yuxuan Zhang (Northwest A&F University)*; Chao Xu (Northwest A&F University); Howard H. Yang (ZJU-
UIUC Institute); Xijun Wang (Sun Yat-sen University); Tony Quek (Singapore University of Technology
and Design)
6300: SyncNet: correlating objective for time delay estimation in audio signals
Akshay Raina (Indian Institute of Technology Kanpur); Vipul Arora (IIT Kanpur)*
6385: Joint Cryo-ET Alignment and Reconstruction with Neural Deformation Fields
Valentin Debarnot (University of Basel)*; Sidharth Gupta (University of Illinois at Urbana-Champaign);
Konik Kothari (University of Illinois at Urbana-Champaign); Ivan Dokmanic (University of Basel)
6417: Asymptotic Distribution of Stochastic Mirror Descent Iterates in Average Ensemble Models
Taylan Kargin (California Institute of Technology)*; Fariborz Salehi (California Institute of Technology);
Babak Hassibi (Caltech)
6423: SpectraNet-SO(3): Learning Satellite Orientation from Optical Spectra by Implicitly Modeling
Mutually Exclusive Probability Distributions on the Rotation Manifold
Matthew Phelps (Odyssey Systems)*; Ryan Swindle (Odyssey Systems); Zack Gazak (Odyssey
Systems); Andrew Vandenberg (AFRL); Justin Fletcher (Odyssey Systems)
Multimedia Signal Processing
726: C2BN: Cross-modality and Cross-scale Balance Network for multi-modal 3D Object Detection
BoNan Ding (Chongqing University); Jin Xie (Chongqing University)*; Jing Nie (Chongqing University)
768: FedVMR: A New Federated Learning method for Video Moment Retrieval
Yan Wang (Shandong University); Xin Luo (Shandong University)*; Zhen-Duo Chen (Shandong
University); Peng-Fei Zhang (University of Queensland); Meng Liu (Shandong Jianzhu University); Xin-
Shun Xu (Shandong University)
976: Multi-source Templates Learning for Real-time Aerial Tracking
Yiming Sun (East China Normal University); Yang Li (East China Normal University)*; Changbo Wang
(East China Normal University)
1264: Embrace Smaller Attention: Efficient Cross-Modal Matching with Dual Gated Attention
Fusion.
Weikuo Guo (Dalian University of Technology); Xiangwei Kong (Zhejiang University)*
1586: MMATR: A lightweight approach for Multimodal Sentiment Analysis based on tensor
methods
Panagiotis Koromilas (University of Athens)*; Mihalis A Nicolaou (The Cyprus Institute); Theodoros
Giannakopoulos (NCSR Demokritos); Yannis Panagakis (University of Athens)
1700: Detection of Real-time DeepFakes in Video Conferencing with Active Probing and Corneal
Reflection
Hui Guo (University at Buffalo, SUNY)*; Xin Wang (University at Buffalo, SUNY); Siwei Lyu (University at
Buffalo)
2337: Locality Preserving Multiview Graph Hashing for Large Scale Remote Sensing Image Search
Wenyun Li (University of Macau); Guo Zhong (University of Macau); XINGYU LU (University of Macau);
Chi-Man Pun (University of Macau)*
2464: INDUCTIVE RELATION PREDICTION FROM RELATIONAL PATHS AND CONTEXT WITH
HIERARCHICAL TRANSFORMERS
Jiaang Li (University of Science and Technology of China)*; Quan Wang (Beijing University of Posts and
Telecommunications); Zhendong Mao (University of Science and Technology of China)
2980: Multimodal Propaganda Detection via Anti-persuasion Prompt Enhanced Contrastive
Learning
Jian Cui (Wuhan University of Technology)*; Lin Li (Wuhan University of Technology); Xin Zhang (Wuhan
University of Technology); Jingling Yuan (Wuhan University of Technology)
3337: SPARSE CONVOLUTION BASED OCTREE FEATURE PROPAGATION FOR LIDAR POINT
CLOUD COMPRESSION
Muhammad Asad Lodhi (InterDigital)*; Jiahao Pang (InterDigital); Dong Tian (InterDigital)
3515: Unrestricted Anchor Graph based GCN for Incomplete Multi-view Clustering
Liang Zhao (Dalian University of Technology)*; Zihao Wang (Dalian University of Technology); Yukun
Yuan (Dalian University of Technology); Feng Ding (Dalian University of Technology)
3605: A Multi-stage Hierarchical Relational Graph Neural Network for Multimodal Sentiment
Analysis
Peizhu Gong (Shanghai Maritime University)*; Jin Liu (Shanghai Maritime University); Xiliang Zhang
(Shanghai Maritime University); XingYe Li (Shanghai Maritime University)
3694: An End-to-End Framework for Partial View-aligned Clustering with Graph Structure
Liang Zhao (Dalian University of Technology)*; Qiongjie Xie (Dalian University of Technology); Songtao Wu (Dalian University of Technology);
Shubin Ma (Dalian University of Technology)
3786: Whether Contribution of Features Differ Between Video-mediated and In-person Meetings in
Important Utterance Estimation
Fumio Nihei (NTT)*; Ryo Ishii (NTT); Yukiko Nakano (Seikei University); Atsushi Fukayama (NTT); Takao
Nakamura (NTT)
4422: WL-MSR: Watch and Listen for Multimodal Subtitle Recognition
Jiawei Liu (Institute of Automation, Chinese Academy of Sciences and School of Artificial Intelligence,
University of Chinese Academy of Sciences)*; Hao Wang (National Laboratory of Pattern Recognition,
Institute of Automation, Chinese Academy of Sciences and School of Artificial Intelligence, University of
Chinese Academy of Sciences); Weining Wang (The Laboratory of Cognition and Decision Intelligence
for Complex Systems, Institute of Automation, Chinese Academy of Sciences); Xingjian He (National
Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences and School of
Artificial Intelligence, University of Chinese Academy of Sciences); Jing Liu (National Lab of Pattern
Recognition, Institute of Automation, Chinese Academy of Sciences)
4532: Exploiting modality-invariant feature for robust multimodal emotion recognition with
missing modalities
Haolin Zuo (Inner Mongolia University)*; Rui Liu (Inner Mongolia University); Jinming Zhao (Qiyuan Lab);
Guanglai Gao (Inner Mongolia University); Haizhou Li (The Chinese University of Hong Kong
(Shenzhen))
4561: CN-CVS: A Mandarin Audio-Visual Dataset for Large Vocabulary Continuous Visual to
Speech Synthesis
Chen Chen (Tsinghua University)*; Dong Wang (Tsinghua University); Thomas Fang Zheng (CSLT,
Tsinghua University)
4648: Multimodal Dyadic Impression Recognition via Listener Adaptive Cross-Domain Fusion
Yuanchao Li (University of Edinburgh)*; Peter Bell (University of Edinburgh); Catherine Lai
(University of Edinburgh)
4725: Vision, Deduction and Alignment: An Empirical Study on Multi-modal Knowledge Graph
Alignment
Li Yangning (Tsinghua Shenzhen International Graduate School)*; Jiaoyan Chen (The University of
Manchester); Yinghui Li (Tsinghua University); Yuejia Xiang (Tencent); Xi Chen (Tencent); Hai-Tao Zheng
(Tsinghua University)
5289: Contextually-rich human affect perception using multimodal scene information
Digbalay Bose (University of Southern California)*; Rajat Hebbar (University of Southern California);
Krishna Somandepalli (University of Southern California); Shrikanth Narayanan (USC)
5559: Audio-driven facial landmark generation in violin performance using 3DCNN network with
self attention model
Ting-Wei Lin (Academia Sinica)*; Chao-Lin Liu (National Chengchi University); Li Su (Academia Sinica)
5749: BAT: Bi-Alignment Based on Transformation in Multi-Target Domain Adaptation for Semantic
Segmentation
Xian Zhong (Wuhan University of Technology); Wei Li (Wuhan University of Technology); Liang Liao
(Nanyang Technological University)*; Jing Xiao (Wuhan University); Wenxuan Liu (Wuhan University of
Technology); Wenxin Huang (Hubei University); Zheng Wang (Wuhan University)
5893: BIRD-PCC: Bi-directional Range Image-based Deep LiDAR Point Cloud Compression
Chia-Sheng Liu (National Taiwan University)*; Jia-Fong Yeh (National Taiwan University); Hao Hsu
(National Taiwan University); Hung-Ting Su (National Taiwan University); Ming-Sui Lee (National Taiwan
University); Winston H. Hsu (National Taiwan University)
6037: Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation
Qi Chen (Shanghai Jiao Tong University)*; Ziyang Ma (Shanghai Jiao Tong University); Tao Liu (Shanghai
Jiao Tong University); Xu Tan (Microsoft Research Asia); Qu Lu (Shanghai Media Tech); Kai Yu
(Shanghai Jiao Tong University); Xie Chen (Shanghai Jiao Tong University)
6090: TWO-STREAM JOINT-TRAINING FOR SPEAKER INDEPENDENT ACOUSTIC-TO-
ARTICULATORY INVERSION
Jianrong Wang (School of Computer Science and Technology, Tianjin University, Tianjin, China); Jinyu
Liu (Tianjin University); Xuewei Li (Tianjin University); Mei Yu (Tianjin University); Jie Gao (Tianjin
University); Qiang Fang (Chinese Academy of Social Sciences); Li Liu (Shenzhen Research Institute of
Big Data, The Chinese University of Hong Kong, Shenzhen)*
6122: Code-Switching Speech Synthesis Based on Self-Supervised Learning and Domain
Adaptive Speaker Encoder
YiXing Lin (National Central University); Cheng-Hsun Pai (National Central University); Le Phuong
(National Central University); Bima Prihasto (National Central University); CHIEN-LIN HUANG (NCKU);
Jia-Ching Wang (National Central University)*
6163: The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual
Diarization and Recognition
Zhe Wang (University of Science and Technology of China)*; Shilong Wu (University of Science and
Technology of China); Hang Chen (USTC); Mao-Kui He (University of Science and Technology of China);
Jun Du (University of Science and Technology of China); Chin-hui Lee (Georgia Institute of Technology);
Shinji Watanabe (Carnegie Mellon University); Sabato M Siniscalchi (Kore University of Enna); Odette
Scharenborg (Multimedia Computing Group, Delft University of Technology); Baocai Yin
(USTC, iFLYTEK); Jia Pan (iFLYTEK Research); Cong Liu (iFLYTEK Research)
6290: Guide and Select: A Transformer-based Multimodal Fusion Method for Points of Interest
Description Generation
Hanqing Liu (Tsinghua Shenzhen International Graduate School)*; Wei Wang (Tsinghua University); Niu
Hu (Tsinghua University); Hai-Tao Zheng (Tsinghua University); Rui Xie (Meituan); Wei Wu (Meituan);
Yang Bai (Tsinghua University)
6294: Deep probabilistic model for lossless scalable point cloud attribute compression
Dat Thanh Nguyen (University of Erlangen-Nuremberg)*; Kamal Nambiar (Friedrich-Alexander-Universität
Erlangen-Nürnberg); Andre Kaup (Friedrich-Alexander-Universität Erlangen-Nürnberg)
6332: Abusive activity detection with multi-modality based on convolutional neural network
Jisoo Kim (Korea Institute of Science and Technology (KIST))*; Hyebin Ahn (Korea Institute of
Science and Technology (KIST)); Byounghyun Yoo (Korea Institute of Science and Technology (KIST))
6478: IMPROVING THE MODALITY REPRESENTATION WITH MULTI-VIEW CONTRASTIVE
LEARNING FOR MULTIMODAL SENTIMENT ANALYSIS
Peipei Liu (School of Cyber Security, University of Chinese Academy of Sciences); Xin Zheng (Henan
University); Hong Li (Institute of Information Engineering, Chinese Academy of Sciences)*; Liu Jie
(Institute of Information Engineering, Chinese Academy of Sciences); Yimo Ren (Beijing Haidian);
Hongsong Zhu (Institute of Information Engineering, CAS); Limin Sun (Institute of Information
Engineering, Chinese Academy of Sciences)
Sensor Array and Multichannel Signal Processing
278: Improved Deep Speaker Localization and Tracking: Revised Training Paradigm and
Controlled Latency
Alexander Bohlender (IDLab, Ghent University - imec)*; Liesbeth Roelens (IDLab, Ghent University -
imec); Nilesh Madhu (IDLab, Ghent University - imec)
910: Variational Message Passing-based Respiratory Motion Estimation and Detection Using
Radar Signals
Jakob Möderl (Graz University of Technology)*; Erik Leitinger (Graz University of Technology); Franz
Pernkopf (Graz University of Technology); Klaus Witrisal (Graz University of Technology, Austria)
981: Optimal Mixed-ADC arrangement for DOA estimation via CRB under ULA
Xinnan Zhang (University of Science and Technology of China)*; Yuanbo Cheng (University of Science
and Technology of China); Xiaolei Shang (University of Science and Technology of China); Jun Liu
(University of Science and Technology of China)
986: Robust Iterative Solution for Linear Array-Based 3-D Localization By Message Passing
Yimao Sun (Sichuan University); Dominic Ho (Nil); Yanbing Yang (Sichuan University); Lei Zhang
(Sichuan University)*; Liangyin Chen (Sichuan University)
1482: GCC-speaker: Target Speaker Localization with Optimal Speaker-dependent Weighting in
Multi-speaker Scenarios
Guanjun Li (Institute of Automation, Chinese Academy of Sciences)*; Wei Xue (Department of Computer
Science, Hong Kong Baptist University, Hong Kong SAR, China); Wenju Liu (National Laboratory of
Pattern Recognition, Institute of Automation, University of Chinese Academy of Sciences, Beijing, China);
Jiangyan Yi (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of
Sciences); Jianhua Tao (National Laboratory of Pattern Recognition, Institute of Automation, Chinese
Academy of Sciences)
1816: Radio-astronomy imaging and interference excision using tensor decomposition and
canonical correlation analysis
Mikael Sorensen (University of Virginia)*; Nicholas D Sidiropoulos (University of Virginia)
2049: Resolving Doppler Ambiguity via Spread Phase Alignment in FDA-MIMO Radar
Yanxing Wang (National Laboratory of Radar Signal Processing, Xidian University)*; Shengqi
Zhu (National Laboratory of Radar Signal Processing, Xidian University); Guisheng Liao (National
Laboratory of Radar Signal Processing, Xidian University); Lan Lan (National Laboratory of Radar Signal
Processing, Xidian University); Zhuochen Chen (National Laboratory of Radar Signal Processing, Xidian
University); Feilong Liu (National Laboratory of Radar Signal Processing, Xidian University)
2130: Bias Reduced Semidefinite Relaxation Method for Multistatic Localization in the Absence of
Transmitter Position and Its Synchronization
Pei Jian (Ningbo University); Gang Wang (Ningbo University)*; Dominic Ho (Nil); Lei Huang (Shenzhen
University)
2207: Gridless Target Localization for FDA-MIMO Radar with Sparse Arrays
Xiaohuan Wu (Nanjing University of Posts and Telecommunications)*; Yaxin Liu (Nanjing University of
Posts and Telecommunications); Xiaoyuan Jia (Nanjing University of Posts and Telecommunications)
2229: Exploiting Sparse Recovery Algorithms for Semi-Supervised Training of Deep Neural
Networks for Direction-of-Arrival Estimation
Murtiza Ali (Indian Institute of Technology, Jammu)*; Aditya Arie Nugraha (RIKEN); Karan Nathwani
(Indian Institute of Technology, Jammu)
2238: ROBUST SUBSPACE TRACKING WITH CONTAMINATION MITIGATION VIA $\alpha$-
DIVERGENCE
LE Trung Thanh (University of Orleans)*; Aref Miri Rekavandi (University of Melbourne, Melbourne,
Australia); Abd-Krim Seghouane (University of Melbourne); KARIM ABED-MERAIM (PRISME laboratory,
university of Orleans, France)
2321: Wireless location tracking via complex-domain Super MDS with time series self-localization
information
Yuya Nishi (Osaka University)*; Takumi Takahashi (Osaka University); Hiroki Iimori (Ericsson Research);
Giuseppe Abreu (Jacobs University Bremen); Shinsuke Ibi (Doshisha University); Seiichi Sampei (Osaka
University)
2330: Dual-Stream Siamese Vision Transformer with Mutual Attention for Radar Gait Verification
Ran Ji (School of Computer Science, University of Nottingham Ningbo China); Jiarui Li (School of
Computer Science, University of Nottingham Ningbo China); Wentao He (University of Nottingham
Ningbo China); Jianfeng Ren (University of Nottingham Ningbo China)*; Xudong Jiang (Nanyang
Technological University)
2463: Angle-of-arrival Target Tracking Using a Mobile UAV In External Signal-denied Environment
Bing Zhu (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences); Sheng Xu
(Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences)*; Feng Rice (QinetiQ
Australia); Kutluyil Dogancay (University of South Australia)
2662: Fast Cross-Correlation for TDoA Estimation on Small Aperture Microphone Arrays
François Grondin (Université de Sherbrooke)*; Marc-Antoine Maheux (Université de Sherbrooke); Jean-
Samuel Lauzon (Université de Sherbrooke); Jonathan Vincent (Université de Sherbrooke); Francois
Michaud (Universite de Sherbrooke)
2839: Hypothesis test for leakage detection in water pipelines with high-dimensional sensor
signals
Liusha Yang (Shenzhen Technology University)*; Matthew McKay (University of Melbourne); Xun Wang
(Beihang University)
2937: A Computationally Efficient Algorithm for Distributed Adaptive Signal Fusion based on
Fractional Programs
Cem A. Musluoglu (KU Leuven)*; Alexander Bertrand (KU Leuven)
2973: Dynamic Independent Component Extraction with Blending Mixing Vector: Lower Bound on
Mean Interference-to-Signal Ratio
Jaroslav Čmejla (Technical University of Liberec)*; Zbynek Koldovsky (Technical University of Liberec);
Václav Kautský (Technical University of Liberec); Tulay Adali (University of Maryland, Baltimore County)
3098: Target Velocity Estimation for Quantization-Based Cooperative MIMO Radar and
Communications System
Zhen Wang (Southwest Petroleum University); xue xue yan (yanxuedan); Qian He (University of
Electronic Science and Technology of China)*; Rick S Blum (Lehigh University)
3217: Data Driven Joint Sensor Fusion and Regression based on Geometric Mean Squared Error
Carlos A Lopez Molina (Polytechnic University of Catalonia)*; Jaume Riba (UPC)
3298: Optimal Carrier Frequency Design for Frequency Diverse Array MIMO Radar
Jie Cheng (University of Electronic Science and Technology of China); Maria Juhlin (Lund University);
Wen-Qin Wang (University of Electronic Science and Technology of China)*; Andreas Jakobsson (Lund
University)
3324: Equivalence of aperture reduction in element space and constrained combination of DFT
beams in beamspace
Damir Rakhimov (TU-Ilmenau)*; Martin Haardt (Ilmenau University of Technology)
3637: Binary sequence set optimization for CDMA applications via mixed-integer quadratic
programming
Alan Yang (Stanford University)*; Tara Mina (Stanford University); Grace Gao (Stanford University)
3824: Graph Signal Processing for Narrowband Direction of Arrival Estimation
Disheng Li (University of Sheffield); Wei Liu (University of Sheffield)*; Yuriy Zakharov (University of York);
Paul Mitchell (University of York)
3948: Sparse Bayesian Learning Based Three-Dimensional Imaging for Antenna Array Radar
Yuhan Li (Xiamen University)*; Jesper Rindom Jensen (Aalborg University); Maozhong Fu (Xiamen
University of Technology); Zhenmiao Deng (Sun Yat-sen University); Mads G. Christensen (Audio
Analysis Lab., AD:MT, Aalborg University, Denmark)
4043: Sensor Selection for Angle of Arrival Estimation Based on the Two-Target Cramér-Rao
Bound
Costas A Kokke (Delft University of Technology)*; Mario Coutino (TNO); Laura Anitori (TNO); Richard
Heusdens (Netherlands Defence Academy); Geert Leus (TU Delft)
4108: Transmit Energy Focusing for Parameter Estimation in Transmit Beamspace Slow-time
MIMO Radar
Tingting Zhang (Nanjing University of Science and Technology)*; Feng Xu (Aalto University); Sergiy
Vorobyov ()
4132: Sparse Non-Contact Multiple People Localization and Vital Signs Monitoring Via FMCW
Radar
Yonathan Eder (Weizmann Institute of Science)*; Zhuoyang Liu (Weizmann Institute of Science); Yonina
Eldar ()
4392: Deep learning-based compressive sampling optimization in massive MIMO systems
Saidur Pavel (Temple University); Yimin D Zhang (Temple University)*; Maria S. Greco (University of
Pisa); Fulvio Gini (University of Pisa)
4528: Soft label coding for end-to-end sound source localization with ad-hoc microphone arrays
Linfeng Feng (Northwestern Polytechnical University)*; Yijun Gong (Northwestern Polytechnical
University); Zhang XiaoLei (Northwestern Polytechnical University)
4633: RATE SPLITTING AND PRECODING STRATEGIES FOR MULTI-USER MIMO BROADCAST
CHANNELS WITH COMMON AND PRIVATE STREAMS
Liana Khamidullina (Ilmenau University of Technology)*; André Almeida (Federal University of Ceará);
Martin Haardt (Ilmenau University of Technology)
4640: RANGE-ISL MINIMIZATION AND SPECTRAL SHAPING IN MIMO RADAR SYSTEMS VIA
WAVEFORM DESIGN
Ehsan Raei (SnT, University of Luxembourg)*; Mohammad Alaee (University of Luxembourg); Bhavani
Shankar Mysore Ramarao (University of Luxembourg); Bjorn Ottersten (SnT)
4658: Deep Learning Sparse Array Design Using Binary Switching Configurations
Syed Ali Hamza (Widener University)*; Kyle Juretus (Villanova University); Moeness Amin (Villanova
University); Fauzia Ahmad (Temple University)
4947: SUPER DILATED NESTED ARRAYS WITH IDEAL CRITICAL WEIGHTS AND INCREASED
DEGREES OF FREEDOM
Ahmed Mohammed Shaalan (University of Science and Technology of China)*; Jun Du (University of
Science and Technology of China)
5266: DIRECT POSITION DETERMINATION WITH ONE-BIT SIGNAL FOR MULTIPLE TARGETS
Lihua Ni (University of Electronic Science and Technology of China)*; Di Zhang (University of Electronic
Science and Technology of China); Tianyi Xing (University of Electronic Science and Technology of
China); Maoyan Ran (University of Electronic Science and Technology of China); Ning Liu (Northern Institute of
Electronic Equipment); Qun Wan (University of Electronic Science and Technology of China)
5483: Waveform design to improve the estimation of target parameters using the Fourier
Transform method in a MIMO OFDM DFRC system
Satwika Bhogavalli (Department of Electrical Communication Engineering, Indian Institute of Science,
Bangalore)*; Eric Grivel (Bordeaux INP, IMS laboratory); KVS Hari (Department of Electrical
Communication Engineering, Indian Institute of Science, Bangalore); Vincent Corretja (THALES)
6019: Quantized Precoding and RIS-Assisted Modulation for Integrated Sensing and
Communications Systems
R.S. Prasobh Sankar (Indian Institute of Science Bangalore)*; Sundeep Prabhakar Chepuri (Indian
Institute of Science)
6240: Nonnegative block-term decomposition with the β-divergence: joint data fusion and blind
spectral unmixing
Clémence Prévost (University of Lille)*; Valentin Leplat (Skoltech)
6272: A DNN BASED NORMALIZED TIME-FREQUENCY WEIGHTED CRITERION FOR ROBUST
WIDEBAND DOA ESTIMATION
Kuan-Lin Chen (University of California San Diego)*; Ching-Hua Lee (University of California, San Diego);
Bhaskar Rao (UC San Diego); Harinath Garudadri (University of California, San Diego)
6422: Mixed Far-field and Near-field Source Localization Based on Low-Rank Matrix
Reconstruction
Yunchang Liu (Jilin University); Hong Jiang (Jilin University)*; Qi Zhang (Jilin University)
Signal Processing Education
5324: On Designing A 3D Imaging Summer Project for Ontario's High School Students during
Covid-19 Pandemic
Fengbo Lan (York University); Gene Cheung (York University)*; Prabhkirat Arora (York University);
Deinabo Richard-Koko (York University); Lisa Cole (York University)
Signal Processing for Communications and Networking
569: Noncoherent multiuser Grassmannian Constellations for the MIMO Multiple Access Channel
Javier Álvarez Vizoso (Universidad de Cantabria)*; Diego Cuevas (Universidad de Cantabria); Carlos
Beltrán (Universidad de Cantabria); Ignacio Santamaria (University of Cantabria); Vít Tucek (Huawei
Technologies); Gunnar Peters (Huawei Sweden)
1256: Optimizing distributed multi-sensor multi-target tracking algorithm based on labeled multi-
bernoulli filter
Honggang Liu (Fudan University)*; Jinlong Yang (Jiangnan University); Yue Xu (Jiangnan University); Le
Yang (University of Canterbury)
1704: EH-Enabled Distributed Detection Over Temporally Correlated Markovian MIMO Channels
Ghazaleh Ardeshiri (University of Central Florida)*; Azadeh Vosoughi (University of Central Florida)
1808: Structure-aware Sparse Bayesian Learning-based Channel Estimation for Intelligent
Reflecting Surface-aided MIMO
Yanbin He (Delft University of Technology)*; Geethu Joseph (TU Delft)
1937: Scaling Law Analysis for Covariance Based Activity Detection in Cooperative Multi-Cell
Massive MIMO
Ziyue Wang (Chinese Academy of Sciences)*; Ya-Feng Liu (Chinese Academy of Sciences); Zhaorui
Wang (The Chinese University of Hong Kong); Wei Yu (University of Toronto)
1960: A Causal Convolutional Approach for Packet Loss Concealment in Low Powered Devices
Steven Davy (Huawei)*; Niamh Belton (Science Foundation Ireland Centre for Research Training in
Machine Learning, University College Dublin); Joshua Tobin (Huawei); Owais Bin Zuber (Huawei); Liu
Dong (Huawei); Yuan Xuewen (Huawei)
2312: Sparse Aggregation-Based Channel Estimation for Massive MIMO Systems With
Decentralized Baseband Processing
Yanqing Xu (The Chinese University of Hong Kong, Shenzhen)*; Enbin Song (Sichuan University);
Qingjiang Shi (Tongji University); Tsung-Hui Chang ("The Chinese University of Hong Kong,")
2413: Model-Free Online Learning for Waveform Optimization in Integrated Sensing and
Communications
Petteri Pulkkinen (Aalto University, Saab Finland Oy)*; Visa Koivunen (Aalto University)
2438: INVERSE REINFORCEMENT LEARNING WITH GRAPH NEURAL NETWORKS FOR IOT
RESOURCE ALLOCATION
Guangchen Wang (The University of Sydney)*; Peng Cheng (La Trobe University); Zhuo Chen (CSIRO);
Wei Xiang (La Trobe University); Branka Vucetic (University of Sydney); Yonghui Li (The University
of Sydney)
3004: Integrated Sensing and Full-Duplex Communication: Joint Transceiver Beamforming and
Power Allocation
Zhenyao He (Southeast University)*; Wei Xu (Southeast University); Hong Shen (Southeast University);
Derrick Wing Kwan Ng (University of New South Wales); Yonina C. Eldar (Weizmann Institute of
Science); Xiaohu You (Southeast University)
3025: Regularized Neural Detection for Millimeter Wave Massive MIMO Communication Systems
with One-bit ADCs
Aditya Sant (University of California, San Diego)*; Bhaskar Rao (UC San Diego)
3048: ON THE JOINT ESTIMATION OF PHASE NOISE AND TIME-VARYING CHANNELS FOR OFDM
UNDER HIGH-MOBILITY CONDITIONS
Francesco Linsalata (Politecnico di Milano)*; Nassar Ksairi (Huawei Technologies France)
3775: Bit Error and Block Error Rate Training for ML-Assisted Communication
Reinhard Wiesmayr (ETH Zurich)*; Gian Marti (ETH Zurich); Chris Dick (NVIDIA); Haochuan Song
(Southeast University); Christoph Studer (ETH Zurich)
3952: Structural Optimization of Factor Graphs for Symbol Detection via Continuous Clustering
and Machine Learning
Lukas Rapp (Communications Engineering Lab, Karlsruhe Institute of Technology (KIT))*; Luca Schmid
(Communications Engineering Lab, Karlsruhe Institute of Technology (KIT)); Andrej Rode
(Communications Engineering Lab, Karlsruhe Institute of Technology (KIT)); Laurent Schmalen
(Communications Engineering Lab, Karlsruhe Institute of Technology (KIT))
3955: Codes Correcting Burst and Arbitrary Erasures for Reliable and Low-Latency
Communication
Serge Kas Hanna (Aalto)*; Zhiyuan Tan (Huawei); Wen Xu (Huawei); Antonia Wachter-Zeh (TUM)
3960: Capacity Maximization for Active RIS Assisted Outdoor-to-Indoor Communication System
Chen He (Northwest University); GONG WEISHENG (Northwest University); Yangrui Dong (Northwest
University); Xie Xie (Northwest University)*; Z. Jane Wang (University of British Columbia)
3979: Joint Microstrip Selection and Beamforming Design for MmWave Systems with Dynamic
Metasurface Antennas
Wei Huang (Hefei University of Technology)*; Haiyang Zhang (Nanjing University of Posts and
Telecommunications); Nir Shlezinger (Ben-Gurion University); Yonina Eldar ()
4320: Joint Estimation of Clustered User Activity and Correlated Channels with Unknown
Covariance in mMTC
Hamza Djelouat (University of Oulu)*; Markus Leinonen (University of Oulu); Markku Juntti (OULU,
Finland)
4374: Information and Sensing Beamforming Optimization for Multi-User Multi-Target MIMO ISAC
Systems
Minghe Zhu (The Chinese University of Hong Kong, Shenzhen); Lei Li (CUHK-Shenzhen); Shuqiang Xia
(ZTE Corporation); Tsung-Hui Chang ("The Chinese University of Hong Kong,")*
4497: Joint Channel and Direction Estimation for Ground-to-UAV Communications Enabled by A
Simultaneous Reflecting and Sensing RIS
Jiguang He (Technology Innovation Institute, 9639 Masdar City, Abu Dhabi)*; Aymen Fakhreddine
(Technology Innovation Institute); George Alexandropoulos (National and Kapodistrian University of
Athens)
4635: Sparse Bayesian Learning Assisted Decision Fusion in Millimeter Wave Massive MIMO
Sensor Networks
Apoorva Chawla (Norwegian University of Science and Technology)*; Domenico Ciuonzo (University of
Naples Federico II); Pierluigi Salvo Rossi (NTNU)
4691: Accelerated massive MIMO detector based on annealed underdamped Langevin dynamics
Nicolas M Zilberstein (Rice University)*; Chris Dick (Nvidia); Rahman Doost-Mohammady (Rice
University); Ashutosh Sabharwal (Rice University); Santiago Segarra (Rice University)
4734: Reducing the communication and computational cost of random Fourier features Kernel
LMS in diffusion networks
Daniel G Tiglea (Universidade de Sao Paulo)*; Renato Candido (Universidade de São Paulo); Luis
Antonio Azpicueta-Ruiz (Universidad Carlos III de Madrid); Magno T.M. Silva (University of Sao Paulo)
4817: Machine Learning-Aided Piece-wise Modeling Technique of Power Amplifier for Digital
Predistortion
Sri Satish Krishna Chaitanya Bulusu (University of Oulu)*; Nuutti Tervo (University of Oulu); Praneeth
Susarla (University of Oulu); Mikko Sillanpää (University of Oulu); Olli Silven (University of Oulu); Markku
Juntti (OULU, Finland); Aarno Pärssinen (University of Oulu)
4825: Received Power Maximization with Practical Phase-dependent Amplitude Response in RIS-
Aided OFDM Wireless Communications
Dimitris Kompostiotis (University of Patras)*; Dimitris Vordonis (University of Patras); Vassilis Paliouras
(University of Patras)
4926: ViT-CAT: Parallel Vision Transformers with Cross Attention Fusion for Popularity Prediction
in MEC Networks
Zohreh HajiAkhondi-Meybodi (Concordia University); Arash Mohammadi (Concordia University)*; Ming
Hou (Defence Research and Development Canada (DRDC)); Jamshid Abouei (Yazd University);
Konstantinos N Plataniotis (UofT)
5155: Comparative Study of IRS Assisted Opportunistic Communications Over i.i.d. and LoS
Channels
L Yashvanth (Indian Institute of Science, Bangalore)*; Chandra Murthy (Indian Institute of Science)
5199: Towards Efficient and Optimal Joint Beamforming and Antenna Selection: A Machine
Learning Approach
Sagar Shrestha (Oregon State University)*; Xiao Fu (Oregon State University); Mingyi Hong (University of
Minnesota)
5212: Managing Information Updating with Edge Computing: A Distributed and Learning Approach
Junyi He (Beijing Jiaotong University)*; Di Zhang (Beijing Jiaotong University); Shumeng Liu (Beijing
Jiaotong University); Yuezhi Zhou (Tsinghua University); Yaoxue Zhang (Tsinghua University)
5577: Sparse Delay-Doppler Channel Estimation for OTFS Modulation using 2D-MUSIC
Akshay S Bondre (Arizona State University)*; Christ Richmond (Duke University); Ahmed Alkhateeb
(Arizona State University); Nicolo Michelusi (Arizona State University)
6018: Joint Millimeter-Wave AoD and AoA Estimation Using One OFDM Symbol and Frequency-
Dependent Beams
Veljko Boljanovic (University of California, Los Angeles)*; Danijela Cabric (University of California, Los
Angeles)
Signal Processing Theory and Methods
315: IQGAN: Robust Quantum Generative Adversarial Network for Image Synthesis On NISQ
Devices
Cheng Chu (Indiana University Bloomington); Grant Skipper (Indiana University Bloomington); Martin
Swany (Indiana University Bloomington); Fan Chen (Indiana University Bloomington)*
581: Distributed Bayesian Tracking on the Special Euclidean Group using Lie Algebra Parametric
Approximations
CLAUDIO JOSE BORDIN JUNIOR (Universidade Federal do ABC)*; CAIO DE FIGUEREDO (INSTITUTO
FEDERAL DO CEARA); Marcelo G S Bruno (ITA)
657: Passive detection of rank-one Gaussian signals for known channel subspaces and arbitrary
noise
David Ramírez (Universidad Carlos III de Madrid)*; Ignacio Santamaria (University of Cantabria); Louis
Scharf (University of Colorado)
1009: THE ROLE OF MEMORY IN SOCIAL LEARNING WHEN SHARING PARTIAL OPINIONS
Michele Cirillo (University of Salerno)*; Virginia Bordignon (EPFL); Vincenzo Matta (DIEM, University of
Salerno); Ali H. Sayed (Ecole Polytechnique Fédérale de Lausanne)
1028: Learning graph Laplacian from intrinsic patterns via Gaussian process
Koshi Watanabe (Hokkaido University)*; Keisuke Maeda (Hokkaido University); Takahiro Ogawa
(Hokkaido University); Miki Haseyama (Hokkaido University)
1045: Adaptive Axonal Delays in feedforward spiking neural networks for accurate spoken word
recognition
Pengfei SUN (Ghent University)*; Ehsan Eqlimi (Ghent University); Yansong Chua (China Nanhu
Academy of Electronics and Information Technology); Paul Devos (Ghent University); Dick Botteldooren
(Ghent University)
1079: ENTROPY BASED FEATURE REGULARIZATION TO IMPROVE TRANSFERABILITY OF DEEP
LEARNING MODELS
Raphaël Baena (IMT Atlantique)*; Lucas Drumetz (IMT Atlantique); Vincent Gripon (IMT Atlantique)
1224: A Compensated Shrinkage Affine Projection Algorithm for Debiased Sparse Adaptive
Filtering
Yi Zhang (Tokyo Institute of Technology)*; Isao Yamada (Tokyo Institute of Technology)
1324: A Robust Kalman Filter Based Approach for Indoor Robot Positionning with Multi-Path
Contaminated UWB Data
Justin Cano (ISAE-Supaéro)*; Yi Ding (ISAE-Supaéro); Gaël Pagès (ISAE-Supaéro); Eric Chaumette
(ISAE-Supaero); Jerome Le Ny (Polytechnique Montreal)
1333: Sparse Graph Learning with Spectrum Prior for Deep Graph Convolutional Networks
Jin Zeng (Tongji University)*; Yang Liu (Peking University); Gene Cheung (York University); Wei Hu
(Peking University)
1502: CyPMLI: WISL-Minimized Unimodular Sequence Design via Power Method-Like Iterations
Arian Eamaz (University of Illinois - Chicago, IL)*; Farhang Yeganegi (University of Illinois Chicago);
Mojtaba Soltanalian (University of Illinois)
1511: Fast convolution algorithm for Real valued finite length sequences
Weiwei Wang (FSU)*; Victor DeBrunner (FSU); Linda DeBrunner (FSU)
1660: Multilevel FISTA for Image Restoration
Guillaume Lauga (Inria/ENS Lyon)*; Elisa Riccietti (ENS Lyon); Nelly Pustelnik (); Paulo Goncalves (ENS
de Lyon)
1761: Dynamic Selection of p-Norm in Linear Adaptive Filtering via Online Kernel-Based
Reinforcement Learning
Minh Vu (Tokyo Institute of Technology); Yuki Akiyama (Tokyo Institute of Technology); Konstantinos
Slavakis (Tokyo Institute of Technology)*
1914: Online Residual-Based Key Frame Sampling with Self-Coach Mechanism and Adaptive
Multi-Level Feature Fusion
Rui Zhang (Shanghai Jiao Tong University); Yang Hua (Queen's University Belfast); Tao Song (Shanghai
Jiao Tong University); Zhengui Xue (Shanghai Jiao Tong University); Ruhui Ma (Shanghai Jiao Tong
University)*; Haibing Guan (Shanghai Jiao Tong University)
2389: False alarm regulation for off-grid target detection with the Matched Filter
Pierre Develter (ONERA; SONDRA, CentraleSupélec, Université Paris-Saclay)*; Jonathan Bosse
(ONERA); Olivier Rabaste (ONERA); Philippe Forster (ENS Paris-Saclay, CNRS, Université Paris-
Saclay); Jean-Philippe Ovarlez (ONERA; SONDRA, CentraleSupélec, Université Paris-Saclay)
2415: GRAPH LEARNING FROM GAUSSIAN AND STATIONARY GRAPH SIGNALS
Andrei Buciulea Vlas (Universidad Rey Juan Carlos); Antonio G. Marques (King Juan Carlos University)*
2511: Neural Network Models with Integrated Training and Adaptation for Nonlinear Acoustic
System Identification
Svantje Voit (Carl von Ossietzky University of Oldenburg)*; Gerald Enzner (Carl von Ossietzky University
Oldenburg)
2592: A New Probabilistic Distance Metric with Application in Gaussian Mixture Reduction
Ahmad Sajedi (University of Toronto)*; Yuri Lawryshyn (University of Toronto); Konstantinos N Plataniotis
(UofT)
2600: Cramér-Rao bound on Lie groups with observations on Lie groups: application to $SE(2)$
Samy LABSIR (IPSA)*; Alexandre Renaux (Université Paris Saclay); Jordi Vilà-Valls (ISAE-SUPAERO);
Eric Chaumette (ISAE-SUPAERO)
2611: Various Performance Bounds on the Estimation of Low-Rank Probability Mass Function
Tensors from Partial Observations
Tomer Hershkovitz (Tel Aviv University); Martin Haardt (Ilmenau University of Technology); Arie Yeredor
(Tel Aviv University)*
2908: REGRESSION TO CLASSIFICATION: WAVEFORM ENCODING FOR NEURAL FIELD-BASED
AUDIO SIGNAL REPRESENTATION
TaeSoo Kim (KT Corporation)*; Daniel Rho (KT Corporation); GaHui Lee (KT Corporation); JaeHan Park
(KT Corporation); Jong Hwan Ko (Sungkyunkwan University)
3107: ROBUST AND GLOBALLY SPARSE PCA VIA MAJORIZATION-MINIMIZATION AND VARIABLE
SPLITTING
Hugo Brehier (SONDRA, CentraleSupélec)*; Arnaud Breloy (Université Paris Nanterre); Mohammed
Nabil EL KORSO (Paris Nanterre University); Sandeep Prof. Kumar (IIT Delhi)
3808: Product Graph Learning from Multi-attribute Graph Signals with Inter-layer Coupling
Chenyue Zhang (The Chinese University of Hong Kong)*; Yiran HE (The Chinese University of Hong
Kong); Hoi-To Wai (Chinese University of Hong Kong)
3924: Adaptive Gaussian nested filter for parameter estimation and state tracking in dynamical
systems
Sara Pérez-Vieites (IMT Nord Europe)*; Victor Elvira (University of Edinburgh)
4000: Sparse asynchronous samples from networks of TEMS for reconstruction of classes of non-
bandlimited signals
Marek Hilton (Imperial College London)*; Pier Luigi Dragotti (Imperial College London)
4029: Revisit Sampling Theory of Bandlimited Graph Signals: One Bridge Between GSP and DSP
Fen Wang (Zhejiang Lab)*; Taihao Li (Zhejiang Lab); Xue Zhang (Shandong University of Science and
Technology)
4639: Elliptical Wishart distribution: maximum likelihood estimator from information geometry
Imen AYADI (Université Paris Saclay)*; Florent Bouchard (L2S); Frédéric Pascal (CentraleSupélec)
4659: SPARSITY-DRIVEN JOINT BLIND DECONVOLUTION-DEMODULATION WITH APPLICATION
TO MOTOR FAULT DETECTION
Varun A Kelkar (University of Illinois at Urbana-Champaign); Dehong Liu (Mitsubishi Electric Research
Laboratories (MERL))*; Hiroshi Inoue (Mitsubishi Electric Corporation); Makoto Kanemaru (Mitsubishi
Electric Corporation)
4746: Consistent estimators of a new class of covariance matrix distances in the large
dimensional regime
Roberto Pereira (Centre Tecnològic de Telecomunicacions de Catalunya); Xavier Mestre (Centre
Tecnològic de Telecomunicacions de Catalunya)*; David Gregoratti (SRS)
4750: Phase Unwrapping in Correlated Noise for FMCW Lidar Depth Estimation
Alfred Ulvog (Mitsubishi Electric Research Laboratories); Joshua Rapp (Mitsubishi Electric Research
Laboratories)*; Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories); Hassan Mansour
(Mitsubishi Electric Research Laboratories (MERL)); Petros Boufounos (Mitsubishi Electric Research
Laboratories); Kieran Parsons (Mitsubishi Electric Research Laboratories)
4815: Improved Small Sample Hypothesis Testing using the Uncertain Likelihood Ratio
James Z Hare (DEVCOM Army Research Lab)*; Lance Kaplan (DEVCOM Army Research Laboratory)
4875: On the primal and dual formulations of the Discrete Mumford-Shah functional
Nelly Pustelnik ()*
4901: Tangent Bundle Filters and Neural Networks: from Manifolds to Cellular Sheaves and Back
Claudio Battiloro (Sapienza University of Rome)*; Zhiyang Wang (University of Pennsylvania); Hans
Riess (Duke University); Paolo Di Lorenzo (Sapienza University of Rome); Alejandro Ribeiro (University
of Pennsylvania)
4953: Estimating and Analyzing Neural Information Flow Using Signal Processing on Graphs
Felix Schwock (University of Washington)*; Julien Bloch (University of Washington); Les Atlas (University
of Washington); Shima Abadi (University of Washington); Azadeh Yazdan-Shahmorad (University of
Washington)
5077: Unique Bispectrum Inversion for Signals with Finite Spectral/Temporal Support
Samuel Pinilla (STFC)*; Kumar Vijay Mishra (United States DEVCOM Army Research Laboratory); Brian
M Sadler (Army Research Laboratory, USA)
5115: Discriminative Vector Learning with Application To Single Channel Speech Separation
Ha Minh Tan (National Central University); Kai-Wen Liang (Department of Communication Engineering,
National Central University); Jia-Ching Wang (National Central University)*
5127: HeMPPCAT: Mixtures of Probabilistic Principal Component Analysers for Data with
Heteroscedastic Noise
Alec Xu (University of Michigan)*; Laura Balzano (University of Michigan); Jeffrey A Fessler (University of
Michigan)
5352: Adaptive ECCM for Mitigating Smart Jammers
Shashwat Jain (Cornell University)*; Kunal Pattanayak (Cornell University); Vikram Krishnamurthy
(Cornell University); Christopher Berry (Lockheed Martin Advanced Technology Labs)
5765: BEYOND RATE CODING: SIGNAL CODING AND RECONSTRUCTION USING LEAN SPIKE
TRAINS
Anik Chattopadhyay (University of Florida)*; Arunava Banerjee (University of Florida)
5951: SPATIAL INFERENCE USING CENSORED MULTIPLE TESTING WITH FDR CONTROL
Martin Gölz (Technische Universität Darmstadt)*; Abdelhak M Zoubir (Technische Universität Darmstadt);
Visa Koivunen (Aalto University)
6007: Element Selection with Wide Class of Optimization Criteria Using Non-convex Sparse
Optimization
Taiga Kawamura (Tokyo Metropolitan University)*; Natsuki Ueno (Tokyo Metropolitan University);
Nobutaka Ono (Tokyo Metropolitan University)
6057: Low-rank plus sparse trajectory decomposition for direct exoplanet imaging
Simon Vary (ICTEAM/INMA, UCLouvain)*; Hazan Daglayan (UCLouvain); Laurent Jacques (Université
catholique de Louvain); P.-A. Absil (UCLouvain)
6351: Robust M-Estimation based Distributed Expectation Maximization Algorithm with Robust
Aggregation
Christian A. Schroth (Technische Universität Darmstadt)*; Stefan Vlaski (Imperial College London);
Abdelhak M Zoubir (Technische Universität Darmstadt)
6378: Global Localisation in Continuous Magnetic Vector Fields Using Gaussian Processes
William T McDonald (University of Technology, Sydney)*; Cedric Le Gentil (University of Technology
Sydney); Teresa A. Vidal-Calleja (University of Technology Sydney)
6529: Differentiable adaptive short-time Fourier transform with respect to the window length
Maxime Leiber (INRIA)*; Yosra Marnissi (SAFRAN TECH); Axel Barrau (Offroad); Mohammed El Badaoui
(Safran Tech)
Speech and Language Processing
112: Reducing the gap between streaming and non-streaming Transducer-based ASR models by
adaptive two-stage knowledge distillation
Haitao Tang (iFlytek Research)*; Yu Fu (Zhejiang University); Lei Sun (iFlytek Research); Jiabin Xue
(Harbin Institute of Technology); Dan Liu (iFLYTEK Co., LTD.); Yongchao Li (iFlytek Research); Zhiqiang
Ma (iFlytek Research); Minghui Wu (iFlytek Research); Jia Pan (iFlytek Research); Genshun Wan (iFlytek
Research); Ming'en Zhao (iFlytek Research)
209: Improving Contextual Spelling Correction by External Acoustics Attention and Semantic
Aware Data Augmentation
Xiaoqiang Wang (Microsoft)*; Yanqing Liu (Microsoft); Jinyu Li (Microsoft); Sheng Zhao (Microsoft)
253: ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-
supervised Speech Representations
Shehzeen S Hussain (UCSD)*; Paarth Neekhara (UCSD); Jocelyn Huang (NVIDIA); Jason Li (NVIDIA);
Boris Ginsburg (NVIDIA)
298: Contrastive Learning at the Relation and Event Level for Rumor Detection
Yingrui Xu (Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security,
University of Chinese Academy of Sciences); Jingyuan Hu (Institute of Information Engineering, Chinese
Academy of Sciences)*; jingguo ge (iie,cas); Yulei Wu (University Of Exeter); Hui Li (Institute of
Information Engineering, Chinese Academy of Sciences); Tong Li (Institute of Information Engineering,
Chinese Academy of Sciences)
303: Parameter Efficient Transfer Learning for Various Speech Processing Tasks
Shinta Otake (Tokyo Institute of Technology)*; Rei Kawakami (Tokyo Institute of Technology); Nakamasa
Inoue (Tokyo Institute of Technology)
392: From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-
Lingual Speech Recognition
Chao-Han Huck Yang (Georgia Institute of Technology)*; Bo Li (Google); Yu Zhang (Google); Nanxin
Chen (Johns Hopkins University); Rohit Prabhavalkar (Google); Tara Sainath (Google); Trevor Strohman
(Google)
393: VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational
Speech Recognition
Naoyuki Kanda (Microsoft)*; Jian Wu (Microsoft); Xiaofei Wang (Microsoft); Zhuo Chen (Microsoft); Jinyu
Li (Microsoft); Takuya Yoshioka (Microsoft)
400: From Easy to Hard: Two-stage Selector and Reader for Multi-hop Question Answering
Xin-Yi Li (State Key Laboratory for Novel Software Technology, Nanjing University); Wei-Jun Lei (State
Key Laboratory for Novel Software Technology, Nanjing University); Yu-Bin Yang (State Key Laboratory
for Novel Software Technology, Nanjing University)*
401: Joint unsupervised and supervised learning for context-aware language identification
Jinseok Park (42dot)*; Hyung Yong Kim (42dot); Jihwan Park (42dot Inc.); Byeong-Yeol Kim (42dot);
Shukjae Choi (Hyundai Motor Company); Yunkyu Lim (42dot)
462: LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech
Jie Chen (Shenzhen International Graduate School, Tsinghua University)*; Xingchen Song (Horizon
Robotics, Beijing, China); Zhendong Peng (Horizon Robotics, Beijing, China); Binbin Zhang (Horizon
Robotics, Beijing, China); Fuping Pan (Horizon Robotics, Beijing, China); Zhiyong Wu (Tsinghua
University)
510: WeSinger 2: Fully Parallel Singing Voice Synthesis via Multi-Singer Conditional Adversarial
Training
Zewang Zhang (Tencent Inc.)*; Yibin Zheng (Tencent Inc, China); Xinhui Li (Tencent Inc); Li Lu (Tencent
Inc)
531: A Reality Check and A Practical Baseline for Semantic Speech Embedding
Guangyu Chen (Renmin University of China)*; Yuanyuan Cao (Renmin University of China)
540: Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained
Representations
Siyuan Shen (East China Normal University)*; Feng Liu (East China Normal University); Aimin Zhou (East
China Normal University)
578: LiteG2P: A fast, light and high accuracy model for Grapheme-to-Phoneme conversion
Chunfeng Wang (Bytedance Inc)*; Peisong Huang (ByteDance Inc.); Yuxiang Zou (Bytedance); Haoyu
Zhang (Bytedance); Shichao Liu (ByteDance); Xiang Yin (ByteDance AI LAB); Zejun Ma (Bytedance)
611: Question Answering system with Sparse and Noisy Feedback
Djallel Bouneffouf (IBM)*; Oznur Alkan (Optum); Raphael Feraud (Orange Labs); Baihan Lin (Columbia
University)
636: HAG: Hierarchical Attention with Graph Network for Dialogue Act Classification in
Conversation
Changzeng Fu (Osaka University)*; Zhenghan Chen (Peking University); Jiaqi Shi (Osaka University;
RIKEN); Bowen Wu (Osaka University); Chaoran Liu (Advanced Telecommunications Research Institute
International); Carlos Toshinori Ishi (Advanced Telecommunications Research Institute International);
Hiroshi Ishiguro (Osaka University)
640: LED: Label Correlation Enhanced Decoder for Multi-Label Text Classification
Kefan Ma (Shanghai Jiao Tong University)*; Zheng Huang (Shanghai Jiao Tong University); Xinrui Deng
(Shanghai Jiao Tong University); Jie Guo (Shanghai Jiao Tong University); Weidong Qiu (Shanghai
Jiaotong University)
663: Evaluating Speech–Phoneme Alignment and Its Impact on Neural Text-To-Speech Synthesis
Frank Zalkow (Fraunhofer IIS)*; Prachi Govalkar (Fraunhofer IIS); Meinard Müller (International Audio
Laboratories Erlangen); Emanuel Habets (Fraunhofer IIS); Christian Dittmar (Fraunhofer IIS)
766: Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and
Decoder
Hao Shi (Kyoto University)*; Masato Mimura (Kyoto University); Longbiao Wang (Tianjin University);
Jianwu Dang (Tianjin University); Tatsuya Kawahara (Kyoto University)
767: Contextual Similarity is More Valuable than Character Similarity: An Empirical Study for
Chinese Spell Checking
Ding Zhang (Tsinghua University); Yinghui Li (Tsinghua University)*; Qingyu Zhou (OPPO Research
Institute); Shirong Ma (Tsinghua University); Li Yangning (Tsinghua Shenzhen International Graduate
School); Yunbo Cao (Tencent); Hai-Tao Zheng (Tsinghua University)
770: Prompt-Distiller: Few-shot Knowledge Distillation for Prompt-based Language Learners with
Dual Contrastive Learning
Boyu Hou (Chongqing University); Chengyu Wang (Alibaba)*; Xiaoqing Chen (Chongqing University);
Minghui Qiu (Alibaba); Liang Feng (Chongqing University, China); Jun Huang (Alibaba Group)
785: Stabilising and accelerating light gated recurrent units for automatic speech recognition
Adel Moumen (Avignon University)*; Titouan Parcollet (Samsung AI Research)
797: Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source
Speech
Maryam Fazel-Zarandi (Meta); Wei-Ning Hsu (Massachusetts Institute of Technology)*
826: Advancing the dimensionality reduction of speaker embeddings for speaker diarisation:
disentangling noise and informing speech activity
You Jin Kim (Naver Corporation)*; Heesoo Heo (Naver Corp.); Jee-weon Jung (Naver Corporation);
Youngki Kwon (Naver Corporation); Bong-Jin Lee (Naver Corporation); Joon Son Chung (KAIST)
831: Matching-based Term Semantics Pre-training for Spoken Patient Query Understanding
Zefa Hu (Institute of Automation, Chinese Academy of Sciences)*; Xiuyi Chen (Institute of
Automation, Chinese Academy of Sciences); Haoran Wu (Institute of Automation, Chinese Academy of
Sciences); Minglun Han (Institute of Automation, Chinese Academy of Sciences); Ni Ziyi (CASIA); Jing
Shi (Institute of Automation Chinese Academy of Sciences); Shuang Xu (casia); Bo Xu (Institute of
Automation, Chinese Academy of Sciences)
836: Keyword-Specific Acoustic Model Pruning for Open Vocabulary Keyword Spotting
Yujie Yang (Tsinghua University)*; Kun Zhang (The Chinese University of Hong Kong); Zhiyong Wu
(Tsinghua University); Helen Meng (The Chinese University of Hong Kong)
865: Towards A Unified Training for Levenshtein Transformer
Kangjie Zheng (Peking University)*; Longyue Wang (Tencent AI Lab); Zhihao Wang (Xiamen University);
Chen Binqi (Peking University); Ming Zhang (Peking University); Zhaopeng Tu (Tencent AI Lab)
915: PERCEIVE AND PREDICT: SELF SUPERVISED SPEECH REPRESENTATION BASED LOSS
FUNCTIONS FOR SPEECH ENHANCEMENT
George L Close (University of Sheffield)*; William Ravenscroft (The University of Sheffield); Thomas Hain
(University of Sheffield); Stefan Goetze (University of Sheffield)
930: Learning Cross-modal Audiovisual Representations with Ladder Networks for Emotion
Recognition
Lucas Goncalves (The University of Texas at Dallas)*; Carlos Busso (University of Texas at Dallas)
958: D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network using Joint
Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement
Shengkui Zhao (Alibaba Group)*; Bin Ma (Alibaba, Singapore R&D Center)
959: MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated
Single-head Transformer with Convolution-augmented Joint Self-Attentions
Shengkui Zhao (Alibaba Group)*; Bin Ma (Alibaba, Singapore R&D Center)
963: Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-
End Neural Diarization
Dongmei Wang (Microsoft)*; Xiong Xiao (Microsoft); Naoyuki Kanda (Microsoft); Takuya Yoshioka
(Microsoft); Jian Wu (Microsoft)
1007: AdapITN: A FAST, RELIABLE, AND DYNAMIC ADAPTIVE INVERSE TEXT NORMALIZATION
Binh Thai Nguyen (Karlsruhe Institute of Technology)*; Duc Minh Nhat Le (Vietnam Artificial Intelligence
Solutions); Quang Minh Nguyen (Vietnam Artificial Intelligence Solutions); Quoc Truong Do (Vietnam
Artificial Intelligence Solutions); Chi-Mai Luong (ICTLab, University of Science and Technology of Hanoi,
Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet, Cau Giay, Hanoi, Vietnam.);
Alexander Waibel (Karlsruhe Institute of Technology)
1025: Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor
and Hierarchical Modeling in Speech Synthesis
Chunyu Qiang (Kwai)*; Peng Yang (Kwai); Hao Che (Kwai); Ying Zhang (Kwai); Xiaorui Wang (Kwai);
Zhongyuan Wang (Kwai)
1241: PERIOD VITS: VARIATIONAL INFERENCE WITH EXPLICIT PITCH MODELING FOR END-TO-
END EMOTIONAL SPEECH SYNTHESIS
Yuma Shirahata (LINE Corp.)*; Ryuichi Yamamoto (LINE Corp.); Eunwoo Song (Naver Corporation); Ryo
Terashima (LINE Corp.); Jae-Min Kim (NAVER Cloud Corp.); Kentaro Tachibana (LINE Corp.)
1253: AN ASR-FREE FLUENCY SCORING APPROACH WITH SELF-SUPERVISED LEARNING
Wei Liu (The Chinese University of Hong Kong)*; Kaiqi Fu (Bytedance); Xiaohai Tian (ByteDance); Shuju
Shi (ByteDance); Wei Li (Bytedance); Zejun Ma (Bytedance); Tan Lee (The Chinese University of Hong
Kong)
1336: Extreme bandwidth extension network applied to speech signals captured with noise-
resilient body-conduction microphones
Julien Hauret (Conservatoire national des arts et métiers)*; Thomas Joubaud (ISL); Véronique Zimpfer
(Department of Acoustics and Soldier Protection, French-German Research Institute of Saint-Louis (ISL));
Éric BAVU (Conservatoire National des Arts et Métiers)
1373: Fast and accurate factorized neural transducer for text adaption of end-to-end speech
recognition models
Rui Zhao (Microsoft)*; JIAN XUE (Microsoft Corporation); Partha Parthasarathy (Microsoft); Veljko
Miljanic (Microsoft); Jinyu Li (Microsoft)
1378: Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers
Heinrich Dinkel (Xiaomi Technology)*; Yongqing Wang (Xiaomi); Zhiyong Yan (Xiaomi); Junbo Zhang
(Xiaomi); Yujun Wang (Xiaomi)
1398: Streaming Voice Conversion Via Intermediate Bottleneck Features And Non-streaming
Teacher Guidance
yuanzhe chen (Bytedance)*; Ming Tu (ByteDance AI Lab); Tang Li (ByteDance Ltd); Xin Li (ByteDance);
Qiuqiang Kong (Byte Dance); Jiaxin Li (ByteDance); Zhichao Wang (ByteDance); qiao tian (ByteDance);
wang yuping (bytedance); Yuxuan Wang (ByteDance AI Lab)
1423: Knowledge-Aware Graph Convolutional Network with Utterance-Specific Window Search for
Emotion Recognition in Conversations
Xiaotong Zhang (School of Software, Dalian University of Technology)*; Peng He (School of
Software, Dalian University of Technology); Han Liu (Dalian University of Technology); Zhengxi Yin
(Huawei Technologies Co. Ltd); Xinyue Liu (School of Software, Dalian University of Technology);
Xianchao Zhang (School of Software, Dalian University of Technology)
1457: Optimal Transport with a Diversified Memory Bank for Cross-Domain Speaker Verification
Ruiteng Zhang (Tianjin University)*; Jianguo Wei (School of Computer Software, Tianjin University,
Tianjin, China); Xugang Lu (NICT); Wenhuan Lu (Tianjin University); Di Jin (Tianjin University); Lin Zhang
(National Institute of Informatics); Junnhai Xu (Tianjin Key Laboratory of Cognitive Computing and
Application, College of Intelligence and Computing, Tianjin University)
1499: Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation
Stefan Braun (Apple)*; Erik McDermott (Apple); Roger Hsiao (Apple)
1518: Continual Learning for On-Device Speech Recognition using Disentangled Conformers
Anuj Diwan (University of Texas at Austin)*; Ching-Feng Yeh (Facebook); Wei-Ning Hsu (Massachusetts
Institute of Technology); Paden Tomasello (Meta); Eunsol Choi (University of Texas at Austin); David
Harwath (The University of Texas at Austin); Abdelrahman Mohamed (Rembrand Inc)
1524: Multi-output RNN-T Joint Networks for Multi-task Learning of ASR and Auxiliary Tasks
Weiran Wang (Google)*; Ding Zhao (Google); Shaojin Ding (Google); Hao Zhang (Google); Shuo-yiin
Chang (Google); David Rybach (Google); Tara Sainath (Google); Yanzhang He (Google); Ian McGraw ();
Shankar Kumar (Google)
1604: Explanations for Automatic Speech Recognition
Xiaoliang Wu (University of Edinburgh)*; Peter Bell (University of Edinburgh); Ajitha Rajan (University of
Edinburgh)
1638: Spoofed training data for speech spoofing countermeasure can be efficiently created using
neural vocoders
Xin Wang (National Institute of Informatics)*; Junichi Yamagishi (National Institute of Informatics)
1639: Multi-speaker Data Augmentation for Improved End-to-end Automatic Speech Recognition
Samuel Thomas (IBM Research AI)*; Jeff Kuo (IBM); George Saon (IBM); Brian Kingsbury (IBM
Research)
1647: Named Entity Detection and Injection for Direct Speech Translation
Marco Gaido (Fondazione Bruno Kessler)*; Yun Tang (Meta); Ilia Kulikov (Meta); Rongqing Huang (Meta);
Hongyu Gong (Meta); Hirofumi Inaguma (Meta)
1655: STRUCTURED STATE SPACE DECODER FOR SPEECH RECOGNITION AND SYNTHESIS
Koichi Miyazaki (CyberAgent, Inc.)*; Masato Murata (CyberAgent, Inc.); Tomoki Koriyama (CyberAgent,
Inc.)
1661: TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length
Penalty
Xingchen Song (Tsinghua University)*; Di Wu (horizon); Zhiyong Wu (Tsinghua University); Binbin Zhang
(horizon); Yuekai Zhang (Wenet Open Source Community); Zhendong Peng (horizon); Wenpeng Li
(horizon); Fuping Pan (horizon); Changbao Zhu (horizon)
1672: Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis
Karren D Yang (Apple)*; Ting-Yao Hu (Carnegie Mellon University); Jen-Hao Rick Chang (Apple); Hema
Koppula (Apple); Oncel Tuzel (Apple)
1674: On-the-fly Text Retrieval for End-to-End ASR Adaptation
Bolaji Yusuf (Bogazici University)*; Aditya Gourav (Amazon); Ankur Gandhe (Amazon Alexa); Ivan Bulyko
(Amazon)
1699: On Using the UA-Speech and TORGO Databases to Validate Automatic Dysarthric Speech
Classification Approaches
Guilherme Schu (Idiap)*; Parvaneh janbakhshi (Bayer AG); Ina Kodrasi (Idiap Research Institute)
1792: Time-Aware Multiway Adaptive Fusion Network for Temporal Knowledge Graph Question
Answering
Yonghao Liu (Centre for Natural Language Processing, Meituan Inc., Beijing, China); Di Liang (Centre for
Natural Language Processing, Meituan Inc., Beijing, China)*; Fang Fang (Department of Automation,
Tsinghua University, Beijing, China); Sirui Wang (Centre for Natural Language Processing, Meituan Inc.,
Beijing, China); Wei Wu (Centre for Natural Language Processing, Meituan Inc., Beijing, China); Rui
Jiang (Department of Automation, Tsinghua University, Beijing, China)
1826: EGAN: A Neural Excitation Generation Model based on Generative Adversarial Networks
with Harmonics and Noise Input
Yen-Ting Lin (National Taipei University)*; Chen Yu CHIANG (National Taipei University)
1848: Disentangled Training with Adversarial Examples For Robust Small-footprint Keyword
Spotting
ZHENYU WANG (UTD); Li Wan (Meta); Biqiao Zhang (Meta); Yiteng Huang (Meta Platforms); Shang-
Wen Li (Meta); Ming Sun (Meta); Xin Lei (Meta); Zhaojun Yang (Meta)*
1850: Anchored Speech Recognition with Neural Transducers
Desh Raj (Johns Hopkins University)*; Junteng Jia (Meta AI); Jay Mahadeokar (Meta AI); Chunyang Wu
(Meta AI); Niko Moritz (Meta); Xiaohui Zhang (Meta); Ozlem Kalinli (Meta AI)
1867: A Holistic Cascade System, Benchmark, and Human Evaluation Protocol for Expressive
Speech-to-Speech Translation
Wen-Chin Huang (Nagoya University)*; Benjamin Peloquin (Meta AI); Justine Kao (Meta AI); Changhan
Wang (Facebook AI Research); Hongyu Gong (Meta AI); Elizabeth Salesky (Johns Hopkins University);
Yossi Adi (Facebook AI Research); Ann Lee (Facebook, Inc.); Peng-Jen Chen (Meta AI)
1883: Supervised Hierarchical Clustering using Graph Neural Networks for Speaker Diarization
Prachi Singh (Indian Institute of Science, Bangalore)*; Amrit Kaul (Indian Institute of Science,
Bangalore); Sriram Ganapathy (Indian Institute of Science, Bangalore, India, 560012)
1886: SLBERT: A NOVEL PRE-TRAINING FRAMEWORK FOR JOINT SPEECH AND LANGUAGE
MODELING
Onkar Susladkar (Natter Labs)*; Prajwal Gatti (Dayananda Sagar College of Engineering); Santosh
Kumar Yadav (Natter Labs)
1897: ON WORD ERROR RATE DEFINITIONS AND THEIR EFFICIENT COMPUTATION FOR MULTI-
SPEAKER SPEECH RECOGNITION SYSTEMS
Thilo von Neumann (Paderborn University)*; Christoph B Boeddeker (Paderborn University); Keisuke
Kinoshita (Google); Marc Delcroix (NTT); Reinhold Haeb-Umbach (University of Paderborn)
1905: "Prediction of Sleepiness Ratings from Voice by Man and Machine": a perceptual
experiment replication study
Vincent P. Martin (Université de Bordeaux)*; Aymeric Ferron (INRIA Bordeaux); Jean-Luc Rouas (CNRS);
Pierre Philip (Université de Bordeaux)
1919: Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition
Yuchen Hu (Nanyang Technological University)*; Chen Chen (Nanyang Technological University); Ruizhe
Li (University of Aberdeen); Qiu-Shi Zhu (University of Science and Technology of China); Eng Siong
Chng (Nanyang Technological University)
1929: MADI: Inter-domain Matching and Intra-domain Discrimination for Cross-domain Speech
Recognition
Jiaming Zhou (Nankai University)*; Shiwan Zhao (Independent Researcher); Ning Jiang (Mashang
Consumer Finance Co., Ltd.); Guoqing Zhao (Mashang Consumer Finance Co., Ltd); Yong Qin (Nankai
University)
1948: Masked Token Similarity Transfer for Compressing Transformer-Based ASR Models
Euntae Choi (Seoul National University)*; Youshin Lim (42dot); Byeong-Yeol Kim (42dot); Hyung Yong
Kim (42dot); Hanbin Lee (42dot); Yunkyu Lim (42dot); Seung Woo Yu (42dot); Sungjoo Yoo (Seoul
National University)
1981: Pitch Mark Detection from Noisy Speech Waveform using Wave-U-Net
Hyun-Joon Nam (Pohang University of Science and Technology)*; Hong-June Park (Pohang University of
Science and Technology)
2003: ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding
Inpainting
Zexu Pan (National University of Singapore)*; Wupeng Wang (NUS); Marvin Borsdorf (University of
Bremen); Haizhou Li (The Chinese University of Hong Kong (Shenzhen))
Yiming Wang (Microsoft Corporation); Shujie Liu (Microsoft Research Asia); Zhuo Chen (Microsoft);
DeLiang Wang (Ohio State University); Michael Zeng (Microsoft)
2056: Relational Representation Learning for Zero-shot Relation Extraction with Instance
Prompting and Prototype Rectification
Bin Duan (Beijing University of Posts and Telecommunications); Xingxian Liu (Beijing University of Posts
and Telecommunications); Shusen Wang (Beijing University of Posts and Telecommunications); Yajing Xu
(Beijing University of Posts and Telecommunications)*; Bo Xiao (Beijing University of Posts and
Telecommunications)
2096: The Edinburgh International Accents of English Corpus: Towards the Democratization of
English ASR
Ramon R Sanabria (The University Of Edinburgh)*; Nikolay Bogoychev (The University Of Edinburgh);
Nina Markl (University of Edinburgh); Andrea Carmantini (University of Edinburgh); Ondrej Klejch
(University of Edinburgh); Peter Bell (University of Edinburgh)
2123: Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech
Recognition
Shujie HU (The Chinese University of Hong Kong)*; Xurong Xie (Institute of Software, Chinese Academy
of Sciences); Zengrui Jin (The Chinese University of Hong Kong); Mengzhe GENG (The Chinese
University of Hong Kong); Yi Wang (The Chinese University of Hong Kong); Mingyu Cui (The Chinese
University of Hong Kong); Jiajun Deng (The Chinese University of Hong Kong); Xunying Liu (The Chinese
University of Hong Kong); Helen Meng (The Chinese University of Hong Kong)
2179: Automatic Severity Classification of Dysarthric speech by using Self-supervised Model with
Multi-task Learning
Eun Jung Yeo (Seoul National University)*; Kwanghee Choi (Sogang University); Sunhee Kim (Seoul
National University); Minhwa Chung (Seoul National University)
2181: A Slot-shared Span Prediction-based Neural Network for Multi-Domain Dialogue State
Tracking
Abibulla Atawulla (University of Chinese Academy of Sciences)*; Xi Zhou (Xinjiang Technical Institute of
Physics & Chemistry, Chinese Academy of Sciences); Yating Yang (Xinjiang Technical Institute of Physics
& Chemistry, Chinese Academy of Sciences); Bo Ma (Xinjiang Technical Institute of Physics & Chemistry,
Chinese Academy of Sciences); Fengyi Yang (University of Chinese Academy of Sciences)
2237: Lego-Features: Exporting modular encoder features for streaming and deliberation ASR
Rami Botros (Google)*; Rohit Prabhavalkar (Google); Johan Schalkwyk (Google); Ciprian Chelba (Google
Research); Tara Sainath (Google); Françoise Beaufays (Google)
2267: Towards trustworthy phoneme boundary detection with autoregressive model and improved
evaluation metric
Hyeongju Kim (Supertone, Inc.)*; Hyeong-Seok Choi (Seoul National University)
2314: A FEW SHOT LEARNING OF SINGING TECHNIQUE CONVERSION BASED ON CYCLE
CONSISTENCY GENERATIVE ADVERSARIAL NETWORKS
Po-Wei Chen (National Tsing Hua University)*; Von-Wun Soo (nthu)
2342: Dual Path Modeling for Semantic Matching by Perceiving Subtle Conflicts
Chao Xue (Beihang University); Di Liang (Centre for Natural Language Processing, Meituan Inc., Beijing,
China)*; Sirui Wang (Centre for Natural Language Processing, Meituan Inc., Beijing, China); Jing Zhang
(Beihang University); Wei Wu (Centre for Natural Language Processing, Meituan Inc., Beijing, China)
2397: Unsupervised model-based speaker adaptation of end-to-end lattice-free MMI model for
speech recognition
Xurong Xie (Institute of Software, Chinese Academy of Sciences)*; Xunying Liu (The Chinese University
of Hong Kong); Hui Chen (Institute of Software, Chinese Academy of Sciences); Hongan Wang (Institute
of Software, Chinese Academy of Sciences)
2400: Query-Utterance Attention with Joint modeling for Query-Focused Meeting Summarization
Xingxian Liu (Beijing University of Posts and Telecommunications); Bin Duan (Beijing University of Posts
and Telecommunications); Bo Xiao (Beijing University of Posts and Telecommunications); Yajing Xu
(Beijing University of Posts and Telecommunications)*
2426: A Dynamic Graph Interactive Framework with Label-Semantic Injection for Spoken
Language Understanding
Zhihong Zhu (Peking University)*; Weiyuan Xu (Peking University); Xuxin Cheng (Peking University);
Tengtao Song (Peking University); Yuexian Zou (Peking University)
2433: TOWARDS DOMAIN GENERALISATION IN ASR WITH ELITIST SAMPLING AND ENSEMBLE
KNOWLEDGE DISTILLATION
Rehan Ahmad (University of Sheffield)*; Md Asif Jalal (Samsung Research UK); Muhammad Umar
Farooq (University of Sheffield); Anna L Ollerenshaw (University of Sheffield); Thomas Hain (University of
Sheffield)
2456: Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition
Zengrui Jin (The Chinese University of Hong Kong)*; Xurong Xie (Institute of Software, Chinese Academy
of Sciences); Mengzhe GENG (The Chinese University of Hong Kong); Tianzi Wang (The Chinese
University of Hong Kong); Shujie HU (The Chinese University of Hong Kong); Jiajun Deng (The Chinese
University of Hong Kong); Guinan Li (Chinese University of Hong Kong); Xunying Liu (The Chinese
University of Hong Kong)
2490: Multi-Scale Receptive Field Graph Model for Emotion Recognition in Conversations
JIE WEI (Xi'an Jiaotong University)*; Guanyu Hu (Xi'an Jiaotong University); Anh Tuan Luu (Nanyang
Technological University); Xinyu Yang (Xi'an Jiaotong University); WenJing Zhu (DXM)
2573: LEARNING ROBUST SELF-ATTENTION FEATURES FOR SPEECH EMOTION RECOGNITION
WITH LABEL-ADAPTIVE MIXUP
Lei Kang (Shantou University)*; Lichao Zhang (Air Force Engineering University); Dazhi Jiang (Shantou
University)
2594: Cross-domain Diffusion based Speech Enhancement for Very Noisy Speech
Heming Wang (The Ohio State University)*; DeLiang Wang (Ohio State University)
2619: nVOC-22: A low cost Mel Spectrogram vocoder for mobile devices
Rakesh Iyer (Google Inc)*
2621: Training Large-Vocabulary Neural Language Models by Private Federated Learning for
Resource-Constrained Devices
Mingbin Xu (Apple); Congzheng Song (Apple)*; Ye Tian (Apple); Neha Agrawal (Apple); Filip Granqvist
(Apple); Rogier C van Dalen (Samsung AI Center, Cambridge, UK); Xiao Zhang (Apple); Arturo Argueta
(Apple); Shiyi Han (Apple); Yaqiao Deng (Apple); Leo Liu (Apple); Anmol Walia (Apple); Alex Jin (Apple)
2625: Improving Speech Prosody of Audiobook Text-to-Speech Synthesis with Acoustic and
Textual Contexts
Detai Xin (The University of Tokyo)*; Sharath Adavanne (Rakuten Inc.); Federico Ang (Rakuten Inc.);
Ashish Kulkarni (Rakuten); Shinnosuke Takamichi (The University of Tokyo); Hiroshi Saruwatari (The
University of Tokyo)
2640: A UNIFIED ONE-SHOT PROSODY AND SPEAKER CONVERSION SYSTEM WITH SELF-
SUPERVISED DISCRETE SPEECH UNITS
Li-Wei Chen (Carnegie Mellon University)*; Shinji Watanabe (Carnegie Mellon University); Alexander I.
Rudnicky (Carnegie Mellon University)
2642: Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
Yinghao A Li (Columbia University)*; Cong Han (Columbia University); Xilin Jiang (Columbia University);
Nima Mesgarani (Columbia University)
2671: Bridging Speech and Text Pre-trained Models with Unsupervised ASR
Jiatong Shi (Carnegie Mellon University)*; Chan-Jan Hsu (National Taiwan University); ho lam Chung
(National Taiwan University); Dongji Gao (Johns Hopkins University); Paola Garcia (Johns Hopkins
University); Shinji Watanabe (Carnegie Mellon University); Ann Lee (Meta, Inc.); Hung-yi Lee (National
Taiwan University)
2679: Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-
Speech
Takaaki Saeki (The University of Tokyo)*; Heiga Zen (Google); Zhehuai Chen (Google); Nobuyuki
Morioka (Google); Yuan Wang (Google); Yu Zhang (Google); Ankur Bapna (Google Research); Andrew
Rosenberg (Google LLC); Bhuvana Ramabhadran (Google)
2700: Improving CTC-based ASR Models with Gated Interlayer Collaboration
Yuting Yang (NetEase Yidun AI Lab)*; Yuke Li (NetEase Yidun AI Lab); Binbin Du (NetEase Yidun AI Lab)
2702: Reducing Language Confusion for Code-switching Speech Recognition with Token-level
Language Diarization
Hexin Liu (Nanyang Technological University)*; Haihua Xu (Temasek Laboratories, Nanyang
Technological University, Singapore); Paola Garcia (Johns Hopkins University); Andy W H Khong
(Nanyang Technological University); Yi He (Bytedance); Sanjeev Khudanpur (Johns Hopkins University)
2721: Speech reconstruction from silent tongue and lip articulation by pseudo target generation
and domain adversarial training
Rui-Chen Zheng (University of Science and Technology of China)*; Yang Ai (University of Science and
Technology of China); Zhen-Hua Ling (University of Science and Technology of China)
2768: Adaptive Knowledge Distillation between Text and Speech Pre-trained Models
Jinjie Ni (Nanyang Technological University)*; Yukun Ma (Alibaba Group); Wen Wang (Alibaba Group);
Qian Chen (Speech Lab, DAMO Academy, Alibaba Group); Dianwen Ng (Alibaba Group/Nanyang
Technological University); HAN LEI (Nanyang Technological University); Trung Hieu Nguyen (Alibaba
Group); Chong Zhang (Alibaba Group); Bin Ma (Alibaba, Singapore R&D Center); Erik Cambria
(Nanyang Technological University, Singapore)
2772: Picking the Underused Heads: A Network Pruning Perspective of Attention Head Selection
for Fusing Dialogue Coreference Information
Zhengyuan Liu (A*STAR)*; Nancy Chen (Institute for Infocomm Research)
2775: LIMI-VC: A LIGHT WEIGHT VOICE CONVERSION MODEL WITH MUTUAL INFORMATION
DISENTANGLEMENT
Liangjie Huang (Beijing Language and Culture University); Tian Yuan (Baidu (China) Co., Ltd); Yunming
Liang (Baidu (China) Co., Ltd); Zeyu Chen (Baidu, Inc.); Can Wen (Baidu (China) Co., Ltd); Yanlu Xie
(Beijing Language and Culture University); Jinsong Zhang (Beijing Language and Culture University);
dengfeng ke (blcu.edu.cn)*
2778: SELECTIVE FILM CONDITIONING WITH CTC-BASED ASR PROBABILITY FOR SPEECH
ENHANCEMENT
Da-Hee Yang (Hanyang University); Joon-Hyuk Chang (Hanyang University)*
2788: CLICKER: Attention-Based Cross-Lingual Commonsense Knowledge Transfer
Ruolin Su (Georgia Institute of Technology)*; Zhongkai Sun (Amazon Alexa AI); Sixing Lu (Amazon);
chengyuan ma (amazon); Chenlei Guo (Amazon)
2793: SPTEAE: A SOFT PROMPT TRANSFER MODEL FOR ZERO-SHOT CROSS-LINGUAL EVENT
ARGUMENT EXTRACTION
Huipeng Ma (National Computer System Engineering Research Institute of China)*; qiu tang (National
Computer System Engineering Research Institute of China); ni zhang (National Computer System
Engineering Research Institute of China); Rui Xu (National Computer System Engineering Research
Institute of China); Yanhua Shao (National Computer System Engineering Research Institute of China);
Wei Yan (National Computer System Engineering Research Institute of China); Yaojun Wang (China
Agricultural University)
2862: Improving learning objectives for speaker verification from the perspective of score
comparison
Min Hyun Han (Seoul National University)*; Sung Hwan Mun (Seoul National University); Minchan Kim
(Seoul National University); Myeonghun Jeong (Seoul National University); Sunghwan Ahn (Seoul
National University); Nam Soo Kim (Seoul National University)
2865: Twitter Stance Detection via Neural Production Systems
Bowen Zhang (Shenzhen Technology University)*; Daijun Ding (Shenzhen Technology University);
Guangning Xu (Harbin Institute of Technology, Shenzhen); Jinjin Guo (JD Intelligent Cities Research);
Zhichao Huang (JD Intelligent Cities Research); Xu Huang (Harbin Institute of Technology, Shenzhen)
2888: Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models
Reem A Gody (The University of Texas at Austin)*; David Harwath (The University of Texas at Austin)
2889: PCF: ECAPA-TDNN with Progressive Channel Fusion for Speaker Verification
Zhenduo Zhao (Institute of Acoustics, Chinese Academy of Sciences)*; Zhuo Li (Key Laboratory of
Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences);
Wenchao Wang (Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics,
Chinese Academy of Sciences, Beijing, China); pengyuan zhang (Institute of Acoustics, Chinese
Academy of Sciences)
2977: Integrating Syntactic and Semantic Knowledge in AMR Parsing with Heterogeneous Graph
Attention Network
Yikemaiti Sataer (Southeast University)*; Chuanqi Shi (Southeast University); Miao Gao (Southeast
University); Yunlong Fan (Southeast University); Bin Li (Southeast University); Zhiqiang Gao (Southeast
University)
2982: Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using
wav2vec 2.0
Marie Kunešová (University of West Bohemia)*; Zbyněk Zajíc (University of West Bohemia)
2990: Using Auxiliary Tasks In Multimodal Fusion Of Wav2vec 2.0 And BERT For Multimodal
Emotion Recognition
Dekai Sun (Harbin Institute of Technology)*; yancheng He (Harbin Institute of Technology); jiqing Han
(Harbin Institute of Technology)
3001: Towards Reducing Patient Effort for the Automatic Prediction of Speech Intelligibility in
Head and Neck Cancers
Sebastião Quintas (IRIT, Université de Toulouse, CNRS, Toulouse, France)*; Alberto Abad (INESC-ID);
Julie Mauclair (IRIT); Virginie Woisard (Hospitals of Toulouse); Julien Pinquier (IRIT)
3023: Multi-task Transformer with Relation-attention and Type-attention for Named Entity
Recognition
Ying Mo (Beihang University)*; Hongyin Tang (Meituan); Jiahao Liu (Meituan); Qifan Wang (Meta AI);
Zenglin Xu (Harbin Institute of Technology, Shenzhen); Jingang Wang (Meituan); Wei Wu (Meituan);
Zhoujun Li (Beihang University)
3033: UFO2: A unified pre-training framework for online and offline speech recognition
Li Fu (JD)*; Siqi Li (JD Technology); Qingtao Li (JD Technology); Liping Deng (JD Technology); Fangzhu
Li (JD Technology); fan lu (JD); Meng Chen (JD AI); Xiaodong He (JDT)
3038: Self-adaptive Incremental Machine Speech Chain for Lombard TTS with High-granularity
ASR Feedback in Dynamic Noise Condition
Sashi Novitasari (Nara Institute of Science and Technology)*; Sakriani Sakti (Japan Advanced Institute of
Science and Technology); Satoshi Nakamura (Nara Institute of Science and Technology, Japan)
3066: Multi-lingual pronunciation assessment with unified phoneme set and language-specific
embeddings
Binghuai Lin (MIG, Tencent Science and Technology Ltd.); Liyuan wang (Tencent Technology Co., Ltd)*
3072: Robust multi-modal speech emotion recognition with ASR error adaptation
Binghuai Lin (MIG, Tencent Science and Technology Ltd.); Liyuan wang (Tencent Technology Co., Ltd)*
3083: Narrow Down Before Selection: A Dynamic Exclusion Model For Multiple-Choice QA
Xiyan Liu (Beijing University of Posts and Telecommunications); Yidong Shi (Beijing University of Posts
and Telecommunications); Ruifang Liu (Beijing University of Posts and Telecommunications)*; Ge Bai
(Beijing University of Posts and Telecommunications); Yanyi Chen (Beijing University of Posts and
Telecommunications)
3092: Multi-modal ASR error correction with joint ASR error detection
Binghuai Lin (MIG, Tencent Science and Technology Ltd.); Liyuan wang (Tencent Technology Co., Ltd)*
3093: Joint Modeling for ASR Correction and Dialog State Tracking
Deyuan Wang (Beijing University of Posts and Telecommunications)*; Tiantian Zhang (Beijing University
of Posts and Telecommunications); Caixia Yuan (Beijing University of Posts and Telecommunications);
Xiaojie Wang (Beijing University of Posts and Telecommunications)
3099: UNIFIED PROMPT LEARNING MAKES PRE-TRAINED LANGUAGE MODELS BETTER FEW-
SHOT LEARNERS
Feihu Jin (Institute of Automation, Chinese Academy of Sciences)*; Jinliang Lu (Institute of Automation,
Chinese Academy of Sciences); Jiajun Zhang (Institute of Automation, Chinese Academy of Sciences)
3135: JOINT PRE-TRAINING WITH SPEECH AND BILINGUAL TEXT FOR DIRECT SPEECH TO
SPEECH TRANSLATION
Kun Wei (School of Computer Science, Northwestern Polytechnical University)*; Long Zhou (Microsoft
Research Asia); Ziqiang Zhang (University of Science and Technology of China); LIPING CHEN
(Microsoft); Shujie Liu (Microsoft Research Asia); Lei He (Microsoft Cloud and AI); Jinyu Li (Microsoft);
Furu Wei (Microsoft Research Asia)
3153: Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition
Models
Steven M. Hernandez (Virginia Commonwealth University)*; Ding Zhao (Google); Shaojin Ding (Google);
Antoine Bruguier (Google); Rohit Prabhavalkar (Google); Tara Sainath (Google); Yanzhang He (Google);
Ian McGraw ()
3169: Improving Spoken Language Identification with Map-Mix
Shangeth Rajaa (skit.ai)*; Kriti Anandan (skit.ai); Swaraj Dalmia (skit.ai); Tarun Gupta (IIT Indore); Eng
Siong Chng (Nanyang Technological University)
3175: Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End
Noise-Robust Speech Separation
Yuchen Hu (Nanyang Technological University)*; Chen Chen (Nanyang Technological University); Heqing
Zou (Nanyang Technological University); Xionghu Zhong (Hunan University); Eng Siong Chng (Nanyang
Technological University)
3189: TableIE: Capturing the Interactions among Sub-tasks in Information Extraction via Double
Tables
jiaxing lin (peking university)*; Runxin Xu (Peking University); Baobao Chang (Peking University)
3239: Multi-speaker Speech Synthesis from Electromyographic Signals by Soft Speech Unit
Prediction
Kevin Scheck (University of Bremen)*; Tanja Schultz (University of Bremen)
3241: Deep Subband Network for Joint Suppression of Echo, Noise and Reverberation in Real-
Time Fullband Speech Communication
Feifei Xiong (Alibaba Group)*; Minya Dong (Alibaba Group); Kechenying Zhou (Alibaba Group); Houwei
Zhu (Alibaba Group); Jinwei Feng (Alibaba Group)
3252: SENER: Sentiment Element Named Entity Recognition for Aspect-based Sentiment Analysis
Sun-Kyung Lee (KAIST)*; Jong-Hwan Kim (KAIST)
3258: Domain Adaptation with External Off-Policy Acoustic Catalogs for Scalable Contextual End-
To-End Automated Speech Recognition
David Chan (University of California, Berkeley)*; Shalini Ghosh (Amazon Alexa AI); Ariya Rastrow
(Amazon Alexa); Bjorn Hoffmeister (Amazon)
3286: An End-to-End Neural Network for Image-to-Audio Transformation
Chen Liu (Oregon Health & Science University); Michael Deisher (Intel Corporation)*; Munir Georges
(Intel Corporation); Munir Georges (THI)
3303: Can spoofing countermeasure and speaker verification systems be jointly optimised?
Wanying Ge (EURECOM)*; Hemlata Tak (EURECOM); Massimiliano Todisco (EURECOM); Nicholas
Evans (EURECOM)
3355: Fine-grained Textual Knowledge Transfer to Improve RNN Transducers for Speech
Recognition and Understanding
Vishal Sunder (The Ohio State University)*; Samuel Thomas (IBM Research AI); Jeff Kuo (IBM); Brian
Kingsbury (IBM Research); Eric Fosler-Lussier (Ohio State)
3365: JEIT: JOINT END-TO-END MODEL AND INTERNAL LANGUAGE MODEL TRAINING FOR
SPEECH RECOGNITION
Zhong Meng (Google LLC)*; Weiran Wang (Google); Rohit Prabhavalkar (Google); Tara Sainath
(Google); Tongzhou Chen (Google); Ehsan Variani (Google); Yu Zhang (Google); Bo Li (Google); Andrew
Rosenberg (Google LLC); Bhuvana Ramabhadran (Google)
3368: Variable Attention Masking for Configurable Transformer Transducer Speech Recognition
Pawel Swietojanski (Apple)*; Stefan Braun (Apple); Dogan Can (Apple); Thiago Fraga da Silva (Apple);
Arnab Ghoshal (Apple); Takaaki Hori (Apple); Roger Hsiao (Apple); Henry Mason (Apple); Erik
McDermott (Apple); Jan Silovsky (Apple); Ruchir Travadi (Apple); Xiaodan Zhuang (Apple)
3379: STUDY ON THE FAIRNESS OF SPEAKER VERIFICATION SYSTEMS ACROSS ACCENT AND
GENDER GROUPS
Mariel Estevez (CONICET / Universidad de Buenos Aires)*; Luciana Ferrer (CONICET / Universidad de
Buenos Aires)
3382: Resource-Efficient Transfer Learning From Speech Foundation Model Using Hierarchical
Feature Fusion
Zhouyuan Huo (Google)*; Khe C Sim (Google Inc.); Bo Li (Google); Dongseong Hwang (Google); Tara
Sainath (Google); Trevor Strohman (Google)
3398: Adapting a self-supervised speech representation for noisy speech emotion recognition by
using contrastive teacher-student learning
Seong-Gyun Leem (University of Texas at Dallas); Daniel Fulford (Boston University); JP Onnela (T.H.
Chan School of Public Health Harvard University); David Gard (San Francisco State University); Carlos
Busso (University of Texas at Dallas)*
3399: Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech
Cheol Jun Cho (UC Berkeley)*; Peter Wu (UC Berkeley); Abdelrahman Mohamed (Meta); Gopala Krishna
Anumanchipalli (UC Berkeley)
3423: Leveraging Multiple Sources in Automatic African American English Dialect Detection for
Adults and Children
Alexander Johnson (UCLA)*; Vishwas Shetty (UCLA); Mari Ostendorf (University of Washington); Abeer
Alwan (UCLA)
3440: Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic
model and variational autoencoder
Yusuke Yasuda (Nagoya university)*; Tomoki Toda (Nagoya University)
3461: A mutual implicit sentiment analysis model with bundle-aware contrastive learning
siqi cai (Wuhan University of Technology)*; Jingling Yuan (Wuhan University of Technology); Lin Li
(Wuhan University of Technology)
3462: T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language
Understanding via Phoneme level T5
Chan-Jan Hsu (National Taiwan University)*; ho lam Chung (National Taiwan University); Hung-yi Lee
(National Taiwan University); Yu Tsao (Academia Sinica)
3486: Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low
Computational Complexity
Ahmed Mustafa (Amazon )*; Jean-Marc Valin (Amazon); Jan Büthe (Amazon); Paris Smaragdis
(Amazon); Michael M Goodwin (AWS )
3499: Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers
Duc Le (Meta)*; Frank Seide (Meta); Yuhao Wang (Meta); Yang Li (Meta); Kjell Schubert (Meta); Ozlem
Kalinli (Meta); Mike Seltzer (Meta)
3522: Nonparallel High-Quality Audio Super Resolution with Domain Adaptation and Resampling
CycleGANs
Reo Yoneyama (Nagoya University)*; Ryuichi Yamamoto (LINE Corp.); Kentaro Tachibana (LINE Corp.)
3578: Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition
Saumya Yashmohini Sahai (Amazon); Jing Liu (Amazon.com)*; Thejaswi Muniyappa (Amazon);
Kanthashree Mysore Sathyendra (Amazon); Anastasios Alexandridis (Amazon.com); Grant Strimel
(Amazon); Ross McGowan (Amazon); Ariya Rastrow (Amazon Alexa); Athanasios Mouchtaris (Amazon
Alexa); Feng-Ju Chang (Amazon); Siegfried Kunzmann (Amazon)
3600: Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR
Dongseong Hwang (Google)*; Khe C Sim (Google Inc.); Yu Zhang (Google); Trevor Strohman (Google)
3633: Generic Dependency Modeling for Multi-Party Conversation
Weizhou Shen (Sun Yat-sen University); Xiaojun Quan (Sun Yat-sen University)*; Ke Yang (Sun Yat-sen
University)
3666: token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and
Text
Xianghu Yue (National University of Singapore )*; Junyi Ao (The Chinese University of Hong Kong
(Shenzhen)); Xiaoxue Gao (National University of Singapore); Haizhou Li (The Chinese University of
Hong Kong (Shenzhen))
3692: Mitigating Domain Dependency for Improved Speech Enhancement via SNR Loss Boosting
Lili Yin (Xinjiang University)*; Di Wu (Xinjiang University); Zhibin Qiu (Xinjiang University); Hao Huang
(Xinjiang University)
3714: DAIS: THE DELFT DATABASE OF EEG RECORDINGS OF DUTCH ARTICULATED AND
IMAGINED SPEECH
Bo Dekker (Department of Biomechanical Engineering, Delft University of Technology); Alfred Schouten
(Department of Biomechanical Engineering, Delft University of Technology); Odette Scharenborg
(Multimedia Computing Group, Delft University of Technology)*
3725: MHLAT: Multi-hop Label-wise Attention Model for Automatic ICD Coding
Junwen Duan (Central South University)*; Han Jiang (Central South University); Ying Yu (Central South
University)
3731: Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in
Automatic Speech Recognition
Steven Vander Eeckt (KU Leuven)*; Hugo Van hamme (KU LEUVEN)
3785: Disentangled and Robust Representation Learning for Bragging Classification in Social
Media
Xiang Li (Tianjin university)*; Yucheng Zhou (University of Technology Sydney)
3796: Hybrid Neural Network With Cross- and Self-Module Attention Pooling for Text-Independent
Speaker Verification
Jahangir Alam (Computer Research Institute of Montreal (CRIM), Montreal (Quebec) Canada)*;
Woohyun Kang (Amazon Web Services); Abderrahim Fathan (Computer Research Institute of Montreal
(CRIM), Montreal, Quebec, Canada)
3825: Discriminative Speaker Representation via Contrastive Learning with Class-Aware Attention
in Angular Space
Zhe LI (Hong Kong Polytechnic University)*; Man-Wai MAK (The Hong Kong Polytechnic University);
Helen Meng (The Chinese University of Hong Kong)
3859: MULTILEVEL TRANSFORMER FOR MULTIMODAL EMOTION RECOGNITION
Junyi He (360 DigiTech)*; Meimei Wu (360DigiTech); Meng Li (360 DigitalTech); Xiaobo Zhu
(360DigiTech); Feng Ye (360DigiTech, Inc.)
3896: Dynamic TF-TDNN: Dynamic Time Delay Neural Network based on Temporal-Frequency
Attention for Dialect Recognition
Chao Liao (Kuaishou)*; Jinwen Huang (Kuaishou Technology); Huan Yuan (Kuaishou Technology); Peng
Yao (Kuaishou Inc.); Jianchao Tan (Kwai Inc.); zhang dawei (Kuaishou Technology); Feng Deng
(Kuaishou); Xiaorui Wang (Kwai); Chengru Song (Kuaishou)
3917: Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-
Sum Loss
Mohammad Zeineldeen (RWTH Aachen University / AppTek)*; Kartik Audhkhasi (Google); Murali Karthick
Baskar (Google); Bhuvana Ramabhadran (Google)
3918: MGAT: Multi-granularity Attention based Transformers for Multi-modal Emotion Recognition
Weiquan Fan (South China University of Technology)*; Xiaofen Xing (South China University of
Technology); Bolun Cai (Shopee); Xiangmin Xu (South China University of Technology)
3926: Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention
Frames
Chengdong Liang (Northwestern Polytechnical University)*; Zhang XiaoLei (Northwestern Polytechnical
University); Binbin Zhang (Horizon Robotics); Di Wu (Horizon Robotics); Shengqiang Li (Horizon
Robotics); Xingchen Song (Horizon Robotics); Zhendong Peng (Horizon Robotics); Fuping Pan (Horizon
Robotics)
3928: WeKws: A production first small-footprint end-to-end Keyword Spotting Toolkit
Jie Wang (School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an,
China)*; Menglong Xu (Horizon Robotics); Jingyong Hou (Northwestern Polytechnical University); Binbin
Zhang (Horizon Robotics); Zhang XiaoLei (Northwestern Polytechnical University); Lei Xie (NWPU);
Fuping Pan (Horizon Robotics)
3930: META LEARNING WITH ADAPTIVE LOSS WEIGHT FOR LOW-RESOURCE SPEECH
RECOGNITION
Qiulin Wang (Xiamen University); Wenxuan Hu (Xiamen University); Lin Li (Xiamen University); Qingyang
Hong (Xiamen University)*
3966: JSV-VC: JOINTLY TRAINED SPEAKER VERIFICATION AND VOICE CONVERSION MODELS
Shogo Seki (NTT Corporation)*; Hirokazu Kameoka (NTT Communication Science Laboratories, NTT
Corporation); Kou Tanaka (NTT Corporation); Takuhiro Kaneko (NTT Corporation)
3971: HOW TO PUSH THE FASTEST MODEL 50X FASTER: STREAMING NON-AUTOREGRESSIVE
SPEECH SYNTHESIS ON RESOUCE-LIMITED DEVICES
Thinh Van Nguyen (VinBigdata)*; Cuong H Pham (VinBigdata JSC); Dang-Khoa MAC (VinBigdata)
3973: WeavSpeech: Data Augmentation Strategy for Automatic Speech Recognition via Semantic-
Aware Weaving
Kyusung Seo (KAIST)*; Joonhyung Park (KAIST); Jaeyun Song (KAIST); Eunho Yang (KAIST)
3984: Real-Time MRI Video synthesis from time aligned phonemes with sequence-to-sequence
networks
Sathvik Udupa (Indian Institute of Science)*; Prasanta Dr Ghosh (Indian Institute of Science (IISc),
Bangalore)
3999: Peak-First CTC: Reducing the Peak Latency of CTC Models by Applying Peak-First
Regularization
Zhengkun Tian (Meituan Inc.)*; Hongyu Xiang (Meituan Inc.); Min Li (Meituan Inc.); Feifei Lin (Meituan
Inc.); Ke Ding (Meituan Inc.); Guanglu Wan (Meituan)
4016: Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis
Odysseas S Chlapanis (National Technical University of Athens)*; Georgios Paraskevopoulos (National
Technical University of Athens); Alexandros Potamianos (National Technical University of Athens)
4058: Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-
speech
Dong Yang (The University of Tokyo)*; Tomoki Koriyama (CyberAgent, Inc.); Yuki Saito (The University of
Tokyo, Japan); Takaaki Saeki (The University of Tokyo); Detai Xin (The University of Tokyo); Hiroshi
Saruwatari (The University of Tokyo)
4063: Two-stage UNet with multi-axis gated multilayer perceptron for monaural noisy-reverberant
speech enhancement
Zehua Zhang (Harbin Institute of Technology(Shenzhen))*; Shiyun Xu (Harbin Institute of
Technology(Shenzhen)); Xuyi Zhuang (Harbin Institute of Technology(Shenzhen)); Lianyu Zhou (Harbin
Institute of Technology(Shenzhen)); Heng Li (Harbin Institute of Technology(Shenzhen)); Mingjiang Wang
(Harbin Institute of Technology Shenzhen)
4065: Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech
Emotion Recognition
JiaXin Ye (Fudan University); Xin-Cheng Wen (Harbin Institute of Technology (Shenzhen)); Yujie Wei
(Fudan University); Yong Xu (Fujian University of Technology); KunHong Liu (Xiamen University);
Hongming Shan (Fudan University)*
4097: DialogMI: A Dialogue Model Based on Enhancing Dialogue Mutual Information
Yibo Zhang (Beijing University of Posts and Telecommunications)*; Ping Gong (Beijing University of Posts
and Telecommunications); Zelin Wang (Beijing University of Posts and Telecommunications); Zhe Li
(Beijing University of Posts and Telecommunications); Xuanyuan Yang (Beijing University of Posts and
Telecommunications)
4104: Autovocoder: Fast Waveform Generation from a Learned Speech Representation using
Differentiable Digital Signal Processing
Jacob J Webber (The Centre for Speech Technology Research, University of Edinburgh)*; Cassia
Valentini (University of Edinburgh); Evelyn Williams (University of Edinburgh); Gustav Eje Henter (KTH
Royal Institute of Technology); Simon King (University of Edinburgh)
4107: Feature Selection and Text Embedding For Detecting Dementia from Spontaneous
Cantonese
Xiaoquan Ke (The Hong Kong Polytechnic University)*; Man-Wai MAK (The Hong Kong Polytechnic
University); Mei Ling MENG (The Chinese University of Hong Kong)
4124: SPEECH AND NOISE DUAL-STREAM SPECTROGRAM REFINE NETWORK WITH SPEECH
DISTORTION LOSS FOR ROBUST SPEECH RECOGNITION
Haoyu Lu (Tianjin University)*; Nan Li (Tianjin University); Tongtong Song (Tianjin University); Longbiao
Wang (Tianjin University); Jianwu Dang (Tianjin University); Xiaobao Wang (Tianjin University); Shiliang
Zhang (Alibaba Group)
4135: Knowledge Transfer For On-Device Speech Emotion Recognition with Neural Structured
Learning
Yi Chang (Imperial College London)*; Zhao Ren (L3S Research Center); Thanh Tam Nguyen (Griffith
University); Kun Qian (Beijing Institute of Technology); Bjoern W. Schuller (Imperial College London)
4139: Joint Discriminator and Transfer Based Fast Domain Adaptation for End-to-End Speech
Recognition
Hang Shao (Shanghai Jiao Tong University)*; Tian Tan (Aispeech Ltd.); wei wang (Shanghai Jiao Tong
University); Xun Gong (Shanghai Jiaotong University); Yanmin Qian (Shanghai Jiao Tong University)
4178: F-PABEE: Flexible-patience-based Early Exiting for Single-label and Multi-label text
Classification Tasks
Xiangxiang Gao (Shanghai Jiaotong University); Wei Zhu (East China Normal University)*; Jiasheng Gao
(Shenzhen University); Congrui Yin (Nanchang University)
4186: MoLE : MIXTURE OF LANGUAGE EXPERTS FOR MULTI-LINGUAL AUTOMATIC SPEECH
RECOGNITION
Yoohwan Kwon (Naver Corporation)*; Soo-Whan Chung (Naver Corporation)
4189: Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder
Reo Yoneyama (Nagoya University)*; Yi-Chiao Wu (META); Tomoki Toda (Nagoya University)
4196: Rethinking the Reasonability of the Test Set for Simultaneous Machine Translation
Mengge Liu (Beijing Institute of Technology)*; Wen Zhang (Xiaomi AI Lab); Xiang Li (Xiaomi AI Lab); Jian
Luan (Xiaomi AI Lab); Bin Wang (Xiaomi AI Lab); Yuhang Guo (Beijing Engineering Research Center of
High Volume Language Information Processing and Cloud Computing Applications, Department of
Computer Science and Technology, Beijing Institute of Technology); Shuoying Chen (Beijing Institute of
Technology)
4212: X-SepFormer: End-to-end Speaker Extraction Network with Explicit Optimization on Speaker
Confusion
KAI LIU (Huawei Technologies Co., Ltd.)*; Ziqing Du (Huawei Technologies Co., Ltd.); Xucheng Wan
(Huawei Technologies Co., Ltd.); zhou huan (AARC, Huawei Technologies Co., Ltd.)
4278: A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-
Talker One
Lingwei Meng (The Chinese University of Hong Kong)*; Jiawen Kang (The Chinese University of Hong
Kong); Mingyu Cui (The Chinese University of Hong Kong); Yuejiao Wang (The Chinese University of
Hong Kong); Xixin Wu (The Chinese University of Hong Kong); Helen Meng (The Chinese University of
Hong Kong)
4305: Target Speaker Extraction with Ultra-Short Reference Speech by VE-VE Framework
Lei Yang (Samsung)*; Wei Liu (Samsung); Lufen Tan (Samsung); Jaemo Yang (Samsung); Han-gil Moon
(Samsung)
4312: An Interpretable model using evidence information for Multi-hop Question Answering over
Long texts
Yanyi Chen (Beijing University of Posts and Telecommunications); Ruifang Liu (Beijing University of Posts
and Telecommunications)*; Xiyan Liu (Beijing University of Posts and Telecommunications); Yidong Shi
(Beijing University of Posts and Telecommunications); Ge Bai (Beijing University of Posts and
Telecommunications)
4328: VF-TACO2: TOWARDS FAST AND LIGHTWEIGHT SYNTHESIS FOR AUTOREGRESSIVE
MODELS WITH VARIATION AUTOENCODER AND FEATURE DISTILLATION
Yuhao Liu (Tianjin University)*; Cheng Gong (Tianjin University); Longbiao Wang (Tianjin University);
Xixin Wu (The Chinese University of Hong Kong); Qiuyu Liu (Tianjin University); Jianwu Dang (Tianjin
University)
4340: Phase-Aware Spoof Speech Detection Based on Res2Net with Phase Network
Juntae Kim (SK Telecom)*; Sung Min Ban (SK Telecom)
4348: Gaussian Prior Reinforcement Learning for Nested Named Entity Recognition
Yawen Yang (Tsinghua University)*; Xuming Hu (Tsinghua University); Fukun Ma (Tsinghua University);
Shuang Li (Tsinghua University); Aiwei Liu (Tsinghua University); Lijie Wen (Tsinghua University); Philip S
Yu (UIC)
4354: Role of Lexical Boundary Information in Chunk-Level Segmentation for Speech Emotion
Recognition
Wei-Cheng Lin (The University of Texas at Dallas)*; Carlos Busso (University of Texas at Dallas)
4387: Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence
Generation
Motoi Omachi (Yahoo Japan Corporation)*; Brian Yan (Carnegie Mellon University); Siddharth Dalmia
(Carnegie Mellon University); Yuya Fujita (Yahoo Japan Corporation); Shinji Watanabe (Carnegie Mellon
University)
4409: A Token-level Contrastive Framework for Sign Language Translation
Biao Fu (Xiamen University); Peigen Ye (Xiamen University); liang zhang (Xiamen University); Pei Yu
(Xiamen University); Cong Hu (Xiamen University); xiaodong shi (Xiamen University); Yidong Chen
(Xiamen University)*
4421: Audio-visual Speech Enhancement with a Deep Kalman Filter Generative Model
Ali Golmakani (Inria Nancy Grand ); Mostafa Sadeghi (INRIA)*; romain serizel (Université de Lorraine)
4441: Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit
Hongji Wang (None); Chengdong Liang (Northwestern Polytechnical University); Shuai Wang (Shanghai
Jiao Tong University)*; Binbin Zhang (Horizon Robotics); Zhengyang Chen (Shanghai Jiao Tong
University); Xu Xiang (AISpeech Ltd); Slyne Deng (NVIDIA); Yanmin Qian (Shanghai Jiao Tong
University)
4451: A Prototypical Semantic Decoupling Method via Joint Contrastive Learning for Few-Shot
Named Entity Recognition
Guanting Dong (Beijing University of Posts and Telecommunications)*; Zechen Wang (Beijing University
of Posts and Telecommunications); Liwen Wang (Beijing University of Posts and Telecommunications);
Daichi Guo (Beijing University of Posts and Telecommunications); Dayuan Fu (Beijing University of Posts
and Telecommunications); yuxiang wu (Beijing University of Posts and Telecommunications); Chen Zeng
(Beijing University of Posts and Telecommunications); Xuefeng Li (Beijing University of Posts and
Telecommunications); Tingfeng Hui (Beijing University of Posts and Telecommunications); Keqing He
(Beijing University of Posts and Telecommunications); Xinyue Cui (Beijing University of Posts and
Telecommunications); QiXiang Gao (Beijing University of Posts and Telecommunications); Weiran Xu
(Beijing University of Posts and Telecommunications)
4452: Revisit Out-of-vocabulary Problem for Slot Filling: A Unified Contrastive Framework with
Multi-level Data Augmentations
Daichi Guo (Beijing University of Posts and Telecommunications)*; Guanting Dong (Beijing University of
Posts and Telecommunications); Dayuan Fu (Beijing University of Posts and Telecommunications);
yuxiang wu (Beijing University of Posts and Telecommunications); Chen Zeng (Beijing University of Posts
and Telecommunications); Tingfeng Hui (Beijing University of Posts and Telecommunications); Liwen
Wang (Beijing University of Posts and Telecommunications); Xuefeng Li (Beijing University of Posts and
Telecommunications); Zechen Wang (Beijing University of Posts and Telecommunications); Keqing He
(Beijing University of Posts and Telecommunications); Xinyue Cui (Beijing University of Posts and
Telecommunications); Weiran Xu (Beijing University of Posts and Telecommunications)
4480: TEXT-TO-SPEECH SYNTHESIS FROM DARK DATA WITH EVALUATION-IN-THE-LOOP DATA
SELECTION
Kentaro Seki (The University of Tokyo)*; Shinnosuke Takamichi (The University of Tokyo); Takaaki Saeki
(The University of Tokyo); Hiroshi Saruwatari (The University of Tokyo)
4508: Improving Disfluency Detection with Multi-scale Self Attention and Contrastive Learning
Peiying Wang (JD AI); Chaoqun Duan (JD AI Research); Meng Chen (JD AI)*; Xiaodong He (JDT)
4534: Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for
Speech Recognition
Xie Chen (Shanghai Jiaotong University)*; Ziyang Ma (Shanghai Jiao Tong University); Changli Tang
(Tsinghua University); Yujin Wang (Tsinghua University); Zhisheng Zheng (Shanghai Jiao Tong University)
4546: USING MODIFIED ADULT SPEECH AS DATA AUGMENTATION FOR CHILD SPEECH
RECOGNITION
Zijian Fan (Norwegian University of Science and Technology)*; Xinwei Cao (NTNU); Giampiero Salvi
(NTNU); Torbjørn Svendsen (NTNU)
4548: Improving Retrieval-based Dialogue System via Syntax-Informed Attention
Tengtao Song (Peking University)*; Nuo Chen (Peking University); Ji Jiang (Peking University); Zhihong
Zhu (Peking University); Yuexian Zou (Peking University)
4559: Zero-Shot Speech Emotion Recognition Using Generative Learning with Reconstructed
Prototypes
Xinzhou Xu (Nanjing University of Posts and Telecommunications)*; Jun Deng (Agile Robots AG); Zixing
Zhang (Imperial College London); Zhen Yang (Nanjing University of Posts and Telecommunication); Bjorn
W. Schuller (Imperial College London)
4584: Improved Training of Mixture-of-Experts Language GANs
Yekun Chai (Baidu Inc.)*; Qiyue Yin (Institute of Automation, Chinese Academy of Sciences); Junge
Zhang (CASIA)
4592: Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR
Xilai Li (Amazon)*; Goeric Huybrechts (Amazon); Srikanth Ronanki (Amazon); Jeff Farris (Amazon);
Sravan Babu Bodapati (Amazon)
4608: SynGen: A Syntactic Plug-and-play Module for Generative Aspect-based Sentiment Analysis
Chengze Yu (Tsinghua University); Taiqiang Wu (Tsinghua University); Jiayi Li (Tsinghua University);
Xingyu Bai (Tsinghua University); Yujiu Yang (Tsinghua University)*
4612: Gated contextual adapters for selective contextual biasing in neural transducers
Anastasios Alexandridis (Amazon.com)*; Kanthashree Mysore Sathyendra (Amazon); Grant Strimel
(Amazon.com); Feng-Ju Chang (Amazon); Ariya Rastrow (Amazon Alexa); Nathan Susanj
(Amazon.com); Athanasios Mouchtaris (Amazon Alexa)
4623: Distance-based Weight Transfer for Fine-tuning from Near-field to Far-field Speaker
Verification
Li Zhang (Northwestern Polytechnical University)*; Qing Wang (Northwestern Polytechnical University);
Hongji Wang (None); Yue Li (Northwestern Polytechnical University); Wei Rao (Tencent); Yannan Wang
(Tencent); Lei Xie (NWPU)
4646: Modeling Global Latent Semantic in Multi-Turn Conversations with Random Context
Reconstruction
Chengwen Zhang (Beijing University of Posts & Telecommunications); Danqin Wu (Beijing University of
Posts & Telecommunications)*
4655: High-resolution embedding extractor for speaker diarisation
Heesoo Heo (Naver Corp.)*; Youngki Kwon (Naver Corporation); Bong-Jin Lee (Naver Corporation); You
Jin Kim (Naver Corporation); Jee-weon Jung (Naver Corp.)
4711: Dialog act guided contextual adapter for personalized speech recognition
Feng-Ju Chang (Amazon)*; Thejaswi Muniyappa (Amazon); Kanthashree Mysore Sathyendra (Amazon);
Kai Wei (Amazon); Grant Strimel (Amazon); Ross McGowan (Amazon)
4747: Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech
Recognition and Translation
Tsz Kin Lam (Heidelberg University); Shigehiko Schamoni (Heidelberg University)*; Stefan Riezler
(Heidelberg University)
4756: On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems
Philippe Gonzalez (Technical University of Denmark)*; Tommy Sonne Alstrøm (Technical University of
Denmark); Tobias May (Technical University of Denmark)
4768: Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for
Audiobook Speech Synthesis
Shun Lei (Tsinghua University)*; Yixuan Zhou (Tsinghua University); Liyang Chen (Tsinghua University);
Zhiyong Wu (Tsinghua University); Shiyin Kang (XVerse Inc.); Helen Meng (The Chinese University of
Hong Kong)
4769: Less is more: A unified architecture for device-directed speech detection with multiple
invocation types
Ognjen Rudovic (Apple)*; Wonil Chang (Apple); Vineet Garg (Apple); Pranay Dighe (Apple); Pramod Jaya
Simha (Apple Inc); John Berkowitz (Apple); Ahmed Hussen Abdelaziz (Apple); Erik Marchi (Apple);
Sachin Kajarekar (Apple); Saurabh Adya (Apple)
4780: Pyramid Dynamic Inference: Encouraging Faster Inference via Early Exit Boosting
Ershad Banijamali (Amazon Inc.)*; Pegah Kharazmi (Amazon); Sepehr Eghbali (Amazon); Jixuan Wang
(Amazon); Clement Chung (Amazon); Samridhi Choudhary (Amazon)
4805: Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech
Separation
William Ravenscroft (The University of Sheffield)*; Stefan Goetze (University of Sheffield); Thomas Hain
(University of Sheffield)
4818: Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling
Amitay Sicherman (The Hebrew University of Jerusalem); Yossi Adi (Facebook AI Research )*
4823: Performance comparison of TTS models for Brazilian Portuguese to establish a baseline
Wilmer Johan Lobato (Alana AI)*; Felipe Farias (Alana AI); William Cruz (Alana AI); Marcellus Amadeus
(Alana AI)
4829: The 2nd Clarity Enhancement Challenge for hearing aid speech intelligibility enhancement:
Overview and Outcomes
Michael Akeroyd (University of Nottingham); Will Bailey (University of Sheffield); Jon Barker (Professor)*;
Trevor Cox (University of Salford); John F Culling (Cardiff University); Simone Graetzer (University of
Salford); Graham Naylor (University of Nottingham); Zuzanna Podwinska (University of Salford); Zehai Tu
(University of Sheffield)
4830: Internal Language Model Estimation based Adaptive Language Model Fusion for Domain
Adaptation
Rao Ma (University of Cambridge)*; Xiaobo Wu (ByteDance); Jin Qiu (ByteDance); Yanan Qin
(ByteDance); Haihua Xu (ByteDance); Peihao Wu (Bytedance); Zejun Ma (Bytedance)
4842: Multilingual end-to-end spoken language understanding for ultra-low footprint applications
Markus Mueller (Amazon Alexa)*; Anastasios Alexandridis (Amazon.com); Zach Trozenski (Amazon
Alexa); Joel Whiteman (Amazon Alexa); Grant Strimel (Amazon Alexa); Nathan Susanj (Amazon Alexa);
Athanasios Mouchtaris (Amazon Alexa); Siegfried Kunzmann (Amazon Alexa)
4850: Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using Prosodic and
Linguistic Features
Alexandra Vioni (Innoetics, Samsung Electronics)*; Georgia Maniati (Samsung Electronics); Nikolaos
Ellinas (Innoetics, Samsung Electronics); June Sig Sung (Samsung Electronics); Inchul Hwang (Samsung
Research); Aimilios Chalamandaris (Samsung Electronics); Pirros Tsiakoulis (Samsung )
4858: Unsupervised domain adaptation for preference learning based speech emotion recognition
Abinay Reddy Naini (The University of Texas at Dallas); Mary Kohler (Laboratory for Analytic Sciences,
North Carolina State University); Carlos Busso (University of Texas at Dallas)*
4868: End-to-end spoken language understanding using joint CTC loss and self-supervised,
pretrained acoustic encoders
Jixuan Wang (Amazon)*; Martin Radfar (Amazon); Kai Wei (Amazon); Clement Chung (Amazon)
4889: DISTILL-QUANTIZE-TUNE - LEVERAGING LARGE TEACHERS FOR LOW-FOOTPRINT
EFFICIENT MULTILINGUAL NLU ON EDGE
Pegah Kharazmi (Amazon)*; Zhewei Zhao (Amazon); Clement Chung (Amazon); Samridhi Choudhary
(Amazon)
4968: EXPLORING WAV2VEC 2.0 FINE TUNING FOR IMPROVED SPEECH EMOTION
RECOGNITION
Li-Wei Chen (Carnegie Mellon University)*; Alexander I. Rudnicky (Carnegie Mellon University)
4970: Adaptable End-to-End ASR Models using Replaceable Internal LMs and Residual Softmax
Keqi Deng (University of Cambridge)*; Phil Woodland (Machine Intelligence Laboratory, Cambridge
University Department of Engineering)
5011: LEVERAGING LABEL CORRELATIONS IN A MULTI-LABEL SETTING: A CASE STUDY IN
EMOTION
Georgios Chochlakis (University of Southern California)*; Girish M Mahajan (Microsoft); Sabyasachee
Baruah (University of Southern California); Keith Burghardt (ISI, University of Southern California );
Kristina Lerman (USC Information Sciences Institute); Shrikanth Narayanan (USC)
5065: Exploring Attention Mechanisms for Multimodal Emotion Recognition in an Emergency Call
Center Corpus
Theo Deschamps-Berger (Paris-Saclay University, CNRS)*; Lori Lamel (CNRS LIMSI); Laurence Y.
Devillers (LISN-CNRS)
5084: AN ISOTROPY ANALYSIS FOR SELF-SUPERVISED ACOUSTIC UNIT EMBEDDINGS ON THE
ZERO RESOURCE SPEECH CHALLENGE 2021 FRAMEWORK
Jianan Chen (Japan Advanced Institute of Science and Technology)*; Sakriani Sakti (Japan Advanced
Institute of Science and Technology)
5106: To Wake-up or Not to Wake-up: Reducing Keyword False Alarm by Successive Refinement
Yashas Malur Saidutta (Samsung Research America)*; Rakshith Sharma Srinivasa (Samsung Research
America); Ching-Hua Lee (Samsung Research America); Chouchang Yang (Samsung Research
America); Yilin Shen (Samsung Research America); Hongxia Jin (Samsung Research America)
5113: Deep Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion
Recognition
Yan Zhao (Southeast University)*; JIncen Wang (Southeast University); Yuan Zong (Southeast
University); Wenming Zheng (Southeast University); Hailun lian (Southeast University); Li Zhao
(Southeast University)
5169: Joint Training And Decoding for Multilingual End-to-End Simultaneous Speech Translation
Wuwei Huang (Xiaomi Corporation)*; Renren Jin (Tianjin University); Wen Zhang (Xiaomi AI Lab); Jian
Luan (Xiaomi AI Lab); Bin Wang (Xiaomi AI Lab); Deyi Xiong (Tianjin University)
5204: A Fast and Accurate Pitch Estimation Algorithm Based on the Pseudo Wigner-Ville
Distribution
Yisi Liu (University of Chinese Academy of Sciences)*; Peter Wu (UC Berkeley); Alan Black (CMU);
Gopala Krishna Anumanchipalli (UC Berkeley)
5207: Federated Self-Learning with Weak Supervision for Speech Recognition
Milind M Rao (Amazon)*; Gopinath Chennupati (Amazon Alexa); Gautam Tiwari (Amazon); Anit Kumar
Sahu (Amazon Alexa AI); Anirudh Raju (Amazon Alexa); Ariya Rastrow (Amazon); Jasha Droppo
(Amazon)
5290: Knowledge-aware Few Shot Learning for Event Detection from Short Texts
Jinjin Guo (JD Intelligent Cities Research); Zhichao Huang (JD Intelligent Cities Research)*; Guangning
Xu (Harbin Institute of Technology, Shenzhen); Bowen Zhang (Shenzhen Technology University);
Chaoqun Duan (JD AI Research)
5295: Conditional Conformer: Improving Speaker Modulation for Single and Multi-User Speech
Enhancement
Tom O'Malley (Google)*; Shaojin Ding (Google); Arun Narayanan (Google Inc.); Quan Wang (Google);
Rajeev Rikhye (Google); Qiao Liang (Google Inc.); Yanzhang He (Google); Ian McGraw ()
5337: PREDICTING MULTI-CODEBOOK VECTOR QUANTIZATION INDEXES FOR KNOWLEDGE
DISTILLATION
Liyong Guo (Northwestern Polytechnical University); Xiaoyu Yang (Xiaomi Corp., Beijing)*; Quandong
Wang (Xiaomi Corp., Beijing); Yuxiang Kong (Xiaomi Corp., Beijing); Zengwei Yao (Xiaomi Corp., Beijing);
fan cui (xiaomi); Fangjun Kuang (Xiaomi Corp., Beijing); Wei Kang (Xiaomi Corp., Beijing, China); Long
Lin (Xiaomi Corp., Beijing); Mingshuang Luo (Xiaomi Corp., Beijing); Piotr Żelasko (Johns Hopkins
University); Daniel Povey (Johns Hopkins University)
5361: Identifying Source Speakers for Voice Conversion based Spoofing Attacks on Speaker
Verification
Danwei Cai (Duke university)*; Zexin Cai (Duke University); Ming Li (Duke Kunshan University)
5362: Enhancement of text-predicting style token with generative adversarial network for
expressive speech synthesis
Hiroki Kanagawa (NTT Corporation)*; Yusuke Ijima (NTT Corporation)
5363: STATIC AND DYNAMIC SOURCE AND FILTER CUES FOR CLASSIFICATION OF
AMYOTROPHIC LATERAL SCLEROSIS PATIENTS AND HEALTHY SUBJECTS
Tanuka Bhattacharjee (Indian Institute of Science)*; Chowdam Venkata Thirumala Kumar (Indian Institute
of Science,Bengaluru); Yamini BK (NIMHANS); Nalini Atchayaram (NIMHANS); Ravi Yadav (NIMHANS);
Prasanta Dr Ghosh (Indian Institute of Science (IISc), Bangalore)
5415: LEARNING TO BUILD REASONING CHAINS BY RELIABLE PATH RETRIEVAL
Minjun Zhu (CASIA); Yixuan Weng (CASIA); Shizhu He (Institute of Automation, Chinese Academy of
Sciences); Kang Liu (Institute of Automation, Chinese Academy of Sciences); Jun Zhao (Institute of
Automation, Chinese Academy of Sciences)*
5447: SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing
Siwen Ding (Columbia University)*; You Zhang (University of Rochester); Zhiyao Duan (University of
Rochester)
5456: Text Classification in the Wild: A Large-Scale Long-Tailed Name Normalization Dataset
Jiexing Qi (Shanghai Jiao Tong University)*; Shuhao Li (Shanghai Jiao Tong University ); Zhixin Guo
(Shanghai Jiao Tong University); Yusheng Huang (Shanghai Jiao Tong University); Chenghu Zhou
(Shanghai Jiao Tong University); Weinan Zhang (Shanghai Jiao Tong University); Xinbing Wang
(Shanghai Jiao Tong University); Zhouhan Lin (Shanghai Jiao Tong University)
5463: KG-ECO: Knowledge Graph Enhanced Entity Correction for Query Rewriting
Jinglun Cai (Amazon.com, Inc)*; Mingda Li (Amazon); Ziyan Jiang (Amazon); Eunah Cho (Amazon);
Zheng Chen (Amazon Alexa AI); Yang Liu (Amazon, Alexa AI); Xing Fan (Amazon); Chenlei Guo
(Amazon)
5473: IMPORTANCE OF DIFFERENT TEMPORAL MODULATIONS OF SPEECH: A TALE OF TWO
PERSPECTIVES
Samik Sadhu (Johns Hopkins University)*; Hynek Hermansky (The Johns Hopkins University, USA)
5496: Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy
Xulong Zhang (Ping An Technology (Shenzhen) Co., Ltd.); Haobin Tang (USTC); Jianzong Wang (Ping
An Technology (Shenzhen) Co., Ltd)*; Ning Cheng (Ping An Technology (Shenzhen) Co., Ltd); Jian Luo
(Ping An Technology (Shenzhen) Co., Ltd); Jing Xiao (Ping An Insurance (Group) Company of China)
5504: On the effectiveness of monoaural target source extraction for distant end-to-end automatic
speech recognition
Catalin Zorila (Toshiba Cambridge Research Laboratory)*; Rama S Doddipatla (Toshiba Europe LTD)
5523: FILLER WORDS DETECTION WITH HARD CATEGORY MINING AND INTER-CATEGORY
FOCAL LOSS
Zhiyuan Zhao (MSRA)*; Lijun Wu (Microsoft Research); Chuanxin Tang (Microsoft); Dacheng Yin
(University of Science and Technology of China); Yucheng Zhao (University of Science and Technology of
China); Chong Luo (MSRA)
5568: VQ-CL: Learning disentangled speech representations with contrastive learning and vector
quantization
Huaizhen Tang (University of Science and Technology of China); Xulong Zhang (Ping An Technology
(Shenzhen) Co., Ltd.); Jianzong Wang (Ping An Technology (Shenzhen) Co., Ltd)*; Ning Cheng (Ping An
Technology (Shenzhen) Co., Ltd); Jing Xiao (Ping An Insurance (Group) Company of China)
5579: Lightweight feature encoder for wake-up word detection based on self-supervised speech
representation
Hyungjun Lim (LG AI Research)*; Younggwan Kim (LG AI Research); Kiho Yeom (LG AI Research);
Eunjoo Seo (LG AI Research); Hoodong Lee (LG AI Research); Stanley Jungkyu Choi (LG AI Research);
Honglak Lee (LG AI Research)
5596: Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal
Language Model Estimation
Nilaksh Das (AWS AI Labs, Amazon)*; Monica Sunkara (Amazon); Sravan Babu Bodapati (Amazon);
Jinglun Cai (Amazon); Devang Kulshreshtha (Amazon); Jeff Farris (Amazon); Katrin Kirchhoff (Amazon)
5603: Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning
Sung-Feng Huang (National Taiwan University)*; Chia-ping Chen (Intelligo Technology Inc); Zhi-Sheng
Chen (Intelligo Technology Inc); Yu-Pao Tsai (Intelligo Technology Inc); Hung-yi Lee (National Taiwan
University)
5640: Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix
Factorization
Jiachen Lian (University of California Berkeley)*; Alan Black (CMU); Yijing Lu (University of Southern
California); Louis Goldstein (USC); Shinji Watanabe (Carnegie Mellon University); Gopala Krishna
Anumanchipalli (UC Berkeley)
5649: VE-KWS: VISUAL MODALITY ENHANCED END-TO-END KEYWORD SPOTTING
Ao Zhang (Northwestern Polytechnical University)*; He Wang (NWPU); Pengcheng Guo (Northwestern
Polytechnical University); Yihui Fu (Northwestern Polytechnical University); Lei Xie (NWPU); Yingying
Gao (China Mobile Research Institute); Shilei Zhang (China Mobile Research Institute); Junlan Feng
(China Mobile Research)
5653: ACF: Aligned Contrastive Finetuning for Language and Vision Tasks
Wei Zhu (East China Normal University)*; Peng Wang (Northwestern Normal Univ); Xiaoling Wang (East
China Normal University); Yuan Ni (Ping An Technology); Guotong Xie (Ping An Technology (Shenzhen)
Co. Ltd.)
5660: Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and
Understanding
Yifan Peng (Carnegie Mellon University)*; Kwangyoun Kim (ASAPP); Felix Wu (ASAPP); Prashant
Sridhar (ASAPP); Shinji Watanabe (Carnegie Mellon University)
5683: Designing and Evaluating Speech Emotion Recognition Systems: A reality check case study
with IEMOCAP
Nikolaos Antoniou (National Technical University of Athens)*; Athanasios Katsamanis (ATHENA R.C.,
Behavioral Signal Technologies); Theodoros Giannakopoulos (NCSR Demokritos); Shrikanth Narayanan
(University of Southern California)
5693: Learning From Yourself: A Self-Distillation Method for Fake Speech Detection
Jun Xue (Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer
Science and Technology, Anhui University)*; Cunhang Fan (Anhui Provincial Key Laboratory of
Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University);
Jiangyan Yi (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of
Sciences); Chenglong Wang (CASIA); Zhengqi Wen (Qiyuan Laboratory); Dan Zhang (Department of
Psychology, Tsinghua University); Zhao Lv (Anhui University)
5700: Exploring universal singing speech language identification using self-supervised learning
based front-end features
Xingming Wang (Wuhan University)*; Hao Wu (Speech, Audio and Music Intelligence (SAMI) group,
ByteDance); Chen Ding (Speech, Audio and Music Intelligence (SAMI) group, ByteDance); Chuanzeng
Huang (Speech, Audio and Music Intelligence (SAMI) group, ByteDance); Ming Li (Duke Kunshan
University)
5730: Multi-View Learning for Speech Emotion Recognition With Categorical Emotion, Categorical
Sentiment, and Dimensional Scores
Daniel Tompkins (Microsoft)*; Dimitra Emmanouilidou (Microsoft Research); Soham Deshmukh
(Microsoft); Benjamin Elizalde (Microsoft)
5744: Investigation into phone-based subword units for Multilingual end-to-end speech
recognition
Saierdaer Yusuyin (Xinjiang University)*; Hao Huang (Xinjiang University); Junhua Liu (University of
Science and Technology of China); Cong Liu (iFLYTEK Research)
5784: RECOUPLE EVENT FIELD VIA PROBABILISTIC BIAS FOR EVENT EXTRACTION
Xingyu Bai (Tsinghua University); Taiqiang Wu (Tsinghua University); Han Guo (Tencent); Zhe Zhao
(Tencent); Xuefeng Yang (Tencent); Jiayi Li (Tsinghua University); Weijie Liu (Tencent Inc.); Qi Ju
(Tencent); Weigang Guo (Tencent); Yujiu Yang (Tsinghua University)*
5787: Unsupervised Voice Type Discrimination Score Adaptation Using X-vector Clusters
Mark R Lindsey (Carnegie Mellon University)*; Tyler Vuong (Carnegie Mellon University); Richard M Stern
(Carnegie Mellon University)
5792: Multilingual Query-by-Example Keyword Spotting with Metric Learning and Phoneme-to-
Embedding Mapping
Paul M Reuter (Fraunhofer IDMT - HSA)*; Christian Rollwage (Fraunhofer IDMT - HSA); Bernd Meyer
(Carl von Ossietzky University Oldenburg)
5806: Think before you speak: Concept-guided Explicit Persona Reasoning for Personalized
Dialogue Generation
Yunpeng Li (Institute of Information Engineering, Chinese Academy of Sciences)*; Yue Hu (Institute of
Information Engineering, Chinese Academy of Sciences); Wei Peng (Institute of Information Engineering,
Chinese Academy of Sciences); Yuqiang Xie (Institute of Information Engineering, Chinese Academy of
Sciences)
5824: FAST AND PARALLEL DECODING FOR TRANSDUCER
Wei Kang (Xiaomi Corp., Beijing, China)*; Liyong Guo (Xiaomi Corp.); Fangjun Kuang (Xiaomi Corp.);
Long Lin (Xiaomi Corp., Beijing, China); Mingshuang Luo (Xiaomi Corp., Beijing, China); Zengwei Yao
(Xiaomi Corp., Beijing, China); Xiaoyu Yang (Xiaomi Corp., Beijing, China); Piotr Żelasko (Johns Hopkins
University); Daniel Povey (Johns Hopkins University)
5921: Efficient Uncertainty Estimation with Gaussian Process for Reliable Dialog Response
Retrieval
Tong Ye (Ping An Technology (Shenzhen) Co., Ltd.; University of Science and Technology of China);
Zhitao Li (Ping An Technology (Shenzhen) Co., Ltd.); Jianzong Wang (Ping An Technology (Shenzhen)
Co., Ltd)*; Ning Cheng (Ping An Technology (Shenzhen) Co., Ltd); Jing Xiao (Ping An Insurance (Group)
Company of China)
5930: Preserving background sound in noise-robust voice conversion via multi-task learning
Jixun Yao (Northwestern Polytechnical University)*; Yi Lei (Northwestern Polytechnical University); Qing
Wang (Northwestern Polytechnical University); Pengcheng Guo (Northwestern Polytechnical University);
Ziqian Ning (Northwestern Polytechnical University); Lei Xie (NWPU); Hai Li (iQIYI Inc); Junhui Liu (iQIYI
Inc); Danming Xie (iQIYI)
5932: ANY-TO-ANY VOICE CONVERSION WITH F0 AND TIMBRE DISENTANGLEMENT AND NOVEL
TIMBRE CONDITIONING
Sudheer Kumar Kovela (Nvidia)*; Rafael Valle (NVIDIA); Ambrish Dantrey (Nvidia); Bryan Catanzaro
(NVIDIA)
5943: REPRESENTATION OF VOCAL TRACT LENGTH TRANSFORMATION BASED ON GROUP
THEORY
Atsushi Miyashita (Nagoya University)*; Tomoki Toda (Nagoya University)
5976: Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems
for Low-Resource Languages
Kaushal Bhogale (Indian Institute of Technology, Madras)*; Abhigyan Raman (AI4Bharat); Tahir Javed
(Indian Institute of Technology Madras); Sumanth Doddapaneni (Robert Bosch Centre for Data Science
and AI); Anoop Kunchukuttan (Microsoft); Pratyush Kumar (Indian Institute of Technology Madras); Mitesh
M. Khapra (Indian Institute of Technology Madras)
5998: Transferring Quantified Emotion Knowledge for the Detection of Depression in Alzheimer's
Disease Using ForestNets
Paula Andrea Pérez-Toro (Friedrich-Alexander-Universität Erlangen-Nürnberg)*; Dalia Rodríguez-Salas
(Friedrich-Alexander-Universität Erlangen-Nürnberg); Tomas Arias-Vergara (Friedrich-Alexander-
Universitaet Erlangen-Nuernberg); Sebastian P Bayerl (Technische Hochschule Nürnberg Georg Simon
Ohm); Philipp Klumpp (Pattern Recognition Lab, FAU Erlangen-Nuremberg); Korbinian Riedhammer
(Technische Hochschule Nürnberg Georg Simon Ohm); Maria Schuster (Ludwig Maximilian University of
Munich); Elmar Noeth (Friedrich Alexander Universitat, Erlangen-Nuremberg); Andreas K Maier (Pattern
Recognition Lab, FAU Erlangen-Nuremberg); Juan Rafael Orozco-Arroyave (University of Antioquia)
6027: SIAST: A Slot Imbalance-Aware Self-Training Scheme for Semi-Supervised Slot Filling
Jiachi Liu (Beijing University of Posts and Telecommunications)*; Sishi Xiong (Beijing University of Posts
and Telecommunications); Yuehuan He (University of Toronto); Tong Zhou (Beijing University of Posts and
Telecommunications); Liwen Wang (Beijing University of Posts and Telecommunications); Xuefeng Li
(Beijing University of Posts and Telecommunications); Bo Xiao (Beijing University of Posts and
Telecommunications)
6035: Moving Towards Non-Binary Gender Identification Via Analysis of System Errors in Binary
Gender Classification
Sebastian CG Ellis (University of Sheffield)*; Stefan Goetze (University of Sheffield); Heidi Christensen
(University of Sheffield)
6063: Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease
Detection
Jinchao Li (The Chinese University of Hong Kong)*; Kaitao Song (Microsoft Research Asia); Junan Li
(The Chinese University of Hong Kong); Bo ZHENG (The Chinese University of Hong Kong); Dongsheng
Li (Microsoft Research Asia); Xixin Wu (The Chinese University of Hong Kong); Xunying Liu (The Chinese
University of Hong Kong); Helen Meng (The Chinese University of Hong Kong)
6078: The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP
Challenge: Deep Analysis
Haoxu Wang (Wuhan University); Ming Cheng (Duke Kunshan University); Qiang Fu (Alibaba Group);
Ming Li (Duke Kunshan University)*
6105: Metric-oriented Speech Enhancement using Diffusion Probabilistic Model
Chen Chen (Nanyang Technological University)*; Yuchen Hu (Nanyang Technological University); Weiwei
Weng (Nanyang Technological University); Eng Siong Chng (Nanyang Technological University)
6131: A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition
Jinchao Li (The Chinese University of Hong Kong)*; Xixin Wu (The Chinese University of Hong Kong);
Kaitao Song (Microsoft Research Asia); Dongsheng Li (Microsoft Research Asia); Xunying Liu (The
Chinese University of Hong Kong); Helen Meng (The Chinese University of Hong Kong)
6140: A processing framework to access large quantities of whispered speech found in ASMR
Pablo Pérez Zarazaga (KTH Royal Institute of Technology); Gustav Eje Henter (KTH Royal Institute of
Technology); Zofia Malisz (KTH Royal Institute of Technology)*
6177: Singing Voice Synthesis Based on a Musical Note Position-aware Attention Mechanism
Yukiya Hono (Nagoya Institute of Technology)*; Kei Hashimoto (Nagoya Institute of Technology);
Yoshihiko Nankaku (Nagoya Institute of Technology); Keiichi Tokuda (Department of Computer Science
and Engineering, Nagoya Institute of Technology)
6200: Enhancing the Vocal Range of Single-Speaker Singing Voice Synthesis with Melody-
Unsupervised Pre-training
Shaohuan Zhou (Tsinghua University)*; Xu Li (ARC Lab, Tencent); Zhiyong Wu (Tsinghua University);
Ying Shan (Tencent); Helen Meng (The Chinese University of Hong Kong)
6219: History, Present and Future: Enhancing Dialogue Generation with Few-shot History-Future
Prompt
Yihe Wang (Wuhan University)*; Yitong Li (Huawei Technologies Co., Ltd.); Yasheng Wang (Noah's Ark
Lab, Huawei); Fei Mi (Huawei); Pingyi Zhou (Noah's Ark Lab, Huawei); Jin Liu (School of Computer
Science, Wuhan University); Xin Jiang (Huawei Noah's Ark Lab); Qun Liu (Huawei Noah's Ark Lab)
6225: Wav2vec-based Detection and Severity Level Classification of Dysarthria from Speech
Farhad Javanmardi (Aalto University)*; Saska Tirronen (Aalto University); Manila Kodali (Aalto
University); Sudarsana Reddy Kadiri (Aalto University); Paavo Alku (Aalto University)
6307: Continuous Action Space-based Spoken Language Acquisition Agent Using Residual
Sentence Embedding and Transformer Decoder
Ryota Komatsu (Tokyo Institute of Technology)*; Yusuke Kimura (Tokyo Institute of Technology); Takuma
Okamoto (National Institute of Information and Communications Technology); Takahiro Shinozaki (Tokyo
Institute of Technology)
6318: A Sentiment and Syntactic-Aware Graph Convolutional Network for Aspect-level Sentiment
Classification
Yuxin Yang (Northwest University)*; Xia Sun (Northwest University); Qiang Lu (Northwest University);
Richard F E Sutcliffe (Northwest University); Jun Feng (Northwest University)
6340: SPASHT: Semantic and PrAgmatic SpeecH Features for automatic assessment of autism
B Ashwini (Indraprastha Institute of Information Technology, New Delhi, India)*; Vrinda Narayan
(Indraprastha Institute of Information Technology, New Delhi, India); Jainendra Shukla (IIIT-Delhi)
6343: Estimating Shapley Values of Training Utterances for Automatic Speech Recognition Models
Ali Raza Syed (The Graduate Center, CUNY)*; Michael I Mandel (Brooklyn College, CUNY)
6348: UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction
Jiaxin GUO (Huawei)*; Minghan Wang (Huawei); Xiaosong Qiao (Huawei); Daimeng Wei (Huawei);
Hengchao Shang (HW-TSC); ZongYao LI (HW-TSC); Zhengzhe YU (HW-TSC); Yinglu Li (HUAWEI
TECHNOLOGIES CO., LTD.); Chang Su (Huawei); Min Zhang (Huawei); Shimin Tao (Huawei); Hao Yang
(Huawei)
6353: Improving Transformer-Based Networks with Locality for Automatic Speaker Verification
Mufan Sang (University of Texas at Dallas)*; Yong Zhao (Microsoft Corporation); GANG Liu (Microsoft);
John H Hansen (Univ. of Texas at Dallas); Jian WU (Microsoft Corp)
6371: Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar (Mohamed Bin Zayed University of Artificial Intelligence)*; Praveen S V (Indian
Institute of Technology Madras); Pratyush Kumar (Indian Institute of Technology Madras); Mitesh M.
Khapra (Indian Institute of Technology Madras); Karthik Nandakumar (Mohamed Bin Zayed University of
Artificial Intelligence)
6379: I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
Yifan Peng (Carnegie Mellon University)*; Jaesong Lee (NAVER); Shinji Watanabe (Carnegie Mellon
University)
6389: Noise-aware target extension with self-distillation for robust speech recognition
Ju-seok Seong (Hanyang University); Jeong-Hwan Choi (Hanyang University); Jehyun Kyung (Hanyang
University); Ye-Rin Jeoung (Hanyang University); Joon-Hyuk Chang (Hanyang University)*
6418: DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain
supervision from DSP
Kun Song (Northwestern Polytechnical University)*; Yongmao Zhang (Audio, Speech and Language
Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University,
Xi’an, China); Yi Lei (Northwestern Polytechnical University); Jian Cong (Northwestern Polytechnical
University); Hanzhao Li (Northwestern Polytechnical University); Lei Xie (NWPU); Gang He (TAL
Education Group); Jinfeng Bai (TAL Education Group)
6420: Good Neighbors Are All You Need for Chinese Grapheme-to-Phoneme Conversion
Jungjun Kim (DeepBrain AI Inc.)*; Changjin Han (DeepBrain AI Inc.); Gyuhyeon Nam (DeepBrain AI Inc.);
Gyeongsu Chae (DeepBrain AI Inc.)
6449: DATA2VEC-AQC: SEARCH FOR THE RIGHT TEACHING ASSISTANT IN THE TEACHER-
STUDENT TRAINING SETUP
Vasista Sai Lodagala (Indian Institute of Technology, Madras)*; Sreyan Ghosh (University of Maryland,
College Park); S Umesh (IIT Chennai)
6458: PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with
Phoneme Distribution Predictor
Yuning Wu (Renmin University of China)*; Jiatong Shi (Carnegie Mellon University); Tao Qian (RUC);
Dongji Gao (Johns Hopkins University); Qin Jin (Renmin University of China)
6477: Incorporating Visual Information Reconstruction into Progressive Learning for Optimizing
Audio-Visual Speech Enhancement
Chen-Yue Zhang (USTC)*; Hang Chen (USTC); Jun Du (University of Science and Technology of China);
Baocai Yin (USTC, iFLYTEK); Jia Pan (iFlytek Research); Chin-hui Lee (Georgia Institute of Technology)
6512: G2PL: Lexicon Enhanced Chinese Polyphone Disambiguation using BERT Adapter with a
New Dataset
Haifeng Zhao (Anhui University); Hongzhi Wan (Anhui University)*; Lili Huang (Anhui University; Institute
of Artificial Intelligence, Hefei Comprehensive National Science Center); Mingwei Cao (Anhui University)
Special Sessions
3325: Neurally Augmented State Space Model for Simultaneous Communication and Tracking with
Low Complexity Receivers
Fernando Pedraza (Technische Universität Berlin)*; Giuseppe Caire (Technische Universität Berlin)
3803: Joint Data Association, NLOS Mitigation, and Clutter Suppression for Networked Device-
Free Sensing in 6G Cellular Network
Qin Shi (The Hong Kong Polytechnic University); Liang Liu (The Hong Kong Polytechnic University)*;
Shuowen Zhang (The Hong Kong Polytechnic University)
2653: Applying Symmetrical Component Transform for Industrial Appliance Classification in Non-
Intrusive Load Monitoring
Anthony Faustine (Imr); Lucas Pereira (ITI, LARSyS, Técnico Lisboa)*
5853: Improving Knowledge Distillation for Non-Intrusive Load Monitoring through Explainability
Guided Learning
Djordje Batic (University of Strathclyde)*; Giulia Tanoni (Università Politecnica delle Marche); Lina
Stankovic (University of Strathclyde); Vladimir Stankovic (University of Strathclyde); Emanuele Principi
(Università Politecnica delle Marche)
Automotive Radar Signal Processing and Machine Learning for Autonomous
Driving
2374: DOPPLER-CODED JOINT DIVISION MULTIPLE ACCESS WAVEFORM FOR AUTOMOTIVE
MIMO RADAR
Yanhua Wang (School of Information and Electronics, Beijing Institute of Technology; Electromagnetic
Sensing Research Center of CEMEE State Key Laboratory, Beijing Institute of Technology, Beijing,
China); Qiubo Pei (School of Information and Electronics, Beijing Institute of Technology; Chongqing
Innovation Center, Beijing Institute of Technology, Chongqing, China); Xueyao Hu (School of Information
and Electronics, Beijing Institute of Technology; Chongqing Innovation Center, Beijing Institute of
Technology, Chongqing, China)*; Jiamin Long (School of Information and Electronics, Beijing Institute of
Technology; Chongqing Innovation Center, Beijing Institute of Technology, Chongqing, China); Hao Yu
(School of Information and Electronics, Beijing Institute of Technology; Chongqing Innovation Center,
Beijing Institute of Technology, Chongqing, China); Le Zheng (School of Information and Electronics,
Beijing Institute of Technology; Chongqing Innovation Center, Beijing Institute of Technology, Chongqing,
China)
5043: Machine learning based early debris detection using automotive low level radar data
Kanishka Tyagi (Aptiv Advance Research Center)*; Shan Zhang (Aptiv Advance Research Center);
Yihang Zhang (Aptiv Advance Research Center); John Kirkwood (Aptiv Advance Research Center);
Sanling Song (Aptiv); Narbik Manukian (Aptiv Advance Research Center)
5064: Joint Antenna Selection and Beamforming in Integrated Automotive Radar Sensing-
Communications with Quantized Double Phase Shifters
Lifan Xu (University of Alabama); Shunqiao Sun (The University of Alabama)*; Yimin D Zhang (Temple
University); Athina Petropulu (Rutgers)
5137: Navigating and Reaching Therapeutic Goals with Dynamical Systems in Conversation-based
Interventions
Victor Ardulov (Amazon)*; Shrikanth Narayanan (USC)
5359: Exploiting prompt learning with pre-trained language models for Alzheimer's Disease
detection
Yi Wang (The Chinese University of Hong Kong)*; Jiajun Deng (The Chinese University of Hong Kong);
Tianzi Wang (The Chinese University of Hong Kong); Bo ZHENG (The Chinese University of Hong Kong);
Shoukang Hu (Nanyang Technological University); Xunying Liu (The Chinese University of Hong Kong);
Helen Meng (The Chinese University of Hong Kong)
6377: Egocentric Action Anticipation for Personal Health
Ivan Rodin (University of Catania)*; Antonino Furnari (University of Catania); Dimitrios Mavroeidis (Philips
Research); Giovanni Maria Farinella (University of Catania, Italy)
6428: A Controllable Lifestyle Simulator for use in Deep Reinforcement Learning Algorithms
Libio Gonçalves Braz (UPSSITECH)*; Allmin Susaiyah (Philips)
4799: Interpolation of spatial room impulse responses using partial optimal transport
Aaron Geldert (Aalto University)*; Nils Meyer-Kahlen (Aalto University); Sebastian J Schlecht (Aalto
University)
4884: Simultaneous Acoustic Echo Sorting and 3-D Room Geometry Inference
Kathleen C MacWilliam (Department of Electrical Engineering (ESAT-STADIUS/ETC))*; Filip Elvander
(Aalto University); Toon van Waterschoot (Department of Electrical Engineering (ESAT-STADIUS/ETC))
2905: CADET: Control-Aware Dynamic Edge Computing for Real-Time Target Tracking in UAV
Systems
Luis Felipe Florenzan Reyes (University of L'Aquila)*; Francesco Smarra (University of L'Aquila);
Alessandro D'Innocenzo (University of L'Aquila); Marco Levorato (University of California, Irvine)
3263: Extended Kalman Filter for Graph Signals in Nonlinear Dynamic Systems
Guy Sagi (Ben Gurion University of the Negev); Nir Shlezinger (Ben-Gurion University); Tirza S
Routtenberg (Ben Gurion University of the Negev)*
3484: Multi-Agent Reinforcement Learning for Covert Semantic Communications over Wireless
Networks
Yining Wang (Beijing University of Posts and Telecommunications)*; Ye Hu (Columbia University);
Hongyang Du (Nanyang Technological University); Tao Luo (Beijing University of Posts and
Communications); Dusit Niyato ()
5623: Asynchronous Federated Learning for Real-time Multiple Licence Plate Recognition through
Semantic Communication
Renyou Xie (Central South University); Chaojie Li (The University of New South Wales)*; Xiaojun Zhou
(Central South University); Zhao Yang Dong (The University of New South Wales)
6483: Generative Model Based Highly Efficient Semantic Communication Approach for Image
Transmission
Tianxiao Han (Zhejiang University); Jiancheng Tang (Zhejiang University); Qianqian Yang (Zhejiang
University)*; Yiping Duan (Tsinghua University); Zhaoyang Zhang (Zhejiang University); Zhiguo Shi
(Zhejiang University)
4375: Multiple Signed Graph Learning for Gene Regulatory Network Inference
Abdullah Karaaslanli (Michigan State University)*; Satabdi Saha (Michigan State University); Taps Maiti
(Michigan State University); Selin Aviyente (Michigan State University)
6456: Spatial Graph Signal Interpolation with an Application for Merging BCI Datasets with Various
Dimensionalities
Yassine El Ouahidi (IMT Atlantique)*; Lucas Drumetz (IMT Atlantique); Giulia Lioi (IMT Atlantique);
Nicolas Farrugia (IMT Atlantique); Bastien Pasdeloup (IMT Atlantique, Lab-STICC); Vincent Gripon (IMT
Atlantique)
Near-Field and Non-Planar Beamforming, Source Localization, and Adaptive Array
Processing
2309: Channel State Information-Free Artificial Noise-Aided Location-Privacy Enhancement
Jianxiu Li (University of Southern California)*; Urbashi Mitra (USC)
4418: Compressive estimation of near field channels for ultra massive-MIMO wideband THz
systems
Simon Tarboush (Independent Researcher)*; Anum Ali (Samsung Research America); Tareq Al-Naffouri
(CEMSE, KAUST)
6678: Utilization of Bessel Beams in Wideband Sub Terahertz Communication Systems to Mitigate
Beamsplit Effects in the Near-field
Arjun Singh (SUNY Polytechnic Institute)*; Vitaly Petrov (Northeastern University); Josep Jornet
(Northeastern University)
4715: LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models
Teerapat Jenrungrot (University of Washington)*; Michael Chinen (Google); W. Bastiaan Kleijn (Google);
Jan Skoglund (Google); Zalán Borsos (Google); Neil Zeghidour (Google); Marco Tagliasacchi (Google)
5161: Neural Feature Predictor and Discriminative Residual Coding for Low-Bitrate Speech Coding
Haici Yang (Indiana University)*; Wootaek Lim (ETRI); Minje Kim (Indiana University)
3253: Exploiting spatial information with the informed complex-valued spatial autoencoder for
target speaker extraction
Annika Briegleb (Friedrich-Alexander-University Erlangen-Nürnberg)*; Mhd Modar Halimeh (Friedrich-
Alexander-University Erlangen-Nürnberg); Walter Kellermann (Friedrich-Alexander-University Erlangen-
Nürnberg)
Quantum Computing for Machine Learning and Signal Processing
4852: The role of initial entanglement in adaptive Gibbs state preparation on quantum computers
Sophia Economou (Virginia Tech)*; Ada Warren (Virginia Tech); Edwin Barnes (Virginia Tech)
5392: Quantum transfer learning using the large-scale unsupervised pre-trained model WavLM-
Large for synthetic speech detection
Ruoyu Wang (University of Science and Technology of China)*; Jun Du (University of Science and
Technology of China); Tian Gao (iFlytek Research)
Radar Waveform Design: Recent Advances and New Emerging Applications
1475: Co-Design for MIMO radar and MIMO communication aided by reconfigurable intelligent
surface
Da Li (National University of Defense Technology); Bo Tang (National University of Defense
Technology)*; Lei Xue (National University of Defense Technology)
2263: Dual-Use Signal Design for MIMO RadCom with Inter-pulse Index Modulation
Xue Yao (Southeast University)*; Cui Guolong (UESTC); Xianxiang Yu (UESTC)
3548: Joint Waveform and Passive Beamformer Design in Multi-IRS Aided Radar
Zahra Esmaeilbeig (University of Illinois at Chicago)*; Arian Eamaz (University of Illinois - Chicago, IL);
Kumar Vijay Mishra (United States DEVCOM Army Research Laboratory); Mojtaba Soltanalian (University
of Illinois)
609: ST-MVDNet++: Improve Vehicle Detection with Lidar-Radar Geometrical Augmentation via
Self-Training
Yu-Jhe Li (Carnegie Mellon University)*; Matthew O'Toole (Carnegie Mellon University); Kris Kitani
(Carnegie Mellon University)
1203: Graph Neural Networks for Object Type Classification Based on Automotive Radar Point
Clouds and Spectra
Loveneet Saini (Room 28); Axel Acosta (Bosch); Gor Hakobyan (Bosch)*
3467: SPATIAL-DOMAIN OBJECT DETECTION UNDER MIMO-FMCW AUTOMOTIVE RADAR
INTERFERENCE
Sian Jin (Princeton University); Pu Wang (MERL)*; Petros Boufounos (Mitsubishi Electric Research
Laboratories); Ryuhei Takahashi (Mitsubishi Electric Information Technology R&D Center); Sumit Roy
(University of Washington)
4666: Online Learning-based Waveform Selection for Improved Vehicle Recognition in Automotive
Radar
Charles E Thornton (Virginia Tech)*; William Howard (Virginia Tech); Michael R. Buehrer (Virginia Tech,
USA)
2493: Computational Efficient Monaural Speech Enhancement with Universal Sample rate Band-
split RNN
Jianwei Yu (Tencent AI lab)*; Yi Luo (Tencent AI Lab)
4244: Predictive SkiM: Contrastive Predictive Coding for Low-Latency Online Speech Separation
Chenda Li (Shanghai Jiao Tong University)*; Yifei Wu (Shanghai Jiao Tong University); Yanmin Qian
(Shanghai Jiao Tong University)
5485: Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via
Integrated Full- and Sub-Band Modeling
Zhong-Qiu Wang (Carnegie Mellon University)*; Samuele Cornell (Università Politecnica delle Marche);
Shukjae Choi (Hyundai Motor Company); Younglo Lee (42dot); Byeong-Yeol Kim (42dot); Shinji
Watanabe (Carnegie Mellon University)
1100: Distributionally Robust Multiclass Classification and Applications in Deep Image Classifiers
Ruidi Chen (Amazon); Boran Hao (Boston University); Ioannis C Paschalidis (Boston University)*
3334: Robust and Parallelizable Tensor Completion based on Tensor Factorization and Maximum
Correntropy Criterion
Yicong He (University of Central Florida); George Atia (University of Central Florida)*
3388: Gaussian process dynamical modeling for adaptive inference over graphs
Qin Lu (University of Minnesota)*; Konstantinos D. Polyzos (University of Minnesota)
3987: Online Vector Autoregressive Models over Expanding Graphs
Bishwadeep Das (TU Delft)*; Elvin Isufi (TU Delft)
3232: DRL Path Planning For UAV-Aided V2X Networks: Comparing Discrete To Continuous Action
Spaces
Leonardo Spampinato (WiLab, CNIT / DEI, University of Bologna)*; Alessia Tarozzi (WiLab, CNIT / DEI,
University of Bologna); Chiara Buratti (WiLab, CNIT / DEI, University of Bologna); Riccardo Marini
(WiLab, CNIT / DEI, University of Bologna)
3921: Wireless sensing for simultaneous human vocal sound and heart sound recognition
Yu Rong (Arizona State University)*; Kumar Vijay Mishra (United States DEVCOM Army Research
Laboratory); Daniel Bliss (Arizona State University)
5221: Flexible Beam Design for Vital Sign Monitoring Using a Phased Array Equipped with Double-
Phase Shifters
Zhaoyi Xu (Rutgers, the State University of New Jersey)*; Donglin Gao (Rutgers University); Shuping Li
(Rutgers University); Chung-Tse Michael Wu (Rutgers University); Athina Petropulu (Rutgers)
6452: BENCHMARK OF PHYSIOLOGICAL MODEL BASED AND DEEP LEARNING BASED REMOTE
PHOTOPLETHYSMOGRAPHY IN AUTOMOTIVE
Zhiyu Wang (Shandong University of Science and Technology); Xuezhi Yang (Hefei University of
Technology); Hongzhou Lu (Department of Infectious Diseases, Shanghai Public Health Clinical Center,
Fudan University, Shanghai, China); Caifeng Shan (Shandong Univ. Science & Technology); Wenjin
Wang (Southern University of Science and Technology)*
2477: An Efficient Beam-Sharing Algorithm for RIS-aided Simultaneous Wireless Information and
Power Transfer Applications
Tran Minh Nguyen (Sungkyunkwan University)*; Muhammad Miftahul Amri (Sungkyunkwan University);
Je Hyeon Park (Sungkyunkwan University); Dong In Kim (Sungkyunkwan University); Kae Won Choi
(Sungkyunkwan University)
Signal Processing for Smart City Applications and the Internet of Things
1180: MSN-net: Multi-Scale Normality Network for Video Anomaly Detection
Yang Liu (Fudan University)*; Di Li (Shanghai East-bund Research Institute on NSAI); Wei Zhu (Fudan
University); Dingkang Yang (Fudan University); Jing Liu (Fudan University); Liang Song (Fudan
University)
2299: Efficient Quantized Constant Envelope Precoding for Multiuser Downlink Massive MIMO
Systems
Zheyu Wu (Academy of Mathematics and Systems Science); Ya-Feng Liu (Chinese Academy of
Sciences)*; Bo Jiang (Nanjing Normal University); Yu-Hong Dai (Academy of Mathematics and Systems
Science)
3177: Joint Symbol-Level Precoding and Sub-Block-Level RIS Design for Dual-Function Radar-
Communications
Linlong Wu (University of Luxembourg)*; Bowen Wang (University of Electronic Science and Technology
of China); Ziyang Cheng (University of Electronic Science and Technology of China); Bhavani Shankar
Mysore Ramarao (University of Luxembourg); Bjorn Ottersten (SnT)
3644: SYMBOL LEVEL PRECODING IN THE RF DOMAIN FOR LOW HARDWARE COMPLEXITY RIS-
ASSISTED MU-MISO SYSTEMS
Christos Tsinos (University of Athens); Theodoros Tsiftsis (Jinan University)*; Robert Schober (Friedrich-
Alexander University Erlangen-Nurnberg)
6212: SYMBOL-LEVEL PRECODING IS RELATED TO PARAMETER ESTIMATION FROM
QUANTIZED DATA
Mingjie Shao (The Chinese University of Hong Kong, Shandong University)*; Wing-Kin Ma (The Chinese
University of Hong Kong); Yatao Liu (The Chinese University of Hong Kong)
5420: Topo-MLP: A Simplicial Network Without Message Passing
Karthikeyan Natesan Ramamurthy (IBM Research)*; Aldo Guzmán-Sáenz (IBM); Mustafa Hajij (USFCA)
873: Unsupervised Deep Virtual Staining for Microscopic Cell Images via Knowledge Distillation
Ziwang Xu (School of Electrical and Electronic Engineering, Nanyang Technological University); Lanqing
Guo (Nanyang Technological University); Shuyan Zhang (Agency for Science, Technology and
Research); Alex Kot (Nanyang Technological University); Bihan Wen (Nanyang Technological University)*
2545: Overcoming Posterior Collapse in Variational Autoencoders via EM-type Training
Ying Li (The University of Hong Kong); Lei Cheng (Zhejiang University)*; Feng Yin (The Chinese
University of Hong Kong, Shenzhen); Michael Zhang (University of Hong Kong); Sergios Theodoridis
(National and Kapodistrian University of Athens)