Skip to main content

Showing 1–50 of 1,400 results for author: Li, Y

Searching in archive eess. Search in all archives.
.
  1. arXiv:2408.17099  [pdf, other

    eess.IV

    Efficient Polarization Demosaicking via Low-cost Edge-aware and Inter-channel Correlation

    Authors: Guangsen Liu, Peng Rao, Xin Chen, Yao Li, Haixin Jiang

    Abstract: Efficient and high-fidelity polarization demosaicking is critical for industrial applications of the division of focal plane (DoFP) polarization imaging systems. However, existing methods have an unsatisfactory balance of speed, accuracy, and complexity. This study introduces a novel polarization demosaicking algorithm that interpolates within a three-stage basic demosaicking framework to obtain D… ▽ More

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 15 pages, 9 figures

  2. arXiv:2408.16239  [pdf, other

    eess.SP

    Meta-Learning Empowered Graph Neural Networks for Radio Resource Management

    Authors: Kai Huang, Le Liang, Xinping Yi, Hao Ye, Shi Jin, Geoffrey Ye Li

    Abstract: In this paper, we consider a radio resource management (RRM) problem in the dynamic wireless networks, comprising multiple communication links that share the same spectrum resource. To achieve high network throughput while ensuring fairness across all links, we formulate a resilient power optimization problem with per-user minimum-rate constraints. We obtain the corresponding Lagrangian dual probl… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

  3. arXiv:2408.14843  [pdf, other

    cs.LG cs.NE eess.SP

    Correntropy-Based Improper Likelihood Model for Robust Electrophysiological Source Imaging

    Authors: Yuanhao Li, Badong Chen, Zhongxu Hu, Keita Suzuki, Wenjun Bai, Yasuharu Koike, Okito Yamashita

    Abstract: Bayesian learning provides a unified skeleton to solve the electrophysiological source imaging task. From this perspective, existing source imaging algorithms utilize the Gaussian assumption for the observation noise to build the likelihood function for Bayesian inference. However, the electromagnetic measurements of brain activity are usually affected by miscellaneous artifacts, leading to a pote… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

  4. arXiv:2408.14585  [pdf, other

    cs.CV cs.SD eess.AS

    Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities

    Authors: Yidi Li, Yihan Li, Yixin Guo, Bin Ren, Zhenhuan Xu, Hao Guo, Hong Liu, Nicu Sebe

    Abstract: In speaker tracking research, integrating and complementing multi-modal data is a crucial strategy for improving the accuracy and robustness of tracking systems. However, tracking with incomplete modalities remains a challenging issue due to noisy observations caused by occlusion, acoustic noise, and sensor failures. Especially when there is missing data in multiple modalities, the performance of… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Audio-Visual Speaker Tracking with Incomplete Modalities

  5. arXiv:2408.14493  [pdf

    cs.LG eess.SY

    Extraction of Typical Operating Scenarios of New Power System Based on Deep Time Series Aggregation

    Authors: Zhaoyang Qu, Zhenming Zhang, Nan Qu, Yuguang Zhou, Yang Li, Tao Jiang, Min Li, Chao Long

    Abstract: Extracting typical operational scenarios is essential for making flexible decisions in the dispatch of a new power system. This study proposed a novel deep time series aggregation scheme (DTSAs) to generate typical operational scenarios, considering the large amount of historical operational snapshot data. Specifically, DTSAs analyze the intrinsic mechanisms of different scheduling operational sce… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by CAAI Transactions on Intelligence Technology

  6. arXiv:2408.14453  [pdf

    cs.LG eess.IV eess.SP

    Reconstructing physiological signals from fMRI across the adult lifespan

    Authors: Shiyu Wang, Ziyuan Xu, Yamin Li, Mara Mather, Roza G. Bayrak, Catie Chang

    Abstract: Interactions between the brain and body are of fundamental importance for human behavior and health. Functional magnetic resonance imaging (fMRI) captures whole-brain activity noninvasively, and modeling how fMRI signals interact with physiological dynamics of the body can provide new insight into brain function and offer potential biomarkers of disease. However, physiological recordings are not a… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

  7. arXiv:2408.14340  [pdf, other

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Foundation Models for Music: A Survey

    Authors: Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elio Quinton, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg , et al. (18 additional authors not shown)

    Abstract: In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multimodal learning. We first contextualise the signifi… ▽ More

    Submitted 27 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  8. arXiv:2408.13705  [pdf, other

    cs.CL cs.SD eess.AS

    Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval

    Authors: Lifeng Zhou, Yuke Li, Rui Deng, Yuting Yang, Haoqi Zhu

    Abstract: The success of speech-image retrieval relies on establishing an effective alignment between speech and image. Existing methods often model cross-modal interaction through simple cosine similarity of the global feature of each modality, which fall short in capturing fine-grained details within modalities. To address this issue, we introduce an effective framework and a novel learning task named cro… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2408.13119

  9. Optimal Dispatch Strategy for a Multi-microgrid Cooperative Alliance Using a Two-Stage Pricing Mechanism

    Authors: Yonghui Nie, Zhi Li, Jie Zhang, Lei Gao, Yang Li, Hengyu Zhou

    Abstract: To coordinate resources among multi-level stakeholders and enhance the integration of electric vehicles (EVs) into multi-microgrids, this study proposes an optimal dispatch strategy within a multi-microgrid cooperative alliance using a nuanced two-stage pricing mechanism. Initially, the strategy assesses electric energy interactions between microgrids and distribution networks to establish a found… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Transactions on Sustainable Energy, Paper no. TSTE-00122-2024

  10. arXiv:2408.13119  [pdf, ps, other

    cs.CL cs.SD eess.AS

    Coarse-to-fine Alignment Makes Better Speech-image Retrieval

    Authors: Lifeng Zhou, Yuke Li

    Abstract: In this paper, we propose a novel framework for speech-image retrieval. We utilize speech-image contrastive (SIC) learning tasks to align speech and image representations at a coarse level and speech-image matching (SIM) learning tasks to further refine the fine-grained cross-modal alignment. SIC and SIM learning tasks are jointly trained in a unified manner. To optimize the learning process, we u… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

  11. arXiv:2408.12914  [pdf, ps, other

    eess.SP

    A Recursion-Based SNR Determination Method for Short Packet Transmission: Analysis and Applications

    Authors: Chengzhe Yin, Rui Zhang, Yongzhao Li, Yuhan Ruan, Tao Li, Jiaheng Lu

    Abstract: The short packet transmission (SPT) has gained much attention in recent years. In SPT, the most significant characteristic is that the finite blocklength code (FBC) is adopted. With FBC, the signal-to-noise ratio (SNR) cannot be expressed as an explicit function with respect to the other transmission parameters. This raises the following two problems for the resource allocation in SPTs: (i) The ex… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

  12. arXiv:2408.12897  [pdf, other

    eess.IV cs.CV

    When Diffusion MRI Meets Diffusion Model: A Novel Deep Generative Model for Diffusion MRI Generation

    Authors: Xi Zhu, Wei Zhang, Yijie Li, Lauren J. O'Donnell, Fan Zhang

    Abstract: Diffusion MRI (dMRI) is an advanced imaging technique characterizing tissue microstructure and white matter structural connectivity of the human brain. The demand for high-quality dMRI data is growing, driven by the need for better resolution and improved tissue contrast. However, acquiring high-quality dMRI data is expensive and time-consuming. In this context, deep generative modeling emerges as… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 11 pages, 3 figures

  13. arXiv:2408.11982  [pdf, other

    eess.IV cs.CV cs.MM

    AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

    Authors: Maksim Smirnov, Aleksandr Gushchin, Anastasia Antsiferova, Dmitry Vatolin, Radu Timofte, Ziheng Jia, Zicheng Zhang, Wei Sun, Jiaying Qian, Yuqin Cao, Yinan Sun, Yuxin Zhu, Xiongkuo Min, Guangtao Zhai, Kanjar De, Qing Luo, Ao-Xiang Zhang, Peng Zhang, Haibo Lei, Linyan Jiang, Yaqing Li, Wenhui Meng, Xiaoheng Tan, Haiqiang Wang, Xiaozhong Xu , et al. (11 additional authors not shown)

    Abstract: Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dat… ▽ More

    Submitted 28 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  14. arXiv:2408.11849  [pdf, other

    cs.CL cs.AI eess.AS

    Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation

    Authors: Yinghao Aaron Li, Xilin Jiang, Jordan Darefsky, Ge Zhu, Nima Mesgarani

    Abstract: The rapid advancement of large language models (LLMs) has significantly propelled the development of text-based chatbots, demonstrating their capability to engage in coherent and contextually relevant dialogues. However, extending these advancements to enable end-to-end speech-to-speech conversation bots remains a formidable challenge, primarily due to the extensive dataset and computational resou… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: CoLM 2024

  15. arXiv:2408.11329  [pdf, ps, other

    eess.SP

    Full-Duplex ISAC-Enabled D2D Underlaid Cellular Networks: Joint Transceiver Beamforming and Power Allocation

    Authors: Tao Jiang, Ming Jin, Qinghua Guo, Yinhong Liu, Yaming Li

    Abstract: Integrating device-to-device (D2D) communication into cellular networks can significantly reduce the transmission burden on base stations (BSs). Besides, integrated sensing and communication (ISAC) is envisioned as a key feature in future wireless networks. In this work, we consider a full-duplex ISAC- based D2D underlaid system, and propose a joint beamforming and power allocation scheme to impro… ▽ More

    Submitted 21 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to IEEE Transactions on Wireless Communications on 7 June,2024

  16. arXiv:2408.10852  [pdf, other

    cs.SD eess.AS

    EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

    Authors: Xin Qi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Shuchen Shi, Yi Lu, Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Yukun Liu, Guanjun Li, Xuefei Liu, Yongwei Li

    Abstract: In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plugged in and out based on the specific sub-tasks, offering high flexibility. However, the current application schemes primarily incorporate LoRA into t… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  17. arXiv:2408.10849  [pdf, other

    cs.SD eess.AS

    A Noval Feature via Color Quantisation for Fake Audio Detection

    Authors: Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Yukun Liu, Guanjun Li, Xin Qi, Yi Lu, Xuefei Liu, Yongwei Li

    Abstract: In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features, such as wav2vec2.0 and Masked Auto Encoder. These methods have proven that using real audio for reconstruction pre-training can better help the model… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: accepted by ISCSLP2024

  18. arXiv:2408.10670  [pdf

    cs.CV eess.IV

    A Noncontact Technique for Wave Measurement Based on Thermal Stereography and Deep Learning

    Authors: Deyu Li, Longfei Xiao, Handi Wei, Yan Li, Binghua Zhang

    Abstract: The accurate measurement of the wave field and its spatiotemporal evolution is essential in many hydrodynamic experiments and engineering applications. The binocular stereo imaging technique has been widely used to measure waves. However, the optical properties of indoor water surfaces, including transparency, specular reflection, and texture absence, pose challenges for image processing and stere… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  19. arXiv:2408.10501  [pdf, other

    cs.IT eess.SP

    Generative Diffusion Models for High Dimensional Channel Estimation

    Authors: Xingyu Zhou, Le Liang, Jing Zhang, Peiwen Jiang, Yong Li, Shi Jin

    Abstract: Along with the prosperity of generative artificial intelligence (AI), its potential for solving conventional challenges in wireless communications has also surfaced. Inspired by this trend, we investigate the application of the advanced diffusion models (DMs), a representative class of generative AI models, to high dimensional wireless channel estimation. By capturing the structure of multiple-inp… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  20. arXiv:2408.10287  [pdf

    physics.optics cs.AI eess.IV

    Recognizing Beam Profiles from Silicon Photonics Gratings using Transformer Model

    Authors: Yu Dian Lim, Hong Yu Li, Simon Chun Kiat Goh, Xiangyu Wang, Peng Zhao, Chuan Seng Tan

    Abstract: Over the past decade, there has been extensive work in developing integrated silicon photonics (SiPh) gratings for the optical addressing of trapped ion qubits in the ion trap quantum computing community. However, when viewing beam profiles from infrared (IR) cameras, it is often difficult to determine the corresponding heights where the beam profiles are located. In this work, we developed transf… ▽ More

    Submitted 22 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  21. arXiv:2408.09491  [pdf, other

    cs.SD eess.AS

    A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition

    Authors: Yangze Li, Xiong Wang, Songjun Cao, Yike Zhang, Long Ma, Lei Xie

    Abstract: Audio-LLM introduces audio modality into a large language model (LLM) to enable a powerful LLM to recognize, understand, and generate audio. However, during speech recognition in noisy environments, we observed the presence of illusions and repetition issues in audio-LLM, leading to substitution and insertion errors. This paper proposes a transcription prompt-based audio-LLM by introducing an ASR… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  22. arXiv:2408.09369  [pdf, other

    eess.IV cs.CV

    Flemme: A Flexible and Modular Learning Platform for Medical Images

    Authors: Guoqing Zhang, Jingyun Yang, Yang Li

    Abstract: As the rapid development of computer vision and the emergence of powerful network backbones and architectures, the application of deep learning in medical imaging has become increasingly significant. Unlike natural images, medical images lack huge volumes of data but feature more modalities, making it difficult to train a general model that has satisfactory performance across various datasets. In… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 8 pages, 6 figures

  23. arXiv:2408.08121  [pdf

    eess.SY

    Optimizing Highway Ramp Merge Safety and Efficiency via Spatio-Temporal Cooperative Control and Vehicle-Road Coordination

    Authors: Ting Peng, Xiaoxue Xu, Yuan Li, Jie Wu, Tao Li, Xiang Dong, Yincai Cai, Peng Wu

    Abstract: In view of existing automatic driving, it is difficult to accurately and timely obtain the status and driving intention of other vehicles. The safety risk and urgency of autonomous vehicles in the absence of collision are evaluated. To ensure safety and improve road efficiency, a method of pre-compiling the spatio-temporal trajectory of vehicles is established to eliminate conflicts between vehicl… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

  24. Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training

    Authors: Yiming Li, Zhifang Guo, Xiangdong Wang, Hong Liu

    Abstract: Recent advances have been witnessed in audio-language joint learning, such as CLAP, that shows much success in multi-modal understanding tasks. These models usually aggregate uni-modal local representations, namely frame or word features, into global ones, on which the contrastive loss is employed to reach coarse-grained cross-modal alignment. However, frame-level correspondence with texts may be… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: ACM MM 2024 (Oral)

  25. arXiv:2408.07484  [pdf, other

    cs.CV eess.IV

    GRFormer: Grouped Residual Self-Attention for Lightweight Single Image Super-Resolution

    Authors: Yuzhen Li, Zehang Deng, Yuxin Cao, Lihua Liu

    Abstract: Previous works have shown that reducing parameter overhead and computations for transformer-based single image super-resolution (SISR) models (e.g., SwinIR) usually leads to a reduction of performance. In this paper, we present GRFormer, an efficient and lightweight method, which not only reduces the parameter overhead and computations, but also greatly improves performance. The core of GRFormer i… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted for ACM MM 2024

  26. arXiv:2408.07320  [pdf, other

    eess.SP

    Coordinated Spectral Efficiency Prediction for Real-World 5G CoMP Systems

    Authors: Zhixing Chen, Zhaoyu Fan, Yang Li, Yibin Kang, Qi Yan, Qingjiang Shi

    Abstract: Coordinated multipoint (CoMP) systems incur substantial resource consumption due to the management of backhaul links and the coordination among various base stations (BSs). Accurate prediction of coordinated spectral efficiency (CSE) can guide the optimization of network parameters, resulting in enhanced resource utilization efficiency. However, characterizing the CSE is intractable due to the inh… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

  27. arXiv:2408.06906  [pdf, other

    eess.AS cs.AI

    VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders

    Authors: Yubing Cao, Yongming Li, Liejun Wang, Yinfeng Yu

    Abstract: Since the introduction of Generative Adversarial Networks (GANs) in speech synthesis, remarkable achievements have been attained. In a thorough exploration of vocoders, it has been discovered that audio waveforms can be generated at speeds exceeding real-time while maintaining high fidelity, achieved through the utilization of GAN-based models. Typically, the inputs to the vocoder consist of band-… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted for publication by IEEE International Conference on Systems, Man, and Cybernetics 2024

  28. Deep Inertia $L_p$ Half-Quadratic Splitting Unrolling Network for Sparse View CT Reconstruction

    Authors: Yu Guo, Caiying Wu, Yaxin Li, Qiyu Jin, Tieyong Zeng

    Abstract: Sparse view computed tomography (CT) reconstruction poses a challenging ill-posed inverse problem, necessitating effective regularization techniques. In this letter, we employ $L_p$-norm ($0<p<1$) regularization to induce sparsity and introduce inertial steps, leading to the development of the inertial $L_p$-norm half-quadratic splitting algorithm. We rigorously prove the convergence of this algor… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: This paper was accepted by IEEE Signal Processing Letters on July 28, 2024

    Journal ref: IEEE Signal Processing Letters, 2024, 31:2030-2034

  29. arXiv:2408.04063  [pdf, other

    eess.SY

    From Black Box to Clarity: AI-Powered Smart Grid Optimization with Kolmogorov-Arnold Networks

    Authors: Xiaoting Wang, Yuzhuo Li, Yunwei Li, Gregory Kish

    Abstract: This work is the first to adopt Kolmogorov-Arnold Networks (KAN), a recent breakthrough in artificial intelligence, for smart grid optimizations. To fully leverage KAN's interpretability, a general framework is proposed considering complex uncertainties. The stochastic optimal power flow problem in hybrid AC/DC systems is chosen as a particularly tough case study for demonstrating the effectivenes… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted in Late Breaking Research Publications in 2024 IEEE Energy Conversion Congress and Exposition (ECCE)

  30. arXiv:2408.03361  [pdf, other

    eess.IV cs.CV

    GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

    Authors: Pengcheng Chen, Jin Ye, Guoan Wang, Yanjun Li, Zhongying Deng, Wei Li, Tianbin Li, Haodong Duan, Ziyan Huang, Yanzhou Su, Benyou Wang, Shaoting Zhang, Bin Fu, Jianfei Cai, Bohan Zhuang, Eric J Seibel, Junjun He, Yu Qiao

    Abstract: Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields. In the medical field, LVLMs have a high potential to offer substantial assistance for diagnosis and treatment. Before that, it is crucial to develop benchmarks to evaluate LVLMs' effectiveness in various medical applications. Curren… ▽ More

    Submitted 9 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  31. arXiv:2408.02095  [pdf, other

    cs.IT eess.SP

    Secure Semantic Communications: From Perspective of Physical Layer Security

    Authors: Yongkang Li, Zheng Shi, Han Hu, Yaru Fu, Hong Wang, Hongjiang Lei

    Abstract: Semantic communications have been envisioned as a potential technique that goes beyond Shannon paradigm. Unlike modern communications that provide bit-level security, the eaves-dropping of semantic communications poses a significant risk of potentially exposing intention of legitimate user. To address this challenge, a novel deep neural network (DNN) enabled secure semantic communication (DeepSSC)… ▽ More

    Submitted 4 August, 2024; originally announced August 2024.

  32. arXiv:2408.01648  [pdf

    eess.IV cs.CV

    Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2

    Authors: Ange Lou, Yamin Li, Yike Zhang, Robert F. Labadie, Jack Noble

    Abstract: The Segment Anything Model 2 (SAM 2) is the latest generation foundation model for image and video segmentation. Trained on the expansive Segment Anything Video (SA-V) dataset, which comprises 35.5 million masks across 50.9K videos, SAM 2 advances its predecessor's capabilities by supporting zero-shot segmentation through various prompts (e.g., points, boxes, and masks). Its robust zero-shot perfo… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: The first work evaluates the performance of SAM 2 in surgical videos

  33. arXiv:2408.00916  [pdf, other

    eess.SY

    A reference frame-based microgrid primary control for ensuring global convergence to a periodic orbit

    Authors: Xinyuan Jiang, Constantino M. Lagoa, Daning Huang, Yan Li

    Abstract: Electric power systems with growing penetration of renewable generation face problems of frequency oscillation and increased uncertainty as the operating point may veer close to instability. Traditionally the stability of these systems is studied either in terms of local stability or as an angle synchronization problem under the simplifying assumption that decouples the amplitude along with all di… ▽ More

    Submitted 1 August, 2024; originally announced August 2024.

  34. arXiv:2408.00429  [pdf, other

    eess.SP cs.AI

    Augmenting Channel Simulator and Semi- Supervised Learning for Efficient Indoor Positioning

    Authors: Yupeng Li, Xinyu Ning, Shijian Gao, Yitong Liu, Zhi Sun, Qixing Wang, Jiangzhou Wang

    Abstract: This work aims to tackle the labor-intensive and resource-consuming task of indoor positioning by proposing an efficient approach. The proposed approach involves the introduction of a semi-supervised learning (SSL) with a biased teacher (SSLB) algorithm, which effectively utilizes both labeled and unlabeled channel data. To reduce measurement expenses, unlabeled data is generated using an updated… ▽ More

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ACCEPTED for presentation at 2024 IEEE Global Communications Conference

  35. arXiv:2407.21280  [pdf, other

    eess.SP

    Wireless-Powered Mobile Crowdsensing Enhanced by UAV-Mounted RIS: Joint Transmission, Compression, and Trajectory Design

    Authors: Yongqing Xu, Haoqing Qi, Zhiqin Wang, Xiang Zhang, Yong Li, Tony Q. S. Quek

    Abstract: Mobile crowdsensing (MCS) enables data collection from massive devices to achieve a wide sensing range. Wireless power transfer (WPT) is a promising paradigm for prolonging the operation time of MCS systems by sustainably transferring power to distributed devices. However, the efficiency of WPT significantly deteriorates when the channel conditions are poor. Unmanned aerial vehicles (UAVs) and rec… ▽ More

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  36. arXiv:2407.20262  [pdf

    eess.SP

    A Neural-Network-Embedded Equivalent Circuit Model for Lithium-ion Battery State Estimation

    Authors: Zelin Guo, Yiyan Li, Zheng Yan, Mo-Yuen Chow

    Abstract: Equivalent Circuit Model(ECM)has been widelyused in battery modeling and state estimation because of itssimplicity, stability and interpretability.However, ECM maygenerate large estimation errors in extreme working conditionssuch as freezing environmenttemperature andcomplexcharging/discharging behaviors,in whichscenariostheelectrochemical characteristics of the battery become extremelycomplex and… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 8 pages

  37. arXiv:2407.19867  [pdf

    eess.SY

    Design and Testing for Steel Support Axial Force Servo System

    Authors: Sana Ullah, Yonghong Zhou, Maokai Lai, Xiang Dong, Tao Li, Xiaoxue Xu, Yuan Li, Ting Peng

    Abstract: Foundation excavations are deepening, expanding, and approaching structures. Steel supports measure and manage axial force. The study regulates steel support structure power during deep excavation using a novel axial force management system for safety, efficiency, and structural integrity. Closed-loop control changes actuator output to maintain axial force based on force. In deep excavation, the s… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 6 pages,7 figures, 1 table, 2 graph, conference paper

  38. arXiv:2407.17510  [pdf

    eess.SP

    Transfer Learning Enabled Transformer based Generative Adversarial Networks (TT-GAN) for Terahertz Channel Modeling and Generating

    Authors: Zhengdong Hu, Yuanbo Li, Chong Han

    Abstract: Terahertz (THz) communications, ranging from 100 GHz to 10 THz, are envisioned as a promising technology for 6G and beyond wireless systems. As foundation of designing THz communications, channel modeling and characterization are crucial to scrutinize the potential of the new spectrum. However, current channel modeling and standardization heavily rely on measurements, which are both time-consuming… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2306.06902

  39. arXiv:2407.16684  [pdf, other

    eess.IV cs.CV q-bio.NC

    AutoRG-Brain: Grounded Report Generation for Brain MRI

    Authors: Jiayu Lei, Xiaoman Zhang, Chaoyi Wu, Lisong Dai, Ya Zhang, Yanyong Zhang, Yanfeng Wang, Weidi Xie, Yuehua Li

    Abstract: Radiologists are tasked with interpreting a large number of images in a daily base, with the responsibility of generating corresponding reports. This demanding workload elevates the risk of human error, potentially leading to treatment delays, increased healthcare costs, revenue loss, and operational inefficiencies. To address these challenges, we initiate a series of work on grounded Automatic Re… ▽ More

    Submitted 29 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

  40. arXiv:2407.16657  [pdf, other

    physics.optics eess.IV

    Fluorescence Diffraction Tomography using Explicit Neural Fields

    Authors: Renzhi He, Yucheng Li, Junjie Chen, Yi Xue

    Abstract: Simultaneous imaging of fluorescence-labeled and label-free phase objects in the same sample provides distinct and complementary information. Most multimodal fluorescence-phase imaging operates in transmission mode, capturing fluorescence images and phase images separately or sequentially, which limits their practical application in vivo. Here, we develop fluorescence diffraction tomography (FDT)… ▽ More

    Submitted 19 August, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

  41. arXiv:2407.16634  [pdf, other

    eess.IV cs.AI cs.CV cs.HC

    Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

    Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

    Abstract: Data-driven deep learning models have shown great capabilities to assist radiologists in breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address a long-standing challenge of improving the diagnostic model performance on rare cases using long-tailed data. Specifical… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  42. arXiv:2407.13691  [pdf, other

    eess.SP

    Unsupervised and Interpretable Synthesizing for Electrical Time Series Based on Information Maximizing Generative Adversarial Nets

    Authors: Zhenghao Zhou, Yiyan Li, Runlong Liu, Zheng Yan, Mo-Yuen Chow

    Abstract: Generating synthetic data has become a popular alternative solution to deal with the difficulties in accessing and sharing field measurement data in power systems. However, to make the generation results controllable, existing methods (e.g. Conditional Generative Adversarial Nets, cGAN) require labeled dataset to train the model, which is demanding in practice because many field measurement data l… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  43. arXiv:2407.12538  [pdf, other

    eess.IV cs.CV

    High Frequency Matters: Uncertainty Guided Image Compression with Wavelet Diffusion

    Authors: Juan Song, Jiaxiang He, Mingtao Feng, Keyan Wang, Yunsong Li, Ajmal Mian

    Abstract: Diffusion probabilistic models have recently achieved remarkable success in generating high-quality images. However, balancing high perceptual quality and low distortion remains challenging in image compression applications. To address this issue, we propose an efficient Uncertainty-Guided image compression approach with wavelet Diffusion (UGDiff). Our approach focuses on high frequency compressio… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  44. arXiv:2407.12038  [pdf, ps, other

    eess.AS cs.AI

    ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

    Authors: Ruibo Fu, Rui Liu, Chunyu Qiang, Yingming Gao, Yi Lu, Shuchen Shi, Tao Wang, Ya Li, Zhengqi Wen, Chen Zhang, Hui Bu, Yukun Liu, Xin Qi, Guanjun Li

    Abstract: The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and controlled detail content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective percept… ▽ More

    Submitted 31 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: ISCSLP 2024 Challenge description and results

  45. arXiv:2407.11595  [pdf, other

    eess.SP

    Machine Learning in Communications: A Road to Intelligent Transmission and Processing

    Authors: Shixiong Wang, Geoffrey Ye Li

    Abstract: Prior to the era of artificial intelligence and big data, wireless communications primarily followed a conventional research route involving problem analysis, model building and calibration, algorithm design and tuning, and holistic and empirical verification. However, this methodology often encountered limitations when dealing with large-scale and complex problems and managing dynamic and massive… ▽ More

    Submitted 25 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Invited by and Accepted to "Communications of Huawei Research"

  46. arXiv:2407.11541  [pdf, other

    eess.IV cs.CV

    Uniformly Accelerated Motion Model for Inter Prediction

    Authors: Zhuoyuan Li, Yao Li, Chuanbo Tang, Li Li, Dong Liu, Feng Wu

    Abstract: Inter prediction is a key technology to reduce the temporal redundancy in video coding. In natural videos, there are usually multiple moving objects with variable velocity, resulting in complex motion fields that are difficult to represent compactly. In Versatile Video Coding (VVC), existing inter prediction methods usually assume uniform speed motion between consecutive frames and use the linear… ▽ More

    Submitted 21 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  47. arXiv:2407.10926  [pdf, other

    eess.IV cs.CV

    In-Loop Filtering via Trained Look-Up Tables

    Authors: Zhuoyuan Li, Jiacheng Li, Yao Li, Li Li, Dong Liu, Feng Wu

    Abstract: In-loop filtering (ILF) is a key technology for removing the artifacts in image/video coding standards. Recently, neural network-based in-loop filtering methods achieve remarkable coding gains beyond the capability of advanced video coding standards, which becomes a powerful coding tool candidate for future video coding standards. However, the utilization of deep neural networks brings heavy time… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures

  48. arXiv:2407.10471  [pdf, other

    cs.CR cs.AI cs.SD eess.AS

    GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis

    Authors: Weizhi Liu, Yue Li, Dongdong Lin, Hui Tian, Haizhou Li

    Abstract: Amid the burgeoning development of generative models like diffusion models, the task of differentiating synthesized audio from its natural counterpart grows more daunting. Deepfake detection offers a viable solution to combat this challenge. Yet, this defensive measure unintentionally fuels the continued refinement of generative models. Watermarking emerges as a proactive and sustainable tactic, p… ▽ More

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  49. arXiv:2407.10307  [pdf, other

    eess.SY

    Distributed Charging Coordination for Electric Trucks under Limited Facilities and Travel Uncertainties

    Authors: Ting Bai, Yuchao Li, Karl Henrik Johansson, Jonas Mårtensson

    Abstract: In this work, we address the problem of charging coordination between electric trucks and charging stations. The problem arises from the tension between the trucks' nontrivial charging times and the stations' limited charging facilities. Our goal is to reduce the trucks' waiting times at the stations while minimizing individual trucks' operational costs. We propose a distributed coordination frame… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  50. arXiv:2407.10048  [pdf, other

    cs.SD eess.AS

    Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification

    Authors: Li Zhang, Ning Jiang, Qing Wang, Yue Li, Quan Lu, Lei Xie

    Abstract: Trained on 680,000 hours of massive speech data, Whisper is a multitasking, multilingual speech foundation model demonstrating superior performance in automatic speech recognition, translation, and language identification. However, its applicability in speaker verification (SV) tasks remains unexplored, particularly in low-data-resource scenarios where labeled speaker data in specific domains are… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.