Showing 1–50 of 4,167 results for author: Zhang, H

Searching in archive cs.
  1. arXiv:2408.17072  [pdf, other]

    cs.CL

    MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models

    Authors: Yujing Wang, Hainan Zhang, Liang Pang, Hongwei Zheng, Zhiming Zheng

    Abstract: In a real-world RAG system, the current query often involves spoken ellipses and ambiguous references from dialogue contexts, necessitating query rewriting to better describe user's information needs. However, traditional context-based rewriting has minimal enhancement on downstream generation tasks due to the lengthy process from query rewriting to response generation. Some researchers try to uti…

    Submitted 30 August, 2024; originally announced August 2024.

  2. arXiv:2408.16774  [pdf]

    cs.IT eess.SP

    Optimal UCA Design for OAM Based Wireless Backhaul Transmission

    Authors: Haiyue Jing, Wenchi Cheng, Wei Zhang, Hailin Zhang

    Abstract: Orbital angular momentum (OAM), which is considered a novel way to achieve high capacity, has attracted much attention recently. OAM signals emitted by uniform circular array (UCA) are widely regarded to go through the Bessel-form channels. However, the channel gains corresponding to the Bessel-form channels are with low signal-to-noise-ratio (SNR) on OAM-modes and it is difficult to achie…

    Submitted 14 August, 2024; originally announced August 2024.

  3. arXiv:2408.16426  [pdf, other]

    cs.CV cs.AI

    COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation

    Authors: Jiefeng Li, Ye Yuan, Davis Rempe, Haotian Zhang, Pavlo Molchanov, Cewu Lu, Jan Kautz, Umar Iqbal

    Abstract: Estimating global human motion from moving cameras is challenging due to the entanglement of human and camera motions. To mitigate the ambiguity, existing methods leverage learned human motion priors, which however often result in oversmoothed motions with misaligned 2D projections. To tackle this problem, we propose COIN, a control-inpainting motion diffusion prior that enables fine-grained contr…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  4. arXiv:2408.16337  [pdf, other]

    cs.LG cond-mat.mtrl-sci

    Do Graph Neural Networks Work for High Entropy Alloys?

    Authors: Hengrui Zhang, Ruishu Huang, Jie Chen, James M. Rondinelli, Wei Chen

    Abstract: Graph neural networks (GNNs) have excelled in predictive modeling for both crystals and molecules, owing to the expressiveness of graph representations. High-entropy alloys (HEAs), however, lack chemical long-range order, limiting the applicability of current graph representations. To overcome this challenge, we propose a representation of HEAs as a collection of local environment (LE) graphs. Bas…

    Submitted 29 August, 2024; originally announced August 2024.

  5. arXiv:2408.15994  [pdf, other]

    cs.CV

    Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration

    Authors: Xu Zhang, Jiaqi Ma, Guoli Wang, Qian Zhang, Huan Zhang, Lefei Zhang

    Abstract: The limitations of task-specific and general image restoration methods for specific degradation have prompted the development of all-in-one image restoration techniques. However, the diversity of patterns among multiple degradation, along with the significant uncertainties in mapping between degraded images of different severities and their corresponding undistorted versions, pose significant chal…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 13 pages, 8 figures

  6. arXiv:2408.15792  [pdf, other]

    cs.LG

    Efficient LLM Scheduling by Learning to Rank

    Authors: Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang

    Abstract: In Large Language Model (LLM) inference, the output length of an LLM request is typically regarded as not known a priori. Consequently, most LLM serving systems employ a simple First-come-first-serve (FCFS) scheduling strategy, leading to Head-Of-Line (HOL) blocking and reduced throughput and service quality. In this paper, we reexamine this assumption -- we show that, although predicting the exac…

    Submitted 28 August, 2024; originally announced August 2024.

  7. arXiv:2408.15741  [pdf, other]

    cs.CV

    Segmentation-guided Layer-wise Image Vectorization with Gradient Fills

    Authors: Hengyu Zhou, Hui Zhang, Bin Wang

    Abstract: The widespread use of vector graphics creates a significant demand for vectorization methods. While recent learning-based techniques have shown their capability to create vector images of clear topology, filling these primitives with gradients remains a challenge. In this paper, we propose a segmentation-guided vectorization framework to convert raster images into concise vector graphics with radi…

    Submitted 28 August, 2024; originally announced August 2024.

  8. arXiv:2408.15496  [pdf, other]

    cs.CL

    ReMamba: Equip Mamba with Effective Long-Sequence Modeling

    Authors: Danlong Yuan, Jiahao Liu, Bei Li, Huishuai Zhang, Jingang Wang, Xunliang Cai, Dongyan Zhao

    Abstract: While the Mamba architecture demonstrates superior inference efficiency and competitive performance on short-context natural language processing (NLP) tasks, empirical evidence suggests its capacity to comprehend long contexts is limited compared to transformer-based models. In this study, we investigate the long-context efficiency issues of the Mamba models and propose ReMamba, which enhances Mam…

    Submitted 29 August, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

  9. arXiv:2408.15461  [pdf, other]

    cs.CV cs.MM

    Hand1000: Generating Realistic Hands from Text with Only 1,000 Images

    Authors: Haozhuo Zhang, Bin Zhu, Yu Cao, Yanbin Hao

    Abstract: Text-to-image generation models have achieved remarkable advancements in recent years, aiming to produce realistic images from textual descriptions. However, these models often struggle with generating anatomically accurate representations of human hands. The resulting images frequently exhibit issues such as incorrect numbers of fingers, unnatural twisting or interlacing of fingers, or blurred an…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Project page https://haozhuo-zhang.github.io/Hand1000-project-page/

  10. arXiv:2408.15273  [pdf]

    eess.SP cs.IT

    Concentric UCAs Based Low-Order OAM for High Capacity in Radio Vortex Wireless Communications

    Authors: Haiyue Jing, Wenchi Cheng, Zan Li, Hailin Zhang

    Abstract: Due to the potential capacity-boosting for wireless communications, the Radio vOrtex Wireless COMMunication (RowComm) over orthogonal states/modes of Orbital Angular Momentum (OAM) has been paid much attention in recent years. Uniform circular array (UCA), as an efficient and convenient antenna structure, can transmit/receive multiple OAM beams with different OAM-modes simultaneously when the tran…

    Submitted 12 August, 2024; originally announced August 2024.

  11. arXiv:2408.15252  [pdf, other]

    eess.SP cs.AI

    Generative AI on SpectrumNet: An Open Benchmark of Multiband 3D Radio Maps

    Authors: Shuhang Zhang, Shuai Jiang, Wanjie Lin, Zheng Fang, Kangjun Liu, Hongliang Zhang, Ke Chen

    Abstract: Radio map is an efficient demonstration for visually displaying the wireless signal coverage within a certain region. It has been considered to be increasingly helpful for the future sixth generation (6G) of wireless networks, as wireless nodes are becoming more crowded and complicated. However, the construction of high resolution radio map is very challenging due to the sparse sampling in practic…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 30 pages, 15 figures

  12. arXiv:2408.15221  [pdf, other]

    cs.LG cs.CL cs.CR cs.CY

    LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet

    Authors: Nathaniel Li, Ziwen Han, Ian Steneker, Willow Primack, Riley Goodside, Hugh Zhang, Zifan Wang, Cristina Menghini, Summer Yue

    Abstract: Recent large language model (LLM) defenses have greatly improved models' ability to refuse harmful queries, even when adversarially attacked. However, LLM defenses are primarily evaluated against automated adversarial attacks in a single turn of conversation, an insufficient threat model for real-world malicious use. We demonstrate that multi-turn human jailbreaks uncover significant vulnerabiliti…

    Submitted 27 August, 2024; originally announced August 2024.

  13. arXiv:2408.15037  [pdf, other]

    cs.CL cs.AI

    Evidence-Enhanced Triplet Generation Framework for Hallucination Alleviation in Generative Question Answering

    Authors: Haowei Du, Huishuai Zhang, Dongyan Zhao

    Abstract: To address the hallucination in generative question answering (GQA) where the answer can not be derived from the document, we propose a novel evidence-enhanced triplet generation framework, EATQA, encouraging the model to predict all the combinations of (Question, Evidence, Answer) triplet by flipping the source pair and the target label to understand their logical relationships, i.e., predict Ans…

    Submitted 27 August, 2024; originally announced August 2024.

  14. arXiv:2408.14977  [pdf, other]

    eess.IV cs.CV

    LN-Gen: Rectal Lymph Nodes Generation via Anatomical Features

    Authors: Weidong Guo, Hantao Zhang, Shouhong Wan, Bingbing Zou, Wanqin Wang, Peiquan Jin

    Abstract: Accurate segmentation of rectal lymph nodes is crucial for the staging and treatment planning of rectal cancer. However, the complexity of the surrounding anatomical structures and the scarcity of annotated data pose significant challenges. This study introduces a novel lymph node synthesis technique aimed at generating diverse and realistic synthetic rectal lymph node samples to mitigate the reli…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 8 pages

  15. arXiv:2408.14968  [pdf, other]

    cs.IR cs.CL

    MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce

    Authors: Hao Jiang, Haoxiang Zhang, Qingshan Hou, Chaofeng Chen, Weisi Lin, Jingchang Zhang, Annan Wang

    Abstract: Providing high-quality item recall for text queries is crucial in large-scale e-commerce search systems. Current Embedding-based Retrieval Systems (ERS) embed queries and items into a shared low-dimensional space, but uni-modality ERS rely too heavily on textual features, making them unreliable in complex contexts. While multi-modality ERS incorporate various data sources, they often overlook indi…

    Submitted 27 August, 2024; originally announced August 2024.

  16. arXiv:2408.14381  [pdf, other]

    cs.LG cs.CV cs.DS

    Learning Tree-Structured Composition of Data Augmentation

    Authors: Dongyue Li, Kailai Chen, Predrag Radivojac, Hongyang R. Zhang

    Abstract: Data augmentation is widely used for training a neural network given little labeled data. A common practice of augmentation training is applying a composition of multiple transformations sequentially to the data. Existing augmentation methods such as RandAugment randomly sample from a list of pre-selected transformations, while methods such as AutoAugment apply advanced search to optimize over an…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 25 pages

  17. arXiv:2408.14340  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Foundation Models for Music: A Survey

    Authors: Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elio Quinton, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, et al. (18 additional authors not shown)

    Abstract: In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multimodal learning. We first contextualise the signifi…

    Submitted 27 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  18. arXiv:2408.14211  [pdf, other]

    cs.CV cs.AI

    MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

    Authors: Xu He, Xiaoyu Li, Di Kang, Jiangnan Ye, Chaopeng Zhang, Liyang Chen, Xiangjun Gao, Han Zhang, Zhiyong Wu, Haolin Zhuang

    Abstract: Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data or 3D inconsistencies for a lack of comprehensive multi-view knowledge. In this paper, we introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image. As its core, we leverage a pre-trained 2D d…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Project Page: https://thuhcsi.github.io/MagicMan

  19. arXiv:2408.14158  [pdf, other]

    cs.DC cs.AI

    Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

    Authors: Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, et al. (27 additional authors not shown)

    Abstract: The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has exponentially increased demands of computational power and bandwidth. This, combined with the high costs of faster computing chips and interconnects, has significantly inflated High Performance Computing (HPC) construction costs. To address these challenges, we introduce the Fire-Flyer AI-HPC architecture, a synergistic…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: This is the preprint version of the paper accepted for presentation at the 2024 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24). © 2024 IEEE. Personal use of this material is permitted. For other uses, permission from IEEE must be obtained. Please refer to IEEE Xplore for the final published version

  20. arXiv:2408.13976  [pdf, other]

    cs.SE

    Sifting through the Chaff: On Utilizing Execution Feedback for Ranking the Generated Code Candidates

    Authors: Zhihong Sun, Yao Wan, Jia Li, Hongyu Zhang, Zhi Jin, Ge Li, Chen Lyu

    Abstract: Large Language Models (LLMs), such as GPT-4, StarCoder, and CodeLlama, are transforming the way developers approach programming by automatically generating code based on given natural language descriptions. Despite advancements, generating syntactically and semantically correct code remains challenging, especially for complex programming tasks. Typically, individuals generate multiple candidate so…

    Submitted 27 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: Accepted by the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024)

  21. arXiv:2408.13922  [pdf, other]

    cs.CV

    COMPOSE: Comprehensive Portrait Shadow Editing

    Authors: Andrew Hou, Zhixin Shu, Xuaner Zhang, He Zhang, Yannick Hold-Geoffroy, Jae Shin Yoon, Xiaoming Liu

    Abstract: Existing portrait relighting methods struggle with precise control over facial shadows, particularly when faced with challenges such as handling hard shadows from directional light sources or adjusting shadows while remaining in harmony with existing lighting conditions. In many situations, completely altering input lighting is undesirable for portrait retouching applications: one may want to pres…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: Accepted at ECCV 2024

  22. arXiv:2408.13898  [pdf, other]

    cs.CV

    Evaluating Attribute Comprehension in Large Vision-Language Models

    Authors: Haiwen Zhang, Zixi Yang, Yuanzhi Liu, Xinran Wang, Zheqi He, Kongming Liang, Zhanyu Ma

    Abstract: Currently, large vision-language models have gained promising progress on many downstream tasks. However, they still suffer many challenges in fine-grained visual understanding tasks, such as object attribute comprehension. Besides, there have been growing efforts on the evaluations of large vision-language models, but lack of in-depth study of attribute comprehension and the visual language fine-…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 15 pages, 4 figures

  23. arXiv:2408.13877  [pdf, other]

    cs.CV

    Camouflaged Object Tracking: A Benchmark

    Authors: Xiaoyu Guo, Pengzhi Zhong, Hao Zhang, Ling Huang, Defeng Huang, Shuiwang Li

    Abstract: Visual tracking has seen remarkable advancements, largely driven by the availability of large-scale training datasets that have enabled the development of highly accurate and robust algorithms. While significant progress has been made in tracking general objects, research on more challenging scenarios, such as tracking camouflaged objects, remains limited. Camouflaged objects, which blend seamless…

    Submitted 25 August, 2024; originally announced August 2024.

  24. arXiv:2408.13729  [pdf, other]

    cs.SE

    Root Cause Analysis for Microservice System based on Causal Inference: How Far Are We?

    Authors: Luan Pham, Huong Ha, Hongyu Zhang

    Abstract: Microservice architecture has become a popular architecture adopted by many cloud applications. However, identifying the root cause of a failure in microservice systems is still a challenging and time-consuming task. In recent years, researchers have introduced various causal inference-based root cause analysis methods to assist engineers in identifying the root causes. To gain a better understand…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted to ASE'24

  25. arXiv:2408.13728  [pdf, other]

    cs.CV

    3D-RCNet: Learning from Transformer to Build a 3D Relational ConvNet for Hyperspectral Image Classification

    Authors: Haizhao Jing, Liuwei Wan, Xizhe Xue, Haokui Zhang, Ying Li

    Abstract: Recently, the Vision Transformer (ViT) model has replaced the classical Convolutional Neural Network (ConvNet) in various computer vision tasks due to its superior performance. Even in hyperspectral image (HSI) classification field, ViT-based methods also show promising potential. Nevertheless, ViT encounters notable difficulties in processing HSI data. Its self-attention mechanism, which exhibits…

    Submitted 25 August, 2024; originally announced August 2024.

  26. arXiv:2408.13367  [pdf]

    cs.CR cs.ET

    Generative Blockchain: Transforming Blockchain from Transaction Recording to Transaction Generation through Proof-of-Merit

    Authors: Haozhao Zhang, Zhe Zhang, Zhiqiang Zheng, Varghese Jacob

    Abstract: This paper proposes a new paradigm: generative blockchain, which aims to transform conventional blockchain technology by combining transaction generation and recording, rather than focusing solely on transaction recording. Central to our design is a novel consensus mechanism, Proof-of-Merit (PoM), specifically crafted for environments where businesses must solve complex problems before transaction…

    Submitted 23 August, 2024; originally announced August 2024.

  27. arXiv:2408.13282  [pdf]

    cs.CL cs.LG

    Question answering system of bridge design specification based on large language model

    Authors: Leye Zhang, Xiangxiang Tian, Hongjun Zhang

    Abstract: This paper constructs a question answering system for bridge design specifications based on a large language model. Three implementation schemes are tried: full fine-tuning of the Bert pretrained model, parameter-efficient fine-tuning of the Bert pretrained model, and a self-built language model from scratch. Through the self-built question and answer task dataset, based on the tensorflow and keras deep…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 10 pages, 7 figures

  28. arXiv:2408.13257  [pdf, other]

    cs.CV

    MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

    Authors: Yi-Fan Zhang, Huanyu Zhang, Haochen Tian, Chaoyou Fu, Shuangqing Zhang, Junfei Wu, Feng Li, Kun Wang, Qingsong Wen, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

    Abstract: Comprehensive evaluation of Multimodal Large Language Models (MLLMs) has recently garnered widespread attention in the research community. However, we observe that existing benchmarks present several common barriers that make it difficult to measure the significant challenges that models face in the real world, including: 1) small data scale leads to a large performance variance; 2) reliance on mo…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Project Page: https://mme-realworld.github.io/

  29. arXiv:2408.13184  [pdf, other]

    cs.CL

    Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning

    Authors: Hourui Deng, Hongjie Zhang, Jie Ou, Chaosheng Feng

    Abstract: Spatial reasoning in Large Language Models (LLMs) is the foundation for embodied intelligence. However, even in simple maze environments, LLMs still encounter challenges in long-term path-planning, primarily influenced by their spatial hallucination and context inconsistency hallucination by long-term reasoning. To address this challenge, this study proposes an innovative model, Spatial-to-Relatio…

    Submitted 26 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: Submitted to ICASSP

  30. arXiv:2408.12793  [pdf, other]

    cs.CV

    La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection

    Authors: Hang Zou, Chenxi Du, Hui Zhang, Yuan Zhang, Ajian Liu, Jun Wan, Zhen Lei

    Abstract: Facial recognition systems are susceptible to both physical and digital attacks, posing significant security risks. Traditional approaches often treat these two attack types separately due to their distinct characteristics. Thus, when faced with combined attacks, almost all methods fail. Some studies attempt to combine the sparse data from both types of attacks into a single dataset and try…

    Submitted 22 August, 2024; originally announced August 2024.

  31. arXiv:2408.12609  [pdf, ps, other]

    cs.RO cs.AI

    Enhanced Prediction of Multi-Agent Trajectories via Control Inference and State-Space Dynamics

    Authors: Yu Zhang, Yongxiang Zou, Haoyu Zhang, Zeyu Liu, Houcheng Li, Long Cheng

    Abstract: In the field of autonomous systems, accurately predicting the trajectories of nearby vehicles and pedestrians is crucial for ensuring both safety and operational efficiency. This paper introduces a novel methodology for trajectory forecasting based on state-space dynamic system modeling, which endows agents with models that have tangible physical implications. To enhance the precision of state est…

    Submitted 8 August, 2024; originally announced August 2024.

  32. arXiv:2408.12246  [pdf, other]

    cs.CV

    OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion

    Authors: Guoting Wei, Xia Yuan, Yu Liu, Zhenhao Shang, Kelu Yao, Chao Li, Qingsen Yan, Chunxia Zhao, Haokui Zhang, Rong Xiao

    Abstract: Aerial object detection has been a hot topic for many years due to its wide application requirements. However, most existing approaches can only handle predefined categories, which limits their applicability for the open scenarios in real-world. In this paper, we extend aerial object detection to open scenarios by exploiting the relationship between image and text, and propose OVA-DETR, a high-eff…

    Submitted 22 August, 2024; originally announced August 2024.

  33. arXiv:2408.12086  [pdf, other]

    cs.CV cs.AI

    Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy

    Authors: Hong Zhang, Yixuan Lyu, Qian Yu, Hanyang Liu, Huimin Ma, Ding Yuan, Yifan Yang

    Abstract: In the domain of Camouflaged Object Segmentation (COS), despite continuous improvements in segmentation performance, the underlying mechanisms of effective camouflage remain poorly understood, akin to a black box. To address this gap, we present the first comprehensive study to examine the impact of camouflage attributes on the effectiveness of camouflage patterns, offering a quantitative framewor…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024

  34. arXiv:2408.12077  [pdf, other]

    eess.SP cs.CV cs.LG

    Through-the-Wall Radar Human Activity Micro-Doppler Signature Representation Method Based on Joint Boulic-Sinusoidal Pendulum Model

    Authors: Xiaopeng Yang, Weicheng Gao, Xiaodong Qu, Zeyu Ma, Hao Zhang

    Abstract: With the help of micro-Doppler signature, ultra-wideband (UWB) through-the-wall radar (TWR) enables the reconstruction of range and velocity information of limb nodes to accurately identify indoor human activities. However, existing methods are usually trained and validated directly using range-time maps (RTM) and Doppler-time maps (DTM), which have high feature redundancy and poor generalization…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 17 pages, 14 figures, 7 tables, in IEEE Transactions on Microwave Theory and Techniques, 2024

    MSC Class: 94; ACM Class: I.5.1

  35. arXiv:2408.11987  [pdf, other]

    cs.AI

    SimBench: A Rule-Based Multi-Turn Interaction Benchmark for Evaluating an LLM's Ability to Generate Digital Twins

    Authors: Jingquan Wang, Harry Zhang, Huzaifa Mustafa Unjhawala, Peter Negrut, Shu Wang, Khailanii Slaton, Radu Serban, Jin-Long Wu, Dan Negrut

    Abstract: We introduce SimBench, a benchmark designed to evaluate the proficiency of student large language models (S-LLMs) in generating digital twins (DTs) that can be used in simulators for virtual testing. Given a collection of S-LLMs, this benchmark enables the ranking of the S-LLMs based on their ability to produce high-quality DTs. We demonstrate this by comparing over 20 open- and closed-source S-LL…

    Submitted 21 August, 2024; originally announced August 2024.

  36. arXiv:2408.11834  [pdf, other]

    cs.CV cs.AI

    SCREENER: A general framework for task-specific experiment design in quantitative MRI

    Authors: Tianshu Zheng, Zican Wang, Timothy Bray, Daniel C. Alexander, Dan Wu, Hui Zhang

    Abstract: Quantitative magnetic resonance imaging (qMRI) is increasingly investigated for use in a variety of clinical tasks from diagnosis, through staging, to treatment monitoring. However, experiment design in qMRI, the identification of the optimal acquisition protocols, has been focused on obtaining the most precise parameter estimations, with no regard for the specific requirements of downstream tasks…

    Submitted 6 August, 2024; originally announced August 2024.

  37. arXiv:2408.11338  [pdf, other]

    cs.AI cs.LG

    Automatic Dataset Construction (ADC): Sample Collection, Data Curation, and Beyond

    Authors: Minghao Liu, Zonglin Di, Jiaheng Wei, Zhongruo Wang, Hengxiang Zhang, Ruixuan Xiao, Haoyu Wang, Jinlong Pang, Hao Chen, Ankit Shah, Hongxin Wei, Xinlei He, Zhaowei Zhao, Haobo Wang, Lei Feng, Jindong Wang, James Davis, Yang Liu

    Abstract: Large-scale data collection is essential for developing personalized training data, mitigating the shortage of training data, and fine-tuning specialized models. However, creating high-quality datasets quickly and accurately remains a challenge due to annotation errors, the substantial time and costs associated with human labor. To address these issues, we propose Automatic Dataset Construction (A…

    Submitted 21 August, 2024; originally announced August 2024.

  38. arXiv:2408.10694  [pdf, other]

    cs.CV

    MsMemoryGAN: A Multi-scale Memory GAN for Palm-vein Adversarial Purification

    Authors: Huafeng Qin, Yuming Fu, Huiyan Zhang, Mounim A. El-Yacoubi, Xinbo Gao, Qun Song, Jun Wang

    Abstract: Deep neural networks have recently achieved promising performance in the vein recognition task and have shown an increasing application trend, however, they are prone to adversarial perturbation attacks by adding imperceptible perturbations to the input, resulting in making incorrect recognition. To address this issue, we propose a novel defense model named MsMemoryGAN, which aims to filter the pe…

    Submitted 20 August, 2024; originally announced August 2024.

  39. arXiv:2408.10463  [pdf, other]

    cs.SD cs.LG eess.AS

    Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting

    Authors: Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang

    Abstract: The keyword spotting (KWS) problem requires large amounts of real speech training data to achieve high accuracy across diverse populations. Utilizing large amounts of text-to-speech (TTS) synthesized data can reduce the cost and time associated with KWS development. However, TTS data may contain artifacts not present in real speech, which the KWS model can exploit (overfit), leading to degraded ac…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: to be published in a Workshop at Interspeech 2024, Synthetic Data's Transformative Role in Foundational Speech Models

  40. arXiv:2408.10247  [pdf, other]

    q-bio.BM cs.AI

    MetaEnzyme: Meta Pan-Enzyme Learning for Task-Adaptive Redesign

    Authors: Jiangbin Zheng, Han Zhang, Qianqing Xu, An-Ping Zeng, Stan Z. Li

    Abstract: Enzyme design plays a crucial role in both industrial production and biology. However, this field faces challenges due to the lack of comprehensive benchmarks and the complexity of enzyme design tasks, leading to a dearth of systematic research. Consequently, computational enzyme design is relatively overlooked within the broader protein domain and remains in its early stages. In this work, we add…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted to ACM Multimedia 2024

  41. arXiv:2408.10135  [pdf, other]

    cs.CV

    $R^2$-Mesh: Reinforcement Learning Powered Mesh Reconstruction via Geometry and Appearance Refinement

    Authors: Haoyang Wang, Liming Liu, Quanlu Jia, Jiangkai Wu, Haodan Zhang, Peiheng Wang, Xinggong Zhang

    Abstract: Mesh reconstruction based on Neural Radiance Fields (NeRF) is popular in a variety of applications such as computer graphics, virtual reality, and medical imaging due to its efficiency in handling complex geometric structures and facilitating real-time rendering. However, existing works often fail to capture fine geometric details accurately and struggle with optimizing rendering quality. To addre…

    Submitted 19 August, 2024; originally announced August 2024.

  42. arXiv:2408.10069  [pdf, other]

    cs.CV

    LNQ 2023 challenge: Benchmark of weakly-supervised techniques for mediastinal lymph node quantification

    Authors: Reuben Dorent, Roya Khajavi, Tagwa Idris, Erik Ziegler, Bhanusupriya Somarouthu, Heather Jacene, Ann LaCasce, Jonathan Deissler, Jan Ehrhardt, Sofija Engelson, Stefan M. Fischer, Yun Gu, Heinz Handels, Satoshi Kasai, Satoshi Kondo, Klaus Maier-Hein, Julia A. Schnabel, Guotai Wang, Litingyu Wang, Tassilo Wald, Guang-Zhong Yang, Hanxiao Zhang, Minghui Zhang, Steve Pieper, Gordon Harris, et al. (2 additional authors not shown)

    Abstract: Accurate assessment of lymph node size in 3D CT scans is crucial for cancer staging, therapeutic management, and monitoring treatment response. Existing state-of-the-art segmentation frameworks in medical imaging often rely on fully annotated datasets. However, for lymph node segmentation, these datasets are typically small due to the extensive time and expertise required to annotate the numerous…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Submitted to MELBA

  43. arXiv:2408.09929  [pdf, other]

    cs.LG cs.CV

    Data Augmentation of Contrastive Learning is Estimating Positive-incentive Noise

    Authors: Hongyuan Zhang, Yanchen Xu, Sida Huang, Xuelong Li

    Abstract: Inspired by the idea of Positive-incentive Noise (Pi-Noise or $π$-Noise) that aims at learning the reliable noise beneficial to tasks, we scientifically investigate the connection between contrastive learning and $π$-noise in this paper. By converting the contrastive loss to an auxiliary Gaussian distribution to quantitatively measure the difficulty of the specific contrastive model under the info…

    Submitted 19 August, 2024; originally announced August 2024.

  44. arXiv:2408.09886  [pdf, other]

    cs.CV

    SAM-UNet:Enhancing Zero-Shot Segmentation of SAM for Universal Medical Images

    Authors: Sihan Yang, Haixia Bi, Hai Zhang, Jian Sun

    Abstract: Segment Anything Model (SAM) has demonstrated impressive performance on a wide range of natural image segmentation tasks. However, its performance significantly deteriorates when directly applied to medical domain, due to the remarkable differences between natural images and medical images. Some researchers have attempted to train SAM on large scale medical datasets. However, poor zero-shot perfor…

    Submitted 19 August, 2024; originally announced August 2024.

  45. arXiv:2408.09752  [pdf, other]

    cs.CV

    A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method

    Authors: Hang Zou, Chenxi Du, Ajian Liu, Yuan Zhang, Jing Liu, Mingchuan Yang, Jun Wan, Hui Zhang

    Abstract: Iris recognition is widely used in high-security scenarios due to its stability and distinctiveness. However, the acquisition of iris images typically requires near-infrared illumination and near-infrared band filters, leading to significant and consistent differences in imaging across devices. This underscores the importance of developing cross-domain capabilities in iris anti-spoofing methods. D…

    Submitted 19 August, 2024; originally announced August 2024.

  46. arXiv:2408.09698  [pdf, other]

    cs.IR cs.AI

    Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation

    Authors: Yuyang Ye, Zhi Zheng, Yishan Shen, Tianshu Wang, Hengruo Zhang, Peijun Zhu, Runlong Yu, Kai Zhang, Hui Xiong

    Abstract: Recent advances in Large Language Models (LLMs) have demonstrated significant potential in the field of Recommendation Systems (RSs). Most existing studies have focused on converting user behavior logs into textual prompts and leveraging techniques such as prompt tuning to enable LLMs for recommendation tasks. Meanwhile, research interest has recently grown in multimodal recommendation systems tha…

    Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  47. arXiv:2408.09458  [pdf, other]

    cs.CV

    G2Face: High-Fidelity Reversible Face Anonymization via Generative and Geometric Priors

    Authors: Haoxin Yang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Jing Qin, Yi Wang, Pheng-Ann Heng, Shengfeng He

    Abstract: Reversible face anonymization, unlike traditional face pixelization, seeks to replace sensitive identity information in facial images with synthesized alternatives, preserving privacy without sacrificing image clarity. Traditional methods, such as encoder-decoder networks, often result in significant loss of facial details due to their limited learning capacity. Additionally, relying on latent man…

    Submitted 18 August, 2024; originally announced August 2024.

  48. VrdONE: One-stage Video Visual Relation Detection

    Authors: Xinjie Jiang, Chenxi Zheng, Xuemiao Xu, Bangzhen Liu, Weiying Zheng, Huaidong Zhang, Shengfeng He

    Abstract: Video Visual Relation Detection (VidVRD) focuses on understanding how entities interact over time and space in videos, a key step for gaining deeper insights into video scenes beyond basic visual tasks. Traditional methods for VidVRD, challenged by its complexity, typically split the task into two parts: one for identifying what relation categories are present and another for determining their tem…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 12 pages, 8 figures, accepted by ACM Multimedia 2024

  49. arXiv:2408.09262  [pdf, other]

    cs.LG cs.AI cs.LO

    PREMAP: A Unifying PREiMage APproximation Framework for Neural Networks

    Authors: Xiyue Zhang, Benjie Wang, Marta Kwiatkowska, Huan Zhang

    Abstract: Most methods for neural network verification focus on bounding the image, i.e., set of outputs for a given input set. This can be used to, for example, check the robustness of neural network predictions to bounded perturbations of an input. However, verifying properties concerning the preimage, i.e., the set of inputs satisfying an output property, requires abstractions in the input space. We pres…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.03686

  50. arXiv:2408.09199  [pdf, other]

    cs.IR

    TC-RAG:Turing-Complete RAG's Case study on Medical LLM Systems

    Authors: Xinke Jiang, Yue Fang, Rihong Qiu, Haoyu Zhang, Yongxin Xu, Hao Chen, Wentao Zhang, Ruizhe Zhang, Yuchen Fang, Xu Chu, Junfeng Zhao, Yasha Wang

    Abstract: In the pursuit of enhancing domain-specific Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) emerges as a promising solution to mitigate issues such as hallucinations, outdated knowledge, and limited expertise in highly specialized queries. However, existing approaches to RAG fall short by neglecting system state variables, which are crucial for ensuring adaptive control, retriev…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: version 1.0