Showing 1–50 of 4,848 results for author: zhang, X

Searching in archive cs.
  1. arXiv:2408.17053  [pdf, other]

    cs.LG

    Estimating Conditional Average Treatment Effects via Sufficient Representation Learning

    Authors: Pengfei Shi, Wei Zhong, Xinyu Zhang, Ningtao Wang, Xing Fu, Weiqiang Wang, Yin Jin

    Abstract: Estimating the conditional average treatment effects (CATE) is very important in causal inference and has a wide range of applications across many fields. In the estimation process of CATE, the unconfoundedness assumption is typically required to ensure the identifiability of the regression problems. When estimating CATE using high-dimensional data, there have been many variable selection methods…

    Submitted 30 August, 2024; originally announced August 2024.

  2. arXiv:2408.17027  [pdf, other]

    cs.CV

    ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images

    Authors: Xiaoshuai Zhang, Zhicheng Wang, Howard Zhou, Soham Ghosh, Danushen Gnanapragasam, Varun Jampani, Hao Su, Leonidas Guibas

    Abstract: To advance the state of the art in the creation of 3D foundation models, this paper introduces the ConDense framework for 3D pre-training utilizing existing pre-trained 2D networks and large-scale multi-view datasets. We propose a novel 2D-3D joint training scheme to extract co-embedded 2D and 3D features in an end-to-end pipeline, where 2D-3D feature consistency is enforced through a volume rende…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  3. arXiv:2408.17009  [pdf, other]

    cs.SD eess.AS

    Utilizing Speaker Profiles for Impersonation Audio Detection

    Authors: Hao Gu, JiangYan Yi, Chenglong Wang, Yong Ren, Jianhua Tao, Xinrui Yan, Yujie Chen, Xiaohui Zhang

    Abstract: Fake audio detection is an emerging active topic. A growing number of literatures have aimed to detect fake utterance, which are mostly generated by Text-to-speech (TTS) or voice conversion (VC). However, countermeasures against impersonation remain an underexplored area. Impersonation is a fake type that involves an imitator replicating specific traits and speech style of a target speaker. Unlike…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM2024

  4. arXiv:2408.16975  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Technical Report of HelixFold3 for Biomolecular Structure Prediction

    Authors: Lihang Liu, Shanzhuo Zhang, Yang Xue, Xianbin Ye, Kunrui Zhu, Yuxin Li, Yang Liu, Xiaonan Zhang, Xiaomin Fang

    Abstract: The AlphaFold series has transformed protein structure prediction with remarkable accuracy, often matching experimental methods. AlphaFold2, AlphaFold-Multimer, and the latest AlphaFold3 represent significant strides in predicting single protein chains, protein complexes, and biomolecular structures. While AlphaFold2 and AlphaFold-Multimer are open-sourced, facilitating rapid and reliable predicti…

    Submitted 29 August, 2024; originally announced August 2024.

  5. arXiv:2408.16958  [pdf, other]

    cs.LG cs.AI

    Discovery of False Data Injection Schemes on Frequency Controllers with Reinforcement Learning

    Authors: Romesh Prasad, Malik Hassanaly, Xiangyu Zhang, Abhijeet Sahu

    Abstract: While inverter-based distributed energy resources (DERs) play a crucial role in integrating renewable energy into the power system, they concurrently diminish the grid's system inertia, elevating the risk of frequency instabilities. Furthermore, smart inverters, interfaced via communication networks, pose a potential vulnerability to cyber threats if not diligently managed. To proactively fortify…

    Submitted 29 August, 2024; originally announced August 2024.

  6. arXiv:2408.16500  [pdf, other]

    cs.CV

    CogVLM2: Visual Language Models for Image and Video Understanding

    Authors: Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, Yan Wang, Yean Cheng, Shiyu Huang, Junhui Ji, Zhao Xue, Lei Zhao, Zhuoyi Yang, Xiaotao Gu, Xiaohan Zhang, Guanyu Feng, Da Yin, Zihan Wang, Ji Qi, Xixuan Song, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Yuxiao Dong, Jie Tang

    Abstract: Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new generation of visual language models for image and video understanding including CogVLM2, CogVLM2-Video and GLM-4V. As an image understanding model, CogVLM2…

    Submitted 29 August, 2024; originally announced August 2024.

  7. arXiv:2408.16486  [pdf, other]

    cs.CV

    Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning

    Authors: Zhengqing Gao, Xiang Ao, Xu-Yao Zhang, Cheng-Lin Liu

    Abstract: Adapting pre-trained models to open classes is a challenging problem in machine learning. Vision-language models fully explore the knowledge of text modality, demonstrating strong zero-shot recognition performance, which is naturally suited for various open-set problems. More recently, some research focuses on fine-tuning such models to downstream tasks. Prompt tuning methods achieved huge improve…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: PRCV 2024

  8. arXiv:2408.16315  [pdf, other]

    cs.HC cs.LG eess.SP

    Passenger hazard perception based on EEG signals for highly automated driving vehicles

    Authors: Ashton Yu Xuan Tan, Yingkai Yang, Xiaofei Zhang, Bowen Li, Xiaorong Gao, Sifa Zheng, Jianqiang Wang, Xinyu Gu, Jun Li, Yang Zhao, Yuxin Zhang, Tania Stathaki

    Abstract: Enhancing the safety of autonomous vehicles is crucial, especially given recent accidents involving automated systems. As passengers in these vehicles, humans' sensory perception and decision-making can be integrated with autonomous systems to improve safety. This study explores neural mechanisms in passenger-vehicle interactions, leading to the development of a Passenger Cognitive Model (PCM) and…

    Submitted 29 August, 2024; originally announced August 2024.

  9. arXiv:2408.16277  [pdf]

    eess.IV cs.CV

    Fine-grained Classification of Port Wine Stains Using Optical Coherence Tomography Angiography

    Authors: Xiaofeng Deng, Defu Chen, Bowen Liu, Xiwan Zhang, Haixia Qiu, Wu Yuan, Hongliang Ren

    Abstract: Accurate classification of port wine stains (PWS, vascular malformations present at birth), is critical for subsequent treatment planning. However, the current method of classifying PWS based on the external skin appearance rarely reflects the underlying angiopathological heterogeneity of PWS lesions, resulting in inconsistent outcomes with the common vascular-targeted photodynamic therapy (V-PDT)…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  10. arXiv:2408.16265  [pdf, other]

    cs.CV

    Low Saturation Confidence Distribution-based Test-Time Adaptation for Cross-Domain Remote Sensing Image Classification

    Authors: Yu Liang, Xiucheng Zhang, Juepeng Zheng, Jianxi Huang, Haohuan Fu

    Abstract: Although the Unsupervised Domain Adaptation (UDA) method has improved the effect of remote sensing image classification tasks, most of them are still limited by access to the source domain (SD) data. Designs such as Source-free Domain Adaptation (SFDA) solve the challenge of a lack of SD data, however, they still rely on a large amount of target domain data and thus cannot achieve fast adaptations…

    Submitted 29 August, 2024; originally announced August 2024.

  11. arXiv:2408.16233  [pdf, other]

    cs.CV

    PSE-Net: Channel Pruning for Convolutional Neural Networks with Parallel-subnets Estimator

    Authors: Shiguang Wang, Tao Xie, Haijun Liu, Xingcheng Zhang, Jian Cheng

    Abstract: Channel Pruning is one of the most widespread techniques used to compress deep neural networks while maintaining their performances. Currently, a typical pruning algorithm leverages neural architecture search to directly find networks with a configurable width, the key step of which is to identify representative subnet for various pruning ratios by training a supernet. However, current methods mai…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 10 pages, Neural Networks

  12. arXiv:2408.15994  [pdf, other]

    cs.CV

    Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration

    Authors: Xu Zhang, Jiaqi Ma, Guoli Wang, Qian Zhang, Huan Zhang, Lefei Zhang

    Abstract: The limitations of task-specific and general image restoration methods for specific degradation have prompted the development of all-in-one image restoration techniques. However, the diversity of patterns among multiple degradation, along with the significant uncertainties in mapping between degraded images of different severities and their corresponding undistorted versions, pose significant chal…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 13 pages, 8 figures

  13. arXiv:2408.15844  [pdf, other]

    cs.CV cs.IT

    Shot Segmentation Based on Von Neumann Entropy for Key Frame Extraction

    Authors: Xueqing Zhang, Di Fu, Naihao Liu

    Abstract: Video key frame extraction is important in various fields, such as video summary, retrieval, and compression. Therefore, we suggest a video key frame extraction algorithm based on shot segmentation using Von Neumann entropy. The segmentation of shots is achieved through the computation of Von Neumann entropy of the similarity matrix among frames within the video sequence. The initial frame of each…

    Submitted 29 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

    Comments: 14 pages, 5 figures

  14. arXiv:2408.15310  [pdf, other]

    q-bio.MN cs.CE cs.LG

    RGDA-DDI: Residual graph attention network and dual-attention based framework for drug-drug interaction prediction

    Authors: Changjian Zhou, Xin Zhang, Jiafeng Li, Jia Song, Wensheng Xiang

    Abstract: Recent studies suggest that drug-drug interaction (DDI) prediction via computational approaches has significant importance for understanding the functions and co-prescriptions of multiple drugs. However, the existing silico DDI prediction methods either ignore the potential interactions among drug-drug pairs (DDPs), or fail to explicitly model and fuse the multi-scale drug feature representations…

    Submitted 27 August, 2024; originally announced August 2024.

  15. arXiv:2408.15032  [pdf, other]

    cs.CV cs.AI

    Mamba2MIL: State Space Duality Based Multiple Instance Learning for Computational Pathology

    Authors: Yuqi Zhang, Xiaoqian Zhang, Jiakai Wang, Yuancheng Yang, Taiying Peng, Chao Tong

    Abstract: Computational pathology (CPath) has significantly advanced the clinical practice of pathology. Despite the progress made, Multiple Instance Learning (MIL), a promising paradigm within CPath, continues to face challenges, particularly related to incomplete information utilization. Existing frameworks, such as those based on Convolutional Neural Networks (CNNs), attention, and selective scan space s…

    Submitted 27 August, 2024; originally announced August 2024.

  16. arXiv:2408.15018  [pdf, other]

    cs.HC cs.AI

    Cross-subject Brain Functional Connectivity Analysis for Multi-task Cognitive State Evaluation

    Authors: Jun Chen, Anqi Chen, Bingkun Jiang, Mohammad S. Obaidat, Ni Li, Xinyu Zhang

    Abstract: Cognition refers to the function of information perception and processing, which is the fundamental psychological essence of human beings. It is responsible for reasoning and decision-making, while its evaluation is significant for the aviation domain in mitigating potential safety risks. Existing studies tend to use varied methods for cognitive state evaluation yet have limitations in timeliness,…

    Submitted 27 August, 2024; originally announced August 2024.

  17. arXiv:2408.14735  [pdf, other]

    cs.MM cs.CR cs.DC

    PPVF: An Efficient Privacy-Preserving Online Video Fetching Framework with Correlated Differential Privacy

    Authors: Xianzhi Zhang, Yipeng Zhou, Di Wu, Quan Z. Sheng, Miao Hu, Linchang Xiao

    Abstract: Online video streaming has evolved into an integral component of the contemporary Internet landscape. Yet, the disclosure of user requests presents formidable privacy challenges. As users stream their preferred online videos, their requests are automatically seized by video content providers, potentially leaking users' privacy. Unfortunately, current protection methods are not well-suited to pre…

    Submitted 26 August, 2024; originally announced August 2024.

  18. arXiv:2408.14608  [pdf, other]

    cs.LG stat.ML

    Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold

    Authors: Lazar Atanackovic, Xi Zhang, Brandon Amos, Mathieu Blanchette, Leo J. Lee, Yoshua Bengio, Alexander Tong, Kirill Neklyudov

    Abstract: Numerous biological and physical processes can be modeled as systems of interacting entities evolving continuously over time, e.g. the dynamics of communicating cells or physical particles. Learning the dynamics of such systems is essential for predicting the temporal evolution of populations across novel samples and unseen environments. Flow-based models allow for learning these dynamics at the p…

    Submitted 26 August, 2024; originally announced August 2024.

  19. arXiv:2408.14397  [pdf, other]

    cs.AI cs.CL cs.CV

    Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs

    Authors: Xiaoman Zhang, Julián N. Acosta, Hong-Yu Zhou, Pranav Rajpurkar

    Abstract: Recent advancements in artificial intelligence have significantly improved the automatic generation of radiology reports. However, existing evaluation methods fail to reveal the models' understanding of radiological images and their capacity to achieve human-level granularity in descriptions. To bridge this gap, we introduce a system, named ReXKG, which extracts structured information from process…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Code is available at: https://github.com/rajpurkarlab/ReXKG

  20. arXiv:2408.14342  [pdf, other]

    cs.CV physics.med-ph

    Dual-Domain CLIP-Assisted Residual Optimization Perception Model for Metal Artifact Reduction

    Authors: Xinrui Zhang, Ailong Cai, Shaoyu Wang, Linyuan Wang, Zhizhong Zheng, Lei Li, Bin Yan

    Abstract: Metal artifacts in computed tomography (CT) imaging pose significant challenges to accurate clinical diagnosis. The presence of high-density metallic implants results in artifacts that deteriorate image quality, manifesting in the forms of streaking, blurring, or beam hardening effects, etc. Nowadays, various deep learning-based approaches, particularly generative models, have been proposed for me…

    Submitted 29 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 14 pages, 18 figures

  21. arXiv:2408.13922  [pdf, other]

    cs.CV

    COMPOSE: Comprehensive Portrait Shadow Editing

    Authors: Andrew Hou, Zhixin Shu, Xuaner Zhang, He Zhang, Yannick Hold-Geoffroy, Jae Shin Yoon, Xiaoming Liu

    Abstract: Existing portrait relighting methods struggle with precise control over facial shadows, particularly when faced with challenges such as handling hard shadows from directional light sources or adjusting shadows while remaining in harmony with existing lighting conditions. In many situations, completely altering input lighting is undesirable for portrait retouching applications: one may want to pres…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: Accepted at ECCV 2024

  22. arXiv:2408.13771  [pdf, other]

    cs.CV

    ICFRNet: Image Complexity Prior Guided Feature Refinement for Real-time Semantic Segmentation

    Authors: Xin Zhang, Teodor Boyadzhiev, Jinglei Shi, Jufeng Yang

    Abstract: In this paper, we leverage image complexity as a prior for refining segmentation features to achieve accurate real-time semantic segmentation. The design philosophy is based on the observation that different pixel regions within an image exhibit varying levels of complexity, with higher complexities posing a greater challenge for accurate segmentation. We thus introduce image complexity as prior g…

    Submitted 25 August, 2024; originally announced August 2024.

  23. Enhancing Adaptive Deep Networks for Image Classification via Uncertainty-aware Decision Fusion

    Authors: Xu Zhang, Zhipeng Xie, Haiyang Yu, Qitong Wang, Peng Wang, Wei Wang

    Abstract: Handling varying computational resources is a critical issue in modern AI applications. Adaptive deep networks, featuring the dynamic employment of multiple classifier heads among different layers, have been proposed to address classification tasks under varying computing resources. Existing approaches typically utilize the last classifier supported by the available resources for inference, as the…

    Submitted 29 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: 13 pages, 27 figures. In ACM Multimedia 2024

  24. arXiv:2408.13681  [pdf, other]

    cs.CE cs.SI

    Smart Home Cyber Insurance Pricing

    Authors: Xiaoyu Zhang, Maochao Xu, Shouhuai Xu

    Abstract: Our homes are increasingly employing various kinds of Internet of Things (IoT) devices, leading to the notion of smart homes. While this trend brings convenience to our daily life, it also introduces cyber risks. To mitigate such risks, the demand for smart home cyber insurance has been growing rapidly. However, there are no studies on analyzing the competency of smart home cyber insurance policie…

    Submitted 24 August, 2024; originally announced August 2024.

  25. arXiv:2408.13460  [pdf, other]

    cs.LG cs.CR stat.ML

    DOPPLER: Differentially Private Optimizers with Low-pass Filter for Privacy Noise Reduction

    Authors: Xinwei Zhang, Zhiqi Bu, Mingyi Hong, Meisam Razaviyayn

    Abstract: Privacy is a growing concern in modern deep-learning systems and applications. Differentially private (DP) training prevents the leakage of sensitive information in the collected training data from the trained machine learning models. DP optimizers, including DP stochastic gradient descent (DPSGD) and its variants, privatize the training procedure by gradient clipping and DP noise injection. Howev…

    Submitted 24 August, 2024; originally announced August 2024.

  26. arXiv:2408.13293  [pdf, other]

    cs.LG cs.AI

    Causally-Aware Spatio-Temporal Multi-Graph Convolution Network for Accurate and Reliable Traffic Prediction

    Authors: Pingping Dong, Xiao-Lin Wang, Indranil Bose, Kam K. H. Ng, Xiaoning Zhang, Xiaoge Zhang

    Abstract: Accurate and reliable prediction has profound implications to a wide range of applications. In this study, we focus on an instance of spatio-temporal learning problem--traffic prediction--to demonstrate an advanced deep learning model developed for making accurate and reliable forecast. Despite the significant progress in traffic prediction, limited studies have incorporated both explicit and impl…

    Submitted 23 August, 2024; originally announced August 2024.

  27. arXiv:2408.12811  [pdf, ps, other]

    cs.IT

    Decentralized MIMO Systems with LMMSE Receivers and Imperfect CSI

    Authors: Zeyan Zhuang, Xin Zhang, Dongfang Xu, Shenghui Song, Yonina C. Eldar

    Abstract: Centralized baseband processing (CBP) is required to achieve the full potential of massive multiple-input multiple-output (MIMO) systems. However, due to the large number of antennas, CBP suffers from two major issues: 1) Tremendous data interconnection between radio frequency (RF) circuitry and processing fabrics; and 2) high-dimensional computation. To this end, decentralized baseband processing…

    Submitted 22 August, 2024; originally announced August 2024.

  28. arXiv:2408.12616  [pdf, other]

    cs.CV cs.AI

    Semantic Communication based on Large Language Model for Underwater Image Transmission

    Authors: Weilong Chen, Wenxuan Xu, Haoran Chen, Xinran Zhang, Zhijin Qin, Yanru Zhang, Zhu Han

    Abstract: Underwater communication is essential for environmental monitoring, marine biology research, and underwater exploration. Traditional underwater communication faces limitations like low bandwidth, high latency, and susceptibility to noise, while semantic communication (SC) offers a promising solution by focusing on the exchange of semantics rather than symbols or bits. However, SC encounters challe…

    Submitted 25 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  29. arXiv:2408.12352  [pdf, other]

    cs.CV

    GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections

    Authors: Shiyue Zhang, Zheng Chong, Xujie Zhang, Hanhui Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

    Abstract: General text-to-image models bring revolutionary innovation to the fields of arts, design, and media. However, when applied to garment generation, even the state-of-the-art text-to-image models suffer from fine-grained semantic misalignment, particularly concerning the quantity, position, and interrelations of garment components. Addressing this, we propose GarmentAligner, a text-to-garment diffus…

    Submitted 23 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024

  30. arXiv:2408.12317  [pdf, other]

    cs.CV

    Adapt CLIP as Aggregation Instructor for Image Dehazing

    Authors: Xiaozhe Zhang, Fengying Xie, Haidong Ding, Linpeng Pan, Zhenwei Shi

    Abstract: Most dehazing methods suffer from limited receptive field and do not explore the rich semantic prior encapsulated in vision-language models, which have proven effective in downstream tasks. In this paper, we introduce CLIPHaze, a pioneering hybrid framework that synergizes the efficient global modeling of Mamba with the prior knowledge and zero-shot capabilities of CLIP to address both issues simu…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 12 pages, 6 figures

  31. arXiv:2408.11878  [pdf, other]

    cs.CL cs.CE q-fin.CP

    Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

    Authors: Qianqian Xie, Dong Li, Mengxi Xiao, Zihao Jiang, Ruoyu Xiang, Xiao Zhang, Zhengyu Chen, Yueru He, Weiguang Han, Yuzhe Yang, Shunian Chen, Yifei Zhang, Lihang Shen, Daniel Kim, Zhiwei Liu, Zheheng Luo, Yangyang Yu, Yupeng Cao, Zhiyang Deng, Zhiyuan Yao, Haohang Li, Duanyu Feng, Yongfu Dai, VijayaSai Somasundaram, Peng Lu, et al. (14 additional authors not shown)

    Abstract: Large language models (LLMs) have advanced financial applications, yet they often lack sufficient financial knowledge and struggle with tasks involving multi-modal inputs like tables and time series data. To address these limitations, we introduce Open-FinLLMs, a series of Financial LLMs. We begin with FinLLaMA, pre-trained on a 52 billion token financial corpus, incorporating text, table…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 33 pages, 13 figures

  32. arXiv:2408.11875  [pdf, other]

    cs.CL cs.AI cs.IR

    Hierarchical Retrieval-Augmented Generation Model with Rethink for Multi-hop Question Answering

    Authors: Xiaoming Zhang, Ming Wang, Xiaocui Yang, Daling Wang, Shi Feng, Yifei Zhang

    Abstract: Multi-hop Question Answering (QA) necessitates complex reasoning by integrating multiple pieces of information to resolve intricate questions. However, existing QA systems encounter challenges such as outdated information, context window length limitations, and an accuracy-quantity trade-off. To address these issues, we propose a novel framework, the Hierarchical Retrieval-Augmented Generation Mod…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: under review

  33. arXiv:2408.11845  [pdf, other]

    cs.CL

    LLaMA based Punctuation Restoration With Forward Pass Only Decoding

    Authors: Yutong Pang, Debjyoti Paul, Kevin Jiang, Xuedong Zhang, Xin Lei

    Abstract: This paper introduces two advancements in the field of Large Language Model Annotation with a focus on punctuation restoration tasks. Our first contribution is the application of LLaMA for punctuation restoration, which demonstrates superior performance compared to the established benchmark. Despite its impressive quality, LLaMA faces challenges regarding inference speed and hallucinations. To a…

    Submitted 9 August, 2024; originally announced August 2024.

  34. arXiv:2408.11554  [pdf, other]

    cs.CL cs.AI

    Differentiating Choices via Commonality for Multiple-Choice Question Answering

    Authors: Wenqing Deng, Zhe Wang, Kewen Wang, Shirui Pan, Xiaowang Zhang, Zhiyong Feng

    Abstract: Multiple-choice question answering (MCQA) becomes particularly challenging when all choices are relevant to the question and are semantically similar. Yet this setting of MCQA can potentially provide valuable clues for choosing the right answer. Existing models often rank each choice separately, overlooking the context provided by other choices. Specifically, they fail to leverage the semantic com…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 9 pages, accepted to ECAI 2024

  35. arXiv:2408.11505  [pdf, other]

    cs.CV

    MSCPT: Few-shot Whole Slide Image Classification with Multi-scale and Context-focused Prompt Tuning

    Authors: Minghao Han, Linhao Qu, Dingkang Yang, Xukun Zhang, Xiaoying Wang, Lihua Zhang

    Abstract: Multiple instance learning (MIL) has become a standard paradigm for weakly supervised classification of whole slide images (WSI). However, this paradigm relies on the use of a large number of labelled WSIs for training. The lack of training data and the presence of rare diseases present significant challenges for these methods. Prompt tuning combined with the pre-trained Vision-Language models (VL…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 11 pages, 5 figures, 5 tables

  36. arXiv:2408.11444  [pdf, other]

    cs.CR

    A Practical Trigger-Free Backdoor Attack on Neural Networks

    Authors: Jiahao Wang, Xianglong Zhang, Xiuzhen Cheng, Pengfei Hu, Guoming Zhang

    Abstract: Backdoor attacks on deep neural networks have emerged as significant security threats, especially as DNNs are increasingly deployed in security-critical applications. However, most existing works assume that the attacker has access to the original training data. This limitation restricts the practicality of launching such attacks in real-world scenarios. Additionally, using a specified trigger to…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 12 pages, 10 figures

  37. arXiv:2408.11426  [pdf, other]

    cs.RO

    AS-LIO: Spatial Overlap Guided Adaptive Sliding Window LiDAR-Inertial Odometry for Aggressive FOV Variation

    Authors: Tianxiang Zhang, Xuanxuan Zhang, Zongbo Liao, Xin Xia, You Li

    Abstract: LiDAR-Inertial Odometry (LIO) demonstrates outstanding accuracy and stability in general low-speed and smooth motion scenarios. However, in high-speed and intense motion scenarios, such as sharp turns, two primary challenges arise: firstly, due to the limitations of IMU frequency, the error in estimating significantly non-linear motion states escalates; secondly, drastic changes in the Field of Vi…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 8 pages, 6 figures

  38. arXiv:2408.11416  [pdf, other]

    cs.MA cs.RO

    Subgoal-based Hierarchical Reinforcement Learning for Multi-Agent Collaboration

    Authors: Cheng Xu, Changtian Zhang, Yuchen Shi, Ran Wang, Shihong Duan, Yadong Wan, Xiaotong Zhang

    Abstract: Recent advancements in reinforcement learning have made significant impacts across various domains, yet they often struggle in complex multi-agent environments due to issues like algorithm instability, low sampling efficiency, and the challenges of exploration and dimensionality explosion. Hierarchical reinforcement learning (HRL) offers a structured approach to decompose complex tasks into simple…

    Submitted 21 August, 2024; originally announced August 2024.

  39. arXiv:2408.11408  [pdf, other]

    cs.CV

    Latent Feature and Attention Dual Erasure Attack against Multi-View Diffusion Models for 3D Assets Protection

    Authors: Jingwei Sun, Xuchong Zhang, Changfeng Sun, Qicheng Bai, Hongbin Sun

    Abstract: Multi-View Diffusion Models (MVDMs) enable remarkable improvements in the field of 3D geometric reconstruction, but the issue regarding intellectual property has received increasing attention due to unauthorized imitation. Recently, some works have utilized adversarial attacks to protect copyright. However, all these works focus on single-image generation tasks which only need to consider the inne…

    Submitted 21 August, 2024; originally announced August 2024.

  40. arXiv:2408.11381  [pdf, other]

    cs.CL

    RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation

    Authors: Xuanwang Zhang, Yunze Song, Yidong Wang, Shuyun Tang, Xinfeng Li, Zhengran Zeng, Zhen Wu, Wei Ye, Wenyuan Xu, Yue Zhang, Xinyu Dai, Shikun Zhang, Qingsong Wen

    Abstract: Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention. However, even the most advanced LLMs face challenges such as hallucinations and real-time updating of their knowledge. Current research addresses this bottleneck by equipping LLMs with external knowledge, a technique known as Retrieval Augmented Generation (RAG). However, two key issu…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 6 pages, 3 figures

  41. arXiv:2408.11306  [pdf, other]

    cs.LG cs.AI

    KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting?

    Authors: Xiao Han, Xinfeng Zhang, Yiling Wu, Zhenduo Zhang, Zhe Wu

    Abstract: Time series forecasting is a crucial task that predicts the future values of variables based on historical data. Time series forecasting techniques have been developing in parallel with the machine learning community, from early statistical learning methods to current deep learning methods. Although existing methods have made significant progress, they still suffer from two challenges. The mathema…

    Submitted 20 August, 2024; originally announced August 2024.

  42. arXiv:2408.11006  [pdf, other]

    cs.CL cs.CR

    While GitHub Copilot Excels at Coding, Does It Ensure Responsible Output?

    Authors: Wen Cheng, Ke Sun, Xinyu Zhang, Wei Wang

    Abstract: The rapid development of large language models (LLMs) has significantly advanced code completion capabilities, giving rise to a new generation of LLM-based Code Completion Tools (LCCTs). Unlike general-purpose LLMs, these tools possess unique workflows, integrating multiple information sources as input and prioritizing code suggestions over natural language interaction, which introduces distinct s…

    Submitted 20 August, 2024; originally announced August 2024.

  43. arXiv:2408.11001  [pdf, other]

    cs.CV

    MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

    Authors: Haoning Wu, Shaocheng Shen, Qiang Hu, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang

    Abstract: Diffusion models have emerged as frontrunners in text-to-image generation for their impressive capabilities. Nonetheless, their fixed image resolution during training often leads to challenges in high-resolution image generation, such as semantic inaccuracies and object replication. This paper introduces MegaFusion, a novel approach that extends existing diffusion-based text-to-image generation mo…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Technical Report. Project Page: https://haoningwu3639.github.io/MegaFusion/

  44. arXiv:2408.10706  [pdf, ps, other]

    cs.IT eess.SP

    Performance Analysis of Physical Layer Security: From Far-Field to Near-Field

    Authors: Boqun Zhao, Chongjun Ouyang, Xingqi Zhang, Yuanwei Liu

    Abstract: The secrecy performance in both near-field and far-field communications is analyzed using two fundamental metrics: the secrecy capacity under a power constraint and the minimum power requirement to achieve a specified secrecy rate target. 1) For the secrecy capacity, a closed-form expression is derived under a discrete-time memoryless setup. This expression is further analyzed under several far-fi…

    Submitted 20 August, 2024; originally announced August 2024.

  45. arXiv:2408.10404  [pdf, other]

    cs.CV eess.IV eess.SP

    Parallel Processing of Point Cloud Ground Segmentation for Mechanical and Solid-State LiDARs

    Authors: Xiao Zhang, Zhanhong Huang, Garcia Gonzalez Antony, Witek Jachimczyk, Xinming Huang

    Abstract: In this study, we introduce a novel parallel processing framework for real-time point cloud ground segmentation on FPGA platforms, aimed at adapting LiDAR algorithms to the evolving landscape from mechanical to solid-state LiDAR (SSL) technologies. Focusing on the ground segmentation task, we explore parallel processing techniques on existing approaches and adapt them to real-world SSL data handli…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 5 pages

  46. arXiv:2408.10381  [pdf, other]

    stat.ML cs.AI cs.LG

    Efficient Reinforcement Learning in Probabilistic Reward Machines

    Authors: Xiaofeng Lin, Xuezhou Zhang

    Abstract: In this paper, we study reinforcement learning in Markov Decision Processes with Probabilistic Reward Machines (PRMs), a form of non-Markovian reward commonly found in robotics tasks. We design an algorithm for PRMs that achieves a regret bound of $\widetilde{O}(\sqrt{HOAT} + H^2O^2A^{3/2} + H\sqrt{T})$, where $H$ is the time horizon, $O$ is the number of observations, $A$ is the number of actions…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 33 pages, 4 figures

  47. arXiv:2408.10365  [pdf, other]

    cs.AI

    AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews

    Authors: Keith Tyser, Ben Segev, Gaston Longhitano, Xin-Yu Zhang, Zachary Meeks, Jason Lee, Uday Garg, Nicholas Belsten, Avi Shporer, Madeleine Udell, Dov Te'eni, Iddo Drori

    Abstract: Automatic reviewing helps handle a large volume of papers, provides early feedback and quality control, reduces bias, and allows the analysis of trends. We evaluate the alignment of automatic paper reviews with human reviews using an arena of human preferences by pairwise comparisons. Gathering human preference may be time-consuming; therefore, we also use an LLM to automatically evaluate reviews…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 42 pages

  48. arXiv:2408.10198  [pdf, other]

    cs.CV cs.GR

    MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

    Authors: Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Linghao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xiaoshuai Zhang, Isabella Liu, Hongzhi Wu, Hao Su

    Abstract: Open-world 3D reconstruction models have recently garnered significant attention. However, without sufficient 3D inductive bias, existing methods typically entail expensive training costs and struggle to extract high-quality 3D meshes. In this work, we introduce MeshFormer, a sparse-view reconstruction model that explicitly leverages 3D native structure, input guidance, and training supervision. S…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 20 pages, 9 figures

  49. arXiv:2408.10135  [pdf, other]

    cs.CV

    $R^2$-Mesh: Reinforcement Learning Powered Mesh Reconstruction via Geometry and Appearance Refinement

    Authors: Haoyang Wang, Liming Liu, Quanlu Jia, Jiangkai Wu, Haodan Zhang, Peiheng Wang, Xinggong Zhang

    Abstract: Mesh reconstruction based on Neural Radiance Fields (NeRF) is popular in a variety of applications such as computer graphics, virtual reality, and medical imaging due to its efficiency in handling complex geometric structures and facilitating real-time rendering. However, existing works often fail to capture fine geometric details accurately and struggle with optimizing rendering quality. To addre…

    Submitted 19 August, 2024; originally announced August 2024.

  50. arXiv:2408.10124  [pdf, other]

    cs.LG cs.AI cs.IR physics.chem-ph q-bio.BM

    Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models

    Authors: Tianyu Zhang, Yuxiang Ren, Chengbin Hou, Hairong Lv, Xuegong Zhang

    Abstract: Molecular property prediction is a crucial foundation for drug discovery. In recent years, pre-trained deep learning models have been widely applied to this task. Some approaches that incorporate prior biological domain knowledge into the pre-training framework have achieved impressive results. However, these methods heavily rely on biochemical experts, and retrieving and summarizing vast amounts…

    Submitted 19 August, 2024; originally announced August 2024.