Showing 1–50 of 1,532 results for author: Xue, C

  1. arXiv:2409.01686  [pdf, other]

    cs.CV

    Frequency-Spatial Entanglement Learning for Camouflaged Object Detection

    Authors: Yanguang Sun, Chunyan Xu, Jian Yang, Hanyu Xuan, Lei Luo

    Abstract: Camouflaged object detection has attracted a lot of attention in computer vision. The main challenge lies in the high degree of similarity between camouflaged objects and their surroundings in the spatial domain, making identification difficult. Existing methods attempt to reduce the impact of pixel similarity by maximizing the distinguishing ability of spatial features with complicated design, bu…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted at ECCV 2024

  2. arXiv:2409.01522  [pdf, other]

    cs.CV

    Lagrangian Motion Fields for Long-term Motion Generation

    Authors: Yifei Yang, Zikai Huang, Chenshu Xu, Shengfeng He

    Abstract: Long-term motion generation is a challenging task that requires producing coherent and realistic sequences over extended durations. Current methods primarily rely on framewise motion representations, which capture only static spatial details and overlook temporal dynamics. This approach leads to significant redundancy across the temporal dimension, complicating the generation of effective long-ter…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 13 pages, 9 figures

  3. arXiv:2409.01366  [pdf, other]

    cs.CL cs.AI cs.LG

    CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification

    Authors: Junhui He, Shangyu Wu, Weidong Wen, Chun Jason Xue, Qingan Li

    Abstract: Deploying large language models (LLMs) on edge devices presents significant challenges due to the substantial computational overhead and memory requirements. Activation sparsification can mitigate these challenges by reducing the number of activated neurons during inference. Existing methods typically employ thresholding-based sparsification based on the statistics of activation tensors. However,…

    Submitted 2 September, 2024; originally announced September 2024.

  4. arXiv:2409.01256  [pdf, other]

    cs.CV cs.AI

    Real-time Accident Anticipation for Autonomous Driving Through Monocular Depth-Enhanced 3D Modeling

    Authors: Haicheng Liao, Yongkang Li, Chengyue Wang, Songning Lai, Zhenning Li, Zilin Bian, Jaeyoung Lee, Zhiyong Cui, Guohui Zhang, Chengzhong Xu

    Abstract: The primary goal of traffic accident anticipation is to foresee potential accidents in real time using dashcam videos, a task that is pivotal for enhancing the safety and reliability of autonomous driving technologies. In this study, we introduce an innovative framework, AccNet, which significantly advances the prediction capabilities beyond the current state-of-the-art (SOTA) 2D-based methods by…

    Submitted 2 September, 2024; originally announced September 2024.

  5. arXiv:2409.00744  [pdf, other]

    cs.CV cs.RO

    DSLO: Deep Sequence LiDAR Odometry Based on Inconsistent Spatio-temporal Propagation

    Authors: Huixin Zhang, Guangming Wang, Xinrui Wu, Chenfeng Xu, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan, Hesheng Wang

    Abstract: This paper introduces a 3D point cloud sequence learning model based on inconsistent spatio-temporal propagation for LiDAR odometry, termed DSLO. It consists of a pyramid structure with a spatial information reuse strategy, a sequential pose initialization module, a gated hierarchical pose refinement module, and a temporal feature propagation module. First, spatial features are encoded using a poi…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 6 pages, 5 figures, accepted by IROS 2024

  6. arXiv:2409.00097  [pdf, other]

    cs.CL cs.AI

    Large Language Models for Disease Diagnosis: A Scoping Review

    Authors: Shuang Zhou, Zidu Xu, Mian Zhang, Chunpu Xu, Yawen Guo, Zaifu Zhan, Sirui Ding, Jiashuo Wang, Kaishuai Xu, Yi Fang, Liqiao Xia, Jeremy Yeung, Daochen Zha, Mingquan Lin, Rui Zhang

    Abstract: Automatic disease diagnosis has become increasingly valuable in clinical practice. The advent of large language models (LLMs) has catalyzed a paradigm shift in artificial intelligence, with growing evidence supporting the efficacy of LLMs in diagnostic tasks. Despite the growing attention in this field, many critical research questions remain under-explored. For instance, what diseases and LLM tec…

    Submitted 26 August, 2024; originally announced September 2024.

    Comments: 57 pages

  7. arXiv:2408.16357  [pdf, other]

    cs.CV

    Law of Vision Representation in MLLMs

    Authors: Shijia Yang, Bohan Zhai, Quanzeng You, Jianbo Yuan, Hongxia Yang, Chenfeng Xu

    Abstract: We present the "Law of Vision Representation" in multimodal large language models (MLLMs). It reveals a strong correlation between the combination of cross-modal alignment, correspondence in vision representation, and MLLM performance. We quantify the two factors using the cross-modal Alignment and Correspondence score (AC score). Through extensive experiments involving thirteen different vision r…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: The code is available at https://github.com/bronyayang/Law_of_Vision_Representation_in_MLLMs

  8. arXiv:2408.15881  [pdf, other]

    cs.CV

    LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

    Authors: Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Guanghao Zhang, Haonan Shi, Long Chen, Tao Zhong, Wanggui He, Siming Fu, Haoyuan Li, Bolin Li, Zhelun Yu, Si Liu, Hongsheng Li, Hao Jiang

    Abstract: We introduce LLaVA-MoD, a novel framework designed to enable the efficient training of small-scale Multimodal Language Models (s-MLLM) by distilling knowledge from large-scale MLLM (l-MLLM). Our approach tackles two fundamental challenges in MLLM distillation. First, we optimize the network structure of s-MLLM by integrating a sparse Mixture of Experts (MoE) architecture into the language model, s…

    Submitted 28 August, 2024; originally announced August 2024.

  9. arXiv:2408.15815  [pdf, other]

    cs.SE

    MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing

    Authors: Congying Xu, Songqiang Chen, Jiarong Wu, Shing-Chi Cheung, Valerio Terragni, Hengcheng Zhu, Jialun Cao

    Abstract: While a recent study reveals that many developer-written test cases can encode a reusable Metamorphic Relation (MR), over 70% of them directly hard-code the source input and follow-up input in the encoded relation. Such encoded MRs, which do not contain an explicit input transformation to transform the source inputs to corresponding follow-up inputs, cannot be reused with new source inputs to enha…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: This paper is accepted to ASE 2024

  10. arXiv:2408.15287  [pdf, other]

    quant-ph cs.LG

    Quantum-Powered Personalized Learning

    Authors: Yifan Zhou, Chong Cheng Xu, Mingi Song, Yew Kee Wong

    Abstract: This paper explores the transformative potential of quantum computing in the realm of personalized learning. Traditional machine learning models and GPU-based approaches have long been utilized to tailor educational experiences to individual student needs. However, these methods face significant challenges in terms of scalability, computational efficiency, and real-time adaptation to the dynamic n…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 9 pages, 2 figures

  11. arXiv:2408.14238  [pdf, other]

    cs.IR

    Are LLM-based Recommenders Already the Best? Simple Scaled Cross-entropy Unleashes the Potential of Traditional Sequential Recommenders

    Authors: Cong Xu, Zhangchi Zhu, Mo Yu, Jun Wang, Jianyong Wang, Wei Zhang

    Abstract: Large language models (LLMs) have been garnering increasing attention in the recommendation community. Some studies have observed that LLMs, when fine-tuned by the cross-entropy (CE) loss with a full softmax, could achieve 'state-of-the-art' performance in sequential recommendation. However, most of the baselines used for comparison are trained using a pointwise/pairwise loss function. This incons…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 18 pages. arXiv admin note: substantial text overlap with arXiv:2402.06216

  12. arXiv:2408.13980  [pdf, other]

    cs.CV

    FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation

    Authors: Daixun Li, Weiying Xie, Mingxiang Cao, Yunke Wang, Jiaqing Zhang, Yunsong Li, Leyuan Fang, Chang Xu

    Abstract: Multimodal image fusion and segmentation enhance scene understanding in autonomous driving by integrating data from various sensors. However, current models struggle to efficiently segment densely packed elements in such scenes, due to the absence of comprehensive fusion features that can guide mid-process fine-tuning and focus attention on relevant areas. The Segment Anything Model (SAM) has emer…

    Submitted 25 August, 2024; originally announced August 2024.

  13. arXiv:2408.13423  [pdf, other]

    cs.CV

    Training-free Long Video Generation with Chain of Diffusion Model Experts

    Authors: Wenhao Li, Yichao Cao, Xiu Su, Xi Lin, Shan You, Mingkai Zheng, Yi Chen, Chang Xu

    Abstract: Video generation models hold substantial potential in areas such as filmmaking. However, current video diffusion models incur high computational costs and produce suboptimal results due to the high complexity of the video generation task. In this paper, we propose ConFiner, an efficient high-quality video generation framework that decouples video generation into easier subtasks: structure …

    Submitted 2 September, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

  14. arXiv:2408.12991  [pdf, other]

    cs.CE q-fin.TR

    Controllable Financial Market Generation with Diffusion Guided Meta Agent

    Authors: Yu-Hao Huang, Chang Xu, Yang Liu, Weiqing Liu, Wu-Jun Li, Jiang Bian

    Abstract: Order flow modeling stands as the most fundamental and essential financial task, as orders embody the minimal unit within a financial market. However, current approaches often result in unsatisfactory fidelity in generating order flow, and their generation lacks controllability, thereby limiting their application scenario. In this paper, we advocate incorporating controllability into the market ge…

    Submitted 1 September, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

  15. arXiv:2408.12779  [pdf, ps, other]

    cs.CL cs.AI

    Investigating LLM Applications in E-Commerce

    Authors: Chester Palen-Michel, Ruixiang Wang, Yipeng Zhang, David Yu, Canran Xu, Zhe Wu

    Abstract: The emergence of Large Language Models (LLMs) has revolutionized natural language processing in various applications, especially in e-commerce. One crucial step before the application of such LLMs in these fields is to understand and compare the performance in different use cases in such tasks. This paper explored the efficacy of LLMs in the e-commerce domain, focusing on instruction-tuning an open…

    Submitted 22 August, 2024; originally announced August 2024.

  16. arXiv:2408.12527  [pdf, other]

    cs.RO cs.CV

    UMAD: University of Macau Anomaly Detection Benchmark Dataset

    Authors: Dong Li, Lineng Chen, Cheng-Zhong Xu, Hui Kong

    Abstract: Anomaly detection is critical in surveillance systems and patrol robots by identifying anomalous regions in images for early warning. Depending on whether reference data are utilized, anomaly detection can be categorized into anomaly detection with reference and anomaly detection without reference. Currently, anomaly detection without reference, which is closely related to out-of-distribution (OoD…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted by the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024, project code at https://github.com/IMRL/UMAD

  17. arXiv:2408.12009  [pdf, other]

    cs.CV

    CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion

    Authors: Yunlong Tang, Gen Zhan, Li Yang, Yiting Liao, Chenliang Xu

    Abstract: Video saliency prediction aims to identify the regions in a video that attract human attention and gaze, driven by bottom-up features from the video and top-down processes like memory and cognition. Among these top-down influences, language plays a crucial role in guiding attention by shaping how visual information is interpreted. Existing methods primarily focus on modeling perceptual information…

    Submitted 21 August, 2024; originally announced August 2024.

  18. arXiv:2408.11494  [pdf, ps, other]

    cs.AI

    Mutagenesis screen to map the functionals of parameters of Large Language Models

    Authors: Yue Hu, Kai Hu, Patrick X. Zhao, Javed Khan, Chengming Xu

    Abstract: Large Language Models (LLMs) have significantly advanced artificial intelligence, excelling in numerous tasks. Although the functionality of a model is inherently tied to its parameters, a systematic method for exploring the connections between the parameters and the functionality is lacking. Models sharing similar structure and parameter counts exhibit significant performance disparities across…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 10 pages, 6 figures, supplementary material available online

    ACM Class: I.2.0

  19. arXiv:2408.11416  [pdf, other]

    cs.MA cs.RO

    Subgoal-based Hierarchical Reinforcement Learning for Multi-Agent Collaboration

    Authors: Cheng Xu, Changtian Zhang, Yuchen Shi, Ran Wang, Shihong Duan, Yadong Wan, Xiaotong Zhang

    Abstract: Recent advancements in reinforcement learning have made significant impacts across various domains, yet they often struggle in complex multi-agent environments due to issues like algorithm instability, low sampling efficiency, and the challenges of exploration and dimensionality explosion. Hierarchical reinforcement learning (HRL) offers a structured approach to decompose complex tasks into simple…

    Submitted 21 August, 2024; originally announced August 2024.

  20. arXiv:2408.11311  [pdf, other]

    cs.AR quant-ph

    HiMA: Hierarchical Quantum Microarchitecture for Qubit-Scaling and Quantum Process-Level Parallelism

    Authors: Qi Zhou, Zi-Hao Mei, Han-Qing Shi, Liang-Liang Guo, Xiao-Yan Yang, Yun-Jie Wang, Xiao-Fan Xu, Cheng Xue, Wei-Cheng Kong, Jun-Chao Wang, Yu-Chun Wu, Zhao-Yun Chen, Guo-Ping Guo

    Abstract: Quantum computing holds immense potential for addressing a myriad of intricate challenges, which is significantly amplified when scaled to thousands of qubits. However, a major challenge lies in developing an efficient and scalable quantum control system. To address this, we propose a novel Hierarchical MicroArchitecture (HiMA) designed to facilitate qubit scaling and exploit quantum process-level…

    Submitted 20 August, 2024; originally announced August 2024.

  21. arXiv:2408.11241  [pdf, other]

    cs.CV

    CooPre: Cooperative Pretraining for V2X Cooperative Perception

    Authors: Seth Z. Zhao, Hao Xiang, Chenfeng Xu, Xin Xia, Bolei Zhou, Jiaqi Ma

    Abstract: Existing Vehicle-to-Everything (V2X) cooperative perception methods rely on accurate multi-agent 3D annotations. Nevertheless, it is time-consuming and expensive to collect and annotate real-world data, especially for V2X systems. In this paper, we present a self-supervised learning method for V2X cooperative perception, which utilizes the vast amount of unlabeled 3D V2X data to enhance the percep…

    Submitted 20 August, 2024; originally announced August 2024.

  22. arXiv:2408.11194  [pdf, other]

    cs.CV

    Compress Guidance in Conditional Diffusion Sampling

    Authors: Anh-Dung Dinh, Daochang Liu, Chang Xu

    Abstract: Enforcing guidance throughout the entire sampling process often proves counterproductive due to the model-fitting issue, where samples are generated to match the classifier's parameters rather than generalizing the expected condition. This work identifies and quantifies the problem, demonstrating that reducing or excluding guidance at numerous timesteps can mitigate this issue. By distributing th…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures, Computer Vision and Machine Learning

    ACM Class: I.4

  23. arXiv:2408.10826  [pdf, other]

    cs.DC

    NeuLite: Memory-Efficient Federated Learning via Elastic Progressive Training

    Authors: Yebo Wu, Li Li, Chunlin Tian, Dubing Chen, Chengzhong Xu

    Abstract: Federated Learning (FL) emerges as a new learning paradigm that enables multiple devices to collaboratively train a shared model while preserving data privacy. However, intensive memory footprint during the training process severely bottlenecks the deployment of FL on resource-constrained devices in real-world cases. In this paper, we propose NeuLite, a framework that breaks the memory wall throug…

    Submitted 20 August, 2024; originally announced August 2024.

  24. arXiv:2408.10681  [pdf, other]

    cs.CL cs.LG

    HMoE: Heterogeneous Mixture of Experts for Language Modeling

    Authors: An Wang, Xingwu Sun, Ruobing Xie, Shuaipeng Li, Jiaqi Zhu, Zhen Yang, Pinxue Zhao, J. N. Han, Zhanhui Kang, Di Wang, Naoaki Okazaki, Cheng-zhong Xu

    Abstract: Mixture of Experts (MoE) offers remarkable performance and computational efficiency by selectively activating subsets of model parameters. Traditionally, MoE models use homogeneous experts, each with identical capacity. However, varying complexity in input data necessitates experts with diverse capabilities, while homogeneous MoE hinders effective expert specialization and efficient parameter util…

    Submitted 20 August, 2024; originally announced August 2024.

  25. arXiv:2408.10198  [pdf, other]

    cs.CV cs.GR

    MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

    Authors: Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Linghao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xiaoshuai Zhang, Isabella Liu, Hongzhi Wu, Hao Su

    Abstract: Open-world 3D reconstruction models have recently garnered significant attention. However, without sufficient 3D inductive bias, existing methods typically entail expensive training costs and struggle to extract high-quality 3D meshes. In this work, we introduce MeshFormer, a sparse-view reconstruction model that explicitly leverages 3D native structure, input guidance, and training supervision. S…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 20 pages, 9 figures

  26. arXiv:2408.10195  [pdf, other]

    cs.CV cs.AI cs.GR

    SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views

    Authors: Chao Xu, Ang Li, Linghao Chen, Yulin Liu, Ruoxi Shi, Hao Su, Minghua Liu

    Abstract: Open-world 3D generation has recently attracted considerable attention. While many single-image-to-3D methods have yielded visually appealing outcomes, they often lack sufficient controllability and tend to produce hallucinated regions that may not align with users' expectations. In this paper, we explore an important scenario in which the input consists of one or a few unposed 2D images of a sing…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  27. arXiv:2408.09786  [pdf, other]

    cs.CV

    Cross-composition Feature Disentanglement for Compositional Zero-shot Learning

    Authors: Yuxia Geng, Runkai Zhu, Jiaoyan Chen, Jintai Chen, Zhuo Chen, Xiang Chen, Can Xu, Yuxiang Wang, Xiaoliang Xu

    Abstract: Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL). However, due to the feature divergence of an attribute (resp. object) when combined with different objects (resp. attributes), it is challenging to learn disentangled primitive features that are general across different compositions. To this end,…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: work in progress

  28. arXiv:2408.09476  [pdf, other]

    cs.CV cs.LG

    Advances in Multiple Instance Learning for Whole Slide Image Analysis: Techniques, Challenges, and Future Directions

    Authors: Jun Wang, Yu Mao, Nan Guan, Chun Jason Xue

    Abstract: Whole slide images (WSIs) are gigapixel-scale digital images of H&E-stained tissue samples widely used in pathology. The substantial size and complexity of WSIs pose unique analytical challenges. Multiple Instance Learning (MIL) has emerged as a powerful approach for addressing these challenges, particularly in cancer classification and detection. This survey provides a comprehensive overview of…

    Submitted 18 August, 2024; originally announced August 2024.

  29. arXiv:2408.09468  [pdf, other]

    cs.RO

    Towards Safe and Robust Autonomous Vehicle Platooning: A Self-Organizing Cooperative Control Framework

    Authors: Chengkai Xu, Zihao Deng, Jiaqi Liu, Chao Huang, Peng Hang

    Abstract: In the emerging hybrid traffic flow environment, which includes both human-driven vehicles (HDVs) and autonomous vehicles (AVs), ensuring safe and robust decision-making and control is crucial for the effective operation of autonomous vehicle platooning. Current systems for cooperative adaptive cruise control and lane changing are inadequate in responding to real-world emergency situations, limiti…

    Submitted 18 August, 2024; originally announced August 2024.

  30. arXiv:2408.09458  [pdf, other]

    cs.CV

    G2Face: High-Fidelity Reversible Face Anonymization via Generative and Geometric Priors

    Authors: Haoxin Yang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Jing Qin, Yi Wang, Pheng-Ann Heng, Shengfeng He

    Abstract: Reversible face anonymization, unlike traditional face pixelization, seeks to replace sensitive identity information in facial images with synthesized alternatives, preserving privacy without sacrificing image clarity. Traditional methods, such as encoder-decoder networks, often result in significant loss of facial details due to their limited learning capacity. Additionally, relying on latent man…

    Submitted 18 August, 2024; originally announced August 2024.

  31. arXiv:2408.09397  [pdf, other]

    cs.CV

    Combo: Co-speech holistic 3D human motion generation and efficient customizable adaptation in harmony

    Authors: Chao Xu, Mingze Sun, Zhi-Qi Cheng, Fei Wang, Yang Liu, Baigui Sun, Ruqi Huang, Alexander Hauptmann

    Abstract: In this paper, we propose a novel framework, Combo, for harmonious co-speech holistic 3D human motion generation and efficient customizable adaptation. In particular, we identify one fundamental challenge as the multiple-input-multiple-output (MIMO) nature of the generative model of interest. More concretely, on the input end, the model typically consumes both speech signals and character guida…

    Submitted 18 August, 2024; originally announced August 2024.

  32. arXiv:2408.09220  [pdf, other]

    cs.CV cs.AI

    Flatten: Video Action Recognition is an Image Classification task

    Authors: Junlin Chen, Chengcheng Xu, Yangfan Xu, Jian Yang, Jun Li, Zhiping Shi

    Abstract: In recent years, video action recognition, as a fundamental task in the field of video understanding, has been deeply explored by numerous researchers. Most traditional video action recognition methods typically involve converting videos into three-dimensional data that encapsulates both spatial and temporal information, subsequently leveraging prevalent image understanding models to model and anal…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 13 pages, 6 figures

  33. arXiv:2408.08502  [pdf, other]

    cs.CV

    Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness

    Authors: Hefei Mei, Minjing Dong, Chang Xu

    Abstract: Diffusion models (DMs) have demonstrated great potential in the field of adversarial robustness, where DM-based defense methods can achieve superior defense capability without adversarial training. However, they all require huge computational costs due to the usage of large-scale pre-trained DMs, making it difficult to conduct full evaluation under strong attacks and compare with traditional CNN-b…

    Submitted 15 August, 2024; originally announced August 2024.

  34. arXiv:2408.07673  [pdf]

    cs.LG cs.AI cs.NE q-bio.QM

    Deep Learning: a Heuristic Three-stage Mechanism for Grid Searches to Optimize the Future Risk Prediction of Breast Cancer Metastasis Using EHR-based Clinical Data

    Authors: Xia Jiang, Yijun Zhou, Chuhan Xu, Adam Brufsky, Alan Wells

    Abstract: A grid search, at the cost of training and testing a large number of models, is an effective way to optimize the prediction performance of deep learning models. A challenging task concerning grid search is the time management. Without a good time management scheme, a grid search can easily be set off as a mission that will not finish in our lifetime. In this study, we introduce a heuristic three-s…

    Submitted 15 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  35. arXiv:2408.05775  [pdf, other]

    cs.CV

    Efficient Test-Time Prompt Tuning for Vision-Language Models

    Authors: Yuhan Zhu, Guozhen Zhang, Chen Xu, Haocheng Shen, Xiaoxin Chen, Gangshan Wu, Limin Wang

    Abstract: Vision-language models have showcased impressive zero-shot classification capabilities when equipped with suitable text prompts. Previous studies have shown the effectiveness of test-time prompt tuning; however, these methods typically require per-image prompt adaptation during inference, which incurs high computational budgets and limits scalability and practical deployment. To overcome this issu…

    Submitted 11 August, 2024; originally announced August 2024.

  36. arXiv:2408.04294  [pdf, other]

    cs.CV cs.LG

    Dual-branch PolSAR Image Classification Based on GraphMAE and Local Feature Extraction

    Authors: Yuchen Wang, Ziyi Guo, Haixia Bi, Danfeng Hong, Chen Xu

    Abstract: The annotation of polarimetric synthetic aperture radar (PolSAR) images is a labor-intensive and time-consuming process. Therefore, classifying PolSAR images with limited labels is a challenging task in the remote sensing domain. In recent years, self-supervised learning approaches have proven effective in PolSAR image classification with sparse labels. However, we observe a lack of research on genera…

    Submitted 8 August, 2024; originally announced August 2024.

  37. arXiv:2408.01945  [pdf, other]

    cs.CV cs.RO

    Generalized Maximum Likelihood Estimation for Perspective-n-Point Problem

    Authors: Tian Zhan, Chunfeng Xu, Cheng Zhang, Ke Zhu

    Abstract: The Perspective-n-Point (PnP) problem has been widely studied in the literature and applied in various vision-based pose estimation scenarios. However, existing methods ignore the anisotropy uncertainty of observations, as demonstrated in several real-world datasets in this paper. This oversight may lead to suboptimal and inaccurate estimation, particularly in the presence of noisy observations. T…

    Submitted 4 August, 2024; originally announced August 2024.

  38. arXiv:2408.01835  [pdf, other]

    cs.CV

    TS-SAM: Fine-Tuning Segment-Anything Model for Downstream Tasks

    Authors: Yang Yu, Chen Xu, Kai Wang

    Abstract: Adapter based fine-tuning has been studied for improving the performance of SAM on downstream tasks. However, there is still a significant performance gap between fine-tuned SAMs and domain-specific models. To reduce the gap, we propose Two-Stream SAM (TS-SAM). On the one hand, inspired by the side network in Parameter-Efficient Fine-Tuning (PEFT), we designed a lightweight Convolutional Side Adap…

    Submitted 3 August, 2024; originally announced August 2024.

  39. arXiv:2408.01649  [pdf, other]

    cs.RO

    LF-3PM: a LiDAR-based Framework for Perception-aware Planning with Perturbation-induced Metric

    Authors: Kaixin Chai, Long Xu, Qianhao Wang, Chao Xu, Peng Yin, Fei Gao

    Abstract: Just as humans can become disoriented in featureless deserts or thick fogs, not all environments are conducive to the Localization Accuracy and Stability (LAS) of autonomous robots. This paper introduces an efficient framework designed to enhance LiDAR-based LAS through strategic trajectory generation, known as Perception-aware Planning. Unlike vision-based frameworks, the LiDAR-based requires dif…

    Submitted 2 August, 2024; originally announced August 2024.

  40. arXiv:2408.01430  [pdf, other]

    cs.CV cs.AI

    SUSTechGAN: Image Generation for Object Recognition in Adverse Conditions of Autonomous Driving

    Authors: Gongjin Lan, Yang Peng, Qi Hao, Chengzhong Xu

    Abstract: Autonomous driving significantly benefits from data-driven deep neural networks. However, the data in autonomous driving typically fits the long-tailed distribution, in which the critical driving data in adverse conditions is hard to collect. Although generative adversarial networks (GANs) have been applied to augment data for autonomous driving, generating driving images in adverse conditions is…

    Submitted 18 July, 2024; originally announced August 2024.

    Comments: 10 pages, 9 figures

  41. arXiv:2408.01076  [pdf, other]

    cs.CV

    Exploiting the Semantic Knowledge of Pre-trained Text-Encoders for Continual Learning

    Authors: Lu Yu, Zhe Tao, Hantao Yao, Joost Van de Weijer, Changsheng Xu

    Abstract: Deep neural networks (DNNs) excel on fixed datasets but struggle with incremental and shifting data in real-world scenarios. Continual learning addresses this challenge by allowing models to learn from new data while retaining previously learned knowledge. Existing methods mainly rely on visual features, often neglecting the rich semantic information encoded in text. The semantic knowledge availab…

    Submitted 2 August, 2024; originally announced August 2024.

  42. arXiv:2408.00764  [pdf, other]

    cs.CL cs.AI cs.LG

    AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation

    Authors: Mengkang Hu, Pu Zhao, Can Xu, Qingfeng Sun, Jianguang Lou, Qingwei Lin, Ping Luo, Saravan Rajmohan, Dongmei Zhang

    Abstract: Large Language Model (LLM) based agents have garnered significant attention and are becoming increasingly popular. Furthermore, planning ability is a crucial component of an LLM-based agent, involving interaction with the environment and executing actions to complete a planning task, which generally entails achieving a desired goal from an initial state. This paper investigates enhancing the plann…

    Submitted 1 August, 2024; originally announced August 2024.

  43. arXiv:2407.21439  [pdf, other]

    cs.AI cs.CL cs.LG

    MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training

    Authors: Zhanpeng Chen, Chengjin Xu, Yiyan Qi, Jian Guo

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in processing and generating content across multiple data modalities, including text, images, audio, and video. However, a significant drawback of MLLMs is their reliance on static training data, leading to outdated information and limited contextual awareness. This static nature hampers their ability to provide acc…

    Submitted 31 July, 2024; originally announced July 2024.

  44. arXiv:2407.20523  [pdf, other]

    cs.IT cs.MM

    Wireless Multi-User Interactive Virtual Reality in Metaverse with Edge-Device Collaborative Computing

    Authors: Caolu Xu, Zhiyong Chen, Meixia Tao, Wenjun Zhang

    Abstract: The immersive nature of the metaverse presents significant challenges for wireless multi-user interactive virtual reality (VR), such as ultra-low latency, high throughput and intensive computing, which place substantial demands on the wireless bandwidth and rendering resources of mobile edge computing (MEC). In this paper, we propose a wireless multi-user interactive VR with edge-device collaborat…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: submitted to IEEE journal

  45. arXiv:2407.19078  [pdf, other]

    cs.LG stat.ML

    Practical Marketplace Optimization at Uber Using Causally-Informed Machine Learning

    Authors: Bobby Chen, Siyu Chen, Jason Dowlatabadi, Yu Xuan Hong, Vinayak Iyer, Uday Mantripragada, Rishabh Narang, Apoorv Pandey, Zijun Qin, Abrar Sheikh, Hongtao Sun, Jiaqi Sun, Matthew Walker, Kaichen Wei, Chen Xu, Jingnan Yang, Allen T. Zhang, Guoqing Zhang

    Abstract: Budget allocation of marketplace levers, such as incentives for drivers and promotions for riders, has long been a technical and business challenge at Uber; understanding lever budget changes' impact and estimating cost efficiency to achieve predefined budgets is crucial, with the goal of optimal allocations that maximize business value; we introduce an end-to-end machine learning and optimization…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: To be published in the 2nd Workshop on Causal Inference and Machine Learning in Practice, KDD 2024, August 25 to 29, 2024, Barcelona, Spain, 10 pages

    MSC Class: 62J99

  46. arXiv:2407.19014  [pdf, other]

    cs.CV

    Sparse Refinement for Efficient High-Resolution Semantic Segmentation

    Authors: Zhijian Liu, Zhuoyang Zhang, Samir Khaki, Shang Yang, Haotian Tang, Chenfeng Xu, Kurt Keutzer, Song Han

    Abstract: Semantic segmentation empowers numerous real-world applications, such as autonomous driving and augmented/mixed reality. These applications often operate on high-resolution images (e.g., 8 megapixels) to capture the fine details. However, this comes at the cost of considerable computational complexity, hindering the deployment in latency-sensitive scenarios. In this paper, we introduce SparseRefin…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. The first two authors contributed equally to this work. Project page: https://sparserefine.mit.edu

  47. arXiv:2407.18559  [pdf, other]

    cs.CV

    VSSD: Vision Mamba with Non-Causal State Space Duality

    Authors: Yuheng Shi, Minjing Dong, Mingjia Li, Chang Xu

    Abstract: Vision transformers have significantly advanced the field of computer vision, offering robust modeling capabilities and global receptive field. However, their high computational demands limit their applicability in processing long sequences. To tackle this issue, State Space Models (SSMs) have gained prominence in vision tasks as they offer linear computational complexity. Recently, State Space Du…

    Submitted 4 August, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

    Comments: 16 pages, 5 figures, 7 tables

  48. arXiv:2407.18418  [pdf, other]

    cs.CL

    Know Your Limits: A Survey of Abstention in Large Language Models

    Authors: Bingbing Wen, Jihan Yao, Shangbin Feng, Chenjun Xu, Yulia Tsvetkov, Bill Howe, Lucy Lu Wang

    Abstract: Abstention, the refusal of large language models (LLMs) to provide an answer, is increasingly recognized for its potential to mitigate hallucinations and enhance safety in LLM systems. In this survey, we introduce a framework to examine abstention from three perspectives: the query, the model, and human values. We organize the literature on abstention methods, benchmarks, and evaluation metrics us…

    Submitted 8 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: preprint

  49. arXiv:2407.17757  [pdf, other]

    cs.CV cs.RO

    CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions

    Authors: Haicheng Liao, Haoyu Sun, Huanming Shen, Chengyue Wang, Kahou Tam, Chunlin Tian, Li Li, Chengzhong Xu, Zhenning Li

    Abstract: Accurately and promptly predicting accidents among surrounding traffic agents from camera footage is crucial for the safety of autonomous vehicles (AVs). This task presents substantial challenges stemming from the unpredictable nature of traffic accidents, their long-tail distribution, the intricacies of traffic scene dynamics, and the inherently constrained field of vision of onboard cameras. To…

    Submitted 25 July, 2024; originally announced July 2024.

  50. arXiv:2407.17738  [pdf, other]

    cs.CV

    Enhancing Fine-grained Object Detection in Aerial Images via Orthogonal Mapping

    Authors: Haoran Zhu, Yifan Zhou, Chang Xu, Ruixiang Zhang, Wen Yang

    Abstract: Fine-Grained Object Detection (FGOD) is a critical task in high-resolution aerial image analysis. This letter introduces Orthogonal Mapping (OM), a simple yet effective method aimed at addressing the challenge of semantic confusion inherent in FGOD. OM introduces orthogonal constraints in the feature space by decoupling features from the last layer of the classification branch with a class-wise or…

    Submitted 24 July, 2024; originally announced July 2024.