Skip to main content

Showing 1–50 of 5,953 results for author: Wang, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.18232  [pdf, other

    cs.CV

    LION: Linear Group RNN for 3D Object Detection in Point Clouds

    Authors: Zhe Liu, Jinghua Hou, Xinyu Wang, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, Xiang Bai

    Abstract: The benefit of transformers in large-scale 3D point cloud perception tasks, such as 3D object detection, is limited by their quadratic computation cost when modeling long-range relationships. In contrast, linear RNNs have low computational complexity and are suitable for long-range modeling. Toward this goal, we propose a simple and effective window-based framework built on LInear grOup RNN (i.e.,… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Project page: https://happinesslz.github.io/projects/LION/

  2. arXiv:2407.18121  [pdf, other

    cs.CV

    Efficient Inference of Vision Instruction-Following Models with Elastic Cache

    Authors: Zuyan Liu, Benlin Liu, Jiahui Wang, Yuhao Dong, Guangyi Chen, Yongming Rao, Ranjay Krishna, Jiwen Lu

    Abstract: In the field of instruction-following large vision-language models (LVLMs), the efficient deployment of these models faces challenges, notably due to the high memory demands of their key-value (KV) caches. Conventional cache management strategies for LLMs focus on cache eviction, which often fails to address the specific needs of multimodal instruction-following models. Recognizing this gap, in th… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  3. arXiv:2407.17910  [pdf, other

    stat.ML cs.AI cs.LG

    Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences

    Authors: Runpeng Dai, Jianing Wang, Fan Zhou, Shikai Luo, Zhiwei Qin, Chengchun Shi, Hongtu Zhu

    Abstract: Off-policy evaluation (OPE) is widely applied in sectors such as pharmaceuticals and e-commerce to evaluate the efficacy of novel products or policies from offline datasets. This paper introduces a causal deepset framework that relaxes several key structural assumptions, primarily the mean-field assumption, prevalent in existing OPE methodologies that handle spatio-temporal interference. These tra… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

  4. arXiv:2407.17838  [pdf

    cs.CV cs.AI

    UMono: Physical Model Informed Hybrid CNN-Transformer Framework for Underwater Monocular Depth Estimation

    Authors: Jian Wang, Jing Wang, Shenghui Rong, Bo He

    Abstract: Underwater monocular depth estimation serves as the foundation for tasks such as 3D reconstruction of underwater scenes. However, due to the influence of light and medium, the underwater environment undergoes a distinctive imaging process, which presents challenges in accurately estimating depth from a single image. The existing methods fail to consider the unique characteristics of underwater env… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

  5. arXiv:2407.17227  [pdf, other

    cs.AI cs.CL

    LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover

    Authors: Zijian Wu, Jiayu Wang, Dahua Lin, Kai Chen

    Abstract: Recently, large language models have presented promising results in aiding formal mathematical reasoning. However, their performance is restricted due to the scarcity of formal theorem-proving data, which requires additional effort to be extracted from raw formal language corpora. Meanwhile, a significant amount of human-written formal language corpora remains underutilized. To address this issue,… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

  6. arXiv:2407.16993  [pdf, other

    cs.CV

    LoFormer: Local Frequency Transformer for Image Deblurring

    Authors: Xintian Mao, Jiansheng Wang, Xingran Xie, Qingli Li, Yan Wang

    Abstract: Due to the computational complexity of self-attention (SA), prevalent techniques for image deblurring often resort to either adopting localized SA or employing coarse-grained global SA methods, both of which exhibit drawbacks such as compromising global modeling or lacking fine-grained correlation. In order to address this issue by effectively modeling long-range dependencies without sacrificing f… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

  7. arXiv:2407.16957  [pdf, other

    cs.CV

    Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal

    Authors: Yeying Jin, Xin Li, Jiadong Wang, Yan Zhang, Malu Zhang

    Abstract: Existing raindrop removal datasets have two shortcomings. First, they consist of images captured by cameras with a focus on the background, leading to the presence of blurry raindrops. To our knowledge, none of these datasets include images where the focus is specifically on raindrops, which results in a blurry background. Second, these datasets predominantly consist of daytime images, thereby lac… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV2024, dataset and benchmark at: \url{https://github.com/jinyeying/RaindropClarity}

  8. arXiv:2407.16955  [pdf, other

    cs.CV cs.RO

    DVPE: Divided View Position Embedding for Multi-View 3D Object Detection

    Authors: Jiasen Wang, Zhenglin Li, Ke Sun, Xianyuan Liu, Yang Zhou

    Abstract: Sparse query-based paradigms have achieved significant success in multi-view 3D detection for autonomous vehicles. Current research faces challenges in balancing between enlarging receptive fields and reducing interference when aggregating multi-view features. Moreover, different poses of cameras present challenges in training global attention models. To address these problems, this paper proposes… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  9. arXiv:2407.16928  [pdf, other

    cs.CR

    From Sands to Mansions: Enabling Automatic Full-Life-Cycle Cyberattack Construction with LLM

    Authors: Lingzhi Wang, Jiahui Wang, Kyle Jung, Kedar Thiagarajan, Emily Wei, Xiangmin Shen, Yan Chen, Zhenyuan Li

    Abstract: The escalating battles between attackers and defenders in cybersecurity make it imperative to test and evaluate defense capabilities from the attackers' perspective. However, constructing full-life-cycle cyberattacks and performing red team emulations requires significant time and domain knowledge from security experts. Existing cyberattack simulation frameworks face challenges such as limited tec… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  10. arXiv:2407.16822  [pdf, other

    cs.CV cs.AI

    AI-Enhanced 7-Point Checklist for Melanoma Detection Using Clinical Knowledge Graphs and Data-Driven Quantification

    Authors: Yuheng Wang, Tianze Yu, Jiayue Cai, Sunil Kalia, Harvey Lui, Z. Jane Wang, Tim K. Lee

    Abstract: The 7-point checklist (7PCL) is widely used in dermoscopy to identify malignant melanoma lesions needing urgent medical attention. It assigns point values to seven attributes: major attributes are worth two points each, and minor ones are worth one point each. A total score of three or higher prompts further evaluation, often including a biopsy. However, a significant limitation of current methods… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  11. arXiv:2407.16788  [pdf, other

    cs.CV

    Occlusion-Aware 3D Motion Interpretation for Abnormal Behavior Detection

    Authors: Su Li, Wang Liang, Jianye Wang, Ziheng Zhang, Lei Zhang

    Abstract: Estimating abnormal posture based on 3D pose is vital in human pose analysis, yet it presents challenges, especially when reconstructing 3D human poses from monocular datasets with occlusions. Accurate reconstructions enable the restoration of 3D movements, which assist in the extraction of semantic details necessary for analyzing abnormal behaviors. However, most existing methods depend on predef… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  12. arXiv:2407.16697  [pdf, other

    cs.CV

    AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking

    Authors: Wenxuan Li, Chongyu Qu, Xiaoxi Chen, Pedro R. A. S. Bassi, Yijia Shi, Yuxiang Lai, Qian Yu, Huimin Xue, Yixiong Chen, Xiaorui Lin, Yutong Tang, Yining Cao, Haoqi Han, Zheyuan Zhang, Jiawei Liu, Tiezheng Zhang, Yujiu Ma, Jincheng Wang, Guang Zhang, Alan Yuille, Zongwei Zhou

    Abstract: We introduce the largest abdominal CT dataset (termed AbdomenAtlas) of 20,460 three-dimensional CT volumes sourced from 112 hospitals across diverse populations, geographies, and facilities. AbdomenAtlas provides 673K high-quality masks of anatomical structures in the abdominal region annotated by a team of 10 radiologists with the help of AI algorithms. We start by having expert radiologists manu… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Published in Medical Image Analysis

  13. arXiv:2407.16210  [pdf, other

    cs.GR cs.AI cs.LG

    Strategy and Skill Learning for Physics-based Table Tennis Animation

    Authors: Jiashun Wang, Jessica Hodgins, Jungdam Won

    Abstract: Recent advancements in physics-based character animation leverage deep learning to generate agile and natural motion, enabling characters to execute movements such as backflips, boxing, and tennis. However, reproducing the selection and use of diverse motor skills in dynamic environments to solve complex tasks, as humans do, still remains a challenge. We present a strategy and skill learning appro… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: SIGGRAPH 2024

  14. arXiv:2407.16207  [pdf, other

    cs.CL

    Graph-Structured Speculative Decoding

    Authors: Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

    Abstract: Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models (LLMs) by employing a small language model to draft a hypothesis sequence, which is then validated by the LLM. The effectiveness of this approach heavily relies on the balance between performance and efficiency of the draft model. In our research, we focus on enhancing the proportion of d… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  15. arXiv:2407.16154  [pdf, other

    cs.CL

    DDK: Distilling Domain Knowledge for Efficient Large Language Models

    Authors: Jiaheng Liu, Chenchen Zhang, Jinyang Guo, Yuanxing Zhang, Haoran Que, Ken Deng, Zhiqi Bai, Jie Liu, Ge Zhang, Jiakai Wang, Yanan Wu, Congnan Liu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

    Abstract: Despite the advanced intelligence abilities of large language models (LLMs) in various applications, they still face significant computational and storage demands. Knowledge Distillation (KD) has emerged as an effective strategy to improve the performance of a smaller LLM (i.e., the student model) by transferring knowledge from a high-performing LLM (i.e., the teacher model). Prevailing techniques… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

  16. arXiv:2407.15819  [pdf, other

    cs.CV

    Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight

    Authors: Ziyuan Huang, Kaixiang Ji, Biao Gong, Zhiwu Qing, Qinglong Zhang, Kecheng Zheng, Jian Wang, Jingdong Chen, Ming Yang

    Abstract: This paper introduces Chain-of-Sight, a vision-language bridge module that accelerates the pre-training of Multimodal Large Language Models (MLLMs). Our approach employs a sequence of visual resamplers that capture visual details at various spacial scales. This architecture not only leverages global and local visual contexts effectively, but also facilitates the flexible extension of visual tokens… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

  17. arXiv:2407.15567  [pdf, other

    cs.LG cs.DC cs.IT math.OC

    A New Theoretical Perspective on Data Heterogeneity in Federated Optimization

    Authors: Jiayi Wang, Shiqiang Wang, Rong-Rong Chen, Mingyue Ji

    Abstract: In federated learning (FL), data heterogeneity is the main reason that existing theoretical analyses are pessimistic about the convergence rate. In particular, for many FL algorithms, the convergence rate grows dramatically when the number of local updates becomes large, especially when the product of the gradient divergence and local Lipschitz constant is large. However, empirical studies can sho… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: ICML 2024

  18. arXiv:2407.15362  [pdf, other

    cs.CV cs.AI

    A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model

    Authors: Yingxue Xu, Yihui Wang, Fengtao Zhou, Jiabo Ma, Shu Yang, Huangjing Lin, Xin Wang, Jiguang Wang, Li Liang, Anjia Han, Ronald Cheong Kin Chan, Hao Chen

    Abstract: Remarkable strides in computational pathology have been made in the task-agnostic foundation model that advances the performance of a wide array of downstream clinical tasks. Despite the promising performance, there are still several challenges. First, prior works have resorted to either vision-only or vision-captions data, disregarding invaluable pathology reports and gene expression profiles whi… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 44 pages, 9 figures

  19. arXiv:2407.15334  [pdf, other

    cs.CV

    Explore the LiDAR-Camera Dynamic Adjustment Fusion for 3D Object Detection

    Authors: Yiran Yang, Xu Gao, Tong Wang, Xin Hao, Yifeng Shi, Xiao Tan, Xiaoqing Ye, Jingdong Wang

    Abstract: Camera and LiDAR serve as informative sensors for accurate and robust autonomous driving systems. However, these sensors often exhibit heterogeneous natures, resulting in distributional modality gaps that present significant challenges for fusion. To address this, a robust fusion technique is crucial, particularly for enhancing 3D object detection. In this paper, we introduce a dynamic adjustment… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

  20. arXiv:2407.15212  [pdf, other

    cs.CV cs.GR

    Surfel-based Gaussian Inverse Rendering for Fast and Relightable Dynamic Human Reconstruction from Monocular Video

    Authors: Yiqun Zhao, Chenming Wu, Binbin Huang, Yihao Zhi, Chen Zhao, Jingdong Wang, Shenghua Gao

    Abstract: Efficient and accurate reconstruction of a relightable, dynamic clothed human avatar from a monocular video is crucial for the entertainment industry. This paper introduces the Surfel-based Gaussian Inverse Avatar (SGIA) method, which introduces efficient training and rendering for relightable dynamic human reconstruction. SGIA advances previous Gaussian Avatar methods by comprehensively modeling… ▽ More

    Submitted 23 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: Under Review; Project Page: https://GS-IA.github.io

  21. arXiv:2407.15026  [pdf, other

    cs.AR cs.AI

    Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms

    Authors: Zhihai Wang, Zijie Geng, Zhaojie Tu, Jie Wang, Yuxi Qian, Zhexuan Xu, Ziyan Liu, Siyuan Xu, Zhentao Tang, Shixiong Kai, Mingxuan Yuan, Jianye Hao, Bin Li, Yongdong Zhang, Feng Wu

    Abstract: The increasing complexity of modern very-large-scale integration (VLSI) design highlights the significance of Electronic Design Automation (EDA) technologies. Chip placement is a critical step in the EDA workflow, which positions chip modules on the canvas with the goal of optimizing performance, power, and area (PPA) metrics of final chip designs. Recent advances have demonstrated the great poten… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: A comprehensive benchmark for AI-based chip placement algorithms using end-to-end performance metrics

  22. arXiv:2407.14882  [pdf, other

    cs.LG cs.AI math.NA

    Reduced Effectiveness of Kolmogorov-Arnold Networks on Functions with Noise

    Authors: Haoran Shen, Chen Zeng, Jiahui Wang, Qiao Wang

    Abstract: It has been observed that even a small amount of noise introduced into the dataset can significantly degrade the performance of KAN. In this brief note, we aim to quantitatively evaluate the performance when noise is added to the dataset. We propose an oversampling technique combined with denoising to alleviate the impact of noise. Specifically, we employ kernel filtering based on diffusion maps f… ▽ More

    Submitted 20 July, 2024; originally announced July 2024.

    MSC Class: 68T07

  23. arXiv:2407.14605  [pdf, other

    cs.CV cs.AI

    ESCAPE: Energy-based Selective Adaptive Correction for Out-of-distribution 3D Human Pose Estimation

    Authors: Luke Bidulka, Mohsen Gholami, Jiannan Zheng, Martin J. McKeown, Z. Jane Wang

    Abstract: Despite recent advances in human pose estimation (HPE), poor generalization to out-of-distribution (OOD) data remains a difficult problem. While previous works have proposed Test-Time Adaptation (TTA) to bridge the train-test domain gap by refining network parameters at inference, the absence of ground-truth annotations makes it highly challenging and existing methods typically increase inference… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 32 pages, 8 figures

    ACM Class: I.2.6; I.2.10

  24. arXiv:2407.14568  [pdf, other

    cs.CL cs.AI cs.DB

    SQLfuse: Enhancing Text-to-SQL Performance through Comprehensive LLM Synergy

    Authors: Tingkai Zhang, Chaoyu Chen, Cong Liao, Jun Wang, Xudong Zhao, Hang Yu, Jianchao Wang, Jianguo Li, Wenhui Shi

    Abstract: Text-to-SQL conversion is a critical innovation, simplifying the transition from complex SQL to intuitive natural language queries, especially significant given SQL's prevalence in the job market across various roles. The rise of Large Language Models (LLMs) like GPT-3.5 and GPT-4 has greatly advanced this field, offering improved natural language understanding and the ability to generate nuanced… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  25. arXiv:2407.14532  [pdf, other

    cs.DC cs.LG

    A Scenario-Oriented Benchmark for Assessing AIOps Algorithms in Microservice Management

    Authors: Yongqian Sun, Jiaju Wang, Zhengdan Li, Xiaohui Nie, Minghua Ma, Shenglin Zhang, Yuhe Ji, Lu Zhang, Wen Long, Hengmao Chen, Yongnan Luo, Dan Pei

    Abstract: AIOps algorithms play a crucial role in the maintenance of microservice systems. Many previous benchmarks' performance leaderboard provides valuable guidance for selecting appropriate algorithms. However, existing AIOps benchmarks mainly utilize offline datasets to evaluate algorithms. They cannot consistently evaluate the performance of algorithms using real-time datasets, and the operation scena… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Codes are available at https://github.com/MicroServo/microservo, datasets are available at https://github.com/MicroServo/hot-plugging

  26. arXiv:2407.14100  [pdf, other

    cs.GR cs.AI cs.LG

    ParamsDrag: Interactive Parameter Space Exploration via Image-Space Dragging

    Authors: Guan Li, Yang Liu, Guihua Shan, Shiyu Cheng, Weiqun Cao, Junpeng Wang, Ko-Chih Wang

    Abstract: Numerical simulation serves as a cornerstone in scientific modeling, yet the process of fine-tuning simulation parameters poses significant challenges. Conventionally, parameter adjustment relies on extensive numerical simulations, data analysis, and expert insights, resulting in substantial computational costs and low efficiency. The emergence of deep learning in recent years has provided promisi… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: To be published in Proc. IEEE VIS 2024

  27. arXiv:2407.14069  [pdf, other

    cs.CV

    Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective

    Authors: Zeen Song, Jingyao Wang, Jianqi Zhang, Changwen Zheng, Wenwen Qiang

    Abstract: Video contrastive learning (v-CL) has gained prominence as a leading framework for unsupervised video representation learning, showcasing impressive performance across various tasks such as action classification and detection. In the field of video representation learning, a feature extractor should ideally capture both static and dynamic semantics. However, our series of experiments reveals that… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  28. arXiv:2407.14058  [pdf, other

    cs.LG

    On the Causal Sufficiency and Necessity of Multi-Modal Representation Learning

    Authors: Jingyao Wang, Wenwen Qiang, Jiangmeng Li, Lingyu Si, Changwen Zheng, Bing Su

    Abstract: An effective paradigm of multi-modal learning (MML) is to learn unified representations among modalities. From a causal perspective, constraining the consistency between different modalities can mine causal representations that convey primary events. However, such simple consistency may face the risk of learning insufficient or unnecessary information: a necessary but insufficient cause is invaria… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  29. arXiv:2407.13998  [pdf, other

    cs.CL cs.AI

    RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering

    Authors: Rujun Han, Yuhao Zhang, Peng Qi, Yumo Xu, Jenyuan Wang, Lan Liu, William Yang Wang, Bonan Min, Vittorio Castelli

    Abstract: Question answering based on retrieval augmented generation (RAG-QA) is an important research topic in NLP and has a wide range of real-world applications. However, most existing datasets for this task are either constructed using a single source corpus or consist of short extractive answers, which fall short of evaluating large language model (LLM) based RAG-QA systems on cross-domain generalizati… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  30. arXiv:2407.13520  [pdf, other

    cs.CV

    EaDeblur-GS: Event assisted 3D Deblur Reconstruction with Gaussian Splatting

    Authors: Yuchen Weng, Zhengwen Shen, Ruofan Chen, Qi Wang, Jun Wang

    Abstract: 3D deblurring reconstruction techniques have recently seen significant advancements with the development of Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although these techniques can recover relatively clear 3D reconstructions from blurry image inputs, they still face limitations in handling severe blurring and complex camera motion. To address these issues, we propose Event-ass… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  31. arXiv:2407.13284  [pdf, other

    cs.IR

    Semantic-aware Representation Learning for Homography Estimation

    Authors: Yuhan Liu, Qianxin Huang, Siqi Hui, Jingwen Fu, Sanping Zhou, Kangyi Wu, Pengna Li, Jinjun Wang

    Abstract: Homography estimation is the task of determining the transformation from an image pair. Our approach focuses on employing detector-free feature matching methods to address this issue. Previous work has underscored the importance of incorporating semantic information, however there still lacks an efficient way to utilize semantic information. Previous methods suffer from treating the semantics as a… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  32. arXiv:2407.13278  [pdf, other

    cs.LG

    Deep Time Series Models: A Comprehensive Survey and Benchmark

    Authors: Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Yong Liu, Mingsheng Long, Jianmin Wang

    Abstract: Time series, characterized by a sequence of data points arranged in a discrete-time order, are ubiquitous in real-world applications. Different from other modalities, time series present unique challenges due to their complex and dynamic nature, including the entanglement of nonlinear patterns and time-variant trends. Analyzing time series data is of great significance in real-world scenarios and… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: \

  33. arXiv:2407.13220  [pdf, other

    eess.AS cs.SD

    MEDIC: Zero-shot Music Editing with Disentangled Inversion Control

    Authors: Huadai Liu, Jialei Wang, Rongjie Huang, Yang Liu, Jiayang Xu, Zhou Zhao

    Abstract: Text-guided diffusion models catalyze a paradigm shift in audio generation, facilitating the adaptability of source audio to conform to specific textual prompts. Recent advancements introduce inversion techniques, like DDIM inversion, to zero-shot editing, exploiting pre-trained diffusion models for audio modification. Nonetheless, our investigation exposes that DDIM inversion suffers from an accu… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  34. arXiv:2407.13201  [pdf, other

    cs.SE

    $μ$Drive: User-Controlled Autonomous Driving

    Authors: Kun Wang, Christopher M. Poskitt, Yang Sun, Jun Sun, Jingyi Wang, Peng Cheng, Jiming Chen

    Abstract: Autonomous Vehicles (AVs) rely on sophisticated Autonomous Driving Systems (ADSs) to provide passengers a satisfying and safe journey. The individual preferences of riders plays a crucial role in shaping the perception of safety and comfort while they are in the car. Existing ADSs, however, lack mechanisms to systematically capture and integrate rider preferences into their planning modules. To br… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  35. arXiv:2407.13137  [pdf, other

    cs.CV

    OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation

    Authors: Jian Sun, Yuqi Dai, Chi-Man Vong, Qing Xu, Shengbo Eben Li, Jianqiang Wang, Lei He, Keqiang Li

    Abstract: Bird's-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems. It realizes ego-vehicle surrounding environment perception by projecting 2D multi-view images into 3D world space. Recently, BEV segmentation has made notable progress, attributed to better view transformation modules, larger image encoders, or more temporal information. However, there are still two issu… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  36. arXiv:2407.12962  [pdf, other

    cs.RO

    NAS: N-step computation of All Solutions to the footstep planning problem

    Authors: Jiayi Wang, Saeid Samadi, Hefan Wang, Pierre Fernbach, Olivier Stasse, Sethu Vijayakumar, Steve Tonneau

    Abstract: How many ways are there to climb a staircase in a given number of steps? Infinitely many, if we focus on the continuous aspect of the problem. A finite, possibly large number if we consider the discrete aspect, i.e. on which surface which effectors are going to step and in what order. We introduce NAS, an algorithm that considers both aspects simultaneously and computes all the possible solutions… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Submitted to Humanoids 2024

  37. arXiv:2407.12940  [pdf, other

    cs.RO cs.CV

    KiGRAS: Kinematic-Driven Generative Model for Realistic Agent Simulation

    Authors: Jianbo Zhao, Jiaheng Zhuang, Qibin Zhou, Taiyu Ban, Ziyao Xu, Hangning Zhou, Junhe Wang, Guoan Wang, Zhiheng Li, Bin Li

    Abstract: Trajectory generation is a pivotal task in autonomous driving. Recent studies have introduced the autoregressive paradigm, leveraging the state transition model to approximate future trajectory distributions. This paradigm closely mirrors the real-world trajectory generation process and has achieved notable success. However, its potential is limited by the ineffective representation of realistic t… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  38. arXiv:2407.12797  [pdf, other

    cs.PF cs.LG

    CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines

    Authors: Wenbo Sun, Jiaqi Wang, Qiming Guo, Ziyu Li, Wenlu Wang, Rihan Hai

    Abstract: Online Large Language Model (LLM) services such as ChatGPT and Claude 3 have transformed business operations and academic research by effortlessly enabling new opportunities. However, due to data-sharing restrictions, sectors such as healthcare and finance prefer to deploy local LLM applications using costly hardware resources. This scenario requires a balance between the effectiveness advantages… ▽ More

    Submitted 20 June, 2024; originally announced July 2024.

  39. arXiv:2407.12791  [pdf, other

    cs.CL cs.AI

    TourLLM: Enhancing LLMs with Tourism Knowledge

    Authors: Qikai Wei, Mingzhi Yang, Jinqiang Wang, Wenwei Mao, Jiabo Xu, Huansheng Ning

    Abstract: Recently, large language models (LLMs) have demonstrated their effectiveness in various natural language processing (NLP) tasks. However, the lack of tourism knowledge limits the performance of LLMs in tourist attraction presentations and travel planning. To address this challenge, we constructed a supervised fine-tuning dataset for the culture and tourism domain, named Cultour. This dataset consi… ▽ More

    Submitted 18 June, 2024; originally announced July 2024.

  40. arXiv:2407.12758  [pdf, other

    cs.CV

    Mutual Information Guided Optimal Transport for Unsupervised Visible-Infrared Person Re-identification

    Authors: Zhizhong Zhang, Jiangming Wang, Xin Tan, Yanyun Qu, Junping Wang, Yong Xie, Yuan Xie

    Abstract: Unsupervised visible infrared person re-identification (USVI-ReID) is a challenging retrieval task that aims to retrieve cross-modality pedestrian images without using any label information. In this task, the large cross-modality variance makes it difficult to generate reliable cross-modality labels, and the lack of annotations also provides additional difficulties for learning modality-invariant… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  41. arXiv:2407.12565  [pdf, other

    cs.AR

    SigDLA: A Deep Learning Accelerator Extension for Signal Processing

    Authors: Fangfa Fu, Wenyu Zhang, Zesong Jiang, Zhiyu Zhu, Guoyu Li, Bing Yang, Cheng Liu, Liyi Xiao, Jinxiang Wang, Huawei Li, Xiaowei Li

    Abstract: Deep learning and signal processing are closely correlated in many IoT scenarios such as anomaly detection to empower intelligence of things. Many IoT processors utilize digital signal processors (DSPs) for signal processing and build deep learning frameworks on this basis. While deep learning is usually much more computing-intensive than signal processing, the computing efficiency of deep learnin… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  42. arXiv:2407.12491  [pdf, other

    cs.CV

    Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving

    Authors: Yuqi Dai, Jian Sun, Shengbo Eben Li, Qing Xu, Jianqiang Wang, Lei He, Keqiang Li

    Abstract: Perception is essential for autonomous driving system. Recent approaches based on Bird's-eye-view (BEV) and deep learning have made significant progress. However, there exists challenging issues including lengthy development cycles, poor reusability, and complex sensor setups in perception algorithm development process. To tackle the above challenges, this paper proposes a novel hierarchical Bird'… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  43. arXiv:2407.12423  [pdf, other

    cs.HC cs.AI

    StuGPTViz: A Visual Analytics Approach to Understand Student-ChatGPT Interactions

    Authors: Zixin Chen, Jiachen Wang, Meng Xia, Kento Shigyo, Dingdong Liu, Rong Zhang, Huamin Qu

    Abstract: The integration of Large Language Models (LLMs), especially ChatGPT, into education is poised to revolutionize students' learning experiences by introducing innovative conversational learning methodologies. To empower students to fully leverage the capabilities of ChatGPT in educational scenarios, understanding students' interaction patterns with ChatGPT is crucial for instructors. However, this e… ▽ More

    Submitted 21 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: 11 pages. To be published at IEEE Visualization 2024

  44. arXiv:2407.12309  [pdf, other

    cs.CL

    MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models

    Authors: Thao Minh Nguyen Phan, Cong-Tinh Dao, Chenwei Wu, Jian-Zhe Wang, Shun Liu, Jun-En Ding, David Restrepo, Feng Liu, Fang-Ming Hung, Wen-Chih Peng

    Abstract: Electronic health records (EHRs) are multimodal by nature, consisting of structured tabular features like lab tests and unstructured clinical notes. In real-life clinical practice, doctors use complementary multimodal EHR data sources to get a clearer picture of patients' health and support clinical decision-making. However, most EHR predictive models do not reflect these procedures, as they eithe… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  45. Performance Analysis and Blocklength Minimization of Uplink RSMA for Short Packet Transmissions in URLLC

    Authors: Yixin Zhang, Wenchi Cheng, Jingqing Wang, Wei Zhang

    Abstract: Rate splitting multiple access (RSMA) is one of the most promising techniques for ultra-reliable and low-latency communications (URLLC) with stringent requirements on delay and reliability of multiple access. To fully explore the delay performance enhancement brought by uplink RSMA to URLLC, in this paper, we evaluate the performance of two-user uplink RSMA and propose the corresponding blocklengt… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Journal ref: GLOBECOM 2023 - 2023 IEEE Global Communications Conference

  46. arXiv:2407.12237  [pdf, other

    cs.IT

    Delay Tradeoff and Adaptive Finite Blocklength Framework for URLLC

    Authors: Yixin Zhang, Wenchi Cheng, Jingqing Wang, Wei Zhang

    Abstract: With various time-sensitive tasks to be served, ultra-reliable and low-latency communications (URLLC) has become one of the most important scenarios for the fifth generation (5G) wireless communications. The end-to-end delay from the sub-millisecond-level to the second-level is first put forward for a wide range of delay-sensitive tasks in the future sixth generation (6G) communication networks, w… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures

  47. arXiv:2407.11691  [pdf, other

    cs.CV

    VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

    Authors: Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen

    Abstract: We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models and publish reproducible evaluation results. In VLMEvalKit, we implement over 70 different large multi-modality models, including both proprietary… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  48. arXiv:2407.11569  [pdf, other

    cs.CV

    SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds

    Authors: Yanbo Wang, Wentao Zhao, Chuan Cao, Tianchen Deng, Jingchuan Wang, Weidong Chen

    Abstract: Although LiDAR semantic segmentation advances rapidly, state-of-the-art methods often incorporate specifically designed inductive bias derived from benchmarks originating from mechanical spinning LiDAR. This can limit model generalizability to other kinds of LiDAR technologies and make hyperparameter tuning more complex. To tackle these issues, we propose a generalized framework to accommodate var… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  49. arXiv:2407.11419  [pdf, other

    cs.CV

    TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs

    Authors: Chenfan Xu, Zhentao Liu, Yuan Liu, Yulong Dou, Jiamin Wu, Jiepeng Wang, Minjiao Wang, Dinggang Shen, Zhiming Cui

    Abstract: Orthodontic treatment usually requires regular face-to-face examinations to monitor dental conditions of the patients. When in-person diagnosis is not feasible, an alternative is to utilize five intra-oral photographs for remote dental monitoring. However, it lacks of 3D information, and how to reconstruct 3D dental models from such sparse view photographs is a challenging problem. In this study,… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: MICCAI2024

  50. arXiv:2407.11335  [pdf, other

    cs.CV

    LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

    Authors: Penghui Du, Yu Wang, Yifan Sun, Luting Wang, Yue Liao, Gang Zhang, Errui Ding, Yan Wang, Jingdong Wang, Si Liu

    Abstract: Existing methods enhance open-vocabulary object detection by leveraging the robust open-vocabulary recognition capabilities of Vision-Language Models (VLMs), such as CLIP.However, two main challenges emerge:(1) A deficiency in concept representation, where the category names in CLIP's text space lack textual and visual knowledge.(2) An overfitting tendency towards base categories, with the open vo… ▽ More

    Submitted 18 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV2024