
Effects of fiber number and density on fiber jamming: Towards follow-the-leader deployment of a continuum robot

Authors: Chen Qian, Tangyou Liu, Liao Wu

Abstract: Fiber jamming modules (FJMs) offer flexibility and quick stiffness variation, making them suitable for follow-the-leader (FTL) motions in continuum robots, which is ideal for minimally invasive surgery (MIS). However, their potential has not been fully exploited, particularly in designing and manufacturing small-sized FJMs with high stiffness variation. Although existing research has focused on factors like fiber materials and geometry to maximize stiffness variation, the results often do not apply to FJMs for MIS due to size constraints. Meanwhile, other factors such as fiber number and packing density, less significant to large FJMs but critical to small-sized FJMs, have received insufficient investigation regarding their impact on the stiffness variation for FTL deployment. In this paper, we design and fabricate FJMs with a diameter of 4mm. Through theoretical and experimental analysis, we find that fiber number and packing density significantly affect both absolute stiffness and stiffness variation. Our experiments confirm the feasibility of using FJMs in a medical FTL robot design. The optimal configuration is a 4mm FJM with 0.4mm fibers at a 56% packing density, achieving up to 3400% stiffness variation. A video demonstration of a prototype robot using the suggested parameters for achieving FTL motions can be found at https://youtu.be/7pI5U0z7kcE.

Submitted 24 August, 2024; originally announced August 2024.

Comments: 6 pages, 6 figures, accepted by IROS2024
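Fiber number and packing density are linked through the module and fiber diameters. As a rough sketch, assume packing density is the total fiber cross-sectional area divided by the module's circular cross-section, with the 4mm diameter taken as the packed region (both assumptions; the abstract does not define the term). Then N = φ·(D/d)², so the reported 4mm / 0.4mm / 56% configuration corresponds to roughly 56 fibers. The helper names are hypothetical.

```python
def fiber_count(module_diameter_mm: float, fiber_diameter_mm: float,
                packing_density: float) -> int:
    """Approximate fiber count for a circular module (hypothetical helper).

    Assumes packing density = N * (fiber area) / (module area),
    i.e. N = phi * (D / d) ** 2, rounded to the nearest whole fiber.
    """
    if not 0.0 < packing_density < 1.0:
        raise ValueError("packing density must lie in (0, 1)")
    ratio = module_diameter_mm / fiber_diameter_mm
    return round(packing_density * ratio ** 2)


def packing_density(module_diameter_mm: float, fiber_diameter_mm: float,
                    n_fibers: int) -> float:
    """Inverse relation: area fraction occupied by n_fibers round fibers."""
    return n_fibers * (fiber_diameter_mm / module_diameter_mm) ** 2


print(fiber_count(4.0, 0.4, 0.56))  # 56
```

Under this model, halving the fiber diameter at a fixed density quadruples the fiber count, which is one reason fiber number and density cannot be chosen independently in a small module.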

arXiv:2408.11799 [pdf, other]

Practical token pruning for foundation models in few-shot conversational virtual assistant systems

Authors: Haode Qi, Cheng Qian, Jian Ni, Pratyush Singh, Reza Fazeli, Gengyu Wang, Zhongzheng Shu, Eric Wayne, Juergen Bross

Abstract: In an enterprise Virtual Assistant (VA) system, intent classification is the crucial component that determines how a user input is handled based on what the user wants. The VA system is expected to be a cost-efficient SaaS service with low training and inference time while achieving high accuracy even with a small number of training samples. We pretrain a transformer-based sentence embedding model with a contrastive learning objective and leverage the embedding of the model as features when training intent classification models. Our approach achieves state-of-the-art results for few-shot scenarios and performs better than other commercial solutions on popular intent classification benchmarks. However, generating features via a transformer-based model increases the inference time, especially for longer user inputs, due to the quadratic runtime of the transformer's attention mechanism. On top of model distillation, we introduce a practical multi-task adaptation approach that configures dynamic token pruning without the need for task-specific training for intent classification. We demonstrate that this approach improves the inference speed of popular sentence transformer models without affecting model performance.

Submitted 21 August, 2024; originally announced August 2024.

Comments: 6 pages, 3 figures
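Dynamic token pruning generally means dropping low-importance tokens partway through the encoder so that later attention layers run on a shorter sequence. The abstract does not give the pruning criterion, so the sketch below assumes a common heuristic: rank tokens by the attention they receive from a [CLS]-style token and keep a fixed fraction. The function `prune_tokens` and its `keep_ratio` parameter are hypothetical.

```python
import numpy as np


def prune_tokens(hidden, cls_attention, keep_ratio=0.5):
    """Keep the [CLS] token plus the highest-scoring fraction of the others.

    hidden:        (seq_len, dim) token representations; index 0 is [CLS]
    cls_attention: (seq_len,) attention weights from [CLS] to each token
    Returns the pruned hidden states and the kept indices in original order.
    """
    seq_len = hidden.shape[0]
    n_keep = max(1, int(round((seq_len - 1) * keep_ratio)))
    # Rank non-CLS tokens by importance (highest first), shift back by one.
    top = np.argsort(-cls_attention[1:])[:n_keep] + 1
    kept = np.concatenate(([0], np.sort(top)))
    return hidden[kept], kept


rng = np.random.default_rng(0)
h = rng.normal(size=(9, 4))
attn = np.array([0.0, 0.3, 0.05, 0.2, 0.01, 0.15, 0.02, 0.25, 0.02])
pruned, kept = prune_tokens(h, attn, keep_ratio=0.5)
print(kept.tolist())  # [0, 1, 3, 5, 7]
```

Because self-attention cost grows with the square of the sequence length, keeping half of the tokens reduces the attention cost of the remaining layers to roughly a quarter, which is where the speed-up on longer inputs would come from.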

arXiv:2408.03703 [pdf, other]

CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications

Authors: Tianfang Zhang, Lei Li, Yang Zhou, Wentao Liu, Chen Qian, Xiangyang Ji

Abstract: Vision Transformers (ViTs) mark a revolutionary advance in neural networks with their token mixer's powerful global context capability. However, the pairwise token affinity and complex matrix operations limit their deployment in resource-constrained scenarios and real-time applications, such as mobile devices, although considerable efforts have been made in previous works. In this paper, we introduce CAS-ViT: Convolutional Additive Self-attention Vision Transformers, to achieve a balance between efficiency and performance in mobile applications. Firstly, we argue that the capability of token mixers to obtain global contextual information hinges on multiple information interactions, such as spatial and channel domains. Subsequently, we construct a novel additive similarity function following this paradigm and present an efficient implementation named Convolutional Additive Token Mixer (CATM). This simplification leads to a significant reduction in computational overhead. We evaluate CAS-ViT across a variety of vision tasks, including image classification, object detection, instance segmentation, and semantic segmentation. Our experiments, conducted on GPUs, ONNX, and iPhones, demonstrate that CAS-ViT achieves competitive performance when compared to other state-of-the-art backbones, establishing it as a viable option for efficient mobile vision applications. Our code and model are available at: https://github.com/Tianfang-Zhang/CAS-ViT

Submitted 7 August, 2024; originally announced August 2024.

arXiv:2408.01916 [pdf, other]

MAO: A Framework for Process Model Generation with Multi-Agent Orchestration

Authors: Leilei Lin, Yumeng Jin, Yingming Zhou, Wenlong Chen, Chen Qian

Abstract: Process models are frequently used in software engineering to describe business requirements, guide software testing and control system improvement. However, traditional process modeling methods often require the participation of numerous experts, which is expensive and time-consuming. Therefore, the exploration of a more efficient and cost-effective automated modeling method has emerged as a focal point in current research. This article explores a framework for automatically generating process models with multi-agent orchestration (MAO), aiming to enhance the efficiency of process modeling and offer valuable insights for domain experts. Our framework MAO leverages large language models as the cornerstone for multiple agents, employing an innovative prompt strategy to ensure efficient collaboration among them. Specifically, MAO proceeds in four phases: 1) generation: MAO first generates a rough process model from the text description; 2) refinement: the agents continuously refine the initial process model through multiple rounds of dialogue; 3) reviewing: because large language models are prone to hallucination in multi-turn dialogues, the agents review and repair semantic hallucinations in the process models; 4) testing: because the representation of process models is diverse, the agents use external tools to test whether the generated process model contains format errors, namely format hallucinations, and then adjust the process model to conform to the output paradigm.
The experiments demonstrate that the process models generated by our framework outperform existing methods and surpass manual modeling by 89%, 61%, 52%, and 75% on four different datasets, respectively.

Submitted 7 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.
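The four MAO phases can be pictured as a simple control loop. In this sketch the "agents" are plain Python callables standing in for LLM calls; `ProcessModel`, the stub agents, and the format check are hypothetical, and only the phase ordering (generation, refinement, reviewing, testing) comes from the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProcessModel:
    # Activities plus directed control-flow edges between them.
    activities: set[str] = field(default_factory=set)
    edges: list[tuple[str, str]] = field(default_factory=list)


def format_errors(model: ProcessModel) -> list[str]:
    """Testing-phase stand-in: flag edges that reference unknown activities."""
    return [f"{a}->{b}" for a, b in model.edges
            if a not in model.activities or b not in model.activities]


def run_mao(text: str,
            generate: Callable[[str], ProcessModel],
            refine: Callable[[ProcessModel], ProcessModel],
            review: Callable[[ProcessModel], ProcessModel],
            repair_format: Callable[[ProcessModel, list[str]], ProcessModel],
            refine_rounds: int = 2, max_repairs: int = 3) -> ProcessModel:
    model = generate(text)              # 1) generation: rough draft
    for _ in range(refine_rounds):      # 2) refinement: multi-round dialogue
        model = refine(model)
    model = review(model)               # 3) reviewing: semantic hallucinations
    for _ in range(max_repairs):        # 4) testing: format hallucinations
        errors = format_errors(model)
        if not errors:
            break
        model = repair_format(model, errors)
    return model


# Toy stubs: split a comma-separated description into a linear flow,
# deliberately adding one edge that points at unknown activities.
def gen(text: str) -> ProcessModel:
    steps = [s.strip() for s in text.split(",")]
    return ProcessModel(set(steps), list(zip(steps, steps[1:])) + [("submit", "archive")])


def identity(model: ProcessModel) -> ProcessModel:
    return model


def drop_bad_edges(model: ProcessModel, errors: list[str]) -> ProcessModel:
    kept = [e for e in model.edges if f"{e[0]}->{e[1]}" not in errors]
    return ProcessModel(model.activities, kept)


result = run_mao("receive order, check stock, ship", gen, identity, identity, drop_bad_edges)
print(result.edges)  # [('receive order', 'check stock'), ('check stock', 'ship')]
```

Separating the format check (a deterministic tool) from the LLM-driven phases mirrors the abstract's point that format hallucinations are caught by external tools rather than by the agents themselves.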

arXiv:2407.18178 [pdf, other]

PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations

Authors: Cheng Qian, Julen Urain, Kevin Zakka, Jan Peters

Abstract: In this work, we introduce PianoMime, a framework for training a piano-playing agent using internet demonstrations. The internet is a promising source of large-scale demonstrations for training our robot agents. In particular, for the case of piano-playing, YouTube is full of videos of professional pianists playing a wide variety of songs. In our work, we leverage these demonstrations to learn a generalist piano-playing agent capable of playing arbitrary songs. Our framework is divided into three parts: a data preparation phase to extract the informative features from the YouTube videos, a policy learning phase to train song-specific expert policies from the demonstrations and a policy distillation phase to distil the policies into a single generalist agent. We explore different policy designs to represent the agent and evaluate the influence of the amount of training data on the generalization capability of the agent to novel songs not available in the dataset. We show that we are able to learn a policy with up to 56% F1 score on unseen songs.

Submitted 25 July, 2024; originally announced July 2024.
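The result is reported as an F1 score. One common way to score piano playing, assumed here since the abstract does not state the exact protocol, is micro-averaged precision and recall over (timestep, key) activations against the reference score. The function name and data layout are hypothetical.

```python
def key_press_f1(predicted, reference):
    """Micro-averaged F1 over (timestep, key) activations.

    predicted, reference: equal-length lists of sets; element t holds the
    piano keys (e.g. MIDI numbers) pressed at timestep t.
    """
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    tp = fp = fn = 0
    for pred_keys, ref_keys in zip(predicted, reference):
        tp += len(pred_keys & ref_keys)   # correct presses
        fp += len(pred_keys - ref_keys)   # extra presses
        fn += len(ref_keys - pred_keys)   # missed presses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


ref = [{60, 64}, {62}, set(), {65}]
pred = [{60}, {62, 63}, set(), {65}]
print(key_press_f1(pred, ref))  # 0.75
```

F1 penalizes both missed notes and spurious presses, so an agent cannot score well by simply pressing many keys.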

arXiv:2407.12344 [pdf, other]

The Better Angels of Machine Personality: How Personality Relates to LLM Safety

Authors: Jie Zhang, Dongrui Liu, Chen Qian, Ziyue Gan, Yong Liu, Yu Qiao, Jing Shao

Abstract: Personality psychologists have analyzed the relationship between personality and safety behaviors in human society. Although Large Language Models (LLMs) demonstrate personality traits, the relationship between personality traits and safety abilities in LLMs remains a mystery. In this paper, we discover that LLMs' personality traits are closely related to their safety abilities, i.e., toxicity, privacy, and fairness, based on the reliable MBTI-M scale. Meanwhile, the safety alignment generally increases various LLMs' Extraversion, Sensing, and Judging traits. According to such findings, we can edit LLMs' personality traits and improve their safety performance, e.g., inducing personality from ISTJ to ISTP resulted in a relative improvement of approximately 43% and 10% in privacy and fairness performance, respectively. Additionally, we find that LLMs with different personality traits are differentially susceptible to jailbreaks. This study pioneers the investigation of LLM safety from a personality perspective, providing new insights into LLM safety enhancement.

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.12027 [pdf, ps, other]

Idle is the New Sleep: Configuration-Aware Alternative to Powering Off FPGA-Based DL Accelerators During Inactivity

Authors: Chao Qian, Christopher Cichiwskyj, Tianheng Ling, Gregor Schiele

Abstract: In the rapidly evolving Internet of Things (IoT) domain, we concentrate on enhancing energy efficiency in Deep Learning accelerators on FPGA-based heterogeneous platforms, aligning with the principles of sustainable computing. Instead of focusing on the inference phase, we introduce innovative optimizations to minimize the overhead of the FPGA configuration phase. By fine-tuning configuration parameters correctly, we achieved a 40.13-fold reduction in configuration energy. Moreover, augmented with power-saving methods, our Idle-Waiting strategy outperformed the traditional On-Off strategy in duty-cycle mode for request periods up to 499.06 ms. Specifically, at a 40 ms request period within a 4147 J energy budget, this strategy extends the system lifetime to approximately 12.39x that of the On-Off strategy. Empirically validated through hardware measurements and simulations, these optimizations provide valuable insights and practical methods for achieving energy-efficient and sustainable deployments in IoT.

Submitted 28 June, 2024; originally announced July 2024.

Comments: Accepted by 37th GI/ITG International Conference on Architecture of Computing Systems (ARCS 2024)
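The Idle-Waiting versus On-Off trade-off can be sketched with a simple per-request energy model: On-Off pays the FPGA configuration energy on every request, while Idle-Waiting pays idle power for the gap between requests. All power and energy figures below are hypothetical placeholders, not measurements from the paper; only the 4147 J budget and 40 ms request period are taken from the abstract.

```python
def energy_per_request(period_s, inference_s, inference_w, *,
                       strategy, config_j=0.0, idle_w=0.0):
    """Energy (J) spent in one request period under a given strategy."""
    if inference_s > period_s:
        raise ValueError("inference must fit inside one request period")
    e_infer = inference_s * inference_w
    if strategy == "on-off":
        # Reconfigure the FPGA for every request, powered off otherwise.
        return config_j + e_infer
    if strategy == "idle-waiting":
        # Configured once (amortized away here), then idle between requests.
        return e_infer + idle_w * (period_s - inference_s)
    raise ValueError(f"unknown strategy: {strategy}")


def lifetime_s(budget_j, period_s, **kwargs):
    """Lifetime = number of requests the budget covers times the period."""
    return budget_j / energy_per_request(period_s, **kwargs) * period_s


def break_even_period(inference_s, config_j, idle_w):
    """Request period below which Idle-Waiting uses less energy than On-Off."""
    return inference_s + config_j / idle_w


# Hypothetical device parameters (NOT from the paper).
common = dict(inference_s=0.002, inference_w=0.2)
on_off = lifetime_s(4147, 0.040, strategy="on-off", config_j=0.010, **common)
idle = lifetime_s(4147, 0.040, strategy="idle-waiting", idle_w=0.05, **common)
print(f"idle-waiting lasts {idle / on_off:.2f}x longer")
print(f"break-even period: {break_even_period(0.002, 0.010, 0.05) * 1000:.0f} ms")
```

In this toy model, Idle-Waiting wins whenever the request period is shorter than t_inference + E_config / P_idle, which is why reducing configuration energy and idle power both shift the crossover point; the paper reports such a crossover at 499.06 ms for its real hardware.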

arXiv:2407.11321 [pdf, other]

TCFormer: Visual Recognition via Token Clustering Transformer

Authors: Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

Abstract: Transformers are widely used in computer vision areas and have achieved remarkable success. Most state-of-the-art approaches split images into regular grids and represent each grid region with a vision token. However, fixed token distribution disregards the semantic meaning of different image regions, resulting in sub-optimal performance. To address this issue, we propose the Token Clustering Transformer (TCFormer), which generates dynamic vision tokens based on semantic meaning. Our dynamic tokens possess two crucial characteristics: (1) representing image regions with similar semantic meanings using the same vision token, even if those regions are not adjacent, and (2) concentrating on regions with valuable details and representing them using fine tokens. Through extensive experimentation across various applications, including image classification, human pose estimation, semantic segmentation, and object detection, we demonstrate the effectiveness of our TCFormer. The code and models for this work are available at https://github.com/zengwang430521/TCFormer.

Submitted 15 July, 2024; originally announced July 2024.
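TCFormer groups tokens by semantic similarity rather than by grid position. The abstract does not name the clustering algorithm, so this sketch swaps in plain k-means over token features and merges each cluster into one token by averaging. It illustrates the idea of non-adjacent regions sharing a token; it is not TCFormer's actual clustering method.

```python
import numpy as np


def cluster_tokens(tokens, n_clusters, n_iters=10, seed=0):
    """Merge (N, C) tokens into n_clusters tokens via k-means (illustrative).

    Returns merged tokens of shape (n_clusters, C) and the (N,) assignment.
    """
    if n_iters < 1:
        raise ValueError("need at least one iteration")
    tokens = np.asarray(tokens, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers from distinct tokens; fancy indexing returns a copy.
    centers = tokens[rng.choice(len(tokens), size=n_clusters, replace=False)]
    for _ in range(n_iters):
        # Squared Euclidean distance between every token and every center.
        dists = ((tokens[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(n_clusters):
            members = tokens[assign == k]
            if len(members):  # keep the previous center if a cluster is empty
                centers[k] = members.mean(axis=0)
    return centers, assign


# Six toy 2-D "patch" features: indices 0, 1, 4 look alike, as do 2, 3, 5,
# even though they are interleaved in raster order.
toks = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0],
                 [5.1, 5.0], [0.0, 0.1], [5.0, 5.1]])
merged, assign = cluster_tokens(toks, n_clusters=2)
print(assign)
```

Each merged token is the mean of its members, so the downstream transformer sees two tokens instead of six, one per semantic group, regardless of where the patches sit spatially.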

arXiv:2407.11042 [pdf, other]

An Automated Approach to Collecting and Labeling Time Series Data for Event Detection Using Elastic Node Hardware

Authors: Tianheng Ling, Islam Mansour, Chao Qian, Gregor Schiele

Abstract: Recent advancements in IoT technologies have underscored the importance of using sensor data to understand environmental contexts effectively. This paper introduces a novel embedded system designed to autonomously label sensor data directly on IoT devices, thereby enhancing the efficiency of data collection methods. We present an integrated hardware and software solution equipped with specialized labeling sensors that streamline the capture and labeling of diverse types of sensor data. By implementing local processing with lightweight labeling methods, our system minimizes the need for extensive data transmission and reduces dependence on external resources. Experimental validation with collected data and a Convolutional Neural Network model achieved a high classification accuracy of up to 91.67%, as confirmed through 4-fold cross-validation. These results demonstrate the system's robust capability to collect audio and vibration data with correct labels.

Submitted 6 July, 2024; originally announced July 2024.

Comments: This paper is accepted by the 4th Workshop on Collaborative Technologies and Data Science in Smart City Applications (CODASSCA 2024)
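The accuracy figure is validated with 4-fold cross-validation. As a generic reminder of the protocol (not the authors' code), the samples are split into four disjoint folds and each fold serves once as the test set while the other three train the model. The `train_fn` and `eval_fn` callables are hypothetical placeholders.

```python
import random


def k_fold_indices(n_samples, k=4, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def cross_validate(n_samples, train_fn, eval_fn, k=4):
    """Return one score per fold (train_fn/eval_fn are placeholders)."""
    scores = []
    for train, test in k_fold_indices(n_samples, k):
        model = train_fn(train)
        scores.append(eval_fn(model, test))
    return scores


# Dummy callables: the "score" is just the test-fold size.
print(cross_validate(12, lambda train: None, lambda model, test: len(test)))
```

Every sample is tested exactly once, so a reported accuracy reflects performance on data the corresponding model never trained on.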

arXiv:2407.11041 [pdf, other]

Integer-only Quantized Transformers for Embedded FPGA-based Time-series Forecasting in AIoT

Authors: Tianheng Ling, Chao Qian, Gregor Schiele

Abstract: This paper presents the design of a hardware accelerator for Transformers, optimized for on-device time-series forecasting in AIoT systems. It integrates integer-only quantization and Quantization-Aware Training with optimized hardware designs to realize 6-bit and 4-bit quantized Transformer models, which achieved precision comparable to 8-bit quantized models from related research. Utilizing a complete implementation on an embedded FPGA (Xilinx Spartan-7 XC7S15), we examine the feasibility of deploying Transformer models on embedded IoT devices. This includes a thorough analysis of achievable precision, resource utilization, timing, power, and energy consumption for on-device inference. Our results indicate that while sufficient performance can be attained, the optimization process is not trivial. For instance, reducing the quantization bitwidth does not consistently result in decreased latency or energy consumption, underscoring the necessity of systematically exploring various optimization combinations. Compared to an 8-bit quantized Transformer model in related studies, our 4-bit quantized Transformer model increases test loss by only 0.63%, operates up to 132.33x faster, and consumes 48.19x less energy.

Submitted 6 July, 2024; originally announced July 2024.

Comments: The paper is accepted by 2024 IEEE Annual Congress on Artificial Intelligence of Things (IEEE AIoT)
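To make "integer-only quantization" at 4 and 6 bits concrete, here is a minimal sketch of symmetric, per-tensor, signed quantization. The paper's exact scheme (including how scales are folded into integer arithmetic on the FPGA) is not given in the abstract, so treat this as a generic illustration: values map to integers in [-(2^(b-1) - 1), 2^(b-1) - 1] with a single scale, products accumulate in int32, and rescaling happens once at the end.

```python
import numpy as np


def quantize_symmetric(x, bits):
    """Map a float array to signed integers with one per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit, 31 for 6-bit
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale


def dequantize(q, scale):
    return q.astype(np.float64) * scale


def int_dot(qa, qb):
    """Integer-only accumulation; the real value is scale_a * scale_b * result."""
    return qa.astype(np.int32) @ qb.astype(np.int32)


x = np.array([-1.0, -0.3, 0.0, 0.45, 1.0])
w = np.array([0.5, -0.25, 1.0, 0.0, 0.75])
q4, s4 = quantize_symmetric(x, bits=4)
qw, sw = quantize_symmetric(w, bits=4)
print(q4.tolist())  # [-7, -2, 0, 3, 7]
approx = int(int_dot(q4, qw)) * s4 * sw
print(approx, float(x @ w))
```

The gap between `approx` and the float result shows how coarse 4-bit grids are; Quantization-Aware Training typically inserts this quantize/dequantize step into the forward pass so the model learns weights that tolerate it.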

arXiv:2407.10125 [pdf, other]

When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset

Authors: Yi Zhang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu

Abstract: Recent years have witnessed increasing research attention towards pedestrian detection by taking advantage of different sensor modalities (e.g. RGB, IR, Depth, LiDAR and Event). However, designing a unified generalist model that can effectively process diverse sensor modalities remains a challenge. This paper introduces MMPedestron, a novel generalist model for multimodal perception. Unlike previous specialist models that only process one or a pair of specific modality inputs, MMPedestron is able to process multiple modal inputs and their dynamic combinations. The proposed approach comprises a unified encoder for modal representation and fusion and a general head for pedestrian detection. We introduce two extra learnable tokens, i.e. MAA and MAF, for adaptive multi-modal feature fusion. In addition, we construct the MMPD dataset, the first large-scale benchmark for multi-modal pedestrian detection. This benchmark incorporates existing public datasets and a newly collected dataset called EventPed, covering a wide range of sensor modalities including RGB, IR, Depth, LiDAR, and Event data. With multi-modal joint training, our model achieves state-of-the-art performance on a wide range of pedestrian detection benchmarks, surpassing leading models tailored for specific sensor modalities. For example, it achieves 71.1 AP on COCO-Persons and 72.6 AP on LLVIP. Notably, our model achieves comparable performance to the InternImage-H model on CrowdHuman with 30x fewer parameters.
Code and data are available at https://github.com/BubblyYi/MMPedestron.

Submitted 14 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV'2024

arXiv:2407.09056 [pdf, other]

A Novel Quantum Realization of Jet Clustering in High-Energy Physics Experiments

Authors: Yongfeng Zhu, Weifeng Zhuang, Chen Qian, Yunheng Ma, Dong E. Liu, Manqi Ruan, Chen Zhou

Abstract: Exploring the application of quantum technologies to fundamental sciences holds the key to fostering innovation for both sides. In high-energy particle collisions, quarks and gluons are produced and immediately form collimated particle sprays known as jets. Accurate jet clustering is crucial as it retains the information of the originating quark or gluon and forms the basis for studying properties of the Higgs boson, which underlies the mechanism of mass generation for subatomic particles. For the first time, by mapping collision events into graphs, with particles as nodes and their angular separations as edges, we realize jet clustering using the Quantum Approximate Optimization Algorithm (QAOA), a hybrid quantum-classical algorithm for addressing classical combinatorial optimization problems with available quantum resources. Our results, derived from 30 qubits on a quantum computer simulator and 6 qubits on quantum computer hardware, demonstrate that jet clustering performance with QAOA is comparable to or even better than classical algorithms for a small-sized problem. This study highlights the feasibility of quantum computing to revolutionize jet clustering, bringing the practical application of quantum computing in high-energy physics experiments one step closer.

Submitted 12 July, 2024; originally announced July 2024.
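The abstract maps an event to a graph with particles as nodes and angular separations as edge weights, then solves a combinatorial optimization with QAOA. One natural reading, assumed here rather than confirmed by the abstract, is a two-jet Max-Cut: split the particles into two groups so that the summed angular separation across the cut is maximal. The sketch solves that objective by brute force on a toy event with made-up momenta; in a standard QAOA Max-Cut encoding, each particle would become one qubit.

```python
import itertools
import math


def angle(p, q):
    """Opening angle (rad) between two 3-momenta."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def cut_value(assignment, weights):
    """Sum of edge weights between particles assigned to different jets."""
    n = len(assignment)
    return sum(weights[i][j] for i in range(n) for j in range(i + 1, n)
               if assignment[i] != assignment[j])


def best_two_jet_split(momenta):
    """Exhaustive Max-Cut over 2^(n-1) labelings (particle 0 fixed to jet 0)."""
    n = len(momenta)
    w = [[angle(momenta[i], momenta[j]) for j in range(n)] for i in range(n)]
    candidates = ((0,) + rest for rest in itertools.product((0, 1), repeat=n - 1))
    best = max(candidates, key=lambda a: cut_value(a, w))
    return best, cut_value(best, w)


# Toy event: two back-to-back sprays along +z and -z (made-up momenta).
event = [(0.1, 0.0, 5.0), (0.0, 0.2, 4.0), (-0.1, 0.0, 3.0),
         (0.1, 0.1, -4.5), (0.0, -0.2, -3.5)]
labels, value = best_two_jet_split(event)
print(labels)  # (0, 0, 0, 1, 1)
```

Brute force scales as 2^n, which is exactly the search QAOA tries to shortcut; at the 30-particle scale mentioned in the abstract that is already about 5 x 10^8 labelings.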

arXiv:2407.07061 [pdf, other]

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

Authors: Weize Chen, Ziming You, Ran Li, Yitong Guan, Chen Qian, Chenyang Zhao, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun

Abstract: The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents. However, existing multi-agent frameworks often struggle with integrating diverse, capable third-party agents due to reliance on agents defined within their own ecosystems. They also face challenges in simulating distributed environments, as most frameworks are limited to single-device setups. Furthermore, these frameworks often rely on hard-coded communication pipelines, limiting their adaptability to dynamic task requirements. Inspired by the concept of the Internet, we propose the Internet of Agents (IoA), a novel framework that addresses these limitations by providing a flexible and scalable platform for LLM-based multi-agent collaboration. IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control. Through extensive experiments on general assistant tasks, embodied AI tasks, and retrieval-augmented generation benchmarks, we demonstrate that IoA consistently outperforms state-of-the-art baselines, showcasing its ability to facilitate effective collaboration among heterogeneous agents. IoA represents a step towards linking diverse agents in an Internet-like environment, where agents can seamlessly collaborate to achieve greater intelligence and capabilities. Our codebase has been released at https://github.com/OpenBMB/IoA.

Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: work in progress

arXiv:2407.05102 [pdf, other]

Towards Auto-Building of Embedded FPGA-based Soft Sensors for Wastewater Flow Estimation

Authors: Tianheng Ling, Chao Qian, Gregor Schiele

Abstract: Executing flow estimation using Deep Learning (DL)-based soft sensors on resource-limited IoT devices has demonstrated promise in terms of reliability and energy efficiency. However, its application in the field of wastewater flow estimation remains underexplored due to: (1) a lack of available datasets, (2) inconvenient toolchains for on-device AI model development and deployment, and (3) hardware platforms designed for general DL purposes rather than being optimized for energy-efficient soft sensor applications. This study addresses these gaps by proposing an automated, end-to-end solution for wastewater flow estimation using a prototype IoT device.

Submitted 6 July, 2024; originally announced July 2024.

Comments: This paper is accepted by 2024 IEEE Annual Congress on Artificial Intelligence of Things (IEEE AIoT)

arXiv:2407.02818 [pdf, other]

WizardMerge -- Save Us From Merging Without Any Clues

Authors: Qingyu Zhang, Junzhe Li, Jiayi Lin, Jie Ding, Lanteng Lin, Chenxiong Qian

Abstract: Modern software development necessitates efficient version-oriented collaboration among developers. While Git is the most popular version control system, it generates unsatisfactory version merging results due to its text-based workflow, leading to potentially unexpected results in the merged version of the project. Although numerous merging tools have been proposed for improving merge results, developers still struggle to resolve conflicts and fix incorrectly modified code without clues. We present WizardMerge, an auxiliary tool that leverages merging results from Git to retrieve code block dependencies at the text and LLVM-IR levels and provide suggestions for developers to resolve errors introduced by textual merging. In our evaluation, we tested WizardMerge on 227 conflicts within five large-scale projects. The outcomes demonstrate that WizardMerge reduces conflict merging time costs by 23.85%. Beyond addressing conflicts, WizardMerge provides merging suggestions for over 70% of the code blocks potentially affected by the conflicts. Notably, WizardMerge exhibits the capability to identify conflict-unrelated code blocks that require manual intervention yet are harmfully applied by Git during the merging.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 22 pages

ACM Class: D.2; D.3
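WizardMerge uses code-block dependencies to point developers at blocks a conflict may affect. The abstract does not describe the algorithm beyond that, so this sketch only illustrates the general idea on a hypothetical dependency graph: starting from the conflicted blocks, collect every block that transitively depends on them (reverse reachability via breadth-first search). The graph format and function name are assumptions.

```python
from collections import deque


def affected_blocks(depends_on, conflicted):
    """Blocks that transitively depend on any conflicted block.

    depends_on: dict mapping a block name to the set of blocks it uses.
    conflicted: iterable of block names touched by merge conflicts.
    """
    # Invert the edges: for each block, which blocks use it?
    used_by = {}
    for block, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(block)
    seen = set(conflicted)
    queue = deque(seen)
    while queue:
        block = queue.popleft()
        for user in used_by.get(block, ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen - set(conflicted)


deps = {
    "main": {"parse_args", "run"},
    "run": {"load_config", "process"},
    "process": {"helper"},
    "report": {"format_row"},
}
print(sorted(affected_blocks(deps, {"helper"})))  # ['main', 'process', 'run']
```

A purely textual merge cannot see that `main` depends on `helper` two levels down; a dependency graph makes that propagation explicit, which is the kind of clue the tool aims to surface.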

arXiv:2406.16950 [pdf]

The influence of flame-pressure waves collisions on the development and evolution of tulip flames

Authors: Chengeng Qian, Mikhail A. Liberman

Abstract: The effects of pressure waves-flame collisions and tube aspect ratio on flame evolution and the formation of tulip and distorted tulip flames were investigated using numerical simulations of the fully compressible Navier-Stokes equations coupled with a detailed chemical model for a stoichiometric hydrogen-air mixture. It is shown that: (1) the rarefaction wave generated by the decelerating flame in the unburned gas is the primary physical process leading to the flame front inversion and the tulip flame formation, (2) the flame front instabilities (Darrieus-Landau or Rayleigh-Taylor) do not participate in the formation of the tulip flame, since the time of the flame front inversion due to the rarefaction wave is considerably shorter than the characteristic times of the development of instabilities with wavelengths of the order of the tube width. The first rarefaction wave in the unburned gas mixture is generated after the flame skirt touches the tube walls and the flame is slowed down due to the reduction in flame surface area. The collision of the flame with the pressure waves reflected from the closed end of the tube leads to a faster and more pronounced formation of a tulip-shaped flame. In later stages, flame collisions with pressure waves can lead to the formation of distorted tulip flames due to short-wavelength Rayleigh-Taylor instability of the flame front. Because flame acceleration and deceleration occur much faster in 3D flames than in 2D flames, tulip flame formation also occurs much faster in 3D flames than in 2D flames.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 33 pages, 12 figures, Nordita preprint

Report number: Preprint NORDITA 2024-020

arXiv:2406.16360 [pdf, other]

MIRReS: Multi-bounce Inverse Rendering using Reservoir Sampling

Authors: Yuxin Dai, Qi Wang, Jingsen Zhu, Dianbing Xi, Yuchi Huo, Chen Qian, Ying He

Abstract: We present MIRReS, a novel two-stage inverse rendering framework that jointly reconstructs and optimizes the explicit geometry, material, and lighting from multi-view images. Unlike previous methods that rely on implicit irradiance fields or simplified path tracing algorithms, our method extracts an explicit geometry (triangular mesh) in stage one, and introduces a more realistic physically-based inverse rendering model that utilizes multi-bounce path tracing and Monte Carlo integration. By leveraging multi-bounce path tracing, our method effectively estimates indirect illumination, including self-shadowing and internal reflections, which improves the intrinsic decomposition of shape, material, and lighting. Moreover, we incorporate reservoir sampling into our framework to address the noise in Monte Carlo integration, enhancing convergence and facilitating gradient-based optimization with low sample counts. Through qualitative and quantitative evaluation of several scenarios, especially in challenging scenarios with complex shadows, we demonstrate that our method achieves state-of-the-art performance on decomposition results. Additionally, our optimized explicit geometry enables applications such as scene editing, relighting, and material editing with modern graphics engines or CAD software. The source code is available at https://brabbitdousha.github.io/MIRReS/

Submitted 24 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: 16 pages, 14 figures
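The abstract credits reservoir sampling with taming Monte Carlo noise at low sample counts. As a generic illustration only (not the MIRReS implementation), the Python sketch below shows single-slot weighted reservoir sampling: candidates are streamed once, in O(1) memory, and each survives with probability proportional to its weight. The candidate names and weights are hypothetical stand-ins for light-path samples and their importance weights.

```python
import random
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class Reservoir:
    """Single-slot weighted reservoir: keeps one candidate from a stream."""
    sample: Optional[Any] = None
    weight_sum: float = 0.0
    count: int = 0

    def update(self, candidate: Any, weight: float, rng: random.Random) -> None:
        # Running total of all weights seen so far.
        self.weight_sum += weight
        self.count += 1
        # Replace the kept sample with probability weight / weight_sum; by
        # induction, candidate i survives the stream with probability w_i / sum(w).
        if self.weight_sum > 0 and rng.random() < weight / self.weight_sum:
            self.sample = candidate


def resample(candidates: Sequence[Any], weights: Sequence[float], seed: int = 0) -> Reservoir:
    """Stream all candidates through one reservoir in a single pass."""
    rng = random.Random(seed)
    reservoir = Reservoir()
    for candidate, weight in zip(candidates, weights):
        reservoir.update(candidate, weight, rng)
    return reservoir


if __name__ == "__main__":
    # Hypothetical candidates with made-up weights 1 : 0 : 3.
    picks = [resample(["a", "b", "c"], [1.0, 0.0, 3.0], seed=s).sample for s in range(4000)]
    print({c: picks.count(c) / len(picks) for c in "abc"})  # roughly 0.25 / 0.0 / 0.75
```

In a renderer, the surviving sample would typically be reweighted by the reservoir's weight sum to keep the estimator unbiased; that step is omitted here.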

arXiv:2406.16298 [pdf, other]

Bell nonlocality and entanglement in $e^{+}e^{-} \rightarrow Y\bar{Y}$ at BESIII

Authors: Sihao Wu, Chen Qian, Qun Wang, Xiao-Rong Zhou

Abstract: The Bell nonlocality and entanglement are two kinds of quantum correlations in quantum systems. Owing to the recent upgrade of the Beijing Spectrometer III (BESIII) experiment, it is possible to explore the nonlocality and entanglement in hyperon-antihyperon systems produced in electron-positron annihilation with high-precision data. We provide a systematic method for studying quantum correlations in spin-1/2 hyperon-antihyperon systems through the measures for the nonlocality and entanglement. We find that with nonvanishing polarizations of the hyperon and its antihyperon, the kinematic region of nonlocality in the hyperon-antihyperon system is more restricted than in the $τ^{+}τ^{-}$ system, in which the polarizations of the $τ$ leptons vanish. We also present an experimental proposal to probe the nonlocality and entanglement in hyperon-antihyperon systems at BESIII.

Submitted 28 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

Comments: 9 pages, 4 figures, 4 tables. We corrected a few errors in plotting figures from analytical formulas. Some results in tables read from figures have also been corrected. A new table (Table III) was added for the maximum concurrence and the corresponding angles. A few references were added

arXiv:2406.16116 [pdf, ps, other]

A First Running Time Analysis of the Strength Pareto Evolutionary Algorithm 2 (SPEA2)

Authors: Shengjie Ren, Chao Bian, Miqing Li, Chao Qian

Abstract: Evolutionary algorithms (EAs) have emerged as a predominant approach for addressing multi-objective optimization problems. However, the theoretical foundation of multi-objective EAs (MOEAs), particularly the fundamental aspects like running time analysis, remains largely underexplored. Existing theoretical studies mainly focus on basic MOEAs, with little attention given to practical MOEAs. In this paper, we present a running time analysis of strength Pareto evolutionary algorithm 2 (SPEA2) for the first time. Specifically, we prove that the expected running time of SPEA2 for solving three commonly used multi-objective problems, i.e., $m$OneMinMax, $m$LeadingOnesTrailingZeroes, and $m$-OneJumpZeroJump, is $O(μn\cdot \min\{m\log n, n\})$, $O(μn^2)$, and $O(μn^k \cdot \min\{mn, 3^{m/2}\})$, respectively. Here $m$ denotes the number of objectives, and the population size $μ$ is required to be at least $(2n/m+1)^{m/2}$, $(2n/m+1)^{m-1}$ and $(2n/m-2k+3)^{m/2}$, respectively. The proofs are accomplished through general theorems which are also applicable for analyzing the expected running time of other MOEAs on these problems, and thus can be helpful for future theoretical analysis of MOEAs.

Submitted 23 June, 2024; originally announced June 2024.
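For readers unfamiliar with SPEA2, its distinguishing feature is the fitness assignment that combines dominance strength with a density term. The Python sketch below follows the textbook scheme: strength S(i), raw fitness R(i) as the summed strength of i's dominators, and density D(i) = 1/(σ_k + 2). It is written for maximization to match OneMinMax-style problems, and uses k = floor(sqrt(N)) as a simplifying assumption (k is commonly derived from the combined population and archive size). The truncation-based environmental selection analyzed in the paper is not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dominates(a: Vector, b: Vector) -> bool:
    """Pareto dominance for maximization: no worse everywhere, better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def spea2_fitness(objs: List[Vector]) -> List[float]:
    """SPEA2-style fitness for each objective vector (lower is better)."""
    n = len(objs)
    # Strength S(i): number of solutions that i dominates.
    strength = [sum(dominates(objs[i], objs[j]) for j in range(n)) for i in range(n)]
    # Raw fitness R(i): summed strength of every solution dominating i (0 if non-dominated).
    raw = [sum(strength[j] for j in range(n) if dominates(objs[j], objs[i])) for i in range(n)]
    # Density D(i) = 1 / (sigma_k + 2), sigma_k = distance to the k-th nearest neighbour.
    k = max(1, int(math.sqrt(n)))
    fitness = []
    for i in range(n):
        dists = sorted(math.dist(objs[i], objs[j]) for j in range(n) if j != i)
        sigma_k = dists[min(k, len(dists)) - 1] if dists else 0.0
        fitness.append(raw[i] + 1.0 / (sigma_k + 2.0))
    return fitness
```

Because D(i) is at most 0.5, every non-dominated solution ends up with fitness below 1, which is how SPEA2 separates them from dominated ones.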

arXiv:2406.14928 [pdf, other]

Autonomous Agents for Collaborative Task under Information Asymmetry

Authors: Wei Liu, Chenxi Wang, Yifei Wang, Zihao Xie, Rennai Qiu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Chen Qian

Abstract: Large Language Model Multi-Agent Systems (LLM-MAS) have achieved great progress in solving complex tasks. These systems rely on communication among agents to solve tasks collaboratively, under the premise of shared information. However, when agents' communication is leveraged to enhance human cooperation, a new challenge arises due to information asymmetry, since each agent can only access the information of its human user. Previous MAS struggle to complete tasks under this condition. To address this, we propose a new MAS paradigm termed iAgents, short for Informative Multi-Agent Systems. In iAgents, the human social network is mirrored in the agent network, where agents proactively exchange human information necessary for task resolution, thereby overcoming information asymmetry. iAgents employs a novel agent reasoning mechanism, InfoNav, to navigate agents' communication towards effective information exchange. Together with InfoNav, iAgents organizes human information in a mixed memory to provide agents with accurate and comprehensive information for exchange. Additionally, we introduce InformativeBench, the first benchmark tailored for evaluating LLM agents' task-solving ability under information asymmetry. Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate over 30 turns, and retrieve information from nearly 70,000 messages to complete tasks within 3 minutes.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 16 pages, 8 figures, 5 tables, Work in progress

arXiv:2406.12383 [pdf, other]

Biased Pareto Optimization for Subset Selection with Dynamic Cost Constraints

Authors: Dan-Xuan Liu, Chao Qian

Abstract: Subset selection with cost constraints aims to select a subset from a ground set to maximize a monotone objective function without exceeding a given budget, which has various applications such as influence maximization and maximum coverage. In real-world scenarios, the budget, representing available resources, may change over time, which requires algorithms to adapt quickly to new budgets. However, in this dynamic environment, previous algorithms either lack theoretical guarantees or require a long running time. The state-of-the-art algorithm, POMC, is a Pareto optimization approach designed for static problems, lacking consideration for dynamic problems. In this paper, we propose BPODC, enhancing POMC with biased selection and warm-up strategies tailored for dynamic environments. We focus on the ability of BPODC to leverage existing computational results while adapting to budget changes. We prove that BPODC can maintain the best known $(α_f/2)(1-e^{-α_f})$-approximation guarantee when the budget changes. Experiments on influence maximization and maximum coverage show that BPODC adapts more effectively and rapidly to budget changes, with a running time that is less than that of the static greedy algorithm.

Submitted 18 June, 2024; originally announced June 2024.

Comments: This paper has appeared at PPSN'24
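BPODC builds on POMC, which treats budgeted subset selection as a bi-objective problem (maximize the objective, minimize the cost) and keeps only non-dominated subsets. The Python sketch below is a simplified GSEMO/POMC-style loop on a made-up maximum-coverage instance. It does not implement BPODC's biased selection or warm-up strategies, and simply discarding over-budget offspring is a simplifying assumption rather than POMC's exact constraint handling.

```python
import random
from typing import Callable, List, Tuple

Bits = Tuple[int, ...]


def pareto_subset_selection(
    n: int,
    f: Callable[[Bits], float],
    cost: Callable[[Bits], float],
    budget: float,
    iterations: int,
    seed: int = 0,
) -> Bits:
    """GSEMO-style bi-objective search: maximize f, minimize cost."""
    rng = random.Random(seed)

    def weakly_dominates(a: Bits, b: Bits) -> bool:
        return f(a) >= f(b) and cost(a) <= cost(b)

    def dominates(a: Bits, b: Bits) -> bool:
        return weakly_dominates(a, b) and (f(a) > f(b) or cost(a) < cost(b))

    population: List[Bits] = [tuple([0] * n)]  # start from the empty subset
    for _ in range(iterations):
        parent = rng.choice(population)
        # Bit-wise mutation: flip each bit independently with probability 1/n.
        child = tuple(1 - b if rng.random() < 1.0 / n else b for b in parent)
        if cost(child) > budget:
            continue  # simplification: infeasible offspring are discarded
        if any(dominates(p, child) for p in population):
            continue
        # Keep the child and drop every member it weakly dominates.
        population = [p for p in population if not weakly_dominates(child, p)] + [child]
    # Every member is within budget, so return the one with the best objective.
    return max(population, key=f)


# Made-up maximum-coverage instance: four sets with costs.
SETS = [{1, 2, 3}, {4, 5, 6}, {1, 4}, {7}]
COSTS = [2, 2, 1, 1]


def coverage(x: Bits) -> float:
    return float(len(set().union(*(s for s, bit in zip(SETS, x) if bit))))


def total_cost(x: Bits) -> float:
    return float(sum(c for c, bit in zip(COSTS, x) if bit))
```

For example, `pareto_subset_selection(4, coverage, total_cost, budget=4, iterations=5000)` searches for the best cover within budget 4 (here, the first two sets, covering six elements). When the budget changes, the existing population of trade-off solutions can be reused as a starting point, which is the kind of reuse the abstract describes.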

arXiv:2406.11721 [pdf, other]

Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity

Authors: Bingxiang He, Ning Ding, Cheng Qian, Jia Deng, Ganqu Cui, Lifan Yuan, Huan-ang Gao, Huimin Chen, Zhiyuan Liu, Maosong Sun

Abstract: Understanding alignment techniques begins with comprehending zero-shot generalization brought by instruction tuning, but its mechanism remains poorly understood. Existing work has largely been confined to the task level, without considering that tasks are artificially defined and, to LLMs, merely consist of tokens and representations. This line of research has been limited to examining transfer between tasks from a task-pair perspective, with few studies focusing on understanding zero-shot generalization from the perspective of the data itself. To bridge this gap, we first demonstrate through multiple metrics that zero-shot generalization during instruction tuning happens very early. Next, we investigate the facilitation of zero-shot generalization from both data similarity and granularity perspectives, confirming that encountering highly similar and fine-grained training data earlier during instruction tuning, without the constraints of defined "tasks", enables better generalization. Finally, we propose a more grounded training data arrangement method, Test-centric Multi-turn Arrangement, and show its effectiveness in promoting continual learning and further loss reduction. For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level. We hope our analysis will advance the understanding of zero-shot generalization during instruction tuning and contribute to the development of more aligned LLMs.
Our code is released at https://github.com/HBX-hbx/dynamics_of_zero-shot_generalization.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 33 pages, 14 figures
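The abstract reports that meeting highly similar training data earlier helps zero-shot generalization. To make "instance-level similarity" concrete, the Python sketch below orders training instances by their highest cosine similarity to any test instance, most similar first. The embedding vectors are hypothetical, and this is an illustrative assumption, not the paper's Test-centric Multi-turn Arrangement.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_first_order(
    train: List[Sequence[float]], test: List[Sequence[float]]
) -> List[int]:
    """Indices of training instances, sorted by max similarity to the test set."""
    # Score each training instance by its closest test instance.
    scores = [max(cosine(x, t) for t in test) for x in train]
    # Most similar first; Python's sort is stable, so ties keep input order.
    return sorted(range(len(train)), key=lambda i: scores[i], reverse=True)
```

With `train = [[1, 0], [0, 1], [1, 1]]` and `test = [[1, 0]]`, the order is `[0, 2, 1]`: the exact match first, the partial match second, the orthogonal instance last.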

arXiv:2406.10539 [pdf, other]

Self-Supervised Vision Transformer for Enhanced Virtual Clothes Try-On

Authors: Lingxiao Lu, Shengyi Wu, Haoxuan Sun, Junhong Gou, Jianlou Si, Chen Qian, Jianfu Zhang, Liqing Zhang

Abstract: Virtual clothes try-on has emerged as a vital feature in online shopping, offering consumers a critical tool to visualize how clothing fits. In our research, we introduce an innovative approach for virtual clothes try-on, utilizing a self-supervised Vision Transformer (ViT) coupled with a diffusion model. Our method emphasizes detail enhancement by contrasting local clothing image embeddings, generated by ViT, with their global counterparts. Techniques such as conditional guidance and focus on key regions have been integrated into our approach. These combined strategies empower the diffusion model to reproduce clothing details with increased clarity and realism. The experimental results showcase substantial advancements in the realism and precision of details in virtual try-on experiences, significantly surpassing the capabilities of existing technologies.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.09180 [pdf, other]

Detection-Rate-Emphasized Multi-objective Evolutionary Feature Selection for Network Intrusion Detection

Authors: Zi-Hang Cheng, Haopu Shang, Chao Qian

Abstract: Network intrusion detection is one of the most important issues in the field of cyber security, and various machine learning techniques have been applied to build intrusion detection systems. However, since the number of features describing network connections is often large, and some features are redundant or noisy, feature selection is necessary in such scenarios and can improve both efficiency and accuracy. Recently, some researchers have used multi-objective evolutionary algorithms (MOEAs) to select features. However, they usually consider only the number of features and classification accuracy as the objectives, resulting in unsatisfactory performance on a critical metric, detection rate. As a result, many real attacks are missed, which can bring huge losses to the network system. In this paper, we propose DR-MOFS to model the feature selection problem in network intrusion detection as a three-objective optimization problem, where the number of features, accuracy and detection rate are optimized simultaneously, and use MOEAs to solve it. Experiments on two popular network intrusion detection datasets, NSL-KDD and UNSW-NB15, show that in most cases the proposed method outperforms previous methods, i.e., it leads to fewer features, higher accuracy and higher detection rate.

Submitted 13 June, 2024; originally announced June 2024.
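To make the three objectives concrete, the Python sketch below scores one candidate feature subset by its feature count (minimized), accuracy, and detection rate (both maximized), where detection rate is the fraction of true attacks flagged as attacks, i.e. recall on the attack class. It then checks Pareto dominance between two candidates. The binary labels, the attack label `1`, and the toy predictions are assumptions for illustration; training the classifier and running the MOEA are out of scope.

```python
from typing import Sequence, Tuple

Objectives = Tuple[int, float, float]  # (n_features, accuracy, detection_rate)


def objectives(
    mask: Sequence[int], y_true: Sequence[int], y_pred: Sequence[int], attack_label: int = 1
) -> Objectives:
    """Evaluate a feature subset (bit mask) from its classifier's predictions."""
    n_features = sum(mask)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Detection rate = TP / (TP + FN): share of real attacks that were flagged.
    attack_preds = [p for t, p in zip(y_true, y_pred) if t == attack_label]
    detected = sum(p == attack_label for p in attack_preds)
    detection_rate = detected / len(attack_preds) if attack_preds else 0.0
    return n_features, accuracy, detection_rate


def dominates(a: Objectives, b: Objectives) -> bool:
    """Fewer features, higher accuracy and detection rate are better."""
    no_worse = a[0] <= b[0] and a[1] >= b[1] and a[2] >= b[2]
    better = a[0] < b[0] or a[1] > b[1] or a[2] > b[2]
    return no_worse and better
```

Two subsets with the same size and the same accuracy can still differ in detection rate; under a two-objective formulation (features, accuracy) they would tie, while the third objective lets the one that misses fewer attacks dominate.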

arXiv:2406.08979 [pdf, other]

Multi-Agent Software Development through Cross-Team Collaboration

Authors: Zhuoyun Du, Chen Qian, Wei Liu, Zihao Xie, Yifei Wang, Yufan Dang, Weize Chen, Cheng Yang

Abstract: The latest breakthroughs in Large Language Models (LLMs), e.g., ChatDev, have catalyzed profound transformations, particularly through multi-agent collaboration for software development. LLM agents can collaborate in teams like humans, and follow the waterfall model to sequentially work on requirements analysis, development, review, testing, and other phases to perform autonomous software generation. However, for an agent team, each phase in a single development process yields only one possible outcome. This results in the completion of only one development chain, thereby losing the opportunity to explore multiple potential decision paths within the solution space. Consequently, this may lead to suboptimal results. To address this challenge, we introduce Cross-Team Collaboration (CTC), a scalable multi-team framework that enables orchestrated teams to jointly propose various decisions and communicate their insights in a cross-team collaboration environment for superior content generation. Experimental results in software development reveal a notable increase in quality compared to state-of-the-art baselines, underscoring the efficacy of our framework. The significant improvements in story generation demonstrate the promising generalization ability of our framework across various domains. We anticipate that our work will guide LLM agents towards a cross-team paradigm and contribute to their significant growth in software development and beyond.
The code and data will be available at https://github.com/OpenBMB/ChatDev.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Work in progress

arXiv:2406.07155 [pdf, other]

Scaling Large-Language-Model-based Multi-Agent Collaboration

Authors: Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun

Abstract: Pioneering advancements in large language model-powered agents have underscored the design pattern of multi-agent collaboration, demonstrating that collective intelligence can surpass the capabilities of each individual. Inspired by the neural scaling law, which posits that increasing the number of neurons leads to emergent abilities, this study investigates whether a similar principle applies to increasing the number of agents in multi-agent collaboration. Technically, we propose multi-agent collaboration networks (MacNet), which utilize directed acyclic graphs to organize agents and streamline their interactive reasoning via topological ordering, with solutions derived from their dialogues. Extensive experiments show that MacNet consistently outperforms baseline models, enabling effective agent collaboration across various network topologies and supporting cooperation among more than a thousand agents. Notably, we observed a small-world collaboration phenomenon, where topologies with small-world properties achieved superior performance. Additionally, we identified a collaborative scaling law, indicating that normalized solution quality follows a logistic growth pattern as the number of agents scales, with collaborative emergence occurring much earlier than previously observed instances of neural emergence. The code and data will be available at https://github.com/OpenBMB/ChatDev.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Work in progress; The code and data will be available at https://github.com/OpenBMB/ChatDev
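MacNet is described as organizing agents in a directed acyclic graph and sequencing their interaction via topological ordering. The Python sketch below illustrates that scheduling idea with Kahn's algorithm: an agent runs only after all of its predecessors, and it receives their outputs as input. Each "agent" here is a hypothetical plain function standing in for an LLM call; how MacNet actually conducts and aggregates dialogues is not modeled.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Agent = Callable[[List[str]], str]  # hypothetical: predecessor outputs -> own output


def run_dag(agents: Dict[str, Agent], edges: List[Tuple[str, str]]) -> Dict[str, str]:
    """Execute agents in topological order (Kahn's algorithm)."""
    preds: Dict[str, List[str]] = {a: [] for a in agents}
    succs: Dict[str, List[str]] = {a: [] for a in agents}
    indegree = {a: 0 for a in agents}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
        indegree[v] += 1
    # Start with agents that have no predecessors.
    queue = deque(a for a in agents if indegree[a] == 0)
    outputs: Dict[str, str] = {}
    while queue:
        agent = queue.popleft()
        # All predecessors have already produced output at this point.
        outputs[agent] = agents[agent]([outputs[p] for p in preds[agent]])
        for s in succs[agent]:
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)
    if len(outputs) != len(agents):
        raise ValueError("agent graph contains a cycle")
    return outputs
```

With a diamond-shaped graph a→b, a→c, b→d, c→d, agent d runs last and sees the refined outputs of both b and c, which is the kind of multi-path refinement a DAG topology allows.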

arXiv:2406.05743 [pdf, other]

Peptide Vaccine Design by Evolutionary Multi-Objective Optimization

Authors: Dan-Xuan Liu, Yi-Heng Xu, Chao Qian

Abstract: Peptide vaccines are growing in significance for fighting diverse diseases. Machine learning has improved the identification of peptides that can trigger immune responses, and the main challenge of peptide vaccine design now lies in selecting an effective subset of peptides due to the allelic diversity among individuals. Previous works mainly formulated this task as a constrained optimization problem, aiming to maximize the expected number of peptide-Major Histocompatibility Complex (peptide-MHC) bindings across a broad range of populations by selecting a subset of diverse peptides with limited size; and employed a greedy algorithm, whose performance, however, may be limited by its greedy nature. In this paper, we propose a new framework PVD-EMO based on Evolutionary Multi-objective Optimization, which reformulates Peptide Vaccine Design as a bi-objective optimization problem that maximizes the expected number of peptide-MHC bindings and minimizes the number of selected peptides simultaneously, and employs a Multi-Objective Evolutionary Algorithm (MOEA) to solve it. We also incorporate warm-start and repair strategies into MOEAs to improve efficiency and performance. We prove that the warm-start strategy ensures that PVD-EMO maintains the same worst-case approximation guarantee as the previous greedy algorithm, and meanwhile, the EMO framework can help avoid local optima. Experiments on a peptide vaccine design for COVID-19, caused by the SARS-CoV-2 virus, demonstrate the superiority of PVD-EMO.

Submitted 9 June, 2024; originally announced June 2024.

Comments: This paper has appeared at IJCAI'24

arXiv:2406.04745 [pdf, other]

Confidence-aware Contrastive Learning for Selective Classification

Authors: Yu-Chang Wu, Shen-Huan Lyu, Haopu Shang, Xiangyu Wang, Chao Qian

Abstract: Selective classification enables models to make predictions only when they are sufficiently confident, aiming to enhance safety and reliability, which is important in high-stakes scenarios. Previous methods mainly use deep neural networks and focus on modifying the architecture of classification layers to enable the model to estimate the confidence of its prediction. This work provides a generalization bound for selective classification, disclosing that optimizing feature layers helps improve the performance of selective classification. Inspired by this theory, we propose to explicitly improve the selective classification model at the feature level for the first time, leading to a novel Confidence-aware Contrastive Learning method for Selective Classification, CCL-SC, which makes the features of homogeneous instances more similar and differentiates the features of heterogeneous instances, with the strength controlled by the model's confidence. The experimental results on typical datasets, i.e., CIFAR-10, CIFAR-100, CelebA, and ImageNet, show that CCL-SC achieves significantly lower selective risk than state-of-the-art methods, across almost all coverage degrees. Moreover, it can be combined with existing methods to bring further improvement.

Submitted 7 June, 2024; originally announced June 2024.

Comments: Accepted by ICML 2024
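The results are reported as selective risk across coverage degrees. Under the standard definitions, a model accepts its most confident predictions up to a target coverage (the fraction of inputs answered) and abstains on the rest; selective risk is the error rate on the accepted inputs. The Python sketch below computes this from given confidence scores. Rounding the kept count down is an assumption, and how CCL-SC produces its confidences is not modeled.

```python
from typing import Sequence


def selective_risk(
    confidences: Sequence[float], correct: Sequence[bool], coverage: float
) -> float:
    """Error rate on the most confident `coverage` fraction of predictions."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    k = max(1, int(coverage * n))  # number of accepted predictions (rounded down)
    # Accept the k most confident predictions; abstain on the rest.
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    errors = sum(1 for i in order[:k] if not correct[i])
    return errors / k
```

For confidences `[0.9, 0.8, 0.6, 0.3]` where only the first two predictions are correct, selective risk is 0 at 50% coverage, 1/3 at 75%, and 0.5 at full coverage. A method with better-calibrated confidence pushes the risk curve down across coverages, which is the comparison the abstract reports.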

arXiv:2406.03731 [pdf, other]

Quality-Diversity with Limited Resources

Authors: Ren-Jian Wang, Ke Xue, Cong Guan, Chao Qian

Abstract: Quality-Diversity (QD) algorithms have emerged as a powerful optimization paradigm with the aim of generating a set of high-quality and diverse solutions. To achieve such a challenging goal, QD algorithms require maintaining a large archive and a large population in each iteration, which raises two main issues: sample efficiency and resource efficiency. Most advanced QD algorithms focus on improving sample efficiency, while resource efficiency is overlooked to some extent. In particular, the resource overhead during the training process has not yet been addressed, hindering the wider application of QD algorithms. In this paper, we highlight this important research question, i.e., how to efficiently train QD algorithms with limited resources, and propose a novel and effective method called RefQD to address it. RefQD decomposes a neural network into representation and decision parts, and shares the representation part with all decision parts in the archive to reduce the resource overhead. It also employs a series of strategies to address the mismatch issue between the old decision parts and the newly updated representation part. Experiments on different types of tasks from small to large resource consumption demonstrate the excellent performance of RefQD: it not only uses significantly fewer resources (e.g., 16% of the GPU memory on QDax and 3.7% on Atari) but also achieves comparable or better performance compared to sample-efficient QD algorithms. Our code is available at https://github.com/lamda-bbo/RefQD.

Submitted 6 June, 2024; originally announced June 2024.

Comments: ICML 2024

arXiv:2406.03722 [pdf, other]

Offline Multi-Objective Optimization

Authors: Ke Xue, Rong-Xi Tan, Xiaobin Huang, Chao Qian

Abstract: Offline optimization aims to maximize a black-box objective function with a static dataset and has wide applications. In addition to the objective function being black-box and expensive to evaluate, numerous complex real-world problems entail optimizing multiple conflicting objectives, i.e., multi-objective optimization (MOO). Nevertheless, offline MOO has not progressed as much as offline single-objective optimization (SOO), mainly due to the lack of benchmarks like Design-Bench for SOO. To bridge this gap, we propose a first benchmark for offline MOO, covering a range of problems from synthetic to real-world tasks. This benchmark provides tasks, datasets, and open-source examples, which can serve as a foundation for method comparisons and advancements in offline MOO. Furthermore, we analyze how the current related methods can be adapted to offline MOO from four fundamental perspectives, including data, model architecture, learning algorithm, and search algorithm. Empirical results show improvements over the best value of the training set, demonstrating the effectiveness of offline MOO methods. As no particular method stands out significantly, there is still an open challenge in further enhancing the effectiveness of offline MOO. We finally discuss future challenges for offline MOO, with the hope of shedding some light on this emerging field. Our code is available at https://github.com/lamda-bbo/offline-moo.

Submitted 5 June, 2024; originally announced June 2024.

Comments: ICML 2024

arXiv:2406.02658 [pdf, other]

Maintaining Diversity Provably Helps in Evolutionary Multimodal Optimization

Authors: Shengjie Ren, Zhijia Qiu, Chao Bian, Miqing Li, Chao Qian

Abstract: In the real world, there exists a class of optimization problems in which multiple (local) optimal solutions in the solution space correspond to a single point in the objective space. In this paper, we theoretically show that for such multimodal problems, a simple method that considers the diversity of solutions in the solution space can benefit the search in evolutionary algorithms (EAs). Specifically, we prove that the proposed method, working with crossover, can help enhance the exploration, leading to polynomial or even exponential acceleration on the expected running time. This result is derived by rigorous running time analysis in both single-objective and multi-objective scenarios, including $(μ+1)$-GA solving the widely studied single-objective problem, Jump, and NSGA-II and SMS-EMOA (two well-established multi-objective EAs) solving the widely studied bi-objective problem, OneJumpZeroJump. Experiments are also conducted to validate the theoretical results. We hope that our results may encourage the exploration of diversity maintenance in the solution space for multi-objective optimization, where existing EAs usually only consider the diversity in the objective space and can easily be trapped in local optima.

Submitted 4 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2406.02118
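The two benchmark problems named above have standard definitions in the runtime-analysis literature. The Python sketch below follows them: Jump_k rewards the number of ones but places a fitness valley of width k just below the all-ones optimum, and OneJumpZeroJump applies the same construction to the ones and to the zeros as two objectives. The paper's exact conventions (for example, the admissible range of k) may differ.

```python
from typing import Sequence, Tuple


def jump(x: Sequence[int], k: int) -> int:
    """Jump_k: k + |x|_1 outside the gap, n - |x|_1 inside it."""
    n, ones = len(x), sum(x)
    if ones <= n - k or ones == n:
        return k + ones
    # Inside the gap (n - k < |x|_1 < n) fitness decreases toward the optimum.
    return n - ones


def one_jump_zero_jump(x: Sequence[int], k: int) -> Tuple[int, int]:
    """Bi-objective OneJumpZeroJump (both objectives maximized)."""
    n, ones = len(x), sum(x)
    zeros = n - ones
    f1 = k + ones if (ones <= n - k or ones == n) else n - ones
    f2 = k + zeros if (zeros <= n - k or zeros == n) else n - zeros
    return f1, f2
```

For n = 6 and k = 2, the all-ones string scores 8 on Jump, a string with five ones scores only 1, and one with four ones scores 6. An EA at four ones must therefore cross the valley in a single jump, which is where diversity and crossover can help.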

arXiv:2406.02118 [pdf, other]

An Archive Can Bring Provable Speed-ups in Multi-Objective Evolutionary Algorithms

Authors: Chao Bian, Shengjie Ren, Miqing Li, Chao Qian

Abstract: In the area of multi-objective evolutionary algorithms (MOEAs), there is a trend of using an archive to store non-dominated solutions generated during the search. This is because 1) MOEAs may easily end up with a final population containing inferior solutions that are dominated by other solutions discarded during the search process, and 2) a population whose size is commensurate with the size of the problem's Pareto front is often impractical. In this paper, we theoretically show, for the first time, that using an archive can guarantee speed-ups for MOEAs. Specifically, we prove that for two well-established MOEAs (NSGA-II and SMS-EMOA) on two commonly studied problems (OneMinMax and LeadingOnesTrailingZeroes), using an archive brings a polynomial acceleration on the expected running time. The reason is that with an archive, the size of the population can be reduced to a small constant; there is no need for the population to keep all the Pareto optimal solutions found. This contrasts with existing theoretical studies of MOEAs, where a population commensurate with the size of the problem's Pareto front is needed. The findings in this paper not only provide a theoretical confirmation for an increasingly popular practice in the design of MOEAs, but can also help the theory community study more practical MOEAs.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.20247 [pdf, other]

KerasCV and KerasNLP: Vision and Language Power-Ups

Authors: Matthew Watson, Divyashree Shivakumar Sreepathihalli, Francois Chollet, Martin Gorner, Kiranbir Sodhia, Ramesh Sampath, Tirth Patel, Haifeng Jin, Neel Kovelamudi, Gabriel Rasskin, Samaneh Saadat, Luke Wood, Chen Qian, Jonathan Bischof, Ian Stenbit, Abheesht Sharma, Anshuman Mishra

Abstract: We present the Keras domain packages KerasCV and KerasNLP, extensions of the Keras API for Computer Vision and Natural Language Processing workflows, capable of running on JAX, TensorFlow, or PyTorch. These domain packages are designed to enable fast experimentation, with a focus on ease-of-use and performance. We adopt a modular, layered design: at the library's lowest level of abstraction, we provide building blocks for creating models and data preprocessing pipelines, and at the library's highest level of abstraction, we provide pretrained "task" models for popular architectures such as Stable Diffusion, YOLOv8, GPT2, BERT, Mistral, CLIP, Gemma, T5, etc. Task models come with built-in preprocessing and pretrained weights, and can be fine-tuned on raw inputs. To enable efficient training, we support XLA compilation for all models, and run all preprocessing via a compiled graph of TensorFlow operations using the tf.data API. The libraries are fully open-source (Apache 2.0 license) and available on GitHub.

Submitted 5 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: Submitted to Journal of Machine Learning Open Source Software

ACM Class: I.2.5; I.2.7; I.2.10

arXiv:2405.17311 [pdf, other]

Probabilistic Graph Rewiring via Virtual Nodes

Authors: Chendi Qian, Andrei Manolache, Christopher Morris, Mathias Niepert

Abstract: Message-passing graph neural networks (MPNNs) have emerged as a powerful paradigm for graph-based machine learning. Despite their effectiveness, MPNNs face challenges such as under-reaching and over-squashing, where limited receptive fields and structural bottlenecks hinder information flow in the graph. While graph transformers hold promise in addressing these issues, their scalability is limited due to quadratic complexity with respect to the number of nodes, rendering them impractical for larger graphs. Here, we propose implicitly rewired message-passing neural networks (IPR-MPNNs), a novel approach that integrates implicit probabilistic graph rewiring into MPNNs. By introducing a small number of virtual nodes, i.e., adding additional nodes to a given graph and connecting them to existing nodes, in a differentiable, end-to-end manner, IPR-MPNNs enable long-distance message propagation, circumventing quadratic complexity. Theoretically, we demonstrate that IPR-MPNNs surpass the expressiveness of traditional MPNNs. Empirically, we validate our approach by showcasing its ability to mitigate under-reaching and over-squashing effects, achieving state-of-the-art performance across multiple graph datasets. Notably, IPR-MPNNs outperform graph transformers while being significantly faster computationally.
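To illustrate why a few virtual nodes shorten message paths, here is a minimal sketch of one node-to-virtual-to-node exchange. It assumes scalar node features, a fixed hard assignment of each node to one virtual node, mean aggregation, and an equal-weight mix of old state and virtual message; all of these are hypothetical simplifications. IPR-MPNNs learn the node-to-virtual connections probabilistically and end to end, which this sketch does not attempt.

```python
from typing import List


def virtual_node_round(features: List[float], assignment: List[int], num_virtual: int) -> List[float]:
    """One exchange: nodes -> virtual nodes (mean), then virtual nodes -> nodes.

    `assignment[i]` is the virtual node that original node i is (hypothetically) wired to.
    """
    sums = [0.0] * num_virtual
    counts = [0] * num_virtual
    for h, v in zip(features, assignment):
        sums[v] += h
        counts[v] += 1
    # mean of the member states for each virtual node (0.0 for an empty virtual node)
    virtual = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    # each original node mixes its own state with the message from its virtual node
    return [0.5 * h + 0.5 * virtual[v] for h, v in zip(features, assignment)]
```

On a path graph 0-1-2-3-4-5, a signal placed on node 0 would need five ordinary message-passing layers to reach node 5; with all six nodes wired to one virtual node, `virtual_node_round([1, 0, 0, 0, 0, 0], [0] * 6, 1)` already gives node 5 a nonzero value (1/12) after a single round.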

Submitted 7 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.02156

arXiv:2405.13839 [pdf, other]

Diffusing Winding Gradients (DWG): A Parallel and Scalable Method for 3D Reconstruction from Unoriented Point Clouds

Authors: Weizhou Liu, Jiaze Li, Xuhui Chen, Fei Hou, Shiqing Xin, Xingce Wang, Zhongke Wu, Chen Qian, Ying He

Abstract: This paper presents a method for reconstructing watertight 3D surfaces from unoriented point clouds. Starting with randomly initialized normals, the method iteratively refines each normal by diffusing the gradient of the generalized winding number (GWN) field. Upon convergence, the target surface is extracted using the standard Marching Cubes algorithm. Our method is conceptually simple, easy to implement, and does not require numerical solvers, which distinguishes it from existing approaches. Designed for parallelization and scalability, it efficiently handles large-scale models on both CPUs and GPUs. Experimental results demonstrate that our method outperforms all existing methods in reconstructing from unoriented point clouds, particularly in terms of runtime performance. On large-scale models with 10 to 20 million points, our CUDA implementation on an NVIDIA RTX 4090 GPU is typically 30-100x faster than iPSR, the leading sequential method tested on a high-end PC with an Intel i9 CPU. Furthermore, our approach exhibits superior robustness against noise and effectively handles models with thin structures, surpassing existing methods. We will make the source code publicly available to encourage further research and applications.
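The method above is built on the generalized winding number (GWN) field of an oriented point cloud. As a hedged illustration of that field only (not of the diffusion scheme that orients the normals), the sketch below evaluates the standard discrete GWN, w(q) = Σ_i a_i ((p_i − q) · n_i) / (4π |p_i − q|³), where p_i are points, n_i unit normals, and a_i per-point area weights. The Fibonacci-sphere test shape and the uniform weights 4π/N are assumptions chosen so the expected values are known: about 1 inside the surface and about 0 outside.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def winding_number(q: Vec, points: Sequence[Vec], normals: Sequence[Vec], areas: Sequence[float]) -> float:
    """Discrete generalized winding number of an oriented point cloud at query point q."""
    total = 0.0
    for p, n, a in zip(points, normals, areas):
        d = (p[0] - q[0], p[1] - q[1], p[2] - q[2])
        r = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
        total += a * (d[0] * n[0] + d[1] * n[1] + d[2] * n[2]) / (4.0 * math.pi * r ** 3)
    return total


def fibonacci_sphere(n: int) -> List[Vec]:
    """Roughly uniform sample of n points on the unit sphere (test shape for the sketch)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        rad = math.sqrt(1.0 - z * z)
        theta = golden * i
        pts.append((rad * math.cos(theta), rad * math.sin(theta), z))
    return pts
```

For a unit sphere the outward normal equals the point itself, so the same list can be passed as `points` and `normals`. Thresholding w at 0.5 separates inside from outside, which is what makes the field usable for surface extraction; with wrongly oriented normals the field loses this clean 0/1 structure.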

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.04219 [pdf, other]

Iterative Experience Refinement of Software-Developing Agents

Authors: Chen Qian, Jiahao Li, Yufan Dang, Wei Liu, YiFei Wang, Zihao Xie, Weize Chen, Cheng Yang, Yingli Zhang, Zhiyuan Liu, Maosong Sun

Abstract: Autonomous agents powered by large language models (LLMs) show significant potential for achieving high autonomy in various scenarios such as software development. Recent research has shown that LLM agents can leverage past experiences to reduce errors and enhance efficiency. However, the static experience paradigm, reliant on a fixed collection of past experiences acquired heuristically, lacks iterative refinement and thus hampers agents' adaptability. In this paper, we introduce the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution. We propose two fundamental patterns: the successive pattern, refining based on nearest experiences within a task batch, and the cumulative pattern, acquiring experiences across all previous task batches. Augmented with our heuristic experience elimination, the method prioritizes high-quality and frequently-used experiences, effectively managing the experience space and enhancing efficiency. Extensive experiments show that while the successive pattern may yield superior results, the cumulative pattern provides more stable performance. Moreover, experience elimination facilitates achieving better performance using just 11.54% of a high-quality subset.
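The two patterns and the elimination step can be pictured as different views over a pool of experiences grouped by task batch. The sketch below is hypothetical throughout: the `Experience` fields, reading "successive" as "latest batch only", and the quality-times-usage score used for elimination are illustrative assumptions, not the framework's actual data model or heuristic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Experience:
    text: str        # a distilled shortcut, e.g. "run the tests before submitting"
    quality: float   # hypothetical quality score in [0, 1]
    uses: int = 0    # how often agents retrieved this experience


@dataclass
class ExperiencePool:
    batches: List[List[Experience]] = field(default_factory=list)

    def add_batch(self, batch: List[Experience]) -> None:
        self.batches.append(batch)

    def available(self, pattern: str) -> List[Experience]:
        """'successive': only the most recent batch; 'cumulative': all previous batches."""
        if not self.batches:
            return []
        if pattern == "successive":
            return list(self.batches[-1])
        if pattern == "cumulative":
            return [e for batch in self.batches for e in batch]
        raise ValueError(f"unknown pattern: {pattern}")

    @staticmethod
    def eliminate(pool: List[Experience], keep_fraction: float) -> List[Experience]:
        """Keep the top share of experiences, ranked by quality weighted by usage (assumed score)."""
        ranked = sorted(pool, key=lambda e: e.quality * (1 + e.uses), reverse=True)
        k = max(1, round(len(ranked) * keep_fraction))
        return ranked[:k]
```

The trade-off reported above maps onto this picture: the successive view reacts quickly to the newest experiences, while the cumulative view draws on a larger, more stable pool that elimination then trims.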

Submitted 7 May, 2024; originally announced May 2024.

Comments: Work in progress

arXiv:2404.19541 [pdf, other]

Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging

Authors: Rayan Armani, Changlin Qian, Jiaxi Jiang, Christian Holz

Abstract: While camera-based capture systems remain the gold standard for recording human motion, learning-based tracking systems based on sparse wearable sensors are gaining popularity. Most commonly, they use inertial sensors, whose propensity for drift and jitter has so far limited tracking accuracy. In this paper, we propose Ultra Inertial Poser, a novel 3D full body pose estimation method that constrains drift and jitter in inertial tracking via inter-sensor distances. We estimate these distances across sparse sensor setups using a lightweight embedded tracker that augments inexpensive off-the-shelf 6D inertial measurement units with ultra-wideband radio-based ranging, dynamically and without the need for stationary reference anchors. Our method then fuses these inter-sensor distances with the 3D states estimated from each sensor. Our graph-based machine learning model processes the 3D states and distances to estimate a person's 3D full body pose and translation. To train our model, we synthesize inertial measurements and distance estimates from the motion capture database AMASS. For evaluation, we contribute a novel motion dataset of 10 participants who performed 25 motion types, captured by 6 wearable IMU+UWB trackers and an optical motion capture system, totaling 200 minutes of synchronized sensor data (UIP-DB). Our extensive experiments show state-of-the-art performance for our method over PIP and TIP, reducing position error from 13.62 to 10.65 cm (22% better) and lowering jitter from 1.56 to 0.055 km/s³ (a reduction of 97%).
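The percentages quoted above are relative reductions, (before − after) / before. A quick recomputation from the rounded values in the abstract, offered only as a worked check of the arithmetic:

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction when a quantity drops from `before` to `after`."""
    return (before - after) / before


position = relative_reduction(13.62, 10.65)  # position error, cm
jitter = relative_reduction(1.56, 0.055)     # jitter, km/s^3
print(f"position error: {position:.1%}, jitter: {jitter:.1%}")
# prints: position error: 21.8%, jitter: 96.5%
```

From the rounded inputs, the reductions come out to roughly 22% and 96.5%, in line with the figures reported in the abstract up to rounding.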

Submitted 30 April, 2024; originally announced April 2024.

Comments: Accepted by SIGGRAPH 2024. Code: https://github.com/eth-siplab/UltraInertialPoser

MSC Class: 68T07; 68T45; 68U01

ACM Class: I.2; I.3; I.4; I.5

arXiv:2404.19401 [pdf, other]

UniFS: Universal Few-shot Instance Perception with Point Representations

Authors: Sheng Jin, Ruijie Yao, Lumin Xu, Wentao Liu, Chen Qian, Ji Wu, Ping Luo

Abstract: Instance perception tasks (object detection, instance segmentation, pose estimation, counting) play a key role in industrial applications of visual models. As supervised learning methods suffer from high labeling cost, few-shot learning methods which effectively learn from a limited number of labeled examples are desired. Existing few-shot learning methods primarily focus on a restricted set of tasks, presumably due to the challenges involved in designing a generic model capable of representing diverse tasks in a unified manner. In this paper, we propose UniFS, a universal few-shot instance perception model that unifies a wide range of instance perception tasks by reformulating them into a dynamic point representation learning framework. Additionally, we propose Structure-Aware Point Learning (SAPL) to exploit the higher-order structural relationship among points to further enhance representation learning. Our approach makes minimal assumptions about the tasks, yet it achieves competitive results compared to highly specialized and well-optimized specialist models. Code and data are available at https://github.com/jin-s13/UniFS.

Submitted 18 July, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: Accepted by ECCV 2024

arXiv:2404.09927 [pdf, other]

Autonomous Path Planning for Intercostal Robotic Ultrasound Imaging Using Reinforcement Learning

Authors: Yuan Bi, Cheng Qian, Zhicheng Zhang, Nassir Navab, Zhongliang Jiang

Abstract: Ultrasound (US) has been widely used in daily clinical practice for screening internal organs and guiding interventions. However, due to the acoustic shadow cast by the subcutaneous rib cage, US examination for thoracic applications is still challenging. To fully cover and reconstruct the region of interest in US for diagnosis, an intercostal scanning path is necessary. To tackle this challenge, we present a reinforcement learning (RL) approach for planning scanning paths between ribs to monitor changes in lesions on internal organs, such as the liver and heart, which are covered by rib cages. Structured anatomical information of the human skeleton is crucial for planning these intercostal paths. To obtain such anatomical insight, an RL agent is trained in a virtual environment constructed using computed tomography (CT) templates with randomly initialized tumors of various shapes and locations. In addition, task-specific state representation and reward functions are introduced to ensure the convergence of the training process while minimizing the effects of acoustic attenuation and shadows during scanning. To validate the effectiveness of the proposed approach, experiments have been carried out on unseen CTs with randomly defined single or multiple scanning targets. The results demonstrate the efficiency of the proposed RL framework in planning non-shadowed US scanning trajectories in areas with limited acoustic access.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2403.15627 [pdf]

Nanoscale Imaging of Phonons and Reconfiguration in Topologically-Engineered, Self-Assembled Nanoparticle Lattice

Authors: Chang Qian, Ethan Stanifer, Zhan Ma, Binbin Luo, Chang Liu, Lehan Yao, Wenxiao Pan, Xiaoming Mao, Qian Chen

Abstract: Topologically-engineered mechanical frames are important model constructs for architecture, machine mechanisms, and metamaterials. Despite significant advances in macroscopically fashioned frames, realization and phonon imaging of nanoframes have remained challenging. Here we extend for the first time the principles of topologically-engineered mechanical frames to lattices self-assembled from nanoparticles. Liquid-phase transmission electron microscopy images the vibrations of nanoparticles in self-assembled Maxwell and hexagonal lattices at the nanometer resolution, measuring a series of otherwise inaccessible properties such as phonon spectra and nonlinear lattice deformation paths. These properties are experimentally modulated by ionic strength, captured by our discrete mechanical model considering the complexity of nanoscale interactions and thermal fluctuations. The experiment-theory integration bridges mechanical metamaterials and colloidal self-assembly, opening new opportunities to manufacture phononic devices with solution processibility, transformability, light weight, and emergent functions, at underexplored length, frequency, and energy scales.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.11441 [pdf, other]

doi 10.1126/sciadv.adp2877

Experimental Quantum Byzantine Agreement on a Three-User Quantum Network with Integrated Photonics

Authors: Xu Jing, Cheng Qian, Chen-Xun Weng, Bing-Hong Li, Zhe Chen, Chen-Quan Wang, Jie Tang, Xiao-Wen Gu, Yue-Chan Kong, Tang-Sheng Chen, Hua-Lei Yin, Dong Jiang, Bin Niu, Liang-Liang Lu

Abstract: Quantum communication networks are crucial for both secure communication and cryptographic networked tasks. Building quantum communication networks in a scalable and cost-effective way is essential for their widespread adoption, among which a stable and miniaturized high-quality quantum light source is a key component. Here, we establish a complete polarization entanglement-based fully connected network, which features an ultrabright integrated Bragg reflection waveguide quantum source, managed by an untrusted service provider, and a streamlined polarization analysis module, which requires only one single-photon detector for each end user. We perform a continuously working quantum entanglement distribution and create correlated bit strings between users. Within the framework of one-time universal hashing, we provide the first experimental implementation of source-independent quantum digital signatures using imperfect keys, circumventing the necessity for privacy amplification. More importantly, we further beat the 1/3 fault-tolerance bound in Byzantine agreement, achieving unconditional security without relying on sophisticated techniques. Our results offer an affordable and practical route for addressing consensus challenges within the emerging quantum network landscape.

Submitted 27 August, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

Journal ref: Science Advances 10, eadp2877 (2024)

arXiv:2403.10319 [pdf, other]

NetBench: A Large-Scale and Comprehensive Network Traffic Benchmark Dataset for Foundation Models

Authors: Chen Qian, Xiaochang Li, Qineng Wang, Gang Zhou, Huajie Shao

Abstract: In computer networking, network traffic refers to the amount of data transmitted in the form of packets between internetworked computers or Cyber-Physical Systems. Monitoring and analyzing network traffic is crucial for ensuring the performance, security, and reliability of a network. However, a significant challenge in network traffic analysis is to process diverse data packets including both ciphertext and plaintext. While many methods have been adopted to analyze network traffic, they often rely on different datasets for performance evaluation. This inconsistency results in substantial manual data processing efforts and unfair comparisons. Moreover, some data processing methods may cause data leakage due to improper separation of training and testing data. To address these issues, we introduce NetBench, a large-scale and comprehensive benchmark dataset for assessing machine learning models, especially foundation models, in both network traffic classification and generation tasks. NetBench is built upon seven publicly available datasets and encompasses a broad spectrum of 20 tasks, including 15 classification tasks and 5 generation tasks. Furthermore, we evaluate eight state-of-the-art (SOTA) classification models (including two foundation models) and two generative models using our benchmark. The results show that foundation models significantly outperform the traditional deep learning methods in traffic classification.
We believe NetBench will facilitate fair comparisons among various approaches and advance the development of foundation models for network traffic. Our benchmark is available at https://github.com/WM-JayLab/NetBench.
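One way the leakage mentioned above can arise is splitting at packet level, so that packets of the same flow land in both training and testing data. A common safeguard, sketched here as an assumption rather than as NetBench's documented procedure, is to assign whole flows to one side. The `flow_id` field name, the 20% test share, and the fixed seed are hypothetical choices.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def split_by_flow(packets: Sequence[dict], flow_key: str = "flow_id", test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[dict], List[dict]]:
    """Assign whole flows (not individual packets) to train or test, so no flow appears in both."""
    flows: Dict[Hashable, List[dict]] = defaultdict(list)
    for pkt in packets:
        flows[pkt[flow_key]].append(pkt)
    ids = sorted(flows, key=str)          # deterministic order before shuffling
    random.Random(seed).shuffle(ids)      # reproducible random assignment of flows
    n_test = max(1, round(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])
    train = [p for fid in ids if fid not in test_ids for p in flows[fid]]
    test = [p for fid in ids if fid in test_ids for p in flows[fid]]
    return train, test
```

Splitting on flows rather than packets costs some balance control (whole flows move together) but keeps near-duplicate packets from the same session out of the evaluation set.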

Submitted 18 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.09338 [pdf, other]

LocalMamba: Visual State Space Model with Windowed Selective Scan

Authors: Tao Huang, Xiaohuan Pei, Shan You, Fei Wang, Chen Qian, Chang Xu

Abstract: Recent advancements in state space models, notably Mamba, have demonstrated significant progress in modeling long sequences for tasks like language understanding. Yet, their application in vision tasks has not markedly surpassed the performance of traditional Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). This paper posits that the key to enhancing Vision Mamba (ViM) lies in optimizing scan directions for sequence modeling. Traditional ViM approaches, which flatten spatial tokens, overlook the preservation of local 2D dependencies, thereby elongating the distance between adjacent tokens. We introduce a novel local scanning strategy that divides images into distinct windows, effectively capturing local dependencies while maintaining a global perspective. Additionally, acknowledging the varying preferences for scan patterns across different network layers, we propose a dynamic method to independently search for the optimal scan choices for each layer, substantially improving performance. Extensive experiments across both plain and hierarchical models underscore our approach's superiority in effectively capturing image representations. For example, our model significantly outperforms Vim-Ti by 3.1% on ImageNet with the same 1.5G FLOPs. Code is available at: https://github.com/hunto/LocalMamba.
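The difference between plain flattening and a windowed scan is just the order in which token indices are fed to the sequence model. The sketch below produces such an order for an H×W token grid, assuming square windows visited row-major and row-major traversal inside each window; these are illustrative assumptions, and the per-layer search over scan directions described above is not modeled.

```python
from typing import List


def local_scan_order(height: int, width: int, window: int) -> List[int]:
    """Row-major token indices of an H x W grid, visited window by window."""
    order = []
    for wr in range(0, height, window):          # window rows
        for wc in range(0, width, window):       # window columns
            for r in range(wr, min(wr + window, height)):
                for c in range(wc, min(wc + window, width)):
                    order.append(r * width + c)  # index in plain row-major flattening
    return order
```

For a 4×4 grid with 2×2 windows the order is `[0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]`: vertically adjacent tokens 0 and 4, which sit 4 positions apart after plain flattening, are now only 2 positions apart. Partial windows at the border are handled by the `min` bounds.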

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.08604 [pdf, other]

DevBench: A Comprehensive Benchmark for Software Development

Authors: Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen

Abstract: Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities. However, existing benchmarks predominantly focus on simplified or isolated aspects of programming, such as single-file code generation or repository issue debugging, falling short of measuring the full spectrum of challenges raised by real-world programming activities. To this end, we propose DevBench, a comprehensive benchmark that evaluates LLMs across various stages of the software development lifecycle, including software design, environment setup, implementation, acceptance testing, and unit testing. DevBench features a wide range of programming languages and domains, high-quality data collection, and carefully designed and verified metrics for each task. Empirical studies show that current LLMs, including GPT-4-Turbo, fail to solve the challenges presented within DevBench. Analyses reveal that models struggle with understanding the complex structures in the repository, managing the compilation process, and grasping advanced programming concepts. Our findings offer actionable insights for the future development of LLMs toward real-world programming applications. Our benchmark is available at https://github.com/open-compass/DevBench.

Submitted 15 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: Our data and code are available at https://github.com/open-compass/DevBench

arXiv:2403.05155 [pdf, other]

LanePtrNet: Revisiting Lane Detection as Point Voting and Grouping on Curves

Authors: Jiayan Cao, Xueyu Zhu, Cheng Qian

Abstract: Lane detection plays a critical role in the field of autonomous driving. Prevailing methods generally adopt basic concepts (anchors, key points, etc.) from object detection and segmentation tasks, but these approaches require manual adjustments for curved objects, involve exhaustive searches on predefined anchors, require complex post-processing steps, and may lack flexibility when applied to real-world scenarios. In this paper, we propose a novel approach, LanePtrNet, which treats lane detection as a process of point voting and grouping on ordered sets: our method takes backbone features as input and predicts a curve-aware centerness, which represents each lane as a point and assigns the most probable center point to it. A novel point sampling method is proposed to generate a set of candidate points based on the votes received. By leveraging features from local neighborhoods and a cross-instance attention score, we design a grouping module that further performs lane-wise clustering between neighboring and seeding points. Furthermore, our method can accommodate a point-based framework (e.g., the PointNet++ series) as an alternative backbone. This flexibility enables effortless extension to 3D lane detection tasks. We conduct comprehensive experiments to validate the effectiveness of our proposed approach, demonstrating its superior performance.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.01922 [pdf, other]

doi 10.1109/PerComWorkshops59983.2024.10503436

FlowPrecision: Advancing FPGA-Based Real-Time Fluid Flow Estimation with Linear Quantization

Authors: Tianheng Ling, Julian Hoever, Chao Qian, Gregor Schiele

Abstract: In industrial and environmental monitoring, achieving real-time and precise fluid flow measurement remains a critical challenge. This study applies linear quantization in FPGA-based soft sensors for fluid flow estimation, significantly enhancing neural network model precision by overcoming the limitations of traditional fixed-point quantization. Our approach achieves up to a 10.10% reduction in Mean Squared Error and a notable 9.39% improvement in inference speed through targeted hardware optimizations. Validated across multiple data sets, our findings demonstrate that the optimized FPGA-based quantized models can provide efficient, accurate real-time inference, offering a viable alternative to cloud-based processing in pervasive autonomous systems.
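The abstract does not spell out its quantization scheme, so the following is a hedged sketch of the textbook affine (scale plus zero-point) linear quantization to signed 8-bit integers, not the paper's FPGA implementation. The int8 range and per-tensor min/max calibration are assumptions. Unlike a fixed-point format with a preset binary point, the scale here adapts to the observed value range.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float, int]:
    """Per-tensor affine quantization: q = clamp(round(v / scale) + zero_point, -128, 127)."""
    qmin, qmax = -128, 127
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # include 0.0 so it maps exactly
    scale = (hi - lo) / (qmax - qmin) or 1.0               # guard against an all-zero tensor
    zero_point = int(round(qmin - lo / scale))
    q = [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integers back to approximate real values: v ~ (q - zero_point) * scale."""
    return [(x - zero_point) * scale for x in q]
```

For inputs within the calibrated range, the round-trip error of each value is at most half a quantization step (`scale / 2`), and 0.0 is represented exactly, which matters for padding and ReLU outputs in integer arithmetic.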

Submitted 20 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: 6 pages, 3 figures, The 22nd International Conference on Pervasive Computing and Communications (PerCom 2024), PerConAI Workshop

arXiv:2403.01740 [pdf, other]

DEMOS: Dynamic Environment Motion Synthesis in 3D Scenes via Local Spherical-BEV Perception

Authors: Jingyu Gong, Min Wang, Wentao Liu, Chen Qian, Zhizhong Zhang, Yuan Xie, Lizhuang Ma

Abstract: Motion synthesis in real-world 3D scenes has recently attracted much attention. However, the static environment assumption made by most current methods usually cannot be satisfied, especially for real-time motion synthesis in scanned point cloud scenes where multiple dynamic objects exist, e.g., moving persons or vehicles. To handle this problem, we propose the first Dynamic Environment MOtion Synthesis framework (DEMOS) to predict future motion instantly according to the current scene, and use it to dynamically update the latent motion for final motion synthesis. Concretely, we propose a Spherical-BEV perception method to extract local scene features that are specifically designed for instant scene-aware motion prediction. Then, we design a time-variant motion blending to fuse the new predicted motions into the latent motion, and the final motion is derived from the updated latent motions, benefitting both from motion-prior and iterative methods. We unify the data format of two prevailing datasets, PROX and GTA-IM, and take them for motion synthesis evaluation in 3D scenes. We also assess the effectiveness of the proposed method in dynamic environments from GTA-IM and Semantic3D to check the responsiveness. The results show our method outperforms previous works significantly and performs well in handling dynamic environments.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.19465 [pdf, other]

Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models

Authors: Chen Qian, Jie Zhang, Wei Yao, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu, Jing Shao

Abstract: Ensuring the trustworthiness of large language models (LLMs) is crucial. Most studies concentrate on fully pre-trained LLMs to better understand and improve LLMs' trustworthiness. In this paper, to reveal the untapped potential of pre-training, we pioneer the exploration of LLMs' trustworthiness during this period, focusing on five key dimensions: reliability, privacy, toxicity, fairness, and robustness. To begin with, we apply linear probing to LLMs. The high probing accuracy suggests that LLMs in early pre-training can already distinguish concepts in each trustworthiness dimension. Therefore, to further uncover the hidden possibilities of pre-training, we extract steering vectors from an LLM's pre-training checkpoints to enhance the LLM's trustworthiness. Finally, inspired by Choi et al. (2023), who show that mutual information estimation is bounded by linear probing accuracy, we also probe LLMs with mutual information to investigate the dynamics of trustworthiness during pre-training. We are the first to observe a similar two-phase phenomenon: fitting and compression (Shwartz-Ziv and Tishby, 2017). This research provides an initial exploration of trustworthiness modeling during LLM pre-training, seeking to unveil new insights and spur further developments in the field. We will make our code publicly accessible at https://github.com/ChnQ/TracingLLM.
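The abstract does not detail how the steering vectors are extracted. One common recipe, assumed here for illustration only, takes the difference between mean hidden activations on contrasting prompt sets (e.g., trustworthy versus untrustworthy completions) at a chosen layer of a checkpoint, then adds a scaled copy of that direction to hidden states at inference time. The toy vectors below stand in for real model activations, and `alpha` is a hypothetical strength parameter.

```python
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(positive: Sequence[Vector], negative: Sequence[Vector]) -> Vector:
    """Difference of class-mean activations: mean(positive) - mean(negative)."""
    mp, mn = mean(positive), mean(negative)
    return [a - b for a, b in zip(mp, mn)]


def steer(activation: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift a hidden state along the steering direction with strength alpha."""
    return [h + alpha * d for h, d in zip(activation, direction)]
```

The same class-separated activations could also feed a linear probe; a probe that separates them well suggests that a mean-difference direction is meaningful at that checkpoint and layer.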

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.18498

doi 10.1145/3640543.3645198

Take It, Leave It, or Fix It: Measuring Productivity and Trust in Human-AI Collaboration

Authors: Crystal Qian, James Wexler

Abstract: Although recent developments in generative AI have greatly enhanced the capabilities of conversational agents such as Google's Gemini (formerly Bard) or OpenAI's ChatGPT, it is unclear whether using these agents helps users across various contexts. To better understand how access to conversational AI affects productivity and trust, we conducted a mixed-methods, task-based user study, observing software engineers (N=76) as they completed a programming exam with and without access to Bard. Effects on performance, efficiency, satisfaction, and trust vary depending on user expertise, question type (open-ended "solve" vs. definitive "search" questions), and measurement type (demonstrated vs. self-reported). Our findings include evidence of automation complacency, increased reliance on the AI over the course of the task, and increased performance for novices on "solve"-type questions when using the AI. We discuss common behaviors, design recommendations, and impact considerations to improve collaborations with conversational AI.

Submitted 1 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: 15 pages. Published in the 29th International Conference on Intelligent User Interfaces (IUI '24)

arXiv:2402.18439

Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication

Authors: Weize Chen, Chenfei Yuan, Jiarui Yuan, Yusheng Su, Chen Qian, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun

Abstract: Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs). Yet, besides NL, LLMs have seen various non-NL formats during pre-training, such as code and logical expressions. NL's status as the optimal format for LLMs, particularly in single-LLM reasoning and multi-agent communication, has not been thoroughly examined. In this work, we challenge the default use of NL by exploring the utility of non-NL formats in these contexts. We show that allowing LLMs to autonomously select the most suitable format before reasoning or communicating leads to a 3.3 to 5.7% improvement in reasoning efficiency for different LLMs, and up to a 72.7% reduction in token usage in multi-agent communication, all while maintaining communicative effectiveness. Our comprehensive analysis further reveals that LLMs can devise a format from limited task instructions and that the devised format is effectively transferable across different LLMs. Intriguingly, the structured communication format decided by LLMs exhibits notable parallels with established agent communication languages, suggesting a natural evolution toward efficient, structured agent communication. Our code is released at https://github.com/thunlp/AutoForm.

Submitted 18 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: Code release at https://github.com/thunlp/AutoForm

Showing 1–50 of 413 results for author: Qian, C