Showing 1–50 of 801 results for author: Shen, C

  1. arXiv:2408.16937  [pdf, other]

    cs.CL

    Plausible-Parrots @ MSP2023: Enhancing Semantic Plausibility Modeling using Entity and Event Knowledge

    Authors: Chong Shen, Chenyue Zhou

    Abstract: In this work, we investigate the effectiveness of injecting external knowledge into a large language model (LLM) to identify semantic plausibility of simple events. Specifically, we enhance the LLM with fine-grained entity types, event types and their definitions extracted from an external knowledge base. This knowledge is injected into our system via designed templates. We also augment the data t…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures, 5 tables

  2. arXiv:2408.14144  [pdf, other]

    cs.LG cs.DC

    Neighborhood and Global Perturbations Supported SAM in Federated Learning: From Local Tweaks To Global Awareness

    Authors: Boyuan Li, Zihao Peng, Yafei Li, Mingliang Xu, Shengbo Chen, Baofeng Ji, Cong Shen

    Abstract: Federated Learning (FL) can be coordinated under the orchestration of a central server to collaboratively build a privacy-preserving model without the need for data exchange. However, participant data heterogeneity leads to local optima divergence, subsequently affecting convergence outcomes. Recent research has focused on global sharpness-aware minimization (SAM) and dynamic regularization techni…

    Submitted 29 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  3. arXiv:2408.11313  [pdf, other]

    cs.AI

    Unlocking Adversarial Suffix Optimization Without Affirmative Phrases: Efficient Black-box Jailbreaking via LLM as Optimizer

    Authors: Weipeng Jiang, Zhenting Wang, Juan Zhai, Shiqing Ma, Zhengyu Zhao, Chao Shen

    Abstract: Despite prior safety alignment efforts, mainstream LLMs can still generate harmful and unethical content when subjected to jailbreaking attacks. Existing jailbreaking methods fall into two main categories: template-based and optimization-based methods. The former requires significant manual effort and domain knowledge, while the latter, exemplified by Greedy Coordinate Gradient (GCG), which seeks…

    Submitted 20 August, 2024; originally announced August 2024.

  4. arXiv:2408.05449  [pdf]

    physics.optics cs.CV physics.app-ph

    Unidirectional imaging with partially coherent light

    Authors: Guangdong Ma, Che-Yung Shen, Jingxi Li, Luzhe Huang, Cagatay Isil, Fazil Onuralp Ardic, Xilin Yang, Yuhang Li, Yuntian Wang, Md Sadman Sakib Rahman, Aydogan Ozcan

    Abstract: Unidirectional imagers form images of input objects only in one direction, e.g., from field-of-view (FOV) A to FOV B, while blocking the image formation in the reverse direction, from FOV B to FOV A. Here, we report unidirectional imaging under spatially partially coherent light and demonstrate high-quality imaging only in the forward direction (A->B) with high power efficiency while distorting th…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 25 Pages, 8 Figures

  5. arXiv:2408.03508  [pdf]

    cond-mat.mtrl-sci cs.LG eess.SY

    Autonomous, Self-driving Multi-Step Growth of Semiconductor Heterostructures Guided by Machine Learning

    Authors: Chao Shen, Wenkang Zhan, Hongyu Sun, Kaiyao Xin, Bo Xu, Zhanguo Wang, Chao Zhao

    Abstract: The semiconductor industry has prioritized automating repetitive tasks through closed-loop, autonomous experimentation, which enables accelerated optimization of complex multi-step processes. The emergence of machine learning (ML) has ushered in automated processes with minimal human intervention. In this work, we develop SemiEpi, a self-driving automation platform capable of executing molecular beam epit…

    Submitted 8 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: 5 figures

  6. arXiv:2408.02635  [pdf, other]

    cs.CV

    Interactive 3D Medical Image Segmentation with SAM 2

    Authors: Chuyun Shen, Wenhao Li, Yuhang Shi, Xiangfeng Wang

    Abstract: Interactive medical image segmentation (IMIS) has shown significant potential in enhancing segmentation accuracy by integrating iterative feedback from medical professionals. However, the limited availability of 3D medical data restricts the generalization and robustness of most IMIS methods. The Segment Anything Model (SAM), though effective for 2D images, requires expensive semi-auto slic…

    Submitted 5 August, 2024; originally announced August 2024.

  7. Degrade to Function: Towards Eco-friendly Morphing Devices that Function Through Programmed Sequential Degradation

    Authors: Qiuyu Lu, Semina Yi, Mentian Gan, Jihong Huang, Xiao Zhang, Yue Yang, Chenyi Shen, Lining Yao

    Abstract: While it seems counterintuitive to think of degradation within an operating device as beneficial, one may argue that when rationally designed, the controlled breakdown of materials can be harnessed for specific functions. To apply this principle to the design of morphing devices, we introduce the concept of Degrade to Function (DtF). This concept aims to create eco-friendly and self-contained morp…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: 24 pages, 24 figures, The 37th Annual ACM Symposium on User Interface Software and Technology (UIST 24)

  8. arXiv:2408.00929  [pdf, other]

    cs.LG cs.CR

    Verification of Machine Unlearning is Fragile

    Authors: Binchi Zhang, Zihan Chen, Cong Shen, Jundong Li

    Abstract: As privacy concerns escalate in the realm of machine learning, data owners now have the option to utilize machine unlearning to remove their data from machine learning models, following recent legislation. To enhance transparency in machine unlearning and avoid potential dishonesty by model providers, various verification strategies have been proposed. These strategies enable data owners to ascert…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ICML 2024

  9. arXiv:2407.19845  [pdf, other]

    cs.LG cs.CR

    BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning

    Authors: Baoyuan Wu, Hongrui Chen, Mingda Zhang, Zihao Zhu, Shaokui Wei, Danni Yuan, Mingli Zhu, Ruotong Wang, Li Liu, Chao Shen

    Abstract: As an emerging approach to explore the vulnerability of deep neural networks (DNNs), backdoor learning has attracted increasing interest in recent years, and many seminal backdoor attack and defense algorithms are being developed successively or concurrently, in a rapid arms race. However, mainly due to the diverse settings, and the difficulties of implementation and reproducibility…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Substantial extensions based on our previous conference version "Backdoorbench: A comprehensive benchmark of backdoor learning" published at NeurIPS D&B Track 2022. 20 backdoor attack algorithms, 32 backdoor defense algorithms, 11000+ pairs of attack-against-defense evaluations, 10 analyses, 18 analysis tools

  10. arXiv:2407.16655  [pdf, other]

    cs.CV

    MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

    Authors: Canyu Zhao, Mingyu Liu, Wen Wang, Jianlong Yuan, Hao Chen, Bo Zhang, Chunhua Shen

    Abstract: Recent advancements in video generation have primarily leveraged diffusion models for short-duration content. However, these approaches often fall short in modeling complex narratives and maintaining character consistency over extended periods, which is essential for long-form video production like movies. We propose MovieDreamer, a novel hierarchical framework that integrates the strengths of aut…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 23 pages, 18 figures

  11. arXiv:2407.14904  [pdf, other]

    eess.IV cs.AI cs.CL cs.CV

    Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning

    Authors: Chen Shen, Chunfeng Lian, Wanqing Zhang, Fan Wang, Jianhua Zhang, Shuanliang Fan, Xin Wei, Gongji Wang, Kehan Li, Hongshu Mu, Hao Wu, Xinggong Liang, Jianhua Ma, Zhenyuan Wang

    Abstract: Forensic pathology is critical in determining the cause and manner of death through post-mortem examinations, both macroscopic and microscopic. The field, however, grapples with issues such as outcome variability, laborious processes, and a scarcity of trained professionals. This paper presents SongCi, an innovative visual-language model (VLM) designed specifically for forensic pathology. SongCi u…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: 28 pages, 6 figures, under review

  12. arXiv:2407.12226  [pdf, other]

    cs.LG

    Individualized Federated Learning for Traffic Prediction with Error Driven Aggregation

    Authors: Hang Chen, Collin Meese, Mark Nejad, Chien-Chung Shen

    Abstract: Low-latency traffic prediction is vital for smart city traffic management. Federated Learning has emerged as a promising technique for Traffic Prediction (FLTP), offering several advantages such as privacy preservation, reduced communication overhead, improved prediction accuracy, and enhanced adaptability to changing traffic conditions. However, the majority of current FLTP frameworks lack a real…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 16 pages, 4 figures

  13. arXiv:2407.10785  [pdf, other]

    eess.IV cs.CV

    Interpretability analysis on a pathology foundation model reveals biologically relevant embeddings across modalities

    Authors: Nhat Le, Ciyue Shen, Chintan Shah, Blake Martin, Daniel Shenker, Harshith Padigela, Jennifer Hipp, Sean Grullon, John Abel, Harsha Vardhan Pokkalla, Dinkar Juyal

    Abstract: Mechanistic interpretability has been explored in detail for large language models (LLMs). For the first time, we provide a preliminary investigation with similar interpretability methods for medical imaging. Specifically, we analyze the features from a ViT-Small encoder obtained from a pathology Foundation Model via application to two datasets: one dataset of pathology images, and one dataset of…

    Submitted 15 July, 2024; originally announced July 2024.

  14. arXiv:2407.10575  [pdf, other]

    cs.CV

    A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication

    Authors: Jingyi Deng, Chenhao Lin, Zhengyu Zhao, Shuai Liu, Qian Wang, Chao Shen

    Abstract: Deep generative models have demonstrated impressive performance in various computer vision applications, including image synthesis, video generation, and medical analysis. Despite their significant advancements, these models may be used for malicious purposes, such as misinformation, deception, and copyright violation. In this paper, we provide a systematic and timely review of research efforts on…

    Submitted 15 July, 2024; originally announced July 2024.

  15. arXiv:2407.10196  [pdf, other]

    cs.LG cs.AI

    A3S: A General Active Clustering Method with Pairwise Constraints

    Authors: Xun Deng, Junlong Liu, Han Zhong, Fuli Feng, Chen Shen, Xiangnan He, Jieping Ye, Zheng Wang

    Abstract: Active clustering aims to boost the clustering performance by integrating human-annotated pairwise constraints through strategic querying. Conventional approaches with semi-supervised clustering schemes encounter high query costs when applied to large datasets with numerous classes. To address these limitations, we propose a novel Adaptive Active Aggregation and Splitting (A3S) framework, falling…

    Submitted 14 July, 2024; originally announced July 2024.

  16. arXiv:2407.09295  [pdf, other]

    cs.CR

    Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study

    Authors: Yulong Yang, Xinshan Yang, Shuaidong Li, Chenhao Lin, Zhengyu Zhao, Chao Shen, Tianwei Zhang

    Abstract: The rapid progress in the reasoning capability of the Multi-modal Large Language Models (MLLMs) has triggered the development of autonomous agent systems on mobile devices. MLLM-based mobile agent systems consist of perception, reasoning, memory, and multi-agent collaboration modules, enabling automatic analysis of user instructions and the design of task pipelines with only natural language and d…

    Submitted 17 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: Preprint. Work in progress

  17. arXiv:2407.09247  [pdf, other]

    cs.AI

    Constrained Intrinsic Motivation for Reinforcement Learning

    Authors: Xiang Zheng, Xingjun Ma, Chao Shen, Cong Wang

    Abstract: This paper investigates two fundamental problems that arise when utilizing Intrinsic Motivation (IM) for reinforcement learning in Reward-Free Pre-Training (RFPT) tasks and Exploration with Intrinsic Motivation (EIM) tasks: 1) how to design an effective intrinsic objective in RFPT tasks, and 2) how to reduce the bias introduced by the intrinsic objective in EIM tasks. Existing IM methods suffer fr…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCAI 2024

  18. arXiv:2407.09120  [pdf, other]

    cs.LG cs.CL cs.CV

    URRL-IMVC: Unified and Robust Representation Learning for Incomplete Multi-View Clustering

    Authors: Ge Teng, Ting Mao, Chen Shen, Xiang Tian, Xuesong Liu, Yaowu Chen, Jieping Ye

    Abstract: Incomplete multi-view clustering (IMVC) aims to cluster multi-view data that are only partially available. This poses two main challenges: effectively leveraging multi-view information and mitigating the impact of missing views. Prevailing solutions employ cross-view contrastive learning and missing view recovery techniques. However, they either neglect valuable complementary information by focusi…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM SIGKDD 2024

  19. arXiv:2407.07930  [pdf]

    q-bio.BM cs.LG

    Token-Mol 1.0: Tokenized drug design with large language model

    Authors: Jike Wang, Rui Qin, Mingyang Wang, Meijing Fang, Yangyang Zhang, Yuchen Zhu, Qun Su, Qiaolin Gou, Chao Shen, Odin Zhang, Zhenxing Wu, Dejun Jiang, Xujun Zhang, Huifeng Zhao, Xiaozhe Wan, Zhourui Wu, Liwei Liu, Yu Kang, Chang-Yu Hsieh, Tingjun Hou

    Abstract: Significant interest has recently arisen in leveraging sequence-based large language models (LLMs) for drug design. However, most current applications of LLMs in drug discovery lack the ability to comprehend three-dimensional (3D) structures, thereby limiting their effectiveness in tasks that explicitly involve molecular conformations. In this study, we introduced Token-Mol, a token-only 3D drug…

    Submitted 19 August, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  20. arXiv:2407.06083  [pdf, other]

    cs.LG cs.IR

    A Survey of Controllable Learning: Methods and Applications in Information Retrieval

    Authors: Chenglei Shen, Xiao Zhang, Teng Shi, Changshuo Zhang, Guofu Xie, Jun Xu

    Abstract: Controllable learning (CL) emerges as a critical component in trustworthy machine learning, ensuring that learners meet predefined targets and can adaptively adjust without retraining according to the changes in those targets. We provide a formal definition of CL, and discuss its applications in information retrieval (IR) where information needs are often complex and dynamic. The survey categorize…

    Submitted 4 July, 2024; originally announced July 2024.

  21. arXiv:2407.04947  [pdf, other]

    cs.CV

    FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior

    Authors: Zhekai Chen, Wen Wang, Zhen Yang, Zeqing Yuan, Hao Chen, Chunhua Shen

    Abstract: We offer a novel approach to image composition, which integrates multiple input images into a single, coherent image. Rather than concentrating on specific use cases such as appearance editing (image harmonization) or semantic editing (semantic image composition), we showcase the potential of utilizing the powerful generative prior inherent in large-scale pre-trained diffusion models to accomplish…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted to Proc. Eur. Conf. Comp. Vision 2024. Project webpage: https://github.com/aim-uofa/FreeCompose

  22. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  23. arXiv:2407.03130  [pdf, other]

    cs.CV

    Towards Efficient Pixel Labeling for Industrial Anomaly Detection and Localization

    Authors: Hanxi Li, Jingqi Wu, Lin Yuanbo Wu, Hao Chen, Deyin Liu, Chunhua Shen

    Abstract: In the realm of practical Anomaly Detection (AD) tasks, manual labeling of anomalous pixels proves to be a costly endeavor. Consequently, many AD methods are crafted as one-class classifiers, tailored for training sets completely devoid of anomalies, ensuring a more cost-effective approach. While some pioneering work has demonstrated heightened AD accuracy by incorporating real anomaly samples in…

    Submitted 4 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 18 pages, 5 figures

  24. arXiv:2407.02805  [pdf, other]

    cs.SE cs.AI

    Efficient DNN-Powered Software with Fair Sparse Models

    Authors: Xuanqi Gao, Weipeng Jiang, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, Chao Shen

    Abstract: With the emergence of the Software 3.0 era, there is a growing trend of compressing and integrating large models into software systems, with significant societal implications. Regrettably, in numerous instances, model compression techniques impact the fairness performance of these models and thus the ethical behavior of DNN-powered software. One of the most notable examples is the Lottery Ticket Hy…

    Submitted 3 July, 2024; originally announced July 2024.

  25. arXiv:2407.02014  [pdf, other]

    cs.CV

    Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning

    Authors: Chengchao Shen, Jianzhong Chen, Jianxin Wang

    Abstract: The existing contrastive learning methods mainly focus on single-grained representation learning, e.g., part-level, object-level or scene-level ones, thus inevitably neglecting the transferability of representations on other granularity levels. In this paper, we aim to learn multi-grained representations, which can effectively describe the image on various granularity levels, thus improving genera…

    Submitted 2 July, 2024; originally announced July 2024.

  26. arXiv:2406.19311  [pdf, other]

    cs.CR cs.SD eess.AS

    Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems

    Authors: Zheng Fang, Tao Wang, Lingchen Zhao, Shenyi Zhang, Bowen Li, Yunjie Ge, Qi Li, Chao Shen, Qian Wang

    Abstract: In recent years, extensive research has been conducted on the vulnerability of ASR systems, revealing that black-box adversarial example attacks pose significant threats to real-world ASR systems. However, most existing black-box attacks rely on queries to the target ASRs, which is impractical when queries are not permitted. In this paper, we propose ZQ-Attack, a transfer-based adversarial attack…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in the Proceedings of The ACM Conference on Computer and Communications Security (CCS), 2024

  27. arXiv:2406.14913  [pdf, other]

    physics.soc-ph cs.MA

    Cooperative bots exhibit nuanced effects on cooperation across strategic frameworks

    Authors: Zehua Si, Zhixue He, Chen Shen, Jun Tanimoto

    Abstract: The positive impact of cooperative bots on cooperation within evolutionary game theory is well documented; however, existing studies have predominantly used discrete strategic frameworks, focusing on deterministic actions with a fixed probability of one. This paper extends the investigation to continuous and mixed strategic approaches. Continuous strategies employ intermediate probabilities to con…

    Submitted 21 June, 2024; originally announced June 2024.

  28. arXiv:2406.13988  [pdf, other]

    cs.CV

    LGmap: Local-to-Global Mapping Network for Online Long-Range Vectorized HD Map Construction

    Authors: Kuang Wu, Sulei Nian, Can Shen, Chuan Yang, Zhanbin Li

    Abstract: This report introduces the first-place winning solution for the Autonomous Grand Challenge 2024 - Mapless Driving. In this report, we introduce a novel online mapping pipeline, LGmap, which is adept at long-range temporal modeling. Firstly, we propose symmetric view transformation (SVT), a hybrid view transformation module. Our approach overcomes the limitations of forward sparse feature representation an…

    Submitted 20 June, 2024; originally announced June 2024.

  29. arXiv:2406.12671  [pdf, other]

    cs.CV

    GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models

    Authors: Yongtao Ge, Guangkai Xu, Zhiyue Zhao, Libo Sun, Zheng Huang, Yanlong Sun, Hao Chen, Chunhua Shen

    Abstract: Recent advances in discriminative and generative pretraining have yielded geometry estimation models with strong generalization capabilities. While discriminative monocular geometry estimation methods rely on large-scale fine-tuning data to achieve zero-shot generalization, several generative-based paradigms show the potential of achieving impressive generalization performance on unseen scenes by…

    Submitted 20 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Code and Benchmark are available at: https://github.com/aim-uofa/GeoBench

  30. arXiv:2406.12196  [pdf, other]

    cs.SE

    CITADEL: Context Similarity Based Deep Learning Framework Bug Finding

    Authors: Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Shiwei Wang, Chao Shen

    Abstract: With deep learning (DL) technology becoming an integral part of the new intelligent software, tools of DL framework testing and bug-finding are in high demand. Existing DL framework testing tools have limited coverage on bug types. For example, they lack the capability of finding performance bugs, which are critical for DL model training and inference regarding performance, economics, and the envi…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 12 pages, 10 figures

  31. arXiv:2406.11548  [pdf, other]

    cs.RO cs.AI cs.CV

    AIC MLLM: Autonomous Interactive Correction MLLM for Robust Robotic Manipulation

    Authors: Chuyan Xiong, Chengyu Shen, Xiaoqi Li, Kaichen Zhou, Jiaming Liu, Ruiping Wang, Hao Dong

    Abstract: The ability to reflect on and correct failures is crucial for robotic systems to interact stably with real-life objects. Observing the generalization and reasoning capabilities of Multimodal Large Language Models (MLLMs), previous approaches have aimed to utilize these models to enhance robotic systems accordingly. However, these methods typically focus on high-level planning corrections using an ad…

    Submitted 23 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  32. arXiv:2406.10584  [pdf, other]

    cs.CL

    Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models

    Authors: Chengzhengxu Li, Xiaoming Liu, Zhaohan Zhang, Yichen Wang, Chen Liu, Yu Lan, Chao Shen

    Abstract: Recent advances in prompt optimization have notably enhanced the performance of pre-trained language models (PLMs) on downstream tasks. However, the potential of optimized prompts on domain generalization has been under-explored. To explore the nature of prompt generalization on unknown domains, we conduct pilot experiments and find that (i) Prompts gaining more attention weight from PLMs' deep la…

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: Submitted to NeurIPS 2024, Preprint, Under review

  33. arXiv:2406.10125  [pdf, other]

    cs.CV

    MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report

    Authors: Zhongyu Yang, Mai Liu, Jinluo Xie, Yueming Zhang, Chen Shen, Wei Shao, Jichao Jiao, Tengfei Xing, Runbo Hu, Pengfei Xu

    Abstract: Autonomous driving without high-definition (HD) maps demands a higher level of active scene understanding. In this competition, the organizers provided the multi-perspective camera images and standard-definition (SD) maps to explore the boundaries of scene reasoning capabilities. We found that most existing algorithms construct Bird's Eye View (BEV) features from these multi-perspective images and…

    Submitted 14 June, 2024; originally announced June 2024.

  34. arXiv:2406.08477  [pdf, other]

    cs.IR

    Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens

    Authors: Ting-Ji Huang, Jia-Qi Yang, Chunxu Shen, Kai-Qi Liu, De-Chuan Zhan, Han-Jia Ye

    Abstract: Characterizing users and items through vector representations is crucial for various tasks in recommender systems. Recent approaches attempt to apply Large Language Models (LLMs) in recommendation through a question and answer format, where real users and items (e.g., Item No.2024) are represented with in-vocabulary tokens (e.g., "item", "20", "24"). However, since LLMs are typically pretrained on…

    Submitted 12 June, 2024; originally announced June 2024.

  35. arXiv:2406.06579  [pdf, other]

    cs.CL cs.AI cs.CV

    From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models

    Authors: Xiaofeng Zhang, Chen Shen, Xiaosong Yuan, Shaotian Yan, Liang Xie, Wenxiao Wang, Chaochen Gu, Hao Tang, Jieping Ye

    Abstract: Recently, multimodal large language models have proliferated in endless variety. Most of the popular Large Vision Language Models (LVLMs) depend on sequential visual representation, where images are converted into hundreds or thousands of tokens before being input into the Large Language Model (LLM) along with language prompts. The black-box design hinders the interpretability of visual-language…

    Submitted 13 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  36. arXiv:2406.05810  [pdf, other]

    cs.CV

    ControlLoc: Physical-World Hijacking Attack on Visual Perception in Autonomous Driving

    Authors: Chen Ma, Ningfei Wang, Zhengyu Zhao, Qian Wang, Qi Alfred Chen, Chao Shen

    Abstract: Recent research in adversarial machine learning has focused on visual perception in Autonomous Driving (AD) and has shown that printed adversarial patches can attack object detectors. However, it is important to note that AD visual perception encompasses more than just object detection; it also includes Multiple Object Tracking (MOT). MOT enhances the robustness by compensating for object detectio…

    Submitted 9 June, 2024; originally announced June 2024.

  37. arXiv:2406.05800   

    cs.CV cs.CR

    SlowPerception: Physical-World Latency Attack against Visual Perception in Autonomous Driving

    Authors: Chen Ma, Ningfei Wang, Zhengyu Zhao, Qi Alfred Chen, Chao Shen

    Abstract: Autonomous Driving (AD) systems critically depend on visual perception for real-time object detection and multiple object tracking (MOT) to ensure safe driving. However, high latency in these visual perception components can lead to significant safety risks, such as vehicle collisions. While previous research has extensively explored latency attacks within the digital realm, translating these meth…

    Submitted 19 July, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: This submission was made without all contributors' consent

  38. arXiv:2406.04596  [pdf, other]

    cs.LG

    Federated Representation Learning in the Under-Parameterized Regime

    Authors: Renpu Liu, Cong Shen, Jing Yang

    Abstract: Federated representation learning (FRL) is a popular personalized federated learning (FL) framework where clients work together to train a common representation while retaining their personalized heads. Existing studies, however, largely focus on the over-parameterized regime. In this paper, we make the initial efforts to investigate FRL in the under-parameterized regime, where the FL model is ins…

    Submitted 17 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: This work has been accepted to ICML 2024

  39. arXiv:2406.04149  [pdf]

    eess.IV cs.AI

    Characterizing segregation in blast rock piles: a deep-learning approach leveraging aerial image analysis

    Authors: Chengeng Liu, Sihong Liu, Chaomin Shen, Yupeng Gao, Yuxuan Liu

    Abstract: Blasted rock material serves a critical role in various engineering applications, yet the phenomenon of segregation, where particle sizes vary significantly along the gradient of a quarry pile, presents challenges for optimizing quarry material storage and handling. This study introduces an advanced image analysis methodology to characterize such segregation of rock fragments. The accurate delineati…

    Submitted 6 June, 2024; originally announced June 2024.

  40. arXiv:2406.03730  [pdf, other]

    cs.LG cs.AI

    FastGAS: Fast Graph-based Annotation Selection for In-Context Learning

    Authors: Zihan Chen, Song Wang, Cong Shen, Jundong Li

    Abstract: In-context learning (ICL) empowers large language models (LLMs) to tackle new tasks by using a series of training instances as prompts. Since generating the prompts needs to sample from a vast pool of instances and annotate them (e.g., add labels in classification task), existing methods have proposed to select a subset of unlabeled examples for annotation, thus enhancing the quality of prompts an…

    Submitted 6 June, 2024; originally announced June 2024.

  41. arXiv:2406.03726  [pdf]

    cs.LG

    Efficient Graph Encoder Embedding for Large Sparse Graphs in Python

    Authors: Xihan Qin, Cencheng Shen

    Abstract: Graph is a ubiquitous representation of data in various research fields, and graph embedding is a prevalent machine learning technique for capturing key features and generating fixed-sized attributes. However, most state-of-the-art graph embedding methods are computationally and spatially expensive. Recently, the Graph Encoder Embedding (GEE) has been shown as the fastest graph embedding technique…

    Submitted 5 June, 2024; originally announced June 2024.

  42. arXiv:2406.03141  [pdf, other]

    q-bio.BM cs.LG

    Floating Anchor Diffusion Model for Multi-motif Scaffolding

    Authors: Ke Liu, Weian Mao, Shuaike Shen, Xiaoran Jiao, Zheng Sun, Hao Chen, Chunhua Shen

    Abstract: Motif scaffolding seeks to design scaffold structures for constructing proteins with functions derived from the desired motif, which is crucial for the design of vaccines and enzymes. Previous works approach the problem by inpainting or conditional generation. Both of them can only scaffold motifs with fixed positions, and the conditional generation cannot guarantee the presence of motifs. However…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  43. arXiv:2406.02435  [pdf, other]

    cs.CV

    Generative Active Learning for Long-tailed Instance Segmentation

    Authors: Muzhi Zhu, Chengxiang Fan, Hao Chen, Yang Liu, Weian Mao, Xiaogang Xu, Chunhua Shen

    Abstract: Recently, large-scale language-image generative models have gained widespread attention and many works have utilized generated data from these models to further enhance the performance of perception tasks. However, not all generated data can positively impact downstream models, and these methods do not thoroughly explore how to better select and utilize generated data. On the other hand, there is…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  44. arXiv:2406.02189  [pdf, other]

    cs.LG

    Fast and Scalable Multi-Kernel Encoder Classifier

    Authors: Cencheng Shen

    Abstract: This paper introduces a new kernel-based classifier by viewing kernel matrices as generalized graphs and leveraging recent progress in graph embedding techniques. The proposed method facilitates fast and scalable kernel matrix embedding, and seamlessly integrates multiple kernels to enhance the learning process. Our theoretical analysis offers a population-level characterization of this approach u…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 12 pages main + 3 pages appendix

  45. arXiv:2406.00602  [pdf, other]

    cs.SE cs.PL

    From Effectiveness to Efficiency: Comparative Evaluation of Code Generated by LCGMs for Bilingual Programming Questions

    Authors: Weipeng Jiang, Xuanqi Gao, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, Chao Shen

    Abstract: Large Code Generation Models (LCGMs) have garnered significant attention and achieved promising results across various programming tasks. However, concerns arise regarding performance when using non-English prompts, as these models are primarily trained on English-centric corpora, and most programming language tokens resemble English. Existing benchmarks often rely on English programming questions…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 10 and a quarter pages, 6 figures

  46. arXiv:2406.00584  [pdf, other]

    cs.DB cs.AI

    A Blueprint Architecture of Compound AI Systems for Enterprise

    Authors: Eser Kandogan, Sajjadur Rahman, Nikita Bhutani, Dan Zhang, Rafael Li Chen, Kushan Mitra, Sairam Gurajada, Pouya Pezeshkpour, Hayate Iso, Yanlin Feng, Hannah Kim, Chen Shen, Jin Wang, Estevam Hruschka

    Abstract: Large Language Models (LLMs) have showcased remarkable capabilities surpassing conventional NLP challenges, creating opportunities for use in production use cases. Towards this goal, there is a notable shift to building compound AI systems, wherein LLMs are integrated into an expansive software infrastructure with many components like models, retrievers, databases and tools. In this paper, we intr…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Compound AI Systems Workshop at the Data+AI Summit 2024

  47. arXiv:2405.17976  [pdf]

    cs.AI cs.CL

    Yuan 2.0-M32: Mixture of Experts with Attention Router

    Authors: Shaohua Wu, Jiangang Luo, Xi Chen, Lingjun Li, Xudong Zhao, Tong Yu, Chao Wang, Yue Wang, Fei Wang, Weixu Qiao, Houbo He, Zeru Zhang, Zeyu Sun, Junxiong Mao, Chong Shen

    Abstract: Yuan 2.0-M32, with a similar base architecture as Yuan-2.0 2B, uses a mixture-of-experts architecture with 32 experts of which 2 experts are active. A new router network, Attention Router, is proposed and adopted for a more efficient selection of experts, which improves the accuracy compared to the model with classical router network. Yuan 2.0-M32 is trained with 2000B tokens from scratch, and the…

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 14 pages, 3 figures, 7 tables

  48. arXiv:2405.15473  [pdf, other]

    stat.ML cs.LG cs.SI

    Encoder Embedding for General Graph and Node Classification

    Authors: Cencheng Shen

    Abstract: Graph encoder embedding, a recent technique for graph data, offers speed and scalability in producing vertex-level representations from binary graphs. In this paper, we extend the applicability of this method to a general graph model, which includes weighted graphs, distance matrices, and kernel matrices. We prove that the encoder embedding satisfies the law of large numbers and the central limit…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 26 pages

  49. arXiv:2405.14092  [pdf, other]

    cs.CL

    Large Language Models Can Self-Correct with Minimal Effort

    Authors: Zhenyu Wu, Qingkai Zeng, Zhihan Zhang, Zhaoxuan Tan, Chao Shen, Meng Jiang

    Abstract: Intrinsic self-correct was a method that instructed large language models (LLMs) to verify and correct their responses without external feedback. Unfortunately, the study concluded that the LLMs could not self-correct reasoning yet. We find that a simple yet effective verification method can unleash inherent capabilities of the LLMs. That is to mask a key condition in the question, add the current…

    Submitted 23 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Work in Progress

  50. arXiv:2405.13870  [pdf, other]

    cs.CV

    FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

    Authors: Ganggui Ding, Canyu Zhao, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen

    Abstract: Benefiting from large-scale pre-trained text-to-image (T2I) generative models, impressive progress has been achieved in customized image generation, which aims to generate user-specified concepts. Existing approaches have extensively focused on single-concept customization and still encounter challenges when it comes to complex scenarios that involve combining multiple concepts. These approaches o…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: CVPR 2024