Showing 1–50 of 250 results for author: Yan, R

Searching in archive cs.
  1. arXiv:2407.16207  [pdf, other]

    cs.CL

    Graph-Structured Speculative Decoding

    Authors: Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

    Abstract: Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models (LLMs) by employing a small language model to draft a hypothesis sequence, which is then validated by the LLM. The effectiveness of this approach heavily relies on the balance between performance and efficiency of the draft model. In our research, we focus on enhancing the proportion of d…

    Submitted 23 July, 2024; originally announced July 2024.

  2. arXiv:2407.15202  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Exploiting Pre-trained Models for Drug Target Affinity Prediction with Nearest Neighbors

    Authors: Qizhi Pei, Lijun Wu, Zhenyu He, Jinhua Zhu, Yingce Xia, Shufang Xie, Rui Yan

    Abstract: Drug-Target binding Affinity (DTA) prediction is essential for drug discovery. Despite the application of deep learning methods to DTA prediction, the achieved accuracy remains suboptimal. In this work, inspired by the recent success of retrieval methods, we propose $k$NN-DTA, a non-parametric embedding-based retrieval method adopted on a pre-trained DTA prediction model, which can extend the power…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted by 33rd ACM International Conference on Information and Knowledge Management 2024 (CIKM 2024)

  3. arXiv:2407.06677  [pdf, other]

    cs.CL

    Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules

    Authors: Zhuocheng Gong, Ang Lv, Jian Guan, Junxi Yan, Wei Wu, Huishuai Zhang, Minlie Huang, Dongyan Zhao, Rui Yan

    Abstract: Is it always necessary to compute tokens from shallow to deep layers in Transformers? The continued success of vanilla Transformers and their variants suggests an undoubted "yes". In this work, however, we attempt to break the depth-ordered convention by proposing a novel architecture dubbed mixture-of-modules (MoM), which is motivated by an intuition that any layer, regardless of its position, ca…

    Submitted 9 July, 2024; originally announced July 2024.

  4. arXiv:2407.05246  [pdf, other]

    cs.LG cs.CV

    Deep Online Probability Aggregation Clustering

    Authors: Yuxuan Yan, Na Lu, Ruofan Yan

    Abstract: Combining machine clustering with deep models has shown remarkable superiority in deep clustering. It modifies the data processing pipeline into two alternating phases: feature clustering and model training. However, such an alternating schedule may lead to instability and computational burden issues. We propose a centerless clustering algorithm called Probability Aggregation Clustering (PAC) to proa…

    Submitted 13 July, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

    Comments: 19 pages, 2 figures, conference

  5. arXiv:2407.01601  [pdf, other]

    cs.LG cs.AI

    Unveiling and Controlling Anomalous Attention Distribution in Transformers

    Authors: Ruiqing Yan, Xingbo Du, Haoyu Deng, Linghan Zheng, Qiuzhuang Sun, Jifang Hu, Yuhang Shao, Penghao Jiang, Jinrong Jiang, Lian Zhao

    Abstract: With the advent of large models based on the Transformer architecture, researchers have observed an anomalous phenomenon in the Attention mechanism--there is a very high attention on the first element, which is prevalent across Transformer-based models. It is crucial to understand it for the development of techniques focusing on attention distribution, such as Key-Value (KV) Cache compression and…

    Submitted 3 July, 2024; v1 submitted 26 June, 2024; originally announced July 2024.

  6. arXiv:2407.00993  [pdf, other]

    cs.AI cs.CL

    Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents

    Authors: Shihan Deng, Weikai Xu, Hongda Sun, Wei Liu, Tao Tan, Jianfeng Liu, Ang Li, Jian Luan, Bin Wang, Rui Yan, Shuo Shang

    Abstract: With the remarkable advancements of large language models (LLMs), LLM-based agents have become a research hotspot in human-computer interaction. However, there is a scarcity of benchmarks available for LLM-based mobile agents. Benchmarking these agents generally faces three main challenges: (1) The inefficiency of UI-only operations imposes limitations to task evaluation. (2) Specific instructions…

    Submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2406.19934  [pdf, other]

    cs.CL cs.AI

    From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis

    Authors: Chuanqi Cheng, Jian Guan, Wei Wu, Rui Yan

    Abstract: We explore multi-step reasoning in vision-language models (VLMs). The problem is challenging, as reasoning data consisting of multiple steps of visual and language processing are barely available. To overcome the challenge, we first introduce a least-to-most visual reasoning paradigm, which interleaves steps of decomposing a question into sub-questions and invoking external tools for resolving sub…

    Submitted 28 June, 2024; originally announced June 2024.

  8. arXiv:2406.19853  [pdf, other]

    cs.CL cs.AI

    YuLan: An Open-source Large Language Model

    Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billi…

    Submitted 28 June, 2024; originally announced June 2024.

  9. arXiv:2406.19598  [pdf, other]

    cs.CL

    Mixture of In-Context Experts Enhance LLMs' Long Context Awareness

    Authors: Hongzhan Lin, Ang Lv, Yuhan Chen, Chen Zhu, Yang Song, Hengshu Zhu, Rui Yan

    Abstract: Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions. Their limited context awareness can lead to overlooking critical information and subsequent task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging. In this paper, for LLMs utili…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 14 pages, 5 figures

  10. arXiv:2406.06517  [pdf, other]

    cs.CV

    Genomics-guided Representation Learning for Pathologic Pan-cancer Tumor Microenvironment Subtype Prediction

    Authors: Fangliangzi Meng, Hongrun Zhang, Ruodan Yan, Guohui Chuai, Chao Li, Qi Liu

    Abstract: The characterization of Tumor MicroEnvironment (TME) is challenging due to its complexity and heterogeneity. Relatively consistent TME characteristics embedded within highly specific tissue features render them difficult to predict. The capability to accurately classify TME subtypes is of critical significance for clinical tumor diagnosis and precision medicine. Based on the observation that tumo…

    Submitted 8 July, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: MICCAI2024

  11. arXiv:2406.06434  [pdf, ps, other]

    eess.IV cs.CV

    Spatiotemporal Graph Neural Network Modelling Perfusion MRI

    Authors: Ruodan Yan, Carola-Bibiane Schönlieb, Chao Li

    Abstract: Perfusion MRI (pMRI) offers valuable insights into tumor vascularity and promises to predict tumor genotypes, thus benefiting prognosis for glioma patients, yet effective models tailored to 4D pMRI are still lacking. This study presents the first attempt to model 4D pMRI using a GNN-based spatiotemporal model PerfGAT, integrating spatial information and temporal kinetics to predict Isocitrate DeHy…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 11 pages, 2 figures

  12. arXiv:2406.05797  [pdf, other]

    q-bio.BM cs.AI cs.CE cs.CL cs.LG

    3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization

    Authors: Qizhi Pei, Lijun Wu, Kaiyuan Gao, Jinhua Zhu, Rui Yan

    Abstract: The integration of molecule and language has garnered increasing attention in molecular science. Recent advancements in Language Models (LMs) have demonstrated potential for the comprehensive modeling of molecule and language. However, existing works exhibit notable limitations. Most existing works overlook the modeling of 3D information, which is crucial for understanding molecular structures and…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 18 pages

  13. arXiv:2406.05360  [pdf, other]

    cs.CL

    Flexible and Adaptable Summarization via Expertise Separation

    Authors: Xiuying Chen, Mingzhe Li, Shen Gao, Xin Cheng, Qingqing Zhu, Rui Yan, Xin Gao, Xiangliang Zhang

    Abstract: A proficient summarization model should exhibit both flexibility -- the capacity to handle a range of in-domain summarization tasks, and adaptability -- the competence to acquire new knowledge and adjust to unseen out-of-domain tasks. Unlike large language models (LLMs) that achieve this through parameter scaling, we propose a more parameter-efficient approach in this study. Our motivation rests o…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages, 7 figures, published in SIGIR 2024

  14. arXiv:2406.03894  [pdf, other]

    cs.LG

    Transductive Off-policy Proximal Policy Optimization

    Authors: Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu, Junliang Xing

    Abstract: Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies is constrained. This paper introduces a novel off-policy extension to the original PPO method, christened Transductive Off-policy PPO (ToPPO). Herein, we provi…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 18

  15. arXiv:2406.03678  [pdf, other]

    cs.LG cs.AI stat.ML

    Reflective Policy Optimization

    Authors: Yaozhong Gan, Renye Yan, Zhe Wu, Junliang Xing

    Abstract: On-policy reinforcement learning methods, like Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), often demand extensive data per update, leading to sample inefficiency. This paper introduces Reflective Policy Optimization (RPO), a novel on-policy extension that amalgamates past and future state-action information for policy optimization. This approach empowers the age…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 20 pages

  16. arXiv:2406.03075  [pdf, other]

    cs.CL

    Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

    Authors: Xiaoxi Sun, Jinpeng Li, Yan Zhong, Dongyan Zhao, Rui Yan

    Abstract: The advent of large language models (LLMs) has facilitated the development of natural language text generation. It also poses unprecedented challenges, with content hallucination emerging as a significant concern. Existing solutions often involve expensive and complex interventions during the training process. Moreover, some approaches emphasize problem disassembly while neglecting the crucial val…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 18 pages, 3 figures

  17. arXiv:2406.03002  [pdf, other]

    eess.IV cs.CV

    Phy-Diff: Physics-guided Hourglass Diffusion Model for Diffusion MRI Synthesis

    Authors: Juanhua Zhang, Ruodan Yan, Alessandro Perelli, Xi Chen, Chao Li

    Abstract: Diffusion MRI (dMRI) is an important neuroimaging technique with high acquisition costs. Deep learning approaches have been used to enhance dMRI and predict diffusion biomarkers through undersampled dMRI. To generate more comprehensive raw dMRI, generative adversarial network based methods are proposed to include b-values and b-vectors as conditions, but they are limited by unstable training and l…

    Submitted 10 July, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI 2024

  18. arXiv:2406.00672  [pdf, other]

    cs.CV

    Task-oriented Embedding Counts: Heuristic Clustering-driven Feature Fine-tuning for Whole Slide Image Classification

    Authors: Xuenian Wang, Shanshan Shi, Renao Yan, Qiehe Sun, Lianghui Zhu, Tian Guan, Yonghong He

    Abstract: In the field of whole slide image (WSI) classification, multiple instance learning (MIL) serves as a promising approach, commonly decoupled into feature extraction and aggregation. In this paradigm, our observation reveals that discriminative embeddings are crucial for aggregation to the final prediction. Among all feature updating strategies, task-oriented ones can capture characteristics specifi…

    Submitted 2 June, 2024; originally announced June 2024.

  19. arXiv:2405.20343  [pdf, other]

    cs.CV cs.GR cs.LG

    Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image

    Authors: Kailu Wu, Fangfu Liu, Zhihan Cai, Runjie Yan, Hanyang Wang, Yating Hu, Yueqi Duan, Kaisheng Ma

    Abstract: In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability. Previous methods based on Score Distillation Sampling (SDS) can produce diversified 3D results by distilling 3D knowledge from large 2D diffusion models, but they usually suffer from…

    Submitted 13 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Project page: https://wukailu.github.io/Unique3D

    ACM Class: I.2.10

  20. arXiv:2405.18113  [pdf, other]

    cs.CL cs.AI

    Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting

    Authors: Hongda Sun, Hongzhan Lin, Haiyu Yan, Chen Zhu, Yang Song, Xin Gao, Shuo Shang, Rui Yan

    Abstract: The emergence of online recruitment services has revolutionized the traditional landscape of job seeking and recruitment, necessitating the development of high-quality industrial applications to improve person-job fitting. Existing methods generally rely on modeling the latent semantics of resumes and job descriptions and learning a matching function between them. Inspired by the powerful role-pla…

    Submitted 28 May, 2024; originally announced May 2024.

  21. arXiv:2405.16914  [pdf, other]

    cs.SE

    Rigorous Simulation-based Testing for Autonomous Driving Systems -- Targeting the Achilles' Heel of Four Open Autopilots

    Authors: Changwen Li, Joseph Sifakis, Rongjie Yan, Jian Zhang

    Abstract: Simulation-based testing remains the main approach for validating Autonomous Driving Systems. We propose a rigorous test method based on breaking down scenarios into simple ones, taking into account the fact that autopilots make decisions according to traffic rules whose application depends on local knowledge and context. This leads us to consider the autopilot as a dynamic system receiving three…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 57 pages, 19 figures, 29 tables

  22. arXiv:2405.15182  [pdf, other]

    cs.CR cs.AI

    RFLPA: A Robust Federated Learning Framework against Poisoning Attacks with Secure Aggregation

    Authors: Peihua Mai, Ran Yan, Yan Pang

    Abstract: Federated learning (FL) allows multiple devices to train a model collaboratively without sharing their data. Despite its benefits, FL is vulnerable to privacy leakage and poisoning attacks. To address the privacy concern, secure aggregation (SecAgg) is often used to obtain the aggregation of gradients on the server without inspecting individual user updates. Unfortunately, existing defense strategies a…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 22 pages

    ACM Class: E.4

  23. arXiv:2405.13810  [pdf, other]

    cs.LG cs.AI

    Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

    Authors: Xin Cheng, Xiuying Chen, Shuqi Li, Di Luo, Xun Wang, Dongyan Zhao, Rui Yan

    Abstract: Time series prediction is crucial for understanding and forecasting complex dynamics in various domains, ranging from finance and economics to climate and healthcare. Based on Transformer architecture, one approach involves encoding multiple variables from the same timestamp into a single temporal token to model global dependencies. In contrast, another approach embeds the time points of individua…

    Submitted 22 May, 2024; originally announced May 2024.

  24. arXiv:2405.13432  [pdf, other]

    cs.CL cs.AI

    Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

    Authors: Tingchen Fu, Deng Cai, Lemao Liu, Shuming Shi, Rui Yan

    Abstract: Supervised fine-tuning (SFT) on instruction-following corpus is a crucial approach toward the alignment of large language models (LLMs). However, the performance of LLMs on standard knowledge and reasoning benchmarks tends to suffer from deterioration at the latter stage of the SFT process, echoing the phenomenon of alignment tax. Through our pilot study, we put forward a hypothesis that the data biases a…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted to the findings of ACL2024

  25. arXiv:2405.11315  [pdf, other]

    cs.CV

    MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection

    Authors: Ximiao Zhang, Min Xu, Dehui Qiu, Ruixin Yan, Ning Lang, Xiuzhuang Zhou

    Abstract: In the field of medical decision-making, precise anomaly detection in medical imaging plays a pivotal role in aiding clinicians. However, previous work is reliant on large-scale datasets for training anomaly detection models, which increases the development cost. This paper first focuses on the task of medical image anomaly detection in the few-shot setting, which is critically significant for the…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 12 pages, 3 figures, 5 tables, early accepted at MICCAI 2024

  26. arXiv:2405.10988  [pdf, other]

    cs.LG cs.AI

    Flow Score Distillation for Diverse Text-to-3D Generation

    Authors: Runjie Yan, Kailu Wu, Kaisheng Ma

    Abstract: Recent advancements in Text-to-3D generation have yielded remarkable progress, particularly through methods that rely on Score Distillation Sampling (SDS). While SDS exhibits the capability to create impressive 3D assets, it is hindered by its inherent maximum-likelihood-seeking essence, resulting in limited diversity in generation outcomes. In this paper, we discover that the Denoise Diffusion Im…

    Submitted 16 May, 2024; originally announced May 2024.

  27. arXiv:2405.02538  [pdf, other]

    cs.CV

    AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition

    Authors: Meiqi Cao, Rui Yan, Xiangbo Shu, Guangzhao Dai, Yazhou Yao, Guo-Sen Xie

    Abstract: Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes, including individual activities, group activities, and global activities. Previous methods 1) heavily rely on manually annotated detection boxes in training and inference, hindering further practical deployment; or 2) directly employ normal detectors to detect multip…

    Submitted 3 May, 2024; originally announced May 2024.

  28. arXiv:2405.02077  [pdf, other]

    cs.CV

    MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition

    Authors: Hongyu Qu, Rui Yan, Xiangbo Shu, Hailiang Gao, Peng Huang, Guo-Sen Xie

    Abstract: Recent few-shot action recognition (FSAR) methods typically perform semantic matching on learned discriminative features to achieve promising performance. However, most FSAR methods focus on single-scale (e.g., frame-level, segment-level, etc.) feature alignment, which ignores that human actions with the same semantics may appear at different velocities. To this end, we develop a novel Multi-Velocit…

    Submitted 23 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  29. arXiv:2404.16852  [pdf, other]

    cs.LG cs.AI cs.CL eess.IV

    A Disease Labeler for Chinese Chest X-Ray Report Generation

    Authors: Mengwei Wang, Ruixin Yan, Zeyi Hou, Ning Lang, Xiuzhuang Zhou

    Abstract: In the field of medical image analysis, the scarcity of Chinese chest X-ray report datasets has hindered the development of technology for generating Chinese chest X-ray reports. On one hand, the construction of a Chinese chest X-ray report dataset is limited by the time-consuming and costly process of accurate expert disease annotation. On the other hand, a single natural language generation metr…

    Submitted 18 March, 2024; originally announced April 2024.

  30. arXiv:2404.15771  [pdf, other]

    cs.CV cs.MM

    DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines

    Authors: Xin Jiang, Hao Tang, Rui Yan, Jinhui Tang, Zechao Li

    Abstract: Fine-grained image retrieval (FGIR) is to learn visual representations that distinguish visually similar objects while maintaining generalization. Existing methods propose to generate discriminative features, but rarely consider the particularity of the FGIR task itself. This paper presents a meticulous analysis leading to the proposal of practical guidelines to identify subcategory-specific discr…

    Submitted 24 April, 2024; originally announced April 2024.

  31. arXiv:2404.15597  [pdf, other]

    cs.NE cs.AI cs.LG cs.MA

    GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL

    Authors: Lang Qin, Ziming Wang, Runhao Jiang, Rui Yan, Huajin Tang

    Abstract: Spiking neural networks (SNNs) are widely applied in various fields due to their energy-efficient and fast-inference capabilities. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resource requirements for agents and improve the algorithm's performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms,…

    Submitted 23 April, 2024; originally announced April 2024.

  32. arXiv:2404.15353  [pdf, other]

    eess.SP cs.AI cs.LG

    SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals

    Authors: Runze Yan, Cheng Ding, Ran Xiao, Aleksandr Fedorov, Randall J Lee, Fadi Nahab, Xiao Hu

    Abstract: Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambu…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 15 pages; 9 figures; 2024 Conference on Health, Inference, and Learning (CHIL)

  33. arXiv:2404.10679  [pdf, other]

    cs.GT cs.AI

    HSVI-based Online Minimax Strategies for Partially Observable Stochastic Games with Neural Perception Mechanisms

    Authors: Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska

    Abstract: We consider a variant of continuous-state partially-observable stochastic games with neural perception mechanisms and an asymmetric information structure. One agent has partial information, with the observation function implemented as a neural network, while the other agent is assumed to have full knowledge of the state. We present, for the first time, an efficient online method to compute an…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 12 pages, 2 figures

  34. arXiv:2404.04272  [pdf, other]

    cs.IR cs.CL

    Selecting Query-bag as Pseudo Relevance Feedback for Information-seeking Conversations

    Authors: Xiaoqing Zhang, Xiuying Chen, Shen Gao, Shuqi Li, Xin Gao, Ji-Rong Wen, Rui Yan

    Abstract: Information-seeking dialogue systems are widely used in e-commerce systems, with answers that must be tailored to fit the specific settings of the online system. Given the user query, the information-seeking dialogue systems first retrieve a subset of response candidates, then further select the best response from the candidate set through re-ranking. Current methods mainly retrieve response candi…

    Submitted 22 March, 2024; originally announced April 2024.

  35. arXiv:2403.19521  [pdf, other]

    cs.CL cs.AI cs.LG

    Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models

    Authors: Ang Lv, Yuhan Chen, Kaiyi Zhang, Yulong Wang, Lifeng Liu, Ji-Rong Wen, Jian Xie, Rui Yan

    Abstract: In this paper, we delve into several mechanisms employed by Transformer-based language models (LLMs) for factual recall tasks. We outline a pipeline consisting of three major steps: (1) Given a prompt ``The capital of France is,'' task-specific attention heads extract the topic token, such as ``France,'' from the context and pass it to subsequent MLPs. (2) As attention heads' outputs are aggregate…

    Submitted 24 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  36. arXiv:2403.15105  [pdf, other]

    cs.SI

    LLM-Driven Agents for Influencer Selection in Digital Advertising Campaigns

    Authors: Xiaoqing Zhang, Xiuying Chen, Yuhan Liu, Jianzhou Wang, Zhenxing Hu, Rui Yan

    Abstract: In the digital world, influencers are pivotal as opinion leaders, shaping the views and choices of their influencees. Modern advertising often follows this trend, where marketers choose appropriate influencers for product endorsements, based on thorough market analysis. Previous studies on influencer selection have typically relied on numerical representations of individual opinions and interactio…

    Submitted 22 March, 2024; originally announced March 2024.

  37. arXiv:2403.11439  [pdf, other]

    cs.CL

    StyleChat: Learning Recitation-Augmented Memory in LLMs for Stylized Dialogue Generation

    Authors: Jinpeng Li, Zekai Zhang, Quan Tu, Xin Cheng, Dongyan Zhao, Rui Yan

    Abstract: Large Language Models (LLMs) demonstrate superior performance in generative scenarios and have attracted widespread attention. Among them, stylized dialogue generation is essential in the context of LLMs for building intelligent and engaging dialogue agents. However, the ability of LLMs is data-driven and limited by data bias, leading to poor performance on specific tasks. In particular, stylized di…

    Submitted 17 March, 2024; originally announced March 2024.

  38. arXiv:2403.09498  [pdf, other]

    cs.SI cs.AI cs.CL

    From Skepticism to Acceptance: Simulating the Attitude Dynamics Toward Fake News

    Authors: Yuhan Liu, Xiuying Chen, Xiaoqing Zhang, Xing Gao, Ji Zhang, Rui Yan

    Abstract: In the digital era, the rapid propagation of fake news and rumors via social networks brings notable societal challenges and impacts public opinion regulation. Traditional fake news modeling typically forecasts the general popularity trends of different groups or numerically represents opinion shifts. However, these methods often oversimplify real-world complexities and overlook the rich semantic…

    Submitted 14 March, 2024; originally announced March 2024.

  39. arXiv:2403.08312  [pdf, other]

    cs.CL cs.AI

    StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses

    Authors: Jia-Nan Li, Quan Tu, Cunli Mao, Zhengtao Yu, Ji-Rong Wen, Rui Yan

    Abstract: Standard Large Language Models (LLMs) struggle with handling dialogues with long contexts due to efficiency and consistency issues. According to our observation, dialogue contexts are highly structured, and the special token of \textit{End-of-Utterance} (EoU) in dialogues has the potential to aggregate information. We refer to the EoU tokens as ``conversational attention sinks'' (conv-attn sinks).…

    Submitted 13 March, 2024; originally announced March 2024.

  40. arXiv:2403.06408  [pdf, other]

    cs.LG cs.AI

    What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation

    Authors: Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

    Abstract: Quantization has emerged as a promising technique for improving the memory and computational efficiency of large language models (LLMs). Though the trade-off between performance and efficiency is well-known, there is still much to be learned about the relationship between quantization and LLM performance. To shed light on this relationship, we propose a new perspective on quantization, viewing it…

    Submitted 10 March, 2024; originally announced March 2024.

  41. arXiv:2403.06202  [pdf, other]

    eess.SY cs.GT

    Pursuit Winning Strategies for Reach-Avoid Games with Polygonal Obstacles

    Authors: Rui Yan, Shuai Mi, Xiaoming Duan, Jintao Chen, Xiangyang Ji

    Abstract: This paper studies a multiplayer reach-avoid differential game in the presence of general polygonal obstacles that block the players' motions. The pursuers cooperate to protect a convex region from the evaders who try to reach the region. We propose a multiplayer onsite and close-to-goal (MOCG) pursuit strategy that can tell and achieve an increasing lower bound on the number of guaranteed defeate…

    Submitted 22 May, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

    Comments: 16 pages, 10 figures

  42. arXiv:2403.05217  [pdf, other]

    cs.CL cs.AI cs.IR

    Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering

    Authors: Hongda Sun, Yuxuan Liu, Chengwei Wu, Haiyu Yan, Cheng Tai, Xin Gao, Shuo Shang, Rui Yan

    Abstract: Open-domain question answering (ODQA) has emerged as a pivotal research spotlight in information systems. Existing methods follow two main paradigms to collect evidence: (1) The \textit{retrieve-then-read} paradigm retrieves pertinent documents from an external corpus; and (2) the \textit{generate-then-read} paradigm employs large language models (LLMs) to generate relevant documents. However, nei…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: TheWebConf 2024 (WWW 2024) oral, code repo: https://github.com/EthanLeo-LYX/LLMQA

  43. arXiv:2403.03102  [pdf, other]

    cs.CL cs.AI

    "In Dialogues We Learn": Towards Personalized Dialogue Without Pre-defined Profiles through In-Dialogue Learning

    Authors: Chuanqi Cheng, Quan Tu, Wei Wu, Shuo Shang, Cunli Mao, Zhengtao Yu, Rui Yan

    Abstract: Personalized dialogue systems have gained significant attention in recent years for their ability to generate responses in alignment with different personas. However, most existing approaches rely on pre-defined personal profiles, which are not only time-consuming and labor-intensive to create but also lack flexibility. We propose In-Dialogue Learning (IDL), a fine-tuning framework that enhances t…

    Submitted 12 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  44. arXiv:2403.02178  [pdf, other]

    cs.CL cs.AI cs.LG

    Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models

    Authors: Changyu Chen, Xiting Wang, Ting-En Lin, Ang Lv, Yuchuan Wu, Xin Gao, Ji-Rong Wen, Rui Yan, Yongbin Li

    Abstract: In reasoning tasks, even a minor error can cascade into inaccurate results, leading to suboptimal performance of large language models in such domains. Earlier fine-tuning approaches sought to mitigate this by leveraging more precise supervisory signals from human labeling, larger models, or self-sampling, although at a high cost. Conversely, we develop a method that avoids external resources, rel…

    Submitted 10 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted by ACL 2024

  45. arXiv:2403.01528  [pdf, other]

    cs.CL cs.AI q-bio.BM

    Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey

    Authors: Qizhi Pei, Lijun Wu, Kaiyuan Gao, Jinhua Zhu, Yue Wang, Zun Wang, Tao Qin, Rui Yan

    Abstract: The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology. This approach leverages the rich, multifaceted descriptions of biomolecules contained within textual data sources to enhance our fundamental understanding and enable downstream computational tasks such as biomol…

    Submitted 5 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: Survey Paper. 25 pages, 9 figures, and 3 tables

  46. arXiv:2402.17810  [pdf, other]

    q-bio.QM cs.AI cs.CE cs.LG q-bio.BM

    BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning

    Authors: Qizhi Pei, Lijun Wu, Kaiyuan Gao, Xiaozhuan Liang, Yin Fang, Jinhua Zhu, Shufang Xie, Tao Qin, Rui Yan

    Abstract: Recent research trends in computational biology have increasingly focused on integrating text and bio-entity modeling, especially in the context of molecules and proteins. However, previous efforts like BioT5 faced challenges in generalizing across diverse tasks and lacked a nuanced understanding of molecular structures, particularly in their textual representations (e.g., IUPAC). This paper intro…

    Submitted 31 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 (Findings)

  47. arXiv:2401.10039  [pdf, other]

    cs.CV

    GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition

    Authors: Guangzhao Dai, Xiangbo Shu, Wenhao Wu, Rui Yan, Jiachao Zhang

    Abstract: Vision-Language Models (VLMs), pre-trained on large-scale datasets, have shown impressive performance in various visual recognition tasks. This advancement paves the way for notable performance in Zero-Shot Egocentric Action Recognition (ZS-EAR). Typically, VLMs handle ZS-EAR as a global video-text matching task, which often leads to suboptimal alignment of vision and linguistic knowledge. We prop…

    Submitted 11 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  48. arXiv:2401.06469  [pdf, other]

    cs.LG cs.CL

    Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning

    Authors: Kaiyi Zhang, Ang Lv, Yuhan Chen, Hansen Ha, Tao Xu, Rui Yan

    Abstract: In this paper, by treating in-context learning (ICL) as a meta-optimization process, we explain why LLMs are sensitive to the order of ICL examples. This understanding leads us to the development of Batch-ICL, an effective, efficient, and order-agnostic inference algorithm for ICL. Differing from the standard N-shot learning approach, Batch-ICL employs $N$ separate 1-shot forward computations and…

    Submitted 5 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: This paper has been accepted by ACL 2024 (Findings)

  49. arXiv:2401.01275  [pdf, other]

    cs.CL

    CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation

    Authors: Quan Tu, Shilong Fan, Zihang Tian, Rui Yan

    Abstract: Recently, the advent of large language models (LLMs) has revolutionized generative agents. Among them, Role-Playing Conversational Agents (RPCAs) attract considerable attention due to their ability to emotionally engage users. However, the absence of a comprehensive benchmark impedes progress in this field. To bridge this gap, we introduce CharacterEval, a Chinese benchmark for comprehensive RPCA…

    Submitted 9 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

  50. arXiv:2401.00870  [pdf, other]

    cs.CR cs.AI

    ConfusionPrompt: Practical Private Inference for Online Large Language Models

    Authors: Peihua Mai, Ran Yan, Rui Ye, Youjia Yang, Yinchuan Li, Yan Pang

    Abstract: State-of-the-art large language models (LLMs) are commonly deployed as online services, necessitating users to transmit informative prompts to cloud servers, thus engendering substantial privacy concerns. In response, we present ConfusionPrompt, a novel private LLM inference framework designed to obfuscate the server by: (i) decomposing the prompt into sub-prompts, and (ii) generating pseudo promp…

    Submitted 24 May, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

    Comments: 21 pages

    MSC Class: I.2.7