Skip to main content

Showing 1–12 of 12 results for author: Lv, A

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.06677  [pdf, other

    cs.CL

    Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules

    Authors: Zhuocheng Gong, Ang Lv, Jian Guan, Junxi Yan, Wei Wu, Huishuai Zhang, Minlie Huang, Dongyan Zhao, Rui Yan

    Abstract: Is it always necessary to compute tokens from shallow to deep layers in Transformers? The continued success of vanilla Transformers and their variants suggests an undoubted "yes". In this work, however, we attempt to break the depth-ordered convention by proposing a novel architecture dubbed mixture-of-modules (MoM), which is motivated by an intuition that any layer, regardless of its position, ca… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  2. arXiv:2406.19598  [pdf, other

    cs.CL

    Mixture of In-Context Experts Enhance LLMs' Long Context Awareness

    Authors: Hongzhan Lin, Ang Lv, Yuhan Chen, Chen Zhu, Yang Song, Hengshu Zhu, Rui Yan

    Abstract: Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions.Their limited context awareness can lead to overlooking critical information and subsequent task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging.In this paper, for LLMs utili… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 14 pages, 5 figures

  3. arXiv:2404.00586  [pdf, other

    cs.AI

    RLGNet: Repeating-Local-Global History Network for Temporal Knowledge Graph Reasoning

    Authors: Ao Lv, Yongzhong Huang, Guige Ouyang, Yue Chen, Haoran Xie

    Abstract: Temporal Knowledge Graph (TKG) reasoning is based on historical information to predict the future. Therefore, parsing and mining historical information is key to predicting the future. Most existing methods fail to concurrently address and comprehend historical information from both global and local perspectives. Neglecting the global view might result in overlooking macroscopic trends and pattern… ▽ More

    Submitted 31 March, 2024; originally announced April 2024.

  4. arXiv:2403.19521  [pdf, other

    cs.CL cs.AI cs.LG

    Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models

    Authors: Ang Lv, Yuhan Chen, Kaiyi Zhang, Yulong Wang, Lifeng Liu, Ji-Rong Wen, Jian Xie, Rui Yan

    Abstract: In this paper, we delve into several mechanisms employed by Transformer-based language models (LLMs) for factual recall tasks. We outline a pipeline consisting of three major steps: (1) Given a prompt ``The capital of France is,'' task-specific attention heads extract the topic token, such as ``France,'' from the context and pass it to subsequent MLPs. (2) As attention heads' outputs are aggregate… ▽ More

    Submitted 24 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  5. arXiv:2403.02178  [pdf, other

    cs.CL cs.AI cs.LG

    Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models

    Authors: Changyu Chen, Xiting Wang, Ting-En Lin, Ang Lv, Yuchuan Wu, Xin Gao, Ji-Rong Wen, Rui Yan, Yongbin Li

    Abstract: In reasoning tasks, even a minor error can cascade into inaccurate results, leading to suboptimal performance of large language models in such domains. Earlier fine-tuning approaches sought to mitigate this by leveraging more precise supervisory signals from human labeling, larger models, or self-sampling, although at a high cost. Conversely, we develop a method that avoids external resources, rel… ▽ More

    Submitted 10 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted by ACL 2024

  6. arXiv:2401.06469  [pdf, other

    cs.LG cs.CL

    Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning

    Authors: Kaiyi Zhang, Ang Lv, Yuhan Chen, Hansen Ha, Tao Xu, Rui Yan

    Abstract: In this paper, by treating in-context learning (ICL) as a meta-optimization process, we explain why LLMs are sensitive to the order of ICL examples. This understanding leads us to the development of Batch-ICL, an effective, efficient, and order-agnostic inference algorithm for ICL. Differing from the standard N-shot learning approach, Batch-ICL employs $N$ separate 1-shot forward computations and… ▽ More

    Submitted 5 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: This paper has been accepted by ACL 2024 (Findings)

  7. arXiv:2312.04455  [pdf, other

    cs.CL cs.AI cs.LG

    Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use

    Authors: Yuhan Chen, Ang Lv, Ting-En Lin, Changyu Chen, Yuchuan Wu, Fei Huang, Yongbin Li, Rui Yan

    Abstract: In this paper, we demonstrate that an inherent waveform pattern in the attention allocation of large language models (LLMs) significantly affects their performance in tasks demanding a high degree of context awareness, such as utilizing LLMs for tool-use. Specifically, the crucial information in the context will be potentially overlooked by model when it is positioned in the trough zone of the att… ▽ More

    Submitted 4 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: ACL 2024 main

  8. arXiv:2311.07468  [pdf, other

    cs.CL cs.AI cs.LG

    Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse

    Authors: Ang Lv, Kaiyi Zhang, Shufang Xie, Quan Tu, Yuhan Chen, Ji-Rong Wen, Rui Yan

    Abstract: Recent studies have highlighted a phenomenon in large language models (LLMs) known as "the reversal curse," in which the order of knowledge entities in the training data biases the models' comprehension. For example, if a model is trained on sentences where entity A consistently appears before entity B, it can respond to queries about A by providing B as the answer. However, it may encounter confu… ▽ More

    Submitted 16 November, 2023; v1 submitted 13 November, 2023; originally announced November 2023.

  9. arXiv:2310.04992  [pdf, other

    eess.IV cs.CV

    VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

    Authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun, Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu, Aiguo Lv , et al. (17 additional authors not shown)

    Abstract: We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassifi… ▽ More

    Submitted 7 October, 2023; originally announced October 2023.

  10. arXiv:2306.16770  [pdf, other

    cs.CL cs.AI

    DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data Augmentation in Multi-Turn Conversations

    Authors: Ang Lv, Jinpeng Li, Yuhan Chen, Xing Gao, Ji Zhang, Rui Yan

    Abstract: In open-domain dialogue generation tasks, contexts and responses in most datasets are one-to-one mapped, violating an important many-to-many characteristic: a context leads to various responses, and a response answers multiple contexts. Without such patterns, models poorly generalize and prefer responding safely. Many attempts have been made in either multi-turn settings from a one-to-many perspec… ▽ More

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: ACL 2023 main

  11. arXiv:2305.10841  [pdf, other

    cs.SD cs.LG cs.MM eess.AS

    GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework

    Authors: Ang Lv, Xu Tan, Peiling Lu, Wei Ye, Shikun Zhang, Jiang Bian, Rui Yan

    Abstract: Symbolic music generation aims to create musical notes, which can help users compose music, such as generating target instrument tracks based on provided source tracks. In practical scenarios where there's a predefined ensemble of tracks and various composition needs, an efficient and effective generative model that can generate any target tracks based on the other tracks becomes crucial. However,… ▽ More

    Submitted 29 September, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: 13 pages, 4 figures

  12. arXiv:2208.05697  [pdf, other

    cs.SD cs.AI cs.MM eess.AS

    Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation

    Authors: Ang Lv, Xu Tan, Tao Qin, Tie-Yan Liu, Rui Yan

    Abstract: Lyric-to-melody generation is an important task in songwriting, and is also quite challenging due to its unique characteristics: the generated melodies should not only follow good musical patterns, but also align with features in lyrics such as rhythms and structures. These characteristics cannot be well handled by neural generation models that learn lyric-to-melody mapping in an end-to-end way, d… ▽ More

    Submitted 28 January, 2023; v1 submitted 11 August, 2022; originally announced August 2022.