
Showing 1–50 of 88 results for author: Ge, T

  1. arXiv:2407.19467

    cs.IR cs.LG

    Enhancing Taobao Display Advertising with Multimodal Representations: Challenges, Approaches and Insights

    Authors: Xiang-Rong Sheng, Feifan Yang, Litong Gong, Biao Wang, Zhangming Chan, Yujing Zhang, Yueyao Cheng, Yong-Nan Zhu, Tiezheng Ge, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng

    Abstract: Despite the recognized potential of multimodal data to improve model accuracy, many large-scale industrial recommendation systems, including the Taobao display advertising system, predominantly depend on sparse ID features in their models. In this work, we explore approaches to leverage multimodal data to enhance recommendation accuracy. We start by identifying the key challenges in adopting mul…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted at CIKM 2024

  2. arXiv:2407.15498

    cs.CL

    Refining Corpora from a Model Calibration Perspective for Chinese Spelling Correction

    Authors: Dingyao Yu, Yang An, Wei Ye, Xiongfeng Xiao, Shaoguang Mao, Tao Ge, Shikun Zhang

    Abstract: Chinese Spelling Correction (CSC) commonly lacks large-scale high-quality corpora, due to the labor-intensive labeling of spelling errors in real-life human writing or typing scenarios. Two data augmentation methods are widely adopted: (1) Random Replacement with the guidance of confusion sets and (2) OCR/ASR-based Generation that simulates character misuse. However, both metho…

    Submitted 22 July, 2024; originally announced July 2024.

  3. arXiv:2407.06112

    cs.CL

    Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning

    Authors: Yadong Zhang, Shaoguang Mao, Wenshan Wu, Yan Xia, Tao Ge, Man Lan, Furu Wei

    Abstract: This paper introduces BI-Directional DEliberation Reasoning (BIDDER), a novel reasoning approach to enhance the decision rationality of language models. Traditional reasoning methods typically rely on historical information and employ a uni-directional (left-to-right) reasoning strategy. This lack of bi-directional deliberation reasoning results in limited awareness of potential future outcomes and…

    Submitted 8 July, 2024; originally announced July 2024.

  4. arXiv:2406.01359

    cs.CL cs.SE

    R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

    Authors: Ken Deng, Jiaheng Liu, He Zhu, Congnan Liu, Jingxin Li, Jiakai Wang, Peng Zhao, Chenchen Zhang, Yanan Wu, Xueqiao Yin, Yuanxing Zhang, Wenbo Su, Bangyu Xiang, Tiezheng Ge, Bo Zheng

    Abstract: Code completion models have made significant progress in recent years. Recently, repository-level code completion has drawn more attention in modern software development, and several baseline methods and benchmarks have been proposed. However, existing repository-level code completion methods often fall short of fully using the extensive context of a project repository, such as the intricacies of…

    Submitted 3 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  5. arXiv:2405.14040

    cs.MM

    Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline

    Authors: Dingyi Yang, Chunru Zhan, Ziheng Wang, Biao Wang, Tiezheng Ge, Bo Zheng, Qin Jin

    Abstract: Video storytelling is engaging multimedia content that utilizes video and its accompanying narration to attract the audience, where a key challenge is creating narrations for recorded visual scenes. Previous studies on dense video captioning and video story generation have made some progress. However, in practical applications, we typically require synchronized narrations for ongoing visual scenes…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 15 pages, 13 figures

  6. arXiv:2405.13792

    cs.CL cs.AI cs.IR

    xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token

    Authors: Xin Cheng, Xun Wang, Xingxing Zhang, Tao Ge, Si-Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao

    Abstract: This paper introduces xRAG, an innovative context compression method tailored for retrieval-augmented generation. xRAG reinterprets document embeddings in dense retrieval--traditionally used solely for retrieval--as features from the retrieval modality. By employing a modality fusion methodology, xRAG seamlessly integrates these embeddings into the language model representation space, effectively…

    Submitted 22 May, 2024; originally announced May 2024.

  7. arXiv:2404.14768

    cs.CV

    Enhancing Prompt Following with Visual Control Through Training-Free Mask-Guided Diffusion

    Authors: Hongyu Chen, Yiqi Gao, Min Zhou, Peng Wang, Xubin Li, Tiezheng Ge, Bo Zheng

    Abstract: Recently, integrating visual controls into text-to-image (T2I) models, such as the ControlNet method, has received significant attention for finer control capabilities. While various training-free methods make efforts to enhance prompt following in T2I models, the issue with visual control is still rarely studied, especially in scenarios where visual controls are misaligned with text prompts. In thi…

    Submitted 23 April, 2024; originally announced April 2024.

  8. arXiv:2404.13984

    cs.CV

    RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance

    Authors: Chengrui Wang, Pengfei Liu, Min Zhou, Ming Zeng, Xubin Li, Tiezheng Ge, Bo Zheng

    Abstract: Although diffusion models can generate high-quality human images, their applications are limited by the instability in generating hands with correct structures. Some previous works mitigate the problem by considering hand structure yet struggle to maintain style consistency between refined malformed hands and other image regions. In this paper, we aim to solve the problem of inconsistency regardin…

    Submitted 22 April, 2024; originally announced April 2024.

  9. arXiv:2404.13903

    cs.CV

    Accelerating Image Generation with Sub-path Linear Approximation Model

    Authors: Chen Xu, Tianhui Song, Weixin Feng, Xubin Li, Tiezheng Ge, Bo Zheng, Limin Wang

    Abstract: Diffusion models have significantly advanced the state of the art in image, audio, and video generation tasks. However, their applications in practical scenarios are hindered by slow inference speed. Drawing inspiration from the approximation strategies utilized in consistency models, we propose the Sub-path Linear Approximation Model (SLAM), which accelerates diffusion models while maintaining hi…

    Submitted 21 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  10. arXiv:2404.01230

    cs.CL

    LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models

    Authors: Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Adrian de Wynter, Yan Xia, Wenshan Wu, Ting Song, Man Lan, Furu Wei

    Abstract: This paper presents a comprehensive survey of the current status and opportunities for Large Language Models (LLMs) in strategic reasoning, a sophisticated form of reasoning that necessitates understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly. Strategic reasoning is distinguished by its focus on the dynamic and uncertain nature of interact…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 9 pages, 5 figures

  11. arXiv:2403.02827

    cs.CV

    Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation

    Authors: Weijie Li, Litong Gong, Yiran Zhu, Fanda Fan, Biao Wang, Tiezheng Ge, Bo Zheng

    Abstract: Image-to-video (I2V) generation tasks always struggle to keep high fidelity in open domains. Traditional image animation techniques primarily focus on specific domains such as faces or human poses, making them difficult to generalize to open domains. Several recent I2V frameworks based on diffusion models can generate dynamic content for open domain images but fail to maintain fidelity. We…

    Submitted 5 March, 2024; originally announced March 2024.

  12. arXiv:2403.01800

    cs.CV

    AtomoVideo: High Fidelity Image-to-Video Generation

    Authors: Litong Gong, Yiran Zhu, Weijie Li, Xiaoyang Kang, Biao Wang, Tiezheng Ge, Bo Zheng

    Abstract: Recently, video generation has achieved significant rapid development based on superior text-to-image generation techniques. In this work, we propose a high fidelity framework for image-to-video generation, named AtomoVideo. Based on multi-granularity image injection, we achieve higher fidelity of the generated video to the given image. In addition, thanks to high quality datasets and training str…

    Submitted 5 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Technical report. Page: https://atomo-video.github.io/

  13. arXiv:2402.14762

    cs.CL cs.AI

    MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues

    Authors: Ge Bai, Jie Liu, Xingyuan Bu, Yancheng He, Jiaheng Liu, Zhanhui Zhou, Zhuoran Lin, Wenbo Su, Tiezheng Ge, Bo Zheng, Wanli Ouyang

    Abstract: The advent of Large Language Models (LLMs) has drastically enhanced dialogue systems. However, comprehensively evaluating the dialogue abilities of LLMs remains a challenge. Previous benchmarks have primarily focused on single-turn dialogues or provided coarse-grained and incomplete assessments of multi-turn dialogues, overlooking the complexity and fine-grained nuances of real-life dialogues. To…

    Submitted 25 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: [ACL 2024] The first three authors contribute equally, 34 pages, repo at https://github.com/mtbench101/mt-bench-101

  14. arXiv:2402.14660

    cs.CL cs.AI

    ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models

    Authors: Yanan Wu, Jie Liu, Xingyuan Bu, Jiaheng Liu, Zhanhui Zhou, Yuanxing Zhang, Chenchen Zhang, Zhiqi Bai, Haibin Chen, Tiezheng Ge, Wanli Ouyang, Wenbo Su, Bo Zheng

    Abstract: This paper introduces ConceptMath, a bilingual (English and Chinese), fine-grained benchmark that evaluates concept-wise mathematical reasoning of Large Language Models (LLMs). Unlike traditional benchmarks that evaluate general mathematical reasoning with an average accuracy, ConceptMath systematically organizes math problems under a hierarchy of math concepts, so that mathematical reasoning can…

    Submitted 23 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: The benchmark dataset will be released soon

  15. arXiv:2402.01521

    cs.CL cs.AI

    K-Level Reasoning with Large Language Models

    Authors: Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Yan Xia, Man Lan, Furu Wei

    Abstract: While Large Language Models (LLMs) have demonstrated their proficiency in complex reasoning tasks, their performance in dynamic, interactive, and competitive scenarios - such as business strategy and stock market analysis - remains underexplored. To bridge this gap, we formally explore the dynamic reasoning capabilities of LLMs for decision-making in rapidly evolving environments. We introduce two…

    Submitted 2 February, 2024; originally announced February 2024.

  16. arXiv:2401.07851

    cs.CL

    Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding

    Authors: Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang, Yongqi Li, Tao Ge, Tianyu Liu, Wenjie Li, Zhifang Sui

    Abstract: To mitigate the high inference latency stemming from autoregressive decoding in Large Language Models (LLMs), Speculative Decoding has emerged as a novel decoding paradigm for LLM inference. In each decoding step, this method first drafts several future tokens efficiently and then verifies them in parallel. Unlike autoregressive decoding, Speculative Decoding facilitates the simultaneous decoding…

    Submitted 4 June, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: ACL 2024 Findings (Long Paper), camera-ready version

  17. arXiv:2401.06951

    cs.CL cs.AI

    E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

    Authors: Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng

    Abstract: Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. Existing long-context extension methods usually need additional training procedures to support corresponding long-context windows, where the long-context training data (e.g., 32k) is needed, and high GPU training costs are assumed. To address the aforementioned issue…

    Submitted 22 February, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  18. arXiv:2401.02678

    cs.SD cs.MM eess.AS

    MusicAOG: an Energy-Based Model for Learning and Sampling a Hierarchical Representation of Symbolic Music

    Authors: Yikai Qian, Tianle Wang, Xinyi Tong, Xin Jin, Duo Xu, Bo Zheng, Tiezheng Ge, Feng Yu, Song-Chun Zhu

    Abstract: In addressing the challenge of interpretability and generalizability of artificial music intelligence, this paper introduces a novel symbolic representation that amalgamates both explicit and implicit musical information across diverse traditions and granularities. Utilizing a hierarchical and-or graph representation, the model employs nodes and edges to encapsulate a broad spectrum of musical ele…

    Submitted 5 January, 2024; originally announced January 2024.

  19. arXiv:2311.03220

    cs.CL cs.AI cs.GT

    ALYMPICS: LLM Agents Meet Game Theory -- Exploring Strategic Decision-Making with AI Agents

    Authors: Shaoguang Mao, Yuzhe Cai, Yan Xia, Wenshan Wu, Xun Wang, Fengyi Wang, Tao Ge, Furu Wei

    Abstract: This paper introduces Alympics (Olympics for Agents), a systematic simulation framework utilizing Large Language Model (LLM) agents for game theory research. Alympics creates a versatile platform for studying complex game theory problems, bridging the gap between theoretical game theory and empirical investigations by providing a controlled environment for simulating human-like strategic interacti…

    Submitted 16 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

  20. arXiv:2309.17061

    cs.CL cs.AI

    SCALE: Synergized Collaboration of Asymmetric Language Translation Engines

    Authors: Xin Cheng, Xun Wang, Tao Ge, Si-Qing Chen, Furu Wei, Dongyan Zhao, Rui Yan

    Abstract: In this paper, we introduce SCALE, a collaborative framework that connects compact Specialized Translation Models (STMs) and general-purpose Large Language Models (LLMs) as one unified translation engine. By introducing translation from STM into the triplet in-context demonstrations, SCALE unlocks refinement and pivoting ability of LLM, thus mitigating language bias of LLM and parallel data bias o…

    Submitted 29 September, 2023; originally announced September 2023.

  21. arXiv:2309.15214

    cs.LG physics.ao-ph

    Residual Corrective Diffusion Modeling for Km-scale Atmospheric Downscaling

    Authors: Morteza Mardani, Noah Brenowitz, Yair Cohen, Jaideep Pathak, Chieh-Yu Chen, Cheng-Chin Liu, Arash Vahdat, Mohammad Amin Nabian, Tao Ge, Akshay Subramaniam, Karthik Kashinath, Jan Kautz, Mike Pritchard

    Abstract: The state of the art for physical hazard prediction from weather and climate requires expensive km-scale numerical simulations driven by coarser resolution global inputs. Here, a generative diffusion architecture is explored for downscaling such global inputs to km-scale, as a cost-effective machine learning alternative. The model is trained to predict 2km data from a regional weather model over T…

    Submitted 11 August, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

  22. arXiv:2309.02119

    cs.CV

    Hierarchical Masked 3D Diffusion Model for Video Outpainting

    Authors: Fanda Fan, Chaoxu Guo, Litong Gong, Biao Wang, Tiezheng Ge, Yuning Jiang, Chunjie Luo, Jianfeng Zhan

    Abstract: Video outpainting aims to adequately complete missing areas at the edges of video frames. Compared to image outpainting, it presents an additional challenge as the model should maintain the temporal consistency of the filled area. In this paper, we introduce a masked 3D diffusion model for video outpainting. We use the technique of mask modeling to train the 3D diffusion model. This allows us to u…

    Submitted 19 January, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted to ACM MM 2023

  23. arXiv:2308.05996

    cs.AI

    Deep Task-specific Bottom Representation Network for Multi-Task Recommendation

    Authors: Qi Liu, Zhilong Zhou, Gangwei Jiang, Tiezheng Ge, Defu Lian

    Abstract: Neural-based multi-task learning (MTL) has gained significant improvement, and it has been successfully applied to recommendation system (RS). Recent deep MTL methods for RS (e.g. MMoE, PLE) focus on designing soft gating-based parameter-sharing networks that implicitly learn a generalized representation for each task. However, MTL methods may suffer from performance degeneration when dealing with…

    Submitted 17 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: CIKM'23

  24. TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design

    Authors: Yifan Gao, Jinpeng Lin, Min Zhou, Chuanbin Liu, Hongtao Xie, Tiezheng Ge, Yuning Jiang

    Abstract: Text design is one of the most critical procedures in poster design, as it relies heavily on the creativity and expertise of humans to design text images considering the visual harmony and text-semantic. This study introduces TextPainter, a novel multimodal approach that leverages contextual visual information and corresponding text semantics to generate text images. Specifically, TextPainter take…

    Submitted 12 August, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: Accepted to ACM MM 2023. Dataset Link: https://tianchi.aliyun.com/dataset/160034

  25. arXiv:2308.01095

    cs.CV

    AutoPoster: A Highly Automatic and Content-aware Design System for Advertising Poster Generation

    Authors: Jinpeng Lin, Min Zhou, Ye Ma, Yifan Gao, Chenxi Fei, Yangjian Chen, Zhang Yu, Tiezheng Ge

    Abstract: Advertising posters, a form of information presentation, combine visual and linguistic modalities. Creating a poster involves multiple steps and necessitates design experience and creativity. This paper introduces AutoPoster, a highly automatic and content-aware system for generating advertising posters. With only product images and titles as inputs, AutoPoster can automatically produce posters of…

    Submitted 23 August, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: Accepted for ACM MM 2023

  26. arXiv:2307.16399

    cs.MM cs.CV

    Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences

    Authors: Dingyi Yang, Hongyu Chen, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Qin Jin

    Abstract: Stylized visual captioning aims to generate image or video descriptions with specific styles, making them more attractive and emotionally appropriate. One major challenge with this task is the lack of paired stylized captions for visual content, so most existing works focus on unsupervised methods that do not rely on parallel datasets. However, these approaches still require training with sufficie…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: 9 pages, 6 figures

  27. arXiv:2307.06945

    cs.CL cs.AI cs.LG

    In-context Autoencoder for Context Compression in a Large Language Model

    Authors: Tao Ge, Jing Hu, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei

    Abstract: We propose the In-context Autoencoder (ICAE), leveraging the power of a large language model (LLM) to compress a long context into short compact memory slots that can be directly conditioned on by the LLM for various purposes. ICAE is first pretrained using both autoencoding and language modeling objectives on massive text data, enabling it to generate memory slots that accurately and comprehensiv…

    Submitted 8 May, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: v4: Final camera ready for ICLR'24

  28. arXiv:2307.05300

    cs.AI cs.CL

    Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration

    Authors: Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, Heng Ji

    Abstract: Human intelligence thrives on cognitive synergy, where collaboration among different minds yields superior outcomes compared to isolated individuals. In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist is an intelligent agent that collaboratively…

    Submitted 26 March, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: Accepted as a main conference paper at NAACL 2024

  29. arXiv:2305.09975

    cs.CL

    Smart Word Suggestions for Writing Assistance

    Authors: Chenshuo Wang, Shaoguang Mao, Tao Ge, Wenshan Wu, Xun Wang, Yan Xia, Jonathan Tien, Dongyan Zhao

    Abstract: Enhancing word usage is a desired feature for writing assistance. To further advance research in this area, this paper introduces the "Smart Word Suggestions" (SWS) task and benchmark. Unlike other works, SWS emphasizes end-to-end evaluation and presents a more realistic writing assistance scenario. This task involves identifying words or phrases that require improvement and providing substitution sug…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted by Findings of ACL23

  30. arXiv:2305.08389

    cs.CV cs.MM

    Edit As You Wish: Video Caption Editing with Multi-grained User Control

    Authors: Linli Yao, Yuanmeng Zhang, Ziheng Wang, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Xu Sun, Qin Jin

    Abstract: Automatically narrating videos in natural language complying with user requests, i.e., the Controllable Video Captioning task, can help people manage massive videos with desired intentions. However, existing works suffer from two shortcomings: 1) the control signal is single-grained, which cannot satisfy diverse user intentions; 2) the video description is generated in a single round which cannot be f…

    Submitted 8 August, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: Accepted by ACM MM 2024

  31. arXiv:2304.08103

    cs.CL cs.HC

    Low-code LLM: Graphical User Interface over Large Language Models

    Authors: Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan, Furu Wei

    Abstract: Utilizing Large Language Models (LLMs) for complex tasks is challenging, often involving a time-consuming and uncontrollable prompt engineering process. This paper introduces a novel human-LLM interaction framework, Low-code LLM. It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses. Through visual interaction with a graphica…

    Submitted 1 April, 2024; v1 submitted 17 April, 2023; originally announced April 2023.

    Comments: Accepted as a Demo Track paper at NAACL 2024

  32. arXiv:2304.04487

    cs.CL cs.AI

    Inference with Reference: Lossless Acceleration of Large Language Models

    Authors: Nan Yang, Tao Ge, Liang Wang, Binxing Jiao, Daxin Jiang, Linjun Yang, Rangan Majumder, Furu Wei

    Abstract: We propose LLMA, an LLM accelerator to losslessly speed up Large Language Model (LLM) inference with references. LLMA is motivated by the observation that there are abundant identical text spans between the decoding result by an LLM and the reference that is available in many real world scenarios (e.g., retrieved documents). LLMA first selects a text span from the reference and copies its tokens t…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: 9 pages

  33. arXiv:2304.02897

    cs.DB cs.DS

    LSketch: A Label-Enabled Graph Stream Sketch Toward Time-Sensitive Queries

    Authors: Yiling Zeng, Chunyao Song, Yuhan Li, Tingjian Ge

    Abstract: Graph streams represent data interactions in real applications. The mining of graph streams plays an important role in network security, social network analysis, and traffic control, among others. However, the sheer volume and high dynamics cause great challenges for efficient storage and subsequent query analysis on them. Current studies apply sketches to summarize graph streams. We propose LSket…

    Submitted 6 April, 2023; originally announced April 2023.

  34. arXiv:2303.14377

    cs.CV

    Unsupervised Domain Adaption with Pixel-level Discriminator for Image-aware Layout Generation

    Authors: Chenchen Xu, Min Zhou, Tiezheng Ge, Yuning Jiang, Weiwei Xu

    Abstract: Layout is essential for graphic design and poster generation. Recently, applying deep learning models to generate layouts has attracted increasing attention. This paper focuses on using the GAN-based model conditioned on image contents to generate advertising poster graphic layouts, which requires an advertising poster layout dataset with paired product images and graphic layouts. However, the pai…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: 8 pages, 4 figures, 7 tables, accepted by CVPR2023

  35. arXiv:2303.14017

    cs.CV

    CF-Font: Content Fusion for Few-shot Font Generation

    Authors: Chi Wang, Min Zhou, Tiezheng Ge, Yuning Jiang, Hujun Bao, Weiwei Xu

    Abstract: Content and style disentanglement is an effective way to achieve few-shot font generation. It allows transferring the style of the font image in a source domain to the style defined with a few reference images in a target domain. However, the content feature extracted using a representative font might not be optimal. In light of this, we propose a content fusion module (CFM) to project the content…

    Submitted 15 April, 2024; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  36. arXiv:2303.01421

    cs.CL cs.LG

    Semiparametric Language Models Are Scalable Continual Learners

    Authors: Guangyue Peng, Tao Ge, Si-Qing Chen, Furu Wei, Houfeng Wang

    Abstract: Semiparametric language models (LMs) have shown promise in continuously learning from new text data by combining a parameterized neural LM with a growable non-parametric memory for memorizing new content. However, conventional semiparametric LMs will finally become prohibitive for computing and storing if they are applied to continual learning over streaming data, because the non-parametric memory…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: Work in progress

  37. arXiv:2302.00577

    eess.IV cs.LG

    MB-DECTNet: A Model-Based Unrolled Network for Accurate 3D DECT Reconstruction

    Authors: Tao Ge, Maria Medrano, Rui Liao, David G. Politte, Jeffrey F. Williamson, Bruce R. Whiting, Joseph A. O'Sullivan

    Abstract: Numerous dual-energy CT (DECT) techniques have been developed in the past few decades. Dual-energy CT (DECT) statistical iterative reconstruction (SIR) has demonstrated its potential for reducing noise and increasing accuracy. Our lab proposed a joint statistical DECT algorithm for stopping power estimation and showed that it outperforms competing image-based material-decomposition methods. Howeve…

    Submitted 1 February, 2023; originally announced February 2023.

    ACM Class: I.4.5

  38. arXiv:2212.10190

    cs.CL

    Pay Attention to Your Tone: Introducing a New Dataset for Polite Language Rewrite

    Authors: Xun Wang, Tao Ge, Allen Mao, Yuki Li, Furu Wei, Si-Qing Chen

    Abstract: We introduce PoliteRewrite, a dataset for polite language rewrite, which is a novel sentence rewrite task. Compared with previous text style transfer tasks that can be mostly addressed by slight token- or phrase-level edits, polite language rewrite requires deep understanding and extensive sentence-level edits over an offensive and impolite sentence to deliver the same message euphemisti…

    Submitted 20 December, 2022; originally announced December 2022.

  39. arXiv:2212.02871

    cs.CV

    Video Object of Interest Segmentation

    Authors: Siyuan Zhou, Chunru Zhan, Biao Wang, Tiezheng Ge, Yuning Jiang, Li Niu

    Abstract: In this work, we present a new computer vision task named video object of interest segmentation (VOIS). Given a video and a target image of interest, our objective is to simultaneously segment and track all objects in the video that are relevant to the target image. This problem combines the traditional video object segmentation task with an additional image indicating the content that users are c…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: 13 pages, 8 figures

  40. arXiv:2212.00616

    cs.CL

    Extensible Prompts for Language Models on Zero-shot Language Style Customization

    Authors: Tao Ge, Jing Hu, Li Dong, Shaoguang Mao, Yan Xia, Xun Wang, Si-Qing Chen, Furu Wei

    Abstract: We propose eXtensible Prompt (X-Prompt) for prompting a large language model (LLM) beyond natural language (NL). X-Prompt instructs an LLM with not only NL but also an extensible vocabulary of imaginary words. Registering new imaginary words allows us to instruct the LLM to comprehend concepts that are difficult to describe with NL words, thereby making a prompt more descriptive. Also, these imagi…

    Submitted 30 November, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

    Comments: Accepted by NeurIPS 2023

  41. arXiv:2210.12293

    physics.ao-ph cs.LG

    DL-Corrector-Remapper: A grid-free bias-correction deep learning methodology for data-driven high-resolution global weather forecasting

    Authors: Tao Ge, Jaideep Pathak, Akshay Subramaniam, Karthik Kashinath

    Abstract: Data-driven models, such as FourCastNet (FCN), have shown exemplary performance in high-resolution global weather forecasting. This performance, however, is based on supervision on mesh-gridded weather data without the utilization of raw climate observational data, the gold standard ground truth. In this work we develop a methodology to correct, remap, and fine-tune gridded uniform forecasts of FC…

    Submitted 21 October, 2022; originally announced October 2022.

  42. arXiv:2209.14529  [pdf, other

    cs.CV cs.AI

    Motion and Appearance Adaptation for Cross-Domain Motion Transfer

    Authors: Borun Xu, Biao Wang, Jinhong Deng, Jiale Tao, Tiezheng Ge, Yuning Jiang, Wen Li, Lixin Duan

    Abstract: Motion transfer aims to transfer the motion of a driving video to a source image. When there are considerable differences between the object in the driving video and that in the source image, traditional single domain motion transfer approaches often produce notable artifacts; for example, the synthesized image may fail to preserve the human shape of the source image (cf. Fig. 1 (a)). To address this…

    Submitted 6 October, 2022; v1 submitted 28 September, 2022; originally announced September 2022.

    Comments: fix bugs

  43. arXiv:2209.14024  [pdf, other

    cs.CV

    Motion Transformer for Unsupervised Image Animation

    Authors: Jiale Tao, Biao Wang, Tiezheng Ge, Yuning Jiang, Wen Li, Lixin Duan

    Abstract: Image animation aims to animate a source image by using motion learned from a driving video. Current state-of-the-art methods typically use convolutional neural networks (CNNs) to predict motion information, such as motion keypoints and corresponding local transformations. However, these CNN based methods do not explicitly model the interactions between motions; as a result, the important underlyi…

    Submitted 28 September, 2022; originally announced September 2022.

  44. arXiv:2209.00852  [pdf, other

    cs.CV cs.MM

    Geometry Aligned Variational Transformer for Image-conditioned Layout Generation

    Authors: Yunning Cao, Ye Ma, Min Zhou, Chuanbin Liu, Hongtao Xie, Tiezheng Ge, Yuning Jiang

    Abstract: Layout generation is a novel task in computer vision, which combines the challenges in both object localization and aesthetic appraisal, widely used in advertisements, posters, and slides design. An accurate and pleasant layout should consider both the intra-domain relationship within layout elements and the inter-domain relationship between layout elements and the image. However, most previous me…

    Submitted 2 September, 2022; originally announced September 2022.

    Comments: To be published in ACM MM 2022

  45. arXiv:2208.12716  [pdf, other

    cs.CV cs.IT

    Multi-Scale Architectures Matter: On the Adversarial Robustness of Flow-based Lossless Compression

    Authors: Yi-chong Xia, Bin Chen, Yan Feng, Tian-shuo Ge

    Abstract: As a probabilistic modeling technique, the flow-based model has demonstrated remarkable potential in the field of lossless compression \cite{idf,idf++,lbb,ivpf,iflow}. Compared with other deep generative models (e.g., Autoregressive, VAEs) \cite{bitswap,hilloc,pixelcnn++,pixelsnail} that explicitly model the data distribution probabilities, flow-based models perform better due to their excellent pr…

    Submitted 26 August, 2022; originally announced August 2022.

  46. arXiv:2207.00335  [pdf, other

    cs.LG

    Conditional Variable Selection for Intelligent Test

    Authors: Yiwen Liao, Tianjie Ge, Raphaël Latty, Bin Yang

    Abstract: Intelligent test requires efficient and effective analysis of high-dimensional data at a large scale. Traditionally, the analysis is often conducted by human experts, but it is not scalable in the era of big data. To tackle this challenge, variable selection has been recently introduced to intelligent test. However, in practice, we encounter scenarios where certain variables (e.g. some specific pr…

    Submitted 1 July, 2022; originally announced July 2022.

    Comments: Accepted by Workshop on Intelligent Methods for Test and Reliability at IEEE ETS 2022

  47. arXiv:2205.10350  [pdf, other

    cs.CL cs.LG

    Lossless Acceleration for Seq2seq Generation with Aggressive Decoding

    Authors: Tao Ge, Heming Xia, Xin Sun, Si-Qing Chen, Furu Wei

    Abstract: We study lossless acceleration for seq2seq generation with a novel decoding algorithm -- Aggressive Decoding. Unlike the previous efforts (e.g., non-autoregressive decoding) speeding up seq2seq generation at the cost of quality loss, our approach aims to yield the identical (or better) generation compared with autoregressive decoding but with a significant speedup, achieved by innovative cooperation…

    Submitted 20 May, 2022; originally announced May 2022.

    Comments: 24-page Microsoft Research Technical Report. Content overlap with arXiv:2106.04970 and arXiv:2203.16487

  48. arXiv:2205.03534  [pdf, other

    cs.CL cs.CV cs.MM

    Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information

    Authors: Zhipeng Zhang, Xinglin Hou, Kai Niu, Zhongzhen Huang, Tiezheng Ge, Yuning Jiang, Qi Wu, Peng Wang

    Abstract: Recently, online shopping has gradually become a common way of shopping for people all over the world. Wonderful merchandise advertisements often attract more people to buy. These advertisements properly integrate multimodal multi-structured information of commodities, such as visual spatial information and fine-grained structure information. However, traditional multimodal text generation focuses…

    Submitted 6 May, 2022; originally announced May 2022.

  49. arXiv:2205.03039  [pdf, other

    cs.CV

    Dual-Level Decoupled Transformer for Video Captioning

    Authors: Yiqi Gao, Xinglin Hou, Wei Suo, Mengyang Sun, Tiezheng Ge, Yuning Jiang, Peng Wang

    Abstract: Video captioning aims to understand the spatio-temporal semantic concept of the video and generate descriptive sentences. The de-facto approach to this task dictates a text generator to learn from \textit{offline-extracted} motion or appearance features from \textit{pre-trained} vision models. However, these methods may suffer from the so-called \textbf{\textit{"couple"}} drawbacks on both \textit…

    Submitted 6 May, 2022; originally announced May 2022.

  50. arXiv:2205.00303  [pdf, other

    cs.CV

    Composition-aware Graphic Layout GAN for Visual-textual Presentation Designs

    Authors: Min Zhou, Chenchen Xu, Ye Ma, Tiezheng Ge, Yuning Jiang, Weiwei Xu

    Abstract: In this paper, we study the graphic layout generation problem of producing high-quality visual-textual presentation designs for given images. We note that image compositions, which contain not only global semantics but also spatial information, would largely affect layout results. Hence, we propose a deep generative model, dubbed as composition-aware graphic layout GAN (CGL-GAN), to synthesize lay…

    Submitted 13 July, 2022; v1 submitted 30 April, 2022; originally announced May 2022.

    Comments: Accepted by IJCAI 2022 (AI, THE ARTS AND CREATIVITY TRACK)