Showing 1–38 of 38 results for author: Diao, S

  1. arXiv:2408.13863  [pdf, other]

    cs.CL cs.AI

    CodeGraph: Enhancing Graph Reasoning of LLMs with Code

    Authors: Qiaolong Cai, Zhaowei Wang, Shizhe Diao, James Kwok, Yangqiu Song

    Abstract: With the increasing popularity of large language models (LLMs), reasoning on basic graph algorithm problems is an essential intermediate step in assessing their abilities to process and infer complex graph reasoning tasks. Existing methods usually convert graph-structured data to textual descriptions and then use LLMs for reasoning and computation. However, LLMs often produce computation errors on…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: In Progress

  2. arXiv:2408.12168  [pdf, other]

    cs.CL

    FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation

    Authors: KaShun Shum, Minrui Xu, Jianshu Zhang, Zixin Chen, Shizhe Diao, Hanze Dong, Jipeng Zhang, Muhammad Omer Raza

    Abstract: Large language models (LLMs) have become increasingly prevalent in our daily lives, leading to an expectation for LLMs to be trustworthy, i.e., both accurate and well-calibrated (the prediction confidence should align with its ground truth correctness likelihood). Nowadays, fine-tuning has become the most popular method for adapting a model to practical usage by significantly increasing accuracy on…

    Submitted 22 August, 2024; originally announced August 2024.

  3. arXiv:2407.03203  [pdf, other]

    cs.FL cs.AI

    TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

    Authors: Ruida Wang, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, Tong Zhang

    Abstract: Proving mathematical theorems using computer-verifiable formal languages like Lean significantly impacts mathematical reasoning. One approach to formal theorem proving involves generating complete proofs using Large Language Models (LLMs) based on Natural Language (NL) proofs. Similar methods have shown promising results in code generation. However, most modern LLMs exhibit suboptimal performance…

    Submitted 3 July, 2024; originally announced July 2024.

  4. arXiv:2406.10289  [pdf, other]

    cs.CL cs.AI cs.IR

    VeraCT Scan: Retrieval-Augmented Fake News Detection with Justifiable Reasoning

    Authors: Cheng Niu, Yang Guan, Yuanhao Wu, Juno Zhu, Juntong Song, Randy Zhong, Kaihua Zhu, Siliang Xu, Shizhe Diao, Tong Zhang

    Abstract: The proliferation of fake news poses a significant threat not only by disseminating misleading information but also by undermining the very foundations of democracy. The recent advance of generative artificial intelligence has further exacerbated the challenge of distinguishing genuine news from fabricated stories. In response to this challenge, we introduce VeraCT Scan, a novel retrieval-augmente…

    Submitted 24 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  5. arXiv:2406.06887  [pdf, other]

    cs.CL cs.AI cs.LG cs.PL cs.SE

    PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models

    Authors: Dylan Zhang, Shizhe Diao, Xueyan Zou, Hao Peng

    Abstract: Instruction-finetuned code language models (LMs) have shown promise in various programming tasks. They are trained, using a language modeling objective, on natural language instructions and gold code snippet pairs. Recent evidence suggests that these models, never exposed to incorrect solutions during training, often struggle to distinguish between correct and incorrect solutions. This observation…

    Submitted 10 June, 2024; originally announced June 2024.

  6. arXiv:2405.20974  [pdf, other]

    cs.CL cs.AI cs.LG

    SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales

    Authors: Tianyang Xu, Shujin Wu, Shizhe Diao, Xiaoze Liu, Xingyao Wang, Yangyi Chen, Jing Gao

    Abstract: Large language models (LLMs) often generate inaccurate or fabricated information and generally fail to indicate their confidence, which limits their broader applications. Previous work elicits confidence from LLMs by direct or self-consistency prompting, or constructing specific datasets for supervised finetuning. The prompting-based approaches have inferior performance, and the training-based app…

    Submitted 5 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: The code is available at https://github.com/xu1868/SaySelf

  7. arXiv:2405.10414  [pdf, ps, other]

    math.OC cs.LG

    A Reliability Theory of Compromise Decisions for Large-Scale Stochastic Programs

    Authors: Shuotao Diao, Suvrajeet Sen

    Abstract: Stochastic programming models can lead to very large-scale optimization problems for which it may be impossible to enumerate all possible scenarios. In such cases, one adopts a sampling-based solution methodology in which case the reliability of the resulting decisions may be suspect. For such instances, it is advisable to adopt methodologies that promote variance reduction. One such approach goes…

    Submitted 16 May, 2024; originally announced May 2024.

  8. arXiv:2403.17919  [pdf, other]

    cs.LG cs.AI cs.CL math.OC

    LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

    Authors: Rui Pan, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang

    Abstract: The machine learning community has witnessed impressive advancements since large language models (LLMs) first appeared. Yet, their massive memory consumption has become a significant roadblock to large-scale training. For instance, a 7B model typically requires at least 60 GB of GPU memory with full parameter training, which presents challenges for researchers without access to high-resource envir…

    Submitted 25 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  9. arXiv:2402.18571  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards

    Authors: Haoxiang Wang, Yong Lin, Wei Xiong, Rui Yang, Shizhe Diao, Shuang Qiu, Han Zhao, Tong Zhang

    Abstract: Fine-grained control over large language models (LLMs) remains a significant challenge, hindering their adaptability to diverse user needs. While Reinforcement Learning from Human Feedback (RLHF) shows promise in aligning LLMs, its reliance on scalar rewards often limits its ability to capture diverse user preferences in real-world applications. To address this limitation, we introduce the Directi…

    Submitted 6 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: The code and model are released at https://github.com/Haoxiang-Wang/directional-preference-alignment

  10. arXiv:2402.10528  [pdf, other]

    cs.CL cs.AI

    Can We Verify Step by Step for Incorrect Answer Detection?

    Authors: Xin Xu, Shizhe Diao, Can Yang, Yang Wang

    Abstract: Chain-of-Thought (CoT) prompting has marked a significant advancement in enhancing the reasoning capabilities of large language models (LLMs). Previous studies have developed various extensions of CoT, which focus primarily on enhancing end-task performance. In addition, there has been research on assessing the quality of reasoning chains in CoT. This raises an intriguing question: Is it possible…

    Submitted 15 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  11. arXiv:2402.03757  [pdf, other]

    cs.CV cs.CL cs.LG

    The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs

    Authors: Tianyang Han, Qing Lian, Rui Pan, Renjie Pi, Jipeng Zhang, Shizhe Diao, Yong Lin, Tong Zhang

    Abstract: Large language models (LLMs) have recently experienced remarkable progress, where the advent of multi-modal large language models (MLLMs) has endowed LLMs with visual capabilities, leading to impressive performances in various multi-modal tasks. However, those powerful MLLMs such as GPT-4V still fail spectacularly when presented with certain image and text inputs. In this paper, we identify a typi…

    Submitted 6 February, 2024; originally announced February 2024.

  12. arXiv:2401.14003  [pdf, other]

    cs.CL cs.AI

    ConstraintChecker: A Plugin for Large Language Models to Reason on Commonsense Knowledge Bases

    Authors: Quyet V. Do, Tianqing Fang, Shizhe Diao, Zhaowei Wang, Yangqiu Song

    Abstract: Reasoning over Commonsense Knowledge Bases (CSKB), i.e. CSKB reasoning, has been explored as a way to acquire new commonsense knowledge based on reference knowledge in the original CSKBs and external prior knowledge. Despite the advancement of Large Language Models (LLM) and prompt engineering techniques in various reasoning tasks, they still struggle to deal with CSKB reasoning. One of the proble…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Proceedings of EACL 2024

  13. arXiv:2311.09677  [pdf, other]

    cs.CL

    R-Tuning: Instructing Large Language Models to Say 'I Don't Know'

    Authors: Hanning Zhang, Shizhe Diao, Yong Lin, Yi R. Fung, Qing Lian, Xingyao Wang, Yangyi Chen, Heng Ji, Tong Zhang

    Abstract: Large language models (LLMs) have revolutionized numerous domains with their impressive performance but still face their challenges. A predominant issue is the propensity for these models to generate non-existent facts, a concern termed hallucination. Our research is motivated by the observation that previous instruction tuning methods force the model to complete a sentence no matter whether the m…

    Submitted 6 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  14. arXiv:2311.08364  [pdf, other]

    cs.LG cs.AI cs.DM

    Plum: Prompt Learning using Metaheuristic

    Authors: Rui Pan, Shuo Xing, Shizhe Diao, Wenhe Sun, Xiang Liu, Kashun Shum, Renjie Pi, Jipeng Zhang, Tong Zhang

    Abstract: Since the emergence of large language models, prompt learning has become a popular method for optimizing and customizing these models. Special prompts, such as Chain-of-Thought, have even revealed previously unknown reasoning capabilities within these models. However, the progress of discovering effective prompts has been slow, driving a desire for general prompt optimization methods. Unfortunatel…

    Submitted 30 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Published at Findings of ACL 2024

  15. arXiv:2310.13596  [pdf, other]

    cs.CL cs.AI

    MarineGPT: Unlocking Secrets of Ocean to the Public

    Authors: Ziqiang Zheng, Jipeng Zhang, Tuan-Anh Vu, Shizhe Diao, Yue Him Wong Tim, Sai-Kit Yeung

    Abstract: Large language models (LLMs), such as ChatGPT/GPT-4, have proven to be powerful tools in promoting the user experience as an AI assistant. The continuous works are proposing multi-modal large language models (MLLM), empowering LLMs with the ability to sense multiple modality inputs through constructing a joint semantic space (e.g. visual-text space). Though significant success was achieved in LLMs…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Work in progress. Code and data will be available at https://github.com/hkust-vgd/MarineGPT

  16. arXiv:2310.09751  [pdf, other]

    cs.LG

    UniTime: A Language-Empowered Unified Model for Cross-Domain Time Series Forecasting

    Authors: Xu Liu, Junfeng Hu, Yuan Li, Shizhe Diao, Yuxuan Liang, Bryan Hooi, Roger Zimmermann

    Abstract: Multivariate time series forecasting plays a pivotal role in contemporary web technologies. In contrast to conventional methods that involve creating dedicated models for specific time series application domains, this research advocates for a unified model paradigm that transcends domain boundaries. However, learning an effective cross-domain model presents the following challenges. First, various…

    Submitted 23 February, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

  17. arXiv:2309.06256  [pdf, other]

    cs.LG

    Mitigating the Alignment Tax of RLHF

    Authors: Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan Yao, Tong Zhang

    Abstract: LLMs acquire a wide range of abilities during pre-training, but aligning LLMs under Reinforcement Learning with Human Feedback (RLHF) can lead to forgetting, which is also known as the alignment tax. To empirically verify this hypothesis, we conducted experiments with existing RLHF algorithms using OpenLLaMA-3B, which revealed a pronounced alignment tax in NLP tasks. On the other hand, despite var…

    Submitted 5 February, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 28 Pages

  18. arXiv:2308.16742  [pdf, other]

    eess.IV cs.CV

    Unsupervised CT Metal Artifact Reduction by Plugging Diffusion Priors in Dual Domains

    Authors: Xuan Liu, Yaoqin Xie, Songhui Diao, Shan Tan, Xiaokun Liang

    Abstract: During the process of computed tomography (CT), metallic implants often cause disruptive artifacts in the reconstructed images, impeding accurate diagnosis. Several supervised deep learning-based approaches have been proposed for reducing metal artifacts (MAR). However, these methods heavily rely on training with simulated data, as obtaining paired metal artifact CT and clean CT data in clinical s…

    Submitted 5 January, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

  19. arXiv:2306.12420  [pdf, other]

    cs.CL cs.AI

    LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models

    Authors: Shizhe Diao, Rui Pan, Hanze Dong, Ka Shun Shum, Jipeng Zhang, Wei Xiong, Tong Zhang

    Abstract: Foundation models have demonstrated a great ability to achieve general human-level intelligence far beyond traditional approaches. As the technique keeps attracting attention from the AI community, an increasing number of foundation models are becoming publicly accessible. However, a significant shortcoming of most of these models lies in their performance in specialized-domain and task-specific a…

    Submitted 5 May, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: Published in NAACL 2024 Demo Track

  20. arXiv:2306.05406  [pdf, other]

    cs.CL

    Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories

    Authors: Shizhe Diao, Tianyang Xu, Ruijia Xu, Jiawei Wang, Tong Zhang

    Abstract: Pre-trained language models (PLMs) demonstrate excellent abilities to understand texts in the generic domain while struggling in a specific domain. Although continued pre-training on a large domain-specific corpus is effective, it is costly to tune all the parameters on the domain. In this paper, we investigate whether we can adapt PLMs both effectively and efficiently by only tuning a few paramet…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: ACL 2023

  21. arXiv:2306.03678  [pdf, other]

    cs.CL

    On the Difference of BERT-style and CLIP-style Text Encoders

    Authors: Zhihong Chen, Guiming Hardy Chen, Shizhe Diao, Xiang Wan, Benyou Wang

    Abstract: Masked language modeling (MLM) has been one of the most popular pretraining recipes in natural language processing, e.g., BERT, one of the representative models. Recently, contrastive language-image pretraining (CLIP) has also attracted attention, especially its vision models that achieve excellent performance on a broad range of vision tasks. However, few studies are dedicated to studying the tex…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: Natural Language Processing. 10 pages, 1 figure. Findings of ACL-2023

  22. arXiv:2305.15887  [pdf, other]

    eess.IV cs.CV

    Diffusion Probabilistic Priors for Zero-Shot Low-Dose CT Image Denoising

    Authors: Xuan Liu, Yaoqin Xie, Jun Cheng, Songhui Diao, Shan Tan, Xiaokun Liang

    Abstract: Denoising low-dose computed tomography (CT) images is a critical task in medical image computing. Supervised deep learning-based approaches have made significant advancements in this area in recent years. However, these methods typically require pairs of low-dose and normal-dose CT images for training, which are challenging to obtain in clinical settings. Existing unsupervised deep learning-based…

    Submitted 13 July, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

  23. arXiv:2305.14167  [pdf, other]

    cs.CV cs.AI

    DetGPT: Detect What You Need via Reasoning

    Authors: Renjie Pi, Jiahui Gao, Shizhe Diao, Rui Pan, Hanze Dong, Jipeng Zhang, Lewei Yao, Jianhua Han, Hang Xu, Lingpeng Kong, Tong Zhang

    Abstract: In recent years, the field of computer vision has seen significant advancements thanks to the development of large language models (LLMs). These models have enabled more effective and sophisticated interactions between humans and machines, paving the way for novel techniques that blur the lines between human and machine intelligence. In this paper, we introduce a new paradigm for object detection…

    Submitted 23 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  24. arXiv:2304.06767  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

    Authors: Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang

    Abstract: Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially serious consequences. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their responsible and effective deployment in real-worl…

    Submitted 1 December, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: 29 pages, 12 figures, Published in Transactions on Machine Learning Research (TMLR)

  25. arXiv:2302.12822  [pdf, other]

    cs.CL

    Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data

    Authors: KaShun Shum, Shizhe Diao, Tong Zhang

    Abstract: Chain-of-thought (CoT) advances the reasoning abilities of large language models (LLMs) and achieves superior performance in complex reasoning tasks. However, most CoT studies rely on carefully designed human-annotated rational chains to prompt LLMs, posing challenges for real-world applications where labeled data is available without rational chains. This paper proposes a new strategy, Automate-C…

    Submitted 27 February, 2024; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: EMNLP 2023

  26. arXiv:2302.12246  [pdf, other]

    cs.CL

    Active Prompting with Chain-of-Thought for Large Language Models

    Authors: Shizhe Diao, Pengcheng Wang, Yong Lin, Rui Pan, Xiang Liu, Tong Zhang

    Abstract: The increasing scale of large language models (LLMs) brings emergent abilities to various complex tasks requiring reasoning, such as arithmetic and commonsense reasoning. It is known that the effective design of task-specific prompts is critical for LLMs' ability to produce high-quality answers. In particular, an effective approach for complex question-and-answer tasks is example-based prompting w…

    Submitted 21 July, 2024; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: Published in ACL 2024

  27. arXiv:2302.10143  [pdf, other]

    cs.CL cs.CY

    Hashtag-Guided Low-Resource Tweet Classification

    Authors: Shizhe Diao, Sedrick Scott Keh, Liangming Pan, Zhiliang Tian, Yan Song, Tong Zhang

    Abstract: Social media classification tasks (e.g., tweet sentiment analysis, tweet stance detection) are challenging because social media posts are typically short, informal, and ambiguous. Thus, training on tweets is challenging and demands large-scale human-annotated labels, which are time-consuming and costly to obtain. In this paper, we find that providing hashtags to social media tweets can help allevi…

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: WWW 2023

  28. arXiv:2302.08958  [pdf, other]

    cs.CV

    Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts

    Authors: Zhihong Chen, Shizhe Diao, Benyou Wang, Guanbin Li, Xiang Wan

    Abstract: Medical vision-and-language pre-training (Med-VLP) has shown promising improvements on many downstream medical tasks owing to its applicability to extracting generic representations from medical images and texts. Practically, there exist two typical types, i.e., the fusion-encoder type and the dual-encoder type, depending on whether a heavy fusion module is used. The former is superior at…

    Submitted 17 February, 2023; originally announced February 2023.

    Comments: Work in progress

  29. arXiv:2211.17201  [pdf, other]

    cs.CL cs.LG math.OC

    ExtremeBERT: A Toolkit for Accelerating Pretraining of Customized BERT

    Authors: Rui Pan, Shizhe Diao, Jianlin Chen, Tong Zhang

    Abstract: In this paper, we present ExtremeBERT, a toolkit for accelerating and customizing BERT pretraining. Our goal is to provide an easy-to-use BERT pretraining toolkit for the research community and industry. Thus, the pretraining of popular language models on customized datasets is affordable with limited resources. Experiments show that, to achieve the same or better GLUE scores, the time cost of our…

    Submitted 30 November, 2022; originally announced November 2022.

  30. arXiv:2211.11638  [pdf, other]

    cs.LG stat.ML

    Normalizing Flow with Variational Latent Representation

    Authors: Hanze Dong, Shizhe Diao, Weizhong Zhang, Tong Zhang

    Abstract: Normalizing flow (NF) has gained popularity over traditional maximum likelihood based methods due to its strong capability to model complex data distributions. However, the standard approach, which maps the observed data to a normal distribution, has difficulty in handling data distributions with multiple relatively isolated modes. To overcome this issue, we propose a new framework based on variat…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: 24 pages, 7 figures

  31. arXiv:2206.07699  [pdf, other]

    cs.CV cs.CL cs.LG

    Write and Paint: Generative Vision-Language Models are Unified Modal Learners

    Authors: Shizhe Diao, Wangchunshu Zhou, Xinsong Zhang, Jiawei Wang

    Abstract: Recent advances in vision-language pre-training have pushed the state-of-the-art on various vision-language tasks, making machines more capable of multi-modal writing (image-to-text generation) and painting (text-to-image generation). However, few studies investigate if these two essential capabilities can be learned together and boost each other, making a versatile and powerful multi-modal founda…

    Submitted 16 February, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: ICLR 2023

  32. arXiv:2205.15237  [pdf, other]

    cs.CV cs.CL cs.LG

    VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models

    Authors: Wangchunshu Zhou, Yan Zeng, Shizhe Diao, Xinsong Zhang

    Abstract: Recent advances in vision-language pre-training (VLP) have demonstrated impressive performance in a range of vision-language (VL) tasks. However, there exist several challenges for measuring the community's progress in building general multi-modal intelligence. First, most of the downstream VL datasets are annotated using raw images that are already seen during pre-training, which may result in an…

    Submitted 30 May, 2022; originally announced May 2022.

    Comments: ICML 2022. Benchmark website at https://vlue-benchmark.github.io

  33. arXiv:2201.08531  [pdf, other]

    cs.CL

    Black-box Prompt Learning for Pre-trained Language Models

    Authors: Shizhe Diao, Zhichao Huang, Ruijia Xu, Xuechun Li, Yong Lin, Xiao Zhou, Tong Zhang

    Abstract: The increasing scale of general-purpose Pre-trained Language Models (PLMs) necessitates the study of more efficient adaptation across different downstream tasks. In this paper, we establish a Black-box Discrete Prompt Learning (BDPL) to resonate with pragmatic interactions between the cloud infrastructure and edge devices. Particularly, instead of fine-tuning the model in the cloud, we adapt PLMs…

    Submitted 23 February, 2023; v1 submitted 20 January, 2022; originally announced January 2022.

    Comments: To appear in the Transactions on Machine Learning Research (TMLR)

  34. arXiv:2112.11915  [pdf, other]

    cs.CL cs.AI

    Automatic Product Copywriting for E-Commerce

    Authors: Xueying Zhang, Yanyan Zou, Hainan Zhang, Jing Zhou, Shiliang Diao, Jiajia Chen, Zhuoye Ding, Zhen He, Xueqi He, Yun Xiao, Bo Long, Han Yu, Lingfei Wu

    Abstract: Product copywriting is a critical component of e-commerce recommendation platforms. It aims to attract users' interest and improve user experience by highlighting product characteristics with textual descriptions. In this paper, we report our experience deploying the proposed Automatic Product Copywriting Generation (APCG) system into the JD.com e-commerce product recommendation platform. It consi…

    Submitted 15 December, 2021; originally announced December 2021.

    Comments: Accepted by AAAI 2022/IAAI 2022 under the track of "Highly Innovative Applications of AI"

  35. arXiv:2112.10613  [pdf, other]

    cs.IR cs.AI cs.CL

    Intelligent Online Selling Point Extraction for E-Commerce Recommendation

    Authors: Xiaojie Guo, Shugen Wang, Hanqing Zhao, Shiliang Diao, Jiajia Chen, Zhuoye Ding, Zhen He, Yun Xiao, Bo Long, Han Yu, Lingfei Wu

    Abstract: In the past decade, automatic product description generation for e-commerce has witnessed significant advancement. As the services provided by e-commerce platforms become diverse, it is necessary to dynamically adapt the patterns of descriptions generated. The selling point of products is an important type of product description for which the length should be as short as possible while still conv…

    Submitted 15 December, 2021; originally announced December 2021.

    Comments: IAAI 2022 industry award

  36. arXiv:2111.05685  [pdf, other]

    cs.LG cs.AI cs.CV

    Efficient Neural Network Training via Forward and Backward Propagation Sparsification

    Authors: Xiao Zhou, Weizhong Zhang, Zonghao Chen, Shizhe Diao, Tong Zhang

    Abstract: Sparse training is a natural idea to accelerate the training speed of deep neural networks and save the memory usage, especially since large modern neural networks are significantly over-parameterized. However, most of the existing methods cannot achieve this goal in practice because the chain rule based gradient (w.r.t. structure parameters) estimators adopted by previous methods require dense co…

    Submitted 10 November, 2021; originally announced November 2021.

  37. arXiv:2004.09800   

    cs.CL

    Keyphrase Generation with Cross-Document Attention

    Authors: Shizhe Diao, Yan Song, Tong Zhang

    Abstract: Keyphrase generation aims to produce a set of phrases summarizing the essentials of a given document. Conventional methods normally apply an encoder-decoder architecture to generate the output keyphrases for an input document, where they are designed to focus on each current document so they inevitably omit crucial corpus-level information carried by other similar documents, i.e., the cross-docume…

    Submitted 22 December, 2022; v1 submitted 21 April, 2020; originally announced April 2020.

    Comments: This paper will be superseded by another improved version with new approaches, new settings, and new experimental results

  38. arXiv:1911.00720  [pdf, other]

    cs.CL

    ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations

    Authors: Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, Yonggang Wang

    Abstract: The pre-training of text encoders normally processes text as a sequence of tokens corresponding to small text units, such as word pieces in English and characters in Chinese. It omits information carried by larger text granularity, and thus the encoders cannot easily adapt to certain combinations of characters. This leads to a loss of important semantic information, which is especially problematic…

    Submitted 2 November, 2019; originally announced November 2019.

    Comments: Natural Language Processing. 11 pages, 7 figures