
Showing 1–50 of 55 results for author: Chi, Z

Searching in archive cs.
  1. arXiv:2407.12128  [pdf, other]

    cs.LG cs.CV

    Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams

    Authors: Ziqiang Wang, Zhixiang Chi, Yanan Wu, Li Gu, Zhi Liu, Konstantinos Plataniotis, Yang Wang

    Abstract: Given a model trained on source data, Test-Time Adaptation (TTA) enables adaptation and inference in test data streams with domain shifts from the source. Current methods predominantly optimize the model for each incoming test data batch using self-training loss. While these methods yield commendable results in ideal test data streams, where batches are independently and identically sampled from t…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  2. arXiv:2407.04587  [pdf, other]

    cs.LG cs.CV

    Multimodal Classification via Modal-Aware Interactive Enhancement

    Authors: Qing-Yuan Jiang, Zhouyang Chi, Yang Yang

    Abstract: Due to the notorious modality imbalance problem, multimodal learning (MML) leads to the phenomenon of optimization imbalance, thus struggling to achieve satisfactory performance. Recently, some representative methods have been proposed to boost the performance, mainly focusing on adaptive adjusting the optimization of each modality to rebalance the learning speed of dominant and non-dominant modal…

    Submitted 5 July, 2024; originally announced July 2024.

  3. arXiv:2407.04255  [pdf, other]

    cs.CV

    Second Place Solution of WSDM2023 Toloka Visual Question Answering Challenge

    Authors: Xiangyu Wu, Zhouyang Chi, Yang Yang, Jianfeng Lu

    Abstract: In this paper, we present our solution for the WSDM2023 Toloka Visual Question Answering Challenge. Inspired by the application of multimodal pre-trained models to various downstream tasks (e.g., visual question answering, visual grounding, and cross-modal retrieval), we approached this competition as a visual grounding task, where the input is an image and a question, guiding the model to answer t…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Second Place of WSDM2023 Toloka Visual Question Answering Challenge

  4. arXiv:2405.08334  [pdf, other]

    cs.LG cs.AI

    Could Chemical LLMs benefit from Message Passing

    Authors: Jiaqing Xie, Ziheng Chi

    Abstract: Pretrained language models (LMs) showcase significant capabilities in processing molecular text, while concurrently, message passing neural networks (MPNNs) demonstrate resilience and versatility in the domain of molecular science. Despite these advancements, we find there are limited studies investigating the bidirectional interactions between molecular structures and their corresponding textual…

    Submitted 26 August, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted at ACL @ Languages and Molecules 2024. In Proceedings of ACL 2024

    Journal ref: In Proceedings of the 1st Workshop on Language + Molecules (L+M 2024), pages 10–20, Bangkok, Thailand. Association for Computational Linguistics

  5. arXiv:2405.04065  [pdf, other]

    cs.CL

    FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference

    Authors: Runheng Liu, Xingchen Xiao, Heyan Huang, Zewen Chi, Zhijing Wu

    Abstract: Retrieval-Augmented Language Modeling (RALM) by integrating large language models (LLM) with relevant documents from an external corpus is a proven method for enabling the LLM to generate information beyond the scope of its pre-training corpus. Previous work utilizing retrieved content by simply prepending it to the input poses a high runtime issue, which degrades the inference efficiency of the L…

    Submitted 16 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 14 pages

  6. arXiv:2405.02797  [pdf, other]

    cs.CV cs.LG

    Adapting to Distribution Shift by Visual Domain Prompt Generation

    Authors: Zhixiang Chi, Li Gu, Tao Zhong, Huan Liu, Yuanhao Yu, Konstantinos N Plataniotis, Yang Wang

    Abstract: In this paper, we aim to adapt a model at test-time using a few unlabeled data to address distribution shifts. To tackle the challenges of extracting domain knowledge from a limited amount of data, it is crucial to utilize correlated information from pre-trained backbones and source domains. Previous studies fail to utilize recent foundation models with strong out-of-distribution generalization. A…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: ICLR2024, code: https://github.com/Guliisgreat/VDPG

  7. arXiv:2404.01642  [pdf, ps, other]

    cs.LG cs.CR

    ADVREPAIR: Provable Repair of Adversarial Attack

    Authors: Zhiming Chi, Jianan Ma, Pengfei Yang, Cheng-Chao Huang, Renjue Li, Xiaowei Huang, Lijun Zhang

    Abstract: Deep neural networks (DNNs) are increasingly deployed in safety-critical domains, but their vulnerability to adversarial attacks poses serious safety risks. Existing neuron-level methods using limited data lack efficacy in fixing adversaries due to the inherent complexity of adversarial attack mechanisms, while adversarial training, leveraging a large number of adversarial samples to enhance robus…

    Submitted 2 April, 2024; originally announced April 2024.

  8. arXiv:2403.17683  [pdf, other]

    cs.AI

    Solution for Emotion Prediction Competition of Workshop on Emotionally and Culturally Intelligent AI

    Authors: Shengdong Xu, Zhouyang Chi, Yang Yang

    Abstract: This report provides a detailed description of the method that we explored and proposed in the WECIA Emotion Prediction Competition (EPC), which predicts a person's emotion through an artistic work with a comment. The dataset of this competition is ArtELingo, designed to encourage work on diversity across languages and cultures. The dataset has two main challenges, namely modal imbalance problem an…

    Submitted 31 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  9. arXiv:2403.07920  [pdf, other]

    q-bio.BM cs.AI cs.CL cs.LG

    ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training

    Authors: Le Zhuo, Zewen Chi, Minghao Xu, Heyan Huang, Heqi Zheng, Conghui He, Xian-Ling Mao, Wentao Zhang

    Abstract: We propose ProtLLM, a versatile cross-modal large language model (LLM) for both protein-centric and protein-language tasks. ProtLLM features a unique dynamic protein mounting mechanism, enabling it to handle complex inputs where the natural language text is interspersed with an arbitrary number of proteins. Besides, we propose the protein-as-word language modeling approach to train ProtLLM. By dev…

    Submitted 27 February, 2024; originally announced March 2024.

    Comments: https://protllm.github.io/project/

  10. arXiv:2312.10165  [pdf, other]

    cs.CV

    Test-Time Domain Adaptation by Learning Domain-Aware Batch Normalization

    Authors: Yanan Wu, Zhixiang Chi, Yang Wang, Konstantinos N. Plataniotis, Songhe Feng

    Abstract: Test-time domain adaptation aims to adapt the model trained on source domains to unseen target domains using a few unlabeled images. Emerging research has shown that the label and domain information is separately embedded in the weight matrix and batch normalization (BN) layer. Previous works normally update the whole network naively without explicitly decoupling the knowledge between label and do…

    Submitted 16 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: AAAI2024 (Oral), see https://github.com/ynanwu/MABN

  11. arXiv:2311.02874  [pdf, other]

    eess.IV cs.CV cs.LG

    Dynamic Neural Fields for Learning Atlases of 4D Fetal MRI Time-series

    Authors: Zeen Chi, Zhongxiao Cong, Clinton J. Wang, Yingcheng Liu, Esra Abaci Turk, P. Ellen Grant, S. Mazdak Abulnaga, Polina Golland, Neel Dey

    Abstract: We present a method for fast biomedical image atlas construction using neural fields. Atlases are key to biomedical image analysis tasks, yet conventional and deep network estimation methods remain time-intensive. In this preliminary work, we frame subject-specific atlas building as learning a neural field of deformable spatiotemporal observations. We apply our method to learning subject-specific…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 6 pages, 2 figures. Accepted by Medical Imaging Meets NeurIPS 2023

  12. arXiv:2308.11063  [pdf, other]

    cs.CV

    MetaGCD: Learning to Continually Learn in Generalized Category Discovery

    Authors: Yanan Wu, Zhixiang Chi, Yang Wang, Songhe Feng

    Abstract: In this paper, we consider a real-world scenario where a model that is trained on pre-defined classes continually encounters unlabeled data that contains both known and novel classes. The goal is to continually discover novel classes while maintaining the performance in known classes. We name the setting Continual Generalized Category Discovery (C-GCD). Existing methods for novel class discovery c…

    Submitted 17 October, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: This paper has been accepted by ICCV2023

  13. arXiv:2308.09268  [pdf, other]

    cs.CV

    Progression-Guided Temporal Action Detection in Videos

    Authors: Chongkai Lu, Man-Wai Mak, Ruimin Li, Zheru Chi, Hong Fu

    Abstract: We present a novel framework, Action Progression Network (APN), for temporal action detection (TAD) in videos. The framework locates actions in videos by detecting the action evolution process. To encode the action evolution, we quantify a complete action process into 101 ordered stages (0%, 1%, ..., 100%), referred to as action progressions. We then train a neural network to recognize the acti…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Under Review. Code available at https://github.com/makecent/APN

  14. arXiv:2308.00520  [pdf, other]

    cs.CV

    NormKD: Normalized Logits for Knowledge Distillation

    Authors: Zhihao Chi, Tu Zheng, Hengjia Li, Zheng Yang, Boxi Wu, Binbin Lin, Deng Cai

    Abstract: Logit based knowledge distillation gets less attention in recent years since feature based methods perform better in most cases. Nevertheless, we find it still has untapped potential when we re-investigate the temperature, which is a crucial hyper-parameter to soften the logit outputs. For most of the previous works, it was set as a fixed value for the entire distillation procedure. However, as th…

    Submitted 1 August, 2023; originally announced August 2023.

  15. arXiv:2305.08800  [pdf, other]

    cs.CL

    Measuring Cross-Lingual Transferability of Multilingual Transformers on Sentence Classification

    Authors: Zewen Chi, Heyan Huang, Xian-Ling Mao

    Abstract: Recent studies have exhibited remarkable capabilities of pre-trained multilingual Transformers, especially cross-lingual transferability. However, current methods do not measure cross-lingual transferability well, hindering the understanding of multilingual Transformers. In this paper, we propose IGap, a cross-lingual transferability metric for multilingual Transformers on sentence classification…

    Submitted 15 May, 2023; originally announced May 2023.

  16. arXiv:2303.17815  [pdf, other]

    cs.CV

    APPT: Asymmetric Parallel Point Transformer for 3D Point Cloud Understanding

    Authors: Hengjia Li, Tu Zheng, Zhihao Chi, Zheng Yang, Wenxiao Wang, Boxi Wu, Binbin Lin, Deng Cai

    Abstract: Transformer-based networks have achieved impressive performance in 3D point cloud understanding. However, most of them concentrate on aggregating local features, but neglect to directly model global dependencies, which results in a limited effective receptive field. Besides, how to effectively incorporate local and global components also remains challenging. To tackle these problems, we propose As…

    Submitted 31 March, 2023; originally announced March 2023.

  17. arXiv:2302.14045  [pdf, other]

    cs.CL cs.CV

    Language Is Not All You Need: Aligning Perception with Language Models

    Authors: Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei

    Abstract: A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Specifically, we train Kosmos-1 from scratch on web-scale multimodal co…

    Submitted 1 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  18. arXiv:2302.06455  [pdf, other]

    cs.AI cs.FL

    Incremental Satisfiability Modulo Theory for Verification of Deep Neural Networks

    Authors: Pengfei Yang, Zhiming Chi, Zongxin Liu, Mengyu Zhao, Cheng-Chao Huang, Shaowei Cai, Lijun Zhang

    Abstract: Constraint solving is an elementary way for verification of deep neural networks (DNN). In the domain of AI safety, a DNN might be modified in its structure and parameters for its repair or attack. For such situations, we propose the incremental DNN verification problem, which asks whether a safety property still holds after the DNN is modified. To solve the problem, we present an incremental sati…

    Submitted 9 February, 2023; originally announced February 2023.

  19. arXiv:2212.09611  [pdf, other]

    cs.CL cs.CV

    Optimizing Prompts for Text-to-Image Generation

    Authors: Yaru Hao, Zewen Chi, Li Dong, Furu Wei

    Abstract: Well-designed prompts can guide text-to-image models to generate amazing images. However, the performant prompts are often model-specific and misaligned with user input. Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts. Specifically, we first perform supervised fine-tuning with a pretr…

    Submitted 29 December, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted by NeurIPS-23

  20. arXiv:2212.09353  [pdf, other]

    cs.CL

    Bridging The Gap: Entailment Fused-T5 for Open-retrieval Conversational Machine Reading Comprehension

    Authors: Xiao Zhang, Heyan Huang, Zewen Chi, Xian-Ling Mao

    Abstract: Open-retrieval conversational machine reading comprehension (OCMRC) simulates real-life conversational interaction scenes. Machines are required to make a decision of "Yes/No/Inquire" or generate a follow-up question when the decision is "Inquire" based on retrieved rule texts, user scenario, user question, and dialogue history. Recent studies explored the methods to reduce the information gap bet…

    Submitted 19 December, 2022; originally announced December 2022.

  21. arXiv:2212.08273  [pdf, other]

    cs.CV cs.AI cs.LG

    Learning for Vehicle-to-Vehicle Cooperative Perception under Lossy Communication

    Authors: Jinlong Li, Runsheng Xu, Xinyu Liu, Jin Ma, Zicheng Chi, Jiaqi Ma, Hongkai Yu

    Abstract: Deep learning has been widely used in the perception (e.g., 3D object detection) of intelligent vehicle driving. Due to the beneficial Vehicle-to-Vehicle (V2V) communication, the deep learning based features from other agents can be shared to the ego vehicle so as to improve the perception of the ego vehicle. It is named as Cooperative Perception in the V2V research, whose algorithms have been dra…

    Submitted 18 March, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: this paper was accepted by IEEE Transactions on Intelligent Vehicles

    Journal ref: 2023 IEEE Transactions on Intelligent Vehicles

  22. arXiv:2211.13184  [pdf, other]

    cs.LG cs.CL

    TorchScale: Transformers at Scale

    Authors: Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei

    Abstract: Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries on scaling Transformers focus on improving training or inference with better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale has the implementation of several…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: Work in progress

  23. arXiv:2210.14867  [pdf, other]

    cs.CL cs.LG

    Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning

    Authors: Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei, Vishrav Chaudhary, Xia Song

    Abstract: In this paper, we elaborate upon recipes for building multilingual representation models that are not only competitive with existing state-of-the-art models but are also more parameter efficient, thereby promoting better adoption in resource-constrained scenarios and practical applications. We show that going beyond English-centric bitexts, coupled with a novel sampling strategy aimed at reducing…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Work in progress

  24. arXiv:2210.06546  [pdf, other]

    cs.LG stat.ML

    Auto-Encoding Goodness of Fit

    Authors: Aaron Palmer, Zhiyi Chi, Derek Aguiar, Jinbo Bi

    Abstract: For generative autoencoders to learn a meaningful latent representation for data generation, a careful balance must be achieved between reconstruction error and how close the distribution in the latent space is to the prior. However, this balance is challenging to achieve due to a lack of criteria that work both at the mini-batch (local) and aggregated posterior (global) level. Goodness of fit (Go…

    Submitted 12 October, 2022; originally announced October 2022.

  25. arXiv:2210.05461  [pdf, other]

    cs.CV

    FreGAN: Exploiting Frequency Components for Training GANs under Limited Data

    Authors: Mengping Yang, Zhe Wang, Ziqiu Chi, Yanbing Zhang

    Abstract: Training GANs under limited data often leads to discriminator overfitting and memorization issues, causing divergent training. Existing approaches mitigate the overfitting by employing data augmentations, model regularization, or attention mechanisms. However, they ignore the frequency bias of GANs and take poor consideration towards frequency information, especially high-frequency signals that co…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: To appear in NeurIPS 2022, GitHub: https://github.com/kobeshegu/FreGAN_NeurIPS2022

  26. arXiv:2210.03885  [pdf, other]

    cs.LG cs.CV

    Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts

    Authors: Tao Zhong, Zhixiang Chi, Li Gu, Yang Wang, Yuanhao Yu, Jin Tang

    Abstract: In this paper, we tackle the problem of domain shift. Most existing methods perform training on multiple source domains using a single model, and the same trained model is used on all unseen target domains. Such solutions are sub-optimal as each target domain exhibits its own specialty, which is not adapted. Furthermore, expecting single-model training to learn extensive knowledge from multiple so…

    Submitted 11 January, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS2022

  27. arXiv:2210.00174  [pdf, other]

    cs.CV cs.LG

    Improving ProtoNet for Few-Shot Video Object Recognition: Winner of ORBIT Challenge 2022

    Authors: Li Gu, Zhixiang Chi, Huan Liu, Yuanhao Yu, Yang Wang

    Abstract: In this work, we present the winning solution for ORBIT Few-Shot Video Object Recognition Challenge 2022. Built upon the ProtoNet baseline, the performance of our method is improved with three effective techniques. These techniques include the embedding adaptation, the uniform video clip sampler and the invalid frame detection. In addition, we re-factor and re-implement the official codebase to en…

    Submitted 30 September, 2022; originally announced October 2022.

    Comments: Winner of ORBIT Challenge 2022

  28. arXiv:2209.11484  [pdf, other]

    cs.CL

    ET5: A Novel End-to-end Framework for Conversational Machine Reading Comprehension

    Authors: Xiao Zhang, Heyan Huang, Zewen Chi, Xian-Ling Mao

    Abstract: Conversational machine reading comprehension (CMRC) aims to assist computers to understand a natural language text and thereafter engage in a multi-turn conversation to answer questions related to the text. Existing methods typically require three steps: (1) decision making based on entailment reasoning; (2) span extraction if required by the above decision; (3) question rephrasing based on the e…

    Submitted 23 September, 2022; originally announced September 2022.

    Comments: Accepted by COLING2022

  29. arXiv:2208.10813  [pdf, other]

    cs.CL

    Unsupervised Question Answering via Answer Diversifying

    Authors: Yuxiang Nie, Heyan Huang, Zewen Chi, Xian-Ling Mao

    Abstract: Unsupervised question answering is an attractive task due to its independence on labeled data. Previous works usually make use of heuristic rules as well as pre-trained models to construct data and train QA models. However, most of these works regard named entity (NE) as the only answer type, which ignores the high diversity of answers in the real world. To tackle this problem, we propose a novel…

    Submitted 23 August, 2022; originally announced August 2022.

    Comments: Accepted by COLING 2022

  30. arXiv:2207.12305  [pdf, other]

    cs.CV

    Error-Aware Spatial Ensembles for Video Frame Interpolation

    Authors: Zhixiang Chi, Rasoul Mohammadi Nasiri, Zheng Liu, Yuanhao Yu, Juwei Lu, Jin Tang, Konstantinos N Plataniotis

    Abstract: Video frame interpolation (VFI) algorithms have improved considerably in recent years due to unprecedented progress in both data-driven algorithms and their implementations. Recent research has introduced advanced motion estimation or novel warping methods as the means to address challenging VFI scenarios. However, none of the published VFI works considers the spatially non-uniform characteristics…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: 10 pages, 8 figures, demo video: https://www.youtube.com/watch?v=_32GNANSr5U

  31. arXiv:2207.11213  [pdf, other]

    cs.CV

    Few-Shot Class-Incremental Learning via Entropy-Regularized Data-Free Replay

    Authors: Huan Liu, Li Gu, Zhixiang Chi, Yang Wang, Yuanhao Yu, Jun Chen, Jin Tang

    Abstract: Few-shot class-incremental learning (FSCIL) has been proposed aiming to enable a deep learning system to incrementally learn new classes with limited data. Recently, a pioneer claims that the commonly used replay-based method in class-incremental learning (CIL) is ineffective and thus not preferred for FSCIL. This has, if true, a significant influence on the field of FSCIL. In this paper, we sho…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV 2022

  32. arXiv:2207.07288  [pdf, other]

    cs.CV eess.IV

    WaveGAN: Frequency-aware GAN for High-Fidelity Few-shot Image Generation

    Authors: Mengping Yang, Zhe Wang, Ziqiu Chi, Wenyi Feng

    Abstract: Existing few-shot image generation approaches typically employ fusion-based strategies, either on the image or the feature level, to produce new images. However, previous approaches struggle to synthesize high-frequency signals with fine details, deteriorating the synthesis quality. To address this, we propose WaveGAN, a frequency-aware model for few-shot image generation. Concretely, we disentang…

    Submitted 9 August, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV2022, Code Link: https://github.com/kobeshegu/ECCV2022_WaveGAN

  33. arXiv:2206.06336  [pdf, other]

    cs.CL

    Language Models are General-Purpose Interfaces

    Authors: Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei

    Abstract: Foundation models have received much attention due to their effectiveness across a broad range of downstream applications. Though there is a big convergence in terms of architecture, most pretrained models are typically still developed for specific tasks or modalities. In this work, we propose to use language models as a general-purpose interface to various foundation models. A collection of pretr…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: 32 pages. The first three authors contribute equally

  34. arXiv:2204.09179  [pdf, other]

    cs.CL cs.LG

    On the Representation Collapse of Sparse Mixture of Experts

    Authors: Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, Barun Patra, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei

    Abstract: Sparse mixture of experts provides larger model capacity while requiring a constant computational overhead. It employs the routing mechanism to distribute input tokens to the best-matched experts according to their hidden representations. However, learning such a routing mechanism encourages token clustering around expert centroids, implying a trend toward representation collapse. In this work, we…

    Submitted 12 October, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: NeurIPS 2022

  35. arXiv:2204.08887  [pdf, other]

    cs.CL

    Cross-Lingual Phrase Retrieval

    Authors: Heqi Zheng, Xiao Zhang, Zewen Chi, Heyan Huang, Tan Yan, Tian Lan, Wei Wei, Xian-Ling Mao

    Abstract: Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations in word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase…

    Submitted 19 April, 2022; originally announced April 2022.

  36. arXiv:2112.05883  [pdf, other]

    cs.CV cs.LG

    Self-supervised Spatiotemporal Representation Learning by Exploiting Video Continuity

    Authors: Hanwen Liang, Niamul Quader, Zhixiang Chi, Lizhe Chen, Peng Dai, Juwei Lu, Yang Wang

    Abstract: Recent self-supervised video representation learning methods have found significant success by exploring essential properties of videos, e.g. speed, temporal order, etc. This work exploits an essential yet under-explored property of videos, the video continuity, to obtain supervision signals for self-supervised representation learning. Specifically, we formulate three novel continuity-related pret…

    Submitted 12 January, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

  37. arXiv:2110.07936  [pdf, other]

    cs.CL

    Unifying Cross-lingual Summarization and Machine Translation with Compression Rate

    Authors: Yu Bai, Heyan Huang, Kai Fan, Yang Gao, Yiming Zhu, Jiaao Zhan, Zewen Chi, Boxing Chen

    Abstract: Cross-Lingual Summarization (CLS) is a task that extracts important information from a source document and summarizes it into a summary in another language. It is a challenging task that requires a system to understand, summarize, and translate at the same time, making it highly related to Monolingual Summarization (MS) and Machine Translation (MT). In practice, the training resources for Machine…

    Submitted 24 April, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Accepted by SIGIR 2022

  38. arXiv:2109.11129  [pdf, other]

    cs.CL

    Cross-Lingual Language Model Meta-Pretraining

    Authors: Zewen Chi, Heyan Huang, Luyang Liu, Yu Bai, Xian-Ling Mao

    Abstract: The success of pretrained cross-lingual language models relies on two essential abilities, i.e., generalization ability for learning downstream tasks in a source language, and cross-lingual transferability for transferring the task knowledge to other languages. However, current methods jointly learn the two abilities in a single-phase cross-lingual pretraining process, resulting in a trade-off bet…

    Submitted 22 September, 2021; originally announced September 2021.

  39. arXiv:2106.16138  [pdf, other]

    cs.CL

    XLM-E: Cross-lingual Language Model Pre-training via ELECTRA

    Authors: Zewen Chi, Shaohan Huang, Li Dong, Shuming Ma, Bo Zheng, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei

    Abstract: In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training. Specifically, we present two pre-training tasks, namely multilingual replaced token detection, and translation replaced token detection. Besides, we pretrain the model, named as XLM-E, on both multilingual and parallel corpora. Our model outperforms the baseline models on various cross-lingual understandi…

    Submitted 19 April, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: ACL-2022

  40. arXiv:2106.08226  [pdf, other]

    cs.CL

    Consistency Regularization for Cross-Lingual Fine-Tuning

    Authors: Bo Zheng, Li Dong, Shaohan Huang, Wenhui Wang, Zewen Chi, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei

    Abstract: Fine-tuning pre-trained cross-lingual language models can transfer task-specific supervision from one language to the others. In this work, we propose to improve cross-lingual fine-tuning with consistency regularization. Specifically, we use example consistency regularization to penalize the prediction sensitivity to four types of data augmentations, i.e., subword sampling, Gaussian noise, code-sw…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: ACL-2021

  41. arXiv:2106.06381  [pdf, other]

    cs.CL

    Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment

    Authors: Zewen Chi, Li Dong, Bo Zheng, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei

    Abstract: The cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences. In this paper, we introduce denoising word alignment as a new cross-lingual pre-training task. Specifically, the model first self-labels word alignments for parallel sentences. Then we randomly mask tokens in a bitext pair. Given a masked token, the model uses a poin…

    Submitted 13 September, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: ACL 2021

  42. arXiv:2104.08692  [pdf, other]

    cs.CL

    MT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs

    Authors: Zewen Chi, Li Dong, Shuming Ma, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei

    Abstract: Multilingual T5 (mT5) pretrains a sequence-to-sequence model on massive monolingual texts, which has shown promising results on many cross-lingual tasks. In this paper, we improve multilingual text-to-text transfer Transformer with translation pairs (mT6). Specifically, we explore three cross-lingual text-to-text pre-training tasks, namely, machine translation, translation pair span corruption, an…

    Submitted 13 September, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021

  43. DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python

    Authors: Jinglin Peng, Weiyuan Wu, Brandon Lockhart, Song Bian, Jing Nathan Yan, Linghao Xu, Zhixuan Chi, Jeffrey Rzeszotarski, Jiannan Wang

    Abstract: Exploratory Data Analysis (EDA) is a crucial step in any data science project. However, existing Python libraries fall short in supporting data scientists to complete common EDA tasks for statistical modeling. Their API design is either too low level, which is optimized for plotting rather than EDA, or too high level, which is hard to specify more fine-grained EDA tasks. In response, we propose Da…

    Submitted 10 April, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

  44. A Robust and Domain-Adaptive Approach for Low-Resource Named Entity Recognition

    Authors: Houjin Yu, Xian-Ling Mao, Zewen Chi, Wei Wei, Heyan Huang

    Abstract: Recently, it has attracted much attention to build reliable named entity recognition (NER) systems using limited annotated data. Nearly all existing works heavily rely on domain-specific resources, such as external lexicons and knowledge bases. However, such domain-specific resources are often not available, meanwhile it's difficult and expensive to construct the resources, which has become a key…

    Submitted 2 January, 2021; originally announced January 2021.

    Comments: Best Student Paper of 2020 IEEE International Conference on Knowledge Graph (ICKG)

    Journal ref: 2020 IEEE International Conference on Knowledge Graph (ICKG) (pp. 297-304)

  45. arXiv:2012.15547  [pdf, other]

    cs.CL

    XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders

    Authors: Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal, Xia Song, Arul Menezes, Furu Wei

    Abstract: Multilingual machine translation enables a single model to translate between different languages. Most existing multilingual machine translation systems adopt a randomly initialized Transformer backbone. In this work, inspired by the recent success of language model pre-training, we present XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer encoder and fi…

    Submitted 31 December, 2020; originally announced December 2020.

  46. arXiv:2007.11762  [pdf, other]

    cs.CV

    All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced Motion Modeling

    Authors: Zhixiang Chi, Rasoul Mohammadi Nasiri, Zheng Liu, Juwei Lu, Jin Tang, Konstantinos N Plataniotis

    Abstract: Recent advances in high refresh rate displays as well as the increased interest in high rate of slow motion and frame up-conversion fuel the demand for efficient and cost-effective multi-frame video interpolation solutions. To that regard, inserting multiple frames between consecutive video frames are of paramount importance for the consumer electronics industry. State-of-the-art methods are itera…

    Submitted 8 January, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

    Comments: Accepted at ECCV2020 (poster), project: https://chi-chi-zx.github.io/all-at-once/

  47. arXiv:2007.07834  [pdf, other]

    cs.CL

    InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

    Authors: Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, Heyan Huang, Ming Zhou

    Abstract: In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts. The unified view helps us to better understand the existing methods for learning cross-lingual representations. More importantly, inspired by the framework, we propose a new pre-training task based on co…

    Submitted 7 April, 2021; v1 submitted 15 July, 2020; originally announced July 2020.

    Comments: NAACL 2021

  48. arXiv:2007.01652  [pdf, other]

    cs.CL

    Generating Informative Dialogue Responses with Keywords-Guided Networks

    Authors: Heng-Da Xu, Xian-Ling Mao, Zewen Chi, Jing-Jing Zhu, Fanshu Sun, Heyan Huang

    Abstract: Recently, open-domain dialogue systems have attracted growing attention. Most of them use the sequence-to-sequence (Seq2Seq) architecture to generate responses. However, traditional Seq2Seq-based open-domain dialogue models tend to generate generic and safe responses, which are less informative, unlike human responses. In this paper, we propose a simple but effective keywords-guided Sequence-to-Se…

    Submitted 3 July, 2020; originally announced July 2020.

  49. arXiv:2004.12231  [pdf, other]

    eess.IV cs.CV cs.LG

    Deep DIH : Statistically Inferred Reconstruction of Digital In-Line Holography by Deep Learning

    Authors: Huayu Li, Xiwen Chen, Haiyu Wu, Zaoyi Chi, Christopher Mann, Abolfazl Razi

    Abstract: Digital in-line holography is commonly used to reconstruct 3D images from 2D holograms for microscopic objects. One of the technical challenges that arise in the signal processing stage is removing the twin image that is caused by the phase-conjugate wavefront from the recorded holograms. Twin image removal is typically formulated as a non-linear inverse problem due to the irreversible scattering…

    Submitted 24 June, 2020; v1 submitted 25 April, 2020; originally announced April 2020.

  50. arXiv:1911.03913  [pdf, other]

    cs.CL

    Can Monolingual Pretrained Models Help Cross-Lingual Classification?

    Authors: Zewen Chi, Li Dong, Furu Wei, Xian-Ling Mao, Heyan Huang

    Abstract: Multilingual pretrained language models (such as multilingual BERT) have achieved impressive results for cross-lingual transfer. However, due to the constant model capacity, multilingual pre-training usually lags behind the monolingual competitors. In this work, we present two approaches to improve zero-shot cross-lingual classification, by transferring the knowledge from monolingual pretrained mo…

    Submitted 10 November, 2019; originally announced November 2019.

    Comments: 5 pages