Skip to main content

Showing 1–50 of 87 results for author: Lan, L

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.09422  [pdf, other

    cs.CL cs.AI

    Distinguish Confusion in Legal Judgment Prediction via Revised Relation Knowledge

    Authors: Nuo Xu, Pinghui Wang, Junzhou Zhao, Feiyang Sun, Lin Lan, Jing Tao, Li Pan, Xiaohong Guan

    Abstract: Legal Judgment Prediction (LJP) aims to automatically predict a law case's judgment results based on the text description of its facts. In practice, the confusing law articles (or charges) problem frequently occurs, reflecting that the law cases applicable to similar articles (or charges) tend to be misjudged. Although some recent works based on prior knowledge solve this issue well, they ignore t… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM TOIS

  2. arXiv:2408.09178  [pdf, other

    cs.CV

    MambaTrack: A Simple Baseline for Multiple Object Tracking with State Space Model

    Authors: Changcheng Xiao, Qiong Cao, Zhigang Luo, Long Lan

    Abstract: Tracking by detection has been the prevailing paradigm in the field of Multi-object Tracking (MOT). These methods typically rely on the Kalman Filter to estimate the future locations of objects, assuming linear object motion. However, they fall short when tracking objects exhibiting nonlinear and diverse motion in scenarios like dancing and sports. In addition, there has been limited focus on util… ▽ More

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM Multimedia 2024

  3. arXiv:2408.08918  [pdf, other

    cs.CR cs.AI

    Supervised and Unsupervised Alignments for Spoofing Behavioral Biometrics

    Authors: Thomas Thebaud, Gaël Le Lan, Anthony Larcher

    Abstract: Biometric recognition systems are security systems based on intrinsic properties of their users, usually encoded in high dimension representations called embeddings, which potential theft would represent a greater threat than a temporary password or a replaceable key. To study the threat of embedding theft, we perform spoofing attacks on two behavioral biometric systems (an automatic speaker verif… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 11 pages, 4 figures, 5 tables, submission in progress

  4. arXiv:2408.00118  [pdf, other

    cs.CL cs.AI

    Gemma 2: Improving Open Language Models at a Practical Size

    Authors: Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman , et al. (172 additional authors not shown)

    Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al… ▽ More

    Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

  5. arXiv:2407.03648  [pdf, other

    eess.AS cs.SD

    High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching

    Authors: Gael Le Lan, Bowen Shi, Zhaoheng Ni, Sidd Srinivasan, Anurag Kumar, Brian Ellis, David Kant, Varun Nagaraja, Ernie Chang, Wei-Ning Hsu, Yangyang Shi, Vikas Chandra

    Abstract: We introduce a simple and efficient text-controllable high-fidelity music generation and editing model. It operates on sequences of continuous latent representations from a low frame rate 48 kHz stereo variational auto encoder codec that eliminates the information loss drawback of discrete representations. Based on a diffusion transformer architecture trained on a flow-matching objective the model… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  6. arXiv:2406.15007  [pdf, other

    cs.AI

    RouteFinder: Towards Foundation Models for Vehicle Routing Problems

    Authors: Federico Berto, Chuanbo Hua, Nayeli Gast Zepeda, André Hottung, Niels Wouda, Leon Lan, Kevin Tierney, Jinkyoo Park

    Abstract: Vehicle Routing Problems (VRPs) are optimization problems with significant real-world implications in logistics, transportation, and supply chain management. Despite the recent progress made in learning to solve individual VRP variants, there is a lack of a unified approach that can effectively tackle a wide range of tasks, which is crucial for real-world impact. This paper introduces RouteFinder,… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  7. arXiv:2406.11933  [pdf, other

    cs.CV

    OpticalRS-4M: Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset

    Authors: Fengxiang Wang, Hongzhen Wang, Di Wang, Zonghao Guo, Zhenyu Zhong, Long Lan, Jing Zhang, Zhiyuan Liu, Maosong Sun

    Abstract: Masked Image Modeling (MIM) has become an essential method for building foundational visual models in remote sensing (RS). However, the limitations in size and diversity of existing RS datasets restrict the ability of MIM methods to learn generalizable representations. Additionally, conventional MIM techniques, which require reconstructing all tokens, introduce unnecessary computational overhead.… ▽ More

    Submitted 30 August, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2405.15056  [pdf, other

    cs.LG cs.CV cs.GR

    ElastoGen: 4D Generative Elastodynamics

    Authors: Yutao Feng, Yintong Shang, Xiang Feng, Lei Lan, Shandian Zhe, Tianjia Shao, Hongzhi Wu, Kun Zhou, Hao Su, Chenfanfu Jiang, Yin Yang

    Abstract: We present ElastoGen, a knowledge-driven model that generates physically accurate and coherent 4D elastodynamics. Instead of relying on petabyte-scale data-driven learning, ElastoGen leverages the principles of physics-in-the-loop and learns from established physical knowledge, such as partial differential equations and their numerical solutions. The core idea of ElastoGen is converting the global… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  9. arXiv:2405.12484  [pdf, other

    cs.GR

    Meta-Homogenization for Knitwear Simulation

    Authors: Chun Yuan, Kui Wu, Haoyang Shi, Lei Lan, Yuxing Qiu, Cem Yuksel, Huamin Wang, Chenfanfu Jiang, Yin Yang

    Abstract: This paper presents meta-homogenization, a spatially varying homogenization scheme for knitwear simulation. We are motivated by the observation that macro-scale fabric dynamics is strongly correlated with its underlying knitting patterns. Therefore, homogenization towards a single material is less effective when the knitting is complex and non-repetitive. Our method tackles this challenge by homog… ▽ More

    Submitted 23 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  10. arXiv:2405.11694  [pdf, other

    cs.GR

    PBI: Position-Based Dynamics Handles Updated Lagrangian Inelasticity

    Authors: Chang Yu, Xuan Li, Lei Lan, Yin Yang, Chenfanfu Jiang

    Abstract: Position-based Dynamics (PBD) and its extension, eXtended Position-based Dynamics (XPBD), have been predominantly applied to compliant constrained dynamics, with their potential in finite strain inelasticity remaining underexplored. XPBD stands in contrast to other meshless methods, such as the Material Point Method (MPM). MPM is based on discretizing the weak form of governing partial differentia… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

  11. arXiv:2404.10343  [pdf, other

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  12. arXiv:2403.19272  [pdf, other

    cs.GR

    Mil2: Efficient Cloth Simulation Using Non-distance Barriers and Subspace Reuse

    Authors: Lei Lan, Zixuan Lu, Jingyi Long, Chun Yuan, Xuan Li, Xiaowei He, Huamin Wang, Chenfanfu Jiang, Yin Yang

    Abstract: Mil2 pushes the performance of high-resolution cloth simulation, making the simulation interactive (in milliseconds) for models with one million degrees of freedom (DOFs) while keeping every triangle untangled. The guarantee of being penetration-free is inspired by the interior-point method, which converts the inequality constraints to barrier potentials. Nevertheless, we propose a major overhaul… ▽ More

    Submitted 23 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  13. PyVRP: a high-performance VRP solver package

    Authors: Niels A. Wouda, Leon Lan, Wouter Kool

    Abstract: We introduce PyVRP, a Python package that implements hybrid genetic search in a state-of-the-art vehicle routing problem (VRP) solver. The package is designed for the VRP with time windows (VRPTW), but can be easily extended to support other VRP variants. PyVRP combines the flexibility of Python with the performance of C++, by implementing (only) performance critical parts of the algorithm in C++,… ▽ More

    Submitted 21 March, 2024; v1 submitted 22 November, 2023; originally announced March 2024.

    Comments: Pre-print of accepted paper in INFORMS Journal on Computing. 24 pages, 1 figure, 2 listings

  14. arXiv:2403.13241  [pdf, other

    cs.LG

    Tackling Noisy Labels with Network Parameter Additive Decomposition

    Authors: Jingyi Wang, Xiaobo Xia, Long Lan, Xinghao Wu, Jun Yu, Wenjing Yang, Bo Han, Tongliang Liu

    Abstract: Given data with noisy labels, over-parameterized deep networks suffer overfitting mislabeled data, resulting in poor generalization. The memorization effect of deep networks shows that although the networks have the ability to memorize all noisy data, they would first memorize clean training data, and then gradually memorize mislabeled training data. A simple and effective method that exploits the… ▽ More

    Submitted 28 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE T-PAMI

  15. arXiv:2403.08635  [pdf, other

    cs.LG cs.AI stat.ML

    Human Alignment of Large Language Models through Online Preference Optimisation

    Authors: Daniele Calandriello, Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

    Abstract: Ensuring alignment of language models' outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Thus, human alignment has been extensively studied recently and several methods such as Reinforcement Learning from Human Feedback (RLHF), Direct Policy Optimisation (DPO) and Sequence Likelihood Calibration (SLiC) have emerged. In this paper, our contributio… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

  16. arXiv:2403.08295  [pdf, other

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari , et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge… ▽ More

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  17. arXiv:2403.08271  [pdf, other

    cs.CV cs.AI

    Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification

    Authors: Long Lan, Fengxiang Wang, Shuyan Li, Xiangtao Zheng, Zengmao Wang, Xinwang Liu

    Abstract: Fine-grained ship classification in remote sensing (RS-FGSC) poses a significant challenge due to the high similarity between classes and the limited availability of labeled data, limiting the effectiveness of traditional supervised classification methods. Recent advancements in large pre-trained Vision-Language Models (VLMs) have demonstrated impressive capabilities in few-shot or zero-shot learn… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

  18. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  19. arXiv:2402.07307  [pdf, other

    stat.ML cs.LG stat.ME

    Self-Consistent Conformal Prediction

    Authors: Lars van der Laan, Ahmed M. Alaa

    Abstract: In decision-making guided by machine learning, decision-makers may take identical actions in contexts with identical predicted outcomes. Conformal prediction helps decision-makers quantify uncertainty in point predictions of outcomes, allowing for better risk management for actions. Motivated by this perspective, we introduce \textit{Self-Consistent Conformal Prediction} for regression, which comb… ▽ More

    Submitted 22 April, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  20. arXiv:2402.01972  [pdf, other

    stat.ML cs.LG stat.ME

    Combining T-learning and DR-learning: a framework for oracle-efficient estimation of causal contrasts

    Authors: Lars van der Laan, Marco Carone, Alex Luedtke

    Abstract: We introduce efficient plug-in (EP) learning, a novel framework for the estimation of heterogeneous causal contrasts, such as the conditional average treatment effect and conditional relative risk. The EP-learning framework enjoys the same oracle-efficiency as Neyman-orthogonal learning strategies, such as DR-learning and R-learning, while addressing some of their primary drawbacks, including that… ▽ More

    Submitted 2 February, 2024; originally announced February 2024.

  21. arXiv:2402.01057  [pdf, other

    cs.LG

    Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning

    Authors: Chia-Cheng Chiang, Li-Cheng Lan, Wei-Fang Sun, Chien Feng, Cho-Jui Hsieh, Chun-Yi Lee

    Abstract: In this paper, we focus on single-demonstration imitation learning (IL), a practical approach for real-world applications where acquiring multiple expert demonstrations is costly or infeasible and the ground truth reward function is not available. In contrast to typical IL settings with multiple demonstrations, single-demonstration IL involves an agent having access to only one expert trajectory.… ▽ More

    Submitted 7 July, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: Published at ICML 2024. Code: https://github.com/stanl1y/tdil

  22. arXiv:2401.04577  [pdf, other

    cs.SD cs.AI cs.LG eess.AS

    Masked Audio Generation using a Single Non-Autoregressive Transformer

    Authors: Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi

    Abstract: We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens. Unlike prior work, MAGNeT is comprised of a single-stage, non-autoregressive transformer. During training, we predict spans of masked tokens obtained from a masking scheduler, while during inference we gradually construct the output sequence using several decoding steps. T… ▽ More

    Submitted 5 March, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  23. arXiv:2401.00722  [pdf, other

    cs.CV

    BRAU-Net++: U-Shaped Hybrid CNN-Transformer Network for Medical Image Segmentation

    Authors: Libin Lan, Pengzhou Cai, Lu Jiang, Xiaojuan Liu, Yongmei Li, Yudong Zhang

    Abstract: Accurate medical image segmentation is essential for clinical quantification, disease diagnosis, treatment planning and many other applications. Both convolution-based and transformer-based u-shaped architectures have made significant success in various medical image segmentation tasks. The former can efficiently learn local information of images while requiring much more image-specific inductive… ▽ More

    Submitted 1 January, 2024; originally announced January 2024.

    Comments: 12 pages, 6 figures, 9 tables code: https://github.com/Caipengzhou/BRAU-Netplusplus

  24. arXiv:2312.15136  [pdf, other

    physics.comp-ph cs.AI cs.CV

    Towards End-to-End Structure Solutions from Information-Compromised Diffraction Data via Generative Deep Learning

    Authors: Gabe Guo, Judah Goldfeder, Ling Lan, Aniv Ray, Albert Hanming Yang, Boyuan Chen, Simon JL Billinge, Hod Lipson

    Abstract: The revolution in materials in the past century was built on a knowledge of the atomic arrangements and the structure-property relationship. The sine qua non for obtaining quantitative structural information is single crystal crystallography. However, increasingly we need to solve structures in cases where the information content in our input signal is significantly degraded, for example, due to o… ▽ More

    Submitted 22 December, 2023; originally announced December 2023.

  25. arXiv:2312.11805  [pdf, other

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  26. arXiv:2311.15540  [pdf

    cs.CV cs.MM

    EAFP-Med: An Efficient Adaptive Feature Processing Module Based on Prompts for Medical Image Detection

    Authors: Xiang Li, Long Lan, Husam Lahza, Shaowu Yang, Shuihua Wang, Wenjing Yang, Hengzhu Liu, Yudong Zhang

    Abstract: In the face of rapid advances in medical imaging, cross-domain adaptive medical image detection is challenging due to the differences in lesion representations across various medical imaging technologies. To address this issue, we draw inspiration from large language models to propose EAFP-Med, an efficient adaptive feature processing module based on prompts for medical image detection. EAFP-Med c… ▽ More

    Submitted 27 November, 2023; originally announced November 2023.

  27. arXiv:2311.00895  [pdf, other

    cs.SD cs.CL eess.AS

    In-Context Prompt Editing For Conditional Audio Generation

    Authors: Ernie Chang, Pin-Jie Lin, Yang Li, Sidd Srinivasan, Gael Le Lan, David Kant, Yangyang Shi, Forrest Iandola, Vikas Chandra

    Abstract: Distributional shift is a central challenge in the deployment of machine learning models as they can be ill-equipped for real-world data. This is particularly evident in text-to-audio generation where the encoded representations are easily undermined by unseen prompts, which leads to the degradation of generated audio -- the limited set of the text-audio pairs remains inadequate for conditional au… ▽ More

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures, 2 tables

  28. arXiv:2310.11093  [pdf, other

    cs.LG cs.CV

    SODA: Robust Training of Test-Time Data Adaptors

    Authors: Zige Wang, Yonggang Zhang, Zhen Fang, Long Lan, Wenjing Yang, Bo Han

    Abstract: Adapting models deployed to test distributions can mitigate the performance degradation caused by distribution shifts. However, privacy concerns may render model parameters inaccessible. One promising approach involves utilizing zeroth-order optimization (ZOO) to train a data adaptor to adapt the test data to fit the deployed models. Nevertheless, the data adaptor trained with ZOO typically brings… ▽ More

    Submitted 17 October, 2023; originally announced October 2023.

  29. arXiv:2310.09926  [pdf, other

    cs.AI

    Estimating Uncertainty in Multimodal Foundation Models using Public Internet Data

    Authors: Shiladitya Dutta, Hongbo Wei, Lars van der Laan, Ahmed M. Alaa

    Abstract: Foundation models are trained on vast amounts of data at scale using self-supervised learning, enabling adaptation to a wide range of downstream tasks. At test time, these models exhibit zero-shot capabilities through which they can classify previously unseen (user-specified) categories. In this paper, we address the problem of quantifying uncertainty in these zero-shot predictions. We propose a h… ▽ More

    Submitted 26 November, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

  30. arXiv:2310.00289  [pdf, other

    eess.IV cs.CV

    Pubic Symphysis-Fetal Head Segmentation Using Pure Transformer with Bi-level Routing Attention

    Authors: Pengzhou Cai, Jiang Lu, Yanxin Li, Libin Lan

    Abstract: In this paper, we propose a method, named BRAU-Net, to solve the pubic symphysis-fetal head segmentation task. The method adopts a U-Net-like pure Transformer architecture with bi-level routing attention and skip connections, which effectively learns local-global semantic information. The proposed BRAU-Net was evaluated on transperineal Ultrasound images dataset from the pubic symphysis-fetal head… ▽ More

    Submitted 7 October, 2023; v1 submitted 30 September, 2023; originally announced October 2023.

  31. arXiv:2309.10537  [pdf, other

    eess.AS cs.MM cs.SD

    FoleyGen: Visually-Guided Audio Generation

    Authors: Xinhao Mei, Varun Nagaraja, Gael Le Lan, Zhaoheng Ni, Ernie Chang, Yangyang Shi, Vikas Chandra

    Abstract: Recent advancements in audio generation have been spurred by the evolution of large-scale deep learning models and expansive datasets. However, the task of video-to-audio (V2A) generation continues to be a challenge, principally because of the intricate relationship between the high-dimensional visual and auditory data, and the challenges associated with temporal synchronization. In this study, we… ▽ More

    Submitted 19 September, 2023; originally announced September 2023.

  32. arXiv:2309.08804  [pdf, other

    eess.AS cs.SD

    Stack-and-Delay: a new codebook pattern for music generation

    Authors: Gael Le Lan, Varun Nagaraja, Ernie Chang, David Kant, Zhaoheng Ni, Yangyang Shi, Forrest Iandola, Vikas Chandra

    Abstract: In language modeling based music generation, a generated waveform is represented by a sequence of hierarchical token stacks that can be decoded either in an auto-regressive manner or in parallel, depending on the codebook patterns. In particular, flattening the codebooks represents the highest quality decoding strategy, while being notoriously slow. To this end, we propose a novel stack-and-delay… ▽ More

    Submitted 15 September, 2023; originally announced September 2023.

  33. arXiv:2309.08773  [pdf, other

    cs.SD cs.AI cs.LG eess.AS

    Enhance audio generation controllability through representation similarity regularization

    Authors: Yangyang Shi, Gael Le Lan, Varun Nagaraja, Zhaoheng Ni, Xinhao Mei, Ernie Chang, Forrest Iandola, Yang Liu, Vikas Chandra

    Abstract: This paper presents an innovative approach to enhance control over audio generation by emphasizing the alignment between audio and text representations during model training. In the context of language model-based audio generation, the model leverages input from both textual and audio token representations to predict subsequent audio tokens. However, the current configuration lacks explicit regula… ▽ More

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: 5 pages

  34. arXiv:2307.13981  [pdf, other

    cs.CV cs.MM eess.IV

    Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models

    Authors: Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, Kede Ma

    Abstract: Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to proper… ▽ More

    Submitted 3 April, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

  35. arXiv:2307.00997  [pdf, other

    cs.CV cs.AI

    RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation

    Authors: Yonglin Li, Jing Zhang, Xiao Teng, Long Lan

    Abstract: The Segment Anything Model (SAM) has gained significant attention for its impressive performance in image segmentation. However, it lacks proficiency in referring video object segmentation (RVOS) due to the need for precise user-interactive prompts and a limited understanding of different modalities, such as language and vision. This paper presents the RefSAM model, which explores the potential of… ▽ More

    Submitted 1 October, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: The code and models will be made publicly at https://github.com/LancasterLi/RefSAM

  36. arXiv:2306.10171  [pdf, other

    cs.LG cs.AI stat.ML

    Bootstrapped Representations in Reinforcement Learning

    Authors: Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G. Bellemare, Will Dabney

    Abstract: In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, auxiliary objectives are often incorporated i… ▽ More

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  37. arXiv:2306.04979  [pdf, other

    cs.LG cs.AI

    CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification

    Authors: Nan Yin, Li Shen, Mengzhu Wang, Long Lan, Zeyu Ma, Chong Chen, Xian-Sheng Hua, Xiao Luo

    Abstract: Although graph neural networks (GNNs) have achieved impressive achievements in graph classification, they often need abundant task-specific labels, which could be extensively costly to acquire. A credible solution is to explore additional labeled graphs to enhance unsupervised learning on the target domain. However, how to apply GNNs to domain adaptation remains unsolved owing to the insufficient… ▽ More

    Submitted 29 July, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

  38. arXiv:2306.02585  [pdf, other

    cs.CV

    MotionTrack: Learning Motion Predictor for Multiple Object Tracking

    Authors: Changcheng Xiao, Qiong Cao, Yujie Zhong, Long Lan, Xiang Zhang, Zhigang Luo, Dacheng Tao

    Abstract: Significant progress has been achieved in multi-object tracking (MOT) through the evolution of detection and re-identification (ReID) techniques. Despite these advancements, accurately tracking objects in scenarios with homogeneous appearance and heterogeneous motion remains a challenge. This challenge arises from two main factors: the insufficient discriminability of ReID features and the predomi… ▽ More

    Submitted 11 March, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

  39. arXiv:2305.06710  [pdf, other

    cs.CV cs.AI

    Null-text Guidance in Diffusion Models is Secretly a Cartoon-style Creator

    Authors: Jing Zhao, Heliang Zheng, Chaoyue Wang, Long Lan, Wanrong Huang, Wenjing Yang

    Abstract: Classifier-free guidance is an effective sampling technique in diffusion models that has been widely adopted. The main idea is to extrapolate the model in the direction of text guidance and away from null-text guidance. In this paper, we demonstrate that null-text guidance in diffusion models is secretly a cartoon-style creator, i.e., the generated images can be efficiently transformed into cartoo… ▽ More

    Submitted 3 August, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: Accepted by ACM MM 2023

    Journal ref: ACM MM 2023

  40. arXiv:2304.13424  [pdf, other

    cs.LG cs.AI cs.RO

    Can Agents Run Relay Race with Strangers? Generalization of RL to Out-of-Distribution Trajectories

    Authors: Li-Cheng Lan, Huan Zhang, Cho-Jui Hsieh

    Abstract: In this paper, we define, evaluate, and improve the ``relay-generalization'' performance of reinforcement learning (RL) agents on the out-of-distribution ``controllable'' states. Ideally, an RL agent that generally masters a task should reach its goal starting from any controllable state of the environment instead of memorizing a small set of trajectories. For example, a self-driving system should… ▽ More

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: ICRL 2023

  41. arXiv:2304.12567  [pdf, other

    cs.LG cs.AI stat.ML

    Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks

    Authors: Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare

    Abstract: Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treate… ▽ More

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: ICLR 2023. Code and models are available at https://github.com/google-research/google-research/tree/master/pvn 22 pages, 8 figures

  42. arXiv:2303.13126  [pdf, other

    cs.CV cs.AI

    MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models

    Authors: Jing Zhao, Heliang Zheng, Chaoyue Wang, Long Lan, Wenjing Yang

    Abstract: The advent of open-source AI communities has produced a cornucopia of powerful text-guided diffusion models that are trained on various datasets. While few explorations have been conducted on ensembling such models to combine their strengths. In this work, we propose a simple yet effective method called Saliency-aware Noise Blending (SNB) that can empower the fused text-guided diffusion models to… ▽ More

    Submitted 14 July, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: Accepted by ICCV 2023

  43. arXiv:2302.14011  [pdf, other

    stat.ML cs.LG stat.ME

    Causal isotonic calibration for heterogeneous treatment effects

    Authors: Lars van der Laan, Ernesto Ulloa-Pérez, Marco Carone, Alex Luedtke

    Abstract: We propose causal isotonic calibration, a novel nonparametric method for calibrating predictors of heterogeneous treatment effects. Furthermore, we introduce cross-calibration, a data-efficient variant of calibration that eliminates the need for hold-out calibration sets. Cross-calibration leverages cross-fitted predictors and generates a single calibrated predictor using all available data. Under… ▽ More

    Submitted 5 June, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: Accepted to ICML2023

  44. arXiv:2302.09359  [pdf, other

    cs.LG

    Towards Radar Emitter Recognition in Changing Environments with Domain Generalization

    Authors: Honglin Wu, Xueqiong Li, Long Lan, Liyang Xu, Yuhua Tang

    Abstract: Analyzing radar signals from complex Electronic Warfare (EW) environment is a non-trivial task.However, in the real world, the changing EW environment results in inconsistent signal distribution, such as the pulse repetition interval (PRI) mismatch between different detected scenes.In this paper, we propose a novel domain generalization framework to improve the adaptability of signal recognition i… ▽ More

    Submitted 18 February, 2023; originally announced February 2023.

  45. arXiv:2301.11099  [pdf, other

    cs.LG

    Federated Learning over Coupled Graphs

    Authors: Runze Lei, Pinghui Wang, Junzhou Zhao, Lin Lan, Jing Tao, Chao Deng, Junlan Feng, Xidian Wang, Xiaohong Guan

    Abstract: Graphs are widely used to represent the relations among entities. When one owns the complete data, an entire graph can be easily built, therefore performing analysis on the graph is straightforward. However, in many scenarios, it is impractical to centralize the data due to data privacy concerns. An organization or party only keeps a part of the whole graph data, i.e., graph data is isolated from… ▽ More

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: Accepted by IEEE Transactions on Parallel and Distributed Systems

  46. arXiv:2212.04025  [pdf, other

    cs.LG cs.AI stat.ML

    A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces

    Authors: Charline Le Lan, Joshua Greaves, Jesse Farebrother, Mark Rowland, Fabian Pedregosa, Rishabh Agarwal, Marc G. Bellemare

    Abstract: Many machine learning problems encode their data as a matrix with a possibly very large number of rows and columns. In several applications like neuroscience, image compression or deep reinforcement learning, the principal subspace of such a matrix provides a useful, low-dimensional representation of individual data. Here, we are interested in determining the $d$-dimensional principal subspace of… ▽ More

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: 8 pages in main content, 2 pages of bibliography and 5 pages in Appendix

  47. arXiv:2212.03319  [pdf, other

    cs.LG cs.AI

    Understanding Self-Predictive Learning for Reinforcement Learning

    Authors: Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko

    Abstract: We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite its recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet it is obviously undesirabl… ▽ More

    Submitted 6 December, 2022; originally announced December 2022.

  48. arXiv:2211.13508  [pdf, other

    cs.CV cs.AI cs.LG cs.RO

    1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results

    Authors: Benjamin Kiefer, Matej Kristan, Janez Perš, Lojze Žust, Fabio Poiesi, Fabio Augusto de Alcantara Andrade, Alexandre Bernardino, Matthew Dawkins, Jenni Raitoharju, Yitong Quan, Adem Atmaca, Timon Höfer, Qiming Zhang, Yufei Xu, Jing Zhang, Dacheng Tao, Lars Sommer, Raphael Spraul, Hangyue Zhao, Hongpu Zhang, Yanyun Zhao, Jan Lukas Augustin, Eui-ik Jeon, Impyeong Lee, Luca Zedda , et al. (48 additional authors not shown)

    Abstract: The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detec… ▽ More

    Submitted 28 November, 2022; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: MaCVi 2023 was part of WACV 2023. This report (38 pages) discusses the competition as part of MaCVi

  49. arXiv:2211.03769  [pdf, other

    cs.AI cs.LG cs.RO

    Are AlphaZero-like Agents Robust to Adversarial Perturbations?

    Authors: Li-Cheng Lan, Huan Zhang, Ti-Rong Wu, Meng-Yu Tsai, I-Chen Wu, Cho-Jui Hsieh

    Abstract: The success of AlphaZero (AZ) has demonstrated that neural-network-based Go AIs can surpass human performance by a large margin. Given that the state space of Go is extremely large and a human player can play the game from any legal state, we ask whether adversarial states exist for Go AIs that may lead them to play surprisingly wrong actions. In this paper, we first extend the concept of adversar… ▽ More

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: Accepted by Neurips 2022

  50. arXiv:2206.08751  [pdf, other

    cs.CV eess.IV

    Perceptual Quality Assessment of Virtual Reality Videos in the Wild

    Authors: Wen Wen, Mu Li, Yiru Yao, Xiangjie Sui, Yabin Zhang, Long Lan, Yuming Fang, Kede Ma

    Abstract: Investigating how people perceive virtual reality (VR) videos in the wild (i.e., those captured by everyday users) is a crucial and challenging task in VR-related applications due to complex authentic distortions localized in space and time. Existing panoramic video databases only consider synthetic distortions, assume fixed viewing conditions, and are limited in size. To overcome these shortcomin… ▽ More

    Submitted 15 March, 2024; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology