Skip to main content

Showing 1–50 of 275 results for author: Hao, Y

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.15966  [pdf, other

    cs.CV cs.AI cs.CL

    More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding

    Authors: Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Jinfeng Xu, Yixue Hao, Long Hu, Min Chen

    Abstract: Enabling Large Language Models (LLMs) to comprehend the 3D physical world remains a significant challenge. Due to the lack of large-scale 3D-text pair datasets, the success of LLMs has yet to be replicated in 3D understanding. In this paper, we rethink this issue and propose a new task: 3D Data-Efficient Point-Language Understanding. The goal is to enable LLMs to achieve robust 3D object understan… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

  2. arXiv:2408.15461  [pdf, other

    cs.CV cs.MM

    Hand1000: Generating Realistic Hands from Text with Only 1,000 Images

    Authors: Haozhuo Zhang, Bin Zhu, Yu Cao, Yanbin Hao

    Abstract: Text-to-image generation models have achieved remarkable advancements in recent years, aiming to produce realistic images from textual descriptions. However, these models often struggle with generating anatomically accurate representations of human hands. The resulting images frequently exhibit issues such as incorrect numbers of fingers, unnatural twisting or interlacing of fingers, or blurred an… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Project page https://haozhuo-zhang.github.io/Hand1000-project-page/

  3. arXiv:2408.11261  [pdf, other

    cs.AI cs.CL

    Towards Analyzing and Mitigating Sycophancy in Large Vision-Language Models

    Authors: Yunpu Zhao, Rui Zhang, Junbin Xiao, Changxin Ke, Ruibo Hou, Yifan Hao, Qi Guo, Yunji Chen

    Abstract: Large Vision-Language Models (LVLMs) have shown significant capability in vision-language understanding. However, one critical issue that persists in these models is sycophancy, which means models are unduly influenced by leading or deceptive prompts, resulting in biased outputs and hallucinations. Despite the progress in LVLMs, evaluating and mitigating sycophancy is yet much under-explored. In t… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  4. arXiv:2408.06741  [pdf, other

    cs.CV

    Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective

    Authors: Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, Fuli Feng

    Abstract: With recent generative models facilitating photo-realistic image synthesis, the proliferation of synthetic images has also engendered certain negative impacts on social platforms, thereby raising an urgent imperative to develop effective detectors. Current synthetic image detection (SID) pipelines are primarily dedicated to crafting universal artifact features, accompanied by an oversight about SI… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  5. arXiv:2408.03520  [pdf, other

    cs.RO

    AirSLAM: An Efficient and Illumination-Robust Point-Line Visual SLAM System

    Authors: Kuan Xu, Yuefan Hao, Shenghai Yuan, Chen Wang, Lihua Xie

    Abstract: In this paper, we present an efficient visual SLAM system designed to tackle both short-term and long-term illumination challenges. Our system adopts a hybrid approach that combines deep learning techniques for feature detection and matching with traditional backend optimization methods. Specifically, we propose a unified convolutional neural network (CNN) that simultaneously extracts keypoints an… ▽ More

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: 19 pages, 14 figures

  6. arXiv:2408.01230  [pdf, other

    cs.RO cs.LG

    HeteroMorpheus: Universal Control Based on Morphological Heterogeneity Modeling

    Authors: YiFan Hao, Yang Yang, Junru Song, Wei Peng, Weien Zhou, Tingsong Jiang, Wen Yao

    Abstract: In the field of robotic control, designing individual controllers for each robot leads to high computational costs. Universal control policies, applicable across diverse robot morphologies, promise to mitigate this challenge. Predominantly, models based on Graph Neural Networks (GNN) and Transformers are employed, owing to their effectiveness in capturing relational dynamics across a robot's limbs… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

  7. arXiv:2407.21783  [pdf, other

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang , et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical… ▽ More

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  8. arXiv:2407.20908  [pdf, other

    cs.CV

    Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering

    Authors: Yanpeng Zhao, Yiwei Hao, Siyu Gao, Yunbo Wang, Xiaokang Yang

    Abstract: Learning object-centric representations from unsupervised videos is challenging. Unlike most previous approaches that focus on decomposing 2D images, we present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning within a differentiable volume rendering framework. The key idea is to perform object-centric voxelization to capture the 3D nature of the scene,… ▽ More

    Submitted 30 July, 2024; originally announced July 2024.

  9. arXiv:2407.16977  [pdf, other

    cs.CV cs.MM

    Selective Vision-Language Subspace Projection for Few-shot CLIP

    Authors: Xingyu Zhu, Beier Zhu, Yi Tan, Shuo Wang, Yanbin Hao, Hanwang Zhang

    Abstract: Vision-language models such as CLIP are capable of mapping the different modality data into a unified feature space, enabling zero/few-shot inference by measuring the similarity of given images and texts. However, most existing methods overlook modality gaps in CLIP's encoded features, which is shown as the text and image features lie far apart from each other, resulting in limited classification… ▽ More

    Submitted 26 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted as an Oral Paper at ACM Multimedia 2024

  10. arXiv:2407.14117  [pdf, other

    cs.CV

    Rethinking Visual Content Refinement in Low-Shot CLIP Adaptation

    Authors: Jinda Lu, Shuo Wang, Yanbin Hao, Haifeng Liu, Xiang Wang, Meng Wang

    Abstract: Recent adaptations can boost the low-shot capability of Contrastive Vision-Language Pre-training (CLIP) by effectively facilitating knowledge transfer. However, these adaptation methods are usually operated on the global view of an input image, and thus biased perception of partial local details of the image. To solve this problem, we propose a Visual Content Refinement (VCR) before the adaptation… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  11. arXiv:2407.11424  [pdf, other

    cs.CV

    Model Inversion Attacks Through Target-Specific Conditional Diffusion Models

    Authors: Ouxiang Li, Yanbin Hao, Zhicai Wang, Bin Zhu, Shuo Wang, Zaixi Zhang, Fuli Feng

    Abstract: Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications. Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space. To alleviate these issues, leveraging on diffusion models' remarkable synthesis capabilities, w… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Preprint. Under review

  12. arXiv:2407.08903  [pdf, other

    cs.CR cs.AI cs.AR

    TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing

    Authors: Husheng Han, Xinyao Zheng, Yuanbo Wen, Yifan Hao, Erhu Feng, Ling Liang, Jianan Mu, Xiaqing Li, Tianyun Ma, Pengwei Jin, Xinkai Song, Zidong Du, Qi Guo, Xing Hu

    Abstract: Heterogeneous collaborative computing with NPU and CPU has received widespread attention due to its substantial performance benefits. To ensure data confidentiality and integrity during computing, Trusted Execution Environments (TEE) is considered a promising solution because of its comparatively lower overhead. However, existing heterogeneous TEE designs are inefficient for collaborative computin… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ASPLOS 2024

  13. arXiv:2407.07586  [pdf, other

    cs.CV cs.LG

    Simplifying Source-Free Domain Adaptation for Object Detection: Effective Self-Training Strategies and Performance Insights

    Authors: Yan Hao, Florent Forest, Olga Fink

    Abstract: This paper focuses on source-free domain adaptation for object detection in computer vision. This task is challenging and of great practical interest, due to the cost of obtaining annotated data sets for every new domain. Recent research has proposed various solutions for Source-Free Object Detection (SFOD), most being variations of teacher-student architectures with diverse feature alignment, reg… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024. 19 pages

    ACM Class: I.4.8; I.5.4

  14. arXiv:2407.04292  [pdf, other

    cs.AR cs.RO

    Corki: Enabling Real-time Embodied AI Robots via Algorithm-Architecture Co-Design

    Authors: Yiyang Huang, Yuhui Hao, Bo Yu, Feng Yan, Yuxin Yang, Feng Min, Yinhe Han, Lin Ma, Shaoshan Liu, Qiang Liu, Yiming Gan

    Abstract: Embodied AI robots have the potential to fundamentally improve the way human beings live and manufacture. Continued progress in the burgeoning field of using large language models to control robots depends critically on an efficient computing substrate. In particular, today's computing systems for embodied AI robots are designed purely based on the interest of algorithm developers, where robot act… ▽ More

    Submitted 1 August, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  15. arXiv:2407.04272  [pdf, other

    cs.LG cs.DC

    Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression

    Authors: Hao Feng, Boyuan Zhang, Fanjiang Ye, Min Si, Ching-Hsiang Chu, Jiannan Tian, Chunxing Yin, Summer Deng, Yuchen Hao, Pavan Balaji, Tong Geng, Dingwen Tao

    Abstract: DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. The large size of DLRM models, however, necessitates the use of multiple devices/GPUs for efficient training. A significant bottleneck in this process is the time-consuming all-to-all communication required to collect embedding data from all devices. To mitigate this, we… ▽ More

    Submitted 25 August, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: camera-ready version for SC '24

  16. PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition

    Authors: Yanbin Hao, Diansong Zhou, Zhicai Wang, Chong-Wah Ngo, Meng Wang

    Abstract: In recent years, vision Transformers and MLPs have demonstrated remarkable performance in image understanding tasks. However, their inherently dense computational operators, such as self-attention and token-mixing layers, pose significant challenges when applied to spatio-temporal video data. To address this gap, we propose PosMLP-Video, a lightweight yet powerful MLP-like backbone for video recog… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Journal ref: International Journal of Computer Vision, 27 June 2024

  17. arXiv:2407.01945  [pdf, other

    cs.CV

    Indoor 3D Reconstruction with an Unknown Camera-Projector Pair

    Authors: Zhaoshuai Qi, Yifeng Hao, Rui Hu, Wenyou Chang, Jiaqi Yang, Yanning Zhang

    Abstract: Structured light-based method with a camera-projector pair (CPP) plays a vital role in indoor 3D reconstruction, especially for scenes with weak textures. Previous methods usually assume known intrinsics, which are pre-calibrated from known objects, or self-calibrated from multi-view observations. It is still challenging to reliably recover CPP intrinsics from only two views without any known obje… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  18. arXiv:2406.19435  [pdf, other

    cs.CV

    A Sanity Check for AI-generated Image Detection

    Authors: Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, Weidi Xie

    Abstract: With the rapid development of generative models, discerning AI-generated content has evoked increasing attention from both industry and academia. In this paper, we conduct a sanity check on "whether the task of AI-generated image detection has been solved". To start with, we present Chameleon dataset, consisting AIgenerated images that are genuinely challenging for human perception. To quantify th… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Project page: https://shilinyan99.github.io/AIDE Code: https://github.com/shilinyan99/AIDE

  19. arXiv:2406.18984  [pdf, other

    cs.IR

    Amplify Graph Learning for Recommendation via Sparsity Completion

    Authors: Peng Yuan, Haojie Li, Minying Fang, Xu Yu, Yongjing Hao, Junwei Du

    Abstract: Graph learning models have been widely deployed in collaborative filtering (CF) based recommendation systems. Due to the issue of data sparsity, the graph structure of the original input lacks potential positive preference edges, which significantly reduces the performance of recommendations. In this paper, we study how to enhance the graph structure for CF more effectively, thereby optimizing the… ▽ More

    Submitted 1 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  20. arXiv:2406.18067  [pdf, other

    cs.CL eess.AS

    Exploring Energy-Based Models for Out-of-Distribution Detection in Dialect Identification

    Authors: Yaqian Hao, Chenguang Hu, Yingying Gao, Shilei Zhang, Junlan Feng

    Abstract: The diverse nature of dialects presents challenges for models trained on specific linguistic patterns, rendering them susceptible to errors when confronted with unseen or out-of-distribution (OOD) data. This study introduces a novel margin-enhanced joint energy model (MEJEM) tailored specifically for OOD detection in dialects. By integrating a generative model and the energy margin loss, our appro… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  21. arXiv:2406.18065  [pdf, other

    eess.AS cs.SD

    On Calibration of Speech Classification Models: Insights from Energy-Based Model Investigations

    Authors: Yaqian Hao, Chenguang Hu, Yingying Gao, Shilei Zhang, Junlan Feng

    Abstract: For speech classification tasks, deep learning models often achieve high accuracy but exhibit shortcomings in calibration, manifesting as classifiers exhibiting overconfidence. The significance of calibration lies in its critical role in guaranteeing the reliability of decision-making within deep learning systems. This study explores the effectiveness of Energy-Based Models in calibrating confiden… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  22. arXiv:2406.16968  [pdf, other

    cs.LG cs.AI

    Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition

    Authors: Kai Shao, Rui Wang, Yixue Hao, Long Hu, Min Chen, Hans Arno Jacobsen

    Abstract: Depression recognition based on physiological signals such as functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) has made considerable progress. However, most existing studies ignore the complementarity and semantic consistency of multimodal physiological signals under the same stimulation task in complex spatio-temporal patterns. In this paper, we introduce a multimodal… ▽ More

    Submitted 25 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

  23. arXiv:2406.16369  [pdf, ps, other

    cs.CR

    Machine Learning with Real-time and Small Footprint Anomaly Detection System for In-Vehicle Gateway

    Authors: Yi Wang, Yuanjin Zheng, Yajun Ha

    Abstract: Anomaly Detection System (ADS) is an essential part of a modern gateway Electronic Control Unit (ECU) to detect abnormal behaviors and attacks in vehicles. Among the existing attacks, ``one-time`` attack is the most challenging to be detected, together with the strict gateway ECU constraints of both microsecond or even nanosecond level real-time budget and limited footprint of code. To address the… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  24. arXiv:2406.16330  [pdf, other

    cs.CL cs.AI

    Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging

    Authors: Deyuan Liu, Zhanyue Qin, Hairu Wang, Zhao Yang, Zecheng Wang, Fangying Rong, Qingbin Liu, Yanchao Hao, Xi Chen, Cunhang Fan, Zhao Lv, Zhiying Tu, Dianhui Chu, Bo Li, Dianbo Sui

    Abstract: While large language models (LLMs) excel in many domains, their complexity and scale challenge deployment in resource-limited environments. Current compression techniques, such as parameter pruning, often fail to effectively utilize the knowledge from pruned parameters. To address these challenges, we propose Manifold-Based Knowledge Alignment and Layer Merging Compression (MKA), a novel approach… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  25. arXiv:2406.15811  [pdf, other

    cs.CV

    PointDreamer: Zero-shot 3D Textured Mesh Reconstruction from Colored Point Cloud by 2D Inpainting

    Authors: Qiao Yu, Xianzhi Li, Yuan Tang, Jinfeng Xu, Long Hu, Yixue Hao, Min Chen

    Abstract: Reconstructing textured meshes from colored point clouds is an important but challenging task in 3D graphics and vision. Most existing methods predict colors as implicit functions in 3D or UV space, suffering from blurry textures or the lack of generalization capability. Addressing this, we propose PointDreamer, a novel framework for textured mesh reconstruction from colored point cloud. It produc… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  26. arXiv:2406.13268  [pdf, other

    eess.AS cs.SD

    CEC: A Noisy Label Detection Method for Speaker Recognition

    Authors: Yao Shen, Yingying Gao, Yaqian Hao, Chenguang Hu, Fulin Zhang, Junlan Feng, Shilei Zhang

    Abstract: Noisy labels are inevitable, even in well-annotated datasets. The detection of noisy labels is of significant importance to enhance the robustness of speaker recognition models. In this paper, we propose a novel noisy label detection approach based on two new statistical metrics: Continuous Inconsistent Counting (CIC) and Total Inconsistent Counting (TIC). These metrics are calculated through Cros… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: interspeech 2024

  27. arXiv:2406.12382  [pdf, other

    cs.CL

    From Instance Training to Instruction Learning: Task Adapters Generation from Instructions

    Authors: Huanxuan Liao, Yao Xu, Shizhu He, Yuanzhe Zhang, Yanchao Hao, Shengping Liu, Kang Liu, Jun Zhao

    Abstract: Large language models (LLMs) have acquired the ability to solve general tasks by utilizing instruction finetuning (IFT). However, IFT still relies heavily on instance training of extensive task data, which greatly limits the adaptability of LLMs to real-world scenarios where labeled task instances are scarce and broader task generalization becomes paramount. Contrary to LLMs, humans acquire skills… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  28. arXiv:2406.07342  [pdf, other

    cs.NI cs.DC

    EdgeTimer: Adaptive Multi-Timescale Scheduling in Mobile Edge Computing with Deep Reinforcement Learning

    Authors: Yijun Hao, Shusen Yang, Fang Li, Yifan Zhang, Shibo Wang, Xuebin Ren

    Abstract: In mobile edge computing (MEC), resource scheduling is crucial to task requests' performance and service providers' cost, involving multi-layer heterogeneous scheduling decisions. Existing schedulers typically adopt static timescales to regularly update scheduling decisions of each layer, without adaptive adjustment of timescales for different layers, resulting in potentially poor performance in p… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  29. arXiv:2405.15414  [pdf, other

    cs.AI

    Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification

    Authors: Yuxuan Guo, Shaohui Peng, Jiaming Guo, Di Huang, Xishan Zhang, Rui Zhang, Yifan Hao, Ling Li, Zikang Tian, Mingju Gao, Yutai Li, Yiming Gan, Shuai Liang, Zihao Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen

    Abstract: Building open agents has always been the ultimate goal in AI research, and creative agents are the more enticing. Existing LLM agents excel at long-horizon tasks with well-defined goals (e.g., `mine diamonds' in Minecraft). However, they encounter difficulties on creative tasks with open goals and abstract criteria due to the inability to bridge the gap between them, thus lacking feedback for self… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  30. arXiv:2405.12079  [pdf, other

    cs.DC cs.OS

    PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation

    Authors: Zhuobin Huang, Xingda Wei, Yingyi Hao, Rong Chen, Mingcong Han, Jinyu Gu, Haibo Chen

    Abstract: Checkpointing (C) and restoring (R) are key components for GPU tasks. POS is an OS-level GPU C/R system: It can transparently checkpoint or restore processes that use the GPU, without requiring any cooperation from the application, a key feature required by modern systems like the cloud. Moreover, POS is the first OS-level C/R system that can concurrently execute C/R with the application execution… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  31. arXiv:2405.03202  [pdf, other

    cs.CV

    Hierarchical Space-Time Attention for Micro-Expression Recognition

    Authors: Haihong Hao, Shuo Wang, Huixia Ben, Yanbin Hao, Yansong Wang, Weiwei Wang

    Abstract: Micro-expression recognition (MER) aims to recognize the short and subtle facial movements from the Micro-expression (ME) video clips, which reveal real emotions. Recent MER methods mostly only utilize special frames from ME video clips or extract optical flow from these special frames. However, they neglect the relationship between movements and space-time, while facial cues are hidden within the… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 9 pages, 4 figures

  32. arXiv:2405.01413  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

    Authors: Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Yixue Hao, Long Hu, Min Chen

    Abstract: Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging Large Language Models (LLMs) with images using a simple projector. Inspired by their success, large 3D point cloud-language models (3D-LLMs) also integrate point clouds into LLMs. However, directly aligning point clouds with LLM requires expensive training costs, typically in hundreds of GPU-hours on A100, whic… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 17 pages, 9 figures

  33. arXiv:2405.00760  [pdf, other

    cs.CV cs.AI

    Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models

    Authors: Xiaoshi Wu, Yiming Hao, Manyuan Zhang, Keqiang Sun, Zhaoyang Huang, Guanglu Song, Yu Liu, Hongsheng Li

    Abstract: Optimizing a text-to-image diffusion model with a given reward function is an important but underexplored research area. In this study, we propose Deep Reward Tuning (DRTune), an algorithm that directly supervises the final output image of a text-to-image diffusion model and back-propagates through the iterative sampling process to the input noise. We find that training earlier steps in the sampli… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: N/A

  34. arXiv:2404.16038  [pdf, other

    cs.CV cs.AI cs.MM

    A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming

    Authors: Pengyuan Zhou, Lin Wang, Zhi Liu, Yanbin Hao, Pan Hui, Sasu Tarkoma, Jussi Kangasharju

    Abstract: This paper offers an insightful examination of how currently top-trending AI technologies, i.e., generative artificial intelligence (Generative AI) and large language models (LLMs), are reshaping the field of video technology, including video generation, understanding, and streaming. It highlights the innovative use of these technologies in producing highly realistic videos, a significant leap in… ▽ More

    Submitted 30 January, 2024; originally announced April 2024.

    Comments: 16 pages, 10 figures, 4 tables

  35. arXiv:2404.16015  [pdf, other

    physics.comp-ph cs.AI math.NA

    Neural Operators Learn the Local Physics of Magnetohydrodynamics

    Authors: Taeyoung Kim, Youngsoo Ha, Myungjoo Kang

    Abstract: Magnetohydrodynamics (MHD) plays a pivotal role in describing the dynamics of plasma and conductive fluids, essential for understanding phenomena such as the structure and evolution of stars and galaxies, and in nuclear fusion for plasma motion through ideal MHD equations. Solving these hyperbolic PDEs requires sophisticated numerical methods, presenting computational challenges due to complex str… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 47 pages, 24 figures

  36. arXiv:2404.13512  [pdf, other

    math.OC cs.ET

    Planning of Truck Platooning for Road-Network Capacitated Vehicle Routing Problem

    Authors: Yilang Hao, Zhibin Chen, Xiaotong Sun, Lu Tong

    Abstract: Truck platooning, a linking technology of trucks on the highway, has gained enormous attention in recent years due to its benefits in energy and operation cost savings. However, most existing studies on truck platooning limit their focus on scenarios in which each truck can serve only one customer demand and is thus with a specified origin-destination pair, so only routing and time schedules are c… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 36 pages, 12 figures

  37. arXiv:2404.11891  [pdf, other

    cs.AI cs.CL cs.HC

    Large Language Models Can Plan Your Travels Rigorously with Formal Verification Tools

    Authors: Yilun Hao, Yongchao Chen, Yang Zhang, Chuchu Fan

    Abstract: The recent advancements of Large Language Models (LLMs), with their abundant world knowledge and capabilities of tool-using and reasoning, fostered many LLM planning algorithms. However, LLMs have not shown to be able to accurately solve complex combinatorial optimization problems. In Xie et al. (2024), the authors proposed TravelPlanner, a U.S. domestic travel planning benchmark, and showed that… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 31 pages, 3 figures, 4 tables, submitted to ACL RR

  38. arXiv:2404.02406  [pdf, other

    cs.CR cs.AI cs.CL

    Exploring Backdoor Vulnerabilities of Chat Models

    Authors: Yunzhuo Hao, Wenkai Yang, Yankai Lin

    Abstract: Recent researches have shown that Large Language Models (LLMs) are susceptible to a security threat known as Backdoor Attack. The backdoored model will behave well in normal cases but exhibit malicious behaviours on inputs inserted with a specific backdoor trigger. Current backdoor studies on LLMs predominantly focus on instruction-tuned LLMs, while neglecting another realistic scenario where LLMs… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Code and data are available at https://github.com/hychaochao/Chat-Models-Backdoor-Attacking

  39. arXiv:2404.00979  [pdf, other

    cs.CV

    PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation

    Authors: Jinfeng Xu, Siyuan Yang, Xianzhi Li, Yuan Tang, Yixue Hao, Long Hu, Min Chen

    Abstract: Existing point cloud semantic segmentation networks cannot identify unknown classes and update their knowledge, due to a closed-set and static perspective of the real world, which would induce the intelligent agent to make bad decisions. To address this problem, we propose a Probability-Driven Framework (PDF) for open world semantic segmentation that includes (i) a lightweight U-decoder branch to… ▽ More

    Submitted 23 July, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  40. arXiv:2403.19600  [pdf, other

    cs.CV

    Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

    Authors: Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian

    Abstract: Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications. However, the effective integration of T2I models into fundamental image classification tasks remains an open question. A prevalent strategy to bolster image classification performance is through augmenting the training set w… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  41. arXiv:2403.17592  [pdf, other

    cs.LG stat.ML

    On the Benefits of Over-parameterization for Out-of-Distribution Generalization

    Authors: Yifan Hao, Yong Lin, Difan Zou, Tong Zhang

    Abstract: In recent years, machine learning models have achieved success based on the independently and identically distributed assumption. However, this assumption can be easily violated in real-world applications, leading to the Out-of-Distribution (OOD) problem. Understanding how modern over-parameterized DNNs behave under non-trivial natural distributional shifts is essential, as current theoretical und… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

  42. arXiv:2403.17025  [pdf, other

    cs.CV

    Boosting Few-Shot Learning via Attentive Feature Regularization

    Authors: Xingyu Zhu, Shuo Wang, Jinda Lu, Yanbin Hao, Haifeng Liu, Xiangnan He

    Abstract: Few-shot learning (FSL) based on manifold regularization aims to improve the recognition capacity of novel objects with limited training samples by mixing two samples from different categories with a blending factor. However, this mixing operation weakens the feature representation due to the linear interpolation and the overlooking of the importance of specific channels. To solve these issues, th… ▽ More

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: Accepted to AAAI 2024

  43. arXiv:2403.11114  [pdf, other

    cs.LG cs.AI

    Phasic Diversity Optimization for Population-Based Reinforcement Learning

    Authors: Jingcheng Jiang, Haiyin Piao, Yu Fu, Yihang Hao, Chuanlu Jiang, Ziqi Wei, Xin Yang

    Abstract: Reviewing the previous work of diversity Rein-forcement Learning,diversity is often obtained via an augmented loss function,which requires a balance between reward and diversity.Generally,diversity optimization algorithms use Multi-armed Bandits algorithms to select the coefficient in the pre-defined space. However, the dynamic distribution of reward signals for MABs or the conflict between qualit… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: 7 pages, 4 figures

    MSC Class: 14J60 (Primary) ACM Class: I.2.9

  44. arXiv:2403.10720  [pdf, other

    cs.AI

    Development and Application of a Monte Carlo Tree Search Algorithm for Simulating Da Vinci Code Game Strategies

    Authors: Ye Zhang, Mengran Zhu, Kailin Gui, Jiayue Yu, Yong Hao, Haozhan Sun

    Abstract: In this study, we explore the efficiency of the Monte Carlo Tree Search (MCTS), a prominent decision-making algorithm renowned for its effectiveness in complex decision environments, contingent upon the volume of simulations conducted. Notwithstanding its broad applicability, the algorithm's performance can be adversely impacted in certain scenarios, particularly within the domain of game strategy… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted by CVIDL2024

  45. arXiv:2403.08778  [pdf

    cs.CV cs.GR eess.IV

    Faster Projected GAN: Towards Faster Few-Shot Image Generation

    Authors: Chuang Wang, Zhengping Li, Yuwen Hao, Lijun Wang, Xiaoxue Li

    Abstract: In order to solve the problems of long training time, large consumption of computing resources and huge parameter amount of GAN network in image generation, this paper proposes an improved GAN network model, which is named Faster Projected GAN, based on Projected GAN. The proposed network is mainly focuses on the improvement of generator of Projected GAN. By introducing depth separable convolution… ▽ More

    Submitted 23 January, 2024; originally announced March 2024.

    Comments: 9 pages,7 figures,4 tables

  46. arXiv:2403.02545  [pdf, other

    cs.LG cs.AI

    Wukong: Towards a Scaling Law for Large-Scale Recommendation

    Authors: Buyun Zhang, Liang Luo, Yuxin Chen, Jade Nie, Xi Liu, Daifeng Guo, Yanli Zhao, Shen Li, Yuchen Hao, Yantao Yao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Maxim Naumov, Wenlin Chen

    Abstract: Scaling laws play an instrumental role in the sustainable improvement in model quality. Unfortunately, recommendation models to date do not exhibit such laws similar to those observed in the domain of large language models, due to the inefficiencies of their upscaling mechanisms. This limitation poses significant challenges in adapting these models to increasingly more complex real-world datasets.… ▽ More

    Submitted 4 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 12 pages

  47. arXiv:2403.00877  [pdf, other

    cs.LG cs.DC cs.IR

    Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation

    Authors: Liang Luo, Buyun Zhang, Michael Tsang, Yinbin Ma, Ching-Hsiang Chu, Yuxin Chen, Shen Li, Yuchen Hao, Yanli Zhao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Dheevatsa Mudigere, Maxim Naumov

    Abstract: We study a mismatch between the deep learning recommendation models' flat architecture, common distributed training paradigm and hierarchical data center topology. To address the associated inefficiencies, we propose Disaggregated Multi-Tower (DMT), a modeling technique that consists of (1) Semantic-preserving Tower Transform (SPTT), a novel training paradigm that decomposes the monolithic global… ▽ More

    Submitted 2 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  48. arXiv:2402.17759  [pdf, other

    cs.CL

    Towards Optimal Learning of Language Models

    Authors: Yuxian Gu, Li Dong, Yaru Hao, Qingxiu Dong, Minlie Huang, Furu Wei

    Abstract: This work studies the general principles of improving the learning of language models (LMs), which aims at reducing the necessary training steps for achieving superior performance. Specifically, we present a theory for the optimal learning of LMs. We first propose an objective that optimizes LM learning by maximizing the data compression ratio in an "LM-training-as-lossless-compression" view. Then… ▽ More

    Submitted 3 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  49. arXiv:2402.12015  [pdf, other

    eess.SY cs.LG

    An Index Policy Based on Sarsa and Q-learning for Heterogeneous Smart Target Tracking

    Authors: Yuhang Hao, Zengfu Wang, Jing Fu, Quan Pan

    Abstract: In solving the non-myopic radar scheduling for multiple smart target tracking within an active and passive radar network, we need to consider both short-term enhanced tracking performance and a higher probability of target maneuvering in the future with active tracking. Acquiring the long-term tracking performance while scheduling the beam resources of active and passive radars poses a challenge.… ▽ More

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 11 pages

  50. arXiv:2402.08702  [pdf, other

    cs.CL cs.AI cs.HC cs.RO

    PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Heuristic-based Sampling

    Authors: Yongchao Chen, Jacob Arkin, Yilun Hao, Yang Zhang, Nicholas Roy, Chuchu Fan

    Abstract: Prompt optimization aims to find the best prompt to a large language model (LLM) for a given task. LLMs have been successfully used to help find and improve prompt candidates for single-step tasks. However, realistic tasks for agents are multi-step and introduce new challenges: (1) Prompt content is likely to be more extensive and complex, making it more difficult for LLMs to analyze errors, (2) t… ▽ More

    Submitted 16 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: 62 pages, 14 figures