Showing 1–50 of 219 results for author: Pang, Y

  1. arXiv:2408.15914  [pdf, other]

    cs.CV

    CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization

    Authors: Feize Wu, Yun Pang, Junyi Zhang, Lianyu Pang, Jian Yin, Baoquan Zhao, Qing Li, Xudong Mao

    Abstract: Recent advances in text-to-image personalization have enabled high-quality and controllable image synthesis for user-provided concepts. However, existing methods still struggle to balance identity preservation with text alignment. Our approach is based on the fact that generating prompt-aligned images requires a precise semantic understanding of the prompt, which involves accurately processing the…

    Submitted 28 August, 2024; originally announced August 2024.

  2. arXiv:2408.14762  [pdf, other]

    cs.LG cs.SI

    Explainable Hierarchical Urban Representation Learning for Commuting Flow Prediction

    Authors: Mingfei Cai, Yanbo Pang, Yoshihide Sekimoto

    Abstract: Commuting flow prediction is an essential task for municipal operations in the real world. Previous studies have revealed that it is feasible to estimate the commuting origin-destination (OD) demand within a city using multiple auxiliary data. However, most existing methods are not suitable to deal with a similar task at a large scale, namely within a prefecture or the whole nation, owing to the i…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 11 pages, 6 figures

  3. arXiv:2408.11845  [pdf, other]

    cs.CL

    LLaMA based Punctuation Restoration With Forward Pass Only Decoding

    Authors: Yutong Pang, Debjyoti Paul, Kevin Jiang, Xuedong Zhang, Xin Lei

    Abstract: This paper introduces two advancements in the field of Large Language Model Annotation with a focus on punctuation restoration tasks. Our first contribution is the application of LLaMA for punctuation restoration, which demonstrates superior performance compared to the established benchmark. Despite its impressive quality, LLaMA faces challenges regarding inference speed and hallucinations. To a…

    Submitted 9 August, 2024; originally announced August 2024.

  4. arXiv:2408.08056  [pdf, other]

    cs.LG

    DATTA: Towards Diversity Adaptive Test-Time Adaptation in Dynamic Wild World

    Authors: Chuyang Ye, Dongyan Wei, Zhendong Liu, Yuanyi Pang, Yixi Lin, Jiarong Liao, Qinting Jiang, Xianghua Fu, Qing Li, Jingyan Jiang

    Abstract: Test-time adaptation (TTA) effectively addresses distribution shifts between training and testing data by adjusting models on test samples, which is crucial for improving model inference in real-world applications. However, traditional TTA methods typically follow a fixed pattern to address the dynamic data patterns (low-diversity or high-diversity patterns), often leading to performance degradatio…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 16 pages, 2 figures

  5. arXiv:2408.06787  [pdf, other]

    cs.CL

    Unlock the Power of Frozen LLMs in Knowledge Graph Completion

    Authors: Bo Xue, Yi Xu, Yunchong Song, Yiming Pang, Yuyang Ren, Jiaxin Ding, Luoyi Fu, Xinbing Wang

    Abstract: Classical knowledge graph completion (KGC) methods rely solely on structural information, struggling with the inherent sparsity of knowledge graphs (KGs). Large Language Models (LLMs) learn extensive knowledge from large corpora with powerful context modeling, which is ideal for mitigating the limitations of previous methods. Directly fine-tuning LLMs offers great capability but comes at the cost…

    Submitted 13 August, 2024; originally announced August 2024.

  6. arXiv:2408.02666  [pdf, other]

    cs.CL cs.AI

    Self-Taught Evaluators

    Authors: Tianlu Wang, Ilia Kulikov, Olga Golovneva, Ping Yu, Weizhe Yuan, Jane Dwivedi-Yu, Richard Yuanzhe Pang, Maryam Fazel-Zarandi, Jason Weston, Xian Li

    Abstract: Model-based evaluation is at the heart of successful model development -- as a reward model for training, and as a replacement for human evaluation. To train such evaluators, the standard approach is to collect a large amount of human preference judgments over model responses, which is costly and the data becomes stale as models improve. In this work, we present an approach that aims to improve e…

    Submitted 8 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  7. arXiv:2407.19548  [pdf, other]

    cs.CV

    Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

    Authors: Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Wangbo Yu, Chaoran Feng, Yatian Pang, Bin Lin, Li Yuan

    Abstract: Recent 3D large reconstruction models typically employ a two-stage process: first generating multi-view images with a multi-view diffusion model, and then using a feed-forward model to reconstruct the images into 3D content. However, multi-view diffusion models often produce low-quality and inconsistent images, adversely affecting the quality of the final 3D reconstruction. To address this issue,…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Project page: https://pku-yuangroup.github.io/Cycle3D/

  8. arXiv:2407.17120  [pdf, other]

    cs.LG cs.AI

    Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective

    Authors: Jingren Liu, Zhong Ji, YunLong Yu, Jiale Cao, Yanwei Pang, Jungong Han, Xuelong Li

    Abstract: Parameter-efficient fine-tuning for continual learning (PEFT-CL) has shown promise in adapting pre-trained models to sequential tasks while mitigating the catastrophic forgetting problem. However, understanding the mechanisms that dictate continual performance in this paradigm remains elusive. To tackle this complexity, we undertake a rigorous analysis of PEFT-CL dynamics to derive relevant metrics fo…

    Submitted 24 July, 2024; originally announced July 2024.

  9. arXiv:2407.12581  [pdf, other]

    cs.CR cs.AI cs.CV cs.CY

    Towards Understanding Unsafe Video Generation

    Authors: Yan Pang, Aiping Xiong, Yang Zhang, Tianhao Wang

    Abstract: Video generation models (VGMs) have demonstrated the capability to synthesize high-quality output. It is important to understand their potential to produce unsafe content, such as violent or terrifying videos. In this work, we provide a comprehensive understanding of unsafe video generation. First, to confirm the possibility that these models could indeed generate unsafe videos, we choose unsafe…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 18 pages

  10. arXiv:2407.11503  [pdf, other]

    cs.CV

    Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation

    Authors: Shijie Chang, Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu

    Abstract: Existing few-shot segmentation (FSS) methods mainly focus on prototype feature generation and the query-support matching mechanism. As a crucial prompt for generating prototype features, the pair of image-mask types in the support set has become the default setting. However, various types such as image, text, box, and mask all can provide valuable information regarding the objects in context, clas…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Preprint under review

  11. arXiv:2407.10987  [pdf, ps, other]

    cs.NI cs.AI eess.SP

    Adaptive Digital Twin and Communication-Efficient Federated Learning Network Slicing for 5G-enabled Internet of Things

    Authors: Daniel Ayepah-Mensah, Guolin Sun, Yu Pang, Wei Jiang

    Abstract: Network slicing enables industrial Internet of Things (IIoT) networks with multiservice and differentiated resource requirements to meet increasing demands through efficient use and management of network resources. Typically, the network slice orchestrator relies on demand forecasts for each slice to make informed decisions and maximize resource utilization. The new generation of Industry 4.0 has…

    Submitted 22 June, 2024; originally announced July 2024.

    Comments: 8 pages, 7 figures, conference

  12. arXiv:2406.13853  [pdf, other]

    cs.HC

    AltGeoViz: Facilitating Accessible Geovisualization

    Authors: Chu Li, Rock Yuren Pang, Ather Sharif, Arnavi Chheda-Kothary, Jeffrey Heer, Jon E. Froehlich

    Abstract: Geovisualizations are powerful tools for exploratory spatial analysis, enabling sighted users to discern patterns, trends, and relationships within geographic data. However, these visual tools have remained largely inaccessible to screen-reader users. We present AltGeoViz, a new system we designed to facilitate geovisualization exploration for these users. AltGeoViz dynamically generates alt-text…

    Submitted 21 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  13. arXiv:2406.01987  [pdf, other]

    cs.CV

    Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization

    Authors: Yunpeng Zhao, Cheng Chen, Qing You Pang, Quanzheng Li, Carol Tang, Beng-Ti Ang, Yueming Jin

    Abstract: Addressing missing modalities presents a critical challenge in multimodal learning. Current approaches focus on developing models that can handle modality-incomplete inputs during inference, assuming that the full set of modalities are available for all the data during training. This reliance on full-modality data for training limits the use of abundant modality-incomplete samples that are often e…

    Submitted 4 June, 2024; originally announced June 2024.

  14. arXiv:2405.17441  [pdf, other]

    cs.NI cs.AI cs.CL eess.SY

    When Large Language Models Meet Optical Networks: Paving the Way for Automation

    Authors: Danshi Wang, Yidi Wang, Xiaotian Jiang, Yao Zhang, Yue Pang, Min Zhang

    Abstract: Since the advent of GPT, large language models (LLMs) have brought about revolutionary advancements in all walks of life. As a superior natural language processing (NLP) technology, LLMs have consistently achieved state-of-the-art performance in numerous areas. However, LLMs are considered to be general-purpose models for NLP tasks, which may encounter challenges when applied to complex tasks in s…

    Submitted 24 June, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  15. arXiv:2405.17405  [pdf, other]

    cs.CV

    Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer

    Authors: Ruizhi Shao, Youxin Pang, Zerong Zheng, Jingxiang Sun, Yebin Liu

    Abstract: We present a novel approach for generating high-quality, spatio-temporally coherent human videos from a single image under arbitrary viewpoints. Our framework combines the strengths of U-Nets for accurate condition injection and diffusion transformers for capturing global correlations across viewpoints and time. The core is a cascaded 4D transformer architecture that factorizes attention across vi…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Our project website is https://human4dit.github.io

  16. arXiv:2405.17247  [pdf, other]

    cs.LG

    An Introduction to Vision-Language Modeling

    Authors: Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, et al. (16 additional authors not shown)

    Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technol…

    Submitted 27 May, 2024; originally announced May 2024.

  17. arXiv:2405.15182  [pdf, other]

    cs.CR cs.AI

    RFLPA: A Robust Federated Learning Framework against Poisoning Attacks with Secure Aggregation

    Authors: Peihua Mai, Ran Yan, Yan Pang

    Abstract: Federated learning (FL) allows multiple devices to train a model collaboratively without sharing their data. Despite its benefits, FL is vulnerable to privacy leakage and poisoning attacks. To address the privacy concern, secure aggregation (SecAgg) is often used to obtain the aggregation of gradients on the server without inspecting individual user updates. Unfortunately, existing defense strategies a…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 22 pages

    ACM Class: E.4

  18. Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization

    Authors: Bao Hoang, Yijiang Pang, Siqi Liang, Liang Zhan, Paul Thompson, Jiayu Zhou

    Abstract: Independent and identically distributed (i.i.d.) data is essential to many data analysis and modeling techniques. In the medical domain, collecting data from multiple sites or institutions is a common strategy that guarantees sufficient clinical diversity, determined by the decentralized nature of medical data. However, data from various sites are easily biased by the local environment or faciliti…

    Submitted 7 August, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 11 pages, 7 figures, accepted to KDD2024-ADS

  19. arXiv:2405.10523  [pdf, other]

    cs.CL

    Smart Expert System: Large Language Models as Text Classifiers

    Authors: Zhiqiang Wang, Yiran Pang, Yanbin Lin

    Abstract: Text classification is a fundamental task in Natural Language Processing (NLP), and the advent of Large Language Models (LLMs) has revolutionized the field. This paper introduces the Smart Expert System, a novel approach that leverages LLMs as text classifiers. The system simplifies the traditional text classification workflow, eliminating the need for extensive preprocessing and domain expertise.…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 11 pages, 3 figures, and 8 tables

  20. arXiv:2405.06783  [pdf, other]

    cs.HC cs.AI cs.CY

    BLIP: Facilitating the Exploration of Undesirable Consequences of Digital Technologies

    Authors: Rock Yuren Pang, Sebastin Santy, René Just, Katharina Reinecke

    Abstract: Digital technologies have positively transformed society, but they have also led to undesirable consequences not anticipated at the time of design or development. We posit that insights into past undesirable consequences can help researchers and practitioners gain awareness and anticipate potential adverse effects. To test this assumption, we introduce BLIP, a system that extracts real-world undes…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: To appear in the Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11–16, 2024, Honolulu, HI, USA

  21. arXiv:2405.05493  [pdf, ps, other]

    cs.CL cs.AI

    Parameter-Efficient Fine-Tuning With Adapters

    Authors: Keyu Chen, Yuan Pang, Zi Yang

    Abstract: In the arena of language model fine-tuning, traditional approaches such as Domain-Adaptive Pretraining (DAPT) and Task-Adaptive Pretraining (TAPT), although effective, are computationally intensive. This research introduces a novel adaptation method that uses the UniPELT framework as a base and adds a PromptTuning Layer, which significantly reduces the number of trainable parameters while main…

    Submitted 8 May, 2024; originally announced May 2024.

  22. arXiv:2405.04376  [pdf, other]

    cs.LG

    Towards Stability of Parameter-free Optimization

    Authors: Yijiang Pang, Shuyang Yu, Bao Hoang, Jiayu Zhou

    Abstract: Hyperparameter tuning, particularly the selection of an appropriate learning rate in adaptive gradient training methods, remains a challenge. To tackle this challenge, in this paper, we propose a novel parameter-free optimizer, AdamG (Adam with the golden step size), designed to automatically adapt to diverse optimization problems without manual tuning. The core technique underlying \text…

    Submitted 27 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  23. arXiv:2405.01353  [pdf, other]

    cs.CV

    Sparse multi-view hand-object reconstruction for unseen environments

    Authors: Yik Lung Pang, Changjae Oh, Andrea Cavallaro

    Abstract: Recent works in hand-object reconstruction mainly focus on the single-view and dense multi-view settings. On the one hand, single-view methods can leverage learned shape priors to generalise to unseen objects but are prone to inaccuracies due to occlusions. On the other hand, dense multi-view methods are very accurate but cannot easily adapt to unseen objects without further data collection. In co…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Camera-ready version. Paper accepted to CVPRW 2024. 8 pages, 7 figures, 1 table

  24. arXiv:2405.01002  [pdf, other]

    cs.CV cs.LG

    Spider: A Unified Framework for Context-dependent Concept Segmentation

    Authors: Xiaoqi Zhao, Youwei Pang, Wei Ji, Baicheng Sheng, Jiaming Zuo, Lihe Zhang, Huchuan Lu

    Abstract: Different from context-independent (CI) concepts such as human, car, and airplane, context-dependent (CD) concepts, such as camouflaged objects and medical lesions, require higher visual understanding ability. Despite the rapid advance of many CD understanding tasks in respective branches, the isolated evolution leads to their limited cross-domain generalisation and repetitive technique innovatio…

    Submitted 28 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  25. arXiv:2404.19733  [pdf, other]

    cs.CL cs.AI

    Iterative Reasoning Preference Optimization

    Authors: Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoni…

    Submitted 25 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  26. arXiv:2404.15802  [pdf, other]

    cs.CV cs.AI

    Raformer: Redundancy-Aware Transformer for Video Wire Inpainting

    Authors: Zhong Ji, Yimu Su, Yan Zhang, Jiacheng Hou, Yanwei Pang, Jungong Han

    Abstract: Video Wire Inpainting (VWI) is a prominent application in video inpainting, aimed at flawlessly removing wires in films or TV series, offering significant time and labor savings compared to manual frame-by-frame removal. However, wire removal poses greater challenges due to the wires being longer and slimmer than objects typically targeted in general video inpainting tasks, and often intersecting…

    Submitted 24 April, 2024; originally announced April 2024.

  27. arXiv:2404.09431  [pdf, other]

    cs.CV

    VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection

    Authors: Bonan Ding, Jin Xie, Jing Nie, Jiale Cao, Xuelong Li, Yanwei Pang

    Abstract: Due to its cost-effectiveness and widespread availability, monocular 3D object detection, which relies solely on a single camera during inference, holds significant importance across various applications, including autonomous driving and robotics. Nevertheless, directly predicting the coordinates of objects in 3D space from monocular images poses challenges. Therefore, an effective solution involv…

    Submitted 26 August, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: 11 pages, 4 figures

  28. arXiv:2404.07600  [pdf, other]

    cs.CV

    Implicit and Explicit Language Guidance for Diffusion-based Visual Perception

    Authors: Hefeng Wang, Jiale Cao, Jin Xie, Aiping Yang, Yanwei Pang

    Abstract: Text-to-image diffusion models have shown powerful ability on conditional image synthesis. With large-scale vision-language pre-training, diffusion models are able to generate high-quality images with rich texture and reasonable structure under different text prompts. However, it is an open problem to adapt the pre-trained diffusion model for visual perception. In this paper, we propose an implici…

    Submitted 15 August, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE TMM

  29. arXiv:2404.07445  [pdf, other]

    cs.CV

    Multi-view Aggregation Network for Dichotomous Image Segmentation

    Authors: Qian Yu, Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu

    Abstract: Dichotomous Image Segmentation (DIS) has recently emerged towards high-precision object segmentation from high-resolution natural images. When designing an effective DIS model, the main challenge is how to balance the semantic dispersion of high-resolution targets in the small receptive field and the loss of high-precision details in the large receptive field. Existing methods rely on tedious mu…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 as Highlight

  30. arXiv:2404.06787  [pdf, other]

    cs.LG cs.AI

    Private Wasserstein Distance with Random Noises

    Authors: Wenqian Li, Haozhi Wang, Zhe Huang, Yan Pang

    Abstract: Wasserstein distance is a principal measure of data divergence from a distributional standpoint. However, its application becomes challenging in the context of data privacy, where sharing raw data is restricted. Prior attempts have employed techniques like Differential Privacy or Federated optimization to approximate Wasserstein distance. Nevertheless, these approaches often lack accuracy and robu…

    Submitted 10 April, 2024; originally announced April 2024.

  31. arXiv:2403.15733  [pdf, other]

    cs.SI cs.CY

    Spatio-Temporal Graph Convolutional Network Combined Large Language Model: A Deep Learning Framework for Bike Demand Forecasting

    Authors: Peisen Li, Yizhe Pang, Junyu Ren

    Abstract: This study presents a new deep learning framework, combining Spatio-Temporal Graph Convolutional Network (STGCN) with a Large Language Model (LLM), for bike demand forecasting. Addressing challenges in transforming discrete datasets and integrating unstructured language data, the framework leverages LLMs to extract insights from Points of Interest (POI) text data. The proposed STGCN-L model demons…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: ISNN 2024

  32. arXiv:2403.12486  [pdf, other]

    cs.LG cs.AI

    NTK-Guided Few-Shot Class Incremental Learning

    Authors: Jingren Liu, Zhong Ji, Yanwei Pang, YunLong Yu

    Abstract: While anti-amnesia FSCIL learners often excel in incremental sessions, they tend to prioritize mitigating knowledge attrition over harnessing the model's potential for knowledge acquisition. In this paper, we delve into the foundations of model generalization in FSCIL through the lens of the Neural Tangent Kernel (NTK). Our primary design focus revolves around ensuring optimal NTK convergence and…

    Submitted 19 March, 2024; originally announced March 2024.

  33. arXiv:2403.12455  [pdf, other]

    cs.CV

    CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation

    Authors: Wenqi Zhu, Jiale Cao, Jin Xie, Shuangming Yang, Yanwei Pang

    Abstract: Open-vocabulary video instance segmentation strives to segment and track instances belonging to an open set of categories in a video. The vision-language model Contrastive Language-Image Pre-training (CLIP) has shown robust zero-shot classification ability in image-level open-vocabulary tasks. In this paper, we propose a simple encoder-decoder network, called CLIP-VIS, to adapt CLIP for open-vocabu…

    Submitted 7 June, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  34. arXiv:2403.10127  [pdf, other]

    cs.CV

    TransLandSeg: A Transfer Learning Approach for Landslide Semantic Segmentation Based on Vision Foundation Model

    Authors: Changhong Hou, Junchuan Yu, Daqing Ge, Liu Yang, Laidian Xi, Yunxuan Pang, Yi Wen

    Abstract: Landslides are one of the most destructive natural disasters in the world, posing a serious threat to human life and safety. The development of foundation models has provided a new research paradigm for large-scale landslide detection. The Segment Anything Model (SAM) has garnered widespread attention in the field of image segmentation. However, our experiment found that SAM performed poorly in th…

    Submitted 15 March, 2024; originally announced March 2024.

  35. arXiv:2403.08902  [pdf, other]

    cs.CV

    Envision3D: One Image to 3D with Anchor Views Interpolation

    Authors: Yatian Pang, Tanghui Jia, Yujun Shi, Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Xing Zhou, Francis E. H. Tay, Li Yuan

    Abstract: We present Envision3D, a novel method for efficiently generating high-quality 3D content from a single image. Recent methods that extract 3D content from multi-view images generated by diffusion models show great potential. However, it is still challenging for diffusion models to generate dense multi-view consistent images, which is crucial for the quality of 3D content extraction. To address this…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: GitHub repository: https://github.com/PKU-YuanGroup/Envision3D

  36. arXiv:2403.07888  [pdf, other]

    cs.CV cs.AI

    Cross-modality debiasing: using language to mitigate sub-population shifts in imaging

    Authors: Yijiang Pang, Bao Hoang, Jiayu Zhou

    Abstract: Sub-population shift is a specific type of domain shift that highlights changes in data distribution within specific sub-groups or populations between training and testing. Sub-population shift accounts for a significant source of algorithmic bias and calls for distributional robustness. Recent studies found inherent distributional robustness in multi-modality foundation models, such as the vision…

    Submitted 2 April, 2024; v1 submitted 2 February, 2024; originally announced March 2024.

  37. arXiv:2402.15810  [pdf, other]

    cs.DL cs.CL cs.LG

    OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining

    Authors: Fanjin Zhang, Shijie Shi, Yifan Zhu, Bo Chen, Yukuo Cen, Jifan Yu, Yelin Chen, Lulu Wang, Qingfei Zhao, Yuqing Cheng, Tianyi Han, Yuwei An, Dan Zhang, Weng Lam Tam, Kun Cao, Yunhe Pang, Xinyu Guan, Huihui Yuan, Jian Song, Xiaoyan Li, Yuxiao Dong, Jie Tang

    Abstract: With the rapid proliferation of scientific literature, versatile academic knowledge services increasingly rely on comprehensive academic graph mining. Despite the availability of public academic graphs, benchmarks, and datasets, these resources often fall short in multi-aspect and fine-grained annotations, are constrained to specific task types and domains, or lack underlying real academic graphs.…

    Submitted 20 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: KDD'24, 9 pages, 5 appendix pages

    Journal ref: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24), August 25–29, 2024, Barcelona, Spain

  38. arXiv:2402.13126  [pdf, other]

    cs.CR cs.AI cs.CV cs.LG eess.IV

    VGMShield: Mitigating Misuse of Video Generative Models

    Authors: Yan Pang, Yang Zhang, Tianhao Wang

    Abstract: With the rapid advancement in video generation, people can conveniently utilize video generation models to create videos tailored to their specific desires. Nevertheless, there are also growing concerns about their potential misuse in creating and disseminating false information. In this work, we introduce VGMShield: a set of three straightforward but pioneering mitigations through the lifecycle…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 17 pages, 10 figures

  39. arXiv:2402.02797  [pdf, other]

    cs.CV cs.LG

    Joint Attention-Guided Feature Fusion Network for Saliency Detection of Surface Defects

    Authors: Xiaoheng Jiang, Feng Yan, Yang Lu, Ke Wang, Shuai Guo, Tianzhu Zhang, Yanwei Pang, Jianwei Niu, Mingliang Xu

    Abstract: Surface defect inspection plays an important role in the process of industrial manufacture and production. Though Convolutional Neural Network (CNN) based defect inspection methods have made huge leaps, they still confront a lot of challenges such as defect scale variation, complex background, low contrast, and so on. To address these issues, we propose a joint attention-guided feature fusion netw…

    Submitted 5 February, 2024; originally announced February 2024.

  40. arXiv:2402.01621  [pdf, other]

    cs.LG

    Stochastic Two Points Method for Deep Model Zeroth-order Optimization

    Authors: Yijiang Pang, Jiayu Zhou

    Abstract: Large foundation models, such as large language models, have performed exceptionally well in various application scenarios. Building or fully fine-tuning such large models is usually prohibitive due to either hardware budget or lack of access to backpropagation. The zeroth-order methods offer a promising direction for tackling this challenge, where only forward passes are needed to update the mode…

    Submitted 27 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  41. arXiv:2401.15947  [pdf, other]

    cs.CV

    MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

    Authors: Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, Peng Jin, Jinfa Huang, Junwu Zhang, Yatian Pang, Munan Ning, Li Yuan

    Abstract: Recent advances demonstrate that scaling Large Vision-Language Models (LVLMs) effectively improves downstream task performances. However, existing scaling methods enable all model parameters to be active for each token in the calculation, which brings massive training and inferring costs. In this work, we propose a simple yet effective training strategy MoE-Tuning for LVLMs. This strategy innovati…

    Submitted 6 July, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: K = P + N represents the length of the output sequence in the formula (8)

  42. arXiv:2401.10020  [pdf, other]

    cs.CL cs.AI

    Self-Rewarding Language Models

    Authors: Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, Jason Weston

    Abstract: We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal. Current approaches commonly train reward models from human preferences, which may then be bottlenecked by human performance level, and secondly these separate frozen reward models cannot then learn to improve during LLM training. In this work, we study Self-Rewardi…

    Submitted 8 February, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  43. arXiv:2401.09176  [pdf]

    cs.LG

    ADCNet: a unified framework for predicting the activity of antibody-drug conjugates

    Authors: Liye Chen, Biaoshun Li, Yihao Chen, Mujie Lin, Shipeng Zhang, Chenxin Li, Yu Pang, Ling Wang

    Abstract: Antibody-drug conjugate (ADC) has revolutionized the field of cancer treatment in the era of precision medicine due to their ability to precisely target cancer cells and release highly effective drug. Nevertheless, the realization of rational design of ADC is very difficult because the relationship between their structures and activities is difficult to understand. In the present study, we introdu…

    Submitted 17 January, 2024; originally announced January 2024.

  44. arXiv:2401.00870  [pdf, other]

    cs.CR cs.AI

    ConfusionPrompt: Practical Private Inference for Online Large Language Models

    Authors: Peihua Mai, Ran Yan, Rui Ye, Youjia Yang, Yinchuan Li, Yan Pang

    Abstract: State-of-the-art large language models (LLMs) are commonly deployed as online services, necessitating users to transmit informative prompts to cloud servers, thus engendering substantial privacy concerns. In response, we present ConfusionPrompt, a novel private LLM inference framework designed to obfuscate the server by: (i) decomposing the prompt into sub-prompts, and (ii) generating pseudo promp…

    Submitted 24 May, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

    Comments: 21 pages

    MSC Class: I.2.7

  45. arXiv:2312.13271  [pdf, other]

    cs.CV

    Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting

    Authors: Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan

    Abstract: Recent one image to 3D generation methods commonly adopt Score Distillation Sampling (SDS). Despite the impressive results, there are multiple deficiencies including multi-view inconsistency, over-saturated and over-smoothed textures, as well as the slow generation speed. To address these deficiencies, we present Repaint123 to alleviate multi-view bias as well as texture degradation and speed up t…

    Submitted 27 December, 2023; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Project page: https://pku-yuangroup.github.io/repaint123/

  46. arXiv:2312.09988  [pdf, other]

    eess.IV cs.CV

    Towards Architecture-Agnostic Untrained Network Priors for Image Reconstruction with Frequency Regularization

    Authors: Yilin Liu, Yunkui Pang, Jiang Li, Yong Chen, Pew-Thian Yap

    Abstract: Untrained networks inspired by deep image priors have shown promising capabilities in recovering high-quality images from noisy or partial measurements without requiring training sets. Their success is widely attributed to implicit regularization due to the spectral bias of suitable network architectures. However, the application of such network-based priors often entails superfluous architectural…

    Submitted 18 July, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted at ECCV 2024

  47. arXiv:2312.08207  [pdf, other]

    cs.CR

    Black-box Membership Inference Attacks against Fine-tuned Diffusion Models

    Authors: Yan Pang, Tianhao Wang

    Abstract: With the rapid advancement of diffusion-based image-generative models, the quality of generated images has become increasingly photorealistic. Moreover, with the release of high-quality pre-trained image-generative models, a growing number of users are downloading these pre-trained models to fine-tune them with downstream datasets for various image-generation tasks. However, employing such powerfu…

    Submitted 23 April, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

  48. arXiv:2312.05104

    cs.RO

    An Autonomous Driving Model Integrated with BEV-V2X Perception, Fusion Prediction of Motion and Occupancy, and Driving Planning, in Complex Traffic Intersections

    Authors: Fukang Li, Wenlin Ou, Kunpeng Gao, Yuwen Pang, Yifei Li, Henry Fan

    Abstract: The comprehensiveness of vehicle-to-everything (V2X) recognition enriches and holistically shapes the global Birds-Eye-View (BEV) perception, incorporating rich semantics and integrating driving scene information, thereby serving features of vehicle state prediction, decision-making and driving planning. Utilizing V2X message sets to form BEV map proves to be an effective perception method for con…

    Submitted 22 April, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: The content of the paper has not received unanimous consent from all the members and requires further evaluation prior to submission

  49. arXiv:2312.02528  [pdf, other]

    cs.CV

    Towards Automatic Power Battery Detection: New Challenge, Benchmark Dataset and Baseline

    Authors: Xiaoqi Zhao, Youwei Pang, Zhenyu Chen, Qian Yu, Lihe Zhang, Hanqi Liu, Jiaming Zuo, Huchuan Lu

    Abstract: We conduct a comprehensive study on a new task named power battery detection (PBD), which aims to localize the dense cathode and anode plates endpoints from X-ray images to evaluate the quality of power batteries. Existing manufacturers usually rely on human eye observation to complete PBD, which makes it difficult to balance the accuracy and efficiency of detection. To address this issue and driv…

    Submitted 28 February, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted at CVPR2024

  50. arXiv:2312.01044  [pdf, other]

    cs.CL

    Large Language Models Are Zero-Shot Text Classifiers

    Authors: Zhiqiang Wang, Yiran Pang, Yanbin Lin

    Abstract: Retrained large language models (LLMs) have become extensively used across various sub-disciplines of natural language processing (NLP). In NLP, text classification problems have garnered considerable focus, but still faced with some limitations related to expensive computational cost, time consumption, and robust performance to unseen classes. With the proposal of chain of thought prompting (CoT)…

    Submitted 2 December, 2023; originally announced December 2023.

    Comments: 9 pages, 3 figures, 6 tables