Showing 1–50 of 155 results for author: Guan, J

  1. arXiv:2408.13461  [pdf, other]

    cs.CV cs.AI

    Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach

    Authors: Jiwei Guan, Tianyu Ding, Longbing Cao, Lei Pan, Chen Wang, Xi Zheng

    Abstract: Vision-language pretraining (VLP) with transformers has demonstrated exceptional performance across numerous multimodal tasks. However, the adversarial robustness of these models has not been thoroughly investigated. Existing multimodal attack methods have largely overlooked cross-modal interactions between visual and textual modalities, particularly in the context of cross-attention mechanisms. I…

    Submitted 24 August, 2024; originally announced August 2024.

  2. arXiv:2408.09675  [pdf, other]

    cs.AI cs.MA cs.RO

    Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey

    Authors: Ruiqi Zhang, Jing Hou, Florian Walter, Shangding Gu, Jiayi Guan, Florian Röhrbein, Yali Du, Panpan Cai, Guang Chen, Alois Knoll

    Abstract: Reinforcement Learning (RL) is a potent tool for sequential decision-making and has achieved performance surpassing human capabilities across many challenging real-world tasks. As the extension of RL in the multi-agent system domain, multi-agent RL (MARL) not only need to learn the control policy but also requires consideration regarding interactions with all other agents in the environment, mutua…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 23 pages, 6 figures and 2 tables. Submitted to IEEE Journal

  3. arXiv:2408.06402  [pdf, other]

    q-bio.QM cs.AI cs.LG

    PhaGO: Protein function annotation for bacteriophages by integrating the genomic context

    Authors: Jiaojiao Guan, Yongxin Ji, Cheng Peng, Wei Zou, Xubo Tang, Jiayu Shang, Yanni Sun

    Abstract: Bacteriophages are viruses that target bacteria, playing a crucial role in microbial ecology. Phage proteins are important in understanding phage biology, such as virus infection, replication, and evolution. Although a large number of new phages have been identified via metagenomic sequencing, many of them have limited protein function annotation. Accurate function annotation of phage proteins pre…

    Submitted 17 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: 17 pages, 6 figures

  4. arXiv:2408.05914  [pdf, other]

    cs.CV

    Deep Multimodal Collaborative Learning for Polyp Re-Identification

    Authors: Suncheng Xiang, Jincheng Li, Zhengjie Zhang, Shilun Cai, Jiale Guan, Dahong Qian

    Abstract: Colonoscopic Polyp Re-Identification aims to match the same polyp from a large gallery with images from different views taken using different cameras and plays an important role in the prevention and treatment of colorectal cancer in computer-aided diagnosis. However, traditional methods for object ReID directly adopting CNN models trained on the ImageNet dataset usually produce unsatisfactory ret…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Work in progress. arXiv admin note: text overlap with arXiv:2307.10625

  5. arXiv:2408.03284  [pdf, other]

    cs.CV cs.GR cs.MM

    ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer

    Authors: Jiazhi Guan, Zhiliang Xu, Hang Zhou, Kaisiyuan Wang, Shengyi He, Zhanwang Zhang, Borong Liang, Haocheng Feng, Errui Ding, Jingtuo Liu, Jingdong Wang, Youjian Zhao, Ziwei Liu

    Abstract: Lip-syncing videos with given audio is the foundation for various applications including the creation of virtual presenters or performers. While recent studies explore high-fidelity lip-sync with different techniques, their task-orientated models either require long-term videos for clip-specific training or retain visible artifacts. In this paper, we propose a unified and effective framework ReSyn…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted to European Conference on Computer Vision (ECCV), 2024. Project page: https://guanjz20.github.io/projects/ReSyncer

  6. arXiv:2407.06677  [pdf, other]

    cs.CL

    Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules

    Authors: Zhuocheng Gong, Ang Lv, Jian Guan, Junxi Yan, Wei Wu, Huishuai Zhang, Minlie Huang, Dongyan Zhao, Rui Yan

    Abstract: Is it always necessary to compute tokens from shallow to deep layers in Transformers? The continued success of vanilla Transformers and their variants suggests an undoubted "yes". In this work, however, we attempt to break the depth-ordered convention by proposing a novel architecture dubbed mixture-of-modules (MoM), which is motivated by an intuition that any layer, regardless of its position, ca…

    Submitted 9 July, 2024; originally announced July 2024.

  7. arXiv:2407.04936  [pdf, other]

    cs.SD eess.AS

    A Reference-free Metric for Language-Queried Audio Source Separation using Contrastive Language-Audio Pretraining

    Authors: Feiyang Xiao, Jian Guan, Qiaoxi Zhu, Xubo Liu, Wenbo Wang, Shuhan Qi, Kejia Zhang, Jianyuan Sun, Wenwu Wang

    Abstract: Language-queried audio source separation (LASS) aims to separate an audio source guided by a text query, with the signal-to-distortion ratio (SDR)-based metrics being commonly used to objectively measure the quality of the separated audio. However, the SDR-based metrics require a reference signal, which is often difficult to obtain in real-world scenarios. In addition, with the SDR-based metrics,…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Submitted to DCASE 2024 Workshop

  8. arXiv:2406.19934  [pdf, other]

    cs.CL cs.AI

    From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis

    Authors: Chuanqi Cheng, Jian Guan, Wei Wu, Rui Yan

    Abstract: We explore multi-step reasoning in vision-language models (VLMs). The problem is challenging, as reasoning data consisting of multiple steps of visual and language processing are barely available. To overcome the challenge, we first introduce a least-to-most visual reasoning paradigm, which interleaves steps of decomposing a question into sub-questions and invoking external tools for resolving sub…

    Submitted 28 June, 2024; originally announced June 2024.

  9. arXiv:2406.16708  [pdf, other]

    cs.LG stat.ME

    CausalFormer: An Interpretable Transformer for Temporal Causal Discovery

    Authors: Lingbai Kong, Wengen Li, Hanchen Yang, Yichao Zhang, Jihong Guan, Shuigeng Zhou

    Abstract: Temporal causal discovery is a crucial task aimed at uncovering the causal relations within time series data. The latest temporal causal discovery methods usually train deep learning models on prediction tasks to uncover the causality between time series. They capture causal relations by analyzing the parameters of some components of the trained models, e.g., attention weights and convolution weig…

    Submitted 24 June, 2024; originally announced June 2024.

  10. arXiv:2405.18729  [pdf, other]

    cs.LG cs.AI

    Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning

    Authors: Tianle Zhang, Jiayi Guan, Lin Zhao, Yihang Li, Dongjiang Li, Zecui Zeng, Lei Sun, Yue Chen, Xuelong Wei, Lusong Li, Xiaodong He

    Abstract: Offline reinforcement learning (RL) aims to learn optimal policies from previously collected datasets. Recently, due to their powerful representational capabilities, diffusion models have shown significant potential as policy models for offline RL issues. However, previous offline RL algorithms based on diffusion policies generally adopt weighted regression to improve the policy. This approach opt…

    Submitted 28 May, 2024; originally announced May 2024.

  11. arXiv:2405.15769  [pdf, other]

    cs.CV

    FastDrag: Manipulate Anything in One Step

    Authors: Xuanjia Zhao, Jian Guan, Congyi Fan, Dongli Xu, Youtian Lin, Haiwei Pan, Pengming Feng

    Abstract: Drag-based image editing using generative models provides precise control over image contents, enabling users to manipulate anything in an image with a few clicks. However, prevailing methods typically adopt $n$-step iterations for latent semantic optimization to achieve drag-based image editing, which is time-consuming and limits practical applications. In this paper, we introduce a novel one-ste…

    Submitted 6 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 13 pages, 13 figures, Project page: https://fastdrag-site.github.io/

  12. arXiv:2404.13528  [pdf, other]

    cs.LG cs.AI cs.DC

    SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile

    Authors: Wei Niu, Md Musfiqur Rahman Sanim, Zhihao Shu, Jiexiong Guan, Xipeng Shen, Miao Yin, Gagan Agrawal, Bin Ren

    Abstract: This work is motivated by recent developments in Deep Neural Networks, particularly the Transformer architectures underlying applications such as ChatGPT, and the need for performing inference on mobile devices. Focusing on emerging transformers (specifically the ones with computationally efficient Swin-like architectures) and large models (e.g., Stable Diffusion and LLMs) based on transformers, w…

    Submitted 21 April, 2024; originally announced April 2024.

  13. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  14. arXiv:2403.15759  [pdf]

    cs.CY

    Deep Learning Approach to Forecasting COVID-19 Cases in Residential Buildings of Hong Kong Public Housing Estates: The Role of Environment and Sociodemographics

    Authors: E. Leung, J. Guan, KO. Kwok, CT. Hung, CC. Ching, KC. Chong, CHK. Yam, T. Sun, WH. Tsang, EK. Yeoh, A. Lee

    Abstract: Introduction: The current study investigates the complex association between COVID-19 and the studied districts' socioecology (e.g. internal and external built environment, sociodemographic profiles, etc.) to quantify their contributions to the early outbreaks and epidemic resurgence of COVID-19. Methods: We aligned the analytic model's architecture with the hierarchical structure of the resident'…

    Submitted 23 March, 2024; originally announced March 2024.

  15. arXiv:2403.13842  [pdf]

    cs.LG physics.soc-ph

    Analyzing the Variations in Emergency Department Boarding and Testing the Transferability of Forecasting Models across COVID-19 Pandemic Waves in Hong Kong: Hybrid CNN-LSTM approach to quantifying building-level socioecological risk

    Authors: Eman Leung, Jingjing Guan, Kin On Kwok, CT Hung, CC. Ching, CK. Chung, Hector Tsang, EK Yeoh, Albert Lee

    Abstract: Emergency department's (ED) boarding (defined as ED waiting time greater than four hours) has been linked to poor patient outcomes and health system performance. Yet, effective forecasting models is rare before COVID-19, lacking during the peri-COVID era. Here, a hybrid convolutional neural network (CNN)-Long short-term memory (LSTM) model was applied to public-domain data sourced from Hong Kong's…

    Submitted 17 March, 2024; originally announced March 2024.

  16. arXiv:2403.07902  [pdf, other]

    q-bio.BM cs.LG

    DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design

    Authors: Jiaqi Guan, Xiangxin Zhou, Yuwei Yang, Yu Bao, Jian Peng, Jianzhu Ma, Qiang Liu, Liang Wang, Quanquan Gu

    Abstract: Designing 3D ligands within a target binding site is a fundamental task in drug discovery. Existing structured-based drug design methods treat all ligand atoms equally, which ignores different roles of atoms in the ligand for drug design and can be less efficient for exploring the large drug-like molecule space. In this paper, inspired by the convention in pharmaceutical practice, we decompose the…

    Submitted 26 February, 2024; originally announced March 2024.

    Comments: Accepted to ICML 2023

  17. arXiv:2403.07040  [pdf, other]

    cs.LG cs.AI

    All in One: Multi-Task Prompting for Graph Neural Networks (Extended Abstract)

    Authors: Xiangguo Sun, Hong Cheng, Jia Li, Bo Liu, Jihong Guan

    Abstract: This paper is an extended abstract of our original work published in KDD23, where we won the best research paper award (Xiangguo Sun, Hong Cheng, Jia Li, Bo Liu, and Jihong Guan. All in one: Multi-task prompting for graph neural networks. KDD 23) The paper introduces a novel approach to bridging the gap between pre-trained graph models and the diverse tasks they're applied to, inspired by the succ…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: submitted to IJCAI 2024 Sister Conferences Track. The original paper can be seen at arXiv:2307.01504

  18. arXiv:2403.03218  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

    Authors: Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, et al. (32 additional authors not shown)

    Abstract: The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing furthe…

    Submitted 15 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: See the project page at https://wmdp.ai

  19. arXiv:2402.19101  [pdf, other]

    cs.IR cs.LG

    Effective Two-Stage Knowledge Transfer for Multi-Entity Cross-Domain Recommendation

    Authors: Jianyu Guan, Zongming Yin, Tianyi Zhang, Leihui Chen, Yin Zhang, Fei Huang, Jufeng Chen, Shuguang Han

    Abstract: In recent years, the recommendation content on e-commerce platforms has become increasingly rich -- a single user feed may contain multiple entities, such as selling products, short videos, and content posts. To deal with the multi-entity recommendation problem, an intuitive solution is to adopt the shared-network-based architecture for joint training. The idea is to transfer the extracted knowled…

    Submitted 29 February, 2024; originally announced February 2024.

  20. arXiv:2402.03708  [pdf, other]

    cs.CV

    SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite Images

    Authors: Pengming Feng, Mingjie Xie, Hongning Liu, Xuanjia Zhao, Guangjun He, Xueliang Zhang, Jian Guan

    Abstract: Fine-grained ship instance segmentation in satellite images holds considerable significance for monitoring maritime activities at sea. However, existing datasets often suffer from the scarcity of fine-grained information or pixel-wise localization annotations, as well as the insufficient image diversity and variations, thus limiting the research of this task. To this end, we propose a benchmark da…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 14 pages, 9 figures

  21. arXiv:2402.01469  [pdf, other]

    cs.CL

    AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback

    Authors: Jian Guan, Wei Wu, Zujie Wen, Peng Xu, Hongning Wang, Minlie Huang

    Abstract: The notable success of large language models (LLMs) has sparked an upsurge in building language agents to complete various complex tasks. We present AMOR, an agent framework based on open-source LLMs, which reasons with external knowledge bases and adapts to specific domains through human supervision to the reasoning process. AMOR builds reasoning logic over a finite state machine (FSM) that solve…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: Work in progress

  22. arXiv:2401.11040  [pdf, other]

    cs.HC

    Design Frameworks for Spatial Zone Agents in XRI Metaverse Smart Environments

    Authors: Jie Guan, Jiamin Liu, Alexis Morris

    Abstract: The spatial XR-IoT (XRI) Zone Agents concept combines Extended Reality (XR), the Internet of Things (IoT), and spatial computing concepts to create hyper-connected spaces for metaverse applications; envisioning space as zones that are social, smart, scalable, expressive, and agent-based. These zone agents serve as applications and agents (partners, assistants, or guides) for users co-living and co…

    Submitted 19 January, 2024; originally announced January 2024.

    Journal ref: 6th IEEE International Conference on Artificial Intelligence & extended and Virtual Reality (IEEE AIxVR 2024)

  23. arXiv:2312.16855  [pdf, other]

    cs.LG q-bio.BM

    Molecular Property Prediction Based on Graph Structure Learning

    Authors: Bangyi Zhao, Weixia Xu, Jihong Guan, Shuigeng Zhou

    Abstract: Molecular property prediction (MPP) is a fundamental but challenging task in the computer-aided drug discovery process. More and more recent works employ different graph-based models for MPP, which have made considerable progress in improving prediction performance. However, current models often ignore relationships between molecules, which could be also helpful for MPP. For this sake, in this pap…

    Submitted 28 December, 2023; originally announced December 2023.

  24. arXiv:2312.16600  [pdf, other]

    q-bio.GN cs.AI cs.LG

    scRNA-seq Data Clustering by Cluster-aware Iterative Contrastive Learning

    Authors: Weikang Jiang, Jinxian Wang, Jihong Guan, Shuigeng Zhou

    Abstract: Single-cell RNA sequencing (scRNA-seq) enables researchers to analyze gene expression at single-cell level. One important task in scRNA-seq data analysis is unsupervised clustering, which helps identify distinct cell types, laying down the foundation for other downstream analysis tasks. In this paper, we propose a novel method called Cluster-aware Iterative Contrastive Learning (CICL in short) for…

    Submitted 27 December, 2023; originally announced December 2023.

  25. arXiv:2312.04519  [pdf, other]

    cs.CV

    Bootstrapping Autonomous Driving Radars with Self-Supervised Learning

    Authors: Yiduo Hao, Sohrab Madani, Junfeng Guan, Mohammed Alloulah, Saurabh Gupta, Haitham Hassanieh

    Abstract: The perception of autonomous vehicles using radars has attracted increased research interest due its ability to operate in fog and bad weather. However, training radar models is hindered by the cost and difficulty of annotating large-scale radar data. To overcome this bottleneck, we propose a self-supervised learning framework to leverage the large amount of unlabeled radar data to pre-train radar…

    Submitted 18 April, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: 12 pages, 5 figures, to be published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

  26. arXiv:2311.06073  [pdf, other]

    cs.DC

    Collaborative Inference in DNN-based Satellite Systems with Dynamic Task Streams

    Authors: Jinglong Guan, Qiyang Zhang, Ilir Murturi, Praveen Kumar Donta, Schahram Dustdar, Shangguang Wang

    Abstract: As a driving force in the advancement of intelligent in-orbit applications, DNN models have been gradually integrated into satellites, producing daily latency-constraint and computation-intensive tasks. However, the substantial computation capability of DNN models, coupled with the instability of the satellite-ground link, pose significant challenges, hindering timely completion of tasks. It becom…

    Submitted 10 November, 2023; originally announced November 2023.

  27. arXiv:2310.14564  [pdf, other]

    cs.CL

    Language Models Hallucinate, but May Excel at Fact Verification

    Authors: Jian Guan, Jesse Dodge, David Wadden, Minlie Huang, Hao Peng

    Abstract: Recent progress in natural language processing (NLP) owes much to remarkable advances in large language models (LLMs). Nevertheless, LLMs frequently "hallucinate," resulting in non-factual outputs. Our carefully-designed human evaluation substantiates the serious hallucination issue, revealing that even GPT-3.5 produces factual outputs less than 25% of the time. This underscores the importance of…

    Submitted 20 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted in NAACL 2024

  28. arXiv:2310.14173  [pdf, other]

    cs.SD eess.AS

    First-Shot Unsupervised Anomalous Sound Detection With Unknown Anomalies Estimated by Metadata-Assisted Audio Generation

    Authors: Hejing Zhang, Qiaoxi Zhu, Jian Guan, Haohe Liu, Feiyang Xiao, Jiantong Tian, Xinhao Mei, Xubo Liu, Wenwu Wang

    Abstract: First-shot (FS) unsupervised anomalous sound detection (ASD) is a brand-new task introduced in DCASE 2023 Challenge Task 2, where the anomalous sounds for the target machine types are unseen in training. Existing methods often rely on the availability of normal and abnormal sound data from the target machines. However, due to the lack of anomalous sound data for the target machine types, it become…

    Submitted 11 March, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted at ICASSP 2024

  29. arXiv:2310.08950  [pdf, ps, other]

    cs.SD eess.AS

    Transformer-based Autoencoder with ID Constraint for Unsupervised Anomalous Sound Detection

    Authors: Jian Guan, Youde Liu, Qiuqiang Kong, Feiyang Xiao, Qiaoxi Zhu, Jiantong Tian, Wenwu Wang

    Abstract: Unsupervised anomalous sound detection (ASD) aims to detect unknown anomalous sounds of devices when only normal sound data is available. The autoencoder (AE) and self-supervised learning based methods are two mainstream methods. However, the AE-based methods could be limited as the feature learned from normal sounds can also fit with anomalous sounds, reducing the ability of the model in detectin…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted by EURASIP Journal on Audio, Speech, and Music Processing

  30. arXiv:2310.05330  [pdf, other]

    cs.CV

    A Lightweight Video Anomaly Detection Model with Weak Supervision and Adaptive Instance Selection

    Authors: Yang Wang, Jiaogen Zhou, Jihong Guan

    Abstract: Video anomaly detection is to determine whether there are any abnormal events, behaviors or objects in a given video, which enables effective and intelligent public safety management. As video anomaly labeling is both time-consuming and expensive, most existing works employ unsupervised or weakly supervised learning methods. This paper focuses on weakly supervised video anomaly detection, in which…

    Submitted 5 July, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

  31. arXiv:2310.04463  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Diffusing on Two Levels and Optimizing for Multiple Properties: A Novel Approach to Generating Molecules with Desirable Properties

    Authors: Siyuan Guo, Jihong Guan, Shuigeng Zhou

    Abstract: In the past decade, Artificial Intelligence driven drug design and discovery has been a hot research topic, where an important branch is molecule generation by generative models, from GAN-based models and VAE-based models to the latest diffusion-based models. However, most existing models pursue only the basic properties like validity and uniqueness of the generated molecules, a few go further to…

    Submitted 5 October, 2023; originally announced October 2023.

  32. arXiv:2309.09705  [pdf, other]

    cs.SD eess.AS

    Synth-AC: Enhancing Audio Captioning with Synthetic Supervision

    Authors: Feiyang Xiao, Qiaoxi Zhu, Jian Guan, Xubo Liu, Haohe Liu, Kejia Zhang, Wenwu Wang

    Abstract: Data-driven approaches hold promise for audio captioning. However, the development of audio captioning methods can be biased due to the limited availability and quality of text-audio data. This paper proposes a SynthAC framework, which leverages recent advances in audio generative models and commonly available text corpus to create synthetic text-audio pairs, thereby enhancing text-audio represent…

    Submitted 18 September, 2023; originally announced September 2023.

  33. arXiv:2309.07498  [pdf, other]

    eess.AS cs.SD

    Hierarchical Metadata Information Constrained Self-Supervised Learning for Anomalous Sound Detection Under Domain Shift

    Authors: Haiyan Lan, Qiaoxi Zhu, Jian Guan, Yuming Wei, Wenwu Wang

    Abstract: Self-supervised learning methods have achieved promising performance for anomalous sound detection (ASD) under domain shift, where the type of domain shift is considered in feature learning by incorporating section IDs. However, the attributes accompanying audio files under each section, such as machine operating conditions and noise types, have not been considered, although they are also crucial…

    Submitted 18 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: To appear at ICASSP 2024

  34. arXiv:2309.04819  [pdf, other]

    quant-ph cs.CR cs.LG

    Detecting Violations of Differential Privacy for Quantum Algorithms

    Authors: Ji Guan, Wang Fang, Mingyu Huang, Mingsheng Ying

    Abstract: Quantum algorithms for solving a wide range of practical problems have been proposed in the last ten years, such as data search and analysis, product recommendation, and credit scoring. The concern about privacy and other ethical issues in quantum computing naturally rises up. In this paper, we define a formal framework for detecting violations of differential privacy for quantum algorithms. A det…

    Submitted 9 September, 2023; originally announced September 2023.

    Journal ref: In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS 2023)

  35. arXiv:2308.14063  [pdf, other]

    cs.SD eess.AS

    Anomalous Sound Detection Using Self-Attention-Based Frequency Pattern Analysis of Machine Sounds

    Authors: Hejing Zhang, Jian Guan, Qiaoxi Zhu, Feiyang Xiao, Youde Liu

    Abstract: Different machines can exhibit diverse frequency patterns in their emitted sound. This feature has been recently explored in anomaly sound detection and reached state-of-the-art performance. However, existing methods rely on the manual or empirical determination of the frequency filter by observing the effective frequency range in the training data, which may be impractical for general application…

    Submitted 6 September, 2023; v1 submitted 27 August, 2023; originally announced August 2023.

    Comments: Published in INTERSPEECH 2023

  36. arXiv:2308.09540  [pdf, other]

    cs.CV cs.AI

    Meta-ZSDETR: Zero-shot DETR with Meta-learning

    Authors: Lu Zhang, Chenbo Zhang, Jiajia Zhao, Jihong Guan, Shuigeng Zhou

    Abstract: Zero-shot object detection aims to localize and recognize objects of unseen classes. Most of existing works face two problems: the low recall of RPN in unseen classes and the confusion of unseen classes with background. In this paper, we present the first method that combines DETR and meta-learning to perform zero-shot object detection, named Meta-ZSDETR, where model training is formalized as an i…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted in ICCV 2023

  37. arXiv:2308.06107  [pdf, other]

    cs.CR

    Test-Time Backdoor Defense via Detecting and Repairing

    Authors: Jiyang Guan, Jian Liang, Ran He

    Abstract: Deep neural networks have played a crucial part in many critical domains, such as autonomous driving, face recognition, and medical diagnosis. However, deep neural networks are facing security threats from backdoor attacks and can be manipulated into attacker-decided behaviors by the backdoor attacker. To defend the backdoor, prior research has focused on using clean data to remove backdoor attack…

    Submitted 29 November, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

  38. arXiv:2308.02362  [pdf, other]

    cs.CR cs.AI cs.LG

    Flexible Differentially Private Vertical Federated Learning with Adaptive Feature Embeddings

    Authors: Yuxi Mi, Hongquan Liu, Yewei Xia, Yiheng Sun, Jihong Guan, Shuigeng Zhou

    Abstract: The emergence of vertical federated learning (VFL) has stimulated concerns about the imperfection in privacy protection, as shared feature embeddings may reveal sensitive information under privacy attacks. This paper studies the delicate equilibrium between data privacy and task utility goals of VFL under differential privacy (DP). To address the generality issue of prior arts, this paper advocate…

    Submitted 26 July, 2023; originally announced August 2023.

  39. arXiv:2308.01999  [pdf, other]

    quant-ph cs.PF cs.SE

    cuQuantum SDK: A High-Performance Library for Accelerating Quantum Science

    Authors: Harun Bayraktar, Ali Charara, David Clark, Saul Cohen, Timothy Costa, Yao-Lung L. Fang, Yang Gao, Jack Guan, John Gunnels, Azzam Haidar, Andreas Hehn, Markus Hohnerbach, Matthew Jones, Tom Lubowe, Dmitry Lyakh, Shinya Morino, Paul Springer, Sam Stanwyck, Igor Terentyev, Satya Varadhan, Jonathan Wong, Takuma Yamaguchi

    Abstract: We present the NVIDIA cuQuantum SDK, a state-of-the-art library of composable primitives for GPU-accelerated quantum circuit simulations. As the size of quantum devices continues to increase, making their classical simulation progressively more difficult, the availability of fast and scalable quantum circuit simulators becomes vital for quantum algorithm developers, as well as quantum hardware eng…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: paper accepted at QCE 2023, journal reference will be updated whenever available

    MSC Class: 68Q12; 68Q09; 81P68;

  40. arXiv:2307.16410  [pdf, other]

    cs.CV

    HiREN: Towards Higher Supervision Quality for Better Scene Text Image Super-Resolution

    Authors: Minyi Zhao, Yi Xu, Bingjia Li, Jie Wang, Jihong Guan, Shuigeng Zhou

    Abstract: Scene text image super-resolution (STISR) is an important pre-processing technique for text recognition from low-resolution scene images. Nowadays, various methods have been proposed to extract text-specific information from high-resolution (HR) images to supervise STISR model training. However, due to uncontrollable factors (e.g. shooting equipment, focus, and environment) in manually photographi…

    Submitted 31 July, 2023; originally announced July 2023.

  41. arXiv:2307.10803  [pdf, other]

    cs.LG physics.ao-ph

    Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities

    Authors: Hanchen Yang, Wengen Li, Shuyu Wang, Hui Li, Jihong Guan, Shuigeng Zhou, Jiannong Cao

    Abstract: With the rapid amassing of spatial-temporal (ST) ocean data, many spatial-temporal data mining (STDM) studies have been conducted to address various oceanic issues, including climate forecasting and disaster warning. Compared with typical ST data (e.g., traffic data), ST ocean data is more complicated but with unique characteristics, e.g., diverse regionality and high sparsity. These characteristi…

    Submitted 3 August, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

  42. arXiv:2307.01542  [pdf, other]

    cs.CL

    Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation

    Authors: Jian Guan, Minlie Huang

    Abstract: Despite the huge progress in myriad generation tasks, pretrained language models (LMs) such as GPT2 still tend to generate repetitive texts with maximization-based decoding algorithms for open-ended generation. We attribute their overestimation of token-level repetition probabilities to the learning bias: LMs capture simple repetitive patterns faster with the MLE loss. We propose self-contrastive…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: ACL 2023 Short Findings

  43. arXiv:2307.01504  [pdf, other]

    cs.SI cs.AI cs.LG

    All in One: Multi-task Prompting for Graph Neural Networks

    Authors: Xiangguo Sun, Hong Cheng, Jia Li, Bo Liu, Jihong Guan

    Abstract: Recently, "pre-training and fine-tuning" has been adopted as a standard workflow for many graph tasks since it can take general graph knowledge to relieve the lack of graph annotations from each application. However, graph tasks with node level, edge level, and graph level are far diversified, making the pre-training pretext often incompatible with these multiple tasks. This gap may even cause a…

    Submitted 17 December, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

    Comments: KDD 23 Best Research Paper Award, which is the first for Hong Kong and Mainland China. A Python Library is released as ProG: https://github.com/sheldonresearch/ProG Submitted to SIGKDD'23 in 03 Feb 2023; Receive Acceptance in 17 May 2023 (Rating 3 4 4 4); Submit to arXiv 1st time in 4 Jul 2023

  44. Design Frameworks for Hyper-Connected Social XRI Immersive Metaverse Environments

    Authors: Jie Guan, Alexis Morris

    Abstract: The metaverse refers to the merger of technologies for providing a digital twin of the real world and the underlying connectivity and interactions for the many kinds of agents within. As this set of technology paradigms - involving artificial intelligence, mixed reality, the internet-of-things and others - gains in scale, maturity, and utility there are rapidly emerging design challenges and new r…

    Submitted 27 January, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Journal ref: IEEE Network ( Volume: 37, Issue: 4, July/August 2023)

  45. arXiv:2306.05358  [pdf, other]

    cs.CR cs.AI cs.LG cs.SD eess.AS

    Trustworthy Sensor Fusion against Inaudible Command Attacks in Advanced Driver-Assistance System

    Authors: Jiwei Guan, Lei Pan, Chen Wang, Shui Yu, Longxiang Gao, Xi Zheng

    Abstract: There are increasing concerns about malicious attacks on autonomous vehicles. In particular, inaudible voice command attacks pose a significant threat as voice commands become available in autonomous driving systems. How to empirically defend against these inaudible attacks remains an open question. Previous research investigates utilizing deep learning-based multimodal fusion for defense, without…

    Submitted 29 May, 2023; originally announced June 2023.

  46. An XRI Mixed-Reality Internet-of-Things Architectural Framework Toward Immersive and Adaptive Smart Environments

    Authors: Alexis Morris, Jie Guan, Amna Azhar

    Abstract: The internet-of-things (IoT) refers to the growing number of embedded interconnected devices within everyday ubiquitous objects and environments, especially their networks, edge controllers, data gathering and management, sharing, and contextual analysis capabilities. However, the IoT suffers from inherent limitations in terms of human-computer interaction. In this landscape, there is a need for i…

    Submitted 1 June, 2023; originally announced June 2023.

    Journal ref: 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct)

  47. Extending the Metaverse: Hyper-Connected Smart Environments with Mixed Reality and the Internet of Things

    Authors: Jie Guan, Alexis Morris, Jay Irizawa

    Abstract: The metaverse, i.e., the collection of technologies that provide a virtual twin of the real world via mixed reality, internet of things, and others, is gaining prominence. However, the metaverse faces challenges as it grows toward mainstream adoption. Among these is the lack of strong connections between metaverse objects and traditional physical objects and environments, which leads to inconsiste…

    Submitted 1 June, 2023; originally announced June 2023.

    Journal ref: 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops

  48. Cross-Reality for Extending the Metaverse: Designing Hyper-Connected Immersive Environments with XRI

    Authors: Jie Guan, Alexis Morris, Jay Irizawa

    Abstract: The Metaverse comprises technologies to enable virtual twins of the real world, via mixed reality, internet of things, and others. As it matures unique challenges arise such as a lack of strong connections between virtual and physical worlds. This work presents design frameworks for cross-reality hybrid spaces. Contributions include: i) clarifying the metaverse "disconnect", ii) extended metaverse…

    Submitted 1 June, 2023; originally announced June 2023.

    Journal ref: 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW)

  49. Extended-XRI Body Interfaces for Hyper-Connected Metaverse Environments

    Authors: Jie Guan, Alexis Morris

    Abstract: Hybrid mixed-reality (XR) internet-of-things (IoT) research, here called XRI, aims at a strong integration between physical and virtual objects, environments, and agents wherein IoT-enabled edge devices are deployed for sensing, context understanding, networked communication and control of device actuators. Likewise, as augmented reality systems provide an immersive overlay on the environments, an…

    Submitted 1 June, 2023; originally announced June 2023.

    Journal ref: 2022 IEEE Games, Entertainment, Media Conference (GEM)

  50. arXiv:2305.12881  [pdf, other]

    cs.CV cs.MM

    Building an Invisible Shield for Your Portrait against Deepfakes

    Authors: Jiazhi Guan, Tianshu Hu, Hang Zhou, Zhizhi Guo, Lirui Deng, Chengbin Quan, Errui Ding, Youjian Zhao

    Abstract: The issue of detecting deepfakes has garnered significant attention in the research community, with the goal of identifying facial manipulations for abuse prevention. Although recent studies have focused on developing generalized models that can detect various types of deepfakes, their performance is not always reliable and stable, which poses limitations in real-world applications. Instead of…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: under review