Showing 1–50 of 236 results for author: Ma, F

  1. arXiv:2408.15491  [pdf, other]

    cs.CL

    Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression

    Authors: Haowen Hou, Fei Ma, Binwen Bai, Xinxin Zhu, Fei Yu

    Abstract: Large Language Models (LLMs) have garnered widespread attention due to their remarkable performance across various tasks. However, to mitigate the issue of hallucinations, LLMs often incorporate retrieval-augmented pipeline to provide them with rich external knowledge and context. Nevertheless, challenges stem from inaccurate and coarse-grained context retrieved from the retriever. Supplying irrel…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 20 pages

  2. arXiv:2408.11795  [pdf, other]

    cs.CV

    EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model

    Authors: Feipeng Ma, Yizhou Zhou, Hebei Li, Zilong He, Siying Wu, Fengyun Rao, Yueyi Zhang, Xiaoyan Sun

    Abstract: In the realm of multimodal research, numerous studies leverage substantial image-text pairs to conduct modal alignment learning, transforming Large Language Models (LLMs) into Multimodal LLMs and excelling in a variety of visual-language tasks. The prevailing methodologies primarily fall into two categories: self-attention-based and cross-attention-based methods. While self-attention-based methods…

    Submitted 21 August, 2024; originally announced August 2024.

  3. arXiv:2408.10276  [pdf, other]

    cs.LG cs.AI

    FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models

    Authors: Xiaochen Wang, Jiaqi Wang, Houping Xiao, Jinghui Chen, Fenglong Ma

    Abstract: Foundation models have demonstrated remarkable capabilities in handling diverse modalities and tasks, outperforming conventional artificial intelligence (AI) approaches that are highly task-specific and modality-reliant. In the medical domain, however, the development of comprehensive foundation models is constrained by limited access to diverse modalities and stringent privacy regulations. To add…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Submitted to EMNLP'24

  4. arXiv:2408.09227  [pdf, other]

    cs.AI

    FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection

    Authors: Jiaqi Wang, Xiaochen Wang, Lingjuan Lyu, Jinghui Chen, Fenglong Ma

    Abstract: This study introduces the Federated Medical Knowledge Injection (FEDMEKI) platform, a new benchmark designed to address the unique challenges of integrating medical knowledge into foundation models under privacy constraints. By leveraging a cross-silo federated learning approach, FEDMEKI circumvents the issues associated with centralized data collection, which is often prohibited under health regu…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Submitted to Neurips 2024 DB Track

  5. arXiv:2408.04590  [pdf, other]

    cs.LG

    Learn To Learn More Precisely

    Authors: Runxi Cheng, Yongxian Wei, Xianglong He, Wanyun Zhu, Songsong Huang, Fei Richard Yu, Fei Ma, Chun Yuan

    Abstract: Meta-learning has been extensively applied in the domains of few-shot learning and fast adaptation, achieving remarkable performance. While Meta-learning methods like Model-Agnostic Meta-Learning (MAML) and its variants provide a good set of initial parameters for the model, the model still tends to learn shortcut features, which leads to poor generalization. In this paper, we propose the formal c…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 10pages,4 figures, meta learning

  6. arXiv:2408.00352  [pdf, other]

    cs.CV

    Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion

    Authors: Honglei Miao, Fan Ma, Ruijie Quan, Kun Zhan, Yi Yang

    Abstract: Human motion generation driven by deep generative models has enabled compelling applications, but the ability of text-to-motion (T2M) models to produce realistic motions from text prompts raises security concerns if exploited maliciously. Despite growing interest in T2M, few methods focus on safeguarding these models against adversarial attacks, with existing work on text-to-image models proving i…

    Submitted 1 August, 2024; originally announced August 2024.

  7. arXiv:2407.10649  [pdf, other]

    cs.CV

    APC: Adaptive Patch Contrast for Weakly Supervised Semantic Segmentation

    Authors: Wangyu Wu, Tianhong Dai, Zhenhong Chen, Xiaowei Huang, Fei Ma, Jimin Xiao

    Abstract: Weakly Supervised Semantic Segmentation (WSSS) using only image-level labels has gained significant attention due to its cost-effectiveness. The typical framework involves using image-level labels as training data to generate pixel-level pseudo-labels with refinements. Recently, methods based on Vision Transformers (ViT) have demonstrated superior capabilities in generating reliable pseudo-labels,…

    Submitted 15 July, 2024; originally announced July 2024.

  8. arXiv:2407.09822  [pdf, other]

    cs.CV

    VividDreamer: Invariant Score Distillation For Hyper-Realistic Text-to-3D Generation

    Authors: Wenjie Zhuo, Fan Ma, Hehe Fan, Yi Yang

    Abstract: This paper presents Invariant Score Distillation (ISD), a novel method for high-fidelity text-to-3D generation. ISD aims to tackle the over-saturation and over-smoothing problems in Score Distillation Sampling (SDS). In this paper, SDS is decoupled into a weighted sum of two components: the reconstruction term and the classifier-free guidance term. We experimentally found that over-saturation stem…

    Submitted 17 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  9. arXiv:2407.09299  [pdf, other]

    cs.CV

    PID: Physics-Informed Diffusion Model for Infrared Image Generation

    Authors: Fangyuan Mao, Jilin Mei, Shun Lu, Fuyang Liu, Liang Chen, Fangzhou Zhao, Yu Hu

    Abstract: Infrared imaging technology has gained significant attention for its reliable sensing ability in low visibility conditions, prompting many studies to convert the abundant RGB images to infrared images. However, most existing image translation methods treat infrared images as a stylistic variation, neglecting the underlying physical laws, which limits their practical application. To address these i…

    Submitted 12 July, 2024; originally announced July 2024.

  10. arXiv:2407.08428  [pdf, other]

    cs.CV cs.AI

    A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights

    Authors: Wentao Lei, Jinting Wang, Fengji Ma, Guanjie Huang, Li Liu

    Abstract: Human video generation is a dynamic and rapidly evolving task that aims to synthesize 2D human body video sequences with generative models given control conditions such as text, audio, and pose. With the potential for wide-ranging applications in film, gaming, and virtual communication, the ability to generate natural and realistic human video is critical. Recent advancements in generative models…

    Submitted 11 July, 2024; originally announced July 2024.

  11. arXiv:2407.03640  [pdf, other]

    cs.LG cs.CL cs.CV

    Generative Technology for Human Emotion Recognition: A Scope Review

    Authors: Fei Ma, Yucheng Yuan, Yifan Xie, Hongwei Ren, Ivan Liu, Ying He, Fuji Ren, Fei Richard Yu, Shiguang Ni

    Abstract: Affective computing stands at the forefront of artificial intelligence (AI), seeking to imbue machines with the ability to comprehend and respond to human emotions. Central to this field is emotion recognition, which endeavors to identify and interpret human emotional states from different modalities, such as speech, facial images, text, and physiological signals. In recent years, important progre…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Under Review

  12. arXiv:2407.03247  [pdf, other]

    cs.DC

    Bridging Model Heterogeneity in Federated Learning via Uncertainty-based Asymmetrical Reciprocity Learning

    Authors: Jiaqi Wang, Chenxu Zhao, Lingjuan Lyu, Quanzeng You, Mengdi Huai, Fenglong Ma

    Abstract: This paper presents FedType, a simple yet pioneering framework designed to fill research gaps in heterogeneous model aggregation within federated learning (FL). FedType introduces small identical proxy models for clients, serving as agents for information exchange, ensuring model security, and achieving efficient communication simultaneously. To transfer knowledge between large private and small p…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by ICML 2024

  13. arXiv:2407.02329  [pdf, other]

    cs.CV

    MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis

    Authors: Dewei Zhou, You Li, Fan Ma, Zongxin Yang, Yi Yang

    Abstract: We introduce the Multi-Instance Generation (MIG) task, which focuses on generating multiple instances within a single image, each accurately placed at predefined positions with attributes such as category, color, and shape, strictly following user specifications. MIG faces three main challenges: avoiding attribute leakage between instances, supporting diverse instance descriptions, and maintaining…

    Submitted 2 July, 2024; originally announced July 2024.

  14. arXiv:2406.13942  [pdf, other]

    cs.LG

    Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models

    Authors: Yuan Zhong, Xiaochen Wang, Jiaqi Wang, Xiaokun Zhang, Yaqing Wang, Mengdi Huai, Cao Xiao, Fenglong Ma

    Abstract: Synthesizing electronic health records (EHR) data has become a preferred strategy to address data scarcity, improve data quality, and model fairness in healthcare. However, existing approaches for EHR data generation predominantly rely on state-of-the-art generative techniques like generative adversarial networks, variational autoencoders, and language models. These methods typically replicate inp…

    Submitted 19 June, 2024; originally announced June 2024.

  15. arXiv:2406.13362  [pdf, other]

    cs.CV cs.CL cs.LG

    VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models

    Authors: Haowen Hou, Peigen Zeng, Fei Ma, Fei Richard Yu

    Abstract: Visual Language Models (VLMs) have rapidly progressed with the recent success of large language models. However, there have been few attempts to incorporate efficient linear Recurrent Neural Networks (RNNs) architectures into VLMs. In this study, we introduce VisualRWKV, the first application of a linear RNN model to multimodal learning tasks, leveraging the pre-trained RWKV language model. We pro…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 18 pages,14 tables,6 figures

  16. arXiv:2406.12947  [pdf, other]

    cs.CR cs.SE

    AutoFirm: Automatically Identifying Reused Libraries inside IoT Firmware at Large-Scale

    Authors: YongLe Chen, Feng Ma, Ying Zhang, YongZhong He, Haining Wang, Qiang Li

    Abstract: The Internet of Things (IoT) has become indispensable to our daily lives and work. Unfortunately, developers often reuse software libraries in the IoT firmware, leading to a major security concern. If vulnerabilities or insecure versions of these libraries go unpatched, a massive number of IoT devices can be impacted. In this paper, we propose the AutoFirm, an automated tool for detecting reused l…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 13 pages, 20 figures

  17. arXiv:2406.11048  [pdf, other]

    cs.LG cs.DC

    Leveraging Foundation Models for Multi-modal Federated Learning with Incomplete Modality

    Authors: Liwei Che, Jiaqi Wang, Xinyue Liu, Fenglong Ma

    Abstract: Federated learning (FL) has obtained tremendous progress in providing collaborative training solutions for distributed data silos with privacy guarantees. However, few existing works explore a more realistic scenario where the clients hold multiple data modalities. In this paper, we aim to solve a novel challenge in multi-modal federated learning (MFL) -- modality missing -- the clients may lose p…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted by ECML-PKDD 2024

  18. arXiv:2406.09669  [pdf, other]

    cs.CR

    Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models

    Authors: Changjiang Li, Ren Pang, Bochuan Cao, Jinghui Chen, Fenglong Ma, Shouling Ji, Ting Wang

    Abstract: Thanks to their remarkable denoising capabilities, diffusion models are increasingly being employed as defensive tools to reinforce the security of other models, notably in purifying adversarial examples and certifying adversarial robustness. However, the security risks of these practices themselves remain largely unexplored, which is highly concerning. To bridge this gap, this work investigates t…

    Submitted 13 June, 2024; originally announced June 2024.

  19. arXiv:2406.07444  [pdf, other]

    cs.CL

    On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations

    Authors: Shiao Meng, Xuming Hu, Aiwei Liu, Fukun Ma, Yawen Yang, Shuang Li, Lijie Wen

    Abstract: Driven by the demand for cross-sentence and large-scale relation extraction, document-level relation extraction (DocRE) has attracted increasing research interest. Despite the continuous improvement in performance, we find that existing DocRE models which initially perform well may make more mistakes when merely changing the entity names in the document, hindering the generalization to novel entit…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

    MSC Class: 68T50; ACM Class: I.2.7

  20. arXiv:2406.00045  [pdf, other]

    cs.CL cs.LG

    Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization

    Authors: Yuanpu Cao, Tianrong Zhang, Bochuan Cao, Ziyi Yin, Lu Lin, Fenglong Ma, Jinghui Chen

    Abstract: Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications. While fine-tuning seems to be a direct solution, it requires substantial computational resources and may significantly affect the utility of the original LLM. Recent endeavors have introduced more lightweight strategies, focusing on extracti…

    Submitted 29 July, 2024; v1 submitted 28 May, 2024; originally announced June 2024.

  21. arXiv:2405.20339  [pdf, other]

    cs.CV

    Visual Perception by Large Language Model's Weights

    Authors: Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun

    Abstract: Existing Multimodal Large Language Models (MLLMs) follow the paradigm that perceives visual information by aligning visual features with the input space of Large Language Models (LLMs), and concatenating visual tokens with text tokens to form a unified sequence input for LLMs. These methods demonstrate promising results on various vision-language tasks but are limited by the high computational eff…

    Submitted 30 May, 2024; originally announced May 2024.

  22. arXiv:2405.19333  [pdf, other]

    cs.CV

    Multi-Modal Generative Embedding Model

    Authors: Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun

    Abstract: Most multi-modal tasks can be formulated into problems of either generation or embedding. Existing models usually tackle these two types of problems by decoupling language modules into a text decoder for generation, and a text encoder for embedding. To explore the minimalism of multi-modal paradigms, we attempt to achieve only one model per modality in this work. We propose a Multi-Modal Generativ…

    Submitted 29 May, 2024; originally announced May 2024.

  23. arXiv:2405.17678  [pdf, other]

    cs.CV cs.AI

    TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability

    Authors: Fengji Ma, Li Liu, Hei Victor Cheng

    Abstract: This work addresses the challenge of achieving zero-shot adversarial robustness while preserving zero-shot generalization in large-scale foundation models, with a focus on the popular Contrastive Language-Image Pre-training (CLIP). Although foundation models were reported to have exceptional zero-shot generalization, they are highly vulnerable to adversarial perturbations. Existing methods achieve…

    Submitted 27 May, 2024; originally announced May 2024.

  24. arXiv:2405.12503  [pdf, other]

    cs.CV

    CLRKDNet: Speeding up Lane Detection with Knowledge Distillation

    Authors: Weiqing Qi, Guoyang Zhao, Fulong Ma, Linwei Zheng, Ming Liu

    Abstract: Road lanes are integral components of the visual perception systems in intelligent vehicles, playing a pivotal role in safe navigation. In lane detection tasks, balancing accuracy with real-time performance is essential, yet existing methods often sacrifice one for the other. To address this trade-off, we introduce CLRKDNet, a streamlined model that balances detection accuracy with real-time perfo…

    Submitted 21 May, 2024; originally announced May 2024.

  25. arXiv:2405.10022  [pdf, other]

    eess.AS cs.SD

    Monaural speech enhancement on drone via Adapter based transfer learning

    Authors: Xingyu Chen, Hanwen Bi, Wei-Ting Lai, Fei Ma

    Abstract: Monaural Speech enhancement on drones is challenging because the ego-noise from the rotating motors and propellers leads to extremely low signal-to-noise ratios at onboard microphones. Although recent masking-based deep neural network methods excel in monaural speech enhancement, they struggle in the challenging drone noise scenario. Furthermore, existing drone noise datasets are limited, causing…

    Submitted 16 May, 2024; originally announced May 2024.

  26. arXiv:2405.08589  [pdf, other]

    cs.CV

    Variable Substitution and Bilinear Programming for Aligning Partially Overlapping Point Sets

    Authors: Wei Lian, Zhesen Cui, Fei Ma, Hang Pan, Wangmeng Zuo

    Abstract: In many applications, the demand arises for algorithms capable of aligning partially overlapping point sets while remaining invariant to the corresponding transformations. This research presents a method designed to meet such requirements through minimization of the objective function of the robust point matching (RPM) algorithm. First, we show that the RPM objective is a cubic polynomial. Then, t…

    Submitted 14 May, 2024; originally announced May 2024.

  27. arXiv:2405.06116  [pdf, other]

    cs.CV

    Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba

    Authors: Hongwei Ren, Yue Zhou, Jiadong Zhu, Haotian Fu, Yulong Huang, Xiaopeng Lin, Yuetong Fang, Fei Ma, Hao Yu, Bojun Cheng

    Abstract: Event cameras, drawing inspiration from biological systems, efficiently detect changes in ambient light with low latency and high dynamic range while consuming minimal power. The most current approach to processing event data often involves converting it into frame-based representations, which is well-established in traditional vision. However, this approach neglects the sparsity of event data, lo…

    Submitted 2 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Extension Journal of TTPOINT and PEPNet, modify the dataset split method

  28. arXiv:2405.03565  [pdf, other]

    cs.CV

    Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing

    Authors: Han Liu, Siyang Zhao, Xiaotong Zhang, Feng Zhang, Wei Wang, Fenglong Ma, Hongyang Chen, Hong Yu, Xianchao Zhang

    Abstract: Few-shot and zero-shot text classification aim to recognize samples from novel classes with limited labeled samples or no labeled samples at all. While prevailing methods have shown promising performance via transferring knowledge from seen classes to unseen classes, they are still limited by (1) Inherent dissimilarities among classes make the transformation of features learned from seen classes t…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted to AAAI 2024

  29. arXiv:2404.16297  [pdf, other]

    cs.SE cs.AI

    When Fuzzing Meets LLMs: Challenges and Opportunities

    Authors: Yu Jiang, Jie Liang, Fuchen Ma, Yuanliang Chen, Chijin Zhou, Yuheng Shen, Zhiyong Wu, Jingzhou Fu, Mingzhe Wang, ShanShan Li, Quan Zhang

    Abstract: Fuzzing, a widely-used technique for bug detection, has seen advancements through Large Language Models (LLMs). Despite their potential, LLMs face specific challenges in fuzzing. In this paper, we identified five major challenges of LLM-assisted fuzzing. To support our findings, we revisited the most recent papers from top-tier conferences, confirming that these challenges are widespread. As a rem…

    Submitted 24 April, 2024; originally announced April 2024.

  30. FineRec: Exploring Fine-grained Sequential Recommendation

    Authors: Xiaokun Zhang, Bo Xu, Youlin Wu, Yuan Zhong, Hongfei Lin, Fenglong Ma

    Abstract: Sequential recommendation is dedicated to offering items of interest for users based on their history behaviors. The attribute-opinion pairs, expressed by users in their reviews for items, provide the potentials to capture user preferences and item characteristics at a fine-grained level. To this end, we propose a novel framework FineRec that explores the attribute-opinion pairs of reviews to fine…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: This work has been accepted by SIGIR24' as a full paper

  31. Disentangling ID and Modality Effects for Session-based Recommendation

    Authors: Xiaokun Zhang, Bo Xu, Zhaochun Ren, Xiaochen Wang, Hongfei Lin, Fenglong Ma

    Abstract: Session-based recommendation aims to predict intents of anonymous users based on their limited behaviors. Modeling user behaviors involves two distinct rationales: co-occurrence patterns reflected by item IDs, and fine-grained preferences represented by item modalities (e.g., text and images). However, existing methods typically entangle these causes, leading to their failure in achieving accurate…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: This work has been accepted by SIGIR24' as a full paper

  32. arXiv:2404.12624  [pdf, other]

    cs.RO cs.CV

    Dragtraffic: A Non-Expert Interactive and Point-Based Controllable Traffic Scene Generation Framework

    Authors: Sheng Wang, Ge Sun, Fulong Ma, Tianshuai Hu, Yongkang Song, Lei Zhu, Ming Liu

    Abstract: The evaluation and training of autonomous driving systems require diverse and scalable corner cases. However, most existing scene generation methods lack controllability, accuracy, and versatility, resulting in unsatisfactory generation results. To address this problem, we propose Dragtraffic, a generalized, point-based, and controllable traffic scene generation framework based on conditional diff…

    Submitted 19 April, 2024; originally announced April 2024.

  33. arXiv:2404.06860  [pdf, other]

    cs.CV

    Monocular 3D lane detection for Autonomous Driving: Recent Achievements, Challenges, and Outlooks

    Authors: Fulong Ma, Weiqing Qi, Guoyang Zhao, Linwei Zheng, Sheng Wang, Yuxuan Liu, Ming Liu

    Abstract: 3D lane detection is essential in autonomous driving as it extracts structural and traffic information from the road in three-dimensional space, aiding self-driving cars in logical, safe, and comfortable path planning and motion control. Given the cost of sensors and the advantages of visual data in color information, 3D lane detection based on monocular vision is an important research direction i…

    Submitted 19 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  34. Generative AI in the Wild: Prospects, Challenges, and Strategies

    Authors: Yuan Sun, Eunchae Jang, Fenglong Ma, Ting Wang

    Abstract: Propelled by their remarkable capabilities to generate novel and engaging content, Generative Artificial Intelligence (GenAI) technologies are disrupting traditional workflows in many industries. While prior research has examined GenAI from a techno-centric perspective, there is still a lack of understanding about how users perceive and utilize GenAI in real-world scenarios. To bridge this gap, we…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI'24), May 11-16, 2024, Honolulu, HI, USA. (accidentally submitted as arXiv:2302.10827v2)

  35. arXiv:2404.00254  [pdf, other]

    cs.LG cs.CE q-bio.BM q-bio.QM

    Clustering for Protein Representation Learning

    Authors: Ruijie Quan, Wenguan Wang, Fan Ma, Hehe Fan, Yi Yang

    Abstract: Protein representation learning is a challenging task that aims to capture the structure and function of proteins from their amino acid sequences. Previous methods largely ignored the fact that not all amino acids are equally important for protein folding and activity. In this article, we propose a neural clustering framework that can automatically discover the critical components of a protein by…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: Accepted to CVPR2024

  36. arXiv:2403.20022  [pdf, other]

    cs.CV

    Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity

    Authors: Ruijie Quan, Wenguan Wang, Zhibo Tian, Fan Ma, Yi Yang

    Abstract: Reconstructing the viewed images from human brain activity bridges human and computer vision through the Brain-Computer Interface. The inherent variability in brain function between individuals leads existing literature to focus on acquiring separate models for each individual using their respective brain signal data, ignoring commonalities between these data. In this article, we devise Psychometr…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  37. arXiv:2403.16794  [pdf, other]

    cs.CV cs.RO

    CurbNet: Curb Detection Framework Based on LiDAR Point Cloud Segmentation

    Authors: Guoyang Zhao, Fulong Ma, Weiqing Qi, Yuxuan Liu, Ming Liu

    Abstract: Curb detection is a crucial function in intelligent driving, essential for determining drivable areas on the road. However, the complexity of road environments makes curb detection challenging. This paper introduces CurbNet, a novel framework for curb detection utilizing point cloud segmentation. To address the lack of comprehensive curb datasets with 3D annotations, we have developed the 3D-Curb…

    Submitted 30 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  38. arXiv:2403.16005  [pdf, other]

    cs.CV

    Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

    Authors: Yucheng Suo, Fan Ma, Linchao Zhu, Yi Yang

    Abstract: We study the zero-shot Composed Image Retrieval (ZS-CIR) task, which is to retrieve the target image given a reference image and a description without training on the triplet datasets. Previous works generate pseudo-word tokens by projecting the reference image features to the text embedding space. However, they focus on the global visual representation, ignoring the representation of detailed att…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  39. arXiv:2403.15173  [pdf, other]

    cs.CV

    LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels

    Authors: Tuo Feng, Wenguan Wang, Fan Ma, Yi Yang

    Abstract: Autonomous systems need to process large-scale, sparse, and irregular point clouds with limited compute resources. Consequently, it is essential to develop LiDAR perception methods that are both efficient and effective. Although naively enlarging 3D kernel size can enhance performance, it will also lead to a cubically-increasing overhead. Therefore, it is crucial to develop streamlined 3D large ke…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024; Project page: https://github.com/FengZicai/LSK3DNet

  40. arXiv:2402.15735  [pdf]

    eess.AS cs.SD

    A circular microphone array with virtual microphones based on acoustics-informed neural networks

    Authors: Sipei Zhao, Fei Ma

    Abstract: Acoustic beamforming aims to focus acoustic signals to a specific direction and suppress undesirable interferences from other directions. Despite its flexibility and steerability, beamforming with circular microphone arrays suffers from significant performance degradation at frequencies corresponding to zeros of the Bessel functions. To conquer this constraint, baffled or concentric circular micro…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: Submitted to JASA on 24/02/2024

  41. arXiv:2402.15700  [pdf, other]

    cs.LG cs.AI cs.CL

    CoRelation: Boosting Automatic ICD Coding Through Contextualized Code Relation Learning

    Authors: Junyu Luo, Xiaochen Wang, Jiaqi Wang, Aofei Chang, Yaqing Wang, Fenglong Ma

    Abstract: Automatic International Classification of Diseases (ICD) coding plays a crucial role in the extraction of relevant information from clinical notes for proper recording and billing. One of the most important directions for boosting the performance of automatic ICD coding is modeling ICD code relations. However, current methods insufficiently model the intricate relationships among ICD codes and oft…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: LREC-Coling 2024

  42. arXiv:2402.11083  [pdf, other]

    cs.CV

    VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models

    Authors: Ziyi Yin, Muchao Ye, Tianrong Zhang, Jiaqi Wang, Han Liu, Jinghui Chen, Ting Wang, Fenglong Ma

    Abstract: Visual Question Answering (VQA) is a fundamental task in computer vision and natural language process fields. Although the "pre-training & finetuning" learning paradigm significantly improves the VQA performance, the adversarial robustness of such a learning paradigm has not been explored. In this paper, we delve into a new problem: using a pre-trained multimodal source model to create adversari…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: AAAI 2024, 11 pages

  43. arXiv:2402.08904  [pdf, other]

    eess.AS cs.SD

    Sound Field Reconstruction Using a Compact Acoustics-informed Neural Network

    Authors: Fei Ma, Sipei Zhao, Ian S. Burnett

    Abstract: Sound field reconstruction (SFR) augments the information of a sound field captured by a microphone array. Conventional SFR methods using basis function decomposition are straightforward and computationally efficient, but may require more microphones than needed to measure the sound field. Recent studies show that pure data-driven and learning-based methods are promising in some SFR tasks, but the…

    Submitted 13 February, 2024; originally announced February 2024.

  44. arXiv:2402.06772  [pdf, other]

    q-bio.QM cs.AI cs.CE cs.LG

    Retrosynthesis Prediction via Search in (Hyper) Graph

    Authors: Zixun Lan, Binjie Hong, Jiajun Zhu, Zuo Zeng, Zhenfu Liu, Limin Yu, Fei Ma

    Abstract: Predicting reactants from a specified core product stands as a fundamental challenge within organic synthesis, termed retrosynthesis prediction. Recently, semi-template-based methods and graph-edits-based methods have achieved good performance in terms of both interpretability and accuracy. However, due to their mechanisms these methods cannot predict complex reactions, e.g., reactions with multip…

    Submitted 9 February, 2024; originally announced February 2024.

  45. arXiv:2402.06149  [pdf, other]

    cs.CV

    HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

    Authors: Zhenglin Zhou, Fan Ma, Hehe Fan, Yi Yang

    Abstract: Creating digital avatars from textual prompts has long been a desirable yet challenging task. Despite the promising outcomes obtained through 2D diffusion priors in recent works, current methods face challenges in achieving high-quality and animated avatars effectively. In this paper, we present $\textbf{HeadStudio}$, a novel framework that utilizes 3D Gaussian splatting to generate realistic and…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 9 pages, 8 figures

  46. arXiv:2402.05408  [pdf, other]

    cs.CV

    MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

    Authors: Dewei Zhou, You Li, Fan Ma, Xiaoting Zhang, Yi Yang

    Abstract: We present a Multi-Instance Generation (MIG) task, simultaneously generating multiple instances with diverse controls in one image. Given a set of predefined coordinates and their corresponding descriptions, the task is to ensure that generated instances are accurately at the designated locations and that all instances' attributes adhere to their corresponding description. This broadens the scope…

    Submitted 27 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted for publication in CVPR 2024

  47. arXiv:2402.02503  [pdf]

    cs.CV cs.CL

    GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering

    Authors: Ziyu Ma, Shutao Li, Bin Sun, Jianfei Cai, Zuxiang Long, Fuyan Ma

    Abstract: Knowledge-based visual question answering (VQA) requires world knowledge beyond the image for an accurate answer. Recently, instead of extra knowledge bases, a large language model (LLM) like GPT-3 is activated as an implicit knowledge engine to jointly acquire and reason the necessary knowledge for answering by converting images into textual information (e.g., captions and answer candidates). Howeve…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 17 pages

  48. arXiv:2402.01806  [pdf, other]

    cs.CL cs.AI

    HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text

    Authors: Han Liu, Zhi Xu, Xiaotong Zhang, Feng Zhang, Fenglong Ma, Hongyang Chen, Hong Yu, Xianchao Zhang

    Abstract: Black-box hard-label adversarial attack on text is a practical and challenging task, as the text data space is inherently discrete and non-differentiable, and only the predicted label is accessible. Research on this problem is still in the embryonic stage and only a few methods are available. Nevertheless, existing methods rely on the complex heuristic algorithm or unreliable gradient estimation s…

    Submitted 2 February, 2024; originally announced February 2024.

  49. arXiv:2402.01077  [pdf, ps, other]

    cs.LG cs.AI

    Recent Advances in Predictive Modeling with Electronic Health Records

    Authors: Jiaqi Wang, Junyu Luo, Muchao Ye, Xiaochen Wang, Yuan Zhong, Aofei Chang, Guanjie Huang, Ziyi Yin, Cao Xiao, Jimeng Sun, Fenglong Ma

    Abstract: The development of electronic health records (EHR) systems has enabled the collection of a vast amount of digitized patient data. However, utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics. With the advancements in machine learning techniques, deep learning has demonstrated its superiority in various applications, including healthcare. This su…

    Submitted 13 August, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted by IJCAI 24 Survey Track

  50. arXiv:2402.00627  [pdf, other]

    cs.CV cs.AI

    CapHuman: Capture Your Moments in Parallel Universes

    Authors: Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang

    Abstract: We concentrate on a novel human-centric image synthesis task, that is, given only one reference facial photograph, it is expected to generate specific individual images with diverse head positions, poses, facial expressions, and illuminations in different contexts. To accomplish this goal, we argue that our generative model should be capable of the following favorable characteristics: (1) a strong…

    Submitted 17 May, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: Accepted by CVPR 2024. Project page: https://caphuman.github.io/