Showing 1–50 of 395 results for author: Zhu, F

Searching in archive cs.
  1. arXiv:2408.15246

    cs.CV cs.AI cs.LG

    Multi-Slice Spatial Transcriptomics Data Integration Analysis with STG3Net

    Authors: Donghai Fang, Fangfang Zhu, Wenwen Min

    Abstract: With the rapid development of the latest Spatially Resolved Transcriptomics (SRT) technology, which allows for the mapping of gene expression within tissue sections, the integrative analysis of multiple SRT data has become increasingly important. However, batch effects between multiple slices pose significant challenges in analyzing SRT data. To address these challenges, we have developed a plug-a…

    Submitted 8 August, 2024; originally announced August 2024.

  2. arXiv:2408.14035

    cs.RO cs.CV

    FAST-LIVO2: Fast, Direct LiDAR-Inertial-Visual Odometry

    Authors: Chunran Zheng, Wei Xu, Zuhao Zou, Tong Hua, Chongjian Yuan, Dongjiao He, Bingyang Zhou, Zheng Liu, Jiarong Lin, Fangcheng Zhu, Yunfan Ren, Rong Wang, Fanle Meng, Fu Zhang

    Abstract: This paper proposes FAST-LIVO2: a fast, direct LiDAR-inertial-visual odometry framework to achieve accurate and robust state estimation in SLAM tasks and provide great potential in real-time, onboard robotic applications. FAST-LIVO2 fuses the IMU, LiDAR and image measurements efficiently through an ESIKF. To address the dimension mismatch between the heterogeneous LiDAR and image measurements, we…

    Submitted 28 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: 30 pages, 31 figures, due to the limitation that 'The abstract field cannot exceed 1,920 characters', the abstract presented here is shorter than the one in the PDF file

  3. arXiv:2408.11478

    cs.CV cs.LG

    LAKD-Activation Mapping Distillation Based on Local Learning

    Authors: Yaoze Zhang, Yuming Zhang, Yu Zhao, Yue Zhang, Feiyu Zhu

    Abstract: Knowledge distillation is widely applied in various fundamental vision models to enhance the performance of compact models. Existing knowledge distillation methods focus on designing different distillation targets to acquire knowledge from teacher models. However, these methods often overlook the efficient utilization of distilled information, crudely coupling different types of information, makin…

    Submitted 22 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: 8 pages, 7 figures

  4. arXiv:2408.08305

    cs.CV

    Towards Flexible Visual Relationship Segmentation

    Authors: Fangrui Zhu, Jianwei Yang, Huaizu Jiang

    Abstract: Visual relationship understanding has been studied separately in human-object interaction (HOI) detection, scene graph generation (SGG), and referring relationships (RR) tasks. Given the complexity and interconnectedness of these tasks, it is crucial to have a flexible framework that can effectively address these tasks in a cohesive manner. In this work, we propose FleVRS, a single model that seamles…

    Submitted 15 August, 2024; originally announced August 2024.

  5. arXiv:2408.07709

    q-bio.GN cs.LG

    Pretrained-Guided Conditional Diffusion Models for Microbiome Data Analysis

    Authors: Xinyuan Shi, Fangfang Zhu, Wenwen Min

    Abstract: Emerging evidence indicates that human cancers are intricately linked to human microbiomes, forming an inseparable connection. However, due to limited sample sizes and significant data loss during collection for various reasons, some machine learning methods have been proposed to address the issue of missing data. These methods have not fully utilized the known clinical information of patients to…

    Submitted 9 August, 2024; originally announced August 2024.

  6. arXiv:2408.06377

    q-bio.GN cs.AI cs.LG

    Masked Graph Autoencoders with Contrastive Augmentation for Spatially Resolved Transcriptomics Data

    Authors: Donghai Fang, Fangfang Zhu, Dongting Xie, Wenwen Min

    Abstract: With the rapid advancement of Spatially Resolved Transcriptomics (SRT) technology, it is now possible to comprehensively measure gene transcription while preserving the spatial context of tissues. Spatial domain identification and gene denoising are key objectives in SRT data analysis. We propose a Contrastively Augmented Masked Graph Autoencoder (STMGAC) to learn low-dimensional latent representati…

    Submitted 8 August, 2024; originally announced August 2024.

  7. arXiv:2408.05258

    q-bio.GN cs.AI cs.LG

    scASDC: Attention Enhanced Structural Deep Clustering for Single-cell RNA-seq Data

    Authors: Wenwen Min, Zhen Wang, Fangfang Zhu, Taosheng Xu, Shunfang Wang

    Abstract: Single-cell RNA sequencing (scRNA-seq) data analysis is pivotal for understanding cellular heterogeneity. However, the high sparsity and complex noise patterns inherent in scRNA-seq data present significant challenges for traditional clustering methods. To address these issues, we propose a deep clustering method, Attention-Enhanced Structural Deep Embedding Graph Clustering (scASDC), which integr…

    Submitted 9 August, 2024; originally announced August 2024.

  8. arXiv:2408.04223

    cs.CV cs.AI

    VideoQA in the Era of LLMs: An Empirical Study

    Authors: Junbin Xiao, Nanxin Huang, Hangyu Qin, Dongyang Li, Yicong Li, Fengbin Zhu, Zhulin Tao, Jianxing Yu, Liang Lin, Tat-Seng Chua, Angela Yao

    Abstract: Video Large Language Models (Video-LLMs) are flourishing and have advanced many video-language tasks. As a golden testbed, Video Question Answering (VideoQA) plays a pivotal role in Video-LLM development. This work conducts a timely and comprehensive study of Video-LLMs' behavior in VideoQA, aiming to elucidate their success and failure modes, and provide insights towards more human-like video underst…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Preprint. Under Review

  9. arXiv:2408.03922

    cs.CV

    FMiFood: Multi-modal Contrastive Learning for Food Image Classification

    Authors: Xinyue Pan, Jiangpeng He, Fengqing Zhu

    Abstract: Food image classification is the fundamental step in image-based dietary assessment, which aims to estimate participants' nutrient intake from eating occasion images. A common challenge of food images is the intra-class diversity and inter-class similarity, which can significantly hinder classification performance. To address this issue, we introduce a novel multi-modal contrastive learning framew…

    Submitted 7 August, 2024; originally announced August 2024.

  10. arXiv:2407.20518

    eess.IV cs.AI cs.CV

    High-Resolution Spatial Transcriptomics from Histology Images using HisToSGE

    Authors: Zhiceng Shi, Shuailin Xue, Fangfang Zhu, Wenwen Min

    Abstract: Spatial transcriptomics (ST) is a groundbreaking genomic technology that enables spatial localization analysis of gene expression within tissue sections. However, it is significantly limited by high costs and sparse spatial resolution. An alternative, more cost-effective strategy is to use deep learning methods to predict high-density gene expression profiles from histological images. However, exi…

    Submitted 29 July, 2024; originally announced July 2024.

  11. arXiv:2407.15141

    cs.AI cs.LG physics.chem-ph

    Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation

    Authors: Yu Zhang, Ruijie Yu, Kaipeng Zeng, Ding Li, Feng Zhu, Xiaokang Yang, Yaohui Jin, Yanyan Xu

    Abstract: High-throughput reaction condition (RC) screening is fundamental to chemical synthesis. However, current RC screening suffers from laborious and costly trial-and-error workflows. Traditional computer-aided synthesis planning (CASP) tools fail to find suitable RCs due to data sparsity and inadequate reaction representations. Nowadays, large language models (LLMs) are capable of tackling chemistry-r…

    Submitted 21 July, 2024; originally announced July 2024.

  12. arXiv:2407.14029

    cs.CV cs.LG

    PASS++: A Dual Bias Reduction Framework for Non-Exemplar Class-Incremental Learning

    Authors: Fei Zhu, Xu-Yao Zhang, Zhen Cheng, Cheng-Lin Liu

    Abstract: Class-incremental learning (CIL) aims to recognize new classes incrementally while maintaining the discriminability of old classes. Most existing CIL methods are exemplar-based, i.e., storing a part of old data for retraining. Without relearning old data, those methods suffer from catastrophic forgetting. In this paper, we identify two inherent problems in CIL, i.e., representation bias and clas…

    Submitted 19 July, 2024; originally announced July 2024.

  13. arXiv:2407.13182

    cs.LG cs.AI q-bio.GN

    SpaDiT: Diffusion Transformer for Spatial Gene Expression Prediction using scRNA-seq

    Authors: Xiaoyu Li, Fangfang Zhu, Wenwen Min

    Abstract: The rapid development of spatial transcriptomics (ST) technologies is revolutionizing our understanding of the spatial organization of biological tissues. Current ST methods, categorized into next-generation sequencing-based (seq-based) and fluorescence in situ hybridization-based (image-based) methods, offer innovative insights into the functional dynamics of biological tissues. However, these me…

    Submitted 18 July, 2024; originally announced July 2024.

  14. arXiv:2407.09285

    cs.CV

    MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction: Methods and Results

    Authors: Jiangpeng He, Yuhao Chen, Gautham Vinod, Talha Ibn Mahmud, Fengqing Zhu, Edward Delp, Alexander Wong, Pengcheng Xi, Ahmad AlMughrabi, Umair Haroon, Ricardo Marques, Petia Radeva, Jiadong Tang, Dianyi Yang, Yu Gao, Zhaoxiang Liang, Yawei Jueluo, Chengyu Shi, Pengyu Wang

    Abstract: The increasing interest in computer vision applications for nutrition and dietary monitoring has led to the development of advanced 3D reconstruction techniques for food items. However, the scarcity of high-quality data and limited collaboration between industry and academia have constrained progress in this field. Building on recent advancements in 3D reconstruction, we host the MetaFood Workshop…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Technical report for MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction. arXiv admin note: substantial text overlap with arXiv:2407.01717

  15. arXiv:2407.08224

    q-bio.QM cs.AI

    stEnTrans: Transformer-based deep learning for spatial transcriptomics enhancement

    Authors: Shuailin Xue, Fangfang Zhu, Changmiao Wang, Wenwen Min

    Abstract: The spatial location of cells within tissues and organs is crucial for the manifestation of their specific functions. Spatial transcriptomics technology enables comprehensive measurement of the gene expression patterns in tissues while retaining spatial information. However, current popular spatial transcriptomics techniques either have shallow sequencing depth or low resolution. We present stEnTra…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: ISBRA2024, Code: https://github.com/shuailinxue/stEnTrans

  16. arXiv:2407.05638

    cs.CV

    HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion

    Authors: Junhao Su, Chenghao He, Feiyu Zhu, Xiaojie Xu, Dongzhi Guan, Chenyang Si

    Abstract: Traditional deep learning relies on end-to-end backpropagation for training, but it suffers from drawbacks such as high memory consumption and not aligning with biological neural networks. Recent advancements have introduced locally supervised learning, which divides networks into modules with isolated gradients and trains them locally. However, this approach can lead to performance lag due to lim…

    Submitted 8 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  17. arXiv:2407.05623

    cs.CV

    Momentum Auxiliary Network for Supervised Local Learning

    Authors: Junhao Su, Changpeng Cai, Feiyu Zhu, Chenghao He, Xiaojie Xu, Dongzhi Guan, Chenyang Si

    Abstract: Deep neural networks conventionally employ end-to-end backpropagation for their training process, which lacks biological credibility and triggers a locking dilemma during network parameter updates, leading to significant GPU memory use. Supervised local learning segments the network into multiple local blocks that are updated by independent auxiliary networks. However, these methods cannot replace e…

    Submitted 12 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024 (Oral)

  18. arXiv:2407.04020

    cs.CL

    LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking

    Authors: Amy Xin, Yunjia Qi, Zijun Yao, Fangwei Zhu, Kaisheng Zeng, Xu Bin, Lei Hou, Juanzi Li

    Abstract: Entity Linking (EL) models are well-trained at mapping mentions to their corresponding entities according to a given context. However, EL models struggle to disambiguate long-tail entities due to their limited training data. Meanwhile, large language models (LLMs) are more robust at interpreting uncommon mentions. Yet, due to a lack of specialized training, LLMs suffer at generating correct entity…

    Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  19. arXiv:2406.18962

    cs.IR

    Multi-modal Food Recommendation using Clustering and Self-supervised Learning

    Authors: Yixin Zhang, Xin Zhou, Qianwen Meng, Fanglin Zhu, Yonghui Xu, Zhiqi Shen, Lizhen Cui

    Abstract: Food recommendation systems serve as pivotal components in the realm of digital lifestyle services, designed to assist users in discovering recipes and food items that resonate with their unique dietary predilections. Typically, multi-modal descriptions offer an exhaustive profile for each recipe, thereby ensuring recommendations that are both personalized and accurate. Our preliminary investigati…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Working paper

  20. arXiv:2406.16633

    cs.CV

    MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network

    Authors: Yuming Zhang, Shouxin Zhang, Peizhe Wang, Feiyu Zhu, Dongzhi Guan, Junhao Su, Jiabin Liu, Changpeng Cai

    Abstract: Deep neural networks (DNNs) typically employ an end-to-end (E2E) training paradigm which presents several challenges, including high GPU memory consumption, inefficiency, and difficulties in model parallelization during training. Recent research has sought to address these issues, with one promising approach being local learning. This method involves partitioning the backbone network into gradient…

    Submitted 15 August, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  21. arXiv:2406.14540

    cs.RO cs.AI cs.CV

    IRASim: Learning Interactive Real-Robot Action Simulators

    Authors: Fangqi Zhu, Hongtao Wu, Song Guo, Yuxiao Liu, Chilam Cheang, Tao Kong

    Abstract: Scalable robot learning in the real world is limited by the cost and safety issues of real robots. In addition, rolling out robot trajectories in the real world can be time-consuming and labor-intensive. In this paper, we propose to learn an interactive real-robot action simulator as an alternative. We introduce a novel method, IRASim, which leverages the power of generative models to generate ext…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Open source, project website: https://gen-irasim.github.io

  22. arXiv:2406.13294

    cs.MM cs.LG

    Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens

    Authors: Xikang Yang, Xuehai Tang, Fuqing Zhu, Jizhong Han, Songlin Hu

    Abstract: Vision-language models (VLMs) seamlessly integrate visual and textual data to perform tasks such as image classification, caption generation, and visual question answering. However, adversarial images often struggle to deceive all prompts effectively in the context of cross-prompt migration attacks, as the probability distribution of the tokens in these images tends to favor the semantics of the o…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages

  23. arXiv:2406.11497

    cs.CL

    CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG

    Authors: Boyi Deng, Wenjie Wang, Fengbin Zhu, Qifan Wang, Fuli Feng

    Abstract: Retrieval-Augmented Generation (RAG) can alleviate hallucinations of Large Language Models (LLMs) by referencing external documents. However, the misinformation in external documents may mislead LLMs' generation. To address this issue, we explore the task of "credibility-aware RAG", in which LLMs automatically adjust the influence of retrieved documents based on their credibility scores to counter…

    Submitted 27 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Under review

  24. arXiv:2406.10246

    cs.IR cs.AI

    Semantic-Enhanced Relational Metric Learning for Recommender Systems

    Authors: Mingming Li, Fuqing Zhu, Feng Yuan, Songlin Hu

    Abstract: Recently, relational metric learning methods have received great attention in the recommendation community, inspired by the translation mechanism in knowledge graphs. Unlike knowledge graphs, where entity-to-entity relations are given in advance, historical interactions in recommender systems lack explicit relations between users and items. Currently, many researchers have s…

    Submitted 7 June, 2024; originally announced June 2024.

  25. arXiv:2406.03736

    cs.LG cs.CL

    Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data

    Authors: Jingyang Ou, Shen Nie, Kaiwen Xue, Fengqi Zhu, Jiacheng Sun, Zhenguo Li, Chongxuan Li

    Abstract: Discrete diffusion models with absorbing processes have shown promise in language modeling. The key quantities to be estimated are the ratios between the marginal probabilities of two transitive states at all timesteps, called the concrete score. In this paper, we reveal that the concrete score in absorbing diffusion can be expressed as conditional probabilities of clean data, multiplied by a time…

    Submitted 6 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  26. arXiv:2406.00446

    cs.CV cs.AI

    GLCAN: Global-Local Collaborative Auxiliary Network for Local Learning

    Authors: Feiyu Zhu, Yuming Zhang, Changpeng Cai, Guinan Guo, Jiao Li, Xiuyuan Guo, Quanwei Zhang, Peizhe Wang, Chenghao He, Junhao Su

    Abstract: Traditional deep neural networks typically use end-to-end backpropagation, which often places a big burden on GPU memory. Another promising training method is local learning, which involves splitting the network into blocks and training them in parallel with the help of an auxiliary network. Local learning has been widely studied and applied to image classification tasks, and its performance is co…

    Submitted 1 June, 2024; originally announced June 2024.

  27. Knowledge Enhanced Multi-intent Transformer Network for Recommendation

    Authors: Ding Zou, Wei Wei, Feida Zhu, Chuanyu Xu, Tao Zhang, Chengfu Huo

    Abstract: Incorporating Knowledge Graphs into Recommendation has attracted growing attention in industry, due to the great potential of KG in providing abundant supplementary information and interpretability for the underlying models. However, simply integrating KG into recommendation usually brings in negative feedback in industry, because it ignores the following two factors: i) users' multiple inten…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by The Web Conf 2024 (WWW 2024) Industry Track. arXiv admin note: text overlap with arXiv:2204.08807

  28. arXiv:2405.18240

    cs.CV

    MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution

    Authors: Wenzhuo Liu, Fei Zhu, Shijie Ma, Cheng-Lin Liu

    Abstract: Although Vision Transformers (ViTs) have recently advanced computer vision tasks significantly, an important real-world problem was overlooked: adapting to variable input resolutions. Typically, images are resized to a fixed resolution, such as 224x224, for efficiency during training and inference. However, uniform input size conflicts with real-world scenarios where images naturally vary in resol…

    Submitted 28 May, 2024; originally announced May 2024.

  29. arXiv:2405.17790

    cs.CV

    Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

    Authors: Weizhen He, Yiheng Deng, Yunfeng Yan, Feng Zhu, Yizhou Wang, Lei Bai, Qingsong Xie, Donglian Qi, Wanli Ouyang, Shixiang Tang

    Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2306.07520

  30. arXiv:2405.16060

    cs.IT

    Delay-Effective Task Offloading Technology in Internet of Vehicles: From the Perspective of the Vehicle Platooning

    Authors: Kan Yu, Fuze Zhu, Xiaowu Liu, Zhiyong Feng, Dong Li

    Abstract: Task offloading technology plays a vital role in meeting the minimal-delay demands of the Internet of Vehicles (IoV) by jointly optimizing the heterogeneous computing resources supported by vehicles, roadside units (RSUs), and macro base stations (MBSs). On the one hand, previous works ignored the wireless interference arising during the exchange and sharing of task data. On the o…

    Submitted 25 May, 2024; originally announced May 2024.

  31. arXiv:2405.13868

    cs.LG cs.CL

    Automatically Identifying Local and Global Circuits with Linear Computation Graphs

    Authors: Xuyang Ge, Fukang Zhu, Wentao Shu, Junxuan Wang, Zhengfu He, Xipeng Qiu

    Abstract: Circuit analysis of a given model behavior is a central task in mechanistic interpretability. We introduce our circuit discovery pipeline with Sparse Autoencoders (SAEs) and a variant called Transcoders. With these two modules inserted into the model, the model's computation graph with respect to OV and MLP circuits becomes strictly linear. Our methods do not require linear approximation to co…

    Submitted 21 July, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  32. arXiv:2405.09459

    cs.CV cs.AI

    Fourier Boundary Features Network with Wider Catchers for Glass Segmentation

    Authors: Xiaolin Qin, Jiacen Liu, Qianlei Wang, Shaolin Zhang, Fei Zhu, Zhang Yi

    Abstract: Glass largely blurs the boundary between the real world and the reflection. The special transmittance and reflectance quality have confused the semantic tasks related to machine vision. Therefore, how to clear the boundary built by glass, and avoid over-capturing features as false positive information in deep structure, matters for constraining the segmentation of reflection surface and penetratin…

    Submitted 15 May, 2024; originally announced May 2024.

  33. arXiv:2405.07827

    cs.MM cs.AI cs.CV

    Automatic Recognition of Food Ingestion Environment from the AIM-2 Wearable Sensor

    Authors: Yuning Huang, Mohamed Abul Hassan, Jiangpeng He, Janine Higgins, Megan McCrory, Heather Eicher-Miller, Graham Thomas, Edward O Sazonov, Fengqing Maggie Zhu

    Abstract: Detecting an ingestion environment is an important aspect of monitoring dietary intake. It provides insightful information for dietary assessment. However, it is a challenging problem where human-based reviewing can be tedious, and algorithm-based review suffers from data imbalance and perceptual aliasing problems. To address these issues, we propose a neural network-based method with a two-stage…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted at CVPRw 2024

  34. Robust Beamforming with Gradient-based Liquid Neural Network

    Authors: Xinquan Wang, Fenghao Zhu, Chongwen Huang, Ahmed Alhammadi, Faouzi Bader, Zhaoyang Zhang, Chau Yuen, Merouane Debbah

    Abstract: Millimeter-wave (mmWave) multiple-input multiple-output (MIMO) communication with the advanced beamforming technologies is a key enabler to meet the growing demands of future mobile communication. However, the dynamic nature of cellular channels in large-scale urban mmWave MIMO communication scenarios brings substantial challenges, particularly in terms of complexity and robustness. To address the…

    Submitted 29 July, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Wireless Communications Letters

  35. arXiv:2405.07257   

    cs.CV

    Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation

    Authors: Changpeng Cai, Guinan Guo, Jiao Li, Junhao Su, Chenghao He, Jing Xiao, Yuanxu Chen, Lei Dai, Feiyu Zhu

    Abstract: Most earlier investigations on talking face generation have focused on the synchronization of lip motion and speech content. However, human head pose and facial emotions are equally important characteristics of natural human faces. While audio-driven talking face generation has seen notable advancements, existing methods either overlook facial emotions or are limited to specific individuals and ca…

    Submitted 27 August, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: Due to our negligence, there are factual errors in the experimental results, so we are considering resubmitting the paper after an overhaul

    ACM Class: I.4.5; I.4.9

  36. Beamforming Inferring by Conditional WGAN-GP for Holographic Antenna Arrays

    Authors: Fenghao Zhu, Xinquan Wang, Chongwen Huang, Ahmed Alhammadi, Hui Chen, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

    Abstract: The beamforming technology with large holographic antenna arrays is one of the key enablers for the next generation of wireless systems, which can significantly improve the spectral efficiency. However, the deployment of large antenna arrays implies high algorithm complexity and resource overhead at both receiver and transmitter ends. To address this issue, advanced technologies such as artificial…

    Submitted 15 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Journal ref: in IEEE Wireless Communications Letters, vol. 13, no. 7, pp. 2023-2027, July 2024

  37. arXiv:2405.00365

    cs.IT eess.SP

    Robust Continuous-Time Beam Tracking with Liquid Neural Network

    Authors: Fenghao Zhu, Xinquan Wang, Chongwen Huang, Richeng Jin, Qianqian Yang, Ahmed Alhammadi, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

    Abstract: Millimeter-wave (mmWave) technology is increasingly recognized as a pivotal technology of the sixth-generation communication networks due to the large amounts of available spectrum at high frequencies. However, the huge overhead associated with beam training imposes a significant challenge in mmWave communications, particularly in urban environments with high background noise. To reduce this high…

    Submitted 26 August, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 6 pages, 6 figures. Accepted by IEEE Global Communications Conference (GLOBECOM) 2024

  38. arXiv:2404.12638

    cs.AI

    Learning to Cut via Hierarchical Sequence/Set Model for Efficient Mixed-Integer Programming

    Authors: Jie Wang, Zhihai Wang, Xijun Li, Yufei Kuang, Zhihao Shi, Fangzhou Zhu, Mingxuan Yuan, Jia Zeng, Yongdong Zhang, Feng Wu

    Abstract: Cutting planes (cuts) play an important role in solving mixed-integer linear programs (MILPs), which formulate many important real-world applications. Cut selection heavily depends on (P1) which cuts to prefer and (P2) how many cuts to select. Although modern MILP solvers tackle (P1)-(P2) by human-designed heuristics, machine learning carries the potential to learn more effective heuristics. Howev…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2302.00244

  39. arXiv:2404.12257

    cs.CV cs.AI cs.LG cs.MM eess.IV

    Food Portion Estimation via 3D Object Scaling

    Authors: Gautham Vinod, Jiangpeng He, Zeman Shao, Fengqing Zhu

    Abstract: Image-based methods to analyze food images have alleviated the user burden and biases associated with traditional methods. However, accurate portion estimation remains a major challenge due to the loss of 3D information in the 2D representation of foods captured by smartphone cameras or wearable devices. In this paper, we propose a new framework to estimate both food volume and energy from 2D imag…

    Submitted 18 April, 2024; originally announced April 2024.

  40. arXiv:2404.11180

    cs.IR

    Causal Deconfounding via Confounder Disentanglement for Dual-Target Cross-Domain Recommendation

    Authors: Jiajie Zhu, Yan Wang, Feng Zhu, Zhu Sun

    Abstract: In recent years, dual-target Cross-Domain Recommendation (CDR) has been proposed to capture comprehensive user preferences in order to ultimately enhance the recommendation accuracy in both data-richer and data-sparser domains simultaneously. However, in addition to users' true preferences, the user-item interactions might also be affected by confounders (e.g., free shipping, sales promotion). As…

    Submitted 9 July, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  41. arXiv:2404.11068

    cs.LG cs.AI cs.DC q-bio.QM

    ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours

    Authors: Feiwen Zhu, Arkadiusz Nowaczynski, Rundong Li, Jie Xin, Yifei Song, Michal Marcinkiewicz, Sukru Burc Eryilmaz, Jun Yang, Michael Andersch

    Abstract: AlphaFold2 has been hailed as a breakthrough in protein folding. It can rapidly predict protein structures with lab-grade accuracy. However, its implementation does not include the necessary training code. OpenFold is the first trainable public reimplementation of AlphaFold. The AlphaFold training procedure is prohibitively time-consuming, and gets diminishing benefits from scaling to more compute res…

    Submitted 17 April, 2024; originally announced April 2024.

  42. arXiv:2404.07507

    eess.IV cs.CV

    Learning to Classify New Foods Incrementally Via Compressed Exemplars

    Authors: Justin Yang, Zhihao Duan, Jiangpeng He, Fengqing Zhu

    Abstract: Food image classification systems play a crucial role in health monitoring and diet tracking through image-based dietary assessment techniques. However, existing food recognition systems rely on static datasets characterized by a pre-defined fixed number of food classes. This contrasts drastically with the reality of food consumption, which features constantly changing data. Therefore, food image…

    Submitted 11 April, 2024; originally announced April 2024.

  43. arXiv:2404.06270  [pdf, other]

    cs.CV

    3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis

    Authors: Zhicheng Lu, Xiang Guo, Le Hui, Tianrui Chen, Min Yang, Xiao Tang, Feng Zhu, Yuchao Dai

    Abstract: In this paper, we propose a 3D geometry-aware deformable Gaussian Splatting method for dynamic view synthesis. Existing neural radiance fields (NeRF) based solutions learn the deformation in an implicit manner, which cannot incorporate 3D scene geometry. Therefore, the learned deformation is not necessarily geometrically coherent, which results in unsatisfactory dynamic view synthesis and 3D dynam…

    Submitted 14 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024. Project page: https://npucvr.github.io/GaGS/

  44. arXiv:2404.04476  [pdf, other]

    cs.LG cs.CV

    DELTA: Decoupling Long-Tailed Online Continual Learning

    Authors: Siddeshwar Raghavan, Jiangpeng He, Fengqing Zhu

    Abstract: A significant challenge in achieving ubiquitous Artificial Intelligence is the limited ability of models to rapidly learn new information in real-world scenarios where data follows long-tailed distributions, all while avoiding forgetting previously acquired knowledge. In this work, we study the under-explored problem of Long-Tailed Online Continual Learning (LTOCL), which aims to learn new tasks f…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted to a CVPR Workshop (archival track)

  45. arXiv:2404.00681  [pdf, other]

    cs.CL

    CoUDA: Coherence Evaluation via Unified Data Augmentation

    Authors: Dawei Zhu, Wenhao Wu, Yifan Song, Fangwei Zhu, Ziqiang Cao, Sujian Li

    Abstract: Coherence evaluation aims to assess the organization and structure of a discourse, which remains challenging even in the era of large language models. Due to the scarcity of annotated data, data augmentation is commonly used for training coherence evaluation models. However, previous augmentations for this task primarily rely on heuristic rules and lack design criteria to guide them. In this pape…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: NAACL 2024

  46. arXiv:2403.18535  [pdf, other]

    eess.IV cs.LG

    Theoretical Bound-Guided Hierarchical VAE for Neural Image Codecs

    Authors: Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu

    Abstract: Recent studies reveal a significant theoretical link between variational autoencoders (VAEs) and rate-distortion theory, notably in utilizing VAEs to estimate the theoretical upper bound of the information rate-distortion function of images. Such estimated theoretical bounds substantially exceed the performance of existing neural image codecs (NICs). To narrow this gap, we propose a theoretical bo…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 2024 IEEE International Conference on Multimedia and Expo (ICME2024)

  47. arXiv:2403.18294  [pdf, other]

    cs.CV

    Multi-scale Unified Network for Image Classification

    Authors: Wenzhuo Liu, Fei Zhu, Cheng-Lin Liu

    Abstract: Convolutional Neural Networks (CNNs) have advanced significantly in visual representation learning and recognition. However, they face notable challenges in performance and computational efficiency when dealing with real-world, multi-scale image inputs. Conventional methods rescale all input images to a fixed size, wherein a larger fixed size favors performance but rescaling small images to…

    Submitted 27 March, 2024; originally announced March 2024.

  48. arXiv:2403.18291  [pdf, other]

    cs.CV

    Towards Non-Exemplar Semi-Supervised Class-Incremental Learning

    Authors: Wenzhuo Liu, Fei Zhu, Cheng-Lin Liu

    Abstract: Deep neural networks perform remarkably well in closed-world scenarios. However, novel classes emerge continually in real applications, making it necessary to learn incrementally. Class-incremental learning (CIL) aims to gradually recognize new classes while maintaining the discriminability of old ones. Existing CIL methods have two limitations: a heavy reliance on preserving old data for forgetti…

    Submitted 27 March, 2024; originally announced March 2024.

  49. arXiv:2403.18266  [pdf, other]

    cs.LG cs.CV

    Branch-Tuning: Balancing Stability and Plasticity for Continual Self-Supervised Learning

    Authors: Wenzhuo Liu, Fei Zhu, Cheng-Lin Liu

    Abstract: Self-supervised learning (SSL) has emerged as an effective paradigm for deriving general representations from vast amounts of unlabeled data. However, as real-world applications continually integrate new content, the high computational and resource demands of SSL necessitate continual learning rather than complete retraining. This poses a challenge in striking a balance between stability and plast…

    Submitted 27 March, 2024; originally announced March 2024.

  50. arXiv:2403.17192  [pdf]

    cs.CV

    Strategies to Improve Real-World Applicability of Laparoscopic Anatomy Segmentation Models

    Authors: Fiona R. Kolbinger, Jiangpeng He, Jinge Ma, Fengqing Zhu

    Abstract: Accurate identification and localization of anatomical structures of varying size and appearance in laparoscopic imaging are necessary to leverage the potential of computer vision techniques for surgical decision support. Segmentation performance of such models is traditionally reported using metrics of overlap such as IoU. However, imbalanced and unrealistic representation of classes in the train…

    Submitted 15 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: 14 pages, 5 figures, 4 tables; accepted for the workshop "Data Curation and Augmentation in Medical Imaging" at CVPR 2024 (archival track)