Showing 1–50 of 311 results for author: Feng, W

Searching in archive cs.
  1. arXiv:2408.10464  [pdf, ps, other]

    cs.SI

    Improved Community Detection using Stochastic Block Models

    Authors: Minhyuk Park, Daniel Wang Feng, Siya Digra, The-Anh Vu-Le, George Chacko, Tandy Warnow

    Abstract: Community detection approaches resolve complex networks into smaller groups (communities) that are expected to be relatively edge-dense and well-connected. The stochastic block model (SBM) is one of several approaches used to uncover community structure in graphs. In this study, we demonstrate that SBM software applied to various real-world and synthetic networks produces poorly-connected to disco…

    Submitted 19 August, 2024; originally announced August 2024.

  2. arXiv:2408.09496  [pdf, other]

    cs.CV

    StyleBrush: Style Extraction and Transfer from a Single Image

    Authors: Wancheng Feng, Wanquan Feng, Dawei Huang, Jiaming Pei, Guangliang Cheng, Lukun Wang

    Abstract: Stylization for visual content aims to add specific style patterns at the pixel level while preserving the original structural features. Compared with using predefined styles, stylization guided by reference style images is more challenging, where the main difficulty is to effectively separate style from structural elements. In this paper, we propose StyleBrush, a method that accurately captures s…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 9 pages, 6 figures, Under Review

  3. arXiv:2408.06891  [pdf]

    cs.AI cs.CE cs.CV cs.LG

    Automatic Feature Recognition and Dimensional Attributes Extraction From CAD Models for Hybrid Additive-Subtractive Manufacturing

    Authors: Muhammad Tayyab Khan, Wenhe Feng, Lequn Chen, Ye Han Ng, Nicholas Yew Jin Tan, Seung Ki Moon

    Abstract: The integration of Computer-Aided Design (CAD), Computer-Aided Process Planning (CAPP), and Computer-Aided Manufacturing (CAM) plays a crucial role in modern manufacturing, facilitating seamless transitions from digital designs to physical products. However, a significant challenge within this integration is the Automatic Feature Recognition (AFR) of CAD models, especially in the context of hybrid…

    Submitted 14 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 10 pages, 12 figures. This paper has been accepted for presentation at the ASME IDETC-CIE 2024 conference

  4. arXiv:2408.03505  [pdf, other]

    cs.CL cs.AI cs.DC

    Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation

    Authors: Weiqi Feng, Yangrui Chen, Shaoyu Wang, Yanghua Peng, Haibin Lin, Minlan Yu

    Abstract: Multimodal large language models (MLLMs) have extended the success of large language models (LLMs) to multiple data types, such as image, text and audio, achieving significant performance in various domains, including multimodal translation, visual question answering and content generation. Nonetheless, existing systems are inefficient to train MLLMs due to substantial GPU bubbles caused by the he…

    Submitted 6 August, 2024; originally announced August 2024.

  5. arXiv:2408.00418  [pdf, other]

    cs.CV

    Towards Reliable Advertising Image Generation Using Human Feedback

    Authors: Zhenbang Du, Wei Feng, Haohan Wang, Yaoyu Li, Jingsen Wang, Jian Li, Zheng Zhang, Jingjing Lv, Xin Zhu, Junsheng Jin, Junjie Shen, Zhangang Lin, Jingping Shao

    Abstract: In the e-commerce realm, compelling advertising images are pivotal for attracting customer attention. While generative models automate image generation, they often produce substandard images that may mislead customers and require significant labor costs to inspect. This paper delves into increasing the rate of available generated images. We first introduce a multi-modal Reliable Feedback Network (…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ECCV2024

  6. arXiv:2407.18690  [pdf, other]

    cs.AI

    Collaborative Evolving Strategy for Automatic Data-Centric Development

    Authors: Xu Yang, Haotian Chen, Wenjun Feng, Haoxue Wang, Zeqi Ye, Xinjie Shen, Xiao Yang, Shizhao Sun, Weiqing Liu, Jiang Bian

    Abstract: Artificial Intelligence (AI) significantly influences many fields, largely thanks to the vast amounts of high-quality data for machine learning models. The emphasis is now on a data-centric AI strategy, prioritizing data development over model design progress. Automating this process is crucial. In this paper, we serve as the first work to introduce the automatic data-centric development (AD^2) ta…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 23 pages, 7 figures

  7. arXiv:2407.14047  [pdf, other]

    cs.CV cs.AI

    OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking

    Authors: Zekun Qian, Ruize Han, Wei Feng, Junhui Hou, Linqi Song, Song Wang

    Abstract: We study a novel yet practical problem of open-corpus multi-object tracking (OCMOT), which extends the MOT into localizing, associating, and recognizing generic-category objects of both seen (base) and unseen (novel) classes, but without the category text list as prompt. To study this problem, the top priority is to build a benchmark. In this work, we build OCTrackB, a large-scale and comprehensiv…

    Submitted 19 July, 2024; originally announced July 2024.

  8. arXiv:2407.13219  [pdf, other]

    cs.CV

    Multi-sentence Video Grounding for Long Video Generation

    Authors: Wei Feng, Xin Wang, Hong Chen, Zeyang Zhang, Wenwu Zhu

    Abstract: Video generation has witnessed great success recently, but its application in generating long videos still remains challenging due to the difficulty in maintaining the temporal consistency of generated videos and the high memory cost during generation. To tackle the problems, in this paper, we propose a brave and new idea of Multi-sentence Video Grounding for Long Video Generation, connecting th…

    Submitted 18 July, 2024; originally announced July 2024.

  9. arXiv:2407.12257  [pdf, other]

    cs.CV

    Compound Expression Recognition via Multi Model Ensemble for the ABAW7 Challenge

    Authors: Xuxiong Liu, Kang Shen, Jun Yao, Boyan Wang, Minrui Liu, Liuwei An, Zishun Cui, Weijie Feng, Xiao Sun

    Abstract: Compound Expression Recognition (CER) is vital for effective interpersonal interactions. Human emotional expressions are inherently complex due to the presence of compound expressions, requiring the consideration of both local and global facial cues for accurate judgment. In this paper, we propose an ensemble learning-based solution to address this complexity. Our approach involves training three…

    Submitted 26 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.12572 by other authors

  10. arXiv:2407.08214  [pdf, other]

    cs.LG cs.AI

    Towards stable training of parallel continual learning

    Authors: Li Yuepan, Fan Lyu, Yuyang Li, Wei Feng, Guangcan Liu, Fanhua Shang

    Abstract: Parallel Continual Learning (PCL) tasks investigate the training methods for continual learning with multi-source input, where data from different tasks are learned as they arrive. PCL offers high training efficiency and is well-suited for complex multi-source data systems, such as autonomous vehicles equipped with multiple sensors. However, at any time, multiple tasks need to be trained simultane…

    Submitted 11 July, 2024; originally announced July 2024.

  11. arXiv:2407.05054  [pdf]

    cs.CL

    Cross-Lingual Word Alignment for ASEAN Languages with Contrastive Learning

    Authors: Jingshen Zhang, Xinying Qiu, Teng Shen, Wenyu Wang, Kailin Zhang, Wenhe Feng

    Abstract: Cross-lingual word alignment plays a crucial role in various natural language processing tasks, particularly for low-resource languages. A recent study proposes a BiLSTM-based encoder-decoder model that outperforms pre-trained language models in low-resource settings. However, their model only considers the similarity of word embedding spaces and does not explicitly model the differences between wor…

    Submitted 6 July, 2024; originally announced July 2024.

  12. arXiv:2407.04672  [pdf, ps, other]

    cs.DS math.PR

    Rapid Mixing via Coupling Independence for Spin Systems with Unbounded Degree

    Authors: Xiaoyu Chen, Weiming Feng

    Abstract: We develop a new framework to prove the mixing or relaxation time for the Glauber dynamics on spin systems with unbounded degree. It works for general spin systems including both $2$-spin and multi-spin systems. As applications for this approach: $\bullet$ We prove the optimal $O(n)$ relaxation time for the Glauber dynamics of random $q$-list-coloring on an $n$-vertices triangle-tree graph with…

    Submitted 5 July, 2024; originally announced July 2024.

  13. arXiv:2407.04663  [pdf, other]

    cs.CV cs.LG

    Unsupervised 4D Cardiac Motion Tracking with Spatiotemporal Optical Flow Networks

    Authors: Long Teng, Wei Feng, Menglong Zhu, Xinchao Li

    Abstract: Cardiac motion tracking from echocardiography can be used to estimate and quantify myocardial motion within a cardiac cycle. It is a cost-efficient and effective approach for assessing myocardial function. However, ultrasound imaging has the inherent characteristics of spatially low resolution and temporally random noise, which leads to difficulties in obtaining reliable annotation. Thus it is dif…

    Submitted 5 July, 2024; originally announced July 2024.

  14. arXiv:2407.04576  [pdf, other]

    cs.DM cs.DS math.PR

    Optimal Mixing for Randomly Sampling Edge Colorings on Trees Down to the Max Degree

    Authors: Charlie Carlson, Xiaoyu Chen, Weiming Feng, Eric Vigoda

    Abstract: We address the convergence rate of Markov chains for randomly generating an edge coloring of a given tree. Our focus is on the Glauber dynamics which updates the color at a randomly chosen edge in each step. For a tree $T$ with $n$ vertices and maximum degree $\Delta$, when the number of colors $q$ satisfies $q\geq\Delta+2$ then we prove that the Glauber dynamics has an optimal relaxation time of $O(n)$, wh…

    Submitted 5 July, 2024; originally announced July 2024.
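
    Illustration (not taken from the paper): the abstract of entry 14 describes Glauber dynamics as a chain that updates the color of a randomly chosen edge in each step. The Python sketch below shows one common heat-bath formulation of such a step, assuming the tree is given as an edge list together with a proper starting coloring. The function names, the example tree, and the hand-picked starting coloring are all hypothetical, and the sketch says nothing about the paper's relaxation-time analysis.

```python
import random


def adjacent_colors(edges, coloring, i):
    """Colors on edges that share an endpoint with edge i."""
    u, v = edges[i]
    return {
        coloring[j]
        for j, (a, b) in enumerate(edges)
        if j != i and (a in (u, v) or b in (u, v))
    }


def is_proper(edges, coloring):
    """True if no two edges sharing an endpoint have the same color."""
    return all(
        coloring[i] not in adjacent_colors(edges, coloring, i)
        for i in range(len(edges))
    )


def glauber_step(edges, coloring, q, rng):
    """Recolor one uniformly random edge with a uniformly random allowed color."""
    i = rng.randrange(len(edges))
    blocked = adjacent_colors(edges, coloring, i)
    # The edge's current color is never blocked in a proper coloring,
    # so the list of allowed colors is always non-empty.
    allowed = [c for c in range(q) if c not in blocked]
    coloring[i] = rng.choice(allowed)


# Hypothetical example: a small tree with maximum degree 3 and q = 3 + 2 colors.
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (3, 6)]
q = 5
coloring = [0, 1, 2, 1, 2, 0]  # a proper starting coloring, chosen by hand
rng = random.Random(0)
for _ in range(1000):
    glauber_step(edges, coloring, q, rng)
print(is_proper(edges, coloring))
```

    Each step keeps the coloring proper, because the new color is drawn only from colors not used on neighboring edges; the check at the end confirms this for the example run.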

  15. arXiv:2406.19016  [pdf, other]

    cs.RO

    Robust Multi-Robot Global Localization with Unknown Initial Pose based on Neighbor Constraints

    Authors: Yaojie Zhang, Haowen Luo, Weijun Wang, Wei Feng

    Abstract: Multi-robot global localization (MR-GL) with unknown initial positions in a large-scale environment is a challenging task. The key point is the data association between different robots' viewpoints. It also makes traditional appearance-based localization methods unusable. Recently, researchers have utilized the object's semantic invariance to generate a semantic graph to address this issue. Howeve…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 7 pages (6+1), accepted by ICRA 2024

  16. arXiv:2406.18079  [pdf, other]

    cs.CV eess.IV

    MFDNet: Multi-Frequency Deflare Network for Efficient Nighttime Flare Removal

    Authors: Yiguo Jiang, Xuhang Chen, Chi-Man Pun, Shuqiang Wang, Wei Feng

    Abstract: When light is scattered or reflected accidentally in the lens, flare artifacts may appear in the captured photos, affecting the photos' visual quality. The main challenge in flare removal is to eliminate various flare artifacts while preserving the original content of the image. To address this challenge, we propose a lightweight Multi-Frequency Deflare Network (MFDNet) based on the Laplacian Pyra…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by The Visual Computer journal

  17. arXiv:2406.08656  [pdf, other]

    cs.CV cs.AI cs.CL

    TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

    Authors: Weixi Feng, Jiachen Li, Michael Saxon, Tsu-jui Fu, Wenhu Chen, William Yang Wang

    Abstract: Video generation has many unique challenges beyond those of image generation. The temporal dimension introduces extensive possible variations across frames, over which consistency and continuity may be violated. In this study, we move beyond evaluating simple actions and argue that generated videos should incorporate the emergence of new concepts and their relation transitions like in real-world v…

    Submitted 12 June, 2024; originally announced June 2024.

  18. arXiv:2406.08407  [pdf, other]

    cs.CV cs.AI cs.CL

    MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

    Authors: Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang

    Abstract: Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multi…

    Submitted 29 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  19. arXiv:2406.02974  [pdf]

    cs.CL

    Readability-guided Idiom-aware Sentence Simplification (RISS) for Chinese

    Authors: Jingshen Zhang, Xinglu Chen, Xinying Qiu, Zhimin Wang, Wenhe Feng

    Abstract: Chinese sentence simplification faces challenges due to the lack of large-scale labeled parallel corpora and the prevalence of idioms. To address these challenges, we propose Readability-guided Idiom-aware Sentence Simplification (RISS), a novel framework that combines data augmentation techniques with lexical simplification. RISS introduces two key components: (1) Readability-guided Paraphrase Se…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to the 23rd China National Conference on Computational Linguistics (CCL 2024)

  20. arXiv:2406.01112  [pdf, other]

    cs.CV

    BACON: Bayesian Optimal Condensation Framework for Dataset Distillation

    Authors: Zheng Zhou, Hongbo Zhao, Guangliang Cheng, Xiangtai Li, Shuchang Lyu, Wenquan Feng, Qi Zhao

    Abstract: Dataset Distillation (DD) aims to distill knowledge from extensive datasets into more compact ones while preserving performance on the test set, thereby reducing storage costs and training expenses. However, existing methods often suffer from computational intensity, particularly exhibiting suboptimal performance with large dataset sizes due to the lack of a robust theoretical framework for analyz…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 22 pages, 10 figures

  21. arXiv:2405.20787  [pdf, other]

    cs.CL

    PGA-SciRE: Harnessing LLM on Data Augmentation for Enhancing Scientific Relation Extraction

    Authors: Yang Zhou, Shimin Shan, Hongkui Wei, Zhehuan Zhao, Wenshuo Feng

    Abstract: Relation Extraction (RE) aims at recognizing the relation between pairs of entities mentioned in a text. Advances in LLMs have had a tremendous impact on NLP. In this work, we propose a textual data augmentation framework called PGA for improving the performance of models for RE in the scientific domain. The framework introduces two ways of data augmentation, utilizing an LLM to obtain pseudo-sampl…

    Submitted 30 May, 2024; originally announced May 2024.

  22. arXiv:2405.18750  [pdf, other]

    cs.CV

    T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

    Authors: Jiachen Li, Weixi Feng, Tsu-Jui Fu, Xinyi Wang, Sugato Basu, Wenhu Chen, William Yang Wang

    Abstract: Diffusion-based text-to-video (T2V) models have achieved significant success but continue to be hampered by the slow sampling speed of their iterative sampling processes. To address the challenge, consistency models have been proposed to facilitate fast inference, albeit at the cost of sample quality. In this work, we aim to break the quality bottleneck of a video consistency model (VCM) to achiev…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Project page: https://t2v-turbo.github.io/

  23. arXiv:2405.17457  [pdf, other]

    cs.CV cs.DC cs.LG

    Data-Free Federated Class Incremental Learning with Diffusion-Based Generative Memory

    Authors: Naibo Wang, Yuchen Deng, Wenjie Feng, Jianwei Yin, See-Kiong Ng

    Abstract: Federated Class Incremental Learning (FCIL) is a critical yet largely underexplored issue that deals with the dynamic incorporation of new classes within federated learning (FL). Existing methods often employ generative adversarial networks (GANs) to produce synthetic images to address privacy concerns in FL. However, GANs exhibit inherent instability and high sensitivity, compromising the effecti…

    Submitted 22 May, 2024; originally announced May 2024.

  24. arXiv:2405.14602  [pdf, other]

    cs.LG

    Controllable Continual Test-Time Adaptation

    Authors: Ziqi Shi, Fan Lyu, Ye Liu, Fanhua Shang, Fuyuan Hu, Wei Feng, Zhang Zhang, Liang Wang

    Abstract: Continual Test-Time Adaptation (CTTA) is an emerging and challenging task where a model trained in a source domain must adapt to continuously changing conditions during testing, without access to the original source data. CTTA is prone to error accumulation due to uncontrollable domain shifts, leading to blurred decision boundaries between categories. Existing CTTA methods primarily focus on suppr…

    Submitted 28 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  25. arXiv:2405.12652  [pdf, other]

    cs.NI eess.SP

    Edge Information Hub-Empowered 6G NTN: Latency-Oriented Resource Orchestration and Configuration

    Authors: Yueshan Lin, Wei Feng, Yunfei Chen, Ning Ge, Zhiyong Feng, Yue Gao

    Abstract: Quick response to disasters is crucial for saving lives and reducing loss. This requires low-latency uploading of situation information to the remote command center. Since terrestrial infrastructures are often damaged in disaster areas, non-terrestrial networks (NTNs) are preferable to provide network coverage, and mobile edge computing (MEC) could be integrated to improve the latency performance.…

    Submitted 21 May, 2024; originally announced May 2024.

  26. arXiv:2405.11135  [pdf, other]

    cs.CR

    AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA

    Authors: Weitao Feng, Wenbo Zhou, Jiyan He, Jie Zhang, Tianyi Wei, Guanlin Li, Tianwei Zhang, Weiming Zhang, Nenghai Yu

    Abstract: Diffusion models have achieved remarkable success in generating high-quality images. Recently, the open-source models represented by Stable Diffusion (SD) are thriving and are accessible for customization, giving rise to a vibrant community of creators and enthusiasts. However, the widespread availability of customized SD models has led to copyright concerns, like unauthorized model distribution a…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Code is available at https://github.com/Georgefwt/AquaLoRA

  27. arXiv:2405.09133  [pdf, other]

    cs.LG

    Overcoming Domain Drift in Online Continual Learning

    Authors: Fan Lyu, Daofeng Liu, Linglan Zhao, Zhang Zhang, Fanhua Shang, Fuyuan Hu, Wei Feng, Liang Wang

    Abstract: Online Continual Learning (OCL) empowers machine learning models to acquire new knowledge online across a sequence of tasks. However, OCL faces a significant challenge: catastrophic forgetting, wherein the model learned in previous tasks is substantially overwritten upon encountering new tasks, leading to a biased forgetting of prior knowledge. Moreover, the continual domain drift in sequential lea…

    Submitted 15 May, 2024; originally announced May 2024.

  28. arXiv:2405.05144  [pdf, other]

    cs.CY cs.LG

    Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank

    Authors: Alexander Scarlatos, Wanyong Feng, Digory Smith, Simon Woodhead, Andrew Lan

    Abstract: Multiple-choice questions (MCQs) are commonly used across all levels of math education since they can be deployed and graded at a large scale. A critical component of MCQs is the distractors, i.e., incorrect answers crafted to reflect student errors or misconceptions. Automatically generating them in math MCQs, e.g., with large language models, has been challenging. In this work, we propose a nove…

    Submitted 13 May, 2024; v1 submitted 18 April, 2024; originally announced May 2024.

    Comments: BEA workshop NAACL 2024

  29. arXiv:2404.13903  [pdf, other]

    cs.CV

    Accelerating Image Generation with Sub-path Linear Approximation Model

    Authors: Chen Xu, Tianhui Song, Weixin Feng, Xubin Li, Tiezheng Ge, Bo Zheng, Limin Wang

    Abstract: Diffusion models have significantly advanced the state of the art in image, audio, and video generation tasks. However, their applications in practical scenarios are hindered by slow inference speed. Drawing inspiration from the approximation strategies utilized in consistency models, we propose the Sub-path Linear Approximation Model (SLAM), which accelerates diffusion models while maintaining hi…

    Submitted 21 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  30. arXiv:2404.12400  [pdf, other]

    cs.LG

    Efflex: Efficient and Flexible Pipeline for Spatio-Temporal Trajectory Graph Modeling and Representation Learning

    Authors: Ming Cheng, Ziyi Zhou, Bowen Zhang, Ziyu Wang, Jiaqi Gan, Ziang Ren, Weiqi Feng, Yi Lyu, Hefan Zhang, Xingjian Diao

    Abstract: In the landscape of spatio-temporal data analytics, effective trajectory representation learning is paramount. To bridge the gap of learning accurate representations with efficient and flexible mechanisms, we introduce Efflex, a comprehensive pipeline for transformative graph modeling and representation learning of the large-volume spatio-temporal trajectories. Efflex pioneers the incorporation of…

    Submitted 15 April, 2024; originally announced April 2024.

  31. arXiv:2404.12130  [pdf, other]

    cs.LG cs.CV cs.DC

    One-Shot Sequential Federated Learning for Non-IID Data by Enhancing Local Model Diversity

    Authors: Naibo Wang, Yuchen Deng, Wenjie Feng, Shichen Fan, Jianwei Yin, See-Kiong Ng

    Abstract: Traditional federated learning mainly focuses on parallel settings (PFL), which can suffer significant communication and computation costs. In contrast, one-shot and sequential federated learning (SFL) have emerged as innovative paradigms to alleviate these costs. However, the issue of non-IID (Independent and Identically Distributed) data persists as a significant challenge in one-shot and SFL se…

    Submitted 18 April, 2024; originally announced April 2024.

  32. Accelerating Geo-distributed Machine Learning with Network-Aware Adaptive Tree and Auxiliary Route

    Authors: Zonghang Li, Wenjiao Feng, Weibo Cai, Hongfang Yu, Long Luo, Gang Sun, Hongyang Du, Dusit Niyato

    Abstract: Distributed machine learning is becoming increasingly popular for geo-distributed data analytics, facilitating the collaborative analysis of data scattered across data centers in different regions. This paradigm eliminates the need for centralizing sensitive raw data in one location but faces the significant challenge of high parameter synchronization delays, which stems from the constraints of ba…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 17 pages, 20 figures

    MSC Class: 68T99 ACM Class: I.2.11; C.2.4

  33. arXiv:2404.11276  [pdf, other]

    cs.AI q-fin.GN

    Towards Data-Centric Automatic R&D

    Authors: Haotian Chen, Xinjie Shen, Zeqi Ye, Wenjun Feng, Haoxue Wang, Xiao Yang, Xu Yang, Weiqing Liu, Jiang Bian

    Abstract: The progress of humanity is driven by those successful discoveries accompanied by countless failed experiments. Researchers often seek the potential research directions by reading and then verifying them through experiments. The process imposes a significant burden on researchers. In the past decade, the data-driven black-box deep learning method has demonstrated its effectiveness in a wide range…

    Submitted 30 July, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 17 pages, 3 figures

  34. arXiv:2404.11111  [pdf, other]

    cs.CV

    CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation

    Authors: Lianyu Hu, Wei Feng, Liqing Gao, Zekang Liu, Liang Wan

    Abstract: In sign language, the conveyance of human body trajectories predominantly relies upon the coordinated movements of hands and facial expressions across successive frames. Despite the recent advancements of sign language understanding methods, they often solely focus on individual frames, inevitably overlooking the inter-frame correlations that are essential for effectively modeling human body traje…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.03202

  35. arXiv:2404.09245  [pdf, other]

    cs.MM cs.CV

    Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics

    Authors: Haosong Peng, Wei Feng, Hao Li, Yufeng Zhan, Qihua Zhou, Yuanqing Xia

    Abstract: The advent of edge computing has made real-time intelligent video analytics feasible. Previous works, based on traditional model architecture (e.g., CNN, RNN, etc.), employ various strategies to filter out non-region-of-interest content to minimize bandwidth and computation consumption but show inferior performance in adverse environments. Recently, visual foundation models based on transformers h…

    Submitted 14 April, 2024; originally announced April 2024.

  36. arXiv:2404.08567  [pdf, other]

    cs.CL cs.AI

    CATP: Cross-Attention Token Pruning for Accuracy Preserved Multimodal Model Inference

    Authors: Ruqi Liao, Chuqing Zhao, Jin Li, Weiqi Feng

    Abstract: In response to the rising interest in large multimodal models, we introduce Cross-Attention Token Pruning (CATP), a precision-focused token pruning method. Our approach leverages cross-attention layers in multimodal models, exemplified by BLIP-2, to extract valuable information for token importance determination. CATP employs a refined voting strategy across model heads and layers. In evaluations,…

    Submitted 2 April, 2024; originally announced April 2024.

  37. arXiv:2404.08226  [pdf, other]

    cs.CV

    Improving Continuous Sign Language Recognition with Adapted Image Models

    Authors: Lianyu Hu, Tongkai Shi, Liqing Gao, Zekang Liu, Wei Feng

    Abstract: The increase of web-scale weakly labelled image-text pairs has greatly facilitated the development of large-scale vision-language models (e.g., CLIP), which have shown impressive generalization performance over a series of downstream tasks. However, the massive model size and scarcity of available data limit their applications to fine-tune the whole model in downstream tasks. Besides, fully fine-…

    Submitted 11 April, 2024; originally announced April 2024.

  38. arXiv:2404.08021  [pdf, other]

    cs.LG cs.AI cs.RO

    VeTraSS: Vehicle Trajectory Similarity Search Through Graph Modeling and Representation Learning

    Authors: Ming Cheng, Bowen Zhang, Ziyu Wang, Ziyi Zhou, Weiqi Feng, Yi Lyu, Xingjian Diao

    Abstract: Trajectory similarity search plays an essential role in autonomous driving, as it enables vehicles to analyze the information and characteristics of different trajectories to make informed decisions and navigate safely in dynamic environments. Existing work on the trajectory similarity search task primarily utilizes sequence-processing algorithms or Recurrent Neural Networks (RNNs), which suffer f…

    Submitted 11 April, 2024; originally announced April 2024.

  39. arXiv:2404.06247  [pdf, other]

    cs.CV

    LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks

    Authors: Jianlang Chen, Xuhong Ren, Qing Guo, Felix Juefei-Xu, Di Lin, Wei Feng, Lei Ma, Jianjun Zhao

    Abstract: Visual object tracking plays a critical role in visual-based autonomous systems, as it aims to estimate the position and size of the object of interest within a live video. Despite significant progress made in this field, state-of-the-art (SOTA) trackers often fail when faced with adversarial perturbations in the incoming frames. This can lead to significant robustness and security issues when the…

    Submitted 9 April, 2024; originally announced April 2024.

  40. arXiv:2404.02124  [pdf, other]

    cs.CL

    Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models

    Authors: Wanyong Feng, Jaewook Lee, Hunter McNichols, Alexander Scarlatos, Digory Smith, Simon Woodhead, Nancy Otero Ornelas, Andrew Lan

    Abstract: Multiple-choice questions (MCQs) are ubiquitous in almost all levels of education since they are easy to administer, grade, and are a reliable format in assessments and practices. One of the most important aspects of MCQs is the distractors, i.e., incorrect options that are designed to target common errors or misconceptions among real students. To date, the task of crafting high-quality distractor…

    Submitted 18 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: NAACL 2024 findings

  41. arXiv:2403.18423  [pdf, other]

    cs.CL cs.LG

    SemRoDe: Macro Adversarial Training to Learn Representations That are Robust to Word-Level Attacks

    Authors: Brian Formento, Wenjie Feng, Chuan Sheng Foo, Luu Anh Tuan, See-Kiong Ng

    Abstract: Language models (LMs) are indispensable tools for natural language processing tasks, but their vulnerability to adversarial attacks remains a concern. While current research has explored adversarial training techniques, their improvements to defend against word-level attacks have been limited. In this work, we propose a novel approach called Semantic Robust Defence (SemRoDe), a Macro Adversarial T…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Published in NAACL 2024 (Main Track)

  42. arXiv:2403.12519  [pdf, other]

    cs.CV

    Dynamic Spatial-Temporal Aggregation for Skeleton-Aware Sign Language Recognition

    Authors: Lianyu Hu, Liqing Gao, Zekang Liu, Wei Feng

    Abstract: Skeleton-aware sign language recognition (SLR) has gained popularity due to its ability to remain unaffected by background information and its lower computational requirements. Current methods utilize spatial graph modules and temporal modules to capture spatial and temporal features, respectively. However, their spatial graph modules are typically built on fixed graph structures such as graph con…

    Submitted 19 March, 2024; originally announced March 2024.

  43. arXiv:2403.11027  [pdf, other]

    cs.CV cs.AI

    Reward Guided Latent Consistency Distillation

    Authors: Jiachen Li, Weixi Feng, Wenhu Chen, William Yang Wang

    Abstract: Latent Consistency Distillation (LCD) has emerged as a promising paradigm for efficient text-to-image synthesis. By distilling a latent consistency model (LCM) from a pre-trained teacher latent diffusion model (LDM), LCD facilitates the generation of high-fidelity images within merely 2 to 4 inference steps. However, the LCM's efficient inference is obtained at the cost of the sample quality. In t…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: Project page: https://rg-lcd.github.io/

  44. arXiv:2403.07630  [pdf, other]

    cs.CV cs.AI

    Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation

    Authors: Feilong Tang, Zhongxing Xu, Zhaojun Qu, Wei Feng, Xingjian Jiang, Zongyuan Ge

    Abstract: Recent weakly supervised semantic segmentation (WSSS) methods strive to incorporate contextual knowledge to improve the completeness of class activation maps (CAM). In this work, we argue that the knowledge bias between instances and contexts affects the capability of the prototype to sufficiently understand instance semantics. Inspired by prototype learning theory, we propose leveraging prototype…

    Submitted 12 March, 2024; originally announced March 2024.

  45. arXiv:2403.04261  [pdf]

    cs.AI cs.CL cs.LG

    Advancing Chinese biomedical text mining with community challenges

    Authors: Hui Zong, Rongrong Wu, Jiaxue Cha, Weizhe Feng, Erman Wu, Jiakun Li, Aibin Shao, Liang Tao, Zuofeng Li, Buzhou Tang, Bairong Shen

    Abstract: Objective: This study aims to review the recent advances in community challenges for biomedical text mining in China. Methods: We collected information of evaluation tasks released in community challenges of biomedical text mining, including task description, dataset description, data source, task type and related links. A systematic summary and comparative analysis were conducted on various biome…

    Submitted 29 August, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Journal ref: Journal of Biomedical Informatics. 2024;157:104716.

  46. arXiv:2403.03432  [pdf, other]

    cs.CL cs.AI

    Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models

    Authors: Wenfeng Feng, Chuzhan Hao, Yuewei Zhang, Yu Han, Hao Wang

    Abstract: Instruction Tuning has the potential to stimulate or enhance specific capabilities of large language models (LLMs). However, achieving the right balance of data is crucial to prevent catastrophic forgetting and interference between tasks. To address these limitations and enhance training flexibility, we propose the Mixture-of-LoRAs (MoA) architecture which is a novel and parameter-efficient tuning…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 10 pages, COLING24 Accepted

  47. arXiv:2402.07359  [pdf, other]

    cs.IT

    Structured Satellite-UAV-Terrestrial Networks for 6G Internet of Things

    Authors: Wei Feng, Yanmin Wang, Yunfei Chen, Ning Ge, Cheng-Xiang Wang

    Abstract: The upcoming sixth generation (6G) wireless communication network is envisioned to cover space, air, and maritime areas, in addition to urban-centered terrestrial coverage by the fifth generation (5G) network, to support intelligent Internet of Things (IoT). Towards this end, we investigate structured integration of satellites, unmanned aerial vehicles (UAVs), and terrestrial networks, aiming to s…

    Submitted 11 February, 2024; originally announced February 2024.

  48. arXiv:2402.07140  [pdf, other]

    cs.AI

    Graph Descriptive Order Improves Reasoning with Large Language Model

    Authors: Yuyao Ge, Shenghua Liu, Wenjie Feng, Lingrui Mei, Lizhe Chen, Xueqi Cheng

    Abstract: In recent years, large language models have achieved state-of-the-art performance across multiple domains. However, the progress in the field of graph reasoning with LLM remains limited. Our work delves into this gap by thoroughly investigating graph reasoning with LLMs. In this work, we reveal the impact of the order of graph description on LLMs' graph reasoning performance, which significantly a…

    Submitted 24 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  49. arXiv:2402.02108  [pdf, other]

    cs.CV

    From Synthetic to Real: Unveiling the Power of Synthetic Data for Video Person Re-ID

    Authors: Xiangqun Zhang, Ruize Han, Wei Feng

    Abstract: In this paper, we study a new problem of cross-domain video based person re-identification (Re-ID). Specifically, we take the synthetic video dataset as the source domain for training and use the real-world videos for testing, which significantly reduces the dependence on real training data collection and annotation. To unveil the power of synthetic data for video person Re-ID, we first propose a…

    Submitted 3 February, 2024; originally announced February 2024.

  50. arXiv:2401.17617  [pdf, other]

    cs.CV cs.AI

    Unveiling the Power of Self-supervision for Multi-view Multi-human Association and Tracking

    Authors: Wei Feng, Feifan Wang, Ruize Han, Zekun Qian, Song Wang

    Abstract: Multi-view multi-human association and tracking (MvMHAT), is a new but important problem for multi-person scene video surveillance, aiming to track a group of people over time in each view, as well as to identify the same person across different views at the same time, which is different from previous MOT and multi-camera MOT tasks only considering the over-time human tracking. This way, the video…

    Submitted 31 January, 2024; originally announced January 2024.