Showing 1–50 of 5,005 results for author: Wang, H

Searching in archive cs.
  1. arXiv:2408.17209  [pdf, other]

    cs.DB

    Updateable Data-Driven Cardinality Estimator with Bounded Q-error

    Authors: Yingze Li, Xianglong Liu, Hongzhi Wang, Kaixin Zhang, Zixuan Wang

    Abstract: Modern Cardinality Estimators struggle with data updates. This research tackles this challenge in the single-table setting. We introduce ICE, an Index-based Cardinality Estimator, the first data-driven estimator that enables instant, tuple-level updates. ICE has learned two key lessons from the multidimensional index and applied them to solve cardinality estimation in dynamic scenarios: (1) Index poss…

    Submitted 30 August, 2024; originally announced August 2024.

  2. arXiv:2408.16979  [pdf, other]

    cs.CV

    Cross Fusion RGB-T Tracking with Bi-directional Adapter

    Authors: Zhirong Zeng, Xiaotao Liu, Meng Sun, Hongyu Wang, Jing Liu

    Abstract: Many state-of-the-art RGB-T trackers have achieved remarkable results through modality fusion. However, these trackers often either overlook temporal information or fail to fully utilize it, resulting in an ineffective balance between multi-modal and temporal information. To address this issue, we propose a novel Cross Fusion RGB-T Tracking architecture (CFBT) that ensures the full participation o…

    Submitted 29 August, 2024; originally announced August 2024.

  3. arXiv:2408.16767  [pdf, other]

    cs.CV cs.AI cs.GR

    ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

    Authors: Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan

    Abstract: Advancements in 3D scene reconstruction have transformed 2D images from the real world into 3D models, producing realistic 3D results from hundreds of input photos. Despite great success in dense-view reconstruction scenarios, rendering a detailed scene from insufficient captured views is still an ill-posed optimization problem, often resulting in artifacts and distortions in unseen areas. In this…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Project page: https://liuff19.github.io/ReconX

  4. arXiv:2408.16766  [pdf, other]

    cs.CV

    CSGO: Content-Style Composition in Text-to-Image Generation

    Authors: Peng Xing, Haofan Wang, Yanpeng Sun, Qixun Wang, Xu Bai, Hao Ai, Renyuan Huang, Zechao Li

    Abstract: The diffusion model has shown exceptional capabilities in controlled image generation, which has further fueled interest in image style transfer. Existing works mainly focus on training-free methods (e.g., image inversion) due to the scarcity of specific data. In this study, we present a data construction pipeline for content-style-stylized image triplets that generates and automatically cle…

    Submitted 29 August, 2024; originally announced August 2024.

  5. arXiv:2408.16757  [pdf, other]

    cs.CV cs.AI

    Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks

    Authors: Hongjun Wang, Sagar Vaze, Kai Han

    Abstract: Detecting test-time distribution shift has emerged as a key capability for safely deployed machine learning models, with the question being tackled under various guises in recent years. In this paper, we aim to provide a consolidated view of the two largest sub-fields within the community: out-of-distribution (OOD) detection and open-set recognition (OSR). In particular, we aim to provide rigorous…

    Submitted 29 August, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted to IJCV, preprint version; v2: add supplementary

  6. CanCal: Towards Real-time and Lightweight Ransomware Detection and Response in Industrial Environments

    Authors: Shenao Wang, Feng Dong, Hangfeng Yang, Jingheng Xu, Haoyu Wang

    Abstract: Ransomware attacks have emerged as one of the most significant cybersecurity threats. Despite numerous proposed detection and defense methods, existing approaches face two fundamental limitations in large-scale industrial applications: intolerable system overheads and notorious alert fatigue. To address these challenges, we propose CanCal, a real-time and lightweight ransomware detection system. S…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: To appear in the 2024 ACM SIGSAC Conference on Computer and Communications Security (CCS'24), October 14–18, 2024, Salt Lake City

  7. arXiv:2408.16313  [pdf, other]

    cs.CV cs.AI

    FA-YOLO: Research On Efficient Feature Selection YOLO Improved Algorithm Based On FMDS and AGMF Modules

    Authors: Yukang Huo, Mingyuan Yao, Qingbin Tian, Tonghao Wang, Ruifeng Wang, Haihua Wang

    Abstract: Over the past few years, the YOLO series of models has emerged as one of the dominant methodologies in the realm of object detection. Many studies have advanced these baseline models by modifying their architectures, enhancing data quality, and developing new loss functions. However, current models still exhibit deficiencies in processing feature maps, such as overlooking the fusion of cross-scale…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 11 pages and 4 figures

  8. arXiv:2408.16307  [pdf, other]

    cs.RO cs.AI

    Safe Bayesian Optimization for High-Dimensional Control Systems via Additive Gaussian Processes

    Authors: Hongxuan Wang, Xiaocong Li, Adrish Bhaumik, Prahlad Vadakkepat

    Abstract: Controller tuning and optimization have been among the most fundamental problems in robotics and mechatronic systems. The traditional methodology is usually model-based, but its performance heavily relies on an accurate mathematical model of the system. In control applications with complex dynamics, obtaining a precise model is often challenging, leading us towards a data-driven approach. While op…

    Submitted 29 August, 2024; originally announced August 2024.

  9. arXiv:2408.16094  [pdf, ps, other]

    cs.DC

    Monadring: A lightweight consensus protocol to offer Validation-as-a-Service to AVS nodes

    Authors: Yu Zhang, Xiao Yan, Gang Tang, Helena Wang

    Abstract: Existing blockchain networks are often large-scale, requiring transactions to be synchronized across the entire network to reach consensus. On-chain computations can be prohibitively expensive, making many CPU-intensive computations infeasible. Inspired by the structure of IBM's token ring networks, we propose a lightweight consensus protocol called Monadring to address these issues. Monadring all…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 23 pages, 3 figures

  10. arXiv:2408.16061  [pdf, other]

    cs.CV

    3D Reconstruction with Spatial Memory

    Authors: Hengyi Wang, Lourdes Agapito

    Abstract: We present Spann3R, a novel approach for dense 3D reconstruction from ordered or unordered image collections. Built on the DUSt3R paradigm, Spann3R uses a transformer-based architecture to directly regress pointmaps from images without any prior knowledge of the scene or camera parameters. Unlike DUSt3R, which predicts per image-pair pointmaps each expressed in its local coordinate frame, Spann3R…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Project page: https://hengyiwang.github.io/projects/spanner

  11. arXiv:2408.15778  [pdf, other]

    cs.AI cs.CL

    LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models

    Authors: Jiayi Gui, Yiming Liu, Jiale Cheng, Xiaotao Gu, Xiao Liu, Hongning Wang, Yuxiao Dong, Jie Tang, Minlie Huang

    Abstract: Large Language Models (LLMs) have demonstrated notable capabilities across various tasks, showcasing complex problem-solving abilities. Understanding and executing complex rules, along with multi-step planning, are fundamental to logical reasoning and critical for practical LLM agents and decision-making systems. However, evaluating LLMs as effective rule-based executors and planners remains under…

    Submitted 28 August, 2024; originally announced August 2024.

  12. arXiv:2408.15207  [pdf, other]

    cs.SE

    Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks

    Authors: Shide Zhou, Tianlin Li, Kailong Wang, Yihao Huang, Ling Shi, Yang Liu, Haoyu Wang

    Abstract: The swift advancement of large language models (LLMs) has profoundly shaped the landscape of artificial intelligence; however, their deployment in sensitive domains raises grave concerns, particularly due to their susceptibility to malicious exploitation. This situation underscores the insufficiencies in pre-deployment testing, highlighting the urgent need for more rigorous and comprehensive evalu…

    Submitted 27 August, 2024; originally announced August 2024.

  13. arXiv:2408.14506  [pdf, other]

    cs.LG

    Distilling Long-tailed Datasets

    Authors: Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang, Kai Wang, Yan Yan

    Abstract: Dataset distillation (DD) aims to distill a small, information-rich dataset from a larger one for efficient neural network training. However, existing DD methods struggle with long-tailed datasets, which are prevalent in real-world scenarios. By investigating the reasons behind this unexpected result, we identified two main causes: 1) Expert networks trained on imbalanced data develop biased gradi…

    Submitted 24 August, 2024; originally announced August 2024.

  14. arXiv:2408.14505  [pdf, other]

    cs.LG cs.AI cs.CL

    Empowering Pre-Trained Language Models for Spatio-Temporal Forecasting via Decoupling Enhanced Discrete Reprogramming

    Authors: Hao Wang, Jindong Han, Wei Fan, Hao Liu

    Abstract: Spatio-temporal time series forecasting plays a critical role in various real-world applications, such as transportation optimization, energy management, and climate analysis. The recent advancements in Pre-trained Language Models (PLMs) have inspired efforts to reprogram these models for time series forecasting tasks, by leveraging their superior reasoning and generalization capabilities. However…

    Submitted 24 August, 2024; originally announced August 2024.

  15. arXiv:2408.14491  [pdf, other]

    cs.LG cs.MM

    Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review

    Authors: Clayton Cohn, Eduardo Davalos, Caleb Vatral, Joyce Horn Fonteles, Hanchen David Wang, Meiyi Ma, Gautam Biswas

    Abstract: Recent technological advancements have enhanced our ability to collect and analyze rich multimodal data (e.g., speech, video, and eye gaze) to better inform learning and training experiences. While previous reviews have focused on parts of the multimodal pipeline (e.g., conceptual models and data fusion), a comprehensive literature review on the methods informing multimodal learning and training e…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Submitted to ACM Computing Surveys. Currently under review

  16. arXiv:2408.14023  [pdf, other]

    cs.CV cs.AI

    Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos

    Authors: Jiajun Fei, Dian Li, Zhidong Deng, Zekun Wang, Gang Liu, Hui Wang

    Abstract: Multi-modal large language models (MLLMs) have demonstrated considerable potential across various downstream tasks that require cross-domain knowledge. MLLMs capable of processing videos, known as Video-MLLMs, have attracted broad interest in video-language understanding. However, videos, especially long videos, contain more visual tokens than images, making them difficult for LLMs to process. Exi…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures

  17. arXiv:2408.13987  [pdf, other]

    cs.CL cs.AI

    Focused Large Language Models are Stable Many-Shot Learners

    Authors: Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Heda Wang, Yao Hu, Kan Li

    Abstract: In-Context Learning (ICL) enables large language models (LLMs) to achieve rapid task adaptation by learning from demonstrations. With the increase in available context length of LLMs, recent experiments have shown that the performance of ICL does not necessarily scale well in many-shot (demonstration) settings. We theoretically and experimentally confirm that the reason lies in more demonstrations…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 15 pages

  18. arXiv:2408.13770  [pdf, other]

    cs.CV

    TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers

    Authors: Chuanrui Zhang, Yingshuang Zou, Zhuoling Li, Minmin Yi, Haoqian Wang

    Abstract: Compared with previous 3D reconstruction methods like NeRF, recent Generalizable 3D Gaussian Splatting (G-3DGS) methods demonstrate impressive efficiency even in the sparse-view setting. However, the promising reconstruction performance of existing G-3DGS methods relies heavily on accurate multi-view feature matching, which is quite challenging. Especially for the scenes that have many non-overlap…

    Submitted 25 August, 2024; originally announced August 2024.

  19. arXiv:2408.13756  [pdf, ps, other]

    cs.DS

    Revisit the Partial Coloring Method: Prefix Spencer and Sampling

    Authors: Dongrun Cai, Xue Chen, Wenxuan Shu, Haoyu Wang, Guangyi Zou

    Abstract: As the most powerful tool in discrepancy theory, the partial coloring method has wide applications in many problems including the Beck-Fiala problem and Spencer's celebrated result. Currently, there are two major algorithmic approaches to the partial coloring method: the first uses linear algebraic tools, and the second is the Gaussian measure algorithm. We explore the advantages of thes…

    Submitted 25 August, 2024; originally announced August 2024.

  20. arXiv:2408.13738  [pdf, other]

    cs.CL

    Poor-Supervised Evaluation for SuperLLM via Mutual Consistency

    Authors: Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Boyuan Pan, Heda Wang, Yao Hu, Kan Li

    Abstract: The guidance from capability evaluations has greatly propelled the progress of both human society and Artificial Intelligence. However, as LLMs evolve, it becomes challenging to construct evaluation benchmarks for them with accurate labels on hard tasks that approach the boundaries of human capabilities. To credibly conduct evaluation without accurate labels (denoted as poor-supervised evaluation)…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: ACL findings

  21. arXiv:2408.13457  [pdf, other]

    cs.CL cs.AI

    Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning

    Authors: Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Boyuan Pan, Heda Wang, Yao Hu, Kan Li

    Abstract: Self-consistency (SC), a widely used decoding strategy for chain-of-thought reasoning, shows significant gains across various multi-step reasoning tasks but comes with a high cost due to multiple sampling with the preset size. Its variants, Adaptive self-consistency (ASC) and Early-stopping self-consistency (ESC), dynamically adjust the number of samples based on the posterior distribution of a se…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: Preprint

  22. arXiv:2408.13413  [pdf, other]

    cs.CV

    TVG: A Training-free Transition Video Generation Method with Diffusion Models

    Authors: Rui Zhang, Yaosen Chen, Yuegen Liu, Wei Wang, Xuming Wen, Hongxia Wang

    Abstract: Transition videos play a crucial role in media production, enhancing the flow and coherence of visual narratives. Traditional methods like morphing often lack artistic appeal and require specialized skills, limiting their effectiveness. Recent advances in diffusion model-based video generation offer new possibilities for creating transitions but face challenges such as poor inter-frame relationshi…

    Submitted 23 August, 2024; originally announced August 2024.

  23. arXiv:2408.12829  [pdf, other]

    cs.LG cs.SD eess.AS

    Uncertainty-Aware Mean Opinion Score Prediction

    Authors: Hui Wang, Shiwan Zhao, Jiaming Zhou, Xiguang Zheng, Haoqin Sun, Xuechen Wang, Yong Qin

    Abstract: Mean Opinion Score (MOS) prediction has made significant progress in specific domains. However, the unstable performance of MOS prediction models across diverse samples presents ongoing challenges in the practical application of these systems. In this paper, we point out that the absence of uncertainty modeling is a significant limitation hindering MOS prediction systems from applying to the real…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by Interspeech 2024, oral

  24. arXiv:2408.12615  [pdf, other]

    eess.IV cs.CV cs.LG

    Pediatric TSC-Related Epilepsy Classification from Clinical MR Images Using Quantum Neural Network

    Authors: Ling Lin, Yihang Zhou, Zhanqi Hu, Dian Jiang, Congcong Liu, Shuo Zhou, Yanjie Zhu, Jianxiang Liao, Dong Liang, Hairong Zheng, Haifeng Wang

    Abstract: Tuberous sclerosis complex (TSC) manifests as a multisystem disorder with significant neurological implications. This study addresses the critical need for robust classification models tailored to TSC in pediatric patients, introducing QResNet, a novel deep learning model seamlessly integrating conventional convolutional neural networks with quantum neural networks. The model incorporates a two-lay…

    Submitted 26 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 5 pages, 4 figures, 2 tables, presented at ISBI 2024

  25. arXiv:2408.12599  [pdf, other]

    cs.CL

    Controllable Text Generation for Large Language Models: A Survey

    Authors: Xun Liang, Hanyu Wang, Yezhaohui Wang, Shichao Song, Jiawei Yang, Simin Niu, Jie Hu, Dan Liu, Shunyu Yao, Feiyu Xiong, Zhiyu Li

    Abstract: In Natural Language Processing (NLP), Large Language Models (LLMs) have demonstrated high text generation quality. However, in real-world applications, LLMs must meet increasingly complex requirements. Beyond avoiding misleading or inappropriate content, LLMs are also expected to cater to specific user needs, such as imitating particular writing styles or generating text with poetic richness. Thes…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 52 pages, 11 figures, 7 tables, 11 equations

    ACM Class: A.2; I.2.7

  26. arXiv:2408.12590  [pdf, other]

    cs.CV cs.AI

    xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

    Authors: Can Qin, Congying Xia, Krithika Ramakrishnan, Michael Ryoo, Lifu Tu, Yihao Feng, Manli Shu, Honglu Zhou, Anas Awadalla, Jun Wang, Senthil Purushwalkam, Le Xue, Yingbo Zhou, Huan Wang, Silvio Savarese, Juan Carlos Niebles, Zeyuan Chen, Ran Xu, Caiming Xiong

    Abstract: We present xGen-VideoSyn-1, a text-to-video (T2V) generation model capable of producing realistic scenes from textual descriptions. Building on recent advancements, such as OpenAI's Sora, we explore the latent diffusion model (LDM) architecture and introduce a video variational autoencoder (VidVAE). VidVAE compresses video data both spatially and temporally, significantly reducing the length of vi…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV24 AI4VA

  27. arXiv:2408.12545  [pdf, other]

    cs.LG cond-mat.dis-nn

    Dynamics of Meta-learning Representation in the Teacher-student Scenario

    Authors: Hui Wang, Cho Tung Yip, Bo Li

    Abstract: Gradient-based meta-learning algorithms have gained popularity for their ability to train models on new tasks using limited data. Empirical observations indicate that such algorithms are able to learn a shared representation across tasks, which is regarded as a key factor in their success. However, the in-depth theoretical understanding of the learning dynamics and the origin of the shared represe…

    Submitted 22 August, 2024; originally announced August 2024.

  28. arXiv:2408.12364  [pdf, other]

    cs.CV cs.AI cs.ET

    SAM-SP: Self-Prompting Makes SAM Great Again

    Authors: Chunpeng Zhou, Kangjie Ning, Qianqian Shen, Sheng Zhou, Zhi Yu, Haishuai Wang

    Abstract: The recently introduced Segment Anything Model (SAM), a Visual Foundation Model (VFM), has demonstrated impressive capabilities in zero-shot segmentation tasks across diverse natural image datasets. Despite its success, SAM encounters noticeable performance degradation when applied to specific domains, such as medical images. Current efforts to address this issue have involved fine-tuning strategi…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Under Review

  29. arXiv:2408.12232  [pdf, other]

    cs.CV

    BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking

    Authors: Hanzheng Wang, Wei Li, Xiang-Gen Xia, Qian Du

    Abstract: Hyperspectral object tracking (HOT) has exhibited potential in various applications, particularly in scenes where objects are camouflaged. Existing trackers can effectively retrieve objects via band regrouping because of the bias in existing HOT datasets, where most objects tend to have distinguishing visual appearances rather than spectral characteristics. This bias allows the tracker to directly…

    Submitted 22 August, 2024; originally announced August 2024.

  30. arXiv:2408.12171  [pdf, other]

    cs.LG

    Recent Advances on Machine Learning for Computational Fluid Dynamics: A Survey

    Authors: Haixin Wang, Yadi Cao, Zijie Huang, Yuxuan Liu, Peiyan Hu, Xiao Luo, Zezheng Song, Wanjia Zhao, Jilin Liu, Jinan Sun, Shikun Zhang, Long Wei, Yue Wang, Tailin Wu, Zhi-Ming Ma, Yizhou Sun

    Abstract: This paper explores the recent advancements in enhancing Computational Fluid Dynamics (CFD) tasks through Machine Learning (ML) techniques. We begin by introducing fundamental concepts, traditional methods, and benchmark datasets, then examine the various roles ML plays in improving CFD. The survey systematically reviews papers from the past five years and introduces a novel classification for for…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 22 pages, 6 figures

  31. arXiv:2408.12102  [pdf, other]

    cs.LG cs.CV cs.SD eess.AS

    Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization

    Authors: Luyao Cheng, Hui Wang, Siqi Zheng, Yafeng Chen, Rongjie Huang, Qinglin Zhang, Qian Chen, Xihao Li

    Abstract: Speaker diarization, the process of segmenting an audio stream or transcribed speech content into homogenous partitions based on speaker identity, plays a crucial role in the interpretation and analysis of human speech. Most existing speaker diarization systems rely exclusively on unimodal acoustic information, making the task particularly challenging due to the innate ambiguities of audio signals…

    Submitted 21 August, 2024; originally announced August 2024.

  32. arXiv:2408.12100  [pdf, other]

    cs.CV

    A Unified Plug-and-Play Algorithm with Projected Landweber Operator for Split Convex Feasibility Problems

    Authors: Shuchang Zhang, Hongxia Wang

    Abstract: In recent years, Plug-and-Play (PnP) methods have achieved state-of-the-art performance in inverse imaging problems by replacing proximal operators with denoisers. Based on the proximal gradient method, some theoretical results for PnP have appeared, where an appropriate step size is crucial for convergence analysis. However, in practical applications, applying PnP methods with theoretically guaranteed…

    Submitted 21 August, 2024; originally announced August 2024.

  33. arXiv:2408.11982  [pdf, other]

    eess.IV cs.CV cs.MM

    AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

    Authors: Maksim Smirnov, Aleksandr Gushchin, Anastasia Antsiferova, Dmitry Vatolin, Radu Timofte, Ziheng Jia, Zicheng Zhang, Wei Sun, Jiaying Qian, Yuqin Cao, Yinan Sun, Yuxin Zhu, Xiongkuo Min, Guangtao Zhai, Kanjar De, Qing Luo, Ao-Xiang Zhang, Peng Zhang, Haibo Lei, Linyan Jiang, Yaqing Li, Wenhui Meng, Xiaoheng Tan, Haiqiang Wang, Xiaozhong Xu, et al. (11 additional authors not shown)

    Abstract: Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dat…

    Submitted 28 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  34. arXiv:2408.11947  [pdf, ps, other]

    cs.CE

    Assessing skin thermal injury risk in exposure tests of heating until flight

    Authors: Hongyun Wang, Shannon E. Foley, Hong Zhou

    Abstract: We assess the skin thermal injury risk in the situation where a test subject is exposed to an electromagnetic beam until the occurrence of flight action. The physical process is modeled as follows. The absorbed electromagnetic power increases the skin temperature. Wherever it is above a temperature threshold, thermal nociceptors are activated and transduce an electrical signal. When the activated…

    Submitted 21 August, 2024; originally announced August 2024.

  35. arXiv:2408.11840  [pdf]

    cs.CV cs.AI

    Joint PET-MRI Reconstruction with Diffusion Stochastic Differential Model

    Authors: Taofeng Xie, Zhuoxu Cui, Congcong Liu, Chen Luo, Huayu Wang, Yuanzhi Zhang, Xuemei Wang, Yihang Zhou, Qiyu Jin, Guoqing Chen, Dong Liang, Haifeng Wang

    Abstract: PET suffers from a low signal-to-noise ratio. Meanwhile, k-space data acquisition for MRI in PET-MRI systems is time-consuming. We aim to accelerate MRI and improve PET image quality. This paper proposes a novel joint reconstruction model based on diffusion stochastic differential equations, which learns the joint probability distribution of PET and MRI. Compare the results underscore the…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted as ISMRM 2024 Digital poster 6575. 04-09 May 2024 Singapore

    Journal ref: ISMRM 2024 Digital poster 6575

  36. arXiv:2408.11837  [pdf, other]

    cs.LG cs.AI cs.HC eess.SP

    MicroXercise: A Micro-Level Comparative and Explainable System for Remote Physical Therapy

    Authors: Hanchen David Wang, Nibraas Khan, Anna Chen, Nilanjan Sarkar, Pamela Wisniewski, Meiyi Ma

    Abstract: Recent global estimates suggest that as many as 2.41 billion individuals have health conditions that would benefit from rehabilitation services. Home-based Physical Therapy (PT) faces significant challenges in providing interactive feedback and meaningful observation for therapists and patients. To fill this gap, we present MicroXercise, which integrates micro-motion analysis with wearable sensors…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE/ACM CHASE 2024

  37. arXiv:2408.11828  [pdf, other]

    eess.SP cs.AI cs.LG

    Online Electric Vehicle Charging Detection Based on Memory-based Transformer using Smart Meter Data

    Authors: Ammar Mansoor Kamoona, Hui Song, Mahdi Jalili, Hao Wang, Reza Razzaghi, Xinghuo Yu

    Abstract: The growing popularity of Electric Vehicles (EVs) poses unique challenges for grid operators and infrastructure, which requires effectively managing these vehicles' integration into the grid. Identifying EV charging is essential for electricity Distribution Network Operators (DNOs) to better plan and manage the distribution grid. One critical aspect is the ability to accurately identi…

    Submitted 5 August, 2024; originally announced August 2024.

  38. Timeline and Boundary Guided Diffusion Network for Video Shadow Detection

    Authors: Haipeng Zhou, Honqiu Wang, Tian Ye, Zhaohu Xing, Jun Ma, Ping Li, Qiong Wang, Lei Zhu

    Abstract: Video Shadow Detection (VSD) aims to detect shadow masks in a frame sequence. Existing works suffer from inefficient temporal learning. Moreover, few works address the VSD problem by considering the characteristics (i.e., the boundary) of shadows. Motivated by this, we propose a Timeline and Boundary Guided Diffusion (TBGDiff) network for VSD where we take account of the past-future temporal guidanc…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: ACM MM2024

  39. arXiv:2408.11490  [pdf, other]

    cs.CL

    DocTabQA: Answering Questions from Long Documents Using Tables

    Authors: Haochen Wang, Kai Hu, Haoyu Dong, Liangcai Gao

    Abstract: We study a new problem setting of question answering (QA), referred to as DocTabQA. Within this setting, given a long document, the goal is to respond to questions by organizing the answers into structured tables derived directly from the document's content. Unlike traditional QA approaches which predominantly rely on unstructured text to formulate responses, DocTabQA aims to leverage structured t…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 18 pages, 5 figures

  40. arXiv:2408.11372  [pdf, other]

    cs.IR cs.AI

    Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation

    Authors: Hao Wang, Yongqiang Han, Kefan Wang, Kai Cheng, Zhen Wang, Wei Guo, Yong Liu, Defu Lian, Enhong Chen

    Abstract: In the realm of recommendation systems, users exhibit a diverse array of behaviors when interacting with items. This phenomenon has spurred research into learning the implicit semantic relationships between these behaviors to enhance recommendation performance. However, these methods often entail high computational complexity. To address concerns regarding efficiency, pre-training presents a viabl…

    Submitted 21 August, 2024; originally announced August 2024.

  41. arXiv:2408.11338  [pdf, other]

    cs.AI cs.LG

    Automatic Dataset Construction (ADC): Sample Collection, Data Curation, and Beyond

    Authors: Minghao Liu, Zonglin Di, Jiaheng Wei, Zhongruo Wang, Hengxiang Zhang, Ruixuan Xiao, Haoyu Wang, Jinlong Pang, Hao Chen, Ankit Shah, Hongxin Wei, Xinlei He, Zhaowei Zhao, Haobo Wang, Lei Feng, Jindong Wang, James Davis, Yang Liu

    Abstract: Large-scale data collection is essential for developing personalized training data, mitigating the shortage of training data, and fine-tuning specialized models. However, creating high-quality datasets quickly and accurately remains a challenge due to annotation errors and the substantial time and costs associated with human labor. To address these issues, we propose Automatic Dataset Construction (A…

    Submitted 21 August, 2024; originally announced August 2024.

  42. arXiv:2408.11182  [pdf, other]

    cs.CR cs.AI

    Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Neural Carrier Articles

    Authors: Zhilong Wang, Haizhou Wang, Nanqing Luo, Lan Zhang, Xiaoyan Sun, Yebo Cao, Peng Liu

    Abstract: Jailbreak attacks on Large Language Models (LLMs) entail crafting prompts aimed at exploiting the models to generate malicious content. This paper proposes a new type of jailbreak attack that shifts the attention of the LLM by inserting a prohibited query into a carrier article. The proposed attack leverages a knowledge graph and a composer LLM to automatically generate a carrier article that…

    Submitted 20 August, 2024; originally announced August 2024.

  43. arXiv:2408.11051  [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    FLAME: Learning to Navigate with Multimodal LLM in Urban Environments

    Authors: Yunzhe Xu, Yiyuan Pan, Zhe Liu, Hesheng Wang

    Abstract: Large Language Models (LLMs) have demonstrated potential in Vision-and-Language Navigation (VLN) tasks, yet current applications face challenges. While LLMs excel in general conversation scenarios, they struggle with specialized navigation tasks, yielding suboptimal performance compared to specialized VLN models. We introduce FLAME (FLAMingo-Architected Embodied Agent), a novel Multimodal LLM-base…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures

  44. arXiv:2408.10795  [pdf, other]

    cs.CL

    Adversarial Attack for Explanation Robustness of Rationalization Models

    Authors: Yuankai Zhang, Lingxiao Kong, Haozhao Wang, Ruixuan Li, Jun Wang, Yuhua Li, Wei Liu

    Abstract: Rationalization models, which select a subset of the input text as a rationale (crucial for humans to understand and trust predictions), have recently emerged as a prominent research area in eXplainable Artificial Intelligence. However, most previous studies focus on improving the quality of the rationale, ignoring its robustness to malicious attacks. Specifically, whether the rationalization mode…

    Submitted 20 August, 2024; originally announced August 2024.

  45. arXiv:2408.10738  [pdf, other]

    cs.CR

    PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection

    Authors: Tri Cao, Chengyu Huang, Yuexin Li, Huilin Wang, Amy He, Nay Oo, Bryan Hooi

    Abstract: Phishing attacks are a major threat to online security, exploiting user vulnerabilities to steal sensitive information. Various methods have been developed to counteract phishing, each with varying levels of accuracy, but they also encounter notable limitations. In this study, we introduce PhishAgent, a multimodal agent that combines a wide range of tools, integrating both online and offline knowl…

    Submitted 20 August, 2024; originally announced August 2024.

  46. arXiv:2408.10673  [pdf, other]

    cs.CR

    Iterative Window Mean Filter: Thwarting Diffusion-based Adversarial Purification

    Authors: Hanrui Wang, Ruoxi Sun, Cunjian Chen, Minhui Xue, Lay-Ki Soon, Shuo Wang, Zhe Jin

    Abstract: Face authentication systems have brought significant convenience and advanced developments, yet they have become unreliable due to their sensitivity to inconspicuous perturbations, such as adversarial attacks. Existing defenses often exhibit weaknesses when facing various attack algorithms and adaptive attacks or compromise accuracy for enhanced security. To address these challenges, we have devel…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Under review

  47. arXiv:2408.10668  [pdf, other]

    cs.CR cs.AI

    Probing the Safety Response Boundary of Large Language Models via Unsafe Decoding Path Generation

    Authors: Haoyu Wang, Bingzhe Wu, Yatao Bian, Yongzhe Chang, Xueqian Wang, Peilin Zhao

    Abstract: Large Language Models (LLMs) are implicit troublemakers. While they provide valuable insights and assist in problem-solving, they can also potentially serve as a resource for malicious activities. Implementing safety alignment could mitigate the risk of LLMs generating harmful responses. We argue that even when an LLM appears to successfully block harmful queries, there may still be hidden vulner…

    Submitted 26 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

  48. arXiv:2408.10624  [pdf, other]

    cs.CV cs.AI

    WRIM-Net: Wide-Ranging Information Mining Network for Visible-Infrared Person Re-Identification

    Authors: Yonggan Wu, Ling-Chao Meng, Yuan Zichao, Sixian Chan, Hong-Qiang Wang

    Abstract: For the visible-infrared person re-identification (VI-ReID) task, one of the primary challenges lies in the significant cross-modality discrepancy. Existing methods struggle to conduct modality-invariant information mining. They often focus solely on mining single dimensions, such as spatial or channel, and overlook the extraction of modality-specific multi-dimensional information. To fully mine modality-…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 18 pages, 5 figures

  49. arXiv:2408.10541  [pdf, other]

    cs.CV

    The Instance-centric Transformer for the RVOS Track of LSVOS Challenge: 3rd Place Solution

    Authors: Bin Cao, Yisi Zhang, Hanyi Wang, Xingjian He, Jing Liu

    Abstract: Referring Video Object Segmentation is an emerging multi-modal task that aims to segment objects in a video given a natural language expression. In this work, we build two instance-centric models and fuse their predicted results at the frame level and the instance level. First, we introduce instance masks into the DETR-based model for query initialization to achieve temporal enhancement and employ SAM for sp…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2406.13939

  50. arXiv:2408.10327  [pdf, other]

    cs.SE

    An Empirical Study on Package-Level Deprecation in Python Ecosystem

    Authors: Zhiqing Zhong, Shilin He, Haoxuan Wang, Boxi Yu, Haowen Yang, Pinjia He

    Abstract: Open-source software (OSS) plays a crucial role in modern software development. Utilizing OSS code can greatly accelerate software development, reduce redundancy, and enhance reliability. Python, a widely adopted programming language, is renowned for its extensive and diverse third-party package ecosystem. However, a significant number of OSS packages within the Python ecosystem are in poor mainte…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted by 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE'25)