Skip to main content

Showing 1–50 of 467 results for author: Yang, R

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.13915  [pdf, other

    cs.CL cs.AI

    LLMs are Superior Feedback Providers: Bootstrapping Reasoning for Lie Detection with Self-Generated Feedback

    Authors: Tanushree Banerjee, Richard Zhu, Runzhe Yang, Karthik Narasimhan

    Abstract: Large Language Models (LLMs) excel at generating human-like dialogues and comprehending text. However, understanding the subtleties of complex exchanges in language remains a challenge. We propose a bootstrapping framework that leverages self-generated feedback to enhance LLM reasoning capabilities for lie detection. The framework consists of three stages: suggestion, feedback collection, and modi… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 19 pages, 18 figures

  2. arXiv:2408.11805  [pdf, other

    cs.RO cs.CV cs.LG

    ACE: A Cross-Platform Visual-Exoskeletons System for Low-Cost Dexterous Teleoperation

    Authors: Shiqi Yang, Minghuan Liu, Yuzhe Qin, Runyu Ding, Jialong Li, Xuxin Cheng, Ruihan Yang, Sha Yi, Xiaolong Wang

    Abstract: Learning from demonstrations has shown to be an effective approach to robotic manipulation, especially with the recently collected large-scale robot data with teleoperation systems. Building an efficient teleoperation system across diverse robot platforms has become more crucial than ever. However, there is a notable lack of cost-effective and user-friendly teleoperation systems for different end-… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Webpage: https://ace-teleop.github.io/

  3. arXiv:2408.11187  [pdf, other

    cs.RO cs.AI cs.MA

    Optimization of Multi-Agent Flying Sidekick Traveling Salesman Problem over Road Networks

    Authors: Ruixiao Yang, Chuchu Fan

    Abstract: The mixed truck-drone delivery systems have attracted increasing attention for last-mile logistics, but real-world complexities demand a shift from single-agent, fully connected graph models to multi-agent systems operating on actual road networks. We introduce the multi-agent flying sidekick traveling salesman problem (MA-FSTSP) on road networks, extending the single truck-drone model to multiple… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  4. arXiv:2408.10819  [pdf, other

    cs.CL cs.AI

    Exploiting Large Language Models Capabilities for Question Answer-Driven Knowledge Graph Completion Across Static and Temporal Domains

    Authors: Rui Yang, Jiahao Zhu, Jianping Man, Li Fang, Yi Zhou

    Abstract: Knowledge graph completion (KGC) aims to identify missing triples in a knowledge graph (KG). This is typically achieved through tasks such as link prediction and instance completion. However, these methods often focus on either static knowledge graphs (SKGs) or temporal knowledge graphs (TKGs), addressing only within-scope triples. This paper introduces a new generative completion framework called… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  5. arXiv:2408.05459  [pdf, other

    cs.SI cs.LG

    A Versatile Framework for Attributed Network Clustering via K-Nearest Neighbor Augmentation

    Authors: Yiran Li, Gongyao Guo, Jieming Shi, Renchi Yang, Shiqi Shen, Qing Li, Jun Luo

    Abstract: Attributed networks containing entity-specific information in node attributes are ubiquitous in modeling social networks, e-commerce, bioinformatics, etc. Their inherent network topology ranges from simple graphs to hypergraphs with high-order interactions and multiplex graphs with separate layers. An important graph mining task is node clustering, aiming to partition the nodes of an attributed ne… ▽ More

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 25 pages. Accepted by the VLDB Journal

  6. arXiv:2408.03468  [pdf, other

    cs.MM cs.AI cs.CV

    MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili

    Authors: Han Wang, Tan Rui Yang, Usman Naseem, Roy Ka-Wei Lee

    Abstract: Hate speech is a pressing issue in modern society, with significant effects both online and offline. Recent research in hate speech detection has primarily centered on text-based media, largely overlooking multimodal content such as videos. Existing studies on hateful video datasets have predominantly focused on English content within a Western context and have been limited to binary labels (hatef… ▽ More

    Submitted 12 August, 2024; v1 submitted 28 July, 2024; originally announced August 2024.

    Comments: 10 pages, 3 figures, ACM Multimedia 2024

    ACM Class: I.2.0

  7. arXiv:2408.01680  [pdf, ps, other

    cs.IT

    Service Placement and Trajectory Design for Heterogeneous Tasks in Multi-UAV Cooperative Computing Networks

    Authors: Bin Li, Rongrong Yang, Lei Liu, Celimuge Wu

    Abstract: In this paper, we consider deploying multiple Unmanned Aerial Vehicles (UAVs) to enhance the computation service of Mobile Edge Computing (MEC) through collaborative computation among UAVs. In particular, the tasks of different types and service requirements in MEC network are offloaded from one UAV to another. To pursue the goal of low-carbon edge computing, we study the problem of minimizing sys… ▽ More

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: 11 pages, 10 figures

  8. arXiv:2408.01672  [pdf, ps, other

    eess.SP cs.AI

    radarODE: An ODE-Embedded Deep Learning Model for Contactless ECG Reconstruction from Millimeter-Wave Radar

    Authors: Yuanyuan Zhang, Runwei Guan, Lingxiao Li, Rui Yang, Yutao Yue, Eng Gee Lim

    Abstract: Radar-based contactless cardiac monitoring has become a popular research direction recently, but the fine-grained electrocardiogram (ECG) signal is still hard to reconstruct from millimeter-wave radar signal. The key obstacle is to decouple the cardiac activities in the electrical domain (i.e., ECG) from that in the mechanical domain (i.e., heartbeat), and most existing research only uses pure dat… ▽ More

    Submitted 3 August, 2024; originally announced August 2024.

  9. arXiv:2408.01413  [pdf, other

    cs.GT

    A Game Theoretic Analysis of High Occupancy Toll Lane Design

    Authors: Zhanhao Zhang, Ruifan Yang, Manxi Wu

    Abstract: In this article, we study the optimal design of High Occupancy Toll (HOT) lanes. The traffic authority determines the road capacity allocation between HOT lanes and ordinary lanes, as well as the toll price charged for travelers using HOT lanes who do not meet the high-occupancy eligibility criteria. We develop a game-theoretic model to analyze the decisions of travelers with heterogeneous prefere… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

  10. arXiv:2407.20099  [pdf, other

    cs.CV

    RSC-SNN: Exploring the Trade-off Between Adversarial Robustness and Accuracy in Spiking Neural Networks via Randomized Smoothing Coding

    Authors: Keming Wu, Man Yao, Yuhong Chou, Xuerui Qiu, Rui Yang, Bo Xu, Guoqi Li

    Abstract: Spiking Neural Networks (SNNs) have received widespread attention due to their unique neuronal dynamics and low-power nature. Previous research empirically shows that SNNs with Poisson coding are more robust than Artificial Neural Networks (ANNs) on small-scale datasets. However, it is still unclear in theory how the adversarial robustness of SNNs is derived, and whether SNNs can still maintain it… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  11. arXiv:2407.19639  [pdf, other

    cs.CR

    Segmented Private Data Aggregation in the Multi-message Shuffle Model

    Authors: Shaowei Wang, Ruilin Yang, Sufen Zeng, Kaiqi Yu, Rundong Mei, Shaozheng Huang, Wei Yang

    Abstract: The shuffle model of differential privacy (DP) offers compelling privacy-utility trade-offs in decentralized settings (e.g., internet of things, mobile edge networks). Particularly, the multi-message shuffle model, where each user may contribute multiple messages, has shown that accuracy can approach that of the central model of DP. However, existing studies typically assume a uniform privacy prot… ▽ More

    Submitted 28 July, 2024; originally announced July 2024.

  12. arXiv:2407.17466  [pdf, other

    cs.LG math.OC stat.ML

    Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning

    Authors: Shuang Qiu, Dake Zhang, Rui Yang, Boxiang Lyu, Tong Zhang

    Abstract: This paper investigates multi-objective reinforcement learning (MORL), which focuses on learning Pareto optimal policies in the presence of multiple reward functions. Despite MORL's significant empirical success, there is still a lack of satisfactory understanding of various MORL optimization targets and efficient learning algorithms. Our work offers a systematic analysis of several optimization t… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Initially submitted in May 2024

  13. arXiv:2407.17356  [pdf, other

    cs.LG cs.NE

    Gradient-based inference of abstract task representations for generalization in neural networks

    Authors: Ali Hummos, Felipe del Río, Brabeeba Mien Wang, Julio Hurtado, Cristian B. Calderon, Guangyu Robert Yang

    Abstract: Humans and many animals show remarkably adaptive behavior and can respond differently to the same input depending on their internal goals. The brain not only represents the intermediate abstractions needed to perform a computation but also actively maintains a representation of the computation itself (task abstraction). Such separation of the computation and its abstraction is associated with fast… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

  14. arXiv:2407.16554  [pdf, other

    cs.MM cs.CV cs.SD eess.AS

    Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization

    Authors: Junyan Wu, Wei Lu, Xiangyang Luo, Rui Yang, Qian Wang, Xiaochun Cao

    Abstract: Recently, a novel form of audio partial forgery has posed challenges to its forensics, requiring advanced countermeasures to detect subtle forgery manipulations within long-duration audio. However, existing countermeasures still serve a classification purpose and fail to perform meaningful analysis of the start and end timestamps of partial forgery segments. To address this challenge, we introduce… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 9pages, 3figures. This paper has been accepted for ACM MM 2024

    MSC Class: 68T07; 68T10 ACM Class: I.2; I.5

  15. arXiv:2407.14007  [pdf, other

    cs.CV cs.AI

    Multi-modal Relation Distillation for Unified 3D Representation Learning

    Authors: Huiqun Wang, Yiping Bao, Panwang Pan, Zeming Li, Xiao Liu, Ruijie Yang, Di Huang

    Abstract: Recent advancements in multi-modal pre-training for 3D point clouds have demonstrated promising results by aligning heterogeneous features across 3D shapes and their corresponding 2D images and language descriptions. However, current straightforward solutions often overlook intricate structural relations among samples, potentially limiting the full capabilities of multi-modal learning. To address… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  16. arXiv:2407.11401  [pdf, other

    cs.CV cs.IR

    EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis

    Authors: Ruijie Yang, Yan Zhu, Peiyao Fu, Yizhe Zhang, Zhihua Wang, Quanlin Li, Pinghong Zhou, Xian Yang, Shuo Wang

    Abstract: Determining the necessity of resecting malignant polyps during colonoscopy screen is crucial for patient outcomes, yet challenging due to the time-consuming and costly nature of histopathology examination. While deep learning-based classification models have shown promise in achieving optical biopsy with endoscopic images, they often suffer from a lack of explainability. To overcome this limitatio… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  17. arXiv:2407.10794  [pdf, other

    cs.CL cs.AI

    Graphusion: Leveraging Large Language Models for Scientific Knowledge Graph Fusion and Construction in NLP Education

    Authors: Rui Yang, Boming Yang, Sixun Ouyang, Tianwei She, Aosong Feng, Yuang Jiang, Freddy Lecue, Jinghui Lu, Irene Li

    Abstract: Knowledge graphs (KGs) are crucial in the field of artificial intelligence and are widely applied in downstream tasks, such as enhancing Question Answering (QA) systems. The construction of KGs typically requires significant effort from domain experts. Recently, Large Language Models (LLMs) have been used for knowledge graph construction (KGC), however, most existing approaches focus on a local pe… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 24 pages, 11 figures, 13 tables. arXiv admin note: substantial text overlap with arXiv:2402.14293

  18. arXiv:2407.07913  [pdf, other

    cs.IR cs.AI

    CaseGPT: a case reasoning framework based on language models and retrieval-augmented generation

    Authors: Rui Yang

    Abstract: This paper presents CaseGPT, an innovative approach that combines Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) technology to enhance case-based reasoning in the healthcare and legal sectors. The system addresses the challenges of traditional database queries by enabling fuzzy searches based on imprecise descriptions, thereby improving data searchability and usability. Case… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Submitted to ICCBR

  19. arXiv:2407.07059  [pdf, other

    q-bio.NC cs.LG

    Differentiable Optimization of Similarity Scores Between Models and Brains

    Authors: Nathan Cloos, Moufan Li, Markus Siegel, Scott L. Brincat, Earl K. Miller, Guangyu Robert Yang, Christopher J. Cueva

    Abstract: What metrics should guide the development of more realistic models of the brain? One proposal is to quantify the similarity between models and brains using methods such as linear regression, Centered Kernel Alignment (CKA), and angular Procrustes distance. To better understand the limitations of these similarity measures we analyze neural activity recorded in five experiments on nonhuman primates,… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 16 pages, 6 figures

  20. arXiv:2407.04285  [pdf, other

    cs.LG cs.AI

    Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling

    Authors: Jiawei Xu, Rui Yang, Feng Luo, Meng Fang, Baoxiang Wang, Lei Han

    Abstract: Learning policies from offline datasets through offline reinforcement learning (RL) holds promise for scaling data-driven decision-making and avoiding unsafe and costly online interactions. However, real-world data collected from sensors or humans often contains noise and errors, posing a significant challenge for existing offline RL methods. Our study indicates that traditional offline RL methods… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  21. arXiv:2407.03162  [pdf, other

    cs.RO cs.CV cs.LG

    Bunny-VisionPro: Real-Time Bimanual Dexterous Teleoperation for Imitation Learning

    Authors: Runyu Ding, Yuzhe Qin, Jiyue Zhu, Chengzhe Jia, Shiqi Yang, Ruihan Yang, Xiaojuan Qi, Xiaolong Wang

    Abstract: Teleoperation is a crucial tool for collecting human demonstrations, but controlling robots with bimanual dexterous hands remains a challenge. Existing teleoperation systems struggle to handle the complexity of coordinating two hands for intricate manipulations. We introduce Bunny-VisionPro, a real-time bimanual dexterous teleoperation system that leverages a VR headset. Unlike previous vision-bas… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: project page: https://dingry.github.io/projects/bunny_visionpro.html

  22. arXiv:2406.19414  [pdf, other

    q-fin.ST cs.LG q-fin.PR stat.AP stat.ML stat.OT

    Stock Volume Forecasting with Advanced Information by Conditional Variational Auto-Encoder

    Authors: Parley R Yang, Alexander Y Shestopaloff

    Abstract: We demonstrate the use of Conditional Variational Encoder (CVAE) to improve the forecasts of daily stock volume time series in both short and long term forecasting tasks, with the use of advanced information of input variables such as rebalancing dates. CVAE generates non-linear time series as out-of-sample forecasts, which have better accuracy and closer fit of correlation to the actual data, com… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  23. arXiv:2406.17624  [pdf, other

    cs.CL cs.AI

    Self-assessment, Exhibition, and Recognition: a Review of Personality in Large Language Models

    Authors: Zhiyuan Wen, Yu Yang, Jiannong Cao, Haoming Sun, Ruosong Yang, Shuaiqi Liu

    Abstract: As large language models (LLMs) appear to behave increasingly human-like in text-based interactions, more and more researchers become interested in investigating personality in LLMs. However, the diversity of psychological personality research and the rapid development of LLMs have led to a broad yet fragmented landscape of studies in this interdisciplinary field. Extensive studies across differen… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  24. arXiv:2406.17274  [pdf, other

    cs.CL cs.LG

    Can We Trust the Performance Evaluation of Uncertainty Estimation Methods in Text Summarization?

    Authors: Jianfeng He, Runing Yang, Linlin Yu, Changbin Li, Ruoxi Jia, Feng Chen, Ming Jin, Chang-Tien Lu

    Abstract: Text summarization, a key natural language generation (NLG) task, is vital in various domains. However, the high cost of inaccurate summaries in risk-critical applications, particularly those involving human-in-the-loop decision-making, raises concerns about the reliability of uncertainty estimation on text summarization (UE-TS) evaluation methods. This concern stems from the dependency of uncerta… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 63 pages, 41 figures, 11 tables

  25. arXiv:2406.13369  [pdf, other

    cs.LG cs.SI

    Effective Edge-wise Representation Learning in Edge-Attributed Bipartite Graphs

    Authors: Hewen Wang, Renchi Yang, Xiaokui Xiao

    Abstract: Graph representation learning (GRL) is to encode graph elements into informative vector representations, which can be used in downstream tasks for analyzing graph-structured data and has seen extensive applications in various domains. However, the majority of extant studies on GRL are geared towards generating node representations, which cannot be readily employed to perform edge-based analytics t… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 11 pages. Full version of the research paper accepted to KDD 2024

  26. arXiv:2406.12556  [pdf, other

    cs.NI

    Towards Deep Application-Network Integration: Architectures, Progress and Opportunities

    Authors: Berta Serracanta, Kai Gao, Jordi Ros-Giralt, Alberto Rodriguez-Natal, Luis M. Contreras, Richard Yang, Albert Cabellos

    Abstract: With the rise of a new generation of applications (e.g., virtual and augmented reality, artificial intelligence, etc) demanding stringent performance requirements, the need for networking solutions and architectures that can enable a higher Quality of Experience (QoE) is becoming increasingly important. While jointly optimizing application and network may increase the applications' QoE and simul… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  27. arXiv:2406.12449  [pdf

    cs.AI

    Retrieval-Augmented Generation for Generative Artificial Intelligence in Medicine

    Authors: Rui Yang, Yilin Ning, Emilia Keppo, Mingxuan Liu, Chuan Hong, Danielle S Bitterman, Jasmine Chiat Ling Ong, Daniel Shu Wei Ting, Nan Liu

    Abstract: Generative artificial intelligence (AI) has brought revolutionary innovations in various fields, including medicine. However, it also exhibits limitations. In response, retrieval-augmented generation (RAG) provides a potential solution, enabling models to generate more accurate contents by leveraging the retrieval of external knowledge. With the rapid advancement of generative AI, RAG can pave the… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  28. arXiv:2406.12367  [pdf, other

    cs.CV cs.LG cs.MM

    Competitive Learning for Achieving Content-specific Filters in Video Coding for Machines

    Authors: Honglei Zhang, Jukka I. Ahonen, Nam Le, Ruiying Yang, Francesco Cricri

    Abstract: This paper investigates the efficacy of jointly optimizing content-specific post-processing filters to adapt a human oriented video/image codec into a codec suitable for machine vision tasks. By observing that artifacts produced by video/image codecs are content-dependent, we propose a novel training strategy based on competitive learning principles. This strategy assigns training samples to filte… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to be preseneted in ICIP 2024

  29. arXiv:2406.12053  [pdf, other

    cs.CL

    InternalInspector $I^2$: Robust Confidence Estimation in LLMs through Internal States

    Authors: Mohammad Beigi, Ying Shen, Runing Yang, Zihao Lin, Qifan Wang, Ankith Mohan, Jianfeng He, Ming Jin, Chang-Tien Lu, Lifu Huang

    Abstract: Despite their vast capabilities, Large Language Models (LLMs) often struggle with generating reliable outputs, frequently producing high-confidence inaccuracies known as hallucinations. Addressing this challenge, our research introduces InternalInspector, a novel framework designed to enhance confidence estimation in LLMs by leveraging contrastive learning on internal states including attention st… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 8 pages

  30. arXiv:2406.10216  [pdf, other

    cs.CL cs.AI

    Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs

    Authors: Rui Yang, Ruomeng Ding, Yong Lin, Huan Zhang, Tong Zhang

    Abstract: Reward models trained on human preference data have been proven to be effective for aligning Large Language Models (LLMs) with human intent within the reinforcement learning from human feedback (RLHF) framework. However, the generalization capabilities of current reward models to unseen prompts and responses are limited. This limitation can lead to an unexpected phenomenon known as reward over-opt… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 21 pages

  31. arXiv:2406.07801  [pdf, other

    cs.CL cs.SD eess.AS

    PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models

    Authors: Runyan Yang, Huibao Yang, Xiqing Zhang, Tiantian Ye, Ying Liu, Yingying Gao, Shilei Zhang, Chao Deng, Junlan Feng

    Abstract: Recently, there have been attempts to integrate various speech processing tasks into a unified model. However, few previous works directly demonstrated that joint optimization of diverse tasks in multitask speech models has positive influence on the performance of individual tasks. In this paper we present a multitask speech model -- PolySpeech, which supports speech recognition, speech synthesis,… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures

  32. arXiv:2406.05482  [pdf, other

    cs.LG

    Efficient Topology-aware Data Augmentation for High-Degree Graph Neural Networks

    Authors: Yurui Lai, Xiaoyang Lin, Renchi Yang, Hongtao Wang

    Abstract: In recent years, graph neural networks (GNNs) have emerged as a potent tool for learning on graph-structured data and won fruitful successes in varied fields. The majority of GNNs follow the message-passing paradigm, where representations of each node are learned by recursively aggregating features of its neighbors. However, this mechanism brings severe over-smoothing and efficiency issues over hi… ▽ More

    Submitted 29 August, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: This is the technical report for the paper accepted to KDD 2024. 16 pages

  33. arXiv:2406.04784  [pdf, other

    cs.CL cs.AI

    SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals

    Authors: Ruihan Yang, Jiangjie Chen, Yikai Zhang, Siyu Yuan, Aili Chen, Kyle Richardson, Yanghua Xiao, Deqing Yang

    Abstract: Language agents powered by large language models (LLMs) are increasingly valuable as decision-making tools in domains such as gaming and programming. However, these agents often face challenges in achieving high-level goals without detailed instructions and in adapting to environments where feedback is delayed. In this paper, we present SelfGoal, a novel automatic approach designed to enhance agen… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Preprint

  34. arXiv:2406.02222  [pdf, other

    cs.SE

    Towards an Extensible Model-Based Digital Twin Framework for Space Launch Vehicles

    Authors: Ran Wei, Ruizhe Yang, Shijun Liu, Chongsheng Fan, Rong Zhou, Zekun Wu, Haochi Wang, Yifan Cai, Zhe Jiang

    Abstract: The concept of Digital Twin (DT) is increasingly applied to systems on different levels of abstraction across domains, to support monitoring, analysis, diagnosis, decision making and automated control. Whilst the interest in applying DT is growing, the definition of DT is unclear, neither is there a clear pathway to develop DT to fully realise its capacities. In this paper, we revise the concept o… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  35. arXiv:2406.02143  [pdf, other

    cs.CL

    Reinforcement Tuning for Detecting Stances and Debunking Rumors Jointly with Large Language Models

    Authors: Ruichao Yang, Wei Gao, Jing Ma, Hongzhan Lin, Bo Wang

    Abstract: Learning multi-task models for jointly detecting stance and verifying rumors poses challenges due to the need for training data of stance at post level and rumor veracity at claim level, which are difficult to obtain. To address this issue, we leverage large language models (LLMs) as the foundation annotators for the joint stance detection (SD) and rumor verification (RV) tasks, dubbed as JSDRV. W… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: ACL 2024 (Findings)

  36. arXiv:2406.02038  [pdf, other

    cs.CV

    Leveraging Predicate and Triplet Learning for Scene Graph Generation

    Authors: Jiankai Li, Yunhong Wang, Xiefan Guo, Ruijie Yang, Weixin Li

    Abstract: Scene Graph Generation (SGG) aims to identify entities and predict the relationship triplets \textit{\textless subject, predicate, object\textgreater } in visual scenes. Given the prevalence of large visual variations of subject-object pairs even in the same predicate, it can be quite challenging to model and refine predicate representations directly across such pairs, which is however a common st… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: CVPR 2024

  37. arXiv:2406.01584  [pdf, other

    cs.CV

    SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model

    Authors: An-Chieh Cheng, Hongxu Yin, Yang Fu, Qiushan Guo, Ruihan Yang, Jan Kautz, Xiaolong Wang, Sifei Liu

    Abstract: Vision Language Models (VLMs) have demonstrated remarkable performance in 2D vision and language tasks. However, their ability to reason about spatial arrangements remains limited. In this work, we introduce Spatial Region GPT (SpatialRGPT) to enhance VLMs' spatial perception and reasoning capabilities. SpatialRGPT advances VLMs' spatial understanding through two key innovations: (1) a data curati… ▽ More

    Submitted 18 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Project Page: https://www.anjiecheng.me/SpatialRGPT

  38. arXiv:2406.01069  [pdf, other

    cs.CV

    UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment

    Authors: Hantao Zhou, Longxiang Tang, Rui Yang, Guanyi Qin, Yan Zhang, Runze Hu, Xiu Li

    Abstract: Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) aim to simulate human subjective perception of image visual quality and aesthetic appeal. Existing methods typically address these tasks independently due to distinct learning objectives. However, they neglect the underlying interconnectedness of both tasks, which hinders the learning of task-agnostic shared representations for hu… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  39. arXiv:2405.18959  [pdf, other

    cs.CV cs.MM

    Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval

    Authors: Rui Yang, Shuang Wang, Yingping Han, Yuanheng Li, Dong Zhao, Dou Quan, Yanhe Guo, Licheng Jiao

    Abstract: Remote Sensing Image-Text Retrieval (RSITR) is pivotal for knowledge services and data mining in the remote sensing (RS) domain. Considering the multi-scale representations in image content and text vocabulary can enable the models to learn richer representations and enhance retrieval. Current multi-scale RSITR approaches typically align multi-scale fused image features with text features, but ove… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 16 pages, 9 figures

  40. arXiv:2405.18525  [pdf, other

    cs.CV

    REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment

    Authors: Haonan Han, Rui Yang, Huan Liao, Jiankai Xing, Zunnan Xu, Xiaoming Yu, Junwei Zha, Xiu Li, Wanhua Li

    Abstract: Traditional image-to-3D models often struggle with scenes containing multiple objects due to biases and occlusion complexities. To address this challenge, we present REPARO, a novel approach for compositional 3D asset generation from single images. REPARO employs a two-step process: first, it extracts individual objects from the scene and reconstructs their 3D meshes using off-the-shelf image-to-3… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  41. arXiv:2405.17673  [pdf, other

    cs.CV cs.LG stat.ML

    Fast Samplers for Inverse Problems in Iterative Refinement Models

    Authors: Kushagra Pandey, Ruihan Yang, Stephan Mandt

    Abstract: Constructing fast samplers for unconditional diffusion and flow-matching models has received much attention recently; however, existing methods for solving inverse problems, such as super-resolution, inpainting, or deblurring, still require hundreds to thousands of iterative steps to obtain high-quality results. We propose a plug-and-play framework for constructing efficient samplers for inverse p… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  42. arXiv:2405.16850  [pdf, other

    eess.IV cs.CV cs.LG

    UniCompress: Enhancing Multi-Data Medical Image Compression with Knowledge Distillation

    Authors: Runzhao Yang, Yinda Chen, Zhihong Zhang, Xiaoyu Liu, Zongren Li, Kunlun He, Zhiwei Xiong, Jinli Suo, Qionghai Dai

    Abstract: In the field of medical image compression, Implicit Neural Representation (INR) networks have shown remarkable versatility due to their flexible compression ratios, yet they are constrained by a one-to-one fitting approach that results in lengthy encoding times. Our novel method, ``\textbf{UniCompress}'', innovatively extends the compression capabilities of INR by being the first to compress multi… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  43. arXiv:2405.16726  [pdf, other

    cs.LG

    Exploring Edge Probability Graph Models Beyond Edge Independency: Concepts, Analyses, and Algorithms

    Authors: Fanchen Bu, Ruochen Yang, Paul Bogdan, Kijung Shin

    Abstract: Desirable random graph models (RGMs) should (i) be tractable so that we can compute and control graph statistics, and (ii) generate realistic structures such as high clustering (i.e., high subgraph densities). A popular category of RGMs (e.g., Erdos-Renyi and stochastic Kronecker) outputs edge probabilities, and we need to realize (i.e., sample from) the edge probabilities to generate graphs. Typi… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  44. arXiv:2405.16376  [pdf, other

    cs.CL cs.GT

    STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making

    Authors: Chuanhao Li, Runhan Yang, Tiankai Li, Milad Bafarassat, Kourosh Sharifi, Dirk Bergemann, Zhuoran Yang

    Abstract: Large Language Models (LLMs) like GPT-4 have revolutionized natural language processing, showing remarkable linguistic proficiency and reasoning capabilities. However, their application in strategic multi-agent decision-making environments is hampered by significant limitations including poor mathematical reasoning, difficulty in following instructions, and a tendency to generate incorrect informa… ▽ More

    Submitted 27 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: 39 pages, 4 figures

  45. arXiv:2405.16030  [pdf, other

    cs.LG

    Constrained Ensemble Exploration for Unsupervised Skill Discovery

    Authors: Chenjia Bai, Rushuai Yang, Qiaosheng Zhang, Kang Xu, Yi Chen, Ting Xiao, Xuelong Li

    Abstract: Unsupervised Reinforcement Learning (RL) provides a promising paradigm for learning useful behaviors via reward-free per-training. Existing methods for unsupervised RL mainly conduct empowerment-driven skill discovery or entropy-based exploration. However, empowerment often leads to static skills, and pure exploration only maximizes the state coverage rather than learning useful behaviors. In this… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  46. arXiv:2405.15385  [pdf, other

    cs.CV physics.med-ph

    CPT-Interp: Continuous sPatial and Temporal Motion Modeling for 4D Medical Image Interpolation

    Authors: Xia Li, Runzhao Yang, Xiangtai Li, Antony Lomax, Ye Zhang, Joachim Buhmann

    Abstract: Motion information from 4D medical imaging offers critical insights into dynamic changes in patient anatomy for clinical assessments and radiotherapy planning and, thereby, enhances the capabilities of 3D image analysis. However, inherent physical and technical constraints of imaging hardware often necessitate a compromise between temporal resolution and image quality. Frame interpolation emerges… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  47. arXiv:2405.11922  [pdf, other

    cs.SI cs.LG

    Effective Clustering on Large Attributed Bipartite Graphs

    Authors: Renchi Yang, Yidu Wu, Xiaoyang Lin, Qichen Wang, Tsz Nam Chan, Jieming Shi

    Abstract: Attributed bipartite graphs (ABGs) are an expressive data model for describing the interactions between two sets of heterogeneous nodes that are associated with rich attributes, such as customer-product purchase networks and author-paper authorship graphs. Partitioning the target node set in such graphs into k disjoint clusters (referred to as k-ABGC) finds widespread use in various domains, inclu… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: The technical report for the paper was accepted to KDD 2024. 14 pages

  48. arXiv:2405.11921  [pdf, other

    cs.CV

    MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections

    Authors: Jiayue Liu, Xiao Tang, Freeman Cheng, Roy Yang, Zhihao Li, Jianzhuang Liu, Yi Huang, Jiaqi Lin, Shiyong Liu, Xiaofei Wu, Songcen Xu, Chun Yuan

    Abstract: 3D Gaussian Splatting showcases notable advancements in photo-realistic and real-time novel view synthesis. However, it faces challenges in modeling mirror reflections, which exhibit substantial appearance variations from different viewpoints. To tackle this problem, we present MirrorGaussian, the first method for mirror scene reconstruction with real-time rendering based on 3D Gaussian Splatting.… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  49. arXiv:2405.11754  [pdf, other

    cs.CV

    Versatile Teacher: A Class-aware Teacher-student Framework for Cross-domain Adaptation

    Authors: Runou Yang, Tian Tian, Jinwen Tian

    Abstract: Addressing the challenge of domain shift between datasets is vital in maintaining model performance. In the context of cross-domain object detection, the teacher-student framework, a widely-used semi-supervised model, has shown significant accuracy improvements. However, existing methods often overlook class differences, treating all classes equally, resulting in suboptimal results. Furthermore, t… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

  50. arXiv:2405.11225  [pdf, other

    cs.SI cs.AI

    SeBot: Structural Entropy Guided Multi-View Contrastive Learning for Social Bot Detection

    Authors: Yingguang Yang, Qi Wu, Buyun He, Hao Peng, Renyu Yang, Zhifeng Hao, Yong Liao

    Abstract: Recent advancements in social bot detection have been driven by the adoption of Graph Neural Networks. The social graph, constructed from social network interactions, contains benign and bot accounts that influence each other. However, previous graph-based detection methods that follow the transductive message-passing paradigm may not fully utilize hidden graph information and are vulnerable to ad… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: KDD 2024