Showing 1–50 of 189 results for author: Yao, W

Searching in archive cs.
  1. arXiv:2407.06714  [pdf, other]

    cs.CV

    Improving the Transferability of Adversarial Examples by Feature Augmentation

    Authors: Donghua Wang, Wen Yao, Tingsong Jiang, Xiaohu Zheng, Junqi Wu, Xiaoqian Chen

    Abstract: Despite the success of input transformation-based attacks on boosting adversarial transferability, the performance is unsatisfying due to the ignorance of the discrepancy across models. In this paper, we propose a simple but effective feature augmentation attack (FAUG) method, which improves adversarial transferability without introducing extra computation costs. Specifically, we inject the random…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 19 pages, 4 figures, 4 tables

  2. arXiv:2407.06688  [pdf, other]

    cs.CV

    Universal Multi-view Black-box Attack against Object Detectors via Layout Optimization

    Authors: Donghua Wang, Wen Yao, Tingsong Jiang, Chao Li, Xiaoqian Chen

    Abstract: Object detectors have demonstrated vulnerability to adversarial examples crafted by small perturbations that can deceive the object detector. Existing adversarial attacks mainly focus on white-box attacks and are merely valid at a specific viewpoint, while the universal multi-view black-box attack is less explored, limiting their generalization in practice. In this paper, we propose a novel univer…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 12 pages, 13 figures, 5 tables

  3. arXiv:2407.06043  [pdf, other]

    cs.CV

    Test-time adaptation for geospatial point cloud semantic segmentation with distinct domain shifts

    Authors: Puzuo Wang, Wei Yao, Jie Shao, Zhiyi He

    Abstract: Domain adaptation (DA) techniques help deep learning models generalize across data shifts for point cloud semantic segmentation (PCSS). Test-time adaptation (TTA) allows direct adaptation of a pre-trained model to unlabeled data during inference stage without access to source data or additional training, avoiding privacy issues and large computational resources. We address TTA for geospatial PCSS…

    Submitted 8 July, 2024; originally announced July 2024.

  4. arXiv:2407.02830  [pdf, other]

    cs.CV eess.IV

    A Radiometric Correction based Optical Modeling Approach to Removing Reflection Noise in TLS Point Clouds of Urban Scenes

    Authors: Li Fang, Tianyu Li, Yanghong Lin, Shudong Zhou, Wei Yao

    Abstract: Point clouds are vital in computer vision tasks such as 3D reconstruction, autonomous driving, and robotics. However, TLS-acquired point clouds often contain virtual points from reflective surfaces, causing disruptions. This study presents a reflection noise elimination algorithm for TLS point clouds. Our innovative reflection plane detection algorithm, based on geometry-optical models and physica…

    Submitted 3 July, 2024; originally announced July 2024.

  5. arXiv:2406.18518  [pdf, other]

    cs.CL cs.AI cs.LG cs.SE

    APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

    Authors: Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

    Abstract: The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scal…

    Submitted 26 June, 2024; originally announced June 2024.

  6. arXiv:2406.12084  [pdf, other]

    cs.CL cs.AI

    When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives

    Authors: Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Wenlin Yao, Hassan Foroosh, Dong Yu, Fei Liu

    Abstract: Reasoning is most powerful when an LLM accurately aggregates relevant information. We examine the critical role of information aggregation in reasoning by requiring the LLM to analyze sports narratives. To succeed at this task, an LLM must infer points from actions, identify related entities, attribute points accurately to players and teams, and compile key statistics to draw conclusions. We condu…

    Submitted 17 June, 2024; originally announced June 2024.

  7. arXiv:2406.11739  [pdf, other]

    cs.CV

    V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

    Authors: Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, et al. (9 additional authors not shown)

    Abstract: Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories, and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3…

    Submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.11592  [pdf, other]

    cs.CV

    ChildDiffusion: Unlocking the Potential of Generative AI and Controllable Augmentations for Child Facial Data using Stable Diffusion and Large Language Models

    Authors: Muhammad Ali Farooq, Wang Yao, Peter Corcoran

    Abstract: In this research work we have proposed a high-level ChildDiffusion framework capable of generating photorealistic child facial samples and further embedding several intelligent augmentations on child facial data using short text prompts, detailed textual guidance from LLMs, and further image to image transformation using text guidance control conditioning thus providing an opportunity to curate full…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE Transactions Journal for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  9. arXiv:2406.10932  [pdf, other]

    cs.SD cs.AI eess.AS

    Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition

    Authors: Wenhan Yao, Jiangkun Yang, Yongqiang He, Jia Liu, Weiping Wen

    Abstract: Speech recognition is an essential starting point of human-computer interaction, and recently, deep learning models have achieved excellent success in this task. However, when the model training and private data provider are always separated, some security threats that make deep neural networks (DNNs) abnormal deserve to be researched. In recent years, the typical backdoor attacks have been researched…

    Submitted 16 June, 2024; originally announced June 2024.

  10. arXiv:2406.10252  [pdf, other]

    cs.IR cs.AI cs.CL

    AutoSurvey: Large Language Models Can Automatically Write Surveys

    Authors: Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, Wei Ye, Shikun Zhang, Yue Zhang

    Abstract: This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys in rapidly evolving fields like artificial intelligence. Traditional survey paper creation faces challenges due to the vast volume and complexity of information, prompting the need for efficient survey methods. While large language models (LLMs) offer promise in…

    Submitted 17 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  11. arXiv:2406.06932  [pdf, other]

    cs.CV

    Synthetic Face Ageing: Evaluation, Analysis and Facilitation of Age-Robust Facial Recognition Algorithms

    Authors: Wang Yao, Muhammad Ali Farooq, Joseph Lemley, Peter Corcoran

    Abstract: The ability to accurately recognize an individual's face with respect to human aging factor holds significant importance for various private as well as government sectors such as customs and public security bureaus, passport office, and national database systems. Therefore, developing a robust age-invariant face recognition system is of crucial importance to address the challenges posed by ageing…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  12. arXiv:2405.19444  [pdf, other]

    cs.AI

    MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions

    Authors: Zhenwen Liang, Dian Yu, Wenhao Yu, Wenlin Yao, Zhihan Zhang, Xiangliang Zhang, Dong Yu

    Abstract: Large language models (LLMs) have demonstrated impressive capabilities in mathematical problem solving, particularly in single turn question answering formats. However, real world scenarios often involve mathematical question answering that requires multi turn or interactive information exchanges, and the performance of LLMs on these tasks is still underexplored. This paper introduces MathChat, a…

    Submitted 29 May, 2024; originally announced May 2024.

  13. arXiv:2405.18777  [pdf, other]

    math.OC cs.LG

    SPABA: A Single-Loop and Probabilistic Stochastic Bilevel Algorithm Achieving Optimal Sample Complexity

    Authors: Tianshu Chu, Dachuan Xu, Wei Yao, Jin Zhang

    Abstract: While stochastic bilevel optimization methods have been extensively studied for addressing large-scale nested optimization problems in machine learning, it remains an open question whether the optimal complexity bounds for solving bilevel optimization are the same as those in single-level optimization. Our main result resolves this question: SPABA, an adaptation of the PAGE method for nonconvex op…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  14. arXiv:2405.09927  [pdf, other]

    math.OC cs.LG

    Moreau Envelope for Nonconvex Bi-Level Optimization: A Single-loop and Hessian-free Solution Strategy

    Authors: Risheng Liu, Zhu Liu, Wei Yao, Shangzhi Zeng, Jin Zhang

    Abstract: This work focuses on addressing two major challenges in the context of large-scale nonconvex Bi-Level Optimization (BLO) problems, which are increasingly applied in machine learning due to their ability to model nested structures. These challenges involve ensuring computational efficiency and providing theoretical guarantees. While recent advances in scalable BLO algorithms have primarily relied o…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  15. arXiv:2405.08283  [pdf, other]

    cs.RO

    Vector Field-Guided Learning Predictive Control for Motion Planning of Mobile Robots with Uncertain Dynamics

    Authors: Yang Lu, Weijia Yao, Yongqian Xiao, Xinglong Zhang, Xin Xu, Yaonan Wang, Dingbang Xiao

    Abstract: In obstacle-dense scenarios, providing safe guidance for mobile robots is critical to improve the safe maneuvering capability. However, the guidance provided by standard guiding vector fields (GVFs) may limit the motion capability due to the improper curvature of the integral curve when traversing obstacles. On the other hand, robotic system dynamics are often time-varying, uncertain, and even unk…

    Submitted 17 July, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  16. arXiv:2405.04861  [pdf, other]

    cs.SE

    Insights into Deep Learning Refactoring: Bridging the Gap Between Practices and Expectations

    Authors: SiQi Wang, Xing Hu, Bei Wang, WenXin Yao, Xin Xia, XingYu Wang

    Abstract: With the rapid development of deep learning, the implementation of intricate algorithms and substantial data processing have become standard elements of deep learning projects. As a result, the code has become progressively complex as the software evolves, which is difficult to maintain and understand. Existing studies have investigated the impact of refactoring on software quality within traditio…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 24 pages, 18 figures

  17. arXiv:2404.13692  [pdf, other]

    cs.CV

    A sustainable development perspective on urban-scale roof greening priorities and benefits

    Authors: Jie Shao, Wei Yao, Lei Luo, Linzhou Zeng, Zhiyi He, Puzuo Wang, Huadong Guo

    Abstract: Greenspaces are tightly linked to human well-being. Yet, rapid urbanization has exacerbated greenspace exposure inequality and declining human life quality. Roof greening has been recognized as an effective strategy to mitigate these negative impacts. Understanding priorities and benefits is crucial to promoting green roofs. Here, using geospatial big data, we conduct an urban-scale assessment of…

    Submitted 21 April, 2024; originally announced April 2024.

  18. arXiv:2404.06003  [pdf, other]

    cs.CL cs.AI

    FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models

    Authors: Zhuohao Yu, Chang Gao, Wenjin Yao, Yidong Wang, Zhengran Zeng, Wei Ye, Jindong Wang, Yue Zhang, Shikun Zhang

    Abstract: The rapid development of large language model (LLM) evaluation methodologies and datasets has led to a profound challenge: integrating state-of-the-art evaluation techniques cost-effectively while ensuring reliability, reproducibility, and efficiency. Currently, there is a notable absence of a unified and adaptable framework that seamlessly integrates various evaluation approaches. Moreover, the r…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: We open-source all our code at: https://github.com/WisdomShell/FreeEval

  19. arXiv:2403.08946  [pdf, other]

    cs.LG cs.CL cs.CY

    Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era

    Authors: Xuansheng Wu, Haiyan Zhao, Yaochen Zhu, Yucheng Shi, Fan Yang, Tianming Liu, Xiaoming Zhai, Wenlin Yao, Jundong Li, Mengnan Du, Ninghao Liu

    Abstract: Explainable AI (XAI) refers to techniques that provide human-understandable insights into the workings of AI models. Recently, the focus of XAI is being extended towards Large Language Models (LLMs) which are often criticized for their lack of transparency. This extension calls for a significant transformation in XAI methodologies because of two reasons. First, many existing XAI methods cannot be…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 38 pages, 4 figures

  20. arXiv:2403.06197  [pdf, other]

    eess.IV cs.CV cs.LG

    DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency

    Authors: Wenfang Yao, Kejing Yin, William K. Cheung, Jia Liu, Jing Qin

    Abstract: The combination of electronic health records (EHR) and medical images is crucial for clinicians in making diagnoses and forecasting prognosis. Strategically fusing these two data modalities has great potential to improve the accuracy of machine learning models in clinical prediction tasks. However, the asynchronous and complementary nature of EHR and medical images presents unique challenges. Miss…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI-24

  21. arXiv:2403.02132  [pdf, other]

    cs.CV

    UB-FineNet: Urban Building Fine-grained Classification Network for Open-access Satellite Images

    Authors: Zhiyi He, Wei Yao, Jie Shao, Puzuo Wang

    Abstract: Fine classification of city-scale buildings from satellite remote sensing imagery is a crucial research area with significant implications for urban planning, infrastructure development, and population distribution analysis. However, the task faces big challenges due to low-resolution overhead images acquired from high altitude space-borne platforms and the long-tail sample distribution of fine-gr…

    Submitted 4 March, 2024; originally announced March 2024.

  22. arXiv:2402.19465  [pdf, other]

    cs.CL cs.AI

    Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models

    Authors: Chen Qian, Jie Zhang, Wei Yao, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu, Jing Shao

    Abstract: Ensuring the trustworthiness of large language models (LLMs) is crucial. Most studies concentrate on fully pre-trained LLMs to better understand and improve LLMs' trustworthiness. In this paper, to reveal the untapped potential of pre-training, we pioneer the exploration of LLMs' trustworthiness during this period, focusing on five key dimensions: reliability, privacy, toxicity, fairness, and robu…

    Submitted 29 February, 2024; originally announced February 2024.

  23. arXiv:2402.17124  [pdf, other]

    cs.CL

    Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models

    Authors: Xinran Zhao, Hongming Zhang, Xiaoman Pan, Wenlin Yao, Dong Yu, Tongshuang Wu, Jianshu Chen

    Abstract: For an LLM to be trustworthy, its confidence level should be well-calibrated with its actual performance. While it is now common sense that LLM performances are greatly impacted by prompts, the confidence calibration in prompting LLMs has yet to be thoroughly explored. In this paper, we explore how different prompting strategies influence LLM confidence calibration and how it could be improved. We…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 17 pages, 10 figures

  24. arXiv:2402.15538  [pdf, other]

    cs.MA cs.AI

    AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System

    Authors: Zhiwei Liu, Weiran Yao, Jianguo Zhang, Liangwei Yang, Zuxin Liu, Juntao Tan, Prafulla K. Choubey, Tian Lan, Jason Wu, Huan Wang, Shelby Heinecke, Caiming Xiong, Silvio Savarese

    Abstract: The booming success of LLMs initiates rapid development in LLM agents. Though the foundation of an LLM agent is the generative model, it is critical to devise the optimal reasoning strategies and agent architectures. Accordingly, LLM agent research advances from the simple chain-of-thought prompting to more complex ReAct and Reflection reasoning strategy; agent architecture also evolves from singl…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: preprint. Library is available at https://github.com/SalesforceAIResearch/AgentLite

  25. arXiv:2402.15506  [pdf, other]

    cs.AI cs.CL cs.LG

    AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

    Authors: Jianguo Zhang, Tian Lan, Rithesh Murthy, Zhiwei Liu, Weiran Yao, Juntao Tan, Thai Hoang, Liangwei Yang, Yihao Feng, Zuxin Liu, Tulika Awalgaonkar, Juan Carlos Niebles, Silvio Savarese, Shelby Heinecke, Huan Wang, Caiming Xiong

    Abstract: Autonomous agents powered by large language models (LLMs) have garnered significant research attention. However, fully harnessing the potential of LLMs for agent-based tasks presents inherent challenges due to the heterogeneous nature of diverse data sources featuring multi-turn trajectories. In this paper, we introduce AgentOhana as a comprehensive solution to address these challenges. …

    Submitted 20 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: Add GitHub repo link at https://github.com/SalesforceAIResearch/xLAM and HuggingFace model link at https://huggingface.co/Salesforce/xLAM-v0.1-r

  26. arXiv:2402.15043  [pdf, other]

    cs.CL cs.AI cs.LG

    KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

    Authors: Zhuohao Yu, Chang Gao, Wenjin Yao, Yidong Wang, Wei Ye, Jindong Wang, Xing Xie, Yue Zhang, Shikun Zhang

    Abstract: Automatic evaluation methods for large language models (LLMs) are hindered by data contamination, leading to inflated assessments of their effectiveness. Existing strategies, which aim to detect contaminated texts, focus on quantifying contamination status instead of accurately gauging model performance. In this paper, we introduce KIEval, a Knowledge-grounded Interactive Evaluation framework, whi…

    Submitted 3 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024 (main conference); 19 pages, 5 figures, 19 tables, code is available at: https://github.com/zhuohaoyu/KIEval

  27. arXiv:2401.16164  [pdf, other]

    cs.LG math.OC

    Constrained Bi-Level Optimization: Proximal Lagrangian Value function Approach and Hessian-free Algorithm

    Authors: Wei Yao, Chengming Yu, Shangzhi Zeng, Jin Zhang

    Abstract: This paper presents a new approach and algorithm for solving a class of constrained Bi-Level Optimization (BLO) problems in which the lower-level problem involves constraints coupling both upper-level and lower-level variables. Such problems have recently gained significant attention due to their broad applicability in machine learning. However, conventional gradient-based methods unavoidably rely…

    Submitted 29 January, 2024; originally announced January 2024.

  28. arXiv:2401.14535  [pdf, other]

    cs.LG cs.CV stat.ME

    CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation Process

    Authors: Guangyi Chen, Yifan Shen, Zhenhao Chen, Xiangchen Song, Yuewen Sun, Weiran Yao, Xiao Liu, Kun Zhang

    Abstract: Identifying the underlying time-delayed latent causal processes in sequential data is vital for grasping temporal dynamics and making downstream reasoning. While some recent methods can robustly identify these latent causal variables, they rely on strict assumptions about the invertible generation process from latent variables to observed data. However, these assumptions are often hard to satisfy…

    Submitted 30 May, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: To appear at ICML 2024, 24 pages

  29. Coordinated Guiding Vector Field Design for Ordering-Flexible Multi-Robot Surface Navigation

    Authors: Bin-Bin Hu, Hai-Tao Zhang, Weijia Yao, Zhiyong Sun, Ming Cao

    Abstract: We design a distributed coordinated guiding vector field (CGVF) for a group of robots to achieve ordering-flexible motion coordination while maneuvering on a desired two-dimensional (2D) surface. The CGVF is characterized by three terms, i.e., a convergence term to drive the robots to converge to the desired surface, a propagation term to provide a traversing direction for maneuvering on the desir…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Published in IEEE Transactions on Automatic Control, 2024

  30. arXiv:2401.13919  [pdf, other]

    cs.CL cs.AI

    WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

    Authors: Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Yong Dai, Hongming Zhang, Zhenzhong Lan, Dong Yu

    Abstract: The rapid advancement of large language models (LLMs) has led to a new era marked by the development of autonomous applications in real-world scenarios, which drives innovation in creating advanced web agents. Existing web agents typically only handle one input modality and are evaluated only in simplified web simulators or static web snapshots, greatly limiting their applicability in real-world s…

    Submitted 6 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted to ACL 2024 (main). Code and data are released at https://github.com/MinorJerry/WebVoyager

  31. arXiv:2401.12870  [pdf, other]

    cs.CV

    Unlocking the Potential: Multi-task Deep Learning for Spaceborne Quantitative Monitoring of Fugitive Methane Plumes

    Authors: Guoxin Si, Shiliang Fu, Wei Yao

    Abstract: As global warming intensifies, increased attention is being paid to monitoring fugitive methane emissions and detecting gas plumes from landfills. We have divided methane emission monitoring into three subtasks: methane concentration inversion, plume segmentation, and emission rate estimation. Traditional algorithms face certain limitations: methane concentration inversion typically employs the ma…

    Submitted 15 July, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  32. arXiv:2401.10495  [pdf, ps, other]

    cs.LG cs.AI stat.ME

    Causal Layering via Conditional Entropy

    Authors: Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese

    Abstract: Causal discovery aims to recover information about an unobserved causal graph from the observable data it generates. Layerings are orderings of the variables which place causes before effects. In this paper, we provide ways to recover layerings of a graph by accessing the data via a conditional entropy oracle, when distributions are discrete. Our algorithms work by repeatedly removing sources or s…

    Submitted 19 January, 2024; originally announced January 2024.

  33. arXiv:2401.08734  [pdf, other]

    cs.CV cs.LG

    Bag of Tricks to Boost Adversarial Transferability

    Authors: Zeliang Zhang, Wei Yao, Xiaosen Wang

    Abstract: Deep neural networks are widely known to be vulnerable to adversarial examples. However, vanilla adversarial examples generated under the white-box setting often exhibit low transferability across different models. Since adversarial transferability poses more severe threats to practical applications, various approaches have been proposed for better transferability, including gradient-based, input…

    Submitted 19 July, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  34. arXiv:2401.07526  [pdf, other]

    cs.CL cs.AI cs.LG

    Editing Arbitrary Propositions in LLMs without Subject Labels

    Authors: Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese

    Abstract: Large Language Model (LLM) editing modifies factual information in LLMs. Locate-and-Edit (L&E) methods accomplish this by finding where relevant information is stored within the neural network, and editing the weights at that location. The goal of editing is to modify the response of an LLM to a proposition independently of its phrasing, while not modifying its response to other related propositi…

    Submitted 15 January, 2024; originally announced January 2024.

  35. arXiv:2401.05159  [pdf, other]

    cs.CV cs.AI

    Derm-T2IM: Harnessing Synthetic Skin Lesion Data via Stable Diffusion Models for Enhanced Skin Disease Classification using ViT and CNN

    Authors: Muhammad Ali Farooq, Wang Yao, Michael Schukat, Mark A Little, Peter Corcoran

    Abstract: This study explores the utilization of Dermatoscopic synthetic data generated through stable diffusion models as a strategy for enhancing the robustness of machine learning model training. Synthetic data generation plays a pivotal role in mitigating challenges associated with limited labeled datasets, thereby facilitating more effective model training. In this context, we aim to incorporate enhanc…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Paper submitted to the EMBC 2024 conference

  36. arXiv:2401.03601  [pdf, other]

    cs.CL cs.AI

    InFoBench: Evaluating Instruction Following Ability in Large Language Models

    Authors: Yiwei Qin, Kaiqiang Song, Yebowen Hu, Wenlin Yao, Sangwoo Cho, Xiaoyang Wang, Xuansheng Wu, Fei Liu, Pengfei Liu, Dong Yu

    Abstract: This paper introduces the Decomposed Requirements Following Ratio (DRFR), a new metric for evaluating Large Language Models' (LLMs) ability to follow instructions. Addressing a gap in current methodologies, DRFR breaks down complex instructions into simpler criteria, facilitating a detailed analysis of LLMs' compliance with various aspects of tasks. Alongside this metric, we present InFoBench, a b…

    Submitted 7 January, 2024; originally announced January 2024.

  37. arXiv:2401.01730  [pdf, other]

    cs.CV

    STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion

    Authors: Wei Yao, Hongwen Zhang, Yunlian Sun, Jinhui Tang

    Abstract: The recovery of 3D human mesh from monocular images has significantly been developed in recent years. However, existing models usually ignore spatial and temporal information, which might lead to mesh and image misalignment and temporal discontinuity. For this reason, we propose a novel Spatio-Temporal Alignment Fusion (STAF) model. As a video-based model, it leverages coherence clues from human m…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: Project Page: https://yw0208.github.io/staf/

  38. arXiv:2312.11336  [pdf, other]

    cs.IR cs.AI

    DRDT: Dynamic Reflection with Divergent Thinking for LLM-based Sequential Recommendation

    Authors: Yu Wang, Zhiwei Liu, Jianguo Zhang, Weiran Yao, Shelby Heinecke, Philip S. Yu

    Abstract: The rise of Large Language Models (LLMs) has sparked interest in their application to sequential recommendation tasks as they can provide supportive item information. However, due to the inherent complexities of sequential recommendation, such as sequential patterns across datasets, noise within sequences, and the temporal evolution of user preferences, existing LLM reasoning strategies, such as i…

    Submitted 18 December, 2023; originally announced December 2023.

  39. arXiv:2312.02772  [pdf, other]

    cs.CV

    FG-MDM: Towards Zero-Shot Human Motion Generation via Fine-Grained Descriptions

    Authors: Xu Shi, Wei Yao, Chuanchen Luo, Junran Peng, Hongwen Zhang, Yunlian Sun

    Abstract: Recently, significant progress has been made in text-based motion generation, enabling the generation of diverse and high-quality human motions that conform to textual descriptions. However, generating motions beyond the distribution of original datasets remains challenging, i.e., zero-shot generation. By adopting a divide-and-conquer strategy, we propose a new framework named Fine-Grained Human M…

    Submitted 23 April, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Project Page: https://sx0207.github.io/fg-mdm/

  40. arXiv:2311.18340  [pdf, other]

    cs.ET

    Neuromorphic Incremental on-chip Learning with Hebbian Weight Consolidation

    Authors: Zifan Ning, Chaojin Chen, Xiang Cheng, Wangzi Yao, Tielin Zhang, Bo Xu

    Abstract: As next-generation implantable brain-machine interfaces become pervasive on edge devices, incrementally learning new tasks in bio-plasticity ways is urgently demanded for Neuromorphic chips. Due to the inherent characteristics of its structure, spiking neural networks are naturally well-suited for BMI-chips. Here we propose Hebbian Weight Consolidation, as well as an on-chip learning framework. HWC…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 12 pages, 6 figures

  41. arXiv:2311.17460  [pdf, other]

    cs.CV

    W-HMR: Human Mesh Recovery in World Space with Weak-supervised Camera Calibration and Orientation Correction

    Authors: Wei Yao, Hongwen Zhang, Yunlian Sun, Jinhui Tang

    Abstract: For a long time, in reconstructing 3D human bodies from monocular images, most methods opted to simplify the task by minimizing the influence of the camera. Using a coarse focal length setting results in the reconstructed bodies not aligning well with distorted images. Ignoring camera rotation leads to an unrealistic reconstructed body pose in world space. Consequently, the application scenarios o…

    Submitted 24 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: Project Page: https://yw0208.github.io/w-hmr/

  42. arXiv:2311.14899  [pdf, other]

    cs.CV

    HyperDID: Hyperspectral Intrinsic Image Decomposition with Deep Feature Embedding

    Authors: Zhiqiang Gong, Xian Zhou, Wen Yao, Xiaohu Zheng, Ping Zhong

    Abstract: The dissection of hyperspectral images into intrinsic components through hyperspectral intrinsic image decomposition (HIID) enhances the interpretability of hyperspectral data, providing a foundation for more accurate classification outcomes. However, the classification performance of HIID is constrained by the model's representational ability. To address this limitation, this study rethinks hyper…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: Submitted to IEEE TGRS

  43. arXiv:2311.11182  [pdf, other]

    stat.ML cs.LG

    Exponentially Convergent Algorithms for Supervised Matrix Factorization

    Authors: Joowon Lee, Hanbaek Lyu, Weixin Yao

    Abstract: Supervised matrix factorization (SMF) is a classical machine learning method that simultaneously seeks feature extraction and classification tasks, which are not necessarily a priori aligned objectives. Our goal is to use SMF to learn low-rank latent factors that offer interpretable, data-reconstructive, and class-discriminative features, addressing challenges posed by high-dimensional data. Train…

    Submitted 18 November, 2023; originally announced November 2023.

    Comments: 33 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2206.06774

    Journal ref: Neural Information Processing Systems 2023

  44. arXiv:2311.10774  [pdf, other]

    cs.CL cs.AI

    MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning

    Authors: Fuxiao Liu, Xiaoyang Wang, Wenlin Yao, Jianshu Chen, Kaiqiang Song, Sangwoo Cho, Yaser Yacoob, Dong Yu

    Abstract: With the rapid development of large language models (LLMs) and their integration into large multimodal models (LMMs), there has been impressive progress in zero-shot completion of user-oriented vision-language tasks. However, a gap remains in the domain of chart image understanding due to the distinct abstract components in charts. To address this, we introduce a large-scale MultiModal Chart Instr…

    Submitted 15 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024

  45. arXiv:2311.06307  [pdf]

    cs.HC cs.AI cs.SD eess.AS

    Synthetic Speaking Children -- Why We Need Them and How to Make Them

    Authors: Muhammad Ali Farooq, Dan Bigioi, Rishabh Jain, Wang Yao, Mariam Yiwere, Peter Corcoran

    Abstract: Contemporary Human Computer Interaction (HCI) research relies primarily on neural network models for machine vision and speech understanding of a system user. Such models require extensively annotated training datasets for optimal performance and when building interfaces for users from a vulnerable population such as young children, GDPR introduces significant complexities in data collection, mana…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Presented at SpeD 23

  46. arXiv:2311.05374  [pdf, other]

    cs.CL cs.AI

    TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs

    Authors: Shuyi Xie, Wenlin Yao, Yong Dai, Shaobo Wang, Donlin Zhou, Lifeng Jin, Xinhua Feng, Pengzhi Wei, Yujie Lin, Zhichao Hu, Dong Yu, Zhengyou Zhang, Jing Nie, Yuhong Liu

    Abstract: Large language models (LLMs) have shown impressive capabilities across various natural language tasks. However, evaluating their alignment with human preferences remains a challenge. To this end, we propose a comprehensive human evaluation framework to assess LLMs' proficiency in following instructions on diverse real-world tasks. We construct a hierarchical task tree encompassing 7 major areas co…

    Submitted 9 November, 2023; originally announced November 2023.

  47. arXiv:2311.01696  [pdf, other]

    cs.CR cs.CV

    Universal Perturbation-based Secret Key-Controlled Data Hiding

    Authors: Donghua Wang, Wen Yao, Tingsong Jiang, Xiaoqian Chen

    Abstract: Deep neural networks (DNNs) are demonstrated to be vulnerable to universal perturbation, a single quasi-perceptible perturbation that can deceive the DNN on most images. However, the previous works are focused on using universal perturbation to perform adversarial attacks, while the potential usability of universal perturbation as data carriers in data hiding is less explored, especially for the k…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 18 pages, 8 tables, 10 figures

  48. Spontaneous-Ordering Platoon Control for Multirobot Path Navigation Using Guiding Vector Fields

    Authors: Bin-Bin Hu, Hai-Tao Zhang, Weijia Yao, Jianing Ding, Ming Cao

    Abstract: In this paper, we propose a distributed guiding-vector-field (DGVF) algorithm for a team of robots to form a spontaneous-ordering platoon moving along a predefined desired path in the n-dimensional Euclidean space. Particularly, by adding a path parameter as an additional virtual coordinate to each robot, the DGVF algorithm can eliminate the singular points where the vector fields vanish, and gove…

    Submitted 1 November, 2023; originally announced November 2023.

    Journal ref: IEEE Transactions on Robotics, 2023

  49. arXiv:2310.18615  [pdf, other]

    cs.LG stat.ML

    Temporally Disentangled Representation Learning under Unknown Nonstationarity

    Authors: Xiangchen Song, Weiran Yao, Yewen Fan, Xinshuai Dong, Guangyi Chen, Juan Carlos Niebles, Eric Xing, Kun Zhang

    Abstract: In unsupervised causal representation learning for sequential data with time-delayed latent causal influences, strong identifiability results for the disentanglement of causally-related latent variables have been established in stationary settings by leveraging temporal structure. However, in nonstationary settings, existing work has only partially addressed the problem by either utilizing observed aux…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  50. arXiv:2310.18550  [pdf, other]

    cs.CV

    MultiScale Spectral-Spatial Convolutional Transformer for Hyperspectral Image Classification

    Authors: Zhiqiang Gong, Xian Zhou, Wen Yao

    Abstract: Owing to its strong ability to capture global information, the Transformer has become an alternative architecture to CNNs for hyperspectral image classification. However, the general Transformer mainly considers the global spectral information while ignoring the multiscale spatial information of the hyperspectral image. In this paper, we propose a multiscale spectral-spatial convolutional Transformer…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: submitted to IEEE GRSL