Showing 1–50 of 296 results for author: Jiang, B

Searching in archive cs.
  1. arXiv:2407.17535  [pdf, other]

    cs.AI cs.LG cs.SE

    LAMBDA: A Large Model Based Data Agent

    Authors: Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan, Jian Huang

    Abstract: We introduce "LAMBDA," a novel open-source, code-free multi-agent data analysis system that harnesses the power of large models. LAMBDA is designed to address data analysis challenges in complex data-driven applications through the use of innovatively designed data agents that operate iteratively and generatively using natural language. At the core of LAMBDA are two key agent roles: the prog…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 30 pages, 21 figures and 5 tables

    MSC Class: 62-04; 62-08; 68T01; 68T09

  2. arXiv:2407.17451  [pdf, other]

    cs.SI cs.CY cs.IR

    BlueTempNet: A Temporal Multi-network Dataset of Social Interactions in Bluesky Social

    Authors: Ujun Jeong, Bohan Jiang, Zhen Tan, H. Russell Bernard, Huan Liu

    Abstract: Decentralized social media platforms like Bluesky Social (Bluesky) have made it possible to publicly disclose some user behaviors with millisecond-level precision. Embracing Bluesky's principles of open-source and open-data, we present the first collection of the temporal dynamics of user-driven social interactions. BlueTempNet integrates multiple types of networks into a single multi-network, inc…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: to appear in IEEE Data Description

  3. arXiv:2407.17349  [pdf, other]

    cs.CL

    Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching

    Authors: Yuyang Ding, Hanglei Hu, Jie Zhou, Qin Chen, Bo Jiang, Liang He

    Abstract: With the introduction of large language models (LLMs), automatic math reasoning has seen tremendous success. However, current methods primarily focus on providing solutions or using techniques like Chain-of-Thought to enhance problem-solving accuracy. In this paper, we focus on improving the capability of mathematics teaching via a Socratic teaching-based LLM (SocraticLLM), which guides l…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted By CIKM 2024

  4. arXiv:2407.08585  [pdf, other]

    cs.RO cs.AI cs.LG

    HACMan++: Spatially-Grounded Motion Primitives for Manipulation

    Authors: Bowen Jiang, Yilin Wu, Wenxuan Zhou, Chris Paxton, David Held

    Abstract: Although end-to-end robot learning has shown some success for robot manipulation, the learned policies are often not sufficiently robust to variations in object pose or geometry. To improve the policy generalization, we introduce spatially-grounded parameterized motion primitives in our method HACMan++. Specifically, we propose an action representation consisting of three components: what primitiv…

    Submitted 11 July, 2024; originally announced July 2024.

  5. arXiv:2407.03900  [pdf, other]

    cs.CV

    Oracle Bone Inscriptions Multi-modal Dataset

    Authors: Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, Pian Wu, Chaoyang Wang, Yun Qi, Taisong Jin, Chengjie Wang, Xiaoming Huang, Zhan Shu, Rongrong Ji, Yongge Liu, Yunsheng Wu

    Abstract: Oracle bone inscriptions (OBI) is the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography. However, the task of deciphering OBI, in the current climate of the scholarship, can prove extremely challenging. Out of the 4,500 oracle bone characters excavated, only a third have been successfully identified. Therefore, leveraging…

    Submitted 4 July, 2024; originally announced July 2024.

  6. arXiv:2407.03876  [pdf, other]

    cs.CR cs.CL

    DART: Deep Adversarial Automated Red Teaming for LLM Safety

    Authors: Bojian Jiang, Yi Jing, Tianhao Shen, Qing Yang, Deyi Xiong

    Abstract: Manual red teaming is a commonly used method to identify vulnerabilities in large language models (LLMs), but it is costly and unscalable. In contrast, automated red teaming uses a Red LLM to automatically generate adversarial prompts to the Target LLM, offering a scalable way for safety vulnerability detection. However, the difficulty of building a powerful automated Red LLM lies in the fact that…

    Submitted 4 July, 2024; originally announced July 2024.

  7. SUPER: Seated Upper Body Pose Estimation using mmWave Radars

    Authors: Bo Zhang, Zimeng Zhou, Boyu Jiang, Rong Zheng

    Abstract: In industrial countries, adults spend a considerable amount of time sedentary each day at work, driving and during activities of daily living. Characterizing the seated upper body human poses using mmWave radars is an important, yet under-studied topic with many applications in human-machine interaction, transportation and road safety. In this work, we devise SUPER, a framework for seated upper bo…

    Submitted 2 July, 2024; originally announced July 2024.

  8. arXiv:2406.18572  [pdf, other]

    cs.CV cs.LG

    GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model

    Authors: Ling Li, Yu Ye, Bingchuan Jiang, Wei Zeng

    Abstract: This work tackles the problem of geo-localization with a new paradigm using a large vision-language model (LVLM) augmented with human inference knowledge. A primary challenge here is the scarcity of data for training the LVLM - existing street-view datasets often contain numerous low-quality images lacking visual clues, and lack any reasoning inference. To address the data-quality issue, we devise…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  9. arXiv:2406.17992  [pdf, other]

    cs.CL cs.AI

    Catching Chameleons: Detecting Evolving Disinformation Generated using Large Language Models

    Authors: Bohan Jiang, Chengshuai Zhao, Zhen Tan, Huan Liu

    Abstract: Despite recent advancements in detecting disinformation generated by large language models (LLMs), current efforts overlook the ever-evolving nature of this disinformation. In this work, we investigate a challenging yet practical research problem of detecting evolving LLM-generated disinformation. Disinformation evolves constantly through the rapid development of LLMs and their variants. As a cons…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures

  10. arXiv:2406.17518  [pdf, other]

    cs.AI cs.SI

    Enhancing Explainability of Knowledge Learning Paths: Causal Knowledge Networks

    Authors: Yuang Wei, Yizhou Zhou, Yuan-Hao Jiang, Bo Jiang

    Abstract: A reliable knowledge structure is a prerequisite for building effective adaptive learning systems and intelligent tutoring systems. Pursuing an explainable and trustworthy knowledge structure, we propose a method for constructing causal knowledge networks. This approach leverages Bayesian networks as a foundation and incorporates causal relationship analysis to derive a causal network. Additionall…

    Submitted 25 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures, Educational Data Mining 2024, Human-Centric eXplainable AI in Education

  11. arXiv:2406.17238  [pdf, other]

    cs.LG cs.CV eess.IV

    Expansive Synthesis: Generating Large-Scale Datasets from Minimal Samples

    Authors: Vahid Jebraeeli, Bo Jiang, Hamid Krim, Derya Cansever

    Abstract: The challenge of limited availability of data for training in machine learning arises in many applications and the impact on performance and generalization is serious. Traditional data augmentation methods aim to enhance training with a moderately sufficient data set. Generative models like Generative Adversarial Networks (GANs) often face problematic convergence when generating significant and di…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 14 pages. arXiv admin note: text overlap with arXiv:2405.13866

  12. arXiv:2406.14846  [pdf, other]

    cs.LG

    Graph Edge Representation via Tensor Product Graph Convolutional Representation

    Authors: Bo Jiang, Sheng Ge, Ziyan Zhang, Beibei Wang, Jin Tang, Bin Luo

    Abstract: Graph Convolutional Networks (GCNs) have been widely studied. The core of GCNs is the definition of convolution operators on graphs. However, existing Graph Convolution (GC) operators are mainly defined on adjacency matrix and node features and generally focus on obtaining effective node embeddings which cannot be utilized to address the graphs with (high-dimensional) edge features. To address thi…

    Submitted 20 June, 2024; originally announced June 2024.

  13. arXiv:2406.12896  [pdf, other]

    cs.AI cs.CY cs.LG

    Leveraging Pedagogical Theories to Understand Student Learning Process with Graph-based Reasonable Knowledge Tracing

    Authors: Jiajun Cui, Hong Qian, Bo Jiang, Wei Zhang

    Abstract: Knowledge tracing (KT) is a crucial task in intelligent education, focusing on predicting students' performance on given questions to trace their evolving knowledge. The advancement of deep learning in this field has led to deep-learning knowledge tracing (DLKT) models that prioritize high predictive accuracy. However, many existing DLKT methods overlook the fundamental goal of tracking students'…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Preprint, accepted to appear in SIGKDD 2024, 12 pages. The source code is available at https://github.com/JJCui96/GRKT. Keywords: interpretable knowledge tracing, student behavior modeling, intelligence education

  14. arXiv:2406.12373  [pdf, other]

    cs.CL cs.AI cs.LG

    WebCanvas: Benchmarking Web Agents in Online Environments

    Authors: Yichen Pan, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu, Yanyi Shang, Shuyan Zhou, Tongshuang Wu, Zhengyang Wu

    Abstract: For web agents to be practically useful, they must adapt to the continuously evolving web environment characterized by frequent updates to user interfaces and content. However, most existing benchmarks only capture the static aspects of the web. To bridge this gap, we introduce WebCanvas, an innovative online evaluation framework for web agents that effectively addresses the dynamic nature of web…

    Submitted 16 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Our platform, tool and dataset are publicly available at https://www.imean.ai/web-canvas/ and https://huggingface.co/datasets/iMeanAI/Mind2Web-Live/

    MSC Class: 68T50 ACM Class: I.2.7

  15. arXiv:2406.11050  [pdf, other]

    cs.CL cs.AI

    A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners

    Authors: Bowen Jiang, Yangxinyu Xie, Zhuoqun Hao, Xiaomeng Wang, Tanwi Mallick, Weijie J. Su, Camillo J. Taylor, Dan Roth

    Abstract: This study introduces a hypothesis-testing framework to assess whether large language models (LLMs) possess genuine reasoning abilities or primarily depend on token bias. We go beyond evaluating LLMs on accuracy; rather, we aim to investigate their token bias in solving logical reasoning tasks. Specifically, we develop carefully controlled synthetic datasets, featuring conjunction fallacy and syll…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Codes are open-sourced at https://github.com/bowen-upenn/llm_token_bias

  16. arXiv:2406.10498  [pdf, other]

    cs.LG cs.SI

    A Unified Graph Selective Prompt Learning for Graph Neural Networks

    Authors: Bo Jiang, Hao Wu, Ziyan Zhang, Beibei Wang, Jin Tang

    Abstract: In recent years, graph prompt learning/tuning has garnered increasing attention in adapting pre-trained models for graph representation learning. As a kind of universal graph prompt learning method, Graph Prompt Feature (GPF) has achieved remarkable success in adapting pre-trained models for Graph Neural Networks (GNNs). By fixing the parameters of a pre-trained GNN model, the aim of GPF is to mod…

    Submitted 15 June, 2024; originally announced June 2024.

  17. arXiv:2406.07698  [pdf, other]

    cs.LG

    Label Smoothing Improves Machine Unlearning

    Authors: Zonglin Di, Zhaowei Zhu, Jinghan Jia, Jiancheng Liu, Zafar Takhirov, Bo Jiang, Yuanshun Yao, Sijia Liu, Yang Liu

    Abstract: The objective of machine unlearning (MU) is to eliminate previously learned data from a model. However, it is challenging to strike a balance between computation cost and performance when using existing MU techniques. Taking inspiration from the influence of label smoothing on model confidence and differential privacy, we propose a simple gradient-based MU approach that uses an inverse process of…

    Submitted 11 June, 2024; originally announced June 2024.

  18. arXiv:2406.07411  [pdf, other]

    cs.SE cs.CL

    VersiCode: Towards Version-controllable Code Generation

    Authors: Tongtong Wu, Weigang Wu, Xingyu Wang, Kang Xu, Suyu Ma, Bo Jiang, Ping Yang, Zhenchang Xing, Yuan-Fang Li, Gholamreza Haffari

    Abstract: Significant research has focused on improving the performance of large language models on code-related tasks due to their practical importance. Although performance is typically evaluated using public benchmark datasets, the existing datasets do not account for the concept of version, which is crucial in professional software development. In this paper, we introduce VersiCode, the first comp…

    Submitted 11 June, 2024; originally announced June 2024.

  19. arXiv:2406.01414  [pdf, other]

    cs.LG eess.SP

    CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework

    Authors: Yiyang Zhao, Yunzhuo Liu, Bo Jiang, Tian Guo

    Abstract: This work presents a novel approach to neural architecture search (NAS) that aims to increase carbon efficiency for the model design process. The proposed framework CE-NAS addresses the key challenge of high carbon cost associated with NAS by exploring the carbon emission variations of energy and energy differences of different NAS algorithms. At the high level, CE-NAS leverages a reinforcement-le…

    Submitted 17 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2307.04131

  20. arXiv:2406.00252  [pdf, other]

    cs.AI cs.CL cs.CV cs.MA

    Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey

    Authors: Bowen Jiang, Yangxinyu Xie, Xiaomeng Wang, Weijie J. Su, Camillo J. Taylor, Tanwi Mallick

    Abstract: Rationality is the quality of being guided by reason, characterized by logical thinking and decision-making that align with evidence and logical rules. This quality is essential for effective problem-solving, as it ensures that solutions are well-founded and systematically derived. Despite the advancements of large language models (LLMs) in generating human-like text with remarkable accuracy, they…

    Submitted 18 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

  21. arXiv:2405.20969  [pdf, other]

    cs.RO eess.SY

    Design, Calibration, and Control of Compliant Force-sensing Gripping Pads for Humanoid Robots

    Authors: Yuanfeng Han, Boren Jiang, Gregory S. Chirikjian

    Abstract: This paper introduces a pair of low-cost, light-weight and compliant force-sensing gripping pads used for manipulating box-like objects with smaller-sized humanoid robots. These pads measure normal gripping forces and center of pressure (CoP). A calibration method is developed to improve the CoP measurement accuracy. A hybrid force-alignment-position control framework is proposed to regulate the g…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 21 pages, 16 figures, Published in ASME Journal of Mechanisms and Robotics

    Journal ref: Journal of Mechanisms and Robotics, 15, 031010,2023

  22. arXiv:2405.20081  [pdf, other]

    cs.CV cs.AI

    NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models

    Authors: Kai Wu, Boyuan Jiang, Zhengkai Jiang, Qingdong He, Donghao Luo, Shengzhi Wang, Qingwen Liu, Chengjie Wang

    Abstract: Multimodal large language models (MLLMs) contribute a powerful mechanism to understanding visual information building on large language models. However, MLLMs are notorious for suffering from hallucinations, especially when generating lengthy, detailed descriptions for images. Our analysis reveals that hallucinations stem from the inherent summarization mechanism of large language models, leading…

    Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures with supplementary material

  23. arXiv:2405.16854  [pdf, other]

    cs.MA

    Knowing What Not to Do: Leverage Language Model Insights for Action Space Pruning in Multi-agent Reinforcement Learning

    Authors: Zhihao Liu, Xianliang Yang, Zichuan Liu, Yifan Xia, Wei Jiang, Yuanyu Zhang, Lijuan Li, Guoliang Fan, Lei Song, Bian Jiang

    Abstract: Multi-agent reinforcement learning (MARL) is employed to develop autonomous agents that can learn to adopt cooperative or competitive strategies within complex environments. However, the linear increase in the number of agents leads to a combinatorial explosion of the action space, which may result in algorithmic instability, difficulty in convergence, or entrapment in local optima. While research…

    Submitted 27 May, 2024; originally announced May 2024.

  24. arXiv:2405.13866  [pdf, other]

    cs.LG cs.CV eess.IV

    Koopcon: A new approach towards smarter and less complex learning

    Authors: Vahid Jebraeeli, Bo Jiang, Derya Cansever, Hamid Krim

    Abstract: In the era of big data, the sheer volume and complexity of datasets pose significant challenges in machine learning, particularly in image processing tasks. This paper introduces an innovative Autoencoder-based Dataset Condensation Model backed by Koopman operator theory that effectively packs large datasets into compact, information-rich representations. Inspired by the predictive coding mechanis…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 7 pages, 3 figures

  25. arXiv:2405.07668  [pdf, other]

    cs.SE cs.AI cs.CR

    CrossCert: A Cross-Checking Detection Approach to Patch Robustness Certification for Deep Learning Models

    Authors: Qilin Zhou, Zhengyuan Wei, Haipeng Wang, Bo Jiang, W. K. Chan

    Abstract: Patch robustness certification is an emerging kind of defense technique against adversarial patch attacks with provable guarantees. There are two research lines: certified recovery and certified detection. They aim to label malicious samples with provable guarantees correctly and issue warnings for malicious samples predicted to non-benign labels with provable guarantees, respectively. However, ex…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 23 pages, 2 figures, accepted by FSE 2024 (The ACM International Conference on the Foundations of Software Engineering)

  26. arXiv:2404.18174  [pdf, other]

    cs.CV cs.AI

    Mamba-FETrack: Frame-Event Tracking via State Space Model

    Authors: Ju Huang, Shiao Wang, Shuai Wang, Zhe Wu, Xiao Wang, Bo Jiang

    Abstract: RGB-Event based tracking is an emerging research topic, focusing on how to effectively integrate heterogeneous multi-modal data (synchronized exposure video frames and asynchronous pulse Event stream). Existing works typically employ Transformer based networks to handle these modalities and achieve decent accuracy through input-level or feature-level fusion on multiple datasets. However, these tra…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: In Peer Review

  27. arXiv:2404.17929  [pdf, other]

    cs.CV cs.AI cs.CL

    Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition

    Authors: Xiao Wang, Qian Zhu, Jiandong Jin, Jun Zhu, Futian Wang, Bo Jiang, Yaowei Wang, Yonghong Tian

    Abstract: Existing pedestrian attribute recognition (PAR) algorithms are mainly developed based on a static image; however, the performance is unreliable in challenging scenarios, such as heavy occlusion, motion blur, etc. In this work, we propose to understand human attributes using video frames that can fully use temporal information by fine-tuning a pre-trained multi-modal foundation model efficiently. S…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Parameter Efficient Fine-Tuning Strategy for Video-based Pedestrian Attribute Recognition

  28. arXiv:2404.17926  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Pre-training on High Definition X-ray Images: An Experimental Study

    Authors: Xiao Wang, Yuehang Li, Wentao Wu, Jiandong Jin, Yao Rong, Bo Jiang, Chuanfu Li, Jin Tang

    Abstract: Existing X-ray based pre-trained vision models are usually conducted on a relatively small-scale dataset (less than 500k samples) with limited resolution (e.g., 224 $\times$ 224). However, the key to the success of self-supervised pre-training large models lies in massive training data, and maintaining high resolution in the field of X-ray images is the guarantee of effective solutions to difficul…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Technology Report

  29. arXiv:2404.14837  [pdf, other]

    eess.IV cs.CV

    Ultrasound SAM Adapter: Adapting SAM for Breast Lesion Segmentation in Ultrasound Images

    Authors: Zhengzheng Tu, Le Gu, Xixi Wang, Bo Jiang

    Abstract: Segment Anything Model (SAM) has recently achieved amazing results in the field of natural image segmentation. However, it is not effective for medical image segmentation, owing to the large domain gap between natural and medical images. In this paper, we mainly focus on ultrasound image segmentation. As we know that it is very difficult to train a foundation model for ultrasound image data due to…

    Submitted 23 April, 2024; originally announced April 2024.

  30. arXiv:2404.10384  [pdf, other]

    cs.CL cs.AI cs.IR

    Reasoning on Efficient Knowledge Paths:Knowledge Graph Guides Large Language Model for Domain Question Answering

    Authors: Yuqi Wang, Boran Jiang, Yi Luo, Dawei He, Peng Cheng, Liangcai Gao

    Abstract: Large language models (LLMs), such as GPT3.5, GPT4 and LLAMA2 perform surprisingly well and outperform human experts on many tasks. However, in many domain-specific evaluations, these LLMs often suffer from hallucination problems due to insufficient training of relevant corpus. Furthermore, fine-tuning large models may face problems such as the LLMs are not open source or the construction of high-…

    Submitted 16 April, 2024; originally announced April 2024.

  31. arXiv:2404.09516  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.MM

    State Space Model for New-Generation Network Alternative to Transformers: A Survey

    Authors: Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang, Ziwen Wang, Bo Jiang, Chenglong Li, Yaowei Wang, Yonghong Tian, Jin Tang

    Abstract: In the post-deep learning era, the Transformer architecture has demonstrated its powerful performance across pre-trained big models and various downstream tasks. However, the enormous computational demands of this architecture have deterred many researchers. To further reduce the complexity of attention models, numerous efforts have been made to design more efficient methods. Among them, the State…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: The First review of State Space Model (SSM)/Mamba and their applications in artificial intelligence, 33 pages

  32. arXiv:2404.06063  [pdf, other]

    cs.CL cs.AI cs.LG

    All in One: An Empirical Study of GPT for Few-Shot Aspect-Based Sentiment Anlaysis

    Authors: Baoxing Jiang

    Abstract: Aspect-Based Sentiment Analysis (ABSA) is an indispensable and highly challenging task in natural language processing. Current efforts have focused on specific sub-tasks, making it difficult to comprehensively cover all sub-tasks within the ABSA domain. With the development of Generative Pre-trained Transformers (GPTs), there came inspiration for a one-stop solution to sentiment analysis. In this…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 9 pages, 5 figures

  33. arXiv:2404.01700  [pdf, other]

    cs.CV

    MotionChain: Conversational Motion Controllers via Multimodal Prompts

    Authors: Biao Jiang, Xin Chen, Chi Zhang, Fukun Yin, Zhuoyuan Li, Gang YU, Jiayuan Fan

    Abstract: Recent advancements in language models have demonstrated their adeptness in conducting multi-turn dialogues and retaining conversational context. However, this proficiency remains largely unexplored in other multimodal generative models, particularly in human motion models. By integrating multi-turn conversations in controlling continuous virtual human movements, generative human motion models can…

    Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 14 pages, 4 figures

  34. arXiv:2403.17782  [pdf, other]

    cs.CV cs.GR

    GenesisTex: Adapting Image Denoising Diffusion to Texture Space

    Authors: Chenjian Gao, Boyan Jiang, Xinghui Li, Yingpeng Zhang, Qian Yu

    Abstract: We present GenesisTex, a novel method for synthesizing textures for 3D geometries from text descriptions. GenesisTex adapts the pretrained image diffusion model to texture space by texture space sampling. Specifically, we maintain a latent texture map for each viewpoint, which is updated with predicted noise on the rendering of the corresponding viewpoint. The sampled latent texture maps are then…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 12 pages, 10 figures

  35. arXiv:2403.14783  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MA

    Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering

    Authors: Bowen Jiang, Zhijun Zhuang, Shreyas S. Shivakumar, Dan Roth, Camillo J. Taylor

    Abstract: This work explores the zero-shot capabilities of foundation models in Visual Question Answering (VQA) tasks. We propose an adaptive multi-agent system, named Multi-Agent VQA, to overcome the limitations of foundation models in object detection and counting by using specialized agents as tools. Unlike existing approaches, our study focuses on the system's performance without fine-tuning it on speci…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: A full version of the paper will be released soon. The codes are available at https://github.com/bowen-upenn/Multi-Agent-VQA

  36. arXiv:2403.13663  [pdf, other]

    cs.CV

    T-Pixel2Mesh: Combining Global and Local Transformer for 3D Mesh Generation from a Single Image

    Authors: Shijie Zhang, Boyan Jiang, Keke He, Junwei Zhu, Ying Tai, Chengjie Wang, Yinda Zhang, Yanwei Fu

    Abstract: Pixel2Mesh (P2M) is a classical approach for reconstructing 3D shapes from a single color image through coarse-to-fine mesh deformation. Although P2M is capable of generating plausible global shapes, its Graph Convolution Network (GCN) often produces overly smooth results, causing the loss of fine-grained geometry details. Moreover, P2M generates non-credible features for occluded regions and stru…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Received by ICASSP 2024

  37. arXiv:2403.11699  [pdf, other]

    eess.IV cs.CV

    A Spatial-Temporal Progressive Fusion Network for Breast Lesion Segmentation in Ultrasound Videos

    Authors: Zhengzheng Tu, Zigang Zhu, Yayang Duan, Bo Jiang, Qishun Wang, Chaoxue Zhang

    Abstract: Ultrasound video-based breast lesion segmentation provides valuable assistance in early breast lesion detection and treatment. However, existing works mainly focus on lesion segmentation based on ultrasound breast images which usually cannot be adapted well to obtain desirable results on ultrasound videos. The main challenge for ultrasound video-based breast lesion segmentation is how to exploi…

    Submitted 18 March, 2024; originally announced March 2024.

  38. arXiv:2403.11445  [pdf, other]

    cs.CR cs.DS eess.SP

    Budget Recycling Differential Privacy

    Authors: Bo Jiang, Jian Du, Sagar Sharma, Qiang Yan

    Abstract: Differential Privacy (DP) mechanisms usually force reduction in data utility by producing "out-of-bound" noisy results for a tight privacy budget. We introduce the Budget Recycling Differential Privacy (BR-DP) framework, designed to provide soft-bounded noisy outputs for a broad range of existing DP mechanisms. By "soft-bounded," we refer to the mechanism's ability to release most outputs within…

    Submitted 12 July, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  39. arXiv:2403.11158  [pdf, other]

    cs.SE

    An Empirical Study on JIT Defect Prediction Based on BERT-style Model

    Authors: Yuxiang Guo, Xiaopeng Gao, Bo Jiang

    Abstract: Previous works on Just-In-Time (JIT) defect prediction tasks have primarily applied pre-trained models directly, neglecting the configurations of their fine-tuning process. In this study, we perform a systematic empirical study to understand the impact of the settings of the fine-tuning process on BERT-style pre-trained model for JIT defect prediction. Specifically, we explore the impact of differ…

    Submitted 17 March, 2024; originally announced March 2024.

  40. arXiv:2403.10298  [pdf, other]

    cs.CV

    Context-Semantic Quality Awareness Network for Fine-Grained Visual Categorization

    Authors: Qin Xu, Sitong Li, Jiahui Wang, Bo Jiang, Jinhui Tang

    Abstract: Exploring and mining subtle yet distinctive features between sub-categories with similar appearances is crucial for fine-grained visual categorization (FGVC). However, less effort has been devoted to assessing the quality of extracted visual representations. Intuitively, the network may struggle to capture discriminative features from low-quality samples, which leads to a significant decline in FG…

    Submitted 15 March, 2024; originally announced March 2024.

  41. arXiv:2403.06487  [pdf, other]

    cs.CL cs.SD eess.AS

    Multilingual Turn-taking Prediction Using Voice Activity Projection

    Authors: Koji Inoue, Bing'er Jiang, Erik Ekstedt, Tatsuya Kawahara, Gabriel Skantze

    Abstract: This paper investigates the application of voice activity projection (VAP), a predictive turn-taking model for spoken dialogue, on multilingual data, encompassing English, Mandarin, and Japanese. The VAP model continuously predicts the upcoming voice activities of participants in dyadic dialogue, leveraging a cross-attention Transformer to capture the dynamic interplay between participants. The re…

    Submitted 14 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted for presentation at The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) and represents the author's version of the work

  42. arXiv:2403.05839  [pdf, other]

    cs.CV cs.AI cs.NE

    Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline

    Authors: Xiao Wang, Ju Huang, Shiao Wang, Chuanming Tang, Bo Jiang, Yonghong Tian, Jin Tang, Bin Luo

    Abstract: Current event-/frame-event based trackers undergo evaluation on short-term tracking datasets, however, the tracking of real-world scenarios involves long-term tracking, and the performance of existing tracking algorithms in these scenarios remains unclear. In this paper, we first propose a new long-term and large-scale frame-event single object tracking dataset, termed FELT. It contains 742 videos…

    Submitted 3 April, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: In Peer Review

  43. arXiv:2403.04009  [pdf, other]

    cs.SI cs.CL cs.CY physics.soc-ph

    Media Bias Matters: Understanding the Impact of Politically Biased News on Vaccine Attitudes in Social Media

    Authors: Bohan Jiang, Lu Cheng, Zhen Tan, Ruocheng Guo, Huan Liu

    Abstract: News media has been utilized as a political tool to stray from facts, presenting biased claims without evidence. Amid the COVID-19 pandemic, politically biased news (PBN) has significantly undermined public trust in vaccines, despite strong medical evidence supporting their efficacy. In this paper, we analyze: (i) how inherent vaccine stances subtly influence individuals' selection of news sources…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 9 pages, 6 figures, 3 tables

  44. arXiv:2403.00808  [pdf, other]

    cs.CL cs.AI

    IPED: An Implicit Perspective for Relational Triple Extraction based on Diffusion Model

    Authors: Jianli Zhao, Changhao Xu, Bin Jiang

    Abstract: Relational triple extraction is a fundamental task in the field of information extraction, and a promising framework based on table filling has recently gained attention as a potential baseline for entity relation extraction. However, inherent shortcomings such as redundant information and incomplete triple recognition remain problematic. To address these challenges, we propose an Implicit Perspec…

    Submitted 24 February, 2024; originally announced March 2024.

    Comments: 12 pages, 4 figures, committed to NAACL 2024

  45. arXiv:2402.15231  [pdf, other]

    cs.LG cs.CV

    Which Model to Transfer? A Survey on Transferability Estimation

    Authors: Yuhe Ding, Bo Jiang, Aijing Yu, Aihua Zheng, Jian Liang

    Abstract: Transfer learning methods endeavor to leverage relevant knowledge from existing source pre-trained models or datasets to solve downstream target tasks. With the increase in the scale and quantity of available pre-trained models nowadays, it becomes critical to assess in advance whether they are suitable for a specific target task. Model transferability estimation is an emerging and growing area of…

    Submitted 23 February, 2024; originally announced February 2024.

  46. arXiv:2402.13446  [pdf, other]

    cs.CL

    Large Language Models for Data Annotation: A Survey

    Authors: Zhen Tan, Dawei Li, Song Wang, Alimohammad Beigi, Bohan Jiang, Amrita Bhattacharjee, Mansooreh Karami, Jundong Li, Lu Cheng, Huan Liu

    Abstract: Data annotation generally refers to the labeling or generating of raw data with relevant information, which could be used for improving the efficacy of machine learning models. The process, however, is labor-intensive and costly. The emergence of advanced Large Language Models (LLMs), exemplified by GPT-4, presents an unprecedented opportunity to automate the complicated process of data annotation…

    Submitted 23 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  47. arXiv:2402.13243  [pdf, other]

    cs.CV cs.RO

    VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning

    Authors: Shaoyu Chen, Bo Jiang, Hao Gao, Bencheng Liao, Qing Xu, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang

    Abstract: Learning a human-like driving policy from large-scale driving demonstrations is promising, but the uncertainty and non-deterministic nature of planning make it challenging. In this work, to cope with the uncertainty problem, we propose VADv2, an end-to-end driving model based on probabilistic planning. VADv2 takes multi-view image sequences as input in a streaming manner, transforms sensor data in…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Project Page: https://hgao-cv.github.io/VADv2

  48. arXiv:2402.01723  [pdf, other]

    cs.CL cs.AI

    An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios

    Authors: Zongjie Li, Wenying Qiu, Pingchuan Ma, Yichen Li, You Li, Sijia He, Baozheng Jiang, Shuai Wang, Weixi Gu

    Abstract: Recent years have witnessed the rapid development of large language models (LLMs) in various domains. To better serve the large number of Chinese users, many commercial vendors in China have adopted localization strategies, training and providing local LLMs specifically customized for Chinese users. Furthermore, looking ahead, one of the key future applications of LLMs will be practical deployment…

    Submitted 26 January, 2024; originally announced February 2024.

  49. arXiv:2402.01666  [pdf, other]

    cs.CY

    A Comprehensive Exploration of Personalized Learning in Smart Education: From Student Modeling to Personalized Recommendations

    Authors: Siyu Wu, Yang Cao, Jiajun Cui, Runze Li, Hong Qian, Bo Jiang, Wei Zhang

    Abstract: With the development of artificial intelligence, personalized learning has attracted much attention as an integral part of intelligent education. China, the United States, the European Union, and others have put forward the importance of personalized learning in recent years, emphasizing the realization of the organic combination of large-scale education and personalized training. The development…

    Submitted 15 January, 2024; originally announced February 2024.

    Comments: 82 pages, 5 figures

    MSC Class: 68-02

    ACM Class: A.1

  50. VirtuWander: Enhancing Multi-modal Interaction for Virtual Tour Guidance through Large Language Models

    Authors: Zhan Wang, Lin-Ping Yuan, Liangwei Wang, Bingchuan Jiang, Wei Zeng

    Abstract: Tour guidance in virtual museums encourages multi-modal interactions to boost user experiences, concerning engagement, immersion, and spatial awareness. Nevertheless, achieving the goal is challenging due to the complexity of comprehending diverse user needs and accommodating personalized user preferences. Informed by a formative study that characterizes guidance-seeking contexts, we establish a m…

    Submitted 23 January, 2024; v1 submitted 22 January, 2024; originally announced January 2024.