Showing 1–50 of 2,363 results for author: Chen, L

  1. arXiv:2408.16756  [pdf, other]

    cs.CL

    How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models

    Authors: Jiyue Jiang, Liheng Chen, Pengan Chen, Sheng Wang, Qinghang Bao, Lingpeng Kong, Yu Li, Chuan Wu

    Abstract: The rapid evolution of large language models (LLMs) has transformed the competitive landscape in natural language processing (NLP), particularly for English and other data-rich languages. However, underrepresented languages like Cantonese, spoken by over 85 million people, face significant development gaps, which is particularly concerning given the economic significance of the Guangdong-Hong Kong… ▽ More

    Submitted 29 August, 2024; originally announced August 2024.

  2. arXiv:2408.16498  [pdf, other]

    cs.SE

    A Survey on Evaluating Large Language Models in Code Generation Tasks

    Authors: Liguo Chen, Qi Guo, Hongrui Jia, Zhengran Zeng, Xin Wang, Yijiang Xu, Jian Wu, Yidong Wang, Qing Gao, Jindong Wang, Wei Ye, Shikun Zhang

    Abstract: This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks. With the rapid growth in demand for automated software development, LLMs have demonstrated significant potential in the field of code generation. The paper begins by reviewing the historical development of LLMs and their applicatio… ▽ More

    Submitted 29 August, 2024; originally announced August 2024.

  3. arXiv:2408.16420  [pdf, other]

    cs.RO

    Time-Optimized Trajectory Planning for Non-Prehensile Object Transportation in 3D

    Authors: Lingyun Chen, Haoyu Yu, Abdeldjallil Naceri, Abdalla Swikir, Sami Haddadin

    Abstract: Non-prehensile object transportation offers a way to enhance robotic performance in object manipulation tasks, especially with unstable objects. Effective trajectory planning requires simultaneous consideration of robot motion constraints and object stability. Here, we introduce a physical model for object stability and propose a novel trajectory planning approach for non-prehensile transportation… ▽ More

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted to the European Robotic Forum (ERF) 2024

  4. arXiv:2408.16266  [pdf, other]

    cs.CV

    Improving Diffusion-based Data Augmentation with Inversion Spherical Interpolation

    Authors: Yanghao Wang, Long Chen

    Abstract: Data Augmentation (DA), i.e., synthesizing faithful and diverse samples to expand the original training set, is a prevalent and effective strategy to improve various visual recognition tasks. With the powerful image generation ability, diffusion-based DA has shown strong performance gains on different benchmarks. In this paper, we analyze today's diffusion-based DA methods, and argue that they cann… ▽ More

    Submitted 29 August, 2024; originally announced August 2024.

  5. arXiv:2408.15980  [pdf, other]

    cs.RO cs.AI

    In-Context Imitation Learning via Next-Token Prediction

    Authors: Letian Fu, Huang Huang, Gaurav Datta, Lawrence Yunliang Chen, William Chung-Ho Panitch, Fangchen Liu, Hui Li, Ken Goldberg

    Abstract: We explore how to enhance next-token prediction models to perform in-context imitation learning on a real robot, where the robot executes new tasks by interpreting contextual information provided during the input phase, without updating its underlying policy parameters. We propose In-Context Robot Transformer (ICRT), a causal transformer that performs autoregressive prediction on sensorimotor traj… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

  6. arXiv:2408.15881  [pdf, other]

    cs.CV

    LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

    Authors: Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Guanghao Zhang, Haonan Shi, Long Chen, Tao Zhong, Wanggui He, Siming Fu, Haoyuan Li, Bolin Li, Zhelun Yu, Si Liu, Hongsheng Li, Hao Jiang

    Abstract: We introduce LLaVA-MoD, a novel framework designed to enable the efficient training of small-scale Multimodal Language Models (s-MLLM) by distilling knowledge from large-scale MLLM (l-MLLM). Our approach tackles two fundamental challenges in MLLM distillation. First, we optimize the network structure of s-MLLM by integrating a sparse Mixture of Experts (MoE) architecture into the language model, s… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

  7. arXiv:2408.15657  [pdf, other]

    cs.CV cs.RO

    TeFF: Tracking-enhanced Forgetting-free Few-shot 3D LiDAR Semantic Segmentation

    Authors: Junbao Zhou, Jilin Mei, Pengze Wu, Liang Chen, Fangzhou Zhao, Xijun Zhao, Yu Hu

    Abstract: In autonomous driving, 3D LiDAR plays a crucial role in understanding the vehicle's surroundings. However, newly emerged, unannotated objects present a few-shot learning problem for semantic segmentation. This paper addresses the limitations of current few-shot semantic segmentation by exploiting the temporal continuity of LiDAR data. Employing a tracking model to generate pseudo-ground-truths… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

  8. arXiv:2408.14438  [pdf, other]

    cs.CL cs.CY

    Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study

    Authors: Liuchang Xu, Shuo Zhao, Qingming Lin, Luyao Chen, Qianqian Luo, Sensen Wu, Xinyue Ye, Hailin Feng, Zhenhong Du

    Abstract: The advent of large language models such as ChatGPT, Gemini, and others has underscored the importance of evaluating their diverse capabilities, ranging from natural language understanding to code generation. However, their performance on spatial tasks has not been comprehensively assessed. This study addresses this gap by introducing a novel multi-task spatial evaluation dataset, designed to syst… ▽ More

    Submitted 28 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  9. arXiv:2408.14211  [pdf, other]

    cs.CV cs.AI

    MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

    Authors: Xu He, Xiaoyu Li, Di Kang, Jiangnan Ye, Chaopeng Zhang, Liyang Chen, Xiangjun Gao, Han Zhang, Zhiyong Wu, Haolin Zhuang

    Abstract: Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data or 3D inconsistencies due to a lack of comprehensive multi-view knowledge. In this paper, we introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image. At its core, we leverage a pre-trained 2D d… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Project Page: https://thuhcsi.github.io/MagicMan

  10. arXiv:2408.14173  [pdf, other]

    cs.CV

    BackFlip: The Impact of Local and Global Data Augmentations on Artistic Image Aesthetic Assessment

    Authors: Ombretta Strafforello, Gonzalo Muradas Odriozola, Fatemeh Behrad, Li-Wei Chen, Anne-Sofie Maerten, Derya Soydaner, Johan Wagemans

    Abstract: Assessing the aesthetic quality of artistic images presents unique challenges due to the subjective nature of aesthetics and the complex visual characteristics inherent to artworks. Basic data augmentation techniques commonly applied to natural images in computer vision may not be suitable for art images in aesthetic evaluation tasks, as they can change the composition of the art images. In this p… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Published at the VISART VII workshop at ECCV 2024. Ombretta Strafforello, Gonzalo Muradas Odriozola, Fatemeh Behrad, Li-Wei Chen, Anne-Sofie Maerten and Derya Soydaner contributed equally to this work

  11. arXiv:2408.13044  [pdf, other]

    cs.RO

    Identification and validation of the dynamic model of a tendon-driven anthropomorphic finger

    Authors: Junnan Li, Lingyun Chen, Johannes Ringwald, Edmundo Pozo Fortunic, Amartya Ganguly, Sami Haddadin

    Abstract: This study addresses the absence of an identification framework to quantify a comprehensive dynamic model of human and anthropomorphic tendon-driven fingers, which is necessary to investigate the physiological properties of human fingers and improve the control of robotic hands. First, a generalized dynamic model was formulated, which takes into account the inherent properties of such a mechanical… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 8 pages, 9 figures

  12. arXiv:2408.12981  [pdf, other]

    cs.AI

    QD-VMR: Query Debiasing with Contextual Understanding Enhancement for Video Moment Retrieval

    Authors: Chenghua Gao, Min Li, Jianshuo Liu, Junxing Ren, Lin Chen, Haoyu Liu, Bo Meng, Jitao Fu, Wenwen Su

    Abstract: Video Moment Retrieval (VMR) aims to retrieve relevant moments of an untrimmed video corresponding to the query. While cross-modal interaction approaches have shown progress in filtering out query-irrelevant information in videos, they assume the precise alignment between the query semantics and the corresponding video moments, potentially overlooking the misunderstanding of the natural language s… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 9 pages, 4 figures, 4 tables

  13. arXiv:2408.12879  [pdf, other]

    cs.CV cs.AI

    Frequency-aware Feature Fusion for Dense Image Prediction

    Authors: Linwei Chen, Ying Fu, Lin Gu, Chenggang Yan, Tatsuya Harada, Gao Huang

    Abstract: Dense image prediction tasks demand features with strong category information and precise spatial boundary details at high resolution. To achieve this, modern hierarchical models often utilize feature fusion, directly adding upsampled coarse features from deep layers and high-resolution features from lower levels. In this paper, we observe rapid variations in fused feature values within objects, r… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by TPAMI (2024)

  14. arXiv:2408.12857  [pdf, other]

    cs.LG cs.AI cs.CL

    Memory-Efficient LLM Training with Online Subspace Descent

    Authors: Kaizhao Liang, Bo Liu, Lizhang Chen, Qiang Liu

    Abstract: Recently, a wide range of memory-efficient LLM training algorithms have gained substantial popularity. These methods leverage the low-rank structure of gradients to project optimizer states into a subspace using a projection matrix found by singular value decomposition (SVD). However, convergence of these algorithms is highly dependent on the update rules of their projection matrix. In this work, we… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Code is available at https://github.com/kyleliang919/Online-Subspace-Descent

  15. arXiv:2408.12527  [pdf, other]

    cs.RO cs.CV

    UMAD: University of Macau Anomaly Detection Benchmark Dataset

    Authors: Dong Li, Lineng Chen, Cheng-Zhong Xu, Hui Kong

    Abstract: Anomaly detection is critical in surveillance systems and patrol robots by identifying anomalous regions in images for early warning. Depending on whether reference data are utilized, anomaly detection can be categorized into anomaly detection with reference and anomaly detection without reference. Currently, anomaly detection without reference, which is closely related to out-of-distribution (OoD… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted by the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024, project code at https://github.com/IMRL/UMAD

  16. arXiv:2408.12526  [pdf, other]

    cs.LG

    Exploiting Student Parallelism for Low-latency GPU Inference of BERT-like Models in Online Services

    Authors: Weiyan Wang, Yilun Jin, Yiming Zhang, Victor Junqiu Wei, Han Tian, Li Chen, Kai Chen

    Abstract: Due to high accuracy, BERT-like models have been widely adopted by discriminative text mining and web searching. However, large BERT-like models suffer from inefficient online inference, as they face the following two problems on GPUs. First, they rely on the large model depth to achieve high accuracy, which linearly increases the sequential computation on GPUs. Second, stochastic and dynamic onli… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  17. arXiv:2408.11824   

    cs.HC cs.AI

    AppAgent v2: Advanced Agent for Flexible Mobile Interactions

    Authors: Yanda Li, Chi Zhang, Wanqi Yang, Bin Fu, Pei Cheng, Xin Chen, Ling Chen, Yunchao Wei

    Abstract: With the advancement of Multimodal Large Language Models (MLLM), LLM-driven visual agents are increasingly impacting software interfaces, particularly those with graphical user interfaces. This work introduces a novel LLM-based multimodal agent framework for mobile devices. This framework, capable of navigating mobile devices, emulates human-like interactions. Our agent constructs a flexible actio… ▽ More

    Submitted 23 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: Pre-print version, some content needs to be supplemented

  18. arXiv:2408.11048  [pdf, other]

    cs.RO cs.AI cs.LG

    RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands

    Authors: Yi Zhao, Le Chen, Jan Schneider, Quankai Gao, Juho Kannala, Bernhard Schölkopf, Joni Pajarinen, Dieter Büchler

    Abstract: It has been a long-standing research goal to endow robot hands with human-level dexterity. Bi-manual robot piano playing constitutes a task that combines challenges from dynamic tasks, such as generating fast while precise motions, with slower but contact-rich manipulation problems. Although reinforcement learning based approaches have shown promising results in single-task performance, these meth… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Project Website: https://rp1m.github.io/

  19. arXiv:2408.10198  [pdf, other]

    cs.CV cs.GR

    MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

    Authors: Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Linghao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xiaoshuai Zhang, Isabella Liu, Hongzhi Wu, Hao Su

    Abstract: Open-world 3D reconstruction models have recently garnered significant attention. However, without sufficient 3D inductive bias, existing methods typically entail expensive training costs and struggle to extract high-quality 3D meshes. In this work, we introduce MeshFormer, a sparse-view reconstruction model that explicitly leverages 3D native structure, input guidance, and training supervision. S… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 20 pages, 9 figures

  20. arXiv:2408.10195  [pdf, other]

    cs.CV cs.AI cs.GR

    SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views

    Authors: Chao Xu, Ang Li, Linghao Chen, Yulin Liu, Ruoxi Shi, Hao Su, Minghua Liu

    Abstract: Open-world 3D generation has recently attracted considerable attention. While many single-image-to-3D methods have yielded visually appealing outcomes, they often lack sufficient controllability and tend to produce hallucinated regions that may not align with users' expectations. In this paper, we explore an important scenario in which the input consists of one or a few unposed 2D images of a sing… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  21. arXiv:2408.09858  [pdf, ps, other]

    cs.LG cs.AR

    ShortCircuit: AlphaZero-Driven Circuit Design

    Authors: Dimitrios Tsaras, Antoine Grosnit, Lei Chen, Zhiyao Xie, Haitham Bou-Ammar, Mingxuan Yuan

    Abstract: Chip design relies heavily on generating Boolean circuits, such as AND-Inverter Graphs (AIGs), from functional descriptions like truth tables. While recent advances in deep learning have aimed to accelerate circuit design, these efforts have mostly focused on tasks other than synthesis, and traditional heuristic methods have plateaued. In this paper, we introduce ShortCircuit, a novel transformer-… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  22. arXiv:2408.08295  [pdf, other]

    cs.CV cs.AI cs.LG

    SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training

    Authors: Gengwei Zhang, Liyuan Wang, Guoliang Kang, Ling Chen, Yunchao Wei

    Abstract: In recent years, continual learning with pre-training (CLPT) has received widespread interest, instead of its traditional focus of training from scratch. The use of strong pre-trained models (PTMs) can greatly facilitate knowledge transfer and alleviate catastrophic forgetting, but also suffers from progressive overfitting of pre-trained knowledge into specific downstream tasks. A majority of curr… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: This paper is an extension of our ICCV 23 paper (arXiv:2303.05118)

  23. arXiv:2408.08243  [pdf, other]

    quant-ph cs.NI

    From Entanglement Purification Scheduling to Fidelity-constrained Multi-Flow Routing

    Authors: Ziyue Jia, Lin Chen

    Abstract: Recently emerged as a disruptive networking paradigm, quantum networks rely on the mysterious quantum entanglement to teleport qubits without physically transferring quantum particles. However, the state of quantum systems is extremely fragile due to environment noise. A promising technique to combat against quantum decoherence is entanglement purification. To fully exploit its benefit, two fundam… ▽ More

    Submitted 22 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

    Comments: 15 pages, 12 figures

  24. arXiv:2408.08078  [pdf, other]

    cs.CV cs.AI

    Treat Stillness with Movement: Remote Sensing Change Detection via Coarse-grained Temporal Foregrounds Mining

    Authors: Xixi Wang, Zitian Wang, Jingtao Jiang, Lan Chen, Xiao Wang, Bo Jiang

    Abstract: Current works focus on addressing the remote sensing change detection task using bi-temporal images. Although good performance can be achieved, few of them consider motion cues, which may also be vital. In this work, we revisit the widely adopted bi-temporal images-based framework and propose a novel Coarse-grained Temporal Mining Augmented (CTMA) framework. To be specific, given th… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: In Peer Review

  25. arXiv:2408.07999  [pdf, other]

    cs.CV

    Co-Fix3D: Enhancing 3D Object Detection with Collaborative Refinement

    Authors: Wenxuan Li, Qin Zou, Chi Chen, Bo Du, Long Chen

    Abstract: In the realm of autonomous driving, accurately detecting occluded or distant objects, referred to as weak positive samples, presents significant challenges. These challenges predominantly arise during query initialization, where an over-reliance on heatmap confidence often results in a high rate of false positives, consequently masking weaker detections and impairing system performance. To alleviate… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

  26. arXiv:2408.06891  [pdf]

    cs.AI cs.CE cs.CV cs.LG

    Automatic Feature Recognition and Dimensional Attributes Extraction From CAD Models for Hybrid Additive-Subtractive Manufacturing

    Authors: Muhammad Tayyab Khan, Wenhe Feng, Lequn Chen, Ye Han Ng, Nicholas Yew Jin Tan, Seung Ki Moon

    Abstract: The integration of Computer-Aided Design (CAD), Computer-Aided Process Planning (CAPP), and Computer-Aided Manufacturing (CAM) plays a crucial role in modern manufacturing, facilitating seamless transitions from digital designs to physical products. However, a significant challenge within this integration is the Automatic Feature Recognition (AFR) of CAD models, especially in the context of hybrid… ▽ More

    Submitted 14 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 10 pages, 12 figures. This paper has been accepted for presentation at the ASME IDETC-CIE 2024 conference

  27. arXiv:2408.06743  [pdf, other]

    cs.LG

    Class-aware and Augmentation-free Contrastive Learning from Label Proportion

    Authors: Jialiang Wang, Ning Zhang, Shimin Di, Ruidong Wang, Lei Chen

    Abstract: Learning from Label Proportion (LLP) is a weakly supervised learning scenario in which training data is organized into predefined bags of instances, disclosing only the class label proportions per bag. This paradigm is essential for user modeling and personalization, where user privacy is paramount, offering insights into user preferences without revealing individual data. LLP faces a unique diffi… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  28. arXiv:2408.06717  [pdf, other]

    cs.LG cs.AI

    Computation-friendly Graph Neural Network Design by Accumulating Knowledge on Large Language Models

    Authors: Jialiang Wang, Shimin Di, Hanmo Liu, Zhili Wang, Jiachuan Wang, Lei Chen, Xiaofang Zhou

    Abstract: Graph Neural Networks (GNNs), like other neural networks, have shown remarkable success but are hampered by the complexity of their architecture designs, which heavily depend on specific data and tasks. Traditionally, designing proper architectures involves trial and error, which requires intensive manual effort to optimize various components. To reduce human workload, researchers try to develop a… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  29. arXiv:2408.06568  [pdf, other]

    cs.SE

    MORCoRA: Multi-Objective Refactoring Recommendation Considering Review Availability

    Authors: Lei Chen, Shinpei Hayashi

    Abstract: Background: Search-based refactoring involves searching for a sequence of refactorings to achieve specific objectives. Although a typical objective is improving code quality, a different perspective is also required; the searched sequence must undergo review before being applied and may not be applied if the review fails or is postponed due to no proper reviewers. Aim: Therefore, it is essential t… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Preprint of an article accepted to be published in International Journal of Software Engineering and Knowledge Engineering, (C) 2024 World Scientific Publishing Company, https://www.worldscientific.com/worldscinet/ijseke

  30. arXiv:2408.05897  [pdf, other]

    cs.HC

    TRIZ-GPT: An LLM-augmented method for problem-solving

    Authors: Liuqing Chen, Yaxuan Song, Shixian Ding, Lingyun Sun, Peter Childs, Haoyu Zuo

    Abstract: TRIZ, the Theory of Inventive Problem Solving, is derived from a comprehensive analysis of patents across various domains, offering a framework and practical tools for problem-solving. Despite its potential to foster innovative solutions, the complexity and abstractness of TRIZ methodology often make its acquisition and application challenging. This often requires users to have a deep understandin… ▽ More

    Submitted 11 August, 2024; originally announced August 2024.

  31. arXiv:2408.05778  [pdf, other]

    cs.LG math.OC

    Pareto Front Shape-Agnostic Pareto Set Learning in Multi-Objective Optimization

    Authors: Rongguang Ye, Longcan Chen, Wei-Bin Kou, Jinyuan Zhang, Hisao Ishibuchi

    Abstract: Pareto set learning (PSL) is an emerging approach for acquiring the complete Pareto set of a multi-objective optimization problem. Existing methods primarily rely on the mapping of preference vectors in the objective space to Pareto optimal solutions in the decision space. However, the sampling of preference vectors theoretically requires prior knowledge of the Pareto front shape to ensure high pe… ▽ More

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 7 pages

    Journal ref: IEEE International Conference on Systems, Man, and Cybernetics (IEEE SMC 2024)

  32. arXiv:2408.05699  [pdf, other]

    cs.CV

    MacFormer: Semantic Segmentation with Fine Object Boundaries

    Authors: Guoan Xu, Wenfeng Huang, Tao Wu, Ligeng Chen, Wenjing Jia, Guangwei Gao, Xiatian Zhu, Stuart Perry

    Abstract: Semantic segmentation involves assigning a specific category to each pixel in an image. While Vision Transformer-based models have made significant progress, current semantic segmentation methods often struggle with precise predictions in localized areas like object boundaries. To tackle this challenge, we introduce a new semantic segmentation architecture, "MacFormer", which features two key co… ▽ More

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 13 pages, 7 figures, submitted to TIP

  33. arXiv:2408.05584  [pdf]

    cs.LG stat.ME

    Dynamical causality under invisible confounders

    Authors: Jinling Yan, Shao-Wu Zhang, Chihao Zhang, Weitian Huang, Jifan Shi, Luonan Chen

    Abstract: Causality inference is prone to spurious causal interactions, due to the substantial confounders in a complex system. While many existing methods based on the statistical methods or dynamical methods attempt to address misidentification challenges, there remains a notable lack of effective methods to infer causality, in particular in the presence of invisible/unobservable confounders. As a result,… ▽ More

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 23 pages, 5 figures

  34. arXiv:2408.05307  [pdf]

    cs.CE cs.LG

    Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing

    Authors: Jiarui Xie, Mutahar Safdar, Lequn Chen, Seung Ki Moon, Yaoyao Fiona Zhao

    Abstract: Various machine learning (ML)-based in-situ monitoring systems have been developed to detect laser additive manufacturing (LAM) process anomalies and defects. Multimodal fusion can improve in-situ monitoring performance by acquiring and integrating data from multiple modalities, including visual and audio data. However, multimodal fusion employs multiple sensors of different types, which leads to… ▽ More

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 36 pages, 12 figures, 6 tables

  35. arXiv:2408.03957  [pdf, other]

    cs.NI cs.IT cs.LG eess.SP

    GNN-Based Joint Channel and Power Allocation in Heterogeneous Wireless Networks

    Authors: Lili Chen, Jingge Zhu, Jamie Evans

    Abstract: The optimal allocation of channels and power resources plays a crucial role in ensuring minimal interference, maximal data rates, and efficient energy utilisation. As a successful approach for tackling resource management problems in wireless networks, Graph Neural Networks (GNNs) have attracted a lot of attention. This article proposes a GNN-based algorithm to address the joint resource allocatio… ▽ More

    Submitted 28 July, 2024; originally announced August 2024.

  36. arXiv:2408.03771  [pdf]

    cs.CV

    Methodological Explainability Evaluation of an Interpretable Deep Learning Model for Post-Hepatectomy Liver Failure Prediction Incorporating Counterfactual Explanations and Layerwise Relevance Propagation: A Prospective In Silico Trial

    Authors: Xian Zhong, Zohaib Salahuddin, Yi Chen, Henry C Woodruff, Haiyi Long, Jianyun Peng, Nuwan Udawatte, Roberto Casale, Ayoub Mokhtari, Xiaoer Zhang, Jiayao Huang, Qingyu Wu, Li Tan, Lili Chen, Dongming Li, Xiaoyan Xie, Manxia Lin, Philippe Lambin

    Abstract: Artificial intelligence (AI)-based decision support systems have demonstrated value in predicting post-hepatectomy liver failure (PHLF) in hepatocellular carcinoma (HCC). However, they often lack transparency, and the impact of model explanations on clinicians' decisions has not been thoroughly evaluated. Building on prior research, we developed a variational autoencoder-multilayer perceptron (VAE… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

  37. arXiv:2408.03394  [pdf, other]

    cs.RO

    Faster Model Predictive Control via Self-Supervised Initialization Learning

    Authors: Zhaoxin Li, Letian Chen, Rohan Paleja, Subramanya Nageshrao, Matthew Gombolay

    Abstract: Optimization for robot control tasks, spanning various methodologies, includes Model Predictive Control (MPC). However, the complexity of the system, such as non-convex and non-differentiable cost functions and prolonged planning horizons often drastically increases the computation time, limiting MPC's real-world applicability. Prior works in speeding up the optimization have limitations on solvin… ▽ More

    Submitted 6 August, 2024; originally announced August 2024.

  38. arXiv:2408.02999  [pdf, other]

    cs.FL cs.AI

    LLMs as Probabilistic Minimally Adequate Teachers for DFA Learning

    Authors: Lekai Chen, Ashutosh Trivedi, Alvaro Velasquez

    Abstract: The emergence of intelligence in large language models (LLMs) has inspired investigations into their integration into automata learning. This paper introduces the probabilistic Minimally Adequate Teacher (pMAT) formulation, which leverages a probabilistic oracle that could give persistent errors randomly during answering the membership queries for deterministic finite automata (DFA) learning. Give… ▽ More

    Submitted 6 August, 2024; originally announced August 2024.

  39. arXiv:2408.02293  [pdf, other]

    cs.RO eess.SY

    OPENGRASP-LITE Version 1.0: A Tactile Artificial Hand with a Compliant Linkage Mechanism

    Authors: Sonja Groß, Michael Ratzel, Edgar Welte, Diego Hidalgo-Carvajal, Lingyun Chen, Edmundo Pozo Fortunić, Amartya Ganguly, Abdalla Swikir, Sami Haddadin

    Abstract: Recent research has seen notable progress in the development of linkage-based artificial hands. While previous designs have focused on adaptive grasping, dexterity and biomimetic artificial skin, only a few systems have proposed a lightweight, accessible solution integrating tactile sensing with a compliant linkage-based mechanism. This paper introduces OPENGRASP LITE, an open-source, highly integ… ▽ More

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted at IEEE/RSJ International Conference on Intelligent Robots and Systems, 14-18 October 2024

  40. arXiv:2408.01976  [pdf, other]

    cs.CV

    Single-Point Supervised High-Resolution Dynamic Network for Infrared Small Target Detection

    Authors: Jing Wu, Rixiang Ni, Feng Huang, Zhaobing Qiu, Liqiong Chen, Changhai Luo, Yunxiang Li, Youli Li

    Abstract: Infrared small target detection (IRSTD) tasks are extremely challenging for two main reasons: 1) it is difficult to obtain accurate labelling information that is critical to existing methods, and 2) infrared (IR) small target information is easily lost in deep networks. To address these issues, we propose a single-point supervised high-resolution dynamic network (SSHD-Net). In contrast to existing… ▽ More

    Submitted 7 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

  41. arXiv:2408.01120  [pdf, other]

    cs.CV

    An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding

    Authors: Wei Chen, Long Chen, Yu Wu

    Abstract: Most advanced visual grounding methods rely on Transformers for visual-linguistic feature fusion. However, these Transformer-based approaches encounter a significant drawback: the computational costs escalate quadratically due to the self-attention mechanism in the Transformer Encoder, particularly when dealing with high-resolution images or long context sentences. This quadratic increase in compu… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 21 pages, 10 figures, 9 tables. Accepted to ECCV 2024

  42. arXiv:2408.01112  [pdf, other]

    cs.MA

    Agentic LLM Workflows for Generating Patient-Friendly Medical Reports

    Authors: Malavikha Sudarshan, Sophie Shih, Estella Yee, Alina Yang, John Zou, Cathy Chen, Quan Zhou, Leon Chen, Chinmay Singhal, George Shih

    Abstract: The application of Large Language Models (LLMs) in healthcare is expanding rapidly, with one potential use case being the translation of formal medical reports into patient-legible equivalents. Currently, LLM outputs often need to be edited and evaluated by a human to ensure both factual accuracy and comprehensibility, and this is true for the above use case. We aim to minimize this step by propos…

    Submitted 5 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: 12 pages, 7 figures

  43. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  44. arXiv:2407.20578  [pdf, ps, other]

    cs.CL cs.AI cs.CY

    Comparison of Large Language Models for Generating Contextually Relevant Questions

    Authors: Ivo Lodovico Molina, Valdemar Švábenský, Tsubasa Minematsu, Li Chen, Fumiya Okubo, Atsushi Shimada

    Abstract: This study explores the effectiveness of Large Language Models (LLMs) for Automatic Question Generation in educational settings. Three LLMs are compared in their ability to create questions from university slide text without fine-tuning. Questions were obtained in a two-step pipeline: first, answer phrases were extracted from slides using Llama 2-Chat 13B; then, the three models generated question…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Published in Springer ECTEL 2024 conference proceedings

    ACM Class: K.3

  45. arXiv:2407.20156  [pdf, other]

    cs.RO

    Autonomous and Teleoperation Control of a Drawing Robot Avatar

    Authors: Lingyun Chen, Abdeldjallil Naceri, Abdalla Swikir, Sandra Hirche, Sami Haddadin

    Abstract: A drawing robot avatar is a robotic system that allows for telepresence-based drawing, enabling users to remotely control a robotic arm and create drawings in real-time from a remote location. The proposed control framework aims to improve bimanual robot telepresence quality by reducing the user workload and required prior knowledge through the automation of secondary or auxiliary tasks. The intro…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted to ICRA 2024

  46. arXiv:2407.19826  [pdf, other]

    cs.RO

    Design and Control of a Novel Six-Degree-of-Freedom Hybrid Robotic Arm

    Authors: Yang Chen, Zhonghua Miao, Yuanyue Ge, Sen Lin, Liping Chen, Ya Xiong

    Abstract: Robotic arms are key components in fruit-harvesting robots. In agricultural settings, conventional serial or parallel robotic arms often fall short in meeting the demands for a large workspace, rapid movement, enhanced capability of obstacle avoidance and affordability. This study proposes a novel hybrid six-degree-of-freedom (DoF) robotic arm that combines the advantages of parallel and serial me…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by IROS 2024

  47. arXiv:2407.19546  [pdf, other]

    cs.CV

    XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training

    Authors: Biao Wu, Yutong Xie, Zeyu Zhang, Minh Hieu Phan, Qi Chen, Ling Chen, Qi Wu

    Abstract: Vision-and-language pretraining (VLP) in the medical field utilizes contrastive learning on image-text pairs to achieve effective transfer across tasks. Yet, current VLP approaches with the masked modelling strategy face two challenges when applied to the medical domain. First, current models struggle to accurately reconstruct key pathological features due to the scarcity of medical data. Second,…

    Submitted 2 August, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

  48. arXiv:2407.19493  [pdf, ps, other]

    cs.CV cs.AI cs.MM

    Official-NV: A News Video Dataset for Multimodal Fake News Detection

    Authors: Yihao Wang, Lizhi Chen, Zhong Qian, Peifeng Li

    Abstract: News media, especially video news media, have penetrated into every aspect of daily life, which also brings the risk of fake news. Therefore, multimodal fake news detection has recently received more attention. However, the number of fake news detection data sets for video modal is small, and these data sets are composed of unofficial videos uploaded by users, so there is too much useless data. To…

    Submitted 28 July, 2024; originally announced July 2024.

  49. arXiv:2407.19453  [pdf, other]

    cs.CV

    FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models

    Authors: Changgu Chen, Libing Yang, Xiaoyan Yang, Lianggangxu Chen, Gaoqi He, Changbo Wang, Yang Li

    Abstract: In recent years, large-scale pre-trained diffusion models have demonstrated their outstanding capabilities in image and video generation tasks. However, existing models tend to produce visual objects commonly found in the training dataset, which diverges from user input prompts. The underlying reason behind the inaccurate generated results lies in the model's difficulty in sampling from specific i…

    Submitted 28 July, 2024; originally announced July 2024.

  50. arXiv:2407.19376  [pdf, other]

    cs.CE

    CIDER: Counterfactual-Invariant Diffusion-based GNN Explainer for Causal Subgraph Inference

    Authors: Qibin Zhang, Chengshang Lyu, Lingxi Chen, Qiqi Jin, Luonan Chen

    Abstract: Inferring causal links or subgraphs corresponding to a specific phenotype or label based solely on measured data is an important yet challenging task, which is also different from inferring causal nodes. While Graph Neural Network (GNN) Explainers have shown potential in subgraph identification, existing methods with GNN often offer associative rather than causal insights. This lack of transparenc…

    Submitted 27 July, 2024; originally announced July 2024.