Showing 1–50 of 1,337 results for author: Xu, W

Searching in archive cs.
  1. arXiv:2408.17175  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

    Authors: Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo, Wei Xue

    Abstract: Recent advancements in audio generation have been significantly propelled by the capabilities of Large Language Models (LLMs). The existing research on audio LLM has primarily focused on enhancing the architecture and scale of audio language models, as well as leveraging larger datasets, and generally, acoustic codecs, such as EnCodec, are used for audio tokenization. However, these codecs were or…

    Submitted 30 August, 2024; originally announced August 2024.

  2. arXiv:2408.15488  [pdf, other]

    cs.CL

    Legilimens: Practical and Unified Content Moderation for Large Language Model Services

    Authors: Jialin Wu, Jiangyi Deng, Shengyuan Pang, Yanjiao Chen, Jiayang Xu, Xinfeng Li, Wenyuan Xu

    Abstract: Given the societal impact of unsafe content generated by large language models (LLMs), ensuring that LLM services comply with safety standards is a crucial concern for LLM service providers. Common content moderation methods are limited by an effectiveness-and-efficiency dilemma, where simple models are fragile while sophisticated models consume excessive computational resources. In this paper, we…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM Conference on Computer and Communications Security (CCS) 2024

  3. arXiv:2408.14972  [pdf, other]

    cs.CL

    AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems

    Authors: Chi-Min Chan, Jianxuan Yu, Weize Chen, Chunyang Jiang, Xinyu Liu, Weijie Shi, Zhiyuan Liu, Wei Xue, Yike Guo

    Abstract: The rapid advancement of large language models (LLMs) has led to the rise of LLM-based agents. Recent research shows that multi-agent systems (MAS), where each agent plays a specific role, can outperform individual LLMs. However, configuring an MAS for a task remains challenging, with performance only observable post-execution. Inspired by scaling laws in LLM development, we investigate whether MA…

    Submitted 27 August, 2024; originally announced August 2024.

  4. arXiv:2408.14035  [pdf, other]

    cs.RO cs.CV

    FAST-LIVO2: Fast, Direct LiDAR-Inertial-Visual Odometry

    Authors: Chunran Zheng, Wei Xu, Zuhao Zou, Tong Hua, Chongjian Yuan, Dongjiao He, Bingyang Zhou, Zheng Liu, Jiarong Lin, Fangcheng Zhu, Yunfan Ren, Rong Wang, Fanle Meng, Fu Zhang

    Abstract: This paper proposes FAST-LIVO2: a fast, direct LiDAR-inertial-visual odometry framework to achieve accurate and robust state estimation in SLAM tasks and provide great potential in real-time, onboard robotic applications. FAST-LIVO2 fuses the IMU, LiDAR and image measurements efficiently through an ESIKF. To address the dimension mismatch between the heterogeneous LiDAR and image measurements, we…

    Submitted 28 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: 30 pages, 31 figures, due to the limitation that 'The abstract field cannot exceed 1,920 characters', the abstract presented here is shorter than the one in the PDF file

  5. arXiv:2408.13849  [pdf]

    cs.CR

    Sample-Independent Federated Learning Backdoor Attack

    Authors: Weida Xu, Yang Xu, Sicong Zhang

    Abstract: In federated learning, backdoor attacks embed triggers in the adversarial client's data to inject a backdoor into the model. To evade detection through sample analysis, non-sample-modifying backdoor attack methods based on dropout have been developed. However, these methods struggle to covertly utilize dropout in evaluation mode, thus hindering their deployment in real-world scenarios. To address…

    Submitted 25 August, 2024; originally announced August 2024.

  6. arXiv:2408.13773  [pdf, other]

    cs.CR cs.AI

    SAB:A Stealing and Robust Backdoor Attack based on Steganographic Algorithm against Federated Learning

    Authors: Weida Xu, Yang Xu, Sicong Zhang

    Abstract: Federated learning, an innovative network architecture designed to safeguard user privacy, is gaining widespread adoption in the realm of technology. However, given the existence of backdoor attacks in federated learning, exploring the security of federated learning is significant. Nevertheless, the backdoors investigated in current federated learning research can be readily detected by human ins…

    Submitted 25 August, 2024; originally announced August 2024.

  7. arXiv:2408.12616  [pdf, other]

    cs.CV cs.AI

    Semantic Communication based on Large Language Model for Underwater Image Transmission

    Authors: Weilong Chen, Wenxuan Xu, Haoran Chen, Xinran Zhang, Zhijin Qin, Yanru Zhang, Zhu Han

    Abstract: Underwater communication is essential for environmental monitoring, marine biology research, and underwater exploration. Traditional underwater communication faces limitations like low bandwidth, high latency, and susceptibility to noise, while semantic communication (SC) offers a promising solution by focusing on the exchange of semantics rather than symbols or bits. However, SC encounters challe…

    Submitted 25 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  8. arXiv:2408.12162  [pdf, ps, other]

    cs.IT eess.SP

    Empowering Over-the-Air Personalized Federated Learning via RIS

    Authors: Wei Shi, Jiacheng Yao, Jindan Xu, Wei Xu, Lexi Xu, Chunming Zhao

    Abstract: Over-the-air computation (AirComp) integrates analog communication with task-oriented computation, serving as a key enabling technique for communication-efficient federated learning (FL) over wireless networks. However, AirComp-enabled FL (AirFL) with a single global consensus model fails to address the data heterogeneity in real-life FL scenarios with non-independent and identically distributed l…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted by SCIENCE CHINA Information Sciences

  9. arXiv:2408.11446  [pdf, other]

    cs.ET

    Green Probabilistic Semantic Communication over Wireless Networks

    Authors: Ruopeng Xu, Zhaohui Yang, Yijie Mao, Chongwen Huang, Qianqian Yang, Lexi Xu, Wei Xu, Zhaoyang Zhang

    Abstract: In this paper, we propose a multi-user green semantic communication system facilitated by a probabilistic knowledge graph (PKG). By integrating probability into the knowledge graph, we enable probabilistic semantic communication (PSC) and represent semantic information accordingly. On this basis, a semantic compression model designed for multi-user downlink task-oriented communication is introduce…

    Submitted 21 August, 2024; originally announced August 2024.

  10. arXiv:2408.11381  [pdf, other]

    cs.CL

    RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation

    Authors: Xuanwang Zhang, Yunze Song, Yidong Wang, Shuyun Tang, Xinfeng Li, Zhengran Zeng, Zhen Wu, Wei Ye, Wenyuan Xu, Yue Zhang, Xinyu Dai, Shikun Zhang, Qingsong Wen

    Abstract: Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention. However, even the most advanced LLMs face challenges such as hallucinations and real-time updating of their knowledge. Current research addresses this bottleneck by equipping LLMs with external knowledge, a technique known as Retrieval Augmented Generation (RAG). However, two key issu…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 6 pages, 3 figures

  11. arXiv:2408.10899  [pdf, other]

    cs.RO

    All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents

    Authors: Zhiqiang Wang, Hao Zheng, Yunshuang Nie, Wenjun Xu, Qingwei Wang, Hua Ye, Zhe Li, Kaidong Zhang, Xuewen Cheng, Wanxi Dong, Chang Cai, Liang Lin, Feng Zheng, Xiaodan Liang

    Abstract: Embodied AI is transforming how AI systems interact with the physical world, yet existing datasets are inadequate for developing versatile, general-purpose agents. These limitations include a lack of standardized formats, insufficient data diversity, and inadequate data volume. To address these issues, we introduce ARIO (All Robots In One), a new data standard that enhances existing datasets by of…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Project website: https://imaei.github.io/project_pages/ario/

  12. arXiv:2408.10641  [pdf, other]

    cs.CV cs.AI

    A Review of Human-Object Interaction Detection

    Authors: Yuxiao Wang, Qiwei Xiong, Yu Lei, Weiying Xue, Qi Liu, Zhenao Wei

    Abstract: Human-object interaction (HOI) detection plays a key role in high-level visual understanding, facilitating a deep comprehension of human activities. Specifically, HOI detection aims to locate the humans and objects involved in interactions within images or videos and classify the specific interactions between them. The success of this task is influenced by several key factors, including the accura…

    Submitted 20 August, 2024; originally announced August 2024.

  13. arXiv:2408.10562  [pdf, other]

    cs.RO cs.CV

    Kalib: Markerless Hand-Eye Calibration with Keypoint Tracking

    Authors: Tutian Tang, Minghao Liu, Wenqiang Xu, Cewu Lu

    Abstract: Hand-eye calibration involves estimating the transformation between the camera and the robot. Traditional methods rely on fiducial markers, involving much manual labor and careful setup. Recent advancements in deep learning offer markerless techniques, but they present challenges, including the need for retraining networks for each robot, the requirement of accurate mesh models for data generation…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: The code and supplementary materials are available at https://sites.google.com/view/hand-eye-kalib

  14. arXiv:2408.10280  [pdf, other]

    cs.LG

    NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models

    Authors: Cheng Lin, Lujun Li, Dezhi Li, Jie Zou, Wei Xue, Yike Guo

    Abstract: In this paper, we introduce Nested Low-Rank Adaptation (NoRA), a novel approach to parameter-efficient fine-tuning that extends the capabilities of Low-Rank Adaptation (LoRA) techniques. Vanilla LoRA overlooks pre-trained weight inheritance and still requires fine-tuning numerous parameters. To address these issues, our NoRA adopts a dual-layer nested structure with Singular Value Decomposition…

    Submitted 27 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: Work in progress, revisions ongoing

  15. arXiv:2408.10088  [pdf, other]

    cs.SI

    Recent Surge in Public Interest in Transportation: Sentiment Analysis of Baidu Apollo Go Using Weibo Data

    Authors: Shiqi Wang, Zhouye Zhao, Yuhang Xie, Mingchuan Ma, Zirui Chen, Zeyu Wang, Bohao Su, Wenrui Xu, Tianyi Li

    Abstract: Urban mobility and transportation systems have been profoundly transformed by the advancement of autonomous vehicle technologies. Baidu Apollo Go, a pioneer robotaxi service from the Chinese tech giant Baidu, has recently been widely deployed in major cities like Beijing and Wuhan, sparking increased conversation and offering a glimpse into the future of urban mobility. This study investigates p…

    Submitted 19 August, 2024; originally announced August 2024.

    ACM Class: J.4

  16. arXiv:2408.09849  [pdf, other]

    cs.CL cs.AI

    Importance Weighting Can Help Large Language Models Self-Improve

    Authors: Chunyang Jiang, Chi-min Chan, Wei Xue, Qifeng Liu, Yike Guo

    Abstract: Large language models (LLMs) have shown remarkable capability in numerous tasks and applications. However, fine-tuning LLMs using high-quality datasets under external supervision remains prohibitively expensive. In response, LLM self-improvement approaches have been vibrantly developed recently. The typical paradigm of LLM self-improvement involves training LLM on self-generated data, part of whic…

    Submitted 19 August, 2024; originally announced August 2024.

  17. arXiv:2408.09403  [pdf, other]

    cs.AI cs.CV

    Obtaining Optimal Spiking Neural Network in Sequence Learning via CRNN-SNN Conversion

    Authors: Jiahao Su, Kang You, Zekai Xu, Weizhi Xu, Zhezhi He

    Abstract: Spiking neural networks (SNNs) are becoming a promising alternative to conventional artificial neural networks (ANNs) due to their rich neural dynamics and the implementation of energy-efficient neuromorphic chips. However, the non-differential binary communication mechanism makes SNN hard to converge to an ANN-level accuracy. When SNN encounters sequence learning, the situation becomes worse due…

    Submitted 25 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted by 33rd International Conference on Artificial Neural Networks

  18. arXiv:2408.08713  [pdf, other]

    cs.LG cs.AI cs.IR

    Beyond KAN: Introducing KarSein for Adaptive High-Order Feature Interaction Modeling in CTR Prediction

    Authors: Yunxiao Shi, Wujiang Xu, Mingyu Jin, Haimin Zhang, Qiang Wu, Yongfeng Zhang, Min Xu

    Abstract: Modeling feature interactions is crucial for click-through rate (CTR) prediction, particularly when it comes to high-order explicit interactions. Traditional methods struggle with this task because they often predefine a maximum interaction order, which relies heavily on prior knowledge and can limit the model's effectiveness. Additionally, modeling high-order interactions typically leads to incre…

    Submitted 25 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: KarSein for CTR

  19. arXiv:2408.07470  [pdf, other]

    cs.HC

    Enhancement of Co-located Shared VR Experiences: Representing Non-HMD Observers on Both HMD and 2D Screen

    Authors: Zixuan Guo, Wenge Xu, Hongyu Wang, Tingjie Wan, Nilufar Baghaei, Cheng-Hung Lo, Hai-Ning Liang

    Abstract: Virtual reality (VR) not only allows head-mounted display (HMD) users to immerse themselves in virtual worlds but also to share them with others. When designed correctly, this shared experience can be enjoyable. However, in typical scenarios, HMD users are isolated by their devices, and non-HMD observers lack connection with the virtual world. To address this, our research investigates visually re…

    Submitted 14 August, 2024; originally announced August 2024.

  20. arXiv:2408.07468  [pdf, other]

    cs.HC

    Exploring the Impact of Passthrough on VR Exergaming in Public Environments: A Field Study

    Authors: Zixuan Guo, Hanxiao Deng, Hongyu Wang, Angel J. Y. Tan, Wenge Xu, Hai-Ning Liang

    Abstract: Sedentary behavior is becoming increasingly prevalent in daily work and study environments. VR exergaming has emerged as a promising solution in these places of work and study. However, private spaces in these environments are not easy, and engaging in VR exergaming in public settings presents its own set of challenges (e.g., safety, social acceptance, isolation, and privacy protection). The recen…

    Submitted 14 August, 2024; originally announced August 2024.

  21. arXiv:2408.07184  [pdf, other]

    cs.SD cs.AI

    A New Dataset, Notation Software, and Representation for Computational Schenkerian Analysis

    Authors: Stephen Ni-Hahn, Weihan Xu, Jerry Yin, Rico Zhu, Simon Mak, Yue Jiang, Cynthia Rudin

    Abstract: Schenkerian Analysis (SchA) is a uniquely expressive method of music analysis, combining elements of melody, harmony, counterpoint, and form to describe the hierarchical structure supporting a work of music. However, despite its powerful analytical utility and potential to improve music understanding and generation, SchA has rarely been utilized by the computer music community. This is in large pa…

    Submitted 13 August, 2024; originally announced August 2024.

  22. arXiv:2408.06266  [pdf, other]

    cs.LG cs.AI cs.CL

    Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment

    Authors: Karel D'Oosterlinck, Winnie Xu, Chris Develder, Thomas Demeester, Amanpreet Singh, Christopher Potts, Douwe Kiela, Shikib Mehri

    Abstract: Large Language Models (LLMs) are often aligned using contrastive alignment objectives and preference pair datasets. The interaction between model, paired data, and objective makes alignment a complicated procedure, sometimes producing subpar results. We study this and find that (i) preference data gives a better learning signal when the underlying responses are contrastive, and (ii) alignment obje…

    Submitted 29 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  23. arXiv:2408.06082  [pdf, ps, other]

    cs.SE

    AutoCheck: Automatically Identifying Variables for Checkpointing by Data Dependency Analysis

    Authors: Xiang Fu, Weiping Zhang, Xin Huang, Shiman Meng, Wubiao Xu, Luanzheng Guo, Kento Sato

    Abstract: Checkpoint/Restart (C/R) has been widely deployed in numerous HPC systems, Clouds, and industrial data centers, which are typically operated by system engineers. Nevertheless, there is no existing approach that helps system engineers without domain expertise, and domain scientists without system fault tolerance knowledge identify those critical variables accounted for correct application execution…

    Submitted 15 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: 11 pages, 7 figures, 4 tables

  24. arXiv:2408.04738  [pdf, other]

    cs.RO

    DiPGrasp: Parallel Local Searching for Efficient Differentiable Grasp Planning

    Authors: Wenqiang Xu, Jieyi Zhang, Tutian Tang, Zhenjun Yu, Yutong Li, Cewu Lu

    Abstract: Grasp planning is an important task for robotic manipulation. Though it is a richly studied area, a standalone, fast, and differentiable grasp planner that can work with robot grippers of different DOFs has not been reported. In this work, we present DiPGrasp, a grasp planner that satisfies all these goals. DiPGrasp takes a force-closure geometric surface matching grasp quality metric. It adopts a…

    Submitted 8 August, 2024; originally announced August 2024.

  25. arXiv:2408.04267  [pdf, other]

    cs.SD eess.AS

    Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation in Speech Enhancement

    Authors: Runduo Han, Weiming Xu, Zihan Zhang, Mingshuai Liu, Lei Xie

    Abstract: The deep complex convolution recurrent network (DCCRN) achieves excellent speech enhancement performance by utilizing the audio spectrum's complex features. However, it has a large number of model parameters. We propose a smaller model, Distil-DCCRN, which has only 30% of the parameters compared to the DCCRN. To ensure that the performance of Distil-DCCRN matches that of the DCCRN, we employ the k…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Signal Processing Letters

  26. arXiv:2408.03215  [pdf, other]

    cs.LG cs.DC

    FedBAT: Communication-Efficient Federated Learning via Learnable Binarization

    Authors: Shiwei Li, Wenchao Xu, Haozhao Wang, Xing Tang, Yining Qi, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li

    Abstract: Federated learning is a promising distributed machine learning paradigm that can effectively exploit large-scale data without exposing users' privacy. However, it may incur significant communication overhead, thereby potentially impairing the training efficiency. To address this challenge, numerous studies suggest binarizing the model updates. Nonetheless, traditional methods usually binarize mode…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by ICML 2024

  27. arXiv:2408.02710  [pdf, other]

    cs.LG cs.CV

    RCDM: Enabling Robustness for Conditional Diffusion Model

    Authors: Weifeng Xu, Xiang Zhu, Xiaoyong Li

    Abstract: The conditional diffusion model (CDM) enhances the standard diffusion model by providing more control, improving the quality and relevance of the outputs, and making the model adaptable to a wider range of complex tasks. However, inaccurate conditional inputs in the inverse process of CDM can easily lead to generating fixed errors in the neural network, which diminishes the adaptability of a well-…

    Submitted 5 August, 2024; originally announced August 2024.

  28. arXiv:2408.02632  [pdf, other]

    cs.CL cs.AI

    SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models

    Authors: Muxi Diao, Rumei Li, Shiyang Liu, Guogang Liao, Jingang Wang, Xunliang Cai, Weiran Xu

    Abstract: As large language models (LLMs) continue to advance in capability and influence, ensuring their security and preventing harmful outputs has become crucial. A promising approach to address these concerns involves training models to automatically generate adversarial prompts for red teaming. However, the evolving subtlety of vulnerabilities in LLMs challenges the effectiveness of current adversarial…

    Submitted 5 August, 2024; originally announced August 2024.

  29. arXiv:2408.02487  [pdf, other]

    cs.SE cs.AI cs.LG

    A First Look at License Compliance Capability of LLMs in Code Generation

    Authors: Weiwei Xu, Kai Gao, Hao He, Minghui Zhou

    Abstract: Recent advances in Large Language Models (LLMs) have revolutionized code generation, leading to widespread adoption of AI coding tools by developers. However, LLMs can generate license-protected code without providing the necessary license information, leading to potential intellectual property violations during software production. This paper addresses the critical, yet underexplored, issue of li…

    Submitted 5 August, 2024; originally announced August 2024.

  30. arXiv:2408.02032  [pdf, other]

    cs.CV cs.AI

    Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models

    Authors: Fushuo Huo, Wenchao Xu, Zhong Zhang, Haozhao Wang, Zhicheng Chen, Peilin Zhao

    Abstract: While Large Vision-Language Models (LVLMs) have rapidly advanced in recent years, the prevalent issue known as the 'hallucination' problem has emerged as a significant bottleneck, hindering their real-world deployments. Existing methods mitigate this issue mainly from two perspectives: One approach leverages extra knowledge like robust instruction tuning LVLMs with curated datasets or employing au…

    Submitted 4 August, 2024; originally announced August 2024.

  31. arXiv:2408.01803  [pdf, other]

    cs.LG cs.CL

    STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

    Authors: Peijie Dong, Lujun Li, Dayou Du, Yuhan Chen, Zhenheng Tang, Qiang Wang, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo, Xiaowen Chu

    Abstract: In this paper, we present STBLLM, the first structural binarization framework for compressing Large Language Models (LLMs) to less than 1-bit precision. LLMs have achieved remarkable performance, but their heavy memory requirements have hindered widespread adoption, particularly on resource-constrained devices. Binarization, which quantifies weights to a mere 1-bit, achieves a milestone in increas…

    Submitted 3 August, 2024; originally announced August 2024.

  32. arXiv:2408.01791  [pdf]

    cs.NI

    Implementing NAT Hole Punching with QUIC

    Authors: Jinyu Liang, Wei Xu, Taotao Wang, Qing Yang, Shengli Zhang

    Abstract: The widespread adoption of Network Address Translation (NAT) technology has led to a significant number of network end nodes being located in private networks behind NAT devices, impeding direct communication between these nodes. To solve this problem, a technique known as "hole punching" has been devised for NAT traversal to facilitate peer-to-peer communication among end nodes located in distinc…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: The paper has been accepted for oral presentation at the VTC2024-Fall Conference

  33. arXiv:2408.01419  [pdf, other]

    cs.CL

    DebateQA: Evaluating Question Answering on Debatable Knowledge

    Authors: Rongwu Xu, Xuan Qi, Zehan Qi, Wei Xu, Zhijiang Guo

    Abstract: The rise of large language models (LLMs) has enabled us to seek answers to inherently debatable questions on LLM chatbots, necessitating a reliable way to evaluate their ability. However, traditional QA benchmarks that assume fixed answers are inadequate for this purpose. To address this, we introduce DebateQA, a dataset of 2,941 debatable questions, each accompanied by multiple human-annotated partial…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: Dataset and scripts for evaluation are available at https://github.com/pillowsofwind/DebateQA

  34. arXiv:2408.01271  [pdf, other]

    cs.CE

    HRFT: Mining High-Frequency Risk Factor Collections End-to-End via Transformer

    Authors: Wenyan Xu, Rundong Wang, Chen Li, Yonghong Hu, Zhonghua Lu

    Abstract: In quantitative trading, it is common to find patterns in short term volatile trends of the market. These patterns are known as High Frequency (HF) risk factors, serving as key indicators of future stock price volatility. Traditionally, these risk factors were generated by financial models relying heavily on domain-specific knowledge manually added rather than extensive market data. Inspired by sy…

    Submitted 5 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: Preprint. Under review

  35. arXiv:2408.00913  [pdf, other]

    cs.NI cs.ET

    Design and Implementation of ARA Wireless Living Lab for Rural Broadband and Applications

    Authors: Taimoor Ul Islam, Joshua Ofori Boateng, Md Nadim, Guoying Zu, Mukaram Shahid, Xun Li, Tianyi Zhang, Salil Reddy, Wei Xu, Ataberk Atalar, Vincent Lee, Yung-Fu Chen, Evan Gosling, Elisabeth Permatasari, Christ Somiah, Zhibo Meng, Sarath Babu, Mohammed Soliman, Ali Hussain, Daji Qiao, Mai Zheng, Ozdal Boyraz, Yong Guan, Anish Arora, Mohamed Selim, et al. (6 additional authors not shown)

    Abstract: To address the rural broadband challenge and to leverage the unique opportunities that rural regions provide for piloting advanced wireless applications, we design and implement the ARA wireless living lab for research and innovation in rural wireless systems and their applications in precision agriculture, community services, and so on. ARA focuses on the unique community, application, and econom…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 17 pages, 18 figures

  36. arXiv:2407.21531  [pdf, other]

    cs.SD cs.CL cs.MM eess.AS

    Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation

    Authors: Ziya Zhou, Yuhang Wu, Zhiyue Wu, Xinyue Zhang, Ruibin Yuan, Yinghao Ma, Lu Wang, Emmanouil Benetos, Wei Xue, Yike Guo

    Abstract: Symbolic Music, akin to language, can be encoded in discrete symbols. Recent research has extended the application of large language models (LLMs) such as GPT-4 and Llama2 to the symbolic music domain including understanding and generation. Yet scant research explores the details of how these LLMs perform on advanced music understanding and conditioned generation, especially from the multi-step re…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by ISMIR2024

  37. arXiv:2407.20962  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

    Authors: Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, Mengfei Li, Ruibin Yuan, Yanbing Jiang, Wei Xue, Wenhan Luo, Qifeng Chen, Shanghang Zhang, Qifeng Liu, Yike Guo

    Abstract: Massive multi-modality datasets play a significant role in facilitating the success of large video-language models. However, current video-language datasets primarily provide text descriptions for visual frames, considering audio to be weakly related information. They usually overlook exploring the potential of inherent audio-visual correlation, leading to monotonous annotation within each modalit…

    Submitted 6 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: 15 Pages. Dataset report

  38. arXiv:2407.19765  [pdf, other]

    cs.AI

    Map2Traj: Street Map Piloted Zero-shot Trajectory Generation with Diffusion Model

    Authors: Zhenyu Tao, Wei Xu, Xiaohu You

    Abstract: User mobility modeling serves a crucial role in analysis and optimization of contemporary wireless networks. Typical stochastic mobility models, e.g., random waypoint model and Gauss Markov model, can hardly capture the distribution characteristics of users within real-world areas. State-of-the-art trace-based mobility models and existing learning-based trajectory generation methods, however, are…

    Submitted 29 July, 2024; originally announced July 2024.

  39. arXiv:2407.19672  [pdf, other]

    cs.CL

    SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

    Authors: Wenxuan Zhang, Hou Pong Chan, Yiran Zhao, Mahani Aljunied, Jianyu Wang, Chaoqun Liu, Yue Deng, Zhiqiang Hu, Weiwen Xu, Yew Ken Chia, Xin Li, Lidong Bing

    Abstract: Large Language Models (LLMs) have shown remarkable abilities across various tasks, yet their development has predominantly centered on high-resource languages like English and Chinese, leaving low-resource languages underserved. To address this disparity, we present SeaLLMs 3, the latest iteration of the SeaLLMs model family, tailored for Southeast Asian languages. This region, characterized by it…

    Submitted 28 July, 2024; originally announced July 2024.

  40. arXiv:2407.19514  [pdf, other]

    cs.CV cs.MM

    Detached and Interactive Multimodal Learning

    Authors: Yunfeng Fan, Wenchao Xu, Haozhao Wang, Junhong Liu, Song Guo

    Abstract: Recently, Multimodal Learning (MML) has gained significant interest as it compensates for single-modality limitations through comprehensive complementary information within multimodal data. However, traditional MML methods generally use the joint learning framework with a uniform learning objective that can lead to the modality competition issue, where feedback predominantly comes from certain mod…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 24

  41. arXiv:2407.19394  [pdf, other]

    cs.CV

    Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets

    Authors: Tianxiao Zhang, Wenju Xu, Bo Luo, Guanghui Wang

    Abstract: The Vision Transformer (ViT) leverages the Transformer's encoder to capture global information by dividing images into patches and achieves superior performance across various computer vision tasks. However, the self-attention mechanism of ViT captures the global context from the outset, overlooking the inherent relationships between neighboring pixels in images or videos. Transformers mainly focu…

    Submitted 2 August, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

  42. arXiv:2407.17183  [pdf]

    cs.RO

    Robust Point Cloud Registration in Robotic Inspection with Locally Consistent Gaussian Mixture Model

    Authors: Lingjie Su, Wei Xu, Wenlong Li

    Abstract: In robotic inspection of aviation parts, achieving accurate pairwise point cloud registration between scanned and model data is essential. However, noise and outliers generated in robotic scanned data can compromise registration accuracy. To mitigate this challenge, this article proposes a probability-based registration method utilizing Gaussian Mixture Model (GMM) with local consistency constrain…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 12 pages, 14 figures

  43. arXiv:2407.16637  [pdf, other]

    cs.CL cs.AI cs.LG

    Course-Correction: Safety Alignment Using Synthetic Preferences

    Authors: Rongwu Xu, Yishuo Cai, Zhenhong Zhou, Renjie Gu, Haiqin Weng, Yan Liu, Tianwei Zhang, Wei Xu, Han Qiu

    Abstract: The risk of harmful content generated by large language models (LLMs) becomes a critical concern. This paper presents a systematic study on assessing and improving LLMs' capability to perform the task of course-correction, i.e., the model can steer away from generating harmful content autonomously. To start with, we introduce the C²-Eval benchmark for quantitative assessment an…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Dataset and script will be available at https://github.com/pillowsofwind/Course-Correction

  44. arXiv:2407.16327  [pdf, other]

    cs.CR cs.CV

    Understanding Impacts of Electromagnetic Signal Injection Attacks on Object Detection

    Authors: Youqian Zhang, Chunxi Yang, Eugene Y. Fu, Qinhong Jiang, Chen Yan, Sze-Yiu Chau, Grace Ngai, Hong-Va Leong, Xiapu Luo, Wenyuan Xu

    Abstract: Object detection can localize and identify objects in images, and it is extensively employed in critical multimedia applications such as security surveillance and autonomous driving. Despite the success of existing object detection models, they are often evaluated in ideal scenarios where captured images guarantee the accurate and complete representation of the detecting scenes. However, images ca…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 2024 IEEE International Conference on Multimedia and Expo (ICME), July 15 - July 19, 2024, Niagara Falls, Ontario, Canada

  45. arXiv:2407.16197  [pdf, other]

    cs.CV cs.RO

    LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera

    Authors: Yukai Ma, Jianbiao Mei, Xuemeng Yang, Licheng Wen, Weihua Xu, Jiangning Zhang, Botian Shi, Yong Liu, Xingxing Zuo

    Abstract: Semantic Scene Completion (SSC) is pivotal in autonomous driving perception, frequently confronted with the complexities of weather and illumination changes. The long-term strategy involves fusing multi-modal information to bolster the system's robustness. Radar, increasingly utilized for 3D target detection, is gradually replacing LiDAR in autonomous driving applications, offering a robust sensin…

    Submitted 23 July, 2024; originally announced July 2024.

  46. arXiv:2407.15366  [pdf, other]

    cs.CL cs.AI cs.CY

    Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias

    Authors: Rongwu Xu, Zi'an Zhou, Tianwei Zhang, Zehan Qi, Su Yao, Ke Xu, Wei Xu, Han Qiu

    Abstract: The common toxicity and societal bias in contents generated by large language models (LLMs) necessitate strategies to reduce harm. Present solutions often demand white-box access to the model or substantial training, which is impractical for cutting-edge commercial LLMs. Moreover, prevailing prompting methods depend on external tool feedback and fail to simultaneously lessen toxicity and bias. Mot…

    Submitted 22 July, 2024; originally announced July 2024.

  47. arXiv:2407.15343  [pdf, other]

    cs.CL

    Improving Minimum Bayes Risk Decoding with Multi-Prompt

    Authors: David Heineman, Yao Dou, Wei Xu

    Abstract: While instruction fine-tuned LLMs are effective text generators, sensitivity to prompt construction makes performance unstable and sub-optimal in practice. Relying on a single "best" prompt cannot capture all differing approaches to a generation problem. Using this observation, we propose multi-prompt decoding, where many candidate generations are decoded from a prompt bank at inference-time. To e…

    Submitted 21 July, 2024; originally announced July 2024.

  48. arXiv:2407.15045  [pdf, other]

    eess.SY cs.LG

    Efficient Sampling for Data-Driven Frequency Stability Constraint via Forward-Mode Automatic Differentiation

    Authors: Wangkun Xu, Qian Chen, Pudong Ge, Zhongda Chu, Fei Teng

    Abstract: Encoding frequency stability constraints in the operation problem is challenging due to its complex dynamics. Recently, data-driven approaches have been proposed to learn the stability criteria offline with the trained model embedded as a constraint of online optimization. However, random sampling of stationary operation points is less efficient in generating balanced stable and unstable samples.…

    Submitted 20 July, 2024; originally announced July 2024.

  49. arXiv:2407.14562  [pdf, other]

    cs.AI cs.CL

    Thought-Like-Pro: Enhancing Reasoning of Large Language Models through Self-Driven Prolog-based Chain-of-Thought

    Authors: Xiaoyu Tan, Yongxin Deng, Xihe Qiu, Weidi Xu, Chao Qu, Wei Chu, Yinghui Xu, Yuan Qi

    Abstract: Large language models (LLMs) have shown exceptional performance as general-purpose assistants, excelling across a variety of reasoning tasks. This achievement represents a significant step toward achieving artificial general intelligence (AGI). Despite these advancements, the effectiveness of LLMs often hinges on the specific prompting strategies employed, and there remains a lack of a robust fram…

    Submitted 10 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    ACM Class: I.2.7

  50. arXiv:2407.12471  [pdf, other]

    cs.CY cs.CL

    Characterization of Political Polarized Users Attacked by Language Toxicity on Twitter

    Authors: Wentao Xu

    Abstract: Understanding the dynamics of language toxicity on social media is important for us to investigate the propagation of misinformation and the development of echo chambers for political scenarios such as U.S. presidential elections. Recent research has used large-scale data to investigate the dynamics across social media platforms. However, research on the toxicity dynamics is not enough. This study…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: This work has been accepted by 2024 Conference on Computer Supported Cooperative Work and Social Computing (CSCW2024). Association for Computing Machinery (ACM), New York, NY, USA

    MSC Class: 91; 94 ACM Class: J.4