Showing 1–16 of 16 results for author: Mu, Z

  1. arXiv:2407.00567

    cs.AI cs.LG

    A Contextual Combinatorial Bandit Approach to Negotiation

    Authors: Yexin Li, Zhancun Mu, Siyuan Qi

    Abstract: Learning effective negotiation strategies poses two key challenges: the exploration-exploitation dilemma and dealing with large action spaces. However, there is an absence of learning-based approaches that effectively address these challenges in negotiation. This paper introduces a comprehensive formulation to tackle various negotiation problems. Our approach leverages contextual combinatorial mul…

    Submitted 29 June, 2024; originally announced July 2024.

  2. arXiv:2407.00114

    cs.LG cs.AI cs.CL

    OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

    Authors: Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang

    Abstract: We present OmniJARVIS, a novel Vision-Language-Action (VLA) model for open-world instruction-following agents in open-world Minecraft. Compared to prior works that either emit textual goals to separate controllers or produce the control command directly, OmniJARVIS seeks a different path to ensure both strong reasoning and efficient decision-making capabilities via unified tokenization of multimod…

    Submitted 27 June, 2024; originally announced July 2024.

  3. arXiv:2406.18202

    physics.geo-ph cs.DB

    GlobalTomo: A global dataset for physics-ML seismic wavefield modeling and FWI

    Authors: Shiqian Li, Zhi Li, Zhancun Mu, Shiji Xin, Zhixiang Dai, Kuangdai Leng, Ruihua Zhang, Xiaodong Song, Yixin Zhu

    Abstract: Global seismic tomography, taking advantage of seismic waves from natural earthquakes, provides essential insights into the earth's internal dynamics. Advanced Full-waveform Inversion (FWI) techniques, whose aim is to meticulously interpret every detail in seismograms, confront formidable computational demands in forward modeling and adjoint simulations on a global scale. Recent advancements in Ma…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 36 pages

  4. arXiv:2404.12725

    cs.SD cs.CV cs.LG cs.MM eess.AS

    Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction

    Authors: Zhaoxi Mu, Xinyu Yang

    Abstract: The integration of visual cues has revitalized the performance of the target speech extraction task, elevating it to the forefront of the field. Nevertheless, this multi-modal learning paradigm often encounters the challenge of modality imbalance. In audio-visual target speech extraction tasks, the audio modality tends to dominate, potentially overshadowing the importance of visual guidance. To ta…

    Submitted 5 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  5. arXiv:2404.07181

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    BAMBOO: a predictive and transferable machine learning force field framework for liquid electrolyte development

    Authors: Sheng Gong, Yumin Zhang, Zhenliang Mu, Zhichen Pu, Hongyi Wang, Zhiao Yu, Mengyi Chen, Tianze Zheng, Zhi Wang, Lifei Chen, Xiaojie Wu, Shaochen Shi, Weihao Gao, Wen Yan, Liang Xiang

    Abstract: Despite the widespread applications of machine learning force field (MLFF) on solids and small molecules, there is a notable gap in applying MLFF to complex liquid electrolytes. In this work, we introduce BAMBOO (ByteDance AI Molecular Simulation Booster), a novel framework for molecular dynamics (MD) simulations, with a demonstration of its capabilities in the context of liquid electrolytes for l…

    Submitted 22 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  6. arXiv:2312.10305

    cs.SD cs.AI cs.LG eess.AS

    Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction

    Authors: Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang

    Abstract: Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the reference speech, which are irrelevant to speaker identity, can lead to speaker confusion within the speech extraction network. To overcome this challenge, we p…

    Submitted 19 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI2024

  7. arXiv:2312.04113

    cs.CV

    Multi-strategy Collaborative Optimized YOLOv5s and its Application in Distance Estimation

    Authors: Zijian Shen, Zhenping Mu, Xiangxiang Li

    Abstract: The increasing accident rate brought about by the explosive growth of automobiles has made the research on active safety systems of automobiles increasingly important. The importance of improving the accuracy of vehicle target detection is self-evident. To achieve the goals of vehicle detection and distance estimation and provide safety warnings, a Distance Estimation Safety Warning System (DESWS)…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: This paper contains 5 pages, 10 figures, and was accepted at 4th International Conference on Advances in Electrical Engineering and Computer Applications (AEECA2023)

  8. arXiv:2303.03737

    cs.SD cs.LG eess.AS

    Multi-Dimensional and Multi-Scale Modeling for Speech Separation Optimized by Discriminative Learning

    Authors: Zhaoxi Mu, Xinyu Yang, Wenjing Zhu

    Abstract: Transformer has shown advanced performance in speech separation, benefiting from its ability to capture global features. However, capturing local features and channel information of audio sequences in speech separation is equally important. In this paper, we present a novel approach named Intra-SE-Conformer and Inter-Transformer (ISCIT) for speech separation. Specifically, we design a new network…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  9. arXiv:2303.03732

    cs.SD cs.LG eess.AS

    A Multi-Stage Triple-Path Method for Speech Separation in Noisy and Reverberant Environments

    Authors: Zhaoxi Mu, Xinyu Yang, Xiangyuan Yang, Wenjing Zhu

    Abstract: In noisy and reverberant environments, the performance of deep learning-based speech separation methods drops dramatically because previous methods are not designed and optimized for such situations. To address this issue, we propose a multi-stage end-to-end learning method that decouples the difficult speech separation problem in noisy and reverberant environments into three sub-problems: speech…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  10. arXiv:2207.03726

    cs.CV cs.RO

    TGRMPT: A Head-Shoulder Aided Multi-Person Tracker and a New Large-Scale Dataset for Tour-Guide Robot

    Authors: Wen Wang, Shunda Hu, Shiqiang Zhu, Wei Song, Zheyuan Lin, Tianlei Jin, Zonghao Mu, Yuanhai Zhou

    Abstract: A service robot serving safely and politely needs to track the surrounding people robustly, especially for Tour-Guide Robot (TGR). However, existing multi-object tracking (MOT) or multi-person tracking (MPT) methods are not applicable to TGR for the following reasons: 1. lacking relevant large-scale datasets; 2. lacking applicable metrics to evaluate trackers. In this work, we target the visual pe…

    Submitted 8 July, 2022; originally announced July 2022.

  11. qrpca: A Package for Fast Principal Component Analysis with GPU Acceleration

    Authors: Rafael S. de Souza, Xu Quanfeng, Shiyin Shen, Chen Peng, Zihao Mu

    Abstract: We present qrpca, a fast and scalable QR-decomposition principal component analysis package. The software, written in both R and python languages, makes use of torch for internal matrix computations, and enables GPU acceleration, when available. qrpca provides similar functionalities to prcomp (R) and sklearn (python) packages respectively. A benchmark test shows that qrpca can achieve computation…

    Submitted 6 September, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Journal ref: Astronomy and Computing, 41, 100633 (2022)

  12. arXiv:2201.02660

    cs.RO

    A Multi-Behavior Planning Framework for Robot Guide

    Authors: Muhan Hou, Zonghao Mu, Jing Li, Qizhi Yu, Jason Gu

    Abstract: The guiding task of a mobile robot requires not only human-aware navigation, but also appropriate yet timely interaction for active instruction. State-of-the-art tour-guide models limit their socially-aware consideration to adapting to users' motion, ignoring the interactive behavior planning to fulfill the communicative demands. We propose a multi-behavior planning framework based on Monte Carlo…

    Submitted 7 January, 2022; originally announced January 2022.

  13. arXiv:2104.09995

    cs.SD cs.CL eess.AS

    Review of end-to-end speech synthesis technology based on deep learning

    Authors: Zhaoxi Mu, Xinyu Yang, Yizhuo Dong

    Abstract: As an indispensable part of modern human-computer interaction system, speech synthesis technology helps users get the output of intelligent machine more easily and intuitively, thus has attracted more and more attention. Due to the limitations of high complexity and low efficiency of traditional speech synthesis technology, the current research focus is the deep learning-based end-to-end speech sy…

    Submitted 20 April, 2021; originally announced April 2021.

  14. arXiv:2104.06008

    cs.CV cs.CL

    Disentangled Motif-aware Graph Learning for Phrase Grounding

    Authors: Zongshen Mu, Siliang Tang, Jie Tan, Qiang Yu, Yueting Zhuang

    Abstract: In this paper, we propose a novel graph learning framework for phrase grounding in the image. Developing from the sequential to the dense graph model, existing works capture coarse-grained context but fail to distinguish the diversity of context among phrases and image regions. In contrast, we pay special attention to different motifs implied in the context of the scene graph and devise the disent…

    Submitted 13 April, 2021; originally announced April 2021.

    Comments: 10 pages, 6 figures, AAAI 2021 conference

  15. arXiv:2102.12726

    cs.RO cs.CY eess.SY

    Design and Control of a Highly Redundant Rigid-Flexible Coupling Robot to Assist the COVID-19 Oropharyngeal-Swab Sampling

    Authors: Yingbai Hu, Jian Li, Yongquan Chen, Qiwen Wang, Chuliang Chi, Heng Zhang, Qing Gao, Yuanmin Lan, Zheng Li, Zonggao Mu, Zhenglong Sun, Alois Knoll

    Abstract: The outbreak of novel coronavirus pneumonia (COVID-19) has caused mortality and morbidity worldwide. Oropharyngeal-swab (OP-swab) sampling is widely used for the diagnosis of COVID-19 in the world. To avoid the clinical staff from being affected by the virus, we developed a 9-degree-of-freedom (DOF) rigid-flexible coupling (RFC) robot to assist the COVID-19 OP-swab sampling. This robot is composed…

    Submitted 25 February, 2021; originally announced February 2021.

    Comments: 8 pages, 11 figures

  16. arXiv:2008.13507

    cs.CV cs.AI

    iLGaCo: Incremental Learning of Gait Covariate Factors

    Authors: Zihao Mu, Francisco M. Castro, Manuel J. Marin-Jimenez, Nicolas Guil, Yan-ran Li, Shiqi Yu

    Abstract: Gait is a popular biometric pattern used for identifying people based on their way of walking. Traditionally, gait recognition approaches based on deep learning are trained using the whole training dataset. In fact, if new data (classes, view-points, walking conditions, etc.) need to be included, it is necessary to re-train again the model with old and new data samples. In this paper, we propose…

    Submitted 31 August, 2020; originally announced August 2020.

    Comments: Accepted for presentation at IJCB'2020