Showing 1–50 of 247 results for author: Shi, M

Searching in archive cs.
  1. arXiv:2408.17431  [pdf, other]

    eess.AS cs.AI

    Advancing Multi-talker ASR Performance with Large Language Models

    Authors: Mohan Shi, Zengrui Jin, Yaoxun Xu, Yong Xu, Shi-Xiong Zhang, Kun Wei, Yiwen Shao, Chunlei Zhang, Dong Yu

    Abstract: Recognizing overlapping speech from multiple speakers in conversational scenarios is one of the most challenging problems for automatic speech recognition (ASR). Serialized output training (SOT) is a classic method to address multi-talker ASR, with the idea of concatenating transcriptions from multiple speakers according to the emission times of their speech for training. However, SOT-style transcr…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 8 pages, accepted by IEEE SLT 2024

  2. arXiv:2408.15998  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

    Authors: Min Shi, Fuxiao Liu, Shihao Wang, Shijia Liao, Subhashree Radhakrishnan, De-An Huang, Hongxu Yin, Karan Sapra, Yaser Yacoob, Humphrey Shi, Bryan Catanzaro, Andrew Tao, Jan Kautz, Zhiding Yu, Guilin Liu

    Abstract: The ability to accurately interpret complex visual information is a crucial topic of multimodal large language models (MLLMs). Recent work indicates that enhanced visual perception significantly reduces hallucinations and improves performance on resolution-sensitive tasks, such as optical character recognition and document analysis. A number of recent MLLMs achieve this goal using a mixture of vis…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Github: https://github.com/NVlabs/Eagle, HuggingFace: https://huggingface.co/NVEagle

  3. arXiv:2408.09685  [pdf, ps, other]

    cs.IT quant-ph

    Triorthogonal Codes and Self-dual Codes

    Authors: Minjia Shi, Haodong Lu, Jon-Lark Kim, Patrick Sole

    Abstract: Triorthogonal matrices were introduced in Quantum Information Theory in connection with distillation of magic states (Bravyi and Haah (2012)). We give an algorithm to construct binary triorthogonal matrices from binary self-dual codes. Further, we generalize to this setting the classical coding techniques of shortening and extending. We also give some simple propagation rules.

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 21 pages

    MSC Class: 94B05

    Journal ref: Quantum Inf Process 23, 280 (2024)

  4. arXiv:2408.02462  [pdf, other]

    eess.IV cs.AI cs.CV

    An investigation into the causes of race bias in AI-based cine CMR segmentation

    Authors: Tiarna Lee, Esther Puyol-Anton, Bram Ruijsink, Sebastien Roujol, Theodore Barfoot, Shaheim Ogbomo-Harmitt, Miaojing Shi, Andrew P. King

    Abstract: Artificial intelligence (AI) methods are being used increasingly for the automated segmentation of cine cardiac magnetic resonance (CMR) imaging. However, these methods have been shown to be subject to race bias, i.e. they exhibit different levels of performance for different races depending on the (im)balance of the data used to train the AI model. In this paper we investigate the source of this…

    Submitted 5 August, 2024; originally announced August 2024.

  5. arXiv:2407.18667  [pdf, other]

    cs.CV

    A Labeled Ophthalmic Ultrasound Dataset with Medical Report Generation Based on Cross-modal Deep Learning

    Authors: Jing Wang, Junyan Fan, Meng Zhou, Yanzhu Zhang, Mingyu Shi

    Abstract: Ultrasound imaging reveals eye morphology and aids in diagnosing and treating eye diseases. However, interpreting diagnostic reports requires specialized physicians. We present a labeled ophthalmic dataset for the precise analysis and the automated exploration of medical images along with their associated reports. It collects three modal data, including the ultrasound images, blood flow informatio…

    Submitted 26 July, 2024; originally announced July 2024.

  6. arXiv:2407.16310  [pdf, ps, other]

    cs.IT math.CO

    Some $3$-designs invariant under $2.PΣL(2,49).$

    Authors: Minjia Shi, Ruowen Liu, Patrick Solé

    Abstract: We construct a ternary [49,25,7] code from the row span of a Jacobsthal matrix. It is equivalent to a Generalized Quadratic Residue (GQR) code in the sense of van Lint and MacWilliams (1978). These codes are the abelian generalizations of the quadratic residue (QR) codes which are cyclic. The union of the [50,25,8] extension of the said code and its dual supports a 3-(50,14,1248) design. The autom…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 9 pages

    MSC Class: 94B15; 05B05

  7. arXiv:2407.16139  [pdf, other]

    cs.LG

    Tackling Feature-Classifier Mismatch in Federated Learning via Prompt-Driven Feature Transformation

    Authors: Xinghao Wu, Jianwei Niu, Xuefeng Liu, Mingjia Shi, Guogang Zhu, Shaojie Tang

    Abstract: In traditional Federated Learning approaches like FedAvg, the global model underperforms when faced with data heterogeneity. Personalized Federated Learning (PFL) enables clients to train personalized models to fit their local data distribution better. However, we surprisingly find that the feature extractor in FedAvg is superior to those in most PFL methods. More interestingly, by applying a line…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 23 pages, 9 figures

  8. arXiv:2407.11213  [pdf, other]

    cs.CV

    OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models

    Authors: Zijian Zhou, Zheng Zhu, Holger Caesar, Miaojing Shi

    Abstract: Panoptic Scene Graph Generation (PSG) aims to segment objects and recognize their relations, enabling the structured understanding of an image. Previous methods focus on predicting predefined object and relation categories, hence limiting their applications in the open world scenarios. With the rapid development of large multimodal models (LMMs), significant progress has been made in open-set obje…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  9. arXiv:2407.08813  [pdf, other]

    eess.IV cs.AI cs.CV

    FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification

    Authors: Yu Tian, Congcong Wen, Min Shi, Muhammad Muneeb Afzal, Hao Huang, Muhammad Osama Khan, Yan Luo, Yi Fang, Mengyu Wang

    Abstract: Addressing fairness in artificial intelligence (AI), particularly in medical AI, is crucial for ensuring equitable healthcare outcomes. Recent efforts to enhance fairness have introduced new methodologies and datasets in medical AI. However, the fairness issue under the setting of domain transfer is almost unexplored, while it is common that clinics rely on different imaging technologies (e.g., di…

    Submitted 18 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Codes and datasets are available at https://github.com/Harvard-Ophthalmology-AI-Lab/FairDomain

  10. arXiv:2407.08507  [pdf, other]

    cs.CV

    Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement

    Authors: Zijie Yue, Miaojing Shi, Hanli Wang, Shuai Ding, Qijun Chen, Shanlin Yang

    Abstract: Facial video-based remote physiological measurement is a promising research area for detecting human vital signs (e.g., heart rate, respiration frequency) in a non-contact way. Conventional approaches are mostly supervised learning, requiring extensive collections of facial videos and synchronously recorded photoplethysmography (PPG) signals. To tackle it, self-supervised learning has recently gai…

    Submitted 11 July, 2024; originally announced July 2024.

  11. arXiv:2406.16039  [pdf, other]

    cs.CV

    CholecInstanceSeg: A Tool Instance Segmentation Dataset for Laparoscopic Surgery

    Authors: Oluwatosin Alabi, Ko Ko Zayar Toe, Zijian Zhou, Charlie Budd, Nicholas Raison, Miaojing Shi, Tom Vercauteren

    Abstract: In laparoscopic and robotic surgery, precise tool instance segmentation is an essential technology for advanced computer-assisted interventions. Although publicly available procedures of routine surgeries exist, they often lack comprehensive annotations for tool instance segmentation. Additionally, the majority of standard datasets for tool segmentation are derived from porcine (pig) surgeries. To…

    Submitted 23 June, 2024; originally announced June 2024.

  12. CMDS: Cross-layer Dataflow Optimization for DNN Accelerators Exploiting Multi-bank Memories

    Authors: Man Shi, Steven Colleman, Charlotte VanDeMieroop, Antony Joseph, Maurice Meijer, Wim Dehaene, Marian Verhelst

    Abstract: Deep neural networks (DNN) use a wide range of network topologies to achieve high accuracy within diverse applications. This model diversity makes it impossible to identify a single "dataflow" (execution schedule) to perform optimally across all possible layers and network topologies. Several frameworks support the exploration of the best dataflow for a given DNN layer and hardware. However, switc…

    Submitted 14 June, 2024; originally announced June 2024.

    Journal ref: 2023 24th International Symposium on Quality Electronic Design (ISQED)

  13. arXiv:2406.04595  [pdf, other]

    cs.SD cs.CL eess.AS

    Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis

    Authors: Xintong Wang, Mingqian Shi, Ye Wang

    Abstract: Mispronunciation Detection and Diagnosis (MDD) systems, leveraging Automatic Speech Recognition (ASR), face two main challenges in Mandarin Chinese: 1) The two-stage models create an information gap between the phoneme or tone classification stage and the MDD stage. 2) The scarcity of Mandarin MDD datasets limits model training. In this paper, we introduce a stateless RNN-T model for Mandarin MDD,…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  14. arXiv:2406.00545  [pdf, ps, other]

    cs.CV cs.AI

    Memory-guided Network with Uncertainty-based Feature Augmentation for Few-shot Semantic Segmentation

    Authors: Xinyue Chen, Miaojing Shi

    Abstract: The performance of supervised semantic segmentation methods highly relies on the availability of large-scale training data. To alleviate this dependence, few-shot semantic segmentation (FSS) is introduced to leverage the model trained on base classes with sufficient data into the segmentation of novel classes with few data. FSS methods face the challenge of model generalization on novel classes du…

    Submitted 9 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted to IEEE International Conference on Multimedia and Expo (ICME) 2024 as an oral presentation

  15. arXiv:2405.17403  [pdf, other]

    cs.LG cs.AI

    A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

    Authors: Kai Wang, Yukun Zhou, Mingjia Shi, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Hanwang Zhang, Yang You

    Abstract: Training diffusion models is always a computation-intensive task. In this paper, we introduce a novel speed-up method for diffusion model training, called, which is based on a closer look at time steps. Our key findings are: i) Time steps can be empirically divided into acceleration, deceleration, and convergence areas based on the process increment. ii) These time steps are imbalanced, with many…

    Submitted 27 May, 2024; originally announced May 2024.

    ACM Class: I.2

  16. arXiv:2405.11690  [pdf, other]

    cs.CV

    InterAct: Capture and Modelling of Realistic, Expressive and Interactive Activities between Two Persons in Daily Scenarios

    Authors: Yinghao Huang, Leo Ho, Dafei Qin, Mingyi Shi, Taku Komura

    Abstract: We address the problem of accurate capture and expressive modelling of interactive behaviors happening between two persons in daily scenarios. Different from previous works which either only consider one person or focus on conversational gestures, we propose to simultaneously model the activities of two persons, and target objective-driven, dynamic, and coherent interactions which often span long…

    Submitted 27 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: The first two authors contributed equally to this work

  17. arXiv:2405.02580  [pdf, other]

    cs.SE cs.AI

    PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation

    Authors: Ye Liu, Yue Xue, Daoyuan Wu, Yuqiang Sun, Yi Li, Miaolei Shi, Yang Liu

    Abstract: With recent advances in large language models (LLMs), this paper explores the potential of leveraging state-of-the-art LLMs, such as GPT-4, to transfer existing human-written properties (e.g., those from Certora auditing reports) and automatically generate customized properties for unknown code. To this end, we embed existing properties into a vector database and retrieve a reference property for…

    Submitted 4 May, 2024; originally announced May 2024.

  18. arXiv:2405.01533  [pdf, other]

    cs.CV

    OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

    Authors: Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez

    Abstract: The advances in multimodal large language models (MLLMs) have led to growing interests in LLM-based autonomous driving agents to leverage their strong reasoning capabilities. However, capitalizing on MLLMs' strong reasoning capabilities for improved planning behavior is challenging since planning requires full 3D situational awareness beyond 2D reasoning. To address this challenge, our work propos…

    Submitted 2 May, 2024; originally announced May 2024.

  19. arXiv:2404.17528  [pdf, other]

    cs.CV

    Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields

    Authors: Tianqi Liu, Xinyi Ye, Min Shi, Zihao Huang, Zhiyu Pan, Zhan Peng, Zhiguo Cao

    Abstract: Generalizable NeRF aims to synthesize novel views for unseen scenes. Common practices involve constructing variance-based cost volumes for geometry reconstruction and encoding 3D descriptors for decoding novel views. However, existing methods show limited generalization ability in challenging conditions due to inaccurate geometry, sub-optimal descriptors, and decoding strategies. We address these…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024. Project page: https://gefucvpr24.github.io

  20. arXiv:2404.15602  [pdf, other]

    cs.RO

    Decentralized Multi-Agent Trajectory Planning in Dynamic Environments with Spatiotemporal Occupancy Grid Maps

    Authors: Siyuan Wu, Gang Chen, Moji Shi, Javier Alonso-Mora

    Abstract: This paper proposes a decentralized trajectory planning framework for the collision avoidance problem of multiple micro aerial vehicles (MAVs) in environments with static and dynamic obstacles. The framework utilizes spatiotemporal occupancy grid maps (SOGM), which forecast the occupancy status of neighboring space in the near future, as the environment representation. Based on this representation…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 6 pages, 6 figures, accepted to the 2024 IEEE International Conference on Robotics and Automation (ICRA2024)

  21. arXiv:2404.15121  [pdf, other]

    cs.GR cs.AI cs.CV

    Taming Diffusion Probabilistic Models for Character Control

    Authors: Rui Chen, Mingyi Shi, Shaoli Huang, Ping Tan, Taku Komura, Xuelin Chen

    Abstract: We present a novel character control framework that effectively utilizes motion diffusion probabilistic models to generate high-quality and diverse character animations, responding in real-time to a variety of dynamic user-supplied control signals. At the heart of our method lies a transformer-based Conditional Autoregressive Motion Diffusion Model (CAMDM), which takes as input the character's his…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGGRAPH 2024 (Conference Track). Project page and source codes: https://aiganimation.github.io/CAMDM/

  22. arXiv:2404.14848  [pdf, other]

    cs.RO

    Evaluating Dynamic Environment Difficulty for Obstacle Avoidance Benchmarking

    Authors: Moji Shi, Gang Chen, Álvaro Serra Gómez, Siyuan Wu, Javier Alonso-Mora

    Abstract: Dynamic obstacle avoidance is a popular research topic for autonomous systems, such as micro aerial vehicles and service robots. Accurately evaluating the performance of dynamic obstacle avoidance methods necessitates the establishment of a metric to quantify the environment's difficulty, a crucial aspect that remains unexplored. In this paper, we propose four metrics to measure the difficulty of…

    Submitted 23 April, 2024; originally announced April 2024.

  23. arXiv:2404.07612  [pdf, ps, other]

    cs.CY

    Measuring Geographic Diversity of Foundation Models with a Natural Language--based Geo-guessing Experiment on GPT-4

    Authors: Zilong Liu, Krzysztof Janowicz, Kitty Currier, Meilin Shi

    Abstract: Generative AI based on foundation models provides a first glimpse into the world represented by machines trained on vast amounts of multimodal data ingested by these models during training. If we consider the resulting models as knowledge bases in their own right, this may open up new avenues for understanding places through the lens of machines. In this work, we adopt this thinking and select GPT…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Short paper accepted by AGILE 2024 conference (https://agile-gi.eu/conference-2024)

  24. SCAResNet: A ResNet Variant Optimized for Tiny Object Detection in Transmission and Distribution Towers

    Authors: Weile Li, Muqing Shi, Zhonghua Hong

    Abstract: Traditional deep learning-based object detection networks often resize images during the data preprocessing stage to achieve a uniform size and scale in the feature map. Resizing is done to facilitate model propagation and fully connected classification. However, resizing inevitably leads to object deformation and loss of valuable information in the images. This drawback becomes particularly prono…

    Submitted 5 April, 2024; originally announced April 2024.

  25. arXiv:2404.02471  [pdf, other]

    cs.IT

    Some bounds on the cardinality of the $b$-symbol weight spectrum of codes

    Authors: Hongwei Zhu, Shitao Li, Minjia Shi, Shu-Tao Xia, Patrick Sole

    Abstract: The size of the Hamming distance spectrum of a code has received great attention in recent research. The main objective of this paper is to extend these significant theories to the $b$-symbol distance spectrum. We examine this question for various types of codes, including unrestricted codes, additive codes, linear codes, and cyclic codes, successively. For the first three cases, we determine the…

    Submitted 3 April, 2024; originally announced April 2024.

  26. arXiv:2404.01727  [pdf, other]

    cs.RO cs.CV

    Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge

    Authors: Haoxiang Ma, Modi Shi, Boyang Gao, Di Huang

    Abstract: We focus on the generalization ability of the 6-DoF grasp detection method in this paper. While learning-based grasp detection methods can predict grasp poses for unseen objects using the grasp distribution learned from the training set, they often exhibit a significant performance drop when encountering objects with diverse shapes and structures. To enhance the grasp detection methods' generaliza…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  27. arXiv:2403.19949  [pdf, other]

    cs.CV

    FairCLIP: Harnessing Fairness in Vision-Language Learning

    Authors: Yan Luo, Min Shi, Muhammad Osama Khan, Muhammad Muneeb Afzal, Hao Huang, Shuaihang Yuan, Yu Tian, Luo Song, Ava Kouhana, Tobias Elze, Yi Fang, Mengyu Wang

    Abstract: Fairness is a critical concern in deep learning, especially in healthcare, where these models influence diagnoses and treatment decisions. Although fairness has been investigated in the vision-only domain, the fairness of medical vision-language (VL) models remains unexplored due to the scarcity of medical VL datasets for studying fairness. To bridge this research gap, we introduce the first fair…

    Submitted 5 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  28. arXiv:2403.11511  [pdf, other]

    cs.RO cs.CV

    Sim-to-Real Grasp Detection with Global-to-Local RGB-D Adaptation

    Authors: Haoxiang Ma, Ran Qin, Modi Shi, Boyang Gao, Di Huang

    Abstract: This paper focuses on the sim-to-real issue of RGB-D grasp detection and formulates it as a domain adaptation problem. In this case, we present a global-to-local method to address hybrid domain gaps in RGB and depth data and insufficient multi-modal feature alignment. First, a self-supervised rotation pre-training strategy is adopted to deliver robust initialization for RGB and depth networks. We…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted at ICRA 2024

  29. arXiv:2403.06728  [pdf, other]

    cs.CV

    Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning

    Authors: Zijian Zhou, Miaojing Shi, Meng Wei, Oluwatosin Alabi, Zijie Yue, Tom Vercauteren

    Abstract: Radiology report generation (RRG) has attracted significant attention due to its potential to reduce the workload of radiologists. Current RRG approaches are still unsatisfactory against clinical standards. This paper introduces a novel RRG method, LM-RRG, that integrates large models (LMs) with clinical quality reinforcement learning to generate accurate and comprehensive chest X-ray rad…

    Submitted 11 March, 2024; originally announced March 2024.

  30. arXiv:2403.02234  [pdf, other]

    cs.CV

    3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

    Authors: Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Shuai Yang, Tengfei Wang, Liang Pan, Dahua Lin, Ziwei Liu

    Abstract: We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors. The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping. The sec… ▽ More

    Submitted 6 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Code available at https://github.com/3DTopia/3DTopia

  31. arXiv:2402.11410  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    An Elementary Predictor Obtaining $2\sqrt{T}$ Distance to Calibration

    Authors: Eshwar Ram Arunachaleswaran, Natalie Collina, Aaron Roth, Mirah Shi

    Abstract: Blasiok et al. [2023] proposed distance to calibration as a natural measure of calibration error that unlike expected calibration error (ECE) is continuous. Recently, Qiao and Zheng [2024] gave a non-constructive argument establishing the existence of an online predictor that can obtain $O(\sqrt{T})$ distance to calibration in the adversarial setting, which is known to be impossible for ECE. They…

    Submitted 17 February, 2024; originally announced February 2024.

  32. arXiv:2402.11073  [pdf, other]

    cs.CL cs.AI

    AFaCTA: Assisting the Annotation of Factual Claim Detection with Reliable LLM Annotators

    Authors: Jingwei Ni, Minjing Shi, Dominik Stammbach, Mrinmaya Sachan, Elliott Ash, Markus Leippold

    Abstract: With the rise of generative AI, automated fact-checking methods to combat misinformation are becoming more and more important. However, factual claim detection, the first step in a fact-checking pipeline, suffers from two key issues that limit its scalability and generalizability: (1) inconsistency in definitions of the task and what a claim is, and (2) the high cost of manual annotation. To addre…

    Submitted 2 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: ACL2024 Main Conference

  33. arXiv:2402.08753  [pdf, ps, other]

    cs.GT cs.LG

    Forecasting for Swap Regret for All Downstream Agents

    Authors: Aaron Roth, Mirah Shi

    Abstract: We study the problem of making predictions so that downstream agents who best respond to them will be guaranteed diminishing swap regret, no matter what their utility functions are. It has been known since Foster and Vohra (1997) that agents who best-respond to calibrated forecasts have no swap regret. Unfortunately, the best known algorithms for guaranteeing calibrated forecasts in sequential adv…

    Submitted 15 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  34. AdaTreeFormer: Few Shot Domain Adaptation for Tree Counting from a Single High-Resolution Image

    Authors: Hamed Amini Amirkolaee, Miaojing Shi, Lianghua He, Mark Mulligan

    Abstract: The process of estimating and counting tree density using only a single aerial or satellite image is a difficult task in the fields of photogrammetry and remote sensing. However, it plays a crucial role in the management of forests. The huge variety of trees in varied topography severely hinders tree counting models to perform well. The purpose of this paper is to propose a framework that is learn…

    Submitted 30 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted in ISPRS Journal of Photogrammetry and Remote Sensing

  35. arXiv:2401.16185  [pdf, other]

    cs.CR cs.AI cs.SE

    LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning

    Authors: Yuqiang Sun, Daoyuan Wu, Yue Xue, Han Liu, Wei Ma, Lyuye Zhang, Miaolei Shi, Yang Liu

    Abstract: Large language models (LLMs) have demonstrated significant potential for many downstream tasks, including those requiring human-level intelligence, such as vulnerability detection. However, recent attempts to use LLMs for vulnerability detection are still preliminary, as they lack an in-depth understanding of a subject LLM's vulnerability reasoning capability -- whether it originates from the mode…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: This is a technical report by Nanyang Technological University

  36. arXiv:2401.08256  [pdf, other]

    cs.CV

    Multitask Learning in Minimally Invasive Surgical Vision: A Review

    Authors: Oluwatosin Alabi, Tom Vercauteren, Miaojing Shi

    Abstract: Minimally invasive surgery (MIS) has revolutionized many procedures and led to reduced recovery time and risk of patient injury. However, MIS poses additional complexity and burden on surgical teams. Data-driven surgical vision algorithms are thought to be key building blocks in the development of future MIS systems with improved autonomy. Recent advancements in machine learning and computer visio…

    Submitted 16 January, 2024; originally announced January 2024.

  37. arXiv:2312.17264  [pdf, other]

    cs.CL cs.IR

    ESGReveal: An LLM-based approach for extracting structured data from ESG reports

    Authors: Yi Zou, Mengying Shi, Zhongjie Chen, Zhu Deng, ZongXiong Lei, Zihan Zeng, Shiming Yang, HongXiang Tong, Lei Xiao, Wenwen Zhou

    Abstract: ESGReveal is an innovative method proposed for efficiently extracting and analyzing Environmental, Social, and Governance (ESG) data from corporate reports, catering to the critical need for reliable ESG information retrieval. This approach utilizes Large Language Models (LLM) enhanced with Retrieval Augmented Generation (RAG) techniques. The ESGReveal system includes an ESG metadata module for ta…

    Submitted 25 December, 2023; originally announced December 2023.

  38. arXiv:2312.09482  [pdf, ps, other]

    cs.IT

    An open problem and a conjecture on binary linear complementary pairs of codes

    Authors: Shitao Li, Minjia Shi, San Ling

    Abstract: The existence of $q$-ary linear complementary pairs (LCPs) of codes with $q > 2$ has been completely characterized so far. This paper gives a characterization for the existence of binary LCPs of codes. As a result, we solve an open problem proposed by Carlet et al. (IEEE Trans. Inf. Theory 65(3): 1694-1704, 2019) and a conjecture proposed by Choi et al. (Cryptogr. Commun. 15(2): 469-486, 2023).

    Submitted 14 December, 2023; originally announced December 2023.

  39. arXiv:2312.01220  [pdf, other]

    cs.CV

    Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation

    Authors: Zhipeng Du, Miaojing Shi, Jiankang Deng

    Abstract: Detecting objects in low-light scenarios presents a persistent challenge, as detectors trained on well-lit data exhibit significant performance degradation on low-light data due to low visibility. Previous methods mitigate this issue by exploring image enhancement or object detection techniques with real low-light image datasets. However, the progress is impeded by the inherent difficulties about…

    Submitted 27 March, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR 2024

  40. arXiv:2312.01151  [pdf]

    cs.CY cs.CL cs.SC

    Here Is Not There: Measuring Entailment-Based Trajectory Similarity for Location-Privacy Protection and Beyond

    Authors: Zilong Liu, Krzysztof Janowicz, Kitty Currier, Meilin Shi, Jinmeng Rao, Song Gao, Ling Cai, Anita Graser

    Abstract: While the paths humans take play out in social as well as physical space, measures to describe and compare their trajectories are carried out in abstract, typically Euclidean, space. When these measures are applied to trajectories of actual individuals in an application area, alterations that are inconsequential in abstract space may suddenly become problematic once overlaid with geographic realit…

    Submitted 2 December, 2023; originally announced December 2023.

  41. arXiv:2311.16964  [pdf, other]

    cond-mat.dis-nn cond-mat.mtrl-sci cs.LG

    Machine learning force-field models for metallic spin glass

    Authors: Menglin Shi, Sheng Zhang, Gia-Wei Chern

    Abstract: Metallic spin glass systems, such as dilute magnetic alloys, are characterized by randomly distributed local moments coupled to each other through a long-range electron-mediated effective interaction. We present a scalable machine learning (ML) framework for dynamical simulations of metallic spin glasses. A Behler-Parrinello type neural-network model, based on the principle of locality, is develop…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 12 pages, 5 figures

  42. arXiv:2311.16492  [pdf, other]

    cs.CV

    VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation

    Authors: Zijian Zhou, Miaojing Shi, Holger Caesar

    Abstract: Panoptic Scene Graph Generation (PSG) aims at achieving a comprehensive image understanding by simultaneously segmenting objects and predicting relations among objects. However, the long-tail problem among relations leads to unsatisfactory results in real-world applications. Prior methods predominantly rely on vision information or utilize limited language information, such as object or relation n…

    Submitted 19 June, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 22 pages, 9 figures

  43. arXiv:2311.02189  [pdf, other]

    cs.CV

    FairSeg: A Large-Scale Medical Image Segmentation Dataset for Fairness Learning Using Segment Anything Model with Fair Error-Bound Scaling

    Authors: Yu Tian, Min Shi, Yan Luo, Ava Kouhana, Tobias Elze, Mengyu Wang

    Abstract: Fairness in artificial intelligence models has gained significantly more attention in recent years, especially in the area of medicine, as fairness in medical models is critical to people's well-being and lives. High-quality medical fairness datasets are needed to promote fairness learning research. Existing medical fairness datasets are all for classification tasks, and no fairness datasets are a…

    Submitted 30 April, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: ICLR 2024; Code available at https://github.com/Harvard-Ophthalmology-AI-Lab/FairSeg

  44. arXiv:2311.00354  [pdf, ps, other]

    cs.CR

    Butson Hadamard matrices, bent sequences, and spherical codes

    Authors: Minjia Shi, Danni Lu, Andrés Armario, Ronan Egan, Ferruh Ozbudak, Patrick Solé

    Abstract: We explore a notion of bent sequence attached to the data consisting of an Hadamard matrix of order $n$ defined over the complex $q^{th}$ roots of unity, an eigenvalue of that matrix, and a Galois automorphism from the cyclotomic field of order $q.$ In particular we construct self-dual bent sequences for various $q\le 60$ and lengths $n\le 21.$ Computational construction methods comprise the resol…

    Submitted 1 November, 2023; originally announced November 2023.

  45. arXiv:2310.12511  [pdf, ps, other]

    cs.IT

    The weight enumerator polynomials of the lifted codes of the projective Solomon-Stiffler codes

    Authors: Minjia Shi, Shitao Li, Tor Helleseth

    Abstract: Determining the weight distribution of a code is an old and fundamental topic in coding theory that has been thoroughly studied. In 1977, Helleseth, Kløve, and Mykkeltveit presented a weight enumerator polynomial of the lifted code over $\mathbb{F}_{q^\ell}$ of a $q$-ary linear code with significant combinatorial properties, which can determine the support weight distribution of this linear code.…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: This manuscript was first submitted on September 9, 2022

  46. arXiv:2310.09183  [pdf, other]

    cs.LG cs.AI cs.DC

    PRIOR: Personalized Prior for Reactivating the Information Overlooked in Federated Learning

    Authors: Mingjia Shi, Yuhao Zhou, Kai Wang, Huaizheng Zhang, Shudong Huang, Qing Ye, Jiangcheng Lv

    Abstract: Classical federated learning (FL) enables training machine learning models without sharing data for privacy preservation, but heterogeneous data characteristics degrade the performance of the localized model. Personalized FL (PFL) addresses this by synthesizing personalized models from a global model via training on local data. Such a global model may overlook the specific information that the cli…

    Submitted 10 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

    MSC Class: 68T07; ACM Class: I.2.11

  47. arXiv:2310.07355  [pdf, other]

    cs.CV cs.LG

    IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training

    Authors: Che Liu, Sibo Cheng, Miaojing Shi, Anand Shah, Wenjia Bai, Rossella Arcucci

    Abstract: In the field of medical Vision-Language Pre-training (VLP), significant efforts have been devoted to deriving text and image features from both clinical reports and associated medical images. However, most existing methods may have overlooked the opportunity in leveraging the inherent hierarchical structure of clinical reports, which are generally split into 'findings' for descriptive content and…

    Submitted 1 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Under Review

  48. arXiv:2310.04863  [pdf, other]

    cs.SD eess.AS

    SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR

    Authors: Yangze Li, Fan Yu, Yuhao Liang, Pengcheng Guo, Mohan Shi, Zhihao Du, Shiliang Zhang, Lei Xie

    Abstract: Joint modeling of multi-speaker ASR and speaker diarization has recently shown promising results in speaker-attributed automatic speech recognition (SA-ASR). Although being able to obtain state-of-the-art (SOTA) performance, most of the studies are based on an autoregressive (AR) decoder which generates tokens one-by-one and results in a large real-time factor (RTF). To speed up inference, we intro…

    Submitted 7 October, 2023; originally announced October 2023.

  49. arXiv:2310.02492  [pdf, other]

    cs.CV

    FairVision: Equitable Deep Learning for Eye Disease Screening via Fair Identity Scaling

    Authors: Yan Luo, Muhammad Osama Khan, Yu Tian, Min Shi, Zehao Dou, Tobias Elze, Yi Fang, Mengyu Wang

    Abstract: Equity in AI for healthcare is crucial due to its direct impact on human well-being. Despite advancements in 2D medical imaging fairness, the fairness of 3D models remains underexplored, hindered by the small sizes of 3D fairness datasets. Since 3D imaging surpasses 2D imaging in SOTA clinical care, it is critical to understand the fairness of these 3D models. To address this research gap, we cond…

    Submitted 12 April, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

  50. arXiv:2309.17218  [pdf, other]

    cs.CV

    When Epipolar Constraint Meets Non-local Operators in Multi-View Stereo

    Authors: Tianqi Liu, Xinyi Ye, Weiyue Zhao, Zhiyu Pan, Min Shi, Zhiguo Cao

    Abstract: Learning-based multi-view stereo (MVS) methods heavily rely on feature matching, which requires distinctive and descriptive representations. An effective solution is to apply non-local feature aggregation, e.g., Transformer. Albeit useful, these techniques introduce heavy computation overheads for MVS. Each pixel densely attends to the whole image. In contrast, we propose to constrain non-local f…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: ICCV 2023