
Showing 1–50 of 89 results for author: Shi, P

Searching in archive cs.
  1. arXiv:2407.09862  [pdf, other]

    cs.CV

    ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency

    Authors: Shaocheng Yan, Pengcheng Shi, Jiayuan Li

    Abstract: Recent advances in point cloud registration mostly leverage geometric information. Although these methods have yielded promising results, they still struggle with problems of low overlap, thus limiting their practical usage. In this paper, we propose ML-SemReg, a plug-and-play point cloud registration framework that fully exploits semantic information. Our key insight is that mismatches can be cat…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  2. arXiv:2407.05361  [pdf, other]

    eess.AS cs.CL

    Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

    Authors: Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, Zhizheng Wu

    Abstract: Recently, speech generation models have made significant progress by using large-scale training data. However, the research community struggles to produce highly spontaneous and human-like speech due to the lack of large-scale, diverse, and spontaneous speech data. This paper presents Emilia, the first multilingual speech generation dataset from in-the-wild speech data, and Emilia-Pipe, the first op…

    Submitted 12 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: Fix typos

  3. arXiv:2407.01517  [pdf, other]

    eess.IV cs.CV cs.LG

    Centerline Boundary Dice Loss for Vascular Segmentation

    Authors: Pengcheng Shi, Jiesi Hu, Yanwu Yang, Zilve Gao, Wei Liu, Ting Ma

    Abstract: Vascular segmentation in medical imaging plays a crucial role in analysing morphological and functional assessments. Traditional methods, like the centerline Dice (clDice) loss, ensure topology preservation but falter in capturing geometric details, especially under translation and deformation. The combination of clDice with traditional Dice loss can lead to diameter imbalance, favoring larger ves…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: accepted by MICCAI 2024

  4. arXiv:2406.09601  [pdf, other]

    cs.CV

    Turns Out I'm Not Real: Towards Robust Detection of AI-Generated Videos

    Authors: Qingyuan Liu, Pengyuan Shi, Yun-Yun Tsai, Chengzhi Mao, Junfeng Yang

    Abstract: The impressive achievements of generative models in creating high-quality videos have raised concerns about digital integrity and privacy vulnerabilities. Recent works to combat Deepfakes videos have developed detectors that are highly accurate at identifying GAN-generated samples. However, the robustness of these detectors on diffusion-generated videos generated from video creation tools (e.g., S…

    Submitted 13 June, 2024; originally announced June 2024.

  5. arXiv:2403.10823  [pdf, other]

    cs.CV cs.AI

    VisionCLIP: An Med-AIGC based Ethical Language-Image Foundation Model for Generalizable Retina Image Analysis

    Authors: Hao Wei, Bowen Liu, Minqing Zhang, Peilun Shi, Wu Yuan

    Abstract: Generalist foundation model has ushered in newfound capabilities in medical domain. However, the contradiction between the growing demand for high-quality annotated data with patient privacy continues to intensify. The utilization of medical artificial intelligence generated content (Med-AIGC) as an inexhaustible resource repository arises as a potential solution to address the aforementioned chal…

    Submitted 16 March, 2024; originally announced March 2024.

  6. arXiv:2401.05738  [pdf, other]

    cs.CV

    LKCA: Large Kernel Convolutional Attention

    Authors: Chenghao Li, Boheng Zeng, Yi Lu, Pengbo Shi, Qingzi Chen, Jirui Liu, Lingyun Zhu

    Abstract: We revisit the relationship between attention mechanisms and large kernel ConvNets in visual transformers and propose a new spatial attention named Large Kernel Convolutional Attention (LKCA). It simplifies the attention operation by replacing it with a single large kernel convolution. LKCA combines the advantages of convolutional neural networks and visual transformers, possessing a large recepti…

    Submitted 5 February, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  7. arXiv:2312.17670  [pdf, other]

    cs.CV cs.LG q-bio.QM q-bio.TO

    Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA

    Authors: Kaiyuan Yang, Fabio Musio, Yihui Ma, Norman Juchler, Johannes C. Paetzold, Rami Al-Maskari, Luciano Höher, Hongwei Bran Li, Ibrahim Ethem Hamamci, Anjany Sekuboyina, Suprosanna Shit, Houjing Huang, Chinmay Prabhakar, Ezequiel de la Rosa, Diana Waldmannstetter, Florian Kofler, Fernando Navarro, Martin Menten, Ivan Ezhov, Daniel Rueckert, Iris Vos, Ynte Ruigrok, Birgitta Velthuis, Hugo Kuijf, Julien Hämmerli, et al. (59 additional authors not shown)

    Abstract: The Circle of Willis (CoW) is an important network of arteries connecting major circulations of the brain. Its vascular architecture is believed to affect the risk, severity, and clinical outcome of serious neuro-vascular diseases. However, characterizing the highly variable CoW anatomy is still a manual and time-consuming expert task. The CoW is usually imaged by two angiographic imaging modaliti…

    Submitted 29 April, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

    Comments: 24 pages, 11 figures, 9 tables. Summary Paper for the MICCAI TopCoW 2023 Challenge

  8. arXiv:2311.10331  [pdf, other]

    eess.IV cs.CV

    Leveraging Multimodal Fusion for Enhanced Diagnosis of Multiple Retinal Diseases in Ultra-wide OCTA

    Authors: Hao Wei, Peilun Shi, Guitao Bai, Minqing Zhang, Shuangle Li, Wu Yuan

    Abstract: Ultra-wide optical coherence tomography angiography (UW-OCTA) is an emerging imaging technique that offers significant advantages over traditional OCTA by providing an exceptionally wide scanning range of up to 24 x 20 $mm^{2}$, covering both the anterior and posterior regions of the retina. However, the currently accessible UW-OCTA datasets suffer from limited comprehensive hierarchical informati…

    Submitted 17 November, 2023; originally announced November 2023.

  9. arXiv:2311.03748  [pdf, other]

    cs.CL

    Unified Low-Resource Sequence Labeling by Sample-Aware Dynamic Sparse Finetuning

    Authors: Sarkar Snigdha Sarathi Das, Ranran Haoran Zhang, Peng Shi, Wenpeng Yin, Rui Zhang

    Abstract: Unified Sequence Labeling that articulates different sequence labeling problems such as Named Entity Recognition, Relation Extraction, Semantic Role Labeling, etc. in a generalized sequence-to-sequence format opens up the opportunity to make the maximum utilization of large language model knowledge toward structured prediction. Unfortunately, this requires formatting them into specialized augmente…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Accepted by EMNLP 2023

  10. arXiv:2311.01202  [pdf, other]

    cs.CV cs.AI

    Cross-Modal Information-Guided Network using Contrastive Learning for Point Cloud Registration

    Authors: Yifan Xie, Jihua Zhu, Shiqi Li, Pengcheng Shi

    Abstract: The majority of point cloud registration methods currently rely on extracting features from points. However, these methods are limited by their dependence on information obtained from a single modality of points, which can result in deficiencies such as inadequate perception of global features and a lack of texture information. Actually, humans can employ visual information learned from 2D images…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 8 pages, accepted by RAL 2023

  11. arXiv:2310.19306  [pdf, other]

    cond-mat.mtrl-sci cond-mat.stat-mech cs.LG

    A Planning-and-Exploring Approach to Extreme-Mechanics Force Fields

    Authors: Pengjie Shi, Zhiping Xu

    Abstract: Extreme mechanical processes such as strong lattice distortion and bond breakage during fracture are ubiquitous in nature and engineering, which often lead to catastrophic failure of structures. However, understanding the nucleation and growth of cracks is challenged by their multiscale characteristics spanning from atomic-level structures at the crack tip to the structural features where the load…

    Submitted 30 October, 2023; originally announced October 2023.

    Journal ref: Journal of Physics: Condensed Matter 36 (41), 415401, 2024

  12. arXiv:2310.10634  [pdf, other]

    cs.CL cs.AI

    OpenAgents: An Open Platform for Language Agents in the Wild

    Authors: Tianbao Xie, Fan Zhou, Zhoujun Cheng, Peng Shi, Luoxuan Weng, Yitao Liu, Toh Jing Hua, Junning Zhao, Qian Liu, Che Liu, Leo Z. Liu, Yiheng Xu, Hongjin Su, Dongchan Shin, Caiming Xiong, Tao Yu

    Abstract: Language agents show potential in being capable of utilizing natural language for varied and intricate tasks in diverse environments, particularly when built upon large language models (LLMs). Current language agent frameworks aim to facilitate the construction of proof-of-concept language agents while neglecting the non-expert user access to agents and paying little attention to application-level…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 34 pages, 8 figures

  13. arXiv:2310.04992  [pdf, other]

    eess.IV cs.CV

    VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

    Authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun, Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu, Aiguo Lv, et al. (17 additional authors not shown)

    Abstract: We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassifi…

    Submitted 7 October, 2023; originally announced October 2023.

  14. arXiv:2309.15493  [pdf, other]

    cs.CV

    CauDR: A Causality-inspired Domain Generalization Framework for Fundus-based Diabetic Retinopathy Grading

    Authors: Hao Wei, Peilun Shi, Juzheng Miao, Minqing Zhang, Guitao Bai, Jianing Qiu, Furui Liu, Wu Yuan

    Abstract: Diabetic retinopathy (DR) is the most common diabetic complication, which usually leads to retinal damage, vision loss, and even blindness. A computer-aided DR grading system has a significant impact on helping ophthalmologists with rapid screening and diagnosis. Recent advances in fundus photography have precipitated the development of novel retinal imaging cameras and their subsequent implementa…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: 13 pages, 9 figures

  15. arXiv:2309.11669  [pdf, other]

    cs.CL

    Construction of Paired Knowledge Graph-Text Datasets Informed by Cyclic Evaluation

    Authors: Ali Mousavi, Xin Zhan, He Bai, Peng Shi, Theo Rekatsinas, Benjamin Han, Yunyao Li, Jeff Pound, Josh Susskind, Natalie Schluter, Ihab Ilyas, Navdeep Jaitly

    Abstract: Datasets that pair Knowledge Graphs (KG) and text together (KG-T) can be used to train forward and reverse neural models that generate text from KG and vice versa. However models trained on datasets where KG and text pairs are not equivalent can suffer from more hallucination and poorer recall. In this paper, we verify this empirically by generating datasets with different levels of noise and find…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 16 pages

  16. arXiv:2309.09427  [pdf, other]

    cs.RO cs.CV

    TransTouch: Learning Transparent Objects Depth Sensing Through Sparse Touches

    Authors: Liuyu Bian, Pengyang Shi, Weihang Chen, Jing Xu, Li Yi, Rui Chen

    Abstract: Transparent objects are common in daily life. However, depth sensing for transparent objects remains a challenging problem. While learning-based methods can leverage shape priors to improve the sensing quality, the labor-intensive data collection in the real world and the sim-to-real domain gap restrict these methods' scalability. In this paper, we propose a method to finetune a stereo network wit…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: Accepted to the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  17. arXiv:2308.15126  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Evaluation and Analysis of Hallucination in Large Vision-Language Models

    Authors: Junyang Wang, Yiyang Zhou, Guohai Xu, Pengcheng Shi, Chenlin Zhao, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Jihua Zhu, Jitao Sang, Haoyu Tang

    Abstract: Large Vision-Language Models (LVLMs) have recently achieved remarkable success. However, LVLMs are still plagued by the hallucination problem, which limits the practicality in many scenarios. Hallucination refers to the information of LVLMs' responses that does not exist in the visual input, which poses potential risks of substantial consequences. There has been limited work studying hallucination…

    Submitted 10 October, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: 11 pages, 5 figures

  18. arXiv:2308.13365  [pdf, ps, other]

    cs.SD eess.AS

    Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder

    Authors: Xuyuan Li, Zengqiang Shang, Peiyang Shi, Hua Hua, Ta Li, Pengyuan Zhang

    Abstract: Neural networks have been able to generate high-quality single-sentence speech. However, it remains a challenge concerning audio-book speech synthesis due to the intra-paragraph correlation of semantic and acoustic features as well as variable styles. In this paper, we propose a highly expressive paragraph speech synthesis system with a multi-step variational autoencoder, called EP-MSTTS. EP-MSTTS…

    Submitted 11 June, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

    Comments: accepted at Interspeech 2024

  19. Rotation-Invariant Completion Network

    Authors: Yu Chen, Pengcheng Shi

    Abstract: Real-world point clouds usually suffer from incompleteness and display different poses. While current point cloud completion methods excel in reproducing complete point clouds with consistent poses as seen in the training set, their performance tends to be unsatisfactory when handling point clouds with diverse poses. We propose a network named Rotation-Invariant Completion Network (RICNet), which…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: 12 pages, accepted to PRCV 2023 (The 6th Chinese Conference on Pattern Recognition and Computer Vision)

  20. arXiv:2308.09894  [pdf, other]

    cs.CV

    Semantic-Human: Neural Rendering of Humans from Monocular Video with Human Parsing

    Authors: Jie Zhang, Pengcheng Shi, Zaiwang Gu, Yiyang Zhou, Zhi Wang

    Abstract: The neural rendering of humans is a topic of great research significance. However, previous works mostly focus on achieving photorealistic details, neglecting the exploration of human parsing. Additionally, classical semantic work are all limited in their ability to efficiently represent fine results in complex motions. Human parsing is inherently related to radiance reconstruction, as similar app…

    Submitted 18 August, 2023; originally announced August 2023.

  21. arXiv:2308.09364  [pdf, other]

    cs.CV

    Overlap Bias Matching is Necessary for Point Cloud Registration

    Authors: Pengcheng Shi, Jie Zhang, Haozhe Cheng, Junyang Wang, Yiyang Zhou, Chenlin Zhao, Jihua Zhu

    Abstract: Point cloud registration is a fundamental problem in many domains. Practically, the overlap between point clouds to be registered may be relatively small. Most unsupervised methods lack effective initial evaluation of overlap, leading to suboptimal registration accuracy. To address this issue, we propose an unsupervised network Overlap Bias Matching Network (OBMNet) for partial point cloud registr…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: arXiv admin note: text overlap with arXiv:2202.11292 by other authors

  22. arXiv:2307.12507  [pdf, other]

    cs.CL cs.AI cs.CR cs.CY

    Gradient-Based Word Substitution for Obstinate Adversarial Examples Generation in Language Models

    Authors: Yimu Wang, Peng Shi, Hongyang Zhang

    Abstract: In this paper, we study the problem of generating obstinate (over-stability) adversarial examples by word substitution in NLP, where input text is meaningfully changed but the model's prediction does not, even though it should. Previous word substitution approaches have predominantly focused on manually designed antonym-based strategies for generating obstinate adversarial examples, which hinders…

    Submitted 17 August, 2023; v1 submitted 23 July, 2023; originally announced July 2023.

    Comments: 19 pages

  23. arXiv:2307.11783  [pdf]

    cs.RO cs.CV eess.IV

    A novel integrated method of detection-grasping for specific object based on the box coordinate matching

    Authors: Zongmin Liu, Jirui Wang, Jie Li, Zufeng Li, Kai Ren, Peng Shi

    Abstract: To better care for the elderly and disabled, it is essential for service robots to have an effective fusion method of object detection and grasp estimation. However, limited research has been observed on the combination of object detection and grasp estimation. To overcome this technical difficulty, a novel integrated method of detection-grasping for specific object based on the box coordinate mat…

    Submitted 20 July, 2023; originally announced July 2023.

  24. arXiv:2307.11609  [pdf, other]

    quant-ph cond-mat.str-el cs.LG

    Persistent Ballistic Entanglement Spreading with Optimal Control in Quantum Spin Chains

    Authors: Ying Lu, Pei Shi, Xiao-Han Wang, Jie Hu, Shi-Ju Ran

    Abstract: Entanglement propagation provides a key routine to understand quantum many-body dynamics in and out of equilibrium. In this work, we uncover that the "variational entanglement-enhancing" field (VEEF) robustly induces a persistent ballistic spreading of entanglement in quantum spin chains. The VEEF is time dependent, and is optimally controlled to maximize the bipartite entanglement entropy (EE)…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: 5 pages, 4 figures

  25. arXiv:2306.10561  [pdf, other]

    cs.RO

    LiDAR-Based Place Recognition For Autonomous Driving: A Survey

    Authors: Yongjun Zhang, Pengcheng Shi, Jiayuan Li

    Abstract: LiDAR-based place recognition (LPR) plays a pivotal role in autonomous driving, which assists Simultaneous Localization and Mapping (SLAM) systems in reducing accumulated errors and achieving reliable localization. However, existing reviews predominantly concentrate on visual place recognition (VPR) methods. Despite the recent remarkable progress in LPR, to the best of our knowledge, there is no d…

    Submitted 29 July, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

    Comments: 26 pages, 13 figures, 5 tables

  26. arXiv:2305.15911  [pdf, other]

    eess.IV cs.CV

    NexToU: Efficient Topology-Aware U-Net for Medical Image Segmentation

    Authors: Pengcheng Shi, Xutao Guo, Yanwu Yang, Chenfei Ye, Ting Ma

    Abstract: Convolutional neural networks (CNN) and Transformer variants have emerged as the leading medical image segmentation backbones. Nonetheless, due to their limitations in either preserving global image context or efficiently processing irregular shapes in visual objects, these backbones struggle to effectively integrate information from diverse anatomical regions and reduce inter-individual variabili…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: 13 pages, 6 figures

  27. arXiv:2305.09132  [pdf, other]

    cs.CV

    DualGenerator: Information Interaction-based Generative Network for Point Cloud Completion

    Authors: Pengcheng Shi, Haozhe Cheng, Xu Han, Yiyang Zhou, Jihua Zhu

    Abstract: Point cloud completion estimates complete shapes from incomplete point clouds to obtain higher-quality point cloud data. Most existing methods only consider global object features, ignoring spatial and semantic information of adjacent points. They cannot distinguish structural information well between different object parts, and the robustness of models is poor. To tackle these challenges, we prop…

    Submitted 7 December, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

  28. arXiv:2304.14178  [pdf, other]

    cs.CL cs.CV cs.LG

    mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

    Authors: Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

    Abstract: Large language models (LLMs) have demonstrated impressive zero-shot abilities on a variety of open-ended tasks, while recent research has also explored the use of LLMs for multi-modal generation. In this study, we introduce mPLUG-Owl, a novel training paradigm that equips LLMs with multi-modal abilities through modularized learning of foundation LLM, a visual knowledge module, and a visual abstrac…

    Submitted 29 March, 2024; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: Working in Process

  29. Generalist Vision Foundation Models for Medical Imaging: A Case Study of Segment Anything Model on Zero-Shot Medical Segmentation

    Authors: Peilun Shi, Jianing Qiu, Sai Mu Dalike Abaxi, Hao Wei, Frank P. -W. Lo, Wu Yuan

    Abstract: In this paper, we examine the recent Segment Anything Model (SAM) on medical images, and report both quantitative and qualitative zero-shot segmentation results on nine medical image segmentation benchmarks, covering various imaging modalities, such as optical coherence tomography (OCT), magnetic resonance imaging (MRI), and computed tomography (CT), as well as different applications including der…

    Submitted 5 June, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: Published in Diagnostics

    Journal ref: Diagnostics 2023

  30. arXiv:2303.17719  [pdf, other]

    cs.CV cs.LG

    Why is the winner the best?

    Authors: Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Sharib Ali, Vincent Andrearczyk, Marc Aubreville, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano, Jorge Bernal, Sebastian Bodenstedt, Alessandro Casella, Veronika Cheplygina, Marie Daum, Marleen de Bruijne, Adrien Depeursinge, Reuben Dorent, Jan Egger, David G. Ellis, Sandy Engelhardt, Melanie Ganz, et al. (100 additional authors not shown)

    Abstract: International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To addre…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: accepted to CVPR 2023

  31. Large AI Models in Health Informatics: Applications, Challenges, and the Future

    Authors: Jianing Qiu, Lin Li, Jiankai Sun, Jiachuan Peng, Peilun Shi, Ruiyang Zhang, Yinzhao Dong, Kyle Lam, Frank P. -W. Lo, Bo Xiao, Wu Yuan, Ningli Wang, Dong Xu, Benny Lo

    Abstract: Large AI models, or foundation models, are models recently emerging with massive scales both parameter-wise and data-wise, the magnitudes of which can reach beyond billions. Once pretrained, large AI models demonstrate impressive performance in various downstream tasks. A prime example is ChatGPT, whose capability has compelled people's imagination about the far-reaching influence that large AI mo…

    Submitted 24 September, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: This article has been accepted for publication in IEEE Journal of Biomedical and Health Informatics

    Journal ref: JBHI, 2023

  32. arXiv:2303.00319  [pdf, other]

    cs.CV

    RIFT2: Speeding-up RIFT with A New Rotation-Invariance Technique

    Authors: Jiayuan Li, Pengcheng Shi, Qingwu Hu, Yongjun Zhang

    Abstract: Multimodal image matching is an important prerequisite for multisource image information fusion. Compared with the traditional matching problem, multimodal feature matching is more challenging due to the severe nonlinear radiation distortion (NRD). Radiation-variation insensitive feature transform (RIFT) \cite{li2019rift} has shown very good robustness to NRD and become a baseline method in multim…

    Submitted 1 March, 2023; originally announced March 2023.

  33. arXiv:2302.13339  [pdf, other]

    cs.LG

    MCoCo: Multi-level Consistency Collaborative Multi-view Clustering

    Authors: Yiyang Zhou, Qinghai Zheng, Wenbiao Yan, Yifei Wang, Pengcheng Shi, Jihua Zhu

    Abstract: Multi-view clustering can explore consistent information from different views to guide clustering. Most existing works focus on pursuing shallow consistency in the feature space and integrating the information of multiple views into a unified representation for clustering. These methods did not fully consider and explore the consistency in the semantic space. To address this issue, we proposed a n…

    Submitted 16 May, 2023; v1 submitted 26 February, 2023; originally announced February 2023.

    Comments: 9 pages, 7 figures

  34. arXiv:2302.09473  [pdf, other]

    cs.CV cs.CL cs.IR cs.LG cs.MM

    Video-Text Retrieval by Supervised Sparse Multi-Grained Learning

    Authors: Yimu Wang, Peng Shi

    Abstract: While recent progress in video-text retrieval has been advanced by the exploration of better representation learning, in this paper, we present a novel multi-grained sparse learning framework, S3MA, to learn an aligned sparse space shared between the video and the text for video-text retrieval. The shared sparse space is initialized with a finite number of sparse concepts, each of which refers to…

    Submitted 17 October, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: Findings of EMNLP 2023

  35. arXiv:2302.03860  [pdf, other]

    cs.CV

    EVEN: An Event-Based Framework for Monocular Depth Estimation at Adverse Night Conditions

    Authors: Peilun Shi, Jiachuan Peng, Jianing Qiu, Xinwei Ju, Frank Po Wen Lo, Benny Lo

    Abstract: Accurate depth estimation under adverse night conditions has practical impact and applications, such as on autonomous driving and rescue robots. In this work, we studied monocular depth estimation at night time in which various adverse weather, light, and different road conditions exist, with data captured in both RGB and event modalities. Event camera can better capture intensity changes by virtu…

    Submitted 7 February, 2023; originally announced February 2023.

  36. arXiv:2212.08568  [pdf, other]

    cs.CV cs.LG

    Biomedical image analysis competitions: The state of current participation practice

    Authors: Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali, Anubha Gupta, Jan Kybic, Alison Noble, Carlos Ortiz de Solórzano, Samiksha Pachade, Caroline Petitjean, Daniel Sage, Donglai Wei, Elizabeth Wilden, Deepak Alapatt, Vincent Andrearczyk, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano, et al. (331 additional authors not shown)

    Abstract: The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis,…

    Submitted 12 September, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

  37. arXiv:2211.11944  [pdf, other]

    cs.LG cs.SD eess.AS

    COVID-Net Assistant: A Deep Learning-Driven Virtual Assistant for COVID-19 Symptom Prediction and Recommendation

    Authors: Pengyuan Shi, Yuetong Wang, Saad Abbasi, Alexander Wong

    Abstract: As the COVID-19 pandemic continues to put a significant burden on healthcare systems worldwide, there has been growing interest in finding inexpensive symptom pre-screening and recommendation methods to assist in efficiently using available medical resources such as PCR tests. In this study, we introduce the design of COVID-Net Assistant, an efficient virtual assistant designed to provide symptom…

    Submitted 21 November, 2022; originally announced November 2022.

  38. arXiv:2211.08998  [pdf, other]

    cs.LG math.OC

    Data-pooling Reinforcement Learning for Personalized Healthcare Intervention

    Authors: Xinyun Chen, Pengyi Shi, Shanwen Pu

    Abstract: Motivated by the emerging needs of personalized preventative intervention in many healthcare applications, we consider a multi-stage, dynamic decision-making problem in the online setting with unknown model parameters. To deal with the pervasive issue of small sample size in personalized planning, we develop a novel data-pooling reinforcement learning (RL) algorithm based on a general perturbed va…

    Submitted 16 November, 2022; originally announced November 2022.

  39. arXiv:2210.13693  [pdf, other]

    cs.CL

    XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing

    Authors: Peng Shi, Rui Zhang, He Bai, Jimmy Lin

    Abstract: In-context learning using large language models has recently shown surprising results for semantic parsing tasks such as Text-to-SQL translation. Prompting GPT-3 or Codex using several examples of question-SQL pairs can produce excellent results, comparable to state-of-the-art finetuning-based models. However, existing work primarily focuses on English datasets, and it is unknown whether large lan…

    Submitted 24 October, 2022; originally announced October 2022.

  40. arXiv:2210.08266  [pdf, other]

    cs.IR cs.LG

    MenuAI: Restaurant Food Recommendation System via a Transformer-based Deep Learning Model

    Authors: Xinwei Ju, Frank Po Wen Lo, Jianing Qiu, Peilun Shi, Jiachuan Peng, Benny Lo

    Abstract: Food recommendation system has proven as an effective technology to provide guidance on dietary choices, and this is especially important for patients suffering from chronic diseases. Unlike other multimedia recommendations, such as books and movies, food recommendation task is highly relied on the context at the moment, since users' food preference can be highly dynamic over time. For example, in…

    Submitted 15 October, 2022; originally announced October 2022.

  41. arXiv:2210.02875  [pdf, other

    cs.CL

    Binding Language Models in Symbolic Languages

    Authors: Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu

    Abstract: Though end-to-end neural approaches have recently been dominating NLP tasks in both performance and ease-of-use, they lack interpretability and robustness. We propose Binder, a training-free neural-symbolic framework that maps the task input to a program, which (1) allows binding a unified API of language model (LM) functionalities to a programming language (e.g., SQL, Python) to extend its gramma…

    Submitted 28 February, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: ICLR 2023 camera ready, 27 pages, 10 figures

  42. arXiv:2208.12160  [pdf, other

    cs.CV

    Clustering Egocentric Images in Passive Dietary Monitoring with Self-Supervised Learning

    Authors: Jiachuan Peng, Peilun Shi, Jianing Qiu, Xinwei Ju, Frank P. -W. Lo, Xiao Gu, Wenyan Jia, Tom Baranowski, Matilda Steiner-Asiedu, Alex K. Anderson, Megan A McCrory, Edward Sazonov, Mingui Sun, Gary Frost, Benny Lo

    Abstract: In our recent dietary assessment field studies on passive dietary monitoring in Ghana, we have collected over 250k in-the-wild images. The dataset is an ongoing effort to facilitate accurate measurement of individual food and nutrient intake in low and middle income countries with passive monitoring camera technologies. The current dataset involves 20 households (74 subjects) from both the rural a…

    Submitted 25 August, 2022; originally announced August 2022.

    Comments: accepted to BHI 2022

  43. arXiv:2208.07167  [pdf, other

    cs.CV cs.AI

    Where is VALDO? VAscular Lesions Detection and segmentatiOn challenge at MICCAI 2021

    Authors: Carole H. Sudre, Kimberlin Van Wijnen, Florian Dubost, Hieab Adams, David Atkinson, Frederik Barkhof, Mahlet A. Birhanu, Esther E. Bron, Robin Camarasa, Nish Chaturvedi, Yuan Chen, Zihao Chen, Shuai Chen, Qi Dou, Tavia Evans, Ivan Ezhov, Haojun Gao, Marta Girones Sanguesa, Juan Domingo Gispert, Beatriz Gomez Anson, Alun D. Hughes, M. Arfan Ikram, Silvia Ingala, H. Rolf Jaeger, Florian Kofler, et al. (24 additional authors not shown)

    Abstract: Imaging markers of cerebral small vessel disease provide valuable information on brain health, but their manual assessment is time-consuming and hampered by substantial intra- and interrater variability. Automated rating may benefit biomedical research, as well as clinical assessment, but diagnostic reliability of existing algorithms is unknown. Here, we present the results of the \textit{VAscular…

    Submitted 15 August, 2022; originally announced August 2022.

  44. arXiv:2208.04119  [pdf, other

    stat.ML cond-mat.str-el cs.LG physics.class-ph

    Deep Machine Learning Reconstructing Lattice Topology with Strong Thermal Fluctuations

    Authors: Xiao-Han Wang, Pei Shi, Bin Xi, Jie Hu, Shi-Ju Ran

    Abstract: Applying artificial intelligence to scientific problems (namely AI for science) is currently under hot debate. However, scientific problems differ considerably from conventional ones involving images, text, etc.; new challenges emerge from unbalanced scientific data and complicated effects of the physical setups. In this work, we demonstrate the validity of the deep convolutional neur…

    Submitted 8 August, 2022; originally announced August 2022.

    Comments: 5 pages, 4 figures

  45. arXiv:2203.10692  [pdf, other

    cs.CL

    Better Language Model with Hypernym Class Prediction

    Authors: He Bai, Tong Wang, Alessandro Sordoni, Peng Shi

    Abstract: Class-based language models (LMs) have been long devised to address context sparsity in $n$-gram LMs. In this study, we revisit this approach in the context of neural LMs. We hypothesize that class-based prediction leads to an implicit context aggregation for similar words and thus can improve generalization for rare words. We map words that have a common WordNet hypernym to the same class and tra…

    Submitted 20 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  46. arXiv:2201.10967  [pdf

    cs.LG math.NA stat.ML

    Physics-informed ConvNet: Learning Physical Field from a Shallow Neural Network

    Authors: Pengpeng Shi, Zhi Zeng, Tianshou Liang

    Abstract: Big-data-based artificial intelligence (AI) supports profound evolution in almost all of science and technology. However, modeling and forecasting multi-physical systems remain a challenge due to unavoidable data scarcity and noise. Improving the generalization ability of neural networks by "teaching" domain knowledge and developing a new generation of models combined with the physical laws have b…

    Submitted 7 February, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

    MSC Class: 68T07; 65N99; 35Gxx;

  47. arXiv:2201.05966  [pdf, other

    cs.CL

    UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models

    Authors: Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I. Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer, Tao Yu

    Abstract: Structured knowledge grounding (SKG) leverages structured knowledge to complete user requests, such as semantic parsing over databases and question answering over knowledge bases. Since the inputs and outputs of SKG tasks are heterogeneous, they have been studied separately by different communities, which limits systematic and compatible research on SKG. In this paper, we overcome this limitation…

    Submitted 18 October, 2022; v1 submitted 15 January, 2022; originally announced January 2022.

    Comments: EMNLP 2022

  48. arXiv:2112.09417  [pdf, other

    cs.IT q-bio.OT

    Concatenated Code Design for Constrained DNA Data Storage with Asymmetric Errors

    Authors: Yixin Wang, Li Deng, Md. Noor-A-Rahim, Erry Gunawan, Yong L. Guan, Zhi P. Shi, Chueh L. Poh

    Abstract: DNA data storage has recently attracted much attention due to its durable preservation and extremely high information density (bits per gram). In this work, we propose a hybrid coding strategy comprising generalized constrained codes to tackle the homopolymer (run-length) limit and a protograph-based low-density parity-check (LDPC) code to correct asymmetric nucleotide level (i.e., A/T/C…

    Submitted 17 December, 2021; originally announced December 2021.

  49. arXiv:2111.04271  [pdf, other

    cs.LG cs.AI

    Group-Aware Threshold Adaptation for Fair Classification

    Authors: Taeuk Jang, Pengyi Shi, Xiaoqian Wang

    Abstract: Fairness in machine learning is receiving increasing attention as its applications in different fields continue to expand and diversify. To mitigate discriminatory model behavior across demographic groups, we introduce a novel post-processing method to optimize over multiple fairness constraints through group-aware threshold adaptation. We propose to learn adaptive classification…

    Submitted 7 November, 2021; originally announced November 2021.

    Comments: 19 pages, 1 figure

  50. arXiv:2109.14259  [pdf, other

    cs.CL cs.LG

    Hierarchical Character Tagger for Short Text Spelling Error Correction

    Authors: Mengyi Gao, Canran Xu, Peng Shi

    Abstract: State-of-the-art approaches to the spelling error correction problem include Transformer-based Seq2Seq models, which require large training sets and suffer from slow inference; and sequence labeling models based on Transformer encoders like BERT, which involve a token-level label space and therefore a large pre-defined vocabulary dictionary. In this paper we present a Hierarchical Character Tagger…

    Submitted 29 September, 2021; originally announced September 2021.

    Comments: To appear in WNUT 2021 workshop, 8 pages, 2 figures