
Showing 1–50 of 272 results for author: Huang, P

Searching in archive cs.
  1. arXiv:2408.14354  [pdf, other]

    cs.SE cs.AI cs.CL

    SWE-bench-java: A GitHub Issue Resolving Benchmark for Java

    Authors: Daoguang Zan, Zhirong Huang, Ailun Yu, Shaoxin Lin, Yifan Shi, Wei Liu, Dong Chen, Zongshuai Qi, Hao Yu, Lei Yu, Dezhi Ran, Muhan Zeng, Bo Shen, Pan Bian, Guangtai Liang, Bei Guan, Pengjie Huang, Tao Xie, Yongji Wang, Qianxiang Wang

    Abstract: GitHub issue resolving is a critical task in software engineering, recently gaining significant attention in both industry and academia. Within this task, SWE-bench has been released to evaluate issue resolving capabilities of large language models (LLMs), but has so far only focused on Python version. However, supporting more programming languages is also important, as there is a strong demand in…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: This work is in progress

  2. arXiv:2408.14158  [pdf, other]

    cs.DC cs.AI

    Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

    Authors: Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, et al. (27 additional authors not shown)

    Abstract: The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has exponentially increased demands of computational power and bandwidth. This, combined with the high costs of faster computing chips and interconnects, has significantly inflated High Performance Computing (HPC) construction costs. To address these challenges, we introduce the Fire-Flyer AI-HPC architecture, a synergistic…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: This is the preprint version of the paper accepted for presentation at the 2024 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24). © 2024 IEEE. Personal use of this material is permitted. For other uses, permission from IEEE must be obtained. Please refer to IEEE Xplore for the final published version

  3. arXiv:2408.14152  [pdf, other]

    cs.CV cs.LG

    Application of Disentanglement to Map Registration Problem

    Authors: Hae Jin Song, Patrycja Krawczuk, Po-Hsuan Huang

    Abstract: Geospatial data come from various sources, such as satellites, aircraft, and LiDAR. The variability of the source is not limited to the types of data acquisition techniques, as we have maps from different time periods. To incorporate these data for a coherent analysis, it is essential to first align different "styles" of geospatial data to its matching images that point to the same location on the…

    Submitted 26 August, 2024; originally announced August 2024.

  4. arXiv:2408.09404  [pdf, other]

    cs.CL cs.AI

    Comparison between the Structures of Word Co-occurrence and Word Similarity Networks for Ill-formed and Well-formed Texts in Taiwan Mandarin

    Authors: Po-Hsuan Huang, Hsuan-Lei Shao

    Abstract: The study of word co-occurrence networks has attracted the attention of researchers due to their potential significance as well as applications. Understanding the structure of word co-occurrence networks is therefore important to fully realize their significance and usages. In past studies, word co-occurrence networks built on well-formed texts have been found to possess certain characteristics, i…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 4 pages, 1 figure, 5 tables

    ACM Class: H.3.3; I.2.7

  5. arXiv:2408.08881  [pdf, other]

    eess.IV cs.AI cs.CV

    U-MedSAM: Uncertainty-aware MedSAM for Medical Image Segmentation

    Authors: Xin Wang, Xiaoyu Liu, Peng Huang, Pu Huang, Shu Hu, Hongtu Zhu

    Abstract: Medical Image Foundation Models have proven to be powerful tools for mask prediction across various datasets. However, accurately assessing the uncertainty of their predictions remains a significant challenge. To address this, we propose a new model, U-MedSAM, which integrates the MedSAM model with an uncertainty-aware loss function and the Sharpness-Aware Minimization (SharpMin) optimizer. The un…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.17496

  6. arXiv:2408.07037  [pdf, other]

    cs.CV cs.AI

    PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology

    Authors: Xiaomin Wu, Rui Xu, Pengchen Wei, Wenkang Qin, Peixiang Huang, Ziheng Li, Lin Luo

    Abstract: Pathological diagnosis remains the definitive standard for identifying tumors. The rise of multimodal large models has simplified the process of integrating image analysis with textual descriptions. Despite this advancement, the substantial costs associated with training and deploying these complex multimodal models, together with a scarcity of high-quality training datasets, create a significant…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 10 pages, 2 figures

  7. arXiv:2408.05116  [pdf, other]

    quant-ph cs.LG stat.ML

    Concept learning of parameterized quantum models from limited measurements

    Authors: Beng Yee Gan, Po-Wei Huang, Elies Gil-Fuster, Patrick Rebentrost

    Abstract: Classical learning of the expectation values of observables for quantum states is a natural variant of learning quantum states or channels. While learning-theoretic frameworks establish the sample complexity and the number of measurement shots per sample required for learning such statistical quantities, the interplay between these two variables has not been adequately quantified before. In this w…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 16 + 8 pages, 4 figures

  8. arXiv:2408.04300  [pdf, other]

    eess.IV cs.CV

    An Explainable Non-local Network for COVID-19 Diagnosis

    Authors: Jingfu Yang, Peng Huang, Jing Hu, Shu Hu, Siwei Lyu, Xin Wang, Jun Guo, Xi Wu

    Abstract: The CNN has achieved excellent results in the automatic classification of medical images. In this study, we propose a novel deep residual 3D attention non-local network (NL-RAN) to classify CT images included COVID-19, common pneumonia, and normal to perform rapid and explainable COVID-19 diagnosis. We built a deep residual 3D attention non-local network that could achieve end-to-end training. The…

    Submitted 8 August, 2024; originally announced August 2024.

  9. arXiv:2408.02373  [pdf, other]

    cs.AI

    Operationalizing Contextual Integrity in Privacy-Conscious Assistants

    Authors: Sahra Ghalebikesabi, Eugene Bagdasaryan, Ren Yi, Itay Yona, Ilia Shumailov, Aneesh Pappu, Chongyang Shi, Laura Weidinger, Robert Stanforth, Leonard Berrada, Pushmeet Kohli, Po-Sen Huang, Borja Balle

    Abstract: Advanced AI assistants combine frontier LLMs and tool access to autonomously perform complex tasks on behalf of users. While the helpfulness of such assistants can increase dramatically with access to user information including emails and documents, this raises privacy concerns about assistants sharing inappropriate information with third parties without user supervision. To steer information-shar…

    Submitted 5 August, 2024; originally announced August 2024.

  10. arXiv:2408.01808  [pdf, other]

    cs.CR cs.AI cs.SD eess.AS

    ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features

    Authors: Peng Cheng, Yuwei Wang, Peng Huang, Zhongjie Ba, Xiaodong Lin, Feng Lin, Li Lu, Kui Ren

    Abstract: Extensive research has revealed that adversarial examples (AE) pose a significant threat to voice-controllable smart devices. Recent studies have proposed black-box adversarial attacks that require only the final transcription from an automatic speech recognition (ASR) system. However, these attacks typically involve many queries to the ASR, resulting in substantial costs. Moreover, AE-based adver…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: Published in the 2024 IEEE Symposium on Security and Privacy (SP)

  11. arXiv:2407.21402  [pdf, other]

    cs.CV

    DD-rPPGNet: De-interfering and Descriptive Feature Learning for Unsupervised rPPG Estimation

    Authors: Pei-Kai Huang, Tzu-Hsien Chen, Ya-Ting Chan, Kuan-Wen Chen, Chiou-Ting Hsu

    Abstract: Remote Photoplethysmography (rPPG) aims to measure physiological signals and Heart Rate (HR) from facial videos. Recent unsupervised rPPG estimation methods have shown promising potential in estimating rPPG signals from facial regions without relying on ground truth rPPG signals. However, these methods seem oblivious to interference existing in rPPG signals and still result in unsatisfactory perfo…

    Submitted 31 July, 2024; originally announced July 2024.

  12. arXiv:2407.19208  [pdf, other]

    cs.GR

    WindPoly: Polygonal Mesh Reconstruction via Winding Numbers

    Authors: Xin He, Chenlei Lv, Pengdi Huang, Hui Huang

    Abstract: Polygonal mesh reconstruction of a raw point cloud is a valuable topic in the field of computer graphics and 3D vision. Especially to 3D architectural models, polygonal mesh provides concise expressions for fundamental geometric structures while effectively reducing data volume. However, there are some limitations of traditional reconstruction methods: normal vector dependency, noisy points and de…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: European Conference on Computer Vision (Proceedings of ECCV 2024)

  13. arXiv:2407.13322  [pdf, other]

    cs.CV

    Fully Test-Time rPPG Estimation via Synthetic Signal-Guided Feature Learning

    Authors: Pei-Kai Huang, Tzu-Hsien Chen, Ya-Ting Chan, Kuan-Wen Chen, Chiou-Ting Hsu

    Abstract: Many remote photoplethysmography (rPPG) estimation models have achieved promising performance in the training domain but often fail to accurately estimate physiological signals or heart rates (HR) in the target domains. Domain generalization (DG) or domain adaptation (DA) techniques are therefore adopted during the offline training stage to adapt the model to either unobserved or observed target d…

    Submitted 15 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  14. arXiv:2407.13164  [pdf, other]

    cs.CL cs.AI

    Translate-and-Revise: Boosting Large Language Models for Constrained Translation

    Authors: Pengcheng Huang, Yongyu Mu, Yuzhang Wu, Bei Li, Chunyang Xiao, Tong Xiao, Jingbo Zhu

    Abstract: Imposing constraints on machine translation systems presents a challenging issue because these systems are not trained to make use of constraints in generating adequate, fluent translations. In this paper, we leverage the capabilities of large language models (LLMs) for constrained translation, given that LLMs can easily adapt to this task by taking translation instructions and constraints as prom…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 16 pages

  15. arXiv:2407.07331  [pdf, ps, other]

    cs.CV cs.AI

    Learning with Instance-Dependent Noisy Labels by Anchor Hallucination and Hard Sample Label Correction

    Authors: Po-Hsuan Huang, Chia-Ching Lin, Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

    Abstract: Learning from noisy-labeled data is crucial for real-world applications. Traditional Noisy-Label Learning (NLL) methods categorize training data into clean and noisy sets based on the loss distribution of training samples. However, they often neglect that clean samples, especially those with intricate visual patterns, may also yield substantial losses. This oversight is particularly significant in…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ICIP 2024

  16. arXiv:2407.00752  [pdf, other]

    cs.CV cs.AI

    Chest-Diffusion: A Light-Weight Text-to-Image Model for Report-to-CXR Generation

    Authors: Peng Huang, Xue Gao, Lihong Huang, Jing Jiao, Xiaokang Li, Yuanyuan Wang, Yi Guo

    Abstract: Text-to-image generation has important implications for generation of diverse and controllable images. Several attempts have been made to adapt Stable Diffusion (SD) to the medical domain. However, the large distribution difference between medical reports and natural texts, as well as high computational complexity in common stable diffusion limit the authenticity and feasibility of the generated m…

    Submitted 30 June, 2024; originally announced July 2024.

  17. arXiv:2406.17338  [pdf, other]

    eess.IV cs.CV cs.LG

    Robustly Optimized Deep Feature Decoupling Network for Fatty Liver Diseases Detection

    Authors: Peng Huang, Shu Hu, Bo Peng, Jiashu Zhang, Xi Wu, Xin Wang

    Abstract: Current medical image classification efforts mainly aim for higher average performance, often neglecting the balance between different classes. This can lead to significant differences in recognition accuracy between classes and obvious recognition weaknesses. Without the support of massive data, deep learning faces challenges in fine-grained classification of fatty liver. In this paper, we propos…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024

  18. arXiv:2406.09870  [pdf, other]

    cs.LG cs.AI

    IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning

    Authors: Jiawen Qin, Haonan Yuan, Qingyun Sun, Lyujin Xu, Jiaqi Yuan, Pengfeng Huang, Zhaonan Wang, Xingcheng Fu, Hao Peng, Jianxin Li, Philip S. Yu

    Abstract: Deep graph learning has gained grand popularity over the past years due to its versatility and success in representing graph data across a wide range of domains. However, the pervasive issue of imbalanced graph data distributions, where certain parts exhibit disproportionally abundant data while others remain sparse, undermines the efficacy of conventional graph learning algorithms, leading to bia…

    Submitted 19 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Preprint, under review)

  19. arXiv:2406.02430  [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  20. arXiv:2406.01436  [pdf, other]

    cs.CL

    Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of Knowledge Editing in Large Language Models

    Authors: Cheng-Hsun Hsueh, Paul Kuo-Ming Huang, Tzu-Han Lin, Che-Wei Liao, Hung-Chieh Fang, Chao-Wei Huang, Yun-Nung Chen

    Abstract: Knowledge editing is a rising technique for efficiently updating factual knowledge in Large Language Models (LLMs) with minimal alteration of parameters. However, recent studies have identified concerning side effects, such as knowledge distortion and the deterioration of general abilities, that have emerged after editing. This survey presents a comprehensive study of these side effects, providing…

    Submitted 3 June, 2024; originally announced June 2024.

  21. arXiv:2405.16640  [pdf, other]

    cs.AI cs.CL cs.CV cs.MM

    A Survey of Multimodal Large Language Model from A Data-centric Perspective

    Authors: Tianyi Bai, Hao Liang, Binwang Wan, Yanran Xu, Xi Li, Shiyu Li, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, Ping Huang, Jiulong Shan, Conghui He, Binhang Yuan, Wentao Zhang

    Abstract: Multimodal large language models (MLLMs) enhance the capabilities of standard large language models by integrating and processing data from multiple modalities, including text, vision, audio, video, and 3D environments. Data plays a pivotal role in the development and refinement of these models. In this survey, we comprehensively review the literature on MLLMs from a data-centric perspective. Spec…

    Submitted 18 July, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

  22. arXiv:2405.15655  [pdf, other]

    cs.SD cs.LG eess.AS

    HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System

    Authors: Zhisheng Zhang, Pengyang Huang

    Abstract: In recent years, the remarkable advancements in deep neural networks have brought tremendous convenience. However, the training process of a highly effective model necessitates a substantial quantity of samples, which brings huge potential threats, like unauthorized exploitation with privacy leakage. In response, we propose a framework named HiddenSpeaker, embedding imperceptible perturbations wit…

    Submitted 26 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCNN 2024

  23. arXiv:2405.15199  [pdf, other]

    cs.CV

    ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models

    Authors: Jingyuan Zhu, Shiyu Li, Yuxuan Liu, Ping Huang, Jiulong Shan, Huimin Ma, Jian Yuan

    Abstract: Modern diffusion-based image generative models have made significant progress and become promising to enrich training data for the object detection task. However, the generation quality and the controllability for complex scenes containing multi-class objects and dense objects with occlusions remain limited. This paper presents ODGEN, a novel method to generate high-quality images conditioned on b…

    Submitted 24 May, 2024; originally announced May 2024.

  24. arXiv:2405.15140  [pdf, other]

    cs.LG

    Better Membership Inference Privacy Measurement through Discrepancy

    Authors: Ruihan Wu, Pengrun Huang, Kamalika Chaudhuri

    Abstract: Membership Inference Attacks have emerged as a dominant method for empirically measuring privacy leakage from machine learning models. Here, privacy is measured by the advantage or gap between a score or a function computed on the training and the test data. A major barrier to the practical deployment of these attacks is that they do not scale to large well-generalized models -- either the…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 9 pages

  25. arXiv:2405.14855  [pdf, other]

    cs.CV cs.AI

    Synergistic Global-space Camera and Human Reconstruction from Videos

    Authors: Yizhou Zhao, Tuanfeng Y. Wang, Bhiksha Raj, Min Xu, Jimei Yang, Chun-Hao Paul Huang

    Abstract: Remarkable strides have been made in reconstructing static scenes or human bodies from monocular videos. Yet, the two problems have largely been approached independently, without much synergy. Most visual SLAM methods can only reconstruct camera trajectories and scene structures up to scale, while most HMR methods reconstruct human meshes in metric scale but fall short in reasoning with cameras an…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  26. arXiv:2405.13788  [pdf, other]

    quant-ph cs.GT

    Quantum algorithm for large-scale market equilibrium computation

    Authors: Po-Wei Huang, Patrick Rebentrost

    Abstract: Classical algorithms for market equilibrium computation such as proportional response dynamics face scalability issues with Internet-based applications such as auctions, recommender systems, and fair division, despite having an almost linear runtime in terms of the product of buyers and goods. In this work, we provide the first quantum algorithm for market equilibrium computation with sub-linear p…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 22 pages, 1 figure

  27. arXiv:2405.12664  [pdf, other]

    cs.NI

    IREE Oriented Green 6G Networks: A Radial Basis Function Based Approach

    Authors: Tao Yu, Pengbo Huang, Shunqing Zhang, Xiaojing Chen, Yanzan Sun, Xin Wang

    Abstract: In order to provide design guidelines for energy efficient 6G networks, we propose a novel radial basis function (RBF) based optimization framework to maximize the integrated relative energy efficiency (IREE) metric. Different from the conventional energy efficient optimization schemes, we maximize the transformed utility for any given IREE using spectrum efficiency oriented RBF network and gradua…

    Submitted 21 May, 2024; originally announced May 2024.

  28. arXiv:2405.10554  [pdf, other]

    cs.CV

    NeRO: Neural Road Surface Reconstruction

    Authors: Ruibo Wang, Song Zhang, Ping Huang, Donghai Zhang, Haoyu Chen

    Abstract: Accurately reconstructing road surfaces is pivotal for various applications especially in autonomous driving. This paper introduces a position encoding Multi-Layer Perceptrons (MLPs) framework to reconstruct road surfaces, with input as world coordinates x and y, and output as height, color, and semantic information. The effectiveness of this method is demonstrated through its compatibility with a…

    Submitted 28 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  29. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  30. arXiv:2405.02077  [pdf, other]

    cs.CV

    MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition

    Authors: Hongyu Qu, Rui Yan, Xiangbo Shu, Hailiang Gao, Peng Huang, Guo-Sen Xie

    Abstract: Recent few-shot action recognition (FSAR) methods typically perform semantic matching on learned discriminative features to achieve promising performance. However, most FSAR methods focus on single-scale (e.g., frame-level, segment-level, etc) feature alignment, which ignores that human actions with the same semantic may appear at different velocities. To this end, we develop a novel Multi-Velocit…

    Submitted 23 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  31. arXiv:2405.01582  [pdf, other]

    cs.CL cs.AI cs.LG

    Text Quality-Based Pruning for Efficient Training of Language Models

    Authors: Vasu Sharma, Karthik Padthe, Newsha Ardalani, Kushal Tirumala, Russell Howes, Hu Xu, Po-Yao Huang, Shang-Wen Li, Armen Aghajanyan, Gargi Ghosh, Luke Zettlemoyer

    Abstract: In recent times training Language Models (LMs) have relied on computationally heavy training over massive datasets which makes this training process extremely laborious. In this paper we propose a novel method for numerically evaluating text quality in large unlabelled NLP datasets in a model agnostic manner to assign the text instances a "quality score". By proposing the text quality metric, th…

    Submitted 10 May, 2024; v1 submitted 26 April, 2024; originally announced May 2024.

  32. arXiv:2404.16030  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MoDE: CLIP Data Experts via Clustering

    Authors: Jiawei Ma, Po-Yao Huang, Saining Xie, Shang-Wen Li, Luke Zettlemoyer, Shih-Fu Chang, Wen-Tau Yih, Hu Xu

    Abstract: The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data. We present Mixture of Data Experts (MoDE) and learn a system of CLIP data experts via clustering. Each data expert is trained on one data cluster, being less sensitive to false negative noises in other clusters. At inferen…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: IEEE CVPR 2024 Camera Ready. Code Link: https://github.com/facebookresearch/MetaCLIP/tree/main/mode

  33. arXiv:2404.14852  [pdf, other]

    cs.CV

    Ultrasound Nodule Segmentation Using Asymmetric Learning with Simple Clinical Annotation

    Authors: Xingyue Zhao, Zhongyu Li, Xiangde Luo, Peiqi Li, Peng Huang, Jianwei Zhu, Yang Liu, Jihua Zhu, Meng Yang, Shi Chang, Jun Dong

    Abstract: Recent advances in deep learning have greatly facilitated the automated segmentation of ultrasound images, which is essential for nodule morphological analysis. Nevertheless, most existing methods depend on extensive and precise annotations by domain experts, which are labor-intensive and time-consuming. In this study, we suggest using simple aspect ratio annotations directly from ultrasound clini…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by TCSVT

  34. arXiv:2404.13134  [pdf, other]

    cs.MM cs.CV cs.LG

    Deep Learning-based Text-in-Image Watermarking

    Authors: Bishwa Karki, Chun-Hua Tsai, Pei-Chi Huang, Xin Zhong

    Abstract: In this work, we introduce a novel deep learning-based approach to text-in-image watermarking, a method that embeds and extracts textual information within images to enhance data security and integrity. Leveraging the capabilities of deep learning, specifically through the use of Transformer-based architectures for text processing and Vision Transformers for image feature extraction, our method se…

    Submitted 19 April, 2024; originally announced April 2024.

  35. arXiv:2404.05583  [pdf, other]

    cs.CV

    Towards More General Video-based Deepfake Detection through Facial Feature Guided Adaptation for Foundation Model

    Authors: Yue-Hua Han, Tai-Ming Huang, Shu-Tzu Lo, Po-Han Huang, Kai-Lung Hua, Jun-Cheng Chen

    Abstract: With the rise of deep learning, generative models have enabled the creation of highly realistic synthetic images, presenting challenges due to their potential misuse. While research in Deepfake detection has grown rapidly in response, many detection methods struggle with unseen Deepfakes generated by new synthesis techniques. To address this generalisation challenge, we propose a novel Deepfake de…

    Submitted 5 June, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  36. arXiv:2404.00893  [pdf, other]

    cs.RO

    An Integrating Comprehensive Trajectory Prediction with Risk Potential Field Method for Autonomous Driving

    Authors: Kailu Wu, Xing Liu, Feiyu Bian, Yizhai Zhang, Panfeng Huang

    Abstract: Due to the uncertainty of traffic participants' intentions, generating safe but not overly cautious behavior in interactive driving scenarios remains a formidable challenge for autonomous driving. In this paper, we address this issue by combining a deep learning-based trajectory prediction model with risk potential field-based motion planning. In order to comprehensively predict the possible futur…

    Submitted 31 March, 2024; originally announced April 2024.

  37. arXiv:2404.00576  [pdf]

    cs.LG cs.AI cs.CV

    Automated Bi-Fold Weighted Ensemble Algorithms and its Application to Brain Tumor Detection and Classification

    Authors: PoTsang B. Huang, Muhammad Rizwan, Mehboob Ali

    Abstract: The uncontrolled and unstructured growth of brain cells is known as brain tumor, which has one of the highest mortality rates among diseases from all types of cancers. Due to limited diagnostic and treatment capabilities, they pose significant challenges, especially in third-world countries. Early diagnosis plays a vital role in effectively managing brain tumors and reducing mortality rates. Howev…

    Submitted 31 March, 2024; originally announced April 2024.

  38. arXiv:2403.19374  [pdf, other]

    cs.ET eess.SY

    A noise-tolerant, resource-saving probabilistic binary neural network implemented by the SOT-MRAM compute-in-memory system

    Authors: Yu Gu, Puyang Huang, Tianhao Chen, Chenyi Fu, Aitian Chen, Shouzhong Peng, Xixiang Zhang, Xufeng Kou

    Abstract: We report a spin-orbit torque (SOT) magnetoresistive random-access memory (MRAM)-based probabilistic binary neural network (PBNN) for resource-saving and hardware noise-tolerant computing applications. With the presence of thermal fluctuation, the non-destructive SOT-driven magnetization switching characteristics lead to a random weight matrix with controllable probability distribution. In the meanwh…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 5 pages, 10 figures

    MSC Class: 94C60

    ACM Class: B.2.4; B.3.0

  39. arXiv:2403.16973  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

    Authors: Puyuan Peng, Po-Yao Huang, Shang-Wen Li, Abdelrahman Mohamed, David Harwath

    Abstract: We introduce VoiceCraft, a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on audiobooks, internet videos, and podcasts. VoiceCraft employs a Transformer decoder architecture and introduces a token rearrangement procedure that combines causal masking and delayed stacking to enable generation within an…

    Submitted 13 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: ACL 2024. Data, code, and model weights are available at https://github.com/jasonppy/VoiceCraft

  40. arXiv:2403.16242  [pdf, other]

    cs.CV

    Adversarially Masked Video Consistency for Unsupervised Domain Adaptation

    Authors: Xiaoyu Zhu, Junwei Liang, Po-Yao Huang, Alex Hauptmann

    Abstract: We study the problem of unsupervised domain adaptation for egocentric videos. We propose a transformer-based model to learn class-discriminative and domain-invariant feature representations. It consists of two novel designs. The first module is called Generative Adversarial Domain Alignment Network with the aim of learning domain-invariant representations. It simultaneously learns a mask generator…

    Submitted 24 March, 2024; originally announced March 2024.

  41. Semantic Is Enough: Only Semantic Information For NeRF Reconstruction

    Authors: Ruibo Wang, Song Zhang, Ping Huang, Donghai Zhang, Wei Yan

    Abstract: Recent research that combines implicit 3D representation with semantic information, like Semantic-NeRF, has proven that NeRF model could perform excellently in rendering 3D structures with semantic labels. This research aims to extend the Semantic Neural Radiance Fields (Semantic-NeRF) model by focusing solely on semantic output and removing the RGB output component. We reformulate the model and i…

    Submitted 24 March, 2024; originally announced March 2024.

  42. arXiv:2403.13208

    cs.RO

    CaDRE: Controllable and Diverse Generation of Safety-Critical Driving Scenarios using Real-World Trajectories

    Authors: Peide Huang, Wenhao Ding, Jonathan Francis, Bingqing Chen, Ding Zhao

    Abstract: Simulation is an indispensable tool in the development and testing of autonomous vehicles (AVs), offering an efficient and safe alternative to road testing by allowing the exploration of a wide range of scenarios. Despite its advantages, a significant challenge within simulation-based testing is the generation of safety-critical scenarios, which are essential to ensure that AVs can handle rare but…

    Submitted 19 March, 2024; originally announced March 2024.

  43. arXiv:2403.04481

    cs.CL cs.AI

    Do Large Language Model Understand Multi-Intent Spoken Language ?

    Authors: Shangjian Yin, Peijie Huang, Yuhong Xu, Haojing Huang, Jiatian Chen

    Abstract: This research signifies a considerable breakthrough in leveraging Large Language Models (LLMs) for multi-intent spoken language understanding (SLU). Our approach re-imagines the use of entity slots in multi-intent SLU applications, making the most of the generative potential of LLMs within the SLU landscape, leading to the development of the EN-LLM series. Furthermore, we introduce the concept of…

    Submitted 15 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  44. arXiv:2402.16398

    cs.RO

    Efficient Continuous-Time Ego-Motion Estimation for Asynchronous Event-based Data Associations

    Authors: Zhixiang Wang, Xudong Li, Tianle Liu, Yizhai Zhang, Panfeng Huang

    Abstract: Event cameras are bio-inspired vision sensors that asynchronously measure per-pixel brightness changes. The high temporal resolution and asynchronicity of event cameras offer great potential for estimating the robot motion state. Recent works have adopted the continuous-time ego-motion estimation methods to exploit the inherent nature of event cameras. However, most of the adopted methods have poo…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 8 pages, 7 figures

  45. arXiv:2402.02333

    cs.CR cs.CV cs.LG

    Copyright Protection in Generative AI: A Technical Perspective

    Authors: Jie Ren, Han Xu, Pengfei He, Yingqian Cui, Shenglai Zeng, Jiankun Zhang, Hongzhi Wen, Jiayuan Ding, Pei Huang, Lingjuan Lyu, Hui Liu, Yi Chang, Jiliang Tang

    Abstract: Generative AI has witnessed rapid advancement in recent years, expanding their capabilities to create synthesized content such as text, images, audio, and code. The high fidelity and authenticity of contents generated by these Deep Generative Models (DGMs) have sparked significant copyright concerns. There have been various legal debates on how to effectively safeguard copyrights in DGMs. This wor…

    Submitted 24 July, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: 26 pages

  46. arXiv:2401.15704

    cs.CR cs.SD eess.AS

    Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege

    Authors: Peng Huang, Yao Wei, Peng Cheng, Zhongjie Ba, Li Lu, Feng Lin, Yang Wang, Kui Ren

    Abstract: The widespread smart devices raise people's concerns of being eavesdropped on. To enhance voice privacy, recent studies exploit the nonlinearity in microphone to jam audio recorders with inaudible ultrasound. However, existing solutions solely rely on energetic masking. Their simple-form noise leads to several problems, such as high energy requirements and being easily removed by speech enhancemen…

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: 14 pages, 28 figures; submitted to IEEE TDSC

  47. arXiv:2401.14461

    cs.AI cs.LG cs.LO

    Marabou 2.0: A Versatile Formal Analyzer of Neural Networks

    Authors: Haoze Wu, Omri Isac, Aleksandar Zeljić, Teruhiro Tagomori, Matthew Daggitt, Wen Kokke, Idan Refaeli, Guy Amir, Kyle Julian, Shahaf Bassan, Pei Huang, Ori Lahav, Min Wu, Min Zhang, Ekaterina Komendantskaya, Guy Katz, Clark Barrett

    Abstract: This paper serves as a comprehensive system description of version 2.0 of the Marabou framework for formal analysis of neural networks. We discuss the tool's architectural design and highlight the major features and components introduced since its initial release.

    Submitted 20 May, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: Condensed version accepted at CAV'24

  48. arXiv:2401.13649

    cs.LG cs.CL cs.CV

    VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

    Authors: Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried

    Abstract: Autonomous agents capable of planning, reasoning, and executing actions on the web offer a promising avenue for automating computer tasks. However, the majority of existing benchmarks primarily focus on text-based agents, neglecting many natural tasks that require visual information to effectively solve. Given that most computer interfaces cater to human perception, visual information often augmen…

    Submitted 5 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted to ACL 2024. 24 pages. Project page: https://jykoh.com/vwa

  49. arXiv:2401.10822

    cs.CV

    ActAnywhere: Subject-Aware Video Background Generation

    Authors: Boxiao Pan, Zhan Xu, Chun-Hao Paul Huang, Krishna Kumar Singh, Yang Zhou, Leonidas J. Guibas, Jimei Yang

    Abstract: Generating video background that tailors to foreground subject motion is an important problem for the movie industry and visual effects community. This task involves synthesizing background that aligns with the motion and appearance of the foreground subject, while also complies with the artist's creative intention. We introduce ActAnywhere, a generative model that automates this process which tra…

    Submitted 19 January, 2024; originally announced January 2024.

  50. arXiv:2401.08422

    cs.CV

    Improving Limited Supervised Foot Ulcer Segmentation Using Cross-Domain Augmentation

    Authors: Shang-Jui Kuo, Po-Han Huang, Chia-Ching Lin, Jeng-Lin Li, Ming-Ching Chang

    Abstract: Diabetic foot ulcers pose health risks, including higher morbidity, mortality, and amputation rates. Monitoring wound areas is crucial for proper care, but manual segmentation is subjective due to complex wound features and background variation. Expert annotations are costly and time-intensive, thus hampering large dataset creation. Existing segmentation models relying on extensive annotations are…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 5 pages, 2 figures, accepted by ICASSP 2024