Skip to main content

Showing 1–50 of 948 results for author: Lin, C

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.17801  [pdf, other

    cs.LG cs.AI cs.HC

    EEG-SSM: Leveraging State-Space Model for Dementia Detection

    Authors: Xuan-The Tran, Linh Le, Quoc Toan Nguyen, Thomas Do, Chin-Teng Lin

    Abstract: State-space models (SSMs) have garnered attention for effectively processing long data sequences, reducing the need to segment time series into shorter intervals for model training and inference. Traditionally, SSMs capture only the temporal dynamics of time series data, omitting the equally critical spectral features. This study introduces EEG-SSM, a novel state-space model-based approach for dem… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

  2. arXiv:2407.17379  [pdf, other

    cs.CV cs.CL

    MMRA: A Benchmark for Multi-granularity Multi-image Relational Association

    Authors: Siwei Wu, Kang Zhu, Yu Bai, Yiming Liang, Yizhi Li, Haoning Wu, Jiaheng Liu, Ruibo Liu, Xingwei Qu, Xuxin Cheng, Ge Zhang, Wenhao Huang, Chenghua Lin

    Abstract: Given the remarkable success that large visual language models (LVLMs) have achieved in image perception tasks, the endeavor to make LVMLs perceive the world like humans is drawing increasing attention. Current multi-modal benchmarks mainly focus on the objective fact or certain topic related potential knowledge within a image, but overlook the associative relations between multiple images. Theref… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: VLMS, Multi-Image Association

  3. arXiv:2407.16823  [pdf, other

    cs.RO

    SE3ET: SE(3)-Equivariant Transformer for Low-Overlap Point Cloud Registration

    Authors: Chien Erh Lin, Minghan Zhu, Maani Ghaffari

    Abstract: Partial point cloud registration is a challenging problem in robotics, especially when the robot undergoes a large transformation, causing a significant initial pose error and a low overlap between measurements. This work proposes exploiting equivariant learning from 3D point clouds to improve registration robustness. We propose SE3ET, an SE(3)-equivariant registration framework that employs equiv… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  4. arXiv:2407.16364  [pdf, other

    cs.CV

    Harmonizing Visual Text Comprehension and Generation

    Authors: Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Shu Wei, Hao Liu, Xin Tan, Zhizhong Zhang, Can Huang, Yuan Xie

    Abstract: In this work, we present TextHarmony, a unified and versatile multimodal generative model proficient in comprehending and generating visual text. Simultaneously generating images and texts typically results in performance degradation due to the inherent inconsistency between vision and language modalities. To overcome this challenge, existing approaches resort to modality-specific data for supervi… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  5. arXiv:2407.16205  [pdf, other

    cs.CR cs.AI cs.CL cs.LG

    Figure it Out: Analyzing-based Jailbreak Attack on Large Language Models

    Authors: Shi Lin, Rongchang Li, Xun Wang, Changting Lin, Wenpeng Xing, Meng Han

    Abstract: The rapid development of Large Language Models (LLMs) has brought remarkable generative capabilities across diverse tasks. However, despite the impressive achievements, these models still have numerous security vulnerabilities, particularly when faced with jailbreak attacks. Therefore, by investigating jailbreak attacks, we can uncover hidden weaknesses in LLMs and guide us in developing more robu… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  6. arXiv:2407.15848  [pdf, other

    cs.CV

    BoostMVSNeRFs: Boosting MVS-based NeRFs to Generalizable View Synthesis in Large-scale Scenes

    Authors: Chih-Hai Su, Chih-Yao Hu, Shr-Ruei Tsai, Jie-Ying Lee, Chin-Yang Lin, Yu-Lun Liu

    Abstract: While Neural Radiance Fields (NeRFs) have demonstrated exceptional quality, their protracted training duration remains a limitation. Generalizable and MVS-based NeRFs, although capable of mitigating training time, often incur tradeoffs in quality. This paper presents a novel approach called BoostMVSNeRFs to enhance the rendering quality of MVS-based NeRFs in large-scale scenes. We first identify l… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: SIGGRAPH 2024 Conference Papers. Project page: https://su-terry.github.io/BoostMVSNeRFs/

  7. FMDNN: A Fuzzy-guided Multi-granular Deep Neural Network for Histopathological Image Classification

    Authors: Weiping Ding, Tianyi Zhou, Jiashuang Huang, Shu Jiang, Tao Hou, Chin-Teng Lin

    Abstract: Histopathological image classification constitutes a pivotal task in computer-aided diagnostics. The precise identification and categorization of histopathological images are of paramount significance for early disease detection and treatment. In the diagnostic process of pathologists, a multi-tiered approach is typically employed to assess abnormalities in cell regions at different magnifications… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by IEEE Transactions on Fuzzy Systems for publication. Permission from IEEE must be obtained for all other uses, in any current or future media. The final version is available at [doi: 10.1109/TFUZZ.2024.3410929]

    Journal ref: IEEE Transactions on Fuzzy Systems ( Early Access ) 2024

  8. arXiv:2407.14767  [pdf, other

    cs.CL cs.AI

    I Need Help! Evaluating LLM's Ability to Ask for Users' Support: A Case Study on Text-to-SQL Generation

    Authors: Cheng-Kuang Wu, Zhi Rui Tam, Chao-Chung Wu, Chieh-Yen Lin, Hung-yi Lee, Yun-Nung Chen

    Abstract: In this study, we explore the proactive ability of LLMs to seek user support, using text-to-SQL generation as a case study. We propose metrics to evaluate the trade-off between performance improvements and user burden, and investigate whether LLMs can determine when to request help and examine their performance with varying levels of information availability. Our experiments reveal that without ex… ▽ More

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: 9 pages, 9 figures

  9. arXiv:2407.13930  [pdf, other

    cs.CV cs.AI eess.SP

    RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark

    Authors: Yuan-Hao Ho, Jen-Hao Cheng, Sheng Yao Kuan, Zhongyu Jiang, Wenhao Chai, Hsiang-Wei Huang, Chih-Lung Lin, Jenq-Neng Hwang

    Abstract: Traditional methods for human localization and pose estimation (HPE), which mainly rely on RGB images as an input modality, confront substantial limitations in real-world applications due to privacy concerns. In contrast, radar-based HPE methods emerge as a promising alternative, characterized by distinctive attributes such as through-wall recognition and privacy-preserving, rendering the method m… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  10. arXiv:2407.13343  [pdf, other

    cs.CL

    Learning-From-Mistakes Prompting for Indigenous Language Translation

    Authors: You-Cheng Liao, Chen-Jui Yu, Chi-Yi Lin, He-Feng Yun, Yen-Hsiang Wang, Hsiao-Min Li, Yao-Chung Fan

    Abstract: Using large language models, this paper presents techniques to improve extremely low-resourced indigenous language translations. Our approaches are grounded in the use of (1) the presence of a datastore consisting of a limited number of parallel translation examples, (2) the inherent capabilities of LLMs like GPT-3.5, and (3) a word-level translation dictionary. We harness the potential of LLMs an… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  11. arXiv:2407.12223  [pdf, other

    cs.LG cs.AI

    Conditional Quantile Estimation for Uncertain Watch Time in Short-Video Recommendation

    Authors: Chengzhi Lin, Shuchang Liu, Chuyuan Wang, Yongqi Liu

    Abstract: Within the domain of short video recommendation, predicting users' watch time is a critical but challenging task. Prevailing deterministic solutions obtain accurate debiased statistical models, yet they neglect the intrinsic uncertainty inherent in user environments. In our observation, we found that this uncertainty could potentially limit these methods' accuracy in watch-time prediction on our o… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures, 4 tables

  12. arXiv:2407.10937  [pdf, other

    cs.CV

    IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

    Authors: Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang

    Abstract: Significant advances have been made in human-centric video generation, yet the joint video-depth generation problem remains underexplored. Most existing monocular depth estimation methods may not generalize well to synthesized images or videos, and multi-view-based methods have difficulty controlling the human appearance and motion. In this work, we present IDOL (unIfied Dual-mOdal Latent diffusio… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; project page: https://yhzhai.github.io/idol/

  13. arXiv:2407.10575  [pdf, other

    cs.CV

    A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication

    Authors: Jingyi Deng, Chenhao Lin, Zhengyu Zhao, Shuai Liu, Qian Wang, Chao Shen

    Abstract: Deep generative models have demonstrated impressive performance in various computer vision applications, including image synthesis, video generation, and medical analysis. Despite their significant advancements, these models may be used for malicious purposes, such as misinformation, deception, and copyright violation. In this paper, we provide a systematic and timely review of research efforts on… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  14. arXiv:2407.09466  [pdf, other

    cs.RO cs.GR

    TRAVERSE: Traffic-Responsive Autonomous Vehicle Experience & Rare-event Simulation for Enhanced safety

    Authors: Sandeep Thalapanane, Sandip Sharan Senthil Kumar, Guru Nandhan Appiya Dilipkumar Peethambari, Sourang SriHari, Laura Zheng, Julio Poveda, Ming C. Lin

    Abstract: Data for training learning-enabled self-driving cars in the physical world are typically collected in a safe, normal environment. Such data distribution often engenders a strong bias towards safe driving, making self-driving cars unprepared when encountering adversarial scenarios like unexpected accidents. Due to a dearth of such adverse data that is unrealistic for drivers to collect, autonomous… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  15. arXiv:2407.09295  [pdf, other

    cs.CR

    Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study

    Authors: Yulong Yang, Xinshan Yang, Shuaidong Li, Chenhao Lin, Zhengyu Zhao, Chao Shen, Tianwei Zhang

    Abstract: The rapid progress in the reasoning capability of the Multi-modal Large Language Models (MLLMs) has triggered the development of autonomous agent systems on mobile devices. MLLM-based mobile agent systems consist of perception, reasoning, memory, and multi-agent collaboration modules, enabling automatic analysis of user instructions and the design of task pipelines with only natural language and d… ▽ More

    Submitted 17 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: Preprint. Work in progress

  16. arXiv:2407.09059  [pdf, other

    cs.CV

    Domain-adaptive Video Deblurring via Test-time Blurring

    Authors: Jin-Ting He, Fu-Jen Tsai, Jia-Hao Wu, Yan-Tsung Peng, Chung-Chi Tsai, Chia-Wen Lin, Yen-Yu Lin

    Abstract: Dynamic scene video deblurring aims to remove undesirable blurry artifacts captured during the exposure process. Although previous video deblurring methods have achieved impressive results, they suffer from significant performance drops due to the domain gap between training and testing videos, especially for those captured in real-world scenarios. To address this issue, we propose a domain adapta… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  17. arXiv:2407.07764  [pdf, other

    cs.CV

    PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer

    Authors: Tongkun Guan, Chengyu Lin, Wei Shen, Xiaokang Yang

    Abstract: Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios, such as digitized education and automated offices. Recently, sequence-based models with encoder-decoder architectures have been commonly adopted to address this task by directly predicting LaTeX sequences of expression images. However, these methods only implicitly learn the syntax… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  18. arXiv:2407.07580  [pdf, other

    cs.CV

    InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior

    Authors: Chenguo Lin, Yuchen Lin, Panwang Pan, Xuanyang Zhang, Yadong Mu

    Abstract: Comprehending natural language instructions is a charming property for both 2D and 3D layout synthesis systems. Existing methods implicitly model object joint distributions and express object relations, hindering generation's controllability. We introduce InstructLayout, a novel generative framework that integrates a semantic graph prior and a layout decoder to improve controllability and fidelity… ▽ More

    Submitted 10 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: This paper is an extension of ICLR 2024 "InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior". arXiv admin note: substantial text overlap with arXiv:2402.04717

  19. arXiv:2407.07331  [pdf, ps, other

    cs.CV cs.AI

    Learning with Instance-Dependent Noisy Labels by Anchor Hallucination and Hard Sample Label Correction

    Authors: Po-Hsuan Huang, Chia-Ching Lin, Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

    Abstract: Learning from noisy-labeled data is crucial for real-world applications. Traditional Noisy-Label Learning (NLL) methods categorize training data into clean and noisy sets based on the loss distribution of training samples. However, they often neglect that clean samples, especially those with intricate visual patterns, may also yield substantial losses. This oversight is particularly significant in… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ICIP 2024

  20. arXiv:2407.05709  [pdf, other

    eess.IV cs.CV

    Heterogeneous window transformer for image denoising

    Authors: Chunwei Tian, Menghua Zheng, Chia-Wen Lin, Zhiwu Li, David Zhang

    Abstract: Deep networks can usually depend on extracting more structural information to improve denoising results. However, they may ignore correlation between pixels from an image to pursue better denoising performance. Window transformer can use long- and short-distance modeling to interact pixels to address mentioned problem. To make a tradeoff between distance modeling and denoising time, we propose a h… ▽ More

    Submitted 14 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  21. arXiv:2407.04990  [pdf

    cs.AI cs.CL

    Conditional Semi-Supervised Data Augmentation for Spam Message Detection with Low Resource Data

    Authors: Ulin Nuha, Chih-Hsueh Lin

    Abstract: Several machine learning schemes have attempted to perform the detection of spam messages. However, those schemes mostly require a huge amount of labeled data. The existing techniques addressing the lack of data availability have issues with effectiveness and robustness. Therefore, this paper proposes a conditional semi-supervised data augmentation (CSSDA) for a spam detection model lacking the av… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  22. arXiv:2407.04360  [pdf, other

    cs.CV math.CV

    Shape Prior Segmentation Guided by Harmonic Beltrami Signature

    Authors: Chenran Lin, Lok Ming Lui

    Abstract: This paper presents a novel shape prior segmentation method guided by the Harmonic Beltrami Signature (HBS). The HBS is a shape representation fully capturing 2D simply connected shapes, exhibiting resilience against perturbations and invariance to translation, rotation, and scaling. The proposed method integrates the HBS within a quasi-conformal topology preserving segmentation framework, leverag… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 34 pages, 15 figures

  23. arXiv:2407.04241  [pdf, other

    cs.CV cs.AI

    AnySR: Realizing Image Super-Resolution as Any-Scale, Any-Resource

    Authors: Wengyi Zhan, Mingbao Lin, Chia-Wen Lin, Rongrong Ji

    Abstract: In an effort to improve the efficiency and scalability of single-image super-resolution (SISR) applications, we introduce AnySR, to rebuild existing arbitrary-scale SR methods into any-scale, any-resource implementation. As a contrast to off-the-shelf methods that solve SR tasks across various scales with the same computing costs, our AnySR innovates in: 1) building arbitrary-scale tasks as any-re… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  24. arXiv:2407.04191  [pdf, other

    cs.CV cs.AI cs.GR

    GazeFusion: Saliency-guided Image Generation

    Authors: Yunxiang Zhang, Nan Wu, Connor Z. Lin, Gordon Wetzstein, Qi Sun

    Abstract: Diffusion models offer unprecedented image generation capabilities given just a text prompt. While emerging control mechanisms have enabled users to specify the desired spatial arrangements of the generated content, they cannot predict or control where viewers will pay more attention due to the complexity of human vision. Recognizing the critical necessity of attention-controllable image generatio… ▽ More

    Submitted 16 March, 2024; originally announced July 2024.

  25. arXiv:2407.03163  [pdf, other

    cs.CV

    Global Context Modeling in YOLOv8 for Pediatric Wrist Fracture Detection

    Authors: Rui-Yang Ju, Chun-Tse Chien, Chia-Min Lin, Jen-Shiun Chiang

    Abstract: Children often suffer wrist injuries in daily life, while fracture injuring radiologists usually need to analyze and interpret X-ray images before surgical treatment by surgeons. The development of deep learning has enabled neural network models to work as computer-assisted diagnosis (CAD) tools to help doctors and experts in diagnosis. Since the YOLOv8 models have obtained the satisfactory succes… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  26. arXiv:2407.03153  [pdf, other

    cs.LG cs.CV

    Efficient Shapley Values for Attributing Global Properties of Diffusion Models to Data Group

    Authors: Chris Lin, Mingyu Lu, Chanwoo Kim, Su-In Lee

    Abstract: As diffusion models are deployed in real-world settings, data attribution is needed to ensure fair acknowledgment for contributors of high-quality training data and to identify sources of harmful content. Previous work focuses on identifying individual training samples important for the generation of a given image. However, instead of focusing on a given generated image, some use cases require und… ▽ More

    Submitted 9 June, 2024; originally announced July 2024.

  27. arXiv:2407.02490  [pdf, other

    cs.CL cs.LG

    MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

    Authors: Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu

    Abstract: The computational challenges of Large Language Model (LLM) inference remain a significant barrier to their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention computation, it takes 30 minutes for an 8B LLM to process a prompt of 1M tokens (i.e., the pre-filling stage) on a single A100 GPU. Existing methods for speeding up prefi… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  28. arXiv:2407.01910  [pdf, other

    cs.LG cs.AI cs.AR

    MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation

    Authors: Yongan Zhang, Zhongzhi Yu, Yonggan Fu, Cheng Wan, Yingyan Celine Lin

    Abstract: Large Language Models (LLMs) have recently shown promise in streamlining hardware design processes by encapsulating vast amounts of domain-specific data. In addition, they allow users to interact with the design processes through natural language instructions, thus making hardware design more accessible to developers. However, effectively leveraging LLMs in hardware design necessitates providing d… ▽ More

    Submitted 3 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted in ISLAD 2024

  29. arXiv:2407.01519  [pdf, other

    cs.CV

    DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models

    Authors: Chang-Han Yeh, Chin-Yang Lin, Zhixiang Wang, Chi-Wei Hsiao, Ting-Hsuan Chen, Yu-Lun Liu

    Abstract: This paper introduces a method for zero-shot video restoration using pre-trained image restoration diffusion models. Traditional video restoration methods often need retraining for different settings and struggle with limited generalization across various degradation types and datasets. Our approach uses a hierarchical token merging strategy for keyframes and local frames, combined with a hybrid c… ▽ More

    Submitted 19 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Project page: https://jimmycv07.github.io/DiffIR2VR_web/

  30. arXiv:2407.00488  [pdf, other

    cs.CL cs.AI

    PFME: A Modular Approach for Fine-grained Hallucination Detection and Editing of Large Language Models

    Authors: Kunquan Deng, Zeyu Huang, Chen Li, Chenghua Lin, Min Gao, Wenge Rong

    Abstract: Large Language Models (LLMs) excel in fluency but risk producing inaccurate content, called "hallucinations." This paper outlines a standardized process for categorizing fine-grained hallucination types and proposes an innovative framework--the Progressive Fine-grained Model Editor (PFME)--specifically designed to detect and correct fine-grained hallucinations in LLMs. PFME consists of two collabo… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  31. arXiv:2406.20038  [pdf, other

    cs.CL

    BioMNER: A Dataset for Biomedical Method Entity Recognition

    Authors: Chen Tang, Bohao Yang, Kun Zhao, Bo Lv, Chenghao Xiao, Frank Guerin, Chenghua Lin

    Abstract: Named entity recognition (NER) stands as a fundamental and pivotal task within the realm of Natural Language Processing. Particularly within the domain of Biomedical Method NER, this task presents notable challenges, stemming from the continual influx of domain-specific terminologies in scholarly literature. Current research in Biomedical Method (BioMethod) NER suffers from a scarcity of resources… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  32. arXiv:2406.18284  [pdf, other

    cs.CV

    RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network

    Authors: Xiaozhong Ji, Chuming Lin, Zhonggan Ding, Ying Tai, Jian Yang, Junwei Zhu, Xiaobin Hu, Jiangning Zhang, Donghao Luo, Chengjie Wang

    Abstract: Person-generic audio-driven face generation is a challenging task in computer vision. Previous methods have achieved remarkable progress in audio-visual synchronization, but there is still a significant gap between current results and practical applications. The challenges are two-fold: 1) Preserving unique individual traits for achieving high-precision lip synchronization. 2) Generating high-qual… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  33. arXiv:2406.18197  [pdf, other

    cs.CV

    Human-free Prompted Based Anomaly Detection: prompt optimization with Meta-guiding prompt scheme

    Authors: Pi-Wei Chen, Jerry Chun-Wei Lin, Jia Ji, Feng-Hao Yeh, Chao-Chun Chen

    Abstract: Pre-trained vision-language models (VLMs) are highly adaptable to various downstream tasks through few-shot learning, making prompt-based anomaly detection a promising approach. Traditional methods depend on human-crafted prompts that require prior knowledge of specific anomaly types. Our goal is to develop a human-free prompt-based anomaly detection framework that optimally learns prompts through… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  34. arXiv:2406.17988  [pdf, other

    cs.CV

    DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image

    Authors: Qingxuan Wu, Zhiyang Dou, Sirui Xu, Soshi Shimada, Chen Wang, Zhengming Yu, Yuan Liu, Cheng Lin, Zeyu Cao, Taku Komura, Vladislav Golyanik, Christian Theobalt, Wenping Wang, Lingjie Liu

    Abstract: Reconstructing 3D hand-face interactions with deformations from a single image is a challenging yet crucial task with broad applications in AR, VR, and gaming. The challenges stem from self-occlusions during single-view hand-face interactions, diverse spatial relationships between hands and face, complex deformations, and the ambiguity of the single-view setting. The first and only method for hand… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 23 pages, 9 figures, 3 tables

  35. arXiv:2406.17962  [pdf, other

    cs.CL

    SimsChat: A Customisable Persona-Driven Role-Playing Agent

    Authors: Bohao Yang, Dong Liu, Chen Tang, Chenghao Xiao, Kun Zhao, Chao Li, Lin Yuan, Guang Yang, Lanxiao Huang, Chenghua Lin

    Abstract: Large Language Models (LLMs) possess the remarkable capability to understand human instructions and generate high-quality text, enabling them to act as agents that simulate human behaviours. This capability allows LLMs to emulate human beings in a more advanced manner, beyond merely replicating simple human behaviours. However, there is a lack of exploring into leveraging LLMs to craft characters… ▽ More

    Submitted 30 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  36. arXiv:2406.17911  [pdf, other

    cs.CL

    X-ray Made Simple: Radiology Report Generation and Evaluation with Layman's Terms

    Authors: Kun Zhao, Chenghao Xiao, Chen Tang, Bohao Yang, Kai Ye, Noura Al Moubayed, Liang Zhan, Chenghua Lin

    Abstract: Radiology Report Generation (RRG) has achieved significant progress with the advancements of multimodal generative models. However, the evaluation in the domain suffers from a lack of fair and robust metrics. We reveal that, high performance on RRG with existing lexical-based metrics (e.g. BLEU) might be more of a mirage - a model can get a high BLEU only by learning the template of reports. This… ▽ More

    Submitted 30 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  37. arXiv:2406.15765  [pdf, other

    cs.LG cs.CL

    Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

    Authors: Zhongzhi Yu, Zheng Wang, Yonggan Fu, Huihong Shi, Khalid Shaikh, Yingyan Celine Lin

    Abstract: Attention is a fundamental component behind the remarkable achievements of large language models (LLMs). However, our current understanding of the attention mechanism, especially regarding how attention distributions are established, remains limited. Inspired by recent studies that explore the presence of attention sink in the initial token, which receives disproportionately large attention scores… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  38. arXiv:2406.15758  [pdf, other

    cs.LG cs.DC

    EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting

    Authors: Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan Celine Lin

    Abstract: Efficient adaption of large language models (LLMs) on edge devices is essential for applications requiring continuous and privacy-preserving adaptation and inference. However, existing tuning techniques fall short because of the high computation and memory overheads. To this end, we introduce a computation- and memory-efficient LLM tuning framework, called Edge-LLM, to facilitate affordable and ef… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  39. arXiv:2406.15396  [pdf, other

    cs.CV cs.AI cs.LG

    Feature Purified Transformer With Cross-level Feature Guiding Decoder For Multi-class OOD and Anomaly Deteciton

    Authors: Jerry Chun-Wei Lin, Pi-Wei Chen, Chao-Chun Chen

    Abstract: Reconstruction networks are prevalently used in unsupervised anomaly and Out-of-Distribution (OOD) detection due to their independence from labeled anomaly data. However, in multi-class datasets, the effectiveness of anomaly detection is often compromised by the models' generalized reconstruction capabilities, which allow anomalies to blend within the expanded boundaries of normality resulting fro… ▽ More

    Submitted 30 April, 2024; originally announced June 2024.

    Comments: 12 pages

  40. arXiv:2406.13698  [pdf, other

    cs.CL

    MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language

    Authors: Shun Wang, Ge Zhang, Han Wu, Tyler Loakman, Wenhao Huang, Chenghua Lin

    Abstract: Machine Translation (MT) has developed rapidly since the release of Large Language Models and current MT evaluation is performed through comparison with reference human translations or by predicting quality scores from human-labeled data. However, these mainstream evaluation methods mainly focus on fluency and factual reliability, whilst paying little attention to figurative quality. In this paper… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  41. arXiv:2406.12834  [pdf, other

    cs.CV

    GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation

    Authors: Ci-Siang Lin, I-Jieh Liu, Min-Hung Chen, Chien-Yi Wang, Sifei Liu, Yu-Chiang Frank Wang

    Abstract: Referring Video Object Segmentation (RVOS) aims to segment the object referred to by the query sentence throughout the entire video. Most existing methods require end-to-end training with dense mask annotations, which could be computation-consuming and less scalable. In this work, we aim to efficiently adapt foundation segmentation models for addressing RVOS from weak supervision with the proposed… ▽ More

    Submitted 23 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: CVPR Workshop (CVinW) 2024. Project page: https://jack24658735.github.io/groprompt/

  42. arXiv:2406.12501  [pdf, other

    cs.IR

    Improving Multi-modal Recommender Systems by Denoising and Aligning Multi-modal Content and User Feedback

    Authors: Guipeng Xv, Xinyu Li, Ruobing Xie, Chen Lin, Chong Liu, Feng Xia, Zhanhui Kang, Leyu Lin

    Abstract: Multi-modal recommender systems (MRSs) are pivotal in diverse online web platforms and have garnered considerable attention in recent years. However, previous studies overlook the challenges of (1) noisy multi-modal content, (2) noisy user feedback, and (3) aligning multi-modal content with user feedback. In order to tackle these challenges, we propose Denoising and Aligning Multi-modal Recommende… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  43. arXiv:2406.12459  [pdf, other

    cs.CV

    HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors

    Authors: Panwang Pan, Zhuo Su, Chenguo Lin, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu

    Abstract: Despite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their applications in broader scenarios. To tackle these issues, we present HumanSplat which predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In part… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  44. arXiv:2406.09648  [pdf, other

    cs.LG cs.CV

    An Intrinsic Vector Heat Network

    Authors: Alexander Gao, Maurice Chu, Mubbasir Kapadia, Ming C. Lin, Hsueh-Ti Derek Liu

    Abstract: Vector fields are widely used to represent and model flows for many science and engineering applications. This paper introduces a novel neural network architecture for learning tangent vector fields that are intrinsically defined on manifold surfaces embedded in 3D. Previous approaches to learning vector fields on surfaces treat vectors as multi-dimensional scalar fields, using traditional scalar-… ▽ More

    Submitted 18 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  45. arXiv:2406.09105  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

    Authors: Chenwei Lin, Hanjia Lyu, Xian Xu, Jiebo Luo

    Abstract: Large Vision-Language Models (LVLMs) have demonstrated outstanding performance in various general multimodal applications such as image recognition and visual reasoning, and have also shown promising potential in specialized domains. However, the application potential of LVLMs in the insurance domain-characterized by rich application scenarios and abundant multimodal data-has not been effectively… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  46. arXiv:2406.08747  [pdf, other

    cs.CL

    StreamBench: Towards Benchmarking Continuous Improvement of Language Agents

    Authors: Cheng-Kuang Wu, Zhi Rui Tam, Chieh-Yen Lin, Yun-Nung Chen, Hung-yi Lee

    Abstract: Recent works have shown that large language model (LLM) agents are able to improve themselves from experience, which is an important ability for continuous enhancement post-deployment. However, existing benchmarks primarily evaluate their innate capabilities and do not assess their ability to improve over time. To address this gap, we introduce StreamBench, a pioneering benchmark designed to evalu… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  47. arXiv:2406.07368  [pdf, other

    cs.CL cs.AI cs.LG

    When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

    Authors: Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Celine Lin

    Abstract: Autoregressive Large Language Models (LLMs) have achieved impressive performance in language tasks but face two significant bottlenecks: (1) quadratic complexity in the attention module as the number of tokens increases, and (2) limited efficiency due to the sequential processing nature of autoregressive LLMs during generation. While linear attention and speculative decoding offer potential soluti… ▽ More

    Submitted 25 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024; 17 pages; 10 figures; 16 tables

  48. arXiv:2406.06890  [pdf, other

    cs.CV

    Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

    Authors: Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, David Doermann, Junsong Yuan, Lijuan Wang

    Abstract: Image diffusion distillation achieves high-fidelity generation with very few sampling steps. However, applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to the limited visual quality in public video datasets. This affects the performance of both teacher and student video diffusion models. Our study aims to improve video diffusion distillation wh… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Project page: https://yhzhai.github.io/mcm/

  49. arXiv:2406.05981  [pdf, other

    cs.LG cs.AI cs.CL

    ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

    Authors: Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Yingyan Celine Lin

    Abstract: Large language models (LLMs) have shown impressive performance on language tasks but face challenges when deployed on resource-constrained devices due to their extensive parameters and reliance on dense multiplications, resulting in high memory demands and latency bottlenecks. Shift-and-add reparameterization offers a promising solution by replacing costly multiplications with hardware-friendly pr… ▽ More

    Submitted 25 July, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  50. arXiv:2406.05625  [pdf, other

    cs.CL

    ATLAS: Improving Lay Summarisation with Attribute-based Control

    Authors: Zhihao Zhang, Tomas Goldsack, Carolina Scarton, Chenghua Lin

    Abstract: Lay summarisation aims to produce summaries of scientific articles that are comprehensible to non-expert audiences. However, previous work assumes a one-size-fits-all approach, where the content and style of the produced summary are entirely dependent on the data used to train the model. In practice, audiences with different levels of expertise will have specific needs, impacting what content shou… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.