Showing 1–50 of 367 results for author: Ye, H

Searching in archive cs.
  1. arXiv:2408.12163  [pdf, other]

    cs.CL

    Preference-Guided Reflective Sampling for Aligning Language Models

    Authors: Hai Ye, Hwee Tou Ng

    Abstract: Large language models (LLMs) are aligned with human preferences by reinforcement learning from human feedback (RLHF). Effective data sampling is crucial for RLHF, as it determines the efficiency of model training, ensuring that models learn from the informative samples. To achieve better data generation, we propose a new sampling method called Preference-Guided Reflective Sampling (PRS). PRS frame…

    Submitted 22 August, 2024; originally announced August 2024.

  2. arXiv:2408.12158  [pdf, other]

    cs.CE cs.CY

    Could Bibliometrics Reveal Top Science and Technology Achievements and Researchers? The Case for Evaluatology-based Science and Technology Evaluation

    Authors: Guoxin Kang, Wanling Gao, Lei Wang, Chunjie Luo, Hainan Ye, Qian He, Shaopeng Dai, Jianfeng Zhan

    Abstract: By utilizing statistical methods to analyze bibliographic data, bibliometrics faces inherent limitations in identifying the most significant science and technology achievements and researchers. To overcome this challenge, we present an evaluatology-based science and technology evaluation methodology. At the heart of this approach lies the concept of an extended evaluation condition, encompassing e…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 18 pages, 8 figures, and 2 tables

  3. arXiv:2408.10899  [pdf, other]

    cs.RO

    All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents

    Authors: Zhiqiang Wang, Hao Zheng, Yunshuang Nie, Wenjun Xu, Qingwei Wang, Hua Ye, Zhe Li, Kaidong Zhang, Xuewen Cheng, Wanxi Dong, Chang Cai, Liang Lin, Feng Zheng, Xiaodan Liang

    Abstract: Embodied AI is transforming how AI systems interact with the physical world, yet existing datasets are inadequate for developing versatile, general-purpose agents. These limitations include a lack of standardized formats, insufficient data diversity, and inadequate data volume. To address these issues, we introduce ARIO (All Robots In One), a new data standard that enhances existing datasets by of…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Project website: https://imaei.github.io/project_pages/ario/

  4. arXiv:2408.08977  [pdf, other]

    cs.DC

    FedFQ: Federated Learning with Fine-Grained Quantization

    Authors: Haowei Li, Weiying Xie, Hangyu Ye, Jitao Ma, Shuran Ma, Yunsong Li

    Abstract: Federated learning (FL) is a decentralized approach, enabling multiple participants to collaboratively train a model while ensuring the protection of data privacy. The transmission of updates from numerous edge clusters to the server creates a significant communication bottleneck in FL. Quantization is an effective compression technology, showcasing immense potential in addressing this bottleneck…

    Submitted 16 August, 2024; originally announced August 2024.

  5. arXiv:2408.04693  [pdf, other]

    cs.CL cs.AI cs.LG

    Understanding the Performance and Estimating the Cost of LLM Fine-Tuning

    Authors: Yuchen Xia, Jiho Kim, Yuhan Chen, Haojie Ye, Souvik Kundu, Cong Hao, Nishil Talati

    Abstract: Due to the cost-prohibitive nature of training Large Language Models (LLMs), fine-tuning has emerged as an attractive alternative for specializing LLMs for specific tasks using limited compute resources in a cost-effective manner. In this paper, we characterize sparse Mixture of Experts (MoE) based LLM fine-tuning to understand their accuracy and runtime performance on a single GPU. Our evaluation…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 10 pages, conference

  6. arXiv:2408.04144  [pdf, other]

    cs.CV

    Integrated Dynamic Phenological Feature for Remote Sensing Image Land Cover Change Detection

    Authors: Yi Liu, Chenhao Sun, Hao Ye, Xiangying Liu, Weilong Ju

    Abstract: Remote sensing image change detection (CD) is essential for analyzing land surface changes over time, with a significant challenge being the differentiation of actual changes from complex scenes while filtering out pseudo-changes. A primary contributor to this challenge is the intra-class dynamic changes due to phenological characteristics in natural areas. To overcome this, we introduce the InPhe…

    Submitted 7 August, 2024; originally announced August 2024.

  7. arXiv:2407.19385  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.NC

    Multi-modal Imaging Genomics Transformer: Attentive Integration of Imaging with Genomic Biomarkers for Schizophrenia Classification

    Authors: Nagur Shareef Shaik, Teja Krishna Cherukuri, Vince D. Calhoun, Dong Hye Ye

    Abstract: Schizophrenia (SZ) is a severe brain disorder marked by diverse cognitive impairments, abnormalities in brain structure, function, and genetic factors. Its complex symptoms and overlap with other psychiatric conditions challenge traditional diagnostic methods, necessitating advanced systems to improve precision. Existing research studies have mostly focused on imaging data, such as structural and…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: Accepted for presentation at the AI for Imaging Genomic Learning (AIIG) Workshop, MICCAI 2024

  8. arXiv:2407.12705  [pdf, other]

    cs.CV

    IMAGDressing-v1: Customizable Virtual Dressing

    Authors: Fei Shen, Xin Jiang, Xin He, Hu Ye, Cong Wang, Xiaoyu Du, Zechao Li, Jinhui Tang

    Abstract: Latest advances have achieved realistic virtual try-on (VTON) through localized garment inpainting using latent diffusion models, significantly enhancing consumers' online shopping experience. However, existing VTON technologies neglect the need for merchants to showcase garments comprehensively, including flexible control over garments, optional faces, poses, and scenes. To address this issue, we…

    Submitted 6 August, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

  9. arXiv:2407.11736  [pdf, other]

    cs.RO cs.CV

    GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection

    Authors: Jingwen Yu, Hanjing Ye, Jianhao Jiao, Ping Tan, Hong Zhang

    Abstract: Visual loop closure detection is an important module in visual simultaneous localization and mapping (SLAM), which associates current camera observation with previously visited places. Loop closures correct drifts in trajectory estimation to build a globally consistent map. However, a false loop closure can be fatal, so verification is required as an additional step to ensure robustness by rejecti…

    Submitted 16 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 9 pages, 11 figures, Accepted by IROS 2024

  10. arXiv:2407.10439  [pdf, other]

    cs.CV

    PolyRoom: Room-aware Transformer for Floorplan Reconstruction

    Authors: Yuzhou Liu, Lingjie Zhu, Xiaodong Ma, Hanqiao Ye, Xiang Gao, Xianwei Zheng, Shuhan Shen

    Abstract: Reconstructing geometry and topology structures from raw unstructured data has always been an important research topic in indoor mapping research. In this paper, we aim to reconstruct the floorplan with a vectorized representation from point clouds. Despite significant advancements achieved in recent years, current methods still encounter several challenges, such as missing corners or edges, inacc…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  11. arXiv:2407.08632  [pdf, other]

    cs.LG

    Generalization Error Matters in Decentralized Learning Under Byzantine Attacks

    Authors: Haoxiang Ye, Qing Ling

    Abstract: Recently, decentralized learning has emerged as a popular peer-to-peer signal and information processing paradigm that enables model training across geographically distributed agents in a scalable manner, without the presence of any central server. When some of the agents are malicious (also termed as Byzantine), resilient decentralized learning algorithms are able to limit the impact of these Byz…

    Submitted 11 July, 2024; originally announced July 2024.

  12. arXiv:2407.05383  [pdf, other]

    cs.CV

    Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking

    Authors: You Wu, Xucheng Wang, Dan Zeng, Hengzhou Ye, Xiaolan Xie, Qijun Zhao, Shuiwang Li

    Abstract: Recently, the surge in the adoption of single-stream architectures utilizing pre-trained ViT backbones represents a promising advancement in the field of generic visual tracking. By integrating feature extraction and fusion into a cohesive framework, these architectures offer improved performance, efficiency, and robustness. However, there has been limited exploration into optimizing these framewo…

    Submitted 7 July, 2024; originally announced July 2024.

  13. arXiv:2407.05364  [pdf, other]

    cs.LG

    PTaRL: Prototype-based Tabular Representation Learning via Space Calibration

    Authors: Hangting Ye, Wei Fan, Xiaozhuang Song, Shun Zheng, He Zhao, Dandan Guo, Yi Chang

    Abstract: Tabular data have been playing an important role in diverse real-world fields, such as healthcare, engineering, finance, etc. With the recent success of deep learning, many tabular machine learning (ML) methods based on deep networks (e.g., Transformer, ResNet) have achieved competitive performance on tabular benchmarks. However, existing deep tabular ML methods suffer from the representatio…

    Submitted 15 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ICLR 2024

  14. arXiv:2407.04057  [pdf, other]

    cs.LG

    TALENT: A Tabular Analytics and Learning Toolbox

    Authors: Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, Han-Jia Ye

    Abstract: Tabular data is one of the most common data sources in machine learning. Although a wide range of classical methods demonstrate practical utilities in this field, deep learning methods on tabular data are becoming promising alternatives due to their flexibility and ability to capture complex interactions within the data. Considering that deep tabular methods have diverse design philosophies, inclu…

    Submitted 4 July, 2024; originally announced July 2024.

  15. arXiv:2407.03257  [pdf, other]

    cs.LG

    Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later

    Authors: Han-Jia Ye, Huai-Hong Yin, De-Chuan Zhan

    Abstract: The growing success of deep learning in various domains has prompted investigations into its application to tabular data, where deep models have shown promising results compared to traditional tree-based methods. In this paper, we revisit Neighborhood Component Analysis (NCA), a classic tabular prediction method introduced in 2004, designed to learn a linear projection that captures semantic simil…

    Submitted 3 July, 2024; originally announced July 2024.

  16. arXiv:2407.02482  [pdf, other]

    cs.CV

    Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models

    Authors: Fei Shen, Hu Ye, Sibo Liu, Jun Zhang, Cong Wang, Xiao Han, Wei Yang

    Abstract: Recent research showcases the considerable potential of conditional diffusion models for generating consistent stories. However, current methods, which predominantly generate stories in an autoregressive and excessively caption-dependent manner, often underrate the contextual consistency and relevance of frames during sequential generation. To address this, we propose a novel Rich-contextual Condi…

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  17. arXiv:2407.02320  [pdf, other]

    cs.CL cs.AI

    Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts

    Authors: Chunlan Ma, Yihong Liu, Haotian Ye, Hinrich Schütze

    Abstract: Decoder-only large language models (LLMs) excel in high-resource languages across various tasks through few-shot or even zero-shot in-context learning (ICL). However, their performance often does not transfer well to low-resource languages, especially those written in non-Latin scripts. Inspired by recent work that leverages transliteration in encoder-only models, we investigate whether transliter…

    Submitted 2 July, 2024; originally announced July 2024.

  18. arXiv:2407.01509  [pdf, other]

    cs.CV cs.CL

    MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

    Authors: Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier, Peter Grasch, Yinfei Yang, Zhe Gan

    Abstract: We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models' compliance with layered instructions in generating accurate responses that satisfy specific requested patterns. Evaluation results fro…

    Submitted 25 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  19. Deep learning for automated detection of breast cancer in deep ultraviolet fluorescence images with diffusion probabilistic model

    Authors: Sepehr Salem Ghahfarokhi, Tyrell To, Julie Jorns, Tina Yen, Bing Yu, Dong Hye Ye

    Abstract: Data limitation is a significant challenge in applying deep learning to medical images. Recently, the diffusion probabilistic model (DPM) has shown the potential to generate high-quality images by converting Gaussian random noise into realistic images. In this paper, we apply the DPM to augment the deep ultraviolet fluorescence (DUV) image dataset with an aim to improve breast cancer classificatio…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: IEEE International Symposium on Biomedical Imaging 2024

    Journal ref: 2024 IEEE International Symposium on Biomedical Imaging (ISBI), May 27-30, 2024, Athens, Greece

  20. arXiv:2407.00956  [pdf, other]

    cs.LG

    A Closer Look at Deep Learning on Tabular Data

    Authors: Han-Jia Ye, Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, De-Chuan Zhan

    Abstract: Tabular data is prevalent across various domains in machine learning. Although Deep Neural Network (DNN)-based methods have shown promising performance comparable to tree-based ones, in-depth evaluation of these methods is challenging due to varying performance ranks across diverse datasets. In this paper, we propose a comprehensive benchmark comprising 300 tabular datasets, covering a wide range…

    Submitted 1 July, 2024; originally announced July 2024.

  21. arXiv:2407.00502  [pdf, other]

    cs.LG cs.AI

    Deep Frequency Derivative Learning for Non-stationary Time Series Forecasting

    Authors: Wei Fan, Kun Yi, Hangting Ye, Zhiyuan Ning, Qi Zhang, Ning An

    Abstract: While most time series are non-stationary, it is inevitable for models to face the distribution shift issue in time series forecasting. Existing solutions manipulate statistical measures (usually mean and std.) to adjust time series distribution. However, these operations can be theoretically seen as the transformation towards zero frequency component of the spectrum which cannot reveal full distr…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted by IJCAI 2024

  22. Crowd-Sourced NeRF: Collecting Data from Production Vehicles for 3D Street View Reconstruction

    Authors: Tong Qin, Changze Li, Haoyang Ye, Shaowei Wan, Minzhen Li, Hongwei Liu, Ming Yang

    Abstract: Recently, Neural Radiance Fields (NeRF) achieved impressive results in novel view synthesis. Block-NeRF showed the capability of leveraging NeRF to build large city-scale models. For large-scale modeling, a large amount of image data is necessary. Collecting images from specially designed data-collection vehicles cannot support large-scale applications. How to acquire massive high-quality data remains an…

    Submitted 23 June, 2024; originally announced June 2024.

  23. arXiv:2406.14069  [pdf, other]

    eess.IV cs.CV

    Towards Multi-modality Fusion and Prototype-based Feature Refinement for Clinically Significant Prostate Cancer Classification in Transrectal Ultrasound

    Authors: Hong Wu, Juan Fu, Hongsheng Ye, Yuming Zhong, Xuebin Zou, Jianhua Zhou, Yi Wang

    Abstract: Prostate cancer is a highly prevalent cancer and ranks as the second leading cause of cancer-related deaths in men globally. Recently, the utilization of multi-modality transrectal ultrasound (TRUS) has gained significant traction as a valuable technique for guiding prostate biopsies. In this study, we propose a novel learning framework for clinically significant prostate cancer (csPCa) classifica…

    Submitted 20 June, 2024; originally announced June 2024.

  24. arXiv:2406.13129  [pdf, other]

    cs.CV cs.LG

    M3T: Multi-Modal Medical Transformer to bridge Clinical Context with Visual Insights for Retinal Image Medical Description Generation

    Authors: Nagur Shareef Shaik, Teja Krishna Cherukuri, Dong Hye Ye

    Abstract: Automated retinal image medical description generation is crucial for streamlining medical diagnosis and treatment planning. Existing challenges include the reliance on learned retinal image representations, difficulties in handling multiple imaging modalities, and the lack of clinical context in visual representations. Addressing these issues, we propose the Multi-Modal Medical Transformer (M3T),…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted for presentation at the IEEE International Conference on Image Processing (ICIP 2024)

  25. arXiv:2406.13126  [pdf, other]

    cs.CV cs.LG

    Guided Context Gating: Learning to leverage salient lesions in retinal fundus images

    Authors: Teja Krishna Cherukuri, Nagur Shareef Shaik, Dong Hye Ye

    Abstract: Effectively representing medical images, especially retinal images, presents a considerable challenge due to variations in appearance, size, and contextual information of pathological signs called lesions. Precise discrimination of these lesions is crucial for diagnosing vision-threatening issues such as diabetic retinopathy. While visual attention-based neural networks have been introduced to lea…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted for presentation at the IEEE International Conference on Image Processing (ICIP 2024)

  26. arXiv:2406.12683  [pdf, other]

    cs.CV cs.LG

    Spatial Sequence Attention Network for Schizophrenia Classification from Structural Brain MR Images

    Authors: Nagur Shareef Shaik, Teja Krishna Cherukuri, Vince Calhoun, Dong Hye Ye

    Abstract: Schizophrenia is a debilitating, chronic mental disorder that significantly impacts an individual's cognitive abilities, behavior, and social interactions. It is characterized by subtle morphological changes in the brain, particularly in the gray matter. These changes are often imperceptible through manual observation, demanding an automated approach to diagnosis. This study introduces a deep lear…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted for the 21st IEEE International Symposium on Biomedical Imaging (ISBI 2024)

  27. arXiv:2406.11633  [pdf, other]

    cs.CV

    DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

    Authors: Renqiu Xia, Song Mao, Xiangchao Yan, Hongbin Zhou, Bo Zhang, Haoyang Peng, Jiahao Pi, Daocheng Fu, Wenjie Wu, Hancheng Ye, Shiyang Feng, Bin Wang, Chao Xu, Conghui He, Pinlong Cai, Min Dou, Botian Shi, Sheng Zhou, Yongwei Wang, Bin Wang, Junchi Yan, Fei Wu, Yu Qiao

    Abstract: Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific document-oriented tasks is therefore meaningful. Despite promising advancements, large models still perform poorly on multi-page scientific document extract…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Homepage of DocGenome: https://unimodal4reasoning.github.io/DocGenome_page; 22 pages, 11 figures

  28. arXiv:2406.10903  [pdf, other]

    cs.LG cs.CL cs.SE

    New Solutions on LLM Acceleration, Optimization, and Application

    Authors: Yingbing Huang, Lily Jiaxin Wan, Hanchen Ye, Manvi Jha, Jinghua Wang, Yuhong Li, Xiaofan Zhang, Deming Chen

    Abstract: Large Language Models (LLMs) have become extremely potent instruments with exceptional capacities for comprehending and producing human-like text in a wide range of applications. However, the increasing size and complexity of LLMs present significant challenges in both training and deployment, leading to substantial computational and storage costs as well as heightened energy consumption. In this…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: This is an expanded and more comprehensive study based on our invited DAC-24 paper with the same title and co-authors

  29. arXiv:2406.10621  [pdf, other]

    cs.CL cs.AI

    StrucText-Eval: An Autogenerated Benchmark for Evaluating Large Language Model's Ability in Structure-Rich Text Understanding

    Authors: Zhouhong Gu, Haoning Ye, Zeyang Zhou, Hongwei Feng, Yanghua Xiao

    Abstract: Given the substantial volumes of structured data held by many companies, enabling Large Language Models (LLMs) to directly understand structured text in non-structured forms could significantly enhance their capabilities across various business scenarios. To this end, we propose an evaluation data generation method for assessing LLMs' ability to understand structure-rich text, which generates…

    Submitted 30 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  30. arXiv:2406.08477  [pdf, other]

    cs.IR

    Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens

    Authors: Ting-Ji Huang, Jia-Qi Yang, Chunxu Shen, Kai-Qi Liu, De-Chuan Zhan, Han-Jia Ye

    Abstract: Characterizing users and items through vector representations is crucial for various tasks in recommender systems. Recent approaches attempt to apply Large Language Models (LLMs) in recommendation through a question and answer format, where real users and items (e.g., Item No.2024) are represented with in-vocabulary tokens (e.g., "item", "20", "24"). However, since LLMs are typically pretrained on…

    Submitted 12 June, 2024; originally announced June 2024.

  31. arXiv:2406.08037  [pdf, other]

    cs.CV

    Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking

    Authors: Xiangyang Yang, Dan Zeng, Xucheng Wang, You Wu, Hengzhou Ye, Qijun Zhao, Shuiwang Li

    Abstract: Empowered by transformer-based models, visual tracking has advanced significantly. However, the slow speed of current trackers limits their applicability on devices with constrained computational resources. To address this challenge, we introduce ABTrack, an adaptive computation framework that adaptively bypasses transformer blocks for efficient visual tracking. The rationale behind ABTrack is ro…

    Submitted 1 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  32. arXiv:2406.05982  [pdf]

    eess.IV cs.LG physics.med-ph

    Artificial Intelligence for Neuro MRI Acquisition: A Review

    Authors: Hongjia Yang, Guanhua Wang, Ziyu Li, Haoxiang Li, Jialan Zheng, Yuxin Hu, Xiaozhi Cao, Congyu Liao, Huihui Ye, Qiyuan Tian

    Abstract: Magnetic resonance imaging (MRI) has significantly benefited from the resurgence of artificial intelligence (AI). By leveraging AI's capabilities in large-scale optimization and pattern recognition, innovative methods are transforming the MRI acquisition workflow, including planning, sequence design, and correction of acquisition artifacts. These emerging algorithms demonstrate substantial potenti…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Magn Reson Mater Phy (2024)

  33. arXiv:2406.05892  [pdf, other]

    cs.CR cs.LG cs.SE

    Security Vulnerability Detection with Multitask Self-Instructed Fine-Tuning of Large Language Models

    Authors: Aidan Z. H. Yang, Haoye Tian, He Ye, Ruben Martins, Claire Le Goues

    Abstract: Software security vulnerabilities allow attackers to perform malicious activities to disrupt software operations. Recent Transformer-based language models have significantly advanced vulnerability detection, surpassing the capabilities of static analysis based deep learning models. However, language models trained solely on code tokens do not capture either the explanation of vulnerability type or…

    Submitted 9 June, 2024; originally announced June 2024.

  34. arXiv:2406.04584  [pdf, other]

    cs.LG cs.AI cs.CV

    CLoG: Benchmarking Continual Learning of Image Generation Models

    Authors: Haotian Zhang, Junting Zhou, Haowei Lin, Hang Ye, Jianhua Zhu, Zihao Wang, Liangcai Gao, Yizhou Wang, Yitao Liang

    Abstract: Continual Learning (CL) poses a significant challenge in Artificial Intelligence, aiming to mirror the human ability to incrementally acquire knowledge and skills. While extensive research has focused on CL within the context of classification tasks, the advent of increasingly powerful generative models necessitates the exploration of Continual Learning of Generative models (CLoG). This paper advo…

    Submitted 6 June, 2024; originally announced June 2024.

  35. arXiv:2406.04214  [pdf, other]

    cs.CL

    ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models

    Authors: Yuanyi Ren, Haoran Ye, Hanjun Fang, Xin Zhang, Guojie Song

    Abstract: Large Language Models (LLMs) are transforming diverse fields and gaining increasing influence as human proxies. This development underscores the urgent need for evaluating value orientations and understanding of LLMs to ensure their responsible integration into public-facing applications. This work introduces ValueBench, the first comprehensive psychometric benchmark for evaluating value orientati…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024

  36. arXiv:2406.03496  [pdf, other]

    cs.CL cs.AI cs.LG

    Wings: Learning Multimodal LLMs without Text-only Forgetting

    Authors: Yi-Kai Zhang, Shiyin Lu, Yang Li, Yanqing Ma, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye

    Abstract: Multimodal large language models (MLLMs), initiated with a trained LLM, first align images with text and then fine-tune on multimodal mixed inputs. However, the MLLM catastrophically forgets the text-only instructions, which do not include images and can be addressed within the initial LLM. In this paper, we present Wings, a novel MLLM that excels in both text-only dialogues and multimodal compreh…

    Submitted 5 June, 2024; originally announced June 2024.

  37. arXiv:2406.02651  [pdf, other]

    cs.LG cs.AI cs.NI

    RoutePlacer: An End-to-End Routability-Aware Placer with Graph Neural Network

    Authors: Yunbo Hou, Haoran Ye, Yingxue Zhang, Siyuan Xu, Guojie Song

    Abstract: Placement is a critical and challenging step of modern chip design, with routability being an essential indicator of placement quality. Current routability-oriented placers typically apply an iterative two-stage approach, wherein the first stage generates a placement solution, and the second stage provides non-differentiable routing results to heuristically improve the solution quality. This metho…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted at KDD 2024

  38. arXiv:2406.02539  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Parrot: Multilingual Visual Instruction Tuning

    Authors: Hai-Long Sun, Da-Wei Zhou, Yang Li, Shiyin Lu, Chao Yi, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye

    Abstract: The rapid development of Multimodal Large Language Models (MLLMs) like GPT-4V has marked a significant step towards artificial general intelligence. Existing methods mainly focus on aligning vision encoders with LLMs through supervised fine-tuning (SFT) to endow LLMs with multimodal abilities, making MLLMs' inherent ability to react to multiple languages progressively deteriorate as the training p…

    Submitted 11 August, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Code is available at: https://github.com/AIDC-AI/Parrot

  39. arXiv:2405.20797  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Ovis: Structural Embedding Alignment for Multimodal Large Language Model

    Authors: Shiyin Lu, Yang Li, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Han-Jia Ye

    Abstract: Current Multimodal Large Language Models (MLLMs) typically integrate a pre-trained LLM with another pre-trained vision transformer through a connector, such as an MLP, endowing the LLM with visual capabilities. However, the misalignment between two embedding strategies in MLLMs -- the structural textual embeddings based on an embedding look-up table and the continuous embeddings generated directly…

    Submitted 17 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  40. arXiv:2405.19335  [pdf, other]

    cs.CV cs.CL cs.LG

    X-VILA: Cross-Modality Alignment for Large Language Model

    Authors: Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu Yin

    Abstract: We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities. By aligning modality-specific encoders with LLM inputs and diffusion decoders with LLM outputs, X-VILA achieves cross-modality understanding, reasoning, and generation. To facilitate this cross-modality alignment, we curate an effectiv…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Technical Report

  41. arXiv:2405.18119  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Low-Resource Crop Classification from Multi-Spectral Time Series Using Lossless Compressors

    Authors: Wei Cheng, Hongrui Ye, Xiao Wen, Jiachen Zhang, Jiping Xu, Feifan Zhang

    Abstract: Deep learning has significantly improved the accuracy of crop classification using multispectral temporal data. However, these models have complex structures with numerous parameters, requiring large amounts of data and costly training. In low-resource situations with fewer labeled samples, deep learning models perform poorly due to insufficient data. Conversely, compressors are data-type agnostic…

    Submitted 5 July, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 8 pages, 10 figures

  42. arXiv:2405.17761  [pdf, other]

    cs.LG math.OC

    Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient

    Authors: Hao Di, Haishan Ye, Yueling Zhang, Xiangyu Chang, Guang Dai, Ivor W. Tsang

    Abstract: Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods. However, in composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from the random gradient estimation. To reduce this variance, prior works require…

    Submitted 27 May, 2024; originally announced May 2024.

  43. arXiv:2405.16414  [pdf, other]

    cs.CV

    PPRSteg: Printing and Photography Robust QR Code Steganography via Attention Flow-Based Model

    Authors: Huayuan Ye, Shenzhuo Zhang, Shiqi Jiang, Jing Liao, Shuhang Gu, Changbo Wang, Chenhui Li

    Abstract: Image steganography can hide information in a host image and obtain a stego image that is perceptually indistinguishable from the original one. This technique has tremendous potential in scenarios like copyright protection, information retrospection, etc. Some previous studies have proposed to enhance the robustness of the methods against image disturbances to increase their applicability. However…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 9 content pages

  44. arXiv:2405.16256  [pdf, other]

    cs.DC cs.AI

    HETHUB: A Distributed Training System with Heterogeneous Cluster for Large-Scale Models

    Authors: Si Xu, Zixiao Huang, Yan Zeng, Shengen Yan, Xuefei Ning, Quanlu Zhang, Haolin Ye, Sipei Gu, Chunsheng Shui, Zhezheng Lin, Hao Zhang, Sheng Wang, Guohao Dai, Yu Wang

    Abstract: Training large-scale models relies on a vast number of computing resources. For example, training the GPT-4 model (1.8 trillion parameters) requires 25000 A100 GPUs. It is a challenge to build a large-scale cluster with one type of GPU-accelerator. Using multiple types of GPU-accelerators to construct a large-scale cluster is an effective way to solve the problem of insufficient homogeneous GPU-a…

    Submitted 8 August, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  45. arXiv:2405.16126  [pdf, other]

    math.OC cs.LG

    Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity

    Authors: Qihao Zhou, Haishan Ye, Luo Luo

    Abstract: This paper considers the distributed convex-concave minimax optimization under the second-order similarity. We propose the stochastic variance-reduced optimistic gradient sliding (SVOGS) method, which takes advantage of the finite-sum structure in the objective by involving the mini-batch client sampling and variance reduction. We prove SVOGS can achieve the $\varepsilon$-duality gap within commun…

    Submitted 25 May, 2024; originally announced May 2024.

  46. arXiv:2405.09913  [pdf, other]

    cs.CL

    TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data

    Authors: Yihong Liu, Chunlan Ma, Haotian Ye, Hinrich Schütze

    Abstract: Transliterating related languages that use different scripts into a common script shows effectiveness in improving crosslingual transfer in downstream tasks. However, this methodology often makes pretraining a model from scratch unavoidable, as transliteration brings about new subwords not covered in existing multilingual pretrained language models (mPLMs). This is not desired because it takes a l…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: preprint

  47. arXiv:2405.07508  [pdf, other]

    cs.SE

    Revealing the value of Repository Centrality in lifespan prediction of Open Source Software Projects

    Authors: Runzhi He, Hengzhi Ye, Minghui Zhou

    Abstract: Background: Open Source Software is the building block of modern software. However, the prevalence of project deprecation in the open source world weakens the integrity of the downstream systems and the broad ecosystem. Therefore it calls for efforts in monitoring and predicting project deprecations, empowering stakeholders to take proactive measures. Challenge: Existing techniques mainly focus on…

    Submitted 13 May, 2024; originally announced May 2024.

  48. arXiv:2405.05514  [pdf, other]

    cs.RO

    HPPS: A Hierarchical Progressive Perception System for Luggage Trolley Detection and Localization at Airports

    Authors: Zhirui Sun, Zhe Zhang, Jieting Zhao, Hanjing Ye, Jiankun Wang

    Abstract: The robotic autonomous luggage trolley collection system employs robots to gather and transport scattered luggage trolleys at airports. However, existing methods for detecting and locating these luggage trolleys often fail when they are not fully visible. To address this, we introduce the Hierarchical Progressive Perception System (HPPS), which enhances the detection and localization of luggage tr…

    Submitted 8 May, 2024; originally announced May 2024.

  49. arXiv:2404.18143  [pdf, other]

    cs.CV

    Tracking Transforming Objects: A Benchmark

    Authors: You Wu, Yuelong Wang, Yaxin Liao, Fuliang Wu, Hengzhou Ye, Shuiwang Li

    Abstract: Tracking transforming objects holds significant importance in various fields due to the dynamic nature of many real-world scenarios. By enabling systems to accurately represent transforming objects over time, tracking transforming objects facilitates advancements in areas such as autonomous systems, human-computer interaction, and security applications. Moreover, understanding the behavior of transfo…

    Submitted 7 July, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  50. arXiv:2404.17753  [pdf, other]

    cs.CV cs.AI

    Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification

    Authors: Chao Yi, Lu Ren, De-Chuan Zhan, Han-Jia Ye

    Abstract: CLIP showcases exceptional cross-modal matching capabilities due to its training on image-text contrastive learning tasks. However, without specific optimization for unimodal scenarios, its performance in single-modality feature extraction might be suboptimal. Despite this, some studies have directly used CLIP's image encoder for tasks like few-shot classification, introducing a misalignment betwe…

    Submitted 26 April, 2024; originally announced April 2024.