Skip to main content

Showing 1–50 of 824 results for author: He, Z

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.16431  [pdf, other

    cs.CV

    Discriminative Spatial-Semantic VOS Solution: 1st Place Solution for 6th LSVOS

    Authors: Deshui Miao, Yameng Gu, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang

    Abstract: Video object segmentation (VOS) is a crucial task in computer vision, but current VOS methods struggle with complex scenes and prolonged object motions. To address these challenges, the MOSE dataset aims to enhance object recognition and differentiation in complex environments, while the LVOS dataset focuses on segmenting objects exhibiting long-term, intricate movements. This report introduces a… ▽ More

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 1st Place Solution for 6th LSVOS VOS Track. arXiv admin note: substantial text overlap with arXiv:2406.04600

  2. arXiv:2408.15549  [pdf, other

    cs.CL

    WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback

    Authors: Taiwei Shi, Zhuoer Wang, Longqi Yang, Ying-Chun Lin, Zexue He, Mengting Wan, Pei Zhou, Sujay Jauhar, Xiaofeng Xu, Xia Song, Jennifer Neville

    Abstract: As large language models (LLMs) continue to advance, aligning these models with human preferences has emerged as a critical challenge. Traditional alignment methods, relying on human or LLM annotated datasets, are limited by their resource-intensive nature, inherent subjectivity, and the risk of feedback loops that amplify model biases. To overcome these limitations, we introduce WildFeedback, a n… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 24 pages

  3. arXiv:2408.13983  [pdf, other

    cs.CV

    Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation

    Authors: Yushun Tang, Shuoshuo Chen, Zhihe Lu, Xinchao Wang, Zhihai He

    Abstract: Transformer-based methods have achieved remarkable success in various machine learning tasks. How to design efficient test-time adaptation methods for transformer models becomes an important research task. In this work, motivated by the dual-subband wavelet lifting scheme developed in multi-scale signal processing which is able to efficiently separate the input signals into principal components an… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

  4. arXiv:2408.13898  [pdf, other

    cs.CV

    Evaluating Attribute Comprehension in Large Vision-Language Models

    Authors: Haiwen Zhang, Zixi Yang, Yuanzhi Liu, Xinran Wang, Zheqi He, Kongming Liang, Zhanyu Ma

    Abstract: Currently, large vision-language models have gained promising progress on many downstream tasks. However, they still suffer many challenges in fine-grained visual understanding tasks, such as object attribute comprehension. Besides, there have been growing efforts on the evaluations of large vision-language models, but lack of in-depth study of attribute comprehension and the visual language fine-… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 15 pages, 4 figures

  5. arXiv:2408.12664  [pdf, other

    cs.AI q-bio.NC

    Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience

    Authors: Zhonghao He, Jascha Achterberg, Katie Collins, Kevin Nejad, Danyal Akarca, Yinzhu Yang, Wes Gurnee, Ilia Sucholutsky, Yuhan Tang, Rebeca Ianov, George Ogden, Chole Li, Kai Sandbrink, Stephen Casper, Anna Ivanova, Grace W. Lindsay

    Abstract: As deep learning systems are scaled up to many billions of parameters, relating their internal structure to external behaviors becomes very challenging. Although daunting, this problem is not new: Neuroscientists and cognitive scientists have accumulated decades of experience analyzing a particularly complex system - the brain. In this work, we argue that interpreting both biological and artificia… ▽ More

    Submitted 25 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

  6. arXiv:2408.11795  [pdf, other

    cs.CV

    EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model

    Authors: Feipeng Ma, Yizhou Zhou, Hebei Li, Zilong He, Siying Wu, Fengyun Rao, Yueyi Zhang, Xiaoyan Sun

    Abstract: In the realm of multimodal research, numerous studies leverage substantial image-text pairs to conduct modal alignment learning, transforming Large Language Models (LLMs) into Multimodal LLMs and excelling in a variety of visual-language tasks. The prevailing methodologies primarily fall into two categories: self-attention-based and cross-attention-based methods. While self-attention-based methods… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

  7. arXiv:2408.10946  [pdf, other

    cs.AI

    Large Language Model Driven Recommendation

    Authors: Anton Korikov, Scott Sanner, Yashar Deldjoo, Zhankui He, Julian McAuley, Arnau Ramisa, Rene Vidal, Mahesh Sathiamoorthy, Atoosa Kasrizadeh, Silvia Milano, Francesco Ricci

    Abstract: While previous chapters focused on recommendation systems (RSs) based on standardized, non-verbal user feedback such as purchases, views, and clicks -- the advent of LLMs has unlocked the use of natural language (NL) interactions for recommendation. This chapter discusses how LLMs' abilities for general NL reasoning present novel opportunities to build highly personalized RSs -- which can effectiv… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  8. arXiv:2408.09538  [pdf, other

    quant-ph cs.ET

    Parameter Setting Heuristics Make the Quantum Approximate Optimization Algorithm Suitable for the Early Fault-Tolerant Era

    Authors: Zichang He, Ruslan Shaydulin, Dylan Herman, Changhao Li, Rudy Raymond, Shree Hari Sureshbabu, Marco Pistoia

    Abstract: Quantum Approximate Optimization Algorithm (QAOA) is one of the most promising quantum heuristics for combinatorial optimization. While QAOA has been shown to perform well on small-scale instances and to provide an asymptotic speedup over state-of-the-art classical algorithms for some problems, fault-tolerance is understood to be required to realize this speedup in practice. The low resource requi… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 7 pages, an invited paper at ICCAD 2024 "Exploring Quantum Technologies in Practical Applications" special session

  9. arXiv:2408.09403  [pdf, other

    cs.AI cs.CV

    Obtaining Optimal Spiking Neural Network in Sequence Learning via CRNN-SNN Conversion

    Authors: Jiahao Su, Kang You, Zekai Xu, Weizhi Xu, Zhezhi He

    Abstract: Spiking neural networks (SNNs) are becoming a promising alternative to conventional artificial neural networks (ANNs) due to their rich neural dynamics and the implementation of energy-efficient neuromorphic chips. However, the non-differential binary communication mechanism makes SNN hard to converge to an ANN-level accuracy. When SNN encounters sequence learning, the situation becomes worse due… ▽ More

    Submitted 25 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted by 33rd International Conference on Artificial Neural Networks

  10. arXiv:2408.09366  [pdf, other

    cs.CL cs.CY cs.SI

    Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities

    Authors: Minh Duc Chu, Zihao He, Rebecca Dorn, Kristina Lerman

    Abstract: Large language models (LLMs) have shown promise in representing individuals and communities, offering new ways to study complex social dynamics. However, effectively aligning LLMs with specific human groups and systematically assessing the fidelity of the alignment remains a challenge. This paper presents a robust framework for aligning LLMs with online communities via instruction-tuning and compr… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  11. arXiv:2408.08933  [pdf, other

    cs.IR cs.AI cs.DB

    RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search

    Authors: Meng Chen, Kai Zhang, Zhenying He, Yinan Jing, X. Sean Wang

    Abstract: Approximate Nearest Neighbor Search (ANNS) is a fundamental and critical component in many applications, including recommendation systems and large language model-based applications. With the advancement of multimodal neural models, which transform data from different modalities into a shared high-dimensional space as feature vectors, cross-modal ANNS aims to use the data vector from one modality… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: to be published in PVLDB

  12. arXiv:2408.08930  [pdf, other

    cs.CR cs.AI cs.CL

    DePrompt: Desensitization and Evaluation of Personal Identifiable Information in Large Language Model Prompts

    Authors: Xiongtao Sun, Gan Liu, Zhipeng He, Hui Li, Xiaoguang Li

    Abstract: Prompt serves as a crucial link in interacting with large language models (LLMs), widely impacting the accuracy and interpretability of model outputs. However, acquiring accurate and high-quality responses necessitates precise prompts, which inevitably pose significant risks of personal identifiable information (PII) leakage. Therefore, this paper proposes DePrompt, a desensitization protection an… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

  13. arXiv:2408.07600  [pdf, other

    cs.CV

    Disentangle and denoise: Tackling context misalignment for video moment retrieval

    Authors: Kaijing Ma, Han Fang, Xianghao Zang, Chao Ban, Lanxiang Zhou, Zhongjiang He, Yongxiang Li, Hao Sun, Zerun Feng, Xingsong Hou

    Abstract: Video Moment Retrieval, which aims to locate in-context video moments according to a natural language query, is an essential task for cross-modal grounding. Existing methods focus on enhancing the cross-modal interactions between all moments and the textual description for video understanding. However, constantly interacting with all locations is unreasonable because of uneven semantic distributio… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

  14. arXiv:2408.06494  [pdf, other

    cs.HC cs.CL cs.CV

    What Color Scheme is More Effective in Assisting Readers to Locate Information in a Color-Coded Article?

    Authors: Ho Yin Ng, Zeyu He, Ting-Hao 'Kenneth' Huang

    Abstract: Color coding, a technique assigning specific colors to cluster information types, has proven advantages in aiding human cognitive activities, especially reading and comprehension. The rise of Large Language Models (LLMs) has streamlined document coding, enabling simple automatic text labeling with various schemes. This has the potential to make color-coding more accessible and benefit more users.… ▽ More

    Submitted 26 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: This paper will appear at IEEE VIS 2024

  15. arXiv:2408.05842  [pdf, other

    cs.AI cs.HC

    Scaling Virtual World with Delta-Engine

    Authors: Hongqiu Wu, Zekai Xu, Tianyang Xu, Jiale Hong, Weiqi Wu, Yan Wang, Hai Zhao, Min Zhang, Zhezhi He

    Abstract: In this paper, we focus on the \emph{virtual world}, a cyberspace where people can live in. An ideal virtual world shares great similarity with our real world. One of the crucial aspects is its evolving nature, reflected by individuals' capability to grow and thereby influence the objective world. Such dynamics is unpredictable and beyond the reach of existing systems. For this, we propose a speci… ▽ More

    Submitted 22 August, 2024; v1 submitted 11 August, 2024; originally announced August 2024.

  16. arXiv:2408.03748  [pdf, other

    cs.CV

    Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model

    Authors: Guoqing Zhu, Honghu Pan, Qiang Wang, Chao Tian, Chao Yang, Zhenyu He

    Abstract: In challenging low light and adverse weather conditions,thermal vision algorithms,especially object detection,have exhibited remarkable potential,contrasting with the frequent struggles encountered by visible vision algorithms. Nevertheless,the efficacy of thermal vision algorithms driven by deep learning models remains constrained by the paucity of available training data samples. To this end,thi… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: accepted by ACM MM 2024/ACM MM24

  17. arXiv:2408.02263  [pdf, other

    cs.CV

    VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking

    Authors: Yuxuan Lu, Jiahao Nie, Zhiwei He, Hongjie Gu, Xudong Lv

    Abstract: Current LiDAR point cloud-based 3D single object tracking (SOT) methods typically rely on point-based representation network. Despite demonstrated success, such networks suffer from some fundamental problems: 1) It contains pooling operation to cope with inherently disordered point clouds, hindering the capture of 3D spatial information that is useful for tracking, a regression task. 2) The adopte… ▽ More

    Submitted 5 August, 2024; originally announced August 2024.

  18. arXiv:2408.01800  [pdf, other

    cs.CV

    MiniCPM-V: A GPT-4V Level MLLM on Your Phone

    Authors: Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

    Abstract: The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain preventing MLLMs from being practical in real-world applications. The most notable challenge comes from the huge cost of running an MLLM with a massive number of par… ▽ More

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: preprint

  19. arXiv:2408.01137  [pdf, other

    cs.CV

    PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network

    Authors: Changqun Xia, Chenxi Xie, Zhentao He, Tianshu Yu, Jia Li

    Abstract: We present an advanced study on more challenging high-resolution salient object detection (HRSOD) from both dataset and network framework perspectives. To compensate for the lack of HRSOD dataset, we thoughtfully collect a large-scale high resolution salient object detection dataset, called UHRSD, containing 5,920 images from real-world complex scenarios at 4K-8K resolutions. All the images are fi… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

  20. arXiv:2408.00557  [pdf, other

    quant-ph cs.ET

    End-to-End Protocol for High-Quality QAOA Parameters with Few Shots

    Authors: Tianyi Hao, Zichang He, Ruslan Shaydulin, Jeffrey Larson, Marco Pistoia

    Abstract: The quantum approximate optimization algorithm (QAOA) is a quantum heuristic for combinatorial optimization that has been demonstrated to scale better than state-of-the-art classical solvers for some problems. For a given problem instance, QAOA performance depends crucially on the choice of the parameters. While average-case optimal parameters are available in many cases, meaningful performance ga… ▽ More

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 14 pages, 14 figures

  21. arXiv:2407.20265  [pdf, other

    cs.LG cs.CE

    COEFF-KANs: A Paradigm to Address the Electrolyte Field with KANs

    Authors: Xinhe Li, Zhuoying Feng, Yezeng Chen, Weichen Dai, Zixu He, Yi Zhou, Shuhong Jiao

    Abstract: To reduce the experimental validation workload for chemical researchers and accelerate the design and optimization of high-energy-density lithium metal batteries, we aim to leverage models to automatically predict Coulombic Efficiency (CE) based on the composition of liquid electrolytes. There are mainly two representative paradigms in existing methods: machine learning and deep learning. However,… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 figures

  22. arXiv:2407.20172  [pdf, other

    eess.IV cs.AI cs.CV

    LatentArtiFusion: An Effective and Efficient Histological Artifacts Restoration Framework

    Authors: Zhenqi He, Wenrui Liu, Minghao Yin, Kai Han

    Abstract: Histological artifacts pose challenges for both pathologists and Computer-Aided Diagnosis (CAD) systems, leading to errors in analysis. Current approaches for histological artifact restoration, based on Generative Adversarial Networks (GANs) and pixel-level Diffusion Models, suffer from performance limitations and computational inefficiencies. In this paper, we propose a novel framework, LatentArt… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accept to DGM4MICCAI2024

  23. arXiv:2407.17638  [pdf

    cs.CL

    Time Matters: Examine Temporal Effects on Biomedical Language Models

    Authors: Weisi Liu, Zhe He, Xiaolei Huang

    Abstract: Time roots in applying language models for biomedical applications: models are trained on historical data and will be deployed for new or future data, which may vary from training data. While increasing biomedical tasks have employed state-of-the-art language models, there are very few studies have examined temporal effects on biomedical models when data usually shifts across development and deplo… ▽ More

    Submitted 11 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted to AMIA 2024 Annual Symposium

  24. arXiv:2407.16418  [pdf, other

    eess.IV cs.CV

    Accelerating Learned Video Compression via Low-Resolution Representation Learning

    Authors: Zidian Qiu, Zongyao He, Zhi Jin

    Abstract: In recent years, the field of learned video compression has witnessed rapid advancement, exemplified by the latest neural video codecs DCVC-DC that has outperformed the upcoming next-generation codec ECM in terms of compression ratio. Despite this, learned video compression frameworks often exhibit low encoding and decoding speeds primarily due to their increased computational complexity and unnec… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  25. arXiv:2407.15714  [pdf

    cs.CV cs.AI

    Mamba meets crack segmentation

    Authors: Zhili He, Yu-Hsing Wang

    Abstract: Cracks pose safety risks to infrastructure and cannot be overlooked. The prevailing structures in existing crack segmentation networks predominantly consist of CNNs or Transformers. However, CNNs exhibit a deficiency in global modeling capability, hindering the representation to entire crack features. Transformers can capture long-range dependencies but suffer from high and quadratic complexity. R… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 32 pages, 8 figures. Preprint submitted to Elsevier

  26. arXiv:2407.15353  [pdf, other

    cs.CL cs.AR

    Customized Retrieval Augmented Generation and Benchmarking for EDA Tool Documentation QA

    Authors: Yuan Pu, Zhuolun He, Tairu Qiu, Haoyuan Wu, Bei Yu

    Abstract: Retrieval augmented generation (RAG) enhances the accuracy and reliability of generative AI models by sourcing factual information from external databases, which is extensively employed in document-grounded question-answering (QA) tasks. Off-the-shelf RAG flows are well pretrained on general-purpose documents, yet they encounter significant challenges when being applied to knowledge-intensive vert… ▽ More

    Submitted 26 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted by ICCAD 2024

  27. arXiv:2407.15202  [pdf, other

    q-bio.BM cs.AI cs.LG

    Exploiting Pre-trained Models for Drug Target Affinity Prediction with Nearest Neighbors

    Authors: Qizhi Pei, Lijun Wu, Zhenyu He, Jinhua Zhu, Yingce Xia, Shufang Xie, Rui Yan

    Abstract: Drug-Target binding Affinity (DTA) prediction is essential for drug discovery. Despite the application of deep learning methods to DTA prediction, the achieved accuracy remain suboptimal. In this work, inspired by the recent success of retrieval methods, we propose $k$NN-DTA, a non-parametric embedding-based retrieval method adopted on a pre-trained DTA prediction model, which can extend the power… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted by 33rd ACM International Conference on Information and Knowledge Management 2024 (CIKM 2024)

  28. arXiv:2407.11463  [pdf, other

    cs.LG cs.AI cs.CR

    Investigating Imperceptibility of Adversarial Attacks on Tabular Data: An Empirical Analysis

    Authors: Zhipeng He, Chun Ouyang, Laith Alzubaidi, Alistair Barros, Catarina Moreira

    Abstract: Adversarial attacks are a potential threat to machine learning models by causing incorrect predictions through imperceptible perturbations to the input data. While these attacks have been extensively studied in unstructured data like images, applying them to tabular data, poses new challenges. These challenges arise from the inherent heterogeneity and complex feature interdependencies in tabular d… ▽ More

    Submitted 20 August, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 33 pages

  29. arXiv:2407.10625  [pdf, other

    cs.CV

    WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models

    Authors: Zijian He, Peixin Chen, Guangrun Wang, Guanbin Li, Philip H. S. Torr, Liang Lin

    Abstract: Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos. Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions, limiting their effectiveness in video try-on applications. Moreover, video-based models require extensive, high-quality data and… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  30. arXiv:2407.10193  [pdf, other

    cs.CV

    GRAPE: Generalizable and Robust Multi-view Facial Capture

    Authors: Jing Li, Di Kang, Zhenyu He

    Abstract: Deep learning-based multi-view facial capture methods have shown impressive accuracy while being several orders of magnitude faster than a traditional mesh registration pipeline. However, the existing systems (e.g. TEMPEH) are strictly restricted to inference on the data captured by the same camera array used to capture their training data. In this study, we aim to improve the generalization abili… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  31. arXiv:2407.09722  [pdf, other

    cs.CL cs.LG

    Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference

    Authors: Zongyue Qin, Ziniu Hu, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun

    Abstract: Transformer-based Large language models (LLMs) have demonstrated their power in various tasks, but their inference incurs significant time and energy costs. To accelerate LLM inference, speculative decoding uses a smaller model to propose one sequence of tokens, which are subsequently validated in batch by the target large model. Compared with autoregressive decoding, speculative decoding generate… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  32. arXiv:2407.09083  [pdf, other

    cs.NE

    BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation

    Authors: Zekai Xu, Kang You, Qinghai Guo, Xiang Wang, Zhezhi He

    Abstract: Spiking neural networks (SNNs), which mimic biological neural system to convey information via discrete spikes, are well known as brain-inspired models with excellent computing efficiency. By utilizing the surrogate gradient estimation for discrete spikes, learning-based SNN training methods that can achieve ultra-low inference latency (number of time-step) emerge recently. Nevertheless, due to th… ▽ More

    Submitted 14 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: accepted by European Conference on Computer Vision (ECCV) 2024

    Journal ref: European Conference on Computer Vision 2024

  33. arXiv:2407.08672  [pdf, other

    cs.CV

    NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning

    Authors: Yi Zhang, Chun-Wun Cheng, Ke Yu, Zhihai He, Carola-Bibiane Schönlieb, Angelica I. Aviles-Rivero

    Abstract: In this paper, we consider the problem of prototype-based vision-language reasoning problem. We observe that existing methods encounter three major challenges: 1) escalating resource demands and prolonging training times, 2) contending with excessive learnable parameters, and 3) fine-tuning based only on a single modality. These challenges will hinder their capability to adapt Vision-Language Mode… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  34. arXiv:2407.08273   

    cs.CL

    RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL

    Authors: Zhenhe Wu, Zhongqiu Li, Jie Zhang, Mengxiang Li, Yu Zhao, Ruiyu Fang, Zhongjiang He, Xuelong Li, Zhoujun Li, Shuangyong Song

    Abstract: Large language models (LLMs) with in-context learning have significantly improved the performance of text-to-SQL task. Previous works generally focus on using exclusive SQL generation prompt to improve the LLMs' reasoning ability. However, they are mostly hard to handle large databases with numerous tables and columns, and usually ignore the significance of pre-processing database and extracting v… ▽ More

    Submitted 12 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Further improvement and modification are needed.

  35. arXiv:2407.07760  [pdf, other

    cs.CV cs.AI

    Learning Spatial-Semantic Features for Robust Video Object Segmentation

    Authors: Xin Li, Deshui Miao, Zhenyu He, Yaowei Wang, Huchuan Lu, Ming-Hsuan Yang

    Abstract: Tracking and segmenting multiple similar objects with complex or separate parts in long-term videos is inherently challenging due to the ambiguity of target parts and identity confusion caused by occlusion, background clutter, and long-term variations. In this paper, we propose a robust video object segmentation framework equipped with spatial-semantic features and discriminative object queries to… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Winner solution of the VOTS2024 Challenge

  36. arXiv:2407.07506  [pdf, other

    eess.SP cs.AI

    Generative AI for RF Sensing in IoT systems

    Authors: Li Wang, Chao Zhang, Qiyang Zhao, Hang Zou, Samson Lasaulce, Giuseppe Valenzise, Zhuo He, Merouane Debbah

    Abstract: The development of wireless sensing technologies, using signals such as Wi-Fi, infrared, and RF to gather environmental data, has significantly advanced within Internet of Things (IoT) systems. Among these, Radio Frequency (RF) sensing stands out for its cost-effective and non-intrusive monitoring of human activities and environmental changes. However, traditional RF sensing methods face significa… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  37. arXiv:2407.06043  [pdf, other

    cs.CV

    Test-time adaptation for geospatial point cloud semantic segmentation with distinct domain shifts

    Authors: Puzuo Wang, Wei Yao, Jie Shao, Zhiyi He

    Abstract: Domain adaptation (DA) techniques help deep learning models generalize across data shifts for point cloud semantic segmentation (PCSS). Test-time adaptation (TTA) allows direct adaptation of a pre-trained model to unlabeled data during inference stage without access to source data or additional training, avoiding privacy issues and large computational resources. We address TTA for geospatial PCSS… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  38. arXiv:2407.05238  [pdf, other

    cs.CV

    P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds

    Authors: Jiahao Nie, Fei Xie, Sifan Zhou, Xueyi Zhou, Dong-Kyu Chae, Zhiwei He

    Abstract: 3D single object tracking (SOT) methods based on appearance matching has long suffered from insufficient appearance information incurred by incomplete, textureless and semantically deficient LiDAR point clouds. While motion paradigm exploits motion cues instead of appearance matching for tracking, it incurs complex multi-stage processing and segmentation module. In this paper, we first provide in-… ▽ More

    Submitted 8 July, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

    Comments: The source code and pre-trained models are available at https://github.com/haooozi/P2P

  39. arXiv:2407.05205  [pdf, other

    cs.CY cs.AI cs.LG

    The AI Companion in Education: Analyzing the Pedagogical Potential of ChatGPT in Computer Science and Engineering

    Authors: Zhangying He, Thomas Nguyen, Tahereh Miari, Mehrdad Aliasgari, Setareh Rafatirad, Hossein Sayadi

    Abstract: Artificial Intelligence (AI), with ChatGPT as a prominent example, has recently taken center stage in various domains including higher education, particularly in Computer Science and Engineering (CSE). The AI revolution brings both convenience and controversy, offering substantial benefits while lacking formal guidance on their application. The primary objective of this work is to comprehensively… ▽ More

    Submitted 23 April, 2024; originally announced July 2024.

    Comments: conference, 13 pages

  40. arXiv:2407.03551  [pdf, other

    cs.SI cs.CL cs.CY

    Feelings about Bodies: Emotions on Diet and Fitness Forums Reveal Gendered Stereotypes and Body Image Concerns

    Authors: Cinthia Sánchez, Minh Duc Chu, Zihao He, Rebecca Dorn, Stuart Murray, Kristina Lerman

    Abstract: The gendered expectations about ideal body types can lead to body image concerns, dissatisfaction, and in extreme cases, disordered eating and other psychopathologies across the gender spectrum. While research has focused on pro-anorexia online communities that glorify the 'thin ideal', less attention has been given to the broader spectrum of body image concerns or how emerging disorders like musc… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  41. arXiv:2407.03157  [pdf, other

    cs.CL cs.AI cs.LG cs.SE

    Let the Code LLM Edit Itself When You Edit the Code

    Authors: Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Zhi Zhang, Di He

    Abstract: In this work, we investigate a typical scenario in code generation where a developer edits existing code in real time and requests a code assistant, e.g., a large language model, to re-predict the next token or next line on the fly. Naively, the LLM needs to re-encode the entire KV cache to provide an accurate prediction. However, this process is computationally expensive, especially when the sequ… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Preprint. Work in Progress

  42. arXiv:2407.02783  [pdf, ps, other

    cs.CL cs.AI

    52B to 1T: Lessons Learned via Tele-FLM Series

    Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang

    Abstract: Large Language Models (LLMs) represent a significant stride toward Artificial General Intelligence. As scaling laws underscore the potential of increasing model sizes, the academic community has intensified its investigations into LLMs with capacities exceeding 50 billion parameters. This technical report builds on our prior work with Tele-FLM (also known as FLM-2), a publicly available 52-billion… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: For the Tele-FLM-52B tech report, see also 2404.16645

  43. arXiv:2407.02350  [pdf, other

    cs.CV

    Conceptual Codebook Learning for Vision-Language Models

    Authors: Yi Zhang, Ke Yu, Siqi Wu, Zhihai He

    Abstract: In this paper, we propose Conceptual Codebook Learning (CoCoLe), a novel fine-tuning method for vision-language models (VLMs) to address the challenge of improving the generalization capability of VLMs while fine-tuning them on downstream tasks in a few-shot setting. We recognize that visual concepts, such as textures, shapes, and colors are naturally transferable across domains and play a crucial… ▽ More

    Submitted 15 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  44. arXiv:2407.01358  [pdf, other

    cs.CL

    Evaluating Knowledge-based Cross-lingual Inconsistency in Large Language Models

    Authors: Xiaolin Xing, Zhiwei He, Haoyu Xu, Xing Wang, Rui Wang, Yu Hong

    Abstract: This paper investigates the cross-lingual inconsistencies observed in Large Language Models (LLMs), such as ChatGPT, Llama, and Baichuan, which have shown exceptional performance in various Natural Language Processing (NLP) tasks. Despite their successes, these models often exhibit significant inconsistencies when processing the same concepts across different languages. This study focuses on three… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  45. Generative Iris Prior Embedded Transformer for Iris Restoration

    Authors: Yubo Huang, Jia Wang, Peipei Li, Liuyu Xiang, Peigang Li, Zhaofeng He

    Abstract: Iris restoration from complexly degraded iris images, aiming to improve iris recognition performance, is a challenging problem. Due to the complex degradation, directly training a convolutional neural network (CNN) without prior cannot yield satisfactory results. In this work, we propose a generative iris prior embedded Transformer model (Gformer), in which we build a hierarchical encoder-decoder… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Our code is available at https://github.com/sawyercharlton/Gformer

    Journal ref: 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 2023, pp. 510-515

  46. arXiv:2407.00023  [pdf, other

    cs.DC cs.LG

    Preble: Efficient Distributed Prompt Scheduling for LLM Serving

    Authors: Vikranth Srivatsa, Zijian He, Reyna Abhyankar, Dongming Li, Yiying Zhang

    Abstract: Prompts to large language models (LLMs) have evolved beyond simple user questions. For LLMs to solve complex problems, today's practices include domain-specific instructions, illustration of tool usages, and long context, such as textbook chapters in prompts. As such, many parts of prompts are repetitive across requests, and their attention computation results can be reused. However, today's LLM s… ▽ More

    Submitted 8 May, 2024; originally announced July 2024.

  47. arXiv:2407.00014  [pdf

    cs.RO eess.SY

    Simplifying Kinematic Parameter Estimation in sEMG Prosthetic Hands: A Two-Point Approach

    Authors: Gang Liu, Zhenxiang Wang, Ziyang He, Shanshan Guo, Rui Zhang, Dezhong Yao

    Abstract: Regression-based sEMG prosthetic hands are widely used for their ability to provide continuous kinematic parameters. However, establishing these models traditionally requires complex kinematic sensor systems to collect corresponding kinematic data in synchronization with EMG, which is cumbersome and user-unfriendly. This paper presents a simplified approach utilizing only two data points to depict… ▽ More

    Submitted 1 May, 2024; originally announced July 2024.

    Comments: 13 pages

  48. arXiv:2406.19341  [pdf, other

    cs.CV

    Learning Visual Conditioning Tokens to Correct Domain Shift for Fully Test-time Adaptation

    Authors: Yushun Tang, Shuoshuo Chen, Zhehan Kan, Yi Zhang, Qinghai Guo, Zhihai He

    Abstract: Fully test-time adaptation aims to adapt the network model based on sequential analysis of input samples during the inference stage to address the cross-domain performance degradation problem of deep neural networks. This work is based on the following interesting finding: in transformer-based image classification, the class token at the first transformer encoder layer can be learned to capture th… ▽ More

    Submitted 16 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  49. arXiv:2406.17960  [pdf, other

    cs.CV cs.AI

    MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation

    Authors: Liuyi Wang, Zongtao He, Mengjiao Shen, Jingwei Yang, Chengju Liu, Qijun Chen

    Abstract: Despite the remarkable developments of recent large models in Embodied Artificial Intelligence (E-AI), their integration into robotics is hampered by their excessive parameter sizes and computational demands. Towards the Vision-and-Language Navigation (VLN) task, a core task in E-AI, this paper reveals the great potential of using knowledge distillation for obtaining lightweight student models by… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  50. arXiv:2406.17297  [pdf, other

    cs.CV cs.AI

    Towards Open-set Camera 3D Object Detection

    Authors: Zhuolin He, Xinrun Li, Heng Gao, Jiachen Tang, Shoumeng Qiu, Wenfu Wang, Lvjian Lu, Xuchong Qiu, Xiangyang Xue, Jian Pu

    Abstract: Traditional camera 3D object detectors are typically trained to recognize a predefined set of known object classes. In real-world scenarios, these detectors may encounter unknown objects outside the training categories and fail to identify them correctly. To address this gap, we present OS-Det3D (Open-set Camera 3D Object Detection), a two-stage training framework enhancing the ability of camera 3… ▽ More

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.