Showing 1–50 of 115 results for author: Gan, Y

Searching in archive cs.
  1. arXiv:2408.16170 [pdf, other]

    cs.DB cs.LG

    CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases

    Authors: Yannis Chronis, Yawen Wang, Yu Gan, Sami Abu-El-Haija, Chelsea Lin, Carsten Binnig, Fatma Özcan

    Abstract: Cardinality estimation is crucial for enabling high query performance in relational databases. Recently, learned cardinality estimation models have been proposed to improve accuracy but there is no systematic benchmark or datasets which allows researchers to evaluate the progress made by new learned approaches and even systematically develop new learned approaches. In this paper, we are releasing a…

    Submitted 28 August, 2024; originally announced August 2024.

  2. arXiv:2408.12109 [pdf, other]

    cs.CV cs.CL

    RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data

    Authors: Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, Qiaozhi He, Tong Xiao, Chunliang Zhang, Tongran Liu, Quan Du, Di Yang, Jingbo Zhu

    Abstract: Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is using human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face the difficulty arising from the sc…

    Submitted 21 August, 2024; originally announced August 2024.

  3. arXiv:2408.09974 [pdf, other]

    cs.LG

    The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective

    Authors: Renye Yan, Yaozhong Gan, You Wu, Ling Liang, Junliang Xing, Yimao Cai, Ru Huang

    Abstract: The imbalance of exploration and exploitation has long been a significant challenge in reinforcement learning. In policy optimization, excessive reliance on exploration reduces learning efficiency, while over-dependence on exploitation might trap agents in local optima. This paper revisits the exploration-exploitation dilemma from the perspective of entropy by revealing the relationship between en…

    Submitted 19 August, 2024; originally announced August 2024.

  4. arXiv:2408.09469 [pdf, other]

    cs.CR

    Enhancing Adversarial Transferability with Adversarial Weight Tuning

    Authors: Jiahao Chen, Zhou Feng, Rui Zeng, Yuwen Pu, Chunyi Zhou, Yi Jiang, Yuyou Gan, Jinbao Li, Shouling Ji

    Abstract: Deep neural networks (DNNs) are vulnerable to adversarial examples (AEs) that mislead the model while appearing benign to human observers. A critical concern is the transferability of AEs, which enables black-box attacks without direct access to the target model. However, many previous attacks have failed to explain the intrinsic mechanism of adversarial transferability. In this paper, we rethink…

    Submitted 20 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: 13 pages

  5. arXiv:2408.05116 [pdf, other]

    quant-ph cs.LG stat.ML

    Concept learning of parameterized quantum models from limited measurements

    Authors: Beng Yee Gan, Po-Wei Huang, Elies Gil-Fuster, Patrick Rebentrost

    Abstract: Classical learning of the expectation values of observables for quantum states is a natural variant of learning quantum states or channels. While learning-theoretic frameworks establish the sample complexity and the number of measurement shots per sample required for learning such statistical quantities, the interplay between these two variables has not been adequately quantified before. In this w…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 16 + 8 pages, 4 figures

  6. arXiv:2407.05814 [pdf, other]

    cs.CV cs.AI cs.MM

    Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition

    Authors: Yaozong Gan, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

    Abstract: Recent multimodal large language models (MLLM) such as GPT-4o and GPT-4v have shown great potential in autonomous driving. In this paper, we propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition (TSR). We first construct a traffic sign detection network based on Vision Transformer Adapter and an extraction module to extract traffic sign…

    Submitted 8 July, 2024; originally announced July 2024.

  7. arXiv:2407.04292 [pdf, other]

    cs.AR cs.RO

    Corki: Enabling Real-time Embodied AI Robots via Algorithm-Architecture Co-Design

    Authors: Yiyang Huang, Yuhui Hao, Bo Yu, Feng Yan, Yuxin Yang, Feng Min, Yinhe Han, Lin Ma, Shaoshan Liu, Qiang Liu, Yiming Gan

    Abstract: Embodied AI robots have the potential to fundamentally improve the way human beings live and manufacture. Continued progress in the burgeoning field of using large language models to control robots depends critically on an efficient computing substrate. In particular, today's computing systems for embodied AI robots are designed purely based on the interest of algorithm developers, where robot act…

    Submitted 1 August, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  8. arXiv:2406.12275 [pdf, other]

    cs.CV

    VoCo-LLaMA: Towards Vision Compression with Large Language Models

    Authors: Xubing Ye, Yukang Gan, Xiaoke Huang, Yixiao Ge, Ying Shan, Yansong Tang

    Abstract: Vision-Language Models (VLMs) have achieved remarkable success in various multi-modal tasks, but they are often bottlenecked by the limited context window and high computational cost of processing high-resolution image inputs and videos. Vision compression can alleviate this problem by reducing the vision token count. Previous approaches compress vision tokens with external modules and force LLMs…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 18 pages, 5 figures

  9. arXiv:2406.03894 [pdf, other]

    cs.LG

    Transductive Off-policy Proximal Policy Optimization

    Authors: Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu, Junliang Xing

    Abstract: Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies is constrained. This paper introduces a novel off-policy extension to the original PPO method, christened Transductive Off-policy PPO (ToPPO). Herein, we provi…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 18

  10. arXiv:2406.03678 [pdf, other]

    cs.LG cs.AI stat.ML

    Reflective Policy Optimization

    Authors: Yaozhong Gan, Renye Yan, Zhe Wu, Junliang Xing

    Abstract: On-policy reinforcement learning methods, like Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), often demand extensive data per update, leading to sample inefficiency. This paper introduces Reflective Policy Optimization (RPO), a novel on-policy extension that amalgamates past and future state-action information for policy optimization. This approach empowers the age…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 20 pages

  11. arXiv:2406.00605 [pdf, other]

    cs.CL cs.AI

    LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models

    Authors: Liang Zhao, Tianwen Wei, Liang Zeng, Cheng Cheng, Liu Yang, Peng Cheng, Lijie Wang, Chenxia Li, Xuejie Wu, Bo Zhu, Yimeng Gan, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: We introduce LongSkywork, a long-context Large Language Model (LLM) capable of processing up to 200,000 tokens. We provide a training recipe for efficiently extending context length of LLMs. We identify that the critical element in enhancing long-context processing capability is to incorporate a long-context SFT stage following the standard SFT stage. A mere 200 iterations can convert the standard…

    Submitted 1 June, 2024; originally announced June 2024.

  12. arXiv:2405.15414 [pdf, other]

    cs.AI

    Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification

    Authors: Yuxuan Guo, Shaohui Peng, Jiaming Guo, Di Huang, Xishan Zhang, Rui Zhang, Yifan Hao, Ling Li, Zikang Tian, Mingju Gao, Yutai Li, Yiming Gan, Shuai Liang, Zihao Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen

    Abstract: Building open agents has always been the ultimate goal in AI research, and creative agents are the more enticing. Existing LLM agents excel at long-horizon tasks with well-defined goals (e.g., 'mine diamonds' in Minecraft). However, they encounter difficulties on creative tasks with open goals and abstract criteria due to the inability to bridge the gap between them, thus lacking feedback for self…

    Submitted 24 May, 2024; originally announced May 2024.

  13. arXiv:2404.17765 [pdf]

    cs.CV

    RFL-CDNet: Towards Accurate Change Detection via Richer Feature Learning

    Authors: Yuhang Gan, Wenjie Xuan, Hang Chen, Juhua Liu, Bo Du

    Abstract: Change Detection is a crucial but extremely challenging task of remote sensing image analysis, and much progress has been made with the rapid development of deep learning. However, most existing deep learning-based change detection methods mainly focus on intricate feature extraction and multi-scale feature fusion, while ignoring the insufficient utilization of features in the intermediate stages,…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by PR, volume 153

  14. arXiv:2404.07962 [pdf, other]

    cs.CV cs.LG

    Live and Learn: Continual Action Clustering with Incremental Views

    Authors: Xiaoqiang Yan, Yingtao Gan, Yiqiao Mao, Yangdong Ye, Hui Yu

    Abstract: Multi-view action clustering leverages the complementary information from different camera views to enhance the clustering performance. Although existing approaches have achieved significant progress, they assume all camera views are available in advance, which is impractical when the camera view is incremental over time. Besides, learning the invariant information among multiple camera views is s…

    Submitted 22 March, 2024; originally announced April 2024.

  15. arXiv:2404.07484 [pdf]

    cs.MM cs.AI

    Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios

    Authors: Yuan Zhang, Xiaomei Tao, Hanxu Ai, Tao Chen, Yanling Gan

    Abstract: In the Massive Open Online Courses (MOOC) learning scenario, the semantic information of instructional videos has a crucial impact on learners' emotional state. Learners mainly acquire knowledge by watching instructional videos, and the semantic information in the videos directly affects learners' emotional states. However, few studies have paid attention to the potential influence of the semantic…

    Submitted 11 April, 2024; originally announced April 2024.

  16. arXiv:2403.12421 [pdf, other]

    cs.RO

    Dexterous Functional Pre-Grasp Manipulation with Diffusion Policy

    Authors: Tianhao Wu, Yunchong Gan, Mingdong Wu, Jingbo Cheng, Yaodong Yang, Yixin Zhu, Hao Dong

    Abstract: In real-world scenarios, objects often require repositioning and reorientation before they can be grasped, a process known as pre-grasp manipulation. Learning universal dexterous functional pre-grasp manipulation requires precise control over the relative position, orientation, and contact between the hand and object while generalizing to diverse dynamic scenarios with varying objects and goal pos…

    Submitted 5 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  17. arXiv:2403.08580 [pdf, other]

    cs.CV cs.MM eess.IV

    Leveraging Compressed Frame Sizes For Ultra-Fast Video Classification

    Authors: Yuxing Han, Yunan Ding, Chen Ye Gan, Jiangtao Wen

    Abstract: Classifying videos into distinct categories, such as Sport and Music Video, is crucial for multimedia understanding and retrieval, especially when an immense volume of video content is being constantly generated. Traditional methods require video decompression to extract pixel-level features like color, texture, and motion, thereby increasing computational and storage demands. Moreover, these meth…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 5 pages, 5 figures, 1 table. arXiv admin note: substantial text overlap with arXiv:2309.07361

  18. arXiv:2403.08309 [pdf, other]

    cs.LG cs.AI

    HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback

    Authors: Ang Li, Qiugen Xiao, Peng Cao, Jian Tang, Yi Yuan, Zijie Zhao, Xiaoyuan Chen, Liang Zhang, Xiangyang Li, Kaitong Yang, Weidong Guo, Yukang Gan, Xu Yu, Daniell Wang, Ying Shan

    Abstract: Reinforcement Learning from AI Feedback (RLAIF) has the advantages of shorter annotation cycles and lower costs over Reinforcement Learning from Human Feedback (RLHF), making it highly efficient during the rapid strategy iteration periods of large language model (LLM) training. Using ChatGPT as a labeler to provide feedback on open-domain prompts in RLAIF training, we observe an increase in human…

    Submitted 14 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: 18 pages, 7 figures

  19. arXiv:2402.15301 [pdf, other]

    cs.CL cs.LG stat.ME

    Causal Graph Discovery with Retrieval-Augmented Generation based Large Language Models

    Authors: Yuzhe Zhang, Yipeng Zhang, Yidong Gan, Lina Yao, Chen Wang

    Abstract: Causal graph recovery is traditionally done using statistical estimation-based methods or based on individuals' knowledge about variables of interest. They often suffer from data collection biases and limitations of individuals' knowledge. The advance of large language models (LLMs) provides opportunities to address these problems. We propose a novel method that leverages LLMs to deduce causal re…

    Submitted 18 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  20. arXiv:2402.12713 [pdf, ps, other]

    cs.CL

    Are LLMs Rational Investors? A Study on Detecting and Reducing the Financial Bias in LLMs

    Authors: Yuhang Zhou, Yuchen Ni, Yunhui Gan, Zhangyue Yin, Xiang Liu, Jian Zhang, Sen Liu, Xipeng Qiu, Guangnan Ye, Hongfeng Chai

    Abstract: Large Language Models (LLMs) are increasingly adopted in financial analysis for interpreting complex market data and trends. However, their use is challenged by intrinsic biases (e.g., risk-preference bias) and a superficial understanding of market intricacies, necessitating a thorough assessment of their financial insight. To address these issues, we introduce Financial Bias Indicators (FBI), a f…

    Submitted 1 July, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  21. arXiv:2402.05894 [pdf, other]

    cs.AI cs.LG

    Large Language Model Meets Graph Neural Network in Knowledge Distillation

    Authors: Shengxiang Hu, Guobing Zou, Song Yang, Yanglan Gan, Bofeng Zhang, Yixin Chen

    Abstract: In service-oriented architectures, accurately predicting the Quality of Service (QoS) is crucial for maintaining reliability and enhancing user satisfaction. However, significant challenges remain due to existing methods always overlooking high-order latent collaborative relationships between users and services and failing to dynamically adjust feature learning for every specific user-service invo…

    Submitted 11 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

    MSC Class: 68T30; 68R10; 68T05

  22. arXiv:2401.05391 [pdf]

    cs.AR cs.AI

    Efficient LLM inference solution on Intel GPU

    Authors: Hui Wu, Yi Gan, Feng Yuan, Jing Ma, Wei Zhu, Yutao Xu, Hong Zhu, Yuhua Zhu, Xiaoli Liu, Jinghui Gu, Peng Zhao

    Abstract: Transformer-based Large Language Models (LLMs) have been widely used in many fields, and the efficiency of LLM inference has become a hot topic in real applications. However, LLMs are usually complicatedly designed in model structure with massive operations and perform inference in the auto-regressive mode, making it a challenging task to design a system with high efficiency. In this paper, we propos…

    Submitted 23 June, 2024; v1 submitted 19 December, 2023; originally announced January 2024.

  23. arXiv:2401.02415 [pdf, other]

    cs.CL

    LLaMA Pro: Progressive LLaMA with Block Expansion

    Authors: Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ying Shan, Ping Luo

    Abstract: Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with an expansion of Transformer blocks. We tune the expanded blocks using only new corpus, efficiently and effectively improving the model's knowledge without catastrophic forge…

    Submitted 30 May, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: Accepted by ACL 2024, Main Conference

  24. arXiv:2312.16257 [pdf, other]

    cs.CL cs.AI cs.LG

    More than Correlation: Do Large Language Models Learn Causal Representations of Space?

    Authors: Yida Chen, Yixian Gan, Sijia Li, Li Yao, Xiaohan Zhao

    Abstract: Recent work found high mutual information between the learned representations of large language models (LLMs) and the geospatial property of its input, hinting at an emergent internal model of space. However, whether this internal space model has any causal effects on the LLMs' behaviors was not answered by that work, leading to criticism of these findings as mere statistical correlation. Our study focus…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 12 pages, 15 figures

  25. arXiv:2312.12021 [pdf, other]

    cs.CL cs.AI

    Synergistic Anchored Contrastive Pre-training for Few-Shot Relation Extraction

    Authors: Da Luo, Yanglei Gan, Rui Hou, Run Lin, Qiao Liu, Yuxiang Cai, Wannian Gao

    Abstract: Few-shot Relation Extraction (FSRE) aims to extract relational facts from a sparse set of labeled corpora. Recent studies have shown promising results in FSRE by employing Pre-trained Language Models (PLMs) within the framework of supervised contrastive learning, which considers both instances and label facts. However, how to effectively harness massive instance-label pairs to encompass the learne…

    Submitted 11 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  26. arXiv:2312.09148 [pdf, other]

    cs.LG cs.CV

    Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting

    Authors: Anthony Chen, Huanrui Yang, Yulu Gan, Denis A Gudovskiy, Zhen Dong, Haofan Wang, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Shanghang Zhang

    Abstract: Uncertainty estimation is crucial for machine learning models to detect out-of-distribution (OOD) inputs. However, the conventional discriminative deep learning classifiers produce uncalibrated closed-set predictions for OOD data. More robust classifiers with uncertainty estimation typically require a potentially unavailable OOD dataset for outlier exposure training, or a considerable amount…

    Submitted 27 May, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: ICML 2024. Project website is available at https://antonioo-c.github.io/projects/split-ensemble

  27. arXiv:2312.02409 [pdf, other]

    cs.CV cs.RO

    MGTR: Multi-Granular Transformer for Motion Prediction with LiDAR

    Authors: Yiqian Gan, Hao Xiao, Yizhe Zhao, Ethan Zhang, Zhe Huang, Xin Ye, Lingting Ge

    Abstract: Motion prediction has been an essential component of autonomous driving systems since it handles highly uncertain and complex scenarios involving moving agents of different types. In this paper, we propose a Multi-Granular TRansformer (MGTR) framework, an encoder-decoder network that exploits context features in different granularities for different kinds of traffic agents. To further enhance MGTR…

    Submitted 5 February, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Accepted to ICRA 2024

  28. arXiv:2311.14294 [pdf, other]

    cs.CV

    Decouple Content and Motion for Conditional Image-to-Video Generation

    Authors: Cuifeng Shen, Yulu Gan, Chen Chen, Xiongwei Zhu, Lele Cheng, Tingting Gao, Jinzhi Wang

    Abstract: The goal of conditional image-to-video (cI2V) generation is to create a believable new video by beginning with the condition, i.e., one image and text. The previous cI2V generation methods conventionally perform in RGB pixel space, with limitations in modeling motion consistency and visual continuity. Additionally, the efficiency of generating videos in pixel space is quite low. In this paper, we pr…

    Submitted 14 December, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

  29. arXiv:2311.13552 [pdf, other]

    quant-ph cs.LG stat.ML

    A Unified Framework for Trace-induced Quantum Kernels

    Authors: Beng Yee Gan, Daniel Leykam, Supanut Thanasilp

    Abstract: Quantum kernel methods are promising candidates for achieving a practical quantum advantage for certain machine learning tasks. Similar to classical machine learning, an exact form of a quantum kernel is expected to have a great impact on the model performance. In this work we combine all trace-induced quantum kernels, including the commonly-used global fidelity and local projected quantum kernels…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: 12 + 15 pages, 5 figures

  30. arXiv:2310.20427 [pdf, other]

    eess.IV cs.CV cs.LG

    Assessing and Enhancing Robustness of Deep Learning Models with Corruption Emulation in Digital Pathology

    Authors: Peixiang Huang, Songtao Zhang, Yulu Gan, Rui Xu, Rongqi Zhu, Wenkang Qin, Limei Guo, Shan Jiang, Lin Luo

    Abstract: Deep learning in digital pathology brings intelligence and automation as substantial enhancements to pathological analysis, the gold standard of clinical diagnosis. However, multiple steps from tissue preparation to slide imaging introduce various image corruptions, making it difficult for deep neural network (DNN) models to achieve stable diagnostic results for clinical use. In order to assess an…

    Submitted 31 October, 2023; originally announced October 2023.

  31. arXiv:2310.07091 [pdf, other]

    cs.CL cs.AI

    Jaeger: A Concatenation-Based Multi-Transformer VQA Model

    Authors: Jieting Long, Zewei Shi, Penghao Jiang, Yidong Gan

    Abstract: Document-based Visual Question Answering poses a challenging task between linguistic sense disambiguation and fine-grained multimodal retrieval. Although there has been encouraging progress in document-based question answering due to the utilization of large language and open-world prior models [1], several challenges persist, including prolonged response times, extended inference durations, a…

    Submitted 19 October, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: This paper is the technical research paper of CIKM 2023 DocIU challenges. The authors received the CIKM 2023 DocIU Winner Award, sponsored by Google, Microsoft, and the Centre for data-driven geoscience

  32. arXiv:2310.00390 [pdf, other]

    cs.CV

    InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists

    Authors: Yulu Gan, Sungwoo Park, Alexander Schubert, Anthony Philippakis, Ahmed M. Alaa

    Abstract: Recent advances in generative diffusion models have enabled text-controlled synthesis of realistic and diverse images with impressive quality. Despite these remarkable advances, the application of text-to-image generative models in computer vision for standard visual recognition tasks remains limited. The current de facto approach for these tasks is to design model architectures and loss functions…

    Submitted 16 March, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

    Comments: ICLR 2024; Code is available at https://github.com/AlaaLab/InstructCV

  33. arXiv:2309.14488 [pdf, other]

    cs.CL cs.AI

    When Automated Assessment Meets Automated Content Generation: Examining Text Quality in the Era of GPTs

    Authors: Marialena Bevilacqua, Kezia Oketch, Ruiyang Qin, Will Stamey, Xinyuan Zhang, Yi Gan, Kai Yang, Ahmed Abbasi

    Abstract: The use of machine learning (ML) models to assess and score textual data has become increasingly pervasive in an array of contexts including natural language processing, information retrieval, search and recommendation, and credibility assessment of online content. A significant disruption at the intersection of ML and text are text-generating large-language models such as generative pre-trained t…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Data available at: https://github.com/nd-hal/automated-ML-scoring-versus-generation

  34. arXiv:2309.11002 [pdf, other]

    cs.CV

    PPD: A New Valet Parking Pedestrian Fisheye Dataset for Autonomous Driving

    Authors: Zizhang Wu, Xinyuan Chen, Fan Song, Yuanzhu Gan, Tianhao Xu, Jian Pu, Rui Tang

    Abstract: Pedestrian detection under valet parking scenarios is fundamental for autonomous driving. However, the presence of pedestrians can be manifested in a variety of ways and postures under imperfect ambient conditions, which can adversely affect detection performance. Furthermore, models trained on public datasets that include pedestrians generally provide suboptimal outcomes for these valet parking sc…

    Submitted 24 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: 9 pages, 6 figures

  35. arXiv:2309.10475 [pdf, other]

    cs.CV

    LineMarkNet: Line Landmark Detection for Valet Parking

    Authors: Zizhang Wu, Yuanzhu Gan, Tianhao Xu, Rui Tang, Jian Pu

    Abstract: We aim for accurate and efficient line landmark detection for valet parking, which is a long-standing yet unsolved problem in autonomous driving. To this end, we present a deep line landmark detection system where we carefully design the modules to be lightweight. Specifically, we first empirically design four general line landmarks including three physical lines and one novel mental line. The fou…

    Submitted 24 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: 29 pages, 12 figures

  36. arXiv:2309.07361 [pdf, other]

    cs.CV

    Judging a video by its bitstream cover

    Authors: Yuxing Han, Yunan Ding, Jiangtao Wen, Chen Ye Gan

    Abstract: Classifying videos into distinct categories, such as Sport and Music Video, is crucial for multimedia understanding and retrieval, especially in an age where an immense volume of video content is constantly being generated. Traditional methods require video decompression to extract pixel-level features like color, texture, and motion, thereby increasing computational and storage demands. Moreover,…

    Submitted 13 September, 2023; originally announced September 2023.

  37. arXiv:2309.06038 [pdf, other]

    cs.RO cs.AI

    GraspGF: Learning Score-based Grasping Primitive for Human-assisting Dexterous Grasping

    Authors: Tianhao Wu, Mingdong Wu, Jiyao Zhang, Yunchong Gan, Hao Dong

    Abstract: The use of anthropomorphic robotic hands for assisting individuals in situations where human hands may be unavailable or unsuitable has gained significant importance. In this paper, we propose a novel task called human-assisting dexterous grasping that aims to train a policy for controlling a robotic hand's fingers to assist users in grasping objects. Unlike conventional dexterous grasping, this t…

    Submitted 14 November, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

  38. arXiv:2309.06006 [pdf, ps, other]

    cs.CV cs.AI

    SoccerNet 2023 Challenges Results

    Authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, et al. (77 additional authors not shown)

    Abstract: The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, fo…

    Submitted 12 September, 2023; originally announced September 2023.

  39. arXiv:2309.04946 [pdf, other]

    cs.SD cs.CV cs.GR eess.AS

    Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation

    Authors: Yuan Gan, Zongxin Yang, Xihang Yue, Lingyun Sun, Yi Yang

    Abstract: Audio-driven talking-head synthesis is a popular research topic for virtual human-related applications. However, the inflexibility and inefficiency of existing methods, which necessitate expensive end-to-end training to transfer emotions from guidance videos to talking-head predictions, are significant limitations. In this work, we propose the Emotional Adaptation for Audio-driven Talking-head (EA…

    Submitted 12 October, 2023; v1 submitted 10 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV 2023. Project page: https://yuangan.github.io/eat/

  40. arXiv:2308.11447 [pdf, other]

    cs.CL cs.AI

    Aspect-oriented Opinion Alignment Network for Aspect-Based Sentiment Classification

    Authors: Xueyi Liu, Rui Hou, Yanglei Gan, Da Luo, Changlin Li, Xiaojun Shi, Qiao Liu

    Abstract: Aspect-based sentiment classification is a crucial problem in fine-grained sentiment analysis, which aims to predict the sentiment polarity of the given aspect according to its context. Previous works have made remarkable progress in leveraging attention mechanism to extract opinion words for different aspects. However, a persistent challenge is the effective management of semantic mismatches, whi…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: 8 pages, 5 figures, ECAI 2023

    ACM Class: I.2.7

  41. Graph-Segmenter: Graph Transformer with Boundary-aware Attention for Semantic Segmentation

    Authors: Zizhang Wu, Yuanzhu Gan, Tianhao Xu, Fan Wang

    Abstract: The transformer-based semantic segmentation approaches, which divide the image into different regions by sliding windows and model the relation inside each window, have achieved outstanding success. However, since the relation modeling between windows was not the primary emphasis of previous work, it was not fully utilized. To address this issue, we propose a Graph-Segmenter, including a Graph Tra…

    Submitted 15 August, 2023; originally announced August 2023.

    Journal ref: Front. Comput. Sci. 2023

  42. arXiv:2307.12138  [pdf, other]

    eess.IV cs.CV

    SCPAT-GAN: Structural Constrained and Pathology Aware Convolutional Transformer-GAN for Virtual Histology Staining of Human Coronary OCT images

    Authors: Xueshen Li, Hongshan Liu, Xiaoyu Song, Brigitta C. Brott, Silvio H. Litovsky, Yu Gan

    Abstract: There is a significant need for the generation of virtual histological information from coronary optical coherence tomography (OCT) images to better guide the treatment of coronary artery disease. However, existing methods either require a large pixel-wise paired training dataset or have limited capability to map pathological regions. To address these issues, we propose a structural constrained…

    Submitted 22 July, 2023; originally announced July 2023.

    Comments: 9 pages, 4 figures

  43. arXiv:2307.11130  [pdf, other]

    physics.med-ph cs.CV eess.IV

    Frequency-aware optical coherence tomography image super-resolution via conditional generative adversarial neural network

    Authors: Xueshen Li, Zhenxing Dong, Hongshan Liu, Jennifer J. Kang-Mieler, Yuye Ling, Yu Gan

    Abstract: Optical coherence tomography (OCT) has stimulated a wide range of medical image-based diagnosis and treatment in fields such as cardiology and ophthalmology. Such applications can be further facilitated by deep learning-based super-resolution technology, which improves the capability of resolving morphological structures. However, existing deep learning-based methods only focus on spatial distrib…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 13 pages, 7 figures, submitted to Biomedical Optics Express special issue

  44. arXiv:2306.12109  [pdf, other]

    eess.IV cs.CV

    DiffuseIR: Diffusion Models For Isotropic Reconstruction of 3D Microscopic Images

    Authors: Mingjie Pan, Yulu Gan, Fangxu Zhou, Jiaming Liu, Aimin Wang, Shanghang Zhang, Dawei Li

    Abstract: Three-dimensional microscopy is often limited by anisotropic spatial resolution, resulting in lower axial resolution than lateral resolution. Current state-of-the-art (SoTA) isotropic reconstruction methods utilizing deep neural networks can achieve impressive super-resolution performance in fixed imaging settings. However, their generality in practical use is limited by degraded performance cause…

    Submitted 21 June, 2023; originally announced June 2023.

  45. arXiv:2306.01812  [pdf, other]

    cs.LG

    SAPI: Surroundings-Aware Vehicle Trajectory Prediction at Intersections

    Authors: Ethan Zhang, Hao Xiao, Yiqian Gan, Lei Wang

    Abstract: In this work, we propose a deep learning model, SAPI, to predict vehicle trajectories at intersections. SAPI uses an abstract way to represent and encode the surrounding environment by utilizing information from the real-time map, right-of-way, and surrounding traffic. The proposed model consists of two convolutional neural network (CNN)- and recurrent neural network (RNN)-based encoders and one decoder. A r…

    Submitted 29 July, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

  46. arXiv:2305.12724  [pdf, other]

    cs.CV

    Bridging the Gap Between End-to-end and Non-End-to-end Multi-Object Tracking

    Authors: Feng Yan, Weixin Luo, Yujie Zhong, Yiyang Gan, Lin Ma

    Abstract: Existing end-to-end Multi-Object Tracking (e2e-MOT) methods have not surpassed non-end-to-end tracking-by-detection methods. One potential reason is their label assignment strategy during training, which consistently binds the tracked objects to tracking queries and then assigns the few newborn objects to detection queries. With one-to-one bipartite matching, such an assignment will yield unbalanced traini…

    Submitted 22 May, 2023; originally announced May 2023.

  47. arXiv:2305.07397  [pdf, other]

    cs.CV

    Learning Monocular Depth in Dynamic Environment via Context-aware Temporal Attention

    Authors: Zizhang Wu, Zhuozheng Li, Zhi-Gang Fan, Yunzhe Wu, Yuanzhu Gan, Jian Pu, Xianzhi Li

    Abstract: Monocular depth estimation has recently shown encouraging prospects, especially for autonomous driving. To tackle the ill-posed problem of 3D geometric reasoning from 2D monocular images, multi-frame monocular methods have been developed to leverage the perspective correlation information from sequential temporal frames. However, moving objects such as cars and trains usually violat…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: Accepted by IJCAI 2023; 9 pages, 5 figures

  48. arXiv:2304.07919  [pdf, other]

    cs.CV cs.AI

    Chain of Thought Prompt Tuning in Vision Language Models

    Authors: Jiaxin Ge, Hongyin Luo, Siyuan Qian, Yulu Gan, Jie Fu, Shanghang Zhang

    Abstract: Language-Image Pre-training has demonstrated promising results on zero-shot and few-shot downstream tasks by prompting visual models with natural language prompts. However, most recent studies use only a single prompt for tuning, neglecting the inherent step-by-step cognitive reasoning process that humans conduct in complex task settings, for example, when processing images from unfamiliar domains…

    Submitted 17 June, 2023; v1 submitted 16 April, 2023; originally announced April 2023.

  49. arXiv:2303.09792  [pdf, other]

    cs.CV

    Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction

    Authors: Senqiao Yang, Jiarui Wu, Jiaming Liu, Xiaoqi Li, Qizhe Zhang, Mingjie Pan, Yulu Gan, Zehui Chen, Shanghang Zhang

    Abstract: Visual prompts have provided an efficient means of addressing visual cross-domain problems. In previous work, Visual Domain Prompt (VDP) first introduced domain prompts to tackle the classification Test-Time Adaptation (TTA) problem by warping image-level prompts on the input and fine-tuning prompts for each target domain. However, since the image-level prompts mask out continuous spatial de…

    Submitted 15 April, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

    Comments: Accepted by AAAI 2024

  50. arXiv:2302.10549  [pdf, other]

    cs.CV

    MonoPGC: Monocular 3D Object Detection with Pixel Geometry Contexts

    Authors: Zizhang Wu, Yuanzhu Gan, Lei Wang, Guilian Chen, Jian Pu

    Abstract: Monocular 3D object detection is an economical but challenging task in autonomous driving. Recently, center-based monocular methods have developed rapidly with a good trade-off between speed and accuracy, where they usually depend on the object center's depth estimation via 2D features. However, visual semantic features without sufficient pixel geometry information may affect the perform…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: Accepted by ICRA 2023