Showing 1–50 of 673 results for author: Yuan, J

Searching in archive cs.
  1. arXiv:2408.17005  [pdf, other]

    cs.RO cs.CV

    Efficient Camera Exposure Control for Visual Odometry via Deep Reinforcement Learning

    Authors: Shuyang Zhang, Jinhao He, Yilong Zhu, Jin Wu, Jie Yuan

    Abstract: The stability of visual odometry (VO) systems is undermined by degraded image quality, especially in environments with significant illumination changes. This study employs a deep reinforcement learning (DRL) framework to train agents for exposure control, aiming to enhance imaging performance in challenging conditions. A lightweight image simulator is developed to facilitate the training process,…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 8 pages, 7 figures

  2. arXiv:2408.16357  [pdf, other]

    cs.CV

    Law of Vision Representation in MLLMs

    Authors: Shijia Yang, Bohan Zhai, Quanzeng You, Jianbo Yuan, Hongxia Yang, Chenfeng Xu

    Abstract: We present the "Law of Vision Representation" in multimodal large language models (MLLMs). It reveals a strong correlation between the combination of cross-modal alignment, correspondence in vision representation, and MLLM performance. We quantify the two factors using the cross-modal Alignment and Correspondence score (AC score). Through extensive experiments involving thirteen different vision r…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: The code is available at https://github.com/bronyayang/Law_of_Vision_Representation_in_MLLMs

  3. arXiv:2408.13704  [pdf, other]

    cs.CL cs.AI

    DHP Benchmark: Are LLMs Good NLG Evaluators?

    Authors: Yicheng Wang, Jiayi Yuan, Yu-Neng Chuang, Zhuoer Wang, Yingchi Liu, Mark Cusick, Param Kulkarni, Zhengping Ji, Yasser Ibrahim, Xia Hu

    Abstract: Large Language Models (LLMs) are increasingly serving as evaluators in Natural Language Generation (NLG) tasks. However, the capabilities of LLMs in scoring NLG quality remain inadequately explored. Current studies depend on human assessments and simple metrics that fail to capture the discernment of LLMs across diverse NLG tasks. To address this gap, we propose the Discernment of Hierarchical Per…

    Submitted 24 August, 2024; originally announced August 2024.

  4. arXiv:2408.13646  [pdf, other]

    cs.CV

    Mean Height Aided Post-Processing for Pedestrian Detection

    Authors: Jing Yuan, Tania Stathaki, Guangyu Ren

    Abstract: The design of pedestrian detectors seldom considers the unique characteristics of this task and usually follows the common strategies for general object detection. To explore the potential of these characteristics, we take the perspective effect in pedestrian datasets as an example and propose the mean height aided suppression for post-processing. This method rejects predictions that fall at level…

    Submitted 24 August, 2024; originally announced August 2024.

  5. arXiv:2408.13639  [pdf, other]

    cs.CV

    Size Aware Cross-shape Scribble Supervision for Medical Image Segmentation

    Authors: Jing Yuan, Tania Stathaki

    Abstract: Scribble supervision, a common form of weakly supervised learning, involves annotating pixels using hand-drawn curve lines, which helps reduce the cost of manual labelling. This technique has been widely used in medical image segmentation tasks to fasten network training. However, scribble supervision has limitations in terms of annotation consistency across samples and the availability of compreh…

    Submitted 24 August, 2024; originally announced August 2024.

  6. arXiv:2408.12185  [pdf, other]

    cs.LG cs.AI cs.IR

    Rank and Align: Towards Effective Source-free Graph Domain Adaptation

    Authors: Junyu Luo, Zhiping Xiao, Yifan Wang, Xiao Luo, Jingyang Yuan, Wei Ju, Langechuan Liu, Ming Zhang

    Abstract: Graph neural networks (GNNs) have achieved impressive performance in graph domain adaptation. However, extensive source graphs could be unavailable in real-world scenarios due to privacy and storage concerns. To this end, we investigate an underexplored yet practical problem of source-free graph domain adaptation, which transfers knowledge from source models instead of source graphs to a target do…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Published in IJCAI2024

  7. arXiv:2408.12071  [pdf, other]

    cs.LG

    Multi-Task Curriculum Graph Contrastive Learning with Clustering Entropy Guidance

    Authors: Chusheng Zeng, Bocheng Wang, Jinghui Yuan, Rong Wang, Mulin Chen

    Abstract: Recent advances in unsupervised deep graph clustering have been significantly promoted by contrastive learning. Despite the strides, most graph contrastive learning models face challenges: 1) graph augmentation is used to improve learning diversity, but commonly used random augmentation methods may destroy inherent semantics and cause noise; 2) the fixed positive and negative sample selection stra…

    Submitted 21 August, 2024; originally announced August 2024.

  8. arXiv:2408.06509  [pdf, other]

    cs.LG cs.AI cs.CR

    Fooling SHAP with Output Shuffling Attacks

    Authors: Jun Yuan, Aritra Dasgupta

    Abstract: Explainable AI (XAI) methods such as SHAP can help discover feature attributions in black-box models. If the method reveals a significant attribution from a "protected feature" (e.g., gender, race) on the model output, the model is considered unfair. However, adversarial attacks can subvert the detection of XAI methods. Previous approaches to constructing such an adversarial model require access…

    Submitted 12 August, 2024; originally announced August 2024.

  9. arXiv:2408.05750  [pdf, other]

    cs.CV

    FADE: A Dataset for Detecting Falling Objects around Buildings in Video

    Authors: Zhigang Tu, Zitao Gao, Zhengbo Zhang, Chunluan Zhou, Junsong Yuan, Bo Du

    Abstract: Falling objects from buildings can cause severe injuries to pedestrians due to the great impact force they exert. Although surveillance cameras are installed around some buildings, it is challenging for humans to capture such events in surveillance videos due to the small size and fast motion of falling objects, as well as the complex background. Therefore, it is necessary to develop methods to au…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 11 pages, 10 figures

  10. arXiv:2408.05440  [pdf]

    cs.CV eess.IV

    Content-decoupled Contrastive Learning-based Implicit Degradation Modeling for Blind Image Super-Resolution

    Authors: Jiang Yuan, Ji Ma, Bo Wang, Weiming Hu

    Abstract: Implicit degradation modeling-based blind super-resolution (SR) has attracted more increasing attention in the community due to its excellent generalization to complex degradation scenarios and wide application range. How to extract more discriminative degradation representations and fully adapt them to specific image features is the key to this task. In this paper, we propose a new Content-decoup…

    Submitted 10 August, 2024; originally announced August 2024.

  11. arXiv:2408.05141  [pdf, other]

    cs.CL cs.IR

    A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning

    Authors: Ye Yuan, Chengwu Liu, Jingyang Yuan, Gongbo Sun, Siqi Li, Ming Zhang

    Abstract: Retrieval-augmented generation (RAG) is a framework enabling large language models (LLMs) to enhance their accuracy and reduce hallucinations by integrating external knowledge bases. In this paper, we introduce a hybrid RAG system enhanced through a comprehensive suite of optimizations that significantly improve retrieval quality, augment reasoning capabilities, and refine numerical computation ab…

    Submitted 26 August, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

    Comments: Technical report for 3rd prize in Task 1 of Meta CRAG KDD Cup 2024

  12. arXiv:2408.02936  [pdf, other]

    cs.LG

    Achieving More with Less: A Tensor-Optimization-Powered Ensemble Method

    Authors: Jinghui Yuan, Weijin Jiang, Zhe Cao, Fangyuan Xie, Rong Wang, Feiping Nie, Yuan Yuan

    Abstract: Ensemble learning is a method that leverages weak learners to produce a strong learner. However, obtaining a large number of base learners requires substantial time and computational resources. Therefore, it is meaningful to study how to achieve the performance typically obtained with many base learners using only a few. We argue that to achieve this, it is essential to enhance both classification…

    Submitted 12 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  13. arXiv:2408.02932  [pdf, other]

    cs.LG cs.AI

    Doubly Stochastic Adaptive Neighbors Clustering via the Marcus Mapping

    Authors: Jinghui Yuan, Chusheng Zeng, Fangyuan Xie, Zhe Cao, Mulin Chen, Rong Wang, Feiping Nie, Yuan Yuan

    Abstract: Clustering is a fundamental task in machine learning and data science, and similarity graph-based clustering is an important approach within this domain. Doubly stochastic symmetric similarity graphs provide numerous benefits for clustering problems and downstream tasks, yet learning such graphs remains a significant challenge. Marcus theorem states that a strictly positive symmetric matrix can be…

    Submitted 12 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  14. arXiv:2408.02074  [pdf]

    eess.IV cs.AI cs.CV

    Applying Conditional Generative Adversarial Networks for Imaging Diagnosis

    Authors: Haowei Yang, Yuxiang Hu, Shuyao He, Ting Xu, Jiajie Yuan, Xingxin Gu

    Abstract: This study introduces an innovative application of Conditional Generative Adversarial Networks (C-GAN) integrated with Stacked Hourglass Networks (SHGN) aimed at enhancing image segmentation, particularly in the challenging environment of medical imaging. We address the problem of overfitting, common in deep learning models applied to complex imaging datasets, by augmenting data through rotation a…

    Submitted 17 July, 2024; originally announced August 2024.

  15. arXiv:2407.21450  [pdf, other]

    cs.CV

    Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation

    Authors: Sudhir Yarram, Junsong Yuan

    Abstract: Video extrapolation in space and time (VEST) enables viewers to forecast a 3D scene into the future and view it from novel viewpoints. Recent methods propose to learn an entangled representation, aiming to model layered scene geometry, motion forecasting and novel view synthesis together, while assuming simplified affine motion and homography-based warping at each scene layer, leading to inaccurat…

    Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. Project Page: https://skrya.github.io/projects/ffn-dsr/

  16. arXiv:2407.17272  [pdf, other]

    cs.CV

    DenseTrack: Drone-based Crowd Tracking via Density-aware Motion-appearance Synergy

    Authors: Yi Lei, Huilin Zhu, Jingling Yuan, Guangli Xiang, Xian Zhong, Shengfeng He

    Abstract: Drone-based crowd tracking faces difficulties in accurately identifying and monitoring objects from an aerial perspective, largely due to their small size and close proximity to each other, which complicates both localization and tracking. To address these challenges, we present the Density-aware Tracking (DenseTrack) framework. DenseTrack capitalizes on crowd counting to precisely determine objec…

    Submitted 26 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  17. arXiv:2407.16655  [pdf, other]

    cs.CV

    MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

    Authors: Canyu Zhao, Mingyu Liu, Wen Wang, Jianlong Yuan, Hao Chen, Bo Zhang, Chunhua Shen

    Abstract: Recent advancements in video generation have primarily leveraged diffusion models for short-duration content. However, these approaches often fall short in modeling complex narratives and maintaining character consistency over extended periods, which is essential for long-form video production like movies. We propose MovieDreamer, a novel hierarchical framework that integrates the strengths of aut…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 23 pages, 18 figures

  18. arXiv:2407.14214  [pdf, ps, other]

    cs.LG cs.IT

    Domain Adaptation for Industrial Time-series Forecasting via Counterfactual Inference

    Authors: Chao Min, Guoquan Wen, Jiangru Yuan, Jun Yi, Xing Guo

    Abstract: Industrial time-series, as a structural data responds to production process information, can be utilized to perform data-driven decision-making for effective monitoring of industrial production process. However, there are some challenges for time-series forecasting in industry, e.g., predicting few-shot caused by data shortage, and decision-confusing caused by unknown treatment policy. To cope wit…

    Submitted 19 July, 2024; originally announced July 2024.

  19. arXiv:2407.10937  [pdf, other]

    cs.CV

    IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

    Authors: Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang

    Abstract: Significant advances have been made in human-centric video generation, yet the joint video-depth generation problem remains underexplored. Most existing monocular depth estimation methods may not generalize well to synthesized images or videos, and multi-view-based methods have difficulty controlling the human appearance and motion. In this work, we present IDOL (unIfied Dual-mOdal Latent diffusio…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; project page: https://yhzhai.github.io/idol/

  20. arXiv:2407.09694  [pdf, other]

    cs.CV

    Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images

    Authors: Tianyu Luan, Zhongpai Gao, Luyuan Xie, Abhishek Sharma, Hao Ding, Benjamin Planche, Meng Zheng, Ange Lou, Terrence Chen, Junsong Yuan, Ziyan Wu

    Abstract: We introduce a novel bottom-up approach for human body mesh reconstruction, specifically designed to address the challenges posed by partial visibility and occlusion in input images. Traditional top-down methods, relying on whole-body parametric models like SMPL, falter when only a small part of the human is visible, as they require visibility of most of the human body for accurate mesh reconstruc…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  21. arXiv:2407.06078  [pdf, ps, other]

    cs.SD

    Few-Shot Keyword Spotting from Mixed Speech

    Authors: Junming Yuan, Ying Shi, LanTian Li, Dong Wang, Askar Hamdulla

    Abstract: Few-shot keyword spotting (KWS) aims to detect unknown keywords with limited training samples. A commonly used approach is the pre-training and fine-tuning framework. While effective in clean conditions, this approach struggles with mixed keyword spotting -- simultaneously detecting multiple keywords blended in an utterance, which is crucial in real-world applications. Previous research has propos…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: accepted by INTERSPEECH 2024

  22. arXiv:2407.04948  [pdf, other]

    cs.CV

    Zero-shot Object Counting with Good Exemplars

    Authors: Huilin Zhu, Jingling Yuan, Zhengwei Yang, Yu Guo, Zheng Wang, Xian Zhong, Shengfeng He

    Abstract: Zero-shot object counting (ZOC) aims to enumerate objects in images using only the names of object classes during testing, without the need for manual annotations. However, a critical challenge in current ZOC methods lies in their inability to identify high-quality exemplars effectively. This deficiency hampers scalability across diverse classes and undermines the development of strong visual asso…

    Submitted 9 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  23. arXiv:2407.04181  [pdf, other]

    cs.AI cs.CL

    Orchestrating LLMs with Different Personalizations

    Authors: Jin Peng Zhou, Katie Z Luo, Jingwen Gu, Jason Yuan, Kilian Q. Weinberger, Wen Sun

    Abstract: This paper presents a novel approach to aligning large language models (LLMs) with individual human preferences, sometimes referred to as Reinforcement Learning from Personalized Human Feedback (RLPHF). Given stated preferences along multiple dimensions, such as helpfulness, conciseness, or humor, the goal is to create an LLM without re-training that best adheres to this specification. St…

    Submitted 4 July, 2024; originally announced July 2024.

  24. arXiv:2407.01527  [pdf, other]

    cs.CL

    KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches

    Authors: Jiayi Yuan, Hongyi Liu, Shaochen Zhong, Yu-Neng Chuang, Songchen Li, Guanchu Wang, Duy Le, Hongye Jin, Vipin Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu

    Abstract: Long context capability is a crucial competency for large language models (LLMs) as it mitigates the human struggle to digest long-form texts. This capability enables complex task-solving scenarios such as book summarization, code assistance, and many more tasks that are traditionally manpower-intensive. However, transformer-based LLMs face significant challenges with long context input due to the…

    Submitted 1 July, 2024; originally announced July 2024.

  25. arXiv:2407.00468  [pdf, other]

    cs.CV cs.AI cs.CL

    MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

    Authors: Jinsheng Huang, Liang Chen, Taian Guo, Fu Zeng, Yusheng Zhao, Bohan Wu, Ye Yuan, Haozhe Zhao, Zhihui Guo, Yichi Zhang, Jingyang Yuan, Wei Ju, Luchen Liu, Tianyu Liu, Baobao Chang, Ming Zhang

    Abstract: Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for such evaluations suffer from systematic biases. Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial p…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 21 pages, code released at https://github.com/chenllliang/MMEvalPro, Homepage at https://mmevalpro.github.io/

  26. arXiv:2406.19781  [pdf, other]

    cs.RO

    LCSim: A Large-Scale Controllable Traffic Simulator

    Authors: Yuheng Zhang, Tianjian Ouyang, Fudan Yu, Cong Ma, Lei Qiao, Wei Wu, Jian Yuan, Yong Li

    Abstract: With the rapid development of urban transportation and the continuous advancement in autonomous vehicles, the demand for safely and efficiently testing autonomous driving and traffic optimization algorithms arises, which needs accurate modeling of large-scale urban traffic scenarios. Existing traffic simulation systems encounter two significant limitations. Firstly, they often rely on open-source…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Submitted to the 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks

  27. arXiv:2406.19400  [pdf, other]

    cs.CV

    Deep Convolutional Neural Networks Meet Variational Shape Compactness Priors for Image Segmentation

    Authors: Kehui Zhang, Lingfeng Li, Hao Liu, Jing Yuan, Xue-Cheng Tai

    Abstract: Shape compactness is a key geometrical property to describe interesting regions in many image segmentation tasks. In this paper, we propose two novel algorithms to solve the introduced image segmentation problem that incorporates a shape-compactness prior. Existing algorithms for such a problem often suffer from computational inefficiency, difficulty in reaching a local minimum, and the need to fi…

    Submitted 23 May, 2024; originally announced June 2024.

    Comments: 28 pages

  28. arXiv:2406.18548  [pdf]

    eess.IV cs.CV

    Exploration of Multi-Scale Image Fusion Systems in Intelligent Medical Image Analysis

    Authors: Yuxiang Hu, Haowei Yang, Ting Xu, Shuyao He, Jiajie Yuan, Haozhang Deng

    Abstract: The diagnosis of brain cancer relies heavily on medical imaging techniques, with MRI being the most commonly used. It is necessary to perform automatic segmentation of brain tumors on MRI images. This project intends to build an MRI algorithm based on U-Net. The residual network and the module used to enhance the context information are combined, and the void space convolution pooling pyramid is a…

    Submitted 23 May, 2024; originally announced June 2024.

  29. arXiv:2406.14045  [pdf, other]

    cs.LG cs.AI

    Understanding Different Design Choices in Training Large Time Series Models

    Authors: Yu-Neng Chuang, Songchen Li, Jiayi Yuan, Guanchu Wang, Kwei-Herng Lai, Leisheng Yu, Sirui Ding, Chia-Yuan Chang, Qiaoyu Tan, Daochen Zha, Xia Hu

    Abstract: Inspired by Large Language Models (LLMs), Time Series Forecasting (TSF), a long-standing task in time series analysis, is undergoing a transition towards Large Time Series Models (LTSMs), aiming to train universal transformer-based models for TSF. However, training LTSMs on heterogeneous time series data poses unique challenges, including diverse frequencies, dimensions, and patterns across datase…

    Submitted 20 June, 2024; originally announced June 2024.

  30. arXiv:2406.13642  [pdf, other]

    cs.CV

    SpatialBot: Precise Spatial Understanding with Vision Language Models

    Authors: Wenxiao Cai, Yaroslav Ponomarenko, Jianhao Yuan, Xiaoqi Li, Wankou Yang, Hao Dong, Bo Zhao

    Abstract: Vision Language Models (VLMs) have achieved impressive performance in 2D image understanding, however they are still struggling with spatial understanding which is the foundation of Embodied AI. In this paper, we propose SpatialBot for better spatial understanding by feeding both RGB and depth images. Additionally, we have constructed the SpatialQA dataset, which involves multi-level depth-related…

    Submitted 1 August, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  31. arXiv:2406.13586  [pdf, ps, other]

    cs.GT cs.AI

    Submodular Participatory Budgeting

    Authors: Jing Yuan, Shaojie Tang

    Abstract: Participatory budgeting refers to the practice of allocating public resources by collecting and aggregating individual preferences. Most existing studies in this field often assume an additive utility function, where each individual holds a private utility for each candidate project, and the total utility of a set of funded projects is simply the sum of the utilities of all projects. We argue that…

    Submitted 19 June, 2024; originally announced June 2024.

  32. arXiv:2406.11847  [pdf, other]

    cs.CY cs.LG

    Integrating behavior analysis with machine learning to predict online learning performance: A scientometric review and empirical study

    Authors: Jin Yuan, Xuelan Qiu, Jinran Wu, Jiesi Guo, Weide Li, You-Gan Wang

    Abstract: The interest in predicting online learning performance using ML algorithms has been steadily increasing. We first conducted a scientometric analysis to provide a systematic review of research in this area. The findings show that most existing studies apply the ML methods without considering learning behavior patterns, which may compromise the prediction accuracy and precision of the ML methods. Th…

    Submitted 27 March, 2024; originally announced June 2024.

    Comments: 23 pages, 12 figures, 9 tables. Submitted to Computer & Education; Authorship Contribution: Yuan: Literature review, Data curation, Methodology, Software. Qiu: Literature review, Conceptualization, Methodology, Original draft writing. Wu: Scientometric analysis, Methodology. Guo: Review and editing. Li: Comment draft, Funding seeking. Wang: Comment draft

  33. arXiv:2406.09870  [pdf, other]

    cs.LG cs.AI

    IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning

    Authors: Jiawen Qin, Haonan Yuan, Qingyun Sun, Lyujin Xu, Jiaqi Yuan, Pengfeng Huang, Zhaonan Wang, Xingcheng Fu, Hao Peng, Jianxin Li, Philip S. Yu

    Abstract: Deep graph learning has gained grand popularity over the past years due to its versatility and success in representing graph data across a wide range of domains. However, the pervasive issue of imbalanced graph data distributions, where certain parts exhibit disproportionally abundant data while others remain sparse, undermines the efficacy of conventional graph learning algorithms, leading to bia…

    Submitted 19 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Preprint, under review)

  34. arXiv:2406.08864  [pdf]

    cs.LG cs.AI

    Research on Early Warning Model of Cardiovascular Disease Based on Computer Deep Learning

    Authors: Yuxiang Hu, Jinxin Hu, Ting Xu, Bo Zhang, Jiajie Yuan, Haozhang Deng

    Abstract: This project intends to study a cardiovascular disease risk early warning model based on one-dimensional convolutional neural networks. First, the missing values of 13 physiological and symptom indicators such as patient age, blood glucose, cholesterol, and chest pain were filled and Z-score was standardized. The convolutional neural network is converted into a 2D matrix, the convolution function…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 6 pages

  35. arXiv:2406.06890  [pdf, other]

    cs.CV

    Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

    Authors: Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, David Doermann, Junsong Yuan, Lijuan Wang

    Abstract: Image diffusion distillation achieves high-fidelity generation with very few sampling steps. However, applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to the limited visual quality in public video datasets. This affects the performance of both teacher and student video diffusion models. Our study aims to improve video diffusion distillation wh…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Project page: https://yhzhai.github.io/mcm/

  36. arXiv:2406.04776  [pdf, ps, other]

    eess.SP cs.AI

    OFDM-Standard Compatible SC-NOFS Waveforms for Low-Latency and Jitter-Tolerance Industrial IoT Communications

    Authors: Tongyang Xu, Shuangyang Li, Jinhong Yuan

    Abstract: Traditional communications focus on regular and orthogonal signal waveforms for simplified signal processing and improved spectral efficiency. In contrast, the next-generation communications would aim for irregular and non-orthogonal signal waveforms to introduce new capabilities. This work proposes a spectrally efficient irregular Sinc (irSinc) shaping technique, revisiting the traditional Sinc b…

    Submitted 7 June, 2024; originally announced June 2024.

  37. arXiv:2406.03402  [pdf, other]

    cs.LG cs.AI

    Mixed-Precision Over-The-Air Federated Learning via Approximated Computing

    Authors: Jinsheng Yuan, Zhuangkun Wei, Weisi Guo

    Abstract: Over-the-Air Federated Learning (OTA-FL) has been extensively investigated as a privacy-preserving distributed learning mechanism. Realistic systems will see FL clients with diverse size, weight, and power configurations. A critical research gap in existing OTA-FL research is the assumption of homogeneous client computational bit precision. Indeed, many clients may exploit approximate computing (A…

    Submitted 4 June, 2024; originally announced June 2024.

  38. arXiv:2406.02126  [pdf, other]

    eess.SY cs.AI cs.LG cs.MA

    CityLight: A Universal Model for Coordinated Traffic Signal Control in City-scale Heterogeneous Intersections

    Authors: Jinwei Zeng, Chao Yu, Xinyi Yang, Wenxuan Ao, Qianyue Hao, Jian Yuan, Yong Li, Yu Wang, Huazhong Yang

    Abstract: The increasingly severe congestion problem in modern cities strengthens the significance of developing city-scale traffic signal control (TSC) methods for traffic efficiency enhancement. While reinforcement learning has been widely explored in TSC, most of them still target small-scale optimization and cannot directly scale to the city level due to unbearable resource demand. Only a few of them ma…

    Submitted 28 August, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  39. arXiv:2406.01903  [pdf, ps, other]

    cs.IT

    Reverse PAC Codes: Look-ahead List Decoding

    Authors: Xinyi Gu, Mohammad Rowshan, Jinhong Yuan

    Abstract: Convolutional precoding in polarization-adjusted convolutional (PAC) codes is a recently introduced variant of polar codes. It has demonstrated an effective reduction in the number of minimum weight codewords (a.k.a error coefficient) of polar codes. This reduction has the potential to significantly improve the error correction performance. From a codeword formation perspective, this reduction has…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: To appear in the proceedings of ISIT'24. It contains 6 pages, 3 figures, and 1 table

  40. arXiv:2406.01900  [pdf, other]

    cs.CV

    Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation

    Authors: Yue Ma, Hongyu Liu, Hongfa Wang, Heng Pan, Yingqing He, Junkun Yuan, Ailing Zeng, Chengfei Cai, Heung-Yeung Shum, Wei Liu, Qifeng Chen

    Abstract: We present Follow-Your-Emoji, a diffusion-based framework for portrait animation, which animates a reference portrait with target landmark sequences. The main challenge of portrait animation is to preserve the identity of the reference portrait and transfer the target expression to this portrait while maintaining temporal consistency and fidelity. To address these challenges, Follow-Your-Emoji equ…

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Project Page: https://follow-your-emoji.github.io/

  41. arXiv:2405.17708  [pdf, other]

    cs.LG cs.AI stat.ML

    OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators

    Authors: Allen Nie, Yash Chandak, Christina J. Yuan, Anirudhan Badrinath, Yannis Flet-Berliac, Emma Brunskill

    Abstract: Offline policy evaluation (OPE) allows us to evaluate and estimate a new sequential decision-making policy's performance by leveraging historical interaction data collected from other policies. Evaluating a new policy online without a confident estimate of its performance can lead to costly, unsafe, or hazardous outcomes, especially in education and healthcare. Several OPE estimators have been pro…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 22 pages

  42. arXiv:2405.17460  [pdf]

    cs.LG cs.AI cs.CV

    Investigation of Customized Medical Decision Algorithms Utilizing Graph Neural Networks

    Authors: Yafeng Yan, Shuyao He, Zhou Yu, Jiajie Yuan, Ziang Liu, Yan Chen

    Abstract: Aiming at the limitations of traditional medical decision system in processing large-scale heterogeneous medical data and realizing highly personalized recommendation, this paper introduces a personalized medical decision algorithm utilizing graph neural network (GNN). This research innovatively integrates graph neural network technology into the medical and health field, aiming to build a high-pr…

    Submitted 23 May, 2024; originally announced May 2024.

  43. arXiv:2405.15738  [pdf, other]

    cs.CV

    ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

    Authors: Chunjiang Ge, Sijie Cheng, Ziming Wang, Jiale Yuan, Yuan Gao, Jun Song, Shiji Song, Gao Huang, Bo Zheng

    Abstract: High-resolution Large Multimodal Models (LMMs) encounter the challenges of excessive visual tokens and quadratic visual complexity. Current high-resolution LMMs address the quadratic complexity while still generating excessive visual tokens. However, the redundancy in visual tokens is the key problem as it leads to more substantial compute. To mitigate this issue, we propose ConvLLaVA, which emplo…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 17 pages

  44. arXiv:2405.15199  [pdf, other]

    cs.CV

    ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models

    Authors: Jingyuan Zhu, Shiyu Li, Yuxuan Liu, Ping Huang, Jiulong Shan, Huimin Ma, Jian Yuan

    Abstract: Modern diffusion-based image generative models have made significant progress and become promising to enrich training data for the object detection task. However, the generation quality and the controllability for complex scenes containing multi-class objects and dense objects with occlusions remain limited. This paper presents ODGEN, a novel method to generate high-quality images conditioned on b…

    Submitted 24 May, 2024; originally announced May 2024.

  45. arXiv:2405.12868  [pdf, other]

    cs.LG cs.AI

    Equivariant Spatio-Temporal Attentive Graph Networks to Simulate Physical Dynamics

    Authors: Liming Wu, Zhichao Hou, Jirui Yuan, Yu Rong, Wenbing Huang

    Abstract: Learning to represent and simulate the dynamics of physical systems is a crucial yet challenging task. Existing equivariant Graph Neural Network (GNN) based methods have encapsulated the symmetry of physics, e.g., translations, rotations, etc., leading to better generalization ability. Nevertheless, their frame-to-frame formulation of the task overlooks the non-Markov property mainly incurre…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Published at NeurIPS 2023

  46. arXiv:2405.10640  [pdf, other]

    cs.SI

    COMET: NFT Price Prediction with Wallet Profiling

    Authors: Tianfu Wang, Liwei Deng, Chao Wang, Jianxun Lian, Yue Yan, Nicholas Jing Yuan, Qi Zhang, Hui Xiong

    Abstract: As the non-fungible token (NFT) market flourishes, price prediction emerges as a pivotal direction for investors gaining valuable insight to maximize returns. However, existing works suffer from a lack of practical definitions and standardized evaluations, limiting their practical application. Moreover, the influence of users' multi-behaviour transactions that are publicly accessible on NFT price…

    Submitted 2 July, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024 (ADS Track)

  47. Defect Category Prediction Based on Multi-Source Domain Adaptation

    Authors: Ying Xing, Mengci Zhao, Bin Yang, Yuwei Zhang, Wenjin Li, Jiawei Gu, Jun Yuan

    Abstract: In recent years, defect prediction techniques based on deep learning have become a prominent research topic in the field of software engineering. These techniques can identify potential defects without executing the code. However, existing approaches mostly concentrate on determining the presence of defects at the method-level code, lacking the ability to precisely classify specific defect categor…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 17 pages, in Chinese language, 8 figures (Due to length constraints of the abstract field, please refer to the original PDF file for the full content of abstract.)

    Journal ref: Journal of Software [2024]

  48. arXiv:2405.10276  [pdf, other]

    cs.CL cs.HC

    Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers

    Authors: Tuo Zhang, Jinyue Yuan, Salman Avestimehr

    Abstract: Numerous recent works aim to enhance the efficacy of Large Language Models (LLMs) through strategic prompting. In particular, the Optimization by PROmpting (OPRO) approach provides state-of-the-art performance by leveraging LLMs as optimizers where the optimization task is to find instructions that maximize the task accuracy. In this paper, we revisit OPRO for automated prompting with relatively s…

    Submitted 18 July, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Journal ref: ACL Findings 2024

  49. arXiv:2405.09819  [pdf]

    cs.SE cs.LG

    Automating the Training and Deployment of Models in MLOps by Integrating Systems with Machine Learning

    Authors: Penghao Liang, Bo Song, Xiaoan Zhan, Zhou Chen, Jiaqiang Yuan

    Abstract: This article introduces the importance of machine learning in real-world applications and explores the rise of MLOps (Machine Learning Operations) and its importance for solving challenges such as model deployment and performance monitoring. By reviewing the evolution of MLOps and its relationship to traditional software development methods, the paper proposes ways to integrate the system into mac…

    Submitted 16 May, 2024; originally announced May 2024.

  50. arXiv:2405.08295  [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechVerse: A Large-scale Generalizable Audio Language Model

    Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore devel…

    Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Single Column, 13 pages