Showing 1–50 of 556 results for author: Fan, Y

Searching in archive cs. Search in all archives.
  1. arXiv:2407.16337  [pdf, other]

    cs.LG

    STATE: A Robust ATE Estimator of Heavy-Tailed Metrics for Variance Reduction in Online Controlled Experiments

    Authors: Hao Zhou, Kun Sun, Shaoming Li, Yangfeng Fan, Guibin Jiang, Jiaqi Zheng, Tao Li

    Abstract: Online controlled experiments play a crucial role in enabling data-driven decisions across a wide range of companies. Variance reduction is an effective technique to improve the sensitivity of experiments, achieving higher statistical power while using fewer samples and shorter experimental periods. However, typical variance reduction methods (e.g., regression-adjusted estimators) are built upon t… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD 2024

  2. arXiv:2407.13343  [pdf, other]

    cs.CL

    Learning-From-Mistakes Prompting for Indigenous Language Translation

    Authors: You-Cheng Liao, Chen-Jui Yu, Chi-Yi Lin, He-Feng Yun, Yen-Hsiang Wang, Hsiao-Min Li, Yao-Chung Fan

    Abstract: Using large language models, this paper presents techniques to improve extremely low-resourced indigenous language translations. Our approaches are grounded in the use of (1) the presence of a datastore consisting of a limited number of parallel translation examples, (2) the inherent capabilities of LLMs like GPT-3.5, and (3) a word-level translation dictionary. We harness the potential of LLMs an… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.13228  [pdf]

    cs.CL cs.CY cs.ET cs.LG

    Evaluating Large Language Models for Anxiety and Depression Classification using Counseling and Psychotherapy Transcripts

    Authors: Junwei Sun, Siqi Ma, Yiran Fan, Peter Washington

    Abstract: We aim to evaluate the efficacy of traditional machine learning and large language models (LLMs) in classifying anxiety and depression from long conversational transcripts. We fine-tuned both established transformer models (BERT, RoBERTa, Longformer) and more recent large models (Mistral-7B), trained a Support Vector Machine with feature engineering, and assessed GPT models through prompting. We ob… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  4. arXiv:2407.11504  [pdf, other]

    cs.IR

    Bootstrapped Pre-training with Dynamic Identifier Prediction for Generative Retrieval

    Authors: Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

    Abstract: Generative retrieval uses differentiable search indexes to directly generate relevant document identifiers in response to a query. Recent studies have highlighted the potential of a strong generative retrieval model, trained with carefully crafted pre-training tasks, to enhance downstream retrieval tasks via fine-tuning. However, the full power of pre-training for generative retrieval remains unde… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ACL Findings 2024

  5. arXiv:2407.11046  [pdf, other]

    cs.LG cs.AI cs.CL

    A Survey on LoRA of Large Language Models

    Authors: Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, Yunjun Gao

    Abstract: Low-Rank Adaptation (LoRA), which updates the dense neural network layers with pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning paradigms. Furthermore, it has significant advantages in cross-task generalization and privacy preservation. Hence, LoRA has gained much attention recently, and the volume of related literature is growing exponentially. It is… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  6. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a… ▽ More

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  7. arXiv:2407.09686  [pdf, other]

    cs.CV

    SPIN: Hierarchical Segmentation with Subpart Granularity in Natural Images

    Authors: Josh Myers-Dean, Jarek Reynolds, Brian Price, Yifei Fan, Danna Gurari

    Abstract: Hierarchical segmentation entails creating segmentations at varying levels of granularity. We introduce the first hierarchical semantic segmentation dataset with subpart annotations for natural images, which we call SPIN (SubPartImageNet). We also introduce two novel evaluation metrics to evaluate how well algorithms capture spatial and semantic relationships across hierarchical levels. We benchma… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  8. arXiv:2407.06992  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Robust Neural Information Retrieval: An Adversarial and Out-of-distribution Perspective

    Authors: Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

    Abstract: Recent advances in neural information retrieval (IR) models have significantly enhanced their effectiveness over various IR tasks. The robustness of these models, essential for ensuring their reliability in practice, has also garnered significant attention. With a wide array of research on robust IR being proposed, we believe it is the opportune moment to consolidate the current status, glean insi… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Survey paper

  9. arXiv:2407.04969  [pdf, other]

    cs.CL

    EVA-Score: Evaluation of Long-form Summarization on Informativeness through Extraction and Validation

    Authors: Yuchen Fan, Xin Zhong, Chengsi Wang, Gaoche Wu, Bowen Zhou

    Abstract: Summarization is a fundamental task in natural language processing (NLP), and since the arrival of large language models (LLMs) such as GPT-4 and Claude, increasing attention has been paid to long-form summarization, whose input sequences are much longer and therefore contain more information. The current evaluation metrics either use similarity-based metrics like ROUGE and BERTScore which rely on sim… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 16 pages, 3 figures, submitted to EMNLP

  10. arXiv:2407.04150  [pdf, other]

    math.CO cs.DM

    Spectral Methods for Matrix Product Factorization

    Authors: Saieed Akbari, Yi-Zheng Fan, Fu-Tao Hu, Babak Miraftab, Yi Wang

    Abstract: A graph $G$ is factored into graphs $H$ and $K$ via a matrix product if there exist adjacency matrices $A$, $B$, and $C$ of $G$, $H$, and $K$, respectively, such that $A = BC$. In this paper, we study the spectral aspects of the matrix product of graphs, including regularity, bipartiteness, and connectivity. We show that if a graph $G$ is factored into a connected graph $H$ and a graph $K$ with no… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Comments are welcome

    MSC Class: 05C50; 15A18

  11. arXiv:2407.00503  [pdf, other]

    cs.CV

    Toward a Diffusion-Based Generalist for Dense Vision Tasks

    Authors: Yue Fan, Yongqin Xian, Xiaohua Zhai, Alexander Kolesnikov, Muhammad Ferjad Naeem, Bernt Schiele, Federico Tombari

    Abstract: Building generalized models that can solve many computer vision tasks simultaneously is an intriguing direction. Recent works have shown image itself can be used as a natural interface for general-purpose visual perception and demonstrated inspiring results. In this paper, we explore diffusion-based vision generalists, where we unify different types of dense prediction tasks as conditional image g… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Published at CVPR 2024 as a workshop paper

  12. arXiv:2406.19263  [pdf, other]

    cs.CL cs.CV

    Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

    Authors: Yue Fan, Lei Ding, Ching-Chen Kuo, Shan Jiang, Yang Zhao, Xinze Guan, Jie Yang, Yi Zhang, Xin Eric Wang

    Abstract: Graphical User Interfaces (GUIs) are central to our interaction with digital devices. Recently, growing efforts have been made to build models for various GUI understanding tasks. However, these efforts largely overlook an important GUI-referring task: screen reading based on user-indicated points, which we name the Screen Point-and-Read (SPR) task. This task is predominantly handled by rigid acce… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  13. arXiv:2406.18017  [pdf, other]

    cs.IT cs.ET

    Dependence Analysis and Structured Construction for Batched Sparse Code

    Authors: Jiaxin Qing, Xiaohong Cai, Yijun Fan, Mingyang Zhu, Raymond W. Yeung

    Abstract: In coding theory, codes are usually designed with a certain level of randomness to facilitate analysis and accommodate different channel conditions. However, the resulting random code constructed can be suboptimal in practical implementations. Represented by a bipartite graph, the Batched Sparse Code (BATS Code) is a randomly constructed erasure code that utilizes network coding to achieve near-op… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  14. arXiv:2406.16786  [pdf, other]

    cs.CE

    Generalized and high-efficiency arbitrary-positioned buffer for smoothed particle hydrodynamics

    Authors: Shuoguo Zhang, Yu Fan, Yaru Ren, Bin Qian, Xiangyu Hu

    Abstract: This paper develops an arbitrary-positioned buffer for the smoothed particle hydrodynamics (SPH) method, whose generality and high efficiency are achieved through two techniques. First, with the local coordinate system established at each arbitrary-positioned in-/outlet, particle positions in the global coordinate system are transformed into those in it via coordinate transformation. Since one loc… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 34 pages and 17 figures

  15. arXiv:2406.15716  [pdf, other]

    eess.IV cs.CV

    Predicting fluorescent labels in label-free microscopy images with pix2pix and adaptive loss in Light My Cells challenge

    Authors: Han Liu, Hao Li, Jiacheng Wang, Yubo Fan, Zhoubing Xu, Ipek Oguz

    Abstract: Fluorescence labeling is the standard approach to reveal cellular structures and other subcellular constituents for microscopy images. However, this invasive procedure may perturb or even kill the cells and the procedure itself is highly time-consuming and complex. Recently, in silico labeling has emerged as a promising alternative, aiming to use machine learning models to directly predict the flu… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  16. arXiv:2406.15019  [pdf, other]

    cs.CL

    MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens

    Authors: Yongqi Fan, Hongli Sun, Kui Xue, Xiaofan Zhang, Shaoting Zhang, Tong Ruan

    Abstract: Numerous advanced Large Language Models (LLMs) now support context lengths up to 128K, and some extend to 200K. Some benchmarks in the generic domain have also followed up on evaluating long-context capabilities. In the medical domain, tasks are distinctive due to the unique contexts and need for domain expertise, necessitating further evaluation. However, despite the frequent presence of long tex… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  17. arXiv:2406.13578  [pdf, other]

    cs.CL

    Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration

    Authors: Han-Cheng Yu, Yu-An Shih, Kin-Man Law, Kai-Yu Hsieh, Yu-Chen Cheng, Hsin-Chih Ho, Zih-An Lin, Wen-Chuan Hsu, Yao-Chung Fan

    Abstract: In this paper, we tackle the task of distractor generation (DG) for multiple-choice questions. Our study introduces two key designs. First, we propose retrieval augmented pretraining, which involves refining the language model pretraining to align it more closely with the downstream task of DG. Second, we explore the integration of knowledge graphs to enhance the performance of DG. Throug… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Findings at ACL 2024

  18. arXiv:2406.08407  [pdf, other]

    cs.CV cs.AI cs.CL

    MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

    Authors: Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang

    Abstract: Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multi… ▽ More

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  19. arXiv:2406.07572  [pdf, ps, other]

    cs.AI cs.CE cs.LG

    Domain-specific ReAct for physics-integrated iterative modeling: A case study of LLM agents for gas path analysis of gas turbines

    Authors: Tao Song, Yuwei Fan, Chenlong Feng, Keyu Song, Chao Liu, Dongxiang Jiang

    Abstract: This study explores the application of large language models (LLMs) with callable tools in energy and power engineering domain, focusing on gas path analysis of gas turbines. We developed a dual-agent tool-calling process to integrate expert knowledge, predefined tools, and LLM reasoning. We evaluated various LLMs, including LLama3, Qwen1.5 and GPT. Smaller models struggled with tool usage and par… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  20. arXiv:2406.05779  [pdf, other]

    cs.CV

    Learning to utilize image second-order derivative information for crisp edge detection

    Authors: Changsong Liu, Wei Zhang, Yanyan Liu, Yimeng Fan, Mingyang Li, Wenlin Li

    Abstract: Edge detection is a fundamental task in computer vision. It has made great progress under the development of deep convolutional neural networks (DCNNs), some of which have achieved a beyond human-level performance. However, recent top-performing edge detection methods tend to generate thick and noisy edge lines. In this work, we solve this problem from two aspects: (1) the lack of prior knowledge… ▽ More

    Submitted 28 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  21. arXiv:2405.20073  [pdf, other]

    cs.IT eess.SP

    Power Allocation for Cell-Free Massive MIMO ISAC Systems with OTFS Signal

    Authors: Yifei Fan, Shaochuan Wu, Xixi Bi, Guoyu Li

    Abstract: Applying integrated sensing and communication (ISAC) to a cell-free massive multiple-input multiple-output (CF mMIMO) architecture has attracted increasing attention. This approach equips CF mMIMO networks with sensing capabilities and resolves the problem of unreliable service at cell edges in conventional cellular networks. However, existing studies on CF-ISAC systems have focused on the applica… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: This work is submitted to IEEE for possible publication

  22. arXiv:2405.17931  [pdf, other]

    cs.CL cs.LG

    Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment

    Authors: Keming Lu, Bowen Yu, Fei Huang, Yang Fan, Runji Lin, Chang Zhou

    Abstract: Effectively aligning Large Language Models (LLMs) with human-centric values while preventing the degradation of abilities acquired through Pre-training and Supervised Fine-tuning (SFT) poses a central challenge in Reinforcement Learning from Human Feedback (RLHF). In this paper, we first discover that interpolating RLHF and SFT model parameters can adjust the trade-off between human preference and… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  23. arXiv:2405.17905  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Cycle-YOLO: A Efficient and Robust Framework for Pavement Damage Detection

    Authors: Zhengji Li, Xi Xiao, Jiacheng Xie, Yuxiao Fan, Wentao Wang, Gang Chen, Liqiang Zhang, Tianyang Wang

    Abstract: With the development of modern society, traffic volume continues to increase in most countries worldwide, leading to an increase in the rate of pavement damage. Therefore, real-time and highly accurate pavement damage detection and maintenance have become a pressing need. In this paper, an enhanced pavement damage detection method with CycleGAN and improved YOLOv5 algorithm is presented. We se… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  24. arXiv:2405.16197  [pdf, other]

    cs.CV eess.IV

    A 7K Parameter Model for Underwater Image Enhancement based on Transmission Map Prior

    Authors: Fuheng Zhou, Dikai Wei, Ye Fan, Yulong Huang, Yonggang Zhang

    Abstract: Although deep learning based models for underwater image enhancement have achieved good performance, they face limitations in both lightweight and effectiveness, which prevents their deployment and application on resource-constrained platforms. Moreover, most existing deep learning based models use data compression to get high-level semantic information in latent space instead of using the origina… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 10 pages

  25. arXiv:2405.13320  [pdf, ps, other]

    cs.IT

    Self-dual 2-quasi Negacyclic Codes over Finite Fields

    Authors: Yun Fan, Yue Leng

    Abstract: In this paper, we investigate the existence and asymptotic property of self-dual $2$-quasi negacyclic codes of length $2n$ over a finite field of cardinality $q$. When $n$ is odd, we show that the $q$-ary self-dual $2$-quasi negacyclic codes exist if and only if $q\,{\not\equiv}-\!1~({\rm mod}~4)$. When $n$ is even, we prove that the $q$-ary self-dual $2$-quasi negacyclic codes always exist. By us… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  26. Selective Focus: Investigating Semantics Sensitivity in Post-training Quantization for Lane Detection

    Authors: Yunqian Fan, Xiuying Wei, Ruihao Gong, Yuqing Ma, Xiangguo Zhang, Qi Zhang, Xianglong Liu

    Abstract: Lane detection (LD) plays a crucial role in enhancing the L2+ capabilities of autonomous driving, capturing widespread attention. The Post-Processing Quantization (PTQ) could facilitate the practical application of LD models, enabling fast speeds and limited memories without labeled data. However, prior PTQ methods do not consider the complex LD outputs that contain physical semantics, such as off… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted by AAAI-24

    Journal ref: AAAI 2024, 38, 11936-11943

  27. arXiv:2405.00954  [pdf, other]

    cs.CV

    X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation

    Authors: Yiwei Ma, Zhekai Lin, Jiayi Ji, Yijun Fan, Xiaoshuai Sun, Rongrong Ji

    Abstract: Recent advancements in automatic 3D avatar generation guided by text have made significant progress. However, existing methods have limitations such as oversaturation and low-quality output. To address these challenges, we propose X-Oscar, a progressive framework for generating high-quality animatable avatars from text prompts. It follows a sequential Geometry->Texture->Animation paradigm, simplif… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: ICML2024

  28. arXiv:2404.17897  [pdf, other]

    cs.CL

    Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models

    Authors: Zhongzhen Huang, Kui Xue, Yongqi Fan, Linjie Mu, Ruoyu Liu, Tong Ruan, Shaoting Zhang, Xiaofan Zhang

    Abstract: Large-scale language models (LLMs) have achieved remarkable success across various language tasks but suffer from hallucinations and temporal misalignment. To mitigate these shortcomings, Retrieval-augmented generation (RAG) has been utilized to provide external knowledge to facilitate the answer generation. However, applying such models to the medical domain faces several challenges due to the la… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

  29. arXiv:2404.11945  [pdf, other]

    cs.RO

    Terrain-Aware Stride-Level Trajectory Forecasting for a Powered Hip Exoskeleton via Vision and Kinematics Fusion

    Authors: Ruoqi Zhao, Xingbang Yan, Yubo Fan

    Abstract: Powered hip exoskeletons have shown the ability for locomotion assistance during treadmill walking. However, providing suitable assistance in real-world walking scenarios which involve changing terrain remains challenging. Recent research suggests that forecasting the lower limb joint's angles could provide target trajectories for exoskeletons and prostheses, and the performance could be improved… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 6 pages, submitted to IEEE RA-L, under review. This work has been submitted to the IEEE Robotics and Automation Letters (RA-L) for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  30. arXiv:2404.11483  [pdf, other]

    cs.AI cs.LG

    AgentKit: Structured LLM Reasoning with Dynamic Graphs

    Authors: Yue Wu, Yewen Fan, So Yeon Min, Shrimai Prabhumoye, Stephen McAleer, Yonatan Bisk, Ruslan Salakhutdinov, Yuanzhi Li, Tom Mitchell

    Abstract: We propose an intuitive LLM prompting framework (AgentKit) for multifunctional agents. AgentKit offers a unified framework for explicitly constructing a complex "thought process" from simple natural language prompts. The basic building block in AgentKit is a node, containing a natural language prompt for a specific subtask. The user then puts together chains of nodes, like stacking LEGO pieces. Th… ▽ More

    Submitted 24 July, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  31. arXiv:2404.09738  [pdf]

    q-bio.BM cs.AI q-bio.QM

    AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides

    Authors: Kewei Li, Yuqian Wu, Yutong Guo, Yinheng Li, Yusi Fan, Ruochi Zhang, Lan Huang, Fengfeng Zhou

    Abstract: Activity cliff (AC) is a phenomenon in which a pair of similar molecules differ by a small structural alteration but exhibit a large difference in their biochemical activities. The AC of small molecules has been extensively investigated, but limited knowledge has accumulated about the AC phenomenon in peptides with canonical amino acids. This study introduces a quantitative definition and benchmarking… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  32. arXiv:2404.08402  [pdf, ps, other]

    cs.IT

    Galois Self-dual 2-quasi Constacyclic Codes over Finite Fields

    Authors: Yun Fan, Yue Leng

    Abstract: Let $F$ be a field with cardinality $p^\ell$ and $0\neq λ\in F$, and $0\le h<\ell$. Extending Euclidean and Hermitian inner products, Fan and Zhang introduced Galois $p^h$-inner product (DCC, vol.84, pp.473-492). In this paper, we characterize the structure of $2$-quasi $λ$-constacyclic codes over $F$; and exhibit necessary and sufficient conditions for $2$-quasi $λ$-constacyclic codes being Galoi… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

  33. arXiv:2404.07424  [pdf, other]

    cs.CV

    CopilotCAD: Empowering Radiologists with Report Completion Models and Quantitative Evidence from Medical Image Foundation Models

    Authors: Sheng Wang, Tianming Du, Katherine Fischer, Gregory E Tasian, Justin Ziemba, Joanie M Garratt, Hersh Sagreiya, Yong Fan

    Abstract: Computer-aided diagnosis systems hold great promise to aid radiologists and clinicians in radiological clinical practice and enhance diagnostic accuracy and efficiency. However, the conventional systems primarily focus on delivering diagnostic results through text report generation or medical image classification, positioning them as standalone decision-makers rather than helpers and ignoring radi… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

  34. arXiv:2404.05489  [pdf, other]

    cs.SE

    The Impact of Sanctions on GitHub Developers and Activities

    Authors: Youmei Fan, Ani Hovhannisyan, Hideaki Hata, Christoph Treude, Raula Gaikovina Kula

    Abstract: The GitHub platform has fueled the creation of truly global software, enabling contributions from developers across various geographical regions of the world. As software becomes more entwined with global politics and social regulations, it becomes similarly subject to government sanctions. In 2019, GitHub restricted access to certain services for users in specific locations but rolled back these… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  35. arXiv:2404.05264  [pdf, other]

    cs.CR cs.CV

    Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security

    Authors: Yihe Fan, Yuxin Cao, Ziyu Zhao, Ziyao Liu, Shaofeng Li

    Abstract: Multimodal Large Language Models (MLLMs) demonstrate remarkable capabilities that increasingly influence various aspects of our daily lives, constantly defining the new boundary of Artificial General Intelligence (AGI). Image modalities, enriched with profound semantic information and a more continuous mathematical nature compared to other modalities, greatly enhance the functionalities of MLLMs w… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 8 pages, 1 figure

  36. arXiv:2404.04317  [pdf, other]

    stat.ML cs.LG q-bio.QM

    DeepLINK-T: deep learning inference for time series data using knockoffs and LSTM

    Authors: Wenxuan Zuo, Zifan Zhu, Yuxuan Du, Yi-Chun Yeh, Jed A. Fuhrman, Jinchi Lv, Yingying Fan, Fengzhu Sun

    Abstract: High-dimensional longitudinal time series data is prevalent across various real-world applications. Many such applications can be modeled as regression problems with high-dimensional time series covariates. Deep learning has been a popular and powerful tool for fitting these regression models. Yet, the development of interpretable and reproducible deep-learning models is challenging and remains un… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

  37. arXiv:2404.03577  [pdf, other]

    cs.CL

    Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models

    Authors: Yantao Liu, Zijun Yao, Xin Lv, Yuchen Fan, Shulin Cao, Jifan Yu, Lei Hou, Juanzi Li

    Abstract: Providing knowledge documents for large language models (LLMs) has emerged as a promising solution to update the static knowledge inherent in their parameters. However, knowledge in the document may conflict with the memory of LLMs due to outdated or incorrect knowledge in the LLMs' parameters. This leads to the necessity of examining the capability of LLMs to assimilate supplemental external know… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted by LREC-COLING 2024 as long paper

  38. arXiv:2404.03532  [pdf, other]

    cs.CL

    Evaluating Generative Language Models in Information Extraction as Subjective Question Correction

    Authors: Yuchen Fan, Yantao Liu, Zijun Yao, Jifan Yu, Lei Hou, Juanzi Li

    Abstract: Modern Large Language Models (LLMs) have showcased remarkable prowess in various tasks necessitating sophisticated cognitive behaviors. Nevertheless, a paradoxical performance discrepancy is observed, where these models underperform in seemingly elementary tasks like relation extraction and event extraction due to two issues in conventional evaluation. (1) The imprecision of existing evaluation me… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted by LREC-COLING 2024, short paper

  39. arXiv:2404.01574  [pdf, other]

    cs.IR cs.CR cs.LG

    Multi-granular Adversarial Attacks against Black-box Neural Ranking Models

    Authors: Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

    Abstract: Adversarial ranking attacks have gained increasing attention due to their success in probing vulnerabilities, and, hence, enhancing the robustness, of neural ranking models. Conventional attack methods employ perturbations at a single granularity, e.g., word or sentence level, to target documents. However, limiting perturbations to a single level of granularity may reduce the flexibility of advers… ▽ More

    Submitted 10 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR2024

  40. arXiv:2403.20014  [pdf, other]

    cs.DB cs.AI cs.CL

    PURPLE: Making a Large Language Model a Better SQL Writer

    Authors: Tonghui Ren, Yuankai Fan, Zhenying He, Ren Huang, Jiaqi Dai, Can Huang, Yinan Jing, Kai Zhang, Yifan Yang, X. Sean Wang

    Abstract: Large Language Model (LLM) techniques play an increasingly important role in Natural Language to SQL (NL2SQL) translation. LLMs trained by extensive corpora have strong natural language understanding and basic SQL generation abilities without additional tuning specific to NL2SQL tasks. Existing LLMs-based NL2SQL approaches try to improve the translation by enhancing the LLMs with an emphasis on us… ▽ More

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 12 pages, accepted by ICDE 2024 (40th IEEE International Conference on Data Engineering)

  41. arXiv:2403.19216  [pdf, other]

    cs.IR

    Are Large Language Models Good at Utility Judgments?

    Authors: Hengran Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

    Abstract: Retrieval-augmented generation (RAG) is considered to be a promising approach to alleviate the hallucination issue of large language models (LLMs), and it has received widespread attention from researchers recently. Due to the limitation in the semantic understanding of retrieval models, the success of RAG heavily lies on the ability of LLMs to identify passages with utility. Recent efforts have e… ▽ More

    Submitted 8 June, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted by SIGIR2024

  42. MRSch: Multi-Resource Scheduling for HPC

    Authors: Boyang Li, Yuping Fan, Matthew Dearing, Zhiling Lan, Paul Rich, William Allcock, Michael Papka

    Abstract: Emerging workloads in high-performance computing (HPC) are embracing significant changes, such as having diverse resource requirements instead of being CPU-centric. This advancement forces cluster schedulers to consider multiple schedulable resources during decision-making. Existing scheduling studies rely on heuristic or optimization methods, which are limited by an inability to adapt to new scen… ▽ More

    Submitted 3 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  43. arXiv:2403.15624  [pdf, other

    cs.CV

    Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting

    Authors: Jun Guo, Xiaojian Ma, Yue Fan, Huaping Liu, Qing Li

    Abstract: Open-vocabulary 3D scene understanding presents a significant challenge in computer vision, with wide-ranging applications in embodied agents and augmented reality systems. Previous approaches have adopted Neural Radiance Fields (NeRFs) to analyze 3D scenes. In this paper, we introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Our key idea… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Project page: see https://semantic-gaussians.github.io

  44. arXiv:2403.15192  [pdf, other

    cs.CV cs.AI

    SFOD: Spiking Fusion Object Detector

    Authors: Yimeng Fan, Wei Zhang, Changsong Liu, Mingyang Li, Wenrui Lu

    Abstract: Event cameras, characterized by high temporal resolution, high dynamic range, low power consumption, and high pixel bandwidth, offer unique capabilities for object detection in specialized contexts. Despite these advantages, the inherent sparsity and asynchrony of event data pose challenges to existing object detection algorithms. Spiking Neural Networks (SNNs), inspired by the way the human brain… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  45. arXiv:2403.13909  [pdf, other

    cs.LG eess.SY

    Sequential Modeling of Complex Marine Navigation: Case Study on a Passenger Vessel (Student Abstract)

    Authors: Yimeng Fan, Pedram Agand, Mo Chen, Edward J. Park, Allison Kennedy, Chanwoo Bae

    Abstract: The maritime industry's continuous commitment to sustainability has led to a dedicated exploration of methods to reduce vessel fuel consumption. This paper undertakes this challenge through a machine learning approach, leveraging a real-world dataset spanning two years of a ferry on the west coast of Canada. Our focus centers on the creation of a time series forecasting model given the dynamic and static… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 5 pages, 3 figures, AAAI 2024 student abstract

  46. arXiv:2403.13358  [pdf, other

    cs.RO cs.CV cs.LG

    GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot

    Authors: Wenxuan Song, Han Zhao, Pengxiang Ding, Can Cui, Shangke Lyu, Yaning Fan, Donglin Wang

    Abstract: Multi-task robot learning holds significant importance in tackling diverse and complex scenarios. However, current approaches are hindered by performance issues and difficulties in collecting training datasets. In this paper, we propose GeRM (Generalist Robotic Model). We utilize offline reinforcement learning to optimize data utilization strategies to learn from both demonstrations and sub-optima… ▽ More

    Submitted 9 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  47. arXiv:2403.11705  [pdf, other

    cond-mat.str-el cs.LG

    Coarsening of chiral domains in itinerant electron magnets: A machine learning force field approach

    Authors: Yunhao Fan, Sheng Zhang, Gia-Wei Chern

    Abstract: Frustrated itinerant magnets often exhibit complex noncollinear or noncoplanar magnetic orders which support topological electronic structures. A canonical example is the anomalous quantum Hall state with a chiral spin order stabilized by electron-spin interactions on a triangular lattice. While a long-range magnetic order cannot survive thermal fluctuations in two dimensions, the chiral order whi… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 16 pages, 8 figures

  48. arXiv:2403.11481  [pdf, other

    cs.CV

    VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

    Authors: Yue Fan, Xiaojian Ma, Rujie Wu, Yuntao Du, Jiaqi Li, Zhi Gao, Qing Li

    Abstract: We explore how reconciling several foundation models (large language models and vision-language models) with a novel unified memory mechanism could tackle the challenging video understanding problem, especially capturing the long-term temporal relations in lengthy videos. In particular, the proposed multimodal agent VideoAgent: 1) constructs a structured memory to store both the generic temporal e… ▽ More

    Submitted 15 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: ECCV-24; Project page: videoagent.github.io; First two authors contributed equally

  49. arXiv:2403.11453  [pdf, other

    cs.GR cs.CV

    Bridging 3D Gaussian and Mesh for Freeview Video Rendering

    Authors: Yuting Xiao, Xuan Wang, Jiafei Li, Hongrui Cai, Yanbo Fan, Nan Xue, Minghui Yang, Yujun Shen, Shenghua Gao

    Abstract: This is only a preview version of GauMesh. Recently, primitive-based rendering has been proven to achieve convincing results in solving the problem of modeling and rendering the 3D dynamic scene from 2D images. Despite this, in the context of novel view synthesis, each type of primitive has its inherent defects in terms of representation ability. It is difficult to exploit the mesh to depict the f… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 7 pages

  50. arXiv:2403.10779  [pdf, other

    cs.CL

    LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices

    Authors: Jingping Nie, Hanya Shao, Yuang Fan, Qijia Shao, Haoxuan You, Matthias Preindl, Xiaofan Jiang

    Abstract: Despite the global mental health crisis, access to screenings, professionals, and treatments remains high. In collaboration with licensed psychotherapists, we propose a Conversational AI Therapist with psychotherapeutic Interventions (CaiTI), a platform that leverages large language models (LLMs) and smart devices to enable better mental health self-care. CaiTI can screen the day-to-day functionin… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.