Showing 1–50 of 734 results for author: Wu, F

Searching in archive cs.
  1. arXiv:2408.15914  [pdf, other]

    cs.CV

    CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization

    Authors: Feize Wu, Yun Pang, Junyi Zhang, Lianyu Pang, Jian Yin, Baoquan Zhao, Qing Li, Xudong Mao

    Abstract: Recent advances in text-to-image personalization have enabled high-quality and controllable image synthesis for user-provided concepts. However, existing methods still struggle to balance identity preservation with text alignment. Our approach is based on the fact that generating prompt-aligned images requires a precise semantic understanding of the prompt, which involves accurately processing the…

    Submitted 28 August, 2024; originally announced August 2024.

  2. arXiv:2408.14792  [pdf, other]

    cs.CY

    Measuring Human Contribution in AI-Assisted Content Generation

    Authors: Yueqi Xie, Tao Qi, Jingwei Yi, Ryan Whalen, Junming Huang, Qian Ding, Yu Xie, Xing Xie, Fangzhao Wu

    Abstract: With the growing prevalence of generative artificial intelligence (AI), an increasing amount of content is no longer exclusively generated by humans but by generative AI models with human guidance. This shift presents notable challenges for the delineation of originality due to the varying degrees of human contribution in AI-assisted works. This study raises the research question of measuring huma…

    Submitted 27 August, 2024; originally announced August 2024.

  3. arXiv:2408.14219  [pdf, other]

    cs.RO

    Visuo-Tactile Exploration of Unknown Rigid 3D Curvatures by Vision-Augmented Unified Force-Impedance Control

    Authors: Kübra Karacan, Anran Zhang, Hamid Sadeghian, Fan Wu, Sami Haddadin

    Abstract: Despite recent advancements in torque-controlled tactile robots, integrating them into manufacturing settings remains challenging, particularly in complex environments. Simplifying robotic skill programming for non-experts is crucial for increasing robot deployment in manufacturing. This work proposes an innovative approach, Vision-Augmented Unified Force-Impedance Control (VA-UFIC), aimed at intu…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 8 pages, 3 figures, accepted by IROS 2024

  4. arXiv:2408.12867  [pdf, other]

    cs.CV

    Semantic Alignment for Multimodal Large Language Models

    Authors: Tao Wu, Mengze Li, Jingyuan Chen, Wei Ji, Wang Lin, Jinyang Gao, Kun Kuang, Zhou Zhao, Fei Wu

    Abstract: Research on Multi-modal Large Language Models (MLLMs) towards the multi-image cross-modal instruction has received increasing attention and made significant progress, particularly in scenarios involving closely resembling images (e.g., change captioning). Existing MLLMs typically follow a two-step process in their pipelines: first, extracting visual tokens independently for each input image, and t…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by MM 2024

  5. arXiv:2408.12469  [pdf, other]

    cs.CV

    Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning

    Authors: Mushui Liu, Fangtai Wu, Bozheng Li, Ziqian Lu, Yunlong Yu, Xi Li

    Abstract: Few-shot learning (FSL) aims to recognize new concepts using a limited number of visual samples. Existing approaches attempt to incorporate semantic information into the limited visual data for category understanding. However, these methods often enrich class-level feature representations with abstract category names, failing to capture the nuanced features essential for effective generalization.…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 9 pages, 7 figures

  6. arXiv:2408.12285  [pdf, other]

    cs.RO

    Tactile-Morph Skills: Energy-Based Control Meets Data-Driven Learning

    Authors: Anran Zhang, Kübra Karacan, Hamid Sadeghian, Yansong Wu, Fan Wu, Sami Haddadin

    Abstract: Robotic manipulation is essential for modernizing factories and automating industrial tasks like polishing, which require advanced tactile abilities. These robots must be easily set up, safely work with humans, learn tasks autonomously, and transfer skills to similar tasks. Addressing these needs, we introduce the tactile-morph skill framework, which integrates unified force-impedance control with…

    Submitted 23 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: 15 pages, 7 figures, updated footnote

  7. arXiv:2408.08152  [pdf, other]

    cs.CL cs.AI cs.LG cs.LO

    DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

    Authors: Huajian Xin, Z. Z. Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, Wenjun Gao, Qihao Zhu, Dejian Yang, Zhibin Gou, Z. F. Wu, Fuli Luo, Chong Ruan

    Abstract: We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-…

    Submitted 15 August, 2024; originally announced August 2024.

  8. arXiv:2408.06849  [pdf, other]

    cs.AI cs.CL

    Causal Agent based on Large Language Model

    Authors: Kairong Han, Kun Kuang, Ziyu Zhao, Junjian Ye, Fei Wu

    Abstract: Large language models (LLMs) have achieved significant success across various domains. However, the inherent complexity of causal problems and causal theory poses challenges in accurately describing them in natural language, making it difficult for LLMs to comprehend and use them effectively. Causal methods are not easily conveyed through natural language, which hinders LLMs' ability to apply them…

    Submitted 13 August, 2024; originally announced August 2024.

  9. arXiv:2408.06021  [pdf, other]

    cs.CV

    ClickAttention: Click Region Similarity Guided Interactive Segmentation

    Authors: Long Xu, Shanghong Li, Yongquan Chen, Junkang Chen, Rui Huang, Feng Wu

    Abstract: Interactive segmentation algorithms based on click points have garnered significant attention from researchers in recent years. However, existing studies typically use sparse click maps as model inputs to segment specific target objects, which primarily affect local regions and have limited abilities to focus on the whole target object, leading to increased times of clicks. In addition, most exist…

    Submitted 12 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  10. arXiv:2408.05428  [pdf, other]

    cs.LG stat.ME stat.ML

    Generalized Encouragement-Based Instrumental Variables for Counterfactual Regression

    Authors: Anpeng Wu, Kun Kuang, Ruoxuan Xiong, Xiangwei Chen, Zexu Sun, Fei Wu, Kun Zhang

    Abstract: In causal inference, encouragement designs (EDs) are widely used to analyze causal effects, when randomized controlled trials (RCTs) are impractical or compliance to treatment cannot be perfectly enforced. Unlike RCTs, which directly allocate treatments, EDs randomly assign encouragement policies that positively motivate individuals to engage in a specific treatment. These random encouragements ac…

    Submitted 10 August, 2024; originally announced August 2024.

  11. arXiv:2408.04863  [pdf, other]

    cs.SE

    Coding-PTMs: How to Find Optimal Code Pre-trained Models for Code Embedding in Vulnerability Detection?

    Authors: Yu Zhao, Lina Gong, Zhiqiu Huang, Yongwei Wang, Mingqiang Wei, Fei Wu

    Abstract: Vulnerability detection is garnering increasing attention in software engineering, since code vulnerabilities possibly pose significant security. Recently, reusing various code pre-trained models has become common for code embedding without providing reasonable justifications in vulnerability detection. The premise for casually utilizing pre-trained models (PTMs) is that the code embeddings genera…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by ASE 2024

  12. arXiv:2408.04637  [pdf, other]

    cs.CL

    APE: Active Learning-based Tooling for Finding Informative Few-shot Examples for LLM-based Entity Matching

    Authors: Kun Qian, Yisi Sang, Farima Fatahi Bayat, Anton Belyi, Xianqi Chu, Yash Govind, Samira Khorshidi, Rahul Khot, Katherine Luna, Azadeh Nikfarjam, Xiaoguang Qi, Fei Wu, Xianhan Zhang, Yunyao Li

    Abstract: Prompt engineering is an iterative procedure often requiring extensive manual effort to formulate suitable instructions for effectively directing large language models (LLMs) in specific tasks. Incorporating few-shot examples is a vital and effective approach to providing LLMs with precise instructions, leading to improved LLM performance. Nonetheless, identifying the most informative demonstratio…

    Submitted 29 July, 2024; originally announced August 2024.

    Comments: 3 pages, Proceedings of the Fifth Workshop on Data Science with Human-in-the-Loop (DaSH 2024)

  13. arXiv:2408.01556  [pdf, other]

    astro-ph.IM cs.DL cs.IR

    pathfinder: A Semantic Framework for Literature Review and Knowledge Discovery in Astronomy

    Authors: Kartheik G. Iyer, Mikaeel Yunus, Charles O'Neill, Christine Ye, Alina Hyk, Kiera McCormick, Ioana Ciuca, John F. Wu, Alberto Accomazzi, Simone Astarita, Rishabh Chakrabarty, Jesse Cranney, Anjalie Field, Tirthankar Ghosal, Michele Ginolfi, Marc Huertas-Company, Maja Jablonska, Sandor Kruk, Huiling Liu, Gabriel Marchidan, Rohit Mistry, J. P. Naiman, J. E. G. Peek, Mugdha Polimera, Sergio J. Rodriguez, et al. (5 additional authors not shown)

    Abstract: The exponential growth of astronomical literature poses significant challenges for researchers navigating and synthesizing general insights or even domain-specific knowledge. We present Pathfinder, a machine learning framework designed to enable literature review and knowledge discovery in astronomy, focusing on semantic searching with natural language instead of syntactic searches with keywords.…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 25 pages, 9 figures, submitted to AAS journals. Comments are welcome, and the tools mentioned are available online at https://pfdr.app

  14. arXiv:2408.00657  [pdf, other]

    cs.LG

    Disentangling Dense Embeddings with Sparse Autoencoders

    Authors: Charles O'Neill, Christine Ye, Kartheik Iyer, John F. Wu

    Abstract: Sparse autoencoders (SAEs) have shown promise in extracting interpretable features from complex neural networks. We present one of the first applications of SAEs to dense text embeddings from large language models, demonstrating their effectiveness in disentangling semantic concepts. By training SAEs on embeddings of over 420,000 scientific paper abstracts from computer science and astronomy, we s…

    Submitted 4 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

  15. arXiv:2408.00123  [pdf, other]

    cs.IR cs.AI cs.MM cs.SI

    Semantic Codebook Learning for Dynamic Recommendation Models

    Authors: Zheqi Lv, Shaoxuan He, Tianyu Zhan, Shengyu Zhang, Wenqiao Zhang, Jingyuan Chen, Zhou Zhao, Fei Wu

    Abstract: Dynamic sequential recommendation (DSR) can generate model parameters based on user behavior to improve the personalization of sequential recommendation under various user preferences. However, it faces the challenges of large parameter search space and sparse and noisy user-item interactions, which reduces the applicability of the generated model parameters. The Semantic Codebook Learning for Dyn…

    Submitted 31 July, 2024; originally announced August 2024.

  16. arXiv:2407.20600  [pdf, other]

    cs.CV

    Knowledge Fused Recognition: Fusing Hierarchical Knowledge for Image Recognition through Quantitative Relativity Modeling and Deep Metric Learning

    Authors: Yunfeng Zhao, Huiyu Zhou, Fei Wu, Xifeng Wu

    Abstract: Image recognition is an essential baseline for deep metric learning. Hierarchical knowledge about image classes depicts inter-class similarities or dissimilarities. Effective fusion of hierarchical knowledge about image classes to enhance image recognition remains a challenging topic to advance. In this paper, we propose a novel deep metric learning based method to effectively fuse hierarchical pr…

    Submitted 30 July, 2024; originally announced July 2024.

  17. arXiv:2407.19402  [pdf, other]

    cs.CV eess.IV

    NVC-1B: A Large Neural Video Coding Model

    Authors: Xihua Sheng, Chuanbo Tang, Li Li, Dong Liu, Feng Wu

    Abstract: The emerging large models have achieved notable progress in the fields of natural language processing and computer vision. However, large models for neural video coding are still unexplored. In this paper, we try to explore how to build a large neural video coding model. Based on a small baseline model, we gradually scale up the model sizes of its different coding parts, including the motion encod…

    Submitted 28 July, 2024; originally announced July 2024.

  18. arXiv:2407.19389  [pdf, other]

    cs.DC cs.LG math.OC

    FIARSE: Model-Heterogeneous Federated Learning via Importance-Aware Submodel Extraction

    Authors: Feijie Wu, Xingchen Wang, Yaqing Wang, Tianci Liu, Lu Su, Jing Gao

    Abstract: In federated learning (FL), accommodating clients' varied computational capacities poses a challenge, often limiting the participation of those with constrained resources in global model training. To address this issue, the concept of model heterogeneity through submodel extraction has emerged, offering a tailored solution that aligns the model's complexity with each client's computational capacit…

    Submitted 28 July, 2024; originally announced July 2024.

  19. arXiv:2407.15309  [pdf, other]

    cs.DC cs.LG

    vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving

    Authors: Jiale Xu, Rui Zhang, Cong Guo, Weiming Hu, Zihan Liu, Feiyang Wu, Yu Feng, Shixuan Sun, Changxu Shao, Yuhong Guo, Junping Zhao, Ke Zhang, Minyi Guo, Jingwen Leng

    Abstract: Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests. This surge in demand poses significant challenges in optimizing throughput and latency while keeping costs manageable. The Key-Value (KV) cache, a standard method for retaining previous computations, makes LLM inference highly bounded by memory. While batching strategies can enhance performa…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 16 pages, 12 figures

  20. arXiv:2407.15026  [pdf, other]

    cs.AR cs.AI

    Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms

    Authors: Zhihai Wang, Zijie Geng, Zhaojie Tu, Jie Wang, Yuxi Qian, Zhexuan Xu, Ziyan Liu, Siyuan Xu, Zhentao Tang, Shixiong Kai, Mingxuan Yuan, Jianye Hao, Bin Li, Yongdong Zhang, Feng Wu

    Abstract: The increasing complexity of modern very-large-scale integration (VLSI) design highlights the significance of Electronic Design Automation (EDA) technologies. Chip placement is a critical step in the EDA workflow, which positions chip modules on the canvas with the goal of optimizing performance, power, and area (PPA) metrics of final chip designs. Recent advances have demonstrated the great poten…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: A comprehensive benchmark for AI-based chip placement algorithms using end-to-end performance metrics

  21. arXiv:2407.14022  [pdf, other]

    stat.ME cs.LG

    Causal Inference with Complex Treatments: A Survey

    Authors: Yingrong Wang, Haoxuan Li, Minqin Zhu, Anpeng Wu, Ruoxuan Xiong, Fei Wu, Kun Kuang

    Abstract: Causal inference plays an important role in explanatory analysis and decision making across various fields like statistics, marketing, health care, and education. Its main task is to estimate treatment effects and make intervention policies. Traditionally, most of the previous works typically focus on the binary treatment setting that there is only one treatment for a unit to adopt or not. However…

    Submitted 19 July, 2024; originally announced July 2024.

  22. arXiv:2407.11541  [pdf, other]

    eess.IV cs.CV

    Uniformly Accelerated Motion Model for Inter Prediction

    Authors: Zhuoyuan Li, Yao Li, Chuanbo Tang, Li Li, Dong Liu, Feng Wu

    Abstract: Inter prediction is a key technology to reduce the temporal redundancy in video coding. In natural videos, there are usually multiple moving objects with variable velocity, resulting in complex motion fields that are difficult to represent compactly. In Versatile Video Coding (VVC), existing inter prediction methods usually assume uniform speed motion between consecutive frames and use the linear…

    Submitted 21 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  23. arXiv:2407.10926  [pdf, other]

    eess.IV cs.CV

    In-Loop Filtering via Trained Look-Up Tables

    Authors: Zhuoyuan Li, Jiacheng Li, Yao Li, Li Li, Dong Liu, Feng Wu

    Abstract: In-loop filtering (ILF) is a key technology for removing the artifacts in image/video coding standards. Recently, neural network-based in-loop filtering methods achieve remarkable coding gains beyond the capability of advanced video coding standards, which becomes a powerful coding tool candidate for future video coding standards. However, the utilization of deep neural networks brings heavy time…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures

  24. arXiv:2407.07771  [pdf, other]

    cs.CL cs.CV cs.MM

    Multi-task Prompt Words Learning for Social Media Content Generation

    Authors: Haochen Xue, Chong Zhang, Chengzhi Liu, Fangyu Wu, Xiaobo Jin

    Abstract: The rapid development of the Internet has profoundly changed human life. Humans are increasingly expressing themselves and interacting with others on social media platforms. However, although artificial intelligence technology has been widely used in many aspects of life, its application in social media content creation is still blank. To solve this problem, we propose a new prompt word generation…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 figures

    Journal ref: International Joint Conference on Neural Networks 2024

  25. arXiv:2407.07295  [pdf, other]

    eess.IV cs.CE cs.CV

    Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis

    Authors: Jian-Qing Zheng, Yuanhan Mo, Yang Sun, Jiahua Li, Fuping Wu, Ziyang Wang, Tonia Vincent, Bartłomiej W. Papież

    Abstract: In medical imaging, the diffusion models have shown great potential in synthetic image generation tasks. However, these models often struggle with the interpretable connections between the generated and existing images and could create illusions. To address these challenges, our research proposes a novel diffusion-based generative model based on deformation diffusion and recovery. This model, name…

    Submitted 21 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  26. arXiv:2407.06590  [pdf, other]

    cs.RO cs.AI

    Revolutionizing Battery Disassembly: The Design and Implementation of a Battery Disassembly Autonomous Mobile Manipulator Robot(BEAM-1)

    Authors: Yanlong Peng, Zhigang Wang, Yisheng Zhang, Shengmin Zhang, Nan Cai, Fan Wu, Ming Chen

    Abstract: The efficient disassembly of end-of-life electric vehicle batteries(EOL-EVBs) is crucial for green manufacturing and sustainable development. The current pre-programmed disassembly conducted by the Autonomous Mobile Manipulator Robot(AMMR) struggles to meet the disassembly requirements in dynamic environments, complex scenarios, and unstructured processes. In this paper, we propose a Battery Disas…

    Submitted 9 July, 2024; originally announced July 2024.

  27. arXiv:2407.06503  [pdf, other]

    cs.LG

    Preference-Guided Reinforcement Learning for Efficient Exploration

    Authors: Guojian Wang, Faguo Wu, Xiao Zhang, Tianyuan Chen, Xuyang Chen, Lin Zhao

    Abstract: In this paper, we investigate preference-based reinforcement learning (PbRL) that allows reinforcement learning (RL) agents to learn from human feedback. This is particularly valuable when defining a fine-grain reward function is not feasible. However, this approach is inefficient and impractical for promoting deep exploration in hard-exploration tasks with long horizons and sparse rewards. To tac…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 13 pages, 17 figures

  28. arXiv:2407.06127  [pdf, other]

    cs.CV

    Better Sampling, towards Better End-to-end Small Object Detection

    Authors: Zile Huang, Chong Zhang, Mingyu Jin, Fangyu Wu, Chengzhi Liu, Xiaobo Jin

    Abstract: While deep learning-based general object detection has made significant strides in recent years, the effectiveness and efficiency of small object detection remain unsatisfactory. This is primarily attributed not only to the limited characteristics of such small targets but also to the high density and mutual overlap among these targets. The existing transformer-based small object detectors do not…

    Submitted 17 May, 2024; originally announced July 2024.

    Comments: 14 pages, 5 figures

  29. arXiv:2407.03038  [pdf, other]

    cs.CL cs.DC cs.LG

    On the Client Preference of LLM Fine-tuning in Federated Learning

    Authors: Feijie Wu, Xiaoze Liu, Haoyu Wang, Xingchen Wang, Jing Gao

    Abstract: Reinforcement learning with human feedback (RLHF) fine-tunes a pretrained large language model (LLM) using preference datasets, enabling the LLM to generate outputs that align with human preferences. Given the sensitive nature of these preference datasets held by various clients, there is a need to implement RLHF within a federated learning (FL) framework, where clients are reluctant to share thei…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Work in progress

  30. arXiv:2407.02209  [pdf, other]

    cs.CL cs.AI

    Generative Monoculture in Large Language Models

    Authors: Fan Wu, Emily Black, Varun Chandrasekaran

    Abstract: We introduce generative monoculture, a behavior observed in large language models (LLMs) characterized by a significant narrowing of model output diversity relative to available training data for a given task: for example, generating only positive book reviews for books with a mixed reception. While in some cases, generative monoculture enhances performance (e.g., LLMs more often produce eff…

    Submitted 2 July, 2024; originally announced July 2024.

  31. arXiv:2406.17706  [pdf, other]

    cs.LG cs.CL cs.DC

    FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model

    Authors: Feijie Wu, Zitao Li, Yaliang Li, Bolin Ding, Jing Gao

    Abstract: Large language models (LLMs) show amazing performance on many domain-specific tasks after fine-tuning with some appropriate data. However, many domain-specific data are privately distributed across multiple owners. Thus, this dilemma raises the interest in how to perform LLM fine-tuning in federated learning (FL). However, confronted with limited computation and communication capacities, FL client…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  32. arXiv:2406.16989  [pdf, other]

    cs.LG cs.AI

    Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning

    Authors: Ziyu Zhao, Leilei Gan, Guoyin Wang, Yuwei Hu, Tao Shen, Hongxia Yang, Kun Kuang, Fei Wu

    Abstract: Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models (LLMs). Its modular and plug-and-play nature allows the integration of various domain-specific LoRAs, enhancing LLM capabilities. Open-source platforms like Huggingface and Modelscope have introduced a new computational paradigm, Uploadable Machine Learning (UML). In UML, contributors use decentralized data to tr…

    Submitted 16 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.09997

  33. arXiv:2406.12975  [pdf, other]

    cs.CL cs.AI cs.CY

    SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation

    Authors: Xiaoze Liu, Ting Sun, Tianyang Xu, Feijie Wu, Cunxiang Wang, Xiaoqian Wang, Jing Gao

    Abstract: Large Language Models (LLMs) have transformed machine learning but raised significant legal concerns due to their potential to produce text that infringes on copyrights, resulting in several high-profile lawsuits. The legal landscape is struggling to keep pace with these rapid advancements, with ongoing debates about whether generated text might plagiarize copyrighted materials. Current LLMs may i…

    Submitted 21 August, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Work in progress

  34. arXiv:2406.12928  [pdf, other]

    cs.LG cs.AI cs.CL

    Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox

    Authors: Yijun Liu, Yuan Meng, Fang Wu, Shenhao Peng, Hang Yao, Chaoyu Guan, Chen Tang, Xinzhu Ma, Zhi Wang, Wenwu Zhu

    Abstract: Large language models (LLMs) have exhibited exciting progress in multiple scenarios, while the huge computational demands hinder their deployments in lots of real-world applications. As an effective means to reduce memory footprint and inference cost, quantization also faces challenges in performance degradation at low bit-widths. Understanding the impact of quantization on LLM capabilities, espec…

    Submitted 15 June, 2024; originally announced June 2024.

  35. arXiv:2406.11633  [pdf, other]

    cs.CV

    DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

    Authors: Renqiu Xia, Song Mao, Xiangchao Yan, Hongbin Zhou, Bo Zhang, Haoyang Peng, Jiahao Pi, Daocheng Fu, Wenjie Wu, Hancheng Ye, Shiyang Feng, Bin Wang, Chao Xu, Conghui He, Pinlong Cai, Min Dou, Botian Shi, Sheng Zhou, Yongwei Wang, Bin Wang, Junchi Yan, Fei Wu, Yu Qiao

    Abstract: Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific document-oriented tasks is therefore meaningful. Despite promising advancements, large models still perform poorly on multi-page scientific document extract…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Homepage of DocGenome: https://unimodal4reasoning.github.io/DocGenome_page 22 pages, 11 figures

  36. arXiv:2406.08709  [pdf, other]

    cs.LG stat.ME

    Introducing Diminutive Causal Structure into Graph Representation Learning

    Authors: Hang Gao, Peng Qiao, Yifan Jin, Fengge Wu, Jiangmeng Li, Changwen Zheng

    Abstract: When engaging in end-to-end graph representation learning with Graph Neural Networks (GNNs), the intricate causal relationships and rules inherent in graph data pose a formidable challenge for the model in accurately capturing authentic data relationships. A proposed mitigating strategy involves the direct integration of rules or relationships corresponding to the graph data into the model. Howeve…

    Submitted 12 June, 2024; originally announced June 2024.

  37. arXiv:2406.07741  [pdf, other]

    cs.CV

    Back to the Color: Learning Depth to Specific Color Transformation for Unsupervised Depth Estimation

    Authors: Yufan Zhu, Chongzhi Ran, Mingtao Feng, Fangfang Wu, Le Dong, Weisheng Dong, Antonio M. López, Guangming Shi

    Abstract: Virtual engines can generate dense depth maps for various synthetic scenes, making them invaluable for training depth estimation models. However, discrepancies between synthetic and real-world colors pose significant challenges for depth estimation in real-world scenes, especially in complex and uncertain environments encountered in unsupervised monocular depth estimation tasks. To address this is…

    Submitted 26 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  38. arXiv:2406.05504  [pdf, other]

    cs.LG

    G-Transformer: Counterfactual Outcome Prediction under Dynamic and Time-varying Treatment Regimes

    Authors: Hong Xiong, Feng Wu, Leon Deng, Megan Su, Li-wei H Lehman

    Abstract: In the context of medical decision making, counterfactual prediction enables clinicians to predict treatment outcomes of interest under alternative courses of therapeutic actions given observed patient history. Prior machine learning approaches for counterfactual predictions under time-varying treatments focus on static time-varying treatment regimes where treatments do not depend on previous cova…

    Submitted 27 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  39. arXiv:2406.05000  [pdf, other]

    cs.CV

    AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation

    Authors: Lianyu Pang, Jian Yin, Baoquan Zhao, Feize Wu, Fu Lee Wang, Qing Li, Xudong Mao

    Abstract: Recent advances in text-to-image models have enabled high-quality personalized image synthesis of user-provided concepts with flexible textual control. In this work, we analyze the limitations of two primary techniques in text-to-image personalization: Textual Inversion and DreamBooth. When integrating the learned concept into new prompts, Textual Inversion tends to overfit the concept, while Drea…

    Submitted 7 June, 2024; originally announced June 2024.

  40. arXiv:2406.03703  [pdf, other]

    cs.CL cs.LG

    Synthesizing Conversations from Unlabeled Documents using Automatic Response Segmentation

    Authors: Fanyou Wu, Weijie Xu, Chandan K. Reddy, Srinivasan H. Sengamedu

    Abstract: In this study, we tackle the challenge of inadequate and costly training data that has hindered the development of conversational question answering (ConvQA) systems. Enterprises have a large corpus of diverse internal documents. Instead of relying on a searching engine, a more compelling approach for people to comprehend these documents is to create a dialogue system. In this paper, we propose a…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  41. arXiv:2406.02027  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV

    Inference Attacks: A Taxonomy, Survey, and Promising Directions

    Authors: Feng Wu, Lei Cui, Shaowen Yao, Shui Yu

    Abstract: The prosperity of machine learning has also brought people's concerns about data privacy. Among them, inference attacks can implement privacy breaches in various MLaaS scenarios and model training/prediction phases. Specifically, inference attacks can perform privacy inference on undisclosed target training sets based on outputs of the target model, including but not limited to statistics, members…

    Submitted 27 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  42. arXiv:2405.20626  [pdf, other]

    cs.IR cs.IT

    Causal Distillation for Alleviating Performance Heterogeneity in Recommender Systems

    Authors: Shengyu Zhang, Ziqi Jiang, Jiangchao Yao, Fuli Feng, Kun Kuang, Zhou Zhao, Shuo Li, Hongxia Yang, Tat-Seng Chua, Fei Wu

    Abstract: Recommendation performance usually exhibits a long-tail distribution over users -- a small portion of head users enjoy much more accurate recommendation services than the others. We reveal two sources of this performance heterogeneity problem: the uneven distribution of historical interactions (a natural source); and the biased training of recommender models (a model source). As addressing this pr…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: TKDE 2023

  43. arXiv:2405.20389  [pdf, other]

    astro-ph.IM cs.AI cs.HC cs.IR

    Designing an Evaluation Framework for Large Language Models in Astronomy Research

    Authors: John F. Wu, Alina Hyk, Kiera McCormick, Christine Ye, Simone Astarita, Elina Baral, Jo Ciuca, Jesse Cranney, Anjalie Field, Kartheik Iyer, Philipp Koehn, Jenn Kotler, Sandor Kruk, Michelle Ntampaka, Charles O'Neill, Joshua E. G. Peek, Sanjib Sharma, Mikaeel Yunus

    Abstract: Large Language Models (LLMs) are shifting how scientific research is done. It is imperative to understand how researchers interact with these models and how scientific sub-communities like astronomy might benefit from them. However, there is currently no standard for evaluating the use of LLMs in astronomy. Therefore, we present the experimental design for an evaluation study on how astronomy rese…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 7 pages, 3 figures. Code available at https://github.com/jsalt2024-evaluating-llms-for-astronomy/astro-arxiv-bot

  44. arXiv:2405.18315  [pdf, other]

    cs.AI cs.PL

    DSDL: Data Set Description Language for Bridging Modalities and Tasks in AI Data

    Authors: Bin Wang, Linke Ouyang, Fan Wu, Wenchang Ning, Xiao Han, Zhiyuan Zhao, Jiahui Peng, Yiying Jiang, Dahua Lin, Conghui He

    Abstract: In the era of artificial intelligence, the diversity of data modalities and annotation formats often renders data unusable directly, requiring understanding and format conversion before it can be used by researchers or developers with different needs. To tackle this problem, this article introduces a framework called Dataset Description Language (DSDL) that aims to simplify dataset processing by p…

    Submitted 28 May, 2024; originally announced May 2024.

  45. arXiv:2405.17830  [pdf, other]

    cs.CL

    More Than Catastrophic Forgetting: Integrating General Capabilities For Domain-Specific LLMs

    Authors: Chengyuan Liu, Shihang Wang, Yangyang Kang, Lizhi Qing, Fubang Zhao, Changlong Sun, Kun Kuang, Fei Wu

    Abstract: The performance on general tasks decreases after Large Language Models (LLMs) are fine-tuned on domain-specific tasks, a phenomenon known as Catastrophic Forgetting (CF). However, this paper presents a further challenge for real application of domain-specific LLMs beyond CF, called General Capabilities Integration (GCI), which necessitates the integration of both the general capabilities and…

    Submitted 28 May, 2024; originally announced May 2024.

  46. arXiv:2405.16847  [pdf, other]

    cs.CV cs.AI

    TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction

    Authors: Yinda Chen, Haoyuan Shi, Xiaoyu Liu, Te Shi, Ruobing Zhang, Dong Liu, Zhiwei Xiong, Feng Wu

    Abstract: Autoregressive next-token prediction is a standard pretraining method for large-scale language models, but its application to vision tasks is hindered by the non-sequential nature of image data, leading to cumulative errors. Most vision models employ masked autoencoder (MAE) based pretraining, which faces scalability issues. To address these challenges, we introduce TokenUnify, a novel pr…

    Submitted 27 May, 2024; originally announced May 2024.

  47. arXiv:2405.14314  [pdf, other]

    cs.AI cs.CL cs.LG cs.MA cs.RO

    Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration

    Authors: Yang Zhang, Shixin Yang, Chenjia Bai, Fei Wu, Xiu Li, Zhen Wang, Xuelong Li

    Abstract: Grounding the reasoning ability of large language models (LLMs) for embodied tasks is challenging due to the complexity of the physical world. Especially, LLM planning for multi-agent collaboration requires communication of agents or credit assignment as the feedback to re-adjust the proposed plans and achieve effective coordination. However, existing methods that overly rely on physical verificat…

    Submitted 25 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: The first two authors contributed equally

  48. arXiv:2405.13097  [pdf, other]

    cs.CV

    NieR: Normal-Based Lighting Scene Rendering

    Authors: Hongsheng Wang, Yang Wang, Yalan Liu, Fayuan Hu, Shengyu Zhang, Fei Wu, Feng Lin

    Abstract: In real-world road scenes, diverse material properties lead to complex light reflection phenomena, making accurate color reproduction crucial for enhancing the realism and safety of simulated driving environments. However, existing methods often struggle to capture the full spectrum of lighting effects, particularly in dynamic scenarios where viewpoint changes induce significant material color var…

    Submitted 21 May, 2024; originally announced May 2024.

  49. arXiv:2405.12806  [pdf, other]

    cs.CV

    MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video

    Authors: Hongsheng Wang, Xiang Cai, Xi Sun, Jinhong Yue, Zhanyun Tang, Shengyu Zhang, Feng Lin, Fei Wu

    Abstract: Single-view clothed human reconstruction holds a central position in virtual reality applications, especially in contexts involving intricate human motions. It presents notable challenges in achieving realistic clothing deformation. Current methodologies often overlook the influence of motion on surface deformation, resulting in surfaces lacking the constraints imposed by global motion. To overcom…

    Submitted 21 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:1710.03746 by other authors

  50. arXiv:2405.12724  [pdf, other]

    cs.CV

    RemoCap: Disentangled Representation Learning for Motion Capture

    Authors: Hongsheng Wang, Lizao Zhang, Zhangnan Zhong, Shuolin Xu, Xinrui Zhou, Shengyu Zhang, Huahao Xu, Fei Wu, Feng Lin

    Abstract: Reconstructing 3D human bodies from realistic motion sequences remains a challenge due to pervasive and complex occlusions. Current methods struggle to capture the dynamics of occluded body parts, leading to model penetration and distorted motion. RemoCap leverages Spatial Disentanglement (SD) and Motion Disentanglement (MD) to overcome these limitations. SD addresses occlusion interference betwee…

    Submitted 21 May, 2024; originally announced May 2024.