Showing 1–50 of 221 results for author: Peng, W

Searching in archive cs.
  1. T3M: Text Guided 3D Human Motion Synthesis from Speech

    Authors: Wenshuo Peng, Kaipeng Zhang, Sai Qian Zhang

    Abstract: Speech-driven 3D motion synthesis seeks to create lifelike animations based on human speech, with potential uses in virtual reality, gaming, and film production. Existing approaches rely solely on speech audio for motion generation, leading to inaccurate and inflexible synthesis results. To mitigate this problem, we introduce a novel text-guided 3D human motion synthesis method, termed \texti…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 10 pages, 4 figures

  2. arXiv:2408.08822  [pdf, ps, other]

    cs.CV

    PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future

    Authors: Guangyi Wang, Yuren Cai, Lijiang Li, Wei Peng, Songzhi Su

    Abstract: Diffusion Probabilistic Models (DPMs) have shown remarkable potential in image generation, but their sampling efficiency is hindered by the need for numerous denoising steps. Most existing solutions accelerate the sampling process by proposing fast ODE solvers. However, the inevitable discretization errors of the ODE solvers are significantly magnified when the number of function evaluations (NFE)…

    Submitted 16 August, 2024; originally announced August 2024.

  3. arXiv:2408.06576  [pdf, other]

    cs.CL

    CTISum: A New Benchmark Dataset For Cyber Threat Intelligence Summarization

    Authors: Wei Peng, Junmei Ding, Wei Wang, Lei Cui, Wei Cai, Zhiyu Hao, Xiaochun Yun

    Abstract: The Cyber Threat Intelligence (CTI) summarization task requires a system to generate concise and accurate highlights from raw intelligence data, which plays an important role in providing decision-makers with crucial information to quickly detect and respond to cyber threats in the cybersecurity domain. However, efficient techniques for summarizing CTI reports, including facts, analytical insights,…

    Submitted 12 August, 2024; originally announced August 2024.

  4. arXiv:2408.04568  [pdf, other]

    cs.CL cs.AI

    Learning Fine-Grained Grounded Citations for Attributed Large Language Models

    Authors: Lei Huang, Xiaocheng Feng, Weitao Ma, Yuxuan Gu, Weihong Zhong, Xiachong Feng, Weijiang Yu, Weihua Peng, Duyu Tang, Dandan Tu, Bing Qin

    Abstract: Despite the impressive performance on information-seeking tasks, large language models (LLMs) still struggle with hallucinations. Attributed LLMs, which augment generated text with in-line citations, have shown potential in mitigating hallucinations and improving verifiability. However, current approaches suffer from suboptimal citation quality due to their reliance on in-context learning. Further…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by ACL 2024 Findings

  5. arXiv:2408.01230  [pdf, other]

    cs.RO cs.LG

    HeteroMorpheus: Universal Control Based on Morphological Heterogeneity Modeling

    Authors: YiFan Hao, Yang Yang, Junru Song, Wei Peng, Weien Zhou, Tingsong Jiang, Wen Yao

    Abstract: In the field of robotic control, designing individual controllers for each robot leads to high computational costs. Universal control policies, applicable across diverse robot morphologies, promise to mitigate this challenge. Predominantly, models based on Graph Neural Networks (GNN) and Transformers are employed, owing to their effectiveness in capturing relational dynamics across a robot's limbs…

    Submitted 2 August, 2024; originally announced August 2024.

  6. arXiv:2407.19687  [pdf, other]

    cs.CR cs.CL

    Efficiently and Effectively: A Two-stage Approach to Balance Plaintext and Encrypted Text for Traffic Classification

    Authors: Wei Peng

    Abstract: Encrypted traffic classification is the task of identifying the application or service associated with encrypted network traffic. One effective approach for this task is to use deep learning methods to encode the raw traffic bytes directly and automatically extract features for classification (byte-based models). However, current byte-based models input raw traffic bytes, whether plaintext or encr…

    Submitted 11 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

  7. arXiv:2407.19651  [pdf, other]

    cs.CV cs.LG cs.MM

    ComNeck: Bridging Compressed Image Latents and Multimodal LLMs via Universal Transform-Neck

    Authors: Chia-Hao Kao, Cheng Chien, Yu-Jen Tseng, Yi-Hsin Chen, Alessandro Gnutti, Shao-Yuan Lo, Wen-Hsiao Peng, Riccardo Leonardi

    Abstract: This paper presents the first-ever study of adapting compressed image latents to suit the needs of downstream vision tasks that adopt Multimodal Large Language Models (MLLMs). MLLMs have extended the success of large language models to modalities (e.g. images) beyond text, but their billion scale hinders deployment on resource-constrained end devices. While cloud-hosted MLLMs could be available, t…

    Submitted 28 July, 2024; originally announced July 2024.

  8. arXiv:2407.13205  [pdf, ps, other]

    cs.CL

    Transformer-based Single-Cell Language Model: A Survey

    Authors: Wei Lan, Guohang He, Mingyang Liu, Qingfeng Chen, Junyue Cao, Wei Peng

    Abstract: Transformers have achieved significant success in natural language processing owing to their outstanding parallel processing capabilities and highly flexible attention mechanism. In addition, a growing number of transformer-based studies have been proposed to model single-cell data. In this review, we attempt to systematically summarize the single-cell language models and applications based on…

    Submitted 18 July, 2024; originally announced July 2024.

  9. arXiv:2407.12309  [pdf, other]

    cs.CL

    MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models

    Authors: Thao Minh Nguyen Phan, Cong-Tinh Dao, Chenwei Wu, Jian-Zhe Wang, Shun Liu, Jun-En Ding, David Restrepo, Feng Liu, Fang-Ming Hung, Wen-Chih Peng

    Abstract: Electronic health records (EHRs) are multimodal by nature, consisting of structured tabular features like lab tests and unstructured clinical notes. In real-life clinical practice, doctors use complementary multimodal EHR data sources to get a clearer picture of patients' health and support clinical decision-making. However, most EHR predictive models do not reflect these procedures, as they eithe…

    Submitted 17 July, 2024; originally announced July 2024.

  10. arXiv:2407.12254  [pdf, other]

    cs.LG stat.ME

    COKE: Causal Discovery with Chronological Order and Expert Knowledge in High Proportion of Missing Manufacturing Data

    Authors: Ting-Yun Ou, Ching Chang, Wen-Chih Peng

    Abstract: Understanding causal relationships between machines is crucial for fault diagnosis and optimization in manufacturing processes. Real-world datasets frequently exhibit up to 90% missing data and high dimensionality from hundreds of sensors. These datasets also include domain-specific expert knowledge and chronological order information, reflecting the recording order across different machines, whic…

    Submitted 31 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by the ACM International Conference on Information and Knowledge Management (CIKM) 2024

  11. Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification

    Authors: Wenshuo Peng, Kaipeng Zhang, Yue Yang, Hao Zhang, Yu Qiao

    Abstract: Vision-language foundation models have been incredibly successful in a wide range of downstream computer vision tasks using adaptation methods. However, due to the high cost of obtaining pre-training datasets, pairs with weak image-text correlation in the data exist in large numbers. We call them weak-paired samples. Due to the limitations of these weak-paired samples, the pre-training models are u…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 9 pages, 4 figures

  12. arXiv:2407.03245  [pdf, other]

    cs.RO cs.AI eess.SY

    TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach

    Authors: Weikun Peng, Jun Lv, Yuwei Zeng, Haonan Chen, Siheng Zhao, Jichen Sun, Cewu Lu, Lin Shao

    Abstract: The tie-knotting task is highly challenging due to the tie's high deformation and long-horizon manipulation actions. This work presents TieBot, a Real-to-Sim-to-Real learning-from-visual-demonstration system that lets robots learn to knot a tie. We introduce the Hierarchical Feature Matching approach to estimate a sequence of the tie's meshes from the demonstration video. With these estimated meshes…

    Submitted 3 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: fix few typos

  13. arXiv:2406.18129  [pdf, other]

    cs.CV cs.LG

    CTS: Sim-to-Real Unsupervised Domain Adaptation on 3D Detection

    Authors: Meiying Zhang, Weiyuan Peng, Guangyao Ding, Chenyang Lei, Chunlin Ji, Qi Hao

    Abstract: Simulation data can be accurately labeled and have been expected to improve the performance of data-driven algorithms, including object detection. However, due to the various domain inconsistencies from simulation to reality (sim-to-real), cross-domain object detection algorithms usually suffer from dramatic performance drops. While numerous unsupervised domain adaptation (UDA) methods have been d…

    Submitted 26 June, 2024; originally announced June 2024.

  14. arXiv:2406.18116  [pdf, other]

    cs.CL cs.AI cs.HC

    BADGE: BADminton report Generation and Evaluation with LLM

    Authors: Shang-Hsuan Chiang, Lin-Wei Chao, Kuang-Da Wang, Chih-Chuan Wang, Wen-Chih Peng

    Abstract: Badminton enjoys widespread popularity, and reports on matches generally include details such as player names, game scores, and ball types, providing audiences with a comprehensive view of the games. However, writing these reports can be a time-consuming task. This challenge led us to explore whether a Large Language Model (LLM) could automate the generation and evaluation of badminton reports. We…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by IJCAI 2024 Workshop: The 2nd International Workshop on Intelligent Technologies for Precision Sports Science (IT4PSS)

  15. arXiv:2406.11483  [pdf]

    cs.CE

    Analysis of water injection heat recovery potential of abandoned oil wells to geothermal wells in northern Shaanxi

    Authors: Yu Huagui, Liu Shi, Pang Yanyan, Wang Peng, Gao Qian

    Abstract: The Chang 2 bottom water reservoir area in the western part of northern Shaanxi is one of the core oil-producing areas in the Ordos Basin. One of the main reservoirs is the Chang 2 reservoir of the Triassic Yanchang Formation, which has good physical conditions, active edge and bottom water, and high geothermal gradient. In this paper, the reservoir numerical simulation software CMG is used to simu…

    Submitted 17 June, 2024; originally announced June 2024.

    Journal ref: Modern Electric Power, 2023, 1-9

  16. arXiv:2406.11176  [pdf, other]

    cs.CL cs.AI

    Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement

    Authors: Weimin Xiong, Yifan Song, Xiutian Zhao, Wenhao Wu, Xun Wang, Ke Wang, Cheng Li, Wei Peng, Sujian Li

    Abstract: Large language model agents have exhibited exceptional performance across a range of complex interactive tasks. Recent approaches have utilized tuning with expert trajectories to enhance agent performance, yet they primarily concentrate on outcome rewards, which may lead to errors or suboptimal actions due to the absence of process supervision signals. In this paper, we introduce the Iterative ste…

    Submitted 16 June, 2024; originally announced June 2024.

  17. arXiv:2406.10744  [pdf, other]

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu, et al. (75 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

  18. arXiv:2406.09265  [pdf, other]

    cs.CL

    Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs

    Authors: Weixuan Wang, Barry Haddow, Wei Peng, Alexandra Birch

    Abstract: Multilingual large language models (LLMs) have greatly increased the ceiling of performance on non-English tasks. However, the mechanisms behind multilingualism in these LLMs are poorly understood. Of particular interest is the degree to which internal representations are shared between languages. Recent work on neuron analysis of LLMs has focused on the monolingual case, and the limited work on th…

    Submitted 13 June, 2024; originally announced June 2024.

  19. arXiv:2405.15299  [pdf, other]

    cs.CV

    Transparent Object Depth Completion

    Authors: Yifan Zhou, Wanli Peng, Zhongyu Yang, He Liu, Yi Sun

    Abstract: The perception of transparent objects for grasp and manipulation remains a major challenge, because existing robotic grasp methods, which rely heavily on depth maps, are not suitable for transparent objects due to their unique visual properties. These properties lead to gaps and inaccuracies in the depth maps of the transparent objects captured by depth sensors. To address this issue, we propose an…

    Submitted 24 May, 2024; originally announced May 2024.

  20. arXiv:2405.10305  [pdf, other]

    cs.CV cs.AI

    4D Panoptic Scene Graph Generation

    Authors: Jingkang Yang, Jun Cen, Wenxuan Peng, Shuai Liu, Fangzhou Hong, Xiangtai Li, Kaiyang Zhou, Qifeng Chen, Ziwei Liu

    Abstract: We are living in a three-dimensional space while moving forward through a fourth dimension: time. To allow artificial intelligence to develop a comprehensive understanding of such a 4D environment, we introduce 4D Panoptic Scene Graph (PSG-4D), a new representation that bridges the raw visual data perceived in a dynamic 4D world and high-level visual understanding. Specifically, PSG-4D abstracts r…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted at NeurIPS 2023. Code: https://github.com/Jingkang50/PSG4D Previous Series: PSG https://github.com/Jingkang50/OpenPSG and PVSG https://github.com/Jingkang50/OpenPVSG

  21. arXiv:2405.06964  [pdf, other]

    cs.RO cs.AI

    ManiFoundation Model for General-Purpose Robotic Manipulation of Contact Synthesis with Arbitrary Objects and Robots

    Authors: Zhixuan Xu, Chongkai Gao, Zixuan Liu, Gang Yang, Chenrui Tie, Haozhuo Zheng, Haoyu Zhou, Weikun Peng, Debang Wang, Tianyi Chen, Zhouliang Yu, Lin Shao

    Abstract: To substantially enhance robot intelligence, there is a pressing need to develop a large model that enables general-purpose robots to proficiently undertake a broad spectrum of manipulation tasks, akin to the versatile task-planning ability exhibited by LLMs. The vast diversity in objects, robots, and manipulation tasks presents huge challenges. Our work introduces a comprehensive framework to dev…

    Submitted 11 May, 2024; originally announced May 2024.

  22. arXiv:2404.18527  [pdf]

    cs.LG cs.AI cs.CR stat.AP

    Bridging Data Barriers among Participants: Assessing the Potential of Geoenergy through Federated Learning

    Authors: Weike Peng, Jiaxin Gao, Yuntian Chen, Shengwei Wang

    Abstract: Machine learning algorithms are emerging as a promising approach in energy fields, but their practical use is hindered by data barriers, stemming from high collection costs and privacy concerns. This study introduces a novel federated learning (FL) framework based on XGBoost models, enabling safe collaborative modeling with accessible yet concealed data from multiple parties. Hyperparameter tuning of the mode…

    Submitted 29 April, 2024; originally announced April 2024.

  23. arXiv:2404.10413  [pdf, other]

    cs.DB cs.LG cs.PF

    VDTuner: Automated Performance Tuning for Vector Data Management Systems

    Authors: Tiannuo Yang, Wen Hu, Wangqi Peng, Yusen Li, Jianguo Li, Gang Wang, Xiaoguang Liu

    Abstract: Vector data management systems (VDMSs) have become an indispensable cornerstone in large-scale information retrieval and machine learning systems like large language models. To enhance the efficiency and flexibility of similarity search, VDMSs expose many tunable index parameters and system parameters for users to specify. However, due to the inherent characteristics of VDMS, automatic performance…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted by ICDE 2024

  24. arXiv:2404.10229  [pdf, other]

    cs.CL

    Generative Text Steganography with Large Language Model

    Authors: Jiaxuan Wu, Zhengxian Wu, Yiming Xue, Juan Wen, Wanli Peng

    Abstract: Recent advances in large language models (LLMs) have blurred the boundary of high-quality text generation between humans and machines, which is favorable for generative text steganography. However, current advanced steganographic mapping is not suitable for LLMs, since most users are restricted to accessing only the black-box API or user interface of the LLMs, thereby lacking access to the training v…

    Submitted 15 April, 2024; originally announced April 2024.

  25. arXiv:2404.07200  [pdf, other]

    cs.LG

    Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective

    Authors: Shaoxiang Qin, Fuyuan Lyu, Wenhui Peng, Dingyang Geng, Ju Wang, Naiping Gao, Xue Liu, Liangzhu Leon Wang

    Abstract: In solving partial differential equations (PDEs), Fourier Neural Operators (FNOs) have exhibited notable effectiveness compared to Convolutional Neural Networks (CNNs). This paper presents clear empirical evidence through spectral analysis to elucidate the superiority of FNO over CNNs: FNO is significantly more capable of learning low frequencies. This empirical evidence also unveils FNO's distinc…

    Submitted 10 April, 2024; originally announced April 2024.

  26. arXiv:2403.12406  [pdf, other]

    cs.AI cs.LG

    Offline Imitation of Badminton Player Behavior via Experiential Contexts and Brownian Motion

    Authors: Kuang-Da Wang, Wei-Yao Wang, Ping-Chun Hsieh, Wen-Chih Peng

    Abstract: In the dynamic and rapid tactic involvements of turn-based sports, badminton stands out as an intrinsic paradigm that requires alter-dependent decision-making of players. While the advancement of learning from offline expert data in sequential decision-making has been witnessed in various domains, how to rally-wise imitate the behaviors of human players from offline badminton matches has remained…

    Submitted 3 August, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2024)

  27. arXiv:2403.10281  [pdf, other]

    cs.CL cs.AI cs.LG

    Team Trifecta at Factify5WQA: Setting the Standard in Fact Verification with Fine-Tuning

    Authors: Shang-Hsuan Chiang, Ming-Chih Lo, Lin-Wei Chao, Wen-Chih Peng

    Abstract: In this paper, we present Pre-CoFactv3, a comprehensive framework comprising Question Answering and Text Classification components for fact verification. Leveraging In-Context Learning, Fine-tuned Large Language Models (LLMs), and the FakeNet model, we address the challenges of fact verification. Our experiments explore diverse approaches, comparing different Pre-trained LLMs, introducing FakeNe…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI 2024 Workshop: FACTIFY 3.0 - Workshop Series on Multimodal Fact-Checking and Hate Speech Detection

  28. arXiv:2403.04785  [pdf, other]

    cs.CL cs.AI

    Large Language Multimodal Models for 5-Year Chronic Disease Cohort Prediction Using EHR Data

    Authors: Jun-En Ding, Phan Nguyen Minh Thao, Wen-Chih Peng, Jian-Zhe Wang, Chun-Cheng Chug, Min-Chen Hsieh, Yun-Chien Tseng, Ling Chen, Dongsheng Luo, Chi-Te Wang, Pei-fu Chen, Feng Liu, Fang-Ming Hung

    Abstract: Chronic diseases such as diabetes are the leading causes of morbidity and mortality worldwide. Numerous studies have applied various deep learning models to diagnosis. However, most previous studies had certain limitations, including using publicly available datasets (e.g., MIMIC) and imbalanced data. In this study, we collected five-year electronic health records (EHRs) from…

    Submitted 29 August, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

  29. arXiv:2402.01204  [pdf, other]

    cs.LG cs.AI

    A Survey on Self-Supervised Learning for Non-Sequential Tabular Data

    Authors: Wei-Yao Wang, Wei-Wei Du, Derek Xu, Wei Wang, Wen-Chih Peng

    Abstract: Self-supervised learning (SSL) has been incorporated into many state-of-the-art models in various domains, where SSL defines pretext tasks based on unlabeled datasets to learn contextualized and robust representations. Recently, SSL has been a new trend in exploring the representation learning capability in the realm of tabular data, which is more challenging due to not having explicit relations f…

    Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: The paper list can be found at https://github.com/wwweiwei/awesome-self-supervised-learning-for-tabular-data

  30. arXiv:2402.01140  [pdf, other]

    cs.LG cs.AI cs.DC

    Root Cause Analysis In Microservice Using Neural Granger Causal Discovery

    Authors: Cheng-Ming Lin, Ching Chang, Wei-Yao Wang, Kuang-Da Wang, Wen-Chih Peng

    Abstract: In recent years, microservices have gained widespread adoption in IT operations due to their scalability, maintenance, and flexibility. However, it becomes challenging for site reliability engineers (SREs) to pinpoint the root cause due to the complex relationships in microservices when facing system malfunctions. Previous research employed structured learning methods (e.g., PC-algorithm) to estab…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: AAAI 2024 Main Track

  31. arXiv:2402.00253  [pdf, other]

    cs.CV cs.CL cs.LG

    A Survey on Hallucination in Large Vision-Language Models

    Authors: Hanchao Liu, Wenyuan Xue, Yifei Chen, Dapeng Chen, Xiutian Zhao, Ke Wang, Liping Hou, Rongjun Li, Wei Peng

    Abstract: Recent development of Large Vision-Language Models (LVLMs) has attracted growing attention within the AI landscape for its practical implementation potential. However, "hallucination", or more specifically, the misalignment between factual visual content and corresponding textual generation, poses a significant challenge to utilizing LVLMs. In this comprehensive survey, we dissect LVLM-related h…

    Submitted 5 May, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

  32. arXiv:2401.15509  [pdf, other]

    cs.CL cs.AI cs.SI

    Style-News: Incorporating Stylized News Generation and Adversarial Verification for Neural Fake News Detection

    Authors: Wei-Yao Wang, Yu-Chieh Chang, Wen-Chih Peng

    Abstract: With the improvements in generative models, the issues of producing hallucinations in various domains (e.g., law, writing) have been brought to people's attention due to concerns about misinformation. In this paper, we focus on neural fake news, which refers to content generated by neural networks aiming to mimic the style of real news to deceive people. To prevent harmful disinformation spreading…

    Submitted 27 January, 2024; originally announced January 2024.

    Comments: EACL 2024 Main Track

  33. arXiv:2401.09025  [pdf, other]

    cs.HC cs.CY

    Exploring the Diversity of Music Experiences for Deaf and Hard of Hearing People

    Authors: Kyrie Zhixuan Zhou, Weirui Peng, Yuhan Liu, Rachel F. Adler

    Abstract: Sensory substitution or enhancement techniques have been proposed to enable deaf or hard of hearing (DHH) people to listen to and even compose music. However, little is known about how such techniques enhance DHH people's music experience. Since deafness is a spectrum, as are DHH people's preferences and perceptions of music, a more situated understanding of their interaction with music is nee…

    Submitted 17 January, 2024; originally announced January 2024.

  34. arXiv:2401.08053  [pdf, other]

    cs.CV

    SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation

    Authors: Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu, Wenxuan Peng, Youngsik Yun, Andrew Hundt, Jihie Kim, Jean Oh

    Abstract: Accurate representation in media is known to improve the well-being of the people who consume it. Generative image models trained on large web-crawled datasets such as LAION are known to produce images with harmful stereotypes and misrepresentations of cultures. We improve inclusive representation in generated images by (1) engaging with communities to collect a culturally representative dataset t…

    Submitted 15 January, 2024; originally announced January 2024.

  35. arXiv:2401.06775  [pdf, other]

    cs.CL cs.AI

    Large language models in healthcare and medical domain: A review

    Authors: Zabir Al Nazi, Wei Peng

    Abstract: The deployment of large language models (LLMs) within the healthcare sector has sparked both enthusiasm and apprehension. These models exhibit the remarkable capability to provide proficient responses to free-text queries, demonstrating a nuanced understanding of professional medical knowledge. This comprehensive survey delves into the functionalities of existing LLMs designed for healthcare appli…

    Submitted 8 July, 2024; v1 submitted 12 December, 2023; originally announced January 2024.

  36. arXiv:2401.00652  [pdf, other]

    cs.CV

    From Covert Hiding to Visual Editing: Robust Generative Video Steganography

    Authors: Xueying Mao, Xiaoxiao Hu, Wanli Peng, Zhenliang Gan, Qichao Ying, Zhenxing Qian, Sheng Li, Xinpeng Zhang

    Abstract: Traditional video steganography methods are based on modifying the covert space for embedding, whereas we propose an innovative approach that embeds secret messages within semantic features for steganography during the video editing process. Although existing traditional video steganography methods display a certain level of security and embedding capacity, they lack adequate robustness against comm…

    Submitted 31 December, 2023; originally announced January 2024.

    Comments: Under Review

  37. arXiv:2312.17617  [pdf, other]

    cs.CL

    Large Language Models for Generative Information Extraction: A Survey

    Authors: Derong Xu, Wei Chen, Wenjun Peng, Chao Zhang, Tong Xu, Xiangyu Zhao, Xian Wu, Yefeng Zheng, Yang Wang, Enhong Chen

    Abstract: Information extraction (IE) aims to extract structural knowledge (such as entities, relations, and events) from plain natural language texts. Recently, generative Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation, allowing for generalization across various domains and tasks. As a result, numerous works have been proposed to harness abilitie…

    Submitted 4 June, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

    Comments: v2: Updated 100+ new papers, 5 technical categories

  38. arXiv:2312.11553  [pdf, other]

    cs.SI cs.AI cs.CL cs.LG

    SeGA: Preference-Aware Self-Contrastive Learning with Prompts for Anomalous User Detection on Twitter

    Authors: Ying-Ying Chang, Wei-Yao Wang, Wen-Chih Peng

    Abstract: In the dynamic and rapidly evolving world of social media, detecting anomalous users has become a crucial task to address malicious activities such as misinformation and cyberbullying. As the increasing number of anomalous users improves the ability to mimic normal users and evade detection, existing methods that focus only on bot detection are ineffective in terms of capturing subtle distinctions b…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: AAAI 2024 Main Track

  39. arXiv:2312.10942  [pdf, other]

    cs.AI cs.LG

    ShuttleSHAP: A Turn-Based Feature Attribution Approach for Analyzing Forecasting Models in Badminton

    Authors: Wei-Yao Wang, Wen-Chih Peng, Wei Wang, Philip S. Yu

    Abstract: Agent forecasting systems have been explored to investigate agent patterns and improve decision-making in various domains, e.g., pedestrian predictions and marketing bidding. Badminton represents a fascinating example of a multifaceted turn-based sport, requiring both sophisticated tactic developments and alternate-dependent decision-making. Recent deep learning approaches for player tactic foreca…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Preprint

  40. arXiv:2312.06372  [pdf, other]

    cs.CV

    Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks

    Authors: Yufei Guo, Yuanpei Chen, Xiaode Liu, Weihang Peng, Yuhan Zhang, Xuhui Huang, Zhe Ma

    Abstract: The Spiking Neural Network (SNN), as one of the biologically inspired neural network infrastructures, has drawn increasing attention recently. It adopts binary spike activations to transmit information, so the multiplications of activations and weights can be substituted by additions, which brings high energy efficiency. However, in this paper, we theoretically and experimentally prove that the b…

    Submitted 16 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  41. arXiv:2312.04142  [pdf, other]

    cs.LG cs.AI

    TimeDRL: Disentangled Representation Learning for Multivariate Time-Series

    Authors: Ching Chang, Chiao-Tung Chan, Wei-Yao Wang, Wen-Chih Peng, Tien-Fu Chen

    Abstract: Multivariate time-series data in numerous real-world applications (e.g., healthcare and industry) are informative but challenging due to the lack of labels and high dimensionality. Recent studies in self-supervised learning have shown their potential in learning rich representations without relying on labels, yet they fall short in learning disentangled embeddings and addressing issues of inductiv…

    Submitted 17 July, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: This paper has been accepted by the International Conference on Data Engineering (ICDE) 2024

  42. arXiv:2312.02366  [pdf, other]

    cs.CV cs.AI

    Towards General Purpose Vision Foundation Models for Medical Image Analysis: An Experimental Study of DINOv2 on Radiology Benchmarks

    Authors: Mohammed Baharoon, Waseem Qureshi, Jiahong Ouyang, Yanwu Xu, Abdulrhman Aljouie, Wei Peng

    Abstract: The integration of deep learning systems into healthcare has been hindered by the resource-intensive process of data annotation and the inability of these systems to generalize to different data distributions. Foundation models, which are models pre-trained on large datasets, have emerged as a solution to reduce reliance on annotated data and enhance model generalizability and robustness. DINOv2 i…

    Submitted 28 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

  43. arXiv:2312.00081  [pdf, other]

    cs.CV

    Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

    Authors: Wujian Peng, Sicheng Xie, Zuyao You, Shiyi Lan, Zuxuan Wu

    Abstract: Vision language models (VLM) have demonstrated remarkable performance across various downstream tasks. However, understanding fine-grained visual-linguistic concepts, such as attributes and inter-object relationships, remains a significant challenge. While several benchmarks aim to evaluate VLMs in finer granularity, their primary focus remains on the linguistic aspect, neglecting the visual dimen…

    Submitted 30 March, 2024; v1 submitted 29 November, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024

  44. arXiv:2311.17058  [pdf, other]

    cs.CV cs.AI

    Panoptic Video Scene Graph Generation

    Authors: Jingkang Yang, Wenxuan Peng, Xiangtai Li, Zujin Guo, Liangyu Chen, Bo Li, Zheng Ma, Kaiyang Zhou, Wayne Zhang, Chen Change Loy, Ziwei Liu

    Abstract: Towards building comprehensive real-world visual perception systems, we propose and study a new problem called panoptic video scene graph generation (PVSG). PVSG relates to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects grounded with bounding boxes in videos. However, the limitation of bounding boxes in detecting non-rigid ob…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: Accepted to CVPR 2023. Project Page: https://jingkang50.github.io/PVSG/. Codebase: https://github.com/LilyDaytoy/OpenPVSG. We provide 400 long videos with frame-level panoptic segmentation, scene graph, dense captions, and QA annotations

  45. arXiv:2311.16113  [pdf, other]

    cs.CR

    BAGEL: Backdoor Attacks against Federated Contrastive Learning

    Authors: Yao Huang, Kongyang Chen, Jiannong Cao, Jiaxing Shen, Shaowei Wang, Yun Peng, Weilong Peng, Kechao Cai

    Abstract: Federated Contrastive Learning (FCL) is an emerging privacy-preserving paradigm in distributed learning for unlabeled data. In FCL, distributed parties collaboratively learn a global encoder with unlabeled data, and the global encoder could be widely used as a feature extractor to build models for many downstream tasks. However, FCL is also vulnerable to many security threats (e.g., backdoor attac…

    Submitted 14 September, 2023; originally announced November 2023.

  46. arXiv:2311.15619  [pdf, other]

    cs.CV cs.AI

    Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition

    Authors: Yifei Chen, Dapeng Chen, Ruijin Liu, Sai Zhou, Wenyuan Xue, Wei Peng

    Abstract: Large-scale visual-language pre-trained models have achieved significant success in various video tasks. However, most existing methods follow an "adapt then align" paradigm, which adapts pre-trained image encoders to model video-level representations and utilizes one-hot or text embedding of the action labels for supervision. This paradigm overlooks the challenge of mapping from static images to…

    Submitted 20 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted at CVPR 2024

  47. arXiv:2311.14091  [pdf, other]

    cs.HC cs.AI cs.CY cs.MM

    PortfolioMentor: Multimodal Generative AI Companion for Learning and Crafting Interactive Digital Art Portfolios

    Authors: Tao Long, Weirui Peng

    Abstract: Digital art portfolios serve as impactful mediums for artists to convey their visions, weaving together visuals, audio, interactions, and narratives. However, without technical backgrounds, design students often find it challenging to translate creative ideas into tangible codes and designs, given the lack of tailored resources for the non-technical, academic support in art schools, and a comprehe…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: 3 pages, 1 figure, work in progress

  48. arXiv:2311.05876  [pdf, other]

    cs.CL

    Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications

    Authors: Zhangyin Feng, Weitao Ma, Weijiang Yu, Lei Huang, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu

    Abstract: Large language models (LLMs) exhibit superior performance on various natural language tasks, but they are susceptible to issues stemming from outdated data and domain-specific limitations. In order to address these challenges, researchers have pursued two primary strategies, knowledge editing and retrieval augmentation, to enhance LLMs by incorporating external information from different aspects.…

    Submitted 7 December, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: Work in progress; 22 pages. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  49. arXiv:2311.05232  [pdf, other]

    cs.CL

    A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

    Authors: Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu

    Abstract: The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), leading to remarkable advancements in text understanding and generation. Nevertheless, alongside these strides, LLMs exhibit a critical tendency to produce hallucinations, resulting in content that is inconsistent with real-world facts or user inputs. This phenomenon poses subs…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: Work in progress; 49 pages

  50. arXiv:2311.03758  [pdf, other]

    cs.IR

    Large Language Model based Long-tail Query Rewriting in Taobao Search

    Authors: Wenjun Peng, Guiyang Li, Yue Jiang, Zilong Wang, Dan Ou, Xiaoyi Zeng, Derong Xu, Tong Xu, Enhong Chen

    Abstract: In the realm of e-commerce search, the significance of semantic matching cannot be overstated, as it directly impacts both user experience and company revenue. Along this line, query rewriting, serving as an important technique to bridge the semantic gaps inherent in the semantic matching process, has attracted wide attention from the industry and academia. However, existing query rewriting methods…

    Submitted 4 March, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: WWW Industry