Showing 1–50 of 181 results for author: Song, G

  1. arXiv:2407.12077  [pdf, other]

    cs.CL cs.AI

    GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression

    Authors: Daniel Goldstein, Fares Obeid, Eric Alcaide, Guangyu Song, Eugene Cheah

    Abstract: We introduce GoldFinch, a hybrid Linear Attention/Transformer sequence model that uses a new technique to efficiently generate a highly compressed and reusable KV-Cache in linear time and space with respect to sequence length. GoldFinch stacks our new GOLD transformer on top of an enhanced version of the Finch (RWKV-6) architecture. We train up to 1.5B parameter class models of the Finch, Llama, a…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2407.06683  [pdf, other]

    cs.RO cs.CV cs.LG

    Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention

    Authors: Xunjiang Gu, Guanyu Song, Igor Gilitschenski, Marco Pavone, Boris Ivanovic

    Abstract: Understanding road geometry is a critical component of the autonomous vehicle (AV) stack. While high-definition (HD) maps can readily provide such information, they suffer from high labeling and maintenance costs. Accordingly, many recent works have proposed methods for estimating HD maps online from sensor data. The vast majority of recent approaches encode multi-camera observations into an inter…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 21 pages, 10 figures, 6 tables. ECCV 2024

  3. arXiv:2407.04753  [pdf, other]

    cs.LG cs.HC eess.SP

    Annotation of Sleep Depth Index with Scalable Deep Learning Yields Novel Digital Biomarkers for Sleep Health

    Authors: Songchi Zhou, Ge Song, Haoqi Sun, Yue Leng, M. Brandon Westover, Shenda Hong

    Abstract: Traditional sleep staging categorizes sleep and wakefulness into five coarse-grained classes, overlooking subtle variations within each stage. It provides limited information about the probability of arousal and may hinder the diagnosis of sleep disorders, such as insomnia. To address this issue, we propose a deep-learning method for automatic and scalable annotation of sleep depth index using exi…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Work in progress

  4. arXiv:2407.04066  [pdf, ps, other]

    cs.CV

    EMPL: A novel Efficient Meta Prompt Learning Framework for Few-shot Unsupervised Domain Adaptation

    Authors: Wanqi Yang, Haoran Wang, Lei Wang, Ge Song, Yang Gao

    Abstract: Few-shot unsupervised domain adaptation (FS-UDA) utilizes few-shot labeled source domain data to realize effective classification in an unlabeled target domain. However, current FS-UDA methods still suffer from two issues: 1) the data from different domains cannot be effectively aligned by few-shot labeled data due to the large domain gaps, 2) it is unstable and time-consuming to generalize to n…

    Submitted 4 July, 2024; originally announced July 2024.

  5. arXiv:2407.00979  [pdf, other]

    cs.CV

    Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval

    Authors: Hanwen Su, Ge Song, Kai Huang, Jiyan Wang, Ming Yang

    Abstract: In this paper, we study the problem of zero-shot sketch-based image retrieval (ZS-SBIR). The prior methods tackle the problem in a two-modality setting with only category labels or even no textual information involved. However, the growing prevalence of Large-scale pre-trained Language Models (LLMs), which have demonstrated great knowledge learned from web-scale data, can provide us with an opport…

    Submitted 1 July, 2024; originally announced July 2024.

  6. arXiv:2406.11831  [pdf, other]

    cs.CV

    Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

    Authors: Bingqi Ma, Zhuofan Zong, Guanglu Song, Hongsheng Li, Yu Liu

    Abstract: Large language models (LLMs) based on decoder-only transformers have demonstrated superior text understanding capabilities compared to CLIP and T5-series models. However, the paradigm for utilizing current advanced LLMs in text-to-image diffusion models remains to be explored. We observed an unusual phenomenon: directly using a large language model as the prompt encoder significantly degrades the…

    Submitted 21 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  7. arXiv:2406.09388  [pdf, other]

    cs.CV cs.AI cs.LG

    Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition

    Authors: Youngtaek Oh, Pyunghwan Ahn, Jinhyung Kim, Gwangmo Song, Soonyoung Lee, In So Kweon, Junmo Kim

    Abstract: Vision and language models (VLMs) such as CLIP have showcased remarkable zero-shot recognition abilities yet face challenges in visio-linguistic compositionality, particularly in linguistic comprehension and fine-grained image-text alignment. This paper explores the intricate relationship between compositionality and recognition -- two pivotal aspects of VLM capability. We conduct a comprehensive…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPRW 2024 on 'What is Next in Multimodal Foundation Models?'. Code: https://github.com/ytaek-oh/vl_compo

  8. arXiv:2406.07348  [pdf, other]

    cs.LG cs.CL

    DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering

    Authors: Zijian Hei, Weiling Liu, Wenjie Ou, Juyi Qiao, Junming Jiao, Guowen Song, Ting Tian, Yi Lin

    Abstract: Retrieval-Augmented Generation (RAG) has recently demonstrated the performance of Large Language Models (LLMs) in knowledge-intensive tasks such as Question-Answering (QA). RAG expands the query context by incorporating external knowledge bases to enhance the response accuracy. However, it would be inefficient to access LLMs multiple times for each query and unreliable to retrieve all the rele…

    Submitted 16 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  9. arXiv:2406.04214  [pdf, other]

    cs.CL

    ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models

    Authors: Yuanyi Ren, Haoran Ye, Hanjun Fang, Xin Zhang, Guojie Song

    Abstract: Large Language Models (LLMs) are transforming diverse fields and gaining increasing influence as human proxies. This development underscores the urgent need for evaluating value orientations and understanding of LLMs to ensure their responsible integration into public-facing applications. This work introduces ValueBench, the first comprehensive psychometric benchmark for evaluating value orientati…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024

  10. arXiv:2406.02651  [pdf, other]

    cs.LG cs.AI cs.NI

    RoutePlacer: An End-to-End Routability-Aware Placer with Graph Neural Network

    Authors: Yunbo Hou, Haoran Ye, Yingxue Zhang, Siyuan Xu, Guojie Song

    Abstract: Placement is a critical and challenging step of modern chip design, with routability being an essential indicator of placement quality. Current routability-oriented placers typically apply an iterative two-stage approach, wherein the first stage generates a placement solution, and the second stage provides non-differentiable routing results to heuristically improve the solution quality. This metho…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted at KDD 2024

  11. arXiv:2405.18407  [pdf, other]

    cs.LG cs.CV

    Phased Consistency Model

    Authors: Fu-Yun Wang, Zhaoyang Huang, Alexander William Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, Hongsheng Li, Xiaogang Wang

    Abstract: The consistency model (CM) has recently made significant progress in accelerating the generation of diffusion models. However, its application to high-resolution, text-conditioned image generation in the latent space (a.k.a., LCM) remains unsatisfactory. In this paper, we identify three key flaws in the current design of LCM. We investigate the reasons behind these limitations and propose the Phas…

    Submitted 28 May, 2024; originally announced May 2024.

  12. arXiv:2405.03481  [pdf, other]

    cs.LG

    AnchorGT: Efficient and Flexible Attention Architecture for Scalable Graph Transformers

    Authors: Wenhao Zhu, Guojie Song, Liang Wang, Shaoguo Liu

    Abstract: Graph Transformers (GTs) have significantly advanced the field of graph representation learning by overcoming the limitations of message-passing graph neural networks (GNNs) and demonstrating promising performance and expressive power. However, the quadratic complexity of the self-attention mechanism in GTs has limited their scalability, and previous approaches to address this issue often suffer from…

    Submitted 6 May, 2024; originally announced May 2024.

  13. arXiv:2405.00760  [pdf, other]

    cs.CV cs.AI

    Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models

    Authors: Xiaoshi Wu, Yiming Hao, Manyuan Zhang, Keqiang Sun, Zhaoyang Huang, Guanglu Song, Yu Liu, Hongsheng Li

    Abstract: Optimizing a text-to-image diffusion model with a given reward function is an important but underexplored research area. In this study, we propose Deep Reward Tuning (DRTune), an algorithm that directly supervises the final output image of a text-to-image diffusion model and back-propagates through the iterative sampling process to the input noise. We find that training earlier steps in the sampli…

    Submitted 1 May, 2024; originally announced May 2024.

  14. arXiv:2404.13046  [pdf, other]

    cs.CV

    MoVA: Adapting Mixture of Vision Experts to Multimodal Context

    Authors: Zhuofan Zong, Bingqi Ma, Dazhong Shen, Guanglu Song, Hao Shao, Dongzhi Jiang, Hongsheng Li, Yu Liu

    Abstract: As the key component in multimodal large language models (MLLMs), the ability of the visual encoder greatly affects MLLM's understanding of diverse image content. Although some large-scale pretrained vision encoders, such as those in CLIP and DINOv2, have brought promising performance, we found that there is still no single vision encoder that can dominate various image content understandi…

    Submitted 19 April, 2024; originally announced April 2024.

  15. arXiv:2404.10180  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE eess.AS

    Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

    Authors: Zelin Wu, Gan Song, Christopher Li, Pat Rondon, Zhong Meng, Xavier Velez, Weiran Wang, Diamantino Caseiro, Golan Pundak, Tsendsuren Munkhdalai, Angad Chandorkar, Rohit Prabhavalkar

    Abstract: Contextual biasing enables speech recognizers to transcribe important phrases in the speaker's context, such as contact names, even if they are rare in, or absent from, the training data. Attention-based biasing is a leading approach which allows for full end-to-end cotraining of the recognizer and biasing system and requires no separate inference-time components. Such biasers typically consist of…

    Submitted 23 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 9 pages, 3 figures, accepted by NAACL 2024 - Industry Track

    Journal ref: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics - Industry Track

  16. arXiv:2404.05892  [pdf, other]

    cs.CL cs.AI

    Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

    Authors: Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, et al. (3 additional authors not shown)

    Abstract: We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokeni…

    Submitted 10 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  17. arXiv:2404.05384  [pdf, other]

    cs.CV cs.AI

    Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance

    Authors: Dazhong Shen, Guanglu Song, Zeyue Xue, Fu-Yun Wang, Yu Liu

    Abstract: Classifier-Free Guidance (CFG) has been widely used in text-to-image diffusion models, where the CFG scale is introduced to control the strength of text guidance on the whole image space. However, we argue that a global CFG scale results in spatial inconsistency on varying semantic strengths and suboptimal image quality. To address this problem, we present a novel approach, Semantic-aware Classifi…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: accepted by CVPR-2024

  18. arXiv:2404.03653  [pdf, other]

    cs.CV cs.AI cs.CL

    CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

    Authors: Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu, Hongsheng Li

    Abstract: Diffusion models have demonstrated great success in the field of text-to-image generation. However, alleviating the misalignment between the text prompts and images is still challenging. The root reason behind the misalignment has not been extensively investigated. We observe that the misalignment is caused by inadequate token attention activation. We further attribute this phenomenon to the diffu…

    Submitted 3 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Project Page: https://caraj7.github.io/comat

  19. arXiv:2403.16999  [pdf, other]

    cs.CV

    Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

    Authors: Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li

    Abstract: Multi-Modal Large Language Models (MLLMs) have demonstrated impressive performance in various VQA tasks. However, they often lack interpretability and struggle with complex visual inputs, especially when the resolution of the input image is high or when the region of interest that could provide key information for answering the question is small. To address these challenges, we collect and introduc…

    Submitted 7 July, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Code: https://github.com/deepcs233/Visual-CoT

  20. arXiv:2403.16439  [pdf, other]

    cs.RO cs.CV cs.LG

    Producing and Leveraging Online Map Uncertainty in Trajectory Prediction

    Authors: Xunjiang Gu, Guanyu Song, Igor Gilitschenski, Marco Pavone, Boris Ivanovic

    Abstract: High-definition (HD) maps have played an integral role in the development of modern autonomous vehicle (AV) stacks, albeit with high associated labeling and maintenance costs. As a result, many recent works have proposed methods for estimating HD maps online from sensor data, enabling AVs to operate outside of previously-mapped regions. However, current online map estimation approaches are develop…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 14 pages, 14 figures, 6 tables. CVPR 2024

  21. arXiv:2403.15931  [pdf, other]

    cs.CV cs.AI

    X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention

    Authors: You Xie, Hongyi Xu, Guoxian Song, Chao Wang, Yichun Shi, Linjie Luo

    Abstract: We propose X-Portrait, an innovative conditional diffusion model tailored for generating expressive and temporally coherent portrait animation. Specifically, given a single portrait as appearance reference, we aim to animate it with motion derived from a driving video, capturing both highly dynamic and subtle facial expressions along with wide-range head movements. At its core, we leverage the gen…

    Submitted 25 July, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

    Comments: SIGGRAPH 2024

  22. arXiv:2403.13745  [pdf, other]

    cs.CV

    Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

    Authors: Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li

    Abstract: Video outpainting is a challenging task, aiming at generating video content outside the viewport of the input video while maintaining inter-frame and intra-frame consistency. Existing methods fall short in either generation quality or flexibility. We introduce MOTIA (Mastering Video Outpainting Through Input-Specific Adaptation), a diffusion-based pipeline that leverages both the intrinsic data-spec…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Code will be available at https://github.com/G-U-N/Be-Your-Outpainter

  23. arXiv:2403.13324  [pdf, other]

    cs.CV

    Out-of-Distribution Detection Using Peer-Class Generated by Large Language Model

    Authors: K Huang, G Song, Hanwen Su, Jiyan Wang

    Abstract: Out-of-distribution (OOD) detection is a critical task to ensure the reliability and security of machine learning models deployed in real-world applications. Conventional methods for OOD detection that rely on single-modal information often struggle to capture the rich variety of OOD instances. The primary difficulty in OOD detection arises when an input image has numerous similarities to a parti…

    Submitted 20 March, 2024; originally announced March 2024.

  24. arXiv:2403.12963  [pdf, other]

    cs.CV

    FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

    Authors: Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, Hongsheng Li

    Abstract: In this study, we delve into the generation of high-resolution images from pre-trained diffusion models, addressing persistent challenges, such as repetitive patterns and structural distortions, that emerge when models are applied beyond their trained resolutions. To address this issue, we introduce an innovative, training-free approach FouriScale from the perspective of frequency domain analysis.…

    Submitted 19 March, 2024; originally announced March 2024.

  25. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  26. arXiv:2402.01145  [pdf, other]

    cs.NE cs.AI

    Large Language Models as Hyper-Heuristics for Combinatorial Optimization

    Authors: Haoran Ye, Jiarui Wang, Zhiguang Cao, Federico Berto, Chuanbo Hua, Haeyeon Kim, Jinkyoo Park, Guojie Song

    Abstract: The omnipresence of NP-hard combinatorial optimization problems (COPs) compels domain experts to engage in trial-and-error heuristic design. The long-standing endeavor of design automation has gained new momentum with the rise of large language models (LLMs). This paper introduces Language Hyper-Heuristics (LHHs), an emerging variant of Hyper-Heuristics that leverages LLMs for heuristic generation…

    Submitted 20 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  27. arXiv:2402.00769  [pdf, other]

    cs.CV cs.LG

    AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning

    Authors: Fu-Yun Wang, Zhaoyang Huang, Xiaoyu Shi, Weikang Bian, Guanglu Song, Yu Liu, Hongsheng Li

    Abstract: Video diffusion models have been gaining increasing attention for their ability to produce videos that are both coherent and of high fidelity. However, the iterative denoising process makes them computationally intensive and time-consuming, thus limiting their applications. Inspired by the Consistency Model (CM) that distills pretrained image diffusion models to accelerate the sampling with minimal steps…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: Project Page: https://animatelcm.github.io/

  28. arXiv:2401.10254  [pdf, ps, other]

    cs.CV cs.LG

    Beyond the Frame: Single and mutilple video summarization method with user-defined length

    Authors: Vahid Ahmadi Kalkhorani, Qingquan Zhang, Guanqun Song, Ting Zhu

    Abstract: Video summarization is a crucial method for shortening videos, which reduces the time spent watching or reviewing a long video. This approach has become more important as the amount of published video increases every day. A single video or multiple videos can be summarized into a relatively short video using a variety of techniques, from multimodal audio-visual techniques to natural language processing…

    Submitted 22 December, 2023; originally announced January 2024.

  29. arXiv:2401.02544  [pdf, other]

    cs.LG stat.CO

    Hyperparameter Estimation for Sparse Bayesian Learning Models

    Authors: Feng Yu, Lixin Shen, Guohui Song

    Abstract: Sparse Bayesian Learning (SBL) models are extensively used in signal processing and machine learning for promoting sparsity through hierarchical priors. The hyperparameters in SBL models are crucial for the model's performance, but they are often difficult to estimate due to the non-convexity and the high-dimensionality of the associated objective function. This paper presents a comprehensive fram…

    Submitted 4 January, 2024; originally announced January 2024.

    MSC Class: 62F15; 65K10; 65F22

  30. arXiv:2401.01761  [pdf, other]

    cs.CL

    Cross-target Stance Detection by Exploiting Target Analytical Perspectives

    Authors: Daijun Ding, Rong Chen, Liwen Jing, Bowen Zhang, Xu Huang, Li Dong, Xiaowen Zhao, Ge Song

    Abstract: Cross-target stance detection (CTSD) is an important task, which infers the attitude of the destination target by utilizing annotated data derived from the source target. One important approach in CTSD is to extract domain-invariant features to bridge the knowledge gap between multiple targets. However, the analysis of informal and short text structure, and implicit expressions, complicate the ext…

    Submitted 3 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

  31. arXiv:2312.16200  [pdf]

    cs.CR

    Security in 5G Networks -- How 5G networks help Mitigate Location Tracking Vulnerability

    Authors: Abshir Ali, Guanqun Song, Ting Zhu

    Abstract: As 5G networks become more mainstream, privacy has come to the forefront for end users. More scrutiny has been directed at previous-generation cellular technologies such as 3G and 4G and how they handle sensitive metadata transmitted from an end user's mobile device to base stations during registration with a cellular network. These generations of cellular networks do not enforce any encryption on this infor…

    Submitted 22 December, 2023; originally announced December 2023.

  32. arXiv:2312.15153  [pdf]

    cs.OS cs.CR eess.SY

    Design and Implementation Considerations for a Virtual File System Using an Inode Data Structure

    Authors: Qin Sun, Grace McKenzie, Guanqun Song, Ting Zhu

    Abstract: Virtual file systems are a tool to centralize and mobilize a file system that could otherwise be complex and consist of multiple hierarchies, hard disks, and more. In this paper, we discuss the design of Unix-based file systems and how this type of file system layout using inode data structures and a disk emulator can be implemented as a single-file virtual file system in Linux. We explore the way…

    Submitted 22 December, 2023; originally announced December 2023.

  33. arXiv:2312.15152  [pdf]

    cs.LG

    Data Classification With Multiprocessing

    Authors: Anuja Dixit, Shreya Byreddy, Guanqun Song, Ting Zhu

    Abstract: Classification is one of the most important tasks in Machine Learning (ML), and with recent advancements in artificial intelligence (AI) it is important to find efficient ways to implement it. Generally, the choice of classification algorithm depends on the data it is dealing with, and the accuracy of the algorithm depends on the hyperparameters it is tuned with. One way is to check the accuracy of the…

    Submitted 22 December, 2023; originally announced December 2023.

  34. arXiv:2312.15150  [pdf]

    cs.CR

    The Inner Workings of Windows Security

    Authors: Ashvini A Kulshrestha, Guanqun Song, Ting Zhu

    Abstract: The year 2022 saw a significant increase in Microsoft vulnerabilities, reaching an all-time high in the past decade. With new vulnerabilities constantly emerging, there is an urgent need for proactive approaches to harden systems and protect them from potential cyber threats. This project aims to investigate the vulnerabilities of the Windows Operating System and explore the effectiveness of key s…

    Submitted 22 December, 2023; originally announced December 2023.

  35. arXiv:2312.13578  [pdf, other]

    cs.CV

    DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation

    Authors: Chenxu Zhang, Chao Wang, Jianfeng Zhang, Hongyi Xu, Guoxian Song, You Xie, Linjie Luo, Yapeng Tian, Xiaohu Guo, Jiashi Feng

    Abstract: The generation of emotional talking faces from a single portrait image remains a significant challenge. The simultaneous achievement of expressive emotional talking and accurate lip-sync is particularly difficult, as expressiveness is often compromised for the accuracy of lip-sync. As widely adopted by many prior works, the LSTM network often fails to capture the subtleties and variations of emoti…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Project Page at https://magic-research.github.io/dream-talk/

  36. arXiv:2312.13016  [pdf, other]

    cs.CV

    DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis

    Authors: Yuming Gu, You Xie, Hongyi Xu, Guoxian Song, Yichun Shi, Di Chang, Jing Yang, Linjie Luo

    Abstract: We present DiffPortrait3D, a conditional diffusion model that is capable of synthesizing 3D-consistent photo-realistic novel views from as few as a single in-the-wild portrait. Specifically, given a single RGB input, we aim to synthesize plausible but consistent facial details rendered from novel camera views while retaining both identity and facial expression. In lieu of time-consuming optimization…

    Submitted 19 March, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  37. arXiv:2312.11854  [pdf, other]

    cs.IT

    Outer Channel of DNA-Based Data Storage: Capacity and Efficient Coding Schemes

    Authors: Xuan He, Yi Ding, Kui Cai, Guanghui Song, Bin Dai, Xiaohu Tang

    Abstract: In this paper, we consider the outer channel for DNA-based data storage, where each DNA string is either correctly transmitted, erased, or corrupted by uniformly distributed random substitution errors, and all strings are randomly shuffled with each other. We first derive the capacity of the outer channel, which surprisingly implies that the uniformly distributed random substitution…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: This paper has been submitted to IEEE Trans. Inf. Theory

  38. arXiv:2312.09744  [pdf, other]

    cs.LG cond-mat.mtrl-sci

    Bridging the Semantic-Numerical Gap: A Numerical Reasoning Method of Cross-modal Knowledge Graph for Material Property Prediction

    Authors: Guangxuan Song, Dongmei Fu, Zhongwei Qiu, Zijiang Yang, Jiaxin Dai, Lingwei Ma, Dawei Zhang

    Abstract: Using machine learning (ML) techniques to predict material properties is a crucial research topic. These properties depend on numerical data and semantic factors. Due to the limitations of small-sample datasets, existing methods typically adopt ML algorithms to regress numerical properties or transfer other pre-trained knowledge graphs (KGs) to the material. However, these methods cannot simultane…

    Submitted 24 April, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

  39. arXiv:2311.14496  [pdf, other]

    cs.CR

    RTPS Attack Dataset Description

    Authors: Dong Young Kim, Dongsung Kim, Yuchan Song, Gang Min Kim, Min Geun Song, Jeong Do Yoo, Huy Kang Kim

    Abstract: This paper explains all about our RTPS datasets. We collect malicious/benign packet data by injecting attack data in an Unmanned Ground Vehicle (UGV) in the normal state. We assembled the testbed, consisting of UGV, Controller, PC, and Router. We collect this dataset in the UGV part of our testbed. We conducted two types of attack, "Command Injection" and "Command Injection with ARP Spoofing", on…

    Submitted 2 April, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: This manuscript is written in Korean. You can download our dataset through our lab: https://ocslab.hksecurity.net/Datasets/rtps-attack-dataset We welcome your comments or feedback. Contact INFO: Dong Young Kim ([email protected]), Huy Kang Kim ([email protected])

  40. arXiv:2311.14342  [pdf, other]

    cs.CR

    AI-based Attack Graph Generation

    Authors: Sangbeom Park, Jaesung Lee, Jeong Do Yoo, Min Geun Song, Hyosun Lee, Jaewoong Choi, Chaeyeon Sagong, Huy Kang Kim

    Abstract: With the advancement of IoT technology, many electronic devices are interconnected through networks, communicating with each other and performing specific roles. However, as numerous devices join networks, the threat of cyberattacks also escalates. Preventing and detecting cyber threats are crucial, and one method of preventing such threats involves using attack graphs. Attack graphs are widely us…

    Submitted 27 November, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: in Korean Language, 8 Figures, 14 Pages

  41. arXiv:2311.14327  [pdf, other]

    cs.CR

    C-ITS Environment Modeling and Attack Modeling

    Authors: Jaewoong Choi, Min Geun Song, Hyosun Lee, Chaeyeon Sagong, Sangbeom Park, Jaesung Lee, Jeong Do Yoo, Huy Kang Kim

    Abstract: As technology advances, cities are evolving into smart cities, with the ability to process large amounts of data and the increasing complexity and diversification of various elements within urban areas. Among the core systems of a smart city is the Cooperative-Intelligent Transport Systems (C-ITS). C-ITS is a system where vehicles provide real-time information to drivers about surrounding traffic…

    Submitted 27 November, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: in Korean Language, 14 Figures, 15 Pages

  42. arXiv:2311.12052  [pdf, other]

    cs.CV

    MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion

    Authors: Di Chang, Yichun Shi, Quankai Gao, Jessica Fu, Hongyi Xu, Guoxian Song, Qing Yan, Yizhe Zhu, Xiao Yang, Mohammad Soleymani

    Abstract: In this work, we propose MagicPose, a diffusion-based model for 2D human pose and facial expression retargeting. Specifically, given a reference image, we aim to generate a person's new images by controlling the poses and facial expressions while keeping the identity unchanged. To this end, we propose a two-stage training strategy to disentangle human motions and appearance (e.g., facial expressio…

    Submitted 5 May, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

    Comments: Accepted by ICML 2024. MagicPose and MagicDance are the same project. Website: https://boese0601.github.io/magicdance/ Code: https://github.com/Boese0601/MagicDance

  43. arXiv:2310.16364  [pdf, other]

    cs.CV

    Towards Large-scale Masked Face Recognition

    Authors: Manyuan Zhang, Bingqi Ma, Guanglu Song, Yunxiao Wang, Hongsheng Li, Yu Liu

    Abstract: During the COVID-19 coronavirus epidemic, almost everyone is wearing masks, which poses a huge challenge for deep learning-based face recognition algorithms. In this paper, we will present our championship solutions in ICCV MFR WebFace260M and InsightFace unconstrained tracks. We will focus on four challenges in large-scale masked face recognition, i.e., super-large scale training, data n…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: the top1 solution for ICCV2021-MFR challenge

  44. arXiv:2310.15955  [pdf, other]

    cs.CV

    Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection

    Authors: Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li

    Abstract: The introduction of DETR represents a new paradigm for object detection. However, its decoder conducts classification and box localization using shared queries and cross-attention layers, leading to suboptimal results. We observe that different regions of interest in the visual feature map are suitable for performing query classification and box localization tasks, even for the same object. Salien…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: accepted by ICCV2023

  45. arXiv:2310.06380  [pdf, other]

    cs.LG

    CAST: Cluster-Aware Self-Training for Tabular Data via Reliable Confidence

    Authors: Minwook Kim, Juseong Kim, Ki Beom Kim, Giltae Song

    Abstract: Tabular data is one of the most widely used data modalities, encompassing numerous datasets with substantial amounts of unlabeled data. Despite this prevalence, there is a notable lack of simple and versatile methods for utilizing unlabeled data in the tabular domain, where both gradient-boosting decision trees and neural networks are employed. In this context, self-training has gained attraction…

    Submitted 29 August, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: 11 pages for main body, and 10 additional pages for appendix

  46. arXiv:2310.02714  [pdf, other]

    cs.CV

    GETAvatar: Generative Textured Meshes for Animatable Human Avatars

    Authors: Xuanmeng Zhang, Jianfeng Zhang, Rohan Chacko, Hongyi Xu, Guoxian Song, Yi Yang, Jiashi Feng

    Abstract: We study the problem of 3D-aware full-body human generation, aiming at creating animatable human avatars with high-quality textures and geometries. Generally, two challenges remain in this field: i) existing methods struggle to generate geometries with rich realistic details such as the wrinkles of garments; ii) they typically utilize volumetric radiance fields and neural renderers in the synthesi…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted by ICCV2023. Project Page: https://getavatar.github.io/

  47. arXiv:2310.00178  [pdf, other]

    cs.CL eess.AS

    Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm

    Authors: Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng, Ding Zhao, Tara Sainath, Pedro Moreno Mengibar

    Abstract: Contextual biasing refers to the problem of biasing the automatic speech recognition (ASR) systems towards rare entities that are relevant to the specific user or application scenarios. We propose algorithms for contextual biasing based on the Knuth-Morris-Pratt algorithm for pattern matching. During beam search, we boost the score of a token extension if it extends matching into a set of biasing…

    Submitted 29 September, 2023; originally announced October 2023.

  48. arXiv:2308.12380  [pdf, other]

    cs.CV

    FG-Net: Facial Action Unit Detection with Generalizable Pyramidal Features

    Authors: Yufeng Yin, Di Chang, Guoxian Song, Shen Sang, Tiancheng Zhi, Jing Liu, Linjie Luo, Mohammad Soleymani

    Abstract: Automatic detection of facial Action Units (AUs) allows for objective facial expression analysis. Due to the high cost of AU labeling and the limited size of existing benchmarks, previous AU detection methods tend to overfit the dataset, resulting in a significant performance loss when evaluated across corpora. To address this problem, we propose FG-Net for generalizable facial action unit detecti…

    Submitted 23 August, 2023; originally announced August 2023.

  49. arXiv:2308.07575  [pdf, other]

    cs.CV cs.AI cs.LG

    Story Visualization by Online Text Augmentation with Context Memory

    Authors: Daechul Ahn, Daneul Kim, Gwangmo Song, Seung Hwan Kim, Honglak Lee, Dongyeop Kang, Jonghyun Choi

    Abstract: Story visualization (SV) is a challenging text-to-image generation task for the difficulty of not only rendering visual details from the text descriptions but also encoding a long-term context across multiple sentences. While prior efforts mostly focus on generating a semantically relevant image for each sentence, encoding a context spread across the given paragraph to generate contextually convin…

    Submitted 19 August, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: ICCV 2023, Project page: https://dcahn12.github.io/projects/CMOTA/

  50. arXiv:2307.13924  [pdf, other]

    cs.CV cs.LG cs.RO

    trajdata: A Unified Interface to Multiple Human Trajectory Datasets

    Authors: Boris Ivanovic, Guanyu Song, Igor Gilitschenski, Marco Pavone

    Abstract: The field of trajectory forecasting has grown significantly in recent years, partially owing to the release of numerous large-scale, real-world human trajectory datasets for autonomous vehicles (AVs) and pedestrian motion tracking. While such datasets have been a boon for the community, they each use custom and unique data formats and APIs, making it cumbersome for researchers to train and evaluat…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: 15 pages, 15 figures, 3 tables