Showing 1–50 of 2,074 results for author: Lee, H

Searching in archive cs.
  1. arXiv:2408.14841  [pdf, other]

    cs.CV cs.AI

    Diffusion based Semantic Outlier Generation via Nuisance Awareness for Out-of-Distribution Detection

    Authors: Suhee Yoon, Sanghyu Yoon, Hankook Lee, Ye Seul Sim, Sungik Choi, Kyungeun Lee, Hye-Seung Cho, Woohyung Lim

    Abstract: Out-of-distribution (OOD) detection, which determines whether a given sample is part of the in-distribution (ID), has recently shown promising results through training with synthetic OOD datasets. Nonetheless, existing methods often produce outliers that are considerably distant from the ID, showing limited efficacy for capturing subtle distinctions between ID and OOD. To address these issues, we…

    Submitted 27 August, 2024; originally announced August 2024.

  2. arXiv:2408.14739  [pdf, other]

    cs.SD eess.AS

    VoiceTailor: Lightweight Plug-In Adapter for Diffusion-Based Personalized Text-to-Speech

    Authors: Heeseung Kim, Sang-gil Lee, Jiheum Yeom, Che Hyun Lee, Sungwon Kim, Sungroh Yoon

    Abstract: We propose VoiceTailor, a parameter-efficient speaker-adaptive text-to-speech (TTS) system, by equipping a pre-trained diffusion-based TTS model with a personalized adapter. VoiceTailor identifies pivotal modules that benefit from the adapter based on a weight change ratio analysis. We utilize Low-Rank Adaptation (LoRA) as a parameter-efficient adaptation method and incorporate the adapter into pi…

    Submitted 27 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: INTERSPEECH 2024

  3. arXiv:2408.14559  [pdf, other]

    cs.CV cs.LG

    Exploring the Potential of Synthetic Data to Replace Real Data

    Authors: Hyungtae Lee, Yan Zhang, Heesung Kwon, Shuvra S. Bhattacharyya

    Abstract: The potential of synthetic data to replace real data creates a huge demand for synthetic data in data-hungry AI. This potential is even greater when synthetic data is used for training along with a small number of real images from domains other than the test domain. We find that this potential varies depending on (i) the number of cross-domain real images and (ii) the test set on which the trained…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: ICIP 2024

  4. arXiv:2408.14488  [pdf]

    cs.LG cond-mat.mtrl-sci

    Multi-Task Multi-Fidelity Learning of Properties for Energetic Materials

    Authors: Robert J. Appleton, Daniel Klinger, Brian H. Lee, Michael Taylor, Sohee Kim, Samuel Blankenship, Brian C. Barnes, Steven F. Son, Alejandro Strachan

    Abstract: Data science and artificial intelligence are playing an increasingly important role in the physical sciences. Unfortunately, in the field of energetic materials, data scarcity limits the accuracy and even applicability of ML tools. To address data limitations, we compiled multi-modal data: both experimental and computational results for several properties. We find that multi-task neural networks ca…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 16 pages, 4 figures, 2 tables

  5. arXiv:2408.14016  [pdf, other]

    cs.CV cs.AI

    Pixel-Aligned Multi-View Generation with Depth Guided Decoder

    Authors: Zhenggang Tang, Peiye Zhuang, Chaoyang Wang, Aliaksandr Siarohin, Yash Kant, Alexander Schwing, Sergey Tulyakov, Hsin-Ying Lee

    Abstract: The task of image-to-multi-view generation refers to generating novel views of an instance from a single image. Recent methods achieve this by extending text-to-image latent diffusion models to a multi-view version, which contains a VAE image encoder and a U-Net diffusion model. Specifically, these generation methods usually fix the VAE and finetune only the U-Net. However, the significant downscaling…

    Submitted 26 August, 2024; originally announced August 2024.

  6. arXiv:2408.13891  [pdf, other]

    cs.CL eess.AS

    SpeechCaps: Advancing Instruction-Based Universal Speech Models with Multi-Talker Speaking Style Captioning

    Authors: Chien-yu Huang, Min-Han Shih, Ke-Han Lu, Chi-Yuan Hsiao, Hung-yi Lee

    Abstract: Instruction-based speech processing is becoming popular. Studies show that training with multiple tasks boosts performance, but collecting diverse, large-scale tasks and datasets is expensive. Thus, it is highly desirable to design a fundamental task that benefits other downstream tasks. This paper introduces a multi-talker speaking style captioning task to enhance the understanding of speaker and…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: SynData4GenAI 2024

  7. arXiv:2408.13850  [pdf, other]

    cs.LG cs.AI

    Condensed Sample-Guided Model Inversion for Knowledge Distillation

    Authors: Kuluhan Binici, Shivam Aggarwal, Cihan Acar, Nam Trung Pham, Karianto Leman, Gim Hee Lee, Tulika Mitra

    Abstract: Knowledge distillation (KD) is a key element in neural network compression that allows knowledge transfer from a pre-trained teacher model to a more compact student model. KD relies on access to the training dataset, which may not always be fully available due to privacy concerns or logistical issues related to the size of the data. To address this, "data-free" KD methods use synthetic data, gener…

    Submitted 25 August, 2024; originally announced August 2024.

  8. arXiv:2408.13751  [pdf, other]

    stat.ML cs.LG math.OC

    Improved identification of breakpoints in piecewise regression and its applications

    Authors: Taehyeong Kim, Hyungu Lee, Hayoung Choi

    Abstract: Identifying breakpoints in piecewise regression is critical in enhancing the reliability and interpretability of data fitting. In this paper, we propose novel algorithms based on the greedy algorithm to accurately and efficiently identify breakpoints in piecewise polynomial regression. The algorithm updates the breakpoints to minimize the error by exploring the neighborhood of each breakpoint. It…

    Submitted 27 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: 13 pages, 6 figures

  9. arXiv:2408.13492  [pdf, other]

    cs.CV

    Online Continuous Generalized Category Discovery

    Authors: Keon-Hee Park, Hakyung Lee, Kyungwoo Song, Gyeong-Moon Park

    Abstract: With the advancement of deep neural networks in computer vision, artificial intelligence (AI) is widely employed in real-world applications. However, AI still faces limitations in mimicking high-level human capabilities, such as novel category discovery, for practical use. While some methods utilizing offline continual learning have been proposed for novel category discovery, they neglect the cont…

    Submitted 24 August, 2024; originally announced August 2024.

  10. arXiv:2408.13040  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks

    Authors: Kai-Wei Chang, Haibin Wu, Yu-Kai Wang, Yuan-Kuei Wu, Hua Shen, Wei-Cheng Tseng, Iu-thing Kang, Shang-Wen Li, Hung-yi Lee

    Abstract: Prompting has become a practical method for utilizing pre-trained language models (LMs). This approach offers several advantages. It allows an LM to adapt to new tasks with minimal training and parameter updates, thus achieving efficiency in both storage and computation. Additionally, prompting modifies only the LM's inputs and harnesses the generative capabilities of language models to address va…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 3730-3744, 2024

  11. arXiv:2408.11829  [pdf]

    cs.CV cs.LG eess.IV

    FAKER: Full-body Anonymization with Human Keypoint Extraction for Real-time Video Deidentification

    Authors: Byunghyun Ban, Hyoseok Lee

    Abstract: In the contemporary digital era, protection of personal information has become a paramount issue. The exponential growth of the media industry has heightened concerns regarding the anonymization of individuals captured in video footage. Traditional methods, such as blurring or pixelation, are commonly employed, while recent advancements have introduced generative adversarial networks (GAN) to redr…

    Submitted 6 August, 2024; originally announced August 2024.

  12. arXiv:2408.11814  [pdf, other]

    cs.CV

    SynPlay: Importing Real-world Diversity for a Synthetic Human Dataset

    Authors: Jinsub Yim, Hyungtae Lee, Sungmin Eum, Yi-Ting Shen, Yan Zhang, Heesung Kwon, Shuvra S. Bhattacharyya

    Abstract: We introduce Synthetic Playground (SynPlay), a new synthetic human dataset that aims to bring out the diversity of human appearance in the real world. We focus on two factors to achieve a level of diversity that has not yet been seen in previous works: i) realistic human motions and poses and ii) multiple camera viewpoints towards human instances. We first use a game engine and its library-provide…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Project Page: https://synplaydataset.github.io/

  13. arXiv:2408.11751  [pdf, other]

    cs.RO cs.MA

    Bayesian Optimization Framework for Efficient Fleet Design in Autonomous Multi-Robot Exploration

    Authors: David Molina Concha, Jiping Li, Haoran Yin, Kyeonghyeon Park, Hyun-Rok Lee, Taesik Lee, Dhruv Sirohi, Chi-Guhn Lee

    Abstract: This study addresses the challenge of fleet design optimization in the context of heterogeneous multi-robot fleets, aiming to obtain feasible designs that balance performance and costs. In the domain of autonomous multi-robot exploration, reinforcement learning agents play a central role, offering adaptability to complex terrains and facilitating collaboration among robots. However, modifying the…

    Submitted 21 August, 2024; originally announced August 2024.

  14. arXiv:2408.11318  [pdf, ps, other]

    cs.CV

    TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models

    Authors: Hyeongmin Lee, Jin-Young Kim, Kyungjune Baek, Jihwan Kim, Hyojun Go, Seongsu Ha, Seokjin Han, Jiho Jang, Raehyuk Jung, Daewoo Kim, GeunOh Kim, JongMok Kim, Jongseok Kim, Junwan Kim, Soonwoo Kwon, Jangwon Lee, Seungjoon Park, Minjoon Seo, Jay Suh, Jaehyuk Yi, Aiden Lee

    Abstract: In this work, we discuss evaluating video foundation models in a fair and robust manner. Unlike language or image foundation models, many video foundation models are evaluated with differing parameters (such as sampling rate, number of frames, pretraining steps, etc.), making fair and robust comparisons challenging. Therefore, we present a carefully designed evaluation framework for measuring two…

    Submitted 22 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

    Comments: 17 pages; Twelve Labs Technical Report

  15. arXiv:2408.10107  [pdf, other]

    cs.LG cs.AI stat.ML

    Perturb-and-Compare Approach for Detecting Out-of-Distribution Samples in Constrained Access Environments

    Authors: Heeyoung Lee, Hoyoon Byun, Changdae Oh, JinYeong Bak, Kyungwoo Song

    Abstract: Accessing machine learning models through remote APIs has been gaining prevalence following the recent trend of scaling up model parameters for increased performance. Even though these models exhibit remarkable ability, detecting out-of-distribution (OOD) samples remains a crucial safety concern for end users as these samples may induce unreliable outputs from the model. In this work, we propose a…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted to European Conference on Artificial Intelligence (ECAI) 2024

  16. arXiv:2408.09703  [pdf, other]

    cs.AI

    Partial-Multivariate Model for Forecasting

    Authors: Jaehoon Lee, Hankook Lee, Sungik Choi, Sungjun Cho, Moontae Lee

    Abstract: When solving forecasting problems including multiple time-series features, existing approaches often fall into two extreme categories, depending on whether to utilize inter-feature information: univariate and complete-multivariate models. Unlike univariate cases which ignore the information, complete-multivariate models compute relationships among a complete set of features. However, despite the p…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 25 pages

  17. arXiv:2408.09686  [pdf, other]

    cs.MA

    Algorithmic Contract Design with Reinforcement Learning Agents

    Authors: David Molina Concha, Kyeonghyeon Park, Hyun-Rok Lee, Taesik Lee, Chi-Guhn Lee

    Abstract: We introduce a novel problem setting for algorithmic contract design, named the principal-MARL contract design problem. This setting extends traditional contract design to account for dynamic and stochastic environments using Markov Games and Multi-Agent Reinforcement Learning. To tackle this problem, we propose a Multi-Objective Bayesian Optimization (MOBO) framework named Constrained Pareto Maxi…

    Submitted 18 August, 2024; originally announced August 2024.

  18. arXiv:2408.09662  [pdf, other]

    cs.RO cs.DC

    CusADi: A GPU Parallelization Framework for Symbolic Expressions and Optimal Control

    Authors: Se Hwan Jeon, Seungwoo Hong, Ho Jae Lee, Charles Khazoom, Sangbae Kim

    Abstract: The parallelism afforded by GPUs presents significant advantages in training controllers through reinforcement learning (RL). However, integrating model-based optimization into this process remains challenging due to the complexity of formulating and solving optimization problems across thousands of instances. In this work, we present CusADi, an extension of the CasADi symbolic framework to suppor…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: RAL 2024 submission

  19. arXiv:2408.09554  [pdf, other]

    q-bio.QM cs.CV eess.IV

    Screen Them All: High-Throughput Pan-Cancer Genetic and Phenotypic Biomarker Screening from H&E Whole Slide Images

    Authors: Yi Kan Wang, Ludmila Tydlitatova, Jeremy D. Kunz, Gerard Oakley, Ran A. Godrich, Matthew C. H. Lee, Chad Vanderbilt, Razik Yousfi, Thomas Fuchs, David S. Klimstra, Siqi Liu

    Abstract: Many molecular alterations serve as clinically prognostic or therapy-predictive biomarkers, typically detected using single or multi-gene molecular assays. However, these assays are expensive, tissue destructive and often take weeks to complete. Using AI on routine H&E WSIs offers a fast and economical approach to screen for multiple molecular biomarkers. We present a high-throughput AI-based syst…

    Submitted 20 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

  20. arXiv:2408.07790  [pdf, other]

    cs.CV

    Cropper: Vision-Language Model for Image Cropping through In-Context Learning

    Authors: Seung Hyun Lee, Junjie Ke, Yinxiao Li, Junfeng He, Steven Hickson, Katie Datsenko, Sangpil Kim, Ming-Hsuan Yang, Irfan Essa, Feng Yang

    Abstract: The goal of image cropping is to identify visually appealing crops within an image. Conventional methods rely on specialized architectures trained on specific datasets, which struggle to be adapted to new requirements. Recent breakthroughs in large vision-language models (VLMs) have enabled visual in-context learning without explicit training. However, effective strategies for vision downstream ta…

    Submitted 14 August, 2024; originally announced August 2024.

  21. arXiv:2408.07665  [pdf, ps, other]

    cs.CL eess.AS

    Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models

    Authors: Yi-Cheng Lin, Wei-Chih Chen, Hung-yi Lee

    Abstract: Warning: This paper may contain texts with uncomfortable content. Large Language Models (LLMs) have achieved remarkable performance in various tasks, including those involving multimodal data like speech. However, these models often exhibit biases due to the nature of their training data. Recently, more Speech Large Language Models (SLLMs) have emerged, underscoring the urgent need to address th…

    Submitted 14 August, 2024; originally announced August 2024.

  22. arXiv:2408.07416  [pdf, other]

    cs.CV

    Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space

    Authors: Hyunjee Lee, Youngsik Yun, Jeongmin Bae, Seoha Kim, Youngjung Uh

    Abstract: Understanding the 3D semantics of a scene is a fundamental problem for various scenarios such as embodied agents. While NeRFs and 3DGS excel at novel-view synthesis, previous methods for understanding their semantics have been limited to incomplete 3D understanding: their segmentation results are 2D masks and their supervision is anchored at 2D pixels. This paper revisits the problem set to pursue…

    Submitted 18 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: Project page: https://hyunji12.github.io/Open3DRF

  23. arXiv:2408.07326  [pdf, other]

    cs.AR

    LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference

    Authors: Seungjae Moon, Jung-Hoon Kim, Junsoo Kim, Seongmin Hong, Junseo Cha, Minsu Kim, Sukbin Lim, Gyubin Choi, Dongjin Seo, Jongho Kim, Hunjong Lee, Hyunjun Park, Ryeowook Ko, Soongyu Choi, Jongse Park, Jinwon Lee, Joo-Young Kim

    Abstract: The explosive arrival of OpenAI's ChatGPT has fueled the globalization of large language models (LLMs), which consist of billions of pretrained parameters that embody the aspects of syntax and semantics. HyperAccel introduces latency processing unit (LPU), a latency-optimized and highly scalable processor architecture for the acceleration of LLM inference. LPU perfectly balances the memory bandwi…

    Submitted 14 August, 2024; originally announced August 2024.

  24. arXiv:2408.07081  [pdf, other]

    cs.LG cs.AI cs.CL

    MathBridge: A Large Corpus Dataset for Translating Spoken Mathematical Expressions into $LaTeX$ Formulas for Improved Readability

    Authors: Kyudan Jung, Sieun Hyeon, Jeong Youn Kwon, Nam-Joon Kim, Hyun Gon Ryu, Hyuk-Jae Lee, Jaeyoung Do

    Abstract: Improving the readability of mathematical expressions in text-based documents, such as subtitles of mathematical videos, is a significant task. To achieve this, mathematical expressions should be converted to compiled formulas. For instance, the spoken expression ``x equals minus b plus or minus the square root of b squared minus four a c, all over two a'' from automatic speech recognition is more read…

    Submitted 16 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: 9 pages, 6 figures

  25. arXiv:2408.06816  [pdf, other]

    cs.AI cs.CL

    MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty

    Authors: Yongjin Yang, Haneul Yoo, Hwaran Lee

    Abstract: Although large language models (LLMs) are capable of performing various tasks, they still suffer from producing plausible but incorrect responses. To improve the reliability of LLMs, recent research has focused on uncertainty quantification to predict whether a response is correct or not. However, most uncertainty quantification methods have been evaluated on questions requiring a single clear ans…

    Submitted 13 August, 2024; originally announced August 2024.

  26. arXiv:2408.06598  [pdf]

    cs.CL cs.AI

    A Perspective on Large Language Models, Intelligent Machines, and Knowledge Acquisition

    Authors: Vladimir Cherkassky, Eng Hock Lee

    Abstract: Large Language Models (LLMs) are known for their remarkable ability to generate synthesized 'knowledge', such as text documents, music, images, etc. However, there is a huge gap between LLM and human capabilities for understanding abstract concepts and reasoning. We discuss these issues in a larger philosophical context of human knowledge acquisition and the Turing test. In addition, we illustra…

    Submitted 12 August, 2024; originally announced August 2024.

  27. arXiv:2408.04874  [pdf, other]

    cs.HC

    DG Comics: Semi-Automatically Authoring Graph Comics for Dynamic Graphs

    Authors: Joohee Kim, Hyunwook Lee, Duc M. Nguyen, Minjeong Shin, Bum Chul Kwon, Sungahn Ko, Niklas Elmqvist

    Abstract: Comics are an effective method for sequential data-driven storytelling, especially for dynamic graphs -- graphs whose vertices and edges change over time. However, manually creating such comics is currently time-consuming, complex, and error-prone. In this paper, we propose DG Comics, a novel comic authoring tool for dynamic graphs that allows users to semi-automatically build and annotate comics.…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: To appear in IEEE Transactions on Visualization and Computer Graphics

  28. arXiv:2408.03907  [pdf, other]

    cs.CL cs.AI

    Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models

    Authors: Shachi H Kumar, Saurav Sahay, Sahisnu Mazumder, Eda Okur, Ramesh Manuvinakurike, Nicole Beckage, Hsuan Su, Hung-yi Lee, Lama Nachman

    Abstract: Large Language Models (LLMs) have excelled at language understanding and generating human-level text. However, even with supervised training and human alignment, these LLMs are susceptible to adversarial attacks where malicious users can prompt the model to generate undesirable text. LLMs also inherently encode potential biases that can cause various harmful effects during interactions. Bias evalu…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 6 pages paper content, 17 pages of appendix

  29. arXiv:2408.03612  [pdf, other]

    cs.CV cs.LG

    JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling

    Authors: Seok Hwan Lee, Taein Son, Soo Won Seo, Jisong Kim, Jun Won Choi

    Abstract: Video action detection (VAD) is a formidable vision task that involves the localization and classification of actions within the spatial and temporal dimensions of a video clip. Among the myriad VAD architectures, two-stage VAD methods utilize a pre-trained person detector to extract the region of interest features, subsequently employing these features for action detection. However, the performan…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 31 pages, 10 figures

  30. arXiv:2408.03541  [pdf, ps, other]

    cs.CL cs.AI

    EXAONE 3.0 7.8B Instruction Tuned Language Model

    Authors: LG AI Research: Soyoung An, Kyunghoon Bae, Eunbi Choi, Stanley Jungkyu Choi, Yemuk Choi, Seokhee Hong, Yeonjung Hong, Junwon Hwang, Hyojin Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Yountae Jung, Euisoon Kim, Hyosang Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Youchul Kim, Edward Hwayoung Lee, Haeju Lee, et al. (14 additional authors not shown)

    Abstract: We introduce the EXAONE 3.0 instruction-tuned language model, the first open model in the family of Large Language Models (LLMs) developed by LG AI Research. Among different model sizes, we publicly release the 7.8B instruction-tuned model to promote open research and innovations. Through extensive evaluations across a wide range of public and in-house benchmarks, EXAONE 3.0 demonstrates highly compet…

    Submitted 13 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

  31. arXiv:2408.03331  [pdf, other]

    physics.soc-ph cs.SI stat.AP

    The Wasserstein Bipolarization Index: A New Measure of Public Opinion Polarization, with an Application to Cross-Country Attitudes toward COVID-19 Vaccination Mandates

    Authors: Hane Lee, Michael E. Sobel

    Abstract: Although the topic of opinion polarization receives much attention from the media, public opinion researchers and political scientists, the phenomenon itself has not been adequately characterized in either the lay or academic literature. To study opinion polarization among the public, researchers compare the distributions of respondents to survey questions or track the distribution of responses to…

    Submitted 19 July, 2024; originally announced August 2024.

  32. arXiv:2408.03178  [pdf, other]

    cs.CV cs.GR cs.LG

    An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion

    Authors: Xingguang Yan, Han-Hung Lee, Ziyu Wan, Angel X. Chang

    Abstract: We introduce a new approach for generating realistic 3D models with UV maps through a representation termed "Object Images." This approach encapsulates surface geometry, appearance, and patch structures within a 64x64 pixel image, effectively converting complex 3D shapes into a more manageable 2D format. By doing so, we address the challenges of both geometric and semantic irregularity inherent in…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Project Page: https://omages.github.io/

  33. arXiv:2408.02662  [pdf, other]

    cs.RO eess.SY

    Integrating Model-Based Footstep Planning with Model-Free Reinforcement Learning for Dynamic Legged Locomotion

    Authors: Ho Jae Lee, Seungwoo Hong, Sangbae Kim

    Abstract: In this work, we introduce a control framework that combines model-based footstep planning with Reinforcement Learning (RL), leveraging desired footstep patterns derived from the Linear Inverted Pendulum (LIP) dynamics. Utilizing the LIP model, our method forward predicts robot states and determines the desired foot placement given the velocity commands. We then train an RL policy to track the foo…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 8 pages

  34. arXiv:2408.02442  [pdf, other]

    cs.CL

    Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models

    Authors: Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, Chieh-Yen Lin, Hung-yi Lee, Yun-Nung Chen

    Abstract: Structured generation, the process of producing content in standardized formats like JSON and XML, is widely utilized in real-world applications to extract key output information from large language models (LLMs). This study investigates whether such constraints on generation space impact LLMs' abilities, including reasoning and domain knowledge comprehension. Specifically, we evaluate LLMs' perfo…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 18 pages

  35. arXiv:2408.02336  [pdf, other]

    cs.CV cs.LG

    Infusing Environmental Captions for Long-Form Video Language Grounding

    Authors: Hyogun Lee, Soyeon Hong, Mujeen Sung, Jinwoo Choi

    Abstract: In this work, we tackle the problem of long-form video-language grounding (VLG). Given a long-form video and a natural language query, a model should temporally localize the precise moment that answers the query. Humans can easily solve VLG tasks, even with arbitrarily long videos, by discarding irrelevant moments using extensive and robust knowledge gained from experience. Unlike humans, existing…

    Submitted 6 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: 7 pages, 3 figures

  36. arXiv:2408.02307  [pdf, other]

    cs.CV

    Low-Cost Self-Ensembles Based on Multi-Branch Transformation and Grouped Convolution

    Authors: Hojung Lee, Jong-Seok Lee

    Abstract: Recent advancements in low-cost ensemble learning have demonstrated improved efficiency for image classification. However, the existing low-cost ensemble methods show relatively lower accuracy compared to conventional ensemble learning. In this paper, we propose a new low-cost ensemble learning method, which can simultaneously achieve high efficiency and classification performance. A CNN is transformed i…

    Submitted 5 August, 2024; originally announced August 2024.

  37. arXiv:2408.02301  [pdf, other]

    cs.CV cs.LG

    Network Fission Ensembles for Low-Cost Self-Ensembles

    Authors: Hojung Lee, Jong-Seok Lee

    Abstract: Recent ensemble learning methods for image classification have been shown to improve classification accuracy with low extra cost. However, they still require multiple trained models for ensemble inference, which eventually becomes a significant burden when the model size increases. In this paper, we propose a low-cost ensemble learning and inference method, called Network Fission Ensembles (NFE), by conv…

    Submitted 5 August, 2024; originally announced August 2024.

  38. arXiv:2408.01320  [pdf, ps, other]

    eess.SP cs.IT

    Generalized Reduced-WMMSE Approach for Cell-Free Massive MIMO With Per-AP Power Constraints

    Authors: Wonsik Yoo, Daesung Yu, Hoon Lee, Seok-Hwan Park

    Abstract: The optimization of cooperative beamforming vectors in cell-free massive MIMO (mMIMO) systems is presented where multi-antenna access points (APs) support downlink data transmission of multiple users. Despite the successes of the weighted minimum mean squared error (WMMSE) algorithm and its variants, they lack careful investigation of the computational complexity that scales with the number of an…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: accepted for publication in IEEE Wireless Communications Letters

  39. arXiv:2407.21199  [pdf, other]

    cs.HC

    A Survey on Exploratory Spatiotemporal Visual Analytics Approaches for Climate Science

    Authors: Abdullah-Al-Raihan Nayeem, Dongyun Han, Huikyo Lee, Donghoon Kim, Daniel Feldman, William J. Tolone, Daniel Crichton, Isaac Cho

    Abstract: Climate science produces a wealth of complex, high-dimensional, multivariate data from observations and numerical models. These data are critical for understanding climate changes and their socioeconomic impacts. Climate scientists are continuously evaluating output from numerical models against observations. This model evaluation process provides useful guidance to improve the numerical models an…

    Submitted 30 July, 2024; originally announced July 2024.

  40. arXiv:2407.20806  [pdf, other]

    cs.AI cs.LG

    ARCLE: The Abstraction and Reasoning Corpus Learning Environment for Reinforcement Learning

    Authors: Hosung Lee, Sejin Kim, Seungpil Lee, Sanha Hwang, Jihwan Lee, Byung-Jun Lee, Sundong Kim

    Abstract: This paper introduces ARCLE, an environment designed to facilitate reinforcement learning research on the Abstraction and Reasoning Corpus (ARC). Addressing this inductive reasoning benchmark with reinforcement learning presents these challenges: a vast action space, a hard-to-reach goal, and a variety of tasks. We demonstrate that an agent with proximal policy optimization can learn individual ta…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by CoLLAs 2024, Project page: https://github.com/confeitoHS/arcle

  41. arXiv:2407.20496  [pdf, other]

    cs.LG cs.AI

    Toward Efficient Permutation for Hierarchical N:M Sparsity on GPUs

    Authors: Seungmin Yu, Xiaodie Yi, Hayun Lee, Dongkun Shin

    Abstract: N:M sparsity pruning is a powerful technique for compressing deep neural networks, utilizing NVIDIA's Sparse Tensor Core technology. This method benefits from hardware support for sparse indexing, enabling the adoption of fine-grained sparsity to maintain model accuracy while minimizing the overhead typically associated with irregular data access. Although restricted to a fixed level of sparsity d…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 11 pages, 5 figures

  42. arXiv:2407.20391  [pdf, other]

    cs.CV cs.RO

    Alignment Scores: Robust Metrics for Multiview Pose Accuracy Evaluation

    Authors: Seong Hun Lee, Javier Civera

    Abstract: We propose three novel metrics for evaluating the accuracy of a set of estimated camera poses given the ground truth: Translation Alignment Score (TAS), Rotation Alignment Score (RAS), and Pose Alignment Score (PAS). The TAS evaluates the translation accuracy independently of the rotations, and the RAS evaluates the rotation accuracy independently of the translations. The PAS is the average of the…

    Submitted 2 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

  43. arXiv:2407.20021  [pdf, other]

    cs.LG cs.AI cs.CV

    MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity

    Authors: Kanghyun Choi, Hye Yoon Lee, Dain Kwon, SunJong Park, Kyuyeun Kim, Noseong Park, Jinho Lee

    Abstract: Data-free quantization (DFQ) is a technique that creates a lightweight network from its full-precision counterpart without the original training data, often through a synthetic dataset. Although several DFQ methods have been proposed for vision transformer (ViT) architectures, they fail to achieve efficacy in low-bit settings. Examining the existing methods, we identify that their synthetic data p…

    Submitted 1 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

    Comments: Author Preprint

  44. arXiv:2407.19871  [pdf, ps, other]

    cs.CR cs.NI

    Fast Private Location-based Information Retrieval Over the Torus

    Authors: Joon Soo Yoo, Mi Yeon Hong, Ji Won Heo, Kang Hoon Lee, Ji Won Yoon

    Abstract: Location-based services offer immense utility, but also pose significant privacy risks. In response, we propose LocPIR, a novel framework using homomorphic encryption (HE), specifically the TFHE scheme, to preserve user location privacy when retrieving data from public clouds. Our system employs TFHE's expertise in non-polynomial evaluations, crucial for comparison operations. LocPIR showcases min…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted at the IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS) 2024

  45. arXiv:2407.19862  [pdf, other]

    cs.SD eess.AS

    Wavespace: A Highly Explorable Wavetable Generator

    Authors: Hazounne Lee, Kihong Kim, Sungho Lee, Kyogu Lee

    Abstract: Wavetable synthesis generates quasi-periodic waveforms of musical tones by interpolating a list of waveforms called a wavetable. As generative models that utilize latent representations offer various methods in waveform generation for musical applications, studies in wavetable generation with invertible architecture have also arisen recently. While they are promising, it is still challenging to gene…

    Submitted 29 July, 2024; originally announced July 2024.

  46. arXiv:2407.19644  [pdf, other]

    cs.LG cs.AI

    Realizing Unaligned Block-wise Pruning for DNN Acceleration on Mobile Devices

    Authors: Hayun Lee, Dongkun Shin

    Abstract: With the recent proliferation of on-device AI, there is an increasing need to run computationally intensive DNNs directly on mobile devices. However, the limited computing and memory resources of these devices necessitate effective pruning techniques. Block-wise pruning is promising due to its low accuracy drop tradeoff for speedup gains, but it requires block positions to be aligned with block si…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 11 pages, 8 figures

  47. arXiv:2407.18422  [pdf, other]

    cs.AI cs.LG

    A Black Swan Hypothesis in Markov Decision Process via Irrationality

    Authors: Hyunin Lee, David Abel, Ming Jin, Javad Lavaei, Somayeh Sojoudi

    Abstract: Black swan events are statistically rare occurrences that carry extremely high risks. Black swan events are typically assumed to originate from unpredictable, time-varying environments; however, the community lacks a comprehensive definition of black swan events. To this end, this paper argues that the standard view is incomplete and claims that high-risk, statistical…

    Submitted 25 July, 2024; originally announced July 2024.

  48. arXiv:2407.18044  [pdf, other]

    cs.LG

    The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation

    Authors: Eric Yang, Jonathan Amar, Jong Ha Lee, Bhawesh Kumar, Yugang Jia

    Abstract: Digital health chatbots powered by Large Language Models (LLMs) have the potential to significantly improve personal health management for chronic conditions by providing accessible and on-demand health coaching and question-answering. However, these chatbots risk providing unverified and inaccurate information because LLMs generate responses based on patterns learned from diverse internet data. R…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 22 pages

  49. arXiv:2407.17423  [pdf, ps, other]

    cs.CV

    On selection of centroids of fuzzy clusters for color classification

    Authors: Dae-Won Kim, Kwang H. Lee

    Abstract: A novel initialization method in the fuzzy c-means (FCM) algorithm is proposed for the color clustering problem. Given a set of color points, the proposed initialization extracts dominant colors that are the most vivid and distinguishable colors. Color points closest to the dominant colors are selected as initial centroids in the FCM. To obtain the dominant colors and their closest color points, w…

    Submitted 9 July, 2024; originally announced July 2024.

  50. arXiv:2407.16171  [pdf, other]

    cs.CV cs.AI cs.MM

    Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality

    Authors: Kyu Ri Park, Hong Joo Lee, Jung Uk Kim

    Abstract: Recent Audio-Visual Question Answering (AVQA) methods rely on complete visual and audio input to answer questions accurately. However, in real-world scenarios, issues such as device malfunctions and data transmission errors frequently result in missing audio or visual modality. In such cases, existing AVQA methods suffer significant performance degradation. In this paper, we propose a framework th…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024