Showing 1–50 of 2,034 results for author: Kim, S

Searching in archive cs.
  1. arXiv:2408.17433  [pdf, other]

    cs.CV

    DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model

    Authors: Mona Sheikh Zeinoddin, Chiara Lena, Jiongqi Qu, Luca Carlini, Mattia Magro, Seunghoi Kim, Elena De Momi, Sophia Bano, Matthew Grech-Sollars, Evangelos Mazomenos, Daniel C. Alexander, Danail Stoyanov, Matthew J. Clarkson, Mobarakol Islam

    Abstract: Robotic-assisted surgery (RAS) relies on accurate depth estimation for 3D reconstruction and visualization. While foundation models like Depth Anything Models (DAM) show promise, directly applying them to surgery often yields suboptimal results. Fully fine-tuning on limited surgical data can cause overfitting and catastrophic forgetting, compromising model robustness and generalization. Although L…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 11 pages

  2. arXiv:2408.17066  [pdf, other]

    cs.RO

    Non-verbal Interaction and Interface with a Quadruped Robot using Body and Hand Gestures: Design and User Experience Evaluation

    Authors: Soohyun Shin, Trevor Evetts, Hunter Saylor, Hyunji Kim, Soojin Woo, Wonhwha Rhee, Seong-Woo Kim

    Abstract: In recent years, quadruped robots have attracted significant attention due to their practical advantages in maneuverability, particularly when navigating rough terrain and climbing stairs. As these robots become more integrated into various industries, including construction and healthcare, researchers have increasingly focused on developing intuitive interaction methods such as speech and gesture…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 16 pages

  3. arXiv:2408.17006  [pdf, other]

    cs.CV

    Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering

    Authors: Su Hyeon Lim, Minkuk Kim, Hyeon Bae Kim, Seong Tae Kim

    Abstract: The Visual Question Answering with Natural Language Explanation (VQA-NLE) task is challenging due to its high demand for reasoning-based inference. Recent VQA-NLE studies focus on enhancing model networks to amplify the model's reasoning capability, but this approach is resource-consuming and unstable. In this work, we introduce a new VQA-NLE model, ReRe (Retrieval-augmented natural language Reasoning)…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: ICIP Workshop 2024

  4. arXiv:2408.16213  [pdf, other]

    cs.CV cs.AI cs.CL

    M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation

    Authors: Jonggwon Park, Soobum Kim, Byungmu Yoon, Jihun Hyun, Kyoyun Choi

    Abstract: The rapid evolution of artificial intelligence, especially in large language models (LLMs), has significantly impacted various domains, including healthcare. In chest X-ray (CXR) analysis, previous studies have employed LLMs, but with limitations: either underutilizing the multi-tasking capabilities of LLMs or lacking clinical accuracy. This paper presents M4CXR, a multi-modal LLM designed to enha…

    Submitted 28 August, 2024; originally announced August 2024.

  5. arXiv:2408.15620  [pdf, other]

    cs.LG cs.IR

    CAPER: Enhancing Career Trajectory Prediction using Temporal Knowledge Graph and Ternary Relationship

    Authors: Yeon-Chang Lee, JaeHyun Lee, Michiharu Yamashita, Dongwon Lee, Sang-Wook Kim

    Abstract: The problem of career trajectory prediction (CTP) aims to predict one's future employer or job position. While several CTP methods have been developed for this problem, we posit that none of these methods (1) jointly considers the mutual ternary dependency between three key units (i.e., user, position, and company) of a career and (2) captures the characteristic shifts of key units in career over…

    Submitted 28 August, 2024; originally announced August 2024.

  6. arXiv:2408.14855  [pdf, other]

    cs.AI cs.LO

    Enhancing Analogical Reasoning in the Abstraction and Reasoning Corpus via Model-Based RL

    Authors: Jihwan Lee, Woochang Sim, Sejin Kim, Sundong Kim

    Abstract: This paper demonstrates that model-based reinforcement learning (model-based RL) is a suitable approach for the task of analogical reasoning. We hypothesize that model-based RL can solve analogical reasoning tasks more efficiently through the creation of internal models. To test this, we compared DreamerV3, a model-based RL method, with Proximal Policy Optimization, a model-free RL method, on the…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted to IJCAI 2024 IARML Workshop

  7. arXiv:2408.14739  [pdf, other]

    cs.SD eess.AS

    VoiceTailor: Lightweight Plug-In Adapter for Diffusion-Based Personalized Text-to-Speech

    Authors: Heeseung Kim, Sang-gil Lee, Jiheum Yeom, Che Hyun Lee, Sungwon Kim, Sungroh Yoon

    Abstract: We propose VoiceTailor, a parameter-efficient speaker-adaptive text-to-speech (TTS) system, by equipping a pre-trained diffusion-based TTS model with a personalized adapter. VoiceTailor identifies pivotal modules that benefit from the adapter based on a weight change ratio analysis. We utilize Low-Rank Adaptation (LoRA) as a parameter-efficient adaptation method and incorporate the adapter into pi…

    Submitted 27 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: INTERSPEECH 2024

  8. arXiv:2408.14488  [pdf]

    cs.LG cond-mat.mtrl-sci

    Multi-Task Multi-Fidelity Learning of Properties for Energetic Materials

    Authors: Robert J. Appleton, Daniel Klinger, Brian H. Lee, Michael Taylor, Sohee Kim, Samuel Blankenship, Brian C. Barnes, Steven F. Son, Alejandro Strachan

    Abstract: Data science and artificial intelligence are playing an increasingly important role in the physical sciences. Unfortunately, in the field of energetic materials, data scarcity limits the accuracy and even applicability of ML tools. To address data limitations, we compiled multi-modal data: both experimental and computational results for several properties. We find that multi-task neural networks ca…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 16 pages, 4 figures, 2 tables

  9. arXiv:2408.13092  [pdf, other]

    cs.LG

    Diffusion-based Episodes Augmentation for Offline Multi-Agent Reinforcement Learning

    Authors: Jihwan Oh, Sungnyun Kim, Gahee Kim, Sunghwan Kim, Se-Young Yun

    Abstract: Offline multi-agent reinforcement learning (MARL) is increasingly recognized as crucial for effectively deploying RL algorithms in environments where real-time interaction is impractical, risky, or costly. In the offline setting, learning from a static dataset of past interactions allows for the development of robust and safe policies without the need for live data collection, which can be fraught…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by SPIGM Workshop at ICML 2024 (Structured Probabilistic Inference & Generative Modeling)

  10. arXiv:2408.12890  [pdf, other]

    cs.AI

    Multiple Areal Feature Aware Transportation Demand Prediction

    Authors: Sumin Han, Jisun An, Youngjun Park, Suji Kim, Kitae Jang, Dongman Lee

    Abstract: Reliable short-term transportation demand prediction supports the authorities in improving the capability of systems by optimizing schedules, adjusting fleet sizes, and generating new transit networks. A handful of research efforts incorporate one or a few areal features while learning spatio-temporal correlation, to capture similar demand patterns between similar areas. However, urban character…

    Submitted 23 August, 2024; originally announced August 2024.

  11. arXiv:2408.12875  [pdf, other]

    cs.LG cs.SI

    Disentangling, Amplifying, and Debiasing: Learning Disentangled Representations for Fair Graph Neural Networks

    Authors: Yeon-Chang Lee, Hojung Shin, Sang-Wook Kim

    Abstract: Graph Neural Networks (GNNs) have become essential tools for graph representation learning in various domains, such as social media and healthcare. However, they often suffer from fairness issues due to inherent biases in node attributes and graph structure, leading to unfair predictions. To address these challenges, we propose a novel GNN framework, DAB-GNN, that Disentangles, Amplifies, and deBi…

    Submitted 23 August, 2024; originally announced August 2024.

  12. arXiv:2408.12692  [pdf, other]

    cs.AI

    Unlocking Intrinsic Fairness in Stable Diffusion

    Authors: Eunji Kim, Siwon Kim, Rahim Entezari, Sungroh Yoon

    Abstract: Recent text-to-image models like Stable Diffusion produce photo-realistic images but often show demographic biases. Previous debiasing methods focused on training-based approaches, failing to explore the root causes of bias and overlooking Stable Diffusion's potential for unbiased image generation. In this paper, we demonstrate that Stable Diffusion inherently possesses fairness, which can be unlo…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 21 pages, 20 figures; First two authors contributed equally

  13. arXiv:2408.12490  [pdf, other]

    cs.RO

    Probabilistic Homotopy Optimization for Dynamic Motion Planning

    Authors: Shayan Pardis, Matthew Chignoli, Sangbae Kim

    Abstract: We present a homotopic approach to solving challenging, optimization-based motion planning problems. The approach uses Homotopy Optimization, which, unlike standard continuation methods for solving homotopy problems, solves a sequence of constrained optimization problems rather than a sequence of nonlinear systems of equations. The insight behind our proposed algorithm is formulating the discovery…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 8 pages, 9 Figures, 2 Tables, to appear in the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  14. arXiv:2408.10490  [pdf, other]

    cs.CL cs.IR

    Analysis of Plan-based Retrieval for Grounded Text Generation

    Authors: Ameya Godbole, Nicholas Monath, Seungyeon Kim, Ankit Singh Rawat, Andrew McCallum, Manzil Zaheer

    Abstract: In text generation, hallucinations refer to the generation of seemingly coherent text that contradicts established knowledge. One compelling hypothesis is that hallucinations occur when a language model is given a generation task outside its parametric knowledge (due to rarity, recency, domain, etc.). A common strategy to address this limitation is to infuse the language models with retrieval mech…

    Submitted 19 August, 2024; originally announced August 2024.

  15. arXiv:2408.10356  [pdf, other]

    cs.CV physics.data-an physics.soc-ph

    Diversity and stylization of the contemporary user-generated visual arts in the complexity-entropy plane

    Authors: Seunghwan Kim, Byunghwee Lee, Wonjae Lee

    Abstract: The advent of computational and numerical methods in recent times has provided new avenues for analyzing art historiographical narratives and tracing the evolution of art styles therein. Here, we investigate an evolutionary process underpinning the emergence and stylization of contemporary user-generated visual art styles using the complexity-entropy (C-H) plane, which quantifies local structures…

    Submitted 21 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

    Comments: 18 pages, 3 figures, 1 table, SI(4 figures, 3 tables)

  16. arXiv:2408.09662  [pdf, other]

    cs.RO cs.DC

    CusADi: A GPU Parallelization Framework for Symbolic Expressions and Optimal Control

    Authors: Se Hwan Jeon, Seungwoo Hong, Ho Jae Lee, Charles Khazoom, Sangbae Kim

    Abstract: The parallelism afforded by GPUs presents significant advantages in training controllers through reinforcement learning (RL). However, integrating model-based optimization into this process remains challenging due to the complexity of formulating and solving optimization problems across thousands of instances. In this work, we present CusADi, an extension of the CasADi symbolic framework to suppor…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: RAL 2024 submission

  17. arXiv:2408.09140  [pdf, other]

    cs.LG cs.AI cs.CV

    Learning to Explore for Stochastic Gradient MCMC

    Authors: SeungHyun Kim, Seohyeon Jung, Seonghyeon Kim, Juho Lee

    Abstract: Bayesian Neural Networks (BNNs) with high-dimensional parameters pose a challenge for posterior inference due to the multi-modality of the posterior distributions. Stochastic Gradient MCMC (SGMCMC) with cyclical learning rate scheduling is a promising solution, but it requires a large number of sampling steps to explore high-dimensional multi-modal posteriors, making it computationally expensive. In…

    Submitted 17 August, 2024; originally announced August 2024.

  18. arXiv:2408.08577  [pdf, other]

    cond-mat.soft cs.CE physics.bio-ph physics.chem-ph

    Mechanistic Modeling of Lipid Nanoparticle Formation for the Delivery of Nucleic Acid Therapeutics

    Authors: Pavan K. Inguva, Saikat Mukherjee, Pierre J. Walker, Mona A. Kanso, Jie Wang, Yanchen Wu, Vico Tenberg, Srimanta Santra, Shalini Singh, Shin Hyuk Kim, Bernhardt L. Trout, Martin Z. Bazant, Allan S. Myerson, Richard D. Braatz

    Abstract: Nucleic acids such as mRNA have emerged as a promising therapeutic modality with the capability of addressing a wide range of diseases. Lipid nanoparticles (LNPs) as a delivery platform for nucleic acids were used in the COVID-19 vaccines and have received much attention. While modern manufacturing processes which involve rapidly mixing an organic stream containing the lipids with an aqueous strea…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 67 pages, 10 figures

  19. arXiv:2408.08144  [pdf, other]

    cs.CL

    MIDAS: Multi-level Intent, Domain, And Slot Knowledge Distillation for Multi-turn NLU

    Authors: Yan Li, So-Eon Kim, Seong-Bae Park, Soyeon Caren Han

    Abstract: Although Large Language Models (LLMs) can generate coherent and contextually relevant text, they often struggle to recognise the intent behind the human user's query. Natural Language Understanding (NLU) models, however, interpret the purpose and key information of a user's input to enable responsive interactions. Existing NLU models generally map individual utterances to a dual-level semantic frame,…

    Submitted 15 August, 2024; originally announced August 2024.

  20. arXiv:2408.08090  [pdf, other]

    cs.IT

    UV-Plane Beam Mapping for Non-Terrestrial Networks in 3GPP System-Level Simulations

    Authors: Dong-Hyun Jung, Sucheol Kim, Miyeon Lee, Joon-Gyu Ryu, Junil Choi

    Abstract: Due to the high altitudes and large beam sizes of satellites, the curvature of the Earth's surface can impact system-level performance. To consider this, 3GPP introduces the UV-plane beam mapping for system-level simulations of non-terrestrial networks (NTNs). This paper aims to provide a comprehensive understanding of how beams and user equipments (UEs) are placed on the UV-plane and subsequently…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 5 pages, 9 figures, 1 table

  21. arXiv:2408.07947  [pdf, other]

    eess.IV cs.AI cs.CV

    Conditional Brownian Bridge Diffusion Model for VHR SAR to Optical Image Translation

    Authors: Seon-Hoon Kim, Dae-won Chung

    Abstract: Synthetic Aperture Radar (SAR) imaging technology provides the unique advantage of being able to collect data regardless of weather conditions and time. However, SAR images exhibit complex backscatter patterns and speckle noise, which necessitate expertise for interpretation. Research on translating SAR images into optical-like representations has been conducted to aid the interpretation of SAR da…

    Submitted 20 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

    Comments: 5 pages, 2 figures, 1 table

  22. arXiv:2408.07790  [pdf, other]

    cs.CV

    Cropper: Vision-Language Model for Image Cropping through In-Context Learning

    Authors: Seung Hyun Lee, Junjie Ke, Yinxiao Li, Junfeng He, Steven Hickson, Katie Datsenko, Sangpil Kim, Ming-Hsuan Yang, Irfan Essa, Feng Yang

    Abstract: The goal of image cropping is to identify visually appealing crops within an image. Conventional methods rely on specialized architectures trained on specific datasets, which struggle to be adapted to new requirements. Recent breakthroughs in large vision-language models (VLMs) have enabled visual in-context learning without explicit training. However, effective strategies for vision downstream ta…

    Submitted 14 August, 2024; originally announced August 2024.

  23. arXiv:2408.07648  [pdf, other]

    cs.CV cs.CL

    See It All: Contextualized Late Aggregation for 3D Dense Captioning

    Authors: Minjung Kim, Hyung Suk Lim, Seung Hwan Kim, Soonyoung Lee, Bumsoo Kim, Gunhee Kim

    Abstract: 3D dense captioning is a task to localize objects in a 3D scene and generate descriptive sentences for each object. Recent approaches in 3D dense captioning have adopted transformer encoder-decoder frameworks from object detection to build an end-to-end pipeline without hand-crafted components. However, these approaches struggle with contradicting objectives where a single query attention has to s…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted to ACL 2024 Findings

  24. arXiv:2408.07416  [pdf, other]

    cs.CV

    Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space

    Authors: Hyunjee Lee, Youngsik Yun, Jeongmin Bae, Seoha Kim, Youngjung Uh

    Abstract: Understanding the 3D semantics of a scene is a fundamental problem for various scenarios such as embodied agents. While NeRFs and 3DGS excel at novel-view synthesis, previous methods for understanding their semantics have been limited to incomplete 3D understanding: their segmentation results are 2D masks and their supervision is anchored at 2D pixels. This paper revisits the problem set to pursue…

    Submitted 18 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: Project page: https://hyunji12.github.io/Open3DRF

  25. arXiv:2408.07372  [pdf, ps, other]

    stat.ML cs.LG stat.CO

    An Adaptive Importance Sampling for Locally Stable Point Processes

    Authors: Hee-Geon Kang, Sunggon Kim

    Abstract: The problem of finding the expected value of a statistic of a locally stable point process in a bounded region is addressed. We propose an adaptive importance sampling for solving the problem. In our proposal, we restrict the importance point process to the family of homogeneous Poisson point processes, which enables us to quickly generate independent samples of the importance point process. The o…

    Submitted 14 August, 2024; originally announced August 2024.

  26. arXiv:2408.06673  [pdf]

    cs.CL

    Pragmatic inference of scalar implicature by LLMs

    Authors: Ye-eun Cho, Seong mook Kim

    Abstract: This study investigates how Large Language Models (LLMs), particularly BERT (Devlin et al., 2019) and GPT-2 (Radford et al., 2019), engage in pragmatic inference of scalar implicature, such as "some". Two sets of experiments were conducted using cosine similarity and next sentence/token prediction as experimental methods. The results in experiment 1 showed that both models interpret "some" as pragmat…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: This research was presented at the Association for Computational Linguistics conference, held on August 11-16

  27. arXiv:2408.06043  [pdf, other]

    cs.CL cs.SD eess.AS

    Enhancing Dialogue Speech Recognition with Robust Contextual Awareness via Noise Representation Learning

    Authors: Wonjun Lee, San Kim, Gary Geunbae Lee

    Abstract: Recent dialogue systems rely on turn-based spoken interactions, requiring accurate Automatic Speech Recognition (ASR). Errors in ASR can significantly impact downstream dialogue tasks. To address this, using dialogue context from user and agent interactions for transcribing subsequent utterances has been proposed. This method incorporates the transcription of the user's speech and the agent's resp…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 11 pages, 2 figures, Accepted to SIGDIAL2024

  28. arXiv:2408.05749  [pdf, other]

    cs.CV cs.LG

    Efficient and Versatile Robust Fine-Tuning of Zero-shot Models

    Authors: Sungyeon Kim, Boseung Jeong, Donghyun Kim, Suha Kwak

    Abstract: Large-scale image-text pre-trained models enable zero-shot classification and provide consistent accuracy across various data distributions. Nonetheless, optimizing these models in downstream tasks typically requires fine-tuning, which reduces generalization to out-of-distribution (OOD) data and demands extensive computational resources. We introduce Robust Adapter (R-Adapter), a novel method for…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024

  29. arXiv:2408.05337  [pdf, other]

    cs.CV cs.AI

    VACoDe: Visual Augmented Contrastive Decoding

    Authors: Sihyeon Kim, Boryeong Cho, Sangmin Bae, Sumyeong Ahn, Se-Young Yun

    Abstract: Despite the astonishing performance of recent Large Vision-Language Models (LVLMs), these models often generate inaccurate responses. To address this issue, previous studies have focused on mitigating hallucinations by employing contrastive decoding (CD) with augmented images, which amplifies the contrast with the original image. However, these methods have limitations, including reliance on a sin…

    Submitted 26 July, 2024; originally announced August 2024.

    Comments: 10 pages, 7 figures

    MSC Class: 68T01; ACM Class: I.2.0

  30. arXiv:2408.04297  [pdf, other]

    cs.ET cs.HC

    Spatial Affordance-aware Interactable Subspace Allocation for Mixed Reality Telepresence

    Authors: Dooyoung Kim, Seonji Kim, Selin Choi, Woontack Woo

    Abstract: To enable remote Virtual Reality (VR) and Augmented Reality (AR) clients to collaborate as if they were in the same space during Mixed Reality (MR) telepresence, it is essential to overcome spatial heterogeneity and generate a unified shared collaborative environment by integrating remote spaces into a target host space. Especially when multiple remote users connect, a large shared space is necess…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted at the 2024 IEEE ISMAR Conference. 10 pages, 6 figures

  31. arXiv:2408.04278  [pdf, other]

    cs.CL

    LaDiMo: Layer-wise Distillation Inspired MoEfier

    Authors: Sungyoon Kim, Youngjun Kim, Kihyo Moon, Minsung Jang

    Abstract: The advent of large language models has revolutionized natural language processing, but their increasing complexity has led to substantial training costs, resource demands, and environmental impacts. In response, sparse Mixture-of-Experts (MoE) models have emerged as a promising alternative to dense models. Since training MoE models from scratch can be prohibitively expensive, recent studies have…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 21 pages, 10 figures

  32. HiQuE: Hierarchical Question Embedding Network for Multimodal Depression Detection

    Authors: Juho Jung, Chaewon Kang, Jeewoo Yoon, Seungbae Kim, Jinyoung Han

    Abstract: The utilization of automated depression detection significantly enhances early intervention for individuals experiencing depression. Despite numerous proposals on automated depression detection using recorded clinical interview videos, limited attention has been paid to considering the hierarchical structure of the interview questions. In clinical interviews for diagnosing depression, clinicians u…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 11 pages, 6 figures, Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24)

    Journal ref: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24), October 21-25, 2024, Boise, ID, USA

  33. arXiv:2408.03541  [pdf, ps, other]

    cs.CL cs.AI

    EXAONE 3.0 7.8B Instruction Tuned Language Model

    Authors: LG AI Research: Soyoung An, Kyunghoon Bae, Eunbi Choi, Stanley Jungkyu Choi, Yemuk Choi, Seokhee Hong, Yeonjung Hong, Junwon Hwang, Hyojin Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Yountae Jung, Euisoon Kim, Hyosang Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Youchul Kim, Edward Hwayoung Lee, Haeju Lee, et al. (14 additional authors not shown)

    Abstract: We introduce the EXAONE 3.0 instruction-tuned language model, the first open model in the family of Large Language Models (LLMs) developed by LG AI Research. Among different model sizes, we publicly release the 7.8B instruction-tuned model to promote open research and innovations. Through extensive evaluations across a wide range of public and in-house benchmarks, EXAONE 3.0 demonstrates highly compet…

    Submitted 13 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

  34. arXiv:2408.03322  [pdf, other]

    eess.IV cs.CV

    Segment Anything in Medical Images and Videos: Benchmark and Deployment

    Authors: Jun Ma, Sumin Kim, Feifei Li, Mohammed Baharoon, Reza Asakereh, Hongwei Lyu, Bo Wang

    Abstract: Recent advances in segmentation foundation models have enabled accurate and efficient segmentation across a wide range of natural images and videos, but their utility to medical data remains unclear. In this work, we first present a comprehensive benchmarking of the Segment Anything Model 2 (SAM2) across 11 medical image modalities and videos and point out its strengths and weaknesses by comparing…

    Submitted 6 August, 2024; originally announced August 2024.

  35. arXiv:2408.02888  [pdf, other]

    cs.CV cs.AI

    VizECGNet: Visual ECG Image Network for Cardiovascular Diseases Classification with Multi-Modal Training and Knowledge Distillation

    Authors: Ju-Hyeon Nam, Seo-Hyung Park, Su Jung Kim, Sang-Chul Lee

    Abstract: An electrocardiogram (ECG) captures the heart's electrical signal to assess various heart conditions. In practice, ECG data is stored as either digitized signals or printed images. Despite the emergence of numerous deep learning models for digitized signals, many hospitals prefer image storage due to cost considerations. Recognizing the unavailability of raw ECG signals in many clinical settings,…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted in International Conference on Image Processing (ICIP) 2024

  36. arXiv:2408.02662  [pdf, other]

    cs.RO eess.SY

    Integrating Model-Based Footstep Planning with Model-Free Reinforcement Learning for Dynamic Legged Locomotion

    Authors: Ho Jae Lee, Seungwoo Hong, Sangbae Kim

    Abstract: In this work, we introduce a control framework that combines model-based footstep planning with Reinforcement Learning (RL), leveraging desired footstep patterns derived from the Linear Inverted Pendulum (LIP) dynamics. Utilizing the LIP model, our method forward predicts robot states and determines the desired foot placement given the velocity commands. We then train an RL policy to track the foo…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 8 pages

  37. arXiv:2408.02295  [pdf, other]

    cs.LG cs.AI math.PR stat.ML

    Generalized Gaussian Temporal Difference Error For Uncertainty-aware Reinforcement Learning

    Authors: Seyeon Kim, Joonhun Lee, Namhoon Cho, Sungjun Han, Seungeon Baek

    Abstract: Conventional uncertainty-aware temporal difference (TD) learning methods often rely on simplistic assumptions, typically including a zero-mean Gaussian distribution for TD errors. Such oversimplification can lead to inaccurate error representations and compromised uncertainty estimation. In this paper, we introduce a novel framework for generalized Gaussian error modeling in deep reinforcement lea…

    Submitted 5 August, 2024; originally announced August 2024.

  38. arXiv:2408.01872  [pdf, other]

    cs.LG cs.AI cs.CV

    Safe Semi-Supervised Contrastive Learning Using In-Distribution Data as Positive Examples

    Authors: Min Gu Kwak, Hyungu Kahng, Seoung Bum Kim

    Abstract: Semi-supervised learning methods have shown promising results in solving many practical problems when only a few labels are available. The existing methods assume that the class distributions of labeled and unlabeled data are equal; however, their performances are significantly degraded in class distribution mismatch scenarios where out-of-distribution (OOD) data exist in the unlabeled data. Previ…

    Submitted 3 August, 2024; originally announced August 2024.

  39. arXiv:2408.01040  [pdf, other]

    cs.DC cs.CR cs.CV cs.LG

    Privacy-Preserving Split Learning with Vision Transformers using Patch-Wise Random and Noisy CutMix

    Authors: Seungeun Oh, Sihun Baek, Jihong Park, Hyelin Nam, Praneeth Vepakomma, Ramesh Raskar, Mehdi Bennis, Seong-Lyun Kim

    Abstract: In computer vision, the vision transformer (ViT) has increasingly superseded the convolutional neural network (CNN) for improved accuracy and robustness. However, ViT's large model sizes and high sample complexity make it difficult to train on resource-constrained edge devices. Split learning (SL) emerges as a viable solution, leveraging server-side resources to train ViTs while utilizing private…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 23 pages, 11 figures, 8 tables, to be published in Transactions on Machine Learning Research (TMLR)

  40. arXiv:2408.01024  [pdf, other]

    cs.AI

    Semantic Skill Grounding for Embodied Instruction-Following in Cross-Domain Environments

    Authors: Sangwoo Shin, Seunghyun Kim, Youngsoo Jang, Moontae Lee, Honguk Woo

    Abstract: In embodied instruction-following (EIF), the integration of pretrained language models (LMs) as task planners emerges as a significant branch, where tasks are planned at the skill level by prompting LMs with pretrained skills and user instructions. However, grounding these pretrained skills in different domains remains challenging due to their intricate entanglement with the domain-specific knowle…

    Submitted 20 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: Findings of ACL-2024 Camera Ready Version

  41. arXiv:2408.00351  [pdf, other]

    cs.CV

    Hierarchically Structured Neural Bones for Reconstructing Animatable Objects from Casual Videos

    Authors: Subin Jeon, In Cho, Minsu Kim, Woong Oh Cho, Seon Joo Kim

    Abstract: We propose a new framework for creating and easily manipulating 3D models of arbitrary objects using casually captured videos. Our core ingredient is a novel hierarchy deformation model, which captures motions of objects with tree-structured bones. Our hierarchy system decomposes motions based on the granularity and reveals the correlations between parts without exploiting any prior structural k…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ECCV 2024 accepted

  42. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  43. arXiv:2407.21622  [pdf, other]

    stat.ML cs.LG math.ST

    Extended Fiducial Inference: Toward an Automated Process of Statistical Inference

    Authors: Faming Liang, Sehwan Kim, Yan Sun

    Abstract: While fiducial inference was widely considered a big blunder by R.A. Fisher, the goal he initially set --`inferring the uncertainty of model parameters on the basis of observations' -- has been continually pursued by many statisticians. To this end, we develop a new statistical inference method called extended Fiducial inference (EFI). The new method achieves the goal of fiducial inference by leve…

    Submitted 31 July, 2024; originally announced July 2024.

  44. arXiv:2407.21448  [pdf, other]

    cs.CV

    Accelerating Image Super-Resolution Networks with Pixel-Level Classification

    Authors: Jinho Jeong, Jinwoo Kim, Younghyun Jo, Seon Joo Kim

    Abstract: In recent times, the need for effective super-resolution (SR) techniques has surged, especially for large-scale images ranging from 2K to 8K resolution. For DNN-based SISR, decomposing images into overlapping patches is typically necessary due to computational constraints. In such a patch-decomposing scheme, one can allocate computational resources differently based on each patch's difficulty to further…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  45. arXiv:2407.21075  [pdf, other]

    cs.AI cs.CL cs.LG

    Apple Intelligence Foundation Language Models

    Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, et al. (130 additional authors not shown)

    Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used…

    Submitted 29 July, 2024; originally announced July 2024.

  46. arXiv:2407.21032  [pdf, other]

    cs.CV cs.AI

    Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion

    Authors: Sanghyun Kim, Seohyeon Jung, Balhae Kim, Moonseok Choi, Jinwoo Shin, Juho Lee

    Abstract: This paper addresses the societal concerns arising from large-scale text-to-image diffusion models for generating potentially harmful or copyrighted content. Existing models rely heavily on internet-crawled data, wherein problematic concepts persist due to incomplete filtration processes. While previous approaches somewhat alleviate the issue, they often rely on text-specified concepts, introducin…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. 56 pages, 24 figures. Caution: This paper contains discussions and examples related to harmful content, including text and images. Reader discretion is advised. Code is available at https://github.com/nannullna/safeguard-hfi

  47. arXiv:2407.20806  [pdf, other]

    cs.AI cs.LG

    ARCLE: The Abstraction and Reasoning Corpus Learning Environment for Reinforcement Learning

    Authors: Hosung Lee, Sejin Kim, Seungpil Lee, Sanha Hwang, Jihwan Lee, Byung-Jun Lee, Sundong Kim

    Abstract: This paper introduces ARCLE, an environment designed to facilitate reinforcement learning research on the Abstraction and Reasoning Corpus (ARC). Addressing this inductive reasoning benchmark with reinforcement learning presents these challenges: a vast action space, a hard-to-reach goal, and a variety of tasks. We demonstrate that an agent with proximal policy optimization can learn individual ta…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by CoLLAs 2024, Project page: https://github.com/confeitoHS/arcle

  48. arXiv:2407.20643  [pdf]

    cs.CV

    Generalizing AI-driven Assessment of Immunohistochemistry across Immunostains and Cancer Types: A Universal Immunohistochemistry Analyzer

    Authors: Biagio Brattoli, Mohammad Mostafavi, Taebum Lee, Wonkyung Jung, Jeongun Ryu, Seonwook Park, Jongchan Park, Sergio Pereira, Seunghwan Shin, Sangjoon Choi, Hyojin Kim, Donggeun Yoo, Siraj M. Ali, Kyunghyun Paeng, Chan-Young Ock, Soo Ick Cho, Seokhwi Kim

    Abstract: Despite advancements in methodologies, immunohistochemistry (IHC) remains the most utilized ancillary test for histopathologic and companion diagnostics in targeted therapies. However, objective IHC assessment poses challenges. Artificial intelligence (AI) has emerged as a potential solution, yet its development requires extensive training for each cancer and IHC type, limiting versatility. We dev…

    Submitted 30 July, 2024; originally announced July 2024.

  49. arXiv:2407.20212  [pdf, other]

    cs.DC cs.CE quant-ph

    Distributed Quantum Approximate Optimization Algorithm on Integrated High-Performance Computing and Quantum Computing Systems for Large-Scale Optimization

    Authors: Seongmin Kim, Tengfei Luo, Eungkyu Lee, In-Saeng Suh

    Abstract: The quantum approximate optimization algorithm (QAOA) has shown promise for solving combinatorial optimization problems by providing quantum speedup on near-term gate-based quantum computing systems. However, QAOA faces challenges in optimizing variational parameters for high-dimensional problems due to the large number of qubits required and the complexity of deep circuits, which limit its scalabili…

    Submitted 29 July, 2024; originally announced July 2024.

  50. arXiv:2407.19681  [pdf, other]

    cs.RO cs.AI

    Motion Manifold Flow Primitives for Language-Guided Trajectory Generation

    Authors: Yonghyeon Lee, Byeongho Lee, Seungyeon Kim, Frank C. Park

    Abstract: Developing text-based robot trajectory generation models is made particularly difficult by the small dataset size, high dimensionality of the trajectory space, and the inherent complexity of the text-conditional motion distribution. Recent manifold learning-based methods have partially addressed the dimensionality and dataset size issues, but struggle with the complex text-conditional distribution…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 12 pages, 10 figures, under review