
Showing 1–50 of 538 results for author: Choi, Y

Searching in archive cs.
  1. arXiv:2408.15666  [pdf, other]

    cs.CL

    StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements

    Authors: Jillian Fisher, Skyler Hallinan, Ximing Lu, Mitchell Gordon, Zaid Harchaoui, Yejin Choi

    Abstract: Authorship obfuscation, rewriting a text to intentionally obscure the identity of the author, is an important but challenging task. Current methods using large language models (LLMs) lack interpretability and controllability, often ignoring author-specific stylistic features, resulting in less robust performance overall. To address this, we develop StyleRemix, an adaptive and interpretable obfus…

    Submitted 28 August, 2024; originally announced August 2024.

  2. arXiv:2408.13654  [pdf, other]

    cs.CL

    Symbolic Working Memory Enhances Language Models for Complex Rule Application

    Authors: Siyuan Wang, Zhongyu Wei, Yejin Choi, Xiang Ren

    Abstract: Large Language Models (LLMs) have shown remarkable reasoning performance but struggle with multi-step deductive reasoning involving a series of rule application steps, especially when rules are presented non-sequentially. Our preliminary analysis shows that while LLMs excel in single-step rule application, their performance drops significantly in multi-step scenarios due to the challenge in rule g…

    Submitted 24 August, 2024; originally announced August 2024.

  3. arXiv:2408.12894  [pdf, other]

    cs.CV

    FLoD: Integrating Flexible Level of Detail into 3D Gaussian Splatting for Customizable Rendering

    Authors: Yunji Seo, Young Sun Choi, Hyun Seung Son, Youngjung Uh

    Abstract: 3D Gaussian Splatting (3DGS) achieves fast and high-quality renderings by using numerous small Gaussians, which leads to significant memory consumption. This reliance on a large number of Gaussians restricts the application of 3DGS-based models on low-cost devices due to memory limitations. However, simply reducing the number of Gaussians to accommodate devices with less memory capacity leads to i…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Project page: https://3dgs-flod.github.io/flod.github.io/

  4. arXiv:2408.10937  [pdf, other]

    cs.HC

    Proxona: Leveraging LLM-Driven Personas to Enhance Creators' Understanding of Their Audience

    Authors: Yoonseo Choi, Eun Jeong Kang, Seulgi Choi, Min Kyung Lee, Juho Kim

    Abstract: Creators are nothing without their audience, and thereby understanding their audience is the cornerstone of their professional achievement. Yet many creators feel lost while comprehending audiences with existing tools, which offer insufficient insights for tailoring content to audience needs. To address the challenges creators face in understanding their audience, we present Proxona, a system for…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 32 pages (including 14 pages of Appendix)

  5. arXiv:2408.10923  [pdf, other]

    cs.CL cs.AI

    LBC: Language-Based-Classifier for Out-Of-Variable Generalization

    Authors: Kangjun Noh, Baekryun Seong, Hoyoon Byun, Youngjun Choi, Sungjin Song, Kyungwoo Song

    Abstract: Large Language Models (LLMs) have great success in natural language processing tasks such as response generation. However, their use in tabular data has been limited due to their inferior performance compared to traditional machine learning models (TMLs) such as XGBoost. We find that the pre-trained knowledge of LLMs enables them to interpret new variables that appear in a test without additional…

    Submitted 23 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

    Comments: 16 pages, 7 figures, 4 tables

  6. arXiv:2408.08872  [pdf, other]

    cs.CV cs.AI cs.CL

    xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

    Authors: Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, Yutong Dai, Michael S Ryoo, Shrikant Kendre, Jieyu Zhang, Can Qin, Shu Zhang, Chia-Chih Chen, Ning Yu, Juntao Tan, Tulika Manoj Awalgaonkar, Shelby Heinecke, Huan Wang, Yejin Choi, Ludwig Schmidt, Zeyuan Chen, Silvio Savarese, Juan Carlos Niebles, et al. (2 additional authors not shown)

    Abstract: This report introduces xGen-MM (also known as BLIP-3), a framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs. xGen-MM, short for xGen-MultiModal, expands the Salesforce xGen initiative on foundation AI models. Our models undergo rigorous evaluation across a range of tas…

    Submitted 28 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

  7. arXiv:2408.06672  [pdf, other]

    cs.LG cs.AI

    Leveraging Priors via Diffusion Bridge for Time Series Generation

    Authors: Jinseong Park, Seungyun Lee, Woojin Jeong, Yujin Choi, Jaewook Lee

    Abstract: Time series generation is widely used in real-world applications such as simulation, data augmentation, and hypothesis test techniques. Recently, diffusion models have emerged as the de facto approach for time series generation, emphasizing diverse synthesis scenarios based on historical or correlated time series data streams. Since time series have unique characteristics, such as fixed time order…

    Submitted 13 August, 2024; originally announced August 2024.

  8. arXiv:2408.05955  [pdf, other]

    cs.CV

    Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization

    Authors: Geuntaek Lim, Hyunwoo Kim, Joonsoo Kim, Yukyung Choi

    Abstract: Weakly supervised temporal action localization (WTAL) aims to detect action instances in untrimmed videos using only video-level annotations. Since many existing works optimize WTAL models based on action classification labels, they encounter the task discrepancy problem (i.e., localization-by-classification). To tackle this issue, recent studies have attempted to utilize action category names as…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted to ACM MM 2024

  9. arXiv:2408.04895  [pdf, other]

    cs.LG cs.AI

    Better Not to Propagate: Understanding Edge Uncertainty and Over-smoothing in Signed Graph Neural Networks

    Authors: Yoonhyuk Choi, Jiho Choi, Taewook Ko, Chong-Kwon Kim

    Abstract: Traditional Graph Neural Networks (GNNs) rely on network homophily, which can lead to performance degradation due to over-smoothing in many real-world heterophily scenarios. Recent studies analyze the smoothing effect (separability) after message-passing (MP), depending on the expectation of node features. Regarding separability gain, they provided theoretical backgrounds on over-smoothing caused…

    Submitted 25 August, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

  10. arXiv:2408.04376  [pdf, other]

    cs.LG

    Deep Reinforcement Learning for the Design of Metamaterial Mechanisms with Functional Compliance Control

    Authors: Yejun Choi, Yeoneung Kim, Keun Park

    Abstract: Metamaterial mechanisms are micro-architectured compliant structures that operate through the elastic deformation of specially designed flexible members. This study develops an efficient design methodology for compliant mechanisms using deep reinforcement learning (RL). For this purpose, design domains are digitized into finite cells with various hinge connections, and finite element analyses (FEA…

    Submitted 8 August, 2024; originally announced August 2024.

  11. arXiv:2408.03541  [pdf, ps, other]

    cs.CL cs.AI

    EXAONE 3.0 7.8B Instruction Tuned Language Model

    Authors: LG AI Research: Soyoung An, Kyunghoon Bae, Eunbi Choi, Stanley Jungkyu Choi, Yemuk Choi, Seokhee Hong, Yeonjung Hong, Junwon Hwang, Hyojin Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Yountae Jung, Euisoon Kim, Hyosang Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Youchul Kim, Edward Hwayoung Lee, Haeju Lee, et al. (14 additional authors not shown)

    Abstract: We introduce EXAONE 3.0 instruction-tuned language model, the first open model in the family of Large Language Models (LLMs) developed by LG AI Research. Among different model sizes, we publicly release the 7.8B instruction-tuned model to promote open research and innovations. Through extensive evaluations across a wide range of public and in-house benchmarks, EXAONE 3.0 demonstrates highly compet…

    Submitted 13 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

  12. arXiv:2408.00973  [pdf, other]

    stat.ML cs.LG math.ST

    META-ANOVA: Screening interactions for interpretable machine learning

    Authors: Yongchan Choi, Seokhun Park, Chanmoo Park, Dongha Kim, Yongdai Kim

    Abstract: There are two things to be considered when we evaluate predictive models. One is prediction accuracy, and the other is interpretability. Over the recent decades, many prediction models of high performance, such as ensemble-based models and deep neural networks, have been developed. However, these models are often too complex, making it difficult to intuitively interpret their predictions. This comp…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 26 pages

  13. arXiv:2407.18370  [pdf, other]

    cs.LG cs.CL

    Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

    Authors: Jaehun Jung, Faeze Brahman, Yejin Choi

    Abstract: We present a principled approach to provide LLM-based evaluation with a rigorous guarantee of human agreement. We first propose that a reliable evaluation method should not uncritically rely on model preferences for pairwise evaluation, but rather assess the confidence of judge models and selectively decide when to trust its judgement. We then show that under this selective evaluation framework, h…

    Submitted 25 July, 2024; originally announced July 2024.

  14. arXiv:2407.17468  [pdf, other]

    cs.CL cs.AI

    WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

    Authors: Wenting Zhao, Tanya Goyal, Yu Ying Chiu, Liwei Jiang, Benjamin Newman, Abhilasha Ravichander, Khyathi Chandu, Ronan Le Bras, Claire Cardie, Yuntian Deng, Yejin Choi

    Abstract: While hallucinations of large language models (LLMs) prevail as a major challenge, existing evaluation benchmarks on factuality do not cover the diverse domains of knowledge that the real-world users of LLMs seek information about. To bridge this gap, we introduce WildHallucinations, a benchmark that evaluates factuality. It does so by prompting LLMs to generate information about entities mined fr…

    Submitted 24 July, 2024; originally announced July 2024.

  15. arXiv:2407.16607  [pdf, other]

    cs.CL cs.LG

    Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

    Authors: Jonathan Hayase, Alisa Liu, Yejin Choi, Sewoong Oh, Noah A. Smith

    Abstract: The pretraining data of today's strongest language models is opaque; in particular, little is known about the proportions of various domains or languages represented. In this work, we tackle a task which we call data mixture inference, which aims to uncover the distributional make-up of training data. We introduce a novel attack based on a previously overlooked source of information -- byte-pair e…

    Submitted 24 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: 19 pages, 5 figures

  16. arXiv:2407.14733  [pdf, other]

    cs.LG cs.AI cs.CL

    Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL

    Authors: Yunseon Choi, Sangmin Bae, Seonghyun Ban, Minchan Jeong, Chuheng Zhang, Lei Song, Li Zhao, Jiang Bian, Kee-Eung Kim

    Abstract: With the advent of foundation models, prompt tuning has positioned itself as an important technique for directing model behaviors and eliciting desired responses. Prompt tuning regards selecting appropriate keywords included into the input, thereby adapting to the downstream task without adjusting or fine-tuning the model parameters. There is a wide range of work in prompt tuning, from approaches…

    Submitted 19 July, 2024; originally announced July 2024.

  17. arXiv:2407.13166  [pdf, other]

    cs.HC cs.IR

    Using LLMs to Investigate Correlations of Conversational Follow-up Queries with User Satisfaction

    Authors: Hyunwoo Kim, Yoonseo Choi, Taehyun Yang, Honggu Lee, Chaneon Park, Yongju Lee, Jin Young Kim, Juho Kim

    Abstract: With large language models (LLMs), conversational search engines shift how users retrieve information from the web by enabling natural conversations to express their search intents over multiple turns. Users' natural conversation embodies rich but implicit signals of users' search intents and evaluation of search results to understand user experience with the system. However, it is underexplored h…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to LLM4Eval @ SIGIR 2024 - The First Workshop on Large Language Models (LLMs) for Evaluation in Information Retrieval

  18. arXiv:2407.12043  [pdf, other]

    cs.CL cs.AI cs.HC

    The Art of Saying No: Contextual Noncompliance in Language Models

    Authors: Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi

    Abstract: Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work primarily focuses on refusal of "unsafe" queries, we posit that the scope of noncompliance should be broadened. We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should not comply with user requests. Our taxonomy spans a…

    Submitted 2 July, 2024; originally announced July 2024.

  19. arXiv:2407.11438  [pdf, other]

    cs.CL

    Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild

    Authors: Niloofar Mireshghallah, Maria Antoniak, Yash More, Yejin Choi, Golnoosh Farnadi

    Abstract: Measuring personal disclosures made in human-chatbot interactions can provide a better understanding of users' AI literacy and facilitate privacy research for large language models (LLMs). We run an extensive, fine-grained analysis on the personal disclosures made by real users to commercial GPT models, investigating the leakage of personally identifiable and sensitive information. To understand t…

    Submitted 19 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  20. arXiv:2407.07087  [pdf, other]

    cs.CL cs.LG

    CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation

    Authors: Tong Chen, Akari Asai, Niloofar Mireshghallah, Sewon Min, James Grimmelmann, Yejin Choi, Hannaneh Hajishirzi, Luke Zettlemoyer, Pang Wei Koh

    Abstract: Evaluating the degree of reproduction of copyright-protected content by language models (LMs) is of significant interest to the AI and legal communities. Although both literal and non-literal similarities are considered by courts when assessing the degree of reproduction, prior research has focused only on literal similarities. To bridge this gap, we introduce CopyBench, a benchmark designed to me…

    Submitted 9 July, 2024; originally announced July 2024.

  21. arXiv:2407.06004  [pdf, other]

    cs.CL

    Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models

    Authors: Chani Jung, Dongkwan Kim, Jiho Jin, Jiseon Kim, Yeon Seonwoo, Yejin Choi, Alice Oh, Hyunwoo Kim

    Abstract: While humans naturally develop theory of mind (ToM), the capability to understand other people's mental states and beliefs, state-of-the-art large language models (LLMs) underperform on simple ToM benchmarks. We posit that we can extend our understanding of LLMs' ToM abilities by evaluating key human ToM precursors -- perception inference and perception-to-belief inference -- in LLMs. We introduce…

    Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  22. arXiv:2407.03086  [pdf, other]

    cs.LG cs.AI cs.DC

    Effective Heterogeneous Federated Learning via Efficient Hypernetwork-based Weight Generation

    Authors: Yujin Shin, Kichang Lee, Sungmin Lee, You Rim Choi, Hyung-Sin Kim, JeongGil Ko

    Abstract: While federated learning leverages distributed client resources, it faces challenges due to heterogeneous client capabilities. This necessitates allocating models suited to clients' resources and careful parameter aggregation to accommodate this heterogeneity. We propose HypeMeFed, a novel federated learning framework for supporting client heterogeneity by combining a multi-exit network architectu…

    Submitted 3 July, 2024; originally announced July 2024.

  23. arXiv:2407.02273  [pdf, other]

    cs.CL

    Multilingual Trolley Problems for Language Models

    Authors: Zhijing Jin, Sydney Levine, Max Kleiman-Weiner, Giorgio Piatti, Jiarui Liu, Fernando Gonzalez Adauto, Francesco Ortu, András Strausz, Mrinmaya Sachan, Rada Mihalcea, Yejin Choi, Bernhard Schölkopf

    Abstract: As large language models (LLMs) are deployed in more and more real-world situations, it is crucial to understand their decision-making when faced with moral dilemmas. Inspired by a large-scale cross-cultural study of human moral preferences, "The Moral Machine Experiment", we set up the same set of moral choices for LLMs. We translate 1K vignettes of moral dilemmas, parametrically varied across ke…

    Submitted 2 July, 2024; originally announced July 2024.

  24. arXiv:2407.01942  [pdf, other]

    cs.AI cs.CL cs.CV

    Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness

    Authors: Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi

    Abstract: The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and furth…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 26 pages

  25. arXiv:2407.00369  [pdf, other]

    cs.CL

    How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models

    Authors: Jaeyoung Lee, Ximing Lu, Jack Hessel, Faeze Brahman, Youngjae Yu, Yonatan Bisk, Yejin Choi, Saadia Gabriel

    Abstract: Given the growing influx of misinformation across news and social media, there is a critical need for systems that can provide effective real-time verification of news claims. Large language or multimodal model based verification has been proposed to scale up online policing mechanisms for mitigating spread of false and harmful content. While these can potentially reduce burden on human fact-check…

    Submitted 29 June, 2024; originally announced July 2024.

  26. arXiv:2407.00337  [pdf, other]

    cs.CE cs.LG math.NA

    Physics-informed active learning with simultaneous weak-form latent space dynamics identification

    Authors: Xiaolong He, April Tran, David M. Bortz, Youngsoo Choi

    Abstract: The parametric greedy latent space dynamics identification (gLaSDI) framework has demonstrated promising potential for accurate and efficient modeling of high-dimensional nonlinear physical systems. However, it remains challenging to handle noisy data. To enhance robustness against noise, we incorporate the weak-form estimation of nonlinear dynamics (WENDy) into gLaSDI. In the proposed weak-form g…

    Submitted 20 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

  27. arXiv:2406.18510  [pdf, other]

    cs.CL

    WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

    Authors: Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, Nouha Dziri

    Abstract: We introduce WildTeaming, an automatic LLM safety red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics, and then composes multiple tactics for systematic exploration of novel jailbreaks. Compared to prior work that performed red-teaming via recruited human workers, gradient-based optimization, or iterative revision with…

    Submitted 26 June, 2024; originally announced June 2024.

  28. arXiv:2406.18495  [pdf, other]

    cs.CL

    WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

    Authors: Seungju Han, Kavel Rao, Allyson Ettinger, Liwei Jiang, Bill Yuchen Lin, Nathan Lambert, Yejin Choi, Nouha Dziri

    Abstract: We introduce WildGuard -- an open, light-weight moderation tool for LLM safety that achieves three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model responses, and (3) determining model refusal rate. Together, WildGuard serves the increasing needs for automatic safety moderation and evaluation of LLM interactions, providing a one-stop tool with enhanced a…

    Submitted 9 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: First two authors contributed equally. Third and fourth authors contributed equally

  29. arXiv:2406.15951  [pdf, other]

    cs.CL

    Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration

    Authors: Shangbin Feng, Taylor Sorensen, Yuhan Liu, Jillian Fisher, Chan Young Park, Yejin Choi, Yulia Tsvetkov

    Abstract: While existing alignment paradigms have been integral in developing large language models (LLMs), LLMs often learn an averaged human preference and struggle to model diverse preferences across cultures, demographics, and communities. We propose Modular Pluralism, a modular framework based on multi-LLM collaboration for pluralistic alignment: it "plugs into" a base LLM a pool of smaller but special…

    Submitted 22 June, 2024; originally announced June 2024.

  30. arXiv:2406.12909  [pdf, other]

    cs.LG physics.comp-ph

    Scalable Training of Graph Foundation Models for Atomistic Materials Modeling: A Case Study with HydraGNN

    Authors: Massimiliano Lupo Pasini, Jong Youl Choi, Kshitij Mehta, Pei Zhang, David Rogers, Jonghyun Bae, Khaled Z. Ibrahim, Ashwin M. Aji, Karl W. Schulz, Jorda Polo, Prasanna Balaprakash

    Abstract: We present our work on developing and training scalable graph foundation models (GFM) using HydraGNN, a multi-headed graph convolutional neural network architecture. HydraGNN expands the boundaries of graph neural network (GNN) in both training scale and data diversity. It abstracts over message passing algorithms, allowing both reproduction of and comparison across algorithmic innovations that de…

    Submitted 28 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 16 pages, 13 figures

    MSC Class: 68T07; 68T09 ACM Class: C.2.4; I.2.11

  31. arXiv:2406.11280  [pdf, other]

    cs.CV

    i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgment

    Authors: Daechul Ahn, Yura Choi, San Kim, Youngjae Yu, Dongyeop Kang, Jonghyun Choi

    Abstract: Aligning Video Large Multimodal Models (VLMMs) faces challenges such as modality misalignment and verbose responses. Although iterative approaches such as self-rewarding or iterative direct preference optimization (DPO) recently showed a significant improvement in language model alignment, particularly on reasoning tasks, self-aligned models applied to large video-language models often result in le…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Technical report

  32. arXiv:2406.11271  [pdf, other]

    cs.CV cs.LG

    MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

    Authors: Anas Awadalla, Le Xue, Oscar Lo, Manli Shu, Hannah Lee, Etash Kumar Guha, Matt Jordan, Sheng Shen, Mohamed Awadalla, Silvio Savarese, Caiming Xiong, Ran Xu, Yejin Choi, Ludwig Schmidt

    Abstract: Multimodal interleaved datasets featuring free-form interleaved sequences of images and text are crucial for training frontier large multimodal models (LMMs). Despite the rapid progression of open-source LMMs, there remains a pronounced scarcity of large-scale, diverse open-source multimodal interleaved datasets. In response, we introduce MINT-1T, the most extensive and diverse open-source Multimo…

    Submitted 30 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  33. arXiv:2406.11069  [pdf, other]

    cs.CV cs.AI cs.CL

    WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

    Authors: Yujie Lu, Dongfu Jiang, Wenhu Chen, William Yang Wang, Yejin Choi, Bill Yuchen Lin

    Abstract: Recent breakthroughs in vision-language models (VLMs) emphasize the necessity of benchmarking human preferences in real-world multimodal interactions. To address this gap, we launched WildVision-Arena (WV-Arena), an online platform that collects human preferences to evaluate VLMs. We curated WV-Bench by selecting 500 high-quality samples from 8,000 user submissions in WV-Arena. WV-Bench uses GPT-4…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: link: https://hf.co/spaces/WildVision/vision-arena

  34. arXiv:2406.09279  [pdf, other]

    cs.CL

    Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

    Authors: Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi

    Abstract: Learning from preference feedback has emerged as an essential step for improving the generation quality and performance of modern language models (LMs). Despite its widespread use, the way preference-based learning is applied varies wildly, with differing data, learning algorithms, and evaluations used, making disentangling the impact of each aspect difficult. In this work, we identify four core a…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Preprint

  35. arXiv:2406.08464  [pdf, other]

    cs.CL cs.AI

    Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

    Authors: Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha Poovendran, Yejin Choi, Bill Yuchen Lin

    Abstract: High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Link: https://magpie-align.github.io/

  36. arXiv:2406.06955  [pdf, other]

    cs.DC cs.IR cs.LG

    ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models

    Authors: Yujeong Choi, Jiin Kim, Minsoo Rhu

    Abstract: With the increasing popularity of recommendation systems (RecSys), the demand for compute resources in datacenters has surged. However, the model-wise resource allocation employed in current RecSys model serving architectures falls short in effectively utilizing resources, leading to sub-optimal total cost of ownership. We propose ElasticRec, a model serving architecture for RecSys providing resou…

    Submitted 11 June, 2024; originally announced June 2024.

    Journal ref: 51st IEEE/ACM International Symposium on Computer Architecture (ISCA-51), 2024

  37. arXiv:2406.06786  [pdf, other]

    cs.SD cs.AI eess.AS

    BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification

    Authors: June-Woo Kim, Miika Toikkanen, Yera Choi, Seoung-Eun Moon, Ho-Young Jung

    Abstract: Respiratory sound classification (RSC) is challenging due to varied acoustic signatures, primarily influenced by patient demographics and recording environments. To address this issue, we introduce a text-audio multimodal model that utilizes metadata of respiratory sounds, which provides useful complementary information for RSC. Specifically, we fine-tune a pretrained text-audio multimodal model u…

    Submitted 14 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted INTERSPEECH 2024

  38. arXiv:2406.05578  [pdf, other]

    cs.IR

    Prioritizing Potential Wetland Areas via Region-to-Region Knowledge Transfer and Adaptive Propagation

    Authors: Yoonhyuk Choi, Reepal Shah, John Sabo, K. Selcuk Candan, Huan Liu

    Abstract: Wetlands are important to communities, offering benefits ranging from water purification and flood protection to recreation and tourism. Therefore, identifying and prioritizing potential wetland areas is a critical decision problem. While data-driven solutions are feasible, this is complicated by significant data sparsity due to the low proportion of wetlands (3-6%) in many areas of interest in…

    Submitted 8 June, 2024; originally announced June 2024.

  39. arXiv:2406.04770  [pdf, other]

    cs.CL cs.AI

    WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

    Authors: Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu, Faeze Brahman, Abhilasha Ravichander, Valentina Pyatkin, Nouha Dziri, Ronan Le Bras, Yejin Choi

    Abstract: We introduce WildBench, an automated evaluation framework designed to benchmark large language models (LLMs) using challenging, real-world user queries. WildBench consists of 1,024 tasks carefully selected from over one million human-chatbot conversation logs. For automated evaluation with WildBench, we have developed two metrics, WB-Reward and WB-Score, which are computable using advanced LLMs su…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Link: https://hf.co/spaces/allenai/WildBench

  40. arXiv:2406.04678  [pdf, other]

    cs.CV

    ACE Metric: Advection and Convection Evaluation for Accurate Weather Forecasting

    Authors: Doyi Kim, Minseok Seo, Yeji Choi

    Abstract: Recently, data-driven weather forecasting methods have received significant attention for surpassing the RMSE performance of traditional NWP (Numerical Weather Prediction)-based methods. However, data-driven models are tuned to minimize the loss between forecasted data and ground truths, often using pixel-wise loss. This can lead to models that produce blurred outputs, which, despite being signifi…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 9 pages

  41. arXiv:2405.18093  [pdf, other]

    cs.DC cs.LG

    Pipette: Automatic Fine-grained Large Language Model Training Configurator for Real-World Clusters

    Authors: Jinkyu Yim, Jaeyong Song, Yerim Choi, Jaebeen Lee, Jaewon Jung, Hongsun Jang, Jinho Lee

    Abstract: Training large language models (LLMs) is known to be challenging because of the huge computational and memory capacity requirements. To address these issues, it is common to use a cluster of GPUs with 3D parallelism, which splits a model along the data batch, pipeline stage, and intra-layer tensor dimensions. However, the use of 3D parallelism produces the additional challenge of finding the optim…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: published at DATE 2024

  42. arXiv:2405.17821  [pdf, other]

    cs.CV cs.AI

    RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs

    Authors: Sangmin Woo, Jaehyuk Jang, Donguk Kim, Yubin Choi, Changick Kim

    Abstract: Recent advancements in Large Vision Language Models (LVLMs) have revolutionized how machines understand and generate textual responses based on visual inputs. Despite their impressive capabilities, they often produce "hallucinatory" outputs that do not accurately reflect the visual information, posing challenges in reliability and trustworthiness. Current methods such as contrastive decoding have…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page: https://sangminwoo.github.io/RITUAL/

  43. arXiv:2405.17820  [pdf, other]

    cs.CV cs.AI

    Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models

    Authors: Sangmin Woo, Donguk Kim, Jaehyuk Jang, Yubin Choi, Changick Kim

    Abstract: This study addresses the issue observed in Large Vision Language Models (LVLMs), where excessive attention on a few image tokens, referred to as blind tokens, leads to hallucinatory responses in tasks requiring fine-grained understanding of visual objects. We found that tokens receiving lower attention weights often hold essential information for identifying nuanced object details -- ranging from…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page: https://sangminwoo.github.io/AvisC/

  44. arXiv:2405.16301  [pdf, other]

    cs.CV cs.LG

    Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples

    Authors: Dae Ung Jo, Kyuewang Lee, JaeHo Chung, Jin Young Choi

    Abstract: Securing a sufficient amount of paired data is important to train an image-text retrieval (ITR) model, but collecting paired data is very expensive. To address this issue, in this paper, we propose an active learning algorithm for ITR that can collect paired data cost-efficiently. Previous studies assume that image-text pairs are given and their category labels are asked to the annotator. However,…

    Submitted 25 May, 2024; originally announced May 2024.

  45. arXiv:2405.15780  [pdf, other]

    cs.CV cs.LG

    Sequence Length Scaling in Vision Transformers for Scientific Images on Frontier

    Authors: Aristeidis Tsaris, Chengming Zhang, Xiao Wang, Junqi Yin, Siyan Liu, Moetasim Ashfaq, Ming Fan, Jong Youl Choi, Mohamed Wahib, Dan Lu, Prasanna Balaprakash, Feiyi Wang

    Abstract: Vision Transformers (ViTs) are pivotal for foundational models in scientific imagery, including Earth science applications, due to their capability to process large sequence lengths. While transformers for text has inspired scaling sequence lengths in ViTs, yet adapting these for ViTs introduces unique challenges. We develop distributed sequence parallelism for ViTs, enabling them to handle up to…

    Submitted 17 April, 2024; originally announced May 2024.

  46. arXiv:2405.14838  [pdf, other]

    cs.CL cs.AI cs.LG

    From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step

    Authors: Yuntian Deng, Yejin Choi, Stuart Shieber

    Abstract: When leveraging language models for reasoning tasks, generating explicit chain-of-thought (CoT) steps often proves essential for achieving high accuracy in final outputs. In this paper, we investigate if models can be taught to internalize these CoT steps. To this end, we propose a simple yet effective method for internalizing CoT steps: starting with a model trained for explicit CoT reasoning, we…

    Submitted 23 May, 2024; originally announced May 2024.

  47. arXiv:2405.07105  [pdf, other]

    cond-mat.mtrl-sci cs.AI cs.LG

    Overcoming systematic softening in universal machine learning interatomic potentials by fine-tuning

    Authors: Bowen Deng, Yunyeong Choi, Peichen Zhong, Janosh Riebesell, Shashwat Anand, Zhuohan Li, KyuJung Jun, Kristin A. Persson, Gerbrand Ceder

    Abstract: Machine learning interatomic potentials (MLIPs) have introduced a new paradigm for atomic simulations. Recent advancements have seen the emergence of universal MLIPs (uMLIPs) that are pre-trained on diverse materials datasets, providing opportunities for both ready-to-use universal force fields and robust foundations for downstream machine learning refinements. However, their performance in extrap…

    Submitted 11 May, 2024; originally announced May 2024.

  48. arXiv:2405.03958  [pdf, other]

    cs.CV cs.AI cs.LG

    Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

    Authors: Joo Young Choi, Jaesung R. Park, Inkyu Park, Jaewoong Cho, Albert No, Ernest K. Ryu

    Abstract: Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations to the convolutional la…

    Submitted 6 May, 2024; originally announced May 2024.

  49. arXiv:2405.02367  [pdf, other]

    cs.LG cs.CV

    Enhancing Social Media Post Popularity Prediction with Visual Content

    Authors: Dahyun Jeong, Hyelim Son, Yunjin Choi, Keunwoo Kim

    Abstract: Our study presents a framework for predicting image-based social media content popularity that focuses on addressing complex image information and a hierarchical data structure. We utilize the Google Cloud Vision API to effectively extract key image and color information from users' postings, achieving 6.8% higher accuracy compared to using non-image covariates alone. For prediction, we explore a…

    Submitted 8 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Report number: JKSS-D-23-00299R1

  50. arXiv:2405.01470  [pdf, other]

    cs.CL

    WildChat: 1M ChatGPT Interaction Logs in the Wild

    Authors: Wenting Zhao, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, Yuntian Deng

    Abstract: Chatbots such as GPT-4 and ChatGPT are now serving millions of users. Despite their widespread use, there remains a lack of public datasets showcasing how these tools are used by a population of users in practice. To bridge this gap, we offered free access to ChatGPT for online users in exchange for their affirmative, consensual opt-in to anonymously collect their chat transcripts and request head…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: accepted by ICLR 2024