-
Mixed Non-linear Quantization for Vision Transformers
Authors:
Gihwan Kim,
Jemin Lee,
Sihyeong Park,
Yongin Kwon,
Hyungshin Kim
Abstract:
The majority of quantization methods have been proposed to reduce the model size of Vision Transformers, yet most of them have overlooked the quantization of non-linear operations. Only a few works have addressed quantization for non-linear operations, but they applied a single quantization method across all non-linear operations. We believe that this can be further improved by employing a different quantization method for each non-linear operation. Therefore, to assign the most error-minimizing quantization method from the known methods to each non-linear layer, we propose a mixed non-linear quantization that considers layer-wise quantization sensitivity measured by the SQNR difference metric. The results show that our method outperforms I-BERT, FQ-ViT, and I-ViT in both 8-bit and 6-bit settings for ViT, DeiT, and Swin models by an average of 0.6%p and 19.6%p, respectively. Our method outperforms I-BERT and I-ViT by 0.6%p and 20.8%p, respectively, when training time is limited. We plan to release our code at https://gitlab.com/ones-ai/mixed-non-linear-quantization.
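As a rough illustration of selecting a quantization method per layer by SQNR, the sketch below scores candidate approximations of one non-linear layer against a float reference and keeps the one with the highest signal-to-quantization-noise ratio. It is not the authors' implementation: the abstract does not define the "SQNR difference" metric, so plain SQNR in dB is used here, and the helper names (`fake_quantize`, `sqnr_db`, `select_method`) and the two candidate "methods" (8-bit and 4-bit uniform quantization of a GELU output) are hypothetical stand-ins.

```python
import math
import random


def gelu(x):
    """Tanh approximation of GELU, used here as the float reference op."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def fake_quantize(values, num_bits):
    """Symmetric uniform quantize-dequantize (a hypothetical candidate method)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def sqnr_db(reference, approx):
    """SQNR in dB: 10 * log10(signal power / quantization-noise power)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


def select_method(reference, candidates):
    """Return the candidate name with the highest SQNR, plus all scores."""
    scores = {name: sqnr_db(reference, out) for name, out in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


random.seed(0)
inputs = [random.gauss(0.0, 1.0) for _ in range(1000)]
reference = [gelu(x) for x in inputs]

# Two stand-in candidates for a single non-linear layer (assumed, not from the paper).
candidates = {
    "method_a_8bit": fake_quantize(reference, 8),
    "method_b_4bit": fake_quantize(reference, 4),
}
best, scores = select_method(reference, candidates)
print(best, scores)
```

Repeating this selection for each non-linear layer would yield a mixed assignment of methods; how the paper turns SQNR values into its sensitivity measure is not stated in the abstract.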
Submitted 25 July, 2024;
originally announced July 2024.
-
First Demonstration of HZO/beta-Ga2O3 Ferroelectric FinFET with Improved Memory Window
Authors:
Seohyeon Park,
Jaewook Yoo,
Hyeojun Song,
Hongseung Lee,
Seongbin Lim,
Soyeon Kim,
Minah Park,
Bongjoong Kim,
Keun Heo,
Peide D. Ye,
Hagyoul Bae
Abstract:
We have experimentally demonstrated the effectiveness of beta-gallium oxide (beta-Ga2O3) ferroelectric fin field-effect transistors (Fe-FinFETs) for the first time. Atomic layer deposited (ALD) hafnium zirconium oxide (HZO) is used as the ferroelectric layer. The HZO/beta-Ga2O3 Fe-FinFETs have wider counterclockwise hysteresis loops in the transfer characteristics than those of conventional planar FETs, achieving a record-high memory window (MW) of 13.9 V in a single HZO layer. When normalized to the actual channel width, FinFETs show an improved ION/IOFF ratio of 2.3x10^7 and a subthreshold swing value of 110 mV/dec. The enhanced characteristics are attributed to the low interface state density (Dit), showing good interface properties between the beta-Ga2O3 and HZO layer. The enhanced polarization due to larger electric fields across the entire ferroelectric layer in FinFETs is validated using Sentaurus TCAD. After 5x10^6 program/erase (PGM/ERS) cycles, the MW was maintained at 9.2 V, and the retention time was measured up to 3x10^4 s with low degradation. Therefore, the ultrawide bandgap (UWBG) Fe-FinFET was shown to be one of the promising candidates for high-density non-volatile memory devices.
Submitted 25 July, 2024;
originally announced July 2024.
-
Determination of $|V_{ub}|$ from simultaneous measurements of untagged $B^0\toπ^- \ell^+ ν_{\ell}$ and $B^+\toρ^0 \ell^+ν_{\ell}$ decays
Authors:
Belle II Collaboration,
I. Adachi,
L. Aggarwal,
H. Aihara,
N. Akopov,
A. Aloisio,
N. Althubiti,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
S. Bansal,
M. Barrett,
J. Baudot,
M. Bauer,
A. Baur,
A. Beaubien
, et al. (395 additional authors not shown)
Abstract:
We present a measurement of $|V_{ub}|$ from a simultaneous study of the charmless semileptonic decays $B^0\toπ^- \ell^+ ν_{\ell}$ and $B^+\toρ^0 \ell^+ν_{\ell}$, where $\ell = e, μ$. This measurement uses a data sample of 387 million $B\overline{B}$ meson pairs recorded by the Belle II detector at the SuperKEKB electron-positron collider between 2019 and 2022. The two decays are reconstructed without identifying the partner $B$ mesons. We simultaneously measure the differential branching fractions of $B^0\toπ^- \ell^+ ν_{\ell}$ and $B^+\toρ^0 \ell^+ν_{\ell}$ decays as functions of $q^2$ (momentum transfer squared). From these, we obtain total branching fractions $B(B^0\toπ^- \ell^+ ν_{\ell}) = (1.516 \pm 0.042 (\mathrm{stat}) \pm 0.059 (\mathrm{syst})) \times 10^{-4}$ and $B(B^+\toρ^0 \ell^+ν_{\ell}) = (1.625 \pm 0.079 (\mathrm{stat}) \pm 0.180 (\mathrm{syst})) \times 10^{-4}$. By fitting the measured $B^0\toπ^- \ell^+ ν_{\ell}$ partial branching fractions as functions of $q^2$, together with constraints on the non-perturbative hadronic contribution from lattice QCD calculations, we obtain $|V_{ub}| = (3.93 \pm 0.09 \pm 0.13 \pm 0.19) \times 10^{-3}$. Here, the first uncertainty is statistical, the second is systematic, and the third is theoretical.
Submitted 24 July, 2024;
originally announced July 2024.
-
Quantile Learn-Then-Test: Quantile-Based Risk Control for Hyperparameter Optimization
Authors:
Amirmohammad Farzaneh,
Sangwoo Park,
Osvaldo Simeone
Abstract:
The increasing adoption of Artificial Intelligence (AI) in engineering problems calls for the development of calibration methods capable of offering robust statistical reliability guarantees. The calibration of black box AI models is carried out via the optimization of hyperparameters dictating architecture, optimization, and/or inference configuration. Prior work has introduced learn-then-test (LTT), a calibration procedure for hyperparameter optimization (HPO) that provides statistical guarantees on average performance measures. Recognizing the importance of controlling risk-aware objectives in engineering contexts, this work introduces a variant of LTT that is designed to provide statistical guarantees on quantiles of a risk measure. We illustrate the practical advantages of this approach by applying the proposed algorithm to a radio access scheduling problem.
Submitted 24 July, 2024;
originally announced July 2024.
-
Learning Networked Dynamical System Models with Weak Form and Graph Neural Networks
Authors:
Yin Yu,
Daning Huang,
Seho Park,
Herschel C. Pangborn
Abstract:
This paper presents a sequence of two approaches for the data-driven control-oriented modeling of networked systems, i.e., systems that involve many interacting dynamical components. First, a novel deep learning approach named the weak Latent Dynamics Model (wLDM) is developed for learning generic nonlinear dynamics with control. Leveraging the weak form, the wLDM enables more numerically stable and computationally efficient training as well as more accurate prediction, when compared to conventional methods such as neural ordinary differential equations. Building upon the wLDM framework, we propose the weak Graph Koopman Bilinear Form (wGKBF) model, which integrates geometric deep learning and Koopman theory to learn latent space dynamics for networked systems, especially for the challenging cases having multiple timescales. The effectiveness of the wLDM framework and wGKBF model is demonstrated on three example systems of increasing complexity: a controlled double pendulum, the stiff Brusselator dynamics, and an electrified aircraft energy system. These numerical examples show that the wLDM and wGKBF achieve superior predictive accuracy and training efficiency as compared to baseline models. Parametric studies provide insights into the effects of hyperparameters in the weak form. The proposed framework shows the capability to efficiently capture control-dependent dynamics in these systems, including stiff dynamics and multi-physics interactions, offering a promising direction for learning control-oriented models of complex networked systems.
Submitted 23 July, 2024;
originally announced July 2024.
-
Optical alignment of contamination-sensitive Far-Ultraviolet spectrographs for Aspera SmallSat mission
Authors:
Aafaque R. Khan,
Erika Hamden,
Haeun Chung,
Heejoo Choi,
Daewook Kim,
Nicole Melso,
Keri Hoadley,
Carlos J. Vargas,
Daniel Truong,
Elijah Garcia,
Bill Verts,
Fernando Coronado,
Jamison Noenickx,
Jason Corliss,
Hannah Tanquary,
Tom Mcmahon,
Dave Hamara,
Simran Agarwal,
Ramona Augustin,
Peter Behroozi,
Harrison Bradley,
Trenton Brendel,
Joe Burchett,
Jasmine Martinez Castillo,
Jacob Chambers
, et al. (26 additional authors not shown)
Abstract:
Aspera is a NASA Astrophysics Pioneers SmallSat mission designed to study diffuse OVI emission from the warm-hot phase gas in the halos of nearby galaxies. Its payload consists of two identical Rowland Circle-type long-slit spectrographs, sharing a single MicroChannel plate detector. Each spectrograph channel consists of an off-axis parabola primary mirror and a toroidal diffraction grating optimized for the 1013-1057 Angstroms bandpass. Despite the simple configuration, the optical alignment/integration process for Aspera is challenging due to tight optical alignment tolerances, driven by the compact form factor, and the contamination sensitivity of the Far-Ultraviolet optics and detectors. In this paper, we discuss implementing a novel multi-phase approach to meet these requirements using state-of-the-art optical metrology tools. For coarsely positioning the optics we use a blue-laser 3D scanner while the fine alignment is done with a Zygo interferometer and a custom computer-generated hologram. The detector focus requires iterative in-vacuum alignment using a Vacuum UV collimator. The alignment is done in a controlled cleanroom facility at the University of Arizona.
Submitted 22 July, 2024;
originally announced July 2024.
-
Harmful Suicide Content Detection
Authors:
Kyumin Park,
Myung Jae Baik,
YeongJun Hwang,
Yen Shin,
HoJae Lee,
Ruda Lee,
Sang Min Lee,
Je Young Hannah Sun,
Ah Rah Lee,
Si Yeun Yoon,
Dong-ho Lee,
Jihyung Moon,
JinYeong Bak,
Kyunghyun Cho,
Jong-Woo Paik,
Sungjoon Park
Abstract:
Harmful suicide content on the Internet is a significant risk factor inducing suicidal thoughts and behaviors among vulnerable populations. Despite global efforts, existing resources are insufficient, specifically in high-risk regions like the Republic of Korea. Current research mainly focuses on understanding negative effects of such content or suicide risk in individuals, rather than on automatically detecting the harmfulness of content. To fill this gap, we introduce a harmful suicide content detection task for classifying online suicide content into five harmfulness levels. We develop a multi-modal benchmark and a task description document in collaboration with medical professionals, and leverage large language models (LLMs) to explore efficient methods for moderating such content. Our contributions include proposing a novel detection task, a multi-modal Korean benchmark with expert annotations, and suggesting strategies using LLMs to detect illegal and harmful content. Owing to the potential harm involved, we publicize our implementations and benchmark, incorporating an ethical verification process.
Submitted 2 June, 2024;
originally announced July 2024.
-
A Multi-Messenger Search for Exotic Field Emission with a Global Magnetometer Network
Authors:
Sami S. Khamis,
Ibrahim A. Sulai,
Paul Hamilton,
S. Afach,
B. C. Buchler,
D. Budker,
N. L. Figueroa,
R. Folman,
D. Gavilán-Martín,
M. Givon,
Z. D. Grujić,
H. Guo,
M. P. Hedges,
D. F. Jackson Kimball,
D. Kim,
E. Klinger,
T. Kornack,
A. Kryemadhi,
N. Kukowski,
G. Lukasiewicz,
H. Masia-Roig,
M. Padniuk,
C. A. Palm,
S. Y. Park,
X. Peng
, et al. (16 additional authors not shown)
Abstract:
We present an analysis method to search for exotic low-mass field (ELF) bursts generated during large energy astrophysical events such as supernovae, binary black hole or binary neutron star mergers, and fast radio bursts using the Global Network of Optical Magnetometers for Exotic physics searches (GNOME). In our model, the associated gravitational waves or electromagnetic signals herald the arrival of the ELF burst that interacts via coupling to the spin of fermions in the magnetometers. This enables GNOME to serve as a tool for multi-messenger astronomy. The algorithm employs a model-agnostic excess-power method to identify network-wide candidate events to be subjected to a model-dependent generalized likelihood-ratio test to determine their statistical significance. We perform the first search with this technique on GNOME data coincident with the binary black hole merger S200311bg detected by LIGO/Virgo on the 11th of March 2020 and find no significant events. We place the first lab-based limits on combinations of ELF production and coupling parameters.
Submitted 18 July, 2024;
originally announced July 2024.
-
Simultaneous Localization and Affordance Prediction for Tasks in Egocentric Video
Authors:
Zachary Chavis,
Hyun Soo Park,
Stephen J. Guy
Abstract:
Vision-Language Models (VLMs) have shown great success as foundational models for downstream vision and natural language applications in a variety of domains. However, these models lack the spatial understanding necessary for robotics applications where the agent must reason about the affordances provided by the 3D world around them. We present a system which trains on spatially-localized egocentric videos in order to connect visual input and task descriptions to predict a task's spatial affordance, that is the location where a person would go to accomplish the task. We show our approach outperforms the baseline of using a VLM to map similarity of a task's description over a set of location-tagged images. Our learning-based approach has less error both on predicting where a task may take place and on predicting what tasks are likely to happen at the current location. The resulting system enables robots to use egocentric sensing to navigate to physical locations of novel tasks specified in natural language.
Submitted 18 July, 2024;
originally announced July 2024.
-
Development of MMC-based lithium molybdate cryogenic calorimeters for AMoRE-II
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
H. Bae,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
S. Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Y. M. Gavrilyuk,
A. M. Gezhaev
, et al. (84 additional authors not shown)
Abstract:
The AMoRE collaboration searches for neutrinoless double beta decay of $^{100}$Mo using molybdate scintillating crystals via low temperature thermal calorimetric detection. The early phases of the experiment, AMoRE-pilot and AMoRE-I, have demonstrated competitive discovery potential. Presently, the AMoRE-II experiment, featuring a large detector array with about 90 kg of $^{100}$Mo isotope, is under construction. This paper discusses the baseline design and characterization of the lithium molybdate cryogenic calorimeters to be used in the AMoRE-II detector modules. The results from prototype setups that incorporate new housing structures and two different crystal masses (316 g and 517 - 521 g), operated at 10 mK temperature, show energy resolutions (FWHM) of 7.55 - 8.82 keV at the 2.615 MeV $^{208}$Tl $γ$ line, and effective light detection of 0.79 - 0.96 keV/MeV. The simultaneous heat and light detection enables clear separation of alpha particles with a discrimination power of 12.37 - 19.50 at the energy region around $^6$Li(n, $α$)$^3$H with Q-value = 4.785 MeV. Promising detector performances were demonstrated at temperatures as high as 30 mK, which relaxes the temperature constraints for operating the large AMoRE-II array.
Submitted 16 July, 2024;
originally announced July 2024.
-
MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation
Authors:
Xiaoshuai Hao,
Ruikai Li,
Hui Zhang,
Dingzhe Li,
Rong Yin,
Sangil Jung,
Seung-In Park,
ByungIn Yoo,
Haimei Zhao,
Jing Zhang
Abstract:
Online high-definition (HD) map construction is an important and challenging task in autonomous driving. Recently, there has been a growing interest in cost-effective multi-view camera-based methods without relying on other sensors like LiDAR. However, these methods suffer from a lack of explicit depth information, necessitating the use of large models to achieve satisfactory performance. To address this, we employ the Knowledge Distillation (KD) idea for efficient HD map construction for the first time and introduce a novel KD-based approach called MapDistill to transfer knowledge from a high-performance camera-LiDAR fusion model to a lightweight camera-only model. Specifically, we adopt the teacher-student architecture, i.e., a camera-LiDAR fusion model as the teacher and a lightweight camera model as the student, and devise a dual BEV transform module to facilitate cross-modal knowledge distillation while maintaining cost-effective camera-only deployment. Additionally, we present a comprehensive distillation scheme encompassing cross-modal relation distillation, dual-level feature distillation, and map head distillation. This approach alleviates knowledge transfer challenges between modalities, enabling the student model to learn improved feature representations for HD map construction. Experimental results on the challenging nuScenes dataset demonstrate the effectiveness of MapDistill, surpassing existing competitors by over 7.7 mAP or 4.5X speedup.
Submitted 16 July, 2024;
originally announced July 2024.
-
Geometric additivity of modular commutator for multipartite entanglement
Authors:
Sung-Min Park,
Isaac H. Kim,
Eun-Gook Moon
Abstract:
A recent surge of research in many-body quantum entanglement has uncovered intriguing properties of quantum many-body systems. A prime example is the modular commutator, which can extract a topological invariant from a single wave function. Here, we unveil novel geometric properties of many-body entanglement via a modular commutator of two-dimensional gapped quantum many-body systems. We obtain the geometric additivity of a modular commutator, indicating that modular commutator for a multipartite system may be an integer multiple of the one for tripartite systems. Using our additivity formula, we also derive a curious identity for the modular commutators involving disconnected intervals in a certain class of conformal field theories. We further illustrate this geometric additivity for both bulk and edge subsystems using numerical calculations of the Haldane and $π$-flux models.
Submitted 25 July, 2024; v1 submitted 15 July, 2024;
originally announced July 2024.
-
SPIN: SE(3)-Invariant Physics Informed Network for Binding Affinity Prediction
Authors:
Seungyeon Choi,
Sangmin Seo,
Sanghyun Park
Abstract:
Accurate prediction of protein-ligand binding affinity is crucial for rapid and efficient drug development. Recently, the importance of predicting binding affinity has led to increased attention on research that models the three-dimensional structure of protein-ligand complexes using graph neural networks to predict binding affinity. However, traditional methods often fail to accurately model the complex's spatial information or rely solely on geometric features, neglecting the principles of protein-ligand binding. This can lead to overfitting, resulting in models that perform poorly on independent datasets and ultimately reducing their usefulness in real drug development. To address this issue, we propose SPIN, a model designed to achieve superior generalization by incorporating various inductive biases applicable to this task, beyond merely training on empirical data from datasets. For prediction, we defined two types of inductive biases: a geometric perspective that maintains consistent binding affinity predictions regardless of the complex's rotations and translations, and a physicochemical perspective that necessitates minimal binding free energy along their reaction coordinate for effective protein-ligand binding. These prior knowledge inputs enable SPIN to outperform comparative models in benchmark sets such as CASF-2016 and CSAR HiQ. Furthermore, we demonstrated the practicality of our model through virtual screening experiments and validated the reliability and potential of our proposed model based on experiments assessing its interpretability.
Submitted 10 July, 2024;
originally announced July 2024.
-
Kinetic Typography Diffusion Model
Authors:
Seonmi Park,
Inhwan Bae,
Seunghyun Shin,
Hae-Gon Jeon
Abstract:
This paper introduces a method for realistic kinetic typography that generates user-preferred animatable 'text content'. We draw on recent advances in guided video diffusion models to achieve visually-pleasing text appearances. To do this, we first construct a kinetic typography dataset, comprising about 600K videos. Our dataset is made from a variety of combinations in 584 templates designed by professional motion graphics designers and involves changing each letter's position, glyph, and size (e.g., flying, glitches, chromatic aberration, reflecting effects). Next, we propose a video diffusion model for kinetic typography. For this, there are three requirements: aesthetic appearances, motion effects, and readable letters. This paper identifies the requirements. For this, we present static and dynamic captions used as spatial and temporal guidance of a video diffusion model, respectively. The static caption describes the overall appearance of the video, such as colors, texture, and glyph, which represents the shape of each letter. The dynamic caption accounts for the movements of letters and backgrounds. We add one more guidance with zero convolution to determine which text content should be visible in the video. We apply the zero convolution to the text content, and impose it on the diffusion model. Lastly, our glyph loss, which minimizes only the difference between the predicted word and its ground truth, is proposed to make the predicted letters readable. Experiments show that our model generates kinetic typography videos with legible and artistic letter motions based on text prompts.
Submitted 15 July, 2024;
originally announced July 2024.
-
Measurement of $CP$ asymmetries in $B^0 \to K^0_S π^0 γ$ decays at Belle II
Authors:
Belle II Collaboration,
I. Adachi,
L. Aggarwal,
H. Ahmed,
H. Aihara,
N. Akopov,
A. Aloisio,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
S. Bansal,
M. Barrett,
J. Baudot,
A. Baur,
A. Beaubien,
F. Becherer
, et al. (414 additional authors not shown)
Abstract:
We report measurements of time-dependent $CP$ asymmetries in $B^0 \to K^0_S π^0 γ$ decays based on a data sample of $(388\pm6)\times10^6$ $B\bar{B}$ events collected at the $Υ(4S)$ resonance with the Belle II detector. The Belle II experiment operates at the SuperKEKB asymmetric-energy $e^+e^-$ collider. We measure decay-time distributions to determine $CP$-violating parameters $S$ and $C$. We determine these parameters for two ranges of $K^0_S π^0$ invariant mass: $m(K^0_S π^0)\in (0.8, 1.0)$ $GeV/c^2$, which is dominated by $B^0 \to K^{*0} (\to K^0_S π^0) γ$ decays, and a complementary region $m(K^0_S π^0)\in (0.6, 0.8)\cup(1.0, 1.8)$ $GeV/c^2$. Our results have improved precision as compared to previous measurements and are consistent with theory predictions.
Submitted 12 July, 2024;
originally announced July 2024.
-
Measurement of branching fractions, CP asymmetry, and isospin asymmetry for $\boldsymbol{B\rightarrowργ}$ decays using Belle and Belle II data
Authors:
Belle II Collaboration,
I. Adachi,
K. Adamczyk,
L. Aggarwal,
H. Aihara,
N. Akopov,
A. Aloisio,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
S. Bansal,
M. Barrett,
J. Baudot,
A. Baur,
A. Beaubien,
F. Becherer
, et al. (385 additional authors not shown)
Abstract:
We present measurements of $B^{+}\rightarrowρ^{+}γ$ and $B^{0}\rightarrowρ^{0}γ$ decays using a combined data sample of $772 \times 10^6$ $B\overline{B}$ pairs collected by the Belle experiment and $387\times 10^6$ $B\overline{B}$ pairs collected by the Belle II experiment in $e^{+}e^{-}$ collisions at the $Υ(4S)$ resonance. After an optimized selection, a simultaneous fit to the Belle and Belle II data sets yields $114\pm 12$ $B^{+}\rightarrowρ^{+}γ$ and $99\pm 12$ $B^{0}\rightarrowρ^{0}γ$ decays. The measured branching fractions are $(13.1^{+2.0 +1.3}_{-1.9 -1.2})\times 10^{-7}$ and $(7.5\pm 1.3^{+1.0}_{-0.8})\times 10^{-7}$ for $B^{+}\rightarrowρ^{+}γ$ and $B^{0}\rightarrowρ^{0}γ$ decays, respectively, where the first uncertainty is statistical and the second is systematic. We also measure the isospin asymmetry $A_{\rm I}(B\rightarrowργ)=(10.9^{+11.2 +7.8}_{-11.7 -7.3})\%$ and the direct CP asymmetry $A_{CP}(B^{+}\rightarrowρ^{+}γ)=(-8.2\pm 15.2^{+1.6}_{-1.2})\%$.
Submitted 12 July, 2024;
originally announced July 2024.
-
Centrality dependence of Lévy-stable two-pion Bose-Einstein correlations in $\sqrt{s_{_{NN}}}=200$ GeV Au$+$Au collisions
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
R. Akimoto,
H. Al-Ta'ani,
J. Alexander,
A. Angerami,
K. Aoki,
N. Apadula,
Y. Aramaki,
H. Asano,
E. C. Aschenauer,
E. T. Atomssa,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
B. Bannier,
K. N. Barish,
B. Bassalleck,
S. Bathe
, et al. (377 additional authors not shown)
Abstract:
The PHENIX experiment measured the centrality dependence of two-pion Bose-Einstein correlation functions in $\sqrt{s_{_{NN}}}=200$~GeV Au$+$Au collisions at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory. The data are well represented by Lévy-stable source distributions. The extracted source parameters are the correlation-strength parameter $λ$, the Lévy index of stability $α$, and the Lévy-scale parameter $R$ as a function of transverse mass $m_T$ and centrality. The $λ(m_T)$ parameter is constant at larger values of $m_T$, but decreases as $m_T$ decreases. The Lévy scale parameter $R(m_T)$ decreases with $m_T$ and exhibits proportionality to the length scale of the nuclear overlap region. The Lévy exponent $α(m_T)$ is independent of $m_T$ within uncertainties in each investigated centrality bin, but shows a clear centrality dependence. At all centralities, the Lévy exponent $α$ is significantly different from that of Gaussian ($α=2$) or Cauchy ($α=1$) source distributions. Comparisons to the predictions of Monte-Carlo simulations of resonance-decay chains show that in all but the most peripheral centrality class (50%-60%), the obtained results are inconsistent with the measurements, unless a significant reduction of the in-medium mass of the $η'$ meson is included. In each centrality class, the best value of the in-medium $η'$ mass is compared to the mass of the $η$ meson, as well as to several theoretical predictions that consider restoration of $U_A(1)$ symmetry in hot hadronic matter.
Submitted 11 July, 2024;
originally announced July 2024.
-
Safe-Embed: Unveiling the Safety-Critical Knowledge of Sentence Encoders
Authors:
Jinseok Kim,
Jaewon Jung,
Sangyeop Kim,
Sohyung Park,
Sungzoon Cho
Abstract:
Despite the impressive capabilities of Large Language Models (LLMs) in various tasks, their vulnerability to unsafe prompts remains a critical issue. These prompts can lead LLMs to generate responses on illegal or sensitive topics, posing a significant threat to their safe and ethical use. Existing approaches attempt to address this issue using classification models, but they have several drawbacks. With the increasing complexity of unsafe prompts, similarity search-based techniques that identify specific features of unsafe prompts provide a more robust and effective solution to this evolving problem. This paper investigates the potential of sentence encoders to distinguish safe from unsafe prompts, and the ability to classify various unsafe prompts according to a safety taxonomy. We introduce new pairwise datasets and the Categorical Purity (CP) metric to measure this capability. Our findings reveal both the effectiveness and limitations of existing sentence encoders, proposing directions to improve sentence encoders to operate as more robust safety detectors. Our code is available at https://github.com/JwdanielJung/Safe-Embed.
Submitted 9 July, 2024;
originally announced July 2024.
-
Implicit Regression in Subspace for High-Sensitivity CEST Imaging
Authors:
Chu Chen,
Yang Liu,
Se Weon Park,
Jizhou Li,
Kannie W. Y. Chan,
Raymond H. F. Chan
Abstract:
Chemical Exchange Saturation Transfer (CEST) MRI can significantly enhance the detection of low-concentration proteins and metabolites through exchangeable protons. The clinical application of CEST, however, is constrained by the low contrast and low signal-to-noise ratio (SNR) of the acquired data. Denoising, as one of the post-processing stages for CEST data, can effectively improve the accuracy of CEST quantification. In this work, by modeling spatially variant z-spectra in a low-dimensional subspace, we introduce Implicit Regression in Subspace (IRIS), an unsupervised denoising algorithm that exploits the continuous-mapping property of implicit neural representations. Experiments conducted on both synthetic and in-vivo data demonstrate that our proposed method surpasses other CEST denoising methods in both qualitative and quantitative performance.
Submitted 9 July, 2024;
originally announced July 2024.
-
Learning Equilibrium with Estimated Payoffs in Population Games
Authors:
Shinkyu Park
Abstract:
We study a multi-agent decision problem in population games, where agents select from multiple available strategies and continually revise their selections based on the payoffs associated with these strategies. Unlike conventional population game formulations, we consider a scenario where agents must estimate the payoffs through local measurements and communication with their neighbors. By employing task allocation games -- dynamic extensions of conventional population games -- we examine how errors in payoff estimation by individual agents affect the convergence of the strategy revision process. Our main contribution is an analysis of how estimation errors impact the convergence of the agents' strategy profile to equilibrium. Based on the analytical results, we propose a design for a time-varying strategy revision rate to guarantee convergence. Simulation studies illustrate how the proposed method for updating the revision rate facilitates convergence to equilibrium.
Submitted 8 July, 2024;
originally announced July 2024.
-
Improved limit on neutrinoless double beta decay of $^{100}$Mo from AMoRE-I
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Y. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (83 additional authors not shown)
Abstract:
AMoRE searches for the signature of neutrinoless double beta decay of $^{100}$Mo with a 100 kg sample of enriched $^{100}$Mo. Scintillating molybdate crystals coupled with a metallic magnetic calorimeter operate at milli-Kelvin temperatures to measure the energy of electrons emitted in the decay. As a demonstration of the full-scale AMoRE, we conducted AMoRE-I, a pre-experiment with 18 molybdate crystals, at the Yangyang Underground Laboratory for over two years. The exposure was 8.02 kg$\cdot$year (or 3.89 kg$_{\mathrm{^{100}Mo}}\cdot$year) and the total background rate near the Q-value was 0.025 $\pm$ 0.002 counts/keV/kg/year. We observed no indication of $0νββ$ decay and report a new lower limit of the half-life of $^{100}$Mo $0νββ$ decay as $ T^{0ν}_{1/2}>3.0\times10^{24}~\mathrm{years}$ at 90\% confidence level. The effective Majorana mass limit range is $m_{ββ}<$(210--610) meV using nuclear matrix elements estimated in the framework of different models, including the recent shell model calculations.
Submitted 8 July, 2024;
originally announced July 2024.
-
Rethinking Image Skip Connections in StyleGAN2
Authors:
Seung Park,
Yong-Goo Shin
Abstract:
Various models based on StyleGAN have gained significant traction in the field of image synthesis, attributed to their robust training stability and superior performances. Within the StyleGAN framework, the adoption of image skip connection is favored over the traditional residual connection. However, this preference is just based on empirical observations; there has not been any in-depth mathematical analysis on it yet. To rectify this situation, this brief aims to elucidate the mathematical meaning of the image skip connection and introduce a groundbreaking methodology, termed the image squeeze connection, which significantly improves the quality of image synthesis. Specifically, we analyze the image skip connection technique to reveal its problem and introduce the proposed method which not only effectively boosts the GAN performance but also reduces the required number of network parameters. Extensive experiments on various datasets demonstrate that the proposed method consistently enhances the performance of state-of-the-art models based on StyleGAN. We believe that our findings represent a vital advancement in the field of image synthesis, suggesting a novel direction for future research and applications.
Submitted 7 July, 2024;
originally announced July 2024.
-
Search for the baryon number and lepton number violating decays $τ^-\to Λπ^-$ and $τ^-\to \barΛπ^-$ at Belle II
Authors:
Belle II Collaboration,
I. Adachi,
L. Aggarwal,
H. Ahmed,
H. Aihara,
N. Akopov,
A. Aloisio,
N. Althubiti,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
S. Bansal,
M. Barrett,
J. Baudot,
A. Baur,
A. Beaubien
, et al. (349 additional authors not shown)
Abstract:
We present a search for the baryon number $B$ and lepton number $L$ violating decays $τ^- \rightarrow Λπ^-$ and $τ^- \rightarrow \barΛ π^-$ produced from the $e^+e^-\to τ^+τ^-$ process, using a 364 fb$^{-1}$ data sample collected by the Belle~II experiment at the SuperKEKB collider. No evidence of signal is found in either decay mode, which have $|Δ(B-L)|$ equal to $2$ and $0$, respectively. Upper limits at 90\% credibility level on the branching fractions of $τ^- \rightarrow Λπ^-$ and $τ^- \rightarrow \barΛπ^-$ are determined to be $4.7 \times 10^{-8}$ and $4.3 \times 10^{-8}$, respectively.
Submitted 6 July, 2024;
originally announced July 2024.
-
Spectroscopy of deeply bound orbitals in neutron-rich Ca isotopes
Authors:
P. J. Li,
J. Lee,
P. Doornenbal,
S. Chen,
S. Wang,
A. Obertelli,
Y. Chazono,
J. D. Holt,
B. S. Hu,
K. Ogata,
Y. Utsuno,
K. Yoshida,
N. L. Achouri,
H. Baba,
F. Browne,
D. Calvet,
F. Château,
N. Chiga,
A. Corsi,
M. L. Cortés,
A. Delbart,
J-M. Gheller,
A. Giganon,
A. Gillibert,
C. Hilaire
, et al. (63 additional authors not shown)
Abstract:
The calcium isotopes are an ideal system to investigate the evolution of shell structure and magic numbers. Although the properties of surface nucleons in calcium have been well studied, probing the structure of deeply bound nucleons remains a challenge. Here, we report on the first measurement of unbound states in $^{53}$Ca and $^{55}$Ca, populated from $^{54,56}$Ca($p,pn$) reactions at a beam energy of around 216 MeV/nucleon at the RIKEN Radioactive Isotopes Beam Factory. The resonance properties, partial cross sections, and momentum distributions of these unbound states were analyzed. Orbital angular momentum $l$ assignments were extracted from momentum distributions based on calculations using the distorted wave impulse approximation (DWIA) reaction model. The resonances at excitation energies of 5516(41) keV in $^{53}$Ca and 6000(250) keV in $^{55}$Ca indicate a significant $l=3$ component, providing the first experimental evidence for the $ν0f_{7/2}$ single-particle strength of unbound hole states in the neutron-rich Ca isotopes. The observed excitation energies and cross-sections point towards extremely localized and well separated strength distributions, with some fragmentation for the $ν0f_{7/2}$ orbital in $^{55}$Ca. These results are in good agreement with predictions from shell-model calculations using the effective GXPF1Bs interaction and \textit{ab initio} calculations and diverge markedly from the experimental distributions in the nickel isotones at $Z=28$.
Submitted 5 July, 2024;
originally announced July 2024.
-
Evidence of $h_{b}(\text{2P}) \to Υ(\text{1S})η$ decay and search for $h_{b}(\text{1P,2P}) \to Υ(\text{1S})π^0$ with the Belle detector
Authors:
Belle Collaboration,
E. Kovalenko,
I. Adachi,
H. Aihara,
D. M. Asner,
T. Aushev,
R. Ayad,
V. Babu,
Sw. Banerjee,
K. Belous,
J. Bennett,
M. Bessner,
T. Bilka,
D. Biswas,
A. Bobrov,
D. Bodrov,
A. Bondar,
A. Bozek,
M. Bračko,
P. Branchini,
T. E. Browder,
A. Budano,
M. Campajola,
M. -C. Chang,
B. G. Cheon
, et al. (142 additional authors not shown)
Abstract:
We report the first evidence for the $h_{b}(\text{2P}) \to Υ(\text{1S})η$ transition with a significance of $3.5$ standard deviations. The decay branching fraction is measured to be $\mathcal{B}[h_{b}(\text{2P}) \to Υ(\text{1S})η]=(7.1 ~^{+3.7} _{-3.2}\pm 0.8)\times10^{-3}$, which is noticeably smaller than expected. We also set upper limits on $π^0$ transitions of $\mathcal{B}[h_{b}(\text{2P}) \to Υ(\text{1S})π^0] < 1.8\times10^{-3}$, and $\mathcal{B}[h_{b}(\text{1P})\to Υ(\text{1S})π^0] < 1.8\times10^{-3}$, at the $90\%$ confidence level. These results are obtained with a $131.4$~fb$^{-1}$ data sample collected near the $Υ(\text{5S})$ resonance with the Belle detector at the KEKB asymmetric-energy $e^+e^-$ collider.
Submitted 4 July, 2024;
originally announced July 2024.
-
Dimensionality Engineering of Magnetic Anisotropy from Anomalous Hall Effect in Synthetic SrRuO3 Crystals
Authors:
Seung Gyo Jeong,
Seong Won Cho,
Sehwan Song,
Jin Young Oh,
Do Gyeom Jeong,
Gyeongtak Han,
Hu Young Jeong,
Ahmed Yousef Mohamed,
Woo-suk Noh,
Sungkyun Park,
Jong Seok Lee,
Suyoun Lee,
Young-Min Kim,
Deok-Yong Cho,
Woo Seok Choi
Abstract:
Magnetic anisotropy in atomically thin correlated heterostructures is essential for exploring quantum magnetic phases for next-generation spintronics. Whereas previous studies have mostly focused on van der Waals systems, here we investigate the impact of the dimensionality of epitaxially grown correlated oxides, down to the monolayer limit, on structural, magnetic, and orbital anisotropies. By designing oxide superlattices with correlated ferromagnetic SrRuO3 and nonmagnetic SrTiO3 layers, we observe modulated ferromagnetic behavior as the SrRuO3 thickness changes. In particular, for three-unit-cell-thick layers, we observe a significant 1,500% improvement of the coercive field in the anomalous Hall effect, which cannot be solely attributed to the dimensional crossover in ferromagnetism. The atomic-scale heterostructures further reveal the systematic modulation of anisotropy for the lattice structure and orbital hybridization, explaining the enhanced magnetic anisotropy. Our findings provide valuable insights into engineering the anisotropic hybridization of synthetic magnetic crystals, offering a tunable spin order for various applications.
Submitted 3 July, 2024;
originally announced July 2024.
-
Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment
Authors:
Janghwan Lee,
Seongmin Park,
Sukjin Hong,
Minsoo Kim,
Du-Seong Chang,
Jungwook Choi
Abstract:
The rapid advancement of large language models (LLMs) has facilitated their transformation into conversational chatbots that can grasp contextual nuances and generate pertinent sentences, closely mirroring human values through advanced techniques such as instruction tuning and reinforcement learning from human feedback (RLHF). However, the computational efficiency required for LLMs, achieved through techniques like post-training quantization (PTQ), presents challenges such as token-flipping that can impair chatbot performance. In response, we propose a novel preference alignment approach, quantization-aware direct preference optimization (QDPO), that aligns quantized LLMs with their full-precision counterparts, improving conversational abilities. Evaluated on two instruction-tuned LLMs in various languages, QDPO demonstrated superior performance in improving conversational abilities compared to established PTQ and knowledge-distillation fine-tuning techniques, marking a significant step forward in the development of efficient and effective conversational LLMs.
Submitted 18 July, 2024; v1 submitted 3 July, 2024;
originally announced July 2024.
-
MentalAgora: A Gateway to Advanced Personalized Care in Mental Health through Multi-Agent Debating and Attribute Control
Authors:
Yeonji Lee,
Sangjun Park,
Kyunghyun Cho,
JinYeong Bak
Abstract:
As mental health issues escalate globally, there is a tremendous need for advanced digital support systems. We introduce MentalAgora, a novel framework employing large language models enhanced by interaction between multiple agents for tailored mental health support. This framework operates through three stages: strategic debating, tailored counselor creation, and response generation, enabling the dynamic customization of responses based on individual user preferences and therapeutic needs. We conduct experiments utilizing TherapyTalk, a high-quality evaluation dataset crafted with mental health professionals, showing that MentalAgora generates expert-aligned and user preference-enhanced responses. Our evaluations, including experiments and user studies, demonstrate that MentalAgora aligns with professional standards and effectively meets user preferences, setting a new benchmark for digital mental health interventions.
Submitted 2 July, 2024;
originally announced July 2024.
-
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Authors:
Khyathi Raghavi Chandu,
Linjie Li,
Anas Awadalla,
Ximing Lu,
Jae Sung Park,
Jack Hessel,
Lijuan Wang,
Yejin Choi
Abstract:
The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and further explore finer categories within each. Based on this taxonomy, we synthesize a benchmark dataset, CertainlyUncertain, featuring 178K visual question answering (VQA) samples as contrastive pairs. This is achieved by 1) inpainting images to turn previously answerable questions into unanswerable ones; and 2) using image captions to prompt large language models for both answerable and unanswerable questions. Additionally, we introduce a new metric, confidence-weighted accuracy, that is well correlated with both accuracy and calibration error, to address the shortcomings of existing metrics.
Submitted 2 July, 2024;
originally announced July 2024.
-
Pointwise estimates of the Bergman kernel with an exponential weight on the unit ball
Authors:
Hong Rae Cho,
Soohyun Park
Abstract:
We consider the weighted Bergman space $A^2_ψ(\mathbb{B}_n)$ of all holomorphic functions on the unit ball $\mathbb{B}_n$ that are square integrable with respect to a particular exponential weight measure $e^{-ψ} dV$ on $\mathbb{B}_n$, where \begin{align*} ψ(z):=\frac{1}{1-|z|^2}. \end{align*} We prove the following estimate for the Bergman kernel $K_ψ(z,w)$ of $A^2_ψ(\mathbb{B}_n)$: \begin{align*} |K_ψ(z,w)|^2\le C\frac{e^{ψ(z)+ψ(w)}}{{\rm Vol}(B_ψ(z,1)){\rm Vol}(B_ψ(w, 1))}e^{-\varepsilon d_ψ(z,w)}, \quad z, w\in\mathbb{B}_n, \end{align*} where $d_ψ$ is the Riemannian distance induced by the potential function $ψ$ and $B_ψ(z,1)$ is the $d_ψ$-ball of center $z$ and radius $1$. The result is motivated by Christ \cite{Chr}.
Submitted 1 July, 2024;
originally announced July 2024.
-
Measurement of the integrated luminosity of data samples collected during 2019-2022 by the Belle II experiment
Authors:
The Belle II Collaboration,
I. Adachi,
L. Aggarwal,
H. Ahmed,
J. K. Ahn,
H. Aihara,
N. Akopov,
A. Aloisio,
N. Althubiti,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
M. Barrett,
J. Baudot,
A. Baur,
A. Beaubien
, et al. (382 additional authors not shown)
Abstract:
A series of data samples was collected with the Belle II detector at the SuperKEKB collider from March 2019 to June 2022. We determine the integrated luminosities of these data samples using three distinct methodologies involving Bhabha ($e^+e^- \to e^+e^-(nγ)$), digamma ($e^+e^- \to γγ(nγ)$), and dimuon ($e^+e^- \to μ^+ μ^- (nγ)$) events. The total integrated luminosity obtained with Bhabha, digamma, and dimuon events is (426.52 $\pm$ 0.03 $\pm$ 2.48)~fb$^{-1}$, (427.32 $\pm$ 0.03 $\pm$ 2.56)~fb$^{-1}$, and (424.84 $\pm$ 0.04 $\pm$ 3.88)~fb$^{-1}$, where the first uncertainties are statistical and the second are systematic. The resulting total integrated luminosity obtained from the combination of the three methods is (426.88 $\pm$ 1.93)~fb$^{-1}$.
Submitted 1 July, 2024;
originally announced July 2024.
-
Study of $χ_{bJ}(2P)\toωΥ(1S)$ at Belle
Authors:
Belle Collaboration,
Z. S. Stottler,
T. K. Pedlar,
B. G. Fulsom,
I. Adachi,
K. Adamczyk,
H. Aihara,
S. Al Said,
D. M. Asner,
H. Atmacan,
T. Aushev,
R. Ayad,
V. Babu,
Sw. Banerjee,
M. Bauer,
P. Behera,
K. Belous,
J. Bennett,
F. Bernlochner,
M. Bessner,
T. Bilka,
D. Biswas,
A. Bobrov,
D. Bodrov,
G. Bonvicini
, et al. (157 additional authors not shown)
Abstract:
We report a study of the hadronic transitions $χ_{bJ}(2P)\toωΥ(1S)$, with $ω\toπ^{+}π^{-}π^{0}$, using $28.2\times10^6~Υ(3S)$ mesons recorded by the Belle detector. We present the first evidence for the near-threshold transition $χ_{b0}(2P)\toωΥ(1S)$, the analog of the charm sector decay $χ_{c1}(3872)\toωJ/ψ$, with a branching fraction of $B\big(χ_{b0}(2P)\toωΥ(1S)\big) = \big(0.55\pm0.19\pm0.07\big)\%$. We also obtain branching fractions of $B\big(χ_{b1}(2P)\toωΥ(1S)\big) = \big(2.39{}^{+0.20}_{-0.19}\pm0.24\big)\%$ and $B\big(χ_{b2}(2P)\toωΥ(1S)\big) = \big(0.47{}^{+0.13}_{-0.12}\pm0.06\big)\%$, confirming the measurement of the $ω$ transitions of the $J=1,2~P$-wave states. The ratio for the $J=2$ to $J=1$ transitions is also measured and found to differ by 3.3 standard deviations from the expected value in the QCD multipole expansion.
Submitted 8 July, 2024; v1 submitted 30 June, 2024;
originally announced July 2024.
-
Investigating How Large Language Models Leverage Internal Knowledge to Perform Complex Reasoning
Authors:
Miyoung Ko,
Sue Hyun Park,
Joonsuk Park,
Minjoon Seo
Abstract:
Despite significant advancements, there is a limited understanding of how large language models (LLMs) utilize knowledge for reasoning. To address this, we propose a method that deconstructs complex real-world questions into a graph, representing each question as a node with parent nodes of background knowledge needed to solve the question. We develop the DepthQA dataset, deconstructing questions into three depths: (i) recalling conceptual knowledge, (ii) applying procedural knowledge, and (iii) analyzing strategic knowledge. Based on a hierarchical graph, we quantify forward discrepancy, discrepancies in LLMs' performance on simpler sub-problems versus complex questions. We also measure backward discrepancy, where LLMs answer complex questions but struggle with simpler ones. Our analysis shows that smaller models have more discrepancies than larger models. Additionally, guiding models from simpler to complex questions through multi-turn interactions improves performance across model sizes, highlighting the importance of structured intermediate steps in knowledge reasoning. This work enhances our understanding of LLM reasoning and suggests ways to improve their problem-solving abilities.
Submitted 27 June, 2024;
originally announced June 2024.
-
Finite size scaling of the Kuramoto model at criticality
Authors:
Su-Chan Park,
Hyunggyu Park
Abstract:
The asymptotic scaling behavior of the Kuramoto model with finite populations has been notably elusive, despite comprehensive investigations employing both analytical and numerical methods. In this study, we explore the Kuramoto model with "deterministic" sampling of natural frequencies using extensive numerical simulations, and we report the asymptotic values of the finite-size scaling (FSS) exponents, which deviate significantly from the values previously reported in the literature. Additionally, we observe that these exponents are sensitive to the specifics of the sampling method. We discuss the origins of this variability through the self-consistent theory of the entrained oscillators.
Submitted 27 June, 2024;
originally announced June 2024.
-
GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
Authors:
Byungsoo Jeon,
Mengdi Wu,
Shiyi Cao,
Sunghyun Kim,
Sunghyun Park,
Neeraj Aggarwal,
Colin Unger,
Daiyaan Arfeen,
Peiyuan Liao,
Xupeng Miao,
Mohammad Alizadeh,
Gregory R. Ganger,
Tianqi Chen,
Zhihao Jia
Abstract:
Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only consider sequential pipeline stages and thus ignore the topology of a DNN, resulting in missed model-parallel opportunities. This paper presents graph pipeline parallelism (GPP), a new pipeline-parallel scheme that partitions a DNN into pipeline stages whose dependencies are identified by a directed acyclic graph. GPP generalizes existing sequential pipeline parallelism and preserves the inherent topology of a DNN to enable concurrent execution of computationally-independent operators, resulting in reduced memory requirement and improved GPU performance. In addition, we develop GraphPipe, a distributed system that exploits GPP strategies to enable performant and scalable DNN training. GraphPipe partitions a DNN into a graph of stages, optimizes micro-batch schedules for these stages, and parallelizes DNN training using the discovered GPP strategies. Evaluation on a variety of DNNs shows that GraphPipe outperforms existing pipeline-parallel systems such as PipeDream and Piper by up to 1.6X. GraphPipe also reduces the search time by 9-21X compared to PipeDream and Piper.
Submitted 24 June, 2024;
originally announced June 2024.
-
Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks
Authors:
Gyu Seon Kim,
Yeryeong Cho,
Jaehyun Chung,
Soohyun Park,
Soyi Jung,
Zhu Han,
Joongheon Kim
Abstract:
Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings by cooperatively providing global access sustainability and energy efficiency. However, as the number of CubeSats and HALE-UAVs increases, the scheduling dimension of each ground station (GS) increases. As a result, each GS can fall into the curse of dimensionality, and this challenge becomes one major hurdle for efficient global access. Therefore, this paper provides a quantum multi-agent reinforcement learning (QMARL)-based method for scheduling between GSs and CubeSats/HALE-UAVs in order to improve global access availability and energy efficiency. The main reason why the QMARL-based scheduler can be beneficial is that the algorithm facilitates a logarithmic-scale reduction in scheduling action dimensions, which is one critical feature as the number of CubeSats and HALE-UAVs expands. Additionally, individual GSs have different traffic demands depending on their locations and characteristics, so it is essential to provide differentiated access services. The superiority of the proposed scheduler is validated through data-intensive experiments in realistic CubeSat/HALE-UAV settings.
Submitted 24 June, 2024;
originally announced June 2024.
-
An antiferromagnetic diode effect in even-layered MnBi2Te4
Authors:
Anyuan Gao,
Shao-Wen Chen,
Barun Ghosh,
Jian-Xiang Qiu,
Yu-Fei Liu,
Yugo Onishi,
Chaowei Hu,
Tiema Qian,
Damien Bérubé,
Thao Dinh,
Houchen Li,
Christian Tzschaschel,
Seunghyun Park,
Tianye Huang,
Shang-Wei Lien,
Zhe Sun,
Sheng-Chin Ho,
Bahadur Singh,
Kenji Watanabe,
Takashi Taniguchi,
David C. Bell,
Arun Bansil,
Hsin Lin,
Tay-Rong Chang,
Amir Yacoby
, et al. (4 additional authors not shown)
Abstract:
In a PN junction, the separation between positive and negative charges leads to diode transport. In the past few years, the intrinsic diode transport in noncentrosymmetric polar conductors has attracted great interest, because it suggests novel nonlinear applications and provides a symmetry-sensitive probe of the Fermi surface. Recently, such studies have been extended to noncentrosymmetric superconductors, realizing the superconducting diode effect. Here, we show that, even in a centrosymmetric crystal without directional charge separation, the spins of an antiferromagnet (AFM) can generate a spatial directionality, leading to an AFM diode effect. We observe large second-harmonic transport in a nonlinear electronic device enabled by the compensated AFM state of even-layered MnBi2Te4. We also report a novel electrical sum-frequency generation (SFG), which has been rarely explored in contrast to the well-known optical SFG in wide-gap insulators. We demonstrate that the AFM enables an in-plane field-effect transistor and harvesting of wireless electromagnetic energy. The electrical SFG establishes a powerful method to study nonlinear electronics built by quantum materials. The AFM diode effect paves the way for potential device concepts including AFM logic circuits, self-powered AFM spintronics, and other applications that potentially bridge nonlinear electronics with AFM spintronics.
Submitted 24 June, 2024;
originally announced June 2024.
-
Time transformation between the solar system barycenter and the surfaces of the Earth and Moon
Authors:
Slava G. Turyshev,
James G. Williams,
Dale H. Boggs,
Ryan S. Park
Abstract:
The transformation of time between the surface of the Earth, the solar system barycenter, and the surface of the Moon involves relativistic corrections. For solar system Barycentric Dynamical Time (TDB), we also require that there be no rate difference between Terrestrial Time (TT) and TDB. The IAU has addressed these transformations with several resolutions. A series of robotic and crewed landings on the Moon are planned. The analogous transformation between TDB and time on the surface of the Moon (TL) needs a review and discussion. In this paper, we compute the rate terms involved in that transformation. We also present the TDB-compatible spatial scale and Lorentz contraction of Moon-centered positional coordinates. These transformations have been implemented in the JPL programs used to generate ephemerides of the Moon and planets. Finally, we provide expressions that can be used to synchronize TT and TL using either TDB or TT. The relevant transformations contain a small secular drift between the two time scales, along with additional small periodic terms that can be numerically evaluated using the solar system and lunar ephemerides.
Submitted 7 July, 2024; v1 submitted 23 June, 2024;
originally announced June 2024.
-
Search for charmed baryons in the $Λ_c^+η$ system and measurement of the branching fractions of $Λ_c(2880)^+$ and $Λ_c(2940)^+$ decaying to $Λ_c^+η$ and $pD^0$ relative to $Σ_c(2455)π$
Authors:
Belle Collaboration,
S. X. Li,
C. P. Shen,
I. Adachi,
J. K. Ahn,
H. Aihara,
D. M. Asner,
H. Atmacan,
T. Aushev,
R. Ayad,
Sw. Banerjee,
K. Belous,
J. Bennett,
M. Bessner,
T. Bilka,
D. Biswas,
D. Bodrov,
A. Bozek,
M. Bračko,
P. Branchini,
T. E. Browder,
A. Budano,
M. Campajola,
M. -C. Chang,
B. G. Cheon
, et al. (102 additional authors not shown)
Abstract:
We search for excited charmed baryons in the $Λ_c^+η$ system using a data sample corresponding to an integrated luminosity of 980 $\rm fb^{-1}$. The data were collected by the Belle detector at the KEKB $e^{+}$$e^{-}$ asymmetric-energy collider. No significant signals are found in the $Λ_c^+η$ mass spectrum, including the known $Λ_c(2880)^+$ and $Λ_c(2940)^+$. Clear $Λ_c(2880)^+$ and $Λ_c(2940)^+$ signals are observed in the $pD^0$ mass spectrum. We set upper limits at 90\% credibility level on ratios of branching fractions of $Λ_c(2880)^+$ and $Λ_c(2940)^+$ decaying to $Λ_c^+η$ relative to $Σ_c(2455)π$ of $<0.13$ for the $Λ_c(2880)^+$ and $<1.11$ for the $Λ_c(2940)^+$. We measure ratios of branching fractions of $Λ_c(2880)^+$ and $Λ_c(2940)^+$ decaying to $pD^0$ relative to $Σ_c(2455)π$ of $0.75 \pm 0.03(\text{stat.}) \pm 0.07(\text{syst.})$ for the $Λ_c(2880)^+$ and $3.59 \pm 0.21(\text{stat.}) \pm 0.56(\text{syst.})$ for the $Λ_c(2940)^+$.
Submitted 22 June, 2024;
originally announced June 2024.
-
Automatic AI Model Selection for Wireless Systems: Online Learning via Digital Twinning
Authors:
Qiushuo Hou,
Matteo Zecchin,
Sangwoo Park,
Yunlong Cai,
Guanding Yu,
Kaushik Chowdhury,
Osvaldo Simeone
Abstract:
In modern wireless network architectures, such as O-RAN, artificial intelligence (AI)-based applications are deployed at intelligent controllers to carry out functionalities like scheduling or power control. The AI "apps" are selected on the basis of contextual information such as network conditions, topology, traffic statistics, and design goals. The mapping between context and AI model parameters is ideally done in a zero-shot fashion via an automatic model selection (AMS) mapping that leverages only contextual information without requiring any current data. This paper introduces a general methodology for the online optimization of AMS mappings. Optimizing an AMS mapping is challenging, as it requires exposure to data collected from many different contexts. Therefore, if carried out online, this initial optimization phase would be extremely time consuming. A possible solution is to leverage a digital twin of the physical system to generate synthetic data from multiple simulated contexts. However, given that the simulator at the digital twin is imperfect, a direct use of simulated data for the optimization of the AMS mapping would yield poor performance when tested in the real system. This paper proposes a novel method for the online optimization of AMS mapping that corrects for the bias of the simulator by means of limited real data collected from the physical system. Experimental results for a graph neural network-based power control app demonstrate the significant advantages of the proposed approach.
Submitted 22 June, 2024;
originally announced June 2024.
-
DataFreeShield: Defending Adversarial Attacks without Training Data
Authors:
Hyeyoon Lee,
Kanghyun Choi,
Dain Kwon,
Sunjong Park,
Mayoore Selvarasa Jaiswal,
Noseong Park,
Jonghyun Choi,
Jinho Lee
Abstract:
Recent advances in adversarial robustness rely on an abundant set of training data, where using external or additional datasets has become a common setting. However, in real life, the training data is often kept private for security and privacy issues, while only the pretrained weight is available to the public. In such scenarios, existing methods that assume accessibility to the original data become inapplicable. Thus we investigate the pivotal problem of data-free adversarial robustness, where we try to achieve adversarial robustness without accessing any real data. Through a preliminary study, we highlight the severity of the problem by showing that robustness without the original dataset is difficult to achieve, even with similar domain datasets. To address this issue, we propose DataFreeShield, which tackles the problem from two perspectives: surrogate dataset generation and adversarial training using the generated data. Through extensive validation, we show that DataFreeShield outperforms baselines, demonstrating that the proposed method sets the first entirely data-free solution for the adversarial robustness problem.
Submitted 21 June, 2024;
originally announced June 2024.
-
A variational perspective on the dissipative Hamiltonian structure of the Vlasov-Fokker-Planck equation
Authors:
Sangmin Park
Abstract:
The Vlasov-Fokker-Planck equation describes the evolution of the probability density of the position and velocity of particles under the influence of external confinement, interaction, friction, and stochastic force. It is well-known that this equation can be formally seen as a dissipative Hamiltonian system in the Wasserstein space of probability measures. In order to better understand this geometric formalism, we introduce a time-discrete variational scheme, solutions of which converge to the solution of the Vlasov-Fokker-Planck equation as time-step vanishes. The implicit scheme is based on the symplectic Euler scheme, and updates the probability density at each iteration first in the velocity variable then in the position variable.
The algorithm leverages the geometric structure of the Wasserstein space, and has several desirable properties. Energy functionals involved in each variational problem are geodesically-convex, which implies the unique solvability of the problem. Furthermore, the correct dissipation of the Hamiltonian is observed at the discrete level up to higher order errors.
Submitted 19 June, 2024;
originally announced June 2024.
-
Stackelberg Games with $k$-Submodular Function under Distributional Risk-Receptiveness and Robustness
Authors:
Seonghun Park,
Manish Bansal
Abstract:
We study submodular optimization in an adversarial context, applicable to machine learning problems such as feature selection using data susceptible to uncertainties and attacks. We focus on Stackelberg games between an attacker (or interdictor) and a defender where the attacker aims to minimize the defender's objective of maximizing a $k$-submodular function. We allow uncertainties arising from the success of attacks and inherent data noise, and address challenges due to incomplete knowledge of the probability distribution of random parameters. Specifically, we introduce the Distributionally Risk-Averse $k$-Submodular Interdiction Problem (DRA $k$-SIP) and the Distributionally Risk-Receptive $k$-Submodular Interdiction Problem (DRR $k$-SIP) along with finitely convergent exact algorithms for solving them. The DRA $k$-SIP solution allows a risk-averse interdictor to develop robust strategies for real-world uncertainties. Conversely, the DRR $k$-SIP solution suggests aggressive tactics for attackers willing to embrace (distributional) risk to inflict maximum damage, identifying critical vulnerable components, which can be used for the defender's defensive strategies. The optimal values derived from both DRA $k$-SIP and DRR $k$-SIP offer a confidence interval-like range for the expected value of the defender's objective function, capturing distributional ambiguity. We conduct computational experiments using instances of feature selection and sensor placement problems, with Wisconsin breast cancer data and synthetic data, respectively.
Submitted 28 June, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
Explanations for the two-component spectral energy distributions of gravitationally lensed stars at high redshifts
Authors:
Armin Nabizadeh,
Erik Zackrisson,
Emma Lundqvist,
Massimo Ricotti,
Seyong Park,
Brian Welch,
Jose M. Diego
Abstract:
Observations of gravitationally lensed, high-mass stars at redshifts $\gtrsim1$ occasionally reveal spectral energy distributions that contain two components with different effective temperatures. Given that two separate stars are involved, it suggests that both stars have simultaneously reached very high magnification, as expected for two stars in a binary system close to the caustic curve of the foreground galaxy-cluster lens. The inferred effective temperatures and luminosities of these stars are, however, difficult to reconcile with known binaries, or even with isolated stars of the same age. Here, we explore three alternative explanations for these cases: circumstellar dust around the cooler of the two stars; age differences of a few Myr among stars in the same star cluster, and a scenario in which the stars originate in two separate star clusters of different age along the lensing caustic. While all of these scenarios are deemed plausible in principle, dust solutions would require more circumstellar extinction than seen in local observations of the relevant super/hypergiant stars. Hence, we argue that age differences between the two stars are the most likely scenario, given the current data.
Submitted 18 June, 2024;
originally announced June 2024.
-
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
Authors:
Young Jin Ahn,
Jungwoo Park,
Sangha Park,
Jonghyun Choi,
Kee-Eung Kim
Abstract:
Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes, visually similar lip gestures that represent different phonemes. Prior approaches have sought to distinguish fine-grained visemes by aligning visual and auditory semantics, but often fell short of full synchronization. To address this, we present SyncVSR, an end-to-end learning framework that leverages quantized audio for frame-level crossmodal supervision. By integrating a projection layer that synchronizes visual representation with acoustic data, our encoder learns to generate discrete audio tokens from a video sequence in a non-autoregressive manner. SyncVSR shows versatility across tasks, languages, and modalities at the cost of a forward pass. Our empirical evaluations show that it not only achieves state-of-the-art results but also reduces data usage by up to ninefold.
Submitted 17 June, 2024;
originally announced June 2024.
-
Learning Hierarchical Semantic Classification by Grounding on Consistent Image Segmentations
Authors:
Seulki Park,
Youren Zhang,
Stella X. Yu,
Sara Beery,
Jonathan Huang
Abstract:
Hierarchical semantic classification requires the prediction of a taxonomy tree instead of a single flat level of the tree, where both accuracies at individual levels and consistency across levels matter. We can train classifiers for individual levels, which has accuracy but not consistency, or we can train only the finest level classification and infer higher levels, which has consistency but not accuracy. Our key insight is that hierarchical recognition should not be treated as multi-task classification, as each level is essentially a different task and they would have to compromise with each other, but should instead be grounded on image segmentations that are consistent across semantic granularities. Consistency can in fact improve accuracy. We build upon recent work on learning hierarchical segmentation for flat-level recognition, and extend it to hierarchical recognition. It naturally captures the intuition that fine-grained recognition requires fine image segmentation whereas coarse-grained recognition requires coarse segmentation; they can all be integrated into one recognition model that drives fine-to-coarse internal visual parsing. Additionally, we introduce a Tree-path KL Divergence loss to enforce consistent accurate predictions across levels. Our extensive experimentation and analysis demonstrate significant gains in predicting an accurate and consistent taxonomy tree.
Submitted 17 June, 2024;
originally announced June 2024.
-
Adversarial Style Augmentation via Large Language Model for Robust Fake News Detection
Authors:
Sungwon Park,
Sungwon Han,
Meeyoung Cha
Abstract:
The spread of fake news negatively impacts individuals and is regarded as a significant social challenge that needs to be addressed. A number of algorithmic and insightful features have been identified for detecting fake news. However, with the recent LLMs and their advanced generation capabilities, many of the detectable features (e.g., style-conversion attacks) can be altered, making it more challenging to distinguish from real news. This study proposes adversarial style augmentation, AdStyle, to train a fake news detector that remains robust against various style-conversion attacks. Our model's key mechanism is the careful use of LLMs to automatically generate a diverse yet coherent range of style-conversion attack prompts. This improves the generation of prompts that are particularly difficult for the detector to handle. Experiments show that our augmentation strategy improves robustness and detection performance when tested on fake news benchmark datasets.
Submitted 22 July, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
GeoSEE: Regional Socio-Economic Estimation With a Large Language Model
Authors:
Sungwon Han,
Donghyun Ahn,
Seungeon Lee,
Minhyuk Song,
Sungwon Park,
Sangyoon Park,
Jihee Kim,
Meeyoung Cha
Abstract:
Moving beyond traditional surveys, combining heterogeneous data sources with AI-driven inference models brings new opportunities to measure socio-economic conditions, such as poverty and population, over expansive geographic areas. The current research presents GeoSEE, a method that can estimate various socio-economic indicators using a unified pipeline powered by a large language model (LLM). Presented with a diverse set of information modules, including those pre-constructed from satellite imagery, GeoSEE selects which modules to use in estimation, for each indicator and country. This selection is guided by the LLM's prior socio-geographic knowledge, which functions similarly to the insights of a domain expert. The system then computes target indicators via in-context learning after aggregating results from selected modules in the format of natural language-based texts. Comprehensive evaluation across countries at various stages of development reveals that our method outperforms other predictive models in both unsupervised and low-shot contexts. This reliable performance in data-scarce settings in under-developed or developing countries, combined with its cost-effectiveness, underscores its potential to continuously support and monitor the progress of Sustainable Development Goals, such as poverty alleviation and equitable growth, on a global scale.
Submitted 14 June, 2024;
originally announced June 2024.
-
Projected background and sensitivity of AMoRE-II
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Y. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (81 additional authors not shown)
Abstract:
AMoRE-II aims to search for neutrinoless double beta decay with an array of 423 Li$_2$$^{100}$MoO$_4$ crystals operating in the cryogenic system as the main phase of the Advanced Molybdenum-based Rare process Experiment (AMoRE). AMoRE has been planned to operate in three phases: AMoRE-pilot, AMoRE-I, and AMoRE-II. AMoRE-II is currently being installed at the Yemi Underground Laboratory, located approximately 1000 meters deep in Jeongseon, Korea. The goal of AMoRE-II is to reach up to $T^{0νββ}_{1/2}$ $\sim$ 6 $\times$ 10$^{26}$ years, corresponding to an effective Majorana mass of 15 - 29 meV, covering all the inverted mass hierarchy regions. To achieve this, the background level of the experimental configurations and possible background sources of gamma and beta events should be well understood. We have intensively performed Monte Carlo simulations using the GEANT4 toolkit in all the experimental configurations with potential sources. We report the estimated background level that meets the 10$^{-4}$ counts/(keV$\cdot$kg$\cdot$yr) requirement for AMoRE-II in the region of interest (ROI) and show the projected half-life sensitivity based on the simulation study.
Submitted 13 June, 2024;
originally announced June 2024.
-
Is Value Learning Really the Main Bottleneck in Offline RL?
Authors:
Seohong Park,
Kevin Frans,
Sergey Levine,
Aviral Kumar
Abstract:
While imitation learning requires access to high-quality data, offline reinforcement learning (RL) should, in principle, perform similarly or better with substantially lower data quality by using a value function. However, current results indicate that offline RL often performs worse than imitation learning, and it is often unclear what holds back the performance of offline RL. Motivated by this observation, we aim to understand the bottlenecks in current offline RL algorithms. While poor performance of offline RL is typically attributed to an imperfect value function, we ask: is the main bottleneck of offline RL indeed in learning the value function, or something else? To answer this question, we perform a systematic empirical study of (1) value learning, (2) policy extraction, and (3) policy generalization in offline RL problems, analyzing how these components affect performance. We make two surprising observations. First, we find that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL, often more so than the value learning objective. For instance, we show that common value-weighted behavioral cloning objectives (e.g., AWR) do not fully leverage the learned value function, and switching to behavior-constrained policy gradient objectives (e.g., DDPG+BC) often leads to substantial improvements in performance and scalability. Second, we find that a big barrier to improving offline RL performance is often imperfect policy generalization on test-time states out of the support of the training data, rather than policy learning on in-distribution states. We then show that the use of suboptimal but high-coverage data or test-time policy training techniques can address this generalization issue in practice. Specifically, we propose two simple test-time policy improvement methods and show that these methods lead to better performance.
Submitted 13 June, 2024;
originally announced June 2024.