-
Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge
Authors:
Young-Jun Lee,
Dokyong Lee,
Junyoung Youn,
Kyeongjin Oh,
Byungsoo Ko,
Jonghwan Hyeon,
Ho-Jin Choi
Abstract:
Humans share a wide variety of images related to their personal experiences within conversations via instant messaging tools. However, existing works focus on (1) image-sharing behavior in singular sessions, leading to limited long-term social interaction, and (2) a lack of personalized image-sharing behavior. In this work, we introduce Stark, a large-scale long-term multi-modal conversation dataset that covers a wide range of social personas in a multi-modality format, time intervals, and images. To construct Stark automatically, we propose a novel multi-modal contextualization framework, Mcu, that generates long-term multi-modal dialogue distilled from ChatGPT and our proposed Plan-and-Execute image aligner. Using Stark, we train a multi-modal conversation model, Ultron 7B, which demonstrates impressive visual imagination ability. Furthermore, we demonstrate the effectiveness of our dataset in human evaluation. We make our source code and dataset publicly available.
Submitted 4 July, 2024;
originally announced July 2024.
-
Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors
Authors:
Chaeyeon Han,
Pavan Seshadri,
Yiwei Ding,
Noah Posner,
Bon Woo Koo,
Animesh Agrawal,
Alexander Lerch,
Subhrajit Guhathakurta
Abstract:
While various sensors have been deployed to monitor vehicular flows, sensing pedestrian movement is still nascent. Yet walking is a significant mode of travel in many cities, especially those in Europe, Africa, and Asia. Understanding pedestrian volumes and flows is essential for designing safer and more attractive pedestrian infrastructure and for controlling periodic overcrowding. This study discusses a new approach to scale up urban sensing of people with the help of novel audio-based technology. It assesses the benefits and limitations of microphone-based sensors as compared to other forms of pedestrian sensing. A large-scale dataset called ASPED is presented, which includes high-quality audio recordings along with video recordings used for labeling the pedestrian count data. The baseline analyses highlight the promise of using audio sensors for pedestrian tracking, although algorithmic and technological improvements to make the sensors practically usable continue. This study also demonstrates how the data can be leveraged to predict pedestrian trajectories. Finally, it discusses the use cases and scenarios where audio-based pedestrian sensing can support better urban and transportation planning.
Submitted 14 June, 2024;
originally announced June 2024.
-
An insertable glucose sensor using a compact and cost-effective phosphorescence lifetime imager and machine learning
Authors:
Artem Goncharov,
Zoltan Gorocs,
Ridhi Pradhan,
Brian Ko,
Ajmal Ajmal,
Andres Rodriguez,
David Baum,
Marcell Veszpremi,
Xilin Yang,
Maxime Pindrys,
Tianle Zheng,
Oliver Wang,
Jessica C. Ramella-Roman,
Michael J. McShane,
Aydogan Ozcan
Abstract:
Optical continuous glucose monitoring (CGM) systems are emerging for personalized glucose management owing to their lower cost and prolonged durability compared to conventional electrochemical CGMs. Here, we report a computational CGM system, which integrates a biocompatible phosphorescence-based insertable biosensor and a custom-designed phosphorescence lifetime imager (PLI). This compact and cost-effective PLI is designed to capture phosphorescence lifetime images of an insertable sensor through the skin, where the lifetime of the emitted phosphorescence signal is modulated by the local concentration of glucose. Because this phosphorescence signal has a very long lifetime compared to tissue autofluorescence or excitation leakage processes, it completely bypasses these noise sources by measuring the sensor emission over several tens of microseconds after the excitation light is turned off. The lifetime images acquired through the skin are processed by neural network-based models for misalignment-tolerant inference of glucose levels, accurately revealing normal, low (hypoglycemia) and high (hyperglycemia) concentration ranges. Using a 1-mm thick skin phantom mimicking the optical properties of human skin, we performed in vitro testing of the PLI using glucose-spiked samples, yielding 88.8% inference accuracy, also showing resilience to random and unknown misalignments within a lateral distance of ~4.7 mm with respect to the position of the insertable sensor underneath the skin phantom. Furthermore, the PLI accurately identified larger lateral misalignments beyond 5 mm, prompting user intervention for re-alignment. The misalignment-resilient glucose concentration inference capability of this compact and cost-effective phosphorescence lifetime imager makes it an appealing wearable diagnostics tool for real-time tracking of glucose and other biomarkers.
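As a rough illustration of why time-gated detection separates a long-lived phosphorescence signal from fast background, the sketch below simulates a mono-exponential decay and recovers its lifetime from two equal-width gates opened after the excitation is switched off. This is the textbook two-gate (rapid lifetime determination) estimate, not the neural-network pipeline described in the abstract; the 50 µs lifetime, the 5 ns background lifetime, the gate timings, and all function names are hypothetical values chosen for illustration.

```python
import math


def gated_signal(amplitude, tau_us, start_us, width_us):
    """Integrate amplitude * exp(-t / tau) over the gate [start, start + width]."""
    return amplitude * tau_us * (
        math.exp(-start_us / tau_us) - math.exp(-(start_us + width_us) / tau_us)
    )


def lifetime_from_two_gates(gate1, gate2, gate_separation_us):
    """Two-gate estimate: for equal-width gates, gate1 / gate2 = exp(separation / tau)."""
    return gate_separation_us / math.log(gate1 / gate2)


# Hypothetical values, not taken from the paper.
TRUE_TAU_US = 50.0        # assumed phosphorescence lifetime
BACKGROUND_TAU_US = 0.005  # assumed fast background lifetime (5 ns)
DELAY_US = 5.0            # first gate opens this long after excitation ends
GATE_WIDTH_US = 20.0
SEPARATION_US = 20.0      # start-to-start spacing of the two gates

g1 = gated_signal(1.0, TRUE_TAU_US, DELAY_US, GATE_WIDTH_US)
g2 = gated_signal(1.0, TRUE_TAU_US, DELAY_US + SEPARATION_US, GATE_WIDTH_US)
estimated_tau_us = lifetime_from_two_gates(g1, g2, SEPARATION_US)

# Fraction of the fast background still present when the first gate opens.
background_left = math.exp(-DELAY_US / BACKGROUND_TAU_US)

print(f"estimated lifetime: {estimated_tau_us:.3f} us")
print(f"background remaining at gate start: {background_left:.3e}")
```

In this idealized, noise-free setting the estimate reproduces the assumed lifetime, and the fast background has decayed to a negligible level before the first gate opens.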
Submitted 11 June, 2024;
originally announced June 2024.
-
Enhanced tunable cavity development for axion dark matter searches using a piezoelectric motor in combination with gears
Authors:
A. K. Yi,
T. Seong,
S. Lee,
S. Ahn,
B. I. Ivanov,
S. V. Uchaikin,
B. R. Ko,
Y. K. Semertzidis
Abstract:
Most search experiments sensitive to quantum chromodynamics (QCD) axion dark matter benefit from microwave cavities, as electromagnetic resonators, that enhance the detectable axion signal power and thus the experimental sensitivity drastically. As the possible axion mass spans multiple orders of magnitude, microwave cavities must be tunable and it is desirable for the cavity to have a tunable frequency range that is as wide as possible. Since the tunable frequency range generally increases as the dimension of the conductor tuning rod increases for a given cylindrical conductor cavity system, we developed a cavity system with a large dimensional tuning rod in order to increase this. We, for the first time, employed not only a piezoelectric motor, but also gears to drive a large and accordingly heavy tuning rod, where such a combination to increase driving power can be adopted for extreme environments as is the case for axion dark matter experiments: cryogenic, high-magnetic-field, and high vacuum. Thanks to such higher power derived from the piezoelectric motor and gear combination, we realized a wideband tunable cavity whose frequency range is about 42% of the central resonant frequency of the cavity, without sacrificing the experimental sensitivity too much.
Submitted 8 July, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
Shockingly Bright Warm Carbon Monoxide Molecular Features in the Supernova Remnant Cassiopeia A Revealed by JWST
Authors:
J. Rho,
S. -H. Park,
R. Arendt,
M. Matsuura,
D. Milisavljevic,
T. Temim,
I. De Looze,
W. P. Blair,
A. Rest,
O. Fox,
A. P. Ravi,
B. -C. Koo,
M. Barlow,
A. Burrows,
R. Chevalier,
G. Clayton,
R. Fesen,
C. Fransson,
C. Fryer,
H. L. Gomez,
H. -T. Janka,
F. Kirchschlarger,
J. M. Laming,
S. Orlando,
D. Patnaude
, et al. (14 additional authors not shown)
Abstract:
We present JWST NIRCam (F356W and F444W filters) and MIRI (F770W) images and NIRSpec-IFU spectroscopy of the young supernova remnant Cassiopeia A (Cas A). We obtained the data as part of a JWST survey of Cas A. The NIRCam and MIRI images map the spatial distributions of synchrotron radiation, Ar-rich ejecta, and CO on both large and small scales, revealing remarkably complex structures. The CO emission is stronger at the outer layers than the Ar ejecta, which indicates the reformation of CO molecules behind the reverse shock. NIRSpec-IFU spectra (3 - 5.5 microns) were obtained toward two representative knots in the NE and S fields. Both regions are dominated by the bright fundamental rovibrational band of CO in the two R and P branches, with strong [Ar VI] and relatively weaker, variable strength ejecta lines of [Si IX], [Ca IV], [Ca V] and [Mg IV]. The NIRSpec-IFU data resolve individual ejecta knots and filaments spatially and in velocity space. The fundamental CO band in the JWST spectra reveals unique shapes of CO, showing a few tens of sinusoidal patterns of rovibrational lines with pseudo-continuum underneath, which is attributed to the high-velocity widths of CO lines. The CO also shows high J lines at different vibrational transitions. Our results with LTE modeling of CO emission indicate a temperature of 1080 K and provide unique insight into the correlations between dust, molecules, and highly ionized ejecta in supernovae, and have strong ramifications for modeling dust formation that is led by CO cooling in the early Universe.
Submitted 5 June, 2024;
originally announced June 2024.
-
EdgeSphere: A Three-Tier Architecture for Cognitive Edge Computing
Authors:
Christian Makaya,
Keith Grueneberg,
Bongjun Ko,
David Wood,
Nirmit Desai,
Xiping Wang
Abstract:
Computing at the edge is increasingly important as Internet of Things (IoT) devices at the edge generate massive amounts of data and pose challenges in transporting all that data to the Cloud where they can be analyzed. On the other hand, harnessing the edge data is essential for offering cognitive applications, if the challenges, such as device capabilities, connectivity, and heterogeneity can be overcome. This paper proposes a novel three-tier architecture, called EdgeSphere, which harnesses resources of the edge devices, to analyze the data in situ at the edge. In contrast to the state-of-the-art cloud and mobile applications, EdgeSphere applications span across cloud, edge gateways, and edge devices. At its core, EdgeSphere builds on Apache Mesos to optimize resource usage and scheduling. EdgeSphere has been applied to practical scenarios and this paper describes the engineering challenges faced as well as innovative solutions.
Submitted 26 May, 2024;
originally announced May 2024.
-
Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency
Authors:
Hyeongjin Kim,
Sangwon Kim,
Dasom Ahn,
Jong Taek Lee,
Byoung Chul Ko
Abstract:
Scene graph generation (SGG) is an important task in image understanding because it represents the relationships between objects in an image as a graph structure, making it possible to understand the semantic relationships between objects intuitively. Previous SGG studies used message-passing neural networks (MPNNs) to update features, which can effectively reflect information about surrounding objects. However, these studies have failed to reflect the co-occurrence of objects during SGG generation. In addition, they only addressed the long-tail problem of the training dataset from the perspectives of sampling and learning methods. To address these two problems, we propose CooK, which reflects the Co-occurrence Knowledge between objects, and the learnable term frequency-inverse document frequency (TF-l-IDF) to solve the long-tail problem. We applied the proposed model to the SGG benchmark dataset, and the results showed a performance improvement of up to 3.8% compared with existing state-of-the-art models in the SGGen subtask. The proposed method exhibits generalization ability from the results obtained, showing uniform performance improvement for all MPNN models.
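For readers unfamiliar with the weighting that TF-l-IDF builds on, the sketch below computes classical, fixed TF-IDF weights over predicate labels. The abstract does not specify how the learnable variant is parameterized, so nothing here should be read as the paper's method; the toy "images" and the function name are hypothetical. The point is only that a predicate appearing in every image (a head class such as "on") receives a low weight, while rarer predicates receive higher ones, which is the intuition behind using such weights against a long-tailed label distribution.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Classical TF-IDF: (term count / document length) * log(N / document frequency)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy data: each list holds the predicate labels annotated in one image.
toy_images = [
    ["on", "on", "has", "riding"],
    ["on", "has", "near"],
    ["on", "on", "wearing"],
]

scores = tf_idf(toy_images)
for index, image_scores in enumerate(scores):
    print(index, {term: round(value, 3) for term, value in image_scores.items()})
```

Here "on" occurs in every toy image and is weighted to zero, whereas "riding", which occurs in only one, receives the largest weight in the first image.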
Submitted 21 May, 2024;
originally announced May 2024.
-
StyleForge: Enhancing Text-to-Image Synthesis for Any Artistic Styles with Dual Binding
Authors:
Junseo Park,
Beomseok Ko,
Hyeryung Jang
Abstract:
Recent advancements in text-to-image models, such as Stable Diffusion, have showcased their ability to create visual images from natural language prompts. However, existing methods like DreamBooth struggle with capturing arbitrary art styles due to the abstract and multifaceted nature of stylistic attributes. We introduce Single-StyleForge, a novel approach for personalized text-to-image synthesis across diverse artistic styles. Using approximately 15 to 20 images of the target style, Single-StyleForge establishes a foundational binding of a unique token identifier with a broad range of attributes of the target style. Additionally, auxiliary images are incorporated for dual binding that guides the consistent representation of crucial elements such as people within the target style. Furthermore, we present Multi-StyleForge, which enhances image quality and text alignment by binding multiple tokens to partial style attributes. Experimental evaluations across six distinct artistic styles demonstrate significant improvements in image quality and perceptual fidelity, as measured by FID, KID, and CLIP scores.
Submitted 17 July, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Extensive search for axion dark matter over 1 GHz with CAPP's Main Axion eXperiment
Authors:
Saebyeok Ahn,
JinMyeong Kim,
Boris I. Ivanov,
Ohjoon Kwon,
HeeSu Byun,
Arjan F. van Loo,
SeongTae Par,
Junu Jeong,
Soohyung Lee,
Jinsu Kim,
Çağlar Kutlu,
Andrew K. Yi,
Yasunobu Nakamura,
Seonjeong Oh,
Danho Ahn,
SungJae Bae,
Hyoungsoon Choi,
Jihoon Choi,
Yonuk Chong,
Woohyun Chung,
Violeta Gkika,
Jihn E. Kim,
Younggeun Kim,
Byeong Rok Ko,
Lino Miceli
, et al. (11 additional authors not shown)
Abstract:
We report an extensive high-sensitivity search for axion dark matter above 1 GHz at the Center for Axion and Precision Physics Research (CAPP). The cavity resonant search, exploiting the coupling between axions and photons, explored the frequency (mass) range of 1.025 GHz (4.24 $μ$eV) to 1.185 GHz (4.91 $μ$eV). We have introduced a number of innovations in this field, demonstrating the practical approach of optimizing all the relevant parameters of axion haloscopes, extending presently available technology. The CAPP 12 T magnet with an aperture of 320 mm made of Nb$_3$Sn and NbTi superconductors surrounding a 37-liter ultralight-weight copper cavity is expected to convert DFSZ axions into approximately $10^2$ microwave photons per second. A powerful dilution refrigerator, capable of keeping the core system below 40 mK, combined with quantum-noise limited readout electronics, achieved a total system noise of about 200 mK or below, which corresponds to a background of roughly $4\times 10^3$ photons per second within the axion bandwidth. The combination of all those improvements provides unprecedented search performance, imposing the most stringent exclusion limits on axion--photon coupling in this frequency range to date. These results also suggest an experimental capability suitable for highly-sensitive searches for axion dark matter above 1 GHz.
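The quoted background of roughly $4\times 10^3$ photons per second can be sanity-checked with a back-of-the-envelope calculation. The sketch below treats the system noise temperature as a noise power $P = k_B T \Delta\nu$ and divides by the photon energy $h\nu$. Two inputs are assumptions that do not appear in the abstract: an axion signal quality factor of about $10^6$ (the value commonly assumed for a virialized halo), which sets the axion bandwidth, and a representative frequency of 1.1 GHz taken from the middle of the scanned range. The variable names are illustrative.

```python
# Physical constants (exact SI values).
PLANCK_J_S = 6.62607015e-34
BOLTZMANN_J_PER_K = 1.380649e-23

# Inputs: the noise temperature is from the abstract; the rest are assumptions.
system_noise_k = 0.200          # "about 200 mK" total system noise
frequency_hz = 1.1e9            # assumed representative frequency in the scanned range
axion_quality_factor = 1.0e6    # assumed axion signal quality factor

# Axion signal bandwidth and the noise power falling inside it.
axion_bandwidth_hz = frequency_hz / axion_quality_factor
noise_power_w = BOLTZMANN_J_PER_K * system_noise_k * axion_bandwidth_hz

# Convert noise power to an equivalent photon rate.
photon_energy_j = PLANCK_J_S * frequency_hz
noise_photons_per_s = noise_power_w / photon_energy_j

print(f"axion bandwidth: {axion_bandwidth_hz:.0f} Hz")
print(f"noise photon rate: {noise_photons_per_s:.2e} per second")
```

Under these assumptions the rate comes out near $4\times 10^3$ photons per second, consistent with the figure in the abstract. The result does not depend on the chosen frequency, because the bandwidth and the photon energy both scale linearly with it.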
Submitted 20 February, 2024;
originally announced February 2024.
-
A JWST Survey of the Supernova Remnant Cassiopeia A
Authors:
Dan Milisavljevic,
Tea Temim,
Ilse De Looze,
Danielle Dickinson,
J. Martin Laming,
Robert Fesen,
John C. Raymond,
Richard G. Arendt,
Jacco Vink,
Bettina Posselt,
George G. Pavlov,
Ori D. Fox,
Ethan Pinarski,
Bhagya Subrayan,
Judy Schmidt,
William P. Blair,
Armin Rest,
Daniel Patnaude,
Bon-Chul Koo,
Jeonghee Rho,
Salvatore Orlando,
Hans-Thomas Janka,
Moira Andrews,
Michael J. Barlow,
Adam Burrows
, et al. (21 additional authors not shown)
Abstract:
We present initial results from a JWST survey of the youngest Galactic core-collapse supernova remnant Cassiopeia A (Cas A), made up of NIRCam and MIRI imaging mosaics that map emission from the main shell, interior, and surrounding circumstellar/interstellar material (CSM/ISM). We also present four exploratory positions of MIRI/MRS IFU spectroscopy that sample ejecta, CSM, and associated dust from representative shocked and unshocked regions. Surprising discoveries include: 1) a web-like network of unshocked ejecta filaments resolved to 0.01 pc scales exhibiting an overall morphology consistent with turbulent mixing of cool, low-entropy matter from the progenitor's oxygen layer with hot, high-entropy matter heated by neutrino interactions and radioactivity, 2) a thick sheet of dust-dominated emission from shocked CSM seen in projection toward the remnant's interior pockmarked with small (approximately one arcsecond) round holes formed by knots of high-velocity ejecta that have pierced through the CSM and driven expanding tangential shocks, 3) dozens of light echoes with angular sizes between 0.1 arcsecond and 1 arcminute reflecting previously unseen fine-scale structure in the ISM. NIRCam observations place new upper limits on infrared emission from the neutron star in Cas A's center and tightly constrain scenarios involving a possible fallback disk. These JWST survey data and initial findings help address unresolved questions about massive star explosions that have broad implications for the formation and evolution of stellar populations, the metal and dust enrichment of galaxies, and the origin of compact remnant objects.
Submitted 10 June, 2024; v1 submitted 4 January, 2024;
originally announced January 2024.
-
Dosimetric calibration of an anatomically specific ultra-high dose rate electron irradiation platform for preclinical FLASH radiobiology experiments
Authors:
Jinghui Wang,
Stavros Melemenidis,
Rakesh Manjappa,
Vignesh Viswanathan,
Ramish M. Ashraf,
Karen Levy,
Lawrie Skinner,
Luis A. Soto,
Stephanie Chow,
Brianna Lau,
Ryan B. Ko,
Edward E. Graves,
Amy S. Yu,
Karl K. Bush,
Murat Surucu,
Erinn B. Rankin,
Billy W. Loo Jr,
Emil Schüler,
Peter G. Maxim
Abstract:
We characterized the dosimetric properties of a clinical linear accelerator configured to deliver ultra-high dose rate (UHDR) irradiation to mice and cell-culture FLASH radiobiology experiments. UHDR electron beams were controlled by a microcontroller and relay interfaced with the respiratory gating system. We produced beam collimators with indexed stereotactic mouse positioning devices to provide anatomically specific preclinical treatments. Treatment delivery was monitored directly with an ionization chamber, and charge measurements were correlated with radiochromic film at the entry surface of the mice. The setup for conventional (CONV) dose rate irradiation was similar but the source-to-surface distance was longer. Monte Carlo simulations and film dosimetry were used to characterize beam properties and dose distributions. The mean electron beam energies before the flattening filter were 18.8 MeV (UHDR) and 17.7 MeV (CONV), with corresponding values at the mouse surface of 17.2 MeV and 16.2 MeV. The charges measured with an external ion chamber were linearly correlated with the mouse entrance dose. Use of relay gating for pulse control initially led to a delivery failure rate of 20% (±1 pulse); adjustments to account for the linac latency improved this rate to <1/20. Beam field sizes for two anatomically specific mouse collimators (4×4 cm$^2$ for whole-abdomen and 1.5×1.5 cm$^2$ for unilateral lung irradiation) were accurate within <5% and had low radiation leakage (<4%). Normalizing the dose at the center of the mouse (~0.75 cm depth) produced UHDR and CONV doses to the irradiated volumes with >95% agreement. We successfully configured a clinical linear accelerator for increased output and developed a robust preclinical platform for anatomically specific irradiation, with highly accurate and precise temporal and spatial dose delivery, for both CONV and UHDR applications.
Submitted 17 December, 2023;
originally announced December 2023.
-
Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks
Authors:
Seo-Hyun Lee,
Young-Eun Lee,
Soowon Kim,
Byung-Kwan Ko,
Jun-Young Kim,
Seong-Whan Lee
Abstract:
Brain-to-speech technology represents a fusion of interdisciplinary applications encompassing fields of artificial intelligence, brain-computer interfaces, and speech synthesis. Neural representation learning based intention decoding and speech synthesis directly connects the neural activity to the means of human linguistic communication, which may greatly enhance the naturalness of communication. With the current discoveries on representation learning and the development of the speech synthesis technologies, direct translation of brain signals into speech has shown great promise. In particular, the processed input features and neural speech embeddings which are given to the neural network play a significant role in the overall performance when using deep generative models for speech generation from brain signals. In this paper, we introduce the current brain-to-speech technology with the possibility of speech synthesis from brain signals, which may ultimately facilitate innovation in non-verbal communication. Also, we perform a comprehensive analysis of the neural features and neural speech embeddings underlying the neurophysiological activation while performing speech, which may play a significant role in the speech synthesis works.
Submitted 26 February, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
Semantic Scene Graph Generation Based on an Edge Dual Scene Graph and Message Passing Neural Network
Authors:
Hyeongjin Kim,
Sangwon Kim,
Jong Taek Lee,
Byoung Chul Ko
Abstract:
Along with generative AI, interest in scene graph generation (SGG), which comprehensively captures the relationships and interactions between objects in an image and creates a structured graph-based representation, has significantly increased in recent years. However, relying on object-centric and dichotomous relationships, existing SGG methods have a limited ability to accurately predict detailed relationships. To solve these problems, a new approach to modeling multiobject relationships, called edge dual scene graph generation (EdgeSGG), is proposed herein. EdgeSGG is based on an edge dual scene graph and Dual Message Passing Neural Network (DualMPNN), which can capture rich contextual interactions between unconstrained objects. To facilitate the learning of edge dual scene graphs with a symmetric graph structure, the proposed DualMPNN learns both object- and relation-centric features for more accurately predicting relation-aware contexts and allows fine-grained relational updates between objects. A comparative experiment with state-of-the-art (SoTA) methods was conducted using two public datasets for SGG operations and six metrics for three subtasks. Compared with SoTA approaches, the proposed model exhibited substantial performance improvements across all SGG subtasks. Furthermore, experiments on long-tail distributions revealed that incorporating the relationships between objects effectively mitigates existing long-tail problems.
Submitted 2 November, 2023;
originally announced November 2023.
-
Search for the Sagittarius tidal stream of axion dark matter around 4.55 $μ$eV
Authors:
B. R. Ko
Abstract:
We report the first search for the Sagittarius tidal stream of axion dark matter around 4.55 $μ$eV using CAPP-12TB haloscope data acquired in March of 2022. Our result excluded the Sagittarius tidal stream of Dine-Fischler-Srednicki-Zhitnitskii and Kim-Shifman-Vainshtein-Zakharov axion dark matter densities of $ρ_a\gtrsim0.184$ and $\gtrsim0.025$ GeV/cm$^3$, respectively, over a mass range from 4.51 to 4.59 $μ$eV at a 90% confidence level.
Submitted 30 October, 2023;
originally announced October 2023.
-
Supernova Ejecta with Crystalline Silicate Dust in the Supernova Remnant MSH 15-52
Authors:
Hyun-Jeong Kim,
Bon-Chul Koo,
Takashi Onaka
Abstract:
IRAS 15099-5856 in the young supernova remnant (SNR) MSH 15-52 is the first and only SNR-associated object with crystalline silicate dust detected so far, although its nature and the origin of the crystalline silicate are still unclear. In this paper, we present high-resolution mid-infrared (MIR) imaging observations of the bright central compact source IRS1 of IRAS 15099-5856 to study the spatial distributions of gas and dust and the analysis of its Spitzer MIR spectrum to explore the origin of IRS1. The MIR images obtained with the T-ReCS attached on the Gemini South telescope show a complicated, inhomogeneous morphology of IRS1 with bright clumps and diffuse emission in [Ne II] 12.81 $μ$m and Qa 18.30 $μ$m, which confirms that IRS1 is an extended source externally heated by the nearby O star Muzzio 10, a candidate for the binary companion of the progenitor star. The Spitzer MIR spectrum reveals several ionic emission lines including a strong [Ne II] 12.81 $μ$m line, but no hydrogen line is detected. We model the spectrum using the photoionization code CLOUDY with varying elemental composition. The elemental abundance of IRS1 derived from the model is close to that of SN ejecta with depleted hydrogen and enhanced metals, particularly neon, argon, and iron. Our results imply that IRS1 originates from the SN ejecta and suggest the possibility of the formation of crystalline silicate in newly-formed SN dust.
Submitted 25 April, 2024; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Analytical estimation of the signal to noise ratio efficiency in axion dark matter searches using a Savitzky-Golay filter
Authors:
A. K. Yi,
S. Ahn,
B. R. Ko,
Y. K. Semertzidis
Abstract:
The signal to noise ratio efficiency $ε_{\rm SNR}$ in axion dark matter searches has been estimated using large-statistic simulation data reflecting the background information and the expected axion signal power obtained from a real experiment. This usually requires a lot of computing time even with the assistance of powerful computing resources. In this work, employing a Savitzky-Golay filter for background subtraction, we estimated a fully analytical $ε_{\rm SNR}$ without relying on large-statistic simulation data, but only with an arbitrary axion mass and the relevant signal shape information. Hence, our work can provide $ε_{\rm SNR}$ using minimal computing time and resources prior to the acquisition of experimental data, without the detailed information that has to be obtained from real experiments. Axion haloscope searches have been observing the coincidence that the frequency independent scale factor $ξ$ is approximately consistent with the $ε_{\rm SNR}$. This was confirmed analytically in this work, provided that the window length of the Savitzky-Golay filter is sufficiently wide, i.e., at least 5 times the signal window.
Submitted 9 November, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Latent Disentanglement in Mesh Variational Autoencoders Improves the Diagnosis of Craniofacial Syndromes and Aids Surgical Planning
Authors:
Simone Foti,
Alexander J. Rickart,
Bongjin Koo,
Eimear O' Sullivan,
Lara S. van de Lande,
Athanasios Papaioannou,
Roman Khonsari,
Danail Stoyanov,
N. u. Owase Jeelani,
Silvia Schievano,
David J. Dunaway,
Matthew J. Clarkson
Abstract:
The use of deep learning to undertake shape analysis of the complexities of the human head holds great promise. However, there have traditionally been a number of barriers to accurate modelling, especially when operating on both a global and local level. In this work, we discuss the application of the Swap Disentangled Variational Autoencoder (SD-VAE) with relevance to Crouzon, Apert and Muenke syndromes. Although syndrome classification is performed on the entire mesh, it is also possible, for the first time, to analyse the influence of each region of the head on the syndromic phenotype. By manipulating specific parameters of the generative model, and producing procedure-specific new shapes, it is also possible to simulate the outcome of a range of craniofacial surgical procedures. This opens new avenues to advance diagnosis, aid surgical planning, and allow for the objective evaluation of surgical outcomes.
Submitted 5 September, 2023;
originally announced September 2023.
-
ASPED: An Audio Dataset for Detecting Pedestrians
Authors:
Pavan Seshadri,
Chaeyeon Han,
Bon-Woo Koo,
Noah Posner,
Subhrajit Guhathakurta,
Alexander Lerch
Abstract:
We introduce the new audio analysis task of pedestrian detection and present a new large-scale dataset for this task. While the preliminary results prove the viability of using audio approaches for pedestrian detection, they also show that this challenging task cannot be easily solved with standard approaches.
Submitted 16 January, 2024; v1 submitted 12 September, 2023;
originally announced September 2023.
-
Neutral atomic and molecular clouds and star formation in the outer Carina arm
Authors:
Geumsook Park,
Bon-Chul Koo,
Kee-Tae Kim,
Bruce Elmegreen
Abstract:
We present a comprehensive investigation of HI (super)clouds, molecular clouds (MCs), and star formation in the Carina spiral arm of the outer Galaxy. Utilizing HI4PI and CfA CO survey data, we identify HI clouds and MCs based on the ($l$, ${v_\mathrm{LSR}}$) locations of the Carina arm. We analyzed 26 HI clouds and 48 MCs. Most of the identified HI clouds are superclouds, with masses exceeding $10^6~{\mathrm{M_\odot}}$. We find that 15 of these superclouds have associated MC(s) with ${M_\mathrm{HI}} \gtrsim 10^6~{\mathrm{M_\odot}}$ and ${Σ_\mathrm{HI+H_2}} \gtrsim$ 50 ${\mathrm{M_\odot}} \rm pc^{-2}$. Our virial equilibrium analysis suggests that these CO-bright HI clouds are gravitationally bound or marginally bound. We report an anti-correlation between molecular mass fractions and Galactocentric distances, and a correlation with total gas surface densities. Nine CO-bright HI superclouds are associated with HII regions, indicating ongoing star formation. We confirm the regular spacing of HI superclouds along the spiral arm, which is likely due to some underlying physical process, such as gravitational instabilities. We observe a strong spatial correlation between HII regions and MCs, with some offsets between MCs and local HI column density peaks. Our study reveals that in the context of HI superclouds, the star formation rate surface density is independent of HI and total gas surface densities but positively correlates with molecular gas surface density. This finding is consistent with both extragalactic studies of the resolved Kennicutt-Schmidt relation and the local giant molecular cloud study of Lada et al. (2013), emphasizing the crucial role of molecular gas in regulating star formation processes.
Submitted 3 August, 2023;
originally announced August 2023.
-
CryoChains: Heterogeneous Reconstruction of Molecular Assembly of Semi-flexible Chains from Cryo-EM Images
Authors:
Bongjin Koo,
Julien Martel,
Ariana Peck,
Axel Levy,
Frédéric Poitevin,
Nina Miolane
Abstract:
Cryogenic electron microscopy (cryo-EM) has transformed structural biology by enabling the reconstruction of 3D biomolecular structures up to near-atomic resolution. However, the 3D reconstruction process remains challenging, as the 3D structures may exhibit substantial shape variations, while the 2D image acquisition suffers from a low signal-to-noise ratio, requiring the acquisition of very large datasets that are time-consuming to process. Current reconstruction methods are either precise but computationally expensive, or faster but lacking a physically-plausible model of large molecular shape variations. To fill this gap, we propose CryoChains, which encodes large deformations of biomolecules via rigid body transformation of their chains, while representing their finer shape variations with the normal mode analysis framework of biophysics. Our synthetic data experiments on the human GABA\textsubscript{B} and heat shock protein show that CryoChains gives a biophysically-grounded quantification of the heterogeneous conformations of biomolecules, while reconstructing their 3D molecular structures at an improved resolution compared to the current fastest, interpretable deep learning method.
Submitted 15 July, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Divided spectro-temporal attention for sound event localization and detection in real scenes for DCASE2023 challenge
Authors:
Yusun Shul,
Byeong-Yun Ko,
Jung-Woo Choi
Abstract:
Localizing sounds and detecting events in different room environments is a difficult task, mainly due to the wide range of reflections and reverberations. When training neural network models with sounds recorded in only a few room environments, there is a tendency for the models to become overly specialized to those specific environments, resulting in overfitting. To address this overfitting issue, we propose divided spectro-temporal attention. In comparison to the baseline method, which utilizes a convolutional recurrent neural network (CRNN) followed by a temporal multi-head self-attention layer (MHSA), we introduce a separate spectral attention layer that aggregates spectral features prior to the temporal MHSA. To achieve efficient spectral attention, we reduce the frequency pooling size in the convolutional encoder of the baseline to obtain a 3D tensor that incorporates information about frequency, time, and channel. As a result, we can implement spectral attention with channel embeddings, which is not possible in the baseline method dealing with only temporal context in the RNN and MHSA layers. We demonstrate that the proposed divided spectro-temporal attention significantly improves the performance of sound event detection and localization scores for real test data from the STARSS23 development dataset. Additionally, we show that various data augmentations, such as frameshift, time masking, channel swapping, and moderate mix-up, along with the use of external data, contribute to the overall improvement in SELD performance.
Submitted 5 June, 2023;
originally announced June 2023.
-
Near-Infrared Spectroscopy of Dense Ejecta Knots in the Outer Eastern Area of the Cassiopeia A Supernova Remnant
Authors:
Bon-Chul Koo,
Yong-Hyun Lee,
Jae-Joon Lee,
Sung-Chul Yoon
Abstract:
The Cassiopeia A supernova remnant has a complex structure, manifesting the multidimensional nature of core-collapse supernova explosions. To further understand this, we carried out near-infrared multi-object spectroscopy on the ejecta knots located in the "northeastern (NE) jet" and the "Fe K plume" regions, which are two distinct features in the outer eastern area of the remnant. Our study reveals that the knots exhibit varying ratios of [S II] 1.03 $μ$m, [P II] 1.189 $μ$m, and [Fe II] 1.257 $μ$m lines depending on their locations within the remnant, suggesting regional differences in elemental composition. Notably, the knots in the NE jet are mostly 'S-rich' with weak or no [P II] lines, implying that they originated below the explosive Ne burning layer, consistent with the results of previous studies. In the NE jet area, we detected no ejecta knots exhibiting only [Fe II] lines, which are expected in the jet-driven SN explosion model. Instead, we discovered a dozen 'Fe-rich' knots in the Fe K plume area. We propose that they are dense knots produced by complete Si burning with $α$-rich freezeout in the innermost region of the progenitor and ejected with the diffuse X-ray emitting Fe ejecta but decoupled after crossing the reverse shock. In addition to these metal-rich ejecta knots, several knots emitting only He I 1.083 $μ$m lines were detected, and their origin remains unclear. We also detected three extended H emission features of circumstellar or interstellar origin in this area and discuss their association with the supernova remnant.
Submitted 8 May, 2023;
originally announced May 2023.
-
He abundance of Dense Circumstellar Clumps in the Cassiopeia A Supernova Remnant
Authors:
Bon-Chul Koo,
Dongkok Kim,
Sung-Chul Yoon,
John C. Raymond
Abstract:
We report on the result of He abundance analysis of dense circumstellar clumps in the young supernova remnant Cassiopeia A. These clumps, which are called quasi-stationary flocculi (QSFs), are known from previous optical studies to be enriched in He along with N, but the degree of He overabundance relative to H has remained uncertain. For several QSFs with near-infrared spectroscopic data, we have analyzed their He I 1.083 $μ$m/Pa$γ$ ratios together with the ratios of [Fe II] lines by using the Raymond shock code. According to our analysis, He is overabundant relative to H by a factor of $\lesssim 3$ in most of these QSFs. This He abundance of QSFs is consistent with the previous conclusion from the N overabundance that QSFs were ejected when a substantial amount of the H envelope of the progenitor star had been stripped off. We discuss the mass-loss history of the progenitor star and the origin of QSFs.
Submitted 29 March, 2023;
originally announced March 2023.
-
VVS: Video-to-Video Retrieval with Irrelevant Frame Suppression
Authors:
Won Jo,
Geuntaek Lim,
Gwangjin Lee,
Hyunwoo Kim,
Byungsoo Ko,
Yukyung Choi
Abstract:
In content-based video retrieval (CBVR), dealing with large-scale collections, efficiency is as important as accuracy; thus, several video-level feature-based studies have actively been conducted. Nevertheless, owing to the severe difficulty of embedding a lengthy and untrimmed video into a single feature, these studies have been insufficient for accurate retrieval compared to frame-level feature-based studies. In this paper, we show that appropriate suppression of irrelevant frames can provide insight into the current obstacles of the video-level approaches. Furthermore, we propose a Video-to-Video Suppression network (VVS) as a solution. VVS is an end-to-end framework that consists of an easy distractor elimination stage to identify which frames to remove and a suppression weight generation stage to determine the extent to suppress the remaining frames. This structure is intended to effectively describe an untrimmed video with varying content and meaningless information. Its efficacy is proved via extensive experiments, and we show that our approach is not only state-of-the-art in video-level approaches but also has a fast inference time despite possessing retrieval capabilities close to those of frame-level approaches. Code is available at https://github.com/sejong-rcv/VVS
Submitted 19 December, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Disentangling Structural Breaks in Factor Models for Macroeconomic Data
Authors:
Bonsoo Koo,
Benjamin Wong,
Ze-Yu Zhong
Abstract:
Through a routine normalization of the factor variance, standard methods for estimating factor models in macroeconomics do not distinguish between breaks of the factor variance and factor loadings. We argue that it is important to distinguish between structural breaks in the factor variance and loadings within factor models commonly employed in macroeconomics as both can lead to markedly different interpretations when viewed via the lens of the underlying dynamic factor model. We then develop a projection-based decomposition that leads to two standard and easy-to-implement Wald tests to disentangle structural breaks in the factor variance and factor loadings. Applying our procedure to U.S. macroeconomic data, we find evidence of both types of breaks associated with the Great Moderation and the Great Recession. Through our projection-based decomposition, we estimate that the Great Moderation is associated with an over 60% reduction in the total factor variance, highlighting the relevance of disentangling breaks in the factor structure.
Submitted 3 June, 2024; v1 submitted 28 February, 2023;
originally announced March 2023.
-
3D Generative Model Latent Disentanglement via Local Eigenprojection
Authors:
Simone Foti,
Bongjin Koo,
Danail Stoyanov,
Matthew J. Clarkson
Abstract:
Designing realistic digital humans is extremely complex. Most data-driven generative models used to simplify the creation of their underlying geometric shape do not offer control over the generation of local shape attributes. In this paper, we overcome this limitation by introducing a novel loss function grounded in spectral geometry and applicable to different neural-network-based generative models of 3D head and body meshes. Encouraging the latent variables of mesh variational autoencoders (VAEs) or generative adversarial networks (GANs) to follow the local eigenprojections of identity attributes, we improve latent disentanglement and properly decouple the attribute creation. Experimental results show that our local eigenprojection disentangled (LED) models not only offer improved disentanglement with respect to the state-of-the-art, but also maintain good generation capabilities with training times comparable to the vanilla implementations of the models.
Submitted 4 April, 2023; v1 submitted 24 February, 2023;
originally announced February 2023.
-
Search for the Sagittarius Tidal Stream of Axion Dark Matter around 4.55 $μ$eV
Authors:
Andrew K. Yi,
Saebyeok Ahn,
Çağlar Kutlu,
JinMyeong Kim,
Byeong Rok Ko,
Boris I. Ivanov,
HeeSu Byun,
Arjan F. van Loo,
SeongTae Park,
Junu Jeong,
Ohjoon Kwon,
Yasunobu Nakamura,
Sergey V. Uchaikin,
Jihoon Choi,
Soohyung Lee,
MyeongJae Lee,
Yun Chang Shin,
Jinsu Kim,
Doyu Lee,
Danho Ahn,
SungJae Bae,
Jiwon Lee,
Younggeun Kim,
Violeta Gkika,
Ki Woong Lee
, et al. (7 additional authors not shown)
Abstract:
We report the first search for the Sagittarius tidal stream of axion dark matter around 4.55 $μ$eV using CAPP-12TB haloscope data acquired in March of 2022. Our result excluded the Sagittarius tidal stream of Dine-Fischler-Srednicki-Zhitnitskii and Kim-Shifman-Vainshtein-Zakharov axion dark matter densities of $ρ_a\gtrsim0.184$ and $\gtrsim0.025$ GeV/cm$^{3}$, respectively, over a mass range from 4.51 to 4.59 $μ$eV at a 90% confidence level.
Submitted 13 July, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
The Internet of Bio-Nano Things in Blood Vessels: System Design and Prototypes
Authors:
Changmin Lee,
Bon-Hong Koo,
Chan-Byoung Chae,
Robert Schober
Abstract:
In this paper, we investigate the Internet of Bio-Nano Things (IoBNT) which relates to networks formed by molecular communications. By providing a means of communication through the ubiquitously connected blood vessels (arteries, veins, and capillaries), molecular communication-based IoBNT enables a host of new eHealth applications. For example, an organ monitoring sensor can transfer internal body signals through the IoBNT for health monitoring applications. We empirically show that blood vessel channels introduce a new set of challenges for the design of molecular communication systems in comparison to free-space channels. We then propose cylindrical duct channel models and discuss the corresponding system designs conforming to the channel characteristics. Furthermore, based on prototype implementations, we confirm that molecular communication techniques can be utilized for composing the IoBNT. We believe that the promising results presented in this work, together with the rich research challenges that lie ahead, are strong indicators that IoBNT with molecular communications can drive novel applications for emerging eHealth systems.
Submitted 21 December, 2022;
originally announced December 2022.
-
Cross-Modal Learning with 3D Deformable Attention for Action Recognition
Authors:
Sangwon Kim,
Dasom Ahn,
Byoung Chul Ko
Abstract:
An important challenge in vision-based action recognition is the embedding of spatiotemporal features with two or more heterogeneous modalities into a single feature. In this study, we propose a new 3D deformable transformer for action recognition with adaptive spatiotemporal receptive fields and a cross-modal learning scheme. The 3D deformable transformer consists of three attention modules: 3D deformability, local joint stride, and temporal stride attention. The two cross-modal tokens are input into the 3D deformable attention module to create a cross-attention token with a reflected spatiotemporal correlation. Local joint stride attention is applied to spatially combine attention and pose tokens. Temporal stride attention temporally reduces the number of input tokens in the attention module and supports temporal expression learning without the simultaneous use of all tokens. The deformable transformer iterates L times and combines the last cross-modal token for classification. The proposed 3D deformable transformer was tested on the NTU60, NTU120, FineGYM, and PennAction datasets, and showed results better than or similar to pre-trained state-of-the-art methods even without a pre-training process. In addition, visualizing important joints and correlations during action recognition through spatial joint and temporal stride attention shows the potential for explainable action recognition.
Submitted 17 August, 2023; v1 submitted 11 December, 2022;
originally announced December 2022.
-
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset
Authors:
Young-Jun Lee,
Byungsoo Ko,
Han-Gyu Kim,
Jonghwan Hyeon,
Ho-Jin Choi
Abstract:
As sharing images in instant messages is a crucial factor, there has been active research on learning image-text multi-modal dialogue models. However, training a well-generalized multi-modal dialogue model remains challenging due to the low quality and limited diversity of images per dialogue in existing multi-modal dialogue datasets. In this paper, we propose an automated pipeline to construct a multi-modal dialogue dataset, ensuring both dialogue quality and image diversity while requiring minimal human effort. In our pipeline, to guarantee the coherence between images and dialogue, we prompt GPT-4 to infer potential image-sharing moments - specifically, the utterance, speaker, rationale, and image description. Furthermore, we leverage CLIP similarity to maintain consistency among the multiple images aligned to the utterance. Through this pipeline, we introduce DialogCC, a high-quality and diverse multi-modal dialogue dataset that surpasses existing datasets in terms of quality and diversity in human evaluation. Our comprehensive experiments highlight that when multi-modal dialogue models are trained using our dataset, their generalization performance on unseen dialogue datasets is significantly enhanced. We make our source code and dataset publicly available.
Submitted 29 March, 2024; v1 submitted 8 December, 2022;
originally announced December 2022.
-
Group Generalized Mean Pooling for Vision Transformer
Authors:
Byungsoo Ko,
Han-Gyu Kim,
Byeongho Heo,
Sangdoo Yun,
Sanghyuk Chun,
Geonmo Gu,
Wonjae Kim
Abstract:
Vision Transformer (ViT) extracts the final representation from either the class token or an average of all patch tokens, following the architecture of Transformer in Natural Language Processing (NLP) or Convolutional Neural Networks (CNNs) in computer vision. However, studies on the best way of aggregating the patch tokens are still limited to average pooling, while widely-used pooling strategies, such as max and GeM pooling, can be considered. Despite their effectiveness, the existing pooling strategies do not consider the architecture of ViT and the channel-wise difference in the activation maps, aggregating the crucial and trivial channels with the same importance. In this paper, we present Group Generalized Mean (GGeM) pooling as a simple yet powerful pooling strategy for ViT. GGeM divides the channels into groups and computes GeM pooling with a shared pooling parameter per group. As ViT groups the channels via a multi-head attention mechanism, grouping the channels by GGeM leads to lower head-wise dependence while amplifying important channels on the activation maps. Exploiting GGeM shows 0.1%p to 0.7%p performance boosts compared to the baselines and achieves state-of-the-art performance for ViT-Base and ViT-Large models on the ImageNet-1K classification task. Moreover, GGeM outperforms the existing pooling strategies on image retrieval and multi-modal representation learning tasks, demonstrating the superiority of GGeM for a variety of tasks. GGeM is a simple algorithm in that only a few lines of code are necessary for implementation.
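Illustrative sketch (not the authors' code): the grouping idea described in the abstract can be written in a few lines of plain Python. The function name `ggem_pool`, the list-of-lists token layout, the fixed (non-learned) per-group powers, and the clamping of activations to a small `eps` are all assumptions made for this example; the paper's implementation may differ on each point.

```python
from typing import List, Sequence


def ggem_pool(tokens: Sequence[Sequence[float]],
              powers: Sequence[float],
              eps: float = 1e-6) -> List[float]:
    """Pool patch tokens (n_tokens x n_channels) into a single vector.

    Channels are split into len(powers) equal, contiguous groups, and every
    channel in group g is pooled with the generalized mean of power powers[g]:
        out[c] = (mean_n max(x[n][c], eps) ** p_g) ** (1 / p_g)
    The grouping layout and the eps clamp are assumptions of this sketch.
    """
    if not tokens:
        raise ValueError("need at least one token")
    n_tokens = len(tokens)
    n_channels = len(tokens[0])
    n_groups = len(powers)
    if n_groups == 0 or n_channels % n_groups != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    group_size = n_channels // n_groups

    pooled = []
    for c in range(n_channels):
        p = powers[c // group_size]  # pooling parameter shared within the group
        mean_pow = sum(max(row[c], eps) ** p for row in tokens) / n_tokens
        pooled.append(mean_pow ** (1.0 / p))
    return pooled


# Two tokens with four channels, split into two groups of two channels each.
tokens = [[1.0, 2.0, 3.0, 4.0],
          [3.0, 2.0, 1.0, 4.0]]
avg_like = ggem_pool(tokens, powers=[1.0, 1.0])  # p = 1 reduces to average pooling
mixed = ggem_pool(tokens, powers=[3.0, 1.0])     # first group leans toward the max
print(avg_like, mixed)
```

With p = 1 the generalized mean is the ordinary average, and as p grows it approaches max pooling, so a separate power per group lets different channel groups sit at different points between the two.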
Submitted 8 December, 2022;
originally announced December 2022.
-
Towards Neural Decoding of Imagined Speech based on Spoken Speech
Authors:
Seo-Hyun Lee,
Young-Eun Lee,
Soowon Kim,
Byung-Kwan Ko,
Seong-Whan Lee
Abstract:
Decoding imagined speech from human brain signals is a challenging and important issue that may enable human communication via brain signals. While imagined speech can be the paradigm for silent communication via brain signals, it is always hard to collect enough stable data to train the decoding model. Meanwhile, spoken speech data is relatively easy to obtain, implying the significance of utilizing spoken speech brain signals to decode imagined speech. In this paper, we performed a preliminary analysis to find out whether it would be possible to utilize spoken speech electroencephalography data to decode imagined speech, by simply applying the pre-trained model trained with spoken speech brain signals to decode imagined speech. While the classification performance with imagined speech data used solely for training and validation was 30.5 %, the transferred performance of the spoken speech based classifier on imagined speech data displayed an average accuracy of 26.8 %, which did not differ statistically significantly from the imagined speech based classifier (p = 0.0983, chi-square = 4.64). For a more comprehensive analysis, we compared the result with the visual imagery dataset, which would naturally be less related to spoken speech than imagined speech is. As a result, visual imagery showed a solely trained performance of 31.8 % and a transferred performance of 26.3 %, a statistically significant difference (p = 0.022, chi-square = 7.64). Our results imply the potential of applying spoken speech to decode imagined speech, as well as their underlying common features.
Submitted 14 February, 2023; v1 submitted 5 December, 2022;
originally announced December 2022.
-
Axion Dark Matter Search around 4.55 $μ$eV with Dine-Fischler-Srednicki-Zhitnitskii Sensitivity
Authors:
Andrew K. Yi,
Saebyeok Ahn,
Çağlar Kutlu,
JinMyeong Kim,
Byeong Rok Ko,
Boris I. Ivanov,
HeeSu Byun,
Arjan F. van Loo,
SeongTae Park,
Junu Jeong,
Ohjoon Kwon,
Yasunobu Nakamura,
Sergey V. Uchaikin,
Jihoon Choi,
Soohyung Lee,
MyeongJae Lee,
Yun Chang Shin,
Jinsu Kim,
Doyu Lee,
Danho Ahn,
SungJae Bae,
Jiwon Lee,
Younggeun Kim,
Violeta Gkika,
Ki Woong Lee
, et al. (7 additional authors not shown)
Abstract:
We report an axion dark matter search at Dine-Fischler-Srednicki-Zhitnitskii sensitivity with the CAPP-12TB haloscope, assuming axions contribute 100\% of the local dark matter density.
The search excluded the axion--photon coupling $g_{aγγ}$ down to about $6.2\times10^{-16}$ GeV$^{-1}$ over the axion mass range between 4.51 and 4.59 $μ$eV at a 90\% confidence level.
The achieved experimental sensitivity can also exclude Kim-Shifman-Vainshtein-Zakharov axion dark matter that makes up just 13\% of the local dark matter density.
The CAPP-12TB haloscope will continue the search over a wide range of axion masses.
Submitted 16 February, 2023; v1 submitted 19 October, 2022;
originally announced October 2022.
-
STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition
Authors:
Dasom Ahn,
Sangwon Kim,
Hyunsu Hong,
Byoung Chul Ko
Abstract:
In action recognition, although the combination of spatio-temporal videos and skeleton features can improve the recognition performance, a separate model and balancing feature representation for cross-modal data are required. To solve these problems, we propose Spatio-TemporAl cRoss (STAR)-transformer, which can effectively represent two cross-modal features as a recognizable vector. First, from the input video and skeleton sequence, video frames are output as global grid tokens and skeletons are output as joint map tokens, respectively. These tokens are then aggregated into multi-class tokens and input into STAR-transformer. The STAR-transformer encoder layer consists of a full self-attention (FAttn) module and a proposed zigzag spatio-temporal attention (ZAttn) module. Similarly, the continuous decoder consists of a FAttn module and a proposed binary spatio-temporal attention (BAttn) module. STAR-transformer learns an efficient multi-feature representation of the spatio-temporal features by properly arranging pairings of the FAttn, ZAttn, and BAttn modules. Experimental results on the Penn-Action, NTU RGB+D 60, and 120 datasets show that the proposed method achieves a promising improvement in performance in comparison to previous state-of-the-art methods.
Submitted 14 October, 2022;
originally announced October 2022.
-
Granularity-aware Adaptation for Image Retrieval over Multiple Tasks
Authors:
Jon Almazán,
Byungsoo Ko,
Geonmo Gu,
Diane Larlus,
Yannis Kalantidis
Abstract:
Strong image search models can be learned for a specific domain, i.e., a set of labels, provided that some labeled images of that domain are available. A practical visual search model, however, should be versatile enough to solve multiple retrieval tasks simultaneously, even if those cover very different specialized domains. Additionally, it should be able to benefit from even unlabeled images from these various retrieval tasks. This is the more practical scenario that we consider in this paper. We address it with the proposed Grappa, an approach that starts from a strong pretrained model, and adapts it to tackle multiple retrieval tasks concurrently, using only unlabeled images from the different task domains. We extend the pretrained model with multiple independently trained sets of adaptors that use pseudo-label sets of different sizes, effectively mimicking different pseudo-granularities. We reconcile all adaptor sets into a single unified model suited for all retrieval tasks by learning fusion layers that we guide by propagating pseudo-granularity attentions across neighbors in the feature space. Results on a benchmark composed of six heterogeneous retrieval tasks show that the unsupervised Grappa model improves the zero-shot performance of a state-of-the-art self-supervised learning model, and in some places reaches or improves over a task label-aware oracle that selects the most fitting pseudo-granularity per task.
Submitted 5 October, 2022;
originally announced October 2022.
-
Heterogeneous reconstruction of deformable atomic models in Cryo-EM
Authors:
Youssef Nashed,
Ariana Peck,
Julien Martel,
Axel Levy,
Bongjin Koo,
Gordon Wetzstein,
Nina Miolane,
Daniel Ratner,
Frédéric Poitevin
Abstract:
Cryogenic electron microscopy (cryo-EM) provides a unique opportunity to study the structural heterogeneity of biomolecules. Being able to explain this heterogeneity with atomic models would help our understanding of their functional mechanisms, but the size and ruggedness of the structural space (the space of atomic 3D Cartesian coordinates) present an immense challenge. Here, we describe a heterogeneous reconstruction method based on an atomistic representation whose deformation is reduced to a handful of collective motions through normal mode analysis. Our implementation uses an autoencoder. The encoder jointly estimates the amplitude of motion along the normal modes and the 2D shift between the center of the image and the center of the molecule. The physics-based decoder aggregates a representation of the heterogeneity readily interpretable at the atomic level. We illustrate our method on 3 synthetic datasets corresponding to different distributions along a simulated trajectory of adenylate kinase transitioning from its open to its closed structures. We show for each distribution that our approach is able to recapitulate the intermediate atomic models with atomic-level accuracy.
Submitted 29 September, 2022;
originally announced September 2022.
-
What Impulse Response Do Instrumental Variables Identify?
Authors:
Bonsoo Koo,
Seojeong Lee,
Myung Hwan Seo
Abstract:
Macro shocks are often composites, yet overlooked in the impulse response analysis. When an instrumental variable (IV) is used to identify a composite shock, it violates the common IV exclusion restriction. We show that the Local Projection-IV estimand is represented as a weighted average of component-wise impulse responses but with possibly negative weights, which occur when the IV and shock components have opposite correlations. We further develop alternative (set-) identification strategies for the LP-IV based on sign restrictions or additional granular information. Our applications confirm the composite nature of monetary policy shocks and reveal a non-defense spending multiplier exceeding one.
Submitted 23 August, 2023; v1 submitted 24 August, 2022;
originally announced August 2022.
-
Massive MIMO Channel Prediction Using Machine Learning: Power of Domain Transformation
Authors:
Beomsoo Ko,
Hwanjin Kim,
Junil Choi
Abstract:
To compensate for the loss from outdated channel state information in wideband massive multiple-input multiple-output (MIMO) systems, channel prediction can be performed by leveraging the temporal correlation of wireless channels. Machine learning (ML)-based channel predictors for massive MIMO systems were designed recently; however, the time overhead to collect a large amount of training data directly affects the latency of the system. In this paper, we propose a novel ML-based channel prediction technique, which can reduce the time overhead to collect the training data by transforming the domain of channels from subcarrier to antenna in wideband massive MIMO systems. Numerical results show that the proposed technique can not only reduce the time overhead but also give additional performance gain compared to the ML-based channel prediction techniques without the domain transformation.
Submitted 9 August, 2022;
originally announced August 2022.
-
Coverage Increase at THz Frequencies: A Cooperative Rate-Splitting Approach
Authors:
Hyesang Cho,
Beomsoo Ko,
Bruno Clerckx,
Junil Choi
Abstract:
Numerous studies claim that terahertz (THz) communication will be an essential piece of sixth-generation wireless communication systems. Its promising potential also comes with major challenges, in particular the reduced coverage due to harsh propagation loss, hardware constraints, and blockage vulnerability. To increase the coverage of THz communication, we revisit cooperative communication. We propose a new type of cooperative rate-splitting (CRS) called extraction-based CRS (eCRS). Furthermore, we explore two extreme cases of eCRS, namely, identical eCRS and distinct eCRS. To enable the proposed eCRS framework, we design a novel THz cooperative channel model by considering unique characteristics of THz communication. Through mathematical derivations and convex optimization techniques considering the THz cooperative channel model, we derive local optimal solutions for the two cases of eCRS and a global optimal closed form solution for a specific scenario. Finally, we propose a novel channel estimation technique that not only specifies the channel value, but also the time delay of the channel from each cooperating user equipment to fully utilize the THz cooperative channel. In simulation results, we verify the validity of the two cases of our proposed framework and channel estimation technique.
Submitted 9 August, 2022;
originally announced August 2022.
-
Ice features of low-luminosity protostars in near-infrared spectra of AKARI/IRC
Authors:
Jaeyeong Kim,
Jeong-Eun Lee,
Woong-Seob Jeong,
Il-Seok Kim,
Yuri Aikawa,
Jeniffer A. Noble,
Minho Choi,
Ho-Gyu Lee,
Michael M. Dunham,
Chul-Hwan Kim,
Bon-Chul Koo
Abstract:
We present near-infrared spectra of three low-luminosity protostars and one background star in the Perseus molecular cloud, acquired using the Infrared Camera (IRC) onboard the \textit{AKARI} space telescope. For comparison with different star-forming environments, we also present spectra of the massive protostar AFGL 7009S, where the protostellar envelope is heated significantly, and the low-mass protostar RNO 91, which is suspected to be undergoing an episodic burst. We detected ice absorption features of \ch{H2O}, \ch{CO2}, and \ch{CO} in all spectra around the wavelengths of 3.05, 4.27, and 4.67 $μ$m, respectively. In at least two low-luminosity protostars, we also detected the \ch{XCN} ice feature at 4.62 $μ$m. The presence of the crystalline \ch{H2O} ice and \ch{XCN} ice components indicates that the low-luminosity protostars experienced a hot phase via accretion bursts during the past mass accretion process. We compared the ice abundances of low-luminosity protostars with those of the embedded low-mass protostars and the dense molecular clouds and cores, suggesting that their ice abundances reflect the strength of prior bursts and the timescale after the last burst.
Submitted 11 July, 2022;
originally announced July 2022.
-
Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes
Authors:
Byeong-Yun Ko,
Hyeonuk Nam,
Seong-Hu Kim,
Deokki Min,
Seung-Deok Choi,
Yong-Hwa Park
Abstract:
Performance of sound event localization and detection (SELD) in real scenes is limited by the small size of SELD datasets, due to the difficulty of obtaining a sufficient amount of realistic multi-channel audio recordings with accurate labels. We used two main strategies to solve problems arising from the small real SELD dataset. First, we applied various data augmentation methods on all data dimensions: channel, frequency and time. We also propose an original data augmentation method named Moderate Mixup in order to simulate situations where a noise floor or interfering events exist. Second, we applied a Squeeze-and-Excitation block on the channel and frequency dimensions to efficiently extract feature characteristics. On the STARSS22 test dataset, our trained models achieved the best ER, F1, LE, and LR of 0.53, 49.8%, 16.0 deg., and 56.2%, respectively.
Submitted 23 June, 2022;
originally announced June 2022.
-
Frequency Dependent Sound Event Detection for DCASE 2022 Challenge Task 4
Authors:
Hyeonuk Nam,
Seong-Hu Kim,
Deokki Min,
Byeong-Yun Ko,
Seung-Deok Choi,
Yong-Hwa Park
Abstract:
While many deep learning methods from other domains have been applied to sound event detection (SED), differences between the original domains of the methods and SED have not been appropriately considered so far. As SED uses audio data with two dimensions (time and frequency) as input, a thorough comprehension of these two dimensions is essential for applying methods from other domains to SED. Previous works proved that methods that address the frequency dimension are especially powerful in SED. By applying FilterAugment and frequency dynamic convolution, which are frequency-dependent methods proposed to enhance SED performance, our submitted models achieved a best PSDS1 of 0.4704 and a best PSDS2 of 0.8224.
Submitted 23 June, 2022;
originally announced June 2022.
-
Efficient Scheduling of Data Augmentation for Deep Reinforcement Learning
Authors:
Byungchan Ko,
Jungseul Ok
Abstract:
In deep reinforcement learning (RL), data augmentation is widely considered as a tool to induce a set of useful priors about semantic consistency and improve sample efficiency and generalization performance. However, even when the prior is useful for generalization, distilling it to the RL agent often interferes with RL training and degrades sample efficiency. Meanwhile, the agent is forgetful of the prior due to the non-stationary nature of RL. These observations suggest two extreme schedules of distillation: (i) over the entire training; or (ii) only at the end. Hence, we devise a stand-alone network distillation method to inject the consistency prior at any time (even after RL), and a simple yet efficient framework to automatically schedule the distillation. Specifically, the proposed framework first focuses on mastering train environments regardless of generalization by adaptively deciding which (or no) augmentation to use for the training. After this, we add the distillation to extract the remaining benefits for generalization from all the augmentations, which requires no additional new samples. In our experiments, we demonstrate the utility of the proposed framework, in particular of the variant that considers postponing the augmentation to the end of RL training.
Submitted 1 March, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Frequency Dynamic Convolution: Frequency-Adaptive Pattern Recognition for Sound Event Detection
Authors:
Hyeonuk Nam,
Seong-Hu Kim,
Byeong-Yun Ko,
Yong-Hwa Park
Abstract:
2D convolution is widely used in sound event detection (SED) to recognize two-dimensional time-frequency patterns of sound events. However, 2D convolution enforces translation equivariance on sound events along both the time and frequency axes, although frequency is not a shift-invariant dimension. In order to improve the physical consistency of 2D convolution on SED, we propose frequency dynamic convolution, which applies kernels that adapt to the frequency components of the input. Frequency dynamic convolution outperforms the baseline by 6.3% on the DESED validation dataset in terms of polyphonic sound detection score (PSDS). It also significantly outperforms other pre-existing content-adaptive methods on SED. In addition, by comparing class-wise F1 scores of the baseline and frequency dynamic convolution, we showed that frequency dynamic convolution is especially effective for detection of non-stationary sound events with intricate time-frequency patterns. From this result, we verified that frequency dynamic convolution is superior in recognizing frequency-dependent patterns.
Submitted 3 July, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Large-scale Bilingual Language-Image Contrastive Learning
Authors:
Byungsoo Ko,
Geonmo Gu
Abstract:
This paper is a technical report to share our experience and findings building a Korean and English bilingual multimodal model. While many multimodal datasets focus on English and multilingual multimodal research uses machine-translated texts, such machine-translated texts are limited in describing unique expressions, cultural information, and proper nouns in languages other than English. In this work, we collect 1.1 billion image-text pairs (708 million Korean and 476 million English) and train a bilingual multimodal model named KELIP. We introduce simple yet effective training schemes, including MAE pre-training and multi-crop augmentation. Extensive experiments demonstrate that a model trained with such training schemes shows competitive performance in both languages. Moreover, we discuss multimodal-related research questions: 1) strong augmentation-based methods can distract the model from learning proper multimodal relations; 2) a multimodal model trained without cross-lingual relations can learn the relations via visual semantics; 3) our bilingual KELIP can capture cultural differences of visual semantics for the same meaning of words; 4) a large-scale multimodal model can be used for multimodal feature analogy. We hope that this work will provide helpful experience and findings for future research. We provide an open-source pre-trained KELIP.
Submitted 14 April, 2022; v1 submitted 27 March, 2022;
originally announced March 2022.
-
Unusually high HCO+/CO ratios in and outside supernova remnant W49B
Authors:
Ping Zhou,
Gao-Yuan Zhang,
Xin Zhou,
Maria Arias,
Bon-Chul Koo,
Jacco Vink,
Zhi-Yu Zhang,
Lei Sun,
Fu-Jun Du,
Hui Zhu,
Yang Chen,
Stefano Bovino,
Yong-Hyun Lee
Abstract:
Galactic supernova remnants (SNRs) and their environments provide the nearest laboratories to study SN feedback. We performed molecular observations toward SNR W49B, the most luminous Galactic SNR in the X-ray band, aiming to explore signs of multiple feedback channels of SNRs on nearby molecular clouds (MCs). We found very broad HCO+ lines with widths of dv = 48--75 km/s in the SNR southwest, providing strong evidence that W49B is perturbing MCs at a systemic velocity of $V_{LSR}=61$--65 km/s, and placing W49B at a distance of $7.9\pm 0.6$ kpc. We observed unusually high-intensity ratios of HCO+ J=1-0/CO J=1-0 not only at shocked regions ($1.1\pm 0.4$ and $0.70\pm 0.16$), but also in quiescent clouds over 1 pc away from the SNR's eastern boundary (> 0.2). By comparing with the magnetohydrodynamics shock models, we interpret that the high ratio in the broad-line regions can result from a cosmic-ray (CR) induced chemistry in shocked MCs, where the CR ionization rate is enhanced to around 10--100 times of the Galactic level. The high HCO+/CO ratio outside the SNR is probably caused by the radiation precursor, while the luminous X-ray emission of W49B can explain a few properties in this region. The above results provide observational evidence that SNRs can strongly influence the molecular chemistry in and outside the shock boundary via their shocks, CRs, and radiation. We propose that the HCO+/CO ratio is a potentially useful tool to probe an SNR's multichannel influence on MCs.
Submitted 25 May, 2022; v1 submitted 24 March, 2022;
originally announced March 2022.
-
Quality Control of Mass-Produced GEM Detectors for the CMS GE1/1 Muon Upgrade
Authors:
M. Abbas,
M. Abbrescia,
H. Abdalla,
A. Abdelalim,
S. AbuZeid,
A. Agapitos,
A. Ahmad,
A. Ahmed,
W. Ahmed,
C. Aimè,
C. Aruta,
I. Asghar,
P. Aspell,
C. Avila,
J. Babbar,
Y. Ban,
R. Band,
S. Bansal,
L. Benussi,
T. Beyrouthy,
V. Bhatnagar,
M. Bianco,
S. Bianco,
K. Black,
L. Borgonovi
, et al. (157 additional authors not shown)
Abstract:
The series of upgrades to the Large Hadron Collider, culminating in the High Luminosity Large Hadron Collider, will enable a significant expansion of the physics program of the CMS experiment. However, the accelerator upgrades will also make the experimental conditions more challenging, with implications for detector operations, triggering, and data analysis. The luminosity of the proton-proton collisions is expected to exceed $2-3\times10^{34}$~cm$^{-2}$s$^{-1}$ for Run 3 (starting in 2022), and it will be at least $5\times10^{34}$~cm$^{-2}$s$^{-1}$ when the High Luminosity Large Hadron Collider is completed for Run 4. These conditions will affect muon triggering, identification, and measurement, which are critical capabilities of the experiment. To address these challenges, additional muon detectors are being installed in the CMS endcaps, based on Gas Electron Multiplier technology. For this purpose, 161 large triple-Gas Electron Multiplier detectors have been constructed and tested. Installation of these devices began in 2019 with the GE1/1 station and will be followed by two additional stations, GE2/1 and ME0, to be installed in 2023 and 2026, respectively. The assembly and quality control of the GE1/1 detectors were distributed across several production sites around the world. We motivate and discuss the quality control procedures that were developed to standardize the performance of the detectors, and we present the final results of the production. Out of 161 detectors produced, 156 detectors passed all tests, and 144 detectors are now installed in the CMS experiment. The various visual inspections, gas tightness tests, intrinsic noise rate characterizations, and effective gas gain and response uniformity tests allowed the project to achieve this high success rate.
Submitted 22 March, 2022;
originally announced March 2022.
-
HRTF measurement for accurate sound localization cues
Authors:
Gyeong-Tae Lee,
Sang-Min Choi,
Byeong-Yun Ko,
Yong-Hwa Park
Abstract:
A new database of head-related transfer functions (HRTFs) for accurate sound source localization is presented through precise measurement and post-processing in terms of improved frequency bandwidth and causality of head-related impulse responses (HRIRs) for accurate spectral cue (SC) and interaural time difference (ITD), respectively. The improvement effects of the proposed methods on binaural sound localization cues were investigated. To achieve sufficient frequency bandwidth with a single source, a one-way sealed speaker module was designed to obtain wide band frequency response based on electro-acoustics, whereas most existing HRTF databases rely on a two-way vented loudspeaker that has multiple sources. The origin transfer function at the head center was obtained by the proposed measurement scheme using a 0 degree on-axis microphone to ensure accurate spectral cue pattern of HRTFs, whereas in the previous measurements with a 90 degree off-axis microphone, the magnitude response of the origin transfer function fluctuated and decreased with increasing frequency, causing erroneous SCs of HRTFs. To prevent discontinuity of ITD due to non-causality of ipsilateral HRTFs, obtained HRIRs were circularly shifted by time delay considering the head radius of the measurement subject. Finally, various sound localization cues such as ITD, interaural level difference (ILD), SC, and horizontal plane directivity (HPD) were derived from the presented HRTFs, and improvements on binaural sound localization cues were examined. As a result, accurate SC patterns of HRTFs were confirmed through the proposed measurement scheme using the 0 degree on-axis microphone, and continuous ITD patterns were obtained due to the non-causality compensation. Source codes and presented HRTF database are available to relevant research groups at GitHub (https://github.com/han-saram/HRTF-HATS-KAIST).
Submitted 5 April, 2022; v1 submitted 7 March, 2022;
originally announced March 2022.
-
High-temperature superconductivity in hydrides: experimental evidence and details
Authors:
M. I. Eremets,
V. S. Minkov,
A. P. Drozdov,
P. P. Kong,
V. Ksenofontov,
S. I. Shylin,
S. L. Bud'ko,
R. Prozorov,
F. F. Balakirev,
Dan Sun,
S. Mozaffari,
L. Balicas
Abstract:
Since the discovery of superconductivity at 200 K in H3S [1] similar or higher transition temperatures, Tcs, have been reported for various hydrogen-rich compounds under ultra-high pressures [2]. Superconductivity was experimentally proved by different methods, including electrical resistance, magnetic susceptibility, optical infrared, and nuclear resonant scattering measurements. The crystal structures of superconducting phases were determined by X-ray diffraction. Numerous electrical transport measurements demonstrate the typical behaviour of a conventional phonon-mediated superconductor: zero resistance below Tc, the shift of Tc to lower temperatures under external magnetic fields, and pronounced isotope effect. Remarkably, the results are in good agreement with the theoretical predictions, which describe superconductivity in hydrides within the framework of the conventional BCS theory. However, despite this acknowledgment, experimental evidence for the superconducting state in these compounds has recently been treated with criticism [3, 4], which apparently stems from misunderstanding and misinterpretation of complicated experiments performed under very high pressures. Here, we describe in greater detail the experiments revealing high-temperature superconductivity in hydrides under high pressures. We show that the arguments against superconductivity [3, 4] can be either refuted or explained. The experiments on the high-temperature superconductivity in hydrides clearly contradict the theory of hole superconductivity [4] and eliminate it [3].
Submitted 13 January, 2022;
originally announced January 2022.