Search | arXiv e-print repository

D2LLM: Decomposed and Distilled Large Language Models for Semantic Search

Authors: Zihan Liao, Hang Yu, Jianguo Li, Jun Wang, Wei Zhang

Abstract: The key challenge in semantic search is to create models that are both accurate and efficient in pinpointing relevant sentences for queries. While BERT-style bi-encoders excel in efficiency with pre-computed embeddings, they often miss subtle nuances in search tasks. Conversely, GPT-style LLMs with cross-encoder designs capture these nuances but are computationally intensive, hindering real-time applications. In this paper, we present D2LLMs (Decomposed and Distilled LLMs for semantic search), which combine the best of both worlds. We decompose a cross-encoder into an efficient bi-encoder integrated with Pooling by Multihead Attention and an Interaction Emulation Module, achieving nuanced understanding and pre-computability. Knowledge from the LLM is distilled into this model using contrastive, rank, and feature imitation techniques. Our experiments show that D2LLM surpasses five leading baselines on all metrics across three tasks, particularly improving NLI task performance by at least 6.45%. The source code is available at https://github.com/codefuse-ai/D2LLM.
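Pooling by Multihead Attention (PMA) turns a variable-length sequence of token embeddings into one fixed-size vector by letting a learned "seed" query attend over the tokens, which is what allows passage embeddings to be pre-computed on the bi-encoder side. The following is a minimal, dependency-free sketch, assuming a single seed, identity Q/K/V projections, and no feed-forward layer; the function name `pma_pool` is hypothetical and this is not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pma_pool(tokens: List[Vector], seed: Vector, num_heads: int) -> Vector:
    """Pool token embeddings into one vector with a seed query (simplified PMA).

    The embedding dimension is split evenly across heads; each head's slice of
    the seed attends over the same slice of every token, and the head outputs
    are concatenated. Learned projections are omitted (identity) for brevity.
    """
    dim = len(seed)
    if dim % num_heads:
        raise ValueError("embedding dim must be divisible by num_heads")
    head_dim = dim // num_heads
    pooled: Vector = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = seed[lo:hi]
        # Scaled dot-product score between the seed slice and each token slice.
        scores = [sum(a * b for a, b in zip(q, t[lo:hi])) / math.sqrt(head_dim)
                  for t in tokens]
        weights = softmax(scores)
        # Attention-weighted sum of token slices (keys = values = tokens here).
        for d in range(lo, hi):
            pooled.append(sum(w * t[d] for w, t in zip(weights, tokens)))
    return pooled
```

With an all-zero seed the weights are uniform and the result is the mean of the tokens; a seed aligned with one token concentrates the weight on it. In a trained model the seed and projections are learned, so the pooling can emphasize whichever tokens matter for retrieval.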

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16287 [pdf, other]

Energetic Spectral-Element Time Marching Methods for Phase-Field Nonlinear Gradient Systems

Authors: Shiqin Liu, Haijun Yu

Abstract: We propose two efficient energetic spectral-element methods in time for marching nonlinear gradient systems, using the phase-field Allen–Cahn equation as an example: one fully implicit nonlinear method and one semi-implicit linear method. Unlike other spectral methods in time that use spectral Petrov–Galerkin or weighted Galerkin approximations, the presented implicit method employs an energetic variational Galerkin form that maintains the mass conservation and energy dissipation properties of the continuous dynamical system. Another advantage of this method is its superconvergence. A high-order extrapolation is adopted for the nonlinear term to obtain the semi-implicit method. The semi-implicit method does not have superconvergence, but it can be improved by a few Picard-like iterations to recover the superconvergence of the implicit method. Numerical experiments verify that the method using Legendre elements of degree three outperforms the 4th-order implicit-explicit backward differentiation formula and the 4th-order exponential time difference Runge–Kutta method, which were known to have the best performance in solving phase-field equations. In addition to the standard Allen–Cahn equation, we also apply the method to a conservative Allen–Cahn equation, for which the conservation of discrete total mass is verified. The applications of the proposed methods are not limited to phase-field Allen–Cahn equations; they are suitable for solving general, large-scale nonlinear dynamical systems.
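The energy-dissipation property that an energetic Galerkin form is designed to inherit can be seen directly on the continuous problem. The following short derivation assumes the common double-well form of the Allen–Cahn equation with periodic or homogeneous Neumann boundary conditions, so that the boundary term from integration by parts vanishes. The paper's scaling and boundary conditions may differ.

```latex
\begin{aligned}
&\partial_t u = \varepsilon^2 \Delta u - f(u), \qquad f(u) = F'(u) = u^3 - u, \qquad F(u) = \tfrac14\,(u^2 - 1)^2,\\
&E(u) = \int_\Omega \Big( \tfrac{\varepsilon^2}{2}\,|\nabla u|^2 + F(u) \Big)\,dx,\\
&\frac{dE}{dt} = \int_\Omega \big( \varepsilon^2\, \nabla u \cdot \nabla u_t + f(u)\, u_t \big)\,dx
 = \int_\Omega \big( -\varepsilon^2 \Delta u + f(u) \big)\, u_t \,dx
 = -\int_\Omega |u_t|^2\,dx \;\le\; 0 .
\end{aligned}
```

A time discretization that reproduces the last identity at the discrete level cannot increase the energy, whatever step size is used.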

Submitted 23 June, 2024; originally announced June 2024.

Comments: 28 pages, 10 figures

arXiv:2406.16242 [pdf, other]

Foliation of area minimizing hypersurfaces in asymptotically flat manifolds and Schoen's conjecture

Authors: Shihang He, Yuguang Shi, Haobin Yu

Abstract: In this paper, we demonstrate that any asymptotically flat manifold $(M^n, g)$ with $4\leq n\leq 7$ can be foliated by a family of area-minimizing hypersurfaces, each of which is asymptotic to Cartesian coordinate hyperplanes defined at an end of $(M^n, g)$. As an application of this foliation, we show that for any asymptotically flat manifold $(M^n, g)$ with $4\leq n\leq 7$, nonnegative scalar curvature, and positive mass, the solution of the free boundary problem for area-minimizing hypersurfaces in the coordinate cylinder $C_{R_i}$ in $(M^n, g)$ either does not exist or drifts to infinity of $(M^n, g)$ as $R_i$ tends to infinity. Additionally, we introduce a concept of globally minimizing hypersurfaces in $(M^n, g)$ and verify a version of the Schoen Conjecture.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 39 pages, 8 figures. Comments are welcome!

arXiv:2406.15363 [pdf]

Exploring LLM Multi-Agents for ICD Coding

Authors: Rumeng Li, Xun Wang, Hong Yu

Abstract: Large Language Models (LLMs) have demonstrated impressive and diverse abilities that can benefit various domains, such as zero- and few-shot information extraction from clinical text without domain-specific training. However, for the ICD coding task, they often hallucinate key details and produce high-recall but low-precision results due to the high-dimensional and skewed distribution of ICD codes. Existing LLM-based methods fail to account for the complex and dynamic interactions among the human agents involved in coding, such as patients, physicians, and coders, and they lack interpretability and reliability. In this paper, we present a novel multi-agent method for ICD coding, which mimics the real-world coding process with five agents: a patient agent, a physician agent, a coder agent, a reviewer agent, and an adjuster agent. Each agent has a specific function and uses an LLM-based model to perform it. We evaluate our method on the MIMIC-III dataset and show that our proposed multi-agent coding framework substantially improves performance on both common and rare codes compared to zero-shot Chain of Thought (CoT) prompting and self-consistency with CoT. An ablation study confirms the efficacy of the proposed agent roles. Our method also matches state-of-the-art ICD coding methods that require pre-training or fine-tuning in terms of coding accuracy, rare-code accuracy, and explainability.

Submitted 1 April, 2024; originally announced June 2024.

arXiv:2406.13972 [pdf, other]

CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors

Authors: Boyang Yang, Haoye Tian, Weiguo Pian, Haoran Yu, Haitao Wang, Jacques Klein, Tegawendé F. Bissyandé, Shunfu Jin

Abstract: Program repair techniques offer cost-saving benefits for debugging in software development and programming education. With the proven effectiveness of Large Language Models (LLMs) in code-related tasks, researchers have explored their potential for program repair. However, existing repair benchmarks may have leaked into LLM training data, potentially causing data leakage in evaluation. To evaluate LLMs' realistic repair capabilities, (1) we introduce an extensive, non-crawled benchmark, TutorCode, comprising 1,239 defective C++ programs and associated information such as tutor guidance, solution descriptions, failing test cases, and the corrected code. We assess the repair performance of 12 LLMs on TutorCode, measuring repair correctness (TOP-5 and AVG-5) and patch precision (RPSR). (2) We then investigate which types of extra information help LLMs repair defects. Among these, tutor guidance was found to be the most effective in enhancing LLM repair capabilities. To fully harness LLMs' conversational capabilities and the benefits of augmented information, (3) we introduce CREF, a novel conversational semi-automatic repair framework that assists human tutors. It improves AVG-5 by 17.2%–24.6% over the baseline, reaching an AVG-5 of 76.6% with GPT-4.
These results highlight the potential for enhancing LLMs' repair capabilities through interactions with tutors and historical conversations involving incorrect responses. The successful application of CREF in a real-world educational setting demonstrates its effectiveness in reducing tutors' workload and improving students' learning experience, and shows promise for facilitating other software engineering tasks, such as code review.
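TOP-5 and AVG-5 summarize several sampled repairs per defective program. The sketch below assumes the common reading, which the abstract does not spell out: TOP-5 is the share of programs with at least one correct repair among five attempts, and AVG-5 is the mean share of correct repairs among those five. The function names are hypothetical.

```python
from typing import Sequence


def top_k(outcomes: Sequence[Sequence[bool]], k: int = 5) -> float:
    """Share of problems for which at least one of the first k repair attempts is correct."""
    return sum(any(attempts[:k]) for attempts in outcomes) / len(outcomes)


def avg_k(outcomes: Sequence[Sequence[bool]], k: int = 5) -> float:
    """Mean, over problems, of the share of correct repairs among the first k attempts."""
    return sum(sum(attempts[:k]) / k for attempts in outcomes) / len(outcomes)


# One row per defective program; each entry says whether that attempt passed all tests.
outcomes = [
    [True, False, False, False, False],
    [False, False, False, False, False],
    [True, True, True, True, True],
    [False, False, True, True, False],
]
```

On this toy data `top_k` gives 0.75 and `avg_k` gives 0.4 (up to floating-point rounding). Under these definitions AVG-5 can never exceed TOP-5, and it rewards a model that is correct consistently rather than occasionally.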

Submitted 8 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13578 [pdf, other]

Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration

Authors: Han-Cheng Yu, Yu-An Shih, Kin-Man Law, Kai-Yu Hsieh, Yu-Chen Cheng, Hsin-Chih Ho, Zih-An Lin, Wen-Chuan Hsu, Yao-Chung Fan

Abstract: In this paper, we tackle the task of distractor generation (DG) for multiple-choice questions. Our study introduces two key designs. First, we propose retrieval-augmented pretraining, which refines language model pretraining to align it more closely with the downstream DG task. Second, we explore integrating knowledge graphs to enhance DG performance. Through experiments on benchmark datasets, we show that our models significantly outperform the state-of-the-art results. Our best-performing model advances the F1@3 score from 14.80 to 16.47 on the MCQ dataset and from 15.92 to 16.50 on the SciQ dataset.
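F1@3 compares the top three generated distractors with the reference distractors for each question. The sketch below assumes a set-overlap F1 with case- and whitespace-insensitive exact matching, averaged over questions; the paper's matching rule may differ, and `f1_at_k` and `corpus_f1_at_k` are hypothetical names.

```python
from typing import Sequence, Tuple


def _normalize(text: str) -> str:
    """Case- and whitespace-insensitive comparison key (an assumed normalization)."""
    return " ".join(text.lower().split())


def f1_at_k(predicted: Sequence[str], gold: Sequence[str], k: int = 3) -> float:
    """Set-overlap F1 between the top-k generated distractors and the gold distractors."""
    top = {_normalize(p) for p in predicted[:k]}
    ref = {_normalize(g) for g in gold}
    hits = len(top & ref)
    if hits == 0:
        return 0.0
    precision = hits / k          # every one of the k slots counts
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


def corpus_f1_at_k(examples: Sequence[Tuple[Sequence[str], Sequence[str]]], k: int = 3) -> float:
    """Average per-question F1@k over (predicted, gold) pairs."""
    return sum(f1_at_k(p, g, k) for p, g in examples) / len(examples)
```

For example, predicting `["Paris", "Berlin", "Rome"]` against gold `["berlin", "Madrid", " Rome "]` matches two of three, giving an F1@3 of 2/3.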

Submitted 19 June, 2024; originally announced June 2024.

Comments: Findings of ACL 2024

arXiv:2406.12793 [pdf, other]

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

Authors: Team GLM, Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, et al. (32 additional authors not shown)

Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models, trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained on ten trillion tokens, mostly in Chinese and English, along with a small corpus covering 24 languages, and aligned primarily for Chinese and English usage. The high-quality alignment is achieved via a multi-stage post-training process, which involves supervised fine-tuning and learning from human feedback. Evaluations show that GLM-4 1) closely rivals or outperforms GPT-4 in terms of general metrics such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval, 2) gets close to GPT-4-Turbo in instruction following as measured by IFEval, 3) matches GPT-4 Turbo (128K) and Claude 3 for long-context tasks, and 4) outperforms GPT-4 in Chinese alignment as measured by AlignBench. The GLM-4 All Tools model is further aligned to understand user intent and autonomously decide when and which tool(s) to use, including a web browser, Python interpreter, text-to-image model, and user-defined functions, to effectively complete complex tasks. In practical applications, it matches and even surpasses GPT-4 All Tools in tasks like accessing online information via web browsing and solving math problems using a Python interpreter.
Along the way, we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M), GLM-4V-9B, WebGLM, and CodeGeeX, attracting over 10 million downloads on Hugging Face in 2023 alone. The open models can be accessed through https://github.com/THUDM and https://huggingface.co/THUDM.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12195 [pdf, other]

Quantum Compiling with Reinforcement Learning on a Superconducting Processor

Authors: Z. T. Wang, Qiuhao Chen, Yuxuan Du, Z. H. Yang, Xiaoxia Cai, Kaixuan Huang, Jingning Zhang, Kai Xu, Jun Du, Yinan Li, Yuling Jiao, Xingyao Wu, Wu Liu, Xiliang Lu, Huikai Xu, Yirong Jin, Ruixia Wang, Haifeng Yu, S. P. Zhao

Abstract: Effectively implementing quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundred noisy qubits with limited coherence times and error-prone gate operations, so NISQ algorithms naturally require short circuits obtained via quantum compilation. Here, we develop a reinforcement learning (RL)-based quantum compiler for a superconducting processor and demonstrate its capability of discovering novel, hardware-amenable circuits with short lengths. We show that for the three-qubit quantum Fourier transform, a compiled circuit using only seven CZ gates with unity circuit fidelity can be achieved. The compiler is also able to find optimal circuits under device topological constraints, with lengths considerably shorter than those produced by the conventional method. Our study exemplifies the codesign of software with hardware for efficient quantum compilation, offering valuable insights for the advancement of RL-based compilers.
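For the three-qubit case, the compiler's target is the 8×8 quantum Fourier transform unitary, and a compiled circuit is judged by how closely its unitary matches that target. The sketch below builds the target and scores a candidate with the trace fidelity |Tr(U†V)|/N. That metric is an assumption (the paper's "circuit fidelity" may be defined differently), and the code does not implement the RL compiler itself.

```python
import cmath
import math
from typing import List

Matrix = List[List[complex]]


def qft_matrix(num_qubits: int) -> Matrix:
    """QFT unitary: F[j][k] = exp(2*pi*i*j*k/N) / sqrt(N), with N = 2**num_qubits."""
    n = 2 ** num_qubits
    return [[cmath.exp(2j * math.pi * j * k / n) / math.sqrt(n) for k in range(n)]
            for j in range(n)]


def identity(n: int) -> Matrix:
    return [[1.0 + 0j if i == j else 0j for j in range(n)] for i in range(n)]


def is_unitary(u: Matrix, tol: float = 1e-9) -> bool:
    """Check U^dagger U == I entry by entry."""
    n = len(u)
    for i in range(n):
        for j in range(n):
            entry = sum(u[k][i].conjugate() * u[k][j] for k in range(n))
            if abs(entry - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


def trace_fidelity(target: Matrix, compiled: Matrix) -> float:
    """|Tr(U^dagger V)| / N, which equals 1 exactly when V matches U up to a global phase."""
    n = len(target)
    overlap = sum(target[i][j].conjugate() * compiled[i][j]
                  for i in range(n) for j in range(n))
    return abs(overlap) / n
```

A compiled circuit's unitary, for example the product of its CZ and single-qubit gate matrices, would be passed as `compiled`. A score of 1 means the circuit reproduces the QFT up to an unobservable global phase; the do-nothing circuit (`identity(8)`) scores only about 0.18.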

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11508 [pdf, other]

Leveraging Cooperative Connected Automated Vehicles for Mixed Traffic Safety

Authors: Chenguang Zhao, Tamas G. Molnar, Huan Yu

Abstract: The introduction of connected and automated vehicles (CAVs) is believed to reduce congestion, enhance safety, and improve traffic efficiency. Numerous studies have focused on controlling pure CAV platoons in fully connected automated traffic, as well as single or multiple CAVs in mixed traffic with human-driven vehicles (HVs). CAV cruise control designs have been proposed to stabilize car-following traffic dynamics, but few studies have considered their safety impact, particularly the trade-offs between stability and safety. In this paper, we study how cooperative control strategies for CAVs can be designed to enhance the safety and smoothness of mixed traffic under varying penetrations of connectivity and automation. Considering mixed traffic where a pair of CAVs travels amongst HVs, we design cooperative feedback controllers for the CAV pair to stabilize traffic via cooperation and, possibly, by also leveraging connectivity with HVs. The real-time safety impact of the CAV controllers is investigated using control barrier functions (CBFs). We construct CBF safety constraints, based on which we propose safety-critical control designs to guarantee CAV safety, HV safety, and platoon safety. Both theoretical and numerical analyses are conducted to explore the effect of CAV cooperation and HV connectivity on stability and safety. Our results show that the cooperation of CAVs helps to stabilize the mixed traffic, while safety can be guaranteed with the safety filters.
Moreover, connectivity between CAVs and HVs offers additional benefits: if an HV connects to an upstream CAV (i.e., the CAV looks ahead), it helps the CAV stabilize the upstream traffic, while if an HV connects to a downstream CAV (i.e., the CAV looks behind), the safety of this connected HV can be enhanced.
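A CBF safety filter keeps a barrier h(x) ≥ 0 by enforcing dh/dt ≥ -αh and otherwise changing the nominal controller as little as possible. The sketch below assumes a generic time-headway barrier h = s - τv for one following vehicle. The headway τ = 1.5 s and gain α = 0.5 1/s are hypothetical values, and the paper's CAV, HV, and platoon constraints for a cooperating CAV pair are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class CarFollowingState:
    gap: float         # s: bumper-to-bumper distance to the vehicle ahead [m]
    speed: float       # v: ego CAV speed [m/s]
    lead_speed: float  # v_l: speed of the vehicle ahead [m/s]


def headway_barrier(state: CarFollowingState, headway: float) -> float:
    """h = s - tau * v; h >= 0 means at least `headway` seconds of gap are kept."""
    return state.gap - headway * state.speed


def cbf_safety_filter(state: CarFollowingState, a_nominal: float,
                      headway: float = 1.5, alpha: float = 0.5) -> float:
    """Minimally modify a nominal acceleration so that dh/dt >= -alpha * h.

    With ds/dt = v_l - v, we get dh/dt = (v_l - v) - tau * a, so the CBF
    condition is an upper bound on a. The min-norm problem
    min (a - a_nominal)^2 subject to that bound is solved by clipping.
    """
    h = headway_barrier(state, headway)
    a_max = (state.lead_speed - state.speed + alpha * h) / headway
    return min(a_nominal, a_max)
```

With a 30 m gap at 20 m/s behind a 15 m/s leader (h = 0), a nominal command of +1.0 m/s² is overridden to about -3.33 m/s². With a large gap, the nominal command passes through unchanged. A real design would also respect actuator limits, which this sketch ignores.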

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11274 [pdf, other]

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers

Authors: Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Hai Yu, Jiaqing Liu, Yukun Ma, Chong Zhang

Abstract: The Transformer architecture has significantly advanced deep learning, particularly in natural language processing, by effectively managing long-range dependencies. However, as the demand for understanding complex relationships grows, refining the Transformer's architecture becomes critical. This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models by enabling direct attention between non-adjacent layers. This method improves the model's ability to capture dependencies between high-level abstract features and low-level details. By facilitating direct attention between these diverse feature levels, our approach overcomes the limitations of current Transformers, which often rely on suboptimal intra-layer attention. Our implementation extends the Transformer's functionality by enabling queries in a given layer to interact with keys and values from both the current layer and one preceding layer, thus enhancing the diversity of multi-head attention without additional computational burden. Extensive experiments demonstrate that our enhanced Transformer model achieves superior performance in language modeling tasks, highlighting the effectiveness of our skip-layer attention mechanism.
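One plausible reading of "queries in a given layer interact with keys and values from both the current layer and one preceding layer ... without additional computational burden" is that some heads take their keys and values from an earlier layer while the rest use the current one, which leaves the per-head cost unchanged. The sketch below follows that reading. The head split, identity Q/K/V projections, causal masking, and function names are all assumptions, not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per token position


def causal_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention; position i sees positions 0..i only."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out: Matrix = []
    for i, q in enumerate(queries):
        scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in range(i + 1)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # stable softmax numerators
        total = sum(weights)
        out.append([sum(w * values[j][c] for j, w in enumerate(weights)) / total
                    for c in range(len(values[0]))])
    return out


def skip_layer_attention(current: Matrix, previous: Matrix,
                         num_heads: int, num_skip_heads: int) -> Matrix:
    """Multi-head attention where the first `num_skip_heads` heads read keys/values
    from an earlier layer's hidden states instead of the current layer's.

    Queries always come from `current`. Learned projections are omitted
    (identity), so each head works on its own slice of the hidden size.
    """
    dim = len(current[0])
    if dim % num_heads:
        raise ValueError("hidden size must be divisible by num_heads")
    head_dim = dim // num_heads
    outputs: Matrix = [[] for _ in current]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        queries = [row[lo:hi] for row in current]
        source = previous if h < num_skip_heads else current
        kv = [row[lo:hi] for row in source]
        for pos, head_row in enumerate(causal_attention(queries, kv, kv)):
            outputs[pos].extend(head_row)
    return outputs
```

Setting `num_skip_heads=0` recovers ordinary multi-head self-attention. Raising it routes more heads to the earlier layer's representations without adding any attention computations.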

Submitted 17 June, 2024; originally announced June 2024.

Comments: 7 pages, 1 figure

arXiv:2406.10753 [pdf, other]

Testing the parametric model for self-interacting dark matter using matched halos in cosmological simulations

Authors: Daneng Yang, Ethan O. Nadler, Hai-Bo Yu

Abstract: We systematically evaluate the performance of the self-interacting dark matter (SIDM) halo model proposed in arXiv:2305.16176 using matched halos from high-resolution cosmological CDM and SIDM simulations. The model incorporates SIDM effects along the mass evolution histories of CDM halos and is applicable to both isolated halos and subhalos. We focus on the accuracy of the model in predicting halo density profiles at $z=0$ and the evolution of the maximum circular velocity. We find that the model predictions agree with the simulations within $10\%-50\%$ for most of the simulated (sub)halos and within $50\%-100\%$ for extreme cases. This indicates that the model effectively captures the gravothermal evolution of halos with very strong, velocity-dependent self-interactions. As an example application, we use the model to study the impact of various SIDM scenarios on strong-lensing perturber systems, demonstrating its utility in predicting SIDM effects for small-scale structure analyses. Our findings confirm that the model is an effective tool for mapping CDM halos into their SIDM counterparts.

Submitted 15 June, 2024; originally announced June 2024.

Comments: 20 pages, 19 figures

arXiv:2406.10593 [pdf, other]

QDA-SQL: Questions Enhanced Dialogue Augmentation for Multi-Turn Text-to-SQL

Authors: Yinggang Sun, Ziming Guo, Haining Yu, Chuanyi Liu, Xiang Li, Bingxuan Wang, Xiangzhan Yu, Tiancheng Zhao

Abstract: Fine-tuning large language models (LLMs) for domain-specific tasks has achieved great success in Text-to-SQL. However, these fine-tuned models often struggle with multi-turn Text-to-SQL tasks because of ambiguous or unanswerable questions. It is therefore desirable to enhance LLMs to handle multiple types of questions in multi-turn Text-to-SQL tasks. To address this, we propose QDA-SQL, a novel data augmentation method that uses LLMs to generate multiple types of multi-turn Q&A pairs. QDA-SQL incorporates validation and correction mechanisms to handle complex multi-turn Text-to-SQL tasks. Experimental results demonstrate that QDA-SQL enables fine-tuned models to achieve higher SQL statement accuracy and enhances their ability to handle complex, unanswerable questions in multi-turn Text-to-SQL tasks. The generation script and test set are released at https://github.com/mcxiaoxiao/QDA-SQL.

Submitted 15 June, 2024; originally announced June 2024.

Comments: 13 pages, 7 figures

arXiv:2406.10583 [pdf, other]

Demonstration of neutron identification in neutrino interactions in the MicroBooNE liquid argon time projection chamber

Authors: MicroBooNE collaboration, P. Abratenko, O. Alterkait, D. Andrade Aldana, L. Arellano, J. Asaadi, A. Ashkenazi, S. Balasubramanian, B. Baller, A. Barnard, G. Barr, D. Barrow, J. Barrow, V. Basque, J. Bateman, O. Benevides Rodrigues, S. Berkman, A. Bhanderi, A. Bhat, M. Bhattacharya, M. Bishai, A. Blake, B. Bogart, T. Bolton, J. Y. Book, et al. (165 additional authors not shown)

Abstract: A significant challenge in measurements of neutrino oscillations is reconstructing the incoming neutrino energies. While modern fully active tracking calorimeters such as liquid argon time projection chambers in principle allow the measurement of all final-state particles above some detection threshold, undetected neutrons remain a considerable source of missing energy, with little to no data constraining their production rates and kinematics. We present the first demonstration of tagging neutrino-induced neutrons in liquid argon time projection chambers using secondary protons emitted from neutron-argon interactions in the MicroBooNE detector. We describe the method developed to identify neutrino-induced neutrons and demonstrate its performance using neutrons produced in muon-neutrino charged-current interactions. The method is validated using a small subset of MicroBooNE's total dataset. The selection yields a sample in which $60\%$ of selected tracks correspond to neutron-induced secondary protons.

Submitted 15 June, 2024; originally announced June 2024.

Report number: FERMILAB-PUB-24-0301

arXiv:2406.10123 [pdf, other]

Improving neutrino energy estimation of charged-current interaction events with recurrent neural networks in MicroBooNE

Authors: MicroBooNE collaboration, P. Abratenko, O. Alterkait, D. Andrade Aldana, L. Arellano, J. Asaadi, A. Ashkenazi, S. Balasubramanian, B. Baller, A. Barnard, G. Barr, D. Barrow, J. Barrow, V. Basque, J. Bateman, O. Benevides Rodrigues, S. Berkman, A. Bhanderi, A. Bhat, M. Bhattacharya, M. Bishai, A. Blake, B. Bogart, T. Bolton, J. Y. Book, et al. (164 additional authors not shown)

Abstract: We present a deep learning-based method for estimating the neutrino energy of charged-current neutrino-argon interactions. We employ a recurrent neural network (RNN) architecture for neutrino energy estimation in the MicroBooNE experiment, which uses liquid argon time projection chamber (LArTPC) detector technology. Traditional energy estimation approaches in LArTPCs, which largely rely on reconstructing and summing visible energies, often suffer from sizable biases and resolution smearing because of the complex nature of neutrino interactions and the detector response. Neutrino energy estimation can be improved by taking into account the kinematic information of reconstructed final-state particles. Using this information, the deep learning-based approach shows improved resolution and reduced bias for the muon neutrino Monte Carlo simulation sample compared to the traditional approach. To address the common concern about the effectiveness of this method on experimental data, the RNN-based energy estimator is further examined and validated with dedicated data-simulation consistency tests using MicroBooNE data. We also assess its potential impact on a neutrino oscillation study after accounting for all statistical and systematic uncertainties and show that it enhances physics sensitivity. This method has good potential to improve the performance of other physics analyses.
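Bias and resolution, the two figures of merit compared above, are commonly computed from the fractional residual (E_reco - E_true)/E_true over simulated events. The sketch below takes bias as the mean of that residual and resolution as its standard deviation. This is an assumed convention: some analyses instead quote a Gaussian-fit width or an RMS, and the abstract does not say which MicroBooNE uses.

```python
import statistics
from typing import Sequence, Tuple


def energy_bias_and_resolution(e_reco: Sequence[float],
                               e_true: Sequence[float]) -> Tuple[float, float]:
    """Mean and spread of the fractional residual (E_reco - E_true) / E_true."""
    if len(e_reco) != len(e_true) or not e_true:
        raise ValueError("need equal-length, non-empty energy lists")
    residuals = [(reco - true) / true for reco, true in zip(e_reco, e_true)]
    bias = statistics.fmean(residuals)         # systematic shift (negative = underestimate)
    resolution = statistics.pstdev(residuals)  # event-by-event smearing
    return bias, resolution
```

For toy true energies `[1.0, 2.0, 1.0, 2.0]` GeV and a visible-energy-sum-like estimate `[0.8, 1.6, 0.9, 1.4]` GeV, the function reports a bias of about -20% and a resolution of about 7%. An improved estimator should move the first number toward zero and shrink the second.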

Submitted 14 June, 2024; originally announced June 2024.

Report number: FERMILAB-PUB-24-0287

arXiv:2406.09683 [pdf, other]

Interstellar Nitrogen Isotope Ratios: Measurements on tracers of C$^{14}$N and C$^{15}$N

Authors: J. L. Chen, J. S. Zhang, C. Henkel, Y. T. Yan, H. Z. Yu, Y. X. Wang, Y. P. Zou, J. Y. Zhao, X. Y. Wang

Abstract: The nitrogen isotope ratio 14N/15N is a powerful tool to trace Galactic stellar nucleosynthesis and to constrain Galactic chemical evolution. Previous observations have found lower 14N/15N ratios in the Galactic center and higher values in the Galactic disk. This is consistent with the inside-out formation scenario of our Milky Way. However, previous studies mostly utilized double isotope ratios that also include 12C/13C, which introduces additional uncertainties. Here we therefore present observations of C14N and its rare isotopologue, C15N, toward a sample of star-forming regions, measured with the IRAM 30 m and/or the ARO 12 m telescope at $λ$ ~3 mm wavelength. For the 35 sources detected in both isotopologues, physical parameters are determined. Furthermore, we have obtained nitrogen isotope ratios using the strongest hyperfine components of CN and C15N. For sources showing small deviations from Local Thermodynamic Equilibrium and/or self-absorption, the weakest hyperfine component, likely free of the latter effect, was used to obtain reliable 14N/15N values. Our measured 14N/15N isotope ratios from C14N and C15N are compatible with those from our earlier measurements of NH3 and 15NH3 (Paper I), i.e., the ratios increase out to a Galactocentric distance of ~9 kpc. The unweighted second-order polynomial fit yields $\frac{{\rm C^{14}N}}{{\rm C^{15}N}} = (-4.85 \pm 1.89)\;{\rm kpc^{-2}} \times R_{\rm GC}^{2} + (82.11 \pm 31.93) \;{\rm kpc^{-1}} \times R_{\rm GC} - (28.12 \pm 126.62)$.
Toward the outer Galaxy, the isotope ratio tends to decrease, supporting an earlier finding from H13CN/HC15N. Galactic chemical evolution models are consistent with our measurements of the 14N/15N isotope ratio, i.e., a rising trend from the Galactic center region to approximately 9 kpc, followed by a decreasing trend with increasing $R_{\rm GC}$ toward the outer Galaxy.
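The quoted quadratic can be evaluated directly to locate where it peaks. The sketch below uses only the central fit values from the abstract and ignores the quoted uncertainties, which are large, particularly for the constant term.

```python
# Central values of the unweighted quadratic fit quoted above (uncertainties ignored).
A2 = -4.85   # kpc^-2
A1 = 82.11   # kpc^-1
A0 = -28.12  # dimensionless


def cn_isotope_ratio(r_gc_kpc: float) -> float:
    """C14N/C15N predicted by the fit at Galactocentric distance r_gc_kpc (in kpc)."""
    return A2 * r_gc_kpc ** 2 + A1 * r_gc_kpc + A0


# A downward-opening parabola (A2 < 0) peaks at R = -A1 / (2 * A2).
r_peak = -A1 / (2 * A2)
ratio_peak = cn_isotope_ratio(r_peak)
```

The peak falls near R_GC ≈ 8.46 kpc with a ratio of roughly 319, in line with the "rising to about 9 kpc, then decreasing" trend described in the abstract. The same quadratic drops to zero near 0.35 kpc and 16.6 kpc, so it should only be read within the range of Galactocentric distances covered by the observed sources.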

Submitted 13 June, 2024; originally announced June 2024.

Comments: 34 pages, 9 figures, 6 tables

Journal ref: The Astrophysical Journal (2024)

arXiv:2406.09394 [pdf, other]

WonderWorld: Interactive 3D Scene Generation from a Single Image

Authors: Hong-Xing Yu, Haoyi Duan, Charles Herrmann, William T. Freeman, Jiajun Wu

Abstract: We present WonderWorld, a novel framework for interactive 3D scene extrapolation that enables users to explore and shape virtual environments based on a single input image and user-specified text. While significant improvements have been made to the visual quality of scene generation, existing methods run offline, taking tens of minutes to hours to generate a scene. By leveraging Fast Gaussian Surfels and a guided diffusion-based depth estimation method, WonderWorld generates geometrically consistent extrapolations while significantly reducing computational time. Our framework generates connected and diverse 3D scenes in less than 10 seconds on a single A6000 GPU, enabling real-time user interaction and exploration. We demonstrate the potential of WonderWorld for applications in virtual reality, gaming, and creative design, where users can quickly generate and navigate immersive, potentially infinite virtual worlds from a single image. Our approach represents a significant advancement in interactive 3D scene generation, opening up new possibilities for user-driven content creation and exploration in virtual environments. We will release full code and software for reproducibility. Project website: https://WonderWorld-2024.github.io/

Submitted 14 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: Project website: https://WonderWorld-2024.github.io/

arXiv:2406.09205 [pdf, other]

ReadCtrl: Personalizing text generation with readability-controlled instruction learning

Authors: Hieu Tran, Zonghai Yao, Lingxi Li, Hong Yu

Abstract: Content generation conditioned on users' readability is an important application for personalization. In the era of large language models (LLMs), readability-controlled text generation based on LLMs has become increasingly important. This paper introduces a novel methodology called "Readability-Controlled Instruction Learning (ReadCtrl)," which aims to instruction-tune LLMs to tailor their outputs to users' readability levels. Unlike traditional methods, which primarily focused on categorical readability adjustments (typically high, medium, and low, or expert and layperson levels) with limited success, ReadCtrl introduces a dynamic framework that enables LLMs to generate content at various (near-continuous) complexity levels, thereby enhancing their versatility across different applications. Our results show that the ReadCtrl-Mistral-7B models significantly outperformed strong baseline models such as GPT-4 and Claude-3, with a win rate of 52.1%:35.7% against GPT-4 in human evaluations. Furthermore, ReadCtrl has shown significant improvements in automatic evaluations, as evidenced by better readability metrics (e.g., FOG, FKGL) and generation quality metrics (e.g., BLEU, SARI, SummaC-Factuality, UniEval-Consistency and Coherence). These results underscore ReadCtrl's effectiveness and tenacity in producing high-quality, contextually appropriate outputs that closely align with targeted readability levels, marking a significant advancement in personalized content generation using LLMs.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 9 pages
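ReadCtrl's automatic evaluation cites the FOG and FKGL readability metrics. For reference, a minimal sketch of the standard Flesch-Kincaid grade level and Gunning fog formulas follows; it assumes word, sentence, syllable, and complex-word counts are computed elsewhere (they are plain inputs here), and it is not the paper's evaluation code.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (FKGL) from pre-computed counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning fog index; complex words are conventionally those with 3+ syllables."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))


if __name__ == "__main__":
    # Toy counts: a 100-word passage with 5 sentences, 150 syllables, 10 complex words.
    print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
    print(round(gunning_fog(100, 5, 10), 2))  # 12.0
```

Both scores are continuous grade-like values, which is what makes a near-continuous readability target measurable, in contrast to coarse high/medium/low labels.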

arXiv:2406.08698 [pdf, other]

Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

Abstract: In this work we search for signals generated by ultra-heavy dark matter in Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma rays from dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter, since they have a low astrophysical $γ$-ray background but a large amount of dark matter. Analyzing more than 700 days of LHAASO observational data, we detect no significant dark matter signal from 1 TeV to 1 EeV. Accordingly, we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV energies. Constraints on the lifetime of dark matter in the decay mode are also derived.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 pages, 12 figures, accepted by PRL
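For orientation, the standard expressions for the expected gamma-ray flux from a dwarf galaxy are given below, assuming self-conjugate dark matter of mass $m_\chi$, annihilation cross-section $\langle\sigma v\rangle$, lifetime $\tau$, and photon spectrum per event $dN_\gamma/dE$. The paper's exact conventions (for example, an extra factor of 1/2 for non-self-conjugate particles, or how the J- and D-factors are integrated) may differ.

```latex
\begin{gathered}
\frac{d\Phi_{\mathrm{ann}}}{dE} = \frac{\langle \sigma v \rangle}{8\pi m_\chi^{2}}\,\frac{dN_\gamma}{dE}\,J,
\qquad J = \int_{\Delta\Omega}\! d\Omega \int_{\mathrm{l.o.s.}} \rho_\chi^{2}(l,\Omega)\,dl, \\
\frac{d\Phi_{\mathrm{dec}}}{dE} = \frac{1}{4\pi\,\tau\, m_\chi}\,\frac{dN_\gamma}{dE}\,D,
\qquad D = \int_{\Delta\Omega}\! d\Omega \int_{\mathrm{l.o.s.}} \rho_\chi(l,\Omega)\,dl .
\end{gathered}
```

A non-detection bounds the flux from above, so at each mass it translates into an upper limit on $\langle\sigma v\rangle$ and a lower limit on $\tau$.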

arXiv:2406.08301 [pdf, other]

Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, J. Alexander, M. Alfred, K. Aoki, N. Apadula, L. Aphecetche, J. Asai, H. Asano, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, G. Baksay, L. Baksay, A. Baldisseri , et al. (510 additional authors not shown)

Abstract: High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and the difference between the yields in Au$+$Au and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$ respectively, are measured as a function of the trigger-hadron azimuthal separation, $Δφ$, for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 534 authors from 83 institutions, 12 pages, 7 figures. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html
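Per $Δφ$ bin, $I_{AA}$ is the ratio of per-trigger yields in Au$+$Au and $p$$+$$p$ collisions and $Δ_{AA}$ is their difference, so $I_{AA} < 1$ signals suppression and $I_{AA} > 1$ enhancement. The sketch below computes both with first-order error propagation, assuming independent statistical uncertainties; PHENIX's background subtraction and systematic-uncertainty treatment are not reproduced, and the function name and inputs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (value, uncertainty)


def iaa_and_daa(y_aa: Sequence[float], e_aa: Sequence[float],
                y_pp: Sequence[float], e_pp: Sequence[float]) -> Tuple[List[Pair], List[Pair]]:
    """Per-Delta-phi-bin I_AA = Y_AA / Y_pp and Delta_AA = Y_AA - Y_pp with uncertainties.

    Uncertainties are propagated to first order, assuming independent errors.
    """
    if not (len(y_aa) == len(e_aa) == len(y_pp) == len(e_pp)):
        raise ValueError("all inputs must have one entry per Delta-phi bin")
    iaa: List[Pair] = []
    daa: List[Pair] = []
    for a, ea, p, ep in zip(y_aa, e_aa, y_pp, e_pp):
        if p == 0:
            raise ValueError("p+p yield must be non-zero in every bin")
        # d(a/p) = da/p - a*dp/p^2, with the two terms combined in quadrature.
        iaa.append((a / p, math.hypot(ea / p, a * ep / p ** 2)))
        # d(a - p) = da - dp, combined in quadrature.
        daa.append((a - p, math.hypot(ea, ep)))
    return iaa, daa
```

Writing the ratio uncertainty as a sum of absolute terms, rather than as relative errors, keeps it well defined even in a bin where the Au$+$Au yield is zero.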

arXiv:2406.08275 [pdf]

An accurate and transferable machine learning interatomic potential for equimolar and non-equimolar high-entropy diborides

Authors: Hong Meng, Yiwen Liu, Hulei Yu, Lei Zhuang, Yanhui Chu

Abstract: Machine learning interatomic potentials have become a powerful tool for molecular dynamics (MD) simulations that achieve the accuracy of ab initio methods while going beyond their length and time scale limitations. Here, we develop an efficient moment tensor potential (MTP) for high-entropy diborides (HEBs) based on unary and binary diborides with Ti-V-Cr-Zr-Nb-Mo-Hf-Ta-W principal elements. Notably, the trained MTP exhibits exceptional generalization across both equimolar and non-equimolar HEBs, with testing errors in energy and force of 2.6 meV/atom and 155 meV/Å for equimolar HEBs, and 3.7 meV/atom and 172 meV/Å for non-equimolar HEBs, respectively, indicating its remarkable accuracy and transferability. The reliability of the established MTP is further confirmed by a comparative analysis with first-principles calculations, where our MTP accurately reproduces the structural and mechanical properties of various HEBs. The work presents a significant advancement in the simulation of high-entropy ceramics with enhanced efficiency and accuracy.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 pages, 3 figures

arXiv:2406.08243 [pdf]

An efficient strategy to construct general machine learning potentials for high-entropy ceramics

Authors: Yiwen Liu, Hong Meng, Zijie Zhu, Hulei Yu, Lei Zhuang, Yanhui Chu

Abstract: Molecular dynamics (MD) simulations are considered an efficient and low-cost means to develop high-entropy ceramics with remarkable properties across a vast composition space, yet the lack of general potentials severely limits their applications. Herein, taking high-entropy carbides (HECs) as the model, we propose a strategy to efficiently construct a general neuroevolution potential (NEP) with broad compositional applicability for HECs. Specifically, a small dataset comprising unary and binary carbides with ten transition metal principal elements is first identified as the most efficient choice for training the general NEP for HECs, and a highly accurate and transferable NEP is then constructed. Further MD predictions of the structural, mechanical, and thermal properties of HECs using the established NEP show good agreement with the results of first-principles calculations and experimental measurements, validating the accuracy, generalization, and reliability of our developed NEP. Our work provides an efficient route to accelerating MD simulations in the search for high-entropy ceramics with desirable properties.

Submitted 18 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 28 pages, 6 figures

arXiv:2406.07637 [pdf, other]

doi 10.12149/101450

The destiny of open cluster NGC 6530: past and future

Authors: Delong Jia, Heng Yu, Zhengyi Shao, Lu Li

Abstract: Studying the structures of open clusters is crucial for understanding stellar evolution and galactic dynamics. Based on Gaia DR3 data, we apply a hierarchical clustering algorithm to the young open cluster NGC 6530 and group its members into 5 substructures. By linearly tracing their members' motions using kinematic information, we find the following. Sub 1 is the core of the cluster and is expanding slowly. Sub 2 consists of less-bound members, which began escaping from the core about 0.78 Myr ago. Sub 3 is associated with a young star-forming region and will merge with the core in 0.72 Myr. Sub 4, an outskirt group, is also moving towards the core but will not end up falling in. Sub 5 is composed of less-bound members with field contamination. This work reveals the complex internal structure and evolutionary trends of the cluster NGC 6530. It also shows the potential of the hierarchical clustering algorithm in star cluster structure analysis.

Submitted 14 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: 13 pages, 11 figures, accepted for publication in AJ
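The linear-tracing idea can be illustrated with straight-line motion: if a substructure's centroid is offset from the core by $\Delta\mathbf{x}$ and moves relative to it at $\Delta\mathbf{v}$, their separation is smallest at $t^* = -(\Delta\mathbf{x}\cdot\Delta\mathbf{v})/|\Delta\mathbf{v}|^2$, where a negative $t^*$ lies in the past. The sketch below assumes positions in pc and velocities in km/s (1 km/s is about 1.0227 pc/Myr), ignores the cluster potential, and is not the authors' pipeline; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

KMS_TO_PC_PER_MYR = 1.0227  # 1 km/s is about 1.0227 pc/Myr


def closest_approach(dx_pc: Sequence[float], dv_kms: Sequence[float]) -> Tuple[float, float]:
    """Time (Myr) of minimum separation under straight-line motion, and that separation (pc).

    dx_pc: substructure centroid minus core centroid, in pc.
    dv_kms: substructure velocity minus core velocity, in km/s.
    Negative times lie in the past; positive times in the future.
    """
    dv = [v * KMS_TO_PC_PER_MYR for v in dv_kms]  # convert to pc/Myr
    speed_sq = sum(v * v for v in dv)
    if speed_sq == 0.0:
        raise ValueError("no relative motion; separation is constant")
    t_star = -sum(x * v for x, v in zip(dx_pc, dv)) / speed_sq
    separation = math.hypot(*[x + v * t_star for x, v in zip(dx_pc, dv)])
    return t_star, separation
```

For example, a group 1 pc from the core and receding along the same line at 1 km/s gives $t^* \approx -0.98$ Myr with zero separation, i.e. it would have left the core roughly a million years ago; an approaching group gives a positive $t^*$, as for a merger in the future.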

arXiv:2406.07472 [pdf, other]

4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

Authors: Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee

Abstract: Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependency on multi-view generative models and instead fully utilizing video generative models trained on diverse real-world datasets. Our method begins by generating a reference video using the video generation model. We then learn the canonical 3D representation of the video using a freeze-time video, delicately generated from the reference video. To handle inconsistencies in the freeze-time video, we jointly learn a per-frame deformation to model these imperfections. We then learn the temporal deformation based on the canonical representation to capture dynamic interactions in the reference video. The pipeline facilitates the generation of dynamic scenes with enhanced photorealism and structural integrity, viewable from multiple perspectives, thereby setting a new standard in 4D scene generation.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07103 [pdf, other]

MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms

Authors: Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo, Ha-Jin Yu

Abstract: In speaker verification systems, the use of short utterances presents a persistent challenge, degrading performance primarily because of insufficient phonetic information to characterize the speakers. To overcome this obstacle, we propose a novel structure, MR-RawNet, designed to enhance the robustness of speaker verification systems against variable-duration utterances using raw waveforms. MR-RawNet extracts time-frequency representations from raw waveforms via a multi-resolution feature extractor that optimally adjusts both temporal and spectral resolutions simultaneously. Furthermore, we apply a multi-resolution attention block that focuses on diverse and extensive temporal contexts, ensuring robustness against changes in utterance length. Experimental results on the VoxCeleb1 dataset demonstrate that MR-RawNet exhibits superior performance in handling utterances of variable duration compared to other raw-waveform-based systems.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 5 pages, accepted by Interspeech 2024

arXiv:2406.07056 [pdf, other]

Effectively Compress KV Heads for LLM

Authors: Hao Yu, Zelan Yang, Shen Li, Yong Li, Jianxin Wu

Abstract: The advent of pre-trained large language models (LLMs) has revolutionized various natural language processing tasks. These models predominantly employ an auto-regressive decoding mechanism that utilizes Key-Value (KV) caches to eliminate redundant calculations for previous tokens. Nevertheless, as context lengths and batch sizes increase, the linear growth in the memory footprint of KV caches becomes a key bottleneck for LLM deployment and significantly decreases generation speed. To mitigate this issue, techniques such as multi-query attention (MQA) and grouped-query attention (GQA) have been developed to reduce the number of KV heads, accelerating inference with accuracy comparable to multi-head attention (MHA). Despite their effectiveness, existing strategies for compressing MHA often overlook the intrinsic properties of the KV caches. In this work, we explore the low-rank characteristics of the KV caches and propose a novel approach for compressing KV heads. In particular, we carefully optimize the MHA-to-GQA transformation to minimize compression error, and, to remain compatible with rotary position embeddings (RoPE), we also introduce specialized strategies for key caches with RoPE. We demonstrate that our method can compress half or even three-quarters of KV heads while maintaining performance comparable to the original LLMs, which presents a promising direction for more efficient LLM deployment in resource-constrained environments.

Submitted 11 June, 2024; originally announced June 2024.
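Two ingredients mentioned in this abstract can be sketched with NumPy: measuring how much of a key or value cache's energy its top-r singular directions capture (one way to quantify "low-rank characteristics"), and the simple mean-pooling conversion of MHA key/value projection heads into GQA groups, which is the baseline conversion proposed with GQA. This is not the paper's optimized MHA-to-GQA transformation or its RoPE handling; the `(num_heads, d_model, head_dim)` weight layout and the function names are assumptions.

```python
import numpy as np


def top_r_energy(cache: np.ndarray, r: int) -> float:
    """Fraction of the squared Frobenius norm captured by the top-r singular values.

    cache: (num_tokens, num_heads * head_dim) matrix of cached keys or values.
    """
    s = np.linalg.svd(cache, compute_uv=False)  # singular values, descending
    return float(np.sum(s[:r] ** 2) / np.sum(s ** 2))


def mean_pool_kv_heads(w: np.ndarray, num_groups: int) -> np.ndarray:
    """Merge per-head K or V projections (num_heads, d_model, head_dim) into GQA groups.

    Adjacent heads are averaged within each group, giving (num_groups, d_model, head_dim).
    """
    num_heads = w.shape[0]
    if num_heads % num_groups != 0:
        raise ValueError("num_heads must be divisible by num_groups")
    group_size = num_heads // num_groups
    return w.reshape(num_groups, group_size, *w.shape[1:]).mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic cache with exact rank 8: 128 tokens, 4 heads of size 32.
    cache = rng.standard_normal((128, 8)) @ rng.standard_normal((8, 128))
    print(top_r_energy(cache, 8))  # close to 1.0 for a rank-8 matrix
    w_k = rng.standard_normal((8, 64, 32))
    print(mean_pool_kv_heads(w_k, num_groups=4).shape)  # (4, 64, 32): half the KV heads
```

When a real cache's energy concentrates in few directions, a learned projection can retain more of it than plain averaging, which is the kind of gap an error-minimizing conversion targets.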

arXiv:2406.06080 [pdf, other]

Probing vector chirality in the early Universe

Authors: Junsup Shim, Ue-Li Pen, Hao-Ran Yu, Teppei Okumura

Abstract: We explore the potential of detecting parity violation in primordial vector fossils using late-time galaxy spins. Utilizing $N$-body simulations, we use halo spins as a reliable proxy for galaxy spins to investigate how effectively such primordial vectorial parity asymmetry remains in galaxy spins at low redshifts. We develop a novel approach to generate initial conditions with substantial parity asymmetry, while maintaining the initial matter power spectrum unchanged. From the parity broken initial condition and halos evolved from it, we construct the initial spin and halo spin fields, respectively. Focusing on the helicity of these vector fields, we detect substantial asymmetry in the initial spin field as a consequence of parity violation in the primordial vector fossil. In addition, we discover that over $50\%$ of the primordial asymmetry in the initial spin field remains in the late-time halo spin field on a range of scales. Given the tight correlation between halo spins and observable galaxy spins, we expect to detect the current amplitude of vectorial parity asymmetry potentially up to $16σ$-level in observation, when utilizing galaxy samples from DESI BGS. Our findings demonstrate that the primordial imprints of vectorial parity violation persist through non-linear gravitational evolution, highlighting the reliability of galaxy spin as a sensitive probe for testing the vectorial parity-invariance in the early Universe.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 5 pages, 3 figures. Submitted to PRL
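The helicity of a vector field $\mathbf{v}$ is the average of $\mathbf{v}\cdot(\nabla\times\mathbf{v})$, a pseudoscalar that changes sign under a mirror reflection, which is why it tracks parity asymmetry. A minimal NumPy sketch on a periodic grid using spectral derivatives is below; building the initial-spin and halo-spin fields from simulations, and the paper's estimator and normalization, are not reproduced, and the function names are illustrative.

```python
import numpy as np


def curl_periodic(vx, vy, vz, box_size):
    """Spectral curl of a vector field sampled on a periodic n^3 grid (axes ordered x, y, z)."""
    n = vx.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)  # angular wavenumbers
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    fx, fy, fz = (np.fft.fftn(c) for c in (vx, vy, vz))
    # A derivative along an axis becomes multiplication by i*k in Fourier space.
    cx = np.fft.ifftn(1j * (ky * fz - kz * fy)).real
    cy = np.fft.ifftn(1j * (kz * fx - kx * fz)).real
    cz = np.fft.ifftn(1j * (kx * fy - ky * fx)).real
    return cx, cy, cz


def mean_helicity(vx, vy, vz, box_size):
    """Volume-averaged helicity density <v . (curl v)>; its sign flips under parity."""
    cx, cy, cz = curl_periodic(vx, vy, vz, box_size)
    return float(np.mean(vx * cx + vy * cy + vz * cz))


if __name__ == "__main__":
    n, box = 32, 2.0 * np.pi
    x = np.arange(n) * box / n
    X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
    # ABC flow with A = B = C = 1: curl v = v, so <v . curl v> = <|v|^2> = 3.
    vx, vy, vz = np.sin(Z) + np.cos(Y), np.sin(X) + np.cos(Z), np.sin(Y) + np.cos(X)
    print(round(mean_helicity(vx, vy, vz, box), 6))  # 3.0
    # Mirror image under x -> -x (v_x also flips sign): the helicity becomes -3.
    print(round(mean_helicity(-(np.sin(Z) + np.cos(Y)), -np.sin(X) + np.cos(Z),
                              np.sin(Y) + np.cos(X), box), 6))  # -3.0
```

A parity-symmetric ensemble would average to zero helicity, so a persistent nonzero mean in a spin field is the kind of signal the abstract describes.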

arXiv:2406.06056 [pdf, other]

Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text

Authors: Avijit Mitra, Emily Druhl, Raelene Goodwin, Hong Yu

Abstract: Social and behavioral determinants of health (SBDH) play a crucial role in health outcomes and are frequently documented in clinical text. Automatically extracting SBDH information from clinical text relies on good-quality, publicly available datasets. However, existing SBDH datasets exhibit substantial limitations in their availability and coverage. In this study, we introduce Synth-SBDH, a novel synthetic dataset with detailed SBDH annotations, encompassing status, temporal information, and rationale across 15 SBDH categories. We showcase the utility of Synth-SBDH on three tasks using real-world clinical datasets from two distinct hospital settings, highlighting its versatility, generalizability, and distillation capabilities. Models trained on Synth-SBDH consistently outperform counterparts with no Synth-SBDH training, achieving up to 62.5% macro-F improvements. Additionally, Synth-SBDH proves effective for rare SBDH categories and under resource constraints. Human evaluation demonstrates a Human-LLM alignment of 71.06% and uncovers areas for future refinement.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Github: https://github.com/avipartho/Synth-SBDH

arXiv:2406.06045 [pdf, other]

Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training

Authors: Ke Niu, Haiyang Yu, Xuelin Qian, Teng Fu, Bin Li, Xiangyang Xue

Abstract: Existing person re-identification (Re-ID) methods principally use the ImageNet-1K dataset for model initialization, which inevitably leads to sub-optimal results due to the large domain gap. One of the key challenges is that building large-scale person Re-ID datasets is time-consuming. Some previous efforts address this problem by collecting person images from the internet (e.g., LUPerson), but learning from such unlabeled, uncontrollable, and noisy data remains difficult. In this paper, we present a novel paradigm, Diffusion-ReID, to efficiently augment and generate diverse images based on known identities without any data-collection or annotation cost. Technically, this paradigm unfolds in two stages: generation and filtering. During the generation stage, we propose Language Prompts Enhancement (LPE) to ensure ID consistency between the input image sequence and the generated images. In the diffusion process, we propose a Diversity Injection (DI) module to increase attribute diversity. To improve the quality of the generated data, we apply a Re-ID confidence threshold filter to remove low-quality images. Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset, Diff-Person, which consists of over 777K images from 5,183 identities. Next, we build a stronger person Re-ID backbone pre-trained on Diff-Person. Extensive experiments are conducted on four person Re-ID benchmarks in six widely used settings. Compared with other pre-training and self-supervised competitors, our approach shows significant superiority.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06028 [pdf, other]

ReCon1M: A Large-scale Benchmark Dataset for Relation Comprehension in Remote Sensing Imagery

Authors: Xian Sun, Qiwei Yan, Chubo Deng, Chenglong Liu, Yi Jiang, Zhongyan Hou, Wanxuan Lu, Fanglong Yao, Xiaoyu Liu, Lingxiang Hao, Hongfeng Yu

Abstract: Scene Graph Generation (SGG) is a high-level visual understanding and reasoning task aimed at extracting entities (such as objects) and their interrelationships from images. Significant progress has been made in the study of SGG in natural images in recent years, but its exploration in the domain of remote sensing images remains very limited. Because of the complex characteristics of remote sensing images, annotating them requires more time and manual interpretation effort than natural images. The lack of a large-scale public SGG benchmark is a major impediment to the advancement of SGG-related research in aerial imagery. In this paper, we introduce ReCon1M, the first publicly available large-scale, million-level relation dataset in the field of remote sensing images. Specifically, our dataset is built upon Fair1M and comprises 21,392 images. It includes annotations for 859,751 object bounding boxes across 60 different categories, and 1,149,342 relation triplets across 64 categories based on these bounding boxes. We provide a detailed description of the dataset's characteristics and statistical information. We conducted two object detection tasks and three sub-tasks within SGG on this dataset, assessing the performance of mainstream methods on these tasks.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.05644 [pdf, other]

How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States

Authors: Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Yongbin Li

Abstract: Large language models (LLMs) rely on safety alignment to avoid responding to malicious user inputs. Unfortunately, jailbreaks can circumvent safety guardrails, causing LLMs to generate harmful content and raising concerns about LLM safety. Because language models with massive numbers of parameters are often regarded as black boxes, the mechanisms of alignment and jailbreak are challenging to elucidate. In this paper, we employ weak classifiers to explain LLM safety through the intermediate hidden states. We first confirm that LLMs learn ethical concepts during pre-training rather than alignment and can identify malicious and normal inputs in the early layers. Alignment actually associates the early concepts with emotion guesses in the middle layers and then refines them to the specific reject tokens for safe generations. Jailbreak disturbs the transformation of early unethical classification into negative emotions. We conduct experiments on models from 7B to 70B across various model families to support our conclusion. Overall, our paper reveals the intrinsic mechanism of LLM safety and how jailbreaks circumvent safety guardrails, offering a new perspective on LLM safety and reducing concerns. Our code is available at https://github.com/ydyjya/LLM-IHS-Explanation.

Submitted 13 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: 27 pages

arXiv:2406.05391 [pdf, other]

DUPLEX: Dual GAT for Complex Embedding of Directed Graphs

Authors: Zhaoru Ke, Hang Yu, Jianguo Li, Haipeng Zhang

Abstract: Current directed graph embedding methods build upon undirected techniques but often inadequately capture directed edge information, leading to challenges such as: (1) Suboptimal representations for nodes with low in/out-degrees, due to the insufficient neighbor interactions; (2) Limited inductive ability for representing new nodes post-training; (3) Narrow generalizability, as training is overly coupled with specific tasks. In response, we propose DUPLEX, an inductive framework for complex embeddings of directed graphs. It (1) leverages Hermitian adjacency matrix decomposition for comprehensive neighbor integration, (2) employs a dual GAT encoder for directional neighbor modeling, and (3) features two parameter-free decoders to decouple training from particular tasks. DUPLEX outperforms state-of-the-art models, especially for nodes with sparse connectivity, and demonstrates robust inductive capability and adaptability across various tasks. The code is available at https://github.com/alipay/DUPLEX.

Submitted 19 July, 2024; v1 submitted 8 June, 2024; originally announced June 2024.
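DUPLEX builds on a Hermitian adjacency matrix of the directed graph. One common construction (the Guo-Mohar convention) sets $H_{uv} = 1$ for reciprocal edges, $H_{uv} = i$ and $H_{vu} = -i$ for a one-way edge $u \to v$, and 0 otherwise, so that $H$ equals its conjugate transpose and has a real spectrum. Whether DUPLEX uses exactly this phase convention is an assumption here; the sketch only shows how such a matrix keeps direction in its imaginary part while remaining decomposable.

```python
import numpy as np


def hermitian_adjacency(num_nodes, edges):
    """Hermitian adjacency of a digraph: 1 for u<->v, i for u->v only, -i for v->u only."""
    a = np.zeros((num_nodes, num_nodes), dtype=bool)
    for u, v in edges:
        a[u, v] = True
    h = np.zeros((num_nodes, num_nodes), dtype=complex)
    h[a & a.T] = 1.0      # reciprocal edge
    h[a & ~a.T] = 1j      # u -> v only
    h[~a & a.T] = -1j     # v -> u only
    return h


if __name__ == "__main__":
    # 0 -> 1 is one-way; 1 <-> 2 is reciprocal.
    h = hermitian_adjacency(3, [(0, 1), (1, 2), (2, 1)])
    print(np.allclose(h, h.conj().T))  # True: H is Hermitian
    print(np.linalg.eigvalsh(h))       # real eigenvalues
```

Because $H$ is Hermitian, `np.linalg.eigvalsh` applies and returns real eigenvalues, while swapping the direction of an edge conjugates the corresponding entries rather than leaving the matrix unchanged as a symmetrized adjacency would.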

arXiv:2406.03848 [pdf, other]

OceanCastNet: A Deep Learning Ocean Wave Model with Energy Conservation

Authors: Ziliang Zhang, Huaming Yu, Danqin Ren

Abstract: Traditional wave forecasting models, although based on energy conservation equations, are computationally expensive. On the other hand, existing deep learning geophysical fluid models, while computationally efficient, often suffer from issues such as energy dissipation in long-term forecasts. This paper proposes a novel energy-balanced deep learning wave forecasting model called OceanCastNet (OCN). By incorporating wind fields at the current, previous, and future time steps, as well as wave fields at the current and previous time steps as input variables, OCN maintains energy balance within the model. Furthermore, the model employs adaptive Fourier operators as its core components and uses a masked loss function to better handle the impact of land-sea boundaries. A series of experiments on the ERA5 dataset demonstrates that OCN can achieve short-term forecast accuracy comparable to traditional models while exhibiting an understanding of the wave generation process. In comparative experiments under both normal and extreme conditions, OCN consistently outperforms the widely used WaveWatch III model in the industry. Even after long-term forecasting, OCN maintains a stable and energy-rich state. By further constructing a simple meteorological model, OCN-wind, which considers energy balance, this paper confirms the importance of energy constraints for improving the long-term forecast performance of deep learning meteorological models. This finding provides new ideas for future research on deep learning geophysical fluid models.

Submitted 9 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.
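A masked loss for land-sea boundaries is commonly implemented by averaging the error only over ocean grid cells. The sketch below is a generic masked mean-squared error under that assumption (a boolean ocean mask on a lat-lon grid, with possibly undefined values over land); the exact form of OCN's loss, including any latitude weighting, is not given in the abstract, and the function name is hypothetical.

```python
import numpy as np


def masked_mse(pred: np.ndarray, target: np.ndarray, ocean_mask: np.ndarray) -> float:
    """Mean squared error over ocean cells only; land cells (mask False) are ignored.

    pred, target: (..., lat, lon) arrays, e.g. significant wave height.
    ocean_mask: (lat, lon) boolean array, broadcast over any leading dimensions.
    """
    mask = np.broadcast_to(ocean_mask, pred.shape)
    if not mask.any():
        raise ValueError("mask selects no ocean cells")
    # Zero out land residuals (even NaN ones) before squaring, then average over ocean cells.
    diff = np.where(mask, pred - target, 0.0)
    return float(np.sum(diff ** 2) / np.count_nonzero(mask))


if __name__ == "__main__":
    pred = np.array([[1.0, 100.0], [2.0, 3.0]])
    target = np.array([[0.0, np.nan], [2.0, 1.0]])  # NaN over land, as in many wave reanalyses
    mask = np.array([[True, False], [True, True]])
    print(masked_mse(pred, target, mask))  # (1 + 0 + 4) / 3
```

Dividing by the number of ocean cells, rather than by the full grid size, keeps the loss scale independent of how much land a region contains.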

arXiv:2406.02956 [pdf, other]

doi 10.1103/PhysRevD.110.L021304

Alleviating the Hubble-constant tension and the growth tension via a transition of absolute magnitude favored by the Pantheon+ sample

Authors: Yang Liu, Hongwei Yu, Puxun Wu

Abstract: We establish a cosmological-model-independent method to extract the apparent magnitude and its derivative at different redshifts from the Pantheon+ type Ia supernova sample, and find that the obtained values deviate clearly from the prediction of the $Λ$CDM model at the lowest redshift. This deviation can be explained as a result of a transition of the absolute magnitude $M$ in the low-redshift region. The observations seem to favor this transition, since the minimum values of $χ^2$ for two ansatzes of a varying $M$ are smaller than that for a constant $M$. For a varying $M$, the Hubble-constant tension is reduced from more than $5σ$ to about $1$ to $2σ$, and the growth tension can be resolved by attributing the variation of $M$ to a modification of the effective Newton's constant.

Submitted 24 July, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: 20 pages, 5 figures, 4 tables. Published in Physical Review D (Letter)
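
The link between a shift in the absolute magnitude $M$ and the inferred Hubble constant can be made concrete with the low-redshift distance modulus, $m = M + 5\log_{10}(d_L/\mathrm{Mpc}) + 25$ with $d_L \approx cz/H_0$. For fixed observed $m$ and $z$ this gives $H_0 \propto 10^{M/5}$. The sketch below is only this textbook relation, not the paper's model-independent reconstruction, and the numbers are illustrative assumptions.

```python
import math

C_KM_S = 299_792.458  # speed of light in km/s


def apparent_magnitude(abs_mag: float, z: float, h0: float) -> float:
    """Low-z apparent magnitude using the Hubble-law distance d_L = c z / H0 (Mpc)."""
    d_l_mpc = C_KM_S * z / h0
    return abs_mag + 5.0 * math.log10(d_l_mpc) + 25.0


def rescale_h0(h0: float, delta_abs_mag: float) -> float:
    """H0 that reproduces the same m(z) after M changes by delta_abs_mag.

    From m = M + 5 log10(c z / H0) + 25 with m fixed, H0 scales as 10**(dM / 5):
    a brighter calibration (dM < 0) lowers the inferred H0.
    """
    return h0 * 10.0 ** (delta_abs_mag / 5.0)


# Illustrative values only: shifting M by -0.2 mag lowers H0 from 73 to about 66.6.
print(round(rescale_h0(73.0, -0.2), 1))
```

This is why a change in $M$ between the calibration (lowest-redshift) regime and the Hubble-flow regime can move the inferred $H_0$ by several km/s/Mpc.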

arXiv:2406.02948

Copula-based semiparametric nonnormal transformed linear model for survival data with dependent censoring

Authors: Huazhen Yu, Lixin Zhang

Abstract: Although the independent censoring assumption is commonly used in survival analysis, it can be violated when the censoring time is related to the survival time, as often happens in practical applications. To address this issue, we propose a flexible semiparametric method for dependently censored data. Our approach involves fitting the survival time and the censoring time with a joint transformed linear model, where the transformation function is unspecified. This allows for a very general class of models that can account for possible covariate effects, while also accommodating administrative censoring. We assume that the transformed variables have a bivariate nonnormal distribution based on parametric copulas and parametric marginals, which further enhances the flexibility of our method. We demonstrate the identifiability of the proposed model and establish the consistency and asymptotic normality of the model parameters under appropriate regularity conditions and assumptions. Furthermore, we evaluate the performance of our method through extensive simulation studies, and provide a real data example for illustration.

Submitted 5 June, 2024; originally announced June 2024.
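
To show how a parametric copula ties the survival-time and censoring-time margins together, here is a sketch using the Clayton family, $C_\theta(u,v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$ for $\theta > 0$, whose Kendall's tau is $\theta/(\theta+2)$. The choice of Clayton and of exponential margins is an assumption for illustration; the abstract does not say which copulas or marginals the authors use, and their model additionally involves an unspecified transformation.

```python
import math


def clayton_cdf(u: float, v: float, theta: float) -> float:
    """Clayton copula C(u, v) for theta > 0 (positive dependence)."""
    if theta <= 0:
        raise ValueError("this sketch only covers theta > 0")
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_kendall_tau(theta: float) -> float:
    return theta / (theta + 2.0)


def joint_cdf(t: float, c: float, rate_t: float, rate_c: float, theta: float) -> float:
    """P(T <= t, C <= c) with exponential margins linked by a Clayton copula."""
    f_t = 1.0 - math.exp(-rate_t * t)  # marginal CDF of the survival time
    f_c = 1.0 - math.exp(-rate_c * c)  # marginal CDF of the censoring time
    return clayton_cdf(f_t, f_c, theta)


theta = 2.0
print(clayton_kendall_tau(theta))  # 0.5
print(joint_cdf(1.0, 2.0, rate_t=0.5, rate_c=0.3, theta=theta))
```

Setting one argument of the copula to 1 recovers the other margin, and for $\theta > 0$ the joint CDF exceeds the independence product $uv$, which is exactly the dependence that the independent-censoring assumption ignores.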

arXiv:2406.01602

Effectiveness of denoising diffusion probabilistic models for fast and high-fidelity whole-event simulation in high-energy heavy-ion experiments

Authors: Yeonju Go, Dmitrii Torbunov, Timothy Rinn, Yi Huang, Haiwang Yu, Brett Viren, Meifeng Lin, Yihui Ren, Jin Huang

Abstract: Artificial intelligence (AI) generative models, such as generative adversarial networks (GANs), variational auto-encoders, and normalizing flows, have been widely used and studied as efficient alternatives to traditional scientific simulations. However, they have several drawbacks, including training instability and inability to cover the entire data distribution, especially for regions where data are rare. This is particularly challenging for whole-event, full-detector simulations in high-energy heavy-ion experiments, such as sPHENIX at the Relativistic Heavy Ion Collider and Large Hadron Collider experiments, where thousands of particles are produced per event and interact with the detector. This work investigates the effectiveness of Denoising Diffusion Probabilistic Models (DDPMs) as an AI-based generative surrogate model for the sPHENIX experiment that includes the heavy-ion event generation and response of the entire calorimeter stack. DDPM performance on sPHENIX simulation data is compared with a popular rival, GANs. Results show that both DDPMs and GANs can reproduce the data distribution where the examples are abundant (low-to-medium calorimeter energies). Nonetheless, DDPMs significantly outperform GANs, especially in high-energy regions where data are rare. Additionally, DDPMs exhibit superior stability compared to GANs. The results are consistent between central and peripheral heavy-ion collision events. Moreover, DDPMs offer a substantial speedup of approximately a factor of 100 compared to the traditional Geant4 simulation method.

Submitted 23 May, 2024; originally announced June 2024.

Comments: 11 pages, 7 figures
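
For readers unfamiliar with DDPMs, the forward (noising) process that these models learn to invert can be sketched in a few lines: with a variance schedule $\beta_t$ and $\bar\alpha_t = \prod_{s\le t}(1-\beta_s)$, a noisy sample is $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$. The linear schedule and its endpoints below are common defaults assumed for illustration, not the sPHENIX study's settings, and the reverse (denoising) network is omitted.

```python
import math
import random


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * t for t in range(num_steps)]


def cumulative_alphas(betas: list[float]) -> list[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0: list[float], t: int, alpha_bar: list[float], rng: random.Random) -> list[float]:
    """Draw x_t ~ q(x_t | x_0) in closed form for a flat list of values."""
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)
alpha_bar = cumulative_alphas(linear_beta_schedule(1000))
x0 = [1.0, -1.0, 0.5]  # stand-in for, e.g., a few calorimeter cell energies
print(q_sample(x0, 0, alpha_bar, rng))    # nearly x0
print(q_sample(x0, 999, alpha_bar, rng))  # nearly pure Gaussian noise
```

A trained DDPM generates samples by running a learned approximation of this process in reverse, starting from pure noise.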

arXiv:2406.01304

CodeR: Issue Resolving with Multi-Agent and Task Graphs

Authors: Dong Chen, Shaoxin Lin, Muhan Zeng, Daoguang Zan, Jian-Gang Wang, Anton Cheshkov, Jun Sun, Hao Yu, Guoliang Dong, Artem Aliev, Jie Wang, Xiao Cheng, Guangtai Liang, Yuchi Ma, Pan Bian, Tao Xie, Qianxiang Wang

Abstract: GitHub issue resolution has recently attracted significant attention from academia and industry, and SWE-bench has been proposed to measure performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within a code Repository. On SWE-bench lite, CodeR solves 28.33% of issues when submitting only once per issue. We examine the performance impact of each design choice in CodeR and offer insights to advance this research direction.

Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: https://github.com/NL2Code/CodeR
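
A pre-defined task graph of the kind CodeR uses can be represented as a dependency map and executed in topological order. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the agent names and edges are hypothetical placeholders, not CodeR's actual graph, and each "agent" is just a stub that reports its turn.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each node maps to the set of nodes that must finish first.
TASK_GRAPH = {
    "reproducer": {"manager"},
    "fault_localizer": {"reproducer"},
    "editor": {"fault_localizer"},
    "verifier": {"editor", "reproducer"},
}


def run_task_graph(graph: dict[str, set[str]], agent) -> list[str]:
    """Call `agent` on every node in a dependency-respecting order; return that order."""
    order = []
    for node in TopologicalSorter(graph).static_order():
        agent(node)
        order.append(node)
    return order


order = run_task_graph(TASK_GRAPH, lambda name: print(f"running {name}"))
print(order)
```

A real multi-agent system would also pass artifacts (for example a reproduction script or a candidate patch) along the edges; this stub only shows how a fixed graph constrains the order of work, and `static_order()` raises `graphlib.CycleError` if the graph is not acyclic.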

arXiv:2406.01235

Boosting Spatial-Spectral Masked Auto-Encoder Through Mining Redundant Spectra for HSI-SAR/LiDAR Classification

Authors: Junyan Lin, Xuepeng Jin, Feng Gao, Junyu Dong, Hui Yu

Abstract: Although recent masked image modeling (MIM)-based HSI-LiDAR/SAR classification methods have gradually recognized the importance of spectral information, they have not adequately addressed the redundancy among different spectra, resulting in information leakage during the pretraining stage. This issue directly impairs the representation ability of the model. To tackle the problem, we propose a new strategy, named Mining Redundant Spectra (MRS). Unlike randomly masking spectral bands, MRS selectively masks them by similarity to increase the reconstruction difficulty. Specifically, a random spectral band is chosen during pretraining, and the selected band and the bands highly similar to it are masked. Experimental results demonstrate that employing the MRS strategy during the pretraining stage effectively improves the accuracy of existing MIM-based methods on the Berlin and Houston 2018 datasets.

Submitted 3 June, 2024; originally announced June 2024.

Comments: Accepted by IGARSS 2024
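
The masking rule described above (pick a random band, then mask it together with the bands most similar to it) can be sketched as follows. Cosine similarity and a fixed threshold are assumptions made here for illustration; the abstract does not specify the similarity measure or how many similar bands are masked.

```python
import math
import random
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_mask(
    bands: Sequence[Sequence[float]],
    threshold: float,
    rng: random.Random,
    anchor: Optional[int] = None,
) -> list[int]:
    """Indices of bands to mask: a random anchor band plus every band whose
    cosine similarity to it is at least `threshold`.

    `bands[i]` is the flattened image of spectral band i.
    """
    if anchor is None:
        anchor = rng.randrange(len(bands))
    ref = bands[anchor]
    return [i for i, band in enumerate(bands) if i == anchor or cosine(ref, band) >= threshold]


# Bands 0-2 are near-duplicates of one another; bands 3 and 4 are not.
bands = [[1, 2, 3, 4], [1.1, 2, 3, 4.1], [2, 4, 6, 8], [4, 3, 2, 1], [1, 0, 1, 0]]
print(similarity_mask(bands, 0.99, random.Random(0)))
```

Masking all near-duplicates together prevents the model from "reconstructing" a masked band by copying a visible twin, which is the leakage the abstract describes.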

arXiv:2406.01007

Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay

Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, J. Cheng, Y. -C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, et al. (177 additional authors not shown)

Abstract: This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overline{ν}_{e}$ rates and energy-spectrum variations among the near and far detectors give $\sin^2 2θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent of the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\sin^2 2θ_{13} = 0.0833\pm0.0022$, which represents an 8% relative improvement in precision over the Daya Bay full 3158-day capture-on-gadolinium result.

Submitted 3 June, 2024; originally announced June 2024.
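
The quoted parameters enter the reactor survival probability. A common two-flavor approximation, which neglects the slower solar-scale term and uses $Δm^2_{32}$ in place of the effective $Δm^2_{ee}$ (both simplifications assumed here, not the Letter's full fit), is $P(\overline{ν}_e \to \overline{ν}_e) \approx 1 - \sin^2 2θ_{13}\,\sin^2(1.267\,Δm^2[\mathrm{eV}^2]\,L[\mathrm{m}]/E[\mathrm{MeV}])$. The baselines and the 4 MeV energy below are illustrative.

```python
import math

SIN2_2THETA13 = 0.0759  # capture-on-hydrogen central value quoted in the abstract
DM2_EV2 = 2.72e-3       # |dm^2_32| for normal ordering, used here as an effective dm^2


def survival_probability(baseline_m: float, energy_mev: float,
                         sin2_2theta: float = SIN2_2THETA13,
                         dm2_ev2: float = DM2_EV2) -> float:
    """Two-flavor electron-antineutrino survival probability."""
    phase = 1.267 * dm2_ev2 * baseline_m / energy_mev
    return 1.0 - sin2_2theta * math.sin(phase) ** 2


def first_maximum_baseline(energy_mev: float, dm2_ev2: float = DM2_EV2) -> float:
    """Baseline (m) at which the oscillation phase first reaches pi/2."""
    return (math.pi / 2.0) * energy_mev / (1.267 * dm2_ev2)


for baseline in (500.0, 1600.0):
    print(baseline, round(survival_probability(baseline, 4.0), 4))
print(round(first_maximum_baseline(4.0)))  # metres, for a 4 MeV antineutrino
```

The depth of the deficit is set by $\sin^2 2θ_{13}$ (the amplitude) and its position in $L/E$ by $Δm^2$ (the frequency), which is why comparing near and far detectors constrains both.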

arXiv:2406.00502

Non-geodesically-convex optimization in the Wasserstein space

Authors: Hoang Phuc Hau Luu, Hanlin Yu, Bernardo Williams, Petrus Mikkola, Marcelo Hartmann, Kai Puolamäki, Arto Klami

Abstract: We study a class of optimization problems in the Wasserstein space (the space of probability measures) where the objective function is nonconvex along generalized geodesics. When the regularization term is the negative entropy, the optimization problem becomes a sampling problem where it minimizes the Kullback-Leibler divergence between a probability measure (optimization variable) and a target probability measure whose logarithmic probability density is a nonconvex function. We derive multiple convergence insights for a novel semi Forward-Backward Euler scheme under several nonconvex (and possibly nonsmooth) regimes. Notably, the semi Forward-Backward Euler is just a slight modification of the Forward-Backward Euler, whose convergence is, to our knowledge, still unknown in our very general non-geodesically-convex setting.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2406.00494

Activation-Descent Regularization for Input Optimization of ReLU Networks

Authors: Hongzhan Yu, Sicun Gao

Abstract: We present a new approach for input optimization of ReLU networks that explicitly takes into account the effect of changes in activation patterns. We analyze local optimization steps in both the input space and the space of activation patterns to propose methods with superior local descent properties. To accomplish this, we convert the discrete space of activation patterns into differentiable representations and propose regularization terms that improve each descent step. Our experiments demonstrate the effectiveness of the proposed input-optimization methods for improving the state-of-the-art in various areas, such as adversarial learning, generative modeling, and reinforcement learning.

Submitted 1 June, 2024; originally announced June 2024.

Comments: ICML'24 Proceedings

arXiv:2406.00488

Federated Model Heterogeneous Matryoshka Representation Learning

Authors: Liping Yi, Han Yu, Chao Ren, Gang Wang, Xiaoguang Liu, Xiaoxiao Li

Abstract: Model heterogeneous federated learning (MHeteroFL) enables FL clients to collaboratively train models with heterogeneous structures in a distributed fashion. However, existing MHeteroFL methods rely on training loss to transfer knowledge between the client model and the server model, resulting in limited knowledge exchange. To address this limitation, we propose the Federated model heterogeneous Matryoshka Representation Learning (FedMRL) approach for supervised learning tasks. It adds an auxiliary small homogeneous model shared by clients with heterogeneous local models. (1) The generalized and personalized representations extracted by the two models' feature extractors are fused by a personalized lightweight representation projector. This step enables representation fusion to adapt to the local data distribution. (2) The fused representation is then used to construct Matryoshka representations with multi-dimensional and multi-granular embedded representations learned by the global homogeneous model header and the local heterogeneous model header. This step facilitates multi-perspective representation learning and improves model learning capability. Theoretical analysis shows that FedMRL achieves an $O(1/T)$ non-convex convergence rate. Extensive experiments on benchmark datasets demonstrate its superior model accuracy with low communication and computational costs compared to seven state-of-the-art baselines. It achieves accuracy improvements of up to 8.48% and 24.94% compared with the state-of-the-art and the best same-category baseline, respectively.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2406.00399

Patterned Beam Training: A Novel Low-Complexity and Low-Overhead Scheme for ELAA

Authors: Hongkang Yu, Yuan Si, Shujuan Zhang, Yijian Chen

Abstract: Extremely large antenna arrays (ELAAs) can provide higher spectral efficiency. However, the use of narrower beams for data transmission significantly increases the overhead associated with beam training. In this letter, we propose a novel patterned beam training (PBT) scheme characterized by its low overhead and complexity. This scheme requires only a single linear operation by both the base station and the user equipment to determine the optimal beam, reducing the training overhead to half or even less compared to traditional exhaustive search methods. Furthermore, we discuss the pattern design principles in detail and provide specific forms. Simulation results demonstrate that the proposed scheme outperforms the compared methods in terms of beam alignment accuracy and achieves a balance between signal-to-noise ratio (SNR) conditions and training overhead, making it a promising alternative.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2406.00265

Shadows, Quasinormal Modes, and Optical Appearances of Black Holes in Horndeski Theory

Authors: Zhi Luo, Jin Li, Ke-Jian He, Hao Yu

Abstract: This work describes the motion of photons in black hole (BH) spacetimes within the framework of Horndeski theory. We focus on the shadows, quasinormal modes (QNMs) and optical appearances of BHs surrounded by geometrically thin accretion disks. The QNMs of BHs are calculated both with the WKB method and in the eikonal limit. Using Event Horizon Telescope (EHT) observations of $\mathrm{M} 87^*$ and $\mathrm{Sgr} \mathrm{A}^*$, we constrain the parameter in Horndeski theory to a small range. Based on this constraint, we obtain the frequency ranges of the fundamental modes for $\mathrm{M} 87^*$ and $\mathrm{Sgr} \mathrm{A}^*$ in Horndeski theory. By exploring the optical appearances of BHs, we find that at its current resolution the EHT primarily captures direct emission. This work advances our understanding of the observational characteristics of BHs in Horndeski theory and constrains Horndeski theory with EHT observations of $\mathrm{M} 87^*$ and $\mathrm{Sgr} \mathrm{A}^*$.

Submitted 22 July, 2024; v1 submitted 31 May, 2024; originally announced June 2024.
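
As a reference point for the eikonal-limit calculation, the Schwarzschild case (general relativity, $G=c=1$) has closed-form light-ring quantities: the photon-sphere angular velocity and Lyapunov exponent are both $Ω_c = λ = 1/(3\sqrt{3}M)$, the shadow's critical impact parameter is $b_c = 3\sqrt{3}M = 1/Ω_c$, and for large $l$ the modes are $ω \approx Ω_c\,l - i(n + 1/2)|λ|$. The sketch evaluates only this GR baseline; the Horndeski metric and its extra parameter from the paper are not modeled.

```python
import math


def photon_sphere_frequency(mass: float = 1.0) -> float:
    """Angular velocity of the Schwarzschild light ring, 1 / (3*sqrt(3)*M)."""
    return 1.0 / (3.0 * math.sqrt(3.0) * mass)


def eikonal_qnm(l: int, n: int, mass: float = 1.0) -> complex:
    """Large-l quasinormal frequency: Re = Omega_c * l, Im = -(n + 1/2) * lambda."""
    omega_c = photon_sphere_frequency(mass)
    lyapunov = omega_c  # for Schwarzschild the Lyapunov exponent equals Omega_c
    return complex(omega_c * l, -(n + 0.5) * lyapunov)


def shadow_impact_parameter(mass: float = 1.0) -> float:
    """Critical impact parameter b_c = 3*sqrt(3)*M, which sets the shadow size."""
    return 3.0 * math.sqrt(3.0) * mass


w = eikonal_qnm(l=2, n=0)
print(f"M*omega ~ {w.real:.4f} {w.imag:+.4f}i")
```

The identity $b_c = 1/Ω_c$ is what links shadow measurements to the real part of eikonal QNM frequencies; the formula is a large-$l$ approximation and becomes less accurate for the lowest multipoles.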

arXiv:2406.00187

doi 10.3847/1538-4357/ad5ffb

Are WASP-107-like Systems Consistent with High-eccentricity Migration?

Authors: Hang Yu, Fei Dai

Abstract: WASP-107 b seems to be a poster child of the long-suspected high-eccentricity migration scenario. It is on a 5.7-day, polar orbit. The planet is Jupiter-like in radius but Neptune-like in mass, with exceptionally low density. WASP-107 c is on a 1100-day, $e=0.28$ orbit with at least Saturn mass. Planet b may still have a residual eccentricity of $0.06\pm 0.04$: the ongoing tidal dissipation leads to the observed internally heated atmosphere and hydrodynamic atmospheric erosion. We present a population synthesis study coupling octopole Lidov-Kozai oscillations with various short-range forces, while simultaneously accounting for the radius inflation and tidal disruption of the planet. We find that a high-eccentricity migration scenario can successfully explain nearly all observed system properties. Our simulations further suggest that the initial location of WASP-107 b at the onset of migration is likely within the snowline ($<0.5\,{\rm AU}$). More distant initial orbits usually lead to tidal disruption or orbit crossing. WASP-107 b most likely lost no more than 20% of its mass during the high-eccentricity migration, i.e., it did not form as a Jupiter-mass object. More vigorous tidally-induced mass loss leads to disruption of the planet during migration. We predict that the current-day mutual inclination between planets b and c is substantial, at least 25-55$^\circ$, which may be tested with future Gaia astrometric observations. Knowing the current-day mutual inclination may further constrain the initial orbit of planet b. We suggest that the proposed high-eccentricity migration scenario of WASP-107 may be applicable to HAT-P-11, GJ-3470, HAT-P-18, and GJ-436, which have similar orbital architectures.

Submitted 9 July, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

Comments: 19 pages, 10 figures. Accepted by ApJ

arXiv:2405.20337

OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving

Authors: Lening Wang, Wenzhao Zheng, Yilong Ren, Han Jiang, Zhiyong Cui, Haiyang Yu, Jiwen Lu

Abstract: Understanding the evolution of 3D scenes is important for effective autonomous driving. While conventional methods model scene development with the motion of individual instances, world models have emerged as a generative framework to describe the general scene dynamics. However, most existing methods adopt an autoregressive framework to perform next-token prediction, which is inefficient at modeling long-term temporal evolution. To address this, we propose a diffusion-based 4D occupancy generation model, OccSora, to simulate the development of the 3D world for autonomous driving. We employ a 4D scene tokenizer to obtain compact discrete spatial-temporal representations for 4D occupancy input and achieve high-quality reconstruction for long-sequence occupancy videos. We then learn a diffusion transformer on the spatial-temporal representations and generate 4D occupancy conditioned on a trajectory prompt. We conduct extensive experiments on the widely used nuScenes dataset with Occ3D occupancy annotations. OccSora can generate 16-second videos with authentic 3D layout and temporal consistency, demonstrating its ability to understand the spatial and temporal distributions of driving scenes. With trajectory-aware 4D generation, OccSora has the potential to serve as a world simulator for the decision-making of autonomous driving. Code is available at: https://github.com/wzzheng/OccSora.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Code is available at: https://github.com/wzzheng/OccSora

arXiv:2405.19902

Learning Discriminative Dynamics with Label Corruption for Noisy Label Detection

Authors: Suyeon Kim, Dongha Lee, SeongKu Kang, Sukang Chae, Sanghwan Jang, Hwanjo Yu

Abstract: Label noise, commonly found in real-world datasets, has a detrimental impact on a model's generalization. To effectively detect incorrectly labeled instances, previous works have mostly relied on distinguishable training signals, such as training loss, as indicators to differentiate between clean and noisy labels. However, these signals reveal the model's behavior only incompletely and do not generalize effectively to various noise types, resulting in limited detection accuracy. In this paper, we propose the DynaCor framework, which distinguishes incorrectly labeled instances from correctly labeled ones based on the dynamics of the training signals. To cope with the absence of supervision for clean and noisy labels, DynaCor first introduces a label corruption strategy that augments the original dataset with intentionally corrupted labels, enabling indirect simulation of the model's behavior on noisy labels. Then, DynaCor learns to identify clean and noisy instances by inducing two clearly distinguishable clusters from the latent representations of training dynamics. Our comprehensive experiments show that DynaCor outperforms state-of-the-art competitors and is robust to various noise types and noise rates.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted to CVPR 2024
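
The label-corruption step can be sketched as augmenting the dataset with copies of randomly chosen instances whose labels are deliberately flipped, each tagged as known-corrupted so that it can serve as indirect supervision. The corruption rate, the uniform choice of wrong class, and the tuple layout are assumptions for illustration; DynaCor's training-dynamics encoder and clustering are not shown.

```python
import random


def augment_with_corrupted_copies(
    xs: list[int], ys: list[int], num_classes: int, rate: float, rng: random.Random
) -> list[tuple[int, int, bool]]:
    """Return (x, y, known_corrupted) triples.

    Original pairs are flagged False (their cleanliness is unknown), and
    round(rate * n) extra copies with a deliberately wrong label are flagged True.
    """
    augmented = [(x, y, False) for x, y in zip(xs, ys)]
    num_corrupt = round(rate * len(xs))
    for i in rng.sample(range(len(xs)), num_corrupt):
        wrong = rng.choice([c for c in range(num_classes) if c != ys[i]])
        augmented.append((xs[i], wrong, True))
    return augmented


rng = random.Random(0)
xs = list(range(10))          # stand-in instance ids
ys = [i % 3 for i in xs]      # their (possibly noisy) labels
data = augment_with_corrupted_copies(xs, ys, num_classes=3, rate=0.3, rng=rng)
print(sum(flag for _, _, flag in data), "corrupted copies out of", len(data))
```

Training on this augmented set and recording per-instance signals over epochs yields trajectories for instances known to be mislabeled, which is the indirect supervision the abstract refers to.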

arXiv:2405.19611

Nonthermal Acceleration of Electrons, Positrons and Protons at a Nonrelativistic Quasiparallel Collisionless Shock

Authors: Huan Yu, Qi Xia, Jun Fang

Abstract: Energetic positrons have been observed in the interstellar medium, and high-energy positrons with relativistic energies up to approximately 1 TeV have been detected in Galactic cosmic rays. We conducted a study on the acceleration of particles, specifically positrons, in a nonrelativistic quasiparallel collisionless shock induced by a plasma consisting of protons, electrons, and positrons. The positron-to-proton number density ratio in the plasma is 0.1. We focused on a representative shock with a sonic Mach number of 17.1 and an Alfvénic Mach number of 16.8 in the rest frame of the shock. To investigate the acceleration mechanisms of particles, including positrons, in the shock, we used one-dimensional particle-in-cell (PIC) simulations. We found that all three particle species in the shock can be accelerated and exhibit power-law spectra. At the shock front, a significant portion of incoming upstream particles is reflected and gains substantial energy, and these reflected particles can be efficiently injected into the process of diffusive shock acceleration (DSA). Moreover, the reflected positrons can be further accelerated by an electric field parallel to the magnetic field as they move along the magnetic field upstream of the shock. As a result, positrons can be preferentially accelerated and injected into the DSA process compared with electrons.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 7 pages, 7 figures, Accepted for publication in ApJ

arXiv:2405.19366

ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text

Authors: Han Yu, Peikun Guo, Akane Sano

Abstract: The use of deep learning in electrocardiogram (ECG) analysis has improved the accuracy and efficiency of cardiac healthcare diagnostics. By leveraging the capabilities of deep learning in semantic understanding, especially in feature extraction and representation learning, this study introduces a new multimodal contrastive pretraining framework that aims to improve the quality and robustness of learned representations of 12-lead ECG signals. Our framework comprises two key components: Cardio Query Assistant (CQA) and ECG Semantics Integrator (ESI). CQA integrates a retrieval-augmented generation (RAG) pipeline to leverage large language models (LLMs) and external medical knowledge to generate detailed textual descriptions of ECGs. The generated text is enriched with information about demographics and waveform patterns. ESI integrates both contrastive and captioning losses to pretrain ECG encoders for enhanced representations. We validate our approach through various downstream tasks, including arrhythmia detection and ECG-based subject identification. Our experimental results demonstrate substantial improvements over strong baselines in these tasks. These baselines encompass supervised and self-supervised learning methods, as well as prior multimodal pretraining approaches.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.19079

On the Influence of Smoothness Constraints in Computed Tomography Motion Compensation

Authors: Mareike Thies, Fabian Wagner, Noah Maul, Siyuan Mei, Mingxuan Gu, Laura Pfaff, Nastassia Vysotskaya, Haijun Yu, Andreas Maier

Abstract: Computed tomography (CT) relies on precise patient immobilization during image acquisition. Nevertheless, motion artifacts in the reconstructed images can persist. Motion compensation methods aim to correct such artifacts post-acquisition, often incorporating temporal smoothness constraints on the estimated motion patterns. This study analyzes how a spline-based motion model, used within an existing rigid motion compensation algorithm for cone-beam CT, influences the recoverable motion frequencies. Results demonstrate that the choice of motion model crucially influences which frequencies can be recovered. The optimization-based motion compensation algorithm is able to accurately fit the spline nodes for frequencies almost up to the node-dependent theoretical limit given by the Nyquist-Shannon theorem. Notably, a higher node count does not compromise reconstruction performance for slow motion patterns, but can extend the range of recoverable high frequencies for the investigated algorithm. Ultimately, the optimal motion model depends on the imaged anatomy, clinical use case, and scanning protocol, and should be tailored carefully to the expected motion frequency spectrum to ensure accurate motion compensation.

Submitted 29 May, 2024; originally announced May 2024.
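
The node-dependent limit mentioned above follows from treating spline nodes as samples of the motion trajectory: with $N$ nodes spread uniformly over a scan of duration $T$, the node spacing is $T/(N-1)$ and the highest representable frequency is $(N-1)/(2T)$. Uniform node placement, endpoint-inclusive spacing, and the example scan time are assumptions here; the paper's exact node parameterization may differ.

```python
import math


def nyquist_limit_hz(num_nodes: int, scan_time_s: float) -> float:
    """Highest motion frequency representable by uniformly spaced spline nodes."""
    if num_nodes < 2 or scan_time_s <= 0:
        raise ValueError("need at least two nodes and a positive scan time")
    spacing_s = scan_time_s / (num_nodes - 1)
    return 1.0 / (2.0 * spacing_s)


def min_nodes_for(freq_hz: float, scan_time_s: float) -> int:
    """Smallest node count whose Nyquist limit reaches freq_hz."""
    return math.ceil(2.0 * freq_hz * scan_time_s) + 1


print(nyquist_limit_hz(11, 10.0))  # 0.5 Hz for 11 nodes over a 10 s scan
print(min_nodes_for(0.5, 10.0))    # 11
```

Doubling the node count roughly doubles the recoverable bandwidth, consistent with the finding that more nodes extend the high-frequency range; the abstract reports accurate fits "almost up to" this bound rather than exactly at it.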

arXiv:2405.19046

Continual Collaborative Distillation for Recommender System

Authors: Gyuseok Lee, SeongKu Kang, Wonbin Kweon, Hwanjo Yu

Abstract: Knowledge distillation (KD) has emerged as a promising technique for addressing the computational challenges associated with deploying large-scale recommender systems. KD transfers knowledge from a massive teacher system to a compact student model, reducing the heavy computational burden of inference while retaining high accuracy. Existing KD studies primarily focus on one-time distillation in static environments, leaving a substantial gap in their applicability to real-world scenarios with continuously incoming users, items, and interactions. In this work, we develop a systematic approach to operating teacher-student KD in a non-stationary data stream. Our goal is to enable efficient deployment through a compact student that preserves the high performance of the massive teacher while effectively adapting to continuously incoming data. We propose the Continual Collaborative Distillation (CCD) framework, in which both the teacher and the student continually and collaboratively evolve along the data stream. CCD helps the student adapt effectively to new data, while also enabling the teacher to fully leverage accumulated knowledge. We validate the effectiveness of CCD through extensive quantitative, ablative, and exploratory experiments on two real-world datasets. We expect this research direction to help narrow the gap between existing KD studies and practical applications, thereby enhancing the applicability of KD in real-world systems.

Submitted 25 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: Accepted by KDD 2024 research track. 9 main pages + 1 appendix page, 5 figures

Showing 51–100 of 3,175 results for author: Yu, H