Search | arXiv e-print repository

LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments

Authors: Ruirui Chen, Weifeng Jiang, Chengwei Qin, Ishaan Singh Rawal, Cheston Tan, Dongkyu Choi, Bo Xiong, Bo Ai

Abstract: The rapid obsolescence of information in Large Language Models (LLMs) has driven the development of various techniques to incorporate new facts. However, existing methods for knowledge editing still face difficulties with multi-hop questions that require accurate fact identification and sequential logical reasoning, particularly among numerous fact updates. To tackle these challenges, this paper introduces Graph Memory-based Editing for Large Language Models (GMeLLo), a straightforward and effective method that merges the explicit knowledge representation of Knowledge Graphs (KGs) with the linguistic flexibility of LLMs. Beyond merely leveraging LLMs for question answering, GMeLLo employs these models to convert free-form language into structured queries and fact triples, facilitating seamless interaction with KGs for rapid updates and precise multi-hop reasoning. Our results show that GMeLLo significantly surpasses current state-of-the-art knowledge editing methods on the multi-hop question answering benchmark MQuAKE, especially in scenarios with extensive knowledge edits.

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.14917 [pdf, other]

PMSN: A Parallel Multi-compartment Spiking Neuron for Multi-scale Temporal Processing

Authors: Xinyi Chen, Jibin Wu, Chenxiang Ma, Yinsong Yan, Yujie Wu, Kay Chen Tan

Abstract: Spiking Neural Networks (SNNs) hold great potential to realize brain-inspired, energy-efficient computational systems. However, current SNNs still fall short in terms of multi-scale temporal processing compared to their biological counterparts. This limitation has resulted in poor performance in many pattern recognition tasks with information that varies across different timescales. To address this issue, we put forward a novel spiking neuron model called Parallel Multi-compartment Spiking Neuron (PMSN). The PMSN emulates biological neurons by incorporating multiple interacting substructures and allows for flexible adjustment of the substructure counts to effectively represent temporal information across diverse timescales. Additionally, to address the computational burden associated with the increased complexity of the proposed model, we introduce two parallelization techniques that decouple the temporal dependencies of neuronal updates, enabling parallelized training across different time steps. Our experimental results on a wide range of pattern recognition tasks demonstrate the superiority of PMSN. It outperforms other state-of-the-art spiking neuron models in terms of its temporal processing capacity, training speed, and computation cost. Specifically, compared with the commonly used Leaky Integrate-and-Fire neuron, PMSN offers a simulation acceleration of over 10$\times$ and a 30% improvement in accuracy on the Sequential CIFAR10 dataset, while maintaining comparable computational cost.

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2408.13987 [pdf, other]

Focused Large Language Models are Stable Many-Shot Learners

Authors: Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Heda Wang, Yao Hu, Kan Li

Abstract: In-Context Learning (ICL) enables large language models (LLMs) to achieve rapid task adaptation by learning from demonstrations. With the increase in available context length of LLMs, recent experiments have shown that the performance of ICL does not necessarily scale well in many-shot (demonstration) settings. We theoretically and experimentally confirm that the reason lies in more demonstrations dispersing the model's attention from the query, hindering its understanding of key content. Inspired by how humans learn from examples, we propose a training-free method, FocusICL, which conducts triviality filtering to avoid attention being diverted by unimportant content at the token level and applies hierarchical attention to further ensure sufficient attention to the current query at the demonstration level. We also design an efficient hyperparameter searching strategy for FocusICL based on model perplexity of demonstrations. Comprehensive experiments validate that FocusICL achieves an average performance improvement of 5.2% over vanilla ICL and scales well with many-shot demonstrations.

Submitted 25 August, 2024; originally announced August 2024.

Comments: 15 pages

arXiv:2408.11895 [pdf, other]

doi 10.3847/1538-4357/ad6a13

Contemporaneous X-ray Observations of 30 Bright Radio Bursts from the Prolific Fast Radio Burst Source FRB 20220912A

Authors: Amanda M. Cook, Paul Scholz, Aaron B. Pearlman, Thomas C. Abbott, Marilyn Cruces, B. M. Gaensler, Fengqiu Dong, Daniele Michilli, Gwendolyn Eadie, Victoria M. Kaspi, Ingrid Stairs, Chia Min Tan, Mohit Bhardwaj, Tomas Cassanelli, Alice P. Curtin, Adaeze L. Ibik, Mattias Lazda, Kiyoshi W. Masui, Ayush Pandhi, Masoud Rafiei-Ravandi, Mawson W. Sammons, Kaitlyn Shin, Kendrick Smith, David C. Stenning

Abstract: We present an extensive contemporaneous X-ray and radio campaign performed on the repeating fast radio burst (FRB) source FRB 20220912A for eight weeks immediately following the source's detection by CHIME/FRB. This includes X-ray data from XMM-Newton, NICER, and Swift, and radio detections of FRB 20220912A from CHIME/Pulsar and Effelsberg. We detect no significant X-ray emission at the time of 30 radio bursts with upper limits on $0.5-10.0$ keV X-ray fluence of $(1.5-14.5)\times 10^{-10}$ erg cm$^{-2}$ (99.7% credible interval, unabsorbed) on a timescale of 100 ms. Translated into a fluence ratio $\eta_{\text{x/r}} = F_{\text{X-ray}}/F_{\text{radio}}$, this corresponds to $\eta_{\text{x/r}} < 7\times10^{6}$. For persistent emission from the location of FRB 20220912A, we derive a 99.7% $0.5-10.0$ keV isotropic flux limit of $8.8\times 10^{-15}$ erg cm$^{-2}$ s$^{-1}$ (unabsorbed) or an isotropic luminosity limit of $1.4\times10^{41}$ erg s$^{-1}$ at a distance of 362.4 Mpc. We derive a hierarchical extension to the standard Bayesian treatment of low-count and background-contaminated X-ray data, which allows the robust combination of multiple observations. This methodology allows us to place the best (lowest) 99.7% credible interval upper limit on an FRB $\eta_{\text{x/r}}$ to date, $\eta_{\text{x/r}} < 2\times10^6$, assuming that all thirty detected radio bursts are associated with X-ray bursts with the same fluence ratio. If we instead adopt an X-ray spectrum similar to the X-ray burst observed contemporaneously with FRB-like emission from Galactic magnetar SGR 1935+2154 detected on 2020 April 28, we derive a 99.7% credible interval upper limit on $\eta_{\text{x/r}}$ of $8\times10^5$, which is only 3 times the observed value of $\eta_{\text{x/r}}$ for SGR 1935+2154.

Submitted 21 August, 2024; originally announced August 2024.

Comments: 23 pages, 3 figures. ApJ in press (accepted after resubmission July 19th, 2024)

arXiv:2408.11330 [pdf, other]

Design Principle Transfer in Neural Architecture Search via Large Language Models

Authors: Xun Zhou, Liang Feng, Xingyu Wu, Zhichao Lu, Kay Chen Tan

Abstract: Transferable neural architecture search (TNAS) has been introduced to design efficient neural architectures for multiple tasks, to enhance the practical applicability of NAS in real-world scenarios. In TNAS, architectural knowledge accumulated in previous search processes is reused to warm up the architecture search for new tasks. However, existing TNAS methods still search in an extensive search space, necessitating the evaluation of numerous architectures. To overcome this challenge, this work proposes a novel transfer paradigm, i.e., design principle transfer. In this work, linguistic descriptions of various structural components' effects on architectural performance are termed design principles. They are learned from established architectures and can then be reused to reduce the search space by discarding unpromising architectures. Searching in the refined search space can boost both the search performance and efficiency for new NAS tasks. To this end, a large language model (LLM)-assisted design principle transfer (LAPT) framework is devised. In LAPT, an LLM is applied to automatically reason the design principles from a set of given architectures, and then a principle adaptation method is applied to refine these principles progressively based on the new search results. Experimental results show that LAPT can beat the state-of-the-art TNAS methods on most tasks and achieve comparable performance on others.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.10316 [pdf, other]

Project Dinos II: Redshift evolution of dark and luminous matter density profiles in strong-lensing elliptical galaxies across $0.1 < z < 0.9$

Authors: William Sheu, Anowar J. Shajib, Tommaso Treu, Alessandro Sonnenfeld, Simon Birrer, Michele Cappellari, Lindsay J. Oldham, Chin Yi Tan

Abstract: We present a new measurement of the dark and luminous matter distribution of massive elliptical galaxies, and their evolution with redshift, by combining strong lensing and dynamical observables. Our sample of 58 lens galaxies covers a redshift range of $0.090\leq z_{\rm l}\leq0.884$. By combining new Hubble Space Telescope imaging with previously observed velocity dispersion and line-of-sight measurements, we decompose the luminous matter profile from the dark matter profile and perform a Bayesian hierarchical analysis to constrain the population-level properties of both profiles. We find that the inner slope of the dark matter density profile ("cusp"; $\rho_{\rm DM}\propto r^{-\gamma_{\rm in}}$) is slightly steeper ($\mu_{\gamma_{\rm in}}=1.18^{+0.03}_{-0.03}$ at $z=0.35$ with $\leq0.16$ intrinsic scatter) than a standard Navarro-Frenk-White (NFW; $\gamma_{\rm in}=1$), with an appreciable evolution with redshift ($d\log(\gamma_{\rm in})/dz=-0.33\pm0.13$) and is consistent with NFW-like distributions at higher redshifts ($z\geq0.56$ for $\leq1\sigma$ consistency). Additionally, we find the stellar mass-to-light ratio at the population level consistent with that of a Salpeter initial mass function, a small stellar mass-to-light gradient ($\kappa_{*}(r)\propto r^{-\eta}$, with $\overline{\eta}\leq9.4\times10^{-3}$), and isotropic stellar orbits. Our averaged total mass density profile is consistent with a power-law profile within $0.25-4$ Einstein radii ($\overline{\gamma}=2.14\pm0.06$), with an internal mass-sheet transformation parameter $\overline{\lambda}=1.02\pm0.01$ consistent with no mass sheet. Our findings confirm the validity of the standard mass models used for time-delay cosmography. However, our results are in strong tension with predictions from hydrodynamical simulations such as IllustrisTNG, highlighting the need to better understand the formation of massive galaxies.

Submitted 19 August, 2024; originally announced August 2024.

Comments: 28 pages, 20 figures

arXiv:2408.10287 [pdf]

Recognizing Beam Profiles from Silicon Photonics Gratings using Transformer Model

Authors: Yu Dian Lim, Hong Yu Li, Simon Chun Kiat Goh, Xiangyu Wang, Peng Zhao, Chuan Seng Tan

Abstract: Over the past decade, there has been extensive work in the ion trap quantum computing community on developing integrated silicon photonics (SiPh) gratings for the optical addressing of trapped ion qubits. However, when viewing beam profiles from infrared (IR) cameras, it is often difficult to determine the heights at which the beam profiles are located. In this work, we developed transformer models to recognize the height categories of beam profiles of light from SiPh gratings. The model is trained using two techniques: (1) input patches, and (2) input sequence. The model trained with input patches achieved a recognition accuracy of 0.938, while the model trained with input sequence showed a lower accuracy of 0.895. However, when the model training was repeated for 150 cycles, the model trained with input patches showed inconsistent accuracy, ranging from 0.445 to 0.959, while the model trained with input sequence exhibited higher accuracy values, between 0.789 and 0.936. The obtained outcomes can be extended to various applications, including auto-focusing of light beams and auto-adjustment of the z-axis stage to acquire desired beam profiles.

Submitted 22 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

arXiv:2408.09691 [pdf, ps, other]

Regularity of Fourier integrals on product spaces

Authors: Chaoqiang Tan, Zipeng Wang

Abstract: We study a family of Fourier integral operators by allowing their symbols to satisfy a multi-parameter differential inequality on R^N. We show that these operators of order -(N-1)/2 are bounded from classical, atom decomposable H^1-Hardy space to L^1(R^N). Consequently, we obtain a sharp L^p-regularity result due to Seeger, Sogge and Stein.

Submitted 28 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

Comments: We corrected some typos and errors from the previous version

arXiv:2408.09647 [pdf, other]

C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection

Authors: Chuangchuang Tan, Renshuai Tao, Huan Liu, Guanghua Gu, Baoyuan Wu, Yao Zhao, Yunchao Wei

Abstract: This work focuses on AIGC detection to develop universal detectors capable of identifying various types of forgery images. Recent studies have found that large pre-trained models, such as CLIP, are effective for generalizable deepfake detection along with linear classifiers. However, two critical issues remain unresolved: 1) understanding why CLIP features are effective on deepfake detection through a linear classifier; and 2) exploring the detection potential of CLIP. In this study, we delve into the underlying mechanisms of CLIP's detection capabilities by decoding its detection features into text and performing word frequency analysis. Our findings indicate that CLIP detects deepfakes by recognizing similar concepts (Fig. \ref{fig:fig1} a). Building on this insight, we introduce Category Common Prompt CLIP, called C2P-CLIP, which integrates the category common prompt into the text encoder to inject category-related concepts into the image encoder, thereby enhancing detection performance (Fig. \ref{fig:fig1} b). Our method achieves a 12.41% improvement in detection accuracy compared to the original CLIP, without introducing additional parameters during testing. Comprehensive experiments conducted on two widely-used datasets, encompassing 20 generation models, validate the efficacy of the proposed method, demonstrating state-of-the-art performance. The code is available at https://github.com/chuangchuangtan/C2P-CLIP-DeepfakeDetection.

Submitted 18 August, 2024; originally announced August 2024.

Comments: 10 pages, 5 figures

arXiv:2408.08044 [pdf, other]

Crystalline Material Discovery in the Era of Artificial Intelligence

Authors: Zhenzhong Wang, Haowei Hua, Wanyu Lin, Ming Yang, Kay Chen Tan

Abstract: Crystalline materials, with their symmetrical and periodic structures, possess a diverse array of properties and have been widely used in various fields, ranging from electronic devices to energy applications. To discover crystalline materials, traditional experimental and computational approaches are often time-consuming and expensive. In recent years, thanks to the rapid growth of crystalline materials data, great interest has been given to data-driven materials discovery. In particular, recent advancements have exploited the expressive representation ability of deep learning to model the highly complex atomic systems within crystalline materials, opening up new avenues for fast and accurate materials discovery. These works typically focus on four types of tasks: physicochemical property prediction, crystalline material synthesis, aiding characterization, and accelerating theoretical computations. Despite the remarkable progress, there is still a lack of systematic research to summarize their correlations, distinctions, and limitations. To fill this gap, we systematically investigate the progress made in deep learning-based material discovery in recent years. We first introduce several data representations of the crystalline materials. Based on the representations, we summarize various fundamental deep learning models and their tailored usages in material discovery tasks. We also point out the remaining challenges and propose several future directions. This review offers comprehensive and valuable insights, and fosters progress in the intersection of artificial intelligence and material science.

Submitted 23 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.07176 [pdf, other]

Surrogate-Assisted Search with Competitive Knowledge Transfer for Expensive Optimization

Authors: Xiaoming Xue, Yao Hu, Liang Feng, Kai Zhang, Linqi Song, Kay Chen Tan

Abstract: Expensive optimization problems (EOPs) have attracted increasing research attention over the decades due to their ubiquity in a variety of practical applications. Despite many sophisticated surrogate-assisted evolutionary algorithms (SAEAs) that have been developed for solving such problems, most of them lack the ability to transfer knowledge from previously-solved tasks and always start their search from scratch, making them troubled by the notorious cold-start issue. A few preliminary studies that integrate transfer learning into SAEAs still face some issues, such as defective similarity quantification that is prone to underestimate promising knowledge, surrogate-dependency that makes the transfer methods not coherent with the state-of-the-art in SAEAs, etc. In light of the above, a plug and play competitive knowledge transfer method is proposed to boost various SAEAs in this paper. Specifically, both the optimized solutions from the source tasks and the promising solutions acquired by the target surrogate are treated as task-solving knowledge, enabling them to compete with each other to elect the winner for expensive evaluation, thus boosting the search speed on the target task. Moreover, the lower bound of the convergence gain brought by the knowledge competition is mathematically analyzed, which is expected to strengthen the theoretical foundation of sequential transfer optimization. Experimental studies conducted on a series of benchmark problems and a practical application from the petroleum industry verify the efficacy of the proposed method. The source code of the competitive knowledge transfer is available at https://github.com/XmingHsueh/SAS-CKT.

Submitted 20 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

Comments: 22 pages, 14 figures

arXiv:2408.06351 [pdf, other]

A Probabilistic Approach for Queue Length Estimation Using License Plate Recognition Data: Considering Overtaking in Multi-lane Scenarios

Authors: Lyuzhou Luo, Hao Wu, Jiahao Liu, Keshuang Tang, Chaopeng Tan

Abstract: Multi-section license plate recognition (LPR) data provides input-output information and sampled travel times of the investigated link, serving as an ideal data source for lane-based queue length estimation in recent studies. However, most of these studies assumed the strict FIFO rule or a specific arrival process, thus ignoring the potential impact of overtaking and the variation of traffic flows, especially in multi-lane scenarios. To address this issue, we propose a probabilistic approach to derive the stochastic queue length by constructing a conditional probability model of no-delay arrival time (NAT), i.e., the arrival time of vehicles without experiencing any delay, based on multi-section LPR data. First, the NAT conditions for all vehicles are established based on upstream and downstream vehicle departure times and sequences. To reduce the computational dimensionality and complexity, a DP-based algorithm is developed for vehicle group partitioning based on potential interactions between vehicles. Then, the conditional probability of NATs of each vehicle group is derived and an MCMC sampling method is employed for calculation. Subsequently, the stochastic queue profile and maximum queue length for each cycle can be derived based on the NATs of vehicles. Eventually, to leverage the LPR data sufficiently, we extend our approach to multi-lane scenarios, where the problem can be converted to a weighted general exact coverage problem and solved by a backtracking algorithm with heuristics. Empirical and simulation experiments have shown that the proposed approach outperforms the state-of-the-art method, demonstrating significant improvements in accuracy and robustness across various traffic conditions, including different V/C ratios, matching rates, and FIFO violation rates. In addition, the performance of the proposed approach can be further improved by utilizing multi-lane LPR data.

Submitted 24 July, 2024; originally announced August 2024.

Comments: 30 pages, 20 figures

arXiv:2408.03506 [pdf, ps, other]

1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data

Authors: Calvin Tan, Jerome Wang

Abstract: This paper presents a compute-efficient approach to pre-training a Language Model, the "1.5-Pints", in only 9 days, while outperforming state-of-the-art models as an instruction-following assistant. Based on MT-Bench (a benchmark that emulates human judgments), 1.5-Pints outperforms Apple's OpenELM and Microsoft's Phi. This is achieved by a carefully curated pre-training dataset of 57 billion tokens, using a mix of automated workflows and manual human review. The selection of the dataset prioritizes content that is considered expository and "textbook-like" to aid the model in reasoning and logical deduction, culminating in its overall ability as a strong and versatile AI model. In terms of the model architecture, we employed a modified Mistral tokenizer, alongside a Llama-2 architecture for wider compatibility. For training, we adopted the methodologies used by StableLM, TinyLlama, and Huggingface Zephyr. 1.5-Pints demonstrates that by focusing on data quality over quantity in LLM training, we can significantly reduce training time and resources required. We believe this approach will not only make pre-training more accessible but also reduce our carbon footprint. Our findings and resources from this research are open-sourced, aiming to facilitate further advancements in the field. The 1.5-Pints model is available in two versions: 2K and 16K context windows.

Submitted 6 August, 2024; originally announced August 2024.

Comments: Technical Report for 1.5-Pints

arXiv:2408.03211 [pdf, ps, other]

Boundedness of New Type Fourier Integral Operators with Product Structure

Authors: Chaoqiang Tan, Zipeng Wang

Abstract: We investigate a class of Fourier integral operators with weakened symbols, which satisfy a multi-parameter differential inequality in $\mathbb{R}^n$. We establish that these operators retain the classical $L^p$ boundedness and the $H^1$ to $L^1$ boundedness. Notably, the Hardy space considered here is the traditional single-parameter Hardy space rather than a product Hardy space.

Submitted 6 August, 2024; originally announced August 2024.

MSC Class: Primary 42B20; Secondary 42B30; 42B37; 42B15

arXiv:2408.01735 [pdf, other]

Something from Nothing: A Theoretical Framework for Enhancing or Enabling Cooling of a Mechanical Resonator via the anti-Stokes or Stokes Interaction and Zero-Photon Detection

Authors: Jack Clarke, Evan A. Cryer-Jenkins, Arjun Gupta, Kyle D. Major, Jinglei Zhang, Georg Enzian, Magdalena Szczykulska, Anthony C. Leung, Harsh Rathee, Andreas Ø. Svela, Anthony K. C. Tan, Almut Beige, Klaus Mølmer, Michael R. Vanner

Abstract: We develop a theoretical framework to describe how zero-photon detection may be utilized to enhance laser cooling via the anti-Stokes interaction and, somewhat surprisingly, enable cooling via the Stokes interaction commonly associated with heating. Our description includes both pulsed and continuous measurements as well as optical detection efficiency and open-system dynamics. For both cases, we discuss how the cooling depends on the system parameters such as detection efficiency and optomechanical cooperativity, and we study the continuous-measurement-induced dynamics, contrasting to single-photon detection events. For the Stokes case, we explore the interplay between cooling and heating via optomechanical parametric amplification, and we find the efficiency required to cool a mechanical oscillator via zero-photon detection. This work serves as a companion article to the recent experiment [E. A. Cryer-Jenkins, K. D. Major, et al., arXiv:2408.01734 (2024)], which demonstrated enhanced laser cooling of a mechanical oscillator via zero-photon detection on the anti-Stokes signal. The framework developed here provides new approaches for cooling mechanical resonators that can be applied to a wide range of areas including nonclassical state preparation, quantum thermodynamics, and avoiding the often unwanted heating effects of parametric amplification.

Submitted 6 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

Comments: 15 pages, 6 figures

arXiv:2408.01734 [pdf, other]

Something from Nothing: Enhanced Laser Cooling of a Mechanical Resonator via Zero-Photon Detection

Authors: Evan A. Cryer-Jenkins, Kyle D. Major, Jack Clarke, Georg Enzian, Magdalena Szczykulska, Jinglei Zhang, Arjun Gupta, Anthony C. Leung, Harsh Rathee, Andreas Ø. Svela, Anthony K. C. Tan, Almut Beige, Klaus Mølmer, Michael R. Vanner

Abstract: Throughout quantum science and technology, measurement is used as a powerful resource for nonlinear operations and quantum state engineering. In particular, single-photon detection is commonly employed for quantum-information applications and tests of fundamental physics. By contrast, and perhaps counter-intuitively, measurement of the absence of photons also provides useful information, and offers significant potential for a wide range of new experimental directions. Here, we propose and experimentally demonstrate cooling of a mechanical resonator below its laser-cooled mechanical occupation via zero-photon detection on the anti-Stokes scattered optical field and verify this cooling through heterodyne measurements. Our measurements are well captured by a stochastic master equation and the techniques introduced here open new avenues for cooling, quantum thermodynamics, quantum state engineering, and quantum measurement and control.

Submitted 6 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

Comments: Main: 5 pages, 2 figures. Supplemental: 6 pages, 2 figures

arXiv:2408.01669 [pdf, other]

SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses

Authors: Chaolei Tan, Zihang Lin, Junfu Pu, Zhongang Qi, Wei-Yi Pei, Zhi Qu, Yexin Wang, Ying Shan, Wei-Shi Zheng, Jian-Fang Hu

Abstract: Video grounding is a fundamental problem in multimodal content understanding, aiming to localize specific natural language queries in an untrimmed video. However, current video grounding datasets merely focus on simple events and are either limited to shorter videos or brief sentences, which hinders the model from evolving toward stronger multimodal understanding capabilities. To address these limitations, we present a large-scale video grounding dataset named SynopGround, in which more than 2800 hours of videos are sourced from popular TV dramas and are paired with accurately localized human-written synopses. Each paragraph in the synopsis serves as a language query and is manually annotated with precise temporal boundaries in the long video. These paragraph queries are tightly correlated to each other and contain a wealth of abstract expressions summarizing video storylines and specific descriptions portraying event details, which enables the model to learn multimodal perception on more intricate concepts over longer context dependencies. Based on the dataset, we further introduce a more complex setting of video grounding dubbed Multi-Paragraph Video Grounding (MPVG), which takes as input multiple paragraphs and a long video for grounding each paragraph query to its temporal interval. In addition, we propose a novel Local-Global Multimodal Reasoner (LGMR) to explicitly model the local-global structures of long-term multimodal inputs for MPVG. Our method provides an effective baseline solution to the multi-paragraph video grounding problem. Extensive experiments verify the proposed model's effectiveness as well as its superiority in long-term multi-paragraph video grounding over prior state-of-the-art methods. Dataset and code are publicly available. Project page: https://synopground.github.io/.

Submitted 18 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

Comments: Accepted to ACM MM 2024. Project page: https://synopground.github.io/

arXiv:2408.01551 [pdf, other]

PiCoGen2: Piano cover generation with transfer learning approach and weakly aligned data

Authors: Chih-Pin Tan, Hsin Ai, Yi-Hsin Chang, Shuen-Huei Guan, Yi-Hsuan Yang

Abstract: Piano cover generation aims to create a piano cover from a pop song. Existing approaches mainly employ supervised learning and the training demands strongly-aligned and paired song-to-piano data, which is built by remapping piano notes to song audio. This would, however, result in the loss of piano information and accordingly cause inconsistencies between the original and remapped piano versions. To overcome this limitation, we propose a transfer learning approach that pre-trains our model on piano-only data and fine-tunes it on weakly-aligned paired data constructed without note remapping. During pre-training, to guide the model to learn piano composition concepts instead of merely transcribing audio, we use an existing lead sheet transcription model as the encoder to extract high-level features from the piano recordings. The pre-trained model is then fine-tuned on the paired song-piano data to transfer the learned composition knowledge to the pop song domain. Our evaluation shows that this training strategy enables our model, named PiCoGen2, to attain high-quality results, outperforming baselines on both objective and subjective metrics across five pop genres.

Submitted 2 August, 2024; originally announced August 2024.

Comments: Accepted at the 25th International Society for Music Information Retrieval Conference (ISMIR), 2024

arXiv:2408.00865 [pdf, other]

A Pride of Satellites in the Constellation Leo? Discovery of the Leo VI Milky Way Satellite Galaxy with DELVE Early Data Release 3

Authors: C. Y. Tan, W. Cerny, A. Drlica-Wagner, A. B. Pace, M. Geha, A. P. Ji, T. S. Li, M. Adamów, D. Anbajagane, C. R. Bom, J. A. Carballo-Bello, J. L. Carlin, C. Chang, Y. Choi, M. L. M. Collins, A. Doliva-Dolinsky, P. S. Ferguson, R. A. Gruendl, D. J. James, G. Limberg, M. Navabi, D. Martínez-Delgado, C. E. Martínez-Vázquez, G. E. Medina, B. Mutlu-Pakdil , et al. (9 additional authors not shown)

Abstract: We report the discovery and spectroscopic confirmation of an ultra-faint Milky Way (MW) satellite in the constellation of Leo. This system was discovered as a spatial overdensity of resolved stars observed with Dark Energy Camera (DECam) data from an early version of the third data release of the DECam Local Volume Exploration survey (DELVE EDR3). The low luminosity ($M_V = -3.56_{-0.37}^{+0.47}$; $L_V = 2300_{-800}^{+1000} L_\odot$), large size ($r_{1/2} = 90_{-30}^{+30}$ pc), and large heliocentric distance ($D = 111_{-4}^{+7}$ kpc) are all consistent with the population of ultra-faint dwarf galaxies (UFDs). Using Keck/DEIMOS observations of the system, we were able to spectroscopically confirm 11 member stars, while measuring a mass to light ratio of $1000_{-700}^{+1900} M_\odot/L_\odot$ and a non-zero metallicity dispersion of $σ_{[\rm Fe/H]}=0.33_{-0.14}^{+0.19}$, further confirming Leo VI's identity as a UFD. While the system has a highly elliptical shape, $ε= 0.54_{-0.29}^{+0.19}$, we do not find any evidence that it is tidally disrupting. Moreover, despite its apparent on-sky proximity to members of the proposed Crater-Leo infall group, its relatively lower heliocentric distance and inconsistent position in energy-angular momentum space with the other group members make it unlikely for it to be part of the proposed infall group.

Submitted 1 August, 2024; originally announced August 2024.

Comments: 21 pages, 11 figures, 2 tables; to be submitted to AAS Journals

Report number: FERMILAB-PUB-24-0358-LDRD-PPD

arXiv:2407.21713 [pdf, other]

Social Learning through Interactions with Other Agents: A Survey

Authors: Dylan Hillier, Cheston Tan, Jing Jiang

Abstract: Social learning plays an important role in the development of human intelligence. As children, we imitate our parents' speech patterns until we are able to produce sounds; we learn from them praising us and scolding us; and as adults, we learn by working with others. In this work, we survey the degree to which this paradigm -- social learning -- has been mirrored in machine learning. In particular, since learning socially requires interacting with others, we are interested in how embodied agents can and have utilised these techniques. This is especially in light of the degree to which recent advances in natural language processing (NLP) enable us to perform new forms of social learning. We look at how behavioural cloning and next-token prediction mirror human imitation, how learning from human feedback mirrors human education, and how we can go further to enable fully communicative agents that learn from each other. We find that while individual social learning techniques have been used successfully, there has been little unifying work showing how to bring them together into socially embodied agents.

Submitted 3 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

Comments: To be published in IJCAI 2024, available on http://www.ijcai.org

ACM Class: I.2.7; I.2.0

arXiv:2407.21242 [pdf, other]

Supervised brain node and network construction under voxel-level functional imaging

Authors: Wanwan Xu, Selena Wang, Chichun Tan, Xilin Shen, Wenjing Luo, Todd Constable, Tianxi Li, Yize Zhao

Abstract: Recent advancements in understanding the brain's functional organization related to behavior have been pivotal, particularly in the development of predictive models based on brain connectivity. Traditional methods in this domain often involve a two-step process by first constructing a connectivity matrix from predefined brain regions, and then linking these connections to behaviors or clinical outcomes. However, these approaches with unsupervised node partitions predict outcomes inefficiently with independently established connectivity. In this paper, we introduce the Supervised Brain Parcellation (SBP), a brain node parcellation scheme informed by the downstream predictive task. With voxel-level functional time courses generated under resting-state or cognitive tasks as input, our approach clusters voxels into nodes in a manner that maximizes the correlation between inter-node connections and the behavioral outcome, while also accommodating intra-node homogeneity. We rigorously evaluate the SBP approach using resting-state and task-based fMRI data from both the Adolescent Brain Cognitive Development (ABCD) study and the Human Connectome Project (HCP). Our analyses show that SBP significantly improves out-of-sample connectome-based predictive performance compared to conventional step-wise methods under various brain atlases. This advancement holds promise for enhancing our understanding of brain functional architectures with behavior and establishing more informative network neuromarkers for clinical applications.

Submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.20883 [pdf, other]

doi 10.1145/3652583.3657626

PiCoGen: Generate Piano Covers with a Two-stage Approach

Authors: Chih-Pin Tan, Shuen-Huei Guan, Yi-Hsuan Yang

Abstract: Cover song generation stands out as a popular way of music making in the music-creative community. In this study, we introduce Piano Cover Generation (PiCoGen), a two-stage approach for automatic cover song generation that transcribes the melody line and chord progression of a song given its audio recording, and then uses the resulting lead sheet as the condition to generate a piano cover in the symbolic domain. This approach is advantageous in that it does not require paired data of covers and their original songs for training. Compared to an existing approach that demands such paired data, our evaluation shows that PiCoGen demonstrates competitive or even superior performance across songs of different musical genres.

Submitted 30 July, 2024; originally announced July 2024.

Comments: Published at ICMR 2024 (project page: https://tanchihpin0517.github.io/PiCoGen/)

arXiv:2407.17448 [pdf, ps, other]

Chiral-even twist-3 GPDs for the proton in a spectator diquark model

Authors: Chentao Tan, Zhun Lu

Abstract: We investigate the chiral-even twist-3 generalized parton distributions (GPDs) of valence quarks in the proton at nonzero skewness $ξ$, using a spectator model with scalar and axial-vector diquarks. We consider the exponential form factor for the nucleon-quark-diquark vertex and the axial-vector diquark with light-cone transverse polarization. We analyze the dependence of GPDs on the longitudinal momentum fraction $x$ at different $ξ$, and on the square of the transverse momentum transfer $Δ^2_T$ at different $x$. Our numerical results reveal distinct discontinuities in all twist-3 GPDs except $G_1$ and $\tilde{G}_1$. By taking the forward limit, we obtain the twist-3 parton distribution function $g_T$, which encodes the transverse spin distribution of quarks. We also compare the kinetic orbital angular momentum and the spin-orbit correlations of quarks defined by the twist-2 and twist-3 GPDs, respectively.

Submitted 24 July, 2024; originally announced July 2024.

Comments: 18 pages, 8 figures

arXiv:2407.16148 [pdf, other]

CHIME: LLM-Assisted Hierarchical Organization of Scientific Studies for Literature Review Support

Authors: Chao-Chun Hsu, Erin Bransom, Jenna Sparks, Bailey Kuehl, Chenhao Tan, David Wadden, Lucy Lu Wang, Aakanksha Naik

Abstract: Literature review requires researchers to synthesize a large amount of information and is increasingly challenging as the scientific literature expands. In this work, we investigate the potential of LLMs for producing hierarchical organizations of scientific studies to assist researchers with literature review. We define hierarchical organizations as tree structures where nodes refer to topical categories and every node is linked to the studies assigned to that category. Our naive LLM-based pipeline for hierarchy generation from a set of studies produces promising yet imperfect hierarchies, motivating us to collect CHIME, an expert-curated dataset for this task focused on biomedicine. Given the challenging and time-consuming nature of building hierarchies from scratch, we use a human-in-the-loop process in which experts correct errors (both links between categories and study assignment) in LLM-generated hierarchies. CHIME contains 2,174 LLM-generated hierarchies covering 472 topics, and expert-corrected hierarchies for a subset of 100 topics. Expert corrections allow us to quantify LLM performance, and we find that while they are quite good at generating and organizing categories, their assignment of studies to categories could be improved. We attempt to train a corrector model with human feedback which improves study assignment by 12.6 F1 points. We release our dataset and models to encourage research on developing better assistive tools for literature review.

Submitted 22 July, 2024; originally announced July 2024.

Comments: 2024 ACL Findings

arXiv:2407.15734 [pdf, other]

TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON

Authors: John Chong Min Tan, Prince Saroj, Bharat Runwal, Hardik Maheshwari, Brian Lim Yi Sheng, Richard Cottrill, Alankrit Chona, Ambuj Kumar, Mehul Motani

Abstract: TaskGen is an open-sourced agentic framework which uses an Agent to solve an arbitrary task by breaking it down into subtasks. Each subtask is mapped to an Equipped Function or another Agent to execute. In order to reduce verbosity (and hence token usage), TaskGen uses StrictJSON, which ensures JSON output from the Large Language Model (LLM), along with additional features such as type checking and iterative error correction. Key to the philosophy of TaskGen is the management of information/memory on a need-to-know basis. We empirically evaluate TaskGen on various environments such as 40x40 dynamic maze navigation with changing obstacle locations (100% solve rate), TextWorld escape room solving with dense rewards and detailed goals (96% solve rate), web browsing (69% of actions successful), solving the MATH dataset (71% solve rate over 100 Level-5 problems), and Retrieval Augmented Generation on the NaturalQuestions dataset (F1 score of 47.03%).

Submitted 22 July, 2024; originally announced July 2024.

Comments: 53 pages

arXiv:2407.12176 [pdf, other]

GPT-4V Cannot Generate Radiology Reports Yet

Authors: Yuyang Jiang, Chacha Chen, Dang Nguyen, Benjamin M. Mervak, Chenhao Tan

Abstract: GPT-4V's purported strong multimodal abilities raise interest in using it to automate radiology report writing, but thorough evaluations are lacking. In this work, we perform a systematic evaluation of GPT-4V in generating radiology reports on two chest X-ray report datasets: MIMIC-CXR and IU X-Ray. We attempt to directly generate reports using GPT-4V through different prompting strategies and find that it fails terribly in both lexical metrics and clinical efficacy metrics. To understand the low performance, we decompose the task into two steps: 1) the medical image reasoning step of predicting medical condition labels from images; and 2) the report synthesis step of generating reports from (groundtruth) conditions. We show that GPT-4V's performance in image reasoning is consistently low across different prompts. In fact, the distributions of model-predicted labels remain constant regardless of which groundtruth conditions are present on the image, suggesting that the model is not interpreting chest X-rays meaningfully. Even when given groundtruth conditions in report synthesis, its generated reports are less correct and less natural-sounding than a finetuned LLaMA-2. Altogether, our findings cast doubt on the viability of using GPT-4V in a radiology workflow.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 24 pages, 3 figures, code: https://github.com/YuyangJ0/GPT-4V-evaluation-radiology-report

arXiv:2407.11845 [pdf, other]

Asymmetric Kinematics in Young Clusters: The λ Ori Cluster

Authors: Joseph J. Armstrong, Jonathan C. Tan

Abstract: Context. Most stars form in clusters or associations but only a small number of these groups are expected to remain bound for longer than a few Myr. Once star formation has ended and the molecular gas around young stellar objects has been expelled via feedback processes, most initially bound young clusters lose the majority of their binding mass and begin to disperse into the Galactic field. Aims. This process can be investigated by analysing the structure and kinematic trends in nearby young clusters, particularly expansion, the tell-tale sign that a cluster is no longer gravitationally bound but is dispersing into the field. Methods. We combine Gaia DR3 5-parameter astrometry with calibrated radial velocities for members of the nearby young cluster λ Ori (Collinder 69). Results. We characterise the plane-of-sky substructure of the cluster using the Q-parameter and Angular Dispersion parameter. We find evidence that the cluster contains significant substructure, but that this is preferentially located away from the central cluster core, which is smooth and likely remains bound. We find strong evidence for expansion in λ Ori in the plane-of-sky using a number of metrics, but also that the trends are asymmetric at the 5σ significance level, with the maximum rate of expansion being directed nearly parallel to the Galactic plane. We then invert the maximum rate of expansion of $0.144^{+0.003}_{-0.003}$ km s$^{-1}$ pc$^{-1}$ to give an expansion timescale of $6.944^{+0.148}_{-0.142}$ Myr, which is slightly larger than typical literature age estimates for the cluster. We also find asymmetry in the velocity dispersion, potential signatures of cluster rotation, and calculate kinematic ages for individual cluster members by tracing their motion back in time to their closest approach to the cluster center.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 20 pages, 17 figures, submitted to A&A

arXiv:2407.10058 [pdf, other]

Learning to Refuse: Towards Mitigating Privacy Risks in LLMs

Authors: Zhenhua Liu, Tong Zhu, Chuanyuan Tan, Wenliang Chen

Abstract: Large language models (LLMs) exhibit remarkable capabilities in understanding and generating natural language. However, these models can inadvertently memorize private information, posing significant privacy risks. This study addresses the challenge of enabling LLMs to protect specific individuals' private data without the need for complete retraining. We propose RETURN, a Real-world pErsonal daTa UnleaRNing dataset, comprising 2,492 individuals from Wikipedia with associated QA pairs, to evaluate machine unlearning (MU) methods for protecting personal data in a realistic scenario. Additionally, we introduce the Name-Aware Unlearning Framework (NAUF) for Privacy Protection, which enables the model to learn which individuals' information should be protected without affecting its ability to answer questions related to other unrelated individuals. Our extensive experiments demonstrate that NAUF achieves a state-of-the-art average unlearning score, surpassing the best baseline method by 5.65 points, effectively protecting target individuals' personal data while maintaining the model's general capabilities.

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.09949 [pdf, other]

The formation of supermassive black holes from Population III.1 seeds. III. Galaxy evolution and black hole growth from semi-analytic modelling

Authors: Vieri Cammelli, Pierluigi Monaco, Jonathan C. Tan, Jasbir Singh, Fabio Fontanot, Gabriella De Lucia, Michaela Hirschmann, Lizhi Xie

Abstract: We present an implementation of Pop III.1 seeding of supermassive black holes (SMBHs) in a theoretical model of galaxy formation and evolution to assess the growth of the SMBH population and the properties of the host galaxies. The model of Pop III.1 seeding involves SMBH formation at redshifts $z\gtrsim 20$ in dark matter minihalos that are isolated from external radiative feedback, parameterized by isolation distance $d_{\rm iso}$. Within a standard $Λ$CDM cosmology, we generate dark matter halos using the code \textsc{pinocchio} and seed them according to the Pop III.1 scenario, exploring values of $d_{\rm iso}$ from 50 to 100~kpc (proper distance). We consider two alternative cases of SMBH seeding: a Halo Mass Threshold (HMT) model in which all halos $>7\times10^{10}\:M_\odot$ are seeded with $\sim 10^5\:M_\odot$ black holes; an All Light Seed (ALS) model in which all halos are seeded with low, stellar-mass black holes. We follow the redshift evolution of the halos, populating them with galaxies using the GAlaxy Evolution and Assembly theoretical model of galaxy formation, including accretion on SMBHs and related feedback processes. Here we present predictions for the properties of galaxy populations, focusing on stellar masses, star formation rates, and black hole masses. The local, $z\sim0$ metrics of occupation fraction as a function of the galaxy stellar mass, galaxy stellar mass function (GSMF), and black hole mass function (BHMF) all suggest a constraint of $d_{\rm iso}<75\:$kpc. We discuss the implications of this result for the Pop III.1 seeding mechanism.

Submitted 13 July, 2024; originally announced July 2024.

Comments: Submitted to MNRAS, comments welcome

arXiv:2407.09045 [pdf, other]

Time-Frequency Analysis of Variable-Length WiFi CSI Signals for Person Re-Identification

Authors: Chen Mao, Chong Tan, Jingqi Hu, Min Zheng

Abstract: Person re-identification (ReID), as a crucial technology in the field of security, plays an important role in security detection and people counting. Current security and monitoring systems largely rely on visual information, which may infringe on personal privacy and be susceptible to interference from pedestrian appearances and clothing in certain scenarios. Meanwhile, the widespread use of routers offers new possibilities for ReID. This letter introduces a method using WiFi Channel State Information (CSI), leveraging the multipath propagation characteristics of WiFi signals as a basis for distinguishing different pedestrian features. We propose a two-stream network structure capable of processing variable-length data, which analyzes the amplitude in the time domain and the phase in the frequency domain of WiFi signals, fuses time-frequency information through continuous lateral connections, and employs advanced objective functions for representation and metric learning. Tested on a dataset collected in the real world, our method achieves 93.68% mAP and 98.13% Rank-1.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.07480 [pdf, other]

The discovery of a nearby 421~s transient with CHIME/FRB/Pulsar

Authors: Fengqiu Adam Dong, Tracy Clarke, Alice P. Curtin, Ajay Kumar, Ingrid Stairs, Shami Chatterjee, Amanda M. Cook, Emmanuel Fonseca, B. M. Gaensler, Jason W. T. Hessels, Victoria M. Kaspi, Mattias Lazda, Kiyoshi W. Masui, James W. McKee, Bradley W. Meyers, Aaron B. Pearlman, Scott M. Ransom, Paul Scholz, Kaitlyn Shin, Kendrick M. Smith, Chia Min Tan

Abstract: Neutron stars and white dwarfs are both dense remnants of post-main-sequence stars. Pulsars, magnetars and strongly magnetised white dwarfs have all been observed to exhibit coherent, pulsed radio emission in relation to their rotational period. Recently, a new type of radio long period transient (LPT) has been discovered. The bright radio emission of LPTs resembles that of radio pulsars and magnetars. However, they pulse on timescales (minutes) much longer than previously seen. While minute timescales are common rotation periods for white dwarfs, LPTs are much brighter than the known pulsating white dwarfs, and dipolar radiation from isolated (as opposed to binary) magnetic white dwarfs has yet to be observed. Here, we report the discovery of a new $\sim$421~s LPT, CHIME J0630+25, using the CHIME/FRB and CHIME/Pulsar instruments. We used standard pulsar timing techniques and obtained a phase-coherent timing solution which yielded limits on the inferred magnetic field and characteristic age. CHIME J0630+25 is remarkably nearby ($170 \pm 80$~pc), making it the closest LPT discovered to date.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Submitted

arXiv:2407.05410 [pdf, other]

doi 10.1109/RAISE.2019.00012

Synthetic Test Data Generation Using Recurrent Neural Networks: A Position Paper

Authors: Razieh Behjati, Erik Arisholm, Chao Tan, Margrethe M. Bedregal

Abstract: Testing in production-like test environments is an essential part of quality assurance processes in many industries. Provisioning of such test environments, for information-intensive services, involves setting up databases that are rich enough to enable simulating a wide variety of user scenarios. While production data is perhaps the gold standard here, many organizations, particularly within the public sector, are not allowed to use production data for testing purposes due to privacy concerns. The alternatives are to use anonymized data or synthetically generated data. In this paper, we elaborate on these alternatives and compare them in an industrial context. Further, we focus on synthetic data generation and investigate the use of recurrent neural networks for this purpose. In our preliminary experiments, we were able to generate representative and highly accurate data using a recurrent neural network. These results open new research questions that we discuss here, and plan to investigate in our future research.

Submitted 7 July, 2024; originally announced July 2024.

Comments: This paper was published in the proceedings of RAISE@ICSE in 2019

Journal ref: Proceedings of the 7th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering, RAISE@ICSE 2019, (2019), 22-27

arXiv:2407.04069 [pdf, other]

A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations

Authors: Md Tahmid Rahman Laskar, Sawsan Alqahtani, M Saiful Bari, Mizanur Rahman, Mohammad Abdullah Matin Khan, Haidar Khan, Israt Jahan, Amran Bhuiyan, Chee Wei Tan, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty, Jimmy Huang

Abstract: Large Language Models (LLMs) have recently gained significant attention due to their remarkable capabilities in performing diverse tasks across various domains. However, a thorough evaluation of these models is crucial before deploying them in real-world applications to ensure they produce reliable performance. Despite the well-established importance of evaluating LLMs in the community, the complexity of the evaluation process has led to varied evaluation setups, causing inconsistencies in findings and interpretations. To address this, we systematically review the primary challenges and limitations causing these inconsistencies and unreliable evaluations in various steps of LLM evaluation. Based on our critical review, we present our perspectives and recommendations to ensure LLM evaluations are reproducible, reliable, and robust.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.01418 [pdf, other]

RoboPack: Learning Tactile-Informed Dynamics Models for Dense Packing

Authors: Bo Ai, Stephen Tian, Haochen Shi, Yixuan Wang, Cheston Tan, Yunzhu Li, Jiajun Wu

Abstract: Tactile feedback is critical for understanding the dynamics of both rigid and deformable objects in many manipulation tasks, such as non-prehensile manipulation and dense packing. We introduce an approach that combines visual and tactile sensing for robotic manipulation by learning a neural, tactile-informed dynamics model. Our proposed framework, RoboPack, employs a recurrent graph neural network to estimate object states, including particles and object-level latent physics information, from historical visuo-tactile observations and to perform future state predictions. Our tactile-informed dynamics model, learned from real-world data, can solve downstream robotics tasks with model-predictive control. We demonstrate our approach on a real robot equipped with a compliant Soft-Bubble tactile sensor on non-prehensile manipulation and dense packing tasks, where the robot must infer the physics properties of objects from direct and indirect interactions. Trained on only an average of 30 minutes of real-world interaction data per task, our model can perform online adaptation and make touch-informed predictions. Through extensive evaluations in both long-horizon dynamics prediction and real-world manipulation, our method demonstrates superior effectiveness compared to previous learning-based and physics-based simulation systems.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Robotics: Science and Systems (RSS), 2024. Project page: https://robo-pack.github.io/

ACM Class: I.2.9; I.2.6; I.2.10

arXiv:2407.00050 [pdf, other]

FoldToken2: Learning compact, invariant and generative protein structure language

Authors: Zhangyang Gao, Cheng Tan, Stan Z. Li

Abstract: The equivalent nature of 3D coordinates has posed long-term challenges in protein structure representation learning, alignment, and generation. Can we create a compact and invariant language that equivalently represents protein structures? Towards this goal, we propose FoldToken2 to transfer equivariant structures into discrete tokens, while maintaining the recoverability of the original structures. From FoldToken1 to FoldToken2, we improve three key components: (1) invariant structure encoder, (2) vector-quantized compressor, and (3) equivalent structure decoder. We evaluate FoldToken2 on the protein structure reconstruction task and show that it outperforms the previous FoldToken1 by 20\% in TMScore and 81\% in RMSD. FoldToken2 is probably the first method that works well on both single-chain and multi-chain protein structure quantization. We believe that FoldToken2 will inspire further improvement in protein structure representation learning, structure alignment, and structure generation tasks.

Submitted 11 June, 2024; originally announced July 2024.

arXiv:2406.16603 [pdf, other]

Bipolarized Weyl semimetals and quantum crystal valley Hall effect in two-dimensional altermagnetic materials

Authors: Chao-Yang Tan, Ze-Feng Gao, Huan-Cheng Yang, Kai Liu, Peng-Jie Guo, Zhong-Yi Lu

Abstract: Magnetism and topology are two major areas of condensed matter physics. The combination of magnetism and topology gives rise to more novel physical effects, which have attracted strong theoretical and experimental attention. Recently, the concept of altermagnetism has been introduced, characterized by a dual nature: real-space antiferromagnetism and reciprocal-space anisotropic spin polarization. The amalgamation of altermagnetism with topology may lead to the emergence of previously unobserved topological phases and the associated physical effects. In this study, utilizing a four-band lattice model that incorporates altermagnetism and spin group symmetry, we demonstrate that type-I, type-II, and type-III bipolarized Weyl semimetals can exist in altermagnetic systems. Through first-principles electronic structure calculations, we predict four ideal two-dimensional type-I altermagnetic bipolarized Weyl semimetals Fe$_2$WTe$_4$ and Fe$_2$MoZ$_4$ (Z=S,Se,Te). More significantly, we introduce the quantum crystal valley Hall effect, a phenomenon achievable in three of these materials, namely Fe$_2$WTe$_4$, Fe$_2$MoS$_4$, and Fe$_2$MoTe$_4$, when spin-orbit coupling is considered. Furthermore, these materials have the potential to transition from a quantum crystal valley Hall phase to a Chern insulator phase under strain. In contrast, Fe$_2$MoSe$_4$ remains a Weyl semimetal under spin-orbit coupling but is distinguished by possessing only a single pair of Weyl points. Additionally, the position, polarization, and number of Weyl points in Fe$_2$WTe$_4$ and Fe$_2$MoZ$_4$ can be manipulated by adjusting the direction of the Néel vector. Consequently, Fe$_2$WTe$_4$ and Fe$_2$MoZ$_4$ emerge as promising experimental platforms for investigating the distinctive physical attributes of various altermagnetic topological phases.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 7 pages, 5 figures

arXiv:2406.15238 [pdf, other]

Fermilab Booster Beam Emittances from Quadrupole Modes Measured by BPMs

Authors: C. Y. Tan, M. Balcewicz

Abstract: The measurement of beam emittances by extracting the quadrupole mode signal from a 4 plate beam position monitor (BPM) was published at least 40 years ago. Unfortunately, in practice, this method suffers from poor signal-to-noise ratio and requires a lot of tuning to extract the emittances. In this paper, an improved method where multiple BPMs are used together with better mathematical analysis is described. The BPM derived emittances are then compared with those measured by the Ion Profile Monitor (IPM). Surprisingly, the BPM measured emittances behave very well and are more realistic than those measured by the IPM.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 15th International Particle Accelerator Conference (IPAC'24)

Report number: FERMILAB-CONF-24-0179-AD

arXiv:2406.14359 [pdf, other]

Learning to Transfer for Evolutionary Multitasking

Authors: Sheng-Hao Wu, Yuxiao Huang, Xingyu Wu, Liang Feng, Zhi-Hui Zhan, Kay Chen Tan

Abstract: Evolutionary multitasking (EMT) is an emerging approach for solving multitask optimization problems (MTOPs) and has garnered considerable research interest. The implicit EMT is a significant research branch that utilizes evolution operators to enable knowledge transfer (KT) between tasks. However, current approaches in implicit EMT face challenges in adaptability, due to the use of a limited number of evolution operators and insufficient utilization of evolutionary states for performing KT. This results in suboptimal exploitation of implicit KT's potential to tackle a variety of MTOPs. To overcome these limitations, we propose a novel Learning to Transfer (L2T) framework to automatically discover efficient KT policies for the MTOPs at hand. Our framework conceptualizes the KT process as a learning agent's sequence of strategic decisions within the EMT process. We propose an action formulation for deciding when and how to transfer, a state representation with informative features of evolution states, a reward formulation concerning convergence and transfer efficiency gain, and the environment for the agent to interact with MTOPs. We employ an actor-critic network structure for the agent and learn it via proximal policy optimization. This learned agent can be integrated with various evolutionary algorithms, enhancing their ability to address a range of new MTOPs. Comprehensive empirical studies on both synthetic and real-world MTOPs, encompassing diverse inter-task relationships, function classes, and task distributions are conducted to validate the proposed L2T framework. The results show a marked improvement in the adaptability and performance of implicit EMT when solving a wide spectrum of unseen MTOPs.

Submitted 22 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.14108 [pdf, other]

Connected Vehicle Data-driven Robust Optimization for Traffic Signal Timing: Modeling Traffic Flow Variability and Errors

Authors: Chaopeng Tan, Yue Ding, Kaidi Yang, Hong Zhu, Keshuang Tang

Abstract: Recent advancements in Connected Vehicle (CV) technology have prompted research on leveraging CV data for more effective traffic management. Despite the low penetration rate, such detailed CV data has demonstrated great potential in improving traffic signal performance. However, existing studies share a common shortcoming in that they all ignore traffic flow estimation errors in their modeling process, which is inevitable due to the sampling observation nature of CVs. This study proposes a CV data-driven robust optimization framework for traffic signal timing accounting for both traffic flow variability and estimation errors. First, we propose a general CV data-driven optimization model that can be widely applied to various signalized intersection scenarios including under-/over-saturated and fixed-/real-time. Then, we propose a novel data-driven uncertainty set of arrival rates based on the bounds information derived from CVs, which circumvents the error-prone arrival rate estimation process. Finally, a CV data-driven robust optimization model (CV-RO) is formulated to explicitly handle arrival rate uncertainties. By means of the robust counterpart approach, this robust optimization problem can be equalized to a deterministic mixed-integer linear programming problem with an exact solution. The evaluation results highlight the superior performance of the CV-RO model compared to the deterministic model and traditional methods across various scenarios: different penetration rates, traffic demands, and control types. Notably, the CV-RO model demonstrates its excellence at lower CV penetration rates and in the presence of different traffic flow fluctuation levels, affirming its effectiveness and robustness.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Accepted for podium session of the Conference in Emerging Technologies in Transportation Systems (TRC-30)

arXiv:2406.13434 [pdf, other]

Tactile Aware Dynamic Obstacle Avoidance in Crowded Environment with Deep Reinforcement Learning

Authors: Yung Chuen Ng, Qi Wen, Lim, Chun Ye Tan, Zhen Hao Gan, Meng Yee, Chuah

Abstract: Mobile robots operating in crowded environments require the ability to navigate among humans and surrounding obstacles efficiently while adhering to safety standards and socially compliant mannerisms. This scale of the robot navigation problem may be classified as both a local path planning and trajectory optimization problem. This work presents an array of force sensors that act as a tactile layer to complement the use of a LiDAR for the purpose of inducing awareness of contact with any surrounding objects within the immediate vicinity of a mobile robot undetected by LiDARs. By incorporating the tactile layer, the robot can take more risks in its movements and possibly go right up to an obstacle or wall, and gently squeeze past it. In addition, we built a simulation platform via Pybullet which integrates Robot Operating System (ROS) and reinforcement learning (RL) together. A touch-aware neural network model was trained on it to create an RL-based local path planner for dynamic obstacle avoidance. Our proposed method was demonstrated successfully on an omni-directional mobile robot that was able to navigate in a crowded environment with high agility and versatility in movement, while not being overly sensitive to nearby obstacles-not-in-contact.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12266 [pdf, other]

Towards a Client-Centered Assessment of LLM Therapists by Client Simulation

Authors: Jiashuo Wang, Yang Xiao, Yanran Li, Changhe Song, Chunpu Xu, Chenhao Tan, Wenjie Li

Abstract: Although there is a growing belief that LLMs can be used as therapists, exploring LLMs' capabilities and inefficacy, particularly from the client's perspective, is limited. This work focuses on a client-centered assessment of LLM therapists with the involvement of simulated clients, a standard approach in clinical medical education. However, there are two challenges when applying the approach to assess LLM therapists at scale. Ethically, asking humans to frequently mimic clients and exposing them to potentially harmful LLM outputs can be risky and unsafe. Technically, it can be difficult to consistently compare the performances of different LLM therapists interacting with the same client. To this end, we adopt LLMs to simulate clients and propose ClientCAST, a client-centered approach to assessing LLM therapists by client simulation. Specifically, the simulated client is utilized to interact with LLM therapists and complete questionnaires related to the interaction. Based on the questionnaire results, we assess LLM therapists from three client-centered aspects: session outcome, therapeutic alliance, and self-reported feelings. We conduct experiments to examine the reliability of ClientCAST and use it to evaluate LLM therapists implemented by Claude-3, GPT-3.5, LLaMA3-70B, and Mixtral 8*7B. Codes are released at https://github.com/wangjs9/ClientCAST.

Submitted 20 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.10840 [pdf, other]

CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph

Authors: Haitao Lin, Guojiang Zhao, Odin Zhang, Yufei Huang, Lirong Wu, Zicheng Liu, Siyuan Li, Cheng Tan, Zhifeng Gao, Stan Z. Li

Abstract: Structure-based drug design (SBDD) aims to generate potential drugs that can bind to a target protein and is greatly expedited by the aid of AI techniques in generative models. However, a lack of systematic understanding persists due to the diverse settings, complex implementation, difficult reproducibility, and task singularity. Firstly, the absence of standardization can lead to unfair comparisons and inconclusive insights. To address this dilemma, we propose CBGBench, a comprehensive benchmark for SBDD, that unifies the task as a generative heterogeneous graph completion, analogous to fill-in-the-blank of the 3D complex binding graph. By categorizing existing methods based on their attributes, CBGBench facilitates a modular and extensible framework that implements various cutting-edge methods. Secondly, a single task on \textit{de novo} molecule generation can hardly reflect their capabilities. To broaden the scope, we have adapted these models to a range of tasks essential in drug design, which are considered sub-tasks within the graph fill-in-the-blank tasks. These tasks include the generative designation of \textit{de novo} molecules, linkers, fragments, scaffolds, and sidechains, all conditioned on the structures of protein pockets. Our evaluations are conducted with fairness, encompassing comprehensive perspectives on interaction, chemical properties, geometry authenticity, and substructure validity. We further provide the pre-trained versions of the state-of-the-art models and deep insights with analysis from empirical studies. The codebase for CBGBench is publicly accessible at \url{https://github.com/Edapinenut/CBGBench}.

Submitted 22 July, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

Comments: 9 pages main context

arXiv:2406.09628 [pdf]

doi 10.1103/PhysRevB.104.085153

Massive Dirac Fermions and Strong Shubnikov-de Haas Oscillations in Topological Insulator Sm,Fe:Bi2Se3 Single Crystals

Authors: Weiyao Zhao, Chi Xuan Trang, Qile Li, Lei Chen, Zengji Yue, Abdulhakim Bake, Cheng Tan, Lan Wang, Mitchell Nancarrow, Mark Edmonds, David Cortie, Xiaolin Wang

Abstract: Topological insulators (TIs) are emergent materials with unique band structure, which allow the study of quantum effects in solids, as well as contribute to high performance quantum devices. To achieve better TI performance, here we present a co-doping strategy using synergistic rare-earth Sm and transition-metal Fe dopants in Bi2Se3 single crystals, which combines the advantages of both transition metal doped TI (high ferromagnetic ordering temperature and observed QAHE), and rare-earth doped TI (large magnetic moments and significant spin orbit coupling). In the as-grown single crystals, clear evidence of ferromagnetic ordering was observed. Angle-resolved photoemission spectroscopy indicates that the ferromagnetism opens a 44 meV band gap at the surface Dirac point. Moreover, the carrier mobility at 3 K is ~ 7400 cm2/Vs, and we thus observed an ultra-strong Shubnikov-de Haas oscillation in the longitudinal resistivity, as well as the Hall steps in transverse resistivity below 14 T. Our transport and angle-resolved photoemission spectroscopy results suggest that rare-earth and transition metal co-doping in the Bi2Se3 system is a promising avenue to implement the quantum anomalous Hall effect, as well as to harness the massive Dirac fermion in electrical devices.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 5 figures

Journal ref: Physical Review B 104, 085153 (2021)

arXiv:2406.08987 [pdf, other]

Autonomous Multi-Objective Optimization Using Large Language Model

Authors: Yuxiao Huang, Shenghao Wu, Wenjie Zhang, Jibin Wu, Liang Feng, Kay Chen Tan

Abstract: Multi-objective optimization problems (MOPs) are ubiquitous in real-world applications, presenting a complex challenge of balancing multiple conflicting objectives. Traditional evolutionary algorithms (EAs), though effective, often rely on domain-specific expertise and iterative fine-tuning, hindering adaptability to unseen MOPs. In recent years, the advent of Large Language Models (LLMs) has revolutionized software engineering by enabling the autonomous generation and refinement of programs. Leveraging this breakthrough, we propose a new LLM-based framework that autonomously designs EA operators for solving MOPs. The proposed framework includes a robust testing module to refine the generated EA operator through error-driven dialogue with LLMs, a dynamic selection strategy along with informative prompting-based crossover and mutation to fit textual optimization pipeline. Our approach facilitates the design of EA operators without the extensive demands for expert intervention, thereby speeding up the innovation of EA operators. Empirical studies across various MOP categories validate the robustness and superior performance of our proposed framework.

Submitted 26 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: 14 pages, 11 figures, 6 tables

arXiv:2406.05688 [pdf, other]

Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions

Authors: Cheng Tan, Dongxin Lyu, Siyuan Li, Zhangyang Gao, Jingxuan Wei, Siqi Ma, Zicheng Liu, Stan Z. Li

Abstract: Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields and have shown significant potential in the academic peer-review process. However, existing applications are primarily limited to static review generation based on submitted papers, which fail to capture the dynamic and iterative nature of real-world peer reviews. In this paper, we reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers. We construct a comprehensive dataset containing over 26,841 papers with 92,017 reviews collected from multiple sources, including the top-tier conference and prestigious journal. This dataset is meticulously designed to facilitate the applications of LLMs for multi-turn dialogues, effectively simulating the complete peer-review process. Furthermore, we propose a series of metrics to evaluate the performance of LLMs for each role under this reformulated peer-review setting, ensuring fair and comprehensive evaluations. We believe this work provides a promising perspective on enhancing the LLM-driven peer-review process by incorporating dynamic, role-based interactions. It aligns closely with the iterative and interactive nature of real-world academic peer review, offering a robust foundation for future research and development in this area. We open-source the dataset at https://github.com/chengtan9907/ReviewMT.

Submitted 9 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.03198 [pdf, other]

The Impossibility of Fair LLMs

Authors: Jacy Anthis, Kristian Lum, Michael Ekstrand, Avi Feller, Alexander D'Amour, Chenhao Tan

Abstract: The need for fair AI is increasingly clear in the era of general-purpose systems such as ChatGPT, Gemini, and other large language models (LLMs). However, the increasing complexity of human-AI interaction and its social impacts have raised questions of how fairness standards could be applied. Here, we review the technical frameworks that machine learning researchers have used to evaluate fairness, such as group fairness and fair representations, and find that their application to LLMs faces inherent limitations. We show that each framework either does not logically extend to LLMs or presents a notion of fairness that is intractable for LLMs, primarily due to the multitudes of populations affected, sensitive attributes, and use cases. To address these challenges, we develop guidelines for the more realistic goal of achieving fairness in particular use cases: the criticality of context, the responsibility of LLM developers, and the need for stakeholder participation in an iterative process of design and evaluation. Moreover, it may eventually be possible and even necessary to use the general-purpose capabilities of AI systems to address fairness challenges as a form of scalable AI-assisted alignment.

Submitted 28 May, 2024; originally announced June 2024.

Comments: Presented at the 1st Human-Centered Evaluation and Auditing of Language Models (HEAL) workshop at CHI 2024

arXiv:2406.02234 [pdf, other]

On the Limitations of Fractal Dimension as a Measure of Generalization

Authors: Charlie Tan, Inés García-Redondo, Qiquan Wang, Michael M. Bronstein, Anthea Monod

Abstract: Bounding and predicting the generalization gap of overparameterized neural networks remains a central open problem in theoretical machine learning. Neural network optimization trajectories have been proposed to possess fractal structure, leading to bounds and generalization measures based on notions of fractal dimension on these trajectories. Prominently, both the Hausdorff dimension and the persistent homology dimension have been proposed to correlate with generalization gap, thus serving as a measure of generalization. This work performs an extended evaluation of these topological generalization measures. We demonstrate that fractal dimension fails to predict generalization of models trained from poor initializations. We further identify that the $\ell^2$ norm of the final parameter iterate, one of the simplest complexity measures in learning theory, correlates more strongly with the generalization gap than these notions of fractal dimension. Finally, our study reveals the intriguing manifestation of model-wise double descent in persistent homology-based generalization measures. This work lays the ground for a deeper investigation of the causal relationships between fractal geometry, topological data analysis, and neural network optimization.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 17 pages, 6 figures

arXiv:2406.01627 [pdf, other]

GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models

Authors: Zicheng Liu, Jiahui Li, Siyuan Li, Zelin Zang, Cheng Tan, Yufei Huang, Yajing Bai, Stan Z. Li

Abstract: The Genomic Foundation Model (GFM) paradigm is expected to facilitate the extraction of generalizable representations from massive genomic data, thereby enabling their application across a spectrum of downstream applications. Despite advancements, the lack of an evaluation framework makes it difficult to ensure equitable assessment due to experimental settings, model intricacy, benchmark datasets, and reproducibility challenges. In the absence of standardization, comparative analyses risk becoming biased and unreliable. To surmount this impasse, we introduce GenBench, a comprehensive benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models. GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies. We systematically evaluate datasets spanning diverse biological domains, with a particular emphasis on both short-range and long-range genomic tasks, including the three most important DNA tasks covering Coding Region, Non-Coding Region, Genome Structure, etc. Moreover, we provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance. Our findings reveal an interesting observation: independent of the number of parameters, the discernible difference in preference between the attention-based and convolution-based models on short- and long-range tasks may provide insights into the future design of GFM.

Submitted 5 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

arXiv:2406.01333 [pdf, other]

Probing Language Models for Pre-training Data Detection

Authors: Zhenhua Liu, Tong Zhu, Chuanyuan Tan, Haonan Lu, Bing Liu, Wenliang Chen

Abstract: Large Language Models (LLMs) have shown their impressive capabilities, while also raising concerns about the data contamination problems due to privacy issues and leakage of benchmark datasets in the pre-training phase. Therefore, it is vital to detect the contamination by checking whether an LLM has been pre-trained on the target texts. Recent studies focus on the generated texts and compute perplexities, which are superficial features and not reliable. In this study, we propose to utilize the probing technique for pre-training data detection by examining the model's internal activations. Our method is simple and effective and leads to more trustworthy pre-training data detection. Additionally, we propose ArxivMIA, a new challenging benchmark comprising arxiv abstracts from Computer Science and Mathematics categories. Our experiments demonstrate that our method outperforms all baselines, and achieves state-of-the-art performance on both WikiMIA and ArxivMIA, with additional experiments confirming its efficacy (Our code and dataset are available at https://github.com/zhliu0106/probing-lm-data).

Submitted 3 June, 2024; originally announced June 2024.

Comments: Accepted by ACL-2024 main conference

arXiv:2405.20834 [pdf, other]

Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning

Authors: Cheng Tan, Jingxuan Wei, Linzhuang Sun, Zhangyang Gao, Siyuan Li, Bihui Yu, Ruifeng Guo, Stan Z. Li

Abstract: Large language models equipped with retrieval-augmented generation (RAG) represent a burgeoning field aimed at enhancing answering capabilities by leveraging external knowledge bases. Although the application of RAG with language-only models has been extensively explored, its adaptation into multimodal vision-language models remains nascent. Going beyond mere answer generation, the primary goal of multimodal RAG is to cultivate the models' ability to reason in response to relevant queries. To this end, we introduce a novel multimodal RAG framework named RMR (Retrieval Meets Reasoning). The RMR framework employs a bi-modal retrieval module to identify the most relevant question-answer pairs, which then serve as scaffolds for the multimodal reasoning process. This training-free approach not only encourages the model to engage deeply with the reasoning processes inherent in the retrieved content but also facilitates the generation of answers that are precise and richly interpretable. Surprisingly, utilizing solely the ScienceQA dataset, collected from elementary and high school science curricula, RMR significantly boosts the performance of various vision-language models across a spectrum of benchmark datasets, including A-OKVQA, MMBench, and SEED. These outcomes highlight the substantial potential of our multimodal retrieval and reasoning mechanism to improve the reasoning capabilities of vision-language models.

Submitted 31 May, 2024; originally announced May 2024.

Comments: Under review

Showing 1–50 of 1,186 results for author: Tan, C