Search | arXiv e-print repository

Detecting AI Flaws: Target-Driven Attacks on Internal Faults in Language Models

Authors: Yuhao Du, Zhuo Li, Pengyu Cheng, Xiang Wan, Anningzhe Gao

Abstract: Large Language Models (LLMs) have become a focal point in the rapidly evolving field of artificial intelligence. However, a critical concern is the presence of toxic content within the pre-training corpus of these models, which can lead to the generation of inappropriate outputs. Investigating methods for detecting internal faults in LLMs can help us understand their limitations and improve their security. Existing methods primarily focus on jailbreaking attacks, which involve manually or automatically constructing adversarial content to prompt the target LLM to generate unexpected responses. These methods rely heavily on prompt engineering, which is time-consuming and usually requires specially designed questions. To address these challenges, this paper proposes a target-driven attack paradigm that focuses on directly eliciting the target response instead of optimizing the prompts. We introduce the use of another LLM as the detector for toxic content, referred to as ToxDet. Given a target toxic response, ToxDet can generate a possible question and a preliminary answer to provoke the target model into producing desired toxic responses with meanings equivalent to the provided one. ToxDet is trained by interacting with the target LLM and receiving reward signals from it, utilizing reinforcement learning for the optimization process. While the primary focus of the target models is on open-source LLMs, the fine-tuned ToxDet can also be transferred to attack black-box models such as GPT-4o, achieving notable results. Experimental results on AdvBench and HH-Harmless datasets demonstrate the effectiveness of our methods in detecting the tendencies of target LLMs to generate harmful responses. This algorithm not only exposes vulnerabilities but also provides a valuable resource for researchers to strengthen their models against such attacks.

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2408.11860 [pdf, other]

doi 10.18653/v1/2023.findings-acl.81

Risks and NLP Design: A Case Study on Procedural Document QA

Authors: Nikita Haduong, Alice Gao, Noah A. Smith

Abstract: As NLP systems are increasingly deployed at scale, concerns about their potential negative impacts have attracted the attention of the research community, yet discussions of risk have mostly been at an abstract level and focused on generic AI or NLP applications. We argue that clearer assessments of risks and harms to users--and concrete strategies to mitigate them--will be possible when we specialize the analysis to more concrete applications and their plausible users. As an illustration, this paper is grounded in cooking recipe procedural document question answering (ProcDocQA), where there are well-defined risks to users such as injuries or allergic reactions. Our case study shows that an existing language model, applied in "zero-shot" mode, quantitatively answers real-world questions about recipes as well or better than the humans who have answered the questions on the web. Using a novel questionnaire informed by theoretical work on AI risk, we conduct a risk-oriented error analysis that could then inform the design of a future system to be deployed with lower risk of harm and better performance.

Submitted 16 August, 2024; originally announced August 2024.

Journal ref: Findings of the Association for Computational Linguistics ACL (2023) 1248-1269

arXiv:2407.13301 [pdf, other]

CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis

Authors: Junying Chen, Chi Gui, Anningzhe Gao, Ke Ji, Xidong Wang, Xiang Wan, Benyou Wang

Abstract: The field of medical diagnosis has undergone a significant transformation with the advent of large language models (LLMs), yet the challenges of interpretability within these models remain largely unaddressed. This study introduces Chain-of-Diagnosis (CoD) to enhance the interpretability of LLM-based medical diagnostics. CoD transforms the diagnostic process into a diagnostic chain that mirrors a physician's thought process, providing a transparent reasoning pathway. Additionally, CoD outputs the disease confidence distribution to ensure transparency in decision-making. This interpretability makes model diagnostics controllable and aids in identifying critical symptoms for inquiry through the entropy reduction of confidences. With CoD, we developed DiagnosisGPT, capable of diagnosing 9604 diseases. Experimental results demonstrate that DiagnosisGPT outperforms other LLMs on diagnostic benchmarks. Moreover, DiagnosisGPT provides interpretability while ensuring controllability in diagnostic rigor.

Submitted 18 July, 2024; originally announced July 2024.
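
As a rough illustration of the "entropy reduction of confidences" idea mentioned in the abstract above, the sketch below computes the Shannon entropy of a disease confidence distribution and picks the symptom whose answer is expected to reduce that entropy the most. The disease names, probabilities, likelihood table, and helper names are all hypothetical; this is a generic information-gain calculation, not the authors' implementation.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping disease -> confidence."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def expected_entropy_after(prior, likelihood):
    """Expected entropy of the confidences after a yes/no symptom inquiry.

    likelihood[d] is an assumed P(symptom present | disease d).
    """
    result = 0.0
    for answer in (True, False):
        # Unnormalised posterior weights for this answer (Bayes' rule).
        weights = {
            d: p * (likelihood[d] if answer else 1.0 - likelihood[d])
            for d, p in prior.items()
        }
        prob_answer = sum(weights.values())
        if prob_answer == 0:
            continue  # this answer cannot occur under the prior
        post = {d: w / prob_answer for d, w in weights.items()}
        result += prob_answer * entropy(post)
    return result

def best_symptom(prior, symptoms):
    """Symptom with the largest expected entropy reduction."""
    base = entropy(prior)
    return max(symptoms, key=lambda s: base - expected_entropy_after(prior, symptoms[s]))

# Hypothetical toy data: two equally likely diseases, two candidate symptoms.
prior = {"flu": 0.5, "cold": 0.5}
symptoms = {
    "fever": {"flu": 0.9, "cold": 0.1},  # informative
    "cough": {"flu": 0.5, "cold": 0.5},  # uninformative
}
chosen = best_symptom(prior, symptoms)
```

In this toy setup, asking about "cough" leaves the confidences unchanged, so the inquiry that separates the two diseases is preferred.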

arXiv:2407.10666 [pdf, other]

Flow Perturbation to Accelerate Unbiased Sampling of Boltzmann distribution

Authors: Xin Peng, Ang Gao

Abstract: Flow-based generative models have been employed for sampling the Boltzmann distribution, but their application to high-dimensional systems is hindered by the significant computational cost of obtaining the Jacobian of the flow. To overcome this challenge, we introduce the flow perturbation method, which incorporates optimized stochastic perturbations into the flow. By reweighting trajectories generated by the perturbed flow, our method achieves unbiased sampling of the Boltzmann distribution with orders of magnitude speedup compared to both brute force Jacobian calculations and the Hutchinson estimator. Notably, it accurately sampled the Chignolin protein with all atomic Cartesian coordinates explicitly represented, which, to our best knowledge, is the largest molecule ever Boltzmann sampled in such detail using generative models.

Submitted 27 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.
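
The abstract above names the Hutchinson estimator as one of its baselines. For readers unfamiliar with it, the following minimal sketch shows the standard estimator, which approximates the trace of a matrix using only matrix-vector products with random Rademacher probes. The matrix `A`, the function names, and the probe count are illustrative assumptions; this shows the baseline technique only, not the flow perturbation method itself.

```python
import random

def hutchinson_trace(matvec, dim, num_probes, rng):
    """Estimate trace(A) as the average of z^T A z over random probes z."""
    total = 0.0
    for _ in range(num_probes):
        # Rademacher probe: each entry is +1 or -1 with equal probability.
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)
        total += sum(zi * azi for zi, azi in zip(z, az))
    return total / num_probes

# Hypothetical small symmetric matrix standing in for a Jacobian.
A = [
    [2.0, 0.5, 0.0],
    [0.5, 3.0, 0.1],
    [0.0, 0.1, 4.0],
]

def matvec(v):
    """Matrix-vector product A @ v; the estimator never inspects A directly."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

estimate = hutchinson_trace(matvec, dim=3, num_probes=4000, rng=random.Random(0))
exact = sum(A[i][i] for i in range(3))
```

The estimate is unbiased but noisy, and each probe still costs one matrix-vector product with the Jacobian.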

arXiv:2407.05302 [pdf, other]

Mamba Hawkes Process

Authors: Anningzhe Gao, Shan Dai, Yan Hu

Abstract: Irregular and asynchronous event sequences are prevalent in many domains, such as social media, finance, and healthcare. Traditional temporal point processes (TPPs), like Hawkes processes, often struggle to model mutual inhibition and nonlinearity effectively. While recent neural network models, including RNNs and Transformers, address some of these issues, they still face challenges with long-term dependencies and computational efficiency. In this paper, we introduce the Mamba Hawkes Process (MHP), which leverages the Mamba state space architecture to capture long-range dependencies and dynamic event interactions. Our results show that MHP outperforms existing models across various datasets. Additionally, we propose the Mamba Hawkes Process Extension (MHP-E), which combines Mamba and Transformer models to enhance predictive capabilities. We present the novel application of the Mamba architecture to Hawkes processes, a flexible and extensible model structure, and a theoretical analysis of the synergy between state space models and Hawkes processes. Experimental results demonstrate the superior performance of both MHP and MHP-E, advancing the field of temporal point process modeling.

Submitted 7 July, 2024; originally announced July 2024.
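
To make the "traditional" model in the abstract above concrete, here is a minimal sketch of the conditional intensity of a classical univariate Hawkes process with an exponential kernel, lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i)). The parameter values and function name are illustrative assumptions and are not drawn from the paper.

```python
import math

def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity at time t given past event times.

    mu    : baseline rate
    alpha : jump in intensity caused by each past event
    beta  : exponential decay rate of that jump
    """
    return mu + sum(
        alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t
    )

# Hypothetical parameters and a single past event at time 0.
baseline_only = hawkes_intensity(1.0, [], mu=0.5, alpha=0.8, beta=1.0)
after_event = hawkes_intensity(1.0, [0.0], mu=0.5, alpha=0.8, beta=1.0)
```

With a non-negative `alpha`, past events can only raise the intensity, which is one way to see why this classical form cannot express the mutual inhibition the abstract mentions.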

arXiv:2406.19280 [pdf, other]

HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale

Authors: Junying Chen, Ruyi Ouyang, Anningzhe Gao, Shunian Chen, Guiming Hardy Chen, Xidong Wang, Ruifei Zhang, Zhenyang Cai, Ke Ji, Guangjun Yu, Xiang Wan, Benyou Wang

Abstract: The rapid development of multimodal large language models (MLLMs), such as GPT-4V, has led to significant advancements. However, these models still face challenges in medical multimodal capabilities due to limitations in the quantity and quality of medical vision-text data, stemming from data privacy concerns and high annotation costs. While pioneering approaches utilize PubMed's large-scale, de-identified medical image-text pairs to address these limitations, they still fall short due to inherent data noise. To tackle this, we refined medical image-text pairs from PubMed and employed MLLMs (GPT-4V) in an 'unblinded' capacity to denoise and reformat the data, resulting in the creation of the PubMedVision dataset with 1.3 million medical VQA samples. Our validation demonstrates that: (1) PubMedVision can significantly enhance the medical multimodal capabilities of current MLLMs, showing significant improvement in benchmarks including the MMMU Health & Medicine track; (2) manual checks by medical experts and empirical results validate the superior data quality of our dataset compared to other data construction methods. Using PubMedVision, we train a 34B medical MLLM HuatuoGPT-Vision, which shows superior performance in medical multimodal scenarios among open-source MLLMs.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18034 [pdf, other]

LLMs for Doctors: Leveraging Medical LLMs to Assist Doctors, Not Replace Them

Authors: Wenya Xie, Qingying Xiao, Yu Zheng, Xidong Wang, Junying Chen, Ke Ji, Anningzhe Gao, Xiang Wan, Feng Jiang, Benyou Wang

Abstract: The recent success of Large Language Models (LLMs) has had a significant impact on the healthcare field, providing patients with medical advice, diagnostic information, and more. However, due to a lack of professional medical knowledge, patients are easily misled by generated erroneous information from LLMs, which may result in serious medical problems. To address this issue, we focus on tuning the LLMs to be medical assistants who collaborate with more experienced doctors. We first conduct a two-stage survey by inspiration-feedback to gain a broad understanding of the real needs of doctors for medical assistants. Based on this, we construct a Chinese medical dataset called DoctorFLAN to support the entire workflow of doctors, which includes 92K Q&A samples from 22 tasks and 27 specialists. Moreover, we evaluate LLMs in doctor-oriented scenarios by constructing the DoctorFLAN-test containing 550 single-turn Q&A and DotaBench containing 74 multi-turn conversations. The evaluation results indicate that being a medical assistant still poses challenges for existing open-source models, but DoctorFLAN can help them significantly. It demonstrates that the doctor-oriented dataset and benchmarks we construct can complement existing patient-oriented work and better promote medical LLMs research.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.18027 [pdf, other]

Automated Clinical Data Extraction with Knowledge Conditioned LLMs

Authors: Diya Li, Asim Kadav, Aijing Gao, Rui Li, Richard Bourgon

Abstract: The extraction of lung lesion information from clinical and medical imaging reports is crucial for research on and clinical care of lung-related diseases. Large language models (LLMs) can be effective at interpreting unstructured text in reports, but they often hallucinate due to a lack of domain-specific knowledge, leading to reduced accuracy and posing challenges for use in clinical settings. To address this, we propose a novel framework that aligns generated internal knowledge with external knowledge through in-context learning (ICL). Our framework employs a retriever to identify relevant units of internal or external knowledge and a grader to evaluate the truthfulness and helpfulness of the retrieved internal-knowledge rules, to align and update the knowledge bases. Our knowledge-conditioned approach also improves the accuracy and reliability of LLM outputs by addressing the extraction task in two stages: (i) lung lesion finding detection and primary structured field parsing, followed by (ii) further parsing of lesion description text into additional structured fields. Experiments with expert-curated test datasets demonstrate that this ICL approach can increase the F1 score for key fields (lesion size, margin and solidity) by an average of 12.9% over existing ICL methods.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16771 [pdf, other]

An antiferromagnetic diode effect in even-layered MnBi2Te4

Authors: Anyuan Gao, Shao-Wen Chen, Barun Ghosh, Jian-Xiang Qiu, Yu-Fei Liu, Yugo Onishi, Chaowei Hu, Tiema Qian, Damien Bérubé, Thao Dinh, Houchen Li, Christian Tzschaschel, Seunghyun Park, Tianye Huang, Shang-Wei Lien, Zhe Sun, Sheng-Chin Ho, Bahadur Singh, Kenji Watanabe, Takashi Taniguchi, David C. Bell, Arun Bansil, Hsin Lin, Tay-Rong Chang, Amir Yacoby , et al. (4 additional authors not shown)

Abstract: In a PN junction, the separation between positive and negative charges leads to diode transport. In the past few years, the intrinsic diode transport in noncentrosymmetric polar conductors has attracted great interest, because it suggests novel nonlinear applications and provides a symmetry-sensitive probe of Fermi surface. Recently, such studies have been extended to noncentrosymmetric superconductors, realizing the superconducting diode effect. Here, we show that, even in a centrosymmetric crystal without directional charge separation, the spins of an antiferromagnet (AFM) can generate a spatial directionality, leading to an AFM diode effect. We observe large second-harmonic transport in a nonlinear electronic device enabled by the compensated AFM state of even-layered MnBi2Te4. We also report a novel electrical sum-frequency generation (SFG), which has been rarely explored in contrast to the well-known optical SFG in wide-gap insulators. We demonstrate that the AFM enables an in-plane field-effect transistor and harvesting of wireless electromagnetic energy. The electrical SFG establishes a powerful method to study nonlinear electronics built by quantum materials. The AFM diode effect paves the way for potential device concepts including AFM logic circuits, self-powered AFM spintronics, and other applications that potentially bridge nonlinear electronics with AFM spintronics.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 33+8 pages, 14+2 figures

arXiv:2406.09648 [pdf, other]

An Intrinsic Vector Heat Network

Authors: Alexander Gao, Maurice Chu, Mubbasir Kapadia, Ming C. Lin, Hsueh-Ti Derek Liu

Abstract: Vector fields are widely used to represent and model flows for many science and engineering applications. This paper introduces a novel neural network architecture for learning tangent vector fields that are intrinsically defined on manifold surfaces embedded in 3D. Previous approaches to learning vector fields on surfaces treat vectors as multi-dimensional scalar fields, using traditional scalar-valued architectures to process channels individually, and thus fail to preserve fundamental intrinsic properties of the vector field. The core idea of this work is to introduce a trainable vector heat diffusion module to spatially propagate vector-valued feature data across the surface, which we incorporate into our proposed architecture that consists of vector-valued neurons. Our architecture is invariant to rigid motion of the input, isometric deformation, and choice of local tangent bases, and is robust to discretizations of the surface. We evaluate our Vector Heat Network on triangle meshes, and empirically validate its invariant properties. We also demonstrate the effectiveness of our method on the useful industrial application of quadrilateral mesh generation.

Submitted 18 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.00606 [pdf, other]

LLMs Could Autonomously Learn Without External Supervision

Authors: Ke Ji, Junying Chen, Anningzhe Gao, Wenya Xie, Xiang Wan, Benyou Wang

Abstract: In the quest for super-human performance, Large Language Models (LLMs) have traditionally been tethered to human-annotated datasets and predefined training objectives, a process that is both labor-intensive and inherently limited. This paper presents a transformative approach: Autonomous Learning for LLMs, a self-sufficient learning paradigm that frees models from the constraints of human supervision. This method endows LLMs with the ability to self-educate through direct interaction with text, akin to a human reading and comprehending literature. Our approach eliminates the reliance on annotated data, fostering an Autonomous Learning environment where the model independently identifies and reinforces its knowledge gaps. Empirical results from our comprehensive experiments, which utilized a diverse array of learning materials and were evaluated against standard public quizzes, reveal that Autonomous Learning outstrips the performance of both Pre-training and Supervised Fine-Tuning (SFT), as well as retrieval-augmented methods. These findings underscore the potential of Autonomous Learning to not only enhance the efficiency and effectiveness of LLM training but also to pave the way for the development of more advanced, self-reliant AI systems.

Submitted 6 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

Comments: 20 pages, 8 figures

arXiv:2406.00073 [pdf, other]

A Novel Review of Stability Techniques for Improved Privacy-Preserving Machine Learning

Authors: Coleman DuPlessie, Aidan Gao

Abstract: Machine learning models have recently enjoyed a significant increase in size and popularity. However, this growth has created concerns about dataset privacy. To counteract data leakage, various privacy frameworks guarantee that the output of machine learning models does not compromise their training data. However, this privatization comes at a cost by adding random noise to the training process, which reduces model performance. By making models more resistant to small changes in input and thus more stable, the necessary amount of noise can be decreased while still protecting privacy. This paper investigates various techniques to enhance stability, thereby minimizing the negative effects of privatization in machine learning.

Submitted 30 May, 2024; originally announced June 2024.
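
The link between stability and noise described in the abstract above can be illustrated with the textbook Laplace mechanism, where the noise scale equals the sensitivity divided by the privacy budget epsilon. This is a generic sketch under that assumption; the abstract does not say which privacy framework or noise distribution the paper covers, and the function names and numbers here are hypothetical.

```python
import random

def laplace_scale(sensitivity, epsilon):
    """Noise scale b = sensitivity / epsilon for the Laplace mechanism."""
    if sensitivity < 0 or epsilon <= 0:
        raise ValueError("sensitivity must be >= 0 and epsilon must be > 0")
    return sensitivity / epsilon

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def privatize(value, sensitivity, epsilon, rng):
    """Release value plus Laplace noise calibrated to its sensitivity."""
    return value + laplace_noise(laplace_scale(sensitivity, epsilon), rng)

# Hypothetical comparison: a more stable computation has lower sensitivity,
# so it needs less noise for the same epsilon.
unstable_scale = laplace_scale(sensitivity=1.0, epsilon=0.5)
stable_scale = laplace_scale(sensitivity=0.1, epsilon=0.5)
noisy_release = privatize(10.0, sensitivity=0.1, epsilon=0.5, rng=random.Random(0))
```

At a fixed epsilon, cutting the sensitivity by a factor of ten cuts the noise scale by the same factor, which is the trade-off the abstract refers to.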

arXiv:2405.19799 [pdf, other]

Unsupervised Mutual Learning of Dialogue Discourse Parsing and Topic Segmentation

Authors: Jiahui Xu, Feng Jiang, Anningzhe Gao, Haizhou Li

Abstract: The advancement of large language models (LLMs) has propelled the development of dialogue systems. Unlike the popular ChatGPT-like assistant model, which only satisfies the user's preferences, task-oriented dialogue systems have also faced new requirements and challenges in the broader business field. They are expected to provide correct responses at each dialogue turn and, at the same time, achieve the overall goal defined by the task. By understanding rhetorical structures and topic structures via topic segmentation and discourse parsing, a dialogue system may plan better to achieve both objectives. However, while both structures belong to discourse structure in linguistics, rhetorical structure and topic structure are mostly modeled separately or with one assisting the other in prior work. The interaction between these two structures has not been considered for joint modeling and mutual learning. Furthermore, unsupervised learning techniques to achieve the above are not well explored. To fill this gap, we propose an unsupervised mutual learning framework of two structures leveraging the global and local connections between them. We extend the topic modeling between non-adjacent discourse units to ensure global structural relevance with rhetorical structures. We also incorporate rhetorical structures into the topic structure through a graph neural network model to ensure local coherence consistency. Finally, we utilize the similarity between the two fused structures for mutual learning. The experimental results demonstrate that our methods outperform all strong baselines on two dialogue rhetorical datasets (STAC and Molweni), as well as dialogue topic datasets (Doc2Dial and TIAGE). We provide our code at https://github.com/Jeff-Sue/URT.

Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.14559 [pdf, other]

HemSeg-200: A Voxel-Annotated Dataset for Intracerebral Hemorrhages Segmentation in Brain CT Scans

Authors: Changwei Song, Qing Zhao, Jianqiang Li, Xin Yue, Ruoyun Gao, Zhaoxuan Wang, An Gao, Guanghui Fu

Abstract: Acute intracerebral hemorrhage is a life-threatening condition that demands immediate medical intervention. Intraparenchymal hemorrhage (IPH) and intraventricular hemorrhage (IVH) are critical subtypes of this condition. Clinically, when such hemorrhages are suspected, immediate CT scanning is essential to assess the extent of the bleeding and to facilitate the formulation of a targeted treatment plan. While current research in deep learning has largely focused on qualitative analyses, such as identifying subtypes of cerebral hemorrhages, there remains a significant gap in quantitative analysis crucial for enhancing clinical treatments. Addressing this gap, our paper introduces a dataset comprising 222 CT annotations, sourced from the RSNA 2019 Brain CT Hemorrhage Challenge and meticulously annotated at the voxel level for precise IPH and IVH segmentation. This dataset was utilized to train and evaluate seven advanced medical image segmentation algorithms, with the goal of refining the accuracy of segmentation for these hemorrhages. Our findings demonstrate that this dataset not only furthers the development of sophisticated segmentation algorithms but also substantially aids scientific research and clinical practice by improving the diagnosis and management of these severe hemorrhages. Our dataset and codes are available at https://github.com/songchangwei/3DCT-SD-IVH-ICH.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.13144 [pdf, other]

Mamo: a Mathematical Modeling Benchmark with Solvers

Authors: Xuhan Huang, Qingning Shen, Yan Hu, Anningzhe Gao, Benyou Wang

Abstract: Mathematical modeling involves representing real-world phenomena, systems, or problems using mathematical expressions and equations to analyze, understand, and predict their behavior. Given that this process typically requires experienced experts, there is an interest in exploring whether Large Language Models (LLMs) can undertake mathematical modeling to potentially decrease human labor. To evaluate LLMs in mathematical modeling, we introduce a new benchmark, Mamo, that transcends traditional result-oriented assessments. Unlike conventional methods that primarily assess LLMs based on the accuracy of solutions to mathematical problems, our approach offers deeper insight into the modeling process itself. By focusing on the processes LLMs undertake rather than the correctness of their final solutions, Mamo pioneers a novel evaluation paradigm. This shift underscores the importance of understanding the inherent modeling capabilities of LLMs, paving the way for a more nuanced and comprehensive analysis of their problem-solving strategies. Our work marks a significant advancement in the field, suggesting a new direction for future research by emphasizing the evaluation of LLMs' modeling processes over the mere correctness of answers. This benchmark not only facilitates a better understanding of LLMs' mathematical modeling capabilities but also sets a new standard for evaluating their performance in complex problem-solving scenarios.

Submitted 30 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

Comments: Project: https://github.com/FreedomIntelligence/Mamo Updates: 1. include more models 2. minor modification of the metric with new results 3. fix some typos 4. add error analysis with examples

arXiv:2405.06985 [pdf, other]

RoTHP: Rotary Position Embedding-based Transformer Hawkes Process

Authors: Anningzhe Gao, Shan Dai

Abstract: Temporal Point Processes (TPPs), especially Hawkes processes, are commonly used for modeling asynchronous event sequence data such as financial transactions and user behaviors in social networks. Due to the strong fitting ability of neural networks, various neural Temporal Point Processes have been proposed, among which the Neural Hawkes Processes based on self-attention, such as the Transformer Hawkes Process (THP), achieve distinct performance improvements. Although the THP has been increasingly studied, it still suffers from the sequence prediction issue, i.e., training on history sequences and inferring the future, which is a prevalent paradigm in realistic sequence analysis tasks. Moreover, conventional THP and its variants simply adopt the initial sinusoid embedding in transformers, which our empirical study shows to be sensitive to temporal change or noise in sequence data. To deal with these problems, we propose a new Rotary Position Embedding-based THP (RoTHP) architecture in this paper. Notably, we show theoretically the translation invariance property and sequence prediction flexibility of our RoTHP, induced by the relative time embeddings when coupled with the Hawkes process. Furthermore, we demonstrate empirically that our RoTHP generalizes better in sequence data scenarios with timestamp translations and in sequence prediction tasks.

Submitted 11 May, 2024; originally announced May 2024.
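
The translation invariance claimed in the abstract above follows from a general property of rotary embeddings: rotating a query and a key by angles proportional to their positions makes their dot product depend only on the position difference. The sketch below checks this for a single 2D feature pair and a single frequency. Using raw timestamps as positions is an assumption based on the abstract's "relative time embeddings"; the vectors, frequency, and function names are hypothetical, and actual rotary embeddings use many such pairs with different frequencies.

```python
import math

def rotate(vec, angle):
    """Rotate a 2D vector counter-clockwise by the given angle (radians)."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def rotary_score(q, k, t_q, t_k, freq=1.0):
    """Attention score between a query at time t_q and a key at time t_k."""
    q_rot = rotate(q, freq * t_q)
    k_rot = rotate(k, freq * t_k)
    return q_rot[0] * k_rot[0] + q_rot[1] * k_rot[1]

# Hypothetical query/key features and event timestamps.
q = (0.3, -1.2)
k = (0.7, 0.4)
original = rotary_score(q, k, t_q=1.0, t_k=3.5)
shifted = rotary_score(q, k, t_q=101.0, t_k=103.5)  # same gap, shifted by 100
```

Both scores agree up to floating-point error, because only the gap `t_k - t_q` enters the result.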

arXiv:2404.17104 [pdf, other]

Don't Look at the Camera: Achieving Perceived Eye Contact

Authors: Alice Gao, Samyukta Jayakumar, Marcello Maniglia, Brian Curless, Ira Kemelmacher-Shlizerman, Aaron R. Seitz, Steven M. Seitz

Abstract: We consider the question of how to best achieve the perception of eye contact when a person is captured by camera and then rendered on a 2D display. For single subjects photographed by a camera, conventional wisdom tells us that looking directly into the camera achieves eye contact. Through empirical user studies, we show that it is instead preferable to look just below the camera lens. We quantitatively assess where subjects should direct their gaze relative to a camera lens to optimize the perception that they are making eye contact.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.12036 [pdf, other]

Exploring the Premelting Transition through Molecular Simulations Powered by Neural Network Potentials

Authors: Limin Zeng, Ang Gao

Abstract: The system has addressed the error of "Bad character(s) in field Abstract" for no reason. Please refer to manuscript for the full abstract.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 10 pages, 6 figures

arXiv:2404.05236 [pdf, other]

Stylizing Sparse-View 3D Scenes with Hierarchical Neural Representation

Authors: Y. Wang, A. Gao, Y. Gong, Y. Zeng

Abstract: Recently, a surge of 3D style transfer methods have been proposed that leverage the scene reconstruction power of a pre-trained neural radiance field (NeRF). To successfully stylize a scene this way, one must first reconstruct a photo-realistic radiance field from collected images of the scene. However, when only sparse input views are available, pre-trained few-shot NeRFs often suffer from high-frequency artifacts, which are generated as a by-product of high-frequency details for improving reconstruction quality. Is it possible to generate more faithful stylized scenes from sparse inputs by directly optimizing an encoding-based scene representation with the target style? In this paper, we consider the stylization of sparse-view scenes in terms of disentangling content semantics and style textures. We propose a coarse-to-fine sparse-view scene stylization framework, where a novel hierarchical encoding-based neural representation is designed to generate high-quality stylized scenes directly from implicit scene representations. We also propose a new optimization strategy with content strength annealing to achieve realistic stylization and better content preservation. Extensive experiments demonstrate that our method can achieve high-quality stylization of sparse-view scenes and outperforms fine-tuning-based baselines in terms of stylization quality and efficiency.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.02986 [pdf, other]

Universal Functional Regression with Neural Operator Flows

Authors: Yaozhong Shi, Angela F. Gao, Zachary E. Ross, Kamyar Azizzadenesheli

Abstract: Regression on function spaces is typically limited to models with Gaussian process priors. We introduce the notion of universal functional regression, in which we aim to learn a prior distribution over non-Gaussian function spaces that remains mathematically tractable for functional regression. To do this, we develop Neural Operator Flows (OpFlow), an infinite-dimensional extension of normalizing flows. OpFlow is an invertible operator that maps the (potentially unknown) data function space into a Gaussian process, allowing for exact likelihood estimation of functional point evaluations. OpFlow enables robust and accurate uncertainty quantification via drawing posterior samples of the Gaussian process and subsequently mapping them into the data function space. We empirically study the performance of OpFlow on regression and generation tasks with data generated from Gaussian processes with known posterior forms and non-Gaussian processes, as well as real-world earthquake seismograms with an unknown closed-form distribution.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2403.15912 [pdf, other]

doi 10.1038/s41586-024-07211-8

Observation of the dual quantum spin Hall insulator by density-tuned correlations in a van der Waals monolayer

Authors: Jian Tang, Thomas Siyuan Ding, Hongyu Chen, Anyuan Gao, Tiema Qian, Zumeng Huang, Zhe Sun, Xin Han, Alex Strasser, Jiangxu Li, Michael Geiwitz, Mohamed Shehabeldin, Vsevolod Belosevich, Zihan Wang, Yiping Wang, Kenji Watanabe, Takashi Taniguchi, David C. Bell, Ziqiang Wang, Liang Fu, Yang Zhang, Xiaofeng Qian, Kenneth S. Burch, Youguo Shi, Ni Ni, et al. (3 additional authors not shown)

Abstract: The convergence of topology and correlations represents a highly coveted realm in the pursuit of novel quantum states of matter. Introducing electron correlations to a quantum spin Hall (QSH) insulator can lead to the emergence of a fractional topological insulator and other exotic time-reversal-symmetric topological order, not possible in quantum Hall and Chern insulator systems. However, the QSH insulator with quantized edge conductance remains rare, let alone that with significant correlations. In this work, we report a novel dual QSH insulator within the intrinsic monolayer crystal of TaIrTe4, arising from the interplay of its single-particle topology and density-tuned electron correlations. At charge neutrality, monolayer TaIrTe4 demonstrates the QSH insulator that aligns with single-particle band structure calculations, manifesting enhanced nonlocal transport and quantized helical edge conductance. Interestingly, upon introducing electrons from charge neutrality, TaIrTe4 only shows metallic behavior in a small range of charge densities but quickly goes into a new insulating state, entirely unexpected based on TaIrTe4's single-particle band structure. This insulating state could arise from a strong electronic instability near the van Hove singularities (VHS), likely leading to a charge density wave (CDW). Remarkably, within this correlated insulating gap, we observe a resurgence of the QSH state, marked by the revival of nonlocal transport and quantized helical edge conduction.
Our observation of helical edge conduction in a CDW gap could bridge spin physics and charge orders. The discovery of a dual QSH insulator introduces a new method for creating topological flat minibands via CDW superlattices, which offer a promising platform for exploring time-reversal-symmetric fractional phases and electromagnetism.

Submitted 23 March, 2024; originally announced March 2024.

Comments: 23 pages, 15 figures, submitted version

arXiv:2403.03640 [pdf, other]

Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People

Authors: Xidong Wang, Nuo Chen, Junyin Chen, Yidong Wang, Guorui Zhen, Yan Hu, Xiangbo Wu, Anningzhe Gao, Xiang Wan, Haizhou Li, Benyou Wang

Abstract: Despite the vast repository of global medical knowledge predominantly being in English, local languages are crucial for delivering tailored healthcare services, particularly in areas with limited medical resources. To extend the reach of medical AI advancements to a broader population, we aim to develop medical LLMs across the six most widely spoken languages, encompassing a global population of 6.1 billion. This effort culminates in the creation of the ApolloCorpora multilingual medical dataset and the XMedBench benchmark. In the multilingual medical benchmark, the released Apollo models, at various relatively small sizes (i.e., 0.5B, 1.8B, 2B, 6B, and 7B), achieve the best performance among models of equivalent size. In particular, Apollo-7B is the state-of-the-art multilingual medical LLM among models up to 70B. Additionally, these lite models could be used to improve the multilingual medical capabilities of larger models without fine-tuning, in a proxy-tuning fashion. We will open-source training corpora, code, model weights and evaluation benchmark.

Submitted 16 August, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: Preprint

arXiv:2401.10956 [pdf, other]

AI Revolution on Chat Bot: Evidence from a Randomized Controlled Experiment

Authors: Sida Peng, Wojciech Swiatek, Allen Gao, Paul Cullivan, Haoge Chang

Abstract: In recent years, generative AI has undergone major advancements, demonstrating significant promise in augmenting human productivity. Notably, large language models (LLMs), with ChatGPT-4 as an example, have drawn considerable attention. Numerous articles have examined the impact of LLM-based tools on human productivity in lab settings and designed tasks or in observational studies. Despite recent advances, field experiments applying LLM-based tools in realistic settings are limited. This paper presents the findings of a field randomized controlled trial assessing the effectiveness of LLM-based tools in providing unmonitored support services for information retrieval.

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2401.00931 [pdf, other]

doi 10.1007/JHEP05(2024)328

A Collinear Perspective on the Regge Limit

Authors: Anjie Gao, Ian Moult, Sanjay Raman, Gregory Ridgway, Iain W. Stewart

Abstract: The high energy (Regge) limit provides a playground for understanding all loop structures of scattering amplitudes, and plays an important role in the description of many phenomenologically relevant cross-sections. While well understood in the planar limit, the structure of non-planar corrections introduces many fascinating complexities, for which a general organizing principle is still lacking. We study the structure of multi-reggeon exchanges in the context of the effective field theory for forward scattering, and derive their factorization into collinear operators (impact factors) and soft operators. We derive the structure of the renormalization group consistency equations in the effective theory, showing how the anomalous dimensions of the soft operators are related to those of the collinear operators, allowing us to derive renormalization group equations in the Regge limit purely from a collinear perspective. The rigidity of the consistency equations provides considerable insight into the all orders organization of Regge amplitudes in the effective theory, as well as its relation to other approaches. Along the way we derive a number of technical results that improve the understanding of the effective theory. We illustrate this collinear perspective by re-deriving all the standard BFKL equations for two-Glauber exchange from purely collinear calculations, and we show that this perspective provides a number of conceptual and computational advantages as compared to the standard view from soft or Glauber physics.
We anticipate that this formulation in terms of collinear operators will enable a better understanding of the relation between BFKL and DGLAP in gauge theories, and facilitate the analysis of renormalization group evolution equations describing Reggeization beyond next-to-leading order.

Submitted 29 August, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

Comments: 48 pages + references, 10 figures; v2: JHEP version; v3: fix references

Report number: MIT-CTP 5628

Journal ref: JHEP 05 (2024) 328

arXiv:2312.16408 [pdf, other]

The Transverse Energy-Energy Correlator at Next-to-Next-to-Next-to-Leading Logarithm

Authors: Anjie Gao, Hai Tao Li, Ian Moult, Hua Xing Zhu

Abstract: We present an operator-based factorization formula for the transverse energy-energy correlator in the back-to-back (dijet) region, and uncover its remarkable perturbative simplicity and relation to transverse momentum dynamics. This simplicity enables us to achieve next-to-next-to-next-to-leading logarithmic (N$^3$LL) accuracy for a hadron collider dijet event shape for the first time. Our factorization formula applies to color singlet, $W/Z/γ$ + jet, and dijet production, providing a natural generalization of transverse momentum observables to one- and two-jet final states. This provides a laboratory for precision studies of QCD and transverse momentum dynamics at hadron colliders, as well as an opportunity for understanding factorization and its violation in a perturbatively well controlled setting.

Submitted 26 December, 2023; originally announced December 2023.

Comments: 54 pages, 12 figures

Report number: MIT-CTP 5662

arXiv:2311.13951 [pdf, other]

MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria

Authors: Wentao Ge, Shunian Chen, Guiming Hardy Chen, Zhihong Chen, Junying Chen, Shuo Yan, Chenghao Zhu, Ziyue Lin, Wenya Xie, Xinyi Zhang, Yichen Chai, Xiaoyu Liu, Dingjie Song, Xidong Wang, Anningzhe Gao, Zhiyi Zhang, Jianquan Li, Xiang Wan, Benyou Wang

Abstract: Multimodal large language models (MLLMs) (e.g., GPT-4V, LLaVA, and Claude-3) have broadened the scope of AI applications. Yet, evaluating their performance presents a significant challenge owing to the inherently subjective nature of tasks that do not yield clear-cut solutions, especially open-ended queries. Existing automatic evaluation methodologies are mainly limited to evaluating objective queries without considering real-world user experiences, inadequately addressing the nuances of creative and associative multimodal tasks. In our paper, we propose a new evaluation paradigm for MLLMs: evaluating MLLMs with per-sample criteria, using a potent MLLM as the judge. To validate the feasibility and effectiveness of this paradigm, we design a benchmark, dubbed MLLM-Bench, with evaluation samples across six critical levels following the revised Bloom's Taxonomy, together with ethical considerations. We benchmark 21 popular MLLMs in a pairwise-comparison fashion, showing diverse performance across models. Moreover, the validity of our benchmark manifests itself in reaching 88.02% agreement with human evaluation. We contend that the proposed paradigm explores the potential of MLLMs as effective evaluation tools with the help of per-sample criteria, and that MLLM-Bench will serve as a catalyst for encouraging the development of user-centric MLLMs tailored to real-world applications. Our benchmark data, online leaderboard and submission entry are at https://mllm-bench.llmzoo.com.

Submitted 27 April, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

Comments: 23 pages

arXiv:2311.09774 [pdf, other]

HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs

Authors: Junying Chen, Xidong Wang, Anningzhe Gao, Feng Jiang, Shunian Chen, Hongbo Zhang, Dingjie Song, Wenya Xie, Chuyi Kong, Jianquan Li, Xiang Wan, Haizhou Li, Benyou Wang

Abstract: Adapting a language model to a specific domain, a.k.a. 'domain adaption', is a common practice when specialized knowledge, e.g. medicine, is not encapsulated in a general language model like Llama2. The challenge lies in the heterogeneity of data across the two training stages, as it varies in languages, genres, or formats. To tackle this and simplify the learning protocol, we propose to transform heterogeneous data, from both the pre-training and supervised stages, into a unified, simple input-output pair format. We validate the new protocol in domains where proprietary LLMs like ChatGPT perform relatively poorly, such as Traditional Chinese Medicine. The developed model, HuatuoGPT-II, has shown state-of-the-art performance in the Chinese medicine domain on a number of benchmarks, e.g. medical licensing exams. It even outperforms proprietary models like ChatGPT and GPT-4 in some aspects, especially in Traditional Chinese Medicine. Expert manual evaluations further validate HuatuoGPT-II's advantages over existing LLMs. Notably, HuatuoGPT-II was benchmarked on a fresh Chinese National Medical Licensing Examination where it achieved the best performance, showcasing not only its effectiveness but also its generalization capabilities.

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.09724 [pdf, other]

OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning

Authors: Fei Yu, Anningzhe Gao, Benyou Wang

Abstract: Large language models (LLMs) often struggle with maintaining accuracy throughout multiple reasoning steps, especially in mathematical reasoning, where an error in earlier steps can propagate to subsequent ones, ultimately leading to an incorrect answer. To reduce error propagation, guided decoding is employed to direct the LM decoding on a step-by-step basis. We argue that in guided decoding, assessing the potential of an incomplete reasoning path can be more advantageous than simply ensuring per-step correctness, as the former approach leads towards a correct final answer. This transforms the task into a value estimation problem in planning. Inspired by the finding that outcome supervision for guided decoding essentially acts as a value model, we propose the Outcome-supervised Value Model (OVM), which employs outcome supervision for training a value model that prioritizes steps leading to accurate conclusions. Furthermore, the OVM eliminates the need for labor-intensive annotations of step-level correctness, thereby significantly enhancing its scalability. Our experiments on two multi-step mathematical reasoning datasets, GSM8K and Game of 24, demonstrate the superior performance of the OVM model. Notably, in GSM8K, our OVM-7B model achieves state-of-the-art results among LLMs up to 13B parameters, and it does so without using GPT-4 or code execution.
These findings offer a novel perspective on the role of outcome supervision in training value models for multi-step reasoning tasks and provide theoretical justification for its advantage in value estimation for guided decoding.

Submitted 1 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: Accepted to NAACL findings. https://github.com/FreedomIntelligence/OVM

arXiv:2311.06929 [pdf, ps, other]

The combinatorics behind the leading Kazhdan-Lusztig coefficients of braid matroids

Authors: Alice L. L. Gao, Nicholas Proudfoot, Arthur L. B. Yang, Zhong-Xue Zhang

Abstract: Ferroni and Larson gave a combinatorial interpretation of the braid Kazhdan-Lusztig polynomials in terms of series-parallel matroids. As a consequence, they confirmed an explicit formula for the leading Kazhdan-Lusztig coefficients of braid matroids with odd rank, as conjectured by Elias, Proudfoot, and Wakefield. Based on Ferroni and Larson's work, we further explore the combinatorics behind the leading Kazhdan-Lusztig coefficients of braid matroids. The main results of this paper include an explicit formula for the leading Kazhdan-Lusztig coefficients of braid matroids with even rank, a simple expression for the number of simple series-parallel matroids of rank k + 1 on 2k elements, and explicit formulas for the leading coefficients of inverse Kazhdan-Lusztig polynomials of braid matroids. The binomial identity for the Abel polynomials plays an important role in the proofs of these formulas.

Submitted 12 November, 2023; originally announced November 2023.

MSC Class: 05B35; 05A15; 05A19

arXiv:2309.12119 [pdf, other]

Pseudo-Bayesian unit level modeling for small area estimation under informative sampling

Authors: Peter A. Gao, Jon Wakefield

Abstract: When mapping subnational health and demographic indicators, direct weighted estimators of small area means based on household survey data can be unreliable when data are limited. If survey microdata are available, unit level models can relate individual survey responses to unit level auxiliary covariates and explicitly account for spatial dependence and between area variation using random effects. These models can produce estimators with improved precision, but often neglect to account for the design of the surveys used to collect data. Pseudo-Bayesian approaches incorporate sampling weights to address informative sampling when using such models to conduct population inference but credible sets based on the resulting pseudo-posterior distributions can be poorly calibrated without adjustment. We outline a pseudo-Bayesian strategy for small area estimation that addresses informative sampling and incorporates a post-processing rescaling step that produces credible sets with close to nominal empirical frequentist coverage rates. We compare our approach with existing design-based and model-based estimators using real and simulated data.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.04581 [pdf, other]

Dynamic Mesh-Aware Radiance Fields

Authors: Yi-Ling Qiao, Alexander Gao, Yiran Xu, Yue Feng, Jia-Bin Huang, Ming C. Lin

Abstract: Embedding polygonal mesh assets within photorealistic Neural Radiance Fields (NeRF) volumes, such that they can be rendered and their dynamics simulated in a physically consistent manner with the NeRF, is under-explored from the system perspective of integrating NeRF into the traditional graphics pipeline. This paper designs a two-way coupling between mesh and NeRF during rendering and simulation. We first review the light transport equations for both mesh and NeRF, then distill them into an efficient algorithm for updating radiance and throughput along a cast ray with an arbitrary number of bounces. To resolve the discrepancy between the linear color space that the path tracer assumes and the sRGB color space that standard NeRF uses, we train NeRF with High Dynamic Range (HDR) images. We also present a strategy to estimate light sources and cast shadows on the NeRF. Finally, we consider how the hybrid surface-volumetric formulation can be efficiently integrated with a high-performance physics simulator that supports cloth, rigid and soft bodies. The full rendering and simulation system can be run on a GPU at interactive rates. We show that a hybrid system approach outperforms alternatives in visual realism for mesh insertion, because it allows realistic light transport from volumetric NeRF media onto surfaces, which affects the appearance of reflective/refractive surfaces and illumination of diffuse surfaces informed by the dynamic scene.

Submitted 8 September, 2023; originally announced September 2023.

Comments: ICCV 2023

arXiv:2307.15603 [pdf, other]

doi 10.1038/s41467-024-47291-8

Nonlinear optical diode effect in a magnetic Weyl semimetal

Authors: Christian Tzschaschel, Jian-Xiang Qiu, Xue-Jian Gao, Hou-Chen Li, Chunyu Guo, Hung-Yu Yang, Cheng-Ping Zhang, Ying-Ming Xie, Yu-Fei Liu, Anyuan Gao, Damien Bérubé, Thao Dinh, Sheng-Chin Ho, Yuqiang Fang, Fuqiang Huang, Johanna Nordlander, Qiong Ma, Fazel Tafti, Philip J. W. Moll, Kam Tuen Law, Su-Yang Xu

Abstract: Diode effects are of great interest for both fundamental physics and modern technologies. Electrical diode effects (nonreciprocal transport) have been observed in Weyl systems. Optical diode effects arising from the Weyl fermions have been theoretically considered but not probed experimentally. Here, we report the observation of a nonlinear optical diode effect (NODE) in the magnetic Weyl semimetal CeAlSi, where the magnetization introduces a pronounced directionality in the nonlinear optical second-harmonic generation (SHG). We demonstrate a six-fold change of the measured SHG intensity between opposite propagation directions over a bandwidth exceeding 250 meV. Supported by density-functional theory, we establish the linearly dispersive bands emerging from Weyl nodes as the origin of this broadband effect. We further demonstrate current-induced magnetization switching and thus electrical control of the NODE. Our results advance ongoing research to identify novel nonlinear optical/transport phenomena in magnetic topological materials and further open new pathways for the unidirectional manipulation of light.

Submitted 8 April, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: 20 pages, 4 figures, SI included

Journal ref: Nat. Commun. 15, 3017 (2024)

arXiv:2307.10539 [pdf, ps, other]

Induced log-concavity of equivariant matroid invariants

Authors: Alice L. L. Gao, Ethan Y. H. Li, Matthew H. Y. Xie, Arthur L. B. Yang, Zhong-Xue Zhang

Abstract: Inspired by the notion of equivariant log-concavity, we introduce the concept of induced log-concavity for a sequence of representations of a finite group. For an equivariant matroid equipped with a symmetric group action or a finite general linear group action, we transform the problem of proving the induced log-concavity of matroid invariants to that of proving the Schur positivity of symmetric functions. We prove the induced log-concavity of the equivariant Kazhdan-Lusztig polynomials of $q$-niform matroids equipped with the action of a finite general linear group, as well as that of the equivariant Kazhdan-Lusztig polynomials of uniform matroids equipped with the action of a symmetric group. As a consequence of the former, we obtain the log-concavity of Kazhdan-Lusztig polynomials of $q$-niform matroids, thus providing further positive evidence for Elias, Proudfoot and Wakefield's log-concavity conjecture on the matroid Kazhdan-Lusztig polynomials. From the latter we obtain the log-concavity of Kazhdan-Lusztig polynomials of uniform matroids, which was recently proved by Xie and Zhang by using a computer algebra approach. We also establish the induced log-concavity of the equivariant characteristic polynomials and the equivariant inverse Kazhdan-Lusztig polynomials for $q$-niform matroids and uniform matroids.

Submitted 19 July, 2023; originally announced July 2023.

Comments: 36 pages

MSC Class: 05B35; 05E05; 20C30

arXiv:2307.09793 [pdf]

On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models

Authors: Sarah Gao, Andrew Kean Gao

Abstract: Since late 2022, Large Language Models (LLMs) have become very prominent with LLMs like ChatGPT and Bard receiving millions of users. Hundreds of new LLMs are announced each week, many of which are deposited to Hugging Face, a repository of machine learning models and datasets. To date, nearly 16,000 Text Generation models have been uploaded to the site. Given the huge influx of LLMs, it is of interest to know which LLM backbones, settings, training methods, and families are popular or trending. However, there is no comprehensive index of LLMs available. We take advantage of the relatively systematic nomenclature of Hugging Face LLMs to perform hierarchical clustering and identify communities amongst LLMs using n-grams and term frequency-inverse document frequency. Our methods successfully identify families of LLMs and accurately cluster LLMs into meaningful subgroups. We present a public web application to navigate and explore Constellation, our atlas of 15,821 LLMs. Constellation rapidly generates a variety of visualizations, namely dendrograms, graphs, word clouds, and scatter plots. Constellation is available at the following link: https://constellation.sites.stanford.edu/.

Submitted 19 July, 2023; originally announced July 2023.

Comments: 14 pages, 6 figures, 1 table

ACM Class: I.2.1; H.5.0

arXiv:2307.05537 [pdf]

NLP Meets RNA: Unsupervised Embedding Learning for Ribozymes with Word2Vec

Authors: Andrew Kean Gao

Abstract: Ribozymes, RNA molecules with distinct 3D structures and catalytic activity, have widespread applications in synthetic biology and therapeutics. However, relatively little research has focused on leveraging deep learning to enhance our understanding of ribozymes. This study implements Word2Vec, an unsupervised learning technique for natural language processing, to learn ribozyme embeddings. Ribo2Vec was trained on over 9,000 diverse ribozymes, learning to map sequences to 128- and 256-dimensional vector spaces. Using Ribo2Vec, sequence embeddings for five classes of ribozymes (hatchet, pistol, hairpin, hovlinc, and twister sister) were calculated. Principal component analysis demonstrated the ability of these embeddings to distinguish between ribozyme classes. Furthermore, a simple SVM classifier trained on ribozyme embeddings showed promising results in accurately classifying ribozyme types. Our results suggest that the embedding vectors contained meaningful information about ribozymes. Interestingly, 256-dimensional embeddings behaved similarly to 128-dimensional embeddings, suggesting that a lower dimension vector space is generally sufficient to capture ribozyme features. This approach demonstrates the potential of Word2Vec for bioinformatics, opening new avenues for ribozyme research. Future research includes using a Transformer-based method to learn RNA embeddings, which can capture long-range interactions between nucleotides.

Submitted 8 July, 2023; originally announced July 2023.

ACM Class: I.2.7

arXiv:2307.00213 [pdf]

More for Less: Compact Convolutional Transformers Enable Robust Medical Image Classification with Limited Data

Authors: Andrew Kean Gao

Abstract: Transformers are very powerful tools for a variety of tasks across domains, from text generation to image captioning. However, transformers require substantial amounts of training data, which is often a challenge in biomedical settings, where high quality labeled data can be challenging or expensive to obtain. This study investigates the efficacy of Compact Convolutional Transformers (CCT) for robust medical image classification with limited data, addressing a key issue faced by conventional Vision Transformers - their requirement for large datasets. A hybrid of transformers and convolutional layers, CCTs demonstrate high accuracy on modestly sized datasets. We employed a benchmark dataset of peripheral blood cell images of eight distinct cell types, each represented by approximately 2,000 low-resolution (28x28x3 pixel) samples. Despite the dataset size being smaller than those typically used with Vision Transformers, we achieved a commendable classification accuracy of 92.49% and a micro-average ROC AUC of 0.9935. The CCT also learned quickly, exceeding 80% validation accuracy after five epochs. Analysis of per-class precision, recall, F1, and ROC showed that performance was strong across cell types. Our findings underscore the robustness of CCTs, indicating their potential as a solution to data scarcity issues prevalent in biomedical imaging. We substantiate the applicability of CCTs in data-constrained areas and encourage further work on CCTs.

Submitted 30 June, 2023; originally announced July 2023.

Comments: 9 pages, 4 figures, 2 tables

MSC Class: I.4.9; I.2.10

arXiv:2306.14461 [pdf, other]

Polynomial-based Online Planning for Autonomous Drone Racing in Dynamic Environments

Authors: Qianhao Wang, Dong Wang, Chao Xu, Alan Gao, Fei Gao

Abstract: In recent years, there has been noteworthy advancement in autonomous drone racing. However, the primary focus is on attaining execution times, while scant attention is given to the challenges of dynamic environments. The high-speed nature of racing scenarios, coupled with the potential for unforeseeable environmental alterations, presents stringent requirements for online replanning and its timeliness. For racing in dynamic environments, we propose an online replanning framework with an efficient polynomial trajectory representation. We trade off between aggressive speed and flexible obstacle avoidance based on an optimization approach. Additionally, to ensure safety and precision when crossing intermediate racing waypoints, we formulate the demand as hard constraints during planning. For dynamic obstacles, parallel multi-topology trajectory planning is designed based on engineering considerations to prevent racing time loss due to local optimums. The framework is integrated into a quadrotor system and successfully demonstrated at the DJI Robomaster Intelligent UAV Championship, where it completed the racing track and placed first, finishing in less than half the time of the second-place finisher.

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.12689 [pdf]

Vec2Vec: A Compact Neural Network Approach for Transforming Text Embeddings with High Fidelity

Authors: Andrew Kean Gao

Abstract: Vector embeddings have become ubiquitous tools for many language-related tasks. A leading embedding model is OpenAI's text-ada-002 which can embed approximately 6,000 words into a 1,536-dimensional vector. While powerful, text-ada-002 is not open source and is only available via API. We trained a simple neural network to convert open-source 768-dimensional MPNet embeddings into text-ada-002 embeddings. We compiled a subset of 50,000 online food reviews. We calculated MPNet and text-ada-002 embeddings for each review and trained a simple neural network for 75 epochs. The neural network was designed to predict the corresponding text-ada-002 embedding for a given MPNet embedding. Our model achieved an average cosine similarity of 0.932 on 10,000 unseen reviews in our held-out test dataset. We manually assessed the quality of our predicted embeddings for vector search over text-ada-002-embedded reviews. While not as good as real text-ada-002 embeddings, predicted embeddings were able to retrieve highly relevant reviews. Our final model, Vec2Vec, is lightweight (<80 MB) and fast. Future steps include training a neural network with a more sophisticated architecture and a larger dataset of paired embeddings to achieve greater performance. The ability to convert between and align embedding spaces may be helpful for interoperability, limiting dependence on proprietary models, protecting data privacy, reducing costs, and offline operations.

Submitted 22 June, 2023; originally announced June 2023.

Comments: 14 pages, 6 figures, 5 tables

ACM Class: I.2.7; D.2.12
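
The Vec2Vec abstract above reports its result as an average cosine similarity between predicted and real embeddings. As a rough illustration of that metric only (not the authors' evaluation code), the following minimal sketch computes it in plain Python; the function names `cosine_similarity` and `mean_cosine_similarity` are hypothetical and chosen here for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_cosine_similarity(predicted, target):
    """Average cosine similarity over paired lists of vectors.

    `predicted` and `target` are hypothetical stand-ins for predicted and
    reference embeddings; both must be non-empty and of equal length.
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    scores = [cosine_similarity(p, t) for p, t in zip(predicted, target)]
    return sum(scores) / len(scores)
```

A value near 1 means the predicted vector points in almost the same direction as the reference vector, which is why the metric suits comparing embeddings whose magnitudes may differ.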

arXiv:2306.09575 [pdf, other]

doi 10.1126/science.adf1506

Quantum metric nonlinear Hall effect in a topological antiferromagnetic heterostructure

Authors: Anyuan Gao, Yu-Fei Liu, Jian-Xiang Qiu, Barun Ghosh, Thaís V. Trevisan, Yugo Onishi, Chaowei Hu, Tiema Qian, Hung-Ju Tien, Shao-Wen Chen, Mengqi Huang, Damien Bérubé, Houchen Li, Christian Tzschaschel, Thao Dinh, Zhe Sun, Sheng-Chin Ho, Shang-Wei Lien, Bahadur Singh, Kenji Watanabe, Takashi Taniguchi, David C. Bell, Hsin Lin, Tay-Rong Chang, Chunhui Rita Du, et al. (6 additional authors not shown)

Abstract: Quantum geometry - the geometry of electron Bloch wavefunctions - is central to modern condensed matter physics. Due to the quantum nature, quantum geometry has two parts, the real part quantum metric and the imaginary part Berry curvature. The studies of Berry curvature have led to countless breakthroughs, ranging from the quantum Hall effect in 2DEGs to the anomalous Hall effect (AHE) in ferromagnets. However, in contrast to Berry curvature, the quantum metric has rarely been explored. Here, we report a new nonlinear Hall effect induced by quantum metric by interfacing even-layered MnBi2Te4 (a PT-symmetric antiferromagnet (AFM)) with black phosphorus. This novel nonlinear Hall effect switches direction upon reversing the AFM spins and exhibits distinct scaling that suggests a non-dissipative nature. Just as the AHE brought Berry curvature under the spotlight, our results open the door to discovering quantum metric responses. Moreover, we demonstrate that the AFM can harvest wireless electromagnetic energy via the new nonlinear Hall effect, therefore enabling intriguing applications that bridge nonlinear electronics with AFM spintronics.

Submitted 23 July, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: 19 pages, 4 figures and a Supplementary Materials with 66 pages, 4 figures and 3 tables. Originally submitted to Science on Oct. 5, 2022

Journal ref: Science 381, 181-186 (2023)

arXiv:2306.03922 [pdf, other]

Electronic ratchet effect in a moiré system: signatures of excitonic ferroelectricity

Authors: Zhiren Zheng, Xueqiao Wang, Ziyan Zhu, Stephen Carr, Trithep Devakul, Sergio de la Barrera, Nisarga Paul, Zumeng Huang, Anyuan Gao, Yang Zhang, Damien Bérubé, Kathryn Natasha Evancho, Kenji Watanabe, Takashi Taniguchi, Liang Fu, Yao Wang, Su-Yang Xu, Efthimios Kaxiras, Pablo Jarillo-Herrero, Qiong Ma

Abstract: Electronic ferroelectricity represents a new paradigm where spontaneous symmetry breaking driven by electronic correlations, in contrast to traditional lattice-driven ferroelectricity, leads to the formation of electric dipoles. Despite the potential application advantages arising from its electronic nature, switchable electronic ferroelectricity remains exceedingly rare. Here, we report the discovery of an electronic ratchet effect that manifests itself as switchable electronic ferroelectricity in a layer-contrasting graphene-boron nitride moiré heterostructure. Our engineered layer-asymmetric moiré potential landscapes result in layer-polarized localized and itinerant electronic subsystems. At particular fillings of the localized subsystem, we find a ratcheting injection of itinerant carriers in a non-volatile manner, leading to a highly unusual ferroelectric response. Strikingly, the remnant polarization can be stabilized at multiple (quasi-continuous) states with behavior markedly distinct from known ferroelectrics. Our experimental observations, simulations, and theoretical analysis suggest that dipolar excitons are the driving force and elementary ferroelectric units in our system. This signifies a new type of electronic ferroelectricity where the formation of dipolar excitons with aligned moments generates a macroscopic polarization and leads to an electronically-driven ferroelectric response, which we term excitonic ferroelectricity. Such new ferroelectrics, driven by quantum objects like dipolar excitons, could pave the way to innovative quantum analog memory and synaptic devices.

Submitted 6 June, 2023; originally announced June 2023.

arXiv:2304.05589 [pdf, other]

Discovering Structure From Corruption for Unsupervised Image Reconstruction

Authors: Oscar Leong, Angela F. Gao, He Sun, Katherine L. Bouman

Abstract: We consider solving ill-posed imaging inverse problems without access to an image prior or ground-truth examples. An overarching challenge in these inverse problems is that an infinite number of images, including many that are implausible, are consistent with the observed measurements. Thus, image priors are required to reduce the space of possible solutions to more desirable reconstructions. However, in many applications it is difficult or potentially impossible to obtain example images to construct an image prior. Hence inaccurate priors are often used, which inevitably result in biased solutions. Rather than solving an inverse problem using priors that encode the spatial structure of any one image, we propose to solve a set of inverse problems jointly by incorporating prior constraints on the collective structure of the underlying images. The key assumption of our work is that the underlying images we aim to reconstruct share common, low-dimensional structure. We show that such a set of inverse problems can be solved simultaneously without the use of a spatial image prior by instead inferring a shared image generator with a low-dimensional latent space. The parameters of the generator and latent embeddings are found by maximizing a proxy for the Evidence Lower Bound (ELBO). Once identified, the generator and latent embeddings can be combined to provide reconstructed images for each inverse problem. The framework we propose can handle general forward model corruptions, and we show that measurements derived from only a small number of ground-truth images ($\leqslant 150$) are sufficient for image reconstruction. We demonstrate our approach on a variety of convex and non-convex inverse problems, including denoising, phase retrieval, and black hole video reconstruction.

Submitted 1 November, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: Extended version of arXiv:2303.12217

arXiv:2303.12217 [pdf, other]

Image Reconstruction without Explicit Priors

Authors: Angela F. Gao, Oscar Leong, He Sun, Katherine L. Bouman

Abstract: We consider solving ill-posed imaging inverse problems without access to an explicit image prior or ground-truth examples. An overarching challenge in inverse problems is that there are many undesired images that fit to the observed measurements, thus requiring image priors to constrain the space of possible solutions to more plausible reconstructions. However, in many applications it is difficult or potentially impossible to obtain ground-truth images to learn an image prior. Thus, inaccurate priors are often used, which inevitably result in biased solutions. Rather than solving an inverse problem using priors that encode the explicit structure of any one image, we propose to solve a set of inverse problems jointly by incorporating prior constraints on the collective structure of the underlying images. The key assumption of our work is that the ground-truth images we aim to reconstruct share common, low-dimensional structure. We show that such a set of inverse problems can be solved simultaneously by learning a shared image generator with a low-dimensional latent space. The parameters of the generator and latent embedding are learned by maximizing a proxy for the Evidence Lower Bound (ELBO). Once learned, the generator and latent embeddings can be combined to provide reconstructions for each inverse problem. The framework we propose can handle general forward model corruptions, and we show that measurements derived from only a few ground-truth images (O(10)) are sufficient for image reconstruction without explicit priors.

Submitted 21 March, 2023; originally announced March 2023.

Comments: ICASSP 2023

arXiv:2303.05451 [pdf, other]

doi 10.1038/s41563-023-01493-5

Axion optical induction of antiferromagnetic order

Authors: Jian-Xiang Qiu, Christian Tzschaschel, Junyeong Ahn, Anyuan Gao, Houchen Li, Xin-Yue Zhang, Barun Ghosh, Chaowei Hu, Yu-Xuan Wang, Yu-Fei Liu, Damien Bérubé, Thao Dinh, Zhenhao Gong, Shang-Wei Lien, Sheng-Chin Ho, Bahadur Singh, Kenji Watanabe, Takashi Taniguchi, David C. Bell, Hai-Zhou Lu, Arun Bansil, Hsin Lin, Tay-Rong Chang, Brian B. Zhou, Qiong Ma, et al. (3 additional authors not shown)

Abstract: Using circularly-polarized light to control quantum matter is a highly intriguing topic in physics, chemistry and biology. Previous studies have demonstrated helicity-dependent optical control of spatial chirality and magnetization $M$. The former is central for asymmetric synthesis in chemistry and homochirality in bio-molecules, while the latter is of great interest for ferromagnetic spintronics. In this paper, we report the surprising observation of helicity-dependent optical control of fully-compensated antiferromagnetic (AFM) order in 2D even-layered MnBi$_2$Te$_4$, a topological Axion insulator with neither chirality nor $M$. We further demonstrate helicity-dependent optical creation of AFM domain walls by double induction beams and the direct reversal of AFM domains by ultrafast pulses. The control and reversal of AFM domains and domain walls by light helicity have never been achieved in any fully-compensated AFM. To understand this optical control, we study a novel type of circular dichroism (CD) proportional to the AFM order, which only appears in reflection but is absent in transmission. We show that the optical control and CD both arise from the optical Axion electrodynamics, which can be visualized as a Berry curvature real space dipole. Our Axion induction provides the possibility to optically control a family of $\mathcal{PT}$-symmetric AFMs such as Cr$_2$O$_3$, CrI$_3$ and possibly novel states in cuprates. In MnBi$_2$Te$_4$, this further opens the door for optical writing of dissipationless circuits formed by topological edge states.

Submitted 9 March, 2023; originally announced March 2023.

Journal ref: Nature Materials 22, 583-590 (2023)

arXiv:2303.01330 [pdf, other]

Continuous Implicit SDF Based Any-shape Robot Trajectory Optimization

Authors: Tingrui Zhang, Jingping Wang, Chao Xu, Alan Gao, Fei Gao

Abstract: Optimization-based trajectory generation methods are widely used in whole-body planning for robots. However, existing work either oversimplifies the robot's geometry and environment representation, resulting in a conservative trajectory, or suffers from a huge overhead in maintaining additional information such as the Signed Distance Field (SDF). To bridge the gap, we consider the robot as an implicit function, with its surface boundary represented by the zero-level set of its SDF. Based on this, we further employ another implicit function to lazily compute the signed distance to the swept volume generated by the robot and its trajectory. The computation is efficient by exploiting continuity in space-time, and the implicit function guarantees precise and continuous collision evaluation even for nonconvex robots with complex surfaces. Furthermore, we propose a trajectory optimization pipeline applicable to the implicit SDF. Simulation and real-world experiments validate the high performance of our approach for arbitrarily shaped robot trajectory optimization.

Submitted 2 March, 2023; originally announced March 2023.

arXiv:2301.07188 [pdf, other]

Dielectric Saturation in Water from a Long Range Machine Learning Model

Authors: Harender S. Dhattarwal, Ang Gao, Richard C. Remsing

Abstract: Machine learning-based neural network potentials have the ability to provide ab initio-level predictions while reaching large length and time scales often limited to empirical force fields. Traditionally, neural network potentials rely on a local description of atomic environments to achieve this scalability. These local descriptions result in short range models that neglect long range interactions necessary for processes like dielectric screening in polar liquids. Several approaches to including long range electrostatic interactions within neural network models have appeared recently, and here we investigate the transferability of one such model, the self consistent neural network (SCFNN), which focuses on learning the physics associated with long range response. By learning the essential physics, one can expect that such a neural network model should exhibit at least partial transferability. We illustrate this transferability by modeling dielectric saturation in a SCFNN model of water. We show that the SCFNN model can predict non-linear response at high electric fields, including saturation of the dielectric constant, without training the model on these high field strengths and the resulting liquid configurations. We then use these simulations to examine the nuclear and electronic structure changes underlying dielectric saturation. Our results suggest that neural network models can exhibit transferability beyond the linear response regime and make genuine predictions when the relevant physics is properly learned.

Submitted 17 January, 2023; originally announced January 2023.

Comments: 10 pages, 7 figures

arXiv:2212.05155 [pdf, other]

Acela: Predictable Datacenter-level Maintenance Job Scheduling

Authors: Yi Ding, Aijia Gao, Thibaud Ryden, Kaushik Mitra, Sukumar Kalmanje, Yanai Golany, Michael Carbin, Henry Hoffmann

Abstract: Datacenter operators ensure fair and regular server maintenance by using automated processes to schedule maintenance jobs to complete within a strict time budget. Automating this scheduling problem is challenging because maintenance job duration varies based on both job type and hardware. While it is tempting to use prior machine learning techniques for predicting job duration, we find that the structure of the maintenance job scheduling problem creates a unique challenge. In particular, we show that prior machine learning methods that produce the lowest error predictions do not produce the best scheduling outcomes due to asymmetric costs. Specifically, underpredicting maintenance job duration results in more servers being taken offline and longer server downtime than overpredicting maintenance job duration. The system cost of underprediction is much larger than that of overprediction. We present Acela, a machine learning system for predicting maintenance job duration, which uses quantile regression to bias duration predictions toward overprediction. We integrate Acela into a maintenance job scheduler and evaluate it on datasets from large-scale, production datacenters. Compared to machine learning based predictors from prior work, Acela reduces the number of servers that are taken offline by 1.87-4.28X, and reduces the server offline time by 1.40-2.80X.

Submitted 9 December, 2022; originally announced December 2022.
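
The Acela abstract above says quantile regression is used to bias duration predictions toward overprediction. One common way to train a quantile regressor is the pinball (quantile) loss, sketched below in plain Python. This is an illustration of the general technique under stated assumptions, not Acela's implementation: the abstract does not specify the loss or the quantile level, and the names `pinball_loss`, `mean_pinball_loss`, and `tau` are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss for one prediction at quantile level tau in (0, 1).

    With tau > 0.5, underprediction (y_pred < y_true) costs more than an
    overprediction of the same size, so minimizing the loss pushes
    predictions upward.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    diff = y_true - y_pred
    # diff > 0: the model underpredicted; weight the error by tau.
    # diff < 0: the model overpredicted; weight the error by (1 - tau).
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def mean_pinball_loss(ys, preds, tau):
    """Average pinball loss over paired true values and predictions."""
    if len(ys) != len(preds) or not ys:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(pinball_loss(y, p, tau) for y, p in zip(ys, preds)) / len(ys)
```

With `tau = 0.9`, for example, underpredicting a job's duration by two units is penalized about nine times as heavily as overpredicting it by two units, which mirrors the asymmetric cost the abstract describes. With `tau = 0.5` the two errors cost the same.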

arXiv:2212.01468 [pdf, ps, other]

The Struggles of Chessland

Authors: Irene Choi, Shreyas Ekanathan, Aidan Gao, Tanya Khovanova, Sylvia Zia Lee, Rajarshi Mandal, Vaibhav Rastogi, Daniel Sheffield, Michael Yang, Angela Zhao, Corey Zhao

Abstract: This is a fairy tale taking place in Chessland, located in the Bermuda triangle. The chess pieces survey their land and trap enemy pieces. Behind the story, there is fascinating mathematics on how to optimize surveying and trapping. The tale is written by the students in the PRIMES STEP junior group, who were in grades 6 through 9. The paper has a conclusion, written by the group's mentor, Tanya Khovanova, explaining the students' results in terms of graph theory.

Submitted 2 December, 2022; originally announced December 2022.

Comments: 31 pages, 32 figures, 5 tables

MSC Class: 00A08; 05C99

arXiv:2211.13329 [pdf, other]

Extent of Safety Database in Pediatric Drug Development: Types of Assessment, Analytical Precision, and Pathway for Extrapolation through On-Target Effects

Authors: Margaret Gamalo, Yihua Zhao, Aijun Gao, Jingjing Ye, Ralph DeMasi, Eiji Eshida, YJ Choi, Robert Nelson

Abstract: Pediatric patients should have access to medicines that have been appropriately evaluated for safety and efficacy. Given this goal of revised labelling, the adequacy of the pediatric clinical development plan and resulting safety database must inform a favorable benefit-risk assessment for the intended use of the medicinal product. While extrapolation from adults can be used to support efficacy of drugs in children, there may be a reluctance to use the same approach in safety assessments, wiping out potential gains in trial efficiency through a reduction of sample size. To address this reluctance, we explore safety review in pediatric trials, including factors affecting these data, specific types of safety assessments, and precision on the estimation of event rates for specific adverse events (AEs) that can be achieved. In addition, we discuss the assessments which can provide a benchmark for the use of extrapolation of safety that focuses on on-target effects. Finally, we explore a unified approach for understanding precision using Bayesian approaches as the most appropriate methodology to describe/ascertain risk in probabilistic terms for the estimate of the event rate of specific AEs.

Submitted 23 November, 2022; originally announced November 2022.

arXiv:2211.06573 [pdf]

Approaching intrinsic threshold breakdown voltage and ultra-high gain in graphite/InSe Schottky photodetector

Authors: Zhiyi Zhang, Bin Cheng, Jeremy Lim, Anyuan Gao, Lingyuan Lyu, Tianju Cao, Shuang Wang, Zhu-An Li, Qingyun Wu, L. K. Ang, Yee Sin Ang, Shi-Jun Liang, Feng Miao

Abstract: Realizing both ultra-low breakdown voltage and ultra-high gain has been one of the major challenges in the development of high-performance avalanche photodetector. Here, we report that an ultra-high avalanche gain of 3*10^5 can be realized in the graphite/InSe Schottky photodetector at a breakdown voltage down to 5.5 V. Remarkably, the threshold breakdown voltage can be further reduced down to 1.8 V by raising the operating temperature, approaching the theoretical limit of 1.5E_g/e with E_g the band gap of semiconductor. We develop a two-dimensional impact ionization model and uncover that observation of high gain at low breakdown voltage arises from reduced dimensionality of electron-phonon (e-ph) scattering in the layered InSe flake. Our findings open up a promising avenue for developing novel weak-light detectors with low energy consumption and high sensitivity.

Submitted 11 November, 2022; originally announced November 2022.

arXiv:2210.12352 [pdf, other]

NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos

Authors: Yi-Ling Qiao, Alexander Gao, Ming C. Lin

Abstract: We present a method for learning 3D geometry and physics parameters of a dynamic scene from only a monocular RGB video input. To decouple the learning of underlying scene geometry from dynamic motion, we represent the scene as a time-invariant signed distance function (SDF) which serves as a reference frame, along with a time-conditioned deformation field. We further bridge this neural geometry representation with a differentiable physics simulator by designing a two-way conversion between the neural field and its corresponding hexahedral mesh, enabling us to estimate physics parameters from the source video by minimizing a cycle consistency loss. Our method also allows a user to interactively edit 3D objects from the source video by modifying the recovered hexahedral mesh, and propagating the operation back to the neural field representation. Experiments show that our method achieves superior mesh and video reconstruction of dynamic scenes compared to competing Neural Field approaches, and we provide extensive examples which demonstrate its ability to extract useful 3D representations from videos captured with consumer-grade cameras.

Submitted 22 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022

Showing 1–50 of 158 results for author: Gao, A