Search | arXiv e-print repository

arXiv:2407.13076 [pdf, other]

Matching-Driven Deep Reinforcement Learning for Energy-Efficient Transmission Parameter Allocation in Multi-Gateway LoRa Networks

Authors: Ziqi Lin, Xu Zhang, Shimin Gong, Lanhua Li, Zhou Su, Bo Gu

Abstract: Long-range (LoRa) communication technology, distinguished by its low power consumption and long communication range, is widely used in the Internet of Things. Nevertheless, the LoRa MAC layer adopts pure ALOHA for medium access control, which may suffer from severe packet collisions as the network scale expands, consequently reducing the system energy efficiency (EE). To address this issue, it is critical to carefully allocate transmission parameters such as the channel (CH), transmission power (TP) and spreading factor (SF) to each end device (ED). Owing to the low duty cycle and sporadic traffic of LoRa networks, evaluating the system EE under various parameter settings proves to be time-consuming. Consequently, we propose an analytical model aimed at calculating the system EE while fully considering the impact of multiple gateways, duty cycling, quasi-orthogonal SFs and capture effects. On this basis, we investigate a joint CH, SF and TP allocation problem, with the objective of optimizing the system EE for uplink transmissions. Due to the NP-hard complexity of the problem, the optimization problem is decomposed into two subproblems: CH assignment and SF/TP assignment. First, a matching-based algorithm is introduced to address the CH assignment subproblem. Then, an attention-based multiagent reinforcement learning technique is employed to address the SF/TP assignment subproblem for EDs allocated to the same CH, which reduces the number of learning agents to achieve fast convergence. The simulation outcomes indicate that the proposed approach converges quickly under various parameter settings and obtains significantly better system EE than baseline algorithms.

Submitted 17 July, 2024; originally announced July 2024.
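
Note on the pure ALOHA collision problem mentioned in the abstract above: the classical textbook result for pure ALOHA under idealized Poisson traffic gives a throughput of S = G·exp(-2G), where G is the offered load in frames per frame time. The sketch below only illustrates that textbook formula. It is not the analytical EE model proposed in the paper, which additionally accounts for multiple gateways, duty cycling, quasi-orthogonal SFs and capture effects. The function names are hypothetical.

```python
import math


def success_probability(offered_load: float) -> float:
    """Probability that a frame sees no overlapping transmission.

    Textbook pure ALOHA assumption: Poisson arrivals with mean `offered_load`
    frames per frame time, and a vulnerable period of two frame times.
    """
    if offered_load < 0:
        raise ValueError("offered load must be non-negative")
    return math.exp(-2.0 * offered_load)


def pure_aloha_throughput(offered_load: float) -> float:
    """Classical pure ALOHA throughput S = G * exp(-2G)."""
    return offered_load * success_probability(offered_load)


if __name__ == "__main__":
    # Throughput peaks at G = 0.5 and falls off as the load keeps growing.
    for g in (0.1, 0.5, 1.0, 2.0):
        print(f"G={g:.1f}  S={pure_aloha_throughput(g):.4f}")
```

Under these idealized assumptions the throughput peaks at G = 0.5 with S = 1/(2e), roughly 18% of channel capacity, which is one way to see why collisions become severe as the network grows.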

arXiv:2407.06476 [pdf, other]

On the fractal dimension of non-Newtonian Hele-Shaw flow subject to Saffman-Taylor instability

Authors: J. Adriazola, B. Gu, L. Cummings, L. Kondic

Abstract: We introduce a discrete numerical method based on the diffusion-limited aggregation (DLA) approach to simulate two-fluid Hele-Shaw flow subject to the Saffman-Taylor interfacial instability, in the case where the displaced fluid is non-Newtonian. Focusing on fluids for which the most relevant non-Newtonian aspect of the thin-gap flow is shear-thinning, we introduce a history-dependent aspect into the algorithm, modeling shear-rate-dependent fluid viscosity. The main finding is that the morphology of the emerging patterns, characterized by the fractal dimension, is modified in a nontrivial manner by the shear-thinning nature of the displaced fluid. In particular, we consistently find that shear-thinning leads to the formation of patterns characterized by a smaller fractal dimension, compared to the corresponding Newtonian fluid.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.03804 [pdf, other]

Multi-Time Scale Service Caching and Pricing in MEC Systems with Dynamic Program Popularity

Authors: Yiming Chen, Xingyuan Hu, Bo Gu, Shimin Gong, Zhou Su

Abstract: In mobile edge computing systems, base stations (BSs) equipped with edge servers can provide computing services to users to reduce their task execution time. However, there is always a conflict of interest between the BS and users. The BS prices the service programs based on user demand to maximize its own profit, while the users determine their offloading strategies based on the prices to minimize their costs. Moreover, service programs need to be pre-cached to meet immediate computing needs. Due to the limited caching capacity and variations in service program popularity, the BS must dynamically select which service programs to cache. Since service caching and pricing have different needs for adjustment time granularities, we propose a two-time scale framework to jointly optimize service caching, pricing and task offloading. For the large time scale, we propose a game-nested deep reinforcement learning algorithm to dynamically adjust service caching according to the estimated popularity information. For the small time scale, by modeling the interaction between the BS and users as a two-stage game, we prove the existence of the equilibrium under incomplete information and then derive the optimal pricing and offloading strategies. Extensive simulations based on a real-world dataset demonstrate the efficiency of the proposed approach.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2406.17386 [pdf, other]

Double Momentum Method for Lower-Level Constrained Bilevel Optimization

Authors: Wanli Shi, Yi Chang, Bin Gu

Abstract: Bilevel optimization (BO) has recently gained prominence in many machine learning applications due to its ability to capture the nested structure inherent in these problems. Recently, many hypergradient methods have been proposed as effective solutions for solving large-scale problems. However, current hypergradient methods for lower-level constrained bilevel optimization (LCBO) problems need very restrictive assumptions, namely that the optimality conditions satisfy differentiability and invertibility conditions, and they lack a solid analysis of the convergence rate. Worse, existing methods require double-loop updates, which are sometimes less efficient. To solve this problem, we propose a new hypergradient of LCBO that leverages the nonsmooth implicit function theorem instead of the restrictive assumptions. In addition, we propose a single-loop single-timescale algorithm based on the double-momentum method and adaptive step size method and prove it can return a $(δ, ε)$-stationary point with $\tilde{\mathcal{O}}(d_2^2ε^{-4})$ iterations. Experiments on two applications demonstrate the effectiveness of our proposed method.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 27 pages, 9 figures

arXiv:2406.07945 [pdf, other]

Making peace with random phases: Ab initio conical intersection dynamics in random gauges

Authors: Xiaotong Zhu, Bing Gu

Abstract: Ab initio modeling of conical intersection dynamics is crucial for various photochemical, photophysical, and biological processes. However, adiabatic electronic states obtained from electronic structure computations involve random phases, or more generally, random gauge fixings, which hampers the modeling of nonadiabatic molecular dynamics. Here we develop a random-gauge local diabatic representation that allows an exact modeling of conical intersection dynamics directly using the adiabatic electronic states with phases randomly assigned during the electronic structure computations. Its utility is demonstrated by an exact ab initio modeling of the two-dimensional Shin-Metiu model with and without an external magnetic field. Our results provide a simple approach to integrating the electronic structure computations into non-adiabatic quantum dynamics, thus paving the way for ab initio modeling of conical intersection dynamics.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.18725 [pdf, other]

Can We Enhance the Quality of Mobile Crowdsensing Data Without Ground Truth?

Authors: Jiajie Li, Bo Gu, Shimin Gong, Zhou Su, Mohsen Guizani

Abstract: Mobile crowdsensing (MCS) has emerged as a prominent trend across various domains. However, ensuring the quality of the sensing data submitted by mobile users (MUs) remains a complex and challenging problem. To address this challenge, an advanced method is required to detect low-quality sensing data and identify malicious MUs that may disrupt the normal operations of an MCS system. Therefore, this article proposes a prediction- and reputation-based truth discovery (PRBTD) framework, which can separate low-quality data from high-quality data in sensing tasks. First, we apply a correlation-focused spatial-temporal transformer network to predict the ground truth of the input sensing data. Then, we extract the sensing errors of the data as features based on the prediction results to calculate the implications among the data. Finally, we design a reputation-based truth discovery (TD) module for identifying low-quality data with their implications. Given sensing data submitted by MUs, PRBTD can eliminate the data with heavy noise and identify malicious MUs with high accuracy. Extensive experimental results demonstrate that PRBTD outperforms the existing methods in terms of identification accuracy and data quality enhancement.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.08684 [pdf, other]

Superconducting and topological properties of compound Lu$_4$H$_7$N

Authors: Zheng-Wei Liao, Xin-Wei Yi, Jing-Yang You, Bo Gu, Gang Su

Abstract: A recent experiment has reported a nitrogen-doped lutetium hydride achieving a remarkable Tc of 294 K at just 1 GPa, significantly reducing the required pressure for obtaining room temperature superconductivity. However, subsequent experimental and theoretical investigations have encountered difficulties in replicating these results, leaving the structure of this Lu-H-N compound shrouded in uncertainty. Here, we propose a stable structure for Lu$_4$H$_7$N employing first-principles calculations. Our calculations reveal that Lu$_4$H$_7$N has a Tc of 1.044 K, which can be substantially enhanced to 11.721 K at 150 GPa, due to the increasing electron-phonon coupling (EPC). Notably, we delve into the nontrivial Z2 band topology of Lu$_4$H$_7$N, featuring discernible surface states near the Fermi level, and we explore its spin Hall conductivity characteristics. Furthermore, we find that electron doping can enhance the EPC strength and Tc of Lu$_4$H$_7$N; for example, the Lu$_4$H$_7$O structure we predict, which simulates electron doping of Lu$_4$H$_7$N, has a Tc of 3.837 K. This work demonstrates the coexistence of superconducting and topological properties in a Lu-H-N system compound, which holds the promise of guiding the search for novel topological superconducting materials.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.01615 [pdf, other]

Hard-Thresholding Meets Evolution Strategies in Reinforcement Learning

Authors: Chengqian Gao, William de Vazelhes, Hualin Zhang, Bin Gu, Zhiqiang Xu

Abstract: Evolution Strategies (ES) have emerged as a competitive alternative for model-free reinforcement learning, showcasing exemplary performance in tasks like MuJoCo and Atari. Notably, they shine in scenarios with imperfect reward functions, making them invaluable for real-world applications where dense reward signals may be elusive. Yet, an inherent assumption in ES, that all input features are task-relevant, poses challenges, especially when confronted with irrelevant features common in real-world problems. This work scrutinizes this limitation, particularly focusing on the Natural Evolution Strategies (NES) variant. We propose NESHT, a novel approach that integrates Hard-Thresholding (HT) with NES to champion sparsity, ensuring only pertinent features are employed. Backed by rigorous analysis and empirical tests, NESHT demonstrates its promise in mitigating the pitfalls of irrelevant features and shines in complex decision-making problems like noisy MuJoCo and Atari tasks.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 16 pages, including proofs in the appendix
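
Note on the hard-thresholding operator named in the abstract above: in the sparse-optimization literature, hard thresholding usually means keeping the k largest-magnitude entries of a vector and setting the rest to zero. The sketch below shows only that generic operator, assuming a plain Python list as the parameter vector. It is not the NESHT algorithm, whose details are not given in this listing, and the function name is hypothetical.

```python
def hard_threshold(values, k):
    """Keep the k largest-magnitude entries of `values`; zero out the rest.

    Generic top-k hard-thresholding operator (illustrative only).
    Ties are broken in favor of the earlier index, because sorted() is stable.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if k >= len(values):
        return list(values)
    # Indices ordered by descending magnitude; the first k are retained.
    by_magnitude = sorted(range(len(values)), key=lambda i: -abs(values[i]))
    keep = set(by_magnitude[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


if __name__ == "__main__":
    weights = [0.1, -3.0, 2.0, 0.05]
    print(hard_threshold(weights, 2))
```

In a sparsity-constrained method, an operator like this would typically be applied after each parameter update so that only k coordinates remain active.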

arXiv:2404.19449 [pdf, other]

AoI-aware Sensing Scheduling and Trajectory Optimization for Multi-UAV-assisted Wireless Backscatter Networks

Authors: Yusi Long, Songhan Zhao, Shimin Gong, Bo Gu, Dusit Niyato, Xuemin Shen

Abstract: This paper considers multiple unmanned aerial vehicles (UAVs) to assist sensing data transmissions from the ground users (GUs) to a remote base station (BS). Each UAV collects sensing data from the GUs and then forwards the sensing data to the remote BS. The GUs first backscatter their data to the UAVs, and then all UAVs forward data to the BS via nonorthogonal multiple access (NOMA) transmissions. We formulate a multi-stage stochastic optimization problem to minimize the long-term time-averaged age-of-information (AoI) by jointly optimizing the GUs' access control, the UAVs' beamforming, and trajectory planning strategies. To solve this problem, we first model the dynamics of the GUs' AoI statuses by virtual queueing systems, and then propose the AoI-aware sensing scheduling and trajectory optimization (AoI-STO) algorithm. This allows us to transform the multi-stage AoI minimization problem into a series of per-slot control problems by using the Lyapunov optimization framework. In each time slot, the GUs' access control, the UAVs' beamforming, and mobility control strategies are updated by using the block coordinate descent (BCD) method according to the instant GUs' AoI statuses. Simulation results reveal that the proposed AoI-STO algorithm can reduce the overall AoI by more than 50%. The GUs' scheduling fairness is also improved greatly by adapting the GUs' access control compared with typical baseline schemes.

Submitted 30 April, 2024; originally announced April 2024.

Comments: This paper has been accepted by IEEE TVT

arXiv:2404.08885 [pdf, other]

Is Next Token Prediction Sufficient for GPT? Exploration on Code Logic Comprehension

Authors: Mengnan Qi, Yufan Huang, Yongqiang Yao, Maoquan Wang, Bin Gu, Neel Sundaresan

Abstract: Large language models (LLMs) have experienced exponential growth and demonstrate remarkable performance across various tasks. Notwithstanding, contemporary research primarily centers on enhancing the size and quality of pretraining data, still utilizing the next token prediction task on an autoregressive transformer model structure. The efficacy of this task in truly facilitating the model's comprehension of code logic remains questionable; we speculate that the model still interprets code as mere text, while humans emphasize the underlying logical knowledge. To verify this, we introduce a new task, "Logically Equivalent Code Selection," which necessitates the selection of logically equivalent code from a candidate set, given a query code. Our experimental findings indicate that current LLMs underperform in this task, since they understand code as an unordered bag of keywords. To ameliorate their performance, we propose an advanced pretraining task, "Next Token Prediction+". This task aims to modify the sentence embedding distribution of the LLM without sacrificing its generative capabilities. Our experimental results reveal that following this pretraining, both Code Llama and StarCoder, the prevalent code domain pretraining models, display significant improvements on our logically equivalent code selection task and the code completion task.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.01897 [pdf, other]

Continuous Spiking Graph Neural Networks

Authors: Nan Yin, Mengzhu Wan, Li Shen, Hitesh Laxmichand Patel, Baopu Li, Bin Gu, Huan Xiong

Abstract: Continuous graph neural networks (CGNNs) have garnered significant attention due to their ability to generalize existing discrete graph neural networks (GNNs) by introducing continuous dynamics. They typically draw inspiration from diffusion-based methods to introduce a novel propagation scheme, which is analyzed using ordinary differential equations (ODEs). However, the implementation of CGNNs requires significant computational power, making them challenging to deploy on battery-powered devices. Inspired by recent spiking neural networks (SNNs), which emulate a biological inference process and provide an energy-efficient neural architecture, we incorporate SNNs with CGNNs in a unified framework, named Continuous Spiking Graph Neural Networks (COS-GNN). We employ SNNs for graph node representation at each time step, which are further integrated into the ODE process along with time. To enhance information preservation and mitigate information loss in SNNs, we introduce the high-order structure of COS-GNN, which utilizes the second-order ODE for spiking representation and continuous propagation. Moreover, we provide the theoretical proof that COS-GNN effectively mitigates the issues of exploding and vanishing gradients, enabling us to capture long-range dependencies between nodes. Experimental results on graph-based learning tasks demonstrate the effectiveness of the proposed COS-GNN over competitive baselines.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.18388 [pdf, other]

FTBC: Forward Temporal Bias Correction for Optimizing ANN-SNN Conversion

Authors: Xiaofeng Wu, Velibor Bojkovic, Bin Gu, Kun Suo, Kai Zou

Abstract: Spiking Neural Networks (SNNs) offer a promising avenue for energy-efficient computing compared with Artificial Neural Networks (ANNs), closely mirroring biological neural processes. However, this potential comes with inherent challenges in directly training SNNs through spatio-temporal backpropagation -- stemming from the temporal dynamics of spiking neurons and their discrete signal processing -- which necessitates alternative ways of training, most notably through ANN-SNN conversion. In this work, we introduce a lightweight Forward Temporal Bias Correction (FTBC) technique, aimed at enhancing conversion accuracy without the computational overhead. We ground our method on theoretical findings showing that, through proper temporal bias calibration, the expected error of ANN-SNN conversion can be reduced to zero after each time step. We further propose a heuristic algorithm for finding the temporal bias only in the forward pass, thus eliminating the computational burden of backpropagation, and we evaluate our method on the CIFAR-10/100 and ImageNet datasets, achieving a notable increase in accuracy on all datasets. Codes are released at a GitHub repository.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.11455 [pdf, other]

Antiferromagnetic Ground State, Charge Density Waves and Oxygen Vacancies Induced Metal-Insulator Transition in Pressurized La$_{3}$Ni$_{2}$O$_{7}$

Authors: Xin-Wei Yi, Ying Meng, Jia-Wen Li, Zheng-Wei Liao, Jing-Yang You, Bo Gu, Gang Su

Abstract: La$_{3}$Ni$_{2}$O$_{7}$ has garnered widespread interest recently due to its high-temperature superconductivity under pressure, accompanied by charge density wave (CDW) ordering and metal-insulator (MI) transitions in the phase diagram. Here, we reveal with comprehensive calculations that La$_{3}$Ni$_{2}$O$_{7}$ possesses an antiferromagnetic ground state under both low and high pressures, with the strong Fermi surface nesting contributed by the flat band that leads to phonon softening and electronic instabilities. Several stable CDW orders with oxygen octahedral distortions are identified, which can trigger the MI transitions. The estimated CDW transition temperature ($\approx$120 K) at ambient pressure agrees nicely with experimental results. In the presence of apical oxygen vacancies, we identify two different phases, namely the half-distortion and full-distortion phases, and their competition can lead to a pressure-induced MI transition, in good agreement with experimental observations. In addition, we find that the electron-phonon coupling is too small to contribute to superconductivity. These results appear to indicate an unconventional superconducting pairing mechanism mediated by antiferromagnetic fluctuations. A phase diagram that is consistent with the experimental results is given. The present results not only explain the origins of the experimentally observed CDW and MI transitions, but also provide insight into properties such as superconductivity, CDW and the role of oxygen vacancies in pressurized La$_{3}$Ni$_{2}$O$_{7}$.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.08964 [pdf]

Hyperelasticity of Blood Clots: Bridging the Gap between Microscopic and Continuum Scales

Authors: Nicholas Filla, Beikang Gu, Jixin Hou, Kenan Song, He Li, Ning Liu, Xianqiao Wang

Abstract: The biomechanical properties of blood clots, which are dictated by their compositions and micro-structures, play a critical role in determining their fates, occlusion, persistency, or embolization in the human circulatory system. While numerous constitutive models have emerged to describe the biomechanics of blood clots, the majority of these models have primarily focused on the macroscopic deformation of the clots and the resultant strain-stress correlations without depicting the microscopic contributions from their structural components, such as fibrin fibers, fibrin network and red blood cells. This work addresses the gap in current scientific understanding by quantifying how changes in the microstructure of blood clots affect their mechanical responses under different external stresses. We leverage our previously published work to develop a hyperelastic potential model for blood clots, which incorporates six distinct strain-energy components to describe the alignment of fibers, the entropic and enthalpic stretching of fibrin fibers, the buckling of these fibers, clot densification, and clot jamming. These strain-energy components are represented by a combination of simple harmonic oscillators, one-sided harmonic potentials, and a Gaussian potential. The proposed model, which is C0, C1, and C2 continuous with a total of 13 parameters, has been validated against three data sets: fibrin clot in tension, blood clot in compression, and blood clots in shear, demonstrating its robustness. Subsequent simulations of a microscopic blood clot model are performed to uncover mechanistic correlations for a majority of the hyperelastic potential's stiffness/strain parameters. Our results show that only one proposed term concerning fiber buckling needs further refinement, while the remaining five strain-energy terms appear to describe precisely what they were intended to.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 13 figures

arXiv:2402.15938 [pdf, other]

Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models

Authors: Yihong Dong, Xue Jiang, Huanyu Liu, Zhi Jin, Bin Gu, Mengfei Yang, Ge Li

Abstract: Recent statements about the impressive capabilities of large language models (LLMs) are usually supported by evaluating on open-access benchmarks. Considering the vast size and wide-ranging sources of LLMs' training data, it could explicitly or implicitly include test data, leading to LLMs being more susceptible to data contamination. However, due to the opacity of training data, the black-box access of models, and the rapid growth of synthetic training data, detecting and mitigating data contamination for LLMs faces significant challenges. In this paper, we propose CDD, which stands for Contamination Detection via output Distribution for LLMs. CDD necessitates only the sampled texts to detect data contamination, by identifying the peakedness of an LLM's output distribution. To mitigate the impact of data contamination in evaluation, we also present TED: Trustworthy Evaluation via output Distribution, based on the correction of an LLM's output distribution. To facilitate this study, we introduce two benchmarks, i.e., DetCon and ComiEval, for data contamination detection and contamination mitigation evaluation tasks. Extensive experimental results show that CDD achieves average relative improvements of 21.8%-30.2% over other contamination detection approaches in terms of Accuracy, F1 Score, and AUC metrics, and can effectively detect implicit contamination. TED substantially mitigates performance improvements up to 66.9% attributed to data contamination across various contamination setups. In real-world applications, we reveal that ChatGPT exhibits a high potential to suffer from data contamination on the HumanEval benchmark.

Submitted 31 May, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

Comments: Accepted to ACL

arXiv:2402.14312 [pdf, other]

doi 10.61977/ati2024008

The Jiao Tong University Spectroscopic Telescope Project

Authors: JUST Team, Chengze Liu, Ying Zu, Fabo Feng, Zhaoyu Li, Yu Yu, Hua Bai, Xiangqun Cui, Bozhong Gu, Yizhou Gu, Jiaxin Han, Yonghui Hou, Zhongwen Hu, Hangxin Ji, Yipeng Jing, Wei Li, Zhaoxiang Qi, Xianyu Tan, Cairang Tian, Dehua Yang, Xiangyan Yuan, Chao Zhai, Congcong Zhang, Jun Zhang, Haotong Zhang, et al. (6 additional authors not shown)

Abstract: The Jiao Tong University Spectroscopic Telescope (JUST) is a 4.4-meter f/6.0 segmented-mirror telescope dedicated to spectroscopic observations. The JUST primary mirror is composed of 18 hexagonal segments, each with a diameter of 1.1 m. JUST provides two Nasmyth platforms for placing science instruments. One Nasmyth focus fits a field of view of 10 arcmin and the other has an extended field of view of 1.2 deg with correction optics. A tertiary mirror is used to switch between the two Nasmyth foci. JUST will be installed at a site at Lenghu in Qinghai Province, China, and will conduct spectroscopic observations with three types of instruments to explore the dark universe, trace the dynamic universe, and search for exoplanets: (1) a multi-fiber (2000 fibers) medium-resolution spectrometer (R=4000-5000) to spectroscopically map galaxies and large-scale structure; (2) an integral field unit (IFU) array of 500 optical fibers and/or a long-slit spectrograph dedicated to fast follow-ups of transient sources for multimessenger astronomy; (3) a high-resolution spectrometer (R~100000) designed to identify Jupiter analogs and Earth-like planets, with the capability to characterize the atmospheres of hot exoplanets.

Submitted 29 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: 28 pages, 6 figures

arXiv:2402.13241 [pdf, other]

Federated Causal Discovery from Heterogeneous Data

Authors: Loka Li, Ignavier Ng, Gongxu Luo, Biwei Huang, Guangyi Chen, Tongliang Liu, Bin Gu, Kun Zhang

Abstract: Conventional causal discovery methods rely on centralized data, which is inconsistent with the decentralized nature of data in many real-world situations. This discrepancy has motivated the development of federated causal discovery (FCD) approaches. However, existing FCD methods may be limited by their potentially restrictive assumptions of identifiable functional causal models or homogeneous data distributions, narrowing their applicability in diverse scenarios. In this paper, we propose a novel FCD method attempting to accommodate arbitrary causal models and heterogeneous data. We first utilize a surrogate variable corresponding to the client index to account for the data heterogeneity across different clients. We then develop a federated conditional independence test (FCIT) for causal skeleton discovery and establish a federated independent change principle (FICP) to determine causal directions. These approaches involve constructing summary statistics as a proxy of the raw data to protect data privacy. Owing to the nonparametric properties, FCIT and FICP make no assumption about particular functional forms, thereby facilitating the handling of arbitrary causal models. We conduct extensive experiments on synthetic and real datasets to show the efficacy of our method. The code is available at https://github.com/lokali/FedCDH.git.

Submitted 26 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: ICLR 2024

arXiv:2402.01146 [pdf, other]

Limited Memory Online Gradient Descent for Kernelized Pairwise Learning with Dynamic Averaging

Authors: Hilal AlQuabeh, William de Vazelhes, Bin Gu

Abstract: Pairwise learning, an important domain within machine learning, addresses loss functions defined on pairs of training examples, including those in metric learning and AUC maximization. Acknowledging the quadratic growth in computational complexity accompanying pairwise loss as the sample size grows, researchers have turned to online gradient descent (OGD) methods for enhanced scalability. Recently, an OGD algorithm emerged, employing gradient computation involving prior and most recent examples, a step that effectively reduces algorithmic complexity to $O(T)$, with $T$ being the number of received examples. This approach, however, confines itself to linear models while assuming the independence of example arrivals. We introduce a lightweight OGD algorithm that does not require the independence of examples and generalizes to kernel pairwise learning. Our algorithm builds the gradient based on a random example and a moving average representing the past data, which results in a sub-linear regret bound with a complexity of $O(T)$. Furthermore, through the integration of $O(\sqrt{T}\log{T})$ random Fourier features, the complexity of kernel calculations is effectively minimized. Several experiments with real-world datasets show that the proposed technique outperforms kernel and linear algorithms in offline and online scenarios.

Submitted 2 February, 2024; originally announced February 2024.

Comments: Accepted in AAAI 2024
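
Note: the abstract above relies on random Fourier features to keep kernel evaluations cheap. The following is a minimal, generic sketch of that idea for an RBF kernel, not the authors' algorithm. The function names (`make_rff`, `rbf_kernel`, `dot`), the choice of RBF kernel, and the parameter values are all illustrative assumptions that do not appear in the listing.

```python
import math
import random


def make_rff(dim, num_features, gamma, seed=0):
    """Build a feature map z with z(x).z(y) ~= exp(-gamma * ||x - y||^2).

    Hypothetical helper for illustration; uses the standard construction
    of cosine features with Gaussian frequencies and uniform phases.
    """
    rng = random.Random(seed)
    # For exp(-gamma * ||d||^2) the frequencies follow N(0, 2 * gamma * I).
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    norm = math.sqrt(2.0 / num_features)

    def feature_map(x):
        # One cosine feature per sampled (frequency, phase) pair.
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return feature_map


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel value, used here only as a reference."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def dot(u, v):
    """Plain inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))
```

With a fixed number of features, each approximate kernel value costs one inner product of that length, regardless of how many past examples have been seen. For example, `dot(z(x), z(y))` with `z = make_rff(3, 2000, 0.5)` should land close to `rbf_kernel(x, y, 0.5)`, with an error that shrinks as the number of features grows.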

arXiv:2401.12983 [pdf]

Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding

Authors: Jie Tian, Jixin Hou, Zihao Wu, Peng Shu, Zhengliang Liu, Yujie Xiang, Beikang Gu, Nicholas Filla, Yiwei Li, Ning Liu, Xianyan Chen, Keke Tang, Tianming Liu, Xianqiao Wang

Abstract: This study is a pioneering endeavor to investigate the capabilities of Large Language Models (LLMs) in addressing conceptual questions within the domain of mechanical engineering with a focus on mechanics. Our examination involves a manually crafted exam encompassing 126 multiple-choice questions, spanning various aspects of mechanics courses, including Fluid Mechanics, Mechanical Vibration, Engineering Statics and Dynamics, Mechanics of Materials, Theory of Elasticity, and Continuum Mechanics. Three LLMs, including ChatGPT (GPT-3.5), ChatGPT (GPT-4), and Claude (Claude-2.1), were subjected to evaluation against engineering faculties and students with or without mechanical engineering background. The findings reveal GPT-4's superior performance over the other two LLMs and human cohorts in answering questions across various mechanics topics, except for Continuum Mechanics. This signals the potential future improvements for GPT models in handling symbolic calculations and tensor analyses. The performances of LLMs were all significantly improved with explanations prompted prior to direct responses, underscoring the crucial role of prompt engineering. Interestingly, GPT-3.5 demonstrates improved performance with prompts covering a broader domain, while GPT-4 excels with prompts focusing on specific subjects. Finally, GPT-4 exhibits notable advancements in mitigating input bias, as evidenced by guessing preferences for humans.
This study unveils the substantial potential of LLMs as highly knowledgeable assistants in both mechanical pedagogy and scientific research.

Submitted 13 January, 2024; originally announced January 2024.

Comments: 30 pages, 7 figures, and 1 table

arXiv:2401.06401

DevEval: Evaluating Code Generation in Practical Software Projects

Authors: Jia Li, Ge Li, Yunfei Zhao, Yongmin Li, Zhi Jin, Hao Zhu, Huanyu Liu, Kaibo Liu, Lecheng Wang, Zheng Fang, Lanshen Wang, Jiazheng Ding, Xuanming Zhang, Yihong Dong, Yuqi Zhu, Bin Gu, Mengfei Yang

Abstract: How to evaluate Large Language Models (LLMs) in code generation is an open question. Many benchmarks have been proposed but are inconsistent with practical software projects, e.g., unreal program distributions, insufficient dependencies, and small-scale project contexts. Thus, the capabilities of LLMs in practical projects are still unclear. In this paper, we propose a new benchmark named DevEval, aligned with Developers' experiences in practical projects. DevEval is collected through a rigorous pipeline, containing 2,690 samples from 119 practical projects and covering 10 domains. Compared to previous benchmarks, DevEval aligns to practical projects in multiple dimensions, e.g., real program distributions, sufficient dependencies, and enough-scale project contexts. We assess five popular LLMs on DevEval (e.g., gpt-4, gpt-3.5-turbo, CodeLLaMa, and StarCoder) and reveal their actual abilities in code generation. For instance, the highest Pass@1 of gpt-3.5-turbo is only 42 in our experiments. We also discuss the challenges and future directions of code generation in practical projects. We open-source DevEval and hope it can facilitate the development of code generation in practical projects.

Submitted 5 March, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

Comments: We are re-checking this benchmark and repeating related experiments. New versions of DevEval will be released later
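
Note: the abstract above reports results as Pass@1. The listing does not say how DevEval computes it, so the following is only a sketch under the assumption that the widely used unbiased pass@k estimator applies: draw n samples per problem, count the c that pass all tests, and estimate the chance that at least one of k drawn samples passes. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k from n generated samples, c of which are correct.

    Uses 1 - C(n - c, k) / C(n, k): one minus the probability that a
    random size-k subset of the n samples contains no correct sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)
```

For k = 1 the estimator reduces to c / n, the fraction of samples that pass, so a benchmark-level Pass@1 is the mean of that fraction over all problems.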

arXiv:2401.05394 [pdf, other]

Iterative Regularization with k-support Norm: An Important Complement to Sparse Recovery

Authors: William de Vazelhes, Bhaskar Mukhoty, Xiao-Tong Yuan, Bin Gu

Abstract: Sparse recovery is ubiquitous in machine learning and signal processing. Due to the NP-hard nature of sparse recovery, existing methods are known to suffer either from restrictive (or even unknown) applicability conditions, or high computational cost. Recently, iterative regularization methods have emerged as a promising fast approach because they can achieve sparse recovery in one pass through early stopping, rather than the tedious grid-search used in the traditional methods. However, most of those iterative methods are based on the $\ell_1$ norm which requires restrictive applicability conditions and could fail in many cases. Therefore, achieving sparse recovery with iterative regularization methods under a wider range of conditions has yet to be further explored. To address this issue, we propose a novel iterative regularization algorithm, IRKSN, based on the $k$-support norm regularizer rather than the $\ell_1$ norm. We provide conditions for sparse recovery with IRKSN, and compare them with traditional conditions for recovery with $\ell_1$ norm regularizers. Additionally, we give an early stopping bound on the model error of IRKSN with explicit constants, achieving the standard linear rate for sparse recovery. Finally, we illustrate the applicability of our algorithm on several experiments, including a support recovery experiment with a correlated design matrix.

Submitted 19 March, 2024; v1 submitted 19 December, 2023; originally announced January 2024.

Comments: Accepted at AAAI 2024. Code at https://github.com/wdevazelhes/IRKSN_AAAI2024

arXiv:2401.05373 [pdf, other]

Dynamic Spiking Graph Neural Networks

Authors: Nan Yin, Mengzhu Wang, Zhenghan Chen, Giulia De Masi, Bin Gu, Huan Xiong

Abstract: The integration of Spiking Neural Networks (SNNs) and Graph Neural Networks (GNNs) is gradually attracting attention due to the low power consumption and high efficiency in processing the non-Euclidean data represented by graphs. However, as a common problem, dynamic graph representation learning faces challenges such as high complexity and large memory overheads. Current work often uses SNNs instead of Recurrent Neural Networks (RNNs) by using binary features instead of continuous ones for efficient training, which overlooks graph structure information and leads to the loss of details during propagation. Additionally, optimizing dynamic spiking models typically requires propagation of information across time steps, which increases memory requirements. To address these challenges, we present a framework named Dynamic Spiking Graph Neural Networks (Dy-SIGN). To mitigate the information loss problem, Dy-SIGN propagates early-layer information directly to the last layer for information compensation. To accommodate the memory requirements, we apply the implicit differentiation on the equilibrium state, which does not rely on the exact reverse of the forward computation. While traditional implicit differentiation methods are usually used for static situations, Dy-SIGN extends it to the dynamic graph setting.
Extensive experiments on three large-scale real-world dynamic graph datasets validate the effectiveness of Dy-SIGN on dynamic node classification tasks with lower computational costs.

Submitted 15 December, 2023; originally announced January 2024.

arXiv:2312.15259 [pdf, other]

doi 10.1103/PhysRevB.109.134436

Room temperature ferromagnetic semiconductors through metal-semiconductor transition in monolayer MnSe2

Authors: Jia-Wen Li, Gang Su, Bo Gu

Abstract: Realizing room temperature ferromagnetic semiconductors is still a challenge in spintronics. Recent experiments have obtained two-dimensional (2D) room temperature ferromagnetic metals, such as monolayer MnSe2. In this paper, we propose a way to obtain room temperature ferromagnetic semiconductors through a metal-semiconductor transition. Using density functional theory calculations, a room temperature ferromagnetic semiconductor is obtained in monolayer MnSe2 with a few percent tensile strain, where a metal-semiconductor transition occurs at 2.2% tensile strain. The tensile strain raises the energy of the d orbitals of Mn atoms and the p orbitals of Se atoms near the Fermi level, so that the Fermi level sits in the energy gap between the bonding and antibonding states of these p and d orbitals, opening a small band gap. Room temperature ferromagnetic semiconductors are also obtained in the heterostructures MnSe2/X (X = Al2Se3, GaSe, SiH, and GaP), where the metal-semiconductor transition happens due to the tensile strain induced by the interface of the heterostructures. In addition, a large magneto-optical Kerr effect (MOKE) is obtained in monolayer MnSe2 with tensile strain and in MnSe2-based heterostructures. Our theoretical results pave a way to obtain room temperature magnetic semiconductors from experimentally obtained 2D room temperature ferromagnetic metals through metal-semiconductor transitions.

Submitted 23 December, 2023; originally announced December 2023.

arXiv:2312.11508 [pdf, other]

Rethinking the Instruction Quality: LIFT is What You Need

Authors: Yang Xu, Yongqiang Yao, Yufan Huang, Mengnan Qi, Maoquan Wang, Bin Gu, Neel Sundaresan

Abstract: Instruction tuning, a specialized technique to enhance large language model (LLM) performance via instruction datasets, relies heavily on the quality of employed data. Existing quality improvement methods alter instruction data through dataset expansion or curation. However, the expansion method risks data redundancy, potentially compromising LLM performance, while the curation approach confines the LLM's potential to the original dataset. Our aim is to surpass the original data quality without encountering these shortcomings. To achieve this, we propose LIFT (LLM Instruction Fusion Transfer), a novel and versatile paradigm designed to elevate the instruction quality to new heights. LIFT strategically broadens data distribution to encompass more high-quality subspaces and eliminates redundancy, concentrating on high-quality segments across overall data subspaces. Experimental results demonstrate that, even with a limited quantity of high-quality instruction data selected by our paradigm, LLMs not only consistently uphold robust performance across various tasks but also surpass some state-of-the-art results, highlighting the significant improvement in instruction quality achieved by our paradigm.

Submitted 27 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.00496 [pdf, other]

doi 10.1021/acs.jctc.3c01317

Nonadiabatic conical intersection dynamics in the local diabatic representation with Strang splitting and Fourier basis

Authors: Bing Gu

Abstract: We develop and implement an exact conical intersection nonadiabatic wave packet dynamics method that combines the local diabatic representation, Strang splitting for the total molecular propagator, and discrete variable representation with uniform grids. By employing the local diabatic representation, this method captures all non-adiabatic effects, including nonadiabatic transitions, electronic coherences, and geometric phases. Moreover, it is free of singularities in the first and second derivative couplings, and does not require a smooth gauge of electronic wavefunction phase. We further show that in contrast to the adiabatic representation, the split-operator method can be directly applied to the full molecular propagator with the locally diabatic ansatz. The Fourier series, employed as the primitive nuclear basis functions, is universal and can be applied to all types of reactive coordinates. The combination of local diabatic representation, Strang splitting, and Fourier basis allows exact modeling of conical intersection quantum dynamics directly with adiabatic electronic states that can be obtained from standard electronic structure computations.

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2311.15368 [pdf, other]

Flow-Guided Diffusion for Video Inpainting

Authors: Bohai Gu, Yongsheng Yu, Heng Fan, Libo Zhang

Abstract: Video inpainting has been challenged by complex scenarios like large movements and low-light conditions. Current methods, including emerging diffusion models, face limitations in quality and efficiency. This paper introduces the Flow-Guided Diffusion model for Video Inpainting (FGDVI), a novel approach that significantly enhances temporal consistency and inpainting quality via reusing an off-the-shelf image generation diffusion model. We employ optical flow for precise one-step latent propagation and introduce a model-agnostic flow-guided latent interpolation technique. This technique expedites denoising, seamlessly integrating with any Video Diffusion Model (VDM) without additional training. Our FGDVI demonstrates a remarkable 10% improvement in flow warping error E_warp over existing state-of-the-art methods. Our comprehensive experiments validate the superior performance of FGDVI, offering a promising direction for advanced video inpainting. The code and detailed results will be publicly available at https://github.com/NevSNev/FGDVI.

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.11283 [pdf, other]

High Curie temperature and high hole mobility in diluted magnetic semiconductors (B, Mn)X (X = N, P, As, Sb)

Authors: Xiang Li, Jia-Wen Li, Jing-Yang You, Gang Su, Bo Gu

Abstract: Doping nonmagnetic semiconductors with magnetic impurities is a feasible way to obtain diluted magnetic semiconductors (DMSs). It is generally accepted that for the most extensively studied DMS, (Ga, Mn)As, the highest Curie temperature $T_{\text{C}}$ achieved in experiments is 200 K, with a Mn concentration of approximately 16%. A recent experiment reported record-breaking high electron and hole mobilities in the semiconductor BAs [Science 377, 437 (2022)]. Since BAs shares the same zinc-blende structure with GaAs, here we predict four DMSs (B, Mn)X (X = N, P, As, Sb) by density functional theory calculations. Our results indicate a significantly higher $T_{\text{C}}$ in the range of 254 K to 300 K for (B, Mn)As with a Mn concentration of around 15.6%, and even higher $T_{\text{C}}$ values above room temperature for (B, Mn)N and (B, Mn)P with a Mn concentration exceeding 12.5%. Furthermore, we have predicted a large hole mobility of 1561 cm$^{2}$V$^{-1}$s$^{-1}$ at 300 K for (B, Mn)As with a Mn concentration of about 3.7%, which is three orders of magnitude larger than the hole mobility of 4 cm$^{2}$V$^{-1}$s$^{-1}$ at 300 K observed in the experiment for (Ga, Mn)As. Our findings predict the emergence of a new family of DMS, (B, Mn)X, and are expected to stimulate both experimental and theoretical studies of the DMS with high $T_{\text{C}}$ and high mobilities.

Submitted 19 November, 2023; originally announced November 2023.

arXiv:2311.06816 [pdf, other]

On original and latent space connectivity in deep neural networks

Authors: Boyang Gu, Anastasia Borovykh

Abstract: We study whether inputs from the same class can be connected by a continuous path, in original or latent representation space, such that all points on the path are mapped by the neural network model to the same class. Understanding how the neural network views its own input space and how the latent spaces are structured has value for explainability and robustness. We show that paths, linear or nonlinear, connecting same-class inputs exist in all cases studied.

Submitted 12 November, 2023; originally announced November 2023.

arXiv:2311.05112 [pdf]

A Survey of Large Language Models in Medicine: Progress, Application, and Challenge

Authors: Hongjian Zhou, Fenglin Liu, Boyang Gu, Xinyu Zou, Jinfa Huang, Jinge Wu, Yiru Li, Sam S. Chen, Peilin Zhou, Junling Liu, Yining Hua, Chengfeng Mao, Chenyu You, Xian Wu, Yefeng Zheng, Lei Clifton, Zheng Li, Jiebo Luo, David A. Clifton

Abstract: Large language models (LLMs), such as ChatGPT, have received substantial attention due to their capabilities for understanding and generating human language. While there has been a burgeoning trend in research focusing on the employment of LLMs in supporting different medical tasks (e.g., enhancing clinical diagnostics and providing medical education), a review of these efforts, particularly their development, practical applications, and outcomes in medicine, remains scarce. Therefore, this review aims to provide a detailed overview of the development and deployment of LLMs in medicine, including the challenges and opportunities they face. In terms of development, we provide a detailed introduction to the principles of existing medical LLMs, including their basic model structures, number of parameters, and sources and scales of data used for model development. It serves as a guide for practitioners in developing medical LLMs tailored to their specific needs. In terms of deployment, we offer a comparison of the performance of different LLMs across various medical tasks, and further compare them with state-of-the-art lightweight models, aiming to provide an understanding of the advantages and limitations of LLMs in medicine. Overall, in this review, we address the following questions: 1) What are the practices for developing medical LLMs? 2) How to measure the medical task performance of LLMs in a medical setting? 3) How have medical LLMs been employed in real-world practice? 4) What challenges arise from the use of medical LLMs? and 5) How to more effectively develop and deploy medical LLMs? By answering these questions, this review aims to provide insights into the opportunities for LLMs in medicine and serve as a practical resource. We also maintain a regularly updated list of practical guides on medical LLMs at https://github.com/AI-in-Health/MedLLMsPracticalGuide

Submitted 22 July, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

Comments: Preprint. Version 6. Update Figures 1-5; Tables 2-3; 31 pages

arXiv:2310.14209 [pdf, other]

SUT: Active Defects Probing for Transcompiler Models

Authors: Mengnan Qi, Yufan Huang, Maoquan Wang, Yongqiang Yao, Zihan Liu, Bin Gu, Colin Clement, Neel Sundaresan

Abstract: Automatic program translation has enormous application value and hence has been attracting significant interest from AI researchers. However, we observe that current program translation models still make elementary syntax errors, particularly when the target language does not have syntax elements in the source language. Metrics like BLEU, CodeBLEU and computational accuracy may not expose these issues. In this paper we introduce new metrics for programming language translation that address these basic syntax errors. We develop a novel active defects probing suite called Syntactic Unit Tests (SUT) which includes a highly interpretable evaluation harness for accuracy and test scoring. Experiments have shown that even powerful models like ChatGPT still make mistakes on these basic unit tests. Specifically, compared to previous program translation task evaluation datasets, its pass rate on our unit tests has decreased by 26.15%. Further, our evaluation harness reveals syntactic element errors in which these models exhibit deficiencies.

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.11729 [pdf, other]

Diagrammatic representation and nonperturbative approximation of exact time-convolutionless master equation

Authors: Bing Gu

Abstract: The time-convolutionless master equation provides a general framework to model non-Markovian dynamics of an open quantum system with a time-local generator. A diagrammatic representation is developed and proven for the perturbative expansion of the exact time-local generator for an open quantum system interacting with arbitrary environments. A truncation of the perturbation expansion leads to the perturbative time-convolutionless quantum master equations. We further introduce a nonperturbative approach that approximates the time-convolutionless generator as a nested time-ordered exponential function.

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.11476 [pdf, other]

Program Translation via Code Distillation

Authors: Yufan Huang, Mengnan Qi, Yongqiang Yao, Maoquan Wang, Bin Gu, Colin Clement, Neel Sundaresan

Abstract: Software version migration and program translation are an important and costly part of the lifecycle of large codebases. Traditional machine translation relies on parallel corpora for supervised translation, which is not feasible for program translation due to a dearth of aligned data. Recent unsupervised neural machine translation techniques have overcome data limitations by including techniques such as back translation and low level compiler intermediate representations (IR). These methods face significant challenges due to the noise in code snippet alignment and the diversity of IRs respectively. In this paper we propose a novel model called Code Distillation (CoDist) whereby we capture the semantic and structural equivalence of code in a language agnostic intermediate representation. Distilled code serves as a translation pivot for any programming language, leading by construction to parallel corpora which scale to all available source code by simply applying the distillation compiler. We demonstrate that our approach achieves state-of-the-art performance on CodeXGLUE and TransCoder GeeksforGeeks translation benchmarks, with an average absolute increase of 12.7% on the TransCoder GeeksforGeeks translation benchmark compared to TransCoder-ST.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.09061 [pdf, other]

High temperature ferrimagnetic semiconductors by spin-dependent doping in high temperature antiferromagnets

Authors: Jia-Wen Li, Gang Su, Bo Gu

Abstract: Realizing room-temperature ferromagnetic (FM) semiconductors remains a challenge in spintronics. Many antiferromagnetic (AFM) insulators and semiconductors with high Néel temperature $T_N$ have been obtained in experiments, such as LaFeO$_3$, BiFeO$_3$, etc. High concentrations of magnetic impurities can be doped into these AFM materials, but only an AFM state with very small net magnetic moments was obtained in experiments, because the magnetic impurities were equally doped into the spin-up and spin-down sublattices of the AFM materials. Here, we propose that the effective magnetic field provided by an FM substrate could guarantee spin-dependent doping in AFM materials, where the doped magnetic impurities prefer one spin sublattice, so that ferrimagnetic (FIM) materials are obtained. To demonstrate this proposal, we study the Mn-doped AFM insulator LaFeO$_3$ with an FM substrate of Fe metal using density functional theory (DFT) calculations. It is shown that the doped magnetic Mn impurities prefer to occupy one sublattice of the AFM insulator and introduce large magnetic moments in La(Fe,Mn)O$_3$. For the AFM insulator LaFeO$_3$ with high $T_N$ = 740 K, several FIM semiconductors with high Curie temperature $T_C >$ 300 K and a band gap of less than 2 eV are obtained by DFT calculations when 1/8 or 1/4 of the Fe atoms in LaFeO$_3$ are replaced by other 3d or 4d transition metal elements. A large magneto-optical Kerr effect (MOKE) is obtained in these LaFeO$_3$-based FIM semiconductors. In addition, FIM semiconductors with high $T_C$ are also obtained by spin-dependent doping in some other AFM materials with high $T_N$, including BiFeO$_3$, SrTcO$_3$, CaTcO$_3$, etc. Our theoretical results propose a way to obtain high-$T_C$ FIM semiconductors by spin-dependent doping in high-$T_N$ AFM insulators and semiconductors.

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.06483 [pdf, other]

Variance Reduced Online Gradient Descent for Kernelized Pairwise Learning with Limited Memory

Authors: Hilal AlQuabeh, Bhaskar Mukhoty, Bin Gu

Abstract: Pairwise learning is essential in machine learning, especially for problems involving loss functions defined on pairs of training examples. Online gradient descent (OGD) algorithms have been proposed to handle online pairwise learning, where data arrives sequentially. However, the pairwise nature of the problem makes scalability challenging, as the gradient computation for a new sample involves all past samples. Recent advancements in OGD algorithms have aimed to reduce the complexity of calculating online gradients, achieving complexities less than $O(T)$ and even as low as $O(1)$. However, these approaches are primarily limited to linear models and have induced variance. In this study, we propose a limited memory OGD algorithm that extends to kernel online pairwise learning while improving the sublinear regret. Specifically, we establish a clear connection between the variance of online gradients and the regret, and construct online gradients using the most recent stratified samples with a limited buffer of size $s$ representing all past data, which has a complexity of $O(sT)$ and employs $O(\sqrt{T}\log{T})$ random Fourier features for kernel approximation. Importantly, our theoretical results demonstrate that the variance-reduced online gradients lead to an improved sublinear regret bound. The experiments on real-world datasets demonstrate the superiority of our algorithm over both kernelized and linear online pairwise learning algorithms.

Submitted 10 October, 2023; originally announced October 2023.

Comments: Accepted in ACML2023

arXiv:2309.08965 [pdf, other]

Multiagent Reinforcement Learning with an Attention Mechanism for Improving Energy Efficiency in LoRa Networks

Authors: Xu Zhang, Ziqi Lin, Shimin Gong, Bo Gu, Dusit Niyato

Abstract: Long Range (LoRa) wireless technology, characterized by low power consumption and a long communication range, is regarded as one of the enabling technologies for the Industrial Internet of Things (IIoT). However, as the network scale increases, the energy efficiency (EE) of LoRa networks decreases sharply due to severe packet collisions. To address this issue, it is essential to appropriately assign transmission parameters such as the spreading factor and transmission power for each end device (ED). However, due to the sporadic traffic and low duty cycle of LoRa networks, evaluating the system EE performance under different parameter settings is time-consuming. Therefore, we first formulate an analytical model to calculate the system EE. On this basis, we propose a transmission parameter allocation algorithm based on multiagent reinforcement learning (MALoRa) with the aim of maximizing the system EE of LoRa networks. Notably, MALoRa employs an attention mechanism to guide each ED to better learn how much "attention" should be given to the parameter assignments for relevant EDs when seeking to improve the system EE. Simulation results demonstrate that MALoRa significantly improves the system EE compared with baseline algorithms with an acceptable degradation in packet delivery rate (PDR).

Submitted 16 September, 2023; originally announced September 2023.

Comments: 6 pages, 3 figures, This paper has been accepted for publication in IEEE Global Communications Conference (GLOBECOM) 2023

arXiv:2309.05908 [pdf, other]

Reset Controller Synthesis by Reach-avoid Analysis for Delay Hybrid Systems

Authors: Han Su, Jiyu Zhu, Shenghua Feng, Yunjun Bai, Bin Gu, Jiang Liu, Mengfei Yang, Naijun Zhan

Abstract: A reset controller plays a crucial role in designing hybrid systems. It restricts the initial set and redefines the reset map associated with discrete transitions, in order to guarantee that the system achieves its objective. Reset controller synthesis, together with feedback controller synthesis and switching logic controller synthesis, provides a correct-by-construction approach to designing hybrid systems. However, time-delay is an inevitable factor in hybrid systems, which can degrade control performance and render verification certificates obtained by abstracting away time-delay invalid in practice. In this paper, we investigate this issue in a practical manner by taking time-delay into account. We propose an approach that reduces the synthesis of reset controllers to the generation of reach-avoid sets for the hybrid system under consideration, which can be efficiently solved using off-the-shelf convex optimization solvers.

Submitted 27 May, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: 15 pages, 10 figures

arXiv:2309.05906 [pdf, other]

Correct-by-Construction for Hybrid Systems by Synthesizing Reset Controller

Authors: Jiang Liu, Han Su, Yunjun Bai, Bin Gu, Bai Xue, Mengfei Yang, Naijun Zhan

Abstract: Controller synthesis, including reset controller, feedback controller, and switching logic controller, provides an essential mechanism to guarantee the correctness and reliability of hybrid systems in a correct-by-construction manner. Unfortunately, reset controller synthesis is still in its infancy in the literature, although it has theoretical and practical significance. In this paper, we propose a convex programming based method to synthesize reset controllers for polynomial hybrid systems subject to safety, possibly together with liveness. Such a problem essentially corresponds to computing an initial set of continuous states in each mode and a reset map associated with each discrete jump such that, through the computed reset maps, any trajectory starting from any computed initial state stays safe if only safety constraints are given, or eventually reaches the target set and stays safe before that if both safety and liveness are given. Both cases can be reduced to reach-avoid and/or differential invariant generation problems, further encoded as convex optimization problems. Finally, several examples are provided to demonstrate the efficiency and effectiveness of our method.

Submitted 11 September, 2023; originally announced September 2023.

Comments: 26 pages, 8 figures

arXiv:2308.16031 [pdf, other]

Breaking the Interference and Fading Gridlock in Backscatter Communications: State-of-the-Art, Design Challenges, and Future Directions

Authors: Bowen Gu, Dong Li, Haiyang Ding, Gongpu Wang, Chintha Tellambura

Abstract: As the Internet of Things (IoT) advances by leaps and bounds, a multitude of devices are becoming interconnected, marking the onset of an era where all things are connected. While this growth opens up opportunities for novel products and applications, it also leads to increased energy demand and battery reliance for IoT devices, creating a significant bottleneck that hinders sustainable progress. At this juncture, backscatter communication (BackCom), as a low-power and passive communication method, emerges as one of the promising solutions to this energy impasse by reducing the manufacturing costs and energy consumption of IoT devices. However, BackCom systems face challenges such as complex interference environments, including direct link interference (DLI) and mutual interference (MI) between tags, which can severely disrupt the efficiency of BackCom networks. Moreover, double-path fading is another major issue that leads to degraded system performance. To fully unleash the potential of BackComs, the purpose of this paper is to furnish a comprehensive review of existing solutions with a focus on combatting these specific interference challenges and overcoming dual-path fading, offering an insightful analysis and comparison of various strategies for effectively mitigating these issues. Specifically, we begin by introducing the preliminaries for the BackCom, including its history, operating mechanisms, main architectures, etc., providing a foundational understanding of the field. Then, we delve into fundamental issues related to BackCom systems, such as solutions for the DLI, the MI, and the double-path fading. This paper thoroughly provides state-of-the-art advances for each case, particularly highlighting how the latest innovations in theoretical approaches and system design can strategically address these challenges.

Submitted 9 January, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

arXiv:2308.06030 [pdf, other]

Chiral magnetism in lithium-decorated monolayer CrTe$_{2}$: Interplay between Dzyaloshinskii-Moriya interaction and higher-order interactions

Authors: Weiyi Pan, Changsong Xu, Xueyang Li, Zhiming Xu, Boyu Liu, Bing-Lin Gu, Wenhui Duan

Abstract: Chiral magnetic states in two-dimensional (2D) layered noncentrosymmetric magnets, which are promising advanced spintronic materials, are usually attributed to Dzyaloshinskii-Moriya interactions (DMI). However, the role of underlying higher-order spin couplings in determining the properties of chiral spin textures has been much less reported. In this work, taking the lithium-decorated monolayer CrTe$_{2}$ (monolayer LiCrTe$_{2}$) as an example, we develop a first-principles-based comprehensive spin model constructed by using the symmetry-adapted cluster expansion method. Based on this spin model, we identify the ground state of monolayer LiCrTe$_{2}$ as a chiral spin spiral state, which can further assemble macroscopic chiral labyrinth domains (LD) under zero-field conditions as well as evolve into skyrmions under a finite magnetic field. Moreover, higher-order biquadratic and three-site interactions are identified to be responsible for modulating both the size and the field stability of the spin spiral state. Our study sheds light on complex magnetic couplings in 2D magnets.

Submitted 17 March, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

Comments: 10 pages, 5 figures

arXiv:2307.14205 [pdf, ps, other]

doi 10.1016/j.physletb.2024.138545

Braneworld sum rules and positive tension branes in a massive gravity

Authors: Ke Yang, Bao-Min Gu, Yi Zhong

Abstract: By taking advantage of the braneworld sum rules, we explore the feasibility of constructing a flat 3-brane scenario consisting solely of positive tension branes in a 5D extension of the Lorentz-violating massive gravity. It is found that the theory supports three distinct brane configurations, one of which is exactly what we expected, consisting solely of two positive tension branes. The cosmological problem of the Randall-Sundrum-1 model and the gauge hierarchy problem can be solved in this model simultaneously. Furthermore, the analysis of linear perturbations reveals that the tensor, vector and scalar modes are all massive and share the same mass spectrum, except that the ground state of the vector mode is absent. Moreover, the tensor and vector modes are robust, but the scalar mode is ghost-like. Interestingly, even though the Kaluza-Klein gravitons have an extremely small mass splitting scale, an estimation of the effective gravitational potential and production of these gravitons on the brane indicates that the phenomenology of the present model is equivalent to that of the 6D ADD model.

Submitted 8 May, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

Comments: 7 pages and 1 figure, published version

Journal ref: Phys. Lett. B 850 (2024) 138545

arXiv:2307.11391 [pdf, other]

doi 10.1140/epjc/s10052-024-12413-5

Quantum Gravity Induced Entanglement of Masses With Extra Dimensions

Authors: Shuai Feng, Bao-Min Gu, Fu-Wen Shu

Abstract: It is believed that gravity can be considered as a quantum coherent mediator. In this study, we propose a plan to test the existence of extra dimensions using the Quantum Gravity Induced Entanglement of Masses (QGEM) experiment. This experiment involves two freely falling test masses passing through a Stern-Gerlach-like device. We investigate the entanglement witness between these masses within the framework of the Randall-Sundrum II model (RS-II). Our findings indicate that the system reaches entanglement more rapidly in the presence of extra dimensions, particularly when the radius of the extra dimension is large.

Submitted 17 March, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

arXiv:2307.00510 [pdf, other]

Inflation with shallow dip and primordial black holes

Authors: Bao-Min Gu, Fu-Wen Shu, Ke Yang

Abstract: Primordial black holes may arise through ultra slow-roll inflation. In this work we study a toy model of ultra slow-roll inflation with a shallow dip. The ultra slow-roll stage enhances the curvature perturbations and thus the primordial scalar power spectrum. We analyze the features of the power spectrum numerically and analytically, and then give a rough estimate of the lower and upper bounds of the enhancement. These large perturbations also produce second-order gravitational waves, which are within the scope of future observations.

Submitted 17 July, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

Comments: 11 pages, 7 figures

arXiv:2306.17616 [pdf, other]

doi 10.34133/research.0238

Superconducting, topological and transport properties of kagome metals CsTi$_{3}$Bi$_{5}$ and RbTi$_{3}$Bi$_{5}$

Authors: Xin-Wei Yi, Zheng-Wei Liao, Jing-Yang You, Bo Gu, Gang Su

Abstract: The recently discovered ATi$_3$Bi$_5$ (A=Cs, Rb) exhibit intriguing quantum phenomena including superconductivity, electronic nematicity, and abundant topological states, which provide promising platforms for studying kagome superconductivity, band topology, and charge orders. In this work, we comprehensively study various properties of ATi$_3$Bi$_5$ including superconductivity under pressure and doping, band topology under pressure, thermal conductivity, heat capacity, electrical resistance, and spin Hall conductivity (SHC) using first-principles calculations. The calculated superconducting transition temperatures ($\mathrm{T_{c}}$) of CsTi$_3$Bi$_5$ and RbTi$_3$Bi$_5$ at ambient pressure are about 1.85 and 1.92 K, respectively. When subjected to pressure, $\mathrm{T_{c}}$ of CsTi$_3$Bi$_5$ exhibits a special valley and dome shape, which arises from quasi-two-dimensional to three-dimensional isotropic compression within the context of an overall decreasing trend. Furthermore, $\mathrm{T_{c}}$ of RbTi$_3$Bi$_5$ can be effectively enhanced up to 3.09 K by tuning the kagome van Hove singularities (VHSs) and flat band through doping. Pressure can also induce abundant topological surface states at the Fermi energy ($\mathrm{E}_{\mathrm{F}}$) and tune VHSs across $\mathrm{E}_{\mathrm{F}}$. Additionally, our transport calculations are in excellent agreement with recent experiments, confirming the absence of a charge density wave. Notably, the SHC of CsTi$_3$Bi$_5$ can reach as high as 226 $\hbar\cdot(e\cdot\Omega\cdot\mathrm{cm})^{-1}$ at $\mathrm{E}_{\mathrm{F}}$. Our work provides a timely and detailed analysis of the rich physical properties of ATi$_3$Bi$_5$, offering valuable insights for further exploration and understanding of these intriguing superconducting materials.

Submitted 30 June, 2023; originally announced June 2023.

Comments: 11 pages, 5 figures

Journal ref: Research. 2023;6:0238

arXiv:2306.16077 [pdf, other]

Secure and Fast Asynchronous Vertical Federated Learning via Cascaded Hybrid Optimization

Authors: Ganyu Wang, Qingsong Zhang, Li Xiang, Boyu Wang, Bin Gu, Charles Ling

Abstract: Vertical Federated Learning (VFL) attracts increasing attention because it empowers multiple parties to jointly train a privacy-preserving model over vertically partitioned data. Recent research has shown that applying zeroth-order optimization (ZOO) has many advantages in building a practical VFL algorithm. However, a vital problem with the ZOO-based VFL is its slow convergence rate, which limits its application in handling modern large models. To address this problem, we propose a cascaded hybrid optimization method in VFL. In this method, the downstream models (clients) are trained with ZOO to protect privacy and ensure that no internal information is shared. Meanwhile, the upstream model (server) is updated with first-order optimization (FOO) locally, which significantly improves the convergence rate, making it feasible to train the large models without compromising privacy and security. We theoretically prove that our VFL framework converges faster than the ZOO-based VFL, as the convergence of our framework is not limited by the size of the server model, making it effective for training large models with the major part on the server. Extensive experiments demonstrate that our method achieves faster convergence than the ZOO-based VFL framework, while maintaining an equivalent level of privacy protection. Moreover, we show that the convergence of our VFL is comparable to the unsafe FOO-based VFL baseline. Additionally, we demonstrate that our method makes the training of a large model feasible.

Submitted 29 June, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: Under Review

arXiv:2306.13874 [pdf, other]

Enhancing Spectrum Sensing via Reconfigurable Intelligent Surfaces: Passive or Active Sensing and How Many Reflecting Elements are Needed?

Authors: Hao Xie, Dong Li, Bowen Gu

Abstract: Cognitive radio has been proposed to alleviate the scarcity of available spectrum caused by the significant demand for wideband services and the fragmentation of spectrum resources. However, sensing performance is quite poor due to the low sensing signal-to-noise ratio, especially in complex environments with severe channel fading. Fortunately, reconfigurable intelligent surface (RIS)-aided spectrum sensing can effectively tackle the above challenge due to its high array gain. Nevertheless, the traditional passive RIS may suffer from the "double fading" effect, which severely limits the performance of passive RIS-aided spectrum sensing. Thus, a crucial challenge is how to fully exploit the potential advantages of the RIS and further improve the sensing performance. To this end, we introduce the active RIS into spectrum sensing and respectively formulate two optimization problems for the passive RIS and the active RIS to maximize the detection probability. In light of the intractability of the formulated problems, we develop a one-stage optimization algorithm with inner approximation and a two-stage optimization algorithm with a bisection method to obtain sub-optimal solutions, and apply the Rayleigh quotient to obtain the upper and lower bounds of the detection probability. Furthermore, in order to gain more insight into the impact of the RIS on spectrum sensing, we respectively investigate the number configuration for passive RIS and active RIS and analyze how many reflecting elements are needed to achieve the detection probability close to 1. Simulation results verify that the proposed algorithms outperform existing algorithms under the same parameter configuration, and achieve a detection probability close to 1 with even fewer reflecting elements or antennas than existing schemes.

Submitted 21 October, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

arXiv:2306.08944 [pdf, other]

Toward collective chemistry by strong light-matter coupling

Authors: Bing Gu

Abstract: Strong light-matter coupling provides a versatile and novel means to manipulate chemical processes. Here we develop a theoretical framework to investigate the spectroscopy and dynamics of a molecular ensemble embedded in an optical cavity under the collective strong coupling regime. This theory is constructed by a pseudoparticle representation of the molecular Hamiltonians, mapping the polaritonic Hamiltonian into a coupled fermion-boson model under particle number constraints. The mapped model is then analyzed using the non-equilibrium Green function theory with the important self-energy diagrams identified through power counting. Numerical demonstrations are shown for the driven Tavis-Cummings model, which show excellent agreement with exact results.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.05751 [pdf, other]

Advancing Counterfactual Inference through Nonlinear Quantile Regression

Authors: Shaoan Xie, Biwei Huang, Bin Gu, Tongliang Liu, Kun Zhang

Abstract: The capacity to address counterfactual "what if" inquiries is crucial for understanding and making use of causal influences. Traditional counterfactual inference, under Pearl's counterfactual framework, typically depends on having access to or estimating a structural causal model. Yet, in practice, this causal model is often unknown and might be challenging to identify. Hence, this paper aims to perform reliable counterfactual inference based solely on observational data and the (learned) qualitative causal structure, without necessitating a predefined causal model or even direct estimations of conditional distributions. To this end, we establish a novel connection between counterfactual inference and quantile regression and show that counterfactual inference can be reframed as an extended quantile regression problem. Building on this insight, we propose a practical framework for efficient and effective counterfactual inference implemented with neural networks under a bi-level optimization scheme. The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data, thereby providing an upper bound on the generalization error. Furthermore, empirical evidence demonstrates its superior statistical efficiency in comparison to existing methods. Empirical results on multiple datasets offer compelling support for our theoretical assertions.

Submitted 27 February, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

arXiv:2306.01260 [pdf, other]

FREPA: An Automated and Formal Approach to Requirement Modeling and Analysis in Aircraft Control Domain

Authors: Jincao Feng, Weikai Miao, Hanyue Zheng, Yihao Huang, Jianwen Li, Zheng Wang, Ting Su, Bin Gu, Geguang Pu, Mengfei Yang, Jifeng He

Abstract: Formal methods are promising for modeling and analyzing system requirements. However, applying formal methods to large-scale industrial projects remains a challenge. Industrial engineers suffer from a lack of automated engineering methodologies to effectively construct precise requirement models and rigorously validate and verify (V&V) the generated models. To tackle this challenge, in this paper, we present a systematic engineering approach, named Formal Requirement Engineering Platform in Aircraft (FREPA), for formal requirement modeling and V&V in the aerospace and aviation control domains. FREPA is an outcome of the seamless collaboration between academia and industry over the last eight years. The main contributions of this paper include 1) an automated and systematic engineering approach FREPA to construct requirement models, validate and verify systems in the aerospace and aviation control domain, 2) a domain-specific modeling language AASRDL to describe the formal specification, and 3) a practical FREPA-based tool AeroReq which has been used by our industry partners. We have successfully adopted FREPA to seven real aerospace gesture control and two aviation engine control systems. The experimental results show that FREPA and the corresponding tool AeroReq significantly facilitate formal modeling and V&V in the industry. Moreover, we also discuss the experiences and lessons gained from using FREPA in aerospace and aviation projects.

Submitted 2 June, 2023; originally announced June 2023.

Comments: 12 pages, Published by FSE 2020

arXiv:2305.09946 [pdf]

AdaMSS: Adaptive Multi-Modality Segmentation-to-Survival Learning for Survival Outcome Prediction from PET/CT Images

Authors: Mingyuan Meng, Bingxin Gu, Michael Fulham, Shaoli Song, Dagan Feng, Lei Bi, Jinman Kim

Abstract: Survival prediction is a major concern for cancer management. Deep survival models based on deep learning have been widely adopted to perform end-to-end survival prediction from medical images. Recent deep survival models achieved promising performance by jointly performing tumor segmentation with survival prediction, where the models were guided to extract tumor-related information through Multi-Task Learning (MTL). However, these deep survival models have difficulties in exploring out-of-tumor prognostic information. In addition, existing deep survival models are unable to effectively leverage multi-modality images. Empirically-designed fusion strategies were commonly adopted to fuse multi-modality information via task-specific manually-designed networks, thus limiting the adaptability to different scenarios. In this study, we propose an Adaptive Multi-modality Segmentation-to-Survival model (AdaMSS) for survival prediction from PET/CT images. Instead of adopting MTL, we propose a novel Segmentation-to-Survival Learning (SSL) strategy, where our AdaMSS is trained for tumor segmentation and survival prediction sequentially in two stages. This strategy enables the AdaMSS to focus on tumor regions in the first stage and gradually expand its focus to include other prognosis-related regions in the second stage. We also propose a data-driven strategy to fuse multi-modality information, which realizes adaptive optimization of fusion strategies based on training data during training. With the SSL and data-driven fusion strategies, our AdaMSS is designed as an adaptive model that can self-adapt its focus regions and fusion strategy for different training stages. Extensive experiments with two large clinical datasets show that our AdaMSS outperforms state-of-the-art survival prediction methods.

Submitted 19 July, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: Under Review

arXiv:2305.09808 [pdf, other]

doi 10.1103/PhysRevB.108.064308

Floquet theory and computational method for the optical absorption of laser-dressed solids

Authors: Vishal Tiwari, Bing Gu, Ignacio Franco

Abstract: Recent advances in laser technology now enable engineering the electronic structure of matter through strong light-matter interactions. However, the effective physicochemical properties of these laser-dressed nonequilibrium materials are not well understood. Here we develop a general theory that now enables modeling and interpreting the linear optical absorption of solids that are dressed by light of arbitrary strength and photon energy. The theory applies to any crystalline solid and quantum materials. In the theory, the dressing of Bloch electrons by the driving laser is treated exactly using Floquet theory. The effective optical properties of this laser-dressed material are probed through a weak laser whose effects are captured to first order in perturbation theory. Remarkably, in this nonequilibrium system the time- and space-periodic Floquet-Bloch modes play the role of the pristine eigenstates of matter as the optical absorption is seen to emerge from transitions among them. We implement the theoretical framework into a code (FloqticS: Floquet optics in Solids) available through GitHub. To isolate the emergent phenomenology, we performed computations in a model solid with a cosine-shaped lattice potential driven by strong nonresonant light. The computations recover the dynamical Franz-Keldysh effect and identify novel dramatic changes in the optical absorption upon increasing the amplitude of the driving laser. The Floquet replicas open absorption sidebands separated by integer multiples of the drive photon energy. The hybridization of the Floquet-Bloch modes creates intense low-frequency absorption and stimulated emission, as well as dips in the absorption spectrum. We assign these emerging effects as purely-optical tell-tale signatures of the Floquet-Bloch modes. These advances can be used to model, control and characterize the response properties of laser-dressed materials.

Submitted 3 August, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

Journal ref: Phys. Rev. B 108, 064308 (2023)

Showing 1–50 of 266 results for author: Gu, B