Search | arXiv e-print repository

Bayesian Federated Learning with Hamiltonian Monte Carlo: Algorithm and Theory

Authors: Jiajun Liang, Qian Zhang, Wei Deng, Qifan Song, Guang Lin

Abstract: This work introduces a novel and efficient Bayesian federated learning algorithm, namely, the Federated Averaging stochastic Hamiltonian Monte Carlo (FA-HMC), for parameter estimation and uncertainty quantification. We establish rigorous convergence guarantees of FA-HMC on non-iid distributed data sets, under the strong convexity and Hessian smoothness assumptions. Our analysis investigates the effects of parameter space dimension, noise on gradients and momentum, and the frequency of communication (between the central node and local nodes) on the convergence and communication costs of FA-HMC. Beyond that, we establish the tightness of our analysis by showing that the convergence rate cannot be improved even for the continuous FA-HMC process. Moreover, extensive empirical studies demonstrate that FA-HMC outperforms the existing Federated Averaging-Langevin Monte Carlo (FA-LD) algorithm.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2405.07839 [pdf, other]

Constrained Exploration via Reflected Replica Exchange Stochastic Gradient Langevin Dynamics

Authors: Haoyang Zheng, Hengrong Du, Qi Feng, Wei Deng, Guang Lin

Abstract: Replica exchange stochastic gradient Langevin dynamics (reSGLD) is an effective sampler for non-convex learning in large-scale datasets. However, the simulation may encounter stagnation issues when the high-temperature chain delves too deeply into the distribution tails. To tackle this issue, we propose reflected reSGLD (r2SGLD): an algorithm tailored for constrained non-convex exploration by utilizing reflection steps within a bounded domain. Theoretically, we observe that reducing the diameter of the domain enhances mixing rates, exhibiting a $\textit{quadratic}$ behavior. Empirically, we test its performance through extensive experiments, including identifying dynamical systems with physical constraints, simulations of constrained multi-modal distributions, and image classification tasks. The theoretical and empirical findings highlight the crucial role of constrained exploration in improving the simulation efficiency.

Submitted 3 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: 28 pages, 13 figures

arXiv:2402.07569 [pdf, other]

EM Estimation of the B-spline Copula with Penalized Log-Likelihood Function

Authors: Xiaoling Dou, Satoshi Kuriki, Gwo Dong Lin, Donald Richards

Abstract: The B-spline copula function is defined by a linear combination of elements of the normalized B-spline basis. We develop a modified EM algorithm to maximize the penalized log-likelihood function, wherein we use the smoothly clipped absolute deviation (SCAD) penalty function for the penalization term. We conduct simulation studies to demonstrate the stability of the proposed numerical procedure, show that penalization yields estimates with smaller mean-square errors when the true parameter matrix is sparse, and provide methods for determining tuning parameters and for model selection. We analyze as an example a data set consisting of birth and death rates from 237 countries, available at the website "Our World in Data," and we estimate the marginal density and distribution functions of those rates together with all parameters of our B-spline copula model.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2401.11665 [pdf, other]

Accelerating Approximate Thompson Sampling with Underdamped Langevin Monte Carlo

Authors: Haoyang Zheng, Wei Deng, Christian Moya, Guang Lin

Abstract: Approximate Thompson sampling with Langevin Monte Carlo broadens its reach from Gaussian posterior sampling to encompass more general smooth posteriors. However, it still encounters scalability issues in high-dimensional problems when demanding high accuracy. To address this, we propose an approximate Thompson sampling strategy, utilizing underdamped Langevin Monte Carlo, where the latter is the go-to workhorse for simulations of high-dimensional posteriors. Based on the standard smoothness and log-concavity conditions, we study the accelerated posterior concentration and sampling using a specific potential function. This design improves the sample complexity for realizing logarithmic regrets from $\mathcal{\tilde O}(d)$ to $\mathcal{\tilde O}(\sqrt{d})$. The scalability and robustness of our algorithm are also empirically validated through synthetic experiments in high-dimensional bandit problems.

Submitted 20 June, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

Comments: 52 pages, 2 figures

arXiv:2311.05866 [pdf, other]

Fair Supervised Learning with A Simple Random Sampler of Sensitive Attributes

Authors: Jinwon Sohn, Qifan Song, Guang Lin

Abstract: As the data-driven decision process becomes dominant in industrial applications, fairness-aware machine learning attracts great attention in various areas. This work proposes fairness penalties learned by neural networks with a simple random sampler of sensitive attributes for non-discriminatory supervised learning. In contrast to many existing works that critically rely on the discreteness of sensitive attributes and response variables, the proposed penalty is able to handle versatile formats of the sensitive attributes, so it is more extensively applicable in practice than many existing algorithms. This penalty enables us to build a computationally efficient group-level in-processing fairness-aware training framework. Empirical evidence shows that our framework enjoys better utility and fairness measures on popular benchmark data sets than competing methods. We also theoretically characterize estimation errors and loss of utility of the proposed neural-penalized risk minimization problem.

Submitted 9 March, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

arXiv:2309.04013 [pdf, other]

An Element-wise RSAV Algorithm for Unconstrained Optimization Problems

Authors: Shiheng Zhang, Jiahao Zhang, Jie Shen, Guang Lin

Abstract: We present a novel optimization algorithm, element-wise relaxed scalar auxiliary variable (E-RSAV), that satisfies an unconditional energy dissipation law and exhibits improved alignment between the modified and the original energy. Our algorithm features rigorous proofs of linear convergence in the convex setting. Furthermore, we present a simple accelerated algorithm that improves the linear convergence rate to super-linear in the univariate case. We also propose an adaptive version of E-RSAV with Steffensen step size. We validate the robustness and fast convergence of our algorithm through ample numerical experiments.

Submitted 7 September, 2023; originally announced September 2023.

Comments: 25 pages, 7 figures

MSC Class: 90C26; 68T99; 68W40

arXiv:2306.06281 [pdf, other]

Energy-Dissipative Evolutionary Deep Operator Neural Networks

Authors: Jiahao Zhang, Shiheng Zhang, Jie Shen, Guang Lin

Abstract: Energy-Dissipative Evolutionary Deep Operator Neural Network is an operator learning neural network. It is designed to seed numerical solutions for a class of partial differential equations instead of a single partial differential equation, such as partial differential equations with different parameters or different initial conditions. The network consists of two sub-networks, the Branch net and the Trunk net. For an objective operator G, the Branch net encodes different input functions u at the same number of sensors, and the Trunk net evaluates the output function at any location. By minimizing the error between the evaluated output q and the expected output G(u)(y), DeepONet generates a good approximation of the operator G. In order to preserve essential physical properties of PDEs, such as the Energy Dissipation Law, we adopt a scalar auxiliary variable approach to generate the minimization problem. It introduces a modified energy and enables an unconditional energy dissipation law at the discrete level. By taking the parameter as a function of time t, this network can predict the accurate solution at any further time by feeding data only at the initial state. The data needed can be generated by the initial conditions, which are readily available. In order to validate the accuracy and efficiency of our neural networks, we provide numerical simulations of several partial differential equations, including heat equations, parametric heat equations and Allen-Cahn equations.

Submitted 9 June, 2023; originally announced June 2023.

Comments: 18 pages

arXiv:2211.10837 [pdf, other]

Non-reversible Parallel Tempering for Deep Posterior Approximation

Authors: Wei Deng, Qian Zhang, Qi Feng, Faming Liang, Guang Lin

Abstract: Parallel tempering (PT), also known as replica exchange, is the go-to workhorse for simulations of multi-modal distributions. The key to the success of PT is to adopt efficient swap schemes. The popular deterministic even-odd (DEO) scheme exploits the non-reversibility property and has successfully reduced the communication cost from $O(P^2)$ to $O(P)$ given sufficiently many $P$ chains. However, such an innovation largely disappears in big data due to the limited chains and few bias-corrected swaps. To handle this issue, we generalize the DEO scheme to promote non-reversibility and propose a few solutions to tackle the underlying bias caused by the geometric stopping time. Notably, in big data scenarios, we obtain an appealing communication cost $O(P\log P)$ based on the optimal window size. In addition, we also adopt stochastic gradient descent (SGD) with large and constant learning rates as exploration kernels. Such a user-friendly nature enables us to conduct approximation tasks for complex posteriors without much tuning costs.

Submitted 19 November, 2022; originally announced November 2022.

Comments: Accepted by AAAI 2023

arXiv:2205.15268 [pdf, other]

Federated X-Armed Bandit

Authors: Wenjie Li, Qifan Song, Jean Honorio, Guang Lin

Abstract: This work establishes the first framework of federated $\mathcal{X}$-armed bandit, where different clients face heterogeneous local objective functions defined on the same domain and are required to collaboratively figure out the global optimum. We propose the first federated algorithm for such problems, named \texttt{Fed-PNE}. By utilizing the topological structure of the global objective inside the hierarchical partitioning and the weak smoothness property, our algorithm achieves sublinear cumulative regret with respect to both the number of clients and the evaluation budget. Meanwhile, it only requires logarithmic communications between the central server and clients, protecting the client privacy. Experimental results on synthetic functions and real datasets validate the advantages of \texttt{Fed-PNE} over various centralized and federated baseline algorithms.

Submitted 15 May, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

arXiv:2204.04819 [pdf, other]

RMFGP: Rotated Multi-fidelity Gaussian process with Dimension Reduction for High-dimensional Uncertainty Quantification

Authors: Jiahao Zhang, Shiqi Zhang, Guang Lin

Abstract: Multi-fidelity modelling arises in many situations in computational science and engineering. It enables accurate inference even when only a small set of accurate data is available. Those data often come from a high-fidelity model, which is computationally expensive. By combining the realizations of the high-fidelity model with one or more low-fidelity models, the multi-fidelity method can make accurate predictions of quantities of interest. This paper proposes a new dimension reduction framework based on rotated multi-fidelity Gaussian process regression and a Bayesian active learning scheme when the available precise observations are insufficient. By drawing samples from the trained rotated multi-fidelity model, the so-called supervised dimension reduction problems can be solved following the idea of the sliced average variance estimation (SAVE) method combined with a Gaussian process regression dimension reduction technique. This general framework we develop can effectively solve high-dimensional problems while the data are insufficient for applying traditional dimension reduction methods. Moreover, a more accurate surrogate Gaussian process model of the original problem can be obtained based on our trained model. The effectiveness of the proposed rotated multi-fidelity Gaussian process (RMFGP) model is demonstrated in four numerical examples. The results show that our method has better performance in all cases, and uncertainty propagation analysis is performed for the last two cases involving stochastic partial differential equations.

Submitted 10 April, 2022; originally announced April 2022.

arXiv:2204.03193 [pdf, other]

MultiAuto-DeepONet: A Multi-resolution Autoencoder DeepONet for Nonlinear Dimension Reduction, Uncertainty Quantification and Operator Learning of Forward and Inverse Stochastic Problems

Authors: Jiahao Zhang, Shiqi Zhang, Guang Lin

Abstract: A new data-driven method for operator learning of stochastic differential equations (SDE) is proposed in this paper. The central goal is to solve forward and inverse stochastic problems more effectively using limited data. Deep operator network (DeepONet) has been proposed recently for operator learning. Compared to other neural networks to learn functions, it aims at the problem of learning nonlinear operators. However, it can be challenging to use the original model to learn nonlinear operators for high-dimensional stochastic problems. We propose a new multi-resolution autoencoder DeepONet model referred to as MultiAuto-DeepONet to deal with this difficulty with the aid of convolutional autoencoder. The encoder part of the network is designed to reduce the dimensionality as well as discover the hidden features of high-dimensional stochastic inputs. The decoder is designed to have a special structure, i.e. in the form of DeepONet. The first DeepONet in the decoder is designed to reconstruct the input function involving randomness while the second one is used to approximate the solution of desired equations. Those two DeepONets have a common branch net and two independent trunk nets. This architecture enables us to deal with multi-resolution inputs naturally. By adding $L_1$ regularization to our network, we found the outputs from the branch net and two trunk nets all have sparse structures. This reduces the number of trainable parameters in the neural network, thus making the model more efficient. Finally, we conduct several numerical experiments to illustrate the effectiveness of our proposed MultiAuto-DeepONet model with uncertainty quantification.

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2204.02583 [pdf, other]

PAGP: A physics-assisted Gaussian process framework with active learning for forward and inverse problems of partial differential equations

Authors: Jiahao Zhang, Shiqi Zhang, Guang Lin

Abstract: In this work, a Gaussian process regression (GPR) model incorporated with given physical information in partial differential equations (PDEs) is developed: physics-assisted Gaussian processes (PAGP). The targets of this model can be divided into two types of problem: finding solutions or discovering unknown coefficients of given PDEs with initial and boundary conditions. We introduce three different models: continuous time, discrete time and hybrid models. The given physical information is integrated into the Gaussian process model through our designed GP loss functions. Three types of loss function are provided in this paper based on two different approaches to train the standard GP model. The first part of the paper introduces the continuous time model, which treats the temporal domain the same as the spatial domain. The unknown coefficients in given PDEs can be jointly learned with GP hyper-parameters by minimizing the designed loss function. In the discrete time models, we first choose a time discretization scheme to discretize the temporal domain. Then the PAGP model is applied at each time step together with the scheme to approximate PDE solutions at given test points of final time. To discover unknown coefficients in this setting, observations at two specific times are needed and a mixed mean square error function is constructed to obtain the optimal coefficients. In the last part, a novel hybrid model combining the continuous and discrete time models is presented. It merges the flexibility of the continuous time model and the accuracy of the discrete time model. The performance of choosing different models with different GP loss functions is also discussed. The effectiveness of the proposed PAGP methods is illustrated in our numerical section.

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2202.13448

Federated Online Sparse Decision Making

Authors: Chi-Hua Wang, Wenjie Li, Guang Cheng, Guang Lin

Abstract: This paper presents a novel federated linear contextual bandits model, where individual clients face different K-armed stochastic bandits with high-dimensional decision context and are coupled through common global parameters. By leveraging the sparsity structure of the linear reward, a collaborative algorithm named \texttt{Fedego Lasso} is proposed to cope with the heterogeneity across clients without exchanging local decision context vectors or raw reward data. \texttt{Fedego Lasso} relies on a novel multi-client teamwork-selfish bandit policy design, and achieves near-optimal regrets for shared parameter cases with logarithmic communication costs. In addition, a new conceptual tool called federated-egocentric policies is introduced to delineate exploration-exploitation trade-off. Experiments demonstrate the effectiveness of the proposed algorithms on both synthetic and real-world datasets.

Submitted 20 March, 2022; v1 submitted 27 February, 2022; originally announced February 2022.

Comments: This paper has been withdrawn by the author due to a revision decision

arXiv:2202.09867 [pdf, other]

Interacting Contour Stochastic Gradient Langevin Dynamics

Authors: Wei Deng, Siqi Liang, Botao Hao, Guang Lin, Faming Liang

Abstract: We propose an interacting contour stochastic gradient Langevin dynamics (ICSGLD) sampler, an embarrassingly parallel multiple-chain contour stochastic gradient Langevin dynamics (CSGLD) sampler with efficient interactions. We show that ICSGLD can be theoretically more efficient than a single-chain CSGLD with an equivalent computational budget. We also present a novel random-field function, which facilitates the estimation of self-adapting parameters in big data and obtains free mode explorations. Empirically, we compare the proposed algorithm with popular benchmark methods for posterior sampling. The numerical results show a great potential of ICSGLD for large-scale uncertainty estimation tasks.

Submitted 20 February, 2022; originally announced February 2022.

Comments: ICLR 2022

arXiv:2112.05120 [pdf, other]

On Convergence of Federated Averaging Langevin Dynamics

Authors: Wei Deng, Qian Zhang, Yi-An Ma, Zhao Song, Guang Lin

Abstract: We propose a federated averaging Langevin algorithm (FA-LD) for uncertainty quantification and mean predictions with distributed clients. In particular, we generalize beyond normal posterior distributions and consider a general class of models. We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d. data and study how the injected noise and the stochastic-gradient noise, the heterogeneity of data, and the varying learning rates affect the convergence. Such an analysis sheds light on the optimal choice of local updates to minimize communication costs. Important to our approach is that the communication efficiency does not deteriorate with the injected noise in the Langevin algorithms. In addition, we examine in our FA-LD algorithm both independent and correlated noise used over different clients. We observe there is a trade-off between the pairs among communication, accuracy, and data privacy. As local devices may become inactive in federated networks, we also show convergence results based on different averaging schemes where only partial device updates are available. In such a case, we discover an additional bias that does not decay to zero.

Submitted 5 October, 2023; v1 submitted 9 December, 2021; originally announced December 2021.

Comments: A polished proof without the federated formulation of Langevin diffusion to avoid confusion

arXiv:2102.01432 [pdf, other]

Bayesian data-driven discovery of partial differential equations with variable coefficients

Authors: Aoxue Chen, Yifan Du, Liyao Mars Gao, Guang Lin

Abstract: The discovery of Partial Differential Equations (PDEs) is an essential task for applied science and engineering. However, data-driven discovery of PDEs is generally challenging, primarily stemming from the sensitivity of the discovered equation to noise and the complexities of model selection. In this work, we propose an advanced Bayesian sparse learning algorithm for PDE discovery with variable coefficients, predominantly when the coefficients are spatially or temporally dependent. Specifically, we apply threshold Bayesian group Lasso regression with a spike-and-slab prior (tBGL-SS) and leverage a Gibbs sampler for Bayesian posterior estimation of PDE coefficients. This approach not only enhances the robustness of point estimation with valid uncertainty quantification but also relaxes the computational burden from Bayesian inference through the integration of coefficient thresholds as an approximate MCMC method. Moreover, from the quantified uncertainties, we propose a Bayesian total error bar criterion for model selection, which outperforms classic metrics including the root mean square and the Akaike information criterion. The capability of this method is illustrated by the discovery of several classical benchmark PDEs with spatially or temporally varying coefficients from solution data obtained from the reference simulations. In the experiments, we show that the tBGL-SS method is more robust than the baseline methods under noisy environments and provides better model selection criteria along the regularization path.

Submitted 26 March, 2024; v1 submitted 2 February, 2021; originally announced February 2021.

arXiv:2012.00960 [pdf, ps, other]

doi 10.1080/02331888.2021.1893728

An Identity for Two Integral Transforms Applied to the Uniqueness of a Distribution via its Laplace-Stieltjes Transform

Authors: Gwo Dong Lin, Xiaoling Dou

Abstract: It is well known that the Laplace-Stieltjes transform of a nonnegative random variable (or random vector) uniquely determines its distribution function. We extend this uniqueness theorem by using the Muntz-Szasz Theorem and the identity for the Laplace-Stieltjes and Laplace-Carson transforms of a distribution function. The latter appears for the first time to the best of our knowledge. In particular, if X and Y are two nonnegative random variables with joint distribution H, then H can be characterized by a suitable set of countably many values of its bivariate Laplace-Stieltjes transform. The general high-dimensional case is also investigated. Besides, Lerch's uniqueness theorem for conventional Laplace transforms is extended as well. The identity can be used to simplify the calculation of Laplace-Stieltjes transforms when the underlying distributions have singular parts. Finally, some examples are given to illustrate the characterization results via the uniqueness theorem.

Submitted 6 March, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

Comments: 22 pages

MSC Class: 62E10; 60E05; 46F12; 30E05

Journal ref: Statistics 2021

arXiv:2010.09800 [pdf, other]

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Authors: Wei Deng, Guang Lin, Faming Liang

Abstract: We propose an adaptively weighted stochastic gradient Langevin dynamics algorithm (SGLD), so-called contour stochastic gradient Langevin dynamics (CSGLD), for Bayesian learning in big data statistics. The proposed algorithm is essentially a \emph{scalable dynamic importance sampler}, which automatically \emph{flattens} the target distribution such that the simulation for a multi-modal distribution can be greatly facilitated. Theoretically, we prove a stability condition and establish the asymptotic convergence of the self-adapting parameter to a \emph{unique fixed-point}, regardless of the non-convexity of the original energy function; we also present an error analysis for the weighted averaging estimators. Empirically, the CSGLD algorithm is tested on multiple benchmark datasets including CIFAR10 and CIFAR100. The numerical results indicate its superiority to avoid the local trap problem in training deep neural networks.

Submitted 23 May, 2022; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: Accepted by NeurIPS 2020

arXiv:2010.01084 [pdf, other]

Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction

Authors: Wei Deng, Qi Feng, Georgios Karagiannis, Guang Lin, Faming Liang

Abstract: Replica exchange stochastic gradient Langevin dynamics (reSGLD) has shown promise in accelerating the convergence in non-convex learning; however, an excessively large correction for avoiding biases from noisy energy estimators has limited the potential of the acceleration. To address this issue, we study the variance reduction for noisy energy estimators, which promotes much more effective swaps. Theoretically, we provide a non-asymptotic analysis on the exponential acceleration for the underlying continuous-time Markov jump process; moreover, we consider a generalized Girsanov theorem which includes the change of Poisson measure to overcome the crude discretization based on Grönwall's inequality and yields a much tighter error in the 2-Wasserstein ($\mathcal{W}_2$) distance. Numerically, we conduct extensive experiments and obtain the state-of-the-art results in optimization and uncertainty estimates for synthetic experiments and image data.

Submitted 18 March, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

Comments: Accepted by ICLR 2021

arXiv:2008.05367 [pdf, other]

Non-convex Learning via Replica Exchange Stochastic Gradient MCMC

Authors: Wei Deng, Qi Feng, Liyao Gao, Faming Liang, Guang Lin

Abstract: Replica exchange Monte Carlo (reMC), also known as parallel tempering, is an important technique for accelerating the convergence of the conventional Markov Chain Monte Carlo (MCMC) algorithms. However, such a method requires the evaluation of the energy function based on the full dataset and is not scalable to big data. The naïve implementation of reMC in mini-batch settings introduces large biases, which cannot be directly extended to the stochastic gradient MCMC (SGMCMC), the standard sampling method for simulating from deep neural networks (DNNs). In this paper, we propose an adaptive replica exchange SGMCMC (reSGMCMC) to automatically correct the bias and study the corresponding properties. The analysis implies an acceleration-accuracy trade-off in the numerical discretization of a Markov jump process in a stochastic environment. Empirically, we test the algorithm through extensive experiments on various setups and obtain the state-of-the-art results on CIFAR10, CIFAR100, and SVHN in both supervised learning and semi-supervised learning tasks.

Submitted 22 March, 2021; v1 submitted 12 August, 2020; originally announced August 2020.

Comments: Accepted by ICML 2020

arXiv:2008.05129 [pdf, other]

Open Set Recognition with Conditional Probabilistic Generative Models

Authors: Xin Sun, Chi Zhang, Guosheng Lin, Keck-Voon Ling

Abstract: Deep neural networks have made breakthroughs in a wide range of visual understanding tasks. A typical challenge that hinders their real-world applications is that unknown samples may be fed into the system during the testing phase, but traditional deep neural networks will wrongly recognize these unknown samples as one of the known classes. Open set recognition (OSR) is a potential solution to overcome this problem, where the open set classifier should have the flexibility to reject unknown samples and meanwhile maintain high classification accuracy in known classes. Probabilistic generative models, such as Variational Autoencoders (VAE) and Adversarial Autoencoders (AAE), are popular methods to detect unknowns, but they cannot provide discriminative representations for known classification. In this paper, we propose a novel framework, called Conditional Probabilistic Generative Models (CPGM), for open set recognition. The core insight of our work is to add discriminative information into the probabilistic generative models, such that the proposed models can not only detect unknown samples but also classify known classes by forcing different latent features to approximate conditional Gaussian distributions. We discuss many model variants and provide comprehensive experiments to study their characteristics. Experiment results on multiple benchmark datasets reveal that the proposed method significantly outperforms the baselines and achieves new state-of-the-art performance.

Submitted 9 February, 2021; v1 submitted 12 August, 2020; originally announced August 2020.

Comments: Extended version of CGDL arXiv:2003.08823 in CVPR2020

arXiv:2008.03838 [pdf, ps, other]

Generalized k-Means in GLMs with Applications to the Outbreak of COVID-19 in the United States

Authors: Tonglin Zhang, Ge Lin

Abstract: Generalized $k$-means can be incorporated with any similarity or dissimilarity measure for clustering. By choosing the dissimilarity measure as the well known likelihood ratio or $F$-statistic, this work proposes a method based on generalized $k$-means to group statistical models. Given the number of clusters $k$, the method is established under hypothesis tests between statistical models. If $k$ is unknown, then the method can be combined with GIC to automatically select the best $k$ for clustering. The article investigates both AIC and BIC as the special cases. Theoretical and simulation results show that the number of clusters can be identified by BIC but not AIC. The resulting method for GLMs is used to group the state-level time series patterns for the outbreak of COVID-19 in the United States. A further study shows that the statistical models between the clusters are significantly different from each other. This study confirms the result given by the proposed method based on generalized $k$-means.

Submitted 9 August, 2020; originally announced August 2020.

MSC Class: 62H30; 62J12

arXiv:2008.01066 [pdf, ps, other]

doi 10.4208/cicp.OA-2020-0151

Multifidelity Data Fusion via Gradient-Enhanced Gaussian Process Regression

Authors: Yixiang Deng, Guang Lin, Xiu Yang

Abstract: We propose a data fusion method based on a multi-fidelity Gaussian process regression (GPR) framework. This method combines available data of the quantity of interest (QoI) and its gradients at different fidelity levels; that is, it is a Gradient-enhanced Cokriging method (GE-Cokriging). It provides approximations of both the QoI and its gradients simultaneously, with uncertainty estimates. We compare this method with the conventional multi-fidelity Cokriging method, which does not use gradient information, and the results suggest that GE-Cokriging performs better in predicting both the QoI and its gradients. Moreover, GE-Cokriging even generalizes better in some cases where Cokriging performs poorly due to the singularity of the covariance matrix. We demonstrate the application of GE-Cokriging in several practical cases, including simultaneously reconstructing the trajectories and velocity of an underdamped oscillator with respect to time, and investigating the sensitivity of the power factor of a load bus with respect to varying power inputs of a generator bus in a large-scale power system. We also show that although the GE-Cokriging method requires a slightly higher computational cost than the Cokriging method, the accuracy comparison shows that this cost is usually worth it.

Submitted 3 August, 2020; originally announced August 2020.

arXiv:2005.13461 [pdf, other]

Peri-Net-Pro: The neural processes with quantified uncertainty for crack patterns

Authors: Moonseop Kim, Guang Lin

Abstract: This paper uses the peridynamic theory, which is well-suited to crack studies, to predict the crack patterns in a moving disk, classify them according to the modes, and finally perform regression analysis. The crack patterns are obtained for each mode by Molecular Dynamics (MD) simulation using peridynamics. Image classification and regression studies are conducted through Convolutional Neural Networks (CNNs) and neural processes. First, we increased the amount and quality of the data using peridynamics, which can theoretically compensate for the problems of the finite element method (FEM) in generating crack pattern images. Second, we conducted a case study of the PMB, LPS, and VES models that were obtained using the peridynamic theory. Case studies were performed to classify the images using CNNs and determine the suitability of the PMB, LPS, and VES models. Finally, we performed regression analysis on the images of the crack patterns with neural processes to predict the crack patterns. In the regression problem, plotting the variance against the epochs confirms that the variance decreases as the number of epochs increases. The most critical point of this study is that the neural processes make an accurate prediction even if there are missing or insufficient training data.

Submitted 23 May, 2020; originally announced May 2020.

arXiv:2005.08638 [pdf, other]

Multi-Fidelity Gaussian Process based Empirical Potential Development for Si:H Nanowires

Authors: Moonseop Kim, Huayi Yin, Guang Lin

Abstract: In material modeling, calculations using empirical potentials are fast compared to first-principles calculations, but the results are not as accurate. First-principles calculations are accurate but slow and very expensive. In this work, first, the H-H binding energy and H$_2$-H$_2$ interaction energy are calculated using first-principles calculations, which can be applied to the Tersoff empirical potential. Second, the H-H parameters are estimated. After fitting the H-H parameters, the mechanical properties are obtained. Finally, to integrate both the low-fidelity empirical potential data and the data from the high-fidelity first-principles calculations, multi-fidelity Gaussian process regression is employed to predict the H-H binding energy and the H$_2$-H$_2$ interaction energy. Numerical results demonstrate the accuracy of the developed empirical potentials.

Submitted 10 May, 2020; originally announced May 2020.

Comments: 7 pages, 9 figures, 1 table

arXiv:2005.04286 [pdf, other]

RotEqNet: Rotation-Equivariant Network for Fluid Systems with Symmetric High-Order Tensors

Authors: Liyao Gao, Yifan Du, Hongshan Li, Guang Lin

Abstract: In the recent application of scientific modeling, machine learning models are largely applied to facilitate computational simulations of fluid systems. Rotation symmetry is a general property for most symmetric fluid systems. However, in general, current machine learning methods have no theoretical way to guarantee rotational symmetry. By observing an important property of contraction and rotation operation on high-order symmetric tensors, we prove that the rotation operation is preserved via tensor contraction. Based on this theoretical justification, in this paper, we introduce Rotation-Equivariant Network (RotEqNet) to guarantee the property of rotation-equivariance for high-order tensors in fluid systems. We implement RotEqNet and evaluate our claims through four case studies on various fluid systems. The property of error reduction and rotation-equivariance is verified in these case studies. Results from the comparative study show that our method outperforms conventional methods, which rely on data augmentation.

Submitted 28 April, 2020; originally announced May 2020.

Comments: Preprint submitted to Journal of Computational Physics

arXiv:2004.07300 [pdf, other]

Gumbel-softmax-based Optimization: A Simple General Framework for Optimization Problems on Graphs

Authors: Yaoxin Li, Jing Liu, Guozheng Lin, Yueyuan Hou, Muyun Mou, Jiang Zhang

Abstract: In computer science, there exist a large number of optimization problems defined on graphs, that is, to find the best node state configuration or network structure such that the designed objective function is optimized under some constraints. However, these problems are notoriously hard to solve because most of them are NP-hard or NP-complete. Although traditional general methods such as simulated annealing (SA), genetic algorithms (GA), and so forth have been devised for these hard problems, their accuracy and time consumption are not satisfactory in practice. In this work, we propose a simple, fast, and general algorithm framework based on the automatic differentiation techniques provided by deep learning frameworks. By introducing the Gumbel-softmax technique, we can optimize the objective function directly by gradient descent regardless of the discrete nature of the variables. We also introduce an evolution strategy into a parallel version of our algorithm. We test our algorithm on three representative optimization problems on graphs, including modularity optimization from network science, the Sherrington-Kirkpatrick (SK) model from statistical physics, and the maximum independent set (MIS) and minimum vertex cover (MVC) problems from combinatorial optimization on graphs. High-quality solutions can be obtained in much less time than with traditional approaches.

Submitted 14 April, 2020; originally announced April 2020.

Comments: arXiv admin note: text overlap with arXiv:1909.07018

arXiv:2002.06987 [pdf, other]

DeepLight: Deep Lightweight Feature Interactions for Accelerating CTR Predictions in Ad Serving

Authors: Wei Deng, Junwei Pan, Tian Zhou, Deguang Kong, Aaron Flores, Guang Lin

Abstract: Click-through rate (CTR) prediction is a crucial task in online display advertising. The embedding-based neural networks have been proposed to learn both explicit feature interactions through a shallow component and deep feature interactions using a deep neural network (DNN) component. These sophisticated models, however, slow down the prediction inference by at least hundreds of times. To address the issue of significantly increased serving delay and high memory usage for ad serving in production, this paper presents \emph{DeepLight}: a framework to accelerate the CTR predictions in three aspects: 1) accelerate the model inference via explicitly searching informative feature interactions in the shallow component; 2) prune redundant layers and parameters at intra-layer and inter-layer level in the DNN component; 3) promote the sparsity of the embedding layer to preserve the most discriminant signals. By combining the above efforts, the proposed approach accelerates the model inference by 46X on Criteo dataset and 27X on Avazu dataset without any loss on the prediction accuracy. This paves the way for successfully deploying complicated embedding-based neural networks in production for ad serving.

Submitted 6 January, 2021; v1 submitted 17 February, 2020; originally announced February 2020.

Comments: Accepted by WSDM 2021; Source code: https://github.com/WayneDW/DeepLight_Deep-Lightweight-Feature-Interactions

arXiv:2001.02728 [pdf, other]

Learning Generative Models using Denoising Density Estimators

Authors: Siavash A. Bigdeli, Geng Lin, Tiziano Portenier, L. Andrea Dunbar, Matthias Zwicker

Abstract: Learning probabilistic models that can estimate the density of a given set of samples, and generate samples from that density, is one of the fundamental challenges in unsupervised machine learning. We introduce a new generative model based on denoising density estimators (DDEs), which are scalar functions parameterized by neural networks, that are efficiently trained to represent kernel density estimators of the data. Leveraging DDEs, our main contribution is a novel technique to obtain generative models by minimizing the KL-divergence directly. We prove that our algorithm for obtaining generative models is guaranteed to converge to the correct solution. Our approach does not require specific network architecture as in normalizing flows, nor use ordinary differential equation solvers as in continuous normalizing flows. Experimental results demonstrate substantial improvement in density estimation and competitive performance in generative model training.

Submitted 9 June, 2020; v1 submitted 8 January, 2020; originally announced January 2020.

Comments: Code and models available at https://drive.google.com/file/d/1EzKRxnFG1Hd8g6Ggvt-jvKkgpDDwK2bY

arXiv:1910.10791 [pdf, other]

An Adaptive Empirical Bayesian Method for Sparse Deep Learning

Authors: Wei Deng, Xiao Zhang, Faming Liang, Guang Lin

Abstract: We propose a novel adaptive empirical Bayesian method for sparse deep learning, where the sparsity is ensured via a class of self-adaptive spike-and-slab priors. The proposed method works by alternately sampling from an adaptive hierarchical posterior distribution using stochastic gradient Markov Chain Monte Carlo (MCMC) and smoothly optimizing the hyperparameters using stochastic approximation (SA). We further prove the convergence of the proposed method to the asymptotically correct distribution under mild conditions. Empirical applications of the proposed method lead to the state-of-the-art performance on MNIST and Fashion MNIST with shallow convolutional neural networks and the state-of-the-art compression performance on CIFAR10 with Residual Networks. The proposed method also improves resistance to adversarial attacks.

Submitted 13 April, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

Comments: Accepted by NeurIPS 2019; Update the assumption on the regularity of Poisson equation

arXiv:1910.08409 [pdf, other]

Inverse modeling of hydrologic parameters in CLM4 via generalized polynomial chaos in the Bayesian framework

Authors: Georgios Karagiannis, Zhangshuan Hou, Maoyi Huang, Guang Lin

Abstract: In this study, the applicability of generalized polynomial chaos (gPC) expansion for land surface model parameter estimation is evaluated. We compute the (posterior) distribution of the critical hydrological parameters that are subject to great uncertainty in the community land model (CLM). The unknown parameters include those that have been identified as the most influential factors on the simulations of surface and subsurface runoff, latent and sensible heat fluxes, and soil moisture in CLM4.0. We set up the inversion problem in the Bayesian framework in two steps: (i) build a surrogate model expressing the input-output mapping, and (ii) compute the posterior distributions of the input parameters. Development of the surrogate model is done with a Bayesian procedure, based on the variable selection methods that use gPC expansions. Our approach accounts for bases selection uncertainty and quantifies the importance of the gPC terms, and hence all the input parameters, via the associated posterior probabilities.

Submitted 18 October, 2019; originally announced October 2019.

arXiv:1910.03120 [pdf, other]

doi 10.1080/00401706.2020.1817790

Gaussian Process Assisted Active Learning of Physical Laws

Authors: Jiuhai Chen, Lulu Kang, Guang Lin

Abstract: In many areas of science and engineering, discovering the governing differential equations from the noisy experimental data is an essential challenge. It is also a critical step in understanding the physical phenomena and prediction of the future behaviors of the systems. However, in many cases, it is expensive or time-consuming to collect experimental data. This article provides an active learning approach to estimate the unknown differential equations accurately with reduced experimental data size. We propose an adaptive design criterion combining the D-optimality and the maximin space-filling criterion. In contrast to active learning for other regression models, the D-optimality here requires the unknown solution of the differential equations and derivatives of the solution. We estimate the Gaussian process (GP) regression models from the available experimental data and use them as the surrogates of these unknown solution functions. The derivatives of the estimated GP models are derived and used to substitute the derivatives of the solution. Variable selection-based regression methods are used to learn the differential equations from the experimental data. Through multiple case studies, we demonstrate the proposed approach outperforms the D-optimality and the maximin space-filling design alone in terms of model accuracy and data economy.

Submitted 2 August, 2020; v1 submitted 7 October, 2019; originally announced October 2019.

Comments: 27 pages, 5 figures, 10 tables

Journal ref: Technometrics 2021. 63(3), 329-342

arXiv:1908.07136 [pdf, ps, other]

A Review of Changepoint Detection Models

Authors: Yixiao Li, Gloria Lin, Thomas Lau, Ruochen Zeng

Abstract: The objective of the change-point detection is to discover the abrupt property changes lying behind the time-series data. In this paper, we firstly summarize the definition and in-depth implication of the changepoint detection. The next stage is to elaborate traditional and some alternative model-based changepoint detection algorithms. Finally, we try to go a bit further in the theory and look into future research directions.

Submitted 19 August, 2019; originally announced August 2019.

Comments: 11 pages

arXiv:1907.07788 [pdf, other]

SubTSBR to tackle high noise and outliers for data-driven discovery of differential equations

Authors: Sheng Zhang, Guang Lin

Abstract: Data-driven discovery of differential equations has been an emerging research topic. We propose a novel algorithm subsampling-based threshold sparse Bayesian regression (SubTSBR) to tackle high noise and outliers. The subsampling technique is used for improving the accuracy of the Bayesian learning algorithm. It has two parameters: subsampling size and the number of subsamples. When the subsampling size increases with fixed total sample size, the accuracy of our algorithm goes up and then down. When the number of subsamples increases, the accuracy of our algorithm keeps going up. We demonstrate how to use our algorithm step by step and compare our algorithm with threshold sparse Bayesian regression (TSBR) for the discovery of differential equations. We show that our algorithm produces better results. We also discuss the merits of discovering differential equations from data and demonstrate how to discover models with random initial and boundary condition as well as models with bifurcations. The numerical examples are: (1) predator-prey model with noise, (2) shallow water equations with outliers, (3) heat diffusion with random initial and boundary condition, and (4) fish-harvesting problem with bifurcations.

Submitted 27 October, 2020; v1 submitted 17 July, 2019; originally announced July 2019.

arXiv:1907.02586 [pdf, other]

doi 10.3390/electronics9030432

Structure fusion based on graph convolutional networks for semi-supervised classification

Authors: Guangfeng Lin, Jing Wang, Kaiyang Liao, Fan Zhao, Wanjun Chen

Abstract: Faced with the diversity and complexity of multi-view data in semi-supervised classification, most existing graph convolutional networks focus on constructing the network architecture or preserving the salient graph structure, and ignore the contribution of the complete graph structure to semi-supervised classification. To mine a more complete distribution structure from multi-view data while considering both specificity and commonality, we propose structure fusion based on graph convolutional networks (SF-GCN) for improving the performance of semi-supervised classification. SF-GCN can not only retain the special characteristics of each view's data by spectral embedding, but also capture the common style of multi-view data through a distance metric between multi-graph structures. Assuming a linear relationship between multi-graph structures, we can construct the optimization function of the structure fusion model by balancing the specificity loss and the commonality loss. By solving this function, we can simultaneously obtain the fusion spectral embedding from the multi-view data and the fusion structure as an adjacency matrix to input into graph convolutional networks for semi-supervised classification. Experiments demonstrate that SF-GCN outperforms the state of the art on three challenging citation network datasets: Cora, Citeseer, and Pubmed.

Submitted 2 July, 2019; originally announced July 2019.

Journal ref: Electronics, 2020

arXiv:1906.08018 [pdf, ps, other]

Bayesian inverse regression for dimension reduction with small datasets

Authors: Xin Cai, Guang Lin, Jinglai Li

Abstract: We consider supervised dimension reduction problems, namely to identify a low dimensional projection of the predictors $\mathbf{x}$ which can retain the statistical relationship between $\mathbf{x}$ and the response variable $y$. We follow the idea of the sliced inverse regression (SIR) and the sliced average variance estimation (SAVE) type of methods, which is to use the statistical information of the conditional distribution $\pi(\mathbf{x}|y)$ to identify the dimension reduction (DR) space. In particular we focus on the task of computing this conditional distribution without slicing the data. We propose a Bayesian framework to compute the conditional distribution where the likelihood function is obtained using the Gaussian process regression model. The conditional distribution $\pi(\mathbf{x}|y)$ can then be computed directly via Monte Carlo sampling. We then can perform DR by considering certain moment functions (e.g. the first or the second moment) of the samples of the posterior distribution. With numerical examples, we demonstrate that the proposed method is especially effective for small data problems.

Submitted 29 October, 2019; v1 submitted 19 June, 2019; originally announced June 2019.

arXiv:1808.05465 [pdf, other]

Trimmed Ensemble Kalman Filter for Nonlinear and Non-Gaussian Data Assimilation Problems

Authors: Weixuan Li, W. Steven Rosenthal, Guang Lin

Abstract: We study the ensemble Kalman filter (EnKF) algorithm for sequential data assimilation in a general situation, that is, for nonlinear forecast and measurement models with non-additive and non-Gaussian noises. Such applications traditionally force us to choose between inaccurate Gaussian assumptions that permit efficient algorithms (e.g., EnKF), or more accurate direct sampling methods which scale poorly with dimension (e.g., particle filters, or PF). We introduce a trimmed ensemble Kalman filter (TEnKF) which can interpolate between the limiting distributions of the EnKF and PF to facilitate adaptive control over both accuracy and efficiency. This is achieved by introducing a trimming function that removes non-Gaussian outliers that introduce errors in the correlation between the model and observed forecast, which otherwise prevent the EnKF from proposing accurate forecast updates. We show for specific trimming functions that the TEnKF exactly reproduces the limiting distributions of the EnKF and PF. We also develop an adaptive implementation which provides control of the effective sample size and allows the filter to overcome periods of increased model nonlinearity. This algorithm allows us to demonstrate substantial improvements over the traditional EnKF in convergence and robustness for the nonlinear Lorenz-63 and Lorenz-96 models.

Submitted 15 August, 2018; originally announced August 2018.

Comments: In revision, SIAM Journal of Uncertainty Quantification

MSC Class: 62F15; 60H10; 60G35

arXiv:1712.02070 [pdf]

Inverse modeling of hydrologic systems with adaptive multi-fidelity Markov chain Monte Carlo simulations

Authors: Jiangjiang Zhang, Jun Man, Guang Lin, Laosheng Wu, Lingzao Zeng

Abstract: Markov chain Monte Carlo (MCMC) simulation methods are widely used to assess parametric uncertainties of hydrologic models conditioned on measurements of observable state variables. However, when the model is CPU-intensive and high-dimensional, the computational cost of MCMC simulation will be prohibitive. In this situation, a CPU-efficient but less accurate low-fidelity model (e.g., a numerical model with a coarser discretization, or a data-driven surrogate) is usually adopted. Nowadays, multi-fidelity simulation methods that can take advantage of both the efficiency of the low-fidelity model and the accuracy of the high-fidelity model are gaining popularity. In the MCMC simulation, as the posterior distribution of the unknown model parameters is the region of interest, it is wise to distribute most of the computational budget (i.e., the high-fidelity model evaluations) therein. Based on this idea, in this paper we propose an adaptive multi-fidelity MCMC algorithm for efficient inverse modeling of hydrologic systems. In this method, we evaluate the high-fidelity model mainly in the posterior region through iteratively running MCMC based on a Gaussian process (GP) system that is adaptively constructed with multi-fidelity simulation. The error of the GP system is rigorously considered in the MCMC simulation and gradually reduced to a negligible level in the posterior region. Thus, the proposed method can obtain an accurate estimate of the posterior distribution with a small number of the high-fidelity model evaluations. The performance of the proposed method is demonstrated by three numerical case studies in inverse modeling of hydrologic systems.

Submitted 14 June, 2018; v1 submitted 6 December, 2017; originally announced December 2017.

Comments: 57 pages, 16 figures

arXiv:1708.09824 [pdf, other]

doi 10.1016/j.spasta.2017.08.002

On the Bayesian calibration of expensive computer models with input dependent parameters

Authors: Georgios Karagiannis, Bledar A. Konomi, Guang Lin

Abstract: Computer models, aiming at simulating a complex real system, are often calibrated in the light of data to improve performance. Standard calibration methods assume that the optimal values of calibration parameters are invariant to the model inputs. In several real world applications where models involve complex parametrizations whose optimal values depend on the model inputs, such an assumption can be too restrictive and may lead to misleading results. We propose a fully Bayesian methodology that produces input dependent optimal values for the calibration parameters and characterizes the associated uncertainties via posterior distributions. Central to the methodology is the idea of formulating the calibration parameter as a step function whose uncertain structure is modeled properly via a binary treed process. Our method is particularly suitable to address problems where the computer model requires the selection of a sub-model from a set of competing ones, but the choice of the 'best' sub-model may change with the input values. The method produces a selection probability for each sub-model given the input. We propose suitable reversible jump operations to facilitate the challenging computations. We assess the performance of our method against benchmark examples, and use it to analyze a real world application with a large-scale climate model.

Submitted 31 August, 2017; originally announced August 2017.

arXiv:1611.04702 [pdf]

doi 10.1002/2017WR020906

An iterative local updating ensemble smoother for estimation and uncertainty assessment of hydrologic model parameters with multimodal distributions

Authors: Jiangjiang Zhang, Guang Lin, Weixuan Li, Laosheng Wu, Lingzao Zeng

Abstract: Ensemble smoother (ES) has been widely used in inverse modeling of hydrologic systems. However, for problems where the distribution of model parameters is multimodal, using ES directly would be problematic. One popular solution is to use a clustering algorithm to identify each mode and update the clusters with ES separately. However, this strategy may not be very efficient when the dimension of parameter space is high or the number of modes is large. Alternatively, we propose in this paper a very simple and efficient algorithm, i.e., the iterative local updating ensemble smoother (ILUES), to explore multimodal distributions of model parameters in nonlinear hydrologic systems. The ILUES algorithm works by updating local ensembles of each sample with ES to explore possible multimodal distributions. To achieve satisfactory data matches in nonlinear problems, we adopt an iterative form of ES to assimilate the measurements multiple times. Numerical cases involving nonlinearity and multimodality are tested to illustrate the performance of the proposed method. It is shown that overall the ILUES algorithm can well quantify the parametric uncertainties of complex hydrologic models, whether or not a multimodal distribution exists.

Submitted 25 February, 2018; v1 submitted 14 November, 2016; originally announced November 2016.

arXiv:1509.04613 [pdf, other]

doi 10.1016/j.jcp.2016.02.053

Gaussian process surrogates for failure detection: a Bayesian experimental design approach

Authors: Hongqiao Wang, Guang Lin, Jinglai Li

Abstract: An important task of uncertainty quantification is to identify the probability of undesired events, in particular, system failures, caused by various sources of uncertainties. In this work we consider the construction of Gaussian process surrogates for failure detection and failure probability estimation. In particular, we consider the situation that the underlying computer models are extremely expensive, and in this setting, determining the sampling points in the state space is of essential importance. We formulate the problem as an optimal experimental design for Bayesian inferences of the limit state (i.e., the failure boundary) and propose an efficient numerical scheme to solve the resulting optimization problem. In particular, the proposed limit-state inference method is capable of determining multiple sampling points at a time, and thus it is well suited for problems where multiple computer simulations can be performed in parallel. The accuracy and performance of the proposed method are demonstrated by both academic and practical examples.

Submitted 11 September, 2015; originally announced September 2015.

MSC Class: 65C60

arXiv:1508.04876 [pdf, other]

Parallel and Interacting Stochastic Approximation Annealing algorithms for global optimisation

Authors: Georgios Karagiannis, Bledar A. Konomi, Guang Lin, Faming Liang

Abstract: We present the parallel and interacting stochastic approximation annealing (PISAA) algorithm, a stochastic simulation procedure for global optimisation, that extends and improves the stochastic approximation annealing (SAA) by using population Monte Carlo ideas. The standard SAA algorithm guarantees convergence to the global minimum when a square-root cooling schedule is used; however the efficiency of its performance depends crucially on its self-adjusting mechanism. Because its mechanism is based on information obtained from only a single chain, SAA may present slow convergence in complex optimisation problems. The proposed algorithm involves simulating a population of SAA chains that interact with each other in a manner that ensures significant improvement of the self-adjusting mechanism and better exploration of the sampling space. Central to the proposed algorithm are the ideas of (i) recycling information from the whole population of Markov chains to design a more accurate/stable self-adjusting mechanism and (ii) incorporating more advanced proposals, such as crossover operations, for the exploration of the sampling space. PISAA presents a significantly improved performance in terms of convergence. PISAA can be implemented in parallel computing environments if available. We demonstrate the good performance of the proposed algorithm on challenging applications including Bayesian network learning and protein folding. Our numerical comparisons suggest that PISAA outperforms the simulated annealing, stochastic approximation annealing, and annealing evolutionary stochastic approximation Monte Carlo especially in high dimensional or rugged scenarios.

Submitted 20 August, 2015; originally announced August 2015.

arXiv:1506.02108 [pdf, ps, other]

Deeply Learning the Messages in Message Passing Inference

Authors: Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel

Abstract: Deep structured output learning shows great promise in tasks like semantic image segmentation. We proffer a new, efficient deep structured model learning scheme, in which we show how deep Convolutional Neural Networks (CNNs) can be used to estimate the messages in message passing inference for structured prediction with Conditional Random Fields (CRFs). With such CNN message estimators, we obviate the need to learn or evaluate potential functions for message calculation. This confers significant efficiency for learning, since otherwise when performing structured learning for a CRF with CNN potentials it is necessary to undertake expensive inference for every stochastic gradient iteration. The network output dimension for message estimation is the same as the number of classes, in contrast to the network output for general CNN potential functions in CRFs, which is exponential in the order of the potentials. Hence CNN message learning has fewer network parameters and is more scalable for cases that a large number of classes are involved. We apply our method to semantic image segmentation on the PASCAL VOC 2012 dataset. We achieve an intersection-over-union score of 73.4 on its test set, which is the best reported result for methods using the VOC training images alone. This impressive performance demonstrates the effectiveness and usefulness of our CNN message learning method.

Submitted 8 September, 2015; v1 submitted 5 June, 2015; originally announced June 2015.

Comments: 11 pages. Appearing in Proc. The Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS), 2015, Montreal, Canada

arXiv:1408.5629 [pdf, ps, other]

doi 10.1137/140981587

Quantifying the influence of conformational uncertainty in biomolecular solvation

Authors: Huan Lei, Xiu Yang, Bin Zheng, Guang Lin, Nathan A. Baker

Abstract: Biomolecules exhibit conformational fluctuations near equilibrium states, inducing uncertainty in various biological properties in a dynamic way. We have developed a general method to quantify the uncertainty of target properties induced by conformational fluctuations. Using a generalized polynomial chaos (gPC) expansion, we construct a surrogate model of the target property with respect to varying conformational states. We also propose a method to increase the sparsity of the gPC expansion by defining a set of conformational "active space" random variables. With the increased sparsity, we employ the compressive sensing method to accurately construct the surrogate model. We demonstrate the performance of the surrogate model by evaluating fluctuation-induced uncertainty in solvent-accessible surface area for the bovine trypsin inhibitor protein system and show that the new approach offers more accurate statistical information than standard Monte Carlo approaches. Furthermore, the constructed surrogate model also enables us to directly evaluate the target property under various conformational states, yielding a more accurate response surface than standard sparse grid collocation methods. In particular, the new method provides higher accuracy in high-dimensional systems, such as biomolecules, where sparse grid performance is limited by the accuracy of the computed quantity of interest. Our new framework is generalizable and can be used to investigate the uncertainty of a wide variety of target properties in biomolecular systems.

Submitted 31 August, 2015; v1 submitted 24 August, 2014; originally announced August 2014.

Comments: Accepted by Multiscale Modeling & Simulation

Journal ref: Multiscale Model. Simul. 13 (2015) 1327-1353

arXiv:1311.5947 [pdf, other]

Fast Training of Effective Multi-class Boosting Using Coordinate Descent Optimization

Authors: Guosheng Lin, Chunhua Shen, Anton van den Hengel, David Suter

Abstract: We present a novel column generation based boosting method for multi-class classification. Our multi-class boosting is formulated in a single optimization problem as in Shen and Hao (2011). Different from most existing multi-class boosting methods, which use the same set of weak learners for all the classes, we train class specified weak learners (i.e., each class has a different set of weak learners). We show that using separate weak learner sets for each class leads to fast convergence, without introducing additional computational overhead in the training procedure. To further make the training more efficient and scalable, we also propose a fast coordinate descent method for solving the optimization problem at each boosting iteration. The proposed coordinate descent method is conceptually simple and easy to implement in that it is a closed-form solution for each coordinate update. Experimental results on a variety of datasets show that, compared to a range of existing multi-class boosting methods, the proposed method has much faster convergence rate and better generalization performance in most cases. We also empirically show that the proposed fast coordinate descent algorithm needs less training time than the MultiBoost algorithm in Shen and Hao (2011).

Submitted 22 November, 2013; originally announced November 2013.

Comments: Appeared in Proc. Asian Conf. Computer Vision 2012. Code can be downloaded at http://goo.gl/WluhrQ

arXiv:1301.2677 [pdf, ps, other]

EM algorithms for estimating the Bernstein copula

Authors: Xiaoling Dou, Satoshi Kuriki, Gwo Dong Lin, Donald Richards

Abstract: A method that uses order statistics to construct multivariate distributions with fixed marginals and which utilizes a representation of the Bernstein copula in terms of a finite mixture distribution is proposed. Expectation-maximization (EM) algorithms to estimate the Bernstein copula are proposed, and a local convergence property is proved. Moreover, asymptotic properties of the proposed semiparametric estimators are provided. Illustrative examples are presented using three real data sets and a 3-dimensional simulated data set. These studies show that the Bernstein copula is able to represent various distributions flexibly and that the proposed EM algorithms work well for such data.

Submitted 15 January, 2014; v1 submitted 12 January, 2013; originally announced January 2013.

Comments: 34 pages, 7 figures, 3 tables

Showing 1–46 of 46 results for author: Lin, G