
Showing 1–27 of 27 results for author: Wibisono, A

Searching in archive cs.
  1. arXiv:2405.03472  [pdf, other]

    math.OC cs.GT cs.LG math.DS math.NA

    A Symplectic Analysis of Alternating Mirror Descent

    Authors: Jonas Katona, Xiuyuan Wang, Andre Wibisono

    Abstract: Motivated by understanding the behavior of the Alternating Mirror Descent (AMD) algorithm for bilinear zero-sum games, we study the discretization of continuous-time Hamiltonian flow via the symplectic Euler method. We provide a framework for analysis using results from Hamiltonian dynamics, Lie algebra, and symplectic numerical integrators, with an emphasis on the existence and properties of a co…

    Submitted 28 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: 94 pages, 3 figures
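    Illustrative sketch (not the paper's code): in the unconstrained Euclidean case, alternating mirror descent reduces to alternating gradient descent-ascent, which is a symplectic Euler step for the bilinear game f(x, y) = x*y. The example assumes a hypothetical step size and starting point, and checks numerically that this alternating update exactly conserves the modified quantity x^2 + y^2 - eta*x*y while x^2 + y^2 itself oscillates.

```python
def alternating_gda(x, y, eta, steps):
    """Alternating gradient descent-ascent on f(x, y) = x * y.

    x minimizes f and y maximizes f. Updating x first and then using the new x
    in the y step is the symplectic Euler discretization of the flow
    dx/dt = -y, dy/dt = x.
    """
    traj = [(x, y)]
    for _ in range(steps):
        x = x - eta * y      # descent step for the minimizing player (df/dx = y)
        y = y + eta * x      # ascent step (df/dy = x) uses the *updated* x
        traj.append((x, y))
    return traj

def modified_energy(x, y, eta):
    # Quantity conserved exactly by the alternating update above
    return x * x + y * y - eta * x * y

eta = 0.1
traj = alternating_gda(1.0, 0.0, eta, 1000)
energies = [modified_energy(x, y, eta) for x, y in traj]
print(min(energies), max(energies))
```

Replacing the alternating y step with a simultaneous one (using the old x) breaks this conservation and makes the iterates spiral outward.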

  2. arXiv:2402.17067  [pdf, ps, other]

    math.ST cs.IT stat.ML

    On Independent Samples Along the Langevin Diffusion and the Unadjusted Langevin Algorithm

    Authors: Jiaming Liang, Siddharth Mitra, Andre Wibisono

    Abstract: We study the rate at which the initial and current random variables become independent along a Markov chain, focusing on the Langevin diffusion in continuous time and the Unadjusted Langevin Algorithm (ULA) in discrete time. We measure the dependence between random variables via their mutual information. For the Langevin diffusion, we show the mutual information converges to $0$ exponentially fast…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 41 pages
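    Worked special case (an illustration under stated assumptions, not the paper's general result): for the standard Gaussian target N(0, 1), the Langevin diffusion is the Ornstein-Uhlenbeck process, and with a Gaussian start X_0 ~ N(0, sigma2) the mutual information I(X_0; X_t) has a closed form that decays like e^{-2t} for large t. The function name mi_ou_gaussian and the value sigma2 = 4 are hypothetical.

```python
import math

def mi_ou_gaussian(t, sigma2):
    """Mutual information I(X_0; X_t), in nats, for the 1-D Ornstein-Uhlenbeck
    process dX = -X dt + sqrt(2) dW (Langevin diffusion for the target N(0, 1)),
    started from X_0 ~ N(0, sigma2). Requires t > 0."""
    decay = math.exp(-2.0 * t)
    signal = decay * sigma2   # variance of the e^{-t} X_0 part of X_t
    noise = 1.0 - decay       # Var(X_t | X_0)
    # X_0 and X_t are jointly Gaussian, so I = 0.5 * log(1 + signal / noise)
    return 0.5 * math.log1p(signal / noise)

for t in (0.5, 1.0, 2.0, 4.0):
    print(t, mi_ou_gaussian(t, sigma2=4.0))
```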

  3. arXiv:2312.08823  [pdf, other]

    stat.CO cs.DS cs.LG math.ST stat.ML

    Fast sampling from constrained spaces using the Metropolis-adjusted Mirror Langevin algorithm

    Authors: Vishwak Srinivasan, Andre Wibisono, Ashia Wilson

    Abstract: We propose a new method called the Metropolis-adjusted Mirror Langevin algorithm for approximate sampling from distributions whose support is a compact and convex set. This algorithm adds an accept-reject filter to the Markov chain induced by a single step of the Mirror Langevin algorithm (Zhang et al., 2020), which is a basic discretisation of the Mirror Langevin dynamics. Due to the inclusion of…

    Submitted 21 June, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 49 pages, 6 figures, 2 tables. Shorter version without experiments accepted to COLT 2024

  4. arXiv:2309.14155  [pdf, other]

    math.OC cs.LG

    Extragradient Type Methods for Riemannian Variational Inequality Problems

    Authors: Zihao Hu, Guanghui Wang, Xi Wang, Andre Wibisono, Jacob Abernethy, Molei Tao

    Abstract: Riemannian convex optimization and minimax optimization have recently drawn considerable attention. Their appeal lies in their capacity to adeptly manage the non-convexity of the objective function as well as constraints inherent in the feasible set in the Euclidean sense. In this work, we delve into monotone Riemannian Variational Inequality Problems (RVIPs), which encompass both Riemannian conve…

    Submitted 1 June, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Published in Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)

  5. arXiv:2305.17244  [pdf, other]

    cs.LG

    Mitigating Catastrophic Forgetting in Long Short-Term Memory Networks

    Authors: Ketaki Joshi, Raghavendra Pradyumna Pothukuchi, Andre Wibisono, Abhishek Bhattacharjee

    Abstract: Continual learning on sequential data is critical for many machine learning (ML) deployments. Unfortunately, LSTM networks, which are commonly used to learn on sequential data, suffer from catastrophic forgetting and are limited in their ability to learn multiple tasks continually. We discover that catastrophic forgetting in LSTM networks can be overcome in two novel and readily-implementable ways…

    Submitted 26 May, 2023; originally announced May 2023.

  6. arXiv:2302.07851  [pdf, other]

    math.OC cs.LG

    Continuized Acceleration for Quasar Convex Functions in Non-Convex Optimization

    Authors: Jun-Kun Wang, Andre Wibisono

    Abstract: Quasar convexity is a condition that allows some first-order methods to efficiently minimize a function even when the optimization landscape is non-convex. Previous works develop near-optimal accelerated algorithms for minimizing this class of functions, however, they require a subroutine of binary search which results in multiple calls to gradient evaluations in each iteration, and consequently t…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: Accepted at ICLR (International Conference on Learning Representations), 2023

  7. arXiv:2211.01512  [pdf, ps, other]

    cs.LG math.ST

    Convergence of the Inexact Langevin Algorithm and Score-based Generative Models in KL Divergence

    Authors: Kaylee Yingxi Yang, Andre Wibisono

    Abstract: We study the Inexact Langevin Dynamics (ILD), Inexact Langevin Algorithm (ILA), and Score-based Generative Modeling (SGM) when utilizing estimated score functions for sampling. Our focus lies in establishing stable biased convergence guarantees in terms of the Kullback-Leibler (KL) divergence. To achieve these guarantees, we impose two key assumptions: 1) the target distribution satisfies the log-…

    Submitted 2 June, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

  8. arXiv:2210.16181  [pdf, ps, other]

    cs.LG cs.DC

    Aggregation in the Mirror Space (AIMS): Fast, Accurate Distributed Machine Learning in Military Settings

    Authors: Ryan Yang, Haizhou Du, Andre Wibisono, Patrick Baker

    Abstract: Distributed machine learning (DML) can be an important capability for modern military to take advantage of data and devices distributed at multiple vantage points to adapt and learn. The existing distributed machine learning frameworks, however, cannot realize the full benefits of DML, because they are all based on the simple linear aggregation framework, but linear aggregation cannot handle the…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: 9 pages. To be published in MILCOM 2022

  9. arXiv:2210.10019  [pdf, other]

    cs.LG

    Towards Understanding GD with Hard and Conjugate Pseudo-labels for Test-Time Adaptation

    Authors: Jun-Kun Wang, Andre Wibisono

    Abstract: We consider a setting that a model needs to adapt to a new domain under distribution shifts, given that only unlabeled test samples from the new domain are accessible at test time. A common idea in most of the related works is constructing pseudo-labels for the unlabeled test samples and applying gradient descent (GD) to a loss function with the pseudo-labels. Recently, \cite{GSRK22} propose conju…

    Submitted 25 February, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: Accepted at ICLR (International Conference on Learning Representations), 2023

  10. arXiv:2207.02189  [pdf, other]

    cs.LG stat.CO

    Accelerating Hamiltonian Monte Carlo via Chebyshev Integration Time

    Authors: Jun-Kun Wang, Andre Wibisono

    Abstract: Hamiltonian Monte Carlo (HMC) is a popular method in sampling. While there are quite a few works of studying this method on various aspects, an interesting question is how to choose its integration time to achieve acceleration. In this work, we consider accelerating the process of sampling from a distribution $\pi(x) \propto \exp(-f(x))$ via HMC via time-varying integration time. When the potential…

    Submitted 14 February, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

    Comments: Accepted at ICLR (International Conference on Learning Representations), 2023

  11. arXiv:2206.11872  [pdf, other]

    math.OC cs.LG

    Provable Acceleration of Heavy Ball beyond Quadratics for a Class of Polyak-Łojasiewicz Functions when the Non-Convexity is Averaged-Out

    Authors: Jun-Kun Wang, Chi-Heng Lin, Andre Wibisono, Bin Hu

    Abstract: Heavy Ball (HB) nowadays is one of the most popular momentum methods in non-convex optimization. It has been widely observed that incorporating the Heavy Ball dynamic in gradient-based methods accelerates the training process of modern machine learning models. However, the progress on establishing its theoretical foundation of acceleration is apparently far behind its empirical success. Existing p…

    Submitted 29 August, 2023; v1 submitted 22 June, 2022; originally announced June 2022.

    Comments: (ICML 2022) Proceedings of the 39th International Conference on Machine Learning
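    Sketch of the Heavy Ball update x_{k+1} = x_k - eta*grad f(x_k) + beta*(x_k - x_{k-1}) on a strongly convex quadratic, the classical setting that this paper goes beyond. The step size and momentum follow Polyak's standard tuning for quadratics with curvature in [mu, L]; the test function, parameter values, and helper names are assumptions for illustration, not the paper's setup.

```python
import math

def heavy_ball(grad, x0, eta, beta, steps):
    """Polyak's heavy ball: x_{k+1} = x_k - eta * grad(x_k) + beta * (x_k - x_{k-1})."""
    x_prev, x = list(x0), list(x0)  # start with zero momentum
    for _ in range(steps):
        g = grad(x)
        x_next = [xi - eta * gi + beta * (xi - pi) for xi, gi, pi in zip(x, g, x_prev)]
        x_prev, x = x, x_next
    return x

# Hypothetical ill-conditioned quadratic f(x) = 0.5 * (mu * x[0]**2 + L * x[1]**2), minimizer at 0
mu, L = 0.01, 1.0

def grad(x):
    return [mu * x[0], L * x[1]]

# Classical Polyak tuning for quadratics with eigenvalues in [mu, L]
eta_hb = 4.0 / (math.sqrt(L) + math.sqrt(mu)) ** 2
beta_hb = ((math.sqrt(L) - math.sqrt(mu)) / (math.sqrt(L) + math.sqrt(mu))) ** 2

x_hb = heavy_ball(grad, [1.0, 1.0], eta_hb, beta_hb, steps=100)
x_gd = heavy_ball(grad, [1.0, 1.0], 1.0 / L, 0.0, steps=100)  # beta = 0 is plain GD
print(math.hypot(*x_hb), math.hypot(*x_gd))
```

With condition number L/mu = 100, the momentum run contracts at roughly (sqrt(100) - 1)/(sqrt(100) + 1) per step, versus about 1 - mu/L for plain gradient descent.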

  12. arXiv:2206.04160  [pdf, other]

    cs.GT cs.LG math.DS

    Alternating Mirror Descent for Constrained Min-Max Games

    Authors: Andre Wibisono, Molei Tao, Georgios Piliouras

    Abstract: In this paper we study two-player bilinear zero-sum games with constrained strategy spaces. An instance of natural occurrences of such constraints is when mixed strategies are used, which correspond to a probability simplex constraint. We propose and analyze the alternating mirror descent algorithm, in which each player takes turns to take action following the mirror descent algorithm for constrai…

    Submitted 8 June, 2022; originally announced June 2022.
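    Sketch of alternating mirror descent with the entropic mirror map (that is, alternating multiplicative weights) for min_x max_y x^T A y over probability simplices, using matching pennies as a hypothetical example. This is one natural instance of the algorithm described in the abstract, not the authors' implementation; the step size and starting strategies are arbitrary.

```python
import math

def normalize(w):
    s = sum(w)
    return [wi / s for wi in w]

def alternating_mwu(A, x, y, eta, steps):
    """Alternating entropic mirror descent for min_x max_y x^T A y on simplices."""
    n, m = len(A), len(A[0])
    for _ in range(steps):
        # Minimizing player: gradient of x^T A y with respect to x is A y
        Ay = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        x = normalize([x[i] * math.exp(-eta * Ay[i]) for i in range(n)])
        # Maximizing player moves second, against the *updated* x: gradient is A^T x
        ATx = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        y = normalize([y[j] * math.exp(eta * ATx[j]) for j in range(m)])
    return x, y

A = [[1.0, -1.0], [-1.0, 1.0]]  # matching pennies
x, y = alternating_mwu(A, [0.9, 0.1], [0.2, 0.8], eta=0.1, steps=1000)
print(x, y)
```

The multiplicative form keeps every iterate strictly inside the simplex, which is how the entropic mirror map handles the probability simplex constraint.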

  13. arXiv:2201.12488  [pdf, ps, other]

    cs.LG cs.DC

    Achieving Efficient Distributed Machine Learning Using a Novel Non-Linear Class of Aggregation Functions

    Authors: Haizhou Du, Ryan Yang, Yijian Chen, Qiao Xiang, Andre Wibisono, Wei Huang

    Abstract: Distributed machine learning (DML) over time-varying networks can be an enabler for emerging decentralized ML applications such as autonomous driving and drone fleeting. However, the commonly used weighted arithmetic mean model aggregation function in existing DML systems can result in high model loss, low model accuracy, and slow convergence speed over time-varying networks. To address this issue…

    Submitted 19 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: 13 pages, 26 figures

    ACM Class: I.2.11

  14. arXiv:2109.12077  [pdf, ps, other]

    cs.DS cs.LG math.ST stat.ML

    The Mirror Langevin Algorithm Converges with Vanishing Bias

    Authors: Ruilin Li, Molei Tao, Santosh S. Vempala, Andre Wibisono

    Abstract: The technique of modifying the geometry of a problem from Euclidean to Hessian metric has proved to be quite effective in optimization, and has been the subject of study for sampling. The Mirror Langevin Diffusion (MLD) is a sampling analogue of mirror flow in continuous time, and it has nice convergence properties under log-Sobolev or Poincaré inequalities relative to the Hessian metric, as shown…

    Submitted 11 October, 2021; v1 submitted 24 September, 2021; originally announced September 2021.

  15. arXiv:1911.08418  [pdf, other]

    cs.GT

    Fast Convergence of Fictitious Play for Diagonal Payoff Matrices

    Authors: Jacob Abernethy, Kevin A. Lai, Andre Wibisono

    Abstract: Fictitious Play (FP) is a simple and natural dynamic for repeated play in zero-sum games. Proposed by Brown in 1949, FP was shown to converge to a Nash Equilibrium by Robinson in 1951, albeit at a slow rate that may depend on the dimension of the problem. In 1959, Karlin conjectured that FP converges at the more natural rate of $O(1/\sqrt{t})$. However, Daskalakis and Pan disproved a version of th…

    Submitted 15 November, 2020; v1 submitted 19 November, 2019; originally announced November 2019.
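    Minimal sketch of simultaneous fictitious play for min_x max_y x^T A y with a diagonal payoff matrix (here the hypothetical 2x2 identity). Each player best-responds to the opponent's empirical mixture so far; breaking ties toward the lowest index is an assumption made here, not a rule taken from the paper. The function records the duality gap of the empirical mixtures, which Robinson's theorem says tends to 0.

```python
def fictitious_play(A, rounds):
    """Simultaneous fictitious play for min_x max_y x^T A y.

    Returns the duality gap max_j (x_bar^T A)_j - min_i (A y_bar)_i of the
    empirical mixtures x_bar, y_bar after each round. Ties go to the lowest index.
    """
    n, m = len(A), len(A[0])
    row_counts = [0] * n
    col_counts = [0] * m
    gaps = []
    for t in range(1, rounds + 1):
        # Payoff of each pure action against the opponent's (unnormalized) history
        row_payoff = [sum(A[i][j] * col_counts[j] for j in range(m)) for i in range(n)]
        col_payoff = [sum(A[i][j] * row_counts[i] for i in range(n)) for j in range(m)]
        i_star = min(range(n), key=lambda i: row_payoff[i])  # row player minimizes
        j_star = max(range(m), key=lambda j: col_payoff[j])  # column player maximizes
        row_counts[i_star] += 1
        col_counts[j_star] += 1
        best_col = max(sum(A[i][j] * row_counts[i] for i in range(n)) for j in range(m)) / t
        best_row = min(sum(A[i][j] * col_counts[j] for j in range(m)) for i in range(n)) / t
        gaps.append(best_col - best_row)
    return gaps

gaps = fictitious_play([[1.0, 0.0], [0.0, 1.0]], 10_000)
print(gaps[9], gaps[99], gaps[-1])
```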

  16. arXiv:1911.01469  [pdf, ps, other]

    stat.ML cs.DS cs.IT cs.LG

    Proximal Langevin Algorithm: Rapid Convergence Under Isoperimetry

    Authors: Andre Wibisono

    Abstract: We study the Proximal Langevin Algorithm (PLA) for sampling from a probability distribution $\nu = e^{-f}$ on $\mathbb{R}^n$ under isoperimetry. We prove a convergence guarantee for PLA in Kullback-Leibler (KL) divergence when $\nu$ satisfies log-Sobolev inequality (LSI) and $f$ has bounded second and third derivatives. This improves on the result for the Unadjusted Langevin Algorithm (ULA), and matche…

    Submitted 4 November, 2019; originally announced November 2019.

  17. arXiv:1906.02027  [pdf, other]

    math.OC cs.GT cs.LG stat.ML

    Last-iterate convergence rates for min-max optimization

    Authors: Jacob Abernethy, Kevin A. Lai, Andre Wibisono

    Abstract: While classic work in convex-concave min-max optimization relies on average-iterate convergence results, the emergence of nonconvex applications such as training Generative Adversarial Networks has led to renewed interest in last-iterate convergence guarantees. Proving last-iterate convergence is challenging because many natural algorithms, such as Simultaneous Gradient Descent/Ascent, provably di…

    Submitted 25 October, 2019; v1 submitted 5 June, 2019; originally announced June 2019.
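    Illustration on the bilinear game f(x, y) = x*y: Simultaneous Gradient Descent/Ascent spirals outward (each step multiplies the distance to the equilibrium by sqrt(1 + eta^2)), while the extragradient method, a standard remedy shown here only for contrast and not necessarily an algorithm analyzed in this paper, converges in the last iterate. The step size and iteration count are arbitrary choices.

```python
import math

def field(x, y):
    # Descent direction field for min over x, max over y of f(x, y) = x * y:
    # (df/dx, -df/dy) = (y, -x)
    return y, -x

def sim_gda(x, y, eta, steps):
    """Simultaneous GDA: both players step from the same current point."""
    for _ in range(steps):
        gx, gy = field(x, y)
        x, y = x - eta * gx, y - eta * gy
    return math.hypot(x, y)

def extragradient(x, y, eta, steps):
    """Extragradient: take a look-ahead step, then update with the look-ahead field."""
    for _ in range(steps):
        gx, gy = field(x, y)
        xh, yh = x - eta * gx, y - eta * gy   # extrapolation step
        gx, gy = field(xh, yh)
        x, y = x - eta * gx, y - eta * gy     # update from the original point
    return math.hypot(x, y)

print(sim_gda(1.0, 1.0, 0.2, 500), extragradient(1.0, 1.0, 0.2, 500))
```

For this game one extragradient step multiplies the squared distance to the equilibrium by 1 - eta^2 + eta^4, which is below 1 for 0 < eta < 1.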

  18. arXiv:1903.08568  [pdf, other]

    cs.DS cs.LG math.PR stat.ML

    Rapid Convergence of the Unadjusted Langevin Algorithm: Isoperimetry Suffices

    Authors: Santosh S. Vempala, Andre Wibisono

    Abstract: We study the Unadjusted Langevin Algorithm (ULA) for sampling from a probability distribution $\nu = e^{-f}$ on $\mathbb{R}^n$. We prove a convergence guarantee in Kullback-Leibler (KL) divergence assuming $\nu$ satisfies a log-Sobolev inequality and the Hessian of $f$ is bounded. Notably, we do not assume convexity or bounds on higher derivatives. We also prove convergence guarantees in Rényi divergen…

    Submitted 2 March, 2022; v1 submitted 20 March, 2019; originally announced March 2019.

    Comments: v4: Updated discussion and added properties of biased limit; v3: Simplified analysis of Rényi divergence, improved exposition, and added figures; v2: Added analysis of Rényi divergence and Poincaré assumption
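    Sketch of the ULA update x_{k+1} = x_k - eta*grad f(x_k) + sqrt(2*eta)*z_k for the hypothetical one-dimensional target nu = N(0, 1), i.e. f(x) = x^2/2. In this Gaussian case the biased limit can be computed exactly: the stationary variance of ULA is 1/(1 - eta/2) instead of the target's 1. The step size, seed, and sample counts are arbitrary choices.

```python
import math
import random

def ula_gaussian(eta, steps, seed=0):
    """ULA for f(x) = x^2 / 2 (so grad f(x) = x), started at x = 0."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(steps):
        x = x - eta * x + math.sqrt(2.0 * eta) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

def ula_stationary_variance(eta):
    # Fixed point of v = (1 - eta)^2 * v + 2 * eta, valid for 0 < eta < 2
    return 1.0 / (1.0 - eta / 2.0)

eta = 0.5
samples = ula_gaussian(eta, 200_000)[1_000:]                # drop burn-in
empirical_var = sum(s * s for s in samples) / len(samples)  # the mean is 0 by symmetry
print(empirical_var, ula_stationary_variance(eta))
```

The bias vanishes only as eta goes to 0, which is why guarantees for ULA are stated up to an error that depends on the step size.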

  19. arXiv:1805.01401  [pdf, ps, other]

    cs.IT

    Convexity of mutual information along the Ornstein-Uhlenbeck flow

    Authors: Andre Wibisono, Varun Jog

    Abstract: We study the convexity of mutual information as a function of time along the flow of the Ornstein-Uhlenbeck process. We prove that if the initial distribution is strongly log-concave, then mutual information is eventually convex, i.e., convex for all large time. In particular, if the initial distribution is sufficiently strongly log-concave compared to the target Gaussian measure, then mutual info…

    Submitted 31 July, 2018; v1 submitted 3 May, 2018; originally announced May 2018.

    Comments: 12 pages, 1 figure. To appear at the International Symposium on Information Theory and Its Applications (ISITA), October 2018

  20. arXiv:1802.08089  [pdf, ps, other]

    math.OC cs.IT cs.LG stat.ML

    Sampling as optimization in the space of measures: The Langevin dynamics as a composite optimization problem

    Authors: Andre Wibisono

    Abstract: We study sampling as optimization in the space of measures. We focus on gradient flow-based optimization with the Langevin dynamics as a case study. We investigate the source of the bias of the unadjusted Langevin algorithm (ULA) in discrete time, and consider how to remove or reduce the bias. We point out the difficulty is that the heat flow is exactly solvable, but neither its forward nor backwa…

    Submitted 6 June, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

    Comments: To appear at the Conference on Learning Theory (COLT), July 2018

  21. arXiv:1801.06968  [pdf, ps, other]

    cs.IT

    Convexity of mutual information along the heat flow

    Authors: Andre Wibisono, Varun Jog

    Abstract: We study the convexity of mutual information along the evolution of the heat equation. We prove that if the initial distribution is log-concave, then mutual information is always a convex function of time. We also prove that if the initial distribution is either bounded, or has finite fourth moment and Fisher information, then mutual information is eventually convex, i.e., convex for all large tim…

    Submitted 7 May, 2018; v1 submitted 22 January, 2018; originally announced January 2018.

    Comments: 10 pages, 1 figure. To appear at the IEEE International Symposium on Information Theory (ISIT), June 2018

  22. arXiv:1702.03656  [pdf, ps, other]

    cs.IT math.ST

    Information and estimation in Fokker-Planck channels

    Authors: Andre Wibisono, Varun Jog, Po-Ling Loh

    Abstract: We study the relationship between information- and estimation-theoretic quantities in time-evolving systems. We focus on the Fokker-Planck channel defined by a general stochastic differential equation, and show that the time derivatives of entropy, KL divergence, and mutual information are characterized by estimation-theoretic quantities involving an appropriate generalization of the Fisher inform…

    Submitted 13 February, 2017; originally announced February 2017.

  23. arXiv:1603.04245  [pdf, ps, other]

    math.OC cs.LG stat.ML

    A Variational Perspective on Accelerated Methods in Optimization

    Authors: Andre Wibisono, Ashia C. Wilson, Michael I. Jordan

    Abstract: Accelerated gradient methods play a central role in optimization, achieving optimal rates in many settings. While many generalizations and extensions of Nesterov's original acceleration method have been proposed, it is not yet clear what is the natural scope of the acceleration concept. In this paper, we study accelerated methods from a continuous-time perspective. We show that there is a Lagrangi…

    Submitted 14 March, 2016; originally announced March 2016.

    Comments: 38 pages. Subsumes an earlier working draft arXiv:1509.03616

  24. arXiv:1312.2139  [pdf, ps, other]

    math.OC cs.IT stat.ML

    Optimal rates for zero-order convex optimization: the power of two function evaluations

    Authors: John C. Duchi, Michael I. Jordan, Martin J. Wainwright, Andre Wibisono

    Abstract: We consider derivative-free algorithms for stochastic and non-stochastic convex optimization problems that use only function values rather than gradients. Focusing on non-asymptotic bounds on convergence rates, we show that if pairs of function values are available, algorithms for $d$-dimensional optimization that use gradient estimates based on random perturbations suffer a factor of at most…

    Submitted 20 August, 2014; v1 submitted 7 December, 2013; originally announced December 2013.

    Comments: 34 pages
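    Sketch of a gradient estimate built from a pair of function values and a random perturbation: one common form draws a Gaussian direction u and uses (f(x + delta*u) - f(x))/delta * u. For a smooth f, averaging many such estimates approaches the true gradient. The Gaussian direction, delta, test function, and sample count are assumptions here; the paper's exact estimators and smoothing distributions may differ.

```python
import random

def two_point_gradient(f, x, delta, rng):
    """One two-point zero-order gradient estimate along a random Gaussian direction u."""
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_pert = [xi + delta * ui for xi, ui in zip(x, u)]
    scale = (f(x_pert) - f(x)) / delta  # finite difference along u, two evaluations of f
    return [scale * ui for ui in u]

def f(x):
    # Hypothetical test function with known gradient 2x
    return sum(xi * xi for xi in x)

x = [1.0, -2.0, 0.5]
rng = random.Random(0)
n = 100_000
avg = [0.0] * len(x)
for _ in range(n):
    g = two_point_gradient(f, x, delta=1e-3, rng=rng)
    avg = [a + gi / n for a, gi in zip(avg, g)]
print(avg)  # should be close to the true gradient [2.0, -4.0, 1.0]
```

A single estimate is very noisy, so a derivative-free method pays for the missing gradient with variance, and that variance grows with the dimension d.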

  25. arXiv:1307.6769  [pdf, other]

    stat.ML cs.LG

    Streaming Variational Bayes

    Authors: Tamara Broderick, Nicholas Boyd, Andre Wibisono, Ashia C. Wilson, Michael I. Jordan

    Abstract: We present SDA-Bayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior. The framework makes streaming updates to the estimated posterior according to a user-specified approximation batch primitive. We demonstrate the usefulness of our framework, with variational Bayes (VB) as the primitive, by fitting the latent Dirichlet allocation model to two large-…

    Submitted 20 November, 2013; v1 submitted 25 July, 2013; originally announced July 2013.

    Comments: 25 pages, 3 figures, 1 table
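    Sketch of streaming and distributed posterior updates in the special case where the batch primitive is exact: a conjugate Beta-Bernoulli model rather than variational Bayes. Three approaches give the same posterior: streaming batch by batch, combining independent workers by adding each worker's change in the Beta parameters to the shared prior (a consequence of Bayes' rule for conditionally independent batches), and a single pass over all the data. The data, prior, and helper name beta_update are hypothetical.

```python
def beta_update(alpha, beta, batch):
    """Exact conjugate posterior update for Bernoulli data under a Beta(alpha, beta) prior."""
    ones = sum(batch)
    return alpha + ones, beta + len(batch) - ones

prior = (1.0, 1.0)
batches = [[1, 0, 1, 1], [0, 0, 1], [1, 1, 0, 1, 0]]

# Streaming: the posterior after each batch becomes the prior for the next one
post = prior
for batch in batches:
    post = beta_update(*post, batch)

# Distributed: each worker starts from the shared prior, then the updates are combined
local = [beta_update(*prior, batch) for batch in batches]
combined = (prior[0] + sum(a - prior[0] for a, _ in local),
            prior[1] + sum(bt - prior[1] for _, bt in local))

# Reference: a single update on all the data at once
all_data = [v for batch in batches for v in batch]
batch_post = beta_update(*prior, all_data)
print(post, combined, batch_post)
```

With an approximate primitive such as variational Bayes, the same combination rule is applied to the approximate posteriors, so the three results need no longer agree exactly.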

  26. arXiv:1210.4251  [pdf]

    cs.DC cs.CE q-bio.BM

    Performance Analysis Cluster and GPU Computing Environment on Molecular Dynamic Simulation of BRV-1 and REM2 with GROMACS

    Authors: Heru Suhartanto, Arry Yanuar, Ari Wibisono

    Abstract: One of application that needs high performance computing resources is molecular dynamic. There is some software available that perform molecular dynamic, one of these is a well known GROMACS. Our previous experiment simulating molecular dynamics of Indonesian grown herbal compounds show sufficient speed up on 32 nodes Cluster computing environment. In order to obtain a reliable simulation, one u…

    Submitted 16 October, 2012; originally announced October 2012.

    Comments: 5 pages, 1 figure, 5 tables

    Journal ref: Int. J. Comp. Sci. Issue (2011), Vol. 8, Issue 4, No 2, p131-135

  27. arXiv:1202.2585  [pdf, ps, other]

    q-fin.CP cs.GT q-fin.PR

    Minimax Option Pricing Meets Black-Scholes in the Limit

    Authors: Jacob Abernethy, Rafael M. Frongillo, Andre Wibisono

    Abstract: Option contracts are a type of financial derivative that allow investors to hedge risk and speculate on the variation of an asset's future market price. In short, an option has a particular payout that is based on the market price for an asset on a given date in the future. In 1973, Black and Scholes proposed a valuation model for options that essentially estimates the tail risk of the asset price…

    Submitted 12 February, 2012; originally announced February 2012.

    Comments: 19 pages