Showing 1–28 of 28 results for author: Sidford, A

Searching in archive stat.
  1. arXiv:2310.18265  [pdf, other]

    cs.DS cs.LG math.OC stat.ML

    Structured Semidefinite Programming for Recovering Structured Preconditioners

    Authors: Arun Jambulapati, Jerry Li, Christopher Musco, Kirankumar Shiragur, Aaron Sidford, Kevin Tian

    Abstract: We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems including the following. We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, com…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Merge of arXiv:1812.06295 and arXiv:2008.01722

  2. arXiv:2301.00457  [pdf, other]

    math.OC cs.CR cs.DS cs.LG stat.ML

    ReSQueing Parallel and Private Stochastic Convex Optimization

    Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Yin Tat Lee, Daogao Liu, Aaron Sidford, Kevin Tian

    Abstract: We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO obj…

    Submitted 27 October, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

  3. arXiv:2210.06728  [pdf, ps, other]

    stat.ML cs.DS cs.IT cs.LG stat.CO

    On the Efficient Implementation of High Accuracy Optimality of Profile Maximum Likelihood

    Authors: Moses Charikar, Zhihao Jiang, Kirankumar Shiragur, Aaron Sidford

    Abstract: We provide an efficient unified plug-in approach for estimating symmetric properties of distributions given $n$ independent samples. Our estimator is based on profile-maximum-likelihood (PML) and is sample optimal for estimating various symmetric properties when the estimation error $ε\gg n^{-1/3}$. This result improves upon the previous best accuracy threshold of $ε\gg n^{-1/4}$ achievable by pol…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS 2022

  4. arXiv:2203.15260  [pdf, other]

    cs.LG cs.CC cs.DS math.OC stat.ML

    Efficient Convex Optimization Requires Superlinear Memory

    Authors: Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant

    Abstract: We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, $1$-Lipschitz convex functions over the unit ball to $1/\mathrm{poly}(d)$ accuracy using at most $d^{1.25 - δ}$ bits of memory must make at least $\tilde{Ω}(d^{1 + (4/3)δ})$ first-order queries (for any constant $δ\in [0, 1/4]$). Consequently, the performance of such memory-constrained algorithms is a polyno…

    Submitted 24 July, 2024; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: 33 pages, 1 figure

  5. arXiv:2203.04002  [pdf, ps, other]

    cs.DS cs.LG math.OC stat.ML

    Semi-Random Sparse Recovery in Nearly-Linear Time

    Authors: Jonathan A. Kelner, Jerry Li, Allen Liu, Aaron Sidford, Kevin Tian

    Abstract: Sparse recovery is one of the most fundamental and well-studied inverse problems. Standard statistical formulations of the problem are provably solved by general convex programming techniques and more practical, fast (nearly-linear time) iterative methods. However, these latter "fast algorithms" have previously been observed to be brittle in various real-world settings. We investigate the brittl…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: 42 pages, comments welcome!

  6. arXiv:2111.03137  [pdf, other]

    math.OC cs.DS cs.LG stat.ML

    Big-Step-Little-Step: Efficient Gradient Methods for Objectives with Multiple Scales

    Authors: Jonathan Kelner, Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant, Honglin Yuan

    Abstract: We provide new gradient-based methods for efficiently solving a broad class of ill-conditioned optimization problems. We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex functions and provide a method which solves this problem with a number of gradient evaluatio…

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: 95 pages, 4 figures; authors are listed in alphabetical order

  7. arXiv:2011.02761  [pdf, other]

    cs.DS cs.IT cs.LG stat.CO stat.ML

    Instance Based Approximations to Profile Maximum Likelihood

    Authors: Nima Anari, Moses Charikar, Kirankumar Shiragur, Aaron Sidford

    Abstract: In this paper we provide a new efficient algorithm for approximately computing the profile maximum likelihood (PML) distribution, a prominent quantity in symmetric property estimation. We provide an algorithm which matches the previous best known efficient algorithms for computing approximate PML distributions and improves when the number of distinct observed frequencies in the given instance is s…

    Submitted 5 November, 2020; originally announced November 2020.

    Comments: Accepted at Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020)

  8. arXiv:2010.05893  [pdf, other]

    math.OC cs.LG stat.ML

    Large-Scale Methods for Distributionally Robust Optimization

    Authors: Daniel Levy, Yair Carmon, John C. Duchi, Aaron Sidford

    Abstract: We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $χ^2$ divergence uncertainty sets. We prove that our algorithms require a number of gradient evaluations independent of training set size and number of parameters, making them suitable for large-scale applications. For $χ^2$ uncertainty sets these are the first such…

    Submitted 10 December, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: 63 pages, NeurIPS 2020

  9. arXiv:2008.12776  [pdf, other]

    cs.LG cs.DS math.OC stat.ML

    Efficiently Solving MDPs with Stochastic Mirror Descent

    Authors: Yujia Jin, Aaron Sidford

    Abstract: We present a unified framework based on primal-dual stochastic mirror descent for approximately solving infinite-horizon Markov decision processes (MDPs) given a generative model. When applied to an average-reward MDP with $A_{tot}$ total state-action pairs and mixing time bound $t_{mix}$ our method computes an $ε$-optimal policy with an expected $\widetilde{O}(t_{mix}^2 A_{tot} ε^{-2})$ samples f…

    Submitted 28 August, 2020; originally announced August 2020.

    Comments: ICML 2020

  10. arXiv:2008.01722  [pdf, ps, other]

    math.OC cs.DS cs.LG stat.ML

    Fast and Near-Optimal Diagonal Preconditioning

    Authors: Arun Jambulapati, Jerry Li, Christopher Musco, Aaron Sidford, Kevin Tian

    Abstract: The convergence rates of iterative methods for solving a linear system $\mathbf{A} x = b$ typically depend on the condition number of the matrix $\mathbf{A}$. Preconditioning is a common way of speeding up these methods by reducing that condition number in a computationally inexpensive way. In this paper, we revisit the decades-old problem of how to best improve $\mathbf{A}$'s condition number by…

    Submitted 3 November, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: 46 pages

  11. arXiv:2004.02425  [pdf, other]

    cs.DS cs.IT cs.LG stat.CO stat.ML

    The Bethe and Sinkhorn Permanents of Low Rank Matrices and Implications for Profile Maximum Likelihood

    Authors: Nima Anari, Moses Charikar, Kirankumar Shiragur, Aaron Sidford

    Abstract: In this paper we consider the problem of computing the likelihood of the profile of a discrete distribution, i.e., the probability of observing the multiset of element frequencies, and computing a profile maximum likelihood (PML) distribution, i.e., a distribution with the maximum profile likelihood. For each problem we provide polynomial time algorithms that given $n$ i.i.d. samples from a discr…

    Submitted 6 April, 2020; originally announced April 2020.

    Comments: 59 pages

  12. arXiv:2003.00844  [pdf, other]

    cs.DS cs.IT cs.LG stat.CO stat.ML

    A General Framework for Symmetric Property Estimation

    Authors: Moses Charikar, Kirankumar Shiragur, Aaron Sidford

    Abstract: In this paper we provide a general framework for estimating symmetric properties of distributions from i.i.d. samples. For a broad class of symmetric properties we identify the easy region where empirical estimation works and the difficult region where more complex estimators are required. We show that by approximately computing the profile maximum likelihood (PML) distribution [ADOS16] in th…

    Submitted 2 March, 2020; originally announced March 2020.

    Comments: Published in Neural Information Processing Systems 2019

  13. arXiv:1908.11071  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

    Authors: Aaron Sidford, Mengdi Wang, Lin F. Yang, Yinyu Ye

    Abstract: In this paper, we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors. Given a stochastic game with discount factor $γ\in(0,1)$ we provide an algorithm that computes an $ε$-optimal strategy with high probability given $\tilde{O}((1 - γ)^{-3} ε^{-2})$ samples from the transition function for each state-action-pair. Our a…

    Submitted 29 August, 2019; originally announced August 2019.

  14. arXiv:1906.11985  [pdf, other]

    math.OC cs.CC cs.DS cs.LG stat.ML

    Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond

    Authors: Oliver Hinder, Aaron Sidford, Nimit S. Sohoni

    Abstract: In this paper, we provide near-optimal accelerated first-order methods for minimizing a broad class of smooth nonconvex functions that are strictly unimodal on all lines through a minimizer. This function class, which we call the class of smooth quasar-convex functions, is parameterized by a constant $γ\in (0,1]$, where $γ= 1$ encompasses the classes of smooth convex and star-convex functions, and…

    Submitted 24 February, 2023; v1 submitted 27 June, 2019; originally announced June 2019.

    Comments: 48 pages. Published as a conference paper at COLT 2020

  15. arXiv:1906.00618  [pdf, other]

    cs.DS cs.LG math.OC stat.CO stat.ML

    A Direct $\tilde{O}(1/ε)$ Iteration Parallel Algorithm for Optimal Transport

    Authors: Arun Jambulapati, Aaron Sidford, Kevin Tian

    Abstract: Optimal transportation, or computing the Wasserstein or "earth mover's" distance between two distributions, is a fundamental primitive which arises in many learning and statistical settings. We give an algorithm which solves this problem to additive $ε$ with $\tilde{O}(1/ε)$ parallel depth, and $\tilde{O}\left(n^2/ε\right)$ work. Barring a breakthrough on a long-standing algorithmic open problem…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: 23 pages, 2 figures

  16. arXiv:1905.08448  [pdf, other]

    cs.DS cs.IT cs.LG stat.CO stat.ML

    Efficient Profile Maximum Likelihood for Universal Symmetric Property Estimation

    Authors: Moses Charikar, Kirankumar Shiragur, Aaron Sidford

    Abstract: Estimating symmetric properties of a distribution, e.g. support size, coverage, entropy, distance to uniformity, are among the most fundamental problems in algorithmic statistics. While each of these properties has been studied extensively and separate optimal estimators are known for each, in striking recent work, Acharya et al. 2016 showed that there is a single estimator that is competitive fo…

    Submitted 21 May, 2019; originally announced May 2019.

    Comments: 68 pages

  17. arXiv:1904.08544  [pdf, ps, other]

    cs.LG stat.ML

    Memory-Sample Tradeoffs for Linear Regression with Small Error

    Authors: Vatsal Sharan, Aaron Sidford, Gregory Valiant

    Abstract: We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples $(a_1,b_1), (a_2,b_2), \ldots$ with $a_i$ drawn independently from a $d$-dimensional isotro…

    Submitted 10 October, 2020; v1 submitted 17 April, 2019; originally announced April 2019.

    Comments: A few minor edits over previous version

  18. arXiv:1903.02675  [pdf, other]

    cs.LG cs.DS math.OC stat.ML

    A Rank-1 Sketch for Matrix Multiplicative Weights

    Authors: Yair Carmon, John C. Duchi, Aaron Sidford, Kevin Tian

    Abstract: We show that a simple randomized sketch of the matrix multiplicative weight (MMW) update enjoys (in expectation) the same regret bounds as MMW, up to a small constant factor. Unlike MMW, where every step requires full matrix exponentiation, our steps require only a single product of the form $e^A b$, which the Lanczos method approximates efficiently. Our key technique is to view the sketch as a…

    Submitted 12 August, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

  19. arXiv:1711.08426  [pdf, ps, other]

    stat.ML cs.LG math.OC

    Leverage Score Sampling for Faster Accelerated Regression and ERM

    Authors: Naman Agarwal, Sham Kakade, Rahul Kidambi, Yin Tat Lee, Praneeth Netrapalli, Aaron Sidford

    Abstract: Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a vector $b \in\mathbb{R}^{n}$, we show how to compute an $ε$-approximate solution to the regression problem $ \min_{x\in\mathbb{R}^{d}}\frac{1}{2} \|\mathbf{A} x - b\|_{2}^{2} $ in time $ \tilde{O} ((n+\sqrt{d\cdotκ_{\text{sum}}})\cdot s\cdot\log ε^{-1}) $ where…

    Submitted 22 November, 2017; originally announced November 2017.

  20. arXiv:1710.09430  [pdf, ps, other]

    stat.ML cs.LG math.OC

    A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)

    Authors: Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Venkata Krishna Pillutla, Aaron Sidford

    Abstract: This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares. This result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite rate optimality characterization captures the constant factors and addre…

    Submitted 21 July, 2018; v1 submitted 25 October, 2017; originally announced October 2017.

    Comments: Lemma 1 has been updated in v2

  21. arXiv:1704.08227  [pdf, other]

    stat.ML cs.LG math.OC math.ST

    Accelerating Stochastic Gradient Descent For Least Squares Regression

    Authors: Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford

    Abstract: There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d'Aspremont 2008 and Devolder, Glineur, and Nesterov 2014. This work considers these issues for the special case of stoc…

    Submitted 31 July, 2018; v1 submitted 26 April, 2017; originally announced April 2017.

    Comments: 54 pages, 3 figures, 1 table; updated acknowledgements, minor title change. Paper appeared in the proceedings of the Conference on Learning Theory (COLT), 2018

  22. arXiv:1610.03774  [pdf, other]

    stat.ML cs.DS cs.LG

    Parallelizing Stochastic Gradient Descent for Least Squares Regression: mini-batching, averaging, and model misspecification

    Authors: Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford

    Abstract: This work characterizes the benefits of averaging schemes widely used in conjunction with stochastic gradient descent (SGD). In particular, this work provides a sharp analysis of: (1) mini-batching, a method of averaging many samples of a stochastic gradient to both reduce the variance of the stochastic gradient estimate and for parallelizing SGD and (2) tail-averaging, a method involving averagin…

    Submitted 31 July, 2018; v1 submitted 12 October, 2016; originally announced October 2016.

    Comments: 39 pages. Published in the Journal of Machine Learning Research (JMLR)

  23. arXiv:1604.03930  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Efficient Algorithms for Large-scale Generalized Eigenvector Computation and Canonical Correlation Analysis

    Authors: Rong Ge, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford

    Abstract: This paper considers the problem of canonical-correlation analysis (CCA) (Hotelling, 1936) and, more broadly, the generalized eigenvector problem for a pair of symmetric matrices. These are two fundamental problems in data analysis and scientific computing with numerous applications in machine learning and statistics (Shi and Malik, 2000; Hardoon et al., 2004; Witten et al., 2009). We provide si…

    Submitted 27 May, 2016; v1 submitted 13 April, 2016; originally announced April 2016.

    Comments: International Conference on Machine Learning (ICML) 2016

  24. arXiv:1602.06929  [pdf, ps, other]

    cs.LG cs.DS cs.NE stat.ML

    Streaming PCA: Matching Matrix Bernstein and Near-Optimal Finite Sample Guarantees for Oja's Algorithm

    Authors: Prateek Jain, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford

    Abstract: This work provides improved guarantees for streaming principal component analysis (PCA). Given $A_1, \ldots, A_n\in \mathbb{R}^{d\times d}$ sampled independently from distributions satisfying $\mathbb{E}[A_i] = Σ$ for $Σ\succeq \mathbf{0}$, this work provides an $O(d)$-space linear-time single-pass streaming algorithm for estimating the top eigenvector of $Σ$. The algorithm nearly matches (and in…

    Submitted 28 March, 2016; v1 submitted 22 February, 2016; originally announced February 2016.

    Comments: Updated title

  25. arXiv:1602.06872  [pdf, ps, other]

    cs.DS cs.LG stat.ML

    Principal Component Projection Without Principal Component Analysis

    Authors: Roy Frostig, Cameron Musco, Christopher Musco, Aaron Sidford

    Abstract: We show how to efficiently project a vector onto the top principal components of a matrix, without explicitly computing these components. Specifically, we introduce an iterative algorithm that provably computes the projection using few calls to any black-box routine for ridge regression. By avoiding explicit principal component analysis (PCA), our algorithm is the first with no runtime dependenc…

    Submitted 26 November, 2019; v1 submitted 22 February, 2016; originally announced February 2016.

  26. arXiv:1506.07512  [pdf, other]

    stat.ML cs.DS cs.LG

    Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization

    Authors: Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

    Abstract: We develop a family of accelerated stochastic algorithms that minimize sums of convex functions. Our algorithms improve upon the fastest running time for empirical risk minimization (ERM), and in particular linear least-squares regression, across a wide range of problem settings. To achieve this, we establish a framework based on the classical proximal point algorithm. Namely, we provide several a…

    Submitted 24 June, 2015; originally announced June 2015.

  27. arXiv:1412.6606  [pdf, other]

    stat.ML cs.LG

    Competing with the Empirical Risk Minimizer in a Single Pass

    Authors: Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

    Abstract: In many estimation problems, e.g. linear and logistic regression, we wish to minimize an unknown objective given only unbiased samples of the objective function. Furthermore, we aim to achieve this using as few samples as possible. In the absence of computational constraints, the minimizer of a sample average of observed data -- commonly referred to as either the empirical risk minimizer (ERM) or…

    Submitted 25 February, 2015; v1 submitted 20 December, 2014; originally announced December 2014.

  28. arXiv:1408.5099  [pdf, ps, other]

    cs.DS cs.LG stat.ML

    Uniform Sampling for Matrix Approximation

    Authors: Michael B. Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng, Aaron Sidford

    Abstract: Random sampling has become a critical tool in solving massive matrix problems. For linear regression, a small, manageable set of data rows can be randomly selected to approximate a tall, skinny data matrix, improving processing time significantly. For theoretical performance guarantees, each row must be sampled with probability proportional to its statistical leverage score. Unfortunately, leverag…

    Submitted 21 August, 2014; originally announced August 2014.
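
    The abstract of entry 28 notes that, for theoretical guarantees in linear regression, each row must be sampled with probability proportional to its statistical leverage score. As a rough illustration of that classical baseline only (not the uniform-sampling method the paper itself proposes), here is a minimal pure-Python sketch, assuming a small dense matrix with full column rank. It computes leverage scores exactly as $τ_i = a_i^\top (\mathbf{A}^\top \mathbf{A})^{-1} a_i$, which is the expensive step that cheaper schemes try to avoid, then samples and rescales rows. All helper names (`solve`, `gram`, `leverage_scores`, `leverage_sample`, `lstsq_normal`) are hypothetical.

```python
import random


def solve(M, rhs):
    """Solve M x = rhs for a small square nonsingular M (Gaussian elimination, partial pivoting)."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def gram(A):
    """Return A^T A for a matrix A stored as a list of rows."""
    d = len(A[0])
    return [[sum(row[j] * row[k] for row in A) for k in range(d)] for j in range(d)]


def leverage_scores(A):
    """tau_i = a_i^T (A^T A)^{-1} a_i for each row a_i; assumes full column rank."""
    G = gram(A)
    return [sum(a * y for a, y in zip(row, solve(G, row))) for row in A]


def leverage_sample(A, b, m, seed=0):
    """Draw m rows i.i.d. with p_i proportional to tau_i, rescaling each by 1/sqrt(m p_i)."""
    rng = random.Random(seed)
    tau = leverage_scores(A)
    total = sum(tau)  # equals the column rank d when A has full column rank
    probs = [t / total for t in tau]
    idx = rng.choices(range(len(A)), weights=probs, k=m)
    SA, Sb = [], []
    for i in idx:
        w = 1.0 / (m * probs[i]) ** 0.5
        SA.append([w * a for a in A[i]])
        Sb.append(w * b[i])
    return SA, Sb, tau


def lstsq_normal(A, b):
    """Least squares via the normal equations (adequate only for tiny, well-conditioned toys)."""
    d = len(A[0])
    rhs = [sum(row[j] * bi for row, bi in zip(A, b)) for j in range(d)]
    return solve(gram(A), rhs)
```

    Rescaling a sampled row $i$ by $1/\sqrt{m p_i}$ makes $\mathbb{E}[(\mathbf{S}\mathbf{A})^\top \mathbf{S}\mathbf{A}] = \mathbf{A}^\top \mathbf{A}$, so calling `lstsq_normal(SA, Sb)` on the much smaller sampled problem approximates the full least-squares solution; how close it gets depends on the number of samples `m`.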
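
    Entry 24 concerns Oja's algorithm for streaming PCA. Below is a minimal pure-Python sketch of the classical Oja update $w \leftarrow (w + η A_i w)/\|w + η A_i w\|$, assuming rank-one samples $A_i = x_i x_i^\top$ (so $A_i w = x_i (x_i^\top w)$ costs $O(d)$ time and the only state is one length-$d$ vector) and a hand-picked constant step size $η$. The step-size schedule analyzed in the paper is not reproduced here, and the function name `oja_top_eigenvector` is hypothetical.

```python
import math
import random


def oja_top_eigenvector(xs, d, eta=0.01, seed=0):
    """Estimate the top eigenvector of E[x x^T] in one pass over the vectors xs.

    Assumes rank-one samples A_i = x_i x_i^T; only the length-d iterate w is stored.
    """
    rng = random.Random(seed)
    # Random unit-norm starting vector.
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for x in xs:
        # A_i w = x (x . w) for A_i = x x^T, computed in O(d) time.
        proj = sum(xi * wi for xi, wi in zip(x, w))
        w = [wi + eta * xi * proj for wi, xi in zip(w, x)]
        # Renormalize back onto the unit sphere (never zero, since I + eta x x^T is positive definite).
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


# Toy usage: Sigma = diag(9, 1), so the top eigenvector is e_1 (up to sign).
data_rng = random.Random(1)
stream = [[3.0 * data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(3000)]
w = oja_top_eigenvector(stream, d=2)
```

    With a clear eigengap, the returned unit vector should align closely with the top eigenvector of $Σ = \mathbb{E}[x_i x_i^\top]$ up to sign; a constant step size leaves a residual error that shrinks as $η$ decreases, at the cost of slower initial progress.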