Showing 1–50 of 56 results for author: Gasnikov, A

Searching in archive cs.

  1. arXiv:2408.17196  [pdf, other]

    math.OC cs.GT

    An Equilibrium Dynamic Traffic Assignment Model with Linear Programming Formulation

    Authors: Victoria Guseva, Ilya Sklonin, Irina Podlipnova, Demyan Yarmoshik, Alexander Gasnikov

    Abstract: In this paper, we consider a dynamic equilibrium transportation problem. There is a fixed number of cars moving from origin to destination areas. Preferences for arrival times are expressed as a cost of arriving before or after the preferred time at the destination. Each driver aims to minimize the time spent during the trip, making the time spent a measure of cost. The chosen routes and departure… ▽ More

    Submitted 30 August, 2024; originally announced August 2024.

  2. arXiv:2408.05606  [pdf, other]

    cs.IR cs.AI cs.LG math.OC

    Exploring Applications of State Space Models and Advanced Training Techniques in Sequential Recommendations: A Comparative Study on Efficiency and Performance

    Authors: Mark Obozov, Makar Baderko, Stepan Kulibaba, Nikolay Kutuzov, Alexander Gasnikov

    Abstract: Recommender systems aim to estimate the dynamically changing user preferences and sequential dependencies between historical user behaviour and metadata. Although transformer-based models have proven to be effective in sequential recommendations, their state growth is proportional to the length of the sequence that is being processed, which makes them expensive in terms of memory and inference cos… ▽ More

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.07691 by other authors

  3. arXiv:2406.04443  [pdf, other]

    cs.LG math.OC

    Gradient Clipping Improves AdaGrad when the Noise Is Heavy-Tailed

    Authors: Savelii Chezhegov, Yaroslav Klyukin, Andrei Semenov, Aleksandr Beznosikov, Alexander Gasnikov, Samuel Horváth, Martin Takáč, Eduard Gorbunov

    Abstract: Methods with adaptive stepsizes, such as AdaGrad and Adam, are essential for training modern Deep Learning models, especially Large Language Models. Typically, the noise in the stochastic gradients is heavy-tailed for the latter ones. Gradient clipping provably helps to achieve good high-probability convergence for such noises. However, despite the similarity between AdaGrad/Adam and Clip-SGD, the… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 37 pages, 8 figures

  4. arXiv:2406.00846  [pdf, other]

    cs.LG cs.DC math.OC

    Local Methods with Adaptivity via Scaling

    Authors: Savelii Chezhegov, Sergey Skorik, Nikolas Khachaturov, Danil Shalagin, Aram Avetisyan, Aleksandr Beznosikov, Martin Takáč, Yaroslav Kholodov, Alexander Gasnikov

    Abstract: The rapid development of machine learning and deep learning has introduced increasingly complex optimization challenges that must be addressed. Indeed, training modern, advanced models has become difficult to implement without leveraging multiple computing nodes in a distributed environment. Distributed optimization is also fundamental to emerging fields such as federated learning. Specifically, t… ▽ More

    Submitted 12 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

    Comments: 41 pages, 2 algorithms, 6 figures, 1 table

  5. arXiv:2405.18031  [pdf, other]

    math.OC cs.LG

    Lower Bounds and Optimal Algorithms for Non-Smooth Convex Decentralized Optimization over Time-Varying Networks

    Authors: Dmitry Kovalev, Ekaterina Borodich, Alexander Gasnikov, Dmitrii Feoktistov

    Abstract: We consider the task of minimizing the sum of convex functions stored in a decentralized manner across the nodes of a communication network. This problem is relatively well-studied in the scenario when the objective functions are smooth, or the links of the network are fixed in time, or both. In particular, lower bounds on the number of decentralized communications and (sub)gradient computations r… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  6. arXiv:2404.03323  [pdf, other]

    cs.CV cs.AI

    Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning

    Authors: Andrei Semenov, Vladimir Ivanov, Aleksandr Beznosikov, Alexander Gasnikov

    Abstract: We propose a novel architecture and method of explainable classification with Concept Bottleneck Models (CBMs). While SOTA approaches to the Image Classification task work as a black box, there is a growing demand for models that would provide interpreted results. Such models often learn to predict the distribution over class labels using additional descriptions of the target instances, called conce… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: 23 pages, 1 algorithm, 36 figures

    MSC Class: I.2.6; I.2.10; I.4.10; I.5.1; I.5.4; I.5.5 ACM Class: I.2.6; I.2.10; I.4.10; I.5.1; I.5.4; I.5.5

  7. arXiv:2403.13117  [pdf, other]

    stat.ML cs.LG

    Optimal Flow Matching: Learning Straight Trajectories in Just One Step

    Authors: Nikita Kornilov, Petr Mokrov, Alexander Gasnikov, Alexander Korotin

    Abstract: Over the several recent years, there has been a boom in development of Flow Matching (FM) methods for generative modeling. One intriguing property pursued by the community is the ability to learn flows with straight trajectories which realize the Optimal Transport (OT) displacements. Straightness is crucial for the fast integration (inference) of the learned flow's paths. Unfortunately, most exist… ▽ More

    Submitted 25 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  8. arXiv:2402.05264  [pdf, other]

    cs.LG math.OC

    AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size

    Authors: Petr Ostroukhov, Aigerim Zhumabayeva, Chulu Xiang, Alexander Gasnikov, Martin Takáč, Dmitry Kamzolov

    Abstract: This paper presents a novel adaptation of the Stochastic Gradient Descent (SGD), termed AdaBatchGrad. This modification seamlessly integrates an adaptive step size with an adjustable batch size. An increase in batch size and a decrease in step size are well-known techniques to tighten the area of convergence of SGD and decrease its variance. A range of studies by R. Byrd and J. Nocedal introduced… ▽ More

    Submitted 7 February, 2024; originally announced February 2024.

  9. Optimal Data Splitting in Distributed Optimization for Machine Learning

    Authors: Daniil Medyakov, Gleb Molodtsov, Aleksandr Beznosikov, Alexander Gasnikov

    Abstract: The distributed optimization problem has become increasingly relevant recently. It has a lot of advantages such as processing a large amount of data in less time compared to non-distributed methods. However, most distributed approaches suffer from a significant bottleneck - the cost of communications. Therefore, a large amount of research has recently been directed at solving this problem. One suc… ▽ More

    Submitted 26 March, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: 17 pages, 2 figures

  10. arXiv:2401.07788  [pdf, other]

    cs.LG cs.DC math.OC

    Activations and Gradients Compression for Model-Parallel Training

    Authors: Mikhail Rudakov, Aleksandr Beznosikov, Yaroslav Kholodov, Alexander Gasnikov

    Abstract: Large neural networks require enormous computational clusters of machines. Model-parallel training, when the model architecture is partitioned sequentially between workers, is a popular approach for training modern models. Information compression can be applied to decrease workers' communication time, as it is often a bottleneck in such systems. This work explores how simultaneous compression of ac… ▽ More

    Submitted 26 March, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: 17 pages, 6 figures, 5 tables

  11. arXiv:2311.04161  [pdf, other]

    math.OC cs.DS cs.LG math.ST

    Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems

    Authors: Nikita Puchkin, Eduard Gorbunov, Nikolay Kutuzov, Alexander Gasnikov

    Abstract: We consider stochastic optimization problems with heavy-tailed noise with structured density. For such problems, we show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-2(α- 1)/α})$, when the stochastic gradients have finite moments of order $α\in (1, 2]$. In particular, our analysis allows the noise norm to have an unbounded expectation. To achieve these results, we s… ▽ More

    Submitted 17 April, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: AISTATS 2024. 60 pages, 3 figures. Changes in V2: small typos were fixed, extra experiments and discussion were added. Code: https://github.com/Kutuz4/AISTATS2024_SMoM

  12. arXiv:2310.01860  [pdf, other]

    math.OC cs.LG

    High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise

    Authors: Eduard Gorbunov, Abdurakhmon Sadiev, Marina Danilova, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik

    Abstract: High-probability analysis of stochastic first-order optimization methods under mild assumptions on the noise has been gaining a lot of attention in recent years. Typically, gradient clipping is one of the key algorithmic ingredients to derive good high-probability guarantees when the noise is heavy-tailed. However, if implemented naïvely, clipping can spoil the convergence of the popular methods f… ▽ More

    Submitted 24 July, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: ICML 2024; changes in version 2: minor corrections (typos were fixed and the structure was modified)

  13. arXiv:2305.15938  [pdf, ps, other]

    math.OC cs.LG stat.ML

    First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities

    Authors: Aleksandr Beznosikov, Sergey Samsonov, Marina Sheshukova, Alexander Gasnikov, Alexey Naumov, Eric Moulines

    Abstract: This paper delves into stochastic optimization problems that involve Markovian noise. We present a unified approach for the theoretical analysis of first-order gradient methods for stochastic optimization and variational inequalities. Our approach covers scenarios for both non-convex and strongly convex minimization problems. To achieve an optimal (linear) dependence on the mixing time of the unde… ▽ More

    Submitted 30 March, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Appears in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). 41 pages, 3 algorithms, 2 tables

    Journal ref: https://proceedings.neurips.cc/paper_files/paper/2023/hash/8c3e38ce55a0fa44bc325bc6fdb7f4e5-Abstract-Conference.html

  14. arXiv:2305.06743  [pdf, other]

    cs.LG math.OC stat.ML

    Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits

    Authors: Yuriy Dorn, Nikita Kornilov, Nikolay Kutuzov, Alexander Nazin, Eduard Gorbunov, Alexander Gasnikov

    Abstract: The Implicitly Normalized Forecaster (INF) algorithm is considered to be an optimal solution for adversarial multi-armed bandit (MAB) problems. However, most of the existing complexity results for INF rely on restrictive assumptions, such as bounded rewards. Recently, a related algorithm was proposed that works for both adversarial and stochastic heavy-tailed MAB settings. However, this algorithm… ▽ More

    Submitted 26 December, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

  15. arXiv:2302.07615  [pdf, other]

    math.OC cs.DC cs.GT cs.LG stat.ML

    Similarity, Compression and Local Steps: Three Pillars of Efficient Communications for Distributed Variational Inequalities

    Authors: Aleksandr Beznosikov, Martin Takáč, Alexander Gasnikov

    Abstract: Variational inequalities are a broad and flexible class of problems that includes minimization, saddle point, and fixed point problems as special cases. Therefore, variational inequalities are used in various applications ranging from equilibrium search to adversarial learning. With the increasing size of data and models, today's instances demand parallel and distributed computing for real-world m… ▽ More

    Submitted 30 March, 2024; v1 submitted 15 February, 2023; originally announced February 2023.

    Comments: Appears in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023) (https://proceedings.neurips.cc/paper_files/paper/2023/hash/5b4a459db23e6db9be2a128380953d96-Abstract-Conference.html). 36 pages, 3 algorithms, 1 figure, 1 table

  16. arXiv:2302.00999  [pdf, ps, other]

    math.OC cs.LG

    High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance

    Authors: Abdurakhmon Sadiev, Marina Danilova, Eduard Gorbunov, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik

    Abstract: During recent years the interest of optimization and machine learning communities in high-probability convergence of stochastic optimization methods has been growing. One of the main reasons for this is that high-probability complexity bounds are more accurate and less studied than in-expectation ones. However, SOTA high-probability non-asymptotic convergence results are derived under strong assum… ▽ More

    Submitted 18 July, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: ICML 2023. 86 pages. Changes in v2: ICML formatting was applied along with minor edits of the text

  17. arXiv:2212.14439  [pdf, other]

    math.OC cs.LG

    An Optimal Algorithm for Strongly Convex Min-min Optimization

    Authors: Alexander Gasnikov, Dmitry Kovalev, Grigory Malinovsky

    Abstract: In this paper we study the smooth strongly convex minimization problem $\min_{x}\min_y f(x,y)$. The existing optimal first-order methods require $\mathcal{O}(\sqrt{\max\{κ_x,κ_y\}} \log 1/ε)$ of computations of both $\nabla_x f(x,y)$ and $\nabla_y f(x,y)$, where $κ_x$ and $κ_y$ are condition numbers with respect to variable blocks $x$ and $y$. We propose a new algorithm that only requires… ▽ More

    Submitted 8 February, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

    Comments: 12 pages, 2 figures, 1 algorithm

  18. SARAH-based Variance-reduced Algorithm for Stochastic Finite-sum Cocoercive Variational Inequalities

    Authors: Aleksandr Beznosikov, Alexander Gasnikov

    Abstract: Variational inequalities are a broad formalism that encompasses a vast number of applications. Motivated by applications in machine learning and beyond, stochastic methods are of great importance. In this paper we consider the problem of stochastic finite-sum cocoercive variational inequalities. For this class of problems, we investigate the convergence of the method based on the SARAH variance re… ▽ More

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: 11 pages, 1 algorithm, 1 figure, 1 theorem

  19. arXiv:2208.13592  [pdf, ps, other]

    math.OC cs.GT cs.LG stat.ML

    Smooth Monotone Stochastic Variational Inequalities and Saddle Point Problems: A Survey

    Authors: Aleksandr Beznosikov, Boris Polyak, Eduard Gorbunov, Dmitry Kovalev, Alexander Gasnikov

    Abstract: This paper is a survey of methods for solving smooth (strongly) monotone stochastic variational inequalities. To begin with, we give the deterministic foundation from which the stochastic methods eventually evolved. Then we review methods for the general stochastic formulation, and look at the finite sum setup. The last parts of the paper are devoted to various recent (not necessarily stochastic)… ▽ More

    Submitted 2 April, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

    Comments: 12 pages

  20. Compression and Data Similarity: Combination of Two Techniques for Communication-Efficient Solving of Distributed Variational Inequalities

    Authors: Aleksandr Beznosikov, Alexander Gasnikov

    Abstract: Variational inequalities are an important tool, which includes minimization, saddles, games, fixed-point problems. Modern large-scale and computationally expensive practical applications make distributed methods for solving these problems popular. Meanwhile, most distributed systems have a basic problem - a communication bottleneck. There are various techniques to deal with it. In particular, in t… ▽ More

    Submitted 3 September, 2022; v1 submitted 19 June, 2022; originally announced June 2022.

    Comments: v2: minor changes. 19 pages, 1 algorithm, 1 figure, 1 table, 1 theorem

  21. arXiv:2206.08303  [pdf, other]

    cs.LG math.OC

    On Scaled Methods for Saddle Point Problems

    Authors: Aleksandr Beznosikov, Aibek Alanov, Dmitry Kovalev, Martin Takáč, Alexander Gasnikov

    Abstract: Methods with adaptive scaling of different features play a key role in solving saddle point problems, primarily due to Adam's popularity for solving adversarial machine learning problems, including GAN training. This paper carries out a theoretical analysis of the following scaling techniques for solving SPPs: the well-known Adam and RmsProp scaling and the newer AdaHessian and OASIS based on Hut… ▽ More

    Submitted 21 June, 2023; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: 54 pages, 2 algorithms with 4 options for each, 12 figures, 5 tables, 2 theorems

  22. arXiv:2206.01666  [pdf, other]

    math.OC cs.LG

    Algorithm for Constrained Markov Decision Process with Linear Convergence

    Authors: Egor Gladin, Maksim Lavrik-Karmazin, Karina Zainullina, Varvara Rudenko, Alexander Gasnikov, Martin Takáč

    Abstract: The problem of constrained Markov decision process is considered. An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs (the number of constraints is relatively small). A new dual approach is proposed with the integration of two ingredients: entropy regularized policy optimizer and Vaidya's dual optimizer, both of which are critical to ac… ▽ More

    Submitted 19 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: 27 pages, 2 figures, 3 tables. Improved presentation of the material, added a table with results, stated contributions more clearly, changed article template

  23. arXiv:2206.01095  [pdf, other]

    math.OC cs.LG

    Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise

    Authors: Eduard Gorbunov, Marina Danilova, David Dobre, Pavel Dvurechensky, Alexander Gasnikov, Gauthier Gidel

    Abstract: Stochastic first-order methods such as Stochastic Extragradient (SEG) or Stochastic Gradient Descent-Ascent (SGDA) for solving smooth minimax problems and, more generally, variational inequality problems (VIP) have been gaining a lot of attention in recent years due to the growing popularity of adversarial formulations in machine learning. However, while high-probability convergence bounds are kno… ▽ More

    Submitted 1 November, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022. 74 pages, 18 figures. Changes in v2: a few typos were fixed, new experiments with clipped-SEG were added. Code: https://github.com/busycalibrating/clipped-stochastic-methods

  24. arXiv:2205.15136  [pdf, other]

    math.OC cs.DC cs.LG

    Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity

    Authors: Dmitry Kovalev, Aleksandr Beznosikov, Ekaterina Borodich, Alexander Gasnikov, Gesualdo Scutari

    Abstract: We study structured convex optimization problems, with additive objective $r:=p + q$, where $r$ is ($μ$-strongly) convex, $q$ is $L_q$-smooth and convex, and $p$ is $L_p$-smooth, possibly nonconvex. For such a class of problems, we proposed an inexact accelerated gradient sliding method that can skip the gradient computation for one of these components while still achieving optimal complexity of g… ▽ More

    Submitted 30 May, 2022; originally announced May 2022.

    Comments: 24 pages, 2 new algorithms, 12 theorems, 2 figures

  25. arXiv:2205.09647  [pdf, other]

    math.OC cs.LG

    The First Optimal Acceleration of High-Order Methods in Smooth Convex Optimization

    Authors: Dmitry Kovalev, Alexander Gasnikov

    Abstract: In this paper, we study the fundamental open question of finding the optimal high-order algorithm for solving smooth convex minimization problems. Arjevani et al. (2019) established the lower bound $Ω\left(ε^{-2/(3p+1)}\right)$ on the number of the $p$-th order oracle calls required by an algorithm to find an $ε$-accurate solution to the problem, where the $p$-th order oracle stands for the comput… ▽ More

    Submitted 19 May, 2022; originally announced May 2022.

  26. arXiv:2205.05653  [pdf, other]

    math.OC cs.LG

    The First Optimal Algorithm for Smooth and Strongly-Convex-Strongly-Concave Minimax Optimization

    Authors: Dmitry Kovalev, Alexander Gasnikov

    Abstract: In this paper, we revisit the smooth and strongly-convex-strongly-concave minimax optimization problem. Zhang et al. (2021) and Ibrahim et al. (2020) established the lower bound $Ω\left(\sqrt{κ_xκ_y} \log \frac{1}ε\right)$ on the number of gradient evaluations required to find an $ε$-accurate solution, where $κ_x$ and $κ_y$ are condition numbers for the strong convexity and strong concavity assump… ▽ More

    Submitted 11 May, 2022; originally announced May 2022.

  27. arXiv:2202.02771  [pdf, other]

    math.OC cs.DC cs.LG

    Optimal Algorithms for Decentralized Stochastic Variational Inequalities

    Authors: Dmitry Kovalev, Aleksandr Beznosikov, Abdurakhmon Sadiev, Michael Persiianov, Peter Richtárik, Alexander Gasnikov

    Abstract: Variational inequalities are a formalism that includes games, minimization, saddle point, and equilibrium problems as special cases. Methods for variational inequalities are therefore universal approaches for many applied tasks, including machine learning problems. This work concentrates on the decentralized setting, which is increasingly important but not well understood. In particular, we consid… ▽ More

    Submitted 2 April, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

    Comments: Appears in: Advances in Neural Information Processing Systems 35 (NeurIPS 2022). Minor modifications with respect to the NeurIPS version. 58 pages, 6 algorithms, 9 figures, 4 tables

    Journal ref: https://proceedings.neurips.cc/paper_files/paper/2022/hash/c959bb2cb164d37569a17fa67494d69a-Abstract-Conference.html

  28. arXiv:2112.15199  [pdf, ps, other]

    math.OC cs.LG

    Accelerated Primal-Dual Gradient Method for Smooth and Convex-Concave Saddle-Point Problems with Bilinear Coupling

    Authors: Dmitry Kovalev, Alexander Gasnikov, Peter Richtárik

    Abstract: In this paper we study the convex-concave saddle-point problem $\min_x \max_y f(x) + y^T \mathbf{A} x - g(y)$, where $f(x)$ and $g(y)$ are smooth and convex functions. We propose an Accelerated Primal-Dual Gradient Method (APDG) for solving this problem, achieving (i) an optimal linear convergence rate in the strongly-convex-strongly-concave regime, matching the lower complexity bound (Zhang et al… ▽ More

    Submitted 9 March, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

  29. arXiv:2110.12347  [pdf, other]

    math.OC cs.LG

    Acceleration in Distributed Optimization under Similarity

    Authors: Ye Tian, Gesualdo Scutari, Tianyu Cao, Alexander Gasnikov

    Abstract: We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes. The loss functions of the agents are assumed to be \textit{similar}, due to statistical data similarity or otherwise. In order to reduce the number of communications to reach a solution accuracy, we proposed a {\it preconditioned, accelerated} distributed method. An $\varepsilon$-solut… ▽ More

    Submitted 9 April, 2022; v1 submitted 24 October, 2021; originally announced October 2021.

  30. arXiv:2110.03313  [pdf, other]

    cs.LG stat.ML

    Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees

    Authors: Aleksandr Beznosikov, Peter Richtárik, Michael Diskin, Max Ryabinin, Alexander Gasnikov

    Abstract: Variational inequalities in general and saddle point problems in particular are increasingly relevant in machine learning applications, including adversarial learning, GANs, transport and robust optimization. With increasing data and problem sizes necessary to train high performing models across various applications, we need to rely on parallel and distributed computing. However, in distributed tr… ▽ More

    Submitted 2 April, 2023; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Appears in: Advances in Neural Information Processing Systems 35 (NeurIPS 2022). Minor modifications with respect to the NeurIPS version. 73 pages, 9 algorithms, 2 figures, 2 tables

    Journal ref: https://proceedings.neurips.cc/paper_files/paper/2022/hash/5ac1428c23b5da5e66d029646ea3206d-Abstract-Conference.html

  31. arXiv:2107.10706  [pdf, other]

    math.OC cs.DC cs.LG

    Distributed Saddle-Point Problems Under Similarity

    Authors: Aleksandr Beznosikov, Gesualdo Scutari, Alexander Rogozin, Alexander Gasnikov

    Abstract: We study solution methods for (strongly-)convex-(strongly-)concave Saddle-Point Problems (SPPs) over networks of two types - master/workers (thus centralized) architectures and meshed (thus decentralized) networks. The local functions at each node are assumed to be similar, due to statistical data similarity or otherwise. We establish lower complexity bounds for a fairly general class of algorithms… ▽ More

    Submitted 22 August, 2022; v1 submitted 22 July, 2021; originally announced July 2021.

    Comments: Appears in: Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Minor modifications with respect to the NeurIPS version. 35 pages, 3 algorithms, 4 figures, 1 table

    Journal ref: https://proceedings.neurips.cc/paper/2021/hash/44e65d3e9bc2f88b2b3d566de51a5381-Abstract.html

  32. arXiv:2107.05951  [pdf, other]

    math.OC cs.DC

    One-Point Gradient-Free Methods for Composite Optimization with Applications to Distributed Optimization

    Authors: Ivan Stepanov, Artyom Voronov, Aleksandr Beznosikov, Alexander Gasnikov

    Abstract: This work is devoted to solving the composite optimization problem with the mixture oracle: for the smooth part of the problem, we have access to the gradient, and for the non-smooth part, only to the one-point zero-order oracle. For such a setup, we present a new method based on the sliding algorithm. Our method allows one to separate the oracle complexities and compute the gradient for one of the fu… ▽ More

    Submitted 16 February, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: New in v2: completely new text of the paper; 26 pages, 1 figure, 2 tables, 1 algorithm

  33. arXiv:2106.08315  [pdf, other]

    math.OC cs.DC cs.LG

    Decentralized Local Stochastic Extra-Gradient for Variational Inequalities

    Authors: Aleksandr Beznosikov, Pavel Dvurechensky, Anastasia Koloskova, Valentin Samokhin, Sebastian U Stich, Alexander Gasnikov

    Abstract: We consider distributed stochastic variational inequalities (VIs) on unbounded domains with the problem data that is heterogeneous (non-IID) and distributed across many devices. We make a very general assumption on the computational network that, in particular, covers the settings of fully decentralized calculations with time-varying networks and centralized topologies commonly used in Federated L… ▽ More

    Submitted 2 April, 2023; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: Appears in: Advances in Neural Information Processing Systems 35 (NeurIPS 2022). Minor modifications with respect to the NeurIPS version. 43 pages, 1 algorithm, 6 figures, 2 tables

    Journal ref: https://proceedings.neurips.cc/paper_files/paper/2022/hash/f9379afacdbabfdc6b060972b60f9ab8-Abstract-Conference.html

  34. arXiv:2106.07289  [pdf, other]

    cs.LG cs.DC math.OC

    Decentralized Personalized Federated Learning for Min-Max Problems

    Authors: Ekaterina Borodich, Aleksandr Beznosikov, Abdurakhmon Sadiev, Vadim Sushko, Nikolay Savelyev, Martin Takáč, Alexander Gasnikov

    Abstract: Personalized Federated Learning (PFL) has witnessed remarkable advancements, enabling the development of innovative machine learning applications that preserve the privacy of training data. However, existing theoretical research in this field has primarily focused on distributed optimization for minimization problems. This paper is the first to study PFL for saddle point problems encompassing a br… ▽ More

    Submitted 17 April, 2024; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: 33 pages, 3 algorithms, 5 figures, 2 tables

  35. arXiv:2106.05958  [pdf, other]

    math.OC cs.LG

    High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise

    Authors: Eduard Gorbunov, Marina Danilova, Innokentiy Shibaev, Pavel Dvurechensky, Alexander Gasnikov

    Abstract: Stochastic first-order methods are standard for training large-scale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it is essential to theoretically guarantee that algorithms provide small objective residual with… ▽ More

    Submitted 30 August, 2024; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: 61 pages, 12 figures. Changes in V2: different presentation of the results, different structure, new experiments. Changes in V3: some typos were fixed

  36. arXiv:2106.04469  [pdf, other]

    math.OC cs.LG

    Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks

    Authors: Dmitry Kovalev, Elnur Gasanov, Peter Richtárik, Alexander Gasnikov

    Abstract: We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network whose links are allowed to change in time. We solve two fundamental problems for this task. First, we establish the first lower bounds on the number of decentralized communication rounds and the number of local computations required to find… ▽ More

    Submitted 8 June, 2021; originally announced June 2021.

  37. arXiv:2102.09234  [pdf, other]

    math.OC cs.LG

    ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks

    Authors: Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Alexander Rogozin, Alexander Gasnikov

    Abstract: We propose ADOM - an accelerated method for smooth and strongly convex decentralized optimization over time-varying networks. ADOM uses a dual oracle, i.e., we assume access to the gradient of the Fenchel conjugate of the individual loss functions. Up to a constant factor, which depends on the network structure only, its communication complexity is the same as that of accelerated Nesterov gradient… ▽ More

    Submitted 18 February, 2021; originally announced February 2021.

  38. Decentralized Distributed Optimization for Saddle Point Problems

    Authors: Alexander Rogozin, Aleksandr Beznosikov, Darina Dvinskikh, Dmitry Kovalev, Pavel Dvurechensky, Alexander Gasnikov

    Abstract: We consider distributed convex-concave saddle point problems over arbitrary connected undirected networks and propose a decentralized distributed algorithm for their solution. The local functions distributed across the nodes are assumed to have global and local groups of variables. For the proposed algorithm we prove non-asymptotic convergence rate estimates with explicit dependence on the network…

    Submitted 9 April, 2024; v1 submitted 15 February, 2021; originally announced February 2021.

  39. arXiv:2102.06780  [pdf, ps, other]

    math.OC cs.DC

    Newton Method over Networks is Fast up to the Statistical Precision

    Authors: Amir Daneshmand, Gesualdo Scutari, Pavel Dvurechensky, Alexander Gasnikov

    Abstract: We propose a distributed cubic regularization of the Newton method for solving (constrained) empirical risk minimization problems over a network of agents, modeled as an undirected graph. The algorithm employs an inexact, preconditioned Newton step at each agent's side: the gradient of the centralized loss is iteratively estimated via a gradient-tracking consensus mechanism and the Hessian is subsamp…

    Submitted 16 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: In proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  40. arXiv:2012.06188  [pdf, ps, other]

    math.OC cs.LG

    Recent Theoretical Advances in Non-Convex Optimization

    Authors: Marina Danilova, Pavel Dvurechensky, Alexander Gasnikov, Eduard Gorbunov, Sergey Guminov, Dmitry Kamzolov, Innokentiy Shibaev

    Abstract: Motivated by recent increased interest in optimization algorithms for non-convex optimization in application to training deep neural networks and other optimization problems in data analysis, we give an overview of recent theoretical results on global performance guarantees of optimization algorithms for non-convex optimization. We start with classical arguments showing that general non-convex pro…

    Submitted 26 November, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

    Comments: 81 pages

  41. arXiv:2010.13112  [pdf, other]

    cs.LG cs.DC math.OC

    Distributed Saddle-Point Problems: Lower Bounds, Near-Optimal and Robust Algorithms

    Authors: Aleksandr Beznosikov, Valentin Samokhin, Alexander Gasnikov

    Abstract: This paper focuses on the distributed optimization of stochastic saddle point problems. The first part of the paper is devoted to lower bounds for the centralized and decentralized distributed methods for smooth (strongly) convex-(strongly) concave saddle-point problems as well as the near-optimal algorithms by which these bounds are achieved. Next, we present a new federated algorithm for centraliz…

    Submitted 8 July, 2022; v1 submitted 25 October, 2020; originally announced October 2020.

    Comments: 52 pages, 9 figures, 1 table, 4 algorithms (3 new)

  42. Zeroth-Order Algorithms for Smooth Saddle-Point Problems

    Authors: Abdurakhmon Sadiev, Aleksandr Beznosikov, Pavel Dvurechensky, Alexander Gasnikov

    Abstract: Saddle-point problems have recently gained increased attention from the machine learning community, mainly due to applications in training Generative Adversarial Networks using stochastic gradients. At the same time, in some applications only a zeroth-order oracle is available. In this paper, we propose several algorithms to solve stochastic smooth (strongly) convex-concave saddle-point problems u…

    Submitted 27 February, 2021; v1 submitted 21 September, 2020; originally announced September 2020.

  43. arXiv:2006.06763  [pdf, other]

    math.OC cs.LG stat.ML

    Stochastic Saddle-Point Optimization for Wasserstein Barycenters

    Authors: Daniil Tiapkin, Alexander Gasnikov, Pavel Dvurechensky

    Abstract: We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data. This leads to a complicated stochastic optimization problem where the objective is given as an expectation of a function given as a solution to a random optimization problem. We employ the structure of the problem and obtain a conv…

    Submitted 2 December, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

  44. arXiv:2005.10785  [pdf, other]

    math.OC cs.LG

    Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping

    Authors: Eduard Gorbunov, Marina Danilova, Alexander Gasnikov

    Abstract: In this paper, we propose a new accelerated stochastic first-order method called clipped-SSTM for smooth convex stochastic optimization with heavy-tailed distributed noise in stochastic gradients and derive the first high-probability complexity bounds for this method closing the gap in the theory of stochastic optimization with heavy-tailed noise. Our method is based on a special variant of accele…

    Submitted 23 October, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

    Comments: NeurIPS 2020; 60 pages, 15 figures

  45. Gradient-Free Methods for Saddle-Point Problem

    Authors: Aleksandr Beznosikov, Abdurakhmon Sadiev, Alexander Gasnikov

    Abstract: In this paper, we generalize the approach of Gasnikov et al., 2017, which allows solving (stochastic) convex optimization problems with an inexact gradient-free oracle, to the convex-concave saddle-point problem. The proposed approach performs at least as well as the best existing approaches. But for a special set-up (simplex-type constraints and closeness of Lipschitz constants in the 1- and 2-norms) our approa…

    Submitted 11 September, 2022; v1 submitted 12 May, 2020; originally announced May 2020.

    Comments: Appears in: Communications in Computer and Information Science book series (CCIS,volume 1275). Minor modifications (typos) with respect to the CCIS version. 26 pages, 1 algorithm, 5 figures, 3 tables

  46. arXiv:1906.03622  [pdf, other]

    math.OC cs.LG

    On a Combination of Alternating Minimization and Nesterov's Momentum

    Authors: Sergey Guminov, Pavel Dvurechensky, Nazarii Tupitsa, Alexander Gasnikov

    Abstract: Alternating minimization (AM) procedures are practically efficient in many applications for solving convex and non-convex optimization problems. On the other hand, Nesterov's accelerated gradient is a theoretically optimal first-order method for convex optimization. In this paper we combine AM and Nesterov's acceleration to propose an accelerated alternating minimization algorithm. We prove $1/k^2$…

    Submitted 15 September, 2021; v1 submitted 9 June, 2019; originally announced June 2019.

    Comments: Compared to previous versions: dual WB problem and complexity analysis for WB problem corrected, updated and extended experiments

  47. arXiv:1901.08686  [pdf, ps, other]

    math.OC cs.DS

    On the Complexity of Approximating Wasserstein Barycenter

    Authors: Alexey Kroshnin, Darina Dvinskikh, Pavel Dvurechensky, Alexander Gasnikov, Nazarii Tupitsa, Cesar Uribe

    Abstract: We study the complexity of approximating the Wasserstein barycenter of $m$ discrete measures, or histograms of size $n$, by contrasting two alternative approaches, both using entropic regularization. The first approach is based on the Iterative Bregman Projections (IBP) algorithm for which our novel analysis gives a complexity bound proportional to $\frac{mn^2}{\varepsilon^2}$ to approximate the origina…

    Submitted 20 February, 2020; v1 submitted 24 January, 2019; originally announced January 2019.

    Comments: Corrected misprints. Added a reference to accelerated Iterative Bregman Projections introduced in arXiv:1906.03622

    MSC Class: 90C25; 90C30; 90C06; 90C90

    Journal ref: ICML 2019, in PMLR 97:3530-3540. http://proceedings.mlr.press/v97/kroshnin19a.html

  48. arXiv:1809.00710  [pdf, other]

    math.OC cs.DC cs.MA eess.SY stat.ML

    A Dual Approach for Optimal Algorithms in Distributed Optimization over Networks

    Authors: César A. Uribe, Soomin Lee, Alexander Gasnikov, Angelia Nedić

    Abstract: We study dual-based algorithms for distributed convex optimization problems over networks, where the objective is to minimize a sum $\sum_{i=1}^{m}f_i(z)$ of functions over a network. We provide complexity bounds for four different cases, namely: each function $f_i$ is strongly convex and smooth, each function is either strongly convex or smooth, and when it is convex but neither strongly conve…

    Submitted 16 March, 2020; v1 submitted 3 September, 2018; originally announced September 2018.

    Comments: This work is an extended version of the manuscript: Optimal Algorithms for Distributed Optimization arXiv:1712.00232

  49. arXiv:1806.03915  [pdf, other]

    math.OC cs.DC

    Decentralize and Randomize: Faster Algorithm for Wasserstein Barycenters

    Authors: Pavel Dvurechensky, Darina Dvinskikh, Alexander Gasnikov, César A. Uribe, Angelia Nedić

    Abstract: We study the decentralized distributed computation of discrete approximations for the regularized Wasserstein barycenter of a finite set of continuous probability measures distributedly stored over a network. We assume there is a network of agents/machines/computers, and each agent holds a private continuous probability measure and seeks to compute the barycenter of all the measures in the network…

    Submitted 19 February, 2020; v1 submitted 11 June, 2018; originally announced June 2018.

    MSC Class: 90C25; 90C30; 90C06; 90C90; 68Q25; 65K05; 65Y20; 68W40 ACM Class: G.1.6

  50. arXiv:1804.02394  [pdf, other]

    math.OC cs.CC cs.DS

    An Accelerated Directional Derivative Method for Smooth Stochastic Convex Optimization

    Authors: Pavel Dvurechensky, Eduard Gorbunov, Alexander Gasnikov

    Abstract: We consider smooth stochastic convex optimization problems in the context of algorithms which are based on directional derivatives of the objective function. This context can be considered as an intermediate one between derivative-free optimization and gradient-based optimization. We assume that at any given point and for any given direction, a stochastic approximation for the directional derivati…

    Submitted 21 September, 2020; v1 submitted 8 April, 2018; originally announced April 2018.

    Comments: arXiv admin note: text overlap with arXiv:1802.09022

    MSC Class: 90C25; 90C15; 90C56; 90C30; 90C06; 68Q25; 65K05; 68W20; 65Y20; 68W40 ACM Class: G.1.6