Anderson acceleration with adaptive relaxation for convergent fixed-point iterations

Nicolas Lepage-Saucier Concordia University
Department of Economics
Sir George Williams Campus
1455 De Maisonneuve Blvd. W.
Montreal, Quebec, Canada
H3G 1M8
(August 2024)
Abstract

Two adaptive relaxation strategies are proposed for Anderson acceleration. They are specifically designed for applications in which mappings converge to a fixed point. Their superiority over alternative Anderson acceleration schemes is demonstrated for linear contraction mappings. Both strategies perform well in three nonlinear fixed-point applications that include partial differential equations and the EM algorithm. One strategy surpasses all other Anderson acceleration implementations tested in terms of computation time across various specifications, including composite Anderson acceleration.

Keywords: Anderson acceleration, Fixed-point iteration, Optimal relaxation

1 Introduction

Anderson acceleration (AA), also referred to as Anderson mixing, was formulated by Donald Anderson in 1965 [2] to solve integration problems numerically. Since then, its use has expanded to a wide array of problems in physics, mathematics and other disciplines. It is very close to methods developed in other contexts such as Pulay mixing (direct inversion in the iterative subspace) in chemistry [18] or the generalized minimal residual method (GMRES) and its predecessors like the generalized conjugate residual method (GCR) in the linear case (see [20], [23] and [16] for comparisons). There are two versions of the method, related to the type I and type II Broyden methods. This paper focuses on the latter.

Consider the mapping $g:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$ with at least one fixed point: $g(x^{\ast})=x^{\ast}$. In what follows, $f(x)=g(x)-x$ can be interpreted as a residual of $x$, $\|x\|\equiv\|x\|_{2}=\sqrt{x^{\intercal}x}$ is the Euclidean norm of a vector $x$, and $\langle x,y\rangle=x^{\intercal}y$ is the inner product of the vectors $x$ and $y$.

The AA algorithm can be described as follows.

Algorithm 1.

Input: a mapping $g:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$, a starting point $x_{0}\in\mathbb{R}^{n}$, and an integer $1\leq m\leq n$

1 Set $x_{1}=g(x_{0})$
2 for $k=1,2,\ldots$ until convergence
3     Compute $g(x_{k})$
4     Compute $(\alpha_{1}^{(k)},\ldots,\alpha_{m_{k}}^{(k)})$ that solve
      $\min_{\alpha_{1}^{(k)},\ldots,\alpha_{m_{k}}^{(k)}}\left\|\sum_{i=1}^{m_{k}}\alpha_{i}^{(k)}f(x_{k-m_{k}+i})\right\|$ s.t. $\sum_{i=1}^{m_{k}}\alpha_{i}^{(k)}=1$
5     Compute $\overline{x}_{k}=\sum_{i=1}^{m_{k}}\alpha_{i}^{(k)}x_{k-m_{k}+i}$ and $\overline{y}_{k}=\sum_{i=1}^{m_{k}}\alpha_{i}^{(k)}g(x_{k-m_{k}+i})$
6     Set $x_{k+1}=\overline{x}_{k}+\beta_{k}\cdot(\overline{y}_{k}-\overline{x}_{k})$
7 end for

The number of lags $1\leq m_{k}\leq\min(k,m)$ is usually set as high as possible to accelerate convergence while providing an acceptable conditioning of the linear problem for numerical stability.
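Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used for the experiments: the helper names (`dot`, `solve_linear`, `combine`, `anderson`) are hypothetical, the relaxation parameter is held constant, $m_{k}=\min(k,m)$ is used throughout, the constrained least-squares problem of step 4 is solved through its normal equations after eliminating the constraint, and a tiny ridge term is added as a safeguard that is not part of Algorithm 1. Only the standard library is used to keep the example self-contained; a production code would more likely rely on a QR-based least-squares routine.

```python
import math

def dot(u, v):
    """Inner product <u, v> of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def solve_linear(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= factor * aug[c][j]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][j] * z[j] for j in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z

def combine(weights, vectors):
    """Weighted sum of vectors: sum_i weights[i] * vectors[i]."""
    return [sum(w * v[c] for w, v in zip(weights, vectors))
            for c in range(len(vectors[0]))]

def anderson(g, x0, m=2, beta=1.0, tol=1e-10, max_iter=200):
    """Sketch of Algorithm 1 with a constant relaxation parameter beta."""
    x = g(list(x0))                       # step 1: x_1 = g(x_0)
    X, G = [], []                         # stored iterates and their maps
    for k in range(1, max_iter + 1):
        X.append(x)
        G.append(g(x))                    # step 3
        X, G = X[-m:], G[-m:]             # keep m_k = min(k, m) lags
        F = [[gi - xi for xi, gi in zip(xv, gv)] for xv, gv in zip(X, G)]
        if math.sqrt(dot(F[-1], F[-1])) < tol:
            return x, k
        # Step 4: substitute alpha_last = 1 - sum(a), then solve the normal
        # equations of min_a || f_last + sum_i a_i (f_i - f_last) ||.
        D = [[fi - fl for fi, fl in zip(Fi, F[-1])] for Fi in F[:-1]]
        gram = [[dot(di, dj) for dj in D] for di in D]
        # Tiny ridge term: an added safeguard, not part of Algorithm 1.
        ridge = 1e-12 * sum(gram[i][i] for i in range(len(D))) + 1e-300
        for i in range(len(D)):
            gram[i][i] += ridge
        a = solve_linear(gram, [-dot(di, F[-1]) for di in D])
        alpha = a + [1.0 - sum(a)]
        x_bar, y_bar = combine(alpha, X), combine(alpha, G)           # step 5
        x = [xb + beta * (yb - xb) for xb, yb in zip(x_bar, y_bar)]   # step 6
    return x, max_iter

# Made-up contractive test map; its unique fixed point has equal components.
x_star, iters = anderson(lambda x: [math.cos(x[1]), math.cos(x[0])], [1.0, 0.2])
```

With `m=1` the weights reduce to a single $\alpha=1$, so the sketch falls back to the relaxed Picard iteration, which can serve as a quick sanity check.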

The scalar $\beta_{k}$ is the relaxation (damping, mixing) parameter. In most AA applications, it is stationary and is often set to 1. As discussed by Anderson in [3], the choice of $\beta_{k}=1$ is natural under the implicit assumption that $g$ converges to a fixed point $x^{\ast}$. In this case, $g(x_{k})$ should be closer to $x^{\ast}$ than $x_{k}$ and, correspondingly, $\overline{y}_{k}$ should be closer to $x^{\ast}$ than $\overline{x}_{k}$, provided that $\overline{y}_{k}$ is a good approximation of $g(\overline{x}_{k})$. This assumption is of course not always valid. Smaller $\beta_{k}$ have sometimes been shown to help stability and improve global convergence for challenging applications (see, for instance, [14] and [25]). But for fixed-point iterations which do converge to a fixed point, this property can be exploited to compute adaptive relaxation parameters which may significantly speed up the convergence of AA.

The idea of AA with dynamically adjusted relaxation is a relatively recent one. In 2019, Anderson [3] outlined an algorithm for adjusting $\beta_{k}$ by monitoring the convergence of the Picard iteration for $g(x)$ over multiple iterations. Focusing on linearly-converging fixed-point methods, Evans et al. [9] used the optimization gain $\theta_{k+1}$ from an AA step to set $\beta_{k}=0.9-\theta_{k+1}/2$. They show that this choice of relaxation parameter improved robustness and efficiency over constant relaxation with $\beta=0.4$ and $\beta=0.75$ in various applications. Recently, Jin et al. [12] incorporated a dynamically adjusted $\beta_{k}$ in their Anderson acceleration of the derivative-free projection method.

In 2013, Potra and Engler [16] extended the results of [23] by characterizing the behavior of AA for linear systems with arbitrary non-zero relaxation parameters. While doing so, they also suggested a simple procedure to compute an optimal $\beta_{k}$ minimizing the residual of $x_{k+1}$. Chen and Vuik [6] extended their method to nonlinear problems with a locally optimal $\beta_{k}$ that minimizes $\|f(x_{k+1})\|$. Using the approximation $g(\overline{x}_{k}+\beta_{k}(\overline{y}_{k}-\overline{x}_{k}))\approx g(\overline{x}_{k})+\beta_{k}(g(\overline{y}_{k})-g(\overline{x}_{k}))$, this optimal $\beta_{k}$ is

$$\beta_{k}^{\ast}=\arg\min_{\beta}\left\|f(\overline{x}_{k})+\beta\cdot\left(f(\overline{y}_{k})-f(\overline{x}_{k})\right)\right\|=-\frac{\left\langle f(\overline{y}_{k})-f(\overline{x}_{k}),f(\overline{x}_{k})\right\rangle}{\left\|f(\overline{y}_{k})-f(\overline{x}_{k})\right\|^{2}}\tag{1.1}$$

and the optimal AA update is

$$x_{k+1}^{0}=\overline{x}_{k}+\beta_{k}^{\ast}\left(\overline{y}_{k}-\overline{x}_{k}\right).\tag{1.2}$$

This scheme will be denoted AAopt0. In various numerical applications, Chen and Vuik show that AAopt0 needs fewer iterations to converge than regular AA. Since AAopt0 needs two extra maps per iteration, $g(\overline{x}_{k})$ and $g(\overline{y}_{k})$, they suggest using their method for applications in which mappings are inexpensive to compute. Otherwise, the potential efficiency gains from a smaller number of iterations may be offset by the extra computation per iteration.
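For completeness, the closed form in (1.1) can be recovered from a one-dimensional least-squares argument. The shorthand $r$, $d$ and $\phi$ below is introduced only for this sketch and does not appear elsewhere in the paper; the derivation assumes $d\neq\mathbf{0}$.

```latex
% Shorthand (assumed for this sketch only):
%   r = f(\overline{x}_k),   d = f(\overline{y}_k) - f(\overline{x}_k)
\begin{align*}
\phi(\beta) &= \left\| r + \beta d \right\|^{2}
             = \|r\|^{2} + 2\beta\,\langle d, r\rangle + \beta^{2}\|d\|^{2}, \\
\phi'(\beta) &= 2\langle d, r\rangle + 2\beta\|d\|^{2} = 0
  \quad\Longrightarrow\quad
  \beta_{k}^{\ast} = -\frac{\langle d, r\rangle}{\|d\|^{2}}, \\
\phi(\beta_{k}^{\ast}) &= \|r\|^{2} - \frac{\langle d, r\rangle^{2}}{\|d\|^{2}} \;\leq\; \|r\|^{2}.
\end{align*}
```

Since $\phi$ is a convex quadratic in $\beta$, the stationary point is the minimizer, and the last line shows that the predicted residual norm never exceeds $\|f(\overline{x}_{k})\|$ under the linear approximation.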

1.1 Locally optimal relaxation for convergent mapping applications

Two modifications to AAopt0 are now proposed to improve its performance for convergent mapping iterations. First, note that (1.1) and (1.2) constitute by themselves a mini order-1 AA iteration. Using $\overline{x}_{k}$ and $\overline{y}_{k}$ to compute $x_{k+1}^{0}$ means implicitly choosing full relaxation ($\beta=0$) for this mini AA iteration. But for convergent mappings, we can hope that $g(\overline{x}_{k})$ and $g(\overline{y}_{k})$ are closer to $x^{\ast}$ than $\overline{x}_{k}$ and $\overline{y}_{k}$, respectively. To take full advantage of the work of computing them, a more natural choice is thus the non-relaxed mini AA

$$x_{k+1}^{1}=g(\overline{x}_{k})+\beta_{k}^{\ast}\left(g(\overline{y}_{k})-g(\overline{x}_{k})\right).\tag{1.3}$$

This modified scheme will be labeled AAopt1. Section 2.1 presents a proof that AAopt1 improves convergence compared to AAopt0 for contractive linear mappings.

Second, the empirical tests shown below suggest that the optimal relaxation parameters $\beta_{k}^{\ast}$ tend to be correlated between iterations. This opens up the possibility of using the same $\beta_{k}^{\ast}$ as an approximation for $\beta_{k+1}^{\ast},\beta_{k+2}^{\ast},\ldots$, as long as the lower precision due to the approximation does not deteriorate convergence too much. An AAopt1 which only updates the relaxation every $T$ iterations will be denoted AAopt1_T, with AAopt1_1 $\equiv$ AAopt1. Of course, the same modification could be applied to AAopt0. Details of the algorithm are presented in Section 2.3.

1.2 Costless adaptive relaxation for convergent mappings

A new adaptive relaxation parameter for AA is now proposed. It requires only two inner products and no extra maps per iteration.

Assume that $x_{k+1}=\overline{x}_{k}+\beta_{k}(\overline{y}_{k}-\overline{x}_{k})$ has been computed for some $\beta_{k}$. Since the mapping $g$ is assumed to converge to $x^{\ast}$, the next map used in the AA algorithm, $g(x_{k+1})$, should provide useful information on the location of $x^{\ast}$. An improved choice of relaxation parameter in iteration $k$ would be the $\hat{\beta}_{k}$ that minimizes the distance between $g(x_{k+1})$ and $\overline{x}_{k}+\hat{\beta}_{k}(\overline{y}_{k}-\overline{x}_{k})$:

$$\hat{\beta}_{k}=\arg\min_{\beta}\left\|\overline{x}_{k}+\beta\left(\overline{y}_{k}-\overline{x}_{k}\right)-g(x_{k+1})\right\|_{2}=\frac{\left\langle\overline{y}_{k}-\overline{x}_{k},g(x_{k+1})-\overline{x}_{k}\right\rangle}{\left\|\overline{y}_{k}-\overline{x}_{k}\right\|^{2}}.\tag{1.4}$$

Section 3.1 includes a demonstration that $\hat{\beta}_{k}$ can improve convergence for contractive linear operators under an appropriate norm.

In practice, since $g(x_{k+1})$ is needed to compute $\hat{\beta}_{k}$, using it at iteration $k$ to recompute an improved $x_{k+1}$ implies computing an extra map. However, as stated before, if $\hat{\beta}_{k}$ is correlated with $\hat{\beta}_{k+1}$, then $\hat{\beta}_{k}$ can be used as an approximation for $\hat{\beta}_{k+1}$. Computing $\hat{\beta}_{k}$ would then require only two inner products, a negligible cost within the AA algorithm.

This scheme, which minimizes the distance between $x_{k+1}$ and its map, will be labeled AAmd. Its implementation details are provided in Section 3. As will be seen in Section 5, AAmd improves convergence speeds for all nonlinear applications presented.
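The relaxation parameter in (1.4) is an orthogonal projection coefficient and costs two inner products. A minimal sketch follows, assuming plain Python lists as vectors; the function name `beta_md` and the numerical values are hypothetical and chosen only so that the result is easy to verify by hand. The full AAmd procedure, including how the lagged value is reused, is the subject of Section 3 and is not reproduced here.

```python
def dot(u, v):
    """Inner product <u, v> of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def beta_md(x_bar, y_bar, g_next):
    """Relaxation (1.4): project g(x_{k+1}) - x_bar_k onto y_bar_k - x_bar_k."""
    d = [yb - xb for xb, yb in zip(x_bar, y_bar)]      # y_bar_k - x_bar_k
    e = [gn - xb for xb, gn in zip(x_bar, g_next)]     # g(x_{k+1}) - x_bar_k
    return dot(d, e) / dot(d, d)                       # two inner products

# Made-up vectors for illustration only.
x_bar = [0.0, 0.0]
y_bar = [1.0, 0.0]
g_next = [1.5, 2.0]          # stands in for g(x_{k+1})
beta_hat = beta_md(x_bar, y_bar, g_next)

# The point on the line through x_bar and y_bar closest to g_next.
closest = [xb + beta_hat * (yb - xb) for xb, yb in zip(x_bar, y_bar)]
```

In this toy case the projected point lies beyond `y_bar` on the line through `x_bar` and `y_bar`, so the resulting parameter exceeds 1.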

1.3 Regularizing $\beta$

Essentially all the AA literature assumes $0<\beta_{k}\leq 1$. One exception could in some way be SqS1-SQUAREM [22], as discussed in [21]. It corresponds to a 1-iteration AA where $\alpha$ is also used as the relaxation parameter. Varadhan and Roland show that this choice can greatly speed up convergence. The same was pointed out by Raydan and Svaiter [19] for a linear version of the algorithm. Since the resulting values for $\beta_{k}$ depend on the eigenvalues of the Jacobian of $g(x)$, these can take values well above one.

A striking result from the numerical simulations presented in Section 5.4 is that for both suggested schemes, and especially for AAmd, the calculated relaxation parameters are regularly above one. When this happens, Chen and Vuik [6] recommend falling back to a default relaxation parameter below 1. For AAopt1_T and AAmd, empirical tests suggest that relaxation parameters above 1 can provide good results, as long as excessively high values are avoided. To do so, a practical option is simply to cap $\beta_{k}$ at some maximum value $\beta_{\max}>1$. For all the numerical tests presented below, $\beta_{\max}=3$ was used.
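The cap amounts to a one-line safeguard. The sketch below is illustrative only: the function name `cap_relaxation` is hypothetical, the default mirrors the value $\beta_{\max}=3$ quoted above, and the treatment of zero or negative computed values is left out because it is not specified in this section.

```python
def cap_relaxation(beta, beta_max=3.0):
    """Cap a computed relaxation parameter at beta_max > 1."""
    return min(beta, beta_max)

# Values below the cap pass through unchanged; larger ones are truncated.
capped = [cap_relaxation(b) for b in (0.8, 1.7, 12.5)]
```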

1.4 Testing framework

The performance of AAopt1_T and AAmd will be tested in three fixed-point applications: a Poisson partial differential equation (PDE) and two expectation-maximization (EM) algorithms [7]. Since AAopt1_T and AAmd vary in terms of mappings per iteration, the number of iterations or maps until convergence are not the most informative criteria for comparison. For practitioners and software library designers, the important benchmark is ultimately computation speed. In practice, however, comparing computation times meaningfully is challenging since they can be affected by software implementation, hardware choices, and programming languages. To get the most complete picture, iterations, mappings, and computation times will be shown. To make the speed comparisons as informative as possible, all problems were programmed as efficiently as possible in a compiled programming language and executed on the same hardware.

Another matter of concern when comparing multiple AA implementations is the choice of parameters, especially the number of lags $m$. With a sufficiently large $m$, AA with constant relaxation can often converge in a similar number of iterations as AAopt0 or AAopt1. Of course, a larger $m$ entails more work solving the linear optimization problem and potentially less precision due to worse conditioning. To properly compare the various versions of AA, each of them should be implemented with an optimal $m$ for each problem, i.e., the one which minimizes computation times.

Finally, initial starting points can play a major and unpredictable role in convergence rates, especially for non-linear applications. To make the comparisons more robust, numerical experiments with many different starting points and different synthetic data will be performed.

The application details, the test implementations, and the results are presented in Section 5. Section 6 offers concluding remarks.

2 AAopt1_T

This section demonstrates the convergence improvement of AAopt1 compared to AAopt0 for a contractive linear operator. This advantage is then illustrated by solving numerically a linear system of equations. Finally, the full AAopt1_T algorithm is laid out in detail.

2.1 Local improvements from AAopt1 for a linear contraction mapping

Consider solving $Ax=b$ where $x\in\mathbb{R}^{n}$, $b\in\mathbb{R}^{n}$ and $A\in\mathbb{R}^{n\times n}$ is symmetric with eigenvalues $0<\lambda_{i}<2$ for all $i$. To solve this system of equations, define the linear mapping $g(x)=x-(Ax-b)$.

At iteration $k$ of the AA algorithm, assume that $\overline{y}_{k}$ and $\overline{x}_{k}$ (as in step 5 of Algorithm 1) have been computed.

Theorem 2.1.

Let $f(x)=g(x)-x=-(Ax-b)$ with $b\in\mathbb{R}^{n}$ and $A\in\mathbb{R}^{n\times n}$ symmetric with eigenvalues $0<\lambda_{i}<2$ for $i=1,\ldots,n$. For any real vectors $\overline{x}_{k},\overline{y}_{k}$,

$$\left\|f\left(x_{k+1}^{1}\right)\right\|<\left\|f\left(x_{k+1}^{0}\right)\right\|,$$

as long as $f\left(x_{k+1}^{0}\right)\neq\mathbf{0}$ (AA has not converged), where $x_{k+1}^{0}$ and $x_{k+1}^{1}$ are defined in (1.2) and (1.3), respectively.

Proof.

It is straightforward to rewrite $x_{k+1}^{1}$ in terms of $x_{k+1}^{0}$:

$$\begin{aligned}
x_{k+1}^{1} &=\left(1-\beta_{k}^{\ast}\right)g\left(\overline{x}_{k}\right)+\beta_{k}^{\ast}g\left(\overline{y}_{k}\right)\\
x_{k+1}^{1} &=\left(1-\beta_{k}^{\ast}\right)\left[\overline{x}_{k}-\left(A\overline{x}_{k}-b\right)\right]+\beta_{k}^{\ast}\left[\overline{y}_{k}-\left(A\overline{y}_{k}-b\right)\right]\\
x_{k+1}^{1} &=x_{k+1}^{0}-\left(Ax_{k+1}^{0}-b\right)\\
x_{k+1}^{1} &=x_{k+1}^{0}+f\left(x_{k+1}^{0}\right).
\end{aligned}$$

Expressing $f\left(x_{k+1}^{1}\right)$ in terms of $f\left(x_{k+1}^{0}\right)$:

$$\begin{aligned}
f\left(x_{k+1}^{1}\right) &=-\left(A\left(x_{k+1}^{0}+f\left(x_{k+1}^{0}\right)\right)-b\right)\\
f\left(x_{k+1}^{1}\right) &=-\left(Ax_{k+1}^{0}-b\right)-Af\left(x_{k+1}^{0}\right)\\
f\left(x_{k+1}^{1}\right) &=\left(I-A\right)f\left(x_{k+1}^{0}\right).
\end{aligned}$$

The squared 2-norms of $f\left(x_{k+1}^{0}\right)$ and $f\left(x_{k+1}^{1}\right)$ are

$$\left\|f\left(x_{k+1}^{0}\right)\right\|^{2}=f\left(x_{k+1}^{0}\right)^{\intercal}f\left(x_{k+1}^{0}\right)$$

and

$$\begin{aligned}
\left\|f\left(x_{k+1}^{1}\right)\right\|^{2} &=\left(\left(I-A\right)f\left(x_{k+1}^{0}\right)\right)^{\intercal}\left(I-A\right)f\left(x_{k+1}^{0}\right)\\
&=f\left(x_{k+1}^{0}\right)^{\intercal}\left(I-A\right)^{\intercal}\left(I-A\right)f\left(x_{k+1}^{0}\right).
\end{aligned}$$

Since $A$ is symmetric with eigenvalues $0<\lambda_{i}<2$ for $i=1,...,n$, the product $\left(I-A\right)^{\intercal}\left(I-A\right)$ is symmetric with eigenvalues $a_{i}=\left(1-\lambda_{i}\right)^{2}$, so that $0\leq a_{i}<1$ for $i=1,...,n$. The ratio

$$\frac{\left\|f\left(x_{k+1}^{1}\right)\right\|^{2}}{\left\|f\left(x_{k+1}^{0}\right)\right\|^{2}}=\frac{f\left(x_{k+1}^{0}\right)^{\intercal}\left(I-A\right)^{\intercal}\left(I-A\right)f\left(x_{k+1}^{0}\right)}{f\left(x_{k+1}^{0}\right)^{\intercal}f\left(x_{k+1}^{0}\right)}$$

is a Rayleigh quotient that can only take values in $[0,1)$. Therefore, $\frac{\left\|f\left(x_{k+1}^{1}\right)\right\|}{\left\|f\left(x_{k+1}^{0}\right)\right\|}<1$, which completes the proof. ∎
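The statement of Theorem 2.1 can be checked numerically on a small diagonal system. The sketch below is only an illustration under stated assumptions: the diagonal matrix, the vectors `x_bar` and `y_bar`, and all helper names are hypothetical choices, $\beta_{k}^{\ast}$ is computed with the formula of step 8 of Algorithm 2, and $x_{k+1}^{0}$ is taken to be $\overline{x}_{k}+\beta_{k}^{\ast}\left(\overline{y}_{k}-\overline{x}_{k}\right)$, as implied by the third line of the derivation in the proof.

```python
import math

# Diagonal test matrix with eigenvalues in (0, 2); the values are illustrative.
lam = [0.1 * i for i in range(1, 20)]          # 0.1, 0.2, ..., 1.9
b = [1.0] * len(lam)

def g(x):
    """Linear map g(x) = x - (A x - b) for A = diag(lam)."""
    return [xi - (li * xi - bi) for xi, li, bi in zip(x, lam, b)]

def f(x):
    """Residual f(x) = g(x) - x = -(A x - b)."""
    return [gi - xi for gi, xi in zip(g(x), x)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

# Arbitrary (hypothetical) averaged iterates x_bar and y_bar.
x_bar = [math.sin(i + 1.0) for i in range(len(lam))]
y_bar = [math.cos(2.0 * i) for i in range(len(lam))]

# Optimal relaxation, using the formula of step 8 in Algorithm 2.
df = [fy - fx for fy, fx in zip(f(y_bar), f(x_bar))]
beta_star = -dot(df, f(x_bar)) / dot(df, df)

# x0: relaxation applied to the iterates; x1: relaxation applied to their maps.
x0 = [xb + beta_star * (yb - xb) for xb, yb in zip(x_bar, y_bar)]
x1 = [(1 - beta_star) * gx + beta_star * gy
      for gx, gy in zip(g(x_bar), g(y_bar))]

res0, res1 = norm(f(x0)), norm(f(x1))
# Identity from the proof, componentwise for diagonal A: f(x1) = (I - A) f(x0).
identity_gap = max(abs(r1 - (1 - li) * r0)
                   for r1, li, r0 in zip(f(x1), lam, f(x0)))
print(res1 < res0, identity_gap)
```

For this particular matrix every factor $\left|1-\lambda_{i}\right|$ is at most $0.9$, so the residual norm should shrink by at least that factor, up to rounding.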

2.2 A linear example with AAopt1

To illustrate the theorem with a concrete example, consider using the mapping $g(x)=x-\left(Ax-b\right)$ to find the solution to $Ax=b$, where $A=\operatorname{diag}(0.1,0.2,...,1.9)$, $b=\mathbf{1}$ and the initial guess is $x_{0}=\mathbf{0}$. To highlight the difference between AAopt0 and AAopt1, no bounds on $\beta_{k}^{\ast}$ are imposed on either of them. AA with constant $\beta=1$ is also shown for comparison. All are implemented with $m=8$.
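As a point of reference for this test problem, the sketch below sets up the same $A$, $b$ and $x_{0}$ and runs the plain, unaccelerated fixed-point iteration $x_{k+1}=g(x_{k})$. It is not an implementation of any of the AA variants compared here; the iteration count and the variable names are assumptions made for illustration.

```python
import math

lam = [0.1 * i for i in range(1, 20)]   # diagonal of A: 0.1, 0.2, ..., 1.9
b = [1.0] * len(lam)

def g(x):
    """g(x) = x - (A x - b) with A = diag(lam)."""
    return [xi - (li * xi - bi) for xi, li, bi in zip(x, lam, b)]

def residual_norm(x):
    """Euclidean norm of f(x) = g(x) - x."""
    return math.sqrt(sum((gi - xi) ** 2 for gi, xi in zip(g(x), x)))

x = [0.0] * len(lam)                    # initial guess x_0 = 0
history = [residual_norm(x)]
for _ in range(200):                    # 200 plain iterations (arbitrary choice)
    x = g(x)
    history.append(residual_norm(x))

# For diagonal A the exact solution is b_i / lambda_i.
x_exact = [bi / li for bi, li in zip(b, lam)]
error = max(abs(xi - ei) for xi, ei in zip(x, x_exact))
print(history[0], history[-1], error)
```

Because the largest $\left|1-\lambda_{i}\right|$ is $0.9$, the residual of the plain iteration contracts by a factor of at most $0.9$ per step, which gives a sense of the slow baseline that acceleration is meant to improve on.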

Figure 1: Residuals (with norm $A^{-1}$), linear system of equations

Figure 2: Relaxation parameter, linear system of equations

Figure 1 shows the Euclidean norm of the residual for each algorithm and Figure 2 shows the relaxation parameter at each iteration. AAopt1 converges almost three times faster, in number of iterations, than AAopt0 and AA with $\beta=1$. This small number of iterations for AAopt1 looks promising. However, since AAopt1 requires three maps per iteration instead of one for stationary AA, comparing computation times will be the true test of its promise as an optimization algorithm.

2.3 The full AAopt1_T algorithm

The full AAopt1_T algorithm takes as extra parameters $T\geq 1$, which determines the interval at which the optimal relaxation $\beta_{k}^{\ast}$ is recomputed, $\beta_{\max}$, and $\beta_{\text{default}}$ as described in 1.3.

Algorithm 2.

Input: a mapping $g:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$, a starting point $x_{0}\in\mathbb{R}^{n}$, $1\leq m\leq n$, $\beta_{\max}>0$, $0<\beta_{\text{default}}\leq\beta_{\max}$ and $T\geq 1$.

1 Set $x_{1}=g(x_{0})$
2 for $k=1,2,...$ until convergence
3     Compute $g\left(x_{k}\right)$
4     Compute $(\alpha_{1}^{(k)},...,\alpha_{m_{k}}^{(k)})$ that solve
      $\min_{\alpha_{1}^{(k)},...,\alpha_{m_{k}}^{(k)}}\left\|\sum_{i=1}^{m_{k}}\alpha_{i}^{(k)}f\left(x_{k-m_{k}+i}\right)\right\|$ s.t. $\sum_{i=1}^{m_{k}}\alpha_{i}^{(k)}=1$
5     Compute $\overline{x}_{k}=\sum_{i=1}^{m_{k}}\alpha_{i}^{(k)}x_{k-m_{k}+i}$ and $\overline{y}_{k}=\sum_{i=1}^{m_{k}}\alpha_{i}^{(k)}g\left(x_{k-m_{k}+i}\right)$
6     If $k=1$ or $k\operatorname{mod}T=0$
7       Compute $g\left(\overline{x}_{k}\right)$ and $g(\overline{y}_{k})$
8       Compute $\beta_{k}^{\ast}=-\frac{\left\langle f\left(\overline{y}_{k}\right)-f\left(\overline{x}_{k}\right),f\left(\overline{x}_{k}\right)\right\rangle}{\left\|f\left(\overline{y}_{k}\right)-f\left(\overline{x}_{k}\right)\right\|^{2}}$
9       If $\beta_{k}^{\ast}\leq 0$
10         Set $\beta_{k}=\beta_{\text{default}}$
11       else
12         Set $\beta_{k}=\min(\beta_{k}^{\ast},\beta_{\max})$
13       end if
14       Set $x_{k+1}=g\left(\overline{x}_{k}\right)+\beta_{k}(g(\overline{y}_{k})-g\left(\overline{x}_{k}\right))$
15     else
16       Set $\beta_{k}=\beta_{k-1}$
17       Set $x_{k+1}=\overline{x}_{k}+\beta_{k}\left(\overline{y}_{k}-\overline{x}_{k}\right)$
18     end if
19 end for

Note that possible adjustments to $m_{k}$ such as restarts and composite AA will be described in Section 4.2. Also, $\beta_{\text{default}}=1$ and $\beta_{\max}=3$ are used in all experiments.
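The relaxation logic of Algorithm 2 (steps 6 to 18) can be sketched in isolation, taking $\overline{x}_{k}$ and $\overline{y}_{k}$ from step 5 as given. This is a minimal illustration, not the implementation used in the experiments: the function name `relaxed_update`, its signature, the returned count of extra maps and the zero-denominator guard are all assumptions, and the least-squares problem of step 4 is deliberately left out.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def relaxed_update(g, x_bar, y_bar, k, T, beta_prev,
                   beta_default=1.0, beta_max=3.0):
    """Steps 6-18 of Algorithm 2 for given x_bar, y_bar (outputs of step 5).

    Returns (x_next, beta_k, extra_maps), where extra_maps counts the calls
    to g made here, on top of the map g(x_k) computed in step 3.
    """
    if k == 1 or k % T == 0:
        gx, gy = g(x_bar), g(y_bar)                        # step 7
        fx = [a - c for a, c in zip(gx, x_bar)]            # f(x_bar)
        fy = [a - c for a, c in zip(gy, y_bar)]            # f(y_bar)
        df = [a - c for a, c in zip(fy, fx)]
        denom = dot(df, df)
        # Guard against a zero denominator (an addition, not in Algorithm 2).
        beta_star = -dot(df, fx) / denom if denom > 0.0 else 0.0   # step 8
        if beta_star <= 0.0:                               # steps 9-13
            beta = beta_default
        else:
            beta = min(beta_star, beta_max)
        x_next = [a + beta * (c - a) for a, c in zip(gx, gy)]      # step 14
        return x_next, beta, 2
    beta = beta_prev                                       # step 16
    x_next = [a + beta * (c - a) for a, c in zip(x_bar, y_bar)]    # step 17
    return x_next, beta, 0

# Usage on a hypothetical one-dimensional map g(x) = x - (0.1 x - 1).
def g_scalar(x):
    return [xi - (0.1 * xi - 1.0) for xi in x]

# k = 1 triggers a recomputation; beta_star is about 10, so beta is capped.
x_a, beta_a, maps_a = relaxed_update(g_scalar, [0.0], [1.0],
                                     k=1, T=3, beta_prev=1.0)
# k = 2 with T = 3 reuses the previous beta and needs no extra map.
x_b, beta_b, maps_b = relaxed_update(g_scalar, [0.0], [1.0],
                                     k=2, T=3, beta_prev=beta_a)
print(beta_a, x_a, beta_b, x_b)
```

Returning the number of extra maps makes the cost trade-off explicit: iterations that recompute $\beta_{k}^{\ast}$ use two maps in addition to $g(x_{k})$, while the others use none, which is what a larger $T$ is meant to exploit.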

3 AAmd

This section justifies the use of $\hat{\beta}$ as a relaxation parameter and shows its properties in the same linear example as before. Then, it details the implementation of the AAmd algorithm.

3.1 Local improvements from AAmd in a linear contraction mapping

Consider the same mapping as the one for Theorem 2.1. Contrary to AAopt1, AAmd may not improve convergence at each iteration in the familiar Euclidean norm. Nevertheless, we will show that it does in the elliptic norm

$$\left\|x\right\|_{A^{-1}}=\sqrt{x^{\intercal}A^{-1}x}$$

induced by the inner product $\left\langle\cdot,\cdot\right\rangle_{A^{-1}}$:

$$\left\langle x,y\right\rangle_{A^{-1}}=x^{\intercal}A^{-1}y.$$

Assume that at iteration $k$ of AA, $\overline{y}_{k}$ and $\overline{x}_{k}$ (as in step 5 of Algorithm 1) have been computed. To lighten the notation, define the distance $\bar{d}_{k}\equiv\overline{y}_{k}-\overline{x}_{k}$. For a given $\beta_{k}$, the next iteration of AA is $x_{k+1}=\overline{x}_{k}+\bar{d}_{k}\beta_{k}$ and the map of $x_{k+1}$ is $g(x_{k+1})$.
Using this information, we can compute the improved relaxation parameter $\hat{\beta}_{k}=\frac{\left\langle\bar{d}_{k},g\left(x_{k+1}\right)-\overline{x}_{k}\right\rangle}{\left\|\bar{d}_{k}\right\|^{2}}$ as defined in (1.4). Using $\hat{\beta}_{k}$, we can compute an improved next iterate $\hat{x}_{k+1}=\overline{x}_{k}+\bar{d}_{k}\hat{\beta}_{k}$.
The following theorem proves that $\hat{\beta}_{k}$ would have been at least as good a choice as $\beta_{k}$, for any $\beta_{k}$, in the sense that $\left\|f\left(\hat{x}_{k+1}\right)\right\|_{A^{-1}}\leq\left\|f\left(x_{k+1}\right)\right\|_{A^{-1}}$.
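The computation of $\hat{\beta}_{k}$ and $\hat{x}_{k+1}$ described above can be sketched on a diagonal linear system, comparing the residuals of $x_{k+1}$ and $\hat{x}_{k+1}$ in the $A^{-1}$ norm. The matrix, the vectors `x_bar` and `y_bar`, the trial values of $\beta_{k}$ and the helper names are hypothetical choices made for illustration only.

```python
import math

lam = [0.1 * i for i in range(1, 20)]   # A = diag(0.1, ..., 1.9), illustrative
b = [1.0] * len(lam)

def g(x):
    """g(x) = x - (A x - b) with A = diag(lam)."""
    return [xi - (li * xi - bi) for xi, li, bi in zip(x, lam, b)]

def f(x):
    """Residual f(x) = g(x) - x."""
    return [gi - xi for gi, xi in zip(g(x), x)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm_A_inv(u):
    """Elliptic norm sqrt(u^T A^{-1} u) for diagonal A."""
    return math.sqrt(sum(ui * ui / li for ui, li in zip(u, lam)))

def improved_step(x_bar, d_bar, beta):
    """Return (x_next, beta_hat, x_hat) for a given relaxation beta."""
    x_next = [xb + beta * d for xb, d in zip(x_bar, d_bar)]
    shift = [gi - xb for gi, xb in zip(g(x_next), x_bar)]  # g(x_{k+1}) - x_bar
    beta_hat = dot(d_bar, shift) / dot(d_bar, d_bar)
    x_hat = [xb + beta_hat * d for xb, d in zip(x_bar, d_bar)]
    return x_next, beta_hat, x_hat

# Hypothetical averaged iterates and their difference d_bar = y_bar - x_bar.
x_bar = [math.sin(i + 1.0) for i in range(len(lam))]
y_bar = [math.cos(2.0 * i) for i in range(len(lam))]
d_bar = [yb - xb for yb, xb in zip(y_bar, x_bar)]

results = []
for beta in (-1.0, 0.3, 1.0, 2.5):      # arbitrary trial relaxation values
    x_next, beta_hat, x_hat = improved_step(x_bar, d_bar, beta)
    results.append((beta, beta_hat,
                    norm_A_inv(f(x_hat)), norm_A_inv(f(x_next))))
print(results)
```

In each tuple, the third entry (the residual at $\hat{x}_{k+1}$) should not exceed the fourth (the residual at $x_{k+1}$), whatever the trial value of $\beta_{k}$.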

Theorem 3.1.

Let $f(x)=-\left(Ax-b\right)$ with $b\in\mathbb{R}^{n}$ and $A\in\mathbb{R}^{n\times n}$ symmetric with eigenvalues $0<\lambda_{i}<2$ for $i=1,...,n$. For any $\bar{x}_{k},\bar{d}_{k},\beta_{k},$

$$\left\|f\left(\hat{x}_{k+1}\right)\right\|_{A^{-1}}\leq\left\|f\left(x_{k+1}\right)\right\|_{A^{-1}},$$

where $x_{k+1}=\overline{x}_{k}+\bar{d}_{k}\beta_{k}$, $\hat{x}_{k+1}=\overline{x}_{k}+\bar{d}_{k}\hat{\beta}_{k}$, and $\hat{\beta}_{k}=\frac{\left\langle\bar{d}_{k},g\left(x_{k+1}\right)-\overline{x}_{k}\right\rangle}{\left\|\bar{d}_{k}\right\|^{2}}$.

Proof.

Substituting $\overline{x}_k = x_{k+1} - \bar{d}_k \beta_k$ in the definition of $\hat{\beta}_k$:

$$
\begin{aligned}
\hat{\beta}_k &= \frac{\left\langle \bar{d}_k, g\left(x_{k+1}\right) - x_{k+1} + \bar{d}_k \beta_k \right\rangle}{\left\| \bar{d}_k \right\|^2} \\
\hat{\beta}_k &= \frac{\left\langle \bar{d}_k, f\left(x_{k+1}\right) \right\rangle}{\left\| \bar{d}_k \right\|^2} + \beta_k.
\end{aligned}
$$

Note that $\frac{\left\langle \bar{d}_k, f\left(x_{k+1}\right) \right\rangle}{\left\| \bar{d}_k \right\|^2} = 0$ (i.e., $f\left(x_{k+1}\right)$ is orthogonal to $\bar{d}_k$) implies $\hat{\beta}_k = \beta_k$. In that case, $\hat{\beta}_k$ is not an improvement over $\beta_k$.

The next iterate $\hat{x}_{k+1}$ is

$$
\begin{aligned}
\hat{x}_{k+1} &= \overline{x}_k + \bar{d}_k \hat{\beta}_k \\
\hat{x}_{k+1} &= x_{k+1} - \bar{d}_k \beta_k + \bar{d}_k \left( \frac{\left\langle \bar{d}_k, f\left(x_{k+1}\right) \right\rangle}{\left\| \bar{d}_k \right\|^2} + \beta_k \right) \\
\hat{x}_{k+1} &= x_{k+1} + \bar{d}_k \frac{\left\langle \bar{d}_k, f\left(x_{k+1}\right) \right\rangle}{\left\| \bar{d}_k \right\|^2} \\
\hat{x}_{k+1} &= x_{k+1} - \bar{d}_k \left( \bar{d}_k^{\intercal} \bar{d}_k \right)^{-1} \bar{d}_k^{\intercal} \left( A x_{k+1} - b \right) \\
\hat{x}_{k+1} &= x_{k+1} - P_{\bar{d}_k} \left( A x_{k+1} - b \right),
\end{aligned}
$$

where $P_{\bar{d}_k} = \bar{d}_k \left( \bar{d}_k^{\intercal} \bar{d}_k \right)^{-1} \bar{d}_k^{\intercal}$ is a projection matrix. Computing $f\left(\hat{x}_{k+1}\right)$ and expressing it as a function of $f\left(x_{k+1}\right)$:

$$
\begin{aligned}
f\left(\hat{x}_{k+1}\right) &= -A \left( x_{k+1} - P_{\bar{d}_k} \left( A x_{k+1} - b \right) \right) + b \\
f\left(\hat{x}_{k+1}\right) &= -\left( A x_{k+1} - A P_{\bar{d}_k} \left( A x_{k+1} - b \right) - b \right) \\
f\left(\hat{x}_{k+1}\right) &= -\left( I - A P_{\bar{d}_k} \right) \left( A x_{k+1} - b \right) \\
f\left(\hat{x}_{k+1}\right) &= \left( I - A P_{\bar{d}_k} \right) f\left(x_{k+1}\right).
\end{aligned}
$$

If $f\left(x_{k+1}\right) = \mathbf{0}$ (the AA algorithm has converged), then $f\left(\hat{x}_{k+1}\right) = \mathbf{0}$ and the theorem is trivially verified. If not, we may compute $\left\| f\left(\hat{x}_{k+1}\right) \right\|_{A^{-1}}^2$ and $\left\| f\left(x_{k+1}\right) \right\|_{A^{-1}}^2$:

$$\left\| f\left(x_{k+1}\right) \right\|_{A^{-1}}^2 = f\left(x_{k+1}\right)^{\intercal} A^{-1} f\left(x_{k+1}\right) = \left( A^{-0.5} f\left(x_{k+1}\right) \right)^{\intercal} \left( A^{-0.5} f\left(x_{k+1}\right) \right).$$

Similarly,

$$
\begin{aligned}
\left\| f\left(\hat{x}_{k+1}\right) \right\|_{A^{-1}}^2 &= f\left(x_{k+1}\right)^{\intercal} \left( I - A P_{\bar{d}_k} \right)^{\intercal} A^{-1} \left( I - A P_{\bar{d}_k} \right) f\left(x_{k+1}\right) \\
\left\| f\left(\hat{x}_{k+1}\right) \right\|_{A^{-1}}^2 &= f\left(x_{k+1}\right)^{\intercal} \left( A^{-0.5} - A^{0.5} P_{\bar{d}_k} \right)^{\intercal} \left( A^{-0.5} - A^{0.5} P_{\bar{d}_k} \right) f\left(x_{k+1}\right) \\
\left\| f\left(\hat{x}_{k+1}\right) \right\|_{A^{-1}}^2 &= \left( A^{-0.5} f\left(x_{k+1}\right) \right)^{\intercal} \left( I - A^{0.5} P_{\bar{d}_k} A^{0.5} \right)^2 A^{-0.5} f\left(x_{k+1}\right).
\end{aligned}
$$

By the properties of projection operators and matrix multiplications, the minimum eigenvalue of $P_{\bar{d}_k}$ is zero and its maximum eigenvalue is one. Let $\lambda_{\min}$ and $\lambda_{\max}$ be the smallest and largest eigenvalues of $A$, respectively. Since $A$ has no negative eigenvalues, the minimum eigenvalue of $A^{0.5} P_{\bar{d}_k} A^{0.5}$ for any $\bar{d}_k$ is $\sqrt{\lambda_{\min}} \cdot 0 \cdot \sqrt{\lambda_{\min}} = 0$ and its maximum eigenvalue is at most $\sqrt{\lambda_{\max}} \cdot 1 \cdot \sqrt{\lambda_{\max}} = \lambda_{\max} < 2$. Therefore, the maximum eigenvalue of $\left( I - A^{0.5} P_{\bar{d}_k} A^{0.5} \right)^2$ is at most $\max\{ \left(1 - 0\right)^2, \left(1 - \lambda_{\max}\right)^2 \} = 1$.

The ratio

$$\frac{\left\| f\left(\hat{x}_{k+1}\right) \right\|_{A^{-1}}^2}{\left\| f\left(x_{k+1}\right) \right\|_{A^{-1}}^2} = \frac{\left( A^{-0.5} f\left(x_{k+1}\right) \right)^{\intercal} \left( I - A^{0.5} P_{\bar{d}_k} A^{0.5} \right)^2 A^{-0.5} f\left(x_{k+1}\right)}{\left( A^{-0.5} f\left(x_{k+1}\right) \right)^{\intercal} \left( A^{-0.5} f\left(x_{k+1}\right) \right)}$$

is a Rayleigh quotient. By the property of Rayleigh quotients, its maximum value is the maximum eigenvalue of $\left( I - A^{0.5} P_{\bar{d}_k} A^{0.5} \right)^2$. Therefore $\frac{\left\| f\left(\hat{x}_{k+1}\right) \right\|_{A^{-1}}}{\left\| f\left(x_{k+1}\right) \right\|_{A^{-1}}} \leq 1$, which completes the proof. ∎
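The inequality can be checked numerically. The sketch below is only an illustration under stated assumptions, not part of the paper's experiments: it takes $A$ diagonal with eigenvalues drawn in $(0, 2)$ (any symmetric $A$ can be brought to this form by an orthogonal change of basis, which leaves Euclidean inner products unchanged), sets $g(x) = x - (Ax - b)$ so that $f(x) = b - Ax$, and draws $\overline{x}_k$, $\bar{d}_k$ and $\beta_k$ at random. All function and variable names are hypothetical.

```python
import math
import random


def dot(u, v):
    """Euclidean inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def check_once(rng, n=6):
    """Return (||f(x_hat)||_{A^-1}, ||f(x_next)||_{A^-1}) for one random draw."""
    # Assumption: A is diagonal, stored as its diagonal, eigenvalues in (0, 2).
    a = [rng.uniform(0.05, 1.95) for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]

    def g(x):  # fixed-point map g(x) = x - (A x - b)
        return [xi - (ai * xi - bi) for xi, ai, bi in zip(x, a, b)]

    def f(x):  # residual f(x) = g(x) - x = b - A x
        return [gi - xi for gi, xi in zip(g(x), x)]

    def norm_a_inv(r):  # ||r||_{A^-1} = sqrt(r' A^-1 r)
        return math.sqrt(sum(ri * ri / ai for ri, ai in zip(r, a)))

    x_bar = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    d_bar = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    beta = rng.uniform(0.1, 2.0)

    # x_{k+1} from the initial relaxation parameter beta_k
    x_next = [xb + beta * d for xb, d in zip(x_bar, d_bar)]
    g_next = g(x_next)

    # beta_hat_k = <d_bar, g(x_{k+1}) - x_bar> / ||d_bar||^2
    diff = [gi - xb for gi, xb in zip(g_next, x_bar)]
    beta_hat = dot(d_bar, diff) / dot(d_bar, d_bar)
    x_hat = [xb + beta_hat * d for xb, d in zip(x_bar, d_bar)]

    return norm_a_inv(f(x_hat)), norm_a_inv(f(x_next))


rng = random.Random(0)
results = [check_once(rng) for _ in range(1000)]
# Small relative tolerance to absorb floating-point rounding.
violations = sum(1 for new, old in results if new > old * (1 + 1e-12))
print("violations:", violations)
```

Under these assumptions the theorem predicts that no draw violates the inequality, whatever the initial $\beta_k$.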

As mentioned in the introduction, with $g\left(x_{k+1}\right)$ already computed, using $\hat{\beta}_k$ at step $k$ instead of $\beta_k$ involves computing $g\left(\hat{x}_{k+1}\right)$, a second map in the same iteration $k$. However, if the optimal relaxation parameters tend to be correlated, using $\hat{\beta}_k$ in step $k+1$ only adds two inner products, a negligible computation cost within the AA algorithm.
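To make the cost concrete, here is a minimal sketch of the two equivalent expressions for $\hat{\beta}_k$ derived in the proof. The helper names and the numbers are made up for illustration; in particular, `g_x_next` stands in for a value of $g\left(x_{k+1}\right)$ that an AA implementation would already have available.

```python
def dot(u, v):
    """Euclidean inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def beta_hat(d_bar, g_x_next, x_bar):
    """beta_hat_k = <d_bar_k, g(x_{k+1}) - x_bar_k> / ||d_bar_k||^2.

    Costs two inner products: the numerator and ||d_bar_k||^2.
    """
    diff = [gi - xi for gi, xi in zip(g_x_next, x_bar)]
    return dot(d_bar, diff) / dot(d_bar, d_bar)


def beta_hat_shifted(d_bar, f_x_next, beta):
    """Equivalent form: <d_bar_k, f(x_{k+1})> / ||d_bar_k||^2 + beta_k."""
    return dot(d_bar, f_x_next) / dot(d_bar, d_bar) + beta


# Made-up numbers, for illustration only.
x_bar = [1.0, 2.0]
d_bar = [1.0, -1.0]
beta = 1.0
x_next = [xb + beta * d for xb, d in zip(x_bar, d_bar)]   # [2.0, 1.0]
g_x_next = [3.0, 1.0]                                     # pretend value of g(x_{k+1})
f_x_next = [gi - xi for gi, xi in zip(g_x_next, x_next)]  # [1.0, 0.0]

beta_direct = beta_hat(d_bar, g_x_next, x_bar)
beta_shifted = beta_hat_shifted(d_bar, f_x_next, beta)
print(beta_direct, beta_shifted)  # 1.5 1.5
```

Both forms agree; the second makes explicit that $\hat{\beta}_k$ departs from $\beta_k$ only through the component of the residual along $\bar{d}_k$.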

3.2 A linear example with AAmd

Consider the same example as in Section 2.2. Four different AA implementations will be compared to study the impact of the initial choice of $\beta_k$ on $\hat{\beta}_k$ and the impact of using $\hat{\beta}_{k-1}$ as an approximation for $\hat{\beta}_k$. The first implementation is AA with a stationary relaxation parameter $\beta = 1$, for reference. The second is AA where, at each iteration, a default relaxation parameter $\beta = 1$ is used to compute $x_{k+1}$ and $g(x_{k+1})$, and $x_{k+1}$ and $g(x_{k+1})$ are then used to compute $\hat{\beta}_k$ to obtain $\hat{x}_{k+1}$. This second specification will be labeled “AAmd, $\hat{\beta}_k$ (from 1)”. The third implementation is AA where, at each iteration, the previous relaxation parameter $\hat{\beta}_{k-1}$ is used to compute $x_{k+1}$ and $g(x_{k+1})$, which are used to recompute $\hat{\beta}_k$ to obtain $\hat{x}_{k+1}$. This third specification will be labeled “AAmd, $\hat{\beta}_k$ (from $\hat{\beta}_{k-1}$)”. The fourth specification is AA where, at each iteration, $\hat{x}_k$ and $g(\hat{x}_k)$ are used to compute $\hat{\beta}_{k-1}$, and $\hat{\beta}_{k-1}$ is used directly at iteration $k$ to compute $\hat{x}_{k+1}$. It will be labeled “AAmd, $\hat{\beta}_{k-1}$”. Again, $m = 8$ is used in all algorithms.

Figure 3: Residuals (with norm $A^{-1}$), linear system of equations

Figure 4: Relaxation parameter, linear system of equations

Figure 3 shows the $A^{-1}$-norm of the residual for each algorithm. “AAmd, $\hat{\beta}_k$ (from $\hat{\beta}_{k-1}$)” and “AAmd, $\hat{\beta}_{k-1}$” show very modest convergence improvements. Note that with $m \geq 19$, all algorithms would essentially converge in the same number of iterations.

More interestingly, Figure 4 shows the relaxation parameter for each AA implementation. Most $\hat{\beta}_k$ are above 1, and very often above 2. As argued before, they are also clearly correlated from one iteration to the next. Hence, it makes sense to avoid computing an extra map by using past information to compute $\hat{\beta}$. Finally, the default relaxation parameter used to calculate $x_{k+1}$ matters, as attested by the relatively poor performance of “AAmd, $\hat{\beta}_k$ (from 1)” compared to “AAmd, $\hat{\beta}_k$ (from $\hat{\beta}_{k-1}$)” and “AAmd, $\hat{\beta}_{k-1}$”.

3.3 Extra regularization for AAmd

From now on, we only consider “AAmd, $\hat{\beta}_{k-1}$”, where $\hat{\beta}_{k-1}$ is used as an approximation for $\hat{\beta}_k$ to compute $x_{k+1}$. There is no guarantee that $\hat{\beta}_k \approx \hat{\beta}_{k-1}$, but a heuristic way of detecting whether it may be the case is by verifying how much $\hat{\beta}_k$ varies between iterations. If it varies too much, a safer strategy is to fall back on the default relaxation parameter.

Another concern is that a high $\hat{\beta}_k$ can lead to an even higher $\hat{\beta}_{k+1}$, $\hat{\beta}_{k+2}$, etc. In many numerical experiments shown in Section 5.4, $\hat{\beta}_1$ is set to a default relaxation parameter (always 1), but $\hat{\beta}_2, \hat{\beta}_3, \ldots$ sometimes drift far from the unit interval, causing worse convergence than with the default $\beta = 1$. A very effective solution to this problem is to reset $\beta_k$ to $1$ (or some other default value between 0 and 1) for one iteration if $\hat{\beta}$ has been above 1 for too many consecutive iterations.

3.4 The full AAmd algorithm

The entire AAmd algorithm is as follows. It takes as input two additional parameters: $\delta$, the discrepancy allowed between $\hat{\beta}_k$ and $\hat{\beta}_{k-1}$, and $P$, the maximum number of consecutive iterations where $\hat{\beta}_k$ is allowed to be greater than 1 before $\hat{\beta}_k$ is reset to $\beta_{\text{default}}$.

Algorithm 3.

Input: a mapping $g:\mathbb{R}^n \rightarrow \mathbb{R}^n$, a starting point $x_0 \in \mathbb{R}^n$, $1 \leq m \leq n$, $P \geq 0$, $\delta > 0$, $\beta_{\max} > 0$ and $0 < \beta_{\text{default}} \leq \beta_{\max}$.

1 Set $x_1 = g(x_0)$
2 Set $n_{>1} = 0$
3 for $k = 1, 2, \ldots$ until convergence
4     Compute $g(x_k)$
5     Compute $(\alpha_1^{(k)}, \ldots, \alpha_{m_k}^{(k)})$ that solve
      $\min_{\alpha_1^{(k)}, \ldots, \alpha_{m_k}^{(k)}} \left\| \sum_{i=1}^{m_k} \alpha_i^{(k)} f(x_{k-m_k+i}) \right\|$ s.t. $\sum_{i=1}^{m_k} \alpha_i^{(k)} = 1$
6     Compute $\overline{x}_k = \sum_{i=1}^{m_k} \alpha_i^{(k)} x_{k-m_k+i}$ and $\overline{y}_k = \sum_{i=1}^{m_k} \alpha_i^{(k)} g(x_{k-m_k+i})$
7     If $k \geq 2$: Compute $\hat{\beta}_{k-1} = \frac{\left\langle \overline{y}_{k-1} - \overline{x}_{k-1},\; g(x_k) - \overline{x}_{k-1} \right\rangle}{\left\| \overline{y}_{k-1} - \overline{x}_{k-1} \right\|^2}$
8     If $k \geq 3$ and $|\hat{\beta}_{k-1} - \hat{\beta}_{k-2}| < \delta$ and $n_{>1} \leq P$ and $\hat{\beta}_{k-1} > 0$
9       Set $\beta_k = \min(\hat{\beta}_{k-1}, \beta_{\max})$
10     else
11       Set $\beta_k = \beta_{\text{default}}$
12     end if
13     If $\beta_k > 1$
14       Set $n_{>1} = n_{>1} + 1$
15     else
16       Set $n_{>1} = 0$
17     end if
18     Set $x_{k+1} = \overline{x}_k + \beta_k (\overline{y}_k - \overline{x}_k)$
19 end for

The parameters chosen for the experiments below are $\beta_{\text{default}} = 1$, $\beta_{\max} = 3$, $\delta = 2$, and $P = 10$.
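The relaxation logic in steps 7 to 17 of Algorithm 3 can be illustrated with a short Python sketch. It is not the implementation used in the experiments; the function names `beta_hat` and `choose_beta` are hypothetical, vectors are plain lists, and the defaults simply mirror the parameter values quoted above.

```python
def beta_hat(x_bar_prev, y_bar_prev, g_xk):
    """Step 7: coefficient of the projection of g(x_k) - x_bar_{k-1}
    onto the direction y_bar_{k-1} - x_bar_{k-1}.
    Assumes the direction is nonzero (no guard against division by zero)."""
    d = [y - x for x, y in zip(x_bar_prev, y_bar_prev)]
    r = [g - x for x, g in zip(x_bar_prev, g_xk)]
    return sum(di * ri for di, ri in zip(d, r)) / sum(di * di for di in d)


def choose_beta(k, bh_prev, bh_prev2, n_gt1,
                delta=2.0, P=10, beta_max=3.0, beta_default=1.0):
    """Steps 8 to 17: pick beta_k and update the counter n_{>1}.

    bh_prev and bh_prev2 stand for beta_hat_{k-1} and beta_hat_{k-2}
    (None when not yet available). Returns (beta_k, new_counter)."""
    if (k >= 3
            and abs(bh_prev - bh_prev2) < delta
            and n_gt1 <= P
            and bh_prev > 0):
        beta_k = min(bh_prev, beta_max)   # step 9: capped estimate
    else:
        beta_k = beta_default             # step 11: fall back on the default
    # Steps 13 to 17: count consecutive iterations with beta_k > 1.
    n_gt1 = n_gt1 + 1 if beta_k > 1 else 0
    return beta_k, n_gt1
```

Once the counter exceeds $P$, the next call returns the default value and the counter drops back to zero, which is how a single-iteration reset arises in this sketch.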

4 Implementation details for Anderson acceleration

4.1 Solving the linear system

In addition to computing $g$, AA can spend a substantial amount of time solving the linear system. The problem is customarily formulated as an unconstrained optimization

$$\min_{\gamma_k} \left\| f(x_k) - \mathcal{F}_k \gamma_k \right\|,$$

where $\mathcal{F}_k = \left[ f(x_k) - f(x_{k-1}), \cdots, f(x_{k-m_k}) - f(x_{k-m_k-1}) \right] \in \mathbb{R}^{n \times m_k}$, and $\gamma_k \in \mathbb{R}^{m_k}$. As suggested in [23], it can be solved more quickly by QR decomposition. New columns can be added from the right of the Q and R matrices at each iteration and efficiently dropped from the left using Givens rotations. In the AA implementation used in the numerical section, the QR decomposition is recomputed from scratch after 10 rotations to limit the accumulation of numerical inaccuracies.

A central concern of all Anderson-type acceleration methods is the conditioning of the linear system. As the algorithm converges, new columns can be orders of magnitude smaller than old ones and can sometimes be close to linearly dependent. As in [23], this will be addressed by dropping the left-most columns of $\mathcal{F}_k$ until the R matrix has a reasonable condition number. In the PDE estimation, the upper limit for the condition number was set to $10^{12}$, while for the EM algorithm application, it was $10^{5}$.

Other methods for addressing ill-conditioning and adjusting $m_k$ have been suggested in [10], [15] and [6].

4.2 Restarts and composite AA

To limit ill-conditioning and the size of the linear system to be solved, Fang and Saad [10] suggested restarting the algorithm from the last iterate and ignoring past directions. Pratapa and Suryanarayana [17] and Henderson and Varadhan [11] made similar points in the context of Pulay mixing and AA.

A cousin of this idea is the composite AA, explored by Chen and Vuik in 2022 [5]. After one AA iteration, instead of using the iterate $x_k$ directly to compute the map $g(x_k)$, they propose using $x_k$ as the starting point for a second AA (lasting only one or two iterations), and feeding the result of this second (inner-loop) AA back into the original (outer-loop) AA.

Both ideas were tried on all problems. While periodic restarts did not reliably improve performance, composite AAmd with a one-iteration AA for the inner loop showed very good results for the PDE problem. Hence, composite AA with a one-iteration AA with $\beta = 1$ in the inner loop will be included in the set of specifications to test.

5 Applications

A Poisson PDE application and two EM algorithm applications were used as benchmarks. They are sufficiently challenging to estimate and require enough iterations to create visible differences in performance between different AA implementations. They also offer a good variety in the number of parameters and in mapping computation time.

The EM algorithm is commonly used to fit statistical models with missing data or latent variables to estimate the parameters of underlying unobserved distributions. It consists of two steps. An expectation step takes the model parameters as given and updates the parameters of the unobserved data via Bayes' rule. Then, the likelihood of the observed data is maximized, taking the unobserved distributions as given. The EM algorithm is usually very stable and always converges, although it can sometimes be to a saddle point instead of a maximum. It is also notoriously slow, making it a prime candidate for acceleration. Both EM applications were adapted from the R code used in [11].

5.1 The Bratu problem

The standard Liouville-Bratu-Gelfand equation is a nonlinear version of the Poisson equation, described as

$$\Delta u + \lambda e^{u} = 0,$$

where $u$ is a function of $(x, y) \in \mathcal{D} = [0,1]^2$ and $\lambda$ is a constant physical parameter. It is a popular application for benchmarking new fixed-point acceleration methods (see [23] and [10], for example). Dirichlet boundary conditions are applied such that $u(x, y) = 0$ on the boundary of $\mathcal{D}$. It is solved using the inverse of the discrete Laplace operator as preconditioner as in [6]. The mapping is

$$x_i^{(k+1)} = x_i^{(k)} + \left( b_i - A_i x + \lambda e^{x_i^{(k)}} \right) / A_{i,i}, \quad i = 1, \ldots, 50^2,$$

where $A$ is the Laplace operator, $A_{i,i}$ is its entry in row $i$ and column $i$, and $A_i$ is the entire row $i$. In the experiments, $\lambda$ was set to $6$ and a centered-difference discretization on a $50 \times 50$ grid was used.
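To make the mapping concrete, the following Python sketch applies it once on a small grid. Several choices here are assumptions rather than details from the text: $A$ is taken to be the negative five-point discrete Laplacian (diagonal $4/h^2$, off-diagonals $-1/h^2$, with $h = 1/(N+1)$), $b = 0$ because the boundary values are zero, and the name `bratu_map` is hypothetical.

```python
import math


def bratu_map(x, N, lam=6.0):
    """One application of the fixed-point map on an N x N interior grid.

    x is a flat list of length N*N in row-major order. Assumes A is the
    negative five-point Laplacian, b = 0, and zero Dirichlet boundaries.
    """
    h = 1.0 / (N + 1)
    diag = 4.0 / (h * h)       # A_{i,i}
    off = -1.0 / (h * h)       # entries of A for the four grid neighbors
    new = [0.0] * (N * N)
    for r in range(N):
        for c in range(N):
            i = r * N + c
            # A_i x: row i of A times the current iterate.
            Ax = diag * x[i]
            if r > 0:
                Ax += off * x[i - N]
            if r < N - 1:
                Ax += off * x[i + N]
            if c > 0:
                Ax += off * x[i - 1]
            if c < N - 1:
                Ax += off * x[i + 1]
            # x_i + (b_i - A_i x + lam * exp(x_i)) / A_{i,i} with b_i = 0.
            new[i] = x[i] + (lam * math.exp(x[i]) - Ax) / diag
    return new
```

Starting from the zero vector, the first application returns the constant value $\lambda h^2 / 4$ at every grid point; later applications produce larger values in the interior than near the boundary.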

5.2 The EM algorithm for a proportional hazard model with interval censoring

Proportional hazard models are commonly used in medical and social studies, and censored data is a frequent occurrence which complicates their estimation. Wang et al. [24] proposed using the EM algorithm to estimate a semiparametric proportional hazard model with interval censoring. Their estimation is a two-stage data augmentation with latent Poisson random variables and a monotone spline to represent the baseline hazard function. The algorithm is light and simple to implement (see [24] for details), yet may benefit from acceleration.

The likelihood of an individual observation is

$$L(\delta_1, \delta_2, \delta_3, \mathbf{x}) = F(R|\mathbf{x})^{\delta_1} \left\{ F(R|\mathbf{x}) - F(L|\mathbf{x}) \right\}^{\delta_2} \left\{ 1 - F(L|\mathbf{x}) \right\}^{\delta_3},$$

where $\delta_1$, $\delta_2$, and $\delta_3$ indicate left-, interval-, or right-censoring, respectively.

Synthetic test data is produced as follows. The baseline hazard function is modeled as a six-parameter I-spline and generated as $\Lambda_0(t) = \log(1+t) + t^{1/2}$. Failure times $T$ are generated from a distribution $F(t, \mathbf{x}) = 1 - \exp\{-\Lambda_0(t) \exp(\mathbf{x}^{\intercal} \beta)\}$. The covariates $\mathbf{x}$ are $x_1, x_2 \sim N(0, 0.5^2)$ and $x_3, x_4 \sim \text{Bernoulli}(0.5)$, for a total of 10 parameters to estimate. For each subject, censoring is simulated by generating $Y \sim \text{Exponential}(1)$ and setting $(L, R) = (Y, \infty)$ if $Y \leq T$ or $(L, R) = (0, Y)$ if $Y > T$. The sample size was 2000 individuals.
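The data-generating process above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the R code of [11]: the coefficient vector `beta` passed by the caller is hypothetical (the true values are not given here), $T$ is drawn by inverting $\Lambda_0$ with bisection, and the function names are invented for the example.

```python
import math
import random


def Lambda0(t):
    """Baseline cumulative hazard log(1 + t) + sqrt(t)."""
    return math.log(1.0 + t) + math.sqrt(t)


def invert_Lambda0(target):
    """Solve Lambda0(t) = target by bisection (Lambda0 is increasing)."""
    lo, hi = 0.0, 1.0
    while Lambda0(hi) < target:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if Lambda0(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_subject(beta, rng):
    """Draw covariates, a failure time T, and the censoring interval (L, R)."""
    x = [rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5),
         float(rng.random() < 0.5), float(rng.random() < 0.5)]
    lin = sum(b * xi for b, xi in zip(beta, x))
    # If E ~ Exponential(1), then T with Lambda0(T) * exp(lin) = E has cdf F.
    e = rng.expovariate(1.0)
    T = invert_Lambda0(e / math.exp(lin))
    Y = rng.expovariate(1.0)
    # Y <= T: failure not yet observed at Y; otherwise failure occurred before Y.
    L, R = (Y, math.inf) if Y <= T else (0.0, Y)
    return x, L, R
```

For instance, `simulate_subject([0.5, -0.5, 0.5, -0.5], random.Random(0))` returns one subject; repeating the call 2000 times would give a sample of the size used in the experiments.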

During the estimations, monitoring the value of the likelihood was not necessary for convergence.

5.3 EM algorithm for admixed populations

When associating health outcomes with specific genes, a recurring confounding factor is population stratification, the clustering of genes within population subgroups. In [1], Alexander, Novembre and Lange put forward a new algorithm called ADMIXTURE to identify latent subpopulations from genomics data using the EM algorithm.

To test the algorithm, datasets are simulated as follows. Individuals $i \in 1, \ldots, n$ are assumed to be part of $K$ distinct ancestral groups in various proportions. Each has a pair of alleles $(a_{i,j}^{1}, a_{i,j}^{2})$ at the marker $j \in 1, \ldots, J$ with major or minor frequencies recorded in a variable $X$. For individual $i$ and marker $j$, $X_{i,j} = 0$ if both alleles are minor, $X_{i,j} = 1$ if one is minor and one is major, and $X_{i,j} = 2$ if both are major. The probability of observing $X_{i,j}$ is determined by ancestry-specific parameters $f_{k,j}$, the frequency of minor alleles at the marker $j$ in the ancestral population $k \in 1, \ldots, K$, and the parameter $q_{i,k}$ representing the unobserved proportion of ancestry of individual $i$ from group $k$. The log-likelihood function is

$$L(\mathbf{F}, \mathbf{Q}) = \sum_{i=1}^{n} \sum_{j=1}^{J} \left\{ X_{i,j} \log\left( \sum_{k=1}^{K} q_{i,k} f_{k,j} \right) + \left( 2 - X_{i,j} \right) \log\left( \sum_{k=1}^{K} q_{i,k} \left( 1 - f_{k,j} \right) \right) \right\}, \tag{5.1}$$

where $\mathbf{F}^{K \times J}$ and $\mathbf{Q}^{n \times K}$ are matrices with entries $f_{k,j}$ and $q_{i,k}$, respectively. To restrict probabilities to the unit interval during the estimation, they are modeled as transformations from unbounded parameters $u_{k,j}, v_{i,k}$: $f_{k,j} = 1/(1 + e^{-u_{k,j}})$ and $q_{i,k} = e^{v_{i,k}} / \sum_{k} e^{v_{i,k}}$. New random values for $X$ and new starting values are generated for each draw with parameters $K = 3$, $J = 100$, and $n = 150$, for a total of $3 \times (100 + 150) = 750$ parameters to estimate.
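As an illustration, the log-likelihood (5.1) and the two transformations can be evaluated with the Python sketch below. The function names and the nested-list layout of `X`, `U`, and `V` are assumptions made for the example; the softmax subtracts the row maximum only for numerical stability, which leaves $q_{i,k}$ unchanged.

```python
import math


def logistic(u):
    """f = 1 / (1 + exp(-u)), mapping an unbounded u to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))


def softmax(v):
    """q_k = exp(v_k) / sum_k exp(v_k), computed stably."""
    m = max(v)
    e = [math.exp(vi - m) for vi in v]
    s = sum(e)
    return [ei / s for ei in e]


def admixture_loglik(X, U, V):
    """Log-likelihood (5.1).

    X: n x J genotype codes in {0, 1, 2}
    U: K x J unbounded parameters behind f_{k,j}
    V: n x K unbounded parameters behind q_{i,k}
    """
    F = [[logistic(u) for u in row] for row in U]
    Q = [softmax(row) for row in V]
    n, J, K = len(X), len(X[0]), len(U)
    total = 0.0
    for i in range(n):
        for j in range(J):
            p = sum(Q[i][k] * F[k][j] for k in range(K))
            p_comp = sum(Q[i][k] * (1.0 - F[k][j]) for k in range(K))
            total += X[i][j] * math.log(p) + (2 - X[i][j]) * math.log(p_comp)
    return total
```

With a single ancestral group and all $u_{k,j} = 0$, every $f_{k,j}$ equals $0.5$ and each $(i, j)$ term reduces to $2 \log 0.5$, which gives a quick sanity check.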

For AA to converge, it was necessary to monitor the likelihood value (5.1) and fall back to the last EM iteration in case an AA iteration led to a worse likelihood value. Also, since convergence was slow, the stopping criterion was set to $\left\| f(x) \right\| \leq 10^{-4}$.

5.4 Example Bratu problem

5.4.1 Non-composite AA

This section compares AAopt1 and AAmd to AAopt0 and AA with stationary relaxation parameters of $\beta = 1$ and $\beta = 0.5$ for the Bratu problem with starting point $x_0 = \mathbf{0}$. Hereafter, AAmd refers to Algorithm 3 with $\hat{\beta}$ capped at $\beta_{\max}$ and resets to $\hat{\beta} = 1$. To show the impact of these regularizations, the examples also show the performance of AAmd without bounds on $\hat{\beta}$ or resets, labeled "AAmd (no reg.)". All algorithms used a maximum of $m = 16$ lags. AAopt0 was implemented as in [6], with $\beta_k$ set to $0.5$ whenever the optimal relaxation parameter $\beta_k^{\ast}$ was zero or fell outside the unit interval.


Figure 5: Residual norm, Bratu problem, m=16


Figure 6: Relaxation parameters, Bratu problem, m=16

Figure 5 shows the residuals for the Bratu problem for non-composite versions of AAopt0, AAopt1, AAmd, and AAmd (no reg.). As in the linear model of Section 2.2, AAopt1 needed the fewest iterations to converge. However, AAmd's results are promising given that it requires a single mapping per iteration.

Figure 6 shows the corresponding relaxation parameters. Looking at AAmd (no reg.), it is striking how, without constraints or resets, $\hat{\beta}_k$ takes extremely large values and the algorithm converges as slowly as AA with stationary relaxation. With regularization, AAmd performs much better than AA with constant relaxation.

5.4.2 Composite AA

The Bratu problem is estimated with the same specifications as in Section 5.4.1, except that AA is now composite with an AA1 inner loop (identified with a c, see Section 4.2).


Figure 7: Residual norm, Bratu problem, m=16, composite AA


Figure 8: Relaxation parameters, Bratu problem, m=16, composite AA

Figure 7 shows the residuals for the Bratu problem with composite versions of AA and Figure 8 shows the corresponding relaxation parameters. The horizontal axis refers to the number of outer-loop iterations. AAopt1, c converges in the fewest iterations, although the difference with other AA algorithms is relatively small, especially considering that it requires 6 maps per iteration. AA implementations that require only 3 mappings per iteration, such as AAmd, c and AA, c with constant $\beta = 0.5$, also converged reasonably quickly.

5.5 Experiments

Based on the previous example, the most promising AA implementations to investigate further are AAopt1, AAmd and AAmd, c. Additionally, AAopt1_4 and AAopt1_16, which update $\beta^{\ast}$ every 4 or every 16 iterations, are studied. These implementations are again compared with stationary AA with $\beta = 1$ and $\beta = 0.5$, as well as their composite versions.

To select the optimal $m$ for each AA algorithm in each experiment, each algorithm was run with $m \in \{2, 4, 8, 16, 32, 64\}$ for 500 draws. The fastest in terms of computation time (at the 0.75 quantile to favor robustness) was selected. For the EM algorithm for the proportional hazard model with interval censoring, the maximum $m = 10$ was clearly the best choice for all algorithms.
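The selection rule can be sketched as follows in Python. The exact quantile definition used in the experiments is not stated, so the nearest-rank rule below is an assumption, as are the names `quantile` and `select_m`.

```python
import math


def quantile(values, q):
    """Nearest-rank empirical quantile (one of several common definitions)."""
    s = sorted(values)
    rank = max(1, math.ceil(q * len(s)))
    return s[rank - 1]


def select_m(times_by_m, q=0.75):
    """Return the m whose computation times have the smallest q-quantile.

    times_by_m maps each candidate m to a list of times, one per draw.
    """
    return min(times_by_m, key=lambda m: quantile(times_by_m[m], q))
```

Ranking by an upper quantile rather than the median penalizes a value of $m$ that is usually fast but slow on a sizeable share of draws, which is the robustness motive mentioned above.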

A total of 5000 draws were generated for the Bratu problem and the EM for the proportional hazard model with interval censoring. To reduce simulation time, only 1000 draws were calculated for the EM algorithm for admixed populations. The stopping criterion was $\|f(x_k)\| \leq 10^{-8}$ for all applications, except the EM algorithm for admixed populations, which used $\|f(x_k)\| \leq 10^{-4}$. All algorithms that did not converge were stopped after 10 000 mappings.

Computation times are presented using the performance profiles of Dolan and Moré [8]. They show at which frequency each algorithm's time was within a certain factor of the fastest algorithm for each draw. The 99% confidence intervals for the median number of iterations (outer-loop iterations for composite AA), mappings, and computation times are also reported in tables, along with convergence rates. All computations were single-threaded, performed on Julia 1.10.4 [4], with a 13th Gen Intel(R) Core(TM) i9-13900HX 2.20 GHz CPU running the Windows subsystem for Linux.
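As a minimal sketch of how such a performance profile can be computed, assume the timings are stored in an array with one row per draw and one column per algorithm, with failed runs marked as infinite and at least one algorithm converging on every draw. The function name and the small timing array are illustrative assumptions, not the paper's code or results.

```python
import numpy as np

def performance_profile(times, taus):
    """Fraction of draws on which each algorithm is within a factor tau
    of the fastest algorithm on that draw.

    times: array of shape (n_draws, n_algorithms); np.inf marks a failure.
    taus:  factors at which the profile is evaluated.
    Returns an array of shape (len(taus), n_algorithms).
    """
    times = np.asarray(times, dtype=float)
    taus = np.asarray(taus, dtype=float)
    best = times.min(axis=1, keepdims=True)   # fastest time on each draw
    ratios = times / best                     # performance ratios, all >= 1
    return (ratios[None, :, :] <= taus[:, None, None]).mean(axis=1)

# Made-up timings for three draws and two algorithms.
times = np.array([[1.0, 2.0],
                  [3.0, 1.5],
                  [2.0, np.inf]])
profile = performance_profile(times, taus=[1.0, 2.0])
```

The first row of `profile` gives the share of draws on which each algorithm was the fastest, and later rows show how quickly each curve approaches its convergence rate as the allowed factor grows.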

Figure 9: Performance profiles for the Bratu problem

For the Bratu problem, the starting values $x_0^{(i)}$ were drawn from $U(0,1)$ distributions. The results are reported in Figure 9 and Table 1. The composite AAmd was the fastest more than 60% of the time, requiring a median of 67 iterations and 199 maps to converge. The runner-up was the composite AA with constant $\beta=0.5$. Interestingly, non-composite AAmd was also faster than both non-composite AA variants with constant relaxation.

Table 1: Median performances: Bratu problem
Algorithm                     Iterations    Maps          Time (ms)           Converged
AAopt1, $m=32$                (89, 89)      (263, 263)    (148.22, 149.19)    1
AAopt1_4, $m=16$              (161, 162)    (241, 242)    (136.73, 137.67)    1
AAopt1_16, $m=16$             (203, 204)    (229, 230)    (134.28, 135.47)    1
AAmd, $m=32$                  (218, 219)    (218, 219)    (152.02, 153.4)     1
AAmd, c, $m=32$               (67, 67)      (199, 199)    (112.12, 112.82)    1
AA, $\beta=1.0$, $m=64$       (223, 224)    (223, 224)    (212.83, 214.67)    1
AA, $\beta=0.5$, $m=32$       (273, 278)    (273, 278)    (194.53, 197.93)    1
AA, $\beta=1.0$, c, $m=64$    (87, 87)      (259, 259)    (157.77, 158.96)    1
AA, $\beta=0.5$, c, $m=32$    (70, 71)      (208, 211)    (116.79, 117.9)     1
Note: 99% conf. interval for the median. 2500 parameters.
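
As a quick consistency check on the mapping counts in Table 1 (an observation from the reported medians, not a formula stated in the text), the composite schemes and AAopt1 use about three maps per iteration:

$$
\begin{aligned}
\text{AAmd, c:} &\quad 3 \times 67 - 2 = 199, \\
\text{AA, } \beta = 1.0\text{, c:} &\quad 3 \times 87 - 2 = 259, \\
\text{AAopt1:} &\quad 3 \times 89 - 4 = 263.
\end{aligned}
$$

By contrast, non-composite AAmd and stationary AA report identical iteration and map counts (for example, 218 to 219 for AAmd), that is, one map per iteration. This is why a scheme that saves iterations only pays off in computation time if the saving outweighs its extra maps.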

Figure 10: Performance profiles for the EM algorithm for a proportional hazard model with interval censoring

Figure 11: Performance profiles for the EM algorithm for admixed populations

The results of the numerical experiments with the EM algorithm for a proportional hazard model with interval censoring are shown in Figure 10 and Table 2. AAmd was the fastest to converge, with a median number of iterations and mapping evaluations of approximately 99. AAopt1_16 was a close second. In contrast to the Bratu problem, composite AA clearly did not benefit from the AA1 inner loop compared to non-composite AA. Still, among composite AA variants, AAmd slightly outperformed those with constant relaxation.

Table 2: Median performances: EM algorithm for a proportional hazard model with interval censoring
Algorithm                     Iterations    Maps          Time (ms)         Converged
AAopt1, $m=10$                (87, 93)      (257, 275)    (71.59, 76.41)    0.962
AAopt1_4, $m=10$              (102, 110)    (152, 164)    (43.71, 47.56)    0.894
AAopt1_16, $m=10$             (99, 107)     (113, 121)    (32.8, 35.14)     0.96
AAmd, $m=10$                  (96, 102)     (96, 102)     (28.43, 30.28)    0.958
AAmd, c, $m=10$               (68, 72)      (202, 214)    (56.84, 60.03)    0.97
AA, $\beta=1.0$, $m=10$       (129, 139)    (129, 139)    (37.98, 40.41)    0.952
AA, $\beta=0.5$, $m=10$       (169, 184)    (169, 184)    (48.79, 52.8)     0.937
AA, $\beta=1.0$, c, $m=10$    (75, 79)      (223, 235)    (61.79, 65.68)    0.962
AA, $\beta=0.5$, c, $m=10$    (76, 80)      (226, 239)    (63.13, 67.07)    0.964
Note: 99% conf. interval for the median. 10 parameters.

Table 3: Median performances: EM algorithm for admixed populations
Algorithm                     Iterations    Maps          Time (s)        Converged
AAopt1, $m=8$                 (245, 253)    (731, 755)    (4.14, 4.28)    1
AAopt1_4, $m=2$               (355, 374)    (533, 560)    (3.4, 3.68)     1
AAopt1_16, $m=16$             (434, 454)    (488, 512)    (3.38, 3.55)    1
AAmd, $m=4$                   (363, 379)    (363, 379)    (2.55, 2.65)    1
AAmd, c, $m=4$                (161, 166)    (481, 496)    (2.76, 2.86)    0.999
AA, $\beta=1.0$, $m=8$        (377, 389)    (377, 389)    (2.67, 2.77)    1
AA, $\beta=0.5$, $m=16$       (449, 461)    (449, 461)    (3.21, 3.35)    1
AA, $\beta=1.0$, c, $m=4$     (182, 189)    (544, 565)    (3.16, 3.3)     1
AA, $\beta=0.5$, c, $m=8$     (207, 214)    (619, 640)    (3.57, 3.71)    1
Note: 99% conf. interval for the median. 750 parameters. Tolerance set to $\|f(x)\| \leq 10^{-4}$. 1000 draws.

Finally, Figure 11 and Table 3 summarize the results of the numerical experiments with the EM algorithm for admixed populations. AAmd was again the fastest, although its edge over stationary AA with $\beta=1$ was small. Composite AAmd was in third place with a clear edge over both composite AA with constant relaxation.

5.6 Discussion

The results in Sections 5.4 and 5.5 show that adaptive relaxation parameters can clearly reduce the number of iterations needed for AA to converge. Since maps are often computationally expensive, a relaxation strategy like AAmd, which does not require additional maps per iteration, offers the most predictable benefits. AA with a well-chosen constant relaxation parameter can approach its performance, but the results can vary greatly with the choice of $\beta$. A clear advantage of adaptive relaxation strategies is that they do not require tuning.

Additional numerical experiments (not presented here) were conducted to explore the impact of each parameter on AAmd. Setting $\beta_{\max}$ higher than 3 generally made little difference. A smaller $\delta$ (the allowed discrepancy between $\hat{\beta}_k$ and $\hat{\beta}_{k+1}$) made AAmd slightly slower by falling back on $\beta_{\text{default}}$ more frequently. Conversely, a larger $\delta$ may not be advisable for highly nonlinear applications for which optimal relaxation parameters could vary considerably between iterations. A key parameter is $P$, the number of consecutive $\hat{\beta}$ values above $1$ before a restart with $\beta=1$. A reasonable range is $3\leq P\leq 10$. Without periodic resets of $\hat{\beta}$, AAmd's convergence was often slower than that of stationary AA, as seen in Section 5.4.
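To make the interplay of these parameters concrete, the following is a hypothetical sketch of a safeguard that combines a cap, a discrepancy fallback, and a periodic reset. It is not AAmd's actual rule set, which is defined earlier in the paper: the class name, the absolute-difference form of the discrepancy test, and all default values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class RelaxationGuard:
    beta_max: float = 3.0        # upper cap on the relaxation parameter
    delta: float = 0.5           # allowed |beta_hat_{k+1} - beta_hat_k| (assumed form)
    beta_default: float = 1.0    # fallback when consecutive estimates disagree
    P: int = 5                   # consecutive estimates above 1 before a reset
    prev: Optional[float] = None # previous raw estimate, None right after a reset
    above_one: int = 0           # current run length of estimates above 1

    def step(self, beta_hat: float) -> Tuple[float, bool]:
        """Return (beta to use, restart flag) for a new raw estimate."""
        consistent = (self.prev is not None
                      and abs(beta_hat - self.prev) <= self.delta)
        self.prev = beta_hat
        self.above_one = self.above_one + 1 if beta_hat > 1.0 else 0
        if self.above_one >= self.P:
            # Too many consecutive estimates above 1: reset and use beta = 1.
            self.above_one = 0
            self.prev = None
            return 1.0, True
        beta = beta_hat if consistent else self.beta_default
        return min(beta, self.beta_max), False

guard = RelaxationGuard(delta=0.5, P=3)
trace = [guard.step(b) for b in (1.2, 1.4, 5.0, 0.8, 0.9)]
```

In this toy trace the first estimate falls back on the default, the second is accepted, and the third triggers a reset because it is the third consecutive value above 1. A smaller `delta` would reject more estimates, which is consistent with the more frequent fallbacks described above.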

Composite AA was successful in accelerating the estimation of the Bratu problem but performed poorly for the EM algorithm. Among non-composite AA implementations, AAopt1 often converged in fewer iterations, though it was not competitive in terms of computation time. By not recomputing $\beta^{\ast}$ at each iteration, AAopt1_4 and AAopt1_16 were often faster than AA with constant relaxation with $\beta=1$ or $\beta=0.5$, but did not match the speed of AAmd.

Since AA can be made efficient by using QR decomposition and Givens rotations to solve the internal linear optimization problem, the fastest algorithms were clearly those requiring the fewest mappings to converge. This could change for applications with truly negligible mapping computation times. However, for such applications, lighter algorithms that do not require solving any linear systems, such as SQUAREM [22] or ACX [13], would likely be faster than AA.
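For readers unfamiliar with this internal problem, here is a minimal sketch of stationary type-II AA in which the least-squares coefficients are obtained from a QR factorization. It is a simplification and not the paper's implementation: the QR factors are recomputed from scratch at every iteration instead of being updated with Givens rotations, and the function name, default values, and linear test problem are assumptions chosen for illustration.

```python
import numpy as np

def anderson(g, x0, m=5, beta=1.0, tol=1e-8, max_maps=10_000):
    """Type-II Anderson acceleration with constant relaxation beta.

    Returns the final iterate and the number of maps (calls to g) used.
    """
    x = np.asarray(x0, dtype=float)
    f = g(x) - x                      # residual f(x) = g(x) - x
    maps = 1
    dX, dF = [], []                   # last m differences of iterates and residuals
    while np.linalg.norm(f) > tol and maps < max_maps:
        if dF:
            X = np.column_stack(dX)
            F = np.column_stack(dF)
            Q, R = np.linalg.qr(F)                 # thin QR: F = Q R
            gamma = np.linalg.solve(R, Q.T @ f)    # argmin ||f - F gamma||
            x_new = x + beta * f - (X + beta * F) @ gamma
        else:
            x_new = x + beta * f                   # plain relaxed step
        f_new = g(x_new) - x_new
        maps += 1
        dX.append(x_new - x)
        dF.append(f_new - f)
        if len(dF) > m:                            # keep at most m columns
            dX.pop(0)
            dF.pop(0)
        x, f = x_new, f_new
    return x, maps

# Illustrative linear contraction g(x) = A x + b with spectral radius 0.9.
n = 50
A = np.diag(np.linspace(0.1, 0.9, n))
b = np.ones(n)
g = lambda x: A @ x + b
x, maps = anderson(g, np.zeros(n), m=5)
x_star = np.linalg.solve(np.eye(n) - A, b)
```

Each pass through the loop costs one map plus a small dense factorization of an n-by-m matrix, so when the map dominates the run time the total cost is driven by the number of maps, as argued above.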

6 Conclusion

Two adaptive relaxation schemes for Anderson acceleration have been proposed for convergent fixed-point applications. Both are demonstrated to improve Anderson acceleration's convergence for a linear contraction mapping.

The first scheme, AAopt1, uses two extra maps to compute a locally optimal relaxation parameter. Convergence is accelerated by reusing the same maps a second time to compute the next iterate. Furthermore, by reusing the same relaxation parameter over multiple iterations, AAopt1_T always outperforms AAopt1 in terms of speed, though AA with a constant, well-chosen relaxation parameter can still be faster.

The second proposed scheme, AAmd, requires fewer iterations to converge while needing minimal extra calculation. As a result, it outperformed all other AA specifications across all tests in terms of computation time. Interestingly, AAmd's adaptive relaxation parameters are frequently above one, an unexpected result in the context of the AA literature that warrants further investigation.

References

  • Alexander et al., [2009] David H. Alexander, John Novembre, and Kenneth Lange. Fast model-based estimation of ancestry in unrelated individuals. Genome Research, 19:1655–1664, July 2009.
  • Anderson, [1965] Donald G. M. Anderson. Iterative procedures for nonlinear integral equations. Journal of the Association for Computing Machinery, 12(4):547–560, October 1965.
  • Anderson, [2019] Donald G. M. Anderson. Comments on “Anderson acceleration, mixing and extrapolation”. Numerical Algorithms, 80:135–234, 2019.
  • Bezanson et al., [2017] Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B. Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98, 2017.
  • Chen and Vuik, [2022] Kewang Chen and Cornelis Vuik. Composite Anderson acceleration method with two window sizes and optimized damping. International Journal for Numerical Methods in Engineering, 123(23):5964–5985, August 2022.
  • Chen and Vuik, [2024] Kewang Chen and Cornelis Vuik. Non-stationary Anderson acceleration with optimized damping. Journal of Computational and Applied Mathematics, 451, June 2024.
  • Dempster et al., [1977] Arthur P. Dempster, Nan N. Laird, and Donald B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 39(1):1–38, 1977.
  • Dolan and Moré, [2002] Elizabeth D. Dolan and Jorge J. Moré. Benchmarking optimization software with performance profiles. Mathematical Programming, 91:201–213, 2002.
  • Evans et al., [2020] Claire Evans, Sara Pollock, Leo G. Rebholz, and Mengying Xiao. A proof that Anderson acceleration improves the convergence rate in linearly converging fixed-point methods (but not in those converging quadratically). SIAM Journal on Numerical Analysis, 58(1):788–810, 2020.
  • Fang and Saad, [2009] Haw-ren Fang and Yousef Saad. Two classes of multisecant methods for nonlinear acceleration. Numerical Linear Algebra with Applications, 16:197–221, 2009.
  • Henderson and Varadhan, [2019] Nicholas C. Henderson and Ravi Varadhan. Damped Anderson acceleration with restarts and monotonicity control for accelerating EM and EM-like algorithms. Journal of Computational and Graphical Statistics, May 2019.
  • Jin et al., [2024] Jiachen Jin, Hongxia Wang, and Kangkang Deng. Anderson acceleration of derivative-free projection methods for constrained monotone nonlinear equations, 2024.
  • Lepage-Saucier, [2024] Nicolas Lepage-Saucier. Alternating cyclic vector extrapolation technique for accelerating nonlinear optimization algorithms and fixed-point mapping applications. Journal of Computational and Applied Mathematics, 439, March 2024.
  • Pollock and Rebholz, [2021] Sara Pollock and Leo G. Rebholz. Anderson acceleration for contractive and noncontractive operators. IMA Journal of Numerical Analysis, 41:2841–2872, January 2021.
  • Pollock and Rebholz, [2023] Sara Pollock and Leo G. Rebholz. Filtering for Anderson acceleration. SIAM Journal on Scientific Computing, 45(4):A1571–A1590, 2023.
  • Potra and Engler, [2013] Florian A. Potra and Hans Engler. A characterization of the behavior of the Anderson acceleration on linear problems. Linear Algebra and its Applications, 438:1002–1011, November 2013.
  • Pratapa and Suryanarayana, [2015] Phanisri P. Pratapa and Phanish Suryanarayana. Restarted Pulay mixing for efficient and robust acceleration of fixed-point iterations. Chemical Physics Letters, 635:69–74, 2015.
  • Pulay, [1980] Peter Pulay. Convergence acceleration of iterative sequences. The case of SCF iteration. Chemical Physics Letters, 73(2):393–398, July 1980.
  • Raydan and Svaiter, [2002] Marcos Raydan and Benar F. Svaiter. Relaxed steepest descent and Cauchy-Barzilai-Borwein method. Computational Optimization and Applications, 21:155–167, 2002.
  • Saad and Schultz, [1986] Youcef Saad and Martin H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 7(3):856–869, July 1986.
  • Tang et al., [2023] Bohao Tang, Nicholas C. Henderson, and Ravi Varadhan. Accelerating fixed-point algorithms in statistics and data science: A state-of-art review. Journal of Data Science, 21(1):1–26, July 2023.
  • Varadhan and Roland, [2008] Ravi Varadhan and Christophe Roland. Simple and globally convergent methods for accelerating the convergence of any EM algorithm. Scandinavian Journal of Statistics, 35:335–353, 2008.
  • Walker and Ni, [2011] Homer F. Walker and Peng Ni. Anderson acceleration for fixed-point iterations. SIAM Journal on Numerical Analysis, 49(4):1715–1735, 2011.
  • Wang et al., [2016] Lianming Wang, Christopher S. McMahan, Michael G. Hudgens, and Zaina P. Qureshi. A flexible, computationally efficient method for fitting the proportional hazards model to interval-censored data. Biometrics, 72:222–231, March 2016.
  • Warnock, [2021] Robert Warnock. Equilibrium of an arbitrary bunch train with cavity resonators and short range wake: Enhanced iterative solution with Anderson acceleration. Physical Review Accelerators and Beams, 2021.