Incremental Gauss–Newton Methods with
Superlinear Convergence Rates

Zhiling Zhou    Zhuanghua Liu    Chengchang Liu    Luo Luo School of Data Science, Fudan University; [email protected] of Computer Science, National University of Singapore; [email protected] of Computer Science and Engineering, The Chinese University of Hong Kong; [email protected] of Data Science, Fudan University; [email protected]
Abstract

This paper addresses the challenge of solving large-scale nonlinear equations with Hölder continuous Jacobians. We introduce a novel Incremental Gauss–Newton (IGN) method with an explicit superlinear convergence rate, which outperforms existing methods that only achieve a linear convergence rate. In particular, we formulate the problem as nonlinear least squares with a finite-sum structure, and our method iterates incrementally, using the information of one component in each round. We also provide a mini-batch extension of our IGN method that obtains an even faster superlinear convergence rate. Furthermore, we conduct numerical experiments to show the advantages of the proposed methods.

1 Introduction

We study the problem of solving the system of nonlinear equations

$$\mathbf{f}(\mathbf{x})=\mathbf{0}, \tag{1}$$

where the nonlinear vector function $\mathbf{f}:\mathbb{R}^d\to\mathbb{R}^n$ is Lipschitz continuous and its Jacobian is Hölder continuous. This formulation is a fundamental problem in scientific computing [38], and it arises in a large number of applications, including machine learning [15, 10, 4], control systems [7], data assimilation [45], and game theory [20, 40].

Newton-type methods [16, 25, 26, 6, 39, 46] are widely used for solving nonlinear equations. The classical Newton’s method uses the curvature information in the Jacobian to obtain a local quadratic convergence rate [39], but it suffers from the expensive cost of computing the Jacobian and its (pseudo-)inverse. Several lines of research focus on approximating Newton’s method with inexact Jacobians. For example, quasi-Newton methods [11, 29, 1] estimate the Jacobian via secant equations, leading to iteration schemes that only need function values and Jacobian-vector products. Explicit local superlinear convergence rates of these methods have been established in recent years [30, 31, 32, 49]. Another line of work [51, 50, 42] introduces matrix sketching techniques [48] to reduce the dimension of the Jacobian matrix, which improves the computational efficiency per iteration. The superiority of their local convergence depends on the structure of the Jacobian in the specific problem. Although quasi-Newton and sketched Newton methods benefit from inexact Jacobians, they still require access to the full nonlinear vector function value at every iteration.

For large-scale nonlinear equations, we are interested in methods that do not require computing full function values and Jacobians. In particular, Bertsekas [8] proposed a variant of the Gauss–Newton (GN) method following the Extended Kalman Filter (EKF) framework [3, 34, 5], which incrementally accesses partial information of the vector function values and the corresponding Jacobian during the iterations. Subsequently, Moriyama et al. [36] incorporated a stepsize into the EKF method, which guarantees a global linear convergence rate under the gradient-growth condition [23]. In the past decade, incremental (quasi-)Newton methods with local superlinear convergence rates have been established for strongly convex optimization [43, 44, 35, 28, 33] (viewed as methods for solving nonlinear equations, these methods require the additional assumption that the Jacobian is symmetric positive-definite). However, the superiority of local convergence for incremental Newton-type methods in solving general nonlinear equations is still unclear.

In this work, we propose an incremental Gauss–Newton (IGN) method for solving systems of nonlinear equations. Our method only requires access to one component of the nonlinear vector function and its gradient per iteration. We maintain an aggregated vector and an aggregated matrix, updated incrementally, to estimate the vector function value and its Jacobian. We also introduce a Gram matrix with a low-rank update to reduce the computational cost of the matrix inverse in vanilla Gauss–Newton methods. The theoretical analysis shows that our IGN method enjoys an explicit local superlinear convergence rate for nonlinear equations with Hölder continuous Jacobians. Furthermore, we provide a variant of IGN that uses the information of a mini-batch of components, which achieves an even faster superlinear convergence rate. Numerical experiments on real-world applications validate the advantages of the proposed methods.

Paper Organization

In Section 2, we formalize the notations and assumptions for our problem. In Section 3, we propose our incremental Gauss–Newton (IGN) method and present its convergence analysis. In Section 4, we extend the IGN method with the mini-batch update to obtain an even faster convergence rate. In Section 5, we provide a discussion to compare the proposed method with related works. In Section 6, we conduct numerical experiments to show the advantages of our methods. We conclude our work in Section 7.

2 Preliminaries

In this section, we formalize the notations and assumptions throughout this paper.

2.1 Notations

We let $[n]=\{1,\dots,n\}$ and use the notation $t\%n$ to denote the remainder of $t$ divided by $n$. We denote by $\mathbf{e}_i\in\mathbb{R}^n$ the $i$-th standard basis vector of the $n$-dimensional Euclidean space for all $i\in[n]$. We use $\|\cdot\|$ to denote the Euclidean norm of a vector and the spectral norm of a matrix. Moreover, we use $\sigma_{\min}(\cdot)$ to denote the smallest singular value of a matrix.

For the system of nonlinear equations (1), we partition the vector function $\mathbf{f}:\mathbb{R}^d\to\mathbb{R}^n$ at $\mathbf{x}\in\mathbb{R}^d$ as $\mathbf{f}(\mathbf{x})=[f_1(\mathbf{x}),\dots,f_n(\mathbf{x})]^\top\in\mathbb{R}^n$, where $f_i:\mathbb{R}^d\to\mathbb{R}$. We also denote the gradient of the component $f_i(\cdot)$ at $\mathbf{x}\in\mathbb{R}^d$ as $\mathbf{g}_i(\mathbf{x})=\nabla f_i(\mathbf{x})$, and we organize the corresponding Jacobian as $\mathbf{J}(\mathbf{x})=[\mathbf{g}_1(\mathbf{x}),\cdots,\mathbf{g}_n(\mathbf{x})]^\top\in\mathbb{R}^{n\times d}$.

2.2 Assumptions

Throughout this paper, we suppose the function $\mathbf{f}:\mathbb{R}^d\to\mathbb{R}^n$ satisfies the following assumptions.

Assumption 1.

We suppose the vector function $\mathbf{f}:\mathbb{R}^d\to\mathbb{R}^n$ is Lipschitz continuous, i.e., there exists a constant $L_f>0$ such that

$$\|\mathbf{f}(\mathbf{x})-\mathbf{f}(\mathbf{y})\|\le L_f\|\mathbf{x}-\mathbf{y}\| \tag{2}$$

for all $\mathbf{x},\mathbf{y}\in\mathbb{R}^d$.

Assumption 2.

We suppose the Jacobian $\mathbf{J}:\mathbb{R}^d\to\mathbb{R}^{n\times d}$ is $\nu$-Hölder continuous for some $\nu\in(0,1]$, i.e., there exists a constant $\mathcal{H}_\nu>0$ such that

$$\|\mathbf{J}(\mathbf{x})-\mathbf{J}(\mathbf{y})\|\le\mathcal{H}_\nu\|\mathbf{x}-\mathbf{y}\|^\nu \tag{3}$$

for all $\mathbf{x},\mathbf{y}\in\mathbb{R}^d$.

Assumption 3.

The system of nonlinear equations (1) satisfies $n\ge d$ and has a non-degenerate solution $\mathbf{x}^*\in\mathbb{R}^d$, i.e., there exists some $\mu>0$ such that

$$\mu=\sigma_{\min}(\mathbf{J}(\mathbf{x}^*))>0. \tag{4}$$

Note that most existing works [44, 51, 32, 9, 8, 36, 23] assume a Lipschitz continuous Jacobian, which is the special case of our Assumption 2 with $\nu=1$.
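To make the gap between Assumption 2 with $\nu<1$ and the Lipschitz case $\nu=1$ concrete, the following sketch checks a one-dimensional toy map numerically. The map $f(x)=|x|^{1+\nu}$, the value `NU = 0.5`, and the helper names `jac` and `holder_ratio` are illustrative assumptions and do not come from the paper; this toy map is also not globally Lipschitz, so it illustrates Assumption 2 only.

```python
import math

NU = 0.5  # Hölder exponent, chosen only for illustration


def jac(x):
    """Derivative of the scalar toy map f(x) = |x|**(1 + NU)."""
    return (1.0 + NU) * math.copysign(abs(x) ** NU, x)


def holder_ratio(x, y, nu):
    """|J(x) - J(y)| / |x - y|**nu, the quantity bounded in (3)."""
    return abs(jac(x) - jac(y)) / abs(x - y) ** nu


# With exponent NU the ratio stays bounded over a grid of point pairs.
grid = [k / 100.0 for k in range(-200, 201)]
max_holder = max(holder_ratio(x, y, NU) for x in grid for y in grid if x != y)
print("largest Hölder ratio on the grid:", max_holder)

# With exponent 1 (the Lipschitz case) the ratio blows up near the origin.
for h in (1e-2, 1e-4, 1e-6):
    print(h, holder_ratio(h, 0.0, NU), holder_ratio(h, 0.0, 1.0))
```

For this toy derivative the ratio with exponent `NU` never exceeds $(1+\nu)2^{1-\nu}$, while the ratio with exponent 1 grows like $h^{-1/2}$ as $h\to 0$.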

3 The Incremental Gauss–Newton Method

Algorithm 1 Incremental Gauss–Newton Method (IGN)
1: Input: $\mathbf{x}^0\in\mathbb{R}^d$, $\mathbf{u}^0\in\mathbb{R}^d$, $\mathbf{H}^0,\mathbf{G}^0\in\mathbb{R}^{d\times d}$
2: for $t=0,1,\dots$
3: $\mathbf{x}^{t+1}=\mathbf{G}^t\mathbf{u}^t$
4: $i_t=t\%n+1$
5: $\mathbf{U}^t=\big[-\mathbf{g}_{i_t}(\mathbf{z}_{i_t}^t),~~\mathbf{g}_{i_t}(\mathbf{x}^{t+1})\big]$
6: $\mathbf{V}^t=\big[\mathbf{g}_{i_t}(\mathbf{z}_{i_t}^t),~~\mathbf{g}_{i_t}(\mathbf{x}^{t+1})\big]$
7: $\mathbf{u}^{t+1}=\mathbf{u}^t-\left(\mathbf{g}_{i_t}(\mathbf{z}_{i_t}^t)^\top\mathbf{z}_{i_t}^t-f_{i_t}(\mathbf{z}_{i_t}^t)\right)\mathbf{g}_{i_t}(\mathbf{z}_{i_t}^t)+\left(\mathbf{g}_{i_t}(\mathbf{x}^{t+1})^\top\mathbf{x}^{t+1}-f_{i_t}(\mathbf{x}^{t+1})\right)\mathbf{g}_{i_t}(\mathbf{x}^{t+1})$
8: $\mathbf{H}^{t+1}=\mathbf{H}^t-\mathbf{g}_{i_t}(\mathbf{z}_{i_t}^t)\mathbf{g}_{i_t}(\mathbf{z}_{i_t}^t)^\top+\mathbf{g}_{i_t}(\mathbf{x}^{t+1})\mathbf{g}_{i_t}(\mathbf{x}^{t+1})^\top$
9: $\mathbf{G}^{t+1}=\mathbf{G}^t-\mathbf{G}^t\mathbf{U}^t\left(\mathbf{I}+(\mathbf{V}^t)^\top\mathbf{G}^t\mathbf{U}^t\right)^{-1}(\mathbf{V}^t)^\top\mathbf{G}^t$
10: $\mathbf{z}_i^{t+1}=\begin{cases}\mathbf{x}^{t+1},&\text{if }i=i_t\\ \mathbf{z}_i^t,&\text{otherwise}\end{cases}$
11:end for
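The steps of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`dot`, `matmul`, `inv2`, `ign`), the three-component test system with root $(1,2)$, the starting point, and the initialization $\mathbf{z}_i^0=\mathbf{x}^0$ with $\mathbf{G}^0=(\sum_i\mathbf{g}_i(\mathbf{x}^0)\mathbf{g}_i(\mathbf{x}^0)^\top)^{-1}$ are all choices made for this example. The matrix $\mathbf{H}^t$ is never formed inside the loop; only $\mathbf{G}^t$ is updated, and the matrix inverted in line 9 is always $2\times 2$.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def matmul(A, B):
    return [[dot(row, col) for col in zip(*B)] for row in A]


def inv2(M):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def ign(fs, grads, x0, G0, num_iters):
    """Sketch of Algorithm 1 with z_i^0 = x^0 for every component (assumption)."""
    n, d = len(fs), len(x0)
    zs = [list(x0) for _ in range(n)]
    u = [0.0] * d  # u^0 = sum_i (g_i(x^0)^T x^0 - f_i(x^0)) g_i(x^0)
    for f, grad in zip(fs, grads):
        g = grad(x0)
        c = dot(g, x0) - f(x0)
        u = [u_r + c * g_r for u_r, g_r in zip(u, g)]
    G = [row[:] for row in G0]
    x = list(x0)
    for t in range(num_iters):
        x = [dot(row, u) for row in G]            # line 3: x^{t+1} = G^t u^t
        i = t % n                                 # line 4, with 0-based indexing
        g_old, g_new = grads[i](zs[i]), grads[i](x)
        c_old = dot(g_old, zs[i]) - fs[i](zs[i])
        c_new = dot(g_new, x) - fs[i](x)
        u = [u_r - c_old * a + c_new * b          # line 7
             for u_r, a, b in zip(u, g_old, g_new)]
        U = [[-a, b] for a, b in zip(g_old, g_new)]   # line 5, shape d x 2
        V = [[a, b] for a, b in zip(g_old, g_new)]    # line 6, shape d x 2
        Vt = [list(col) for col in zip(*V)]
        GU, VtG = matmul(G, U), matmul(Vt, G)
        inner = matmul(Vt, GU)                    # V^T G U, shape 2 x 2
        inner[0][0] += 1.0
        inner[1][1] += 1.0
        corr = matmul(matmul(GU, inv2(inner)), VtG)
        G = [[G[r][s] - corr[r][s] for s in range(d)] for r in range(d)]  # line 9
        zs[i] = x                                 # line 10
    return x


# Hypothetical test system with common root x* = (1, 2).
fs = [lambda x: x[0] ** 2 + x[1] - 3.0,
      lambda x: x[0] + x[1] ** 2 - 5.0,
      lambda x: x[0] * x[1] - 2.0]
grads = [lambda x: [2.0 * x[0], 1.0],
         lambda x: [1.0, 2.0 * x[1]],
         lambda x: [x[1], x[0]]]
x0 = [1.2, 1.8]
J0 = [g(x0) for g in grads]                        # rows are gradients at x^0
H0 = matmul([list(col) for col in zip(*J0)], J0)   # Gram matrix J0^T J0
x_hat = ign(fs, grads, x0, inv2(H0), num_iters=30)
print(x_hat)
```

Each iteration evaluates a single component and its gradient at two points and costs $O(d^2)$ arithmetic operations, since the only inverse taken is of a $2\times 2$ matrix.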

In this section, we propose the Incremental Gauss–Newton (IGN) method and provide its explicit superlinear convergence rate.

3.1 The Algorithm

We first introduce the intuition of our algorithm design. Solving the system of nonlinear equations (1) can be regarded as minimizing the norm of the nonlinear vector function $\mathbf{f}:\mathbb{R}^d\to\mathbb{R}^n$, which means we can reformulate the problem as the following nonlinear least squares minimization problem

$$\min_{\mathbf{x}\in\mathbb{R}^d}\phi(\mathbf{x})\triangleq\frac{1}{2}\sum_{i=1}^n(f_i(\mathbf{x}))^2. \tag{5}$$

For each component $f_i:\mathbb{R}^d\to\mathbb{R}$, we consider its linear approximation

$$f_i(\mathbf{x})\approx f_i(\mathbf{z}_i^t)+\mathbf{g}_i(\mathbf{z}_i^t)^\top(\mathbf{x}-\mathbf{z}_i^t), \tag{6}$$

where $\mathbf{z}_i^t\in\mathbb{R}^d$ is some point related to component $f_i$ at the $t$-th iteration. The estimation (6) motivates us to construct the surrogate problem for the nonlinear least squares (5) as follows

$$\min_{\mathbf{x}\in\mathbb{R}^d}\psi(\mathbf{x})\triangleq\sum_{i=1}^n\psi_i(\mathbf{x}),\qquad\text{where}~\psi_i(\mathbf{x})\triangleq\frac{1}{2}\left\|f_i(\mathbf{z}_i^t)+\mathbf{g}_i(\mathbf{z}_i^t)^\top(\mathbf{x}-\mathbf{z}_i^t)\right\|^2. \tag{7}$$

Since each $\psi_i$ is a convex quadratic function, problem (7) has the closed-form solution

$$\mathbf{x}^{t+1}=\left(\sum_{i=1}^n\mathbf{g}_i(\mathbf{z}_i^t)\mathbf{g}_i(\mathbf{z}_i^t)^\top\right)^{-1}\sum_{i=1}^n\left(\mathbf{g}_i(\mathbf{z}_i^t)^\top\mathbf{z}_i^t-f_i(\mathbf{z}_i^t)\right)\mathbf{g}_i(\mathbf{z}_i^t). \tag{8}$$

In this subsection, we assume the matrix $\sum_{i=1}^n\mathbf{g}_i(\mathbf{z}_i^t)\mathbf{g}_i(\mathbf{z}_i^t)^\top$ is always non-singular, which will be verified under our assumptions in the later analysis.
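As a sanity check of the closed form (8), the sketch below evaluates it for $d=2$. The function name `surrogate_minimizer`, the affine test components, and the anchor points `zs` are assumptions made for illustration. For affine components $f_i(\mathbf{x})=\mathbf{a}_i^\top\mathbf{x}-b_i$ the linearization (6) is exact, so (8) should return the common root no matter where the points $\mathbf{z}_i^t$ lie.

```python
def surrogate_minimizer(fs, grads, zs):
    """Closed-form minimizer (8) of the surrogate (7), written out for d = 2."""
    H = [[0.0, 0.0], [0.0, 0.0]]  # sum_i g_i(z_i) g_i(z_i)^T
    u = [0.0, 0.0]                # sum_i (g_i(z_i)^T z_i - f_i(z_i)) g_i(z_i)
    for f, grad, z in zip(fs, grads, zs):
        g = grad(z)
        c = g[0] * z[0] + g[1] * z[1] - f(z)
        for r in range(2):
            u[r] += c * g[r]
            for s in range(2):
                H[r][s] += g[r] * g[s]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    if abs(det) < 1e-12:
        raise ValueError("Gram matrix is numerically singular")
    # x = H^{-1} u using the explicit inverse of a 2x2 matrix
    return [(H[1][1] * u[0] - H[0][1] * u[1]) / det,
            (H[0][0] * u[1] - H[1][0] * u[0]) / det]


# Affine components f_i(x) = a_i^T x - b_i with common root x = (1, 2).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]
fs = [lambda x, a=a, bi=bi: a[0] * x[0] + a[1] * x[1] - bi for a, bi in zip(A, b)]
grads = [lambda x, a=a: list(a) for a in A]
zs = [[0.3, -0.7], [5.0, 1.0], [-2.0, 4.0]]  # arbitrary, distinct anchor points
x_next = surrogate_minimizer(fs, grads, zs)
print(x_next)
```

When all points $\mathbf{z}_i^t$ coincide with a single iterate $\mathbf{x}^t$, the same formula reduces to the classical Gauss–Newton step $\mathbf{x}^t-(\mathbf{J}^\top\mathbf{J})^{-1}\mathbf{J}^\top\mathbf{f}(\mathbf{x}^t)$ with $\mathbf{J}=\mathbf{J}(\mathbf{x}^t)$.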

We propose the Incremental Gauss–Newton (IGN) method by performing update (8) at the $t$-th iteration. It is worth noting that we can take advantage of the inherent finite-sum structure in formulation (5) to establish incremental methods. Specifically, we update one of $\{\mathbf{z}_i^t\}_{i=1}^n$ at each iteration in a cyclic fashion, that is

$$\mathbf{z}_i^{t+1}=\begin{cases}\mathbf{x}^{t+1},&\text{if }i=i_t,\\ \mathbf{z}_i^t,&\text{otherwise},\end{cases} \tag{9}$$

where $i_t=t\%n+1$. This indicates that we only need to address the terms associated with the point $\mathbf{z}_{i_t}^t$ in update (8), which can be implemented by introducing the aggregated variables

$$\mathbf{u}^t=\sum_{i=1}^n\left(\mathbf{g}_i(\mathbf{z}_i^t)^\top\mathbf{z}_i^t-f_i(\mathbf{z}_i^t)\right)\mathbf{g}_i(\mathbf{z}_i^t),\qquad\mathbf{H}^t=\sum_{i=1}^n\mathbf{g}_i(\mathbf{z}_i^t)\mathbf{g}_i(\mathbf{z}_i^t)^\top\qquad\text{and}\qquad\mathbf{G}^t=\left(\mathbf{H}^t\right)^{-1}. \tag{10}$$

Then we can write update (8) as

$$\mathbf{x}^{t+1}=\mathbf{G}^t\mathbf{u}^t \tag{11}$$

and maintain the aggregated variables by the following recursions (there is no need to explicitly construct the matrix $\mathbf{H}^t$ in an implementation, but this matrix is useful for understanding and analyzing our method):

$$
\begin{cases}
\mathbf{u}^{t+1}=\mathbf{u}^{t}-\left(\mathbf{g}_{i_t}(\mathbf{z}_{i_t}^{t})^{\top}\mathbf{z}_{i_t}^{t}-f_{i_t}(\mathbf{z}_{i_t}^{t})\right)\mathbf{g}_{i_t}(\mathbf{z}_{i_t}^{t})+\left(\mathbf{g}_{i_t}(\mathbf{x}^{t+1})^{\top}\mathbf{x}^{t+1}-f_{i_t}(\mathbf{x}^{t+1})\right)\mathbf{g}_{i_t}(\mathbf{x}^{t+1}),\\[4pt]
\mathbf{H}^{t+1}=\mathbf{H}^{t}-\mathbf{g}_{i_t}(\mathbf{z}_{i_t}^{t})\mathbf{g}_{i_t}(\mathbf{z}_{i_t}^{t})^{\top}+\mathbf{g}_{i_t}(\mathbf{x}^{t+1})\mathbf{g}_{i_t}(\mathbf{x}^{t+1})^{\top},\\[4pt]
\mathbf{G}^{t+1}=\mathbf{G}^{t}-\mathbf{G}^{t}\mathbf{U}^{t}\left(\mathbf{I}+(\mathbf{V}^{t})^{\top}\mathbf{G}^{t}\mathbf{U}^{t}\right)^{-1}(\mathbf{V}^{t})^{\top}\mathbf{G}^{t},
\end{cases} \tag{12}
$$

where the last recursion follows from the Sherman–Morrison–Woodbury formula [47] with the definitions

$$
\mathbf{U}^{t}\triangleq\big[-\mathbf{g}_{i_t}(\mathbf{z}_{i_t}^{t}),~~\mathbf{g}_{i_t}(\mathbf{x}^{t+1})\big]\in\mathbb{R}^{d\times 2}\qquad\text{and}\qquad \mathbf{V}^{t}\triangleq\big[\mathbf{g}_{i_t}(\mathbf{z}_{i_t}^{t}),~~\mathbf{g}_{i_t}(\mathbf{x}^{t+1})\big]\in\mathbb{R}^{d\times 2}. \tag{13}
$$
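The recursions (11)–(13) can be sketched in NumPy as follows. This is only an illustrative sketch, not the authors' implementation: the function name `ign_step`, the toy components `f` and `g`, and the problem sizes are assumptions made for this example. In line with the remark above, the matrix $\mathbf{H}^{t}$ is never formed inside the step; only $\mathbf{G}^{t}$ and $\mathbf{u}^{t}$ are carried along.

```python
import numpy as np

def ign_step(G, u, g_old, f_old, z_old, f_i, g_i):
    """One IGN round for the selected component i_t (hypothetical helper).

    G, u         : aggregated variables G^t (d x d) and u^t (d,)
    g_old, f_old : stored gradient g_i(z_i^t) and value f_i(z_i^t)
    z_old        : stored point z_i^t
    f_i, g_i     : callables returning f_i(x) and its gradient g_i(x)
    """
    x_new = G @ u                                    # update (11)
    g_new, f_new = g_i(x_new), f_i(x_new)
    # First recursion in (12): swap the old contribution for the new one.
    u_new = (u - (g_old @ z_old - f_old) * g_old
               + (g_new @ x_new - f_new) * g_new)
    # Rank-two factors from (13), so that H^{t+1} = H^t + U V^T.
    U = np.column_stack([-g_old, g_new])
    V = np.column_stack([g_old, g_new])
    # Sherman-Morrison-Woodbury: only a 2 x 2 system is solved.
    GU = G @ U
    core = np.eye(2) + V.T @ GU
    G_new = G - GU @ np.linalg.solve(core, V.T @ G)
    return x_new, u_new, G_new, g_new, f_new

# Toy problem (an assumption for illustration): f_i(x) = a_i^T x - b_i + 0.1 sin(a_i^T x).
rng = np.random.default_rng(0)
n, d = 8, 3
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)
f = lambda i, x: A[i] @ x - b[i] + 0.1 * np.sin(A[i] @ x)
g = lambda i, x: A[i] * (1.0 + 0.1 * np.cos(A[i] @ x))

x0 = np.zeros(d)
Z = np.tile(x0, (n, 1))                              # z_i^0 = x^0 for all i
grads = np.array([g(i, x0) for i in range(n)])
vals = np.array([f(i, x0) for i in range(n)])
G = np.linalg.inv(grads.T @ grads)                   # G^0 = (H^0)^{-1}
u = grads.T @ (np.einsum("ij,ij->i", grads, Z) - vals)

# One round with i_t = 1 (index 0 in Python), then refresh the stored quantities.
x1, u, G, g_new, f_new = ign_step(G, u, grads[0], vals[0], Z[0],
                                  lambda x: f(0, x), lambda x: g(0, x))
grads[0], vals[0], Z[0] = g_new, f_new, x1
```

Each call costs a few matrix-vector products with the $d\times d$ matrix `G` plus a $2\times 2$ solve, which matches the $\mathcal{O}(d^{2})$ per-round count stated in the text.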

Since each of the matrices $\mathbf{U}^{t}$ and $\mathbf{V}^{t}$ contains only two columns, updating the variables $\mathbf{x}^{t+1}$, $\mathbf{u}^{t+1}$ and $\mathbf{G}^{t+1}$ can be implemented within $\mathcal{O}(d^{2})$ flops. Additionally, the memory cost for maintaining the variables $\{\mathbf{z}_{i}^{t}\}_{i=1}^{n}$, $\{\mathbf{g}_{i}(\mathbf{z}_{i}^{t})\}_{i=1}^{n}$, $\mathbf{u}^{t}$ and $\mathbf{G}^{t}$ is $\mathcal{O}(nd+d^{2})$. As a comparison, the vanilla Gauss–Newton (GN) method [6, 39] performs the iteration

$$
\mathbf{x}^{t+1}=\mathbf{x}^{t}-\left(\mathbf{J}(\mathbf{x}^{t})^{\top}\mathbf{J}(\mathbf{x}^{t})\right)^{-1}\mathbf{J}(\mathbf{x}^{t})^{\top}\mathbf{f}(\mathbf{x}^{t}), \tag{14}
$$

which takes a computation cost of $\mathcal{O}(nd+d^{3})$ flops and a memory cost of $\mathcal{O}(nd+d^{2})$.

We summarize the procedure of our IGN method in Algorithm 1. Observe that the vanilla GN iteration (14) can be reformulated as

$$
\mathbf{x}^{t+1}=\left(\mathbf{J}(\mathbf{x}^{t})^{\top}\mathbf{J}(\mathbf{x}^{t})\right)^{-1}\mathbf{J}(\mathbf{x}^{t})^{\top}\left(\mathbf{J}(\mathbf{x}^{t})\mathbf{x}^{t}-\mathbf{f}(\mathbf{x}^{t})\right). \tag{15}
$$

Comparing our updates (7)–(11) with (15), the aggregated variables $\mathbf{u}^{t}$, $\mathbf{H}^{t}$ and $\mathbf{G}^{t}$ can be regarded as estimators of the terms $\mathbf{J}(\mathbf{x}^{t})^{\top}(\mathbf{J}(\mathbf{x}^{t})\mathbf{x}^{t}-\mathbf{f}(\mathbf{x}^{t}))$, $\mathbf{J}(\mathbf{x}^{t})^{\top}\mathbf{J}(\mathbf{x}^{t})$ and $(\mathbf{J}(\mathbf{x}^{t})^{\top}\mathbf{J}(\mathbf{x}^{t}))^{-1}$ in scheme (15), respectively. The efficiency of our IGN method comes from using a different point $\mathbf{z}_{i}^{t}$ in the linear approximation (6) for each component $f_{i}$. In contrast, the vanilla GN method is based on the linear approximation at the identical point $\mathbf{x}^{t}$ for all components.
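The relation between the aggregated form and the vanilla GN step can be checked numerically. The sketch below assumes that the rows of the Jacobian are the component gradients $\mathbf{g}_{i}^{\top}$ (consistent with $\mathbf{H}^{t}$ estimating $\mathbf{J}^{\top}\mathbf{J}$); the function names and the random test data are illustrative assumptions. When every stored point equals the current iterate, the aggregated step reproduces iteration (14).

```python
import numpy as np

def gauss_newton_step(J, fx, x):
    """Vanilla GN iteration (14): x - (J^T J)^{-1} J^T f(x)."""
    return x - np.linalg.solve(J.T @ J, J.T @ fx)

def aggregated_step(grads, vals, Z):
    """x^{t+1} = G^t u^t with u^t and H^t assembled as in (10).

    grads[i] = g_i(z_i), vals[i] = f_i(z_i), Z[i] = z_i.
    """
    coeff = np.einsum("ij,ij->i", grads, Z) - vals   # g_i(z_i)^T z_i - f_i(z_i)
    u = grads.T @ coeff
    H = grads.T @ grads
    return np.linalg.solve(H, u)

# Random stand-ins for J(x^t) and f(x^t); only the algebra is being checked.
rng = np.random.default_rng(1)
n, d = 7, 3
J = rng.standard_normal((n, d))
fx = rng.standard_normal(n)
x = rng.standard_normal(d)

Z = np.tile(x, (n, 1))            # every z_i coincides with x^t
x_gn = gauss_newton_step(J, fx, x)
x_agg = aggregated_step(J, fx, Z)
```

Here `x_gn` and `x_agg` agree up to rounding, which is the reformulation (15). IGN differs only in that the rows of `Z` are refreshed one at a time.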

3.2 The Convergence Analysis

In this subsection, we establish the local superlinear convergence of the proposed IGN method.

We start our analysis from the following proposition, which shows the non-singularity of the Gram matrix associated with the exact Jacobian at the non-degenerate solution $\mathbf{x}^{*}\in\mathbb{R}^{d}$.

Proposition 1.

Under Assumption 3, it holds that

$$
\sigma_{\min}\left(\mathbf{J}(\mathbf{x}^{*})^{\top}\mathbf{J}(\mathbf{x}^{*})\right)=\mu^{2}>0. \tag{16}
$$

Under the continuity assumptions on $\mathbf{f}(\cdot)$ and $\mathbf{J}(\cdot)$, we can establish the Hölder continuity of the Gram matrices.

Lemma 1.

Under Assumptions 1 and 2, we have

$$
\left\|\mathbf{J}(\mathbf{y})^{\top}\mathbf{J}(\mathbf{y})-\mathbf{J}(\mathbf{x})^{\top}\mathbf{J}(\mathbf{x})\right\|\leq 2L_{f}\mathcal{H}_{\nu}\left\|\mathbf{y}-\mathbf{x}\right\|^{\nu}
$$

and

$$
\left\|\mathbf{g}_{i}(\mathbf{y})\mathbf{g}_{i}(\mathbf{y})^{\top}-\mathbf{g}_{i}(\mathbf{x})\mathbf{g}_{i}(\mathbf{x})^{\top}\right\|\leq 2L_{f}\mathcal{H}_{\nu}\left\|\mathbf{y}-\mathbf{x}\right\|^{\nu}
$$

for all $\mathbf{x},\mathbf{y}\in\mathbb{R}^{d}$ and $i\in[n]$.
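A short argument for the first bound can be sketched as follows. It assumes that Assumption 1 yields $\|\mathbf{J}(\mathbf{x})\|\leq L_{f}$ and that Assumption 2 reads $\|\mathbf{J}(\mathbf{y})-\mathbf{J}(\mathbf{x})\|\leq\mathcal{H}_{\nu}\|\mathbf{y}-\mathbf{x}\|^{\nu}$; these forms are inferred from the statement of the lemma rather than quoted from the assumptions themselves.

$$
\begin{aligned}
\left\|\mathbf{J}(\mathbf{y})^{\top}\mathbf{J}(\mathbf{y})-\mathbf{J}(\mathbf{x})^{\top}\mathbf{J}(\mathbf{x})\right\|
&=\left\|\mathbf{J}(\mathbf{y})^{\top}\big(\mathbf{J}(\mathbf{y})-\mathbf{J}(\mathbf{x})\big)+\big(\mathbf{J}(\mathbf{y})-\mathbf{J}(\mathbf{x})\big)^{\top}\mathbf{J}(\mathbf{x})\right\|\\
&\leq\left\|\mathbf{J}(\mathbf{y})\right\|\left\|\mathbf{J}(\mathbf{y})-\mathbf{J}(\mathbf{x})\right\|+\left\|\mathbf{J}(\mathbf{y})-\mathbf{J}(\mathbf{x})\right\|\left\|\mathbf{J}(\mathbf{x})\right\|\\
&\leq 2L_{f}\mathcal{H}_{\nu}\left\|\mathbf{y}-\mathbf{x}\right\|^{\nu}.
\end{aligned}
$$

The second bound would follow by the same splitting applied to $\mathbf{g}_{i}$, since $\mathbf{g}_{i}(\mathbf{x})^{\top}$ is a row of $\mathbf{J}(\mathbf{x})$ and the norm of a row never exceeds the spectral norm of the matrix.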

Recall that the design of the IGN method is motivated by the estimation $\mathbf{H}^{t}\approx\mathbf{J}(\mathbf{x}^{t})^{\top}\mathbf{J}(\mathbf{x}^{t})$, which indicates that we can combine Proposition 1 and Lemma 1 to bound the spectrum of $\mathbf{H}^{t}$ as follows.

Lemma 2.

Under Assumptions 1, 2 and 3, running IGN (Algorithm 1) with $\mathbf{H}^{0}=\mathbf{J}(\mathbf{x}^{0})^{\top}\mathbf{J}(\mathbf{x}^{0})$ and $\mathbf{G}^{0}=(\mathbf{H}^{0})^{-1}$ guarantees that

$$
\sigma_{\min}(\mathbf{H}^{t})\geq\mu^{2}-2L_{f}\mathcal{H}_{\nu}\sum_{i=1}^{n}\left\|\mathbf{z}_{i}^{t}-\mathbf{x}^{*}\right\|^{\nu}
$$

for all $t\geq 0$.

Lemma 2 indicates that if all of the points $\mathbf{z}_{1}^{t},\dots,\mathbf{z}_{n}^{t}$ are sufficiently close to the solution $\mathbf{x}^{*}$, the matrix $\mathbf{H}^{t}$ is positive-definite, which guarantees that the inverse of $\mathbf{H}^{t+1}$ (i.e., the matrix $\mathbf{G}^{t+1}$) in the algorithm is always well-defined. Based on this intuition, we use induction to show the positive-definiteness of the matrices $\mathbf{H}^{t}$ and $\mathbf{I}+(\mathbf{V}^{t})^{\top}\mathbf{G}^{t}\mathbf{U}^{t}$, and the local superlinear convergence rate of the proposed method.

Theorem 1.

Under Assumptions 1, 2 and 3, suppose we run IGN (Algorithm 1) with initialization $\mathbf{x}^{0}\in\mathbb{R}^{d}$, $\mathbf{H}^{0}=\mathbf{J}(\mathbf{x}^{0})^{\top}\mathbf{J}(\mathbf{x}^{0})$ and $\mathbf{G}^{0}=(\mathbf{H}^{0})^{-1}$ such that

$$
\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|\leq\left(\frac{\mu^{2}}{4L_{f}\mathcal{H}_{\nu}n}\right)^{1/\nu}.
$$

Then we have $\mathbf{H}^{t}\succeq(\mu^{2}/2)\mathbf{I}$ and $\sigma_{\min}(\mathbf{I}+(\mathbf{V}^{t})^{\top}\mathbf{G}^{t}\mathbf{U}^{t})>0$ for all $t\geq 0$. Additionally, there exists a sequence $\{r_{t}\}$ such that $\left\|\mathbf{x}^{t}-\mathbf{x}^{*}\right\|\leq r_{t}$ and

$$
r_{t+1}\leq c^{(1+\nu)^{\left(\lfloor t/n\rfloor-1\right)}}r_{t}\qquad\text{with}\qquad c=1-\frac{1}{n}\left(1-\left(\frac{1}{2(1+\nu)}\right)^{(1+\nu)}\right)
$$

for all $t\geq n$.

Since the constant $c$ in Theorem 1 is monotonically decreasing with respect to $\nu\in(0,1]$, we can bound it by $1-15/(16n)\leq c<1-1/(2n)$ and simplify the superlinear convergence rate as follows.

Corollary 1.

Under the settings and notations of Theorem 1, we have

$$
r_{t+1}<\Big(1-\frac{1}{2n}\Big)^{(1+\nu)^{\left(\lfloor t/n\rfloor-1\right)}}r_{t}
$$

for all $t\geq n$.
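The bounds on $c$ and the per-iteration factor in Theorem 1 are easy to evaluate numerically. The following sketch uses the hypothetical helper names `contraction_base` and `rate_factor`; it only evaluates the formulas stated above and does not run the algorithm.

```python
def contraction_base(nu, n):
    """The constant c of Theorem 1 for Hölder exponent nu and n components."""
    return 1.0 - (1.0 - (1.0 / (2.0 * (1.0 + nu))) ** (1.0 + nu)) / n

def rate_factor(t, nu, n):
    """Factor c^{(1+nu)^(floor(t/n)-1)} bounding r_{t+1} / r_t (stated for t >= n)."""
    return contraction_base(nu, n) ** ((1.0 + nu) ** (t // n - 1))

n = 10
lower, upper = 1.0 - 15.0 / (16.0 * n), 1.0 - 1.0 / (2.0 * n)
for nu in (0.1, 0.5, 1.0):
    c = contraction_base(nu, n)
    # c lies in [1 - 15/(16n), 1 - 1/(2n)), with the lower end reached at nu = 1.
    print(f"nu={nu}: c={c:.5f}, within bounds: {lower - 1e-12 <= c < upper}")

# The factor shrinks from one epoch (n rounds) to the next.
print([round(rate_factor(t, 0.5, n), 5) for t in (n, 2 * n, 3 * n)])
```

The exponent $(1+\nu)^{\lfloor t/n\rfloor-1}$ grows geometrically with the epoch index, so the factor decays doubly exponentially in the number of passes, which is the sense in which the rate is superlinear.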

Theorem 1 also indicates that a larger $\nu\in(0,1]$ leads to a faster superlinear convergence rate. In the case of $\nu=1$, our Hölder continuity condition (Assumption 2) reduces to Lipschitz continuity, and we achieve the $n$-step local quadratic convergence rate as follows.

Corollary 2.

Under the settings and notations of Theorem 1 with $\nu=1$, we have the $n$-step quadratic convergence

$$
r_{t}\leq\frac{1}{4}r_{t-n}^{2}
$$

for all $t\geq n$.

4 The Extension to Mini-Batch Methods

We can also improve the efficiency of the IGN method by using mini-batch updates. Specifically, we consider the mini-batch size $k$ and divide the indices into $m=\lceil n/k\rceil$ non-overlapping subsets, i.e., we partition the index set $[n]=\{1,\dots,n\}$ into subsets $\{\mathcal{S}_{1},\dots,\mathcal{S}_{m}\}$ such that $|\mathcal{S}_{1}|=\dots=|\mathcal{S}_{m-1}|=k$, $\cup_{i=1}^{m}\mathcal{S}_{i}=[n]$ and $\mathcal{S}_{i}\cap\mathcal{S}_{j}=\emptyset$ for all distinct $i,j\in[m]$.

The mini-batch variant of IGN also applies the update of the form $\mathbf{x}^{t+1}=\mathbf{G}^{t}\mathbf{u}^{t}$. Different from IGN, we update the variables $\{\mathbf{z}_{i}^{t}\}_{i=1}^{m}$ with the smaller period $m=\lceil n/k\rceil$ such that

$$
\mathbf{z}_{i}^{t+1}=\begin{cases}\mathbf{x}^{t+1},&\text{if } i=i_{t},\\ \mathbf{z}_{i}^{t},&\text{otherwise},\end{cases} \tag{17}
$$

where $i_{t}=t\,\%\,m+1$.
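The partition and the cyclic selection rule can be illustrated with a few lines of Python. The choice of consecutive blocks is an assumption made for this sketch (the text only requires a disjoint partition in which the first $m-1$ blocks have size $k$), and the function names are hypothetical.

```python
def partition_indices(n, k):
    """Split [n] = {1, ..., n} into m = ceil(n / k) consecutive, disjoint blocks."""
    m = -(-n // k)  # integer ceiling of n / k
    return [list(range(j * k + 1, min((j + 1) * k, n) + 1)) for j in range(m)]

def selected_block(t, m):
    """Cyclic rule i_t = t % m + 1 (1-based, as in the text)."""
    return t % m + 1

blocks = partition_indices(10, 4)
print(blocks)                                           # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
print([selected_block(t, len(blocks)) for t in range(5)])  # [1, 2, 3, 1, 2]
```

Only the last block may be smaller than $k$, and every block is revisited once every $m$ rounds instead of once every $n$ rounds.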

We establish the recursions of the aggregated variables in a mini-batch manner as follows:

$$
\begin{cases}
\displaystyle \mathbf{u}^{t+1}=\mathbf{u}^{t}-\sum_{j\in\mathcal{S}_{i_t}}\left(\mathbf{g}_{j}(\mathbf{z}_{i_t}^{t})^{\top}\mathbf{z}_{i_t}^{t}-f_{j}(\mathbf{z}_{i_t}^{t})\right)\mathbf{g}_{j}(\mathbf{z}_{i_t}^{t})+\sum_{j\in\mathcal{S}_{i_t}}\left(\mathbf{g}_{j}(\mathbf{x}^{t+1})^{\top}\mathbf{x}^{t+1}-f_{j}(\mathbf{x}^{t+1})\right)\mathbf{g}_{j}(\mathbf{x}^{t+1}),\\[10pt]
\displaystyle \mathbf{H}^{t+1}=\mathbf{H}^{t}-\sum_{j\in\mathcal{S}_{i_t}}\mathbf{g}_{j}(\mathbf{z}_{i_t}^{t})\mathbf{g}_{j}(\mathbf{z}_{i_t}^{t})^{\top}+\sum_{j\in\mathcal{S}_{i_t}}\mathbf{g}_{j}(\mathbf{x}^{t+1})\mathbf{g}_{j}(\mathbf{x}^{t+1})^{\top},\\[10pt]
\displaystyle \mathbf{G}^{t+1}=\mathbf{G}^{t}-\mathbf{G}^{t}\mathbf{U}^{t}\left(\mathbf{I}+(\mathbf{V}^{t})^{\top}\mathbf{G}^{t}\mathbf{U}^{t}\right)^{-1}(\mathbf{V}^{t})^{\top}\mathbf{G}^{t},
\end{cases} \tag{18}
$$

where we construct matrices $\mathbf{U}^t,\mathbf{V}^t\in\mathbb{R}^{d\times 2|\mathcal{S}_{i_t}|}$ as

$$
\begin{cases}
\mathbf{U}^t=\Big[-\mathbf{g}_{j_1}(\mathbf{z}_{i_t}^t),~~\mathbf{g}_{j_1}(\mathbf{x}^{t+1}),~\cdots~,~-\mathbf{g}_{j_{|\mathcal{S}_{i_t}|}}(\mathbf{z}_{i_t}^t),~~\mathbf{g}_{j_{|\mathcal{S}_{i_t}|}}(\mathbf{x}^{t+1})\Big],\\[5.69046pt]
\mathbf{V}^t=\Big[\mathbf{g}_{j_1}(\mathbf{z}_{i_t}^t),~~\mathbf{g}_{j_1}(\mathbf{x}^{t+1}),~\cdots~,~\mathbf{g}_{j_{|\mathcal{S}_{i_t}|}}(\mathbf{z}_{i_t}^t),~~\mathbf{g}_{j_{|\mathcal{S}_{i_t}|}}(\mathbf{x}^{t+1})\Big],
\end{cases}
$$

and indices $j_1,\dots,j_{|\mathcal{S}_{i_t}|}$ are the elements in subset $\mathcal{S}_{i_t}$ such that $|\mathcal{S}_{i_t}|\leq k$.

We formally present the procedure of the Mini-Batch Incremental Gauss–Newton (MB-IGN) method in Algorithm 2 (see Appendix A). The memory cost of MB-IGN is $\mathcal{O}(nd+d^2)$, matching that of IGN. Each iteration of MB-IGN involves matrix multiplications among $\mathbf{G}^t$, $\mathbf{U}^t$ and $\mathbf{V}^t$, which cost $\mathcal{O}(kd^2)$ flops. It is worth noting that the mini-batch update in MB-IGN can be efficiently implemented by block matrix operations that take advantage of parallel computation [14].
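The update of $\mathbf{G}^{t+1}$ in (18) is an instance of the Woodbury identity applied to $\mathbf{H}^{t+1}=\mathbf{H}^t+\mathbf{U}^t(\mathbf{V}^t)^\top$. The following NumPy sketch illustrates this on random data; the function names `build_U_V` and `woodbury_update` and the toy dimensions are assumptions made for illustration and do not come from Algorithm 2.

```python
import numpy as np

def build_U_V(g_old, g_new):
    """Interleave gradients as columns of U and V.

    g_old, g_new: arrays of shape (d, s) holding g_j(z) and g_j(x_new)
    for the s indices of the current mini-batch.
    """
    d, s = g_old.shape
    U = np.empty((d, 2 * s))
    V = np.empty((d, 2 * s))
    U[:, 0::2] = -g_old   # columns -g_j(z)
    U[:, 1::2] = g_new    # columns  g_j(x_new)
    V[:, 0::2] = g_old
    V[:, 1::2] = g_new
    return U, V

def woodbury_update(G, U, V):
    """Return (H + U V^T)^{-1} given G = H^{-1}."""
    m = U.shape[1]                      # m = 2s columns
    GU = G @ U                          # d x m product, O(m d^2)
    VtG = V.T @ G                       # m x d product, O(m d^2)
    core = np.eye(m) + V.T @ GU         # small m x m system
    return G - GU @ np.linalg.solve(core, VtG)

# Toy check (hypothetical sizes): replace s gradients and compare with a direct inverse.
rng = np.random.default_rng(0)
d, n, s = 6, 10, 2
J_old = rng.standard_normal((n, d))     # rows play the role of g_i(z_i)^T
J_new = J_old.copy()
J_new[:s] = rng.standard_normal((s, d)) # refreshed gradients of the mini-batch
U, V = build_U_V(J_old[:s].T, J_new[:s].T)
G_new = woodbury_update(np.linalg.inv(J_old.T @ J_old), U, V)
err = np.abs(G_new - np.linalg.inv(J_new.T @ J_new)).max()
```

Only the small $2|\mathcal{S}_{i_t}|\times 2|\mathcal{S}_{i_t}|$ system is solved, so the dominant cost is the two products with $\mathbf{G}^t$, in line with the $\mathcal{O}(kd^2)$ flop count stated above.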

Formally, we present the following convergence results of MB-IGN.

Theorem 2.

Under Assumptions 1, 2 and 3, running MB-IGN (Algorithm 2) with mini-batch size $k$ and initialization $\mathbf{x}^0\in\mathbb{R}^d$, $\mathbf{H}^0=\mathbf{J}(\mathbf{x}^0)^\top\mathbf{J}(\mathbf{x}^0)$ and $\mathbf{G}^0=(\mathbf{H}^0)^{-1}$ such that

$$
\left\|\mathbf{x}^0-\mathbf{x}^*\right\|\leq\left(\frac{\mu^2}{4kL_f\mathcal{H}_\nu\lceil n/k\rceil}\right)^{1/\nu},
$$

we have $\mathbf{H}^t\succeq(\mu^2/2)\mathbf{I}$ and $\sigma_{\min}(\mathbf{I}+(\mathbf{V}^t)^\top\mathbf{G}^t\mathbf{U}^t)>0$ for all $t\geq 0$. Additionally, there exists a sequence $\{r_t\}$ such that $\left\|\mathbf{x}^t-\mathbf{x}^*\right\|\leq r_t$ and it holds

$$
r_{t+1}\leq c^{(1+\nu)^{\left(\left\lfloor\frac{t}{\lceil n/k\rceil}\right\rfloor-1\right)}}r_t\qquad\text{with}\qquad c=1-\frac{1}{\lceil n/k\rceil}\left(1-\left(\frac{1}{2(1+\nu)}\right)^{(1+\nu)}\right).
$$

The terms $n/k$ in Theorem 2 imply that increasing the mini-batch size $k$ speeds up the convergence of MB-IGN. Additionally, the convergence of MB-IGN matches that of IGN if we take $k=1$.
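To make the dependence on $k$ concrete, the bound of Theorem 2 can be evaluated numerically. The sketch below iterates the recursion $r_{t+1}\leq c^{(1+\nu)^{(\lfloor t/\lceil n/k\rceil\rfloor-1)}}r_t$ with equality and counts iterations until the relative bound $r_t/r_0$ drops below a tolerance; the function names, the tolerance and the example sizes are illustrative assumptions, and the output describes the upper bound only, not measured performance.

```python
import math

def contraction_constant(n, k, nu):
    """Constant c from Theorem 2 for n components, mini-batch size k, exponent nu."""
    m = math.ceil(n / k)
    return 1.0 - (1.0 - (1.0 / (2.0 * (1.0 + nu))) ** (1.0 + nu)) / m

def iterations_to_tolerance(n, k, nu, tol=1e-12, max_iter=10**6):
    """Iterations until the bound on r_t / r_0 falls below tol."""
    m = math.ceil(n / k)
    c = contraction_constant(n, k, nu)
    ratio, t = 1.0, 0
    while ratio > tol and t < max_iter:
        exponent = (1.0 + nu) ** (t // m - 1)   # changes once every m iterations
        ratio *= c ** exponent
        t += 1
    return t

# Hypothetical example: n = 1000 components, Lipschitz Jacobian (nu = 1).
counts = {k: iterations_to_tolerance(1000, k, 1.0) for k in (1, 10, 100, 1000)}
```

The counts are in iterations; since one MB-IGN iteration costs $\mathcal{O}(kd^2)$ flops, they should be weighed against the per-iteration cost when choosing $k$.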

Similar to the discussion in Section 3.2, we have the following corollary for MB-IGN method.

Corollary 3.

Under settings of Theorem 2, we have

$$
r_{t+1}\leq\Big(1-\frac{1}{2\lceil n/k\rceil}\Big)^{(1+\nu)^{\left(\left\lfloor\frac{t}{\lceil n/k\rceil}\right\rfloor-1\right)}}r_t
$$

for all $t\geq\lceil n/k\rceil$. In the case of $\nu=1$, we have the $\lceil n/k\rceil$-step quadratic convergence

$$
r_t\leq\frac{1}{4}r_{t-\lceil n/k\rceil}^2
$$

for all $t\geq\lceil n/k\rceil$.

Specifically, Corollary 3 indicates that the MB-IGN method with $k=n$ has the quadratic convergence under the assumption of Lipschitz continuous Jacobian (Assumption 2 with $\nu=1$), which matches the rate of vanilla Gauss–Newton method.
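For completeness, the first bound of Corollary 3 can be obtained from Theorem 2 by a short estimate of the constant $c$. The following is a sketch of that step, assuming only $\nu>0$ and using the shorthand $m\triangleq\lceil n/k\rceil$, which is introduced here for brevity:

$$
\begin{aligned}
\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}\leq\frac{1}{2(1+\nu)}\leq\frac{1}{2}
\quad&\Longrightarrow\quad
c=1-\frac{1}{m}\left(1-\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}\right)\leq 1-\frac{1}{2m},\\
(1+\nu)^{\lfloor t/m\rfloor-1}>0
\quad&\Longrightarrow\quad
c^{(1+\nu)^{\lfloor t/m\rfloor-1}}\leq\Big(1-\frac{1}{2m}\Big)^{(1+\nu)^{\lfloor t/m\rfloor-1}}.
\end{aligned}
$$

The first implication uses that a number in $(0,1)$ does not increase when raised to a power of at least one; the second uses that $x\mapsto x^{a}$ is nondecreasing on $(0,1)$ for any $a>0$.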

5 Related Work

Table 1: Comparison of the per-iteration computational complexity, memory cost, convergence rate and Jacobian assumption of the proposed methods and the baselines. The rightmost column indicates which components $f_i$ a method accesses at each iteration: GN, SNR, GN-BFGS, BGB and BBB require access to all of the components $f_1,\dots,f_n$, while the other methods only require access to one component or a mini-batch of components.

| Methods | Computation | Memory | Convergence | Jacobian | $f_i$ |
|---|---|---|---|---|---|
| GN [6, 39] | $\mathcal{O}(nd^2+d^3)$ | $\mathcal{O}(nd+d^2)$ | quadratic | Lipschitz | all |
| SNR [51] ♯ | $\mathcal{O}(n\tau^2+\tau^3)$ | $\mathcal{O}(\tau d)$ | sublinear | Lipschitz | all |
| GN-BFGS [29] ‡ | $\mathcal{O}(d^2)$ | $\mathcal{O}(d^2)$ | asymptotic superlinear | Hölder | all |
| BGB [32] § | $\mathcal{O}(\tilde{k}d^2)$ | $\mathcal{O}(d^2)$ | $\mathcal{O}\big((1-\tilde{k}/d)^{t(t-1)/4}\big)$ | Lipschitz | all |
| BBB [32] § | $\mathcal{O}(\tilde{k}d^2)$ | $\mathcal{O}(d^2)$ | $\mathcal{O}\big((1-\tilde{k}/(\varkappa d))^{t(t-1)/4}\big)$ | Lipschitz | all |
| EKF [8] | $\mathcal{O}(d^2)$ | $\mathcal{O}(d^2)$ | sublinear | Lipschitz | one or mini-batch |
| EKF-S [36, 23] | $\mathcal{O}(d^2)$ | $\mathcal{O}(d^2)$ | linear | Lipschitz | one or mini-batch |
| IGN (this work) | $\mathcal{O}(d^2)$ | $\mathcal{O}(nd+d^2)$ | $\mathcal{O}\big((1-1/(2n))^{(1+\nu)^{\lfloor t/n\rfloor}}\big)$ | Hölder | one or mini-batch |
| MB-IGN (this work) | $\mathcal{O}(kd^2)$ | $\mathcal{O}(nd+d^2)$ | $\mathcal{O}\big((1-k/(2n))^{(1+\nu)^{\lfloor kt/n\rfloor}}\big)$ | Hölder | one or mini-batch |

  • ♯ The SNR method requires star convexity in its minimization formulation. The notation $\tau$ denotes the sketch size.

  • ‡ The GN-BFGS method requires $n=d$ and a symmetric Jacobian.

  • § The BGB and BBB methods require $n=d$. The notation $\tilde{k}$ is the rank of the modification matrix and $\varkappa\triangleq L_f/\mu$ is the condition number.

We compare the theoretical results of the proposed IGN and MB-IGN with existing methods in Table 1.

The methods including Gauss–Newton-based BFGS (GN-BFGS) [29], Block Good Broyden’s method (BGB) [32], Block Bad Broyden’s method (BBB) [32] and Sketched Newton–Raphson (SNR) [51] only focus on establishing the Jacobian estimator, while each of their iterations depends on accessing all components of the nonlinear vector function, which is expensive for large-scale problems. In addition, the quasi-Newton methods including GN-BFGS [29], BGB [32] and BBB [32] only work for the scenario of $n=d$. The SNR method enjoys an efficient update for large $n$, while it lacks the local superlinear convergence of classical Newton-type methods.

The Extended Kalman Filter with Stepsize (EKF-S) [36, 23] is based on the incremental update that only accesses one component (or a mini-batch of components) and the corresponding gradient at each iteration. Concretely, the EKF-S method performs the iteration

$$
\mathbf{x}^{t+1}=\mathbf{x}^t-\alpha^t(\tilde{\mathbf{H}}^t)^{-1}\mathbf{g}_{i_t}(\mathbf{x}^t)f_{i_t}(\mathbf{x}^t)
$$

with some stepsize $\alpha^t>0$, where $\tilde{\mathbf{H}}^t\in\mathbb{R}^{d\times d}$ is the estimator for the Gram matrix $\mathbf{J}(\mathbf{x}^t)^\top\mathbf{J}(\mathbf{x}^t)$, which is constructed by the recursion

$$
\tilde{\mathbf{H}}^{t+1}=\lambda^t\tilde{\mathbf{H}}^t+\mathbf{g}_{i_t}(\mathbf{x}^{t+1})\mathbf{g}_{i_t}(\mathbf{x}^{t+1})^\top \tag{19}
$$

for some $\lambda^t\in(0,1]$. The original Extended Kalman Filter method (EKF) [8] takes a fixed stepsize of $\alpha^t=1$ in the above iteration and achieves a sublinear convergence rate. Later, Gürbüzbalaban et al. [23] showed that introducing the adaptive stepsize can achieve the linear convergence rate. Note that EKF-S and EKF do not explicitly reuse the gradient $\mathbf{g}_{i_t}(\mathbf{x}^t)$ in later iterations. In other words, the recursion (19) indicates that all information of the historical gradients is heuristically compressed into the term $\lambda^t\tilde{\mathbf{H}}^t$.
In contrast, the proposed IGN method establishes the Gram matrix approximation $\mathbf{H}^t\approx\mathbf{J}(\mathbf{x}^t)^\top\mathbf{J}(\mathbf{x}^t)$ by equations (10) and (12), which clearly corresponds to the linear approximation (6)-(7) by reusing all of the historical gradients $\{\mathbf{g}_i(\mathbf{z}_i^t)\}_{i=1}^n$. This strategy encourages a more accurate Gram matrix estimation in our method and leads to a superlinear convergence rate.
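The EKF-S iteration and the recursion (19) can be written in a few lines of NumPy. The sketch below follows the two displayed formulas literally for a single component; the function name `ekf_s_step` and its callable arguments are hypothetical, and the stepsize and forgetting-factor schedules analyzed in [36, 23] are not reproduced here.

```python
import numpy as np

def ekf_s_step(x, H, f_i, g_i, alpha, lam):
    """One EKF-S iteration for the selected component.

    x:     current iterate, shape (d,)
    H:     current Gram matrix estimator, shape (d, d)
    f_i:   callable returning the scalar component value f_i(x)
    g_i:   callable returning the gradient of f_i at x, shape (d,)
    alpha: stepsize (> 0); lam: forgetting factor in (0, 1]
    """
    # x^{t+1} = x^t - alpha * H^{-1} g_i(x^t) f_i(x^t)
    x_new = x - alpha * np.linalg.solve(H, g_i(x)) * f_i(x)
    # Recursion (19): the history is kept only through lam * H
    g_new = g_i(x_new)
    H_new = lam * H + np.outer(g_new, g_new)
    return x_new, H_new
```

Unlike the IGN update, no per-component gradient table is stored: once $\lambda^t\tilde{\mathbf{H}}^t$ is formed, the contribution of an individual past gradient can no longer be removed or replaced.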

The incremental Newton-type methods have also been studied in finite-sum strongly convex optimization [43, 44, 35, 28, 33]. In the view of our formulation (1), these works consider solving the system of nonlinear equations of the form $\mathbf{f}(\mathbf{x})=\mathbf{0}$, where $\mathbf{f}:\mathbb{R}^d\to\mathbb{R}^d$ is the gradient of some objective function and has the finite-sum structure $\mathbf{f}(\mathbf{x})\triangleq(1/N)\sum_{i=1}^N\mathbf{f}_i(\mathbf{x})$ with symmetric positive-definite Jacobian. These methods can achieve superlinear convergence rates by accessing one of $\{\mathbf{f}_i\}_{i=1}^N$ and its Jacobian at each iteration. However, their iterations have to maintain Jacobians for all of the individuals $\{\mathbf{f}_i\}_{i=1}^N$ with a memory cost of $\mathcal{O}(Nd^2)$, which is prohibitive for a large $N$.

6 Experiments

We conduct numerical experiments on the following applications:

  • Regularized Logistic Regression: We consider training the binary classifier $\mathbf{x}\in\mathbb{R}^d$ by solving the nonconvex regularized logistic regression problem [2, 27]

    $$
    \min_{\mathbf{x}\in\mathbb{R}^d}\ell(\mathbf{x})\triangleq\frac{1}{N}\sum_{j=1}^N\log(1+\exp(-b_j\mathbf{a}_j^\top\mathbf{x}))+\theta\sum_{k=1}^d\frac{\nu x_k^2}{1+\nu x_k^2},
    $$

    where $\{(\mathbf{a}_j,b_j)\}_{j=1}^N$ is the training set such that $\mathbf{a}_j\in\mathbb{R}^d$ and $b_j\in\{-1,1\}$ for all $j\in[N]$. We set $\theta=10^{-2}$ and $\nu=1$ for the model. We cast the above minimization problem as the nonlinear equations (1) with $\mathbf{f}(\mathbf{x})\triangleq\nabla\ell(\mathbf{x})$. We perform the experiments on the dataset “DBWorld” ($N=64$ and $d=4{,}702$) [19] for this problem.

  • Chandrasekhar’s H-Equation: We consider the Chandrasekhar’s H-equation, which is widely used in analytical radiative transfer theory [24, 13]. It can be formulated by problem (1) with

    $$
    f_i(\mathbf{x})=x_i-\left(1-\frac{c}{2n}\sum_{j=1}^n\frac{\mu_ix_j}{\mu_i+\mu_j}\right)^{-1}~\text{for all}~i\in[n],\quad\text{where}\quad\mu_i=\frac{i-1/2}{n}.
    $$

    We set $d=2{,}000$ and $c=1-10^{-5}$ for this problem in our experiments.

  • Soft Maximum Minimization: We consider the soft maximum minimization problem [37, 12]

    $$
    \min_{\mathbf{x}\in\mathbb{R}^d}h(\mathbf{x})\triangleq\mu\ln\left(\sum_{i=1}^N\exp\left(\frac{\langle\mathbf{a}_i,\mathbf{x}\rangle-b_i}{\mu}\right)\right)+\frac{\lambda}{2}\left\|\mathbf{x}\right\|^2, \tag{20}
    $$

    which can be formulated by problem (1) with $\mathbf{f}(\mathbf{x})\triangleq\nabla h(\mathbf{x})$. We follow the setting of [17, 18] by generating the entries of $\mathbf{a}_1,\cdots,\mathbf{a}_N\in\mathbb{R}^d$ and $\mathbf{b}\in\mathbb{R}^N$ randomly and independently from the uniform distribution on $[-1,1]$. We set $N=2000$, $d=2000$, $\mu=5$ and $\lambda=2$ in our experiments.
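As an illustration of how one of these problems supplies the components $f_i$ and gradients $\mathbf{g}_i$ that incremental methods consume, the following NumPy sketch evaluates the Chandrasekhar H-equation residual and its Jacobian, whose $i$-th row is $\mathbf{g}_i(\mathbf{x})^\top$. The function name `h_equation` and the dense construction of the kernel matrix are assumptions made for illustration; the Jacobian formula is obtained by differentiating the displayed $f_i$ and is not stated in the text.

```python
import numpy as np

def h_equation(x, c):
    """Residual f(x) and Jacobian J(x) of the discretized H-equation.

    x: vector of length n; c: scalar parameter of the equation.
    Row i of J is the gradient g_i(x) of the i-th component.
    """
    n = x.size
    mu = (np.arange(1, n + 1) - 0.5) / n                   # mu_i = (i - 1/2) / n
    # A[i, j] = (c / (2n)) * mu_i / (mu_i + mu_j)
    A = (c / (2.0 * n)) * mu[:, None] / (mu[:, None] + mu[None, :])
    s = 1.0 - A @ x                                        # term inside the inverse
    f = x - 1.0 / s
    # d f_i / d x_j = delta_ij - A[i, j] / s_i^2
    J = np.eye(n) - A / (s ** 2)[:, None]
    return f, J
```

For the problem size used in the experiments the dense $n\times n$ matrix `A` fits easily in memory; an incremental method would only need row `i` of `A` to evaluate $f_i$ and $\mathbf{g}_i$.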

We first investigate the impact of the mini-batch size $k$ on the performance of the MB-IGN method (Algorithm 2). We run MB-IGN with different mini-batch sizes on the three problems and present the empirical results for time (s) against $\|\mathbf{f}(\mathbf{x})\|$ in Figure 1, where the setting $k=1$ corresponds to our IGN method (Algorithm 1). We can observe that the mini-batch update is effective in reducing the time cost. The mini-batch sizes of $500$, $200$, and $100$ achieve the best performance on the problems of regularized logistic regression, Chandrasekhar’s H-equation, and soft maximum minimization, respectively.

We then compare the proposed MB-IGN method (Algorithm 2) with the baseline methods SNR [51], EKF-S [8, 36], BGB [32] and BBB [32]. We present the empirical results for the number of epochs against $\|{\bf f}({\bf x})\|$ in Figure 2, where one epoch means one complete pass over all components of the nonlinear vector function. We observe that the proposed MB-IGN and the baseline method BGB outperform the others on all problems. This is reasonable since only these two methods enjoy explicit condition-number-free superlinear convergence rates (see Table 1). The superlinear convergence rate of the BBB method depends on the condition number, so it does not always perform better than the linearly convergent method EKF-S.

We also present the empirical results for the time cost (s) against $\|{\bf f}({\bf x})\|$ in Figure 3. We observe that the proposed MB-IGN always performs significantly better than all baseline methods. This is in line with our expectations because only our MB-IGN method enjoys both a superlinear convergence rate and a cheap iteration cost. Although the BGB method needs a number of epochs comparable to that of our MB-IGN on Chandrasekhar’s H-equation, each of its iterations accesses all components, which makes its time cost expensive.

Figure 1: Experimental results of time (s) vs. $\|{\bf f}({\bf x})\|$ for MB-IGN with different mini-batch sizes $k$. Panels: (a) Robust Logistic Regression, (b) Chandrasekhar’s H-Equation, (c) Soft Maximum Minimization.

Figure 2: Experimental results of epochs vs. $\|{\bf f}({\bf x})\|$ for all methods. Panels: (a) Robust Logistic Regression, (b) Chandrasekhar’s H-Equation, (c) Soft Maximum Minimization.

Figure 3: Experimental results of time (s) vs. $\|{\bf f}({\bf x})\|$ for all methods. Panels: (a) Robust Logistic Regression, (b) Chandrasekhar’s H-Equation, (c) Soft Maximum Minimization.

7 Conclusion

In this work, we propose the incremental Gauss–Newton (IGN) method for solving systems of nonlinear equations. We design the algorithm by tracking the historical gradients of all components to establish estimators of the Gram matrix and its inverse. The theoretical analysis shows that IGN enjoys an explicit superlinear convergence rate under the assumption of a Hölder continuous Jacobian. We also provide a mini-batch extension of our IGN method (MB-IGN) and show that it has an even faster superlinear convergence rate. The numerical experiments on regularized logistic regression, Chandrasekhar’s H-equation, and soft maximum minimization validate the advantage of the proposed methods over existing baselines.

In the future, it will be interesting to study the incremental Gauss–Newton method to solve nonlinear equations in the distributed setting. It is also possible to design incremental quasi-Newton methods for solving the general nonlinear equations.

References

  • Al-Baali et al. [2014] Mehiddin Al-Baali, Emilio Spedicato, and Francesca Maggioni. Broyden’s quasi-Newton methods for a nonlinear system of equations and unconstrained optimization: a review and open problems. Optimization Methods and Software, 29(5):937–954, 2014.
  • Antoniadis et al. [2011] Anestis Antoniadis, Irène Gijbels, and Mila Nikolova. Penalized likelihood regression for generalized linear models with non-quadratic penalties. Annals of the Institute of Statistical Mathematics, 63:585–615, 2011.
  • Athans et al. [1968] Michael Athans, Richard Wishner, and Anthony Bertolini. Suboptimal state estimation for continuous-time nonlinear systems from discrete noisy measurements. IEEE Transactions on Automatic Control, 13(5):504–514, 1968.
  • Bai et al. [2019] Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. Deep equilibrium models. Advances in Neural Information Processing Systems, 2019.
  • Bell [1994] Bradley M. Bell. The iterated Kalman smoother as a Gauss–Newton method. SIAM Journal on Optimization, 4(3):626–636, 1994.
  • Ben-Israel [1966] Adi Ben-Israel. A Newton–Raphson method for the solution of systems of equations. Journal of Mathematical analysis and applications, 15(2):243–252, 1966.
  • Berthier et al. [2021] Eloïse Berthier, Justin Carpentier, and Francis Bach. Fast and robust stability region estimation for nonlinear dynamical systems. In European Control Conference, 2021.
  • Bertsekas [1996] Dimitri P. Bertsekas. Incremental least squares methods and the extended Kalman filter. SIAM Journal on Optimization, 6(3):807–822, 1996.
  • Bertsekas [1997] Dimitri P. Bertsekas. A new class of incremental gradient methods for least squares problems. SIAM Journal on Optimization, 7(4):913–926, 1997.
  • Botev et al. [2017] Aleksandar Botev, Hippolyt Ritter, and David Barber. Practical Gauss–Newton optimisation for deep learning. In International Conference on Machine Learning, 2017.
  • Broyden [1965] Charles G Broyden. A class of methods for solving nonlinear simultaneous equations. Mathematics of computation, 19(92):577–593, 1965.
  • Bullins [2020] Brian Bullins. Highly smooth minimization of non-smooth problems. In Conference on Learning Theory, 2020.
  • Chandrasekhar [1960] Subrahmanyan Chandrasekhar. Radiative transfer. Courier Corporation, 1960.
  • Davis [1998] Timothy A. Davis. Block matrix methods: Taking advantage of high-performance computers. Technical report, Computer and Information Sciences Department, 1998.
  • Défossez and Bach [2015] Alexandre Défossez and Francis Bach. Averaged least-mean-squares: Bias-variance trade-offs and optimal sampling distributions. In International Conference on Artificial Intelligence and Statistics, 2015.
  • Dennis Jr and Schnabel [1996] John E. Dennis Jr and Robert B. Schnabel. Numerical methods for unconstrained optimization and nonlinear equations. SIAM, 1996.
  • Doikov et al. [2023] Nikita Doikov, El Mahdi Chayti, and Martin Jaggi. Second-order optimization with lazy Hessians. In International Conference on Machine Learning, 2023.
  • Doikov et al. [2024] Nikita Doikov, Konstantin Mishchenko, and Yurii Nesterov. Super-universal regularized Newton method. SIAM Journal on Optimization, 34(1):27–56, 2024.
  • Filannino [2011] Michele Filannino. DBWorld e-mails. UCI Machine Learning Repository, 2011. DOI: https://doi.org/10.24432/C5589M.
  • Frehse and Bensoussan [1984] J. Frehse and A. Bensoussan. Nonlinear elliptic systems in stochastic game theory. 1984.
  • Grapiglia and Nesterov [2017] Geovani N. Grapiglia and Yurii Nesterov. Regularized Newton methods for minimizing functions with Hölder continuous Hessians. SIAM Journal on Optimization, 27(1):478–506, 2017.
  • Grapiglia and Nesterov [2019] Geovani N. Grapiglia and Yurii Nesterov. Accelerated regularized Newton methods for minimizing composite convex functions. SIAM Journal on Optimization, 29(1):77–99, 2019.
  • Gürbüzbalaban et al. [2015] Mert Gürbüzbalaban, Asuman Ozdaglar, and Pablo Parrilo. A globally convergent incremental Newton method. Mathematical Programming, 151(1):283–313, 2015.
  • Hottel and Saforim [1967] Hoyt C. Hottel and Adel F. Saforim. Radiative transfer. 1967.
  • Kelley [1995] Carl T. Kelley. Iterative methods for linear and nonlinear equations. SIAM, 1995.
  • Kelley [2003] Carl T. Kelley. Solving nonlinear equations with Newton’s method. SIAM, 2003.
  • Kohler and Lucchi [2017] Jonas Moritz Kohler and Aurelien Lucchi. Sub-sampled cubic regularization for non-convex optimization. In International Conference on Machine Learning, 2017.
  • Lahoti et al. [2023] Aakash Lahoti, Spandan Senapati, Ketan Rajawat, and Alec Koppel. Sharpened lazy incremental quasi-Newton method. arXiv preprint arXiv:2305.17283, 2023.
  • Li and Fukushima [1999] Donghui Li and Masao Fukushima. A globally and superlinearly convergent Gauss–Newton-based BFGS method for symmetric nonlinear equations. SIAM Journal on numerical Analysis, 37(1):152–172, 1999.
  • Lin et al. [2021] Dachao Lin, Haishan Ye, and Zhihua Zhang. Explicit superlinear convergence rates of Broyden’s methods in nonlinear equations. arXiv preprint arXiv:2109.01974, 2021.
  • Liu and Luo [2022] Chengchang Liu and Luo Luo. Quasi-Newton methods for saddle point problems. Advances in Neural Information Processing Systems, 2022.
  • Liu et al. [2023] Chengchang Liu, Cheng Chen, Luo Luo, and John Lui. Block Broyden’s methods for solving nonlinear equations. Advances in Neural Information Processing Systems, 2023.
  • Liu et al. [2024] Zhuanghua Liu, Luo Luo, and Bryan Kian Hsiang Low. Incremental quasi-Newton methods with faster superlinear convergence rates. In AAAI Conference on Artificial Intelligence, 2024.
  • Ljung [1979] Lennart Ljung. Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems. IEEE Transactions on Automatic Control, 24(1):36–50, 1979.
  • Mokhtari et al. [2018] Aryan Mokhtari, Mark Eisen, and Alejandro Ribeiro. IQN: An incremental quasi-Newton method with local superlinear convergence rate. SIAM Journal on Optimization, 28(2):1670–1698, 2018.
  • Moriyama et al. [2003] Hiroyuki Moriyama, Nobuo Yamashita, and Masao Fukushima. The incremental Gauss–Newton algorithm with adaptive stepsize rule. Computational Optimization and Applications, 26:107–141, 2003.
  • Nesterov [2005] Yurii Nesterov. Smooth minimization of non-smooth functions. Mathematical programming, 103:127–152, 2005.
  • Nesterov and Polyak [2006] Yurii Nesterov and Boris T. Polyak. Cubic regularization of Newton method and its global performance. Mathematical programming, 108(1):177–205, 2006.
  • Nocedal and Wright [1999] Jorge Nocedal and Stephen J. Wright. Numerical optimization. Springer, 1999.
  • Nourian and Caines [2013] Mojtaba Nourian and Peter E. Caines. $\epsilon$-Nash mean field game theory for nonlinear stochastic dynamical systems with major and minor agents. SIAM Journal on Control and Optimization, 51(4):3302–3331, 2013.
  • Petersen and Pedersen [2008] Kaare Brandt Petersen and Michael Syskind Pedersen. The matrix cookbook. Technical University of Denmark, 7(15):510, 2008.
  • Pilanci and Wainwright [2017] Mert Pilanci and Martin J. Wainwright. Newton sketch: A near linear-time optimization algorithm with linear-quadratic convergence. SIAM Journal on Optimization, 27(1):205–245, 2017.
  • Rodomanov and Kropotov [2015] Anton Rodomanov and Dmitry Kropotov. A Newton-type incremental method with a superlinear convergence rate. In Optimization for Machine Learning, 2015.
  • Rodomanov and Kropotov [2016] Anton Rodomanov and Dmitry Kropotov. A superlinearly-convergent proximal Newton-type method for the optimization of finite sums. In International Conference on Machine Learning, 2016.
  • Trémolet [2007] Yannick Trémolet. Model-error estimation in 4D-var. Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, applied meteorology and physical oceanography, 133(626):1267–1280, 2007.
  • Wang [2012] Yong Wang. Gauss–Newton method. Wiley Interdisciplinary Reviews: Computational Statistics, 4(4):415–420, 2012.
  • Woodbury [1950] Max A. Woodbury. Inverting modified matrices. Department of Statistics, Princeton University, 1950.
  • Woodruff [2014] David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends® in Theoretical Computer Science, 10(1–2):1–157, 2014.
  • Ye et al. [2021a] Haishan Ye, Dachao Lin, and Zhihua Zhang. Greedy and random Broyden’s methods with explicit superlinear convergence rates in nonlinear equations. arXiv preprint arXiv:2110.08572, 2021a.
  • Ye et al. [2021b] Haishan Ye, Luo Luo, and Zhihua Zhang. Approximate Newton methods. Journal of Machine Learning Research, 22(66):1–41, 2021b.
  • Yuan et al. [2022] Rui Yuan, Alessandro Lazaric, and Robert M. Gower. Sketched Newton–Raphson. SIAM Journal on Optimization, 32(3):1555–1583, 2022.

Appendix

The appendix is organized as follows. In Section A, we provide the detailed procedure of the Mini-Batch Incremental Gauss–Newton Method (MB-IGN). In Section B, we provide some results for Jacobians. In Section C, we introduce an auxiliary sequence and analyze its properties. In Sections D and E, we provide the convergence analysis for the proposed IGN and MB-IGN methods, respectively.

Appendix A The Mini-Batch Incremental Gauss–Newton Method

We provide the detailed procedure of Mini-Batch Incremental Gauss–Newton Method (MB-IGN) in Algorithm 2.

Algorithm 2 Mini-Batch Incremental Gauss–Newton Method (MB-IGN)
1: Input: ${\bf x}^{0}\in{\mathbb R}^{d}$, ${\bf u}^{0}\in{\mathbb R}^{d}$, ${\bf H}^{0},{\bf G}^{0}\in{\mathbb R}^{d\times d}$, $k\leq n$, $m=\lceil n/k\rceil$
2: Partition the index set $[n]=\{1,\dots,n\}$ into subsets $\{{\mathcal S}_1,\dots,{\mathcal S}_m\}$ such that $|{\mathcal S}_1|=\dots=|{\mathcal S}_{m-1}|=k$, $\cup_{i=1}^{m}{\mathcal S}_i=[n]$ and ${\mathcal S}_i\cap{\mathcal S}_j=\emptyset$ for all $i,j\in[m]$ with $i\neq j$
3: for $t=0,1,\dots$ do
4:   ${\bf x}^{t+1}={\bf G}^{t}{\bf u}^{t}$
5:   $i_t=t\,\%\,m+1$
6:   ${\bf U}^{t}=\Big[-{\bf g}_{j_1}({\bf z}_{i_t}^{t}),~~{\bf g}_{j_1}({\bf x}^{t+1}),~\cdots~,~-{\bf g}_{j_{|{\mathcal S}_{i_t}|}}({\bf z}_{i_t}^{t}),~~{\bf g}_{j_{|{\mathcal S}_{i_t}|}}({\bf x}^{t+1})\Big]$
7:   ${\bf V}^{t}=\Big[{\bf g}_{j_1}({\bf z}_{i_t}^{t}),~~{\bf g}_{j_1}({\bf x}^{t+1}),~\cdots~,~{\bf g}_{j_{|{\mathcal S}_{i_t}|}}({\bf z}_{i_t}^{t}),~~{\bf g}_{j_{|{\mathcal S}_{i_t}|}}({\bf x}^{t+1})\Big]$
8:   ${\bf u}^{t+1}={\bf u}^{t}-\sum_{j\in{\mathcal S}_{i_t}}\left({\bf g}_j({\bf z}_{i_t}^{t})^{\top}{\bf z}_{i_t}^{t}-f_j({\bf z}_{i_t}^{t})\right){\bf g}_j({\bf z}_{i_t}^{t})+\sum_{j\in{\mathcal S}_{i_t}}\left({\bf g}_j({\bf x}^{t+1})^{\top}{\bf x}^{t+1}-f_j({\bf x}^{t+1})\right){\bf g}_j({\bf x}^{t+1})$
9:   ${\bf H}^{t+1}={\bf H}^{t}-\sum_{j\in{\mathcal S}_{i_t}}{\bf g}_j({\bf z}_{i_t}^{t}){\bf g}_j({\bf z}_{i_t}^{t})^{\top}+\sum_{j\in{\mathcal S}_{i_t}}{\bf g}_j({\bf x}^{t+1}){\bf g}_j({\bf x}^{t+1})^{\top}$
10:  ${\bf G}^{t+1}={\bf G}^{t}-{\bf G}^{t}{\bf U}^{t}\left({\bf I}+({\bf V}^{t})^{\top}{\bf G}^{t}{\bf U}^{t}\right)^{-1}({\bf V}^{t})^{\top}{\bf G}^{t}$
11:  ${\bf z}_i^{t+1}=\begin{cases}{\bf x}^{t+1},&\text{if }i=i_t\\ {\bf z}_i^{t},&\text{otherwise}\end{cases}$
12: end for
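The steps of Algorithm 2 can be sketched in NumPy as follows. This is a hedged illustration rather than the authors' implementation. It assumes an initialization that the listing leaves to the inputs: every snapshot ${\bf z}_i^0$ equals ${\bf x}^0$, ${\bf H}^0$ is the Gram matrix of the gradients at ${\bf x}^0$, ${\bf G}^0$ is its inverse, and ${\bf u}^0$ is the matching sum. The names `mb_ign`, `f` and `grad`, the contiguous partition, and the 0-based batch index are likewise assumptions made for illustration. Step 10 is the Woodbury identity applied to ${\bf H}^{t+1}={\bf H}^{t}+{\bf U}^{t}({\bf V}^{t})^{\top}$.

```python
import numpy as np

def mb_ign(f, grad, x0, n, k, num_iters):
    """Sketch of MB-IGN. f(j, x) returns f_j(x); grad(j, x) returns g_j(x) in R^d."""
    d = x0.shape[0]
    m = -(-n // k)  # ceil(n / k)
    # Assumed partition: contiguous index blocks of size k (the last may be smaller)
    batches = [np.arange(i * k, min((i + 1) * k, n)) for i in range(m)]
    z = [x0.copy() for _ in range(m)]  # one snapshot point per mini-batch
    # Assumed initialization: Gram matrix H^0 at x0, its inverse G^0, and u^0
    H = np.zeros((d, d))
    u = np.zeros(d)
    for j in range(n):
        g = grad(j, x0)
        H += np.outer(g, g)
        u += (g @ x0 - f(j, x0)) * g
    G = np.linalg.inv(H)
    x = x0.copy()
    for t in range(num_iters):
        x = G @ u        # step 4
        i = t % m        # step 5, shifted to 0-based indexing
        cols_U, cols_V = [], []
        for j in batches[i]:
            g_old, g_new = grad(j, z[i]), grad(j, x)
            # steps 8 and 9: swap the stale terms of batch i for fresh ones
            u += (g_new @ x - f(j, x)) * g_new - (g_old @ z[i] - f(j, z[i])) * g_old
            H += np.outer(g_new, g_new) - np.outer(g_old, g_old)
            cols_U += [-g_old, g_new]   # step 6
            cols_V += [g_old, g_new]    # step 7
        U, V = np.column_stack(cols_U), np.column_stack(cols_V)
        # step 10: Woodbury update of the inverse, solving a small 2|S| x 2|S| system
        GU = G @ U
        G = G - GU @ np.linalg.solve(np.eye(U.shape[1]) + V.T @ GU, V.T @ G)
        z[i] = x.copy()  # step 11
    return x, H, G
```

Under this initialization the first update is a full Gauss–Newton step from ${\bf x}^0$, and each later iteration evaluates only the components in one mini-batch. The inverse is refreshed through a system of size $2|{\mathcal S}_{i_t}|$ rather than by inverting a $d\times d$ matrix.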

Appendix B Some Basic Results for Jacobians

This section presents some useful results for our later analysis.

Lemma 3.

(Hölder continuity of each gradient) Under Assumption 2, it satisfies that

$$\left\|{\bf g}_i({\bf y})-{\bf g}_i({\bf x})\right\|\leq{\mathcal H}_{\nu}\left\|{\bf y}-{\bf x}\right\|^{\nu}, \tag{21}$$

for any ${\bf x},{\bf y}\in{\mathbb R}^{d}$ and $i\in[n]$.

Proof.

We denote

$$\tilde{{\bf J}}=\begin{bmatrix}({\bf g}_1({\bf y})-{\bf g}_1({\bf x}))^{\top}\\ \vdots\\ ({\bf g}_n({\bf y})-{\bf g}_n({\bf x}))^{\top}\end{bmatrix}\in{\mathbb R}^{n\times d}\qquad\text{with}\qquad{\bf g}_i({\bf x})=\nabla f_i({\bf x})$$

and let ${\bf e}_i\in{\mathbb R}^{n}$ be the $i$-th standard basis vector in $n$-dimensional Euclidean space. Then the facts $\tilde{{\bf J}}={\bf J}({\bf y})-{\bf J}({\bf x})$ and $\tilde{{\bf J}}^{\top}{\bf e}_i={\bf g}_i({\bf y})-{\bf g}_i({\bf x})$ imply

$$\left\|{\bf g}_i({\bf y})-{\bf g}_i({\bf x})\right\|\leq\|\tilde{{\bf J}}^{\top}\|\left\|{\bf e}_i\right\|=\|\tilde{{\bf J}}\|=\left\|{\bf J}({\bf y})-{\bf J}({\bf x})\right\|\leq{\mathcal H}_{\nu}\left\|{\bf y}-{\bf x}\right\|^{\nu},$$

where the last step is based on the Hölder continuity of ${\bf J}(\cdot)$. ∎
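The key step in the proof of Lemma 3 is that the Euclidean norm of any row of a matrix is at most its spectral norm, because the $i$-th row equals $\tilde{{\bf J}}^{\top}{\bf e}_i$ and $\|{\bf e}_i\|=1$. A minimal numerical sanity check of this fact is sketched below; the helper name `row_norms_vs_spectral` is hypothetical, and a random matrix stands in for $\tilde{{\bf J}}$.

```python
import numpy as np

def row_norms_vs_spectral(J):
    """Return the Euclidean norms of the rows of J and the spectral norm of J."""
    row_norms = np.linalg.norm(J, axis=1)  # ||J^T e_i|| for each row index i
    spectral = np.linalg.norm(J, 2)        # largest singular value of J
    return row_norms, spectral

# Example: a random 50 x 8 matrix standing in for J(y) - J(x)
rng = np.random.default_rng(0)
rows, spec = row_norms_vs_spectral(rng.standard_normal((50, 8)))
print(bool(np.all(rows <= spec + 1e-12)))
```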

Lemma 4.

(Bound for Hölder-continuous function) Under Assumption 2, we have

$$f_i({\bf y})-f_i({\bf x})-{\bf g}_i({\bf x})^{\top}({\bf y}-{\bf x})\leq\frac{{\mathcal H}_{\nu}}{1+\nu}\left\|{\bf y}-{\bf x}\right\|^{1+\nu}, \tag{22}$$

for any ${\bf x},{\bf y}\in{\mathbb R}^{d}$ and $i\in[n]$.

Proof.

Following the proof of [21, 22], we have

\begin{align*}
f_i(\mathbf{y}) - f_i(\mathbf{x}) - \mathbf{g}_i(\mathbf{x})^\top(\mathbf{y}-\mathbf{x})
&= \int_{t=0}^{1} \mathbf{g}_i(\mathbf{x}+t(\mathbf{y}-\mathbf{x}))^\top(\mathbf{y}-\mathbf{x})\,\mathrm{d}t - \mathbf{g}_i(\mathbf{x})^\top(\mathbf{y}-\mathbf{x}) \\
&= \int_{t=0}^{1} \left(\mathbf{g}_i(\mathbf{x}+t(\mathbf{y}-\mathbf{x})) - \mathbf{g}_i(\mathbf{x})\right)^\top(\mathbf{y}-\mathbf{x})\,\mathrm{d}t \\
&\leq \int_{t=0}^{1} \left\|\mathbf{g}_i(\mathbf{x}+t(\mathbf{y}-\mathbf{x})) - \mathbf{g}_i(\mathbf{x})\right\| \left\|\mathbf{y}-\mathbf{x}\right\|\,\mathrm{d}t \\
&\leq \int_{t=0}^{1} \mathcal{H}_\nu t^{\nu} \left\|\mathbf{y}-\mathbf{x}\right\|^{1+\nu}\,\mathrm{d}t \\
&= \mathcal{H}_\nu \left\|\mathbf{y}-\mathbf{x}\right\|^{1+\nu} \int_{t=0}^{1} t^{\nu}\,\mathrm{d}t \\
&= \frac{\mathcal{H}_\nu}{1+\nu} \left\|\mathbf{y}-\mathbf{x}\right\|^{1+\nu},
\end{align*}

where the first inequality comes from the Cauchy–Schwarz inequality, and the second one follows from Lemma 3, which states that each gradient is Hölder continuous. ∎
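As a numerical sanity check of the bound derived above, the following minimal sketch uses a one-dimensional test function that is not taken from the paper: $f(z)=|z|^{1+\nu}/(1+\nu)$, whose derivative $g(z)=\mathrm{sign}(z)|z|^{\nu}$ is Hölder continuous with exponent $\nu$ and constant $2^{1-\nu}$. The function name `holder_bound_gap` and the sampling ranges are illustrative assumptions.

```python
import random


def holder_bound_gap(x, y, nu):
    """Return (bound - remainder) for the test function f(z) = |z|^(1+nu) / (1+nu).

    The remainder is f(y) - f(x) - g(x) * (y - x), and the bound is
    H / (1 + nu) * |y - x|^(1 + nu) with the assumed Hölder constant H = 2^(1 - nu).
    """
    def f(z):
        return abs(z) ** (1.0 + nu) / (1.0 + nu)

    def g(z):
        # Derivative of f: sign(z) * |z|^nu.
        return (1.0 if z >= 0 else -1.0) * abs(z) ** nu

    holder_const = 2.0 ** (1.0 - nu)
    remainder = f(y) - f(x) - g(x) * (y - x)
    bound = holder_const / (1.0 + nu) * abs(y - x) ** (1.0 + nu)
    return bound - remainder


random.seed(0)
# Smallest gap over random pairs; it should be nonnegative up to rounding error.
worst_gap = min(
    holder_bound_gap(random.uniform(-5, 5), random.uniform(-5, 5), nu)
    for nu in (0.25, 0.5, 1.0)
    for _ in range(1000)
)
print("bound holds on all samples:", worst_gap >= -1e-9)
```

For $\nu=1$ the test function is quadratic and the bound is attained with equality, which is why the check allows a small floating-point tolerance.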

Lemma 5.

(Bound for Jacobian and gradient) Under Assumption 1, we have

\[
\left\|\mathbf{g}_i(\mathbf{x})\right\| \leq \left\|\mathbf{J}(\mathbf{x})\right\| \leq L_f
\]

for all $\mathbf{x}\in\mathbb{R}^{d}$ and $i\in[n]$.

Proof.

For all $\mathbf{x},\mathbf{v}\in\mathbb{R}^{d}$, we have

\[
\mathbf{J}(\mathbf{x})\mathbf{v} = \lim_{h\to 0}\frac{\mathbf{f}(\mathbf{x}+h\mathbf{v})-\mathbf{f}(\mathbf{x})}{h}.
\]

Taking the Euclidean norm on both sides, we have

\begin{align*}
\left\|\mathbf{J}(\mathbf{x})\mathbf{v}\right\|
&= \lim_{h\to 0}\frac{\left\|\mathbf{f}(\mathbf{x}+h\mathbf{v})-\mathbf{f}(\mathbf{x})\right\|}{|h|} \\
&\leq \lim_{h\to 0}\frac{L_f\left\|\mathbf{x}+h\mathbf{v}-\mathbf{x}\right\|}{|h|} \\
&= \lim_{h\to 0}\frac{L_f|h|\left\|\mathbf{v}\right\|}{|h|} \\
&= L_f\left\|\mathbf{v}\right\|,
\end{align*}

where the inequality comes from Assumption 1.

Therefore, for all $\mathbf{x}\in\mathbb{R}^{d}$, it holds that

\[
\left\|\mathbf{J}(\mathbf{x})\right\| = \sup_{\mathbf{v}\in\mathbb{R}^{d},\,\mathbf{v}\neq\mathbf{0}}\frac{\left\|\mathbf{J}(\mathbf{x})\mathbf{v}\right\|}{\left\|\mathbf{v}\right\|} \leq L_f.
\]

Let $\mathbf{e}_i\in\mathbb{R}^{n}$ be the $i$-th standard basis vector in $n$-dimensional Euclidean space. Then we have

\[
\left\|\mathbf{g}_i(\mathbf{x})\right\| = \left\|\mathbf{J}(\mathbf{x})^\top\mathbf{e}_i\right\| \leq \left\|\mathbf{J}(\mathbf{x})\right\|\left\|\mathbf{e}_i\right\| = \left\|\mathbf{J}(\mathbf{x})\right\| \leq L_f
\]

for all $i\in[n]$. ∎
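The inequality $\|\mathbf{g}_i(\mathbf{x})\|\leq\|\mathbf{J}(\mathbf{x})\|$ can be illustrated on a linear map $\mathbf{f}(\mathbf{x})=\mathbf{J}\mathbf{x}$, whose Jacobian is the constant matrix $\mathbf{J}$ and whose rows are the gradients $\mathbf{g}_i^\top$. The sketch below is only an illustration under that assumption: the matrix entries are arbitrary, and the helper `spectral_norm_two_columns` (a hypothetical name) handles only matrices with two columns, using the closed-form largest eigenvalue of the $2\times 2$ Gram matrix $\mathbf{J}^\top\mathbf{J}$.

```python
import math


def spectral_norm_two_columns(J):
    """Spectral norm of a matrix with exactly two columns.

    Uses the largest eigenvalue of the 2x2 Gram matrix J^T J in closed form.
    """
    g11 = sum(row[0] * row[0] for row in J)
    g12 = sum(row[0] * row[1] for row in J)
    g22 = sum(row[1] * row[1] for row in J)
    lam_max = 0.5 * (g11 + g22 + math.sqrt((g11 - g22) ** 2 + 4.0 * g12 ** 2))
    return math.sqrt(lam_max)


# Arbitrary example with n = 3 components and d = 2 variables.
J = [[1.0, 2.0], [0.0, -3.0], [4.0, 0.5]]
row_norms = [math.hypot(row[0], row[1]) for row in J]  # norms of the gradients g_i
spec = spectral_norm_two_columns(J)                     # plays the role of ||J(x)||
rows_bounded = all(r <= spec + 1e-12 for r in row_norms)
print("every gradient norm is bounded by the spectral norm:", rows_bounded)
```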

Appendix C The Auxiliary Sequence and Its Properties

We construct the following sequence for our convergence analysis in later sections.

Definition 1.

We define the following sequence $\{a_t(n,\nu)\}_{t\geq 0}$ for given $n\in\mathbb{N}^{+}$ and $\nu\in(0,1]$:

\[
a_t(n,\nu) \triangleq
\begin{cases}
1, & t=0,\\[4pt]
\dfrac{1}{2(1+\nu)n}\left(\displaystyle\sum_{j=0}^{t-1}(a_j(n,\nu))^{1+\nu}+n-t\right), & 1\leq t\leq n,\\[12pt]
\dfrac{1}{2(1+\nu)n}\displaystyle\sum_{j=t-n}^{t-1}(a_j(n,\nu))^{1+\nu}, & t>n.
\end{cases}
\tag{23}
\]

We then provide several useful properties for the sequence in Definition 1.
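To make Definition 1 concrete before these properties are proved, the recursion (23) can be evaluated directly. The following is a minimal sketch; the function name `aux_sequence` and the horizon argument `T` are illustrative choices, not notation from the paper.

```python
def aux_sequence(n, nu, T):
    """Return the list [a_0(n, nu), ..., a_T(n, nu)] defined by recursion (23)."""
    scale = 1.0 / (2.0 * (1.0 + nu) * n)
    a = [1.0]  # a_0 = 1
    for t in range(1, T + 1):
        if t <= n:
            # Case 1 <= t <= n: sum over j = 0, ..., t-1 plus the constant n - t.
            total = sum(x ** (1.0 + nu) for x in a[:t]) + (n - t)
        else:
            # Case t > n: sum over the last n terms, j = t-n, ..., t-1.
            total = sum(x ** (1.0 + nu) for x in a[t - n:t])
        a.append(scale * total)
    return a


seq = aux_sequence(n=4, nu=0.5, T=12)
for t, value in enumerate(seq):
    print(t, value)
```

For any $n$, the first step gives $a_1(n,\nu)=\frac{1}{2(1+\nu)}$, since the bracket in (23) equals $1+n-1=n$ when $t=1$.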

Lemma 6.

The sequence $\{a_t(n,\nu)\}_{t\geq 0}$ satisfies

\[
a_t(n,\nu)\leq 1
\]

for all $t\geq 0$.

Proof.

Part I: We first use induction to prove $a_t(n,\nu)\leq 1$ for all $t=0,1,\dots,n$. Note that every term of the sequence is nonnegative by definition. For the induction base, we can verify that $a_0(n,\nu)=1\leq 1$. For the induction step, we assume

\[
a_j(n,\nu)\leq 1
\]

holds for all $j=0,\dots,t-1$ such that $1\leq t\leq n$. Then we have

\[
a_t(n,\nu)=\frac{1}{2(1+\nu)n}\left(\sum_{j=0}^{t-1}(a_j(n,\nu))^{1+\nu}+n-t\right)\leq\frac{1}{2(1+\nu)n}\left(t+n-t\right)=\frac{1}{2(1+\nu)}\leq 1,
\]

where the first inequality is based on the induction hypothesis and the last inequality is based on the setting $\nu\in(0,1]$. This finishes the induction.

Part II: We then use induction to prove $a_t(n,\nu)\leq 1$ for all $t\geq n+1$. For the induction base, we can verify that

\[
a_{n+1}(n,\nu)=\frac{1}{2(1+\nu)n}\sum_{j=1}^{n}(a_j(n,\nu))^{1+\nu}\leq\frac{1}{2(1+\nu)n}\cdot n=\frac{1}{2(1+\nu)}\leq 1,
\]

where the first inequality is based on $a_t(n,\nu)\leq 1$ for all $t\leq n$ (which was shown in Part I), and the last inequality is based on the setting $\nu\in(0,1]$. For the induction step, we assume

\[
a_j(n,\nu)\leq 1
\]

holds for all $j=n+1,\dots,t-1$ such that $t\geq n+2$. Then we have

\[
a_t(n,\nu)=\frac{1}{2(1+\nu)n}\sum_{j=t-n}^{t-1}(a_j(n,\nu))^{1+\nu}\leq\frac{1}{2(1+\nu)n}\cdot n=\frac{1}{2(1+\nu)}\leq 1,
\]

where the first inequality is based on the induction hypothesis together with Part I, and the last inequality is based on the setting $\nu\in(0,1]$. This finishes the induction.

Combining the results of the above two parts, we finish the proof of this lemma. ∎

Lemma 7.

The sequence $\{a_t(n,\nu)\}_{t\geq 0}$ satisfies

\[
a_t(n,\nu)\geq a_{t+1}(n,\nu)
\]

for all $t\geq 0$.

Proof.

Part I: For $t=0$, the fact $\nu\in(0,1]$ means

\[
a_1(n,\nu)=\frac{1}{2(1+\nu)}\leq 1=a_0(n,\nu).
\]

Part II: For all $t=1,\dots,n-1$, we have

\begin{align*}
&a_{t+1}(n,\nu)-a_t(n,\nu) \\
&=\frac{1}{2(1+\nu)n}\left(\sum_{j=0}^{t}(a_j(n,\nu))^{1+\nu}+n-t-1\right)-\frac{1}{2(1+\nu)n}\left(\sum_{j=0}^{t-1}(a_j(n,\nu))^{1+\nu}+n-t\right) \\
&=\frac{1}{2(1+\nu)n}\left((a_t(n,\nu))^{1+\nu}-1\right)\leq 0,
\end{align*}

where the last inequality is based on Lemma 6. This indicates $a_{t+1}(n,\nu)\leq a_t(n,\nu)$ for $t=1,\dots,n-1$.

Part III: For all $t\geq n$, we use induction to prove $a_{t+1}(n,\nu)\leq a_t(n,\nu)$. For the induction base, we can verify that

\begin{align*}
&a_{n+1}(n,\nu)-a_n(n,\nu) \\
&=\frac{1}{2(1+\nu)n}\sum_{j=1}^{n}(a_j(n,\nu))^{1+\nu}-\frac{1}{2(1+\nu)n}\left(\sum_{j=1}^{n-1}(a_j(n,\nu))^{1+\nu}+1\right) \\
&=\frac{1}{2(1+\nu)n}\left((a_n(n,\nu))^{1+\nu}-1\right) \\
&\leq 0,
\end{align*}

where the last inequality is based on Lemma 6.

For the induction step, we assume

\[
a_{j+1}(n,\nu)\leq a_j(n,\nu)
\]

holds for all $j=n,\dots,t-1$ such that $t\geq n+1$. Then we have

\begin{align*}
&a_{t+1}(n,\nu)-a_t(n,\nu) \\
&=\frac{1}{2(1+\nu)n}\sum_{j=t-n+1}^{t}(a_j(n,\nu))^{1+\nu}-\frac{1}{2(1+\nu)n}\sum_{j=t-n}^{t-1}(a_j(n,\nu))^{1+\nu} \\
&=\frac{1}{2(1+\nu)n}\left((a_t(n,\nu))^{1+\nu}-(a_{t-n}(n,\nu))^{1+\nu}\right)\leq 0,
\end{align*}

where the inequality holds because $a_t(n,\nu)\leq a_{t-n}(n,\nu)$, which follows by chaining the induction hypothesis with the fact $a_{t+1}(n,\nu)\leq a_t(n,\nu)$ for all $t\leq n-1$ (which was shown in Parts I and II).

Combining the results of the above three parts, we finish the proof of this lemma. ∎
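Lemmas 6 and 7 can be checked numerically on a finite horizon. The sketch below is only a sanity check under stated assumptions, not part of the proof: the helper `aux_sequence`, the parameter grid, and the horizon $4n$ are illustrative choices, and the sequence is computed in floating-point arithmetic.

```python
def aux_sequence(n, nu, T):
    """Return [a_0, ..., a_T] from the recursion in Definition 1."""
    scale = 1.0 / (2.0 * (1.0 + nu) * n)
    a = [1.0]
    for t in range(1, T + 1):
        window = a[max(0, t - n):t]  # a_{t-n}, ..., a_{t-1}, or a_0, ..., a_{t-1} if t <= n
        padding = max(0, n - t)      # the "+ n - t" term, nonzero only for t < n
        a.append(scale * (sum(x ** (1.0 + nu) for x in window) + padding))
    return a


def is_bounded_and_nonincreasing(a, tol=1e-15):
    """Check a_t <= 1 (Lemma 6) and a_{t+1} <= a_t (Lemma 7) on the computed terms."""
    bounded = all(x <= 1.0 for x in a)
    nonincreasing = all(a[t + 1] <= a[t] + tol for t in range(len(a) - 1))
    return bounded and nonincreasing


results = {
    (n, nu): is_bounded_and_nonincreasing(aux_sequence(n, nu, 4 * n))
    for n in (1, 2, 5, 20)
    for nu in (0.1, 0.5, 1.0)
}
print("Lemmas 6 and 7 hold on every tested pair:", all(results.values()))
```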

Lemma 8.

For the sequence $\{a_t(n,\nu)\}_{t\geq 0}$, we have

\[
a_t(n,\nu)\leq\frac{1}{2(1+\nu)}(a_{t-n}(n,\nu))^{1+\nu}
\]

for all $t\geq n$.

Proof.

For all $t\geq n$, the definition of $a_t(n,\nu)$ implies

\begin{align*}
a_t(n,\nu)&=\frac{1}{2(1+\nu)n}\left(\sum_{j=t-n}^{t-1}(a_j(n,\nu))^{1+\nu}\right) \\
&\leq\frac{1}{2(1+\nu)}\max\left\{(a_{t-n}(n,\nu))^{1+\nu},\cdots,(a_{t-1}(n,\nu))^{1+\nu}\right\}.
\end{align*}

Additionally, Lemma 7 implies that, for all $t\geq n$, we have

\[
\max\left\{(a_{t-n}(n,\nu))^{1+\nu},\cdots,(a_{t-1}(n,\nu))^{1+\nu}\right\}=(a_{t-n}(n,\nu))^{1+\nu}.
\]

Combining the above results, we achieve

\[
a_t(n,\nu)\leq\frac{1}{2(1+\nu)}(a_{t-n}(n,\nu))^{1+\nu}
\]

for all $t\geq n$. ∎

Lemma 9.

For the sequence $\{a_t(n,\nu)\}_{t\geq 0}$, we have

\[
a_{t+1}(n,\nu)\leq c_0\,a_t(n,\nu)
\]

for all $t\geq n$, where

\[
c_0=1-\frac{1}{n}\left(1-\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}\right).
\]

Proof.

For all $t\geq n$, we have

\[
a_t(n,\nu)\leq\frac{1}{2(1+\nu)}(a_{t-n}(n,\nu))^{1+\nu}\leq\frac{1}{2(1+\nu)}a_{t-n}(n,\nu),
\tag{24}
\]

where the first inequality is based on Lemma 8 and the second one is based on Lemma 6.

Then we also have

\begin{align*}
a_{t+1}(n,\nu)
&=\frac{1}{2(1+\nu)n}\left(\sum_{j=t-n+1}^{t}(a_j(n,\nu))^{1+\nu}\right) \\
&\leq\frac{1}{2(1+\nu)n}\left(\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}(a_{t-n}(n,\nu))^{1+\nu}+\sum_{j=t-n+1}^{t-1}(a_j(n,\nu))^{1+\nu}\right) \\
&=\frac{1}{2(1+\nu)n}\left(\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}(a_{t-n}(n,\nu))^{1+\nu}+\sum_{j=t-n+1}^{t-1}(a_j(n,\nu))^{1+\nu}+(a_{t-n}(n,\nu))^{1+\nu}-(a_{t-n}(n,\nu))^{1+\nu}\right) \\
&=a_t(n,\nu)+\frac{1}{2(1+\nu)n}\left(\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}(a_{t-n}(n,\nu))^{1+\nu}-(a_{t-n}(n,\nu))^{1+\nu}\right) \\
&=a_t(n,\nu)-\frac{1}{2(1+\nu)n}\left(1-\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}\right)(a_{t-n}(n,\nu))^{1+\nu} \\
&\leq a_t(n,\nu)-\frac{1}{n}\left(1-\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}\right)a_t(n,\nu) \\
&=\left(1-\frac{1}{n}\left(1-\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}\right)\right)a_t(n,\nu),
\end{align*}

for all $t\geq n$, where the first inequality is based on equation (24) and the last inequality is based on Lemma 8. This finishes the proof. ∎
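The bound of Lemma 8 and the linear rate $c_0$ of Lemma 9 can also be checked numerically. The sketch below is a finite-horizon sanity check in floating-point arithmetic; the names `aux_sequence` and `check_lemmas_8_and_9`, the relative tolerance, and the tested parameters are illustrative assumptions.

```python
def aux_sequence(n, nu, T):
    """Return [a_0, ..., a_T] from the recursion in Definition 1."""
    scale = 1.0 / (2.0 * (1.0 + nu) * n)
    a = [1.0]
    for t in range(1, T + 1):
        window = a[max(0, t - n):t]  # the (at most n) most recent terms
        padding = max(0, n - t)      # the "+ n - t" term, nonzero only for t < n
        a.append(scale * (sum(x ** (1.0 + nu) for x in window) + padding))
    return a


def check_lemmas_8_and_9(n, nu, T, rel_tol=1e-12):
    """Return (lemma8_ok, lemma9_ok, c0) for the terms a_n, ..., a_T."""
    a = aux_sequence(n, nu, T)
    kappa = 1.0 / (2.0 * (1.0 + nu))
    c0 = 1.0 - (1.0 - kappa ** (1.0 + nu)) / n
    # Lemma 8: a_t <= kappa * a_{t-n}^(1+nu) for all t >= n.
    lemma8_ok = all(
        a[t] <= kappa * a[t - n] ** (1.0 + nu) * (1.0 + rel_tol)
        for t in range(n, T + 1)
    )
    # Lemma 9: a_{t+1} <= c0 * a_t for all t >= n.
    lemma9_ok = all(a[t + 1] <= c0 * a[t] * (1.0 + rel_tol) for t in range(n, T))
    return lemma8_ok, lemma9_ok, c0


for n in (2, 5, 20):
    for nu in (0.1, 0.5, 1.0):
        print(n, nu, check_lemmas_8_and_9(n, nu, 4 * n))
```

For example, with $n=2$ and $\nu=1$ the constant evaluates to $c_0=1-\frac{1}{2}\left(1-\frac{1}{16}\right)=0.53125$.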

Lemma 10.

For the sequence $\{a_t(n,\nu)\}_{t\geq 0}$, if there exist $c_1\in(0,1)$ and $t_0\geq 0$ such that

\[
a_{t+1}(n,\nu)\leq c_1\,a_t(n,\nu)
\tag{25}
\]

for all $t\geq t_0+n$, then we have

\[
a_{t+1}(n,\nu)\leq c_1^{1+\nu}\,a_t(n,\nu)
\]

for all $t\geq t_0+2n$.

Proof.

For all $t\geq t_0+2n$, we have

$$\begin{aligned}
a_{t+1}(n,\nu) &= \frac{1}{2(1+\nu)n}\sum_{j=t-n+1}^{t}(a_j(n,\nu))^{1+\nu}\\
&\leq \frac{1}{2(1+\nu)n}\sum_{j=t-n}^{t-1}c_1^{1+\nu}(a_j(n,\nu))^{1+\nu}=c_1^{1+\nu}a_t(n,\nu),
\end{aligned}$$

where the inequality is based on equation (25). ∎
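To make the mechanism of Lemma 10 concrete, the following worked special case takes $n=1$ (a value chosen here purely for illustration), so that the window in the recursion contains a single term. It is only a sketch of the same argument, valid for $t\geq t_0+2$ under hypothesis (25).

```latex
% Special case n = 1 of the recursion used in the proof of Lemma 10.
% The inequality applies (25) to a_t <= c_1 a_{t-1}; the last equality
% applies the recursion to a_t itself.
\begin{align*}
a_{t+1}(1,\nu)
  &= \frac{1}{2(1+\nu)}\,\bigl(a_t(1,\nu)\bigr)^{1+\nu}
   \leq \frac{1}{2(1+\nu)}\,\bigl(c_1\,a_{t-1}(1,\nu)\bigr)^{1+\nu} \\
  &= c_1^{1+\nu}\cdot\frac{1}{2(1+\nu)}\,\bigl(a_{t-1}(1,\nu)\bigr)^{1+\nu}
   = c_1^{1+\nu}\,a_t(1,\nu).
\end{align*}
```

For general $n$ the same substitution is applied to each of the $n$ terms in the window, which is why the improved factor only becomes available $n$ steps later.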

Lemma 11.

For the sequence $\{a_t(n,\nu)\}_{t\geq 0}$, we have the superlinear convergence

$$a_{t+1}(n,\nu)\leq c^{(1+\nu)^{\left(\left\lfloor\frac{t-1}{n}\right\rfloor-1\right)}}a_t(n,\nu)$$

for all $t\geq n$, where

$$c=1-\frac{1}{n}\left(1-\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}\right).$$

Proof.

According to Lemma 9, we have

$$a_{t+1}(n,\nu)\leq c\,a_t(n,\nu)\quad\text{for all}~~t\geq n.$$

According to Lemma 10, we have

$$\begin{aligned}
a_{t+1}(n,\nu) &\leq c^{1+\nu}a_t(n,\nu)\quad\text{for all}~~t\geq 2n,\\
a_{t+1}(n,\nu) &\leq c^{(1+\nu)^2}a_t(n,\nu)\quad\text{for all}~~t\geq 3n,\\
a_{t+1}(n,\nu) &\leq c^{(1+\nu)^3}a_t(n,\nu)\quad\text{for all}~~t\geq 4n,\\
&\cdots
\end{aligned}$$

which implies

$$a_{t+1}(n,\nu)\leq c^{(1+\nu)^{\left(\lfloor t/n\rfloor-1\right)}}a_t(n,\nu)$$

for all $t\geq n$.

The superlinear convergence of the sequence $\{a_t(n,\nu)\}_{t\geq 0}$ can be verified by the fact

$$\lim_{t\to\infty}c^{(1+\nu)^{\left(\lfloor t/n\rfloor-1\right)}}=0.$$

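Since $c\in(0,1)$ and $\lfloor (t-1)/n\rfloor\leq\lfloor t/n\rfloor$, the bound derived above implies the one stated in the lemma. Hence, we finish the proof. ∎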
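The bound of Lemma 11 can be checked numerically with a short simulation. The sketch below assumes the initialization $a_0=\dots=a_{n-1}=1$, which is not restated in this appendix, and uses the recursion displayed in the proof of Lemma 10; the names `simulate_sequence` and `lemma11_factor`, as well as the values of `n` and `nu`, are illustrative choices. It compares against the factor with $\lfloor t/n\rfloor$ obtained in the proof, which is at least as strong as the one in the statement.

```python
import numpy as np


def lemma11_factor(t, n, nu):
    """Contraction factor c^{(1+nu)^(floor(t/n) - 1)} derived in the proof."""
    c = 1.0 - (1.0 - (1.0 / (2.0 * (1.0 + nu))) ** (1.0 + nu)) / n
    return c ** ((1.0 + nu) ** (t // n - 1))


def simulate_sequence(n, nu, num_steps):
    """Generate a_0, ..., a_{num_steps}.

    Assumption: a_0 = ... = a_{n-1} = 1 (initial values are not restated here).
    """
    a = np.ones(num_steps + 1)
    for t in range(n - 1, num_steps):
        window = a[t - n + 1 : t + 1]  # a_{t-n+1}, ..., a_t
        a[t + 1] = np.sum(window ** (1.0 + nu)) / (2.0 * (1.0 + nu) * n)
    return a


n, nu = 4, 0.5  # illustrative values
a = simulate_sequence(n, nu, num_steps=8 * n)
for t in range(n, 8 * n, n):  # first step of each pass over the n components
    ratio = a[t + 1] / a[t]
    print(f"t={t:2d}  ratio={ratio:.3e}  bound={lemma11_factor(t, n, nu):.3e}")
```

The simulation is kept to a few passes because the iterates shrink doubly exponentially and would otherwise underflow in double precision.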

Appendix D The Convergence Analysis for IGN

In this section, we provide the proofs of the results in Section 3.

D.1 The Proof of Proposition 1

Proof.

We denote the singular value decomposition of ${\bf J}({\bf{x}}^*)$ as

$${\bf J}({\bf{x}}^*)={\bf P}{\bf D}{\bf Q}^\top,$$

where ${\bf P}\in{\mathbb{R}}^{n\times d}$ and ${\bf Q}\in{\mathbb{R}}^{d\times d}$ are (column) orthogonal matrices and ${\bf D}\in{\mathbb{R}}^{d\times d}$ is a diagonal matrix whose smallest diagonal entry is $\mu>0$. Therefore, we have

$${\bf J}({\bf{x}}^*)^\top{\bf J}({\bf{x}}^*)={\bf Q}{\bf D}^2{\bf Q}^\top,$$

which means the smallest singular value of ${\bf J}({\bf{x}}^*)^\top{\bf J}({\bf{x}}^*)$ is equal to the smallest diagonal entry of ${\bf D}^2$, which is $\mu^2$. Therefore, we have

$$\sigma_{\min}({\bf J}({\bf{x}}^*)^\top{\bf J}({\bf{x}}^*))\geq\mu^2.$$
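The identity behind this proof is easy to confirm numerically. The sketch below uses a random matrix `J` as a stand-in for ${\bf J}({\bf{x}}^*)$ (the sizes and the seed are arbitrary assumptions) and checks that the smallest singular value of $J^\top J$ is the square of the smallest singular value of $J$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 3  # hypothetical sizes with n >= d
J = rng.standard_normal((n, d))  # random stand-in for J(x*)

# Thin SVD: J = P @ diag(D) @ Qt with P of shape (n, d) and Qt of shape (d, d).
P, D, Qt = np.linalg.svd(J, full_matrices=False)
mu = D.min()

gram = J.T @ J
sigma_min_gram = np.linalg.svd(gram, compute_uv=False).min()
print(mu**2, sigma_min_gram)
```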

D.2 Proof of Lemma 1

Proof.

The Jacobian satisfies

$$\begin{aligned}
\left\|{\bf J}({\bf{y}})^\top{\bf J}({\bf{y}})-{\bf J}({\bf{x}})^\top{\bf J}({\bf{x}})\right\|
&=\left\|{\bf J}({\bf{y}})^\top{\bf J}({\bf{y}})-{\bf J}({\bf{x}})^\top{\bf J}({\bf{y}})+{\bf J}({\bf{x}})^\top{\bf J}({\bf{y}})-{\bf J}({\bf{x}})^\top{\bf J}({\bf{x}})\right\|\\
&\leq\left\|\left({\bf J}({\bf{y}})-{\bf J}({\bf{x}})\right)^\top{\bf J}({\bf{y}})\right\|+\left\|{\bf J}({\bf{x}})^\top\left({\bf J}({\bf{y}})-{\bf J}({\bf{x}})\right)\right\|\\
&\leq\left\|{\bf J}({\bf{y}})\right\|\left\|{\bf J}({\bf{y}})-{\bf J}({\bf{x}})\right\|+\left\|{\bf J}({\bf{x}})\right\|\left\|{\bf J}({\bf{y}})-{\bf J}({\bf{x}})\right\|\\
&\leq 2L_f{\mathcal{H}}_\nu\left\|{\bf{y}}-{\bf{x}}\right\|^\nu,
\end{aligned}$$

where the first inequality comes from the triangle inequality, the second inequality comes from the submultiplicativity of the norm, and the last inequality is based on Lemma 5 and Assumption 2.

For all $i\in[n]$, the gradient satisfies

$$\begin{aligned}
\left\|{\bf{g}}_i({\bf{y}}){\bf{g}}_i({\bf{y}})^\top-{\bf{g}}_i({\bf{x}}){\bf{g}}_i({\bf{x}})^\top\right\|
&=\left\|{\bf{g}}_i({\bf{y}}){\bf{g}}_i({\bf{y}})^\top-{\bf{g}}_i({\bf{x}}){\bf{g}}_i({\bf{y}})^\top+{\bf{g}}_i({\bf{x}}){\bf{g}}_i({\bf{y}})^\top-{\bf{g}}_i({\bf{x}}){\bf{g}}_i({\bf{x}})^\top\right\|\\
&\leq\left\|\left({\bf{g}}_i({\bf{y}})-{\bf{g}}_i({\bf{x}})\right){\bf{g}}_i({\bf{y}})^\top\right\|+\left\|{\bf{g}}_i({\bf{x}})\left({\bf{g}}_i({\bf{y}})-{\bf{g}}_i({\bf{x}})\right)^\top\right\|\\
&\leq\left\|{\bf{g}}_i({\bf{y}})\right\|\left\|{\bf{g}}_i({\bf{y}})-{\bf{g}}_i({\bf{x}})\right\|+\left\|{\bf{g}}_i({\bf{x}})\right\|\left\|{\bf{g}}_i({\bf{y}})-{\bf{g}}_i({\bf{x}})\right\|\\
&\leq 2L_f{\mathcal{H}}_\nu\left\|{\bf{y}}-{\bf{x}}\right\|^\nu,
\end{aligned}$$

where the first inequality comes from the triangle inequality, the second inequality comes from the submultiplicativity of the norm, and the last inequality is based on Lemmas 3 and 5. ∎
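Both chains above rest on the same norm inequality, $\|A^\top A-B^\top B\|\leq(\|A\|+\|B\|)\,\|A-B\|$, before the Lipschitz and Hölder constants are applied. The sketch below checks that intermediate step on random stand-ins for the Jacobians and gradients; the helper names `spec` and `gram_gap_and_bound`, the sizes and the perturbation scale are assumptions made for illustration, and the constants $L_f$ and ${\mathcal{H}}_\nu$ are not modeled.

```python
import numpy as np

rng = np.random.default_rng(1)


def spec(M):
    """Spectral norm (largest singular value) of a 2-D array."""
    return np.linalg.norm(M, 2)


def gram_gap_and_bound(A, B):
    """Return ||A^T A - B^T B|| and the bound (||A|| + ||B||) * ||A - B||."""
    gap = spec(A.T @ A - B.T @ B)
    bound = (spec(A) + spec(B)) * spec(A - B)
    return gap, bound


# Jacobian-sized case: random stand-ins for J(y) and J(x).
Jy = rng.standard_normal((8, 3))
Jx = Jy + 0.1 * rng.standard_normal((8, 3))
jac_gap, jac_bound = gram_gap_and_bound(Jy, Jx)

# Rank-one case: store g_i(y) and g_i(x) as 1 x d matrices,
# so that A^T A equals the outer product g g^T.
gy = rng.standard_normal((1, 3))
gx = gy + 0.1 * rng.standard_normal((1, 3))
grad_gap, grad_bound = gram_gap_and_bound(gy, gx)

print(jac_gap, jac_bound)
print(grad_gap, grad_bound)
```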

D.3 Proof of Lemma 2

Proof.

We have

$$\begin{aligned}
\left\|{\bf H}^t-{\bf J}({\bf{x}}^*)^\top{\bf J}({\bf{x}}^*)\right\|
&=\left\|\sum_{i=1}^n{\bf{g}}_i({\bf{z}}_i^t){\bf{g}}_i({\bf{z}}_i^t)^\top-\sum_{i=1}^n{\bf{g}}_i({\bf{x}}^*){\bf{g}}_i({\bf{x}}^*)^\top\right\|\\
&\leq\sum_{i=1}^n\left\|{\bf{g}}_i({\bf{z}}_i^t){\bf{g}}_i({\bf{z}}_i^t)^\top-{\bf{g}}_i({\bf{x}}^*){\bf{g}}_i({\bf{x}}^*)^\top\right\|\\
&\leq\sum_{i=1}^n 2L_f{\mathcal{H}}_\nu\left\|{\bf{z}}_i^t-{\bf{x}}^*\right\|^\nu,
\end{aligned}$$

where the first inequality comes from the triangle inequality and the second inequality is based on Lemma 1. Thus, we have

$${\bf H}^t-{\bf J}({\bf{x}}^*)^\top{\bf J}({\bf{x}}^*)\succeq-\sum_{i=1}^n 2L_f{\mathcal{H}}_\nu\left\|{\bf{z}}_i^t-{\bf{x}}^*\right\|^\nu\cdot{\bf I},$$

which implies that

$$\sigma_{\min}({\bf H}^t)\geq\sigma_{\min}({\bf J}({\bf{x}}^*)^\top{\bf J}({\bf{x}}^*))-\sum_{i=1}^n 2L_f{\mathcal{H}}_\nu\left\|{\bf{z}}_i^t-{\bf{x}}^*\right\|^\nu=\mu^2-2L_f{\mathcal{H}}_\nu\sum_{i=1}^n\left\|{\bf{z}}_i^t-{\bf{x}}^*\right\|^\nu,$$

where the last step is based on Proposition 1. ∎
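The passage from the norm bound to the bound on $\sigma_{\min}({\bf H}^t)$ is an instance of Weyl's perturbation inequality, $\sigma_{\min}(H)\geq\sigma_{\min}(M)-\|H-M\|$. The following sketch illustrates it with a random positive semidefinite `M` standing in for ${\bf J}({\bf{x}}^*)^\top{\bf J}({\bf{x}}^*)$ and a small symmetric perturbation standing in for the error of ${\bf H}^t$; all sizes and scales are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4


def sigma_min(X):
    """Smallest singular value of a matrix."""
    return np.linalg.svd(X, compute_uv=False).min()


# Stand-ins: M plays the role of J(x*)^T J(x*), H the role of H^t.
B = rng.standard_normal((10, d))
M = B.T @ B
E = rng.standard_normal((d, d))
E = 0.05 * (E + E.T)  # small symmetric perturbation
H = M + E

perturbation = np.linalg.norm(H - M, 2)  # spectral norm of the error
lower_bound = sigma_min(M) - perturbation
print(sigma_min(H), lower_bound)
```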

D.4 The Proof of Theorem 1

We first show that the update

$${\bf G}^{t+1}={\bf G}^t-{\bf G}^t{\bf U}^t({\bf I}+({\bf V}^t)^\top{\bf G}^t{\bf U}^t)^{-1}({\bf V}^t)^\top{\bf G}^t$$

in the IGN method (Line 9 of Algorithm 1) is well-defined if the matrices ${\bf H}^t$ and ${\bf H}^{t+1}$ are non-singular.

Lemma 12.

Following the setting of Theorem 1, if the matrices ${\bf H}^t$ and ${\bf H}^{t+1}$ are non-singular, then the matrix ${\bf I}+({\bf V}^t)^\top{\bf G}^t{\bf U}^t$ is also non-singular, where

$${\bf U}^t=\begin{bmatrix}-{\bf{g}}_{i_t}({\bf{z}}_{i_t}^t) & {\bf{g}}_{i_t}({\bf{x}}^{t+1})\end{bmatrix}\in{\mathbb{R}}^{d\times 2},\quad{\bf V}^t=\begin{bmatrix}{\bf{g}}_{i_t}({\bf{z}}_{i_t}^t) & {\bf{g}}_{i_t}({\bf{x}}^{t+1})\end{bmatrix}\in{\mathbb{R}}^{d\times 2}\quad\text{and}\quad i_t={t\%n}+1.$$

Proof.

The recursion of ${\bf H}^t$ and the definitions of ${\bf U}^t$ and ${\bf V}^t$ imply

$${\bf H}^{t+1}={\bf H}^t-{\bf{g}}_{i_t}({\bf{z}}_{i_t}^t){\bf{g}}_{i_t}({\bf{z}}_{i_t}^t)^\top+{\bf{g}}_{i_t}({\bf{x}}^{t+1}){\bf{g}}_{i_t}({\bf{x}}^{t+1})^\top={\bf H}^t+{\bf U}^t({\bf V}^t)^\top.$$

Since we assume the matrices ${\bf H}^t$ and ${\bf H}^{t+1}$ are non-singular, applying the matrix determinant lemma [41, Section 9.1.2] to the above equation leads to

$$\det({\bf H}^{t+1})=\det({\bf H}^t+{\bf U}^t({\bf V}^t)^\top)=\det({\bf I}+({\bf V}^t)^\top({\bf H}^t)^{-1}{\bf U}^t)\det({\bf H}^t).$$

Since $\det({\bf H}^{t+1})\neq 0$ and $\det({\bf H}^t)\neq 0$, the definition ${\bf G}^t=({\bf H}^t)^{-1}$ implies

$$\det({\bf I}+({\bf V}^t)^\top{\bf G}^t{\bf U}^t)=\det({\bf I}+({\bf V}^t)^\top({\bf H}^t)^{-1}{\bf U}^t)\neq 0,$$

which finishes the proof. ∎
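The rank-two update of ${\bf G}^t$ and the determinant identity used in this proof can be checked on random data. In the sketch below the gradients are random stand-ins (not the gradients of any particular problem), and the component at index 0 is the one being refreshed; these choices, the sizes and the seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 10, 3

grads = rng.standard_normal((n, d))   # row i: stand-in for g_i(z_i^t)
H = grads.T @ grads                   # H^t = sum_i g_i g_i^T
G = np.linalg.inv(H)                  # G^t = (H^t)^{-1}

g_old = grads[0]                      # stand-in for g_{i_t}(z_{i_t}^t)
g_new = rng.standard_normal(d)        # stand-in for g_{i_t}(x^{t+1})
U = np.column_stack([-g_old, g_new])  # shape (d, 2)
V = np.column_stack([g_old, g_new])   # shape (d, 2)

H_next = H + U @ V.T                  # H^t - g_old g_old^T + g_new g_new^T
small = np.eye(2) + V.T @ G @ U       # the 2 x 2 matrix from Lemma 12
G_next = G - G @ U @ np.linalg.solve(small, V.T @ G)

inverse_matches = np.allclose(G_next, np.linalg.inv(H_next))
det_matches = np.isclose(
    np.linalg.det(H_next), np.linalg.det(small) * np.linalg.det(H)
)
print(inverse_matches, det_matches)
```

Only a $2\times 2$ system is solved in the update, which is the reason for maintaining ${\bf G}^t$ through this formula rather than inverting ${\bf H}^{t+1}$ directly.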

Then we show that, under the non-singularity assumption on $\{{\bf H}^j\}_{j=0}^t$, the distance $\left\|{\bf{x}}^{t+1}-{\bf{x}}^*\right\|$ can be upper bounded.

Lemma 13.

Under Assumptions 1 and 2, we assume matrices {𝐇j}j=0tsuperscriptsubscriptsuperscript𝐇𝑗𝑗0𝑡\{{\bf H}^{j}\}_{j=0}^{t}{ bold_H start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_j = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT are non-singular and run IGN (Algorithm 1), then it holds

𝐱t+1𝐱Lfν1+ν𝐆ti=1n𝐳it𝐱1+ν,normsuperscript𝐱𝑡1superscript𝐱subscript𝐿𝑓subscript𝜈1𝜈normsuperscript𝐆𝑡superscriptsubscript𝑖1𝑛superscriptnormsuperscriptsubscript𝐳𝑖𝑡superscript𝐱1𝜈\displaystyle\left\|{\bf{x}}^{t+1}-{\bf{x}}^{*}\right\|\leq\frac{L_{f}{% \mathcal{H}}_{\nu}}{1+\nu}\left\|{\bf G}^{t}\right\|\sum_{i=1}^{n}\left\|{\bf{% z}}_{i}^{t}-{\bf{x}}^{*}\right\|^{1+\nu},∥ bold_x start_POSTSUPERSCRIPT italic_t + 1 end_POSTSUPERSCRIPT - bold_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ ≤ divide start_ARG italic_L start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT caligraphic_H start_POSTSUBSCRIPT italic_ν end_POSTSUBSCRIPT end_ARG start_ARG 1 + italic_ν end_ARG ∥ bold_G start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ∥ ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ∥ bold_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT - bold_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 1 + italic_ν end_POSTSUPERSCRIPT ,

where $\mathbf{G}^{t}=\left(\mathbf{H}^{t}\right)^{-1}$.

Proof.

Subtracting $\mathbf{x}^{*}$ from both sides of equation (8) and using $f_{i}(\mathbf{x}^{*})=0$ for all $i$, we have

$$\begin{aligned}
\mathbf{x}^{t+1}-\mathbf{x}^{*}
&=\mathbf{G}^{t}\left(\sum_{i=1}^{n}\mathbf{g}_{i}(\mathbf{z}_{i}^{t})\mathbf{g}_{i}(\mathbf{z}_{i}^{t})^{\top}(\mathbf{z}_{i}^{t}-\mathbf{x}^{*})-\sum_{i=1}^{n}f_{i}(\mathbf{z}_{i}^{t})\mathbf{g}_{i}(\mathbf{z}_{i}^{t})\right)\\
&=\mathbf{G}^{t}\left(\sum_{i=1}^{n}\mathbf{g}_{i}(\mathbf{z}_{i}^{t})\mathbf{g}_{i}(\mathbf{z}_{i}^{t})^{\top}(\mathbf{z}_{i}^{t}-\mathbf{x}^{*})-\sum_{i=1}^{n}f_{i}(\mathbf{z}_{i}^{t})\mathbf{g}_{i}(\mathbf{z}_{i}^{t})+\sum_{i=1}^{n}f_{i}(\mathbf{x}^{*})\mathbf{g}_{i}(\mathbf{z}_{i}^{t})\right)\\
&=\mathbf{G}^{t}\sum_{i=1}^{n}\left(\mathbf{g}_{i}(\mathbf{z}_{i}^{t})^{\top}(\mathbf{z}_{i}^{t}-\mathbf{x}^{*})-f_{i}(\mathbf{z}_{i}^{t})+f_{i}(\mathbf{x}^{*})\right)\mathbf{g}_{i}(\mathbf{z}_{i}^{t}).
\end{aligned}$$

Taking norms on both sides of the above identity, we have

$$\begin{aligned}
\left\|\mathbf{x}^{t+1}-\mathbf{x}^{*}\right\|
&=\left\|\mathbf{G}^{t}\sum_{i=1}^{n}\left(\mathbf{g}_{i}(\mathbf{z}_{i}^{t})^{\top}(\mathbf{z}_{i}^{t}-\mathbf{x}^{*})-f_{i}(\mathbf{z}_{i}^{t})+f_{i}(\mathbf{x}^{*})\right)\mathbf{g}_{i}(\mathbf{z}_{i}^{t})\right\|\\
&\leq\left\|\mathbf{G}^{t}\right\|\left\|\sum_{i=1}^{n}\left(\mathbf{g}_{i}(\mathbf{z}_{i}^{t})^{\top}(\mathbf{z}_{i}^{t}-\mathbf{x}^{*})-f_{i}(\mathbf{z}_{i}^{t})+f_{i}(\mathbf{x}^{*})\right)\mathbf{g}_{i}(\mathbf{z}_{i}^{t})\right\|\\
&\leq\frac{L_{f}\mathcal{H}_{\nu}}{1+\nu}\left\|\mathbf{G}^{t}\right\|\sum_{i=1}^{n}\left\|\mathbf{z}_{i}^{t}-\mathbf{x}^{*}\right\|^{1+\nu},
\end{aligned}$$

where the first inequality follows from the submultiplicativity of the matrix norm and the second inequality is based on Lemmas 4 and 5. ∎
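As a numerical sanity check of Lemma 13, the following minimal sketch runs the iteration implied by equation (8) on a hypothetical toy problem that is not taken from the paper: $f_{i}(\mathbf{x})=\sin(\mathbf{a}_{i}^{\top}(\mathbf{x}-\mathbf{x}^{*}))$ with $\nu=1$. The matrix `A`, the point `X_STAR`, the per-component constants `L_f` and `H_nu`, the cyclic refresh order, and the use of a direct linear solve in place of incremental updates of $\mathbf{G}^{t}$ are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy problem (an assumption, not from the paper):
# f_i(x) = sin(a_i^T (x - x_star)), so f(x_star) = 0, ||g_i(x)|| <= ||a_i||,
# and each g_i is Lipschitz with constant ||a_i||^2 (Hoelder exponent nu = 1).
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
X_STAR = np.array([0.5, -0.3, 0.2])


def f(i, x):
    """Value of the i-th component function."""
    return np.sin(A[i] @ (x - X_STAR))


def g(i, x):
    """Gradient of the i-th component function."""
    return np.cos(A[i] @ (x - X_STAR)) * A[i]


def ign_point(Z):
    """Solve H x = sum_i (g_i g_i^T z_i - f_i(z_i) g_i), with H = sum_i g_i g_i^T."""
    n, d = Z.shape
    H = np.zeros((d, d))
    rhs = np.zeros(d)
    for i in range(n):
        gi = g(i, Z[i])
        H += np.outer(gi, gi)
        rhs += gi * (gi @ Z[i]) - f(i, Z[i]) * gi
    return np.linalg.solve(H, rhs), H


def run_ign_check(num_iters=12, radius=0.1):
    """Run the iteration and record (error, Lemma 13 bound) at every step."""
    n, _ = A.shape
    nu = 1.0
    L_f = np.linalg.norm(A, axis=1).max()   # bound on every ||g_i||
    H_nu = L_f ** 2                         # Lipschitz bound on every g_i
    x0 = X_STAR + radius * np.array([1.0, -1.0, 1.0]) / np.sqrt(3.0)
    Z = np.tile(x0, (n, 1))                 # z_i^0 = x^0 for all i
    records = []
    for t in range(num_iters):
        x_next, H = ign_point(Z)
        G_norm = np.linalg.norm(np.linalg.inv(H), 2)   # spectral norm of G^t
        dist = np.linalg.norm(Z - X_STAR, axis=1)
        bound = L_f * H_nu / (1.0 + nu) * G_norm * np.sum(dist ** (1.0 + nu))
        err = np.linalg.norm(x_next - X_STAR)
        records.append((err, bound))
        Z[t % n] = x_next                   # refresh one component per round
    return records


if __name__ == "__main__":
    for t, (err, bound) in enumerate(run_ign_check()):
        print(f"t={t:2d}  error={err:.3e}  Lemma-13 bound={bound:.3e}")
```

On this toy instance the recorded error should stay below the recorded bound at every step, up to floating-point round-off. The bound is typically loose here because the sine residuals happen to have a third-order Taylor remainder.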

We split the results of Theorem 1 into two parts (i.e., Theorems 3 and 4) and provide their proofs as follows. Our analysis is based on the properties of the auxiliary sequence constructed in Section C.

Theorem 3.

Under Assumptions 1, 2 and 3, we run IGN (Algorithm 1) with initialization $\mathbf{x}^{0}\in\mathbb{R}^{d}$ and $\mathbf{H}^{0}=\mathbf{J}(\mathbf{x}^{0})^{\top}\mathbf{J}(\mathbf{x}^{0})$ such that

$$\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|\leq\left(\frac{\mu^{2}}{4L_{f}\mathcal{H}_{\nu}n}\right)^{1/\nu},$$

then it holds

$$\sigma_{\min}\left(\mathbf{I}+(\mathbf{V}^{t})^{\top}(\mathbf{H}^{t})^{-1}\mathbf{U}^{t}\right)>0,\quad\mathbf{H}^{t}\succeq\frac{\mu^{2}}{2}\mathbf{I}\quad\text{and}\quad\left\|\mathbf{x}^{t}-\mathbf{x}^{*}\right\|\leq a_{t+1}(n,\nu)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|$$

for all $t\geq 0$, where the sequence $\{a_{t}(n,\nu)\}_{t\geq 0}$ is defined in equation (23).

Proof.

We first show

$$\mathbf{H}^{t}\succeq\frac{\mu^{2}}{2}\mathbf{I}\qquad\text{and}\qquad\left\|\mathbf{x}^{t}-\mathbf{x}^{*}\right\|\leq a_{t+1}(n,\nu)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|\tag{26}$$

holds for all $t\geq 0$. We split the proof of result (26) into the following three parts.

Part I: For $t=0$, the initialization and the fact $a_{0}=1$ lead to

$$\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|=a_{0}(n,\nu)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|.$$

Part II: For all $t=0,\cdots,n-1$, we use induction to prove

$$\mathbf{H}^{t}\succeq\frac{\mu^{2}}{2}\mathbf{I}\qquad\text{and}\qquad\left\|\mathbf{x}^{t+1}-\mathbf{x}^{*}\right\|\leq a_{t+1}(n,\nu)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|.\tag{27}$$

For the induction base, we can apply Lemma 2 to verify

$$\begin{aligned}
\sigma_{\min}(\mathbf{H}^{0})
&\geq\mu^{2}-2L_{f}\mathcal{H}_{\nu}\sum_{i=1}^{n}\left\|\mathbf{z}_{i}^{0}-\mathbf{x}^{*}\right\|^{\nu}\\
&=\mu^{2}-2L_{f}\mathcal{H}_{\nu}\sum_{i=1}^{n}\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{\nu}\\
&\geq\mu^{2}-2L_{f}\mathcal{H}_{\nu}n\cdot\frac{\mu^{2}}{4L_{f}\mathcal{H}_{\nu}n}\\
&=\mu^{2}-\frac{\mu^{2}}{2}\\
&=\frac{\mu^{2}}{2}.
\end{aligned}$$

This implies

$$\mathbf{H}^{0}\succeq\frac{\mu^{2}}{2}\mathbf{I}\qquad\text{and}\qquad\left\|\mathbf{G}^{0}\right\|=\left\|(\mathbf{H}^{0})^{-1}\right\|\leq\frac{2}{\mu^{2}}.\tag{28}$$
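The norm bound in (28) relies on a standard fact, spelled out here for completeness under the assumption that $\|\cdot\|$ denotes the spectral norm. Since $\mathbf{H}^{0}=\mathbf{J}(\mathbf{x}^{0})^{\top}\mathbf{J}(\mathbf{x}^{0})$ is symmetric with smallest eigenvalue at least $\mu^{2}/2>0$, it is positive definite and

$$\left\|(\mathbf{H}^{0})^{-1}\right\|=\frac{1}{\lambda_{\min}(\mathbf{H}^{0})}\leq\frac{1}{\mu^{2}/2}=\frac{2}{\mu^{2}}.$$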

According to Lemma 13, we have

$$\begin{aligned}
\left\|\mathbf{x}^{1}-\mathbf{x}^{*}\right\|
&\leq\frac{L_{f}\mathcal{H}_{\nu}}{1+\nu}\left\|\mathbf{G}^{0}\right\|\sum_{i=1}^{n}\left\|\mathbf{z}_{i}^{0}-\mathbf{x}^{*}\right\|^{1+\nu}\\
&\leq\frac{L_{f}\mathcal{H}_{\nu}}{1+\nu}\cdot\frac{2}{\mu^{2}}\cdot\sum_{i=1}^{n}\left\|\mathbf{z}_{i}^{0}-\mathbf{x}^{*}\right\|^{1+\nu}\\
&=\frac{L_{f}\mathcal{H}_{\nu}}{1+\nu}\cdot\frac{2}{\mu^{2}}\cdot n\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{1+\nu}\\
&\leq\frac{nL_{f}\mathcal{H}_{\nu}}{1+\nu}\cdot\frac{2}{\mu^{2}}\cdot\frac{\mu^{2}}{4L_{f}\mathcal{H}_{\nu}n}\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|\\
&=\frac{1}{2(1+\nu)}\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|\\
&=a_{1}(n,\nu)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|,
\end{aligned}$$

where the second inequality is based on equation (28) and the last inequality is based on the initial condition. Therefore, the induction base holds.
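For the reader's convenience, the step that invokes the initial condition can be written out as follows; it only uses the bound $\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{\nu}\leq\mu^{2}/(4L_{f}\mathcal{H}_{\nu}n)$ and elementary arithmetic:

$$\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{1+\nu}=\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{\nu}\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|\leq\frac{\mu^{2}}{4L_{f}\mathcal{H}_{\nu}n}\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|,\qquad\frac{nL_{f}\mathcal{H}_{\nu}}{1+\nu}\cdot\frac{2}{\mu^{2}}\cdot\frac{\mu^{2}}{4L_{f}\mathcal{H}_{\nu}n}=\frac{1}{2(1+\nu)}.$$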

For the induction step, we assume

$$\mathbf{H}^{j}\succeq\frac{\mu^{2}}{2}\mathbf{I}\qquad\text{and}\qquad\left\|\mathbf{x}^{j+1}-\mathbf{x}^{*}\right\|\leq a_{j+1}(n,\nu)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|$$

hold for all $j=2,\cdots,t-1$ such that $t\leq n-1$. Therefore, the update (9) means

$$\mathbf{z}_{i}^{t}=\begin{cases}\mathbf{x}^{i},&1\leq i\leq t,\\ \mathbf{x}^{0},&t<i\leq n.\end{cases}\tag{29}$$
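The pattern in (29) can be illustrated with a small bookkeeping sketch. It assumes, consistently with (29) but without access to the exact form of update (9), that the $t$-th round refreshes component $i=((t-1)\bmod n)+1$ with the newest iterate $\mathbf{x}^{t}$; the function name `memory_pattern` is hypothetical.

```python
def memory_pattern(n, t):
    """Return labels j such that z_i^t = x^j for i = 1, ..., n.

    Assumption: round s (1 <= s <= t) overwrites component ((s - 1) % n) + 1
    with the newest iterate x^s, and every z_i starts at x^0.
    """
    labels = [0] * n                  # z_i^0 = x^0 for all i
    for s in range(1, t + 1):
        labels[(s - 1) % n] = s       # component refreshed in round s stores x^s
    return labels


if __name__ == "__main__":
    # With n = 5 components and t = 3 rounds, the first three components hold
    # x^1, x^2, x^3 and the remaining ones still hold x^0, matching (29).
    print(memory_pattern(5, 3))
```

For $t\leq n-1$ the labels are $1,\dots,t$ followed by zeros, which is exactly the case distinction used in the induction step.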

The induction hypothesis leads to

$$\left\|\mathbf{x}^{j}-\mathbf{x}^{*}\right\|^{\nu}\leq(a_{j}(n,\nu))^{\nu}\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{\nu}\leq\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{\nu}\leq\frac{\mu^{2}}{4L_{f}\mathcal{H}_{\nu}n},$$

for $j=1,\cdots,t-1$, where the second inequality is based on Lemma 6 and the third comes from the initial condition. Combining this with (29), we obtain

$$\left\|\mathbf{z}_{i}^{t}-\mathbf{x}^{*}\right\|^{\nu}\leq\frac{\mu^{2}}{4L_{f}\mathcal{H}_{\nu}n}.$$

According to Lemma 2, we have

$$\begin{aligned}
\sigma_{\min}(\mathbf{H}^{t})
&\geq\mu^{2}-2L_{f}\mathcal{H}_{\nu}\sum_{i=1}^{n}\left\|\mathbf{z}_{i}^{t}-\mathbf{x}^{*}\right\|^{\nu}\\
&\geq\mu^{2}-2L_{f}\mathcal{H}_{\nu}n\cdot\frac{\mu^{2}}{4L_{f}\mathcal{H}_{\nu}n}\\
&=\mu^{2}-\frac{\mu^{2}}{2}\\
&=\frac{\mu^{2}}{2},
\end{aligned}$$

where the second inequality comes from the initial condition. Therefore, we have

$$\mathbf{H}^{t}\succeq\frac{\mu^{2}}{2}\mathbf{I}\qquad\text{and}\qquad\left\|\mathbf{G}^{t}\right\|=\left\|(\mathbf{H}^{t})^{-1}\right\|\leq\frac{2}{\mu^{2}}.$$

According to Lemma 13, we have

$$\begin{aligned}
\left\|\mathbf{x}^{t+1}-\mathbf{x}^{*}\right\|
&\leq\frac{L_{f}\mathcal{H}_{\nu}}{1+\nu}\left\|\mathbf{G}^{t}\right\|\sum_{i=1}^{n}\left\|\mathbf{z}_{i}^{t}-\mathbf{x}^{*}\right\|^{1+\nu}\\
&\leq\frac{L_{f}\mathcal{H}_{\nu}}{1+\nu}\cdot\frac{2}{\mu^{2}}\sum_{i=1}^{n}\left\|\mathbf{z}_{i}^{t}-\mathbf{x}^{*}\right\|^{1+\nu}\\
&\leq\frac{2L_{f}\mathcal{H}_{\nu}}{(1+\nu)\mu^{2}}\left(\sum_{j=1}^{t}\left\|\mathbf{x}^{j}-\mathbf{x}^{*}\right\|^{1+\nu}+(n-t)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{1+\nu}\right)\\
&\leq\frac{2L_{f}\mathcal{H}_{\nu}}{(1+\nu)\mu^{2}}\left(\sum_{j=1}^{t}(a_{j}(n,\nu))^{1+\nu}\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{1+\nu}+(n-t)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{1+\nu}\right)\\
&\leq\frac{2L_{f}\mathcal{H}_{\nu}}{(1+\nu)\mu^{2}}\cdot\frac{\mu^{2}}{4L_{f}\mathcal{H}_{\nu}n}\left(\sum_{j=1}^{t}(a_{j}(n,\nu))^{1+\nu}+n-t\right)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|
\end{aligned}$$
=12(1+ν)n(j=1t(aj(n,ν))1+ν+nt)𝐱0𝐱absent121𝜈𝑛superscriptsubscript𝑗1𝑡superscriptsubscript𝑎𝑗𝑛𝜈1𝜈𝑛𝑡normsuperscript𝐱0superscript𝐱\displaystyle=\frac{1}{2(1+\nu)n}\left(\sum_{j=1}^{t}(a_{j}(n,\nu))^{1+\nu}+n-% t\right)\left\|{\bf{x}}^{0}-{\bf{x}}^{*}\right\|= divide start_ARG 1 end_ARG start_ARG 2 ( 1 + italic_ν ) italic_n end_ARG ( ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ( italic_a start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_n , italic_ν ) ) start_POSTSUPERSCRIPT 1 + italic_ν end_POSTSUPERSCRIPT + italic_n - italic_t ) ∥ bold_x start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT - bold_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥
=12(1+ν)n(j=0t(aj(n,ν))1+ν+nt1)𝐱0𝐱absent121𝜈𝑛superscriptsubscript𝑗0𝑡superscriptsubscript𝑎𝑗𝑛𝜈1𝜈𝑛𝑡1normsuperscript𝐱0superscript𝐱\displaystyle=\frac{1}{2(1+\nu)n}\left(\sum_{j=0}^{t}(a_{j}(n,\nu))^{1+\nu}+n-% t-1\right)\left\|{\bf{x}}^{0}-{\bf{x}}^{*}\right\|= divide start_ARG 1 end_ARG start_ARG 2 ( 1 + italic_ν ) italic_n end_ARG ( ∑ start_POSTSUBSCRIPT italic_j = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ( italic_a start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_n , italic_ν ) ) start_POSTSUPERSCRIPT 1 + italic_ν end_POSTSUPERSCRIPT + italic_n - italic_t - 1 ) ∥ bold_x start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT - bold_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥
=at+1(n,ν)𝐱0𝐱,absentsubscript𝑎𝑡1𝑛𝜈normsuperscript𝐱0superscript𝐱\displaystyle=a_{t+1}(n,\nu)\left\|{\bf{x}}^{0}-{\bf{x}}^{*}\right\|,= italic_a start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT ( italic_n , italic_ν ) ∥ bold_x start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT - bold_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ ,

where the last inequality uses the initial condition $\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{\nu}\leq\mu^2/(4L_f\mathcal{H}_\nu n)$ and the second-to-last equality uses the fact $a_0(n,\nu)=1$. This completes the induction.

Part III: For all $t\geq n$, we use induction to prove

$$
\mathbf{H}^{t}\succeq\frac{\mu^2}{2}\mathbf{I}\qquad\text{and}\qquad\left\|\mathbf{x}^{t+1}-\mathbf{x}^{*}\right\|\leq a_{t+1}(n,\nu)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|.
$$

For the induction base, we can verify that it holds (from the result of Part II)

$$
\mathbf{H}^{j}\succeq\frac{\mu^2}{2}\mathbf{I}\qquad\text{for all } j=0,\dots,n-1,
$$

and

$$
\left\|\mathbf{x}^{j}-\mathbf{x}^{*}\right\|\leq a_j(n,\nu)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|\qquad\text{for all } j=1,\dots,n.
$$

Then we have

$$
\left\|\mathbf{x}^{j}-\mathbf{x}^{*}\right\|^{\nu}\leq(a_j(n,\nu))^{\nu}\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{\nu}\leq\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{\nu}\leq\frac{\mu^2}{4L_f\mathcal{H}_\nu n},\quad\text{for all } j=1,\dots,n,
$$

where the second inequality is based on Lemma 6 and the third inequality is based on the initial condition.

From Eq. 9, we have

$$
\mathbf{z}_i^{n}=\mathbf{x}^{i}\qquad\text{for all } i\in[n].
$$

Therefore, we have

$$
\left\|\mathbf{z}_i^{n}-\mathbf{x}^{*}\right\|^{\nu}\leq\frac{\mu^2}{4L_f\mathcal{H}_\nu n},\qquad\text{for all } i\in[n].
$$

According to Lemma 2, we have

$$
\begin{aligned}
\sigma_{\min}(\mathbf{H}^{n})
&\geq \mu^2-2L_f\mathcal{H}_\nu\sum_{i=1}^{n}\left\|\mathbf{z}_i^{n}-\mathbf{x}^{*}\right\|^{\nu} \\
&\geq \mu^2-2L_f\mathcal{H}_\nu n\cdot\frac{\mu^2}{4L_f\mathcal{H}_\nu n} \\
&\geq \mu^2-\frac{\mu^2}{2}=\frac{\mu^2}{2},
\end{aligned}
$$

which implies

$$
\mathbf{H}^{n}\succeq\frac{\mu^2}{2}\mathbf{I}\qquad\text{and}\qquad\left\|\mathbf{G}^{n}\right\|=\left\|(\mathbf{H}^{n})^{-1}\right\|\leq\frac{2}{\mu^2}.
$$

According to Lemma 13, we have

$$
\begin{aligned}
\left\|\mathbf{x}^{n+1}-\mathbf{x}^{*}\right\|
&\leq \frac{L_f\mathcal{H}_\nu}{1+\nu}\left\|\mathbf{G}^{n}\right\|\sum_{i=1}^{n}\left\|\mathbf{z}_i^{n}-\mathbf{x}^{*}\right\|^{1+\nu} \\
&\leq \frac{2L_f\mathcal{H}_\nu}{(1+\nu)\mu^2}\sum_{i=1}^{n}\left\|\mathbf{z}_i^{n}-\mathbf{x}^{*}\right\|^{1+\nu} \\
&\leq \frac{2L_f\mathcal{H}_\nu}{(1+\nu)\mu^2}\left(\sum_{j=1}^{n}(a_j(n,\nu))^{1+\nu}\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{1+\nu}\right) \\
&\leq \frac{2L_f\mathcal{H}_\nu}{(1+\nu)\mu^2}\cdot\frac{\mu^2}{4L_f\mathcal{H}_\nu n}\left(\sum_{j=1}^{n}(a_j(n,\nu))^{1+\nu}\right)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\| \\
&= \frac{1}{2(1+\nu)n}\left(\sum_{j=1}^{n}(a_j(n,\nu))^{1+\nu}\right)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\| \\
&= a_{n+1}(n,\nu)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|.
\end{aligned}
$$

Hence, we have shown the induction base holds.

For the induction step, we assume

$$
\mathbf{H}^{j}\succeq\frac{\mu^2}{2}\mathbf{I}\qquad\text{and}\qquad\left\|\mathbf{x}^{j+1}-\mathbf{x}^{*}\right\|\leq a_{j+1}(n,\nu)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|
$$

holds for all $j=n,\dots,t-1$, where $t\geq n+1$ (the case $j=n$ is the induction base). Combining this with the results of Parts I and II, we have

$$
\left\|\mathbf{x}^{j}-\mathbf{x}^{*}\right\|\leq a_j(n,\nu)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|\qquad\text{for all } j=0,\dots,t,
$$

which implies

$$
\left\|\mathbf{x}^{j}-\mathbf{x}^{*}\right\|^{\nu}\leq(a_j(n,\nu))^{\nu}\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{\nu}\leq\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{\nu}\leq\frac{\mu^2}{4L_f\mathcal{H}_\nu n},\qquad\text{for all } j=1,\dots,t,
$$

where the second inequality is based on Lemma 6 and the last inequality is based on the initial condition.

The update (9) means the points $\{\mathbf{z}_i^{t}\}_{i=1}^{n}$ can be written as $\{\mathbf{x}^{t+1-n},\cdots,\mathbf{x}^{t}\}$, which implies

$$
\max\left\{\left\|\mathbf{z}_1^{t}-\mathbf{x}^{*}\right\|,\cdots,\left\|\mathbf{z}_n^{t}-\mathbf{x}^{*}\right\|\right\}=\max\left\{\left\|\mathbf{x}^{t+1-n}-\mathbf{x}^{*}\right\|,\cdots,\left\|\mathbf{x}^{t}-\mathbf{x}^{*}\right\|\right\}.
$$

Therefore, we have

$$
\left\|\mathbf{z}_i^{t}-\mathbf{x}^{*}\right\|^{\nu}\leq\frac{\mu^2}{4L_f\mathcal{H}_\nu n}\qquad\text{for all } i=1,\dots,n.
$$
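The window structure used in this step can be checked with a small bookkeeping sketch. It assumes a cyclic rule in which every slot starts at $\mathbf{x}^{0}$ and, at iteration $j$, the slot with index $((j-1) \bmod n)+1$ is overwritten by $\mathbf{x}^{j}$. This indexing is our reading of update (9), not a restatement of it, and the helper name `memory_after` is hypothetical.

```python
def memory_after(t, n):
    """Return the labels j of the iterates x^j held in z_1, ..., z_n after t updates.

    Assumption (not taken verbatim from the paper): all slots start at x^0 and
    iteration j overwrites slot ((j - 1) mod n) + 1 with x^j.
    """
    z = [0] * n  # label 0 stands for the initial point x^0
    for j in range(1, t + 1):
        z[(j - 1) % n] = j  # cyclic replacement of a single slot
    return z


if __name__ == "__main__":
    n = 5
    # Part II regime (t < n): t slots hold x^1..x^t, the other n - t hold x^0.
    print(memory_after(3, n))
    # After one full pass (t = n): slot i holds x^i.
    print(memory_after(n, n))
    # Part III regime (t >= n): the slots hold the last n iterates.
    print(sorted(memory_after(12, n)))
```

Under this assumed rule, the three regimes match the three sums appearing in the proof: $t$ recent iterates plus $n-t$ copies of $\mathbf{x}^{0}$ for $t<n$, then $\mathbf{z}_i^{n}=\mathbf{x}^{i}$, and finally the sliding window $\{\mathbf{x}^{t+1-n},\cdots,\mathbf{x}^{t}\}$.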

Combining with Lemma 2, we have

$$
\begin{aligned}
\sigma_{\min}(\mathbf{H}^{t})
&\geq \mu^2-2L_f\mathcal{H}_\nu\sum_{i=1}^{n}\left\|\mathbf{z}_i^{t}-\mathbf{x}^{*}\right\|^{\nu} \\
&\geq \mu^2-2L_f\mathcal{H}_\nu n\cdot\frac{\mu^2}{4L_f\mathcal{H}_\nu n} \\
&= \mu^2-\frac{\mu^2}{2}=\frac{\mu^2}{2}.
\end{aligned}
$$

Therefore, we obtain

$$
\mathbf{H}^{t}\succeq\frac{\mu^2}{2}\mathbf{I}\qquad\text{and}\qquad\left\|\mathbf{G}^{t}\right\|=\left\|(\mathbf{H}^{t})^{-1}\right\|\leq\frac{2}{\mu^2}.
$$

According to Lemma 13, we have

$$
\begin{aligned}
\left\|\mathbf{x}^{t+1}-\mathbf{x}^{*}\right\|
&\leq \frac{L_f\mathcal{H}_\nu}{1+\nu}\left\|\mathbf{G}^{t}\right\|\sum_{i=1}^{n}\left\|\mathbf{z}_i^{t}-\mathbf{x}^{*}\right\|^{1+\nu} \\
&\leq \frac{2L_f\mathcal{H}_\nu}{(1+\nu)\mu^2}\sum_{i=1}^{n}\left\|\mathbf{z}_i^{t}-\mathbf{x}^{*}\right\|^{1+\nu} \\
&\leq \frac{2L_f\mathcal{H}_\nu}{(1+\nu)\mu^2}\left(\sum_{j=t-n+1}^{t}(a_j(n,\nu))^{1+\nu}\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{1+\nu}\right) \\
&= \frac{2L_f\mathcal{H}_\nu}{(1+\nu)\mu^2}\left(\sum_{j=t-n+1}^{t}(a_j(n,\nu))^{1+\nu}\right)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|^{1+\nu} \\
&\leq \frac{2L_f\mathcal{H}_\nu}{(1+\nu)\mu^2}\cdot\frac{\mu^2}{4L_f\mathcal{H}_\nu n}\left(\sum_{j=t-n+1}^{t}(a_j(n,\nu))^{1+\nu}\right)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\| \\
&= \frac{1}{2(1+\nu)n}\left(\sum_{j=t-n+1}^{t}(a_j(n,\nu))^{1+\nu}\right)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\| \\
&= a_{t+1}(n,\nu)\left\|\mathbf{x}^{0}-\mathbf{x}^{*}\right\|.
\end{aligned}
$$

Hence, we finish the induction.

Combining the results of Parts I, II and III completes the proof of (26).
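As a numerical illustration of the contraction factors in this bound, the following sketch evaluates $a_t(n,\nu)$ with the recursion read off from the two induction arguments above: $a_0=1$, then $a_{t+1}=\frac{1}{2(1+\nu)n}\big(\sum_{j=0}^{t}a_j^{1+\nu}+n-t-1\big)$ for $t<n$, and $a_{t+1}=\frac{1}{2(1+\nu)n}\sum_{j=t-n+1}^{t}a_j^{1+\nu}$ for $t\geq n$. This recursion is an assumption reconstructed from the proof rather than a restatement of (23), and the function name `a_sequence` is hypothetical.

```python
def a_sequence(n, nu, T):
    """Return [a_0, ..., a_T] for the recursion reconstructed from the proof.

    Assumption: a_0 = 1 and
      a_{t+1} = coef * (sum_{j=0}^{t} a_j^(1+nu) + n - t - 1)   for t <  n,
      a_{t+1} = coef * (sum_{j=t-n+1}^{t} a_j^(1+nu))           for t >= n,
    with coef = 1 / (2 * (1 + nu) * n).
    """
    coef = 1.0 / (2.0 * (1.0 + nu) * n)
    a = [1.0]
    for t in range(T):
        if t < n:
            # First pass: n - t - 1 slots still hold the initial point x^0.
            total = sum(x ** (1.0 + nu) for x in a[: t + 1]) + (n - t - 1)
        else:
            # Later passes: only the last n iterates contribute.
            total = sum(x ** (1.0 + nu) for x in a[t - n + 1 : t + 1])
        a.append(coef * total)
    return a


if __name__ == "__main__":
    for t, value in enumerate(a_sequence(n=5, nu=0.5, T=30)):
        print(t, value)
```

For instance, with $n=5$ and $\nu=0.5$ the sequence starts at $1$, drops to $1/(2(1+\nu))=1/3$ after the first step, and is non-increasing afterwards, which is the behaviour the induction relies on when it bounds $(a_j(n,\nu))^{\nu}$ by $1$.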

Since the non-singularity of $\mathbf{H}^{t}$ and $\mathbf{H}^{t+1}$ has been verified by result (26), we can apply Lemma 12 to obtain

$$
\sigma_{\min}\left(\mathbf{I}+(\mathbf{V}^{t})^{\top}(\mathbf{H}^{t})^{-1}\mathbf{U}^{t}\right)>0.
$$

Theorem 4.

We define the sequence $\{r_t\}_{t\geq 0}$ such that

\[
r_t \triangleq \begin{cases}
\max\{\left\|\mathbf{x}^0-\mathbf{x}^*\right\|, 1\}, & t=0,\\
a_t(n,\nu)\,r_0, & t\geq 1,
\end{cases}
\]

where the sequence $\{a_t(n,\nu)\}_{t\geq 0}$ is defined by equation (23). Under Assumptions 1, 2 and 3, running IGN (Algorithm 1) with the initial condition shown in Theorem 3, we have

\[
\left\|\mathbf{x}^t-\mathbf{x}^*\right\|\leq r_t \qquad\text{and}\qquad r_{t+1}\leq c^{(1+\nu)^{\left(\left\lfloor t/n\right\rfloor-1\right)}}\,r_t \tag{30}
\]

for all $t\geq n$, where

\[
c=1-\frac{1}{n}\left(1-\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}\right).
\]
Proof.

The definition of $\{r_t\}_{t\geq 0}$ leads to

\[
r_0=\max\{\left\|\mathbf{x}^0-\mathbf{x}^*\right\|, 1\}\geq\left\|\mathbf{x}^0-\mathbf{x}^*\right\|.
\]

According to Theorem 3, we have

\[
\left\|\mathbf{x}^t-\mathbf{x}^*\right\|\leq a_t(n,\nu)\left\|\mathbf{x}^0-\mathbf{x}^*\right\|\leq a_t(n,\nu)\,r_0=r_t.
\]

According to Lemma 11, we have

\[
a_{t+1}(n,\nu)\leq c^{(1+\nu)^{\left(\left\lfloor t/n\right\rfloor-1\right)}}a_t(n,\nu)\qquad\text{for all}~~t\geq n.
\]

Thus, we obtain

\[
r_{t+1}=a_{t+1}(n,\nu)\,r_0\leq c^{(1+\nu)^{\left(\left\lfloor t/n\right\rfloor-1\right)}}a_t(n,\nu)\,r_0=c^{(1+\nu)^{\left(\left\lfloor t/n\right\rfloor-1\right)}}r_t\qquad\text{for all}~~t\geq n,
\]

where

\[
c=1-\frac{1}{n}\left(1-\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}\right).
\]

Combining the results of Theorems 3 and 4, we finish the proof of Theorem 1.
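The effect of bound (30) can be illustrated numerically. The following is a minimal Python sketch that evaluates the per-iteration factor $c^{(1+\nu)^{\lfloor t/n\rfloor-1}}$ over a few epochs. The function name `step_factor` and the values $n=5$, $\nu=1$ are assumptions chosen for illustration only; they are not taken from the paper's experiments.

```python
def step_factor(t: int, n: int, nu: float) -> float:
    """Per-iteration factor c**((1+nu)**(t//n - 1)) from bound (30), stated for t >= n."""
    if t < n:
        raise ValueError("bound (30) is stated for t >= n")
    c = 1.0 - (1.0 - (1.0 / (2.0 * (1.0 + nu))) ** (1.0 + nu)) / n
    return c ** ((1.0 + nu) ** (t // n - 1))


# Illustrative values (assumptions): n = 5 components, Hoelder exponent nu = 1.
n, nu = 5, 1.0
factors = [step_factor(t, n, nu) for t in range(n, 5 * n)]

# The factor is constant within an epoch of n iterations
# and shrinks from one epoch to the next.
for epoch in range(4):
    print(f"epoch {epoch + 1}: factor = {factors[epoch * n]:.6f}")
```

Because the exponent $(1+\nu)^{\lfloor t/n\rfloor-1}$ grows geometrically with the epoch index, the factor itself tends to zero, which is what distinguishes this bound from a linear rate with a fixed factor.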

D.5 Proof of Corollary 1

Proof.

According to Theorem 1, we have

\[
r_{t+1}\leq c^{(1+\nu)^{\left(\left\lfloor t/n\right\rfloor-1\right)}}r_t\qquad\text{with}\qquad c=1-\frac{1}{n}\left(1-\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}\right)
\]

for all $\nu\in(0,1]$. Since $c$ is monotonically decreasing in $\nu$, taking the limit $\nu\to 0$ and the value at $\nu=1$ gives

\[
1-\frac{1}{2n}>c\geq 1-\frac{15}{16n},
\]

which implies

\[
r_{t+1}\leq\Big(1-\frac{1}{2n}\Big)^{(1+\nu)^{(\left\lfloor t/n\right\rfloor-1)}}r_t
\]

for all $t\geq n$.
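The two-sided bound on $c$ used above can be checked numerically. The sketch below evaluates $c$ on a grid of $\nu\in(0,1]$ and confirms that it is decreasing and stays within $[1-\frac{15}{16n},\,1-\frac{1}{2n})$. The helper name `contraction_base`, the grid, and the choice $n=10$ are assumptions made for illustration.

```python
def contraction_base(n: int, nu: float) -> float:
    """c = 1 - (1/n) * (1 - (1/(2(1+nu)))**(1+nu)), as defined in Theorem 4."""
    return 1.0 - (1.0 - (1.0 / (2.0 * (1.0 + nu))) ** (1.0 + nu)) / n


n = 10                                   # illustrative number of components
nus = [k / 100 for k in range(1, 101)]   # grid over (0, 1]
cs = [contraction_base(n, nu) for nu in nus]

upper = 1 - 1 / (2 * n)                  # approached as nu -> 0, never attained
lower = 1 - 15 / (16 * n)                # attained at nu = 1

is_decreasing = all(a > b for a, b in zip(cs, cs[1:]))
within_bounds = all(lower - 1e-12 <= c < upper for c in cs)
print(is_decreasing, within_bounds)
```

A grid check is of course not a proof; it only illustrates the monotonicity claim that the argument relies on.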

D.6 Proof of Corollary 2

Proof.

According to the definition of $\{r_t\}_{t\geq 0}$ and Theorem 4, we have

\[
r_0=\max\{\left\|\mathbf{x}^0-\mathbf{x}^*\right\|, 1\}\geq 1.
\]

Combining with Lemma 8, we have

\[
\begin{aligned}
r_t &= a_t(n,\nu)\,r_0 \\
&\leq \frac{1}{2(1+\nu)}(a_{t-n}(n,\nu))^{1+\nu}\,r_0 \\
&= \frac{1}{2(1+\nu)\,r_0^{\nu}}(a_{t-n}(n,\nu))^{1+\nu}\,r_0^{1+\nu} \\
&= \frac{1}{2(1+\nu)\,r_0^{\nu}}\,r_{t-n}^{1+\nu} \\
&\leq \frac{1}{2(1+\nu)}\,r_{t-n}^{1+\nu}
\end{aligned}
\]

for all $t\geq n$. This leads to

\[
r_t\leq\frac{1}{4}\,r_{t-n}^{2}
\]

in the case of $\nu=1$. ∎
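To see what the recursion $r_t\leq\frac{1}{2(1+\nu)}r_{t-n}^{1+\nu}$ gives across epochs, one can iterate it with equality as a worst case. The sketch below does so for $\nu=1$ and compares the result with the closed form $4\,(r_0/4)^{2^k}$, which follows by substituting $u_k=r_{kn}/4$ so that $u_{k+1}\leq u_k^2$. The function name `epoch_bounds` and the starting value $r_0=1$ are illustrative assumptions.

```python
def epoch_bounds(r0: float, nu: float, epochs: int) -> list:
    """Iterate s_{k+1} = s_k**(1+nu) / (2*(1+nu)), the worst case of the recursion."""
    s = [r0]
    for _ in range(epochs):
        s.append(s[-1] ** (1 + nu) / (2 * (1 + nu)))
    return s


# Illustrative starting value (assumption): r0 = 1, the smallest value allowed by r0 >= 1.
r0, nu, epochs = 1.0, 1.0, 4
bounds = epoch_bounds(r0, nu, epochs)

# For nu = 1 the recursion unrolls to s_k = 4 * (r0 / 4) ** (2 ** k).
closed_form = [4 * (r0 / 4) ** (2 ** k) for k in range(epochs + 1)]

for k, (b, c) in enumerate(zip(bounds, closed_form)):
    print(f"after {k} epochs: recursion = {b:.3e}, closed form = {c:.3e}")
```

The exponent doubles every $n$ iterations, so the bound decays quadratically per epoch whenever $r_0<4$.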

Appendix E The Convergence Analysis for MB-IGN

In this section, we analyze the convergence of MB-IGN (Algorithm 2). Most of the proofs in this section follow the analysis in Section D; we provide the details for completeness.

E.1 The Additional Lemma for Gram Matrix

We provide a bound for the spectrum of the matrix $\mathbf{H}^t$ in the MB-IGN method as follows.

Lemma 14.

Under Assumptions 1, 2 and 3, running MB-IGN (Algorithm 2) with batch size $k$, $\mathbf{H}^0=\mathbf{J}(\mathbf{x}^0)^\top\mathbf{J}(\mathbf{x}^0)$ and $\mathbf{G}^0=(\mathbf{H}^0)^{-1}$, it holds that

\[
\sigma_{\min}(\mathbf{H}^t)\geq\mu^2-2kL_f\mathcal{H}_\nu\sum_{i=1}^{m}\left\|\mathbf{z}_i^t-\mathbf{x}^*\right\|^{\nu}
\]

for all $t\geq 0$, where $m=\lceil n/k\rceil$.

Proof.

We have

\[
\begin{aligned}
\left\|\mathbf{H}^t-\mathbf{J}(\mathbf{x}^*)^\top\mathbf{J}(\mathbf{x}^*)\right\|
&=\left\|\sum_{i=1}^{m}\sum_{j\in\mathcal{S}_i}\mathbf{g}_j(\mathbf{z}_i^t)\mathbf{g}_j(\mathbf{z}_i^t)^\top-\sum_{i=1}^{m}\sum_{j\in\mathcal{S}_i}\mathbf{g}_j(\mathbf{x}^*)\mathbf{g}_j(\mathbf{x}^*)^\top\right\| \\
&\leq\sum_{i=1}^{m}\sum_{j\in\mathcal{S}_i}\left\|\mathbf{g}_j(\mathbf{z}_i^t)\mathbf{g}_j(\mathbf{z}_i^t)^\top-\mathbf{g}_j(\mathbf{x}^*)\mathbf{g}_j(\mathbf{x}^*)^\top\right\| \\
&\leq\sum_{i=1}^{m}2|\mathcal{S}_i|L_f\mathcal{H}_\nu\left\|\mathbf{z}_i^t-\mathbf{x}^*\right\|^{\nu} \\
&\leq\sum_{i=1}^{m}2kL_f\mathcal{H}_\nu\left\|\mathbf{z}_i^t-\mathbf{x}^*\right\|^{\nu},
\end{aligned}
\]

where the first inequality comes from the triangle inequality, the second inequality is based on Lemma 1, and the last inequality uses $|\mathcal{S}_i|\leq k$. Thus, we have

\[
\mathbf{H}^t-\mathbf{J}(\mathbf{x}^*)^\top\mathbf{J}(\mathbf{x}^*)\succeq-\sum_{i=1}^{m}2kL_f\mathcal{H}_\nu\left\|\mathbf{z}_i^t-\mathbf{x}^*\right\|^{\nu}\cdot\mathbf{I},
\]

which implies that

\[
\sigma_{\min}(\mathbf{H}^t)\geq\sigma_{\min}\left(\mathbf{J}(\mathbf{x}^*)^\top\mathbf{J}(\mathbf{x}^*)\right)-\sum_{i=1}^{m}2kL_f\mathcal{H}_\nu\left\|\mathbf{z}_i^t-\mathbf{x}^*\right\|^{\nu}=\mu^2-2kL_f\mathcal{H}_\nu\sum_{i=1}^{m}\left\|\mathbf{z}_i^t-\mathbf{x}^*\right\|^{\nu},
\]

where the last step is based on Proposition 1. ∎

E.2 Proof of Theorem 2

Similarly, we show that the update

\[
\mathbf{G}^{t+1}=\mathbf{G}^t-\mathbf{G}^t\mathbf{U}^t\left(\mathbf{I}+(\mathbf{V}^t)^\top\mathbf{G}^t\mathbf{U}^t\right)^{-1}(\mathbf{V}^t)^\top\mathbf{G}^t
\]

in the MB-IGN method (Line 10 of Algorithm 2) is well-defined if the matrices $\mathbf{H}^t$ and $\mathbf{H}^{t+1}$ are non-singular.

Lemma 15.

Following the setting of Theorem 2, if the matrices $\mathbf{H}^t$ and $\mathbf{H}^{t+1}$ are non-singular, then the matrix $\mathbf{I}+(\mathbf{V}^t)^\top\mathbf{G}^t\mathbf{U}^t$ is also non-singular, where

\[
\begin{cases}
\mathbf{U}^t=\Big[-\mathbf{g}_{j_1}(\mathbf{z}_{i_t}^t),~~\mathbf{g}_{j_1}(\mathbf{x}^{t+1}),~\cdots~,~-\mathbf{g}_{j_{|\mathcal{S}_{i_t}|}}(\mathbf{z}_{i_t}^t),~~\mathbf{g}_{j_{|\mathcal{S}_{i_t}|}}(\mathbf{x}^{t+1})\Big],\\[5.69046pt]
\mathbf{V}^t=\Big[\mathbf{g}_{j_1}(\mathbf{z}_{i_t}^t),~~\mathbf{g}_{j_1}(\mathbf{x}^{t+1}),~\cdots~,~\mathbf{g}_{j_{|\mathcal{S}_{i_t}|}}(\mathbf{z}_{i_t}^t),~~\mathbf{g}_{j_{|\mathcal{S}_{i_t}|}}(\mathbf{x}^{t+1})\Big],
\end{cases}
\qquad i_t=t\,\%\,m+1.
\]
Proof.

The recursion of $\mathbf{H}^t$ and the definitions of $\mathbf{U}^t$ and $\mathbf{V}^t$ imply

\[
\mathbf{H}^{t+1}=\mathbf{H}^t-\sum_{j\in\mathcal{S}_{i_t}}\mathbf{g}_j(\mathbf{z}_{i_t}^t)\mathbf{g}_j(\mathbf{z}_{i_t}^t)^\top+\sum_{j\in\mathcal{S}_{i_t}}\mathbf{g}_j(\mathbf{x}^{t+1})\mathbf{g}_j(\mathbf{x}^{t+1})^\top=\mathbf{H}^t+\mathbf{U}^t(\mathbf{V}^t)^\top.
\]

Since we assume the matrices $\mathbf{H}^t$ and $\mathbf{H}^{t+1}$ are non-singular, applying the matrix determinant lemma [41, section 9.1.2] to the above equation leads to

\[
\det(\mathbf{H}^{t+1})=\det\left(\mathbf{H}^t+\mathbf{U}^t(\mathbf{V}^t)^\top\right)=\det\left(\mathbf{I}+(\mathbf{V}^t)^\top(\mathbf{H}^t)^{-1}\mathbf{U}^t\right)\det(\mathbf{H}^t).
\]

Since both $\det(\mathbf{H}^{t+1})$ and $\det(\mathbf{H}^t)$ are nonzero, the definition $\mathbf{G}^t=(\mathbf{H}^t)^{-1}$ implies

\[
\det\left(\mathbf{I}+(\mathbf{V}^t)^\top\mathbf{G}^t\mathbf{U}^t\right)=\det\left(\mathbf{I}+(\mathbf{V}^t)^\top(\mathbf{H}^t)^{-1}\mathbf{U}^t\right)\neq 0,
\]

which finishes the proof. ∎
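The two identities behind this argument, the matrix determinant lemma and the low-rank inverse update for $\mathbf{G}^{t+1}$, can be verified on a small instance. The following is a minimal sketch in exact rational arithmetic for $d=2$ and a batch containing a single component, so that every matrix involved is $2\times 2$. All numerical values and helper names (`matmul`, `inv2`, `combine`, and so on) are hypothetical and chosen only for illustration; this is not the paper's implementation of Algorithm 2.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def inv2(A):
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]


def combine(A, B, sign):
    """Entrywise A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


# Hypothetical data: d = 2 and a batch with one component.
H = [[F(4), F(1)], [F(1), F(3)]]   # stands in for the current matrix H^t
g_old = [F(1), F(1)]               # stands in for g_j(z_{i_t}^t)
g_new = [F(2), F(-1)]              # stands in for g_j(x^{t+1})

U = [[-g_old[0], g_new[0]], [-g_old[1], g_new[1]]]   # columns: -g_old, g_new
V = [[g_old[0], g_new[0]], [g_old[1], g_new[1]]]     # columns:  g_old, g_new
I2 = [[F(1), F(0)], [F(0), F(1)]]

G = inv2(H)
H_next = combine(H, matmul(U, transpose(V)), +1)          # H^{t+1} = H^t + U V^T
M = combine(I2, matmul(matmul(transpose(V), G), U), +1)   # I + V^T G U

# Matrix determinant lemma: det(H^{t+1}) = det(I + V^T G U) * det(H^t).
assert det2(H_next) == det2(M) * det2(H)

# Inverse update: G^{t+1} = G - G U (I + V^T G U)^{-1} V^T G.
correction = matmul(matmul(matmul(G, U), inv2(M)), matmul(transpose(V), G))
G_next = combine(G, correction, -1)
assert G_next == inv2(H_next)
```

Using `fractions.Fraction` keeps the comparison exact, so the assertions test the algebraic identities themselves rather than agreement up to a floating-point tolerance.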

We then show that, when the matrices $\{\mathbf{H}^j\}_{j=0}^{t}$ are non-singular, the distance $\left\|\mathbf{x}^{t+1}-\mathbf{x}^*\right\|$ can be upper bounded as follows.

Lemma 16.

Under Assumptions 1 and 2, suppose the matrices $\{\mathbf{H}^{j}\}_{j=0}^{t}$ are non-singular and we run MB-IGN (Algorithm 2) with batch size $k$. Then it holds that

$$\|\mathbf{x}^{t+1}-\mathbf{x}^{*}\| \leq \frac{kL_{f}\mathcal{H}_{\nu}}{1+\nu}\,\|\mathbf{G}^{t}\|\sum_{i=1}^{m}\|\mathbf{z}_{i}^{t}-\mathbf{x}^{*}\|^{1+\nu},$$

where $\mathbf{G}^{t} = (\mathbf{H}^{t})^{-1}$ and $m = \lceil n/k \rceil$.

Proof.

Subtracting $\mathbf{x}^{*}$ from both sides of equation (8), we have

$$\begin{aligned}
\mathbf{x}^{t+1}-\mathbf{x}^{*}
&= \left(\sum_{i=1}^{m}\sum_{j\in\mathcal{S}_{i}}\mathbf{g}_{j}(\mathbf{z}_{i}^{t})\mathbf{g}_{j}(\mathbf{z}_{i}^{t})^{\top}\right)^{-1}\left(\sum_{i=1}^{m}\left(\sum_{j\in\mathcal{S}_{i}}\mathbf{g}_{j}(\mathbf{z}_{i}^{t})\mathbf{g}_{j}(\mathbf{z}_{i}^{t})^{\top}\right)(\mathbf{z}_{i}^{t}-\mathbf{x}^{*}) - \sum_{i=1}^{m}\sum_{j\in\mathcal{S}_{i}} f_{j}(\mathbf{z}_{i}^{t})\mathbf{g}_{j}(\mathbf{z}_{i}^{t})\right) \\
&= \mathbf{G}^{t}\left(\sum_{i=1}^{m}\left(\sum_{j\in\mathcal{S}_{i}}\mathbf{g}_{j}(\mathbf{z}_{i}^{t})\mathbf{g}_{j}(\mathbf{z}_{i}^{t})^{\top}\right)(\mathbf{z}_{i}^{t}-\mathbf{x}^{*}) - \sum_{i=1}^{m}\sum_{j\in\mathcal{S}_{i}} f_{j}(\mathbf{z}_{i}^{t})\mathbf{g}_{j}(\mathbf{z}_{i}^{t}) + \sum_{i=1}^{m}\sum_{j\in\mathcal{S}_{i}} f_{j}(\mathbf{x}^{*})\mathbf{g}_{j}(\mathbf{z}_{i}^{t})\right) \\
&= \mathbf{G}^{t}\sum_{i=1}^{m}\sum_{j\in\mathcal{S}_{i}}\mathbf{g}_{j}(\mathbf{z}_{i}^{t})\left(\mathbf{g}_{j}(\mathbf{z}_{i}^{t})^{\top}(\mathbf{z}_{i}^{t}-\mathbf{x}^{*}) - f_{j}(\mathbf{z}_{i}^{t}) + f_{j}(\mathbf{x}^{*})\right).
\end{aligned}$$

Taking norms on both sides of the above identity, we have

$$\begin{aligned}
\|\mathbf{x}^{t+1}-\mathbf{x}^{*}\|
&= \left\|\mathbf{G}^{t}\sum_{i=1}^{m}\sum_{j\in\mathcal{S}_{i}}\mathbf{g}_{j}(\mathbf{z}_{i}^{t})\left(\mathbf{g}_{j}(\mathbf{z}_{i}^{t})^{\top}(\mathbf{z}_{i}^{t}-\mathbf{x}^{*}) - f_{j}(\mathbf{z}_{i}^{t}) + f_{j}(\mathbf{x}^{*})\right)\right\| \\
&\leq \|\mathbf{G}^{t}\|\left\|\sum_{i=1}^{m}\sum_{j\in\mathcal{S}_{i}}\left(\mathbf{g}_{j}(\mathbf{z}_{i}^{t})^{\top}(\mathbf{z}_{i}^{t}-\mathbf{x}^{*}) - f_{j}(\mathbf{z}_{i}^{t}) + f_{j}(\mathbf{x}^{*})\right)\mathbf{g}_{j}(\mathbf{z}_{i}^{t})\right\| \\
&\leq \frac{L_{f}\mathcal{H}_{\nu}}{1+\nu}\,\|\mathbf{G}^{t}\|\sum_{i=1}^{m}\sum_{j\in\mathcal{S}_{i}}\|\mathbf{z}_{i}^{t}-\mathbf{x}^{*}\|^{1+\nu} \\
&\leq \frac{kL_{f}\mathcal{H}_{\nu}}{1+\nu}\,\|\mathbf{G}^{t}\|\sum_{i=1}^{m}\|\mathbf{z}_{i}^{t}-\mathbf{x}^{*}\|^{1+\nu},
\end{aligned}$$

where the first inequality comes from the sub-multiplicative property of the matrix norm, the second inequality is based on Lemmas 4 and 5, and the last inequality is based on $|\mathcal{S}_{i}| \leq k$ for all $i \in [m]$. ∎
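The error identity at the start of the proof of Lemma 16 is purely algebraic once $f_{j}(\mathbf{x}^{*}) = 0$, so it can be sanity-checked on a toy problem. The sketch below assumes that update (8) has the form $\mathbf{x}^{t+1} = \mathbf{G}^{t}\big(\sum_{i}\sum_{j\in\mathcal{S}_{i}}\mathbf{g}_{j}\mathbf{g}_{j}^{\top}\mathbf{z}_{i}^{t} - \sum_{i}\sum_{j\in\mathcal{S}_{i}} f_{j}\mathbf{g}_{j}\big)$, which is inferred from the derivation above rather than quoted from the algorithm. The components $f_{j}(\mathbf{x}) = \tanh(\mathbf{a}_{j}^{\top}(\mathbf{x}-\mathbf{x}^{*}))$, the batch layout, and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, k = 3, 6, 2                                # hypothetical sizes
m = -(-n // k)                                   # m = ceil(n / k)
batches = [range(i * k, min((i + 1) * k, n)) for i in range(m)]

x_star = rng.standard_normal(d)
A = rng.standard_normal((n, d))                  # row j defines the toy component f_j

def f(j, x):
    return np.tanh(A[j] @ (x - x_star))          # f_j(x*) = 0 by construction

def g(j, x):
    return (1.0 - np.tanh(A[j] @ (x - x_star)) ** 2) * A[j]   # gradient of f_j

Z = x_star + 0.1 * rng.standard_normal((m, d))   # stale snapshots z_i^t, one per batch

H = np.zeros((d, d))
b = np.zeros(d)
err = np.zeros(d)
for i, S in enumerate(batches):
    for j in S:
        gj = g(j, Z[i])
        H += np.outer(gj, gj)
        b += gj * (gj @ Z[i]) - f(j, Z[i]) * gj
        err += gj * (gj @ (Z[i] - x_star) - f(j, Z[i]) + f(j, x_star))

x_next = np.linalg.solve(H, b)                   # assumed form of update (8)
identity_rhs = np.linalg.solve(H, err)           # G^t applied to the residual sum
print(np.allclose(x_next - x_star, identity_rhs))
```

The check only confirms the rearrangement; the norm bound of the lemma additionally needs the constants $L_{f}$ and $\mathcal{H}_{\nu}$ from the assumptions, which this toy example does not compute.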

We split the results of Theorem 2 into two parts (i.e., Theorems 5 and 6) and provide their proofs as follows. Our analysis is based on the properties of the auxiliary sequence constructed in Section C.

Theorem 5.

Under Assumptions 1, 2 and 3, we run MB-IGN (Algorithm 2) with batch size $k$ and initialization $\mathbf{x}^{0}\in\mathbb{R}^{d}$ and $\mathbf{H}^{0} = \mathbf{J}(\mathbf{x}^{0})^{\top}\mathbf{J}(\mathbf{x}^{0})$ such that

$$\|\mathbf{x}^{0}-\mathbf{x}^{*}\| \leq \left(\frac{\mu^{2}}{4kL_{f}\mathcal{H}_{\nu}m}\right)^{1/\nu},$$

where $m = \lceil n/k \rceil$, then it holds

$$\sigma_{\min}\big(\mathbf{I} + (\mathbf{V}^{t})^{\top}(\mathbf{H}^{t})^{-1}\mathbf{U}^{t}\big) > 0, \quad \mathbf{H}^{t} \succeq \frac{\mu^{2}}{2}\mathbf{I} \quad\text{and}\quad \|\mathbf{x}^{t}-\mathbf{x}^{*}\| \leq a_{t+1}(m,\nu)\,\|\mathbf{x}^{0}-\mathbf{x}^{*}\|$$

for all $t \geq 0$, where the sequence $\{a_{t}(m,\nu)\}_{t\geq 0}$ is defined in equation (23).
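To get a feel for the initialization condition of Theorem 5, the radius on its right-hand side can be evaluated directly. This is a minimal sketch; the numerical values of $\mu$, $L_{f}$, $\mathcal{H}_{\nu}$, $\nu$ and $n$ are made-up placeholders, and the function name `local_radius` is hypothetical.

```python
import math

def local_radius(mu, L_f, H_nu, nu, n, k):
    """Right-hand side of the initialization condition: (mu^2 / (4 k L_f H_nu m))^(1/nu)."""
    m = math.ceil(n / k)           # number of mini-batches
    return (mu ** 2 / (4 * k * L_f * H_nu * m)) ** (1.0 / nu)

# Placeholder constants, chosen only for illustration.
mu, L_f, H_nu, nu, n = 1.0, 2.0, 0.5, 0.5, 1000
for k in (1, 10, 64):
    print(k, math.ceil(n / k), local_radius(mu, L_f, H_nu, nu, n, k))
```

Because $k\,m \geq n$ with equality when $k$ divides $n$, the radius in this sufficient condition depends on the batch size only through the rounding in $m = \lceil n/k \rceil$.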

Proof.

We first show

$$\mathbf{H}^{t} \succeq \frac{\mu^{2}}{2}\mathbf{I} \qquad\text{and}\qquad \|\mathbf{x}^{t}-\mathbf{x}^{*}\| \leq a_{t+1}(m,\nu)\,\|\mathbf{x}^{0}-\mathbf{x}^{*}\| \tag{31}$$

hold for all $t \geq 0$. We split the proof of (31) into the following three parts.

Part I: For $t = 0$, the initialization and the fact $a_{0} = 1$ lead to

$$\|\mathbf{x}^{0}-\mathbf{x}^{*}\| = a_{0}(m,\nu)\,\|\mathbf{x}^{0}-\mathbf{x}^{*}\|.$$

Part II: For all $t = 0,\dots,m-1$, we use induction to prove

$$\mathbf{H}^{t} \succeq \frac{\mu^{2}}{2}\mathbf{I} \qquad\text{and}\qquad \|\mathbf{x}^{t+1}-\mathbf{x}^{*}\| \leq a_{t+1}(m,\nu)\,\|\mathbf{x}^{0}-\mathbf{x}^{*}\|. \tag{32}$$

For the induction base, we can apply Lemma 14 to verify

$$\begin{aligned}
\sigma_{\min}(\mathbf{H}^{0})
&\geq \mu^{2} - 2kL_{f}\mathcal{H}_{\nu}\sum_{i=1}^{m}\|\mathbf{z}_{i}^{0}-\mathbf{x}^{*}\|^{\nu} \\
&= \mu^{2} - 2kL_{f}\mathcal{H}_{\nu}\sum_{i=1}^{m}\|\mathbf{x}^{0}-\mathbf{x}^{*}\|^{\nu} \\
&\geq \mu^{2} - 2kL_{f}\mathcal{H}_{\nu}m\cdot\frac{\mu^{2}}{4kL_{f}\mathcal{H}_{\nu}m} \\
&= \mu^{2} - \frac{\mu^{2}}{2} \\
&= \frac{\mu^{2}}{2}.
\end{aligned}$$

This implies

$$\mathbf{H}^{0} \succeq \frac{\mu^{2}}{2}\mathbf{I} \qquad\text{and}\qquad \|\mathbf{G}^{0}\| = \|(\mathbf{H}^{0})^{-1}\| \leq \frac{2}{\mu^{2}}. \tag{33}$$

According to Lemma 16, we have

$$\begin{aligned}
\|\mathbf{x}^{1}-\mathbf{x}^{*}\|
&\leq \frac{kL_{f}\mathcal{H}_{\nu}}{1+\nu}\,\|\mathbf{G}^{0}\|\sum_{i=1}^{m}\|\mathbf{z}_{i}^{0}-\mathbf{x}^{*}\|^{1+\nu} \\
&\leq \frac{kL_{f}\mathcal{H}_{\nu}}{1+\nu}\cdot\frac{2}{\mu^{2}}\cdot\sum_{i=1}^{m}\|\mathbf{z}_{i}^{0}-\mathbf{x}^{*}\|^{1+\nu} \\
&= \frac{kL_{f}\mathcal{H}_{\nu}}{1+\nu}\cdot\frac{2}{\mu^{2}}\cdot m\,\|\mathbf{x}^{0}-\mathbf{x}^{*}\|^{1+\nu} \\
&\leq \frac{kmL_{f}\mathcal{H}_{\nu}}{1+\nu}\cdot\frac{2}{\mu^{2}}\cdot\frac{\mu^{2}}{4kL_{f}\mathcal{H}_{\nu}m}\,\|\mathbf{x}^{0}-\mathbf{x}^{*}\| \\
&= \frac{1}{2(1+\nu)}\,\|\mathbf{x}^{0}-\mathbf{x}^{*}\| \\
&= a_{1}(m,\nu)\,\|\mathbf{x}^{0}-\mathbf{x}^{*}\|,
\end{aligned}$$

where the second inequality is based on equation (33) and the last inequality is based on the initial condition. Therefore, the induction base holds.
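The arithmetic of the induction base can be replayed numerically: when $\|\mathbf{x}^{0}-\mathbf{x}^{*}\|$ sits exactly on the boundary of the initialization condition, the lower bound on $\sigma_{\min}(\mathbf{H}^{0})$ equals $\mu^{2}/2$ and the one-step contraction factor equals $1/(2(1+\nu))$. The sketch below uses placeholder constants and a hypothetical helper `base_case_bounds`; it evaluates the bounds from the proof, not the algorithm itself.

```python
import math

def base_case_bounds(mu, L_f, H_nu, nu, k, m, r):
    """Bounds of the induction base when every z_i^0 = x^0 and ||x^0 - x*|| = r.

    Returns (lower bound on sigma_min(H^0), upper bound on ||x^1 - x*|| / r).
    """
    sigma_lower = mu ** 2 - 2 * k * L_f * H_nu * m * r ** nu
    if sigma_lower <= 0:
        raise ValueError("r is too large for the bound on sigma_min(H^0) to be positive")
    G_norm_upper = 1.0 / sigma_lower                      # ||G^0|| <= 1 / sigma_min(H^0)
    step_upper = k * L_f * H_nu / (1 + nu) * G_norm_upper * m * r ** (1 + nu)
    return sigma_lower, step_upper / r

# Placeholder constants, chosen only for illustration.
mu, L_f, H_nu, nu, k, m = 1.0, 2.0, 0.5, 0.5, 10, 100
r = (mu ** 2 / (4 * k * L_f * H_nu * m)) ** (1.0 / nu)    # boundary of the initial condition
sigma_lower, factor = base_case_bounds(mu, L_f, H_nu, nu, k, m, r)
print(sigma_lower, factor)
```

For any smaller $r$ the first bound increases and the contraction factor decreases, so the boundary case is the worst case covered by the theorem.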

For the induction step, we assume

$$\mathbf{H}^{j} \succeq \frac{\mu^{2}}{2}\mathbf{I} \qquad\text{and}\qquad \|\mathbf{x}^{j+1}-\mathbf{x}^{*}\| \leq a_{j+1}(m,\nu)\,\|\mathbf{x}^{0}-\mathbf{x}^{*}\|$$

hold for all $j = 2,\dots,t-1$ such that $t \leq m-1$. Therefore, the update (9) means

$$\mathbf{z}_{i}^{t} = \begin{cases} \mathbf{x}^{i}, & 1 \leq i \leq t, \\ \mathbf{x}^{0}, & t < i \leq m. \end{cases} \tag{34}$$
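The snapshot pattern in (34) can be tabulated for a small number of batches. This is an illustrative sketch only; the function name `snapshot_index` and the choice $m = 4$ are hypothetical, and the code encodes the case distinction of (34) rather than the update rule (9) itself.

```python
def snapshot_index(i, t, m):
    """Index s with z_i^t = x^s during the first pass (0 <= t <= m - 1), following (34)."""
    if not (1 <= i <= m and 0 <= t <= m - 1):
        raise ValueError("expected 1 <= i <= m and 0 <= t <= m - 1")
    return i if i <= t else 0

m = 4
table = [[snapshot_index(i, t, m) for i in range(1, m + 1)] for t in range(m)]
for t, row in enumerate(table):
    print(f"t={t}: " + ", ".join(f"z_{i}=x^{s}" for i, s in enumerate(row, start=1)))
```

Each row shows that batches not yet visited still hold the initial point $\mathbf{x}^{0}$, which is why the induction hypothesis and the initial condition together bound every $\|\mathbf{z}_{i}^{t}-\mathbf{x}^{*}\|$.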

The induction hypothesis leads to

$$\|\mathbf{x}^{j}-\mathbf{x}^{*}\|^{\nu} \leq \big(a_{j}(m,\nu)\big)^{\nu}\,\|\mathbf{x}^{0}-\mathbf{x}^{*}\|^{\nu} \leq \|\mathbf{x}^{0}-\mathbf{x}^{*}\|^{\nu} \leq \frac{\mu^{2}}{4kL_{f}\mathcal{H}_{\nu}m},$$

for $j = 1,\dots,t-1$, where the second inequality is based on Lemma 6 and the third comes from the initial condition. Combining this with (34), we obtain

$$\|\mathbf{z}_{i}^{t}-\mathbf{x}^{*}\|^{\nu} \leq \frac{\mu^{2}}{4kL_{f}\mathcal{H}_{\nu}m}.$$

According to Lemma 14, we have

\begin{align*}
\sigma_{\min}({\bf H}^{t}) &\geq\mu^{2}-2kL_{f}{\mathcal{H}}_{\nu}\sum_{i=1}^{m}\left\|{\bf z}_{i}^{t}-{\bf x}^{*}\right\|^{\nu} \\
&\geq\mu^{2}-2kL_{f}{\mathcal{H}}_{\nu}m\,\frac{\mu^{2}}{4kL_{f}{\mathcal{H}}_{\nu}m} \\
&=\mu^{2}-\frac{\mu^{2}}{2} \\
&=\frac{\mu^{2}}{2},
\end{align*}

where the second inequality comes from the initial condition. Therefore, we have

\[
{\bf H}^{t}\succeq\frac{\mu^{2}}{2}{\bf I}\qquad\text{and}\qquad\left\|{\bf G}^{t}\right\|=\left\|({\bf H}^{t})^{-1}\right\|\leq\frac{2}{\mu^{2}}.
\]
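
The norm bound follows from the singular-value bound in a single step. Spelled out, using only the standard identity between the spectral norm of an inverse and the smallest singular value:

\[
\left\|({\bf H}^{t})^{-1}\right\|=\frac{1}{\sigma_{\min}({\bf H}^{t})}\leq\frac{1}{\mu^{2}/2}=\frac{2}{\mu^{2}}.
\]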

According to Lemma 13, we have

\begin{align*}
\left\|{\bf x}^{t+1}-{\bf x}^{*}\right\| &\leq\frac{kL_{f}{\mathcal{H}}_{\nu}}{1+\nu}\left\|{\bf G}^{t}\right\|\sum_{i=1}^{m}\left\|{\bf z}_{i}^{t}-{\bf x}^{*}\right\|^{1+\nu} \\
&\leq\frac{kL_{f}{\mathcal{H}}_{\nu}}{1+\nu}\,\frac{2}{\mu^{2}}\sum_{i=1}^{m}\left\|{\bf z}_{i}^{t}-{\bf x}^{*}\right\|^{1+\nu} \\
&\leq\frac{2kL_{f}{\mathcal{H}}_{\nu}}{(1+\nu)\mu^{2}}\left(\sum_{j=1}^{t}\left\|{\bf x}^{j}-{\bf x}^{*}\right\|^{1+\nu}+(m-t)\left\|{\bf x}^{0}-{\bf x}^{*}\right\|^{1+\nu}\right) \\
&\leq\frac{2kL_{f}{\mathcal{H}}_{\nu}}{(1+\nu)\mu^{2}}\left(\sum_{j=1}^{t}(a_{j}(m,\nu))^{1+\nu}\left\|{\bf x}^{0}-{\bf x}^{*}\right\|^{1+\nu}+(m-t)\left\|{\bf x}^{0}-{\bf x}^{*}\right\|^{1+\nu}\right) \\
&\leq\frac{2kL_{f}{\mathcal{H}}_{\nu}}{(1+\nu)\mu^{2}}\,\frac{\mu^{2}}{4kL_{f}{\mathcal{H}}_{\nu}m}\left(\sum_{j=1}^{t}(a_{j}(m,\nu))^{1+\nu}+m-t\right)\left\|{\bf x}^{0}-{\bf x}^{*}\right\| \\
&=\frac{1}{2(1+\nu)m}\left(\sum_{j=1}^{t}(a_{j}(m,\nu))^{1+\nu}+m-t\right)\left\|{\bf x}^{0}-{\bf x}^{*}\right\| \\
&=\frac{1}{2(1+\nu)m}\left(\sum_{j=0}^{t}(a_{j}(m,\nu))^{1+\nu}+m-t-1\right)\left\|{\bf x}^{0}-{\bf x}^{*}\right\| \\
&=a_{t+1}(m,\nu)\left\|{\bf x}^{0}-{\bf x}^{*}\right\|,
\end{align*}

where the last two equalities use the fact $a_{0}(m,\nu)=1$ and the definition of $a_{t+1}(m,\nu)$. This completes the induction.
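
To get a feel for how quickly the factors $a_t(m,\nu)$ shrink, the following minimal sketch iterates the recursion as it can be read off from the last lines of the display above and from the corresponding display in Part III. The recursion is an assumption here, since the definition of $a_t(m,\nu)$ is not restated in this part of the proof; the function name and the sample values of $m$ and $\nu$ are illustrative only.

```python
def contraction_factors(m, nu, num_steps):
    """Return [a_0, ..., a_num_steps] for the recursion read off from the proof.

    Assumed recursion (not restated in this section):
        a_0 = 1,
        a_{t+1} = (sum of a_j^(1+nu) over j = t-m+1, ..., t) / (2 (1+nu) m),
    where every index j <= 0 contributes the value 1.
    """
    a = [1.0]
    for t in range(num_steps):
        window = range(t - m + 1, t + 1)  # the m most recent indices
        total = sum((a[j] if j >= 1 else 1.0) ** (1 + nu) for j in window)
        a.append(total / (2 * (1 + nu) * m))
    return a


if __name__ == "__main__":
    factors = contraction_factors(m=4, nu=1.0, num_steps=12)
    for t, a_t in enumerate(factors):
        print(f"a_{t} = {a_t:.3e}")
```

For $t<m$ the window still contains $m-t$ copies of the initial point, which is exactly the term $m-t$ in the display; once $t\geq m$ the window holds only genuine iterates.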

Part III: For all $t\geq m$, we use induction to prove

\[
{\bf H}^{t}\succeq(\mu^{2}/2){\bf I}\qquad\text{and}\qquad\left\|{\bf x}^{t+1}-{\bf x}^{*}\right\|\leq a_{t+1}(m,\nu)\left\|{\bf x}^{0}-{\bf x}^{*}\right\|.
\]

For the induction base, we can verify that it holds (from the result of Part II)

\[
{\bf H}^{j}\succeq\frac{\mu^{2}}{2}{\bf I}\qquad\text{for all }j=0,\dots,m-1,
\]

and

\[
\left\|{\bf x}^{j}-{\bf x}^{*}\right\|\leq a_{j}(m,\nu)\left\|{\bf x}^{0}-{\bf x}^{*}\right\|\qquad\text{for all }j=1,\dots,m.
\]

Then we have

\[
\left\|{\bf x}^{j}-{\bf x}^{*}\right\|^{\nu}\leq(a_{j}(m,\nu))^{\nu}\left\|{\bf x}^{0}-{\bf x}^{*}\right\|^{\nu}\leq\left\|{\bf x}^{0}-{\bf x}^{*}\right\|^{\nu}\leq\frac{\mu^{2}}{4kL_{f}{\mathcal{H}}_{\nu}m},\quad\text{for all }j=1,\dots,m,
\]

where the second inequality is based on Lemma 6 and the third inequality is based on the initial condition.

From Eq. 9, we have

\[
{\bf z}_{i}^{m}={\bf x}^{i}\qquad\text{for all }i\in[m].
\]

Therefore, we have

\[
\left\|{\bf z}_{i}^{m}-{\bf x}^{*}\right\|^{\nu}\leq\frac{\mu^{2}}{4kL_{f}{\mathcal{H}}_{\nu}m},\qquad\text{for all }i\in[m].
\]

According to Lemma 14, we have

\begin{align*}
\sigma_{\min}({\bf H}^{m}) &\geq\mu^{2}-2kL_{f}{\mathcal{H}}_{\nu}\sum_{i=1}^{m}\left\|{\bf z}_{i}^{m}-{\bf x}^{*}\right\|^{\nu} \\
&\geq\mu^{2}-2kL_{f}{\mathcal{H}}_{\nu}m\,\frac{\mu^{2}}{4kL_{f}{\mathcal{H}}_{\nu}m} \\
&\geq\mu^{2}-\frac{\mu^{2}}{2}=\frac{\mu^{2}}{2},
\end{align*}

which implies

\[
{\bf H}^{m}\succeq\frac{\mu^{2}}{2}{\bf I}\qquad\text{and}\qquad\left\|{\bf G}^{m}\right\|=\left\|({\bf H}^{m})^{-1}\right\|\leq\frac{2}{\mu^{2}}.
\]
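
The arithmetic behind the bound $\mu^{2}/2$ can be checked exactly. The sketch below evaluates the lower bound quoted from Lemma 14 with rational arithmetic in the worst case permitted by the initial condition, where every term $\|{\bf z}_i-{\bf x}^*\|^{\nu}$ sits at the threshold $\mu^{2}/(4kL_f{\mathcal{H}}_{\nu}m)$. The numerical constants and the function name are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def sigma_min_lower_bound(mu, k, L_f, H_nu, dist_pows):
    """Lower bound mu^2 - 2 k L_f H_nu * sum_i ||z_i - x*||^nu quoted from Lemma 14.

    dist_pows holds the values ||z_i - x*||^nu, one per component.
    """
    return mu**2 - 2 * k * L_f * H_nu * sum(dist_pows)


# Illustrative constants (hypothetical, chosen only for the check).
mu, k, L_f, H_nu, m = Fraction(3), 2, Fraction(5), Fraction(7, 2), 6

# Worst case allowed by the initial condition: every term at the threshold.
threshold = mu**2 / (4 * k * L_f * H_nu * m)
bound = sigma_min_lower_bound(mu, k, L_f, H_nu, [threshold] * m)
print(bound, mu**2 / 2)
```

Any configuration with smaller distances only increases the bound, so $\mu^{2}/2$ is the least favourable value under the stated condition.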

According to Lemma 16, we have

\begin{align*}
\left\|{\bf x}^{m+1}-{\bf x}^{*}\right\| &\leq\frac{kL_{f}{\mathcal{H}}_{\nu}}{1+\nu}\left\|{\bf G}^{m}\right\|\sum_{i=1}^{m}\left\|{\bf z}_{i}^{m}-{\bf x}^{*}\right\|^{1+\nu} \\
&\leq\frac{2kL_{f}{\mathcal{H}}_{\nu}}{(1+\nu)\mu^{2}}\sum_{i=1}^{m}\left\|{\bf z}_{i}^{m}-{\bf x}^{*}\right\|^{1+\nu} \\
&\leq\frac{2kL_{f}{\mathcal{H}}_{\nu}}{(1+\nu)\mu^{2}}\left(\sum_{j=1}^{m}(a_{j}(m,\nu))^{1+\nu}\left\|{\bf x}^{0}-{\bf x}^{*}\right\|^{1+\nu}\right) \\
&\leq\frac{2kL_{f}{\mathcal{H}}_{\nu}}{(1+\nu)\mu^{2}}\,\frac{\mu^{2}}{4kL_{f}{\mathcal{H}}_{\nu}m}\left(\sum_{j=1}^{m}(a_{j}(m,\nu))^{1+\nu}\right)\left\|{\bf x}^{0}-{\bf x}^{*}\right\| \\
&=\frac{1}{2(1+\nu)m}\left(\sum_{j=1}^{m}(a_{j}(m,\nu))^{1+\nu}\right)\left\|{\bf x}^{0}-{\bf x}^{*}\right\| \\
&=a_{m+1}(m,\nu)\left\|{\bf x}^{0}-{\bf x}^{*}\right\|.
\end{align*}

Hence, we have shown the induction base holds.

For the induction step, we assume

\[
{\bf H}^{j}\succeq\frac{\mu^{2}}{2}{\bf I}\qquad\text{and}\qquad\left\|{\bf x}^{j+1}-{\bf x}^{*}\right\|\leq a_{j+1}(m,\nu)\left\|{\bf x}^{0}-{\bf x}^{*}\right\|
\]

holds for all $j=m+1,\cdots,t-1$ such that $t\geq m+2$. Combining results of Part I and II, we have

\[
\left\|{\bf x}^{j}-{\bf x}^{*}\right\|\leq a_{j}(m,\nu)\left\|{\bf x}^{0}-{\bf x}^{*}\right\|\qquad\text{for all }j=0,\dots,t,
\]

which implies

\[
\left\|{\bf x}^{j}-{\bf x}^{*}\right\|^{\nu}\leq(a_{j}(m,\nu))^{\nu}\left\|{\bf x}^{0}-{\bf x}^{*}\right\|^{\nu}\leq\left\|{\bf x}^{0}-{\bf x}^{*}\right\|^{\nu}\leq\frac{\mu^{2}}{4kL_{f}{\mathcal{H}}_{\nu}m},\qquad\text{for all }j=1,\dots,t,
\]

where the second inequality is based on Lemma 6 and the last inequality is based on the initial condition.

The update (17) means the points $\{{\bf z}_{i}^{t}\}_{i=1}^{m}$ can be written as $\{{\bf x}^{t+1-m},\cdots,{\bf x}^{t}\}$, which implies

\[
\max\left\{\left\|{\bf z}_{1}^{t}-{\bf x}^{*}\right\|,\cdots,\left\|{\bf z}_{m}^{t}-{\bf x}^{*}\right\|\right\}=\max\left\{\left\|{\bf x}^{t+1-m}-{\bf x}^{*}\right\|,\cdots,\left\|{\bf x}^{t}-{\bf x}^{*}\right\|\right\}.
\]
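
The sliding-window structure of the stored points can be illustrated with a small bookkeeping sketch. It assumes a cyclic refresh rule, namely that all ${\bf z}_i^0$ equal ${\bf x}^0$ and that step $s$ overwrites slot $s \bmod m$ with ${\bf x}^{s+1}$; this rule is a reading of the update (17) that is consistent with ${\bf z}_i^m={\bf x}^i$, not a quotation of it, and the function name is hypothetical. Only the iterate indices are tracked, not the vectors themselves.

```python
def stored_iterate_indices(m, t):
    """Indices j such that z_i^t = x^j, under an assumed cyclic refresh rule.

    Assumption: all z_i^0 equal x^0, and step s (s = 0, 1, ...) overwrites
    slot s mod m with the new iterate x^{s+1}.
    """
    z = [0] * m                # every slot starts at x^0
    for s in range(t):
        z[s % m] = s + 1       # slot refreshed with x^{s+1}
    return z


if __name__ == "__main__":
    m = 4
    for t in (2, 4, 7):
        print(t, sorted(stored_iterate_indices(m, t)))
```

For $t<m$ the slots hold ${\bf x}^1,\dots,{\bf x}^t$ together with $m-t$ copies of ${\bf x}^0$, which matches the split used in Part II; for $t\geq m$ they hold exactly the last $m$ iterates.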

Therefore, we have

\[
\left\|{\bf z}_{i}^{t}-{\bf x}^{*}\right\|^{\nu}\leq\frac{\mu^{2}}{4kL_{f}{\mathcal{H}}_{\nu}m}\qquad\text{for all }i=1,\dots,m.
\]

Combining with Lemma 14, we have

\begin{align*}
\sigma_{\min}({\bf H}^{t}) &\geq\mu^{2}-2kL_{f}{\mathcal{H}}_{\nu}\sum_{i=1}^{m}\left\|{\bf z}_{i}^{t}-{\bf x}^{*}\right\|^{\nu} \\
&\geq\mu^{2}-2kL_{f}{\mathcal{H}}_{\nu}m\,\frac{\mu^{2}}{4kL_{f}{\mathcal{H}}_{\nu}m} \\
&=\mu^{2}-\frac{\mu^{2}}{2}=\frac{\mu^{2}}{2}.
\end{align*}

Therefore, we achieve

\[
{\bf H}^{t}\succeq\frac{\mu^{2}}{2}{\bf I}\qquad\text{and}\qquad\left\|{\bf G}^{t}\right\|=\left\|({\bf H}^{t})^{-1}\right\|\leq\frac{2}{\mu^{2}}.
\]

According to Lemma 16, we have

$$\begin{aligned}
\left\|{\bf x}^{t+1}-{\bf x}^{*}\right\|
&\leq\frac{kL_{f}\mathcal{H}_{\nu}}{1+\nu}\left\|{\bf G}^{t}\right\|\sum_{i=1}^{m}\left\|{\bf z}_{i}^{t}-{\bf x}^{*}\right\|^{1+\nu}\\
&\leq\frac{2kL_{f}\mathcal{H}_{\nu}}{(1+\nu)\mu^{2}}\sum_{i=1}^{m}\left\|{\bf z}_{i}^{t}-{\bf x}^{*}\right\|^{1+\nu}\\
&\leq\frac{2kL_{f}\mathcal{H}_{\nu}}{(1+\nu)\mu^{2}}\left(\sum_{j=t-m+1}^{t}\left(a_{j}(m,\nu)\right)^{1+\nu}\left\|{\bf x}^{0}-{\bf x}^{*}\right\|^{1+\nu}\right)\\
&=\frac{2kL_{f}\mathcal{H}_{\nu}}{(1+\nu)\mu^{2}}\left(\sum_{j=t-m+1}^{t}\left(a_{j}(m,\nu)\right)^{1+\nu}\right)\left\|{\bf x}^{0}-{\bf x}^{*}\right\|^{1+\nu}\\
&\leq\frac{2kL_{f}\mathcal{H}_{\nu}}{(1+\nu)\mu^{2}}\cdot\frac{\mu^{2}}{4kL_{f}\mathcal{H}_{\nu}m}\left(\sum_{j=t-m+1}^{t}\left(a_{j}(m,\nu)\right)^{1+\nu}\right)\left\|{\bf x}^{0}-{\bf x}^{*}\right\|\\
&=\frac{1}{2(1+\nu)m}\left(\sum_{j=t-m+1}^{t}\left(a_{j}(m,\nu)\right)^{1+\nu}\right)\left\|{\bf x}^{0}-{\bf x}^{*}\right\|\\
&=a_{t+1}(m,\nu)\left\|{\bf x}^{0}-{\bf x}^{*}\right\|.
\end{aligned}$$

Here the third inequality uses the induction hypothesis $\left\|{\bf x}^{j}-{\bf x}^{*}\right\|\leq a_{j}(m,\nu)\left\|{\bf x}^{0}-{\bf x}^{*}\right\|$, and the last inequality uses the bound $\left\|{\bf x}^{0}-{\bf x}^{*}\right\|^{\nu}\leq\mu^{2}/(4kL_{f}\mathcal{H}_{\nu}m)$ on the initial point.

Hence, we finish the induction.

Combining the results of Parts I, II and III completes the proof of (31).

Since the non-singularity of ${\bf H}^{t}$ and ${\bf H}^{t+1}$ has been verified by result (31), we can apply Lemma 15 to obtain

$$\sigma_{\min}\left({\bf I}+({\bf V}^{t})^{\top}({\bf H}^{t})^{-1}{\bf U}^{t}\right)>0.$$

Theorem 6.

We define the sequence $\{r_{t}\}_{t\geq 0}$ such that

$$r_{t}\triangleq\begin{cases}\max\left\{\left\|{\bf x}^{0}-{\bf x}^{*}\right\|,1\right\}, & t=0,\\[5pt] a_{t}(m,\nu)\,r_{0}, & t\geq 1,\end{cases}$$

where the sequence $\{a_{t}(m,\nu)\}_{t\geq 0}$ is defined by equation (23). Under Assumptions 1, 2 and 3, running MB-IGN (Algorithm 2) with the initial condition shown in Theorem 5, we have

$$\left\|{\bf x}^{t}-{\bf x}^{*}\right\|\leq r_{t}\qquad\text{and}\qquad r_{t+1}\leq c^{(1+\nu)^{\left(\left\lfloor\frac{t}{m}\right\rfloor-1\right)}}r_{t}\tag{35}$$

for all $t\geq m$, where

$$c=1-\frac{1}{m}\left(1-\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}\right).$$
Proof.

The definition of $\{r_{t}\}_{t\geq 0}$ leads to

$$r_{0}=\max\left\{\left\|{\bf x}^{0}-{\bf x}^{*}\right\|,1\right\}\geq\left\|{\bf x}^{0}-{\bf x}^{*}\right\|.$$

According to Theorem 5, we have

$$\left\|{\bf x}^{t}-{\bf x}^{*}\right\|\leq a_{t}(m,\nu)\left\|{\bf x}^{0}-{\bf x}^{*}\right\|\leq a_{t}(m,\nu)\,r_{0}=r_{t}.$$

According to Lemma 11, we have

$$a_{t+1}(m,\nu)\leq c^{(1+\nu)^{\left(\left\lfloor\frac{t}{m}\right\rfloor-1\right)}}a_{t}(m,\nu)\qquad\text{for all}~~t\geq m.$$

Thus, we obtain

$$r_{t+1}=a_{t+1}(m,\nu)\,r_{0}\leq c^{(1+\nu)^{\left(\left\lfloor\frac{t}{m}\right\rfloor-1\right)}}a_{t}(m,\nu)\,r_{0}=c^{(1+\nu)^{\left(\left\lfloor\frac{t}{m}\right\rfloor-1\right)}}r_{t}\qquad\text{for all}~~t\geq m,$$

where

$$c=1-\frac{1}{m}\left(1-\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}\right).$$

Combining the results of Theorems 5 and 6, we finish the proof of Theorem 2.
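The contraction in (35) can be checked numerically. The sketch below is only an illustration under stated assumptions: equation (23) is not reproduced in this part of the appendix, so the code assumes the recursion $a_{t+1}=\frac{1}{2(1+\nu)m}\sum_{j}(a_{j})^{1+\nu}$ taken over the $m$ most recent values, with $a_{j}=1$ for all $j\leq 0$ (every stored copy starts at ${\bf x}^{0}$). The helper names `a_sequence` and `contraction_factor` are hypothetical.

```python
def a_sequence(m, nu, steps):
    """Return [a_0, ..., a_steps] for the ASSUMED recursion (stand-in for eq. (23))."""
    kappa = 1.0 / (2.0 * (1.0 + nu))
    p = 1.0 + nu
    # Sliding window of the m most recent values; indices j <= 0 are taken to be 1
    # (assumption: all m stored copies are initialised at x^0).
    window = [1.0] * m
    seq = [1.0]
    for _ in range(steps):
        nxt = kappa / m * sum(v ** p for v in window)
        window = window[1:] + [nxt]
        seq.append(nxt)
    return seq


def contraction_factor(m, nu, t):
    """Per-step factor c^{(1+nu)^(floor(t/m) - 1)} appearing in (35)."""
    c = 1.0 - (1.0 - (1.0 / (2.0 * (1.0 + nu))) ** (1.0 + nu)) / m
    return c ** ((1.0 + nu) ** (t // m - 1))


m, nu = 4, 1.0
a = a_sequence(m, nu, 6 * m)

# Check a_{t+1} <= c^{(1+nu)^(floor(t/m)-1)} * a_t for every t >= m in range.
rate_ok = all(
    a[t + 1] <= contraction_factor(m, nu, t) * a[t] for t in range(m, 6 * m)
)
print("contraction holds for m <= t < 6m:", rate_ok)
print("a_m, a_2m, a_3m:", a[m], a[2 * m], a[3 * m])
```

Since $r_{t}=a_{t}(m,\nu)\,r_{0}$ for $t\geq 1$, the same factors apply to $r_{t}$; the printed values shrink faster with every pass over the $m$ mini-batches, which is the superlinear behaviour the theorem describes.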

E.3 Proof of Corollary 3

Proof.

Let $m=\lceil n/k\rceil$. According to Theorem 2, we have

$$r_{t+1}\leq c^{(1+\nu)^{\left(\left\lfloor t/m\right\rfloor-1\right)}}r_{t}\qquad\text{with}\qquad c=1-\frac{1}{m}\left(1-\left(\frac{1}{2(1+\nu)}\right)^{1+\nu}\right)$$

for all $\nu\in(0,1]$. Since $c$ is monotonically decreasing in $\nu$, we have

$$1-\frac{1}{2m}>c\geq 1-\frac{15}{16m},$$

which implies

$$r_{t+1}\leq\Big(1-\frac{1}{2m}\Big)^{(1+\nu)^{\left(\left\lfloor t/m\right\rfloor-1\right)}}r_{t}$$

for all $t\geq m$.
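The two endpoints of the bound on $c$ come from letting $\nu\to 0$ (giving $1-\frac{1}{2m}$) and setting $\nu=1$ (giving $1-\frac{15}{16m}$). A small numerical sketch of this, with an arbitrarily chosen $m=10$ and a hypothetical helper `c_value`, is:

```python
def c_value(m, nu):
    """c = 1 - (1/m) * (1 - (1 / (2(1+nu)))^(1+nu)), as in the corollary's proof."""
    return 1.0 - (1.0 - (1.0 / (2.0 * (1.0 + nu))) ** (1.0 + nu)) / m


m = 10  # illustrative choice, not taken from the paper
grid = [i / 100 for i in range(1, 101)]  # nu = 0.01, 0.02, ..., 1.00
values = [c_value(m, nu) for nu in grid]

# c should decrease strictly as nu grows on (0, 1].
decreasing = all(x > y for x, y in zip(values, values[1:]))

# c should stay in [1 - 15/(16m), 1 - 1/(2m)); a tiny tolerance guards the
# lower endpoint, which is attained exactly at nu = 1.
lower = 1.0 - 15.0 / (16.0 * m)
upper = 1.0 - 1.0 / (2.0 * m)
within = all(lower - 1e-12 <= v < upper for v in values)

print("decreasing in nu:", decreasing)
print("within bounds:", within)
```

The check only samples a grid of $\nu$ values, so it illustrates the monotonicity claim rather than proving it.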

According to the definition of $\{r_{t}\}_{t\geq 0}$ and Theorem 6, we have

$$r_{0}=\max\left\{\left\|{\bf x}^{0}-{\bf x}^{*}\right\|,1\right\}\geq 1.$$

Combining with Lemma 8, we have

$$\begin{aligned}
r_{t}&=a_{t}(m,\nu)\,r_{0}\\
&\leq\frac{1}{2(1+\nu)}\left(a_{t-m}(m,\nu)\right)^{1+\nu}r_{0}\\
&=\frac{1}{2(1+\nu)r_{0}^{\nu}}\left(a_{t-m}(m,\nu)\right)^{1+\nu}r_{0}^{1+\nu}\\
&=\frac{1}{2(1+\nu)r_{0}^{\nu}}\,r_{t-m}^{1+\nu}\\
&\leq\frac{1}{2(1+\nu)}\,r_{t-m}^{1+\nu}
\end{aligned}$$

for all $t\geq m$. This leads to

$$r_{t}\leq\frac{1}{4}r_{t-m}^{2}$$

in the case of $\nu=1$. ∎
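To see what the epoch-wise bound $r_{t}\leq\frac{1}{2(1+\nu)}r_{t-m}^{1+\nu}$ implies, one can iterate it once per epoch of $m$ steps starting from $r_{0}$. The sketch below does this for the worst-case envelope only (it does not simulate MB-IGN itself); the helper name `epoch_bounds` and the choice $r_{0}=1$ are illustrative assumptions.

```python
def epoch_bounds(r0, nu, epochs):
    """Iterate s_{k+1} = s_k^(1+nu) / (2(1+nu)) from s_0 = r0.

    Because the map is increasing in s, s_k is an upper bound on r_{k*m}
    whenever r_t <= r_{t-m}^(1+nu) / (2(1+nu)) holds for all t >= m.
    """
    bounds = [r0]
    for _ in range(epochs):
        bounds.append(bounds[-1] ** (1.0 + nu) / (2.0 * (1.0 + nu)))
    return bounds


# nu = 1 gives the quadratic case r_t <= r_{t-m}^2 / 4.
quadratic = epoch_bounds(1.0, 1.0, 4)
# A smaller Hoelder exponent gives a slower, but still superlinear, decay.
half = epoch_bounds(1.0, 0.5, 4)

for k, (q, h) in enumerate(zip(quadratic, half)):
    print(f"epoch {k}: nu=1 bound {q:.3e}, nu=0.5 bound {h:.3e}")
```

For $\nu=1$ and $r_{0}=1$ the envelope is $1,\ 2^{-2},\ 2^{-6},\ 2^{-14},\ 2^{-30}$, so the exponent of the bound roughly doubles with every epoch.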