Differentially Private Linear Regression with Linked Data

Abstract.

There has been increasing demand for establishing privacy-preserving methodologies for modern statistics and machine learning. Differential privacy, a mathematical notion from computer science, is a rising tool offering robust privacy guarantees. Recent work focuses primarily on developing differentially private versions of individual statistical and machine learning tasks, with nontrivial upstream pre-processing typically not incorporated. An important example is when record linkage is done prior to downstream modeling. Record linkage refers to the statistical task of linking two or more datasets of the same group of entities without a unique identifier. This probabilistic procedure brings additional uncertainty to the subsequent task. In this paper, we present two differentially private algorithms for linear regression with linked data. In particular, we propose a noisy gradient method and a sufficient statistics perturbation approach for the estimation of regression coefficients. We investigate the privacy-accuracy tradeoff by providing finite-sample error bounds for the estimators, which allows us to understand the relative contributions of linkage error, estimation error, and the cost of privacy. The variances of the estimators are also discussed. We demonstrate the performance of the proposed algorithms through simulations and an application to synthetic data.

Shurong Lin (1,*), Elliot Paquette (2), Eric D. Kolaczyk (2)
(1) Department of Mathematics and Statistics, Boston University
(2) Department of Mathematics and Statistics, McGill University

* [email protected]. The research was supported in part by the U.S. Census Bureau Cooperative Agreement CB20ADR0160001 and Canadian NSERC RGPIN-2023-03566. The authors would like to thank Adam Smith (Boston University) for all the helpful discussions and comments.

Keywords: differential privacy, record linkage, data integration, privacy-preserving record linkage, gradient descent


Media Summary

Differential privacy is a mathematical framework for ensuring the privacy of individuals in datasets. It mitigates the risk of disclosing sensitive information about individuals within the dataset during data analysis. Under this framework, we are interested in finding the relationship between two variables (via statistical regression) after they have been linked, with some uncertainty, from two data sources. The pre-processing step of linking datasets is called record linkage, and the uncertainty it introduces should be taken into account in the downstream analysis. In this article, we propose two algorithms that satisfy differential privacy for regression estimation problems with linked data. We provide theoretical results on their privacy guarantees and statistical accuracy, and we demonstrate the performance of the proposed algorithms through simulations and an application.

1. Introduction

Data for the same group of entities are often scattered across different resources, lacking unique identifiers for perfect linkage. To conduct statistical modeling or inference on the integrated information, it is necessary to probabilistically link multiple datasets by comparing the common quasi-identifiers (e.g., names, gender, address) as a pre-processing step. Such a procedure is called record linkage (RL), also known as entity resolution or data matching (Christen2012data), and it is an essential component of data integration in big data analytics (Dong2015). Thanks to its wide application in many disciplines such as public health and official statistics, record linkage has been studied for decades. Earlier pioneering works include Newcombe1959; fellegi1969; Jaro1989AdvancesIR. Record linkage is also frequently used in current practice. The U.S. Census Bureau has a long tradition of using record linkage methodology for multiple endeavors. A current prominent example is the Decennial Census (census2022rl). In this context, record linkage involves using administrative records and other data sources to improve data quality, with efforts underway to construct a comprehensive “reference database” including individuals from multiple administrative records. A recent review paper, Olivier2022RL, provided a comprehensive summary of record linkage. Broadly speaking, there are two perspectives regarding record linkage (Chambers2019SmallAE): (1) the primary perspective concerns how to link the records; (2) the secondary perspective concerns how to propagate the linkage uncertainty to the downstream statistical learning tasks after the linkage has been determined. This paper adopts the second of these two perspectives.

Closely related to record linkage is data privacy. In the area of privacy-preserving record linkage (PPRL), two or more private datasets owned by different organizations are linked without revealing the data to one another (Hall2010pprl; Christen2020). The outcome of PPRL is the information regarding which pairs or sets are matched. PPRL, in turn, is associated with secure multiparty computation (SMPC) in that SMPC techniques are commonly used to solve PPRL problems (Kuzu2013; He2017pprl; Rao2019). To date, PPRL addresses only the private linkage process, that is, the primary perspective, without addressing how the linkage uncertainties would impact the downstream analysis. In contrast, the secondary perspective modifies the statistical tools to account for the linkage uncertainty. Our goal is to incorporate privacy into the secondary perspective of record linkage, which is different from yet complementary to PPRL or SMPC.

Privacy concerns have, if anything, intensified with the emergence of individual-level big data. Releasing information about a sensitive dataset is subject to a variety of privacy attacks (Dwork2017Exposed). Therefore, there has been a growing demand for establishing robust privacy-preserving methodologies for modern statistics and machine learning. A mathematical framework proposed by Dwork2006, differential privacy (DP), is now considered the gold standard for rigorous privacy protection and has made its way to broad application in industry (apple2017dp; ms2020dp; google2021dp) and the public sector (census2021dp). The literature on differential privacy has been flourishing in recent years, and the interface of differential privacy and statistics has started to draw increasing attention from the statistics community.

Recent work on differential privacy focuses primarily on individual statistical and machine learning tasks, with nontrivial upstream pre-processing, such as record linkage, typically not incorporated. In this paper, we consider the linear regression problem, i.e.,

(1) $\bm{y}=X\bm{\beta}+\bm{e},\quad\bm{e}\sim\mathcal{N}(0,\sigma^{2}I_{n})$

but where $X$ and $\bm{y}$ are observed in two separate datasets. As a result, rather than having $X$ and $\bm{y}$ in hand, we are instead provided with a pair $X$ and $\bm{z}$. Here $\bm{z}$ is a permutation of $\bm{y}$ resulting from record linkage performed by an external entity, who also supplies a minimum amount of information about the linkage accuracy. In the regression procedure, we take into account the linkage uncertainty as well as offer differential privacy guarantees. As shown in Figure 1, which depicts the pipeline of the problem we consider, we assume that an external analyst conducts record linkage a priori. From there, we aim to devise a private estimator for the regression coefficients of ultimate interest with the help of differential privacy.

[Figure 1 diagram: an external party performs record linkage and outputs $(X,\bm{z},Q)$; this is the input to DP regression (our focus), which produces the private estimator $\hat{\bm{\beta}}^{\text{priv}}$.]
Figure 1. Pipeline of private regression with linked data.

Specifically, we propose two algorithms for linear regression after record linkage to meet differential privacy: (1) post-RL noisy gradient descent (NGD), and (2) post-RL sufficient statistics perturbation (SSP). Our work builds on the seminal work by lahiri2005 where an estimator is proposed for linear regression with linked data in a non-privacy-aware setting. We construct a private estimator, $\hat{\bm{\beta}}^{\text{priv}}$, by deploying differential privacy tools to achieve privacy protections. To the best of our knowledge, our work is the first one in the literature to consider a statistical model after record linkage in a privacy-aware setting.

The two proposed algorithms also extend the noisy gradient method (Bassily2014PrivateER) and the “Analyze Gauss” algorithm (DworkTT014), which are applied to linear regression, to additionally handle the presence of linkage errors. Prior works (Sheffet17; wang2018; Bernstein2019; Cai2019TheCO; Alabi2022) on differentially private linear regression do not consider possible record linkage pre-processing. If the data are linked beforehand, directly applying their algorithms to the imperfectly linked data is not ideal. It is well known that overlooking the linkage errors leads to substantial bias even with a high linkage accuracy (Neter1965TheEO; Scheuren1993). Figure 2 showcases a toy example of record linkage, where mismatches, if treated as true, change the sign of the slope estimate. Our illustrative application later in the paper confirms this, where around 90% of the records are correctly linked, and the estimators ignoring linkage errors end up with large biases.

Dataset A
first name   last name   $x$
Shurong      Lin         1
Erik         K           2
Elliot       P           3
S            Li          4

Dataset B
first name   last name   $y$
S            L           2
Eric         K           4
Eliot        P           6
Sharon       Li          8
Figure 2. A toy example of record linkage with mismatches (dashed links).
The true dataset $(X,\bm{y})$ is $\{(1,2),(2,4),(3,6),(4,8)\}$, yielding a slope estimate $\hat{\beta}_{1}=2$, while the linked set $(X,\bm{z})$ is given by $\{(1,8),(2,4),(3,6),(4,2)\}$, yielding $\hat{\beta}_{1}=-1.6$.
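The two slope estimates quoted for the toy example can be checked with a few lines of NumPy. This is only an illustrative sketch; the helper name `ols_slope` is ours and does not come from the paper.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the least-squares line (with intercept) of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

x = [1, 2, 3, 4]
y_true = [2, 4, 6, 8]      # correctly linked responses
z_linked = [8, 4, 6, 2]    # first and last records are swapped by the linkage

slope_true = ols_slope(x, y_true)
slope_linked = ols_slope(x, z_linked)
print(slope_true, slope_linked)
```

Only two of the four records are mismatched, yet the sign of the fitted slope flips.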

Accompanying the estimators resulting from our algorithms, we provide mean-squared error bounds under typical regularity assumptions and record linkage schemes. When no linkage errors are present (i.e., a special case in our scenario), our result in Theorem 4.4 improves upon the noisy gradient method proposed in Cai2019TheCO by using zero-concentrated differential privacy (zCDP, Bun2016ConcentratedDP) to enable tighter bounds on privacy cost (see Lemma 10). Additionally, we present (approximate) theoretical variances for $\hat{\bm{\beta}}^{\text{priv}}$ resulting from both proposed algorithms. There appear to be very few other works that have addressed the issue of uncertainty. Two that we are aware of are Alabi2022, who provided confidence bounds for the univariate case, and Sheffet17, who provided confidence intervals dependent on differential privacy noise. Our work focuses on the multivariate case and appears to be the first to directly work on exact variances rather than relying on bounds.

The remainder of this paper is organized as follows. Section 2 provides preliminaries on linear regression with linked data and differential privacy. We propose our two algorithms in Section 3 and present the relevant theoretical results in Section 4. In Section 5, we conduct a series of simulation studies and an application to synthetic data. Section 6 concludes and discusses future work. Complete proofs of all theorems can be found in the supplementary materials.

2. Preliminaries

In this section, we review the background results on linear regression after record linkage upon which we build our work, as well as fundamental concepts from differential privacy. We also discuss related work on linear regression with linked data and on record linkage with privacy awareness.

2.1. Linear Regression with Record Linkage

Let $(X,\Phi_{X})$ and $(\bm{y},\Phi_{\bm{y}})$ be two datasets that refer to the same group of $n$ entities, with unknown one-to-one correspondence. The quasi-identifiers $\Phi_{X}$ and $\Phi_{\bm{y}}$ are used to perform the linkage procedure. Let $(X,\bm{z})$ be the linked data where $\bm{z}$ is a permutation of $\bm{y}$. Consider the following model for $\bm{z}$:

(2) $\mathbb{P}(z_{i}=y_{j})=q_{ij},\quad i,j=1,\dots,n,$

then $\sum_{j=1}^{n}q_{ij}=1$ for all $i$ and $\sum_{i=1}^{n}q_{ij}=1$ for all $j$. Thus, $q_{ii}$ is the probability of the $i$th record being linked correctly. Let $Q=(q_{ij})$, which we call the matching probability matrix (MPM), a doubly stochastic matrix. The matrix $Q$ can be estimated, for example, through bootstrapping (Chipperfield2015; Chipperfield2020). In some cases, estimating $Q$ can require inference on only a single parameter (e.g., in the exchangeable linkage error (ELE) model described in Section 2.1.1).

For the fixed-design homoskedastic linear model (1), when inference is done after record linkage based on $(X,\bm{z})$, lahiri2005 proposed an unbiased estimator

(3) $\hat{\bm{\beta}}^{\text{RL}}=(W^{\top}W)^{-1}W^{\top}\bm{z},$

where $W=QX$. Let $\bm{w}_{i}$ be the $i$-th row vector of $W$, then $\bm{w}_{i}=\sum_{j=1}^{n}q_{ij}\bm{x}_{j}$. Note that $\mathbb{E}(z_{i})=\bm{w}_{i}^{\top}\bm{\beta}$, where the expectation is taken over both linkage uncertainties and $\bm{y}$. Transforming $X$ into $W$ offers bias correction for regression estimation after record linkage.
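The bias correction in (3) can be illustrated with a short NumPy sketch. Everything here is an assumption made for illustration: the function name `rl_estimator`, the simulated design, and the particular doubly stochastic matrix `Q` (a mixture of the identity and a cyclic shift) are not taken from the paper. To isolate the effect of replacing $X$ by $W=QX$, the response is set to its expectation $W\bm{\beta}$ rather than sampled.

```python
import numpy as np

def rl_estimator(X, z, Q):
    """Estimator (3): least squares of z on W = Q X instead of on X."""
    W = Q @ X
    # lstsq solves the normal equations without forming (W^T W)^{-1} explicitly
    beta_hat, *_ = np.linalg.lstsq(W, z, rcond=None)
    return beta_hat

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
beta = np.array([1.0, -2.0, 0.5])

# Hypothetical MPM: correct link with probability 0.9, otherwise linked to
# the "next" record. Q is a convex mix of permutations, so doubly stochastic.
gamma = 0.9
P = np.roll(np.eye(n), 1, axis=1)
Q = gamma * np.eye(n) + (1.0 - gamma) * P

# Use E(z) = Q X beta in place of a random z to look at bias only.
z_mean = Q @ X @ beta

beta_rl = rl_estimator(X, z_mean, Q)                    # regress on W
beta_naive, *_ = np.linalg.lstsq(X, z_mean, rcond=None)  # ignore linkage errors
print(beta_rl, beta_naive)
```

In this setup the naive fit is attenuated toward zero, while regressing on $W$ recovers $\bm{\beta}$ exactly because $\mathbb{E}(\bm{z})=W\bm{\beta}$.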

In addition, the variance of $\hat{\bm{\beta}}^{\text{RL}}$ is given by

(4) $\Sigma^{\text{RL}}\stackrel{\text{def}}{=}\operatorname{Var}(\hat{\bm{\beta}}^{\text{RL}})=(W^{\top}W)^{-1}W^{\top}\Sigma_{\bm{z}}W(W^{\top}W)^{-1},$

where $\Sigma_{\bm{z}}\stackrel{\text{def}}{=}\operatorname{Var}(\bm{z})$. lahiri2005 provide the following characterization of the first two moments of $\bm{z}$.

Lemma 2.1 (Theorem A.1, lahiri2005).

Under the model described by (1) and (2), we have for $i,j=1,\dots,n$

  • $\mathbb{E}(z_{i})=\bm{w}_{i}^{\top}\bm{\beta}$;

  • $\operatorname{Var}(z_{i})=\sigma^{2}+\bm{\beta}^{\top}A_{i}\bm{\beta}$ with $A_{i}=\sum_{j=1}^{n}q_{ij}(\bm{x}_{j}-\bm{w}_{i})(\bm{x}_{j}-\bm{w}_{i})^{\top}$;

  • $\operatorname{Cov}(z_{i},z_{j})=\bm{\beta}^{\top}A_{ij}\bm{\beta}$ with $A_{ij}=\sum_{u=1}^{n}\sum_{v\neq u}^{n}q_{iu}q_{jv}(\bm{x}_{i}-\bm{w}_{u})(\bm{x}_{j}-\bm{w}_{v})^{\top}$.

Note that $\Sigma_{\bm{z}}$ involves the true coefficients $\bm{\beta}$ and $\Sigma_{\bm{z}}=\sigma^{2}I_{n}+h(\bm{\beta},Q,X)$, where $h(\bm{\beta},Q,X)$ is a function of $\bm{\beta},Q,X$ as elaborated in Lemma 2.1. Compared to the covariance of $\bm{y}$, which is $\sigma^{2}I_{n}$ under model (1), $\Sigma_{\bm{z}}$ has an additional component $h(\bm{\beta},Q,X)$ due to the uncertainty of record linkage.

2.1.1. Structural Schemes of MPM

The matching probability matrix (MPM) Q𝑄Qitalic_Q is generally assumed to have a simple structure. Two schemes used commonly in the literature are as follows.

Blocking Scheme

It is assumed that the MPM is a block diagonal matrix, which means the true matches only happen within blocks. Blocking significantly reduces the number of pairs for comparison and allows scalable record linkage. This scheme is used in almost all real-world applications, and different methods for blocking have been developed (Christen2012data; Steorts2014a; Christophides2020an).

Exchangeable Linkage Errors (ELE) Model

The ELE model (Chambers2009RegressionAO) assumes homogeneous linkage accuracy and errors:

(5) $\mathbb{P}(\text{correct linkage})=q_{ii}=\gamma,$
    $\mathbb{P}(\text{incorrect linkage})=q_{ij}=\frac{1-\gamma}{n-1}\text{ for }i\neq j.$

The ELE model has been adopted in recent works, such as Chambers2019SmallAE; Chambers2022, for various estimation problems. Even though (5) may oversimplify the reality, it is a representative model for a secondary analyst who has minimum information about the linkage quality. When blocking is used, the homogeneous linkage accuracy assumption is imposed within individual blocks. In other words, it still allows heterogeneous linkage accuracy between blocks.
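As a concrete illustration of (5), the sketch below builds the ELE matching probability matrix for a single block and computes $W=QX$ in two ways. The function names `ele_matrix` and `ele_transform` are hypothetical. The shortcut in `ele_transform` uses the fact that, under (5), $Q=(\gamma-c)I_{n}+c\,\bm{1}\bm{1}^{\top}$ with $c=(1-\gamma)/(n-1)$, so that $\bm{w}_{i}=(\gamma-c)\bm{x}_{i}+c\sum_{j}\bm{x}_{j}$.

```python
import numpy as np

def ele_matrix(n, gamma):
    """Matching probability matrix Q under the ELE model (5)."""
    off_diag = (1.0 - gamma) / (n - 1)
    Q = np.full((n, n), off_diag)
    np.fill_diagonal(Q, gamma)
    return Q

def ele_transform(X, gamma):
    """Compute W = Q X under ELE without forming the n-by-n matrix Q."""
    n = X.shape[0]
    c = (1.0 - gamma) / (n - 1)
    return (gamma - c) * X + c * X.sum(axis=0, keepdims=True)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
Q = ele_matrix(50, gamma=0.9)

W_dense = Q @ X                      # O(n^2 d) via the explicit matrix
W_fast = ele_transform(X, gamma=0.9)  # O(n d) via the closed form
print(np.abs(W_dense - W_fast).max())
```

With blocking, the same construction would be applied separately within each block, with a block-specific value of $\gamma$.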

2.2. Differential Privacy

Let $\mathcal{X}$ be some data space, and $D,D^{\prime}\in\mathcal{X}^{n}$ be two neighboring datasets of size $n$ which only differ in one record. Such a relation is denoted by $D\sim D^{\prime}$.

Definition 1 ($(\epsilon,\delta)$-DP, DworkR14).

For $\epsilon>0,\delta\geq 0$, a randomized algorithm $A$: $\mathcal{X}^{n}\rightarrow\mathcal{R}$ is $(\epsilon,\delta)$-differentially private if, for all $D\sim D^{\prime}\in\mathcal{X}^{n}$ and any $\mathcal{O}\subseteq\mathcal{R}$,

(6) $\mathbb{P}(A(D)\in\mathcal{O})\leq e^{\epsilon}\cdot\mathbb{P}(A(D^{\prime})\in\mathcal{O})+\delta.$

The expression (6) controls the distance between the output distributions on two neighboring datasets through the privacy budget $\epsilon$ and $\delta$. Intuitively, differential privacy ensures that $D$ is not distinguishable from $D^{\prime}$ based on the outputs. Thus, $\epsilon$ should be small enough for the privacy level to be meaningful. Typically, $\epsilon\in(10^{-3},10)$ and $\delta=o(1/n)$.

Differential privacy enjoys the following properties that facilitate the construction of differentially private algorithms.

Proposition 2.1 (Basic composition, DworkR14).

If $f_{1}$ is $(\epsilon_{1},\delta_{1})$-DP and $f_{2}$ is $(\epsilon_{2},\delta_{2})$-DP, then $f:=(f_{1},f_{2})$ is $(\epsilon_{1}+\epsilon_{2},\delta_{1}+\delta_{2})$-DP.

Proposition 2.2 (Post-processing, DworkR14).

If $f$ is $(\epsilon,\delta)$-DP, then for any deterministic mapping $g$ that takes $f(D)$ as an input, $g(f(D))$ is also $(\epsilon,\delta)$-DP.

Generally, a differentially private algorithm is constructed by adding random noise from a certain structured distribution, such as the Laplace or Gaussian distributions. A notion central to the amount of noise we add is the sensitivity of the estimation function we desire to release privately.

Definition 2 ($\ell_{2}$-sensitivity).

Let $f:\mathcal{X}^{n}\rightarrow\mathbb{R}^{d}$ be an algorithm. The $\ell_{2}$-sensitivity of $f$ is defined as

(7) $\Delta_{f}=\max_{D\sim D^{\prime}\in\mathcal{X}^{n}}\|f(D)-f(D^{\prime})\|_{2}.$

The sensitivity of a function characterizes how much the output would change if one record in the dataset changes. To achieve $(\epsilon,\delta)$-DP, the amount of noise we need depends on both the budget and the sensitivity. The Gaussian Mechanism is a canonical example that will be employed herein, which does just that.

Lemma 2.2 (Gaussian mechanism, DworkR14).

Let $0<\epsilon<1$ and $\delta>0$. For an algorithm $f$ on the dataset $D$, the Gaussian Mechanism $A(\cdot)$ defined as

(8) $A(D):=f(D)+u,$

where $u\sim\mathcal{N}(0,2\ln(1.25/\delta)(\Delta_{f}/\epsilon)^{2})$, is $(\epsilon,\delta)$-DP.
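To make the roles of sensitivity and budget concrete, here is a minimal sketch of the Gaussian mechanism applied to a simple statistic, the mean of values clipped to $[0,1]$. The example statistic and the function names `gaussian_sigma` and `private_mean` are our own assumptions for illustration; they are not one of the paper's algorithms. For this statistic, replacing one record changes the clipped mean by at most $1/n$, which serves as $\Delta_{f}$.

```python
import math
import numpy as np

def gaussian_sigma(sensitivity, eps, delta):
    """Noise standard deviation prescribed by Lemma 2.2."""
    if not (0 < eps < 1) or delta <= 0:
        raise ValueError("Lemma 2.2 is stated for 0 < eps < 1 and delta > 0")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / eps

def private_mean(values, eps, delta, rng):
    """Release the mean of values clipped to [0, 1] with Gaussian noise."""
    v = np.clip(np.asarray(values, dtype=float), 0.0, 1.0)
    # Replacing one record moves the clipped mean by at most 1/n
    sensitivity = 1.0 / v.size
    sigma = gaussian_sigma(sensitivity, eps, delta)
    return v.mean() + rng.normal(0.0, sigma), sigma

rng = np.random.default_rng(2)
data = rng.uniform(size=1000)
noisy_mean, sigma = private_mean(data, eps=0.5, delta=1e-5, rng=rng)
print(noisy_mean, sigma)
```

The noise scale shrinks with the sample size through the sensitivity and grows as $\epsilon$ or $\delta$ decreases.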

Combining the basic composition rule and the Gaussian mechanism, for a sequence of functions $(f_{1},f_{2},\dots,f_{T})$, let

$u_{t}\sim\mathcal{N}\left(0,\frac{2T^{2}\Delta_{t}^{2}\ln(1.25T/\delta)}{\epsilon^{2}}\right),$

where $\Delta_{t}$ is the $\ell_{2}$-sensitivity of $f_{t}$. Then each $f_{t}+u_{t}$ is $(\epsilon/T,\delta/T)$-DP by Lemma 2.2, so $A:=(f_{1}+u_{1},f_{2}+u_{2},\dots,f_{T}+u_{T})$ satisfies $(\epsilon,\delta)$-DP. However, as $T$ increases, this construction tends to add more noise than necessary due to the loose composition. Instead, we could utilize zero-concentrated differential privacy (zCDP, Bun2016ConcentratedDP), another variant of DP, to achieve tighter composition for $(\epsilon,\delta)$-DP. The following lemma essentially captures the results from Bun2016ConcentratedDP, formulated for our purposes.

Lemma 2.3 (Better composition for $(\epsilon,\delta)$-DP via zCDP).

Let $\epsilon>0,\delta>0$. For a sequence of functions $(f_{1},f_{2},\dots,f_{T})$, let

$$u_{t}\sim\mathcal{N}\left(0,\frac{T\Delta_{t}^{2}}{2\rho}\right), \tag{9}$$

with $\rho:=\epsilon+2\ln(1/\delta)-2\sqrt{(\epsilon+\ln(1/\delta))\ln(1/\delta)}$. Then, the randomized algorithm $A:=(f_{1}+u_{1},f_{2}+u_{2},\dots,f_{T}+u_{T})$ satisfies $(\epsilon,\delta)$-DP. If $\epsilon\leq\frac{8\ln(1/\delta)}{2+\sqrt{2}}$, it suffices to have

$$u_{t}\sim\mathcal{N}\left(0,\frac{4T\Delta_{t}^{2}\ln(1/\delta)}{\epsilon^{2}}\right). \tag{10}$$

Please refer to the supplementary materials for details. Since most practical budget settings satisfy $\epsilon\leq\frac{8\ln(1/\delta)}{2+\sqrt{2}}$, we apply (10) for composition and analysis in the rest of the paper, while noting that (9) is valid for all parameter ranges.
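To make the comparison between the two composition rules concrete, the following minimal sketch evaluates the per-step noise variance under basic composition and under (9) and (10). The function names and the numerical values in the usage note are illustrative assumptions, not part of the paper.

```python
import math


def basic_composition_variance(eps, delta, T, sens):
    """Per-step Gaussian variance from basic composition plus the Gaussian mechanism."""
    return 2 * T**2 * sens**2 * math.log(1.25 * T / delta) / eps**2


def zcdp_rho(eps, delta):
    """The zCDP parameter rho defined in Lemma 2.3."""
    log_inv = math.log(1 / delta)
    return eps + 2 * log_inv - 2 * math.sqrt((eps + log_inv) * log_inv)


def zcdp_variance(eps, delta, T, sens):
    """Per-step variance in (9), valid for all eps > 0 and delta > 0."""
    return T * sens**2 / (2 * zcdp_rho(eps, delta))


def zcdp_simplified_variance(eps, delta, T, sens):
    """Per-step variance in (10); requires eps <= 8 ln(1/delta) / (2 + sqrt(2))."""
    log_inv = math.log(1 / delta)
    if eps > 8 * log_inv / (2 + math.sqrt(2)):
        raise ValueError("eps is outside the range where (10) applies")
    return 4 * T * sens**2 * log_inv / eps**2
```

For example, with the hypothetical values `eps=1.0`, `delta=1e-5`, `T=100`, and unit sensitivity, the variance from basic composition grows like $T^{2}$, whereas both zCDP-based variances grow only linearly in $T$, and (10) is a slightly more conservative choice than (9).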

In Section 3, we employ Lemmas 2.2 and 2.3 in devising two distinct algorithms for linear regression after record linkage.

2.3. Related Work

Linear regression with linked data is a fundamental statistical task that has been explored in various articles. Scheuren1993 first considered the linkage model (2) for linear regression and proposed an estimator that is not generally unbiased. Later, lahiri2005 introduced the exactly unbiased OLS-like estimator given in (3), together with an expression for its variance, which outperformed the approach of Scheuren1993. Chambers2009RegressionAO and Zhang2021linkage offered several other estimators. According to their simulation studies, some of these estimators performed at most comparably to, and not noticeably better than, the one proposed by lahiri2005. However, Zhang2021linkage relaxed the conditions by not assuming that the probability of correct linkage, $q_{ii}$ in model (2), can be obtained or estimated. For more extensive reviews of this literature, see Wang2022reg, who gave an account of recent developments in regression analysis with linked datasets, and Chambers2022, who reviewed current research on robust regression with linked data.

On the other hand, there is ongoing research on privacy-preserving record linkage (PPRL) in the field of computer science. PPRL aims to privately link multiple sensitive datasets held by different organizations when they are unwilling or not permitted to share their data with external parties due to privacy and confidentiality concerns. To achieve privacy protection, techniques such as SMPC and DP are combined with machine learning and deep learning methods for conducting PPRL (Christen2020; Divanis2021; Ranbaduge2022PPRL). PPRL primarily concerns data leakage during the linkage process and produces a linked dataset that can be used for further analysis, yet most applications treat the linked data as if there were no linkage errors. Neither the uncertainty propagation nor private release of the downstream analysis is considered within the scope of PPRL.

Note that there are several articles on privacy-preserving analysis on vertically partitioned databases. In these databases, the attributes are distributed among multiple parties, but common unique identifiers exist to facilitate data linkage across the different parties. Unlike probabilistic record linkage, vertically partitioned databases do not involve linkage errors. Du2004; Sanil2004; Hall2011SecureML; Gasc2017 discussed the implementations of privacy-preserving linear regression protocols that prevent data disclosure across organizations, whereas Dwork2004PPVRD considered data mining from the perspective of the private release of statistical querying in a spirit similar to our work.

3. Differentially Private Algorithms

The unbiased and simply structured estimator provided in (3) with a known closed-form variance makes it a suitable prototype to construct our private estimators. We introduce two differentially private algorithms in the following, based on (1) noisy gradient descent, and (2) sufficient statistics perturbation. As the names suggest, we mitigate privacy risk by perturbing either the gradient or sufficient statistics during the computation of the linear model. Hereafter, unless otherwise specified, $\|\cdot\|$ denotes the 2-norm.

3.1. Post-RL Noisy Gradient Descent

Gradient descent methods are ubiquitous in scientific computing for numerous optimization problems. Within the framework of differential privacy, Bassily2014PrivateER provided a noisy variant of the classic gradient descent algorithm. It was later adapted by Cai2019TheCO to solve the classic linear regression problem with faster convergence. Leveraging the work by Bassily2014PrivateER; Cai2019TheCO, we tailor the noisy gradient method for the post-RL linear regression model for $(X,\bm{z})$ based on (1) and (2).

Let $\mathcal{L}_{n}(\bm{\beta})\stackrel{\text{def}}{=}\frac{1}{2n}(\bm{z}-W\bm{\beta})^{\top}(\bm{z}-W\bm{\beta})$ be the loss function, where, recall, $W=QX$. The minimizer of $\mathcal{L}_{n}(\bm{\beta})$ is the non-private RL estimator proposed by lahiri2005. Let $\Pi_{R}(\bm{r})$ denote the projection of $\bm{r}\in\mathbb{R}^{s}$ onto the $\ell_{2}$ ball $\{\bm{r}\in\mathbb{R}^{s}:\|\bm{r}\|\leq R\}$. The post-RL noisy gradient descent (NGD) algorithm is defined as follows.

Algorithm 1 Post-RL Noisy Gradient Descent
1: Input: linked dataset $(X,\bm{z})$ and matching probability matrix $Q$, privacy budget $(\epsilon,\delta)$, noise scale factor $B$, step size $\eta$, number of iterations $T$, truncation level $R$, feasibility $C$, initial value $\bm{\beta}^{0}$.
2: Let $W=QX$.
3: for $t=0$ to $T-1$ do
4:     Generate $\bm{u}_{t}\sim\mathcal{N}\left(0,\omega^{2}I_{d}\right)$ where $\omega=\frac{2\eta B\sqrt{T\ln(1/\delta)}}{n\epsilon}$.
5:     Compute
$$\bm{\beta}^{t+1}=\Pi_{C}\left(\bm{\beta}^{t}-\frac{\eta}{n}\sum_{i=1}^{n}\left(\bm{w}_{i}^{\top}\bm{\beta}^{t}-\Pi_{R}(z_{i})\right)\bm{w}_{i}+\bm{u}_{t}\right). \tag{11}$$
6: end for
7: $\hat{\bm{\beta}}^{\text{priv}}=\bm{\beta}^{T}$.

Algorithm 1 is a modified version of projected gradient descent that incorporates (1) the post-RL transformation of the design matrix, (2) the addition of noise $\bm{u}_{t}$ at each gradient step, and (3) the projection $\Pi_{R}(\cdot)$ applied to the response variable. The regular parameters of the projected gradient method, namely $\eta$, $T$ and $C$, are specified in Theorem 4.4 in the discussion of the accuracy of $\hat{\bm{\beta}}^{\text{priv}}$. The injection of noise follows Lemma 2.3, using the variance in (10). The scale of the Gaussian noise $\bm{u}_{t}$ at step $t$ depends on the privacy budget $(\epsilon,\delta)$ and on the noise scale factor $B$ associated with the sensitivity of the update function (11). The purpose of the projection on $\bm{z}$ is to bound the sensitivity of the gradient. With a proper choice of $R$ that scales with $\sqrt{\ln n}$ (specified in Section 4), the projection does not affect the accuracy of the final estimator with high probability.
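The steps of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the noise scale factor `B` is taken as a given input, and $\Pi_{R}$ on a scalar response is implemented as clipping to $[-R,R]$.

```python
import numpy as np


def project_l2_ball(v, radius):
    """Project a vector onto the l2 ball of the given radius (Pi_C in the text)."""
    norm = np.linalg.norm(v)
    return v if norm <= radius else v * (radius / norm)


def post_rl_ngd(X, z, Q, eps, delta, B, eta, T, R, C, beta0, rng):
    """Sketch of post-RL noisy gradient descent; B is assumed to be supplied."""
    n, d = X.shape
    W = Q @ X                                  # post-RL transformed design matrix
    z_trunc = np.clip(z, -R, R)                # Pi_R applied to each scalar response
    omega = 2 * eta * B * np.sqrt(T * np.log(1 / delta)) / (n * eps)
    beta = np.asarray(beta0, dtype=float)
    for _ in range(T):
        u = rng.normal(0.0, omega, size=d)     # fresh Gaussian noise at every step
        grad = W.T @ (W @ beta - z_trunc) / n  # gradient of the loss at beta
        beta = project_l2_ball(beta - eta * grad + u, C)
    return beta
```

A caller would pass a NumPy random generator, for instance `rng = np.random.default_rng()`, together with a value of `B` that is valid for the data at hand; the privacy guarantee depends on that value being a correct sensitivity bound.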

The major challenge lies in calculating the sensitivity. In non-RL least squares regression, two neighboring datasets $D=(X,\bm{y})$ and $D^{\prime}=(X^{\prime},\bm{y}^{\prime})$ differ in a single row, making it straightforward to derive the sensitivity of the gradient of $\mathcal{L}_{n}(\bm{\beta})=\frac{1}{2n}(\bm{y}-X\bm{\beta})^{\top}(\bm{y}-X\bm{\beta})$. Here, in the context of post-RL analysis, we consider two neighboring datasets containing both linking variables and regression variables, denoted as $D=(X,\Phi_{X},\bm{y},\Phi_{\bm{y}})$ and $D^{\prime}=(X^{\prime},\Phi_{X^{\prime}},\bm{y}^{\prime},\Phi_{\bm{y}^{\prime}})$, which differ in the record of one individual.
A change in one row of the quasi-identifiers $\Phi_{X}$ and $\Phi_{\bm{y}}$ may affect more than one row of the matching probability matrix $Q$. As a result, the entries of the transformed design matrix $W=QX$ that are subject to change are not limited to one row as in the non-RL case. Consequently, determining the sensitivity of the gradient of $\mathcal{L}_{n}(\bm{\beta})=\frac{1}{2n}(\bm{z}-W\bm{\beta})^{\top}(\bm{z}-W\bm{\beta})$ becomes non-trivial. This challenge distinguishes our work from Cai2019TheCO. However, we will demonstrate in Section 4 that, under a condition on the structure of $Q$, the sensitivity can still be bounded.

3.2. Post-RL Sufficient Statistics Perturbation

Noise can also be injected at stages other than the gradient computation. Since the estimator interacts with the data through its (joint) sufficient statistics, an efficient way to protect the data is to perturb the sufficient statistics. This technique, sufficient statistics perturbation (SSP), has been used in previous works such as Slavkovic2009; Foulds2016; wang2018. For the non-private OLS estimator $\hat{\bm{\beta}}^{\text{OLS}}=(X^{\top}X)^{-1}X^{\top}\bm{y}$, to perturb the joint sufficient statistics $(X^{\top}X,X^{\top}\bm{y})$, it suffices to add noise to $A^{\top}A$ where $A=(X\mid\bm{y})$ is the augmented matrix. DworkTT014 offered an algorithm, “Analyze Gauss”, to privately release $A^{\top}A$. It was later utilized by Sheffet17 for private linear regression, primarily by perturbing the sufficient statistics.

In our work, we adapt the “Analyze Gauss” algorithm to linear regression after record linkage, as shown in Algorithm 2. The noise scale factor $B$ is the sensitivity of $A^{\top}A\stackrel{\text{def}}{=}\begin{pmatrix}W^{\top}W&W^{\top}\bm{z}\\ \bm{z}^{\top}W&\bm{z}^{\top}\bm{z}\end{pmatrix}$, which is specified in Section 4. The Gram matrix $A^{\top}A$ exhibits properties that facilitate the computation of its sensitivity. Algorithm 2 illustrates how incorporating the joint sufficient statistics in a comprehensive form facilitates the deployment of differential privacy.

Algorithm 2 Post-RL Sufficient Statistics Perturbation
1: Input: linked dataset $(X,\bm{z})$ and matching probability matrix $Q$, privacy budget $(\epsilon,\delta)$, noise scale factor $B$, truncation level $R$.
2: Let $W=QX$.
3: Generate a $d\times d$ symmetric Gaussian random matrix $U$ whose upper triangle entries (including the diagonal) are sampled i.i.d. from $\mathcal{N}(0,\omega^{2})$ where $\omega=\frac{B\sqrt{2\ln(1.25/\delta)}}{\epsilon}$.
4: Generate a $d$-dimensional Gaussian random vector $\bm{u}$ whose entries are sampled i.i.d. from $\mathcal{N}(0,\omega^{2})$.
5: if $W^{\top}W+U$ is computationally singular then
6:     Repeat steps 3 and 4.
7: end if
8: $\hat{\bm{\beta}}^{\text{priv}}=(W^{\top}W+U)^{-1}(W^{\top}\bm{z}^{*}+\bm{u})$ where $\bm{z}^{*}=(\Pi_{R}(z_{1}),\dots,\Pi_{R}(z_{n}))^{\top}$.
Remark 3.1.

In step 5 (the singularity check), by post-processing, checking for singularity of $W^{\top}W+U$ consumes no extra privacy budget. In fact, the probability of $W^{\top}W+U$ being singular decreases exponentially as the sample size increases.
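Algorithm 2 can be sketched in NumPy along the following lines. This is a hedged illustration, not the authors' code: the function names, the `max_tries` cap, and the condition-number threshold used to decide that a matrix is "computationally singular" are all assumptions introduced here, and the noise scale factor `B` is taken as a given input.

```python
import numpy as np


def symmetric_gaussian(d, omega, rng):
    """Symmetric d x d matrix whose upper triangle (with diagonal) is i.i.d. N(0, omega^2)."""
    upper = np.triu(rng.normal(0.0, omega, size=(d, d)))
    return upper + np.triu(upper, k=1).T       # mirror the strict upper triangle


def post_rl_ssp(X, z, Q, eps, delta, B, R, rng, max_tries=100):
    """Sketch of post-RL sufficient statistics perturbation; B is assumed to be supplied."""
    W = Q @ X
    d = W.shape[1]
    z_star = np.clip(z, -R, R)                 # truncated responses
    omega = B * np.sqrt(2 * np.log(1.25 / delta)) / eps
    gram, cross = W.T @ W, W.T @ z_star
    for _ in range(max_tries):
        noisy_gram = gram + symmetric_gaussian(d, omega, rng)
        noisy_cross = cross + rng.normal(0.0, omega, size=d)
        # Assumed singularity test: reject matrices with a huge condition number.
        if np.linalg.cond(noisy_gram) < 1.0 / np.finfo(float).eps:
            return np.linalg.solve(noisy_gram, noisy_cross)
    raise RuntimeError("noisy Gram matrix stayed numerically singular")
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse, which is a numerical convenience and does not change the estimator.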

An alternative approach to implementing the SSP method is to add random noise separately to each sufficient statistic. In this approach, the total privacy budget should be divided between $X^{\top}X$ and $X^{\top}\bm{y}$ for the estimation of linear regression, as proposed by wang2018. However, treating the joint statistics as a whole is generally more economical in terms of budgeting. Lin2023 showed through comparison that splitting the total budget among the components introduces larger noise on average. Although adding noise individually to the components of interest allows for the private release of each quantity, such a release is not part of the goal of the estimation.

4. Theoretical Results

In this section, we provide the theoretical results of the two algorithms introduced in Section 3. The results are threefold: (1) differential privacy guarantees, (2) finite-sample error bounds, and (3) variances of the private estimators. We present each of these along with a discussion of the corresponding conditions as they relate to the main variables in our record linkage model. All proofs for these results can be found in the supplementary materials.

4.1. Privacy Guarantees

The algorithms are designed to achieve certain privacy guarantees, given the corresponding sensitivity, for the post-RL case:

Theorem 4.1 (Privacy Guarantees).

Assume the following boundedness conditions hold:

(A1) There is a constant $c_{x}<\infty$ such that $\|\bm{x}\|_{2}\leq c_{x}$.

(A2) Let $Q$ and $Q^{\prime}$ be the matching probability matrices (MPMs) resulting from the neighboring datasets $D$ and $D^{\prime}$ and let $Q\sim Q^{\prime}$ denote such a relation. We assume that $\sup_{Q\sim Q^{\prime}}\|Q-Q^{\prime}\|_{1}\leq M$ for some constant $M<\infty$, where $\|\cdot\|_{1}$ is the entry-wise 1-norm.

Given the linked data $(X,\bm{z})$ and the matching probability matrix $Q$ for the regression problem in (1), under Assumptions (A1) and (A2), it follows that

  1. Algorithm 1 satisfies $(\epsilon,\delta)$-differential privacy with

     $$B=Rc_{x}(M+4)+2Cc_{x}^{2}(M+2), \tag{12}$$

  2. Algorithm 2 satisfies $(\epsilon,\delta)$-differential privacy with

     $$B=Rc_{x}(M+4)+\max\{2c_{x}^{2}(M+2),2R^{2}\}. \tag{13}$$

Essentially, we assume that the data domain is bounded, which is critical for deriving a finite sensitivity of the target function on the data. (A1) is a standard assumption for a bounded design $X$. For the linking variables, which are generally categorical, there is no analogue of the “norm” defined for numerical vectors. Instead, (A2) is imposed on the MPM since it summarizes all the information of the linking variables in the linkage model we consider. Specifically, we assume that two MPMs produced by two neighboring datasets do not differ much in terms of the entry-wise 1-norm. This assumption characterizes a bounded linkage model.

The rationale of (A2) is supported by typical schemes imposed on the structures of MPM in practice, as reviewed in Section 2.1.1. For example, with the blocking scheme, the size of each block is manageably small ($O(1)$). When one record is altered, the fluctuation of the MPM is limited to at most two blocks. Additionally, with the ELE model (5), as long as the changes to a single record only affect a finite number of records, the linkage accuracy $\gamma$ changes by at most $O(1/n)$. Therefore, we have $\sup_{Q\sim Q^{\prime}}\|Q-Q^{\prime}\|_{1}=O(1)$. In general, a robust record linkage approach should not produce two considerably different MPMs from two neighboring datasets. Therefore, it is realistic to assume a bounded linkage model.
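The blocking argument can be illustrated with a small numerical sketch. It assumes, purely for illustration, that each block of the MPM has an exchangeable form with diagonal entries $\gamma$ and equal off-diagonal entries, and that altering one record changes $\gamma$ in exactly two blocks; the helper names and numerical values are hypothetical.

```python
import numpy as np


def ele_block(size, gamma):
    """Row-stochastic block: gamma on the diagonal, (1 - gamma)/(size - 1) elsewhere."""
    off = (1.0 - gamma) / (size - 1)
    return np.full((size, size), off) + (gamma - off) * np.eye(size)


def blocked_mpm(gammas, size):
    """Block-diagonal MPM with one exchangeable block per entry of gammas."""
    k = len(gammas)
    Q = np.zeros((k * size, k * size))
    for b, gamma in enumerate(gammas):
        s = slice(b * size, (b + 1) * size)
        Q[s, s] = ele_block(size, gamma)
    return Q


def entrywise_l1_change(num_blocks, size=3, gamma=0.9, gamma_changed=0.8):
    """Entry-wise 1-norm of Q - Q' when only the first two blocks are altered."""
    gammas = [gamma] * num_blocks
    gammas_prime = list(gammas)
    gammas_prime[0] = gammas_prime[1] = gamma_changed
    Q = blocked_mpm(gammas, size)
    Q_prime = blocked_mpm(gammas_prime, size)
    return np.abs(Q - Q_prime).sum()
```

Under these assumptions, the value returned by `entrywise_l1_change` depends on the block size and on the change in $\gamma$ but not on `num_blocks`, so it stays bounded as the number of records grows.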

The proofs of Theorem 4.1 revolve around calculating the sensitivity of the target function in each algorithm. Besides the upper bounds $c_{x}$ and $M$ discussed above, the sensitivity also depends on the truncation level $R$ on the response. Truncation is commonly used in DP algorithm designs when there are no a priori bounds on the relevant quantities (e.g., Abadi2016). In Section 4.2, we provide a specific choice of $R$ and present an accuracy statement that holds with high probability.
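The noise scale factors in (12) and (13) are simple closed-form expressions in $R$, $c_{x}$, $M$ and $C$, and can be evaluated directly. The short sketch below does so; the function names are hypothetical and the inputs are whatever bounds the analyst is willing to assume.

```python
def ngd_noise_scale(R, c_x, M, C):
    """Noise scale factor B for Algorithm 1, following (12)."""
    return R * c_x * (M + 4) + 2 * C * c_x**2 * (M + 2)


def ssp_noise_scale(R, c_x, M):
    """Noise scale factor B for Algorithm 2, following (13)."""
    return R * c_x * (M + 4) + max(2 * c_x**2 * (M + 2), 2 * R**2)
```

Both factors grow linearly in $M$, while the factor for Algorithm 2 grows quadratically in $R$ once $2R^{2}$ exceeds $2c_{x}^{2}(M+2)$.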

4.2. Finite-Sample Error Bounds

We study the accuracy of the proposed estimators by deriving the finite-sample error bounds. In the following, we introduce two more assumptions in addition to (A1) and (A2):

(A3) The true parameter $\bm{\beta}$ satisfies $\|\bm{\beta}\|_{2}\leq c_{0}$ for some constant $0<c_{0}<\infty$.

(A4) The minimum and maximum eigenvalues of $W^{\top}W/n$ satisfy

$$0<\frac{1}{L}<d\lambda_{\min}\left(\frac{W^{\top}W}{n}\right)\leq d\lambda_{\max}\left(\frac{W^{\top}W}{n}\right)<L \tag{14}$$

for some constant $1<L<\infty$.

Assumption (A4) implies the smoothness and strong convexity of the loss function $\mathcal{L}_{n}(\bm{\beta})=\frac{1}{2n}(\bm{z}-W\bm{\beta})^{\top}(\bm{z}-W\bm{\beta})$, which allows for a fast convergence rate for the gradient descent method in Algorithm 1. On the other hand, for Algorithm 2, note that the term $(W^{\top}W)^{-1}$ is a component of the sufficient statistics. Assumption (A4) offers a bound on the norm of $(W^{\top}W)^{-1}$, which helps derive the error bound of $\hat{\bm{\beta}}^{\text{priv}}$. Let Assumption (A4’) be (A4) with $W$ replaced by $X$ and the constant $L$ replaced by $L^{\prime}$. The larger of $L$ and $L^{\prime}$ can be chosen as the constant to satisfy both (A4) and (A4’). Therefore, for convenience, we consider (A4) and (A4’) to be the same assumption. We first obtain the accuracy of the non-private estimators, for comparison purposes.

Lemma 4.2.

Let $\hat{\bm{\beta}}^{\text{OLS}}=\underset{\bm{\beta}}{\arg\min}\,(\bm{y}-X\bm{\beta})^{\top}(\bm{y}-X\bm{\beta})$ be the OLS estimator. Then, under (A4), it follows that $\mathbb{E}\|\hat{\bm{\beta}}^{\text{OLS}}-\bm{\beta}\|^{2}=\sigma^{2}\operatorname{tr}\left((X^{\top}X)^{-1}\right)=\Theta\left(\frac{\sigma^{2}d^{2}}{n}\right)$.

Lemma 4.3.

Let $\hat{\bm{\beta}}^{\text{RL}}=\underset{\bm{\beta}}{\arg\min}\,(\bm{z}-W\bm{\beta})^{\top}(\bm{z}-W\bm{\beta})$ be the non-private record linkage estimator, and $\Sigma^{\text{RL}}$ be the covariance matrix of $\hat{\bm{\beta}}^{\text{RL}}$. Then,

$$\mathbb{E}\|\hat{\bm{\beta}}^{\text{RL}}-\bm{\beta}\|^{2}=\operatorname{tr}(\Sigma^{\text{RL}}), \tag{15}$$

where $\Sigma^{\text{RL}}=(W^{\top}W)^{-1}W^{\top}\Sigma_{\bm{z}}W(W^{\top}W)^{-1}$.

As a special case, when the linkage is perfect (i.e., $Q$ is an identity matrix), the expected error of $\hat{\bm{\beta}}^{\text{RL}}$ in (15) takes the reduced form $\sigma^{2}\operatorname{tr}\big((X^{\top}X)^{-1}\big)$, which is exactly the lower bound attained by $\hat{\bm{\beta}}^{\text{OLS}}$. Then, by Lemma 4.2, we know that $\mathbb{E}\|\hat{\bm{\beta}}^{\text{RL}}-\bm{\beta}\|^{2}$ is of order at least $\frac{\sigma^{2}d^{2}}{n}$ under (A4). From a secondary perspective regarding record linkage, it is beyond our scope to study how $\operatorname{tr}(\Sigma^{\text{RL}})$ behaves in general.
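The reduction under perfect linkage can be checked numerically. The following is a minimal sketch, assuming that perfect linkage gives $W=X$ and $\Sigma_{\bm{z}}=\sigma^{2}I_{n}$; the helper name `rl_covariance` and the chosen values of `n`, `d`, and `sigma` are illustrative and not taken from the paper.

```python
import numpy as np

def rl_covariance(W, Sigma_z):
    """Sandwich form (W'W)^{-1} W' Sigma_z W (W'W)^{-1} from Lemma 4.3."""
    G = np.linalg.inv(W.T @ W)
    return G @ W.T @ Sigma_z @ W @ G

rng = np.random.default_rng(0)
n, d, sigma = 200, 3, 1.5          # illustrative sizes, not from the paper
X = rng.uniform(-1.0, 1.0, size=(n, d))

# Assumption for this sketch: with Q = I we take W = X and Sigma_z = sigma^2 * I_n.
Sigma_rl = rl_covariance(X, sigma**2 * np.eye(n))
rl_error = np.trace(Sigma_rl)                               # right-hand side of (15)
ols_error = sigma**2 * np.trace(np.linalg.inv(X.T @ X))     # Lemma 4.2

print(rl_error, ols_error)  # the two quantities coincide up to rounding
```

With any other $\Sigma_{\bm{z}}$ or $W$, the same helper returns the general sandwich covariance, whose trace is the quantity that enters the later error bounds.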

For the two proposed algorithms, we present upper bounds on the excess squared error of the private estimators, namely, $\|\hat{\bm{\beta}}^{\text{priv}}-\hat{\bm{\beta}}^{\text{RL}}\|^{2}$.

Theorem 4.4 (Post-RL NGD).

Given the linked data $(X,\bm{z})$ and the matching probability matrix $Q$ for the regression problem in (1), set the parameters of Algorithm 1 as follows:

  • step size $\eta=d/L$, number of iterations $T=\lceil L^{2}\ln(c_{0}^{2}n)\rceil$, feasibility $C=c_{0}$, initialization $\bm{\beta}^{0}=\bm{0}$;

  • truncation level $R=\sigma\sqrt{2\ln n}$;

  • noise scale factor $B=Rc_{x}(M+4)+2c_{0}c^{2}_{x}(M+2)$.

Under Assumptions (A1)-(A4), given $\delta=o(1/n)$, with probability at least $1-c_{1}e^{-c_{2}\ln n}-e^{-c_{3}d}$, where $c_{1},c_{2},c_{3}$ are constants (see the proof), it follows that

$$\|\hat{\bm{\beta}}^{\text{priv}}-\hat{\bm{\beta}}^{\text{RL}}\|^{2}=\frac{1}{n}+O\left(\frac{\sigma^{2}d^{3}\ln^{2}n\ln(1/\delta)}{n^{2}\epsilon^{2}}\right). \tag{16}$$
Theorem 4.5 (Post-RL SSP).

Given the linked data $(X,\bm{z})$ and the matching probability matrix $Q$ for the regression problem in (1), in Algorithm 2, set

  • truncation level $R=\sigma\sqrt{2\ln n}$;

  • noise scale factor $B=Rc_{x}(M+4)+2\max\{c_{x}^{2}(M+2),R^{2}\}$.

Under Assumptions (A1)-(A4), given $\delta=o(1/n)$, with probability at least $1-c_{1}e^{-c_{2}\ln n}-e^{-c_{3}d}$, where $c_{1},c_{2},c_{3}$ are constants (see the proof), it follows that

$$\|\hat{\bm{\beta}}^{\text{priv}}-\hat{\bm{\beta}}^{\text{RL}}\|^{2}=O\left(\frac{\sigma^{4}d^{3}\ln^{2}n\ln(1/\delta)}{n^{2}\epsilon^{2}}\right). \tag{17}$$

In both algorithms, the response is projected at level $R=\sigma\sqrt{2\ln n}$, where $\sigma^{2}$ is the homoskedastic variance of the random error in linear model (1). Let $\mathcal{E}=\{\Pi_{R}(z_{i})=z_{i},\ \forall i\in[n]\}$; then $\mathcal{E}$ is a high-probability event. The error bound is analyzed under $\mathcal{E}$, and thus we obtain a statement that holds with high probability.

In the NGD method, the bound consists of two parts on the RHS of (16). The first error term, $1/n$, results from the convergence rate of gradient descent after $T$ iterations. The second error term is due to the addition of Gaussian noise for privacy and thus involves $\epsilon$ and $\delta$. It is worth noting that the theoretical choice $T=\lceil L^{2}\ln(c_{0}^{2}n)\rceil$ is, to some extent, conservative: it ensures that the first error term is $O(1/n)$, which is the same order as $\mathbb{E}\|\hat{\bm{\beta}}^{\text{OLS}}-\bm{\beta}\|^{2}$. However, more iterations give rise to larger random noise being added to the gradient updates, due to a smaller privacy budget per iteration. In practice, a smaller number of iterations may be favored for the tradeoff (see the experiment in Section 5.2), especially when $n$ is not sufficiently large.
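To make the role of the parameters in Theorem 4.4 concrete, the following is a minimal sketch of a projected noisy gradient loop. It is not a reproduction of Algorithm 1: the per-iteration noise standard deviation `omega` is taken as an input rather than calibrated from $B$, $\epsilon$, $\delta$, and $T$; the feasibility projection is assumed to be onto the $\ell_2$ ball of radius $C$; and the values `L=1.0` and `c0=2.0` are hypothetical.

```python
import math
import numpy as np

def theorem_parameters(n, d, L, c0, sigma):
    """Step size, iteration count, and truncation level as stated in Theorem 4.4."""
    eta = d / L
    T = math.ceil(L**2 * math.log(c0**2 * n))
    R = sigma * math.sqrt(2.0 * math.log(n))
    return eta, T, R

def noisy_gradient_descent(W, z, eta, T, C, R, omega, rng):
    """Projected noisy gradient descent; omega is a user-supplied noise s.d. (assumption)."""
    n, d = W.shape
    z_trunc = np.clip(z, -R, R)            # truncation of the response at level R
    beta = np.zeros(d)                     # initialization beta^0 = 0
    for _ in range(T):
        grad = W.T @ (W @ beta - z_trunc) / n
        beta = beta - eta * grad + omega * rng.standard_normal(d)
        norm = np.linalg.norm(beta)
        if norm > C:                       # assumed projection onto the l2 ball of radius C
            beta = beta * (C / norm)
    return beta

rng = np.random.default_rng(1)
n, d, sigma, c0 = 2000, 1, 1.0, 2.0        # hypothetical values for illustration
W = rng.uniform(-1.0, 1.0, size=(n, d))
z = W @ np.ones(d) + sigma * rng.standard_normal(n)
eta, T, R = theorem_parameters(n, d, L=1.0, c0=c0, sigma=sigma)

beta_noisy = noisy_gradient_descent(W, z, eta, T, c0, R, omega=0.01, rng=rng)
# Noiseless run with many iterations, for comparison with least squares on the truncated response.
beta_exact = noisy_gradient_descent(W, z, eta, 200, c0, R, omega=0.0, rng=rng)
beta_ls = np.linalg.lstsq(W, np.clip(z, -R, R), rcond=None)[0]
print(T, beta_noisy, beta_exact, beta_ls)
```

In this sketch, increasing `T` while holding `omega` fixed only improves the optimization error; in the private algorithm the noise scale would grow with `T`, which is the tradeoff described above.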

For the SSP algorithm, the convergence rate in (17) depends on variables similar to those in the NGD algorithm. The major difference is that it is controlled by $\sigma^{4}$ instead of $\sigma^{2}$, due to the sensitivity of the Gram matrix $A^{\top}A$ defined in Section 3.2. As a result, the SSP estimator is more susceptible to a large variance of the random error in the response variable, whereas the NGD method is more robust. On the other hand, the SSP method has a faster convergence rate when $n$ is sufficiently large. As we shall see in Section 5, the two algorithms perform differently under various scenarios.
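The different dependence on $\sigma$ in (16) and (17) can be illustrated by evaluating the two privacy terms side by side. This is a rough sketch only: the constants hidden in the $O(\cdot)$ notation are ignored, so the numbers are comparable in their scaling with $\sigma$ but not in absolute size, and the function name `privacy_cost_term` is hypothetical.

```python
import math

def privacy_cost_term(n, d, sigma, eps, delta, sigma_power):
    """sigma^p * d^3 * ln^2(n) * ln(1/delta) / (n^2 * eps^2), with constants dropped."""
    return (sigma**sigma_power * d**3 * math.log(n)**2 * math.log(1.0 / delta)
            / (n**2 * eps**2))

n, d, eps = 10_000, 1, 1.0
delta = 1.0 / n**1.1
for sigma in (0.5, 1.0, 1.8):
    ngd = privacy_cost_term(n, d, sigma, eps, delta, sigma_power=2)  # term in (16)
    ssp = privacy_cost_term(n, d, sigma, eps, delta, sigma_power=4)  # term in (17)
    print(f"sigma={sigma}: NGD term={ngd:.3e}, SSP term={ssp:.3e}, ratio={ssp / ngd:.2f}")
```

The ratio of the two terms equals $\sigma^{2}$, so the SSP term is the smaller one when $\sigma<1$ and the larger one when $\sigma>1$, up to the unspecified constants.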

Putting together Lemma 4.3 and Theorems 4.4 and 4.5, we obtain a high probability error bound for each algorithm as follows.

Corollary 4.1.

Under the regularity conditions (A1)-(A4),

(i) (Post-RL NGD)

$$\mathbb{E}\|\hat{\bm{\beta}}^{\text{priv}}-\bm{\beta}\|^{2}=O\left(\operatorname{tr}(\Sigma^{\text{RL}})+\frac{\sigma^{2}d^{3}\ln^{2}n\ln(1/\delta)}{n^{2}\epsilon^{2}}\right) \tag{18}$$

with probability at least $1-c_{1}e^{-c_{2}\ln n}-e^{-c_{3}d}$.

(ii) (Post-RL SSP)

$$\mathbb{E}\|\hat{\bm{\beta}}^{\text{priv}}-\bm{\beta}\|^{2}=O\left(\operatorname{tr}(\Sigma^{\text{RL}})+\frac{\sigma^{4}d^{3}\ln^{2}n\ln(1/\delta)}{n^{2}\epsilon^{2}}\right) \tag{19}$$

with probability at least $1-c_{1}e^{-c_{2}\ln n}-e^{-c_{3}d}$.

4.3. Variances

As discussed in the Introduction, although a few works (Alabi; Sheffet17) have addressed the uncertainty of DP estimators through confidence bounds and intervals, the exact variance of DP estimators is rarely determined. Recent work, such as Lin2023, has explored the variance of private estimators for population proportions, which have fairly simple structures. The main barrier to inspecting the variance is that, if the noise is injected into intermediate steps of the estimation process rather than into the output, it is difficult to track the variability that the noise introduces to the output estimator, owing to the intricate nature of the algorithm.

The NGD and SSP algorithms are two examples where noise is added in the middle of the estimation process. Operations such as function composition and matrix inversion complicate the inspection of the variance of the output estimator $\hat{\bm{\beta}}^{\text{priv}}$. To address this issue, we investigate the variance of $\hat{\bm{\beta}}^{\text{priv}}$ for the two algorithms by studying the variances of two proxy estimators. The theoretical variances of the proxy estimators can be used to approximate those of $\hat{\bm{\beta}}^{\text{priv}}$.

Theorem 4.6 (Variance for Post-RL NGD).

In Algorithm 1, if we consider the estimator without projections

$$\bm{\beta}^{t+1}=\bm{\beta}^{t}-\frac{\eta}{n}W^{\top}(W\bm{\beta}^{t}-\bm{z})+\bm{u}_{t}, \tag{20}$$

then the variance of the $T$th iterate is given by

$$\Sigma=\sum_{t=1}^{T}(I_{d}-A)^{t-1}\cdot B^{\top}\Sigma_{\bm{z}}B\cdot\sum_{t=1}^{T}(I_{d}-A)^{t-1}+\omega^{2}\sum_{t=1}^{T}(I_{d}-A)^{2t-2}, \tag{21}$$

where $I_{d}$ is the identity matrix of size $d$, $A\stackrel{def}{=}\frac{\eta}{n}W^{\top}W$, $B\stackrel{def}{=}\frac{\eta}{n}W$, and $\omega^{2}$ is the variance of $\bm{u}_{t}$.

Remark 4.1.

In the non-private case where $\omega^{2}=0$, let $T\to\infty$, in which case

$$\Sigma\to A^{-1}B^{\top}\Sigma_{\bm{z}}BA^{-1}=(W^{\top}W)^{-1}W^{\top}\Sigma_{\bm{z}}W(W^{\top}W)^{-1}=\Sigma^{\text{RL}},$$

which is exactly the variance of $\hat{\bm{\beta}}^{\text{RL}}$ given in (4).
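The limit in Remark 4.1 can be verified numerically by evaluating (21) directly. The sketch below is illustrative: the function name `ngd_variance`, the placeholder choice $\Sigma_{\bm{z}}=\sigma^{2}I_{n}$, and the values of `n`, `d`, `eta`, and `T` are assumptions, with `eta` chosen so that the powers of $I_{d}-A$ decay.

```python
import numpy as np

def ngd_variance(W, Sigma_z, eta, T, omega2):
    """Evaluate (21): S B' Sigma_z B S + omega^2 * sum_t (I - A)^(2t-2)."""
    n, d = W.shape
    A = (eta / n) * W.T @ W
    B = (eta / n) * W
    M = np.eye(d) - A
    S = np.zeros((d, d))        # accumulates sum_{t=1}^T (I - A)^(t-1)
    N = np.zeros((d, d))        # accumulates sum_{t=1}^T (I - A)^(2t-2)
    P = np.eye(d)               # current power (I - A)^(t-1)
    for _ in range(T):
        S += P
        N += P @ P
        P = P @ M
    return S @ B.T @ Sigma_z @ B @ S + omega2 * N

rng = np.random.default_rng(0)
n, d, sigma, eta = 300, 2, 1.0, 1.0        # illustrative values, not from the paper
W = rng.uniform(-1.0, 1.0, size=(n, d))
Sigma_z = sigma**2 * np.eye(n)             # placeholder covariance of z (assumption)

G = np.linalg.inv(W.T @ W)
Sigma_rl = G @ W.T @ Sigma_z @ W @ G

Sigma_limit = ngd_variance(W, Sigma_z, eta, T=400, omega2=0.0)
Sigma_noisy = ngd_variance(W, Sigma_z, eta, T=400, omega2=1e-4)
print(np.trace(Sigma_limit), np.trace(Sigma_rl), np.trace(Sigma_noisy))
```

With `omega2=0` and large `T`, the result matches $\Sigma^{\text{RL}}$; a positive `omega2` adds the accumulated noise term on top of it.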

The estimator in Algorithm 1 is a projected variant of (20). The use of projection with level $C$ on $\bm{\beta}^{t+1}$ in (11) impedes the exact analysis of the variance of $\hat{\bm{\beta}}^{\text{priv}}$. Instead, we provide the variance in (21) for the non-projected estimator as a conservative variance for $\hat{\bm{\beta}}^{\text{priv}}$. The level of projection, the scale of noise, and the number of iterations together determine how conservative it is. From Remark 4.1, we know that as $T$ increases, the first term on the RHS of (21) approaches $\Sigma^{\text{RL}}$. The second term, $\omega^{2}\sum_{t=1}^{T}(I_{d}-A)^{2t-2}$, summarizes the cumulative variability resulting from adding random noise at each iteration. Note that this term does not converge by simply increasing $T$, because a smaller per-iteration budget leads to larger noise at each iteration.

Theorem 4.7 (Variance for Post-RL SSP).

For Algorithm 2, let $\hat{\bm{\beta}}^{\prime}=\hat{\bm{\beta}}^{\text{RL}}+(W^{\top}W)^{-1}\bm{u}-(W^{\top}W)^{-1}\cdot U\big(\hat{\bm{\beta}}^{\text{RL}}+(W^{\top}W)^{-1}\bm{u}\big)$; then $\hat{\bm{\beta}}^{\text{priv}}-\hat{\bm{\beta}}^{\prime}\stackrel{p}{\to}0$ as $n\to\infty$. The variance of $\hat{\bm{\beta}}^{\prime}$ is given by

$$\Sigma=\Sigma^{\text{RL}}+\omega^{2}(W^{\top}W)^{-1}(I_{d}+\Sigma_{0}+\Sigma_{1}+\Sigma_{2})(W^{\top}W)^{-1}, \tag{22}$$

where $\Sigma^{\text{RL}}=\operatorname{Cov}(\hat{\bm{\beta}}^{\text{RL}})$ and the entries of $\Sigma_{0}$, $\Sigma_{1}$ and $\Sigma_{2}$ are given by

  • $(\Sigma_{0})_{kk}=\sum_{i=1}^{d}\beta_{i}^{2}$ for $k=1,\dots,d$; $(\Sigma_{0})_{kl}=\beta_{k}\beta_{l}$ for $k\neq l$.

  • $(\Sigma_{1})_{kk}=\sum_{i=1}^{d}\Sigma^{\text{RL}}_{ii}$ for $k=1,\dots,d$; $(\Sigma_{1})_{kl}=\Sigma^{\text{RL}}_{kl}$ for $k\neq l$.

  • $(\Sigma_{2})_{kk}=\sum_{i=1}^{d}\Sigma^{\prime}_{ii}$ for $k=1,\dots,d$; $(\Sigma_{2})_{kl}=\Sigma^{\prime}_{kl}$ for $k\neq l$, where $\Sigma^{\prime}\stackrel{def}{=}\omega^{2}(W^{\top}W)^{-2}$.

Remark 4.2.

As shown in the proof of Theorem 4.7 (see the supplemental), the proxy estimator $\hat{\bm{\beta}}^{\prime}$ is a first-order approximation of $\hat{\bm{\beta}}^{\text{priv}}$, obtained from the Taylor series of the term $(I+U(W^{\top}W)^{-1})^{-1}$ that appears in the decomposition of $\hat{\bm{\beta}}^{\text{priv}}$.
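The quality of this first-order approximation can be seen in a small numerical sketch. It assumes that the private estimator has the form $(W^{\top}W+U)^{-1}(W^{\top}\bm{z}+\bm{u})$, which is how we read the decomposition above rather than a restatement of Algorithm 2; the symmetric Gaussian draw for $U$, the noise scale `omega`, and the data-generating values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, omega = 5000, 2, 5.0                 # illustrative values, not from the paper
W = rng.uniform(-1.0, 1.0, size=(n, d))
z = W @ np.array([1.0, -0.5]) + rng.standard_normal(n)

# Hypothetical noise draws: a symmetric Gaussian matrix U and a Gaussian vector u.
upper = np.triu(rng.normal(scale=omega, size=(d, d)))
U = upper + np.triu(upper, 1).T
u = rng.normal(scale=omega, size=d)

G = np.linalg.inv(W.T @ W)
beta_rl = G @ W.T @ z                                   # non-private RL estimator
beta_priv = np.linalg.solve(W.T @ W + U, W.T @ z + u)   # assumed form of the private estimator

# Proxy from Theorem 4.7: beta_rl + G u - G U (beta_rl + G u).
shifted = beta_rl + G @ u
beta_proxy = shifted - G @ U @ shifted

gap_proxy = np.linalg.norm(beta_priv - beta_proxy)      # second-order remainder
gap_rl = np.linalg.norm(beta_priv - beta_rl)            # full effect of the noise
print(gap_proxy, gap_rl)
```

The remainder `gap_proxy` is of second order in $U(W^{\top}W)^{-1}$, so it shrinks much faster than `gap_rl` as $n$ grows.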

The variance of $\hat{\bm{\beta}}^{\prime}$ also consists of two parts: the variance of the non-private estimator $\hat{\bm{\beta}}^{\text{RL}}$ and the additional variation due to the noise injected for privacy purposes. Given Assumption (A4), we have $\|(W^{\top}W)^{-1}\|=O(d/n)$, which appears in $\Sigma_{1}$ and $\Sigma_{2}$. As $n$ increases, the dominant component of the second term is therefore $\omega^{2}(W^{\top}W)^{-1}(I_{d}+\Sigma_{0})(W^{\top}W)^{-1}$.

5. Numerical Results

To evaluate the finite-sample performance of the proposed algorithms, we conduct a series of simulation studies and an application to a synthetic dataset that contains real data.

5.1. Simulation Studies

In this section, we conduct simulation studies to assess the performance of the two proposed algorithms for simple linear regression with linked data. The non-private OLS estimator and the RL estimator $\hat{\bm{\beta}}^{\text{RL}}$ by lahiri2005 are included as benchmarks. The private, non-RL counterpart methods are also run in the absence of linkage errors for comparison.

For each simulation, a fixed design matrix $X$ and a matching probability matrix $Q$ are produced, and a total of 1000 repetitions are run over the randomness of both the intrinsic error $\bm{e}\sim\mathcal{N}(0,\sigma^{2}I_{n})$ of the regression model and the noise injected for privacy. Figure 3 displays the $\ell_{2}$ relative error and both empirical and theoretical variances for each setting.

Three sets of simulations are conducted to explore the performance with varying sample size $n$, varying $\sigma$ (the standard deviation of the homoskedastic random error in linear model (1)), and varying linkage accuracy $\gamma$. The parameters are set as follows:

  • ELE linkage model: blockwise linkage accuracy $\gamma_i$ characterizing $Q$, block size $n_i = 25$.

    • Settings 1 and 2: $\gamma_i \sim \text{uniform}(0.6, 0.9)$, $M = 1$ in Assumption (A2).

    • Setting 3: the linkage accuracy $\gamma_i \equiv \gamma$ varies from 0.6 to 1, while $M$ scales from 1 to 0.

  • regression model: $x_1, \dots, x_n \stackrel{i.i.d.}{\sim} \text{uniform}(-1, 1)$, true regression coefficient $\beta = 1$.

    • Setting 1: $n$ varies from 3,000 to 10,000, $\sigma$ is fixed at 1.

    • Setting 2: $n$ is fixed at 10,000, $\sigma$ varies from 0.5 to 1.8.

    • Setting 3: $n$ is fixed at 10,000, $\sigma$ is fixed at 1.

  • privacy budget: $\epsilon = 1$, $\delta = 1/n^{1.1}$.
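The configuration above can be sketched in a few lines of Python. The sketch assumes the usual exchangeable-linkage-error form for each block of $Q$, with $\gamma_i$ on the diagonal and $(1-\gamma_i)/(n_i-1)$ elsewhere; that form, the helper names `ele_block` and `delta_for`, and the seed are illustrative assumptions rather than details given in the text.

```python
import random

def ele_block(gamma, size):
    """One ELE block (assumed form): gamma on the diagonal,
    (1 - gamma) / (size - 1) in every off-diagonal entry."""
    off = (1.0 - gamma) / (size - 1)
    return [[gamma if r == c else off for c in range(size)] for r in range(size)]

def delta_for(n):
    """Privacy parameter delta = 1 / n^1.1 used in the simulations."""
    return 1.0 / n ** 1.1

rng = random.Random(0)   # illustrative seed
block_size = 25          # n_i = 25
n = 3000                 # smallest sample size in Setting 1

# Settings 1 and 2: one accuracy per block, drawn from uniform(0.6, 0.9).
gammas = [rng.uniform(0.6, 0.9) for _ in range(n // block_size)]

# Q is block-diagonal; only the diagonal blocks are stored here.
blocks = [ele_block(g, block_size) for g in gammas]

# Covariates for simple linear regression.
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
```

Each row of every block sums to one, so each block is a valid matrix of matching probabilities.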

(a) Setting 1: $\sigma = 1$
(b) Setting 1: $\sigma = 1$
(c) Setting 2: $n = 10{,}000$
(d) Setting 2: $n = 10{,}000$
(e) Setting 3: $\sigma = 1$, $n = 10{,}000$
(f) Setting 3: $\sigma = 1$, $n = 10{,}000$
Figure 3. Average $\ell_2$-error and variance (theoretical versus empirical), with $(\epsilon, \delta) = (1, 8.5 \times 10^{-5})$, against $n$ and $\sigma$, respectively.
The “RL-NGD” and “RL-SSP” algorithms are our proposed post-RL approaches applied to the linked data, compared with the non-RL “NGD” and “SSP” methods applied to $(X, \bm{y})$ (i.e., with no linkage errors). The non-private “OLS” and “RL-OLS” (lahiri2005) results are also plotted for benchmarking. The number of iterations for the “RL-NGD” results falls within the range $(210, 260)$.

In Setting 1, where $\sigma$ is fixed at 1, Figure 3(a) shows that the errors of all methods decrease as the sample size grows. Due to the linkage errors, the post-RL methods, including $\hat{\bm{\beta}}^{\text{RL}}$ and our two algorithms (denoted “RL-OLS”, “RL-NGD”, and “RL-SSP” in the figures) run on the linked data $(X, \bm{z})$, always yield larger errors than their counterparts run on $(X, \bm{y})$, where no linkage has to be done beforehand. In this case, with $\sigma = 1$, post-RL SSP outperforms post-RL NGD in terms of both accuracy and variance. However, as $\sigma$ increases, the post-RL NGD algorithm starts to perform better, as depicted in Figure 3(c) with varying $\sigma$. The error grows linearly in $\sigma$ for post-RL NGD and quadratically for post-RL SSP, which aligns with the theoretical error bounds presented in Section 4.2. Similar trends are observed when comparing the non-RL NGD and SSP algorithms. In Figure 3(e), where the linkage error tends to zero, the post-RL versions of the three estimators approach the corresponding non-RL versions. The NGD and SSP methods have strictly larger error than OLS due to the cost of privacy.

Figures 3(b), 3(d) and 3(f) compare the empirical variances (EMP) with the theoretical variances (THR) of the proxy estimators given in Section 4.3. The theoretical variance of post-RL NGD closely aligns with the empirical variance at the chosen projection level $C$. Recall that the theoretical variance would be exact when no projection is applied. Thus, with a lower projection level on the gradient update, we expect the theoretical variance to be conservative. On the other hand, the theoretical variance of post-RL SSP approximates the empirical variance well when $n$ is moderately large and $\sigma$ is small. In scenarios with small $n$ and/or large $\sigma$, however, our theoretical variance may underestimate the true variance because the approximation relies on a first-order Taylor expansion. In such cases, one can include higher-order terms for a better approximation. In Setting 3, where $n$ and $\sigma$ are fixed, the variance decreases as the linkage error vanishes, because less DP noise is needed.

5.2. Application to Synthetic Data

Due to privacy concerns, pairs of datasets containing personal information, which serve as quasi-identifiers, are typically not made public. We instead synthesize a dataset from a pair of generated quasi-identifier datasets and real data for regression, as in Chambers2019SmallAE. For quasi-identifiers, we take advantage of the datasets generated by the Freely Extensible Biomedical Record Linkage (Febrl) system, which are available in the Python module RecordLinkage by rlpython. The pair of datasets we use for linkage corresponds to 5000 individuals. The domain indicator (state) is used for blocking. The record linkage is performed based on the Jaro-Winkler score (Jaro1989AdvancesIR) or exact comparison on 6 quasi-identifiers (names, date of birth, address, etc.). The maximum score is 6 for pairs that align exactly. A threshold of 4 is chosen to link the records. For those left unlinked, we assign random links to ensure one-to-one linkage. A unique identifier is available in the dataset for verification purposes. The resulting linkage accuracies for the 9 blocks are $\bm{\gamma} = (0.880, 0.903, 0.918, 0.938, 0.905, 0.875, 0.898, 0.917, 0.898)$, making the overall accuracy 92.5%. We adopt the ELE model for $Q$ and estimate it using $\bm{\gamma}$.

On the other hand, an anonymous dataset for regression comes from the Survey on Household Income and Wealth (SHIW) from italydataset. The net disposable income and consumption are the explanatory variable $X$ and the response $\bm{y}$, respectively. Since the SHIW dataset is larger, consisting of 8151 data points, we drop the outliers, randomly draw 5000 records, and synthesize them with the Febrl dataset. Figure 4 depicts the setup of the synthesization process. Using the unique identifier from the Febrl dataset, the regression variables $(X, \bm{y})$ are appended to the quasi-identifiers $(\Phi_A, \Phi_B)$, resulting in two separate datasets: $(\Phi_A, X)$ and $(\Phi_B, \bm{y})$. Then, record linkage is performed by comparing $\Phi_A$ and $\Phi_B$ to output the linked data $(X, \bm{z})$ and the matrix $Q$.

[Diagram: the Febrl dataset supplies $(\Phi_A, \Phi_B)$ and the SHIW dataset supplies $(X, \bm{y})$; these are synthesized into $(\Phi_A, X)$ and $(\Phi_B, \bm{y})$, and the output $(X, \bm{z}, Q)$ is marked as our focus.]
Figure 4. Synthesization. The Febrl dataset provides quasi-identifiers $(\Phi_A, \Phi_B)$, and the SHIW dataset provides regression variables $(X, \bm{y})$.

To apply the proposed DP algorithms to the synthesized dataset, we set the (hyper)parameters as follows. The privacy budget is given by $(\epsilon, \delta) = (1, 8.5 \times 10^{-5})$. The variance of the random error, $\sigma^2$, is estimated by the MSE. The upper bounds in Assumptions (A1)-(A3) are set as $M = 1$, $c_0 = 1$, and $c_x = \max(X) = 2.78$. In the NGD method, the projection level $C$ is set to 1.2.
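The value of $\delta$ used here appears to be consistent with the simulation rule $\delta = 1/n^{1.1}$ from Section 5.1 evaluated at $n = 5000$. The text does not state this link for the application, so the short check below is an observation under that assumption.

```python
n = 5000                 # number of linked individuals in the application
delta = 1.0 / n ** 1.1   # simulation rule from Section 5.1, assumed to carry over
print(f"{delta:.2e}")    # compare with the reported 8.5e-5
```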

To illustrate the importance of propagating linkage uncertainty when conducting downstream regression, we also apply the non-RL versions of the NGD and SSP algorithms. We obtain the non-RL regression results by running the post-RL NGD and post-RL SSP methods with $M$ set to 0 and without converting $X$ into $W$. This is equivalent to applying the non-RL methods discussed in Cai2019TheCO; Sheffet17 to the linked set $(X, \bm{z})$ as if it were perfectly linked.
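A noise-free toy calculation illustrates why skipping the conversion of $X$ into $W$ matters. The sketch assumes $W = QX$ (the transformation behind the RL estimator of lahiri2005), a single ELE block with $\gamma$ on the diagonal and $(1-\gamma)/(n_i-1)$ off it, and $\mathbb{E}[\bm{z}] = QX\beta$; the helper names and the toy covariates are hypothetical.

```python
def ele_matrix(gamma, size):
    """Single ELE block (assumed form) used as the matching probability matrix Q."""
    off = (1.0 - gamma) / (size - 1)
    return [[gamma if r == c else off for c in range(size)] for r in range(size)]

def matvec(mat, vec):
    """Matrix-vector product for plain Python lists."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]

def slope(u, v):
    """No-intercept least-squares slope of v on u."""
    return sum(a * b for a, b in zip(u, v)) / sum(a * a for a in u)

beta = 1.0
x = [-1.0, -0.5, 0.0, 0.5, 1.0]          # toy centered covariates
q = ele_matrix(gamma=0.9, size=len(x))   # 90% linkage accuracy
w = matvec(q, x)                         # W = QX
z_mean = [beta * wi for wi in w]         # E[z] = Q X beta, regression noise omitted

naive = slope(x, z_mean)       # treats (X, z) as perfectly linked
corrected = slope(w, z_mean)   # regresses on W instead of X
```

With centered covariates, the naive slope shrinks to $(\gamma - (1-\gamma)/(n_i-1))\beta = 0.875$ in this toy example, while the slope on $W$ recovers $\beta = 1$.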

Figure 5. Boxplots of DP estimates based on 1000 repetitions with $(\epsilon, \delta) = (1, 8.5 \times 10^{-5})$.
The red dashed line indicates the OLS estimate. The proposed post-RL algorithms are compared with the non-RL “NGD” and “SSP” methods applied to $(X, \bm{z})$ (i.e., without accounting for the linkage errors present). The third and fourth columns represent the two NGD methods running for $T = \lceil L^2 \ln(c_0^2 n)/3 \rceil$ iterations.

Figure 5 displays the boxplots of the estimates of each algorithm. For each algorithm, a total of 1000 repetitions are run to reflect the randomness of the noise injected for privacy purposes. The variables $X$ and $\bm{y}$ are standardized before conducting simple linear regression. The OLS estimator on $(X, \bm{y})$ (dashed line) is plotted for comparison. As can be seen, the DP estimators obtained by running (non-RL) NGD and SSP on $(X, \bm{z})$ directly are substantially biased as a consequence of ignoring linkage errors, even when the overall linkage accuracy is as high as 92.5%. Conversely, post-RL NGD and post-RL SSP yield estimates centered around the OLS estimator but with higher variances, attributed to the cost of bias correction. Post-RL NGD is more flexible due to hyperparameter tuning. Additionally, we run the NGD methods for fewer iterations with $T = \lceil L^2 \ln(c_0^2 n)/3 \rceil$, which is one-third of the value recommended by theory. We have found that this approach yields smaller variance while still producing accurate results in finite samples. Therefore, the theoretical number of iterations $T = \lceil L^2 \ln(c_0^2 n) \rceil$ may be conservative in some circumstances. Moderately reducing $T$ may lead to better results.
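The two iteration counts can be computed as follows. This is a minimal sketch: the function name `ngd_iterations` and its `shrink` argument are hypothetical, and $L = 2$ is a placeholder because the value of $L$ used in the application is not reported here.

```python
import math

def ngd_iterations(L, c0, n, shrink=1.0):
    """T = ceil(L^2 * ln(c0^2 * n) / shrink).
    shrink=1 is the theoretical count; shrink=3 is the reduced schedule."""
    return math.ceil(L ** 2 * math.log(c0 ** 2 * n) / shrink)

# L = 2.0 is a placeholder value; c0 = 1 and n = 5000 follow the application.
t_theory = ngd_iterations(L=2.0, c0=1.0, n=5000)
t_reduced = ngd_iterations(L=2.0, c0=1.0, n=5000, shrink=3.0)
```

Because each iteration adds fresh privacy noise, a smaller $T$ trades some optimization accuracy for less accumulated noise, which is in line with the smaller variance reported above.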

6. Discussion

In this paper, we propose two differentially private algorithms for linear regression on a linked dataset that contains linkage errors, by leveraging the existing work on (1) linear regression after record linkage, and (2) differentially private linear regression. Figure 6 displays the connections among the related areas at a high level, including PPRL and SMPC mentioned in Sections 1 and 2.3. Our work is the first one to simultaneously consider the linkage uncertainty propagation and the privatization of the output. It also complements the area of PPRL where the main concern is the data leakage among different parties. However, we do not discuss how to link the records in the first place and thus the security issues of the linkage process are beyond our scope. Instead, we treat record linkage from a secondary perspective: we begin with linked data prepared by an external entity and we have limited information about the linkage quality.

[Diagram labels: Record linkage, Regression analysis, Secure protocol, Private output; Statistical RL model, PPRL, DP-based method, SMPC; regions marked “Our focus” and “Our scope”.]
Figure 6. Diagram of related research areas. A secure protocol ensures no data is revealed to external parties during the linkage process.

Specifically, we propose two post-RL algorithms based on the noisy gradient descent and sufficient statistics perturbation methods from the DP literature. We provide privacy guarantees and finite-sample error bounds for these algorithms and discuss the variances of the private estimators. Our simulation studies and the application demonstrate the following: (1) the proposed estimators converge as the sample size increases; (2) post-RL linear regression incurs a higher cost than the non-RL counterpart in terms of the privacy-accuracy tradeoff; (3) the NGD method is flexible with hyperparameter tuning and can be applied to more general optimization problems; (4) SSP is specific to the least-squares problem, offering greater budget efficiency and more accurate results provided that the random error of the regression model is not too large.

There are different directions in which to extend our work. Note that there may be different scenarios of linking between the two datasets of the same set of entities. Assuming one-to-one linkage, as in our paper, is a canonical scenario. Although we do not explore it, we expect that our methods can be extended to other scenarios (e.g., one-to-many linkage) where $Q$ still makes sense. Extra assumptions may be required when determining the relevant sensitivities for privacy purposes.

One can also consider record linkage from a primary perspective. In addition to the traditional Fellegi–Sunter model, Bayesian approaches and machine learning-based methods have gained popularity. The record linkage may take forms other than the matching probability matrix adopted here. Furthermore, when privacy concerns arise during the linkage process involving different parties, PPRL and SMPC protocols become essential. Tackling all the challenges depicted in Figure 6 simultaneously with a single efficient tool is of great practical use and significance. This interdisciplinary challenge requires expertise in both statistics and computer science.

Another important direction is exploring related statistical problems in the post-RL context, with or without privacy constraints. For example, confidence intervals and hypothesis testing are fundamental statistical inference tools. Other potential problems that interest statisticians include high-dimensional linear regression and ridge regression.

Appendix A Proofs

A.1. Lemmas

The lemmas here support the proofs in Section A.2.

Lemma A.1.

If the minimum and maximum eigenvalues of $W^\top W/n$ satisfy $0 < a < \lambda_{\min}(W^\top W/n) \le \lambda_{\max}(W^\top W/n) < b$ for some constants $a$ and $b$, then the loss function $\mathcal{L}_n(\bm{\beta}) = \frac{1}{2n}(\bm{z} - W\bm{\beta})^\top(\bm{z} - W\bm{\beta})$ is $b$-smooth and $a$-strongly convex.

Proof.

Since $\nabla\mathcal{L}_n(\bm{\beta}) = \frac{1}{n}(W^\top W\bm{\beta} - W^\top\bm{z})$, for any $\bm{\beta}_1, \bm{\beta}_2$,

\[
\begin{aligned}
(\nabla\mathcal{L}_n(\bm{\beta}_1) - \nabla\mathcal{L}_n(\bm{\beta}_2))^\top(\bm{\beta}_1 - \bm{\beta}_2)
&= (\bm{\beta}_1 - \bm{\beta}_2)^\top \frac{W^\top W}{n} (\bm{\beta}_1 - \bm{\beta}_2) \\
&\ge \lambda_{\min}\left(\frac{W^\top W}{n}\right) \|\bm{\beta}_1 - \bm{\beta}_2\|^2 \\
&\ge a \|\bm{\beta}_1 - \bm{\beta}_2\|^2.
\end{aligned}
\]

By definition, $\mathcal{L}_n(\bm{\beta})$ is $a$-strongly convex.

For smoothness, we have

\[
\begin{aligned}
\|\nabla\mathcal{L}_n(\bm{\beta}_1) - \nabla\mathcal{L}_n(\bm{\beta}_2)\|
&= \left\|\frac{W^\top W}{n}(\bm{\beta}_1 - \bm{\beta}_2)\right\| \\
&\le \left\|\frac{W^\top W}{n}\right\| \|\bm{\beta}_1 - \bm{\beta}_2\| \\
&= \lambda_{\max}\left(\frac{W^\top W}{n}\right) \|\bm{\beta}_1 - \bm{\beta}_2\| \\
&\le b \|\bm{\beta}_1 - \bm{\beta}_2\|.
\end{aligned}
\]

The second equality holds because $\|A\| = \lambda_{\max}(A)$ for a symmetric positive semidefinite matrix $A$, which is the case for $W^\top W/n$. By definition, $\mathcal{L}_n(\bm{\beta})$ is $b$-smooth.

One can obtain a neater proof using alternative definitions for a twice differentiable function (see Eq. (4) and (10) in lecture_smoothness2):

(1) $f$ is $\mu$-strongly convex if $\lambda_{\min}(\nabla^2 f) \ge \mu$;

(2) $f$ is $L$-smooth if $\lambda_{\max}(\nabla^2 f) \le L$.
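Applied to the loss in Lemma A.1, these two conditions give the result in one step. The following is a sketch of that argument using only quantities defined in the lemma:

\[
\nabla^2 \mathcal{L}_n(\bm{\beta}) = \frac{W^\top W}{n},
\qquad
a < \lambda_{\min}\left(\nabla^2 \mathcal{L}_n\right) \le \lambda_{\max}\left(\nabla^2 \mathcal{L}_n\right) < b,
\]

so $\mathcal{L}_n$ is $a$-strongly convex by the first condition and $b$-smooth by the second.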

Lemma A.2 (Bubeck2015, Proof of Theorem 3.10.).

Let $f$ be $\alpha$-strongly convex and $\beta$-smooth on $\mathcal{X}$, and let $x^*$ be the minimizer of $f$ on $\mathcal{X}$. Then projected gradient descent with step size $\eta = \frac{1}{\beta}$ satisfies, for $t \ge 0$,

\[
\|x_{t+1} - x^*\|^2 \le \left(1 - \frac{\alpha}{\beta}\right) \|x_t - x^*\|^2.
\]
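The contraction in Lemma A.2 can be observed numerically on a toy problem. The sketch below assumes the quadratic $f(x) = \frac{1}{2}(\alpha x_1^2 + \beta x_2^2)$ on $\mathcal{X} = \mathbb{R}^2$, so the projection is the identity and the minimizer is the origin; the constants, starting point, and function names are illustrative choices.

```python
def gradient_step(x, alpha, beta):
    """One gradient step with step size 1/beta on
    f(x) = 0.5 * (alpha * x[0]**2 + beta * x[1]**2), minimized at the origin."""
    grad = (alpha * x[0], beta * x[1])
    return (x[0] - grad[0] / beta, x[1] - grad[1] / beta)

def sq_norm(x):
    """Squared Euclidean norm, i.e. squared distance to the minimizer (0, 0)."""
    return x[0] ** 2 + x[1] ** 2

alpha, beta = 0.5, 2.0   # strong-convexity and smoothness constants (toy values)
x = (3.0, -4.0)          # illustrative starting point
ratios = []
for _ in range(10):
    nxt = gradient_step(x, alpha, beta)
    ratios.append(sq_norm(nxt) / sq_norm(x))   # per-step contraction factor
    x = nxt
```

Every entry of `ratios` stays at or below $1 - \alpha/\beta = 0.75$, as the lemma predicts for this example.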
Lemma A.3.

$\|\bm{a} + \bm{b}\|^2 \le (1 + c^2)\|\bm{a}\|^2 + (1 + 1/c^2)\|\bm{b}\|^2$ for any scalar $c \ne 0$.

Proof.

Since

\[
c^2\|\bm{a}\|^2 + \frac{1}{c^2}\|\bm{b}\|^2 - 2\|\bm{a}\|\|\bm{b}\| = \left(\|c\bm{a}\| - \left\|\frac{1}{c}\bm{b}\right\|\right)^2 \ge 0,
\]

it follows that

\[
\|\bm{a} + \bm{b}\|^2 \le (\|\bm{a}\| + \|\bm{b}\|)^2 \le \|\bm{a}\|^2 + \|\bm{b}\|^2 + 2\|\bm{a}\|\|\bm{b}\| \le (1 + c^2)\|\bm{a}\|^2 + \left(1 + \frac{1}{c^2}\right)\|\bm{b}\|^2.
\]

Lemma A.4 (Cai2019TheCO, Lemma A.2).

For $X_1, \dots, X_k \stackrel{i.i.d.}{\sim} \chi^2_d$, $\zeta > 0$, $0 < \rho < 1$,

\[
\mathbb{P}\left(\sum_{j=1}^{k} \zeta\rho^{j} X_j > \frac{\rho\zeta d}{1 - \rho} + s\right) \le \exp\left(-\min\left(\frac{(1 - \rho^2)s^2}{8\rho^2\zeta^2 d}, \frac{s}{8\rho\zeta}\right)\right).
\]
Lemma A.5 (Sheffet17, Proof of Proposition D.2).

For any invertible matrix $A$ and any matrix $B$ such that $(I + BA^{-1})$ is invertible,

\[
(A + B)^{-1} = A^{-1} - A^{-1}(I + BA^{-1})^{-1}BA^{-1}.
\]
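The identity in Lemma A.5 can be checked directly. The short derivation below is a sketch added for convenience and uses only the invertibility assumptions of the lemma. Writing $A + B = (I + BA^{-1})A$ gives $(A + B)^{-1} = A^{-1}(I + BA^{-1})^{-1}$, and

\[
A^{-1} - A^{-1}(I + BA^{-1})^{-1}BA^{-1}
= A^{-1}(I + BA^{-1})^{-1}\left[(I + BA^{-1}) - BA^{-1}\right]
= A^{-1}(I + BA^{-1})^{-1},
\]

so the two expressions for $(A + B)^{-1}$ coincide.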
Lemma A.6.

Let $X$ be a $d \times d$ symmetric random matrix with i.i.d. upper triangle entries. Each entry has mean 0 and variance $\sigma^2$. Let $\bm{y}$ be a $d$-dimensional random vector, which has mean $\bm{\mu}$ and covariance matrix $\Sigma$. Let $\Sigma_{X\bm{y}}$ denote the covariance matrix of $X\bm{y}$. Then, the diagonal entries of $\Sigma_{X\bm{y}}$ are given by

\[
(\Sigma_{X\bm{y}})_{kk} = \sigma^2 \sum_{i=1}^{d} (\mu_i^2 + \Sigma_{ii}); \tag{23}
\]

the off-diagonal entries are

\[
(\Sigma_{X\bm{y}})_{kl} = \sigma^2 (\mu_k\mu_l + \Sigma_{kl}) \text{ for } k \ne l. \tag{24}
\]
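As a sanity check on (23), consider the scalar case $d = 1$. The calculation below additionally assumes that $X$ and $\bm{y}$ are independent, which the lemma does not state explicitly; it is a sketch rather than part of the original argument:

\[
\operatorname{Var}(Xy) = \mathbb{E}[X^2]\,\mathbb{E}[y^2] - \left(\mathbb{E}[X]\,\mathbb{E}[y]\right)^2 = \sigma^2(\mu_1^2 + \Sigma_{11}),
\]

which agrees with (23) when $d = 1$.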
Proof.

Let X=(𝒙1,𝒙2,,𝒙d)𝑋subscript𝒙1subscript𝒙2subscript𝒙𝑑X=(\bm{x}_{1},\bm{x}_{2},...,\bm{x}_{d})italic_X = ( bold_italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , bold_italic_x start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ) where 𝒙i=(xi1,xi2,,xid)Tsubscript𝒙𝑖superscriptsubscript𝑥𝑖1subscript𝑥𝑖2subscript𝑥𝑖𝑑𝑇\bm{x}_{i}=(x_{i1},x_{i2},...,x_{id})^{T}bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ( italic_x start_POSTSUBSCRIPT italic_i 1 end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_i 2 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_i italic_d end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT and y=(y1,y2,,yd)T𝑦superscriptsubscript𝑦1subscript𝑦2subscript𝑦𝑑𝑇y=(y_{1},y_{2},...,y_{d})^{T}italic_y = ( italic_y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_y start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT. Then,

\[X\bm{y}=\sum_{i=1}^{d}\bm{x}_{i}y_{i}.\tag{25}\]

Therefore,

\[\operatorname{Var}(X\bm{y})=\operatorname{Var}\left(\sum_{i=1}^{d}\bm{x}_{i}y_{i}\right)=\sum_{i=1}^{d}\operatorname{Var}(\bm{x}_{i}y_{i})+\sum_{i\neq j}^{d}\operatorname{Cov}(\bm{x}_{i}y_{i},\bm{x}_{j}y_{j}).\tag{26}\]

For the first term,

\[
\begin{aligned}
\operatorname{Var}(\bm{x}_{i}y_{i}) &=\mathbb{E}[\operatorname{Var}(\bm{x}_{i}y_{i}\mid y_{i})]+\operatorname{Var}[\mathbb{E}(\bm{x}_{i}y_{i}\mid y_{i})]\\
&=\mathbb{E}[y_{i}^{2}\operatorname{Var}(\bm{x}_{i}\mid y_{i})]+\operatorname{Var}[y_{i}\mathbb{E}(\bm{x}_{i}\mid y_{i})]\\
&=\mathbb{E}(y_{i}^{2})\operatorname{Var}(\bm{x}_{i})+\operatorname{Var}(y_{i})\mathbb{E}(\bm{x}_{i})\\
&=\sigma^{2}(\mu_{i}^{2}+\Sigma_{ii})I_{d}.
\end{aligned}
\]

For $i\neq j$,

\[
\begin{aligned}
\operatorname{Cov}(\bm{x}_{i}y_{i},\bm{x}_{j}y_{j}) &=\mathbb{E}[\bm{x}_{i}y_{i}(\bm{x}_{j}y_{j})^{T}]-\mathbb{E}(\bm{x}_{i}y_{i})\mathbb{E}(\bm{x}_{j}y_{j})\\
&=\mathbb{E}[(y_{i}y_{j})\bm{x}_{i}\bm{x}_{j}^{T}]-\mathbb{E}(\bm{x}_{i})\mathbb{E}(y_{i})\mathbb{E}(\bm{x}_{j})\mathbb{E}(y_{j})\\
&=\mathbb{E}(y_{i}y_{j})\mathbb{E}(\bm{x}_{i}\bm{x}_{j}^{T}),
\end{aligned}
\]

where

\[\mathbb{E}(y_{i}y_{j})=\mathbb{E}(y_{i})\mathbb{E}(y_{j})+\operatorname{Cov}(y_{i},y_{j})=\mu_{i}\mu_{j}+\Sigma_{ij},\]

and $\mathbb{E}(\bm{x}_{i}\bm{x}_{j}^{T})$ is a $d\times d$ matrix whose $(j,i)$ entry is $\sigma^{2}$ and whose other entries are 0. Substituting $\operatorname{Var}(\bm{x}_{i}y_{i})$ and $\operatorname{Cov}(\bm{x}_{i}y_{i},\bm{x}_{j}y_{j})$ back into (26), we see that $\operatorname{Var}(X\bm{y})$ has the diagonal entries

\[\sigma^{2}\sum_{i=1}^{d}(\mu_{i}^{2}+\Sigma_{ii})\]

and the $(k,l)$ off-diagonal entry

\[\sigma^{2}(\mu_{k}\mu_{l}+\Sigma_{kl}).\]

Remark A.1.

In Lemma A.6, $\operatorname{Var}(X\bm{y})$ is given by $\sigma^{2}(\bm{\mu}\bm{\mu}^{T}+\Sigma)$ with each diagonal entry replaced by the trace of that matrix.
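The closed form in Remark A.1 can be checked numerically. The following Python sketch is only illustrative: the statement of Lemma A.6 is not reproduced here, so the code assumes that $X$ is a symmetric matrix whose entries on and above the diagonal are independent with mean 0 and variance $\sigma^{2}$ (which is what the proof's claim about $\mathbb{E}(\bm{x}_{i}\bm{x}_{j}^{T})$ suggests), and that $\bm{y}$ is Gaussian with mean $\bm{\mu}$ and covariance $\Sigma$, independent of $X$. The function names and the numerical values are hypothetical.

```python
import numpy as np


def lemma_a6_covariance(mu, Sigma, sigma2):
    """sigma^2 (mu mu^T + Sigma) with every diagonal entry replaced by its trace."""
    M = sigma2 * (np.outer(mu, mu) + Sigma)
    V = M.copy()
    np.fill_diagonal(V, np.trace(M))
    return V


def simulated_covariance(mu, Sigma, sigma2, n_draws, rng):
    """Empirical covariance of X y over independent draws of (X, y)."""
    d = len(mu)
    G = rng.normal(scale=np.sqrt(sigma2), size=(n_draws, d, d))
    # ASSUMPTION: X is symmetric, so keep the upper triangle and mirror it.
    strict_upper = np.triu(G, k=1)
    X = np.triu(G) + np.transpose(strict_upper, (0, 2, 1))
    # ASSUMPTION: y is Gaussian and independent of X.
    y = rng.multivariate_normal(mu, Sigma, size=n_draws)
    Xy = np.einsum("nij,nj->ni", X, y)  # one matrix-vector product per draw
    return np.cov(Xy, rowvar=False)


rng = np.random.default_rng(0)
mu = np.array([1.0, -0.5, 2.0])
Sigma = np.array([[1.0, 0.3, 0.0], [0.3, 2.0, 0.5], [0.0, 0.5, 1.5]])
sigma2 = 0.5

V_theory = lemma_a6_covariance(mu, Sigma, sigma2)
V_empirical = simulated_covariance(mu, Sigma, sigma2, n_draws=200_000, rng=rng)
max_abs_gap = np.max(np.abs(V_theory - V_empirical))
print(np.round(V_theory, 3))
print(np.round(V_empirical, 3))
```

Under these assumptions the two printed matrices should agree up to Monte Carlo error; the comparison is a sanity check, not a substitute for the proof.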

Lemma A.7.

Let $X$ be a $d\times d$ random matrix with $\mathbb{E}X=0_{d\times d}$. Let $\bm{y}$ be a $d$-dimensional random vector that is independent of $X$. Then, $\operatorname{Cov}(\bm{y},X\bm{y})=0_{d\times d}$.

Proof.
\[
\begin{aligned}
\operatorname{Cov}(\bm{y},X\bm{y}) &=\mathbb{E}[(\bm{y}-\mathbb{E}\bm{y})(X\bm{y}-\mathbb{E}X\bm{y})^{\top}]\\
&=\mathbb{E}[(\bm{y}-\mathbb{E}\bm{y})(X\bm{y}-\mathbb{E}X\mathbb{E}\bm{y})^{\top}]\\
&=\mathbb{E}[(\bm{y}-\mathbb{E}\bm{y})(X\bm{y})^{\top}]\\
&=\mathbb{E}\bm{y}\bm{y}^{\top}\mathbb{E}X^{\top}-\mathbb{E}\bm{y}\mathbb{E}\bm{y}^{\top}\mathbb{E}X^{\top}=0_{d\times d}.
\end{aligned}
\]

Lemma A.8.

Let $X$ be a $d\times d$ random matrix with $\mathbb{E}X=0_{d\times d}$. Let $\bm{y}$ be a $d$-dimensional random vector. Let $\bm{z}$ be another $d$-dimensional random vector that is independent of both $X$ and $\bm{y}$ and satisfies $\mathbb{E}\bm{z}=0_{d}$. Then, $\operatorname{Cov}(X\bm{y},X\bm{z})=0_{d\times d}$.

Proof.
\[
\begin{aligned}
\operatorname{Cov}(X\bm{y},X\bm{z}) &=\mathbb{E}[(X\bm{y}-\mathbb{E}X\bm{y})(X\bm{z}-\mathbb{E}X\bm{z})^{\top}]\\
&=\mathbb{E}[(X\bm{y}-\mathbb{E}X\mathbb{E}\bm{y})(X\bm{z}-\mathbb{E}X\mathbb{E}\bm{z})^{\top}]\\
&=\mathbb{E}[X\bm{y}\bm{z}^{\top}X^{\top}].
\end{aligned}
\]

Since $\bm{z}$ is independent of both $X$ and $\bm{y}$, and the entries of $\bm{z}$ appear linearly in every entry of $X\bm{y}\bm{z}^{\top}X^{\top}$, the zero expectation of $\bm{z}$ gives $\operatorname{Cov}(X\bm{y},X\bm{z})=0_{d\times d}$.

A.2. Proofs

This section provides the proofs for theorems and lemmas presented in the paper.

To derive this tighter composition for $(\epsilon,\delta)$-DP, we utilize the notion of zero-concentrated differential privacy (zCDP, Bun2016ConcentratedDP), defined as follows.

Definition 3 ($\rho$-zCDP).

A randomized mechanism $M:\mathcal{X}^{n}\rightarrow\mathcal{Y}$ is $\rho$-zero-concentrated-differentially private ($\rho$-zCDP) if, for all $\bm{x}\sim\bm{x}^{\prime}\in\mathcal{X}^{n}$ differing on a single entry and all $\alpha\in(1,\infty)$,

\[\operatorname{D}_{\alpha}(M(\bm{x})\|M(\bm{x}^{\prime}))\leq\rho\alpha,\]

where $\operatorname{D}_{\alpha}(M(\bm{x})\|M(\bm{x}^{\prime}))$ is the $\alpha$-Rényi divergence (van2014renyi) between the distribution of $M(\bm{x})$ and the distribution of $M(\bm{x}^{\prime})$.
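To make Definition 3 concrete, the following Python sketch evaluates the $\alpha$-Rényi divergence between two one-dimensional Gaussians with common variance $s^{2}$, once through the standard closed form $\alpha(m_{1}-m_{2})^{2}/(2s^{2})$ and once by numerical integration of $\int p^{\alpha}q^{1-\alpha}$. Choosing $s^{2}=\Delta^{2}/(2\rho)$ for two means that are $\Delta$ apart makes the divergence equal to $\rho\alpha$, which is the zCDP bound met with equality. The function names, the grid settings, and the values of $\rho$ and $\Delta$ are illustrative assumptions, not part of the paper.

```python
import numpy as np


def renyi_gaussian_closed_form(m1, m2, s, alpha):
    """Renyi divergence of order alpha between N(m1, s^2) and N(m2, s^2)."""
    return alpha * (m1 - m2) ** 2 / (2.0 * s ** 2)


def renyi_gaussian_numeric(m1, m2, s, alpha, half_width=20.0, n_grid=200_001):
    """Same quantity via a Riemann sum of p^alpha * q^(1 - alpha) on a grid."""
    lo = min(m1, m2) - half_width * s
    hi = max(m1, m2) + half_width * s
    x = np.linspace(lo, hi, n_grid)
    dx = x[1] - x[0]
    log_norm = -0.5 * np.log(2.0 * np.pi * s ** 2)
    log_p = log_norm - 0.5 * ((x - m1) / s) ** 2
    log_q = log_norm - 0.5 * ((x - m2) / s) ** 2
    # Combine in log space first so the tails do not overflow.
    integrand = np.exp(alpha * log_p + (1.0 - alpha) * log_q)
    return np.log(np.sum(integrand) * dx) / (alpha - 1.0)


rho = 0.1           # illustrative zCDP parameter
sensitivity = 1.0   # illustrative distance between the two means
s = np.sqrt(sensitivity ** 2 / (2.0 * rho))

alphas = [1.5, 2.0, 5.0]
closed = [renyi_gaussian_closed_form(0.0, sensitivity, s, a) for a in alphas]
numeric = [renyi_gaussian_numeric(0.0, sensitivity, s, a) for a in alphas]
for a, c, v in zip(alphas, closed, numeric):
    print(f"alpha={a}: closed form={c:.6f}, numeric={v:.6f}, rho*alpha={rho * a:.6f}")
```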

Proof of Lemma 10.

Bun2016ConcentratedDP have shown that, like the classic $(\epsilon,\delta)$-DP notion, $\rho$-zCDP enjoys properties including basic composition and post-processing. The corresponding Gaussian mechanism (Proposition 1.6, Bun2016ConcentratedDP) states that an algorithm $f$ is $\rho$-zCDP after adding Gaussian noise $\mathcal{N}\left(0,\frac{\Delta_{f}^{2}}{2\rho}\right)$ to it. In addition, they have shown that $\rho$-zCDP implies $(\rho+2\sqrt{\rho\ln(1/\delta)},\delta)$-DP (Proposition 1.3, Bun2016ConcentratedDP).

Therefore, to achieve $(\epsilon,\delta)$-DP, it suffices for the algorithm to be $\rho$-zCDP with $\rho:=\epsilon+2\ln(1/\delta)-2\sqrt{(\epsilon+\ln(1/\delta))\ln(1/\delta)}$. Using the Gaussian mechanism and the basic composition rule of $\rho$-zCDP, it suffices to add noise

\[u_{t}\sim\mathcal{N}\left(0,\frac{T\Delta_{t}^{2}}{2\rho}\right)\]

to $f_{t}$ for all $t$. Note that if $\epsilon\leq\frac{8\ln(1/\delta)}{2+\sqrt{2}}$, then $\rho\geq\frac{\epsilon^{2}}{8\ln(1/\delta)}$. Therefore, it suffices to add noise

\[u_{t}\sim\mathcal{N}\left(0,\frac{4T\Delta_{t}^{2}\ln(1/\delta)}{\epsilon^{2}}\right)\]

to $f_{t}$ for all $t$.
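The calibration in this proof can be traced numerically. The Python sketch below computes $\rho$ from $(\epsilon,\delta)$ using the expression above, confirms that it solves $\rho+2\sqrt{\rho\ln(1/\delta)}=\epsilon$, and compares the exact per-step noise variance $T\Delta_{t}^{2}/(2\rho)$ with the simplified variance $4T\Delta_{t}^{2}\ln(1/\delta)/\epsilon^{2}$. The function names and the values of $\epsilon$, $\delta$, $T$, and the sensitivity are illustrative assumptions.

```python
import math


def zcdp_budget(eps, delta):
    """rho such that rho-zCDP yields (eps, delta)-DP via the conversion above."""
    log_inv_delta = math.log(1.0 / delta)
    return eps + 2.0 * log_inv_delta - 2.0 * math.sqrt((eps + log_inv_delta) * log_inv_delta)


def noise_variance_exact(eps, delta, n_steps, sensitivity):
    """Per-step variance T * Delta^2 / (2 rho) under basic zCDP composition."""
    return n_steps * sensitivity ** 2 / (2.0 * zcdp_budget(eps, delta))


def noise_variance_simplified(eps, delta, n_steps, sensitivity):
    """Per-step variance 4 T Delta^2 ln(1/delta) / eps^2 (valid for small eps)."""
    return 4.0 * n_steps * sensitivity ** 2 * math.log(1.0 / delta) / eps ** 2


eps, delta = 1.0, 1e-5            # illustrative privacy parameters
n_steps, sensitivity = 50, 2.0    # illustrative T and Delta_t

log_inv_delta = math.log(1.0 / delta)
rho = zcdp_budget(eps, delta)
eps_recovered = rho + 2.0 * math.sqrt(rho * log_inv_delta)
eps_threshold = 8.0 * log_inv_delta / (2.0 + math.sqrt(2.0))
var_exact = noise_variance_exact(eps, delta, n_steps, sensitivity)
var_simplified = noise_variance_simplified(eps, delta, n_steps, sensitivity)

print(f"rho = {rho:.6f}, recovered eps = {eps_recovered:.6f}")
print(f"condition eps <= {eps_threshold:.4f} holds: {eps <= eps_threshold}")
print(f"exact variance = {var_exact:.2f}, simplified variance = {var_simplified:.2f}")
```

The simplified variance is never smaller than the exact one when the condition on $\epsilon$ holds, so using it costs some accuracy in exchange for a cleaner expression.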

Proof of Theorem 4.1(i).

By the composition property of differential privacy, to establish that Algorithm 1 (Post-RL Noisy Gradient Descent) satisfies $(\epsilon,\delta)$-differential privacy, it suffices to show that the computation of $\bm{\beta}^{t+1}$ is $(\epsilon/T,\delta/T)$-differentially private. According to the Gaussian mechanism (Theorem 2.1), showing the latter boils down to proving that the sensitivity is controlled at each gradient step. Let $g^{t+1}(\bm{\beta}^{t};X,\bm{y},Q)=\sum_{i=1}^{n}(\bm{w}_{i}^{\top}\bm{\beta}^{t}-\Pi_{R}(z_{i}))\bm{w}_{i}$. We will show that the $\ell_{2}$-sensitivity of $g^{t+1}$, denoted by $\Delta_{g}$, is bounded by $B$.

Without loss of generality, we assume the neighboring data sets $(X,\Phi_{X},\bm{y},\Phi_{\bm{y}})$ and $(X^{\prime},\Phi_{X^{\prime}},\bm{y}^{\prime},\Phi_{\bm{y}^{\prime}})$ differ in the $j$-th record. Recall that $\bm{z}$ is a permutation of $\bm{y}$ satisfying $q_{ij}=P(z_{i}=y_{j})$. Let $\bm{z}^{\prime}$ be a copy of $\bm{z}$ but with the entry $y_{j}$ changed to $y^{\prime}_{j}$.
Recall that in the record linkage linear model elaborated in lahiri2005, $\bm{w}_{i}=\sum_{j=1}^{n}q_{ij}\bm{x}_{j}$, which are convex combinations of the rows of $X$. We can write $\bm{w}_{i}=\sum_{k\neq j}q_{ik}\bm{x}_{k}+q_{ij}\bm{x}_{j}$ and $\bm{w}^{\prime}_{i}=\sum_{k\neq j}q^{\prime}_{ik}\bm{x}_{k}+q^{\prime}_{ij}\bm{x}^{\prime}_{j}$.
From assumption (A1) that $\|\bm{x}\|<c_{\bm{x}}$ with probability 1, we know $\|\bm{w}_{i}\|<c_{\bm{x}}$ almost surely. Then, we have

\[
\begin{aligned}
\|\bm{w}_{i}-\bm{w}^{\prime}_{i}\| &=\left\|\left(\sum_{k\neq j}q_{ik}\bm{x}_{k}+q_{ij}\bm{x}_{j}\right)-\left(\sum_{k\neq j}q^{\prime}_{ik}\bm{x}_{k}+q^{\prime}_{ij}\bm{x}^{\prime}_{j}\right)\right\|\\
&=\left\|\sum_{k\neq j}(q_{ik}-q^{\prime}_{ik})\bm{x}_{k}+(q_{ij}\bm{x}_{j}-q^{\prime}_{ij}\bm{x}_{j}^{\prime})\right\|\\
&=\left\|\sum_{k\neq j}(q_{ik}-q^{\prime}_{ik})\bm{x}_{k}+(q_{ij}-q^{\prime}_{ij})\bm{x}_{j}+q^{\prime}_{ij}(\bm{x}_{j}-\bm{x}_{j}^{\prime})\right\|\\
&\leq\left\|\sum_{k=1}^{n}(q_{ik}-q^{\prime}_{ik})\bm{x}_{k}\right\|+\left\|q^{\prime}_{ij}(\bm{x}_{j}-\bm{x}_{j}^{\prime})\right\|\\
&\leq c_{\bm{x}}\left(\sum_{k=1}^{n}|q_{ik}-q^{\prime}_{ik}|+2q^{\prime}_{ij}\right).
\end{aligned}
\tag{27}
\]

Since $Q^{\prime}$ is a doubly stochastic matrix, $\sum_{i=1}^{n}q^{\prime}_{ij}=1$ for any $j$. By the arbitrariness of index $j$, it follows that

\[\max_{(X,\bm{y},Q)\sim(X^{\prime},\bm{y}^{\prime},Q^{\prime})}\sum_{i=1}^{n}\|\bm{w}_{i}-\bm{w}^{\prime}_{i}\|\leq\sum_{i=1}^{n}c_{\bm{x}}\left(\sum_{k=1}^{n}|q_{ik}-q^{\prime}_{ik}|+2q^{\prime}_{ij}\right)=c_{\bm{x}}(\|Q-Q^{\prime}\|_{1}+2).\tag{28}\]
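Bounds (27) and (28) can be sanity-checked on random inputs. The Python sketch below is illustrative only: it builds doubly stochastic matrices as convex combinations of permutation matrices, draws rows of $X$ inside the ball of radius $c_{\bm{x}}$, changes the $j$-th row, and compares both sides of the inequalities. It reads $\|Q-Q^{\prime}\|_{1}$ as the entrywise sum of absolute values, which is what the equality in (28) implies. All function names, sizes, and the seed are assumptions made for this example.

```python
import numpy as np


def random_doubly_stochastic(n, n_perms, rng):
    """Convex combination of n_perms random permutation matrices."""
    weights = rng.dirichlet(np.ones(n_perms))
    Q = np.zeros((n, n))
    for w in weights:
        Q[np.arange(n), rng.permutation(n)] += w
    return Q


def random_rows_in_ball(n, d, radius, rng):
    """Rows with Euclidean norm strictly below radius."""
    Z = rng.normal(size=(n, d))
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)
    return Z * radius * rng.uniform(0.0, 1.0, size=(n, 1))


rng = np.random.default_rng(0)
n, d, c_x, j = 6, 3, 1.0, 2

X = random_rows_in_ball(n, d, c_x, rng)
X_prime = X.copy()
X_prime[j] = random_rows_in_ball(1, d, c_x, rng)[0]  # neighbor: only record j changes

Q = random_doubly_stochastic(n, 4, rng)
Q_prime = random_doubly_stochastic(n, 4, rng)  # linkage matrix may change as well

W = Q @ X                    # w_i = sum_k q_ik x_k
W_prime = Q_prime @ X_prime  # w'_i built from the neighboring data

lhs = np.linalg.norm(W - W_prime, axis=1)
rhs = c_x * (np.abs(Q - Q_prime).sum(axis=1) + 2.0 * Q_prime[:, j])  # bound (27), row by row
total_bound = c_x * (np.abs(Q - Q_prime).sum() + 2.0)                # bound (28)

print(np.round(lhs, 4))
print(np.round(rhs, 4))
print(round(float(lhs.sum()), 4), "<=", round(float(total_bound), 4))
```

Summing the row-wise bound over $i$ reproduces the right-hand side of (28) because the $j$-th column of $Q^{\prime}$ sums to 1.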

The sensitivity of $g^{t+1}$ is

$$
\begin{aligned}
\Delta_{g}&=\max_{(X,\bm{y},Q)\sim(X^{\prime},\bm{y}^{\prime},Q^{\prime})}\left\|g^{t+1}(\bm{\beta}^{t};X,\bm{y},Q)-g^{t+1}(\bm{\beta}^{t};X^{\prime},\bm{y}^{\prime},Q^{\prime})\right\|\\
&=\max_{(X,\bm{y},Q)\sim(X^{\prime},\bm{y}^{\prime},Q^{\prime})}\left\|\sum_{i=1}^{n}(\bm{w}_{i}^{\top}\bm{\beta}^{t}-\Pi_{R}(z_{i}))\bm{w}_{i}-\sum_{i=1}^{n}(\bm{w}_{i}^{\prime\top}\bm{\beta}^{t}-\Pi_{R}(z^{\prime}_{i}))\bm{w}^{\prime}_{i}\right\|\\
&\leq\max_{(X,\bm{y},Q)\sim(X^{\prime},\bm{y}^{\prime},Q^{\prime})}\sum_{i=1}^{n}\left\|\bm{w}_{i}^{\top}\bm{\beta}^{t}\bm{w}_{i}-\bm{w}_{i}^{\prime\top}\bm{\beta}^{t}\bm{w}^{\prime}_{i}\right\|+\max_{(X,\bm{y},Q)\sim(X^{\prime},\bm{y}^{\prime},Q^{\prime})}\sum_{i=1}^{n}\left\|\Pi_{R}(z_{i})\bm{w}_{i}-\Pi_{R}(z^{\prime}_{i})\bm{w}^{\prime}_{i}\right\|\\
&=:\Delta_{1}+\Delta_{2}.
\end{aligned}
\tag{29}
$$

We use $\Delta_{1}$ and $\Delta_{2}$ to denote the two terms on the right of (29), respectively. To bound the first term $\Delta_{1}$, since

$$
\begin{aligned}
\|\bm{w}_{i}^{\top}\bm{\beta}^{t}\bm{w}_{i}-\bm{w}_{i}^{\prime\top}\bm{\beta}^{t}\bm{w}_{i}^{\prime}\|&=\|\bm{w}_{i}^{\top}\bm{\beta}^{t}(\bm{w}_{i}-\bm{w}_{i}^{\prime}+\bm{w}_{i}^{\prime})-\bm{w}_{i}^{\prime\top}\bm{\beta}^{t}\bm{w}_{i}^{\prime}\|\\
&=\|\bm{w}_{i}^{\top}\bm{\beta}^{t}(\bm{w}_{i}-\bm{w}_{i}^{\prime})+\bm{w}_{i}^{\top}\bm{\beta}^{t}\bm{w}_{i}^{\prime}-\bm{w}_{i}^{\prime\top}\bm{\beta}^{t}\bm{w}_{i}^{\prime}\|\\
&=\|\bm{w}_{i}^{\top}\bm{\beta}^{t}(\bm{w}_{i}-\bm{w}_{i}^{\prime})+(\bm{w}_{i}-\bm{w}_{i}^{\prime})^{\top}\bm{\beta}^{t}\bm{w}_{i}^{\prime}\|\\
&\leq\|\bm{w}_{i}^{\top}\bm{\beta}^{t}(\bm{w}_{i}-\bm{w}_{i}^{\prime})\|+\|(\bm{w}_{i}-\bm{w}_{i}^{\prime})^{\top}\bm{\beta}^{t}\bm{w}_{i}^{\prime}\|\\
&\leq\|\bm{w}_{i}\|\|\bm{\beta}^{t}\|\|\bm{w}_{i}-\bm{w}_{i}^{\prime}\|+\|\bm{w}_{i}-\bm{w}_{i}^{\prime}\|\|\bm{\beta}^{t}\|\|\bm{w}_{i}^{\prime}\|,
\end{aligned}
$$

then by (28), $\Delta_{1}$ can be controlled:

$$
\Delta_{1}\leq\max_{(X,\bm{y},Q)\sim(X^{\prime},\bm{y}^{\prime},Q^{\prime})}\sum_{i=1}^{n}2Cc_{\bm{x}}\|\bm{w}_{i}-\bm{w}_{i}^{\prime}\|\leq 2Cc^{2}_{\bm{x}}(\|Q-Q^{\prime}\|_{1}+2)\leq 2Cc^{2}_{\bm{x}}(M+2).
\tag{30}
$$
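The per-term inequality used to reach (30) can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $C$ to be a bound on $\|\bm{\beta}^{t}\|$ and $c_{\bm{x}}$ a bound on $\|\bm{w}_{i}\|$ and $\|\bm{w}_{i}^{\prime}\|$, and the helper names (`random_vector`, `check_term_bound`) and the numeric values are hypothetical.

```python
import math
import random


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in v))


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def random_vector(dim, max_norm, rng):
    """Draw a vector whose Euclidean norm is at most max_norm."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    scale = max_norm * rng.random() / norm(v)
    return [scale * a for a in v]


def check_term_bound(w, w_prime, beta):
    """Return (lhs, rhs, diff) for the per-term inequality before (30).

    lhs  = ||w^T beta w - w'^T beta w'||
    rhs  = ||w|| ||beta|| ||w - w'|| + ||w - w'|| ||beta|| ||w'||
    diff = ||w - w'||
    """
    s, s_prime = dot(w, beta), dot(w_prime, beta)
    lhs = norm([s * a - s_prime * b for a, b in zip(w, w_prime)])
    diff = norm([a - b for a, b in zip(w, w_prime)])
    rhs = norm(w) * norm(beta) * diff + diff * norm(beta) * norm(w_prime)
    return lhs, rhs, diff


rng = random.Random(0)
c_x, C, dim = 1.5, 2.0, 4  # hypothetical bounds and dimension
all_hold = True
for _ in range(1000):
    w = random_vector(dim, c_x, rng)
    w_prime = random_vector(dim, c_x, rng)
    beta = random_vector(dim, C, rng)
    lhs, rhs, diff = check_term_bound(w, w_prime, beta)
    # First the triangle/Cauchy-Schwarz bound, then the coarser 2*C*c_x bound.
    ok = lhs <= rhs * (1 + 1e-9) + 1e-12
    ok = ok and rhs <= 2 * C * c_x * diff * (1 + 1e-9) + 1e-12
    all_hold = all_hold and ok
print(all_hold)
```

Summing the coarser per-term bound $2Cc_{\bm{x}}\|\bm{w}_{i}-\bm{w}_{i}^{\prime}\|$ over $i$ and applying (28) gives the right-hand side of (30).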

For the second term $\Delta_{2}=\max_{(X,\bm{y},Q)\sim(X^{\prime},\bm{y}^{\prime},Q^{\prime})}\sum_{i=1}^{n}\|\Pi_{R}(z_{i})\bm{w}_{i}-\Pi_{R}(z^{\prime}_{i})\bm{w}^{\prime}_{i}\|$, since

$$
\begin{aligned}
\|\Pi_{R}(z_{i})\bm{w}_{i}-\Pi_{R}(z^{\prime}_{i})\bm{w}^{\prime}_{i}\|&=\|\Pi_{R}(z_{i})(\bm{w}_{i}-\bm{w}^{\prime}_{i})+(\Pi_{R}(z_{i})-\Pi_{R}(z^{\prime}_{i}))\bm{w}^{\prime}_{i}\|\\
&\leq\|\Pi_{R}(z_{i})(\bm{w}_{i}-\bm{w}^{\prime}_{i})\|+\|(\Pi_{R}(z_{i})-\Pi_{R}(z^{\prime}_{i}))\bm{w}^{\prime}_{i}\|\\
&\leq R\|\bm{w}_{i}-\bm{w}^{\prime}_{i}\|+c_{\bm{x}}\|\Pi_{R}(z_{i})-\Pi_{R}(z^{\prime}_{i})\|.
\end{aligned}
\tag{31}
$$
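Inequality (31) can be sanity-checked in the same way. The sketch below assumes that $\Pi_{R}$ is the projection of a scalar onto $[-R,R]$ (clipping) and that $\|\bm{w}^{\prime}_{i}\|\leq c_{\bm{x}}$. Both are assumptions made here for illustration, and the function names and numeric values are hypothetical.

```python
import math
import random


def clip(z, R):
    """Projection of a scalar onto [-R, R] (the assumed meaning of Pi_R)."""
    return max(-R, min(R, z))


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in v))


def check_clipped_term_bound(w, w_prime, z, z_prime, R, c_x):
    """Return (lhs, rhs) of inequality (31).

    lhs = ||Pi_R(z) w - Pi_R(z') w'||
    rhs = R ||w - w'|| + c_x |Pi_R(z) - Pi_R(z')|
    The bound requires ||w'|| <= c_x.
    """
    pz, pz_prime = clip(z, R), clip(z_prime, R)
    lhs = norm([pz * a - pz_prime * b for a, b in zip(w, w_prime)])
    diff = norm([a - b for a, b in zip(w, w_prime)])
    rhs = R * diff + c_x * abs(pz - pz_prime)
    return lhs, rhs


def random_vector(dim, max_norm, rng):
    """Draw a vector whose Euclidean norm is at most max_norm."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    scale = max_norm * rng.random() / norm(v)
    return [scale * a for a in v]


rng = random.Random(1)
R, c_x, dim = 2.0, 1.5, 3  # hypothetical values
all_hold = True
for _ in range(1000):
    w = random_vector(dim, c_x, rng)
    w_prime = random_vector(dim, c_x, rng)
    # Responses are drawn well outside [-R, R] so that clipping is exercised.
    z, z_prime = rng.uniform(-10, 10), rng.uniform(-10, 10)
    lhs, rhs = check_clipped_term_bound(w, w_prime, z, z_prime, R, c_x)
    all_hold = all_hold and lhs <= rhs * (1 + 1e-9) + 1e-12
print(all_hold)
```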

Then,

$$
\begin{aligned}
\Delta_{2}&\leq\max_{(X,\bm{y},Q)\sim(X^{\prime},\bm{y}^{\prime},Q^{\prime})}\sum_{i=1}^{n}\left(R\|\bm{w}_{i}-\bm{w}^{\prime}_{i}\|+c_{\bm{x}}\|\Pi_{R}(z_{i})-\Pi_{R}(z^{\prime}_{i})\|\right)\\
&=Rc_{\bm{x}}(\|Q-Q^{\prime}\|_{1}+2)+2Rc_{\bm{x}}\\
&\leq Rc_{\bm{x}}(M+4).
\end{aligned}
\tag{32}
$$

It follows that

$$
\Delta_{g}\leq\Delta_{1}+\Delta_{2}\leq 2Cc^{2}_{\bm{x}}(M+2)+Rc_{\bm{x}}(M+4).
$$
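To see how the final bound on $\Delta_{g}$ scales with the linkage term $M$, it can be evaluated directly. The function below is a minimal sketch: its name and the parameter values are hypothetical, and it only encodes the closed-form expression obtained from (30) and (32).

```python
def gradient_sensitivity_bound(C, c_x, R, M):
    """Evaluate the bound 2*C*c_x^2*(M+2) + R*c_x*(M+4) on Delta_g."""
    delta_1 = 2.0 * C * c_x ** 2 * (M + 2)  # right-hand side of (30)
    delta_2 = R * c_x * (M + 4)             # right-hand side of (32)
    return delta_1 + delta_2


# Hypothetical parameter values; the bound grows linearly in M.
for M in (0, 2, 10):
    print(M, gradient_sensitivity_bound(C=1.0, c_x=1.0, R=1.0, M=M))
```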

Proof of Theorem 4.1(ii).

We now show that Algorithm 2 (Post-RL sufficient statistics perturbation) is $(\epsilon,\delta)$-differentially private. Let $A=(W\mid\bm{z})$ be the augmented matrix that accounts for linkage errors. Then $A^{\top}A$ contains the sufficient statistics for $\bm{\beta}$. Thus, it suffices to show that the sensitivity of $A^{\top}A$ is controlled by $B=Rc_{x}(M+4)+\max\{2c_{x}^{2}(M+2),2R^{2}\}$.

Let $A^{\prime}=(W^{\prime}\mid\bm{z}^{\prime})$, where $W^{\prime}$ and $\bm{z}^{\prime}$ come from any neighboring data set, as in the proof of Theorem 4.1(i). We have

$$
\begin{aligned}
A^{\top}A-A^{\prime\top}A^{\prime}&=\begin{pmatrix}W^{\top}W-W^{\prime\top}W^{\prime}&0\\ 0&\bm{z}^{\top}\bm{z}-\bm{z}^{\prime\top}\bm{z}^{\prime}\end{pmatrix}+\begin{pmatrix}0&W^{\top}\bm{z}-W^{\prime\top}\bm{z}^{\prime}\\ \bm{z}^{\top}W-\bm{z}^{\prime\top}W^{\prime}&0\end{pmatrix}\\
&=:A_{1}+A_{2}.
\end{aligned}
$$

By the properties of the norm of block matrices,

$$
\|A_{1}\|\leq\max\{\|W^{\top}W-W^{\prime\top}W^{\prime}\|,\|\bm{z}^{\top}\bm{z}-\bm{z}^{\prime\top}\bm{z}^{\prime}\|\}
$$

and

$$
\begin{aligned}
\|A_{2}\|&=\|W^{\top}\bm{z}-W^{\prime\top}\bm{z}^{\prime}\|\\
&=\|W^{\top}\bm{z}-W^{\prime\top}\bm{z}+W^{\prime\top}\bm{z}-W^{\prime\top}\bm{z}^{\prime}\|\\
&\leq\|(W-W^{\prime})^{\top}\bm{z}\|+\|W^{\prime\top}(\bm{z}-\bm{z}^{\prime})\|.
\end{aligned}
$$
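The first equality for $\|A_{2}\|$ can be verified directly. The short derivation below assumes that $\|\cdot\|$ denotes the spectral norm for matrices and writes $\bm{v}=W^{\top}\bm{z}-W^{\prime\top}\bm{z}^{\prime}$, a shorthand introduced here only for illustration.

```latex
\begin{align*}
A_{2} &= \begin{pmatrix} 0 & \bm{v} \\ \bm{v}^{\top} & 0 \end{pmatrix},
\qquad
A_{2}^{\top}A_{2} = \begin{pmatrix} \bm{v}\bm{v}^{\top} & 0 \\ 0 & \bm{v}^{\top}\bm{v} \end{pmatrix}, \\
\|A_{2}\|^{2} &= \lambda_{\max}\bigl(A_{2}^{\top}A_{2}\bigr)
 = \max\bigl\{\lambda_{\max}(\bm{v}\bm{v}^{\top}),\ \bm{v}^{\top}\bm{v}\bigr\}
 = \|\bm{v}\|^{2}.
\end{align*}
```

The same block-diagonal argument gives the bound on $\|A_{1}\|$: the spectral norm of a block-diagonal matrix is the largest of the norms of its diagonal blocks.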

We have

$$
\begin{aligned}
\|W^{\top}W-W^{\prime\top}W^{\prime}\|&=\Bigl\|\sum_{i}^{n}\bm{w}_{i}\bm{w}_{i}^{\top}-\sum_{i}^{n}\bm{w}^{\prime}_{i}\bm{w}^{\prime\top}_{i}\Bigr\|\\
&\leq\sum_{i}^{n}\|\bm{w}_{i}(\bm{w}_{i}-\bm{w}_{i}^{\prime})^{\top}+(\bm{w}_{i}-\bm{w}_{i}^{\prime})\bm{w}_{i}^{\prime\top}\|\\
&\leq 2c_{x}\sum_{i}^{n}\|\bm{w}_{i}-\bm{w}_{i}^{\prime}\|\\
&\leq 2c_{x}^{2}(M+2),
\end{aligned}
$$

and

$$
\|\bm{z}^{\top}\bm{z}-\bm{z}^{\prime\top}\bm{z}^{\prime}\|=|y^{2}_{j}-y^{\prime 2}_{j}|\leq 2R^{2}.
$$

Note that we can swap the rows of $\bm{z}^{\prime}$ and $Q^{\prime}$ so that $\bm{z}$ and $\bm{z}^{\prime}$ differ in only one record; this swap does not change the estimate computed from $Q^{\prime}$ and $\bm{z}^{\prime}$. Then,

$$
\|A_{1}\|\leq\max\{2c_{x}^{2}(M+2),2R^{2}\}.
$$

Since

$$
\|(W-W^{\prime})^{\top}\bm{z}\|=\Bigl\|\sum_{i}^{n}(\bm{w}_{i}-\bm{w}^{\prime}_{i})z_{i}\Bigr\|\leq R\sum_{i}^{n}\|\bm{w}_{i}-\bm{w}_{i}^{\prime}\|\leq Rc_{x}(M+2)
$$

and

\[
\begin{aligned}
\|W'^{\top}(\bm{z}-\bm{z}')\| &= \Big\|\sum_{i=1}^{n}\bm{w}'_i(z_i-z'_i)\Big\| \\
&= \sum_{i=1}^{n}\|\bm{w}'_i\|\,|z_i-z'_i| \\
&\leq c_x\sum_{i=1}^{n}|z_i-z'_i| \\
&\leq 2Rc_x,
\end{aligned}
\]

we have

\[
\|A_2\| \leq Rc_x(M+4).
\]

Putting the upper bounds together, we derive

\[
\max_{A,A'}\|A^{\top}A-A'^{\top}A'\| \leq Rc_x(M+4)+\max\{2c_x^2(M+2),\ 2R^2\}.
\]
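As a quick numerical illustration, the sketch below evaluates the right-hand side of the combined bound for given constants. The function name `gram_sensitivity_bound` and the example values are assumptions made only for illustration; they do not come from the paper.

```python
def gram_sensitivity_bound(R: float, c_x: float, M: float) -> float:
    """Evaluate R*c_x*(M+4) + max{2*c_x^2*(M+2), 2*R^2}."""
    bound_a2 = R * c_x * (M + 4)                    # bound on ||A_2||
    bound_a1 = max(2 * c_x**2 * (M + 2), 2 * R**2)  # bound on ||A_1||
    return bound_a2 + bound_a1


# Hypothetical constants, chosen only to exercise the formula.
example = gram_sensitivity_bound(R=2.0, c_x=1.0, M=1.0)
```

The bound grows linearly in $M$ through both terms and quadratically in $R$ through the $2R^2$ branch of the maximum.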

Proof of Lemma 4.2.

To establish the behavior of the expected squared-error loss of $\hat{\bm{\beta}}^{\text{OLS}}$, first note that since

\[
0<\frac{1}{L}<d\lambda_{\min}\left(\frac{X^{\top}X}{n}\right)\leq d\lambda_{\max}\left(\frac{X^{\top}X}{n}\right)<L,
\]

we have

\[
\frac{d}{Ln}<\lambda_{\min}\big((X^{\top}X)^{-1}\big)\leq\lambda_{\max}\big((X^{\top}X)^{-1}\big)<\frac{dL}{n}.
\]

Therefore, $\operatorname{tr}\big((X^{\top}X)^{-1}\big)=\sum_{i=1}^{d}\lambda_i\big((X^{\top}X)^{-1}\big)=\Theta(d^2/n)$, where $\lambda_i$, $i=1,\dots,d$, are the eigenvalues of $(X^{\top}X)^{-1}$. ∎
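The eigenvalue sandwich and the resulting trace bounds can be checked numerically. The sketch below is illustrative only: the Gaussian design, the sizes `n` and `d`, and the way `L` is picked (the smallest admissible value with 1% slack) are assumptions, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
# Hypothetical design: i.i.d. entries scaled so that X^T X / n is close to I / d.
X = rng.normal(size=(n, d)) / np.sqrt(d)

gram_eigs = np.linalg.eigvalsh(X.T @ X / n)  # eigenvalues of X^T X / n
# Pick L > 1 with 1/L < d*lambda_min and d*lambda_max < L (1% slack).
L = 1.01 * max(d * gram_eigs.max(), 1.0 / (d * gram_eigs.min()), 1.0)

inv_eigs = np.linalg.eigvalsh(np.linalg.inv(X.T @ X))  # eigenvalues of (X^T X)^{-1}
trace_inv = inv_eigs.sum()

lower, upper = d / (L * n), d * L / n  # per-eigenvalue bounds from the proof
```

Every entry of `inv_eigs` lies in `(lower, upper)`, so `trace_inv` lies between `d * lower` and `d * upper`, which is the $\Theta(d^2/n)$ statement for fixed `L`.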

Proof of Lemma 4.3.

Note that $\hat{\bm{\beta}}^{\text{RL}}$ is an unbiased estimator for $\bm{\beta}$, and hence

\[
\begin{aligned}
\mathbb{E}\|\hat{\bm{\beta}}^{\text{RL}}-\bm{\beta}\|^2 &= \mathbb{E}\left[\sum_{i=1}^{d}(\hat{\beta}_i^{\text{RL}}-\beta_i)^2\right] \\
&= \sum_{i=1}^{d}\mathbb{E}(\hat{\beta}_i^{\text{RL}}-\beta_i)^2 \\
&= \sum_{i=1}^{d}\operatorname{Var}(\hat{\beta}_i^{\text{RL}}) \\
&= \operatorname{tr}(\Sigma^{\text{RL}}).
\end{aligned}
\tag{33}
\]

Proof of Theorem 4.4.

To establish an upper bound on the excess error of the private estimator, i.e., $\|\hat{\bm{\beta}}^{\text{priv}}-\hat{\bm{\beta}}^{\text{RL}}\|^2$, for Algorithm 1, we work under the event $\mathcal{E}=\{\Pi_R(z_i)=z_i,\ \forall i\in[n]\}$. By the concentration bound of the Gaussian distribution, with the choice of $R=\sigma\sqrt{2\ln n}$, $\mathbb{P}(\mathcal{E})\geq 1-c_1\exp(-c_2\ln n)$, where $c_1$ and $c_2$ are constants.

Recall that the loss function is $\mathcal{L}_n(\bm{\beta})\stackrel{def}{=}\frac{1}{2n}(\bm{z}-W\bm{\beta})^{\top}(\bm{z}-W\bm{\beta})$. The assumption about the eigenvalues of $W^{\top}W/n$ implies that $\mathcal{L}_n(\bm{\beta})$ is $\frac{L}{d}$-smooth and $\frac{1}{dL}$-strongly convex; see Lemma A.1 for the proof. Under $\mathcal{E}$, the iterate is $\bm{\beta}^{t+1}=\Pi_C(\bm{\beta}^{t}-\eta\nabla\mathcal{L}_n(\bm{\beta}^{t})+\bm{u}_t)$.
Let $\hat{\bm{\beta}}^{t+1}=\Pi_C(\bm{\beta}^{t}-\eta\nabla\mathcal{L}_n(\bm{\beta}^{t}))$ be the unperturbed iterate; then $\|\bm{\beta}^{t+1}-\hat{\bm{\beta}}^{t+1}\|\leq\|\bm{u}_t\|$. Since (A3) says that $\|\bm{\beta}\|\leq c_0$, we can assume $\|\hat{\bm{\beta}}^{\text{RL}}\|\leq c_0$ without loss of generality, where $\hat{\bm{\beta}}^{\text{RL}}=\arg\min_{\bm{\beta}}\mathcal{L}_n(\bm{\beta})$.
Then $\hat{\bm{\beta}}^{\text{RL}}$ coincides with the constrained minimizer $\arg\min_{\|\bm{\beta}\|\leq C}\mathcal{L}_n(\bm{\beta})$ when we set $C=c_0$. By Lemma A.2, with $\eta=\frac{d}{L}$, it then follows that

\[
\|\hat{\bm{\beta}}^{t+1}-\hat{\bm{\beta}}^{\text{RL}}\|^2\leq\left(1-\frac{1}{L^2}\right)\|\bm{\beta}^{t}-\hat{\bm{\beta}}^{\text{RL}}\|^2.
\]
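The update analyzed here, $\bm{\beta}^{t+1}=\Pi_C(\bm{\beta}^{t}-\eta\nabla\mathcal{L}_n(\bm{\beta}^{t})+\bm{u}_t)$, can be sketched as follows. This is a minimal illustration rather than the paper's Algorithm 1: the function names are hypothetical, the response clipping $\Pi_R$ is omitted, and `noise_std` is a free placeholder that is not calibrated to any $(\epsilon,\delta)$ budget.

```python
import numpy as np


def project_l2_ball(v, radius):
    """Euclidean projection onto {v : ||v|| <= radius}."""
    norm = np.linalg.norm(v)
    return v if norm <= radius else v * (radius / norm)


def noisy_projected_gd(W, z, eta, C, T, noise_std, rng):
    """Iterate beta <- Pi_C(beta - eta * grad L_n(beta) + u_t), starting from 0."""
    n, d = W.shape
    beta = np.zeros(d)
    for _ in range(T):
        # Gradient of L_n(beta) = (1/2n) ||z - W beta||^2.
        grad = W.T @ (W @ beta - z) / n
        # Placeholder Gaussian perturbation (assumption: not privacy-calibrated).
        u_t = noise_std * rng.normal(size=d)
        beta = project_l2_ball(beta - eta * grad + u_t, C)
    return beta
```

With `noise_std=0.0` and a step size no larger than the inverse smoothness constant, the iterates reduce to projected gradient descent and converge to the least-squares solution whenever that solution lies inside the ball of radius `C`.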

Let $c\geq 2$ be some constant. Then, by Lemma A.3, we have the following for the noisy iterate $\bm{\beta}^{t+1}$:

\[
\begin{aligned}
\|\bm{\beta}^{t+1}-\hat{\bm{\beta}}^{\text{RL}}\|^2 &= \|\hat{\bm{\beta}}^{t+1}-\hat{\bm{\beta}}^{\text{RL}}+\bm{\beta}^{t+1}-\hat{\bm{\beta}}^{t+1}\|^2 \\
&\leq \left(1+\frac{1}{cL^2}\right)\|\hat{\bm{\beta}}^{t+1}-\hat{\bm{\beta}}^{\text{RL}}\|^2+\left(1+cL^2\right)\|\bm{u}_t\|^2 \\
&\leq \left(1+\frac{1}{cL^2}\right)\left(1-\frac{1}{L^2}\right)\|\bm{\beta}^{t}-\hat{\bm{\beta}}^{\text{RL}}\|^2+\left(1+cL^2\right)\|\bm{u}_t\|^2 \\
&\leq \left(1-\frac{c-1}{cL^2}\right)\|\bm{\beta}^{t}-\hat{\bm{\beta}}^{\text{RL}}\|^2+\left(1+cL^2\right)\|\bm{u}_t\|^2.
\end{aligned}
\]
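For completeness, the last inequality in this chain follows from expanding the product of the two contraction factors and dropping a nonpositive term; the expansion below is elementary algebra and uses only the constants $c$ and $L$ already introduced.

```latex
\begin{align*}
\left(1+\frac{1}{cL^2}\right)\left(1-\frac{1}{L^2}\right)
  &= 1-\frac{1}{L^2}+\frac{1}{cL^2}-\frac{1}{cL^4} \\
  &= 1-\frac{c-1}{cL^2}-\frac{1}{cL^4} \\
  &\leq 1-\frac{c-1}{cL^2}.
\end{align*}
```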

The above recursive formula yields

\[
\|\hat{\bm{\beta}}^{\text{priv}}-\hat{\bm{\beta}}^{\text{RL}}\|^2\leq\left(1-\frac{c-1}{cL^2}\right)^{T}\|\bm{\beta}^{0}-\hat{\bm{\beta}}^{\text{RL}}\|^2+\left(1+cL^2\right)\sum_{k=0}^{T-1}\left(1-\frac{c-1}{cL^2}\right)^{k}\|\bm{u}_{T-1-k}\|^2.
\]

Setting $T=\left\lceil \ln(c_0^2 n)\Big/\ln\left(\left(1-\frac{c-1}{cL^2}\right)^{-1}\right)\right\rceil$, the first term satisfies $\left(1-\frac{c-1}{cL^2}\right)^{T}\|\bm{\beta}^{0}-\hat{\bm{\beta}}^{\text{RL}}\|^2\leq\frac{1}{n}$, given that $\bm{\beta}^{0}=\bm{0}$ and $\|\hat{\bm{\beta}}^{\text{RL}}\|\leq c_0$. When $c$ is sufficiently large, $T$ can be set to $\left\lceil L^2\ln(c_0^2 n)\right\rceil$.
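To make the choice of $T$ concrete, the sketch below computes the smallest integer $T$ with $\rho^{T}c_0^2\leq 1/n$ for $\rho=1-(c-1)/(cL^2)$, that is, $T=\lceil\ln(c_0^2n)/\ln(1/\rho)\rceil$. The function name and the example constants are hypothetical, and the sketch assumes $L>1$, $c\geq 2$, and $c_0^2n>1$ so that $0<\rho<1$ and $T$ is positive.

```python
import math


def iterations_for_bias(c0: float, n: int, L: float, c: float) -> int:
    """Smallest integer T with rho**T * c0**2 <= 1/n, rho = 1 - (c-1)/(c*L**2)."""
    rho = 1.0 - (c - 1.0) / (c * L**2)
    return math.ceil(math.log(c0**2 * n) / (-math.log(rho)))


# Hypothetical constants, chosen only for illustration.
c0, n, L, c = 1.0, 1000, 2.0, 2.0
rho = 1.0 - (c - 1.0) / (c * L**2)
T = iterations_for_bias(c0, n, L, c)
```

Because $T$ grows only logarithmically in $n$, the optimization error can be driven below $1/n$ with a modest number of noisy gradient steps.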

To control the second term, we apply Lemma A.4 with $\rho=1-(c-1)/(cL^2)$ and $\zeta=2\eta^2TB^2\frac{\ln(1/\delta)}{n^2\epsilon^2}$, which is the variance of $\bm{u}_{T-1-k}$ for $k=0,\dots,T-1$. Provided $\delta=o(1/n)$, let $s=K_1\zeta d$ for some sufficiently large constant $K_1$, so that $\frac{\rho\zeta d}{1-\rho}+s=\Theta(\zeta d)$, where

\[
\begin{aligned}
\zeta d &= 4\eta^2TB^2\frac{\ln(1/\delta)}{n^2\epsilon^2}\cdot d \\
&= 4\left(\frac{d}{L}\right)^2\left\lceil L^2\ln(c_0^2n)\right\rceil\left(\sigma\sqrt{2\ln n}\,c_{\bm{x}}(M+4)+2c_0c_{\bm{x}}^2(M+2)\right)^2\frac{\ln(1/\delta)}{n^2\epsilon^2}\cdot d \\
&= O\left(\frac{\sigma^2d^3\ln^2n\,\ln(1/\delta)}{n^2\epsilon^2}\right).
\end{aligned}
\]

That is, with probability at least $1-e^{-c_3d}$, where $c_3=\min\left(\frac{(1-\rho^2)K_1^2}{8\rho^2},\frac{K_1}{8\rho}\right)$, the noise term $\sum_{k=0}^{T-1}\left(1-\frac{c-1}{cL^2}\right)^{k}\|\bm{u}_{T-1-k}\|^2$ is bounded at this same big-O rate; hence

\[
\|\hat{\bm{\beta}}^{\text{priv}}-\hat{\bm{\beta}}^{\text{RL}}\|^2=\frac{1}{n}+O\left(\frac{\sigma^2d^3\ln^2n\,\ln(1/\delta)}{n^2\epsilon^2}\right).
\]

Proof of Theorem 4.5.

Algorithm 2 is also analyzed under $\mathcal{E}=\{\Pi_R(z_i)=z_i,\ \forall i=1,\dots,n\}$, as in the proof of Theorem 4.4. Then, with $R=\sigma\sqrt{2\ln n}$, we have $\mathbb{P}(\mathcal{E})\geq 1-c_1\exp(-c_2\ln n)$.

By Lemma A.5,

\[
(W^{\top}W+U)^{-1}=(W^{\top}W)^{-1}-(W^{\top}W)^{-1}\cdot\big(I+U(W^{\top}W)^{-1}\big)^{-1}\cdot U(W^{\top}W)^{-1}.
\tag{34}
\]
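Identity (34) is straightforward to sanity-check numerically. In the sketch below the matrices are randomly generated stand-ins: `U` is a hypothetical small symmetric perturbation, not the calibrated noise matrix used by Algorithm 2.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 4
W = rng.normal(size=(n, d))
G = W.T @ W  # Gram matrix W^T W

# Hypothetical symmetric perturbation, small relative to G so that G + U is invertible.
E = rng.normal(size=(d, d))
U = 0.1 * (E + E.T)

G_inv = np.linalg.inv(G)
lhs = np.linalg.inv(G + U)
rhs = G_inv - G_inv @ np.linalg.inv(np.eye(d) + U @ G_inv) @ U @ G_inv
```

The two sides agree up to floating-point error, which can be confirmed with `np.allclose(lhs, rhs)`.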

Then,

\[
\begin{aligned}
\hat{\bm{\beta}}^{\text{priv}}-\hat{\bm{\beta}}^{\text{RL}} &= (W^{\top}W+U)^{-1}(W^{\top}\bm{z}+\bm{u})-(W^{\top}W)^{-1}W^{\top}\bm{z} \\
&= (W^{\top}W)^{-1}\bm{u}-(W^{\top}W)^{-1}\cdot\big(I+U(W^{\top}W)^{-1}\big)^{-1}\cdot U(W^{\top}W)^{-1}(W^{\top}\bm{z}+\bm{u}) \\
&= (W^{\top}W)^{-1}\bm{u}-(W^{\top}W)^{-1}\cdot\big(I+U(W^{\top}W)^{-1}\big)^{-1}\cdot U\big(\hat{\bm{\beta}}^{\text{RL}}+(W^{\top}W)^{-1}\bm{u}\big).
\end{aligned}
\tag{35}
\]
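The decomposition in (35) can be illustrated with a small sufficient-statistics-perturbation sketch. Everything below is an assumption made for illustration: the helper name `ssp_estimate`, the synthetic data, and the perturbations `U` and `u`, whose scales are placeholders and are not calibrated to any privacy budget.

```python
import numpy as np


def ssp_estimate(W, z, U, u):
    """Perturbed normal equations: (W^T W + U)^{-1} (W^T z + u)."""
    return np.linalg.solve(W.T @ W + U, W.T @ z + u)


rng = np.random.default_rng(2)
n, d = 300, 3
W = rng.normal(size=(n, d))
z = W @ np.array([0.4, -0.2, 0.1]) + 0.1 * rng.normal(size=n)

# Placeholder perturbations (assumption: not privacy-calibrated).
E = rng.normal(size=(d, d))
U = 0.5 * (E + E.T)
u = rng.normal(size=d)

G_inv = np.linalg.inv(W.T @ W)
beta_rl = G_inv @ W.T @ z             # non-private estimator
beta_priv = ssp_estimate(W, z, U, u)  # perturbed estimator

# Right-hand side of the last line of (35).
correction = G_inv @ np.linalg.inv(np.eye(d) + U @ G_inv) @ U @ (beta_rl + G_inv @ u)
rhs = G_inv @ u - correction
```

Here `beta_priv - beta_rl` matches `rhs` up to rounding, which separates the error into a term that is linear in `u` and a correction driven by `U`.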

To bound $\|\hat{\bm{\beta}}^{\text{priv}} - \hat{\bm{\beta}}^{\text{RL}}\|$, we need to bound the norms of $(W^{\top}W)^{-1}$, $U$, $\bm{u}$, and $(I+U(W^{\top}W)^{-1})^{-1}$. First, by assumption (A4),

\[
0 < \frac{1}{L} < d\,\lambda_{\min}\!\left(\frac{W^{\top}W}{n}\right) \leq d\,\lambda_{\max}\!\left(\frac{W^{\top}W}{n}\right) < L
\]

for some constant $1 < L < \infty$, we have

\[
\frac{d}{Ln} \leq \lambda_{\min}\bigl((W^{\top}W)^{-1}\bigr) \leq \lambda_{\max}\bigl((W^{\top}W)^{-1}\bigr) \leq \frac{Ld}{n}.
\]

Then,

\[
\|(W^{\top}W)^{-1}\| = \lambda_{\max}\bigl((W^{\top}W)^{-1}\bigr) = O\!\left(\frac{d}{n}\right). \tag{36}
\]

Recall from Algorithm 2 that $U$ is a symmetric Gaussian matrix whose upper triangle consists of i.i.d. $N(0,\omega)$ entries. Gaussian concentration bounds give that, with probability at least $1-e^{-c_3 d}$,

\[
\|U\| = O(\omega\sqrt{d}) = O\!\left(\frac{\sigma^{2}\ln n\sqrt{d\ln(1/\delta)}}{\epsilon}\right). \tag{37}
\]

The bound (37) follows from random matrix theory (Vershynin, Corollary 4.4.8). Since the norm of a vector is bounded by the norm of a matrix that contains the vector as a column, we also have

\[
\|\bm{u}\| = O\!\left(\frac{\sigma^{2}\ln n\sqrt{d\ln(1/\delta)}}{\epsilon}\right). \tag{38}
\]
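The $\sqrt{d}$ scaling in (37) can be illustrated empirically. The sketch below draws symmetric Gaussian matrices and reports the ratio $\|U\|/(\omega\sqrt{d})$ for several dimensions. It assumes that $\omega$ acts as the standard deviation of the entries and that the diagonal has the same distribution as the off-diagonal entries; the helper names, the seed, and the chosen dimensions are hypothetical.

```python
import numpy as np

def symmetric_gaussian(d, omega, rng):
    """Symmetric matrix whose upper triangle (including the diagonal)
    is i.i.d. Gaussian with standard deviation omega (an assumption)."""
    G = rng.normal(scale=omega, size=(d, d))
    return np.triu(G) + np.triu(G, 1).T

def norm_ratios(dims, omega, seed=0):
    """Return ||U|| / (omega * sqrt(d)) in spectral norm for each d in dims."""
    rng = np.random.default_rng(seed)
    return [
        np.linalg.norm(symmetric_gaussian(d, omega, rng), 2) / (omega * np.sqrt(d))
        for d in dims
    ]

ratios = norm_ratios([50, 200, 400], omega=0.5)
print([round(r, 2) for r in ratios])
```

If the scaling in (37) holds, the printed ratios should stay bounded as $d$ grows rather than increase with it.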

For $\|(I+U(W^{\top}W)^{-1})^{-1}\|$, consider its Taylor series:

\[
(I+U(W^{\top}W)^{-1})^{-1} = \sum_{i=0}^{\infty}\bigl(-U(W^{\top}W)^{-1}\bigr)^{i}. \tag{39}
\]

With probability at least $1-e^{-c_3 d}$,

\[
\|U(W^{\top}W)^{-1}\| \leq \|U\|\,\|(W^{\top}W)^{-1}\| = O\!\left(\frac{\sigma^{2}d\ln n\sqrt{d\ln(1/\delta)}}{n\epsilon}\right)
\]

goes to zero as $n\to\infty$, so it is smaller than 1 for $n$ sufficiently large. Therefore, the Taylor series (39) converges. Then,

\[
\begin{aligned}
\|(I+U(W^{\top}W)^{-1})^{-1}\|
&= \Bigl\|\sum_{i=0}^{\infty}\bigl(-U(W^{\top}W)^{-1}\bigr)^{i}\Bigr\| \\
&\leq \sum_{i=0}^{\infty}\bigl\|\bigl(-U(W^{\top}W)^{-1}\bigr)^{i}\bigr\| \\
&\leq \sum_{i=0}^{\infty}\|U(W^{\top}W)^{-1}\|^{i} \\
&= \frac{1}{1-\|U(W^{\top}W)^{-1}\|} = O(1)
\end{aligned}
\tag{40}
\]

with probability at least $1-e^{-c_3 d}$. Plugging all the bounds into (35), we have derived

\[
\|\hat{\bm{\beta}}^{\text{priv}} - \hat{\bm{\beta}}^{\text{RL}}\|^{2} = O\!\left(\frac{\sigma^{4}d^{3}\ln^{2}n\,\ln(1/\delta)}{n^{2}\epsilon^{2}}\right).
\]
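The series argument behind (39) and (40) can be checked numerically. The sketch below is an illustration under assumed inputs: a hypothetical matrix `X` stands in for $U(W^{\top}W)^{-1}$ and is rescaled so that $\|X\| = 0.3 < 1$. It compares a truncated series with the exact inverse and checks the geometric-series bound.

```python
import numpy as np

def neumann_inverse(X, terms=60):
    """Partial sum of sum_{i>=0} (-X)^i, which approximates (I + X)^{-1}
    when the spectral norm of X is below 1."""
    d = X.shape[0]
    total = np.zeros((d, d))
    power = np.eye(d)
    for _ in range(terms):
        total += power
        power = power @ (-X)
    return total

rng = np.random.default_rng(1)
G = rng.normal(size=(5, 5))
X = 0.3 * G / np.linalg.norm(G, 2)          # rescaled so that ||X|| = 0.3
exact = np.linalg.inv(np.eye(5) + X)
series_error = np.linalg.norm(exact - neumann_inverse(X), 2)
bound = 1.0 / (1.0 - np.linalg.norm(X, 2))  # right-hand side of (40)
print(series_error, np.linalg.norm(exact, 2), bound)
```

The truncated series should match the exact inverse up to rounding, and the norm of the inverse should not exceed $1/(1-\|X\|)$.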

Proof of Corollary 4.1.

The proof is completed by using the inequality

\[
\|\hat{\bm{\beta}}^{\text{priv}} - \bm{\beta}\|^{2} \leq 2\bigl(\|\hat{\bm{\beta}}^{\text{RL}} - \bm{\beta}\|^{2} + \|\hat{\bm{\beta}}^{\text{priv}} - \hat{\bm{\beta}}^{\text{RL}}\|^{2}\bigr).
\]

Proof of Theorem 4.6.

Consider the non-projected estimator

\[
\bm{\beta}^{t+1} = \bm{\beta}^{t} - \frac{\eta}{n}W^{\top}(W\bm{\beta}^{t} - \bm{z}) + \bm{u}_{t}. \tag{41}
\]

Let $A \stackrel{\text{def}}{=} \frac{\eta}{n}W^{\top}W$ and $B \stackrel{\text{def}}{=} \frac{\eta}{n}W$. From the recursive form $\bm{\beta}^{t+1} = (I_d - A)\bm{\beta}^{t} + B^{\top}\bm{z} + \bm{u}_{t}$, we derive for $\hat{\bm{\beta}}^{\text{priv}} \stackrel{\text{def}}{=} \bm{\beta}^{T}$:

\[
\hat{\bm{\beta}}^{\text{priv}} = (I_d - A)^{T}\bm{\beta}^{0} + \sum_{t=1}^{T}(I_d - A)^{t-1}(B^{\top}\bm{z} + \bm{u}_{T-t}). \tag{42}
\]
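The unrolling from (41) to (42) can be verified on synthetic inputs. The following sketch iterates the non-projected update and compares the result with the closed form. All names, sizes, the step size, and the noise scale are illustrative assumptions.

```python
import numpy as np

def iterate(W, z, beta0, noises, eta):
    """Run recursion (41) for T = len(noises) steps; noises[t] plays the role of u_t."""
    n = W.shape[0]
    beta = beta0.copy()
    for u_t in noises:
        beta = beta - (eta / n) * W.T @ (W @ beta - z) + u_t
    return beta

def closed_form(W, z, beta0, noises, eta):
    """Evaluate the right-hand side of (42)."""
    n, d = W.shape
    T = len(noises)
    M = np.eye(d) - (eta / n) * W.T @ W      # I_d - A
    Btz = (eta / n) * W.T @ z                # B^T z
    out = np.linalg.matrix_power(M, T) @ beta0
    for t in range(1, T + 1):
        out = out + np.linalg.matrix_power(M, t - 1) @ (Btz + noises[T - t])
    return out

# Hypothetical synthetic inputs.
rng = np.random.default_rng(2)
n, d, T, eta = 30, 3, 8, 0.1
W = rng.normal(size=(n, d))
z = rng.normal(size=n)
beta0 = rng.normal(size=d)
noises = rng.normal(scale=0.05, size=(T, d))

print(np.max(np.abs(iterate(W, z, beta0, noises, eta)
                    - closed_form(W, z, beta0, noises, eta))))
```

The index $T-t$ in (42) reflects that the noise added at the last step is not multiplied by any power of $I_d - A$, while the noise added at the first step is multiplied by $(I_d - A)^{T-1}$.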

Then,

\[
\operatorname{Var}(\hat{\bm{\beta}}^{\text{priv}}) = \sum_{t=1}^{T}(I_d - A)^{t-1}\cdot B^{\top}\Sigma_{\bm{z}}B\cdot\sum_{t=1}^{T}(I_d - A)^{t-1} + \omega^{2}\sum_{t=1}^{T}(I_d - A)^{2t-2}. \tag{43}
\]
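Formula (43) can be cross-checked without simulation, because $\bm{\beta}^{T}$ is linear in $\bm{z}$ and in the noise vectors. The sketch below assumes that $W$ and $\bm{\beta}^{0}$ are fixed, that $\bm{z}$ has covariance $\Sigma_{\bm{z}}$, and that the $\bm{u}_t$ are i.i.d. $\mathcal{N}(0,\omega^{2}I_d)$ and independent of $\bm{z}$. It recovers the linear maps by pushing unit inputs through the recursion; function names and test inputs are hypothetical.

```python
import numpy as np

def variance_formula(A, B, Sigma_z, omega, T):
    """Right-hand side of (43); A = (eta/n) W^T W is symmetric, B = (eta/n) W."""
    d = A.shape[0]
    powers = [np.linalg.matrix_power(np.eye(d) - A, t) for t in range(T)]
    S = sum(powers)                       # sum_{t=1}^T (I - A)^{t-1}
    noise = sum(P @ P for P in powers)    # sum_{t=1}^T (I - A)^{2t-2}
    return S @ B.T @ Sigma_z @ B @ S + omega**2 * noise

def variance_by_propagation(A, B, Sigma_z, omega, T):
    """Covariance of beta^T obtained by pushing unit inputs through the recursion."""
    n, d = B.shape

    def run(z, us):
        beta = np.zeros(d)                # a fixed beta^0 does not affect the variance
        for t in range(T):
            beta = (np.eye(d) - A) @ beta + B.T @ z + us[t]
        return beta

    no_noise = np.zeros((T, d))
    M_z = np.column_stack([run(e, no_noise) for e in np.eye(n)])
    V = M_z @ Sigma_z @ M_z.T             # contribution of z
    for s in range(T):
        cols = []
        for k in range(d):
            us = np.zeros((T, d))
            us[s, k] = 1.0
            cols.append(run(np.zeros(n), us))
        M_s = np.column_stack(cols)
        V += omega**2 * (M_s @ M_s.T)     # contribution of u_s
    return V

# Hypothetical synthetic inputs.
rng = np.random.default_rng(3)
n, d, T, eta, omega = 6, 3, 5, 0.1, 0.2
W = rng.normal(size=(n, d))
A, B = (eta / n) * W.T @ W, (eta / n) * W
C = rng.normal(size=(n, n))
Sigma_z = C @ C.T
V_formula = variance_formula(A, B, Sigma_z, omega, T)
V_propagated = variance_by_propagation(A, B, Sigma_z, omega, T)
print(np.max(np.abs(V_formula - V_propagated)))
```

The two computations should agree up to rounding under the stated independence assumptions.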

Proof of Theorem 4.7.

By rewriting $\hat{\bm{\beta}}^{\text{priv}}$ as in (35) and ignoring the remainder of the first-order Taylor expansion in (39), we have

\[
\hat{\bm{\beta}}^{\text{priv}} = \hat{\bm{\beta}}^{\text{RL}} + (W^{\top}W)^{-1}\bm{u} - (W^{\top}W)^{-1}U\hat{\bm{\beta}}^{\text{RL}} - (W^{\top}W)^{-1}U(W^{\top}W)^{-1}\bm{u}. \tag{44}
\]

Then,

\[
\begin{aligned}
\operatorname{Var}(\hat{\bm{\beta}}^{\text{priv}})
&= \operatorname{Var}(\hat{\bm{\beta}}^{\text{RL}}) + \operatorname{Var}[(W^{\top}W)^{-1}\bm{u}] + \operatorname{Var}[(W^{\top}W)^{-1}U\hat{\bm{\beta}}^{\text{RL}}] \\
&\quad + \operatorname{Var}[(W^{\top}W)^{-1}U(W^{\top}W)^{-1}\bm{u}] - 2\operatorname{Cov}\bigl(\hat{\bm{\beta}}^{\text{RL}}, (W^{\top}W)^{-1}U\hat{\bm{\beta}}^{\text{RL}}\bigr) \\
&\quad - 2\operatorname{Cov}\bigl((W^{\top}W)^{-1}\bm{u}, (W^{\top}W)^{-1}U(W^{\top}W)^{-1}\bm{u}\bigr) \\
&\quad + 2\operatorname{Cov}\bigl((W^{\top}W)^{-1}U\hat{\bm{\beta}}^{\text{RL}}, (W^{\top}W)^{-1}U(W^{\top}W)^{-1}\bm{u}\bigr).
\end{aligned}
\tag{45}
\]

Since $\bm{u}\sim\mathcal{N}(0,\omega^{2}I_d)$, we have

\[
\operatorname{Var}\bigl((W^{\top}W)^{-1}\bm{u}\bigr) = \omega^{2}(W^{\top}W)^{-2}.
\]

For the third term, let $\Sigma_{1} \stackrel{\text{def}}{=} \operatorname{Var}(U\hat{\bm{\beta}}^{\text{RL}})$. By Lemma A.6,

\[
(\Sigma_{1})_{kk} = \omega^{2}\sum_{i=1}^{d}\bigl(\beta_{i}^{2} + \Sigma^{\text{RL}}_{ii}\bigr)
\]

and, for $k \neq l$,

\[
(\Sigma_{1})_{kl} = \omega^{2}\bigl(\beta_{k}\beta_{l} + \Sigma^{\text{RL}}_{kl}\bigr).
\]

For the fourth term, let $\Sigma_{2} \stackrel{\text{def}}{=} \operatorname{Var}(U(W^{\top}W)^{-1}\bm{u})$ and let $\Sigma^{\prime} = \operatorname{Var}((W^{\top}W)^{-1}\bm{u}) = \omega^{2}(W^{\top}W)^{-2}$. Then

\[
(\Sigma_{2})_{kk} = \omega^{2}\sum_{i=1}^{d}\Sigma^{\prime}_{ii}
\]

and, for $k \neq l$,

\[
(\Sigma_{2})_{kl} = \omega^{2}\Sigma^{\prime}_{kl}.
\]
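The entrywise formulas for $\Sigma_{1}$ and $\Sigma_{2}$ share one pattern: for a random vector $\bm{b}$ independent of $U$ with second-moment matrix $M = \mathbb{E}[\bm{b}\bm{b}^{\top}]$, the diagonal of $\operatorname{Var}(U\bm{b})$ equals $\omega^{2}\operatorname{tr}(M)$ and the off-diagonal entries equal $\omega^{2}M_{kl}$. The sketch below checks this exactly by summing over the independent entries of $U$. It assumes that all entries on and above the diagonal of $U$ have variance $\omega^{2}$ and that $\mathbb{E}[\hat{\bm{\beta}}^{\text{RL}}] = \bm{\beta}$, so that $M = \bm{\beta}\bm{\beta}^{\top} + \Sigma^{\text{RL}}$ for $\Sigma_{1}$; the function names and inputs are hypothetical.

```python
import numpy as np

def var_Ub_exact(M, omega):
    """Var(U b) = omega^2 * sum_{i<=j} E_ij M E_ij, where E_ij is the symmetric
    basis matrix carrying the independent entry U_ij and M = E[b b^T]."""
    d = M.shape[0]
    V = np.zeros((d, d))
    for i in range(d):
        for j in range(i, d):
            E = np.zeros((d, d))
            E[i, j] = 1.0
            E[j, i] = 1.0
            V += E @ M @ E
    return omega**2 * V

def var_Ub_formula(M, omega):
    """Entrywise formulas: diagonal omega^2 * tr(M), off-diagonal omega^2 * M_kl."""
    d = M.shape[0]
    V = omega**2 * M.copy()
    V[np.diag_indices(d)] = omega**2 * np.trace(M)
    return V

# Hypothetical inputs standing in for beta and Sigma^RL.
rng = np.random.default_rng(4)
d, omega = 4, 0.3
beta = rng.normal(size=d)
C = rng.normal(size=(d, d))
Sigma_rl = C @ C.T / d
M1 = np.outer(beta, beta) + Sigma_rl      # second moment used for Sigma_1
print(np.max(np.abs(var_Ub_exact(M1, omega) - var_Ub_formula(M1, omega))))
```

For $\Sigma_{2}$ the same functions apply with `M` set to $\Sigma^{\prime}$, since $(W^{\top}W)^{-1}\bm{u}$ has mean zero.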

On the other hand, the covariances in (45) are all zero by Lemmas A.7 and A.8, owing to the independence and zero means of $U$ and $\bm{u}$. Putting these together, we have

\[
\Sigma = \Sigma^{\text{RL}} + (W^{\top}W)^{-1}\bigl(\omega^{2}I_d + \Sigma_{1} + \Sigma_{2}\bigr)(W^{\top}W)^{-1}. \tag{46}
\]

Note that $\Sigma_{1}$ and $\Sigma_{2}$ have a common factor $\omega^{2}$. By rescaling $\Sigma_{1}$ and $\Sigma_{2}$, we obtain the expression for $\Sigma$ stated in the theorem.
