Cheap Subsampling Bootstrap Confidence Intervals For Fast and Robust Inference in Biostatistics
Abstract
Bootstrapping is often applied to get confidence limits for semiparametric inference
of a target parameter in the presence of nuisance parameters. Bootstrapping with
replacement can be computationally expensive and problematic when cross-validation
is used in the estimation algorithm due to duplicate observations in the bootstrap
samples. We provide a valid, fast, easy-to-implement subsampling bootstrap method
for constructing confidence intervals for asymptotically linear estimators and discuss
its application to semiparametric causal inference. Our method, inspired by the Cheap
Bootstrap (Lam, 2022), leverages the quantiles of a t-distribution and has the desired
coverage with few bootstrap replications. We show that the method is asymptotically
valid if the subsample size is chosen appropriately as a function of the sample size. We
illustrate our method with data from the LEADER trial (Marso et al., 2016), obtaining
confidence intervals for a longitudinal targeted minimum loss-based estimator (van der
Laan and Gruber, 2012). Through a series of empirical experiments, we also explore
the impact of subsample size, sample size, and the number of bootstrap repetitions on
the performance of the confidence interval.
Keywords: bootstrap; causal inference; computational efficiency; subsampling; targeted
learning.
1 Introduction
Epidemiological studies of observational data are often characterized by large sample sizes
and analyzed using statistical algorithms that incorporate machine learning estimators to
estimate the nuisance parameters involved in the estimation of a target parameter of in-
terest (Hernán and Robins, 2016, 2020; van der Laan and Rose, 2018). The bootstrap is
a standard approach in cases where (asymptotic) formulas for standard errors do not exist
or are not implemented. But even if an asymptotic formula for constructing confidence
intervals exists, one may wish to supplement the analysis with bootstrap confidence
intervals if the validity of the formula-based standard error estimator depends on the
correct specification of the nuisance parameter models (Chiu et al., 2023). However, the
computational burden of the standard bootstrap algorithms increases with the sample size.
Methods for constructing bootstrap confidence intervals often use empirical quantiles of a
bootstrapped statistic. Popular choices include the percentile bootstrap and the bootstrap-
t confidence interval (Tibshirani and Efron, 1993). It is recommended that these methods be
run with a minimum of 1000 bootstrap replications (Efron, 1987). Other methods are based on
a standard error estimated from the bootstrap samples. According to Efron (1987), performing
between 25 and 100 bootstrap replications is sufficient for stability of standard error-based
bootstrap confidence intervals.
When bootstrap samples are drawn with replacement, complications can arise if one
of the subroutines is sensitive to duplicate observations in the data (Bickel et al., 1997).
This is the case, for example, if cross-validation is used to tune hyperparameters of machine
learning algorithms for nuisance parameters. Cross-validation is a means of evaluating the
performance of a model on unseen data by splitting the data into independent training
and test sets. However, if we first apply the bootstrap with replacement and then cross-
validation, the same observation may be present in both the training and test sets (see Figure
1). This violates the independence between the training and test sets in the cross-validation
procedure, which may lead to a biased estimate of the out-of-sample error.
Figure 1: Illustration of the problem with ties in the data when using the bootstrap with
replacement versus the subsampling bootstrap. First, a bootstrap sample is drawn with
replacement and a subsample is drawn without replacement from the data set. Each sample
is then split into two folds for the cross-validation procedure. One observation is present
in both folds of the bootstrap sample drawn with replacement; the subsample, by contrast,
does not have this issue.
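To make the issue concrete, the following minimal R sketch (illustrative only, not part of the paper's code) draws a bootstrap sample with replacement and a subsample without replacement from ten observation indices, splits each into two folds, and reports the indices that appear in both folds.

```r
set.seed(1)
n <- 10
ids <- seq_len(n)

boot_ids <- sample(ids, n, replace = TRUE)   # non-parametric bootstrap sample
sub_ids  <- sample(ids, floor(0.8 * n))      # subsample drawn without replacement

# Split a sample into two cross-validation folds
split_folds <- function(x) split(x, rep(1:2, length.out = length(x)))

# Indices shared between the two folds violate train/test independence
shared <- function(folds) intersect(folds[[1]], folds[[2]])
shared(split_folds(boot_ids))  # typically non-empty: duplicates straddle folds
shared(split_folds(sub_ids))   # always empty: a subsample contains no ties
```

Any index returned by the first call corresponds to an observation present in both the training and test sets.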
In this article, we propose a Cheap Subsampling bootstrap algorithm that samples with-
out replacement to obtain bootstrap data sets that are smaller than the original data set.
Our algorithm and formula are based on the Cheap Bootstrap confidence interval (Lam,
2022). Note that Lam (2022) also discusses subsampling, but with replacement. Their
approach may have theoretical advantages over subsampling without replacement (Bickel
et al., 1997), but our approach is compatible with cross-validation and other methods that
are sensitive to ties.
The consistency of subsampling, which we need for the validity of the Cheap Subsampling
confidence interval, has been derived under the assumption that the asymptotic distribution
of the estimator of interest exists (Politis and Romano, 1994). Consistency has also been
derived under the assumption that the estimator of interest is asymptotically linear (Wu,
1990). In contrast, non-parametric bootstrapping (drawing a bootstrap sample of size n
from the data set with replacement) is known to fail in various theoretical settings (Bickel
et al., 1997). Consistency of subsampling requires that the subsample size be chosen appropriately
as a function of the sample size. These results are asymptotic, however, and do not provide a method for
selecting the subsample size in practice. Here we apply the conditions in Wu (1990) to show
the asymptotic validity of the Cheap Subsampling confidence interval for asymptotically
linear estimators (Theorem 1). In addition, we show that the Cheap Subsampling confidence
interval converges to a confidence interval based on a delete-d jackknife variance estimator
(Shao and Wu, 1989) as the number of bootstrap repetitions increases. In the limit, our
confidence interval is valid for any number of bootstrap replications.
We demonstrate the use of our method with an application in causal inference in the
LEADER trial (Marso et al., 2016), which investigates the effects of liraglutide on cardiovas-
cular outcomes in patients with type 2 diabetes. The overall goal is to estimate the causal
effect of staying on treatment, which we estimate with a longitudinal targeted minimum
loss-based estimator (van der Laan and Gruber, 2012). Bootstrap inference for the longitu-
dinal targeted minimum loss-based estimator is of general interest and particularly useful
in cases where estimates of the standard error are not reliable (Tran et al., 2023; van der
Laan et al., 2023).
The remainder of the article is organized as follows. In Section 2, we introduce the
Cheap Subsampling algorithm and formulate the conditions for the asymptotic validity of
the Cheap Subsampling confidence interval and its connection to an asymptotic confidence
interval based on the delete-d jackknife variance estimator. In Section 3, we apply the
Cheap Subsampling confidence interval to the LEADER trial data. In Section 4, we present
a simulation study to investigate the performance of the Cheap Subsampling confidence
interval.
where $\phi_P : \mathbb{R}^d \to \mathbb{R}$ is a measurable function with $E_P[\phi_P(O)] = 0$ and $0 < E_P[\phi_P(O)^2] < \infty$,
and the remainder term fulfills $R_n(P) = o_P(1/\sqrt{n})$ for all $P \in \mathcal{P}$. A subsample $D_m^* = (O_1^*, \ldots, O_m^*)$
is a diminished data set obtained by drawing $m < n$ observations without replacement
from the data set $D_n$. We denote by $\hat{\Psi}_m^*$ the estimate based on the subsample $D_m^*$.
$$P\big(\Psi(P) \in I_{(m,n,B)}\big) \to 1 - \alpha,$$
as $m, n \to \infty$ for any $B \ge 1$.
Proof. The proof is given in Appendix A.
Remark 1. Other choices of the subsample size may be of interest, such as $m/n \to 1$ and
$n - m \to \infty$ as $n \to \infty$ (Wu, 1990), but these would require conditions on the remainder
term $R_n$ that are too restrictive for our purposes.
Next, we state a result that shows that the endpoints of the Cheap Subsampling confi-
dence interval converge to a random limit fully determined by the data Dn as the number of
bootstrap repetitions increases (Theorem 2). Specifically, the theorem has the consequence
that the endpoints of the Cheap Subsampling confidence interval (1) converge to the end-
points of an (asymptotic) confidence interval based on the delete-(n − m) jackknife variance
estimator for the variance as $B \to \infty$. The delete-$(n-m)$ jackknife variance estimator (Shao
and Wu, 1989) is given by
$$\widehat{\mathrm{Var}}_{\mathrm{jack}} = \frac{m}{n-m}\, E_P\big[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n\big].$$
If the condition of Theorem 2 is fulfilled, we have
$$\hat{\Psi}_n \pm t_{B,1-\alpha/2} \sqrt{\frac{m}{n-m}}\, S \;\xrightarrow{P}\; \hat{\Psi}_n \pm q_{1-\alpha/2} \sqrt{\widehat{\mathrm{Var}}_{\mathrm{jack}}}, \quad \text{as } B \to \infty,$$
where $q_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal distribution.
Theorem 2. Let $\hat{\Psi}_n$ be any estimator. If $E_P[\hat{\Psi}_n^4] < \infty$, then
$$S^2 = \frac{1}{B} \sum_{b=1}^{B} \big(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n\big)^2 \;\xrightarrow{P}\; E_P\big[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n\big], \quad \text{as } B \to \infty.$$
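To fix ideas, the following is a minimal R sketch of the Cheap Subsampling interval (1) as described above. Here `estimate` is a placeholder for a user-supplied function returning the scalar estimate; this is a sketch under those assumptions, not the authors' implementation.

```r
# Cheap Subsampling confidence interval: draw B subsamples of size m without
# replacement, recompute the estimate on each, and combine via t-quantiles.
cheap_subsampling_ci <- function(data, estimate, m, B = 20, alpha = 0.05) {
  n <- nrow(data)
  psi_hat <- estimate(data)
  psi_star <- replicate(B, estimate(data[sample(n, m), , drop = FALSE]))
  S <- sqrt(mean((psi_star - psi_hat)^2))  # square root of S^2 from Theorem 2
  half_width <- qt(1 - alpha / 2, df = B) * sqrt(m / (n - m)) * S
  c(estimate = psi_hat, lower = psi_hat - half_width, upper = psi_hat + half_width)
}

# Example with a simple estimator (the sample mean):
# d <- data.frame(y = rnorm(500))
# cheap_subsampling_ci(d, function(d) mean(d$y), m = floor(0.632 * nrow(d)), B = 20)
```

Note that only B + 1 evaluations of the estimator are required, which is the source of the computational savings.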
Figure 2: Lower and upper endpoints (y-axis) of 95% Cheap Subsampling confidence inter-
vals for the absolute risk of dying within 4 years for the placebo regimen in the LEADER trial
using the LTMLE. The x-axis shows the number of bootstrap repetitions B ∈ {1, . . . , 200}
for the subsample size m = ⌊0.8 · 8652⌋ = 6921. Additionally, the lower and upper endpoints
of the asymptotic confidence interval are shown as black horizontal lines, and the point
estimate as a dotted line.
4 Simulation study
In this section, we simulate data to investigate the effects of sample size, subsample size, and
the number of bootstrap repetitions on the coverage probability and the width of the Cheap
Subsampling confidence interval. We consider a survival setting with a binary treatment
and a time-to-event outcome and apply the LTMLE algorithm, for which we discretize time
into two time intervals. In the simulation study, the target parameter is the absolute risk
of an event before the end of the second time interval under sustained treatment. For
details on the data-generating mechanism, the simulation study, and the R code, see the
supplementary material (Appendix C) and https://github.com/jsohlendorff/cheap_subsampling_simulation_study.
Figure 3: The upper endpoint of the 95% Cheap Subsampling confidence interval based on
the LTMLE of the absolute risk of dying within 4 years under the placebo regimen in the
LEADER trial. The plot shows the Monte Carlo error (random seed effect) based on 10
runs of the Cheap Subsampling algorithm for each of the subsample sizes m = ⌊η · 8652⌋
with η ∈ {0.5, 0.632, 0.8, 0.9} and numbers of bootstrap repetitions B ∈ {5, 25, 100, 200}.
In our simulation study, we consider sample sizes n ∈ {250, 500, 1000, 2000, 8000} and
vary the subsample size m = ⌊η · n⌋ with η ∈ {0.5, 0.632, 0.8, 0.9} and the number of
bootstrap repetitions B ∈ {1, . . . , 500}. For each scenario, we repeat the whole procedure in
2000 simulated data sets. For the estimation of the nuisance parameters, we use (correctly
specified) logistic regression models.
In each instance, we compute the empirical coverage of the confidence intervals and
the average relative width of the Cheap Subsampling confidence interval for the LTMLE
when compared with the asymptotic confidence interval, which is based on an estimate of
the efficient influence function (van der Laan and Gruber, 2012). Additionally, we compare
our Cheap Subsampling confidence interval with the Cheap Bootstrap confidence interval
(Lam, 2022). The results are summarized across the 2000 simulated data sets in Table 1
and Figure 4.
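A hedged sketch of the coverage computation is given below, reusing the `cheap_subsampling_ci` sketch from Section 2; `simulate_data` and `psi_true` are hypothetical stand-ins for the study's actual data-generating mechanism and true parameter value.

```r
# Empirical coverage of the Cheap Subsampling interval across simulated data
# sets; simulate_data() and psi_true are illustrative stand-ins only.
empirical_coverage <- function(psi_true, n, eta, B, estimate, n_sim = 2000) {
  m <- floor(eta * n)
  hits <- replicate(n_sim, {
    ci <- cheap_subsampling_ci(simulate_data(n), estimate, m = m, B = B)
    ci[["lower"]] <= psi_true && psi_true <= ci[["upper"]]
  })
  mean(hits)  # proportion of intervals that cover the true parameter
}
```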
Figure 4: Results from the simulation study illustrating the coverage (y-axis) of three 95%
confidence intervals for the absolute risk of an event before the end of the second time
interval under sustained exposure using the LTMLE for n = 2000. The three confidence
intervals are the asymptotic confidence interval, the Cheap Subsampling confidence interval
(m = ⌊0.632 · 2000⌋ = 1264), and the Cheap Bootstrap confidence interval (Lam, 2022).
The x-axis shows the number of bootstrap repetitions B ∈ {1, . . . , 500}.
Figure 4 shows that the coverage is close to the nominal level for very low numbers
of bootstrap repetitions and fixed subsample size m = ⌊0.632 · 2000⌋ = 1264. This was
guaranteed by Theorem 1 only in large samples. Comparing with both the asymptotic
confidence interval and the Cheap Bootstrap confidence interval (Lam, 2022), we see that
the Cheap Subsampling confidence interval has similar coverage, albeit slightly worse for
very low numbers of bootstrap replications B. Table
1 shows no systematic effects on the coverage of the Cheap Subsampling confidence interval
for different subsample sizes in large sample sizes. However, the coverage appears to depend
on the subsample size when the sample size is small.
Regarding the widths in Table 1, we see that the Cheap Subsampling confidence interval is, in
general, slightly wider than the asymptotic confidence interval, but that increasing B results
in narrower confidence intervals. A possible explanation for the wider Cheap Subsampling
confidence intervals at low B is that the quantiles of the t-distribution are large for low
degrees of freedom but quite close to those of the normal distribution for large degrees of
freedom (B ≥ 25). Similar results were obtained in the case study (Section 3) for small
values of B. Moreover, the width of the Cheap Subsampling confidence interval decreases
slightly with increasing subsample size and sample size when compared to the asymptotic
confidence interval, but this effect is less noticeable than the effect of the number of bootstrap
repetitions.
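The quantile inflation can be checked directly in R: for B = 5 the 97.5% t-quantile exceeds the normal quantile by roughly a factor of 1.31, in line with the relative widths of about 125-131% observed for B = 5 in Table 1.

```r
# 97.5% quantiles of the t-distribution for B degrees of freedom vs. the normal
round(qt(0.975, df = c(5, 25, 100, 500)), 3)
#> [1] 2.571 2.060 1.984 1.965
round(qnorm(0.975), 3)
#> [1] 1.96
```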
Table 1: Coverage of the 95% Cheap Subsampling confidence interval, and its width relative
to the asymptotic confidence interval, for the absolute risk of an event before the end of
the second time interval under sustained exposure using the LTMLE, for different subsample
proportions η, sample sizes n, and numbers of bootstrap repetitions B.
                Coverage (%)                  Relative width (%)
                Subsample proportion (η)      Subsample proportion (η)
  B      n      50%    63.2%   80%    90%     50%     63.2%   80%     90%
  5      250    94.8   93.2    92.9   93.7    131.0   127.5   127.5   127.2
         500    94.2   93.8    95.0   94.5    126.3   126.0   126.9   125.8
         1000   94.0   95.0    94.2   94.7    125.1   125.8   124.9   125.3
         2000   94.8   95.1    95.0   94.5    125.6   126.2   123.5   124.8
         8000   95.3   95.8    94.5   95.3    125.8   127.1   124.8   124.0
  25     250    93.6   92.5    93.2   92.2    107.9   106.0   106.1   106.3
         500    94.2   93.8    94.3   95.1    105.8   104.9   104.8   104.1
         1000   94.5   95.2    94.2   94.3    104.3   104.8   104.5   104.1
         2000   94.2   95.0    95.3   95.0    104.9   105.2   104.2   104.3
         8000   94.7   95.3    95.3   94.5    104.2   104.8   104.0   103.6
  100    250    93.8   92.7    93.2   92.5    104.8   103.5   103.2   103.0
         500    94.2   93.8    94.9   95.0    102.5   102.2   101.8   101.7
         1000   94.9   94.8    94.6   93.8    101.5   101.5   101.4   101.1
         2000   94.2   94.6    94.8   95.0    101.5   101.3   101.0   101.1
         8000   94.5   95.2    95.0   95.4    101.0   101.3   100.7   101.0
  500    250    93.9   92.8    93.0   92.7    103.9   102.8   102.2   102.1
         500    93.8   93.8    94.7   95.1    101.7   101.3   101.1   101.0
         1000   94.8   94.8    94.5   94.0    100.8   100.7   100.6   100.5
         2000   94.0   94.5    94.8   95.0    100.6   100.5   100.3   100.3
         8000   94.6   94.9    95.0   94.7    100.2   100.3   100.2   100.2
5 Discussion
The Cheap Subsampling confidence interval is a valuable tool for applied research where
computational efficiency is needed. We have shown that it provides asymptotically valid
confidence intervals, and we have investigated its real-world and small-sample performance
for a target parameter in a semiparametric causal inference setting. The Cheap Subsampling
confidence interval is easy to implement and can be applied to any asymptotically linear
estimator. Theoretically, the method can be applied with very few bootstrap repetitions.
In our case study, however, the Monte Carlo error may not be regarded as negligible
for B < 25. This is similar to the recommendation given by Efron (1987) and is likely due to
$\frac{mn}{n-m} S^2$ being a Monte Carlo bootstrap estimator of the asymptotic variance.
Our empirical study shows that the coverage of the Cheap Subsampling confidence inter-
val is more sensitive to the subsample size in small data sets. In these situations, we need
to choose the subsample size carefully to ensure correct coverage. Politis et al. (1999) and
Bickel and Sakov (2008) provide methods for adaptively selecting the subsample size. For
example, one may want to conduct a Monte Carlo experiment by selecting from a list of sub-
sample sizes $m_1, \ldots, m_K$ the one that gives the best apparent coverage. The most notable
issue with these approaches is the computational burden. In future work, we will investigate
the possibility of adapting these methods to choosing the subsample size in practice.
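As an illustration of this idea, the sketch below picks, from a grid of candidate subsample proportions, the one whose apparent coverage on pseudo-samples of the data is closest to the nominal level. This is a simplified stand-in for the calibration procedures of Politis et al. (1999) and Bickel and Sakov (2008), not their exact algorithms; all names are illustrative, and it reuses the `cheap_subsampling_ci` sketch from Section 2.

```r
# Choose a subsample proportion by apparent coverage: treat the full-sample
# estimate as the truth, build intervals on smaller pseudo-samples, and pick
# the proportion whose coverage is closest to 1 - alpha. Illustrative only.
select_eta <- function(data, estimate, eta_grid, B = 20, alpha = 0.05,
                       n_rep = 50, n_pseudo = floor(nrow(data) / 2)) {
  n <- nrow(data)
  psi_hat <- estimate(data)  # proxy for the unknown true parameter
  apparent <- sapply(eta_grid, function(eta) {
    hits <- replicate(n_rep, {
      pseudo <- data[sample(n, n_pseudo), , drop = FALSE]
      ci <- cheap_subsampling_ci(pseudo, estimate,
                                 m = floor(eta * n_pseudo), B = B, alpha = alpha)
      ci[["lower"]] <= psi_hat && psi_hat <= ci[["upper"]]
    })
    mean(hits)
  })
  eta_grid[which.min(abs(apparent - (1 - alpha)))]
}
```

The nested resampling makes the computational burden noted above explicit: each candidate proportion requires n_rep further runs of the bootstrap.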
With large data sets, such as those found in electronic health records, there is also the
possibility of using the Bag of Little Bootstraps (Kleiner et al., 2012) or the Cheap Bag of
Little Bootstraps (Lam, 2022). The idea behind these methods is to avoid the tuning of the
subsample size but to retain computational feasibility by estimating on smaller data sets.
Another advantage over the Cheap Subsampling confidence interval is that the confidence
intervals based on the Bag of Little Bootstraps are second-order accurate, likely resulting
in narrower confidence intervals. Lam (2022) showed that the Cheap Bootstrap based on
resampling yields confidence intervals that are second-order accurate. In future work, we
will investigate if the Cheap Subsampling bootstrap confidence interval can be made second-
order accurate, e.g., by using interpolation (Bertail, 1997) and extrapolation (Bertail and
Politis, 2001). On the other hand, the Bag of Little Bootstraps samples with replacement,
and hence these methods suffer from the problem illustrated in Figure 1.
In our application of the Cheap Subsampling confidence intervals, we chose to study the
finite-sample properties with the TMLE, but there are also other choices (Tran et al.,
2023; Coyle and van der Laan, 2018). Both of these approaches provide valid confidence
intervals for the TMLE and adequately deal with the cross-validation issue. The
method in Tran et al. (2023) also reduces the computation time for bootstrapping by only
needing to estimate the nuisance parameters once in the entire sample. However, these
approaches are specifically designed for the TMLE and may not be applicable to other
estimators.
5.1 Acknowledgments
The authors would like to thank Novo Nordisk for providing the data from the LEADER
trial.
5.2 Funding
Partially funded by the European Union. Views and opinions expressed are however those
of the author(s) only and do not necessarily reflect those of the European Union or Euro-
pean Health and Digital Executive Agency (HADEA). Neither the European Union nor the
granting authority can be held responsible for them. This work has received funding from
UK Research and Innovation under contract number 101095556.
References
Bertail, P. (1997). Second-order properties of an extrapolated bootstrap without replace-
ment under weak assumptions. Bernoulli 3 (2), 149–179.
Bertail, P. and D. N. Politis (2001). Extrapolation of subsampling distribution estima-
tors: The i.i.d. and strong mixing cases. The Canadian Journal of Statistics / La Revue
Canadienne de Statistique 29 (4), 667–680.
Bickel, P. J., F. Götze, and W. R. van Zwet (1997). Resampling Fewer Than n Observations:
Gains, Losses, and Remedies for Losses. Statistica Sinica 7 (1), 1–31.
Bickel, P. J., C. A. J. Klaassen, Y. Ritov, and J. A. Wellner (1993). Efficient and adaptive
estimation for semiparametric models, Volume 4. Springer.
Bickel, P. J. and A. Sakov (2008). On the choice of m in the m out of n bootstrap and
confidence bounds for extrema. Statistica Sinica, 967–985.
Chiu, Y.-H., L. Wen, S. McGrath, R. Logan, I. J. Dahabreh, and M. A. Hernán (2023).
Evaluating model specification when using the parametric g-formula in the presence of
censoring. American journal of epidemiology 192 (11), 1887–1895.
Coyle, J. and M. J. van der Laan (2018). Targeted Bootstrap, pp. 523–539. Cham: Springer
International Publishing.
Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical
Association 82 (397), 171–185.
Hernán, M. A. and J. M. Robins (2016). Using big data to emulate a target trial when a
randomized trial is not available. American journal of epidemiology 183 (8), 758–764.
Hernán, M. A. and J. M. Robins (2020). Causal inference: What if. Boca Raton: Chapman
& Hall/CRC, FL.
Kleiner, A., A. Talwalkar, P. Sarkar, and M. I. Jordan (2012). A scalable bootstrap for
massive data.
Lam, H. (2022, January). A Cheap Bootstrap Method for Fast Inference.
https://arxiv.org/abs/2202.00090v1.
Marso, S. P., G. H. Daniels, K. Brown-Frandsen, P. Kristensen, J. F. Mann, M. A. Nauck,
S. E. Nissen, S. Pocock, N. R. Poulter, L. S. Ravn, W. M. Steinberg, M. Stockner,
B. Zinman, R. M. Bergenstal, and J. B. Buse (2016). Liraglutide and cardiovascular
outcomes in type 2 diabetes. New England Journal of Medicine 375 (4), 311–322.
Politis, D. N. and J. P. Romano (1994). Large Sample Confidence Regions Based on Sub-
samples under Minimal Assumptions. The Annals of Statistics 22 (4), 2031–2050.
Politis, D. N., J. P. Romano, and M. Wolf (1999). Subsampling. Springer Series in Statistics.
New York, NY: Springer.
Shao, J. and C. F. J. Wu (1989). A General Theory for Jackknife Variance Estimation. The
Annals of Statistics 17 (3), 1176–1197.
Tay, J. K., B. Narasimhan, and T. Hastie (2023). Elastic net regularization paths for all
generalized linear models. Journal of Statistical Software 106 (1), 1–31.
van der Laan, M. J., D. Benkeser, and W. Cai (2023). Efficient estimation of pathwise
differentiable target parameters with the undersmoothed highly adaptive lasso. The In-
ternational Journal of Biostatistics 19 (1), 261–289.
van der Laan, M. J. and S. Gruber (2012, May). Targeted Minimum Loss Based Estimation
of Causal Effects of Multiple Time Point Interventions. The International Journal of
Biostatistics 8 (1).
van der Laan, M. J., E. C. Polley, and A. E. Hubbard (2007). Super learner. Statistical
Applications in Genetics and Molecular Biology 6 (1).
van der Laan, M. J. and S. Rose (2018). Targeted Learning in Data Science: Causal In-
ference for Complex Longitudinal Studies. Springer Series in Statistics. Cham: Springer
International Publishing.
Wright, M. N. and A. Ziegler (2017). ranger: A fast implementation of random forests for
high dimensional data in C++ and R. Journal of Statistical Software 77 (1), 1–17.
Wu, C. F. J. (1990). On the Asymptotic Properties of the Jackknife Histogram. The Annals
of Statistics 18 (3), 1438–1452.
6 Appendix A: Proof of Theorem 1
To prove the theorem, we use the notation and framework of Section 2.1. Since the estimator
Ψ̂n is asymptotically linear, Slutsky’s theorem and the central limit theorem yield
$$\sqrt{n}(\hat{\Psi}_n - \Psi(P)) \xrightarrow{d} Z \sim N(0, \sigma^2)$$
if $(n-m)/n > \lambda$ for some $\lambda > 0$ for all $n \in \mathbb{N}$. By our assumption on the subsample size, we
have $m/n \le c$ for some $0 < c < 1$, and thus $(n-m)/n = 1 - m/n \ge \lambda := 1 - c > 0$. This implies
$$P\left(\sqrt{\tfrac{mn}{n-m}}\,(\hat{\Psi}_m^* - \hat{\Psi}_n) \le x \,\Big|\, D_n\right) \xrightarrow{P} \Phi_{\sigma^2}(x),$$
$$\Big(\sqrt{n}(\hat{\Psi}_n - \Psi(P)),\; \sqrt{k_{(m,n)}}(\hat{\Psi}_{(m,1)}^* - \hat{\Psi}_n),\; \ldots,\; \sqrt{k_{(m,n)}}(\hat{\Psi}_{(m,B)}^* - \hat{\Psi}_n)\Big) \xrightarrow{d} (Z_0, Z_1, \ldots, Z_B), \quad (2)$$
where $Z_0, \ldots, Z_B$ are independent and identically distributed with $Z_b \overset{d}{=} Z$. By conditioning
on $D_n$ and using conditional independence of $\sqrt{k_{(m,n)}}(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n)$, $b = 1, \ldots, B$ (bootstrap
samples are drawn independently given the data), we have
$$\begin{aligned}
&\Big| P\Big(\sqrt{n}(\hat{\Psi}_n - \Psi(P)) \le z_0,\; \sqrt{k_{(m,n)}}(\hat{\Psi}_{(m,1)}^* - \hat{\Psi}_n) \le z_1,\; \ldots,\; \sqrt{k_{(m,n)}}(\hat{\Psi}_{(m,B)}^* - \hat{\Psi}_n) \le z_B\Big) - \prod_{b=0}^{B} \Phi_{\sigma^2}(z_b) \Big| \\
&\quad \le E\left[ I\Big(\sqrt{n}(\hat{\Psi}_n - \Psi(P)) \le z_0\Big) \left| \prod_{b=1}^{B} P\Big(\sqrt{k_{(m,n)}}(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n) \le z_b \,\Big|\, D_n\Big) - \prod_{b=1}^{B} \Phi_{\sigma^2}(z_b) \right| \right] \quad (3) \\
&\quad\quad + \prod_{b=1}^{B} \Phi_{\sigma^2}(z_b) \left| P\Big(\sqrt{n}(\hat{\Psi}_n - \Psi(P)) \le z_0\Big) - \Phi_{\sigma^2}(z_0) \right|, \quad (4)
\end{aligned}$$
for any $z = (z_0, \ldots, z_B) \in \mathbb{R}^{B+1}$. Since, by assumption, $\sqrt{n}(\hat{\Psi}_n - \Psi(P)) \xrightarrow{d} Z$, the term in
equation (4) converges to zero as $n \to \infty$. Since also $P\big(\sqrt{k_{(m,n)}}(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n) \le z \mid D_n\big) \xrightarrow{P}
\Phi_{\sigma^2}(z)$ for $b = 1, \ldots, B$ and all $z \in \mathbb{R}$ as $n \to \infty$, it follows that the integrand of the term (3)
converges to zero in probability as $n \to \infty$. Since the integrand in the term (3) is bounded
by 1, it follows from dominated convergence that (3) tends to zero. Thus (2) holds. From
this result, we deduce that
$$T_{(m,n)} = \frac{\hat{\Psi}_n - \Psi(P)}{\sqrt{\frac{m}{n-m}\, S^2}} = \frac{\sqrt{n}(\hat{\Psi}_n - \Psi(P))}{\sqrt{k_{(m,n)}\, S^2}} = \frac{\sqrt{n}(\hat{\Psi}_n - \Psi(P))/\sigma}{\sqrt{\frac{1}{B} \sum_{b=1}^{B} \frac{k_{(m,n)}}{\sigma^2} \big(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n\big)^2}} \;\xrightarrow{d}\; \frac{\tilde{Z}_0}{\sqrt{\frac{1}{B} \sum_{b=1}^{B} \tilde{Z}_b^2}},$$
where $\tilde{Z}_b = Z_b/\sigma \sim N(0,1)$. Note that by the independence of the $\tilde{Z}_b$'s and the fact that
$\tilde{Z}_b \overset{d}{=} N(0,1)$, the limit $\tilde{Z}_0 \big/ \sqrt{\tfrac{1}{B}\sum_{b=1}^{B} \tilde{Z}_b^2}$ has a t-distribution with $B$ degrees of freedom. This
shows that $T_{(m,n)}$ converges in distribution to a t-distribution with $B$ degrees of freedom
as $n \to \infty$. Thus, we have
$$P\big(\Psi(P) \in I_{(m,n,B)}\big) \to 1 - \alpha$$
as $n \to \infty$.
7 Appendix B: Proof of Theorem 2
Applying Hölder's inequality with $p = 4/i$ and $q = 4/(4-i)$, we have $|E[\hat{\Psi}_n^{4-i} (\hat{\Psi}_m^*)^i]| \le
(E[\hat{\Psi}_n^4])^{1/p} \big(E[(\hat{\Psi}_m^*)^4]\big)^{1/q} = (E[\hat{\Psi}_n^4])^{1/p} \big(E[\hat{\Psi}_m^4]\big)^{1/q}$, where the latter equality follows from the
fact that the subsample has marginally the same distribution as a full sample of $m$ obser-
vations. Moreover, since $E[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n] = \operatorname{argmin}_g E\big[\big((\hat{\Psi}_m^* - \hat{\Psi}_n)^2 - g(D_n)\big)^2\big]$, where the
minimum is taken over all $D_n$-measurable functions $g$, we have that
$$E\Big[\Big((\hat{\Psi}_m^* - \hat{\Psi}_n)^2 - E[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n]\Big)^2\Big] \le E\big[(\hat{\Psi}_m^* - \hat{\Psi}_n)^4\big] < \infty. \quad (5)$$
By using that conditionally on $D_n$, the random variables $\hat{\Psi}_{(m,1)}^*, \ldots, \hat{\Psi}_{(m,B)}^*$ are indepen-
dent, we have by Chebyshev's inequality for conditional expectations, for arbitrary $\varepsilon > 0$,
that
$$P\left(\left|\frac{1}{B}\sum_{b=1}^{B}\big(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n\big)^2 - E[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n]\right| \ge \varepsilon \,\middle|\, D_n\right) \le \frac{1}{B^2 \varepsilon^2} \sum_{b=1}^{B} \mathrm{Var}\big[(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n)^2 \mid D_n\big].$$
Taking the expectation on both sides of the previous display, we have
$$\begin{aligned}
P\left(\left|\frac{1}{B}\sum_{b=1}^{B}\big(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n\big)^2 - E[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n]\right| \ge \varepsilon\right)
&\le \frac{1}{B\varepsilon^2} E\big[\mathrm{Var}[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n]\big] \quad (6) \\
&= \frac{1}{B\varepsilon^2} E\Big[E\Big[\Big((\hat{\Psi}_m^* - \hat{\Psi}_n)^2 - E[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n]\Big)^2 \,\Big|\, D_n\Big]\Big] \\
&= \frac{1}{B\varepsilon^2} E\Big[\Big((\hat{\Psi}_m^* - \hat{\Psi}_n)^2 - E[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n]\Big)^2\Big] \quad (7) \\
&\le \frac{1}{B\varepsilon^2} E\big[(\hat{\Psi}_m^* - \hat{\Psi}_n)^4\big], \quad (8)
\end{aligned}$$
where (6) follows since $\big(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n\big)^2$, $b = 1, \ldots, B$, are identically distributed and expec-
tations are linear, (7) follows from the tower property of conditional expectations, and (8)
follows by (5). Taking the limit as $B \to \infty$ concludes the proof.
where $\mathrm{expit}(x) = \frac{1}{1+\exp(-x)}$ is the logistic function, $N(\mu, \sigma^2)$ is the normal distribution with
mean $\mu$ and variance $\sigma^2$, $\mathrm{Bern}(p)$ is the Bernoulli distribution with success probability $p$, and $\emptyset$ is
the missingness indicator.