Cheap Subsampling Bootstrap Confidence Intervals For Fast and Robust Inference in Biostatistics
Abstract
Bootstrapping is often applied to get confidence limits for semiparametric inference
of a target parameter in the presence of nuisance parameters. Bootstrapping with
replacement can be computationally expensive and problematic when cross-validation
is used in the estimation algorithm due to duplicate observations in the bootstrap
samples. We provide a valid, fast, easy-to-implement subsampling bootstrap method
for constructing confidence intervals for asymptotically linear estimators and discuss
its application to semiparametric causal inference. Our method, inspired by the Cheap
Bootstrap (Lam, 2022), leverages the quantiles of a t-distribution and has the desired
coverage with few bootstrap replications. We show that the method is asymptotically
valid if the subsample size is chosen appropriately as a function of the sample size. We
illustrate our method with data from the LEADER trial (Marso et al., 2016), obtaining
confidence intervals for a longitudinal targeted minimum loss-based estimator (van der
Laan and Gruber, 2012). Through a series of empirical experiments, we also explore
the impact of subsample size, sample size, and the number of bootstrap repetitions on
the performance of the confidence interval.
Keywords: bootstrap; causal inference; computational efficiency; subsampling; targeted
learning.
1 Introduction
Epidemiological studies of observational data are often characterized by large sample sizes
and analyzed using statistical algorithms that incorporate machine learning estimators to
estimate the nuisance parameters involved in the estimation of a target parameter of in-
terest (Hernán and Robins, 2016, 2020; van der Laan and Rose, 2018). The bootstrap is
a standard approach in cases where (asymptotic) formulas for standard errors do not exist
or are not implemented. But even if an asymptotic formula for constructing confidence
intervals exists, one may wish to supplement the analysis with bootstrap confidence
intervals if the validity of the formula-based standard error estimator depends on the
correct specification of the nuisance parameter models (Chiu et al., 2023). However, the
computational burden of the standard bootstrap algorithms increases with the sample size.
Methods for constructing bootstrap confidence intervals often use empirical quantiles of a
bootstrapped statistic. Popular choices include the percentile bootstrap and the bootstrap-
t confidence interval (Tibshirani and Efron, 1993). It is recommended that these methods be
run with a minimum of 1000 bootstrap replications (Efron, 1987). Other methods are based on
a standard error estimated from the bootstrap samples. According to Efron (1987), performing
between 25 and 100 bootstrap replications is sufficient for stability of standard error-based
bootstrap confidence intervals.
When bootstrap samples are drawn with replacement, complications can arise if one
of the subroutines is sensitive to duplicate observations in the data (Bickel et al., 1997).
This is the case, for example, if cross-validation is used to tune hyperparameters of machine
learning algorithms for nuisance parameters. Cross-validation is a means of evaluating the
performance of a model on unseen data by splitting the data into independent training
and test sets. However, if we first apply the bootstrap with replacement and then cross-
validation, the same observation may be present in both the training and test sets (see Figure
1). This violates the independence between the training and test sets in the cross-validation
procedure, which may lead to a biased estimate of the out-of-sample error.
Figure 1: Illustration of the problem with ties in the data when using the bootstrap with
replacement versus the subsampling bootstrap. First, a bootstrap sample is drawn with
replacement and a subsample is drawn without replacement from the data set. Each sample
is then split into two folds for the cross-validation procedure. One observation is present
in both folds of the bootstrap sample drawn with replacement; the subsample, by contrast,
does not have this issue.
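To make the issue concrete, the following minimal R sketch (illustrative only, not part of the paper's code) draws a bootstrap sample with replacement and a subsample without replacement from ten observation indices, splits each into two folds, and reports the indices that appear in both folds.

```r
set.seed(1)
n <- 10
ids <- seq_len(n)

boot_ids <- sample(ids, n, replace = TRUE)   # non-parametric bootstrap sample
sub_ids  <- sample(ids, floor(0.8 * n))      # subsample drawn without replacement

# Split a sample into two cross-validation folds
split_folds <- function(x) split(x, rep(1:2, length.out = length(x)))

# Indices shared between the two folds violate train/test independence
shared <- function(folds) intersect(folds[[1]], folds[[2]])
shared(split_folds(boot_ids))  # typically non-empty: duplicates straddle folds
shared(split_folds(sub_ids))   # always empty: a subsample contains no ties
```

Any index returned by the first call corresponds to an observation present in both the training and test sets.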
In this article, we propose a Cheap Subsampling bootstrap algorithm that samples with-
out replacement to obtain bootstrap data sets that are smaller than the original data set.
Our algorithm and formula are based on the Cheap Bootstrap confidence interval (Lam,
2022). Note that Lam (2022) also discusses subsampling, but with replacement. Their
approach may have theoretical advantages over subsampling without replacement (Bickel
et al., 1997), but our approach is compatible with cross-validation and other methods that
are sensitive to ties.
The consistency of subsampling, which we need for the validity of the Cheap Subsampling
confidence interval, has been derived under the assumption that the asymptotic distribution
of the estimator of interest exists (Politis and Romano, 1994). Consistency has also been
derived under the assumption that the estimator of interest is asymptotically linear (Wu,
1990). In contrast, non-parametric bootstrapping (drawing a bootstrap sample of size n
from the data set with replacement) is known to fail in various theoretical settings (Bickel
et al., 1997). Consistency of subsampling requires that the subsample size be chosen appropriately
as a function of the sample size. These results are asymptotic, however, and do not provide a method for
selecting the subsample size in practice. Here we apply the conditions in Wu (1990) to show
the asymptotic validity of the Cheap Subsampling confidence interval for asymptotically
linear estimators (Theorem 1). In addition, we show that the Cheap Subsampling confidence
interval converges to a confidence interval based on a delete-d jackknife variance estimator
(Shao and Wu, 1989) as the number of bootstrap repetitions increases. In the limit, our
confidence interval is valid for any number of bootstrap replications.
We demonstrate the use of our method with an application in causal inference in the
LEADER trial (Marso et al., 2016), which investigates the effects of liraglutide on cardiovas-
cular outcomes in patients with type 2 diabetes. The overall goal is to estimate the causal
effect of staying on treatment, which we estimate with a longitudinal targeted minimum
loss-based estimator (van der Laan and Gruber, 2012). Bootstrap inference for the longitu-
dinal targeted minimum loss-based estimator is of general interest and particularly useful
in cases where estimates of the standard error are not reliable (Tran et al., 2023; van der
Laan et al., 2023).
The remainder of the article is organized as follows. In Section 2, we introduce the
Cheap Subsampling algorithm and formulate the conditions for the asymptotic validity of
the Cheap Subsampling confidence interval and its connection to an asymptotic confidence
interval based on the delete-d jackknife variance estimator. In Section 3, we apply the
Cheap Subsampling confidence interval to the LEADER trial data. In Section 4, we present
a simulation study to investigate the performance of the Cheap Subsampling confidence
interval.
where $\phi_P : \mathbb{R}^d \to \mathbb{R}$ is a measurable function with $E_P[\phi_P(O)] = 0$ and $0 < E_P[\phi_P(O)^2] < \infty$,
and the remainder term fulfills $R_n(P) = o_P(1/\sqrt{n})$ for all $P \in \mathcal{P}$. A subsample $D_m^* = (O_1^*, \ldots, O_m^*)$
is a diminished data set obtained by drawing $m < n$ observations without replacement
from the data set $D_n$. We denote by $\hat{\Psi}_m^*$ the estimate based on the subsample $D_m^*$.
$$P\big(\Psi(P) \in I_{(m,n,B)}\big) \to 1 - \alpha,$$
as $m, n \to \infty$ for any $B \ge 1$.
Proof. The proof is given in Appendix A.
Remark 1. Other choices of the subsample size may be of interest, such as $m/n \to 1$ and
$n - m \to \infty$ as $n \to \infty$ (Wu, 1990), but these would require conditions on the remainder
term $R_n$ that are too restrictive for our purposes.
Next, we state a result that shows that the endpoints of the Cheap Subsampling confi-
dence interval converge to a random limit fully determined by the data Dn as the number of
bootstrap repetitions increases (Theorem 2). Specifically, the theorem has the consequence
that the endpoints of the Cheap Subsampling confidence interval (1) converge to the end-
points of an (asymptotic) confidence interval based on the delete-(n − m) jackknife variance
estimator for the variance as $B \to \infty$. The delete-$(n-m)$ jackknife variance estimator (Shao
and Wu, 1989) is given by
$$\widehat{\mathrm{Var}}_{\mathrm{jack}} = \frac{m}{n-m}\, E_P\big[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n\big].$$
If the condition of Theorem 2 is fulfilled, we have
$$\hat{\Psi}_n \pm t_{B,1-\alpha/2} \sqrt{\frac{m}{n-m}}\, S \;\xrightarrow{P}\; \hat{\Psi}_n \pm q_{1-\alpha/2} \sqrt{\widehat{\mathrm{Var}}_{\mathrm{jack}}}, \quad \text{as } B \to \infty,$$
where $q_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal distribution.
Theorem 2. Let $\hat{\Psi}_n$ be any estimator. If $E_P[\hat{\Psi}_n^4] < \infty$, then
$$S^2 = \frac{1}{B} \sum_{b=1}^{B} \big(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n\big)^2 \;\xrightarrow{P}\; E_P\big[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n\big], \quad \text{as } B \to \infty.$$
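To fix ideas, the following is a minimal R sketch of the Cheap Subsampling interval (1) as described above. Here `estimate` is a placeholder for a user-supplied function returning the scalar estimate; this is a sketch under those assumptions, not the authors' implementation.

```r
# Cheap Subsampling confidence interval: draw B subsamples of size m without
# replacement, recompute the estimate on each, and combine via t-quantiles.
cheap_subsampling_ci <- function(data, estimate, m, B = 20, alpha = 0.05) {
  n <- nrow(data)
  psi_hat <- estimate(data)
  psi_star <- replicate(B, estimate(data[sample(n, m), , drop = FALSE]))
  S <- sqrt(mean((psi_star - psi_hat)^2))  # square root of S^2 from Theorem 2
  half_width <- qt(1 - alpha / 2, df = B) * sqrt(m / (n - m)) * S
  c(estimate = psi_hat, lower = psi_hat - half_width, upper = psi_hat + half_width)
}

# Example with a simple estimator (the sample mean):
# d <- data.frame(y = rnorm(500))
# cheap_subsampling_ci(d, function(d) mean(d$y), m = floor(0.632 * nrow(d)), B = 20)
```

Note that only B + 1 evaluations of the estimator are required, which is the source of the computational savings.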
Figure 2: Lower and upper endpoints (y-axis) of 95% Cheap Subsampling confidence inter-
vals for the absolute risk of dying within 4 years for the placebo regimen in the LEADER trial
using the LTMLE. The x-axis shows the number of bootstrap repetitions B ∈ {1, . . . , 200}
for the subsample size m = ⌊0.8 · 8652⌋ = 6921. Additionally, the lower and upper endpoints
of the asymptotic confidence interval are shown as black horizontal lines, and the point
estimate as a dotted line.
4 Simulation study
In this section, we simulate data to investigate the effects of sample size, subsample size, and
the number of bootstrap repetitions on the coverage probability and the width of the Cheap
Subsampling confidence interval. We consider a survival setting with a binary treatment
and a time-to-event outcome and apply the LTMLE algorithm, for which we discretize time
into two time intervals. In the simulation study, the target parameter is the absolute risk
of an event before the end of the second time interval under sustained treatment. For
details on the data-generating mechanism, the simulation study, and the R code, see the
supplementary material (Appendix C) and https://github.com/jsohlendorff/cheap_subsampling_simulation_study.
Figure 3: The upper endpoint of the 95% Cheap Subsampling confidence interval based on
the LTMLE of the absolute risk of dying within 4 years under the placebo regimen in the
LEADER trial. The plot shows the Monte Carlo error (random seed effect) based on 10
runs of the Cheap Subsampling algorithm for each of the subsample sizes m = ⌊η · 8652⌋
with η ∈ {0.5, 0.632, 0.8, 0.9} and numbers of bootstrap repetitions B ∈ {5, 25, 100, 200}.
In our simulation study, we consider sample sizes n ∈ {250, 500, 1000, 2000, 8000} and
vary the subsample size m = ⌊η · n⌋ with η ∈ {0.5, 0.632, 0.8, 0.9} and the number of
bootstrap repetitions B ∈ {1, . . . , 500}. For each scenario, we repeat the whole procedure in
2000 simulated data sets. For the estimation of the nuisance parameters, we use (correctly
specified) logistic regression models.
In each instance, we compute the empirical coverage of the confidence intervals and
the average relative width of the Cheap Subsampling confidence interval for the LTMLE
when compared with the asymptotic confidence interval, which is based on an estimate of
the efficient influence function (van der Laan and Gruber, 2012). Additionally, we compare
our Cheap Subsampling confidence interval with the Cheap Bootstrap confidence interval
(Lam, 2022). The results are summarized across the 2000 simulated data sets in Table 1
and Figure 4.
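A hedged sketch of the coverage computation is given below, reusing the `cheap_subsampling_ci` sketch from Section 2; `simulate_data` and `psi_true` are hypothetical stand-ins for the study's actual data-generating mechanism and true parameter value.

```r
# Empirical coverage of the Cheap Subsampling interval across simulated data
# sets; simulate_data() and psi_true are illustrative stand-ins only.
empirical_coverage <- function(psi_true, n, eta, B, estimate, n_sim = 2000) {
  m <- floor(eta * n)
  hits <- replicate(n_sim, {
    ci <- cheap_subsampling_ci(simulate_data(n), estimate, m = m, B = B)
    ci[["lower"]] <= psi_true && psi_true <= ci[["upper"]]
  })
  mean(hits)  # proportion of intervals that cover the true parameter
}
```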
Figure 4: Results from the simulation study illustrating the coverage (y-axis) of three 95%
confidence intervals for the absolute risk of an event before the end of the second time
interval under sustained exposure using the LTMLE for n = 2000. The three confidence
intervals are the asymptotic confidence interval, the Cheap Subsampling confidence interval
(m = ⌊0.632 · 2000⌋ = 1264), and the Cheap Bootstrap confidence interval (Lam, 2022).
The x-axis shows the number of bootstrap repetitions B ∈ {1, . . . , 500}.
Figure 4 shows that the coverage is close to the nominal level for very low numbers
of bootstrap repetitions and fixed subsample size m = ⌊0.632 · 2000⌋ = 1264. This was
guaranteed by Theorem 1 only in large samples. Comparing with both the asymptotic
confidence interval and the Cheap Bootstrap confidence interval (Lam, 2022), we see that
the Cheap Subsampling confidence interval has similar coverage, albeit slightly worse for
very low numbers of bootstrap replications B. Table
1 shows no systematic effects on the coverage of the Cheap Subsampling confidence interval
for different subsample sizes in large sample sizes. However, the coverage appears to depend
on the subsample size when the sample size is small.
Regarding the widths in Table 1, we see that the Cheap Subsampling confidence interval is, in
general, slightly wider than the asymptotic confidence interval, but that increasing B results
in narrower confidence intervals. A possible explanation for the wider Cheap Subsampling
confidence intervals at low B is that the quantiles of the t-distribution are large for low
degrees of freedom but quite close to those of the normal distribution for large degrees of
freedom (B ≥ 25). Similar results were obtained in the case study (Section 3) for small
values of B. Moreover, the width of the Cheap Subsampling confidence interval decreases
slightly with increasing subsample size and sample size when compared to the asymptotic
confidence interval, but this effect is less noticeable than the effect of the number of bootstrap
repetitions.
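The quantile inflation can be checked directly in R: for B = 5 the 97.5% t-quantile exceeds the normal quantile by roughly a factor of 1.31, in line with the relative widths of about 125-131% observed for B = 5 in Table 1.

```r
# 97.5% quantiles of the t-distribution for B degrees of freedom vs. the normal
round(qt(0.975, df = c(5, 25, 100, 500)), 3)
#> [1] 2.571 2.060 1.984 1.965
round(qnorm(0.975), 3)
#> [1] 1.96
```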
Table 1: Coverage of the 95% Cheap Subsampling confidence interval, and its width relative
to the asymptotic confidence interval, for the absolute risk of an event before the end of
the second time interval under sustained exposure using the LTMLE, for different subsample
proportions η, sample sizes n, and numbers of bootstrap repetitions B.
                Coverage (%)                  Relative width (%)
                Subsample proportion (η)      Subsample proportion (η)
  B      n      50%    63.2%   80%    90%     50%     63.2%   80%     90%
  5      250    94.8   93.2    92.9   93.7    131.0   127.5   127.5   127.2
         500    94.2   93.8    95.0   94.5    126.3   126.0   126.9   125.8
         1000   94.0   95.0    94.2   94.7    125.1   125.8   124.9   125.3
         2000   94.8   95.1    95.0   94.5    125.6   126.2   123.5   124.8
         8000   95.3   95.8    94.5   95.3    125.8   127.1   124.8   124.0
  25     250    93.6   92.5    93.2   92.2    107.9   106.0   106.1   106.3
         500    94.2   93.8    94.3   95.1    105.8   104.9   104.8   104.1
         1000   94.5   95.2    94.2   94.3    104.3   104.8   104.5   104.1
         2000   94.2   95.0    95.3   95.0    104.9   105.2   104.2   104.3
         8000   94.7   95.3    95.3   94.5    104.2   104.8   104.0   103.6
  100    250    93.8   92.7    93.2   92.5    104.8   103.5   103.2   103.0
         500    94.2   93.8    94.9   95.0    102.5   102.2   101.8   101.7
         1000   94.9   94.8    94.6   93.8    101.5   101.5   101.4   101.1
         2000   94.2   94.6    94.8   95.0    101.5   101.3   101.0   101.1
         8000   94.5   95.2    95.0   95.4    101.0   101.3   100.7   101.0
  500    250    93.9   92.8    93.0   92.7    103.9   102.8   102.2   102.1
         500    93.8   93.8    94.7   95.1    101.7   101.3   101.1   101.0
         1000   94.8   94.8    94.5   94.0    100.8   100.7   100.6   100.5
         2000   94.0   94.5    94.8   95.0    100.6   100.5   100.3   100.3
         8000   94.6   94.9    95.0   94.7    100.2   100.3   100.2   100.2
5 Discussion
The Cheap Subsampling confidence interval is a valuable tool for applied research where
computational efficiency is needed. We have shown that it provides asymptotically valid
confidence intervals, and we have investigated its real-world and small-sample performance
for a target parameter in a semiparametric causal inference setting. The Cheap Subsampling
confidence interval is easy to implement and can be applied to any asymptotically linear
estimator. Theoretically, the method can be applied with very few bootstrap repetitions.
In our case study, however, the Monte Carlo error may not be regarded as negligible
for B < 25. This is similar to the recommendation given by Efron (1987) and is likely due to
$\frac{mn}{n-m} S^2$ being a Monte Carlo bootstrap estimator of the asymptotic variance.
Our empirical study shows that the coverage of the Cheap Subsampling confidence inter-
val is more sensitive to the subsample size in small data sets. In these situations, we need
to choose the subsample size carefully to ensure correct coverage. Politis et al. (1999) and
Bickel and Sakov (2008) provide methods for adaptively selecting the subsample size. For
example, one may want to conduct a Monte Carlo experiment by selecting from a list of sub-
sample sizes $m_1, \ldots, m_K$ the one that gives the best apparent coverage. The most notable
issue with these approaches is the computational burden. In future work, we will investigate
the possibility of adapting these methods to choosing the subsample size in practice.
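As an illustration of this idea, the sketch below picks, from a grid of candidate subsample proportions, the one whose apparent coverage on pseudo-samples of the data is closest to the nominal level. This is a simplified stand-in for the calibration procedures of Politis et al. (1999) and Bickel and Sakov (2008), not their exact algorithms; all names are illustrative, and it reuses the `cheap_subsampling_ci` sketch from Section 2.

```r
# Choose a subsample proportion by apparent coverage: treat the full-sample
# estimate as the truth, build intervals on smaller pseudo-samples, and pick
# the proportion whose coverage is closest to 1 - alpha. Illustrative only.
select_eta <- function(data, estimate, eta_grid, B = 20, alpha = 0.05,
                       n_rep = 50, n_pseudo = floor(nrow(data) / 2)) {
  n <- nrow(data)
  psi_hat <- estimate(data)  # proxy for the unknown true parameter
  apparent <- sapply(eta_grid, function(eta) {
    hits <- replicate(n_rep, {
      pseudo <- data[sample(n, n_pseudo), , drop = FALSE]
      ci <- cheap_subsampling_ci(pseudo, estimate,
                                 m = floor(eta * n_pseudo), B = B, alpha = alpha)
      ci[["lower"]] <= psi_hat && psi_hat <= ci[["upper"]]
    })
    mean(hits)
  })
  eta_grid[which.min(abs(apparent - (1 - alpha)))]
}
```

The nested resampling makes the computational burden noted above explicit: each candidate proportion requires n_rep further runs of the bootstrap.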
With large data sets, such as those found in electronic health records, there is also the
possibility of using the Bag of Little Bootstraps (Kleiner et al., 2012) or the Cheap Bag of
Little Bootstraps (Lam, 2022). The idea behind these methods is to avoid the tuning of the
subsample size but to retain computational feasibility by estimating on smaller data sets.
Another advantage over the Cheap Subsampling confidence interval is that the confidence
intervals based on the Bag of Little Bootstraps are second-order accurate, likely resulting
in narrower confidence intervals. Lam (2022) showed that the Cheap Bootstrap based on
resampling yields confidence intervals that are second-order accurate. In future work, we
will investigate if the Cheap Subsampling bootstrap confidence interval can be made second-
order accurate, e.g., by using interpolation (Bertail, 1997) and extrapolation (Bertail and
Politis, 2001). On the other hand, the Bag of Little Bootstraps samples with replacement,
and hence these methods suffer from the problem illustrated in Figure 1.
In our application of the Cheap Subsampling confidence intervals, we chose to study the
finite-sample properties with the TMLE, but there are also other choices (Tran et al.,
2023; Coyle and van der Laan, 2018). Both of these approaches provide valid confidence
intervals for the TMLE and adequately deal with the cross-validation issue. The
method in Tran et al. (2023) also reduces the computation time for bootstrapping by only
needing to estimate the nuisance parameters once in the entire sample. However, these
approaches are specifically designed for the TMLE and may not be applicable to other
estimators.
5.1 Acknowledgments
The authors would like to thank Novo Nordisk for providing the data from the LEADER
trial.
5.2 Funding
Partially funded by the European Union. Views and opinions expressed are however those
of the author(s) only and do not necessarily reflect those of the European Union or Euro-
pean Health and Digital Executive Agency (HADEA). Neither the European Union nor the
granting authority can be held responsible for them. This work has received funding from
UK Research and Innovation under contract number 101095556.
References
Bertail, P. (1997). Second-order properties of an extrapolated bootstrap without replace-
ment under weak assumptions. Bernoulli 3 (2), 149–179.
Bertail, P. and D. N. Politis (2001). Extrapolation of subsampling distribution estima-
tors: The i.i.d. and strong mixing cases. The Canadian Journal of Statistics / La Revue
Canadienne de Statistique 29 (4), 667–680.
Bickel, P. J., F. Götze, and W. R. van Zwet (1997). Resampling Fewer Than n Observations:
Gains, Losses, and Remedies for Losses. Statistica Sinica 7 (1), 1–31.
Bickel, P. J., C. A. J. Klaassen, Y. Ritov, and J. A. Wellner (1993). Efficient and adaptive
estimation for semiparametric models, Volume 4. Springer.
Bickel, P. J. and A. Sakov (2008). On the choice of m in the m out of n bootstrap and
confidence bounds for extrema. Statistica Sinica, 967–985.
Chiu, Y.-H., L. Wen, S. McGrath, R. Logan, I. J. Dahabreh, and M. A. Hernán (2023).
Evaluating model specification when using the parametric g-formula in the presence of
censoring. American journal of epidemiology 192 (11), 1887–1895.
Coyle, J. and M. J. van der Laan (2018). Targeted Bootstrap, pp. 523–539. Cham: Springer
International Publishing.
Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical
Association 82 (397), 171–185.
Hernán, M. A. and J. M. Robins (2016). Using big data to emulate a target trial when a
randomized trial is not available. American journal of epidemiology 183 (8), 758–764.
Hernán, M. A. and J. M. Robins (2020). Causal inference: What if. Boca Raton: Chapman
& Hall/CRC, FL.
Kleiner, A., A. Talwalkar, P. Sarkar, and M. I. Jordan (2012). A scalable bootstrap for
massive data.
Lam, H. (2022, January). A Cheap Bootstrap Method for Fast Inference.
https://arxiv.org/abs/2202.00090v1.
Marso, S. P., G. H. Daniels, K. Brown-Frandsen, P. Kristensen, J. F. Mann, M. A. Nauck,
S. E. Nissen, S. Pocock, N. R. Poulter, L. S. Ravn, W. M. Steinberg, M. Stockner,
B. Zinman, R. M. Bergenstal, and J. B. Buse (2016). Liraglutide and cardiovascular
outcomes in type 2 diabetes. New England Journal of Medicine 375 (4), 311–322.
Politis, D. N. and J. P. Romano (1994). Large Sample Confidence Regions Based on Sub-
samples under Minimal Assumptions. The Annals of Statistics 22 (4), 2031–2050.
Politis, D. N., J. P. Romano, and M. Wolf (1999). Subsampling. Springer Series in Statistics.
New York, NY: Springer.
Shao, J. and C. F. J. Wu (1989). A General Theory for Jackknife Variance Estimation. The
Annals of Statistics 17 (3), 1176–1197.
Tay, J. K., B. Narasimhan, and T. Hastie (2023). Elastic net regularization paths for all
generalized linear models. Journal of Statistical Software 106 (1), 1–31.
van der Laan, M. J., D. Benkeser, and W. Cai (2023). Efficient estimation of pathwise
differentiable target parameters with the undersmoothed highly adaptive lasso. The In-
ternational Journal of Biostatistics 19 (1), 261–289.
van der Laan, M. J. and S. Gruber (2012, May). Targeted Minimum Loss Based Estimation
of Causal Effects of Multiple Time Point Interventions. The International Journal of
Biostatistics 8 (1).
van der Laan, M. J., E. C. Polley, and A. E. Hubbard (2007). Super learner. Statistical
Applications in Genetics and Molecular Biology 6 (1).
van der Laan, M. J. and S. Rose (2018). Targeted Learning in Data Science: Causal In-
ference for Complex Longitudinal Studies. Springer Series in Statistics. Cham: Springer
International Publishing.
Wright, M. N. and A. Ziegler (2017). ranger: A fast implementation of random forests for
high dimensional data in C++ and R. Journal of Statistical Software 77 (1), 1–17.
Wu, C. F. J. (1990). On the Asymptotic Properties of the Jackknife Histogram. The Annals
of Statistics 18 (3), 1438–1452.
6 Appendix A: Proof of Theorem 1
To prove the theorem, we use the notation and framework of Section 2.1. Since the estimator
Ψ̂n is asymptotically linear, Slutsky’s theorem and the central limit theorem yield
$$\sqrt{n}(\hat{\Psi}_n - \Psi(P)) \xrightarrow{d} Z \sim N(0, \sigma^2)$$
if $(n-m)/n > \lambda$ for some $\lambda > 0$ for all $n \in \mathbb{N}$. By our assumption on the subsample size, we
have $m/n \le c$ for some $0 < c < 1$, and thus $(n-m)/n = 1 - m/n \ge \lambda := 1 - c > 0$. This implies
$$P\left(\sqrt{\tfrac{mn}{n-m}}\,(\hat{\Psi}_m^* - \hat{\Psi}_n) \le x \,\Big|\, D_n\right) \xrightarrow{P} \Phi_{\sigma^2}(x),$$
$$\Big(\sqrt{n}(\hat{\Psi}_n - \Psi(P)),\; \sqrt{k_{(m,n)}}(\hat{\Psi}_{(m,1)}^* - \hat{\Psi}_n),\; \ldots,\; \sqrt{k_{(m,n)}}(\hat{\Psi}_{(m,B)}^* - \hat{\Psi}_n)\Big) \xrightarrow{d} (Z_0, Z_1, \ldots, Z_B), \quad (2)$$
where $Z_0, \ldots, Z_B$ are independent and identically distributed with $Z_b \overset{d}{=} Z$. By conditioning
on $D_n$ and using conditional independence of $\sqrt{k_{(m,n)}}(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n)$, $b = 1, \ldots, B$ (bootstrap
samples are drawn independently given the data), we have
$$\begin{aligned}
&\Big| P\Big(\sqrt{n}(\hat{\Psi}_n - \Psi(P)) \le z_0,\; \sqrt{k_{(m,n)}}(\hat{\Psi}_{(m,1)}^* - \hat{\Psi}_n) \le z_1,\; \ldots,\; \sqrt{k_{(m,n)}}(\hat{\Psi}_{(m,B)}^* - \hat{\Psi}_n) \le z_B\Big) - \prod_{b=0}^{B} \Phi_{\sigma^2}(z_b) \Big| \\
&\quad \le E\left[ I\Big(\sqrt{n}(\hat{\Psi}_n - \Psi(P)) \le z_0\Big) \left| \prod_{b=1}^{B} P\Big(\sqrt{k_{(m,n)}}(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n) \le z_b \,\Big|\, D_n\Big) - \prod_{b=1}^{B} \Phi_{\sigma^2}(z_b) \right| \right] \quad (3) \\
&\quad\quad + \prod_{b=1}^{B} \Phi_{\sigma^2}(z_b) \left| P\Big(\sqrt{n}(\hat{\Psi}_n - \Psi(P)) \le z_0\Big) - \Phi_{\sigma^2}(z_0) \right|, \quad (4)
\end{aligned}$$
for any $z = (z_0, \ldots, z_B) \in \mathbb{R}^{B+1}$. Since, by assumption, $\sqrt{n}(\hat{\Psi}_n - \Psi(P)) \xrightarrow{d} Z$, the term in
equation (4) converges to zero as $n \to \infty$. Since also $P\big(\sqrt{k_{(m,n)}}(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n) \le z \mid D_n\big) \xrightarrow{P}
\Phi_{\sigma^2}(z)$ for $b = 1, \ldots, B$ and all $z \in \mathbb{R}$ as $n \to \infty$, it follows that the integrand of the term (3)
converges to zero in probability as $n \to \infty$. Since the integrand in the term (3) is bounded
by 1, it follows from dominated convergence that (3) tends to zero. Thus (2) holds. From
this result, we deduce that
$$T_{(m,n)} = \frac{\hat{\Psi}_n - \Psi(P)}{\sqrt{\frac{m}{n-m}\, S^2}} = \frac{\sqrt{n}(\hat{\Psi}_n - \Psi(P))}{\sqrt{k_{(m,n)}\, S^2}} = \frac{\sqrt{n}(\hat{\Psi}_n - \Psi(P))/\sigma}{\sqrt{\frac{1}{B} \sum_{b=1}^{B} \frac{k_{(m,n)}}{\sigma^2} \big(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n\big)^2}} \;\xrightarrow{d}\; \frac{\tilde{Z}_0}{\sqrt{\frac{1}{B} \sum_{b=1}^{B} \tilde{Z}_b^2}},$$
where $\tilde{Z}_b = Z_b/\sigma \sim N(0,1)$. Note that by the independence of the $\tilde{Z}_b$'s and the fact that
$\tilde{Z}_b \overset{d}{=} N(0,1)$, the limit $\tilde{Z}_0 \big/ \sqrt{\tfrac{1}{B}\sum_{b=1}^{B} \tilde{Z}_b^2}$ has a t-distribution with $B$ degrees of freedom. This
shows that $T_{(m,n)}$ converges in distribution to a t-distribution with $B$ degrees of freedom
as $n \to \infty$. Thus, we have
$$P\big(\Psi(P) \in I_{(m,n,B)}\big) \to 1 - \alpha$$
as $n \to \infty$.
7 Appendix B: Proof of Theorem 2
Applying Hölder's inequality with $p = 4/i$ and $q = 4/(4-i)$, we have $|E[\hat{\Psi}_n^{4-i} (\hat{\Psi}_m^*)^i]| \le
(E[\hat{\Psi}_n^4])^{1/p} \big(E[(\hat{\Psi}_m^*)^4]\big)^{1/q} = (E[\hat{\Psi}_n^4])^{1/p} \big(E[\hat{\Psi}_m^4]\big)^{1/q}$, where the latter equality follows from the
fact that the subsample has marginally the same distribution as a full sample of $m$ obser-
vations. Moreover, since $E[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n] = \operatorname{argmin}_g E\big[\big((\hat{\Psi}_m^* - \hat{\Psi}_n)^2 - g(D_n)\big)^2\big]$, where the
minimum is taken over all $D_n$-measurable functions $g$, we have that
$$E\Big[\Big((\hat{\Psi}_m^* - \hat{\Psi}_n)^2 - E[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n]\Big)^2\Big] \le E\big[(\hat{\Psi}_m^* - \hat{\Psi}_n)^4\big] < \infty. \quad (5)$$
By using that conditionally on $D_n$, the random variables $\hat{\Psi}_{(m,1)}^*, \ldots, \hat{\Psi}_{(m,B)}^*$ are indepen-
dent, we have by Chebyshev's inequality for conditional expectations, for arbitrary $\varepsilon > 0$,
that
$$P\left(\left|\frac{1}{B}\sum_{b=1}^{B}\big(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n\big)^2 - E[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n]\right| \ge \varepsilon \,\middle|\, D_n\right) \le \frac{1}{B^2 \varepsilon^2} \sum_{b=1}^{B} \mathrm{Var}\big[(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n)^2 \mid D_n\big].$$
Taking the expectation on both sides of the previous display, we have
$$\begin{aligned}
P\left(\left|\frac{1}{B}\sum_{b=1}^{B}\big(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n\big)^2 - E[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n]\right| \ge \varepsilon\right)
&\le \frac{1}{B\varepsilon^2} E\big[\mathrm{Var}[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n]\big] \quad (6) \\
&= \frac{1}{B\varepsilon^2} E\Big[E\Big[\Big((\hat{\Psi}_m^* - \hat{\Psi}_n)^2 - E[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n]\Big)^2 \,\Big|\, D_n\Big]\Big] \\
&= \frac{1}{B\varepsilon^2} E\Big[\Big((\hat{\Psi}_m^* - \hat{\Psi}_n)^2 - E[(\hat{\Psi}_m^* - \hat{\Psi}_n)^2 \mid D_n]\Big)^2\Big] \quad (7) \\
&\le \frac{1}{B\varepsilon^2} E\big[(\hat{\Psi}_m^* - \hat{\Psi}_n)^4\big], \quad (8)
\end{aligned}$$
where (6) follows since $\big(\hat{\Psi}_{(m,b)}^* - \hat{\Psi}_n\big)^2$, $b = 1, \ldots, B$, are identically distributed and expec-
tations are linear, (7) follows from the tower property of conditional expectations, and (8)
follows by (5). Taking the limit as $B \to \infty$ concludes the proof.
where $\mathrm{expit}(x) = \frac{1}{1+\exp(-x)}$ is the logistic function, $N(\mu, \sigma^2)$ is the normal distribution with
mean $\mu$ and variance $\sigma^2$, $\mathrm{Bern}(p)$ is the Bernoulli distribution with success probability $p$, and $\emptyset$ is
the missingness indicator.