
An Improved Fast Double Bootstrap

by
Russell Davidson
Department of Economics and CIREQ, McGill University, Montreal, Quebec, Canada H3A 2T7
and Aix-Marseille Université, CNRS, EHESS, AMSE, 13205 Marseille cedex 01, France
email: [email protected]
and
Andrea Monticini
Catholic University
Via Necchi 5
20123 Milan, Italy
email: [email protected]

Abstract

The fast double bootstrap can improve considerably on the single bootstrap when the
bootstrapped statistic is approximately independent of the bootstrap DGP. This is
because, among the approximations that underlie the fast double bootstrap (FDB),
is the assumption of such independence. In this paper, use is made of a discrete
formulation of bootstrapping in order to develop a conditional version of the FDB,
which makes use of the joint distribution of a statistic and its bootstrap counterpart,
rather than the full joint distribution of the statistic and the bootstrap data-generating
process (DGP), which is available only by means of a simulation as costly as the full
double bootstrap. Simulation evidence shows that the conditional FDB can greatly
improve on the performance of the FDB when the statistic and the bootstrap DGP
are far from independent, while giving similar results in cases of near independence.

Keywords: Bootstrap inference, fast double bootstrap, discrete model, conditional fast double bootstrap
JEL codes: C12, C22, C32

This research was supported by the Canada Research Chair program (Chair in Economics,
McGill University) and by grants from the Fonds de Recherche du Québec - Société et Culture.
This work was also supported by the French National Research Agency Grant ANR-17-EURE-0020.

February, 2023
1. Introduction

The double bootstrap proposed by Beran (1988) can permit more reliable inference
than the ordinary or single bootstrap, by means of bootstrapping the single bootstrap
P value. If the statistic that is bootstrapped is pivotal for the null model under
test, then bootstrap inference is exact up to simulation randomness. Similarly, an
approximately pivotal statistic, for instance an asymptotically pivotal statistic, benefits
from bootstrap refinements when it is bootstrapped; see Hall (1992). Beran shows how
the bootstrap P value obtained by bootstrapping any statistic, approximately pivotal
or not, is closer to being pivotal than the original statistic, and thus benefits from
asymptotic refinements. Further iterations are expected in general to improve on the
double bootstrap, although their implementation is usually extremely computationally
costly. The idea is that, if the (finite-sample) distribution of a statistic is known, it can
yield exact inference, and iterated bootstraps provide better and better approximations
to the distribution of the P value of the previous iteration.
The fast double bootstrap (FDB) studied in detail by Davidson and MacKinnon (2007)
has been shown to improve the reliability of bootstrap inference in many practical situ-
ations; see among others Davidson and MacKinnon (2002), Lamarche (2004), Davidson
(2006), Omtzigt and Fachin (2006), Ahlgren and Antell (2008), and Ouysse (2013).
Like the double bootstrap, it makes use of an approximation to the distribution of
the single bootstrap P value, but without introducing an additional layer of boot-
strapping. Rather, at each step of the single layer of simulation, joint realisations are
obtained of the ordinary bootstrap statistic, which we call $\tau$, and of the bootstrap
statistic, denoted $\tau^1$, generated by the bootstrap data-generating process (DGP) that
is distributed jointly with $\tau$. Let the distribution function (CDF) of the statistic $\tau$
under some DGP $\mu$ be characterised by the function $R_0(\cdot, \mu)$, and let the CDF of the
second-layer statistic $\tau^1$ be $R^1(\cdot, \mu)$. Under the assumption that both statistics have
absolutely continuous distributions on their support, let the uniquely defined quantile
functions that are inverse to $R_0$ and $R^1$ be $Q_0(\cdot, \mu)$ and $Q^1(\cdot, \mu)$ respectively.
Let the bootstrap DGP that is generated jointly with $\tau$ from the data be denoted $\beta$.
The bootstrap P value can then be written as $p_1 = R_0(\tau, \beta)$, where, without loss of
generality and for notational convenience, the rejection region is presumed to be to the
left of the statistic. We write $R_1(\cdot, \mu)$ for the CDF of $p_1$ under DGP $\mu$. Beran's double
bootstrap approximates this by $R_1(\cdot, \beta)$ at the cost of a nested layer of simulation.
The FDB approximation of $R_1(\cdot, \mu)$ is $R^f_1(\cdot, \beta) \equiv R_0\big(Q^1(\cdot, \beta), \beta\big)$, so that a simulation
estimate of $R^f_1$ can be obtained using the simulated distributions of $\tau$ and $\tau^1$. However,
this approximation depends only on the marginal distributions. In this paper, we see
how to make use of the joint distribution of $\tau$ and $\tau^1$ to get a better approximation,
and hence more reliable inference.
The FDB approximation relies on two assumptions. The first is that the statistic τ and
the bootstrap DGP β are approximately (usually asymptotically) independent. This
assumption is correct in many, but by no means all, commonly occurring situations.
The other assumption, which we will not spell out in detail at this point, is reasonable
more generally, and can usually be checked by an asymptotic argument.

The paper is laid out as follows. In Section 2, the maximum-entropy (ME) bootstrap
of Vinod (2006) is discussed, and its failings noted. Analysis of these failings is un-
dertaken by use of a discrete formulation of bootstrapping introduced by Davidson
(2017b), which facilitates much analysis of the bootstrap by making it possible to
obtain exact expressions for quantities without any asymptotic approximation. Some
results are established concerning the joint distribution of the statistics $\tau$ and $\tau^1$. These
results are extended in Section 3, to show how the joint distribution of $\tau$ and $\beta$ affects
the bootstrap discrepancy, that is, the error in the rejection probability of a bootstrap
test. A link is then forged to a diagnostic procedure proposed by Davidson (2017a),
and it is shown that this procedure can detect possible over- or under-rejection by a
bootstrap test. Then, in Section 4, the fast double bootstrap (FDB) of Davidson and
MacKinnon (2007) is formulated in the discrete formulation used for the theoretical
parts of this paper, and an exact expression for the CDF of the FDB P value given.
However, the main contribution of this section is to modify the FDB by making explicit
use of the joint distribution of $\tau$ and $\tau^1$. This is an approximate version of a result in
Davidson and MacKinnon (1999) that depends on the joint distribution of $\tau$ and $\beta$.
Whereas that earlier result involves distributions that cannot be estimated without
great computational cost, the modified FDB, called CFDB for conditional FDB, in-
volves no more cost than the FDB itself. An explicit algorithm is given for the CFDB
that is not restricted to the discrete formulation. Simulation evidence is provided in
Section 5 to show that the CFDB gives results little different from those given by the
FDB when $\tau$ and $\tau^1$ are nearly independent, but that it greatly reduces the bootstrap
discrepancy when they are dependent, as is the case with the ME bootstrap, and rivals
the full double bootstrap in its reliability. Section 6 concludes, and discusses how the
work of this paper can be extended in various directions.

2. Joint Distribution of the Statistic and the Bootstrap Statistic

A technique for bootstrapping time series, called the maximum-entropy bootstrap,
was introduced by Vinod (2006) and further studied by Vinod and López-de Lacalle
(2009). However, Bergamelli, Novotný, and Urga (2015) analysed the performance
of the technique for a unit-root test and found that it was very poor indeed. They
were able to show that this was due to the near perfect rank correlation of the test
statistic (our τ ) and the bootstrap statistics (realisations of our τ 1 ) generated by the
ME bootstrap.
Davidson (2017a) proposed a diagnostic procedure for evaluating bootstrap perfor-
mance, based on a simulation experiment. This was used in a working paper, David-
son and Monticini (2014), to examine the ME bootstrap for testing hypotheses in
the context of a linear regression model with strongly serially correlated data. They
found that it was characterised by serious under-rejection for all conventional levels,
and over-rejection for levels greater than a half. Like Bergamelli et al., they also saw
that statistics $\tau$ realised with serially correlated data and their ME bootstrap counterparts $\tau^1$ were strongly positively correlated. In this section, it will be shown that these
phenomena are not at all unrelated, and that, quite generally, correlation between a
statistic and its bootstrap version aggravates the size distortion of a bootstrap test, in
a way qualitatively similar to what was seen with the ME bootstrap.

The diagnostic procedure works as follows: for each of $N$ replications, simulated data are drawn from the chosen DGP, and are then used to generate a realisation $\tau_i$, $i = 1, \ldots, N$, of a test statistic, and, at the same time, a realisation of the corresponding bootstrap DGP, $\beta_i$ say, which is then used to generate a single realisation $\tau^1_i$ of the bootstrap statistic.

The marginal distributions of the variables $\tau$ and $\tau^1$ of which the $\tau_i$ and $\tau^1_i$ are realisations should, if the bootstrap is to perform satisfactorily, be similar: Davidson (2017a) suggests comparing kernel-density plots of the simulated statistics to see whether this is the case. The diagnostic that more particularly concerns us here is given by an OLS regression of the $\tau^1_i$ on a constant and the $\tau_i$. A significant coefficient for the regressor $\tau$ indicates correlation in the joint distribution of $\tau$ and $\tau^1$, with the same sign as that of the estimated coefficient.
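To make the procedure concrete, the following minimal Python sketch (ours, purely illustrative, and not code from any of the papers cited) implements the diagnostic regression; the callables draw_data, compute_stat, and draw_bootstrap are hypothetical stand-ins for the chosen DGP, the test statistic, and a single draw from the bootstrap DGP.

    import numpy as np

    def bootstrap_diagnostic(draw_data, compute_stat, draw_bootstrap, N=10_000, seed=42):
        """Regress the tau1_i on a constant and the tau_i; return slope and t statistic."""
        rng = np.random.default_rng(seed)
        tau = np.empty(N)
        tau1 = np.empty(N)
        for i in range(N):
            y = draw_data(rng)                  # simulated data from the chosen DGP
            tau[i] = compute_stat(y)            # realisation tau_i
            y_star = draw_bootstrap(y, rng)     # one sample from the bootstrap DGP beta_i
            tau1[i] = compute_stat(y_star)      # realisation tau1_i
        X = np.column_stack([np.ones(N), tau])  # OLS of tau1 on a constant and tau
        coef, *_ = np.linalg.lstsq(X, tau1, rcond=None)
        resid = tau1 - X @ coef
        s2 = resid @ resid / (N - 2)
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
        return coef[1], coef[1] / se            # a significant slope signals dependence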

The discrete formulation

In order to analyse the consequences of a correlation of $\tau$ and $\tau^1$, it is convenient to use the discrete formulation of bootstrapping developed in Davidson (2017b). It is only
fair to warn readers that this formulation is unconventional, although it accurately
reflects very general bootstrap procedures. Its advantage for the current study is that
the discreteness makes it straightforward to derive tractable expressions for the joint
distribution of τ and τ 1 , without any need for limiting arguments or any asymptotic
reasoning. In particular, there is no need for regularity conditions in order to derive
results like the asymptotic validity of some bootstrap procedure. Indeed, the discrete
approach is not designed to treat anything asymptotic at all. However, it is worth
emphasising that it can encompass situations that in a more conventional approach
would be considered instances of bootstrap failure, as well as situations in which the
bootstrap would have desirable properties. It would be necessary to impose further
structure on the model described below before any judgements of bootstrap success or
failure could be made. Without additional structure, the model is limited to discus-
sion of purely formal properties of bootstrapping in general, although these include
bootstrap iteration, as shown in Davidson (2017b).

It may be thought that theoretical results derived from a discretised model are not
much related to the usual case in which both the statistic and the bootstrap discrep-
ancy vary over a continuous domain. In order to refute this idea, it is useful, first,
to recall that computers work with discrete quantities, and, second, that nothing in
our arguments prevents the discretisation from being arbitrarily fine, and so approxi-
mating the continuous setup arbitrarily well. The virtue of the discrete formulation is
that it enables analysis that is perfectly rigorous within its own limitations, and with
no appeal to asymptotic approximations. In fact, the sample size is a quantity that
appears nowhere in the basic discrete model.

It is assumed that the statistic $\tau$, in approximate P value form, can take on only the values $\pi_i$, $i = 0, 1, \ldots, n$, with
$$0 = \pi_0 < \pi_1 < \pi_2 < \cdots < \pi_{n-1} < \pi_n = 1.$$
For instance, if $n = 100$, with $\pi_i = i/100$, $i = 0, 1, \ldots, n$, P values would thereby


be limited to integer percentages. Further, we assume that there are only m possible
DGPs in the model that represents the null hypothesis. Thus the outcome space
on which the random variables τ , the test statistic, and the bootstrap DGP β, are
defined consists of just m(n + 1) points, labelled by two integer coordinates (i, j),
i = 0, 1, . . . , n, j = 1, . . . , m. Any scalar- or vector-valued function of the coordinates
(i, j) is a random variable defined on this outcome space.
The third component of a probability space is a probability measure. The probability
space we use here is completely characterised by the discrete set of probabilities $p_{kij}$, $k, j = 1, \ldots, m$, $i = 0, 1, \ldots, n$, where, under the DGP indexed by $k$,
$$p_{kij} = \Pr_k\big[\tau = \pi_i \;\text{and}\; \beta = j\big].$$

It follows for all $k = 1, \ldots, m$ that
$$\sum_{i=0}^{n}\sum_{j=1}^{m} p_{kij} = 1. \tag{1}$$

Make the following definitions for $k = 1, \ldots, m$ and $i = 0, \ldots, n$:
$$P_{ki} = \sum_{j=1}^{m} p_{kij}, \qquad a_{kij} = \sum_{l=0}^{i-1} p_{klj}, \qquad A_{ki} = \sum_{j=1}^{m} a_{kij}, \qquad b_{kj} = \sum_{i=0}^{n} p_{kij}. \tag{2}$$

Clearly, $P_{ki}$ is the probability under DGP $k$ of the event $\tau = \pi_i$; $a_{kij}$ the probability that simultaneously $\tau < \pi_i$ and $\beta = j$; $A_{ki}$ the probability that $\tau < \pi_i$; $b_{kj}$ the probability that $\beta = j$. It follows directly from (1) and (2) that, for $k = 1, \ldots, m$,
$$\sum_{j=1}^{m} b_{kj} = 1 \qquad\text{and}\qquad A_{ki} = \sum_{l=0}^{i-1} P_{kl}. \tag{3}$$

We begin by expressing the probability under DGP $k$ that $\tau = \pi_i$ and $\tau^1 = \pi_l$. With probability $p_{kij}$, $\tau = \pi_i$ and $\beta = j$. If $\beta = j$, the probability that $\tau^1 = \pi_l$ is $P_{jl}$. Thus
$$\Pr_k\big[(\tau = \pi_i) \wedge (\tau^1 = \pi_l)\big] = \sum_{j=1}^{m} p_{kij} P_{jl}. \tag{4}$$

In addition, we have for the marginal distribution of $\tau^1$ that
$$\Pr_k(\tau^1 = \pi_l) = \sum_{i=0}^{n}\sum_{j=1}^{m} p_{kij} P_{jl} = \sum_{j=1}^{m} b_{kj} P_{jl}. \tag{5}$$

The following theorem shows that independence of $\tau$ and $\tau^1$ follows from that of $\tau$ and $\beta$, and that the implication goes in the other direction as well under stronger regularity conditions.

Theorem 1
If, under DGP $k$, the statistic $\tau$ and the bootstrap DGP $\beta$ are independent, then $\tau$ and $\tau^1$ are also independent. The converse implication holds as well if the $m \times (n+1)$ matrix $P$ with element $jl$ given by $P_{jl}$ has full row rank.

Proof:
Independence of $\tau$ and $\beta$ under DGP $k$ means that $p_{kij} = b_{kj} P_{ki}$, and so the joint probability (4) becomes
$$\sum_{j=1}^{m} b_{kj} P_{ki} P_{jl} = P_{ki} \sum_{j=1}^{m} b_{kj} P_{jl},$$
and this is the product of the probability $P_{ki}$ that $\tau = \pi_i$, and the probability that $\tau^1 = \pi_l$, by (5). Thus independence of $\tau$ and $\beta$ implies that of $\tau$ and $\tau^1$.
Next, suppose that $\tau$ and $\tau^1$ are independent under $k$. This implies that, for all $i, l = 0, \ldots, n$,
$$\Pr_k(\tau^1 = \pi_l \mid \tau = \pi_i) = \Pr_k(\tau^1 = \pi_l) = \sum_{j=1}^{m} b_{kj} P_{jl}.$$
The conditional probability above is equal to $\sum_{j=1}^{m} p_{kij} P_{jl}/P_{ki}$, and so the condition is equivalent to
$$\sum_{j=1}^{m} P_{jl}\big(b_{kj} P_{ki} - p_{kij}\big) = 0 \quad\text{for all } i, l = 0, \ldots, n. \tag{6}$$
Now $P_{jl}$ is element $jl$ of the matrix $P$, which is supposed to have full row rank. Thus (6) implies that, for all $i = 0, \ldots, n$ and $j = 1, \ldots, m$, $p_{kij} = b_{kj} P_{ki}$. This expresses the independence of $\tau$ and $\beta$ under $k$.

Remarks:
The most interesting result of this theorem is the first. It is hard to derive any intuition
about the full-rank condition. In some sense, it implies that the different DGPs in the
model are sufficiently different. In addition, it is necessary for the full-rank condition
that m ≤ n + 1. In particular, if τ is an exact pivot for the model, then all the rows
of P are the same.
In Davidson and MacKinnon (1999), it is shown that, if τ and β are asymptotically
independent, the bootstrap benefits from an additional asymptotic refinement relative
to cases without such approximate independence. The result of the present theorem
requires exact independence, but applies exactly in finite samples.
The virtues and limitations of the discrete approach may be better appreciated by
comparing the proof above of the first part of the theorem with the one given in the
Appendix for the absolutely continuous case. This alternative proof illustrates how
the discrete and absolutely continuous cases are related.
Corollary 1
If the statistic $\tau$ is a pivot for the model, so is $\tau^1$; both $\tau$ and $\tau^1$ have the same distribution, and the two are independent.

Proof:
If $\tau$ is a pivot, then $P_{ki} = P_i$, independent of $k$. From (5), for $k = 1, \ldots, m$,
$$\Pr_k(\tau^1 = \pi_l) = \sum_{j=1}^{m} b_{kj} P_{jl} = P_l \sum_{j=1}^{m} b_{kj} = P_l,$$
where the last equality follows from (3). This implies that $\tau^1$ has the same distribution as $\tau$ for all $k$, and so is a pivot. The joint distribution (4) becomes
$$\Pr_k\big[(\tau = \pi_i) \wedge (\tau^1 = \pi_l)\big] = \sum_{j=1}^{m} p_{kij} P_{jl} = P_l \sum_{j=1}^{m} p_{kij} = P_l P_i,$$
which demonstrates the independence of $\tau$ and $\tau^1$.

3. The Bootstrap P Value and the Error in Rejection Probability

Consider next the bootstrap P value with joint realisation $(i, j)$. It is the probability mass under the bootstrap DGP $j$ of a value of $\tau$ less than $\pi_i$, that is, $A_{ji}$. Note that this makes no mention of the DGP that generated the realisation – in practice the true DGP is unknown. But we can nonetheless compute the distribution of the bootstrap P value under a given DGP $k$. Denote by $R_1(\cdot, k)$ the CDF of this distribution under $k$. Then
$$R_1(x, k) = \Pr_k(A_{ji} \le x) = \sum_{i=0}^{n}\sum_{j=1}^{m} p_{kij}\, \mathrm{I}(A_{ji} \le x), \tag{7}$$

where $\mathrm{I}(\cdot)$ is an indicator function. Let $q^1_j(x)$ be defined by
$$q^1_j(x) = \max_{i=0,\ldots,n+1} \big\{\, i : A_{ji} \le x \,\big\}. \tag{8}$$

Then, as shown explicitly in Davidson (2017b), (7) becomes
$$R_1(x, k) = \sum_{j=1}^{m} a_{k\,q^1_j(x)\,j}. \tag{9}$$

When $A_{ji}$ is the realised bootstrap P value, the bootstrap test rejects the null hypothesis at nominal level $\alpha$ if $A_{ji} < \alpha$. Under DGP $k$, the actual probability of rejection is $R_1(\alpha, k)$. For $\alpha = A_{ki}$ with arbitrary $i$, this becomes
$$R_1(A_{ki}, k) = \sum_{j=1}^{m} a_{k\,q^1_j(A_{ki})\,j} = \sum_{j=1}^{m} \sum_{l=0}^{q^1_j(A_{ki})-1} p_{klj}. \tag{10}$$

The bootstrap discrepancy at level $A_{ki}$ under $k$, that is, the difference between the actual rejection probability and the nominal level, is $R_1(A_{ki}, k) - A_{ki}$.
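In the discrete model these quantities can be evaluated exactly by direct summation; a short illustrative sketch (ours), reusing the arrays p and A from the snippet in Section 2:

    import numpy as np

    def bootstrap_pvalue_cdf(x, k, p, A):
        # (7): R_1(x, k) = sum over (i, j) of p[k, i, j] I(A[j, i] <= x),
        # where A[j, i] is the bootstrap P value for the joint realisation (i, j)
        return float(np.sum(p[k] * (A.T <= x)))

    def bootstrap_discrepancy(alpha, k, p, A):
        # actual rejection probability under DGP k minus the nominal level
        return bootstrap_pvalue_cdf(alpha, k, p, A) - alpha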

We have seen that, when $\tau$ and $\beta$ are independent, $p_{kij} = P_{ki} b_{kj}$. In general, without independence, define $\delta_{kij} > 0$ such that
$$p_{kij} = b_{kj} P_{ki} \delta_{kij}. \tag{11}$$
Then, since $\sum_{j=1}^{m} p_{kij} = P_{ki}$, we see by summing (11) over $j$ that
$$P_{ki} = P_{ki} \sum_{j=1}^{m} b_{kj} \delta_{kij},$$
whence we conclude that, for any $k = 1, \ldots, m$ and $i = 0, \ldots, n$,
$$\sum_{j=1}^{m} b_{kj} \delta_{kij} = 1. \tag{12}$$

In addition, (10) can be written as
$$R_1(A_{ki}, k) = \sum_{j=1}^{m} \sum_{l=0}^{q^1_j(A_{ki})-1} b_{kj} P_{kl} \delta_{klj}.$$
Thus the bootstrap discrepancy under $k$ at level $A_{ki}$ is
$$\sum_{j=1}^{m} \sum_{l=0}^{q^1_j(A_{ki})-1} b_{kj} P_{kl} \delta_{klj} \;-\; \sum_{j=1}^{m} \sum_{l=0}^{i-1} b_{kj} P_{kl}, \tag{13}$$

where we have used the relation $\sum_{j=1}^{m} b_{kj} = 1$ from (3).

If $\tau$ and $\beta$ are independent under $k$, $\delta_{kij} = 1$ for all relevant $i$ and $j$, and the discrepancy (13) is
$$\sum_{j=1}^{m} b_{kj} \bigg[ \sum_{l=0}^{q^1_j(A_{ki})-1} P_{kl} - \sum_{l=0}^{i-1} P_{kl} \bigg]. \tag{14}$$

The interpretation of (14) is interesting. Notice first that it follows at once from the definition (8) that $q^1_k(A_{ki}) = i$. Thus the term with $j = k$ in (14) vanishes. If for some $j$, $q^1_j(A_{ki}) < i$, the corresponding term in (14) is negative, and, if $q^1_j(A_{ki}) > i$, it is positive. In the former case, (8) implies that, for all $l > q^1_j(A_{ki})$, $A_{jl} > A_{ki}$, and so, in particular, $A_{ji} > A_{ki}$. This means that DGP $j$ assigns more probability mass to the event $\tau < \pi_i$ than does DGP $k$.

In many cases, there will be some positive and some negative terms in (14). Suppose then that, under DGP $k$, the realisation is $(i, j)$ with $q^1_j(A_{ki}) < i$. The bootstrap P value is $A_{ji}$. The “ideal” P value, that is, the true probability mass for the event $\tau < \pi_i$, is $A_{ki}$, which is less than $A_{ji}$. The bootstrap P value is greater than the ideal one, and so the realisation $(\pi_i, j)$ corresponds to under-rejection. Similarly, if $q^1_j(A_{ki}) > i$, this corresponds to over-rejection.

The presence of factors $\delta_{k\,q^1_j(A_{ki})\,j}$ different from 1 in (13) complicates the story, as we will see shortly. Notice here that the difference between the discrepancies (13) and (14) is
$$\sum_{j=1}^{m} \sum_{l=0}^{q^1_j(A_{ki})-1} b_{kj} P_{kl} \big(\delta_{klj} - 1\big). \tag{15}$$

The linear regression

In order to study the consequences of a correlation of $\tau$ and $\tau^1$, as revealed by the diagnostic regression of realisations of $\tau^1$ on joint realisations of $\tau$ and a constant, it is useful to express the covariance of $\tau$ and $\tau^1$ in terms of the notation we have been developing. First, consider the expectations of these two random variables. We see immediately that
$$\mathrm{E}_k(\tau) = \sum_{i=0}^{n} P_{ki}\, \pi_i.$$

From (5), we find that
$$\mathrm{E}_k(\tau^1) = \sum_{j=1}^{m} \sum_{l=0}^{n} b_{kj} P_{jl}\, \pi_l = \sum_{j=1}^{m} b_{kj}\, \mathrm{E}_j(\tau).$$

The expectation of the product $\tau \tau^1$ can be calculated by use of the joint distribution given by (4):
$$\mathrm{E}_k(\tau \tau^1) = \sum_{i=0}^{n} \sum_{l=0}^{n} \pi_i \pi_l \Pr_k\big[(\tau = \pi_i) \wedge (\tau^1 = \pi_l)\big] = \sum_{i=0}^{n} \sum_{l=0}^{n} \sum_{j=1}^{m} \pi_i \pi_l\, p_{kij} P_{jl} = \sum_{i=0}^{n} \sum_{j=1}^{m} \pi_i\, p_{kij}\, \mathrm{E}_j(\tau).$$
Then
$$\mathrm{cov}_k(\tau, \tau^1) = \mathrm{E}_k(\tau \tau^1) - \mathrm{E}_k(\tau)\,\mathrm{E}_k(\tau^1) = \sum_{i=0}^{n} \sum_{j=1}^{m} \pi_i\, p_{kij}\, \mathrm{E}_j(\tau) - \mathrm{E}_k(\tau) \sum_{j=1}^{m} b_{kj}\, \mathrm{E}_j(\tau) = \sum_{j=1}^{m} \mathrm{E}_j(\tau) \Big[ \sum_{i=0}^{n} \pi_i\, p_{kij} - b_{kj}\, \mathrm{E}_k(\tau) \Big].$$
We check quickly that independence implies a zero correlation, as shown in Theorem 1. Since with independence $p_{kij} = P_{ki} b_{kj}$,
$$\mathrm{cov}_k(\tau, \tau^1) = \sum_{j=1}^{m} \mathrm{E}_j(\tau)\, b_{kj} \Big[ \sum_{i=0}^{n} \pi_i P_{ki} - \mathrm{E}_k(\tau) \Big] = 0.$$

In general,
$$\mathrm{cov}_k(\tau, \tau^1) = \sum_{j=1}^{m} \sum_{i=0}^{n} \mathrm{E}_j(\tau)\, b_{kj}\, \pi_i P_{ki} \big(\delta_{kij} - 1\big). \tag{16}$$

Corollary 2
If the expectations $\mathrm{E}_k(\tau)$ are all equal to $\bar\tau$, independent of $k$, the covariance of $\tau$ and $\tau^1$ is zero.

Proof:
The result follows immediately from Corollary 1 if $\tau$ is a pivot. Under the weaker condition used here, from (16) we see that
$$\mathrm{cov}_k(\tau, \tau^1) = \bar\tau \sum_{i=0}^{n} \pi_i P_{ki} \sum_{j=1}^{m} b_{kj} \big(\delta_{kij} - 1\big) = 0,$$
where the last equality follows from (12).


Theorem 2
Let the expectations $\mathrm{E}_j(\tau)$, $j = 1, \ldots, m$, of the statistic under the DGPs of the model be bounded above by the positive number $M$. Then, if the covariance (16) is positive, the differences (15) between the bootstrap discrepancy and what it would be in the case of independence of $\tau$ and $\beta$ average to negative values in the lower part of the distribution, and to positive values in the upper part.

Proof:
Observe first that, by (12),
$$\sum_{i=0}^{n} \sum_{j=1}^{m} b_{kj} P_{ki} \big(\delta_{kij} - 1\big) = 0. \tag{17}$$
Thus some of the terms $\sum_{j=1}^{m} b_{kj} P_{ki} (\delta_{kij} - 1)$ in the sum over $i$ are positive and some negative. For given $A_{ki}$, expression (15) is the sum of some of these terms, with upper limits for the index $l$ less than $n$.
The covariance (16) is bounded above by $M$ times
$$\sum_{i=0}^{n} \sum_{j=1}^{m} b_{kj}\, \pi_i P_{ki} \big(\delta_{kij} - 1\big), \tag{18}$$
and so, if (16) is positive, so too is the sum (18). By comparing (17) and (18), it can be seen that the weights $\pi_i$, increasing in $i$, push the sum from being zero, as in (17), to being positive, as in (18). This implies that the terms $\sum_{j=1}^{m} b_{kj} P_{ki} (\delta_{kij} - 1)$ average to a negative value for small values of $i$ and to a positive value for large values of $i$. Thus the values of expression (15), for increasing values of the nominal level $A_{ki}$, progressively incorporate fewer negative terms and more positive terms. This implies the statement of the Theorem.

Remarks:
Our assumption that τ takes the form of an approximate P value guarantees that
M ≤ 1. The seemingly redundant requirement in the statement of the Theorem is
made in order to indicate that, if raw statistics are used, as would normally be the
case in practice, the result continues to hold under this mild condition that M < ∞.
Nothing in the model requires the bootstrap discrepancies to move smoothly as a
function of the nominal level. That is why the theorem speaks of averages of the
discrepancies rather than their precise values. Especially if the correlation of $\tau$ and $\tau^1$
is slight, particular values of the discrepancy may differ from the average effects.
The theorem tells us how the pattern of the bootstrap discrepancy differs from what
it would be if, other things being equal, τ and β were independent. This means that
non-independence may well reduce the magnitude of the discrepancy for particular
levels. If, however, the discrepancy with independence is small, the introduction of
dependence is bound to increase its magnitude for most levels.
In particular, the pattern of under-rejection by a test based on the ME bootstrap
for conventional levels, and over-rejection for greater levels, follows from the strong
positive correlation of $\tau$ and $\tau^1$.

4. The Fast Double Bootstrap

It was remarked in the introduction that the FDB relies only on the marginal distributions of $\tau$ and $\tau^1$. In this section, we will see that the FDB can be made more accurate by
use of their joint distribution. Of course, if the two are nearly independent, there is
little or no gain, but when there is substantial correlation, the improvement can be
considerable.
It may be useful here to present the algorithm that can be used to implement the
FDB, where it is assumed without loss of generality that the rejection region is to the
left of the statistic.

Algorithm for FDB

1. From the data set under analysis, compute the realisations of the statistic $\tau$ and the bootstrap DGP $\beta$.
2. Draw $B$ bootstrap samples and use them to compute $B$ independent realisations of the bootstrap statistic $\tau^*_j$, $j = 1, \ldots, B$, and of the bootstrap DGP $\beta^*_j$.
3. Compute $B$ second-level bootstrap statistics $\tau^{1*}_j$ by drawing from the simulated bootstrap DGPs $\beta^*_j$.
4. Compute the estimated first-level bootstrap P value $\hat p_1$ as the proportion of the $\tau^*_j$ smaller than $\tau$. This is our estimate of $R_0(\tau, \beta)$.
5. Compute an estimate of the $\hat p_1$-quantile of the $\tau^{1*}_j$; denote it by $\hat q^1$. This is our estimate of $Q^1\big(R_0(\tau, \beta), \beta\big)$.
6. Compute the estimated FDB P value $\hat p^f_2$ as the proportion of the $\tau^*_j$ smaller than $\hat q^1$. This is our estimate of the FDB P value $R_0\big(Q^1(\hat p_1, \beta), \beta\big)$.

Remark on the algorithm:

It is quite possible to use in the above algorithm a statistic that rejects to the right, like a $\chi^2$ or $F$ test, without first converting it to an approximate P value. The needed modifications are these: the statistic $\tau$, as also the $\tau^*_j$ and the $\tau^{1*}_j$, are all constructed to reject to the right; in step 4, $\hat p_1$ is the proportion of the $\tau^*_j$ greater than $\tau$; in step 5, it is the $(1 - \hat p_1)$-quantile that is needed; in step 6, $\hat p^f_2$ is the proportion of the $\tau^*_j$ greater than $\hat q^1$.
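Given the arrays of simulated statistics from steps 2 and 3, steps 4 to 6 reduce to a few lines; the sketch below (ours, illustrative, with rejection to the left as in the algorithm) uses numpy's default interpolated quantile, one of several possible conventions.

    import numpy as np

    def fdb_pvalue(tau, tau_star, tau1_star):
        """tau_star[j] = tau*_j and tau1_star[j] = tau1*_j, for j = 1, ..., B."""
        p1 = np.mean(tau_star < tau)        # step 4: estimate of R_0(tau, beta)
        q1 = np.quantile(tau1_star, p1)     # step 5: p1-quantile of the tau1*_j
        p2f = np.mean(tau_star < q1)        # step 6: estimated FDB P value
        return p1, p2f

For a statistic that rejects to the right, the modifications of the remark above apply: replace the two proportions by proportions of exceedances, and use the $(1 - \hat p_1)$-quantile in step 5.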
We can now express the FDB P value in the notation of our discrete model. Suppose that the DGP called $\mu$ in the introduction is the DGP $k$. The CDF of the single bootstrap P value is written as $R_1(x, k)$, and it can be approximated by $R^f_1(x, k) = R_0\big(Q^1(x, k), k\big)$, where $R_0(x, k)$ is the CDF under $k$ of $\tau$, and $Q^1(x, k)$ is the quantile function of $\tau^1$. For $0 \le x \le 1$, let
$$i(x) = \max\,\{\, i : \pi_i \le x \,\}.$$
Note that $i(\pi_l) = l$. For given $x$, the function returns the index of the greatest value of $\tau$ that is no greater than $x$; in particular, for $\pi_i \le x < \pi_{i+1}$, $i(x) = i$. Then, from the definitions (2), we can see that $R_0(x, k) = A_{k\,i(x)}$. Similarly, from (5) we have
$$R^1(x, k) = \sum_{j=1}^{m} b_{kj} A_{j\,i(x)} \equiv A^*_{k\,i(x)}, \tag{19}$$
thus defining the probabilities $A^*_{ki}$. Analogously to (8), define
$$q^*_k(x) = \max_{i=0,\ldots,n} \big\{\, i : A^*_{ki} \le x \,\big\}, \tag{20}$$
and observe that $q^*_k(A^*_{ki}) = i$. A possible definition of the quantile function $Q^1$, and the one we use here, is then $Q^1(x, k) = \pi_{q^*_k(x)}$. With these definitions, it follows that
$$R^f_1(x, k) = A_{k\,q^*_k(x)} = \sum_{j=1}^{m} a_{k\,q^*_k(x)\,j}. \tag{21}$$

This can be compared with the expression (9) for $R_1(x, k)$ itself. The nature of the approximation may be made still clearer by noting that, when $\tau$ and $\beta$ are independent, so that $p_{kij} = P_{ki} b_{kj}$, we have
$$R_1(x, k) = \sum_{j=1}^{m} A_{k\,q^1_j(x)}\, b_{kj} \qquad\text{and}\qquad R^f_1(x, k) = \sum_{j=1}^{m} A_{k\,q^*_k(x)}\, b_{kj}.$$

The double bootstrap P value, $p_2$, is $R_1(p_1, \beta)$. With absolutely continuous distributions, the random variable $R_1(p_1, \mu)$ has the U(0,1) distribution, and so for $p_2$ the unknown true DGP $\mu$ is replaced by its estimate, namely the bootstrap DGP $\beta$. In exactly the same way, the FDB P value can be written as $R^f_1(p_1, \beta)$, and, since in our current notation $p_1 = A_{ji}$, this is
$$p^f_2 = A_{j\,q^*_j(A_{ji})}. \tag{22}$$

Theorem 3
The CDF of the fast double bootstrap P value under DGP $k$ is given by
$$\Pr_k\big(p^f_2 \le x\big) = \sum_{j=1}^{m} a_{k\,y(j,x)\,j}, \tag{23}$$
where $y(j, x)$ is defined to be $q^1_j\big(A^*_{j,\,q^1_j(x)+1}\big)$.

Proof:
We have
$$\Pr_k\big(p^f_2 \le x\big) = \sum_{i=0}^{n} \sum_{j=1}^{m} p_{kij}\, \mathrm{I}\big(A_{j\,q^*_j(A_{ji})} \le x\big).$$
The condition in the indicator function is equivalent to the condition $q^*_j(A_{ji}) \le q^1_j(x)$. Suppose that $i$ is such that $A_{ji} \in [A^*_{jl}, A^*_{j,l+1})$ for some $l$. Then $q^*_j(A_{ji}) = l$. The condition is thus equivalent to $l \le q^1_j(x)$. This last inequality is satisfied if and only if $A_{ji} < A^*_{j,\,q^1_j(x)+1}$ (with strict inequality), and this is equivalent to the requirement that $i < q^1_j\big(A^*_{j,\,q^1_j(x)+1}\big)$, that is, $i < y(j, x)$, again with strict inequality. The result (23) follows at once.
Remark:
At this point, we have imposed no structure at all on the model defined by the three-
dimensional array of the quantities pkij . It is therefore unsurprising that the result (23)
by itself sheds no light on the good or bad performance of the FDB. The challenge
is to discover conditions that make sense in the context of an econometric model and
influence this performance.

Using the joint distribution


In Davidson and MacKinnon (1999) a finite-sample calculation leads to an expression
for the bootstrap discrepancy at a given nominal level α in terms of the expectation
of the α-quantile of the distribution of the statistic under the bootstrap DGP β,
conditional on the statistic τ . If τ and β are independent, the conditional expectation
is just the unconditional one, but otherwise the extent of dependence influences the
bootstrap discrepancy. This suggests that the performance of the FDB, relying as it
does on the assumption of near independence, can be enhanced in cases of dependence
by explicitly taking account of it.
The simulations needed to estimate the FDB P value do not yield realisations of the
quantile. Although those needed for the full double bootstrap do so, it appears that
any technique for simulating these realisations must be as computationally expensive as
the double bootstrap. However, the FDB does make explicit use, through its quantile
function, of the distribution of $\tau^1$, generated by the average of the bootstrap DGPs,
and this may serve as a proxy for the distributions of the statistics generated by the
different bootstrap DGPs severally.
It is therefore tempting to replace the unconditional quantile of $\tau^1$ in the definition of the FDB P value by the quantile conditional on the realised value of $\tau$. That realised value is generated by the true unknown DGP $\mu$, but we can use the bootstrap principle to replace $\mu$ by the bootstrap DGP $\beta$, and obtain an estimated conditional quantile from the joint distribution of the random variables $\tau$ and $\tau^1$ generated by $\beta$.
We have
$$\Pr_k\big(\tau^1 < \pi_i \mid \tau = \pi_l\big) = \sum_{i'=0}^{i-1} \Pr_k\big(\tau^1 = \pi_{i'} \mid \tau = \pi_l\big).$$

Then it is natural to make the following definition of the CDF of $\tau^1$ conditional on $\tau = \pi_l$:
$$R^1(x, k \mid l) = \sum_{i'=0}^{i(x)-1} \Pr_k\big(\tau^1 = \pi_{i'} \mid \tau = \pi_l\big) = \sum_{i'=0}^{i(x)-1} \sum_{j=1}^{m} p_{klj} P_{ji'} / P_{kl},$$
the last equality following from (4). Let $b_{kj|l} = p_{klj}/P_{kl}$, the probability under $k$ that $\beta = j$ conditional on $\tau = \pi_l$. Then
$$R^1(x, k \mid l) = \sum_{j=1}^{m} b_{kj|l} \sum_{i=0}^{i(x)-1} P_{ji} = \sum_{j=1}^{m} b_{kj|l}\, A_{j\,i(x)}.$$

By analogy with (19) and (20), we make the definitions
$$A^*_{ki|l} = \sum_{j=1}^{m} b_{kj|l}\, A_{ji} \qquad\text{and}\qquad q^*_{k|l}(x) = \max_{i=0,\ldots,n} \big\{\, i : A^*_{ki|l} \le x \,\big\}.$$

We are led to define the following conditional approximation:
$$R^{cf}_1(x, k \mid l) = R_0\big(Q^1(x, k \mid l), k\big), \tag{24}$$
where $Q^1(x, k \mid l) = \pi_{q^*_{k|l}(x)}$. Explicitly,
$$R^{cf}_1(x, k \mid l) = A_{k\,q^*_{k|l}(x)} = \sum_{j=1}^{m} a_{k\,q^*_{k|l}(x)\,j};$$
compare (21). We may now define the conditional FDB P value as
$$p^{cf}_2 = R^{cf}_1(A_{ji}, j \mid i) = A_{j\,q^*_{j|i}(A_{ji})}, \tag{25}$$
analogously to (22).
Corollary to Theorem 3:
The CDF of the conditional FDB P value under DGP $k$ is given by
$$\Pr_k\big(p^{cf}_2 \le x\big) = \sum_{j=1}^{m} \sum_{i < y_c(j,x\,|\,i)} p_{kij},$$
where $y_c(j, x \mid i)$ is $q^1_j\big(A^*_{j,\,q^1_j(x)+1\,|\,i}\big)$.

Proof:
Most of the algebra is identical to that used in the proof of Theorem 3. In the very last step of that proof, however, the sum over $i$ cannot be performed explicitly, since the upper limit $i_{\max}$ is determined implicitly by
$$i_{\max} = \max_{i=0,\ldots,n} \big\{\, i : i < y_c(j, x \mid i) \,\big\}.$$

The algorithm
Since it would be unusual to wish to use either the FDB or its conditional version with a discrete model, it is preferable here to give the algorithm for computing a conditional fast double bootstrap (CFDB) P value in a general way, allowing for both continuous and discrete models. In terms of the quantities computed from the data, the statistic $\tau$, which is represented in the discrete model by the index $i$, and the bootstrap DGP $\beta$, represented by $j$, the P value $p^{cf}_2$, as given by (25), can be written, using (24), as
$$p^{cf}_2 = R^{cf}_1\big(R_0(\tau, \beta), \beta \mid \tau\big) = R_0\big(Q^1(R_0(\tau, \beta), \beta \mid \tau), \beta\big). \tag{26}$$
In order to implement this formula, the unconditional CDF $R_0$ and the conditional CDF $R^1$ must be estimated for the bootstrap DGP, and the latter then inverted to obtain the conditional quantile function $Q^1$. The procedure is almost identical to that given in the Algorithm for FDB. The difference is that Steps 5 and 6 are to be replaced by the steps below:
5′. Each pair $(\tau^*_j, \tau^{1*}_j)$ is a drawing from the joint distribution under $\beta$ of $(\tau, \tau^1)$. Compute an estimate of the $\hat p_1$-quantile of the $\tau^{1*}_j$ conditional on $\tau^* = \tau$; denote it by $\hat q^1$. This is our estimate of $Q^1\big(R_0(\tau, \beta), \beta \mid \tau\big)$.
6′. Compute the estimated CFDB P value $\hat p^{cf}_2$ as the proportion of the $\tau^*_j$ smaller than $\hat q^1$. This is our estimate of the CFDB P value (26).

Remark on the algorithm:

For step 5′, there are various ways to estimate the conditional quantile. The most obvious way is first to estimate the conditional CDF, using a kernel estimate, with a suitable kernel function $K$, perhaps Gaussian or Epanechnikov, and a bandwidth $h$. The estimate of the CDF of $\tau^1$, evaluated at $t^1$, conditional on $\tau = t$, is
$$\hat F^1(t^1 \mid t) = \frac{\sum_{j=1}^{B} K\big((\tau^*_j - t)/h\big)\, \mathrm{I}\big(\tau^{1*}_j < t^1\big)}{\sum_{j=1}^{B} K\big((\tau^*_j - t)/h\big)}.$$

This is just the Nadaraya-Watson estimate for the non-parametric regression of the indicator $\mathrm{I}(\tau^{1*} < t^1)$ on $\tau^*$. It would no doubt be better to use a locally linear estimator instead. It has also been suggested that it might help to smooth the discontinuous indicator function, replacing it by a cumulative kernel, $L$ say, evaluated at $(t^1 - \tau^{1*}_j)/b$, where $b$ is a bandwidth. Some experimentation showed that this led to a deterioration in the accuracy of the estimate, and so we made no use of this idea.
Whatever choice is made for the estimation of the conditional CDF, the conditional $\alpha$-quantile can be estimated by solving the equation $\hat F^1(t^1 \mid t) = \alpha$ for $t^1$ given $t$. A root-finding algorithm, such as bisection or Brent's method, can be used for this purpose.
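A minimal sketch of step 5′ along these lines (ours; note that the experiments of Section 5 used a locally linear estimator rather than this simpler Nadaraya-Watson one, and the bandwidth h is left to the user):

    import numpy as np

    def cond_cdf(t1, t, tau_star, tau1_star, h):
        # Nadaraya-Watson estimate of Pr(tau1* < t1 | tau* = t), Gaussian kernel
        w = np.exp(-0.5 * ((tau_star - t) / h) ** 2)
        return np.sum(w * (tau1_star < t1)) / np.sum(w)

    def cond_quantile(alpha, t, tau_star, tau1_star, h, tol=1e-8):
        # invert the estimated conditional CDF by bisection
        lo, hi = tau1_star.min(), tau1_star.max()
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if cond_cdf(mid, t, tau_star, tau1_star, h) < alpha:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    def cfdb_pvalue(tau, tau_star, tau1_star, h):
        p1 = np.mean(tau_star < tau)                         # step 4, unchanged
        q1 = cond_quantile(p1, tau, tau_star, tau1_star, h)  # step 5'
        return np.mean(tau_star < q1)                        # step 6'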
Alternatively, the check-function approach can be used. It is well known that the $\alpha$-quantile of a distribution can be defined as the solution of the problem
$$\operatorname*{argmin}_{q}\; \mathrm{E}\big[\rho_\alpha(Y - q)\big],$$
where $Y$ is a random variable with the distribution of which the quantile is sought, and where the check function is defined as $\rho_\alpha(u) = u\big(\alpha - \mathrm{I}(u < 0)\big)$. For the conditional quantile we seek, we define another kernel estimator:
$$\hat S_\alpha(q \mid t) = \sum_{j=1}^{B} K\big((\tau^*_j - t)/h\big)\, \rho_\alpha\big(\tau^{1*}_j - q\big).$$

If this estimator is minimised with respect to $q$, the minimising $q$ estimates the $\alpha$-quantile of the $\tau^{1*}_j$ conditional on $\tau^* = t$. Again, a locally linear estimator may be preferred: it is given by minimising the function
$$\hat S_\alpha(q, \beta \mid t) = \sum_{j=1}^{B} K\big((\tau^*_j - t)/h\big)\, \rho_\alpha\big(\tau^{1*}_j - q - (\tau^*_j - t)\beta\big)$$
with respect to $q$ and $\beta$, the minimising $q$ being the estimate of the conditional quantile.
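An illustrative sketch (ours) of this locally linear check-function estimator, minimised with a derivative-free method since $\rho_\alpha$ has a kink at the origin; it assumes scipy is available.

    import numpy as np
    from scipy.optimize import minimize

    def rho(u, alpha):
        # check function rho_alpha(u) = u * (alpha - I(u < 0))
        return u * (alpha - (u < 0))

    def ll_cond_quantile(alpha, t, tau_star, tau1_star, h):
        # minimise S_alpha(q, beta | t) over (q, beta); the minimising q
        # estimates the alpha-quantile of the tau1* conditional on tau* = t
        w = np.exp(-0.5 * ((tau_star - t) / h) ** 2)   # Gaussian kernel weights

        def objective(theta):
            q, slope = theta
            return np.sum(w * rho(tau1_star - q - (tau_star - t) * slope, alpha))

        start = np.array([np.quantile(tau1_star, alpha), 0.0])
        return minimize(objective, start, method='Nelder-Mead').x[0]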
Recently, Racine and Li (2017) have reviewed a number of methods for estimating
conditional quantiles, and have proposed a new approach, in which the quantile is
estimated directly, rather than by inverting an estimated conditional CDF. More ex-
perience will be needed to determine which of the various methods is best adapted to
the current problem.

5. Simulation Evidence

In seeking evidence for the performance of the CFDB by means of simulations, the first
experiments were designed just to see if the CFDB was no worse than the ordinary
FDB when there is approximate independence of τ and β. For this purpose, we used
a setup considered in Davidson and MacKinnon (2007) as an illustration of the FDB.
Consider the linear regression model
$$y_t = X_t \beta + u_t, \qquad u_t = \sigma_t \varepsilon_t, \qquad t = 1, \ldots, n,$$
$$\sigma_t^2 = \sigma^2 + \gamma u_{t-1}^2 + \delta \sigma_{t-1}^2, \qquad \varepsilon_t \sim \mathrm{IID}(0, 1).$$

The disturbances of this model follow the GARCH(1,1) process introduced by Bollerslev (1986). The hypothesis that the $u_t$ are IID in this model is tested by running the regression
$$\hat u_t^2 = b_0 + b_1 \hat u_{t-1}^2 + \text{residual},$$
where $\hat u_t$ is the $t$-th residual from an OLS regression of $y_t$ on $X_t$. The null hypothesis that $\gamma = \delta = 0$ can be tested by testing the hypothesis that $b_1 = 0$. Besides the ordinary $t$ statistic for $b_1$, a commonly used statistic is $n$ times the centred $R^2$ of this regression, which has a limiting $\chi^2_1$ distribution under the null hypothesis.
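For concreteness, the $nR^2$ statistic for this testing regression can be computed as in the following sketch (ours, not the authors' code; the statistic rejects to the right):

    import numpy as np

    def garch_test_stat(y, X):
        # OLS residuals from the regression of y on X
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        u2 = (y - X @ beta_hat) ** 2
        # testing regression: u2_t on a constant and u2_{t-1}
        Z = np.column_stack([np.ones(len(u2) - 1), u2[:-1]])
        v = u2[1:]
        b, *_ = np.linalg.lstsq(Z, v, rcond=None)
        rss = np.sum((v - Z @ b) ** 2)
        tss = np.sum((v - v.mean()) ** 2)
        # n (here the number of usable observations) times the centred R^2
        return len(v) * (1.0 - rss / tss)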
The experimental design is copied from Davidson and MacKinnon (2007). Since in
general one is unwilling to make any restrictive assumptions about the distribution
of the $\varepsilon_t$, a resampling bootstrap seems the best choice. In all cases, $X_t$ consists of
a constant and two independent, standard normal random variates. In order to have
a non-negligible bootstrap discrepancy, the $\varepsilon_t$ are drawn from the $\chi^2_2$ distribution,
subsequently centred and rescaled to have variance 1. For the same reason, the sample
size of 40 is rather small. Without loss of generality, we set $\beta = 0$ and $\sigma^2 = 1$, since the test statistic is invariant to changes in the values of these parameters. The invariance means that we can use a straightforward resampling bootstrap DGP, with the $y^*_t$ in a bootstrap sample IID drawings from the empirical distribution of the $y_t$. For the iterated bootstrap, $y^{**}_t$ is resampled from the $y^*_t$.
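One replication of the FDB version of this design might then be sketched as follows (illustrative only; garch_test_stat is the hypothetical function from the previous snippet, the regressors are assumed held fixed under the resampling, and the right-tail modifications noted in Section 4 apply, since the statistic rejects to the right):

    import numpy as np

    def one_replication(n=40, B=9_999, seed=0):
        rng = np.random.default_rng(seed)
        X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
        eps = rng.chisquare(2, n)
        y = (eps - 2.0) / 2.0                   # centred, rescaled to variance 1; beta = 0
        tau = garch_test_stat(y, X)
        tau_star = np.empty(B)
        tau1_star = np.empty(B)
        for j in range(B):
            y_star = rng.choice(y, size=n, replace=True)        # resampling bootstrap
            tau_star[j] = garch_test_stat(y_star, X)
            y_2star = rng.choice(y_star, size=n, replace=True)  # iterated resampling
            tau1_star[j] = garch_test_stat(y_2star, X)
        p1 = np.mean(tau_star > tau)            # right-tail rejection
        q1 = np.quantile(tau1_star, 1.0 - p1)
        return p1, np.mean(tau_star > q1)       # single-bootstrap and FDB P values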
The experiment consisted of 10,000 replications, each with 9,999 bootstrap repetitions.
The distributions of the P values of the single bootstrap, the FDB, and the CFDB
were estimated; the results are displayed in Figure 1 as P value discrepancy plots.
Despite the number of replications, there remains non-negligible simulation random-
ness. However, it can be seen that the discrepancies of the FDB and CFDB are of the
same order of magnitude, with the FDB actually performing better than the CFDB
for nominal levels of practical interest. The overall discrepancy of the single bootstrap
is, as expected, somewhat greater.
[Figure 1 here. P value discrepancy plots; legend: single bootstrap, FDB, CFDB; vertical axis: discrepancy, from −0.06 to 0.01; horizontal axis: nominal level α, from 0 to 1.]
Figure 1: P value discrepancy plots for the bootstrap, FDB, and CFDB

A much more interesting experiment was then undertaken, with the ME bootstrap, for which $\tau$ and $\beta$ are strongly correlated, so that the discrepancies are huge. The model considered was
$$y_t = X_{1t} \beta_1 + x_{2t} \beta_2 + u_t. \tag{27}$$

The disturbances follow an AR(1) process with autoregressive parameter $\rho = 0.9$ and Gaussian innovations. The regressors in the matrix $X_1$ comprise a constant and two other regressors, serially correlated with autoregressive parameter 0.8, as is the last regressor $x_2$. Without loss of generality, all the slope coefficients, $\beta_1$ and $\beta_2$, are set to zero. The null hypothesis that $\beta_2 = 0$ is tested using a statistic that is asymptotically $\chi^2$ with one degree of freedom, computed with a Newey-West HAC covariance matrix estimator; see Newey and
West (1987). The sample size is 64, and the lag truncation parameter for the HAC
estimator is 16.

As with the GARCH experiment, the conditional quantile was estimated by first es-
timating the conditional CDF using a locally linear estimator, and then inverting it
using the bisection method. A preliminary experiment was undertaken with a bivari-
ate normal distribution, for which the true value of the conditional quantile was known
analytically. It was found that smoothing the indicator was counter-productive, and
led to significantly greater bias than with the indicator itself, and so in the experiment
with the model (27) the indicator was not smoothed.

The details of the ME algorithm used can be found in Davidson and Monticini (2014) and in Bergamelli et al. (2015). In Figure 2, P value discrepancy plots are shown for
the test of β2 = 0 by the single ME bootstrap and its FDB and CFDB versions. We
used 10,000 replications each with 999 bootstrap repetitions. It can be seen that the
FDB is at best marginally better than the single ME bootstrap, but that the CFDB
gives rise to a very considerable improvement, without being really reliable.

A very much costlier experiment was undertaken for which, in addition to the boot-
strap versions examined in Figure 2, the discrepancy for the full double bootstrap
was estimated. Again, 10,000 replications were performed, with 999 bootstrap rep-
etitions at the first level, and 399 at the second level. In order for the experiment
to be completed in a reasonable timeframe, 50 concurrent processes were used, with
different random number generators in each. The mean running time of the processes
was around 14 hours, but with a fairly wide variance, so that results were available
after around 16 hours. As an indication of the relative computing costs, each process
for the experiment for Figure 2 took on average only a little more than 5 minutes.

The results of this experiment, shown in Figure 3, reproduce those in Figure 2, with the addition of results for the full double bootstrap. It emerges clearly that the full double bootstrap performs only slightly better than the CFDB.

[Figure 2 here. P value discrepancy plots; legend: ME single bootstrap, ME FDB, ME CFDB; vertical axis: discrepancy, from −0.2 to 0.2; horizontal axis: nominal level α, from 0 to 1.]

Figure 2: Results for the single ME bootstrap and its FDB and CFDB versions

[Figure 3 here. As Figure 2, with a fourth curve; legend: ME single bootstrap, ME FDB, ME CFDB, ME double bootstrap.]

Figure 3: As in Figure 2, with the addition of the full double bootstrap

6. Conclusion
Although the focus throughout this paper has been on bootstrap hypothesis tests, it is
entirely possible to use the techniques of both the FDB and the CFDB for the con-
struction of confidence intervals. It was pointed out by Chang and Hall (2015) that,
although the FDB benefits from asymptotic refinements for testing, it does not do so
for confidence intervals. This seeming defect can be overcome at a certain computa-
tional cost, by a procedure similar to Hansen’s (1999) grid bootstrap, or the one used
in Davidson and MacKinnon (2010).
Davidson and Trokić (2020) propose fast versions of higher-order iterated bootstraps, beginning with the fast triple bootstrap, which they show can in some cases reduce discrepancies relative to the FDB. It would be good to develop conditional
fast iterated bootstraps. The use of the discrete formulation will almost certainly be a
great help in this task, as it should be for many studies of the finite-sample properties
of the bootstrap.
In Davidson (2017b), a discrete model is given for which the actual numerical values
of the probabilities pkij are available. It will be extremely interesting to use these
numbers as a test bed for the new conditional procedure, and for the extensions that
will be explored in future work.

Appendix

Theorem 1 in the continuous case

In keeping with the spirit of the paper, the regularity conditions used for this theorem
will be kept to the bare minimum.
A.1 For convenience, we suppose that the test statistic τ is in approximate P value
form, and can take on any real value in the interval [0,1], on which is defined the Borel
σ-algebra B[0, 1], and the usual Lebesgue measure L[0, 1].
A.2 The model M is a set consisting of all the DGPs that respect the null hypothesis.
A σ-algebra B(M) is defined on M, along with a probability measure m : B(M) →
[0, 1].
A.3 The product σ-algebra B[0, 1] × B(M) and the product measure L[0, 1] × m are
well defined.
A.4 For each DGP µ ∈ M, a probability measure Pµ is defined on the product
σ-algebra and is absolutely continuous with respect to L[0, 1] × m.
A.5 For each $\mu \in M$, the measure $P_\mu$ has a density $f(\mu, t, b)$, such that, for any subset $C \in \mathcal{B}[0,1] \times \mathcal{B}(M)$,
$$P_\mu(C) = \int_0^1\! \int_M \mathrm{I}\big((t, b) \in C\big)\, f(\mu, t, b)\, m(db)\, dt.$$

Here, I is the indicator function.

For any given $\mu$, define the marginal densities
$$g_\mu(t) = \int_M f(\mu, t, b)\, m(db) \qquad\text{and}\qquad h_\mu(b) = \int_0^1 f(\mu, t, b)\, dt,$$

and note that independence of $\tau$ and $\beta$ implies that $f(\mu, t, b) = g_\mu(t)\, h_\mu(b)$. Then
$$P_\mu\big[(\tau \le x) \wedge (\tau^1 \le x^1)\big] = \int_0^x\! \int_M \int_0^{x^1} f(\mu, t, b)\, g_b(t^1)\, dt^1\, m(db)\, dt. \tag{28}$$

From this, by setting first $x^1 = 1$, and then $x = 1$, we derive easily that
$$P_\mu(\tau \le x) = \int_0^x g_\mu(t)\, dt \qquad\text{and}\qquad P_\mu(\tau^1 \le x^1) = \int_M h_\mu(b) \int_0^{x^1} g_b(t^1)\, dt^1\, m(db). \tag{29}$$

Now, if the joint density factorises in the case of independence, the probability in (28) becomes
$$\int_0^x g_\mu(t)\, dt \int_M h_\mu(b) \int_0^{x^1} g_b(t^1)\, dt^1\, m(db),$$
which is just the product of the two marginal probabilities in (29).

References

Ahlgren, N. and J. Antell (2008). “Bootstrap and Fast Double Bootstrap Tests of
Cointegration Rank with Financial Time Series”, Computational Statistics & Data
Analysis, 52, 4754–4767.

Beran, R. (1988). “Prepivoting test statistics: a bootstrap view of asymptotic refinements”, Journal of the American Statistical Association 83, 687–697.

Bergamelli, M., J. Novotný, and G. Urga (2015). “Maximum Non-Extensive Entropy Block Bootstrap for Non-stationary Processes”, L'Actualité Economique, 91(1-2), 115–139.

Bollerslev, T. (1986). “Generalized autoregressive conditional heteroskedasticity”, Journal of Econometrics 31, 307–327.

Chang, J. and P. Hall (2015). “Double-bootstrap methods that use a single double-bootstrap simulation”, Biometrika, doi:10.1093/biomet/asu060.

Davidson, J. (2006). “Alternative bootstrap procedures for testing cointegration in fractionally integrated processes”, Journal of Econometrics, 133, 741–777.

Davidson, R. (2017a). “Diagnostics for the Bootstrap and Fast Double Bootstrap”,
Econometric Reviews, 36, 1021–1038, doi:10.1080/07474938.2017.1307918

Davidson, R. (2017b). “A Discrete Model for Bootstrap Iteration”, Journal of Econometrics 201, 228–236, doi:10.1016/j.jeconom.2017.08.005.

Davidson, R. and J. G. MacKinnon (1999). “The Size Distortion of Bootstrap Tests”, Econometric Theory, 15, 361–376.

Davidson, R. and J. G. MacKinnon (2002). “Fast double bootstrap tests of nonnested linear regression models”, Econometric Reviews, 21, 417–427.

Davidson, R. and J. G. MacKinnon (2007). “Improving the Reliability of Bootstrap Tests with the Fast Double Bootstrap”, Computational Statistics & Data Analysis, 51, 3259–3281.

Davidson, R. and J. G. MacKinnon (2010). “Wild bootstrap tests for IV regression”, Journal of Business and Economic Statistics, 28, 128–144.

Davidson, R. and A. Monticini (2014). “Heteroskedasticity-and-autocorrelation-consistent bootstrapping”, Technical report, Università Cattolica del Sacro Cuore, Dipartimenti e Istituti di Scienze Economiche (DISCE).

Davidson, R. and M. Trokić (2020). “The Fast Iterated Bootstrap”, Journal of Econo-
metrics, https://doi.org/10.1016/j.jeconom.2020.04.025

Hall, P. (1992). The Bootstrap and Edgeworth Expansion, Springer-Verlag, New York.

Hansen, B. E. (1999). “The grid bootstrap and the autoregressive model,” Review of
Economics and Statistics, 81, 594–607.

Lamarche, J.-F. (2004). “The numerical performance of fast bootstrap procedures”, Computational Economics, 23, 379–389.

Newey, W. K. and K. D. West (1987). “A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix”, Econometrica 55, 703–708.

Omtzigt, P. and S. Fachin (2006). “The size and power of bootstrap and Bartlett-
corrected tests of hypotheses on the cointegrating vectors”, Econometric Reviews
25, 41–60.

Ouysse, R. (2013). “A Fast Iterated Bootstrap Procedure for Approximating the Small-
Sample Bias”, Communications in Statistics – Simulation and Computation, 42,
doi:10.1080/03610918.2012.667473

Racine, J. S. and K. Li (2017). “Nonparametric conditional quantile estimation: A locally weighted quantile kernel approach”, Journal of Econometrics 201, 72–94. http://dx.doi.org/10.1016/j.jeconom.2017.06.020

Vinod, H. D. (2006). “Maximum entropy ensembles for time series inference in eco-
nomics”, Journal of Asian Economics 17, 955–978.

Vinod, H. D. and J. López-de Lacalle (2009). “Maximum Entropy Bootstrap for Time
Series: the meboot R Package”, Journal of Statistical Software 29, 1–19.
