Model-Assisted Bayesian Designs For Dose Finding and Optimization Methods and Applications
Bayesian adaptive designs provide a critical approach to improve the efficiency and success of
drug development that has been embraced by the US Food and Drug Administration (FDA).
This is particularly important for early phase trials as they form the basis for the development
and success of subsequent phase II and III trials.
The objective of this book is to describe the state-of-the-art model-assisted designs to facilitate
and accelerate the use of novel adaptive designs for early phase clinical trials. Model-assisted
designs possess avant-garde features where superiority meets simplicity. Model-assisted
designs enjoy exceptional performance comparable to more complicated model-based adap-
tive designs, yet their decision rules often can be pre-tabulated and included in the protocol—
making implementation as simple as conventional algorithm-based designs. An example is
the Bayesian optimal interval (BOIN) design, the first dose-finding design to receive the fit-
for-purpose designation from the FDA. This designation underscores the regulatory agency’s
support of the use of the novel adaptive design to improve drug development.
Features
Series Editors
Shein-Chung Chow, Duke University School of Medicine, USA
Byron Jones, Novartis Pharma AG, Switzerland
Jen-pei Liu, National Taiwan University, Taiwan
Karl E. Peace, Georgia Southern University, USA
Bruce W. Turnbull, Cornell University, USA
Ying Yuan
The University of Texas MD Anderson Cancer Center, USA
Ruitao Lin
The University of Texas MD Anderson Cancer Center, USA
J. Jack Lee
The University of Texas MD Anderson Cancer Center, USA
First edition published 2023
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
Reasonable efforts have been made to publish reliable data and information, but the author and pub-
lisher cannot assume responsibility for the validity of all materials or the consequences of their use.
The authors and publishers have attempted to trace the copyright holders of all material reproduced in
this publication and apologize to copyright holders if permission to publish in this form has not been
obtained. If any copyright material has not been acknowledged please write and let us know so we may
rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or here-
after invented, including photocopying, microfilming, and recording, or in any information storage or
retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com
or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-
750-8400. For works that are not available on CCC please contact [email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are
used only for identification and explanation without intent to infringe.
DOI: 10.1201/9780429052781
Typeset in CMR10
by KnowledgeWorks Global Ltd.
Publisher’s note: This book has been prepared from camera-ready copy provided by the authors.
To my wife, Suyu, and daughter, Selina.
Ying Yuan
Ruitao Lin
J. Jack Lee
Contents
Preface xi
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Traditional 3+3 design . . . . . . . . . . . . . . . . . . . . . 16
2.3 Cohort expansion . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4 Accelerated titration design . . . . . . . . . . . . . . . . . . . 20
2.5 Continual reassessment method . . . . . . . . . . . . . . . . 21
2.6 Bayesian model averaging CRM . . . . . . . . . . . . . . . . 25
2.7 Escalation with overdose control . . . . . . . . . . . . . . . . 28
2.8 Bayesian logistic regression method . . . . . . . . . . . . . . 30
2.9 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2 Modified toxicity probability interval design . . . . . . . . . 33
3.3 Keyboard design . . . . . . . . . . . . . . . . . . . . . . . . 35
3.4 Bayesian optimal interval (BOIN) design . . . . . . . . . . . 38
3.4.1 Trial design . . . . . . . . . . . . . . . . . . . . . . . . 38
3.4.2 Theoretical derivation . . . . . . . . . . . . . . . . . . 42
3.4.3 Specification of design parameters . . . . . . . . . . . 46
3.4.4 Statistical properties . . . . . . . . . . . . . . . . . . . 47
3.4.5 Frequently asked questions . . . . . . . . . . . . . . . 49
3.5 Operating characteristics . . . . . . . . . . . . . . . . . . . . 52
3.6 Software and case study . . . . . . . . . . . . . . . . . . . . . 57
4 Drug-Combination Trials 69
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.2 Model-based designs . . . . . . . . . . . . . . . . . . . . . . . 71
4.3 Model-assisted designs . . . . . . . . . . . . . . . . . . . . . . 74
4.3.1 BOIN combination design . . . . . . . . . . . . . . . . 74
4.3.2 Keyboard combination design . . . . . . . . . . . . . . 77
4.3.3 Waterfall design . . . . . . . . . . . . . . . . . . . . . 77
4.4 Operating characteristics . . . . . . . . . . . . . . . . . . . . 82
4.5 Software and case study . . . . . . . . . . . . . . . . . . . . . 85
5 Late-Onset Toxicity 93
Bibliography 205
Index 217
Preface
Ying Yuan
Ruitao Lin
J. Jack Lee
Author Biographies
\[
f(\theta \mid D) = \frac{f(\theta)\, L(D \mid \theta)}{f(D)},
\]
DOI: 10.1201/9780429052781-1 1
This posterior is then used to guide the treatment decision for cohort 3. By
continuing this process, we can learn about θ successively and refine treatment
decisions for patients as the trial progresses.
Third, Bayesian methods conform to the likelihood principle, which states
that all information for making inference on the parameters is contained in
the observed data only and not in the unobserved data. Fourth, Bayesian
Bayesian Statistics and Adaptive Designs 3
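\begin{align*}
f(\theta \mid D) &\propto f(\theta)\, L(D \mid \theta) \\
&\propto \theta^{\alpha-1}(1-\theta)^{\beta-1}\, \theta^{y}(1-\theta)^{n-y} \\
&\propto \theta^{\alpha+y-1}(1-\theta)^{\beta+n-y-1}. \tag{1.2}
\end{align*}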
Noting that (1.2) is the kernel of the beta distribution, the posterior distribu-
tion of θ follows a beta distribution
θ | D ∼ Beta(α + y, β + n − y).
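The conjugate update above can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: the function names and the example data (a Beta(1, 1) prior with 3 responses among 10 patients) are assumptions chosen for illustration.

```python
def beta_binomial_update(alpha, beta, y, n):
    """Return the Beta posterior parameters after y responses in n patients.

    Implements theta | D ~ Beta(alpha + y, beta + n - y).
    """
    if not 0 <= y <= n:
        raise ValueError("y must satisfy 0 <= y <= n")
    return alpha + y, beta + n - y


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical example: Beta(1, 1) prior, 3 responses among 10 patients.
a_post, b_post = beta_binomial_update(1.0, 1.0, y=3, n=10)
print(f"Posterior: Beta({a_post}, {b_post}), mean = {beta_mean(a_post, b_post):.3f}")
```

Because the update only adds counts to the prior parameters, the same function can be called cohort by cohort, feeding each posterior back in as the next prior.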
[Figure 1.1 plots omitted; three panels, each with vertical axis: Density.]
FIGURE 1.1: Prior, likelihood (unnormalized), and posterior densities for the
probability of response θ, based on the Beta-binomial model with different
beta priors.
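\begin{align*}
f(\mu \mid D) &\propto f(\mu \mid \theta_0, \tau^2)\, L(D \mid \mu) \\
&\propto \exp\left\{-\frac{1}{2\tau^2}(\mu-\theta_0)^2\right\}
         \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2\right\} \\
&\propto \exp\left\{-\frac{\left(\mu - \dfrac{\sigma^2\theta_0 + \tau^2\sum_{i=1}^{n} y_i}{\sigma^2 + n\tau^2}\right)^2}
                          {\dfrac{2\tau^2\sigma^2}{\sigma^2 + n\tau^2}}\right\} \\
&= N\left(\mu \,\middle|\, \frac{\theta_0/\tau^2 + \sum_{i=1}^{n} y_i/\sigma^2}{1/\tau^2 + n/\sigma^2},\;
          \frac{1}{1/\tau^2 + n/\sigma^2}\right).
\end{align*}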
As can be seen, the posterior mean is a weighted average of the prior mean
and the observed sample mean, with weights proportional to their respective
precisions (inverse variances), 1/τ² for the prior mean and n/σ² for the sample mean.
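As a small numerical check of the normal posterior above, the following sketch computes the posterior mean and variance and confirms that the mean is a precision-weighted average. The function name and the example numbers are assumptions made for illustration only.

```python
def normal_posterior(theta0, tau2, sigma2, ys):
    """Posterior of mu for y_i ~ N(mu, sigma2) with prior mu ~ N(theta0, tau2).

    Returns (posterior mean, posterior variance).
    """
    n = len(ys)
    prior_precision = 1.0 / tau2      # weight attached to the prior mean
    data_precision = n / sigma2       # weight attached to the sample mean
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (theta0 / tau2 + sum(ys) / sigma2)
    return post_mean, post_var


# Hypothetical example: prior N(0, 1), known sigma2 = 1, four observations.
ys = [2.0, 2.0, 2.0, 2.0]
mean, var = normal_posterior(theta0=0.0, tau2=1.0, sigma2=1.0, ys=ys)

# Same mean written as a weighted average of the prior mean and sample mean.
w_prior = (1.0 / 1.0) / (1.0 / 1.0 + len(ys) / 1.0)
weighted = w_prior * 0.0 + (1.0 - w_prior) * (sum(ys) / len(ys))
print(mean, var, weighted)
```

With a prior variance equal to the sampling variance, the prior acts like one extra observation, so four data points receive 80% of the weight.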
In Example 1.1 or 1.2, the posterior distribution is in the same distribu-
tional family as the prior distribution, greatly facilitating the posterior com-
putation. This type of prior is called a conjugate prior. For many statistical
models, however, the conjugate prior does not exist. In these cases, the pos-
terior distribution does not have a form of standard distributions, and the
When a non-informative prior or the flat prior f (θ) ∝ 1 is used, the posterior
mode estimator is often equal or similar to the frequentist maximum likelihood
estimator.
Because Bayesian statistics treat the unknown parameter θ as a random
variable, it is straightforward to make interval inference about θ based on its
posterior distribution. Analogous to the frequentist confidence interval, the
100(1 − α)% Bayesian credible interval Cα is given by
\[
\int_{C_\alpha} f(\theta \mid D)\, d\theta = 1 - \alpha,
\]
Alternatively, the highest posterior density (HPD) interval gives the short-
est length among all credible intervals with a given α. A 100(1 − α)% HPD
interval for θ is given by
Cα = {θ : f (θ | D) ≥ πα },
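The two kinds of interval can be compared numerically. The sketch below approximates an equal-tailed credible interval and an HPD interval for a Beta posterior on a fine grid. It is an illustration under stated assumptions: the function names are hypothetical, the grid approach is a simple stand-in for exact quantile routines, and it assumes a unimodal posterior (both Beta parameters greater than 1) so that the HPD region is a single interval.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))


def grid_intervals(a, b, level=0.95, m=20000):
    """Approximate equal-tailed and HPD intervals for Beta(a, b) on m grid cells."""
    h = 1.0 / m
    xs = [(i + 0.5) * h for i in range(m)]          # cell midpoints
    mass = [beta_pdf(x, a, b) * h for x in xs]
    total = sum(mass)
    mass = [w / total for w in mass]                # renormalize on the grid

    # Equal-tailed interval: cut (1 - level) / 2 from each tail.
    tail = (1.0 - level) / 2.0
    cum, lo, hi = 0.0, None, None
    for x, w in zip(xs, mass):
        cum += w
        if lo is None and cum >= tail:
            lo = x
        if hi is None and cum >= 1.0 - tail:
            hi = x
            break

    # HPD interval: keep the highest-density cells until they hold `level` mass.
    order = sorted(range(m), key=lambda i: mass[i], reverse=True)
    acc, kept = 0.0, []
    for i in order:
        kept.append(i)
        acc += mass[i]
        if acc >= level:
            break
    return (lo, hi), (xs[min(kept)], xs[max(kept)])


# Hypothetical example: a Beta(4, 8) posterior.
equal_tailed, hpd = grid_intervals(4.0, 8.0)
print("equal-tailed:", equal_tailed, "HPD:", hpd)
```

For a skewed posterior such as this one, the HPD interval is shifted toward the mode and is no wider than the equal-tailed interval.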
Prediction
Another strength of Bayesian statistics is its ability to make predictions on
future observations. Assuming that the future observations D̃ and the cur-
rent data D are conditionally independent given θ, the posterior predictive
distribution for D̃ is given by
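\begin{align*}
f(\tilde{D} \mid D) &= \int f(\tilde{D}, \theta \mid D)\, d\theta \\
&= \int f(\tilde{D} \mid \theta, D)\, f(\theta \mid D)\, d\theta \\
&= \int L(\tilde{D} \mid \theta)\, f(\theta \mid D)\, d\theta.
\end{align*}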
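For the Beta-binomial model, the integral above has a closed form: with a Beta(a, b) posterior, the number of responses among m future patients follows a beta-binomial distribution. The sketch below evaluates that predictive probability; the function names and the example posterior are assumptions for illustration.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def predictive_pmf(k, m, a, b):
    """Pr(k responses among m future patients | D) when theta | D ~ Beta(a, b)."""
    return math.comb(m, k) * math.exp(log_beta(a + k, b + m - k) - log_beta(a, b))


# Hypothetical example: Beta(4, 8) posterior, 5 future patients.
pmf = [predictive_pmf(k, 5, 4.0, 8.0) for k in range(6)]
for k, p in enumerate(pmf):
    print(f"Pr(y_new = {k}) = {p:.4f}")
```

For a single future patient, the predictive probability of a response reduces to the posterior mean a / (a + b).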
for k = 0, 1. The Bayes factor is defined as the ratio of the marginal likelihoods
under $H_1$ versus $H_0$,
\[
BF_{10} = \frac{\Pr(D \mid H_1)}{\Pr(D \mid H_0)}.
\]
Letting $\Pr(H_k)$ denote the prior probability of $H_k$ being true, the posterior
probability of $H_k$ being true is given by
\begin{align*}
\Pr(H_k \mid D) &= \frac{\Pr(D \mid H_k)\Pr(H_k)}{\Pr(D)} \\
&= \frac{\Pr(D \mid H_k)\Pr(H_k)}{\Pr(D \mid H_0)\Pr(H_0) + \Pr(D \mid H_1)\Pr(H_1)}.
\end{align*}
\[
\frac{\Pr(H_1 \mid D)}{\Pr(H_0 \mid D)} = BF_{10}\, \frac{\Pr(H_1)}{\Pr(H_0)}.
\]
[Figure omitted. Panels (a) and (b): number of protocols by year, Bayesian versus non-Bayesian. Panels (c) and (d): number of protocols by phase. Axes: Year or Phase of development (horizontal), Number of protocols (vertical).]
2.1 Introduction
Conventionally, the objective of a phase I clinical trial is to identify the highest
dose of a new drug that is acceptably safe. This dose is called the maximum
tolerated dose (MTD), typically defined as the dose with the probability of
causing a dose-limiting toxicity (DLT) closest to a prespecified target rate,
for example 20% or 30%. The adverse events (AEs) that define a DLT are
prespecified by the investigators, and often scored using the Common Ter-
minology Criteria for Adverse Events (CTCAE) from the National Cancer
Institute (NCI). The CTCAE defines the severity of the AE using a 5-grade
scale based on the general guideline: grade 1 is mild; grade 2 is moderate;
grade 3 is severe or medically significant but not immediately life-threatening;
grade 4 is life-threatening; and grade 5 is death related to the AE. Typically, the
DLT is defined as an AE of grade 3 or higher.
An implicit assumption for finding the MTD is that efficacy and toxicity
of the investigational drug both increase with the dose, and thus the MTD
is presumed to be the most efficacious dose with an acceptable probability of
causing a DLT. This dose–toxicity–efficacy monotonicity assumption is rea-
sonable for most conventional cytotoxic agents, such as chemotherapies. For
many novel molecularly targeted or immunotherapy agents, efficacy may not
monotonically increase with the dose, although toxicity generally increases
with the dose. In these cases, a more appropriate objective for dose-finding
trials is to find the optimal biological dose (OBD), which is generally defined
as the dose that has the highest desirability in terms of the efficacy-toxicity
tradeoff. For example, suppose that 20 mg of a new drug produces an efficacy
probability of 0.4 and a toxicity probability of 0.3, whereas 10 mg produces an
efficacy probability of 0.39 and a toxicity probability of 0.1. Then 10 mg is more
desirable because it yields comparable efficacy with much lower toxicity.
Chapter 8 describes designs for finding the OBD.
Due to logistical reasons (e.g., preparation and manufacture of the drug),
the set of doses to be explored in a phase I trial is often prespecified by
investigators. The lowest dose typically is specified as one-tenth of the dose
that killed 10% of rodents during pre-clinical studies, i.e., one-tenth of the LD10
DOI: 10.1201/9780429052781-2 13
in rodents, after adjusting for differences in body surface area. The other doses
may be specified to follow a modified Fibonacci sequence so that successive
increments are, say 100%, 67%, 50%, 40%, and 33% thereafter. Alternatively,
successive increments may be a fixed percentage, e.g., 33%, or quantity, e.g.,
50 mg.
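The modified Fibonacci spacing described above can be sketched as follows. The function name, the 10 mg starting dose, and the number of doses are hypothetical; only the increment pattern (100%, 67%, 50%, 40%, then 33% thereafter) comes from the text.

```python
def modified_fibonacci_doses(start_dose, n_doses,
                             increments=(1.00, 0.67, 0.50, 0.40), tail=0.33):
    """Build a dose ladder whose successive relative increments shrink.

    The first steps use `increments` in order; every later step uses `tail`.
    """
    doses = [start_dose]
    for step in range(n_doses - 1):
        inc = increments[step] if step < len(increments) else tail
        doses.append(doses[-1] * (1.0 + inc))
    return doses


# Hypothetical example: a 10 mg starting dose and six dose levels.
for level, dose in enumerate(modified_fibonacci_doses(10.0, 6), start=1):
    print(f"dose level {level}: {dose:.1f} mg")
```

In practice the resulting values would usually be rounded to doses that can actually be manufactured and administered.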
Two central statistical issues in dose finding are
1. Dose exploration (i.e., dose escalation/de-escalation), that is, how
to explore the set of doses during the trial?
2. Dose selection, i.e., how to determine the MTD upon completion of
the trial?
One unique challenge associated with these issues is that we should consider
not only statistical efficiency, but also patient ethics. On one hand, dose ex-
ploration should proceed quickly through doses that are well below the MTD,
since these doses presumably are subtherapeutic. On the other hand, dose ex-
ploration in the phase I trial should also proceed cautiously to avoid exposing
an excessive number of patients to doses that are above the MTD.
Dose selection is critical because the dose selected as the MTD upon com-
pletion of the phase I trial will likely be used in subsequent phase II and III
trials. If the dose selected as the MTD is well below the true MTD, then a
subsequent trial is likely to fail, thereby wasting enormous resources and possibly
overlooking a promising new drug. If the dose selected as the MTD is
well above the true MTD, then a subsequent trial will expose patients to an
unsafe dose and likely be terminated early due to an excessive rate of DLTs.
Phase I trial designs can be generally classified as algorithm-based designs,
model-based designs, and model-assisted designs (Yuan et al., 2019). This
taxonomy is based on how the designs are implemented rather than on
statistical theory, to help practitioners understand and
apply the designs.
Algorithm-based designs use a set of simple, pre-specified rules (or al-
gorithms) to determine dose escalation and de-escalation, without assuming
any model on the dose–toxicity relationship. Examples include the 3+3 de-
sign (Storer, 1989), the accelerated titration design (Simon et al., 1997), the
rolling-six design (Skolnik et al., 2008), the A+B design (Lin and Shih, 2001),
the biased-coin design (Durham et al., 1997) and their variations (Ivanova
et al., 2003; Stylianou and Follmann, 2004). The implementation of the de-
signs does not require a computer program or much support from statisticians.
Despite widespread criticism of the 3+3 design for poor operating character-
istics, its simplicity continues to make it one of the most widely used phase I
trial designs in practice.
In contrast to the algorithm-based designs, model-based designs utilize
prespecified parametric dose–toxicity models (e.g., the power model or logis-
tic model) to guide dose escalation and de-escalation. As information accrues
during the trial, the dose–toxicity relationship is re-evaluated by updating the
Algorithm-Based and Model-Based Dose Finding Designs 15
estimates of the model parameters and then used to guide the dose allocation
for subsequent patients. A typical example of the model-based design
is the continual reassessment method (CRM) (O’Quigley et al., 1990). Although
a model-based design, such as the CRM, yields better performance
than an algorithm-based design (Le Tourneau et al., 2009; Jaki et al., 2013;
van Brummelen et al., 2016), it is considered by many to be statistically and
computationally complex due to the requirement of specifying the model and
prior, as well as repeated model fitting and estimation. This leads practition-
ers to perceive dose allocations as coming from a “black box.” As a result, the
use of the model-based designs has been fairly limited in practice (Rogatko
et al., 2007).
Emerging in the last decade, model-assisted designs combine the simplicity
of algorithm-based designs and the good performance of model-based designs.
This class of designs utilizes a probability model to derive the design, similar
to the model-based designs, but their rules of dose escalation and de-escalation
can be pre-tabulated before the onset of the trial in a fashion similar to the
algorithm-based designs. Examples of model-assisted designs include the mod-
ified toxicity probability interval (mTPI) design (Ji et al., 2010), Bayesian
optimal interval (BOIN) design (Liu and Yuan, 2015; Yuan et al., 2016a),
and keyboard design (Yan et al., 2017). Due to their competitive performance
and simplicity, model-assisted designs have been increasingly used in practice
(Yuan et al., 2019).
Statistically, these three classes of designs are more or less intertwined.
For example, algorithm-based designs involve, explicitly or implicitly, a certain
probability model (e.g., the binomial or Bernoulli model). Model-based designs
often also use some algorithms/rules to guide dose escalation (e.g., no skipping
of untried doses). Model-assisted designs, from a certain perspective, might
be regarded as a hybrid of model-based and algorithm-based designs as their
decision rules are model-based but can be enumerated and implemented in a
way similar to algorithm-based designs.
In this chapter, we briefly overview some algorithm-based designs and
model-based designs to lay down the foundation for the model-assisted designs,
the focus of this book and subsequent chapters. We assume that the DLT is
scored as a binary variable (i.e., DLT/no DLT) and is quickly ascertainable
such that when a new patient is enrolled and ready for dose assignment,
the DLT outcomes have been ascertained for all patients already enrolled. In
Chapters 5 and 7, we will discuss how to design phase I trials when the DLT
cannot be ascertained quickly (i.e., late-onset toxicity) and how to account for
toxicity grades scored on a scale of more than two levels.
Before describing the designs, we establish some notation. Let d1 < · · · <
dJ denote the J prespecified doses of the new drug that is under investigation
in the trial. We use π(dj ), or shorthand πj when no confusion is caused, to
denote the DLT probability that corresponds to dj , and φ to denote the target
DLT probability for the MTD. We use nj to denote the number of patients who
have been assigned to dj , and yj to denote the number of DLTs observed at
dj , j = 1, . . . , J. Therefore, at a particular point during the trial, the observed
data are D = {Dj , j = 1, . . . , J}, where Dj = (nj , yj ) are the “local” data
observed at dose level j.
[FIGURE 2.1 flowcharts omitted. Node labels recoverable from both panels: "Enroll 3 patients at current dose"; outcomes "0 DLT", "1 DLT", and "2 or 3 DLTs"; "Escalate to the next higher dose"; "Enroll 3 more patients at the same dose"; "De-escalate to the next lower dose"; "Have 6 patients been treated at previous dose?" (with a "NO" branch); and "Stop, MTD = previous dose". Panel (b) has an additional node "Stop, MTD = current dose".]
(a) A 3+3 design targeting the MTD with the DLT rate ≤ 1/6
(b) A 3+3 design targeting the MTD with the DLT rate ≤ 2/6
Many numerical studies show that the 3+3 design has poor operating
characteristics, e.g., poor accuracy to identify the MTD (Ahn, 1998; Iasonos
et al., 2008; Onar-Thomas and Xiong, 2010; Zhou et al., 2018b). This can
be explained by Table 2.1, which applies the design shown in Figure 2.1(a) and
displays each of the possible outcomes and the corresponding posterior summaries,
including the 95% credible interval of the DLT probability and the
posterior probability that the DLT probability is greater than 25%. The pos-
terior summaries are derived by assuming independent beta-binomial models
for πj at each dj , i.e.,
yj | nj , πj ∼ Binomial(nj , πj )
πj ∼ Beta(0.5, 0.5),
The 3+3 design does not target a specific DLT probability, but rather a range of
DLT probabilities from approximately 1/6 to 1/3, with a midpoint of 0.25.
Thus, Prob(πj ≤ 0.25 | yj , nj ) is used to summarize the posterior evidence for
whether dj is tolerable.
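Posterior summaries of this kind can be reproduced with a short script. Under the Beta(0.5, 0.5) prior, the posterior at a dose with y DLTs among n patients is Beta(0.5 + y, 0.5 + n − y). The sketch below uses only the Python standard library: the Beta distribution function is computed with a standard continued-fraction expansion and quantiles are found by bisection. The function names and the list of outcomes printed are assumptions for illustration, and the script reports Pr(π > 0.25 | data), whose complement is Pr(π ≤ 0.25 | data).

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Distribution function of Beta(a, b) at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def beta_quantile(p, a, b, tol=1e-10):
    """Quantile of Beta(a, b) by bisection on the distribution function."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def posterior_summary(y, n, a0=0.5, b0=0.5, cutoff=0.25):
    """95% equal-tailed credible interval and Pr(pi > cutoff | y, n)."""
    a, b = a0 + y, b0 + n - y
    return (beta_quantile(0.025, a, b), beta_quantile(0.975, a, b),
            1.0 - beta_cdf(cutoff, a, b))


# A few outcomes that can arise at a dose in a 3+3 design (illustrative list).
for y, n in [(0, 3), (1, 3), (0, 6), (1, 6), (2, 6)]:
    lo, hi, p = posterior_summary(y, n)
    print(f"{y}/{n}: 95% CrI = ({lo:.3f}, {hi:.3f}), Pr(pi > 0.25 | data) = {p:.3f}")
```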
As shown in Table 2.1, the widths of the 95% credible intervals, which
are mostly > 0.5, indicate that three or six patients are far from adequate for
precisely estimating the DLT probability at a particular dose. For example, for
a dose with 1/6 DLT, the 95% credible interval for the true DLT probability
is (0.019, 0.558), which is too wide to make meaningful assessment of safety.
In addition, despite that doses corresponding to the outcome 2/6 still have
TABLE 2.1: What can be learned from the traditional 3+3 design?
This would give a total of 1/6 + 3/3 = 4/9 (44%) toxicities at the MTD. The
initial 3/3 toxicities in the expansion cohort suggest that the selected MTD
is not safe. Should one treat seven more patients at the estimated MTD, as
mandated by the protocol, or violate the protocol by abandoning the MTD
and de-escalating to a lower dose? If one de-escalates, what sort of rule or
algorithm should be applied to choose a dose, or doses, for the remaining seven
patients? If one continues to treat patients at the selected MTD, and ends up
with 7/10 toxicities, for a total of 1/6 + 7/10 = 8/16 (50%), what should one
conclude? The point is that the idea of treating a fixed expansion cohort at a
chosen MTD may seem sensible, but in practice can be problematic.
In recent years, the sizes of expansion cohorts following phase I trials have
exploded, from 6 or 10 to hundreds in some protocols (Bugano et al., 2017).
What nominally is a large “phase I expansion cohort” actually is a phase II
trial, but conducted without any design, other than a specification of sample
size. This practice magnifies all of the problems described above that occur
with a small expansion cohort. It fails to use the new data in the expansion
cohort adaptively to change the MTD if appropriate, and thus fails to protect
patient safety adequately. Furthermore, expansion cohorts are often used to
provide an initial exploration of treatment efficacy in specific subgroups. How-
ever, without a statistical design, there is no provision to discontinue trials
when experimental treatments are ineffective. As a general rule of thumb, if
the size of an expansion cohort reaches that of a phase II study, e.g., 30,
a statistical design is required. More discussion of scientific and ethical pitfalls
of cohort expansion can be found in Yan, Thall and Yuan (2017).
where α is the unknown parameter, and q1 < · · · < qJ are prior estimates of
the DLT probabilities, called the skeleton, at each of the dose levels, respectively.
Other models, such as the single-parameter logistic and hyperbolic tangent
models, have also been proposed for the CRM; see Cheung (2011, Section
3.2.2). Research shows that the choice of the model has little impact on the
performance of the design. What is more important is the configuration and
calibration of the model, e.g., the skeleton and priors.
Because the sample size of phase I trials is typically small, the model
used in the CRM is often simple and parsimonious, containing one or two
parameters (Iasonos et al., 2016). For illustration, under a target DLT rate
of 0.25, Figure 2.2 depicts the family of dose–toxicity curves that the power
model covers with the skeleton (q1 , · · · , q6 ) = (0.01, 0.04, 0.12, 0.25, 0.40, 0.54)
for a trial with six doses. This skeleton is computed using the model calibration
approach of Lee and Cheung (2009) with the half-width of the indifference
intervals being 0.07 and the prior guess of the MTD being the fourth dose.
Let $D = \{(n_j, y_j), j = 1, \ldots, J\}$ denote the accrued data with $y_j$ DLTs
in $n_j$ patients at dose level $j$ after $n$ patients have been treated in the trial,
$n = \sum_{j=1}^{J} n_j$. The likelihood function under (2.1) is
\[
L(D \mid \alpha) = \prod_{j=1}^{J} \left[ q_j^{\exp\{\alpha\}} \right]^{y_j}
                   \left[ 1 - q_j^{\exp\{\alpha\}} \right]^{n_j - y_j}.
\]
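Let $f(\alpha)$ denote the prior distribution for $\alpha$, which often is taken as $N(0, 2)$.
Applying Bayes’ theorem, the posterior distribution for $\alpha$ is
\[
f(\alpha \mid D) = \frac{L(D \mid \alpha)\, f(\alpha)}{\int L(D \mid \alpha)\, f(\alpha)\, d\alpha}.
\]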
[Plot for Figure 2.2 omitted; horizontal axis: Dose; vertical axis: DLT Probability.]
FIGURE 2.2: Prior support of the power model with the skeleton of (0.01,
0.04, 0.12, 0.25, 0.40, 0.54) and a normal N (0, 2) prior distribution assigned
to the unknown parameter α.
The posterior mean estimate of the actual DLT probability at dose level $j$ is
\[
\hat{\pi}_j = \int q_j^{\exp\{\alpha\}} f(\alpha \mid D)\, d\alpha, \quad \text{for } j = 1, \ldots, J. \tag{2.2}
\]
1. Patients in the first cohort are treated at the lowest dose $d_1$ or a
prespecified starting dose.
2. Based on the accumulated data, we obtain the posterior DLT probability
estimates $\hat{\pi}_j$, and find the dose level $j^*$ that has a DLT probability
closest to the target $\phi$. Let $j$ denote the current dose level,
• if $j^* < j$, de-escalate the dose level to $j - 1$;
• if $j^* > j$, escalate the dose level to $j + 1$;
• otherwise, stay at the same level $j$ for the next cohort of patients.
3. The trial continues until the planned maximum sample size $N$ is
exhausted, after which the MTD is selected as the dose with $\hat{\pi}_j$
closest to the target $\phi$, i.e.,
\[
\mathrm{MTD} = \arg\min_{j \in \{1, \ldots, J\}} |\hat{\pi}_j - \phi|.
\]
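The dose-assignment steps above can be sketched numerically. The code below is a minimal illustration, not the book's software: it assumes the power model implied by the likelihood above (DLT probability q_j raised to exp(α)), reads N(0, 2) as a normal prior with variance 2, and approximates the posterior integrals on an evenly spaced grid. The function names, the grid settings, and the example data are all hypothetical, and dose levels are indexed from 0.

```python
import math


def crm_posterior_means(skeleton, n, y, prior_var=2.0, grid_size=2001, width=6.0):
    """Posterior mean DLT probability at each dose under the power model."""
    half = width * math.sqrt(prior_var)
    step = 2.0 * half / (grid_size - 1)
    alphas = [-half + i * step for i in range(grid_size)]
    log_w = []
    for a in alphas:
        e = math.exp(a)
        lw = -0.5 * a * a / prior_var               # log prior kernel
        for q, nj, yj in zip(skeleton, n, y):
            p = min(max(q ** e, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            lw += yj * math.log(p) + (nj - yj) * math.log(1.0 - p)
        log_w.append(lw)
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]          # unnormalized posterior weights
    total = sum(w)
    return [sum(wi * q ** math.exp(a) for wi, a in zip(w, alphas)) / total
            for q in skeleton]


def crm_next_dose(skeleton, n, y, current, target):
    """Move at most one level toward the dose whose estimate is closest to target."""
    est = crm_posterior_means(skeleton, n, y)
    best = min(range(len(est)), key=lambda j: abs(est[j] - target))
    if best < current:
        return current - 1, est
    if best > current:
        return current + 1, est
    return current, est


# Hypothetical interim data: three patients at each of the first three doses.
skeleton = (0.01, 0.04, 0.12, 0.25, 0.40, 0.54)
next_dose, est = crm_next_dose(skeleton, n=[3, 3, 3, 0, 0, 0],
                               y=[0, 0, 0, 0, 0, 0], current=2, target=0.25)
print("estimates:", [round(p, 3) for p in est], "next dose index:", next_dose)
```

Because the grid is evenly spaced, the grid width cancels in the ratio of sums, so no explicit quadrature weights are needed.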
Although not included in the original CRM, an early stopping rule is often
imposed in practice to guard against the situation where the lowest dose is
excessively toxic. Specifically, if
perform poorly if it is based on a skeleton such that the true DLT probabil-
ities cannot be approximately recovered by the parsimonious model defined
in (2.1). Unfortunately, because the actual DLT probabilities are unknown,
practitioners often lack adequate information to determine whether a skeleton
is reasonable or not.
To simplify the specification of the skeleton, Lee and Cheung (2009) de-
veloped an automatic approach for selecting a skeleton when reliable prior
information is lacking. Their method is based on the “indifference interval,”
which is the range of DLT probabilities that the CRM cannot distinguish from
the target φ in large samples. The indifference-interval method determines a
skeleton based on φ (the target toxicity rate), j † (the prior guess for which
dose level is the MTD), J (the number of doses under consideration in the
trial), and δ (the desired half-width of the indifference interval). For the power
model CRM, the recommended skeleton is
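\[
\begin{aligned}
q_j &= \phi, && j = j^{\dagger}, \\
q_j &= \exp\left\{ \frac{\log(\phi - \delta)\, \log(q_{j+1})}{\log(\phi + \delta)} \right\}, && j = j^{\dagger} - 1, \ldots, 1, \\
q_j &= \exp\left\{ \frac{\log(\phi + \delta)\, \log(q_{j-1})}{\log(\phi - \delta)} \right\}, && j = j^{\dagger} + 1, \ldots, J.
\end{aligned}
\tag{2.4}
\]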
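The recursion in (2.4) is easy to evaluate directly. The sketch below implements it and, as a check, reproduces the skeleton quoted earlier for Figure 2.2 (target 0.25, half-width 0.07, prior MTD guess at the fourth of six doses). The function name and argument names are assumptions for illustration.

```python
import math


def indifference_skeleton(target, delta, mtd_guess, n_doses):
    """Skeleton from the indifference-interval recursion for the power model.

    `mtd_guess` is the 1-based prior guess of the MTD level; `delta` is the
    half-width of the indifference interval.
    """
    q = [0.0] * n_doses
    idx = mtd_guess - 1
    q[idx] = target
    log_lo = math.log(target - delta)
    log_hi = math.log(target + delta)
    for j in range(idx - 1, -1, -1):          # fill in doses below the guess
        q[j] = math.exp(log_lo * math.log(q[j + 1]) / log_hi)
    for j in range(idx + 1, n_doses):         # fill in doses above the guess
        q[j] = math.exp(log_hi * math.log(q[j - 1]) / log_lo)
    return q


skeleton = indifference_skeleton(target=0.25, delta=0.07, mtd_guess=4, n_doses=6)
print([round(qj, 2) for qj in skeleton])
```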
Pan and Yuan (2017) showed that given a fixed half-width of the indifference
interval (i.e., δ), the indifference-interval method is invariant to the prior guess
for which dose level is the MTD (i.e., j † ). That is, different values of j † result
in equivalent skeletons, where equivalent skeletons are defined as skeletons
that lead to the same likelihood under the power model (2.1). As a result, the
half-width of the indifference interval plays a much more important role than
the prior guess of the MTD location.
Lee and Cheung’s method simplifies the specification of the skeleton, but
does not resolve the sensitivity issue of the CRM pertaining to model misspecification.
Table 2.2 shows the simulation results of the CRM with two
different skeletons, skeleton 1 = (0.070, 0.127, 0.200, 0.286, 0.377, 0.468) and
skeleton 2 = (0.012, 0.069, 0.200, 0.380, 0.560, 0.706), generated
by the method of Lee and Cheung (2009) with half-widths of the indifference
interval of 0.04 and 0.08, respectively. We can see that skeleton 1 substantially outperforms
skeleton 2 in scenario 1, whereas the result is opposite in scenario 2. In other
words, a skeleton that works well in one scenario may not work well in another
scenario, and there does not exist a single “best” skeleton that outperforms
all others in every scenario. In the next section, we will describe how to solve
this sensitivity issue by specifying multiple skeletons and then using Bayesian
model averaging or selection to adaptively identify the best fitted skeleton for
robust decision making.
TABLE 2.2: The performance of the CRM and Bayesian model averaging
CRM (BMA-CRM) with two different skeletons generated with half-widths of
the indifference interval of 0.04 (skeleton 1) and 0.08 (skeleton 2). The target
toxicity rate is φ = 0.2 and sample size is N = 36.

Scenario 1
                                Dose level
Design            Measure       1       2       3       4       5       6
True DLT rate                   0.03    0.04    0.05    0.06    0.07    0.20
CRM (skeleton 1)  % sel§        0       0.10    1.25    3.85    21.05   73.20
                  No. pts†      1.5     1.7     2.2     3.5     8.0     18.8
CRM (skeleton 2)  % sel         0.05    1.35    5.70    8.10    28.25   56.40
                  No. pts       1.5     2.2     3.4     5.2     10.1    13.4
BMA-CRM           % sel         0       0.25    2.45    5.65    21.80   69.60
                  No. pts       1.5     1.9     2.7     3.9     8.6     17.4

Scenario 2
                                Dose level
Design            Measure       1       2       3       4       5       6
True DLT rate                   0.12    0.24    0.33    0.60    0.70    0.80
CRM (skeleton 1)  % sel         35.15   43.50   10.45   0.25    0       0
                  No. pts       13.9    12.2    5.2     1.2     0.3     0.1
CRM (skeleton 2)  % sel         30.60   53.10   11.15   0       0       0
                  No. pts       12.3    15.1    6.1     1.0     0.1     0
BMA-CRM           % sel         32.30   49.85   10.05   0       0       0
                  No. pts       12.7    14.3    5.5     1.0     0.3     0.1

§: Average selection percentage at each dose;
†: Average number of patients treated at each dose.
and Yuan (2009b) showed that the Bayesian model selection approach yields
performance similar to that of BMA; we therefore focus on the BMA approach here.
Through the choice of multiple skeletons, BMA provides a more robust way
to construct the dose–toxicity curve than the standard CRM.
The rationale behind the BMA-CRM is to use multiple skeletons to repre-
sent different dose–toxicity relationships. Each skeleton corresponds to a CRM
model described previously with a different set of q1 < · · · < qJ . As long as
one of them is close to the truth, the design will perform well, since BMA
automatically identifies and favors the best fitted model.
Specifically, let $\{M_k\}_{k=1}^{K}$ denote the CRM models corresponding to the $K$
prespecified skeletons $\{q_{1k} < \cdots < q_{Jk} : k = 1, \ldots, K\}$. Like the CRM, the
$k$th model in the BMA-CRM connects the actual DLT probabilities to the $k$th
skeleton by assuming
\[
\pi_{jk} = q_{jk}^{\exp\{\alpha_k\}}, \quad \text{for } j = 1, \ldots, J, \; k = 1, \ldots, K.
\]
Let Pr(Mk ) be the prior probability that model Mk is the true model, i.e.,
the probability that the kth skeleton (q1k < · · · < qJk ) matches the true
dose–toxicity curve. If there is no preference a priori for any one skeleton
over the other skeletons, equal weights can be assigned to the different models
by setting Pr(Mk ) = 1/K, k = 1, . . . , K. When there is prior information
about the importance of each set of the prespecified toxicity probabilities, such
information can be incorporated into Pr(Mk ), k = 1, . . . , K. For example, if
one skeleton is more likely to be true, it can be assigned a higher prior model
probability. After $n$ patients have been treated and the observed data are
$D = \{(n_j, y_j), j = 1, \ldots, J\}$, $n = \sum_{j=1}^{J} n_j$, the likelihood function
corresponding to the $k$th model is
\[
L(D \mid \alpha_k, M_k) = \prod_{j=1}^{J} \left[ q_{jk}^{\exp\{\alpha_k\}} \right]^{y_j}
                          \left[ 1 - q_{jk}^{\exp\{\alpha_k\}} \right]^{n_j - y_j}.
\]
where
\[
m(D \mid M_k) = \int L(D \mid \alpha_k, M_k)\, f(\alpha_k)\, d\alpha_k
\]
estimate of the toxicity probabilities for doses under consideration in the trial,
the BMA approach takes a weighted average across the CRM models corre-
sponding to the different skeletons, where the weight for each model reflects
how well that model fits the accumulated data relative to the other models.
The potential estimation bias caused by a misspecification of the skeleton is
averaged out, leading to a more robust design compared to the original CRM.
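More precisely, the BMA estimate for the toxicity probability at each dose
level is given by
\[
\bar{\pi}_j = \sum_{k=1}^{K} \hat{\pi}_{jk} \Pr(M_k \mid D), \quad j = 1, \ldots, J, \tag{2.5}
\]
where $\hat{\pi}_{jk}$ is the posterior mean of the toxicity probability of dose level $j$ under
model $M_k$, i.e.,
\[
\hat{\pi}_{jk} = \int q_{jk}^{\exp\{\alpha_k\}}
                 \frac{L(D \mid \alpha_k, M_k)\, f(\alpha_k)}{\int L(D \mid \alpha_k, M_k)\, f(\alpha_k)\, d\alpha_k}\, d\alpha_k.
\]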
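The model-averaged estimate in (2.5) can be sketched numerically as follows. This is an illustration under stated assumptions rather than the authors' implementation: each model uses a normal prior with variance 2 on its parameter, the integrals are approximated on an evenly spaced grid, and the posterior model probabilities are taken as the marginal likelihoods times the prior model probabilities, normalized to sum to one (the usual Bayes' theorem form). The function names and the interim data are hypothetical; the two skeletons are those quoted for Table 2.2.

```python
import math


def model_summary(skeleton, n, y, prior_var=2.0, grid_size=2001, width=6.0):
    """Return (marginal likelihood, posterior mean DLT probabilities) for one skeleton."""
    half = width * math.sqrt(prior_var)
    step = 2.0 * half / (grid_size - 1)
    marginal = 0.0
    numer = [0.0] * len(skeleton)
    for i in range(grid_size):
        alpha = -half + i * step
        prior = (math.exp(-0.5 * alpha ** 2 / prior_var)
                 / math.sqrt(2.0 * math.pi * prior_var))
        probs = [min(max(q ** math.exp(alpha), 1e-12), 1.0 - 1e-12) for q in skeleton]
        lik = 1.0
        for p, nj, yj in zip(probs, n, y):
            lik *= p ** yj * (1.0 - p) ** (nj - yj)
        weight = lik * prior * step            # grid approximation of the integrand
        marginal += weight
        for j, p in enumerate(probs):
            numer[j] += p * weight
    return marginal, [v / marginal for v in numer]


def bma_crm(skeletons, n, y, prior_probs=None):
    """Posterior model probabilities and BMA estimates of the DLT probabilities."""
    k = len(skeletons)
    if prior_probs is None:
        prior_probs = [1.0 / k] * k            # no prior preference among skeletons
    summaries = [model_summary(s, n, y) for s in skeletons]
    weights = [m * p for (m, _), p in zip(summaries, prior_probs)]
    total = sum(weights)
    post_probs = [w / total for w in weights]
    bma = [sum(pp * est[j] for pp, (_, est) in zip(post_probs, summaries))
           for j in range(len(n))]
    return post_probs, bma


skeletons = [(0.070, 0.127, 0.200, 0.286, 0.377, 0.468),
             (0.012, 0.069, 0.200, 0.380, 0.560, 0.706)]
# Hypothetical interim data.
post_probs, bma = bma_crm(skeletons, n=[3, 3, 6, 3, 0, 0], y=[0, 0, 1, 2, 0, 0])
print("Pr(M_k | D):", [round(p, 3) for p in post_probs])
print("BMA estimates:", [round(p, 3) for p in bma])
```

Before any data are observed, every model has the same marginal likelihood, so the posterior model probabilities equal the prior ones; the weights move apart only as data accumulate.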
[Plots for Figure 2.3 omitted; three panels, each with horizontal axis: Dose and vertical axis: DLT Probability.]
FIGURE 2.3: Prior support of the proposed default set of three skeletons for
the BMA-CRM when the target DLT rate is 0.25.
and then further calibrate the skeletons to maximize the probability of correct
selection of the MTD using a set of prespecified scenarios. That calibration
method is computationally intensive, but it results in a set of three skeletons
that performs well in the specified scenarios.
Table 2.2 provides a numerical example that the BMA-CRM is more reli-
able than the CRM: the performance of the BMA-CRM is close to that of the
CRM with the better skeleton in both scenarios 1 and 2. Such robustness and
reliability are tremendously important because in practice we often prefer a
method that yields reliable performance to a method that has high variability
(i.e., performs well in one scenario but not in another scenario).
logit{π(dj )} = β0 + β1 dj , j = 1, · · · , J, (2.6)
where β0 and β1 are unknown intercept and slope parameters, respectively, and
logit(x) = log{x/(1−x)}. To facilitate interpretation, EWOC re-parameterizes
the logistic model by replacing (β0 , β1 ) with (π1 , ρ), where π1 = π(d1 ) is the
DLT probability at the lowest dose under consideration in the trial, and ρ is
the MTD with $\mathrm{logit}^{-1}(\beta_0 + \beta_1 \rho) = \phi$. The re-parameterized logistic model is
given by
$$\mathrm{logit}\{\pi(d_j)\} = \frac{(d_j - d_1)\,\mathrm{logit}(\phi) + (\rho - d_j)\,\mathrm{logit}(\pi_1)}{\rho - d_1}.$$
If dj = ρ, then logit{π(ρ)} = logit(φ), and thus ρ is the unknown MTD.
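The re-parameterization can be checked numerically. The sketch below uses hypothetical values for d1 , ρ, π1 , and φ and illustrative function names; it confirms that the curve passes through (d1 , π1 ) and (ρ, φ) and that it agrees with the original (β0 , β1 ) form of model (2.6).

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def ewoc_pi(d, d1, rho, pi1, phi):
    """DLT probability at dose d under the re-parameterized logistic model."""
    num = (d - d1) * logit(phi) + (rho - d) * logit(pi1)
    return expit(num / (rho - d1))

def to_beta(d1, rho, pi1, phi):
    """Recover (beta0, beta1) of logit{pi(d)} = beta0 + beta1 * d."""
    beta1 = (logit(phi) - logit(pi1)) / (rho - d1)
    beta0 = logit(pi1) - beta1 * d1
    return beta0, beta1

# Hypothetical values, for illustration only.
d1, rho, pi1, phi = 10.0, 40.0, 0.05, 0.33
beta0, beta1 = to_beta(d1, rho, pi1, phi)
for d in (10.0, 25.0, 40.0):
    print(d, round(ewoc_pi(d, d1, rho, pi1, phi), 4))
```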
Pr(π(d) ≥ φ | D) = α.
where f (ρ|D) is the marginal posterior function of the MTD. If α < 0.5,
then L(ρ − δ; ρ, α) < L(ρ + δ; ρ, α), and thus underdosing is preferable to
overdosing. More precisely, since L(ρ + δ; ρ, α)/L(ρ − δ; ρ, α) = (1 − α)/α,
this loss function says that the loss incurred by treating a patient with a
dose δ units above the MTD is (1 − α)/α times more than treating a patient
with a dose δ units below the MTD. Due to the use of the overdose control
rule, EWOC is more conservative and safer than the CRM, but at the cost
of sacrificing the accuracy of identifying the MTD, which is often substantial.
In addition, EWOC is subject to the influence of model misspecification in a
similar way as the CRM.
where β0 and β1 are unknown intercept and slope parameters, dj is the raw
dosage at dose level j, and d∗ is the reference dose. Neuenschwander et al.
(2008) recommended the use of the vague bivariate normal distribution as the
prior distribution of (β0 , log(β1 )), e.g.,
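$$(\beta_0, \log(\beta_1)) \sim N\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right).$$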
Pr(πj ∈ (δ1 , δ2 )|D). For patient safety, as with the CRM, the no-dose-skipping
rule is often imposed in practice. Thus, if the estimated optimal dose is higher
than the current dose, we escalate the dose by one level; and if the estimated
optimal dose is lower than the current dose, we de-escalate the dose by one
level. The overdose control rule leads to the following safety stopping rule:
stop the trial if Pr(π1 > δ2 |D) ≥ 0.25 (i.e., the lowest dose is an overdose).
The trial continues until the prespecified maximum sample size N is ex-
hausted. Upon completion of the trial, the BLRM selects the final estimate
of the “optimal” dose as the MTD. Alternative early stopping rules can be
added to the BLRM: for example, stop the trial if the “optimal” dose has a large
posterior probability (say ≥ 50%) of lying in the proper dosing interval, or if a
minimum number of patients have been treated at the “optimal” dose.
Of note, given dj and the logistic model (2.7), the prior distribution of
(β0 , log(β1 )) automatically determines a set of prior estimates of π(dj ),
i.e., the skeleton. Thus, the BLRM suffers from a similar sensitivity to
the skeleton, or more precisely to the prior specification. In addition, as noted
and elaborated by Cheung (2011) and Iasonos et al. (2016), somewhat
counterintuitively, the more flexible two-parameter logistic model (2.7) actually
performs worse than the single-parameter power
model (2.1). This is also confirmed by the extensive simulation study of Zhou et al.
(2018b), which shows that the CRM (Section 2.5) outperforms the BLRM.
Lastly, the BLRM depends on the (standardized) raw dosage dj , and thus may be
decision-inconsistent. We call a design decision-consistent if it generates
identical operating characteristics as long as the DLT probabilities of the dose
levels are the same, regardless of the raw dosages. Consider two drugs with the
same number of dose levels but different raw dosages, e.g., one drug is given at
(5 mg, 10 mg, 15 mg, 20 mg) and the other at (30 mg, 60 mg, 120 mg, 240 mg), and
suppose the two drugs have the same DLT probabilities (0.05, 0.1, 0.25, 0.45)
at the four dose levels. Clearly, decision-consistency is a desirable property to
have because when the underlying data generation mechanisms are the same,
the design should yield the same operating characteristics. The BLRM, how-
ever, does not have this property because the same data may result in different
estimates when two trials have different raw dosages dj . In contrast, the CRM
and BMA-CRM (based on the power model) do not depend on raw dosages,
but only dose levels, and thus are decision-consistent.
2.9 Software
Software is not required to conduct the algorithm-based 3+3 design and the
accelerated titration design, since their dose exploration and selection rules
3.1 Introduction
Model-assisted designs have emerged as an attractive approach for phase I
clinical trials, combining the simplicity of algorithm-based designs with
the superior performance of model-based designs. Model-assisted designs refer
to a class of novel designs that use a model (e.g., the binomial model) for
efficient decision making like model-based designs, while their dose escalation
and de-escalation rules can be tabulated before the onset of a trial as with
algorithm-based designs (Yuan et al., 2019). This chapter introduces several
model-assisted phase I designs that aim to find the maximum tolerated dose
(MTD), including the modified toxicity probability interval (mTPI) design (Ji
et al., 2010), keyboard design (Yan et al., 2017), and Bayesian optimal interval
(BOIN) design (Liu and Yuan, 2015; Yuan et al., 2016a).
yj | nj , πj ∼ Binomial(nj , πj ) (3.1)
πj ∼ Beta(1, 1) ≡ Unif(0, 1).
DOI: 10.1201/9780429052781-3 33
Suppose j is the current dose level. mTPI determines the next dose as follows:
• If UPM1 = max{UPM1, UPM2, UPM3}, then escalate the dose to level j+1.
• If UPM2 = max{UPM1, UPM2, UPM3}, then stay at the current dose level
j.
• If UPM3 = max{UPM1, UPM2, UPM3}, then de-escalate the dose to level
j − 1.
Because the three UPMs can be determined for all possible outcomes Dj =
(nj , yj ), the dose escalation and de-escalation rules can be tabulated before
the trial begins, which makes mTPI easy to implement in practice.
This can be done as follows: enumerate all possible values of nj from
1 up to the maximum sample size N , and given each possible value of nj ,
enumerate all possible values of yj from 0 up to nj . Then, given a pair of
(nj , yj ), calculate three UPMs and record the resulting dose escalation/de-
escalation decision. Table 3.1 shows the dose escalation and de-escalation table
for mTPI based on a target DLT rate φ = 0.20 and the proper dosing interval
(δ1 , δ2 ) = (0.17, 0.23).
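The enumeration just described can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' software: the UPM of an interval is taken to be its posterior probability divided by its length (the definition in Ji et al., 2010, which is not restated in this passage), and the posterior of πj is Beta(yj + 1, nj − yj + 1), which follows from model (3.1) by conjugacy. The function names are hypothetical, and the elimination row of the table is not computed here.

```python
from math import comb

def beta_cdf(x, y, n):
    """CDF at x of Beta(y + 1, n - y + 1).

    Uses the identity I_x(a, b) = P(Bin(a + b - 1, x) >= a) for integer a, b,
    so only the standard library is needed.
    """
    m = n + 1
    return sum(comb(m, i) * x**i * (1 - x) ** (m - i) for i in range(y + 1, m + 1))

def mtpi_decision(y, n, delta1, delta2):
    """Return 'E' (escalate), 'S' (stay), or 'D' (de-escalate)."""
    lo, hi = beta_cdf(delta1, y, n), beta_cdf(delta2, y, n)
    upm = {
        "E": lo / delta1,                    # UPM1: underdosing interval (0, delta1)
        "S": (hi - lo) / (delta2 - delta1),  # UPM2: proper dosing interval
        "D": (1 - hi) / (1 - delta2),        # UPM3: overdosing interval (delta2, 1)
    }
    return max(upm, key=upm.get)

def mtpi_table(n_max, delta1, delta2):
    """Map every (n, y) with 1 <= n <= n_max and 0 <= y <= n to a decision."""
    return {(n, y): mtpi_decision(y, n, delta1, delta2)
            for n in range(1, n_max + 1) for y in range(n + 1)}

table = mtpi_table(12, 0.17, 0.23)
for n in (1, 2, 3):
    print(n, [table[(n, y)] for y in range(n + 1)])
```

For nj = 3 the sketch gives escalate, stay, and de-escalate for yj = 0, 1, and 2, in line with the corresponding column of Table 3.1.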
TABLE 3.1: Decision table of the mTPI design based on a target toxicity
rate φ = 0.20 and the proper dosing interval (δ1 , δ2 ) = (0.17, 0.23); nj is the
number of patients treated at dose level j, and yj is the number of DLTs
observed at dose level j.
Number of patients nj             1   2   3   4   5   6   7   8   9  10  11  12
Escalate to j + 1 if yj ≤         0   0   0   0   0   0   0   0   1   1   1   1
De-escalate to j − 1 if yj ≥      1   1   2   2   3   3   3   4   4   4   5   5
Eliminate levels j to J if yj ≥  NA   2   2   3   3   3   4   4   4   5   5   5
Model-Assisted Dose Finding Designs 35
[Figure 3.1 plot omitted: two panels of posterior density (vertical axis "Density", horizontal axis from 0.0 to 1.0), one marking the regions UPM1, UPM2, and UPM3 and the other marking the target key and the strongest key.]
FIGURE 3.1: Contrast between (a) the mTPI design and (b) the keyboard
design. The curves are the posterior distributions of πj . To determine the next
dose, the mTPI design compares the values of the three UPMs, whereas the
keyboard design compares the location of the strongest key with respect to
the target key.
to guide dose escalation and de-escalation. Figure 3.1 contrasts the keyboard
and mTPI designs.
The keyboard design starts by specifying a proper dosing interval I ∗ =
(δ1 , δ2 ), referred to as the “target key,” and then populates this interval toward
both sides of the target key, forming a series of keys of an equal width. The
keys span the range of 0 to 1. For example, given the proper dosing interval
or target key of (0.25, 0.35), on its left side we form two keys of width 0.1
(i.e., (0.15, 0.25) and (0.05, 0.15)); and on its right side we form six keys of
width 0.1 (i.e., (0.35, 0.45), (0.45, 0.55), (0.55, 0.65), (0.65, 0.75), (0.75, 0.85),
and (0.85, 0.95)). We denote the resulting intervals/keys as I1 , · · · , IK . As all
keys must have an equal width and be within [0, 1], some DLT probability
values at the two ends (e.g., < 0.05 or > 0.95 in the example) may not be
covered by keys. As explained in Yan et al. (2017), ignoring these “residual”
DLT rates at the two ends does not pose any issue for the decision making of
dose escalation and de-escalation. This is because the posterior distribution of
πj is unimodal, and the decision is made by the relative position of the target
key to the strongest key.
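The construction of the keys can be sketched as follows. The function name `make_keys` and the small tolerance used to guard against floating-point error are illustrative assumptions; the example reproduces the (0.25, 0.35) target key discussed above.

```python
import math

def make_keys(delta1, delta2, eps=1e-9):
    """Return (keys, index of the target key) for target key (delta1, delta2).

    Keys of equal width are laid out on both sides of the target key for as
    long as they fit inside [0, 1]; leftover "residual" ranges are not covered.
    """
    w = delta2 - delta1
    n_left = math.floor(delta1 / w + eps)          # full keys to the left
    n_right = math.floor((1 - delta2) / w + eps)   # full keys to the right
    left = [(delta1 - k * w, delta1 - (k - 1) * w) for k in range(n_left, 0, -1)]
    right = [(delta2 + k * w, delta2 + (k + 1) * w) for k in range(n_right)]
    keys = left + [(delta1, delta2)] + right
    # Round only to remove floating-point noise in the endpoints.
    return [(round(a, 10), round(b, 10)) for a, b in keys], n_left

keys, target_index = make_keys(0.25, 0.35)
for i, key in enumerate(keys):
    print(key, "<- target key" if i == target_index else "")
```

With the target key (0.25, 0.35) this yields two keys on the left and six on the right, leaving the ranges below 0.05 and above 0.95 uncovered.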
To make the decision of dose escalation and de-escalation, given the ob-
served data Dj = (nj , yj ) at the current dose level j, the keyboard design
identifies the interval Imax that has the largest posterior probability,
TABLE 3.2: Decision table of the keyboard design based on a target toxicity
rate φ = 0.20 and the proper dosing interval (δ1 , δ2 ) = (0.17, 0.23); nj is the
number of patients treated at dose level j, and yj is the number of DLTs
observed at dose level j.
Number of patients nj             1   2   3   4   5   6   7   8   9  10  11  12
Escalate to j + 1 if yj ≤         0   0   0   0   0   1   1   1   1   1   1   2
De-escalate to j − 1 if yj ≥      1   1   1   1   2   2   2   2   3   3   3   3
Eliminate levels j to J if yj ≥  NA  NA   2   3   3   3   4   4   4   5   5   5
mTPI, mTPI-2 (Guo and Yuan, 2017), is equivalent to the keyboard design,
but is more perplexing and less transparent than the keyboard design. mTPI-2 relies
on complicated procedures, such as Occam’s razor and model selection, which
are difficult to understand and to communicate to non-statisticians.
1. Patients in the first cohort are treated at the lowest dose d1 , or the
physician-specified dose.
[Flowchart omitted: start at the prespecified starting dose; treat a patient or a cohort of patients; if the maximum sample size is reached, stop the trial and select the MTD; otherwise compute the DLT rate at the current dose and compare it with the boundaries λe and λd.]
if Pr(πj > φ | Dj ) > 0.95 and nj ≥ 3, dose levels j and higher are elimi-
nated from the trial, and the trial is terminated if the lowest dose level is
eliminated.
The value of Pr(πj > φ | Dj ) is evaluated based on the posterior distribution
(3.2). When the trial is terminated due to toxicity, no dose should be selected
as MTD.
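The elimination rule can be evaluated exactly without special functions. The sketch below assumes that the posterior (3.2) is Beta(yj + 1, nj − yj + 1), i.e., the conjugate update of the Beta(1, 1) prior in (3.1); the function names are hypothetical.

```python
from math import comb

def prob_exceeds(phi, y, n):
    """Pr(pi_j > phi | D_j) when pi_j | D_j ~ Beta(y + 1, n - y + 1) (assumed)."""
    m = n + 1
    # Beta CDF via the binomial identity I_x(a, b) = P(Bin(a + b - 1, x) >= a).
    cdf = sum(comb(m, i) * phi**i * (1 - phi) ** (m - i) for i in range(y + 1, m + 1))
    return 1.0 - cdf

def elimination_boundary(n, phi, cutoff=0.95, min_n=3):
    """Smallest y that triggers elimination at sample size n, or None."""
    if n < min_n:
        return None
    for y in range(n + 1):
        if prob_exceeds(phi, y, n) > cutoff:
            return y
    return None

for n in range(1, 13):
    print(n, elimination_boundary(n, phi=0.3))
```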
Table 3.3 provides the optimal dose escalation and de-escalation boundaries
(λe , λd ) for commonly used target DLT probabilities φ. For example,
given φ = 0.3, the corresponding escalation boundary is λe = 0.236 and the
de-escalation boundary is λd = 0.358. That is, given that three patients have
been treated at the current dose, escalate the dose if 0/3 patients have DLT,
retain the dose if 1/3 have DLT, and de-escalate the dose if 2/3 or more have DLT.
Table 3.4, or equivalently Figure 3.3, shows the discretized
escalation/de-escalation boundaries up to nj = 12 when φ = 0.3. This discretized
decision table is handy for conducting the trial in practice and can
be easily generated using the software described later.
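A minimal sketch of the discretization, using the boundaries λe = 0.236 and λd = 0.358 quoted above for φ = 0.3, is shown below. The function name is hypothetical, and the rule "de-escalate when the observed DLT rate exceeds λd" is the one used for the designs in this chapter.

```python
import math

def boin_cutoffs(n, lam_e, lam_d):
    """Integer cutoffs at sample size n.

    Returns (esc, deesc): escalate if y <= esc, de-escalate if y >= deesc.
    """
    esc = math.floor(n * lam_e)        # largest y with y / n <= lam_e
    deesc = math.floor(n * lam_d) + 1  # smallest y with y / n > lam_d
    return esc, deesc

lam_e, lam_d = 0.236, 0.358  # values quoted in the text for phi = 0.3
for n in range(1, 13):
    esc, deesc = boin_cutoffs(n, lam_e, lam_d)
    print(f"n={n:2d}  escalate if y <= {esc}  de-escalate if y >= {deesc}")
```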
The hallmark of the BOIN design is its simplicity and transparency.
Arguably, its decision rule is even simpler than that of the 3+3 design; it just
involves a simple comparison between π̂j and (λe , λd ). Such simplicity gives
BOIN several important advantages over other model-assisted designs (e.g., the
mTPI and keyboard designs). From a statistical viewpoint, π̂j is the
(nonparametric) maximum likelihood estimate of πj , and it enjoys desirable
statistical properties such as consistency and efficiency. From a practical
viewpoint, π̂j is the most natural and intuitive estimate of πj and is accessible
to non-statisticians, making the BOIN design simpler and more transparent than the
mTPI and keyboard designs. In our experience, the BOIN design is easy to explain
to clinicians and is well received, especially with the aid of the flowchart
displayed in Figure 3.2.
In addition, because the BOIN design guarantees de-escalating the dose
when π̂j > λd , it is easy for clinicians and regulatory
agencies to assess the safety of a trial using the BOIN design. For example,
given a target DLT rate φ = 0.25, we know a priori that a phase I trial using
the BOIN design guarantees de-escalating the dose if the observed DLT rate is
The BOIN design is derived based on the optimal design theory. Let
λe (dj , nj , φ) and λd (dj , nj , φ) denote the general dose escalation and de-
escalation boundaries that are unspecified functions of the dose (i.e., dj ), the
number of patients treated (i.e., nj ), and the DLT target (i.e., φ), where
0 ≤ λe (dj , nj , φ) < λd (dj , nj , φ) < 1. Consider a class of nonparametric de-
signs Cnp :
1. Patients in the first cohort are treated at the lowest dose d1 , or the
physician-specified dose.
2. Suppose j is the current dose level; to assign a dose to the next
cohort of patients:
• if π̂j ≤ λe (dj , nj , φ), escalate the dose level to j + 1.
• if π̂j > λd (dj , nj , φ), de-escalate the dose level to j − 1.
• otherwise, i.e., λe (dj , nj , φ) < π̂j ≤ λd (dj , nj , φ), stay at the
current dose level j.
3. Repeat step 2 until the maximum sample size N is reached.
As the escalation and de-escalation boundaries λe (dj , nj , φ) and λd (dj , nj , φ) can
freely vary according to dj , nj , and φ, this class of designs is extremely
broad and contains all possible nonparametric designs that do not impose
a parametric assumption on the dose-toxicity curve and that make dose
escalation and de-escalation decisions based on the local data Dj = (yj , nj ). The mTPI,
keyboard/mTPI-2, and BOIN designs all belong to Cnp . For notational brevity,
we use the shorthands λej ≡ λe (dj , nj , φ) and λdj ≡ λd (dj , nj , φ).
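Steps 1 to 3 of this class of designs can be sketched as a small simulator. This is an illustration only: the boundaries are held constant (the class allows them to vary with dj , nj , and φ), the true DLT probabilities are made up, and MTD selection and dose elimination are deliberately left out. The function name and its arguments are hypothetical.

```python
import random

def simulate_trial(true_pi, lam_e, lam_d, cohort=3, n_max=30, start=0, seed=1):
    """Simulate one trial; return (n, y), the patients and DLTs per dose level."""
    rng = random.Random(seed)
    J = len(true_pi)
    n, y = [0] * J, [0] * J
    j = start                                   # step 1: starting dose
    while sum(n) < n_max:                       # step 3: repeat until N is reached
        size = min(cohort, n_max - sum(n))
        n[j] += size
        y[j] += sum(rng.random() < true_pi[j] for _ in range(size))
        rate = y[j] / n[j]                      # observed DLT rate at dose j
        if rate <= lam_e and j < J - 1:         # step 2: escalate
            j += 1
        elif rate > lam_d and j > 0:            # step 2: de-escalate
            j -= 1
        # otherwise stay at the current dose level
    return n, y

# Hypothetical true DLT probabilities; boundaries as quoted for phi = 0.3.
n, y = simulate_trial([0.05, 0.15, 0.30, 0.45, 0.60], lam_e=0.236, lam_d=0.358)
print("patients per dose:", n)
print("DLTs per dose:    ", y)
```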
The BOIN design is obtained by minimizing the probability of making
incorrect decisions of dose escalation and de-escalation within Cnp . Liu and
Yuan (2015) described two versions of the BOIN design: the local optimal
BOIN design, which is optimized based on point hypotheses, and the global
optimal BOIN design, which is optimized based on interval hypotheses. Liu
and Yuan (2015) recommended the local optimal BOIN design because of
its better performance. Thus, we here focus on the development of the local
optimal BOIN design and simply call it the BOIN design.
To proceed, define three point hypotheses:
Ē = {S, D}; and under H2j , the correct decision is D, and incorrect decisions
are D̄ = {S, E}.
Under the Bayesian paradigm, we assign each of the hypotheses a prior
probability of being true, denoted as ωkj = Pr(Hkj ), k = 0, 1, 2. The prob-
ability of making an incorrect decision (the decision error rate), denoted as
α(λej , λdj ), at each of the dose assignments is given by
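$$\begin{aligned}
\alpha(\lambda_{ej}, \lambda_{dj})
&= \Pr(H_{0j}) \Pr(\bar{S} \mid H_{0j}) + \Pr(H_{1j}) \Pr(\bar{E} \mid H_{1j}) + \Pr(H_{2j}) \Pr(\bar{D} \mid H_{2j}) \\
&= \Pr(H_{0j}) \Pr(y_j \le n_j\lambda_{ej} \text{ or } y_j > n_j\lambda_{dj} \mid H_{0j}) + \Pr(H_{1j}) \Pr(y_j > n_j\lambda_{ej} \mid H_{1j}) \\
&\quad + \Pr(H_{2j}) \Pr(y_j \le n_j\lambda_{dj} \mid H_{2j}) \\
&= \omega_{0j} \{ \mathrm{Bin}(n_j\lambda_{ej}; n_j, \phi) + 1 - \mathrm{Bin}(n_j\lambda_{dj}; n_j, \phi) \} \\
&\quad + \omega_{1j} \{ 1 - \mathrm{Bin}(n_j\lambda_{ej}; n_j, \phi_1) \} + \omega_{2j} \mathrm{Bin}(n_j\lambda_{dj}; n_j, \phi_2),
\end{aligned} \tag{3.3}$$
where Bin(b; n, φ) is the cumulative distribution function (CDF) of the binomial
distribution with size and probability parameters n and φ, evaluated at the
value b. We rewrite the decision error α(λej , λdj ) as
$$\alpha(\lambda_{ej}, \lambda_{dj}) = \alpha_1(\lambda_{ej}) + \alpha_2(\lambda_{dj}) + \omega_{0j} + \omega_{1j},$$
where
$$\begin{aligned}
\alpha_1(\lambda_{ej}) &= \omega_{0j} \mathrm{Bin}(n_j\lambda_{ej}; n_j, \phi) - \omega_{1j} \mathrm{Bin}(n_j\lambda_{ej}; n_j, \phi_1), \\
\alpha_2(\lambda_{dj}) &= \omega_{2j} \mathrm{Bin}(n_j\lambda_{dj}; n_j, \phi_2) - \omega_{0j} \mathrm{Bin}(n_j\lambda_{dj}; n_j, \phi).
\end{aligned}$$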
To minimize α(λej , λdj ), we can minimize α1 (λej ) and α2 (λdj ) separately with
regard to λej and λdj , respectively. As α1 (λej ) and α2 (λdj ) are symmetric,
below we consider the minimization of α1 (λej ):
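$$\begin{aligned}
\alpha_1(\lambda_{ej}) &= \omega_{0j} \mathrm{Bin}(n_j\lambda_{ej}; n_j, \phi) - \omega_{1j} \mathrm{Bin}(n_j\lambda_{ej}; n_j, \phi_1) \\
&= \sum_{y=0}^{\lfloor n_j\lambda_{ej} \rfloor} \binom{n_j}{y} \left\{ \omega_{0j} \phi^{y} (1-\phi)^{n_j-y} - \omega_{1j} \phi_1^{y} (1-\phi_1)^{n_j-y} \right\} \\
&= \sum_{y=0}^{\lfloor n_j\lambda_{ej} \rfloor} \binom{n_j}{y} \omega_{1j} \phi_1^{y} (1-\phi_1)^{n_j-y} \left\{ \frac{\omega_{0j}}{\omega_{1j}} \left( \frac{\phi}{\phi_1} \right)^{y} \left( \frac{1-\phi}{1-\phi_1} \right)^{n_j-y} - 1 \right\}.
\end{aligned}$$
By the definition of the CDF, α1 (λej ) = 0 when ⌊nj λej⌋ < 0, and α1 (λej ) = ω0j − ω1j
when ⌊nj λej⌋ ≥ nj (both CDFs then equal 1).
Assuming that y∗ is continuous and setting
$$\frac{\omega_{0j}}{\omega_{1j}} \left( \frac{\phi}{\phi_1} \right)^{y^*} \left( \frac{1-\phi}{1-\phi_1} \right)^{n_j-y^*} - 1 = 0, \tag{3.4}$$
we obtain
$$y^* = \frac{n_j \log\left( \dfrac{1-\phi_1}{1-\phi} \right) + \log\left( \dfrac{\omega_{1j}}{\omega_{0j}} \right)}{\log\left( \dfrac{\phi(1-\phi_1)}{\phi_1(1-\phi)} \right)}.$$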
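Because φ > φ1 , $\left( \frac{\phi}{\phi_1} \right)^{y} \left( \frac{1-\phi}{1-\phi_1} \right)^{n_j-y}$ monotonically increases with y. It follows
that $\frac{\omega_{0j}}{\omega_{1j}} \left( \frac{\phi}{\phi_1} \right)^{y} \left( \frac{1-\phi}{1-\phi_1} \right)^{n_j-y} - 1 \ge 0$ when y ≥ y∗ , and < 0 when y < y∗ .
Therefore, given 0 ≤ y ≤ nj , α1 (λej ) is minimized when
$$n_j\lambda_{ej} \in \begin{cases}
[\, n_j - I(y^* = n_j), \infty ), & \text{if } y^* \ge n_j, \\
[\, \lceil y^* \rceil - 1, \lfloor y^* \rfloor + 1 ), & \text{if } 0 < y^* < n_j, \\
( -\infty, I(y^* = 0) ), & \text{if } y^* \le 0.
\end{cases} \tag{3.5}$$
By symmetry, α2 (λdj ) is minimized when
$$\lambda_{dj} \in \begin{cases}
\left[ 1 - \dfrac{I(y^{**} = n_j)}{n_j}, \infty \right), & \text{if } y^{**} \ge n_j, \\
\left[ \dfrac{\lceil y^{**} \rceil - 1}{n_j}, \dfrac{\lfloor y^{**} \rfloor + 1}{n_j} \right), & \text{if } 0 < y^{**} < n_j, \\
\left( -\infty, \dfrac{I(y^{**} = 0)}{n_j} \right), & \text{if } y^{**} \le 0,
\end{cases} \tag{3.6}$$
where
$$y^{**} = \frac{n_j \log\left( \dfrac{1-\phi}{1-\phi_2} \right) + \log\left( \dfrac{\omega_{0j}}{\omega_{2j}} \right)}{\log\left( \dfrac{\phi_2(1-\phi)}{\phi(1-\phi_2)} \right)}.$$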
As any values of λej and λdj located in the interval solutions (3.5) and
(3.6) produce the same error rate, for the purpose of designing the trial and
making decisions, a point solution located in the interval solutions is sufficient.
As ⌈x⌉ − 1 < x < ⌊x⌋ + 1, one specific “middle” point solution is
$$\lambda^*_{ej} = y^*/n_j = \frac{\log\left( \dfrac{1-\phi_1}{1-\phi} \right) + n_j^{-1} \log\left( \dfrac{\omega_{1j}}{\omega_{0j}} \right)}{\log\left( \dfrac{\phi(1-\phi_1)}{\phi_1(1-\phi)} \right)}, \tag{3.7}$$
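$$\lambda^*_{dj} = y^{**}/n_j = \frac{\log\left( \dfrac{1-\phi}{1-\phi_2} \right) + n_j^{-1} \log\left( \dfrac{\omega_{0j}}{\omega_{2j}} \right)}{\log\left( \dfrac{\phi_2(1-\phi)}{\phi(1-\phi_2)} \right)}. \tag{3.8}$$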
This is the solution provided by Liu and Yuan (2015). The above derivation
provides a more complete interval solution pair (3.5) and (3.6).
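The minimization can also be checked by brute force for small nj . The sketch below evaluates the decision error (3.3) for every pair of integer cutoffs and returns the pair with the smallest error. It assumes the equal prior weights 1/3 and the point alternatives φ1 = 0.6φ and φ2 = 1.4φ, which are commonly used defaults for BOIN but are not stated in this passage; the function names are hypothetical.

```python
from math import comb

def binom_cdf(b, n, p):
    """P(Y <= b) for Y ~ Binomial(n, p); equals 0 when b < 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, b + 1))

def decision_error(be, bd, n, phi, phi1, phi2, w=(1 / 3, 1 / 3, 1 / 3)):
    """Equation (3.3) for the rule: escalate if y <= be, de-escalate if y > bd."""
    w0, w1, w2 = w
    return (w0 * (binom_cdf(be, n, phi) + 1 - binom_cdf(bd, n, phi))
            + w1 * (1 - binom_cdf(be, n, phi1))
            + w2 * binom_cdf(bd, n, phi2))

def best_cutoffs(n, phi, phi1, phi2):
    """Exhaustive search over integer cutoffs with be <= bd."""
    candidates = [(be, bd) for be in range(-1, n + 1) for bd in range(be, n + 1)]
    return min(candidates, key=lambda c: decision_error(c[0], c[1], n, phi, phi1, phi2))

phi = 0.3
phi1, phi2 = 0.6 * phi, 1.4 * phi  # ASSUMED alternatives, not given in this passage
for n in (3, 6, 9, 12):
    print(n, best_cutoffs(n, phi, phi1, phi2))
```

For nj = 3 the search returns the cutoffs (0, 1), i.e., escalate with 0/3 DLTs and de-escalate with 2/3 or more, in agreement with the rule quoted earlier for φ = 0.3.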
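When the non-informative prior ω0j = ω1j = ω2j = 1/3 is used, which is
recommended for most trials, the optimal escalation and de-escalation boundaries become
$$\lambda_e = \frac{\log\left( \dfrac{1-\phi_1}{1-\phi} \right)}{\log\left( \dfrac{\phi(1-\phi_1)}{\phi_1(1-\phi)} \right)}, \qquad \lambda_d = \frac{\log\left( \dfrac{1-\phi}{1-\phi_2} \right)}{\log\left( \dfrac{\phi_2(1-\phi)}{\phi(1-\phi_2)} \right)}. \tag{3.9}$$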
In other words, the decision rule of BOIN is equivalent to the following
intuitive Bayesian decision rule:
• If Pr(H1j | Dj )/Pr(H0j | Dj ) ≥ 1 (i.e., the data indicate that H1j is equally or more likely
to be true than H0j ), escalate the dose.
• If Pr(H2j | Dj )/Pr(H0j | Dj ) > 1 (i.e., the data indicate that H2j is more likely to be true
than H0j ), de-escalate the dose.
The proof of Theorem 3.1 is straightforward noting that the ratio in equation
(3.4) is the posterior odds Pr(H0j | Dj )/Pr(H1j | Dj ), i.e., the reciprocal of
Pr(H1j | Dj )/Pr(H0j | Dj ).
Define $L(D_j \mid H_{0j}) \propto \phi^{y_j} (1-\phi)^{n_j - y_j}$ as the binomial likelihood
function of the data Dj = (nj , yj ) under H0j , and similarly define L(Dj | Hkj ) as
the likelihood function under Hkj , k = 1, 2. When the non-informative prior
Long-memory coherence
Coherence is a finite-sample property that describes how a phase I design be-
haves in dose escalation and de-escalation in light of observed DLT data. Che-
ung (2005) originally defined coherence as a design property by which dose es-
calation (or de-escalation) is prohibited when the most recently treated patient
experiences (or does not experience) toxicity. Liu and Yuan (2015) extended
that concept and defined two different types of coherence: short-memory co-
herence and long-memory coherence. They referred to the coherence proposed
by Cheung (2005) as short-memory coherence because it concerns the obser-
vation from only the most recently treated patient, ignoring the observations
from the patients who were previously treated. By contrast, long-memory co-
herence concerns the accumulated data observed from the most recent dose
level.
Definition (Short-memory coherence) A design is called short-memory
coherent if it never escalates the dose when the most recently treated
patient experiences DLT, and never de-escalates the dose when the most
recently treated patient does not experience DLT.
Definition (Long-memory coherence) A design is called long-memory co-
herent if it never escalates the dose when the observed DLT rate at the
current dose is higher than the target φ, and never de-escalates the dose
when the observed DLT rate at the current dose is lower than φ.
From a practical viewpoint, long-memory coherence is more relevant be-
cause when clinicians determine whether a dose escalation/de-escalation is
practically plausible, they almost always base their decision on the toxicity
data from all patients treated at the current dose, rather than only the single
patient most recently treated. This is all the more important considering that
patients in phase I trials are highly heterogeneous, and the toxicity outcome
from a single patient can be spurious. For example, suppose the target DLT
rate φ = 0.3 and, at the current dose, the most recently treated patient expe-
rienced DLT but none of the nine patients previously treated at the same dose
had DLT. As the overall observed DLT rate at the current dose is 1/10, esca-
lating the dose should not be regarded as an inappropriate action, although
it violates short-memory coherence.
Theorem 3.2 The BOIN design based on the non-informative prior with
ω0j = ω1j = ω2j = 1/3 is long-memory coherent.
The proof of Theorem 3.2 is straightforward based on equation (3.9).
Besides the above finite-sample properties, Liu and Yuan (2015) showed that,
assuming the existence of the target dose (i.e., at least one dose whose DLT
probability is located in (λe , λd )), the BOIN design also has the following
desirable large-sample property.
Theorem 3.3 As the number of patients goes to infinity, the dose assignment
and the selection of MTD under the BOIN design converge almost surely to
dose level j ∗ if dose level j ∗ is the only dose satisfying πj ∗ ∈ (λe , λd ). If there
are multiple dose levels in (λe , λd ), the design will converge almost surely to
one of these levels.
As a side note, one might be concerned that BOIN converges to the “stay”
interval (λe , λd ), rather than the target φ. Actually, this is not a concern. As
noted by Zhou et al. (2021b), under large samples it is more appropriate to
use the local alternative hypotheses H1 : πj = φ − ∆1 /√n and H2 : πj =
φ + ∆2 /√n, where ∆1 and ∆2 are constants, rather than the fixed alternatives
H1 : πj = φ1 and H2 : πj = φ2 as used in the above theorem. Then, as
(λe , λd ) converges to φ, the dose assignment and the selection of MTD under
the BOIN design naturally converge almost surely to φ.
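The shrinking of the interval under local alternatives can be illustrated numerically. The sketch below plugs φ1 = φ − ∆1 /√n and φ2 = φ + ∆2 /√n into the boundary formula (3.9); the values ∆1 = ∆2 = 0.2 and the function names are assumptions made only for this illustration.

```python
import math

def boundaries(phi, phi1, phi2):
    """Boundaries (lambda_e, lambda_d) from equation (3.9)."""
    lam_e = (math.log((1 - phi1) / (1 - phi))
             / math.log(phi * (1 - phi1) / (phi1 * (1 - phi))))
    lam_d = (math.log((1 - phi) / (1 - phi2))
             / math.log(phi2 * (1 - phi) / (phi * (1 - phi2))))
    return lam_e, lam_d

def local_boundaries(phi, n, delta1, delta2):
    """Boundaries under the local alternatives phi -/+ delta / sqrt(n)."""
    root = math.sqrt(n)
    return boundaries(phi, phi - delta1 / root, phi + delta2 / root)

phi = 0.3
results = [local_boundaries(phi, n, 0.2, 0.2) for n in (10, 100, 1000, 10000)]
for n, (lam_e, lam_d) in zip((10, 100, 1000, 10000), results):
    print(f"n={n:5d}  lambda_e={lam_e:.4f}  lambda_d={lam_d:.4f}")
```

The printed intervals always contain φ and narrow as n grows, which is the behavior the argument above relies on.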
1. Does the BOIN decision rule account for the variance of π̂j (or equivalently,
the sample size nj )?
The answer is yes. The simplicity of the BOIN design (i.e., making decisions
by comparing π̂j with λe and λd ) might lead one to think that the BOIN
decision rule does not consider the variance of π̂j (or equivalently, the sample
size nj ). This, however, is not true. As shown by equation (3.3), the derivation
and minimization of the decision error α depend on the sampling distribution
of π̂j , and thus directly account for the uncertainty of π̂j . The fact that the
optimal decision boundaries are independent of nj should not be mistaken for
ignoring nj .
To help readers to understand this point, consider an experiment of draw-
ing balls with replacement from a bag of red and black balls. There are a total
of 9 balls in the bag, but we do not know if there are more red or black balls.
The objective is to determine if there are more red or black balls. The exper-
iment is to randomly draw a ball from the bag, record the color, put it back,
and repeat. Clearly, no matter whether we do the experiment 3 or 30 times, as
long as we see more red balls, the best decision is to claim that there are more
red balls. The only difference is that the decision based on 30 experiments
has a smaller decision error, although both minimize the decision error. This
is exactly how the Bayes classifier works: it minimizes the probability of a
wrong decision, thereby attaining the Bayes error rate (Berger, 2013).
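The ball-drawing argument can be quantified with an exact binomial calculation. The sketch below assumes, purely for illustration, that the bag holds 5 red and 4 black balls, and it counts a tie as a failure to declare "more red"; the function name is hypothetical. The decision rule (pick the color seen more often) is the same for 3 and 30 draws, and only its error probability changes.

```python
from math import comb

def error_prob(n_draws, p_red):
    """P(red is NOT drawn more often than black) in n_draws draws with replacement."""
    return sum(comb(n_draws, k) * p_red**k * (1 - p_red) ** (n_draws - k)
               for k in range(0, n_draws // 2 + 1))

p_red = 5 / 9  # ASSUMED composition: 5 red and 4 black balls
err_3 = error_prob(3, p_red)
err_30 = error_prob(30, p_red)
print(f"error with  3 draws: {err_3:.3f}")
print(f"error with 30 draws: {err_30:.3f}")
```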
The answer is no. For the purpose of dose finding, the loss of efficiency due to
the use of local data is minimal, and mostly ignorable. This is because unlike
most statistical inferential procedures, dose finding is a sequential decision-
making process, escalating from low doses to high doses. Suppose that the
current dose level is j. In order to reach j, the data observed previously at lower
doses (i.e., < j) must indicate that these doses are safe and substantially lower
than the MTD (e.g., 0/3 DLT). Thus, these data provide little information
to determine whether the current dose j is below, equal (or sufficiently close)
FIGURE 3.4: Panel (a): 25 randomly selected dose–toxicity curves with six
picked curves showing different shapes; Panel (b): Distribution of the DLT
probabilities by dose level from the 1000 randomly generated scenarios.
target DLT rate φ and J dose levels, the random scenarios were generated as
follows:
1. Select one of the J dose levels as the MTD with equal probabilities.
2. Sample M ∼ Beta(max{J − j, 0.5}, 1), where j denotes the selected
MTD level, and set an upper bound B = φ + (1 − φ)M for the
toxicity probabilities.
3. Repeatedly sample J toxicity probabilities uniformly on [0, B] until
these correspond to a scenario in which dose level j is MTD.
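The three steps above can be sketched as follows. Two points are assumptions of this example rather than statements from the text: "dose level j is MTD" is interpreted as dose j having the toxicity probability closest to φ, and the sampled probabilities are sorted so that the curve is increasing. As a practical guard that is also not part of the original algorithm, M is redrawn if no valid scenario is found after `max_tries` attempts. The function name is hypothetical.

```python
import random

def random_scenario(J, phi, rng, max_tries=10000):
    """Return (j, probs): the MTD level (1-based) and J increasing DLT probabilities."""
    j = rng.randint(1, J)                              # step 1: pick the MTD level
    while True:
        M = rng.betavariate(max(J - j, 0.5), 1.0)      # step 2: draw M
        B = phi + (1 - phi) * M                        #         upper bound
        for _ in range(max_tries):                     # step 3: rejection sampling
            probs = sorted(rng.uniform(0, B) for _ in range(J))
            closest = min(range(J), key=lambda i: abs(probs[i] - phi))
            if closest == j - 1:
                return j, probs
        # Guard (assumption): no hit after max_tries, so redraw M and try again.

rng = random.Random(2023)
scenarios = [random_scenario(J=5, phi=0.25, rng=rng) for _ in range(20)]
for j, probs in scenarios[:3]:
    print(j, [round(p, 3) for p in probs])
```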
Figure 3.4 panels (a) and (b) show 25 randomly selected scenarios and dis-
tributions of the DLT probabilities by dose level from the 1000 scenarios,
respectively. It can be seen that the simulated dose–toxicity curves cover var-
ious shapes and a wide range of toxicity probabilities. The algorithm above
guarantees that the generated dose–toxicity curves are monotonically increas-
ing (i.e., higher doses have higher toxicity rates). For each scenario, 2000 trials
were simulated.
Performance metrics
• Accuracy
A1. The percentage of correct selection (PCS), defined as the percentage of
simulated trials in which the target dose is correctly selected as MTD. When
all the dose levels are above MTD (i.e., the DLT probability of the lowest
dose > φ + 0.1), PCS is defined as the percentage of early termination of
trials.
A2. The average percentage of patients who are assigned to MTD across
the simulated trials. When all the dose levels are above MTD (i.e., the DLT
probability of the lowest dose > φ + 0.1), the average percentage of patients
not enrolled into the trial is used for this metric.
• Safety
B1. The percentage of simulated trials in which a toxic dose with the true
DLT probability > 33% is selected as MTD when the target φ = 25%.
B2. The average percentage of patients assigned to the toxic doses with true
DLT probabilities > 33% when the target φ = 25%.
• Reliability
C1. The risk of overdosing, defined as the percentage of simulated trials with
more than 50% of patients treated at doses above MTD.
C2. The risk of poor allocation, defined as the percentage of simulated trials
in which fewer than six patients are treated at MTD.
C3. The risk of irrational dose assignment, defined as the percentage of times
that the design fails to de-escalate the dose when 2/3 or > 3/6 patients had
DLTs at a dose.
Reliability metrics C1 to C3 measure the likelihood of a design demonstrat-
ing problematic behaviors (e.g., treating 50% or more patients at toxic doses,
or fewer than six patients at MTD) that have severe clinical consequences.
These metrics are of great practical importance, but unfortunately are often
overlooked in the literature. The reliability metrics are not covered by other
metrics. For example, the percentage of patients overdosed (i.e., metric B2)
does not cover the risk of overdosing (i.e., metric C1). Two designs can have
similar percentages of patients overdosed, but rather different risks of over-
dosing 50% of the patients. Statistically, metric B2 measures the mean of
overdosing, while metric C1 measures the tail probability of overdosing.
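The distinction between the mean (metric B2) and the tail probability (metric C1) can be illustrated with made-up numbers. In the sketch below, each list holds the fraction of patients overdosed in ten simulated trials of a hypothetical design; the data and function names are invented for this example.

```python
def mean_overdose(fractions):
    """Metric B2 analogue: average fraction of patients overdosed."""
    return sum(fractions) / len(fractions)

def risk_of_overdosing(fractions, threshold=0.5):
    """Metric C1 analogue: share of trials with more than 50% of patients overdosed."""
    return sum(f > threshold for f in fractions) / len(fractions)

# Hypothetical per-trial overdose fractions for two designs.
design_a = [0.3] * 10               # every trial overdoses 30% of patients
design_b = [0.0] * 5 + [0.6] * 5    # half the trials overdose 60% of patients

for name, data in (("A", design_a), ("B", design_b)):
    print(name, round(mean_overdose(data), 3), risk_of_overdosing(data))
```

Both hypothetical designs overdose 30% of patients on average, yet only design B ever overdoses more than half of the patients in a trial.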
Results
Accuracy Panels A1 and A2 in Figure 3.5 show distributions of the PCS
and the average percentages of patients treated at MTD, respectively, for the
investigational designs relative to the 3+3 design across 1000 scenarios. That
is, the values displayed in the figure are the difference between those of a
specified design and the reference (i.e., 3+3 design). For example, PCS = 0
means that the design has the same PCS as the 3+3 design. As each dose–
toxicity scenario generates a value of the performance metric (e.g., PCS), there
are a total of 1000 values for each of the metrics across the 1000 scenarios.
The boxplot reflects the distribution of the metric across the 1000 scenarios. In
terms of the accuracy of correctly selecting the MTD, the CRM, mTPI, BOIN,
and keyboard designs are comparable and substantially outperform the 3+3
design. The BLRM and EWOC designs perform the worst, with the average
PCS similar to that of the 3+3 design. The EWOC design also has the largest
variation in PCS. The results for the number of patients treated at MTD are
similar to those for PCS. The CRM, mTPI, BOIN, and keyboard designs are
generally comparable and substantially outperform the 3+3 design. The mTPI
and CRM designs allocate slightly more patients to MTD than the BOIN and
keyboard designs, but the latter two designs are less variable, as shown by
the shorter boxes in the box plot. mTPI is less robust than the BOIN and
keyboard designs. For example, when the target φ = 20%, mTPI has notably
lower PCS than the BOIN and keyboard designs; see Zhou et al. (2018b) for
details.
FIGURE 3.5: Accuracy and safety of the eight designs with respect to the 3+3
design, including (A1) percentage of correct selection of MTD, (A2) percentage
of patients treated at MTD, (B1) percentage of selecting a dose with the DLT
probability ≥ 33% as MTD, and (B2) percentage of patients treated at doses
with DLT probabilities ≥ 33%. For (A1) and (A2), a larger value indicates
better performance; a positive value means that the design outperforms the
3+3 design. For (B1) and (B2), a smaller value indicates better performance;
a negative value means that the design outperforms the 3+3 design.
Safety As shown in Figure 3.5 panel B1, the CRM, mTPI, BOIN, and key-
board designs are comparable in terms of the percentage of selecting a toxic
dose (with a DLT probability ≥ 33%) as MTD, but CRM and mTPI are
slightly more variable than the BOIN and keyboard designs. The BLRM and
EWOC designs are the most conservative and least likely to select a toxic dose
as MTD. In terms of the percentage of patients treated at a toxic dose with
a DLT probability ≥ 33%, on average the CRM, mTPI, BOIN, and keyboard
designs are comparable, but BOIN and keyboard show smaller variations (Fig-
ure 3.5 panel B2).
Reliability In terms of the risk of overdosing 50% or more of the patients
(Figure 3.6, panel C1), the BLRM, BOIN, and keyboard designs perform the
best. The performances of the CRM and mTPI designs are similar and rank
FIGURE 3.6: Reliability of the eight designs with respect to the 3+3 design,
including (C1) risk of overdosing 50% or more patients, (C2) risk of treating
< 6 patients at the MTD, and (C3) risk of irrational dose assignments. A
smaller value indicates better performance; a negative value means that the
design outperforms the 3+3 design.
in between the performances of these other designs. The EWOC design has
an average risk of overdosing patients similar to that of the BOIN and keyboard
designs, but is much more variable. Of note, the CRM, mTPI, BOIN, and
keyboard designs, on average, overdose similar percentages of patients (Figure
3.5, panel B2), but have different risks of overdosing 50% or more of the patients
(Figure 3.6, panel C1). This indicates that the risk of overdosing (50% or more
patients) and the average percentage of patients overdosed indeed measure
different aspects of a design, and it is thus important to consider both metrics
when evaluating a design. In terms of the risk of poor allocation (i.e., treating
fewer than six patients at the MTD, see Figure 3.6, panel C2), BLRM and
EWOC perform the worst, with a significantly higher risk than the other
designs. The CRM, BOIN, and keyboard designs have comparable risks of
poor allocation.
In terms of the risk of irrational dose assignment (Figure 3.6, panel C3), the
model-assisted designs outperform the model-based designs. The model-based
designs (i.e., CRM, BLRM, and EWOC) have an 8% to 55% chance of failing
to de-escalate the dose when ≥ 2/3 or ≥ 3/6 patients have DLTs, whereas such
irrational dose assignments never occur in the mTPI, BOIN, and keyboard
designs. The model-based designs rely on the assumed model to make the
decision of dose assignment. When the model is misspecified, the estimates
can be biased and thus irrational dose assignment arises. The model-assisted
designs are free of that issue because they do not impose any model assumption
on the dose–toxicity curve. For example, by its dose escalation/de-escalation
rule, the BOIN design guarantees de-escalating the dose if the observed DLT
rate at the current dose is higher than 29.8%, given the target DLT rate of
25%.
In summary, the model-assisted designs (e.g., the BOIN and keyboard
designs) substantially outperform the algorithm-based 3+3 design in the ac-
curacy of identifying MTD and allocating patients to MTD. They produce
competitive accuracy and safety comparable to the model-based designs (e.g.,
CRM), but are much simpler and more transparent. In addition, the model-
assisted designs are more robust, and avoid the irrational dose assignment of
the model-based designs due to model misspecification. Among the model-
assisted designs, BOIN stands out. Its operating characteristics are similar to those of
the keyboard design, but it is simpler, more flexible, and more transparent. The mTPI
design is not recommended because of its poor reliability and safety concerns.
Case study
Solid Tumor Dose Finding Trial The objective of this phase I trial
(ClinicalTrials.gov Identifier: NCT03725436) was to determine MTD for the
FIGURE 3.8: The decision tree of the BOIN Suite software to assist users to
choose an appropriate BOIN design module.
We use this trial as an example to illustrate the use of the BOIN software,
and provide guidance to address some common design issues in phase I trials.
The Keyboard Suite web application has a user interface similar to that of the
BOIN Suite, so we focus on the latter here. After selecting and launching
the “BOIN/iBOIN” module from the BOIN Suite launchpad, we design the
trial using the following three steps:
FIGURE 3.9: Specify doses, sample size, and convergence stopping rule.
dose is believed to be safe, starting from a slightly higher dose level (e.g.,
two) may reduce the sample size as it allows the design to reach MTD sooner.
However, due to limited knowledge on the safety of the new drug, in general
it is not recommended to start from a high dose level (e.g., four).
The cohort size for the trial is three and the number of cohorts is 10, with
a total sample size of 30. As a rule of thumb, we recommend the maximum
sample size N = 6 × J (i.e., the maximum sample size of the 3+3 design) as
the total sample size, where J is the number of doses. This sample size gener-
ally yields reasonable operating characteristics (e.g., 50–70% correct selection
percentage of the true MTD). To reduce the sample size, it is often useful
to use the “convergence” stopping rule: stop the trial early when m patients
have been assigned to a dose and the decision is to stay at that dose. This
stopping criterion suggests that the dose finding approximately converges to
MTD, thus the trial can be stopped. We recommend m = 9 or larger. In this
trial, m = 12 is used. Because of the early stopping rule, the actual sample
size used in the trial is often smaller than the prespecified maximum sample
size N . The saving depends on the true dose–toxicity scenario and can be
evaluated using simulation. Usually, the saving in sample size is more
prominent when the true MTD is near the starting dose.
The choice of the cohort size should be based on considerations of de-
sign performance and logistic complexity. Given a fixed total sample size, the
use of large cohort sizes will result in fewer cohorts. This reduces the logisti-
cal burden and requires fewer dose escalation/de-escalation decisions for trial
conduct. As a tradeoff, using a large cohort size often reduces the accuracy
of identifying the MTD and evaluating trial safety, because trials based on
a smaller number of cohorts tend to be less adaptive. For example, given a
total sample size of 30 patients, using a cohort size of six patients results in
a total of five cohorts. This means that during the trial, we only have four
chances to escalate/de-escalate the dose. If MTD is dose level five or six, and
if the dose escalation starts from dose level one and no dose skipping is per-
mitted, then there is a high likelihood that we will not be able to reach MTD
because many patients are treated at lower doses. In addition, using a cohort
size of six may expose up to six patients to an overly toxic dose before
dose de-escalation is made. In contrast, using small cohort sizes, such as one
or two patients per cohort, gives the trial more freedom to move up and
down between doses and be more responsive/adaptive to the observed data.
This is logistically more complicated, however, because both data and dose
escalation/de-escalation decisions need to be updated more frequently. In addition,
it prolongs the trial duration as new patients cannot be enrolled until
the patients in the previous cohort have completed their DLT assessment. As a
result, the most commonly-used cohort size in practice is two to four patients
per cohort.
Target and Accelerated Titration As shown in Figure 3.10, the target DLT
probability is φ = 0.25, which should be elicited from clinicians. The target
φ can be adjusted if a specific safety requirement is desirable. For example,
if it is desirable to de-escalate the dose when the DLT rate is > 30%, then
φ = 25% is an appropriate target with the de-escalation boundary λd =
0.298. As described previously, such simple mapping between the target and
escalation/de-escalation boundaries is a unique and important advantage of
BOIN.
Given φ, although the software allows users to specify φ1 and φ2 (see Sec-
tions 3.4.2 and 3.4.3 for their definition and interpretation) by unchecking the
FIGURE 3.10: Specify the target, accelerated titration, and 3+3 design run-in
(available only when the target φ = 0.25).
box “use the default alternatives to minimize decision error” under the target
toxicity probability field, we highly recommend using the default values
(i.e., φ1 = 0.6φ and φ2 = 1.4φ) provided by the software. These default values
have been shown to produce highly robust and desirable operating character-
istics. Nevertheless, when necessary, the values of φ1 and φ2 can be calibrated
to satisfy certain trial design goals. For example, if we do not want to modify
φ and prefer a more conservative design, we may set φ2 = 1.2φ to obtain a
lower de-escalation boundary λd . To this end, it is important to distinguish
(φ1 , φ2 ) from (λe , λd ), where φ1 and φ2 are the toxicity probabilities used to
minimize the decision errors, and λe and λd are the decision boundaries actu-
ally used to determine dose escalation and de-escalation. A practical way to
judge if (φ1 , φ2 ) specified by users is appropriate or not is to examine whether
the resulting (λe , λd ) makes practical and clinical sense. For example, given
φ = 0.25, if we set (φ1 , φ2 ) = (0.9φ, 1.1φ), the resulting (λe , λd )=(0.237, 0.262)
makes little sense because λe and λd are too close and nearly indistinguishable
under small sample sizes. The fundamental issue here is that it is not meaningful
to minimize the decision error for hypotheses (i.e., φ vs. φ1 = 0.9φ
vs. φ2 = 1.1φ) that the data have no power to distinguish. See Section 3.4.3
for more discussion.
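The mapping from (φ, φ1, φ2) to (λe, λd) can be checked numerically. The sketch below assumes the standard closed-form BOIN boundaries from Section 3.4, which are not reproduced in this part of the text; the function name boin_boundaries is hypothetical. Under that assumption it reproduces the values quoted above: λd = 0.298 for the default alternatives and (0.237, 0.262) for (0.9φ, 1.1φ).

```python
import math


def boin_boundaries(phi, phi1=None, phi2=None):
    """Return (lambda_e, lambda_d) for target DLT probability phi.

    Assumes the closed-form BOIN boundaries; phi1 and phi2 default to
    0.6*phi and 1.4*phi, the software defaults named in the text.
    """
    phi1 = 0.6 * phi if phi1 is None else phi1
    phi2 = 1.4 * phi if phi2 is None else phi2
    lam_e = math.log((1 - phi1) / (1 - phi)) / math.log(
        phi * (1 - phi1) / (phi1 * (1 - phi))
    )
    lam_d = math.log((1 - phi) / (1 - phi2)) / math.log(
        phi2 * (1 - phi) / (phi * (1 - phi2))
    )
    return lam_e, lam_d


# Default alternatives for phi = 0.25
print(boin_boundaries(0.25))
# Alternatives too close to the target: the stay interval nearly vanishes
print(boin_boundaries(0.25, 0.9 * 0.25, 1.1 * 0.25))
# A more conservative design: phi2 = 1.2*phi lowers lambda_d
print(boin_boundaries(0.25, phi2=1.2 * 0.25))
```

Inspecting the printed pairs is a quick way to judge whether a user-specified (φ1, φ2) yields boundaries that are far enough apart to make clinical sense.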
In addition to the “convergence” stopping rule described above, another
useful approach offered by the BOIN software for reducing the sample size is to
conduct accelerated titration (Simon et al., 1997) before treating patients
in cohorts of three (Figure 3.10). During the accelerated titration, we treat
patients in cohorts of one, and we continue escalating the dose in the one-
patient-per-dose-level fashion until any of the following events are observed:
(i) the first instance of DLT, (ii) the second instance of moderate (grade 2)
toxicity, or (iii) the highest dose level is reached. At that point, the titration
ends. We add two more patients to the current dose level, and hereafter switch
to the cohort size of three.
In the simulation, however, the software considers only (i) and (iii), ignor-
ing (ii), due to two practical considerations. First, incorporating (ii) in the
simulation requires users to specify the probability of grade 2 toxicity at each
dose level, which is cumbersome and brings substantial noise and uncertainty.
Second, the main concern of using the accelerated titration is that it may
make the design risky. The design ignoring (ii) is more aggressive than its
counterpart considering (ii); thus, if the operating characteristics of the former
are satisfactory, the trial with (ii) is safer.
As the accelerated titration generally leads to more aggressive dose esca-
lation, it should be used only when there is sufficient evidence that low doses
are most likely underdosing. When the accelerated titration is used, based on
our experience, a reasonable rule of thumb for the maximum sample size is
N = j* − 1 + 6(J − j* + 1), where j* is the dose level where the first DLT
is expected to occur. For example, if we expect that the first three doses are
very safe and the first DLT may occur in dose level j* = 4, then the maximum
required sample size N could be reduced to 15 when there are five dose levels.
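The two sample-size rules of thumb (N = 6 × J without titration, and N = j* − 1 + 6(J − j* + 1) with accelerated titration) are simple enough to encode directly. The following is a minimal sketch; the function name and argument names are hypothetical and not part of the BOIN software.

```python
def max_sample_size(num_doses, first_dlt_level=None):
    """Rule-of-thumb maximum sample size.

    num_doses       -- J, the number of dose levels
    first_dlt_level -- j*, the dose level where the first DLT is expected
                       (None means no accelerated titration)
    """
    if first_dlt_level is None:
        return 6 * num_doses
    if not 1 <= first_dlt_level <= num_doses:
        raise ValueError("first_dlt_level must be between 1 and num_doses")
    # One patient at each of the j* - 1 doses below j*, then six per remaining dose
    return (first_dlt_level - 1) + 6 * (num_doses - first_dlt_level + 1)


print(max_sample_size(5))      # no titration, five doses
print(max_sample_size(5, 4))   # titration, first DLT expected at dose level 4
```

With j* = 1 the titration formula reduces to 6 × J, so the two rules agree when no dose is expected to be skipped quickly.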
When the target φ = 0.25, the software provides an option “Apply the
3+3 design run-in” to embed the 3+3 design rule into the BOIN design. The
rationale for this option is that when φ = 0.25, the default BOIN de-escalation
boundary is λd = 0.298, which means de-escalating the dose when 1/3 or
2/6 patients have DLT. Due to the influence of the conventional 3+3 design,
in some cases, investigators prefer that the design stays at the current dose
when 1/3 patients have DLT. The 3+3 run-in option enables that and forces
the design to stay at the current dose when 1/3 patients have DLT. This option is
added mainly based on practical consideration, not statistical consideration.
Actually, the 3+3 design rule (stay when 1/3 patients have DLT, but de-escalate
when 2/6 have DLT) is not self-consistent; see Section 3.4.4 for the explanation of why using
the same fixed de-escalation boundary is optimal. Nevertheless, as the modification
only occurs when the cumulative number of patients is three, activating
the option generally has minor impact on the design operating characteristics.
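To make the effect of the run-in option concrete, the sketch below applies the interval rule for φ = 0.25 with and without it. It is illustrative only: λd = 0.298 is taken from the text, λe = 0.197 is assumed to be the corresponding default escalation boundary, and the function name and the run_in_3p3 flag are hypothetical rather than the software's own interface.

```python
LAMBDA_E = 0.197  # assumed default escalation boundary for phi = 0.25
LAMBDA_D = 0.298  # de-escalation boundary for phi = 0.25 (from the text)


def boin_decision(n, y, lam_e=LAMBDA_E, lam_d=LAMBDA_D, run_in_3p3=False):
    """Decision at the current dose after n patients with y DLTs."""
    if run_in_3p3 and n == 3 and y == 1:
        # Run-in override: stay at 1/3 DLT, only when three patients are treated
        return "stay"
    rate = y / n
    if rate <= lam_e:
        return "escalate"
    if rate > lam_d:
        return "de-escalate"
    return "stay"


for n in (3, 6, 9):
    row = [boin_decision(n, y) for y in range(n + 1)]
    print(n, row)
print(boin_decision(3, 1, run_in_3p3=True))
```

Because the override applies to a single cell of the decision table, the two versions of the design differ only at (n, y) = (3, 1), which is consistent with the minor impact on operating characteristics noted above.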
Overdose Control This panel (see Figure 3.11) specifies the overdose con-
trol rule, described in Section 3.4.1. That is, if Pr(πj > φ | Dj ) > 0.95 and
nj ≥ 3, dose level j and higher are eliminated from the trial, and the trial is
terminated if the lowest dose level is eliminated. When the trial is terminated
due to toxicity, no dose should be selected as MTD. In general, we recommend
using the default probability cutoff 0.95. A smaller value (e.g., 0.9) results in
stronger overdose control, but at the cost of reducing the probability of cor-
rectly identifying MTD. This is because, in order to correctly identify MTD,
it is imperative to explore the doses sufficiently to learn their toxicity profile.
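The overdose control rule can be tabulated in advance. The sketch below assumes a uniform Beta(1, 1) prior, so that the posterior is Beta(y + 1, n − y + 1); with integer parameters the tail probability can be computed exactly from a binomial sum, which avoids any external library. The function names are hypothetical.

```python
from math import comb


def prob_toxic(n, y, phi):
    """Pr(pi > phi | D) when pi | D ~ Beta(y + 1, n - y + 1).

    Uses the identity Pr(Beta(a, b) > x) = Pr(Binomial(a + b - 1, x) <= a - 1)
    for positive integers a and b.
    """
    m = n + 1
    return sum(comb(m, i) * phi**i * (1 - phi) ** (m - i) for i in range(y + 1))


def elimination_boundary(n, phi, cutoff=0.95):
    """Smallest number of DLTs among n patients that eliminates the dose."""
    if n < 3:
        return None  # the rule requires at least three patients
    for y in range(n + 1):
        if prob_toxic(n, y, phi) > cutoff:
            return y
    return None


for n in (3, 6, 9):
    print(n, elimination_boundary(n, phi=0.25))
# A smaller cutoff gives stronger overdose control
print(elimination_boundary(3, phi=0.25, cutoff=0.90))
```

Comparing the boundaries under cutoffs 0.95 and 0.90 shows directly how a smaller cutoff eliminates doses on less evidence, which is the source of the loss in MTD selection accuracy.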
In some trial settings, under the null case that all doses are overly toxic
(i.e., the lowest/first dose is above the MTD), the probability of early trial
termination may not be as high as we desire (e.g., > 70%). That is simply
because the small sample size cannot provide enough power to distinguish
whether a dose is overly toxic or not (e.g., 40% vs. 30%). To achieve a high
early termination probability (when the first dose is overly toxic), we can
activate the first option “Check to impose a more stringent safety stopping
rule on the lowest dose” (see Figure 3.12). This option makes the lowest/first
dose more likely to be eliminated by lowering the probability cutoff by δ. The
default value δ = 0.05 is generally recommended and produces a good balance
between the safety and the accuracy to identify MTD. A large value of δ (e.g.,
0.1) increases the early termination probability (when all doses are toxic), but
at the cost of reducing the probability of correctly identifying MTD when it
is the lowest dose.
At the end of the trial, the BOIN design selects MTD as the dose whose
isotonic estimate of the DLT probability is closest to φ. In some cases, it may
be desirable to require that the DLT probability estimate of MTD be lower
than the de-escalation boundary λd . This can be done by activating the option
at the bottom of the “Overdose Control” panel (i.e., by checking the box
“Check to ensure p̂MTD < de-escalation boundary.”)
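The MTD selection step can be sketched with a plain pool-adjacent-violators algorithm. This is a simplified illustration, not the software's implementation: it weights each dose by its sample size, ignores untried doses, and breaks ties toward the lower dose, all of which are assumptions made here. The function names are hypothetical.

```python
def pava(values, weights):
    """Weighted pool-adjacent-violators fit (nondecreasing)."""
    blocks = []  # each block is [mean, total weight, number of doses pooled]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool backwards while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def select_mtd(n, y, phi):
    """Return the 1-based dose level whose isotonic estimate is closest to phi."""
    tried = [j for j, nj in enumerate(n) if nj > 0]
    if not tried:
        return None
    raw = [y[j] / n[j] for j in tried]
    iso = pava(raw, [n[j] for j in tried])
    best = min(range(len(tried)), key=lambda i: abs(iso[i] - phi))
    return tried[best] + 1


# Hypothetical data: patients treated and DLTs observed at four dose levels
print(select_mtd(n=[3, 6, 9, 3], y=[0, 1, 2, 2], phi=0.25))
```

The optional constraint p̂MTD < λd could be added by filtering the candidate doses before taking the minimum.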
After completing the specification of trial parameters, the decision table
will be generated by clicking the “Get Decision Table” button. The decision
table will automatically be included in the protocol template in Step 3, but it
can also be saved as a separate csv, Excel, or pdf file in this step if needed.
FIGURE 3.14: Operating characteristics of the BOIN design for the solid
tumor trial.
included as a table in the protocol template in the next step, but can also be
saved as a separate csv or Excel file if needed.
FIGURE 3.16: Estimate and identify MTD at the completion of the solid
tumor trial based on a hypothetical dataset.
4
Drug-Combination Trials
4.1 Introduction
Drug combination therapy provides an effective approach to improving treat-
ment efficacy and overcoming most cancers’ resistance to monotherapy. The
objectives of using drug combinations are to induce an additive or a syner-
gistic treatment effect, increase the joint dose intensity with non-overlapping
toxicities, and target various tumor cell susceptibilities and disease pathways.
Despite the enormous importance of combination therapies, statistical designs
currently used for dose finding in phase I trials of combination therapies are
grossly inefficient and rudimentary—most combination trials have used the
conventional 3+3 design (Riviere et al., 2015a). The objective of this chapter
is to address the challenges and clarify misconceptions in designing combina-
tion therapy trials and introduce more efficient designs, in particular model-
assisted designs, for dose finding in phase I drug-combination trials.
In general, drug combinations may involve one of the following: two or
more previously marketed drugs or biologics, two or more new molecular en-
tities, or a mix of previously marketed drugs or biologics and new molecular
entities. According to the US Food and Drug Administration (FDA) guide-
lines (FDA, 2006, 2013), prior to testing a new drug combination in human
beings, extensive preclinical studies are required to demonstrate the biologi-
cal rationale for the combination and to assess the safety of the combination
(FDA, 2006). When such data are not available or indicate safety concerns
for the combination, additional toxicology studies are required to address the
concerns. Sometimes drug-combination trials may involve two or more new in-
vestigational drugs that have not been previously studied for any indication.
In such cases, additional considerations are needed for the co-development of
the new investigational drugs for use in combination (FDA, 2013). In drug-
combination trials, it is useful to test multiple doses of each drug to identify the
optimal dose combination in terms of risks and benefits (FDA, 2013). Com-
pared to single-agent trials, drug-combination trials have a higher dimension
for the dose searching space, leading to several unique challenges.
The major challenge in designing combination trials is that the dose com-
binations under investigation are only partially ordered by the dose-limiting
toxicity (DLT) probability. This is in contrast with monotherapy trials, for
which the doses under investigation are fully ordered by the DLT probability
DOI: 10.1201/9780429052781-4
FIGURE 4.1: Partial ordering (left) and toxicity contours (right) for drug
combinations.
(i.e., the higher the dose, the greater the DLT probability). Consider a trial
combining J doses of agent A, denoted as A1 < A2 < · · · < AJ , and K doses of
agent B, denoted as B1 < B2 < · · · < BK . Let Aj Bk denote the combination
of Aj and Bk . Often it is reasonable to assume that when the dose of agent
A is held constant, the DLT probability for the combination increases in the
dose of agent B, and vice versa. As shown in the left panel of Figure 4.1, the
rows and columns of the dose combination matrix are partially ordered, with
the DLT probability increasing in the dose of the corresponding agent when
the dose of the other agent is fixed. However, in other directions of the dose
matrix (e.g., along the diagonals from the upper left corner to the lower right
corner), the ordering is not clear due to unknown drug-drug interactions. For
example, a priori we do not know whether the DLT probability for A2 B2 is
higher than the DLT probability for A1 B3 or A3 B1 .
The partial ordering of dose combinations has several implications. First,
monotherapy designs for finding the maximum tolerated dose (MTD), de-
scribed in previous chapters, cannot directly be used for finding MTD of a
drug combination. The second important implication is that there is not just
a single MTD. Rather, as depicted in the right panel of Figure 4.1, there is an
MTD contour in the two-dimensional dose space. Therefore, multiple MTDs
may exist in the J ×K dose matrix. When designing a drug-combination trial,
one must decide whether to look for a single MTD or multiple MTDs. In some
settings, it can be advantageous to find multiple MTDs so that we can further
study which one yields the highest synergistic treatment effect. We begin this
chapter with designs that look for a single MTD, and we finish with designs
that target multiple MTDs.
Before describing the designs, we establish some notation. Let πjk denote
the DLT probability for dose combination Aj Bk , and let φ denote the target
DLT probability for MTD. We use njk to denote the number of patients who
have been assigned to Aj Bk , and yjk to denote the number of DLTs observed
at Aj Bk , j = 1, . . . , J and k = 1, . . . , K. Therefore, at a particular point
during the trial, the observed data are D = {Djk , j = 1, . . . , J, k = 1, . . . , K},
where Djk = (njk , yjk ) are the data observed at Aj Bk .
$$\pi_{jk} = 1 - \left\{(1 - p_j^{\alpha})^{-\gamma} + (1 - q_k^{\beta})^{-\gamma} - 1\right\}^{-1/\gamma}, \tag{4.1}$$
$$\pi_j = p_j^{\alpha}, \quad j = 1, \ldots, J.$$
where f (θ) is the prior distribution of θ. For example, f (θ) = f (α)f (β)f (γ)
in the copula-type regression model, where f (α), f (β), and f (γ) denote inde-
pendent, vague gamma prior distributions with mean one and large variances
for α, β, and γ, respectively.
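The copula-type model (4.1) is easy to evaluate for given parameter values. In the sketch below, p_j and q_k are taken to be prespecified single-agent DLT probabilities for agents A and B; this interpretation is an assumption, because their definition falls outside this part of the text. The function name is hypothetical.

```python
def copula_dlt_prob(p_j, q_k, alpha, beta, gamma):
    """DLT probability of combination (A_j, B_k) under model (4.1).

    Requires 0 <= p_j, q_k < 1 and alpha, beta, gamma > 0.
    """
    a = 1.0 - p_j**alpha
    b = 1.0 - q_k**beta
    return 1.0 - (a ** (-gamma) + b ** (-gamma) - 1.0) ** (-1.0 / gamma)


# When agent B contributes no toxicity (q_k = 0), the model reduces to p_j ** alpha
print(copula_dlt_prob(0.2, 0.0, alpha=2.0, beta=1.0, gamma=0.5))
# Adding agent B increases the DLT probability
print(copula_dlt_prob(0.2, 0.1, alpha=1.0, beta=1.0, gamma=1.0))
```

The first call illustrates why the single-agent marginal takes the form p_j raised to the power α: setting q_k = 0 collapses the expression in braces to (1 − p_j^α)^(−γ).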
Based on the posterior distribution of the model parameters, the dose-
finding algorithm can be described as follows:
1. The first cohort of patients is treated at the lowest dose combination
A1 B1 .
2. Suppose the current dose combination is Aj Bk . To determine the
dose for the next cohort, consider the following:
(i) If Pr(πjk < φ|D) > ce , where ce is the fixed probability cut-
off for dose escalation, the dose for the next cohort of pa-
tients moves to an adjacent dose combination chosen from
{Aj+1 Bk , Aj+1 Bk−1 , Aj−1 Bk+1 , Aj Bk+1 }, which has a DLT
probability higher than that of the current dose and closest to φ. If
the current dose combination is AJ BK , the dose stays at the
same dose combination.
(ii) If Pr(πjk < φ|D) < cd , where cd is the fixed probability
cutoff for dose de-escalation, the dose moves to an adjacent
dose combination chosen from {Aj−1 Bk , Aj−1 Bk+1 , Aj+1 Bk−1 ,
Aj Bk−1 }, which has a DLT probability lower than that of the current
dose and closest to φ. If the current dose combination is A1 B1 ,
the trial is terminated.
(iii) Otherwise, the next cohort of patients continues to be treated
at the current dose combination.
3. Repeat Step 2 until the maximum sample size is reached, and select
MTD as the dose whose estimate of DLT probability is closest to
φ.
The model-based designs perform reasonably well, but for several reasons
they are rarely used in practice. First, these designs are statistically and computationally
complicated, leading many practitioners to perceive that decisions
of dose allocation arise from a “black box,” which limits their application
in practice. Second, robustness is another potential issue for the model-based
drug-combination designs. Since these designs use a strategy akin to CRM, one
might expect them to share similar robustness (e.g., consistent under misspecified
models (Shen and O’Quigley, 1996)). Unfortunately, that generally
is not the case. The consistency of CRM under misspecified models requires
several assumptions (Shen and O’Quigley, 1996). A critical one is monotonic-
ity (i.e., the DLT probability monotonically increases with the dose), which
does not hold for drug combinations. Based on our experience, model-based
drug-combination trial designs are substantially more delicate, and it is not
difficult to find scenarios where such designs do not perform well.
When the BOIN rule says escalate, we escalate to the dose combination that
belongs to AE and has the highest value of Pr(πjk ∈ (λe , λd )|Djk ); and when
the BOIN rule says de-escalate, we de-escalate to the dose combination that
belongs to AD and has the highest value of Pr(πjk ∈ (λe , λd )|Djk ). That is,
we always move toward the dose that is most likely to be in the acceptable
(or “stay”) interval (λe , λd ). The value of Pr(πjk ∈ (λe , λd )|Djk ) can be easily
evaluated based on the beta-binomial model
with the posterior πjk | Djk ∼ Beta(yjk + 1, njk − yjk + 1). The BOIN combi-
nation design is summarized in Table 4.1.
Because Pr{πjk ∈ (λe , λd )|Djk } can be pre-determined for all possible
outcomes Djk = (njk , yjk ), the dose escalation and de-escalation rule in Step
(a) Patients in the first cohort are treated at the lowest dose combina-
tion A1 B1 or a prespecified dose combination.
(b) Suppose the current cohort is treated at dose combination Aj Bk ;
to assign a dose to the next cohort of patients:
• If π̂jk ≤ λe , we escalate the dose to the combination that
belongs to AE and has the largest value of Pr{πj′k′ ∈
(λe , λd )|Dj′k′ }. If the current dose combination is AJ BK , then
we retain this dose for treating the next cohort of patients.
• If π̂jk > λd , we de-escalate the dose to the combination
that belongs to AD and has the largest value of Pr{πj′k′ ∈
(λe , λd )|Dj′k′ }. If the current dose combination is A1 B1 , then
we retain this dose for treating the next cohort of patients.
• Otherwise, if λe < π̂jk ≤ λd , then the dose stays at the same
combination Aj Bk .
(c) Repeat Step (b) until the maximum sample size N is reached, and
select MTD as the dose combination whose isotonic estimate (Bril
et al., 1984b) of the DLT probability is closest to φ.
(b) can be tabulated before the trial begins, which makes the BOIN combi-
nation design easy to implement in practice. One important characteristic of
the dose transition rule of the BOIN combination design is that, to make the
decision, what really is needed is the ordering of Pr{πjk ∈ (λe , λd )|Djk } for
doses within AE and AD , not their absolute values. Thus, for the purpose
of decision making, we only need to tabulate the rank of each possible Djk
according to the value of Pr{πjk ∈ (λe , λd )|Djk }. This greatly simplifies the
decision table. We refer to the rank of a dose with Djk as the desirability score
of that dose. Table 4.2 shows the desirability score for njk up to 12, with the
cohort size of 3 and the target DLT probability φ = 0.3. A larger value in-
dicates a more desirable dose with a larger value of Pr{πjk ∈ (λe , λd )|Djk }.
To conduct the trial, there is no need for any model fitting or complicated
calculation (as required by model-based designs). Users simply look up the
desirability score table to make the dose escalation/de-escalation decision.
To illustrate the use of the decision table, suppose that at the current dose
A1 B1 , we observed π̂11 = 1/6 < λe = 0.236 and thus need to escalate the
dose. Assume that at this point, the observed data at A2 B1 and A1 B2 are
D21 = (0, 0) and D12 = (3, 1), respectively. To determine which dose the trial
should be escalated to, we simply look up Table 4.2 and identify that the
desirability scores of A2 B1 and A1 B2 are 25 and 40, respectively. As A1 B2
TABLE 4.2: Desirability score table for the BOIN combination design with
the target φ = 0.3. A larger value indicates a higher desirability.
has a higher desirability score, the next cohort of patients will be treated at
A1 B2 . In the case that A2 B1 and A1 B2 have the same desirability scores (e.g.,
both doses have not yet been used to treat any patients, or have been used to
treat patients and generated the same data), we can choose one dose randomly or
based on other clinical considerations. The BOIN combination decision table,
such as Table 4.2, can be generated using the software described later, and
included in the trial protocol for trial conduct.
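The ordering behind the desirability scores can be reproduced by computing Pr{πjk ∈ (λe , λd )|Djk } under the Beta(yjk + 1, njk − yjk + 1) posterior. The sketch below is illustrative only: λe = 0.236 is taken from the text, λd = 0.358 is assumed to be the matching default boundary for φ = 0.3, the dose elimination rule (score “E”) is not handled, and the function names are hypothetical. It compares posterior probabilities directly rather than looking up tabulated ranks.

```python
from math import comb

LAMBDA_E = 0.236  # escalation boundary for phi = 0.3 (from the text)
LAMBDA_D = 0.358  # assumed default de-escalation boundary for phi = 0.3


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a, b (binomial identity)."""
    m = a + b - 1
    return sum(comb(m, i) * x**i * (1 - x) ** (m - i) for i in range(a, m + 1))


def stay_interval_prob(n, y, lam_e=LAMBDA_E, lam_d=LAMBDA_D):
    """Posterior probability that the DLT probability lies in (lam_e, lam_d)."""
    a, b = y + 1, n - y + 1
    return beta_cdf(lam_d, a, b) - beta_cdf(lam_e, a, b)


def pick_next(candidates):
    """candidates maps a dose label to its data (n, y); return the best label."""
    return max(candidates, key=lambda name: stay_interval_prob(*candidates[name]))


# Worked example from the text: A2B1 untried, A1B2 with 1 DLT in 3 patients
print(pick_next({"A2B1": (0, 0), "A1B2": (3, 1)}))
```

Ranking all feasible (njk , yjk ) pairs by stay_interval_prob would give a table with the same ordering as the desirability scores, since only the order matters for the decision.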
Like the BOIN single-agent design, during the trial conduct the BOIN
combination design imposes the dose elimination/early stopping rule such
that: if Pr(πjk > φ | Djk ) > 0.95 and njk ≥ 3, dose combination Aj Bk and
the higher combinations (i.e., {Aj′ Bk′ , j′ ≥ j, k′ ≥ k}), are eliminated from the
trial, and the trial is terminated if the lowest dose combination is eliminated,
where Pr(π11 > φ | D11 ) is evaluated based on the posterior distribution
(3.2). The letter “E” in Table 4.2 reflects this dose elimination rule such that
a dose with a desirability score of “E” is excessively toxic and should not be
considered in the admissible escalation/de-escalation set.
The BOIN combination design employs the same optimal escalation and
de-escalation boundaries as the BOIN single-agent design to guide the dose
transition, thus it inherits the latter’s desirable statistical and operational
properties. It is simple, transparent, easy-to-calibrate, and efficient to identify
MTD. In addition, because no parametric assumption is made on the dose–
toxicity surface, the BOIN combination design is more robust than model-
based designs. Extensive simulation studies have demonstrated that the BOIN
combination design yields competitive performance comparable to more com-
plicated model-based designs, see Section 4.4 for more details.
(a) Patients in the first cohort are treated at the lowest dose combina-
tion A1 B1 or a prespecified dose combination.
(b) Suppose the current cohort is treated at dose combination Aj Bk ,
given the observed data Djk = (njk , yjk ), we identify the strongest
key Imax based on the posterior distribution of πjk and assign a
dose to the next cohort of patients as follows:
one-dimensional searching lines (i.e., subtrials) such that the BOIN single-
agent design can be directly applied.
As illustrated in Figure 4.2, the waterfall design partitions the J × K
dose matrix into J subtrials (or blocks), within which the doses are fully
ordered. Without loss of generality, we assume that J ≤ K. These subtrials
are conducted sequentially from the top of the matrix to the bottom, which is
where the design gets its waterfall name. The goal of the waterfall design is to
find the MTD contour, which is equivalent to finding MTD in each row of the
dose matrix, if one exists. The waterfall design can be described as follows:
1. Divide the J ×K dose matrix into J subtrials SJ , · · · , S1 , according
to the dose level of drug A:
SJ = {A1 B1 , A2 B1 , · · · , AJ B1 , AJ B2 , · · · , AJ BK },
SJ−1 = {AJ−1 B2 , AJ−1 B3 , · · · , AJ−1 BK },
SJ−2 = {AJ−2 B2 , AJ−2 B3 , · · · , AJ−2 BK },
···
S1 = {A1 B2 , A1 B3 , · · · , A1 BK }.
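The partition in Step 1 can be generated mechanically. The following sketch only builds the subtrial index sets; it does not implement the later steps of the waterfall design, in which the starting dose of each subtrial depends on the candidate MTD found in the previous one. The function name is hypothetical, and doses are represented as (j, k) pairs for Aj Bk.

```python
def waterfall_subtrials(J, K):
    """Partition a J x K dose matrix into J subtrials, keyed by subtrial index.

    S_J runs up the first column of drug B and then along the top level of
    drug A; each remaining S_j holds A_j combined with B_2, ..., B_K.
    """
    subtrials = {}
    subtrials[J] = [(j, 1) for j in range(1, J + 1)] + [
        (J, k) for k in range(2, K + 1)
    ]
    for j in range(J - 1, 0, -1):
        subtrials[j] = [(j, k) for k in range(2, K + 1)]
    return subtrials


# 3 x 5 dose matrix: subtrials are conducted in the order S3, S2, S1
for index, doses in waterfall_subtrials(3, 5).items():
    print(f"S{index}: {len(doses)} doses", doses)
```

For a 3 × 5 matrix this yields one subtrial with seven doses and two with four doses each, and every dose combination belongs to exactly one subtrial.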
[Figure 4.2 image: panels (a) to (d), each a 3 × 5 grid of dose combinations; rows: Drug A, columns: Drug B]
FIGURE 4.2: Illustration of the waterfall design for a 3 × 5 dose matrix. The
outlined doses in each panel form a subtrial, and the asterisk denotes the
candidate MTD. As shown in panel (a), the first subtrial starts with dose
combination A1 B1 . After the first subtrial identifies A3 B2 as the candidate
MTD, the second subtrial starts with dose combination A2 B3 (see panel (b)).
After the second subtrial identifies A2 B4 as the candidate MTD, the third
subtrial starts with dose combination A1 B5 (see panel (c)). After all subtrials
complete, the MTD in each row of the dose matrix is selected based on the
data from all subtrials, as shown in panel (d).
The rationale for the stopping rule is that when the patient allocation concen-
trates at a particular dose combination, this indicates that the dose-finding
algorithm likely has converged on the MTD, so we should stop the subtrial
and select the current dose combination as the candidate MTD. This stop-
ping rule automatically adjusts the sample size of each subtrial to reflect the
difficulty of the dose finding (e.g., the number of dose combinations between
the starting dose combination and the actual MTD, and the shape of the
dose–toxicity curve). In addition, this stopping rule ensures that a certain
number of patients are treated at the candidate MTD, which is achieved in
single-agent dose-finding designs using cohort expansion after selecting MTD.
Setting m ≥ 9 (m = 12 is preferred) usually ensures reasonable operating
characteristics.
Although the above stopping rule provides an automatic, reasonable way
to determine the sample size for a particular subtrial, in some cases, it is
advantageous to prespecify a maximum sample size for each subtrial as well.
This can be done by adding an extra stopping rule:
If the number of patients treated in subtrial Sj reaches or exceeds $n_j^{\max}$,
where $n_j^{\max}$ is the prespecified maximum sample size for the subtrial Sj,
then stop the subtrial, select the candidate MTD, and initiate the next
subtrial.
We recommend setting $n_j^{\max}$ between 4×(the number of doses in subtrial
Sj ) and 6×(the number of doses in subtrial Sj ) for j = 1, . . . , J. For example,
a trial with a 3 × 5 dose matrix, like the trial depicted in Figure 4.2, consists
of a first subtrial with seven doses, and second and third subtrials each with
four doses. We may set $n_j^{\max}$ = 28, 16, and 16 for the three subtrials,
respectively, giving a maximum of 60 patients for the trial. Although a maximum
sample size of 60 patients may seem large, it is modest given that there are
15 dose combinations. By comparison, for a single-agent dose-finding trial with
15 doses, the maximum sample size under the 3+3 design is 6 × 15 = 90 patients.
We recommend using computer simulations to calibrate m and $n_j^{\max}$, thereby
ensuring the design has desirable operating characteristics. This simulation-based
calibration can be carried out with the software described in the next section.
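The arithmetic behind this recommendation can be sketched in a few lines of Python. This is only an illustration under the partition described above (the first subtrial has J + K − 1 doses and each remaining subtrial has K − 1 doses); the function names are hypothetical and are not part of the BOIN software.

```python
def waterfall_subtrial_sizes(J, K):
    """Return the number of doses in subtrials S_J, S_{J-1}, ..., S_1."""
    if not 1 <= J <= K:
        raise ValueError("expects 1 <= J <= K, as assumed in the text")
    # S_J escalates drug A at B_1 and then drug B at A_J: J + K - 1 doses.
    # Each remaining subtrial S_j covers A_j B_2, ..., A_j B_K: K - 1 doses.
    return [J + K - 1] + [K - 1] * (J - 1)


def subtrial_nmax_bounds(J, K, lower=4, upper=6):
    """Return (lower, upper) bounds on the maximum sample size per subtrial."""
    return [(lower * d, upper * d) for d in waterfall_subtrial_sizes(J, K)]


# The 3 x 5 dose matrix of Figure 4.2.
sizes = waterfall_subtrial_sizes(3, 5)
bounds = subtrial_nmax_bounds(3, 5)
print("doses per subtrial:", sizes)
print("recommended ranges:", bounds)
print("trial maximum at the lower bound:", sum(lo for lo, _ in bounds))
```

Taking the lower bound for every subtrial reproduces the 28, 16, and 16 patients quoted above. Simulation is still needed to calibrate the final values.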
Lastly, the partition of the dose matrix is not unique. Any partition can
be used as long as it is clinically sound and the doses within each block are
fully ordered. For example, we may use each row of the dose matrix as a
subtrial. In addition to the advantage of being simple and transparent, as
shown in the next section, the waterfall design also has competitive and often
better performance than the more complicated model-based designs, such as
the design based on the product of independent beta probabilities (PIPE)
(Mander and Sweeting, 2015).
[Figure 4.3 graphic: bar charts for the 3+3, BLRM, POCRM, Copula, and BOIN designs in panels (i) to (iii); vertical axis: percentage; one chart shows the percentage of patients treated at overdoses.]
FIGURE 4.3: Simulation results of designs for finding one MTD based on 3000
random scenarios of (i) the 2×4 combination trial with 27 patients, (ii) the
3×5 combination trial with 48 patients, and (iii) the 4×4 combination trial
with 48 patients.
[Figure 4.4 graphic: bar charts for the PIPE and waterfall designs in three panels; vertical axis: percentage; one chart shows the percentage of patients treated at overdoses.]
FIGURE 4.4: Simulation results of the PIPE and waterfall designs for finding
the MTD contour based on 3000 random scenarios of (i) the 2×4 combination
trial with 27 patients, (ii) the 3×5 combination trial with 48 patients, and
(iii) the 4×4 combination trial with 48 patients.
Case study
Acute Myeloid Leukemia (AML) Trial The objective of this phase I
trial (ClinicalTrials.gov Identifier: NCT03600155) was to determine MTD for
nivolumab and ipilimumab alone and in combination in patients with high risk
or refractory/relapsed AML and myelodysplastic syndrome (MDS) following
allogeneic stem cell transplantation. The trial consisted of three parallel arms:
nivolumab alone, ipilimumab alone, and the combination of nivolumab and ip-
ilimumab. We here focus on the combination arm, which employed the BOIN
combination design to find MTD. Three doses of ipilimumab and two doses
of nivolumab were investigated (i.e., six combinations) with the target DLT
probability φ = 30%.
We use this trial as an example to illustrate the use of the BOIN web ap-
plication to design drug-combination trials. After selecting and launching the
“BOIN Comb” module from the BOIN Suite launchpad (see Figure 4.5), the
trial can be designed using the following three steps:
Step 1: Enter trial parameters
Doses and Target As shown in Figure 4.6, drug A (nivolumab) has two
doses and drug B (ipilimumab) has three doses. The starting dose is A1 B1 .
For trials where the lowest dose is believed to be safe, starting from a slightly
higher dose level (e.g., A1 B2 ) may reduce the sample size, as it allows the design
to reach the MTD sooner.
The target DLT probability is φ = 0.3, which should be elicited from clinicians.
The target φ can be adjusted if a specific safety requirement is desirable;
see Section 3.6 for more guidance on the specification of φ1 and φ2 . Depending
on the trial objective, users can choose to find a single MTD (using the BOIN
combination design) or the MTD contour (using the waterfall design). In general,
finding the MTD contour requires a larger sample size than finding a single
MTD because it requires a more thorough exploration of the dose matrix.
When target φ = 0.25, the software provides
an option called “Apply the 3+3 design run-in” to embed the 3+3 design rule
into the BOIN decision table; see Section 3.6 for details.
Sample Size and Cohort Size The cohort size for the trial is three and
the number of cohorts is 10, with a total sample size of 30 (Figure 4.7). As
a rule of thumb, for a J × K dose matrix, we recommend the maximum
sample size N ∈ [4 × J × K, 6 × J × K], where 6 × J × K corresponds to the
maximum sample size of the 3+3 design for J × K doses. This sample size
generally yields reasonable operating characteristics. In the AML trial, J = 2
and K = 3, thus, the recommended sample size is 24 to 36. To reduce the
actual sample size, it is often useful to use the “convergence” stopping rule:
stop the trial early when m patients have been assigned to a dose and the
decision is to stay at that dose. This stopping criterion suggests that the dose
finding approximately converges to the MTD, thus the trial can be stopped.
We recommend m = 9 or larger. In this trial, m = 12 is used. Because of the
early stopping rule, the actual sample size used in the trial is often smaller
than N . The saving depends on the true dose–toxicity relationship and can
be evaluated using simulation.
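As a quick check of the rule of thumb and the convergence rule, the following Python sketch may help. The function names and the string label "stay" are hypothetical, and we assume the rule triggers once at least m patients have been assigned to the dose.

```python
def recommended_sample_size_range(J, K):
    """Rule-of-thumb range [4*J*K, 6*J*K] for a J x K dose matrix."""
    n_doses = J * K
    return 4 * n_doses, 6 * n_doses


def convergence_stop(n_at_dose, decision, m=12):
    """Stop early when m patients are at a dose and the decision is to stay.

    The text recommends m = 9 or larger; m = 12 is used in the AML trial.
    """
    return n_at_dose >= m and decision == "stay"


# AML trial: J = 2 doses of nivolumab and K = 3 doses of ipilimumab.
print(recommended_sample_size_range(2, 3))
print(convergence_stop(12, "stay"))
print(convergence_stop(12, "escalate"))
```

For the AML trial this gives the range of 24 to 36 patients stated above, which contains the chosen sample size of 30.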
Another useful approach to reducing the sample size is to conduct the accelerated
titration before treating patients in cohorts of three (i.e., checking
“Yes” for “Perform accelerated titration” in Figure 4.6). During the accelerated
titration, we treat patients in cohorts of one, and we continue escalating
the dose in the one-patient-per-dose-level fashion until any of the following
events is observed: (i) the first instance of DLT, (ii) the second instance of
moderate (grade 2) toxicity, or (iii) the highest dose level is reached. At that
point, the titration ends. We add two more patients to the current dose level,
and hereafter switch to the cohort size of three. During the titration, the drug
to be escalated (A or B) can be randomly selected, which is adopted by the
software for simulation, or chosen based on clinical consideration. In addition,
as described in Section 3.6, in the simulation, the software considers only (i)
and (iii), ignoring (ii), due to practical considerations described previously.
FIGURE 4.7: Specify the sample size, cohort size, and convergence stopping
rule.
FIGURE 4.8: Specify the sample size of subtrials for the waterfall design.
As the accelerated titration leads to more aggressive dose escalation, it
should be used only when there is sufficient evidence that low doses are most
likely underdosing. When the accelerated titration is used, the sample size can
be chosen using simulation to obtain desirable operating characteristics.
In the case that the trial objective is to find the MTD contour, we need
to specify the maximum sample size nmax j for each subtrial (Figure 4.8). As
discussed in Section 4.3.3, we recommend setting nmax j between 4×(the num-
ber of doses in subtrial Sj ) and 6×(the number of doses in subtrial Sj ), for
j = 1, . . . , J. For this trial with a 2 × 3 dose matrix, the first and second
subtrials have four and two doses, respectively. We may set nmax j = 24 and
12 (i.e., eight and four cohorts), respectively.
Overdose Control This panel (Figure 4.9) specifies the overdose control rule,
described in Section 4.3.1. That is, if Pr(πjk > φ | Djk ) > pE and njk ≥ 3,
dose combination Aj Bk and higher dose combinations are eliminated from the
trial, and the trial is terminated if the lowest dose combination is eliminated.
When the trial is terminated due to toxicity, no dose combination should be
selected as MTD. In general, we recommend using the default probability
cutoff pE = 0.95. A smaller value (e.g., 0.9) results in stronger overdose
control, but at the cost of reducing the probability of correctly identifying
elimination rule that is more likely to eliminate the lowest/first dose, and thus
terminate the trial, by shifting the probability cutoff by δ. The default value
δ = 0.05 is generally recommended and produces a good balance between the
safety and accuracy to identify MTD. A large value of δ (e.g., 0.1) increases
the early termination probability (when all doses are toxic), but at the cost of
reducing the probability of correctly identifying MTD (when one of the doses
is MTD).
At the end of the trial, both the BOIN combination and waterfall designs
select MTD as the dose whose isotonic estimate of DLT probability is closest
to φ, globally or within each row of the dose matrix. In some cases, it may be
desirable to require the DLT probability estimate of MTD to be lower than
the de-escalation boundary λd . This can be done by activating the option at
the bottom of the “Overdose Control” panel (Figure 4.9).
After completing the specification of trial parameters, the decision table
will be generated by clicking the “Get Decision Table” button. The decision
table and the desirability score table will be automatically included in the
protocol template in Step 3, but they also can be saved as a separate csv,
Excel, or pdf file in this step if needed.
DOI: 10.1201/9780429052781-5 93
consequences if the most recently chosen dose later turns out to be overly
toxic. For example, suppose that the current dose level j has a total of
three patients, but all of them are waiting for toxicity evaluation. Motivated
by the desire to complete the trial quickly and based on the most recent
data, the dose-assignment decision for a new cohort of three patients is to
escalate the dose to j + 1, as none of the three patients have the DLT so far.
If it turns out that all three patients at dose level j experience severe late-
onset toxicities, then the three new patients already have been treated at an
excessively toxic dose.
A third approach is to suspend accrual after each cohort and wait until the
DLT data for the already accrued patients have cleared before enrolling the
next new cohort. This approach of repeatedly interrupting accrual, however, is
highly undesirable and often infeasible in practice. It delays treatment for new
patients and slows down the trial. It also lengthens drug development,
especially when accrual is difficult, as is the case in rare diseases.
In this chapter, we describe several novel phase I designs to handle late-
onset toxicities. Compared to the above empirical approaches, these designs
are statistically and scientifically rigorous and yield substantially better op-
erating characteristics. In what follows, we first characterize the implications
of late-onset toxicities, and then introduce and compare model-based designs
and model-assisted designs.
and let ti denote the time to DLT. For subjects who have experienced DLT,
we have 0 ≤ ti ≤ τ ; for those who do not experience DLT during the DLT
assessment window, we set ti = ∞. The window length τ should be elicited from
clinicians and should be
large enough to capture all DLTs relevant to MTD determination. For many
chemotherapies, τ is often taken as the first cycle of the therapy (e.g., 21 or
28 days), whereas for agents expected to induce late-onset toxicity (e.g., some
targeted or immunotherapy agents), τ can be several months or longer after
treatment.
At an interim decision time, let ui (0 ≤ ui ≤ τ ) denote the actual follow-
up time for patient i. If the patient has finished the DLT assessment, then
[Figure graphic: follow-up timelines for Patient 1 (x1 = 1) and Patient 2 (x2 = 0); horizontal axis: time (month).]
where the last equality follows because ti and ui are independent, and Pr(ti >
ui | ui < τ, ti > τ ) = 1. Similarly, for a patient who will experience DLT, the
probability that his/her toxicity outcome will be missing is given by
Therefore, the missing data are more likely to occur for those patients who
would not experience DLT in the follow-up period. Because the missing data
are nonignorable, the empirical approach that discards the missing data and
makes dose-assignment decisions solely based on the observed toxicity data
leads to biased inference and poor operating characteristics (Little and Rubin,
2014).
5.3 TITE-CRM
A number of model-based designs have been proposed to address late-onset
toxicities. Based on the framework of the continual reassessment method
(CRM), Cheung and Chappell (2000) proposed the time-to-event CRM
(TITE-CRM) to incorporate pending patients’ follow-up times into toxicity
evaluation via a weighting scheme. Bekele et al. (2007) proposed to monitor
late-onset toxicities using predicted risks. Regarding the late-onset toxicity is-
sue as a missing data problem, Yuan and Yin (2011b) utilized the expectation–
maximization algorithm to estimate the dose toxicity probabilities based on
the incomplete data to direct dose assignment, and Liu et al. (2013) proposed
the Bayesian data-augmentation CRM (DA-CRM) to handle pending toxic-
ity data. Yin et al. (2013) imputed the missing toxicity data based on the
Kaplan–Meier estimate. In what follows, we use TITE-CRM as an example
to illustrate the model-based approach.
5.4 TITE-BOIN
The time-to-event Bayesian optimal interval (TITE-BOIN) design is an ex-
tension of the BOIN design (Section 3.4) to address the issue of late-onset
toxicities (Yuan et al., 2018). As a model-assisted design, TITE-BOIN is sim-
ple to implement and its decision rule can be pre-tabulated and included in
the trial protocol. In addition, it yields outstanding performance comparable
to the more complicated model-based TITE-CRM.
Because the formulas for ∆e and ∆d are functions of summary statistics such
as nj , ỹj , cj , and STFTj , this dose escalation/de-escalation rule can be tabu-
lated prior to trial conduct. As a result, TITE-BOIN inherits the transparency
and simplicity of the standard BOIN design and does not require repeated,
complicated model fitting after treating each cohort/patient.
Table 5.1 shows the TITE-BOIN decision rule with a cohort size of three
and a target DLT rate of 0.2. To conduct the trial, we only need to count
the number of patients at the current dose, the number of patients who expe-
rienced DLT, the number of pending patients, and STFT, and then look up
the table to determine the dose escalation/de-escalation. Suppose that three
patients have been treated at the current dose and one of them had DLT. We
de-escalate the dose regardless of STFT. Consider another case where nine
patients have been cumulatively treated at the current dose, one patient had
DLT, and four patients have DLT data pending. To treat the next cohort of
patients, if STFT of the four pending patients is greater than 2.15, we escalate
the dose; otherwise, we retain the current dose. Table 5.1 assumes a cohort
size of three, but TITE-BOIN allows any prespecified cohort size, and the
corresponding decision table (i.e., similar to Table 5.1 but with more rows)
can be easily generated using the software described later.
In principle, TITE-BOIN supports continuous accrual and allows for real-
time dose assignment whenever a new patient arrives. To avoid risky decisions
caused by sparse data, an accrual suspension rule is imposed:
If at the current dose, more than 50% of the patients’ DLT outcomes are
pending, suspend the accrual to wait for more data to become available.
This rule corresponds to “Suspend accrual” in Table 5.1. In addition, the same
overdose control/safety stopping rule as BOIN is also employed:
If Pr(πj > φ | Dj ) > 0.95 and nj ≥ 3, dose level j and higher are elimi-
nated from the trial, and the trial is terminated if the lowest dose level is
eliminated.
This overdose control rule corresponds to the decision “Y&Elim,” representing
“Yes & Eliminate,” under the column entitled “De-escalate” in Table 5.1.
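The accrual suspension rule is a simple count comparison, which can be sketched in Python as follows. The function name is hypothetical, and we assume that exactly 50% pending does not trigger suspension, since the rule says "more than 50%".

```python
def should_suspend_accrual(n_treated, n_pending):
    """Return True if more than 50% of DLT outcomes at the dose are pending."""
    if n_pending > n_treated:
        raise ValueError("pending patients cannot exceed treated patients")
    # Integer comparison avoids floating-point issues at exactly 50%.
    return 2 * n_pending > n_treated


# Three patients treated, all still pending: wait for more data.
print(should_suspend_accrual(3, 3))
# Nine patients treated, four pending: a dose decision can be made.
print(should_suspend_accrual(9, 4))
```

The second call matches the worked example above, in which nine patients have been treated and four are pending, so the decision table is consulted rather than suspending accrual.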
TABLE 5.1: Dose escalation and de-escalation rule for TITE-BOIN with a
target DLT rate of 0.2 and a cohort size of three, up to 12 patients.
TABLE 5.2: Dose–toxicity scenarios used in the simulation study. The target
dose (i.e., MTD) is bolded.
                          Dose Level
Scenario    1      2      3      4      5      6      7
1           0.3    0.4    0.5    0.6    0.7    0.8    0.9
2           0.14   0.3    0.39   0.48   0.56   0.64   0.7
3           0.07   0.23   0.41   0.49   0.62   0.68   0.73
4           0.05   0.15   0.3    0.4    0.5    0.6    0.7
5           0.05   0.12   0.2    0.3    0.38   0.49   0.56
6           0.01   0.04   0.08   0.15   0.3    0.36   0.43
7           0.02   0.04   0.08   0.1    0.2    0.3    0.4
8           0.01   0.03   0.05   0.07   0.09   0.3    0.5
TITE-CRM design using simulation studies. Table 5.2 shows eight scenarios
used to simulate trial data. The target DLT probability is 30%. The DLT
assessment window is three months, and the accrual rate is two patients/month.
The time to DLT is sampled from a Weibull distribution, with 50% of DLTs
occurring in the second half of the assessment window. The maximum sample
size is 36 patients, and patients are treated in cohorts of three. Because the
3+3 and rolling six designs often stop the trial early (e.g., when two of
three patients experience DLT) before reaching 36 patients, in these cases
the remaining patients are treated at the selected “MTD” as a cohort expansion,
so that the four designs have comparable sample sizes. For the
3+3 design, a new cohort is enrolled only when the previous cohort’s DLT
data are cleared. More details about the simulation study can be found in
Yuan et al. (2018).
Figure 5.2 shows the relative performance of the rolling six, TITE-BOIN,
and TITE-CRM designs against the performance of the 3+3 design, including
the differences in (a) the percentage of correct selection of MTD, (b) the
percentage of patients overdosed (i.e., treated at doses above MTD), (c) the
percentage of patients underdosed (i.e., treated at doses below MTD), and
(d) the average trial duration. TITE-BOIN is more efficient and has higher
percentages of correct selection of MTD than the algorithm-based design (i.e.,
the rolling six design) because it uses the follow-up times of pending patients
to determine dose escalation and de-escalation. Compared to the model-based
TITE-CRM, TITE-BOIN has comparable accuracy in identifying MTD, but it
is safer and much simpler. Compared to the 3+3 design, TITE-BOIN dramatically
shortens the trial duration owing to its ability to make real-time
decisions in the presence of pending patients.
[Figure 5.2 graphic: four panels plotted by scenario (1 to 8 and the average); vertical axes: percentage or month.]
FIGURE 5.2: The relative performance of the rolling six (R6), TITE-BOIN,
and TITE-CRM designs against the performance of the 3+3 design, including
the differences in (a) the percentage of correct selection of the MTD, (b) the
percentage of patients overdosed (i.e., treated at doses above the MTD), (c)
the percentage of patients underdosed (i.e., treated at doses below the MTD),
and (d) the average trial duration.
likelihood is given by
For a patient with δi = 0, xi has not been ascertained yet with DLT outcome
pending, and his/her actual observed outcome is x̃i = 0. These pending pa-
tients are a mixture of two subgroups: patients who will not experience DLT
(i.e., xi = 0), and patients who will experience DLT (i.e., xi = 1) but have
not experienced it yet by the interim decision time (i.e., ui < ti ). Therefore,
the likelihood for a pending patient is given by
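\[
\begin{aligned}
\Pr(\tilde{x}_i = 0, \delta_i = 0)
&= \Pr(x_i = 0)\Pr(\tilde{x}_i = 0, \delta_i = 0 \mid x_i = 0)
 + \Pr(x_i = 1)\Pr(\tilde{x}_i = 0, \delta_i = 0 \mid x_i = 1) \\
&= \Pr(x_i = 0) + \Pr(x_i = 1)\Pr(t_i > u_i \mid x_i = 1) \\
&= 1 - \pi_j + \pi_j\{1 - \Pr(t_i \le u_i \mid x_i = 1)\} \\
&= 1 - \pi_j \omega_i,
\end{aligned}
\]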
where
\[
\tilde{n}_j = \tilde{y}_j + \tilde{m}_j, \qquad
\tilde{m}_j = m_j + \sum_{i=1}^{n_j} (1 - \delta_i)\,\omega_i. \tag{5.9}
\]
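To make (5.9) concrete, the following Python sketch computes the effective counts from the number of observed DLTs, the number of patients who completed the assessment without DLT, and the weights of the pending patients. The function name is hypothetical; the inputs in the example are the counts quoted in the caption of Figure 5.3.

```python
def effective_counts(y_tilde, m_completed, pending_weights):
    """Return (n_tilde, m_tilde) following equation (5.9).

    y_tilde         -- number of observed DLTs at the dose
    m_completed     -- patients who finished the assessment without DLT
    pending_weights -- weights omega_i of the pending patients (delta_i = 0)
    """
    if any(not 0.0 <= w <= 1.0 for w in pending_weights):
        raise ValueError("each weight must lie in [0, 1]")
    m_tilde = m_completed + sum(pending_weights)
    n_tilde = y_tilde + m_tilde
    return n_tilde, m_tilde


# Panel (a): five patients, two completed without DLT, three pending.
print(effective_counts(0, 2, [0.3, 0.4, 0.5]))
# Panel (b): twelve patients, one DLT, four completed without DLT, seven pending.
print(effective_counts(1, 4, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]))
```

In panel (a) the three pending patients contribute a total weight of about 1.2, so five enrolled patients carry the information of roughly 3.2 fully followed patients.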
[Figure 5.3 graphic: two panels showing the true and approximated posterior distributions of πj over (0, 1).]
FIGURE 5.3: The exact and approximated posterior functions based on the
observed data: (a) nj = 5 patients have been treated and only mj = 2 pa-
tients have finished the assessment without any DLT, and the weights for the
remaining three patients are 0.3, 0.4, and 0.5; (b) nj = 12 patients have been
treated, ỹj = 1 DLT has been observed, mj = 4 patients have finished the as-
sessment without any DLT, and the weights for the remaining seven patients
are 0.1, 0.2, . . . , 0.7. The prior distribution of πj is πj ∼ Unif(0, 1); ỹj and m̃j
represent the “effective” numbers of patients with DLT and patients without
DLT, respectively, by the interim time γ; Ik∗ represents the target key in the
keyboard design.
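The comparison in Figure 5.3 can be reproduced numerically. The Python sketch below is an illustration only. It assumes that, under the Unif(0, 1) prior, the exact posterior kernel is the product of the binomial terms and the pending-patient terms 1 − πj ωi, and that the approximated kernel replaces the pending terms by the effective number of patients without DLT. All function names are hypothetical.

```python
def exact_kernel(y, m, weights):
    """Unnormalized exact posterior of pi under a uniform prior."""
    def kernel(p):
        value = p ** y * (1.0 - p) ** m
        for w in weights:
            value *= 1.0 - w * p  # likelihood term of a pending patient
        return value
    return kernel


def approx_kernel(y, m, weights):
    """Unnormalized approximated posterior using the effective count m_tilde."""
    m_tilde = m + sum(weights)
    return lambda p: p ** y * (1.0 - p) ** m_tilde


def posterior_mean(kernel, grid_size=20000):
    """Posterior mean by midpoint-rule integration over (0, 1)."""
    step = 1.0 / grid_size
    points = [(k + 0.5) * step for k in range(grid_size)]
    dens = [kernel(p) for p in points]
    return sum(p * d for p, d in zip(points, dens)) / sum(dens)


# Panel (a) of Figure 5.3: no DLT, two completed, weights 0.3, 0.4, 0.5.
weights_a = [0.3, 0.4, 0.5]
mean_exact = posterior_mean(exact_kernel(0, 2, weights_a))
mean_approx = posterior_mean(approx_kernel(0, 2, weights_a))
print(round(mean_exact, 3), round(mean_approx, 3))
```

Under these assumptions the two posterior means differ by roughly 0.01 for panel (a), in line with the close agreement of the curves described in the caption.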
Although the uniform scheme seems very restrictive, it yields remarkably ro-
bust performance. This was also observed in TITE-CRM and TITE-BOIN.
As described in Section 5.4.2, a more flexible weighting scheme is to assume
that ti follows a piecewise uniform distribution, which partitions (0, τ ) into
several intervals and assumes a uniform distribution within each interval. This
approach is useful to incorporate prior information on ti . For ease of exposition,
we partition the assessment window (0, τ ) into the initial part (0, τ /3),
the middle part (τ /3, 2τ /3), and the final part (2τ /3, τ ), and assume that
ti is uniformly distributed within each part. Let (ν1 , ν2 , ν3 ) be the prior
probabilities that DLT occurs in the three parts of the assessment window,
where ν1 + ν2 + ν3 = 1. For example, prior data may suggest that DLT
is more likely to occur late in the assessment window, in which case we can
choose ν3 > ν2 > ν1 . It then follows that
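\[
\omega_i = \Pr(t_i < u_i \mid x_i = 1) =
\begin{cases}
3\nu_1 u_i/\tau, & u_i \in (0, \tau/3), \\
\nu_1 - \nu_2 + 3\nu_2 u_i/\tau, & u_i \in (\tau/3, 2\tau/3), \\
\nu_1 + \nu_2 - 2\nu_3 + 3\nu_3 u_i/\tau, & u_i \in (2\tau/3, \tau),
\end{cases}
\]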
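The piecewise formula above translates directly into code. The following Python sketch is a minimal illustration with a hypothetical function name; it treats the interval endpoints as belonging to the lower piece, which is harmless because the pieces agree at the boundaries.

```python
def piecewise_uniform_weight(u, tau, nu=(1 / 3, 1 / 3, 1 / 3)):
    """Weight omega_i = Pr(t_i < u_i | x_i = 1) for follow-up time u.

    nu holds the prior probabilities (nu1, nu2, nu3) that DLT occurs in
    the initial, middle, and final third of the assessment window (0, tau).
    """
    nu1, nu2, nu3 = nu
    if abs(nu1 + nu2 + nu3 - 1.0) > 1e-9:
        raise ValueError("nu1 + nu2 + nu3 must equal 1")
    if not 0.0 <= u <= tau:
        raise ValueError("follow-up time must lie in [0, tau]")
    x = 3.0 * u / tau  # follow-up time measured in thirds of the window
    if x <= 1.0:
        return nu1 * x
    if x <= 2.0:
        return nu1 - nu2 + nu2 * x
    return nu1 + nu2 - 2.0 * nu3 + nu3 * x


# With equal probabilities, the weight reduces to the uniform scheme u / tau.
print(piecewise_uniform_weight(1.5, 3.0))
# Prior probabilities 0.3, 0.5, and 0.2 over the three parts of the window.
print(piecewise_uniform_weight(2.0, 3.0, nu=(0.3, 0.5, 0.2)))
```

With (ν1, ν2, ν3) = (0.3, 0.5, 0.2), a patient followed for two-thirds of the window receives a weight of about 0.8 rather than the 0.67 given by the uniform scheme, reflecting that most DLTs are expected before that time.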
table does not depend on the weighting scheme or the length of the DLT
assessment window. Table 5.3 applies no matter which of the aforementioned
three weighting schemes is used and no matter the length of the assessment
window. This is because the likelihood (5.8) depends on m̃j , which is the
effective number of patients without DLT. Moreover, when all the pending
DLT data become available, we have (ñj , ỹj ) = (nj , yj ). As a result, the TITE-
keyboard design reduces to the standard keyboard design in a seamless way.
For patient safety, Lin and Yuan (2020) required that dose escalation is not
allowed until at least two patients have completed the DLT assessment at the
current dose level. In addition, an overdose control/stopping rule is imposed:
at any time during the trial if any dose j satisfies Pr(πj > φ | nj , ỹj ) > η
and nj ≥ 3, then that dose and any higher doses are regarded as overly toxic
and should be eliminated from the trial, and the dose is de-escalated to level
j − 1 for the next cohort of patients, where η is the prespecified elimination
cutoff, say η = 0.95. If the lowest dose level is eliminated, the trial should be
terminated early. Table 5.3 also reflects such safety and overdose control rules.
The TITE-keyboard design has some desirable statistical properties. First,
similar to the TITE-BOIN design, the TITE-keyboard design also enjoys the
monotonicity property.
Theorem 5.4 The TITE-keyboard design is monotonic.
Second, the TITE-keyboard design is long-memory coherent. The proofs of
Theorems 5.4 and 5.5 are given in Lin and Yuan (2020).
Case study
Recurrent or High Grade Gynecologic Cancer Trial The objective of this
phase I trial (ClinicalTrials.gov Identifier: NCT03508570) is to determine
FIGURE 5.5: Specify doses, sample size, and convergence stopping rule.
FIGURE 5.6: Specify the target, DLT assessment window, and accrual rate.
even if the maximum sample size of 24 is not reached. Because the TITE-BOIN
design is a generalization of BOIN, one can follow the rule of BOIN
to specify the maximum sample size and the convergence criterion for early
stopping; see Section 3.6 for more details.
Target and Accrual Rate As shown in Figure 5.6, the target DLT proba-
bility is φ = 0.3, and the default values for (φ1 , φ2 ) are used to derive the
optimal dose escalation and de-escalation boundaries. Section 3.6 provides
some guidance on how to adjust the values for φ, φ1 , and φ2 if a specific
safety requirement is desirable.
TITE-BOIN requires the specification of the DLT assessment window
(τ = 2.8 months) and the accrual rate (two patients/month). The decision
rule of TITE-BOIN does not depend on the accrual rate. The accrual rate
is used to simulate the operating characteristics of the design. By default,
TITE-BOIN assumes a uniform prior for the time to toxicity (i.e., the time to
toxicity is uniformly distributed over the assessment window). As described
previously, this assumption seems strong, but the design is remarkably ro-
bust to the violation of this assumption. Therefore, in general, we recommend
using this default prior. In the case that there is reliable prior information
on the distribution of the time to toxicity, we can incorporate that prior in-
formation by specifying prior DLT probabilities over the trimesters of the
assessment window (see Section 5.4.2). For example, Figure 5.7 sets the prior
DLT probabilities as 0.3, 0.5, and 0.2 for the trimesters of the assessment win-
dow to reflect that DLT is more likely to occur in the middle of the assessment
window.
Overdose Control This panel (see Figure 5.8) specifies the overdose control
rule, described in Section 5.4. That is, if Pr(πj > φ | Dj ) > 0.95 and nj ≥ 3,
dose level j and higher are eliminated from the trial, and the trial is termi-
nated if the lowest dose level is eliminated. When the trial is terminated due
to toxicity, no dose should be selected as MTD. See Section 3.6 for a detailed
discussion on how to specify overdose control through this panel to accommo-
date various trial objectives.
where α is the unknown parameter. Recall that q1 < · · · < qJ are known as
the skeleton; see Section 2.5. Under the Bayesian paradigm, we assign α a
normal prior f (α) = N(0, σ²), where σ² is a prespecified hyperparameter.
This model a priori centers the dose–toxicity curve (π1 , · · · , πJ ) around
the skeleton (q1 , · · · , qJ ). The value of σ² controls the amount of information
borrowed from historical data via the skeleton. A smaller value leads to
stronger borrowing. If σ² = 0, the prior completely dominates, with πj ≡ qj
regardless of the observed data.
To make the decision of dose escalation and de-escalation, CRM up-
dates the posterior estimate of πj based on the observed interim data D =
{(nj , yj ), j = 1, . . . , J}:
with the constraint λej ≤ λdj . When the non-informative prior ω0j = ω1j =
ω2j = 1/3 is used, which is recommended for most trials, the optimal
where ϕ0 = φ, ϕ1 = φ1 , and ϕ2 = φ2 .
3. Calculate (λej , λdj ) using formulas (6.3) and (6.4). In the rare case
that formulas (6.3) and (6.4) result in a solution with λej > λdj , determine
the optimal (λej , λdj ) using a numerical search to minimize
the decision error.
The derivation of ωkj in Step 2 is as follows. For dose j, the predictive
probability of Hkj based on the prior DLT rate qj and PESS nj can be expressed
as
[Figure graphic: probability plotted in panels labeled q1 to q5.]
level j and higher are eliminated from the trial, where Pr(πj > φ | nj , yj ) is
evaluated based on the beta-binomial model with the Unif(0, 1) prior. As the
objective of the dose elimination rule is to protect patients from excessively
toxic doses, it is sensible to use the uniform prior to evaluate this rule to avoid
potential bias due to misspecification of the prior. As discussed in Section
3.4.5, if desired, a vague prior such as Beta(0.02, 0.08) with PESS = 0.1
can also be used to obtain similar operating characteristics on dose elimination
by slightly adjusting the overdose control probability cutoff 0.95 (e.g., to 0.93).
The trial is terminated if the lowest dose level is eliminated.
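The elimination rule with the Unif(0, 1) prior can be evaluated without special software. Under this prior the posterior of πj is Beta(yj + 1, nj − yj + 1), and for integer parameters its upper tail equals a binomial lower tail, which is a standard identity. The Python sketch below uses that identity; the function names are hypothetical, and the cutoff 0.95 is the default quoted above.

```python
from math import comb


def prob_overly_toxic(n, y, phi):
    """Pr(pi > phi | n, y) under the beta-binomial model with a uniform prior.

    The posterior is Beta(y + 1, n - y + 1); for integer parameters its upper
    tail at phi equals Pr(X <= y) for X ~ Binomial(n + 1, phi).
    """
    return sum(
        comb(n + 1, k) * phi ** k * (1.0 - phi) ** (n + 1 - k)
        for k in range(y + 1)
    )


def should_eliminate(n, y, phi, cutoff=0.95):
    """Eliminate the dose (and higher doses) if the rule is met and n >= 3."""
    return n >= 3 and prob_overly_toxic(n, y, phi) > cutoff


# Target phi = 0.3: three DLTs in three patients triggers elimination,
# whereas two DLTs in three patients does not.
print(should_eliminate(3, 3, 0.3), should_eliminate(3, 2, 0.3))
```

With φ = 0.3, the posterior probability of excessive toxicity after three DLTs in three patients is about 0.992, which exceeds the cutoff.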
Robust prior
In general, iBOIN is robust to the misspecification of prior information
(i.e., prior-data conflict), especially when PESS is chosen as described above.
However, when the informative prior is grossly misspecified (e.g., the prior
estimate of MTD and the true MTD differ by three dose levels), it may com-
promise the accuracy of identifying MTD. To further strengthen the robust-
ness of iBOIN, Zhou et al. (2021a) proposed a robust prior, which is easy to
implement and yields superior operating characteristics.
Given the elicited skeleton (q1 , · · · , qJ ) with dose level j ∗ as the prior
estimate of MTD (i.e., qj ∗ = φ), the robust prior is the same as the prior
described above when j ∗ < J/2, but modifies PESS to (n01 , · · · , n0j ∗ , 0, · · · , 0)
when j ∗ ≥ J/2. In other words, when the prior MTD j ∗ ≥ J/2, the robust prior
uses informative prior information for the doses up to the prior MTD estimate
and the non-informative prior for the doses above it.
The rationale of this robust prior is that the dose finding is a sequential
process of allocating patients from low doses to high doses. Thus, by the time
that dose finding reaches high doses, there is an extremely limited sample
size remaining to override the prior if it is misspecified. The robust prior
modifies the prior of high doses to be non-informative to facilitate overriding
the prior when the data conflict with the prior, thus alleviating the impact of
prior misspecification. This prior is particularly useful when there is a great
amount of uncertainty regarding the prior information. Zhou et al. (2021a)
also considered another robust prior, which is a mixture of informative and
non-informative priors. This mixture robust prior is conceptually appealing,
but more complicated and does not perform as well as the simple robust prior
described above.
aj = n0j qj ; bj = n0j (1 − qj ), j = 1, · · · , J.
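The robust modification of PESS and the Beta prior parameters above can be sketched in a few lines of Python. This is only an illustration, not the BOIN Suite implementation: the function names are hypothetical, the five-dose skeleton and the PESS of 3 are made-up values, and a zero PESS (giving a_j = b_j = 0) is used here as a stand-in for the non-informative prior.

```python
def robust_pess(pess, j_star):
    """Return the PESS vector of the robust prior.

    pess   : prior effective sample sizes (n_01, ..., n_0J)
    j_star : 1-based index of the prior MTD estimate
    """
    J = len(pess)
    if j_star < J / 2:
        return list(pess)  # informative prior is left unchanged
    # Keep PESS up to the prior MTD; set it to zero for the higher doses.
    return [n0 if j <= j_star else 0 for j, n0 in enumerate(pess, start=1)]


def beta_prior_params(skeleton, pess):
    """a_j = n_0j * q_j and b_j = n_0j * (1 - q_j) for each dose."""
    return [(n0 * q, n0 * (1 - q)) for q, n0 in zip(skeleton, pess)]


# Hypothetical five-dose example with the prior MTD at dose level 3.
skeleton = [0.08, 0.15, 0.30, 0.45, 0.60]
pess = robust_pess([3, 3, 3, 3, 3], j_star=3)
params = beta_prior_params(skeleton, pess)
```

With these inputs, the PESS of dose levels 4 and 5 is set to zero, so only doses up to the prior MTD carry prior information.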
                  Dose level
                  1      2      3      4      5
Scenario 1
  True Pr(DLT)    0.30   0.42   0.50   0.60   0.65
  Prior Pr(DLT)   0.30   0.42   0.54   0.64   0.73
Scenario 2
  True Pr(DLT)    0.15   0.27   0.40   0.50   0.65
  Prior Pr(DLT)   0.19   0.30   0.42   0.54   0.64
Scenario 3
  True Pr(DLT)    0.09   0.12   0.15   0.30   0.45
  Prior Pr(DLT)   0.01   0.04   0.10   0.19   0.30
Scenario 4
  True Pr(DLT)    0.08   0.15   0.31   0.45   0.55
  Prior Pr(DLT)   0.19   0.30   0.42   0.54   0.64
The simulation results show that (i) incorporating prior information im-
proves the accuracy of identifying MTD for all designs; (ii) iBOIN and iCRM
have similar performance in identifying MTD and allocating patients to MTD,
but iBOIN has a lower risk of overdosing patients and poor allocation; (iii)
compared to iBOIN, iKeyboard has a larger variation in identifying MTD
(e.g., performs best in scenario 3, but worst in scenario 4), and a higher risk
of overdosing patients and poor allocation.
                       % Pts     % Pts
Design       PCS       at MTD    >MTD      ROD       RPA
Scenario 1
  CRM        54.8      59.9      27.8      23.2      12.2
  iCRM       63.1      65.2      24.9      19.4       9.8
  BOIN       59.2      59.6      29.0      23.6      10.2
  iBOIN      64.2      66.2      22.4      12.8       4.5
  Keyboard   59.2      59.3      29.3      23.6      10.2
  iKeyboard  64.2      50.7      39.6      34.2      17.8
Scenario 2
  CRM        51.6      36.1       7.7      29.5      25.2
  iCRM       53.3      42.4       5.6      23.7      16.8
  BOIN       50.6      41.1       6.0      23.0      17.1
  iBOIN      57.8      47.6       3.7      10.4       8.6
  Keyboard   50.2      41.1       6.0      23.0      16.7
  iKeyboard  59.6      37.8       6.6      35.1      23.6
Scenario 3
  CRM        50.7      29.9      14.6       9.3      32.9
  iCRM       57.3      33.8      17.0      13.0      28.2
  BOIN       51.5      28.6      13.1       1.2      24.6
  iBOIN      58.6      35.5      18.4       3.8      11.8
  Keyboard   52.1       8.6      13.1       1.2      24.6
  iKeyboard  59.5       9.2      27.9      11.2      17.9
Scenario 4
  CRM        58.0      38.1      21.7      17.3      21.8
  iCRM       59.8      38.0      18.3      13.4      21.3
  BOIN       52.3      35.6      17.0       7.9      19.2
  iBOIN      61.6      33.0      10.9       2.2      14.8
  Keyboard   52.4      35.7      17.1       7.9      18.9
  iKeyboard  56.4      45.4      15.3       3.6       6.8

We use this trial as an example to illustrate how to use iBOIN to leverage the
prior information to improve the efficiency of phase I trials. After selecting and
launching the “iBOIN” module from the BOIN Suite launchpad, we design
the trial by taking the following three steps:
Step 1: Enter trial parameters
Doses, Sample Size, and Target As shown in Figure 6.3, the number of
doses under investigation is three, and the starting dose level is one. The starting dose
level can be adjusted according to prior information, e.g., one level below the
prior MTD estimated using historical data. For example, in some trials, the
appropriate starting dose level may be two or even higher.
The cohort size for the trial is three and the number of cohorts is four, with
a total sample size of 12. As a rule of thumb, we recommend the maximum
sample size N = 6 × J (i.e., the maximum sample size of the 3+3 design) as
the total sample size, where J is the number of doses. This trial uses a smaller
sample size due to the limited accrual and the intention to shorten the trial
duration. In this case, leveraging available prior information provides a useful
approach to compensate for the small sample size.
To reduce the sample size, it is often useful to use the “convergence”
stopping rule: stop the trial early when m patients have been assigned to a
dose and the decision is to stay at that dose. This stopping criterion suggests
that the dose finding approximately converges to MTD, thus the trial can
be stopped. We recommend m = 9 or larger. In this trial, m = 9 is used.
Because of the early-stopping rule, the actual sample size used in the trial
is often smaller than N . The saving in the sample size depends on the true
FIGURE 6.3: Specify doses, sample size, target, convergence stopping rule,
and accelerated titration.
Overdose Control This panel (see Figure 6.4) specifies the overdose control
rule, described in Section 5.4. That is, if Pr(πj > φ | Dj ) > 0.95 and nj ≥ 3,
dose level j and higher are eliminated from the trial, and the trial is termi-
nated if the lowest dose level is eliminated. When the trial is terminated due
to toxicity, no dose should be selected as MTD. In general, we recommend
using the default probability cutoff 0.95. A smaller value, e.g., 0.9, results in
stronger overdose control, but at the cost of reducing the probability of correctly
identifying MTD. This is because, in order to correctly identify MTD,
it is imperative to explore the doses sufficiently to learn their toxicity profile.
The option “Check to impose a more stringent safety stopping rule on the
lowest dose” can be used to increase the early stopping probability when all
doses are toxic; see Section 3.6 for a detailed discussion on how to use this
option.
Prior and PESS This panel (see Figure 6.5) specifies the prior estimate of
toxicity probability at each dose (i.e., the skeleton estimated based on the historical
data) and PESS. Based on the historical data on alisertib and osimertinib,
the combination is expected to be safe with little overlapping toxicity,
and thus the skeleton is set as (0.05, 0.12, 0.25) for the three doses. Following
the guidance described in Section 6.2.2, PESS is set as two for each of the three doses,
yielding a prior that is informative without dominating the trial data. For
many trials, it is desirable to use the robust prior to obtain extra robustness.
This can be done by checking the corresponding activation box. For this trial,
the robust prior is identical to the informative prior, as the prior estimate of
MTD is the last dose.
After completing the specification of trial parameters, the decision table
will be generated by clicking the “Get Decision Table” button. The decision
table will be automatically included in the protocol template in Step 3, but
can also be saved as a separate csv, Excel or pdf file in this step if needed.
The resulting dose escalation and de-escalation boundaries are shown in
Table 6.4. As the prior information suggests that dose level one is far be-
low MTD, the escalation boundary of dose level one (e.g., escalate if ≤ 1/3
DLT) is higher than that of dose levels two and three (e.g., escalate if ≤ 0/3
DLT), making it easier to escalate at dose level 1. When more patients are
treated, the trial data start to dominate the prior, and thus the escalation
and de-escalation boundaries become the same for the three doses when six or
nine patients are treated at a dose. These boundaries eventually shrink to the
standard BOIN boundaries when the sample size is sufficiently large.
TABLE 6.4: iBOIN decision boundaries for the lung cancer trial, given the
skeleton (q1 , q2 , q3 ) = (0.05, 0.12, 0.25) and PESS n01 = n02 = n03 = 2. The
target DLT probability φ = 0.3.
ulation should cover various possible clinical scenarios, e.g., MTD is located
at different dose levels. To facilitate the generation of the operating charac-
teristics of the design, the software automatically provides a set of randomly
generated scenarios with various MTD locations, which are often adequate
for most trials. Depending on the application, users can add or remove sce-
narios. The software also provides an option to include the 3+3 design as a
comparator to facilitate communication with clinicians who are more familiar
with the conventional design. Table 6.5 shows the operating characteristics of
iBOIN compared with those of BOIN using the non-informative prior. When
the MTD prior estimate is correctly specified, iBOIN improves the percent-
age of correct selection from 61.4% to 72.5%. The simulation results will be
TABLE 6.5: Operating characteristics of the iBOIN design for the lung cancer
trial in comparison with the BOIN with the non-informative prior. Dose level 3
is the MTD.
                              Dose level
Design                        1       2       3
         Pr(DLT)              0.05    0.13    0.3
BOIN     Selection (%)        1.5     37.1    61.4
         % of Patients        30.6    38.3    31.0
iBOIN    Selection (%)        1.4     26.1    72.5
         % of Patients        27.2    39.2    33.6
bodies (e.g., Institutional Review Board), we follow the design decision table to
conduct the trial and make adaptive decisions (e.g., dose escalation/stay/de-
escalation). At the completion of the trial, the “Select MTD” function can be
used to determine MTD based on the observed trial data.
7
Multiple Toxicity Grades
TABLE 7.1: Toxicities and severity weights in the sarcoma trial, reproduced
from Bekele and Thall (2004).
dermatitis, and grade 3 nausea/vomiting, the TTB for that patient is then
calculated as TTB = 1 + 2.5 + 1.5 = 5. The maximum tolerated dose (MTD)
is defined as the dose level where the true TTB value is closest to the target
TTB value specified by the oncologists. To identify the MTD, Bekele and Thall
(2004) built a joint model for the toxicities under the Bayesian framework.
At each decision-making time, the dose that has the posterior expected TTB
closest to the target TTB is selected as the next level.
Similarly, Chen et al. (2010) proposed to map toxicity grades into a quasi-
continuous or continuous endpoint using the normalized equivalent toxicity
score (NETS). Lee et al. (2009) proposed the toxicity burden score (TBS)
to summarize toxicity using a weighted sum, where the severity weights were
estimated via regression based on historical data. Ezzalfani et al. (2013) pro-
posed another flexible toxicity endpoint, called the total toxicity profile (TTP),
which is computed as the Euclidean norm of the severity weights. Although
these composite scores (TTB, NETS, TBS, and TTP) are not strictly continu-
ous, in practice, after appropriate transformation, they can be approximately
regarded as a normally distributed endpoint.
as well as the standard binary endpoint (Mu et al., 2018). The second method
is the multiple-toxicity BOIN (MT-BOIN) design based on multiple toxicity
constraints (Lin, 2018). Unlike model-based designs, the decision rules of the
gBOIN and MT-BOIN designs can be tabulated prior to the onset of the trial,
making the trial conduct simple and straightforward, similar to the standard
BOIN design (with a binary endpoint).
where h(·), T (·), η(·), and A(·) are known functions, and θj is a distributional
parameter that can be scalar or vector, depending on the distribution of y
at dose dj . The exponential family of distributions includes many commonly
used distributions, such as normal, binomial, multinomial, Poisson, gamma,
and beta distributions. For example, define µ = E(y) and µj = E(y|dj ), then
• $y$ follows a Bernoulli distribution if $\theta_j = \mu_j$, $\eta(\theta_j) = \log\{\mu_j/(1 - \mu_j)\}$,
$A(\theta_j) = -\log(1 - \mu_j)$, $T(y) = y$, and $h(y) = 1$.
• $y$ follows a normal distribution if $\theta_j = \mu_j$, $\eta(\theta_j) = \mu_j/\sigma_j^2$, $A(\theta_j) = \mu_j^2/(2\sigma_j^2)$,
$T(y) = y$, and $h(y) = (2\pi\sigma_j^2)^{-1/2} \exp\{-y^2/(2\sigma_j^2)\}$.
Let φ denote the target value of µ for dose finding. For binary or quasi-
binary toxicity endpoints (DLT or ETS), φ is simply the target DLT proba-
bility; for continuous endpoints (e.g., the TTB, TBS, or TTP), φ is the target
value of the TTB, TBS, or TTP. Let Dj = (y1 , · · · , ynj ) denote the observed
toxicity data from nj patients treated at dose dj , and define the corresponding
sample mean
\[
\hat{\mu}_j = \sum_{i=1}^{n_j} y_i / n_j .
\]
For a binary or quasi-binary toxicity endpoint (DLT or ETS), µ̂j is the ob-
served toxicity rate at dose level j; and for continuous endpoints such as the
TTB or TTP, µ̂j is the sample mean of the observed TTB or TTP at dose
level j.
The condition $\Pr(\mu_j > \phi \mid D_j) > 0.95$ can be evaluated on the basis
of a beta-binomial model for the binary or quasi-binary endpoint, assuming
$\mu_j$ follows a vague beta prior, e.g., $\mu_j \sim \mathrm{Beta}(1, 1)$, leading to a posterior
$\mu_j \mid y_j, n_j \sim \mathrm{Beta}(y_j + 1, n_j - y_j + 1)$. For a normal endpoint $y$ with mean $\mu_j$
and variance $\sigma_j^2$, assuming the noninformative prior $(\mu_j, \sigma_j^2) \propto \sigma_j^{-2}$, the marginal
posterior distribution of $\mu_j$ is a Student's t distribution with degrees of
freedom $(n_j - 1)$, mean $\hat{\mu}_j$, and scale parameter $n_j^{-1} \sum_{i=1}^{n_j} (y_i - \hat{\mu}_j)^2$.
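For the binary endpoint with the Beta(1, 1) prior, the posterior tail probability has a closed form that needs no special functions, because the tail of a Beta distribution with integer parameters equals a binomial sum. The sketch below assumes an integer number of toxicities y (so it does not cover quasi-binary scores); the function names and the default minimum of three patients are illustrative choices.

```python
from math import comb


def prob_exceeds_target(y, n, phi):
    """Pr(mu_j > phi | y, n) under the posterior Beta(y + 1, n - y + 1).

    Identity used (integer y, n):
    Pr(Beta(y + 1, n - y + 1) > phi) = Pr(Binomial(n + 1, phi) <= y).
    """
    return sum(
        comb(n + 1, k) * phi**k * (1 - phi) ** (n + 1 - k)
        for k in range(y + 1)
    )


def eliminate(y, n, phi, cutoff=0.95, min_n=3):
    """True if the posterior tail exceeds the cutoff with at least min_n patients."""
    return n >= min_n and prob_exceeds_target(y, n, phi) > cutoff


# With phi = 0.3: 3 toxicities in 3 patients triggers the rule, 2 in 3 does not.
flags = [eliminate(3, 3, 0.3), eliminate(2, 3, 0.3)]
```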
denote stay (at the current dose), escalation, and de-escalation, respectively.
Under H0j , the correct decision is S, and incorrect decisions are S̄ = {E, D};
under H1j , the correct decision is E, and incorrect decisions are Ē = {S, D};
and under H2j , the correct decision is D, and incorrect decisions are D̄ =
{S, E}.
Taking a non-informative approach, the investigational dose dj is assumed
a priori to have an equal chance of being below, equal to, or above the target,
i.e., Pr(H0j ) = Pr(H1j ) = Pr(H2j ) = 1/3. Then the probability of making an
incorrect decision, denoted by α, is given by
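\[
\begin{aligned}
\alpha &= \Pr(H_{0j}) \Pr(\bar{S} \mid H_{0j}) + \Pr(H_{1j}) \Pr(\bar{E} \mid H_{1j}) + \Pr(H_{2j}) \Pr(\bar{D} \mid H_{2j}) \\
&= \frac{1}{3} \Pr\{\hat{\mu}_j \le \lambda_e(d_j, n_j, \phi) \text{ or } \hat{\mu}_j > \lambda_d(d_j, n_j, \phi)\} + \frac{1}{3} \Pr\{\hat{\mu}_j > \lambda_e(d_j, n_j, \phi)\} \\
&\quad + \frac{1}{3} \Pr\{\hat{\mu}_j \le \lambda_d(d_j, n_j, \phi)\}, \qquad (7.2)
\end{aligned}
\]
where each probability is evaluated under the corresponding hypothesis.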
Mu et al. (2018) derived the optimal values of λe (dj , nj , φ) and λd (dj , nj , φ)
that minimize the decision error (7.2).
Theorem 7.1 Let ϑk denote the model parameters under Hkj , k = 0, 1, 2.
The probability of making an incorrect decision is minimized by
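\[
\lambda_e = \frac{A(\vartheta_1) - A(\vartheta_0)}{\eta(\vartheta_1) - \eta(\vartheta_0)}, \qquad
\lambda_d = \frac{A(\vartheta_2) - A(\vartheta_0)}{\eta(\vartheta_2) - \eta(\vartheta_0)}, \qquad (7.3)
\]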
which are the same as the boundaries of the BOIN design (Liu and Yuan,
2015) for a standard binary toxicity endpoint.
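When $y$ is a continuous endpoint (e.g., TTB, NETS, TBS, and TTP)
following a normal distribution, we have $\vartheta_k = \phi_k$, $A(\vartheta_k) = \phi_k^2/(2\sigma_j^2)$, $\eta(\vartheta_k) = \phi_k/\sigma_j^2$. Then,
\[
\lambda_e = \frac{\phi + \phi_1}{2}, \qquad \lambda_d = \frac{\phi + \phi_2}{2}. \qquad (7.5)
\]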
Therefore, gBOIN generalizes the BOIN design to embrace various types of
endpoints.
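As a rough numerical check of (7.3), the generic boundary formula can be coded once and specialized by plugging in A(·) and η(·) for each distribution. The sketch below is illustrative only; the function names are hypothetical, and the Bernoulli and normal forms of A and η are those listed for the exponential family above.

```python
from math import log


def boundary(A, eta, theta0, theta_alt):
    """Generic boundary {A(theta_alt) - A(theta0)} / {eta(theta_alt) - eta(theta0)}."""
    return (A(theta_alt) - A(theta0)) / (eta(theta_alt) - eta(theta0))


def bernoulli_boundaries(phi, phi1, phi2):
    """(lambda_e, lambda_d) for a binary or quasi-binary endpoint."""
    A = lambda p: -log(1 - p)
    eta = lambda p: log(p / (1 - p))
    return boundary(A, eta, phi, phi1), boundary(A, eta, phi, phi2)


def normal_boundaries(phi, phi1, phi2, sigma2=1.0):
    """(lambda_e, lambda_d) for a normal endpoint; sigma2 cancels out."""
    A = lambda m: m**2 / (2 * sigma2)
    eta = lambda m: m / sigma2
    return boundary(A, eta, phi, phi1), boundary(A, eta, phi, phi2)


phi = 0.3
lam_e, lam_d = bernoulli_boundaries(phi, 0.6 * phi, 1.4 * phi)
norm_e, norm_d = normal_boundaries(5.0, 3.0, 7.0, sigma2=2.0)
```

For the normal case the variance cancels, so the boundaries reduce to the midpoints in (7.5) whatever value of sigma2 is supplied.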
Table 7.2 shows examples of the values of (λe , λd ) for different target values
of φ with φ1 = 0.6φ and φ2 = 1.4φ. It is remarkable that, regardless of the type
Theorem 7.2 The gBOIN design is long-term memory coherent in the sense
that the design will never escalate the dose when µ̂j > φ, and it will never
de-escalate the dose when µ̂j < φ.
Theorem 7.3 As the number of patients goes to infinity, the dose assignment
and the selection of MTD under the gBOIN design converge almost surely to
dose level j ∗ if dose level j ∗ is the only dose satisfying µj ∗ ∈ (λe , λd ). If there
are multiple dose levels in (λe , λd ), the design will converge almost surely to
one of these levels.
or higher toxicity). With the two toxicity constraints, MTD (i.e., target dose)
is defined as
\[
d_{j^\dagger} = \min\{d_{j_1^*}, d_{j_2^*}\}, \qquad (7.6)
\]
where
\[
d_{j_l^*} = \arg\min_{d_j \in \{d_1, \ldots, d_J\}} |\pi_{lj} - \phi^{(l)}|, \quad l = 1, 2.
\]
Here, $d_{j_l^*}$ is the dose that has the toxicity probability closest to the target
toxicity probability with respect to Yl . Depending on trial objectives, other
definitions of MTD can also be used when appropriate.
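The definition in (7.6) can be illustrated with a short sketch: find, for each toxicity type, the dose closest to its target, then take the lower of the two. The true toxicity probabilities and targets below are hypothetical, the function names are illustrative, and ties are broken in favor of the lower dose as a simplifying assumption.

```python
def closest_dose(probs, target):
    """0-based index of the dose whose toxicity probability is closest to target."""
    return min(range(len(probs)), key=lambda j: abs(probs[j] - target))


def target_dose(pi1, pi2, phi_1, phi_2):
    """Index of the MTD in (7.6): the lower of the two toxicity-specific doses."""
    return min(closest_dose(pi1, phi_1), closest_dose(pi2, phi_2))


# Hypothetical true probabilities for the two toxicity types at five doses.
pi1 = [0.05, 0.10, 0.20, 0.30, 0.45]
pi2 = [0.10, 0.25, 0.40, 0.55, 0.70]
mtd_index = target_dose(pi1, pi2, phi_1=0.30, phi_2=0.25)
```

Here the first toxicity alone would point to the fourth dose, but the second toxicity restricts the target to the second dose (index 1).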
Suppose that at dose dj , a total of ylj out of nj patients have experienced
the toxicity event associated with Yl . The observed toxicity rate of Yl at dj is
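• If $\hat{\pi}_{1j} > \lambda_d^{(1)}$ or $\hat{\pi}_{2j} > \lambda_d^{(2)}$, de-escalate the dose to level $j - 1$.
• Otherwise, i.e., $\hat{\pi}_{1j} \le \lambda_d^{(1)}$, $\lambda_e^{(2)} < \hat{\pi}_{2j} \le \lambda_d^{(2)}$ or $\lambda_e^{(1)} < \hat{\pi}_{1j} \le \lambda_d^{(1)}$, $\hat{\pi}_{2j} \le \lambda_d^{(2)}$, stay at the current dose level $j$.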
3. Repeat Step 2 until the maximum sample size N is reached. At that
point, perform the isotonic regression (Barlow et al., 1972) to the
estimated toxicity rates {π̂lj , j = 1, . . . , J}, so that the isotonically
transformed estimator {π̃lj , j = 1, . . . , J} satisfies the monotonic
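constraint, and then select the overall MTD $\hat{j}^\dagger$ as
\[
d_{\hat{j}^\dagger} = \min\{d_{\hat{j}_1^*}, d_{\hat{j}_2^*}\},
\]
where $d_{\hat{j}_l^*} = \arg\min_{d_j \in \mathcal{N}} |\tilde{\pi}_{lj} - \phi^{(l)}|$, and the set $\mathcal{N} = \{d_j : n_j > 0\}$
contains all the doses that have been tested in the trial.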
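The dose-transition rule and the final selection step can be sketched as follows. This is an illustration under stated assumptions rather than the published implementation: the escalation condition (both observed rates at or below their escalation boundaries) is inferred from the de-escalation and stay rules, the overall MTD is taken as the lower of the two toxicity-specific selections by analogy with (7.6), ties are resolved toward the lower dose, and all function names are hypothetical.

```python
def mtboin_decision(rate1, rate2, lam_e, lam_d):
    """Decision from observed rates; lam_e and lam_d are pairs, one per toxicity."""
    if rate1 > lam_d[0] or rate2 > lam_d[1]:
        return "de-escalate"
    if rate1 <= lam_e[0] and rate2 <= lam_e[1]:  # inferred escalation rule
        return "escalate"
    return "stay"


def pava(values, weights):
    """Weighted pool-adjacent-violators fit of a non-decreasing sequence."""
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool neighbors while the monotone constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def select_mtd(y1, y2, n, phi_1, phi_2):
    """0-based index of the selected MTD among the doses with n_j > 0."""
    tried = [j for j, nj in enumerate(n) if nj > 0]
    picks = []
    for y, phi in ((y1, phi_1), (y2, phi_2)):
        rates = [y[j] / n[j] for j in tried]
        iso = pava(rates, [n[j] for j in tried])
        best = min(range(len(tried)), key=lambda k: abs(iso[k] - phi))
        picks.append(tried[best])
    return min(picks)


# Hypothetical data: toxicity counts of each type and patients per dose.
selected = select_mtd([0, 1, 2, 0], [0, 2, 4, 0], [3, 6, 6, 0], 0.3, 0.3)
```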
For the purpose of overdose control, the MT-BOIN design imposes a dose
elimination rule similar to BOIN as follows:
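\[
H_{0j}: (\pi_{1j}, \pi_{2j}) \in \Theta_{0j}, \quad H_{1j}: (\pi_{1j}, \pi_{2j}) \in \Theta_{1j}, \quad H_{2j}: (\pi_{1j}, \pi_{2j}) \in \Theta_{2j}, \qquad (7.7)
\]
where the parameter set $\Theta_{ij}$, $i = 0, 1, 2$, is defined as
\[
\begin{aligned}
\Theta_{0j} &= \left\{ (\phi^{(1)}, \phi^{(2)}), (\phi^{(1)}, \phi_1^{(2)}), (\phi_1^{(1)}, \phi^{(2)}) \right\}, \\
\Theta_{1j} &= \left\{ (\phi_1^{(1)}, \phi_1^{(2)}) \right\}, \\
\Theta_{2j} &= \left\{ (\phi_2^{(1)}, \phi_2^{(2)}), (\phi_2^{(1)}, \phi^{(2)}), (\phi_2^{(1)}, \phi_1^{(2)}), (\phi^{(1)}, \phi_2^{(2)}), (\phi_1^{(1)}, \phi_2^{(2)}) \right\}.
\end{aligned}
\]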
Unlike the standard BOIN design, which involves only three point hypotheses,
each hypothesis $H_{ij}$, $i = 0, 1, 2$, of the MT-BOIN design has a hierarchical
structure and includes a set of multiple point sub-hypotheses, and is thus
composite. In particular, $\Theta_{1j}$ indicates that both toxicity rates for
$Y_1$ and $Y_2$ at dose level $j$ are overly small; hence, the hypothesis $H_{1j}$
corresponds to the subtherapeutic case. $\Theta_{2j}$ consists of the combinations
in which at least one toxicity rate is excessively high, so $H_{2j}$ means that dose
level $j$ is not safe. Similarly, $H_{0j}$ indicates that the dose is within the target
zone. Note that under $H_{0j}$, a dose level with the joint toxicity probabilities
$(\pi_{1j}, \pi_{2j}) = (\phi^{(1)}, \phi_1^{(2)})$ (or $(\pi_{1j}, \pi_{2j}) = (\phi_1^{(1)}, \phi^{(2)})$) is treated as proper dosing
instead of underdosing. This is because the monotone increasing dose–toxicity
relationship implies that $\phi^{(1)} < \pi_{1,j+1}$ (or $\phi^{(2)} < \pi_{2,j+1}$); as a result, dose level
$j + 1$ might be overly toxic in terms of toxicity $Y_1$ (or $Y_2$).
Let Oj = (y1j , y2j , nj ) be the local data observed at dose level j. De-
note the prior model probability Pr(Hij ) by ωij . For simplicity, the uniform
prior ωij = 1/3 is considered, i = 0, 1, 2. The correct dose-assignment de-
cisions under H0j , H1j , and H2j are S, E, and D, respectively. To account
for the composite nature of the three hypotheses, Lin (2018) defined αmax ,
the maximum probability of making incorrect decisions with multiple toxicity
outcomes, as follows:
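\[
\begin{aligned}
\alpha_{\max} &= \omega_{1j} \Pr(S \text{ or } D \mid H_{1j}) + \omega_{0j} \Pr(D \text{ or } E \mid H_{0j}) + \omega_{2j} \Pr(E \text{ or } S \mid H_{2j}) \\
&= \omega_{1j} \sum_{y_{1j}=0}^{n_j} \sum_{y_{2j}=0}^{n_j} I(\hat{\pi}_{1j} > \lambda_e^{(1)} \text{ or } \hat{\pi}_{2j} > \lambda_e^{(2)}) \max_{(\pi_{1j}, \pi_{2j}) \in \Theta_{1j}} f(O_j; \pi_{1j}, \pi_{2j}) + \cdots,
\end{aligned}
\]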
where f (Oj ; π1j , π2j ) is the joint binomial likelihood function for Y1 and Y2 .
Due to the small sample size of the phase I dose-finding trials, the incorpo-
ration of the correlation between the two toxicity outcomes does not neces-
sarily improve the performance of the design. Therefore, the joint likelihood
f (Oj ; π1j , π2j ) can be simply treated as the product of independent binomial
likelihood functions. Note that even when the two outcomes are correlated,
the marginal estimate of the toxicity rate under the working independence
assumption is still consistent.
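Under the working independence assumption, the joint likelihood and the first (H1j) term of the error probability are straightforward to evaluate by enumeration. The sketch below is illustrative; the function names are hypothetical, and because Θ1j contains a single point, the maximization over Θ1j reduces to evaluating the likelihood at that point.

```python
from math import comb


def binom_pmf(y, n, p):
    """Binomial probability of y events among n patients."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def joint_lik(y1, y2, n, p1, p2):
    """Working-independence likelihood f(O_j; pi_1j, pi_2j)."""
    return binom_pmf(y1, n, p1) * binom_pmf(y2, n, p2)


def prob_not_escalate_h1(n, phi1_1, phi1_2, lam_e_1, lam_e_2):
    """Pr(S or D | H_1j): sum the likelihood over outcomes that block escalation."""
    total = 0.0
    for y1 in range(n + 1):
        for y2 in range(n + 1):
            if y1 / n > lam_e_1 or y2 / n > lam_e_2:
                total += joint_lik(y1, y2, n, phi1_1, phi1_2)
    return total


# Hypothetical values: three patients, both lower reference rates equal to 0.18.
err_h1 = prob_not_escalate_h1(3, 0.18, 0.18, 0.236, 0.236)
```

With three patients, escalation requires zero events of both types, so the result equals one minus the probability that no patient has either toxicity.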
Based on the minimax theory, Lin (2018) obtained explicit expressions
of the optimal interval boundaries $\lambda_e^{(l)}(d_j, n_j, \phi^{(l)})$ and $\lambda_d^{(l)}(d_j, n_j, \phi^{(l)})$ for
non-nested toxicities by minimizing $\alpha_{\max}$.
Theorem 7.4 If uniform prior model probabilities are considered, the optimal
interval boundaries obtained by minimizing the maximum incorrect probability
(7.8) for the toxicity Yl are exactly the same as those of standard BOIN by
treating Yl alone, that is, for l = 1, 2,
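\[
\lambda_e^{(l)} = \frac{\log\left(\dfrac{1 - \phi_1^{(l)}}{1 - \phi^{(l)}}\right)}{\log\left(\dfrac{\phi^{(l)}(1 - \phi_1^{(l)})}{\phi_1^{(l)}(1 - \phi^{(l)})}\right)}, \qquad
\lambda_d^{(l)} = \frac{\log\left(\dfrac{1 - \phi^{(l)}}{1 - \phi_2^{(l)}}\right)}{\log\left(\dfrac{\phi_2^{(l)}(1 - \phi^{(l)})}{\phi^{(l)}(1 - \phi_2^{(l)})}\right)}. \qquad (7.8)
\]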
FIGURE 7.2: Specify doses, sample size, and convergence stopping rule.
maximum sample size of the 3+3 design) as the total sample size, where J is
the number of doses. This sample size generally yields reasonable operating
characteristics (e.g., 50-70% correct selection percentage of the true MTD).
Section 3.6 provides more guidance on the determination of cohort size.
To reduce the sample size, it is often useful to use the “convergence”
stopping rule: stop the trial early when m patients have been assigned to a
dose and the decision is to stay at that dose. This stopping criterion suggests
that the dose finding approximately converges to MTD, thus the trial can be
stopped. We recommend m = 9 or larger. In this trial, m = 12 is used. Because
of the early stopping rule, the actual sample size used in the trial is often
smaller than the prespecified maximum sample size N . The saving depends
on the true dose–toxicity scenario and can be evaluated using simulation.
Usually, the saving in the sample size is more prominent when the true MTD
is near the starting dose.
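The convergence stopping rule amounts to a simple check at each interim decision. In this sketch the function name is hypothetical, "stay" is taken to mean that the observed toxicity rate lies in the interval (λe, λd], and the boundary values in the example are the approximate standard values for φ = 0.3.

```python
def convergence_stop(n_at_dose, mean_tox, lam_e, lam_d, m=9):
    """True when at least m patients are at the dose and the decision is to stay."""
    stay = lam_e < mean_tox <= lam_d
    return n_at_dose >= m and stay


# With m = 12: 4 events in 12 patients (rate 0.333) stops the trial.
stop_now = convergence_stop(12, 4 / 12, lam_e=0.236, lam_d=0.358, m=12)
```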
Target and Equivalent DLT As shown in Figure 7.3, the target DLT proba-
bility is φ = 0.3. When appropriate, the accelerated titration can be chosen to
speed up dose escalation and reduce the total sample size. See Section 3.6 for
more discussion on choosing the target and pros and cons of the accelerated
titration.
For gBOIN, the new design parameter is the “Equivalent Number of DLT,”
which maps toxicity grades to numeric scores that reflect their relative severity
in the unit of DLT. This is the “Toxicity score” approach discussed in Section
7.1. For example, in Figure 7.3, grade 3 or higher toxicity is defined as a DLT;
grade 2 toxicity is regarded as equivalent to 0.5 DLT, while grade 1 toxicity is
regarded as acceptable and should not affect decisions of dose transition and
MTD determination. The toxicity score should be elicited from clinicians and
reflects the relative severity of different toxicity grades.
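The mapping from grades to equivalent DLTs, and the resulting observed toxicity rate at a dose, can be sketched as below. The scores follow the example just described (grade 2 counts as 0.5 DLT, grade 3 or higher as one DLT, lower grades as zero). Treating each patient's worst grade as the scored outcome is an assumption of this sketch, and the function names and patient data are hypothetical.

```python
def equivalent_dlt(grade, score_by_grade=None):
    """Map one patient's toxicity grade to an equivalent number of DLTs."""
    if score_by_grade is None:
        # Example scores: grade 2 -> 0.5 DLT, grade 3 or higher -> 1 DLT.
        score_by_grade = {0: 0.0, 1: 0.0, 2: 0.5, 3: 1.0, 4: 1.0, 5: 1.0}
    return score_by_grade[grade]


def observed_toxicity_rate(grades):
    """Sample mean of the equivalent-DLT scores at one dose."""
    scores = [equivalent_dlt(g) for g in grades]
    return sum(scores) / len(scores)


# Hypothetical worst grades of six patients treated at one dose.
grades_at_dose = [0, 1, 2, 2, 3, 0]
rate = observed_toxicity_rate(grades_at_dose)  # (0.5 + 0.5 + 1) / 6
```

The resulting rate plays the role of the sample mean that is compared with the escalation and de-escalation boundaries.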
Overdose Control This panel (see Figure 7.4) specifies the overdose con-
trol rule, and it is the same as that of standard BOIN. Section 3.6 provides
guidance on how to set up the parameters. After completing the specification
of trial parameters, the decision table will be generated by clicking the “Get
Decision Table” button. The decision table will be automatically included in
the protocol template in Step 3, but can also be saved as a separate csv, Excel
or pdf file in this step if needed.
FIGURE 7.6: Operating characteristics of the gBOIN design for the solid
tumor trial.
probability at each dose, and a 95% credible interval (see Figure 7.7). Given
the data entered, dose level three is selected as MTD. In contrast, if we
considered only DLT, ignoring grade 2 toxicities, then dose level four or five
would be selected as MTD.
FIGURE 7.7: MTD estimation at the completion of the solid tumor trial based
on a hypothetical dataset.
8
Finding Optimal Biological Dose
8.1 Introduction
Conventionally, the primary objective of phase I oncology trials is to establish
the maximum tolerated dose (MTD). This more-is-better paradigm is based
on the monotonicity assumption that a higher dose leads to higher toxicity and
also higher efficacy, which typically holds for conventional chemotherapies.
The advent of novel targeted therapy and immunotherapy, such as check-
point inhibitors and chimeric antigen receptor (CAR) T-cell therapy, has rev-
olutionized cancer treatment. For these novel therapies, although toxicity typ-
ically increases with the dose, efficacy may plateau or even decrease at high
doses (Cook et al., 2015; Sachs et al., 2016; Shah et al., 2021). In addition,
some targeted therapies demonstrate minimal toxicity in the therapeutic dose
range, making MTD unlikely to be reached (Mathijssen et al., 2014). In these
cases, the conventional more-is-better paradigm, ignoring efficacy data, does
not depict the underlying setting and may result in undesirable consequences
(Shah et al., 2021). As a result, the FDA Oncology Center of Excellence recently
initiated Project Optimus “to reform the dose optimization and dose selection
paradigm in oncology drug development” (FDA, 2022).
To illustrate the issue, consider five doses of a targeted or immunother-
apy agent, d1 < · · · < d5 . Suppose that the true toxicity probabilities for the
five doses are (πT 1 , . . . , πT 5 ) = (0.05, 0.12, 0.27, 0.35, 0.50). If the true efficacy
probabilities are (πE1 , . . . , πE5 ) = (0.20, 0.35, 0.36, 0.37, 0.38), then efficacy
reaches a plateau of about 0.35 at dose d2 . All dose-finding methods that consider
toxicity only, with a target toxicity probability of 0.3 or 0.25, are most
likely to select d3 as MTD and use it for cohort expansion or a subsequent
phase II trial. However, d2 is obviously more desirable than MTD (i.e., d3 )
with much lower toxicity and virtually identical efficacy. Any “toxicity-only”
phase I method cannot determine this because it ignores efficacy.
Furthermore, suppose that the toxicity rates are the same but the true effi-
cacy probabilities are (πE1 , . . . , πE5 ) = (0.01, 0.05, 0.30, 0.60, 0.60). Escalating
from d3 to d4 only increases the toxicity probability from 0.27 to 0.35, but
doubles the efficacy probability from 0.3 to 0.6. Often this small increase in
toxicity may be considered a reasonable trade-off for the large increase in
efficacy, favoring d4 over d3 ; however, toxicity-only methods cannot
determine this.
Lastly, if the agent is ineffective for all doses, with true efficacy probabilities
(πE1 , . . . , πE5 ) = (0.00, 0.01, 0.01, 0.02, 0.02), the best decision is to not choose
any dose, but the toxicity-based methods still are most likely to choose d3 .
These examples demonstrate the deficiency of choosing a “best” dose based
only on toxicity (e.g., MTD), ignoring efficacy, which may lead to the failure
of subsequent phase II or III trials.
[Figure: trial flowchart. Collect both efficacy and toxicity data. If all doses are overly toxic and/or futile, terminate the trial early. Otherwise, if the maximum sample size is reached, stop the trial and select the optimal dose; if not, update dose acceptability and dose desirability by including the most recent data.]
Yin, 2011a; Cai et al., 2014; Guo and Li, 2015), dose–schedule optimization
(Guo et al., 2016; Lin et al., 2020a, 2021), ordinal toxicity and efficacy out-
comes (Houede et al., 2010; Lee et al., 2016), personalized dose-finding based
on biomarkers (Guo and Yuan, 2017), and immunotherapy considering tox-
icity, efficacy and immune response (Liu et al., 2018), among others. For a
comprehensive review on model-based phase I–II designs, see Mandrekar et al.
(2010) and Yuan et al. (2016b).
These model-based phase I–II designs are statistically complicated and
computationally intensive. To account for toxicity and efficacy, the model
used by the designs is significantly more complex than that of the conven-
tional phase I trial design (e.g., CRM), as illustrated in the next section.
Highly complicated and structured parametric models also make the design
susceptible to model misspecification. As a result, the model-based phase I–II
designs are often regarded by practitioners as difficult to understand and
implement, severely limiting their use in practice.
Model-assisted designs have been proposed to simplify the implementation
of phase I–II trials, while yielding performance comparable to or even better
than model-based designs. As model-assisted designs do not assume any
parametric dose-toxicity and -efficacy curves, they are also more robust than
model-based designs. Lin and Yin (2017b) developed a toxicity-efficacy inter-
val based phase I–II design (called STEIN) using the pool-adjacent-violators
algorithm and model averaging. Li et al. (2017) proposed a toxicity and effi-
cacy probability interval design that separately models toxicity and efficacy.
Takeda et al. (2018) presented a Bayesian optimal interval design that ac-
commodates both efficacy and toxicity endpoints, known as BOIN-ET. Zhou
et al. (2019) developed a two-stage utility-based Bayesian optimal interval (U-
BOIN) design using a Dirichlet-multinomial model to jointly model toxicity
and efficacy. Lin et al. (2020b) adopted a quasi-binomial likelihood approach
to model the observed utility directly, and developed the BOIN12 design. Shi
et al. (2021) developed a utility-based toxicity probability interval (uTPI)
design as an extension of the toxicity-only keyboard design (Yan et al., 2017).
In the following sections, we first introduce the EffTox design as an example
to illustrate the characteristics of model-based designs, and then we describe
several model-assisted phase I–II designs, including the BOIN12, U-BOIN, and
uTPI designs. Other model-assisted designs such as STEIN and BOIN-ET will
be briefly discussed at the end of this chapter.
probability increases with the dose, i.e., πT 1 < · · · < πT J . The (marginal)
efficacy probability πE1 , · · · , πEJ , however, does not necessarily increase with
the dose, which may plateau or even decrease at higher doses. The EffTox
design uses an efficacy–toxicity trade-off contour as the criterion to define and
select OBD (Thall and Cook, 2004).
where r > 0 controls the degree of the curvature of the trade-off contours
(Thall and Cook, 2004). The value of r is determined by solving the equation
\[
\psi(\pi_E^{(3)}, \pi_T^{(3)}) = \psi(\pi_E^{(1)}, 0) = \psi(1, \pi_T^{(2)}).
\]
Cδ = {(πE , πT ) : ψ(πE , πT ) = δ} .
That is, all (πE , πT ) pairs in Cδ have the same utility value δ.
Figure 8.2 gives an example for the efficacy–toxicity trade-off contours
based on the function (8.1). The utility is standardized between 0 and 1, with
the right bottom corner representing the most desirable case with a utility
of 1, where efficacy is certain and toxicity is impossible, and the left upper
corner representing the least desirable case with a utility of 0, where efficacy
is impossible and toxicity is certain. All (πE , πT ) pairs on each contour are
equally desirable. The utilities of the contours increase moving from upper
left to lower right, as πT becomes smaller and πE becomes larger.
[Figure 8.2: efficacy–toxicity trade-off contours. Horizontal axis: probability of efficacy; vertical axis: probability of toxicity (0.0 to 1.0). Contours are labeled with utilities 0.15, 0.3, 0.45, 0.6, 0.75, and 0.9; doses d1, d2, and d3 are marked.]
where
with βT > 0 to ensure that the marginal toxicity probability πT j increases with
the dose. There is no restriction on βE1 and βE2 , so that the dose–efficacy
curve can take various shapes. To induce correlation between efficacy and
toxicity, a Gumbel-Morgenstern copula is used to model the joint distribution
of yE and yT ,
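\[
\begin{aligned}
f(y_T, y_E, d_j; \theta) ={}& \pi_{Ej}^{y_E} \{1 - \pi_{Ej}\}^{1 - y_E} \, \pi_{Tj}^{y_T} \{1 - \pi_{Tj}\}^{1 - y_T} \\
&+ (-1)^{y_T + y_E} \, \pi_{Ej} \{1 - \pi_{Ej}\} \, \pi_{Tj} \{1 - \pi_{Tj}\} \, \frac{e^{\psi} - 1}{e^{\psi} + 1},
\end{aligned}
\]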
where θ = (µE , βE1 , βE2 , µT , βT , ψ) is the vector of unknown parameters,
and ψ is a real-valued association parameter. Denoting the data of the first
n patients in the trial by $D_n = \{y_{Ti}, y_{Ei}, d_{[i]}\}_{i=1}^{n}$ with $d_{[i]} \in \{d_1, \ldots, d_J\}$
indicating the dose that patient i has received, the joint likelihood is given by
\[
L(D_n \mid \theta) = \prod_{i=1}^{n} f(y_{Ti}, y_{Ei}, d_{[i]}; \theta).
\]
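The copula density and the likelihood can be evaluated directly once the marginal probabilities at each dose are available. The sketch below takes those marginal probabilities as inputs rather than computing them from the regression parameters, which is a simplification; the function names and the data layout are hypothetical.

```python
from math import exp, log


def gm_copula_pmf(y_t, y_e, pi_t, pi_e, psi):
    """Joint probability of (y_T, y_E) under the Gumbel-Morgenstern copula."""
    indep = (pi_e**y_e * (1 - pi_e) ** (1 - y_e)
             * pi_t**y_t * (1 - pi_t) ** (1 - y_t))
    assoc = ((-1) ** (y_t + y_e) * pi_e * (1 - pi_e) * pi_t * (1 - pi_t)
             * (exp(psi) - 1) / (exp(psi) + 1))
    return indep + assoc


def log_likelihood(data, pi_t_by_dose, pi_e_by_dose, psi):
    """log L(D_n | theta) for data given as (y_T, y_E, dose index) triples."""
    return sum(
        log(gm_copula_pmf(y_t, y_e, pi_t_by_dose[d], pi_e_by_dose[d], psi))
        for y_t, y_e, d in data
    )


# The four cell probabilities at one dose sum to one for any psi.
cells = [gm_copula_pmf(a, b, 0.2, 0.4, 1.0) for a in (0, 1) for b in (0, 1)]
```

Setting psi to zero removes the association term, so the joint probability reduces to the product of the two marginal Bernoulli probabilities.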
(1) The first cohort of patients is treated at the lowest dose, or the
starting dose specified by the clinicians.
(2) For each subsequent cohort after the first cohort, we fit the dose-toxicity
and dose-efficacy models based on the observed data Dn , and obtain the ad-
missible dose set Ã(Dn ) = A(Dn ) ∪ {dj }, where dj is the lowest untried dose
that has acceptable toxicity. If there is no untried dose or the untried doses
are all overly toxic, then Ã(Dn ) = A(Dn ).
(3) If Ã(Dn ) is empty, the trial should be terminated and no dose is
selected; otherwise, the next cohort is treated at the most desirable dose
dj ∈ Ã(Dn ) that has the largest estimate of the efficacy-toxicity trade-off
ψ(πEj , πT j ), subject to the constraint that no untried dose may be skipped
when escalating. Repeat Steps (2) and (3) if the trial is not terminated.
(4) Stop the trial when the maximum sample size N is reached. If A(DN )
is not empty, select the OBD as the dose dj ∗ ∈ A(DN ) that maximizes the
efficacy-toxicity trade-off ψ(πEj ∗ , πT j ∗ ).
TABLE 8.1: Utility table for binary toxicity and efficacy endpoints yT and
yE .
Efficacy (yE )
Toxicity (yT ) Yes (= 1) No (= 0)
No (= 0) υ01 = 100 υ00 = 40
Yes (= 1) υ11 = 60 υ10 = 0
where πT = p11 + p10 and πE = p01 + p11 are the marginal probabilities of
toxicity and efficacy at dose d, respectively. Therefore, the risk-benefit trade-
off approach based on the marginal toxicity and efficacy probabilities,
u0 = πE − wπT , (8.3)
is a special case of (8.2) with w = υ00 /υ11 and υ00 + υ11 = 100. For example,
the utility shown in Table 8.1 satisfies υ00 +υ11 = 100, and thus it is equivalent
to the simplified trade-off
u0 = πE − (2/3) πT . (8.4)
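The link between the utility table and the marginal trade-off can be verified numerically. The sketch below assumes that the mean utility is the probability-weighted sum of the entries of Table 8.1; the joint probabilities are hypothetical, the function names are illustrative, and the five-dose toxicity and efficacy probabilities are those of the first example in Section 8.1.

```python
def mean_utility(p, util):
    """Probability-weighted utility; keys are (toxicity, efficacy) pairs."""
    return sum(util[k] * p[k] for k in util)


def marginal_tradeoff(pi_e, pi_t, w):
    """Simplified trade-off u0 = pi_E - w * pi_T."""
    return pi_e - w * pi_t


util = {(0, 1): 100, (0, 0): 40, (1, 1): 60, (1, 0): 0}  # Table 8.1
w = util[(0, 0)] / util[(1, 1)]                          # 40 / 60 = 2/3

# Hypothetical joint probabilities at one dose.
p = {(0, 1): 0.30, (0, 0): 0.45, (1, 1): 0.10, (1, 0): 0.15}
pi_t = p[(1, 1)] + p[(1, 0)]
pi_e = p[(0, 1)] + p[(1, 1)]
u = mean_utility(p, util)
u0 = marginal_tradeoff(pi_e, pi_t, w)
# The two scales are affinely related: u = util_00 + util_11 * u0.
assert abs(u - (util[(0, 0)] + util[(1, 1)] * u0)) < 1e-9

# Rank the five doses of the plateau example by the marginal trade-off.
tox = [0.05, 0.12, 0.27, 0.35, 0.50]
eff = [0.20, 0.35, 0.36, 0.37, 0.38]
scores = [marginal_tradeoff(e, t, w) for e, t in zip(eff, tox)]
best_dose = max(range(5), key=lambda j: scores[j]) + 1  # 1-based dose level
```

Because the two scales differ only by an affine transformation, they rank doses identically; in the plateau example the highest score falls on d2.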
TABLE 8.2: Utility table for trinary efficacy and toxicity endpoints
Efficacy
Toxicity CR/PR SD PD
Minor 100 60 35
Moderate 65 30 25
Severe 30 15 0
PD: progressive disease, SD: stable disease, PR: partial response, CR:
complete response.
efficacy), respectively. Here, n = y01 + y00 + y11 + y10 is the number of patients
treated at dose d. The numbers of patients who have experienced toxicity and
efficacy are nT = y11 + y10 and nE = y01 + y11 , respectively. Subscript j (i.e.,
dose level) is suppressed in this and next sections for notational brevity.
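Under the Bayesian paradigm, the U-BOIN design assumes a Dirichlet-
multinomial model,
(y01 , y00 , y11 , y10 ) | (p01 , p00 , p11 , p10 ) ∼ Multinomial(n; p01 , p00 , p11 , p10 ),
(p01 , p00 , p11 , p10 ) ∼ Dirichlet(α01 , α00 , α11 , α10 ),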
where (α01 , α00 , α11 , α10 ) are hyperparameters, representing the prior numbers
of events for four outcomes, with the total prior sample size n0 = α01 + α00 +
α11 + α10 . The posterior distribution of (p01, p00 , p11 , p10 ) based on D is
(p01, p00 , p11 , p10 ) | D ∼ Dirichlet(α01 + y01 , α00 + y00 , α11 + y11 , α10 + y10 ).
πT | D ∼ Beta(αT + nT , βT + n − nT ), (8.6)
πE | D ∼ Beta(αE + nE , βE + n − nE ),
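As a minimal sketch of the conjugate update, the posterior mean utility can be computed directly from the four cell counts. The utility scores are those of Table 8.1; the prior α01 = α00 = α11 = α10 = 0.25 (so n0 = 1) is an assumption made here for illustration and is not stated in this excerpt.

```python
# Utility scores from Table 8.1; keys are "<toxicity><efficacy>"
UTILITY = {"01": 100.0, "00": 40.0, "11": 60.0, "10": 0.0}


def posterior_mean_utility(counts, prior=None):
    """Posterior mean utility under a Dirichlet-multinomial model.

    counts: dict mapping "01", "00", "11", "10" to observed patient counts.
    prior:  dict of Dirichlet hyperparameters (assumed 0.25 each if omitted).
    """
    if prior is None:
        prior = {cell: 0.25 for cell in UTILITY}  # assumed prior, n0 = 1
    post = {cell: prior[cell] + counts.get(cell, 0) for cell in UTILITY}
    total = sum(post.values())
    # The posterior mean of each cell probability is its parameter over the total
    return sum(UTILITY[cell] * post[cell] / total for cell in UTILITY)


# Three patients, none with toxicity or efficacy
print(posterior_mean_utility({"00": 3}))
```

With this assumed prior, three patients with no toxicity and no efficacy give 42.5, which coincides with the corresponding entry of Table 8.3; the prior actually used for that table may differ.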
[Figure: flowchart of the two-stage U-BOIN design. The trial starts at the pre-specified starting dose and checks whether the number of patients on any dose has reached s1 . In Stage II, data from both Stages I and II are used to estimate dose utility and determine the admissible set A(D); if no dose is admissible, the trial stops and no dose is selected. If the toxicity rate at the highest tried dose j∗ satisfies π̂T j∗ ≤ λe , the dose is escalated to j∗ + 1; otherwise the next cohort of patients is treated at the admissible dose with the largest estimated utility.]
B2 Update the admissible dose set A(D) based on the interim data D.
If no dose is admissible, terminate the trial and no dose should be
selected as OBD. Otherwise, use one of the following strategies to
allocate the next cohort of patients:
(i) Equally randomize the next cohort of patients to a dose in
A(D).
(ii) Assign the next cohort of patients to the admissible dose that
has the largest posterior mean utility û.
(iii) Adaptively randomize the next cohort of patients to a dose in
A(D) with a probability proportional to its û.
B3 Repeat Steps B1 and B2 until the maximum sample size N is ex-
hausted or the number of patients treated at one of the doses reaches
s2 (> s1 ), and then select the OBD as the admissible dose (i.e.,
∈ A(D)) that has the largest posterior mean utility.
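The three allocation strategies in Step B2 differ only in the probabilities they assign to the admissible doses, which the following sketch makes explicit. The function names and the strategy labels ("equal", "pick_winner", "adaptive") are hypothetical, and the fallback to equal randomization when all utilities are zero is an assumption.

```python
import random


def allocation_probs(u_hat, strategy):
    """Allocation probabilities over admissible doses.

    u_hat: dict mapping admissible dose level -> posterior mean utility.
    strategy: "equal" (i), "pick_winner" (ii) or "adaptive" (iii).
    """
    doses = sorted(u_hat)
    if not doses:
        raise ValueError("no admissible dose: the trial should be terminated")
    if strategy == "equal":
        return {d: 1.0 / len(doses) for d in doses}
    if strategy == "pick_winner":
        best = max(doses, key=lambda d: u_hat[d])
        return {d: 1.0 if d == best else 0.0 for d in doses}
    if strategy == "adaptive":
        total = sum(u_hat[d] for d in doses)
        if total <= 0:  # assumed fallback, not part of the source
            return {d: 1.0 / len(doses) for d in doses}
        return {d: u_hat[d] / total for d in doses}
    raise ValueError(f"unknown strategy: {strategy}")


def next_dose(u_hat, strategy, rng=random):
    """Draw the dose for the next cohort according to the chosen strategy."""
    probs = allocation_probs(u_hat, strategy)
    doses = list(probs)
    return rng.choices(doses, weights=[probs[d] for d in doses], k=1)[0]


print(allocation_probs({2: 60.0, 3: 40.0}, "adaptive"))
```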
In Stage I, following the BOIN design, we impose an overdose control rule as
follows: if Pr(πT j > φT | Dj ) > 0.95 and nj ≥ 3, dose level j and higher are
eliminated from the trial, and the trial is terminated if the lowest dose level
is eliminated, where Pr(πT j > φT | Dj ) > 0.95 is evaluated based on a beta-
binomial model with the uniform prior. Once the trial moves on to Stage II,
this overdose control rule is seamlessly merged as the safety criterion of the
inadmissible rule defined in Section 8.3.3.
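The overdose control rule is easy to evaluate exactly. With a uniform prior the posterior of πT j is Beta(nT + 1, n − nT + 1), and for integer parameters the beta tail probability equals a binomial cumulative probability, so no special functions are needed. The sketch below relies on that identity; the function names are illustrative.

```python
from math import comb


def prob_toxicity_exceeds(n_tox, n, phi_t):
    """Pr(pi_T > phi_t | data) under a uniform Beta(1, 1) prior.

    The posterior is Beta(n_tox + 1, n - n_tox + 1). For integer parameters,
    Pr(Beta(a, b) > x) = Pr(Binomial(a + b - 1, x) <= a - 1).
    """
    m = n + 1
    return sum(
        comb(m, k) * phi_t**k * (1.0 - phi_t) ** (m - k)
        for k in range(n_tox + 1)
    )


def eliminate_dose(n_tox, n, phi_t, cutoff=0.95, min_n=3):
    """True if the dose (and all higher doses) should be eliminated."""
    return n >= min_n and prob_toxicity_exceeds(n_tox, n, phi_t) > cutoff


for n_tox, n in [(2, 3), (3, 3), (3, 6), (4, 6)]:
    print(n_tox, n, eliminate_dose(n_tox, n, phi_t=0.3))
```

For φT = 0.3 this eliminates a dose at 3 toxicities out of 3 patients or 4 out of 6, but not at 2 out of 3 or 3 out of 6.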
For Stage II Step B1, the reason that we perform dose escalation when
π̂T j ∗ ≤ λe is to allow the trial to continue exploring the dose space, given
that the highest tried dose is safe, to reduce the risk of being stuck at a local
suboptimal dose due to large variation caused by small sample size.
In Stage II Step B2, the three strategies generally yield similar performance
in identifying the OBD, but have a different emphasis on patient allocation.
Determining which one to use should be based on specific trial considerations.
Strategy (i) (i.e., equal randomization) is the easiest to implement and allows
more uniform learning of admissible doses. In addition, equal randomization
does not require efficacy readout, avoiding the common logistical difficulty
that the efficacy endpoint may take a relatively long time to be ascertained.
Also, equal randomization is useful to balance confounders across admissible
doses to facilitate reliable identification of OBD, particularly when patients
are highly heterogeneous (e.g., all-comer trials). Equal randomization can be
modified to a fixed-ratio randomization. For example, based on the number of
patients allocated at Stage I, we choose a fixed randomization ratio such that
at the end of Stage II, the sample size at each admissible dose is expected to
be the same.
In comparison, Strategy (ii), i.e., pick the winner, aims to maximize patient
benefit by assigning the next cohort to the currently estimated optimal dose.
This approach tends to assign more patients to OBD than Strategy (i). As
a trade-off, in certain scenarios it may have a slightly lower probability of
TABLE 8.3: Estimated mean utility (i.e., û) given the number of patients
treated at a dose is 3 or 6 with the elicited utility scores in Table 8.1, toxicity
upper limit φT = 0.3 and efficacy lower limit φE = 0.3.
nT nE û nT nE û
n=3
0 0 42.5 1 3 77.5
0 1 57.5 2 0 22.5
0 2 72.5 2 1 37.5
0 3 87.5 2 2 52.5
1 0 32.5 2 3 67.5
1 1 47.5 >2 Any 0
1 2 62.5
n=6
0 0 41.4 2 1 38.6
0 <1 0 2 1 38.6
0 1 50 2 2 47.1
0 2 58.6 2 3 55.7
0 3 67.1 2 4 64.3
0 4 75.7 2 5 72.9
0 5 84.3 2 6 81.4
0 6 92.9 3 <1 0
1 <1 0 3 1 32.9
1 1 44.3 3 2 41.4
1 2 52.9 3 3 50
1 3 61.4 3 4 58.6
1 4 70 3 5 67.1
1 5 78.6 3 6 75.7
1 6 87.1 >3 Any 0
2 <1 0
Note: n is the number of the patients at a specific dose, nT and nE are
respectively the numbers of toxicity and efficacy at that dose. “0” denotes
that the dose should be eliminated as not admissible because of high toxicity
or low efficacy.
TABLE 8.4: Comparison of U-BOIN with EffTox, including the selection per-
centage (Sel %) and the average number of patients treated at each dose (No.
of pts). The optimal biological dose (OBD) is bolded.
Dose Level
Design 1 2 3 4 5
Scenario 1
πT 0.02 0.15 0.30 0.45 0.60
πE 0.20 0.65 0.65 0.65 0.65
u 43.0 69.0 63.0 56.0 50.0
EffTox Sel % 2.0 50.0 45.0 2.0 0.0
No. of pts 4.3 22.7 23.8 2.8 0.4
U-BOIN Sel % 1.7 72.9 22.4 2.8 0.0
No. of pts 6.2 29.9 13.8 3.5 0.5
Scenario 2
πT 0.03 0.08 0.15 0.28 0.40
πE 0.10 0.22 0.60 0.60 0.60
u 36.0 43.0 66.0 60.0 55.0
EffTox Sel % 0.0 4.0 60.0 29.0 7.0
No. of pts 3.4 4.9 26.4 14.2 5.1
U-BOIN Sel % 1.1 3.2 65.7 24.9 4.3
No. of pts 4.9 7.5 24.4 12.7 4.3
Scenario 3
πT 0.05 0.07 0.10 0.12 0.16
πE 0.35 0.45 0.50 0.55 0.75
u 53.0 59.0 61.0 64.0 75.0
EffTox Sel % 10.0 12.0 26.0 24.0 29.0
No. of pts 8.6 7.9 14.2 11.1 12.1
U-BOIN Sel % 5.9 11.7 13.1 13.6 55.7
No. of pts 7.0 8.8 9.0 9.1 20.1
Note: πT and πE are the toxicity probability and efficacy probability,
respectively. u is the mean utility or desirability.
to search for OBD. In contrast, BOIN12 will quickly reach and stay at OBD
because it considers both efficacy and toxicity from the beginning of the trial.
Compared to BOIN12, U-BOIN has some operational advantages. BOIN12
requires that the efficacy endpoint must be scored quickly enough so that the
dose assignment decision is timely for each new cohort. U-BOIN is lenient
on this requirement as its first stage often “buys” sufficient time to evaluate
the efficacy endpoint, especially when the equal or fixed-ratio randomization
is employed in Stage II of U-BOIN. Section 8.5 introduces an extension of
BOIN12 that is able to handle delayed efficacy and/or toxicity endpoints.
\[
x = \frac{\upsilon_{11} n_E + \upsilon_{00} (n - n_T)}{100}.
\]
The quasi-binomial likelihood of the unknown ũ based on the interim data D
is
\[
L(D \mid \tilde{u}) \propto \tilde{u}^{x} (1 - \tilde{u})^{n - x}.
\]
Under the Bayesian framework, assigning ũ a Beta prior, i.e., ũ ∼
Beta(αu , βu ), the posterior distribution of ũ arises as
\[
\tilde{u} \mid D \sim \mathrm{Beta}(\alpha_u + x,\; \beta_u + n - x).
\]
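A short sketch of the quasi-binomial bookkeeping follows. It computes the quasi-number of events x from (n, nT , nE ) using the Table 8.1 scores, and the mean of the conjugate Beta posterior, (αu + x)/(αu + βu + n), which matches the form of (8.13). The Beta(1, 1) prior is an assumption chosen for illustration.

```python
def quasi_events(n, n_tox, n_eff, u00=40.0, u11=60.0):
    """Quasi-number of events x = (u11 * nE + u00 * (n - nT)) / 100."""
    return (u11 * n_eff + u00 * (n - n_tox)) / 100.0


def posterior_mean_std_utility(n, n_tox, n_eff, alpha_u=1.0, beta_u=1.0):
    """Posterior mean of the standardized utility (0-1 scale).

    alpha_u = beta_u = 1 is an assumed prior, not taken from the source.
    """
    x = quasi_events(n, n_tox, n_eff)
    return (alpha_u + x) / (alpha_u + beta_u + n)


x = quasi_events(n=3, n_tox=1, n_eff=2)
print(x, posterior_mean_std_utility(3, 1, 2))
```

Note that x need not be an integer, which is why the likelihood is called quasi-binomial.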
further calibrated by simulation, see Section 8.3.3 for more discussion on the
choice of cT and cE .
In BOIN12, the posterior probability PPj = Pr(uj > ub | Dj ) is used to
determine the most desirable dose within A(D) for treating a new cohort of
patients, where Dj is the data observed at dj . Here, ub is a prespecified utility
benchmark used to evaluate dose desirability. Between two admissible doses
dj and dj′ ,
• If PPj > PPj′ , then dj is more desirable than dj′ .
• If PPj < PPj′ , then dj is less desirable than dj′ .
• If PPj = PPj′ , then dj and dj′ are equally desirable.
In principle, the third case occurs if and only if Dj = Dj′ . The posterior
probability PPj automatically accounts for the uncertainty in estimating u, leading
to more reasonable decision making of dose escalation and de-escalation. In
contrast, most existing designs use the point estimate û = x/n to choose the
optimal dose, which ignores the uncertainty of the estimate. To see the im-
portance of accounting for estimation uncertainty, suppose dose dj has been
used to treat two patients and ûj = 60, and dose dj′ has been used to treat 10
patients and ûj′ = 59. Although ûj > ûj′ , because ûj′ is much more reliable, a
better decision actually is to choose dj′ for treating the next cohort of patients
as stipulated by the posterior probability approach.
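The posterior probability PPj can be evaluated with a few lines of standard-library code. The sketch below assumes a Beta(1, 1) prior on the standardized utility (0 to 1 scale) and integrates the posterior density numerically with Simpson's rule; the benchmark value used in the usage lines is hypothetical, and the function names are illustrative.

```python
from math import exp, lgamma


def beta_sf(u, a, b, steps=2000):
    """Pr(U > u) for U ~ Beta(a, b) with a, b >= 1 (composite Simpson's rule)."""
    if a < 1 or b < 1:
        raise ValueError("this sketch assumes a >= 1 and b >= 1")
    if u <= 0.0:
        return 1.0
    if u >= 1.0:
        return 0.0
    norm = exp(lgamma(a + b) - lgamma(a) - lgamma(b))

    def pdf(t):
        t = min(t, 1.0)  # guard against floating-point overshoot at the upper limit
        return norm * t ** (a - 1) * (1.0 - t) ** (b - 1)

    h = (1.0 - u) / steps  # steps must be even for Simpson's rule
    total = pdf(u) + pdf(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(u + i * h)
    return total * h / 3.0


def posterior_prob(x, n, u_b, alpha_u=1.0, beta_u=1.0):
    """PP = Pr(u > u_b | D) under the quasi-binomial model with a Beta prior."""
    return beta_sf(u_b, alpha_u + x, beta_u + n - x)


u_b = 0.55  # hypothetical benchmark on the 0-1 scale
pp_two_patients = posterior_prob(x=0.60 * 2, n=2, u_b=u_b)
pp_ten_patients = posterior_prob(x=0.59 * 10, n=10, u_b=u_b)
print(pp_two_patients, pp_ten_patients)
```

The usage lines mirror the two-patient versus ten-patient example in the text; which dose has the larger PP depends on the prior and on the chosen benchmark.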
Evaluating PPj requires specifying a benchmark ub for comparison. The
value of ub can be elicited by clinicians to reflect their expectation. Lin et al.
(2020b) recommended the following default value of ub that yields desirable
operating characteristics in a variety of scenarios. Given φT and φE and as-
suming independence between toxicity and efficacy, the highest utility that is
deemed undesirable is given by
ub = (100 + u)/2,
representing that a utility value lies in the middle between u and the maximum
utility 100. Alternatively, one can also use the weighted average as the choice
for ub , i.e., ub = 100ω + (1 − ω)u, where ω ∈ [0, 1] is a weight.
In general, the weight ω controls the aggressiveness of the trial: the larger the
value of ω, the more aggressive the dose exploration.
One prominent feature of BOIN12 is that the desirability can be pre-
calculated and included in the trial protocol, due to the use of the quasi-
binomial likelihood. Specifically, given the maximum sample size N , all
possible outcome combinations (n, nT , nE ) (or (y01 , y00 , y11 , y10 )) can be enu-
merated, and the corresponding posterior probabilities PPj can be computed
based on (8.11). By sorting all the possible values of PPj from the smallest
to the largest, we can assign the ordered value from 0, 1, . . . to each possible
TABLE 8.5: Rank-based desirability score (RDS) table for the BOIN12 design.
[Figure 8.4 image: flowchart. Treat a patient or a cohort of patients; compute the DLT rate at the current level j and branch on whether it is > λd , ≤ λe , or within (λe , λd ]; count the number of patients at level j and branch on < N ∗ versus ≥ N ∗ .]
FIGURE 8.4: The schema of the Bayesian optimal interval phase I–II
(BOIN12) design, where (λe , λd ) are a pair of optimized dose escalation and
de-escalation boundaries adopted from the BOIN design, and N ∗ is a pre-
specified sample size cutoff (e.g., N ∗ = 6). DLT, dose-limiting toxicity; OBD,
optimal biologic dose.
Step 2(a) says that if the observed toxicity rate is high (i.e., π̂T j > λd ),
we should de-escalate the dose. Step 2(b) says that if there are sufficient data
(nj ≥ N ∗ ) to support that the toxicity probability of the current dose dj
is moderate (i.e., λe < π̂T j ≤ λd ), escalating the dose to dj+1 may cause
overdosing. We then examine whether the adjacent lower dose dj−1 provides
a better treatment benefit (i.e., desirability) than the current dose dj . If so,
we treat the next cohort of patients at dose level j − 1; otherwise stay at
the current level j. Step 2(c) says that if the observed toxicity rate at the
current dose dj is low (π̂T j ≤ λe ) or moderate (λe < π̂T j ≤ λd ) but with high
uncertainty (nj < N ∗ ), then we examine whether the adjacent higher dose dj+1
or the lower dose dj−1 can provide a better treatment benefit (i.e., desirability)
than the current dose dj . The next cohort of patients will be treated at the
dose with the highest desirability among the adjacent doses. In Step 2(b), the
higher dose dj+1 is not considered because there is substantial evidence
(i.e., nj ≥ N ∗ ) that dj is already close to the de-escalation boundary λd ,
so dj+1 is likely to be overly toxic. Therefore, N ∗ is a cutoff
for sufficiency of the data, and a larger value of N ∗ encourages more active
exploration of new doses. By default, Lin et al. (2020b) recommended N ∗ = 6.
When needed, N ∗ can be further calibrated by simulation to obtain certain
design properties, e.g., using a larger N ∗ encourages faster dose exploration.
During the trial, only admissible doses can be given to incoming patients,
and doses that are not admissible should be eliminated. In other words, the
decision rule in Step 2 should be applied only to the doses in A(D). For
example, in Step 2(c), if only dj−1 , dj ∈ A(D) and dj+1 ∉ A(D), then when
we apply Step 2(c), we should choose the level from {j − 1, j} that has the
highest RDS to treat the next cohort of patients. At any time, if all doses are
eliminated, the trial should be stopped with no dose selected as OBD.
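The decision logic of Steps 2(a) to 2(c), restricted to admissible doses, can be sketched as follows. The boundary values in the usage lines are placeholders rather than the optimized BOIN boundaries, the desirability scores stand in for RDS values, and two behaviours are assumptions not covered by the text: ties go to the lower dose, and `None` is returned when no candidate dose is admissible.

```python
def boin12_next_dose(j, tox_rate, n_j, desirability, lambda_e, lambda_d, n_star=6):
    """Dose level for the next cohort under Steps 2(a)-(c).

    j:            current dose level
    tox_rate:     observed toxicity rate at level j
    n_j:          number of patients treated at level j
    desirability: dict mapping admissible dose level -> desirability score
    """
    if tox_rate > lambda_d:
        candidates = [j - 1]                    # Step 2(a): de-escalate
    elif tox_rate > lambda_e and n_j >= n_star:
        candidates = [j - 1, j]                 # Step 2(b): do not escalate
    else:
        candidates = [j - 1, j, j + 1]          # Step 2(c): consider all neighbours
    # Only admissible doses (those with a desirability score) may be used
    candidates = [d for d in candidates if d in desirability]
    if not candidates:
        return None  # assumed convention: handling is left to the protocol
    return max(candidates, key=lambda d: desirability[d])


scores = {1: 10, 2: 20, 3: 30}  # hypothetical desirability scores
print(boin12_next_dose(2, 0.50, 6, scores, lambda_e=0.2, lambda_d=0.4))
print(boin12_next_dose(2, 0.30, 6, scores, lambda_e=0.2, lambda_d=0.4))
print(boin12_next_dose(2, 0.30, 3, scores, lambda_e=0.2, lambda_d=0.4))
```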
For ethical considerations, BOIN12 always assigns the next cohort of pa-
tients to the dose with the highest estimate of desirability. With a small sample
size, this myopic approach may cause the dose finding to get stuck at a locally
optimal dose. To alleviate that issue, a dose exploration rule can be imposed:
treat the next cohort of patients at the next higher dose if the following three
conditions are all satisfied:
• The number of patients treated at the current dose dj is greater than eight.
• The observed toxicity rate π̂T j is less than the de-escalation boundary λd .
• The next higher dose has never been used for treating patients.
When the trial completes, the final OBD is selected based on a two-step
procedure: in the first step, we select MTD based on the target toxicity rate
φT so that any dose levels above the selected MTD are deemed overly toxic; in
the second step, we then choose the dose level that has the highest desirability
among the doses that are not higher than MTD. Specifically, at the end of the
trial, we first obtain the observed marginal toxicity rates π̂T j for each dose
level j = 1, . . . , J. To borrow information across dose levels, an isotonic re-
gression is performed on {π̂T j } through the pool-adjacent-violators algorithm
(Bril et al., 1984a), so that the isotonically transformed estimate π̃T j mono-
tonically increases with the dose. MTD, dMTD , is then selected as the dose
that has the estimated toxicity rate closest to the target toxicity rate φT , that
is,
\[
d_{\mathrm{MTD}} = \operatorname*{arg\,min}_{d_j \in \{d_1, \ldots, d_J\}} |\tilde{\pi}_{Tj} - \phi_T|. \tag{8.12}
\]
Next, we obtain the posterior estimate of the mean utility at each dose level
j = 1, . . . , J as follows:
\[
\bar{u}_j = \frac{x_j + \alpha_u}{n_j + \alpha_u + \beta_u}. \tag{8.13}
\]
The final OBD, dOBD , is then selected as the admissible dose that does not
exceed the estimated MTD, dMTD , and also yields the highest estimated mean
utility, that is,
\[
d_{\mathrm{OBD}} = \operatorname*{arg\,max}_{d_j \le d_{\mathrm{MTD}},\; d_j \in A(D)} \{\bar{u}_j\}. \tag{8.14}
\]
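The two-step selection in (8.12) to (8.14) can be sketched in plain Python. Several choices here are assumptions rather than part of the source: the isotonic regression is weighted by the sample size at each dose, only doses that have been tried enter the calculation, ties are broken toward the lower dose, and the prior is Beta(1, 1). Dose levels are indexed from 0.

```python
def pava(values, weights):
    """Weighted pool-adjacent-violators algorithm (non-decreasing fit)."""
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def select_obd(n, n_tox, x, phi_t, admissible, alpha_u=1.0, beta_u=1.0):
    """Return the index of the selected OBD, or None if no dose qualifies."""
    tried = [j for j in range(len(n)) if n[j] > 0]
    if not tried:
        return None
    iso = pava([n_tox[j] / n[j] for j in tried], [n[j] for j in tried])
    # (8.12): dose whose isotonic toxicity estimate is closest to phi_t
    mtd = min(zip(tried, iso), key=lambda pair: abs(pair[1] - phi_t))[0]
    # (8.13): posterior mean utility at each tried dose
    u_bar = {j: (x[j] + alpha_u) / (n[j] + alpha_u + beta_u) for j in tried}
    # (8.14): best admissible dose not exceeding the MTD
    candidates = [j for j in tried if j <= mtd and j in admissible]
    if not candidates:
        return None
    return max(candidates, key=lambda j: u_bar[j])


n = [3, 6, 9, 3]
n_tox = [0, 1, 2, 2]
x = [1.2, 3.2, 6.4, 1.6]  # quasi-events for n_eff = [0, 2, 6, 2] with Table 8.1 scores
print(select_obd(n, n_tox, x, phi_t=0.3, admissible={0, 1, 2, 3}))
```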
TABLE 8.6: Five scenarios with the marginal toxicity and efficacy probabil-
ities, as well as mean utilities (πT , πE , u) in the simulation study. The mean
utility is based on the 2 × 2 utility table given in Table 8.1. The OBD that
maximizes the utility is bolded.
Dose level
Scenario
1 2 3 4 5
1 (πT , πE ) (0.05, 0.20) (0.12, 0.35) (0.27, 0.36) (0.35, 0.37) (0.50, 0.38)
u 50.0 56.2 50.8 48.2 42.8
2 (πT , πE ) (0.05, 0.01) (0.12, 0.05) (0.27, 0.30) (0.35, 0.60) (0.50, 0.60)
u 38.6 38.2 47.2 62.0 56.0
3 (πT , πE ) (0.03, 0.05) (0.05, 0.10) (0.20, 0.50) (0.22, 0.68) (0.45, 0.70)
u 41.8 44.0 62.0 72.0 64.0
4 (πT , πE ) (0.02, 0.05) (0.05, 0.15) (0.10, 0.40) (0.20, 0.40) (0.30, 0.40)
u 42.2 47.0 60.0 56.0 52.0
5 (πT , πE ) (0.15, 0.25) (0.20, 0.55) (0.25, 0.40) (0.35, 0.30) (0.45, 0.20)
u 49.0 65.0 54.0 44.0 34.0
\[
\tilde{x} = \frac{n(\mu - \mu_{\min})}{\mu_{\max} - \mu_{\min}}.
\]
[Figure: bar charts comparing the 3+3+CE, EffTox, TEPI, and BOIN12 designs across Scenarios 1–5. Panels: selection percentage; (b) number of patients at OBD; (c) number of patients at overdoses; (d) risk of poor allocation (percentage).]
separate consideration. In this case, the “severe” level of the toxicity outcome
can be used to indicate such irAEs. This results in nine possible outcomes,
denoted as outcome 1, . . . , 9, and the corresponding utility table is given in
Table 8.7.
TABLE 8.7: Utility table for three-level efficacy and toxicity endpoints.
                   Efficacy
Toxicity      CR           PR      NR
No/minor      υ1 = 100     υ2      υ3
Moderate      υ4           υ5      υ6
Severe        υ7           υ8      υ9 = 0
Here, υ1 denotes the utility of the most desirable outcome (i.e., no/minor
toxicity and CR), thus setting υ1 = 100; and υ9 denotes the utility of the most
undesirable outcome (i.e., severe toxicity and NR), thus setting υ9 = 0. Using
these two extreme cases as the reference, we elicit from clinicians the utility
scores of the other seven possible outcomes to reflect their clinical desirability.
For example, to reflect that irAE is highly undesirable, we can assign low
scores to υ7 and υ8 , e.g., υ7 = 30 and υ8 = 10.
Let pl denote the probability of observing the lth outcome at dose d,
l = 1, . . . , 9. The desirability (or mean utility) of dose d is given by
\[
\mu = \sum_{l=1}^{9} p_l \upsilon_l,
\]
six (or four) new patients will be enrolled by the time the previous cohort of
patients complete their efficacy (or toxicity) evaluation. The question is how
to treat these new patients in a timely fashion, given that the outcomes of
some of the previously treated patients are still pending.
Formally, let τT and τE denote the lengths of the assessment windows
for yT and yE , respectively. The values of τT and τE should be elicited from
clinicians and be wide enough to capture all necessary toxicity and efficacy events
relevant to OBD determination. For example, τT may be the first cycle of the
therapy (e.g., 21 or 28 days), while τE may also be the first cycle of the therapy
(e.g., pharmacodynamic (PD) biomarkers), or several months after treatment
(e.g., tumor response). Let tT and tE denote the times to toxicity and efficacy,
respectively. For patients who will not experience toxicity or efficacy, define
tT = ∞ and tE = ∞. The relation between (yT , yE ) and (tT , tE ) is given as
follows:
yT = 1{tT ≤ τT }, and yE = 1{tE ≤ τE },
where 1{·} denotes the indicator function, which takes a value of 1 if the event
specified in the braces occurs and zero otherwise.
Suppose that n patients have been accrued and treated at dose d. We
use the subscript i to indicate the data for the ith patient, i = 1, . . . , n.
The relationship between the multinomial variable (y01 , y00 , y11 , y10 ) and the
individual patient-level data is
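\[
\begin{aligned}
y_{01} &= \sum_{i=1}^{n} \mathbf{1}\{(y_{Ti}, y_{Ei}) = (0, 1)\} = \sum_{i=1}^{n} \mathbf{1}\{t_{Ti} > \tau_T,\ t_{Ei} \le \tau_E\}, \\
y_{00} &= \sum_{i=1}^{n} \mathbf{1}\{(y_{Ti}, y_{Ei}) = (0, 0)\} = \sum_{i=1}^{n} \mathbf{1}\{t_{Ti} > \tau_T,\ t_{Ei} > \tau_E\}, \\
y_{11} &= \sum_{i=1}^{n} \mathbf{1}\{(y_{Ti}, y_{Ei}) = (1, 1)\} = \sum_{i=1}^{n} \mathbf{1}\{t_{Ti} \le \tau_T,\ t_{Ei} \le \tau_E\}, \\
y_{10} &= \sum_{i=1}^{n} \mathbf{1}\{(y_{Ti}, y_{Ei}) = (1, 0)\} = \sum_{i=1}^{n} \mathbf{1}\{t_{Ti} \le \tau_T,\ t_{Ei} > \tau_E\}.
\end{aligned}
\]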
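For fully followed patients, the mapping from event times to the four cell counts is a direct translation of the indicator definitions. The sketch below uses `math.inf` for patients who never experience the event, as in the text; the function name and the example windows are illustrative.

```python
import math


def outcome_counts(times, tau_t, tau_e):
    """Count patients in each (toxicity, efficacy) cell.

    times: iterable of (t_tox, t_eff) pairs, with math.inf meaning "no event".
    Returns a dict keyed by "<y_T><y_E>", i.e. "01", "00", "11", "10".
    """
    counts = {"01": 0, "00": 0, "11": 0, "10": 0}
    for t_tox, t_eff in times:
        y_t = int(t_tox <= tau_t)  # toxicity within its assessment window
        y_e = int(t_eff <= tau_e)  # efficacy within its assessment window
        counts[f"{y_t}{y_e}"] += 1
    return counts


patients = [(math.inf, 30), (10, math.inf), (5, 40), (math.inf, math.inf)]
print(outcome_counts(patients, tau_t=28, tau_e=60))
```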
for a, b ∈ {0, 1}, where Sab = Pr {tT > tU , tE > tU | (yT , yE ) = (a, b)}. As-
suming “working” independence between tT and tE , we have
Derivation for the above conditional probabilities and more details of the BDA
procedure are provided in Zhou et al. (2022). Following Cheung and Chappell
(2000) and Yuan et al. (2018), the TITE-BOIN12 design by default assumes
that the time to toxicity and time to efficacy are uniformly distributed over the
respective assessment windows. As a result, wT = tU /τT , and wE = tU /τE .
This assumption seems strong, but it is very robust for the purpose of dose
finding (Cheung and Chappell, 2000; Yuan et al., 2018; Zhou et al., 2022).
In this equation, the first term corresponds to those patients whose yT and
yE are both observed (i.e., (δT , δE ) = (0, 0)), thus their contribution to the
quasi-number of events x can be directly computed based on the observed
data. The last three terms correspond to the patients with at least one of yT
and yE pending, i.e., δT + δE > 0, and involve the mean imputation of yT
and yE respectively by Pr(yT = a | δT = 1) and Pr(yE = b | δE = 1), a, b =
0, 1, based on the assumption that yT and yE are “working” independent.
Technically speaking, the correlation between efficacy and toxicity can be
modeled. The small sample sizes of phase I/II trials, however, provide limited
information to estimate the correlation parameter reliably. The independence
assumption for pending outcomes seems strong, but it makes the method
simple and computationally fast. Zhou et al. (2022) showed by numerical
studies that the TITE-BOIN12 design is remarkably robust to the violation
of this assumption. This may be because the assumption is only used for the
patients who have pending data. For patients with both yT and yE observed,
the independence assumption is not made. When the trial progresses, the
percentage of observed data increases, limiting the impact of the violation of
the independence assumption.
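By assuming that given yq = 1, the time-to-event outcome tq is a uniform
random variable over (0, τq ), we have Pr(δq = 1 | yq = 1) = Pr(tq > tU | yq =
1) = 1 − tU /τq . For a patient who is still under follow-up, Pr(δq = 1 | yq = 0) = 1.
Writing πq = Pr(yq = 1), Bayes' rule then gives
\[
\begin{aligned}
\Pr(y_q = 1 \mid \delta_q = 1)
&= \frac{\Pr(\delta_q = 1 \mid y_q = 1)\Pr(y_q = 1)}{\Pr(\delta_q = 1 \mid y_q = 0)\Pr(y_q = 0) + \Pr(\delta_q = 1 \mid y_q = 1)\Pr(y_q = 1)} \\
&= \frac{\Pr(\delta_q = 1 \mid y_q = 1)\Pr(y_q = 1)}{\Pr(y_q = 0) + \Pr(\delta_q = 1 \mid y_q = 1)\Pr(y_q = 1)} \\
&= \frac{\pi_q (1 - t_U/\tau_q)}{1 - \pi_q t_U/\tau_q},
\end{aligned}
\]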
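The closed form is simple to evaluate. The sketch below reads the ratio in the last line as tU /τq , which is what the uniform assumption over (0, τq ) implies, and clips the follow-up fraction to [0, 1]; the function name is illustrative.

```python
def prob_event_given_pending(pi_q, t_u, tau_q):
    """Pr(y_q = 1 | outcome still pending) under uniform time-to-event.

    pi_q:  marginal event probability Pr(y_q = 1)
    t_u:   follow-up time accrued so far
    tau_q: length of the assessment window
    """
    frac = min(max(t_u / tau_q, 0.0), 1.0)  # fraction of the window completed
    denom = 1.0 - pi_q * frac
    if denom == 0.0:
        raise ValueError("undefined when pi_q = 1 and the window is complete")
    return pi_q * (1.0 - frac) / denom


# Halfway through a 28-day window with pi_q = 0.3
print(prob_event_given_pending(0.3, 14, 28))
```

At tU = 0 the function returns πq itself, and it decreases to zero as the patient approaches the end of the window without an event.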
or efficacy outcomes at the current dose, suspend the accrual to wait for more
data to become available.
Of note, another appealing feature of TITE-BOIN12 is that it naturally
accommodates the case that some patients may be evaluable for toxicity, but
not evaluable for efficacy (e.g., patients are off treatment due to toxicity).
These patients can be regarded as having yT observed and yE (permanently)
pending, and thus directly incorporated into the utility estimation and deci-
sion making.
observed at dose d,
The collection of strongest toxicity and desirability intervals (k^T_d , k^U_d ) can
be treated as the vector of sufficient statistics of the uTPI design in determin-
ing dose escalation or de-escalation. Specifically, let k ∗ denote the location of
the toxicity interval that contains the upper limit of the toxicity rate φT . In
the aforementioned example, k ∗ = 4 when φT = 0.35. As demonstrated in
Figure 8.6, the dose-finding rule of uTPI proceeds as follows:
1. Patients in the first cohort are treated at the lowest dose d1 , or the
physician-specified dose.
2. Based on the observed data, we obtain the strongest toxicity and
desirability intervals for each dose. Suppose j is the current dose
level. To assign a dose to the next cohort of patients,
(a) If k^T_{d_j} > k∗ , de-escalate the dose to level j − 1.
(b) If k^T_{d_j} = k∗ and nj ≥ N ∗ , choose the level from {j − 1, j} that
has the larger utility interval k^U_d .
(c) Otherwise, if k^T_{d_j} < k∗ , or k^T_{d_j} = k∗ and nj < N ∗ , choose the
level from {j − 1, j, j + 1} that has the largest utility interval
k^U_d .
3. Repeat Step 2 until the maximum sample size N is reached, and
then use the OBD selection rule described in Section 8.4.3 to select
the OBD.
During the trial conduct, uTPI only treats the patients using doses in
the admissible set A(D) as defined in BOIN12 and U-BOIN. If no dose is
admissible, the trial should be terminated early.
Similar to BOIN12 and U-BOIN designs, the uTPI design also has a concise
decision structure such that a dose-assignment decision table can be calculated
before the trial starts and can be used throughout the trial, which simplifies
its practical implementation. More details about the implementation of the
uTPI design can be found in Shi et al. (2021).
(1) If π̂T (dj ) ≤ λe and π̂E (dj ) ≤ δ, then the current dose is safe but not effective,
and the next dose should be escalated to level j + 1.
(2) If π̂T (dj ) > λd , then the current dose is too toxic and the next dose should
be de-escalated to level j − 1.
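[Figure 8.7 image: the unit square with toxicity probability on the horizontal axis and efficacy probability on the vertical axis, both running from 0 to 1 with the design thresholds marked, partitioned into the promising, exploratory, and inadmissible regions.]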
FIGURE 8.7: Partitioned regions under the STEIN design. If the pair of the
observed toxicity and efficacy probabilities at the current dose level lies inside
the promising region, the current dose is retained. If the pair lies inside the
exploratory region, then more dose levels should be explored. Specifically, the
lighter color subregion indicates that the current dose is safe and thus dose
escalation is warranted. The dark color subregion indicates that the toxicity
probability is close to the target, and thus the decision of dose escalation/de-
escalation depends on the observed toxicity and efficacy data jointly. If the
pair lies inside the inadmissible region, the current dose is too toxic and thus
dose de-escalation is needed.
[Figure 8.8 image: the toxicity probability axis from 0 to 1, divided at λe and λd , with the escalate, stay, and de-escalate decisions marked.]
FIGURE 8.8: Dose allocation rules for the BOIN-ET design considering both
efficacy and toxicity.
(3) If π̂T (dj ) ≤ λd and π̂E (dj ) > δ, then the current dose is desirable in terms
of both efficacy and safety, and the next dose should not be changed.
(4) If λe < π̂T (dj ) ≤ λd and π̂E (dj ) ≤ δ, then all possible decisions including
escalation, stay, and de-escalation are considered. Typically, with the
admissible set Aj = {j − 1, j, j + 1}, the next dose is determined according to the
following rules:
(a) If dose level j + 1 is untried, then the next dose is escalated to level
j + 1.
(b) Otherwise, the next dose is selected as the one with the maximum ob-
served efficacy rate. If there is a tie having the maximum observed effi-
cacy rate, then the next dose is randomly chosen from the tied doses.
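Rules (1) to (4) can be collected into a single function, sketched below. The threshold values in the usage lines are placeholders, not optimized values. Three points are assumptions: the case π̂T (dj ) = λd with π̂E (dj ) > δ is treated as "stay", dose levels are clamped to the range 1, . . . , J, and a dose counts as untried when it has no observed efficacy rate.

```python
import random


def boin_et_next_dose(j, tox, eff, lambda_e, lambda_d, delta,
                      eff_rates, n_doses, rng=random):
    """Next dose level under rules (1)-(4).

    j:         current dose level (1-based)
    tox, eff:  observed toxicity and efficacy rates at level j
    eff_rates: dict mapping each tried dose level -> observed efficacy rate
    n_doses:   number of dose levels J
    """
    if tox > lambda_d:                       # rule (2): too toxic
        return max(j - 1, 1)
    if eff > delta:                          # rule (3): safe and effective
        return j
    if tox <= lambda_e:                      # rule (1): safe but not effective
        return min(j + 1, n_doses)
    # rule (4): moderate toxicity, insufficient efficacy
    if j + 1 <= n_doses and (j + 1) not in eff_rates:
        return j + 1                         # (4a): next higher dose is untried
    candidates = [d for d in (j - 1, j, j + 1) if d in eff_rates]
    best = max(eff_rates[d] for d in candidates)
    tied = [d for d in candidates if eff_rates[d] == best]
    return rng.choice(tied)                  # (4b): random choice among ties


# Placeholder thresholds, for illustration only
rates = {1: 0.5, 2: 0.2, 3: 0.3}
print(boin_et_next_dose(2, 0.20, 0.20, 0.15, 0.35, 0.45, rates, n_doses=3))
```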
To determine the optimal values of (λe , λd , δ), Takeda et al. (2018) con-
sidered the following six hypotheses at dose dj :
Case study
Platinum-Refractory Oral Cancer Trial The objective of this phase I–II
trial (Clinical Trials Registry-India identifier: CTRI/2019/01/016837) is to
determine OBD for methotrexate when given along with erlotinib and cele-
coxib in treating patients with platinum-resistant or early-failure squamous
cell carcinoma of the oral cavity. DLT is scored based on the Common Termi-
nology Criteria for Adverse Events (version 4.03), and the efficacy endpoint
used in monitoring is the clinical benefit rate at two months. Five doses of
methotrexate (i.e., 3, 6, 9, 12, and 15 mg/m²) are investigated in combination
with 150 mg erlotinib and 200 mg celecoxib.
Additional examples include a CD33 CAR T-cell therapy trial in pa-
tients with relapsed or refractory acute myeloid leukemia (ClinicalTrials.gov
Identifier: NCT04835519, based on BOIN12), and a trial that evaluates a
plant extract to treat breast cancer patients (ClinicalTrials.gov Identifier:
NCT05007444, based on U-BOIN).
We use the oral cancer trial as an example to illustrate the use of the
BOIN12 app to design phase I-II trials. The BOIN12 app can be selected and
launched from the BOIN Suite launchpad (Figure 8.9). The app includes six
functional tabs and has the capability to implement the three main tasks: trial
design, trial conduct, and OBD determination. We design the trial using the
following three steps:
Doses and Sample Size As shown in Figure 8.10, the number of doses un-
der investigation for methotrexate is five, and the starting dose level is 1. The
total sample size is 36 and patients are treated in cohorts of three,
which results in a total of 12 cohorts. We recommend that the maximum sample
size N be no less than 6 × J (i.e., the maximum sample size of the
3+3 design), where J is the number of doses. To reduce the sample size, the
“convergence” stopping rule is implemented in BOIN12, that is, when m
patients have been treated at the current dose and the decision is to stay, the
trial should be stopped early before the exhaustion of the maximum sample
size. This stopping criterion indicates that the trial can be stopped when the
dose finding approximately converges to OBD. In this trial, m = 12 is used.
Because of this early stopping rule, the actual sample size used in the trial is
often smaller than N . The saving in sample size depends on the underlying
dose–toxicity or dose–efficacy scenario and can be evaluated through simula-
tion studies.
is to rule out excessively toxic and ineffective doses. Among admissible doses,
the dose assignment rule will allocate patients to the most desirable dose. In
other words, even if the admissible dose set includes some doses that are not
particularly safe or efficacious, the design will not assign patients to these
suboptimal doses. Due to the large uncertainty of small sample sizes, using
small values for cT and cE will inadvertently eliminate the doses that are
actually admissible, and thus affect the operating characteristics of the design.
If a dose is inadmissible due to violation of the safety criterion, this dose and
its higher doses are considered inadmissible. During the trial, only admissible
doses can be used to treat patients, and the doses that are not admissible
should be eliminated from the trial.
After the completion of the specification of design parameters, the de-
sign flowchart (Figure 8.13) and decision tables (Figure 8.14) can be gen-
erated by clicking the “Get Decision Table” button (see Figure 8.12). In
BOIN12, two decision tables will be generated: the first one summarizes the
FIGURE 8.13: The flowchart of the BOIN12 design generated by the BOIN12
shiny app.
TABLE 8.8: Simulation results generated by the BOIN12 shiny app. The
values corresponding to OBD are in boldface.
Dose level
Avg. N Stop %
1 2 3 4 5
Scenario 1
Pr(DLT) 0.03 0.17 0.35 0.50 0.65
Pr(Efficacy) 0.12 0.39 0.43 0.55 0.65
Mean utility 46.0 56.6 51.8 53.0 53.0
No. patients treated 5.2 8.8 6.4 2.8 0.6 23.8
Selection % 16.5 53.5 24.0 5.5 0.0 0.5
Scenario 2
Pr(DLT) 0.01 0.08 0.10 0.12 0.35
Pr(Efficacy) 0.05 0.25 0.30 0.60 0.60
Mean utility 42.6 51.8 54.0 71.2 62.0
No. patients treated 3.4 4.8 4.7 9.0 4.2 26.1
Selection % 1.5 14.5 6.5 63.5 14.0 0.0
decision table included in the protocol to conduct the trial and make adaptive
decisions (e.g., dose escalation/stay/de-escalation). Alternatively, users can
use the Trial Conduct tab to determine the dose for the next cohort of patients.
In the latter approach, users can upload trial data to the app to obtain the
recommended dose for the next cohort of patients. Summary statistics for the
interim data are also provided by the app.
After the trial completes accrual and has all patients’ outcomes evaluated,
users can use the OBD determination tab to identify OBD. After uploading
the trial data with the provided csv template, users can obtain OBD on the
right side of the app, as well as various estimates, including the estimates for
joint toxicity-efficacy probabilities, marginal toxicity/efficacy probability, and
utility of each dose that has been used to treat patients.
Babb, J., Rogatko, A., and Zacks, S. (1998). Cancer phase I clinical tri-
als: efficient dose escalation with overdose control. Statistics in Medicine,
17(10):1103–1120.
Barlow, R. E., Bartholomew, D. J., Bremner, J. M., and Brunk, H. D. (1972).
Statistical Inference under Order Restrictions; The Theory and Application
of Isotonic Regression. Wiley, New York, NY.
Bekele, B. N., Ji, Y., Shen, Y., and Thall, P. F. (2007). Monitoring late-onset
toxicities in phase I trials using predicted risks. Biostatistics, 9(3):442–457.
Bekele, B. N. and Shen, Y. (2005). A Bayesian approach to jointly mod-
eling toxicity and biomarker expression in a phase I/II dose-finding trial.
Biometrics, 61(2):343–354.
Bekele, B. N. and Thall, P. F. (2004). Dose-finding based on multiple tox-
icities in a soft tissue sarcoma trial. Journal of the American Statistical
Association, 99(465):26–35.
Berger, J. O. (2013). Statistical Decision Theory and Bayesian Analysis.
Springer Science & Business Media.
Berry, D. A. (2003). Statistical innovations in cancer research. Cancer
Medicine, 6:465–478.
Berry, S. M., Carlin, B. P., Lee, J. J., and Muller, P. (2010). Bayesian Adaptive
Methods for Clinical Trials. CRC press.
Biswas, S., Liu, D. D., Lee, J. J., and Berry, D. A. (2009). Bayesian clinical
trials at the University of Texas M. D. Anderson Cancer Center. Clinical
Trials, 6(3):205–216.
Brahmer, J. R., Drake, C. G., Wollner, I., Powderly, J. D., Picus, J., Sharfman,
W. H., Stankevich, E., Pons, A., Salay, T. M., McMiller, T. L., et al. (2010).
Phase I study of single-agent anti–programmed death-1 (MDX-1106) in re-
fractory solid tumors: safety, clinical activity, pharmacodynamics, and im-
munologic correlates. Journal of Clinical Oncology, 28:3167–3175.
205
206 Bibliography
Houede, N., Thall, P. F., Nguyen, H., Paoletti, X., and Kramar, A. (2010).
Utility-based optimization of combination therapy using ordinal toxicity
and efficacy in phase I/II trials. Biometrics, 66(2):532–540.
Hunsberger, S., Rubinstein, L. V., Dancey, J., and Korn, E. L. (2005). Dose
escalation trial designs based on a molecularly targeted endpoint. Statistics
in Medicine, 24(14):2171–2181.
Iasonos, A., Wages, N. A., Conaway, M. R., Cheung, K., Yuan, Y., and
O’Quigley, J. (2016). Dimension of model parameter space and operat-
ing characteristics in adaptive dose-finding studies. Statistics in Medicine,
35(21):3760–3775.
Iasonos, A., Wilton, A. S., Riedel, E. R., Seshan, V. E., and Spriggs, D. R.
(2008). A comprehensive comparison of the continual reassessment method
to the standard 3+ 3 dose escalation scheme in phase I dose-finding studies.
Clinical Trials, 5(5):465–477.
Ivanova, A., Montazer-Haghighi, A., Mohanty, S. G., and Durham, S. D.
(2003). Improved up-and-down designs for phase I trials. Statistics in
Medicine, 22(1):69–82.
Jaki, T., Clive, S., and Weir, C. J. (2013). Principles of dose finding studies
in cancer: a comparison of trial designs. Cancer Chemotherapy and Phar-
macology, 71(5):1107–1114.
Ji, Y., Liu, P., Li, Y., and Nebiyou Bekele, B. (2010). A modified toxicity
probability interval method for dose-finding trials. Clinical Trials, 7(6):653–
663.
Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American
Statistical Association, 90(430):773–795.
Le Tourneau, C., Diéras, V., Tresca, P., Cacheux, W., and Paoletti, X. (2010).
Current challenges for the early clinical development of anticancer drugs in
the era of molecularly targeted agents. Targeted Oncology, 5:65–72.
Le Tourneau, C., Lee, J. J., and Siu, L. L. (2009). Dose escalation methods
in phase I cancer clinical trials. JNCI: Journal of the National Cancer
Institute, 101(10):708–720.
Lee, J., Thall, P. F., Ji, Y., and Müller, P. (2016). A decision-theoretic phase
I–II design for ordinal outcomes in two cycles. Biostatistics, 17(2):304–319.
Lee, J. J. and Chu, C. T. (2012). Bayesian clinical trials in action. Statistics
in Medicine, 31(25):2955–2972.
Lee, S., Hershman, D., Martin, P., Leonard, J., and Cheung, K. (2009). Vali-
dation of toxicity burden score for use in phase I clinical trials. Journal of
Clinical Oncology, 27(15 suppl):2514–2514.
Bibliography 209
Morita S, Thall PF, Müller P. (2008) Determining the effective sample size of
a parametric prior. Biometrics, 64(2):595–602.
Petit, C., Samson, A., Morita, S., Ursino, M., Guedj, J., Jullien, V., Comets,
E., and Zohar, S. (2018). Unified approach for extrapolation and bridging of
adult information in early-phase dose-finding paediatric studies. Statistical
Methods in Medical Research, 27(6):1860–1877.
Phan, T. G., Ma, H., Lim, R., Sobey, C. G., and Wallace, E. M. (2018). Phase
1 trial of amnion cell therapy for ischemic stroke. Frontiers in Neurology,
9:198.
Piantadosi, S., Fisher, J. D., and Grossman, S. (1998). Practical implemen-
tation of a modified continual reassessment method for dose-finding trials.
Cancer Chemotherapy and Pharmacology, 41(6):429–436.
Postel-Vinay, S., Gomez-Roca, C., Molife, L. R., Anghan, B., Levy, A., Judson,
I., De Bono, J., Soria, J.-C., Kaye, S., and Paoletti, X. (2011). Phase I
trials of molecularly targeted agents: should we pay more attention to late
toxicities. Journal of Clinical Oncology, 29(13):1728–1735.
Riviere, M.-K., Yuan, Y., Dubois, F., and Zohar, S. (2014). A Bayesian
dose-finding design for drug combination clinical trials based on the logistic
model. Pharmaceutical Statistics, 13(4):247–257.
Riviere, M.-K., Dubois, F., and Zohar, S. (2015a). Competing designs for drug
combination in phase I dose-finding clinical trials. Statistics in Medicine,
34(1):1–12.
Riviere, M.-K., Yuan, Y., Dubois, F., and Zohar, S. (2015b). A Bayesian
dose finding design for clinical trials combining a cytotoxic agent with a
molecularly targeted agent. Journal of the Royal Statistical Society: Series
C (Applied Statistics), 64(1):215–229.
Riviere, M.-K., Yuan, Y., Jourdan, J.-H., Dubois, F., and Zohar, S. (2018).
Phase I/II dose-finding design for molecularly targeted agent: plateau de-
termination using adaptive randomization. Statistical Methods in Medical
Research, 27(2):466–479.
Robert, C. and Casella, G. (2013). Monte Carlo Statistical Methods. Springer
Science & Business Media.
Bibliography 213
Rogatko, A., Schoeneck, D., Jonas, W., Tighiouart, M., Khuri, F. R., and
Porter, A. (2007). Translation of innovative designs into phase I trials.
Journal of Clinical Oncology, 25(31):4982–4986.
Ruppert, A. S. and Shoben, A. B. (2018). Overall success rate of a safe and
efficacious drug: results using six phase 1 designs, each followed by stan-
dard phase 2 and 3 designs. Contemporary Clinical Trials Communications,
12:40–50.
Sachs, J. R., Mayawala, K., Gadamsetty, S., Kang, S. P., and de Alwis, D. P.
(2016). Optimal dosing for targeted therapies in oncology: drug development
cases leading by example. Clinical Cancer Research, 22(6):1318–1324.
Shen, L. Z. and O’Quigley, J. (1996). Consistency of continual reassessment
method under model misspecification. Biometrika, 83(2):395.
Shi, H., Cao, J., Yuan, Y., and Lin, R. (2021). uTPI: a utility-based toxicity
probability interval design for phase I/II dose-finding trials. Statistics in
Medicine, 40(11):2626–2649.
Simon, R. (1989). Optimal two-stage designs for phase II clinical trials. Con-
trolled Clinical Trials, 10(1):1–10.
Simon, R., Freidlin, B., Rubinstein, L., Arbuck, S. G., Collins, J., and Chris-
tian, M. C. (1997). Accelerated titration designs for phase I clinical trials
in oncology. Journal of the National Cancer Institute, 89(15):1138–1147.
Skolnik, J. M., Barrett, J. S., Jayaraman, B., Patel, D., and Adamson, P. C.
(2008). Shortening the timeline of pediatric phase I trials: the rolling six
design. Journal of Clinical Oncology, 26(2):190–195.
Storer, B. E. (1989). Design and analysis of phase I clinical trials. Biometrics,
45(3):925–937.
Stylianou, M. and Follmann, D. A. (2004). The accelerated biased coin up-
and-down design in phase I trials. Journal of Biopharmaceutical Statistics,
14(1):249–260.
Takeda, K., Taguri, M., and Morita, S. (2018). BOIN-ET: Bayesian optimal
interval design for dose finding based on both efficacy and toxicity outcomes.
Pharmaceutical Statistics, 17(4):383–395.
Takeda K., Morita S., and Taguri M. (2020). TITE-BOIN-ET: Time-to-event
Bayesian optimal interval design to accelerate dose-finding based on both
efficacy and toxicity outcomes. Pharmaceutical Statistics, 19(3):335–349.
Thall, P. F. and Cook, J. D. (2004). Dose-finding based on efficacy–toxicity
trade-offs. Biometrics, 60(3):684–693.
214 Bibliography
Thall, P. F., Millikan, R. E., Mueller, P., and Lee, S.-J. (2003). Dose-finding
with two agents in phase I oncology trials. Biometrics, 59(3):487–496.
Tidwell, R. S. S., Peng, S. A., Chen, M., Liu, D. D., Yuan, Y., and Lee, J. J.
(2019). Bayesian clinical trials at the University of Texas MD Anderson
Cancer Center: an update. Clinical Trials, 16(6):645–656.
van Brummelen, E. M., Huitema, A. D., van Werkhoven, E., Beijnen, J. H.,
and Schellens, J. H. (2016). The performance of model-based versus rule-
based phase I clinical trials in oncology. Journal of Pharmacokinetics and
Pharmacodynamics, 43(3):235–242.
Wages, N. A., Conaway, M. R., and O’Quigley, J. (2011). Dose-finding design
for multi-drug combinations. Clinical Trials, 8(4):380–389.
Weber, J. S., Yang, J. C., Atkins, M. B., and Disis, M. L. (2015). Toxici-
ties of immunotherapy for the practitioner. Journal of Clinical Oncology,
33(18):2092.
Yan, F., Mandrekar, S. J., and Yuan, Y. (2017). Keyboard: a novel Bayesian
toxicity probability interval design for phase I clinical trials. Clinical Cancer
Research, 23(15):3994–4003.
Yin, G., Li, Y., and Ji, Y. (2006). Bayesian dose-finding in phase I/II clinical
trials using toxicity and efficacy odds ratios. Biometrics, 62(3):777–787.
Yin, G. and Lin, R. (2015). Comments on ‘competing designs for drug com-
bination in phase I dose-finding clinical trials’ by M-K. Riviere, F. Dubois,
and S. Zohar. Statistics in Medicine, 34(1):13–17.
Yin, G. and Yuan, Y. (2009a). Bayesian dose finding in oncology for drug
combinations by copula regression. Journal of the Royal Statistical Society:
Series C (Applied Statistics), 58(2):211–224.
Yin, G. and Yuan, Y. (2009b). Bayesian model averaging continual reassess-
ment method in phase I clinical trials. Journal of the American Statistical
Association, 104(487):954–968.
Yin, G. and Yuan, Y. (2009c). A latent contingency table approach to dose
finding for combinations of two agents. Biometrics, 65(3):866–875.
Yin, G., Zheng, S., and Xu, J. (2013). Fractional dose-finding methods with
late-onset toxicity in phase I clinical trials. Journal of Biopharmaceutical
Statistics, 23(4):856–870.
Yuan, Y., Hess, K. R., Hilsenbeck, S. G., and Gilbert, M. R. (2016a). Bayesian
optimal interval design: a simple and well-performing design for phase I
oncology trials. Clinical Cancer Research, 22(17):4291–4301.
Bibliography 215
Yuan, Y., Nguyen, H. Q., and Thall, P. F. (2016b). Bayesian Designs for
Phase I–II Clinical Trials. CRC Press.
Yuan, Y. and Yin, G. (2011a). Bayesian phase I/II adaptively randomized
oncology trials with combined drugs. The Annals of Applied Statistics,
5(2A):924.
Yuan, Y. and Yin, G. (2011b). Robust EM continual reassessment method
in oncology dose finding. Journal of the American Statistical Association,
106(495):818–831.
Yuan, Z., Chappell, R., and Bailey, H. (2007). The continual reassessment
method for multiple toxicity grades: a Bayesian quasi-likelihood approach.
Biometrics, 63(1):173–179.
Zang, Y. and Lee, J. J. (2014). Adaptive clinical trial designs in oncology.
Chinese Clinical Oncology, 3(4):49.
Zang, Y., Lee, J. J., and Yuan, Y. (2014). Adaptive designs for identifying
optimal biological dose for molecularly targeted agents. Clinical Trials,
11(3):319–327.
Zhang, L. and Yuan, Y. (2016). A practical Bayesian design to identify the
maximum tolerated dose contour for drug combination trials. Statistics in
Medicine, 35(27):4924–4936.
Zhang, W., Sargent, D. J., and Mandrekar, S. (2006). An adaptive dose-finding
design incorporating both toxicity and efficacy. Statistics in Medicine,
25(14):2365–2383.
Zhao, L., Lee, J., Mody, R., and Braun, T. M. (2011). The superiority of
the time-to-event continual reassessment method to the rolling six design
in pediatric oncology phase I trials. Clinical Trials, 8(4):361–369.
Zhou, H., Murray, T. A., Pan, H., and Yuan, Y. (2018a). Comparative re-
view of novel model-assisted designs for phase I clinical trials. Statistics in
Medicine, 37(14):2208–2222.
Zhou, H., Yuan, Y., and Nie, L. (2018b). Accuracy, safety, and reliability of
novel phase I trial designs. Clinical Cancer Research, 24(18):4357–4364.
216 Bibliography
Zhou, Y., Lee, J. J., Wang, S., Bailey, S., and Yuan, Y. (2021a). Incorporating
historical information to improve phase I clinical trials. Pharmaceutical
Statistics, 20(6):1017–1034.
Zhou, Y., Lee, J. J., and Yuan, Y. (2019). A utility-based Bayesian optimal
interval (U-BOIN) phase I/II design to identify the optimal biological dose
for targeted and immune therapies. Statistics in Medicine, 38(28):S5299–
S5316.
Zhou, Y., Li, R., Yan, F., Lee, J. J., and Yuan, Y. (2021b). A comparative
study of Bayesian optimal interval (BOIN) design with interval 3 + 3 (i3+
3) design for phase I oncology dose-finding trials. Statistics in Biopharma-
ceutical Research, 13(2):147–155.
Zhou, Y., Lin, R., Lee, J. J., Li, D., Wang, L., Li, R., and Yuan, Y. (2022).
TITE-BOIN12: a Bayesian phase I/II trial design to find the optimal bi-
ological dose with late-onset toxicity and efficacy. Statistics in Medicine,
41(11):1918–1931.
Zohar, S., Katsahian, S., and O’Quigley, J. (2011). An approach to meta-
analysis of dose-finding studies. Statistics in Medicine, 30(17):2109–2116.