Computational Statistics (2024) 39:2975–3005

https://doi.org/10.1007/s00180-023-01420-x

ORIGINAL PAPER

Parameter estimation in nonlinear mixed effect models based on ordinary differential equations: an optimal control approach

Quentin Clairon1,2 · Chloé Pasin3 · Irene Balelli4 · Rodolphe Thiébaut1,2 · Mélanie Prague1,2

Received: 10 December 2022 / Accepted: 18 September 2023 / Published online: 14 October 2023
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023

Abstract
We present a method for parameter estimation for nonlinear mixed-effects models
based on ordinary differential equations (NLME-ODEs). It aims to regularize
the estimation problem in the presence of model misspecification and practical
identifiability issues, while avoiding the need to know or estimate initial conditions
as nuisance parameters. To this end, we define our estimator as a minimizer of a
cost function that incorporates a possible gap between the assumed population-level
model and the specific individual dynamics. The computation of the cost function
leads to formulate and solve optimal control problems at the subject level. Compared
to the maximum likelihood method, we show through simulation examples that our
method improves the estimation accuracy in possibly partially observed systems
with unknown initial conditions or poorly identifiable parameters with or without
model error. We conclude this work with a real-world application in which we
model the antibody concentration after Ebola virus vaccination.

Keywords Dynamic population models · Ordinary differential equations · Optimal control theory · Mechanistic models · Nonlinear mixed effects models · Clinical trial analysis

* Quentin Clairon
[email protected]
1 Inria Bordeaux Sud‑Ouest, Inserm, Bordeaux Population Health Research Center, SISTM Team, UMR1219, University of Bordeaux, 33000 Bordeaux, France
2 Vaccine Research Institute, 94000 Créteil, France
3 Department of Infectious Diseases and Hospital Epidemiology, University Hospital, Collegium Helveticum, Institute of Medical Virology, University of Zurich, Zurich, Switzerland
4 Centre Inria d’Université Côte d’Azur, EPIONE Research Project, Valbonne, France


Content courtesy of Springer Nature, terms of use apply. Rights reserved.



1 Introduction

Ordinary differential equation (ODE) models are standard in population dynamics, epidemiology, virology, pharmacokinetics, and genetic regulatory network analysis. They describe the main mechanisms of interaction between the biological components of complex systems and their evolution over time, and they provide reasonable approximations to stochastic dynamics (Perelson et al. 1996; Engl et al. 2009; Villain et al. 2019).
For experimental designs with a large number of subjects and a limited number of individual measurements, nonlinear mixed-effects models may be more relevant than single-subject models, since they pool information from the entire population while accounting for variability among individuals. Clinical trials and pharmacokinetics/pharmacodynamics studies often fall into this category (Lavielle and Mentre 2011; Guedj et al. 2007). Formally, we are interested in a population where the dynamics of each subject $i = 1, \dots, n$ are modeled by the $d$-dimensional ODE:

\[
\begin{cases}
\dot{x}_i(t) = f_{\theta,b_i}(t, x_i(t)) \\
x_i(0) = x_{i,0}
\end{cases}
\tag{1}
\]

where $f$ is a $d$-dimensional vector field, $\theta$ is a $p$-dimensional parameter, $b_i \sim N(0, \Psi)$ is a $q$-dimensional random effect where $\Psi$ is a variance-covariance matrix, and $x_{i,0}$ is the initial condition for subject $i$. We denote by $X_{\theta,b_i,x_{i,0}}$ the solution of (1) for a given set $(\theta, b_i, x_{i,0})$. In (1), we can also incorporate covariate functions $z_i$, which are omitted here for the sake of clarity.
Our goal is to estimate the true population parameters $(\theta^*, \Psi^*)$ as well as the true subject specific realizations $\{b_i^*\}_{i=1,\dots,n}$ from partial and noisy observations coming from $n$ subjects and described by the following observational model:

\[
y_{ij} = C X_{\theta^*, b_i^*, x_{i,0}^*}(t_{ij}) + \epsilon_{ij}.
\]

For the $i$-th subject, we denote by $t_{ij}$ its $j$-th measurement time-point on the observation interval $[0, T]$ and by $n_i$ its total number of available measurements. Here $C$ is a $d^o \times d$ observation matrix reflecting the potentially partially observed nature of the process, and $\epsilon_{ij} \sim \sigma^* \times N(0, I_{d^o})$ is the measurement error. The vector $\mathbf{y}_i = \{y_{ij}\}_{j=1,\dots,n_i}$ corresponds to the set of observations available for the $i$-th subject and $\mathbf{y} = \{\mathbf{y}_i\}_{i=1,\dots,n}$ is the set of all observations in the population. We also assume that only a subset $x_{i,0}^{k*}$ of $x_{i,0}^*$ is perfectly known, the other components, denoted $x_{i,0}^{u*}$, being unknown. For the sake of clarity, they are ordered as $x_{i,0} = \left( (x_{i,0}^u)^T, (x_{i,0}^k)^T \right)^T$. Nonetheless, pre-existing information can be available for $x_{i,0}^{u*}$ in the form of an a priori distribution with a possibly parameter dependent density $\mathbb{P}(x_{i,0}^u \mid \theta, \Psi, b_i)$. The same holds for $(\theta, \Psi)$, for which an a priori distribution $\mathbb{P}(\theta, \Psi)$ can be available.
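To make the observation model concrete, the following minimal sketch simulates a population from a toy one-dimensional model. Everything specific here is an assumption chosen for illustration and does not come from the paper: the ODE dx/dt = -theta * exp(b_i) * x (which has a closed-form solution), a fully observed state (C = 1), a known common initial value, and the function names.

```python
import math
import random


def decay_solution(theta, b_i, x0, t):
    """Closed-form solution of dx/dt = -theta * exp(b_i) * x with x(0) = x0."""
    return x0 * math.exp(-theta * math.exp(b_i) * t)


def simulate_population(n, times, theta, psi, sigma, x0, seed=0):
    """Draw b_i ~ N(0, psi) and observations y_ij = X(t_ij) + eps_ij, eps_ij ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        b_i = rng.gauss(0.0, math.sqrt(psi))  # subject-specific random effect
        y_i = [decay_solution(theta, b_i, x0, t) + rng.gauss(0.0, sigma) for t in times]
        subjects.append({"b": b_i, "times": list(times), "y": y_i})
    return subjects


population = simulate_population(n=5, times=[0.5, 1.0, 2.0, 4.0],
                                 theta=0.8, psi=0.1, sigma=0.05, x0=10.0)
```

Each entry of `population` holds one subject's random effect, sampling times and noisy observations, which mirrors the roles of $b_i^*$, $t_{ij}$ and $y_{ij}$ in the text.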


Our problem belongs to the class of parameter estimation problems in nonlinear mixed effect models based on ODEs (NLME-ODEs). In this context, two families of methods have been proposed: frequentist methods based on likelihood maximization (via different numerical procedures: Laplace approximation (Pinheiro and Bates 1994), Gaussian quadrature (Guedj et al. 2007) or SAEM (Comets et al. 2017; Lavielle and Mentré 2007)) and Bayesian ones aiming to reconstruct the a posteriori distribution or to derive the maximum a posteriori estimator (via MCMC algorithms (Lunn et al. 2000; Huang and Dagne 2011), importance sampling (Raftery and Bao 2010), or approximation of the asymptotic posterior distribution (Prague et al. 2013)). In particular, dedicated methods and software exploiting the structure of ODE models have been implemented to increase numerical stability and speed up convergence (Tornoe et al. 2004), to reduce the computational time (Donnet and Samson 2006), or to avoid repeated model integration and the estimation of initial conditions (Wang et al. 2014). However, with the exception of Wang et al. (2014), all the preceding methods face similar pitfalls due to specific features of population models based on ODEs:

1. They do not take into account the presence of model misspecification, a common feature of ODE models used in biology. Indeed, the ODE modeling process suffers from model inadequacy, understood as the discrepancy between the mean model response and the real process, as well as residual variability subject to specific stochastic perturbations or missing elements that disappear by averaging over the entire population (Kennedy and O’Hagan 2001). As examples of the causes of model inadequacy, one can think of the ODE models used in epidemiology and virology, which are derived by approximations in which, for example, interactions are modeled by pairwise products, while higher order terms and/or the influence of unknown/unmeasured external factors are neglected. As for residual variability, recall that biological processes are often stochastic and the justification for deterministic modeling lies in the approximation of stochastic processes (Kurtz 1978; Kampen 1992). Moreover, in the context of NLME-ODEs, new sources of model uncertainty emerge. Firstly, measurement error in the covariates $z_i$ can lead to the use of a proxy function $\hat{z}_i$ instead of $z_i$ (Huang and Dagne 2011). Secondly, the sequential nature of most inference methods leads to estimating $\{b_i^*\}_{i=1,\dots,n}$ based on an approximation $\hat{\theta}$ instead of $\theta^*$. Thus, the structure of mixed-effect models spreads measurement uncertainty into the mechanistic model structure during the estimation: it turns classical statistical uncertainties into causes of model error. Estimation of $\theta^*$, $\Psi^*$ and $\{b_i^*\}_{i=1,\dots,n}$ must be performed in the presence of model error, although it is known to dramatically affect the accuracy of methods that do not take it into account (Brynjarsdottir and O’Hagan 2014).
2. They have to estimate or make assumptions on the true unknown initial conditions $x_{i,0}^{u*}$. In ODE models, the initial conditions are generally nuisance parameters in the sense that inferring their values does not bring answers to the scientific questions which motivate the model construction but is necessary for the estimation of the relevant parameters. For example, partially observed compartmental models used in pharmacokinetics/pharmacodynamics often


involve unknown initial conditions which need to be inferred to estimate the transmission rates between compartments, which are the true parameters of interest. Unknown initial conditions imply either assumptions on the values of $x_{i,0}^{u*}$ (Lavielle and Mentre 2011), another potential cause of model misspecification, or their estimation (Huang and Lu 2008). This latter case requires a priori knowledge to derive an expression for $\mathbb{P}(x_{i,0}^u \mid \theta, \Psi, b_i)$ and the simultaneous inference of $(b_i^*, x_{i,0}^{u*})$ as subject specific parameters. This increases the complexity of the related optimization problem and can degrade estimation accuracy.
3. They can face accuracy degradation when the inverse problem of parameter estimation is ill-posed (Engl et al. 2009) due to practical identifiability issues. Ill-posedness in ODE models is often due to the geometry induced by the mapping $(\theta, b_i, x_{i,0}) \mapsto C X_{\theta,b_i,x_{i,0}}$, where there can be a small number of relevant directions of variation skewed from the original parameter axes (Gutenkunst et al. 2007). This problem, called sloppiness, often appears in ODE models used in biology (Gutenkunst et al. 2007; Leary et al. 2015) and leads to an ill-conditioned Fisher Information Matrix. For maximum likelihood estimators this is a cause of high variance due to the Cramér-Rao bound. For Bayesian inference, it leads to a nearly singular asymptotic posterior distribution because of the Bernstein–von Mises theorem (see Campbell (2007) for the induced computational problems). Although this problem is partly mitigated in NLME-ODEs, which merge different subjects for estimating $(\theta^*, \Psi^*)$ and use the distribution of $b_i \mid \Psi$ as a prior at the subject level (Lavielle and Aarons 2015), estimation accuracy can benefit from the use of regularization techniques.

These specific features of ODE-based population models limit the amount of information classic approaches can extract from observations for estimation purposes, regardless of their quality or abundance. This advocates for the development of new estimation procedures. Approximate methods (Varah 1982; Ramsay et al. 2007) have already proven useful for ODE models facing such issues. They rely on an approximation of the solution of the original ODE (1) which is expected to depend more smoothly on the parameters and to relax the constraint imposed by the model during the estimation process. The interest of such approximations is twofold. Firstly, they produce estimators with a better conditioned variance matrix compared to classic likelihood based approaches. Secondly, they reduce the effect of model error on estimator accuracy. Also, some of these approximations bypass the need to estimate initial conditions (Ramsay et al. 2007; Clairon 2020). Still, these methods are currently limited to cases where observations come from a single subject.
In this work, we develop a new estimation method adapted to NLME-ODEs integrating such approximations to mitigate the effect of model misspecification and poorly identifiable parameters on estimation accuracy, while avoiding the need to estimate $x_{i,0}^{u*}$ as additional subject specific parameters. In contrast to the methods mentioned above, we propose here a hierarchical profiling approach, taking the form of a nested estimation procedure, instead of relying on a classic joint likelihood


specification. One aim is to avoid the high-dimensional integrations often required by likelihood approaches in mixed-effect models (Pinheiro and Bates 1994). The unknown initial conditions $\{x_{i,0}^{u*}\}_{i=1,\dots,n}$ are seen as nuisance parameters for the estimation of $\{b_i^*\}_{i=1,\dots,n}$, which are in turn considered as nuisance parameters for the estimation of the population parameters $(\theta^*, \Psi^*, \sigma^*)$. This leads to the construction of outer, middle and inner criteria for the estimation of $(\theta^*, \Psi^*, \sigma^*)$, $\{b_i^*\}_{i=1,\dots,n}$ and $\{x_{i,0}^{u*}\}_{i=1,\dots,n}$ respectively. The inner criterion is designed to incorporate $\mathbb{P}(x_{i,0}^u \mid \theta, \Psi, b_i)$ if an expression is proposed for it, but can also be defined without it if no prior information exists for $\{x_{i,0}^{u*}\}_{i=1,\dots,n}$. Also, this criterion accounts for the presence of model error by assuming that the actual dynamics of each subject are better described by a perturbed version of the ODE (1). This added perturbation aims to capture various sources of errors at the subject level (Brynjarsdottir and O’Hagan 2014; Tuo and Wu 2015). We control the magnitude of the acceptable perturbations by defining the inner criterion through a cost function balancing the two competing objectives of fidelity to the observations and fidelity to the original model: to this end, we introduce a model discrepancy penalization term. The practical computation of the chosen perturbations requires solving optimal control problems (Clarke 2013) known as tracking problems. This is done using a method inspired by Cimen and Banks (2004), which has the advantage of automatically providing an estimator for $x_{i,0}^{u*}$ with no additional computational cost. This is the key element to efficiently profile on unknown initial conditions during the estimation of $b_i^*$, and to treat them as nuisance parameters instead of integrating them into the definition of $b_i$, as is usually done in the previously mentioned methods.
In Sect. 2, we present the inner, middle and outer criteria used to define our estimator. In Sect. 3, we compare our approach with classic maximum likelihood in simulations. Then, we proceed to the analysis of real data coming from clinical studies, with a model of the antibody concentration dynamics following immunization with an Ebola vaccine in East African participants (Pasin et al. 2019). Section 5 concludes and discusses future extensions of the method.

2 Estimator construction: definition of the inner/middle/outer criteria

From now on, we use the Cholesky decomposition $\sigma^2 \Psi^{-1} = \Delta^T \Delta$ and the parametrization $\phi := (\theta, \Delta, \sigma)$ instead of $(\theta, \Psi, \sigma)$ to enforce positiveness and symmetry of $\Psi$ and to denote compactly the set of all population parameters. The norm $\|\cdot\|_2$ denotes the classic Euclidean one, defined by $\|b\|_2 = \sqrt{b^T b}$. We estimate the population and individual parameters via a nested procedure:


• Estimator $\hat{\phi}$ obtained by minimization of an outer criterion $F$ based on an approximation of $\min_{b} \min_{x_0^u} \left( -\ln \mathbb{P}(\phi, b, x_0^u \mid \mathbf{y}) \right)$, the log joint-distribution of $(\phi, b, x_0^u)$ sequentially profiled on $b := \{b_i\}_{i=1,\dots,n}$ and $x_0^u := \{x_{i,0}^u\}_{i=1,\dots,n}$, which are respectively the sets of all random effects and unknown initial conditions among all subjects.
• Estimator $\hat{b}_i := \hat{b}_i(\phi)$ obtained for each subject $i$ by minimization of a middle criterion $G_i$ based on an approximation of $\min_{x_{i,0}^u} \left( -\ln \mathbb{P}(\mathbf{y}_i, b_i, x_{i,0}^u \mid \phi) \right)$, the log joint-distribution of the data, the random effects and the unknown initial conditions, profiled on the latter.
• Estimator $\hat{x}_{i,0}^u := \hat{x}_{i,0}^u(\phi, b_i)$ obtained for each subject $i$ by minimization of an inner criterion $H_i$ based on an approximation of $-\ln \mathbb{P}(\mathbf{y}_i, x_{i,0}^u \mid \phi, b_i)$, the log joint-distribution of the data and the unknown initial conditions.

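Our estimation procedure can be expressed in a pseudo-algorithmic way, where $\tilde{\mathbb{P}}$ denotes the approximated distributions.

1. Outer criterion minimization:

\[
\begin{aligned}
\hat{\phi} &= \arg\min_{\phi} F(\phi) \\
&:= \arg\min_{\phi} \min_{b} \min_{x_0^u} -2 \ln \tilde{\mathbb{P}}(\phi, b, x_0^u \mid \mathbf{y}) \\
&= \arg\min_{\phi} -2 \ln \tilde{\mathbb{P}}(\phi, \hat{b}, \hat{x}_0^u \mid \mathbf{y}),
\end{aligned}
\]

for a given subject $i$ and $\phi$ value:

2. Middle criterion minimization:

\[
\begin{aligned}
\hat{b}_i(\phi) &= \arg\min_{b_i} G_i(b_i \mid \phi) \\
&:= \arg\min_{b_i} \min_{x_{i,0}^u} -2 \ln \tilde{\mathbb{P}}(\mathbf{y}_i, b_i, x_{i,0}^u \mid \phi) \\
&= \arg\min_{b_i} -2 \ln \tilde{\mathbb{P}}(\mathbf{y}_i, b_i, \hat{x}_{i,0}^u \mid \phi),
\end{aligned}
\]

for a given $b_i$ value:

3. Inner criterion minimization:

\[
\begin{aligned}
\hat{x}_{i,0}^u(\phi, b_i) &= \arg\min_{x_{i,0}^u} H_i(x_{i,0}^u \mid \phi, b_i) \\
&:= \arg\min_{x_{i,0}^u} -2 \ln \tilde{\mathbb{P}}(\mathbf{y}_i, x_{i,0}^u \mid \phi, b_i).
\end{aligned}
\]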
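The nesting of the three minimizations can be sketched on a toy problem. The example below is only an illustration of the outer/middle/inner structure and is not the paper's estimator: it assumes a scalar model x(t) = x0 * exp(-(theta + b_i) * t) with plain least squares (no optimal control relaxation), a uniform prior on x0, known sigma and psi, and grid searches for the middle and outer steps. All function names are hypothetical.

```python
import math


def inner(theta, b, t, y, sigma):
    """Inner step: closed-form minimiser over the unknown initial value x0."""
    e = [math.exp(-(theta + b) * tj) for tj in t]
    x0_hat = sum(yj * ej for yj, ej in zip(y, e)) / sum(ej * ej for ej in e)
    h = sum((yj - x0_hat * ej) ** 2 for yj, ej in zip(y, e)) / sigma ** 2
    return x0_hat, h


def middle(theta, t, y, sigma, psi, b_grid):
    """Middle step: minimise G(b) = H(x0_hat(b)) + b^2 / psi over a grid of b."""
    best = None
    for b in b_grid:
        _, h = inner(theta, b, t, y, sigma)
        g = h + b * b / psi
        if best is None or g < best[1]:
            best = (b, g)
    return best


def outer(subjects, sigma, psi, theta_grid, b_grid):
    """Outer step: minimise F(theta) = sum_i G_i(b_hat_i(theta)) over a grid of theta."""
    best = None
    for theta in theta_grid:
        f = sum(middle(theta, s["t"], s["y"], sigma, psi, b_grid)[1] for s in subjects)
        if best is None or f < best[1]:
            best = (theta, f)
    return best


# Noise-free toy data: three subjects sharing the rate 1.0, with different x0.
times = [0.5, 1.0, 2.0, 3.0]
subjects = [{"t": times, "y": [x0 * math.exp(-t) for t in times]} for x0 in (5.0, 8.0, 10.0)]
theta_grid = [0.5 + 0.05 * i for i in range(21)]
b_grid = [-1.0 + 0.05 * i for i in range(41)]
theta_hat, f_min = outer(subjects, sigma=0.1, psi=0.1, theta_grid=theta_grid, b_grid=b_grid)
```

On this noise-free data the outer search recovers the common rate, and each evaluation of the outer criterion triggers a full middle search, which in turn calls the inner closed form. This is the cost structure that makes a cheap inner step valuable.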

In the following sections, we derive the expressions of $F$, $G_i$ and $H_i$, starting with $H_i$ since the construction of each criterion relies on the lower level ones. Finally, although the following formal presentation of the criteria holds for any expression of $\mathbb{P}(x_{i,0}^u \mid \phi, b_i)$, in practice we have to restrict ourselves to uniform, normal and log-normal densities to use our numerical procedures.


2.1 Inner criterion

In this section, we construct the criterion $H_i$ used to estimate $x_{i,0}^{u*}$ for a given $(\phi, b_i)$ value. A classic procedure would lead to jointly estimating $(b_i^*, x_{i,0}^{u*})$ by maximization of the log joint-likelihood function of the data and $(b_i, x_{i,0}^u)$. However, for each subject, we want to:

1. profile on $x_{i,0}^{u*}$ during random effects estimation to limit the degradation of $b_i^*$ estimation due to the presence of nuisance parameters,
2. use the prior knowledge given by $\mathbb{P}(x_{i,0}^u \mid \phi, b_i)$ if available,
3. allow an acceptable deviation from the assumed model at the population level to take into account possible model misspecifications.

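To address the first and second points, we define our estimator:

1. as the maximizer of the joint conditional likelihood $\mathbb{P}(\mathbf{y}_i, x_{i,0}^u \mid \phi, b_i)$ if $\mathbb{P}(x_{i,0}^u \mid \phi, b_i)$ is available,
2. otherwise as the maximizer of $\mathbb{P}(\mathbf{y}_i \mid \phi, b_i, x_{i,0}^u)$.

From the expression $\mathbb{P}(\mathbf{y}_i, x_{i,0}^u \mid \phi, b_i) = \mathbb{P}(\mathbf{y}_i \mid \phi, b_i, x_{i,0}^u)\, \mathbb{P}(x_{i,0}^u \mid \phi, b_i)$, we derive

\[
\arg\max_{x_{i,0}^u} \mathbb{P}(\mathbf{y}_i \mid \phi, b_i, x_{i,0}^u) = \arg\max_{x_{i,0}^u} \mathbb{P}(\mathbf{y}_i, x_{i,0}^u \mid \phi, b_i) \quad \text{if } \mathbb{P}(x_{i,0}^u \mid \phi, b_i) \text{ is constant.}
\]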

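So, the estimation criterion in the absence of prior information is equivalent to choosing a uniform prior over the space of $x_{i,0}^u$ and constitutes only a particular case. We will thus focus on $\mathbb{P}(\mathbf{y}_i, x_{i,0}^u \mid \phi, b_i)$ from now on. We have:

\[
\begin{aligned}
\mathbb{P}(\mathbf{y}_i, x_{i,0}^u \mid \phi, b_i) &= \Big( \prod_j \mathbb{P}(y_{ij} \mid \phi, b_i, x_{i,0}^u) \Big)\, \mathbb{P}(x_{i,0}^u \mid \phi, b_i) \\
&= \Big( \prod_j (2\pi)^{-d^o/2} \sigma^{-d^o} e^{-0.5 \left\| C X_{\theta,b_i,x_{i,0}}(t_{ij}) - y_{ij} \right\|_2^2 / \sigma^2} \Big)\, \mathbb{P}(x_{i,0}^u \mid \phi, b_i),
\end{aligned}
\]

from which we derive the joint likelihood estimator:

\[
\begin{aligned}
\hat{x}_{i,0}^u(\phi, b_i) &= \arg\min_{x_{i,0}^u} -2 \ln \mathbb{P}(\mathbf{y}_i, x_{i,0}^u \mid \phi, b_i) \\
&= \arg\min_{x_{i,0}^u} \Big\{ \frac{1}{\sigma^2} \sum_j \left\| C X_{\theta,b_i,x_{i,0}}(t_{ij}) - y_{ij} \right\|_2^2 - 2 \ln \mathbb{P}(x_{i,0}^u \mid \phi, b_i) \Big\}.
\end{aligned}
\]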

We also want to allow the presence of perturbations at the subject scale compared to the original model defined at the population level. For this, we assume the regression function is no longer $X_{\theta,b_i,x_{i,0}}$, but rather $X_{\theta,b_i,x_{i,0},u_i}$, the solution of:

\[
\begin{cases}
\dot{x}_i(t) = f_{\theta,b_i}(t, x_i(t)) + B u_i(t) \\
x_i(0) = x_{i,0}.
\end{cases}
\tag{2}
\]


This perturbed ODE has been obtained by the addition of the forcing term $t \mapsto B u_i(t)$ to ODE (1), with $B$ a $d \times d_u$ matrix and $u_i$ a function in $L^2\left([0, T], \mathbb{R}^{d_u}\right)$ representing the perturbation. However, to ensure the possible perturbations remain small, we replace the data fitting criterion $\sum_j \left\| C X_{\theta,b_i,x_{i,0}}(t_{ij}) - y_{ij} \right\|_2^2$ by $\min_{u_i} C_i(x_{i,0}^u, u_i \mid \theta, b_i, U)$, where

\[
C_i(x_{i,0}^u, u_i \mid \theta, b_i, U) = \sum_j \left\| C X_{\theta,b_i,x_{i,0},u_i}(t_{ij}) - y_{ij} \right\|_2^2 + \left\| u_i \right\|_{U,L^2}^2,
\]

and $\left\| u_i \right\|_{U,L^2}^2 = \int_0^T u_i(t)^T U u_i(t)\, dt$ is the weighted Euclidean norm. Here, the magnitude of the allowed perturbations is controlled by a positive definite and symmetric weighting matrix $U$. Finally, we obtain:


\[
\hat{x}_{i,0}^u(\phi, b_i) := \arg\min_{x_{i,0}^u} H_i(x_{i,0}^u \mid \phi, b_i)
\tag{3}
\]

where

\[
H_i(x_{i,0}^u \mid \phi, b_i) = \frac{1}{\sigma^2} \min_{u_i} C_i(x_{i,0}^u, u_i \mid \theta, b_i, U) - 2 \ln \mathbb{P}(x_{i,0}^u \mid \phi, b_i).
\]

Computing $H_i(x_{i,0}^u \mid \phi, b_i)$ requires solving the infinite dimensional optimization problem $\min_{u_i} C_i(x_{i,0}^u, u_i \mid \theta, b_i, U)$ in $L^2\left([0, T], \mathbb{R}^{d_u}\right)$. This problem belongs to the field of optimal control theory, for which dedicated approaches have been developed (Sontag 1998; Aliyu 2011; Clarke 2013). Here we use the same method as in Clairon (2020), which is detailed in Appendix A. All it requires from the user is to specify a pseudo-linear representation of ODE (1), i.e., a possibly state-dependent matrix $A_{\theta,b_i}(t, x_i(t))$ and a state-independent vector $r_{\theta,b_i}(t)$ such that:

\[
f_{\theta,b_i}(t, x_i(t)) = A_{\theta,b_i}(t, x_i(t))\, x_i(t) + r_{\theta,b_i}(t).
\tag{4}
\]
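As an illustration of a pseudo-linear representation and of its non-uniqueness, the sketch below writes a Lotka-Volterra vector field in the form $A(x)\,x$ with $r = 0$ in two different ways. The Lotka-Volterra model, the parameter names and the helper functions are assumptions chosen for the example; they are not taken from the paper.

```python
def f(x, p):
    """Lotka-Volterra vector field (illustrative model, not from the paper)."""
    x1, x2 = x
    return [p["a"] * x1 - p["beta"] * x1 * x2,
            -p["gamma"] * x2 + p["delta"] * x1 * x2]


def a_first(x, p):
    """First choice: interaction terms attached to the x1 column."""
    x1, x2 = x
    return [[p["a"] - p["beta"] * x2, 0.0],
            [p["delta"] * x2, -p["gamma"]]]


def a_second(x, p):
    """Second choice: interaction terms attached to the x2 column."""
    x1, x2 = x
    return [[p["a"], -p["beta"] * x1],
            [0.0, -p["gamma"] + p["delta"] * x1]]


def matvec(m, x):
    """Plain matrix-vector product for nested lists."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


params = {"a": 1.1, "beta": 0.4, "gamma": 0.4, "delta": 0.1}
state = [3.0, 2.0]
lhs = f(state, params)
rhs_first = matvec(a_first(state, params), state)
rhs_second = matvec(a_second(state, params), state)
```

Both `rhs_first` and `rhs_second` reproduce `lhs`, which shows that the same nonlinear field admits several state-dependent matrices.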

This formulation is crucial for solving the optimal control problem in a computationally efficient way. Linear models already fit in this formalism with $A_{\theta,b_i}(t) := A_{\theta,b_i}(t, x_i(t))$. For nonlinear models, the pseudo-linear representation is not unique but always exists (see Cimen (2008), section 6, for how to exploit this non-uniqueness as an additional degree of freedom). This method has the advantage of formulating $\min_{u_i} C_i(x_{i,0}^u, u_i \mid \theta, b_i, U)$ as a quadratic form (or a sequence of quadratic forms) with respect to $x_{i,0}^u$. Thus, if we choose a uniform, normal or log-normal law for $\mathbb{P}(x_{i,0}^u \mid \phi, b_i)$, $\arg\min_{x_{i,0}^u} H_i(x_{i,0}^u \mid \phi, b_i)$ has a closed form expression (approximated for the log-normal case), and obtaining $\hat{x}_{i,0}^u(\phi, b_i)$ does not add any computational complexity compared to $\min_{u_i} C_i(x_{i,0}^u, u_i \mid \theta, b_i, U)$.

For a given $x_{i,0}^u$, the perturbation $u_i$ corresponding to the solution of $\min_{u_i} C_i(x_{i,0}^u, u_i \mid \theta, b_i, U)$ is named the optimal control and denoted $\bar{u}_{i,\phi,b_i,x_{i,0}^u}$. In particular,


we denote by $\bar{u}_{i,\theta,b_i} := \bar{u}_{i,\theta,b_i,\hat{x}_{i,0}}$ the optimal control corresponding to the initial condition estimator $\hat{x}_{i,0} = \left( \left(\hat{x}_{i,0}^u(\phi, b_i)\right)^T, \left(x_{i,0}^k\right)^T \right)^T$. The solution of (2) corresponding to the optimal control $u_i := \bar{u}_{i,\theta,b_i}$ is denoted $\bar{X}_{\theta,b_i}$ and named the optimal trajectory: it will be considered as the regression function for the $i$-th subject. $\bar{X}_{\theta,b_i}$ is thus defined as the solution of ODE (2) which needs the smallest perturbation in order to get close to the observations. In particular, $\bar{X}_{\theta,b_i}$ and $\bar{u}_{i,\theta,b_i}$ are respectively the subject specific state variable and perturbation such that:

\[
H_i(\hat{x}_{i,0}^u(\phi, b_i) \mid \phi, b_i) = \frac{1}{\sigma^2} \Big\{ \sum_j \left\| C \bar{X}_{\theta,b_i}(t_{ij}) - y_{ij} \right\|_2^2 + \left\| \bar{u}_{i,\theta,b_i} \right\|_{U,L^2}^2 \Big\} - 2 \ln \mathbb{P}(\hat{x}_{i,0}^u(\phi, b_i) \mid \phi, b_i).
\tag{5}
\]

Again, formal expressions can be derived for both $\bar{u}_{i,\theta,b_i}$ and $\hat{x}_{i,0}^u(\phi, b_i)$, but they are not needed for the exposition and are left in Appendix A.
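To give a feel for the tracking problem behind the inner criterion, here is a minimal numerical sketch for a scalar linear model dx/dt = -theta * x + u(t). It is not the method of Appendix A: as a stand-in, it assumes an explicit Euler discretization, a piecewise-constant control, a known initial value, B = C = 1, a scalar weight in place of the matrix U, and a naive gradient descent with finite-difference gradients and backtracking. All names are hypothetical.

```python
def simulate(theta, x0, u, h):
    """Explicit Euler for dx/dt = -theta * x + u(t), with u piecewise constant."""
    xs = [x0]
    for uk in u:
        xs.append(xs[-1] + h * (-theta * xs[-1] + uk))
    return xs


def cost(theta, x0, u, h, obs, weight):
    """Discretised tracking cost: squared misfit at observed steps + weight * int u^2 dt."""
    xs = simulate(theta, x0, u, h)
    misfit = sum((xs[k] - y) ** 2 for k, y in obs)
    penalty = weight * h * sum(uk * uk for uk in u)
    return misfit + penalty


def optimal_control(theta, x0, h, n_steps, obs, weight, n_iter=200, eps=1e-6):
    """Gradient descent on u; a step is accepted only if it decreases the cost."""
    u = [0.0] * n_steps
    current = cost(theta, x0, u, h, obs, weight)
    for _ in range(n_iter):
        grad = []
        for k in range(n_steps):
            bumped = u[:]
            bumped[k] += eps
            grad.append((cost(theta, x0, bumped, h, obs, weight) - current) / eps)
        step = 1.0
        while step > 1e-10:
            trial = [uk - step * gk for uk, gk in zip(u, grad)]
            value = cost(theta, x0, trial, h, obs, weight)
            if value < current:
                u, current = trial, value
                break
            step *= 0.5
        else:
            break  # no decreasing step was found: stop iterating
    return u, current


# Observations (step index, value) that the unperturbed model cannot reproduce.
obs = [(5, 0.8), (10, 0.6), (20, 0.3)]
cost_without_control = cost(1.0, 1.0, [0.0] * 20, 0.1, obs, 0.1)
u_opt, cost_with_control = optimal_control(1.0, 1.0, 0.1, 20, obs, 0.1)
```

Increasing `weight` pulls the trajectory back toward the unperturbed model, while decreasing it lets the control absorb more of the misfit. This is the balance that the weighting matrix $U$ controls in the text.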

Remark 1 At this stage, we acknowledge the similarities between our approach and the one presented in Wang et al. (2014), an extension of Ramsay et al. (2007) to a population framework. Both methods approximate the original ODE and avoid initial condition estimation. However, Wang et al. (2014) still consider the classic likelihood for $\phi$ estimation, and the absence of a proper probabilistic framework for handling $\{x_{i,0}^u\}_{i=1,\dots,n}$ makes it difficult to incorporate a priori information when available. Moreover, the spline basis decomposition used by Wang et al. (2014) is a source of inaccuracy for ODE solution reconstruction, a cause of estimation error as pointed out in Clairon (2020), in which Ramsay et al. (2007) and control based approaches have been compared in a one subject setting. Finally, the estimation quality of the method proposed in Ramsay et al. (2007) critically depends on hyperparameter choices (basis dimension, knots location, etc.), which can be complicated even when data come from one subject and can thus become intractable for large populations.

2.2 Middle criterion

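To construct an estimator $\hat{b}_i$ of the random effects, we rely on an approximation of $\ln \mathbb{P}(\mathbf{y}_i, b_i, x_{i,0}^u \mid \phi)$ profiled on the unknown initial conditions. Since

\[
\begin{aligned}
\mathbb{P}(\mathbf{y}_i, b_i, x_{i,0}^u \mid \phi) &= \mathbb{P}(\mathbf{y}_i \mid \phi, b_i, x_{i,0}^u)\, \mathbb{P}(b_i, x_{i,0}^u \mid \phi) \\
&= \mathbb{P}(\mathbf{y}_i \mid \phi, b_i, x_{i,0}^u)\, \mathbb{P}(x_{i,0}^u \mid \phi, b_i)\, \mathbb{P}(b_i \mid \phi),
\end{aligned}
\]

with

\[
\mathbb{P}(b_i \mid \phi) = \frac{1}{\sqrt{(2\pi)^q \left| \sigma^2 (\Delta^T \Delta)^{-1} \right|}}\, e^{-\frac{1}{2} b_i^T \frac{\Delta^T \Delta}{\sigma^2} b_i},
\]

we can define as estimator: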


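\[
\begin{aligned}
\hat{b}_i(\phi) &= \arg\min_{b_i} \min_{x_{i,0}^u} -2 \ln \mathbb{P}(\mathbf{y}_i, b_i, x_{i,0}^u \mid \phi) \\
&= \arg\min_{b_i} \min_{x_{i,0}^u} \Big\{ \frac{1}{\sigma^2} \sum_j \left\| C X_{\theta,b_i,x_{i,0}}(t_{ij}) - y_{ij} \right\|_2^2 - 2 \ln \mathbb{P}(x_{i,0}^u \mid \phi, b_i) + \frac{\left\| \Delta b_i \right\|_2^2}{\sigma^2} \Big\}.
\end{aligned}
\]

Still, we use the same relaxation and penalization scheme as in the previous section to account for the presence of model error in the estimation of $b_i^*$. We again replace the term $\sum_j \left\| C X_{\theta,b_i,x_{i,0}}(t_{ij}) - y_{ij} \right\|_2^2$ by $\min_{u_i} C_i(x_{i,0}^u, u_i \mid \theta, b_i, U)$ in the previous criterion and we end up with the following estimator:

\[
\hat{b}_i(\phi) := \arg\min_{b_i} G_i(b_i \mid \phi)
\tag{6}
\]

where:

\[
G_i(b_i \mid \phi) = H_i(\hat{x}_{i,0}^u(\phi, b_i) \mid \phi, b_i) + \frac{\left\| \Delta b_i \right\|_2^2}{\sigma^2}.
\tag{7}
\]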
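The penalty $\|\Delta b_i\|_2^2 / \sigma^2$ in the middle criterion comes directly from the Gaussian density of the random effects under the parametrization $\sigma^2 \Psi^{-1} = \Delta^T \Delta$. The short derivation below is a worked expansion of that density, using only quantities defined in the text; it also shows which terms do not depend on $b_i$ and therefore only matter at the population level.

```latex
\begin{aligned}
-2 \ln \mathbb{P}(b_i \mid \phi)
  &= q \ln(2\pi) + \ln\left| \Psi \right| + b_i^T \Psi^{-1} b_i,
  \qquad \Psi = \sigma^2 (\Delta^T \Delta)^{-1}, \\
\ln\left| \Psi \right|
  &= \ln\left( \sigma^{2q} \left| \Delta^T \Delta \right|^{-1} \right)
   = q \ln \sigma^2 - \ln\left| \Delta^T \Delta \right|, \\
b_i^T \Psi^{-1} b_i
  &= \frac{b_i^T \Delta^T \Delta\, b_i}{\sigma^2}
   = \frac{\left\| \Delta b_i \right\|_2^2}{\sigma^2}, \\
-2 \ln \mathbb{P}(b_i \mid \phi)
  &= q \ln(2\pi) + q \ln \sigma^2 - \ln\left| \Delta^T \Delta \right|
     + \frac{\left\| \Delta b_i \right\|_2^2}{\sigma^2}.
\end{aligned}
```

Only the last term depends on $b_i$, so it is the only one that survives in the subject-level minimization; summed over the $n$ subjects, the terms $q \ln \sigma^2 - \ln|\Delta^T \Delta|$ contribute $nq \ln \sigma^2 - n \ln|\Delta^T \Delta|$ to a population-level criterion.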

2.3 Outer criterion

2.3.1 F general expression

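We focus in this section on population parameter estimation. Classic maximum likelihood based approaches generally consider as estimator:

\[
\hat{\phi} := \arg\max_{\phi} \mathbb{P}(\phi \mid \mathbf{y}) = \arg\max_{\phi} \prod_i \int \mathbb{P}(\phi, b_i, x_{i,0}^u \mid \mathbf{y}_i)\, d(b_i, x_{i,0}^u).
\]

This generally requires the numerical approximation of integrals of possibly high dimension, a source of approximation and computational issues (Pinheiro and Bates 1994). To avoid this, we consider the random effects as nuisance parameters and rely on a classic profiling approach for the estimation of $\phi^*$ (Murphy and van der Vaart 2000). Instead of taking the mean, we rely on the joint distribution profiled sequentially with respect to $b := \{b_i\}_{i=1,\dots,n}$ and $x_0^u = \{x_{i,0}^u\}_{i=1,\dots,n}$, or equivalently on $\min_{b} \min_{x_0^u} \left( -2 \ln \mathbb{P}(\phi, b, x_0^u \mid \mathbf{y}) \right)$. Bayes formula gives us $\mathbb{P}(\phi, b, x_0^u \mid \mathbf{y}) \propto \mathbb{P}(\mathbf{y}, b, x_0^u \mid \phi)\, \mathbb{P}(\phi)$ and we get $\mathbb{P}(\phi, b, x_0^u \mid \mathbf{y}) \propto \left( \prod_i \mathbb{P}(\mathbf{y}_i, b_i, x_{i,0}^u \mid \phi) \right) \mathbb{P}(\phi)$ by conditional independence of the subject by subject observations and subject specific parameters. It follows that, up to an additive constant that does not depend on $(\phi, b, x_0^u)$,

\[
\min_{b} \min_{x_0^u} \left( -2 \ln \mathbb{P}(\phi, b, x_0^u \mid \mathbf{y}) \right) = \sum_i \min_{b_i} \min_{x_{i,0}^u} \left\{ -2 \ln \mathbb{P}(\mathbf{y}_i, b_i, x_{i,0}^u \mid \phi) \right\} - 2 \ln \mathbb{P}(\phi),
\]

from which we derive the estimator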
from which we derive the estimator


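\[
\begin{aligned}
\hat{\phi} = \arg\min_{\phi} \Big\{ & \sum_i \min_{b_i} \min_{x_{i,0}^u} \Big\{ \frac{1}{\sigma^2} \sum_j \left\| C X_{\theta,b_i,x_{i,0}}(t_{ij}) - y_{ij} \right\|_2^2 - 2 \ln \mathbb{P}(x_{i,0}^u \mid \phi, b_i) + \frac{\left\| \Delta b_i \right\|_2^2}{\sigma^2} \Big\} \\
& + \Big( d^o \sum_i n_i + nq \Big) \ln \sigma^2 - n \ln\left| \Delta^T \Delta \right| - 2 \ln \mathbb{P}(\phi) \Big\}
\end{aligned}
\]

by using the exact expression of $\ln \mathbb{P}(\mathbf{y}_i, b_i, x_{i,0}^u \mid \phi)$ (computational details are recalled in Appendix B). In order to account for the presence of model error and limit its effect on estimation, we replace in the last expression the classic profiled likelihood estimators of $b_i^*$ and $x_{i,0}^{u*}$ by $\hat{b}_i(\phi)$ and $\hat{x}_{i,0}^u(\phi, b_i)$ respectively, and $X_{\theta,b_i,x_{i,0}}$ by $\bar{X}_{\theta,b_i}$. This leads us to the following population parameter estimator:

\[
\hat{\phi} := \arg\min_{\phi} F(\phi)
\tag{8}
\]

where:

\[
\begin{aligned}
F(\phi) = {} & \sum_i \Big\{ \frac{1}{\sigma^2} \Big( \sum_j \left\| C \bar{X}_{\theta,\hat{b}_i(\phi)}(t_{ij}) - y_{ij} \right\|_2^2 + \left\| \Delta \hat{b}_i(\phi) \right\|_2^2 \Big) - 2 \ln \mathbb{P}\big(\hat{x}_{i,0}^u(\phi, \hat{b}_i(\phi)) \mid \phi, \hat{b}_i(\phi)\big) \Big\} \\
& + \Big( d^o \sum_i n_i + nq \Big) \ln \sigma^2 - n \ln\left| \Delta^T \Delta \right| - 2 \ln \mathbb{P}(\phi).
\end{aligned}
\tag{9}
\]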

2.3.2 F profiling on $\sigma^2$ for uniform $x_{i,0}^u$ distribution

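If $\mathbb{P}(x_{i,0}^u \mid \phi, b_i)$ is constant, then $\hat{x}_{i,0}^u(\phi, b_i)$ and $\hat{b}_i(\phi)$ do not depend on $\sigma$, i.e. $\hat{x}_{i,0}^u(\phi, b_i) = \hat{x}_{i,0}^u(\theta, b_i)$ and $\hat{b}_i(\phi) = \hat{b}_i(\theta, \Delta)$, and consequently neither does $\bar{X}_{\theta,\hat{b}_i(\theta,\Delta)}$. So, for each $(\theta, \Delta)$, the minimizer in $\sigma^2$ of $F(\phi) = F(\theta, \Delta, \sigma)$ has a closed form expression:

\[
\sigma^2(\theta, \Delta) = \frac{1}{d^o \sum_i n_i + qn} \sum_i \Big\{ \sum_j \left\| C \bar{X}_{\theta,\hat{b}_i(\theta,\Delta)}(t_{ij}) - y_{ij} \right\|_2^2 + \left\| \Delta \hat{b}_i(\theta, \Delta) \right\|_2^2 \Big\}.
\tag{10}
\]

By using the expression of $\sigma^2(\theta, \Delta)$, we get $\min_{\sigma^2} F(\phi) = F[\theta, \Delta]$, up to an additive constant, where:

\[
F[\theta, \Delta] = \Big( d^o \sum_i n_i + qn \Big) \ln\left( \sigma^2(\theta, \Delta) \right) - n \ln\left| \Delta^T \Delta \right| - 2 \ln \mathbb{P}(\phi).
\]

Thus, we can profile $F(\phi)$ on $\sigma^2$ and define our estimator as:

\[
\left( \hat{\theta}, \hat{\Delta} \right) = \arg\min_{(\theta, \Delta)} F[\theta, \Delta].
\tag{11}
\]

An estimator of $\sigma^*$ is obtained from there by computing $\sigma^2(\hat{\theta}, \hat{\Delta})$, given by equation (10). The details of the derivation of $F$ are left in Appendix B.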
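The closed form (10) can be checked numerically. For fixed $(\theta, \Delta)$, the part of $F$ that depends on $\sigma^2$ has the shape $S / \sigma^2 + N \ln \sigma^2$, with $S$ the bracketed sum in (10) and $N = d^o \sum_i n_i + qn$. The sketch below uses made-up values for $S$ and $N$ (assumptions, not from the paper) and compares the closed-form minimizer with nearby values.

```python
import math


def sigma2_part(s, n_total, sigma2):
    """Part of the outer criterion depending on sigma^2: S / sigma^2 + N * ln(sigma^2)."""
    return s / sigma2 + n_total * math.log(sigma2)


def profiled_sigma2(s, n_total):
    """Closed-form minimiser in sigma^2, i.e. equation (10) written as S / N."""
    return s / n_total


s_value, n_value = 12.5, 50          # illustrative values for S and N
best = profiled_sigma2(s_value, n_value)
nearby = [best * factor for factor in (0.5, 0.8, 0.95, 1.05, 1.25, 2.0)]
gaps = [sigma2_part(s_value, n_value, v) - sigma2_part(s_value, n_value, best) for v in nearby]
```

All entries of `gaps` are positive, as expected from setting the derivative $-S/\sigma^4 + N/\sigma^2$ to zero. Substituting $\sigma^2 = S/N$ back gives $N + N \ln(S/N)$, which is where the additive constant $N$ between $\min_{\sigma^2} F$ and $F[\theta, \Delta]$ comes from.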


2.4 Asymptotic Variance‑Covariance matrix estimator for $(\hat{\theta}, \hat{\Delta})$
We derive an estimator of the asymptotic variance of $(\hat{\theta}, \hat{\Delta})$. Here we restrict ourselves to the case described in Sect. 2.3.2, where a uniform distribution is chosen for $x_{i,0}^u$ and the outer criterion is profiled on $\sigma$. The general case can be treated similarly, but we omit it for the sake of clarity since it is not used in the following simulations. We highlight that in practice the matrix $\Delta$ is parametrized by a vector $\delta$ of dimension $q'$, i.e. $\Delta := \Delta(\delta)$, and we give here a variance estimator of $(\hat{\theta}, \hat{\delta})$. From this, the variance of $\hat{\Delta}$ can be obtained using the classic delta method (see van der Vaart (1998) chapter 3).
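As a hedged illustration of the delta method step, the sketch below propagates a covariance matrix for $\hat{\delta}$ to the random-effect standard deviations under the diagonal parametrization $\Delta(\delta) = \mathrm{Diag}(e^{\delta_l})$ that the paper uses in its simulations, for which $\Psi_l = \sigma e^{-\delta_l}$. It assumes that $\sigma$ is treated as fixed and that a covariance estimate for $\hat{\delta}$ is already available; the numerical values and function names are made up for the example.

```python
import math


def psi_sd(sigma, delta):
    """Random-effect standard deviations Psi_l = sigma * exp(-delta_l)."""
    return [sigma * math.exp(-d) for d in delta]


def delta_method_cov(sigma, delta, cov_delta):
    """First-order covariance of (Psi_1, ..., Psi_q): D V D with D = diag(dPsi_l / d delta_l)."""
    grad = [-sigma * math.exp(-d) for d in delta]  # derivative of Psi_l w.r.t. delta_l
    q = len(delta)
    return [[grad[r] * cov_delta[r][c] * grad[c] for c in range(q)] for r in range(q)]


sigma_hat = 2.0
delta_hat = [0.0, math.log(2.0)]
cov_delta_hat = [[0.01, 0.002], [0.002, 0.04]]   # assumed covariance estimate of delta_hat
psi_hat = psi_sd(sigma_hat, delta_hat)
cov_psi_hat = delta_method_cov(sigma_hat, delta_hat, cov_delta_hat)
```

Because the map from $\delta$ to $\Psi$ is componentwise here, the Jacobian is diagonal and the propagated covariance is simply the original one rescaled row- and column-wise.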
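Let us start by presenting sufficient conditions ensuring that our estimator is asymptotically normal. We introduce

\[
h(b_i, \theta, \Delta, \mathbf{y}_i) = \left\| \Delta b_i \right\|_2^2 + \sum_j \left\| C \bar{X}_{\theta,b_i}(t_{ij}) - y_{ij} \right\|_2^2
\]

and assume that:

1. the function
\[
\tilde{F}[\theta, \Delta(\delta)] = -0.5 \left( d^o \mathbb{E}[n_1] + q \right) \ln\left( \frac{\lim_n \frac{1}{n} \sum_i^n \mathbb{E}\left[ h(\hat{b}_i(\theta, \Delta(\delta)), \theta, \Delta(\delta), \mathbf{y}_i) \right]}{d^o \mathbb{E}[n_1] + q} \right) + \ln\left| \Delta(\delta) \right|
\]
has a well separated minimum $(\theta, \delta)$ belonging to the interior of a compact $\Theta \times \Omega$,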
2. the true densities of the unknown initial conditions $\{\mathbb{P}^*(\cdot \mid \phi^*, b_i^*)\}_{i=1,\dots,n}$ have finite variance and either

(a) are identical: $\mathbb{P}^*(\cdot \mid \phi^*, b_i^*) := \mathbb{P}^*(\cdot \mid \phi^*)$,


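(b) or are such that for every $\varepsilon > 0$, we have

\[
\lim_{n \to \infty} \frac{1}{\left( V^{(\nu)} \right)^2} \sum_{i=1}^n \mathbb{E}\left[ \left\| h^{(\nu)}(\mathbf{y}_i) - \mathbb{E}\left[ h^{(\nu)}(\mathbf{y}_i) \right] \right\|_2^2 \mathbf{1}_{\left\{ \left\| h^{(\nu)}(\mathbf{y}_i) - \mathbb{E}\left[ h^{(\nu)}(\mathbf{y}_i) \right] \right\|_2 > \varepsilon V^{(\nu)} \right\}} \right] = 0
\]

for $\nu = 0, 1$, where $h^{(\nu)}(\mathbf{y}_i) = \frac{d^{(\nu)}}{d^{(\nu)}(\theta, \delta)} h(\hat{b}_i(\theta, \Delta(\delta)), \theta, \Delta(\delta), \mathbf{y}_i)$ and $V^{(\nu)} = \sqrt{\sum_i \mathrm{Var}\left( h^{(\nu)}(\mathbf{y}_i) \right)}$,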
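3. the subject specific numbers of observations $\{n_i\}_{i=1,\dots,n}$ are i.i.d. and uniformly bounded,
4. for all possible values $(\theta, b_i)$, the solution $X_{\theta,b_i,x_{i,0}^*}$ belongs to a compact $\chi$ of $\mathbb{R}^d$, and for all $(t, \theta, x)$, the mapping $b_i \mapsto f_{\theta,b_i}(t, x)$ has a compact support $\Theta_b$,
5. $(\theta, b_i, t, x) \mapsto f_{\theta,b_i}(t, x)$ belongs to $C^1(\Theta \times \Theta_b \times [0, T] \times \chi, \mathbb{R}^d)$,
6. the matrices $\frac{\partial^2 C_i}{\partial^2 x_{i,0}^u}\left( \hat{x}_{i,0}^u(\theta, \Delta(\delta)), \bar{u}_{\theta,\hat{b}_i(\theta,\Delta(\delta))} \mid \theta, \hat{b}_i(\theta, \Delta(\delta)), U \right)$ and $\frac{\partial^2}{\partial^2 b_i} G_i\left( \hat{b}_i(\theta, \Delta(\delta)) \mid \theta, \Delta(\delta) \right)$ are of full rank almost surely for every sequence $\mathbf{y}_i$,
7. there is a neighborhood $\Theta_\theta$ of $\theta$ such that $(\theta, b_i, t, x) \mapsto f_{\theta,b_i}(t, x) \in C^5(\Theta_\theta \times \Theta_b \times [0, T] \times \chi, \mathbb{R}^d)$.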

Condition 2b is here to ensure asymptotic normality for non-identically distributed random variables via the Lindeberg-Feller theorem. Conditions 1–4 are


used to derive the consistency of our estimator toward $(\theta, \delta)$ by following the classic steps for M-estimators: proving (i) the uniform convergence of our stochastic cost function to a deterministic one and (ii) the existence of a well-separated minimum for this deterministic function (van der Vaart 1998). Conditions 5–7 ensure that our cost function is asymptotically smooth enough in the vicinity of $(\theta, \delta)$ to proceed to a Taylor expansion and transfer the regularity of the cost function to the asymptotic behavior of $\sqrt{n}(\hat{\theta} - \theta, \hat{\delta} - \delta)$. Less restrictive conditions can be established under which our estimator is still asymptotically normal, in particular regarding the regularity of $f_{\theta,b_i}$ with respect to $t$.

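Theorem 1 Under conditions 1–7, there is a model dependent lower bound $\lambda$ such that if $\|U\|_2 > \lambda$ then the estimator $(\hat{\theta}, \hat{\delta})$ converges almost surely to $(\theta, \delta)$ and is such that:
\[
\sqrt{n}(\hat{\theta} - \theta, \hat{\delta} - \delta) \rightsquigarrow N\left(0, A(\theta,\delta)^{-1} B(\theta,\delta) \left(A(\theta,\delta)^{-1}\right)^T\right)
\]
where
\[
A(\theta,\delta) = \lim_n \frac{1}{n} \sum_{i=1}^{n} \left[\frac{\partial J(\theta,\delta,\mathbf{y_i})}{\partial(\theta,\delta)}\right], \qquad B(\theta,\delta) = \lim_n \frac{1}{n} \sum_{i} \left[J(\theta,\delta,\mathbf{y_i}) J(\theta,\delta,\mathbf{y_i})^T\right]
\]
and the vector valued function $J(\theta,\delta,\mathbf{y_i}) = \begin{pmatrix} J_\theta(\theta,\delta,\mathbf{y_i}) \\ J_\delta(\theta,\delta,\mathbf{y_i}) \end{pmatrix}$ is given by:
\[
\begin{aligned}
J_\theta(\theta,\delta,\mathbf{y_i}) &= \frac{d}{d\theta} h(\hat{b}_i(\theta,\Delta(\delta)), \theta, \Delta(\delta), \mathbf{y_i}) \\
J_\delta(\theta,\delta,\mathbf{y_i}) &= \frac{d}{d\delta} h(\hat{b}_i(\theta,\Delta(\delta)), \theta, \Delta(\delta), \mathbf{y_i}) - \frac{2}{d^o\, \mathbb{E}[n_1] + q} \mathrm{Tr}\left(\Delta(\delta)^{-1} \frac{\partial \Delta(\delta)}{\partial \delta_k}\right) h(\hat{b}_i(\theta,\Delta(\delta)), \theta, \Delta(\delta), \mathbf{y_i}).
\end{aligned}
\]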

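The proof is left in Appendix D. The practical interest of this theorem is to give an estimator of the variance-covariance matrix:
\[
V(\hat{\theta}, \hat{\delta}) \simeq \hat{A}(\hat{\theta}, \hat{\delta})^{-1} \hat{B}(\hat{\theta}, \hat{\delta}) \left(\hat{A}(\hat{\theta}, \hat{\delta})^{-1}\right)^T / n
\]
with
\[
\hat{A}(\hat{\theta}, \hat{\delta}) = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial \hat{J}(\hat{\theta}, \hat{\delta}, \mathbf{y_i})}{\partial(\theta,\delta)}, \qquad \hat{B}(\hat{\theta}, \hat{\delta}) = \frac{1}{n} \sum_{i=1}^{n} \hat{J}(\hat{\theta}, \hat{\delta}, \mathbf{y_i}) \hat{J}(\hat{\theta}, \hat{\delta}, \mathbf{y_i})^T
\]
and the vector valued function $\hat{J}(\theta,\delta,\mathbf{y_i}) = \begin{pmatrix} \hat{J}_\theta(\theta,\delta,\mathbf{y_i}) \\ \hat{J}_\delta(\theta,\delta,\mathbf{y_i}) \end{pmatrix}$ given by $\hat{J}_\theta(\theta,\delta,\mathbf{y_i}) = J_\theta(\theta,\delta,\mathbf{y_i})$ and
\[
\hat{J}_\delta(\theta,\delta,\mathbf{y_i}) = \frac{d}{d\delta} h(\hat{b}_i(\theta,\Delta(\delta)), \theta, \Delta(\delta), \mathbf{y_i}) - \frac{2n}{d^o \sum_i n_i + qn} \mathrm{Tr}\left(\Delta(\delta)^{-1} \frac{\partial \Delta(\delta)}{\partial \delta_k}\right) h(\hat{b}_i(\theta,\Delta(\delta)), \theta, \Delta(\delta), \mathbf{y_i}).
\]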


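Now that we have proven the existence of the variance-covariance matrix $V(\theta, \delta)$ such that $\hat{\delta} - \delta \rightsquigarrow N(0, V(\theta, \delta))$, we can use the Delta method to derive the asymptotic normality of the original matrix $\Psi(\hat{\delta}) = \sigma^2 \left(\Delta(\hat{\delta})^T \Delta(\hat{\delta})\right)^{-1}$ as well as an estimator of its asymptotic variance. In the case of a diagonal matrix $\Psi$, composed of the elements $(\Psi_1^2, \dots, \Psi_q^2)$, and of the parametrization $\Delta(\delta) = \mathrm{Diag}(\{e^{\delta_l}\}_{l=1,\dots,q})$ used in Sect. 3, we derive:
\[
\begin{pmatrix} \Psi_1(\hat{\delta}) \\ \vdots \\ \Psi_q(\hat{\delta}) \end{pmatrix} - \begin{pmatrix} \Psi_1(\delta^*) \\ \vdots \\ \Psi_q(\delta^*) \end{pmatrix} \rightsquigarrow N\left(0, \sigma^2 \begin{pmatrix} e^{-\delta_1^*} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & e^{-\delta_q^*} \end{pmatrix} V(\theta^*, \delta^*) \begin{pmatrix} e^{-\delta_1^*} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & e^{-\delta_q^*} \end{pmatrix}\right).
\]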
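
The Delta-method covariance above can be evaluated numerically once an estimate of $V$ is available. The following Python function is a minimal sketch under stated assumptions, not the implementation used in this work: it assumes $\Psi_l(\delta) = \sigma e^{-\delta_l}$ (which follows from the diagonal parametrization), and it assumes that `V_delta` is the $q \times q$ block of $V$ associated with $\delta$. The function name and its arguments are hypothetical.

```python
import math

def psi_asymptotic_cov(delta, V_delta, sigma):
    """Delta-method covariance of (Psi_1(delta), ..., Psi_q(delta)).

    Assumptions (labelled, not taken from the paper's code):
    - Psi_l(delta) = sigma * exp(-delta_l), i.e. Delta(delta) = Diag(exp(delta_l));
    - V_delta is the q x q asymptotic covariance block of the delta estimator.
    """
    q = len(delta)
    # Diagonal of D = Diag(exp(-delta_l)); the Jacobian of Psi is -sigma * D,
    # and the sign cancels in the quadratic form sigma^2 * D V D.
    d = [math.exp(-dl) for dl in delta]
    return [[sigma ** 2 * d[k] * V_delta[k][l] * d[l] for l in range(q)]
            for k in range(q)]

# Usage with made-up numbers (illustrative only).
cov_psi = psi_asymptotic_cov([0.0, math.log(2.0)], [[1.0, 0.5], [0.5, 2.0]], 2.0)
```

Standard errors for each $\Psi_l$ are then the square roots of the diagonal entries, divided by $\sqrt{n}$ if `V_delta` is expressed on the $\sqrt{n}$ scale.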

Remark 2 Theorem 1 states that we retrieve a parametric convergence rate. Thus, we avoid the pitfall described in Sartori (2003) for profiled methods in the presence of a number of nuisance parameters increasing with the number of subjects (or strata, to use the terminology of Sartori (2003)), which potentially leads to bias accumulation for score functions among subjects. The i.i.d. structure of random effects allows us to rely on the central limit theorem to avoid this accumulation phenomenon.

3 Results on simulated data

We compare the accuracy of our approach with maximum likelihood (ML) in different models and experimental designs reflecting the problems exposed in the introduction, that is, estimation in 1. the presence of model error, 2. a partially observed framework with unknown initial conditions and 3. the presence of poorly identifiable parameters. We proceed to Monte-Carlo simulations based on $N_{MC} = 100$ runs. At each run, we generate $n_i$ observations coming from $n$ subjects on an observation interval $[0, T]$ with Gaussian measurement noise of standard deviation $\sigma^*$. From these data, we estimate $\theta^*$, $\Psi^*$ and $b_i^*$ with both estimation methods. We quantify the accuracy of each component $\hat{\psi}_p$ of the population parameter estimate $\hat{\psi} = (\hat{\theta}, \hat{\Psi})$ via Monte-Carlo computation of the bias $\mathrm{Bias}(\hat{\psi}_p) = \mathbb{E}[\hat{\psi}_p - \psi_p^*]$, the empirical variance $V_e(\hat{\psi}_p) = \mathbb{E}[(\mathbb{E}[\hat{\psi}_p] - \hat{\psi}_p)^2]$, the mean squared error $\mathrm{MSE}(\hat{\psi}_p) = \mathrm{Bias}(\hat{\psi}_p)^2 + V_e(\hat{\psi}_p)$, the estimated variance $\hat{V}(\hat{\psi}_p)$, as well as the coverage rate of the 95%-confidence interval derived from it. This coverage rate, denoted CR in the following results, corresponds to the frequency at which the interval $\left[\hat{\psi}_p \pm z_{0.975} \sqrt{\hat{V}(\hat{\psi}_p)}\right]$ contains $\psi_p^*$, with $z_{0.975}$ the 0.975-quantile of the standard Gaussian law. We compute the previous quantities for the normalized values $\hat{\psi}_p^{norm} := \hat{\psi}_p / \psi_p^*$ to make relevant comparisons among parameters with different orders of magnitude. For $b_i^*$, we estimate the mean squared error $\mathrm{MSE}(\hat{b}_i) = \mathbb{E}\left[\|b_i^* - \hat{b}_i\|_2^2\right]$. For each subsequent example, we give the results for $n = 50$ and present in Appendix C the case $n = 20$ to analyze the evolution of each estimator's accuracy with respect to data sparsity. For one example in section “Effect of population size on estimation accuracy” in Appendix C, we also consider the case $n = 100$ to analyze the evolution of estimation accuracy with respect to an increasing population size.
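
The accuracy criteria defined above can be computed from the Monte-Carlo replicates as in the following Python sketch. It is given for illustration only (the simulations reported here rely on R packages); the function name, its arguments and the returned dictionary are hypothetical. The empirical variance uses the $1/N_{MC}$ normalization so that the decomposition MSE = Bias² + V_e holds exactly.

```python
import math
from statistics import NormalDist

def mc_summary(estimates, est_variances, true_value, level=0.95):
    """Bias, empirical variance, MSE, mean estimated variance and coverage
    rate of one scalar parameter, computed on normalized values psi_hat/psi*.

    estimates     : one point estimate per Monte-Carlo run
    est_variances : the estimated variance reported at each run
    true_value    : the true (non-zero) parameter value psi*
    """
    n_mc = len(estimates)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # 0.975-quantile when level=0.95
    norm = [e / true_value for e in estimates]
    norm_var = [v / true_value ** 2 for v in est_variances]

    mean_est = sum(norm) / n_mc
    bias = mean_est - 1.0  # the normalized true value is 1
    emp_var = sum((e - mean_est) ** 2 for e in norm) / n_mc
    mse = bias ** 2 + emp_var
    # The normalized interval contains 1 iff the raw interval contains psi*.
    covered = sum(abs(e - 1.0) <= z * math.sqrt(v)
                  for e, v in zip(norm, norm_var))
    return {"bias": bias, "emp_var": emp_var, "mse": mse,
            "mean_est_var": sum(norm_var) / n_mc,
            "coverage": covered / n_mc}

# Usage with four made-up replicates (illustrative only).
summary = mc_summary([0.9, 1.1, 1.0, 1.2], [0.04] * 4, true_value=1.0)
```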
In the following, we use the superscript ML to denote the ML estimator. For fairness of comparison with ML, we choose a non-informative prior, i.e. $\ln \mathbb{P}(\theta, \Delta) = 0$, for our method throughout this section (the impact of prior incorporation is analyzed in an example left in Appendix C, section “Effect of prior information on estimation accuracy”; for a discussion on prior choice in NLME-ODEs, see Prague et al. (2012)). Also, we do not use a distribution for $x^u_{i,0}$ for our approach. For ML, which requires it, we use the right parametric form for $\mathbb{P}(x^u_{i,0} \mid \phi, b_i)$. If the ODE (1) has an analytical solution, the ML estimator is computed via the SAEM algorithm (SAEMIX package, Comets et al. (2017)). Otherwise, it is computed via a restricted likelihood method dedicated to ODE models implemented in the nlmeODE package (Tornoe et al. 2004). For our method, we need to select $U$, which balances model and data fidelity in the inner and middle criteria (5)-(7). We use the method presented in G. Hooker and Earn (2011) to compute $EP_i(U)$, the prediction error for subject $i$ corresponding to the estimators $(\hat{\theta}_U, \{\hat{b}_{i,U}\}_{i=1,\dots,n})$ obtained for a given matrix $U$. From this, we compute $EP(U) = \sum_i EP_i(U)$, the global prediction error for the whole population. We test a set of scalar matrices $\{U_l\}_{l=1,\dots,L} = \{\lambda_l \times I_d\}_{l=1,\dots,L}$, retain the hyperparameter value $\lambda_l$ minimizing $EP$, and denote by $(\hat{\theta}, \hat{\Psi}, \{\hat{b}_i\}_{i=1,\dots,n})$ the corresponding estimator. For solving the optimization problems required for computing our criteria, we use the Nelder-Mead algorithm implemented in the optimr package (Nash 2016). All optimization algorithms used here require a starting guess value. We start from the true parameter value for each of them. By doing so, we aim to keep two problems distinct: 1. the numerical stability of the estimation procedures, 2. the intrinsic accuracy of the different estimators. These two problems are correlated, but we aim to address only the latter, which corresponds to the issues raised in the introduction. Still, we checked in a preliminary analysis that the presence of local minima was not an issue in the neighborhood of $(\theta^*, \Delta^*)$ by testing different starting points for all methods. No problem appears for our method and SAEMIX. A negligible number of non-convergence cases appear for nlmeODE; they have been discarded thanks to the convergence criteria embedded in the package (the occurrence and importance of such convergence issues are analyzed in an example left in section “Effect of wrong first guess on estimation accuracy” in Appendix C, in which we show that our method suffers less than ML from convergence issues when initial conditions are unknown).
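
The selection of the penalization matrix among the candidates $U_l = \lambda_l \times I_d$ amounts to a grid search on the global prediction error. The Python sketch below only illustrates this loop: `fit` and `subject_prediction_error` are hypothetical callables standing for the full estimation procedure and for the computation of $EP_i(U)$ respectively, neither of which is reproduced here.

```python
def select_lambda(lambdas, fit, subject_prediction_error, n_subjects):
    """Return (lambda, EP, fitted) minimizing EP(U) = sum_i EP_i(U), U = lambda * I_d.

    fit(lam)                            -> estimates obtained with U = lam * I_d
                                           (hypothetical callable)
    subject_prediction_error(fitted, i) -> EP_i(U) for subject i
                                           (hypothetical callable)
    """
    best = None
    for lam in lambdas:
        fitted = fit(lam)
        ep = sum(subject_prediction_error(fitted, i) for i in range(n_subjects))
        if best is None or ep < best[1]:
            best = (lam, ep, fitted)
    return best

# Toy usage: the "fit" simply returns lambda and the per-subject error is
# read from a made-up table, to show how the pieces connect.
toy_errors = {1e4: 2.0, 1e6: 0.5, 1e8: 1.0}
best_lam, best_ep, _ = select_lambda(
    [1e4, 1e6, 1e8],
    fit=lambda lam: lam,
    subject_prediction_error=lambda fitted, i: toy_errors[fitted],
    n_subjects=3,
)
```

Each call to `fit` is a complete estimation run, so the cost of this search grows linearly with the number of candidate values.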


Fig. 1  Left: Examples of solutions of (12) and corresponding observations. Right: Solution of (12) and a
realization of (13) for the same parameter values

3.1 Application 1—partially observed linear model

Table 1  Results of estimation for model (12). The different subscripts stand for the following estimation scenarios: 1. $x_0$ when both initial conditions are set to $(x^*_{1,i,0}, x^*_{2,i,0})$, 2. $x_{2,0}$ when $x_{1,i,0}$ is set to $y_{i,0}$ and $x_{2,i,0}$ to $x^*_{2,i,0}$, 3. absence of subscript when $x_{1,i,0}$ is set to $y_{i,0}$ and $x_{2,i,0}$ is estimated. Rows without the ML superscript correspond to our method

                                            Well-specified                                       | Misspecified
Parameter  Estimator                        MSE      Bias     V_e      \hat{V}  CR       MSE b_i  | MSE      Bias     V_e      \hat{V}  CR       MSE b_i
θ_1        $\hat\theta^{ML}_{x_0}$          0.01     3e−3     0.01     0.01     0.95              | 0.02     3e−3     0.01     0.01     0.92
           $\hat\theta^{ML}_{x_{2,0}}$      0.01     3e−3     0.01     0.01     0.92              | 0.01     3e−3     0.01     0.01     0.93
           $\hat\theta^{ML}$                0.02     4e−3     0.02     0.01     0.90              | 0.02     0.02     0.02     0.01     0.90
           $\hat\theta$                     0.01     6e−3     7e−3     0.01     0.98              | 0.01     0.01     0.01     0.01     0.96
θ_2        $\hat\theta^{ML}_{x_0}$          2e−4     1e−3     1e−4     1e−4     0.95              | 1e−3     0.02     4e−4     6e−4     0.88
           $\hat\theta^{ML}_{x_{2,0}}$      2e−4     1e−3     1e−4     1e−4     0.93              | 1e−3     0.02     5e−4     6e−4     0.88
           $\hat\theta^{ML}$                5e−4     4e−3     5e−4     3e−4     0.86              | 5e−3     0.02     4e−3     1e−3     0.77
           $\hat\theta$                     4e−4     3e−4     4e−4     5e−4     0.98              | 3e−3     0.01     3e−3     5e−3     0.94
Ψ          $\hat\theta^{ML}_{x_0}$          0.01     −0.02    0.01     0.01     0.93     6e−3     | 0.01     −0.02    0.01     0.01     0.93     0.01
           $\hat\theta^{ML}_{x_{2,0}}$      0.01     −0.02    0.01     0.01     0.92     7e−3     | 0.01     −0.02    0.01     0.01     0.94     0.02
           $\hat\theta^{ML}$                0.01     −0.02    0.01     0.01     0.92     0.01     | 0.01     −0.02    0.01     0.01     0.93     0.04
           $\hat\theta$                     0.01     −0.02    0.01     0.01     0.92     5e−3     | 0.01     −0.03    0.01     0.02     0.92     0.02

We consider the population model where each subject $i$ follows the ODE:
\[
\begin{cases}
\dot{X}_{1,i} = \phi_{2,i} X_{2,i} - \phi_{1,i} X_{1,i} \\
\dot{X}_{2,i} = -\phi_{2,i} X_{2,i} \\
\left(X_{1,i}(0), X_{2,i}(0)\right) = \left(x_{1,i,0}, x_{2,i,0}\right)
\end{cases} \qquad (12)
\]

with the following parametrization: $\log(\phi_{1,i}) = \theta_1 + b_i$ and $\log(\phi_{2,i}) = \theta_2$ where $b_i \sim N(0, \Psi)$. The true population parameter values are $\theta^* = (\theta_1^*, \theta_2^*) = (\log(0.5), \log(2))$ and $\Psi^* = 0.5^2$, and we are in a partially observed framework where only $X_{1,i}$ is accessible. The true initial conditions are distributed with $x^*_{1,i,0} \sim N(2, 0.5)$ and $x^*_{2,i,0} \sim N(3, 1)$. For the penalization term in our method, we choose the values $\lambda_l = 10^4, 10^6, 10^8$. An analytic solution exists for ODE (12). In particular, the first component is given by
\[
X_{1,i}(t) = e^{-\phi_{1,i} t}\left(x_{1,i,0} + \frac{x_{2,i,0}\,\phi_{2,i}}{\phi_{1,i} - \phi_{2,i}}\left(e^{(\phi_{1,i} - \phi_{2,i})t} - 1\right)\right)
\]
and will be used for estimation with the SAEMIX package. We generate $n_i = 11$ longitudinal observations per subject on $[0, T] = [0, 10]$ with measurement noise of standard deviation $\sigma = 0.03$. An example of sampled observations and corresponding solutions is plotted in Fig. 1.
We want to investigate the impact of initial conditions, especially the unobserved one $x^*_{2,i,0}$, on the ML estimator accuracy. Indeed, our method does not need to estimate $x^*_{2,i,0}$ and thus no additional difficulties appear in this partially observed framework. For ML, however, it is a nuisance subject-specific parameter that should be estimated and for which no observations are available. For this, we compute $\hat{\theta}^{ML}_{x_0}$, $\hat{\theta}^{ML}_{x_{2,0}}$ and $\hat{\theta}^{ML}$, the ML estimators obtained respectively when: 1. both initial conditions are perfectly known, 2. $x^*_{1,i,0}$ is replaced by the measured value, 3. in addition, $x^*_{2,i,0}$ has to be estimated.
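
As a sanity check of the closed-form expression of $X_{1,i}$ given above, the following Python sketch compares it with a numerical integration of ODE (12). It is an illustration only: the fixed-step fourth-order Runge-Kutta integrator and the function names are assumptions of this example, and the parameter values used are the population values $\phi_1 = 0.5$, $\phi_2 = 2$ (that is, $b_i = 0$) with initial conditions set to the means $(2, 3)$.

```python
import math

def x1_closed_form(t, phi1, phi2, x10, x20):
    """First component of the analytic solution of ODE (12); requires phi1 != phi2."""
    return math.exp(-phi1 * t) * (
        x10 + x20 * phi2 / (phi1 - phi2) * (math.exp((phi1 - phi2) * t) - 1.0)
    )

def rk4_linear_model(T, phi1, phi2, x10, x20, n_steps=2000):
    """Integrate ODE (12) on [0, T] with a fixed-step RK4 scheme; returns (X1(T), X2(T))."""
    def f(x1, x2):
        return phi2 * x2 - phi1 * x1, -phi2 * x2

    h = T / n_steps
    x1, x2 = x10, x20
    for _ in range(n_steps):
        k1 = f(x1, x2)
        k2 = f(x1 + 0.5 * h * k1[0], x2 + 0.5 * h * k1[1])
        k3 = f(x1 + 0.5 * h * k2[0], x2 + 0.5 * h * k2[1])
        k4 = f(x1 + h * k3[0], x2 + h * k3[1])
        x1 += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        x2 += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return x1, x2

# Compare both evaluations at the end of the observation interval, T = 10.
x1_num, _ = rk4_linear_model(10.0, 0.5, 2.0, 2.0, 3.0)
gap = abs(x1_num - x1_closed_form(10.0, 0.5, 2.0, 2.0, 3.0))
```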

3.1.1 Well‑specified case

We use the exact model described in Sect. 3.1 for the estimation procedure. Thus, we are in a completely well-specified setting, with all mechanisms modeled. We present the estimation results in Table 1 (left side). For ML, the results are good in terms of accuracy and consistent in terms of asymptotic confidence interval coverage rate when both initial conditions are known: 95% for $\theta_1$ and $\theta_2$, which is consistent with theoretical results. However, there is a marked drop in accuracy when $x^*_{2,i,0}$ has to be estimated. In particular, the coverage rate drops to 90% and 86% for $\theta_1$ and $\theta_2$ respectively. Interestingly, ML inaccuracy is driven by bias and under-estimated variance when initial conditions are not known (as shown by a greater $V_e$ than $\hat{V}$). In this case our method provides a relevant alternative: it gives accurate estimations with a good coverage rate for all parameters while avoiding the estimation of $x^*_{2,i,0}$. Variances are properly estimated compared to empirical variances. Estimation of individual random effects is also more accurate with our method, with an MSE for $b_i$ two times smaller than that of ML with unknown initial conditions.


3.1.2 Misspecified case in presence of model error at the subject level

To mimic the presence of misspecification, we now generate the observations from the hypoelliptic stochastic model:
\[
\begin{cases}
dX_{1,i} = \phi_{2,i} X_{2,i}\,dt - \phi_{1,i} X_{1,i}\,dt \\
dX_{2,i} = -\phi_{2,i} X_{2,i}\,dt + \alpha\,dB_t \\
\left(X_{1,i}(0), X_{2,i}(0)\right) = \left(x_{1,i,0}, x_{2,i,0}\right)
\end{cases} \qquad (13)
\]

with $B_t$ a Wiener process and $\alpha = 0.1$ the diffusion coefficient. For the sake of comparison, a solution of (12) and a realization of its perturbed counterpart given by (13) are plotted in Fig. 1. This framework, where stochasticity only affects the unmeasured compartment, is known to be problematic for parameter estimation, and inference procedures are yet to be developed for the sparse sampling case. From Fig. 1 it is easy to see that the diffusion coefficient $\alpha$ will be hard to estimate when we only have observations for $X_{1,i}$. Thus, we still estimate the parameters from the model (12), which is now seen as a deterministic approximation of the true stochastic process. Still, it is expected that our method will mitigate the effect of stochasticity on the estimation accuracy by taking into account model misspecification. Results are presented in Table 1 (right side). The differences between the two methods are similar to the previous well-specified case, with an additional loss of accuracy coming from model error for both estimators. However, the misspecification effect is more pronounced for ML than for our method, which manages to limit the damage. This illustrates the benefits of taking into account model uncertainty for estimation, in particular here when model error occurs in the unobserved compartment, a situation in which classic statistical criteria for model assessment based on a data fitting criterion are difficult to use.
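
For readers who wish to reproduce trajectories of the perturbed model (13), the following Python sketch simulates one path. It relies on the Euler-Maruyama scheme, which is an assumption of this example (the discretization used to generate the data is not specified here); the function name and its arguments are hypothetical. Note that the noise enters only the second, unobserved component.

```python
import math
import random

def simulate_hypoelliptic(phi1, phi2, x10, x20, alpha, T, n_steps, seed=0):
    """Euler-Maruyama path of model (13); returns a list of (t, X1, X2)."""
    rng = random.Random(seed)
    h = T / n_steps
    x1, x2 = x10, x20
    path = [(0.0, x1, x2)]
    for k in range(n_steps):
        dB = rng.gauss(0.0, math.sqrt(h))  # Brownian increment over one step
        # Both updates use the state at the start of the step.
        x1_new = x1 + (phi2 * x2 - phi1 * x1) * h      # no diffusion term
        x2_new = x2 - phi2 * x2 * h + alpha * dB       # diffusion on X2 only
        x1, x2 = x1_new, x2_new
        path.append(((k + 1) * h, x1, x2))
    return path

# One perturbed trajectory with the population parameter values and alpha = 0.1.
path = simulate_hypoelliptic(0.5, 2.0, 2.0, 3.0, alpha=0.1, T=10.0, n_steps=10000)
```

Setting `alpha=0.0` recovers an explicit Euler approximation of the deterministic model (12), which gives a simple way to visualize the gap between the two models.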
Finally, we acknowledge that the effect of other misspecification sources can be investigated. For example, the population, which is here assumed homogeneous, can in fact be a mixture of subjects with random effects distributed according to different laws. To account for this, we evaluate in section “Effect of outlier presence on estimation accuracy” in Appendix C the situation in which an increasing fraction of subjects are chosen as outliers with respect to the assumed random effect distribution. We then investigate its impact on estimation accuracy for ML and optimal control based methods.

3.2 Application 2–Partially observed nonlinear model

Table 2  Results of estimation for model (14). The different subscripts stand for the following estimation scenarios: 1. $S_I$ when $S_I$ is set to $S_I^*$, 2. absence of subscript when $S_I$ is estimated. Rows without the ML superscript correspond to our method

                                            Well-specified                                       | Misspecified
Parameter  Estimator                        MSE      Bias     V_e      \hat{V}  CR       MSE b_i  | MSE      Bias     V_e      \hat{V}  CR       MSE b_i
θ_{S_G}    $\hat\theta^{ML}_{S_I}$          5e−5     2e−3     4e−5     9e−6     0.95              | 6e−5     3e−3     6e−5     2e−5     0.85
           $\hat\theta^{ML}$                2e−3     0.03     1e−3     8e−5     0.85              | 2e−3     3e−3     1e−3     2e−4     0.54
           $\hat\theta_{S_I}$               1e−5     4e−4     1e−5     8e−6     0.95              | 2e−5     −2e−5    2e−5     2e−5     0.93
           $\hat\theta$                     2e−4     −6e−4    2e−4     2e−4     0.96              | 3e−4     −1e−3    3e−4     4e−4     0.93
θ_{S_I}    $\hat\theta^{ML}_{S_I}$          known                                                 | known
           $\hat\theta^{ML}$                2e−3     0.03     1e−3     6e−5     0.90              | 0.01     0.04     0.01     1e−3     0.55
           $\hat\theta_{S_I}$               known                                                 | known
           $\hat\theta$                     1e−4     −7e−4    1e−4     1e−4     0.96              | 3e−4     −1e−3    3e−4     3e−4     0.92
θ_m        $\hat\theta^{ML}_{S_I}$          7e−4     3e−3     6e−4     5e−4     0.94              | 8e−4     −3e−3    8e−4     5e−4     0.89
           $\hat\theta^{ML}$                9e−4     8e−3     8e−4     5e−4     0.86              | 5e−3     −5e−3    5e−3     5e−4     0.88
           $\hat\theta_{S_I}$               5e−4     6e−3     5e−4     5e−4     0.95              | 4e−4     7e−4     4e−4     5e−4     0.95
           $\hat\theta$                     6e−4     6e−3     5e−4     5e−4     0.95              | 4e−4     6e−4     4e−4     5e−4     0.96
Ψ          $\hat\theta^{ML}_{S_I}$          0.02     7e−4     0.02     0.02     0.95     0.02     | 0.03     −3e−3    0.03     0.02     0.93     0.03
           $\hat\theta^{ML}$                0.04     −0.09    0.03     0.02     0.88     0.02     | 0.03     −8e−3    0.02     0.02     0.87     0.03
           $\hat\theta_{S_I}$               0.01     −2e−3    0.01     0.01     0.95     0.01     | 0.01     −4e−3    0.01     0.02     0.94     0.01
           $\hat\theta$                     0.01     3e−3     0.01     0.01     0.94     0.01     | 0.02     −7e−3    0.02     0.02     0.94     0.02

We consider the model presented in De Gaetano and Arino (2000) for the analysis of glucose and insulin regulation:
\[
\begin{cases}
\dot{G}_i = S_G(G_B - G_i) - X_i G_i \\
\dot{I}_i = \gamma t (G_i - h) - m_i (I_i - I_B) \\
\dot{X}_i = -p_2 \left(X_i + S_I (I_i - I_B)\right).
\end{cases} \qquad (14)
\]

The ODE system (14) rules the behavior of circulating glucose $G_i$ and insulin $I_i$ in blood as well as insulin $X_i$ present in interstitial fluid. We are in a partially observed case where only $G_i$ and $I_i$ are measured. The values of parameters $(p_2, \gamma, h, G_B, I_B)$ are set to $(-4.93, -6.85, 4.14, 100, 100)$ and we aim to estimate $\theta = (\theta_{S_G}, \theta_{S_I}, \theta_m)$, linked to the original model via the parametrization: $\log(S_G) = \theta_{S_G}$, $\log(S_I) = \theta_{S_I}$ and $\log(m_i) = \theta_m + b_i$ where $b_i \sim N(0, \Psi)$. The true population parameter values are $\theta^* = (-3.89, -7.09, -1.81)$ and $\Psi^* = 0.26^2$. The true subject-specific initial conditions $x^*_{i,0} = (G^*_{i,0}, I^*_{i,0}, X^*_{i,0})$ are distributed according to $\ln(x^*_{i,0}) \sim N(l_{x_0^*}, \Psi_{l_{x_0^*}})$ with $l_{x_0^*} = (5.52, 4.88, -7)$ and $\Psi_{l_{x_0^*}} = (0.17^2, 0.1^2, 10^{-4})$. For the penalization term in our method, we choose the values $\lambda_l = 10^6, 10^7, 10^8$. We generate $n_i = 5$ observations on $[0, T] = [0, 180]$ with Gaussian measurement noise of standard deviation $\sigma^* = 3$. As in the previous example, we investigate the impact of unknown initial conditions on the accuracy of the estimators. We are particularly interested in the joint estimation of $\theta_{S_I}$, which appears only in the equation ruling the unobserved state variable $X_i$, and of $x^*_{i,0}$, required for each subject by ML. For this, we distinguish two cases, 1. when $\theta_{S_I}$ is known, 2. when $\theta_{S_I}$ has to be estimated, and we denote respectively by $\hat{\theta}_{S_I}$ and $\hat{\theta}$ the corresponding estimators. Finally, since the model is nonlinear, we have to specify a pseudo-linear representation of the vector field as in (4):
\[
A_{\theta,b_i}(t, G_i, I_i, X_i) = \begin{pmatrix} -S_G & 0 & -G_i \\ \gamma t & -m_i & 0 \\ 0 & -p_2 S_I & -p_2 \end{pmatrix}, \qquad r_{\theta,b_i}(t) = \begin{pmatrix} S_G G_B \\ -\gamma t h + m_i I_B \\ p_2 S_I I_B \end{pmatrix}.
\]
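
The pseudo-linear representation can be checked term by term: for any state $x = (G_i, I_i, X_i)$, the product $A_{\theta,b_i}(t, x)\,x + r_{\theta,b_i}(t)$ must coincide with the right-hand side of (14). The Python sketch below performs this check numerically. The parameter values in `params` are arbitrary illustrative numbers chosen for this example, not the values used in the simulations, and the function names are hypothetical.

```python
import math

def vector_field(t, x, p):
    """Right-hand side of ODE (14) at time t and state x = (G, I, X)."""
    G, I, X = x
    return [
        p["SG"] * (p["GB"] - G) - X * G,
        p["gamma"] * t * (G - p["h"]) - p["m"] * (I - p["IB"]),
        -p["p2"] * (X + p["SI"] * (I - p["IB"])),
    ]

def pseudo_linear(t, x, p):
    """State-dependent matrix A(t, x) and vector r(t) of the pseudo-linear form."""
    G, _, _ = x
    A = [
        [-p["SG"], 0.0, -G],
        [p["gamma"] * t, -p["m"], 0.0],
        [0.0, -p["p2"] * p["SI"], -p["p2"]],
    ]
    r = [
        p["SG"] * p["GB"],
        -p["gamma"] * t * p["h"] + p["m"] * p["IB"],
        p["p2"] * p["SI"] * p["IB"],
    ]
    return A, r

def affine_map(A, r, x):
    """Compute A x + r with plain lists."""
    return [sum(a * xj for a, xj in zip(row, x)) + ri for row, ri in zip(A, r)]

# Illustrative (made-up) parameter values and state.
params = {"SG": 0.02, "SI": 8e-4, "m": 0.16, "p2": 0.007,
          "gamma": 0.001, "h": 60.0, "GB": 100.0, "IB": 100.0}
state, time = [250.0, 130.0, 1e-3], 30.0
A, r = pseudo_linear(time, state, params)
same = all(math.isclose(u, v, rel_tol=1e-9, abs_tol=1e-12)
           for u, v in zip(affine_map(A, r, state), vector_field(time, state, params)))
```

The decomposition is not unique: here the bilinear term $-X_i G_i$ is carried by the entry $-G_i$ in the third column of the first row, but it could equally be placed as $-X_i$ in the first column.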

3.2.1 Well‑specified case

We present the estimation results in Table 2 (left side). Our method has small bias and achieves good coverage in all cases. We obtain a smaller MSE than ML and avoid the drop in coverage rate of the confidence interval when $\theta^*_{S_I}$ has to be estimated, which is often needed in practice. The difference in behavior between the two estimators is explained by the fact that they are defined through two different optimization problems. At the population level, our approach minimizes a cost function depending on a 4-dimensional parameter whereas ML, due to its need to estimate $x^*_{i,0}$, considers a 10-dimensional one. Thus, the parameter spaces explored by each method to look for the minimum are very different.

Table 3  Biological interpretation and parameter values

Parameters                   Biological interpretation                                                  Values
$\delta_L$                   Long-lived B-cells declining rate                                          log(2)/(364 × 6)
θ*   $\theta^*_{\delta_S}$    Mean log-value for $\delta_S$, the short-lived cells declining rate          log(log(2)/1.2) ≃ −0.54
     $\theta^*_{\phi_S}$      Mean log-value for $\phi_S$, the antibodies influx from short-lived cells    log(2755) ≃ 7.92
     $\theta^*_{\phi_L}$      Mean log-value for $\phi_L$, the antibodies influx from long-lived cells     log(16) ≃ 2.78
     $\theta^*_{\delta_{Ab}}$ Mean log-value for $\delta_{Ab}$, the antibodies declining rate              log(log(2)/24) ≃ −3.54
Ψ*   $\Psi^*_{\phi_S}$        Inter individual variance for $\log(\phi_{S,i})$                             0.92²
     $\Psi^*_{\phi_L}$        Inter individual variance for $\log(\phi_{L,i})$                             0.85²
     $\Psi^*_{\delta_{Ab}}$   Inter individual variance for $\log(\delta_{Ab,i})$                          0.3²


3.2.2 Misspecified case in presence of model error at the subject level

To mimic the presence of misspecification, we generate the observations from the stochastic model:
\[
\begin{cases}
dG_i = \left[S_G(G_B - G_i) - X_i G_i\right] dt + \alpha_1\,dB_{1,t} \\
dI_i = \left[\gamma t (G_i - h) - m_i (I_i - I_B)\right] dt + \alpha_2\,dB_{2,t} \\
dX_i = -p_2 \left(X_i + S_I (I_i - I_B)\right) dt + \alpha_3\,dB_{3,t}
\end{cases} \qquad (15)
\]
where the $B_{i,t}$ are Wiener processes and $(\alpha_1, \alpha_2, \alpha_3) = (2, 2, 2 \times 10^{-4})$ their diffusion coefficients. We present the estimation results in Table 2 (right side). For ML, the drop in coverage rate for $\theta^*_{S_G}$ and $\theta^*_{S_I}$ is even more striking when $\theta^*_{S_I}$ needs to be estimated. This is explained by the effect of model misspecification, which increases bias, and by the fact that ML does not take into account this new source of uncertainty, which leads to under-estimation of variance and too narrow confidence intervals. Our method achieves small bias, nominal coverage and small MSE for random effects.

3.3 Application 3—antibody concentration evolution model

We consider the model presented in Pasin et al. (2019) to analyze the antibody concentration, denoted $A_i$, generated by two populations of antibody secreting cells: the short-lived, denoted $S_i$, and the long-lived, denoted $L_i$:
\[
\begin{cases}
\dot{S}_i = -\delta_S S_i \\
\dot{L}_i = -\delta_L L_i \\
\dot{A}_i = \vartheta_{S,i} S_i + \vartheta_{L,i} L_i - \delta_{Ab,i} A_i \\
\left(S_i(0), L_i(0), A_i(0)\right) = \left(S_{i,0}, L_{i,0}, A_{i,0}\right).
\end{cases} \qquad (16)
\]
This model is used to quantify the humoral response in different populations after an Ebola vaccine injection with a two-dose regimen, from seven days after the second injection, when the antibody secreting cells enter a decreasing phase. These cells being unobserved, the preceding equation can be simplified to focus on antibody concentration evolution:
\[
\dot{A}_i = \phi_{S,i} e^{-\delta_S t} + \phi_{L,i} e^{-\delta_L t} - \delta_{Ab,i} A_i \qquad (17)
\]
with $\phi_{S,i} := \vartheta_{S,i} S_{i,0}$ and $\phi_{L,i} := \vartheta_{L,i} L_{i,0}$. This equation has an analytic solution which will be used for ML. We consider the following parametrization: $\log(\delta_S) = \theta_{\delta_S}$, $\log(\phi_{S,i}) = \theta_{\phi_S} + b_{\phi_S,i}$, $\log(\phi_{L,i}) = \theta_{\phi_L} + b_{\phi_L,i}$ and $\log(\delta_{Ab,i}) = \theta_{\delta_{Ab}} + b_{\delta_{Ab},i}$. The true parameter values are presented in Table 3. For the penalization term in our method, we choose the values $\lambda_l = 10^3, 10^5, 10^7, 10^8$.
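
For reference, the analytic solution of (17) mentioned above can be written explicitly. The expression below is a sketch obtained here by variation of constants, under the assumption that $\delta_{Ab,i} \neq \delta_S$ and $\delta_{Ab,i} \neq \delta_L$; it is not quoted from Pasin et al. (2019).

```latex
A_i(t) = A_{i,0}\, e^{-\delta_{Ab,i} t}
  + \frac{\phi_{S,i}}{\delta_{Ab,i} - \delta_S}\left(e^{-\delta_S t} - e^{-\delta_{Ab,i} t}\right)
  + \frac{\phi_{L,i}}{\delta_{Ab,i} - \delta_L}\left(e^{-\delta_L t} - e^{-\delta_{Ab,i} t}\right)
```

Differentiating each term shows that $\dot{A}_i + \delta_{Ab,i} A_i = \phi_{S,i} e^{-\delta_S t} + \phi_{L,i} e^{-\delta_L t}$, and evaluating at $t = 0$ gives $A_i(0) = A_{i,0}$, as required by (17).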
According to Pasin et al. (2019), $\delta_L$ was non-identifiable based on the available data and only a lower bound has been derived for it via profiled likelihood. So, to make fair comparisons between our approach and maximum likelihood, we do not estimate it. Regarding population parameters, we are particularly interested in the behavior of estimation methods for $\theta_{\delta_S}$ and $\theta_{\phi_S}$. Indeed, a parameter sensitivity analysis shows the symmetric role of $\theta_{\delta_S}$ and $\theta_{\phi_S}$ on the ODE solution (see Balelli et al. (2020)). Thus, they are likely to face practical identifiability problems. To investigate this effect, we estimate the parameters when $\theta^*_{\delta_S}$ 1. is known (the corresponding estimators will be denoted with the subscript $\delta_S$), or 2. has to be estimated as well.

Table 4  Results of estimation for model (17). The different subscripts stand for the following estimation scenarios: 1. $\delta_S$ when $\theta_{\delta_S}$ is set to $\theta^*_{\delta_S}$, 2. absence of subscript when $\theta_{\delta_S}$ is estimated. Rows without the ML superscript correspond to our method

                                              Well-specified                                       | Misspecified
Parameter    Estimator                        MSE      Bias     V_e      \hat{V}  CR       MSE b_i  | MSE      Bias     V_e      \hat{V}  CR       MSE b_i
θ_{δ_S}      $\hat\theta^{ML}_{\delta_S}$     known                                                 | known
             $\hat\theta^{ML}$                2.13     0.78     1.51     70.64    0.92              | 3.88     1.48     1.68     4.10     0.80
             $\hat\theta_{\delta_S}$          known                                                 | known
             $\hat\theta$                     0.62     −0.34    0.50     0.66     0.92              | 0.93     −0.40    0.77     0.62     0.90
θ_{φ_S}      $\hat\theta^{ML}_{\delta_S}$     4e−4     0.01     3e−4     3e−4     0.94              | 1e−3     0.02     1e−3     5e−4     0.91
             $\hat\theta^{ML}$                0.01     −0.05    7e−3     0.40     0.92              | 0.02     −0.10    0.01     0.02     0.88
             $\hat\theta_{\delta_S}$          2e−3     −0.05    2e−4     1e−3     0.94              | 7e−4     −0.02    3e−4     1e−3     0.92
             $\hat\theta$                     2e−3     1e−3     2e−3     2e−3     0.93              | 4e−3     −6e−3    3e−3     0.01     0.90
θ_{φ_L}      $\hat\theta^{ML}_{\delta_S}$     3e−3     0.02     3e−3     2e−3     0.95              | 5e−3     0.03     4e−3     3e−3     0.93
             $\hat\theta^{ML}$                4e−3     0.03     4e−3     3e−3     0.90              | 9e−3     0.05     7e−3     4e−3     0.90
             $\hat\theta_{\delta_S}$          7e−4     −0.01    5e−4     3e−3     0.95              | 2e−3     −0.02    3e−3     2e−3     0.97
             $\hat\theta$                     3e−3     −3e−3    3e−3     2e−3     0.91              | 6e−3     −8e−3    6e−3     7e−3     0.90
θ_{δ_Ab}     $\hat\theta^{ML}_{\delta_S}$     7e−4     −0.02    5e−4     3e−4     0.93              | 2e−3     −0.03    1e−3     1e−3     0.92
             $\hat\theta^{ML}$                2e−3     −0.02    1e−3     4e−4     0.88              | 4e−3     −0.04    3e−3     7e−4     0.88
             $\hat\theta_{\delta_S}$          2e−4     0.01     1e−4     3e−4     0.95              | 3e−4     2e−3     3e−4     3e−4     0.96
             $\hat\theta$                     4e−4     0.01     3e−4     2e−4     0.90              | 3e−4     8e−3     3e−4     2e−3     0.89
Ψ_{φ_S}      $\hat\theta^{ML}_{\delta_S}$     0.04     −1e−3    0.04     0.07     1        0.15     | 0.05     0.03     0.05     0.08     1        0.17
             $\hat\theta^{ML}$                0.11     0.01     0.11     0.05     1        0.17     | 0.13     0.01     0.13     0.25     1        0.21
             $\hat\theta_{\delta_S}$          0.02     8e−3     0.02     0.01     0.94     0.06     | 0.02     2e−3     0.02     0.02     0.94     0.11
             $\hat\theta$                     0.02     −0.03    0.02     0.02     0.94     0.07     | 0.02     −0.05    0.02     0.03     0.92     0.08
Ψ_{φ_L}      $\hat\theta^{ML}_{\delta_S}$     0.03     0.04     0.02     0.04     1        0.30     | 0.05     0.03     0.05     0.06     1        0.73
             $\hat\theta^{ML}$                0.03     0.05     0.02     0.04     1        0.60     | 0.03     0.05     0.02     0.07     1        0.74
             $\hat\theta_{\delta_S}$          0.02     −0.1     5e−3     8e−3     0.93     0.07     | 0.02     −0.10    0.01     0.02     0.91     0.10
             $\hat\theta$                     0.03     −0.06    0.02     0.01     0.92     0.08     | 0.03     −0.06    0.02     0.03     0.87     0.12
Ψ_{δ_Ab}     $\hat\theta^{ML}_{\delta_S}$     0.11     0.18     0.08     0.02     1        0.10     | 0.33     0.41     0.17     0.05     1        0.56
             $\hat\theta^{ML}$                0.20     0.29     0.11     0.02     1        0.50     | 0.30     0.34     0.19     0.05     1        0.69
             $\hat\theta_{\delta_S}$          0.10     −0.30    0.01     0.01     0.95     0.03     | 0.10     −0.16    0.08     0.06     0.91     0.04
             $\hat\theta$                     0.11     −0.27    0.04     0.04     0.95     0.04     | 0.15     −0.29    0.06     0.10     0.88     0.06

3.3.1 Well‑specified case

We generate $n_i = 11$ longitudinal observations on the interval $[0, T] = [0, 364]$ with measurement noise of standard deviation $\sigma^* = 100$. For each subject $i$, the initial condition has been generated according to $A^*_{i,0} \sim N(A_0, \sigma^2_{A_0})$ with $A_0 = 500$ and $\sigma_{A_0} = 260$ to reflect the dispersion observed in the data presented in Pasin et al. (2019). We present the estimation results in Table 4 (left side).
Our method gives an improved estimation with a dramatically reduced variance for $\theta^*_{\delta_S}$ compared to ML, as well as an improved estimate for the $\{b_i^*\}_{i=1,\dots,n}$ in all cases. We assume that this is due to the committed estimation error for $\theta^*$, which causes model error during the estimation of $\{b_i^*\}_{i=1,\dots,n}$ and is not accounted for by ML. This in turn explains why the variance $\Psi^*$ is better estimated with our approach. In this mixed-effect context, this cause of model error is systematically present and calls for the use of estimation methods taking it into account when subject-specific parameters are critical for the practitioner.

3.3.2 Misspecified case in presence of model error at the subject level

The data are generated with a stochastically perturbed version of ODE (17):
\[
dA_i = \left(\phi_{S,i} e^{-\delta_S t} + \phi_{L,i} e^{-\delta_L t} - \delta_{Ab,i} A_i\right) dt + \alpha\,dB_t \qquad (18)
\]
where $B_t$ is a Wiener process and $\alpha = 10$ its diffusion coefficient. The value for $\alpha$ has been chosen big enough to produce significantly perturbed trajectories but small enough to ensure that ODE (17) is still a relevant approximation for estimation purposes. The results are presented in Table 4 (right side). Our method outperforms ML for the estimation of $\theta_{\delta_S}$ as well as of $\{b_i^*\}_{i \in [1, n]}$ and their variances. However, we acknowledge that this last simulation setting is challenging even for our approach, with coverage rates around 90% for most parameters, below the theoretical rate of 95%.

Table 5  Estimation presented in Pasin et al. (2019) and via our approach

                 Estimations from Pasin et al. (2019)     Optimal control approach
Parameter        Mean      IC95%                          Mean      IC95%
θ_{δ_S}          −0.57     [−1.02, −0.02]                 −0.18     [−0.58, 0.22]
θ_{φ_S}          7.92      [7.52, 8.30]                   7.45      [6.85, 7.96]
θ_{φ_L}          2.78      [2.62, 3.01]                   2.58      [2.15, 3.01]
θ_{δ_Ab}         −3.54     [−3.62, −3.45]                 −3.48     [−3.95, −3.01]
Ψ_{φ_S}          0.92      [0.83, 1.01]                   0.64      [0.60, 0.70]
Ψ_{φ_L}          0.85      [0.78, 0.92]                   0.70      [0.55, 0.90]
Ψ_{δ_Ab}         0.30      [0.24, 0.36]                   0.25      [0.19, 0.31]

Fig. 2  Mean trajectory for Pasin et al. (2019) estimation (dashed line) and the optimal control approach estimation (solid line). Shaded areas are the 95% confidence intervals

4 Real data analysis

Fig. 3  Examples of fitted trajectories for both methods for four different random subjects. Dashed lines: fitted ODE solutions from Pasin et al. (2019). Solid lines: optimal trajectories $X_{\hat{\theta}, \hat{b}_i}$ obtained with the optimal control approach. Shaded areas are the 95% confidence intervals

Fig. 4  (1) Top: estimated residual controls for each subject, (2) bottom: mean optimal control and 95% confidence interval for the optimal controls; a) left: $u_{i,\hat{\theta}^P,\hat{b}_i^P,y_{i,0}}$ obtained from parameter estimation in Pasin et al. (2019), b) right: $u_{i,\hat{\theta},\hat{b}_i(\hat{\theta})}$ obtained from our estimation

We use the presented estimation approach to address the same problem as Pasin et al. (2019). This real data example is similar to the synthetic scenario of Sect. 3.3. In brief, we use data from a phase I trial in East Africa evaluating the effect of a heterologous anti-Ebola vaccine strategy in which Ad26.ZEBOV was injected first and then MVA-BN-Filo, with a delay of 28 days between the two doses. We consider a population of $n = 28$ individuals, with on average 5 measurements per subject. In order to ensure a fair comparison, we adopt a Bayesian framework for $\theta = (\theta_{\delta_S}, \theta_{\phi_S}, \theta_{\phi_L}, \theta_{\delta_{Ab}})$ and use the same prior distribution as in the original paper:
\[
\pi(\theta) \sim N\left(\begin{pmatrix} -1 \\ 0 \\ 0 \\ -4.1 \end{pmatrix}, \begin{pmatrix} 25 & 0 & 0 & 0 \\ 0 & 100 & 0 & 0 \\ 0 & 0 & 100 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\right).
\]

We set our mesh size to get 200 discretization points for each subject and we use $U = 10$, i.e., a value lower than in the simulated data case because of the presence of model error. We also log-transform the data to stabilize the measurement noise variance. Using the transformation $\tilde{A}_i(t) := \log_{10} A_i(t)$ in Equation (17) leads us to the following nonlinear model:
\[
\dot{\tilde{A}}_i(t) = \frac{1}{\ln(10)}\left(\phi_{S,i} e^{-\delta_S t} + \phi_{L,i} e^{-\delta_L t}\right) 10^{-\tilde{A}_i(t)} - \frac{\delta_{Ab,i}}{\ln(10)}. \qquad (19)
\]
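
The passage from (17) to (19) follows from the chain rule. The short derivation below is a sketch added for convenience; it only uses $\tilde{A}_i(t) = \ln A_i(t) / \ln(10)$ and assumes $A_i(t) > 0$.

```latex
\dot{\tilde{A}}_i(t)
  = \frac{\dot{A}_i(t)}{\ln(10)\, A_i(t)}
  = \frac{\phi_{S,i} e^{-\delta_S t} + \phi_{L,i} e^{-\delta_L t} - \delta_{Ab,i} A_i(t)}{\ln(10)\, A_i(t)}
  = \frac{1}{\ln(10)}\left(\phi_{S,i} e^{-\delta_S t} + \phi_{L,i} e^{-\delta_L t}\right) 10^{-\tilde{A}_i(t)}
    - \frac{\delta_{Ab,i}}{\ln(10)},
\qquad \text{since } A_i(t)^{-1} = 10^{-\tilde{A}_i(t)}.
```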
We choose $A_{\theta,b_i}(t, x) = \frac{1}{\ln(10)}\left(\phi_{S,i} e^{-\delta_S t} + \phi_{L,i} e^{-\delta_L t}\right) \frac{10^{-x}}{x}$ and $r_{\theta,b_i}(t) = -\frac{\delta_{Ab,i}}{\ln(10)}$ for the pseudo-linear formulation of the model. In Table 5, we compare our estimations with those presented in Pasin et al. (2019), obtained using the NIMROD software (Prague et al. 2013). Both methods produce estimations with overlapping confidence intervals for $\theta$, supporting the previously published results in terms of antibody concentration dynamics over time. Still, significant differences appear for the estimation of $(\Psi_{\phi_S}, \Psi_{\phi_L}, \Psi_{\delta_{Ab}})$, with lower dispersion of random effects in the optimal control approach. This is explained by the fact that a part of the variability is now carried by the subject-specific perturbations $\{u_{i,\hat{\theta},\hat{b}_i(\hat{\theta})}\}_{i=1,\dots,n}$. In Fig. 2, we plot the mean curve for both estimation methods, that is, the solution of ODE (19) with the $\theta$ value given by Table 5 and random effects set to 0. The mean evolutions are comparable between the two approaches. This is confirmed at the individual level in Fig. 3. Finally, our method can be used to assess the model adequacy via the analysis of the temporal evolution of $\{u_{i,\hat{\theta},\hat{b}_i(\hat{\theta})}\}_{i=1,\dots,n}$, estimated as byproducts of our method. In Sect. 2.1, we have also indicated that perturbations $u_{i,\theta,b_i,x_{i,0}}$ can be computed for an arbitrary set $(\theta, b_i, x_{i,0})$. In particular, we estimate $\{u_{i,\hat{\theta}^P,\hat{b}_i^P,y_{i,0}}\}_{i=1,\dots,n}$, the committed error corresponding to $(\hat{\theta}^P, \hat{b}_i^P)$, the population and subject-specific estimators obtained in Pasin et al. (2019). In Fig. 4, we plot both perturbation sets. Our method leads to residual perturbations of smaller magnitude and narrower confidence intervals. This means that our approach produces an estimation which minimizes the committed model error for each subject, compared to a method based only on a data fitting criterion as in Pasin et al. (2019). Moreover, thanks to the reduced size of the confidence interval for estimated perturbations, we find a mean perturbation among the population which is statistically different from zero at the beginning of the observation interval. This may indicate the presence of model misspecification.


5 Conclusion

In this paper, we propose an estimation method that addresses problems encountered
by classical approaches in NLME-ODE models. We identify the following
shortcomings of exact methods such as likelihood-based inference: their difficulty
in handling model misspecification, their need to estimate initial conditions
as regular random effects, and their dramatic performance degradation in the
presence of poorly identifiable parameters. We propose here a method based on an
approximation of the profiled likelihood and control theory that accounts for
potential model uncertainty at the subject level and that can be easily
profiled on initial conditions. Simulations with and without model error illustrate
the advantages of regularization techniques for estimating poorly identifiable
parameters, subject-specific parameters, and their variances in NLME-ODEs. In
addition, bypassing the estimation of initial conditions represents a clear advantage
for partially observed systems compared with likelihood-based approaches, as
emphasized in the simulations.
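
To make the idea of a subject-level control problem concrete, here is a minimal Python sketch on a toy scalar model. It assumes an Euler-discretized linear ODE x_{k+1} = x_k + h (a x_k + u_k), a cost equal to the squared data misfit plus lam * h * sum of u_k^2, and a free initial state x_0 that is optimized together with the perturbation u. All of these choices (the model, the cost, the names `fit_with_perturbation`, `a`, `h`, `lam`, and the least-squares solver) are assumptions made for illustration; this is not the numerical procedure used in the paper.

```python
import numpy as np


def fit_with_perturbation(y, a, h, lam):
    """Minimise sum_k (y_k - x_k)^2 + lam * h * sum_k u_k^2
    subject to x_{k+1} = x_k + h * (a * x_k + u_k), with x_0 free.

    Returns the optimal initial state, the perturbation sequence u,
    and the fitted trajectory x.
    """
    y = np.asarray(y, dtype=float)
    m = y.size
    phi = 1.0 + h * a  # one-step transition factor of the Euler scheme

    # The state is linear in z = (x_0, u_0, ..., u_{m-2}): x = G @ z.
    G = np.zeros((m, m))
    for k in range(m):
        G[k, 0] = phi ** k
        for j in range(k):
            G[k, j + 1] = h * phi ** (k - 1 - j)

    # Penalty rows: sqrt(lam * h) * u, so that their squared norm
    # equals lam * h * sum_k u_k^2. The initial state is not penalised.
    P = np.zeros((m - 1, m))
    P[:, 1:] = np.sqrt(lam * h) * np.eye(m - 1)

    # Stack data-fit and penalty rows into one least-squares problem.
    A = np.vstack([G, P])
    b = np.concatenate([y, np.zeros(m - 1)])
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    return z[0], z[1:], G @ z


# Noise-free data generated with a_true = -1 and x_0 = 2.
h, m = 0.1, 20
y = 2.0 * (1.0 + h * (-1.0)) ** np.arange(m)

# Correct model: the perturbation is not needed.
x0_ok, u_ok, _ = fit_with_perturbation(y, a=-1.0, h=h, lam=0.1)
# Misspecified model (a = -0.5): the perturbation absorbs the gap.
x0_mis, u_mis, _ = fit_with_perturbation(y, a=-0.5, h=h, lam=0.1)

print("max |u|, correct model:     ", np.max(np.abs(u_ok)))
print("max |u|, misspecified model:", np.max(np.abs(u_mis)))
```

In this toy setting the problem is linear-quadratic, so a single least-squares solve suffices. The weight `lam` plays the role of the penalization on the perturbation: larger values force the trajectory closer to the assumed model, smaller values let the perturbation absorb more of the discrepancy.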
Still, this benefit in terms of estimation accuracy comes at a computational
price. On a server (see https://plafrim-users.gitlabpages.inria.fr/doc/ for more
server details) with the parallelization package snow in R, and for a
given choice of penalization matrix U, it takes approximately 10–15 min to obtain
an estimate for the two-dimensional linear model, 30 min for the insulin model
and 3–4 h for the antibody concentration evolution model, whereas it was a matter
of minutes for the other approaches. Nevertheless, the use of compiled languages
and proper parallelization could reduce the computation time. Moreover, we have
deliberately separated the formal definition of the optimal control problem required by
our method from the numerical procedure used to solve it, in case better-suited
approaches exist for this specific control problem. Our current strategy
allows us to profile on initial conditions (despite requiring continuous observations
and thus excluding applications to count or binary data); therefore, looking for
another numerical procedure is beyond the scope of this paper.
Finally, the qualitative assessment of model misspecification presented in Sect. 4
can be made more rigorous. In a one-subject setting, the estimation of a perturbation
term at the derivative level via non-parametric procedures, to test for the presence
of model error, has already been explored (Hooker et al. 2015; Engelhardt et al. 2017).
Compared with statistical methods based solely on data-fitting criteria, these procedures
generally produce more sensitive statistical tests and can explore the presence of
misspecification even for unobserved state variables. Our control-based approach can
extend such tests to a population framework, while avoiding the issues due to the
hyperparameter selection required by non-parametric statistical methods, which can
appear for a growing number of subjects. For example, to stay in a Bayesian setting, we can
specify a prior distribution for the controls and then compare it with the
posterior obtained once the inference is made. This would lead to a semi-parametric inference
problem for which an optimal control based approach has already proven
useful (see Clairon 2020). This is a subject for further work.


6 Supplementary information

A supplementary file containing the appendixes and proof of Theorem 1 is available
alongside this article.
Supplementary Information The online version contains supplementary material available at
https://doi.org/10.1007/s00180-023-01420-x.

Acknowledgements Experiments presented in this paper were carried out using the PlaFRIM experimental
testbed, supported by Inria, CNRS (LABRI and IMB), Université de Bordeaux, Bordeaux INP and
Conseil Régional d’Aquitaine (see https://www.plafrim.fr/). This manuscript was developed under WP4
of EBOVAC3.

Funding This work has received funding from the Innovative Medicines Initiative 2 Joint Undertaking
under projects EBOVAC1 and EBOVAC3 (respectively grant agreement No 115854 and No 800176).
The IMI2 Joint Undertaking receives support from the European Union’s Horizon 2020 research and
innovation programme and the European Federation of Pharmaceutical Industries and Associations.

Availability of data and materials Not applicable

Code availability Our estimation method is implemented in R, and code reproducing the examples of
Sect. 3 is available in a GitHub repository
(https://github.com/QuentinClairon/NLME_ODE_estimation_via_optimal_control.git).

Declarations
Conflict of interest The authors have no conflicts of interest to declare that are relevant to the content of
this article.

Ethics approval Not applicable.

Consent to participate Not applicable.

Consent for publication Not applicable.

References
Aliyu M (2011) Nonlinear H-Infinity Control, Hamiltonian Systems and Hamilton-Jacobi Equations,
CRC Press
Balelli I, Pasin C, Prague M et al (2020) A model for establishment, maintenance and reactivation of the
immune response after vaccination against Ebola virus. J Theor Biol 495:110254
Brynjarsdottir J, O’Hagan A (2014) Learning about physical parameters: the importance of model dis-
crepancy. Inverse Prob 30:24
Campbell D (2007) Bayesian collocation tempering and generalized profiling for estimation of param-
eters from differential equation models. PhD thesis, McGill University Montreal, Quebec
Cimen T (2008) State-dependent Riccati equation (SDRE) control: a survey. IFAC Proc 41:3761–3775
Cimen T, Banks S (2004) Global optimal feedback control for general nonlinear systems with nonquad-
ratic performance criteria. Syst Control Lett 53:327–346
Clairon Q (2020) A regularization method for the parameter estimation problem in ordinary differential
equations via discrete optimal control theory. J Stat Plan Inference 210:1–9
Clarke F (2013) Functional analysis, calculus of variations and optimal control, Graduate Texts in Math-
ematics, Springer, London


Comets E, Lavenu A, Lavielle M (2017) Parameter estimation in nonlinear mixed effect models using
saemix, an R implementation of the SAEM algorithm. J Stat Softw 80:1–42
De Gaetano A, Arino O (2000) Mathematical modelling of the intravenous glucose tolerance test. J Math
Biol 40(2):136–168
Donnet S, Samson A (2006) Estimation of parameters in incomplete data models defined by dynamical
systems. J Stat Plan Inference 137(9):2815–2831
Engelhardt B, Kschischo M, Fröhlich H (2017) A Bayesian approach to estimating hidden variables as
well as missing and wrong molecular interactions in ordinary differential equation-based mathemat-
ical models. J R Soc Interface 14(131):20170332
Engl H, Flamm C, Kügler P et al (2009) Inverse problems in systems biology. Inverse Prob 25(12):123014
Hooker G, Ellner SP, Roditi LD, Earn DJ (2011) Parameterizing state-space models for infectious
disease dynamics by generalized profiling: measles in Ontario. J R Soc 8:961–974
Guedj J, Thiebaut R, Commenges D (2007) Maximum likelihood estimation in dynamical models of
HIV. Biometrics 63:1198–206
Gutenkunst RN, Waterfall J, Casey F et al (2007) Universally sloppy parameter sensitivities in systems
biology models. Public Libr Sci Comput Biol 3:e189
Hooker G, Ellner SP et al (2015) Goodness of fit in nonlinear dynamics: Misspecified rates or misspeci-
fied states? Ann Appl Stat 9(2):754–776
Huang Y, Dagne G (2011) A Bayesian approach to joint mixed-effects models with a skew normal distri-
bution and measurement errors in covariates. Biometrics 67:260–269
Huang Y, Lu T (2008) Modeling long-term longitudinal HIV dynamics with application to an AIDS
clinical study. Ann Appl Stat 2:1348–1408
Kampen NV (1992) Stochastic process in physics and chemistry. Elsevier
Kennedy MC, O’Hagan A (2001) Bayesian calibration of computer models. J R Stat Soc Ser B (Stat
Methodol) 63(3):425–464
Kurtz T (1978) Strong approximation theorems for density dependent Markov chains. Stoch Process Appl
6:223–240
Lavielle M, Aarons L (2015) What do we mean by identifiability in mixed effects models? J Pharmacoki-
net Pharmacodyn 43:111–122
Lavielle M, Mentré F (2007) Estimation of population pharmacokinetic parameters of saquinavir in HIV
patients with the monolix software. J Pharmacokinet Pharmacodyn 34:229–249
O’Leary T, Sutton A, Marder E (2015) Computational models in the age of large datasets. Curr Opin
Neurobiol 32:87–94
Lunn D, Thomas A, Best N et al (2000) WinBUGS: a Bayesian modelling framework: concepts, structure
and extensibility. Stat Comput 10:325–337
Lavielle M, Samson A, Karina Fermin A, Mentre F (2011) Maximum likelihood estimation of long terms
HIV dynamic models and antiviral response. Biometrics 67:250–259
Murphy S, van der Vaart A (2000) On profile likelihood. J Am Stat Assoc 95:449–465
Nash JC (2016) Using and extending the optimr package
Pasin C, Balelli I, Van Effelterre T et al (2019) Dynamics of the humoral immune response to a
prime-boost Ebola vaccine: quantification and sources of variation. J Virol 93(18):e00579-19
Perelson A, Neumann A, Markowitz M et al (1996) HIV-1 dynamics in vivo: virion clearance rate,
infected cell life-span, and viral generation time. Science 271:1582–1586
Pinheiro J, Bates DM (1994) Approximations to the loglikelihood function in the nonlinear mixed effects
model. J Comput Graph Stat 4:12–35
Prague M, Commenges D, Drylewicz J et al (2012) Treatment monitoring of HIV-infected patients based
on mechanistic models. Biometrics 68(3):902–911
Prague M, Commenges D, Guedj J et al (2013) NIMROD: a program for inference via a normal
approximation of the posterior in models with random effects based on ordinary differential equations.
Comput Methods Programs Biomed 111:447–458
Raftery A, Bao L (2010) Estimating and projecting trends in HIV/AIDS generalized epidemics using
incremental mixture importance sampling. Biometrics 66:1162–1173
Ramsay J, Hooker G, Cao J et al (2007) Parameter estimation for differential equations: a generalized
smoothing approach. J R Stat Soc 69:741–796
Sartori N (2003) Modified profile likelihood in models with stratum nuisance parameters. Biometrika
90:533–549
Sontag E (1998) Mathematical control theory: deterministic finite-dimensional systems. Springer, New
York


Tornoe C, Agerso H, Jonsson EN et al (2004) Non-linear mixed-effects pharmacokinetic/pharmacodynamic
modelling in NLME using differential equations. Comput Methods Programs Biomed
76:31–41
Tuo R, Wu C (2015) Efficient calibration for imperfect computer models. Ann Stat
van der Vaart A (1998) Asymptotic Statistics, Cambridge Series in Statistical and Probabilistic
Mathematics, Cambridge University Press
Varah JM (1982) A spline least squares method for numerical parameter estimation in differential equa-
tions. SIAM J Sci Stat Comput 3(1):28–46
Villain L, Commenges D, Pasin C et al (2019) Adaptive protocols based on predictions from a mechanis-
tic model of the effect of IL7 on CD4 counts. Stat Med 38(2):221–235
Wang L, Cao J, Ramsay J et al (2014) Estimating mixed-effects differential equation models. Stat Comput
24:111–121

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and
applicable law.
