Parameter Estimation in Nonlinear Mixed Effect Mod
https://doi.org/10.1007/s00180-023-01420-x
ORIGINAL PAPER
Received: 10 December 2022 / Accepted: 18 September 2023 / Published online: 14 October 2023
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023
Abstract
We present a method for parameter estimation for nonlinear mixed-effects models
based on ordinary differential equations (NLME-ODEs). It aims to regularize
the estimation problem in the presence of model misspecification and practical
identifiability issues, while avoiding the need to know or estimate initial conditions
as nuisance parameters. To this end, we define our estimator as a minimizer of a
cost function that incorporates a possible gap between the assumed population-level
model and the specific individual dynamics. The computation of the cost function
requires formulating and solving optimal control problems at the subject level. Compared
to the maximum likelihood method, we show through simulation examples that our
method improves the estimation accuracy in possibly partially observed systems
with unknown initial conditions or poorly identifiable parameters with or without
model error. We conclude this work with a real-world application in which we
model the antibody concentration after Ebola virus vaccination.
* Quentin Clairon
1
Inria Bordeaux Sud‑Ouest, Inserm, Bordeaux Population Health Research Center, SISTM Team,
UMR1219, University of Bordeaux, 33000 Bordeaux, France
2
Vaccine Research Institute, 94000 Créteil, France
3
Department of Infectious Diseases and Hospital Epidemiology, University Hospital, Collegium
Helveticum, Institute of Medical Virology, University of Zurich, Zurich, Switzerland
4
Centre Inria d’Université Côte d’Azur, EPIONE Research Project, Valbonne, France
1 Introduction
subject specific realizations $\{b_i^*\}_{i=1,\dots,n}$ from partial and noisy observations coming
from $n$ subjects and described by the following observational model:
$$y_{ij} = C X_{\theta^*, b_i^*, x_{i,0}^*}(t_{ij}) + \epsilon_{ij}.$$
For the $i$-th subject, we denote by $t_{ij}$ its $j$-th measurement time-point on the observation
interval $[0, T]$ and by $n_i$ its total number of available measurements. Here $C$ is a $d^o \times d$
observation matrix emphasizing the potentially partially observed nature of
the process and $\epsilon_{ij} \sim \sigma^* \times N(0, I_{d^o})$ is the measurement error. The vector
$\mathbf{y_i} = \{y_{ij}\}_{j=1,\dots,n_i}$ corresponds to the set of observations available for the $i$-th subject
and $\mathbf{y} = \{\mathbf{y_i}\}_{i=1,\dots,n}$ is the set of all observations in the population. We also assume
that only a subset $x_{i,0}^{k*}$ of $x_{i,0}^{*}$ is perfectly known, the other ones, denoted $x_{i,0}^{u*}$, being
unknown, and they are ordered as follows $x_{i,0} = \left( (x_{i,0}^{u})^T, (x_{i,0}^{k})^T \right)^T$ for the sake of
clarity. Nonetheless, pre-existing information can be available for $x_{i,0}^{u*}$ in the form
of an a priori distribution with a possibly parameter-dependent density $\mathbb{P}(x_{i,0}^{u} \mid \theta, \Psi, b_i)$.
The same holds for $(\theta, \Psi)$, for which an a priori distribution $\mathbb{P}(\theta, \Psi)$ can be available.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Parameter estimation in nonlinear mixed effect models based… 2977
1. They do not take into account the presence of model misspecification, a common
feature of ODE models used in biology. Indeed, the ODE modeling process
suffers from model inadequacy, understood as the discrepancy between the mean
model response and the real process, as well as residual variability subject to
specific stochastic perturbations or missing elements that disappear by averaging
over the entire population (Kennedy and O’Hagan 2001). As examples of the
causes of model inadequacies, one can think of the ODE models used in
epidemiology and virology, which are derived by approximations in which, for
example, interactions are modeled by pairwise products, while higher order terms
and/or the influence of unknown/unmeasured external factors are neglected. As
for residual variability, recall that biological processes are often stochastic and
the justification for deterministic modeling lies in the approximation of stochastic
processes (Kurtz 1978; Kampen 1992). Moreover, in the context of NLME-ODEs,
new sources of model uncertainties emerge. Firstly, measurement error in covariates
$z_i$ can lead to the use of a proxy function $\hat{z}_i$ instead of $z_i$ (Huang and Dagne 2011).
Secondly, the sequential nature of most inference methods leads to estimating
$\{b_i^*\}_{i=1,\dots,n}$ based on an approximation $\hat{\theta}$ instead of $\theta^*$. Thus, the structure of
mixed-effect models spreads measurement uncertainty into the mechanistic model
structure during the estimation. It turns classical statistical uncertainties into
model error causes. Estimation of $\theta^*$, $\Psi^*$ and $\{b_i^*\}_{i=1,\dots,n}$ must be performed in the
presence of the model error, although it is known to dramatically affect the
accuracy of methods that do not take it into account (Brynjarsdottir and O’Hagan
2014).
2. They have to estimate or make assumptions on the true unknown initial conditions
$x_{i,0}^{u*}$. In ODE models, the initial conditions are generally nuisance parameters in
the sense that inferring their values does not bring answers to the scientific
questions which motivate the model construction but is necessary for the
estimation of the relevant parameters. For example, partially observed
compartmental models used in pharmacokinetics/pharmacodynamics often
2978 Q. Clairon et al.
with no additional computational costs. This is the key element to efficiently profile
on unknown initial conditions during b∗i estimation, and treat them as nuisance
parameters instead of integrating them into bi definition, as it is usually done in the
previously mentioned methods.
In Sect. 2, we present the inner, middle and outer criteria used to define our
estimator. In Sect. 3, we compare our approach with classic maximum likelihood in
simulations. Then, we proceed to the real data analysis coming from clinical studies
and a model of the antibody concentration dynamics following immunization
with an Ebola vaccine in East African participants (Pasin et al. 2019). Section 5
concludes and discusses future extensions of the method.
From now on, we use the Cholesky decomposition $\sigma^2 \Psi^{-1} = \Delta^T \Delta$ and the
parametrization $\phi := (\theta, \Delta, \sigma)$ instead of $(\theta, \Psi, \sigma)$ to enforce positiveness and symmetry
of $\Psi$ and denote in a summarized way the set of all population parameters. The norm
$\|\cdot\|_2$ denotes the classic Euclidean one defined by $\|b\|_2 = \sqrt{b^T b}$. We estimate the
population and individual parameters via a nested procedure:
• Estimator $\hat{\phi}$ obtained by minimization of an outer criterion $F$ based on an
approximation of $\min_b \min_{x_0^u} \left( -\ln \mathbb{P}(\phi, b, x_0^u \mid \mathbf{y}) \right)$, the log joint-distribution of
$(\phi, b, x_0^u)$ sequentially profiled on $b := \{b_i\}_{i=1,\dots,n}$ and $x_0^u := \{x_{i,0}^u\}_{i=1,\dots,n}$, which
are respectively the set of all random effects and unknown initial conditions
among all subjects.
• Estimator $\hat{b}_i := \hat{b}_i(\phi)$ obtained for each subject $i$ by minimization of a middle
criterion $G_i$ based on an approximation of $\min_{x_{i,0}^u} \left( -\ln \mathbb{P}(\mathbf{y_i}, b_i, x_{i,0}^u \mid \phi) \right)$, the
log joint-distribution of the data, the random effects and unknown initial
conditions profiled on the latter.
• Estimator $\hat{x}_{i,0}^u := \hat{x}_{i,0}^u(\phi, b_i)$ obtained for each subject $i$ by minimization of an
inner criterion $H_i$ based on an approximation of $-\ln \mathbb{P}(\mathbf{y_i}, x_{i,0}^u \mid \phi, b_i)$, the log
joint-distribution of the data and unknown initial conditions.
$$\hat{x}_{i,0}^u(\phi, b_i) = \arg\min_{x_{i,0}^u} H_i(x_{i,0}^u \mid \phi, b_i) := \arg\min_{x_{i,0}^u} -2 \ln \tilde{\mathbb{P}}(\mathbf{y_i}, x_{i,0}^u \mid \phi, b_i).$$
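To make the nesting of the three criteria concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration: the scalar decay model, the names `decay`, `inner`, `middle` and `outer`, the fixed values of $\sigma$ and $\Delta$, and the absence of both priors and the optimal control relaxation. It only shows how the unknown initial condition is profiled out inside the random effect estimation, which is itself profiled out inside the population-level minimization.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy model (an assumption, not the paper's): x_i(t) = x0_i * exp(-theta * exp(b_i) * t)
def decay(theta, b, t):
    return np.exp(-theta * np.exp(b) * t)

def inner(theta, b, t, y):
    """Inner step: profile out the unknown initial condition x0.
    In this toy model x0 enters linearly, so its least-squares minimizer is closed form."""
    g = decay(theta, b, t)
    x0_hat = float(g @ y / (g @ g))
    rss = float(np.sum((x0_hat * g - y) ** 2))
    return x0_hat, rss

def middle(theta, delta, sigma, t, y):
    """Middle step: minimize over b_i the profiled misfit plus the penalty (delta * b)^2 / sigma^2."""
    def G(b):
        _, rss = inner(theta, b, t, y)
        return rss / sigma**2 + (delta * b) ** 2 / sigma**2
    res = minimize_scalar(G, bounds=(-3.0, 3.0), method="bounded")
    return res.x, res.fun

def outer(delta, sigma, t, ys, bounds=(0.05, 5.0)):
    """Outer step: minimize over theta the sum of the profiled subject-level criteria.
    sigma and delta are held fixed here, so the log-determinant terms are constant."""
    def F(theta):
        return sum(middle(theta, delta, sigma, t, y)[1] for y in ys)
    res = minimize_scalar(F, bounds=bounds, method="bounded")
    return res.x

rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0, 8)
theta_true, sigma, delta = 1.0, 0.05, 0.25   # illustrative values
b_true = rng.normal(0.0, 0.2, size=6)
x0_true = rng.uniform(1.0, 2.0, size=6)
ys = [x0 * decay(theta_true, b, t) + sigma * rng.normal(size=t.size)
      for x0, b in zip(x0_true, b_true)]

theta_hat = outer(delta, sigma, t, ys)
b_hat = [middle(theta_hat, delta, sigma, t, y)[0] for y in ys]
x0_hat = [inner(theta_hat, b, t, y)[0] for b, y in zip(b_hat, ys)]
print(theta_hat, b_hat[:2], x0_hat[:2])
```

The initial conditions never appear as arguments of the outer or middle optimizers; they are recomputed for each candidate $(\theta, b_i)$, which is the sense in which they are treated as nuisance parameters.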
2.1 Inner criterion
tion of the log joint-likelihood function of the data and $(b_i, x_{i,0}^u)$. However, for each
subject, we want to:
1. profile on $x_{i,0}^{u*}$ during random effects estimation to limit $b_i^*$ estimation degradation
due to presence of nuisance parameters,
2. use prior knowledge given by $\mathbb{P}(x_{i,0}^u \mid \phi, b_i)$ if available,
3. allow an acceptable deviation from the assumed model at the population level to
take into account possible model misspecifications.
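$$\mathbb{P}(\mathbf{y_i}, x_{i,0}^u \mid \phi, b_i) = \prod_j \mathbb{P}(y_{ij} \mid \phi, b_i, x_{i,0}^u)\, \mathbb{P}(x_{i,0}^u \mid \phi, b_i) = \prod_j \left[ (2\pi)^{-d^o/2} \sigma^{-d^o} e^{-0.5 \left\| C X_{\theta, b_i, x_{i,0}}(t_{ij}) - y_{ij} \right\|_2^2 / \sigma^2} \right] \mathbb{P}(x_{i,0}^u \mid \phi, b_i),$$
$$\hat{x}_{i,0}^u(\phi, b_i) = \arg\min_{x_{i,0}^u} -2 \ln \mathbb{P}(\mathbf{y_i}, x_{i,0}^u \mid \phi, b_i) = \arg\min_{x_{i,0}^u} \left\{ \frac{1}{\sigma^2} \sum_j \left\| C X_{\theta, b_i, x_{i,0}}(t_{ij}) - y_{ij} \right\|_2^2 - 2 \ln \mathbb{P}(x_{i,0}^u \mid \phi, b_i) \right\}.$$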
We also want to allow the presence of perturbations at the subject scale compared
to the original model defined at the population level. For this, we assume the
regression function is no longer $X_{\theta, b_i, x_{i,0}}$, but rather $X_{\theta, b_i, x_{i,0}, u_i}$, the solution of:
$$\begin{cases} \dot{x}_i(t) = f_{\theta, b_i}(t, x_i(t)) + B u_i(t) \\ x_i(0) = x_{i,0}. \end{cases} \qquad (2)$$
This perturbed ODE has been obtained by the addition of the forcing term $t \mapsto B u_i(t)$
to ODE (1), with $B$ a $d \times d_u$ matrix and $u_i$ a function in $L^2([0, T], \mathbb{R}^{d_u})$ representing
the perturbation. However, to ensure the possible perturbations remain small, we
replace the data fitting criterion $\sum_j \left\| C X_{\theta, b_i, x_{i,0}}(t_{ij}) - y_{ij} \right\|_2^2$ by $\min_{u_i} C_i(x_{i,0}^u, u_i \mid \theta, b_i, U)$,
where
$$C_i(x_{i,0}^u, u_i \mid \theta, b_i, U) = \sum_j \left\| C X_{\theta, b_i, x_{i,0}, u_i}(t_{ij}) - y_{ij} \right\|_2^2 + \left\| u_i \right\|_{U, L^2}^2,$$
and $\left\| u_i \right\|_{U, L^2}^2 = \int_0^T u_i(t)^T U u_i(t)\, dt$ is the weighted Euclidean norm. Here, the magni-
$$\hat{x}_{i,0}^u(\phi, b_i) := \arg\min_{x_{i,0}^u} H_i(x_{i,0}^u \mid \phi, b_i) \qquad (3)$$
where
$$H_i(x_{i,0}^u \mid \phi, b_i) = \frac{1}{\sigma^2} \min_{u_i} C_i(x_{i,0}^u, u_i \mid \theta, b_i, U) - 2 \ln \mathbb{P}(x_{i,0}^u \mid \phi, b_i).$$
Computing $H_i(x_{i,0}^u \mid \phi, b_i)$ requires solving the infinite-dimensional optimization
problem $\min_{u_i} C_i(x_{i,0}^u, u_i \mid \theta, b_i, U)$ in $L^2([0, T], \mathbb{R}^{d_u})$. This problem belongs to the
field of optimal control theory, for which dedicated approaches have been developed
(Sontag 1998; Aliyu 2011; Clarke 2013). Here we use the same method as in Clairon
(2020), which is detailed in Appendix A. All it requires from the user is to specify
a pseudo-linear representation of ODE (1), i.e., a possibly state-dependent matrix
$A_{\theta, b_i}(t, x_i(t))$ and a state-independent vector $r_{\theta, b_i}(t)$ such that:
$$f_{\theta, b_i}(t, x_i(t)) = A_{\theta, b_i}(t, x_i(t))\, x_i(t) + r_{\theta, b_i}(t). \qquad (4)$$
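As an illustration of the objects involved, the following sketch evaluates a discretized version of the cost $C_i$ for one given control, using the pseudo-linear form (4). It is not the solver of Appendix A: the minimization over $u_i$ is not performed, and the explicit Euler scheme, the piecewise-constant control and the function name `cost_Ci` are assumptions made for this example.

```python
import numpy as np

def cost_Ci(A, r, B, C, U, x0, u, t_grid, obs_idx, y):
    """Discretized C_i: squared observation misfit plus the U-weighted L2 norm of the control.

    A(t, x) and r(t) give the pseudo-linear form f(t, x) = A(t, x) x + r(t);
    u has one row per grid interval (piecewise-constant control, an assumption).
    """
    x = np.asarray(x0, dtype=float)
    traj = [x]
    penalty = 0.0
    for k in range(len(t_grid) - 1):
        dt = t_grid[k + 1] - t_grid[k]
        # explicit Euler step of the perturbed ODE x' = A(t, x) x + r(t) + B u(t)
        x = x + dt * (A(t_grid[k], x) @ x + r(t_grid[k]) + B @ u[k])
        traj.append(x)
        penalty += dt * float(u[k] @ U @ u[k])   # approximates the integral of u^T U u
    traj = np.array(traj)
    misfit = sum(float(np.sum((C @ traj[i] - yj) ** 2)) for i, yj in zip(obs_idx, y))
    return misfit + penalty

# Scalar example x' = -x (illustrative): A = [[-1]], r = 0, B = C = 1, U = 10
t_grid = np.linspace(0.0, 1.0, 101)
A = lambda t, x: np.array([[-1.0]])
r = lambda t: np.zeros(1)
B, C, U = np.eye(1), np.eye(1), 10.0 * np.eye(1)
obs_idx = [0, 50, 100]
# observations taken on the unperturbed Euler trajectory, so u = 0 fits them exactly
y = [np.array([(1.0 - 0.01) ** i]) for i in obs_idx]

c_zero = cost_Ci(A, r, B, C, U, [1.0], np.zeros((100, 1)), t_grid, obs_idx, y)
c_one = cost_Ci(A, r, B, C, U, [1.0], np.ones((100, 1)), t_grid, obs_idx, y)
print(c_zero, c_one)
```

A larger weight in $U$ makes any nonzero control more expensive, which is how the criterion keeps the perturbed trajectory close to the original model.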
we denote $\bar{u}_{i,\theta,b_i} := \bar{u}_{i,\theta,b_i,\hat{x}_{i,0}}$ the optimal control corresponding to the initial condition
estimator $\hat{x}_{i,0} = \left( (\hat{x}_{i,0}^u(\phi, b_i))^T, (x_{i,0}^k)^T \right)^T$. The solution of (2) corresponding to the
optimal control $u_i := \bar{u}_{i,\theta,b_i}$ is denoted $\bar{X}_{\theta,b_i}$ and named optimal trajectory: it will be
considered as the regression function for the $i$-th subject. $\bar{X}_{\theta,b_i}$ is thus defined as the solution of
ODE (2) which needs the smallest perturbation in order to get close to the observations.
In particular, $\bar{X}_{\theta,b_i}$ and $\bar{u}_{i,\theta,b_i}$ are respectively the subject specific state variable and
perturbation such that:
$$H_i(\hat{x}_{i,0}^u(\phi, b_i) \mid \phi, b_i) = \frac{1}{\sigma^2} \left\{ \sum_j \left\| C \bar{X}_{\theta,b_i}(t_{ij}) - y_{ij} \right\|_2^2 + \left\| \bar{u}_{i,\theta,b_i} \right\|_{U,L^2}^2 \right\} - 2 \ln \mathbb{P}(\hat{x}_{i,0}^u(\phi, b_i) \mid \phi, b_i). \qquad (5)$$
Again, formal expressions can be derived for both $\bar{u}_{i,\theta,b_i}$ and $\hat{x}_{i,0}^u(\phi, b_i)$, but they
present no interest for the sake of explanation and are left in Appendix A.
Remark 1 At this stage, we acknowledge the existing similarities between our approach
and the one presented in Wang et al. (2014), an extension of Ramsay et al. (2007) to a
population framework. Both methods approximate the original ODE and avoid initial
condition estimation. However, Wang et al. (2014) still consider classic likelihood for $\phi$
estimation and the absence of a proper probabilistic framework for handling
$\{x_{i,0}^u\}_{i=1,\dots,n}$ makes it difficult to incorporate a priori information when available.
Moreover, the spline basis decomposition used by Wang et al. (2014) is a source of
inaccuracy for ODE solution reconstruction, a cause of estimation error as pointed out in
Clairon (2020) in which Ramsay et al. (2007) and control based approaches have been
compared in a one subject setting. Finally, the estimation quality of the method
proposed in Ramsay et al. (2007) critically depends on hyperparameter choices (basis
dimension, knots location etc.) which can be complicated even when data are coming
from one subject and can thus become intractable for large populations.
2.2 Middle criterion
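with $\mathbb{P}(b_i \mid \phi) = \frac{1}{\sqrt{(2\pi)^q \left| \sigma^2 (\Delta^T \Delta)^{-1} \right|}}\, e^{-\frac{1}{2} b_i^T \frac{\Delta^T \Delta}{\sigma^2} b_i}$, we can define as estimator: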
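$$\hat{b}_i(\phi) = \arg\min_{b_i} \min_{x_{i,0}^u} -2 \ln \mathbb{P}(\mathbf{y_i}, b_i, x_{i,0}^u \mid \phi) = \arg\min_{b_i} \min_{x_{i,0}^u} \left\{ \frac{1}{\sigma^2} \sum_j \left\| C X_{\theta, b_i, x_{i,0}}(t_{ij}) - y_{ij} \right\|_2^2 - 2 \ln \mathbb{P}(x_{i,0}^u \mid \phi, b_i) + \frac{\left\| \Delta b_i \right\|_2^2}{\sigma^2} \right\}.$$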
Still, we use the same relaxation & penalization scheme as in the previous section to
account for the presence of model error in $b_i^*$ estimation. We again replace the term
$\sum_j \left\| C X_{\theta, b_i, x_{i,0}}(t_{ij}) - y_{ij} \right\|_2^2$ by $\min_{u_i} C_i(x_{i,0}^u, u_i \mid \theta, b_i, U)$ in the previous criterion and we
end up with the following estimator:
$$\hat{b}_i(\phi) := \arg\min_{b_i} G_i(b_i \mid \phi) \qquad (6)$$
where:
$$G_i(b_i \mid \phi) = H_i(\hat{x}_{i,0}^u(\phi, b_i) \mid \phi, b_i) + \frac{\left\| \Delta b_i \right\|_2^2}{\sigma^2}. \qquad (7)$$
2.3 Outer criterion
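$$\hat{\phi} = \arg\min_{\phi} \left\{ \sum_i \min_{b_i} \min_{x_{i,0}^u} \left\{ \frac{1}{\sigma^2} \sum_j \left\| C X_{\theta, b_i, x_{i,0}}(t_{ij}) - y_{ij} \right\|_2^2 - 2 \ln \mathbb{P}(x_{i,0}^u \mid \phi, b_i) + \frac{\left\| \Delta b_i \right\|_2^2}{\sigma^2} \right\} + \left( d^o \sum_i n_i + nq \right) \ln \sigma^2 - n \ln \left| \Delta^T \Delta \right| - 2 \ln \mathbb{P}(\phi) \right\}$$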
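where:
$$F(\phi) = \sum_i \left\{ \frac{1}{\sigma^2} \left( \sum_j \left\| C \bar{X}_{\theta, \hat{b}_i(\phi)}(t_{ij}) - y_{ij} \right\|_2^2 + \left\| \Delta \hat{b}_i(\phi) \right\|_2^2 \right) - 2 \ln \mathbb{P}(\hat{x}_{i,0}^u(\phi, \hat{b}_i(\phi)) \mid \phi, \hat{b}_i(\phi)) \right\} + \left( d^o \sum_i n_i + nq \right) \ln \sigma^2 - n \ln \left| \Delta^T \Delta \right| - 2 \ln \mathbb{P}(\phi). \qquad (9)$$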
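If $\mathbb{P}(x_{i,0}^u \mid \phi, b_i)$ is constant then $\hat{x}_{i,0}^u(\phi, b_i)$ and $\hat{b}_i(\phi)$ do not depend on $\sigma$, i.e.
$\hat{x}_{i,0}^u(\phi, b_i) = \hat{x}_{i,0}^u(\theta, b_i)$ and $\hat{b}_i(\phi) = \hat{b}_i(\theta, \Delta)$, and consequently neither does
$\bar{X}_{\theta, \hat{b}_i(\theta, \Delta)}$. So, for each $(\theta, \Delta)$, the minimizer in $\sigma^2$ of $F(\phi) = F(\theta, \Delta, \sigma)$ has a closed
form expression:
$$\sigma^2(\theta, \Delta) = \frac{1}{d^o \sum_i n_i + qn} \sum_i \left( \sum_j \left\| C \bar{X}_{\theta, \hat{b}_i(\theta, \Delta)}(t_{ij}) - y_{ij} \right\|_2^2 + \left\| \Delta \hat{b}_i(\theta, \Delta) \right\|_2^2 \right). \qquad (10)$$
By using the expression of $\sigma^2(\theta, \Delta)$, we get $\min_{\sigma^2} F(\phi) = F[\theta, \Delta]$ (up to an additive constant) where:
$$F[\theta, \Delta] = \left( d^o \sum_i n_i + qn \right) \ln \sigma^2(\theta, \Delta) - n \ln \left| \Delta^T \Delta \right| - 2 \ln \mathbb{P}(\phi).$$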
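The closed form (10) follows from setting to zero the derivative in $\sigma^2$ of a function of the form $S/\sigma^2 + N \ln \sigma^2$, where $S$ gathers the residual and penalty terms and $N = d^o \sum_i n_i + qn$. The short check below compares this closed form with a numerical minimization; the values of `S` and `N` and the function name are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sigma2_closed_form(S, N):
    """Minimizer in sigma^2 of S / sigma^2 + N * ln(sigma^2): the derivative vanishes at S / N."""
    return S / N

# Illustrative values: S = summed squared residuals and penalties, N = d^o * sum(n_i) + q * n
S, N = 3.7, 120.0

# Minimize over log(sigma^2) so that the search is unconstrained in sign
profile = lambda log_s2: S / np.exp(log_s2) + N * log_s2
res = minimize_scalar(profile, bounds=(-20.0, 20.0), method="bounded")
sigma2_numeric = float(np.exp(res.x))

print(sigma2_closed_form(S, N), sigma2_numeric)
```

Profiling on $\sigma^2$ in this way removes one dimension from the outer optimization at no extra cost.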
2.4 Asymptotic Variance-Covariance matrix estimator for $(\hat{\theta}, \hat{\Delta})$
We derive an estimator of the asymptotic variance of $(\hat{\theta}, \hat{\Delta})$. Here we restrict to
the case described in Sect. 2.3.2 when a uniform distribution is chosen for $x_{i,0}^u$ and
the outer criterion is profiled on $\sigma$. The general case can be treated similarly,
but we omit it for the sake of clarity since it is not used in the following simulation
work. We highlight that in practice the matrix $\Delta$ is parametrized by a vector
$\delta$ of dimension $q'$, i.e. $\Delta := \Delta(\delta)$, and we give here a variance estimator of $(\hat{\theta}, \hat{\delta})$.
From this, the variance of $\hat{\Delta}$ can be obtained using classic delta-methods (see
van der Vaart (1998) chapter 3).
Let us start by presenting sufficient conditions ensuring our estimator is asymptotically
normal, by introducing $h(b_i, \theta, \Delta, \mathbf{y_i}) = \left\| \Delta b_i \right\|_2^2 + \sum_j \left\| C \bar{X}_{\theta, b_i}(t_{ij}) - y_{ij} \right\|_2^2$:
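1. the function
$$\tilde{F}[\theta, \Delta(\delta)] = -0.5 \left( d^o \mathbb{E}[n_1] + q \right) \ln \left( \frac{\lim_n \frac{1}{n} \sum_i^n \mathbb{E}\left[ h(\hat{b}_i(\theta, \Delta(\delta)), \theta, \Delta(\delta), \mathbf{y_i}) \right]}{d^o \mathbb{E}[n_1] + q} \right) + \ln \left| \Delta(\delta) \right|$$
has a well separated minimum $(\theta, \delta)$ belonging to the interior of a compact $\Theta \times \Omega$,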
2. the true densities of unknown initial conditions $\{\mathbb{P}^*(\cdot \mid \phi^*, b_i^*)\}_{i=1,\dots,n}$ have finite
variance and either
for $\nu = 0, 1$ where $h^{(\nu)}(\mathbf{y_i}) = \frac{d^{(\nu)}}{d^{(\nu)}(\theta, \delta)} h(\hat{b}_i(\theta, \Delta(\delta)), \theta, \Delta(\delta), \mathbf{y_i})$ and
$V^{(\nu)} = \sqrt{\sum_i \mathrm{Var}(h^{(\nu)}(\mathbf{y_i}))}$,
3. the subject specific numbers of observations $\{n_i\}_{i=1,\dots,n}$ are i.i.d. and uniformly
bounded,
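4. for all possible values $(\theta, b_i)$, the solution $X_{\theta, b_i, x_{i,0}^*}$ belongs to a compact $\chi$ of $\mathbb{R}^d$,
and for all $(t, \theta, x)$, the mapping $b_i \mapsto f_{\theta, b_i}(t, x)$ has a compact support $\Theta_b$,
5. $(\theta, b_i, t, x) \mapsto f_{\theta, b_i}(t, x)$ belongs to $C^1(\Theta \times \Theta_b \times [0, T] \times \chi, \mathbb{R}^d)$,
6. the matrices $\frac{\partial^2 C_i}{\partial^2 x_{i,0}}(\hat{x}_{i,0}^u(\theta, \Delta(\delta)), u_{\theta, \hat{b}_i(\theta, \Delta(\delta))} \mid \theta, \hat{b}_i(\theta, \Delta(\delta)), U)$
and $\frac{\partial^2}{\partial^2 b_i} G_i(\hat{b}_i(\theta, \Delta(\delta)) \mid \theta, \Delta(\delta))$ are of full rank almost surely for every sequence $\mathbf{y_i}$,
7. there is a neighborhood $\Theta_\theta$ of $\theta$ such that
$(\theta, b_i, t, x) \mapsto f_{\theta, b_i}(t, x) \in C^5(\Theta_\theta \times \Theta_b \times [0, T] \times \chi, \mathbb{R}^d)$.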
used to derive the consistency of our estimator toward $(\theta, \delta)$ by following the classic
steps for M-estimators, by proving 1. the uniform convergence of our stochastic
cost function to a deterministic one, 2. the existence of a well-separated minimum
for this deterministic function (van der Vaart 1998). Conditions 5–7 ensure that
our cost function is asymptotically smooth enough in the vicinity of $(\theta, \delta)$ to
proceed to a Taylor expansion and transfer the regularity of the cost function to the
asymptotic behavior of $\sqrt{n}(\hat{\theta} - \theta, \hat{\delta} - \delta)$. Less restrictive conditions can be
established under which our estimator is still asymptotically normal, in particular
regarding $f_{\theta, b_i}$ regularity with respect to $t$.
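where
$$A(\theta, \delta) = \lim_n \left[ \frac{1}{n} \sum_{i=1}^n \frac{\partial J(\theta, \delta, \mathbf{y_i})}{\partial(\theta, \delta)} \right], \quad B(\theta, \delta) = \lim_n \left[ \frac{1}{n} \sum_i J(\theta, \delta, \mathbf{y_i}) J(\theta, \delta, \mathbf{y_i})^T \right]$$
and the vector valued function $J(\theta, \delta, \mathbf{y_i}) = \begin{pmatrix} J_\theta(\theta, \delta, \mathbf{y_i}) \\ J_\delta(\theta, \delta, \mathbf{y_i}) \end{pmatrix}$ is given by:
$$J_\theta(\theta, \delta, \mathbf{y_i}) = \frac{d}{d\theta} h(\hat{b}_i(\theta, \Delta(\delta)), \theta, \Delta(\delta), \mathbf{y_i})$$
$$J_\delta(\theta, \delta, \mathbf{y_i}) = \frac{d}{d\delta} h(\hat{b}_i(\theta, \Delta(\delta)), \theta, \Delta(\delta), \mathbf{y_i}) - \frac{2}{d^o \mathbb{E}[n_1] + q} \mathrm{Tr}\left( \Delta(\delta)^{-1} \frac{\partial \Delta(\delta)}{\partial \delta_k} \right) h(\hat{b}_i(\theta, \Delta(\delta)), \theta, \Delta(\delta), \mathbf{y_i}).$$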
The proof is left in appendix D. The practical interest of this theorem is to give
an estimator of the Variance-Covariance matrix:
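$$V(\hat{\theta}, \hat{\delta}) \simeq \hat{A}(\hat{\theta}, \hat{\delta})^{-1} \hat{B}(\hat{\theta}, \hat{\delta}) \left( \hat{A}(\hat{\theta}, \hat{\delta})^{-1} \right)^T / n$$
with $\hat{A}(\hat{\theta}, \hat{\delta}) = \frac{1}{n} \sum_{i=1}^n \frac{\partial \hat{J}(\hat{\theta}, \hat{\delta}, \mathbf{y_i})}{\partial(\theta, \delta)}$,
$\hat{B}(\hat{\theta}, \hat{\delta}) = \frac{1}{n} \sum_{i=1}^n \hat{J}(\hat{\theta}, \hat{\delta}, \mathbf{y_i}) \hat{J}(\hat{\theta}, \hat{\delta}, \mathbf{y_i})^T$ and the vector valued function
$\hat{J}(\theta, \delta, \mathbf{y_i}) = \begin{pmatrix} \hat{J}_\theta(\theta, \delta, \mathbf{y_i}) \\ \hat{J}_\delta(\theta, \delta, \mathbf{y_i}) \end{pmatrix}$ given by $\hat{J}_\theta(\theta, \delta, \mathbf{y_i}) = J_\theta(\theta, \delta, \mathbf{y_i})$ and
$$\hat{J}_\delta(\theta, \delta, \mathbf{y_i}) = \frac{d}{d\delta} h(\hat{b}_i(\theta, \Delta(\delta)), \theta, \Delta(\delta), \mathbf{y_i}) - \frac{2n}{d^o \sum_i n_i + qn} \mathrm{Tr}\left( \Delta(\delta)^{-1} \frac{\partial \Delta(\delta)}{\partial \delta_k} \right) h(\hat{b}_i(\theta, \Delta(\delta)), \theta, \Delta(\delta), \mathbf{y_i}).$$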
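Once the per-subject vectors $\hat{J}(\hat{\theta}, \hat{\delta}, \mathbf{y_i})$ and their Jacobians are available (for instance by finite differences), assembling the sandwich estimator is a few lines of linear algebra. The sketch below assumes they are already stored in arrays `J` and `dJ`; these names, the function name and the toy numbers are hypothetical and only illustrate the matrix operations.

```python
import numpy as np

def sandwich_variance(J, dJ):
    """V = A^{-1} B (A^{-1})^T / n, with A the mean of the Jacobians and B the mean of J_i J_i^T.

    J:  (n, p) array, one vector J_i per subject.
    dJ: (n, p, p) array, the Jacobian of J_i with respect to the p parameters.
    """
    n = J.shape[0]
    A = dJ.mean(axis=0)
    B = np.einsum("ip,iq->pq", J, J) / n
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv.T / n

# One-parameter toy case (illustrative numbers): A = -1, B = 2.5, so V = 2.5 / 4
J = np.array([[1.0], [-1.0], [2.0], [-2.0]])
dJ = -np.ones((4, 1, 1))
V = sandwich_variance(J, dJ)
print(V)
```

Confidence intervals for each component then use the square roots of the diagonal of `V`.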
magnitude. For $b_i^*$, we estimate the mean squared error $\mathrm{MSE}(\hat{b}_i) = \mathbb{E}\left[ \left\| b_i^* - \hat{b}_i \right\|_2^2 \right]$.
For each subsequent example, we give the results for n = 50 and present in Appendix
C the case n = 20 to analyze the evolution of the accuracy of each estimator with
respect to data sparsity. For one example in section “Effect of population size on
estimation accuracy” in Appendix C, we also consider the case n = 100 to analyze
the evolution of estimation accuracy with respect to an increasing population size.
In the following, we use the superscript ML to denote the ML estimator. For the
fairness of comparison with ML, we choose a non-informative prior i.e.
ln ℙ(𝜃, Δ) = 0 for our method throughout this section (the impact of prior incorpora-
tion is analyzed in an example left in Appendix C, section “Effect of prior informa-
tion on estimation accuracy”, for a discussion on prior choice in NLME-ODEs see
Prague et al. (2012)). Also, we do not use a distribution for $x_{i,0}^u$ for our approach. For
ML, which requires it, we use the right parametric form for $\mathbb{P}(x_{i,0}^u \mid \phi, b_i)$. If the
ODE (1) has an analytical solution, the ML estimator is computed via the SAEM
algorithm (SAEMIX package, Comets et al. (2017)). Otherwise, it is computed via a restricted
likelihood method dedicated to ODE models implemented in the nlmeODE package
(Tornoe et al. 2004). For our method, we need to select $U$, which balances model and data
fidelity in the inner and middle criteria (5)-(7). We use the method presented in
G. Hooker and Earn (2011) to compute $EP_i(U)$, the prediction error for the subject $i$
corresponding to the estimators $\{\hat{\theta}_U, \{\hat{b}_{i,U}\}_{i=1,\dots,n}\}$ obtained for a given matrix $U$.
From this, we compute $EP(U) = \sum_i EP_i(U)$, the global prediction error for the whole
population. We test a set of scalar matrices $\{U_l\}_{l=1,\dots,L} = \{\lambda_l \times I_d\}_{l=1,\dots,L}$ and retain
the hyperparameter value $\lambda_l$ minimizing $EP$, and we denote $\{\hat{\theta}, \hat{\Psi}, \{\hat{b}_i\}_{i=1,\dots,n}\}$ the
corresponding estimator. For solving the optimization problems required for computing
our criteria, we use the Nelder-Mead algorithm implemented in the optimr package
(Nash 2016). All optimization algorithms used here require a starting guess value.
We start from the true parameter value for each of them. By doing so, we aim to
keep distinct two problems: 1. the numerical stability of the estimation procedures,
2. the intrinsic accuracy of the different estimators. These two problems are corre-
lated, but we aim to address only the latter which corresponds to the issues raised in
the introduction. Still, we checked in a preliminary analysis that the presence of local minima was
not an issue in the neighborhood of $(\theta^*, \Delta^*)$ by testing different starting points for
all methods. No problem appeared for our method and SAEMIX. A negligible number
of non-convergence cases appeared for nlmeODE; these have been discarded
thanks to the convergence criteria embedded in the package (the occurrence and
importance of such convergence issues are analyzed in an example left in section
“Effect of wrong first guess on estimation accuracy” in Appendix C, in which we
show that our method suffers less than ML from convergence issues when initial
conditions are unknown).
Fig. 1 Left: Examples of solutions of (12) and corresponding observations. Right: Solution of (12) and a
realization of (13) for the same parameter values
We consider the population model where each subject i follows the ODE:
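Table 1 Results of estimation for model (12). The different subscripts stand for the following estimation
scenarios: 1. $x_0$ when both initial conditions are set to $(x_{1,i,0}^*, x_{2,i,0}^*)$, 2. $x_{0,2}$ when $x_{1,i,0}$ is set to $y_{i,0}$ and $x_{2,i,0}$
to $x_{2,i,0}^*$, 3. absence of subscript when $x_{1,i,0}$ is set to $y_{i,0}$ and $x_{2,i,0}$ is estimated. Results from our method are
in bold

| | | Well-specified | | | | | | Misspecified | | | | | |
| Parameter | Estimator | MSE | Bias | $V_e$ | $\hat{V}$ | CR | MSE $b_i$ | MSE | Bias | $V_e$ | $\hat{V}$ | CR | MSE $b_i$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\theta_1$ | $\hat{\theta}^{ML}_{x_0}$ | 0.01 | 3e−3 | 0.01 | 0.01 | 0.95 | | 0.02 | 3e−3 | 0.01 | 0.01 | 0.92 | |
| | $\hat{\theta}^{ML}_{x_{2,0}}$ | 0.01 | 3e−3 | 0.01 | 0.01 | 0.92 | | 0.01 | 3e−3 | 0.01 | 0.01 | 0.93 | |
| | $\hat{\theta}^{ML}$ | 0.02 | 4e−3 | 0.02 | 0.01 | 0.90 | | 0.02 | 0.02 | 0.02 | 0.01 | 0.90 | |
| | $\hat{\theta}$ | 0.01 | 6e−3 | 7e−3 | 0.01 | 0.98 | | 0.01 | 0.01 | 0.01 | 0.01 | 0.96 | |
| $\theta_2$ | $\hat{\theta}^{ML}_{x_0}$ | 2e−4 | 1e−3 | 1e−4 | 1e−4 | 0.95 | | 1e−3 | 0.02 | 4e−4 | 6e−4 | 0.88 | |
| | $\hat{\theta}^{ML}_{x_{2,0}}$ | 2e−4 | 1e−3 | 1e−4 | 1e−4 | 0.93 | | 1e−3 | 0.02 | 5e−4 | 6e−4 | 0.88 | |
| | $\hat{\theta}^{ML}$ | 5e−4 | 4e−3 | 5e−4 | 3e−4 | 0.86 | | 5e−3 | 0.02 | 4e−3 | 1e−3 | 0.77 | |
| | $\hat{\theta}$ | 4e−4 | 3e−4 | 4e−4 | 5e−4 | 0.98 | | 3e−3 | 0.01 | 3e−3 | 5e−3 | 0.94 | |
| $\Psi$ | $\hat{\theta}^{ML}_{x_0}$ | 0.01 | −0.02 | 0.01 | 0.01 | 0.93 | 6e−3 | 0.01 | −0.02 | 0.01 | 0.01 | 0.93 | 0.01 |
| | $\hat{\theta}^{ML}_{x_{2,0}}$ | 0.01 | −0.02 | 0.01 | 0.01 | 0.92 | 7e−3 | 0.01 | −0.02 | 0.01 | 0.01 | 0.94 | 0.02 |
| | $\hat{\theta}^{ML}$ | 0.01 | −0.02 | 0.01 | 0.01 | 0.92 | 0.01 | 0.01 | −0.02 | 0.01 | 0.01 | 0.93 | 0.04 |
| | $\hat{\theta}$ | 0.01 | −0.02 | 0.01 | 0.01 | 0.92 | 5e−3 | 0.01 | −0.03 | 0.01 | 0.02 | 0.92 | 0.02 |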
$$\begin{cases} \dot{X}_{1,i} = \phi_{2,i} X_{2,i} - \phi_{1,i} X_{1,i} \\ \dot{X}_{2,i} = -\phi_{2,i} X_{2,i} \\ (X_{1,i}(0), X_{2,i}(0)) = (x_{1,i,0}, x_{2,i,0}) \end{cases} \qquad (12)$$
has to be estimated.
3.1.1 Well‑specified case
We used the exact model described in Sect. 3.1 for the estimation procedure. Thus,
we are in a completely well-specified setting, with all mechanisms modeled. We
present the estimation results in Table 1 (left side). For ML, the results are good in
terms of accuracy and consistent in terms of asymptotic confidence interval coverage
rate when both initial conditions are known: 95% for $\theta_1$ and $\theta_2$, which is consistent
with theoretical results. However, there is a significant drop in accuracy when
$x_{2,i,0}^*$ has to be estimated. In particular, the coverage rate drops to 90% and 86% for
$\theta_1$ and $\theta_2$ respectively. Interestingly, ML inaccuracy is driven by bias and
under-estimated variance when initial conditions are not known (as shown by a greater $V_e$ than
$\hat{V}$). In this case our method provides a relevant alternative: it gives accurate
estimations with a good coverage rate for all parameters while avoiding the estimation of
$x_{2,i,0}^*$. Variances are properly estimated compared to empirical variances. Estimation
of individual random effects is also more accurate with our method, with an MSE for
$b_i$ 2 times smaller compared to ML with unknown initial conditions.
$$\begin{cases} dX_{1,i} = \phi_{2,i} X_{2,i}\, dt - \phi_{1,i} X_{1,i}\, dt \\ dX_{2,i} = -\phi_{2,i} X_{2,i}\, dt + \alpha\, dB_t \\ (X_{1,i}(0), X_{2,i}(0)) = (x_{1,i,0}, x_{2,i,0}) \end{cases} \qquad (13)$$
with Bt a Wiener process and 𝛼 = 0.1 the diffusion coefficient. For the sake of
comparison, a solution of (12) and a realization of its perturbed counterpart given
by (13) are plotted in Fig. 1. This framework where stochasticity only affects the
unmeasured compartment is known to be problematic for parameter estimation and
inference procedures are yet to be developed for the sparse sampling case. From Fig. 1 it
is easy to see that the diffusion 𝛼 will be hard to estimate when we only have obser-
vations for X1,i . Thus, we still estimate the parameters from the model (12) which
is now seen as a deterministic approximation of the true stochastic process. Still, it
is expected that our method will mitigate the effect of stochasticity on the estima-
tion accuracy by taking into account model misspecification. Results are presented
in Table 1—right side. The differences between the two methods are similar to the
previous well-specified case with an additional loss of accuracy coming from model
error for both estimators. However, the misspecification effect for ML is more pronounced
compared to our method, which manages to limit the damage. This
illustrates the benefits of taking into account model uncertainty for estimation, in
particular here when model error occurs in the unobserved compartment, a situation
in which classic statistical criteria for model assessment based on a data fitting crite-
rion are difficult to use.
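For readers who wish to reproduce the qualitative difference between (12) and (13), a realization of the perturbed system can be generated with the Euler-Maruyama scheme, as sketched below. The parameter values, initial condition, time horizon and step size are illustrative assumptions and not those of the simulation study; only $\alpha = 0.1$ is taken from the text.

```python
import numpy as np

def simulate(phi1, phi2, x0, alpha, T, n_steps, rng):
    """Euler-Maruyama path of the two-compartment model; alpha = 0 gives plain Euler for the ODE."""
    dt = T / n_steps
    x = np.empty((n_steps + 1, 2))
    x[0] = x0
    for k in range(n_steps):
        x1, x2 = x[k]
        dB = rng.normal(0.0, np.sqrt(dt))                 # Wiener increment over one step
        x[k + 1, 0] = x1 + (phi2 * x2 - phi1 * x1) * dt    # observed compartment, no noise term
        x[k + 1, 1] = x2 - phi2 * x2 * dt + alpha * dB     # unobserved compartment, perturbed
    return x

rng = np.random.default_rng(1)
phi1, phi2, x0, T, n = 0.5, 2.0, (0.0, 1.0), 5.0, 5000    # illustrative values, not the paper's
ode_path = simulate(phi1, phi2, x0, 0.0, T, n, rng)        # deterministic model
sde_path = simulate(phi1, phi2, x0, 0.1, T, n, rng)        # perturbed model with alpha = 0.1
print(ode_path[-1], sde_path[-1])
```

The noise enters only the second compartment, so its effect on the observed compartment is smoothed by the dynamics, which is consistent with the difficulty of estimating $\alpha$ from observations of $X_{1,i}$ alone.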
Finally, we acknowledge that the effect of other misspecification sources can be
investigated. For example, the population which is here assumed homogeneous can
be in fact a mixture of subjects with random effects distributed according to differ-
ent laws. To account for this, we evaluate in section “Effect of outlier presence on
estimation accuracy” in Appendix C the situation in which an increasing fraction
of subjects are chosen as outliers for the random effect assumed distribution. We
then investigate its impact on estimation accuracy for ML and optimal control based
methods.
We consider the model presented in De Gaetano and Arino (2000) for the analysis of
glucose and insulin regulation:
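Table 2 Results of estimation for model (14). The different subscripts stand for the following estimation
scenarios: 1. $S_I$ when $S_I$ is set to $S_I^*$, 2. absence of subscript when $S_I$ is estimated. Results from our
method are in bold

| | | Well-specified | | | | | | Misspecified | | | | | |
| Parameter | Estimator | MSE | Bias | $V_e$ | $\hat{V}$ | CR | MSE $b_i$ | MSE | Bias | $V_e$ | $\hat{V}$ | CR | MSE $b_i$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\theta_{S_G}$ | $\hat{\theta}^{ML}_{S_I}$ | 5e−5 | 2e−3 | 4e−5 | 9e−6 | 0.95 | | 6e−5 | 3e−3 | 6e−5 | 2e−5 | 0.85 | |
| | $\hat{\theta}^{ML}$ | 2e−3 | 0.03 | 1e−3 | 8e−5 | 0.85 | | 2e−3 | 3e−3 | 1e−3 | 2e−4 | 0.54 | |
| | $\hat{\theta}_{S_I}$ | 1e−5 | 4e−4 | 1e−5 | 8e−6 | 0.95 | | 2e−5 | −2e−5 | 2e−5 | 2e−5 | 0.93 | |
| | $\hat{\theta}$ | 2e−4 | −6e−4 | 2e−4 | 2e−4 | 0.96 | | 3e−4 | −1e−3 | 3e−4 | 4e−4 | 0.93 | |
| $\theta_{S_I}$ | $\hat{\theta}^{ML}$ | 2e−3 | 0.03 | 1e−3 | 6e−5 | 0.90 | | 0.01 | 0.04 | 0.01 | 1e−3 | 0.55 | |
| | $\hat{\theta}$ | 1e−4 | −7e−4 | 1e−4 | 1e−4 | 0.96 | | 3e−4 | −1e−3 | 3e−4 | 3e−4 | 0.92 | |
| $\theta_m$ | $\hat{\theta}^{ML}_{S_I}$ | 7e−4 | 3e−3 | 6e−4 | 5e−4 | 0.94 | | 8e−4 | −3e−3 | 8e−4 | 5e−4 | 0.89 | |
| | $\hat{\theta}^{ML}$ | 9e−4 | 8e−3 | 8e−4 | 5e−4 | 0.86 | | 5e−3 | −5e−3 | 5e−3 | 5e−4 | 0.88 | |
| | $\hat{\theta}_{S_I}$ | 5e−4 | 6e−3 | 5e−4 | 5e−4 | 0.95 | | 4e−4 | 7e−4 | 4e−4 | 5e−4 | 0.95 | |
| | $\hat{\theta}$ | 6e−4 | 6e−3 | 5e−4 | 5e−4 | 0.95 | | 4e−4 | 6e−4 | 4e−4 | 5e−4 | 0.96 | |
| $\Psi$ | $\hat{\theta}^{ML}_{S_I}$ | 0.02 | 7e−4 | 0.02 | 0.02 | 0.95 | 0.02 | 0.03 | −3e−3 | 0.03 | 0.02 | 0.93 | 0.03 |
| | $\hat{\theta}^{ML}$ | 0.04 | −0.09 | 0.03 | 0.02 | 0.88 | 0.02 | 0.03 | −8e−3 | 0.02 | 0.02 | 0.87 | 0.03 |
| | $\hat{\theta}_{S_I}$ | 0.01 | −2e−3 | 0.01 | 0.01 | 0.95 | 0.01 | 0.01 | −4e−3 | 0.01 | 0.02 | 0.94 | 0.01 |
| | $\hat{\theta}$ | 0.01 | 3e−3 | 0.01 | 0.01 | 0.94 | 0.01 | 0.02 | −7e−3 | 0.02 | 0.02 | 0.94 | 0.02 |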
$$\begin{cases} \dot{G}_i = S_G (G_B - G_i) - X_i G_i \\ \dot{I}_i = \gamma t (G_i - h) - m_i (I_i - I_B) \\ \dot{X}_i = -p_2 (X_i + S_I (I_i - I_B)). \end{cases} \qquad (14)$$
The ODE system (14) rules the behavior of circulating glucose $G_i$ and insulin $I_i$ in
blood as well as insulin $X_i$ present in interstitial fluid. We are in a partially observed
case where only $G_i$ and $I_i$ are measured. The values of parameters $(p_2, \gamma, h, G_B, I_B)$
are set to $(-4.93, -6.85, 4.14, 100, 100)$ and we aim to estimate $\theta = (\theta_{S_G}, \theta_{S_I}, \theta_m)$,
linked to the original model via the parametrization: $\log(S_G) = \theta_{S_G}$, $\log(S_I) = \theta_{S_I}$ and
$\log(m_i) = \theta_m + b_i$ where $b_i \sim N(0, \Psi)$. The true population parameter values are
tions $x_{i,0}^* = (G_{i,0}^*, I_{i,0}^*, X_{i,0}^*)$ are distributed according to $\ln(x_{i,0}^*) \sim N(l_{x_0^*}, \Psi_{l_{x_0^*}})$ with
$l_{x_0^*} = (5.52, 4.88, -7)$ and $\Psi_{l_{x_0^*}} = (0.17^2, 0.1^2, 10^{-4})$. For the penalization term in our
required for each subject by ML. For this, we distinguish two cases, 1. when $\theta_{S_I}$ is
known, 2. when $\theta_{S_I}$ has to be estimated, and we denote respectively $\hat{\theta}_{S_I}$ and $\hat{\theta}$ the
corresponding estimators. Finally, since the model is nonlinear, we have to specify a
pseudo-linear representation of the vector field as in (4):
$$A_{\theta, b_i}(t, G_i, I_i, X_i) = \begin{pmatrix} -S_G & 0 & -G_i \\ \gamma t & -m_i & 0 \\ 0 & -p_2 S_I & -p_2 \end{pmatrix}, \quad r_{\theta, b_i}(t) = \begin{pmatrix} S_G G_B \\ -\gamma t h + m_i I_B \\ p_2 S_I I_B \end{pmatrix}.$$
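A pseudo-linear representation is easy to get wrong by a sign, so it can be worth checking numerically that $A_{\theta,b_i}(t, x)\,x + r_{\theta,b_i}(t)$ reproduces the vector field of (14). The sketch below does this at one arbitrary point; the parameter values in `p` and the evaluation point are illustrative assumptions (they are not the values used in the simulations), and the function names are ours.

```python
import numpy as np

def f(t, x, p):
    """Vector field of model (14) for the state x = (G, I, X)."""
    G, I, X = x
    return np.array([
        p["SG"] * (p["GB"] - G) - X * G,
        p["gamma"] * t * (G - p["h"]) - p["m"] * (I - p["IB"]),
        -p["p2"] * (X + p["SI"] * (I - p["IB"])),
    ])

def A(t, x, p):
    """State-dependent matrix of the pseudo-linear form (depends on x only through G)."""
    G = x[0]
    return np.array([
        [-p["SG"], 0.0, -G],
        [p["gamma"] * t, -p["m"], 0.0],
        [0.0, -p["p2"] * p["SI"], -p["p2"]],
    ])

def r(t, p):
    """State-independent vector of the pseudo-linear form."""
    return np.array([
        p["SG"] * p["GB"],
        -p["gamma"] * t * p["h"] + p["m"] * p["IB"],
        p["p2"] * p["SI"] * p["IB"],
    ])

# Illustrative parameter values and evaluation point (assumptions for this check only)
p = dict(SG=0.03, SI=5e-4, m=0.2, p2=0.01, gamma=1e-3, h=60.0, GB=100.0, IB=100.0)
t, x = 12.5, np.array([250.0, 130.0, 1e-3])

residual = f(t, x, p) - (A(t, x, p) @ x + r(t, p))
print(residual)
```

The representation is not unique: the bilinear term $-X_i G_i$ is placed here in the entry multiplying $X_i$, as in the matrix above, but it could equally be attached to $G_i$.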
3.2.1 Well‑specified case
We present the estimation results in Table 2 (left side). Our method has small bias
and achieves good coverage in all cases. We obtain a smaller MSE than ML and avoid
the drop in coverage rate of the confidence interval in the case of $\theta_{S_I}^*$ estimation,
which is often needed in practice. The difference between the behavior of the two
estimators is explained by the fact that they are defined through the construction of
two different optimization problems. At the population level, our approach leads to
minimizing a cost function depending on a 4-dimensional parameter whereas ML, due
to its need to estimate $x_{i,0}^*$, considers a 10-dimensional one. Thus, the parameter
spaces explored by each method to look for the minimum are very different.
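| $\theta^*$ | $\theta_{\delta_S}^*$ | Mean log-value for $\delta_S$, the short-lived cells declining rate | $\log(\log(2)/1.2) \simeq -0.54$ |
| | $\theta_{\phi_S}^*$ | Mean log-value for $\phi_S$, the antibodies influx from short-lived cells | $\log(2755) \simeq 7.92$ |
| | $\theta_{\phi_L}^*$ | Mean log-value for $\phi_L$, the antibodies influx from long-lived cells | $\log(16) \simeq 2.78$ |
| | $\theta_{\delta_{Ab}}^*$ | Mean log-value for $\delta_{Ab}$, the antibodies declining rate | $\log(\log(2)/24) \simeq -3.54$ |
| $\Psi^*$ | $\Psi_{\phi_S}^*$ | Inter individual variance for $\log(\phi_{S,i})$ | $0.92^2$ |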
$$\begin{cases} dG_i = \left( S_G (G_B - G_i) - X_i G_i \right) dt + \alpha_1\, dB_{1,t} \\ dI_i = \left( \gamma t (G_i - h) - m_i (I_i - I_B) \right) dt + \alpha_2\, dB_{2,t} \\ dX_i = -p_2 (X_i + S_I (I_i - I_B))\, dt + \alpha_3\, dB_{3,t} \end{cases} \qquad (15)$$
where the $B_{i,t}$ are Wiener processes and $(\alpha_1, \alpha_2, \alpha_3) = (2, 2, 2 \times 10^{-4})$ their diffusion
coefficients. We present the estimation results in Table 2 (right side). For ML, the
drop in coverage rate for $\theta_{S_G}^*$ and $\theta_{S_I}^*$ is even more striking when $\theta_{S_I}^*$ needs to be
estimated. This is explained by the effect of model misspecification, which increases
bias, and by the fact that ML does not take into account this new source of uncertainty,
which leads to under-estimation of variance and too narrow confidence intervals.
Our method achieved small bias, nominal coverages and small MSE for random
effects.
We consider the model presented in Pasin et al. (2019) to analyze the antibody
concentration, denoted $A_i$, generated by two populations of antibody-secreting
cells: the short-lived, denoted $S_i$, and the long-lived, denoted $L_i$:
$$
\begin{cases}
\dot{S}_i = -\delta_S S_i \\
\dot{L}_i = -\delta_L L_i \\
\dot{A}_i = \vartheta_{S,i} S_i + \vartheta_{L,i} L_i - \delta_{Ab,i} A_i \\
\left( S_i(0), L_i(0), A_i(0) \right) = \left( S_{i,0}, L_{i,0}, A_{i,0} \right).
\end{cases}
\tag{16}
$$
This model is used to quantify the humoral response in different populations after
an Ebola vaccine injection with a two-dose regimen, seven days after the second
injection, when the antibody-secreting cells enter a decreasing phase. Since these
cells are unobserved, the preceding equations can be simplified to focus on the
evolution of the antibody concentration:
$$
\begin{cases}
\dot{A}_i = \phi_{S,i} e^{-\delta_S t} + \phi_{L,i} e^{-\delta_L t} - \delta_{Ab,i} A_i \\
A_i(0) = A_{i,0}
\end{cases}
\tag{17}
$$
with $\phi_{S,i} := \vartheta_{S,i} S_{i,0}$ and $\phi_{L,i} := \vartheta_{L,i} L_{i,0}$.
This equation has an analytic solution, which will be used for ML. We consider the
following parametrization: $\log(\delta_S) = \theta_{\delta_S}$,
$\log(\phi_{S,i}) = \theta_{\phi_S} + b_{\phi_S,i}$,
$\log(\phi_{L,i}) = \theta_{\phi_L} + b_{\phi_L,i}$ and
$\log(\delta_{Ab,i}) = \theta_{\delta_{Ab}} + b_{\delta_{Ab},i}$. The true
parameter values are presented in Table 3. For the penalization term in our method,
we choose the values $\lambda_l = 10^3, 10^5, 10^7, 10^8$.
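To make the analytic solution mentioned above concrete, here is a minimal Python sketch. It assumes the simplified antibody equation has the drift appearing in (18), that is a linear ODE forced by two decaying exponentials, and that the decay rates are pairwise distinct; the closed form then follows from the variation of constants formula. The population values for $\phi_S$, $\phi_L$, $\delta_S$ and $\delta_{Ab}$ are the exponentials of the true values listed in Table 3, while the value of $\delta_L$, the initial condition and all function names are hypothetical. The closed form is compared with a classical fourth-order Runge-Kutta integration.

```python
import math


def antibody_analytic(t, phi_s, phi_l, delta_s, delta_l, delta_ab, a0):
    """Closed-form A(t); requires delta_ab != delta_s and delta_ab != delta_l."""
    decay = math.exp(-delta_ab * t)
    short = phi_s * (math.exp(-delta_s * t) - decay) / (delta_ab - delta_s)
    long_ = phi_l * (math.exp(-delta_l * t) - decay) / (delta_ab - delta_l)
    return a0 * decay + short + long_


def antibody_rhs(t, a, phi_s, phi_l, delta_s, delta_l, delta_ab):
    """Right-hand side of the simplified antibody ODE."""
    return (phi_s * math.exp(-delta_s * t)
            + phi_l * math.exp(-delta_l * t)
            - delta_ab * a)


def rk4_solve(a0, t_end, n_steps, *rates):
    """Integrate the ODE on [0, t_end] with the classical RK4 scheme."""
    h = t_end / n_steps
    a, t = a0, 0.0
    for _ in range(n_steps):
        k1 = antibody_rhs(t, a, *rates)
        k2 = antibody_rhs(t + h / 2, a + h * k1 / 2, *rates)
        k3 = antibody_rhs(t + h / 2, a + h * k2 / 2, *rates)
        k4 = antibody_rhs(t + h, a + h * k3, *rates)
        a += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return a


# phi_S, phi_L, delta_S, delta_L, delta_Ab (delta_L is a hypothetical value).
rates = (2755.0, 16.0, math.log(2) / 1.2, math.log(2) / 1000.0, math.log(2) / 24.0)
a0 = 500.0  # hypothetical initial antibody concentration

exact = antibody_analytic(100.0, *rates, a0)
numeric = rk4_solve(a0, 100.0, 10000, *rates)
print(exact, numeric)
```

Having the closed form available avoids a numerical ODE solve inside each likelihood evaluation, which is presumably why it is convenient for ML.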
According to Pasin et al. (2019), 𝛿L was non-identifiable based on the available
data and only a lower bound has been derived for it via profiled likelihood. So, to
make fair comparisons between our approach and maximum likelihood, we do not
Table 4 Results of estimation for model (17). The subscripts denote the estimation scenarios: the subscript δS is used when θ_δS is set to θ*_δS, and no subscript is used when θ_δS is estimated. The superscript ML denotes maximum likelihood; estimators without it are from our method.

                    Well-specified                                    | Misspecified
Param   Estimator   MSE     Bias    Ve      V̂       CR      MSE b_i   | MSE     Bias    Ve      V̂       CR      MSE b_i
θ_δS    θ̂^ML        2.13    0.78    1.51    70.64   0.92              | 3.88    1.48    1.68    4.10    0.80
        θ̂_δS        known                                             | known
        θ̂           0.62    -0.34   0.50    0.66    0.92              | 0.93    -0.40   0.77    0.62    0.90
θ_φS    θ̂^ML_δS     4e-4    0.01    3e-4    3e-4    0.94              | 1e-3    0.02    1e-3    5e-4    0.91
        θ̂^ML        0.01    -0.05   7e-3    0.40    0.92              | 0.02    -0.10   0.01    0.02    0.88
        θ̂_δS        2e-3    -0.05   2e-4    1e-3    0.94              | 7e-4    -0.02   3e-4    1e-3    0.92
        θ̂           2e-3    1e-3    2e-3    2e-3    0.93              | 4e-3    -6e-3   3e-3    0.01    0.90
θ_φL    θ̂^ML_δS     3e-3    0.02    3e-3    2e-3    0.95              | 5e-3    0.03    4e-3    3e-3    0.93
        θ̂^ML        4e-3    0.03    4e-3    3e-3    0.90              | 9e-3    0.05    7e-3    4e-3    0.90
        θ̂_δS        7e-4    -0.01   5e-4    3e-3    0.95              | 2e-3    -0.02   3e-3    2e-3    0.97
        θ̂           3e-3    -3e-3   3e-3    2e-3    0.91              | 6e-3    -8e-3   6e-3    7e-3    0.90
θ_δAb   θ̂^ML_δS     7e-4    -0.02   5e-4    3e-4    0.93              | 2e-3    -0.03   1e-3    1e-3    0.92
        θ̂^ML        2e-3    -0.02   1e-3    4e-4    0.88              | 4e-3    -0.04   3e-3    7e-4    0.88
        θ̂_δS        2e-4    0.01    1e-4    3e-4    0.95              | 3e-4    2e-3    3e-4    3e-4    0.96
        θ̂           4e-4    0.01    3e-4    2e-4    0.90              | 3e-4    8e-3    3e-4    2e-3    0.89
Ψ_φS    θ̂^ML_δS     0.04    -1e-3   0.04    0.07    1       0.15      | 0.05    0.03    0.05    0.08    1       0.17
        θ̂^ML        0.11    0.01    0.11    0.05    1       0.17      | 0.13    0.01    0.13    0.25    1       0.21
        θ̂_δS        0.02    8e-3    0.02    0.01    0.94    0.06      | 0.02    2e-3    0.02    0.02    0.94    0.11
        θ̂           0.02    -0.03   0.02    0.02    0.94    0.07      | 0.02    -0.05   0.02    0.03    0.92    0.08
Ψ_φL    θ̂^ML_δS     0.03    0.04    0.02    0.04    1       0.30      | 0.05    0.03    0.05    0.06    1       0.73
        θ̂^ML        0.03    0.05    0.02    0.04    1       0.60      | 0.03    0.05    0.02    0.07    1       0.74
        θ̂_δS        0.02    -0.1    5e-3    8e-3    0.93    0.07      | 0.02    -0.10   0.01    0.02    0.91    0.10
        θ̂           0.03    -0.06   0.02    0.01    0.92    0.08      | 0.03    -0.06   0.02    0.03    0.87    0.12
Ψ_δAb   θ̂^ML_δS     0.11    0.18    0.08    0.02    1       0.10      | 0.33    0.41    0.17    0.05    1       0.56
        θ̂^ML        0.20    0.29    0.11    0.02    1       0.50      | 0.30    0.34    0.19    0.05    1       0.69
        θ̂_δS        0.10    -0.30   0.01    0.01    0.95    0.03      | 0.10    -0.16   0.08    0.06    0.91    0.04
        θ̂           0.11    -0.27   0.04    0.04    0.95    0.04      | 0.15    -0.29   0.06    0.10    0.88    0.06
investigate this effect, we estimate the parameters when $\theta^*_{\delta_S}$
(1) is known (the corresponding estimators are denoted with the subscript
$\delta_S$), or (2) has to be estimated as well.
3.3.1 Well‑specified case
model error during the estimation of $\{b_i\}_{i=1,\ldots,n}$, not accounted for by
ML. This in turn explains why the variance $\Psi^*$ is better estimated with our
approach. In this mixed-effect context, this cause of model error is systematically
present and calls for the use of estimation methods that take it into account when
subject-specific parameters are critical for the practitioner.
The data are generated with a stochastically perturbed version of ODE (17):
$$
dA_i = \left( \phi_{S,i} e^{-\delta_S t} + \phi_{L,i} e^{-\delta_L t} - \delta_{Ab,i} A_i \right) dt + \alpha \, dB_t
\tag{18}
$$
where $B_t$ is a Wiener process and $\alpha = 10$ its diffusion coefficient. The
value of $\alpha$ has been chosen large enough to produce significantly perturbed
trajectories but small enough to ensure that ODE (17) is still a relevant
approximation for estimation purposes. The results are presented in Table 4 (right
side). Our method outperforms ML for the estimation of $\theta^*_{\delta_S}$ as well
as of $\{b^*_i\}_{i \in [1, n]}$ and their variances. However, we acknowledge that
this last simulation setting is challenging even for our approach, with coverage
rates around 90% for most parameters, below the theoretical rate of 95%.

Fig. 2 Mean trajectory for the Pasin et al. (2019) estimation (dashed line) and the optimal control approach
estimation (solid line). Shaded areas are the 95% confidence intervals
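To illustrate how trajectories of the perturbed model (18) can be generated, the following minimal Python sketch uses the Euler-Maruyama scheme. This discretization is an assumption made for illustration: the text does not state which scheme was used to simulate the data. The value $\alpha = 10$ and the population values of $\phi_S$, $\phi_L$, $\delta_S$ and $\delta_{Ab}$ come from the text and Table 3; the value of $\delta_L$, the initial condition, the time horizon, the step count and the function name are hypothetical.

```python
import math
import random


def simulate_perturbed_antibody(phi_s, phi_l, delta_s, delta_l, delta_ab,
                                a0, alpha, t_end, n_steps, seed=0):
    """Euler-Maruyama path of dA = drift(t, A) dt + alpha dB_t on [0, t_end]."""
    rng = random.Random(seed)  # local generator, so runs are reproducible
    dt = t_end / n_steps
    sqrt_dt = math.sqrt(dt)
    times, values = [0.0], [a0]
    a = a0
    for k in range(n_steps):
        t = k * dt
        drift = (phi_s * math.exp(-delta_s * t)
                 + phi_l * math.exp(-delta_l * t)
                 - delta_ab * a)
        # Brownian increment over one step has standard deviation sqrt(dt).
        a = a + drift * dt + alpha * sqrt_dt * rng.gauss(0.0, 1.0)
        times.append((k + 1) * dt)
        values.append(a)
    return times, values


# delta_l, a0, t_end and n_steps are hypothetical choices.
rates = dict(phi_s=2755.0, phi_l=16.0, delta_s=math.log(2) / 1.2,
             delta_l=math.log(2) / 1000.0, delta_ab=math.log(2) / 24.0)
times, noisy = simulate_perturbed_antibody(
    a0=500.0, alpha=10.0, t_end=100.0, n_steps=10000, seed=1, **rates)
_, smooth = simulate_perturbed_antibody(
    a0=500.0, alpha=0.0, t_end=100.0, n_steps=10000, seed=1, **rates)
print(noisy[-1], smooth[-1])
```

Setting `alpha=0.0` recovers an explicit Euler approximation of the unperturbed ODE, which gives a convenient reference curve against which the size of the perturbation can be judged.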
Fig. 3 Examples of fitted trajectories for both methods for four different random subjects. Dashed lines:
fitted ODE solutions from Pasin et al. (2019). Solid lines: optimal trajectories $X_{\hat{\theta}, \hat{b}_i}$
obtained with the optimal control approach. Shaded areas are the 95% confidence intervals

Fig. 4 (1) Top: estimated residual controls for each subject; (2) bottom: mean optimal control and 95%
confidence interval for the optimal controls. (a) Left: $u_{i,\hat{\theta}^P,\hat{b}_i^P,y_{i,0}}$ obtained from parameter estimation in Pasin

We use the presented estimation approach to address the same problem as Pasin
et al. (2019). This real data example is similar to the synthetic scenario considered
in Sect. 3.3. In brief, we use data from a phase I trial in East Africa evaluating the
effect of a heterologous anti-Ebola vaccine strategy in which Ad26.ZEBOV was
injected first and then MVA-BN-Filo, with a delay of 28 days between the two
doses. We consider a population of n = 28 individuals, with on average 5
measurements per subject. In order to ensure a fair comparison, we adopt a
Bayesian framework for $\theta = (\theta_{\delta_S}, \theta_{\phi_S}, \theta_{\phi_L}, \theta_{\delta_{Ab}})$
and use the same prior distribution as in the original paper:
$$
\pi(\theta) \sim N\left(
\begin{pmatrix} -1 \\ 0 \\ 0 \\ -4.1 \end{pmatrix},
\begin{pmatrix}
25 & 0 & 0 & 0 \\
0 & 100 & 0 & 0 \\
0 & 0 & 100 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\right).
$$
We set the mesh size to obtain 200 discretization points for each subject and we use
U = 10, i.e., a value lower than in the simulated data case because of the presence of
model error. We also log-transform the data to stabilize the measurement noise
variance. Using the transformation $\tilde{A}_i(t) := \log_{10} A_i(t)$ in
Equation (17), this leads to the following nonlinear model:
$$
\dot{\tilde{A}}_i(t) = \frac{1}{\ln(10)} \left( \phi_{S,i} e^{-\delta_S t} + \phi_{L,i} e^{-\delta_L t} \right) 10^{-\tilde{A}_i(t)} - \frac{\delta_{Ab,i}}{\ln(10)}.
\tag{19}
$$
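For completeness, the step from (17) to (19) can be spelled out with the chain rule. The short derivation below is a sketch that assumes (17) has the drift appearing in (18) and that $A_i(t) > 0$, so that the logarithm is well defined; it uses $1/A_i(t) = 10^{-\tilde{A}_i(t)}$.

```latex
\begin{aligned}
\dot{\tilde{A}}_i(t)
  &= \frac{d}{dt} \log_{10} A_i(t)
   = \frac{\dot{A}_i(t)}{\ln(10)\, A_i(t)} \\
  &= \frac{1}{\ln(10)} \,
     \frac{\phi_{S,i} e^{-\delta_S t} + \phi_{L,i} e^{-\delta_L t}
           - \delta_{Ab,i} A_i(t)}{A_i(t)} \\
  &= \frac{1}{\ln(10)}
     \left( \phi_{S,i} e^{-\delta_S t} + \phi_{L,i} e^{-\delta_L t} \right)
     10^{-\tilde{A}_i(t)}
     - \frac{\delta_{Ab,i}}{\ln(10)}.
\end{aligned}
```

The linear decay term of the original model becomes a constant in the transformed one, while the exponential influx terms become nonlinear in the state, which is why a pseudo-linear formulation is needed.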
We choose
$A_{\theta,b_i}(t, x) = \frac{1}{\ln(10)} \left( \phi_{S,i} e^{-\delta_S t} + \phi_{L,i} e^{-\delta_L t} \right) \frac{10^{-x}}{x}$
and $r_{\theta,b_i}(t) = -\frac{\delta_{Ab,i}}{\ln(10)}$ for the
pseudo-linear formulation of the model. In Table 5, we compare our estimates
with those presented in Pasin et al. (2019), obtained using the NIMROD software
(Prague et al. 2013). Both methods produce estimates with overlapping confidence
intervals for $\theta$, supporting the previously published results in terms of
antibody concentration dynamics over time. Still, significant differences appear in
the estimation of $(\Psi_{\phi_S}, \Psi_{\phi_L}, \Psi_{\delta_{Ab}})$, with lower
dispersion of the random effects in the optimal control approach. This is explained
by the fact that part of the variability is now carried by the subject-specific
perturbations $\{u_{i,\hat{\theta},\hat{b}_i(\hat{\theta})}\}_{i=1,\ldots,n}$. In
Fig. 2, we plot the mean curve for both estimation methods, that is, the solution of
ODE (19) with the value of $\theta$ given in Table 5 and random effects set to 0. The
mean evolutions are comparable between the two approaches. This is confirmed at the
individual level in Fig. 3.
Finally, our method can be used to assess model adequacy through the analysis of the
temporal evolution of $\{u_{i,\hat{\theta},\hat{b}_i(\hat{\theta})}\}_{i=1,\ldots,n}$,
estimated as byproducts of our method. In Sect. 2.1, we have also indicated that
perturbations $u_{i,\theta,b_i,x_{i,0}}$ can be computed for an arbitrary set
$(\theta, b_i, x_{i,0})$. In particular, we estimate
$\{u_{i,\hat{\theta}^P,\hat{b}_i^P,y_{i,0}}\}_{i=1,\ldots,n}$, the committed error
corresponding to $(\hat{\theta}^P, \hat{b}_i^P)$, the population and subject-specific
estimators obtained in Pasin et al. (2019). In Fig. 4, we plot both perturbation
sets. Our method leads to residual perturbations of smaller magnitude and narrower
confidence intervals. This means that our approach produces an estimate that
minimizes the committed model error for each subject, compared with a method based
only on a data-fitting criterion as in Pasin et al. (2019). Moreover, because the
confidence interval for the estimated perturbations is narrower, we find a mean
perturbation in the population that is statistically different from zero at the
beginning of the observation interval. This may indicate the presence of model
misspecification.
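The adequacy check described above, whether the mean perturbation differs from zero at a given time, can be sketched as follows in Python. This is only an illustration under stated assumptions: the pointwise interval is a normal-approximation interval for the mean across subjects, which is not necessarily how the intervals of Fig. 4 were computed, and the control values are made-up toy numbers, not estimates from the data. The function names are hypothetical.

```python
import math


def mean_and_ci(controls, z=1.96):
    """Pointwise mean and normal-approximation CI across subjects.

    controls[i][k] is the estimated perturbation of subject i at time index k.
    """
    n = len(controls)
    n_times = len(controls[0])
    summary = []
    for k in range(n_times):
        column = [u[k] for u in controls]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / (n - 1)  # sample variance
        half_width = z * math.sqrt(var / n)
        summary.append((mean, mean - half_width, mean + half_width))
    return summary


def excludes_zero(summary):
    """Flag time points whose interval does not contain zero."""
    return [not (low <= 0.0 <= high) for _, low, high in summary]


# Toy perturbations for 4 subjects at 3 time points (illustrative values only).
controls = [
    [0.9, 0.2, -0.1],
    [1.1, -0.1, 0.1],
    [1.0, 0.1, 0.0],
    [1.2, -0.2, 0.05],
]
summary = mean_and_ci(controls)
flags = excludes_zero(summary)
print(flags)
```

In this toy example only the first time point is flagged, mimicking the situation described in the text where the mean perturbation departs from zero at the beginning of the observation interval.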
5 Conclusion
6 Supplementary information
Acknowledgements Experiments presented in this paper were carried out using the PlaFRIM experimental
testbed, supported by Inria, CNRS (LABRI and IMB), Université de Bordeaux, Bordeaux INP and
Conseil Régional d’Aquitaine (see https://www.plafrim.fr/). This manuscript was developed under WP4
of EBOVAC3.
Funding This work has received funding from the Innovative Medicines Initiative 2 Joint Undertaking
under projects EBOVAC1 and EBOVAC3 (respectively grant agreements No 115854 and No 800176).
The IMI2 Joint Undertaking receives support from the European Union’s Horizon 2020 research and
innovation programme and the European Federation of Pharmaceutical Industries and Associations.
Code availability Our estimation method is implemented in R, and code reproducing the examples of
Sect. 3 is available in a GitHub repository
(https://github.com/QuentinClairon/NLME_ODE_estimation_via_optimal_control.git).
Declarations
Conflict of interest The authors have no conflicts of interest to declare that are relevant to the content of
this article.
References
Aliyu M (2011) Nonlinear H-Infinity Control, Hamiltonian Systems and Hamilton-Jacobi Equations,
CRC Press
Balelli I, Pasin C, Prague M et al (2020) A model for establishment, maintenance and reactivation of the
immune response after vaccination against Ebola virus. J Theor Biol 495:110254
Brynjarsdottir J, O’Hagan A (2014) Learning about physical parameters: the importance of model dis-
crepancy. Inverse Prob 30:24
Campbell D (2007) Bayesian collocation tempering and generalized profiling for estimation of param-
eters from differential equation models. PhD thesis, McGill University Montreal, Quebec
Cimen T (2008) State-dependent Riccati equation (SDRE) control: a survey. IFAC Proc 41:3761–3775
Cimen T, Banks S (2004) Global optimal feedback control for general nonlinear systems with nonquad-
ratic performance criteria. Syst Control Lett 53:327–346
Clairon Q (2020) A regularization method for the parameter estimation problem in ordinary differential
equations via discrete optimal control theory. J Stat Plan Inference 210:1–9
Clarke F (2013) Functional analysis, calculus of variations and optimal control, Graduate Texts in Math-
ematics, Springer, London
Comets E, Lavenu A, Lavielle M (2017) Parameter estimation in nonlinear mixed effect models using
saemix, an R implementation of the SAEM algorithm. J Stat Softw 80:1–42
De Gaetano A, Arino O (2000) Mathematical modelling of the intravenous glucose tolerance test. J Math
Biol 40(2):136–168
Donnet S, Samson A (2006) Estimation of parameters in incomplete data models defined by dynamical
systems. J Stat Plan Inference 137(9):2815–2831
Engelhardt B, Kschischo M, Fröhlich H (2017) A Bayesian approach to estimating hidden variables as
well as missing and wrong molecular interactions in ordinary differential equation-based mathemat-
ical models. J R Soc Interface 14(131):20170332
Engl H, Flamm C, Kügler P et al (2009) Inverse problems in systems biology. Inverse Prob 25(12):123014
Hooker G, Ellner SP, Roditi LD, Earn DJ (2011) Parameterizing state-space models for infectious disease
dynamics by generalized profiling: measles in Ontario. J R Soc Interface 8:961–974
Guedj J, Thiebaut R, Commenges D (2007) Maximum likelihood estimation in dynamical models of
HIV. Biometrics 63:1198–206
Gutenkunst RN, Waterfall J, Casey F et al (2007) Universally sloppy parameter sensitivities in systems
biology models. Public Libr Sci Comput Biol 3:e189
Hooker G, Ellner SP et al (2015) Goodness of fit in nonlinear dynamics: Misspecified rates or misspeci-
fied states? Ann Appl Stat 9(2):754–776
Huang Y, Dagne G (2011) A Bayesian approach to joint mixed-effects models with a skew normal distri-
bution and measurement errors in covariates. Biometrics 67:260–269
Huang Y, Lu T (2008) Modeling long-term longitudinal HIV dynamics with application to an AIDS clinical
study. Ann Appl Stat 2:1348–1408
Van Kampen NG (1992) Stochastic processes in physics and chemistry. Elsevier
Kennedy MC, O’Hagan A (2001) Bayesian calibration of computer models. J R Stat Soc Ser B (Stat
Methodol) 63(3):425–464
Kurtz T (1978) Strong approximation theorems for density dependent Markov chains. Stoch Process Appl
6:223–240
Lavielle M, Aarons L (2015) What do we mean by identifiability in mixed effects models? J Pharmacoki-
net Pharmacodyn 43:111–122
Lavielle M, Mentré F (2007) Estimation of population pharmacokinetic parameters of saquinavir in HIV
patients with the MONOLIX software. J Pharmacokinet Pharmacodyn 34:229–249
O’Leary T, Sutton A, Marder E (2015) Computational models in the age of large datasets. Curr Opin
Neurobiol 32:87–94
Lunn D, Thomas A, Best N et al (2000) WinBUGS: a Bayesian modelling framework: concepts, structure
and extensibility. Stat Comput 10:325–337
Lavielle M, Samson A, Karina Fermin A, Mentre F (2011) Maximum likelihood estimation of long terms
HIV dynamic models and antiviral response. Biometrics 67:250–259
Murphy S, van der Vaart A (2000) On profile likelihood. J Am Stat Assoc 95:449–465
Nash JC (2016) Using and extending the optimr package
Pasin C, Balelli I, Van Effelterre T et al (2019) Dynamics of the humoral immune response to a prime-boost
Ebola vaccine: quantification and sources of variation. J Virol 93(18):e00579-19
Perelson A, Neumann A, Markowitz M et al (1996) HIV-1 dynamics in vivo: virion clearance rate,
infected cell life-span, and viral generation time. Science 271:1582–1586
Pinheiro J, Bates DM (1994) Approximations to the loglikelihood function in the nonlinear mixed effects
model. J Comput Graph Stat 4:12–35
Prague M, Commenges D, Drylewicz J et al (2012) Treatment monitoring of HIV-infected patients based
on mechanistic models. Biometrics 68(3):902–911
Prague M, Commenges D, Guedj J et al (2013) NIMROD: a program for inference via a normal
approximation of the posterior in models with random effects based on ordinary differential equations.
Comput Methods Programs Biomed 111:447–458
Raftery A, Bao L (2010) Estimating and projecting trends in HIV/AIDS generalized epidemics using
incremental mixture importance sampling. Biometrics 66:1162–1173
Ramsay J, Hooker G, Cao J et al (2007) Parameter estimation for differential equations: a generalized
smoothing approach. J R Stat Soc 69:741–796
Sartori N (2003) Modified profile likelihood in models with stratum nuisance parameters. Biometrika
90:533–549
Sontag E (1998) Mathematical control theory: deterministic finite-dimensional systems. Springer, New
York