A Paradigm for Data-Driven Predictive Modeling Using Field Inversion and Machine Learning

E.J. Parish, K. Duraisamy

Journal of Computational Physics 305 (2016) 758–774
Abstract

We propose a modeling paradigm, termed field inversion and machine learning (FIML), that seeks to comprehensively harness data from sources such as high-fidelity simulations and experiments to aid the creation of improved closure models for computational physics applications. In contrast to inferring model parameters, this work uses inverse modeling to obtain corrective, spatially distributed functional terms, offering a route to directly address model-form errors. Once the inference has been performed over a number of problems that are representative of the deficient physics in the closure model, machine learning techniques are used to reconstruct the model corrections in terms of variables that appear in the closure model. These reconstructed functional forms are then used to augment the closure model in a predictive computational setting. As a first demonstrative example, a scalar ordinary differential equation is considered, wherein the model equation has missing and deficient terms. Following this, the methodology is extended to the prediction of turbulent channel flow. In both of these applications, the approach is demonstrated to be able to successfully reconstruct functional corrections and yield accurate predictive solutions while providing a measure of model-form uncertainties.

Keywords: Data-driven modeling; Machine learning; Closure modeling
Even with the tremendous growth in computational power during the past decade, simulations based on first-principles
models of physical systems remain prohibitively expensive for most practical problems. As a result, one has to rely on
coarse-grained models to characterize or predict the overall state of a complex system or its statistical properties. Derivation
of the more affordable models, however, involves a number of additional assumptions that can limit their accuracy. The
pursuit of accurate closures in coarse-grained or intermediate/low-fidelity models is typically a central issue and pacing item
in many scientific disciplines. At the same time, it is becoming feasible to run first-principles or high-fidelity simulations
under idealized conditions and in some regimes of interest. Concurrently, experimental techniques have evolved to a point
where high resolution information can be provided at many scales, including those that are of direct relevance to problems
of interest for physicists and engineers. Against this backdrop, data mining techniques have already made their mark in many
disciplines of science and engineering by providing improved physical insight as well as quantitative data for modeling.
Unprecedented opportunities exist in going one step further and directly utilizing available data to improve and generate
predictive models that can be used in practical analysis and design in a robust manner.
Physical modeling has always been data-driven to a degree. Typically, a theory or set of theories is formulated, and unknown model coefficients and functions are empirically determined by correlating the response of the model with available data. Over the past decade, more formal calibration procedures have emerged in many different fields of application.
Discipline-specific reviews are presented in Navon [1] for data assimilation in weather forecasting and in Kerschen et al. [2] for system identification in structural dynamics. Given some data $G_d$ and a model output $G_m(Q, \lambda)$, where $Q$ are state variables and $\lambda$ are model parameters, frequentist (typically least-squares) or Bayesian procedures are formulated to infer optimal values of $\lambda$. This is usually accomplished by minimizing the difference between the data and the model output. The least-squares procedure is conceptually simple and can offer probability measures on the output, whereas the Bayesian approach can be formulated more rigorously and can account for a more precise prescription of prior knowledge and probability structure, though at a much higher expense. Nevertheless, both types of techniques have been successfully used for parameter estimation.
Errors in the underlying structure of the model may result in inadequacies [3], even if the best possible set of parameters has been inferred; most continuum models have this issue in one form or the other. Predicted values of the outputs may never match the true value in a deterministic or statistical sense for many models. A widely used approach to address model inadequacy is the Bayesian calibration framework of Kennedy and O’Hagan (Ref. [4] and its derivatives [5,6], etc.). The essence of their approach is to represent the output quantity of interest as the model output augmented by a data-informed discrepancy term,

$$ G_d = G_m(Q, \lambda) + \delta + \epsilon, $$

where $\delta$ is the model discrepancy and $\epsilon$ is the observational error.
1. Mathematical setting
Consider a physical system that is governed by a set of non-linear equations (partial differential or otherwise). The truth-model of the system is given by

$$ \mathcal{R}_T(Q_T) = 0, $$

along with well-posed initial and boundary conditions. The operator $\mathcal{R}_T$ contains the governing equations of the system while $Q_T$ contains the model variables. The physical system is modeled by a lower-fidelity set of equations, $\mathcal{R}(Q) = 0$, that involves a closure model; in this paper, the focus is on the closure model and not on issues such as numerical discretization errors, initial/boundary condition uncertainties, etc. To account for the deficiencies of the closure, a spatially distributed corrective function $\beta(\eta)$, treated as a random function $\beta(\eta, \omega)$, is introduced into the model equations, yielding a stochastic system, Eq. (5).
2. Field inversion
The challenge of creating the stochastic system in Eq. (5) is in the estimation of the distribution of the function β(η).
Model discrepancy inhibits direct extraction of the functional form of β(η) from available data. Instead, an inverse problem
is posed to infer the distribution of β such that realizations of the stochastic system are consistent with underlying physics;
potentially both in the mean and higher order statistics. Bayesian inversion is used to obtain β(ω) in the form of functional
corrections. The functional correction is obtained by inferring β at every grid point in the computational domain. The
process of inferring β can be thought of as follows: We start by having an estimate of β , along with a certain amount
of confidence in that estimate. This is the prior probability of β , given by p (β). Next, we observe an external system and
obtain an observational dataset d along with its associated uncertainty. For a given β , there is some probability that the
model will reproduce the dataset d. This is given by the likelihood function h(d|β). Given p (β) and h(d|β), there exists
some probability of β given the observations d. This is the posterior probability q(β|d). The goal of the inverse is to obtain
q(β|d). Mathematically, the posterior probability distribution is given by Bayes’ theorem

$$ q(\beta \mid d) = \frac{h(d \mid \beta)\, p(\beta)}{c}, \qquad (6) $$

where $c = \int h(d \mid \beta)\, p(\beta)\, d\beta$. The solution of Eq. (6) can be made tractable via assumptions regarding the distributions of $d$ and $\beta$. In the case that the dataset $d$ and the random function $\beta$ are Gaussian, and the likelihood $h(d \mid \beta)$ is Gaussian, it can be shown [18] that the problem of determining the distribution of $\beta$ reduces to estimating the maximum a posteriori (MAP) solution, which is found by solving the deterministic optimization problem

$$ \beta_{map} = \arg\min_{\beta}\ \frac{1}{2}\left[\big(d - h(\beta)\big)^{T} C_m^{-1} \big(d - h(\beta)\big) + \big(\beta - \beta_{prior}\big)^{T} C_\beta^{-1} \big(\beta - \beta_{prior}\big)\right], \qquad (7) $$
where Cm and Cβ are the observational and prior covariance matrices, respectively. The observational covariance is deter-
mined by the statistics of the observed dataset d and the prior covariance is determined by prior knowledge of the system.
The parameter-to-observable map h(β) is a subset of the governing equations. The parameter being optimized in Eq. (7) is β. The term being minimized is referred to as the cost function J, i.e.

$$ \mathcal{J} = \frac{1}{2}\left[\big(d - h(\beta)\big)^{T} C_m^{-1} \big(d - h(\beta)\big) + \big(\beta - \beta_{prior}\big)^{T} C_\beta^{-1} \big(\beta - \beta_{prior}\big)\right]. \qquad (8) $$
The dimensionality of the optimization problem scales with the number of discrete parameters being optimized. In this
case, β is being optimized at every point in the domain so the dimensionality of the optimization problem scales with the
number of mesh points. In the linear case, the covariance of the posterior is given by the inverse of the Hessian of the cost
function J evaluated at the MAP point
$$ C_{\beta_{map}} = H_{\beta_{map}}^{-1} = \left[\frac{d^2 \mathcal{J}}{d\beta_i\, d\beta_j}\right]^{-1}_{\beta_{map}}. \qquad (9) $$
In the non-linear case, Eq. (9) becomes an approximation that is a result of a linearization about the MAP point. Once the
MAP solution and posterior covariance are found, realizations of β can be drawn from the posterior distribution. A standard
method is to perform a Cholesky decomposition on $C_{\beta_{map}}$ such that

$$ C_{\beta_{map}} = L L^{T}, \qquad (10) $$

with realizations then drawn as

$$ \beta = \beta_{map} + L s, \qquad (11) $$

where s is a vector, the components of which are independent standard normal variates.
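A sketch of this sampling step, assuming `C_beta_map` is symmetric positive definite:

```python
import numpy as np

def sample_posterior(beta_map, C_beta_map, n_samples, rng=None):
    """Draw realizations of beta via Eqs. (10)-(11)."""
    if rng is None:
        rng = np.random.default_rng()
    L = np.linalg.cholesky(C_beta_map)            # Eq. (10): C = L L^T
    s = rng.standard_normal((beta_map.size, n_samples))
    return beta_map[:, None] + L @ s              # Eq. (11): one column per sample
```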
It is well-recognized that the Gaussian assumption is a strong one. In a general setting, the data d and prior β may not
be truly Gaussian. In non-linear problems, even if d and the prior β are Gaussian, the posterior distribution may not be
Gaussian in nature. We adopt the Gaussian approximation to make the approach feasible in high-dimensional
problems. The error introduced by the Gaussian assumption is problem-dependent and can be estimated by analyzing
the model response, but in the non-linear case, the posterior distribution must be viewed as an approximation. Computing
non-Gaussian posterior distributions requires expensive sampling-based methods, such as Markov chain Monte Carlo (MCMC)
methods (see Appendix A). These methods are especially costly in high-dimensional state spaces. The inversion procedure
described in this paper is scalable, while sampling-based methods are not.
In the following subsections, different aspects of the inversion process are discussed.
For field inversion, the dimensionality of the inversion scales with the number of mesh points. Though simplifications
may be performed by constructing a surrogate representation of β over the computational domain, we pursue the more
detailed approach of estimating β at every grid point in the computational domain. The resulting optimization problem is
high-dimensional and efficient methods of minimizing the cost function are needed. Gradient-based methods are used to
solve the inverse problem in this work. These methods require derivatives with respect to a large number of parameters,
which are efficiently calculated using a discrete adjoint [19] formulation. To determine the gradient, the adjoint equation is
first solved for ψ ,
$$ \left[\frac{\partial R}{\partial Q}\right]^{T} \psi = -\left[\frac{\partial \mathcal{J}}{\partial Q}\right]^{T}, \qquad (12) $$
where J is the cost function, R is the residual of the primal equations, and Q are the model variables. The gradient is then
computed using
$$ G = \frac{d\mathcal{J}}{d\beta} = \frac{\partial \mathcal{J}}{\partial \beta} + \psi^{T} \frac{\partial R}{\partial \beta}. \qquad (13) $$
The optimization problem is solved using BFGS [20] or, in problematic cases, using steepest descent. The solution of the
adjoint equation requires the computation of the Jacobian of the primal equations, which is calculated analytically.
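Schematically, for a fully discrete system with dense Jacobians, Eqs. (12)–(13) can be written as follows; this is a sketch only, since in practice the Jacobians are sparse and problem-specific.

```python
import numpy as np

def adjoint_gradient(dRdQ, dRdb, dJdQ, dJdb):
    """Discrete adjoint gradient of the cost function.

    dRdQ : (M, M) residual Jacobian w.r.t. the states Q
    dRdb : (M, N) residual Jacobian w.r.t. the parameters beta
    dJdQ : (M,)   partial derivative of J w.r.t. the states
    dJdb : (N,)   partial derivative of J w.r.t. the parameters
    """
    psi = np.linalg.solve(dRdQ.T, -dJdQ)  # adjoint solve, Eq. (12)
    return dJdb + dRdb.T @ psi            # gradient G = dJ/dbeta, Eq. (13)
```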
To determine the posterior covariance, a Hessian calculation is required. In this work, an adjoint-adjoint method is used
to compute the Hessian. For a system with M discrete model variables and N optimization parameters (the model variables
are Q at each grid point and the optimization parameter is β at each grid point), the Hessian is computed by [21]:
$$ H_{ij} = \frac{\partial^2 \mathcal{J}}{\partial \beta_i \partial \beta_j} + \psi_m \frac{\partial^2 R_m}{\partial \beta_i \partial \beta_j} + \mu_{i,m} \frac{\partial R_m}{\partial \beta_j} + \nu_{i,n} \frac{\partial^2 \mathcal{J}}{\partial Q_n \partial \beta_j} + \nu_{i,n}\, \psi_m \frac{\partial^2 R_m}{\partial Q_n \partial \beta_j}, \quad m, n \in [1, M], \qquad (14) $$

where

$$ \frac{\partial R_m}{\partial Q_n}\, \nu_{i,n} = -\frac{\partial R_m}{\partial \beta_i}, \quad m, n \in [1, M],\ i \in [1, N], \qquad (15) $$

$$ \frac{\partial R_m}{\partial Q_k}\, \mu_{i,m} = -\frac{\partial^2 \mathcal{J}}{\partial \beta_i \partial Q_k} - \psi_m \frac{\partial^2 R_m}{\partial \beta_i \partial Q_k} - \nu_{i,n} \frac{\partial^2 \mathcal{J}}{\partial Q_n \partial Q_k} - \nu_{i,n}\, \psi_m \frac{\partial^2 R_m}{\partial Q_n \partial Q_k}, \quad k, m, n \in [1, M],\ i \in [1, N]. \qquad (16) $$
A low-rank approximation is useful for a diagonal observational covariance matrix. A diagonal covariance assumes that
the data are uncorrelated. This is a reasonable approximation when measurement error dominates the observational vari-
ance. It is additionally relevant for cases where data is not available to build a complete covariance matrix. For a cost
function with a diagonal observational covariance matrix and no prior, the cost function simplifies to
$$ \mathcal{J} = \frac{1}{2} \sum_{i=1}^{M} \left( \frac{h(\beta)_i - d_i}{\sigma_i} \right)^{2}. \qquad (17) $$
To approximate the Hessian, M scalar-valued functions are defined,

$$ f_i(\beta) = \frac{h(\beta)_i - d_i}{\sigma_i}, \quad i = 1, 2, \ldots, M. \qquad (18) $$
The gradient of the non-regularized cost function can be computed as

$$ \nabla \mathcal{J}(\beta) = \frac{1}{2} \sum_{i=1}^{M} \nabla \big(f_i(\beta)^{2}\big) = \sum_{i=1}^{M} f_i(\beta)\, \nabla f_i(\beta), \qquad (19) $$

or, equivalently, in terms of the Jacobian $J_f$ of the residual vector $f = (f_1, \ldots, f_M)^{T}$,

$$ \nabla \mathcal{J}(\beta) = J_f(\beta)^{T} f(\beta). \qquad (20) $$

The Jacobian of the scalar-valued functions can then be used to approximate the Hessian: the exact Hessian is

$$ H(\beta) = J_f(\beta)^{T} J_f(\beta) + \sum_{i=1}^{M} f_i(\beta)\, \nabla^2 f_i(\beta), \qquad (21) $$

and neglecting the second-order term yields the Gauss–Newton approximation

$$ H(\beta) \approx J_f(\beta)^{T} J_f(\beta). \qquad (22) $$
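In code, the Gauss–Newton approximation amounts to a single matrix product of the residual Jacobian with itself; a small sketch:

```python
import numpy as np

def scaled_residuals(h_beta, d, sigma):
    """f_i of Eq. (18): componentwise scaled mismatch."""
    return (h_beta - d) / sigma

def gauss_newton_hessian(Jf):
    """Eq. (22): H ~ Jf^T Jf, with Jf the (M, N) Jacobian of f."""
    return Jf.T @ Jf
```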
In the case of an uninformed prior with a high covariance, the magnitude of the observational covariance will generally
be much less than that of the prior. The solution of the inverse problem is expectedly sensitive to the specification of Cm . In
general, Cm is determined by the statistics of the observational data. The availability of observational statistics varies from
case to case, and it is thus important to quantify the performance of the inversion for different forms of Cm. In this work,
three different models are considered. The simplest possible model we consider takes the form

$$ C_m = \sigma_{obs}^{2} I, \qquad (23) $$

where $\sigma_{obs}$ is a scalar that is representative of some mean variance of the observations. Such a model neglects all covariances. The second model considered is an extension of the above,

$$ C_m = \boldsymbol{\sigma}_{obs}^{2} I, \qquad (24) $$
where $\boldsymbol{\sigma}_{obs}$ is a vector containing the variances for each observation. This model assumes that $\boldsymbol{\sigma}_{obs}$ can be determined from
available data. The third model considered assumes the availability of a complete set of statistics, in which case the exact
covariance matrix of the data vector D is given by
$$ C_{m,ij} = E\big[(D_i - \bar{D}_i)(D_j - \bar{D}_j)\big], \qquad (25) $$

where $\bar{D}$ denotes the mean of the data vector.
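Given an ensemble of observed realizations (rows are realizations, columns are observation locations), the three covariance models can be estimated as in the following sketch:

```python
import numpy as np

def cov_scalar(samples):
    """Eq. (23): a single mean variance for all observations."""
    return np.mean(np.var(samples, axis=0)) * np.eye(samples.shape[1])

def cov_diagonal(samples):
    """Eq. (24): per-observation variances on the diagonal."""
    return np.diag(np.var(samples, axis=0))

def cov_full(samples):
    """Eq. (25): complete empirical covariance of the data vector."""
    return np.cov(samples, rowvar=False)
```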
3. Machine learning
The inversion produces solutions of the correction β that are in a spatio-temporal (or in the present work, spatial) form,
for specific problems. If the inversion is performed over a large number of problems and objective functions, problem-specific inference can be converted to general modeling knowledge via supervised machine learning [22,23] algorithms. These techniques can be used to elicit the functional relation β(η), where η(Q) are the model input features. The examples provided
in this paper use Gaussian Processes [24] (GPs), and the interaction between the inverse and ML formulation is limited to
variances, i.e. the ML algorithm does not use the entire covariance matrix generated in the inversion. In the GP formulation,
it is assumed that the output function at the training points, $Y_{train}$ (where $Y_{train}$ consists of inferred spatio-temporal fields of β for a wide class of problems), is drawn from the distribution

$$ Y_{train} \sim \mathcal{N}\big(0,\ \phi(\eta_{train}, \eta_{train}) + \lambda\big), \qquad (27) $$

where $\phi$ is the kernel (covariance) matrix evaluated at the training inputs and $\lambda$ is a noise covariance. The predictive mean and variance at a test input $\eta_{test}$ then follow the standard Gaussian process regression formulas [24],

$$ \bar{y}_{test} = \boldsymbol{\phi}^{T} (\phi + \lambda)^{-1} Y_{train}, \qquad \sigma^{2}_{test} = \phi(\eta_{test}, \eta_{test}) - \boldsymbol{\phi}^{T} (\phi + \lambda)^{-1} \boldsymbol{\phi}, $$

where $\boldsymbol{\phi}$ is a vector whose elements are $\phi(\eta_{train,i}, \eta_{test})$. It is noted that in the current implementation the machine learning process only returns variances. The covariance of the training data assumed in Eq. (27) is a mathematical construct designed to help the machine learning process and has no direct relation to the true covariance of the training data.
The hyperparameters h in Eq. (27) are found by maximizing the probability of obtaining an output distribution Ytrain
given input features η and hyperparameters h. This is done by maximizing the log marginal likelihood function,
$$ \log p(Y_{train} \mid \eta_{train}, h) = -\frac{1}{2}\log\big|\phi + \lambda\big| - \frac{1}{2}\, Y_{train}^{T} (\phi + \lambda)^{-1} Y_{train} - \frac{N}{2}\log(2\pi), \qquad (31) $$
where N is the number of training points.
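A compact sketch of the GP pieces used here, with a squared-exponential kernel standing in for φ (the kernel choice is an assumption, as the text does not state it):

```python
import numpy as np

def kernel(X1, X2, ls=1.0, amp=1.0):
    """Squared-exponential covariance (an assumed form of phi)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) / ls) ** 2
    return amp**2 * np.exp(-0.5 * d2.sum(axis=-1))

def log_marginal_likelihood(X, y, ls, amp, noise):
    """Eq. (31) for a zero-mean GP with i.i.d. noise variance `noise`."""
    K = kernel(X, X, ls, amp) + noise * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^-1 y
    return (-np.log(np.diag(L)).sum()                    # -0.5 log|K|
            - 0.5 * y @ alpha                            # -0.5 y^T K^-1 y
            - 0.5 * len(y) * np.log(2 * np.pi))

def predict(X, y, Xs, ls, amp, noise):
    """Predictive mean and variance at the test inputs Xs."""
    K = kernel(X, X, ls, amp) + noise * np.eye(len(y))
    ks = kernel(X, Xs, ls, amp)
    Kinv_ks = np.linalg.solve(K, ks)
    mean = Kinv_ks.T @ y
    var = amp**2 - np.einsum('ij,ij->j', ks, Kinv_ks)
    return mean, var
```

The hyperparameters (ls, amp, noise) play the role of h and are chosen by maximizing log_marginal_likelihood.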
4. Model problem: non-linear heat conduction

The framework is now applied to a scalar non-linear ordinary differential equation that resembles one-dimensional heat
conduction with radiative and convective heat sources. The “true” model is taken to be
$$ \frac{d^2 T}{dz^2} = \varepsilon(T)\big(T_\infty^4 - T^4\big) + h\,(T_\infty - T), \quad z \in [0, 1], \qquad (32) $$
with homogeneous boundary conditions. We will refer to ε as the emissivity of the material and h as the convection
coefficient. In the true process, the emissivity is a stochastic non-linear function of temperature and is given by
$$ \varepsilon(T) = \left[1 + 5\sin\!\left(\frac{3\pi}{200}\,T\right) + \exp(0.02\,T) + \mathcal{N}(0,\,0.1)\right] \times 10^{-4}. \qquad (33) $$

Fig. 1. Solutions of the base model compared to the mean of the true process.

Table 1. Summary of the conditions used for the inversion.
The convection coefficient is taken to be a constant of h = 0.5. To demonstrate the framework, we consider the case where
the true process (Eqs. (32) and (33)) is unknown. The process is imperfectly modeled by
$$ \frac{d^2 T}{dz^2} = \varepsilon_0 \big(T_\infty^4(z) - T^4\big), \qquad (34) $$
with ε0 = 5 × 10−4 . Eq. (34) will be referred to as the base model. The resulting model outputs are shown in Fig. 1. The
model particularly suffers when T ∞ is low, where the ignored linear term is significant. The inverse is posed by adding a
spatial multiplier to ε0
$$ \frac{d^2 T}{dz^2} = \beta(z)\, \varepsilon_0 \big(T_\infty^4(z) - T^4\big). \qquad (35) $$
The goal of the framework is to obtain β( z) from the inversion, and then to learn β = β( T , T ∞ ). Note that β will encapsulate
both the true form of ε and the convective heat transfer term. The true solution for β is
$$ \beta(T, T_\infty) = \beta_r + \beta_c = \frac{1}{\varepsilon_0}\left[1 + 5\sin\!\left(\frac{3\pi}{200}\,T\right) + \exp(0.02\,T) + \mathcal{N}(0,\,0.1)\right] \times 10^{-4} + \frac{h}{\varepsilon_0}\, \frac{T_\infty - T}{T_\infty^4 - T^4}. \qquad (36) $$
Synthetic data is generated by solving 100 realizations of the true process (Eq. (32)) for T ∞ ∈ [5, 10, . . . , 50]. The governing
equation is solved using second order central differences on a uniform mesh with 31 grid points. These synthetic data are
used as observational data for the inverse calculations. Note that the inversion is performed on the same computational
grid. A summary of the conditions used for the inversion is given in Table 1.
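The synthetic-data step can be reproduced with a few lines of NumPy/SciPy; the nonlinear solver below (fsolve) is an assumption, since the text does not specify one.

```python
import numpy as np
from scipy.optimize import fsolve

def solve_true_model(T_inf, n=31, h=0.5, rng=None):
    """One realization of the true process, Eqs. (32)-(33), discretized with
    second-order central differences on a uniform 31-point mesh with
    homogeneous boundary conditions."""
    if rng is None:
        rng = np.random.default_rng()
    dz = 1.0 / (n - 1)
    noise = rng.normal(0.0, 0.1, n)                 # stochastic part of Eq. (33)

    def residual(T_int):
        T = np.concatenate(([0.0], T_int, [0.0]))   # homogeneous BCs
        eps = (1 + 5*np.sin(3*np.pi/200*T) + np.exp(0.02*T) + noise) * 1e-4
        rhs = eps*(T_inf**4 - T**4) + h*(T_inf - T) # source terms of Eq. (32)
        lap = (T[:-2] - 2*T[1:-1] + T[2:]) / dz**2  # d2T/dz2
        return lap - rhs[1:-1]

    T_int = fsolve(residual, np.zeros(n - 2))
    return np.concatenate(([0.0], T_int, [0.0]))

# 100 realizations of the true process for each T_inf in {5, 10, ..., 50}
data = {Ti: np.stack([solve_true_model(Ti) for _ in range(100)])
        for Ti in range(5, 51, 5)}
```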
The inversion is performed using the various models for the observational covariance matrix Cm that were previously
discussed. For $C_m = \sigma_{obs}^{2} I$, the observational data is used to compute a single mean variance. For $C_m = \boldsymbol{\sigma}_{obs}^{2} I$, the observational data is used to compute the variance of temperature at each grid point. An uninformative prior is selected that
corresponds with the baseline model, i.e. βprior = 1. The prior variance is selected such that the 2σ limits of the prior PDF
of temperature encompass the observed solution. The prior PDF for temperature is determined by solving the forward model
for samples of βprior . In this case, the forward model was sampled 100 times. Elementary statistical formulae can be used
as a general guideline to determine the number of required samples. Given n statistically independent samples, the error on
the mean σ X and the error on the (co)variances σ S can be approximated by
$$ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}, \qquad \sigma_{S} = \sigma^{2} \sqrt{\frac{2}{n-1}}. \qquad (37) $$
For clarity, the entire solution process is outlined:
1. Sample the prior distribution of β via Equations (10) and (11) with the assumed prior covariance matrix Cβ .
(a) For each sample of β solve the model equation (Equation (35)) to determine distributions for temperature.
(b) Determine if Cβ was appropriately chosen by ensuring that the observed temperature profile falls within the ±2σ
limits of the distribution predicted by the model equation.
2. Solve the inverse problem with the various models for Cm by solving the optimization problem.
3. Sample the posterior distribution of βmap via Equations (10) and (11) with Cβmap .
(a) For each sample of β , solve the model equation to determine the posterior distributions for temperature about the
MAP point.
The results of the inversion for T ∞ = 50 are of the most interest and are shown in Fig. 2. It is first seen that the MAP
solution for temperature coincides with the observed value for all models. The MAP solution for β , however, only coincides
with the true solution when the complete observational covariance is used. For the diagonal models of Cm that ignore
covariance, the posterior variance is too high in the center of the domain, as seen in Figs. 2a and 2b. However, the posterior
variance is directly correlated to the local accuracy of βmap . When the complete observational covariance is used the correct
posterior distribution is inferred across the entire domain.
Fig. 3 gives a compiled summary of the inferred βmap for the 10 cases. For plotting purposes, βr and βc are extracted
from βmap and the uncertainty bounds are attached to βr . The missing and deficient terms in the model equation have been
effectively inferred, especially in Fig. 3c. For the lower-order representations of Cm , β was inferred correctly over most of
the domain; with error being present at low and high temperatures. This error is well reflected by the posterior variance.
The solutions using the complete observational covariance are seen to yield extremely accurate inferences for β and σ over
the entire domain.
Several conclusions can be drawn from the inference step. First, the performance of the inversion was comparable for the scalar model $C_m = \sigma_{obs}^{2} I$ and the diagonal model $C_m = \boldsymbol{\sigma}_{obs}^{2} I$. This shows that an accurate observational covariance is needed to correctly infer the posterior
distribution of β . Second, if a simpler observational covariance is used, the posterior distribution can still provide informa-
tion on the accuracy of the inference; but the posterior distribution is not representative of the underlying physics. Here
we make an important note that the objective of this model problem was to infer the correction β and its proper posterior
distribution. However, it is often desirable to infer a correction for a mean quantity (as in the next example). Under these
settings, the covariance matrices should be constructed differently than in the procedure described above. For example, $\sigma_{obs}^{mean} = \sigma_{obs}/\sqrt{N_{samples}}$ is a more appropriate standard deviation for mean quantities. In general, an arbitrarily low variance can always be set for the observable, in which case the resulting MAP solutions will be in closer agreement with the observed data (the discrepancies between the MAP and true solutions in Fig. 2 can be eliminated with this method).
In this section it was shown that, unless the correct observational covariance is used, the statistics of the resulting posterior distribution can be inaccurate. In the absence of the true observational covariance, a practical alternative is to carry out the inversion for low-order statistics such as mean quantities.
As described in Section 3, a machine learning algorithm utilizing GPs is used to elicit the functional relationship β( T , T ∞ )
from the spatial data generated in the inversion. The data generated from the inversion using the exact observational
covariance is used for training. It is noted that the ML formulation employed only makes use of the variance predicted
by the inverse, rather than the entire covariance matrix. The hyperparameters for the GP are optimized off-line, and then
the resulting model is injected into the solver at every iteration of the solution. At each iteration, the solver calls the residual calculation routine, which in turn calls the ML algorithm. The ML algorithm is queried with T and T∞ and returns βML and σML.
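Schematically, the injection looks like the following; `ml_model` is a hypothetical wrapper around the trained GP that returns (βML, σML) for given (T, T∞).

```python
import numpy as np

def residual_with_ml(T, T_inf, dz, eps0, ml_model):
    """Interior residual of Eq. (35), with beta supplied by the ML model
    at every solver iteration."""
    beta_ml, sigma_ml = ml_model(T, T_inf)         # query the trained GP
    rhs = beta_ml * eps0 * (T_inf**4 - T**4)
    lap = (T[:-2] - 2*T[1:-1] + T[2:]) / dz**2
    return lap - rhs[1:-1], sigma_ml
```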
The machine-learned predictive model was evaluated for a variety of new predictive cases. Table 2 gives a summary of
the conditions and the performance of the ML model. Case 1 provides examples with similar “physics” to the training cases (i.e. T∞ = constant), and the results of the implementation are excellent. Cases 2 and 3 explore the performance of the
model for regions of T ∞ > 50, for which there were no training data. For these cases, an improvement is observed over
the baseline model. Cases 4 and 5 explore the performance of the ML model for lower values of T ∞ , where the linear heat
transfer term becomes important. Again, it is seen that the performance of the model is much improved. The solutions for
Case 2 and Case 5 with the predictive model are given in Figs. 4 and 5. In both cases, the ML solution for temperature and
β is much improved from the base model, but error is still present. However, excellent correlation is seen between model error and the predicted variance. Similar correlations were seen for all cases, suggesting that the posterior variance is an indicator of local model accuracy. This feature is extremely useful in a predictive setting.

Fig. 2. Posterior model distributions at T∞ = 50 for each model of Cm. From left to right, T and β with 2σ limits, σ, and Cβmap are shown, respectively.

Table 2. Summary of cases used to test the predictive model. The L2 norm is used to compute the errors reported in column 5.
5. Application: turbulent channel flow

The modeling of turbulent flows has been a long-standing obstacle to the application of computational fluid dynamics
(CFD) to many practical problems. Direct numerical simulations (DNS) attempt to resolve all scales of turbulence but the
resolution requirements make this technique infeasible for most flows of engineering interest. To compute practical high
Reynolds number flows, near-wall modeling is performed using a Reynolds-averaged Navier–Stokes (RANS)-type closure.
RANS-based methods are typically formulated using a combination of theory and intuition. Traditionally, a number of free
parameters remain in the model; these are calibrated using empirical fitting and are often found to be deficient in many flows. The key issue is that the main source of error is in the functional form of the model terms. Functional relationships elicited directly from high-fidelity simulation or experimental data will not translate to RANS model improvements, since the inference has to be performed within the context of the model. The technique outlined in this section infers and reconstructs a correction that is consistent with the low-fidelity (RANS) model.

Fig. 3. Summary of posterior model distributions for each model of Cm. From left to right, βr, βc, and σ are shown.
The FIML framework is applied to turbulent channel flow with a k − ω turbulence model [25]. The Reynolds-averaged
momentum equation for incompressible fully-developed channel flow is given by
$$ \frac{\partial}{\partial y}\left(\mu \frac{\partial u}{\partial y} - \rho\, \overline{u'v'}\right) - \frac{\partial p}{\partial x} = 0, \qquad (38) $$
where p and u are the mean pressure and velocity, respectively. The process of Reynolds averaging introduces the unclosed Reynolds stresses, $\tau_{ij} = -\rho\, \overline{u_i' u_j'}$. Determining $\tau_{ij}$ is the fundamental challenge of turbulence modeling. The k − ω model
makes use of the Boussinesq approximation, where the Reynolds-stress tensor is assumed to take the form
$$ \tau_{ij} = 2\nu_t S_{ij} - \frac{2}{3} k \delta_{ij}, \qquad (39) $$
with νt being the turbulent eddy viscosity, k the turbulent kinetic energy, and S i j the mean strain-rate tensor. The still
unclosed turbulent eddy viscosity is then determined by introducing transport equations for the turbulent kinetic energy k
and the specific dissipation rate ω . On dimensional grounds, the turbulent eddy viscosity is modeled by
$$ \nu_t = C_\mu \frac{k}{\omega}, \qquad (40) $$
where C μ is a constant of proportionality. For the case of planar channel flow, the transport equations for k and ω become
ordinary differential equations of the form
$$ \nu_t \left(\frac{\partial u}{\partial y}\right)^{2} - \alpha^{*} k \omega + \frac{\partial}{\partial y}\left[\left(\nu + \sigma^{*} \frac{k}{\omega}\right) \frac{\partial k}{\partial y}\right] = 0, \qquad (41) $$

$$ \gamma \left(\frac{\partial u}{\partial y}\right)^{2} - \alpha \omega^{2} + \frac{\partial}{\partial y}\left[\left(\nu + \sigma \frac{k}{\omega}\right) \frac{\partial \omega}{\partial y}\right] = 0. \qquad (42) $$
The standard closure coefficients for the Wilcox k − ω model are used and are given in Table 3, along with the associated
boundary conditions for the channel flow. Equations (38) through (42) will be referred to as the base model. Numerically,
these governing equations are discretized with second-order finite differences and the system is solved by introducing
pseudo-time derivatives to the left hand side of Equations (38), (41), and (42). Implicit time integration is then used to
iterate the system to a steady state. The equations are solved on a geometrically graded mesh with the first grid point
placed well into the viscous sublayer at $y^+ \approx 0.05$. At the wall, the boundary condition for ω becomes singular, which is handled numerically by analyzing the asymptotic behavior [25] of ω.

Table 3. Summary of model coefficients and boundary conditions for the k − ω model in planar channel flow. The channel wall is at y = 0 and the mid-plane of the channel is at y = h/2.

    Cμ = 1.00,  α* = 0.09,  σ* = 0.6,  γ = 13/25,  α = 0.09,  σ = 0.5
    y = 0:    u = 0,  k = 0,  ω = ωw
    y = h/2:  ∂u/∂y = ∂k/∂y = ∂ω/∂y = 0
5.1. Inversion
The functional correction β( y ) is introduced as a multiplier to the production term in the turbulent kinetic energy
equation,
$$ \nu_t\, \beta(y) \left(\frac{\partial u}{\partial y}\right)^{2} - \alpha^{*} k \omega + \frac{\partial}{\partial y}\left[\left(\nu + \sigma^{*} \frac{k}{\omega}\right) \frac{\partial k}{\partial y}\right] = 0. \qquad (43) $$
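In discrete form, the corrected k-equation residual might be assembled as in the sketch below (central differences via numpy.gradient; coefficient values from Table 3); this is illustrative only, not the authors' solver.

```python
import numpy as np

def k_residual(u, k, om, beta, y, nu, alpha_star=0.09, sigma_star=0.6):
    """Residual of the beta-augmented k-equation, Eq. (43), on a mesh y."""
    nut = k / om                                   # Eq. (40) with C_mu = 1
    dudy = np.gradient(u, y)
    production = beta * nut * dudy**2              # corrected production term
    destruction = alpha_star * k * om
    flux = (nu + sigma_star * k / om) * np.gradient(k, y)
    return production - destruction + np.gradient(flux, y)
```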
Introducing β to the production term modifies the entire turbulence model, and is equivalent to adding an additive source
term. DNS data from Jimenez et al. [26] are used in the inverse modeling. Of this data, the velocity profiles are targeted;
i.e. d = uDNS. Since the DNS data provide a near-perfect observation of the truth, the observational covariance is taken to be $C_m = \sigma_{obs}^{2} I$, where $\sigma_{obs} = 10^{-10}$. This choice neglects covariance in the observed data. Constructing a more accurate
observational covariance, as was done in the previous problem, requires statistics that are not readily available from DNS.
In this case, the two-point correlation of the mean streamwise velocity in the wall normal direction could be used to build
an accurate observational covariance. However, a plausible alternative could be to use an approximation to the two-point
correlation to build a more accurate observational covariance.
The prior distribution is determined by the same process discussed previously, where σ p = 0.5 was selected such that
the DNS velocity profile falls within the 2σ limits of the prior model. With σobs = 10−10 and σ p = 0.5, the dependence on
the prior has been effectively eliminated for regions where the inferred function is sensitive to the data.
The inversion is performed for different wall-shear stress-based Reynolds numbers Reτ ∈ [180, 550, 950, 2000, 4200]. An
example of the resulting posterior distribution for inferred velocity and β is given in Fig. 6. The MAP solution is seen to
match the DNS data very well. Due to the low observational variance, the posterior distribution for velocity collapses on the
MAP solution. The posterior distribution for β also collapses on the MAP estimate for y + > 5. Turbulent production within
the viscous sublayer ( y + < 5) is very small and thus the value of β in this region is largely inconsequential and cannot be
inferred with a high degree of confidence.
A summary of the inferred corrections for all Reynolds numbers is given in Fig. 7. A universal scaling is seen with y +
within the inner layer, and with y near the center of the channel, both results being consistent with the underlying physics.
A Reynolds number dependence that is usually missed in traditional turbulence models is additionally observed. A detailed
summary of the inversion is provided in [27].
Gaussian processes are again used to extract the functional relationship β(η). The non-dimensional input features η considered in this process are the inverse solutions for $\{S k/\varepsilon,\ d\sqrt{k}/\nu,\ P/\varepsilon,\ y^+\}$ at Reτ ∈ [180, 550, 950, 4200]; Reτ = 2000 was omitted from this training data set. In the training process, only input features within the inner layer were considered.
It is well recognized that a functional relationship β(η) between the model correction and input features may not exist; or
minimally the accuracy of the functional extraction may vary across the solution space. Injecting the ML algorithm into the
solver may not be sufficient for cases where local ML predictions are highly inaccurate. One method to make an appropriate
model update is to consider an additional Bayesian update step after the machine learning has completed. This final Bayesian
update step (with the appropriate assumptions) is given by
$$ \beta_{post} = \arg\min_{\beta}\ \frac{1}{2}\left[\big(\beta - \beta_{ML}\big)^{T} C_{\beta_{ML}}^{-1} \big(\beta - \beta_{ML}\big) + \big(\beta - \beta_{prior}\big)^{T} C_{\beta_{prior}}^{-1} \big(\beta - \beta_{prior}\big)\right]. \qquad (44) $$
Note that C βML and βML are functions of both the inverse and the machine learning algorithms, while βprior is specified in
the inverse. The posterior model will assimilate to the ML model for regions where the variance of the ML model is low.
For regions of high variance, the posterior model will assimilate to the prior model. Although not shown, the results using
this method for the previous model problem were comparable to those reported in Table 2.
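Under the Gaussian assumptions, Eq. (44) has the closed-form solution of a precision-weighted average; a minimal sketch:

```python
import numpy as np

def final_update(beta_ml, C_ml, beta_prior, C_prior):
    """Minimizer of Eq. (44): where the ML variance is low the result follows
    beta_ml; where it is high the result reverts to beta_prior."""
    P_ml = np.linalg.inv(C_ml)          # ML precision
    P_prior = np.linalg.inv(C_prior)    # prior precision
    return np.linalg.solve(P_ml + P_prior,
                           P_ml @ beta_ml + P_prior @ beta_prior)
```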
The posterior model described above is now tested at Reτ = 2000. The ML model is queried once at the inverse solution
to construct a machine learned correction. This correction is applied during the predictive solution to obtain the final
posterior model. The impact of the final Bayesian update is seen in Fig. 8. The ML prediction performs well within the inner layer; as such, a low training variance is predicted and the posterior model assimilates to the ML model. In the channel core, the ML model is unable to extract an accurate functional relationship. This lack of accuracy is well reflected by the high ML variance in this solution region, as the posterior model assimilates to the prior model.

Fig. 6. Posterior model distributions for planar channel flow at Reτ = 934. The ±2σ limits are shaded in both figures.

Fig. 7. Summary of inferred β for Reτ ∈ [186, 547, 934, 2004, 4200].
Fig. 9 shows the resulting velocity predictions and the associated 95% confidence intervals. Two features worth noting
are the improved performance within the inner layer and the correlation between the confidence intervals and model error.
The MAP solution within the inner layer is much improved, in particular the slight bump that characterizes the buffer layer
is well captured in the posterior model. In the outer layer, however, the mean velocity is under-predicted. In this region,
a high ML variance was predicted and the posterior model reverts to the prior model. The failure to predict the increased
destruction of TKE in the channel core leads to an eddy viscosity that is too high and an under-prediction of velocity. It is worth
noting that the turbulent eddy viscosity predicted by the baseline k − ω model is too high in the channel core. When the
correct behavior within the inner layer is captured and the baseline model is used in the channel core, an under-prediction
in velocity is expected. The confidence intervals given by the posterior model again provide a reasonable estimate of the
underlying uncertainty of the model. While the resulting PDFs should not be viewed as exact, they provide information
about local model accuracy, which is of paramount value to the practitioner.
6. Summary

The wealth of available data from high-fidelity simulations and high-resolution experiments provides unprecedented opportunities to more comprehensively inform closure models. In this work, a data-driven modeling approach, which we refer to as FIML (field inversion and machine learning), was presented. The proposed approach moves beyond parameter
calibration and uses data to directly infer information about the functional form of model discrepancies. The inference
process generates function correction information for specific problems. Once the inference is applied over a number of
problems, machine learning is used to reconstruct the inferred function in terms of variables that will be available during
predictive simulations using lower fidelity models. This step aims to create generic modeling knowledge from the inferred
information. The reconstructed function is then embedded into a predictive solution process. In contrast to existing calibra-
tion frameworks, our approach uses data to directly infer information about underlying model discrepancies and provides a
methodology to generalize the inferred information. This approach provides insight into model error at a fundamental level,
rather than at the level of the output.
The framework was applied to a scalar non-linear ODE model problem, in which missing and deficient terms were re-
constructed and the predictive capability of the improved model was confirmed. In a second application, the methodology was extended to
turbulent channel flow, where DNS data was used to inform a standard Reynolds-averaged closure model. While it was
shown that precise observational statistics may be needed to precisely quantify the posterior distribution, simple approxi-
mations for the prior statistics and linearized Gaussian assumptions for the posterior proved to be sufficient to obtain mean
solutions and posterior distributions that are representative of the modeling error.
The field inversion process directly provides comprehensive information about model discrepancies, which is of great use
to the modeler in the quest to formulate more accurate closures. The machine learning step could be considered as one tool
that can be used to reconstruct the discrepancy in terms of low fidelity model information. It was demonstrated that, for
the simple problems considered, it is possible to use machine learning methods to elicit functional relationships and the
associated uncertainties for the corrections obtained in the inference process. This extraction allows for predictive modeling.
The examples in this paper are illustrative in nature. For the framework to be able to offer improved predictions in
practical situations, inverse problems must be solved over a wide class of problems (and over multiple objective functions
of interest) that will be representative of the deficient physics in the baseline model. Concurrently, the tendency of the
learning process to over-fit data must also be avoided. At every stage of the process, the underlying physical insight is
irreplaceable and thus it is left to the modeler to make judicious choices about the data, prior information and introduction
of one or more correction functions. Further, physical considerations such as realizability and consistency with asymptotic
limits should be enforced.
A number of challenges remain for a full-scale implementation in complex problems. These include grid/numerical
scheme dependence of the inferred corrections, solver convergence, scalability, learning errors, accounting for non-Gaussian
behavior, etc. The present work has nevertheless demonstrated that the FIML method can play a significant role in using
data to more comprehensively inform predictive models, offering a route to creating improved closure approximations while
providing measures of model-form uncertainties.
Acknowledgements
This work was supported by NASA LEARN project NNX15AN98A and by the NSF via grant 1507928.
Appendix A. Assessment of the Gaussian assumption via MCMC

The inverse procedure outlined in this paper makes strong assumptions about the Gaussian nature of the underlying
PDFs. In our approach, it is assumed that distributions for the prior probability p (β), observational data d, likelihood h(d|β),
and conditional probability q(β|d) are all Gaussian. In the linear case, if d and p (β) are Gaussian, the posterior distribution
will be Gaussian. This need not be true in the non-linear case. To accurately infer the posterior PDF in the non-linear case,
more expensive methods (such as sampling) need to be utilized. Since the non-linear heat problem presented in Section 4 is
relatively simple, Markov chain Monte Carlo (MCMC) simulations were performed. Sampling was carried out with the Python package PyMC [28], which utilizes the Metropolis–Hastings algorithm, to determine the posterior distribution. Fig. 10 shows
the posterior distribution determined by MCMC sampling for T ∞ = 50 with the complete observational covariance matrix
(Eq. (25)) and compares it to the MAP solution obtained through Bayesian inversion. It is seen that the comparison between
the two methods is excellent. The MAP solution coincides almost perfectly with the mean MCMC solution. Additionally, the
posterior PDFs are Gaussian across the entire domain and compare well with the MAP solution. While it is not prudent
to generalize these results to other non-linear problems, the Gaussian assumption appears to be reasonable in this specific
problem.
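For reference, a generic random-walk Metropolis sampler of the kind used for this comparison (a sketch; the actual computations used PyMC, and this is not the PyMC API):

```python
import numpy as np

def metropolis(log_post, beta0, n_steps=50_000, step=0.05, rng=None):
    """Random-walk Metropolis-Hastings; log_post evaluates the unnormalized
    log posterior, log h(d|beta) + log p(beta)."""
    if rng is None:
        rng = np.random.default_rng()
    beta = np.asarray(beta0, dtype=float).copy()
    lp = log_post(beta)
    chain = np.empty((n_steps, beta.size))
    for i in range(n_steps):
        prop = beta + step * rng.standard_normal(beta.size)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept/reject step
            beta, lp = prop, lp_prop
        chain[i] = beta
    return chain
```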
Appendix B. Computational cost

The primary computational cost in the FIML framework arises from the functional inversion and the construction of the machine-learned model, both of which are off-line processes. The inverse problem requires the solution of a high-dimensional
optimization problem, which could be simplified by using a parametric representation of the function β . The number of
iterations required by optimization algorithms varies from problem to problem, but the computational cost is O (100) solves
of the forward model. In the presented work, the forward model is additionally solved for realizations of β . This sampling
process is performed for both the prior and posterior distributions. The number of samples required depends on the de-
sired accuracy of statistical quantities, but O (100) samples serves as a representative estimate. Additionally, the sampling
process is embarrassingly parallel. Note that the number of forward solves required for sampling and optimization (assum-
ing gradient-based methods) does not scale with dimensionality. Constructing the Gaussian Process ML model requires an
N × N inversion (with N being the number of training points). Although this is an off-line process, it can become pro-
hibitively expensive for very large training sets, demanding sparse and approximate solvers. Additionally, determining the
GP hyperparameters requires the matrix inversion at each iteration of the optimization algorithm. For high-dimensional
learning problems, more efficient ML algorithms (such as neural networks or approximate GPs [23]) need to be considered.
The on-line cost of the FIML framework is realized in the evaluation of the ML model. In the case of GPs, this evalua-
tion requires a matrix-vector multiplication. In practice, the introduction of the ML model may have an impact on solver
convergence and stability, both of which could affect the computational cost.
Fig. 10. The posterior distribution of β as obtained through MCMC is compared to the MAP solution. The upper left figure compares the mean solution
obtained with MCMC and the 95% confidence intervals. The upper right figure shows the PDF for β at z = 0.1. The lower left and right figures show the
PDF for β at z = 0.5 and z = 0.96 respectively.
References
[1] I.M. Navon, Data assimilation for numerical weather prediction: a review, in: Data Assimilation for Atmospheric, Oceanic, and Hydrologic Applications,
Springer, 2009.
[2] G. Kerschen, K. Worden, A. Vakakis, J. Golinval, Past, present, and future of system identification in structural dynamics, Mech. Syst. Signal Process. 20
(2006) 505–592.
[3] N.R. Council, Assessing the Reliability of Complex Models: Mathematical and Statistical Foundations of Verification, Validation, and Uncertainty Quan-
tification, National Academies Press, 2012.
[4] M.C. Kennedy, A. O’Hagan, Bayesian calibration of computer models, J. R. Stat. Soc., Ser. B, Stat. Methodol. 63 (3) (2001) 425–464.
[5] M.C. Kennedy, C.W. Anderson, S. Conti, A. O’Hagan, Case studies in Gaussian process modelling of computer codes, Reliab. Eng. Syst. Saf. 91 (10) (2006)
1301–1309.
[6] S. Conti, J.P. Gosling, J.E. Oakley, A. O’hagan, Gaussian process emulation of dynamic computer codes, Biometrika 96 (3) (2009) 663–676.
[7] J. Brynjarsdóttir, A. O’Hagan, Learning about physical parameters: the importance of model discrepancy, Inverse Probl. 30 (11) (2014) 114007.
[8] P.D. Arendt, D.W. Apley, W. Chen, Quantification of model uncertainty: calibration, model discrepancy, and identifiability, J. Mech. Des. 134 (10) (2012)
100908.
[9] R.C. Smith, Uncertainty Quantification: Theory, Implementation, and Applications, Computational Science and Engineering, vol. 12, SIAM, 2013.
[10] S.H. Cheung, T.A. Oliver, E.E. Prudencio, S. Prudhomme, R.D. Moser, Bayesian uncertainty analysis with applications to turbulence modeling, Reliab. Eng.
Syst. Saf. 96 (9) (2011) 1137–1149.
[11] W.N. Edeling, P. Cinnella, R.P. Dwight, Bayesian estimates of parameter variability in the k − ε turbulence model, 2014.
[12] J.L. Beck, L.S. Katafygiotis, Updating models and their uncertainties. I: Bayesian statistical framework, J. Eng. Mech. 124 (4) (1998) 455–461.
[13] L.M. Berliner, K. Jezek, N. Cressie, Y. Kim, C. Lam, C.V.D. Veen, Modeling dynamic controls on ice streams: a Bayesian statistical approach, J. Glaciol. 54
(2008) 705–714.
[14] K. Sargsyan, H. Najm, R. Ghanem, On the statistical calibration of physical models, Int. J. Chem. Kinet. 47 (4) (April 2015) 246–276.
[15] J.L. Loeppky, D. Bingham, W.J. Welch, Computer model calibration or tuning in practice, Technical report, University of British Columbia, 2006.
[16] C. Soize, Stochastic modeling of uncertainties in computational structural dynamics – recent theoretical advances, J. Sound Vib. 332 (10) (2013) 2379–2395.
[17] M.L. Mehta, Random Matrices, Pure and Applied Mathematics, vol. 142, Academic Press, 2004.
[18] R. Aster, Parameter Estimation and Inverse Problems, Elsevier Academic Press, 2005.
[19] M.B. Giles, M.C. Duta, J.-D. Müller, N.A. Pierce, Algorithm developments for discrete adjoint methods, AIAA J. 41 (2) (2003) 198–205.
[20] J.E. Dennis Jr., J.J. Moré, Quasi-Newton methods, motivation and theory, SIAM Rev. 19 (1) (1977) 46–89.
[21] P. Caplan, Numerical computation of second derivatives with applications to optimization problems, Unpublished academic report, MIT.
[22] B.D. Tracey, K. Duraisamy, J.J. Alonso, A machine learning strategy to assist turbulence model development, in: 53rd AIAA Aerospace Sciences Meeting,
The American Institute of Aeronautics and Astronautics, 2015.
[23] Z.J. Zhang, K. Duraisamy, Machine learning methods for data-driven turbulence modeling, in: AIAA Aviation and Aeronautics Forum and Exposition,
Dallas, Texas, June 2015.
[24] C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
[25] D.C. Wilcox, Turbulence Modeling for CFD, vol. 2, DCW Industries, La Canada, CA, 1998.
[26] J. Jimenez, S. Hoyas, Turbulent fluctuations above the buffer layer of wall-bounded flows, J. Fluid Mech. 611 (2008) 215–236.
[27] E. Parish, K. Duraisamy, Quantification of turbulence modeling uncertainties using full field inversion, in: AIAA Aviation and Aeronautics Forum and
Exposition, Dallas, Texas, June 2015.
[28] A. Patil, D. Huard, C. Fonnesbeck, PyMC: Bayesian stochastic modelling in Python, J. Stat. Softw. 35 (4) (2010).