OpenTURNS: An Industrial Software For Uncertainty Quantification in Simulation
Contents

1 Introduction
  1.1 Presentation of OpenTURNS
  1.2 The Uncertainty Management Methodology
  1.3 Main Originality of OpenTURNS
  1.4 The Flooding Model
2 Uncertainty Quantification
  2.1 Modeling of a Random Vector
  2.2 Stochastic Processes
  2.3 Statistics Estimation
  2.4 Conditioned Distributions
  2.5 Bayesian Calibration
3 Uncertainty Propagation
  3.1 Min-Max Approach
  3.2 Central Tendency
  3.3 Failure Probability Estimation
4 Sensitivity Analysis
  4.1 Graphical Tools
  4.2 Sampling-Based Methods
5 Metamodels
M. Baudin (✉)
Industrial Risk Management Department, EDF R&D France, Chatou, France
e-mail: [email protected]
A. Dutfoy
Industrial Risk Management Department, EDF R&D France, Saclay, France
e-mail: [email protected]; [email protected]
A.-L. Popelin
Industrial Risk Management Department, EDF R&D France, Chatou, France
e-mail: [email protected]
B. Iooss
Industrial Risk Management Department, EDF R&D France, Chatou, France
Abstract
The need to assess robust performances for complex systems and to answer
tighter regulatory processes (security, safety, environmental control, health
impacts, etc.) has led to the emergence of a new industrial simulation challenge:
to take uncertainties into account when dealing with complex numerical
simulation frameworks. Therefore, a generic methodology has emerged from
the joint effort of several industrial companies and academic institutions. EDF
R&D, Airbus Group, and Phimeca Engineering started a collaboration at the
beginning of 2005, joined by IMACS in 2014, for the development of an open-source
software platform dedicated to uncertainty propagation by probabilistic
methods, named OpenTURNS for open-source treatment of uncertainty, Risk
’N Statistics. OpenTURNS addresses the specific industrial challenges attached
to uncertainties, which are transparency, genericity, modularity, and multi-accessibility.
This paper focuses on OpenTURNS and presents its main features:
OpenTURNS is an open-source software under the LGPL license that presents
itself as a C++ library and a Python TUI and which works under Linux and
Windows environments. All the methodological tools are described in the different
sections of this paper: uncertainty quantification, uncertainty propagation,
sensitivity analysis, and metamodeling. A section also explains the generic
wrappers’ way to link OpenTURNS to any external code. The paper illustrates
as much as possible the methodological tools on an educational example that
simulates the height of a river and compares it to the height of a dike that protects
industrial facilities. Finally, it gives an overview of the main developments
planned for the next few years.
Keywords
OpenTURNS • Uncertainty • Quantification • Propagation • Estimation • Sensitivity • Simulation • Probability • Statistics • Random vectors • Multivariate distribution • Open source • Python module • C++ library • Transparency • Genericity
1 Introduction
The need to assess robust performances for complex systems and to answer tighter
regulatory processes (security, safety, environmental control, health impacts, etc.)
has led to the emergence of a new industrial simulation challenge: to take
uncertainties into account when dealing with complex numerical simulation frameworks.
Many attempts at treating uncertainty in large industrial applications have involved
domain-specific approaches or standards: metrology, reliability, differential-based
approaches, variance decomposition, etc. However, facing the questioning of their
certification authorities in an increasing number of different domains, these domain-specific
approaches are no longer appropriate. Therefore, a generic methodology
has emerged from the joint effort of several industrial companies and academic
institutions; [28] reviews these past developments. The specific industrial challenges
attached to these recent uncertainty concerns are transparency, genericity,
modularity, and multi-accessibility.
1.1 Presentation of OpenTURNS

As no software was fully answering the challenges mentioned above, EDF R&D,
Airbus Group, and Phimeca Engineering started a collaboration at the beginning
of 2005, joined by IMACS in 2014, for the development of an open-source
software platform dedicated to uncertainty propagation by probabilistic methods,
named OpenTURNS for open-source treatment of uncertainty, Risk ’N Statistics
[10, 29]. OpenTURNS is actively supported by its core team of four industrial
partners (IMACS joined the consortium in 2014) and by an industrial and academic
user community that meets through the website www.openturns.org and annually
during the OpenTURNS Users’ Day. At EDF, OpenTURNS is the repository of
all scientific developments on this subject, to ensure their dissemination within the
several business units of the company. The software has also been distributed for
several years via the integrating platform Salome [27].

OpenTURNS is an open-source software under the LGPL license that presents itself
as a C++ library and a Python TUI and which works under Linux and Windows
environments, with the following key features:
• Open-source initiative, to secure the transparency of the approach and its openness to ongoing research and development (R&D) and expert challenging
• Generic with respect to the physical or industrial domain, for treating multi-physical problems
• Structured according to a practitioner-guidance methodological approach
• Equipped with advanced industrial computing capabilities, enabling the use of massive distribution and high-performance computing, various engineering environments, large data models, etc.
• Including the largest variety of qualified algorithms in order to manage uncertainties in several situations
• Containing complete documentation (reference guide, use cases guide, user manual, examples guide, and developers’ guide)
All the methodological tools are described after this introduction in the different
sections of this paper: uncertainty quantification, uncertainty propagation, sensitiv-
ity analysis, and metamodeling. Before the conclusion, a section also explains the
generic wrappers’ way to link OpenTURNS to any external code.
OpenTURNS can be downloaded from its dedicated website www.openturns.org,
which offers different pre-compiled packages specific to several Windows and
Linux environments. It is also possible to download the source files from the
SourceForge server (www.sourceforge.net) and to compile them within another
environment: the OpenTURNS developer’s guide provides advice to help compile
the source files. Finally, OpenTURNS has been integrated for more than 5 years in
the major Linux distributions (e.g., Debian, Ubuntu, RedHat, and SUSE).
1.2 The Uncertainty Management Methodology

The uncertainty management methodology consists of the following steps:

• Step A: specify the random inputs X, the deterministic inputs d, the model G
(analytical, complex computer code, or experimental process), the variable of
interest (model output) Y, and the quantity of interest on the output (central
dispersion, its distribution, probability to exceed a threshold, etc.). The fundamental
relation writes:

$$Y = G(X, d)$$    (1)

with $X = (X_1, \ldots, X_d)$.
• Step B: quantify the sources of uncertainty. This step consists in modeling the
joint probability density function (pdf) of the random input vector by direct
methods (e.g., statistical fitting, expert judgment) [15].
• Step B’: quantify the sources of uncertainty by indirect methods using some real
observations of the model outputs [39]. The calibration process aims to estimate
the values or the pdf of the inputs, while the validation process aims to model the
bias between the model and the real system.
• Step C: propagate uncertainties to estimate the quantity of interest. With respect
to this quantity, the computational resources, and the CPU time cost of a single
model run, various methods will be applied: analytical formulas, geometrical
approximations, sampling methods, etc.
For each of these steps, OpenTURNS offers a large number of different methods
whose applicability depends on the specifics of the problem (dimension of inputs,
model complexity, CPU time cost of a model run, quantity of interest, etc.).
1.4 The Flooding Model

Throughout this paper, the discussion is illustrated with a simple application model
that simulates the height of a river and compares it to the height of a dike
that protects industrial facilities, as illustrated in Fig. 2. When the river height
exceeds that of the dike, flooding occurs. This academic model is used as a
pedagogical example in [14]. The model is based on a crude simplification of the
1D hydrodynamical Saint-Venant equations under the assumptions of uniform
and constant flow rate and large rectangular sections. It consists of an equation that
involves the characteristics of the river stretch:

$$H = \left( \frac{Q}{B K_s \sqrt{(Z_m - Z_v)/L}} \right)^{0.6}$$    (2)

where the output variable H is the maximal annual height of the river, B is the river
width, and L is the length of the river stretch. The four random input variables
Q, K_s, Z_v, and Z_m are defined in Table 1 with their probability distributions.
The randomness of these variables is due to their spatiotemporal variability, our
ignorance of their true value, or some inaccuracies in their estimation.

Table 1 Input variables of the flood model and their probability distributions

Input   Description                Unit     Probability distribution
Q       Maximal annual flow rate   m^3/s    Gumbel G(1.8e-3, 1014)
K_s     Strickler coefficient      –        Normal N(30, 7.5)
Z_v     River downstream level     m        Triangular T(47.6, 50.5, 52.4)
Z_m     River upstream level       m        Triangular T(52.5, 54.9, 57.7)
2 Uncertainty Quantification
2.1 Modeling of a Random Vector

Such a random vector X, which writes as a linear combination of a finite set of
independent variables, $X = a_0 + a_1 X^1 + \dots + a_N X^N$, is built thanks to the Python
command, written for $N = 2$ with explicit notations:

>>> myX = RandomMixture([distX1, distX2], [a1, a2], a0)

where all the variables $X^i$ are identically distributed according to distX. In that case,
the distribution of X is exactly determined, using the characteristic functions of the
$X^i$ distributions and the Poisson summation formula.

In the univariate case, OpenTURNS exactly determines the pushforward distribution
D of any distribution $D_0$ through the function $f: \mathbb{R} \to \mathbb{R}$, thanks to the
Python command (with straight notations):

>>> d = CompositeDistribution(f, d0)
Figure 3 illustrates the copula of such a distribution, built as the ordinal sum of
some maximum entropy order statistics copulas.
The OpenTURNS Python script to model the input random vector of the tutorial
presented previously is as follows:
# Margin distributions:
>>> dist_Q = Gumbel(1.8e-3, 1014)
>>> dist_Q = TruncatedDistribution(dist_Q, 0.0, TruncatedDistribution.LOWER)
>>> dist_K = Normal(30.0, 7.5)
>>> dist_K = TruncatedDistribution(dist_K, 0.0, TruncatedDistribution.LOWER)
>>> dist_Zv = Triangular(47.6, 50.5, 52.4)
>>> dist_Zm = Triangular(52.5, 54.9, 57.7)
# Copula in dimension 4 for (Q, K, Zv, Zm)
>>> R = CorrelationMatrix(2)
Fig. 3 An example of maximum entropy copula which almost surely satisfies the ordering constraint: $X_1 \le X_2$
>>>R [ 0 , 1 ] = 0 . 7
>>> c o p u l a = ComposedCopula ( [ I n d e p e n d e n t C o p u l a ( 2 ) , Nor ma lC opula ( R ) ] )
# F i n a l d i s t r i b u t i o n f o r ( Q, K , Zv , Zm )
>>> d i s t I n p u t = C o m p o s e d D i s t r i b u t i o n ( [ loi_Q , loi_K , l o i _ Z v , loi_Zm ] , c o p u l a )
# F i n a l random v e c t o r ( Q, K , Zv , Zm )
>>> i n p u t V e c t o r =RandomVector ( d i s t I n p u t )
Note that OpenTURNS can truncate any distribution on a lower bound, an upper
bound, or a given interval. Furthermore, a normal copula models the dependence
between the variables $Z_v$ and $Z_m$, with a correlation of 0.7. The variables
(Q, K) are independent, and the two blocks (Q, K) and ($Z_v$, $Z_m$) are independent
of each other.
2.2 Stochastic Processes

Any field can be exported into the VTK format, which allows it to be visualized
using, e.g., ParaView (www.paraview.org).
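As a minimal sketch (assuming a field object, here named myField, has been built beforehand, for instance as a realization of a process), the export writes:

# Hedged sketch: export a previously built field (hypothetical name
# myField) to a VTK file readable by ParaView
>>> myField.exportToVTKFile('myField.vtk')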
Fig. 6 A user-defined nonstationary covariance function and its estimation from several given
fields
Note that OpenTURNS enables the mapping of any stochastic process X into
a process Y through a function f: Y = f(X), where the function f can consist, for
example, in adding or removing a trend or applying a Box-Cox transformation in order
to stabilize the variance of X. The Python command is, with explicit notations:

>>> myYprocess = CompositeProcess(f, myXprocess)
OpenTURNS also proposes processes defined as a finite linear combination of deterministic functions with random coefficients:

$$X(\omega, x) = \sum_{i=1}^{K} A_i(\omega)\, \varphi_i(x)$$
2.3 Statistics Estimation

OpenTURNS enables the user to estimate a model from data, in the univariate as
well as in the multivariate framework, using the maximum likelihood principle or
moment-based estimation.
Some tests, such as the Kolmogorov-Smirnov test, the chi-square test, and the
Anderson-Darling test (for normal distributions), are implemented and can help to
select a model among others, from a sample of data. The Python command to build
a model and test it writes:
>>> e s t i m a t e d B e t a = B e t a F a c t o r y ( s a m p l e )
>>> t e s t R e s u l t = F i t t i n g T e s t . Kolmogorov ( sam pl e , e s t i m a t e d B e t a )
Figure 7 illustrates the resulting estimated distributions from a sample of size 500
drawn from a beta distribution: the kernel smoothing method takes into account the
fact that the data are bounded by 0 and 1. The histogram of the data is drawn to
enable comparison.

Fig. 7 Beta distribution estimation from a sample of size 500: parametric estimation versus kernel smoothing technique
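For reference, a kernel smoothing estimate like the one drawn in Fig. 7 can be obtained along the following lines (a sketch assuming the KernelSmoothing class with its default Gaussian kernel; the boundary treatment visible in Fig. 7 may require an additional option depending on the version):

# Hedged sketch: nonparametric estimation of the distribution by
# kernel smoothing
>>> kernel = KernelSmoothing()
>>> fittedDistribution = kernel.build(sample)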
Several visual tests are also implemented to help select models, among them the
QQ plot test and the Henry line test, which write (in the case of a beta distribution
for the QQ plot test):

>>> graphQQplot = VisualTest.DrawQQplot(sample, Beta())
>>> graphHenryLine = VisualTest.DrawHenryLine(sample)

Figure 8 illustrates the QQ plot test on a sample of size 500 drawn from a beta
distribution: the fit appears satisfactory.
Stochastic processes also have estimation procedures, from a sample of fields or, if
the ergodic hypothesis is verified, from just one field. Multivariate ARMA processes
are estimated according to the BIC and AIC criteria and the Whittle estimator, which
is based on the maximization of the likelihood function in the frequency domain.
The Python command to estimate an ARMA(p, q) process of dimension d, based
on a sample of time series, writes:

>>> estimatedARMA = ARMALikelihoodFactory(p, q, d).build(sampleTimeSeries)
Moreover, OpenTURNS can estimate the covariance function and the spectral
density function of normal processes from given fields. For example, the Python
command to estimate a stationary covariance model from a sample of realizations
of the process writes:
>>>myCovFunc = S t a t i o n a r y C o v a r i a n c e M o d e l F a c t o r y ( ) . b u i l d ( s a m p l e P r o c e s s )
Fig. 8 QQ plot test: theoretical beta model versus the sample of size 500
[Figure: PDF of the distribution of Y with fixed parameters versus with a Bayesian random vector of parameters (top); PDF of the resulting conditional distribution (bottom).]
3 Uncertainty Propagation
Once the input multivariate distribution has been satisfactorily chosen, these
uncertainties can be propagated through the model G to the output vector Y.
Depending on the final goal of the study (min-max approach, central tendency,
or reliability), several methods can be used to estimate the corresponding quantity
of interest, seeking the best compromise between the accuracy of the
estimator and the number of calls to the numerical, and potentially costly, model.
3.1 Min-Max Approach

The aim here is to determine the extreme (minimum and maximum) values of the
components of Y for the set of all possible values of X. Several techniques enable
this to be done.

The type of design of experiments impacts the quality of the metamodel and
hence the evaluation of its extreme values. OpenTURNS gives access to two usual
families of design of experiments for a min-max study:
• Some stratified patterns (axial, composite, factorial, or box patterns). Here are
the two command lines that generate a sample from a two-level factorial pattern:

>>> myCenteredReducedGrid = Factorial(2, levels)
>>> mySample = myCenteredReducedGrid.generate()
• Some weighted patterns that include, on the one hand, random patterns (Monte
Carlo, LHS) and, on the other hand, low-discrepancy sequences (Sobol, Faure,
Halton, reverse Halton, and Haselgrove, in dimension n > 1). The following
lines illustrate the creation of a Faure sequence in dimension 2 and of a Monte Carlo
design of experiments from a bidimensional standard normal distribution; an LHS
variant is sketched just after:

# Faure sequence sampling
>>> myFaureSample = FaureSequence(2).generate(1000)
# Monte Carlo sampling
>>> myMCSample = MonteCarloExperiment(Normal(2), 100).generate()
[Figure: the generated design of experiments plotted in the unit square (axes x1 and x2).]
# For the search of the min value
>>> myAlgoTNC = TNC(TNCSpecificParameters(), limitStateFunction,
        intervalOptim, startingPoint, TNC.MINIMIZATION)
# For the search of the max value
>>> myAlgoTNC = TNC(TNCSpecificParameters(), limitStateFunction,
        intervalOptim, startingPoint, TNC.MAXIMIZATION)
# Run the search and extract the results
>>> myAlgoTNC.run()
>>> myAlgoTNCResult = BoundConstrainedAlgorithm(myAlgoTNC).getResult()
>>> optimalValue = myAlgoTNCResult.getOptimalValue()
3.2 Central Tendency

A central tendency evaluation aims at evaluating a reference value for the variable
of interest, here the water level H, and an indicator of the dispersion of the variable
around the reference. To address this problem, the mean $\mu_Y = E(Y)$ and the standard
deviation $\sigma_Y = \sqrt{V(Y)}$ of Y are here evaluated using two different methods.

First, following the usual method within the measurement science community
[12], $\mu_Y$ and $\sigma_Y$ have been computed under a Taylor first-order approximation of
the function Y = G(X) (notice that the explicit dependence on the deterministic
variables d is here omitted for simplifying notations):

$$\mu_Y \approx G(\mu_X)$$    (3)

$$\sigma_Y^2 \approx \sum_{i=1}^{d} \sum_{j=1}^{d} \frac{\partial G}{\partial x_i}(\mu_X)\, \frac{\partial G}{\partial x_j}(\mu_X)\, \sigma_i \sigma_j \rho_{ij}$$    (4)

$\sigma_i$ and $\sigma_j$ being the standard deviations of the ith and jth components $X_i$ and $X_j$ of
the vector X and $\rho_{ij}$ their correlation coefficient. Thanks to the formulas above,
the mean and the standard deviation of H are evaluated as 52.75 m and 1.15 m,
respectively:
>>>myQuadCum = Q u a d r a t i c C u m u l ( o u t p u t V a r i a b l e )
# F i r s t o r d e r Mean
>>> m e a n F i r s t O r d e r = myQuadCum . g e t M e a n F i r s t O r d e r ( ) [ 0 ]
# Second o r d e r Mean
>>> m eanSecondOrder = myQuadCum . g e t M e a n S e c o n d O r d e r ( ) [ 0 ]
# F i r s t order Variance
>>> v a r F i r s t O r d e r = myQuadCum . g e t C o v a r i a n c e ( ) [ 0 , 0 ]
Then, the same quantities have been evaluated by a Monte Carlo simulation: a sample
of size 10,000 of the vector X is generated and the function G(X) is evaluated,
thus giving a sample of H. The empirical mean and standard deviation of this
sample are 52.75 m and 1.42 m, respectively. Figure 13 shows the empirical histogram
of the generated sample of H.

# Create a random sample of the output variable of interest of size 10000
>>> outputSample = outputVariable.getNumericalSample(10000)
[Fig. 13: empirical histogram of the generated sample of H (frequency versus realizations).]
# Get the empirical mean
>>> empiricalMean = outputSample.computeMean()
# Get the empirical covariance matrix
>>> empiricalCovarianceMatrix = outputSample.computeCovariance()
3.3 Failure Probability Estimation

This section focuses on the estimation of the probability for the output Y to exceed a
certain threshold s, denoted $P_f$ in the following. If s is the altitude of a flood protection
dike, then the excess probability $P_f$ can be interpreted as the probability of
an overflow of the dike, i.e., a failure probability.

Note that an equivalent way of formulating this reliability problem would be to
estimate the $(1 - p)$-th quantile of the output’s distribution. This quantile can be
interpreted as the flood height $q_p$ which is attained with probability p each year.
$T = 1/p$ is then seen to be a return period, i.e., a flood as high as $q_{1/T}$ occurs on
average every T years.

Hence, the probability of overflowing a dike with height s is less than p (where p,
for instance, could be set according to safety regulations) if and only if $s \ge q_p$, i.e.,
if the dike’s altitude is higher than the flood with return period $T = 1/p$.
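As a sketch, such a quantile can be estimated empirically from the Monte Carlo output sample obtained in the central tendency study (the 100-year return period, i.e., p = 0.01, is an arbitrary illustrative choice here):

# Hedged sketch: empirical flood height with a 100-year return period
>>> p = 0.01
>>> qp = outputSample.computeQuantilePerComponent(1.0 - p)[0]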
[Fig. 14: FORM importance factors of the flood model: K_s: 56.8 %, Z_v: 9.5 %, Z_m: 1.2 %, the remainder corresponding to Q.]
3.3.1 FORM
A way to evaluate such failure probabilities is through the so-called first-order
reliability method (FORM) [9]. By using an isoprobabilistic transformation and an
approximation of the limit state function, this approach allows the low probabilities
required in the reliability field to be evaluated with a much reduced number of model
evaluations. Note that OpenTURNS implements the Nataf transformation
when the input vector X has a normal copula, the generalized Nataf transformation
when X has an elliptical copula, and the Rosenblatt transformation in any other
case [16–19].
The probability that the yearly maximal water height H exceeds s = 58 m is
evaluated using FORM. The Hasofer-Lind reliability index was found to be equal to
$\beta_{HL} = 3.04$, yielding a final estimate of $\hat{P}_{f,FORM} = \Phi(-\beta_{HL}) \approx 1.2 \times 10^{-3}$.

The method also provides importance factors that measure the weight of each
input variable in the probability of exceedance, as shown in Fig. 14.
>>>myFORM = FORM( Cobyl a ( ) , myEvent , m e a n I n p u t V e c t o r )
>>>myFORM. r u n ( )
>>> F o r m R e s u l t = myFORM . g e t R e s u l t ( )
>>>pFORM = F o r m R e s u l t . g e t E v e n t P r o b a b i l i t y ( )
>>> H a s o f e r I n d e x = F o r m R e s u l t . g e t H a s o f e r R e l i a b i l i t y I n d e x ( )
# Importance f a c t o r s
>>> i m p o r t a n c e F a c t o r s G r a p h = F o r m R e s u l t . d r a w I m p o r t a n c e F a c t o r s ( )
3.3.2 Monte Carlo Simulation

The crude Monte Carlo estimator of the failure probability writes:

$$\hat{P}_{f,MC} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}_{\{Y^{(i)} > s\}}.$$    (5)

The sample average of the estimation error $\hat{P}_{f,MC} - P_f$ decreases as $1/\sqrt{N}$ and
can be precisely quantified by a confidence interval derived from the central limit
theorem. In the present case, the results are:
These results are coherent with those of the FORM approximation, confirming that
the assumptions underlying the latter are correct. Figure 15 shows the convergence
of the estimate depending on the size of the sample, obtained with OpenTURNS.
>>>myEvent = E v e n t ( o u t p u t V a r i a b l e , G r e a t e r ( ) , t h r e s h o l d )
>>> myMonteCarlo = M ont eCarl o ( myEvent )
# S p e c i f y t h e maximum number o f s i m u l a t i o n s
When the failure probability is very low, importance sampling around the design
point $u^*$ obtained by FORM can be used; the estimator writes:

$$\hat{P}_{f,IS} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}_{\{G \circ T^{-1}(U^{(i)}) > s\}} \, \frac{\varphi_n(U^{(i)})}{\varphi_n(U^{(i)} - u^*)}$$    (6)

where the points $U^{(i)}$ are sampled around $u^*$ in the standard space and $\varphi_n$ denotes
the standard normal density in dimension n.
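A corresponding OpenTURNS call can be sketched as follows (an assumption-laden sketch: the ImportanceSampling class and a standard normal importance distribution re-centered at the design point; class and method names may differ across versions):

# Hedged sketch: importance sampling around the FORM design point
>>> designPoint = FormResult.getStandardSpaceDesignPoint()
>>> importanceDistribution = Normal(designPoint, CovarianceMatrix(4))
>>> myIS = ImportanceSampling(StandardEvent(myEvent), importanceDistribution)
>>> myIS.run()
>>> pIS = myIS.getResult().getProbabilityEstimate()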
The rationale of this approach is that, by sampling in the vicinity of the failure
domain boundary, a larger proportion of values fall within the failure domain than by
sampling around the origin, leading to a better evaluation of the failure probability
and a reduction in the estimation variance. Using this approach, the results are:
In the directional sampling method, random directions are sampled in the standard
space and, along each direction, the conditional probability $q_i$ of exceeding the
threshold is computed; the estimator writes:

$$\hat{P}_{f,DS} = \frac{1}{N} \sum_{i=1}^{N} q_i.$$
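The corresponding algorithm can be sketched as (assuming the DirectionalSampling class with its default root-finding and sampling strategies):

# Hedged sketch: directional sampling of the same event
>>> myDS = DirectionalSampling(myEvent)
>>> myDS.setMaximumOuterSampling(10000)
>>> myDS.run()
>>> pDS = myDS.getResult().getProbabilityEstimate()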
In the subset simulation method [2], the failure domain F is written as the intersection of a decreasing sequence of nested events $F_1 \supset F_2 \supset \cdots \supset F_m = F$, whose conditional probabilities are estimated step by step:
# This allows one to control the number of samples per step
>>> mySSAlgo.setMaximumOuterSampling(10000)
# Run the algorithm
>>> mySSAlgo.run()
4 Sensitivity Analysis
4.1 Graphical Tools

In sensitivity analysis, graphical techniques should be used first. With all the
scatterplots between each input variable and the model output, one can immediately
detect some trends in their functional relation. The following instructions allow
the scatterplots of Fig. 16 to be obtained from a Monte Carlo sample of size N = 1000
of the flooding model.
>>> i n p u t S a m p l e = i n p u t R a n d o m V e c t o r . g e t N u m e r i c a l S a m p l e ( 1 0 0 0 )
>>> i n p u t S a m p l e . s e t D e s c r i p t i o n ( [ ’Q ’ , ’K ’ , ’ Zv ’ , ’Zm ’ ] )
>>> o u t p u t S a m p l e = f i n a l M o d e l C r u e ( i n p u t S a m p l e )
>>> o u t p u t S a m p l e . s e t D e s c r i p t i o n ( [ ’H ’ ] )
# Here , s t a c k b o t h s a m p l e s i n one
>>> i n p u t S a m p l e . s t a c k ( o u t p u t S a m p l e )
>>> m y P a i r s = P a i r s ( i n p u t S a m p l e )
>>>myGraph = Graph ( )
>>>myGraph . add ( m y P a i r s )
In the right column of Fig. 16, the strong and rather linear effects
of Q and $Z_v$ on the output variable H are clearly visible. The plot in the third row
and fourth column also shows that the dependence between $Z_v$ and $Z_m$ comes from
the large correlation coefficient introduced in the probabilistic model.

However, scatterplots do not capture some interaction effects between the inputs.
Cobweb plots are then used to visualize the simulations as a set of trajectories. The
following instructions allow the cobweb plots of Fig. 17 to be obtained, where the
simulations leading to the largest values of the model output H have been colored
in red.
Fig. 16 Scatterplots between the inputs and the output of the flooding model: each combination (input i, input j) and (input i, output) is drawn, which makes it possible to exhibit some correlation patterns
>>> i n p u t S a m p l e = i n p u t R a n d o m V e c t o r . g e t N u m e r i c a l S a m p l e ( 1 0 0 0 )
>>> o u t p u t S a m p l e = f i n a l M o d e l C r u e ( i n p u t S a m p l e )
# Graph 1 : v a l u e b a s e d s c a l e t o d e s c r i b e t h e Y r a n g e
>>> m i nVal ue = o u t p u t S a m p l e . c o m p u t e Q u a n t i l e P e r C o m p o n e n t ( 0 . 0 5 ) [ 0 ]
>>>maxValue = o u t p u t S a m p l e . c o m p u t e Q u a n t i l e P e r C o m p o n e n t ( 0 . 9 5 ) [ 0 ]
>>>myCobweb = V i s u a l T e s t . DrawCobWeb ( i n p u t S a m p l e , o u t p u t S a m p l e ,
minValue , maxValue , ’ r e d ’ , F a l s e )
Fig. 17 Cobweb plot for the flooding model: each simulation is drawn as a trajectory. The input marginal values are linked to the output value (last column). All the simulations that led to a high quantile of the output are drawn in red: the cobweb plot enables the detection of the input combinations from which they come
4.2 Sampling-Based Methods

When the model is quasi-linear, the standardized regression coefficients (SRC),
derived from the coefficients $\alpha_i$ of the linear regression between the output and the
inputs, provide quantitative importance measures:

$$\mathrm{SRC}_i = \alpha_i \frac{\sigma_i}{\sigma_Y} \quad (\text{for } i = 1 \ldots p),$$    (7)

where $\sigma_i$ is the standard deviation of $X_i$ and $\sigma_Y$ that of Y.

Table 2 Regression coefficients and SRC of the flood model inputs ($\alpha_0$ = 0.1675 and $R^2$ = 0.97)

          Q        K_s      Z_v      Z_m
alpha_i   3.2640   0.0012   0.0556   1.1720
SRC_i     0.3462   0.0851   0.6814   0.0149
The SRC values confirm our first conclusions drawn from the visual analysis of the
scatterplots. As $R^2$ = 0.97 is very close to one, the model is quasi-linear, and the SRC
coefficients are sufficient to perform a global sensitivity analysis.
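For reference, these coefficients can be estimated directly from the Monte Carlo samples (a sketch assuming the CorrelationAnalysis_SRC helper of the Python module; its exact spelling has varied across versions):

# Hedged sketch: standardized regression coefficients of H
>>> src = CorrelationAnalysis_SRC(inputSample, outputSample)
>>> print(src)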
Several other estimation methods are available in OpenTURNS for sensitivity
analysis purposes. Among them, the Sobol indices,

$$S_i = \frac{\mathrm{Var}[E(Y \mid X_i)]}{\mathrm{Var}(Y)} \;\; \text{(first-order index)} \quad \text{and} \quad S_{T_i} = S_i + \sum_{j \neq i} S_{ij} + \ldots \;\; \text{(total index)},$$    (8)

are estimated in OpenTURNS with the classic pick-freeze method based on two
independent Monte Carlo samples [34]. In OpenTURNS, other ways to compute
the Sobol indices are the extended FAST method [35] and the coefficients of the
polynomial chaos expansion [38].
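A pick-freeze estimation can be sketched as follows (assuming the SensitivityAnalysis class of the 1.x series; two independent input samples are required):

# Hedged sketch: Sobol indices by the pick-freeze method
>>> sample1 = inputRandomVector.getNumericalSample(1000)
>>> sample2 = inputRandomVector.getNumericalSample(1000)
>>> sobol = SensitivityAnalysis(sample1, sample2, finalModelCrue)
>>> firstOrderIndices = sobol.getFirstOrderIndices()
>>> totalOrderIndices = sobol.getTotalOrderIndices()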
5 Metamodels
The polynomial chaos expansion enables the approximation of the output random
variable of interest Y = G(X), with $G: \mathbb{R}^d \to \mathbb{R}^p$, by the surrogate model:

$$\tilde{Y} = \sum_{k \in K} \alpha_k\, \psi_k \circ T(X)$$    (9)

where T is an isoprobabilistic transformation mapping X onto a standardized
random vector Z = T(X), the $\alpha_k$ are coefficients, and the multivariate basis
functions are tensor products of univariate orthonormal polynomials:

$$\psi_k(z) = \psi^1_{k_1}(z_1)\, \psi^2_{k_2}(z_2) \cdots \psi^d_{k_d}(z_d)$$

The coefficients write:

$$\alpha_k = E\!\left[ g \circ T^{-1}(Z)\, \psi_k(Z) \right]$$    (10)
In practice, the expectation defining the coefficients is approximated by a quadrature
rule of the form:

$$E_\mu[f(Z)] \approx \sum_{i \in I} \omega_i f(\xi_i)$$    (11)

where f is a function in $L^1(\mu)$. The set I, the points $(\xi_i)_{i \in I}$, and the weights
$(\omega_i)_{i \in I}$ are evaluated from weighted designs of experiments, which can be random
(Monte Carlo experiments and importance sampling experiments) or deterministic
(low-discrepancy experiments, user-given experiments, and Gaussian product
experiments).
Finally, OpenTURNS gives access to the whole polynomial chaos construction, as in the following script:
>>> p r o d u c t B a s i s = O r t h o g o n a l P r o d u c t P o l y n o m i a l F a c t o r y
( polyColl , enumerateFunction )
# Truncature s t r a t e g y of the m u l t i v a r i a t e orthonormal basi s
# Choose a l l t h e p o l y n o m i a l s o f d e g r e e <= 4
>>> d e g r e e = 4
>>> indexMax = e n u m e r a t e F u n c t i o n . g e t S t r a t a C u m u l a t e d C a r d i n a l ( d e g r e e )
# Keep a l l t h e p o l y n o m i a l s o f d e g r e e <= 4
# whi ch c o r r e s p o n d s t o t h e 5 f i r s t o n e s
>>> a d a p t i v e S t r a t e g y = F i x e d S t r a t e g y ( p r o d u c t B a s i s , indexMax )
# Evaluation stra te gy of the approximation c o e f f i c i e n t s
>>> s a m p l i n g S i z e = 50
>>> e x p e r i m e n t = M o n t e C a r l o E x p e r i m e n t ( s a m p l i n g S i z e )
>>> p r o j e c t i o n S t r a t e g y = L e a s t S q u a r e s S t r a t e g y ( e x p e r i m e n t )
# C r e a t i o n o f t h e F u n c t i o n a l Chaos A l g o r i t h m
>>> a l g o = F u n c t i o n a l C h a o s A l g o r i t h m ( model , d i s t r i b u t i o n ,
adaptiveStrategy , . . . projectionStrategy )
>>> a l g o . r u n ( )
# Get t h e r e s u l t
>>> f u n c t i o n a l C h a o s R e s u l t = a l g o . g e t R e s u l t ( )
>>> m et am odel = f u n c t i o n a l C h a o s R e s u l t . get M et aM odel ( )
Fig. 18 An example of a polynomial chaos expansion: the blue line is the reference function $G: x \mapsto x \sin x$ and the red one its approximation, only valid on $[-1, 1]$
Kriging (also known as Gaussian process regression) [24, 30, 33, 37] is a Bayesian
technique that aims at approximating functions (most often in order to surrogate
them because they are expensive to evaluate). In the following, it is assumed that
the aim is to surrogate a scalar-valued model G: x ↦ y. Note that the OpenTURNS
implementation of Kriging can deal with vector-valued functions, with simple loops
over each output. It is also assumed that the model was run over a design of experiments
in order to produce a set of observations gathered in the following dataset:
$(x^{(i)}, y^{(i)}),\; i = 1, \ldots, n$. Ultimately, Kriging aims at producing a
predictor (also known as a response surface or metamodel) denoted $\tilde{G}$.

It is assumed that the model G is a realization of the normal process $Y: \Omega \times \mathbb{R}^d \to \mathbb{R}$ defined by:

$$Y(\omega, x) = m(x) + Z(\omega, x)$$

where m(x) is the trend and Z(x) is a zero-mean Gaussian process with a
covariance function $c_\theta: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ which depends on the vector of parameters
$\theta \in \mathbb{R}^{n_\theta}$:

$$c_\theta(x, x') = E[Z(\omega, x)\, Z(\omega, x')]$$

The process is then conditioned on the observations:

$$Y(\omega, x^{(i)}) = y^{(i)}, \quad \forall i = 1, \ldots, n$$    (15)

and the Kriging predictor is defined as the conditional expectation:

$$\tilde{G}(x) = E[\,Y(\omega, x) \mid Y(\omega, x^{(i)}) = y^{(i)},\; \forall i = 1, \ldots, n\,]$$    (16)

With a linear trend $m(x) = (f(x))^t \beta$, the predictor writes explicitly:

$$\tilde{G}(x) = (f(x))^t \hat{\beta} + (c_\theta(x))^t C_\theta^{-1} (y - F \hat{\beta})$$    (17)

where $C_\theta = [c_\theta(x^{(i)}, x^{(j)})]_{i,j = 1 \ldots n}$, $F = [f(x^{(i)})^t]_{i = 1 \ldots n}$, and $c_\theta^t(x) = [c_\theta(x, x^{(i)})]_{i = 1 \ldots n}$.
Fig. 19 An example of Kriging approximation based on six observations: the blue line is the reference function $G: x \mapsto x \sin x$ and the red one its approximation by a realization of a Gaussian process
The command line writes:
>>> a l g o = K r i g i n g A l g o r i t h m ( i n p u t S a m p l e , o u t p u t S a m p l e , b a s i s ,
covarianceModel )
>>> a l g o . r u n ( )
>>> r e s u l t = a l g o . g e t R e s u l t ( )
>>> m et am odel = r e s u l t . get M et aM odel ( )
>>> g r a p h = m et am odel . draw ( )
6 Linking OpenTURNS to an External Code

On the practical side, the OpenTURNS software provides features which make the
connection to the simulator G easy and make its evaluation generally fast. Within
the OpenTURNS framework, the method to connect to G is called “wrapping.”
In the simplest situations, the function G is analytical and the formulas can be
provided to OpenTURNS as character strings. Here, the muParser C++ library
[4] is used to evaluate the value of the mathematical function. In this case, the
evaluation of G by OpenTURNS is quite fast.
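For the flooding model, such an analytical definition can be sketched as follows (assuming the string-based NumericalMathFunction constructor of the 1.x series; B = 300 and L = 5000 are the deterministic values used in this paper):

# Hedged sketch: the flood model defined by an analytical formula
>>> G = NumericalMathFunction(['Q', 'Ks', 'Zv', 'Zm'], ['H'],
        ['(Q / (Ks * 300.0 * sqrt((Zm - Zv) / 5000.0)))^0.6'])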
Once created, the function G can be used as a regular Python function or can be
passed as an input argument of other OpenTURNS classes.
In most cases, the function G is provided as a Python function, which can be
connected to OpenTURNS with the PythonFunction class. This task is easy
(for those who are familiar with this language) and allows the scientific packages
already available in Python to be combined with OpenTURNS. For example, if the
computational code uses XML files for input or output, it is easy to make use of the
XML features of Python (e.g., the minidom package). Moreover, if the function
evaluation can be vectorized (e.g., with the numpy package), then the func_sample
option of the PythonFunction class can improve the performance a lot.
The following Python script creates the function G associated with the flooding
model. The flood function is first defined with the def Python statement.
This function takes the variable X as input argument, an array with
four components, Q, K_s, Z_v, and Z_m, which correspond to the input random
variables of the model. The body of the flood function is a regular Python script,
so that all Python functions can be used at this point (e.g., the numpy or scipy
functions). The last statement of the function returns the height H. Then the
PythonFunction class is used in order to convert this Python function into an
object that OpenTURNS can use. This class takes as input arguments the number
of input variables (here, 4), the number of outputs (here, 1), and the
function itself, and returns the object G.
>>>from o p e n t u r n s i m port P y t h o n F u n c t i o n
>>> d e f f l o o d (X) :
L = 5 . 0 e3 ; B = 300.0
Q, K_s , Z_v , Z_m = X
a l p h a = ( Z_m Z_v ) / L
H = (Q / ( K_s *B* s q r t ( a l p h a ) ) ) * * ( 0 . 6 )
return H
>>>G = P y t h o n F u n c t i o n ( 4 , 1 , f l o o d )
If, as in many of the computational codes commonly used, the data exchange
is based on text files, OpenTURNS provides a component (coupling_tools)
which is able to read and write structured text files based, for example, on line
indices, possibly containing tables (using line and column indices). Moreover,
OpenTURNS provides a component which can evaluate such a Python function
using the multi-thread capabilities that most computers have. A file-based exchange
is sketched below.
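Such an exchange can be written along the following lines (a sketch assuming the replace, execute, and get_value functions of the coupling_tools module; the file names, tokens, and command are hypothetical):

# Hedged sketch: file-based coupling with an external simulator
>>> from openturns import coupling_tools
# Write the input values X into a text file built from a template
>>> coupling_tools.replace('input_template.txt', 'input.txt',
        ['@Q@', '@Ks@', '@Zv@', '@Zm@'], X)
# Run the external code (hypothetical command line)
>>> coupling_tools.execute('external_code input.txt output.txt')
# Read the output value back from the token 'H=' (hypothetical format)
>>> H = coupling_tools.get_value('output.txt', token='H=')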
The gradient of G can also be provided to OpenTURNS, for example, when the
computer code has been differentiated with automatic differentiation methods, such
as forward or adjoint techniques.
In the case where the function is analytical and is provided as a character string,
OpenTURNS is able to compute the exact derivatives of G. In order to do this, the
software uses the Ev3 C++ library [22] to perform the symbolic computation of the
derivatives and muParser [4] to evaluate them.

In most common situations, however, the code does not compute its derivatives.
In this case, OpenTURNS provides a method to compute the derivatives based on
finite difference formulas. By default, a centered finite difference formula for the
gradient and a centered formula for the Hessian matrix are used.
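For instance, the finite difference gradient of the flood function at a central point can be sketched as (the point is made of the modal values of Table 1):

# Hedged sketch: gradient of G at the modal values of the inputs,
# obtained by the default centered finite difference formula
>>> x0 = [1014.0, 30.0, 50.5, 54.9]
>>> gradientAtX0 = G.gradient(x0)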
For most common engineering practices, OpenTURNS can evaluate G with the
multi-thread capabilities of most laptops and scientific workstations. However, when
the evaluation of G is more CPU consuming, or when the number of required
evaluations is larger, these features are not sufficient by themselves, and it is necessary
to use a high-performance computer such as the Zumbrota, Athos, or Ivanhoe supercomputers
available at EDF R&D, which have from 16,000 to 65,000 cores [40].
In this case, two solutions are commonly used. The first one is to use a feature
which can execute a Python function on remote processors connected to the
network through ssh. Here, the data flow is based on files, located in automatically
generated directories, which prevents the loss of intermediate data. This feature
(the DistributedPythonFunction) allows each remote processor to use its
multi-thread capabilities, providing two different levels of parallel computing.
The second solution is to use the OpenTURNS component integrated in the
Salome platform. This component, along with a graphical user interface called
“Eficas,” makes use of a software called “YACS,” which can call a Python script.
The YACS module allows calculation schemes in Salome to be built, edited, and
executed. It provides both a graphical user interface to chain the computations
by linking the inputs and outputs of computer codes and the ability to execute these
computations on remote machines.
Several studies have been conducted at EDF based on the OpenTURNS component
of Salome. For example, an uncertainty propagation study (the thermal
evaluation of the storage of high-level nuclear waste) made use of a computer
code for which one single run required approximately 10 min on the 8 cores of a
workstation (with shared memory). Within Salome, the OpenTURNS simulation
involving 6000 unitary evaluations of the function G required 8000 CPU hours on
32 nodes [3].
7 Conclusions
The educational example used throughout this paper has illustrated a number of
questions and problems that can be addressed by UQ methods: uncertainty quantification,
central tendency evaluation, excess probability assessment, and sensitivity
analysis, each of which may require the use of a metamodel.
Different numerical methods have been used for solving these three classes
of problems, leading substantially to the same (or very similar) results. In the
industrial practice of UQ, the main issue (which actually motivates the choice of
one mathematical method instead of another) is the computational budget, which
is actually given by the number of allowed runs of the deterministic model G.
When the computer code implementing G is computationally expensive, one needs
specifically designed mathematical and software tools.
OpenTURNS is specially intended to meet these issues: (i) it includes a set of
efficient mathematical methods for UQ, and (ii) it can easily be connected to any
external black box model G. Thanks to these two main features, OpenTURNS
can address many different physics problems and thus help to solve industrial
problems. From this perspective, the partnership around OpenTURNS focuses
its efforts on the integration of the most efficient and innovative methods required
by industrial applications, taking into account both the need for genericity and the
need for ease of communication. The main projects for 2015 concern the improvement
of the Kriging implementation, in order to integrate some very smart optimization
methods. Around this theme, some other classical optimization methods will also
be generalized or newly implemented.
A growing need in model exploration and uncertainty analysis for industrial
applications is to better visualize the information contained in the resulting large
volumes of data. In this area, specific visualization software, such as ParaView,
can provide very efficient and interactive features. Taking advantage of the
integration of OpenTURNS in the Salome platform, EDF is working on a better
link between the ParaView module in Salome (called ParaVIS) and the uncertainty
analyses performed with OpenTURNS: in 2012, the functional boxplot [13] was
implemented. Some recent work on in situ visualization for uncertainty analysis
should also be developed and implemented, so as to benefit very computationally
expensive physics models that generate an extremely high volume of data.
Acknowledgments Part of this work has been backed by the French National Research Agency
(ANR) through the Chorus project (no. ANR-13-MONU-0005-08). We are grateful to the
OpenTURNS Consortium members. We also thank Régis Lebrun, Mathieu Couplet,
and Merlin Keller for their help.
References
1. Airbus, EDF, Phimeca: Developer’s guide, OpenTURNS 1.4 (2014). http://openturns.org
2. Au, S., Beck, J.L.: Estimation of small failure probabilities in high dimensions by subset
simulation. Probab. Eng. Mech. 16, 263–277 (2001)
3. Barate, R.: Calcul haute performance avec OpenTURNS, workshop du GdR MASCOT-NUM,
Quantification d’incertitude et calcul intensif. http://www.gdr-mascotnum.fr/media/openturns-
hpc-2013-03-28.pdf (2013)
4. Berg, I.: muparser, http://muparser.beltoforion.de, fast Math Parser Library (2014)
5. Berger, J. (ed.): Statistical Decision Theory and Bayesian Analysis. Springer, New York (1985)
6. Blatman, G.: Adaptive sparse polynomial chaos expansions for uncertainty propagation and
sensitivity analysis. PhD thesis, Clermont University (2009)
7. Blatman, G., Sudret, B.: Adaptive sparse polynomial chaos expansion based on least angle
regression. J. Comput. Phys. 230, 2345–2367 (2011)
8. Butucea, C., Delmas, J., Dutfoy, A., Fischer, R.: Maximum entropy copula with given diagonal
section. J. Multivar. Anal. 137, 61–81 (2015)
9. Ditlevsen, O., Madsen, H.: Structural Reliability Methods. Wiley, Chichester/New York (1996)
10. Dutfoy, A., Dutka-Malen, I., Pasanisi, A., Lebrun, R., Mangeant, F., Gupta, J.S., Pendola, M.,
Yalamas, T.: OpenTURNS, an open source initiative to treat uncertainties, Risks’N statistics in
a structured industrial approach. In: Proceedings of 41èmes Journées de Statistique, Bordeaux
(2009)
11. Fang, K.T., Li, R., Sudjianto, A.: Design and Modeling for Computer Experiments. Chapman
& Hall/CRC, Boca Raton (2006)
12. JCGM: JCGM 100:2008 – Evaluation of measurement data – Guide to the expression of
uncertainty in measurement. JCGM (2008)
13. Hyndman, R., Shang, H.: Rainbow plots, bagplots, and boxplots for functional data. J. Comput.
Graph. Stat. 19, 29–45 (2010)
14. Iooss, B., Lemaître, P.: A review on global sensitivity analysis methods. In: Meloni, C.,
Dellino, G. (eds.) Uncertainty Management in Simulation-Optimization of Complex Systems:
Algorithms and Applications. Springer, New York (2015)
15. Kurowicka, D., Cooke, R.: Uncertainty Analysis with High Dimensional Dependence Mod-
elling. Wiley, Chichester/Hoboken (2006)
16. Lebrun, R., Dutfoy, A.: Do Rosenblatt and Nataf isoprobabilistic transformations really differ?
Probab. Eng. Mech. 24, 577–584 (2009)
17. Lebrun, R., Dutfoy, A.: A generalization of the Nataf transformation to distributions with
elliptical copula. Probab. Eng. Mech. 24, 172–178 (2009)
18. Lebrun, R., Dutfoy, A.: An innovating analysis of the Nataf transformation from the viewpoint
of copula. Probab. Eng. Mech. 24, 312–320 (2009)
19. Lebrun, R., Dutfoy, A.: A practical approach to dependence modelling using copulas. J. Risk
Reliab. 223(04), 347–361 (2009)
20. Lebrun, R., Dutfoy, A.: Copulas for order statistics with prescribed margins. J. Multivar. Anal.
128, 120–133 (2014)
21. Lemaire, M.: Structural Reliability. Wiley, Hoboken (2009)
22. Liberti, L.: Ev3: a library for symbolic computation in C++ using n-ary trees. http://www.lix.
polytechnique.fr/~liberti/Ev3.pdf (2003)
23. Marin, J.M., Robert, C. (eds.): Bayesian Core: A Practical Approach to Computational
Bayesian Statistics. Springer, New York (2007)
24. Marrel, A., Iooss, B., Van Dorpe, F., Volkova, E.: An efficient methodology for modeling
complex computer codes with Gaussian processes. Comput. Stat. Data Anal. 52, 4731–4744
(2008)
25. Munoz-Zuniga, M., Garnier, J., Remy, E.: Adaptive directional stratification for controlled
estimation of the probability of a rare event. Reliab. Eng. Syst. Saf. 96, 1691–1712 (2011)
26. Nash, S.: A survey of truncated-Newton methods. J. Comput. Appl. Math. 124, 45–59 (2000)
27. OPEN CASCADE S.: Salome: the open source integration platform for numerical simulation.
http://www.salome-platform.org (2006)
28. Pasanisi, A.: Uncertainty analysis and decision-aid: methodological, technical and managerial
contributions to engineering and R&D studies. Habilitation Thesis of Université de Technolo-
gie de Compiègne, France https://tel.archives-ouvertes.fr/tel-01002915 (2014)
29. Pasanisi, A., Dutfoy, A.: An industrial viewpoint on uncertainty quantification in simulation:
stakes, methods, tools, examples. In: Dienstfrey, A., Boisvert, R. (eds.) Uncertainty Quantifi-
cation in Scientific Computing – 10th IFIP WG 2.5 Working Conference, WoCoUQ 2011,
Boulder, 1–4 Aug 2011. IFIP Advances in Information and Communication Technology,
vol. 377, pp. 27–45. Springer, Berlin (2012)
30. Rasmussen, C., Williams, C., Dietterich, T.: Gaussian Processes for Machine Learning. MIT,
Cambridge (2006)
31. Robert, C.P., Casella, G.: Monte Carlo Statistical Methods. Springer, New York (2004)
32. Rubinstein, R.: Simulation and the Monte Carlo Method. Wiley, New York (1981)
33. Sacks, J., Welch, W., Mitchell, T., Wynn, H.: Design and analysis of computer experiments.
Stat. Sci. 4, 409–435 (1989)
34. Saltelli, A.: Making best use of model evaluations to compute sensitivity indices. Comput.
Phys. Commun. 145, 280–297 (2002)
35. Saltelli, A., Tarantola, S., Chan, K.: A quantitative, model-independent method for global
sensitivity analysis of model output. Technometrics 41, 39–56 (1999)
36. Saltelli, A., Chan, K., Scott, E. (eds.): Sensitivity Analysis. Wiley Series in Probability and
Statistics. Wiley, Chichester/New York (2000)
37. Santner, T., Williams, B., Notz, W.: The Design and Analysis of Computer Experiments.
Springer, New York (2003)
38. Sudret, B.: Global sensitivity analysis using polynomial chaos expansion. Reliab. Eng. Syst.
Saf. 93, 964–979 (2008)
39. Tarantola, A.: Inverse Problem Theory and Methods for Model Parameter Estimation. Society
for Industrial and Applied Mathematics, Philadelphia (2005)
40. Top 500 Supercomputer Sites: Zumbrota http://www.top500.org/system/177726, BlueGene/Q,
Power BQC 16C 1.60GHz, Custom (2014)