
The computation of the expected improvement in dominated hypervolume of Pareto front approximations

Michael Emmerich, Andre Deutz, Jan-Willem Klinkenberg
Leiden Institute for Advanced Computer Science,
Niels Bohrweg 1, Leiden University, The Netherlands
LIACS-TR 9-2008, September 2008
Abstract
The hypervolume measure is used frequently in the design and performance assessment of multiobjective optimization algorithms. Especially in the context of metamodel (or surrogate) assisted optimization [1, 2] it is interesting to look at the following problem. Given an approximation set for the Pareto front and a new candidate solution x that has not yet been evaluated precisely, but for which a prediction with an uncertainty measure in the form of an independent multivariate Gaussian distribution with mean vector $\vec{\mu}$ and standard deviation vector $\vec{\sigma}$ exists (Figure 1): what is the expected improvement of the hypervolume when x is added to the population? For multiple objectives, until now only a Monte Carlo method has been reported. This contribution provides a direct computation procedure for the integral expression. This will be useful to enhance both the accuracy and the speed of computation for this important measure.

Introduction

A point $\vec{y} \in \mathbb{R}^m$ is said to dominate a point $\vec{y}' \in \mathbb{R}^m$, iff $\forall i \in \{1, \dots, m\}: y_i \le y_i'$ and $\vec{y} \ne \vec{y}'$. In shorthand we write $\vec{y} \prec \vec{y}'$. We write $\vec{y} \preceq \vec{y}'$, iff $\forall i \in \{1, \dots, m\}: y_i \le y_i'$. For a (finite) set of points $P \subset \mathbb{R}^m$ and a vector $\vec{y} \in \mathbb{R}^m$ we define: $P \prec \vec{y}$, iff $\exists \vec{y}' \in P: \vec{y}' \prec \vec{y}$.
Let $\mathrm{Vol}(U)$ denote the Lebesgue measure of a set $U$. Then we can define the S-metric as:

$$S(P, \vec{y}^{\,ref}) = \mathrm{Vol}(\{\vec{y} \in \mathbb{R}^m \mid P \prec \vec{y} \prec \vec{y}^{\,ref}\}) \qquad (1)$$

Next, let us assume a multiobjective optimization problem

$$f_1(x) \to \min, \; \dots, \; f_m(x) \to \min, \quad x \in X \qquad (2)$$

for some input space $X$.
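To make the dominance relation and the S-metric (1) concrete, here is a minimal Python sketch for the bi-objective minimization case (the function names and the sweep-based routine are illustrative, not taken from the paper); the 2-D hypervolume helper is reused in later sketches.

```python
from typing import List, Sequence, Tuple

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True iff a dominates b (minimization): a <= b in every coordinate and a != b."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

def s_metric_2d(P: List[Tuple[float, float]], y_ref: Tuple[float, float]) -> float:
    """S-metric (dominated hypervolume) of P w.r.t. the reference point y_ref, m = 2."""
    # keep only points strictly inside the reference box; dominated points of P
    # add nothing during the sweep below
    pts = sorted(p for p in P if p[0] < y_ref[0] and p[1] < y_ref[1])
    vol, f2_bound = 0.0, y_ref[1]
    for f1, f2 in pts:                       # sweep in ascending f1
        if f2 < f2_bound:                    # point is non-dominated so far
            vol += (y_ref[0] - f1) * (f2_bound - f2)
            f2_bound = f2
    return vol

if __name__ == "__main__":
    P = [(1.0, 4.0), (2.0, 2.0), (3.0, 1.0)]
    print(s_metric_2d(P, y_ref=(5.0, 5.0)))  # prints 12.0
```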

Figure 1: Multivariate distributions of three uncertain predictions and confidence boxes around them, shown as a probability density over the objectives $f_1$ and $f_2$. The red line indicates the current approximation to the Pareto front. (Plotted: the approximation set on the old Pareto front and the mean values of the approximations.)
We may think of $X$ as $\mathbb{R}^n$ for some $n$, but this is not essential for the following discussion. In addition, let us assume that a prediction tool (e.g. Kriging) provides a prediction for the function values $f_i(x)$, $i = 1, \dots, m$, together with an uncertainty measure. The prediction and its uncertainty are defined by a multivariate normal distribution. Whenever we deal with independent models of the $m$ objectives, the multivariate distribution is defined by the vector $\vec{\mu}$ of mean values of the marginal distributions and the vector $\vec{\sigma}$ of their corresponding standard deviations.
In the context of efficient global optimization and metamodel-assisted local search it is desirable to know the expected improvement of a point. By maximizing the expected improvement over a selection of points from $X$ we can detect promising solutions using the prediction tool, which may subsequently be evaluated by means of a real experiment or a computationally expensive analysis tool.
In this paper we are particularly interested in the expected improvement in hypervolume. The improvement in hypervolume is measured relative to a population of solution vectors $P = \{\vec{y}^{(1)}, \dots, \vec{y}^{(k)}\}$ in the objective space $\mathbb{R}^m$ by:

$$I(\vec{y}, P) = S(P \cup \{\vec{y}\}) - S(P). \qquad (3)$$

The expected improvement in hypervolume is now defined as

$$\mathrm{ExI}(x) = \int_{\vec{y} \in \mathbb{R}^m} I(\vec{y}, P) \cdot \mathrm{PDF}_x(\vec{y})\, d\vec{y} \qquad (4)$$

In [1] this integral is proposed for the purpose of metamodel-assisted multiobjective optimization. A Monte Carlo integration method was described for its computation. However, this method has limited accuracy, and a direct computation would be desirable. This paper proposes for the first time such a direct method for computing the expected improvement integral as a closed-form expression for $m > 1$, thereby generalizing the direct computation of Jones et al. [2], which is only defined for the single-objective case, i.e. $m = 1$.
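For comparison, a crude Monte Carlo estimate of expression (4) for $m = 2$ can be sketched as follows, assuming independent normal marginals and the hypothetical `s_metric_2d` helper from the sketch above; this only illustrates the integral that the direct procedure below evaluates exactly.

```python
import random

def exi_monte_carlo_2d(mu, sigma, P, y_ref, n_samples=100_000, seed=0):
    """Crude Monte Carlo estimate of the expected hypervolume improvement (m = 2)."""
    rng = random.Random(seed)
    s_old = s_metric_2d(P, y_ref)
    total = 0.0
    for _ in range(n_samples):
        # sample a realization of the prediction from the independent marginals
        y = (rng.gauss(mu[0], sigma[0]), rng.gauss(mu[1], sigma[1]))
        total += s_metric_2d(P + [y], y_ref) - s_old   # improvement I(y, P)
    return total / n_samples
```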

Computation Procedure

As with the probability of non-domination, i.e. $1 - \int_{\vec{y} \in \mathbb{R}^m} \mathbb{1}[P \prec \vec{y}]\,\mathrm{PDF}_x(\vec{y})\, d\vec{y}$, described in [1], the integration region $\mathbb{R}^m$ can be partitioned into a set of interval boxes, and piecewise integration can then solve the problem of computing the integral directly. To provide an intuition for the grid variables and areas introduced in the following, the reader may consult Fig. 3.
Let $b_i^{(1)}, b_i^{(2)}, \dots, b_i^{(k)}$ denote the sorted list of all the $i$-th coordinates of the vectors $\vec{y}^{(1)}, \dots, \vec{y}^{(k)}$. For technical reasons we define $b_i^{(0)} = -\infty$, $b_i^{(k+1)} = y_i^{ref}$, and $b_i^{(k+2)} = \infty$.
The grid coordinates $b_j^{(i)}$ give rise to a partitioning into grid cells. We can enumerate these grid cells as follows. For each $(i_1, \dots, i_m)$, where each index is in $\{0, \dots, k+1\}$, the grid cell named $C(i_1, \dots, i_m)$ is determined by $(b_1^{(i_1)}, \dots, b_m^{(i_m)})^T$ and $(b_1^{(i_1+1)}, \dots, b_m^{(i_m+1)})^T$ as the half-open (from below) interval box, i.e. $(b_1^{(i_1)}, b_1^{(i_1+1)}] \times (b_2^{(i_2)}, b_2^{(i_2+1)}] \times \dots \times (b_m^{(i_m)}, b_m^{(i_m+1)}]$. We call $(b_1^{(i_1)}, \dots, b_m^{(i_m)})^T$ the lower bound of $C(i_1, \dots, i_m)$, denoted by $\vec{l}(i_1, \dots, i_m)$; likewise we call $(b_1^{(i_1+1)}, \dots, b_m^{(i_m+1)})^T$ the upper bound of $C(i_1, \dots, i_m)$, denoted by $\vec{u}(i_1, \dots, i_m)$. With this notation we will also denote $C(i_1, \dots, i_m)$ by $(\vec{l}(i_1, \dots, i_m), \vec{u}(i_1, \dots, i_m)]$. See Figure 2 for an example.
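The grid construction translates directly into code; the following sketch (illustrative names, not from the paper) builds the padded coordinate lists $b_i^{(0)}, \dots, b_i^{(k+2)}$ and enumerates the cell index tuples.

```python
import math
from itertools import product

def grid_coordinates(P, y_ref):
    """Return b[i][j] = b_i^(j) for j = 0, ..., k+2, one list per objective i."""
    m, k = len(y_ref), len(P)
    return [[-math.inf] + sorted(p[i] for p in P) + [y_ref[i], math.inf]
            for i in range(m)]

def cell_bounds(b, index):
    """Lower and upper bound vectors of cell C(i_1, ..., i_m)."""
    lower = tuple(b[j][index[j]] for j in range(len(index)))
    upper = tuple(b[j][index[j] + 1] for j in range(len(index)))
    return lower, upper

def all_cells(b, k):
    """All index tuples (i_1, ..., i_m) with entries in {0, ..., k+1}."""
    return product(range(k + 2), repeat=len(b))
```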
It can be directly observed that there are many cells for which the integration adds a contribution of zero to the integral. These are cells that
1. have lower bounds $\vec{l}(i_1, \dots, i_m)$ that are dominated by or equal to points in $P$, i.e. $P \preceq \vec{l}(i_1, \dots, i_m)$, or
2. have upper bounds $\vec{u}(i_1, \dots, i_m)$ that do not dominate the reference point, i.e. $\vec{u}(i_1, \dots, i_m) \nprec \vec{y}^{\,ref}$.
The second criterion is fulfilled by grid cells $C(i_1, \dots, i_m)$ with $i_1 = k+1$ or $i_2 = k+1$ ... or $i_m = k+1$, i.e. at least one of their coordinates has index $k+1$.
All cells that fulfill criterion (1) or (2) will be called inactive cells, while the other cells will be termed active cells. The interior points of an active cell dominate the reference point and are not dominated by any point of $P$. Obviously, the expected improvement integral is the sum of the contributions of the improvement integral over the set of active cells $C^+$, i.e.

$$\mathrm{ExI}(x) = \sum_{C(i_1, \dots, i_m) \in C^+} \delta(i_1, \dots, i_m) \qquad (5)$$
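Criteria (1) and (2) and the sum in Eq. (5) can be sketched as follows, assuming the grid helpers above and a function `cell_contribution` implementing Eq. (7), which is shown further below.

```python
def is_active(lower, index, P, k):
    """A cell is inactive if some index equals k+1 (criterion 2) or if its
    lower bound is dominated by or equal to a point of P (criterion 1)."""
    if any(ij == k + 1 for ij in index):
        return False
    for p in P:
        if all(pi <= li for pi, li in zip(p, lower)):
            return False
    return True

def expected_improvement(mu, sigma, P, y_ref):
    """Sum of the active-cell contributions, cf. Eq. (5)."""
    b, k = grid_coordinates(P, y_ref), len(P)
    exi = 0.0
    for index in all_cells(b, k):
        lower, upper = cell_bounds(b, index)
        if is_active(lower, index, P, k):
            exi += cell_contribution(lower, upper, mu, sigma, P, y_ref)
        # inactive cells contribute zero and are skipped
    return exi
```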

Figure 2: Schematic drawing of a population, its hypervolume, and grid in the bi-objective case. The black points are the points of the population, except the point in the upper right corner, which marks the position of the reference point for the hypervolume. The yellow region defines the measured hypervolume $S$. The grid coordinates are indicated by $b_1^{(i)}$ and $b_2^{(i)}$ for the first and second coordinate, respectively. Grid cell $C(1, 1)$ is highlighted by a thick black boundary.

Figure 3: Schematic drawing of the integration area and grid in the bi-objective case. ($S^- = \{\vec{z} \in \mathbb{R}^m \mid P(\vec{u}) \prec \vec{z} \prec \vec{v}\}$, where $P(\vec{u}) = \{\vec{p} \in P \mid \vec{u} \preceq \vec{p}\}$.)

where

$$\delta(i_1, \dots, i_m) = \int_{\vec{y} \in (\vec{l}(i_1, \dots, i_m),\, \vec{u}(i_1, \dots, i_m)]} I(\vec{y}, P) \cdot \mathrm{PDF}_x(\vec{y})\, d\vec{y}. \qquad (6)$$

Note that it is important to choose half-open interval boxes here, as the lower bounds can extend to $-\infty$ and we have to avoid overlapping boundaries, which might cause double integration of non-zero contributions in case of zero standard deviations. Moreover, because of the possibility of zero standard deviations, Lebesgue integration is assumed.
Let us now discuss how the contribution $\delta(i_1, \dots, i_m)$ can be computed. First the computation will be described; later on, we will detail the derivation of the expression. Readers who are only interested in the implementation of the integral can omit the latter part.
The approach is discussed in a top-down way: we first provide a general expression for computing $\delta(i_1, \dots, i_m)$ and then define its components.
The contribution $\delta(i_1, \dots, i_m)$ of an active grid cell can be computed via

$$\delta(i_1, \dots, i_m) = \left(\prod_{j=1}^{m} \vartheta_j(i_1, \dots, i_m)\right) - \mathrm{Vol}(S^-) \cdot \prod_{i=1}^{m} \left(\Phi\!\left(\frac{u_i - \mu_i}{\sigma_i}\right) - \Phi\!\left(\frac{l_i - \mu_i}{\sigma_i}\right)\right) \qquad (7)$$

$$\vartheta_j(i_1, \dots, i_m) = \Psi(v_j(i_1, \dots, i_m), u_j(i_1, \dots, i_m), \mu_j, \sigma_j) - \Psi(v_j(i_1, \dots, i_m), l_j(i_1, \dots, i_m), \mu_j, \sigma_j) \qquad (8)-(9)$$

with $j = 1, \dots, m$. ($\Psi$ is defined in Equation 10. For $S^-$ see Figure 3.)
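A sketch of Eqs. (7)-(9) in Python (illustrative names: `psi` implements Eq. (10) and `std_normal_cdf` the standard normal CDF, both shown after Eq. (10); `vol_s_minus` computes the correction term $\mathrm{Vol}(S^-)$ and `v_vector` the vector $\vec{v}$ described next, both shown in later sketches):

```python
def cell_contribution(lower, upper, mu, sigma, P, y_ref):
    """delta(i_1, ..., i_m) for an active cell, cf. Eqs. (7)-(9)."""
    v = v_vector(lower, P, y_ref)            # the vector v(i_1, ..., i_m), see below
    # first term: product of the one-dimensional theta_j factors, Eqs. (8)-(9)
    theta_product = 1.0
    for j in range(len(mu)):
        theta_product *= (psi(v[j], upper[j], mu[j], sigma[j])
                          - psi(v[j], lower[j], mu[j], sigma[j]))
    # second term: Vol(S^-) times the Gaussian probability mass of the cell
    cell_mass = 1.0
    for i in range(len(mu)):
        cell_mass *= (std_normal_cdf((upper[i] - mu[i]) / sigma[i])
                      - std_normal_cdf((lower[i] - mu[i]) / sigma[i]))
    return theta_product - vol_s_minus(upper, v, P) * cell_mass
```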
We define a vector $\vec{v}(i_1, \dots, i_m) \in \mathbb{R}^m$ whose coordinates are defined as follows. The $j$-th coordinate of $\vec{v}(i_1, \dots, i_m)$ is the $j$-th coordinate of the intersection point, called $\vec{h}^{(j)}$, of the attainment surface of $P$ and the ray $\{\vec{y} \in \mathbb{R}^m \mid \exists t \ge 0 : \vec{y} = \vec{l}(i_1, \dots, i_m) + t\,\vec{e}^{(j)}\}$ (where $j \in \{1, \dots, m\}$). We can construct the $j$-th coordinate of $\vec{v}$ by finding the first vector in the sequence $\vec{l}(i_1, \dots, i_j, \dots, i_m)$, $\vec{l}(i_1, \dots, i_j + 1, \dots, i_m)$, $\vec{l}(i_1, \dots, i_j + 2, \dots, i_m)$, etc. which does not strictly dominate some point of $P$. The $j$-th coordinate of this point is taken as the $j$-th coordinate of $\vec{v}$ (cf. Fig. 3).
The computation of this boundary coordinate can be done by looping only over the discrete steps given by the grid nodes in the direction of the unit vector $\vec{e}^{(j)}$. In the case $m = 2$ the monotonicity of the staircase structure of the polyline describing the attainment curve can be exploited to determine this value efficiently.
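One possible implementation of this grid walk (a sketch; it uses the `dominates` helper from the introduction and steps through the grid values of objective $j$ until the point no longer dominates any member of $P$):

```python
def v_vector(lower, P, y_ref):
    """The vector v(i_1, ..., i_m): for each direction j, walk along the grid
    coordinates in direction e^(j), starting at the cell's lower bound, and
    stop at the first point that dominates no member of P."""
    m, v = len(lower), []
    for j in range(m):
        # grid values of objective j that can lie ahead of the lower bound
        steps = sorted({p[j] for p in P} | {y_ref[j], math.inf})
        point = list(lower)
        while any(dominates(point, p) for p in P):
            point[j] = min(s for s in steps if s > point[j])
        v.append(point[j])
    return v
```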
The integration of the marginal normal distributions is captured in the expression:

$$\Psi(a, b, \mu, \sigma) = \sigma\,\phi\!\left(\frac{b - \mu}{\sigma}\right) + (a - \mu)\,\Phi\!\left(\frac{b - \mu}{\sigma}\right) \qquad (10)$$

In this expression $\phi$ denotes the probability density function of the standard normal distribution, and $\Phi$ denotes the cumulative distribution function of the standard normal distribution, i.e.:

$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp(-x^2/2), \qquad \Phi(x) = \frac{1}{2}\left(1 + \mathrm{erf}\!\left(x/\sqrt{2}\right)\right) \qquad (11)$$
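Since erf is available in the Python standard library, $\Psi$ and the standard normal functions of Eqs. (10)-(11) can be written directly (a minimal sketch):

```python
import math

def std_normal_pdf(x: float) -> float:
    """phi(x): density of the standard normal distribution, Eq. (11)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x: float) -> float:
    """Phi(x): cumulative distribution function of the standard normal, Eq. (11)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def psi(a: float, b: float, mu: float, sigma: float) -> float:
    """Psi(a, b, mu, sigma) from Eq. (10)."""
    t = (b - mu) / sigma
    return sigma * std_normal_pdf(t) + (a - mu) * std_normal_cdf(t)
```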

Finally, $\mathrm{Vol}(S^-)$ is a correction term. It denotes the hypervolume measure of the subset of points in $P$ dominated by or equal to $\vec{u}(i_1, \dots, i_m)$, taken with respect to the reference point $\vec{v}(i_1, \dots, i_m)$ (see also Figure 3). This can be computed by means of a standard procedure for hypervolume computation. For the 2-D case, differences of the cumulated contributions to the hypervolume, obtained when integrating along $f_1$ over the staircase, can be exploited for its efficient computation.
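For $m = 2$ the correction term can be sketched by reusing the hypothetical `s_metric_2d` helper from the introduction, restricted to the subset $P(\vec{u})$ and with $\vec{v}$ as reference point; for $m > 2$ any standard hypervolume routine could be substituted.

```python
def vol_s_minus(u, v, P):
    """Vol(S^-) for m = 2: hypervolume of {p in P : u <= p componentwise}
    computed with respect to the reference point v."""
    dominated_by_u = [p for p in P if all(ui <= pi for ui, pi in zip(u, p))]
    return s_metric_2d(dominated_by_u, v)
```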

Derivation

The direct computation of the expected improvement contribution $\delta(i_1, \dots, i_m)$ can best be motivated by means of decomposing the general integral expression. For the sake of clarity, in the discussion of this section we fix the grid indices $i_1, \dots, i_m$ and omit them as parameters, i.e. we set $\vec{l} = \vec{l}(i_1, \dots, i_m)$, $\vec{u} = \vec{u}(i_1, \dots, i_m)$, $C = C(i_1, \dots, i_m)$, and so forth.
The improvement $I(\vec{y})$ in the grid cell $C$ can always be decomposed into the contribution of $L^+$, which is the hypervolume measure of the non-dominated part of $[\vec{u}, \vec{v}]$, and a boundary contribution $V$ that depends on the integration variable $\vec{y}$ and forms a boundary layer around the lower parts of $[\vec{u}, \vec{v}]$. The set $V$ is determined by the set difference

$$V = [\vec{y}, \vec{v}] \setminus [\vec{u}, \vec{v}] \qquad (12)$$

(in the following, we always assume the grid indices $i_1, \dots, i_m$). In summary, for points $\vec{y}$ within the cell $C(i_1, \dots, i_m)$ it holds that

$$I(\vec{y}) = \mathrm{Vol}([\vec{y}, \vec{v}] \setminus [\vec{u}, \vec{v}]) + \mathrm{Vol}(L^+) \qquad (13)$$

An important observation is that only the first term, namely $[\vec{y}, \vec{v}]$, depends on $\vec{y}$.
After these preliminaries, let us recall the expected improvement integral (expression 4). The contribution of cell $C(i_1, \dots, i_m)$ is given by:

$$\delta = \int_{l_1}^{u_1} \cdots \int_{l_m}^{u_m} I(y_1, \dots, y_m, P) \cdot \mathrm{PDF}_x(y_1, \dots, y_m)\, dy_1 \dots dy_m \qquad (14)$$

where $\vec{l}(i_1, \dots, i_m) = (l_1, \dots, l_m)$ and $\vec{u}(i_1, \dots, i_m) = (u_1, \dots, u_m)$. In order to compute $\delta$ explicitly, by reducing it, among other things, to an S-metric computation of $S^-$ (i.e., $\mathrm{Vol}(S^-)$), we define three quantities $A_1$, $A_2$, and $A_3$ as follows.
$$A_1 = \int_{l_1}^{u_1} \cdots \int_{l_m}^{u_m} \prod_{i=1}^{m}(v_i - y_i) \cdot \mathrm{PDF}_x(y_1, \dots, y_m)\, dy_1 \dots dy_m \qquad (15)$$

$$A_2 = \int_{l_1}^{u_1} \cdots \int_{l_m}^{u_m} \prod_{i=1}^{m}(v_i - u_i) \cdot \mathrm{PDF}_x(y_1, \dots, y_m)\, dy_1 \dots dy_m \qquad (16)$$

and

$$A_3 = \int_{l_1}^{u_1} \cdots \int_{l_m}^{u_m} \mathrm{Vol}(L^+) \cdot \mathrm{PDF}_x(y_1, \dots, y_m)\, dy_1 \dots dy_m \qquad (17)$$

Clearly $\delta = A_1 - A_2 + A_3$. The expression $-A_2 + A_3$ is equal to:

$$\int_{l_1}^{u_1} \cdots \int_{l_m}^{u_m} \left(\mathrm{Vol}(L^+) - \prod_{i=1}^{m}(v_i - u_i)\right) \mathrm{PDF}_x(y_1, \dots, y_m)\, dy_1 \dots dy_m \qquad (18)$$

Moreover, the expression $\mathrm{Vol}(L^+) - \prod_{i=1}^{m}(v_i - u_i)$ is the negative hypervolume measure for the set of points in $P$ dominated by or equal to $\vec{u}$, with reference point $\vec{v}$ (i.e., $\mathrm{Vol}(L^+) - \prod_{i=1}^{m}(v_i - u_i) = -\mathrm{Vol}(S^-)$, where $S^- = \{\vec{z} \in \mathbb{R}^m \mid P(\vec{u}) \prec \vec{z} \prec \vec{v}\}$ with $P(\vec{u}) = \{\vec{p} \in P \mid \vec{u} \preceq \vec{p}\}$, see also Figure 3).
In addition, the integral

$$\int_{l_1}^{u_1} \cdots \int_{l_m}^{u_m} \mathrm{PDF}_x(y_1, \dots, y_m)\, dy_1 \dots dy_m$$

is, in the case of independent normal distributions, equal to the term

$$\prod_{i=1}^{m}\left(\Phi\!\left(\frac{u_i - \mu_i}{\sigma_i}\right) - \Phi\!\left(\frac{l_i - \mu_i}{\sigma_i}\right)\right)$$

An expression for $A_1$ is more difficult to derive, due to the nonlinearity. First of all we decompose the integral into the product of one-dimensional integrals:

$$A_1 = \prod_{i=1}^{m} \int_{l_i}^{u_i} (v_i - z)\, \frac{1}{\sigma_i}\,\phi\!\left(\frac{z - \mu_i}{\sigma_i}\right) dz \qquad (19)$$

$$= \prod_{i=1}^{m} \left( \int_{-\infty}^{u_i} (v_i - z)\, \frac{1}{\sigma_i}\,\phi\!\left(\frac{z - \mu_i}{\sigma_i}\right) dz - \int_{-\infty}^{l_i} (v_i - z)\, \frac{1}{\sigma_i}\,\phi\!\left(\frac{z - \mu_i}{\sigma_i}\right) dz \right)$$

$$= \prod_{i=1}^{m} \left( \Psi(v_i, u_i, \mu_i, \sigma_i) - \Psi(v_i, l_i, \mu_i, \sigma_i) \right)$$

The last step is motivated by the following equality and the definition in Equation 10:

$$\int_{-\infty}^{b} (a - z)\, \frac{1}{\sigma}\,\phi\!\left(\frac{z - \mu}{\sigma}\right) dz = \sigma\,\phi\!\left(\frac{b - \mu}{\sigma}\right) + (a - \mu)\,\Phi\!\left(\frac{b - \mu}{\sigma}\right). \qquad (20)$$
The computation of $\Psi$ reduces to a one-dimensional integration, the details of which are as follows. Compute $\int_{-\infty}^{b} (a - y)\,\mathrm{pdf}(y)\, dy$, where $\mathrm{pdf}(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right)$. The answer is $\frac{\sigma}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{b - \mu}{\sigma}\right)^2} + \frac{a - \mu}{2}\, \mathrm{erf}\!\left(\frac{b - \mu}{\sqrt{2}\,\sigma}\right) + \frac{a - \mu}{2}$. Simplified: $\sigma\,\phi\!\left(\frac{b - \mu}{\sigma}\right) + (a - \mu)\,\Phi\!\left(\frac{b - \mu}{\sigma}\right)$, with $\phi(y) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y^2}$ being the probability density function of the standard normal distribution and $\Phi(y) = \frac{1}{2}\left(1 + \mathrm{erf}\!\left(\frac{y}{\sqrt{2}}\right)\right)$ being the cumulative distribution function of the standard normal distribution.
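As a sanity check of Eq. (20) (and thus of $\Psi$), the closed form can be compared against a brute-force Riemann sum with illustrative parameter values (a sketch, assuming the `psi` and `std_normal_pdf` helpers from above):

```python
def psi_numerical(a, b, mu, sigma, n=200_000, lo=-50.0):
    """Midpoint Riemann-sum approximation of the integral in Eq. (20)."""
    h = (b - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        total += (a - y) * std_normal_pdf((y - mu) / sigma) / sigma
    return total * h

if __name__ == "__main__":
    a, b, mu, sigma = 3.0, 1.5, 0.5, 2.0
    print(psi(a, b, mu, sigma), psi_numerical(a, b, mu, sigma))  # values should agree closely
```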
Now all components used in the computation of $\delta$ have been derived, and we obtain the computation procedure summarized in the previous section. The computation is relatively simple to implement. It needs no numerical integration (except, of course, for the computation of the function erf). Implementations of the erf function can, however, be found in most programming languages and statistical packages.

Conclusion

The integral of the expected improvement in hypervolume can be computed directly in the case of more than one objective. The proposed computation procedure is highly accurate, and it can be applied for arbitrary dimensions of the objective space. In the 2-dimensional case the running time is at most quadratic in the number of points in the Pareto front approximation set. An open problem is whether it is possible to find a more efficient computation of the expected improvement for higher-dimensional objective spaces, for example by merging neighboring interval boxes. However, the given procedure can be considered a very accurate and often faster alternative to the Monte Carlo integration method. It is notable that, despite the nonlinearities caused by the normal distribution function, the described procedure uses only expressions that can be computed directly; no numerical integration is needed (except for the computation of erf).

References
[1] M. Emmerich: Single- and Multiobjective Evolutionary Design Optimization Using Gaussian Random Field Metamodels. PhD Thesis, FB Informatik, University of Dortmund, 2005.
[2] D. Jones, M. Schonlau, W. Welch: Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Optimization, Vol. 13, pp. 455-492, 1998.
[3] J.W. Klinkenberg, M. Emmerich, A. Deutz: Expected Improvement of the S-Metric for Finite Pareto Front Approximations. Proc. of MCDM 2008, Auckland, NZ.
