Penalty function methods using matrix laboratory (MATLAB)

Hailay Weldegiorgis Berhe
The purpose of this study was to investigate how effectively penalty function methods solve constrained optimization problems. The approach in these methods is to transform the constrained optimization problem into an equivalent unconstrained problem, which is then solved using one of the algorithms for unconstrained optimization. Algorithms and matrix laboratory (MATLAB) codes are developed, using Powell's method for the unconstrained subproblems, and are then applied to problems that have appeared frequently in the optimization literature and have been solved by a variety of techniques, so that the results can be compared with other algorithms. The research found that the sequential transformation methods converge at least to a local minimum in most cases, without the need for convexity assumptions and with no requirement that the objective and constraint functions be differentiable. For problems with non-convex functions, it is recommended to solve the problem from different starting points, with different penalty parameters and penalty multipliers, and to take the best solution. For the exact penalty methods, on the other hand, convexity assumptions and second-order sufficiency conditions for a local minimum are needed for the solution of the unconstrained optimization problem to converge to the solution of the original problem with a finite penalty parameter. In these methods a single application of an unconstrained minimization technique, as against a sequence of such minimizations in the sequential methods, is used to solve the constrained optimization problem.

Key words: Penalty function, penalty parameter, augmented Lagrangian penalty function, exact penalty function, unconstrained representation of the primal problem.
INTRODUCTION
Optimization is the act of obtaining the best result under given circumstances. In design, construction and maintenance of any engineering system, engineers have to take many technological and managerial decisions at several stages. The ultimate goal of all such decisions is either to minimize the effort required or to maximize the desired benefit. Since the effort required or the benefit desired in any practical situation can be expressed as a function of certain decision variables, optimization can be defined as the process of finding the conditions that give the maximum or minimum value of a function. It can be taken to mean minimization, since the maximum of a function can be found by seeking the minimum of the negative of the same function.

Optimization problems can be constrained or unconstrained. The presence of constraints in a nonlinear program creates more difficulty in finding the minimum as compared to unconstrained problems. Several situations can be identified depending on the effect of the constraints on the objective function. The simplest situation is when the constraints do not have any influence on the minimum point. Here the constrained minimum of the problem is the same as the unconstrained minimum; that is, the constraints do not have any influence on the objective function. For simple optimization problems it may be possible to determine beforehand whether or not the constraints have any influence on the minimum point. However, in most practical problems it will be extremely difficult to identify this. Thus, one has to proceed with the general assumption that the constraints will have some influence on the optimum point. The minimum of a nonlinear programming problem will not be, in general, an extreme point of the feasible region and may not even be on the boundary. Also, the
210 Afr. J. Math. Comput. Sci. Res.
problem may have local minima even if the corresponding unconstrained problem has no local minima. Furthermore, none of the local minima may correspond to the global minimum of the unconstrained problem. All these characteristics are direct consequences of the introduction of constraints, and hence we should have general algorithms to overcome these kinds of minimization problems.

The algorithms for minimization are iterative procedures that require starting values of the design variable x. If the objective function has several local minima, the initial choice of x determines which of these will be computed. There is no guaranteed way of finding the global optimal point. One suggested procedure is to make several computer runs using different starting points and pick the best. The majority of available methods are designed for unconstrained optimization, where no restrictions are placed on the design variables. In these problems, the minima exist if they are stationary points (points where the gradient vector of the objective function vanishes). There are also special algorithms for constrained optimization problems, but they are not easily accessible due to their complexity and specialization.

All of the many methods available for the solution of a constrained nonlinear programming problem can be classified into two broad categories, namely, the direct methods and the indirect methods. In the direct methods the constraints are handled in an explicit manner, whereas in most of the indirect methods the constrained problem is solved as a sequence of unconstrained minimization problems or as a single unconstrained minimization problem. Here we are concerned with the indirect methods of solving constrained optimization problems. A large number of methods and their variations are available in the literature for solving constrained optimization problems using indirect methods. As is frequently the case with nonlinear problems, there is no single method that is clearly better than the others. Each method has its own strengths and weaknesses. The quest for a general method that works effectively for all types of problems continues. The main purpose of this research is to present the development of two methods that are generally considered for solving constrained optimization problems: the sequential transformation methods and the exact transformation methods.

Sequential transformation methods are the oldest methods, also known as sequential unconstrained minimization techniques (SUMT), based upon the work of Fiacco and McCormick (1968). They are still among the most popular ones for some classes of problems, although there are some modifications that are more often used. These methods help us to remove a set of complicating constraints of an optimization problem and give us a framework to exploit any available methods for unconstrained optimization problems, perhaps approximately. However, this is not without a cost. In fact, this transforms the problem into a problem of non-smooth (in most cases) optimization, which has to be solved iteratively. The sequential transformation method is also called the classical approach and is perhaps the simplest to implement. Basically, there are two alternative approaches. The first is called the exterior penalty function method (commonly called the penalty function method), in which a penalty term is added to the objective function for any violation of the constraints. This method generates a sequence of infeasible points, hence its name, whose limit is an optimal solution to the original problem. The second is called the interior penalty function method (commonly called the barrier function method), in which a barrier term that prevents the points generated from leaving the feasible region is added to the objective function. The method generates a sequence of feasible points whose limit is an optimal solution to the original problem.

Penalty function methods are procedures for approximating constrained optimization problems by unconstrained problems. The approximation is accomplished by adding to the objective function a term that prescribes a high cost for the violation of the constraints. Associated with this method is a parameter µ that determines the severity of the penalty and consequently the degree to which the unconstrained problem approximates the original problem. As µ → ∞ the approximation becomes increasingly accurate.

Thus, there are two fundamental issues associated with this method. The first has to do with how well the unconstrained problem approximates the constrained one. This is essential in examining whether, as the parameter µ is increased towards infinity, the solution of the unconstrained problem converges to a solution of the constrained problem. The other issue, most important from a practical viewpoint, is the question of how to solve a given unconstrained problem when its objective function contains a penalty term. It turns out that, as µ is increased to yield a good approximating problem, the structure of the resulting unconstrained problem becomes increasingly unfavorable, thereby slowing the convergence rate of many algorithms that may be applied. Therefore it is necessary to devise acceleration procedures that circumvent this slow convergence phenomenon. To motivate the idea of penalty function methods, consider the following nonlinear programming problem with only inequality constraints:

Minimize f(x)
subject to g(x) ≤ 0    (P)
x ∈ X,

whose feasible region we denote by S = {x ∈ X : g(x) ≤ 0}. The functions f: Rn → R and g: Rn → Rm are assumed to be continuously differentiable and X is a nonempty set in Rn. Let the set of minimum points of problem (P) be denoted by M(f, S), where M(f, S) ≠ ∅, and consider a real sequence {µk} such that µk ≥ 0. The number µk is called the penalty parameter, which controls the degree of penalty for violating the constraints. Now we consider functions θ: X × (R+ ∪ {0}) → R, as defined by
θ(x, µ) := f(x) + µp(x), (x, µ) ∈ X × (R+ ∪ {0}),    (1.1)

where p is a penalty function of the kind defined below.

For an equality-constrained problem penalized as f(x) + µh²(x), we can intuitively see that an optimal solution must have h²(x) close to zero; otherwise a large penalty term µh²(x) will be incurred, and hence f(x) + µh²(x) approaches infinity, which makes it difficult to minimize the unconstrained problem (Bazaraa).

For a problem with only inequality constraints g(x) ≤ 0, the penalty term is built from a continuous function φ satisfying

φ(y) = 0 if y ≤ 0 and φ(y) > 0 if y > 0.    (2.1b)

Typically φ is of the form

φ(y) = (maximum {0, y})^q,

where q is a nonnegative real number. Thus, the penalty function p is usually of the form

p(x) = Σ (maximum {0, gi(x)})^q, the sum taken over i = 1, . . . , m.

Definition 1

A function p : Rn → R is called a penalty function if p satisfies:

i. p(x) is continuous on Rn,
ii. p(x) = 0 if g(x) ≤ 0, and
iii. p(x) > 0 otherwise.

An often-used class of penalty functions for optimization problems with only inequality constraints is

p(x) = Σ (maximum {0, gi(x)})^q, with q ≥ 1.

We refer to the function f(x) + μp(x) as the auxiliary function, and denote θ(x, μ) := f(x) + μ Σ (maximum {0, gi(x)})^q. The effect of the second term on the right side is to increase θ(x, μ) in proportion to the qth power of the amount by which the constraints are violated. Thus there will be a penalty for violating the constraints, and the amount of penalty will increase at a faster rate than the amount of violation of a constraint for q > 1 (Rao, 2009). Let us see the behavior of θ(x, μ) for various values of q.

i. q = 0. In this case

θ(x, μ) = f(x) + μ × (the number of constraints violated at x).

This function is discontinuous on the boundary of the acceptable region, and hence it will be very difficult to minimize.

ii. 0 < q < 1. Here the θ-function will be continuous, but the penalty for violating a constraint may be too small. Also, the derivatives of the function are discontinuous along the boundary. Thus, it will be difficult to minimize the θ-function.

iii. q = 1. In this case, under certain restrictions, it has been shown that there exists a μo large enough that the minimum of θ is exactly the constrained minimum of the original problem for all μk ≥ μo; however, the contours of the θ-function possess discontinuous first derivatives along the boundary. Hence, in spite of the convenience of choosing a single μk that yields the constrained minimum in one unconstrained minimization, the method is not very attractive from a computational point of view.

iv. q > 1. The θ-function will have continuous first derivatives. These derivatives are given by

∂θ/∂xj = ∂f/∂xj + μ Σ q (maximum {0, gi(x)})^(q−1) ∂gi/∂xj.

Generally, the value of q is chosen as 2 in practical computations, and hence q = 2 will be used in the subsequent discussion of the penalty method, with

p(x) = Σ (maximum {0, gi(x)})².

Consider the optimization problem in Example 1 with p(x) of this form. The minimum of f + μp occurs at the point 2 − (…) and approaches the minimum point x = 2 of the original problem as μ → ∞. The penalty and auxiliary functions are as shown in Figure 1.

If the constraints are of the form gi(x) ≤ 0 for i = 1, . . . , m and hi(x) = 0 for i = 1, . . . , l, then a suitable penalty function p is defined by

p(x) = Σ φ(gi(x)) + Σ ψ(hi(x)),    (2.2a)

where φ and ψ are continuous functions satisfying the following properties:

φ(y) = 0 if y ≤ 0 and φ(y) > 0 if y > 0,
ψ(y) = 0 if y = 0 and ψ(y) > 0 if y ≠ 0.    (2.2b)

Typically, φ and ψ are of the forms

φ(y) = (maximum {0, y})^q,
ψ(y) = |y|^q,

where q is a nonnegative real number. Thus, the penalty function p is usually of the form

p(x) = Σ (maximum {0, gi(x)})^q + Σ |hi(x)|^q.

Definition 2

A function p : Rn → R is called a penalty function if p satisfies:

i. p(x) is a continuous function on Rn,
ii. p(x) = 0 if g(x) ≤ 0 and h(x) = 0, and
iii. p(x) > 0 otherwise.
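With q = 2 and mixed constraints as in (2.2a), p and its gradient can be written down directly. The sketch below is in Python rather than the paper's MATLAB, and the two constraints are my own illustrative choices; it builds the q = 2 penalty for one inequality and one equality constraint and checks the analytic gradient 2 max{0, g(x)} ∇g(x) + 2 h(x) ∇h(x), the q = 2 case of the derivative formula above, against central finite differences.

```python
# q = 2 penalty for one inequality and one equality constraint
# (illustrative constraints, not from the paper):
#   g(x) = x1 + x2 - 2 <= 0,   h(x) = x1 - x2 = 0.

def p(x):
    gv = max(0.0, x[0] + x[1] - 2.0)
    hv = x[0] - x[1]
    return gv ** 2 + hv ** 2

def grad_p(x):
    # Analytic gradient: 2*max{0, g(x)}*grad g + 2*h(x)*grad h,
    # with grad g = (1, 1) and grad h = (1, -1).
    gv = max(0.0, x[0] + x[1] - 2.0)
    hv = x[0] - x[1]
    return (2.0 * gv + 2.0 * hv, 2.0 * gv - 2.0 * hv)

def fd_grad(fun, x, eps=1e-6):
    # Central finite differences, used only to check the analytic gradient.
    out = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        out.append((fun(xp) - fun(xm)) / (2.0 * eps))
    return tuple(out)

point = (2.0, 0.5)  # infeasible: g = 0.5 > 0 and h = 1.5 != 0
```

At this infeasible point the analytic gradient is (4, −2), and the finite-difference check agrees to within the truncation error of the differencing.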
Figure 1. Penalty and auxiliary functions.
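The limiting behavior pictured in Figure 1 can be reproduced numerically. Example 1 itself is not shown in this excerpt, so the Python sketch below assumes a problem of the same shape (minimize f(x) = x subject to g(x) = 2 − x ≤ 0, an assumption on my part), for which the minimizer of the auxiliary function is 2 − 1/(2μ) and approaches x = 2 as μ → ∞.

```python
def f(x):
    return x

def p(x):
    # q = 2 penalty for g(x) = 2 - x <= 0.
    return max(0.0, 2.0 - x) ** 2

def aux_min(mu, a=-5.0, b=5.0, tol=1e-12):
    # Ternary search for the minimizer of the auxiliary function f + mu*p;
    # valid because the auxiliary function is convex, hence unimodal.
    def theta(x):
        return f(x) + mu * p(x)
    while b - a > tol:
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if theta(m1) < theta(m2):
            b = m2
        else:
            a = m1
    return 0.5 * (a + b)

xs = [aux_min(mu) for mu in (1.0, 10.0, 100.0)]  # approx 1.5, 1.95, 1.995
```

Each minimizer lies outside the feasible region x ≥ 2 and creeps toward the boundary as μ grows, which is exactly the exterior-penalty behavior sketched in Figure 1.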
An often used class of penalty functions for this case is:

p(x) = Σ (maximum {0, gi(x)})^q + Σ |hi(x)|^q, where q ≥ 1.

As before, the set of minimum points of θ is denoted by M(f, X) and the set of all minimum points of the original problem (P) is denoted by M(f, S).

The representation of penalty methods above has assumed either that the problem (P) has no equality constraints, or that the equality constraints have been converted into inequality constraints. For the latter, the conversion is easy to do, but it usually violates good judgment in that it unnecessarily complicates the problem. Furthermore, it can cause the linear independence condition to be automatically violated for every feasible solution. Therefore, let us instead consider the constrained optimization problem (P) with both inequality and equality constraints, since the preceding case can easily be recovered from this one. To describe penalty methods for problems with mixed constraints, we denote the penalty parameter by l(μ) = μ ≥ 0, which is a monotonically increasing function, and the penalty function by P(x) = Σ φ(gi(x)) + Σ ψ(hi(x)), satisfying the properties given in (2.2b), and then consider the following primal and penalty problems.

Primal problem:

Minimize f(x)
subject to g(x) ≤ 0
h(x) = 0    (P)
x ∈ X.

Penalty problem:

Maximize θ(μ)
subject to μ ≥ 0,

where θ(μ) = inf{f(x) + μp(x) : x ∈ X}. The penalty problem consists of maximizing the infimum (greatest lower bound) of the function {f(x) + μp(x) : x ∈ X}; therefore, it is a max-min problem. The original problem can then be solved by computing θ(µ) for a sufficiently large µ. This result is established in Theorem 2. First, the following lemma is needed.

Lemma 1 (Penalty Lemma)

Suppose that f, g1, . . . , gm, h1, . . . , hl are continuous functions on Rn, and let X be a nonempty set in Rn. Let p be a continuous function on Rn as given by Definition 2, and suppose that for each µ there exists an xµ ∈ X which is a solution of θ(µ), where θ(µ) := f(xµ) + µp(xµ). Then the following statements hold:

1. p(xµ) is a non-increasing function of µ.
2. f(xµ) is a non-decreasing function of µ.
3. θ(µ) is a non-decreasing function of µ.
4. inf{f(x) : x ∈ S} ≥ sup{θ(µ) : µ ≥ 0}, where θ(µ) = inf{f(x) + µp(x) : x ∈ X}, and g, h are vector valued functions whose components are g1, g2, . . . , gm and h1, h2, . . . , hl respectively.

Proof: Assume that µ and λ are penalty parameters such that λ < µ. Since θ(λ) = f(xλ) + λp(xλ) = inf{f(x) + λp(x) : x ∈ X} and xµ ∈ X,

f(xλ) + λp(xλ) ≤ f(xµ) + λp(xµ),    (2.3a)

and similarly, since θ(µ) = f(xµ) + µp(xµ) = inf{f(x) + µp(x) : x ∈ X} and xλ ∈ X,

f(xµ) + µp(xµ) ≤ f(xλ) + µp(xλ).    (2.3b)

Adding equations (2.3a) and (2.3b) gives:

f(xλ) + λp(xλ) + f(xµ) + µp(xµ) ≤ f(xµ) + λp(xµ) + f(xλ) + µp(xλ),

that is, (µ − λ)(p(xµ) − p(xλ)) ≤ 0; since µ > λ, this proves part 1.
4. Suppose x̄ is an optimal feasible solution to problem (P), with g(x̄) ≤ 0, h(x̄) = 0 and p(x̄) = 0, where x̄ ∈ X. Then

f(x̄) + µp(x̄) = f(x̄) = inf{f(x) : x ∈ S}.    (2.3c)

By the definition of θ(µ),

θ(µ) = f(xµ) + µp(xµ) ≤ f(x̄) + µp(x̄) = inf{f(x) : x ∈ S}, for all µ ≥ 0.

Therefore, sup{θ(µ) : µ ≥ 0} ≤ inf{f(x) : x ∈ S}.

The next result concerns convergence of the penalty method. It is assumed that f(x) is bounded below on the (nonempty) feasible region, so that the minimum exists.

Theorem 2 (Penalty convergence theorem)

Consider the following primal problem:

Minimize f(x)
subject to g(x) ≤ 0
h(x) = 0    (P)
x ∈ X,

where f, g, h are continuous functions on Rn and X is a nonempty set in Rn. Suppose that the problem has a feasible solution, and that p is a continuous function of the form (2.2). Furthermore, suppose that for each µ there exists a solution xµ ∈ X to the problem minimize {f(x) + µp(x) : x ∈ X}, and that {xµ} is contained in a compact subset of X. Then

inf{f(x) : x ∈ S} = sup{θ(µ) : µ ≥ 0} = lim µ→∞ θ(µ),

where θ(µ) = inf{f(x) + µp(x) : x ∈ X} = f(xµ) + µp(xµ). Furthermore, the limit x̄ of any convergent subsequence of {xµ} is an optimal solution to the original problem, and µp(xµ) → 0 as µ → ∞.

Proof: We first show that p(xµ) → 0 as µ → ∞. Let y be any feasible solution, and let x1 denote the solution obtained for µ = 1, so that f(xµ) ≥ f(x1) for µ ≥ 1 by part 2 of Lemma 1. Suppose, for a contradiction, that there is an ε > 0 such that p(xµ) > ε for arbitrarily large µ. Then, for such a µ with µ ≥ (1/ε)|f(y) − f(x1)| + 2,

inf{f(x) : x ∈ S} ≥ θ(µ) = f(xµ) + µp(xµ) ≥ f(x1) + µp(xµ)
> f(x1) + ((1/ε)|f(y) − f(x1)| + 2)ε
= f(x1) + |f(y) − f(x1)| + 2ε > f(y),

so it follows that inf{f(x) : x ∈ S} > f(y). This is not possible in view of the feasibility of y. Thus, p(xµ) ≤ ε for all µ ≥ (1/ε)|f(y) − f(x1)| + 2 and, since ε > 0 is arbitrary, p(xµ) → 0 as µ → ∞.

Now let {xµk} be any arbitrary convergent subsequence of {xµ}, and let x̄ be its limit. Then

sup{θ(µ) : µ ≥ 0} ≥ θ(µk) = f(xµk) + µk p(xµk) ≥ f(xµk).

Since xµk → x̄ and f is a continuous function with lim f(xµk) = f(x̄), the above inequality implies that

sup{θ(µ) : µ ≥ 0} ≥ f(x̄).    (2.4)

Since p(xµ) → 0 as µ → ∞, we have p(x̄) = 0; that is, x̄ is a feasible solution to the original problem (P), so that inf{f(x) : x ∈ S} ≤ f(x̄).

By part 3 of Lemma 1, θ(µ) is a nondecreasing function of µ, so

sup{θ(µ) : µ ≥ 0} = lim µ→∞ θ(µ).    (2.5a)

Together with (2.4) and part 4 of Lemma 1, the feasibility of x̄ implies that x̄ is an optimal solution to (P), that is,

inf{f(x) : x ∈ S} = f(x̄),    (2.5b)

and by part 4 of Lemma 1,

sup{θ(µ) : µ ≥ 0} ≤ inf{f(x) : x ∈ S}.    (2.5c)

Combining (2.4), (2.5a), (2.5b) and (2.5c), we get

inf{f(x) : x ∈ S} = sup{θ(µ) : µ ≥ 0} = lim µ→∞ θ(µ).
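Theorem 2's conclusions can be observed numerically with a derivative-free search. The Python sketch below (my own illustrative problem, not from the paper) minimizes θ(x, µ) = x² + µ max{0, 1 − x}² for an increasing sequence of µ: the minimizers approach the constrained optimum x* = 1 from outside the feasible region, and the penalty term µp(xµ) tends to 0.

```python
def theta(x, mu):
    # Quadratic-penalty auxiliary function for: min x^2 s.t. 1 - x <= 0.
    return x * x + mu * max(0.0, 1.0 - x) ** 2

def argmin_1d(fun, a=-5.0, b=5.0, tol=1e-10):
    # Ternary search; valid because theta(., mu) is convex, hence unimodal.
    while b - a > tol:
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if fun(m1) < fun(m2):
            b = m2
        else:
            a = m1
    return 0.5 * (a + b)

mus = (1.0, 10.0, 100.0, 1000.0)
minimizers = [argmin_1d(lambda x, m=m: theta(x, m)) for m in mus]
penalty_terms = [m * max(0.0, 1.0 - x) ** 2 for m, x in zip(mus, minimizers)]
```

In closed form xµ = µ/(1 + µ) and µp(xµ) = µ/(1 + µ)², so the iterates are always infeasible (xµ < 1) yet converge to the boundary optimum while the weighted penalty vanishes, exactly as the theorem asserts.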
Since gi(x̄) < 0 for all elements of N, then by Theorem 2.2 we have gi(xµ) < 0 for sufficiently large μ, which results in (…) = 0 (by assumption). Hence, we can write the foregoing identity as (…), from which it follows that (…) ≥ 0 for i ∈ I for all μ sufficiently large, while (…) → (…) as μ → ∞, so that (…) = 0 for i ∈ N. From the continuity of all functions involved, solving these equations results in (…) = 2/3, (…) = 2/3, (…) = 4/3 (u = 0 yields an infeasible solution).

To consider this example using the penalty method, define the penalty function (…) and the corresponding penalty problem:

Minimize (…)
subject to (…).

Thus, (…) = (…) and (…) = (…) for any fixed µ. When μ → ∞, this converges to the optimum solution (…).

Consider the constrained optimization problem (…), and now suppose we use (2.8) to define (…). Thus, (…) = 2μ(…), which is 2µ times a matrix that approaches (…). This matrix has rank equal to the rank of the active constraints at the solution (Luenberger, 1974). Assuming that there are r1 active constraints at the solution, then, for well-behaved problems, the matrix Q(x, µ) with only inequality constraints has r1 eigenvalues that tend to ∞ as μ → ∞, but the remaining n − r1 eigenvalues, though varying with μ, tend to finite limits. These limits turn out to be the eigenvalues of L(x̄) restricted to the tangent subspace M of the active constraints. The other matrix, 2μ(…) with l equality constraints, has rank l. As μ → ∞, the matrix Q(x, µ) with only equality constraints has l eigenvalues that approach infinity while the n − l eigenvalues approach some finite limits. Consequently, we can expect a severely ill-conditioned Hessian matrix for large values of μ.

Example 6

Consider the auxiliary function θ(x, µ) = (…) + µ(…) of Example 2.5. The Hessian is

H = (…).

Suppose we want to find its eigenvalues by solving det |H − λI| = 0:

|H − λI| = λ² − (6 + 4μ)λ + 8μ + 12.

This quadratic equation yields

λ1 = (3 + 2μ) − (…) and λ2 = (3 + 2μ) + (…).

Note that λ2 → ∞ as µ → ∞, while λ1 remains finite; hence, the condition number of H approaches ∞ as µ → ∞. Taking the ratio of the largest and the smallest eigenvalue, it should be clear that as μ → ∞ the limit of this ratio also goes to ∞. This indicates that as the iterations proceed and we start to increase the value of μ, the Hessian of the unconstrained function that we are minimizing becomes increasingly ill-conditioned. This is a common situation and is especially problematic if we are using a method for the unconstrained optimization that requires the use of the Hessian.

Considering equation (2.10) with both equality and inequality constraints, we have that, as μ → ∞, xµ → x̄, where x̄ is a local solution to the constrained minimization problem (P), and that it satisfies

h(x̄) = 0, gA(x̄) = 0 and gI(x̄) < 0,

where gA and gI denote the induced partitioning of g into r1 active and r2 inactive constraints, respectively. Assuming that the l gradients of h and the r1 gradients of gA evaluated at x̄ together are linearly independent, x̄ is said to be regular. It follows from this expression that, for large µ and for x close to the solution of (P), the matrix Q has l + r1 eigenvalues of the order of µ. Consequently, we can expect a severely ill-conditioned Hessian matrix for large values of μ. Since the rate of convergence of the method of steepest descent applied to a functional is determined by the ratio of the smallest to the largest eigenvalues of the Hessian of that functional, it follows in particular that the steepest descent method applied to θ converges slowly for large μ.

In examining the structure of Q, note first that, as µ is increased, the solution of the penalty problem approaches the solution of the original problem, and hence the neighborhood in which attention is focused for convergence analysis is close to the true solution; the standard type of analysis will therefore be applicable to the tail of such a sequence (Luenberger, 1974).

Unconstrained minimization techniques and penalty function methods

Here we mainly concentrate on the problem of efficiently solving the unconstrained subproblems arising in a penalty method. The main difficulty, as explained above, is the extremely unfavorable eigenvalue structure. Certainly a straightforward application of the method of steepest descent is out of the question.

Newton's method and penalty function methods

One method for avoiding slow convergence for these problems is to apply Newton's method (or one of its variations), since the order-two convergence of Newton's method is unaffected by the poor eigenvalue structure.
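The growth of the condition number with µ, discussed above, can be reproduced on a small example. The Python sketch below (my own problem data, not the paper's Example 6) penalizes the equality constraint x + y = 1 for f(x, y) = x² + y²; the Hessian of the auxiliary function is [[2 + 2µ, 2µ], [2µ, 2 + 2µ]], with eigenvalues 2 and 2 + 4µ (one finite limit and one eigenvalue of order µ, matching the l = 1 equality-constraint case), so the condition number is 1 + 2µ.

```python
import math

def penalized_hessian(mu):
    # Hessian of theta(x, y) = x^2 + y^2 + mu*(x + y - 1)^2, constant in (x, y).
    return [[2.0 + 2.0 * mu, 2.0 * mu],
            [2.0 * mu, 2.0 + 2.0 * mu]]

def eigenvalues_2x2_sym(h):
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    tr = h[0][0] + h[1][1]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    root = math.sqrt((tr / 2.0) ** 2 - det)
    return tr / 2.0 - root, tr / 2.0 + root

def condition_number(mu):
    lo, hi = eigenvalues_2x2_sym(penalized_hessian(mu))
    return hi / lo

kappas = [condition_number(mu) for mu in (1.0, 10.0, 100.0)]
```

The condition number grows linearly in µ here, which is why gradient-based methods applied directly to θ slow down as the penalty parameter is pushed up.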
In applying the method, however, special care must be devoted to the manner in which the Hessian is inverted, since it is ill-conditioned. Nevertheless, if second-order information is easily available, Newton's method offers an extremely attractive and effective method for solving modest-size penalty and barrier optimization problems. When such information is not readily available, or if the data handling and storage requirements of Newton's method are excessive, attention naturally focuses on zero-order or first-order methods.

Conjugate gradients and penalty function methods

According to Luenberger (1984), the partial conjugate gradient method for solving unconstrained problems is ideally suited to penalty and barrier problems having only a few active constraints. If there are l active constraints, then taking cycles of l + 1 conjugate gradient steps will yield a rate of convergence that is independent of µ. For example, consider the problem having only equality constraints:

Minimize f(x)    (P)
subject to h(x) = 0,

where x ∈ Rn, h(x) ∈ Rl, l < n. Applying the standard quadratic penalty method, we solve instead the unconstrained problem:

Minimize f(x) + µ Σ hi(x)²,

for large µ. The objective function of this problem has a Hessian matrix with l eigenvalues that are of order µ in magnitude, while the remaining n − l eigenvalues are close to the eigenvalues of the matrix LM corresponding to the primal problem (P). Thus, letting xµ+1 be determined from xµ by making l + 1 steps of a (nonquadratic) conjugate gradient method, and assuming xµ → x̄, a solution to (P), the sequence {f(xµ)} converges linearly to f(x̄) with a convergence ratio approximately equal to

((β − α) / (β + α))²,

where α and β are, respectively, the smallest and largest eigenvalues of LM(x̄). This is an extremely effective technique when l is relatively small. The method can be used for problems having inequality constraints as well, but it is advisable to change the cycle length depending on the number of constraints active at the end of the previous cycle.

Here we will use Powell's method, which is a zero-order method. Powell's method is an extension of the basic pattern search methods. It is the most widely used direct search method and can be proved to be a method of conjugate directions. It is as effective as the first-order methods, like the gradient method, for solving unconstrained optimization problems. The reasons why we use it here are:

a. First, it is usually assumed that the objective and constraint functions are continuous and smooth (continuously differentiable). Experience has shown this to be a more theoretical than practical requirement, and this restriction is routinely violated in engineering design and in some facility location problems. Therefore it is better to develop a general code that solves both differentiable and non-differentiable problems.
b. The input of the derivative, if it exists, is tiresome for problems with a large number of variables. In spite of its advantages, Newton's method, for example, is not generally used in practice due to the following features of the method:

i. It requires the storing of the n × n Hessian matrix of the objective function,
ii. It becomes very difficult and sometimes impossible to compute the elements of the Hessian matrix of the objective function,
iii. It requires the inversion of the Hessian matrix of the objective function at each step,
iv. It requires the evaluation of the product of the inverse of the Hessian matrix of the objective function and the negative of the gradient of the objective function at each step.

Because of the above reasons, I do not prefer first- and second-order methods and did not give more emphasis to these methods and their algorithms.

Finally, we should not use second-order gradient methods (e.g., pure Newton's method) with the quadratic loss penalty function for inequality constraints, since the Hessian is discontinuous (Belegundu and Chandrupatla, 1999). To see this clearly, consider:

Minimize f(x) = 100/x
subject to g(x) = x − 5 ≤ 0,

with f(x) being a monotonically decreasing function of x. At the optimum x* = 5, the gradient of the penalty term is 2µ max(0, x − 5). Regardless of whether we approach from the left or the right, the value of this gradient at x* is zero, so the penalty is first-order differentiable. However, its second derivative is 0 when approaching from the left, while it is 2µ when approaching from the right. Thus, the penalty function is not second-order differentiable at the optimum.

Powell's method and penalty function methods

Powell's method is a zero-order method, requiring the evaluation of f(x) only. If the problem involves n design variables, the basic algorithm is (Kiusalaas, 2005):

Choose a point x0 in the design space.
Choose the starting vectors vi, i = 1, 2, . . . , n (the usual choice is vi = ei, where ei is the unit vector in the xi-coordinate direction).
Cycle:
  do with i = 1, 2, . . . , n
    Minimize f(x) along the line through xi−1 in the direction of vi. Let the minimum point be xi.
  end do
  vn+1 ← xn − x0 (this vector is conjugate to the directions produced in the previous loop).
  Minimize f(x) along the line through x0 in the direction of vn+1. Let the minimum point be xn+1.
  if |xn+1 − x0| < ε, exit loop
  do with i = 1, 2, . . . , n
    vi ← vi+1 (v1 is discarded, the other vectors are reused)
  end do
end cycle.

Powell (1997) demonstrated that the vectors vn+1 produced in successive cycles are mutually conjugate, so that the minimum point of a quadratic surface is reached in precisely n cycles. In practice, the merit function is seldom quadratic, but as long as it can be approximated locally by a quadratic function, Powell's method will work. Of course, it usually takes more than n cycles to arrive at the minimum of a non-quadratic function. Note that it takes n line minimizations to construct each conjugate direction.

Powell's method does have a major flaw that has to be remedied: if f(x) is not a quadratic, the algorithm tends to produce search directions that gradually become linearly dependent, thereby ruining the progress towards the minimum. The source of the problem is the automatic discarding of v1 at the end of each cycle. It has been suggested that it is better to throw out the direction that resulted in the largest decrease of f(x), a policy that we adopt. It seems counter-intuitive to discard the best direction, but it is likely to be close to the direction added in the next cycle, thereby contributing to linear dependence. As a result of the change, the search directions cease to be mutually conjugate, so that a quadratic form is not minimized in n cycles any more. This is not a significant loss, since in practice f(x) is seldom quadratic anyway. Powell suggested a few other refinements to speed up convergence. Since they complicate the bookkeeping considerably, we did not implement them.

General description of the penalty function method algorithm

The details, and a MATLAB computer program implementing the penalty method using Powell's method of unconstrained minimization, are given in the appendix.

Algorithm 1 (Algorithm for the penalty function method)

To solve the sequence of unconstrained problems with monotonically increasing values of μk, let {μk}, k = 1, . . . , be a sequence tending to infinity such that μk ≥ 0 and μk+1 > μk. Now for each k we solve the problem

Minimize {θ(x, μk) : x ∈ X}.    (2.11)

To obtain xk, the optimum, it is assumed that problem (2.11) has a solution for all positive values of μk. A simple implementation known as the sequential unconstrained minimization technique (SUMT) is given below.

Step 0 (Initialization): Select a growth parameter β > 1, a stopping parameter ε > 0 and an initial value of the penalty parameter μ0. Choose a starting point x0 that violates at least one constraint and formulate the augmented objective function θ(x, μ0). Let k = 1.

Step 1 (Iteration): Starting from xk−1, use an unconstrained search technique to find the point that minimizes θ(x, μk−1), and call it xk.

Step 2 (Stopping rule): If the distance between xk−1 and xk is smaller than ε, that is, ||xk−1 − xk|| < ε, or the difference between two successive objective function values is smaller than ε, that is, |f(xk−1) − f(xk)| < ε, stop with xk an estimate of the optimal solution; otherwise, put μk = βμk−1, formulate the new θ(x, μk), put k = k + 1 and return to the iterative step.

Considerations for implementation of the penalty function method

Starting point x1

The first step in the solution is to select a starting point. A good rule of thumb is to start at an infeasible point. By design, then, we will see that every trial point, except the last one, will be infeasible (exterior to the feasible region). A reasonable place to start is at the unconstrained minimum. We should always ensure that the penalty does not dominate the objective function during the initial iterations of the penalty function method.

Selecting the initial penalty parameter (µ0)

The initial penalty parameter μ0 should be fixed so that the magnitude of the penalty term is not much smaller than the magnitude of the objective function. If an imbalance exists, the influence of the objective function could direct the algorithm towards an unbounded minimum even in the presence of unsatisfied constraints. Because the exterior penalty method approach seems to work so well, it is natural to conjecture that all we have to do is set μ to a very large number and then optimize the resulting augmented objective function θ(x, μk) to obtain the solution to the original problem.
224 Afr. J. Math. Comput. Sci. Res.
solution to the original problem. Unfortunately, this about the same magnitude. This scaling operation is
conjecture is not correct. First, “large” depends on the intended to ensure that no subset of the constraints has
particular model. It is almost always impossible to tell an undue influence on the search process. If some
how large μ must be to provide a solution to the problem constraints are dominant, the algorithm will steer towards
without creating numerical difficulties in the computations. a solution that satisfies those constraints at the expense
Second, in a very real sense, the problem is dynamically of searching for the minimum. In either case,
changing with the relative position of the current value of convergence may be exceedingly slow. Discussion on
x and the subset of the constraints that are violated. The how to normalize constraints is given on barrier function
third reason why the conjecture is not correct is methods.
associated with the fact that large values of μ create
enormously steep valleys at the constraint boundaries.
Steep valleys will often present formidable if not Test problems (Testing practical examples)
insurmountable convergence difficulties for all preferred
search methods unless the algorithm starts at a point As discussed in previous sections, a number of
extremely close to the minimum being sought. algorithms are available for solving constrained nonlinear
Fortunately, there is a direct and sound strategy that will programming problems. In recent years, a variety of
overcome each of the difficulties mentioned above. All computer programs have been developed to solve
that needs to be done is to start with a relatively small engineering optimization problems. Many of these are
value of μ. The most frequently used initial penalty complex and versatile and the user needs a good
parameters in the literature are 0.01, 0.1, 2, 5, and 10. understanding of the algorithms/computer programs to be
This will assure that no steep valleys are present in the able to use them effectively. Before solving a new
initial optimization of θ(x, μk ). Subsequently, we will solve engineering design optimization problem, we usually test
a sequence of unconstrained problems with the behavior and convergence of the algorithm/computer
monotonically increasing values of μ chosen so that the program on simple test problems. Eight test problems are
solution to each new problem is “close” to the previous given in this section. All these problems have appeared in
one. This will preclude any major difficulties in finding the the optimization and on facility location literature and
minimum of θ(x, μk ) from one iteration to the next. most of them have been solved using different
techniques.
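The sequential strategy described above — start with a small µ, minimize the augmented objective, then increase µ by a multiplier β and re-solve — can be sketched compactly. This is an illustrative Python sketch, not the paper's MATLAB code; the one-dimensional test problem, the golden-section minimizer, and the schedule constants (µ0 = 0.1, β = 10) are assumptions chosen for brevity.

```python
import math

def golden_section(f, a, b, tol=1e-8):
    """Minimize a unimodal f on [a, b] by golden-section search."""
    invphi = (math.sqrt(5) - 1) / 2
    x1, x2 = b - invphi * (b - a), a + invphi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = b - invphi * (b - a)
            f1 = f(x1)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = a + invphi * (b - a)
            f2 = f(x2)
    return (a + b) / 2

def exterior_penalty(f, g, mu0=0.1, beta=10.0, n_outer=8):
    """Sequential unconstrained minimization of theta = f + mu*max(0, g)^2."""
    mu = mu0
    x = 0.0
    for _ in range(n_outer):
        theta = lambda x: f(x) + mu * max(0.0, g(x)) ** 2
        x = golden_section(theta, -10.0, 10.0)
        mu *= beta  # monotonically increasing penalty parameter
    return x

# Example: minimize f(x) = x^2 subject to x >= 1, i.e. g(x) = 1 - x <= 0.
# The exact solution is x* = 1; the iterates x(mu) = mu/(1 + mu) approach
# it from the infeasible (exterior) side as mu grows.
x_star = exterior_penalty(lambda x: x * x, lambda x: 1.0 - x)
```

For a one-dimensional bracket the warm start is unnecessary, but in n dimensions the previous minimizer is the natural starting point for the next unconstrained solve, exactly as the text recommends.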
Figure 2. The sequence of infeasible points from outside the feasible region.

The iteration steps using MATLAB for the penalty method and the necessary data are given as follows:

Initial:
x1 = [2; 5];
µ = 1; beta = 10;
tol = 1.0e-4; tol1 = 1.0e-6; h = 0.1; N = 10 (Table 1).

Example 2

Consider the optimization problem:

Minimize f(x) = ( ) + ( )
subject to ( ) ≤ 0
( ) ≤ 0
-x1 + 2x2 - 2 ≤ 0

We consider the sequence of problems:

θ(x, µ) = f(x) + µ[( ) + ( )]

Optimum solution point using Mathematica is x = (1.33271, 1.7112) and the optimum solution is f(x) = 8.26363.
Optimum solution point using MATLAB is x = (1.280776520285, 1.640388354297) and the optimum solution is f(x) = 8.5235020151.
Figure 3. The sequence of infeasible points.

The graph of the feasible region and the steps of a computer program (based on Mathematica) with the contours of the objective function are shown in Figure 3. The iteration steps using MATLAB for the penalty method and the necessary data are given as follows:

Initial:
θ(x, µ) = f(x) + µ[ ]

Example 3

Consider the optimization problem:

Minimize f(x) = -ln( )
subject to ( ) ≥ 0
2x1 + 3x2 ≤ 6

Berhe 227

Solution

The exterior penalty function method, coupled with the Powell method of unconstrained minimization and golden bracket and golden search methods of one-dimensional search, is used to solve this problem.

Figure 4. The sequence of infeasible points from outside the feasible region.

Table 3. The iteration step using MATLAB.

Optimum solution point using Mathematica is x = (1.80125, 0.800555) and the optimum solution is f(x) = -0.804892.
Optimum solution point using MATLAB is x = (1.803696121437, 0.801642712659) and the optimum solution is f(x) = -0.8057446020.

The graph of the feasible region and the steps of a computer program (Deumlich, 1996) with the contours of the objective function are shown in Figure 4. The iteration steps using MATLAB for the penalty method and the necessary data are given as follows:

Initial:
x1 = [100; 100];
µ = 1; beta = 1.5;
tol = 1.0e-9; tol1 = 1.0e-3; h = 0.1; N = 10 (Table 3).

Example 4

Consider the optimization problem:

Minimize f(x) = -5ln( )
subject to ( ) + ( ) - 4 ≤ 0
( ) ≤ 0.

We consider the sequence of problems:

θ(x, µ) = -5ln( ) + µ[ ].

We can solve this problem numerically. Since the function f is not convex, we can expect local minimum points depending on the choice of the initial point.
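The remark that a non-convex f may deliver different local minima for different starting points can be checked numerically: run the same local method from several starts and keep the best result. This is an illustrative Python sketch (an assumed one-dimensional non-convex function and a plain gradient-descent minimizer, not the paper's Powell code):

```python
def local_minimize(df, x, lr=0.01, iters=5000):
    """Plain gradient descent: converges to the minimum of the local basin."""
    for _ in range(iters):
        x -= lr * df(x)
    return x

# Assumed non-convex example: f(x) = x^4 - 3x^2 + x has two local minima,
# near x = -1.30 (global) and x = 1.13 (local only).
f = lambda x: x**4 - 3 * x**2 + x
df = lambda x: 4 * x**3 - 6 * x + 1

# Different starting points land in different basins; keep the best solution.
candidates = [local_minimize(df, x0) for x0 in (-2.0, 0.5, 2.0)]
best = min(candidates, key=f)
```

The start at 0.5 and the start at 2.0 both descend into the inferior right-hand basin; only the multi-start comparison recovers the global minimizer, which is exactly the "different starting points, take the best solution" advice of the text.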
Figure 5. The sequence of infeasible points from outside the feasible region (the optimal point is marked).

Figure 6. The sequence of infeasible points from outside the feasible region (the optimal point is marked).

x1 ≥ 0
x2 ≥ 0

The corresponding unconstrained optimization problem is:

θ(x, µ) = f(x) + µ[ ].

Optimum solution point using Mathematica is x = ( ) and the optimum solution is f(x) = 16.0996.
Optimum solution point using MATLAB is x = (0.003236508228, 1.996776847061) and the optimum solution is f(x) = 16.1140109620.

The graph of the feasible region and the steps of a computer program (Deumlich, 1996) with the contours of the objective function are shown in Figure 6. The iteration steps using MATLAB for the penalty method and the necessary data are given as follows:

Initial:
x = [-100000; -100000]; µ = 0.1; beta = 10;
tol = 1.0e-6; tol1 = 1.0e-3; h = 0.1; N = 10 (Table 5).

Example 6

A new facility is to be located such that the sum of its distances from the four existing facilities is minimized. The four facilities are located at the points (1, 2), (-2, 4), (2, 6), and (-6, -3). If the coordinates of the new facility are x1 and x2, suppose that x1 and x2 must satisfy the restrictions x1 + x2 = 2, x1² + x2² ≤ 2, -x1² - 2x2² ≤ -3, x1 ≥ 0, and x2 ≥ 0.

Formulate the problem and solve it by a penalty function method using a suitable unconstrained optimization technique.

Minimize f(x) = sqrt((x1 - 1)² + (x2 - 2)²) + sqrt((x1 + 2)² + (x2 - 4)²) + sqrt((x1 - 2)² + (x2 - 6)²) + sqrt((x1 + 6)² + (x2 + 3)²)
subject to x1 + x2 = 2
x1² + x2² ≤ 2
-x1² - 2x2² ≤ -3
-x1 ≤ 0
-x2 ≤ 0
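The distance-sum objective of Example 6 can be written out and checked directly: evaluating it at the MATLAB optimum reported for this example reproduces the stated objective value. A small Python check (illustrative, not the paper's MATLAB code):

```python
import math

# Existing facilities of Example 6.
POINTS = [(1, 2), (-2, 4), (2, 6), (-6, -3)]

def f(x1, x2):
    """Sum of Euclidean distances from (x1, x2) to the four facilities."""
    return sum(math.hypot(x1 - a, x2 - b) for a, b in POINTS)

# Evaluating at the MATLAB optimum quoted in the text reproduces the
# reported objective value 18.3670763153.
value = f(0.989607061234, 1.010307279416)
```

The agreement confirms that the objective is simply the sum of the four point-to-facility distances, with no weights.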
Figure 7. The sequence of infeasible points from outside the feasible region; the constraint boundaries x1² + x2² = 2 and -x1² - 2x2² = -3 and the optimal point are marked.
Optimum solution point using Mathematica is x = (0.624988, 1.28927) and the optimum solution is f(x) = 17.579.
Optimum solution point using MATLAB is x = (0.989607061234, 1.010307279416) and the optimum solution is f(x) = 18.3670763153.

The graph of the feasible region and the steps of a computer program (Deumlich, 1996) with the contours of the objective function are shown in Figure 7. The iteration steps using MATLAB for the penalty method and the necessary data are given as follows:

Initial:

Example 7

The detail of this location problem is given in example 1 of the barrier method.

Minimize f(x) = 3600( ) + 2500( ) + 2200( ) + ( ) + ( ) + ( )

Example 8

Here, we test the well-studied welded beam design problem, which has been solved by a number of classical optimization methods and by Genetic Algorithms [Deb, 128 to 129]. The welded beam is designed for minimum cost subject to constraints on shear stress in the weld (τ), bending stress in the beam (σ), buckling load on the bar (Pc), and end deflection of the beam (δ).
P = 6000 lb, τmax = 13,600 psi, σmax = 30,000 psi, and δmax = 0.25 in.

Starting and optimum solutions:

xstart = ( ), fstart = 5.398 and X* = ( ), f* = $2.3810.

Optimum solution point given by Rao (2009) is x = (0.2444, 6.2177, 8.2915, 0.2444) and the optimum solution is f(x) = 2.3810. Optimum solution point using MATLAB is x = (0.375852754, 2.8212375, 10.0249324, 0.234488) and the optimum solution is f(x) = 2.3467952747.

The iteration steps using MATLAB for the penalty method and the necessary data are given as follows:

Initial:
x1 = [2; 3; 0.1; 0.05];
µ = 0.01; beta = 2;
tol = 1.0e-2; tol1 = 1.0e-6; h = 0.1; N = 5 (Table 8).

Using another starting point we obtain a different solution, but the difference is not significant, as given below.

Initial:
x1 = [0.4; 6; 0.01; 0.05];
µ = 0.1; beta = 2;
tol = 1.0e-2; tol1 = 1.0e-6; h = 0.1; N = 30 (Table 9).

EXACT PENALTY FUNCTION METHODS

In this chapter, we analyze two important extensions of the transformation methods, which are called exact penalty functions and have been most frequently used. In these methods a single unconstrained minimization problem, with a reasonably sized penalty parameter, can yield an optimum solution to the original problem. This suggests an algorithm which attempts to locate the optimum whilst keeping µ finite, and so avoids the ill-conditioning in the limit µ → ∞ that we face in penalty function methods.

For the types of penalty functions considered thus far, we have seen that we need to make the penalty parameter infinitely large in a limiting sense to recover an optimal solution. This can cause numerical difficulties and ill-conditioning effects. To alleviate the computational difficulties associated with having to take the penalty parameter to infinity in order to recover an optimal solution to the original problem, we present below two penalty functions that avoid this drawback and are known as exact penalty functions. These are the exact absolute value (l1) penalty function and the augmented Lagrangian penalty function method.

The exact absolute value or l1 penalty function

An attractive approach to nonlinear programming is to attempt to determine an exact penalty function, by which is meant a function defined in terms of the objective function and constraints, holding out the possibility that the solution can be found by a single application of an unconstrained minimization technique, as against the sequential processes described above. Consider problem (P) to minimize f(x) subject to gi(x) ≤ 0, i = 1, . . . , m, and hi(x) = 0, i = 1, . . . , l, and a penalty parameter μ > 0.

Roughly speaking, an exact penalty function for problem (P) is a function θ(x, μ), where μ > 0 is the penalty parameter, with the property that there exists a lower bound μ̄ > 0 such that for μ ≥ μ̄ any local minimizer of (P) is also a local minimizer of the penalty problem. Exact penalty functions can be divided into two classes: continuously differentiable and non-differentiable exact penalty functions. Continuously differentiable exact penalty functions were introduced by Fletcher (1987) for equality constrained problems and by Glad and Polak (1979) for problems with inequality constraints; further contributions have been made by Di Pillo. Non-differentiable exact penalty functions were introduced by Zangwill (1967) and Pietrzykowski (1969). The most frequently used type of exact penalty function is the l1 exact penalty function. This function has been researched widely, for example by Pietrzykowski (1969) and Coleman and Conn (1982), in nonlinear programming applications amongst others. Unfortunately, the many effective techniques for smooth minimization cannot adequately be used because of its non-differentiability
and the best way of using this penalty function is currently being researched. A more realistic approach is to use this function as a criterion function in conjunction with other iterative methods for nonlinear programming. The most satisfactory approach of all is to apply methods of non-smooth optimization.

A class of non-differentiable exact penalty functions associated to (P) for X = Rn was analyzed by Charalambous in 1978. It is of the form

θq(x, µ) = f(x) + µ[Σi (max{0, gi(x)})^q + Σi |hi(x)|^q], q ≥ 1, (4.1)

where q = 1 gives the l1 exact penalty function. Pietrzykowski (1969) has shown that function (4.1) is exact in the sense that there is a finite µ > 0 such that any regular local minimizer of (P) is also a local minimizer of the penalized unconstrained problem. In 1970, Luenberger showed that, under convexity assumptions, there is a lower bound for µ, equal to the largest Lagrange multiplier in absolute value associated with the nonlinear problem. In 1978, Charalambous generalized the result of Luenberger for the l1 penalty function (4.1), assuming the second-order sufficient conditions for (P). The following result shows that, under suitable convexity assumptions, there does exist a finite value of μ that will recover an optimum solution to (P) via the minimization of θ(x, µ). Alternatively, it can be shown that if x̄ satisfies the second-order sufficiency conditions for a local minimum of (P) (the Hessian is positive definite), then, for μ at least as large as in the theorem below, x̄ will also be a local minimum of θ(x, µ).

Theorem 4

Consider the following primal problem:

Minimize f(x)
subject to g(x) ≤ 0
h(x) = 0. (P)

Let x̄ be a KKT point with Lagrangian multipliers ūi, i ∈ I, and v̄i, i = 1, . . . , l, associated with the inequality and equality constraints, respectively, where I = {i ∈ {1, . . . , m} : gi(x̄) = 0} is the index set of active constraints. Furthermore, suppose that f and gi, i ∈ I, are convex functions and that hi, i = 1, . . . , l, are affine functions. Then, for μ ≥ maximum {ūi, i ∈ I, |v̄i|, i = 1, . . . , l}, x̄ also minimizes the exact l1 penalized objective function defined by (4.1).

Proof

Since x̄ is a KKT point to (P), it is feasible to (P) and satisfies

∇f(x̄) + Σ{i ∈ I} ūi ∇gi(x̄) + Σ{i=1..l} v̄i ∇hi(x̄) = 0, ūi ≥ 0 for i ∈ I. (4.2)

Minimizing the l1 penalized objective (4.1) with q = 1 is equivalent to the problem

minimize f(x) + µ Σ{i=1..m} yi + µ Σ{i=1..l} zi (4.3a)
subject to yi ≥ gi(x), yi ≥ 0 for i = 1, . . . , m (4.3b)
zi ≥ hi(x) and zi ≥ -hi(x) for i = 1, . . . , l. (4.3c)

The equivalence follows easily by observing that, for any x ∈ Rn, the optimal value of the objective function in (4.3a), subject to (4.3b) and (4.3c), is realized by taking yi = maximum {0, gi(x)} for i = 1, . . . , m and zi = |hi(x)| for i = 1, . . . , l. In particular, given x̄, define ȳi = maximum {0, gi(x̄)} = 0 for i = 1, …, m and z̄i = |hi(x̄)| = 0 for i = 1, …, l.

Note that, of the inequalities yi ≥ gi(x), i = 1, . . . , m, only those corresponding to i ∈ I are binding, while all the other inequalities in (4.3) are binding at (x̄, ȳ, z̄). Hence, for (x̄, ȳ, z̄) to be a KKT point for (4.3), we must find Lagrangian multipliers (ui1, ui2), i = 1, . . . , m, and (vi1, vi2), i = 1, . . . , l, associated with the respective pairs of constraints in (4.3b) and (4.3c) such that

∇f(x̄) + Σ{i=1..m} ui1 ∇gi(x̄) + Σ{i=1..l} (vi1 - vi2) ∇hi(x̄) = 0,
µ - ui1 - ui2 = 0 for i = 1, . . . , m,
µ - vi1 - vi2 = 0 for i = 1, . . . , l,
(ui1, ui2) ≥ 0 for i = 1, . . . , m,
(vi1, vi2) ≥ 0 for i = 1, . . . , l, ui1 = 0 for i ∉ I.

Given that μ ≥ maximum {ūi, i ∈ I, |v̄i|, i = 1, . . . , l}, we then have, using (4.2), that ui1 = ūi for all i ∈ I, ui1 = 0 for i ∉ I, ui2 = μ - ui1 for all i = 1, . . . , m, and vi1 = (µ + v̄i)/2 and vi2 = (µ - v̄i)/2 for i = 1, . . . , l satisfy the foregoing KKT conditions. By the stated convexity assumptions, it follows that (x̄, ȳ, z̄) solves (4.3) and, so, x̄ minimizes θ(x, µ). This completes the proof. We prove it as follows in detail:
Lemma 5

which contradicts ( ). Thus x̄ is feasible for (P). That being the case, by ( ),

f(x̄) ≤ f(x̂) - µ( ) = f(x̂)

from (4.3.1), and so x̄ solves (P). Therefore they have the same optimal value.

Example 1

Case 2: if x1 + x2 = 1, then z = |x1 + x2 - 1| = 0. If μ( ) = 0, then
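The exactness property of Theorem 4 can be checked on a one-variable instance. Assumed example (not from the paper): minimize f(x) = x² subject to g(x) = 1 - x ≤ 0, whose KKT multiplier is ū = 2 (from 2x - ū = 0 at x = 1). The theorem then predicts that the l1 penalty θ(x, µ) = x² + µ·max(0, 1 - x) is minimized exactly at x̄ = 1 for every finite µ ≥ 2, while a too-small µ yields an infeasible minimizer. A small Python grid check:

```python
def l1_penalty(x, mu):
    """Exact l1 penalized objective for: min x^2  s.t.  1 - x <= 0."""
    return x * x + mu * max(0.0, 1.0 - x)

def argmin_on_grid(obj, lo=-2.0, hi=2.0, steps=4001):
    """Brute-force minimizer on a uniform grid (spacing 0.001 here)."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(xs, key=obj)

# mu = 3 exceeds the multiplier bound (u = 2): the minimizer is exactly x = 1,
# with no need to drive mu to infinity.
x_exact = argmin_on_grid(lambda x: l1_penalty(x, 3.0))

# mu = 1 is below the bound: the minimizer x = mu/2 = 0.5 is infeasible.
x_small = argmin_on_grid(lambda x: l1_penalty(x, 1.0))
```

This contrasts with the quadratic penalty of the sequential methods, whose minimizer µ/(1 + µ) is feasible only in the limit µ → ∞.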
method, we start with arbitrary values of the Lagrange multipliers and develop a procedure that moves the Lagrange multipliers closer to their optimum values. Thus, near the optimum, the function is not as sensitive to the values of the multipliers, and the procedure converges to the true optimum.

Therefore, to make use of the above result, one attempts to estimate the multipliers by updating the vector v after solving each (or some) of the unconstrained minimizations of FALAG. The outline of such an algorithm is given in the following section.

Schema of an algorithm using augmented Lagrangian penalty functions

Method of multipliers

The method of multipliers is an approach for solving nonlinear programming problems by using the augmented Lagrangian penalty function in a manner that combines the algorithmic aspects of both Lagrangian duality methods and penalty function methods. However, this is accomplished while gaining from both these concepts without being impaired by their respective shortcomings. The method adopts a dual ascent step similar to the sub-gradient optimization scheme for optimizing the Lagrangian dual; but, unlike the latter approach, the overall procedure produces both primal and dual solutions. The primal solution is produced via a penalty function minimization; but because of the properties of the ALAG penalty function, this can usually be accomplished without having to make the penalty parameter infinitely large and, hence, without having to contend with the accompanying ill-conditioning effects. Moreover, we can employ efficient derivative-based methods in minimizing the penalized objective function. The fundamental scheme of this algorithm is as follows.

Schema of the algorithm for equality constraints

Consider the problem of minimizing f(x) subject to the equality constraints hi(x) = 0 for i = 1, . . . , l. (The extension to include inequality constraints is relatively straightforward and is addressed in the following subsection.) Below, we outline the procedure first, and then provide some interpretations, motivations, and implementation comments. As is typically the case, the augmented Lagrangian function employed is of the form (4.4), except that each constraint is assigned its own specific penalty parameter μi, instead of a common parameter µ. Hence, constraint violations, and consequent penalizations, can be individually monitored. Accordingly, we replace (4.4) by

FALAG(x, v) = f(x) + Σ{i=1..l} vi hi(x) + Σ{i=1..l} μi hi(x)².

Although there are different algorithms to solve this kind of problem, the algorithm due to Powell (1997) is given below and ensures global convergence. The outline of such an algorithm is as follows.

Algorithm 1: Algorithm for ALAG with equality constraints

Initialization: Select some initial Lagrangian multipliers v = v1 (usually 0) and positive values μ1, . . . , μl for the penalty parameters. Let x0 be a null vector, and denote VIOL(x0) = ∞, where for any x ∈ Rn, VIOL(x) = maximum{|hi(x)| : i = 1, . . . , l} is a measure of constraint violations. Put k = 1 and proceed to the "inner loop" of the algorithm.

Inner loop (penalty function minimization): Solve minimize FALAG(x, vk) subject to x ∈ Rn and let xk denote the optimal solution obtained. If VIOL(xk) = 0, stop with xk as a KKT point. (Practically, one would terminate if VIOL(xk) is less than some tolerance ε > 0.) Otherwise, if VIOL(xk) ≤ 0.25 VIOL(xk-1), proceed to the outer loop. On the other hand, if VIOL(xk) > 0.25 VIOL(xk-1), then, for each constraint i = 1, . . . , l for which |hi(xk)| > 0.25 VIOL(xk-1), replace the corresponding penalty parameter μi by 10μi and repeat this inner loop step.

Outer loop (Lagrange multiplier update): Replace vk by vk+1, where

vi(k+1) = vi(k) + 2μi hi(xk) for i = 1, . . . , l. (4.10)

Increment k by 1, and return to the inner loop.

The inner loop of the foregoing method is concerned with the minimization of the augmented Lagrangian penalty function. For this purpose, we can use xk-1 (for k ≥ 2) as a starting solution and employ Newton's method (with line searches) in case the Hessian is available, or else use a quasi-Newton method if only gradients are available, or use some conjugate gradient method for relatively large-scale problems. If VIOL(xk) = 0, then xk is feasible and, moreover,

∇f(xk) + Σ{i=1..l} [vi(k) + 2μi hi(xk)] ∇hi(xk) = 0 (4.11)

implies that xk is a KKT point. Whenever the revised iterate of the inner loop does not improve the measure of constraint violations by the selected factor 0.25, the penalty parameters are increased by a factor of 10. Hence, the outer loop will be visited after a finite number of iterations when the tolerance ε is used in the inner loop.
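The inner/outer loop structure can be sketched numerically. Assumed toy problem (not from the paper): minimize f(x) = x1² + x2² subject to h(x) = x1 + x2 - 2 = 0, whose solution is x = (1, 1) with multiplier v = -2. The sketch below minimizes FALAG by plain gradient descent (standing in for Powell's or Newton's method), keeps a fixed µ for simplicity, and applies the multiplier update (4.10):

```python
def method_of_multipliers(mu=1.0, outer_iters=30):
    """Method of multipliers for: min x1^2 + x2^2  s.t.  x1 + x2 - 2 = 0."""
    h = lambda x1, x2: x1 + x2 - 2.0
    v = 0.0          # initial Lagrange multiplier estimate
    x1 = x2 = 0.0    # initial (infeasible) point, warm-started across loops
    for _ in range(outer_iters):
        # Inner loop: minimize FALAG(x, v) = f + v*h + mu*h^2 by gradient descent.
        for _ in range(2000):
            c = v + 2.0 * mu * h(x1, x2)       # common chain-rule factor
            g1, g2 = 2.0 * x1 + c, 2.0 * x2 + c
            x1, x2 = x1 - 0.1 * g1, x2 - 0.1 * g2
        # Outer loop: multiplier update (4.10), v <- v + 2*mu*h(x).
        v = v + 2.0 * mu * h(x1, x2)
    return x1, x2, v

x1, x2, v = method_of_multipliers()
```

Note that the iterates converge to the exact solution and multiplier with µ held fixed at 1: the multiplier update, not an exploding penalty parameter, does the work, which is precisely the advantage claimed for ALAG over the plain quadratic penalty.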
Hence, from the viewpoint of problem (4.6), convergence is obtained above in one of two ways. First, we might finitely determine a KKT point, as is frequently the case. Alternatively, viewing the foregoing algorithm as one of applying the standard quadratic penalty function approach, in spirit, to the equivalent sequence of problems of the type (4.6), each having particular estimates of the Lagrangian multipliers in the objective function, convergence is achieved by letting the penalty parameters approach infinity. In the latter case, the inner loop problems become increasingly ill-conditioned and second-order methods become imperative.

ALAG penalty function for problems with mixed constraints

Consider problem (P) to minimize f(x) subject to the constraints gi(x) ≤ 0 for i = 1, . . . , m and hi(x) = 0 for i = 1, . . . , l (Bhatti, 2000). The extension of the foregoing theory of augmented Lagrangians and the method of multipliers to this case, which also includes inequality constraints, is readily accomplished by equivalently writing the inequalities as the equations gi(x) + si² = 0 for i = 1, . . . , m. Now suppose that x̄ is a KKT point for problem (P)
C = {d ≠ 0 : ∇hi(x̄)ᵀd = 0 for i = 1, . . . , l}. Using this, ( ) can be written as:

µ( ) + ( ) = µ( ) - ( ).

For a given penalty parameter µ > 0, let θ(u, v) represent the minimum of (4.12) over (x, s) for any given set of Lagrange multipliers (u, v). Now let us rewrite (4.12) more conveniently as follows:

( ) = ( ) + maximum( ).

If ( ) minimizes (4.14), then the sub-gradient component of θ at (u, v) = ( , ) is found at ( ) = 2µ( ) - 2( ).
indications are that it does not. The Hessian matrix of θ(x, µ) becomes increasingly ill-conditioned as µ → ∞ and the minimization becomes more difficult. That is why the parameter µ should not be increased too quickly and the previous iterate should be used as a starting point. As μ → ∞, the Hessian (at the solution) is equal to the sum of L, the Hessian of the Lagrangian associated with the original constrained problem, and a matrix of rank r that tends to infinity (where r is the number of active constraints). This is the fundamental property of these methods.

Though penalty functions are old methods for solving constrained optimization problems, it is nevertheless worth recognizing the wrong assumption and generalization that every old method is nonsense. We have to be very careful not to trivialize old methods for solving constrained optimization problems and erroneously treat them as synonymous with backwardness, as some might misconceive. In fact, these sequential methods need to be modified in one way or another so that they can serve the ever-changing and growing demands on algorithms for certain optimization problems. Though these methods suffer from some computational disadvantages, in the absence of alternative software, especially for no-derivative problems, they are still recommended. They work well with zero-order methods like Powell's method, with some modifications, taking different initial points and monotonically increasing penalty parameters.

Finally, in spite of their great initial success, their slow rates of convergence due to ill-conditioning of the associated Hessian led researchers to pursue other approaches. With the advent of interior point methods for linear programming, algorithm designers have taken a fresh look at penalty methods and have been able to achieve much greater efficiency than previously thought possible (Nash and Sofer, 1993).

Exact transformation methods are newer and less well-established than sequential transformation methods and are called the newly established modern penalty methods. Exact transformation methods avoid the long sequence of unconstrained problems by constructing penalty functions that are exact, in the sense that the solution of the penalty problem yields the exact solution to the original problem for a finite value of the penalty parameter. However, it can be shown that such exact functions are not differentiable in most cases. Great consideration should be given to the convexity assumption and second-order conditions in using these methods.

ACKNOWLEDGEMENTS

First, the author wishes to acknowledge his family, especially his parents, for their unconditional love and faith in him since his birth, without whose support and encouragement this would not have been a reality. Also, his warmest and honorable thanks go to his best friend, Abreha Hailezgi, who motivated him, told him about his potential, and contributed a lot to the success of this research paper. Finally, he thanks all his friends who have helped him directly or indirectly in this endeavor, especially those who love him more.

Some notations

The following notations appear frequently in this research:

µ = Penalty parameter.
x = (x1, x2, x3, …, xn) is an n-dimensional vector.
θ(x, µ) = Unconstrained representation of the primal problem (P).
θ(µ) = The infimum of θ(x, µ) with respect to x.
xµ = A minimum point of θ(µ).
X = A nonempty set in Rn.
M(f, S) = Set of minimum points of the constrained optimization problem (P).
M(f, X) = Set of minimum points of the unconstrained optimization problem θ(x, µ).
FALAG = Augmented Lagrangian penalty function.
p(x) = Penalty function.
LM( ) = L restricted to the subspace M that is tangent to the constraint surface.

REFERENCES

Bazaraa MS, Sherali HD, Shetty CM (2006). Nonlinear Programming: Theory and Algorithms, Second Edition, John Wiley & Sons, New York. pp. 469-500.
Belegundu AD, Chandrupatla TR (1999). Optimization Concepts and Applications in Engineering, 2nd edition, Pennsylvania State University. pp. 278-290.
Bhatti MA (2000). Practical Optimization Methods with Mathematica Applications, Department of Civil and Environmental Engineering, University of Iowa, Springer-Verlag, New York. pp. 512-680.
Charalambous C (1978). A lower bound for the controlling parameters of exact penalty functions. Mathematical Programming, 15:278-290.
Coleman TF, Conn AR (1982). Nonlinear programming via an exact penalty function: Asymptotic analysis. Mathematical Programming, pp. 123-136.
Deumlich R (1996). A Course in Mathematica, Addis Ababa University, Faculty of Science, Department of Mathematics. pp. 1-140.
Fiacco AV, McCormick GP (1968). Extensions of SUMT for nonlinear programming: Equality constraints and extrapolation. Manage. Sci. 12(11):816-828.
Fletcher R (1987). Practical Methods of Optimization, Second Edition, John Wiley & Sons, New York. pp. 277-318.
Glad ST, Polak E (1979). A multiplier method with automatic limitation of penalty growth. Math. Programming, 17:140-155.
Hestenes MR (1969). Multiplier and gradient methods. J. Optim. Theory Appl. 4(5):123-136.
Himmelblau DH (1972). Applied Nonlinear Programming, McGraw-Hill, New York. pp. 342-355.
Kiusalaas J (2005). Numerical Methods in Engineering with MATLAB, Pennsylvania State University, Cambridge University Press, New York. pp. 391-404.
Luenberger DG (1974). A combined penalty function and gradient projection method for nonlinear programming. J. Opt. Appl. 14:5.
Luenberger DG (1984). Linear and Nonlinear Programming, 2nd ed., Addison-Wesley Publishing Company, Reading, MA. pp. 401-430.
Nash SG, Sofer A (1993). Linear and Nonlinear Programming, McGraw-Hill, New York. pp. 469-765.
Pietrzykowski T (1969). An exact potential method for constrained maxima. SIAM J. Num. Anal. 6:217-238.
Powell MJD (1997). A fast algorithm for nonlinearly constrained optimization calculations, in Lecture Notes in Mathematics, Watson GA et al., Eds., Springer-Verlag, Berlin. pp. 343-357.
Rao SS (2009). Engineering Optimization: Theory and Practice, Fourth Edition, John Wiley & Sons, Inc. pp. 248-318.
Zangwill WI (1967). Nonlinear programming via penalty functions. Manage. Sci. 13(5):344-358.
Appendix k=64746.022*(1-0.0282346*x(3))*x(3)*x(4).^3;
m =(2.1952./(x(3).^3*x(4)));
General description of the penalty function algorithm t1=(6000./(sqrt(2)*x(1)*x(2)));
t2=6000*(14+0.5*x(2))*sqrt(0.25*(x(2).^2+(x(1)+x(3)).^2));
The SUMT iteration involves updating the penalty t3=2*(0.707*x(1)*x(2)*((x(2).^2/12)+0.25*(x(1)+x(3)).^2));
parameters and initial design vector and calling the t=t2./t3;
unconstrained problem again. In the algorithm Powell’s T=sqrt((t1).^2+(t).^2+((x(2)*t1*t)./sqrt(0.25*(x(2).^2+(x(1)+
method (which is the zero order method) together with x(3)).^2))));
golden-bracket and golden-section method for line c=[T-13600;l-30000;x(1)-x(4);6000-k;m-0.25;-
minimization is used. The program expects the following x(1)+0.125;x(1)-10;-x(2)+0.1;x(2)-10;...
files to be available in the path -x(3)+0.1;x(3)-10;-x(4)+0.1;x(4)-10];
ceq =[]; % no equality constraints.
i. Objective function,
ii. Equality and inequality constraints together, function z = unconweldedbeam(x,miw) %
iii. Unconstrained function, The corresponding unconstrained problem
iv. The flines function (for a line search). l=(504000./((x(3)).^2*x(4)));
k=64746.022*(1-0.0282346*x(3))*x(3)*x(4).^3;
For each an iteration of the penalty method there is an m =(2.1952./(x(3).^3*x(4)));
inner iteration of the Powell’s method. t1=(6000./(sqrt(2)*x(1)*x(2)));
The program uses global statements to communicate t2=6000*(14+0.5*x(2))*sqrt(0.25*(x(2).^2+(x(1)+x(3)).^2));
penalty parameters, initial point, search direction (V), t3=2*(0.707*x(1)*x(2)*((x(2).^2/12)+0.25*(x(1)+x(3)).^2));
whereas the initial penalty parameters, initial design t=t2./t3;
variable, the number of iterations for penalty method, T=sqrt((t1).^2+(t).^2+((x(2)*t1*t)./sqrt(0.25*(x(2).^2+(x(1)+
tolerances for the penalty and Powell’s method are given x(3)).^2))));
by user automatically. z=obweldedbeam(x)+miw*(max(0,T-13600)).^2
Several parameters are coded into the program, +miw*(max(0,l-30000)).^2+...
especially those needed for golden bracket and golden miw*(max(0,x(1)-x(4))).^2 +miw*(max(0,6000-
section methods. k)).^2+miw*(max(0,m-0.25)).^2 +miw*(max(0,-
Step 0: (Initialization) Choose x 0, number of SUMT x(1)+0.125)).^2+...
iterations (N), penalty parameter (µ), and penalty miw*(max(0,x(1)-10)).^2 +miw*(max(0,-
x(2)+0.1)).^2+miw*(max(0,x(2)-10)).^2 +miw*(max(0,-
multiplier ( ), tolerance for the penalty method (tol1) and
for the Powell’s method (tol). x(3)+0.1)).^2+...
miw*(max(0,x(3)-10)).^2 +miw*(max(0,-
k = 1 (SUMT iteration counter)
Step 1: Start the Powell’s method to minimize f(x, µ) x(4)+0.1)).^2+miw*(max(0,x(4)-10)).^2;
Output xk.
Step 3: Convergence of exterior penalty method. The MATLAB Code for Penalty Function Method:
Stopping criteria:
= - , = - . Function penaltyfunction
If ≤ : stop (they have approximately the same % Penalty function method for minimizing f(x1,x2, ..., xn).
solution) % Example for Logarithmic function on Example 8.
else if ≤ : stop (design not changing)
else if k = : stop (max SUMT iteration reached) % input:
continue % tol and tol1 are error tolerances for Powell’s method
k k+1 and penalty method respectively.
% x = starting point (vector).
= % µ = the penalty parameter.
go to step 2 % beta = the penalty multiplier.
% N = number of iterations for the penalty method, we
Input for the welded beam example given in example 8 of choose it depending on the problem.
penalty function method. % h = initial step size used in search for golden bracket.
% output:
function f = obweldedbeam(x) % objective function % xmin = minimum point.
f=1.10471*x(1).^2*x(2)+0.04811*x(3)*x(4)*(14+x(2)); % objmin = miminum value of objective function.
% augmin = minimum of the corresponding unconstrained
function [c,ceq] = conwelededbeam(x) % constraints problem
% V = search direction, the same as the unit vectors in
% the coordinate directions.

% Starting of the program.
clc; % clears the screen.
clear all; % clears all variables for memory advantage.
global x miw V
x = [0.4; 6; 0.01; 0.05];
miw = 0.1; beta = 2;
tol = 1.0e-2; tol1 = 1.0e-6; h = 0.1; N = 30;
if size(x,2) > 1; x = x'; end % x must be a column vector
n = length(x); % number of design variables
df = zeros(n,1); % decreases of f stored here
u = eye(n); % columns of u store the search directions V
disp(sprintf(' miw xmin objmin augmin '))
disp(sprintf(' ------ ------------------ ------------ --------------- '))
for k = 1:N % loop for the penalty function method
    [c,ceq] = conwelededbeam(x);
    obj = obweldedbeam(x);
    f = unconweldedbeam(x,miw);
    disp(sprintf('%1.5f (%3.12f,%3.12f) %2.10f %2.10f ',miw,x,obj,f))
    for j = 1:30 % allow up to 30 cycles for Powell's method
        xold = x;
        fold = feval(@unconweldedbeam,xold,miw);
        % First n line searches record the decrease of f.
        for i = 1:n
            V = u(1:n,i);
            [a,b] = goldbracket(@fline,0.0,h);
            [s,fmin] = goldsearch(@fline,a,b);
            df(i) = fold - fmin;
            fold = fmin;
            x = x + s*V;
        end
        % Last line search in the cycle.
        V = x - xold;
        [a,b] = goldbracket(@fline,0.0,h);
        [s,fmin] = goldsearch(@fline,a,b);
        x = x + s*V;
        if sqrt(dot(x-xold,x-xold)/n) < tol
            y = x; % assign the solution to y
            break % Powell's method has converged
        end
        % Identify the biggest decrease of f and update the
        % search directions.
        imax = 1; dfmax = df(1);
        for i = 2:n
            if df(i) > dfmax
                imax = i; dfmax = df(i);
            end
        end
        for i = imax:n-1
            u(1:n,i) = u(1:n,i+1);
        end
        u(1:n,n) = V;
    end % end of Powell's method
    x = y; % y is the minimum point found using Powell's
           % method in the kth iteration
    miw = beta*miw;
    if sqrt(dot(f - obj, f - obj)) < tol1
        return
    end
end % end of SUMT iteration.

% f in the direction of the coordinate axes.
function z = fline(s) % f in the search direction V
global x miw V
z = feval(@unconweldedbeam,x+s*V,miw);

% Start of golden bracketing for the minimum.
function [a,b] = goldbracket(func,x1,h)
% Brackets the minimum point of f(x).
% Usage: [a,b] = goldbracket(func,xstart,h)
% input:
% func = handle of function that returns f(x).
% x1 = starting value of x.
% h = initial step size used in search.
% c = a constant factor used to increase the step size h.
% output:
% a, b = limits on x at the minimum point.
c = 1.618033989;
f1 = feval(func,x1);
x2 = x1 + h; f2 = feval(func,x2);
% Determine the downhill direction and change the sign of h if needed.
if f2 > f1
    h = -h;
    x2 = x1 + h; f2 = feval(func,x2);
    % Check if the minimum is between x1 - h and x1 + h.
    if f2 > f1
        a = x2; b = x1 - h; return
    end
end
% Search loop for the minimum.
for i = 1:100
    h = c*h;
    x3 = x2 + h; f3 = feval(func,x3);
    if f3 > f2
        a = x1; b = x3; return
    end
    x1 = x2; x2 = x3; f2 = f3;
end
error('goldbracket did not find a minimum; please try another starting point')
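The SUMT loop above (minimize, test the penalty term against tol1, multiply the penalty parameter by beta) can be sketched in Python. This is a hedged illustration, not the paper's code: a crude coordinate search stands in for Powell's method, and the example problem is hypothetical.

```python
def coordinate_search(F, x, step=0.5, shrink=0.5, tol=1e-9):
    """Crude derivative-free minimizer, standing in for Powell's method."""
    x = list(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for d in (step, -step):  # try a move in each direction
                trial = list(x)
                trial[i] += d
                if F(trial) < F(x):
                    x = trial
                    improved = True
        if not improved:
            step *= shrink  # no move helped: refine the step size
    return x

def penalty_method(f, gs, x0, mu=0.1, beta=2.0, N=30, tol1=1e-6):
    """SUMT: minimize f subject to g_i(x) <= 0 with a quadratic penalty."""
    def F(x, mu):
        return f(x) + mu * sum(max(0.0, g(x)) ** 2 for g in gs)
    x = list(x0)
    for k in range(N):
        x = coordinate_search(lambda y: F(y, mu), x)
        # Stop when the penalized and plain objectives nearly agree
        # (the analogue of sqrt(dot(f - obj, f - obj)) < tol1).
        if abs(F(x, mu) - f(x)) < tol1:
            break
        mu *= beta  # penalty multiplier update: mu = beta * mu
    return x

# Example: minimize x1^2 + x2^2 subject to x1 >= 1; the solution is (1, 0).
xstar = penalty_method(lambda x: x[0] ** 2 + x[1] ** 2,
                       [lambda x: 1.0 - x[0]],
                       [3.0, 2.0])
```

As the section notes for sequential methods, the intermediate minimizers are infeasible (here x1 = µ/(1+µ) < 1) and approach the constrained solution only as µ grows.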
246 Afr. J. Math. Comput. Sci. Res.
% Start of golden search for the minimum.
function [xmin,fmin] = goldsearch(func,a,b,tol2)
% Golden section search for the minimum of f(x).
% The minimum point must be bracketed in a <= x <= b.
% Usage: [xmin,fmin] = goldsearch(func,a,b,tol2)
% input:
% func = handle of function that returns f(x).
% a, b = limits of the interval containing the minimum.
% tol2 = error tolerance used in golden section.
% output:
% xmin = value of x at the minimum point.
% fmin = minimum value of f(x).
if nargin < 4; tol2 = 1.0e-6; end
nIter = ceil(-2.078087*log(tol2/abs(b-a)));
R = 0.618033989; % R is called the golden ratio.
C = 1.0 - R;
% First telescoping.
x1 = R*a + C*b;
x2 = C*a + R*b;
f1 = feval(func,x1);
f2 = feval(func,x2);
% Main loop
for i = 1:nIter
    if f1 > f2
        a = x1; x1 = x2; f1 = f2;
        x2 = C*a + R*b;
        f2 = feval(func,x2);
    else
        b = x2; x2 = x1; f2 = f1;
        x1 = R*a + C*b;
        f1 = feval(func,x1);
    end
end
if f1 < f2; fmin = f1; xmin = x1;
else
    fmin = f2; xmin = x2;
end
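For checking the bracketing and golden-section logic outside MATLAB, here is a line-by-line Python transcription (an illustrative sketch, not part of the paper's listing):

```python
import math

def goldbracket(f, x1, h, c=1.618033989):
    """Bracket a minimum of f, starting from x1 with initial step h."""
    f1 = f(x1)
    x2 = x1 + h
    f2 = f(x2)
    if f2 > f1:                # make h point downhill
        h = -h
        x2 = x1 + h
        f2 = f(x2)
        if f2 > f1:            # minimum lies between x1 - h and x1 + h
            return x2, x1 - h
    for _ in range(100):       # grow the step geometrically
        h *= c
        x3 = x2 + h
        f3 = f(x3)
        if f3 > f2:            # f turned upward: [x1, x3] brackets the minimum
            return x1, x3
        x1, x2, f2 = x2, x3, f3
    raise RuntimeError("goldbracket did not find a minimum")

def goldsearch(f, a, b, tol2=1e-6):
    """Golden-section search; the minimum must lie in [a, b]."""
    n = math.ceil(-2.078087 * math.log(tol2 / abs(b - a)))
    R = 0.618033989            # the golden ratio
    C = 1.0 - R
    x1, x2 = R * a + C * b, C * a + R * b
    f1, f2 = f(x1), f(x2)
    for _ in range(n):         # telescope the interval
        if f1 > f2:
            a, x1, f1 = x1, x2, f2
            x2 = C * a + R * b
            f2 = f(x2)
        else:
            b, x2, f2 = x2, x1, f1
            x1 = R * a + C * b
            f1 = f(x1)
    return (x1, f1) if f1 < f2 else (x2, f2)

# Example: f(x) = (x - 2)^2 has its minimum at x = 2.
a, b = goldbracket(lambda x: (x - 2.0) ** 2, 0.0, 0.1)
xmin, fmin = goldsearch(lambda x: (x - 2.0) ** 2, a, b)
```

The iteration count n comes from solving R^n · |b−a| ≈ tol2, since each golden-section step shrinks the bracket by the factor R (−2.078087 ≈ 1/ln R).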