Metaheuristics Introduction 2

INTRODUCTION

Optimality Criteria
A point x which satisfies all the constraints is called a feasible point and
thus is a feasible solution to the problem.
The set of all feasible points is called the feasible region.
A point x∗ is called a strong local maximum of the nonlinearly
constrained optimization problem if f(x) is defined in a δ-
neighbourhood N(x∗, δ) and satisfies f(x∗) > f(u) for all u ∈ N(x∗, δ),
where δ > 0 and u ≠ x∗.
If x∗ is not a strong local maximum, allowing equality in the
condition, f(x∗) ≥ f(u) for all u ∈ N(x∗, δ), defines the point x∗
as a weak local maximum.
Figure 1. Strong and weak maxima and minima
Example
The minimum of f(x) = x² at x = 0 is a strong local minimum.
The minimum of g(x, y) = (x − y)² at x = y = 0 is a weak local
minimum, because g(x, y) = 0 along the whole line x = y, so that
g(x, x) = 0 = g(0, 0).
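Both cases can be checked numerically; a minimal sketch in Python (the sampling offsets are arbitrary choices):

```python
# Strong vs. weak local minima, checked by sampling points near the optimum.
def f(x):          # f has a strong local minimum at x = 0
    return x ** 2

def g(x, y):       # g has a weak local minimum at (0, 0): g = 0 along x = y
    return (x - y) ** 2

# Every nearby point other than 0 has a strictly larger f-value (strong).
assert all(f(0) < f(d) for d in (-0.1, -0.01, 0.01, 0.1))

# But g attains the same minimum value 0 at other nearby points on x = y (weak).
assert g(0, 0) == g(0.05, 0.05) == 0
```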
Combinatorial Problems

In the past 57 years, many researchers have studied the problem of
finding optimal solutions to problems which can be structured as a
function of some decision variables, perhaps in the presence of some
constraints.
Such problems can be formulated generally as follows:

minimize f(x)
subject to g_i(x) ≤ 0, i = 1,…,m
           h_i(x) = 0, i = 1,…,p

Here, x is a vector of decision variables, and f(.), g_i(.) and h_i(.) are
general functions.
There are many specific classes of such problems, obtained by
placing restrictions on the types of functions under consideration
and on the values that the decision variables can take.

Perhaps the best known of these classes is that obtained by
restricting f(.), g_i(.) and h_i(.) to be linear functions of decision
variables which are allowed to take fractional (continuous) values;
this leads to problems of linear programming.
Another class of problems are those of a combinatorial
nature. This term is usually reserved for problems in
which the decision variables are discrete - i.e. where the
solution is a set, or a sequence, of integers or other
discrete objects.
The problem of finding optimal solutions to such
problems is therefore known as combinatorial optimization.
For example, if x takes k discrete values (e.g. integers), and if there
are K variables, the number of all possible solutions will be k^K.
It is difficult to check all possible solutions in order to find the best
one(s).
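As a quick illustration of this growth (k = 10 is an arbitrary choice):

```python
# Size of the solution space for K discrete variables, each taking k values.
k = 10
for K in (5, 10, 20):
    print(f"K = {K:2d}: {k ** K} candidate solutions")

# Even at one billion evaluations per second, the 10**20 candidates for
# K = 20 would take more than 3,000 years to enumerate exhaustively.
seconds_per_year = 31_536_000
years = 10 ** 20 / (10 ** 9 * seconds_per_year)
print(f"about {years:.0f} years")
```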
Some examples of this kind of problem are as
follows.
The assignment problem

A set of n people is available to carry out n tasks. If person i does task
j, it costs c_ij units.
Find an assignment π of {1,…,n} which minimizes

∑ c_i,π(i) over i = 1,…,n

Here the solution is represented by the permutation π of the
numbers {1,…,n}.
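On a small instance this can be solved by brute force over all permutations; the cost matrix below is invented for illustration:

```python
from itertools import permutations

# Brute-force assignment: person i is given task perm[i]; minimize total cost.
# The cost matrix c[i][j] (cost of person i doing task j) is invented data.
c = [[9, 2, 7],
     [6, 4, 3],
     [5, 8, 1]]
n = len(c)

best_cost, best_perm = min(
    (sum(c[i][perm[i]] for i in range(n)), perm)
    for perm in permutations(range(n))
)
print(best_perm, best_cost)   # the optimal permutation and its total cost
```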
The knapsack problem

The knapsack problem is another combinatorial problem, defined by:
– There are N objects;
– Each object has a weight and a value;
– The knapsack has a capacity;
– The user has a quota (minimum desired value).
The problem is to find a subset of the objects that can be put into
the knapsack and maximizes the total value.
The 0-1 knapsack problem
A set of n items is available to be packed into a knapsack with
capacity C units. Item i has value vi and uses up ci units of capacity.
Determine the subset I of items which should be packed in order to
maximize

∑_{i ∈ I} v_i

such that

∑_{i ∈ I} c_i ≤ C

Here the solution is represented by the subset I ⊆ {1,…,n}.
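A standard dynamic-programming sketch for this problem (the item values, weights and capacity below are made-up data):

```python
# 0-1 knapsack by dynamic programming: dp[w] = best value within capacity w.
values = [60, 100, 120]          # v_i (invented data)
weights = [1, 2, 3]              # c_i (invented data)
C = 5                            # knapsack capacity

dp = [0] * (C + 1)
for v, w in zip(values, weights):
    for cap in range(C, w - 1, -1):   # iterate downwards so each item is used once
        dp[cap] = max(dp[cap], dp[cap - w] + v)

print(dp[C])   # maximum total value
```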


Traveling salesman problem (TSP)

Given N users located in N different places (cities), the problem is to
find a route so that the salesman can visit all users once (and only
once), starting from and returning to his own place (i.e. to find a
Hamiltonian cycle).
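For a handful of cities, the shortest Hamiltonian cycle can be found by exhaustive search; the symmetric distance matrix here is invented:

```python
from itertools import permutations

# Exhaustive TSP on a tiny instance: fix city 0 as the start and permute
# the remaining cities. The symmetric distance matrix is invented data.
d = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
n = len(d)

def tour_length(order):
    route = (0,) + order + (0,)                  # start and end at city 0
    return sum(d[a][b] for a, b in zip(route, route[1:]))

best = min(permutations(range(1, n)), key=tour_length)
print(best, tour_length(best))
```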
The set covering problem
A family of m subsets collectively contains n items, such that subset
S_i contains n_i (≤ n) items. Select k (≤ m) subsets S_i1, …, S_ik such that

S_i1 ∪ … ∪ S_ik = {1,…,n}

so as to minimize

c_i1 + … + c_ik

where c_i is the cost of selecting subset S_i.
Here the solution is represented by the family of subsets {S_i1, …, S_ik}.
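A common greedy heuristic for set covering repeatedly picks the subset with the best ratio of newly covered items to cost; it does not guarantee the minimum-cost cover. A sketch with an invented family of subsets:

```python
# Greedy heuristic for set covering: repeatedly pick the subset with the
# best (newly covered items) / cost ratio until every item is covered.
universe = set(range(1, 8))                       # items 1..7
subsets = {"S1": {1, 2, 3}, "S2": {3, 4, 5},
           "S3": {5, 6, 7}, "S4": {1, 4, 7}}
cost = {"S1": 3, "S2": 2, "S3": 3, "S4": 4}

covered, chosen = set(), []
while covered != universe:
    best = max(subsets, key=lambda s: len(subsets[s] - covered) / cost[s])
    chosen.append(best)
    covered |= subsets[best]
print(chosen)   # the selected family of subsets
```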
The vehicle routing problem
A depot has m vehicles available to make deliveries to n customers.
The capacity of vehicle k is Ck units, while customer i requires ci units.
The distance between customers i and j is di,j. No vehicle may travel
more than D units.

Allocate customers to vehicles, and find the order in which each
vehicle visits its customers, so as to minimize total distance.
Here, vehicle k visits n_k customers, and the solution is represented by
a permutation of the numbers {1,…,n}, partitioned into blocks of
sizes {n_k}, one block per vehicle. It is also understood that the
depot is represented by the 'customers' (0, k) and (n_k + 1, k) for
each k.
Links with linear programming

Combinatorial problems have close links with linear programming
(LP), and most of the early attempts to solve them used
developments of LP methods, generally by introducing integer
variables taking the values 0 or 1, in order to produce an integer
programming (IP) formulation.
For example, in the case of the 0-1 knapsack problem, we define

x_i = 1 if item i is packed, and x_i = 0 otherwise.

The problem then reduces to the following integer program:

maximize ∑ v_i x_i
such that ∑ c_i x_i ≤ C, x_i ∈ {0, 1} for i = 1,…,n.


However, in many cases it needs some ingenuity to find an
IP formulation of a combinatorial optimization problem.
Moreover, such formulations often involve very large
numbers of variables and constraints, and general-
purpose IP computer codes cannot usually cope with very
large problems. IP is actually much harder than ordinary
LP.
However, although IP is not in general a successful route
to finding optimal solutions to combinatorial problems,
there are good reasons for its popularity.
Firstly, the act of formulation is itself often helpful in
defining more precisely the nature of a given problem.
Secondly, for many problems, it is possible within IP to
use the technique of Lagrangean relaxation to generate a
lower bound on the optimal solution. Such information
can be very valuable.
Local and Global Optima
A feature of many combinatorial problems is that while
there may be several true or ‘global’ optima, there are
many more that are in some sense only ‘locally’ optimal.

This idea is made more concrete by introducing the concept of a
neighbourhood.
A neighbourhood N(x, δ) of a solution x is a set of solutions
that can be reached from x by a simple operation δ.
Such an operation δ might be the removal of an object from,
or the addition of an object to, a solution. The interchange of two
objects in a solution is another example of such an operation,
one which is particularly common in sequencing problems. Often
these operations are called moves.
If a solution y is better than any other solution in its
neighbourhood N(y, δ), then y is a local optimum with respect
to this neighbourhood.
In some cases it is possible to find a move δ such that a
local optimum is also a global optimum. For example, in
some one-machine sequencing problems, a pairwise
interchange move has this property.
However for many problems this is not possible, and it is
necessary to use some form of implicit enumeration.
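The pairwise-interchange idea can be sketched as a simple local search; the objective here is total completion time on one machine (for which, with distinct processing times, the only local optimum is the shortest-processing-time order), and the processing times are invented:

```python
import itertools

# Local search with pairwise-interchange moves for one-machine sequencing.
# Objective: minimize the total completion time, sum of C_j over all jobs.
p = [4, 1, 3, 2]                      # processing times (invented data)

def total_completion(seq):
    t = total = 0
    for j in seq:
        t += p[j]                     # completion time of job j
        total += t
    return total

seq = list(range(len(p)))             # start from an arbitrary sequence
improved = True
while improved:
    improved = False
    for i, j in itertools.combinations(range(len(seq)), 2):
        current = total_completion(seq)
        seq[i], seq[j] = seq[j], seq[i]           # try the interchange move
        if total_completion(seq) < current:
            improved = True                       # keep the improving move
        else:
            seq[i], seq[j] = seq[j], seq[i]       # undo a non-improving move

print(seq, total_completion(seq))     # converges to the SPT order here
```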
Of course, there are many existing treatments of
combinatorial optimization which tend to concentrate on
methods which are exact rather than heuristic. That is,
they are mainly concerned with those techniques which
guarantee to find the optimal solution to a stated problem.
These methods usually rely on links with the theories of
linear programming or graphs, or else use an implicit
enumeration approach such as branch and bound or
dynamic programming.
Heuristics
The term heuristic derives from the Greek heuriskein
meaning to find or discover.

The word has been used in Artificial Intelligence circles with quite a
different connotation; for example, some authors use the term to
include methods such as branch-and-bound, which find globally
optimal solutions, and this usage probably pre-dates the now
common one.
In Operational Research, a 'heuristic' would be better
described as a 'seeking' method, as it cannot guarantee to
find anything.

Nevertheless, in the usage that has become common in the context
of combinatorial optimization, the term heuristic is used in contrast
to methods which guarantee to find a global optimum.
Definition

A heuristic is a technique which seeks good (i.e. near-optimal)
solutions at a reasonable computational cost, without being able to
guarantee either feasibility or optimality, or even, in many cases, to
state how close to optimality a particular feasible solution is.
The procedure also should be sufficiently efficient to deal
with very large problems.

The procedure is often a full-fledged iterative algorithm, where each
iteration involves conducting a search for a new solution that might
be better than the best solution found previously. When the
algorithm is terminated after a reasonable time, the solution it
provides is the best one found during any iteration.
The causes of the explosion of interest in heuristics seem
to be two-fold:
– The development of the concept of computational complexity
has provided a rational basis for exploring heuristic techniques.
– There has been a significant increase in the power and efficiency
of the more modern approaches.
The case for heuristics
A naive approach to solving an instance of a
combinatorial problem is simply to list all the feasible
solutions of a given problem, evaluate their objective
functions, and pick the best.

However, it is immediately obvious that this approach of complete
enumeration is likely to be grossly inefficient; further, although it is
possible in principle to solve any problem in this way, in practice it
is not, because of the vast number of possible solutions to any
problem of reasonable size.
Consider the famous traveling salesman problem (TSP).

It is a very popular problem because it is so easily stated, yet so
hard to solve.
The problem is as follows: a salesman has to find a route which
visits each of N cities once and only once, and which minimizes the
total distance traveled.
As the starting point is arbitrary, there are clearly (N-1)!
possible solutions (or (N-1)!/2 if the distance between
every pair of cities is the same regardless of the direction
of travel).

Suppose we have a computer that can list all possible solutions of a
20-city problem in 1 hour. Then, using the above formula, it would
take 20 hours to solve a 21-city problem, and 17.5 days to solve a
22-city problem; a 25-city problem would take nearly 6 centuries.
Because of this exponential growth in computing time
with the size of the problem, complete enumeration is
clearly a non-starter.
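The arithmetic behind these figures is easy to reproduce, since an N-city instance has (N−1)! tours:

```python
from math import factorial

# If listing all 19! tours of a 20-city instance takes 1 hour, scaling to
# larger N multiplies the time by (N-1)!/19!.
base = factorial(19)                      # tours listed per hour
for n in (21, 22, 25):
    hours = factorial(n - 1) / base
    print(f"{n} cities: {hours:,.1f} hours = {hours / 24:,.1f} days")
```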
Algorithms and Complexity

To understand what is meant by the complexity of an algorithm, we
must define algorithms, problems, and problem instances.
Moreover, we must understand how one measures the size of a
problem instance and what constitutes a "step" in an algorithm.
A problem is an abstract description coupled with a question
requiring an answer;
for example, the Traveling Salesman Problem (TSP) is: “Given
a graph with nodes and edges and costs associated with the
edges, what is a least-cost closed walk (or tour) containing each
of the nodes exactly once?”
An instance of a problem, on the other hand, includes an exact
specification of the data: for example, “The graph contains
nodes 1, 2, 3, 4, 5, and 6, and edges (1, 2) with cost 10, (1, 3)
with cost 14, …” and so on. Stated more mathematically, a
problem can be thought of as a function p that maps an
instance x to an output p(x) (an answer).
An algorithm for a problem is a set of instructions
guaranteed to find the correct solution to any instance
in a finite number of steps. In other words, for a
problem p, an algorithm is a finite procedure for
computing p(x) for any given input x.
In a simple model of a computing device, a “step” consists
of one of the following operations: addition, subtraction,
multiplication, finite-precision division, and comparison of
two numbers. Thus if an algorithm requires one hundred
additions and 220 comparisons for some instance, we say
that the algorithm requires 320 steps on that instance.
To determine how long an algorithm takes (in the worst case),
asymptotically as the size of an instance gets large, one formulates
a simple function of the input size that is a reasonably tight upper
bound on the actual number of steps.
Such a function is called the complexity or running time of the
algorithm.
Technically, the size of an instance is the number of bits
required to encode it. It is measured in terms of the
inherent dimensions of the instance (such as the number of
nodes and edges in a graph), plus the number of bits
required to encode the numerical information in the
instance (such as the edge costs).
Since numerical data are encoded in binary, an integer C
requires about log2|C| bits to encode and so contributes
logarithmically to the size of the instance. The running time
of the algorithm is then expressed as a function of these
parameters, rather than the precise input size.
For example, for the TSP, an algorithm's running time
might be expressed as a function of the number of
nodes, the number of edges, and the maximum number
of bits required to encode any edge cost.

The complexity of an algorithm is only a rough estimate of the
number of steps that will be required on an instance.
Complexity theory tries to classify problems in terms of the
mathematical order of the computational resources required to
solve them via computer algorithms.
How does the running time grow as the size of the instance
gets very large? To answer such questions, it is useful to introduce
Big-O notation.

For two functions f(t) and g(t) of a nonnegative parameter t, we say
that f(t) = O(g(t)) if there is a constant c > 0 such that, for all
sufficiently large t, f(t) ≤ c·g(t). The function c·g(t) is thus an
asymptotic upper bound on f.
For example, 100(t² + t) = O(t²), since by taking c = 101 the relation
follows for t ≥ 100; however, 0.0001·t³ is not O(t²).
Notice that it is possible for f(t) = O(g(t)) and g(t) = O(f(t))
simultaneously.
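The constant claimed above can be verified directly; at t = 100 the bound holds with equality:

```python
# Verify 100(t² + t) ≤ 101·t² for a range of t ≥ 100 (the constant c = 101).
assert all(100 * (t ** 2 + t) <= 101 * t ** 2 for t in range(100, 10_001))
assert 100 * (100 ** 2 + 100) == 101 * 100 ** 2   # equality exactly at t = 100
```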
The big O notation can also be read as saying that f is
asymptotically of the order of g(x): if

lim f(x)/g(x) = K

where K is a finite, non-zero limit, we write f = O(g). If the limit is
unity, K = 1, we say f(x) is of the order of g(x).
When we say f is of order 100 (or f ~ 100), this does not mean
f = 100; it can mean that f is between about 50 and 150.
From the order definition, we can easily conclude that the addition
of an O(1) number does not change the order of an expression:
n + 1, n + 5 and n + 15 are all O(n). Similarly, a constant O(1)
factor does not change the order: 0.7n, 1.5n, 2n + 1 and 3n + 7 are
all O(n).
Since in computation we deal with large integers n >> 1, we have
n << n² (as n/n² << 1): the term with the highest power of n
dominates. Therefore, n² + 2n + 3 is O(n²), and
n³ + 2n² + 5n + 20 is O(n³).
Let us come back to the computational complexity of an algorithm.
For a sorting algorithm on n data entries, sorting the numbers into
either ascending or descending order takes computational time that
is a function of the problem size n.
O(n) means linear complexity, while O(n²) means quadratic
complexity. That is, if n is doubled, the running time doubles for
linear complexity, but quadruples for quadratic complexity.
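A quick way to see this behaviour is to count comparisons rather than measure time (the step counts below are the standard ones for a maximum scan and a naive pairwise sort):

```python
# Doubling n doubles the work of a linear scan but quadruples a quadratic one.
def linear_steps(n):      # e.g. finding a maximum: n - 1 comparisons
    return n - 1

def quadratic_steps(n):   # e.g. comparing all pairs: n(n - 1)/2 comparisons
    return n * (n - 1) // 2

for n in (100, 200, 400):
    print(n, linear_steps(n), quadratic_steps(n))
```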
It is said that an algorithm runs in polynomial time (is a
polynomial-time algorithm) if the running time f(t) =
O(P(t)), where P(t) is a polynomial function of the input
size.
Polynomial-time algorithms are generally (and formally)
considered efficient, and problems for which
polynomial time algorithms exist are considered “easy.”
For exponential-time algorithms, the worst-case running time grows
as a function that cannot be polynomially bounded by the input
parameters.
Examples: O(2ⁿ), O(n!).

Why is a polynomial-time algorithm better than an exponential-time
one?
Exponential-time algorithms have an explosive growth rate:

       n=5    n=10        n=100          n=1000
n      5      10          100            1000
n^2    25     100         10000          1000000
n^3    125    1000        1000000        10^9
2^n    32     1024        1.27 × 10^30   1.07 × 10^301
n!     120    3.6 × 10^6  9.33 × 10^157  4.02 × 10^2567
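Python's arbitrary-precision integers make it easy to reproduce the table's entries exactly:

```python
from math import factorial

# Exact growth of the table's functions, via arbitrary-precision integers.
for n in (5, 10, 100, 1000):
    print(f"n={n}: n^2={n ** 2}, n^3={n ** 3}, "
          f"2^n has {len(str(2 ** n))} digits, "
          f"n! has {len(str(factorial(n)))} digits")
```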


Polynomial and Non-Deterministic
Polynomial Problems
In some cases, such as the 'Hungarian' method for solving
the assignment problem, or Johnson's method for 2-
machine sequencing, the computing effort could be
shown to grow as a low-order polynomial in the size of
the problem.
However, for many others, such as the traveling salesman
problem, the computational effort required was an
exponential function of the problem size.
The Class P

Problems are classified according to the complexity of their best
algorithms. The problems that have polynomial-time algorithms are
in the complexity class P,
i.e. P includes all decision problems for which there is an
algorithm that halts with the correct yes/no answer in a
number of steps bounded by a polynomial in the problem
size n.
Most of these problems are search problems, searching an
exponentially big solution space.
Algorithms like greedy methods, Linear Programming or Dynamic
Programming make intelligent choices to restrict the search volume.
The Class P in general is thought of as being
composed of relatively “easy” problems for which
efficient algorithms exist.
In mathematical programming, an easy or tractable
problem is a problem whose solution can be obtained by
computer algorithms with a solution time (or number of
steps) as a polynomial function of problem size n.
Algorithms with polynomial-time are considered efficient.
A problem is called a P-problem, or polynomial-time
problem, if the number of steps needed to find the
solution is bounded by a polynomial in n and it has at least
one algorithm to solve it.
Examples of Class P Problems

• Shortest path
• Minimum spanning tree
• Network flow
• Transportation, assignment and transshipment
• Some single machine scheduling problems
NP
Another class of problems is NP.
NP stands for Non-deterministic Polynomial time.
NP is the class of decision problems for which we can check
solutions in polynomial time,
i.e. problems that are easy to verify but not necessarily easy to solve.
When a problem is not known to be solvable in polynomial time,
but a proposed solution can still be verified in polynomial time, the
problem is said to have NP complexity. The main difficulty is to
find such a solution in the first place in an efficient way; doing so
may require more than polynomial time.
We can only guess a solution, and non-deterministic means that no
particular rule is followed to make the guess. No efficient algorithm
is known for the hardest problems in this class; if anything, there is
a non-deterministic algorithm (based on guessing correctly), but we
can verify a given solution in polynomial time.
Formally, it is the set of decision problems such that if x
is a “yes” instance then this could be verified in polynomial
time if a clue or certificate whose size is polynomial in
the size of x is appended to the problem input.
NP includes all those decision problems that could be
polynomial-time solved if the right (polynomial-length)
clue is appended to the problem input.
Class P vs Class NP

Class P contains all those problems that have been conquered with
well-bounded, constructive algorithms.
Class NP includes the decision problem versions of
virtually all the widely studied combinatorial optimization
problems.
P is a subset of NP.
NP-Complete

A refinement of the class NP is NP-complete.
For NP-complete problems, no polynomial-time algorithms are
known.
A problem is in the class NP-complete if it is in NP and is as hard
as any problem in NP.
A problem is NP-hard if all problems in NP are polynomial-time
reducible to it, even though it may not be in NP itself.
A language B is NP-complete if it satisfies two conditions
• B is in NP
• Every A in NP is polynomial time reducible to B.
If a language satisfies the second property, but not necessarily the
first one, the language B is known as NP-hard. Informally, a search
problem B is NP-hard if there exists some NP-complete problem A
that Turing-reduces to B.
Problems in NP-hard cannot be solved in polynomial time unless
P = NP.
Thus a problem is NP-complete if it is in NP and all other problems
in NP are reducible to it; every NP-complete problem is NP-hard.
If an efficient algorithm can be found for one NP-complete
problem, then all the others can also be solved efficiently.
Models and problems

What we are actually optimizing is a model of a real-world problem;
there is no guarantee that the best solution to the model is also the
best solution to the underlying real-world problem.
Should we prefer an exact solution of an approximate model, or an
approximate solution of an exact model?
Heuristics are usually rather more flexible and are capable
of coping with more complicated (and more realistic)
objective functions and/or constraints than exact
algorithms.
This is the case for methods like simulated annealing, tabu
search and genetic algorithms, for example, where
objective functions need no simplifying assumptions of
linearity. Thus it may be possible to model the real-world
problem rather more accurately than is possible if an
exact algorithm is used.
Evaluation of heuristics
Whenever a heuristic is applied, the user always faces the
problem of how good the generated solution really is. In
some cases it is possible to analyze heuristic procedures
explicitly and find theoretical results bearing on their
average or worst-case performance. However, analysis of
general performance in this way is often difficult, and in
any case may provide little help in deciding how well a
heuristic has performed in a particular instance.
Suppose that we have some instance of an NP-hard minimization
problem. Imagine, as in Figure 1, a vertical line representing
solution value (the higher up this line, the higher the value);
somewhere on this line is the optimal solution value for the
problem instance. Exactly where on this line the optimal solution
lies is unknown, but it must be somewhere.
Figure 1: Diagrammatic representation of a minimization problem
Conceptually, therefore, this optimal solution value divides the
value line into two:
• above the optimal solution value are upper bounds, values which
lie above the (unknown) optimal solution value;
• below the optimal solution value are lower bounds, values which
lie below the (unknown) optimal solution value.
In order to discover the optimal solution value, an algorithm really
should address both of these: it must concern itself with both
upper and lower bounds.
In particular, the quality of these bounds is important to the
computational success of any algorithm:
– upper bounds should be as close as possible to the optimal
solution, i.e. as small as possible;
– lower bounds should be as close as possible to the optimal
solution, i.e. as large as possible.
In the context of a minimization problem, the upper
bound is generated by a heuristic method. Lower bounds
can be generated by linear programming, using the
technique of dual ascent, or by Lagrangean relaxation,
using subgradient optimization or multiplier adjustment.
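For a maximization problem such as the 0-1 knapsack the roles flip: a heuristic supplies a lower bound and a relaxation an upper bound, but the sandwiching idea is the same. A sketch with invented data:

```python
# Sandwiching the optimum of a 0-1 knapsack between a heuristic bound
# and a relaxation bound (invented item data).
values, weights, C = [60, 100, 120], [10, 20, 30], 50

# Greedy heuristic (a feasible solution): take items by value/weight ratio.
order = sorted(range(3), key=lambda i: values[i] / weights[i], reverse=True)
cap, heuristic = C, 0
for i in order:
    if weights[i] <= cap:
        cap -= weights[i]
        heuristic += values[i]

# Fractional (LP) relaxation: the last item taken may be split.
cap, relaxation = C, 0.0
for i in order:
    take = min(weights[i], cap)
    relaxation += values[i] * take / weights[i]
    cap -= take

print(heuristic, "<= optimum <=", relaxation)
```

Here the true optimum (220) lies between the heuristic's 160 and the relaxation's 240, and the gap measures how far either bound can be from optimal.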
Figure 2. Connection between heuristics and relaxation