Line Search Methods
We consider the problem of minimizing a function $f(x)$ over $x \in \mathbb{R}^n$.
Any optimization algorithm starts with an initial point $x_0$ and performs a series of iterations to reach the optimal point $x^*$. At the $k$-th iteration the next point is given by the sum of the old point and the direction in which to search for the next point, multiplied by how far to go in that direction. Thus
\[ x_{k+1} = x_k + \alpha_k d_k \]
where $d_k$ is a search direction and $\alpha_k$ is a positive scalar determining how far to go in that direction; it is called the step length. Additionally, the new point must be such that the value of the function we are optimizing at that point is less than or equal to its value at the previous point. This is quite natural: if we moved in a direction along which the function value increases, we would not be moving towards the minimum. Thus
\[ f(x_{k+1}) \le f(x_k), \qquad f(x_k + \alpha_k d_k) \le f(x_k). \]
The general optimization algorithm begins with an initial point, finds a descent search direction, determines the step length, and checks the termination criteria. When we are trying to find the step length, we already know that the direction in which we are going is a descent direction, so ideally we want to go far enough that the function reaches its minimum along it. Thus, given the previous point and a descent search direction, we are trying to find a scalar step length such that the value of the function is minimized in that direction. In mathematical form,
\[ \min_{\alpha \ge 0} f(x + \alpha d). \]
Since $x$ and $d$ are known, this problem reduces to a univariate (1D) minimization problem. Assuming that $f$ is smooth and continuous, we find its optimum where the first derivative with respect to $\alpha$ is zero, i.e. $\frac{d}{d\alpha} f(x + \alpha d) = 0$. All we are doing is trying to find the zero of a function (i.e. the point where the curve intersects the $x$-axis). We will visit some well-known and some newer zero-finding (or root-finding) techniques.
When the dimensionality of the problem or the degree of the equation increases, finding an exact zero is difficult. Usually we are instead looking for an interval $[a, b]$ containing the zero such that $|a - b| < \epsilon$, where $\epsilon$ is an acceptable tolerance.
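To make the reduction concrete, here is a minimal sketch; the objective $f$, the point $x$, and the direction $d$ are hypothetical, chosen only to illustrate how the multivariate problem collapses to a function of the single scalar $\alpha$.

```python
import numpy as np

# Hypothetical objective and descent direction, for illustration only.
def f(x):
    return x[0]**2 + 4 * x[1]**2

x = np.array([1.0, 1.0])       # current point x_k
d = np.array([-2.0, -8.0])     # a descent direction (the negative gradient of f at x)

def phi(alpha):
    # The line-search objective: a function of the single scalar alpha.
    return f(x + alpha * d)

# The step-length problem is now the univariate problem  min_{alpha >= 0} phi(alpha).
print(phi(0.0), phi(0.05), phi(0.1))
```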
1 Bisection Method
In the bisection method we begin with an interval $[a, b]$ containing the zero and divide it into two halves, i.e. $[a, \frac{a+b}{2}]$ and $[\frac{a+b}{2}, b]$. The next search interval is chosen by finding which of the two halves contains the zero, which is done by checking the sign of the function at the endpoints. The algorithm is as follows. Choose $a, b$ so that $f(a)f(b) < 0$, then repeat:
1. $m = \frac{a+b}{2}$
2. If $f(a)f(m) < 0$, the zero lies in $[a, m]$: set $b = m$; otherwise set $a = m$.
3. Stop when $|a - b| < \epsilon$.
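A minimal sketch of the algorithm above (the function and interval are taken from the example that follows; the routine name and tolerance are illustrative):

```python
def bisect(f, a, b, tol=1e-4):
    """Root of f in [a, b], assuming f is continuous and f(a), f(b) have opposite signs."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while abs(b - a) > tol:
        m = (a + b) / 2.0            # step 1: midpoint of the current interval
        if f(a) * f(m) < 0:          # step 2: the zero lies in [a, m]
            b = m
        else:                        # otherwise it lies in [m, b]
            a = m
    return (a + b) / 2.0

# The example below: f(x) = x/(1+x^2) on [-0.6, 0.75]; the zero is at x = 0.
print(bisect(lambda x: x / (1 + x**2), -0.6, 0.75))
```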
1.1 Features of bisection
- Guaranteed convergence.
- Slow, as it has linear convergence with rate $r = 0.5$.
- At the $k$-th iteration the interval length is $\frac{|b-a|}{2^k}$; setting this equal to $\epsilon$ gives the total number of evaluations as $k = \log_2\!\left(\frac{|b-a|}{\epsilon}\right)$.
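For instance, in the example below $|b - a| = 0.75 - (-0.6) = 1.35$; taking $\epsilon = 10^{-4}$ (the tolerance used in the Newton example later) gives $k = \log_2(1.35/10^{-4}) \approx 13.7$, i.e. about 14 iterations, which matches the 14 rows of the table.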
1.2 Example
Convergence of bisection for $f(x) = \frac{x}{1+x^2}$ with the initial interval $[-0.6, 0.75]$:

Number   a           b          f(b)        m           f(m)
1        -0.6        0.75       0.48        0.075       0.07458
2        -0.6        0.075      0.07458     -0.2625     -0.245578
3        -0.2625     0.075      0.07458     -0.09375    -0.092933
4        -0.09375    0.075      0.07458     -0.009375   -0.009374
5        -0.009375   0.075      0.07458     0.032813    0.032777
6        -0.009375   0.032813   0.032777    0.011719    0.011717
7        -0.009375   0.011719   0.011717    0.001172    0.001172
8        -0.009375   0.001172   0.001172    -0.004102   -0.004101
9        -0.004102   0.001172   0.001172    -0.001465   -0.001465
10       -0.001465   0.001172   0.001172    -0.000146   -0.000146
11       -0.000146   0.001172   0.001172    0.000513    0.000513
12       -0.000146   0.000513   0.000513    0.000183    0.000183
13       -0.000146   0.000183   0.000183    0.000018    0.000018
14       -0.000146   0.000018   0.000018    -0.000064   -0.000064
2 Newton's Method
Newton's method makes a linear approximation of the function and finds the x-intercept of that approximation, thereby improving on the performance of the bisection method. The linear approximation comes from the Taylor series: expanding about the current point, $f(x) \approx f(x_k) + f'(x_k)(x - x_k)$, and setting this to zero gives the iteration
\[ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}. \]
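A minimal sketch of the iteration (the derivative is supplied analytically; the routine name and the stopping test on $|f(x)|$ are illustrative choices):

```python
def newton(f, fprime, x0, tol=1e-4, max_iter=50):
    """Newton's method for f(x) = 0, started from x0."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / fprime(x)       # x-intercept of the tangent line at x
    return x

f = lambda x: x / (1 + x**2)
fprime = lambda x: (1 - x**2) / (1 + x**2)**2
print(newton(f, fprime, 0.5))        # converges to the zero at 0, as in the table below
```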
- Newton's method has quadratic convergence when the starting point is chosen close enough to the zero.
- If the derivative is zero at the root (a multiple root), convergence degrades to linear.
- Numerical difficulties occur when the first derivative is zero or very small.
- If a poor starting point is chosen, the method may fail to converge or may diverge.
- If the derivative is difficult to obtain analytically, the secant method may be used.
For a smooth, continuous function with a properly chosen starting point, Newton's method can be very fast. The table below shows the convergence for $f(x) = \frac{x}{1+x^2}$ with $x_0 = 0.5$ and $\epsilon = 10^{-4}$:

k    x_k
1    0.5
2    -0.333333
3    0.083333
4    -0.001166
5    0
With a poor starting point, however, the iterates diverge. For the same $f(x) = \frac{x}{1+x^2}$ with $x_0 = 0.75$:

k    x_k
1    0.75
2    -1.928571
3    -5.275529
4    -10.944297
5    -22.072876
6    -44.236547
...
20   -725265.55757
3 Secant Method
When $f'$ is expensive or cumbersome to calculate, one can use the secant method to approximate the derivative. The method is derived by replacing the first derivative in Newton's method by its finite-difference approximation, i.e. $f'(x_k) \approx \frac{f_k - f_{k-1}}{x_k - x_{k-1}}$, where $f_k = f(x_k)$. This gives
\[ x_{k+1} = x_k - \frac{x_k - x_{k-1}}{f_k - f_{k-1}} \, f_k. \]
Just as with Newton's method, the secant iteration for finding the minimum (applied to $f'$ rather than $f$) is
\[ x_{k+1} = x_k - \frac{x_k - x_{k-1}}{f'_k - f'_{k-1}} \, f'_k. \]
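A sketch of the root-finding form of the update; applying the same code to $f'$ instead of $f$ gives the minimization form above (routine name and tolerance are illustrative):

```python
def secant(f, x0, x1, tol=1e-4, max_iter=50):
    """Secant method for f(x) = 0, started from the two points x0 and x1."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if abs(f1) < tol:
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)   # Newton step with a finite-difference slope
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    return x1

print(secant(lambda x: x / (1 + x**2), -0.6, 0.75))   # approx. 0, as in the table below
```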
Convergence of the secant method is super-linear with $r = 1.618$. The table below shows the secant method convergence for $f(x) = \frac{x}{1+x^2}$ with $-0.6$ and $0.75$ as the initial points:
x_{k-1}      x_k          f(x_{k-1})    f(x_k)
-0.6         0.75         -0.441176     0.48
0.75         0.046552     0.48          0.046451
0.046552     -0.028817    0.046451      -0.028793
-0.028817    0.000024     -0.028793     0.000024
4 Golden Section Method
The golden section method brackets the minimum directly: the interval $[a, b]$ is reduced by the factor $0.618$ (the golden-section ratio) at each iteration, and only one new function evaluation is needed per iteration because one interior point is re-used. The table below shows the iterations for $f(x) = \frac{x}{1+x^2}$ on the initial interval $[-0.6, 0.75]$; on this interval $f$ is increasing, so the minimum lies at the left endpoint $-0.6$ and the upper end of the bracket shrinks towards it:
Number   a       b            f(a)         f(b)
0        -0.6    0.75         -0.441176    0.48
1        -0.6    0.234346     -0.441176    0.222146
2        -0.6    -0.084346    -0.441176    -0.08375
3        -0.6    -0.281308    -0.441176    -0.26068
...
20       -0.6    -0.599911    -0.441176    -0.441146
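A minimal sketch consistent with the table above: the bracket shrinks by the factor 0.618 each iteration and one interior point is re-used, so only one new function evaluation is needed per iteration (the routine name and tolerance are illustrative):

```python
GOLDEN = 0.618  # the golden-section ratio 1/phi

def golden_section(f, a, b, tol=1e-4):
    """Minimize a unimodal f on [a, b] by golden-section search."""
    x1 = b - GOLDEN * (b - a)        # left interior point
    x2 = a + GOLDEN * (b - a)        # right interior point
    f1, f2 = f(x1), f(x2)
    while abs(b - a) > tol:
        if f1 < f2:                  # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - GOLDEN * (b - a)
            f1 = f(x1)
        else:                        # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + GOLDEN * (b - a)
            f2 = f(x2)
    return (a + b) / 2.0

# The table above: the minimum of x/(1+x^2) on [-0.6, 0.75] lies at the endpoint -0.6.
print(golden_section(lambda x: x / (1 + x**2), -0.6, 0.75))
```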
4.1 Example
The table below shows the step-length interval being reduced by the golden-section factor $0.618$ at each iteration:
Step-length Interval    λ_i = 0.618 λ_{i-1}    x1         x2
[0, 0.0557]             0.0344                 0.0344     1.0688
[0, 0.0344]             0.0213                 0.0213     1.0426
[0, 0.0213]             0.0132                 0.0132     1.0264
[0, 0.0132]             0.0081                 0.0081     1.0162
[0, 0.0081]             0.0050                 0.0050     1.0100
[0, 0.0050]             0.0031                 0.0031     1.0062
[0, 0.00031]            0.0002                 0.0002     1.0004
[0, 0.0002]             0.0001                 0.0001     1.0002
[0, 0.0001]             0.00006                0.00006    1.00012
[0, 0.00006]            0.000034               0.00004    1.00008
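To connect this back to the step-length problem, the same golden_section routine sketched above can be applied to $\phi(\lambda) = f(x + \lambda d)$ on a bracketing interval; the objective, point, and direction here are hypothetical, for illustration only:

```python
import numpy as np

def f(x):
    return x[0]**2 + 4 * x[1]**2     # hypothetical objective

x = np.array([1.0, 1.0])             # current point
d = np.array([-2.0, -8.0])           # descent direction (negative gradient at x)

phi = lambda lam: f(x + lam * d)     # step-length objective
lam = golden_section(phi, 0.0, 1.0)  # bracket [0, 1] assumed to contain the minimizer
print(lam, x + lam * d)              # chosen step length and the new point
```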