7 Newton Raphson Method

Newton Raphson Method

Newton Raphson Method

• Better performance than the steepest descent method due to the use of both the first and second derivatives.

• However, this holds only when the initial guess is sufficiently close to the minimum.

• The problem must be expressed in the form $f(x) = 0$; for optimization, Newton-Raphson is applied to the condition $\nabla f(x) = 0$.

• Question: Can you combine any other method with Newton-Raphson so that its performance can be improved?
Newton Raphson Method

• At each point, a quadratic approximation of the original function is used


$$f(x) \approx f(x^k) + (x - x^k)^T \nabla f(x^k) + \frac{1}{2}(x - x^k)^T F(x^k)(x - x^k) \triangleq q(x)$$

Here, $F(x^k) = \nabla^2 f(x^k)$ is the Hessian.

• Use the FONC (first-order necessary condition): $\nabla q(x) = 0$


$$\Rightarrow \nabla f(x^k) + F(x^k)(x - x^k) = 0$$
This is Newton’s formula as previously discussed.
If $F(x^k) > 0$ (positive definite) at every point, the iteration converges to a point where the gradient is zero.
Newton Raphson Method

• The update equation


$$x^{k+1} = x^k - F(x^k)^{-1} \nabla f(x^k)$$
• Here the order of the terms is important since these are matrix operations.
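As a minimal illustration (not part of the original slides), one Newton-Raphson step could be sketched in Python as below; `grad` and `hess` are assumed user-supplied callables returning $\nabla f(x)$ and $F(x)$:

```python
import numpy as np

def newton_step(x, grad, hess):
    """One Newton-Raphson step: x_new = x - F(x)^{-1} * grad(x)."""
    # Solve F(x) d = grad(x) rather than forming the inverse explicitly.
    d = np.linalg.solve(hess(x), grad(x))
    return x - d
```

Solving the linear system is generally preferred over explicitly inverting the Hessian, which becomes expensive for large $n$.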
Newton Raphson Method

• If we start far from the solution, convergence is not guaranteed.


• Let us consider the function
$$\phi(\alpha) = f(x^k + \alpha d^k),$$
where $d^k = -F(x^k)^{-1} \nabla f(x^k) = x^{k+1} - x^k$ is the search direction.
• Differentiating,
$$\phi'(\alpha) = \nabla f(x^k + \alpha d^k)^T d^k$$
Newton Raphson Method

$$\phi'(\alpha) = \nabla f(x^k + \alpha d^k)^T d^k \;\Rightarrow\; \phi'(0) = \nabla f(x^k)^T d^k = -\nabla f(x^k)^T F(x^k)^{-1} \nabla f(x^k) < 0$$
for $F(x^k)^{-1} > 0$ and $\nabla f(x^k) \neq 0$.
• This means that the slope of $\phi(\alpha)$ at $0$ is negative, i.e. the function is decreasing.
• Hence it is possible to find an $\alpha \in (0, \bar{\alpha}]$ for which $\phi(\alpha) < \phi(0)$,
• which means $f(x^k + \alpha d^k) < f(x^k)$.
• Hence $F(x^k)^{-1} > 0$ and $\nabla f(x^k) \neq 0$ are necessary criteria for convergence.
Newton Raphson Method

• The update equation can be modified to add a learning rate

$$x^{k+1} = x^k - \alpha F(x^k)^{-1} \nabla f(x^k)$$

• Some disadvantages:
• The matrix $F(x^k)$ must be invertible.
• For large $n$, it becomes computationally expensive, since the $n \times n$ Hessian must be formed and inverted.
• We need to start sufficiently close to the solution.
Levenberg Marquardt Modification
$$x^{k+1} = x^k - \alpha F(x^k)^{-1} \nabla f(x^k)$$
• Disadvantage: the matrix $F(x^k)$ must be invertible.
• Solution:
$$x^{k+1} = x^k - \alpha \left(F(x^k) + \mu_k I\right)^{-1} \nabla f(x^k)$$
• where $\mu_k \geq 0$.
• If $\mu_k \to 0$, it approaches Newton's method.
• If $\mu_k \to \infty$, it approaches steepest descent.
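As a hedged sketch (not from the slides), the Levenberg-Marquardt-modified step could be coded as follows; `grad`, `hess`, `mu`, and `alpha` are assumed inputs:

```python
import numpy as np

def lm_newton_step(x, grad, hess, mu, alpha=1.0):
    """Levenberg-Marquardt-modified Newton step:
    x_new = x - alpha * (F(x) + mu * I)^{-1} * grad(x)."""
    # The damping term mu * I keeps the linear system solvable even when
    # F(x) is singular or indefinite; mu -> 0 recovers Newton's method,
    # while a large mu behaves like a (scaled) steepest-descent step.
    d = np.linalg.solve(hess(x) + mu * np.eye(x.size), grad(x))
    return x - alpha * d
```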
Example

• Using the Newton-Raphson method to minimize the Powell function

$$f(x_1, x_2, x_3, x_4) = (x_1 + 10x_2)^2 + 5(x_3 - x_4)^2 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4$$

Start with $x^0 = (3, -1, 0, 1)^T$.


Example

• Using the Newton-Raphson method to minimize the Powell function

$$f(x_1, x_2, x_3, x_4) = (x_1 + 10x_2)^2 + 5(x_3 - x_4)^2 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4$$

1. Find Gradient
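For reference, differentiating the Powell function term by term gives the gradient below (a straightforward calculation added here, not taken from the slide):

$$\nabla f(x) = \begin{bmatrix} 2(x_1 + 10x_2) + 40(x_1 - x_4)^3 \\ 20(x_1 + 10x_2) + 4(x_2 - 2x_3)^3 \\ 10(x_3 - x_4) - 8(x_2 - 2x_3)^3 \\ -10(x_3 - x_4) - 40(x_1 - x_4)^3 \end{bmatrix}$$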
Example

• Using the Newton-Raphson method to minimize the Powell function

$$f(x_1, x_2, x_3, x_4) = (x_1 + 10x_2)^2 + 5(x_3 - x_4)^2 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4$$

2. Find the Hessian
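Similarly, differentiating the gradient once more gives the Hessian (again worked out here, not taken from the slide):

$$F(x) = \begin{bmatrix}
2 + 120(x_1 - x_4)^2 & 20 & 0 & -120(x_1 - x_4)^2 \\
20 & 200 + 12(x_2 - 2x_3)^2 & -24(x_2 - 2x_3)^2 & 0 \\
0 & -24(x_2 - 2x_3)^2 & 10 + 48(x_2 - 2x_3)^2 & -10 \\
-120(x_1 - x_4)^2 & 0 & -10 & 10 + 120(x_1 - x_4)^2
\end{bmatrix}$$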


Example

• Using the Newton-Raphson method to minimize the Powell function

$$f(x_1, x_2, x_3, x_4) = (x_1 + 10x_2)^2 + 5(x_3 - x_4)^2 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4$$

3. Start with $x^0 = (3, -1, 0, 1)^T$, perform iteration 1, and repeat until $\nabla f(x) = 0$ or a stopping criterion is met.
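A minimal Python sketch of these steps (my own illustration, not from the slides), using the gradient and Hessian derived above and a simple gradient-norm stopping criterion:

```python
import numpy as np

def grad_f(x):
    x1, x2, x3, x4 = x
    return np.array([
        2*(x1 + 10*x2) + 40*(x1 - x4)**3,
        20*(x1 + 10*x2) + 4*(x2 - 2*x3)**3,
        10*(x3 - x4) - 8*(x2 - 2*x3)**3,
        -10*(x3 - x4) - 40*(x1 - x4)**3,
    ])

def hess_f(x):
    x1, x2, x3, x4 = x
    a = 120*(x1 - x4)**2       # recurring quartic curvature terms
    b = 12*(x2 - 2*x3)**2
    return np.array([
        [2 + a,  20,       0,         -a    ],
        [20,     200 + b,  -2*b,       0    ],
        [0,      -2*b,     10 + 4*b,  -10   ],
        [-a,      0,       -10,       10 + a],
    ])

x = np.array([3.0, -1.0, 0.0, 1.0])            # starting point from the slide
for k in range(50):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-6:               # stopping criterion
        break
    x = x - np.linalg.solve(hess_f(x), g)      # Newton-Raphson update
print(k, x)
```

The iterates approach the minimizer $x^* = (0, 0, 0, 0)^T$; because the Hessian of the Powell function is singular at the solution, convergence there is only linear rather than quadratic.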
Newton’s Method for Curve Fitting

Least Squares Method


• A number of data points $y$ are given, and we are required to fit them with a function $q$.
• Example: $m$ data points collected over a time duration, as shown in the figure, are given.
• We need to fit them with a 3rd-order polynomial $q(t) = at^3 + bt^2 + ct + d$, where $a, b, c, d$ are the unknown coefficients.
• We need to determine $a, b, c, d$.
Newton’s Method for Curve Fitting

• For the unknown vector $x = (a, b, c, d)^T = (x_1, x_2, x_3, x_4)^T$, we can take some initial guess
$$x^0 = \left(x_1^0, x_2^0, x_3^0, x_4^0\right)^T$$

• The function $q(t)$ can then be written in terms of $x$ as
$$p(x) = x_1 t^3 + x_2 t^2 + x_3 t + x_4$$

• Define the error between the actual data and the estimated data as
$$r_i(x) = y_i - p_i(x) = y_i - x_1 t_i^3 - x_2 t_i^2 - x_3 t_i - x_4$$
Newton’s Method for Curve Fitting

• The cost function that has to be minimized can be formulated as

$$f(x) = \sum_{i=1}^{m} r_i(x)^2$$
• The optimization problem is now
$$\text{minimize} \;\; \sum_{i=1}^{m} r_i(x)^2$$
• Define $r = (r_1 \; r_2 \; \cdots \; r_m)^T$, so that $f(x) = r^T r = \sum_{i=1}^{m} r_i(x)^2$.
Newton’s Method for Curve Fitting

• Now the gradient and the Hessian can be found. The $j$-th component of the gradient is
$$\left(\nabla f(x)\right)_j = \frac{\partial f(x)}{\partial x_j} = \sum_{i=1}^{m} 2 r_i \frac{\partial r_i}{\partial x_j}$$
• Defining
$$J(x) = \begin{bmatrix}
\dfrac{\partial r_1}{\partial x_1} & \dfrac{\partial r_1}{\partial x_2} & \cdots & \dfrac{\partial r_1}{\partial x_n} \\
\vdots & & & \vdots \\
\dfrac{\partial r_m}{\partial x_1} & \dfrac{\partial r_m}{\partial x_2} & \cdots & \dfrac{\partial r_m}{\partial x_n}
\end{bmatrix}_{m \times n}$$
This is the Jacobian matrix, and the gradient can be written compactly as $\nabla f(x) = 2 J(x)^T r(x)$.
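As a quick check for the cubic-fit example above (worked here, not shown on the slide), the residual $r_i(x) = y_i - x_1 t_i^3 - x_2 t_i^2 - x_3 t_i - x_4$ gives the entries
$$\frac{\partial r_i}{\partial x_1} = -t_i^3, \qquad \frac{\partial r_i}{\partial x_2} = -t_i^2, \qquad \frac{\partial r_i}{\partial x_3} = -t_i, \qquad \frac{\partial r_i}{\partial x_4} = -1,$$
so each row of $J(x)$ is $\left[-t_i^3 \;\; -t_i^2 \;\; -t_i \;\; -1\right]$.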
Newton’s Method for Curve Fitting

• Hessian:
$$F(x)_{kj} = \frac{\partial}{\partial x_k}\left(\frac{\partial f}{\partial x_j}\right)
= \frac{\partial}{\partial x_k}\sum_{i=1}^{m} 2 r_i \frac{\partial r_i}{\partial x_j}
= 2\sum_{i=1}^{m}\left(\frac{\partial r_i}{\partial x_k}\frac{\partial r_i}{\partial x_j} + r_i \frac{\partial^2 r_i}{\partial x_k \partial x_j}\right)$$
In matrix form,
$$F(x) = 2\left(J^T J + S\right)$$
Newton’s Method for Curve Fitting

• Here, $S = S(x)$ has entries $S_{kj} = \sum_{i=1}^{m} r_i \dfrac{\partial^2 r_i}{\partial x_k \partial x_j}$, and it can be ignored since its contribution becomes negligible when the residuals $r_i$ are small.

• Hence the update equation:
$$x_{k+1} = x_k - \left(J^T J + S\right)^{-1} J^T r$$
OR
$$x_{k+1} = x_k - \left(J^T J\right)^{-1} J^T r \qquad \text{(Gauss-Newton method)}$$
OR
$$x_{k+1} = x_k - \left(J^T J + \mu_k I\right)^{-1} J^T r \qquad \text{(Levenberg-Marquardt algorithm)}$$
Newton’s Method for Curve Fitting: Dimensions

• If there are 100 data points and $x = (x_1, x_2, x_3, x_4)^T$,

• then the dimensions in the update equation will be

$$\underbrace{x_{k+1}}_{4\times 1} = \underbrace{x_k}_{4\times 1} - \underbrace{\left(\underbrace{J^T}_{4\times 100}\,\underbrace{J}_{100\times 4} + \mu_k\,\underbrace{I}_{4\times 4}\right)^{-1}}_{4\times 4}\,\underbrace{J^T}_{4\times 100}\,\underbrace{r}_{100\times 1}$$
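To connect these dimensions to working code, here is a minimal sketch (my own illustration, not from the slides) of the Levenberg-Marquardt update for the cubic-fit problem with a fixed damping parameter `mu`; the data `t`, `y` and the starting point `x0` in the usage example are hypothetical:

```python
import numpy as np

def lm_curve_fit(t, y, x0, mu=1e-3, iters=50, tol=1e-8):
    """Fit p(t) = x1*t^3 + x2*t^2 + x3*t + x4 to data (t, y) using the update
    x <- x - (J^T J + mu*I)^{-1} J^T r. Here mu is held fixed for simplicity;
    a practical Levenberg-Marquardt implementation adapts it per iteration."""
    x = np.asarray(x0, dtype=float)
    # Rows of the Jacobian of r_i = y_i - p_i(x) are [-t^3, -t^2, -t, -1];
    # since the model is linear in x, J is constant.
    J = -np.column_stack([t**3, t**2, t, np.ones_like(t)])
    for _ in range(iters):
        r = y - (x[0]*t**3 + x[1]*t**2 + x[2]*t + x[3])       # residual vector
        step = np.linalg.solve(J.T @ J + mu * np.eye(4), J.T @ r)
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Hypothetical usage: 100 noisy samples from 2t^3 - t^2 + 0.5t + 1.
t = np.linspace(0.0, 2.0, 100)
y = 2*t**3 - t**2 + 0.5*t + 1 + 0.01*np.random.randn(t.size)
print(lm_curve_fit(t, y, x0=[0.0, 0.0, 0.0, 0.0]))
```

Because the cubic model is linear in the coefficients, $J$ is constant and a single step with small $\mu_k$ essentially solves the problem; the iteration form above is kept to mirror the general update on the slides.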
