
Conjugate gradient method

Seminar

Optimization for ML. Faculty of Computer Science. HSE University

Strongly convex quadratics

Consider the following quadratic optimization problem:

    min_{x ∈ R^d} f(x) = min_{x ∈ R^d} (1/2) x^⊤ A x − b^⊤ x + c,  where A ∈ S^d_{++}.

Optimality conditions:

    ∇f(x^*) = A x^* − b = 0 ⇐⇒ A x^* = b

[Figure: side-by-side 2D plots comparing the iterates of Steepest Descent (left) and Conjugate Gradient (right) on the same quadratic problem.]
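
As a quick numerical illustration of the optimality condition above (the dimension, the random A ∈ S^d_{++}, and b are illustrative choices, not from the slides): solve A x^* = b and check that ∇f(x^*) vanishes.

import numpy as np

rng = np.random.default_rng(0)
d = 5

# Random symmetric positive definite matrix A and right-hand side b.
M = rng.standard_normal((d, d))
A = M @ M.T + d * np.eye(d)          # A ∈ S^d_++ by construction
b = rng.standard_normal(d)

# The quadratic f(x) = 1/2 x^T A x - b^T x + c has gradient ∇f(x) = A x - b,
# so its unique minimizer solves the linear system A x* = b.
x_star = np.linalg.solve(A, b)

grad = A @ x_star - b
print("||∇f(x*)|| =", np.linalg.norm(grad))   # ~1e-15, i.e. zero up to round-off
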
Overview of the CG method for the quadratic problem

1) Initialization. k = 0 and x_k = x_0, d_k = d_0 = −∇f(x_0).

2) Optimal Step Length. By a line search procedure we find the optimal step length, i.e. the α_k minimizing f(x_k + α_k d_k):

    α_k = − d_k^⊤(A x_k − b) / (d_k^⊤ A d_k)

3) Algorithm Iteration. Update the position x_k by moving in the direction d_k with step size α_k:

    x_{k+1} = x_k + α_k d_k

4) Direction Update. Update d_{k+1} = −∇f(x_{k+1}) + β_k d_k, where β_k is calculated by the formula:

    β_k = ∇f(x_{k+1})^⊤ A d_k / (d_k^⊤ A d_k).

5) Convergence Loop. Repeat steps 2–4 until n directions are built, where n is the dimension of the space (the dimension of x).

Optimal Step Length

Exact line search:

    α_k = argmin_{α ∈ R_+} f(x_{k+1}) = argmin_{α ∈ R_+} f(x_k + α d_k)

Let's find an analytical expression for the step α_k:

    f(x_k + α d_k) = (1/2)(x_k + α d_k)^⊤ A (x_k + α d_k) − b^⊤(x_k + α d_k) + c
                   = (1/2) α^2 d_k^⊤ A d_k + d_k^⊤(A x_k − b) α + ((1/2) x_k^⊤ A x_k − b^⊤ x_k + c)

We consider A ∈ S^d_{++}, so the point where the derivative of this parabola in α vanishes is its minimum:

    d_k^⊤ A d_k α_k + d_k^⊤(A x_k − b) = 0 ⇐⇒ α_k = − d_k^⊤(A x_k − b) / (d_k^⊤ A d_k)
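
A small numerical check of this closed form (the random SPD A, the random point x_k, and the choice d_k = −∇f(x_k) are illustrative assumptions): compare α_k from the formula with a brute-force minimization over a grid of step sizes.

import numpy as np

rng = np.random.default_rng(1)
d = 4
M = rng.standard_normal((d, d))
A = M @ M.T + d * np.eye(d)           # A ∈ S^d_++
b = rng.standard_normal(d)

f = lambda x: 0.5 * x @ A @ x - b @ x
x_k = rng.standard_normal(d)
d_k = -(A @ x_k - b)                  # steepest-descent direction, taken as an example

# Closed-form exact line search step.
alpha_exact = -d_k @ (A @ x_k - b) / (d_k @ A @ d_k)

# Brute-force check over a dense grid of candidate steps.
grid = np.linspace(0.0, 2.0 * alpha_exact, 10001)
alpha_grid = grid[np.argmin([f(x_k + a * d_k) for a in grid])]

print(alpha_exact, alpha_grid)        # the two values agree up to the grid resolution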

Direction Update

We update the direction in such a way that the next direction is A-orthogonal to the previous one:

    d_{k+1} ⊥_A d_k ⇐⇒ d_{k+1}^⊤ A d_k = 0

Since d_{k+1} = −∇f(x_{k+1}) + β_k d_k, we choose β_k so that A-orthogonality holds:

    d_{k+1}^⊤ A d_k = −∇f(x_{k+1})^⊤ A d_k + β_k d_k^⊤ A d_k = 0 ⇐⇒ β_k = ∇f(x_{k+1})^⊤ A d_k / (d_k^⊤ A d_k)

Lemma 1

All directions constructed by the procedure described above are A-orthogonal to each other:

    d_i^⊤ A d_j = 0, if i ≠ j
    d_i^⊤ A d_j > 0, if i = j
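
A quick numerical check of this choice of β_k (the random SPD A, b, and starting point are illustrative): perform one update and verify that d_1 is A-orthogonal to d_0.

import numpy as np

rng = np.random.default_rng(2)
d = 6
M = rng.standard_normal((d, d))
A = M @ M.T + d * np.eye(d)            # A ∈ S^d_++
b = rng.standard_normal(d)
grad = lambda x: A @ x - b             # ∇f(x) for the quadratic

x0 = rng.standard_normal(d)
d0 = -grad(x0)                         # initial direction

# One CG step: exact line search, position update, direction update.
alpha0 = -d0 @ grad(x0) / (d0 @ A @ d0)
x1 = x0 + alpha0 * d0
beta0 = grad(x1) @ A @ d0 / (d0 @ A @ d0)
d1 = -grad(x1) + beta0 * d0

print("d1^T A d0 =", d1 @ A @ d0)      # ~0 up to round-off: A-orthogonal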

A-orthogonality

[Figure: left panel shows vectors v_1, v_2 that are orthogonal (v_1^⊤ v_2 = 0.00) but not A-orthogonal (v_1^⊤ A v_2 = 1.19); right panel shows a pair of vectors that are A-orthogonal (ṽ_1^⊤ A ṽ_2 = 0.00) but not orthogonal (ṽ_1^⊤ ṽ_2 = 0.80).]

Convergence of the CG method

Lemma 2

Suppose we solve an n-dimensional convex quadratic optimization problem. The conjugate directions method

    x_{k+1} = x_0 + Σ_{i=0}^{k} α_i d_i,

where α_i = − d_i^⊤(A x_i − b) / (d_i^⊤ A d_i) is taken from the line search, converges in at most n steps of the algorithm.
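
A minimal illustration of Lemma 2 (the dimension, the random data, and the Gram–Schmidt construction of the A-orthogonal directions are illustrative assumptions): build n mutually A-orthogonal directions, apply the update above with the α_i from the formula, and check that x_n coincides with A^{-1} b.

import numpy as np

rng = np.random.default_rng(3)
n = 8
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # A ∈ S^n_++
b = rng.standard_normal(n)

# Build n mutually A-orthogonal directions by Gram-Schmidt in the A-inner product.
dirs = []
for v in rng.standard_normal((n, n)):
    for d_j in dirs:
        v = v - (v @ A @ d_j) / (d_j @ A @ d_j) * d_j
    dirs.append(v)

# Conjugate directions method: x_{k+1} = x_k + alpha_k d_k with the line-search step.
x = rng.standard_normal(n)             # x_0
for d_i in dirs:
    alpha_i = -d_i @ (A @ x - b) / (d_i @ A @ d_i)
    x = x + alpha_i * d_i

print(np.linalg.norm(x - np.linalg.solve(A, b)))   # ~0: solved in n steps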

CG method in practice

In practice, the following formulas are usually used for the step α_k and the coefficient β_k:

    α_k = r_k^⊤ r_k / (d_k^⊤ A d_k),    β_k = r_{k+1}^⊤ r_{k+1} / (r_k^⊤ r_k),

where r_k = b − A x_k. Since x_{k+1} = x_k + α_k d_k, we have r_{k+1} = r_k − α_k A d_k. Also, r_i^⊤ r_k = 0 for all i ≠ k (Lemma 5 from the lecture).

Let's derive the expression for β_k:

    β_k = ∇f(x_{k+1})^⊤ A d_k / (d_k^⊤ A d_k) = − r_{k+1}^⊤ A d_k / (d_k^⊤ A d_k)

Numerator: r_{k+1}^⊤ A d_k = (1/α_k) r_{k+1}^⊤ (r_k − r_{k+1}) = [r_{k+1}^⊤ r_k = 0] = −(1/α_k) r_{k+1}^⊤ r_{k+1}

Denominator: d_k^⊤ A d_k = (r_k + β_{k−1} d_{k−1})^⊤ A d_k = [d_{k−1}^⊤ A d_k = 0] = (1/α_k) r_k^⊤ (r_k − r_{k+1}) = (1/α_k) r_k^⊤ r_k

Combining the two, β_k = ((1/α_k) r_{k+1}^⊤ r_{k+1}) / ((1/α_k) r_k^⊤ r_k) = r_{k+1}^⊤ r_{k+1} / (r_k^⊤ r_k), which is the practical formula above.

Question

Why is this modification better than the standard version?
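
A quick check that the residual-based formula for β_k agrees with the original one (random illustrative data, one CG iteration starting from d_0 = r_0):

import numpy as np

rng = np.random.default_rng(4)
n = 7
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # A ∈ S^n_++
b = rng.standard_normal(n)

x = rng.standard_normal(n)
r = b - A @ x                          # r_k = b - A x_k = -∇f(x_k)
d = r.copy()                           # d_0 = r_0

# One CG iteration with the residual-based formulas.
Ad = A @ d
alpha = (r @ r) / (d @ Ad)
x_next = x + alpha * d
r_next = r - alpha * Ad

beta_residual = (r_next @ r_next) / (r @ r)
beta_standard = ((A @ x_next - b) @ Ad) / (d @ Ad)   # ∇f(x_{k+1})^T A d_k / d_k^T A d_k

print(beta_residual, beta_standard)    # the two values coincide up to round-off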

CG method in practice. Pseudocode

r_0 := b − A x_0
if r_0 is sufficiently small, then return x_0 as the result
d_0 := r_0
k := 0
repeat
    α_k := (r_k^⊤ r_k) / (d_k^⊤ A d_k)
    x_{k+1} := x_k + α_k d_k
    r_{k+1} := r_k − α_k A d_k
    if r_{k+1} is sufficiently small, then exit loop
    β_k := (r_{k+1}^⊤ r_{k+1}) / (r_k^⊤ r_k)
    d_{k+1} := r_{k+1} + β_k d_k
    k := k + 1
end repeat
return x_{k+1} as the result
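
A runnable NumPy version of the pseudocode above (a minimal sketch; the tolerance, the iteration cap, and the random test system are illustrative choices):

import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A, following the pseudocode above."""
    x = np.array(x0, dtype=float)
    r = b - A @ x                          # r_0 := b - A x_0
    if np.linalg.norm(r) < tol:            # r_0 sufficiently small
        return x
    d = r.copy()                           # d_0 := r_0
    for _ in range(max_iter or len(b)):
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)         # α_k := r_k^T r_k / d_k^T A d_k
        x = x + alpha * d                  # x_{k+1} := x_k + α_k d_k
        r_new = r - alpha * Ad             # r_{k+1} := r_k - α_k A d_k
        if np.linalg.norm(r_new) < tol:    # r_{k+1} sufficiently small
            break
        beta = (r_new @ r_new) / (r @ r)   # β_k := r_{k+1}^T r_{k+1} / r_k^T r_k
        d = r_new + beta * d               # d_{k+1} := r_{k+1} + β_k d_k
        r = r_new
    return x

# Quick test on a random SPD system (illustrative data).
rng = np.random.default_rng(5)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)
x = conjugate_gradient(A, b, np.zeros(n))
print(np.linalg.norm(A @ x - b))           # ~0
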
Exercise 1

Write the iterations of the conjugate gradient method for the quadratic problem

    f(x) = (1/2) x^⊤ A x − b^⊤ x → min_{x ∈ R^n}

and run experiments for several matrices A. See code here.

Non-linear conjugate gradient method

If we do not have an analytic expression for the function or its gradient, we will most likely not be able to solve the one-dimensional minimization problem analytically. Therefore, step 2 of the algorithm is replaced by the usual line search procedure. For step 4, however, there is the following mathematical trick.

For two consecutive iterations it holds that

    x_{k+1} − x_k = c d_k,

where c is some constant. Then, for the quadratic case, we have:

    ∇f(x_{k+1}) − ∇f(x_k) = (A x_{k+1} − b) − (A x_k − b) = A(x_{k+1} − x_k) = c A d_k

Expressing the product A d_k = (1/c)(∇f(x_{k+1}) − ∇f(x_k)) from this equation, we get rid of the "knowledge" of the function in the definition of β_k, and step 4 is rewritten as:

    β_k = ∇f(x_{k+1})^⊤(∇f(x_{k+1}) − ∇f(x_k)) / (d_k^⊤(∇f(x_{k+1}) − ∇f(x_k))).

This method is called the Polak–Ribière method.
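
A minimal sketch of the resulting non-linear CG loop (the quartic test function, the Armijo backtracking line search, and the descent-restart safeguard are illustrative assumptions added here, not prescribed by the slides):

import numpy as np

# Illustrative smooth convex test problem (not from the slides):
# f(x) = 1/4 * sum(x_i^4) + 1/2 x^T A x - b^T x
rng = np.random.default_rng(6)
n = 10
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)
b = rng.standard_normal(n)
f = lambda x: 0.25 * np.sum(x**4) + 0.5 * x @ A @ x - b @ x
grad = lambda x: x**3 + A @ x - b

def backtracking(x, d, g, alpha=1.0, rho=0.5, c1=1e-4):
    """Simple Armijo backtracking line search (replaces the exact line search of step 2)."""
    while f(x + alpha * d) > f(x) + c1 * alpha * (g @ d):
        alpha *= rho
    return alpha

x = np.zeros(n)
g = grad(x)
d = -g
for k in range(200):
    alpha = backtracking(x, d, g)
    x_new = x + alpha * d
    g_new = grad(x_new)
    if np.linalg.norm(g_new) < 1e-8:
        x = x_new
        break
    y = g_new - g                      # ∇f(x_{k+1}) - ∇f(x_k)
    beta = (g_new @ y) / (d @ y)       # the β_k formula from the slide
    d = -g_new + beta * d
    if g_new @ d >= 0:                 # safeguard (added here): restart if not a descent direction
        d = -g_new
    x, g = x_new, g_new

print("||∇f(x)|| =", np.linalg.norm(grad(x)))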


Exercise 2

Write the iterations of the Polak–Ribière method and run experiments for several µ in binary logistic regression:

    f(x) = (µ/2) ‖x‖_2^2 + (1/m) Σ_{i=1}^{m} log(1 + exp(−y_i ⟨a_i, x⟩)) → min_{x ∈ R^n}

See code here.

A pathological example

Let t ∈ (0, 1) and let W be the n × n tridiagonal matrix with W_{11} = t, W_{ii} = 1 + t for i ≥ 2, and W_{i,i+1} = W_{i+1,i} = √t, and let b = (1, 0, …, 0)^⊤.

Since W is invertible, there exists a unique solution to W x = b. Solving it by the conjugate gradient method gives rather bad convergence: during the CG process, the error grows exponentially (!), until it suddenly becomes zero as the unique solution is found.

The residual ‖W x_k − b‖_2 grows exponentially, roughly as (1/t)^k, until the n-th iteration, after which it drops sharply to zero.

See the experiment here.
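
A sketch reproducing this behaviour (the values n = 20 and t = 0.5 and the plain NumPy CG loop are illustrative; W is built as described above):

import numpy as np

def pathological_system(n, t):
    """Tridiagonal W with W[0,0] = t, W[i,i] = 1 + t for i >= 1, off-diagonals sqrt(t); b = e_1."""
    W = (np.diag([t] + [1 + t] * (n - 1))
         + np.diag([np.sqrt(t)] * (n - 1), 1)
         + np.diag([np.sqrt(t)] * (n - 1), -1))
    b = np.zeros(n)
    b[0] = 1.0
    return W, b

n, t = 20, 0.5
W, b = pathological_system(n, t)

# Plain CG on W x = b, tracking the residual norm at every iteration.
x = np.zeros(n)
r = b - W @ x
d = r.copy()
residuals = [np.linalg.norm(r)]
for _ in range(n):
    Wd = W @ d
    alpha = (r @ r) / (d @ Wd)
    x = x + alpha * d
    r_new = r - alpha * Wd
    beta = (r_new @ r_new) / (r @ r)
    d = r_new + beta * d
    r = r_new
    residuals.append(np.linalg.norm(r))

# The residual grows roughly like (1/t)^k and collapses only around iteration n.
print(np.round(residuals, 3))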

More computational experiments

Let's look at some more examples here. The code is taken from the linked source.
