
DELFT UNIVERSITY OF TECHNOLOGY

Faculty of Electrical Engineering, Mathematics and Computer Science

EXERCISE SHEET 1 – WI4635 LINEAR ALGEBRA AND OPTIMIZATION FOR MACHINE LEARNING

Exercise 1
Consider the matrix
$$
A = \begin{pmatrix}
-1.0 & 1.0 & 0.0 & \cdots & 0.0 \\
0.0 & -1.0 & 1.0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & 0.0 \\
0.0 & \cdots & 0.0 & -1.0 & 1.0
\end{pmatrix} \in \mathbb{R}^{n \times (n+1)}.
$$

(a) Depending on n, how many numbers (integers and floats) are required to store the matrix in dense format and in the

• dictionary of keys (DOK),
• compressed sparse row (CSR), and
• compressed sparse column (CSC)

sparse matrix formats? Also, write down the matrix representations of A in the three formats for the case n = 4.

(b) Determine, depending on n, which of the matrix formats from (a) is the most efficient.

(c) Assume a sparse matrix with unknown distribution of the nonzero entries. Discuss the efficiency of the operation
$$
\sum \operatorname{diag}(A),
$$
that is, summing the diagonal entries of A.

Hint:

(c) Remember that direct access to single entries of sparse matrices is expensive.
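Illustration (not part of the exercise): the three sparse formats can be inspected with SciPy. The sketch below builds the matrix A from Exercise 1 for n = 4 and converts it to DOK, CSR, and CSC; the variable names are my own.

```python
import numpy as np
from scipy import sparse

n = 4
# Bidiagonal A in R^{n x (n+1)}: -1.0 on the main diagonal, +1.0 on the superdiagonal.
A = sparse.diags([-np.ones(n), np.ones(n)], offsets=[0, 1], shape=(n, n + 1))

A_dok = A.todok()   # dictionary of keys: {(row, col): value}
A_csr = A.tocsr()   # compressed sparse row: data, indices (column ids), indptr (row pointers)
A_csc = A.tocsc()   # compressed sparse column: data, indices (row ids), indptr (column pointers)

print(dict(A_dok.items()))
print(A_csr.data, A_csr.indices, A_csr.indptr)
print(A_csc.data, A_csc.indices, A_csc.indptr)
```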

Exercise 2
Show formally:

(a) Every column of x ⊗ y is a multiple of x and every row is a multiple of y.

(b) x ⊗ y is a rank 1 matrix ⇔ x, y ̸= 0.
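Illustration (my own sketch, not part of the sheet): with NumPy, x ⊗ y corresponds to np.outer(x, y), and its rank can be checked numerically.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0])

M = np.outer(x, y)                   # M[i, j] = x[i] * y[j]
print(M)                             # each column is a multiple of x, each row a multiple of y
print(np.linalg.matrix_rank(M))      # 1, since x != 0 and y != 0
```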

Exercise 3
Verify that the definitions
$$
\|A\|_F := \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} |a_{ij}|^2}
$$
and
$$
\|A\|_F^2 = \operatorname{tr}\left(A^\top A\right)
$$

are equivalent.
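Illustrative sanity check (not required by the exercise): both definitions can be compared numerically for a random matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

fro_entrywise = np.sqrt(np.sum(np.abs(A) ** 2))   # sqrt(sum_ij |a_ij|^2)
fro_trace = np.sqrt(np.trace(A.T @ A))            # sqrt(tr(A^T A))

print(fro_entrywise, fro_trace)                   # agree up to rounding
```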

Exercise 4
Let A be diagonalizable with
A = V D V^{-1}.    (1)
(a) Discuss the order of the number of operations (additions & multiplications) necessary for computing
$$
A^k
$$
directly or efficiently, making use of eq. (1) (as discussed in the lecture/lecture notes). Assume that A and V are both dense, while D is – of course – sparse.

(b) If, instead of computing A^k, we are interested in computing
$$
A^k v
$$
for some vector v, how does the situation change?

(c) What would be the impact of A being sparse, but V being dense?

Hint: As long as the matrix is dense, the number of operations can simply be counted.
(c) For the discussion, remember that multiplication with a sparse matrix only depends on the number of nonzero entries.
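Sketch (assuming NumPy and a symmetric test matrix, so that it is certainly diagonalizable): the two ways of forming A^k and the matrix-vector variant from (b); the operation counts themselves are what the exercise asks for.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                    # symmetric, hence diagonalizable with real eigenvalues
k = 10

# Direct way: k - 1 dense matrix-matrix products.
Ak_direct = np.linalg.matrix_power(A, k)

# Via A = V D V^{-1}: only the diagonal entries of D are raised to the k-th power.
eigvals, V = np.linalg.eig(A)
Ak_eig = V @ np.diag(eigvals ** k) @ np.linalg.inv(V)

# For A^k v, k matrix-vector products suffice; A^k is never formed explicitly.
v = rng.standard_normal(5)
w = v.copy()
for _ in range(k):
    w = A @ w

print(np.allclose(Ak_direct, Ak_eig), np.allclose(Ak_direct @ v, w))
```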

Exercise 5
Let A ∈ Rn×n be symmetric positive definite. Show:
(a) The spectral norm of A corresponds to its largest eigenvalue, that is,

∥A∥2 = λmax .

(operator norm induced by ∥·∥2 vector norm)


(b) The spectral condition number, that is,
$$
\kappa(A) := \|A\|_2 \, \|A^{-1}\|_2,
$$
is given by
$$
\kappa(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}
$$
with λmax(A) and λmin(A) the maximum and minimum eigenvalue of A, respectively.

Hint:

(a) First, show that
$$
\|A\|_2^2 = \max_{x \neq 0} \frac{\|Ax\|_2^2}{\|x\|_2^2} = \max_{x \neq 0} \frac{\sum_i \lambda_i^2 a_i^2}{\sum_i a_i^2}
$$
with a_i from
$$
x = \sum_{i=1}^{n} a_i v_i,
$$
where (λi, vi) are the eigenpairs of A; here, the vi have to be chosen in an appropriate way. Then, show that
$$
\max_{x \neq 0} \frac{\sum_i \lambda_i^2 a_i^2}{\sum_i a_i^2} = \lambda_{\max}^2.
$$
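Numerical companion (illustration only; the exercise asks for a proof): both statements of Exercise 5 can be checked for a randomly generated symmetric positive definite matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
A = B.T @ B + 6 * np.eye(6)          # symmetric positive definite by construction

eigs = np.linalg.eigvalsh(A)         # real eigenvalues, sorted ascending

print(np.isclose(np.linalg.norm(A, 2), eigs[-1]))            # ||A||_2 == lambda_max
print(np.isclose(np.linalg.cond(A, 2), eigs[-1] / eigs[0]))  # kappa(A) == lambda_max / lambda_min
```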

Exercise 6

(a) Show that
$$
P_w v = \frac{(v, w)}{(w, w)}\, w
$$
is an orthogonal projection.

(b) Argue why the classical Gram–Schmidt orthogonalization algorithm may experience a loss of orthogonality.

(c) Show that Givens rotations are orthogonal and compute their spectral condition number.

Hint:

(c) The definition of the spectral condition number is given in Exercise 5.
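Illustration of item (b) (a sketch under the assumption of an ill-conditioned test matrix, here the Hilbert matrix): classical Gram–Schmidt compared against NumPy's Householder-based QR.

```python
import numpy as np
from scipy.linalg import hilbert

def classical_gram_schmidt(A):
    """Orthonormalize the columns of A with classical Gram-Schmidt."""
    Q = np.zeros_like(A)
    for j in range(A.shape[1]):
        q = A[:, j].copy()
        for i in range(j):
            # Projection coefficients are computed from the ORIGINAL column A[:, j].
            q -= (Q[:, i] @ A[:, j]) * Q[:, i]
        Q[:, j] = q / np.linalg.norm(q)
    return Q

A = hilbert(10)                          # notoriously ill-conditioned
Q_cgs = classical_gram_schmidt(A)
Q_hh, _ = np.linalg.qr(A)                # Householder QR, numerically stable

I = np.eye(10)
print(np.linalg.norm(Q_cgs.T @ Q_cgs - I))   # far from zero: loss of orthogonality
print(np.linalg.norm(Q_hh.T @ Q_hh - I))     # close to machine precision
```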

Exercise 7
Let A ∈ Rn×n be symmetric. Show that

λmin (A) ≤ R (A, v) ≤ λmax (A)

for all 0 ̸= v ∈ Rn . Here, R (A, v) is the Rayleigh quotient.


Hint: Similar to Exercise 5, write v as a linear combination of the eigenvectors.

Exercise 8
Show that the summation of two vectors

S = x1 + x2

is backward stable.
Hint: Let the numerical algorithm of adding two floating-point numbers give you S̃ with
$$
\tilde{S}_i = \left((x_1)_i + (x_2)_i\right)(1 + \varepsilon_i) \quad \text{with } |\varepsilon_i| \leq \varepsilon.
$$
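Numerical illustration (my own sketch; it visualizes the componentwise rounding model from the hint, not the backward-stability argument itself): adding two float32 vectors and comparing against a float64 reference.

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.standard_normal(5).astype(np.float32)
x2 = rng.standard_normal(5).astype(np.float32)

S_tilde = x1 + x2                                         # computed in float32
S_exact = x1.astype(np.float64) + x2.astype(np.float64)   # exact sum of the float32 inputs

rel_err = np.abs(S_tilde.astype(np.float64) - S_exact) / np.abs(S_exact)
print(rel_err)                        # componentwise |eps_i|
print(np.finfo(np.float32).eps)       # each entry above stays below this bound
```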

Exercise 9
Show that the left and right eigenvalues of a matrix A ∈ Rn×n are the same.
Hint: Make use of
$$
\det\left(A^\top\right) = \det(A).
$$
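Numerical companion (a sketch, not a proof): the left eigenvalues of A are the eigenvalues of A^⊤, so both spectra can be compared directly.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))

eig_right = np.sort_complex(np.linalg.eigvals(A))    # right eigenvalues: A v = lambda v
eig_left = np.sort_complex(np.linalg.eigvals(A.T))   # left eigenvalues:  w^T A = lambda w^T

print(np.allclose(eig_right, eig_left))
```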


Exercise 10
Let A ∈ Rn×n with
$$
A = LDL^\top,
$$
where
$$
L = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
l_{21} & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
l_{n1} & \cdots & l_{n,(n-1)} & 1
\end{pmatrix}
$$
is a lower triangular matrix and
$$
D = \begin{pmatrix}
d_{11} & 0 & \cdots & 0 \\
0 & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & d_{nn}
\end{pmatrix}
$$

is a diagonal matrix. Show the following properties of A:


(a) A is symmetric.

(b) A is nonsingular if and only if all diagonal entries of D are nonzero.

(c) A is positive definite or negative definite if and only if all diagonal entries of D are
positive or negative, respectively.
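Numerical companion (illustration only): SciPy's ldl factorization exposes the diagonal of D. Note that ldl uses Bunch–Kaufman pivoting, so D may contain 2×2 blocks in general; for the small examples below it stays diagonal.

```python
import numpy as np
from scipy.linalg import ldl

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))

A_spd = B @ B.T + 4 * np.eye(4)              # symmetric positive definite
A_indef = np.diag([2.0, -1.0, 3.0, -5.0])    # symmetric but indefinite

for A in (A_spd, A_indef):
    L, D, perm = ldl(A)                      # A = L @ D @ L.T with a (permuted) unit lower-triangular L
    print(np.diag(D), "all positive:", bool(np.all(np.diag(D) > 0)))
```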

Exercise 11
Consider the gradient descent method with a varying learning rate αk :
Algorithm 1: Gradient descent method
Data: Initial guess x^(0) ∈ Rn, learning rate α ∈ R+, and tolerance TOL > 0
r^(0) := b − A x^(0)
while ∥r^(k)∥ ≥ TOL · ∥r^(0)∥ do
    Compute αk
    x^(k+1) := x^(k) + αk r^(k)
    r^(k+1) := b − A x^(k+1)
end
Result: Approximate solution of Ax = b
Assume that A is symmetric positive definite. Compute a formula for αk (the step "Compute αk" in Algorithm 1) such that, in each step,
$$
\left\| e^{(k+1)} \right\|_A^2
$$
is minimized, where
$$
\|v\|_A^2 := v^\top A v,
$$
e^(k+1) = x⋆ − x^(k+1), and x⋆ is the solution of
$$
Ax = b.
$$

Hint: Show that
$$
\left\| e^{(k+1)} \right\|_A^2 = \left\| e^{(k)} \right\|_A^2 - 2\alpha_k \left\| r^{(k)} \right\|_2^2 + \alpha_k^2 \left\| r^{(k)} \right\|_A^2 =: f(\alpha_k),
$$
and then compute the global minimum of f : R → R with respect to αk.
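Minimal sketch of Algorithm 1 (my own illustration; the fixed learning rate is a placeholder for the αk the exercise asks for):

```python
import numpy as np

def gradient_descent(A, b, x0, alpha, tol=1e-8, max_iter=10_000):
    """Algorithm 1 with a fixed learning rate alpha as a placeholder for alpha_k."""
    x = x0.copy()
    r = b - A @ x
    r0_norm = np.linalg.norm(r)
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol * r0_norm:
            break
        # Exercise 11: replace the fixed alpha by the alpha_k that minimizes ||e^(k+1)||_A^2.
        x = x + alpha * r
        r = b - A @ x
    return x

# Small SPD test problem (illustrative values only).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = gradient_descent(A, b, x0=np.zeros(2), alpha=0.2)
print(x, np.linalg.solve(A, b))
```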
