
DELFT UNIVERSITY OF TECHNOLOGY

Faculty of Electrical Engineering, Mathematics and Computer Science

EXERCISE SHEET 8 – WI4635 LINEAR ALGEBRA AND OPTIMIZATION FOR MACHINE LEARNING

Exercise 1
Show that the single-layer perceptron model

p (x) = a⊤ σ (bx + c) (1)

with σ(x) = max{0, x} and a, b, c ∈ Rn for some n ∈ N can represent any continuous
piecewise linear function. You may proceed as follows:

(a) Let 0 = x0 < . . . < xN = 1 be a set of grid points, and let C(I) and P1(I) be the spaces
of continuous and linear functions on the interval I, respectively. Show that any function

f ∈ C([0, 1]) with f ∈ P1([xi, xi+1]) ∀i ∈ {0, . . . , N − 1}

can be written as a linear combination of the hat functions (Figure 1) defined as

    p_i(x) = \begin{cases}
        0, & x \le x_{i-1}, \\
        \frac{x - x_{i-1}}{x_i - x_{i-1}}, & x \in [x_{i-1}, x_i], \\
        \frac{x - x_{i+1}}{x_i - x_{i+1}}, & x \in [x_i, x_{i+1}], \\
        0, & x \ge x_{i+1}.
    \end{cases}

Figure 1: Hat function p_i (piecewise linear, equal to 1 at x_i and 0 at x_{i−1} and x_{i+1}).

(b) Construct the hat function pi using functions of the form eq. (1).

Exercise 2
Show that a single-layer perceptron is an over-parametrized model. That is, consider models
of the form
p (x) = a⊤ σ (bx + c) (2)

with x ∈ R, σ(y) = max{0, y}, and a, b, c ∈ Rn; we concatenate the parameter vectors as
follows:

    \begin{pmatrix} a \\ b \\ c \end{pmatrix} \in \mathbb{R}^{3n}.
In the lecture, we have already noted that a simultaneous permutation of the entries of the
parameter vectors a, b, and c leads to the same model. In this exercise, construct a subspace (dimension > 0) of the space
of concatenated parameter vectors, R3n , that yields the model

p(x) = σ(x)

using the representation eq. (2); to simplify the discussion, you may choose any specific n.

Exercise 3
Consider the following theorem from the book Linear Algebra and Learning from Data by
Gilbert Strang (Wellesley-Cambridge Press, 2019):

Theorem 1 For v ∈ Rm, suppose the graph of F(v) has folds along N hyperplanes H1, . . . , HN.
Those come from N linear equations a_i^⊤ v + b_i = 0, in other words from ReLU at N neurons.
Then the number of linear pieces of F and regions bounded by the N hyperplanes is r(N, m):

    r(N, m) = \sum_{i=0}^{m} \binom{N}{i} = \binom{N}{0} + \dots + \binom{N}{m}.

The binomial coefficients are

    \binom{N}{i} = \frac{N!}{i!\,(N - i)!}

with 0! = 1, \binom{N}{0} = 1, and \binom{N}{i} = 0 for i > N.

(a) Prove the recursion formula

r(N, m) = r(N − 1, m) + r(N − 1, m − 1).

(b) Using the theorem, compute the numbers r(N, m) for one-, two-, and three-dimensional
input vectors.
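
As a quick numerical check (not part of the proof), here is a minimal Python sketch; the helper name r is an arbitrary choice, and math.comb already returns 0 for i > N:

    from math import comb

    def r(N: int, m: int) -> int:
        # r(N, m) = sum_{i=0}^{m} binomial(N, i); comb(N, i) = 0 for i > N.
        return sum(comb(N, i) for i in range(m + 1))

    # Check the recursion from part (a) for a few values.
    for N in range(1, 9):
        for m in range(1, 4):
            assert r(N, m) == r(N - 1, m) + r(N - 1, m - 1)

    # Numbers of linear pieces for one-, two-, and three-dimensional inputs.
    for m in (1, 2, 3):
        print(f"m = {m}:", [r(N, m) for N in range(1, 9)])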

Exercise 4

(a) Using Python, verify numerically that


    \frac{1}{N} \sum_{i=0}^{N} f(x_i) \;\longrightarrow\; \int_0^1 f(x)\,dx

for N → ∞, where the xi are sampled from a uniform distribution U(0, 1). Choose a
fourth-order polynomial and compare the numerical approximation with the exact value of
the integral.
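
A minimal sketch of part (a), assuming NumPy; the particular fourth-order polynomial below is an arbitrary choice:

    import numpy as np

    rng = np.random.default_rng(0)

    def f(x):
        # An arbitrary fourth-order polynomial.
        return x**4 - 2 * x**2 + 3 * x

    exact = 1 / 5 - 2 / 3 + 3 / 2  # integral of f over [0, 1], computed by hand

    for N in (10, 100, 1_000, 10_000, 100_000):
        x = rng.uniform(0.0, 1.0, size=N)
        estimate = f(x).mean()
        print(f"N = {N:6d}, estimate = {estimate:.6f}, error = {abs(estimate - exact):.2e}")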
(b) Use the same approach to compute an approximation of π by approximating
    \int_{[0,1]^2} \chi_C(x) \, dx,

where

    \chi_C(x) = \begin{cases} 1, & \text{for } \|x\|_2 \le 1, \\ 0, & \text{otherwise}. \end{cases}
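
For part (b), the same Monte Carlo idea applies: the integral of χ_C over [0, 1]^2 equals π/4, so four times the fraction of samples falling into the quarter disk approximates π. A minimal sketch, again assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 1_000_000
    x = rng.uniform(0.0, 1.0, size=(N, 2))      # samples in [0, 1]^2
    inside = np.linalg.norm(x, axis=1) <= 1.0   # indicator chi_C(x)
    print("pi ≈", 4.0 * inside.mean())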

Exercise 5
Prove a sub-optimal approximation result, which uses a stronger assumption on f than just
continuity:

(a) Show that, for f ∈ C^2(Ii) and the linear interpolant p on Ii = [xi, xi+1] with

p(xi ) = f (xi ) and p(xi+1 ) = f (xi+1 ),

there exists some ξ ∈ Ii such that

    \max_{x \in I_i} |f(x) - p(x)| \le \frac{h_i^2}{8} \, |f''(\xi)|,
with hi = xi+1 − xi .

(b) Let f ∈ C^2([0, 1]) be arbitrary and σ(x) = max{0, x}. Prove that, for every ε > 0,
there exists some n ∈ N and a function

P (x) = a⊤ σ (bx + c) ,

with a, b, c ∈ Rn , such that


\max_{x \in [0,1]} |f(x) - P(x)| < \varepsilon.

Hint:

(a) Construct the function

F (x) = f (x) − p(x) + K(x − xi )(x − xi+1 )

with K ∈ R, and choose K such that F(x̄) = 0 for an arbitrary x̄ ∈ (xi, xi+1). Then, apply Rolle’s theorem repeatedly.

Exercise 6
Implement a multi-layer perceptron model with two hidden layers, sigmoid activation func-
tion, and the following architecture using Python:

• input dimension: 2

• dimension of each hidden layer: 4

• output dimension: 1

You may use any deep learning library to implement the model or implement it yourself.

In order to initialize the weights, compare the following two strategies discussed in the
lecture:
 
• Uniform distribution: U(−1/√n_i, 1/√n_i),

• Glorot and Bengio: U(−√(6/(n_i + n_{i+1})), √(6/(n_i + n_{i+1}))).

Initialize the bias vectors with zero. Without training the network, perform the following
tasks:

(a) Visualize the output of the neural networks after initialization for five different random
seeds.

(b) Visualize the value of the loss function depending on the diagonal entries of the weight
matrix in the first hidden layer. To this end, use the loss functions

    Mean squared error (MSE):  L_{\mathrm{MSE}}\big(NN^{\sigma}_{W,b}(x_i), y_i\big) = \frac{1}{N} \sum_{i=1}^{N} \big\| NN^{\sigma}_{W,b}(x_i) - y_i \big\|_2^2,

    Mean absolute error (MAE):  L_{\mathrm{MAE}}\big(NN^{\sigma}_{W,b}(x_i), y_i\big) = \frac{1}{N} \sum_{i=1}^{N} \big\| NN^{\sigma}_{W,b}(x_i) - y_i \big\|_1.

In order to compute the input and output data, take 100 points randomly sampled from
{x ∈ R^2 | x_1^2 + x_2^2 ≤ 1} as input and the squared norm as corresponding output:

    y = x_1^2 + x_2^2.
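
A minimal NumPy sketch of the untrained 2-4-4-1 network and the two initializations (the helper names init_uniform, init_glorot, and mlp are illustrative choices; keeping the output layer linear is an assumption, and any deep learning library would do equally well):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def init_uniform(n_in, n_out, rng):
        # U(-1/sqrt(n_i), 1/sqrt(n_i)), with n_i the input dimension of the layer.
        limit = 1.0 / np.sqrt(n_in)
        return rng.uniform(-limit, limit, size=(n_out, n_in))

    def init_glorot(n_in, n_out, rng):
        # Glorot and Bengio: U(-sqrt(6/(n_i + n_{i+1})), sqrt(6/(n_i + n_{i+1}))).
        limit = np.sqrt(6.0 / (n_in + n_out))
        return rng.uniform(-limit, limit, size=(n_out, n_in))

    def mlp(x, weights, biases):
        # Sigmoid activations in the hidden layers, linear output layer.
        a = x
        for W, b in zip(weights[:-1], biases[:-1]):
            a = sigmoid(a @ W.T + b)
        return a @ weights[-1].T + biases[-1]

    layers = [2, 4, 4, 1]                    # input, two hidden layers, output
    rng = np.random.default_rng(0)

    # 100 random points in the unit disk as input, squared norm as output.
    angle = rng.uniform(0.0, 2.0 * np.pi, 100)
    radius = np.sqrt(rng.uniform(0.0, 1.0, 100))
    X = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)
    y = (X**2).sum(axis=1, keepdims=True)

    for name, init in (("uniform", init_uniform), ("Glorot", init_glorot)):
        weights = [init(m, n, rng) for m, n in zip(layers[:-1], layers[1:])]
        biases = [np.zeros(n) for n in layers[1:]]
        pred = mlp(X, weights, biases)
        print(name, "MSE after initialization:", np.mean((pred - y) ** 2))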

Exercise 7
For the two functions

(a)
f (x, y, θ, r) = x + iy + r(cos(θ) + i sin(θ))

(b) A multi-layer perceptron model with two hidden layers

• input dimension: 4
• dimension of each hidden layer: 2
• output dimension: 1

visualize the computational graph. Then, perform forward and backward propagation for
the input  
    \begin{pmatrix} 1 \\ 1 \\ \pi/2 \\ 2 \end{pmatrix}.
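
A minimal sketch of the forward and backward pass for the function in (a), assuming PyTorch; the real and imaginary parts are treated as two separate outputs, and summing them before calling backward() is just one convenient choice of scalar for illustrating backpropagation:

    from math import pi
    import torch

    x = torch.tensor(1.0, requires_grad=True)
    y = torch.tensor(1.0, requires_grad=True)
    theta = torch.tensor(pi / 2, requires_grad=True)
    r = torch.tensor(2.0, requires_grad=True)

    # Forward propagation: f = (x + r*cos(theta)) + i*(y + r*sin(theta)).
    real = x + r * torch.cos(theta)
    imag = y + r * torch.sin(theta)

    # Backward propagation through the scalar real + imag.
    (real + imag).backward()
    print("forward: ", real.item(), imag.item())
    print("backward:", x.grad.item(), y.grad.item(), theta.grad.item(), r.grad.item())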

Exercise 8

(a) Show that the discrete convolution operation is linear.


(b) Let

    D = \begin{pmatrix}
        d_{11} & d_{12} & d_{13} & d_{14} \\
        d_{21} & d_{22} & d_{23} & d_{24} \\
        d_{31} & d_{32} & d_{33} & d_{34} \\
        d_{41} & d_{42} & d_{43} & d_{44}
    \end{pmatrix}
    \quad \text{and} \quad
    d = \begin{pmatrix} d_{11} \\ \vdots \\ d_{14} \\ d_{21} \\ \vdots \\ d_{44} \end{pmatrix}
be the matrix and vector representations of the same data. Derive the sparse matrix A
corresponding to the discrete convolution with the kernel
 
    K = \begin{pmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{pmatrix},
such that
Ad and D∗K
are equivalent. How sparse is the matrix? How does the number of nonzeros relate to
the size of the kernel? How would the sparsity change if D ∈ R^{100×4} instead?
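
A minimal sketch for checking the construction numerically, assuming NumPy/SciPy, row-by-row (row-major) vectorization as in the vector d above, and zero padding at the boundary ("same" convolution); since K is symmetric under a 180° rotation, convolution and correlation coincide here:

    import numpy as np
    from scipy.signal import convolve2d
    from scipy.sparse import lil_matrix

    n_rows, n_cols = 4, 4
    K = np.array([[0.0, 1.0, 0.0],
                  [1.0, -4.0, 1.0],
                  [0.0, 1.0, 0.0]])

    # Build A row by row: one row of A per output pixel (row-major ordering).
    A = lil_matrix((n_rows * n_cols, n_rows * n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            row = i * n_cols + j
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < n_rows and 0 <= jj < n_cols:
                        A[row, ii * n_cols + jj] = K[di + 1, dj + 1]

    D = np.arange(1.0, 17.0).reshape(n_rows, n_cols)   # example data
    d = D.reshape(-1)                                   # row-major vectorization

    assert np.allclose((A @ d).reshape(n_rows, n_cols),
                       convolve2d(D, K, mode="same"))
    print("nonzeros in A:", A.nnz, "out of", A.shape[0] * A.shape[1], "entries")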

Exercise 9

(a) Find a solution of the boundary value problem: find u such that

    \frac{\partial}{\partial x} \left( \frac{1}{x + 1} \frac{\partial u}{\partial x} \right) = 1 \quad \text{in } [0, 1],
    \qquad u(0) = 0,
    \qquad u(1) = \frac{7}{3}.

(b) Give an example for the functions a and b in the ansatz

    u(x) = a(x) + b(x) \, NN^{\sigma}_{W,b}(x)

and the loss function that allow for solving the BVP using the method of Lagaris et al.
(c) Replace the ansatz by a data loss and formalize the corresponding PINN loss function.

Exercise 10
Test the example examples/pinn_forward/Poisson_Dirichlet_1d.py from the DeepXDE
library (https://github.com/lululxvi/deepxde) for solving a one-dimensional Poisson problem
using PINNs.
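
Before (or instead of) installing DeepXDE, the mechanics can be reproduced with a library-free PINN sketch, assuming PyTorch; the concrete problem below, −u''(x) = 1 on (0, 1) with u(0) = u(1) = 0, is an arbitrary choice and not necessarily the problem solved in the repository example:

    import torch

    torch.manual_seed(0)
    net = torch.nn.Sequential(
        torch.nn.Linear(1, 32), torch.nn.Tanh(),
        torch.nn.Linear(32, 32), torch.nn.Tanh(),
        torch.nn.Linear(32, 1),
    )
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    x_int = torch.rand(64, 1)                 # collocation points in (0, 1)
    x_bnd = torch.tensor([[0.0], [1.0]])      # Dirichlet boundary points

    for step in range(5000):
        opt.zero_grad()
        x = x_int.clone().requires_grad_(True)
        u = net(x)
        du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
        d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
        loss_pde = ((-d2u - 1.0) ** 2).mean()  # residual of -u'' = 1
        loss_bc = (net(x_bnd) ** 2).mean()     # u(0) = u(1) = 0
        (loss_pde + loss_bc).backward()
        opt.step()

    # Exact solution of this toy problem is u(x) = x(1 - x)/2.
    x_test = torch.linspace(0.0, 1.0, 5).reshape(-1, 1)
    print(net(x_test).detach().flatten())
    print((x_test * (1.0 - x_test) / 2.0).flatten())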
