
Mathematical Foundations of Deep Neural Networks, M1407.001200
E. Ryu
Spring 2024
Due 5pm, Monday, May 06, 2024

Problem 1: Transpose of downsampling. Consider the downsampling operator T : Rm×n → R(m/2)×(n/2), defined as the average pool with a 2 × 2 kernel and stride 2. For the sake of simplicity, assume m and n are even. Describe the action of T ⊤. More specifically, describe how to compute T ⊤(Y ) for any Y ∈ R(m/2)×(n/2).

Clarification. The downsampling operator T is a linear operator (why?). Therefore, T has a matrix representation A ∈ R(mn/4)×(mn) such that

T (X) = (A(X.reshape(mn))).reshape(m/2, n/2)

for all X ∈ Rm×n . The adjoint T ⊤ has two equivalent definitions. One definition is

T ⊤ (Y ) = (A⊤ (Y.reshape(mn/4))).reshape(m, n)

for all Y ∈ R(m/2)×(n/2) . Another is


\[
\sum_{i=1}^{m/2} \sum_{j=1}^{n/2} Y_{ij} \, (T(X))_{ij} \;=\; \sum_{i=1}^{m} \sum_{j=1}^{n} (T^\top(Y))_{ij} \, X_{ij}
\]

for all X ∈ Rm×n and Y ∈ R(m/2)×(n/2) .


Hint. To spoil the suspense, T ⊤ is a constant times the nearest neighbor upsampling operator. Explain why in your answer.
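For orientation, here is a minimal numerical sketch (PyTorch, not the requested derivation): it checks the adjoint identity above with T ⊤ taken to be 1/4 times nearest neighbor upsampling. The factor 1/4 is an assumption that the written answer should justify.

import torch
import torch.nn.functional as F

# Check <Y, T(X)> = <T'(Y), X>, where T is 2x2 average pooling and
# T' is assumed to be (1/4) * nearest neighbor upsampling.
m, n = 6, 8
X = torch.randn(m, n)
Y = torch.randn(m // 2, n // 2)

TX = F.avg_pool2d(X[None, None], kernel_size=2, stride=2)[0, 0]
TtY = 0.25 * F.interpolate(Y[None, None], scale_factor=2, mode='nearest')[0, 0]

print((Y * TX).sum().item(), (X * TtY).sum().item())  # the two values agree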

Problem 2: Nearest neighbor upsampling. How is the nearest neighbor upsampling operator
an instance of transpose convolution? Specifically, describe how
layer = nn.Upsample(scale_factor=r, mode='nearest')

where r is a positive integer, can be equivalently represented by


layer = nn.ConvTranspose2d(...)
layer.weight.data = ...

with ... appropriately filled in.
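As a sanity check (a sketch under assumed settings, not necessarily the intended answer): with C input channels, one candidate is groups=C with an all-ones r × r kernel, which matches nn.Upsample numerically.

import torch
import torch.nn as nn

C, r = 3, 2  # assumed channel count and scale factor
up = nn.Upsample(scale_factor=r, mode='nearest')
tconv = nn.ConvTranspose2d(C, C, kernel_size=r, stride=r, groups=C, bias=False)
tconv.weight.data = torch.ones(C, 1, r, r)  # assumed weight choice

x = torch.randn(1, C, 4, 5)
print(torch.allclose(up(x), tconv(x)))  # True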

Problem 3: f-divergence. Let X and Y be two continuous random variables with densities pX
and pY . The f -divergence of X from Y is defined as
\[
D_f(X \,\|\, Y) = \int f\!\left( \frac{p_X(x)}{p_Y(x)} \right) p_Y(x) \, dx,
\]

where f is a convex function such that f (1) = 0.

(a) Show that Df (X∥Y ) ≥ 0.

(b) Show that f (t) = − log t and f (t) = t log t each correspond to a KL divergence.
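As a numerical illustration (an assumed 1-D Gaussian example, not part of the required proof), the divergence with f (t) = t log t can be checked against the closed-form KL divergence between two Gaussians:

import numpy as np
from scipy.stats import norm

mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0  # assumed example parameters
x = np.linspace(-20, 20, 200001)
pX, pY = norm.pdf(x, mu1, s1), norm.pdf(x, mu2, s2)
t = pX / pY
Df = np.sum(t * np.log(t) * pY) * (x[1] - x[0])  # D_f with f(t) = t log t
kl = np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5
print(Df, kl)  # the two values agree to several decimals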

Problem 4: Generalized inverse transform sampling. Let F : R → [0, 1] be the CDF of a random variable and let U ∼ Uniform([0, 1]). If F is continuous and strictly increasing and therefore invertible, then F −1(U ) is a random variable with CDF F , because

P(F −1 (U ) ≤ t) = P(U ≤ F (t)) = F (t).

When F is not necessarily invertible, the generalized inverse of F is G : (0, 1) → R with

G(u) = inf{x ∈ R | u ≤ F (x)}.

Show that G(U ) is a random variable with CDF F .

Hint. Use the fact that F is right-continuous, i.e., lim h→0+ F (x + h) = F (x) for all x ∈ R, and that lim x→−∞ F (x) = 0.
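To see the generalized inverse in action (an assumed discrete example, where F has jumps and is not invertible), G can be evaluated with a sorted search:

import numpy as np

rng = np.random.default_rng(0)
xs = np.array([0, 1, 2])  # assumed support points
cdf = np.cumsum([0.2, 0.5, 0.3])  # F evaluated at the support points
u = rng.uniform(size=100_000)
samples = xs[np.searchsorted(cdf, u)]  # G(u): smallest x with u <= F(x)
print(np.bincount(samples) / len(samples))  # approximately (0.2, 0.5, 0.3)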

Problem 5: Change of variables formula for Gaussians. If φ : Rn → Rn is a one-to-one differentiable function, Y = φ(X), and Y is a continuous random variable with density function pY , then X is a continuous random variable with density function
\[
p_X(x) = p_Y(\varphi(x)) \left| \det \frac{\partial \varphi}{\partial x}(x) \right|.
\]

Let Y ∈ Rn be a continuous random vector with density
\[
p_Y(y) = \frac{1}{(2\pi)^{n/2}} \, e^{-\frac{1}{2} \|y\|^2},
\]

i.e., Y ∼ N (0, I). Let X = AY + b with an invertible matrix A ∈ Rn×n and a vector b ∈ Rn .
Define Σ = AA⊺ . Show that X is a continuous random vector with density
\[
p_X(x) = \frac{1}{\sqrt{(2\pi)^n \det \Sigma}} \, e^{-\frac{1}{2} (x-b)^\intercal \Sigma^{-1} (x-b)}.
\]
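For intuition (an assumed 2-D example, not the requested derivation), the claim can be checked empirically: samples of X = AY + b have mean b and covariance Σ = AA⊺.

import numpy as np

rng = np.random.default_rng(1)
A = np.array([[2.0, 0.5], [0.0, 1.0]])  # assumed invertible A
b = np.array([1.0, -1.0])
Sigma = A @ A.T

Y = rng.standard_normal((200_000, 2))  # Y ~ N(0, I)
X = Y @ A.T + b
print(X.mean(axis=0))  # approximately b
print(np.cov(X, rowvar=False))  # approximately Sigma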

Problem 6: Inverse permutation. Let Sn denote the group of length-n permutations and let σ ∈ Sn . Note that the map i ↦ σ(i) is a bijection. Define σ −1 ∈ Sn as the permutation representing the inverse of this map, i.e., σ −1 (σ(i)) = i for i = 1, . . . , n. Describe an algorithm for computing σ −1 given σ.

Clarification. In this class, we defined σ as a list of length n containing the elements of {1, . . . , n}
exactly once. The output of the algorithm, σ −1 , should also be provided as a list.
Clarification. For this problem, it is sufficient to describe the algorithm in equations or pseudocode. There is no need to submit a Python script for this problem.
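Although no script is required, one possible sketch (one-indexed lists, as in the clarification above) looks like this:

def inverse_permutation(sigma):
    # sigma is a list containing each of 1..n exactly once; sigma[i-1] is sigma(i)
    inv = [0] * len(sigma)
    for i, s in enumerate(sigma, start=1):
        inv[s - 1] = i  # enforces sigma^{-1}(sigma(i)) = i
    return inv

print(inverse_permutation([2, 3, 1]))  # [3, 1, 2]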

Problem 7: Permutation matrix. Given a permutation σ ∈ Sn , the permutation matrix of σ is


defined as
\[
P_\sigma = \begin{bmatrix} e_{\sigma(1)}^\intercal \\ e_{\sigma(2)}^\intercal \\ \vdots \\ e_{\sigma(n)}^\intercal \end{bmatrix} \in \mathbb{R}^{n \times n},
\]
where e1 , . . . , en ∈ Rn are the standard unit vectors. Show

(a) (Pσ x)i = xσ(i) for all x ∈ Rn and i = 1, . . . , n,

(b) \(P_\sigma^\intercal = P_\sigma^{-1} = P_{\sigma^{-1}}\), and

(c) | det Pσ | = 1.

Hint. If the rows of U ∈ Rn×n are orthonormal, we say U is an orthogonal matrix. Orthogonal
matrices satisfy U U ⊺ = U ⊺ U = I.
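A small numerical sketch (an assumed 3-element example, not a proof) of properties (a)–(c):

import numpy as np

sigma = [2, 3, 1]  # assumed example permutation, one-indexed
n = len(sigma)
P = np.eye(n)[[s - 1 for s in sigma]]  # row i of P is e_{sigma(i)}^T

x = np.array([10.0, 20.0, 30.0])
print(P @ x)  # (a): entry i is x_{sigma(i)}, here [20., 30., 10.]
print(np.allclose(P.T @ P, np.eye(n)))  # (b): P^T P = I, so P^T = P^{-1}
print(np.isclose(abs(np.linalg.det(P)), 1.0))  # (c): |det P| = 1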
