Matrix Calculus Tutorial (Foundations of Data Science)

Matrix Calculus

Tanmay Devale

1 Kronecker Product
Let A be an m × n matrix and B a p × q matrix. The Kronecker (or tensor) product of A and B, denoted A ⊗ B, is the mp × nq matrix C with elements c_{αβ} = a_{ij} b_{kl}, where α = p(i − 1) + k and β = q(j − 1) + l.
For example, consider

A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},  B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{pmatrix}.

Then

A ⊗ B = \begin{pmatrix} a_{11}B & a_{12}B \\ a_{21}B & a_{22}B \end{pmatrix} = \begin{pmatrix} a_{11}b_{11} & a_{11}b_{12} & a_{12}b_{11} & a_{12}b_{12} \\ a_{11}b_{21} & a_{11}b_{22} & a_{12}b_{21} & a_{12}b_{22} \\ a_{11}b_{31} & a_{11}b_{32} & a_{12}b_{31} & a_{12}b_{32} \\ a_{21}b_{11} & a_{21}b_{12} & a_{22}b_{11} & a_{22}b_{12} \\ a_{21}b_{21} & a_{21}b_{22} & a_{22}b_{21} & a_{22}b_{22} \\ a_{21}b_{31} & a_{21}b_{32} & a_{22}b_{31} & a_{22}b_{32} \end{pmatrix}
 
Say A = \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix} and B = \begin{pmatrix} 5 & -1 \\ -1 & 4 \end{pmatrix}. Then

A ⊗ B = \begin{pmatrix} 2B & 0B \\ 1B & 3B \end{pmatrix} = \begin{pmatrix} 10 & -2 & 0 & 0 \\ -2 & 8 & 0 & 0 \\ 5 & -1 & 15 & -3 \\ -1 & 4 & -3 & 12 \end{pmatrix}
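This block construction is easy to verify numerically. A minimal sketch, assuming NumPy is available (np.kron implements exactly this Kronecker product), using the same A and B as above:

```python
import numpy as np

A = np.array([[2, 0],
              [1, 3]])
B = np.array([[5, -1],
              [-1, 4]])

# np.kron assembles exactly the block matrix [[2B, 0B], [1B, 3B]]
C = np.kron(A, B)
print(C.shape)  # (4, 4)
```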

1.1 Exercise
1. Let A and B be the following matrices; find A ⊗ B.

(a) A = \begin{pmatrix} 3 \\ 2 \end{pmatrix} and B = \begin{pmatrix} 1 & 0 \\ 2 & 7 \end{pmatrix}

(b) A = \begin{pmatrix} 1 & -1 \end{pmatrix} and B = \begin{pmatrix} 1 & 0 & 5 \end{pmatrix}

(c) A = \begin{pmatrix} 3 & 6 \end{pmatrix} and B = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}

(d) A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} and B = \begin{pmatrix} -1 & 3 \end{pmatrix}

(e) A = \begin{pmatrix} 2 \\ 1 \end{pmatrix} and B = \begin{pmatrix} 3 \\ 8 \end{pmatrix}

2. Is A ⊗ B = B ⊗ A? Provide a proof or a counterexample.

2 Matrix Differentiation
We are going to use the following notation:

1. x denotes scalars
2. ⃗x denotes vectors (specifically, column vectors)
3. X denotes matrices

We are interested in the following 9 derivatives:

dy/dx    d⃗y/dx    dY/dx
dy/d⃗x    d⃗y/d⃗x    dY/d⃗x
dy/dX    d⃗y/dX    dY/dX

Table 1: Derivatives of interest

2.1 Derivative with respect to a scalar

We are now looking at the first row of Table 1. We already know what dy/dx means from elementary calculus.

Now let ⃗y = \begin{pmatrix} f_1(x) \\ f_2(x) \\ f_3(x) \end{pmatrix}; then d⃗y/dx = \begin{pmatrix} df_1(x)/dx \\ df_2(x)/dx \\ df_3(x)/dx \end{pmatrix}

Let Y = \begin{pmatrix} f_{11}(x) & f_{12}(x) \\ f_{21}(x) & f_{22}(x) \end{pmatrix}; then dY/dx = \begin{pmatrix} df_{11}(x)/dx & df_{12}(x)/dx \\ df_{21}(x)/dx & df_{22}(x)/dx \end{pmatrix}
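The elementwise rule above is easy to sanity-check numerically. A minimal sketch in plain Python, using a toy matrix-valued function Y(x) that is not one of the exercises, comparing the elementwise analytic derivative against a central finite difference:

```python
import math

def Y(x):
    # toy matrix-valued function of a scalar (not from the text)
    return [[x**2, math.exp(x)],
            [math.sin(x), 3 * x]]

def dY_analytic(x):
    # dY/dx: differentiate each element separately
    return [[2 * x, math.exp(x)],
            [math.cos(x), 3.0]]

def dY_numeric(x, h=1e-6):
    # central finite difference applied elementwise
    Yp, Ym = Y(x + h), Y(x - h)
    return [[(Yp[i][j] - Ym[i][j]) / (2 * h) for j in range(2)]
            for i in range(2)]

x0 = 0.7
for row_a, row_n in zip(dY_analytic(x0), dY_numeric(x0)):
    for a, n in zip(row_a, row_n):
        assert abs(a - n) < 1e-4
```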

2.1.1 Exercise
Given the notation specified above, find the derivative of y, ⃗y, and Y w.r.t. x.

1. y = sin(x^2), ⃗y = \begin{pmatrix} x \\ cos(x) \\ 2x^2 \end{pmatrix}, Y = \begin{pmatrix} x^2 + 1 & cos(x) \\ sin(x) & x - 1 \end{pmatrix}

2. y = e^{x^2}, ⃗y = \begin{pmatrix} ln(x) \\ sin(x) \\ cos(x^2) \end{pmatrix}, Y = \begin{pmatrix} x^3 + x^2 + 1 & e^x \\ 2x & sin(x) \end{pmatrix}

3. y = ln(x^{10}), ⃗y = \begin{pmatrix} cos(x) \\ sin(x^2) \\ sin(πx) \\ tan(x) \\ x^4 + x + 2023 \end{pmatrix}, Y = \begin{pmatrix} πx & e^π \\ cos(sec(x)) & x csc(x^3) \end{pmatrix}

2.2 Derivative with respect to a vector

We are now looking at the second row of Table 1.

While calculating the derivative w.r.t. a vector, say ⃗x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, we consider the row vector ∂/∂⃗x = \begin{pmatrix} ∂/∂x_1 & ∂/∂x_2 & ∂/∂x_3 \end{pmatrix}, and then:

for a scalar, find (∂/∂⃗x) ⊗ y
for a vector, find (∂/∂⃗x) ⊗ ⃗y
for a matrix, find (∂/∂⃗x) ⊗ Y
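Under this convention, the derivative of a scalar with respect to a column vector is a row vector of partial derivatives. A quick numerical check in plain Python, using a toy function y = x_1^2 + x_2 x_3 that is not taken from the exercises:

```python
def y(x):
    # toy scalar function of a 3-vector (not from the exercises): y = x1^2 + x2*x3
    return x[0]**2 + x[1] * x[2]

x0 = [1.0, 2.0, -0.5]
# analytic derivative, a ROW vector in this text's convention:
# dy/d(vec x) = (2*x1, x3, x2)
grad = [2 * x0[0], x0[2], x0[1]]

# central finite-difference check, one coordinate at a time
h = 1e-6
for i in range(3):
    xp, xm = x0[:], x0[:]
    xp[i] += h
    xm[i] -= h
    assert abs((y(xp) - y(xm)) / (2 * h) - grad[i]) < 1e-4
```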

2.2.1 Exercise
1. Given the notation specified above, find the derivative of y, ⃗y, and Y w.r.t. ⃗x.

(a) y = sin(x + yz), ⃗y = \begin{pmatrix} e^{xyz} \\ x^2 z \\ xyz \end{pmatrix}, Y = \begin{pmatrix} x^2 yz & xy^2 z \\ xyz^2 & ln(xyz) \end{pmatrix}, where ⃗x = \begin{pmatrix} x \\ y \\ z \end{pmatrix}

(b) y = 5xyz, ⃗y = \begin{pmatrix} ln(xyz) \\ x cos(z) \end{pmatrix}, Y = \begin{pmatrix} x^3 + x^2 + yz^e & π cos(x + y + z) \\ xyz^2 & sin(cos(x + y)) z \end{pmatrix}, where ⃗x = \begin{pmatrix} x \\ y \\ z \end{pmatrix}

2. Consider functions f : R^n → R^m and g : R^n → R^m.

(a) Show that for ⃗x ∈ R^n,
d(f(⃗x) + g(⃗x))/d⃗x = df(⃗x)/d⃗x + dg(⃗x)/d⃗x

(b) Show that for ⃗x ∈ R^n and a ∈ R,
d(a f(⃗x))/d⃗x = a df(⃗x)/d⃗x

3. The quadratic form x^T A x is a form we will encounter often. In this question, we are interested in d(x^T A x)/dx. Assume that A is not a function of x.

(a) Evaluate x^T A x when x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} and the (i, j)-th element of A is A_{ij}. Why do you think x^T A x is called the quadratic form?

(b) Which definition of the derivative do we need in order to evaluate d(x^T A x)/dx?

(c) Assume x ∈ R^2 and A ∈ R^{2×2}. Evaluate d(x^T A x)/dx.

(d) Generalize the previous result to x ∈ R^n and A ∈ R^{n×n} and evaluate d(x^T A x)/dx. Can you express the result in matrix form?

(e) What happens when A is a symmetric matrix?
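Whatever you derive in parts (c) and (d) can be spot-checked numerically; the closed form also appears later as identity 3(e) in Section 4. A sketch in plain Python with arbitrary example values, comparing x^T (A + A^T) against a finite-difference gradient of x^T A x:

```python
def quad(x, A):
    # x^T A x for a 2-vector x and a 2x2 matrix A
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

A = [[1.0, 2.0],
     [3.0, 4.0]]
x = [0.5, -1.5]

# closed form (identity 3(e) in Section 4): d(x^T A x)/dx = x^T (A + A^T)
sym = [[A[i][j] + A[j][i] for j in range(2)] for i in range(2)]
ana = [sum(x[i] * sym[i][j] for i in range(2)) for j in range(2)]

# central finite-difference gradient
h = 1e-6
num = []
for k in range(2):
    xp, xm = x[:], x[:]
    xp[k] += h
    xm[k] -= h
    num.append((quad(xp, A) - quad(xm, A)) / (2 * h))

assert all(abs(a - n) < 1e-4 for a, n in zip(ana, num))
```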

2.3 Derivative with respect to a matrix

We are now looking at the third row of Table 1.

While calculating the derivative w.r.t. a matrix, say X = \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{pmatrix}, we consider the operator matrix ∂/∂X = \begin{pmatrix} ∂/∂x_{11} & ∂/∂x_{21} & ∂/∂x_{31} \\ ∂/∂x_{12} & ∂/∂x_{22} & ∂/∂x_{32} \end{pmatrix}, and then:

for a scalar, find (∂/∂X) ⊗ y
for a vector, find (∂/∂X) ⊗ ⃗y
for a matrix, find (∂/∂X) ⊗ Y
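The same kind of finite-difference check works for derivatives with respect to a matrix: a scalar y(X) has one partial derivative per entry of X. A sketch in plain Python with a toy function (not one of the exercises):

```python
def y(X):
    # toy scalar function of a 2x2 matrix (not from the exercises):
    # the sum of squares of the entries of X
    return sum(X[i][j]**2 for i in range(2) for j in range(2))

X = [[0.5, -1.0],
     [2.0, 0.3]]

# analytic partials, one per entry of X: dy/dx_ij = 2*x_ij
grad = [[2 * X[i][j] for j in range(2)] for i in range(2)]

# central finite-difference check, entry by entry
h = 1e-6
for i in range(2):
    for j in range(2):
        Xp = [row[:] for row in X]
        Xm = [row[:] for row in X]
        Xp[i][j] += h
        Xm[i][j] -= h
        assert abs((y(Xp) - y(Xm)) / (2 * h) - grad[i][j]) < 1e-4
```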

2.3.1 Exercise
1. Given the notation specified above, find the derivative of y, ⃗y, and Y w.r.t. X.

(a) y = 4x_3 + 3x_2 + 2x_1 + x_0, ⃗y = \begin{pmatrix} e^{x_0 x_1} \\ e^{x_2 x_3} \end{pmatrix}, Y = \begin{pmatrix} sin(x_0 + 2x_1) & 2x_1 + x_3 \\ 2x_0 + x_2 & cos(2x_2 + x_3) \end{pmatrix}, w.r.t. X = \begin{pmatrix} x_0 & x_1 \\ x_2 & x_3 \end{pmatrix}

(b) y = ln(x_5^2 x_4^3 x_3 x_2^2 x_1^0), ⃗y = \begin{pmatrix} sin(x_5) + cos(x_4 x_3) + x_2^2 + 2x_1 x_0 \\ e^{iπ} + 1 \end{pmatrix}, Y = \begin{pmatrix} x_1 x_5 & x_4^{2x_1} x_0 \\ 2x_4 + x_1 & tan(2x_2 + x_4) \\ cot(x_0) & csc(x_4 + x_1) \end{pmatrix}, w.r.t. X = \begin{pmatrix} x_0 & x_1 & x_2 \\ x_3 & x_4 & x_5 \end{pmatrix}

3 Chain Rule
3.1 The basics
Recall that for h(x) = f(g(x)) the chain rule is

dh/dx = (df/dg)(dg/dx)

For the multivariate case h(x) = f(g_1(x), g_2(x), g_3(x)), the chain rule is extended as

dh/dx = (∂f/∂g_1)(dg_1/dx) + (∂f/∂g_2)(dg_2/dx) + (∂f/∂g_3)(dg_3/dx)
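A quick numerical check of the multivariate chain rule, using a toy composition not from the exercises: f(u, v) = uv with g_1 = sin(x) and g_2 = x^2, so that h(x) = x^2 sin(x):

```python
import math

# h(x) = f(g1(x), g2(x)) with f(u, v) = u*v, g1 = sin(x), g2 = x**2,
# i.e. h(x) = x**2 * sin(x).  Chain rule:
#   dh/dx = (df/dg1)(dg1/dx) + (df/dg2)(dg2/dx) = g2*cos(x) + g1*2x
def dh_chain(x):
    g1, g2 = math.sin(x), x**2
    return g2 * math.cos(x) + g1 * 2 * x

def dh_numeric(x, h=1e-6):
    # central finite difference on h(x) directly
    f = lambda t: t**2 * math.sin(t)
    return (f(x + h) - f(x - h)) / (2 * h)

assert abs(dh_chain(1.3) - dh_numeric(1.3)) < 1e-4
```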

3.1.1 Exercises
Evaluate ∂f/∂x and ∂f/∂y for each of the following:

1. f(u, v) = (u − v)e^u, where u = xy and v = x^2 − y^2.

2. f(u, v) = u log(v) + v log(u), where u = x/2 + y^2 and v = x e^y.

3. f(u, v) = u log(v), where u = x sin(y) + y sin(x) and v = x cos(y) + y cos(x).

4. f(u, v) = (u + v)/(1 − uv), where u = tan((x + y)/2) and v = tan((x − y)/2).

Our previous operations can be thought of as adding up all the components that contribute to the change of h. Building on this, we can extend the chain rule to also work in matrix calculus. For a detailed proof of why the chain rule still holds in matrix calculus, please refer to reference 3.

3.1.2 Exercise
Consider x ∈ R^p, y ∈ R^r, z ∈ R^n. Which of the following are true?

dz/dx = (dz/dy)(dy/dx)   or   dz/dx = (dy/dx)(dz/dy)

3.2 Useful examples of vectored derivatives

In the following we provide some examples of vectored derivatives that are used frequently in machine learning. Consider the case where the function g(·) has a d-dimensional vector argument and its output is a scalar. Furthermore, the function f(·) is a scalar-to-scalar function, so that

J = f(g(⃗w))

In such a case, we can apply the vectored chain rule to obtain the following:

∇J = ∂J/∂⃗w = ∇g(⃗w) f′(g(⃗w)),  where the factor f′(g(⃗w)) is a scalar.

In this case, the order of multiplication does not matter, because one of the factors in the product is a scalar. Note that this result is used frequently in machine learning, because many loss functions in machine learning are computed by applying a scalar function f(·) to the dot product of ⃗w with a training point ⃗a. In other words, we have g(⃗w) = ⃗w · ⃗a. Note that ⃗w · ⃗a can also be written as ⃗w^T(I)⃗a, where I represents the identity matrix. This is in the form of one of the matrix identities listed in Section 4. In such a case, one can use the chain rule to obtain the following:

∂J/∂⃗w = [f′(g(⃗w))] ⃗a,  where the factor f′(g(⃗w)) is a scalar.

This result is extremely useful, and it can be used for computing the derivatives
of many loss functions like least-squares regression, SVMs and logistic regression.
The vector ⃗a is simply replaced with the vector of the training point at hand.
The function f (·) defines the specific form of the loss function for the model at
hand.
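As a concrete instance of this result, take a toy squared loss f(s) = s^2/2 with g(⃗w) = ⃗w · ⃗a (the values below are arbitrary stand-ins, not from the text); then f′(g(⃗w)) = ⃗w · ⃗a and the gradient is (⃗w · ⃗a)⃗a. A finite-difference check in plain Python:

```python
# J = f(g(w)) with g(w) = w . a and f(s) = s**2 / 2 (a toy squared loss);
# the result above gives  dJ/dw = f'(w . a) * a = (w . a) * a
a = [1.0, -2.0, 0.5]     # stand-in "training point"
w = [0.3, 0.1, -0.7]     # stand-in weight vector

dot = sum(wi * ai for wi, ai in zip(w, a))
grad = [dot * ai for ai in a]

# central finite-difference check of each coordinate of the gradient
def J(w):
    return sum(wi * ai for wi, ai in zip(w, a)) ** 2 / 2

h = 1e-6
for i in range(3):
    wp, wm = w[:], w[:]
    wp[i] += h
    wm[i] -= h
    assert abs(grad[i] - (J(wp) - J(wm)) / (2 * h)) < 1e-4
```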

3.2.1 Exercise
1. Evaluate (d/dx) σ(x), where σ(x) = 1/(1 + e^{−x}). Is this function familiar? What is it commonly called?

2. Express your answer to the previous question using only σ(x).

3. Consider a weight vector ⃗w and a sample point ⃗x. We perform an affine transformation, i.e. z = ⃗w^T ⃗x, and then apply σ(z).

(a) Write σ(z) in terms of ⃗w and ⃗x.

(b) Evaluate ∂σ(z)/∂⃗w and compare with the discussion above.

4 Matrix Identities
Assume identity 3(a) and prove all the other identities.

1. ∂c/∂⃗x = 0^T

2. ∂(⃗u⃗v)/∂⃗x = ⃗u (∂⃗v/∂⃗x) + (∂⃗u/∂⃗x) ⃗v

3. Note the change in notation:
Let u, v, x be variable column vectors.
Let a, b be constant column vectors.
Let A be a constant matrix. Then:

(a) ∂(u^T A v)/∂x = u^T A (∂v/∂x) + v^T A^T (∂u/∂x)
(b) ∂(u^T v)/∂x = u^T (∂v/∂x) + v^T (∂u/∂x)
(c) ∂(a^T x)/∂x = a^T
(d) ∂(b^T A x)/∂x = b^T A
(e) ∂(x^T A x)/∂x = x^T (A + A^T)
(f) ∂(||x||^2)/∂x = 2x^T
(g) ∂(a^T u)/∂x = a^T (∂u/∂x)
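Each of these identities can be spot-checked numerically before attempting a proof. A sketch in plain Python for identity (d), with arbitrary small example values:

```python
# spot-check of identity (d): d(b^T A x)/dx = b^T A, on a 2x2 example
A = [[1.0, -2.0],
     [0.5, 3.0]]
b = [2.0, -1.0]
x = [0.7, 0.2]

def f(x):
    # the scalar b^T A x
    Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    return sum(b[i] * Ax[i] for i in range(2))

# the claimed derivative, the row vector b^T A
bTA = [sum(b[i] * A[i][j] for i in range(2)) for j in range(2)]

# central finite-difference check, one coordinate at a time
h = 1e-6
for k in range(2):
    xp, xm = x[:], x[:]
    xp[k] += h
    xm[k] -= h
    assert abs((f(xp) - f(xm)) / (2 * h) - bTA[k]) < 1e-4
```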

5 References
1. Weisstein, Eric W. "Kronecker Product." From MathWorld, A Wolfram Web Resource. https://mathworld.wolfram.com/KroneckerProduct.html
2. Taboga, Marco (2021). "Kronecker product", Lectures on Matrix Algebra. https://www.statlect.com/matrix-algebra/Kronecker-product
3. Kim, H., Vijayakumar, A. (2022). Matrix Calculus for 10-301/601. https://www.cs.cmu.edu/~mgormley/courses/10601/slides/10601-matrix-calculus.pdf
