

Tensors for matrix differentiation


Richard Turner
Here are some notes on how to use tensors to find matrix derivatives, and the relation to the ⊙ (Hadamard), vec, ⊗ (Kronecker), vec-transpose and reshape operators. I wrote these notes for myself, and I apologise for any mistakes and confusions. Two sections are currently unfinished: I hope to complete them soon.
1 A tensor notation
Let's set up one useful form of tensor notation, which incorporates the matrix and inner products, the outer product, the Hadamard (MATLAB .* or ⊙) product, diag and diag⁻¹. These will be denoted using different combinations of pairs of up-stairs and down-stairs indices. If we have only 2nd-order tensors (and lower) we want to be able to easily convert the result into matrix representation. We have a free choice for the horizontal ordering of indices, and therefore this can be used to denote transposes and the order of multiplication.
$a_i{}^j \, b_j{}^k = \sum_j a_i{}^j \, b_j{}^k = (AB)_i{}^k$    (1, 2)

$A_i{}^j = (A^{\mathsf T})^j{}_i$    (3)

$a_i \, b^j = (a b^{\mathsf T})_i{}^j$    (4)

$a_i{}^i = \sum_i a_i{}^i = \mathrm{tr}\, A$    (5, 6)

$A_i{}^j \, B_i{}^j = H^{jkm}_{iln}\, A_k{}^l \, B_m{}^n = (A \odot B)_i{}^j$    (7, 8)

$H^{jkm}_{iln} = \delta_i{}^k \, \delta_l{}^j \, \delta_i{}^m \, \delta_n{}^j$    (9)

$A_{i,i} = \mathrm{diag}(A)_i$    (10)

$A_i{}^j \, \delta_i{}^j = (\mathrm{diag}^{-1}\,\mathrm{diag}\, A)_i{}^j$    (11)

The Kronecker delta $\delta_i{}^j$ is 1 iff $i = j$ and zero for $i \neq j$.
From the matrix perspective, the first indices index the rows and the second the columns. A second-order tensor must have one upstairs and one downstairs index. The only way of moving downstairs indices upstairs is to flip ALL indices, and this does not affect anything. Summations occur between one down-stairs index and one up-stairs index (the Einstein convention). Repeated down-stairs or up-stairs indices imply a Hadamard or entry-wise product. As an example, the Hadamard product between two vectors (of the same size) is:

$a \odot b = [a_1, a_2, \ldots, a_I]^{\mathsf T} \odot [b_1, b_2, \ldots, b_I]^{\mathsf T} = [a_1 b_1, a_2 b_2, \ldots, a_I b_I]^{\mathsf T}$
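As a small numerical illustration of these conventions (a minimal sketch; the array names and shapes are assumed, and NumPy's einsum plays the role of the index notation, with paired indices summed and repeated same-level indices giving entry-wise operations):

```python
import numpy as np

# Illustrative sketch (shapes assumed): the index expressions above written
# with np.einsum.  Paired indices sum (matrix product, trace), repeated
# indices on the same level give entry-wise (Hadamard) and diag operations.
rng = np.random.default_rng(8)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))

assert np.allclose(np.einsum('ij,jk->ik', A, B), A @ B)        # eqs. (1)-(2)
assert np.isclose(np.einsum('ii->', A), np.trace(A))           # eqs. (5)-(6)
assert np.allclose(np.einsum('ij,ij->ij', A, B), A * B)        # eqs. (7)-(8)
assert np.allclose(np.einsum('ii->i', A), np.diag(A))          # eq. (10)
```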
If we have a bunch of second and/or first order tensors (e.g. $S_i{}^j = B_l{}^j \, W^k{}_i \, A^l{}_k$), we can convert them into matrix/vector notation using the order of the indices from left to right, and the rule for the transpose.

1. Make the first and the last indices match the LHS ($S_i{}^j = W^k{}_i \, A^l{}_k \, B_l{}^j = (W^{\mathsf T})_i{}^k \, A^l{}_k \, B_l{}^j$).

2. Transpose the central objects so the indices run consecutively ($S_i{}^j = (W^{\mathsf T})_i{}^k \, (A^{\mathsf T})_k{}^l \, B_l{}^j$).

3. Replace with matrix notation ($S = W^{\mathsf T} A^{\mathsf T} B$); see the numerical sketch below.
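Here is a small numerical sketch of the conversion rules (the shapes are assumed for illustration): the index expression and the matrix expression should agree.

```python
import numpy as np

# Illustrative check (assumed shapes): S_i^j = W^k_i A^l_k B_l^j, i.e.
#   S_{ij} = sum_{k,l} W_{ki} A_{lk} B_{lj},  should equal  S = W^T A^T B.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # W_{ki}
A = rng.standard_normal((5, 4))   # A_{lk}
B = rng.standard_normal((5, 2))   # B_{lj}

S_index  = np.einsum('ki,lk,lj->ij', W, A, B)   # sum over the paired indices k, l
S_matrix = W.T @ A.T @ B

assert np.allclose(S_index, S_matrix)
```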
2 Basic derivatives
To convert derivatives found using the suffix notation into matrix derivatives, we need to be aware of one more convention.

Imagine differentiating a vector $x$ by another vector $y$. The result is a matrix, but we must choose which way round we want the rows and columns. Conventionally the choice is:

$\dfrac{\partial x_i}{\partial y_j} = \left(\dfrac{\partial \mathbf{x}}{\partial \mathbf{y}}\right)_i{}^j$    (12)

so the chain rule is easy to apply (and intuitive) by a right-hand multiplication:

$\dfrac{\partial x_i}{\partial z_k} = \dfrac{\partial x_i}{\partial y_j}\,\dfrac{\partial y_j}{\partial z_k} = \left(\dfrac{\partial \mathbf{x}}{\partial \mathbf{y}}\right)_i{}^j \left(\dfrac{\partial \mathbf{y}}{\partial \mathbf{z}}\right)_j{}^k$    (13, 14)
We might also want to differentiate a matrix by a matrix. Although the resulting fourth-order tensor cannot be represented as a matrix, the object could be required in applying the chain rule (see the example in the following section where we differentiate an object like $\mathrm{tr}(WW^{\mathsf T})$ with respect to $W$), and so we need rules for assigning the indices. Luckily we have enough conventions to unambiguously specify this:

$\dfrac{\partial A_i{}^j}{\partial B_k{}^l} = \left(\dfrac{\partial A}{\partial B}\right)^{j,k}_{i,l}$    (15)

where the ordering within the upstairs and downstairs slots is arbitrary.

Note that this relation defines all the derivatives you can form with 2nd-order tensors and lower (a subset of which are the three types of derivatives that can be represented as matrices).
2.0.1 Some examples
Here are some explicit examples:
Find:

$\dfrac{d\,\mathrm{tr}(AWCW^{\mathsf T}B)}{dW} = \dfrac{df}{dW}$    (16)
Solution:

$\left(\dfrac{df}{dW}\right)_i{}^j = \dfrac{\partial}{\partial W_j{}^i}\left[A_k{}^l \, W_l{}^m \, C_m{}^n \, (W^{\mathsf T})_n{}^p \, B_p{}^k\right]$    (17)

$= \dfrac{\partial}{\partial W_j{}^i}\left[A_k{}^l \, W_l{}^m \, C_m{}^n \, W^p{}_n \, B_p{}^k\right]$    (18)

$= A_k{}^l \, \delta_l{}^j \, \delta^m{}_i \, C_m{}^n \, W^p{}_n \, B_p{}^k + A_k{}^l \, W_l{}^m \, C_m{}^n \, \delta^p{}_j \, \delta_n{}^i \, B_p{}^k$    (19, 20)

$= A_k{}^j \, C_i{}^n \, W^p{}_n \, B_p{}^k + A_k{}^l \, W_l{}^m \, C_m{}^i \, B_j{}^k$    (21, 22)

$= C_i{}^n \, (W^{\mathsf T})_n{}^p \, B_p{}^k \, A_k{}^j + (C^{\mathsf T})_i{}^m \, (W^{\mathsf T})_m{}^l \, (A^{\mathsf T})_l{}^k \, (B^{\mathsf T})_k{}^j$    (23, 24)

$= \left(CW^{\mathsf T}BA + C^{\mathsf T}W^{\mathsf T}A^{\mathsf T}B^{\mathsf T}\right)_i{}^j$    (25)
Note the discrepancy with some texts (for example the widely used Matrix Cookbook): their results differ by a transpose because their definitions of $\partial x/\partial M$ and $\partial M/\partial x$ are not consistent with regard to their use in the chain rule.
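A finite-difference sketch of the result above (matrix sizes assumed; note the final transpose, which is exactly where the convention mismatch with other texts shows up):

```python
import numpy as np

# Finite-difference sketch (names/shapes assumed): check that
#   d tr(A W C W^T B) / dW_{ab} = (C W^T B A + C^T W^T A^T B^T)_{ba},
# i.e. the gradient array is the transpose of the bracketed expression above.
rng = np.random.default_rng(1)
n = 4
A, B, C, W = (rng.standard_normal((n, n)) for _ in range(4))

f = lambda M: np.trace(A @ M @ C @ M.T @ B)

eps = 1e-6
num_grad = np.zeros((n, n))
for a in range(n):
    for b in range(n):
        dW = np.zeros((n, n)); dW[a, b] = eps
        num_grad[a, b] = (f(W + dW) - f(W - dW)) / (2 * eps)

analytic = (C @ W.T @ B @ A + C.T @ W.T @ A.T @ B.T).T
assert np.allclose(num_grad, analytic, atol=1e-5)
```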
Find:

$\dfrac{dA^{-1}}{dx}$    (26)
Solution:

$\dfrac{dI_i{}^j}{dx} = \dfrac{d}{dx}\left[A_i{}^k \, (A^{-1})_k{}^j\right]$    (27)

$0 = \dfrac{dA_i{}^k}{dx}\,(A^{-1})_k{}^j + A_i{}^k\,\dfrac{d(A^{-1})_k{}^j}{dx}$    (28)

$\delta_l{}^k\,\dfrac{d(A^{-1})_k{}^j}{dx} = -(A^{-1})_l{}^i\,\dfrac{dA_i{}^k}{dx}\,(A^{-1})_k{}^j$    (29)

$\dfrac{d(A^{-1})_l{}^j}{dx} = -\left(A^{-1}\,\dfrac{dA}{dx}\,A^{-1}\right)_l{}^j$    (30)
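A quick numerical sketch of (30), assuming for illustration a simple linear parametrisation $A(x) = A_0 + x A_1$ so that $dA/dx = A_1$:

```python
import numpy as np

# Sketch (assumed parametrisation): verify d(A^{-1})/dx = -A^{-1} (dA/dx) A^{-1}.
rng = np.random.default_rng(2)
A0, A1 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
A = lambda x: A0 + x * A1          # so dA/dx = A1

x, eps = 0.3, 1e-6
num = (np.linalg.inv(A(x + eps)) - np.linalg.inv(A(x - eps))) / (2 * eps)
analytic = -np.linalg.inv(A(x)) @ A1 @ np.linalg.inv(A(x))
assert np.allclose(num, analytic, atol=1e-5)
```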
3 Differentiating determinants

complete this section
4 Differentiating structured matrices
Imagine we want to differentiate a matrix with some structure, for example a symmetric matrix. To form the derivatives we can use the matrix-self-derivative:

$\left(\dfrac{\partial A}{\partial A}\right)^{j,k}_{i,l} = \dfrac{\partial A_i{}^j}{\partial A_k{}^l}$    (31)

so that:

$\dfrac{\partial f}{\partial A_k{}^l} = \dfrac{\partial f}{\partial A_i{}^j}\,\dfrac{\partial A_i{}^j}{\partial A_k{}^l}$    (32)

When forming differentials we have to be careful to only sum over unique entries in $A$:

$df = \dfrac{\partial f}{\partial A_k{}^l}\, dA_k{}^l$    (33)

$\;\;= \sum_{\text{unique } k,l} \dfrac{\partial f}{\partial A_k{}^l}\, dA_k{}^l$    (34)
One ubiquitous form of structured matrix is the symmetric matrix. For this class the matrix-self-derivative is:

$\dfrac{\partial S_i{}^j}{\partial S_k{}^l} = \delta_i{}^k\,\delta_l{}^j + \delta_{i,l}\,\delta^{j,k} - \delta_i{}^j\,\delta_l{}^k\,\delta_i{}^k$    (35)

The first two terms make sure we count the off-diagonal elements twice, and the last term avoids over-counting of the diagonal and involves some entry-wise products.
4.1 A sermon about symmetric matrices

Let's do a family of derivatives correctly that most people muck up. Find:

$\dfrac{df(S)}{dS}$ where $S$ is symmetric.
$\dfrac{\partial f(S)}{\partial S_k{}^l} = \dfrac{\partial f(S)}{\partial S_i{}^j}\,\dfrac{\partial S_i{}^j}{\partial S_k{}^l}$    (36)

$= f'(S)^i{}_j\left[\delta_i{}^k\,\delta_l{}^j + \delta_{i,l}\,\delta^{j,k} - \delta_i{}^j\,\delta_l{}^k\,\delta_i{}^k\right]$    (37, 38)

$= f'(S)^k{}_l + f'(S)^l{}_k - f'(S)^i{}_j\,\delta_i{}^j\,\delta_l{}^k\,\delta_i{}^k$    (39)

$= f'(S)^k{}_l + f'(S)^l{}_k - f'(S)^i{}_i\,\delta_l{}^k\,\delta_i{}^k$    (40)

$= f'(S)^k{}_l + f'(S)^l{}_k - f'(S)^k{}_k\,\delta_l{}^k$    (41)

$= \left[f'(S) + f'(S)^{\mathsf T} - f'(S)\odot I\right]^k{}_l$    (42)

$= \left[2f'(S) - f'(S)\odot I\right]^k{}_l$    (43)

where $f'(S)^i{}_j \equiv \partial f(S)/\partial S_i{}^j$, and the last line uses the symmetry of $f'(S)$.
If we want the differential, we must sum over all the unique elements (and no more):

$df(S) = \sum_{i,\,j\le i}\left[2f'(S) - f'(S)\odot I\right]^i{}_j\, dS_i{}^j$    (44)
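A numerical sketch of (43), using $f(S) = \log\det S$ as an assumed example (its naive derivative is $f'(S) = S^{-1}$) and perturbing each unique entry of $S$ once:

```python
import numpy as np

# Sketch (function and names assumed): for f(S) = log det(S) with S symmetric,
# the structure-aware derivative should be 2 f'(S) - f'(S) * I  (eq. 43).
# We check element-wise by perturbing each unique entry S_kl = S_lk together.
rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)          # symmetric positive definite

f = lambda X: np.linalg.slogdet(X)[1]
fprime = np.linalg.inv(S)            # naive (unstructured) derivative
sym_grad = 2 * fprime - fprime * np.eye(n)

eps = 1e-6
for k in range(n):
    for l in range(k, n):
        dS = np.zeros((n, n))
        dS[k, l] = dS[l, k] = eps    # perturb the symmetric pair (once if k == l)
        num = (f(S + dS) - f(S - dS)) / (2 * eps)
        assert np.isclose(num, sym_grad[k, l], atol=1e-5)
```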
Here's a concrete example where people forget the above: imagine finding the ML covariance matrix of a multivariate Gaussian distribution, given some data. The likelihood is given by:

$P(\{x_n\}_{n=1}^N) = \prod_n \det(2\pi\Sigma)^{-1/2}\exp\left(-\tfrac{1}{2}(x_n-\mu)^{\mathsf T}\Sigma^{-1}(x_n-\mu)\right)$    (45)

$\log P(\{x_n\}_{n=1}^N) = -\dfrac{N}{2}\left[D\log(2\pi) - \log\det(\Sigma^{-1}) + \mathrm{tr}\left(\Sigma^{-1}X\right)\right]$    (46)

where we have introduced $X = \frac{1}{N}\sum_n (x_n-\mu)(x_n-\mu)^{\mathsf T}$. Now we differentiate this wrt. $\Sigma^{-1}$:
$\dfrac{\partial \log P(\{x_n\}_{n=1}^N)}{\partial \Sigma^{-1}} = -\dfrac{N}{2}\left[-\dfrac{\partial \log\det(\Sigma^{-1})}{\partial \Sigma^{-1}} + \dfrac{\partial\,\mathrm{tr}\left(\Sigma^{-1}X\right)}{\partial \Sigma^{-1}}\right]$    (47)

$= -\dfrac{N}{2}\left[-2\Sigma + \Sigma\odot I + 2X - X\odot I\right]$    (48)
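A finite-difference sketch of (48) (data, names and sizes assumed), perturbing the unique entries of the symmetric precision matrix:

```python
import numpy as np

# Sketch (all names assumed): with Lam = Sigma^{-1} symmetric, perturbing the
# unique entries of Lam should give, per eq. (48),
#   d logP / dLam_{kl} = -N/2 [ -2 Sigma + Sigma*I + 2 X - X*I ]_{kl}.
rng = np.random.default_rng(4)
D, N = 3, 10
data = rng.standard_normal((N, D))
mu = data.mean(axis=0)
X = (data - mu).T @ (data - mu) / N                 # <(x - mu)(x - mu)^T>

M = rng.standard_normal((D, D))
Lam = M @ M.T + D * np.eye(D)                       # a symmetric precision matrix

def logP(L):
    return -N / 2 * (D * np.log(2 * np.pi) - np.linalg.slogdet(L)[1] + np.trace(L @ X))

Sigma = np.linalg.inv(Lam)
analytic = -N / 2 * (-2 * Sigma + Sigma * np.eye(D) + 2 * X - X * np.eye(D))

eps = 1e-6
for k in range(D):
    for l in range(k, D):
        dL = np.zeros((D, D)); dL[k, l] = dL[l, k] = eps
        num = (logP(Lam + dL) - logP(Lam - dL)) / (2 * eps)
        assert np.isclose(num, analytic[k, l], atol=1e-4)
```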
These are the correct derivatives. Since $-2\Sigma + \Sigma\odot I + 2X - X\odot I = 0$ implies $\Sigma = X$, people can recover $\Sigma = X$ without using the symmetrised expressions. However, if we wanted to do gradient ascent here, then we should use the following updates:
$(\Sigma^{-1}_{n+1})_i{}^j = (\Sigma^{-1}_{n})_i{}^j + \eta_i{}^j\left[2\Sigma - \Sigma\odot I - 2X + X\odot I\right]_i{}^j$    (49)

where $\eta$ is an upper-triangular matrix. Most people would use:

$(\Sigma^{-1}_{n+1})_i{}^j = (\Sigma^{-1}_{n})_i{}^j + \eta\left[\Sigma - X\right]_i{}^j$    (50)
If we initialise with a symmetric $\Sigma^{-1}_0$ then the incorrect procedure will walk us along the manifold of symmetric matrices towards the ML solution. You might think numerical errors could, in principle, step us off the manifold of symmetric matrices: after all, $(\Sigma^{-1})_i{}^j X_j{}^i$ is invariant so long as $(\Sigma^{-1})_i{}^j + (\Sigma^{-1})_j{}^i = \mathrm{const}$. There appears to be no pressure to keep $\Sigma^{-1}$ symmetric. Things actually turn out to be worse than this. Eq. (45) is not actually a correct expression for a Gaussian distribution if $\Sigma^{-1}$ is not symmetric. Specifically, the normalising constant should be computed from the symmetrised precision $\tfrac{1}{2}[\Sigma^{-1} + \Sigma^{-\mathsf T}]$. If you don't use this symmetrised form, the manifold of symmetric matrices lies along a minimum and the ML solution is a saddle point (see Fig. 1). For this reason, it seems prudent to use the correct gradients when doing gradient ascent, and to remember to normalise correctly.
Figure 1: a. The expected cost function is symmetric, and the maximum is a ridge. b. The cost function with the incorrect normaliser is not symmetric and the ML parameter values correspond to a saddle point.

5 Relation to the vec and Kronecker product operators

The vec, Kronecker product, vec-transpose, and reshape operators shuffle tensors so they can be represented in arrays of different dimensionalities (and sizes). They are most intuitively defined by visual examples, but useful results can be proved using a tensor representation that always involves a tensor product between the object we are transforming and an indicator tensor.

In this section our tensor algebra does not need to deal with entry-wise products etc. Therefore, to aid the clarity of the presentation we use the more usual suffix notation. The results presented here are easy to generalise using the previous framework (as is needed to relate the entry-wise and Kronecker products, and diag, for example), but the result is less aesthetic.
5.1 Vec
The vec operator lets you represent a matrix as a vector, by stacking
the columns. For example:
$\mathrm{vec}\begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{pmatrix} = \begin{pmatrix} x_{11} \\ x_{21} \\ x_{12} \\ x_{22} \\ x_{13} \\ x_{23} \end{pmatrix}$    (51)
The tensor representation of this operator is:
$x_i = V_{iab}\, X_{ab}$    (52)

$V_{iab} = \delta_{i,\,a+(b-1)A}$    (53)

where $X$ is an $A \times B$ matrix.
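In NumPy terms (a small assumed example), vec is simply column-major flattening:

```python
import numpy as np

# Sketch (values assumed): vec stacks columns, which is NumPy's column-major
# (order='F') flattening.
X = np.array([[11, 12, 13],
              [21, 22, 23]])
vec_X = X.flatten(order='F')        # -> [11, 21, 12, 22, 13, 23]

assert np.array_equal(vec_X, np.array([11, 21, 12, 22, 13, 23]))
```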
5.2 Kronecker Product
The Kronecker product operator ($\otimes$) lets you represent the outer product of two matrices (a 4th-order tensor) as a matrix:

$\begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix} \otimes Y = \begin{pmatrix} x_{11}Y & x_{12}Y \\ x_{21}Y & x_{22}Y \end{pmatrix}$    (54)

Alternatively, written as a tensor, we have:

$Z_{ij} = K_{ijabcd}\, X_{ab}\, Y_{cd}$    (55)

$K_{ijabcd} = \delta_{i,\,c+(a-1)C}\,\delta_{j,\,d+(b-1)D}$    (56)

where $Y$ is a $C \times D$ matrix.
Examples. The important result relating the vec and Kronecker products is:

$(C^{\mathsf T} \otimes A)\,\mathrm{vec}(B) = \mathrm{vec}(ABC)$    (57)

which can be proved using the definitions above:

$[(C^{\mathsf T} \otimes A)\,\mathrm{vec}(B)]_i = K_{ijabcd}\, C_{ba}\, A_{cd}\, V_{jef}\, B_{ef}$    (58)

$= \delta_{i,\,c+(a-1)C}\,\delta_{j,\,d+(b-1)D}\, C_{ba}\, A_{cd}\,\delta_{j,\,e+(f-1)E}\, B_{ef}$    (59)

$= \delta_{i,\,c+(a-1)C}\,\delta_{d,e}\,\delta_{b,f}\, C_{ba}\, A_{cd}\, B_{ef}$    (60)

$= \delta_{i,\,c+(a-1)C}\, C_{ba}\, A_{cd}\, B_{db}$    (61)

$= V_{ica}\, A_{cd}\, B_{db}\, C_{ba}$    (62)

$= [\mathrm{vec}(ABC)]_i$    (63)

where the step from (59) to (60) sums over $j$ and uses $D = E$.
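A quick numerical check of (57) (shapes assumed), using NumPy's kron and column-major flattening for vec:

```python
import numpy as np

# Numerical sketch of eq. (57): (C^T kron A) vec(B) = vec(ABC).
rng = np.random.default_rng(7)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

lhs = np.kron(C.T, A) @ B.flatten(order='F')        # vec(B) = column stacking
rhs = (A @ B @ C).flatten(order='F')                # vec(ABC)

assert np.allclose(lhs, rhs)
```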
5.3 vec-transpose
This is not the same as reshape in MATLAB, despite what Tom Minka claims. complete this section
5.4 reshape
Reshape generalises the vec operator. It allows us to handle high-dimensional objects easily. For example, by remapping tensors into matrices we can form tensor inverses.

We want to be able to map an array of size $A \times B \times C \times \ldots$ with in total $ABC\ldots = N$ elements into another array of $IJK\ldots = N$ elements. To do this we need to specify where each of the elements in the old array appears in the new array. There are lots of choices for how we lay down the elements in the new array. It would be useful to do this systematically.

One way to do this is to come up with a systematic method for numbering each element in an array, and then we could lay down elements in the new array such that they will be assigned the same number. A systematic numbering system can be constructed as follows.

In a normal counting system, we choose a base $B$ (e.g. 10, 2). We can uniquely form integer numbers as a sum of up to $B-1$ 1s, $B$s, $B^2$s and so on, e.g. $B = 10$: $954 = 9 \times 100 + 5 \times 10 + 4 \times 1$; $B = 2$: $13 = 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 1$. The coefficients are the representation of the number. Let's define a new counting system which has a non-constant base. The last number $i$ will again represent the number of ones and will take values from 0 to $I-1$, the second number $j$ represents the number of $I$s and takes values from 0 to $J-1$, the third number $k$ represents the number of $I \cdot J$s and takes values from 0 to $K-1$, and so on. E.g. using $\{I, J, K\} = \{2, 3, 4\}$ the number 21 is $3 \times 6 + 1 \times 2 + 1 \times 1$, and we can represent $I \cdot J \cdot K = 24$ numbers this way. Now if we associate $i+1, j+1, k+1, \ldots$ with the position in a multi-dimensional array, we have assigned all points in that array a unique integer number that forms a sequence.
$T_{i,j,k,\ldots} = R^{I,J,K,\ldots;\,A,B,C,\ldots}_{i,j,k,\ldots,\,a,b,c,\ldots}\, S_{a,b,c,\ldots}$    (64)

$\;\;= \mathrm{reshape}(S, [I,J,K,\ldots])$    (65)

$R^{I,J,K,\ldots;\,A,B,C,\ldots}_{i,j,k,\ldots,\,a,b,c,\ldots} = \delta_{a-1+(b-1)A+(c-1)AB+\ldots,\;\, i-1+(j-1)I+(k-1)IJ+\ldots}$    (66)
5.4.1 Examples:
Here are some examples and useful results. From this definition it is simple to show that reshaping a tensor, and then reshaping it back to its original size, returns all the entries to their original positions:

$T_{a',b',c',\ldots} = R^{A,B,C,\ldots;\,I,J,K,\ldots}_{a',b',c',\ldots,\,i,j,k,\ldots}\, R^{I,J,K,\ldots;\,A,B,C,\ldots}_{i,j,k,\ldots,\,a,b,c,\ldots}\, S_{a,b,c,\ldots}$    (67)

$= \delta_{a'-1+(b'-1)A+(c'-1)AB+\ldots,\;\, i-1+(j-1)I+(k-1)IJ+\ldots}\;\delta_{a-1+(b-1)A+(c-1)AB+\ldots,\;\, i-1+(j-1)I+(k-1)IJ+\ldots}\; S_{a,b,c,\ldots}$    (68)

$= \delta_{a'-1+(b'-1)A+(c'-1)AB+\ldots,\;\, a-1+(b-1)A+(c-1)AB+\ldots}\; S_{a,b,c,\ldots}$    (69)

$= \delta_{a',a}\,\delta_{b',b}\,\delta_{c',c}\cdots\, S_{a,b,c,\ldots}$    (70)

$= S_{a',b',c',\ldots}$    (71)
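In NumPy the numbering $a + (b-1)A + (c-1)AB + \ldots$ corresponds to column-major (Fortran) ordering, so reshape(..., order='F') plays the role of this operator; here is a minimal sketch of the round trip (shapes assumed):

```python
import numpy as np

# Sketch (shapes assumed): reshape and reshape back, using column-major
# ordering to match the numbering scheme above.
rng = np.random.default_rng(6)
S = rng.standard_normal((2, 3, 4))

T = S.reshape((4, 6), order='F')            # reshape(S, [I, J])
S_back = T.reshape((2, 3, 4), order='F')    # reshape back to [A, B, C]

assert np.allclose(S_back, S)
```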
This result should be obvious. The way we introduced the reshape operator was via a numbering system that was unique for all arrays of a given shape. This means that if we reshape an array and reshape it again, the result must be equivalent to reshaping directly: intermediate reshapings cannot affect anything.
A problem where the reshape operator is useful arises in multi-linear models. There we have to solve:

$\Gamma_{d,i,j,\ldots} = g_{d,a,b,\ldots}\,\Psi_{a,b,\ldots,i,j,\ldots}$    (72)

where we want $g_{d,a,b,\ldots}$ and we know $\Gamma_{d,i,j,\ldots}$ and $\Psi_{a,b,\ldots,i,j,\ldots}$. The dimensionalities are $I = A$, $J = B$, .... The solution amounts to finding the inverse of $\Psi_{a,b,\ldots,i,j,\ldots}$. Reshaping the left and right hand sides into $[D, Q = I \cdot J \cdots]$ matrices we have:
$R^{D,Q;\,D,I,J,\ldots}_{e,q,\,d,i,j,\ldots}\,\Gamma_{d,i,j,\ldots} = R^{D,Q;\,D,I,J,\ldots}_{e,q,\,d,i,j,\ldots}\, g_{d,a,b,\ldots}\,\Psi_{a,b,\ldots,i,j,\ldots}$    (73)

$= \delta_{e+(q-1)D,\;\, d+(i-1)D+(j-1)DI+\ldots}\;\, g_{d,a,b,\ldots}\,\Psi_{a,b,\ldots,i,j,\ldots}$    (74)

$= \delta_{e,d}\,\delta_{q,\,i+(j-1)I+\ldots}\; g_{d,a,b,\ldots}\,\Psi_{a,b,\ldots,i,j,\ldots}$    (75)

$= g_{e,a,b,\ldots}\,\delta_{q,\,i+(j-1)I+\ldots}\,\Psi_{a,b,\ldots,i,j,\ldots}$    (76)
Letting $X = \mathrm{reshape}(\Gamma, [D,Q])$, we replace the tensor product on the RHS with a matrix product using the reshape operator again:
$X_{e,q} = g_{e,a,b,\ldots}\,\delta_{q,\,i+(j-1)I+\ldots}\,\delta_{a,a'}\,\delta_{b,b'}\cdots\,\Psi_{a',b',\ldots,i,j,\ldots}$    (77)

$= g_{e,a,b,\ldots}\,\delta_{q,\,i+(j-1)I+\ldots}\,R^{A,B,\ldots;\,Q}_{a,b,\ldots,\,g}\,R^{Q;\,A,B,\ldots}_{g,\,a',b',\ldots}\,\Psi_{a',b',\ldots,i,j,\ldots}$    (78)

$= \left[R^{D,Q;\,D,A,B,\ldots}_{e,g,\,e',a,b,\ldots}\, g_{e',a,b,\ldots}\right]\left[R^{Q,Q;\,A,B,\ldots,I,J,\ldots}_{g,q,\,a',b',\ldots,i,j,\ldots}\,\Psi_{a',b',\ldots,i,j,\ldots}\right]$    (79, 80)

$= \mathrm{reshape}(g, [D,Q])_{e,g}\;\mathrm{reshape}(\Psi, [Q,Q])_{g,q}$    (81)
Letting $Y = \mathrm{reshape}(\Psi, [Q,Q])$ the solution is:

$g_{d,a,b,\ldots} = \mathrm{reshape}(XY^{-1}, [D,A,B,\ldots])_{d,a,b,\ldots}$    (82)
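Here is a small NumPy sketch of this recipe (all sizes and names are assumed): build a random $g$ and $\Psi$, form $\Gamma$, and recover $g$ via (82).

```python
import numpy as np

# Sketch of eq. (82) (array names/sizes assumed): recover g from
#   Gamma_{d,i,j} = sum_{a,b} g_{d,a,b} Psi_{a,b,i,j}
# by reshaping to matrices.  The numbering a + (b-1)A + ... is column-major,
# i.e. NumPy's order='F'.
rng = np.random.default_rng(5)
D, A, B = 2, 3, 4
I, J = A, B                        # dimensionalities match, as in the text
Q = I * J

g_true = rng.standard_normal((D, A, B))
Psi = rng.standard_normal((A, B, I, J))
Gamma = np.einsum('dab,abij->dij', g_true, Psi)

X = Gamma.reshape((D, Q), order='F')
Y = Psi.reshape((Q, Q), order='F')
g_rec = (X @ np.linalg.inv(Y)).reshape((D, A, B), order='F')

assert np.allclose(g_rec, g_true)
```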