
Some Important Properties for Matrix Calculus

Dawen Liang
Carnegie Mellon University
[email protected]

1 Introduction
Matrix computation plays an essential role in many machine learning algorithms, and matrix calculus is among the most commonly used tools. In this note we show that the familiar properties of differential calculus all carry over to matrix calculus¹. At the end, an example on least-squares linear regression is presented.

2 Notation
A matrix is represented by a bold upper-case letter, e.g. X, where the subscripts in X_{m,n} indicate that the matrix has m rows and n columns. A vector is represented by a bold lower-case letter, e.g. x, and in this note is always an n × 1 column vector. An important concept for an n × n matrix A_{n,n} is the trace Tr(A), which is defined as the sum of the diagonal elements:
\mathrm{Tr}(A) = \sum_{i=1}^{n} A_{ii} \qquad (1)

where A_{ii} denotes the element at the ith row and ith column.

3 Properties
The derivative of a function with respect to a matrix is usually referred to as the gradient, denoted ∇. Consider a function f : R^{m×n} → R; the gradient of f(A) w.r.t. A_{m,n} is:
\nabla_A f(A) = \frac{\partial f(A)}{\partial A} =
\begin{bmatrix}
  \frac{\partial f}{\partial A_{11}} & \frac{\partial f}{\partial A_{12}} & \cdots & \frac{\partial f}{\partial A_{1n}} \\
  \frac{\partial f}{\partial A_{21}} & \frac{\partial f}{\partial A_{22}} & \cdots & \frac{\partial f}{\partial A_{2n}} \\
  \vdots & \vdots & \ddots & \vdots \\
  \frac{\partial f}{\partial A_{m1}} & \frac{\partial f}{\partial A_{m2}} & \cdots & \frac{\partial f}{\partial A_{mn}}
\end{bmatrix}

This definition is very similar to the ordinary derivative, so a few simple properties hold (the matrix A below is a square matrix with the same dimension as the vectors):

\nabla_x\, b^T A x = b^T A \qquad (2)
¹ Some of the detailed derivations which are omitted in this note can be found at http://www.cs.berkeley.edu/~jduchi/projects/matrix_prop.pdf

\nabla_A\, XAY = Y^T X^T \qquad (3)
\nabla_x\, x^T A x = A x + A^T x \qquad (4)
\nabla_{A^T} f(A) = (\nabla_A f(A))^T \qquad (5)
where superscript T denotes the transpose of a matrix or a vector.
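
As an aside (not part of the original note), Eq. 4 can be sanity-checked numerically with finite differences. The Python/NumPy sketch below uses an arbitrary random square matrix, an arbitrary test point, and a hand-picked step size eps; it is only an illustration, not a derivation.

import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # square matrix, as required for Eq. 4
x = rng.standard_normal(n)

f = lambda v: v @ A @ v           # f(x) = x^T A x, a scalar

# Central finite-difference approximation of the gradient at x
eps = 1e-6
num_grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(n)])

closed_form = A @ x + A.T @ x     # Eq. 4: Ax + A^T x
print(np.allclose(num_grad, closed_form, atol=1e-5))  # True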
Now let us turn to the derivative of the trace. First of all, a few useful properties of the trace itself:
\mathrm{Tr}(A) = \mathrm{Tr}(A^T) \qquad (6)
\mathrm{Tr}(ABC) = \mathrm{Tr}(BCA) = \mathrm{Tr}(CAB) \qquad (7)
\mathrm{Tr}(A + B) = \mathrm{Tr}(A) + \mathrm{Tr}(B) \qquad (8)
which are all easily derived. Note that the second property (cyclic invariance) can be extended to a product of an arbitrary number of matrices; a quick numerical illustration is sketched below.
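
As a quick illustration (not in the original note), the sketch below checks the cyclic property (Eq. 7) numerically with arbitrarily chosen non-square random matrices whose shapes make all three products square:

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

t1 = np.trace(A @ B @ C)   # Tr(ABC), a 2x2 product
t2 = np.trace(B @ C @ A)   # Tr(BCA), a 3x3 product
t3 = np.trace(C @ A @ B)   # Tr(CAB), a 4x4 product
print(np.isclose(t1, t2) and np.isclose(t2, t3))  # True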
Thus, for the derivatives,
\nabla_A \mathrm{Tr}(AB) = B^T \qquad (9)
Proof:
Simply expand Tr(AB) according to the trace definition (Eq. 1) and differentiate element by element.
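
For readers who want a concrete check of Eq. 9, here is a minimal finite-difference sketch; the shapes, the random seed, and the step size are arbitrary choices, not taken from the note:

import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))   # shapes chosen so that AB is square

f = lambda A: np.trace(A @ B)

# Element-wise central differences for the matrix gradient
eps = 1e-6
num_grad = np.zeros_like(A)
for i in range(m):
    for j in range(n):
        E = np.zeros((m, n))
        E[i, j] = eps
        num_grad[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

print(np.allclose(num_grad, B.T, atol=1e-5))  # Eq. 9: the gradient is B^T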

\nabla_A \mathrm{Tr}(ABA^T C) = CAB + C^T A B^T \qquad (10)

Proof:

\begin{aligned}
\nabla_A \mathrm{Tr}(ABA^T C)
&= \nabla_A \mathrm{Tr}\big(\underbrace{(AB)}_{u(A)}\,\underbrace{(A^T C)}_{v(A^T)}\big) \\
&= \nabla_{A:u(A)} \mathrm{Tr}\big(u(A)\,v(A^T)\big) + \nabla_{A:v(A^T)} \mathrm{Tr}\big(u(A)\,v(A^T)\big) \\
&= \big(v(A^T)\big)^T \nabla_A u(A) + \Big(\nabla_{A^T:v(A^T)} \mathrm{Tr}\big(u(A)\,v(A^T)\big)\Big)^T \\
&= C^T A B^T + \big((u(A))^T \nabla_{A^T} v(A^T)\big)^T \\
&= C^T A B^T + (B^T A^T C^T)^T \\
&= CAB + C^T A B^T
\end{aligned}

Here we make use of the product rule for derivatives: (u(x)v(x))' = u'(x)v(x) + u(x)v'(x). The notation ∇_{A:u(A)} means taking the derivative with respect to A only through u(A); the same applies to ∇_{A^T:v(A^T)}. The chain rule is also used. Note that the conversion from ∇_{A:v(A^T)} to ∇_{A^T:v(A^T)} is based on Eq. 5.
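
The same element-wise finite-difference scheme gives a quick numerical check of Eq. 10; this is only an illustrative sketch with arbitrary random square matrices of a common size:

import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))

f = lambda A: np.trace(A @ B @ A.T @ C)

eps = 1e-6
num_grad = np.zeros_like(A)
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        num_grad[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

closed_form = C @ A @ B + C.T @ A @ B.T   # Eq. 10: CAB + C^T A B^T
print(np.allclose(num_grad, closed_form, atol=1e-4))  # True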

4 An Example on Least-square Linear Regression


Now we will derive the solution of least-squares linear regression in matrix form, using the properties shown above. We know that least-squares linear regression has a closed-form solution (often referred to as the normal equation).

Assume we have N data points {x^{(i)}, y^{(i)}}_{1:N}, and the linear regression function h_θ(x) = θ^T x is parametrized by θ. We can arrange the data in matrix form:
X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(N)})^T \end{bmatrix}, \qquad
y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(N)} \end{bmatrix}

Thus the error can be represented as:

X\theta - y = \begin{bmatrix} h_\theta(x^{(1)}) - y^{(1)} \\ h_\theta(x^{(2)}) - y^{(2)} \\ \vdots \\ h_\theta(x^{(N)}) - y^{(N)} \end{bmatrix}

The squared error E(θ), following the usual scalar definition, is:


E(\theta) = \frac{1}{2} \sum_{i=1}^{N} \big(h_\theta(x^{(i)}) - y^{(i)}\big)^2

which is equivalent to the matrix form:


E(\theta) = \frac{1}{2} (X\theta - y)^T (X\theta - y)
Take the derivative:
\begin{aligned}
\nabla_\theta E(\theta)
&= \frac{1}{2} \nabla \underbrace{(X\theta - y)^T (X\theta - y)}_{1 \times 1 \text{ matrix, thus } \mathrm{Tr}(\cdot) = (\cdot)} \\
&= \frac{1}{2} \nabla \mathrm{Tr}\big(\theta^T X^T X \theta - y^T X \theta - \theta^T X^T y + y^T y\big) \\
&= \frac{1}{2} \Big(\nabla \mathrm{Tr}(\theta^T X^T X \theta) - \nabla \mathrm{Tr}(y^T X \theta) - \nabla \mathrm{Tr}(\theta^T X^T y)\Big) \\
&= \frac{1}{2} \Big(\nabla \mathrm{Tr}(\theta\, I\, \theta^T X^T X) - (y^T X)^T - X^T y\Big)
\end{aligned}
The constant term y^T y vanishes since it does not depend on θ. The first term can be computed using Eq. 10, with A = θ, B = I, and C = X^T X (note that in this case C = C^T). Plugging back into the derivation:
\begin{aligned}
\nabla_\theta E(\theta) &= \frac{1}{2}\big(X^T X \theta + X^T X \theta - 2 X^T y\big) \\
&= \frac{1}{2}\big(2 X^T X \theta - 2 X^T y\big)
\end{aligned}

Setting the gradient to 0:

X^T X \theta = X^T y \;\Longrightarrow\; \theta_{LS} = (X^T X)^{-1} X^T y

The normal equation is obtained in matrix form.
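
To connect the result back to code, the sketch below builds a small synthetic data set (the sizes, the true parameter vector, and the noise level are arbitrary choices) and compares the normal-equation solution with NumPy's built-in least-squares solver. Solving the linear system with np.linalg.solve, rather than forming the explicit inverse (X^T X)^{-1}, is a standard numerical choice.

import numpy as np

rng = np.random.default_rng(4)
N, d = 100, 3
X = rng.standard_normal((N, d))                      # rows are the (x^(i))^T
theta_true = np.array([2.0, -1.0, 0.5])
y = X @ theta_true + 0.1 * rng.standard_normal(N)    # noisy targets

# Normal equation: X^T X theta = X^T y
theta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine
theta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(theta_ls, theta_ref))   # True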
