Matrix Differentiation

1 Matrix-vector differentiation

1.1 Theory
To calculate most of the derivatives that arise in practice, a small table of standard derivatives and
conversion rules is sufficient. It is most convenient to work in terms of the «differential»: with it, you do not
have to think about intermediate dimensions, but can simply apply the standard rules.
Note: This section describes the matrix-vector differentiation technique itself. For a more detailed
description of the mathematical theory behind this technique, see section A.

Conversion rules:

d(A) = 0
d(αX) = α(dX)
d(AXB) = A(dX)B
d(X + Y) = dX + dY
d(Xᵀ) = (dX)ᵀ
d(XY) = (dX)Y + X(dY)
d⟨X, Y⟩ = ⟨dX, Y⟩ + ⟨X, dY⟩
d(X/ϕ) = (ϕ dX − (dϕ)X) / ϕ²

Table of standard derivatives:

d⟨A, X⟩ = ⟨A, dX⟩
d⟨Ax, x⟩ = ⟨(A + Aᵀ)x, dx⟩
d⟨Ax, x⟩ = 2⟨Ax, dx⟩ (if A = Aᵀ)
d(Det(X)) = Det(X)⟨X⁻ᵀ, dX⟩
d(X⁻¹) = −X⁻¹(dX)X⁻¹

Here A and B are fixed matrices, α is a fixed scalar, X and Y are arbitrary differentiable matrix functions (consistent
in dimensions so that all operations make sense), and ϕ is an arbitrary differentiable scalar function.
One of the most important rules is the composition (chain) rule. Let g(Y) and f(X) be two differentiable
functions whose differentials dg(Y) and df(X) are known. To calculate the differential of the
composition ϕ(X) := g(f(X)), just as in the scalar case, you need to:
• take the expression for the differential dg(Y);
• substitute the value f(X) in place of Y, and the value df(X) in place of dY.

Example
Consider the function ϕ(x) := ln⟨Ax, x⟩, where A ∈ Sⁿ₊₊. In this case,

g(y) := ln(y), dg(y) = dy/y; f(x) := ⟨Ax, x⟩, df(x) = 2⟨Ax, dx⟩.

Substituting into dg(y) the expression f(x) = ⟨Ax, x⟩ in place of y and the expression df(x) = 2⟨Ax, dx⟩
in place of dy, we obtain

dϕ(x) = 2⟨Ax, dx⟩ / ⟨Ax, x⟩ (in the notation with capital «D»: Dϕ(x)[h] = 2⟨Ax, h⟩ / ⟨Ax, x⟩).
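This formula is easy to sanity-check numerically. Below is a minimal sketch in Python with NumPy (the setup, names, and step size are our own choices, not from the text): it compares Dϕ(x)[h] = 2⟨Ax, h⟩/⟨Ax, x⟩ against a central finite difference of ϕ along a random direction h.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)          # a symmetric positive definite A (A in S^n_++)
    x = rng.standard_normal(n)
    h = rng.standard_normal(n)

    phi = lambda y: np.log(y @ A @ y)
    eps = 1e-6
    fd = (phi(x + eps * h) - phi(x - eps * h)) / (2 * eps)   # central difference
    exact = 2 * (A @ x) @ h / (x @ A @ x)                    # Dphi(x)[h] = 2<Ax,h>/<Ax,x>
    print(abs(fd - exact))                                   # tiny: the two agree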

Usually, all matrix-vector functions that arise in practice are composed from the table functions by standard
operations on them. Owing to the universality of the rules above, differentiating arbitrarily
complex functions of this kind becomes as easy as differentiating one-dimensional functions.
The resulting expression must eventually be reduced to one of the canonical forms:

• input scalar, output scalar: df(x) = f′(x)dx (f′(x): scalar; dx: scalar);
• input vector, output scalar: df(x) = ⟨∇f(x), dx⟩ (∇f(x): vector; dx: vector);
• input vector, output vector: df(x) = Jf(x)dx (Jf(x): matrix; dx: vector);
• input matrix, output scalar: df(X) = ⟨∇f(X), dX⟩ (∇f(X): matrix; dX: matrix).

The remaining input/output combinations will not interest us. The object ∇f(x) (a vector for a function of a vector argument
and a matrix for a function of a matrix argument) is called the gradient. The matrix Jf(x) is called the Jacobi
matrix.
You can find the second derivative of the function f(X) using the following «algorithm»:
◦ calculate the first derivative of the function; fix the increment dX in the expression for df(X) and
denote it by dX₁;
◦ calculate the derivative of the function g(X) = df(X), treating dX₁ as fixed (constant). The new
increment is denoted by dX₂.

Example
Let us return to the function ϕ(x) = ln⟨Ax, x⟩, where A ∈ Sⁿ₊₊. We have already calculated its first
derivative: dϕ(x) = 2⟨Ax, dx⟩/⟨Ax, x⟩. Denote dx by dx₁ and consider the new function

g(x) = 2⟨Ax, dx₁⟩ / ⟨Ax, x⟩.

Find the derivative of g(x), assuming that dx₁ is a constant vector:

d²ϕ(x) = d(2⟨Ax, dx₁⟩/⟨Ax, x⟩) = [d(2⟨Ax, dx₁⟩)⟨Ax, x⟩ − 2⟨Ax, dx₁⟩ d⟨Ax, x⟩] / ⟨Ax, x⟩²
= [2⟨A dx₁, dx₂⟩⟨Ax, x⟩ − 2⟨Ax, dx₁⟩ · 2⟨Ax, dx₂⟩] / ⟨Ax, x⟩²
= ⟨(2A/⟨Ax, x⟩ − 4Axxᵀ A/⟨Ax, x⟩²) dx₁, dx₂⟩.

(In the notation with capital D: D²ϕ(x)[h₁, h₂] = ⟨(2A/⟨Ax, x⟩ − 4Axxᵀ A/⟨Ax, x⟩²) h₁, h₂⟩.)

For the second derivative, the canonical form for a scalar function of a vector argument is

d²f(x) = ⟨∇²f(x) dx₁, dx₂⟩.

The matrix ∇²f(x) is called the Hessian. For twice continuously differentiable functions, the Hessian is a
symmetric matrix.
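Returning to the example ϕ(x) = ln⟨Ax, x⟩, the Hessian formula just derived can be checked against finite differences of the gradient. A minimal sketch, assuming NumPy (all names are our own):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)                        # A in S^n_++
    x = rng.standard_normal(n)

    # Gradient and claimed Hessian of phi(x) = ln<Ax, x> from the example above.
    grad = lambda y: 2 * A @ y / (y @ A @ y)
    q, Ax = x @ A @ x, A @ x
    H = 2 * A / q - 4 * np.outer(Ax, Ax) / q**2

    # Finite-difference Hessian: differentiate the gradient column by column.
    eps = 1e-6
    H_fd = np.column_stack([(grad(x + eps * e) - grad(x - eps * e)) / (2 * eps)
                            for e in np.eye(n)])
    print(np.max(np.abs(H - H_fd)))                    # small: the formulas agree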

1.2 Problems
Problem 1 (Quadratic function). Find the first and second derivatives df(x) and d²f(x), as well as the
gradient ∇f(x) and the Hessian ∇²f(x), of the function

f(x) := (1/2)⟨Ax, x⟩ − ⟨b, x⟩ + c, x ∈ Rⁿ,

where A ∈ Sⁿ, b ∈ Rⁿ, c ∈ R.

Solution. Find the first derivative:
 
df(x) = d((1/2)⟨Ax, x⟩ − ⟨b, x⟩ + c) = (1/2)d⟨Ax, x⟩ − d⟨b, x⟩ = (1/2) · 2⟨Ax, dx⟩ − ⟨b, dx⟩ = ⟨Ax − b, dx⟩.

Note that df (x) is already written in the canonical form df (x) = ⟨∇f (x), dx⟩, so

∇f (x) = Ax − b .

Now find the second derivative:

d²f(x) = d⟨Ax − b, dx₁⟩ = ⟨d(Ax − b), dx₁⟩ = ⟨d(Ax), dx₁⟩ = ⟨A dx₂, dx₁⟩.

To find the Hessian, we bring d²f(x) to the canonical form d²f(x) = ⟨∇²f(x) dx₁, dx₂⟩:

d²f(x) = ⟨A dx₁, dx₂⟩ ⇒ ∇²f(x) = A.
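A quick numerical check of both answers, as a NumPy sketch with random test data of our own. Since f is quadratic, the second central difference reproduces ⟨A dx₁, dx₂⟩ exactly up to rounding, for any step:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 6
    M = rng.standard_normal((n, n))
    A = (M + M.T) / 2                    # a symmetric A (A in S^n)
    b, x, h = rng.standard_normal((3, n))
    c = 1.7
    f = lambda y: 0.5 * y @ A @ y - b @ y + c

    eps = 1e-6
    fd1 = (f(x + eps * h) - f(x - eps * h)) / (2 * eps)
    print(abs(fd1 - (A @ x - b) @ h))    # gradient check: df(x) = <Ax - b, dx>

    t = 0.5                              # any step works: f is quadratic, so the
    fd2 = (f(x + t * h) - 2 * f(x) + f(x - t * h)) / t**2   # 2nd difference is exact
    print(abs(fd2 - h @ A @ h))          # Hessian check: d^2 f = <A dx1, dx2>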

Problem 2. Find the first and second derivatives df(x) and d²f(x), as well as the gradient ∇f(x) and
the Hessian ∇²f(x), of the function

f(x) := (1/2)∥Ax − b∥₂², x ∈ Rⁿ,

where A ∈ Rᵐˣⁿ, b ∈ Rᵐ.
Solution. Find the first derivative:
 
df(x) = d((1/2)∥Ax − b∥₂²) = {d(∥x∥₂²) = d⟨x, x⟩ = 2⟨x, dx⟩} = (1/2) · 2⟨Ax − b, d(Ax − b)⟩ = ⟨Ax − b, A dx⟩.

To find the gradient, we bring df(x) to the canonical form df(x) = ⟨∇f(x), dx⟩:

df(x) = ⟨Aᵀ(Ax − b), dx⟩ ⇒ ∇f(x) = Aᵀ(Ax − b).

Now find the second derivative:

d²f(x) = d⟨Ax − b, A dx₁⟩ = ⟨d(Ax − b), A dx₁⟩ = ⟨A dx₂, A dx₁⟩ = ⟨dx₂, AᵀA dx₁⟩.

To find the Hessian, we bring d²f(x) to the canonical form d²f(x) = ⟨∇²f(x) dx₁, dx₂⟩:

d²f(x) = ⟨AᵀA dx₁, dx₂⟩ ⇒ ∇²f(x) = AᵀA.
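As before, a small NumPy sketch (our own setup, not part of the text) confirms both formulas; the second difference is again exact because f is quadratic:

    import numpy as np

    rng = np.random.default_rng(3)
    m, n = 7, 4
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)
    x, h = rng.standard_normal((2, n))
    f = lambda y: 0.5 * np.sum((A @ y - b) ** 2)       # (1/2)||Ay - b||_2^2

    eps = 1e-6
    fd1 = (f(x + eps * h) - f(x - eps * h)) / (2 * eps)
    print(abs(fd1 - A.T @ (A @ x - b) @ h))            # grad f(x) = A^T(Ax - b)

    t = 1.0
    fd2 = (f(x + t * h) - 2 * f(x) + f(x - t * h)) / t**2
    print(abs(fd2 - h @ (A.T @ A) @ h))                # Hess f(x) = A^T A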

Problem 3 (The cube of the Euclidean norm). Find the first and second derivatives df(x) and d²f(x),
as well as the gradient ∇f(x) and the Hessian ∇²f(x), of the function

f(x) := (1/3)∥x∥₂³, x ∈ Rⁿ.
Solution. Find the first derivative:
 
df(x) = d((1/3)∥x∥₂³) = (1/3) d⟨x, x⟩^{3/2} = (1/3) · (3/2)⟨x, x⟩^{1/2} d⟨x, x⟩ = (1/2)∥x∥₂ (2⟨x, dx⟩) = ∥x∥₂⟨x, dx⟩.

To find the gradient, we bring df(x) to the canonical form df(x) = ⟨∇f(x), dx⟩:

df(x) = ⟨∥x∥₂ x, dx⟩ ⇒ ∇f(x) = ∥x∥₂ x.

Now find the second derivative:

d²f(x) = d(∥x∥₂ ⟨x, dx₁⟩) = d(∥x∥₂)⟨x, dx₁⟩ + ∥x∥₂ d⟨x, dx₁⟩, where d(∥x∥₂) = d(⟨x, x⟩^{1/2});
= (1/2)⟨x, x⟩^{−1/2} (2⟨x, dx₂⟩)⟨x, dx₁⟩ + ∥x∥₂⟨dx₂, dx₁⟩
= ∥x∥₂⁻¹⟨x, dx₂⟩⟨x, dx₁⟩ + ∥x∥₂⟨dx₂, dx₁⟩.

To find the Hessian, we bring d²f(x) to the canonical form d²f(x) = ⟨∇²f(x) dx₁, dx₂⟩:

d²f(x) = ∥x∥₂⁻¹⟨dx₁, x⟩⟨x, dx₂⟩ + ∥x∥₂⟨dx₁, dx₂⟩ = ⟨(∥x∥₂⁻¹ xxᵀ + ∥x∥₂ Iₙ) dx₁, dx₂⟩
⇒ ∇²f(x) = ∥x∥₂⁻¹ xxᵀ + ∥x∥₂ Iₙ.

Note that the resulting formula for the Hessian (and for the second derivative) is valid only for x ≠ 0, since the value
∥x∥₂⁻¹ is undefined at x = 0. This restriction arose because at the very beginning we used the product rule, which
involved the derivative d(∥x∥₂), and that derivative does not exist at the point x = 0. Nevertheless, it can be shown that the function
f in question is everywhere twice continuously differentiable, and its second derivative at the point x = 0 is zero.
Thus, the derived formula is, in fact, true for all values of x, with the caveat that at the point
x = 0 the value ∥x∥₂⁻¹x must be understood as 0 (its limit as x → 0).
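For x ≠ 0, the Hessian formula can be verified by differencing the gradient ∇f(x) = ∥x∥₂x column by column. A minimal sketch, assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 5
    x = rng.standard_normal(n)
    nx = np.linalg.norm(x)
    grad = lambda y: np.linalg.norm(y) * y             # grad f(x) = ||x||_2 x
    H = np.outer(x, x) / nx + nx * np.eye(n)           # claimed Hessian

    eps = 1e-6
    H_fd = np.column_stack([(grad(x + eps * e) - grad(x - eps * e)) / (2 * eps)
                            for e in np.eye(n)])
    print(np.max(np.abs(H - H_fd)))                    # small: formula confirmed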

Problem 4 (Euclidean norm). Find the first and second derivatives df(x) and d²f(x), as well as the
gradient ∇f(x) and the Hessian ∇²f(x), of the function

f(x) := ∥x∥₂, x ∈ Rⁿ \ {0}.

Solution. Find the first derivative:


df(x) = d(∥x∥₂) = d(⟨x, x⟩^{1/2}) = (1/2)⟨x, x⟩^{−1/2} d⟨x, x⟩ = (1/2)∥x∥₂⁻¹ · 2⟨x, dx⟩ = ∥x∥₂⁻¹⟨x, dx⟩.
To find the gradient, we bring df(x) to the canonical form df(x) = ⟨∇f(x), dx⟩:

df(x) = ⟨∥x∥₂⁻¹ x, dx⟩ ⇒ ∇f(x) = ∥x∥₂⁻¹ x.

Now find the second derivative:

d²f(x) = d(∥x∥₂⁻¹⟨x, dx₁⟩) = d(∥x∥₂⁻¹)⟨x, dx₁⟩ + ∥x∥₂⁻¹ d⟨x, dx₁⟩
= −∥x∥₂⁻² d(∥x∥₂)⟨x, dx₁⟩ + ∥x∥₂⁻¹⟨dx₂, dx₁⟩
= −∥x∥₂⁻² (∥x∥₂⁻¹⟨x, dx₂⟩)⟨x, dx₁⟩ + ∥x∥₂⁻¹⟨dx₂, dx₁⟩
= ∥x∥₂⁻¹⟨dx₂, dx₁⟩ − ∥x∥₂⁻³⟨x, dx₂⟩⟨x, dx₁⟩.

To find the Hessian, we bring d²f(x) to the canonical form d²f(x) = ⟨∇²f(x) dx₁, dx₂⟩:

d²f(x) = ∥x∥₂⁻¹(⟨dx₁, dx₂⟩ − ∥x∥₂⁻²⟨dx₁, x⟩⟨x, dx₂⟩) = ⟨∥x∥₂⁻¹(Iₙ − ∥x∥₂⁻² xxᵀ) dx₁, dx₂⟩
⇒ ∇²f(x) = ∥x∥₂⁻¹(Iₙ − ∥x∥₂⁻² xxᵀ).
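A small NumPy check of the Hessian formula (our own sketch); the last line also illustrates that ∇²f(x)x = 0, i.e. the norm has zero curvature along the ray through x, since f is linear along that ray:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 5
    x = rng.standard_normal(n)
    nx = np.linalg.norm(x)
    H = (np.eye(n) - np.outer(x, x) / nx**2) / nx      # claimed Hessian of ||x||_2

    grad = lambda y: y / np.linalg.norm(y)
    eps = 1e-6
    H_fd = np.column_stack([(grad(x + eps * e) - grad(x - eps * e)) / (2 * eps)
                            for e in np.eye(n)])
    print(np.max(np.abs(H - H_fd)))                    # matches finite differences
    print(np.linalg.norm(H @ x))                       # ~0: H annihilates x itself

Geometrically, ∇²f(x) is ∥x∥₂⁻¹ times the orthogonal projector onto the hyperplane orthogonal to x.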

Problem 5 (Logistic function). Find the first and second derivatives df(x) and d²f(x), as well as the
gradient ∇f(x) and the Hessian ∇²f(x), of the function

f(x) := ln(1 + exp(⟨a, x⟩)), x ∈ Rⁿ,

where a ∈ Rⁿ.
Solution. Find the first derivative:
 
df(x) = d(ln(1 + exp(⟨a, x⟩))) = {d(ln(x)) = dx/x} = d(1 + exp(⟨a, x⟩)) / (1 + exp(⟨a, x⟩)) = d(exp(⟨a, x⟩)) / (1 + exp(⟨a, x⟩))
= {d(exp(x)) = exp(x)dx} = exp(⟨a, x⟩) d⟨a, x⟩ / (1 + exp(⟨a, x⟩)) = exp(⟨a, x⟩)⟨a, dx⟩ / (1 + exp(⟨a, x⟩)) = ⟨a, dx⟩ / (1 + exp(−⟨a, x⟩))
= σ(⟨a, x⟩)⟨a, dx⟩.

Here σ : R → R denotes the sigmoid function:

σ(x) := 1 / (1 + exp(−x)).

To find the gradient, we bring df(x) to the canonical form df(x) = ⟨∇f(x), dx⟩:

df(x) = ⟨σ(⟨a, x⟩)a, dx⟩ ⇒ ∇f(x) = σ(⟨a, x⟩)a.

So the gradient ∇f(x) is a vector collinear to the vector a, with the coefficient σ(⟨a, x⟩) ∈ (0, 1). Depending on the
point x, only the length of the vector ∇f(x) changes, not its direction.
Now find the second derivative:

d²f(x) = d(σ(⟨a, x⟩)⟨a, dx₁⟩) = d(σ(⟨a, x⟩))⟨a, dx₁⟩ = {d(σ(x)) = σ′(x)dx} = (σ′(⟨a, x⟩) d⟨a, x⟩)⟨a, dx₁⟩
= σ′(⟨a, x⟩)⟨a, dx₂⟩⟨a, dx₁⟩ = {σ′(x) = σ(x)(1 − σ(x))} = σ(⟨a, x⟩)(1 − σ(⟨a, x⟩))⟨a, dx₂⟩⟨a, dx₁⟩.

To find the Hessian, we bring d²f(x) to the canonical form d²f(x) = ⟨∇²f(x) dx₁, dx₂⟩:

d²f(x) = σ(⟨a, x⟩)(1 − σ(⟨a, x⟩))⟨dx₁, a⟩⟨a, dx₂⟩ = ⟨(σ(⟨a, x⟩)(1 − σ(⟨a, x⟩)) aaᵀ) dx₁, dx₂⟩
⇒ ∇²f(x) = σ(⟨a, x⟩)(1 − σ(⟨a, x⟩)) aaᵀ.

Note that the Hessian ∇²f(x) is a rank-one matrix proportional to aaᵀ, with the coefficient σ(⟨a, x⟩)(1 −
σ(⟨a, x⟩)) ∈ (0, 0.25). The point x affects only the proportionality coefficient.
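Both formulas are easy to verify numerically. A minimal sketch, assuming NumPy (names and step sizes are our own choices):

    import numpy as np

    rng = np.random.default_rng(6)
    n = 5
    a, x, h = rng.standard_normal((3, n))
    sigma = lambda s: 1.0 / (1.0 + np.exp(-s))
    f = lambda y: np.log1p(np.exp(a @ y))              # ln(1 + exp(<a, y>))

    eps = 1e-6
    fd1 = (f(x + eps * h) - f(x - eps * h)) / (2 * eps)
    print(abs(fd1 - sigma(a @ x) * (a @ h)))           # grad f = sigma(<a,x>) a

    s = sigma(a @ x)
    eps2 = 1e-4                                        # larger step for the 2nd difference
    fd2 = (f(x + eps2 * h) - 2 * f(x) + f(x - eps2 * h)) / eps2**2
    print(abs(fd2 - s * (1 - s) * (a @ h) ** 2))       # Hess f = s(1-s) aa^T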

Problem 6 (Logarithm of the determinant). Find the first and second derivatives df(X) and d²f(X), as
well as the gradient ∇f(X), of the function

f(X) := ln(Det(X)),

defined on the set Sⁿ₊₊ in the space Sⁿ.

Solution. Find the first derivative:

df(X) = d(ln Det(X)) = {d(ln(x)) = dx/x} = d(Det(X)) / Det(X) = Det(X)⟨X⁻¹, dX⟩ / Det(X) = ⟨X⁻¹, dX⟩.

Note that df(X) is already written in the canonical form df(X) = ⟨∇f(X), dX⟩ (in this example we are working
in the space of symmetric matrices Sⁿ, so the transpose sign on X⁻ᵀ can be omitted). So,

∇f(X) = X⁻¹.

Now find the second derivative:

d²f(X) = d⟨X⁻¹, dX₁⟩ = ⟨d(X⁻¹), dX₁⟩ = ⟨−X⁻¹(dX₂)X⁻¹, dX₁⟩ = −⟨X⁻¹(dX₂)X⁻¹, dX₁⟩.

The result is a bilinear form in the increments dX₁ and dX₂ in the matrix space.
Consider

D²f(X)[H, H] = −⟨X⁻¹HX⁻¹, H⟩.

We show that D²f(X)[H, H] ≤ 0 for all X ∈ Sⁿ₊₊ and H ∈ Sⁿ, i.e. that the function f is a
concave function. Indeed, decomposing X⁻¹ = X^{−1/2}X^{−1/2}, we rewrite D²f(X)[H, H] in the following
form:

D²f(X)[H, H] = −⟨X^{−1/2}HX^{−1/2}, X^{−1/2}HX^{−1/2}⟩ = −∥X^{−1/2}HX^{−1/2}∥_F².

From this it is clear that D²f(X)[H, H] is indeed nonpositive.
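A short NumPy sketch (our own, not from the text) checks the gradient against a finite difference of ln Det, and the sign of D²f(X)[H, H]:

    import numpy as np

    rng = np.random.default_rng(7)
    n = 4
    M = rng.standard_normal((n, n))
    X = M @ M.T + n * np.eye(n)                        # X in S^n_++
    H = rng.standard_normal((n, n)); H = (H + H.T) / 2 # a symmetric increment

    f = lambda Y: np.linalg.slogdet(Y)[1]              # ln Det(Y), computed stably
    eps = 1e-6
    fd = (f(X + eps * H) - f(X - eps * H)) / (2 * eps)
    Xi = np.linalg.inv(X)
    print(abs(fd - np.sum(Xi * H)))                    # <X^{-1}, H> = Tr(X^{-1} H)
    print(-np.trace(Xi @ H @ Xi @ H) <= 0)             # D^2 f(X)[H,H] <= 0: concavity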

Problem 7. Find the derivative df(X) and the gradient ∇f(X) of the function

f(X) := ∥AX − B∥_F, X ∈ Rᵏˣⁿ,

where A ∈ Rᵐˣᵏ, B ∈ Rᵐˣⁿ.


Solution. Calculate d(∥X∥_F) separately:

d(∥X∥_F) = d(⟨X, X⟩^{1/2}) = {d(x^{1/2}) = (1/2)x^{−1/2} dx} = (1/2)⟨X, X⟩^{−1/2} d⟨X, X⟩
= (1/2)∥X∥_F⁻¹ · 2⟨X, dX⟩ = ∥X∥_F⁻¹⟨X, dX⟩.

Now we use the resulting formula to find df(X):

df(X) = d(∥AX − B∥_F) = ∥AX − B∥_F⁻¹ ⟨AX − B, d(AX − B)⟩ = ∥AX − B∥_F⁻¹ ⟨AX − B, A dX⟩.

To find the gradient, we bring df(X) to the canonical form df(X) = ⟨∇f(X), dX⟩:

df(X) = ⟨∥AX − B∥_F⁻¹ Aᵀ(AX − B), dX⟩ ⇒ ∇f(X) = ∥AX − B∥_F⁻¹ Aᵀ(AX − B).
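A quick numerical check of the gradient formula, as a NumPy sketch with random data of our own choosing:

    import numpy as np

    rng = np.random.default_rng(8)
    m, k, n = 6, 4, 3
    A = rng.standard_normal((m, k))
    B = rng.standard_normal((m, n))
    X, H = rng.standard_normal((2, k, n))
    f = lambda Y: np.linalg.norm(A @ Y - B)            # Frobenius norm by default

    eps = 1e-6
    fd = (f(X + eps * H) - f(X - eps * H)) / (2 * eps)
    R = A @ X - B
    G = A.T @ R / np.linalg.norm(R)                    # claimed gradient
    print(abs(fd - np.sum(G * H)))                     # df(X) = <grad f(X), dX>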

Problem 8. Find the derivative df(X) and the gradient ∇f(X) of the function

f(X) := Tr(AXBX⁻¹), X ∈ Rⁿˣⁿ, Det(X) ≠ 0,

where A, B ∈ Rⁿˣⁿ.

Solution. For convenience, we rewrite the trace as a scalar product:

f(X) = ⟨Iₙ, AXBX⁻¹⟩.

Find the first derivative:

df(X) = d⟨Iₙ, AXBX⁻¹⟩ = ⟨Iₙ, d(AXBX⁻¹)⟩ = ⟨Iₙ, (d(AXB))X⁻¹ + (AXB) d(X⁻¹)⟩
= ⟨Iₙ, (A(dX)B)X⁻¹ + (AXB)(−X⁻¹(dX)X⁻¹)⟩ = ⟨Iₙ, A(dX)BX⁻¹ − AXBX⁻¹(dX)X⁻¹⟩.

To find the gradient, we bring df(X) to the canonical form df(X) = ⟨∇f(X), dX⟩:

df(X) = ⟨Iₙ, A(dX)BX⁻¹⟩ − ⟨Iₙ, AXBX⁻¹(dX)X⁻¹⟩
= ⟨AᵀX⁻ᵀBᵀ, dX⟩ − ⟨X⁻ᵀBᵀXᵀAᵀX⁻ᵀ, dX⟩
= ⟨AᵀX⁻ᵀBᵀ − X⁻ᵀBᵀXᵀAᵀX⁻ᵀ, dX⟩
⇒ ∇f(X) = AᵀX⁻ᵀBᵀ − X⁻ᵀBᵀXᵀAᵀX⁻ᵀ.
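The gradient formula can be checked numerically. A minimal NumPy sketch (the diagonal shift used to keep X invertible is our own choice):

    import numpy as np

    rng = np.random.default_rng(9)
    n = 4
    A, B, H = rng.standard_normal((3, n, n))
    X = rng.standard_normal((n, n)) + n * np.eye(n)    # almost surely invertible

    f = lambda Y: np.trace(A @ Y @ B @ np.linalg.inv(Y))
    Xit = np.linalg.inv(X).T
    G = A.T @ Xit @ B.T - Xit @ B.T @ X.T @ A.T @ Xit  # claimed gradient

    eps = 1e-6
    fd = (f(X + eps * H) - f(X - eps * H)) / (2 * eps)
    print(abs(fd - np.sum(G * H)))                     # df(X) = <grad f(X), dX>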

Problem 9. Consider the scalar-argument function

ϕ(α) := f(x + αp), α ∈ R,

where x, p ∈ Rⁿ and f : Rⁿ → R is a twice continuously differentiable function. Find the first and second
derivatives ϕ′(α) and ϕ″(α) and express them in terms of the gradient ∇f(·) and the Hessian ∇²f(·).
Solution. In this problem, one must keep in mind that the differentiation is performed with respect to α, while x is a
constant vector.
Find the first derivative:

dϕ(α) = d_α(f(x + αp)) = {df(x) = ⟨∇f(x), dx⟩} = ⟨∇f(x + αp), d_α(x + αp)⟩
= ⟨∇f(x + αp), (dα)p⟩ = ⟨∇f(x + αp), p⟩ dα.

Here the last equality follows from the fact that dα is a scalar. Note that we have brought dϕ(α) to the
canonical form dϕ(α) = ϕ′(α)dα. Hence,

ϕ′(α) = ⟨∇f(x + αp), p⟩.

Now find the second derivative:

d²ϕ(α) = d_α(⟨∇f(x + αp), p⟩ dα₁) = ⟨d_α(∇f(x + αp)), p⟩ dα₁ = {d(∇f(x)) = ∇²f(x)dx}
= ⟨∇²f(x + αp) d_α(x + αp), p⟩ dα₁ = ⟨∇²f(x + αp)(dα₂)p, p⟩ dα₁
= ⟨∇²f(x + αp)p, p⟩ dα₁ dα₂.

Thus, from the canonical form d²ϕ(α) = ϕ″(α) dα₁ dα₂, we get

ϕ″(α) = ⟨∇²f(x + αp)p, p⟩.
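A sanity check on a function with a known gradient and Hessian, e.g. a quadratic as in Problem 1 (a NumPy sketch with our own test data):

    import numpy as np

    rng = np.random.default_rng(10)
    n = 5
    M = rng.standard_normal((n, n))
    A = M @ M.T                          # f below has grad Ay - b and Hessian A
    b, x, p = rng.standard_normal((3, n))
    f = lambda y: 0.5 * y @ A @ y - b @ y
    phi = lambda al: f(x + al * p)

    alpha, eps = 0.7, 1e-5
    y = x + alpha * p
    fd1 = (phi(alpha + eps) - phi(alpha - eps)) / (2 * eps)
    print(abs(fd1 - (A @ y - b) @ p))    # phi'(alpha) = <grad f(y), p>

    t = 0.5                              # exact for a quadratic f
    fd2 = (phi(alpha + t) - 2 * phi(alpha) + phi(alpha - t)) / t**2
    print(abs(fd2 - p @ A @ p))          # phi''(alpha) = <Hess f(y) p, p>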

Problem 10. Consider the scalar-argument function

ϕ(α) := ∥r(x + αp)∥₂, α ∈ R₊, r(x + αp) ≠ 0,

where x, p ∈ Rⁿ and r : Rⁿ → Rᵐ is a differentiable map. Find the derivative ϕ′(α) and express it in terms of
the Jacobi matrix J_r(·).

Solution. In this problem, as in the previous one, one must constantly remember that the differentiation
is performed with respect to α, while x is a constant vector.
Find the first derivative:

dϕ(α) = d_α(∥r(x + αp)∥₂) = {d(∥x∥₂) = ⟨x, dx⟩/∥x∥₂} = ⟨r(x + αp), d_α(r(x + αp))⟩ / ∥r(x + αp)∥₂ = {dr(x) = J_r(x)dx}
= ⟨r(x + αp), J_r(x + αp) d_α(x + αp)⟩ / ∥r(x + αp)∥₂ = ⟨r(x + αp), J_r(x + αp)(dα)p⟩ / ∥r(x + αp)∥₂
= (⟨r(x + αp), J_r(x + αp)p⟩ / ∥r(x + αp)∥₂) dα.

From here,

ϕ′(α) = ⟨r(x + αp), J_r(x + αp)p⟩ / ∥r(x + αp)∥₂.
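A quick check on a simple nonlinear map r of our own choosing (here r(y) = sin(y) componentwise, so J_r(y) = diag(cos y)), as a NumPy sketch:

    import numpy as np

    rng = np.random.default_rng(11)
    n = 4
    x, p = rng.standard_normal((2, n))
    r = lambda y: np.sin(y)                  # J_r(y) = diag(cos(y))
    phi = lambda al: np.linalg.norm(r(x + al * p))

    alpha, eps = 0.3, 1e-6
    fd = (phi(alpha + eps) - phi(alpha - eps)) / (2 * eps)
    y = x + alpha * p
    exact = r(y) @ (np.cos(y) * p) / np.linalg.norm(r(y))   # <r, J_r p> / ||r||_2
    print(abs(fd - exact))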

A Derivatives: theory
A.1 Definition
Let’s start by recalling the concept of a derivative.
For a function of a single variable f : R → R, its derivative at the point x is denoted by f ′ (x) and is
determined from the equality:

f (x + h) = f (x) + f ′ (x)h + o(h) for all sufficiently small h.

In other words, having fixed a point x, we want to approximate the change of the function, f(x + h) − f(x),
in a neighborhood of this point by a linear function of h, and f′(x)h is the best way to do this.
Let us now consider a more general situation.
Let U and V be finite-dimensional linear spaces with norms. The main examples of such spaces for us
will be numbers: R, vectors: Rn and matrices: Rn×m , as well as their combinations (Cartesian products).
Consider the function f : X → V , where X ⊆ U .

Definition A.1 (Differentiability). Let x ∈ X be an interior point of the set X, and let L : U → V be
a linear operator. We say that the function f is differentiable at the point x with derivative
L if the following decomposition holds for all sufficiently small h ∈ U:

f(x + h) = f(x) + L[h] + o(∥h∥). (A.1)

If for no linear operator L : U → V the function f is differentiable at the point x with derivative
L, we say that f is not differentiable at the point x. If the point x is not an interior point of
the set X, then the notion of differentiability of the function f at the point x is left undefined.

Remark A.2. The expression o(∥h∥) has its standard meaning:

f(x + h) − f(x) − L[h] = o(∥h∥) ⟺ lim_{h→0} ∥f(x + h) − f(x) − L[h]∥ / ∥h∥ = 0.

Remark A.3. Since the spaces U and V under consideration are finite-dimensional (and in a finite-
dimensional space all norms are topologically equivalent), it does not matter which specific norms are
used in the definition given above: if a function f is differentiable at x with the derivative L for one
choice of norms, then f will also be differentiable at x with the derivative L for any other choice of
norms.

Proposition A.4. Suppose that the function f is differentiable at x with the derivative L1 and is also
differentiable at x with the derivative L2 . Then L1 = L2 .

Thus, if the function f is differentiable at the point x, then its derivative L is defined in a unique way.
We will denote it with the symbol df (x).

Remark A.5. The object df depends on two arguments: the point x ∈ X at which we approximate
the function, and the increment h ∈ U laid off from that fixed point:

df : X × U → V, linear in the second argument, i.e. in «h».

Remark A.6. There are different notations for the derivative of the function f at the point x:

Df (x)[h] ≡ df (x)[h] ≡ Df (x)[∆x] ≡ df (x)[∆x].

They all mean the same thing. When working with the definition of the derivative, it is convenient to
specify the increment (h or ∆x) explicitly in square brackets. When calculating derivatives in practice,
using already known derivatives and the conversion rules, the increment in square brackets is usually
omitted: df(x), or even just df when it is clear what is meant.

So, the derivative of the function at the point x is the linear operator df(x) that best approximates
the increment of the function:
f(x + h) − f(x) ≈ Df(x)[h].
Another well-known and important concept is the directional derivative. It turns out
that, knowing the derivative of the function f, we can easily calculate its derivative along any direction h.

Proposition A.7. Let f be differentiable at the point x. Choose an arbitrary direction h. Then

Df(x)[h] = ∂f(x)/∂h := lim_{t→+0} [f(x + th) − f(x)] / t.

That is, to calculate ∂f(x)/∂h, the derivative of the function f along the direction h, it suffices to apply
Df(x)[·] to this direction.
The set of vectors

eᵢ = (0, …, 0, 1, 0, …, 0) ∈ Rⁿ, i = 1, …, n,

with the single 1 in the i-th position, is called the standard basis in Rⁿ.
If for some i the function has a (two-sided) derivative along the direction eᵢ, then it is called the partial
derivative with respect to the i-th coordinate:

∂f(x)/∂xᵢ := lim_{t→0} [f(x + teᵢ) − f(x)] / t = Df(x)[eᵢ].
Note that a function can be non-differentiable even if it has derivatives along all directions.

Example A.8. Consider the function f(x) = ∥x∥₂. Find its derivative along the direction h at the point
x = 0:

∂∥x∥₂/∂h |_{x=0} = lim_{t→+0} (∥0 + th∥₂ − ∥0∥₂)/t = lim_{t→+0} t∥h∥₂/t = ∥h∥₂.

If the function f(x) = ∥x∥₂ were differentiable at zero, then, by Proposition A.7,

Df(0)[h] = ∂f(0)/∂h = ∥h∥₂,

but the function h ↦ ∥h∥₂ is not linear, which contradicts the fact that the derivative is a linear operator.
So ∥x∥₂ is not differentiable at zero, although it has derivatives along all directions.

Function gradient; Jacobi matrix.

• In the case U = Rⁿ, V = R, the linear function Df(x)[h] can always be represented as a scalar
product with some vector:

Df(x)[h] = ⟨a_x, h⟩, where a_x ∈ Rⁿ is different for each x.

The vector a_x is called the gradient of the function f at the point x and is denoted by ∇f(x).
In the standard basis, the gradient of the function is represented as the vector of partial derivatives:

∇f(x) = (∂f/∂x₁(x), …, ∂f/∂xₙ(x)) ∈ Rⁿ.

Like all vectors in this text, it is a column vector.

• If U = Rⁿˣᵐ, V = R, the linear function Df(x)[H] can always be represented as a scalar product with
some matrix:

Df(x)[H] = ⟨A_x, H⟩, A_x, H ∈ Rⁿˣᵐ.

This matrix is also called the gradient of the function at the point x, ∇f(x) = A_x, and in the standard basis
(consisting of matrices that are all zeros except for a single one) it is written as the matrix of partial derivatives:

∇f(x) = (∂f/∂x_{ij}(x)), i = 1, …, n, j = 1, …, m.

• If U = Rᵐ, V = Rⁿ, the linear operator Df(x)[·] can, after fixing bases, always be represented by a matrix:

Df(x)[h] = J_x h, J_x ∈ Rⁿˣᵐ.

The matrix J_x is called the Jacobi matrix of the function f. In the standard basis, it consists of the partial
derivatives:

J_x = (∂fᵢ/∂xⱼ(x)), i = 1, …, n, j = 1, …, m.

Proposition A.9 (Differential calculus). Let U and V be vector spaces, X a subset of U, and x ∈ X
an interior point of X. The following properties hold:
(a) (Derivative of a constant) Let f : X → V be a constant function, i.e. there is v ∈ V such that
f(x′) = v for all x′ ∈ X. Then f is differentiable at x, and df(x) = 0.
(b) (Derivative of the identity function) Let f : X → V be the identity function, i.e. f(x′) = x′
for all x′ ∈ X. Then f is differentiable at x, and its derivative is also the identity function:
Df(x)[h] = h for all h ∈ U.
(c) (Linearity) Let f : X → V and g : X → V be functions differentiable at x, and let
c₁, c₂ ∈ R be numbers. Then the function (c₁f + c₂g) is also differentiable at x, and

d(c₁f + c₂g)(x) = c₁ df(x) + c₂ dg(x).

(d) (Product rule) Let α : X → R and f : X → V be functions. If α and f are differentiable at x, then the
function αf is also differentiable at x, and

D(αf)(x)[h] = (Dα(x)[h])f(x) + α(x)(Df(x)[h])

for all h ∈ U.
(e) (Composition rule) Let Y be a subset of V and f : X → Y a function. Also let W be a vector
space and g : Y → W a function. If f is differentiable at x, and g is differentiable at f(x), then their
composition (g ∘ f) : X → W (defined by (g ∘ f)(x) = g(f(x))) is also differentiable at x, and

D(g ∘ f)(x) = Dg(f(x)) ∘ Df(x), or, in more detail, D(g ∘ f)(x)[h] = Dg(f(x))[Df(x)[h]].

(f) (Quotient rule) Let α : X → R and f : X → V be functions. If α and f are differentiable at x,
and if α does not vanish on X, then the function (1/α)f is also differentiable at x, and

D((1/α)f)(x)[h] = [α(x)(Df(x)[h]) − (Dα(x)[h])f(x)] / α(x)²

for all h ∈ U.

Proof. The first four properties are proved directly from the definition, and the last one is derived from the product
and composition rules.
Note that the product rule in Proposition A.9 is stated only for the case when one of the functions is scalar. This is
natural, since in a vector space only multiplication by a scalar is defined, not multiplication by an arbitrary
element of the space. However, in some special cases the product rule remains true even when both
functions are non-scalar. For example, the following statement holds.

Proposition A.10. Let U be a vector space, X a subset of U, and x ∈ X an interior point of X. Let
f : X → Rᵐˣⁿ and g : X → Rⁿˣᵏ be matrix-valued functions. Suppose that f and g are differentiable
at x. Then the function fg is also differentiable at the point x, and

D(fg)(x)[h] = (Df(x)[h])g(x) + f(x)(Dg(x)[h])

for all h ∈ U. (Here the multiplication operation means matrix multiplication, so the order of the
factors is important.)

A.2 Second derivative


Let the function f : X → V be differentiable at each point x ∈ X ⊆ U .
Consider the derivative of the function f with a fixed increment h1 ∈ U as a function of x:

g(x) = Df (x)[h1 ].

Definition A.11. If the function g has a derivative at some point x, then it is called the second
derivative of the function f at the point x:

D²f(x)[h₁, h₂] := Dg(x)[h₂].

It can be shown that D²f(x)[h₁, h₂] is a bilinear function of h₁ and h₂.

By analogy, the third derivative D³f(x)[h₁, h₂, h₃], the fourth, and higher-order derivatives are defined.
If the derivative df(x) is a continuous function of x, then f is said to be continuously differentiable.
If the second derivative D²f(x) is continuous in x, then f is twice continuously differentiable.
For functions f : Rⁿ → R, the second derivative, like any bilinear form, can be represented by a
matrix:

D²f(x)[h₁, h₂] = ⟨H_x h₁, h₂⟩, H_x ∈ Rⁿˣⁿ.

The matrix H_x is called the Hessian of the function f at the point x and is usually denoted by ∇²f(x).
In the standard basis, this matrix consists of the second partial derivatives:

∇²f(x) = (∂²f/∂xᵢ∂xⱼ(x)), i, j = 1, …, n.

For a twice continuously differentiable function, its Hessian is a symmetric matrix:

∇²f(x) ∈ Sⁿ.

A.3 Taylor formula

For a twice continuously differentiable function, the Taylor formula holds:

f(x + h) = f(x) + Df(x)[h] + (1/2)D²f(x)[h, h] + o(∥h∥²).

For a function f : Rⁿ → R it can be written using the gradient and the Hessian:

f(x + h) = f(x) + ⟨∇f(x), h⟩ + (1/2)⟨∇²f(x)h, h⟩ + o(∥h∥²).

If the function has continuous derivatives up to order k inclusive, then the Taylor formula can
be written up to the k-th derivative:

f(x + h) = f(x) + Df(x)[h] + (1/2!)D²f(x)[h, h] + (1/3!)D³f(x)[h, h, h] + … + (1/k!)Dᵏf(x)[h, …, h] + o(∥h∥ᵏ).
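The o(∥h∥²) remainder can be observed numerically: for a smooth function, the second-order Taylor error should decay like the cube of the step. A sketch assuming NumPy, reusing the gradient and Hessian of ϕ(x) = ln⟨Ax, x⟩ from Section 1:

    import numpy as np

    rng = np.random.default_rng(12)
    n = 4
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)
    x, h = rng.standard_normal((2, n))
    f = lambda y: np.log(y @ A @ y)                    # smooth near x (A is PD)

    q, Ax = x @ A @ x, A @ x
    g = 2 * Ax / q                                     # gradient (Section 1 example)
    Hs = 2 * A / q - 4 * np.outer(Ax, Ax) / q**2       # Hessian (Section 1 example)

    for s in [1e-1, 1e-2, 1e-3]:
        taylor = f(x) + g @ (s * h) + 0.5 * (s * h) @ Hs @ (s * h)
        print(s, abs(f(x + s * h) - taylor))           # error decays like s^3 = o(s^2)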

A.4 Calculating tabular derivatives


Note: Throughout the following, ∥ · ∥ denotes (for short) the Euclidean norm for vectors and the spectral
(operator) norm for matrices.

Example A.12 (Linear function). Let c ∈ Rⁿ, and let f : Rⁿ → R be the function f(x) := ⟨c, x⟩. We show
that f is differentiable at an arbitrary point x ∈ Rⁿ and find its derivative df(x) : Rⁿ → R. To do this,
we fix an arbitrary increment of the argument h ∈ Rⁿ and calculate the corresponding increment of the
function:

f(x + h) − f(x) = ⟨c, x + h⟩ − ⟨c, x⟩ = ⟨c, h⟩.

Note that the mapping h ↦ ⟨c, h⟩ is linear. So the decomposition (A.1) holds for the function f with
L[h] := ⟨c, h⟩. Thus, the function f is differentiable at an arbitrary point x ∈ Rⁿ with the
derivative Df(x)[h] = ⟨c, h⟩.

Example A.13 (Quadratic form). Let A ∈ Rⁿˣⁿ, and let f : Rⁿ → R be the function f(x) := ⟨Ax, x⟩. We
fix an arbitrary point x ∈ Rⁿ and an arbitrary increment of the argument h ∈ Rⁿ and calculate the
corresponding increment of the function:

f(x + h) − f(x) = ⟨A(x + h), x + h⟩ − ⟨Ax, x⟩ = ⟨(A + Aᵀ)x, h⟩ + ⟨Ah, h⟩.

Note that the mapping h ↦ ⟨(A + Aᵀ)x, h⟩ is linear, and ⟨Ah, h⟩ = o(∥h∥), since the following chain
of inequalities is valid for all h ∈ Rⁿ:

|⟨Ah, h⟩| ≤ ∥h∥∥Ah∥ ≤ ∥A∥∥h∥².

Here the first inequality follows from the Cauchy-Bunyakovsky inequality, and the second
from the consistency of the matrix and vector norms. Thus, the function f is differentiable at an
arbitrary point x ∈ Rⁿ with the derivative Df(x)[h] = ⟨(A + Aᵀ)x, h⟩.

Example A.14 (Inverse matrix). Let S := {X ∈ Rⁿˣⁿ : Det(X) ≠ 0} be the set of all square
nondegenerate matrices of size n. Consider the function f : S → S that returns for each matrix X ∈ S
its inverse: f(X) := X⁻¹. We show that f is differentiable at any point X ∈ S. To do
this, fix an arbitrary sufficiently small increment H ∈ Rⁿˣⁿ (satisfying X + H ∈ S and
∥H∥ < 1/∥X⁻¹∥) and consider the corresponding increment of the function:

f(X + H) − f(X) = (X + H)⁻¹ − X⁻¹ = (X(Iₙ + X⁻¹H))⁻¹ − X⁻¹ = ((Iₙ + X⁻¹H)⁻¹ − Iₙ)X⁻¹.

Let us evaluate (Iₙ + X⁻¹H)⁻¹ separately. To do this, we expand this matrix into a Neumann series:ᵃ

(Iₙ + X⁻¹H)⁻¹ = Iₙ − X⁻¹H + Σ_{k=2}^∞ (−X⁻¹H)ᵏ.

Note that the series on the right-hand side of the last equality is absolutely convergent, since ∥X⁻¹H∥ < 1 for
sufficiently small H. We show that the sum of this series is o(∥H∥):

∥Σ_{k=2}^∞ (−X⁻¹H)ᵏ∥ ≤ Σ_{k=2}^∞ ∥(−X⁻¹H)ᵏ∥ ≤ Σ_{k=2}^∞ ∥X⁻¹∥ᵏ∥H∥ᵏ = ∥X⁻¹∥²∥H∥² / (1 − ∥X⁻¹∥·∥H∥).

Here the first inequality follows from the triangle inequality for the norm, and the second
from the submultiplicativity of the norm; then the sum of the geometric series is computed. Thus,

(Iₙ + X⁻¹H)⁻¹ = Iₙ − X⁻¹H + o(∥H∥).

Substituting this expression into the formula above for the increment of the function, we get

f(X + H) − f(X) = −X⁻¹HX⁻¹ + o(∥H∥).

Thus, the function f is differentiable at an arbitrary point X ∈ S with the derivative Df(X)[H] =
−X⁻¹HX⁻¹.

ᵃ We mean the expansion (Iₙ − A)⁻¹ = Σ_{k=0}^∞ Aᵏ, valid for any matrix A ∈ Rⁿˣⁿ such that ∥A∥ < 1. This formula
is a generalization of the well-known formula for the sum of a geometric series: (1 − q)⁻¹ = Σ_{k=0}^∞ qᵏ for any |q| < 1.
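The conclusion Df(X)[H] = −X⁻¹HX⁻¹ can also be observed numerically: the remainder (X + sH)⁻¹ − X⁻¹ + X⁻¹(sH)X⁻¹ should shrink like s², i.e. o(s). A NumPy sketch with random data of our own:

    import numpy as np

    rng = np.random.default_rng(13)
    n = 4
    X = rng.standard_normal((n, n)) + n * np.eye(n)    # almost surely invertible
    H = rng.standard_normal((n, n))
    Xi = np.linalg.inv(X)

    for s in [1e-1, 1e-2, 1e-3]:
        err = np.linalg.norm(np.linalg.inv(X + s * H) - (Xi - Xi @ (s * H) @ Xi))
        print(s, err)                                  # decays like s^2: remainder is o(s)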

Remark A.15. The derived formula for the derivative of the function X⁻¹ can be obtained very
simply by using the following trick. Consider the differential of the identity matrix, d(Iₙ). On the one hand,
since the matrix is constant, d(Iₙ) = 0. On the other hand, by the product rule, d(Iₙ) = d(XX⁻¹) =
(dX)X⁻¹ + X d(X⁻¹). Equating the two expressions, we get d(X⁻¹) = −X⁻¹(dX)X⁻¹, or, in another
form, d(X⁻¹)[H] = −X⁻¹HX⁻¹. Note, however, that this argument is not a complete proof of
the identity, since it assumes, but does not prove, the existence of the differential d(X⁻¹).

Example A.16 (Determinant of a matrix). Let f : Rⁿˣⁿ → R be the function f(X) := Det(X).
Consider an arbitrary point X ∈ Rⁿˣⁿ and an arbitrary increment H ∈ Rⁿˣⁿ. We will
assume that the matrix X is invertible. Write out the corresponding increment of the function:

f(X + H) − f(X) = Det(X + H) − Det(X) = Det(X(Iₙ + X⁻¹H)) − Det(X) = Det(X)(Det(Iₙ + X⁻¹H) − 1).

Let us evaluate Det(Iₙ + X⁻¹H) separately. To do this, we use the fact that the determinant of a
matrix equals the product of its eigenvalues. Let λ₁(X⁻¹H), …, λₙ(X⁻¹H) be the eigenvalues of
the matrix X⁻¹H (numbered in any order and possibly complex). Note that the eigenvalues of the
matrix Iₙ + X⁻¹H are 1 + λ₁(X⁻¹H), …, 1 + λₙ(X⁻¹H). Therefore

Det(Iₙ + X⁻¹H) = Π_{i=1}^n [1 + λᵢ(X⁻¹H)] = 1 + Σ_{i=1}^n λᵢ(X⁻¹H) + (Σ_{1≤i<j≤n} λᵢ(X⁻¹H)λⱼ(X⁻¹H) + …),

where the ellipsis hides the sums of all possible triples λᵢ(X⁻¹H)λⱼ(X⁻¹H)λₖ(X⁻¹H), all possible quadruples,
and so on. Note that the expression in parentheses is o(∥H∥). This follows from the triangle
inequality and the fact that for an arbitrary matrix A ∈ Rⁿˣⁿ, all its eigenvalues are bounded in absolute value
by its norm ∥A∥. (Indeed, let λ ∈ C be an eigenvalue of the matrix A, and let x ∈ Cⁿ \ {0} be a
corresponding eigenvector: Ax = λx. Then |λ|∥x∥ = ∥Ax∥ ≤ ∥A∥∥x∥.) So

Det(Iₙ + X⁻¹H) = 1 + Σ_{i=1}^n λᵢ(X⁻¹H) + o(∥H∥) = 1 + Tr(X⁻¹H) + o(∥H∥).

Substituting the resulting expression into the formula above for the increment of the function, we get

f(X + H) − f(X) = Det(X) Tr(X⁻¹H) + o(∥H∥).

Thus, for any invertible matrix X ∈ Rⁿˣⁿ, the function f is differentiable at the point X with the
derivative Df(X)[H] = Det(X) Tr(X⁻¹H) = Det(X)⟨X⁻ᵀ, H⟩.

Remark A.17. It can be shown that the function f(X) = Det(X) is differentiable
everywhere on Rⁿˣⁿ, and not just on the subset of invertible matrices. The general formula for the
derivative in this case is called the Jacobi formula and reads: Df(X)[H] = Tr(Adj(X)H), where
Adj(X) is the adjugate matrix of X. Note that if X is a nondegenerate matrix, then Adj(X) =
Det(X)X⁻¹, and the Jacobi formula turns into the formula proved above, Df(X)[H] = Det(X) Tr(X⁻¹H).
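Numerically, the first-order expansion Det(X + sH) ≈ Det(X)(1 + Tr(X⁻¹(sH))) exhibits the expected o(s) remainder, decaying like s². A minimal NumPy sketch (our own random data):

    import numpy as np

    rng = np.random.default_rng(14)
    n = 4
    X = rng.standard_normal((n, n)) + n * np.eye(n)    # almost surely invertible
    H = rng.standard_normal((n, n))
    dX, Xi = np.linalg.det(X), np.linalg.inv(X)

    for s in [1e-1, 1e-2, 1e-3]:
        approx = dX * (1 + np.trace(Xi @ (s * H)))     # Det(X)(1 + Tr(X^{-1} sH))
        print(s, abs(np.linalg.det(X + s * H) - approx))   # decays like s^2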
