Calculus of Variation and Image Processing: Scalar Product
Let $\vec{u}, \vec{v} \in \mathbb{R}^n$. The scalar product of $\vec{u}$ and $\vec{v}$ is defined as
\[ \langle \vec{u}, \vec{v} \rangle = \vec{u} \cdot \vec{v} = \sum_{i=1}^{n} u_i v_i \]
Let $A$ be a square transformation matrix for vectors in $\mathbb{R}^n$. Then
\[ A\vec{u} \cdot \vec{v} = \langle A\vec{u}, \vec{v} \rangle = \langle \vec{u}, {}^t\!A\, \vec{v} \rangle \]
where ${}^t\!A$ is the transpose of the matrix $A$.
Scalar product of functions
$C^k[0,1]$ denotes the function space in which all functions are continuous and have continuous derivatives up to order $k$. Let $f, g \in C^k[0,1]$. Such a function can be viewed as an infinite dimensional vector, and we can define the scalar product of the two functions as
\[ \langle f, g \rangle_{L^2} = \int_0^1 f(x)\, g(x)\, dx \]
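As a quick numerical illustration (not part of the original notes), the short NumPy sketch below evaluates both scalar products; the vectors, test functions and grid size are arbitrary choices.

import numpy as np

# Scalar product of vectors in R^n
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
print(np.dot(u, v))              # <u, v> = sum_i u_i v_i = 32

# L2 scalar product of functions on [0, 1], approximated by a Riemann sum
x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]
f = np.sin(np.pi * x)
g = x**2
print(np.sum(f * g) * dx)        # approximates the integral of f(x) g(x) over [0, 1]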
Properties of scalar product
Let $f, g, \tilde{g} \in C^k[0,1]$ and let $\alpha, \beta \in \mathbb{R}$.

Linearity: $\langle f, \alpha g + \beta \tilde{g} \rangle = \alpha \langle f, g \rangle + \beta \langle f, \tilde{g} \rangle$

Symmetric: $\langle f, g \rangle = \langle g, f \rangle$

Positive square norm: $\langle f, f \rangle \geq 0$

Zero function: $\langle f, f \rangle = 0 \iff f = 0$
Extremum of finite dimensional functions
Let $f : \mathbb{R}^n \to \mathbb{R}$ and let $\vec{v} \in \mathbb{R}^n$ be the extremum point for the function $f$. Let $\vec{w} \in \mathbb{R}^n$ and $t \in \mathbb{R}$. If we perturb $f$ at $\vec{v}$, the value will increase if $\vec{v}$ is a point of minimum, or decrease if it is a point of maximum.
Let
\[ g_{\vec{w}}(t) = f(\vec{v} + t\vec{w}) \]
The perturbation is in the direction $\vec{w}$ by an amount $t$.
\[ g'_{\vec{w}}(t) = \frac{d}{dt}(\vec{v} + t\vec{w}) \cdot \nabla f = \vec{w} \cdot \nabla f(\vec{v} + t\vec{w}) \]
We now find what is called the directional derivative of $f$ in the direction of $\vec{w}$ at the point $\vec{v}$ (i.e. $t = 0$):
\[ g'_{\vec{w}}(0) = \vec{w} \cdot \nabla f(\vec{v}) = \frac{\partial f}{\partial \vec{w}}(\vec{v}) = \langle \nabla f(\vec{v}), \vec{w} \rangle \]
Thus, the directional derivative is a scalar product of a vector that is independent of the direction of perturbation ($\nabla f(\vec{v})$, known as the gradient of the function) and a vector that gives the direction of perturbation. This is a characteristic of the directional derivative. So, in order to find a point of extremum, all that is required is to check the necessary condition for a point to be a point of extremum: the gradient of the function at that point should be zero. A point of extremum will be a point $\vec{v}$ where
\[ \nabla f(\vec{v}) = 0 \]
If, for any function $f$,
\[ \frac{\partial f}{\partial \vec{w}}(\vec{v}) = \langle J, \vec{w} \rangle \tag{1} \]
where $J$ does not depend on the direction of perturbation and $\vec{w}$ gives the direction of perturbation, then $J$ is the gradient of the function $f$. We use this characteristic to find out the gradient of functions for calculating their points of extremum. Thus our first step in finding a point of extremum is to compute the gradient of the function.
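A small numerical check of this characterization (my own illustration, not from the notes): for a hand-picked $f$, a finite-difference estimate of $g'_{\vec{w}}(0)$ agrees with $\langle \nabla f(\vec{v}), \vec{w} \rangle$. The function, point and direction below are arbitrary choices.

import numpy as np

def f(v):
    return np.sum(v**2)            # example function f : R^n -> R

def grad_f(v):
    return 2.0 * v                 # its gradient, computed by hand

v = np.array([1.0, -2.0, 0.5])     # point of evaluation
w = np.array([0.3, 0.4, -1.0])     # direction of perturbation
t = 1e-6

numeric = (f(v + t * w) - f(v - t * w)) / (2 * t)   # g'_w(0) by central difference
analytic = np.dot(grad_f(v), w)                     # <grad f(v), w>
print(numeric, analytic)                            # the two values agree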
Extremum points in function spaces
Here too, we will find out the equivalent of the gradient of a function. Mappings whose domain is a function space and which map each function to a definite number in $\mathbb{R}$ are called functionals. Let $E : C^k[0,1] \to \mathbb{R}$,
\[ E(f) = \int_0^1 f^2(x)\, dx \]
Proceeding in the same fashion as in the last section, we perturb $E$ in the direction of, say, $g \in C^k[0,1]$. Let
\[ i_g(t) = E(f + tg) \]
where $t \in \mathbb{R}$. Now, $i_g(t)$ is a function whose domain is one dimensional. Differentiating $i_g$ with respect to $t$,
\[ i'_g(t) = \frac{d}{dt} E(f + tg) \]
The directional derivative of $E$ in the direction $g$ computed at $f$ is given by
\[ \frac{d}{dt} E(f + tg) \Big|_{t=0} =: \frac{\partial E}{\partial g}(f) \]
This is known as the Gateaux derivative.
\[ E(f + tg) = \int_0^1 (f + tg)^2\, dx = \int_0^1 (f^2 + t^2 g^2 + 2tfg)\, dx \]
\[ \frac{d}{dt} E(f + tg) = \int_0^1 \frac{d}{dt}(f^2 + t^2 g^2 + 2tfg)\, dx = \int_0^1 (2tg^2 + 2fg)\, dx \]
Putting $t = 0$,
\[ \frac{d}{dt} E(f + tg) \Big|_{t=0} = \int_0^1 2fg\, dx = \langle 2f, g \rangle_{L^2} \]
Comparing with equation (1), the first component depends only on the function while the second depends only on the direction of perturbation. The gradient of $E$ is therefore $\nabla E(f) = 2f$. So for a point of extremum, $\nabla E(f) = 0$, and hence $f = 0$.
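The same check can be run for this functional (again my own illustration, with an arbitrary grid and arbitrary test functions): a finite-difference Gateaux derivative of the discretized $E$ matches $\langle 2f, g \rangle_{L^2}$.

import numpy as np

x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]

def E(f):
    return np.sum(f**2) * dx                    # Riemann-sum approximation of E(f)

f = np.cos(2 * np.pi * x)                       # the "point" in function space
g = x * (1 - x)                                 # direction of perturbation
t = 1e-6

gateaux = (E(f + t * g) - E(f - t * g)) / (2 * t)
inner = np.sum(2 * f * g) * dx                  # <2f, g> in the L2 sense
print(gateaux, inner)                           # the two values agree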
Note: The gradient of a function also depends on the definition of the scalar product used. The scalar product can be used to define a norm on the points, using which distance is defined in the concerned space. The gradient measures the maximum change in the value of the function or functional along with the direction of this change. Hence the gradient is dependent on the definition of the scalar product used.
Finding shortest path between two fixed points
Let $A$ and $B$ be two fixed points (in $\mathbb{R}^2$) between which we have to find a shortest path. Let $C(t) = (x(t), y(t))$, $t \in [0, 1]$, denote a curve between the two points. The length of the curve is given by
\[ L = \int \left[ \left( \frac{dx}{dt} \right)^2 + \left( \frac{dy}{dt} \right)^2 \right]^{\frac{1}{2}} dt \]
The functional to be minimized is given by
\[ E(C) = \int_0^1 \left( x'(t)^2 + y'(t)^2 \right)^{\frac{1}{2}} dt \]
where $(x(0), y(0)) = A$ and $(x(1), y(1)) = B$.
Let us perturb the functional by another function (a curve in this case), $\eta = (u(t), v(t))$. Note that $\eta(0) = 0$ and $\eta(1) = 0$, because the end points $A$ and $B$ should remain fixed.
The directional derivative of $E$ in the direction of $\eta$ is given by
\[ \frac{d}{ds} E(C + s\eta) \Big|_{s=0} \]
\[ \frac{d}{ds} E(C + s\eta) = \int_0^1 \frac{d}{ds} \left[ (x' + su')^2 + (y' + sv')^2 \right]^{\frac{1}{2}} dt \]
\[ \frac{d}{ds} E(C + s\eta) \Big|_{s=0} = \int_0^1 \frac{1}{2} \frac{1}{\sqrt{x'^2 + y'^2}} \left( 2x'u' + 2y'v' \right) dt \]
\[ = \int_0^1 \frac{1}{\sqrt{x'^2 + y'^2}} \, (x', y') \cdot (u', v')\, dt \]
\[ = \int_0^1 \frac{x'}{|C'(t)|}\, u'\, dt + \int_0^1 \frac{y'}{|C'(t)|}\, v'\, dt \tag{2} \]
Integrating by parts,
\[ \left[ \frac{x'}{|C'(t)|}\, u \right]_0^1 - \int_0^1 \frac{d}{dt}\!\left( \frac{x'}{|C'(t)|} \right) u\, dt + \left[ \frac{y'}{|C'(t)|}\, v \right]_0^1 - \int_0^1 \frac{d}{dt}\!\left( \frac{y'}{|C'(t)|} \right) v\, dt \]
Applying the given boundary conditions on $\eta$, the first and third terms vanish. We have
\[ = \int_0^1 \left\langle -\frac{d}{dt}\!\left( \frac{x'}{|C'(t)|}, \frac{y'}{|C'(t)|} \right), (u, v) \right\rangle dt \]
\[ = \int_0^1 \left\langle -\frac{d}{dt}\!\left( \frac{C'(t)}{|C'(t)|} \right), \eta \right\rangle dt \tag{3} \]
Here the first term in the scalar product is independent of the direction of perturbation and the second term only depends on the direction of perturbation. Therefore the gradient of the functional is given by
\[ \nabla E(C) = -\frac{d}{dt}\!\left( \frac{C'(t)}{|C'(t)|} \right) \]
This term is almost the curvature of the curve (it differs from the curvature vector only by the factor $|C'(t)|$). To find a minimum, we equate the gradient to zero, which gives the information that the curvature of the shortest path between $A$ and $B$ should be zero. Intuitively, the shortest path is a straight line, and the curvature of a straight line is zero.
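As a rough numerical sketch of this result (not from the notes), the code below runs gradient descent on a discretized curve, using $\nabla E(C) = -\frac{d}{dt}(C'/|C'|)$ as the descent direction; the step size, iteration count and initial curve are arbitrary choices.

import numpy as np

n = 101
t = np.linspace(0.0, 1.0, n)
dt = t[1] - t[0]
A, B = np.array([0.0, 0.0]), np.array([1.0, 1.0])

# initial curve: the segment from A to B bent by a sinusoidal bump
C = np.outer(1 - t, A) + np.outer(t, B)
C[:, 1] += 0.3 * np.sin(np.pi * t)

def length(C):
    return np.sum(np.linalg.norm(np.diff(C, axis=0), axis=1))

print(length(C))                                       # longer than |AB| = sqrt(2)
for _ in range(20000):
    Cp = np.gradient(C, dt, axis=0)                    # C'(t)
    T = Cp / np.linalg.norm(Cp, axis=1, keepdims=True) # unit tangent C'/|C'|
    grad_E = -np.gradient(T, dt, axis=0)               # gradient of the length functional
    C[1:-1] -= 5e-5 * grad_E[1:-1]                     # descend, keeping A and B fixed
print(length(C))                                       # approaches sqrt(2), the straight line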
Note
Equation (2) can also be written as
\[ \int_0^1 \frac{x'}{|C'(t)|}\, u'\, dt + \int_0^1 \frac{y'}{|C'(t)|}\, v'\, dt = \left\langle \frac{C'(t)}{|C'(t)|}, \eta' \right\rangle \]
Comparing with equation (3), we see that the derivative operator has just shifted from the second part of the scalar product to the first part, with a change of sign, after integrating by parts. This is similar to carrying out the transpose of a transformation matrix and shifting it to the other part of a scalar product. For an operator, this is known as the adjoint. In this case we see that the adjoint of the derivative operator is again a derivative operator, up to a sign.
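A quick numerical illustration of this adjoint relation (my own, with arbitrary test functions that vanish at the endpoints, matching the boundary condition on $\eta$): summation by parts gives $\langle u', v \rangle \approx \langle u, -v' \rangle$ on a grid.

import numpy as np

n = 400
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]

u = np.sin(np.pi * x)      # vanishes at both endpoints
v = x**2 * (1.0 - x)       # vanishes at both endpoints

Du = np.gradient(u, dx)
Dv = np.gradient(v, dx)

print(np.sum(Du * v) * dx)    # <u', v>
print(-np.sum(u * Dv) * dx)   # <u, -v'>  (the two values match closely)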
Application in Image denoising
Let $u_0$ be the observed noisy image, given by
\[ u_0 = u + \eta \]
where $u$ is the true image and $\eta$ is zero mean, white, independent noise. Using Bayes' rule,
\[ P(u/u_0) = \frac{P(u_0/u)\, P(u)}{P(u_0)} \]
$P(u_0/u)$ is the likelihood, $P(u)$ is the prior knowledge about the true image and $P(u_0)$ is the given evidence.
\[ u_0(i,j) - u(i,j) = \eta(i,j) \]
\[ P(\eta(i,j)) = c \exp\!\left( -\frac{(u_{0,ij} - u_{ij})^2}{2\sigma^2} \right) \]
since $\eta$ is characterized by $N(0, \sigma^2)$.
\[ P(u_0/u) = \prod_{ij} c \exp\!\left( -\frac{(u_{0,ij} - u_{ij})^2}{2\sigma^2} \right) = \tilde{c} \exp\!\left( -\sum_{ij} \frac{(u_{0,ij} - u_{ij})^2}{2\sigma^2} \right) \]
This suggests that the higher the noise, the lower the probability of getting the observed image $u_0$, given the true image $u$. For the prior information, we assume the image to be smooth (the histogram of its gradient should be narrow), with the gradient values normally distributed:
\[ P(u) = d \exp\!\left( -\frac{|\nabla u|^2}{2\tau^2} \right) \]
The posterior probability is given by
\[ P(u/u_0) = c' \exp\!\left( -\sum_{ij} \frac{(u_{0,ij} - u_{ij})^2}{2\sigma^2} \right) \exp\!\left( -\frac{\|\nabla u\|^2}{2\tau^2} \right) \]
Our aim is to find $u$ such that the posterior probability is maximum, which is equivalent to minimizing $-\log(P(u/u_0))$. Taking the logarithm on both sides and ignoring constants,
\[ -\log(P(u/u_0)) = \frac{1}{2\sigma^2} \sum_{ij} (u_{0,ij} - u_{ij})^2 + \frac{\|\nabla u\|^2}{2\tau^2} \]
The negative logarithm is now a functional to be minimized. We go to the continuous domain from the discrete domain, hence converting the summation to an integration,
\[ E(u) = \frac{1}{2\sigma^2} \int_D (u - u_0)^2\, dx + \frac{1}{2\tau^2} \int_D |\nabla u|^2\, dx \]
The first term puts a condition that the output image should be close to the observed image, and the second term governs the amount of smoothness in the image. To calculate the gradient at the boundary for the second term we assume the Neumann boundary condition, which states that the normal derivative at the boundary is zero. This helps in cancelling some terms while integrating by parts.
Let
\[ E(u) = \frac{\lambda}{2} \int_D (u - u_0)^2\, dx + \frac{1}{2} \int_D |\nabla u|^2\, dx \]
where $\lambda = \tau^2 / \sigma^2$ and $\nabla u = (u_x, u_y)$. The directional derivative of $E$ is given by
\[ \frac{d}{dt} E(u + tv) \Big|_{t=0} \]
\[ \frac{d}{dt} E(u + tv) = \frac{1}{2} \int_D \frac{d}{dt} \left[ \lambda (u + tv - u_0)^2 + (u_x + tv_x)^2 + (u_y + tv_y)^2 \right] dx\, dy \]
Evaluating at $t = 0$,
\[ = \int_D \left[ \lambda (u - u_0)\, v + (u_x v_x + u_y v_y) \right] dx\, dy \]
Now
\[ \int_D u_x v_x\, dx\, dy = \int_{\partial D} u_x v\, \nu_x - \int_D u_{xx} v \]
Similarly,
\[ \int_D u_y v_y = \int_{\partial D} u_y v\, \nu_y - \int_D u_{yy} v \]
where $\vec{\nu} = (\nu_x, \nu_y)$ is the normal vector to the boundary $\partial D$ of the region $D$.
Adding the two terms,
\[ \int_D (u_x v_x + u_y v_y)\, dx\, dy = \int_{\partial D} v\, (u_x \nu_x + u_y \nu_y) - \int_D (u_{xx} + u_{yy})\, v \]
\[ = \int_{\partial D} v\, \langle \nabla u, \vec{\nu} \rangle - \int_D (u_{xx} + u_{yy})\, v \]
\[ = \int_{\partial D} v\, \frac{\partial u}{\partial \vec{\nu}} - \int_D \Delta u\, v \]
From the Neumann boundary condition, which says that the derivative of $u$ in the normal direction at the boundary is zero, the first term in the above expression is zero. Therefore,
\[ \int_D \nabla u \cdot \nabla v = -\int_D \Delta u\, v \]
This is also known as Green's formula.
\[ \langle \nabla u, \nabla v \rangle = \langle -\mathrm{div}(\nabla u), v \rangle \]
\[ \langle \nabla u, \nabla v \rangle = \langle -\Delta u, v \rangle \]
Thus we can see that the adjoint of the gradient operator is the negative divergence.
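A numerical sanity check of this relation (my own illustration, with an arbitrary domain $[0,1]^2$, grid size and test functions): for a $u$ whose normal derivative vanishes on the boundary, $\langle \nabla u, \nabla v \rangle \approx \langle -\Delta u, v \rangle$ up to discretization error.

import numpy as np

n = 200
x = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(x, x, indexing="ij")
h = x[1] - x[0]

# u has zero normal derivative on the boundary of [0,1]^2 (cosine modes)
u = np.cos(np.pi * X) * np.cos(2 * np.pi * Y)
v = X**2 * Y + np.sin(np.pi * X * Y)

ux, uy = np.gradient(u, h, h)
vx, vy = np.gradient(v, h, h)
lap_u = np.gradient(ux, h, axis=0) + np.gradient(uy, h, axis=1)

print(np.sum(ux * vx + uy * vy) * h * h)   # <grad u, grad v>
print(np.sum(-lap_u * v) * h * h)          # <-Laplacian u, v>  (approximately equal)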
Now,
\[ \int_D \lambda (u - u_0)\, v + (u_x v_x + u_y v_y) = \int_D \left[ \lambda (u - u_0) - \Delta u \right] v \]
The first term is independent of the direction in which we are computing the derivative and depends only on $u$, while the second term depends only on the direction. Therefore,
\[ \int_D \lambda (u - u_0)\, v + (u_x v_x + u_y v_y) = \langle \nabla E(u), v \rangle \]
For an extremum,
\[ \lambda (u - u_0) - \Delta u = 0 \]
This is the first order Tikhonov regularization condition for an image.
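A minimal sketch of this denoising step in NumPy, assuming we simply run gradient descent on $E(u)$ until $\lambda(u - u_0) - \Delta u \approx 0$; the value of lam, the step size, the iteration count and the reflected-border Laplacian are all illustrative choices, not prescribed by the notes.

import numpy as np

def laplacian_neumann(u):
    # 5-point Laplacian with replicated borders (zero normal derivative)
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * u

def tikhonov_denoise(u0, lam=0.1, step=0.2, iters=500):
    u = u0.copy()
    for _ in range(iters):
        grad_E = lam * (u - u0) - laplacian_neumann(u)   # gradient of E(u)
        u -= step * grad_E                               # descend along -grad E
    return u

# toy usage: a smooth image corrupted by Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 64)
X, Y = np.meshgrid(x, x, indexing="ij")
clean = np.cos(np.pi * X) * np.cos(np.pi * Y)
u0 = clean + 0.1 * rng.standard_normal(clean.shape)
u = tikhonov_denoise(u0)
print(np.std(u0 - clean), np.std(u - clean))             # the error w.r.t. the clean image drops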
Euler - Lagrange equation
Let our functional be of the form
\[ E(u) = \int_D L(\vec{x}, u, u_x, u_y) \]
The directional derivative is given by
\[ \frac{d}{ds} E(u + sv) \Big|_{s=0} = \int_D \frac{d}{ds} L(\vec{x}, u + sv, u_x + sv_x, u_y + sv_y) \Big|_{s=0} \]
\[ = \int_D \left[ \frac{d}{ds}(u + sv)\, L_u + \frac{d}{ds}(u_x + sv_x)\, L_{u_x} + \frac{d}{ds}(u_y + sv_y)\, L_{u_y} \right] \Big|_{s=0} \]
\[ = \int_D \left( L_u\, v + \langle (L_{u_x}, L_{u_y}), (v_x, v_y) \rangle \right) \]
Using the adjoint operator of the gradient, we get
\[ \int (L_{u_x}, L_{u_y}) \cdot \nabla v = -\int \mathrm{div}(L_{u_x}, L_{u_y})\, v \]
The directional derivative of our functional is therefore given by
\[ = \int_D \left\langle L_u - \mathrm{div}(L_{u_x}, L_{u_y}),\, v \right\rangle \]
Therefore the gradient of our functional is given by the first term,
\[ \nabla E = L_u - \mathrm{div}(L_{u_x}, L_{u_y}) \]
This is known as the Euler-Lagrange equation. In the image denoising application we had
\[ L(u, u_x, u_y) = \frac{\lambda}{2} (u - u_0)^2 + \frac{1}{2} (u_x^2 + u_y^2) \]
In deriving the Euler-Lagrange equation, we directly used the adjoint of an operator instead of doing the integration by parts. This can be done by choosing appropriate boundary conditions which cancel out some terms in the integration by parts and give only the adjoint of the operator.
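As an illustration of how the Euler-Lagrange form can be evaluated numerically (my own sketch, not part of the notes; the grid, the test images and the value of lam are arbitrary), the helper below computes $L_u - \mathrm{div}(L_{u_x}, L_{u_y})$ for the denoising Lagrangian and compares it with $\lambda(u - u_0) - \Delta u$.

import numpy as np

def el_gradient(u, L_u, L_ux, L_uy, h=1.0):
    """Evaluate L_u - div(L_{u_x}, L_{u_y}) on a regular grid by finite differences."""
    ux, uy = np.gradient(u, h, h)                      # u_x, u_y
    p, q = L_ux(ux, uy), L_uy(ux, uy)                  # L_{u_x}, L_{u_y}
    div = np.gradient(p, h, axis=0) + np.gradient(q, h, axis=1)
    return L_u(u) - div

# the denoising Lagrangian: L = lam/2 (u - u0)^2 + 1/2 (u_x^2 + u_y^2)
lam = 0.1
x = np.linspace(0.0, 1.0, 128)
X, Y = np.meshgrid(x, x, indexing="ij")
h = x[1] - x[0]
u = np.sin(np.pi * X) * Y**2
u0 = np.cos(np.pi * Y)

g = el_gradient(u, L_u=lambda w: lam * (w - u0),
                   L_ux=lambda ux, uy: ux,
                   L_uy=lambda ux, uy: uy, h=h)

# compare with lam*(u - u0) - Laplacian(u), using the exact Laplacian of this u
lap = -np.pi**2 * np.sin(np.pi * X) * Y**2 + 2 * np.sin(np.pi * X)
err = np.abs(g - (lam * (u - u0) - lap))[2:-2, 2:-2]   # interior points only
print(err.max())                                       # small finite-difference error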
Optical flow estimation: Horn and Schunck
Let $u(x(t), y(t), t) = u(x_0, y_0, t_0)$. We are assuming that the pixel value at a particular point remains the same along the trajectory $(x(t), y(t))$, with $x(t_0) = x_0$ and $y(t_0) = y_0$. Therefore,
\[ \frac{d}{dt} u(x(t), y(t), t) = x'(t)\, u_x + y'(t)\, u_y + u_t = 0 \]
The instantaneous velocity of a pixel is given by $\vec{v} = (x'(t), y'(t)) = (v_1(t), v_2(t))$. The optical flow constraint equation is thus given by
\[ \vec{v} \cdot \nabla u + u_t = 0 \]
There are two unknowns, $x'(t)$ and $y'(t)$, but only one constraint equation, so a smoothness term is added and the following functional is minimized:
\[ E(\vec{v}) = \frac{1}{2} \int_D (\vec{v} \cdot \nabla u + u_t)^2 + \frac{\lambda}{2} \left( |\nabla v_1|^2 + |\nabla v_2|^2 \right) \]
The directional derivative of our functional is given by
\[ \frac{d}{ds} E(\vec{v} + s\vec{w}) \Big|_{s=0} = \int \frac{d}{ds} \left[ \frac{1}{2} (\vec{v} \cdot \nabla u + s\vec{w} \cdot \nabla u + u_t)^2 + \frac{\lambda}{2} \left( (v_{1x} + sw_{1x})^2 + (v_{1y} + sw_{1y})^2 + (v_{2x} + sw_{2x})^2 + (v_{2y} + sw_{2y})^2 \right) \right] \Big|_{s=0} \]
\[ = \int (\vec{v} \cdot \nabla u + u_t)\, \nabla u \cdot \vec{w} - \lambda (\Delta v_1\, w_1 + \Delta v_2\, w_2) \]
This follows from the adjoint operator of the gradient.
\[ = \int_D (\vec{v} \cdot \nabla u + u_t)\, \nabla u \cdot \vec{w} - \lambda (\Delta v_1, \Delta v_2) \cdot \vec{w} \]
\[ = \int_D \left[ (\vec{v} \cdot \nabla u + u_t)\, \nabla u - \lambda (\Delta v_1, \Delta v_2) \right] \cdot \vec{w} \]
The first term is independent of the direction of perturbation and the second term depends only on the direction. Therefore the first term is our gradient. Equating it to zero, we get two coupled equations for the two unknowns:
\[ (\vec{v} \cdot \nabla u + u_t)\, u_x - \lambda \Delta v_1 = 0 \]
\[ (\vec{v} \cdot \nabla u + u_t)\, u_y - \lambda \Delta v_2 = 0 \]
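A rough sketch of solving these two coupled equations, assuming plain gradient descent on the discretized functional (the classical Horn and Schunck scheme uses a slightly different iterative update); lam, the step size, the iteration count and the synthetic frames are all illustrative choices.

import numpy as np

def laplacian(v):
    # 5-point Laplacian with replicated borders (a crude Neumann condition)
    p = np.pad(v, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * v

def horn_schunck(frame0, frame1, lam=1.0, step=0.2, iters=2000):
    ux, uy = np.gradient(0.5 * (frame0 + frame1))   # spatial derivatives of the image
    ut = frame1 - frame0                            # temporal derivative u_t
    v1 = np.zeros_like(frame0)
    v2 = np.zeros_like(frame0)
    for _ in range(iters):
        r = v1 * ux + v2 * uy + ut                  # optical flow residual v . grad(u) + u_t
        v1 -= step * (r * ux - lam * laplacian(v1)) # descend along the two coupled gradients
        v2 -= step * (r * uy - lam * laplacian(v2))
    return v1, v2

# toy usage: a smooth pattern translated by exactly one pixel along the first axis
N = 64
i = np.arange(N)
X, Y = np.meshgrid(i, i, indexing="ij")
frame0 = np.sin(2 * np.pi * 3 * X / N) * np.cos(2 * np.pi * 2 * Y / N)
frame1 = np.sin(2 * np.pi * 3 * (X - 1) / N) * np.cos(2 * np.pi * 2 * Y / N)
v1, v2 = horn_schunck(frame0, frame1)
print(v1.mean(), v2.mean())                         # close to (1, 0): one pixel per frame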