An Implementation and Detailed Analysis of the K-SVD Image Denoising Algorithm
Abstract
K-SVD is a signal representation method which, from a set of signals, can derive a dictionary
able to approximate each signal with a sparse combination of its atoms. This paper focuses on
the K-SVD-based image denoising algorithm. The implementation is described in detail, and its
parameters are analyzed and varied in order to settle on a reliable implementation.
1 Overview
Denoising is a major task of image processing. In the last decades, several denoising algorithms have
been proposed.
One class of such algorithms contains those which take advantage of the analysis of the image in a
(redundant) frame. For example, in this subset, we can mention the thresholding of the image coefficients
in an orthonormal basis, like the cosine basis [19, 18], a wavelet basis [8], or a curvelet basis [17].
In this category can also be included the methods which try to recover the main structures of the
signal by using a dictionary (which basically consists of a possibly redundant set of generators).
The matching pursuit algorithm [15] and the orthogonal matching pursuit [7] are of this type. The
efficiency of these methods comes from the fact that natural images can be sparsely approximated
in these dictionaries.
The variational methods form a second class of denoising algorithms. Among them let us mention
the total variation (TV) denoising [16, 4] where the chosen regularity model is the set of functions
of bounded variations.
In another class, one could include methods that take advantage of the non-local similarity of
patches in the image. Among the most famous, we can name NL-means [3], BM3D [6], and NL-
Bayes [10].
The K-SVD-based denoising algorithm merges some concepts coming from these three classes,
paving the way for dictionary learning. Indeed, the efficiency of the dictionary is encoded through a
functional which is optimized by taking advantage of the non-local similarities of the image. The algorithm is divided
into three steps: a) a sparse coding step, where, using the initial dictionary, we compute sparse
approximations of all patches (of a fixed size) of the image; b) a dictionary update step, where we try to
update the dictionary in such a manner that the quality of the sparse approximations is increased;
and c) a reconstruction step, which recovers the denoised image from the collection of denoised
patches. Actually, before getting to c), the algorithm carries out K iterations of steps a and b.
There is by now a thriving literature about dictionary learning. Here we will only quote the main
articles that led to the design of the K-SVD algorithm for color images. The K-SVD method was
introduced in [1], where the main objective was to optimize the quality of sparse approximations of
vectors in a learned dictionary. Even if this article already pointed out the interest of the technique for image
processing tasks, it is in [9] that a detailed study of the denoising of gray-level images was carried out.
Then, the adaptation to color images was treated in [14]. Let us notice that this last article
showed that the K-SVD method can also be useful in other image processing tasks, such as non-
uniform denoising, demosaicing and inpainting.
Following these articles, dictionary learning has become a very active research topic. To go beyond
the scope of this article, see [13] or [11].
2 Theoretical Description
To get a maximal coherence between the different documents about K-SVD, we use the same nota-
tions as in the article [14].
We assume that the noisy image y (seen as a column vector of size N) is obtained from the clean image x0 by
$$y = x_0 + w$$
where w is a white Gaussian noise vector of zero mean and known standard deviation σ. Conse-
quently, we look for an image x̂ that is close to the initial image, and such that each of its patches admits
a sparse representation in terms of a learned dictionary.
For every possible position (i, j) of a pixel in the image x, we denote by Rij x the size n column
vector formed by the grayscale levels of the square √n × √n patch of the image x whose top-left
corner has coordinates (i, j). One can notice that, with the column notation, Rij x is precisely the
multiplication of x (column vector of size N) by a matrix Rij of size n × N whose columns are
indexed by the image pixels. Each row of Rij extracts the value of one pixel of the
image x, and is therefore zero except for the coefficient corresponding to that pixel, which is equal to 1.
In the following, the notation D refers to a dictionary. It is a matrix of size n × k, with k ≥ n, whose
columns are normalized (in Euclidean norm). We take k ≥ n because otherwise there is no chance
that the columns of D can span Rn. The algorithm will require an initialization of the dictionary:
to this end, we may choose a usual orthogonal basis (discrete cosine transform, wavelets, ...), or
we may collect patches from clean images or even from the noisy image itself (without forgetting the
normalization). We give two examples of dictionaries in figure 1.
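As a concrete illustration, here is a minimal sketch (not the reference code) of the third initialization option: filling the dictionary with k patches taken at random positions in the noisy image and normalizing each column. The use of the Eigen library is an assumption of this sketch.

#include <Eigen/Dense>
#include <random>

// Build an initial dictionary from k random patches of a grayscale image.
Eigen::MatrixXd init_dictionary(const Eigen::MatrixXd& image,   // height x width
                                int sqrt_n, int k, unsigned seed = 0)
{
    const int n = sqrt_n * sqrt_n;
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> row(0, int(image.rows()) - sqrt_n);
    std::uniform_int_distribution<int> col(0, int(image.cols()) - sqrt_n);

    Eigen::MatrixXd D(n, k);
    for (int l = 0; l < k; ++l) {
        // Extract a random sqrt_n x sqrt_n patch and stack it as a column.
        Eigen::MatrixXd patch = image.block(row(gen), col(gen), sqrt_n, sqrt_n);
        Eigen::Map<Eigen::VectorXd> v(patch.data(), n);
        D.col(l) = v / v.norm();                 // normalized atom
    }
    return D;
}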
The dictionary allows us to compute a sparse representation αij of each patch Rij x. The represen-
tations αij will thus be column vectors of size k satisfying Rij x ≈ Dαij. We put them together in a
matrix α with k rows and Np columns, where Np is the number of patches of size √n × √n in the
image.
Figure 1: Left, a dictionary formed with random patches from the image “Castle” (converted to
grayscale) after addition of a white Gaussian noise. Right, the dictionary obtained at the end
of the K-SVD algorithm. For each atom, the contrast is enhanced differently.
With the above notation it is easy to detail each part of the algorithm. At first, D̂ is initialized
with an initial dictionary denoted by Dinit . The initialization alternatives will be discussed later on.
The first step looks for sparse representations of the patches Rij y of y in the dictionary D̂. In
other words, for each patch Rij y, a column vector αˆij (of size k) is built such that it has only a few
non-zero coefficients and such that the distance between Rij y and its sparse approximation D̂αˆij is
small.
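In formula, the ideal goal of this first step is thus, for each patch, to solve
$$\hat\alpha_{ij} = \operatorname*{Arg\,min}_{\alpha\in\mathbb{R}^{k}} \|\alpha\|_0 \quad \text{such that} \quad \|R_{ij}y - \hat D\alpha\|_2^2 \le n(C\sigma)^2 ,$$
where C is a coefficient discussed further below; this problem is solved approximately by the ORMP detailed in the sequel.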
The second step updates one by one the columns of the dictionary D̂ and the representations
α̂ij, in such a way that the sparse approximations of all patches of the image y become more efficient. Therefore, the goal is to
decrease the quantity
$$\sum_{i,j}\|\hat D\hat\alpha_{ij} - R_{ij}y\|_2^2$$
without increasing the sparsity penalties ‖α̂ij‖₀, where ‖αij‖₀ refers to the l0 norm of αij, i.e. the number of non-zero coefficients of αij. We remind
the reader that D̂ is a matrix whose size is n × k, that αij is a size k column vector and that
Rij y is a size n column vector. If it were perfect, the ORMP mentioned above would find, for each patch, a sparsest
representation in D̂ whose squared distance to Rij y is less than n(Cσ)². This last constraint brings in a
new parameter C. This coefficient multiplying the standard deviation σ guarantees that, with high
probability, a white Gaussian noise of standard deviation σ on n pixels has an l2 norm lower than
√n Cσ. We give details on the choice of C in Section 3. In fact, the ORMP is not perfect: indeed,
it only allows one to find, for each patch, one sparse (not necessarily the sparsest) representation in
D̂ whose squared distance to Rij y is lower than n(Cσ)².
Let us give more details about how the ORMP can compute a sparse representation of a patch. A
good reference to learn about ORMP is [5]. Nevertheless, we shall give here a complete explanation
using the notation of our C++ code. In order to use lighter notations, we will rather explain how
the ORMP finds a sparse representation α ∈ R^k of a vector x ∈ R^n in a dictionary D formed by the
normalized vectors d_0, ..., d_{k-1}, which span R^n. Precisely, we are going to give an approximate solution of
the following optimization problem:
$$\min_{\alpha\in\mathbb{R}^{k}} \|\alpha\|_0 \quad \text{such that} \quad \|x - D\alpha\|_2^{2} \le \varepsilon .$$
We will detail the choice of the atoms in order to stick to our C++ code.
We denote by lj the index of the element of the dictionary that we choose at the step j ≥ 0. We
also set Lj = {l0 , . . . , lj }.
Let us assume that we are at the beginning of the j-th loop (j ≥ 0) (and thus l0 , . . . , lj−1 are
already chosen).
We start by introducing the residue
$$r = x - \mathrm{Proj}_{\mathrm{Vect}(d_{l_0},\ldots,d_{l_{j-1}})}(x),$$
where Proj_F refers to the orthogonal projection onto the subspace F, and where Vect(d_{l_0}, ..., d_{l_{j-1}}) refers to the space spanned by
the vectors d_{l_0}, ..., d_{l_{j-1}}. If ‖r‖² < ε, then we stop, and α is the representation of Proj_{Vect(d_{l_0},...,d_{l_{j-1}})}(x)
in (d_{l_0}, ..., d_{l_{j-1}}) already obtained at the previous step, cf. its computation at the end of the loop (if we break when j = 0, then α = 0).
We choose l_j in order to minimize the norm of the new potential residue:
$$l_j = \operatorname*{Arg\,min}_{i\notin L_{j-1}} \left\|x - \mathrm{Proj}_{\mathrm{Vect}(d_{l_0},\ldots,d_{l_{j-1}},d_i)}(x)\right\|^{2} .$$
To compute these norms efficiently, we use the Gram-Schmidt process. We denote by (t_{l_0}, ..., t_{l_{j-1}}) the orthogonal family obtained after
Gram-Schmidt orthogonalization of (d_{l_0}, ..., d_{l_{j-1}}), and by (e_{l_0}, ..., e_{l_{j-1}}) the orthonormal family
obtained after normalization of (t_{l_0}, ..., t_{l_{j-1}}). For i ∉ L_{j-1}, we denote by
(t_{l_0}, ..., t_{l_{j-1}}, t_i^{(j)}) the family obtained after Gram-Schmidt orthogonalization of (d_{l_0}, ..., d_{l_{j-1}}, d_i),
and (e_{l_0}, ..., e_{l_{j-1}}, e_i^{(j)}) the (orthonormal) family obtained by normalizing (t_{l_0}, ..., t_{l_{j-1}}, t_i^{(j)}). The
reader has to be aware that this orthonormalization can be computed progressively: at the j-th
step, the vectors (t_{l_0}, ..., t_{l_{j-1}}) and (e_{l_0}, ..., e_{l_{j-1}}) are already computed. It is thus sufficient to detail,
at the j-th step, the computation of t_i^{(j)} and e_i^{(j)} for i ∉ L_{j-1}:
$$t_i^{(j)} = d_i - \sum_{p=0}^{j-1} \langle d_i, e_{l_p}\rangle\, e_{l_p},$$
$$\|t_i^{(j)}\|^2 = 1 - \sum_{p=0}^{j-1} \langle d_i, e_{l_p}\rangle^2,$$
$$e_i^{(j)} = \frac{t_i^{(j)}}{\|t_i^{(j)}\|}.$$
We notice that
$$\mathrm{Proj}_{\mathrm{Vect}(d_{l_0},\ldots,d_{l_{j-1}},d_i)}(x) = \langle x, e_{l_0}\rangle e_{l_0} + \ldots + \langle x, e_{l_{j-1}}\rangle e_{l_{j-1}} + \langle x, e_i^{(j)}\rangle e_i^{(j)}$$
(where Proj_F refers to the orthogonal projection onto the subspace F) and, consequently,
$$\left\|\mathrm{Proj}_{\mathrm{Vect}(d_{l_0},\ldots,d_{l_{j-1}},d_i)}(x)\right\|^{2} = \langle x, e_{l_0}\rangle^2 + \ldots + \langle x, e_{l_{j-1}}\rangle^2 + \langle x, e_i^{(j)}\rangle^2 .$$
Therefore, maximizing the norm of the projection (i.e., minimizing the norm of the new residue) is equivalent to maximizing $\langle x, e_i^{(j)}\rangle^2$. This is why
we choose
$$l_j = \operatorname*{Arg\,max}_{i\notin L_{j-1}} \langle x, e_i^{(j)}\rangle^2$$
and with this index come the vector $t_{l_j} = t_{l_j}^{(j)}$ and the normalized vector $e_{l_j} = e_{l_j}^{(j)}$. The computation
of $\langle x, e_i^{(j)}\rangle$ is done by replacing $e_i^{(j)}$ by its definition given above:
$$\langle x, e_i^{(j)}\rangle = \frac{\langle x, d_i\rangle - \sum_{p=0}^{j-1}\langle d_i, e_{l_p}\rangle\langle x, e_{l_p}\rangle}{\sqrt{1 - \sum_{p=0}^{j-1}\langle d_i, e_{l_p}\rangle^2}} . \qquad (3)$$
To implement this computation efficiently, we notice that the numerator and the square of
the denominator are obtained from those of the previous step by subtracting respectively
⟨d_i, e_{l_{j-1}}⟩⟨x, e_{l_{j-1}}⟩ and ⟨d_i, e_{l_{j-1}}⟩². Hence, at each step, we need ⟨d_i, e_{l_{j-1}}⟩ and ⟨x, e_{l_{j-1}}⟩, which corre-
spond in the code to the variables D_ELj[i][j] and x_elj, and which are updated at each loop. The
computation of ⟨x, e_{l_{j-1}}⟩ is not a problem (it is only the formula (3) of the previous step!). However,
we have to explain the update of ⟨d_i, e_{l_{j-1}}⟩. We will see thereafter that the computation of α requires
the coordinates of (e_{l_0}, ..., e_{l_{j-1}}) on the basis (d_{l_0}, ..., d_{l_{j-1}}), and we will explain how we can obtain
them progressively. Once these coordinates are computed, the scalar product ⟨d_i, e_{l_{j-1}}⟩ can be ob-
tained by a linear combination of the scalar products ⟨d_i, d_{l_s}⟩, (0 ≤ s < j). The numerator ⟨x, t_i^{(j)}⟩
is saved in the variable x_T[i], the square of the denominator in the variable norm[i], and the resulting score ⟨x, e_i^{(j)}⟩² in the variable scores[i].
Once we have chosen l_j, we can go back to the beginning of the loop to stop or choose the next
atom. Clearly, the algorithm terminates because the atoms d_0, ..., d_{k-1} span R^n.
At this point let us assume that we are at the end of the j-th loop (and thus, we have chosen
l0 , . . . , lj ). We still have to explain how the sparse representation α of x in D is computed.
The coefficients ⟨x, e_{l_p}⟩, (p < j), have already been computed in the preceding steps. The last coefficient,
⟨x, e_{l_j}⟩, is given by the equality (3) for i = l_j.
Finally, we have to go back to the representation in terms of d_{l_0}, ..., d_{l_j}. To this aim, we introduce
the coordinates of (e_{l_0}, ..., e_{l_{j-1}}) on the basis (d_{l_0}, ..., d_{l_{j-1}}). Let us denote them by a_{pq}, (q ≤ p):
$$\forall p < j, \quad e_{l_p} = \sum_{q=0}^{p} a_{pq}\, d_{l_q} .$$
At the (j-1)-th step, the a_{pq} are computed for p < j (and again q ≤ p). It suffices to explain how we
compute the a_{jq} for q ≤ j. From the definition of e_{l_j}, replacing the e_{l_p}, (p < j), we obtain
$$e_{l_j} = \frac{1}{\|t_{l_j}\|}\left(d_{l_j} - \sum_{p=0}^{j-1}\langle d_{l_j}, e_{l_p}\rangle \sum_{q=0}^{p} a_{pq}\, d_{l_q}\right).$$
Finally, we have
$$x \approx \sum_{p=0}^{j}\langle x, e_{l_p}\rangle e_{l_p} = \sum_{q=0}^{j}\left(\sum_{p=q}^{j}\langle x, e_{l_p}\rangle\, a_{pq}\right) d_{l_q} .$$
We insist on the fact that the coordinates of (e_{l_0}, ..., e_{l_{j-1}}) on the basis (d_{l_0}, ..., d_{l_{j-1}}) are also
required for the choice of the index l_j, as explained above. Subsequently, it is natural to compute
these coordinates at each loop.
Correspondence with the Notations Used in the Code.  Now we link the notations used in
the explanation above with the notations used in the code. First, let us warn the reader that, in the code,
we have used indexation in column order, that is, D[i] refers to the i-th column of the matrix D.
We have also used a convention: whenever a variable contains the matrix multiplication of
the transpose of B by A, the result is saved in the variable A_B. Therefore, A_B = Bᵀ A, and
A_B[p][q] is the scalar product between A[p] and B[q].
Let us add that elj (even if it is not a proper variable) will of course refer to e_{l_j}. Similarly, DLj
(resp. ELj) will refer to the matrix whose columns are (in order) d_{l_0}, ..., d_{l_j} (resp. e_{l_0}, ..., e_{l_j}). Last,
T will refer to the matrix whose columns are t_0^{(j)}, ..., t_{k-1}^{(j)}.
· Np = N_p
· n = n
· k = k
· epsilon = ε
· L : maximal sparsity allowed for the representations (here we do not use this constraint, i.e. in our code, L = min(n, k))
· norm[i] = ‖t_i^{(j)}‖² = 1 − Σ_{p=0}^{j−1} ⟨d_i, e_{l_p}⟩²
· x_T[i] = ⟨x, t_i^{(j)}⟩ = ⟨x, d_i⟩ − Σ_{p=0}^{j−1} ⟨d_i, e_{l_p}⟩⟨x, e_{l_p}⟩
· scores[i] = ⟨x, e_i^{(j)}⟩² = x_T[i]² / norm[i]
· lj = l_j
· invNorm = 1/sqrt(norm[lj]) = 1/‖t_{l_j}‖
· x_elj = x_T[lj]*invNorm = ⟨x, e_{l_j}⟩
· x_el[p] = ⟨x, e_{l_p}⟩
· delta = x_elj*x_elj = ⟨x, e_{l_j}⟩²
· normr = ‖x‖² − Σ_{p=0}^{j} ⟨x, e_{l_p}⟩²
· D_DLj[i][s] = ⟨d_i, d_{l_s}⟩
· A[p][q] = a_{pq}, (p ≥ q)
· D_ELj[i][j] is equal to ⟨d_i, e_{l_j}⟩ at the end of the j-th loop
· val temporarily saves the value ⟨d_i, e_{l_j}⟩
· coord[q] = α_{l_q} = Σ_{p=q}^{j} ⟨x, e_{l_p}⟩ a_{pq} : “coordinate” of x on d_{l_q}
· s : summing index
At the end of the j-th loop, the coefficients of A are updated by
A[j][j] = invNorm,
∀i < j, A[j][i] = − ( Σ_{k=i}^{j−1} D_ELj[lj][k] * A[k][i] ) · invNorm.
For reasons of numerical stability, an artificial break is added in the code: it happens if ‖t_{l_j}‖ < 10⁻⁶, and the
ORMP is then stopped in order to avoid the division by ‖t_{l_j}‖.
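Putting the previous paragraphs together, here is a minimal sketch of the ORMP (not the IPOL reference implementation). It assumes the Eigen library, and it replaces the incremental bookkeeping of the matrix A by a final least-squares solve on the chosen atoms, which is simpler but yields the same coefficients.

#include <Eigen/Dense>
#include <algorithm>
#include <vector>

// Sparse representation alpha of x in a dictionary D with normalized columns,
// such that ||x - D*alpha||^2 <= eps, using at most L atoms.
Eigen::VectorXd ormp(const Eigen::MatrixXd& D, const Eigen::VectorXd& x,
                     double eps, int L)
{
    const int n = D.rows(), k = D.cols();
    const int Lmax = std::min(L, k);
    Eigen::VectorXd alpha = Eigen::VectorXd::Zero(k);
    Eigen::MatrixXd E(n, Lmax);                        // orthonormalized chosen atoms e_{l_p}
    std::vector<int> chosen;
    Eigen::VectorXd norm = Eigen::VectorXd::Ones(k);   // ||t_i^{(j)}||^2
    Eigen::VectorXd x_T  = D.transpose() * x;          // <x, t_i^{(j)}>
    double normr = x.squaredNorm();                    // squared norm of the residue

    for (int j = 0; j < Lmax && normr >= eps; ++j) {
        // Choose the atom maximizing scores[i] = x_T[i]^2 / norm[i].
        int lj = -1;
        double best = -1.0;
        for (int i = 0; i < k; ++i) {
            if (norm(i) < 1e-6) continue;              // numerical-stability break
            const double score = x_T(i) * x_T(i) / norm(i);
            if (score > best) { best = score; lj = i; }
        }
        if (lj < 0) break;

        // Gram-Schmidt: e_{l_j} = t_{l_j}^{(j)} / ||t_{l_j}^{(j)}||.
        Eigen::VectorXd t = D.col(lj);
        for (int p = 0; p < j; ++p) t -= E.col(p).dot(D.col(lj)) * E.col(p);
        E.col(j) = t / t.norm();
        chosen.push_back(lj);

        const double x_elj = x.dot(E.col(j));
        normr -= x_elj * x_elj;

        // Update the numerators x_T[i] and the squared denominators norm[i].
        for (int i = 0; i < k; ++i) {
            const double d_elj = D.col(i).dot(E.col(j));
            x_T(i)  -= d_elj * x_elj;
            norm(i) -= d_elj * d_elj;
        }
    }

    // Coefficients of x on the chosen atoms (least-squares solve).
    if (!chosen.empty()) {
        Eigen::MatrixXd Dsub(n, int(chosen.size()));
        for (size_t q = 0; q < chosen.size(); ++q) Dsub.col(q) = D.col(chosen[q]);
        Eigen::VectorXd coef = Dsub.colPivHouseholderQr().solve(x);
        for (size_t q = 0; q < chosen.size(); ++q) alpha(chosen[q]) = coef(q);
    }
    return alpha;
}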
We now turn to the dictionary update step. Its goal is to make the quantity (6) (i.e., the total quadratic error Σ_{i,j} ‖D̂α̂ij − Rij y‖²₂)
decrease, without increasing the sparsity penalty ‖α̂ij‖₀. We will denote by d̂l (1 ≤ l ≤ k) the
columns of the dictionary D̂.
First, let us try to minimize the quantity (6) without taking care of the sparsity. As explained
above, we go through the columns of the dictionary, and the index of the current column will be
denoted by l, (1 ≤ l ≤ k). We are going to modify the atom d̂l and the coefficients α̂ij(l) in order
to improve the approximations in an L2 distortion sense. In order to translate this objective into an
optimization problem, for each (i, j), we introduce the residue
$$e_{ij}^{l} = R_{ij}y - \hat D\hat\alpha_{ij} + \hat d_l\,\hat\alpha_{ij}(l),$$
which is the error committed by deciding not to use dˆl any more in the representation of the patch
Rij y : elij is thus a size n vector.
These residues are grouped together in a matrix El (whose columns are indexed by (i, j)). The
values of the coefficients α̂ij(l) are also grouped in a row vector denoted by α̂l. Therefore, El is a
matrix of size n × Np and α̂l is a row vector of size Np. We need to find a new d̂l and a new row
vector α̂l which minimize
$$\sum_{i,j}\left\|\hat D\hat\alpha_{ij} - \hat d_l\,\hat\alpha_{ij}(l) + d_l\,\alpha_l(i,j) - R_{ij}y\right\|_2^2 = \left\|E_l - d_l\,\alpha_l\right\|_F^2 \qquad (8)$$
over all candidate atoms d_l and row vectors α_l,
where the squared Frobenius norm ‖M‖²_F refers to the sum of the squared elements of M. This
Frobenius norm is also equal to the sum of the squared (Euclidean) norms of the columns, and it is
easy to check that minimizing (8) amounts to reducing the approximation error caused by d̂l. It is
well known that the minimization of such a Frobenius norm is a rank-one approximation problem,
which always admits a solution, practically given by the singular value decomposition (SVD). Using
the SVD of El:
$$E_l = U\Delta V^{T} \qquad (9)$$
(where U and V are orthogonal matrices and where ∆ is zero except on its main diagonal,
which is non-negative and decreasing), the updated values of d̂l and α̂l are respectively the first
column of U and the first column of V multiplied by ∆(1, 1). Note that the
rank-one approximation does not require the computation of the whole matrices U, V, and ∆. In
our implementation, it is sufficient to use a truncated SVD, which is much faster (especially if El is
large). Let us explain the method we used to compute the truncated SVD.
To use lighter notations, we use, as in the code, the notation X = El . Starting from the SVD
(9), one can write
$$XX^{T} = U\Delta\Delta^{T}U^{T}, \qquad X^{T}X = V\Delta^{T}\Delta V^{T}.$$
As a result, ∆(1, 1)² is the greatest eigenvalue of the symmetric positive semidefinite matrix
XXᵀ, and the first column of U is the corresponding eigenvector. The same observation is valid for
V (with XᵀX). Therefore, we can find these eigenvectors and ∆(1, 1) thanks to the power method applied to the
matrices XXᵀ and XᵀX. Concerning the convergence of the power method, one could refer to [2].
One could notice that in the pseudo-code that we present below, the power method can be applied
to the two matrices simultaneously.
The SVD function takes as arguments a matrix X of which we want the SVD, a maximal number
of iterations max_iter (set to 100 in the code) and a tolerance threshold ε (set to 10−6 in the code).
It gives back an approximation s of the greatest singular value of X, an approximation u of the first
column of U , and an approximation v of the first column of V .
Here is the pseudo-code.
Initialization: we arbitrarily initialize v (in the code, we set v = d̂l); we also set i = 0, s = 1 and
s_old = 0.
While ( i < max_iter and (s − s_old)/s > ε ), we proceed to the following assignments:
$$u \leftarrow Xv, \quad u \leftarrow \frac{u}{\|u\|}, \quad v \leftarrow X^{T}u, \quad s_{\mathrm{old}} \leftarrow s, \quad s \leftarrow \|v\|, \quad v \leftarrow \frac{v}{s} .$$
The values of s, u, and v obtained at the end of this loop are the return values of the truncated
SVD.
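For completeness, here is a minimal Eigen-based sketch of this truncated SVD (the starting vector is simply drawn at random here, whereas the code described above initializes it with the current atom d̂l; Eigen itself is an assumption of this sketch):

#include <Eigen/Dense>

// Rank-one truncated SVD by the power method: on exit, X ~ s * u * v^T.
void truncated_svd(const Eigen::MatrixXd& X, Eigen::VectorXd& u,
                   Eigen::VectorXd& v, double& s,
                   int max_iter = 100, double tol = 1e-6)
{
    v = Eigen::VectorXd::Random(X.cols());
    v.normalize();
    double s_old = 0.0;
    s = 1.0;
    for (int i = 0; i < max_iter && (s - s_old) / s > tol; ++i) {
        u = X * v;
        u /= u.norm();
        v = X.transpose() * u;
        s_old = s;
        s = v.norm();
        v /= s;
    }
}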
At the end of the loop, u is approximately an eigenvector of XXᵀ associated with its greatest eigenvalue, so that
$$XX^{T}u \approx \lambda u$$
where λ is the greatest eigenvalue of XXᵀ. Taking the scalar product with u, and since u is
normalized, we have
$$\|X^{T}u\|^{2} = \langle XX^{T}u, u\rangle \approx \lambda ,$$
which yields
$$s \approx \sqrt{\lambda} .$$
This explains why s is an approximation of the largest singular value of X.
This way, for each l = 1, ..., k, the energy (6) never increases. But for now, the sparsity of
the coefficients is not under control. In order to control it, a slight modification is brought to the
preceding process: for each l, the operations involved in the update of d̂l and α̂l are restricted to the
patches which already used the atom d̂l before the update.
Setting
$$\omega_l = \{\, (i, j) \mid \hat\alpha_{ij}(l) \neq 0 \,\},$$
the values that we group together in El and α̂l are only the values of e^l_ij and α̂ij(l) for indices
(i, j) ∈ ωl. Hence, the indices (i, j) of the sum in the left-hand side of (8) are restricted to (i, j) ∈ ωl;
the matrix El is now of size n × Card(ωl) and α̂l is now a row vector of size Card(ωl). Also, note that in
(6) the terms of indices (i, j) ∉ ωl are not affected by this update. This proves that this
modification decreases (6) without increasing ‖α̂ij‖₀. This modification also implies a reduction of
the size of the matrix El whose SVD is being computed.
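A minimal sketch of the update of one atom, reusing the truncated_svd() function sketched above (the n × Np matrix Patches of noisy patches Rij y and the k × Np matrix Alpha of coefficients are assumptions on the data layout, not the layout of the reference code):

#include <Eigen/Dense>
#include <vector>

// Update atom number l of the dictionary D and the corresponding coefficients.
void update_atom(int l, const Eigen::MatrixXd& Patches, Eigen::MatrixXd& D,
                 Eigen::MatrixXd& Alpha)
{
    // omega_l: indices of the patches that actually use atom l.
    std::vector<int> omega;
    for (int t = 0; t < Alpha.cols(); ++t)
        if (Alpha(l, t) != 0.0) omega.push_back(t);
    if (omega.empty()) return;

    // Residues e^l_ij = R_ij y - D alpha_ij + d_l alpha_ij(l), gathered in E_l.
    Eigen::MatrixXd El(D.rows(), int(omega.size()));
    for (size_t q = 0; q < omega.size(); ++q) {
        const int t = omega[q];
        El.col(q) = Patches.col(t) - D * Alpha.col(t) + D.col(l) * Alpha(l, t);
    }

    // Rank-one approximation of E_l: new atom (u) and new coefficients (s * v).
    Eigen::VectorXd u, v;
    double s;
    truncated_svd(El, u, v, s);
    D.col(l) = u;
    for (size_t q = 0; q < omega.size(); ++q) Alpha(l, omega[q]) = s * v(int(q));
}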
Recall that the sparse coding computes sparse representations α̂ and that the dictionary updates
make D̂ change but also modify α̂. After K iterations of these steps, we are in possession of a learned
dictionary D̂ and of sparse representations αˆij of the patches of the image.
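As a summary, here is a minimal sketch of these K iterations on the set of patches, combining the ormp() and update_atom() functions sketched above (again assuming Eigen; the aggregation of the denoised patches into the image is described in the next subsection):

#include <Eigen/Dense>

// K iterations of sparse coding + dictionary update; returns the denoised patches.
Eigen::MatrixXd ksvd_denoise_patches(const Eigen::MatrixXd& Patches,  // n x Np
                                     Eigen::MatrixXd& D,              // n x k, initialized
                                     int K, double eps, int L)
{
    Eigen::MatrixXd Alpha(D.cols(), Patches.cols());
    for (int it = 0; it < K; ++it) {
        // a) sparse coding of every patch in the current dictionary
        for (int t = 0; t < Patches.cols(); ++t)
            Alpha.col(t) = ormp(D, Patches.col(t), eps, L);
        // b) update of the atoms one by one
        for (int l = 0; l < D.cols(); ++l)
            update_atom(l, Patches, D, Alpha);
    }
    return D * Alpha;
}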
2.1.3 Reconstruction
Now that the first two parts of the algorithm have built a dictionary D̂ and sparse representations α̂ij which
are well adapted to our image, we can build the globally denoised image by solving the minimization
problem
$$\hat x = \operatorname*{Arg\,min}_{x\in\mathbb{R}^{N}}\; \lambda\|x - y\|_2^2 + \sum_{i,j}\|\hat D\hat\alpha_{ij} - R_{ij}x\|_2^2 . \qquad (10)$$
The first term controls the global proximity of our reconstruction x̂ to the noisy image y. It is
thus a fidelity term, weighted by the parameter λ. The second term controls the proximity
of each patch Rij x̂ of our reconstruction to the corresponding denoised patch D̂α̂ij. This functional is quadratic,
coercive, and differentiable. Consequently, this problem admits a unique solution that we can compute
explicitly:
$$\hat x = \left(\lambda I + \sum_{i,j} R_{ij}^{T}R_{ij}\right)^{-1}\left(\lambda y + \sum_{i,j} R_{ij}^{T}\hat D\hat\alpha_{ij}\right). \qquad (11)$$
This formula can appear a little bit complicated, but it is in fact very simple. The only thing to
notice is that the matrix that has to be inverted is diagonal. Consequently, this formula only means
that the value of a pixel in the denoised image is computed by averaging the value of this pixel in
the noisy image (weighted by λ) and the values of this pixel in the denoised patches to which it belongs
(each weighted by 1). We obtain the values of the pixels of x̂ one by one, without requiring the matrix
inversion that (11) could suggest.
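A minimal sketch of this aggregation for a grayscale image stored row-major (the patch layout is an assumption of this sketch, not that of the reference code):

#include <vector>
#include <utility>

// Weighted average of the noisy pixel (weight lambda) and of its values in the
// denoised patches that contain it (weight 1), as given by formula (11).
std::vector<double> reconstruct(const std::vector<double>& y,                      // noisy image
                                const std::vector<std::vector<double>>& patches,   // denoised patches D*alpha_ij
                                const std::vector<std::pair<int,int>>& topleft,    // (i, j) of each patch
                                int width, int sqrt_n, double lambda)
{
    std::vector<double> num(y.size()), den(y.size());
    for (size_t p = 0; p < y.size(); ++p) { num[p] = lambda * y[p]; den[p] = lambda; }

    for (size_t t = 0; t < patches.size(); ++t) {
        const int i0 = topleft[t].first, j0 = topleft[t].second;
        for (int di = 0; di < sqrt_n; ++di)
            for (int dj = 0; dj < sqrt_n; ++dj) {
                const int p = (i0 + di) * width + (j0 + dj);
                num[p] += patches[t][di * sqrt_n + dj];
                den[p] += 1.0;
            }
    }
    std::vector<double> xhat(y.size());
    for (size_t p = 0; p < y.size(); ++p) xhat[p] = num[p] / den[p];
    return xhat;
}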
2.1.4 Comments
In the articles [9] and [14] the following minimization problem is mentioned:
$$(\hat x, \hat D, \hat\alpha) = \operatorname*{Arg\,min}_{D,\,x,\,\alpha}\; \lambda\|x - y\|_2^2 + \sum_{i,j}\mu_{ij}\|\alpha_{ij}\|_0 + \sum_{i,j}\|D\alpha_{ij} - R_{ij}x\|_2^2 \qquad (12)$$
which groups all the quantities that we have tried to minimize in the preceding paragraphs.
Let us briefly analyze this formula, even though the forthcoming comments are slightly redundant
with the previous explanation :
• the first term controls the global proximity of x̂ to the noisy image y (fidelity term);
• the second term controls the sparsity of the representations of the patches;
• last, the third term controls for each (i, j), the proximity of the patch Rij x̂ of our reconstruction
to the denoised patch Dαij .
The coefficients λ and µij set the balance between the importance given to the fidelity term and to
the sparsity constraints of the representations of the patches.
This non-convex problem is too difficult to be addressed in this form. This explains why the
article [9] suggests breaking it down into parts, and trying to minimize separately the different terms
of (12). This way, we are led to the K-SVD algorithm. Notice also a serious difference: the values of
µij are not required in the above implementation.
Without specifying values for µij, we cannot really address the problem of linking the minimiza-
tion of (12) and the suggested iterative method. Moreover, we do not understand why the authors
did not set only one weight µ rather than weights µij depending on the patches: one would have to
explain why the sparsity of certain patches is more important than that of others. If the µij are not equal,
then their determination is still a crucial point of the method that remains to be analyzed.
The alternation of the sparse coding step and of the dictionary update step makes the analysis of
the aforementioned energies difficult. On the one hand, the ORMP only provides an approximate solution. On the
other hand, in the sparse coding step, the constraints are formed by parts of the Frobenius norm
that is minimized in the dictionary update. For this reason, we want to insist on the fact that
the minimization of (12) is nothing but a possible interpretation of the K-SVD method. Of course,
solving the problem (12) directly is appealing, but seems for now out of reach.
The reader will notice that, at each of the K iterations of the first two steps, the algorithm
uses an SVD, thus explaining the name K-SVD. As stated in [1], the reference to K-means is not just
formal: in K-means, we do not allow sparse combinations of the atoms, but we try to optimize the
dictionary in such a way that the error committed by representing each observation with a single
atom of the dictionary is minimal.
Figure 2: Denoised images with separated channels (left), and then concatenated channels (right).
(σ = 25). The reader will notice that the denoising is better on the sky and the water surfaces.
In order to obtain the colors correctly, the algorithm previously described will be applied on
column vectors which are the concatenation of the R,G,B values. In this way, the algorithm will better
update the dictionary, because it is able to learn correlations which exist between color channels. An
example of color dictionary is shown in figure 3.
One can see the difference in figure 2. We remind the reader that from now on the size of columns
which represent images is 3N , and the size of columns which represent patches is 3n.
Unfortunately, even with this adaptation, non-negligible color artifacts are still present.
The authors of [14] justify these artifacts with the following statement: the previously described
algorithm tries to adapt the dictionary to all patches contained in the image. This need for universality
implies that the atoms of the dictionary tend to look like grayscale atoms. To correct these color
artifacts, [14] suggests modifying the metric used in the break condition of the ORMP. From now on
we use the metric inferred from the scalar product
$$\langle y, x\rangle_\gamma = y^{t}x + \frac{\gamma}{n^{2}}\, y^{t}J^{t}Jx \qquad (13)$$
instead of the Euclidean one, where J denotes the matrix of size 3n × 3n built
from three diagonal blocks of size n × n, full of 1, and where γ ≥ 0 is a parameter which needs to be
fixed. In other words, the new norm can be written as
$$\|x\|_\gamma^{2} = \|x\|^{2} + \gamma\, n\left(m_R(x)^{2} + m_G(x)^{2} + m_B(x)^{2}\right) \qquad (14)$$
where we denote by m_C(x) the average of x on the channel C (and where the Euclidean norm is
denoted by ‖·‖).
Thus, the new metric, through the parameter γ, puts more importance on the proximity of the mean
values of the patches on each channel.
Figure 3: Left: dictionary composed of patches extracted randomly from the “Castle” image, to
which a white Gaussian noise has been added. Right: dictionary obtained at the end of the color
version of K-SVD. The contrast is enhanced independently for each atom.
This color correction can be easily integrated in the ORMP thanks to the following equality:
$$I + \frac{\gamma}{n}J = \left(I + \frac{a}{n}J\right)^{t}\left(I + \frac{a}{n}J\right) \qquad (15)$$
where a > 0 is chosen so that γ = 2a + a². Thus we can write, for all vectors x,
$$\|x\|_\gamma = \left\|\left(I + \frac{a}{n}J\right)x\right\| . \qquad (16)$$
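Equality (15) can be checked directly: since each n × n block of ones squares to n times itself, we have J² = nJ, and J is symmetric, so that
$$\left(I + \frac{a}{n}J\right)^{t}\left(I + \frac{a}{n}J\right) = I + \frac{2a}{n}J + \frac{a^{2}}{n^{2}}J^{2} = I + \frac{2a + a^{2}}{n}J = I + \frac{\gamma}{n}J .$$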
Consequently, to work with the new metric, all columns have to be multiplied by (I + (a/n)J), and
we can then work again with the Euclidean norm. Nevertheless, we remind the reader that in the ORMP
all columns of the dictionary must be normalized, which is why a diagonal matrix D is introduced:
its elements are the inverses of the norms of the columns of (I + (a/n)J)D, and its size is k × k. Then (I + (a/n)J)DD
(the dictionary followed by this diagonal normalization) has normalized columns. Now the ORMP can be applied to obtain the β̂ij such that
$$\left(I + \frac{a}{n}J\right)R_{ij}y \;\approx\; \left(I + \frac{a}{n}J\right)D\,D\,\hat\beta_{ij}$$
for the Euclidean norm. In the sequel, if we denote α̂ij = Dβ̂ij (with the diagonal matrix D), we get
$$R_{ij}y \approx D\hat\alpha_{ij}$$
for the norm ‖·‖γ.
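A minimal sketch of this change of metric, reusing the ormp() function sketched earlier (the Eigen library and the function names are assumptions of this sketch, not the reference code):

#include <Eigen/Dense>
#include <cmath>

// Sparse coding of a color patch p (of size 3n) with the metric ||.||_gamma.
Eigen::VectorXd ormp_color(const Eigen::MatrixXd& D,   // 3n x k dictionary
                           const Eigen::VectorXd& p,   // noisy color patch R_ij y
                           double gamma, double eps, int L)
{
    const int n3 = int(D.rows()), n = n3 / 3, k = int(D.cols());
    const double a = std::sqrt(1.0 + gamma) - 1.0;     // so that gamma = 2a + a^2

    // x -> (I + (a/n) J) x : add (a/n) times the channel sum to every entry of the channel.
    auto applyJ = [&](const Eigen::VectorXd& x) {
        Eigen::VectorXd out = x;
        for (int c = 0; c < 3; ++c) {
            const double s = x.segment(c * n, n).sum();
            out.segment(c * n, n).array() += (a / n) * s;
        }
        return out;
    };

    // Transform and renormalize the dictionary (diagonal matrix D of the text).
    Eigen::MatrixXd Dt(n3, k);
    Eigen::VectorXd invnorm(k);
    for (int l = 0; l < k; ++l) {
        Dt.col(l) = applyJ(D.col(l));
        invnorm(l) = 1.0 / Dt.col(l).norm();
        Dt.col(l) *= invnorm(l);
    }

    // Euclidean ORMP on the transformed data, then go back to the original dictionary.
    Eigen::VectorXd beta = ormp(Dt, applyJ(p), eps, L);
    return invnorm.asDiagonal() * beta;                 // alpha_ij = D beta_ij
}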
One can notice the contribution of this color version in figure 4. Here again a new parameter γ
has appeared; it will be briefly discussed in the following part.
Figure 4: Denoising for σ = 30 with γ = 0 (left) and γ = 5.25 (right). Some color artifacts still remain, but the
denoising is slightly better in some areas when γ = 5.25, cf. figure 5.
Figure 5: detail of the same comparison; left, γ = 0; right, γ = 5.25.
Deduce α̂ij = Dβ̂ij (sparse too), which then verifies Rij y ≈ Dα̂ij for the norm ‖·‖γ.
Dictionary update
for l = 1, ..., k do
  Introduce ωl = { (i, j) | α̂ij(l) ≠ 0 }.
  for (i, j) ∈ ωl do
    Obtain the residue e^l_ij = Rij y − D̂α̂ij + d̂l α̂ij(l).
  end
  Put these column vectors together in a matrix El. The values α̂ij(l), for (i, j) ∈ ωl, are also assembled in a
  row vector denoted by α̂l.
  Update d̂l and α̂l as solutions of the minimization problem (8) restricted to ωl, computed via the truncated SVD of El.
end
• C : multiplier coefficient;
• K : number of iterations;
• λ : weight of the fidelity term in the reconstruction;
• k : number of atoms in the dictionary;
• √n : size of the patches;
• γ : parameter of the color metric.
The question is to pick the right values for the various parameters listed above, and to evaluate
their influence on the final result.
3.1 Influence of C
This parameter is used in the stopping condition of the ORMP. In order to understand the chosen
value, let us start with a clean patch x0 (where the length of the column is denoted by ñ = n
(resp. ñ = 3n) for grayscale (resp. color) images), to which a white Gaussian noise w is added to
obtain a noisy patch x. Then the ORMP tries to find a vector α as sparse as possible such that
$$\|x - D\alpha\|_2 \le \sqrt{\tilde n}\,C\sigma .$$
If the noise has a norm lower than √ñ Cσ, then x belongs to the sphere centered at x0 of radius
√ñ Cσ. If we assume that x0 is the only element of this sphere to have a sparse representation in
the dictionary D, then one can expect the ORMP to be able to find this x0. We therefore
ensure that the noise has a large probability of belonging to this sphere.
Thus the idea of [14] is to force
$$P\!\left(\|w\|_2 \le \sqrt{\tilde n}\,C\sigma\right) = 0.93 . \qquad (17)$$
Practically, the corresponding value of C is obtained by using the inverse of the distribution function
of the χ²(ñ) law.
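Explicitly, since ‖w‖₂²/σ² follows a χ²(ñ) law, condition (17) determines C as
$$P\!\left(\|w\|_2 \le \sqrt{\tilde n}\,C\sigma\right) = P\!\left(\frac{\|w\|_2^{2}}{\sigma^{2}} \le \tilde n\,C^{2}\right) = 0.93 \quad\Longleftrightarrow\quad C = \sqrt{\frac{F^{-1}_{\chi^{2}(\tilde n)}(0.93)}{\tilde n}}\;,$$
where F^{-1}_{χ²(ñ)} denotes the inverse of the distribution function of the χ²(ñ) law.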
Figure 6: σ = 10. From left to right and top to bottom: λ = 0, λ = 0.05, λ = 0.15, λ = 0.25.
Table 2 shows the comparison between the empirically obtained parameter (λe) and the theoret-
ically obtained parameter (λt).
In the end, the value finally kept for λ is the one given by (18).
λ=0 λ = 0.05 λ = 0.1 λ = 0.15 λ = 0.2 λ = 0.25 λ = 0.3
σ PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE
2 44.43 1.53 44.77 1.47 44.70 1.48 44.52 1.52 44.32 1.55 44.13 1.58 43.97 1.61
5 39.03 2.85 39.08 2.83 38.97 2.87 38.78 2.93 38.57 3.01 38.34 3.08 38.12 3.16
10 34.55 4.77 34.58 4.76 34.53 4.79 34.42 4.84 34.28 4.92 34.13 5.01 33.96 5.11
20 30.47 7.64 30.47 7.63 30.44 7.66 30.38 7.72 30.30 7.79 30.20 7.88 30.08 7.98
30 28.18 9.94 28.18 9.94 28.15 9.98 28.10 10.03 28.03 10.11 27.96 10.20 27.86 10.30
40 26.60 11.92 26.58 11.95 26.55 11.99 26.50 12.06 26.45 12.14 26.38 12.24 26.30 12.35
60 24.36 15.44 24.33 15.49 24.29 15.55 24.25 15.63 24.20 15.73 24.14 15.83 24.07 15.95
80 22.76 18.55 22.73 18.62 22.69 18.71 22.64 18.81 22.59 18.92 22.54 19.03 22.48 19.17
100 21.47 21.53 21.43 21.63 21.38 21.74 21.34 21.86 21.28 21.99 21.23 22.13 21.17 22.28
Table 1: In bold the best result for a given σ. Other parameters are fixed to: K = 15; √n = 5; γ = 5.25; k = 256.
Figure 7: σ = 30. From left to right and top to bottom: λ = 0, λ = 0.05, λ = 0.15, λ = 0.25.
λt λe
σ PSNR RMSE value PSNR RMSE value
5 38.83 2.92 0.0050 38.64 2.98 0.05
10 34.24 4.95 0.0078 34.10 5.03 0.05
20 29.85 8.20 0.012 29.84 8.21 0.05
30 27.64 10.58 0.013 27.68 10.54 0.05
40 26.08 12.66 0.014 26.10 12.63 0.0
60 24.05 16.00 0.018 24.00 16.08 0.0
80 22.78 18.52 0.017 22.80 18.46 0.0
100 21.88 20.54 0.019 21.84 20.63 0.0
Table 2: In bold the best result for a given σ. Other parameters are fixed to: K = 15; √n = 5;
γ = 5.25; k = 256.
Figure 8: σ = 80. From left to right and top to bottom: λ = 0, λ = 0.05, λ = 0.15, λ = 0.25.
The number of iterations K has to be large enough to reach, empirically, the convergence of the method. Indeed, when K is large enough, further iterations should
improve the dictionary only marginally. Depending on the convergence of the method (which can
change according to σ), one might assume that a huge number of iterations is needed in order to ensure the
best possible estimate. On the other hand, each iteration is really expensive in terms of processing
time. Thus, avoiding spurious iterations allows one to obtain a faster algorithm. In consequence, the
main goal is to find a good compromise between having enough iterations to obtain a result
close to the optimum and keeping a reasonable processing time.
Table 3 shows the PSNR and RMSE evolutions depending on the number of iterations.
One can notice that for σ ≥ 5 the PSNR converges, and the higher σ, the faster the convergence
of the PSNR. Thus it is possible to keep few iterations for high values of noise.
In order to better illustrate the speed of the PSNR convergence as a function of K and σ, figure 9
shows f(PSNR(i)) according to the number of iterations i, where f is defined by
$$f(x_i) = \frac{x_i - x_0}{x_m}, \qquad \text{with } x_m = \max_i\,(x_i - x_0).$$
Figure 9: Normalized PSNR f(PSNR(i)) as a function of the number of iterations i (from 1 to 20), for σ = 2, 5, 10, 20, 30, 40, 60, 80 and 100.
In the following, the number of iterations will therefore be fixed to K = 15, regardless of σ.
σ=2 σ=5 σ = 10 σ = 20 σ = 30 σ = 40 σ = 60 σ = 80 σ = 100
K PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE
1 44.26 1.56 38.13 3.16 33.78 5.22 29.57 8.47 27.19 11.15 25.67 13.27 23.24 17.56 21.61 21.17 20.32 24.56
2 44.38 1.54 38.41 3.06 34.18 4.98 30.09 7.97 27.80 10.39 26.26 12.40 23.92 16.23 22.38 19.38 21.11 22.44
3 44.47 1.52 38.56 3.01 34.34 4.89 30.24 7.84 27.96 10.20 26.39 12.22 24.06 15.98 22.51 19.10 21.24 22.11
4 44.49 1.52 38.63 2.98 34.39 4.86 30.29 7.80 28.00 10.14 26.44 12.14 24.11 15.88 22.55 19.02 21.28 22.00
5 44.53 1.51 38.65 2.98 34.41 4.85 30.30 7.79 28.02 10.12 26.47 12.11 24.14 15.83 22.58 18.95 21.31 21.93
6 44.54 1.51 38.67 2.97 34.42 4.84 30.32 7.77 28.04 10.11 26.48 12.09 24.15 15.81 22.58 18.94 21.32 21.89
7 44.54 1.51 38.69 2.96 34.42 4.84 30.33 7.77 28.04 10.10 26.49 12.08 24.16 15.79 22.59 18.91 21.34 21.86
8 44.53 1.51 38.70 2.96 34.44 4.84 30.33 7.76 28.06 10.08 26.50 12.07 24.18 15.76 22.60 18.90 21.34 21.84
9 44.57 1.51 38.71 2.96 34.45 4.83 30.35 7.75 28.07 10.07 26.51 12.05 24.19 15.75 22.61 18.89 21.35 21.83
10 44.59 1.50 38.72 2.95 34.45 4.83 30.36 7.74 28.08 10.06 26.52 12.04 24.19 15.74 22.61 18.87 21.35 21.82
11 44.39 1.54 38.73 2.95 34.46 4.82 30.37 7.73 28.08 10.05 26.53 12.03 24.19 15.73 22.61 18.87 21.35 21.82
12 44.41 1.53 38.73 2.95 34.47 4.82 30.38 7.72 28.09 10.05 26.53 12.02 24.20 15.72 22.61 18.87 21.35 21.82
13 44.65 1.49 38.73 2.95 34.48 4.81 30.39 7.71 28.10 10.04 26.54 12.00 24.20 15.71 22.62 18.85 21.35 21.81
14 44.57 1.51 38.75 2.94 34.48 4.81 30.40 7.70 28.10 10.03 26.54 12.00 24.21 15.71 22.62 18.85 21.35 21.81
15 44.40 1.53 38.75 2.94 34.49 4.80 30.40 7.70 28.11 10.02 26.55 11.99 24.21 15.71 22.63 18.84 21.35 21.82
16 44.45 1.53 38.75 2.94 34.50 4.80 30.41 7.69 28.12 10.01 26.55 11.99 24.21 15.70 22.63 18.83 21.35 21.82
17 44.35 1.54 38.75 2.95 34.51 4.80 30.42 7.68 28.12 10.00 26.55 11.99 24.21 15.69 22.64 18.82 21.35 21.82
18 44.36 1.54 38.76 2.94 34.51 4.80 30.42 7.68 28.13 10.00 26.56 11.98 24.22 15.69 22.64 18.82 21.36 21.81
19 44.55 1.51 38.77 2.94 34.52 4.79 30.42 7.68 28.13 9.99 26.56 11.98 24.22 15.68 22.64 18.81 21.36 21.81
20 44.59 1.50 38.77 2.94 34.53 4.79 30.43 7.67 28.13 9.99 26.56 11.98 24.22 15.68 22.64 18.81 21.36 21.81
Table 3: Other parameters are fixed to: √n = 5; γ = 5.25; λ = 0.15; k = 256.
Table 4: In bold the best result for a given σ. Other parameters are fixed to: K = 15; √n = 5;
γ = 5.25; λ = 0.15.
According to this table one can see that it might be interesting to choose larger sizes for the
dictionary for relatively small noise (σ ≤ 30), and smaller sizes for high noise (σ ≥ 60). Although
this parameter has an influence on the processing time, it remains relatively flexible according to
PSNR results. In the following, this parameter will therefore be fixed to k = 256.
3.6 Influence of the Size of the Patches √n
The size of the patches has a huge influence on the final result, and we can gain several decibels in
PSNR by choosing an appropriate n. As for most patch-based denoising methods, the best results
are obtained by working with relatively big patches, as seen in table 6.
Similarly to other patch-based denoising methods (for example BM3D), it is necessary to increase
the size of the patches when the noise increases.
Despite the fact that, according to the PSNR/RMSE results, it seems better to take relatively small
patches (√n = 5 or 7) for small values of noise, we have to take the visual result into consideration.
Visual results for several values of the noise and for all studied patch sizes are shown in figures
10, 11, 12, and 13.
One can notice that visually the choice is not so easy. Too small patches give huge artifacts and
lead to many low-frequency fluctuations, while with big patches almost all details are lost: we get a
visually nicer, but completely blurred, image.
γ = 3.5 γ = 4.5 γ = 4.75 γ=5 γ = 5.25 γ = 5.5 γ = 5.75 γ=6 γ=7
σ PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE
2 44.24 1.56 44.67 1.49 44.66 1.49 44.64 1.49 44.66 1.49 44.58 1.50 44.67 1.49 44.64 1.49 44.66 1.49
5 38.75 2.94 38.77 2.94 38.75 2.94 38.75 2.94 38.77 2.94 38.76 2.94 38.77 2.94 38.77 2.94 38.74 2.95
10 34.48 4.81 34.50 4.80 34.47 4.82 34.49 4.81 34.50 4.80 34.49 4.81 34.49 4.81 34.49 4.81 34.51 4.80
20 30.39 7.71 30.39 7.71 30.39 7.71 30.37 7.72 30.35 7.74 30.37 7.73 30.36 7.73 30.37 7.72 30.36 7.73
30 28.10 10.04 28.08 10.05 28.09 10.04 28.10 10.04 28.09 10.05 28.07 10.06 28.08 10.06 28.09 10.05 28.07 10.07
40 26.53 12.02 26.52 12.04 26.51 12.06 26.52 12.03 26.50 12.06 26.54 12.01 26.50 12.06 26.51 12.04 26.51 12.04
60 24.29 15.56 24.27 15.59 24.28 15.57 24.27 15.60 24.26 15.62 24.24 15.64 24.27 15.60 24.29 15.55 24.26 15.61
80 22.68 18.72 22.70 18.69 22.68 18.72 22.64 18.81 22.69 18.71 22.67 18.75 22.69 18.71 22.66 18.77 22.68 18.73
100 21.40 21.70 21.36 21.79 21.37 21.78 21.38 21.74 21.37 21.78 21.38 21.75 21.37 21.79 21.37 21.79 21.36 21.79
Table 5: In bold the best result for a given σ. Other parameters are fixed to: K = 15; √n = 5; λ = 0.15; k = 256.
√n = 3      √n = 5      √n = 7      √n = 9      √n = 11     √n = 13     √n = 15
σ PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE
5 38.52 3.02 39.07 2.84 38.87 2.90 38.55 3.01 38.42 3.06 38.12 3.17 36.80 3.68
10 33.87 5.16 34.65 4.72 34.45 4.83 34.18 4.98 33.89 5.15 33.62 5.31 33.37 5.47
20 29.27 8.77 30.52 7.59 30.32 7.77 30.04 8.02 29.74 8.31 29.48 8.56 29.26 8.77
30 26.46 12.12 28.19 9.93 28.11 10.03 27.78 10.41 27.48 10.78 27.20 11.12 26.99 11.41
40 24.40 15.36 26.56 11.99 26.55 12.00 26.24 12.42 25.90 12.92 25.60 13.38 25.35 13.77
60 21.51 21.42 24.42 15.33 24.66 14.91 24.37 15.41 24.06 15.98 23.69 16.67 23.41 17.21
80 19.30 27.63 22.72 18.64 23.33 17.39 23.15 17.74 22.87 18.32 22.59 18.92 22.28 19.60
100 17.56 33.78 21.47 21.53 22.39 19.37 22.38 19.38 22.13 19.98 21.84 20.63 21.56 21.31
Table 6: In bold the best result for a given σ. Other parameters are fixed to : K = 15; k = 256; γ = 5.25; λ = 0 if σ > 0, 0.05 otherwise.
Figure 10: σ = 10. Noisy image (top left) and denoised results for √n = 3, 5, 7, 9, 11, 13, 15.
Figure 11: σ = 30. Noisy image (top left) and denoised results for √n = 3, 5, 7, 9, 11, 13, 15.
Figure 12: σ = 60. Noisy image (top left) and denoised results for √n = 3, 5, 7, 9, 11, 13, 15.
Figure 13: σ = 100. Noisy image (top left) and denoised results for √n = 3, 5, 7, 9, 11, 13, 15.
In conclusion, a compromise has to be found, which cannot be chosen only according to the
PSNR/RMSE results, but must also take the visual aspect into account. The values of √n which will
therefore be kept are:

σ :    0 < σ ≤ 20    20 < σ ≤ 60    60 < σ
√n :        5              7            9
Table 7: In bold the best result for a given σ. Other parameters are fixed to: K = 15; √n = 5;
γ = 5.25; k = 256; λ = 0.15.
One may think that the initialization of the dictionary is quite important (because we run the
algorithm with a small number of iterations, so the maximum is not reached), since depending on the
initialization we observe variations of more than 0.1 dB. But when σ increases, one observes less variation
in the results. An explanation might be that the number of iterations K is then more appropriate,
so we are close to optimality, and the initialization is not really crucial.
In conclusion, the initialization of the dictionary is not crucial, and the initialization by taking
random patches from the noisy image is quite good.
Even though we obviously cannot reduce the size of the image, and cannot modify the size of the patches
without severely damaging the final result, it is still possible to reduce the number of patches used
during the training of the dictionary, by applying the following principle:
1. The set of patches is built on the whole image;
2. Keep one patch out of T to build a T times smaller patch set;
3. Apply the loop of the ORMP and of the dictionary update by SVD K times on this subset,
in order to obtain a final dictionary Df;
4. Then apply the whole algorithm, with only one iteration, on the initial full set of patches,
using the dictionary Df obtained previously.
With this simple trick it is then possible to divide the processing time by slightly less than T.
Before applying this trick, we have to determine its impact on the final result, in order to find the
most appropriate value of T for each σ.
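A minimal sketch of this acceleration, reusing the ksvd_denoise_patches() function sketched earlier (the data layout is again an assumption of this sketch, not that of the reference code):

#include <Eigen/Dense>

// Train the dictionary on one patch out of T, then run one full iteration.
Eigen::MatrixXd fast_ksvd_patches(const Eigen::MatrixXd& Patches,  // n x Np, all patches
                                  Eigen::MatrixXd& D,              // initial dictionary
                                  int K, int T, double eps, int L)
{
    // 1-2) keep one patch out of T
    const int Np = int(Patches.cols()), Nsub = (Np + T - 1) / T;
    Eigen::MatrixXd Sub(Patches.rows(), Nsub);
    for (int t = 0, q = 0; t < Np; t += T, ++q) Sub.col(q) = Patches.col(t);

    // 3) K iterations of sparse coding + dictionary update on the sub-set -> D_f
    ksvd_denoise_patches(Sub, D, K, eps, L);

    // 4) one iteration of the whole algorithm on the full set of patches, starting from D_f
    return ksvd_denoise_patches(Patches, D, 1, eps, L);
}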
We have seen during the study of the parameters that the PSNR result for σ = 2 is highly
chaotic depending on the number of iterations K. For that reason we do not present results for this
particular value of noise.
Table 8 shows a summary for some values of T.
This study shows that it is possible to greatly reduce the processing time of this method while
keeping a result close to that of the original method.
According to the obtained results, it seems reasonable to take T = 16 for σ ≤ 40 and T = 8 for
σ > 40.
In order to help the readers form their own idea of the gain in processing
time brought by this trick, table 9 shows the processing time in seconds for a 512 × 512 × 3 image on an i5
processor with 8 GB of RAM.
Thanks to this trick, we obtain reasonable processing times for σ ≥ 10. Moreover, we can decrease
this time to 112 seconds (resp. 42 s) for σ = 5 (resp. σ = 10) by taking T = 32, without decreasing
the PSNR. But we cannot decrease the processing time further, because we have to process a single
iteration on the full set of patches, which is mainly responsible for the processing time.
One can be surprised by the fact that the processing time decreases with respect to σ. But it
can be easily explained:
• For very small values of noise, it is quite complex to get a sparse representation of the patches
since they are very different from one another. Then at the end of the ORMP we have to
process a large matrix;
• On the contrary, for very high noise, the signal is covered by the noise and the patches are very
similar. Thus it is easier to get a sparse representation of them, and at the end of the
ORMP the matrix is much smaller.
T =1 T =2 T =4 T =8 T = 12 T = 16 T = 20
σ PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE
5 38.84 2.91 38.86 2.91 38.89 2.90 38.90 2.89 38.89 2.90 38.91 2.89 38.92 2.88
10 34.47 4.82 34.49 4.81 34.51 4.80 34.49 4.81 34.52 4.79 34.55 4.78 34.50 4.80
20 30.28 7.80 30.31 7.78 30.28 7.80 30.31 7.78 30.29 7.80 30.29 7.80 30.31 7.78
30 28.05 10.09 28.05 10.09 28.04 10.10 28.04 10.10 28.04 10.10 28.01 10.13 28.01 10.14
40 26.61 11.91 26.63 11.88 26.63 11.88 26.59 11.94 26.59 11.94 26.56 11.98 26.58 11.96
60 24.69 14.87 24.65 14.92 24.65 14.92 24.60 15.02 24.56 15.08 24.56 15.08 24.52 15.15
80 23.28 17.47 23.23 17.59 23.22 17.61 23.14 17.75 23.07 17.90 23.00 18.06 23.03 17.98
100 22.28 19.61 22.25 19.67 22.18 19.84 22.12 19.98 22.07 20.09 22.04 20.15 22.00 20.25
Table 8: In bold the best result for a given σ. Parameters are fixed to: K = 15; √n = 7; γ = 5.25; k = 256; λ = 0.05.
σ              5     10    20    30    40    60    80    100
T = 1         1306   446   213   165   152   141   138   137
T tabulated    140    53    28    23    22    29    29    28

Table 9: Processing times (in seconds) for a 512 × 512 × 3 image, with T = 1 and with the tabulated values of T.
Moreover, results for K-SVD will be shown both for the variant which gives the best PSNR results
(named K-SVD 1 in the following) and for the one which gives better visual results (K-SVD 2)6.
The following study has been carried out on the following noise-free color image (σ_real ≪ 1). All algorithms
have been run on the same noisy images, obtained from noiseless images (saved with real values
and not quantized on [0, 255]):
5.2 Images
In addition to the PSNR/RMSE results, it is really interesting to compare these methods visually.
The results for σ = 20 are shown in figure 14.
6 Conclusion
In this article, we have proposed a detailed analysis of the K-SVD algorithm, already introduced
in the articles [9] and [14]. Through this explanation, we showed why we could expect remarkable
denoising results from this algorithm. But we also noticed immediately the difficulty of the related
optimization problems.
Numerically, we have observed the stability of this method, but we have also pointed out its
heavy computational cost. In spite of these drawbacks, our experiments have clarified the impact
of the different parameters on the result, and thus we have proposed reliable values to tune some of
them. Moreover, we have shown some denoising experiments which prove that the K-SVD method leads
to good results, both in terms of PSNR values and of visual quality. The skeptical reader can pursue
our experiments by applying the proposed demo to the images of their choice. Finally, the
suggested modification (taking into account only a subset of the patches of the image) seems to give
similar results with an interesting reduction of the execution time.
In conclusion, the K-SVD method can be considered to be part of the state of the art. But, above
all, it has to be seen as a first successful use of dictionary learning to address an image processing task.
The more recent algorithms of this field, in particular those which replace the l0-sparsity constraint
by an l1 constraint (cf. [12]), seem very promising. They lead to a great gain in computational time,
and therefore allow one to handle bigger images.
6. To know the difference, please see the study on the influence of the size of the patches √n.
TV denoising NL-means DCT denoising K-SVD 1 K-SVD 2 NL-Bayes BM3D
σ PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE PSNR RMSE
5 35.57 4.25 37.37 3.45 38.92 2.89 39.04 2.85 39.06 2.84 39.84 2.60 39.53 2.69
10 31.61 6.70 33.44 5.43 34.25 4.94 34.60 4.75 34.59 4.75 35.26 4.40 35.13 4.47
20 28.09 10.05 29.70 8.35 29.92 8.14 30.46 7.65 30.48 7.63 30.91 7.26 31.00 7.19
30 26.19 12.50 27.07 11.30 27.58 10.65 28.17 9.95 28.14 9.99 28.57 9.51 28.74 9.32
40 24.94 14.44 25.57 13.43 26.14 12.58 26.72 11.76 26.70 11.79 27.05 11.33 27.09 11.27
60 23.31 17.42 23.36 17.32 24.19 15.74 24.66 14.91 24.57 15.07 24.87 14.56 25.39 13.71
80 22.26 19.66 21.83 20.66 22.96 18.14 23.40 17.24 23.31 17.42 23.60 16.85 24.21 15.71
100 21.52 21.41 20.85 23.12 22.08 20.07 22.45 19.23 22.05 20.14 22.70 18.69 23.21 17.62
Table 10: Results of the methods.
Figure 14: σ = 20. Top: NL-means (left), TV denoising (right); middle: K-SVD 1 (left), K-SVD 2 (right); bottom: BM3D (left), NL-Bayes (right).
Acknowledgment
The authors are grateful to Julien Mairal and Jean-Michel Morel for their help and advice.
Glossary
Global Notations
· x: generic notation for an image;
· x0 : clean image;
· N : number of pixels of x0 ;
· Ñ : is equal to N (resp. 3N ) for a grayscale (resp. color) image;
· w: white Gaussian noise which is added to x0 ;
· σ: noise standard deviation;
· y: noisy image: y = x0 + w;
· x̂: denoised image obtained after applying the algorithm;
· xˆλ : (in the paragraph 3.2) final denoised image obtained after applying the algorithm with the
parameter λ;
· (i, j): position of a generic pixel in the image x;
· n: total number of pixels in a patch. As we are working with square patches, n is a perfect
square;
· Np : number of patches of size √n × √n contained in the image x;
· Rij : matrix of size n × N extracting the square patch of size √n × √n
whose top-left pixel has coordinates (i, j). Columns of Rij are indexed by the pixels of x;
· D: generic notation for a dictionary;
· dl : column of index l (1 ≤ l ≤ k) of the dictionary D;
· k: number of atoms in the dictionary;
· αij : generic notation for the representation of the patch Rij x in the dictionary: Rij x ≈ Dαij ;
· α: matrix whose columns are formed by the αij . The columns of α are thus indexed by (i, j) and the
matrix has as many columns as there are patches of size √n × √n in the image x;
· D̂: current dictionary (updated at each iteration of the algorithm);
· K: number of iterations of the algorithm;
· Dinit : initial dictionary;
· αˆij : current representation of the patch Rij x in D̂ (updated for each iteration of the algorithm);
· α̂: matrix whose columns are αˆij ;
· λ: weighting of kx − yk22 in the minimization problem (12). This coefficient is used during the
reconstruction step;
· µij : weighting of kαij k0 in the minimization problem (12). This coefficient is not explicitly
used in the algorithm;
· C: thanks to this coefficient, the l2 norm on n pixels of a white Gaussian noise whose standard
deviation is σ is lower than √n Cσ with probability 0.93. This coefficient is used in the
break condition of the ORMP;
· dˆl : column of the dictionary whose index is l, (1 ≤ l ≤ k);
· αˆij (l): coefficient of αˆij of index l. It matches to the weighting of the atom d̂l in the represen-
tation of the patch Rij x;
· elij = Rij y − D̂α̂ij + d̂l α̂ij (l): residue corresponding to the atom l and the patch Rij y (it is a
column vector whose size is n);
· El : matrix grouping the residues elij together;
· (U, ∆, V ): singular value decomposition of El ;
· ωl : set of all indices (i, j) such that α̂ij (l) ≠ 0;
· I: identity square matrix whose size is N × N ;
· γ: parameter of the new metric of the ORMP for the color processing;
· x, y: generic notations for column vectors whose size is ñ;
· α: generic notation for the representation of a vector x in the dictionary D: x ≈ Dα;
· J: square matrix whose size is 3n × 3n, built with three blocks of size n × n full of 1;
· mC (x): average of x in the channel C;
· I: square identity matrix, whose size is 3n × 3n; √
· a: positive solution of γ = 2a + a². Then we get a = √(1 + γ) − 1;
· ñ: ñ = n (resp. ñ = 3n) if we are working on grayscale (resp. color) images;
· β̂ij : result of the ORMP for the current representation of the color patch Rij x in D̂ with the
metric ‖ · ‖γ ;
· D: diagonal matrix containing the inverses of the norms of the columns of (I + (a/n)J)D.
· max_iter: maximal number of authorized iterations (fixed to 100 in the C++ code);
· ε: tolerance threshold controlling the break condition of the SVD (fixed to 10⁻⁶ in the C++
code).
References
[1] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An Algorithm for Designing Overcomplete
Dictionaries for Sparse Representation. IEEE Transactions on signal processing, 54(11):4311–4322,
2006. http://dx.doi.org/10.1109/TSP.2006.881199.
[2] G. Allaire and S. M. Kaber. Algèbre Linéaire Numérique. Ellipses, Paris, 2002.
ISBN:2729810013.
[3] A. Buades, B. Coll, and J.M. Morel. A non local algorithm for image denoising. IEEE Computer
Vision and Pattern Recognition, 2:60–65, 2005. http://dx.doi.org/10.1109/CVPR.2005.38.
[4] A. Chambolle. An algorithm for total variation minimization and applications. Journal of
Mathematical Imaging and Vision, 20:89–97, 2004. http://dx.doi.org/10.1023/B:JMIV.0000011325.36760.1e.
[5] S.F. Cotter, R. Adler, R.D. Rao, and K. Kreutz-Delgado. Forward sequential algorithms for best
basis selection. In Vision, Image and Signal Processing, IEE Proceedings, volume 146, pages
235–244, 1999. http://dx.doi.org/10.1049/ip-vis:19990445.
[6] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3D transform-
domain collaborative filtering. IEEE Transactions on image processing, 16(8):2080–2095, 2007.
http://dx.doi.org/10.1109/TIP.2007.901238.
[7] G. Davis, S. Mallat, and M. Avellaneda. Adaptive greedy approximations. Journal of construc-
tive Approximation, 13:57–98, 1997. http://dx.doi.org/10.1007/BF02678430.
[8] D. Donoho and I. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81:425–
455, 1993. http://dx.doi.org/10.1093/biomet/81.3.425.
[9] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned
dictionaries. IEEE Transactions on image processing, 15(12):3736–3745, 2006.
http://dx.doi.org/10.1109/TIP.2006.881969.
[10] M. Lebrun, A. Buades, and J.M. Morel. Implementation of the Non-Local Bayes image denoising.
Image Processing On Line, http://www.ipol.im/, 2011.
[12] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse
coding. Journal of Machine Learning Research, 11:19–60, 2010.
[13] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Non-local sparse models for image
restoration. In ICCV’09, pages 2272–2279, 2009. http://dx.doi.org/10.1109/ICCV.2009.5459452.
[14] J. Mairal, M. Elad, and G. Sapiro. Sparse representation for color image restoration. IEEE
Transactions on image processing, 17(1):53–69, 2008. http://dx.doi.org/10.1109/TIP.2007.911828.
[15] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions
on signal processing, 41(12):3397–3415, December 1993. http://dx.doi.org/10.1109/78.258082.
[16] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms.
Phys. D, 60:259–268, 1992. http://dx.doi.org/10.1016/0167-2789(92)90242-F.
[17] J.L. Starck, E.J. Candès, and D.L. Donoho. The curvelet transform for image denoising. IEEE
Transactions on image processing, 11:670–684, 2002. http://dx.doi.org/10.1109/TIP.2002.1014998.
[18] L.P. Yaroslavsky. Local adaptive image restoration and enhancement with the use of DFT and
DCT in a running window. In Proceedings of SPIE, volume 2825, pages 2–13, 1996.
http://dx.doi.org/10.1007/3-540-76076-8_114.
[19] L.P. Yaroslavsky, K.O. Egiazarian, and J.T. Astola. Transform domain image restoration
methods: review, comparison, and interpretation. In Society of Photo-Optical Instrumen-
tation Engineers (SPIE) Conference Series, volume 4304, pages 155–169, May 2001.
http://dx.doi.org/10.1117/12.424970.