Dictionary Learning Algorithms and Applications
Bogdan Dumitrescu
Department of Automatic Control and Systems Engineering
Faculty of Automatic Control and Computers
University Politehnica of Bucharest
Bucharest, Romania

Paul Irofti
Department of Computer Science
Faculty of Mathematics and Computer Science
University of Bucharest
Bucharest, Romania
This Springer imprint is published by the registered company Springer International Publishing AG, part of Springer Nature.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To our families:
Adina, Andrei, Sebastian
Ralu, Paul, Marta
Preface
This book revolves around the question of designing a matrix D ∈ R^{m×n}, called a dictionary, such that good sparse representations y ≈ Dx are obtained for a class of signals y ∈ R^m given through a set of samples. The (also unknown) representation vectors x ∈ R^n are sparse, meaning that they have only a few nonzero coefficients. So, each signal is a linear combination of a few atoms, as the dictionary columns are usually named.
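As a quick illustration of the sparse model, the following NumPy sketch builds a random dictionary with unit-norm atoms and a signal that combines only a few of them; all dimensions and the random draws are arbitrary choices made here for illustration, not taken from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 8, 16, 3          # signal length, number of atoms, sparsity level

D = rng.standard_normal((m, n))
D /= np.linalg.norm(D, axis=0)        # atoms (columns) normalized to unit norm

x = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x[support] = rng.standard_normal(s)   # only s nonzero coefficients

y = D @ x                             # the signal: a combination of s atoms
```

Here y lies exactly in the span of the s selected atoms; for real signals the relation y ≈ Dx holds only approximately, which is what makes both the representation and the dictionary design nontrivial.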
Sparse representations have a great appeal in signal processing and related
applications due to their parsimony, which allows capturing the essentials of a
signal. This has physical roots in the workings of the human visual system, for
example, but is also a convenient model for signals that may lack an analytic
model or have a very complicated one. The applications include image denoising
or inpainting, compression, compressed sensing, remote sensing, classification, and
others.
The dictionary learning problem already has 20 years of history, with a slow start followed by a boom about 10 years ago, when several of the important algorithms were designed. The pace of research is still high, focusing especially on variations of the basic design problem and on applications. Our main goal in this book is to give a systematic view of the field, insisting on the relevant ways of posing the dictionary learning problem and on the most successful solutions. Our coverage of applications is purposely restricted to a small number of examples meant to open the road for the interested reader.
In a strict sense, dictionary learning can be seen as a difficult and large optimization problem, where the sparse representation error is minimized in the least squares sense. We are usually interested in finding a good local optimum and not necessarily the global optimum. This is because the design is based on an inherently imperfect set of training signals; also, the optimized dictionary most often has to be used in an application where the objective is hard to express directly in terms of the dictionary. However, in some cases, a prominent example being classification, the objective can be amended such that the final goal is also considered; in such cases, a very good solution is extremely important.
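In its basic least squares form, this optimization problem can be sketched as follows; this is the standard formulation, with the training matrix Y, coefficient matrix X, and sparsity level s being notation assumed here and introduced properly later in the book:

```latex
\min_{D,X}\ \|Y - DX\|_F^2
\quad \text{s.t.}\quad \|x_\ell\|_0 \le s,\ \ell = 1,\dots,N,
\qquad \|d_j\|_2 = 1,\ j = 1,\dots,n
```

where Y ∈ R^{m×N} collects the N training signals as columns, X ∈ R^{n×N} holds their sparse representations, and each dictionary atom d_j is constrained to unit norm to remove the scaling ambiguity between D and X.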
Acknowledgements

This book started with a short Erasmus course on dictionary learning given by BD at Tampere University of Technology in 2016. We are indebted to Cristian Rusu, who was the first in our group to work on dictionary learning problems for his PhD thesis; many discussions over several years helped our understanding of the topic. We thank our office mate and friend Florin Stoican for his contribution to the figures related to the water network application from Chap. 8. Finally, we congratulate each other for the patience with which this writing enterprise has been undertaken, sacrificing some of our more immediate research interests.
Contents

1 Sparse Representations
1.1 The Sparse Model
1.2 Algorithms
1.3 Orthogonal Matching Pursuit
1.4 Algorithms for Basis Pursuit: FISTA
1.5 Guarantees
1.6 The Choice of a Dictionary: Fixed vs Learned
2 Dictionary Learning Problem
2.1 The Optimization Problem
2.2 An Analysis of the DL Problem
2.3 Test Problems
2.3.1 Representation Error
2.3.2 Dictionary Recovery
2.4 Applications: A Quick Overview
2.4.1 Denoising
2.4.2 Inpainting
2.4.3 Compression
2.4.4 Compressed Sensing
2.4.5 Classification
3 Standard Algorithms
3.1 Basic Strategy: Alternating Optimization
3.2 Sparse Coding
3.3 Simple Descent Methods
3.3.1 Gradient Descent
3.3.2 Coordinate Descent
3.4 Method of Optimal Directions (MOD)
3.5 K-SVD
3.6 Parallel Algorithms
3.7 SimCO
3.8 Refinements
References
Index