
Dictionary Learning Algorithms and Applications

Bogdan Dumitrescu • Paul Irofti

Bogdan Dumitrescu
Department of Automatic Control and Systems Engineering
Faculty of Automatic Control and Computers
University Politehnica of Bucharest
Bucharest, Romania

Paul Irofti
Department of Computer Science
Faculty of Mathematics and Computer Science
University of Bucharest
Bucharest, Romania

ISBN 978-3-319-78673-5
ISBN 978-3-319-78674-2 (eBook)


https://doi.org/10.1007/978-3-319-78674-2

Library of Congress Control Number: 2018936662

© Springer International Publishing AG, part of Springer Nature 2018


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer International Publishing AG part
of Springer Nature.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To our families:
Adina, Andrei, Sebastian
Ralu, Paul, Marta
Preface

This book revolves around the question of designing a matrix D ∈ R^{m×n}, called a dictionary, such that good sparse representations y ≈ Dx are obtained for a class of signals y ∈ R^m given through a set of samples. The (also unknown) representation vectors x ∈ R^n are sparse, meaning that they have only a few nonzero coefficients. So, each signal is a linear combination of a few atoms, as the dictionary columns are usually named.
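To fix ideas, here is a toy numerical illustration of the model in Matlab, the language of our implementations (a minimal sketch with random data and assumed sizes, not an excerpt from the book's programs):

m = 8; n = 16; s = 3;             % signal length, number of atoms, sparsity level
D = randn(m, n);
D = D ./ vecnorm(D);              % normalize the atoms (columns) to unit norm
x = zeros(n, 1);
x(randperm(n, s)) = randn(s, 1);  % representation with only s nonzero coefficients
y = D * x;                        % the signal: a linear combination of s atoms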
Sparse representations have a great appeal in signal processing and related
applications due to their parsimony, which allows capturing the essentials of a
signal. This has physical roots in the workings of the human visual system, for
example, but is also a convenient model for signals that may lack an analytic
model or have a very complicated one. The applications include image denoising
or inpainting, compression, compressed sensing, remote sensing, classification, and
others.
The dictionary learning problem already has 20 years of history, with a slow start
followed by a boom about 10 years ago, when several of the important algorithms
were designed. The pace of research is still high, focusing especially on variations of
the basic design problem and on applications. Our main goal in this book is to give a
systematic view of the field, insisting on the relevant ways of posing the dictionary
learning problem and on the most successful solutions. Our coverage of applications is purposely restricted to a small number of examples, meant to open the road for
the interested reader.
In a strict sense, dictionary learning can be seen as a difficult and large
optimization problem, where the sparse representation error is minimized in the least squares sense. We are usually interested in finding a good local optimum and not
necessarily the global optimum. This is because the design is based on an inherently
imperfect set of training signals; also, the optimized dictionary has to be used most
often in an application where the objective is hard to express directly in terms of the
dictionary. However, in some cases, a prominent example being classification, the
objective can be amended such that the final goal is also considered; in such cases,
a very good solution is extremely important.
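Stated compactly, with the N training signals collected as the columns of a matrix Y ∈ R^{m×N} and their representations as the columns of X ∈ R^{n×N}, the basic problem has the standard form (the per-signal sparsity level s is made explicit here)

    min_{D,X}  ||Y - DX||_F^2
    s.t.  ||x_l||_0 ≤ s,  l = 1:N
          ||d_j||_2 = 1,  j = 1:n

where ||·||_F is the Frobenius norm, ||x_l||_0 counts the nonzero entries of the l-th representation, and the atoms d_j are normalized to remove the scaling ambiguity between D and X.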


Without investigating the problem from a pure optimization viewpoint, the algorithms are the main end product of our presentation. We start from the defining
idea of an algorithm, trying to emphasize its place among related ideas, then
give the technical details that make it efficient and ready for implementation,
and finally describe the algorithm in a pseudocode that should be sufficiently
clear for reproducibility. We also discuss properties of the algorithms, particularly complexity, but in a rather light manner. This pragmatic view is justified
by the reality that dictionary learning algorithms are validated practically in
applications and not theoretically. We present some experimental evidence that
should give a sense of how well an algorithm solves a particular flavor of the
problem.
This book is written for a reader who is generally knowledgeable in applied
mathematics, optimization, and programming languages, for example at master's level in computer science, electrical engineering, or related fields. Some
experience with sparse representations surely helps but is not strictly necessary.
The book can be the support for a graduate course, since most of the notions
are gently introduced. There are also many pointers for continuing the exploration of the dictionary learning topic; hence the book is useful for those who
start research in this direction. Finally, the book aims to serve also those who
are more advanced and simply need reference descriptions of a few selected
algorithms.
We give now an overview of the book. Chapter 1 is an introduction to the sparse
representations world, focusing on the two successful computational approaches:
Orthogonal Matching Pursuit as a representative of greedy algorithms and FISTA as
a representative of convex relaxations.
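To give a taste of the greedy approach, here is a minimal OMP sketch in Matlab (our own illustrative code with assumed names, not the pseudocode from the book):

function x = omp_sketch(D, y, s)
% Greedy sparse coding: select s atoms of D that best explain the signal y
n = size(D, 2);
S = [];                        % support: indices of the selected atoms
r = y;                         % current residual
for k = 1 : s
    [~, j] = max(abs(D' * r)); % atom most correlated with the residual
    S = [S, j];
    xS = D(:, S) \ y;          % least squares fit of y on the current support
    r = y - D(:, S) * xS;      % recompute the residual
end
x = zeros(n, 1);
x(S) = xS;
end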
Chapters 2 and 3 are the foundation of the book, the former discussing the basic
form of the dictionary learning problem, illustrated with a quick tour of the main
applications, and the latter presenting the most important algorithms for solving it.
Here the well-known MOD and K-SVD algorithms are introduced. The presentation follows the logic of the optimization techniques that are employed. Coordinate descent takes the largest share, its success owing to its simplicity, which allows tackling a problem as large as dictionary learning with sometimes surprising efficiency. Gradient descent also plays its part, but the resulting algorithms are slower. Simple variations of the algorithms, like those using linearization around the current point or parallel update of the atoms, sometimes bring unexpectedly good results.
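For instance, the dictionary update step of MOD has a simple closed form; with Y the matrix of training signals and X the current sparse representations, a minimal sketch (assumed variable names) is

D = (Y * X') / (X * X');   % least squares solution of min ||Y - D*X||_F^2 over D
D = D ./ vecnorm(D);       % renormalize the atoms to unit norm

where the Matlab right division solves the linear system D (X X') = Y X' without forming an explicit inverse.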
Chapters 4–6 are dedicated to enhancements of the dictionary learning problem, tailored to solve diverse issues. Regularization, as always, can improve the
numerical behavior of an algorithm. Incoherence is a special property useful in
sparse representations, aiming to keep the atoms far from one another. Chapter 4
presents versions of the basic algorithms that solve the regularized problem or
enforce incoherence. Chapter 5 is dedicated to other kinds of modifications of the
problem, the most significant being that where sparsity is enforced globally, not
for the representation of each signal. Also important is the idea of changing the
optimization objective to take into account the application where the dictionary
is used. Other algorithmic ideas, such as replacing optimization with selection or
giving an online character to the processing of the training signals, are treated here.
Chapter 6 plays with a new variable, the dictionary size (the number of atoms),
previously assumed to be fixed. The discrete character of the size leads to several
types of heuristics that attempt to find an ideal size, although the very definition of
the concept may be problematic. The narrow sense of finding the smallest dictionary
that produces a given representation error is the easiest to formalize.
Chapter 7 attacks a different topic, that of endowing the dictionary with a
structure, which may come from an application requirement like shift invariance
or from a desire to promote incoherence and thus to work with orthogonal blocks.
A special place is occupied by dictionaries that allow the direct representation of
multidimensional signals, like images, which are vectorized in the standard setup.
In all cases, complexity is a significant issue and most algorithms are faster than
their general counterparts.
Chapters 8 and 9 deal with an important application field of sparse representations and dictionary learning, namely, classification. The diversity of ideas is
extremely interesting, starting from learning an individual dictionary for each class
and going to modifications of the learning objective that almost directly consider
the classification results. Chapter 9 presents a nonlinear extension of dictionary
learning based on the kernel trick. One can thus solve classification problems that
are impossible for the usual approach. The complexity is much higher, but
approximation methods like Nyström sampling offer a middle way by inserting
some of the nonlinear character in the basic linear representation.
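The key observation, standard in kernel methods, is that the dictionary can be sought in the form D = Φ(Y)A, where Φ is the nonlinear feature map applied to the training signals and A is a coefficient matrix; all computations then reduce to operations with the kernel matrix K = Φ(Y)^T Φ(Y), whose entries are evaluated directly via the kernel function, without ever forming Φ(Y) explicitly.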
Chapter 10, the last one, takes the dual analysis view of cosparsity, where the
representation is built from orthogonality relations between atoms and signals,
unlike the usual sparse synthesis view where the signal is built as a linear
combination of atoms. Dictionary learning takes a different shape in this case and,
although the general approach remains similar, the algorithms rely on quite different
operations.
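Concretely, in the cosparse model an analysis dictionary Ω ∈ R^{p×m} is applied to the signal and the product Ωy is required to have many zero entries, each zero expressing the orthogonality of the signal to one row of Ω (this is the standard formulation of the analysis model; the notation Ω is ours here).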
The reading path is normally sequential, but depending on the reader's interest,
several routes can be taken. The first three chapters are mandatory for a basic
comprehension of dictionary learning; they form the core of the book and should
always come first. Chapters 4 and 5 cover essential areas and we recommend their
(almost) full reading; the last few sections of Chap. 5 can be postponed if necessary.
The remaining chapters are more or less independent and can be read in any order
with the exception of the tandem made by Chaps. 8 and 9 that are better understood
in succession. Also, the sections of Chap. 7 treat distinct dictionary structures with
no relation between them; Sects. 7.5 and 7.6 come naturally in this order, as the first
deals with 2D signals and the second with multidimensional signals in general.
We implemented almost all of the algorithms presented in this book in Matlab. Together with code for the examples, they can be found at https://github.com/dl-book.

Acknowledgements This book started with a short Erasmus course on dictionary learning given
by BD at Tampere University of Technology in 2016. We are indebted to Cristian Rusu, who was
the first in our group to work on dictionary learning problems for his PhD thesis; many discussions
along several years helped our understanding of the topic. We thank our office mate and friend
Florin Stoican for his contribution to the figures related to the water network application from
Chap. 8. Finally, we congratulate each other for the patience with which this writing enterprise has
been undertaken, sacrificing some of our more immediate research interests.

Bucharest, Romania
January 2018

Bogdan Dumitrescu
Paul Irofti
Contents

1 Sparse Representations
1.1 The Sparse Model
1.2 Algorithms
1.3 Orthogonal Matching Pursuit
1.4 Algorithms for Basis Pursuit: FISTA
1.5 Guarantees
1.6 The Choice of a Dictionary: Fixed vs Learned

2 Dictionary Learning Problem
2.1 The Optimization Problem
2.2 An Analysis of the DL Problem
2.3 Test Problems
2.3.1 Representation Error
2.3.2 Dictionary Recovery
2.4 Applications: A Quick Overview
2.4.1 Denoising
2.4.2 Inpainting
2.4.3 Compression
2.4.4 Compressed Sensing
2.4.5 Classification

3 Standard Algorithms
3.1 Basic Strategy: Alternating Optimization
3.2 Sparse Coding
3.3 Simple Descent Methods
3.3.1 Gradient Descent
3.3.2 Coordinate Descent
3.4 Method of Optimal Directions (MOD)
3.5 K-SVD
3.6 Parallel Algorithms
3.7 SimCO
3.8 Refinements
3.9 Practical Issues
3.9.1 Initialization
3.9.2 Dictionary Size and Other Size Parameters
3.9.3 Unused or Redundant Atoms
3.9.4 Randomization
3.10 Comparisons: Theory
3.11 Comparisons: Some Experimental Results
3.11.1 Representation Error Results
3.11.2 Dictionary Recovery Results
3.11.3 Denoising Results
3.12 Impact of Sparse Representation Algorithm

4 Regularization and Incoherence
4.1 Learning with a Penalty
4.2 Regularization
4.2.1 Sparse Coding
4.2.2 Regularized K-SVD
4.2.3 Comparison Between Regularized K-SVD and SimCO
4.3 Frames
4.4 Joint Optimization of Error and Coherence
4.5 Optimizing an Orthogonal Dictionary
4.6 Imposing Explicit Coherence Bounds
4.7 Atom-by-Atom Decorrelation

5 Other Views on the DL Problem
5.1 Representations with Variable Sparsity Levels
5.2 A Simple Algorithm for DL with ℓ1 Penalty
5.3 A Majorization Algorithm
5.4 Proximal Methods
5.5 A Gallery of Objectives
5.6 Task-Driven DL
5.7 Dictionary Selection
5.8 Online DL
5.8.1 Online Coordinate Descent
5.8.2 RLS DL
5.9 DL with Incomplete Data

6 Optimizing Dictionary Size
6.1 Introduction: DL with Imposed Error
6.2 A General Size-Optimizing DL Structure
6.3 Stagewise K-SVD
6.4 An Initialization Method
6.5 An Atom Splitting Procedure
6.6 Clustering as a DL Tool
6.7 Other Methods
6.8 Size-Reducing OMP

7 Structured Dictionaries
7.1 Short Introduction
7.2 Sparse Dictionaries
7.2.1 Double Sparsity
7.2.2 Greedy Selection
7.2.3 Multi-Layer Sparse DL
7.2.4 Multiscale Dictionaries
7.3 Orthogonal Blocks
7.3.1 Orthogonal Basis Training
7.3.2 Union of Orthonormal Bases
7.3.3 Single Block Orthogonal DL
7.4 Shift Invariant Dictionaries
7.4.1 Circulant Dictionaries
7.4.2 Convolutional Sparse Coding
7.5 Separable Dictionaries
7.5.1 2D-OMP
7.5.2 SeDiL
7.6 Tensor Strategies
7.6.1 CP Decomposition
7.6.2 CP Dictionary Update
7.6.3 Tensor Singular Value Decomposition
7.6.4 t-SVD Dictionary Update
7.7 Composite Dictionaries
7.7.1 Convex Approach
7.7.2 Composite Dictionaries with Orthogonal Blocks

8 Classification
8.1 Classification Problem
8.2 Water Networks Application
8.3 Simple Algorithms with Many Dictionaries
8.4 Discriminative DL with Many Dictionaries
8.4.1 DL with Discriminative Penalty Functions
8.4.2 Adding a Shared Dictionary
8.5 Discriminative DL with a Single Dictionary
8.5.1 Classification Using the Representation Matrix
8.5.2 Discriminative DL
8.5.3 Label Consistent DL
8.6 Other Developments

9 Kernel Dictionary Learning
9.1 Kernels
9.2 Dictionary Form
9.3 Kernel OMP
9.4 Kernel DL Algorithms
9.4.1 Kernel MOD
9.4.2 Kernel SGK
9.4.3 Kernel AK-SVD
9.4.4 Kernel K-SVD
9.5 Size Reduction
9.5.1 Nyström Sampling
9.5.2 Changing the Target
9.6 Classification with Kernel DL
9.6.1 Kernel SRC
9.6.2 An Example
9.6.3 Kernel Discriminative DL
9.6.4 Multiple Kernels

10 Cosparse Representations
10.1 The Cosparse Model
10.2 Representation
10.2.1 Backward Greedy
10.2.2 Optimized Backward Greedy
10.2.3 Other Algorithms
10.3 Cosparse DL
10.3.1 Analysis K-SVD
10.3.2 Analysis SimCO

References

Index
