
Sparse dictionary learning

Sparse dictionary learning (also known as sparse coding or SDL) is a representation learning method
which aims at finding a sparse representation of the input data in the form of a linear combination of basic
elements as well as those basic elements themselves. These elements are called atoms and they compose a
dictionary. Atoms in the dictionary are not required to be orthogonal, and they may be an over-complete
spanning set. This problem setup also allows the dimensionality of the signals being represented to be
higher than that of the signals being observed. The above two properties lead to having seemingly
redundant atoms that allow multiple representations of the same signal but also provide an improvement in
sparsity and flexibility of the representation.

One of the most important applications of sparse dictionary learning is in the field of compressed sensing or
signal recovery. In compressed sensing, a high-dimensional signal can be recovered with only a few linear
measurements provided that the signal is sparse or nearly sparse. Since not all signals satisfy this sparsity
condition, it is of great importance to find a sparse representation of that signal such as the wavelet
transform or the directional gradient of a rasterized matrix. Once a matrix or a high-dimensional vector is
transformed into a sparse space, different recovery algorithms like basis pursuit, CoSaMP[1] or fast non-
iterative algorithms[2] can be used to recover the signal.

One of the key principles of dictionary learning is that the dictionary has to be inferred from the input data.
The emergence of sparse dictionary learning methods was stimulated by the fact that in signal processing
one typically wants to represent the input data using as few components as possible. Before this approach
the general practice was to use predefined dictionaries (such as Fourier or wavelet transforms). However, in
certain cases a dictionary that is trained to fit the input data can significantly improve the sparsity, which has
applications in data decomposition, compression and analysis and has been used in the fields of image
denoising and classification, video and audio processing. Sparsity and overcomplete dictionaries have
immense applications in image compression, image fusion and inpainting.

Problem statement
[Figure: Image denoising by dictionary learning]

Given the input dataset X = [x_1, ..., x_K], x_i ∈ ℝ^d, we wish to find a dictionary D ∈ ℝ^{d×n},
D = [d_1, ..., d_n], and a representation R = [r_1, ..., r_K], r_i ∈ ℝ^n, such that both ‖X − DR‖_F^2 is
minimized and the representations r_i are sparse enough. This can be formulated as the following
optimization problem:

\min_{D \in \mathcal{C},\, r_i \in \mathbb{R}^n} \sum_{i=1}^{K} \|x_i - D r_i\|_2^2 + \lambda \|r_i\|_0, \quad \text{where } \mathcal{C} \equiv \{ D \in \mathbb{R}^{d \times n} : \|d_i\|_2 \le 1 \ \forall i \},\ \lambda > 0

The constraint set C is required to constrain D so that its atoms would not reach arbitrarily high values,
which would allow arbitrarily low (but non-zero) values of r_i. λ controls the trade-off between the sparsity
and the minimization error.

The minimization problem above is not convex because of the ℓ0 "norm", and solving this problem is
NP-hard.[3] In some cases the L1-norm is known to ensure sparsity,[4] and so the above becomes a convex
optimization problem with respect to each of the variables D and R when the other one is fixed, but it is
not jointly convex in (D, R).
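
As a concrete illustration of this objective, here is a minimal numpy sketch; the sizes, the sparsity pattern
and λ are arbitrary assumptions, and the ℓ0 penalty is simply the count of non-zero coefficients:

```python
import numpy as np

def sdl_objective(X, D, R, lam):
    """Reconstruction error plus an l0 sparsity penalty.

    X : (d, K) data matrix, one signal per column
    D : (d, n) dictionary, one atom per column
    R : (n, K) sparse codes, one column per signal
    lam : trade-off between sparsity and reconstruction error
    """
    residual = np.linalg.norm(X - D @ R, ord="fro") ** 2
    sparsity = np.count_nonzero(R)          # l0 "norm" summed over all codes
    return residual + lam * sparsity

# Toy example with assumed sizes: d=8, n=16 (overcomplete), K=100 signals
rng = np.random.default_rng(0)
D = rng.standard_normal((8, 16))
D /= np.linalg.norm(D, axis=0)              # enforce ||d_i||_2 <= 1
R = rng.standard_normal((16, 100)) * (rng.random((16, 100)) < 0.1)  # ~10% non-zeros
X = D @ R
print(sdl_objective(X, D, R, lam=0.5))
```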

Properties of the dictionary

The dictionary D defined above can be "undercomplete" if n < d or "overcomplete" if n > d, with
the latter being a typical assumption for a sparse dictionary learning problem. The case of a complete
dictionary (n = d) does not provide any improvement from a representational point of view and thus isn't
considered.

Undercomplete dictionaries represent the setup in which the actual input data lies in a lower-dimensional
space. This case is strongly related to dimensionality reduction and techniques like principal component
analysis, which require the atoms d_1, ..., d_n to be orthogonal. The choice of these subspaces is crucial for
efficient dimensionality reduction, but it is not trivial. And dimensionality reduction based on dictionary
representation can be extended to address specific tasks such as data analysis or classification. However,
their main downside is limiting the choice of atoms.

Overcomplete dictionaries, however, do not require the atoms to be orthogonal (they can never form a basis
anyway), thus allowing for more flexible dictionaries and richer data representations.

An overcomplete dictionary which allows for a sparse representation of a signal can be a well-known transform
matrix (wavelet transform, Fourier transform), or it can be learned so that its elements are adjusted in
such a way that it represents the given signal as sparsely as possible. Learned dictionaries are capable of
giving sparser solutions than predefined transform matrices.

Algorithms
As the optimization problem described above can be solved as a convex problem with respect to either the
dictionary or the sparse coding while the other of the two is fixed, most of the algorithms are based on the
idea of iteratively updating one and then the other.

The problem of finding an optimal sparse coding with a given dictionary is known as sparse
approximation (or sometimes just sparse coding problem). A number of algorithms have been developed to
solve it (such as matching pursuit and LASSO) and are incorporated in the algorithms described below.
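
For example, with a fixed dictionary the sparse coding step can be handed to an off-the-shelf solver. A small
sketch using scikit-learn's orthogonal matching pursuit; the dictionary, signal and sparsity level here are
arbitrary assumptions:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))        # overcomplete dictionary, atoms as columns
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms
x = D[:, [3, 40, 77]] @ np.array([1.0, -2.0, 0.5])   # signal built from 3 atoms

# Sparse approximation: find r with at most 3 non-zeros such that D r ≈ x
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3, fit_intercept=False)
omp.fit(D, x)
r = omp.coef_
print(np.flatnonzero(r))                  # indices of the selected atoms
```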

Method of optimal directions (MOD)

The method of optimal directions (or MOD) was one of the first methods introduced to tackle the sparse
dictionary learning problem.[5] The core idea of it is to solve the minimization problem subject to the
limited number of non-zero components of the representation vector:

\min_{D, R} \|X - DR\|_F^2 \quad \text{subject to } \forall i,\ \|r_i\|_0 \le T

Here, \|\cdot\|_F denotes the Frobenius norm. MOD alternates between getting the sparse coding using a method
such as matching pursuit and updating the dictionary by computing the analytical solution of the problem
given by D = X R^+, where R^+ is the Moore-Penrose pseudoinverse. After this update D is renormalized to
fit the constraints and the new sparse coding is obtained again. The process is repeated until convergence
(or until a sufficiently small residue).
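
A minimal sketch of one MOD iteration under these assumptions, with the sparse coding step done by
scikit-learn's orthogonal matching pursuit (the sparsity level T is illustrative):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def mod_step(X, D, T):
    """One MOD iteration: sparse coding with OMP, then D = X R^+ and renormalization."""
    R = orthogonal_mp(D, X, n_nonzero_coefs=T)          # (n, K) codes, at most T non-zeros each
    D = X @ np.linalg.pinv(R)                           # analytical dictionary update
    D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)   # renormalize atoms to fit the constraint
    return D, R
```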

MOD has proved to be a very efficient method for low-dimensional input data requiring just a few
iterations to converge. However, due to the high complexity of the matrix-inversion operation, computing
the pseudoinverse in high-dimensional cases is often intractable. This shortcoming has inspired the
development of other dictionary learning methods.

K-SVD

K-SVD is an algorithm that performs SVD at its core to update the atoms of the dictionary one by one and
basically is a generalization of K-means. It enforces that each element of the input data x_i is encoded by a
linear combination of not more than T_0 elements in a way identical to the MOD approach:

\min_{D, R} \|X - DR\|_F^2 \quad \text{subject to } \forall i,\ \|r_i\|_0 \le T_0

This algorithm's essence is to first fix the dictionary, find the best possible R under the above constraint
(using Orthogonal Matching Pursuit) and then iteratively update the atoms of the dictionary D in the following
manner:

\|X - DR\|_F^2 = \Big\| \Big( X - \sum_{i \ne k} d_i r_T^i \Big) - d_k r_T^k \Big\|_F^2 = \|E_k - d_k r_T^k\|_F^2

where r_T^i denotes the i-th row of R. The next steps of the algorithm include a rank-1 approximation of the
residual matrix E_k, updating d_k and
enforcing the sparsity of r_T^k after the update. This algorithm is considered to be standard for dictionary
learning and is used in a variety of applications. However, it shares weaknesses with MOD, being efficient
only for signals with relatively low dimensionality and having the possibility of getting stuck at local
minima.
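
A minimal sketch of the atom update step described above, assuming the codes R were already computed with
OMP; the variable names are illustrative and this is not a full K-SVD implementation:

```python
import numpy as np

def ksvd_update_atom(X, D, R, k):
    """Rank-1 update of atom k and of the corresponding row of R (K-SVD style)."""
    omega = np.flatnonzero(R[k, :])            # signals that actually use atom k
    if omega.size == 0:
        return D, R
    # Residual without atom k, restricted to the signals that use it
    E_k = X[:, omega] - D @ R[:, omega] + np.outer(D[:, k], R[k, omega])
    U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
    D[:, k] = U[:, 0]                          # best rank-1 approximation gives the new atom
    R[k, omega] = s[0] * Vt[0, :]              # updated coefficients; sparsity pattern is preserved
    return D, R
```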

Stochastic gradient descent

One can also apply a widespread stochastic gradient descent method with iterative projection to solve this
problem.[6][7] The idea of this method is to update the dictionary using the first order stochastic gradient
and project it on the constraint set C. The step that occurs at the i-th iteration is described by this expression:

D_{i+1} = \mathrm{proj}_{\mathcal{C}}\Big( D_i - \delta_i \nabla_D \sum_{s \in S} \big( \|x_s - D r_s\|_2^2 + \lambda \|r_s\|_1 \big) \Big)

where S is a random subset of \{1, \dots, K\} and \delta_i is a gradient step.
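
A sketch of one such projected gradient step; the mini-batch, step size and constraint set follow the
description above, and since the ℓ1 term does not depend on D only the quadratic term contributes to the
gradient:

```python
import numpy as np

def sgd_dictionary_step(X, R, D, batch, delta):
    """D <- proj_C(D - delta * grad), using a random mini-batch of column indices."""
    Xb, Rb = X[:, batch], R[:, batch]
    grad = -2.0 * (Xb - D @ Rb) @ Rb.T            # gradient of sum_s ||x_s - D r_s||^2 w.r.t. D
    D = D - delta * grad
    norms = np.maximum(np.linalg.norm(D, axis=0), 1.0)
    return D / norms                              # project atoms back onto ||d_i||_2 <= 1
```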

Lagrange dual method

An algorithm based on solving a dual Lagrangian problem provides an efficient way to solve for the
dictionary having no complications induced by the sparsity function.[8] Consider the following Lagrangian:

\mathcal{L}(D, \Lambda) = \operatorname{tr}\big((X - DR)^T (X - DR)\big) + \sum_{j=1}^{n} \lambda_j \Big( \sum_{i=1}^{d} D_{ij}^2 - c \Big)

where c is a constraint on the norm of the atoms and the λ_j are the so-called dual variables forming the
diagonal matrix Λ. We can then provide an analytical expression for the Lagrange dual after minimization over D:

\mathcal{D}(\Lambda) = \min_D \mathcal{L}(D, \Lambda) = \operatorname{tr}\big( X^T X - X R^T (R R^T + \Lambda)^{-1} (X R^T)^T - c\Lambda \big)

After applying one of the optimization methods to the value of the dual (such as Newton's method or
conjugate gradient) we get the value of D:

D^T = (R R^T + \Lambda)^{-1} (X R^T)^T

Solving this problem is less computationally demanding because the number of dual variables is often
much smaller than the number of variables in the primal problem.
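
For illustration, once the dual variables have been found the dictionary follows in closed form. A small
numpy sketch; the diagonal of Λ is assumed to be given, e.g. from a Newton solve on the dual:

```python
import numpy as np

def dictionary_from_dual(X, R, lam_diag):
    """Closed-form D from the Lagrange dual: D^T = (R R^T + Lambda)^{-1} (X R^T)^T."""
    Lambda = np.diag(lam_diag)                      # diagonal matrix of dual variables
    Dt = np.linalg.solve(R @ R.T + Lambda, (X @ R.T).T)
    return Dt.T                                     # (d, n) dictionary
```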

LASSO

In this approach, the optimization problem is formulated as:

\min_{r \in \mathbb{R}^n} \|r\|_1 \quad \text{subject to } \|X - Dr\|_2^2 \le \varepsilon

where ε is the permitted error in the reconstruction.

It finds an estimate of r_i by minimizing the least-squares error subject to an L1-norm constraint in the solution
vector, formulated as:

\min_{r \in \mathbb{R}^n} \; \tfrac{1}{2}\|X - Dr\|_2^2 + \lambda \|r\|_1

where λ > 0 controls the trade-off between sparsity and the reconstruction error. Since the problem is convex
in r for a fixed dictionary, this gives the globally optimal solution.[9] See also Online dictionary learning for
Sparse coding (https://www.di.ens.fr/~fbach/mairal_icml09.pdf)
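
For a fixed dictionary, the penalized form can be handed to a standard Lasso solver. A hedged sketch using
scikit-learn; note that its alpha corresponds to λ only up to a scaling by the number of rows, and all sizes
here are arbitrary assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
x = D[:, [5, 17, 200]] @ np.array([2.0, -1.0, 0.7])

# sklearn minimizes (1/(2*n_samples))*||x - D r||^2 + alpha*||r||_1
lasso = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000)
lasso.fit(D, x)
r = lasso.coef_
print(np.flatnonzero(r))                     # indices of atoms with non-zero weight
```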

Parametric training methods

Parametric training methods aim to combine the best of both worlds: the realm of analytically
constructed dictionaries and the learned ones.[10] This makes it possible to construct more powerful generalized
dictionaries that can potentially be applied to the cases of arbitrarily sized signals. Notable approaches
include:

Translation-invariant dictionaries.[11] These dictionaries are composed of the translations of
the atoms originating from the dictionary constructed for a finite-size signal patch. This
allows the resulting dictionary to provide a representation for an arbitrarily sized signal.
Multiscale dictionaries.[12] This method focuses on constructing a dictionary that is
composed of differently scaled dictionaries to improve sparsity.
Sparse dictionaries.[13] This method focuses on not only providing a sparse representation
but also constructing a sparse dictionary, which is enforced by the expression D = BA,
where B is some pre-defined analytical dictionary with desirable properties such as fast
computation and A is a sparse matrix. Such a formulation allows one to directly combine the fast
implementation of analytical dictionaries with the flexibility of sparse approaches (see the
sketch after this list).
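
As a toy illustration of this double-sparsity idea (the DCT base, sizes and density are arbitrary
assumptions, not the construction used in [13]):

```python
import numpy as np
from scipy.fft import idct

rng = np.random.default_rng(0)
d, n = 64, 128
B = idct(np.eye(d), axis=0, norm="ortho")           # fixed analytical base dictionary (DCT atoms as columns)
A = rng.standard_normal((d, n)) * (rng.random((d, n)) < 0.05)  # sparse mixing matrix (~5% non-zeros)
D = B @ A                                           # effective dictionary D = B A
D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)   # normalize the resulting atoms
```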

Online dictionary learning (LASSO approach (https://www.di.ens.fr/~fbach/mairal_icml09.pdf))
Many common approaches to sparse dictionary learning rely on the fact that the whole input data (or at
least a large enough training dataset) is available for the algorithm. However, this might not be the case in
real-world scenarios, as the size of the input data might be too big to fit into memory. The other case
where this assumption cannot be made is when the input data comes in the form of a stream. Such cases lie
in the field of study of online learning, which essentially suggests iteratively updating the model as
new data points become available.

A dictionary D can be learned in an online manner the following way:[14]

1. For t = 1, 2, ..., T:
2. Draw a new sample x_t
3. Find a sparse coding using LARS: r_t = \arg\min_{r \in \mathbb{R}^n} \big( \tfrac{1}{2} \|x_t - D_{t-1} r\|_2^2 + \lambda \|r\|_1 \big)
4. Update the dictionary using a block-coordinate approach: D_t = \arg\min_{D \in \mathcal{C}} \tfrac{1}{t} \sum_{i=1}^{t} \big( \tfrac{1}{2} \|x_i - D r_i\|_2^2 + \lambda \|r_i\|_1 \big)

This method allows us to gradually update the dictionary as new data becomes available for sparse
representation learning and helps drastically reduce the amount of memory needed to store the dataset
(which often has a huge size).
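
A minimal sketch of the same online/mini-batch idea using scikit-learn's MiniBatchDictionaryLearning; the
synthetic stream and all parameters are arbitrary assumptions:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
dico = MiniBatchDictionaryLearning(n_components=32, alpha=0.5, batch_size=16, random_state=0)

# Consume the data one mini-batch at a time instead of loading the whole dataset into memory
for _ in range(100):
    batch = rng.standard_normal((16, 64))    # pretend this mini-batch arrived from a stream
    dico.partial_fit(batch)

D = dico.components_                         # learned dictionary, one atom per row (sklearn convention)
codes = dico.transform(rng.standard_normal((5, 64)))   # sparse codes for new signals
```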

Applications
The dictionary learning framework, namely the linear decomposition of an input signal using a few basis
elements learned from the data itself, has led to state-of-the-art results in various image and video processing
tasks. This technique can be applied to classification problems: if specific dictionaries have been built
for each class, an input signal can be classified by finding the dictionary corresponding to the sparsest
representation.
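
A hedged sketch of that classification scheme: learn one dictionary per class and assign a new signal to the
class whose dictionary reconstructs it best under a fixed sparsity budget (all names, sizes and parameters
are illustrative):

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import orthogonal_mp

def fit_class_dictionaries(signals_by_class, n_atoms=32):
    """Learn one dictionary per class; each value is an (n_samples, n_features) array."""
    return {c: DictionaryLearning(n_components=n_atoms, random_state=0)
                   .fit(S).components_.T            # atoms as columns
            for c, S in signals_by_class.items()}

def classify(x, dictionaries, T=5):
    """Assign x to the class whose dictionary gives the smallest T-sparse residual."""
    errors = {}
    for c, D in dictionaries.items():
        r = orthogonal_mp(D, x, n_nonzero_coefs=T)
        errors[c] = np.linalg.norm(x - D @ r)
    return min(errors, key=errors.get)
```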

It also has properties that are useful for signal denoising since usually one can learn a dictionary to represent
the meaningful part of the input signal in a sparse way but the noise in the input will have a much less
sparse representation.[15]

Sparse dictionary learning has been successfully applied to various image, video and audio processing tasks
as well as to texture synthesis[16] and unsupervised clustering.[17] In evaluations with the Bag-of-Words
model,[18][19] sparse coding was found empirically to outperform other coding approaches on the object
category recognition tasks.

Dictionary learning is used to analyse medical signals in detail. Such medical signals include those from
electroencephalography (EEG), electrocardiography (ECG), magnetic resonance imaging (MRI),
functional MRI (fMRI), continuous glucose monitors [20] and ultrasound computer tomography (USCT),
where different assumptions are used to analyze each signal.

See also
Sparse approximation
Sparse PCA
K-SVD
Matrix factorization
Neural sparse coding

References
1. Needell, D.; Tropp, J.A. (2009). "CoSaMP: Iterative signal recovery from incomplete and
inaccurate samples". Applied and Computational Harmonic Analysis. 26 (3): 301–321.
arXiv:0803.2392 (https://arxiv.org/abs/0803.2392). doi:10.1016/j.acha.2008.07.002 (https://d
oi.org/10.1016%2Fj.acha.2008.07.002).
2. Lotfi, M.; Vidyasagar, M. "A Fast Non-iterative Algorithm for Compressive Sensing Using
Binary Measurement Matrices"
3. A. M. Tillmann, "On the Computational Intractability of Exact and Approximate Dictionary
Learning", IEEE Signal Processing Letters 22(1), 2015: 45–49.
4. Donoho, David L. (2006-06-01). "For most large underdetermined systems of linear
equations the minimal 𝓁1-norm solution is also the sparsest solution". Communications on
Pure and Applied Mathematics. 59 (6): 797–829. doi:10.1002/cpa.20132 (https://doi.org/10.1
002%2Fcpa.20132). ISSN 1097-0312 (https://www.worldcat.org/issn/1097-0312).
S2CID 8510060 (https://api.semanticscholar.org/CorpusID:8510060).
5. Engan, K.; Aase, S.O.; Hakon Husoy, J. (1999-01-01). "Method of optimal directions for
frame design". 1999 IEEE International Conference on Acoustics, Speech, and Signal
Processing. Proceedings. ICASSP99 (Cat. No.99CH36258) (https://www.semanticscholar.or
g/paper/684732677d91a93b115f57e8d671ef7f5f13ee14). Vol. 5. pp. 2443–2446 vol.5.
doi:10.1109/ICASSP.1999.760624 (https://doi.org/10.1109%2FICASSP.1999.760624).
ISBN 978-0-7803-5041-0. S2CID 33097614 (https://api.semanticscholar.org/CorpusID:3309
7614).
6. Aharon, Michal; Elad, Michael (2008). "Sparse and Redundant Modeling of Image Content
Using an Image-Signature-Dictionary". SIAM Journal on Imaging Sciences. 1 (3): 228–247.
CiteSeerX 10.1.1.298.6982 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.298.
6982). doi:10.1137/07070156x (https://doi.org/10.1137%2F07070156x).
7. Pintér, János D. (2000-01-01). Yair Censor and Stavros A. Zenios, Parallel Optimization —
Theory, Algorithms, and Applications. Oxford University Press, New York/Oxford, 1997,
xxviii+539 pages. (US $ 85.00) (https://www.semanticscholar.org/paper/b31b0f7ff361e51600
dcf715b17777ec364dc4c9). Journal of Global Optimization. Vol. 16. pp. 107–108.
doi:10.1023/A:1008311628080 (https://doi.org/10.1023%2FA%3A1008311628080).
ISBN 978-0-19-510062-4. ISSN 0925-5001 (https://www.worldcat.org/issn/0925-5001).
S2CID 22475558 (https://api.semanticscholar.org/CorpusID:22475558).
8. Lee, Honglak, et al. "Efficient sparse coding algorithms." Advances in neural information
processing systems. 2006.
9. Kumar, Abhay; Kataria, Saurabh. "Dictionary Learning Based Applications in Image
Processing using Convex Optimisation" (http://home.iitk.ac.in/~saurabhk/EE609A_12011_1
2807637_.pdf) (PDF).
10. Rubinstein, R.; Bruckstein, A.M.; Elad, M. (2010-06-01). "Dictionaries for Sparse
Representation Modeling". Proceedings of the IEEE. 98 (6): 1045–1057.
CiteSeerX 10.1.1.160.527 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.160.5
27). doi:10.1109/JPROC.2010.2040551 (https://doi.org/10.1109%2FJPROC.2010.2040551).
ISSN 0018-9219 (https://www.worldcat.org/issn/0018-9219). S2CID 2176046 (https://api.se
manticscholar.org/CorpusID:2176046).
11. Engan, Kjersti; Skretting, Karl; Husøy, John H\a akon (2007-01-01). "Family of Iterative LS-
based Dictionary Learning Algorithms, ILS-DLA, for Sparse Signal Representation". Digit.
Signal Process. 17 (1): 32–49. doi:10.1016/j.dsp.2006.02.002 (https://doi.org/10.1016%2Fj.d
sp.2006.02.002). ISSN 1051-2004 (https://www.worldcat.org/issn/1051-2004).
12. Mairal, J.; Sapiro, G.; Elad, M. (2008-01-01). "Learning Multiscale Sparse Representations
for Image and Video Restoration". Multiscale Modeling & Simulation. 7 (1): 214–241.
CiteSeerX 10.1.1.95.6239 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.95.62
39). doi:10.1137/070697653 (https://doi.org/10.1137%2F070697653). ISSN 1540-3459 (http
s://www.worldcat.org/issn/1540-3459).
13. Rubinstein, R.; Zibulevsky, M.; Elad, M. (2010-03-01). "Double Sparsity: Learning Sparse
Dictionaries for Sparse Signal Approximation". IEEE Transactions on Signal Processing. 58
(3): 1553–1564. Bibcode:2010ITSP...58.1553R (https://ui.adsabs.harvard.edu/abs/2010ITS
P...58.1553R). CiteSeerX 10.1.1.183.992 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi
=10.1.1.183.992). doi:10.1109/TSP.2009.2036477 (https://doi.org/10.1109%2FTSP.2009.20
36477). ISSN 1053-587X (https://www.worldcat.org/issn/1053-587X). S2CID 7193037 (http
s://api.semanticscholar.org/CorpusID:7193037).
14. Mairal, Julien; Bach, Francis; Ponce, Jean; Sapiro, Guillermo (2010-03-01). "Online
Learning for Matrix Factorization and Sparse Coding" (http://dl.acm.org/citation.cfm?id=1756
006.1756008). J. Mach. Learn. Res. 11: 19–60. arXiv:0908.0050 (https://arxiv.org/abs/0908.0
050). Bibcode:2009arXiv0908.0050M (https://ui.adsabs.harvard.edu/abs/2009arXiv0908.00
50M). ISSN 1532-4435 (https://www.worldcat.org/issn/1532-4435).
15. Aharon, M, M Elad, and A Bruckstein. 2006. "K-SVD: An Algorithm for Designing
Overcomplete Dictionaries for Sparse Representation (https://freddy.cs.technion.ac.il/wp-con
tent/uploads/2017/12/K-SVD-An-Algorithm-for-Designing-Overcomplete.pdf)." Signal
Processing, IEEE Transactions on 54 (11): 4311-4322
16. Peyré, Gabriel (2008-11-06). "Sparse Modeling of Textures" (https://hal.archives-ouvertes.fr/
hal-00359747/file/08-JMIV-Peyre-SparseTextures.pdf) (PDF). Journal of Mathematical
Imaging and Vision. 34 (1): 17–31. doi:10.1007/s10851-008-0120-3 (https://doi.org/10.100
7%2Fs10851-008-0120-3). ISSN 0924-9907 (https://www.worldcat.org/issn/0924-9907).
S2CID 15994546 (https://api.semanticscholar.org/CorpusID:15994546).
17. Ramirez, Ignacio; Sprechmann, Pablo; Sapiro, Guillermo (2010-01-01). "Classification and
clustering via dictionary learning with structured incoherence and shared features". 2010
IEEE Computer Society Conference on Computer Vision and Pattern Recognition (http://ww
w.computer.org/csdl/proceedings/cvpr/2010/6984/00/05539964-abs.html). Los Alamitos, CA,
USA: IEEE Computer Society. pp. 3501–3508. doi:10.1109/CVPR.2010.5539964 (https://do
i.org/10.1109%2FCVPR.2010.5539964). ISBN 978-1-4244-6984-0. S2CID 206591234 (http
s://api.semanticscholar.org/CorpusID:206591234).
18. Koniusz, Piotr; Yan, Fei; Mikolajczyk, Krystian (2013-05-01). "Comparison of mid-level
feature coding approaches and pooling strategies in visual concept detection". Computer
Vision and Image Understanding. 117 (5): 479–492. CiteSeerX 10.1.1.377.3979 (https://cites
eerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.377.3979). doi:10.1016/j.cviu.2012.10.010 (h
ttps://doi.org/10.1016%2Fj.cviu.2012.10.010). ISSN 1077-3142 (https://www.worldcat.org/iss
n/1077-3142).
19. Koniusz, Piotr; Yan, Fei; Gosselin, Philippe Henri; Mikolajczyk, Krystian (2017-02-24).
"Higher-order occurrence pooling for bags-of-words: Visual concept detection" (http://spiral.i
mperial.ac.uk/bitstream/10044/1/39814/2/pkpami2e-peter.pdf) (PDF). IEEE Transactions on
Pattern Analysis and Machine Intelligence. 39 (2): 313–326.
doi:10.1109/TPAMI.2016.2545667 (https://doi.org/10.1109%2FTPAMI.2016.2545667).
hdl:10044/1/39814 (https://hdl.handle.net/10044%2F1%2F39814). ISSN 0162-8828 (https://
www.worldcat.org/issn/0162-8828). PMID 27019477 (https://pubmed.ncbi.nlm.nih.gov/27019
477).
20. AlMatouq, Ali; LalegKirati, TaousMeriem; Novara, Carlo; Ivana, Rabbone; Vincent, Tyrone
(2019-03-15). "Sparse Reconstruction of Glucose Fluxes Using Continuous Glucose
Monitors" (https://ieeexplore.ieee.org/document/8667648). IEEE/ACM Transactions on
Computational Biology and Bioinformatics. 17 (5): 1797–1809.
doi:10.1109/TCBB.2019.2905198 (https://doi.org/10.1109%2FTCBB.2019.2905198).
hdl:10754/655914 (https://hdl.handle.net/10754%2F655914). ISSN 1545-5963 (https://www.
worldcat.org/issn/1545-5963). PMID 30892232 (https://pubmed.ncbi.nlm.nih.gov/30892232).
S2CID 84185121 (https://api.semanticscholar.org/CorpusID:84185121).

Retrieved from "https://en.wikipedia.org/w/index.php?title=Sparse_dictionary_learning&oldid=1166534447"
