Deep Orthogonal Matrix Factorization As A Hierarchical Clustering Technique

This document discusses deep orthogonal nonnegative matrix factorization (deep ONMF) as a hierarchical clustering technique. Deep ONMF decomposes a data matrix through multiple layers of orthogonal and nonnegative matrix factorizations. It can be interpreted as a bottom-up hierarchical clustering model that successively merges clusters from one layer to the next. The authors propose a greedy initialization strategy for deep ONMF and evaluate its performance on synthetic and hyperspectral image data compared to other initialization methods.

Uploaded by

sagar deshpande

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

49 views5 pages

Deep Orthogonal Matrix Factorization As A Hierarchical Clustering Technique

Uploaded by

sagar deshpande

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 5

Deep orthogonal matrix factorization as a

hierarchical clustering technique

Pierre De Handschutter Nicolas Gillis
Department of Mathematics and Operational Research
University of Mons, Belgium

Abstract—Deep orthogonal nonnegative matrix factorization the optimization is performed. Multilayer MF performs the
(deep ONMF) is a constrained deep low-rank matrix approxim- decomposition in a purely sequential way, that is, it suc-
ation model which decomposes a data matrix through several cessively minimizes kWi−1 − Wi Hi k2F for i = 1, 2, . . . , L
layers of factorizations. Deep ONMF imposes that each data
point is assigned to a single cluster at each layer. In this paper, where W0 = X. Deep MF considers a further backpropaga-
we first explain why deep ONMF can be interpreted as a bottom- tion step. This consists in minimizing the loss function
up hiearchical clustering technique. Then our main contribution kX − WL HL · · · H1 k2F across the layers: after a sequential
is to provide a simple yet effective greedy initialization strategy decomposition as in multilayer MF, the factors are iteratively
for deep ONMF. We show on synthetic data sets that it performs updated in a block-coordinate descent fashion, see [6] and
competitively with other initialization strategies, and apply it on
the decomposition of a hyperspectral image into its constitutive the references therein for more details. As for shallow MFs,
materials. additional constraints must be imposed on the factors to render
it meaningful. Imposing nonnegativity and orthogonality on
I. I NTRODUCTION the Hl ’s leads to deep ONMF [7], the topic of this paper.
Given a matrix X ∈ Rm×n where each column is a data Organization of the paper: In Section II, we start by
point lying in an m-dimensional space, a low-rank matrix explaining why deep ONMF is a particular hierarchical clus-
approximation seeks for matrices W ∈ Rm×r and H ∈ Rr×n tering (HC) model. We then provide a greedy initialization
such that P each data point X(:, j) can be approximated as for deep ONMF in Section III. In Section IV-A, we compare
r
X(:, j) ≈ k=1 W (:, k)H(k, j) for j = 1, . . . , n. This means several initialization techniques on synthetic data, and in
that each data point is a linear combination of r basis vectors, Section IV-B we illustrate the ability of deep ONMF combined
where r is called the rank of the approximation. In matrix with our greedy initialization to cluster the pixels of a hyper-
form, this approximation, also called a factorization, is written spectral image in a hierarchical way. Finally, in Section V, we
as X ≈ W H, where each column of W corresponds to a briefly conclude and give perspectives of research.
basis vector and each column of H indicates the proportions II. D EEP ONMF IS EQUIVALENT TO HC
in which each basis vector appears in each data point. The It is well-known that standard NMF can be interpreted as
quality of the approximation is generally measured by the least a soft clustering technique. In particular, when the sum of
squares criterion, that is, kX − W Hk2F . the entries in each column of H is constrained to be equal
To ensure the interpretability and uniqueness of such mod- to 1, H(k, j) is the proportion in which the data point X(:, j)
els, constraints are typically imposed on the factors W and is associated to the k-th basis vector W (:, k). Due to the
H, such as sparsity [1] and nonnegativity [2], leading to row-wise orthogonality constraint, ONMF is more restrictive:
sparse component analysis and nonnegative matrix factoriz- nonnegativity together with orthogonality implies that each
ation (NMF), respectively. Adding orthogonality on top of column of H has at most a single non-zero entry. This
nonnegativity for the factor H, we obtain orthogonal NMF follows from the fact that two nonnegative and orthogonal
(ONMF) [3] which can be formulated as follows vectors must have disjoint supports. Hence ONMF associates
min kX − W Hk2F such that HH T = Ir , (1) each data point to a single basis vector and performs a
W ∈Rm×r
+ ,H∈Rr×n
+ hard clustering [3]. In fact, it can be proved that ONMF is
where Ir is the identity matrix of size r. equivalent to a weighted variant of spherical k-means [8].
Recently, matrix factorizations (MFs) have been extended Recall that spherical k-means minimizes the angles between
to the case where the input matrix is decomposed in more the data points and their associated centroid, as opposed to
than two factors. More precisely, L layers of successive k-means that minimizes their Euclidean distances.
Deep ONMF is the extension of ONMF (1) to several layers:
factorizations of ranks dl (l = 1, ..., L) are performed on X
for l = 1, 2, . . . , L,
as follows: X ≈ W1 H1 , W1 ≈ W2 H2 , . . . , WL−1 ≈ WL HL ,
where Wl ∈ Rm×dl and Hl ∈ R+ dl ×dl−1 (l = 1, . . . , L) Wl−1 ≈ Wl Hl such that (Wl , Hl ) ≥ 0 and Hl HlT = Idl ,
with d0 = n, so that the matrix X is approximated as (2)
X ≈ WL HL HL−1 · · · H1 . This model is referred to as where W0 = X. In deep ONMF, the ranks dl ’s need to
multilayer MF [4] or deep MF [5], depending on the way decrease as the factorization unfolds, that is, dl > dl+1 for

ISBN: 978-9-0827-9706-0 1466 EUSIPCO 2021

all l, otherwise we end up with trivial factorizations. In fact, Assume w.l.o.g. that kX(:, i)k2 ≥ kX(:, j)k2 . In q
the case
√
if dl ≤ dl+1 for some l, Wl = Wl+1 Hl+1 = (Wl 0) I0dl √
kX(:, i)k2 = kX(:, j)k2 , α = 1/ 2, otherwise α = L+2L L
where 0 is the matrix of zeros of appropriate dimension. T
X(:,i) X(:,j)
Under this assumption, deep ONMF can be interpreted as a where L = 4K 2 + 1 and K = kX(:,i)k 2 −kX(:,j)k2 (note that,
√
hierarchical clustering (HC) model, as it successively merges as kX(:, i)k2 → kX(:, j)k2 , L → ∞ and α → 1/ 2). Finally,
the data points by aggregating clusters, which is referred to the optimal solution of ONMF with r = n − 1 will merge
as bottom-up HC or agglomerative clustering [9]. This first the data points X(:, i) and X(:, j) for which e(i, j) takes
layer splits the columns of X into d1 clusters, the second the smallest value. After permutation, let us assume w.l.o.g.
layer splits the centroids W1 into d2 clusters, and so on. In that i = n − 1 and j = n, the optimal ONMF has the form

In−2 0 0

particular, it is interesting to observe that when the ranks are X ≈ X(:, 1 : n − 2) w , with error
such that dl = dl−1 − 1 for all l, each layer merges two 0 hi hj
e(i, j) given in (3).
clusters of the previous layer into a single new cluster while
We will denote by ONMF(n − 1) the exact procedure for
keeping the others unchanged. Hence, deep ONMF is a HC
ONMF for r = n − 1. The two points leading to the smallest
technique where the criterion of the clustering is a weighted
e(i, j) (see Eq. 3) are merged according to Algorithm 1.
spherical k-means. It is closely related to deep k-means [10]
Note that, given (i, j), computing α and w requires O(m)
which enforces each column of Hl to have a single nonzero
operations. Hence ONMF(n − 1) requires O(mn2 ) operations
entry equal to one, for all l (in ONMF, there is a scaling degree
to test all pairs (i, j) and pick the one that leads to the lowest
of freedom to approximate a data point with a centroid).
error.
Note that other HC techniques are based on NMF ideas.
This is for example the case in [11], [12], [13] where rank- Algorithm 1 Exact orthogonal approximation of two points
2 NMF’s are applied sequentially to split clusters in two. In
Input: Two data points xi , xj ∈ Rm .
contrast to deep ONMF, these techniques are top-down which
Output: Basis vector w ∈ Rm , coefficient α, error e(i, j)
means that they start by assigning all the data points to a single
1: perm= 0;
cluster and progressively split them in several clusters, hence
2: if kxi k < kxj k then
leading to a different interpretation than deep ONMF.
3: (xi , xj ) ← (xj , xi ); perm = 1;
III. G REEDY INITIALIZATION OF DEEP ONMF 4: end if
5: if kxi k = kxj k then
Usually, deep ONMF is applied using dl dl−1 at each √
6: α = 1/ 2
layer and, in particular, d1 n. However, we show in this
7: else q √
section that it is possible to obtain a closed-form optimal xT xj 2 L+ L
solution of deep ONMF at the first layer when d1 = n − 1 or, 8: K = kxi k2 i −kx jk
2 , L = 4K + 1, α = 2L
equivalently, for ONMF with r = n − 1; see Section III-A. 9: end if √
Using this idea sequentially, we propose a greedy initializa- 10: w = αxi + 1 − α2 xj √
tion for deep ONMF; see Section III-B. As most clustering 11: e(i, j) = kxi − αwk2 + kxj − 1 − α2 wk2
strategies, deep ONMF relies on iterative methods and is very 12: if perm √ = 1 then
sensitive to initialization. This is the main motivation behind 13: α ← 1 − α2 .
this work. 14: end if

A. Solving ONMF for r = n − 1 B. Application to deep ONMF

Let us consider a data matrix X ∈ Rm×n and let us consider Based on ONMF(n − 1), we propose a simple greedy ini-
ONMF (1) with r = n − 1. Because ONMF needs to assign tialization of deep ONMF, which we refer to as the successive
n data points to n − 1 clusters, only two data points need to orthogonal decomposition algorithm (SODA).
be merged within the same cluster, say the ith and the jth. Starting from the trivial decomposition X = W1 H1 , with
Assuming we know i and j, minimizing the reconstruction W1 = X and H1 = In , the two data points W1 (:, i) and
error requires finding the scalars hi and hj and the new W1 (:, j) with smallest value for e(i, j) are merged at layer 2
centroid w ∈ Rm such that according to the closed-form solution described in the previous
section. At each layer, two basis vectors are merged in a new
e(i, j) = kX(:, i) − hi wk2F + kX(:, j) − hj wk2F (3) one using the method described in Section III-A. When the
is minimized, with h2i + h2j = 1 and (hi , hj ) ≥ 0. Using current number of basis vectors is equal to the value of some
the first-order optimality conditions, it is easy to show that, inner rank dl , they are used to initialize the corresponding Wl .
at optimality, w = hi X(:, i) + hj X(:, j) and hk = X(:,k)
T
w Let us analyze the computational cost of SODA. First, the
kwk22
reconstruction errors e(i, j) of any two data points i and j
for k = i, j. Combining both equalities and h2i + h2j = 1, and
are computed, which requires n(n−1) 2 times O(m) operations.
writing everything in terms of α := hi , we obtain
√ SODA will have n − dL + 1 steps to be able to construct
α αkX(:, i)k2 + 1 − α2 X(:, i)T X(:, j) WL , each of them requires to compute the reconstruction error
√ = √ . between the remaining basis vectors of the previous layer and
1 − α2 αX(:, i)T X(:, j) + 1 − α2 kX(:, j)k2

1467
the new one, which requires O(nm) operations. Moreover,
assuming an efficient sorting strategy of the array of errors
e(i, j)’s, finding the couple of indices which generates the
smallest e(i, j) is O(n2 log(n2 )). Hence, SODA requires in
total Õ(n2 m) operations (where ˜ indicates that we removed
logarithmic terms), which is not practical for large data sets.
Usually, deep ONMF algorithms run in O(mnd1 ) operations
where d1 n.
However, for large data sets, such as hyperspectral images
where n is the number of pixels (which can be of the order of
millions), this greedy idea can be used as well. For example,
we first compute an ONMF of rank larger than (or equal to) d1 ,
say d01 ≥ d1 with any standard algorithm faster than SODA,
and then "unfold" the remaining d01 clusters through SODA.
This requires O(mnd01 ) operations for the first layer ONMF,
and then Õ(d02 1 m) operations for the next ones computed
by SODA. In practice, we recommend to use d01 as a small
multiple of d1 .
IV. N UMERICAL EXPERIMENTS
In this section, we evaluate the performance of several
initialization techniques for deep ONMF on synthetic data in Fig. 1: Geometric illustration of the synthetic data sets.
Section IV-A, and show the hierarchical clustering produced
by deep ONMF for a hyperspectral image in Section IV-B. where X̃ = W1 H1 , each entry of N follows a Gaussian
distribution of mean 0 and standard deviation 1, and is the
A. Synthetic data sets noise level. To assess the quality of the different initializations,
Let us compare the effectiveness of the following initializ- 10 data sets are generated for each noise level and we report
ation methods for the Wl ’s in deep ONMF: the mean and standard deviation of the clustering accuracy
• Random initialization (RAND): any Wl , l = 1, ..., L is (ACC) at both layers for several noise levels. Given K
set up by randomly picking dl < dl−1 columns of Wl−1 , estimated clusters Gk ’s and K ground truth clusters Hk ’s, the
with W0 = X. This is the most standard approach in the ACC is defined as
literature.
K
• Successive nonnegative projection algorithm (SNPA) [6]: 1 X
ACC(G, H) = max |Gk ∩ HP (k) | (4)
Wl is obtained with SNPA [14] applied on Wl−1 . n P ∈[1···K]
k=1
• Our proposed greedy algorithm, SODA.
• RAND+SODA: Similarly to as described at the end of where P is any permutation of {1, 2, . . . , K}.
Section III-B, we randomly choose d01 n points, and The results are presented in terms of both reconstruction
then apply SODA on this subset of points. error and accuracy at Table I. More precisely, it reports the
We compare these initializations when combined with the relative reconstruction error kX−W 2 H 2 H 1 kF
kXkF , denoted rel_err,
alternating optimization strategy that optimizes Wl ’s and Hl ’s and the accuracy at the first and the second layers, denoted
alternatively by extending the multiplicative updates proposed ACC 1 and ACC 2, respectively.
for ONMF by [15] to deep ONMF. Clearly, SODA outperforms RAND and SNPA in terms
We generate the synthetic data sets as follows, in m = 3 of clustering accuracy. When the noise is small, it always
dimensions. We take d1 = 16 and d2 = 4 and generate the manages to reach a perfect clustering at both layers, contrary
ground-truth (GT) basis vectors W1 and W2 whose columns to the two other methods. Of course, this is at the expense
have unit `1 norms in such a way that the 16 first layer basis of a larger computational cost, from O(mnr) for RAND
vectors are clustered in 4 groups around 4 second layer basis and SNPA, to O(mn2 ) for SODA. However, RAND+SODA
vectors; see Fig. 1 for an illustration. As shown on Fig. 1, performs almost as well as SODA at a reduced cost (see the
the columns of W2 are the central basis vectors of 4 columns end of Section III-B), showing that using the greedy procedure
of W1 : more precisely, it is equal to their average, up to a further in the decomposition is also worthwhile. Note that the
scaling factor. We then pick n = 1000 points uniformly at accuracy of all algorithms is always a bit higher for the second
random over the GT clusters, that is, each data point is equal layer since there are fewer clusters, which are better separated.
to one of the columns of W1 , up to scaling factor and fix d01 The reason SNPA underperforms is because some clusters are
to 100. Finally, noise is added to the data such that contained in the convex cone of the others, while SNPA is
designed to identify extreme rays of the cone generated by

N
X = max 0, X̃ + ||X̃||F , the columns of X.
||N ||F

1468
Table I: Comparison of the clustering accuracies at layer 1 (ACC 1) and 2 (ACC 2) and final relative error (rel_err) of deep
MF applied on synthetic data with several initialization strategies, as a function of the noise level. The average and standard
deviation (if above 0.01) over 10 data sets are reported. The best method in terms of accuracy is highlighted in bold for each
configuration.
RAND SNPA SODA RAND+SODA
ACC 1 ACC 2 rel_err (%) ACC 1 ACC 2 rel_err (%) ACC 1 ACC 2 rel_err (%) ACC 1 ACC 2 rel_err
10−4 0.54 ± 0.14 0.74 ± 0.21 9.26 ± 6.23 0.21 ± 0.03 0.69 ± 0.17 14.84 ± 3.91 1 1 7.49 1 1 7.50
10−3 0.49 ± 0.17 0.66 ± 0.19 9.41 ± 6.19 0.18 ± 0.02 0.67 ± 0.14 15.49 ± 4.24 1 1 7.49 1 1 7.50
10−2 0.48 ± 0.17 0.76 ± 0.16 10.31 ± 6.51 0.42 ± 0.09 0.72 ± 0.16 10.82 ± 4.21 0.97 0.99 7.51 0.97 0.99 7.52
10−1 0.40 ± 0.07 0.70 ± 0.17 15.46 ± 4.57 0.38 ± 0.07 0.68 ± 0.09 14.27 ± 3.20 0.69 ± 0.01 0.92 ± 0.01 9.75 0.57 ± 0.07 0.92 ± 0.01 10.00 ± 0.67

Interestingly, the value of the relative reconstruction errors

obtained by the different algorithms are close to one another
(for example, for = 10−2 , the average is 10.30% for
RAND, 10.82% for SNPA, 7.51% for SODA, and 7.52%
for RAND+SODA), although SODA-based algorithms have
a significantly higher clustering accuracy (for example, for
= 10−2 at the first layer, the average accuracy is
48% for RAND, 42% for SNPA, and 97% for SODA and
RAND+SODA). This illustrates the fact that on challenging
settings such as the one of Fig. 1 for which the maximum
distance between two points belonging to the same cluster
might be larger than the inter-cluster distance, different ways
of splitting the data lead to comparable reconstruction errors
but rather different clustering accuracies. In other words,
ONMF has many local minima with similar objective function
values, so it is important to use proper initialization and
algorithms to identify good solutions. In fact, SODA extracts
initial basis vectors located at the center of each cluster, which
is not guaranteed by the other approaches. Hence SODA
should be preferred as an initialization technique as it performs Fig. 2: Urban hyperspectral image.
significantly better in terms of accuracy.

B. Hyperspectral unmixing signatures, are merged in a single cluster. Then, the road/metal
A hyperspectral image (HI) contains the reflectance values and dirt are merged to create a single cluster while the two
of n pixels in m wavelength spectral bands and is generally kinds of grass are also merged. Finally, the road and roof are
represented by a matrix X ∈ Rm×n where each column of X merged, while trees and grass are also gathered in a cluster
is the so-called spectral signature of each pixel. Hyperspectral made of vegetation.
unmixing (HU) consists in identifying the spectral signatures Deep MF provides a richer decomposition than single-layer
of r materials and under the linear mixing assumption, NMF is matrix factorization and the hybrid initialization combining
appropriate to solve HU [16]. Similarly, when deep ONMF is SNPA with SODA is efficient to set up the factors.
applied, the materials (also called endmembers) are extracted
in a hierarchical bottom-up fashion. V. C ONCLUSION
The HYDICE Urban HI is made of n = 307 × 307 pixels In this paper, we explained why deep ONMF is equivalent
with m = 162 spectral bands; see Fig. 2. There are several to a particular bottom-up hierarchical clustering. We then
versions of the ground truth depending on the number of proposed a greedy initialisation for deep ONMF, SODA, which
materials considered [17]. was shown to outperform random initialization and SNPA on
The abundance maps, that is, the proportions in which every synthetic data sets, especially in situations with noise or when
material appears in every pixel, extracted by deep ONMF, with the clusters are quite close to each other. We emphasized the
L = 6, dl = 8 − l for all l are represented on Fig. 3. To fact that similar (small) final reconstruction errors can be as-
initialize the factors of each layer, we first apply ONMF with sociated to various clustering accuracies hence a proper choice
d1 = 7 with SNPA initialization, and then apply SODA on W1 , of the initialization technique is critical. We also showed
while [6] used a multilayer ONMF with SNPA initialization that deep ONMF initialized with SODA-based algorithms are
of all layers. For conciseness, we gathered the representations able to produce meaningful hierarchical decompositions in a
of layer 3 and 4 as well as those of layer 5 and 6 in a single hyperspectral image.
level as distinct clusters were merged at these layers. The first Future directions of research include to validate the pro-
layer extracts two types of grass, trees, road, dirt, metal and posed method on more data sets and other applications, such
roof. At layer 2, road and metal, which have similar spectral as topic modeling. Also, a thorough study of the robustness

1469
Fig. 3: Deep ONMF applied on the Urban HI.

to noise of SODA would be interesting. In fact, as long as [7] Bensheng Lyu, Kan Xie, and Weijun Sun, “A deep orthogonal non-
the noise is sufficiently small, SODA provides an optimal negative matrix factorization method for learning attribute representa-
tions,” in International Conference on Neural Information Processing.
clustering. This is obvious in the noiseless case, where all Springer, 2017, pp. 443–452.
data points in the same cluster are multiples of one another, [8] Filippo Pompili, Nicolas Gillis, P-A Absil, and François Glineur, “Two
and should be quantified in noisy situations. algorithms for orthogonal nonnegative matrix factorization with applic-
ation to clustering,” Neurocomputing, vol. 141, pp. 15–25, 2014.
[9] Athman Bouguettaya, Qi Yu, Xumin Liu, Xiangmin Zhou, and Andy
ACKNOWLEDGEMENT Song, “Efficient agglomerative hierarchical clustering,” Expert Systems
with Applications, vol. 42, no. 5, pp. 2785–2797, 2015.
This work was supported by the European Research Council [10] Shudong Huang, Zhao Kang, and Zenglin Xu, “Deep k-means: A simple
and effective method for data clustering,” in International Conference
(ERC starting grant no 679515), and by the Fonds de la on Neural Computing for Advanced Applications. Springer, 2020, pp.
Recherche Scientifique - FNRS (F.R.S.-FNRS) and the Fonds 272–283.
Wetenschappelijk Onderzoek - Vlaanderen (FWO) under EOS [11] Da Kuang and Haesun Park, “Fast rank-2 nonnegative matrix factoriz-
ation for hierarchical document clustering,” in Proceedings of the 19th
Project no O005318F-RG47. Pierre De Handschutter is a ACM SIGKDD international conference on Knowledge discovery and
research fellow of the F.R.S.-FNRS. data mining, 2013, pp. 739–747.
[12] Yuqian Li, Diana M Sima, Sofie Van Cauter, Anca R Croitor Sava, Uwe
Himmelreich, Yiming Pi, and Sabine Van Huffel, “Hierarchical non-
R EFERENCES negative matrix factorization (hNMF): a tissue pattern differentiation
method for glioblastoma multiforme diagnosis using MRSI,” NMR in
[1] Pando Georgiev, Fabian Theis, and Andrzej Cichocki, “Sparse compon- Biomedicine, vol. 26, no. 3, pp. 307–319, 2013.
ent analysis and blind source separation of underdetermined mixtures,” [13] Nicolas Gillis, Da Kuang, and Haesun Park, “Hierarchical clustering of
IEEE transactions on neural networks, vol. 16, no. 4, pp. 992–996, 2005. hyperspectral images using rank-two nonnegative matrix factorization,”
[2] Daniel D Lee and H Sebastian Seung, “Learning the parts of objects IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 4,
by non-negative matrix factorization,” Nature, vol. 401, no. 6755, pp. pp. 2066–2078, 2015.
788–791, 1999. [14] Nicolas Gillis, “Successive nonnegative projection algorithm for robust
[3] Chris HQ Ding, Tao Li, Wei Peng, and Haesun Park, “Orthogonal nonnegative blind source separation,” SIAM Journal on Imaging Sci-
nonnegative matrix t-factorizations for clustering,” in Proceedings of the ences, vol. 7, no. 2, pp. 1420–1450, 2014.
12th ACM SIGKDD international conference on Knowledge discovery [15] Seungjin Choi, “Algorithms for orthogonal nonnegative matrix factoriz-
and data mining, 2006, pp. 126–135. ation,” in 2008 IEEE international joint conference on neural networks.
[4] Andrzej Cichocki and Rafal Zdunek, “Multilayer nonnegative matrix IEEE, 2008, pp. 1828–1832.
factorisation,” Electronics Letters, vol. 42, no. 16, pp. 947–948, 2006. [16] José M Bioucas-Dias, Antonio Plaza, Nicolas Dobigeon, Mario Par-
[5] George Trigeorgis, Konstantinos Bousmalis, Stefanos Zafeiriou, and ente, Qian Du, Paul Gader, and Jocelyn Chanussot, “Hyperspectral
Björn Schuller, “A deep matrix factorization method for learning unmixing overview: Geometrical, statistical, and sparse regression-
attribute representations,” IEEE Transactions on Pattern Analysis and based approaches,” IEEE journal of selected topics in applied earth
Machine Intelligence, vol. 39, no. 3, pp. 417–429, 2016. observations and remote sensing, vol. 5, no. 2, pp. 354–379, 2012.
[6] Pierre De Handschutter, Nicolas Gillis, and Xavier Siebert, “Deep matrix [17] Feiyun Zhu, “Spectral unmixing datasets with ground truths,” arXiv
factorizations,” arXiv preprint arXiv:2010.00380, 2020. preprint arXiv:1708.05125, 2017.

1470

Maths Cheat Sheet PDF
No ratings yet
Maths Cheat Sheet PDF
1 page
A Block Coordinate Descent-Based Projected GradientAlgorithm For Orthogonal Non-Negative Matrix Factorization
No ratings yet
A Block Coordinate Descent-Based Projected GradientAlgorithm For Orthogonal Non-Negative Matrix Factorization
22 pages
Approximation Algorithms For Orthogonal Non-Negative Matrix Factorization
No ratings yet
Approximation Algorithms For Orthogonal Non-Negative Matrix Factorization
12 pages
Kim 2013
No ratings yet
Kim 2013
35 pages
Non-Negative Matrix Factorization, A New Tool For Feature Extraction: Theory and Applications
No ratings yet
Non-Negative Matrix Factorization, A New Tool For Feature Extraction: Theory and Applications
8 pages
Graph Regularized Non-Negative Matrix Factorization For Data Representation
No ratings yet
Graph Regularized Non-Negative Matrix Factorization For Data Representation
17 pages
Graph Regularized Non-Negative Matrix Factorization For Data Representation
No ratings yet
Graph Regularized Non-Negative Matrix Factorization For Data Representation
14 pages
Graph Dual Regularization Non-Negative Matrix Factorization For Co-Clustering
No ratings yet
Graph Dual Regularization Non-Negative Matrix Factorization For Co-Clustering
14 pages
Algorithms For Non-Negative Matrix Factorization
No ratings yet
Algorithms For Non-Negative Matrix Factorization
7 pages
Multiplicative Updates For NMF With - Divergences Under Disjoint Equality Constraints
No ratings yet
Multiplicative Updates For NMF With - Divergences Under Disjoint Equality Constraints
31 pages
Von Mises-Fisher Mixture Model-Based Deep Learning: Application To Face Verification
No ratings yet
Von Mises-Fisher Mixture Model-Based Deep Learning: Application To Face Verification
16 pages
Research 4
No ratings yet
Research 4
22 pages
On Clustering Binary Data: Tao Li Shenghuo Zhu
No ratings yet
On Clustering Binary Data: Tao Li Shenghuo Zhu
5 pages
Support Vector Machines For Histogram-Based Image Classification
No ratings yet
Support Vector Machines For Histogram-Based Image Classification
10 pages
Topic: Non-Negative Matrix Factorisation: Assignment - 2
No ratings yet
Topic: Non-Negative Matrix Factorisation: Assignment - 2
6 pages
Multi-Resolution Beta-Divergence NMF For Blind Spectral Unmixing
No ratings yet
Multi-Resolution Beta-Divergence NMF For Blind Spectral Unmixing
13 pages
Nonnegative Matrix Factorization
No ratings yet
Nonnegative Matrix Factorization
4 pages
Non-Negative Matrix Factorization
No ratings yet
Non-Negative Matrix Factorization
19 pages
1861 Algorithms For Non Negative Matrix Factorization
No ratings yet
1861 Algorithms For Non Negative Matrix Factorization
7 pages
Dimensionality Reduction
No ratings yet
Dimensionality Reduction
7 pages
NHNRM spl11pmd
No ratings yet
NHNRM spl11pmd
10 pages
2EL1730 ML Lecture11 NMF - Annotated
No ratings yet
2EL1730 ML Lecture11 NMF - Annotated
41 pages
NMF in R
No ratings yet
NMF in R
36 pages
Algorithms For Nonnegative Matrix Factorization With The Kullback-Leibler Divergence
No ratings yet
Algorithms For Nonnegative Matrix Factorization With The Kullback-Leibler Divergence
31 pages
Unvilling - Shapes - P
No ratings yet
Unvilling - Shapes - P
46 pages
Alexandridis 2015
No ratings yet
Alexandridis 2015
5 pages
Pgradnmf PDF
No ratings yet
Pgradnmf PDF
27 pages
Projected Gradient Methods For Non-Negative Matrix Factorization
No ratings yet
Projected Gradient Methods For Non-Negative Matrix Factorization
27 pages
NMF Tutorial
No ratings yet
NMF Tutorial
189 pages
Fast and Efficient Boolean Matrix Factorization by Geometric Segmentation
No ratings yet
Fast and Efficient Boolean Matrix Factorization by Geometric Segmentation
8 pages
(SSNMF) Semi-Supervised Nonnegative Matrix Factorization
No ratings yet
(SSNMF) Semi-Supervised Nonnegative Matrix Factorization
4 pages
Bag of Features
No ratings yet
Bag of Features
7 pages
C. Cifarelli Et Al - Incremental Classification With Generalized Eigenvalues
No ratings yet
C. Cifarelli Et Al - Incremental Classification With Generalized Eigenvalues
25 pages
Mining Ratio Rules Via Principal Sparse Non Negative Matrix Factorization
No ratings yet
Mining Ratio Rules Via Principal Sparse Non Negative Matrix Factorization
4 pages
Feature Selection For Nonlinear Kernel Support Vector Machines
No ratings yet
Feature Selection For Nonlinear Kernel Support Vector Machines
6 pages
1 Non-Negative Matrix Factorization (NMF) : K A A A
No ratings yet
1 Non-Negative Matrix Factorization (NMF) : K A A A
7 pages
Maximal Margin Hyper-Sphere SVM For Binary Pattern Classification
No ratings yet
Maximal Margin Hyper-Sphere SVM For Binary Pattern Classification
23 pages
Support Vector Machines
No ratings yet
Support Vector Machines
32 pages
ML Unit-4
No ratings yet
ML Unit-4
28 pages
BigData Assessment2 26230605
No ratings yet
BigData Assessment2 26230605
14 pages
CBM342 BCI Unit IV
No ratings yet
CBM342 BCI Unit IV
22 pages
AML Unit-4 (Part-2)
No ratings yet
AML Unit-4 (Part-2)
6 pages
A Map Reduce Based Support Vector Machine For Big Data Classification
No ratings yet
A Map Reduce Based Support Vector Machine For Big Data Classification
22 pages
Laude 2018 Discrete Continuous
No ratings yet
Laude 2018 Discrete Continuous
13 pages
Support Vector Network
No ratings yet
Support Vector Network
25 pages
NIPS 2009 An Integer Projected Fixed Point Method For Graph Matching and Map Inference Paper
No ratings yet
NIPS 2009 An Integer Projected Fixed Point Method For Graph Matching and Map Inference Paper
9 pages
Document Clustering Through Non-Negative Matrix Factorization-A Case Study of Hadoop For Computational Time Reduction of Large Scale Documents.
No ratings yet
Document Clustering Through Non-Negative Matrix Factorization-A Case Study of Hadoop For Computational Time Reduction of Large Scale Documents.
10 pages
Projected Gradient Methods For Nonnegative Matrix
No ratings yet
Projected Gradient Methods For Nonnegative Matrix
24 pages
ZhuoLiu SVclustering
No ratings yet
ZhuoLiu SVclustering
28 pages
Regularized Compression of A Noisy Blurred Image
No ratings yet
Regularized Compression of A Noisy Blurred Image
13 pages
Explicit Unsupervised Feature Selection Based On Structured Graph and Locally
No ratings yet
Explicit Unsupervised Feature Selection Based On Structured Graph and Locally
16 pages
ML 41
No ratings yet
ML 41
49 pages
Least Squares Support Vector Machines: Johan Suykens
No ratings yet
Least Squares Support Vector Machines: Johan Suykens
84 pages
Cai 2022 Online
No ratings yet
Cai 2022 Online
17 pages
Recommender Systems Clustering Using Bayesian Non Negative Matrix Factorization
No ratings yet
Recommender Systems Clustering Using Bayesian Non Negative Matrix Factorization
16 pages
Face Recognition Using PCA and SVM
No ratings yet
Face Recognition Using PCA and SVM
5 pages
Ipmv Mod 5&6 (Theory Questions)
No ratings yet
Ipmv Mod 5&6 (Theory Questions)
11 pages
Adhikary e Murty - 2012 - Feature Selection For Unsupervised Learning
No ratings yet
Adhikary e Murty - 2012 - Feature Selection For Unsupervised Learning
8 pages
SVM & Image Classification.
No ratings yet
SVM & Image Classification.
22 pages
Differential Forms
From Everand
Differential Forms
Henri Cartan
5/5 (2)
Theory of Approximation
From Everand
Theory of Approximation
N. I. Achieser
No ratings yet
FORM Reliability
No ratings yet
FORM Reliability
2 pages
HANDOUT - Linear Algebra - 23GE1303 - ODD 24-25
No ratings yet
HANDOUT - Linear Algebra - 23GE1303 - ODD 24-25
5 pages
MA-108 Differential Equations I: Ronnie Sebastian
No ratings yet
MA-108 Differential Equations I: Ronnie Sebastian
36 pages
10 - Vector Algebra
No ratings yet
10 - Vector Algebra
22 pages
SCI Lab Manual Format-Final - 1
No ratings yet
SCI Lab Manual Format-Final - 1
66 pages
ASSIGN8
No ratings yet
ASSIGN8
5 pages
GQB CS PDF
100% (8)
GQB CS PDF
426 pages
Math 1041 (Lecture Note)
No ratings yet
Math 1041 (Lecture Note)
96 pages
01 Directional Derivatives and Gradient
No ratings yet
01 Directional Derivatives and Gradient
114 pages
Linear Transformations
No ratings yet
Linear Transformations
31 pages
Diagonalizationrevisited11 10 14
No ratings yet
Diagonalizationrevisited11 10 14
34 pages
LVC-2 Glossary of Notations Recommender Systems
No ratings yet
LVC-2 Glossary of Notations Recommender Systems
2 pages
IIT Matrices
No ratings yet
IIT Matrices
46 pages
Applications of Duality Theory (Diewert)
No ratings yet
Applications of Duality Theory (Diewert)
9 pages
Lesson 3 Part 2 - Tup Format - Pee 4M - Numerical Methods
No ratings yet
Lesson 3 Part 2 - Tup Format - Pee 4M - Numerical Methods
38 pages
MA1513 Chapter 1 Lecture Note
No ratings yet
MA1513 Chapter 1 Lecture Note
34 pages
Mit18 06S10 L31
No ratings yet
Mit18 06S10 L31
13 pages
221tee100 Scheme
No ratings yet
221tee100 Scheme
2 pages
Matlab - Matrix Laboratory
No ratings yet
Matlab - Matrix Laboratory
7 pages
Lines and Planes in 2D and 3D Geometry: Mihai-Sorin Stupariu
No ratings yet
Lines and Planes in 2D and 3D Geometry: Mihai-Sorin Stupariu
51 pages
hw3 Solutions PDF
No ratings yet
hw3 Solutions PDF
11 pages
EMFT
0% (1)
EMFT
59 pages
Exam
No ratings yet
Exam
9 pages
Class 12th Maths Sample Paper
No ratings yet
Class 12th Maths Sample Paper
5 pages
Lec 3 Rigid Body Motion
No ratings yet
Lec 3 Rigid Body Motion
92 pages
The Theory of Quantum Information 1st Edition John Watrous - Download The Ebook Now For An Unlimited Reading Experience
100% (4)
The Theory of Quantum Information 1st Edition John Watrous - Download The Ebook Now For An Unlimited Reading Experience
57 pages
Enlargement Worksheet
No ratings yet
Enlargement Worksheet
10 pages
Information Theory Fundamentals: Distance Between Two Images Based On Pixels
No ratings yet
Information Theory Fundamentals: Distance Between Two Images Based On Pixels
24 pages
Controlability & Reachability
No ratings yet
Controlability & Reachability
3 pages

Deep Orthogonal Matrix Factorization As A Hierarchical Clustering Technique

Uploaded by

Deep Orthogonal Matrix Factorization As A Hierarchical Clustering Technique

Uploaded by

Deep orthogonal matrix factorization as a

hierarchical clustering technique

ISBN: 978-9-0827-9706-0 1466 EUSIPCO 2021

A. Solving ONMF for r = n − 1 B. Application to deep ONMF

Interestingly, the value of the relative reconstruction errors

You might also like