Fusion Subspace Clustering For Incomplete Data

Abstract—This paper introduces fusion subspace clustering, a novel method to learn low-dimensional structures that approximate large-scale yet highly incomplete data. The main idea is to assign each datum to a subspace of its own, and minimize the distance between the subspaces of all data, so that subspaces of the same cluster get fused together. Our method allows low, high, and even full-rank data; it directly accounts for noise, and its sample complexity approaches the information-theoretic limit. In addition, our approach provides a natural model selection clusterpath, and a direct completion method. We give convergence guarantees, analyze computational complexity, and show through extensive experiments on real and synthetic data that our approach performs comparably to the state-of-the-art with complete data, and dramatically better if data is missing.

I. INTRODUCTION

Inferring low-dimensional structures that explain high-dimensional data has become a cornerstone of discovery in virtually all fields of science. Principal component analysis (PCA), which identifies the low-dimensional linear subspace that best explains a dataset, is arguably the most prominent technique for this purpose. In many applications — computer vision, image processing, bioinformatics, linguistics, network analysis, and more [1]–[10] — data is often composed of a mixture of several classes, each of which can be explained by a different subspace. Clustering accordingly is an important unsupervised learning problem that has received tremendous attention in recent years, producing theory and algorithms to handle outliers, noisy measurements, privacy concerns, and data constraints, among other difficulties [11]–[31].

However, one major contemporary challenge is that data is often incomplete. For example, in image inpainting, the values of some pixels are missing due to faulty sensors and image contamination [32]; in computer vision, features are often missing due to occlusions and tracking algorithm malfunctions [33]; in recommender systems, each user only rates a limited number of items [34]; in a network, most nodes communicate in subsets, producing only a handful of all the possible measurements [7].

Missing data notoriously complicates clustering. The main difficulty with highly incomplete data is that subsets of points are rarely observed in overlapping coordinates, which impedes assessing distances. Existing self-expressive formulations [9], [35], agglomerative strategies [25], and partial neighborhoods [39] all require observing O(r + 1) overlapping coordinates in at least K sets of O(r + 1) points in order to cluster K r-dimensional subspaces. In low-sampling regimes, this would require a super-polynomial number of points [39], which are rarely available in practice. Alternatively, filling missing entries with a sensible value (e.g., zeros or means [35], or using low-rank matrix completion [36]) may work if data is missing at a rate inversely proportional to the subspaces' dimensions [37], or if data is low-rank. However, in most applications data is missing at much higher rates, and due to the number and dimensions of the subspaces, data is typically high or even full-rank. In general, data filled with zeros or means no longer lies in a union of subspaces (UoS), thus guaranteeing failure even with a modest amount of missing data [38]. Other approaches include alternating methods like k-subspaces [40], expectation-maximization [41], group-lasso [42], and lifting techniques [38], [43], [44] that require (at the very least) squaring the dimension of an already high-dimensional problem, which severely limits their applicability. More recently, methods like [45], [46] incorporate a variation of fuzzy c-means for data imputation and/or clustering. However, these existing approaches either have limited applicability or do not perform well if data is missing in large quantities [50]. For example, k-nearest neighbors imputation can distort the data distribution, resulting in inaccurate nearest neighbor identification [47]. Regression methods can also lead to low accuracy, especially if the underlying variables have low correlation. Existing approaches, along with their weaknesses, are compared in [50]. These challenges call attention to new strategies to address missing data.

This paper introduces fusion subspace clustering (FSC), a novel approach to address incomplete data, inspired by greedy methods, convex relaxations, and fusion penalties [59]–[64]. The main idea is to assign each datum to a subspace of its own, and then fuse together nearby subspaces by minimizing (i) the distance between each datum and its subspace (thus guaranteeing that each datum is explained by its subspace), and (ii) the distance between the subspaces of all data, so that subspaces from points that belong together get fused into one. While FSC is mainly motivated by missing data, it is also new to full-data, and has the following advantages: it allows low, high, and even full-rank data; it directly allows noise; and its sample complexity approaches the information-theoretic limit [65], as shown in Sections VII-A4 and VII-A5. Similar to hierarchical clustering, FSC can produce a model selection clusterpath providing detailed information about intra-cluster and cluster-to-cluster distances (see Figure 2). Finally, its simplicity makes FSC amenable to analysis: our main theoretical result shows that FSC converges to a local minimum. This is particularly remarkable in light of the fact that most other subspace clustering algorithms lack theoretical guarantees (even local convergence) when data is missing (except for restrictive fractions of missing entries, and liftings, which are unfeasible for high-dimensional data). Our experiments on real and synthetic data show that with full-data, FSC performs comparably to the state-of-the-art, and dramatically better if data is missing.

Notice that if λ ≥ n(n − 1)/2 (the number of distinct pairs of points), then the problem is unconstrained, and a trivial solution is Ui formed by xi and any other r − 1 vectors (in fact this is precisely our choice for initialization, with the additional r − 1 vectors populated with i.i.d. N(0, 1) entries, known to produce incoherent and nearly orthogonal subspaces with high probability [51]). If λ = n(n − 1)/2 − 1, then (1) forces two subspaces to fuse, similar to the first step in hierarchical clustering. More generally, if λ = n(n − 1)/2 − ℓ, then (1) forces ℓ − 1 subspaces to fuse. However, (1) is a combinatorial problem, so we propose the relaxation in (2).
The first term in (2) keeps each subspace close to the column assigned to it, while the second term forces subspaces from different columns to get closer, even if they no longer contain exactly their assigned columns. As λ grows, subspaces get closer and closer, up to the point where some subspaces fuse into one. This is verified in our experiments (see Figure 3). The extreme case (λ = ∞) forces all subspaces to fuse into one (to attain zero in the second term), meaning we only have one subspace to explain all data, which is precisely PCA (for full-data) and LRMC (for incomplete data). In other words, FSC is a generalization of PCA and LRMC, which is the particular case of (2) with λ = ∞. Iteratively decreasing λ will result in more and more clusters, until λ = 0 produces n clusters. The more subspaces, the more accuracy, but the more degrees of freedom (overfitting). For each λ that provides a different clustering, we can compute a goodness-of-fit test (like the Akaike information criterion, AIC [72]) that quantifies the tradeoff between accuracy and degrees of freedom, to determine the best number of subspaces K. For example, this test can be in terms of K and the residuals of the projections of each x_i^Ω onto its corresponding Û_k^Ω, as defined in Section III-A. Similarly, we can iteratively increase r to find all the columns that lie in 1-dimensional subspaces, then all the columns that lie in 2-dimensional subspaces, and so on (pruning the data at each iteration). This will result in an estimate of the number of subspaces K, and their dimensions.

The Subspace Clusterpath. Notice that iteratively increasing λ also provides a natural way to quantify and visualize intra-cluster and outer-cluster similarities through a graph showing the evolution of subspaces as they fuse together, similar to the clusterpath produced in [62] for Euclidean clustering (Figure 2). Notice, however, that fusion is not necessarily monotonic, i.e., fused subspaces may split, so in general this graph may be a network, rather than a tree.
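As an illustration of the model selection over λ described above, the sketch below sweeps λ, records the number of clusters produced at each value, and scores each clustering with an AIC-style criterion. The helper callables run_fsc and fit_residual are hypothetical placeholders for the clustering routine and the projection residuals, and the exact form of the criterion is an assumption, not the paper's.

```python
import numpy as np

def aic_score(residual_sq, num_clusters, r, d, n):
    """AIC-style tradeoff (an assumed form): data fit plus a penalty
    proportional to the degrees of freedom of K r-dimensional subspaces."""
    dof = num_clusters * r * (d - r)        # rough parameter count of K subspaces
    return n * np.log(residual_sq / n + 1e-12) + 2 * dof

def select_model(X, lambdas, r, run_fsc, fit_residual):
    """Sweep lambda, score each resulting clustering, keep the best one.
    run_fsc(X, lam, r) -> labels; fit_residual(X, labels, r) -> sum of squared
    projection residuals. Both are hypothetical stand-ins for the FSC pipeline."""
    d, n = X.shape
    best = None
    for lam in lambdas:
        labels = run_fsc(X, lam, r)
        K = len(set(labels))
        score = aic_score(fit_residual(X, labels, r), K, r, d, n)
        if best is None or score < best[0]:
            best = (score, lam, K, labels)
    return best  # (score, lambda, number of clusters, labels)
```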
Fig. 2: Left: Clusterpath showing how subspace estimates (corresponding to 32 data points in 4 Gaussian subspaces, 8 each) progressively fuse as λ increases. Center: Distribution of Gaussian kernels in the Yale B dataset with γ = 1; notice that most outer-cluster points receive small values and vice versa, as desired. Right: Similarity matrices produced by FSC on the Yale dataset with γ = 0 (uniform weights, producing a poor clustering because subspaces are not well-separated), and with γ = 1 and Gaussian kernel weights restricted to κ nearest neighbors (producing a near perfect clustering).

VI. PENALTY WEIGHTS, COMPUTATIONAL COMPLEXITY, AND PARAMETERS

Like other fusion formulations [53]–[57], FSC involves weight terms wij that bring the flexibility to distinguish which subspaces to fuse, and which ones not to, which in turn can dramatically improve the clustering quality and computational complexity [73], [74]. Ideally, we would like wij to be large if xi and xj lie in the same subspace, so that the penalty ρ(Ui, Uj) gets a higher weight, forcing subspaces Ui and Uj to fuse into one. Conversely, if xi and xj lie in different subspaces, we want wij to be small, so that the penalty ρ(Ui, Uj) gets ignored, and subspaces Ui and Uj do not fuse.

Here we use wij = √(rd) · 1κ · exp(−γρ²(Ui, Uj)), where the indicator 1κ takes the value one if j is amongst the κ nearest neighbors of i or vice versa, and zero otherwise. Here the factor √(rd) ensures that the penalty is on the order of the number of degrees of freedom (the same rescaling is used in [54], [55]), and the second factor is a Gaussian kernel that slows the fusion of distant subspaces [73], [74], where γ ≥ 0 regulates how separated subspaces are. In particular, γ = 0 corresponds to uniform weights (wij = 1 for every (i, j)), known to be a good option if subspaces are well-separated, and to produce no splits in the clusterpath of Euclidean clustering (Theorem 1 in [62]). More generally, with γ > 0 the second factor measures the distance between subspaces Ui and Uj, such that if Ui and Uj are close, then ρ(Ui, Uj) will be small, resulting in a large value of exp(−γρ²(Ui, Uj)) (and vice versa), as desired. Figure 2 shows the distribution of these Gaussian kernels on the Yale B dataset (which are mostly small for outer-cluster points, and mostly large for intra-cluster points, as desired), together with the similarity matrices produced by two options (γ = 0 and γ = 1).

We point out that besides improving clustering quality, limiting positive weights to nearest neighbors (NNs) improves the computational complexity of FSC. To see this, first notice that finding NNs requires n² linear operations to compute pairwise distances. However, these calculations are negligible in comparison with the polynomial operations required to compute n² gradients (which require matrix inversions). Consequently, by limiting positive weights to κ NNs we cut down these n² polynomial calculations to κn, thus reducing the effective computational complexity of FSC to O(drκn) and achieving linear complexity in problem size (see Sections 3.3 and 5.1 in [73]). This, of course, comes at a price: increasing the effective number of parameters of FSC to a total of four: λ, which controls how much subspaces fuse (see Section V); γ and κ, which together determine the weights wij that control which subspaces fuse; and the gradient step size, which can be tuned with standard techniques; in our experiments we used 5-fold cross-validation.
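To make the weight construction concrete, here is a minimal numpy sketch of weights of the form wij = √(rd) · 1κ · exp(−γρ²(Ui, Uj)). The use of the projection (chordal) distance for ρ and of a distance-based κ-NN rule are illustrative assumptions; the paper's exact choices may differ.

```python
import numpy as np

def pairwise_weights(subspaces, r, d, gamma=1.0, kappa=5):
    """Sketch of FSC-style penalty weights:
    w_ij = sqrt(r*d) * 1_kappa(i, j) * exp(-gamma * rho(U_i, U_j)**2),
    where 1_kappa(i, j) = 1 if j is among the kappa nearest neighbors of i
    (or vice versa) under rho, and rho is taken to be the chordal distance
    between column spans (an illustrative assumption)."""
    n = len(subspaces)
    projectors = [U @ np.linalg.pinv(U) for U in subspaces]
    rho = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            rho[i, j] = rho[j, i] = np.linalg.norm(projectors[i] - projectors[j])

    # kappa-nearest-neighbor indicator, symmetrized: i~j if either is a NN of the other.
    nn_mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        order = np.argsort(rho[i])
        neighbors = [j for j in order if j != i][:kappa]
        nn_mask[i, neighbors] = True
    nn_mask = nn_mask | nn_mask.T

    W = np.sqrt(r * d) * nn_mask * np.exp(-gamma * rho ** 2)
    np.fill_diagonal(W, 0.0)
    return W
```

Setting γ = 0 (and κ = n − 1) recovers the uniform-weight case described above, up to the √(rd) scaling.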
VII. EXPERIMENTS

Now we study the performance of FSC. For reference, we compare against the following subspace clustering algorithms that allow missing data: (a) Entry-wise zero-filling SSC [35]. (b) LRMC + SSC [35]. (c) SSC-Lifting [38]. (d) Algebraic variety high-rank matrix completion [43]. (e) k-subspaces with missing data [40]. (f) EM [41]. (g) Group-sparse subspace clustering [42]. (h) Mixture subspace clustering [42]. We chose these algorithms based on [38], [42], where they show comparable state-of-the-art performance. (a)-(d) essentially run SSC after filling missing entries (with zeros or according to a single larger subspace) or after lifting the data. (e)-(h) are alternating algorithms that according to [42] produce best results when initialized with the output of (a), and so indirectly they also depend on SSC. To measure performance we compute the clustering error (fraction of misclassified points). When applicable (i.e., when no data is missing) we additionally compare against the following full-data approaches: BDR [30], iPursuit [28], LRR [11], [13], LSR [24], LRSC [23], L2Graph [26], SCC [6], SSC [9], and S3C [29]. In the interest of reproducibility, all our code is included in the supplementary material. In the interest of fairness to other algorithms, whenever available we (a) used their code, (b) used their specified parameters, (c) did a sweep to find the best parameters, and (d) used reported results from the literature. Whenever there was a discrepancy, we reported their best performance, be that from reports or from our experiments.
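The clustering error above is the fraction of misclassified points after matching estimated and true cluster labels; a standard way to compute it (an illustrative sketch, not necessarily the paper's implementation) uses the Hungarian algorithm to find the best label matching.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_error(true_labels, pred_labels):
    """Fraction of misclassified points under the best one-to-one matching
    between predicted and true cluster labels (Hungarian algorithm)."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    true_ids = np.unique(true_labels)
    pred_ids = np.unique(pred_labels)
    # Confusion matrix: counts of points with true label t and predicted label p.
    C = np.zeros((len(true_ids), len(pred_ids)), dtype=int)
    for a, t in enumerate(true_ids):
        for b, p in enumerate(pred_ids):
            C[a, b] = np.sum((true_labels == t) & (pred_labels == p))
    row, col = linear_sum_assignment(-C)        # maximize correctly matched points
    return 1.0 - C[row, col].sum() / len(true_labels)
```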
A. Simulations

Since FSC is an entirely new approach to both full and incomplete data, we present a thorough series of experiments to study its behavior as a function of the penalty parameter λ, the ambient dimension d, the number of subspaces involved K, their dimensions r, the noise variance σ², the number of data points in each cluster nk, and of course, the fraction of unobserved entries p. Unless otherwise stated, we use the following default settings: d = 100, K = 4, r = 5, σ = 0, nk = 20, and p = 0. We run 30 trials of each experiment, and show the average results of FSC and all the algorithms above.

In all our simulations we first generate K matrices U*_k ∈ R^{d×r} with i.i.d. N(0, 1) entries, to use as bases of the true subspaces. For each k we generate a matrix Θ*_k ∈ R^{r×nk}, also with i.i.d. N(0, 1) entries, to use as coefficients of the columns in the kth subspace. We then form X as the concatenation [U*_1Θ*_1  U*_2Θ*_2  ···  U*_KΘ*_K], plus a d × n noise matrix with i.i.d. N(0, σ²) entries. To create Ω, we sample each entry independently with probability 1 − p.
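The synthetic data described above can be generated with a few lines of numpy; this sketch follows the stated recipe (Gaussian bases, Gaussian coefficients, additive noise, and an observation mask Ω sampled entrywise with probability 1 − p), with function and variable names chosen here for illustration.

```python
import numpy as np

def generate_uos_data(d=100, K=4, r=5, nk=20, sigma=0.0, p=0.0, rng=None):
    """Union-of-subspaces data as in the simulations: X = [U1*Th1* ... UK*ThK*] + noise,
    with an observation mask Omega sampled entrywise with probability 1 - p."""
    rng = np.random.default_rng(rng)
    blocks, labels = [], []
    for k in range(K):
        U_star = rng.standard_normal((d, r))        # basis of the k-th true subspace
        Theta_star = rng.standard_normal((r, nk))   # coefficients of its nk columns
        blocks.append(U_star @ Theta_star)
        labels += [k] * nk
    X = np.hstack(blocks) + sigma * rng.standard_normal((d, K * nk))
    Omega = rng.random((d, K * nk)) < (1 - p)       # True = observed entry
    return X, Omega, np.array(labels)

# Example: default settings with 90% of the entries missing.
X, Omega, labels = generate_uos_data(p=0.9, rng=0)
```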
1) Effect of the penalty parameter: We first study the number of clusters obtained by FSC as a function of λ, with the default settings above. Figure 3 shows, consistent with our discussion in Section V, that if λ = 0, FSC assigns each point to its own cluster. As λ increases, subspaces start fusing together, up to the point where if λ is too large, FSC fuses all subspaces into one, and all data gets clustered together. Next we study performance. Figure 3 shows that there is a wide range of values of λ that produce low error, showing that FSC is quite stable. Note that the error increases if λ is too small or too large. This is consistent with our previous experiment, showing that extreme values of λ produce too few or too many clusters.

2) Effect of dimensionality: It is well-documented that data in lower-dimensional subspaces are easier to cluster [38], [39], [42]–[44], [65]. In the extreme case, clustering 1-dimensional subspaces requires a simple co-linearity test, and is theoretically possible with as little as 2 samples per column [65]. In contrast, no existing algorithm can successfully cluster (d − 1)-dimensional subspaces (hyperplanes), which is actually impossible even if one entry per column is missing [65]. Of course, subspaces' dimensionality is relative to the ambient dimension: a 10-dimensional subspace is a hyperplane in R^11, but low-dimensional in R^1000. In this experiment we test FSC as a function of the low-dimensionality of the subspaces, i.e., the gap between the ambient dimension d and the subspaces' dimension r. First we fix d = 100 and compute error as a function of r. As r grows, the subspace becomes higher and higher-dimensional. Then we turn things around, fixing r = 5 and varying d. As d grows, this subspace becomes lower and lower-dimensional. Figure 3 shows that FSC is more sensitive to high-dimensionality than the state-of-the-art. However, pay attention to the scale: even in the worst case (r = 30), the gap between FSC and the state-of-the-art is around 10%.

[Figure 3: panels of results on simulations and real data (Yale, Hopkins). The x-axes span the parameter λ, the number of subspaces, the subspaces' dimension, the ambient dimension, the columns per subspace, the number of subjects/objects, and the fraction of missing data; the y-axes show the number of produced clusters and the clustering error. Compared methods include SSC(EWZF), SSC(LRMC), Lift, EM, GSSC, Mixture, BDR, iPursuit, LRR, LSR, LRSC, L2Graph, SCC, SSC, and S3C.]

Fig. 3: Top-left corner: Number of clusters obtained by FSC as a function of the parameter λ in (2). Rest: Clustering error of FSC and other baseline algorithms. Notice the different scales. With full-data FSC is rarely and barely outperformed by other algorithms. In contrast, when data is missing, FSC outperforms other algorithms by a wide margin. For example, in simulations with nk = 20 (resp. the Yale dataset) and p = 0.9, FSC achieves 7.5% error (resp. 25.7%), while the next best algorithm achieves 71.25% (resp. 64.79%). We point out that some curves are "missing" from some plots because some methods are not applicable; e.g., SCC cannot handle missing data, and Lift cannot handle large dimensions.
3) Effect of noise: Figure 3 shows that FSC performs as well as or better than the state-of-the-art at different noise levels. Recall that λ quantifies the tradeoff between how accurately we want to represent each point xi (the first term in (2)) and how close subspaces from different points will be (second term), which in turn determines how subspaces fuse together, or equivalently, how many subspaces we will obtain. If data is completely noiseless, we expect to represent each point very accurately, and so we can use a smaller λ (giving more weight to the first term). On the other hand, if data is noisy, we expect to represent each point within the noise level, and so we can use a larger λ. As a rule of thumb, we can use λ proportional to the noise level σ.

4) Effect of the number of subspaces and data points: Figure 3 shows that FSC is quite robust to the number of subspaces. Recall that in our default settings r = 5, so K ≥ 20 produces a full-rank data matrix X. Figure 3 also evaluates the performance of FSC as a function of the columns per subspace nk. Since r = 5, nk = 6 is information-theoretically necessary for subspace clustering; we conclude that FSC only requires little more than that to perform as well as the state-of-the-art.

5) Effect of missing data: There is a tradeoff between the number of columns per subspace nk and the sampling rate (1 − p) required for subspace clustering [65]. The larger nk, the higher p may be, and vice versa. Figure 3 evaluates the performance of FSC as a function of p with nk = 20, 50 (few and many columns). Notice that if p ≈ 0 (few missing data), then FSC performs as well as the state-of-the-art, and much better as p increases (many missing data); see for example nk = 20 and p = 0.9, where the best alternative algorithm gets 71.25% error, which is close to random guessing (because there are K = 4 subspaces in our default settings). In contrast, FSC gets 7.5% error. Notice that p = 0.9 is very close to the exact information-theoretic minimum sampling rate p = 1 − (r + 1)/d = 0.94 [65]. Similar to noise, if there is much missing data the first term in (2) will carry less weight, which we can compensate for by making λ smaller.
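The information-theoretic limit quoted above is easy to check numerically: for r-dimensional subspaces in ambient dimension d, the quoted threshold on the fraction of missing entries is 1 − (r + 1)/d. A small sanity check for the default simulation setting:

```python
# Sanity check of the quoted missing-data limit p = 1 - (r + 1) / d.
def missing_data_limit(r, d):
    """Largest missing-data fraction quoted in the text, 1 - (r + 1)/d."""
    return 1 - (r + 1) / d

print(missing_data_limit(r=5, d=100))   # 0.94, the value quoted for the simulations
```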
B. Real Data Experiments

1) Face Clustering: It has been shown that the vectorized images of the same person under different illuminations lie near a 9-dimensional subspace [4]. In this experiment we evaluate the performance of FSC at clustering faces of multiple individuals, using the Yale B dataset [75], containing a total of 2432 images, each of size 48 × 42, evenly distributed among 38 individuals. To compare things vis-à-vis, before clustering we use robust PCA [76] on each cluster to remove outliers; this is a widely used preprocessing step [9], [35], [38], [41]. In each of 30 trials, we select K people uniformly at random, and record the clustering error. Figure 3 shows that FSC is very competitive and there is only a negligible gap between FSC and the best alternative algorithm. Figure 3 also shows the average clustering error as a function of the amount of missing data (induced uniformly at random), with K fixed to 6 people. Consistent with our simulations, FSC outperforms the state-of-the-art in the low-sampling regime (many missing data). For example, with p = 0.9 FSC gets 25.7% error, while the next best algorithm gets 64.79%. Note that p = 0.9 is quite close to the exact information-theoretic limit p = 1 − (r + 1)/d = 0.995 [65].

2) Motion Segmentation: It is well-known that the locations over time of a rigidly moving object approximately lie in a 3-dimensional affine subspace [2], [3] (which can be thought of as a 4-dimensional linear subspace whose fourth component accounts for the offset); a minimal sketch of this lifting is given below. Hence, by tracking points in a video, and subspace clustering them, we can segment the multiple moving objects appearing in the video. In this experiment we test FSC on this task, using the Hopkins 155 dataset [77], containing sequences of points tracked over time in 155 videos. Each video contains K = 2, 3 objects. On average, each object is tracked at nk = 133 points (described by two coordinates) over 29 frames, producing vectors in ambient dimension d = 58. Figure 3 shows the results. With full-data FSC is far from the best, but has performance comparable to the rest of the algorithms. However, when data is missing, we again see that FSC dramatically outperforms the rest of the algorithms. Figure 3 shows the average results over all videos when missing data is induced uniformly at random. For example, with p = 0.9, the best baseline algorithm gets 52.95% error. In contrast, FSC achieves 15.03% error. Notice that p = 0.9 is very close to the exact information-theoretic minimum sampling rate p = 1 − (r + 1)/d = 0.914 [65].
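The affine-to-linear viewpoint mentioned above is the standard homogeneous-coordinate trick: appending a constant coordinate to columns lying in a 3-dimensional affine subspace yields columns lying in a 4-dimensional linear subspace. The sketch below illustrates this on synthetic trajectories; the construction and dimensions follow the description in the text and are not the Hopkins preprocessing code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one rigid object: 2 coordinates over 29 frames (d = 58),
# with columns lying in a 3-dimensional affine subspace (basis B, offset c).
d, n_points = 58, 133
B = rng.standard_normal((d, 3))                  # affine basis
c = rng.standard_normal((d, 1))                  # affine offset
coeffs = rng.standard_normal((3, n_points))
X = B @ coeffs + c                               # columns in a 3-dim affine subspace

# Homogeneous lift: append a constant 1 to every column.
X_lifted = np.vstack([X, np.ones((1, n_points))])

# The lifted columns span a 4-dimensional *linear* subspace:
print(np.linalg.matrix_rank(X - X[:, [0]]))      # 3 (affine dimension)
print(np.linalg.matrix_rank(X_lifted))           # 4 (linear dimension after lifting)
```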
3) Handwritten Digits Clustering: As a last experiment we use FSC to cluster vectorized images of handwritten digits, known to be well-approximated by 12-dimensional subspaces [79]. For this purpose we use the MNIST dataset [80], containing thousands of grayscale 28 × 28 images. First we test FSC as a function of the number of digits (subspaces) in the mix. Following common practice, for each K = 2, 3, . . . , 10 we randomly selected K digits, nk = 50 images per digit, and aimed to cluster the images. Figure 3 shows the average results of 10 independent trials of each configuration. Consistent with our previous experiments, if no data is missing, FSC performs comparably to the rest of the algorithms, with a gap (in the worst cases) no larger than 5%. However, as soon as we induce missing data (uniformly at random), FSC starts outperforming all other methods by a huge margin (up to 40%). Figure 3 shows the results of this experiment.

BROADER IMPACT

This paper introduces a novel strategy to address missing data in subspace clustering, which enables clustering and completion in regimes where other methods fail. Practitioners in computer vision, recommender systems, network inference, and data science in general can use our new method. We expect this paper to motivate the learning community to explore new directions that stem from this initial work. These include the investigation of (i) ADMM and AMA formulations of (2) that reduce computational complexity (as in [73] for Euclidean clustering), (ii) optimal initializations, and greedy, adaptive, data-driven, and outlier-robust variants, and (iii) geodesics on the Stiefel and Grassmann manifolds (similar to [78] for subspace tracking) to avoid the inversion of the term U_i^T U_i in P_i, which may become ill-conditioned. Ultimately, we hope this publication spurs discussions and insights that lead to better methods and a better understanding of subspace clustering when data is missing.

REFERENCES

[1] T. Hastie and P. Simard, Metrics and models for handwritten character recognition, Statistical Science, 1998.
[2] C. Tomasi and T. Kanade, Shape and motion from image streams under orthography, International Journal of Computer Vision, 1992.
[3] K. Kanatani, Motion segmentation by subspace separation and model selection, IEEE International Conference on Computer Vision, 2001.
[4] R. Basri and D. Jacobs, Lambertian reflection and linear subspaces, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003.
[5] J. Rennie and N. Srebro, Fast maximum margin matrix factorization for collaborative prediction, International Conference on Machine Learning, 2005.
[6] G. Chen and G. Lerman, Spectral curvature clustering (SCC), International Journal of Computer Vision, 2009.
[7] B. Eriksson, P. Barford, J. Sommers and R. Nowak, DomainImpute: Inferring unseen components in the Internet, IEEE INFOCOM Mini-Conference, 2011.
[8] A. Zhang, N. Fawaz, S. Ioannidis and A. Montanari, Guess who rated this movie: Identifying users through subspace clustering, Uncertainty in Artificial Intelligence, 2012.
[9] E. Elhamifar and R. Vidal, Sparse subspace clustering: Algorithm, theory, and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
[10] G. Mateos and K. Rajawat, Dynamic network cartography: Advances in network health monitoring, IEEE Signal Processing Magazine, 2013.
[11] G. Liu, Z. Lin and Y. Yu, Robust subspace segmentation by low-rank representation, International Conference on Machine Learning, 2010.
[12] R. Vidal, Subspace clustering, IEEE Signal Processing Magazine, 2011.
[13] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu and Y. Ma, Robust recovery of subspace structures by low-rank representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
[14] M. Soltanolkotabi, E. Elhamifar and E. Candès, Robust subspace clustering, Annals of Statistics, 2014.
[15] C. Qu and H. Xu, Subspace clustering with irrelevant features via robust Dantzig selector, Advances in Neural Information Processing Systems, 2015.
[16] X. Peng, Z. Yi and H. Tang, Robust subspace clustering via thresholding ridge regression, AAAI Conference on Artificial Intelligence, 2015.
[17] Y. Wang and H. Xu, Noisy sparse subspace clustering, International Conference on Machine Learning, 2013.
[18] Y. Wang, Y. Wang and A. Singh, Differentially private subspace clustering, Advances in Neural Information Processing Systems, 2015.
[19] H. Hu, J. Feng and J. Zhou, Exploiting unsupervised and supervised constraints for subspace clustering, IEEE Pattern Analysis and Machine Intelligence, 2015.
[20] C. You, D. Robinson and R. Vidal, Scalable sparse subspace clustering by orthogonal matching pursuit, IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[21] Y. Yang, J. Feng, N. Jojic, J. Yang and T. Huang, ℓ0-sparse subspace clustering, European Conference on Computer Vision, 2016.
[22] B. Xin, Y. Wang, W. Gao and D. Wipf, Data-dependent sparsity for subspace clustering, Uncertainty in Artificial Intelligence, 2017.
[23] P. Favaro, R. Vidal and A. Ravichandran, A closed form solution to robust subspace estimation and clustering, IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[24] C. Lu, H. Min, Z. Zhao, L. Zhu, D. Huang and S. Yan, Robust and efficient subspace segmentation via least squares regression, European Conference on Computer Vision, 2012.
[25] D. Park, C. Caramanis and S. Sanghavi, Greedy subspace clustering, Advances in Neural Information Processing Systems, 2014.
[26] X. Peng, Z. Yu, Z. Yi and H. Tang, Constructing the L2-graph for robust subspace learning and subspace clustering, IEEE Transactions on Cybernetics, 2017.
[27] P. Ji, T. Zhang, H. Li, M. Salzmann and I. Reid, Deep subspace clustering networks, Advances in Neural Information Processing Systems, 2017.
[28] M. Rahmani and G. Atia, Innovation pursuit: a new approach to subspace clustering, IEEE Transactions on Signal Processing, 2017.
[29] C. Li, C. You and R. Vidal, Structured sparse subspace clustering: a joint affinity learning and subspace clustering framework, IEEE Transactions on Image Processing, 2017.
[30] C. Lu, J. Feng, Z. Lin, T. Mei and S. Yan, Subspace clustering by block diagonal representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[31] M. Yin, S. Xie, Z. Wu, Y. Zhang and J. Gao, Subspace clustering via learning an adaptive low-rank graph, IEEE Transactions on Image Processing, 2018.
[32] J. Mairal, F. Bach, J. Ponce and G. Sapiro, Online dictionary learning for sparse coding, International Conference on Machine Learning, 2009.
[33] R. Vidal, R. Tron and R. Hartley, Multiframe motion segmentation with missing data using Power Factorization and GPCA, International Journal of Computer Vision, 2008.
[34] D. Park, J. Neeman, J. Zhang, S. Sanghavi and I. Dhillon, Preference completion: Large-scale collaborative ranking from pairwise comparisons, International Conference on Machine Learning, 2015.
[35] C. Yang, D. Robinson and R. Vidal, Sparse subspace clustering with missing entries, International Conference on Machine Learning, 2015.
[36] E. Candès and B. Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, 2009.
[37] M. Tsakiris and R. Vidal, Theoretical analysis of sparse subspace clustering with missing entries, International Conference on Machine Learning, 2018.
[38] E. Elhamifar, High-rank matrix completion and clustering under self-expressive models, Neural Information Processing Systems, 2016.
[39] B. Eriksson, L. Balzano and R. Nowak, High-rank matrix completion and subspace clustering with missing data, Artificial Intelligence and Statistics, 2012.
[40] L. Balzano, R. Nowak, A. Szlam and B. Recht, k-Subspaces with missing data, IEEE Statistical Signal Processing, 2012.
[41] D. Pimentel-Alarcón, L. Balzano and R. Nowak, On the sample complexity of subspace clustering with missing data, IEEE Statistical Signal Processing, 2014.
[42] D. Pimentel-Alarcón, L. Balzano, R. Marcia, R. Nowak and R. Willett, Group-sparse subspace clustering with missing data, IEEE Statistical Signal Processing, 2016.
[43] G. Ongie, R. Willett, R. Nowak and L. Balzano, Algebraic variety models for high-rank matrix completion, International Conference on Machine Learning, 2017.
[44] D. Pimentel-Alarcón, G. Ongie, L. Balzano, R. Willett and R. Nowak, Low algebraic dimension matrix completion, Allerton Conference on Communication, Control, and Computing, 2017.
[45] Y. Song, M. Li, Z. Zhu, G. Yang and X. Luo, Non-negative latent factor analysis-incorporated and feature-weighted fuzzy double c-means clustering for incomplete data, IEEE Transactions on Fuzzy Systems.
[46] D. Li, H. Zhang, T. Li, A. Bouras, X. Yu and T. Wang, Hybrid missing value imputation algorithms using fuzzy c-means and vaguely quantified rough set, IEEE Transactions on Fuzzy Systems.
[47] L. Beretta and A. Santaniello, Nearest neighbor imputation algorithms: a critical evaluation, BMC Medical Informatics and Decision Making, 2016.
[48] P. Lodder, To impute or not impute, that's the question, in G.J. Mellenbergh and H.J. Adèr (eds.), Advising on Research Methods: Selected Topics 2013, Johannes van Kessel Publishing, 2014.
[49] A. Chaudhry et al., A method for improving imputation and prediction accuracy of highly seasonal univariate data with large periods of missingness, Wireless Communications and Mobile Computing, 2019.
[50] C. Lane, R. Boger, C. You, M. Tsakiris, B. Haeffele and R. Vidal, Classifying and comparing approaches to subspace clustering with missing data, IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019.
[51] E. Candès, Y. Eldar, D. Needell and P. Randall, Compressed sensing with coherent and redundant dictionaries, Applied and Computational Harmonic Analysis, 2011.
[52] K. Ye and L. Lim, Schubert varieties and distances between subspaces of different dimensions, SIAM Journal on Matrix Analysis and Applications, 2016.
[53] A. Antoniadis and J. Fan, Regularization of wavelet approximations (with discussion), Journal of the American Statistical Association, 2001.
[54] M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society, 2006.
[55] L. Meier, S. Van de Geer and P. Bühlmann, The group lasso for logistic regression, Journal of the Royal Statistical Society, 2008.
[56] J. Friedman, T. Hastie and R. Tibshirani, A note on the group lasso and a sparse group lasso, arXiv preprint, 2010.
[57] M. Kshirsagar, E. Yang and A. Lozano, Learning task structure via sparsity grouped multitask learning, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2017.
[58] L. Balzano, B. Recht and R. Nowak, High-dimensional matched subspace detection when data are missing, IEEE International Symposium on Information Theory, 2010.
[59] S. Land and J. Friedman, Variable fusion: a new method of adaptive signal regression, Technical Report, Department of Statistics, Stanford University, 1996.
[60] R. Tibshirani, S. Rosset, J. Zhu and K. Knight, Sparsity and smoothness via the fused lasso, Journal of the Royal Statistical Society, 2005.
[61] X. Shen and H. Huang, Grouping pursuit through a regularization solution surface, Journal of the American Statistical Association, 2010.
[62] T. Hocking, A. Joulin and F. Bach, Clusterpath: An algorithm for clustering using convex fusion penalties, International Conference on Machine Learning, 2011.
[63] F. Lindsten, H. Ohlsson and L. Ljung, Clustering using sum-of-norms regularization: with application to particle filter output computation, IEEE Statistical Signal Processing, 2011.
[64] S. Poddar and M. Jacob, Clustering of data with missing entries using non-convex fusion penalties, arXiv preprint, 2017.
[65] D. Pimentel-Alarcón and R. Nowak, The information-theoretic requirements of subspace clustering with missing data, International Conference on Machine Learning, 2016.
[66] M. Powell, On search directions for minimization algorithms, Mathematical Programming, 1973.
[67] J. Nocedal and S. Wright, Numerical Optimization, Springer Science and Business Media, 2006.
[68] A. Beck, On the convergence of alternating minimization for convex programming with applications to iteratively reweighted least squares and decomposition schemes, SIAM Journal on Optimization, 2015.
[69] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[70] C. Ding, D. Zhou, X. He and H. Zha, R1-PCA: Rotational invariant L1-norm principal component analysis for robust subspace factorization, International Conference on Machine Learning, 2006.
[71] A. Ng, M. Jordan and Y. Weiss, On spectral clustering: analysis and an algorithm, Advances in Neural Information Processing Systems, 2002.
[72] H. Akaike, Information theory and an extension of the maximum likelihood principle, IEEE International Symposium on Information Theory, 1973.
[73] E. Chi and K. Lange, Splitting methods for convex clustering, Journal of Computational and Graphical Statistics, 2015.
[74] K. Tan and D. Witten, Statistical properties of convex clustering, Electronic Journal of Statistics, 2015.
[75] A. Georghiades, P. Belhumeur and D. Kriegman, From few to many: Illumination cone models for face recognition under variable lighting and pose, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001.
[76] E. Candès, X. Li, Y. Ma and J. Wright, Robust principal component analysis?, Journal of the ACM, 2011.
[77] R. Tron and R. Vidal, A benchmark for the comparison of 3-D motion segmentation algorithms, IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[78] L. Balzano, R. Nowak and B. Recht, Online identification and tracking of subspaces from highly incomplete information, Allerton Conference on Communication, Control and Computing, 2010.
[79] T. Hastie and P. Simard, Metrics and models for handwritten character recognition, Statistical Science, 1998.
[80] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 1998.