Multi-View Clustering: A Survey
Abstract: In the big data era, the data are generated from different sources or observed from different views. These
data are referred to as multi-view data. Unleashing the power of knowledge in multi-view data is very important
in big data mining and analysis. This calls for advanced techniques that consider the diversity of different views,
while fusing these data. Multi-view Clustering (MvC) has attracted increasing attention in recent years by aiming to
exploit complementary and consensus information across multiple views. This paper summarizes a large number
of multi-view clustering algorithms, provides a taxonomy according to the mechanisms and principles involved, and
classifies these algorithms into five categories, namely, co-training style algorithms, multi-kernel learning, multi-
view graph clustering, multi-view subspace clustering, and multi-task multi-view clustering. Therein, multi-view
graph clustering is further categorized as graph-based, network-based, and spectral-based methods. Multi-view
subspace clustering is further divided into subspace learning-based, and non-negative matrix factorization-based
methods. This paper does not only introduce the mechanisms for each category of methods, but also gives a few
examples for how these techniques are used. In addition, it lists some publically available multi-view datasets.
Overall, this paper serves as an introductory text and survey for multi-view clustering.
Key words: multi-view clustering; co-training; multi-kernel learning; graph clustering; subspace clustering; subspace
learning; non-negative matrix factorization; multi-task learning
multi-view learning have been conducted in Refs. [1–3]. Moreover, Zheng[4] provided an overview on the methodologies for multi-view (cross-domain) data fusion, in which some specific applications were discussed. Existing multi-view learning technologies are roughly divided into supervised learning and unsupervised learning. This paper focuses on one of the unsupervised learning techniques, namely, clustering.

Clustering has emerged as a powerful alternative learning tool for exploring the underlying structure of data[5, 6], especially in the era of big data[7]. The basic idea of clustering algorithms is to partition a set of data objects according to some criteria, such that similar objects are grouped into the same cluster, while dissimilar objects are divided into different clusters.

Many advanced clustering algorithms have been investigated in the last few decades. Although these clustering algorithms have been very successful to some extent, most of them are only suitable for single-view data. Even concatenating all views into a single view and then adopting a state-of-the-art clustering algorithm on this single view may not improve the clustering performance, because such concatenation is not physically meaningful: each view has its own specific statistical property. In comparison, Multi-view Clustering (MvC) performs effectively on multi-view data by considering the diversity and complementarity of different views. Early studies on MvC, such as reinforcement clustering for multi-type interrelated data[8], a multi-view version of DBSCAN[9], and two-view versions of EM-based and agglomerative algorithms[10], began approximately in 2003. As an advanced clustering paradigm, MvC has received increasing attention in recent years. Thus far, four workshops[11–14] and a mini-symposium[15] have been held in conjunction with related international conferences.

In the context of MvC, an inherent problem (and also the goal) that all algorithms have to deal with elaborately is to find a way to maximize the clustering quality within each view, while taking the clustering consistency across different views into consideration. Moreover, incomplete multi-view data, where some data objects could be missing their observation on one view (i.e., missing objects) or could be available only with partial features on that view (i.e., missing features), also pose challenges to MvC.

In this paper, we review a number of representative MvC methods. According to the mechanisms and principles on which these methods are based, we organize and summarize them in five categories:

Co-training style algorithms: This category of methods treats multi-view data by using a co-training strategy. It bootstraps the clustering of the different views by using the prior or learned knowledge from one another. By iteratively carrying out this strategy, the clustering results of all views tend toward each other, which leads to the broadest consensus across all views.

Multi-kernel learning: This category of methods uses predefined kernels corresponding to the different views, and then combines these kernels either linearly or non-linearly in order to improve clustering performance.

Multi-view graph clustering: This category of methods seeks to find a fusion graph (or network) across all views and then uses graph-cut algorithms or other technologies (e.g., spectral clustering) on the fusion graph in order to produce the clustering result.

Multi-view subspace clustering: This category learns a unified feature representation (to be input into a model for clustering) from all the feature subspaces of all views by assuming that all views share this representation. Typical models include subspace learning and Non-negative Matrix Factorization (NMF).

Multi-task multi-view clustering: This category treats each view with one task or multiple related tasks, transfers the inter-task knowledge to one another, and exploits the multi-task and multi-view relationships in order to improve clustering performance.

We will provide a specific introduction and a few examples for each category in the following sections. Moreover, we also list some widely used multi-view datasets in order to help researchers in this field. To do that, the rest of this paper is organized as follows. In Section 2, we illustrate two related principles that ensure the success of MvC. In Section 3, we provide an overview of earlier and more recent MvC methods within the five categories, and enumerate a few examples for each category. Some publicly available datasets are covered in Section 4. Finally, in Section 5, we conclude this paper and discuss challenges and future trends for MvC.

Notations and Definitions: We begin with a description of the notations used in this paper. Matrices and vectors throughout this paper are
written in uppercase and lowercase letters, respectively. The common notations and corresponding definitions are summarized in Table 1.

2 Principles of MvC

This section analyzes two significant principles of MvC, namely, the complementary and consensus principles. These two principles partially answer why MvC is effective, what the underlying assumptions are, and, above all, how MvC should be modeled and performed.

By referring to Ref. [16], we give an illustration of these two principles. Given a data object with two views, this data object is mapped into a latent data space as shown in Fig. 1. From Fig. 1, we can observe that: (1) some ingredients (part A and part C) exist in an individual view only, such as part A in view 1 and part C in view 2, i.e., the complementarity of the two views, and (2) some ingredients (part B) of the object are shared by both views, i.e., the consensus between the two views. Next, we analyze these two principles as follows:

Complementary principle: This principle states that multiple views should be employed in order to describe data objects more comprehensively and accurately. In the context of multi-view data, each single view is sufficient for a particular knowledge discovery task. However, different views often contain information complementary to each other. For instance, in the field of image processing, each image is described by different types of features, such as LBP, SIFT, and HOG, where LBP is a powerful texture feature, SIFT is robust to image illumination, noise, and rotation, while HOG is sensitive to marginal information. Therefore, it is necessary to exploit the mutually complementary information underlying multiple views in order to describe these data objects, and to provide deeper insights into the internal clustering structure.

Consensus principle: This principle aims to maximize the consistency across multiple distinct views. Based on probably approximately correct analysis, Dasgupta et al.[17] proposed a generalization error analysis for the consensus principle. Given a multi-view dataset $X$ with two views $X^1$ and $X^2$, under some mild assumptions, Dasgupta et al.[17] demonstrated the following connection between the consensus of two hypotheses trained on the two views, respectively:

$$P(f^1 \neq f^2) \geqslant \max\{P_{\mathrm{err}}(f^1),\ P_{\mathrm{err}}(f^2)\} \tag{1}$$
Sun et al.[36] presented a proximal alternating linearized minimization algorithm. This algorithm can simultaneously decompose multiple data matrices into sparse row and column vectors, and link different views of the data with a binary vector, where the binary vector enforces consistency for the row clusters from all views. Simultaneously building similarity matrices, rather than a set of clusters, between the rows and columns of a data matrix, an architecture to learn co-similarities from multi-view datasets was designed in Ref. [37], and was subsequently parallelized in Ref. [38]. Assuming that transferring similarity values (generated from individual data) from one view to others would result in better data clustering, Hussain and Bashir[39] extended the co-similarity based architecture in order to handle multiple datasets with two kinds of integration schemes (i.e., intermediate integration and late integration). In addition, several collaborative MvC approaches have been investigated in Refs. [40, 41]. These approaches consist of two phases: the local phase and the collaboration phase, where the local phase applies a clustering algorithm to each view, and the collaboration phase collaborates each view with the clustering results associated with the other views produced in the local phase.
Example 1: As illustrated in Fig. 3 (using two views for brevity), Kumar and Daumé III[28] first applied the co-training strategy to the problem of multi-view spectral clustering. Unlike semi-supervised learning, there are no labeled data in unsupervised learning settings; therefore, the prototypical co-training algorithms are not directly applicable to MvC. However, the motivation of co-training remains the same in unsupervised learning problems. In other words, it limits the search only to hypotheses (clusterings) that agree with those in other views. Assuming that the true underlying clustering would assign a point to the same cluster irrespective of the view, as was done in most of those co-training based MvC approaches, Kumar and Daumé III[28] took the spectral embedding from one view in order to constrain the similarity graph of the other view. By carrying out this process iteratively, the clusterings of the two views tend toward each other. The procedure of Fig. 3 is as follows:

1: Calculate the graph similarity matrices $S^1$ and $S^2$ for both views.
2: Initialize the graph Laplacian matrices $L^1$ and $L^2$, and the discriminative eigenvectors $U^1$ and $U^2$.
3: Perform spectral embedding on $S^1$ with $U^2$ to get the new similarity matrix $S^1$.
4: Perform spectral embedding on $S^2$ with $U^1$ to get the new similarity matrix $S^2$.
5: Compute the new Laplacian matrices $L^1$ and $L^2$, and the new eigenvectors $U^1$ and $U^2$.
6: Go to Step 3 and repeat for a number of iterations.

Fig. 3 Co-training approach for multi-view spectral clustering[28].
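To make the iteration of Fig. 3 concrete, the following is a minimal Python sketch of this co-training style spectral clustering; the RBF similarity, the fixed iteration count, and the final K-means step on one view's embedding are our own illustrative assumptions rather than prescriptions from Ref. [28]:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def top_eigvecs(S, k):
    """Discriminative eigenvectors: k largest of D^{-1/2} S D^{-1/2}."""
    d = np.maximum(S.sum(axis=1), 1e-12)
    M = S / np.sqrt(np.outer(d, d))
    _, vecs = eigh(M, subset_by_index=[len(S) - k, len(S) - 1])
    return vecs

def cotrain_spectral(X1, X2, k, n_iter=5, gamma=1.0):
    # Step 1: graph similarity matrices for both views.
    S1, S2 = rbf_kernel(X1, gamma=gamma), rbf_kernel(X2, gamma=gamma)
    # Step 2: initial spectral embeddings of each view.
    U1, U2 = top_eigvecs(S1, k), top_eigvecs(S2, k)
    for _ in range(n_iter):
        # Steps 3 and 4: constrain each similarity graph with the other
        # view's embedding, then re-symmetrize.
        S1, S2 = U2 @ U2.T @ S1, U1 @ U1.T @ S2
        S1, S2 = (S1 + S1.T) / 2, (S2 + S2.T) / 2
        # Step 5: recompute the embeddings.
        U1, U2 = top_eigvecs(S1, k), top_eigvecs(S2, k)
    # Step 6 done; cluster the row-normalized embedding of one view.
    V = U1 / np.maximum(np.linalg.norm(U1, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=k, n_init=10).fit_predict(V)
```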
3.2 Multi-kernel learning

Multi-kernel learning was originally developed in order to boost the capacity of the search space of possible kernel functions (e.g., linear, polynomial, and Gaussian kernels) in order to achieve good generalization. As the kernels in multi-kernel learning naturally correspond to different views, multi-kernel learning has been widely applied to deal with multi-view data. The general procedure of multi-kernel learning approaches is shown in Fig. 4, where different predefined kernels are used to deal with the different views. These kernels are then combined either linearly or non-linearly in order to arrive at a unified kernel. In an MvC setting, multi-kernel learning based MvC intends to optimally combine a group of predefined kernels in order to improve clustering performance. In such methods, an essential problem is how to choose suitable kernel functions and combine these kernels optimally.

Fig. 4 General procedure of multi-kernel learning.
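As a concrete illustration of the linear combination step in Fig. 4, the following sketch builds one predefined kernel per view and mixes them with fixed weights; the kernel choices, weights, and three-view toy data are illustrative assumptions (auto-weighted methods such as Ref. [55] learn the weights instead):

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

# Hypothetical three-view data: 100 samples with a different feature set per view.
rng = np.random.default_rng(0)
views = [rng.random((100, d)) for d in (8, 16, 32)]

# One predefined kernel per view, as in Fig. 4.
kernels = [linear_kernel(views[0]),
           polynomial_kernel(views[1], degree=3),
           rbf_kernel(views[2], gamma=0.5)]

# Linear combination into a unified kernel (fixed illustrative weights).
weights = [0.2, 0.3, 0.5]
K = sum(w * Kv for w, Kv in zip(weights, kernels))

# Use the unified kernel as a precomputed affinity for clustering.
labels = SpectralClustering(n_clusters=3, affinity="precomputed").fit_predict(K)
```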
In a single-view scenario, based on maximum margin clustering[42], Zhao et al.[43] presented a multiple kernel clustering algorithm, which can simultaneously find the maximum margin hyperplane, the best clustering, and the optimal kernels. Du et al.[44] performed a robust K-means (with the $l_{2,1}$-norm) in kernel space, and proposed a multiple kernel K-means algorithm, which is able to simultaneously find the best clustering labels, the cluster membership, and the optimal combination of multiple kernels. It is worth stressing that these above mentioned algorithms are available for dealing with multi-view data under the framework shown in Fig. 4. In a multi-view scenario, De Sa et al.[45] constructed a custom kernel combination method based on the minimizing-disagreement algorithm[46, 47]. Specifically, they generated a multi-partite graph in order to induce a kernel, which was then used for spectral clustering. In fact, this method can be regarded as a variant of kernel canonical correlation analysis, and a generalization of co-clustering and spectral clustering. Moreover, Yu et al.[48] extended the classical K-means clustering into Hilbert space, where multi-view data matrices were represented as kernel matrices and were then combined automatically for data fusion. Similar work has also been done in Ref. [49]. The difference with Ref. [48] is that the kernels were combined in a localized way in order to better capture the sample characteristics of the data. Rather than extending the existing clustering algorithms with a multi-kernel learning setting, Lu et al.[50] studied multiple kernel clustering based on centered kernel alignment (an effective kernel evaluation measure), which was employed in order to unify the two tasks of clustering and multi-kernel learning into a single optimization framework.

Methods with a weighted combination of kernels have also been studied by considering the difference of views (or kernels). For instance, kernel based weighted MvC was investigated in Ref. [51], where the weights of the kernels were assigned according to the information quality of the corresponding views. A systemic MvC approach was proposed in order to automatically assign weights for deriving the kernel matrix of each view through an optimization process in Ref. [52], where the kernel matrix learning was based on kernel alignment in order to measure the similarity between two kernel matrices. In addition, Liu et al.[53] presented a weighted multiple kernel K-means clustering method with matrix-induced regularization, which could reduce the redundancy and enhance the diversity of the predefined kernels. Zhao et al.[54] provided a weighted MvC method with matrix-induced and low-rank regularization. Zhang et al.[55] also presented a weighted MvC algorithm based on improved Gaussian kernels with variable weights.

However, in many applications, it is common that data on some views are not available, or are only partially available, which leads to incomplete multi-view data, as mentioned in the introduction. To address this issue, Trivedi et al.[56] presented a general approach that allows MvC methods designed for complete-view settings to be applicable in the scenario where only one view is complete and the auxiliary views are incomplete. They took kernel CCA based MvC as an example in order to illustrate their idea. De Sa et al.[45] (mentioned above) also stated that their proposed algorithm could calculate sample affinities with missing views. In the setting where no view is complete, Shao et al.[57] proposed a collective kernel learning algorithm in order to infer the hidden sample similarity. The idea behind this approach was to collectively complete the kernel matrices of the incomplete views by optimizing the alignment of the shared instances of those views. In addition, unlike some existing methods, where incomplete kernels were first imputed and then an available multi-kernel clustering algorithm was applied to the imputed kernels, Liu et al.[58] integrated the kernel imputation and clustering into a unified learning procedure for incomplete MvC.

Example 2: One challenge of multi-kernel learning consists of choosing appropriate kernel functions (e.g., linear, polynomial, and Gaussian kernels), which map the original low-dimensional space to a high-dimensional space. The general method for multi-view data is to use a linear combination of several kernel functions, where the weights of the different kernels should be taken into consideration. Moreover, the weights of the different views are also an important factor for MvC. To these ends, Zhang et al.[55] developed an auto-weighted multi-kernel MvC algorithm that weights the views and kernels simultaneously. Figure 5 gives an illustration of the proposed algorithm. First, it employs Kernel Principal Component Analysis (KPCA) on each view in order to reduce the dimension of the original data, which results in low-dimensional multi-view data. Then, it applies the designed weighted Gaussian kernel to the low-dimensional multi-view data. This step derives the weight of each view and the cluster centers. After a finite number of iterations, it arrives at the final clustering result. It is worth noting that the designed weighted Gaussian kernel integrates the advantages of the Gaussian kernel and the polynomial kernel.

Fig. 5 The flow chart of auto-weighted multi-kernel MvC.
The designed weighted Gaussian kernel[55] is formulated as

$$K(x, y) = \left(\exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right) + R\right)^{p} = \sum_{q=0}^{p} \binom{p}{q} R^{p-q} \exp\left(-\frac{q\|x - y\|^2}{2\sigma^2}\right) = R^{p} + \sum_{q=1}^{p} \binom{p}{q} R^{p-q} \exp\left(-\frac{\|x - y\|^2}{2\sigma^2/q}\right) \tag{2}$$

Given multi-view data with $m$ views, $n$ samples, and $k$ clusters, the objective function[55] based on K-means and the designed kernel is formulated as

$$\min_{\omega} \sum_{v=1}^{m} \sum_{i=1}^{n} \sum_{j=1}^{k} \omega_{i,v}\, \delta_{ij}\, \|\phi^v(x_i^v) - \phi^v(c_j^v)\|^2, \quad \text{s.t. } \omega_{i,v} > 0,\ \prod_{v} \omega_{i,v} = 1 \tag{3}$$

where $c_j^v$ is the cluster center, and $\delta_{ij}$ is the indicator variable with $\delta_{ij} = 1$ if $x_i \in c_j$, and $\delta_{ij} = 0$ otherwise. By plugging the designed Gaussian kernel $K(x, y) = \phi(x) \cdot \phi(y)$ into Formula (3)[55], it is rewritten as

$$\min_{\omega} \sum_{v=1}^{m} \sum_{i=1}^{n} \sum_{j=1}^{k} \omega_{i,v}\, 2\delta_{ij} \left((1 + R)^{p} - K(x_i^v, c_j^v)\right), \quad \text{s.t. } \omega_{i,v} > 0,\ \prod_{v} \omega_{i,v} = 1 \tag{4}$$

Note that Formula (4) inherits the properties of K-means and of the kernel, and the designed kernel integrates the advantages of the Gaussian kernel and the polynomial kernel.
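As a quick numerical sanity check on Formula (2), the closed form of the designed kernel and its binomial expansion can be compared directly; the parameter values below are arbitrary illustrations:

```python
import numpy as np
from math import comb

def designed_kernel(x, y, sigma=1.0, R=1.0, p=3):
    """Left-hand side of Formula (2): (exp(-||x-y||^2 / (2 sigma^2)) + R)^p."""
    g = np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))
    return (g + R) ** p

def designed_kernel_expanded(x, y, sigma=1.0, R=1.0, p=3):
    """Right-hand side of Formula (2): the binomial expansion of the same kernel."""
    sq = np.sum((x - y) ** 2)
    return sum(comb(p, q) * R ** (p - q) * np.exp(-q * sq / (2 * sigma ** 2))
               for q in range(p + 1))

x, y = np.array([1.0, 2.0]), np.array([0.5, 1.0])
assert np.isclose(designed_kernel(x, y), designed_kernel_expanded(x, y))
```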
an ensemble technique in order to aggregate these
3.3 Multi-view graph clustering
matrices, and forms a unified similarity matrix for
Graphs (or networks) are widely used for representing clustering. Moreover, the impact of different similarity
the relationships between objects, where each node measures (e.g., Pearson, and Spearman correlations,
corresponds to a data object and each edge depicts the Euclidean, and Canberra Distances, etc.) on MvC
relationship between a pair of objects. In practice, the has been studied in Ref. [61]. Later on, Xue
relationship is often denoted by the similarity or the et al.[62] proposed a group-aware multi-view fusion
affinity relationship; namely, the input graph matrix is method, which adopts different weights to characterize
generated from a data similarity matrix. In a multi-view the pairwise similarity between different groups.
scenario, data objects are captured by multiple graphs. Furthermore, even though some MvC methods learned
A common assumption is that each individual graph can a weight for each graph, such methods have additional
capture the partial information of the data; while, all parameters. In order to address these challenges, Nie
graphs have the same underlying clustering structure of et al.[63] developed a parameter-free multiple graph
data. Thus, these graphs are able to mutually reinforce framework to learn a set of weights automatically for
each other by consolidating the correlation between the all graphs. In addition, unsupervised feature selection
data objects collectively. In general, the graph-based for multi-view data has also been investigated. There,
fusion procedure for multi-view data is similar to Fig. 6. these selected features were used for a clustering
Multi-view graph clustering aims to find a fusion graph task or other learning tasks. For instance, Wei et
across all views and then uses graph-cut algorithms al.[64] proposed a method, named cross diffused matrix
or other technologies (e.g., spectral clustering) on the alignment based on feature selection, in order to select
features for each view by performing alignment on a cross diffused matrix. Then they applied co-regularized spectral clustering[29] to these selected features in order to produce the final results. Moreover, since traditional approaches such as Ref. [63] evaluate the similarity by a predefined or fixed graph Laplacian in each view separately, and neglect the underlying common structures across different views, Hou et al.[65] presented a multi-view unsupervised feature selection algorithm with adaptive similarities and view weights. In total, this feature selection method employs three types of data information, i.e., the data similarity, the data clustering structure, and the correlation between different views.

On the other hand, methods combined with nearest neighbor techniques have also been investigated. For instance, Hamzaoui et al.[66] designed a multi-source Shared Nearest Neighbors (SNN) scheme for multi-modal image clustering. The central idea was to extend the existing SNN-based similarity measures to the case of multiple sources, and then introduce an original automatic source selection step in order to build candidate clusterings. With consideration of the generative and manifold data structure, Wang et al.[67, 68] developed a generative model with an ensemble manifold regularization for MvC. Specifically, they constructed a nearest neighbor graph for each view in order to encode the corresponding manifold information, and a multiple graph ensemble regularization framework was designed in order to learn the optimal intrinsic manifold. Then, the manifold regularization term was incorporated into a multi-view topic model based on PLSA, which resulted in a unified objective function. Unlike the above two methods, Zhang and Mao[69] focused on the task of efficiently selecting clustering-consistent neighbors for MvC. The proposed approach used jointly sparse weights in order to filter unreliable neighbors in the union of view-specific neighborhoods by representing each object as a weighted sum of its neighbors under each view. The learned sparse weights were employed in order to generate a similarity graph, and this graph was further utilized for MvC. In addition, Nie et al.[70] introduced a novel multi-view learning model with adaptive neighbors. This model performs semi-supervised classification and local manifold learning and clustering simultaneously. It modifies the similarity matrix during each iteration until it arrives at the optimal one. Moreover, it automatically allocates the weight coefficient for each view without penalty parameters.

Example 3: It should be noted that most of the existing methods adopt a globally uniform similarity measure over the entire data space. However, for real-world objects such as images, different images have different visual appearances and the visual distribution is also complex. It is difficult to capture the similarity of different objects accurately only by using a globally uniform measure. To solve this problem, Xue et al.[62] presented a group-aware multi-view fusion approach for image clustering. This approach can partition images into different groups with more compact visual cohesiveness, and assign diverse fusion weights for images between and within groups. In comparison to global fusion methods, this group-aware fusion model provides a more flexible fusion strategy and more effective similarity measures among images. The framework of this method is shown in Fig. 7. Concretely, multiple features such as LBP, GIST, and Centrist were first extracted from the images, constituting three different features (views). Then, a graph was constructed for each view. Next, all images were divided into different groups, and a fused graph was constructed with the proposed fusion strategy.
The intent was to assign different fusion weights for images belonging to different groups, and the same fusion weights for images within the same group. Finally, the fusion weights were learned by solving the formulated objective function, and the clustering findings were obtained by performing spectral clustering on this fused graph.

3.3.2 Network-based MvC

Despite previous successes, most graph-based MvC approaches usually assume that the same set of data objects is available for all views. Thus, the relationship between data objects in different views is a one-to-one relationship. However, in many real-life applications, such as social networks, literature citation networks, and biology interaction networks, data are collected from different domains, and an object in one domain may correspond to multiple objects in another domain, which results in many-to-many mapping relationships. Representing these relationships with networks rather than with graphs may be more appropriate. This is the main reason for distinguishing network-based MvC from graph-based MvC.

Related work on network-based MvC starts from Ref. [71], in which a network based multi-view graph clustering framework is developed, termed co-regularized graph clustering. This framework illustrates several key properties, namely, many-to-many mapping relationships, mappings associated with weights, and partial mappings among different networks. However, different networks may have different data distributions, so that assumptions such as the one in Ref. [71], by which all networks admit a common clustering structure, no longer hold. To relax this assumption, Ni et al.[72] presented a robust and flexible framework, which allows multiple underlying clustering structures across different networks. It treats the domain similarity as a main network, and formulates the clustering problem via NMF in the designed network-of-networks setting. Similar work has also been presented in Ref. [73], where the network grouping and the underlying clustering detection are coupled and mutually enhanced during the learning process. Furthermore, Liu et al.[74] stated that existing network-based methods tend to focus on the network clustering task itself, but ignore any associations that may be exhibited between the clustering findings from different domains. Given this, they offered a robust clustering approach that can detect network clusters in multiple domains, and their cross-domain associations. In addition, Yu and Zhang[75] studied community detection in multiple social networks, and attempted to find the communities for multiple networks involving both anchor and non-anchor users simultaneously. Wang et al.[76] proposed an MvC algorithm, named multi-view affinity propagation, based on max-product belief propagation. The key point was to establish an MvC model consisting of two components that measure the within-view quality and the explicit clustering consistency across different views, respectively.

Example 4: Most multi-view graph clustering approaches usually assume that data objects in different views have a strict one-to-one mapping relationship, while, in many actual applications, data are collected from diverse domains (views), where the cross-domain mapping relationship is many-to-many rather than one-to-one. In other words, an object in one domain may correspond to multiple objects in another domain. Moreover, different domains have their own inherent data distributions. This breaks another assumption, by which all views share a common clustering structure. To address the above mentioned two challenges, Ni et al.[72] developed a robust and flexible multi-network clustering framework that allows many-to-many relationships and multiple underlying clustering structures.

Specifically, Ni et al.[72] modeled each domain similarity as a network, and also modeled the similarity among the different domains as a network that regularizes the clustering structures in the different networks. They defined a global network, named Network of Networks (NoN), as shown in Fig. 8, where the dashed network represents the main network among six domains {A, B, C, D, E, F}, and each node in the main network corresponds to a domain-specific network denoted by the solid lines.

Fig. 8 Network of networks[72].
Correspondingly, the clusterings in the main network and in the domain-specific networks are referred to as main clusterings and domain clusterings, respectively. Given these concepts, they sought to partition the NoN by using a two-phase approach. In this approach, first, the main network is partitioned, and then the learned information from the main network is incorporated in order to cluster the domain-specific networks. The objective function[72] is formulated as follows:

$$O_M = \|G - HH^T\|_F^2 \tag{5}$$

where $\|\cdot\|_F^2$ is the Frobenius norm, $G \in \mathbb{R}^{g \times g}$ ($g$ is the number of nodes in the main network) is the main network, and $H \in \mathbb{R}_{+}^{g \times k}$ is the factor matrix of the main network.

The domain-specific network clustering[72] is formulated as follows:

$$\min_{\forall i,\, U^i \geq 0;\ \forall j,\, V^j \geq 0} O_S = \underbrace{\sum_{i=1}^{g} \|A^i - U^i (U^i)^T\|_F^2}_{\text{Domain-specific network clustering}} + \underbrace{\alpha \sum_{i=1}^{g} \sum_{j=1}^{k} h_{ij}\, \|U^i - V^j\|_F^2}_{\text{Main cluster guided regularization}} \tag{6}$$

where $A^i$ is the domain-specific network corresponding to node $i$ $(i = 1, \ldots, g)$ in the main network $G$, $U^i$ is the factor matrix of $A^i$, $V^j$ is the $j$-th hidden factor matrix, $h_{ij}$ (an element of $H$) indicates to which degree the main node $i$ belongs to the $j$-th main clustering, and $\alpha$ is a regularization parameter.

Furthermore, based on the above proposed model, they designed a general model that allows partially aligned domain-specific networks to have different node sizes and different numbers of clusterings. This model[72] is formulated as the following function:

$$\min_{\forall i,\, U^i \geq 0;\ \forall j,\, V^j \geq 0} O_S = \sum_{i=1}^{g} J_A + \alpha \sum_{i=1}^{g} \sum_{j=1}^{k} J_R \tag{7}$$

where

$$J_A = \|A^i - U^i (U^i)^T\|_F^2, \quad J_R = \|(Q^{ij} U^i)(Q^{ij} U^i)^T - (P^{ij} V^j)(P^{ij} V^j)^T\|_F^2,$$

$P^{ij}$ is the mapping matrix between $U^i$ and $V^j$, and $Q^{ij}$ is denoted by $Q^{ij} = P^{ij} (P^{ij})^T$. In the end, the clustering results are derived from the factor matrices $U^1, \ldots, U^g$.
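To make Formula (5) concrete, here is a minimal sketch of factorizing the main network with symmetric NMF; the damped multiplicative rule below is a standard, illustrative choice on our part (Ref. [72] specifies its own optimization scheme):

```python
import numpy as np

def symnmf(G, k, n_iter=300, beta=0.5, eps=1e-9):
    """Approximate min ||G - H H^T||_F^2 with H >= 0 (Formula (5))
    using a damped multiplicative update (illustrative choice)."""
    rng = np.random.default_rng(0)
    H = rng.random((G.shape[0], k))
    for _ in range(n_iter):
        H *= (1 - beta) + beta * (G @ H) / (H @ H.T @ H + eps)
    return H

# Toy main network among six domains; each node joins its strongest factor.
G = np.full((6, 6), 0.1) + np.eye(6)
main_clusterings = symnmf(G, k=2).argmax(axis=1)
```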
3.3.3 Spectral-based MvC

Spectral clustering is a classic data clustering paradigm. The basic idea is to form a pairwise affinity matrix between all pairs of objects, normalize this affinity matrix, and compute the eigenvectors of the normalized affinity matrix (i.e., of the graph Laplacian). It has been shown that the second eigenvector of the normalized graph Laplacian is a relaxation of a binary vector solution that minimizes the normalized cut on a graph, which is the connection between the spectral solution and the graph.
To address the issue that data may have considerable
where
noise, Xia et al.[81] investigated the Markov chain
JA D kAi U i .U i /T k2F ;
in order to formulate a multi-view spectral clustering
JR D k.Qij U i /.Qij U i /T .P ij V j /.P ij V j /T k2F ; model. This model has the flavor of low-rank and
where P ij is the mapping matrix between U i and V j , sparse decomposition. It first draws a transition
and Qij is denoted by Qij D P ij .P ij /T . In the end, probability matrix from each single view, and then
the clustering results are driven on the factor matrices uses these matrices in order to form a shared low-
U i ; :::; U g .
Finally, this shared matrix is input to a standard Markov chain model for clustering. To handle large-scale problems and improve computational efficiency, Li et al.[82] offered a multi-view spectral clustering algorithm for large-scale multi-view data. This algorithm uses a local manifold fusion in order to fuse heterogeneous features, and bipartite graphs so as to approximate the similarity graphs. Moreover, Chikhi[83] presented a multi-view normalized cuts approach, a parameter-free multi-view spectral clustering algorithm based on spectral partitioning and local refinement. Lu et al.[84] studied convex sparse spectral clustering with sparse regularization for single-view data, and proposed a pairwise sparse spectral clustering for handling multi-view data. However, as the number of views grows, it is scarcely possible to avoid dependencies among views, and these dependencies often delude correct predictions. To address these issues, Son et al.[85] extended traditional spectral clustering in order to deal with the dependencies among views. Especially, they designed a brainstorming process in order to force the information of each view to be shared among the views.

Several MvC methods have also been studied by combining spectral clustering with other technologies. For instance, Huang et al.[86] developed an affinity aggregation spectral clustering algorithm by extending spectral clustering to a setting with multiple available affinities. Shao et al.[87] designed a multi-source MvC framework based on collective spectral clustering with a discrepancy penalty across sources. Note that this method is applicable to incomplete multi-view data. Moreover, Feng et al.[88] introduced multi-view spectral clustering via robust local subspace learning, by considering that all views are noisy and derived from a unified robust subspace. Wang et al.[89] proposed an iterative low-rank based structured optimization method for multi-view spectral clustering, which encodes the local manifold structure of the data from each view-dependent feature space, and arrives at a multi-view agreement through an iterative process. Zhao et al.[90] discussed semi-supervised MvC, and presented a multi-view matrix completion method with a pairwise similarity matrix in order to utilize side information, namely, must-link and cannot-link constraints.

In addition, there are also spectral-based MvC methods for multi-type relational data[91], multi-modal image data[92], and social media data[93], as well as applications of spectral-based MvC, such as multimodal brain network inference[94], social circle detection[95], and human microbiome data analysis[96].

Example 5: In general, spectral clustering methods involve two time-consuming steps. The first step, constructing the similarity (or affinity) graph, takes $O(n^2 d)$ time, while the second step, computing the eigen-decomposition, takes $O(k n^2)$ time. Moreover, another drawback of spectral clustering is that most spectral clustering methods do not provide a natural extension for dealing with the out-of-sample problem. To overcome these two issues, Li et al.[82] proposed a multi-view spectral clustering approach for large-scale multi-view data. In summary, the designed algorithm uses local manifold regularization to fuse heterogeneous features, and approximates the similarity graphs with bipartite graphs in order to improve efficiency. It is also easily extended in order to deal with the out-of-sample problem. First, it generates a few consensus salient points for all views. These salient points are employed in order to capture the manifold of the original views. Then a bipartite graph is constructed between the raw data points and the salient points for each view. The graphs of all views are fused together with a local manifold regularization term. Finally, it applies a spectral clustering algorithm to the resulting fused graph and outputs the clustering indicator of the salient points, in order to deal with the out-of-sample problem efficiently.

Here, the two important questions that need to be stressed are how to reach consensus across all views, and how to express their relationship. With local manifold learning, the two questions mentioned above are formulated with the following function[82]:

$$\min_{F^T F = I,\ \alpha^v} \sum_{i=1}^{m} (\alpha^i)^r\, \mathrm{tr}(F^T L^i F), \quad \text{s.t. } \sum_{i} \alpha^i = 1,\ \alpha^i \geq 0 \tag{8}$$

where $\alpha^i$ is the non-negative normalized weight factor for the $i$-th view, $\mathrm{tr}(\cdot)$ is the trace of a matrix, $r$ is a scalar controlling the distribution of the weights among the different views, $F \in \mathbb{R}^{n \times k}$ is the class indicator matrix, and $L^i$ is the normalized graph Laplacian matrix for the $i$-th view. The normalized graph Laplacian matrix[82] is formulated as

$$L = I - D^{-1/2} W D^{-1/2} \tag{9}$$

where $W \in \mathbb{R}^{n \times n}$ is the adjacency matrix of the graph, and $D \in \mathbb{R}^{n \times n}$ is the degree matrix whose $i$-th diagonal element is $d_{ii} = \sum_{j=1}^{n} w_{ij}$.
Formula (8) aims to provide a consensus result $F$ among all views. This unique consensus eliminates the requirement of computing the local results for each view, and the computational overhead of communicating back and forth between the local results and the global result. To further uncover the inter-view relationships, Formula (8)[82] is rewritten as

$$\min_{F^T F = I,\ \alpha^v} \mathrm{tr}(F^T L F), \quad \text{s.t. } \sum_{i}^{m} \alpha^i = 1,\ \alpha^i \geq 0 \tag{10}$$

where $L = \sum_{i}^{m} (\alpha^i)^r L^i$. Note that $L$ is regarded as the local manifold fusion of all views. Formula (10) is solved by iterative optimization techniques. The total computational complexity is approximately $O(T k n^2 + \sum_{i}^{m} m n (d^i)^2)$, where $T$ is the number of iterations.
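The alternating scheme behind Formulas (8)–(10) can be sketched as follows: fix the weights and take the $k$ smallest eigenvectors of the fused Laplacian, then update the weights in closed form from the per-view costs. The weight update below follows from the Lagrangian of Formula (10) for $r > 1$; the initialization and stopping rule are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def normalized_laplacian(W):
    """Formula (9): L = I - D^{-1/2} W D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    return np.eye(len(W)) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]

def fused_spectral_clustering(adjacencies, k, r=2.0, n_iter=10):
    """Alternate between the indicator F and the view weights of Formula (10)."""
    laplacians = [normalized_laplacian(W) for W in adjacencies]
    alpha = np.full(len(laplacians), 1.0 / len(laplacians))  # uniform start
    for _ in range(n_iter):
        # Fix alpha: F spans the k smallest eigenvectors of the fused Laplacian.
        L = sum((a ** r) * Li for a, Li in zip(alpha, laplacians))
        _, F = eigh(L, subset_by_index=[0, k - 1])
        # Fix F: closed-form weight update from the Lagrangian (for r > 1).
        cost = np.array([max(np.trace(F.T @ Li @ F), 1e-12) for Li in laplacians])
        alpha = cost ** (1.0 / (1.0 - r))
        alpha /= alpha.sum()
    return KMeans(n_clusters=k, n_init=10).fit_predict(F), alpha
```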
3.4 Multi-view subspace clustering

Multi-view subspace clustering, i.e., learning a new and unified representation for all view data from multiple subspaces, or from a latent space that makes it easier to deal with high-dimensional data when building clustering models, has become a hot topic in the field of MvC. The general procedure of multi-view subspace clustering is illustrated in Fig. 9; a unified feature representation is obtained in one of two ways: (1) learn the unified representation from multiple subspaces directly, or (2) first learn a latent space and then arrive at the unified representation. Finally, this unified representation is fed into an off-the-shelf clustering model in order to produce the clustering results. After reviewing the literature on MvC, we divide the multi-view subspace clustering methods into two major types, namely, subspace learning-based and NMF-based (a special case of subspace learning) methods.

Fig. 9 General procedure of multi-view subspace clustering.

3.4.1 Subspace learning-based MvC

Subspace learning-based MvC seeks to find a latent space from multiple low-dimensional subspaces by assuming that the data points are drawn from this latent subspace. Here, we extend this concept in order to make its role more general. In this paper, the technologies involved in subspace learning-based MvC include subspace learning, subspace clustering, subspace projection, low-rank approximation, and tensor decomposition.

Assuming that all views are conditionally independent given the clustering labels, Chaudhuri et al.[97] presented a multi-view subspace learning method based on canonical correlation analysis. This method provides auxiliary results for Gaussian mixtures and log-concave distribution mixtures. Guo[98] proposed a convex subspace representation learning method for MvC. The key idea is to detect a shared subspace representation across multiple views, and then adopt standard clustering algorithms on this shared representation. Zhao et al.[99] developed a co-training framework for multi-view subspace clustering. It combines classical K-means and linear discriminant analysis under a co-training scheme, which utilizes the labels learned automatically in one view in order to generate discriminative subspaces in another. Deng et al.[100] put forward a feature weighting method based on subspace learning, which locally adapts the feature weighting of each group automatically according to the tightness of the views. Moreover, Cao et al.[101] stated that exploiting the specific independently constructed matrices is insufficient for the success of MvC, and that exploring the underlying complementarity is of great importance. To this end, they designed a framework named the diversity-induced multi-view subspace clustering framework. Concretely, they extended the existing single-view subspace clustering to the multi-view domain, and utilized the Hilbert–Schmidt Independence Criterion as a diversity term in order to explore the complementarity of the multi-view representations. However, many studies have usually focused on combining information rather than on improving the feature representation capability of each view. To solve this problem, Wang et al.[102] presented a framework with an extreme learning machine, and implemented three algorithms on this framework. Unlike the methods in Refs. [98, 102] that perform subspace clustering on a common view, Gao et al.[103] performed subspace clustering on every view simultaneously, while guaranteeing the clustering consistence among different views by adopting a common indicator. In addition, Xu et al.[104, 105] proposed an MvC method called discriminatively embedded K-means, which embedded the synchronous learning of multiple
discriminative subspaces into multi-view K-means clustering in order to formulate a unified framework, while adaptively controlling the inter-coordination between these subspaces. To effectively exploit the data correlation consensus among multiple views, subspace clustering with a similarity matrix for multi-view data was studied in Refs. [106, 107], where the authors intended to find a correlation or similarity consensus among all views, inspired by the idea that, for each view, data objects within the same subspace have large similarity, while data objects within distinct subspaces have small similarity. Rather than using a similarity matrix, Fan et al.[108] drew global low-rank constraints and local cross-topology preserving constraints into subspace clustering for the purpose of characterizing data correlations. Several methods have also been investigated by combining with other technologies: sparse subspace clustering[109, 110], low-rank approximation[111], and tensor decomposition[112–115], to name a few.

Unlike the above mentioned methods or frameworks, which output just a single clustering, Cui et al.[116, 117] presented pioneering work on finding alternative and multiple clustering solutions based on subspace learning. They designed an MvC framework in order to find all non-redundant data clustering views, and suggested two methods within this framework, i.e., orthogonal clustering, and clustering in orthogonal subspaces. The difference is that the former seeks orthogonality in the cluster space, while the latter does so in the feature space. Similar work has also been carried out in Ref. [118], where multiple generalizations of the data are provided by using multiple mixture models. Each mixture describes a specific view of the data by using a mixture of Beta distributions in subspace projections. Moreover, Muller et al.[119] presented a short tutorial on this topic.

In a semi-supervised clustering setting, Günnemann et al.[120] developed a new Bayesian framework for semi-supervised MvC based on their previous work[118], which also sought the detection of multiple and alternative clusterings. This new framework treats multi-view data with several multivariate mixture distributions located in subspace projections, and handles prior knowledge in the form of sample-level constraints indicating which objects should or should not be grouped together. Moreover, Yin et al.[121] presented a pairwise sparse subspace representation model for MvC. The designed model harnessed the prior information in order to obtain the view-specific sparse representation, while utilizing the correlation between different views. Moreover, Cao et al.[122] put forward a constrained multi-view video face clustering method, which considers both the video face pairwise constraints and the multi-view consistence simultaneously. Unlike some existing clustering methods that only employ these constraints in the clustering phase, this method strengthens the pairwise constraints throughout the entire framework, namely, in the sparse subspace representation and in the spectral clustering.

Incomplete MvC based on subspace learning has also been investigated. For instance, Yin et al.[123, 124] proposed an incomplete MvC method, which unifies subspace learning, feature selection, and inter-view and intra-view similarity into a single objective function. It learns a latent representation for incomplete multi-view data, where this latent representation serves as an approximation of the normalized indicator matrix. Xu et al.[125] suggested that the key to dealing with the incomplete-view problem is to exploit the connections between different views, which enables incomplete views to be restored with the assistance of the complete views. They investigated the estimation of incomplete views with the help of information from other observed views through this subspace.

Example 6: Multi-view data are often incomplete, namely, data objects may have incomplete feature sets. Based on subspace learning, Yin et al.[123, 124] studied incomplete multi-view learning for incomplete and unlabeled multi-view data. Figure 10 shows the presented subspace learning model, which learns a unified latent representation for incomplete multi-view data. This model directly optimizes the class indicator matrix, which establishes a bridge between the incomplete feature sets. Moreover, feature selection is considered in order to deal with high-dimensional and noisy features, and the structures of the inter-view and intra-view data are preserved in order to enhance the learning performance. To this end, an objective function was developed along with an efficient optimization algorithm.

Let $X^v = [X_c^v, X_o^v] \in \mathbb{R}^{d_v \times (n_c + n_o^v)}$ denote the $v$-th view data matrix, where $X_c^v \in \mathbb{R}^{d_v \times n_c}$ and $X_o^v \in \mathbb{R}^{d_v \times n_o^v}$ represent the data matrices in the $v$-th view for the complete and partial instances, respectively. Similarly, $Y^v = [Y_c^v; Y_o^v] \in \mathbb{R}^{(n_c + n_o^v) \times k}$ represents the class indicator of the $v$-th view.
Fig. 10 The overview of the proposed model with two views, i.e., text and image[124]. For the incomplete multi-view dataset, it uses a projection matrix in order to project the original features (text and image) to a latent space, which explicitly captures the clustering structure. Group sparsity is imposed on the projection matrices for feature selection. Moreover, inter-view and intra-view data similarities are preserved in order to enhance the model. Finally, the features in this latent space are applied to the clustering task.
To learn the class indicator matrix, the model derives a projection matrix $U^v \in \mathbb{R}^{d_v \times k}$ for each view in order to project the original spaces to a unified space. The objective function[124] is formulated as

$$\min_{U, Y} \underbrace{\sum_{v=1}^{m} \|[X_c^v, X_o^v]^T U^v - [Y_c^v; Y_o^v]\|_F^2}_{\text{Feature projection}} + \beta \underbrace{\sum_{v=1}^{m} \|U^v\|_{21}}_{\text{Feature learning}} + \gamma \underbrace{\sum_{p=1}^{m} \sum_{q=1}^{m} \mathrm{tr}\left((U^p)^T X^p L^{pq} (X^q)^T U^q\right)}_{\text{Similarity preserving}}, \quad \text{s.t. } Y \in \{0, 1\}^{n \times k},\ Y \mathbf{1}_k = \mathbf{1}_n \tag{11}$$

where the objective has three terms: the projection matrices project each incomplete view to the latent space defined by $Y$; feature selection for each view is based on $l_{21}$-norm regularization; and the inter-view and intra-view data similarity preserving term is defined by the Laplacian matrix $L^{pq}$. Moreover, the constraints imposed on $Y$ guarantee that each example belongs to one group only.
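As a toy illustration of the first term of Formula (11), the following sketch performs one alternating step: with the projections $U^v$ fixed, each sample's row of $Y$ is set to the one-hot vector minimizing the summed projection residual. The incomplete-view bookkeeping and the two regularizers are omitted; this is our simplification, not the full algorithm of Ref. [124]:

```python
import numpy as np

def update_indicator(views, Us, k):
    """One alternating step for Formula (11): with projections U^v fixed, set each
    row of Y to the one-hot vector minimizing the summed projection residual."""
    n = views[0].shape[1]
    E = np.eye(k)                                    # candidate one-hot rows e_j
    cost = np.zeros((n, k))
    for X, U in zip(views, Us):
        Z = X.T @ U                                  # (X^v)^T U^v, one row per sample
        cost += ((Z[:, None, :] - E[None, :, :]) ** 2).sum(axis=2)
    Y = E[cost.argmin(axis=1)]                       # Y in {0,1}^{n x k}, Y 1_k = 1_n
    return Y
```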
3.4.2 NMF-based MvC

NMF, which was originally investigated as a dimensionality reduction technique[126], has emerged as an effective latent feature learning method. The non-negativity constraint leads to a parts-based representation of the samples, which accords with the cognitive process of the human brain according to psychological and physiological evidence. Given an input non-negative data matrix $X \in \mathbb{R}^{d \times n}$, each column of $X$ is the feature vector of one sample. NMF aims to find two non-negative matrices $W \in \mathbb{R}^{d \times p}$ and $H \in \mathbb{R}^{p \times n}$ whose product adequately approximates the original matrix $X$. Here, the former matrix $W$ is termed the basis matrix (basic space), while the latter matrix $H$ represents the coefficient matrix (representation feature), and $p$ (in general, $p \ll \min\{n, d\}$) denotes the desired reduced dimension. The reconstruction process[126] can be formulated as a Frobenius norm optimization problem, defined as

$$\min_{W, H} \|X - WH\|_F^2, \quad \text{s.t. } W \geq 0,\ H \geq 0 \tag{12}$$

Moreover, many variants of NMF have been put forward, such as G-orthogonal NMF[127], regularized NMF[128, 129], convex and semi-NMF[130], and multi-layer NMF[131]. Li and Ding[132] provided a survey on NMF for clustering, in which more details of NMF can be found.
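Formula (12) is commonly optimized with the classic multiplicative updates of Lee and Seung; a minimal sketch follows (the random toy data and fixed iteration count are illustrative assumptions):

```python
import numpy as np

def nmf(X, p, n_iter=200, eps=1e-9):
    """Multiplicative updates for Formula (12): min ||X - W H||_F^2, W, H >= 0."""
    rng = np.random.default_rng(0)
    W, H = rng.random((X.shape[0], p)), rng.random((p, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)    # update the coefficient matrix
        W *= (X @ H.T) / (W @ H @ H.T + eps)    # update the basis matrix
    return W, H

X = np.abs(np.random.default_rng(1).random((20, 50)))   # toy non-negative data
W, H = nmf(X, p=5)
print(np.linalg.norm(X - W @ H))                         # reconstruction error
```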
In a multi-view scenario, a late integration approach via NMF was studied in Ref. [133]. The proposed approach takes the clustering results generated independently on each available view, constructs an intermediate matrix representation of these clustering results, and performs NMF on this representation in order to reconcile the groups arising from the individual views. Unlike the approach presented in Ref. [133], which plugs the clustering results into NMF, Liu et al.[134] developed a new NMF-based MvC framework, which feeds the data directly into NMF and derives a fused representation. Reference [134] formulates a joint matrix factorization with a normalization strategy
that pushes the representative result of each view toward a common consensus. Moreover, it provides a new insight into applying NMF to MvC.
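The joint factorization idea can be sketched as follows: each view is factorized while its coefficient matrix is pulled toward a consensus representation. This is a simplified reading of the framework of Ref. [134]; the plain penalty weight and the averaging of the consensus are our simplifications of its normalization strategy:

```python
import numpy as np

def multiview_nmf(views, p, lam=0.1, n_iter=200, eps=1e-9):
    """Each view X^v ~ W^v H^v while every H^v is pulled toward a consensus H*."""
    rng = np.random.default_rng(0)
    n = views[0].shape[1]
    Ws = [rng.random((X.shape[0], p)) for X in views]
    Hs = [rng.random((p, n)) for _ in views]
    Hc = np.mean(Hs, axis=0)                          # consensus representation
    for _ in range(n_iter):
        for X, W, H in zip(views, Ws, Hs):
            W *= (X @ H.T) / (W @ H @ H.T + eps)
            # the consensus penalty lam*||H - Hc||^2 adds lam*Hc to the
            # numerator and lam*H to the denominator of the update
            H *= (W.T @ X + lam * Hc) / (W.T @ W @ H + lam * H + eps)
        Hc = np.mean(Hs, axis=0)                      # re-estimate the consensus
    return Ws, Hs, Hc
```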
Inspired by this framework, some improved work has also been investigated. For instance, a semi-supervised MvC algorithm based on NMF with a weight for each view was studied in Refs. [16, 135], where a partially shared latent representation is discovered. With this learned representation of the multi-view data, a robust sparse regression model was introduced in order to predict the clustering results. Embedding the similarity matrices of the data points into NMF, He et al.[136] also focused on learning a shared latent representation for MvC. Chang et al.[137] developed a multi-view NMF algorithm and applied it to clothing image clustering, introducing a new regularization term in order to advocate the structural incoherence between the representing results of the views. Considering the local geometric structure of each view, while penalizing the disagreement of different views at the same time, Ou et al.[138] proposed another type of multi-view NMF with a patch alignment strategy. Instead of treating the objects or views as distinct positions, Xu et al.[139] introduced a new self-paced learning algorithm with a smoothed weighting scheme, which inherits the merits of the logistic function and provides probabilistic weights.

Many other methods based on NMF variants have also been investigated. Based on G-orthogonal NMF[127], Cai et al.[140] presented a robust multi-view K-means clustering algorithm for large-scale multi-view data. Qian and Zhai[141] proposed a multi-view unsupervised method for text-image web news data, where image local learning regularized orthogonal NMF was adopted in order to learn pseudo labels, and a robust joint $l_{2,1}$-norm was employed in order to select discriminative features. Zhao et al.[142] presented a deep matrix factorization framework via multi-layer NMF[131] for MvC, where semi-NMF[130] was used to learn a hierarchical representation of the multi-view data in a layer-wise manner. Multiple sparse-view clustering approaches with the $l_{2,1}$-norm and the group $l_1$-norm have also been investigated in Refs. [143–145].

Moreover, graph (or manifold) regularized NMF for MvC has also attracted attention. Graph regularized NMF[129] is an extension of NMF, which has been shown to improve the quality of the factorization of $X$ based on a manifold assumption, i.e., if two data points are close in the intrinsic geometry, then the representations of these two data points in the new basis space are also close to each other. Motivated by Refs. [129, 134], Hidru and Goldenberg[146] and Wang et al.[147] respectively investigated graph-regularized multi-view NMF-based clustering, with little difference between the two. Rather than taking the graph regularization from unlabeled data, Guan et al.[148] constructed a graph embedding framework through partial label information and considered sparseness constraints at the same time. Multi-manifold regularized NMF appears in Refs. [149, 150], in which multiple manifolds were combined linearly and two kinds of MvC methods were derived by different strategies. Based on SymNMF[151], Zhang et al.[152] introduced a graph regularized symmetric NMF framework for MvC. Furthermore, graph regularized MvC approaches via concept factorization[153] were discussed in Refs. [154, 155].

In the research field of incomplete MvC, Shao et al.[156, 157] made some attempts via NMF. The main idea was to incorporate weighted NMF in order to handle the missing objects in each incomplete view, while pushing the learned latent representation toward a consensus. Shao et al.[158] also proposed a general framework for incomplete data via tensor modeling and factorization. This framework first uses the kernel matrices in order to generate an initial tensor across all views, and then formulates a joint tensor factorization process with a sparsity constraint. This process is used to iteratively push the initial tensor toward an exploration of the latent factors. Moreover, the late fusion method based on NMF[133] can also handle incomplete views, as its authors note. In addition, Li et al.[159] presented a partial multi-view clustering algorithm (named PVC), which is specifically designed for two-view datasets. It employs NMF in order to learn a latent subspace, in which the samples belonging to the same group are close to each other and similar samples from the same view are grouped well. Later on, Qian et al.[160] improved PVC by considering cluster similarity and manifold preserving constraints. Furthermore, Rai et al.[161] extended PVC to support multiple views and view-specific graph Laplacian regularization. With the help of inter-view constraints (i.e., must-link and cannot-link constraints), Zhang et al.[162] defined a disagreement between each pair of views in order to guide the factorization process.

Example 7: Wang et al.[154] proposed an auto-weighted Multi-view Concept Clustering (MvCC)
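To make the weighting idea concrete, the sketch below (our illustration, not the released code of Refs. [156, 157]) runs weighted NMF on a single incomplete view: observed objects receive full weight, and missing objects (pre-filled, e.g., with the view's mean column) receive a small weight, so that the observed objects dominate the learned factors. The function name, weight values, and iteration count are assumptions.

import numpy as np

def weighted_nmf_view(X, observed, k, n_iter=200, eps=1e-9):
    """Weighted NMF for one incomplete view: X is d x n (missing
    columns pre-filled), observed is a length-n boolean mask, and
    X is approximated by U V^T with U (d x k), V (n x k)."""
    d, n = X.shape
    rng = np.random.default_rng(0)
    U, V = rng.random((d, k)), rng.random((n, k))
    W = np.diag(np.where(observed, 1.0, 0.01))  # small weight for missing objects
    for _ in range(n_iter):
        # Multiplicative updates for min sum_j w_j * ||x_j - U v_j||^2;
        # the updates keep U and V non-negative automatically.
        U *= (X @ W @ V) / (U @ (V.T @ W @ V) + eps)
        V *= (W @ X.T @ U) / (W @ V @ (U.T @ U) + eps)
    return U, V

A full incomplete-MvC method along the lines of Refs. [156, 157] would run such updates per view while adding a term that pulls every view's representation V towards a common consensus; the sketch omits that coupling for brevity.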
Example 7: Wang et al.[154] proposed an auto-weighted Multi-view Concept Clustering (MvCC) based on concept factorization with local manifold regularization. The MvCC framework is shown in Fig. 11. In brief, concept factorization[153] is a variant of NMF that can handle data containing negative values, and it is also easily performed in the kernel space. Furthermore, the local manifold regularization is incorporated into the concept factorization process in order to preserve the locally geometrical structure of the original data space. The weight of each view is determined automatically, and the given co-normalized scheme makes the fusion meaningful in terms of driving the common consensus representation. In addition, the clustering results are derived directly from the common consensus representation, without requiring additional clustering steps, because the consensus matrix is sparse.

Given multi-view data $X = \{X^1, \ldots, X^m\}$, the objective function of MvCC[154] is formulated as
\[
\min_{\{W^v, H^v, \omega_v\}, H^*} \sum_{v=1}^{m} \bigg\{ \underbrace{\big\|X^v - X^v W^v (H^v)^{\mathrm{T}}\big\|_F^2}_{\text{concept factorization}} + \underbrace{\alpha\, \mathrm{tr}\big((H^v)^{\mathrm{T}} L^v H^v\big)}_{\text{local manifold regularization}} + \underbrace{\beta\, \omega_v \big\|H^v - H^*\big\|_F^2}_{\text{consensus representation}} \bigg\}
\]
\[
\text{s.t. } W^v \geq 0,\quad H^v \geq 0,\quad \omega_v \geq 0,\quad \sum_{v} \omega_v = 1 \tag{13}
\]
where $W^v$ is the association matrix, $H^v$ is the representation matrix, $H^*$ is the consensus representation matrix, $L^v$ is the Laplacian matrix, $\omega_v$ is the weight of the $v$-th view, and $\alpha$ and $\beta$ are trade-off parameters. It is worth stressing that the weights $\omega_v$ are determined automatically, and empirical values are suggested for $\alpha$ and $\beta$. Since the objective function in Formula (13) is not convex over all the variables simultaneously, an alternating iterative algorithm based on multiplicative update rules was developed to optimize it. More details can be found in Ref. [154].
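For concreteness, here is a minimal sketch (ours, not the implementation of Ref. [154]) that evaluates the objective of Formula (13) for given factor matrices; the function and variable names simply mirror the notation above.

import numpy as np

def mvcc_objective(Xs, Ws, Hs, Hstar, Ls, omegas, alpha, beta):
    """Evaluate Formula (13): Xs, Ws, Hs, Ls are per-view lists of
    X^v, W^v, H^v, and Laplacians L^v; Hstar is the consensus H*;
    omegas are the view weights (non-negative, summing to 1)."""
    total = 0.0
    for Xv, Wv, Hv, Lv, wv in zip(Xs, Ws, Hs, Ls, omegas):
        cf = np.linalg.norm(Xv - Xv @ Wv @ Hv.T, "fro") ** 2     # concept factorization
        mr = alpha * np.trace(Hv.T @ Lv @ Hv)                    # local manifold regularization
        cr = beta * wv * np.linalg.norm(Hv - Hstar, "fro") ** 2  # consensus representation
        total += cf + mr + cr
    return total

An alternating optimizer, as in Ref. [154], decreases this value by updating the $W^v$, $H^v$, $H^*$, and $\omega_v$ blocks in turn with multiplicative rules.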
3.5 Multi-task multi-view clustering

MvC exploits the consistency and complementarity among different views in order to achieve better clustering quality, as mentioned above. Another concept, namely multi-task clustering (belonging to the field of multi-task learning[163]), performs multiple related tasks together and utilizes the relationships between these tasks in order to enhance clustering performance for single-view data. By inheriting the properties of both MvC and multi-task clustering, Multi-task Multi-view Clustering (M2vC) treats each individual view of the data with one task or multiple tasks, as shown in Fig. 12. This has received some attention in recent years. The main challenges of M2vC consist of finding a way to model the intra-task (within-task) clustering on each view, and a way to exploit the multi-task and multi-view relationships while transferring the inter-task (between-task) knowledge to one another. Here, we provide a review on M2vC in order to attract further attention and promote research in this area.

Fig. 12 General procedure of multi-task multi-view clustering.

By assuming that a common underlying subspace is shared by multiple related tasks, Gu and Zhou[164] proposed a cross-domain multi-task clustering method by treating each view as a task. This method aims to learn such a subspace, and through it, the knowledge of one task can be transferred to another. Note that the authors also assumed that the dimensionality of the feature vector for each task is the same, and that the number of clusters in each task is also the same. Later on, Zhang and Zhou[165] relaxed these assumptions, and introduced an improved
cross-domain multi-task clustering, which performs multiple related clustering tasks simultaneously through domain adaptation. Besides, Xie et al.[166] presented a multi-task co-clustering based on 3-factor NMF. The objective function of this method consists of two parts, i.e., task-specific co-clustering and cross-task feature space regularization. The multi-task clustering method via SymNMF[151] for multi-view data has also been studied in Ref. [167], where several tasks were performed simultaneously with a geometric affine transformation in order to control intra-task and inter-task knowledge sharing. In addition, Wang et al.[168] explored multi-view spectral clustering by using a multi-objective formulation (seen as a multi-task problem), which is solved by Pareto optimization. Wahid et al.[169, 170] studied a multi-objective MvC ensemble method based on an evolutionary approach. Zhang et al.[171] developed a multi-task clustering algorithm by transferring the knowledge of instances. The proposed algorithm learns a shared subspace, and constructs a shared nearest neighbor similarity matrix for each individual task. Then, it applies a traditional spectral clustering method to the shared nearest neighbor similarity matrix of each task. Shi et al.[172] incorporated spectral clustering and discriminative analysis into a unified framework by exploiting the correlation information between multiple views, where the spectral clustering aims to discover the cluster structure, and the discriminative analysis aims to preserve this structure. Moreover, Zhang et al.[173, 174] presented an M2vC framework, which integrates within-view-task clustering, multi-view relationship learning, and multi-task relationship learning. Under this framework, they proposed two specific M2vC algorithms, which are detailed in Example 8 below. Several related works also deserve a brief summary, including multi-modal clustering based on ensemble technology[176], bi-level weighted MvC based on K-means[177–180], and multi-view fuzzy clustering[181–184].

Example 8: In multi-task multi-view settings, the tasks are related through common views. The key step of M2vC is to link the features in the common view in order to integrate the related tasks. In the field of M2vC, Zhang et al.[174] developed a typical M2vC framework based on co-clustering. An illustration of this framework is shown in Fig. 13, where the square region represents the set of data samples, and the circular region represents the set of data features under a view in each task, as described in Ref. [174]. Note that the samples of task 1 and task 2 have a common view, which contains task shared features (denoted by the light gray overlapping area) and task specific features (denoted by the light gray non-overlapping area). This framework consists of three components: within-view-task clustering, multi-view relationship learning, and multi-task relationship learning. Under this framework, they proposed two M2vC algorithms. One is the bipartite graph based M2vC algorithm, which only handles data containing non-negative values. The other is the semi-non-negative matrix tri-factorization based M2vC algorithm, which is a general M2vC method, i.e., it can deal with data containing either negative or non-negative values.

Fig. 13 Co-clustering based M2vC framework[174].

Given $T$ clustering tasks, each task is covered with $m_t$ views. $S$ is the index collection of the common views; $T_i$ is the index collection of all tasks under common view $i$. For within-view-task clustering, the framework treats the data objects in each view of each task with co-clustering, which accomplishes the essential part of the whole algorithm and ensures the preservation of the knowledge available locally at each view of each task in order to avoid negative transfer. For multi-view relationship learning, it minimizes the disagreement between the clusters of data under each pair of views
in each task. For multi-task relationship learning, it uses co-clustering in order to derive a shared subspace among the related tasks under each common view, by assuming that related tasks should share some common or relevant features. The M2vC clustering framework[174] is formalized as
\[
\min \sum_{t=1}^{T}\Big(\sum_{i=1}^{m_t} H_1 + \lambda \sum_{i=1}^{m_t}\sum_{j\neq i}^{m_t} H_2\Big) + \gamma \sum_{i\in S}\sum_{t\in T_i} H_3 \tag{14}
\]
where $\sum_{i=1}^{m_t} H_1$ co-clusters the data objects and features of all the views in each task $t$; $\sum_{i=1}^{m_t}\sum_{j\neq i}^{m_t} H_2$ minimizes the disagreement between the clustering assignments of any two different views in each task $t$; $\sum_{i\in S}\sum_{t\in T_i} H_3$ obtains the shared subspace under each common view by the same co-clustering method as the first component; and $\lambda$ and $\gamma$ are trade-off parameters. Under this framework, two specific clustering algorithms were investigated in Ref. [174].
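The within-view-task component is essentially a co-clustering of each view's sample-by-feature matrix in each task. The sketch below illustrates only that component, with off-the-shelf spectral co-clustering standing in for the co-clustering of Ref. [174]; the data shape and cluster number are assumptions.

import numpy as np
from sklearn.cluster import SpectralCoclustering

# One task, one view: a non-negative samples x features matrix
# (e.g., word counts); sizes and n_clusters are illustrative.
rng = np.random.default_rng(0)
X = rng.poisson(lam=1.0, size=(60, 40)).astype(float) + 1e-6

model = SpectralCoclustering(n_clusters=3, random_state=0)
model.fit(X)

# Row labels group the samples; column labels group the features.
# In M2vC, such co-clusters would be computed per view and per task,
# then coupled through the H2 and H3 terms of Formula (14).
print(model.row_labels_[:10], model.column_labels_[:10])

The small constant added to X keeps every row and column strictly positive, which the spectral bi-normalization inside the co-clustering requires.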
4 Publically Available Datasets

To support researchers working in the field of MvC, we summarize some widely used multi-view datasets. For these publically available datasets, we provide their URLs.

3Sources Dataset: A multi-view text corpus, constructed from news articles from three online news services. Its repository (http://mlg.ucd.ie/datasets/bbc.html) also has the Multi-View Twitter Datasets, a collection of Twitter datasets for social network discovery, and the BBC and BBCSport Datasets, two synthetic text datasets originating from BBC News.

WebKb Datasets (http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/): These datasets contain web-page data collected from the computer science departments of four universities, giving four multi-view datasets.

Newsgroup Datasets (http://lig-membres.imag.fr/grimal/data.html): These are subsets of the NG20 dataset with 3 different pre-processings. The description of the subsets, and details on the preprocessing steps, can be found in Ref. [185]. Moreover, this repository also has the Reuters Multilingual Dataset, Cora Dataset, CiteSeer Dataset, Movies617 Dataset, and mini WebKb Datasets.

Wikipedia Article Dataset (http://www.svcl.ucsd.edu/projects/crossmodal/): The collected datasets are selected sections from Wikipedia's featured articles collection. They are available in full or small versions.

Handwritten Digit Dataset (http://archive.ics.uci.edu/ml/datasets/Multiple+Features): It consists of features of handwritten numerals (0–9) from the UCI repository.

100leaves Dataset (https://archive.ics.uci.edu/ml/datasets/One-hundred+plant+species+leaves+data+set): It contains one hundred different kinds of plant leaves, where each kind has sixteen samples. For each sample, the shape descriptor, fine scale margin, and texture histogram are given.

Corel Images Dataset (https://archive.ics.uci.edu/ml/datasets/Corel+Image+Features): This dataset consists of image features extracted from a Corel image collection. It provides four sets of features, namely, color histogram, color histogram layout, color moments, and co-occurrence texture.

NUS-WIDE Dataset (http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm): A web image dataset with six types of low-level features extracted from the images.

YouTube Video Dataset (https://archive.ics.uci.edu/ml/datasets/YouTube+Multiview+Video+Games+Dataset): This dataset contains approximately 1.2×10^5 instances, where each instance is described by 13 types of features, and also has its class information.
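As a convenience, note that the Handwritten Digit (Multiple Features) dataset above ships as six per-view files in the UCI archive (mfeat-fou, mfeat-fac, mfeat-kar, mfeat-pix, mfeat-zer, and mfeat-mor). A small loader sketch, assuming the files have been downloaded into a local directory:

import numpy as np

# Each mfeat-* file holds 2000 whitespace-separated rows, ordered by
# class: rows 0-199 are digit 0, rows 200-399 are digit 1, and so on.
VIEW_FILES = ["mfeat-fou", "mfeat-fac", "mfeat-kar",
              "mfeat-pix", "mfeat-zer", "mfeat-mor"]

def load_mfeat(data_dir="mfeat"):
    """Return one matrix per view plus the shared digit labels."""
    views = [np.loadtxt(f"{data_dir}/{name}") for name in VIEW_FILES]
    labels = np.repeat(np.arange(10), 200)
    return views, labels

views, labels = load_mfeat()
print([v.shape for v in views])  # e.g., (2000, 76), (2000, 216), ...

The other repositories listed here follow their own file layouts; their pages document the per-view formats.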
5 Conclusion and Discussion

The proliferation of multi-view data calls for advanced clustering technologies that can discover knowledge from multi-view datasets. This paper surveyed most of the existing algorithms and technologies of MvC, and classified these MvC algorithms into five categories, i.e., co-training style algorithms, multi-kernel learning, multi-view graph clustering, multi-view subspace clustering, and multi-task multi-view clustering. For each category of MvC, we not only reviewed the existing algorithms, but also introduced the ideas and technologies behind them, while giving specific illustrative examples.

Although MvC was proposed around 2003, as we mentioned in the introduction section, there is no criterion to decide which MvC algorithm is the best, since different methods have their own advantages and
disadvantages. In brief, co-training style algorithms can enhance the clusters of different views interactively by exchanging information; however, they become intractable when the number of views is more than three. Kernel-based MvC inherits the advantages of kernels, while bringing about high computational complexity. Multi-view graph clustering introduces spectral graph theory, while relying on the constructed affinity (or similarity) matrices. Multi-view subspace clustering methods have straightforward interpretability, but also suffer from initialization dependence. Multi-task multi-view clustering inherits the properties of both multi-task clustering and multi-view clustering; however, it is still in its infancy. Fortunately, these technologies are closely related to one another. For example, subspace learning can be performed in the kernel space; therefore, it is valuable to develop a general framework of MvC that inherits the merits of the different categories.

Below, we would like to highlight a number of challenging problems and future directions in order to encourage more research in MvC. Their solutions will have a fundamental impact on MvC, and specifically on multi-view data fusion, machine learning, and artificial intelligence in general.

Correctness of views: Finding a way of knowing whether a view is correct is crucial for MvC. Since MvC exploits all available views in order to improve clustering performance, incorrect views are very harmful. Although some work leverages these views with weights, errors could still be propagated from a misleading view to other views. Thus, this problem must be solved or mitigated to a great extent in order to ensure that MvC is effective.

The opportune moment of fusion: Existing MvC adopts three fusion strategies for multi-view data in the clustering process, namely, fusion in the data, fusion in the projected features, and fusion in the results. Most current research works on MvC focus on the second fusion strategy. However, there is no theoretical foundation to decide which one is the best. Theoretical and methodological research is required in order to uncover their essence.

Incomplete MvC: Although some attempts have been made for incomplete multi-view data, as we mentioned in each category section, incomplete MvC is still a challenging problem. In real life, data loss occurs frequently, while the research on incomplete MvC has not been extensive. Effort is expected to be put into the investigation of incomplete MvC.

Multi-task multi-view clustering: This direction is a new trend in the research of MvC; however, it is accompanied by a few challenges, such as finding a way to explore the relationships of different tasks and different views, and finding a way to transfer knowledge between views.

In addition, several widely used datasets were listed in order to provide convenience for future researchers. In summary, this paper serves as a bridge for readers in order to further promote the research of MvC.

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (No. 61572407).

References

[1] C. Xu, D. C. Tao, and C. Xu, A survey on multi-view learning, arXiv preprint arXiv: 1304.5634, 2013.
[2] S. L. Sun, A survey of multi-view machine learning, Neural Comput. Appl., vol. 23, nos. 7&8, pp. 2031–2038, 2013.
[3] J. Zhao, X. J. Xie, X. Xu, and S. L. Sun, Multi-view learning overview: Recent progress and new challenges, Information Fusion, vol. 38, pp. 43–54, 2017.
[4] Y. Zheng, Methodologies for cross-domain data fusion: An overview, IEEE Trans. Big Data, vol. 1, no. 1, pp. 16–34, 2015.
[5] R. Xu and D. Wunsch, Survey of clustering algorithms, IEEE Trans. Neural Netw., vol. 16, no. 3, pp. 645–678, 2005.
[6] C. C. Aggarwal and C. K. Reddy, Data Clustering: Algorithms and Applications. Boca Raton, FL, USA: Chapman and Hall/CRC, 2013.
[7] A. Fahad, N. Alshatri, Z. Tari, A. Alamri, I. Khalil, A. Y. Zomaya, S. Foufou, and A. Bouras, A survey of clustering algorithms for big data: Taxonomy and empirical analysis, IEEE Trans. Emerg. Top. Comput., vol. 2, no. 3, pp. 267–279, 2014.
[8] J. D. Wang, H. J. Zeng, Z. Chen, H. J. Lu, L. Tao, and W. Y. Ma, ReCoM: Reinforcement clustering of multi-type interrelated data objects, in Proc. 26th Annu. Int. ACM SIGIR Conf. Research and Development in Information Retrieval, 2003, pp. 274–281.
[9] K. Kailing, H. P. Kriegel, A. Pryakhin, and M. Schubert, Clustering multi-represented objects with noise, in Proc. 8th Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining, Sydney, Australia, 2004, pp. 394–403.
[10] S. Bickel and T. Scheffer, Multi-view clustering, in Proc. 4th IEEE Int. Conf. Data Mining, Brighton, UK, 2004, pp. 19–26.
[11] MultiClust: 1st international workshop on discovering, summarizing and using multiple clusterings held in
conjunction with KDD-2010, http://eecs.oregonstate.edu/research/multiclust/, 2010.
[12] 2nd MultiClust Workshop: Discovering, summarizing and using multiple clusterings held in conjunction with ECML/PKDD 2011, http://dme.rwth-aachen.de/en/MultiClust2011, 2011.
[13] 3rd MultiClust Workshop: Discovering, summarizing and using multiple clusterings held in conjunction with ICDM 2012, http://www.dbs.ifi.lmu.de/research/MultiClust2012/, 2012.
[14] MultiClust: 4th international workshop on discovering, summarizing and using multiple clusterings held in conjunction with KDD, http://cs.au.dk/research/data-intensive-systems/projects/multiclust2013/, 2013.
[15] MultiClust 2014: Mini-symposium on multiple clusterings, multi-view data, and multi-source knowledge-driven held in conjunction with SDM, http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=33931, 2014.
[16] J. Liu, Y. Jiang, Z. C. Li, Z. H. Zhou, and H. Q. Lu, Partially shared latent factor learning with multiview data, IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 6, pp. 1233–1246, 2015.
[17] S. Dasgupta, M. L. Littman, and D. A. McAllester, PAC generalization bounds for co-training, in Proc. 14th Int. Conf. Neural Information Processing Systems: Natural and Synthetic, Vancouver, Canada, 2001, pp. 375–382.
[18] A. Blum and T. Mitchell, Combining labeled and unlabeled data with co-training, in Proc. 11th Annu. Conf. Computational Learning Theory, Madison, WI, USA, 1998, pp. 92–100.
[19] V. R. De Sa, Spectral clustering with two views, in Proc. 22nd Workshop on Learning with Multiple Views, Bonn, Germany, 2005, pp. 20–27.
[20] S. Abney, Bootstrapping, in Proc. 40th Annu. Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 2002, pp. 360–367.
[21] M. F. Balcan, A. Blum, and K. Yang, Co-training and expansion: Towards bridging theory and practice, in Proc. 17th Int. Conf. Neural Information Proc. Systems, Vancouver, Canada, 2004, pp. 89–96.
[22] W. Wang and Z. H. Zhou, Analyzing co-training style algorithms, in Proc. 18th European Conf. Machine Learning, Warsaw, Poland, 2007, pp. 454–465.
[23] K. Nigam and R. Ghani, Analyzing the effectiveness and applicability of co-training, in Proc. 9th Int. Conf. Information and Knowledge Management, McLean, VA, USA, 2000, pp. 86–93.
[24] V. Sindhwani, P. Niyogi, and M. Belkin, A co-regularization approach to semi-supervised learning with multiple views, in Proc. 22nd Workshop on Learning with Multiple Views, Bonn, Germany, 2005, pp. 74–79.
[25] Y. Z. Cheng and G. M. Church, Biclustering of expression data, in Proc. 8th Int. Conf. Intelligent Systems for Molecular Biology, Menlo Park, CA, USA, 2000, pp. 93–103.
[26] S. Bickel and T. Scheffer, Estimation of mixture models using Co-EM, in Proc. 16th European Conf. Machine Learning, Porto, Portugal, 2005, pp. 35–46.
[27] G. F. Tzortzis and A. C. Likas, Multiple view clustering using a weighted combination of exemplar-based mixture models, IEEE Trans. Neural Netw., vol. 21, no. 12, pp. 1925–1938, 2010.
[28] A. Kumar and H. Daumé III, A co-training approach for multi-view spectral clustering, in Proc. 28th Int. Conf. Machine Learning, Bellevue, WA, USA, 2011, pp. 393–400.
[29] A. Kumar, P. Rai, and H. Daumé III, Co-regularized multi-view spectral clustering, in Proc. 24th Int. Conf. Neural Information Processing Systems, Granada, Spain, 2011, pp. 1413–1421.
[30] Y. K. Ye, X. W. Liu, J. P. Yin, and E. Zhu, Co-regularized kernel k-means for multi-view clustering, in Proc. 23rd Int. Conf. Pattern Recognition, Cancun, Mexico, 2016, pp. 1583–1588.
[31] A. Appice and D. Malerba, A co-training strategy for multiple view clustering in process mining, IEEE Trans. Serv. Comput., vol. 9, no. 6, pp. 832–845, 2016.
[32] Y. Jiang, J. Liu, Z. C. Li, P. Li, and H. Q. Lu, Co-regularized PLSA for multi-view clustering, in Proc. 11th Asian Conf. Computer Vision-Volume Part II, Daejeon, Korea, 2013, pp. 202–213.
[33] E. Eaton, M. desJardins, and S. Jacob, Multi-view clustering with constraint propagation for learning with an incomplete mapping between views, in Proc. 19th ACM Int. Conf. Information and Knowledge Management, Toronto, Canada, 2010, pp. 389–398.
[34] E. Eaton, M. desJardins, and S. Jacob, Multi-view constrained clustering with an incomplete mapping between views, Knowl. Inf. Syst., vol. 38, no. 1, pp. 231–257, 2014.
[35] L. Meng, A. H. Tan, and D. Xu, Semi-supervised heterogeneous fusion for multimedia data co-clustering, IEEE Trans. Knowl. Data Eng., vol. 26, no. 9, pp. 2293–2306, 2014.
[36] J. W. Sun, J. Lu, T. Y. Xu, and J. B. Bi, Multi-view sparse co-clustering via proximal alternating linearized minimization, in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015, pp. 757–766.
[37] G. Bisson and C. Grimal, An architecture to efficiently learn co-similarities from multi-view datasets, in Proc. 19th Int. Conf. Neural Information Proc., Doha, Qatar, 2012, pp. 184–193.
[38] G. Bisson and C. Grimal, Co-clustering of multi-view datasets: A parallelizable approach, in Proc. IEEE 12th Int. Conf. Data Mining, Brussels, Belgium, 2012, pp. 828–833.
[39] S. F. Hussain and S. Bashir, Co-clustering of multi-view datasets, Knowl. Inf. Syst., vol. 47, no. 3, pp. 545–570, 2016.
[40] Y. Jiang, J. Liu, Z. C. Li, and H. Q. Lu, Collaborative PLSA for multi-view clustering, in Proc. 21st Int. Conf. Pattern Recognition, Tsukuba, Japan, 2012, pp. 2997–3000.
[41] M. Ghassany, N. Grozavu, and Y. Bennani, Collaborative multi-view clustering, in Proc. 2013 Int. Joint Conf. Neural Networks, Dallas, TX, USA, 2013, pp. 1–8.
[42] L. L. Xu, J. Neufeld, B. Larson, and D. Schuurmans, Maximum margin clustering, in Proc. 18th Annu. Conf. Advances in Neural Information Processing Systems 17, Cambridge, MA, USA, 2005, pp. 1537–1544.
[43] B. Zhao, J. T. Kwok, and C. S. Zhang, Multiple kernel clustering, in Proc. 2009 SIAM Int. Conf. Data Mining, Sparks, NV, USA, 2009, pp. 638–649.
[44] L. Du, P. Zhou, L. Shi, H. M. Wang, M. Y. Fan, W. J. Wang, and Y. D. Shen, Robust multiple kernel k-means using l2,1-norm, in Proc. 24th Int. Conf. Artificial Intelligence, Buenos Aires, Argentina, 2015, pp. 3476–3482.
[45] V. R. De Sa, P. W. Gallagher, J. M. Lewis, and V. L. Malave, Multi-view kernel construction, Mach. Learn., vol. 79, nos. 1&2, pp. 47–71, 2010.
[46] V. R. De Sa, Learning classification with unlabeled data, in Advances in Neural Information Processing Systems 6, Cambridge, MA, USA, 1993, pp. 112–119.
[47] V. R. De Sa and D. H. Ballard, Category learning through multimodality sensing, Neural Comput., vol. 10, no. 5, pp. 1097–1117, 1998.
[48] S. Yu, L. C. Tranchevent, X. H. Liu, W. Glanzel, J. A. K. Suykens, B. De Moor, and Y. Moreau, Optimized data fusion for kernel k-means clustering, IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 5, pp. 1031–1039, 2012.
[49] M. Gönen and A. A. Margolin, Localized data fusion for kernel k-means clustering with application to cancer biology, in Proc. 28th Annu. Conf. Neural Information Proc. Systems, Montreal, Canada, 2014, pp. 1305–1313.
[50] Y. T. Lu, L. T. Wang, J. F. Lu, J. Y. Yang, and C. H. Shen, Multiple kernel clustering based on centered kernel alignment, Pattern Recognit., vol. 47, no. 11, pp. 3656–3664, 2014.
[51] G. Tzortzis and A. Likas, Kernel-based weighted multi-view clustering, in Proc. 12th Int. Conf. Data Mining, Brussels, Belgium, 2012, pp. 675–684.
[52] D. Y. Guo, J. Zhang, X. W. Liu, Y. Cui, and C. X. Zhao, Multiple kernel learning based multi-view spectral clustering, in Proc. 22nd Int. Conf. Pattern Recognition, Stockholm, Sweden, 2014, pp. 3774–3779.
[53] X. W. Liu, Y. Dou, J. P. Yin, L. Wang, and E. Zhu, Multiple kernel k-means clustering with matrix-induced regularization, in Proc. 30th AAAI Conf. Artificial Intelligence, Phoenix, AZ, USA, 2016, pp. 1888–1894.
[54] Y. Zhao, Y. Dou, X. W. Liu, and T. Li, A novel multi-view clustering method via low-rank and matrix-induced regularization, Neurocomputing, vol. 216, pp. 342–350, 2016.
[55] P. R. Zhang, Y. Yang, B. Peng, and M. J. He, Multi-view clustering algorithm based on variable weight and MKL, in Proc. Int. Joint Conf. Rough Sets, Olsztyn, Poland, 2017, pp. 599–610.
[56] A. Trivedi, P. Rai, H. Daumé III, and S. L. DuVall, Multiview clustering with incomplete views, in Proc. Workshop on Machine Learning for Social Computing, Whistler, Canada, 2010.
[57] W. X. Shao, X. X. Shi, and P. S. Yu, Clustering on multiple incomplete datasets via collective kernel learning, in Proc. 13th Int. Conf. Data Mining, Dallas, TX, USA, 2013, pp. 1181–1186.
[58] X. W. Liu, M. M. Li, L. Wang, Y. Dou, J. P. Yin, and E. Zhu, Multiple kernel k-means with incomplete kernels, in Proc. 31st AAAI Conf. Artificial Intelligence, San Francisco, CA, USA, 2017, pp. 2259–2265.
[59] W. Tang, Z. D. Lu, and I. S. Dhillon, Clustering with multiple graphs, in Proc. 9th IEEE Int. Conf. Data Mining, Miami, FL, USA, 2009, pp. 1016–1021.
[60] S. F. Hussain, M. Mushtaq, and Z. Halim, Multi-view document clustering via ensemble method, J. Intell. Inf. Syst., vol. 43, no. 1, pp. 81–99, 2014.
[61] A. Serra, D. Greco, and R. Tagliaferri, Impact of different metrics on multi-view clustering, in Proc. 2015 Int. Joint Conf. Neural Networks, Killarney, Ireland, 2015, pp. 1–8.
[62] Z. Xue, G. R. Li, S. H. Wang, C. J. Zhang, W. G. Zhang, and Q. M. Huang, GOMES: A group-aware multi-view fusion approach towards real-world image clustering, in Proc. 2015 IEEE Int. Conf. Multimedia and Expo, Turin, Italy, 2015, pp. 1–6.
[63] F. P. Nie, J. Li, and X. L. Li, Parameter-free auto-weighted multiple graph learning: A framework for multiview clustering and semi-supervised classification, in Proc. 25th Int. Joint Conf. Artificial Intelligence, New York, NY, USA, 2016, pp. 1881–1887.
[64] X. K. Wei, B. K. Cao, and P. S. Yu, Multi-view unsupervised feature selection by cross-diffused matrix alignment, in Proc. 2017 Int. Joint Conf. Neural Networks, Anchorage, AK, USA, 2017, pp. 494–501.
[65] C. P. Hou, F. P. Nie, H. Tao, and D. Y. Yi, Multi-view unsupervised feature selection with adaptive similarity and view weight, IEEE Trans. Knowl. Data Eng., vol. 29, no. 9, pp. 1998–2011, 2017.
[66] A. Hamzaoui, A. Joly, and N. Boujemaa, Multi-source shared nearest neighbours for multi-modal image clustering, Multimed. Tools Appl., vol. 51, no. 2, pp. 479–503, 2011.
[67] S. K. Wang, Y. M. Ye, and R. Y. K. Lau, A generative model with ensemble manifold regularization for multi-view clustering, in Proc. 11th Int. Conf. Advanced Intelligent Computing Theories and Applications, Fuzhou, China, 2015, pp. 109–114.
[68] S. K. Wang, E. K. Wang, X. T. Li, Y. M. Ye, R. Y. K. Lau, and X. L. Du, Multi-view learning via multiple graph regularized generative model, Knowl.-Based Syst., vol. 121, pp. 153–162, 2017.
[69] Z. Y. Zhang and J. Y. Mao, Jointly sparse neighborhood graph for multi-view manifold clustering, Neurocomputing, vol. 216, pp. 28–38, 2016.
[70] F. P. Nie, G. H. Cai, and X. L. Li, Multi-view clustering and semi-supervised classification with adaptive neighbours, in Proc. 31st AAAI Conf. Artificial Intelligence, San Francisco, CA, USA, 2017, pp. 2408–2414.
[71] W. Cheng, X. Zhang, Z. S. Guo, Y. B. Wu, P. F. Sullivan, and W. Wang, Flexible and robust co-regularized multi-domain graph clustering, in Proc. 19th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Chicago, IL, USA, 2013, pp. 320–328.
[72] J. C. Ni, H. H. Tong, W. Fan, and X. Zhang, Flexible and robust multi-network clustering, in Proc. 21st ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Sydney, Australia, 2015, pp. 835–844.
[73] J. C. Ni, W. Cheng, W. Fan, and X. Zhang, Self-grouping multi-network clustering, in Proc. 16th Int. Conf. Data Mining, Barcelona, Spain, 2016, pp. 1119–1124.
[74] R. Liu, W. Cheng, H. H. Tong, W. Wang, and X. Zhang, Robust multi-network clustering via joint cross-domain cluster alignment, in Proc. 2015 IEEE Int. Conf. Data Mining, Atlantic City, NJ, USA, 2015, pp. 291–300.
[75] P. S. Yu and J. W. Zhang, MCD: Mutual clustering across multiple social networks, in Proc. 2015 IEEE Int. Conf. Big Data, New York, NY, USA, 2015, pp. 762–771.
[76] C. D. Wang, J. H. Lai, and P. S. Yu, Multi-view clustering based on belief propagation, IEEE Trans. Knowl. Data Eng., vol. 28, no. 4, pp. 1007–1021, 2016.
[77] D. Y. Zhou and C. J. C. Burges, Spectral clustering and transductive learning with multiple views, in Proc. 24th Int. Conf. Machine Learning, Corvalis, OR, USA, 2007, pp. 1159–1166.
[78] Y. Cheng and R. L. Zhao, Multiview spectral clustering via ensemble, in Proc. 2009 IEEE Int. Conf. Granular Computing, Nanchang, China, 2009, pp. 101–106.
[79] B. Long, P. S. Yu, and Z. F. Zhang, A general model for multiple view unsupervised learning, in Proc. 2008 SIAM Int. Conf. Data Mining, Philadelphia, PA, USA, 2008, pp. 822–833.
[80] D. L. Niu, J. G. Dy, and M. I. Jordan, Multiple non-redundant spectral clustering views, in Proc. 27th Int. Conf. Machine Learning, Haifa, Israel, 2010, pp. 831–838.
[81] R. K. Xia, Y. Pan, L. Du, and J. Yin, Robust multi-view spectral clustering via low-rank and sparse decomposition, in Proc. 28th AAAI Conf. Artificial Intelligence, Québec City, Canada, 2014, pp. 2149–2155.
[82] Y. Q. Li, F. P. Nie, H. Huang, and J. Z. Huang, Large-scale multi-view spectral clustering via bipartite graph, in Proc. 29th AAAI Conf. Artificial Intelligence, Austin, TX, USA, 2015, pp. 2750–2756.
[83] N. F. Chikhi, Multi-view clustering via spectral partitioning and local refinement, Information Processing and Management, vol. 52, no. 4, pp. 618–627, 2016.
[84] C. Y. Lu, S. C. Yan, and Z. C. Lin, Convex sparse spectral clustering: Single-view to multi-view, IEEE Trans. Image Process., vol. 25, no. 6, pp. 2833–2843, 2016.
[85] J. W. Son, J. Jeon, A. Lee, and S. J. Kim, Spectral clustering with brainstorming process for multi-view data, in Proc. 31st AAAI Conf. Artificial Intelligence, San Francisco, CA, USA, 2017, pp. 2548–2554.
[86] H. C. Huang, Y. Y. Chuang, and C. S. Chen, Affinity aggregation for spectral clustering, in Proc. 2012 IEEE Conf. Computer Vision and Pattern Recognition, Providence, RI, USA, 2012, pp. 773–780.
[87] W. X. Shao, J. W. Zhang, L. F. He, and P. S. Yu, Multi-source multi-view clustering via discrepancy penalty, in Proc. 2016 Int. Joint Conf. Neural Networks, Vancouver, Canada, 2016, pp. 2714–2721.
[88] L. Feng, L. Cai, Y. Liu, and S. L. Liu, Multi-view spectral clustering via robust local subspace learning, Soft Comput., vol. 21, no. 8, pp. 1937–1948, 2017.
[89] Y. Wang, W. J. Zhang, L. Wu, X. M. Lin, M. Fang, and S. R. Pan, Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering, in Proc. 25th Int. Joint Conf. Artificial Intelligence, New York, NY, USA, 2016, pp. 2153–2159.
[90] P. Zhao, Y. Jiang, and Z. H. Zhou, Multi-view matrix completion for clustering with side information, in Proc. 21st Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining, Jeju, South Korea, 2017, pp. 403–415.
[91] B. Long, Z. F. Zhang, X. Y. Wu, and P. S. Yu, Spectral clustering for multi-type relational data, in Proc. 23rd Int. Conf. Machine Learning, Pittsburgh, PA, USA, 2006, pp. 585–592.
[92] X. Cai, F. P. Nie, H. Huang, and F. Kamangar, Heterogeneous image feature integration via multi-modal spectral clustering, in Proc. 2011 IEEE Conf. Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 2011, pp. 1977–1984.
[93] J. L. Tang, X. Hu, H. J. Gao, and H. Liu, Unsupervised feature selection for multi-view data in social media, in Proc. 2013 SIAM Int. Conf. Data Mining, Austin, TX, USA, 2013, pp. 270–278.
[94] H. B. Chen, K. M. Li, D. J. Zhu, X. Jiang, Y. X. Yuan, P. L. Lv, T. Zhang, L. Guo, D. G. Shen, and T. M. Liu, Inferring group-wise consistent multimodal brain networks via multi-view spectral clustering, IEEE Trans. Med. Imag., vol. 32, no. 9, pp. 1576–1586, 2013.
[95] Y. H. Yang, C. Lan, X. L. Li, B. Luo, and J. Huan, Automatic social circle detection using multi-view clustering, in Proc. 23rd ACM Int. Conf. Information and Knowledge Management, Shanghai, China, 2014, pp. 1019–1028.
[96] Y. Zhang, X. H. Hu, and X. P. Jiang, Multi-view clustering of microbiome samples by robust similarity network fusion and spectral clustering, IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 14, no. 2, pp. 264–271, 2017.
[97] K. Chaudhuri, S. M. Kakade, K. Livescu, and K. Sridharan, Multi-view clustering via canonical correlation analysis, in Proc. 26th Annu. Int. Conf. Machine Learning, Montreal, Canada, 2009, pp. 129–136.
[98] Y. H. Guo, Convex subspace representation learning from multi-view data, in Proc. 27th AAAI Conf. Artificial Intelligence, Bellevue, WA, USA, 2013, pp. 387–393.
[99] X. R. Zhao, N. Evans, and J. L. Dugelay, A subspace co-training framework for multi-view clustering, Pattern Recogn. Lett., vol. 41, pp. 73–82, 2014.
[100] Q. Deng, Y. Yang, M. He, and H. Xing, Locally adaptive feature weighting for multiview clustering, in Proc. 12th Int. FLINS Conf. Uncertainty Modelling in Knowledge Engineering and Decision Making, Roubaix, France, 2016, pp. 139–145.
[101] X. C. Cao, C. Q. Zhang, H. Z. Fu, S. Liu, and H. Zhang, Diversity-induced multi-view subspace clustering, in Proc. 2015 IEEE Conf. Computer Vision and Pattern Recognition, Boston, MA, USA, 2015, pp. 586–594.
[102] Q. Wang, Y. Dou, X. W. Liu, Q. Lv, and S. J. Li, Multi-view clustering with extreme learning machine, Neurocomputing, vol. 214, pp. 483–494, 2016.
[103] H. C. Gao, F. P. Nie, X. L. Li, and H. Huang, Multi-view subspace clustering, in Proc. 2015 IEEE Int. Conf. Computer Vision, Santiago, Chile, 2016, pp. 4238–4246.
[104] J. L. Xu, J. W. Han, and F. P. Nie, Discriminatively embedded k-means for multi-view clustering, in Proc. 2016 IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016, pp. 5356–5364.
[105] J. L. Xu, J. W. Han, F. P. Nie, and X. L. Li, Re-weighted discriminatively embedded k-means for multi-view clustering, IEEE Trans. Image Process., vol. 26, no. 6, pp. 3016–3027, 2017.
[106] Y. Wang, X. M. Lin, L. Wu, W. J. Zhang, and Q. Zhang, Exploiting correlation consensus: Towards subspace clustering for multi-modal data, in Proc. 22nd ACM Int. Conf. Multimedia, Orlando, FL, USA, 2014, pp. 981–984.
[107] Y. Wang, X. M. Lin, L. Wu, W. J. Zhang, Q. Zhang, and X. D. Huang, Robust subspace clustering for multi-view data by exploiting correlation consensus, IEEE Trans. Image Process., vol. 24, no. 11, pp. 3939–3949, 2015.
[108] Y. B. Fan, R. He, and B. G. Hu, Global and local consistent multi-view subspace clustering, in Proc. 3rd IAPR Asian Conf. Pattern Recognition, Kuala Lumpur, Malaysia, 2015, pp. 564–568.
[109] X. Zhang, D. Phung, S. Venkatesh, D. S. Pham, and W. Q. Liu, Multi-view subspace clustering for face images, in Proc. 2015 Int. Conf. Digital Image Computing: Techniques and Applications, Adelaide, Australia, 2015, pp. 1–7.
[110] D. Wang, R. He, L. Wang, and T. N. Tan, Adaptive multi-view clustering via cross trace lasso, in Proc. 3rd IAPR Asian Conf. Pattern Recognition, Kuala Lumpur, Malaysia, 2015, pp. 559–563.
[111] D. Wang, Q. Y. Yin, R. He, L. Wang, and T. N. Tan, Multi-view clustering via structured low-rank representation, in Proc. 24th ACM Int. Conf. Information and Knowledge Management, Melbourne, Australia, 2015, pp. 1911–1914.
[112] X. H. Liu, S. W. Ji, W. Glänzel, and B. De Moor, Multiview partitioning via tensor methods, IEEE Trans. Knowl. Data Eng., vol. 25, no. 5, pp. 1056–1069, 2013.
[113] C. Q. Zhang, H. Z. Fu, S. Liu, G. C. Liu, and X. C. Cao, Low-rank tensor constrained multiview subspace clustering, in Proc. 2015 IEEE Int. Conf. Computer Vision, Santiago, Chile, 2015, pp. 1582–1590.
[114] Y. Xie, D. C. Tao, W. S. Zhang, and L. Zhang, Multi-view subspace clustering via relaxed L1-norm of tensor multi-rank, arXiv preprint arXiv: 1610.07126, 2016.
[115] M. Yin, J. B. Gao, S. L. Xie, and Y. Guo, Low-rank multi-view clustering in third-order tensor space, arXiv preprint arXiv: 1608.08336, 2016.
[116] Y. Cui, X. Z. Fern, and J. G. Dy, Non-redundant multi-view clustering via orthogonalization, in Proc. 7th IEEE Int. Conf. Data Mining, Omaha, NE, USA, 2007, pp. 133–142.
[117] Y. Cui, X. Z. Fern, and J. G. Dy, Learning multiple nonredundant clusterings, ACM Trans. Knowl. Discov. Data, vol. 4, no. 3, p. 15, 2010.
[118] S. Günnemann, I. Färber, and T. Seidl, Multi-view clustering using mixture models in subspace projections, in Proc. 18th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Beijing, China, 2012, pp. 132–140.
[119] E. Müller, S. Günnemann, I. Färber, and T. Seidl, Discovering multiple clustering solutions: Grouping objects in different views of the data, in Proc. 10th Int. Conf. Data Mining, Sydney, Australia, 2010, p. 1220.
[120] S. Günnemann, I. Färber, M. Rüdiger, and T. Seidl, SMVC: Semi-supervised multi-view clustering in subspace projections, in Proc. 20th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, New York, NY, USA, 2014, pp. 253–262.
[121] Q. Y. Yin, S. Wu, R. He, and L. Wang, Multi-view clustering via pairwise sparse subspace representation, Neurocomputing, vol. 156, pp. 12–21, 2015.
[122] X. C. Cao, C. Q. Zhang, C. J. Zhou, H. Z. Fu, and H. Foroosh, Constrained multi-view video face clustering, IEEE Trans. Image Process., vol. 24, no. 11, pp. 4381–4393, 2015.
[123] Q. Y. Yin, S. Wu, and L. Wang, Incomplete multi-view clustering via subspace learning, in Proc. 24th ACM Int. Conf. Information and Knowledge Management, Melbourne, Australia, 2015, pp. 383–392.
[124] Q. Y. Yin, S. Wu, and L. Wang, Unified subspace learning for incomplete and unlabeled multi-view data, Pattern Recogn., vol. 67, pp. 313–327, 2017.
[125] C. Xu, D. C. Tao, and C. Xu, Multi-view learning with incomplete views, IEEE Trans. Image Process., vol. 24, no. 12, pp. 5812–5825, 2015.
[126] D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[127] C. Ding, X. F. He, and H. D. Simon, Nonnegative lagrangian relaxation of k-means and spectral clustering, in Proc. 16th European Conf. Machine Learning, Porto, Portugal, 2005, pp. 530–538.
[128] V. P. Pauca, J. Piper, and R. J. Plemmons, Nonnegative matrix factorization for spectral data analysis, Linear Algebra Appl., vol. 416, no. 1, pp. 29–47, 2006.
[129] D. Cai, X. F. He, J. W. Han, and T. S. Huang, Graph regularized nonnegative matrix factorization for data representation, IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 8, pp. 1548–1560, 2011.
[130] C. H. Q. Ding, T. Li, and M. I. Jordan, Convex and semi-nonnegative matrix factorizations, IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 1, pp. 45–55, 2010.
[131] A. Cichocki and R. Zdunek, Multilayer nonnegative matrix factorisation, Electron. Lett., vol. 42, no. 16, pp. 947–948, 2006.
[132] T. Li and C. Ding, Non-negative matrix factorizations for clustering: A survey, in Data Clustering: Algorithms and Applications, C. C. Aggarwal and C. K. Reddy, eds. Boca Raton, FL, USA: Chapman & Hall/CRC, 2014.
[133] D. Greene and P. Cunningham, A matrix factorization approach for integrating multiple data views, in Proc. European Conf. Machine Learning and Knowledge Discovery in Databases: Part I, Bled, Slovenia, 2009, pp. 423–438.
[134] J. L. Liu, C. Wang, J. Gao, and J. W. Han, Multi-view clustering via joint nonnegative matrix factorization, in Proc. 2013 SIAM Int. Conf. Data Mining, Austin, TX, USA, 2013, pp. 252–260.
[135] Y. Jiang, J. Liu, Z. C. Li, and H. Q. Lu, Semi-supervised unified latent factor learning with multi-view data, Mach. Vision Appl., vol. 25, no. 7, pp. 1635–1645, 2014.
[136] M. J. He, Y. Yang, and H. J. Wang, Learning latent features for multi-view clustering based on NMF, in Proc. Int. Joint Conf. Rough Sets, Santiago de Chile, Chile, 2016, pp. 459–469.
[137] W. Y. Chang, C. P. Wei, and Y. C. F. Wang, Multi-view nonnegative matrix factorization for clothing image characterization, in Proc. 22nd Int. Conf. Pattern Recognition, Stockholm, Sweden, 2014, pp. 1272–1277.
[138] W. H. Ou, S. J. Yu, G. Li, J. Lu, K. S. Zhang, and G. Xie, Multi-view non-negative matrix factorization by patch alignment framework with view consistency, Neurocomputing, vol. 204, pp. 116–124, 2016.
[139] C. Xu, D. C. Tao, and C. Xu, Multi-view self-paced learning for clustering, in Proc. 24th Int. Conf. Artificial Intelligence, Buenos Aires, Argentina, 2015, pp. 3974–3980.
[140] X. Cai, F. P. Nie, and H. Huang, Multi-view k-means clustering on big data, in Proc. 23rd Int. Joint Conf. Artificial Intelligence, Beijing, China, 2013, pp. 2598–2604.
[141] M. J. Qian and C. X. Zhai, Unsupervised feature selection for multi-view clustering on text-image web news data, in Proc. 23rd ACM Int. Conf. Information and Knowledge Management, Shanghai, China, 2014, pp. 1963–1966.
[142] H. D. Zhao, Z. M. Ding, and Y. Fu, Multi-view clustering via deep matrix factorization, in Proc. 31st AAAI Conf. Artificial Intelligence, San Francisco, CA, USA, 2017, pp. 2921–2927.
[143] H. Wang, F. P. Nie, and H. Huang, Multi-view clustering and feature learning via structured sparsity, in Proc. 30th Int. Conf. Machine Learning, Atlanta, GA, USA, 2013, pp. 352–360.
[144] H. F. Liu, H. Y. Mao, and Y. Fu, Robust multi-view feature selection, in Proc. 16th Int. Conf. Data Mining, Barcelona, Spain, 2016, pp. 281–290.
[145] X. L. Gong, F. W. Wang, and L. P. Huang, Weighted NMF-based multiple sparse views clustering for web items, in Proc. 21st Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining, Jeju, South Korea, 2017, pp. 416–428.
[146] D. Hidru and A. Goldenberg, EquiNMF: Graph regularized multiview nonnegative matrix factorization, arXiv preprint arXiv: 1409.4018, 2014.
[147] Z. F. Wang, X. W. Kong, H. Y. Fu, M. Li, and Y. J. Zhang, Feature extraction via multi-view non-negative matrix factorization with local graph regularization, in Proc. 2015 IEEE Int. Conf. Image Proc., Quebec City, Canada, 2015, pp. 3500–3504.
[148] Z. Y. Guan, L. J. Zhang, J. Y. Peng, and J. P. Fan, Multi-view concept learning for data representation, IEEE Trans. Knowl. Data Eng., vol. 27, no. 11, pp. 3016–3028, 2015.
[149] X. C. Zhang, L. Zhao, L. L. Zong, X. Y. Liu, and H. Yu, Multi-view clustering via multi-manifold regularized nonnegative matrix factorization, in Proc. 2014 IEEE Int. Conf. Data Mining, Shenzhen, China, 2014, pp. 1103–1108.
[150] L. L. Zong, X. C. Zhang, L. Zhao, H. Yu, and Q. L. Zhao, Multi-view clustering via multi-manifold regularized non-negative matrix factorization, Neural Netw., vol. 88, pp. 74–89, 2017.
[151] D. Kuang, C. Ding, and H. Park, Symmetric nonnegative matrix factorization for graph clustering, in Proc. 2012 SIAM Int. Conf. Data Mining, Anaheim, CA, USA, 2012, pp. 106–117.
[152] X. C. Zhang, Z. X. Wang, L. L. Zong, and H. Yu, Multi-view clustering via graph regularized symmetric nonnegative matrix factorization, in Proc. 2016 IEEE Int. Conf. Cloud Computing and Big Data Analysis, Chengdu, China, 2016, pp. 109–114.
[153] W. Xu and Y. H. Gong, Document clustering by concept factorization, in Proc. 27th Annu. Int. ACM SIGIR Conf. Research and Development in Information Retrieval, Sheffield, UK, 2004, pp. 202–209.
[154] H. Wang, Y. Yang, and T. R. Li, Multi-view clustering via concept factorization with local manifold regularization, in Proc. 16th Int. Conf. Data Mining, Barcelona, Spain, 2016, pp. 1245–1250.
[155] K. Zhan, J. H. Shi, J. Wang, and F. Tian, Graph-regularized concept factorization for multi-view document clustering, Journal of Visual Communication and Image Representation, vol. 48, pp. 411–418, 2017.
[156] W. X. Shao, L. F. He, and P. S. Yu, Multiple incomplete views clustering via weighted nonnegative matrix factorization with L2,1 regularization, in Proc. European Conf. Machine Learning and Knowledge Discovery in Databases, Porto, Portugal, 2015, pp. 318–334.
[157] W. X. Shao, L. F. He, C. T. Lu, and P. S. Yu, Online multi-view clustering with incomplete views, arXiv preprint arXiv: 1611.00481, 2016.
[158] W. X. Shao, L. F. He, and P. S. Yu, Clustering on multi-source incomplete data via tensor modeling and factorization, in Proc. 19th Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining, Ho Chi Minh City, Vietnam, 2015, pp. 485–497.
[159] S. Y. Li, Y. Jiang, and Z. H. Zhou, Partial multi-view clustering, in Proc. 28th AAAI Conf. Artificial Intelligence, Québec City, Canada, 2014, pp. 1968–1974.
[160] B. Qian, X. B. Shen, Y. Y. Gu, Z. M. Tang, and Y. H. Ding, Double constrained NMF for partial multi-view clustering, in Proc. 2016 Int. Conf. Digital Image Computing: Techniques and Applications, Gold Coast, Australia, 2016, pp. 1–7.
[161] N. Rai, S. Negi, S. Chaudhury, and O. Deshmukh, Partial multi-view clustering using graph regularized NMF, in Proc. 23rd Int. Conf. Pattern Recognition, Cancun, Mexico, 2016, pp. 2192–2197.
[162] X. C. Zhang, L. L. Zong, X. Y. Liu, and H. Yu, Constrained NMF-based multi-view clustering on unmapped data, in Proc. 29th AAAI Conf. Artificial Intelligence, Austin, TX, USA, 2015, pp. 3174–3180.
[163] R. Caruana, Multitask learning, Mach. Learn., vol. 28, no. 1, pp. 41–75, 1997.
[164] Q. Q. Gu and J. Zhou, Learning the shared subspace for multi-task clustering and transductive transfer classification, in Proc. 9th IEEE Int. Conf. Data Mining, Miami, FL, USA, 2009, pp. 159–168.
[165] Z. H. Zhang and J. Zhou, Multi-task clustering via domain adaptation, Pattern Recogn., vol. 45, no. 1, pp. 465–473, 2012.
[166] S. N. Xie, H. T. Lu, and Y. C. He, Multi-task co-clustering via nonnegative matrix factorization, in Proc. 21st Int. Conf. Pattern Recognition, Tsukuba, Japan, 2012, pp. 2954–2958.
[167] S. Al-Stouhi and C. K. Reddy, Multi-task clustering using constrained symmetric non-negative matrix factorization, in Proc. 2014 SIAM Int. Conf. Data Mining, Philadelphia, PA, USA, 2014, pp. 785–793.
[168] X. Wang, B. Y. Qian, J. P. Ye, and I. Davidson, Multi-objective multi-view spectral clustering via Pareto optimization, in Proc. 2013 SIAM Int. Conf. Data Mining, Austin, TX, USA, 2013, pp. 234–242.
[169] A. Wahid, X. Y. Gao, and P. Andreae, Multi-view clustering of web documents using multi-objective genetic algorithm, in Proc. 2014 IEEE Congress on Evolutionary Computation, Beijing, China, 2014, pp. 2625–2632.
[170] A. Wahid, X. Y. Gao, and P. Andreae, Multi-objective multi-view clustering ensemble based on evolutionary approach, in Proc. 2015 IEEE Congress on Evolutionary Computation, Sendai, Japan, 2015, pp. 1696–1703.
[171] X. T. Zhang, X. C. Zhang, H. Liu, and X. Y. Liu, Multi-task clustering through instances transfer, Neurocomputing, vol. 251, pp. 145–155, 2017.
[172] H. Shi, Y. Li, Y. H. Han, and Q. H. Hu, Cluster structure preserving unsupervised feature selection for multi-view tasks, Neurocomputing, vol. 175, pp. 686–697, 2016.
[173] X. C. Zhang, X. T. Zhang, and H. Liu, Multi-task multi-view clustering for non-negative data, in Proc. 24th Int. Conf. Artificial Intelligence, Buenos Aires, Argentina, 2015, pp. 4055–4061.
[174] X. T. Zhang, X. C. Zhang, H. Liu, and X. Y. Liu, Multi-task multi-view clustering, IEEE Trans. Knowl. Data Eng., vol. 28, no. 12, pp. 3324–3338, 2016.
[175] R. Bekkerman and J. Jeon, Multi-modal clustering for multimedia collections, in Proc. 2007 IEEE Conf. Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 2007, pp. 1–8.
[176] X. J. Xie and S. L. Sun, Multi-view clustering ensembles, in Proc. 2013 Int. Conf. Machine Learning and Cybernetics, Tianjin, China, 2013, pp. 51–56.
[177] X. J. Chen, X. F. Xu, J. Z. Huang, and Y. M. Ye, TW-k-means: Automated two-level variable weighting clustering algorithm for multiview data, IEEE Trans. Knowl. Data Eng., vol. 25, no. 4, pp. 932–944, 2013.
[178] Y. M. Xu, C. D. Wang, and J. H. Lai, Weighted multi-view clustering with feature selection, Pattern Recogn., vol. 53, pp. 25–35, 2016.
[179] B. Jiang, F. Y. Qiu, L. P. Wang, and Z. J. Zhang, Bi-level weighted multi-view clustering via hybrid particle swarm optimization, Inf. Process. Manage., vol. 52, no. 3, pp. 387–398, 2016.
[180] B. Jiang, F. Y. Qiu, and L. P. Wang, Multi-view clustering via simultaneous weighting on views and features, Appl. Soft Comput., vol. 47, pp. 304–315, 2016.
[181] G. Cleuziou, M. Exbrayat, L. Martin, and J. H. Sublemontier, CoFKM: A centralized method for multiple-view clustering, in Proc. 9th IEEE Int. Conf. Data Mining, Miami, FL, USA, 2009, pp. 752–757.
[182] F. de A. T. de Carvalho, F. M. de Melo, and Y. Lechevallier, A multi-view relational fuzzy c-medoid vectors clustering algorithm, Neurocomputing, vol. 163, pp. 115–123, 2015.
[183] Y. Z. Jiang, F. L. Chung, S. T. Wang, Z. H. Deng, J. Wang, and P. J. Qian, Collaborative fuzzy clustering from multiple weighted views, IEEE Trans. Cybern., vol. 45, no. 4, pp. 688–701, 2015.
[184] Y. T. Wang and L. H. Chen, Multi-view fuzzy clustering with minimax optimization for effective clustering of data from multiple sources, Expert Syst. Appl., vol. 72, pp. 457–466, 2017.
[185] S. F. Hussain, G. Bisson, and C. Grimal, An improved co-similarity measure for document clustering, in Proc. 9th Int. Conf. Machine Learning and Applications, Washington, DC, USA, 2010, pp. 190–197.
Yan Yang received the BS and MS degrees from Huazhong University of Science and Technology, Wuhan, China, in 1984 and 1987, respectively. She received the PhD degree from Southwest Jiaotong University, Chengdu, China, in 2007. From 2002 to 2003 and 2004 to 2005, she was a visiting scholar with the University of Waterloo, Waterloo, Canada. She is currently a professor and vice dean with the School of Information Science and Technology, Southwest Jiaotong University, Chengdu, China. Her current research interests include multi-view learning, big data analysis and mining, ensemble learning, semi-supervised learning, and cloud computing.

Hao Wang received the BEng degree from Nanyang Institute of Technology, Nanyang, China, in 2014. He is currently pursuing the PhD degree at Southwest Jiaotong University, Chengdu, China. His current research interests include data mining, multi-view learning, natural language processing, and lifelong machine learning.