
Neural Processing Letters 18: 155–162, 2003.
© 2003 Kluwer Academic Publishers. Printed in the Netherlands.

Clustering Incomplete Data Using Kernel-Based Fuzzy C-means Algorithm

DAO-QIANG ZHANG* and SONG-CAN CHEN


Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, People's Republic of China. e-mail: [email protected]
*Corresponding author.

Abstract. There is a recent trend in the machine learning community to construct nonlinear versions of linear algorithms using the 'kernel method', e.g. Support Vector Machines (SVMs), kernel principal component analysis, kernel Fisher discriminant analysis, and the recent kernel clustering algorithms. In unsupervised clustering algorithms using the kernel method, a nonlinear mapping is typically used first to map the data into a potentially much higher-dimensional feature space, where clustering is then performed. A drawback of these kernel clustering algorithms is that the clustering prototypes lie in the high-dimensional feature space and hence lack clear and intuitive descriptions, unless additional approximate projections from the feature space back to the data space are used, as done in the existing literature. In this paper, a novel clustering algorithm using the 'kernel method', based on the classical fuzzy c-means algorithm (FCM), is proposed and called the kernel fuzzy c-means algorithm (KFCM). KFCM adopts a new kernel-induced metric in the data space to replace the original Euclidean norm of FCM, and the clustered prototypes still lie in the data space, so the clustering results can be reformulated and interpreted in the original space. Our analysis shows that KFCM is robust to noise and outliers and also tolerates unequal-sized clusters. Finally, this property is utilized to cluster incomplete data. Experiments on two artificial datasets and one real dataset show that KFCM has better clustering performance and is more robust than several modifications of FCM for incomplete data clustering.

Key words. clustering, fuzzy c-means, incomplete data, kernel methods

1. Introduction
The fuzzy c-means (FCM) algorithm [1], as a typical clustering algorithm, has been utilized in a wide variety of engineering and scientific disciplines such as medical imaging, bioinformatics, pattern recognition, and data mining. FCM partitions a given dataset, $X = \{x_1, \ldots, x_n\} \subset R^p$, into $c$ fuzzy subsets by minimizing the following objective function

$$J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m \|x_k - v_i\|^2 \qquad (1)$$

where $c$ is the number of clusters (treated as a prespecified value in this paper), $n$ the number of data points, $u_{ik}$ the membership of $x_k$ in class $i$, $m$ the quantity controlling clustering fuzziness, and $V$ the set of cluster centers ($v_i \in R^p$). The matrix $U$ with $ik$-th entry $u_{ik}$ is constrained to contain elements in the range $[0, 1]$ such that $\sum_{i=1}^{c} u_{ik} = 1$ for all $k = 1, 2, \ldots, n$. The function $J_m$ is minimized by the well-known alternating iterative algorithm, sketched below.
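For readers who want to see the alternation concretely, here is a minimal NumPy sketch of the standard FCM updates; it illustrates the classical algorithm of [1], not the authors' code, and the random initialization and tolerance are our own choices.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, eps=1e-5, seed=0):
    """Minimal sketch of standard FCM: alternating updates of U and V."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = rng.random((c, n))
    U /= U.sum(axis=0)                                # enforce sum_i u_ik = 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # fuzzy-weighted means
        d2 = ((X[None] - V[:, None]) ** 2).sum(axis=2)  # ||x_k - v_i||^2
        w = 1.0 / np.fmax(d2, 1e-12) ** (1.0 / (m - 1))
        U_new = w / w.sum(axis=0)                     # standard FCM memberships
        if np.abs(U_new - U).max() < eps:
            return U_new, V
        U = U_new
    return U, V
```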
Since the original FCM uses the squared norm to measure similarity between prototypes and data points, it is only effective in clustering 'spherical' clusters. To cluster more general datasets, many algorithms have been proposed that replace the squared norm in Equation (1) with other similarity measures [1, 2]. A recent development is to use the kernel method to construct kernel versions of the FCM algorithm [3–5]. A common ground of these algorithms is that clustering is performed in the transformed feature space after an (implicit) nonlinear data transformation $\Phi$. However, a drawback of these algorithms is that the clustering results, especially the prototypes, are difficult to represent exactly, because some prototypes in the feature space have no corresponding pre-images in the data space. To avoid this problem, additional approximate projection techniques must be used, as shown in [6, 7].
On the other hand, all the aforementioned algorithms are based on the assumption that the data in a dataset are complete, that is, all of the features (i.e., components) of every vector in $X$ are known. However, many real datasets, such as medical data, suffer from incompleteness: one or more of the components in $X$ are missing as a result of measurement errors, missing observations, etc. [8]. Many algorithms have been proposed to deal with incomplete data [8–12]. An elementary but good summary was given in [9], which presented several principles for handling incomplete data. More recently, the triangle inequality was used to estimate missing dissimilarity data [8], and in [12] several ways were developed to continue the FCM clustering of incomplete data: one simple method uses the partial distance strategy (PDS) in FCM, another estimates each missing feature as the weighted sum of the prototypes (WSP), and a third is the nearest prototype strategy (NPS); a sketch of the PDS distance is given below.
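As a rough illustration of PDS (WSP and NPS impute the missing features rather than adjusting the distance), the partial distance computes the squared distance over observed components only and rescales it to compensate for the missing ones, following [12]; the function below is a sketch under that reading.

```python
import numpy as np

def partial_distance(x, v, observed):
    """PDS of [12] (sketch): squared distance over observed components,
    rescaled by p / |observed| to compensate for the missing ones.
    `observed` is a boolean mask marking which features of x are present."""
    p = x.shape[0]
    n_obs = observed.sum()          # the masking scheme guarantees n_obs >= 1
    return (p / n_obs) * np.sum((x[observed] - v[observed]) ** 2)
```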
In this paper, an alternative kernel-based fuzzy c-means (KFCM) algorithm is proposed to cluster incomplete data. Unlike the usual way of utilizing the kernel method in FCM, the proposed KFCM clustering algorithm is still performed in the original data space, i.e., the prototypes lie in the data space. Furthermore, KFCM adopts a more robust kernel-induced metric in place of the Euclidean norm of the original FCM. In a similar way as in [12], we apply the proposed KFCM to cluster incomplete data, and show that WSP and NPS are two special cases of KFCM for incomplete data clustering. Moreover, because KFCM has better outlier and noise immunity than FCM, it is especially suitable for dealing with incomplete data. In this paper, two artificial datasets and one real dataset are used for testing. Experimental results show that KFCM has better performance than WSP and NPS.
In Section 2, we first discuss the alternative kernel-based fuzzy c-means clustering algorithm; in Section 3, we apply this algorithm to incomplete data clustering. To demonstrate the effectiveness of the proposed algorithm, experiments on three incomplete datasets are conducted and their results are given in Section 4. Finally, conclusions and discussions are given in Section 5.

2. Kernel Fuzzy c-means Clustering (KFCM)


Define a nonlinear map $\Phi : x \to \Phi(x) \in F$, where $x \in X$. Here $X$ denotes the data space and $F$ the transformed feature space, of higher or even infinite dimension. KFCM minimizes the following objective function

$$J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m \|\Phi(x_k) - \Phi(v_i)\|^2 \qquad (2)$$

where

$$\|\Phi(x_k) - \Phi(v_i)\|^2 = K(x_k, x_k) + K(v_i, v_i) - 2K(x_k, v_i) \qquad (3)$$

and $K(x, y) = \Phi(x)^T \Phi(y)$ is an inner-product kernel function. If we adopt the Gaussian function as the kernel, i.e., $K(x, y) = \exp(-\|x - y\|^2 / \sigma^2)$, then $K(x, x) = 1$ and, according to Equation (3), Equation (2) can be rewritten as

$$J_m(U, V) = 2 \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m \left(1 - K(x_k, v_i)\right) \qquad (4)$$

Minimizing Equation (4) under the constraint on $U$, we obtain

$$u_{ik} = \frac{\left(1/(1 - K(x_k, v_i))\right)^{1/(m-1)}}{\sum_{j=1}^{c} \left(1/(1 - K(x_k, v_j))\right)^{1/(m-1)}}, \quad \forall i = 1, 2, \ldots, c; \; k = 1, 2, \ldots, n \qquad (5)$$

$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^m K(x_k, v_i)\, x_k}{\sum_{k=1}^{n} u_{ik}^m K(x_k, v_i)}, \quad \forall i = 1, 2, \ldots, c \qquad (6)$$

It is worth noting that although Equations (5)–(6) are derived using the Gaussian kernel function, other functions satisfying $K(x, x) = 1$ can be used in Equations (5)–(6) in real applications, such as the following RBF and hyper tangent functions:
(1) RBF functions:

$$K(x, y) = \exp\left(-\sum_i |x_i^a - y_i^a|^b / \sigma^2\right) \quad (0 < b \le 2) \qquad (7)$$

(2) Hyper tangent functions:

$$K(x, y) = 1 - \tanh\left(\|x - y\|^2 / \sigma^2\right) \qquad (8)$$

Note that the RBF function with $a = 1$, $b = 2$ reduces to the commonly used Gaussian function. In fact, Equation (3) can be viewed as a new kernel-induced metric in the data space, defined as

$$d(x, y) = \|\Phi(x) - \Phi(y)\| = \sqrt{2\left(1 - K(x, y)\right)} \qquad (9)$$

In Appendix I, we prove that $d(x, y)$ defined in Equation (9) is a metric in the original space when $K(x, y)$ is taken as the Gaussian, RBF, or hyper tangent kernel function. According to Equation (6), the data point $x_k$ is endowed with an additional weight $K(x_k, v_i)$, which measures the similarity between $x_k$ and $v_i$. When $x_k$ is an outlier, i.e., $x_k$ is far from the other data points, $K(x_k, v_i)$ will be very small, so the weighted sum of data points becomes more robust. Since in an incomplete dataset a data point with missing components is likely to turn into an outlier, an algorithm based on KFCM has great potential for clustering incomplete data.
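To make the alternating optimization concrete, the following sketch implements one joint update of Equations (5) and (6) with the Gaussian kernel; the vectorized layout and the guard against division by zero are our own choices, not part of the paper.

```python
import numpy as np

def kfcm_step(X, V, m=2.0, sigma=1.0):
    """One joint update of Equations (5) and (6) with the Gaussian kernel.
    X: (n, p) data; V: (c, p) prototypes. Returns new memberships and prototypes."""
    # K(x_k, v_i) for all pairs, shape (c, n)
    K = np.exp(-((X[None] - V[:, None]) ** 2).sum(axis=2) / sigma ** 2)
    # Equation (5): memberships from the kernel-induced dissimilarity 1 - K
    w = (1.0 / np.fmax(1.0 - K, 1e-12)) ** (1.0 / (m - 1))
    U = w / w.sum(axis=0)
    # Equation (6): prototypes stay in the data space, weighted by u^m * K
    W = (U ** m) * K
    V_new = (W @ X) / W.sum(axis=1, keepdims=True)
    return U, V_new
```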

3. Clustering Incomplete Data Using KFCM


To implement clustering on incomplete data, we derive the following algorithm based on KFCM (a minimal implementation sketch is given at the end of this section):
(1) Fix $c$, $m > 1$, and a small positive constant $\varepsilon > 0$;
(2) Set $x_{kj} = 0$ if $x_{kj}$ is a missing feature;
(3) Initialize the prototypes $v_i$ using FCM;
(4) For $t = 1, 2, \ldots, t_{\max}$, do:
    (a) Update all memberships $u_{ik}^t$ with Equation (5);
    (b) Update all prototypes $v_i^t$ with Equation (6);
    (c) Calculate the missing values using Equation (10);
    (d) Compute $E^t = \max_{i,k} |u_{ik}^t - u_{ik}^{t-1}|$; if $E^t \le \varepsilon$, stop;
End for.
$$x_{kj} = \frac{\sum_{i=1}^{c} u_{ik}^m K(x_k, v_i)\, v_{ij}}{\sum_{i=1}^{c} u_{ik}^m K(x_k, v_i)} \qquad (10)$$

Here the kernel function is the same as in Section 2. From Equations (5), (6) and (10), as $\sigma \to \infty$ we have $K(x_k, v_i) \approx 1 - \|x_k - v_i\|^2/\sigma^2$, so KFCM reduces to the classical FCM and Equation (10) becomes the expression used in the WSP algorithm. Furthermore, as $\sigma \to 0$, Equation (10) reduces to $x_{kj} = v_{pj}$ with $p = \arg\min_i \|x_k - v_i\|^2$, which is exactly the strategy used in the NPS algorithm.
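A minimal sketch of steps (1)–(4) follows. For simplicity it initializes the prototypes from a random subset of the data rather than a preliminary FCM run (step (3)), and all helper names are ours.

```python
import numpy as np

def kfcm_incomplete(X, mask, c, m=2.0, sigma=1.0, eps=1e-5, t_max=100, seed=0):
    """Sketch of the KFCM algorithm for incomplete data (steps (1)-(4)).
    X: (n, p) data with arbitrary values at missing positions;
    mask: (n, p) boolean, True where a feature is observed."""
    rng = np.random.default_rng(seed)
    X = np.where(mask, X, 0.0)                      # step (2): zero missing features
    V = X[rng.choice(len(X), c, replace=False)]     # simplified step (3)
    U_old = np.zeros((c, len(X)))
    for _ in range(t_max):                          # step (4)
        K = np.exp(-((X[None] - V[:, None]) ** 2).sum(axis=2) / sigma ** 2)
        w = (1.0 / np.fmax(1.0 - K, 1e-12)) ** (1.0 / (m - 1))
        U = w / w.sum(axis=0)                       # (a) Equation (5)
        W = (U ** m) * K
        V = (W @ X) / W.sum(axis=1, keepdims=True)  # (b) Equation (6)
        X_hat = (W.T @ V) / W.sum(axis=0)[:, None]  # (c) Equation (10)
        X = np.where(mask, X, X_hat)                # impute missing entries only
        if np.abs(U - U_old).max() < eps:           # (d) E^t <= eps: stop
            break
        U_old = U
    return U, V, X
```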

4. Experimental Results and Discussions


In this section, we compare the performance of FCM with that of KFCM on some incomplete datasets. In all cases, the incomplete data are artificially generated by randomly selecting a specified percentage of the components of a dataset and designating them as missing values. The random selection of missing values is constrained so that (1) each original feature vector retains at least one component, and (2) each feature has at least one value present in the incomplete dataset [12]; a sketch of this masking procedure is given below. The initial values for the prototypes are obtained by running FCM on the original (complete) datasets, and the kernel functions used are the Gaussian, RBF, and hyper tangent functions.
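Since the paper does not specify how the constrained random selection is carried out, one simple possibility is rejection sampling over random masks, sketched here.

```python
import numpy as np

def make_incomplete(X, frac, seed=0):
    """Mark a fraction of entries missing by rejection sampling, so that
    (1) every row keeps at least one observed component and
    (2) every column keeps at least one observed value."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_missing = int(round(frac * n * p))
    while True:
        mask = np.ones(n * p, dtype=bool)
        mask[rng.choice(n * p, n_missing, replace=False)] = False
        mask = mask.reshape(n, p)
        if mask.any(axis=1).all() and mask.any(axis=0).all():
            return mask            # True = observed, False = missing
```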
The first dataset (Data Set A) is an artificially generated one [2]. It contains two clusters: one has 25 points and the other 125 points. Figure 1 shows the clustering results on the complete data using FCM and KFCM. We use both the Gaussian and hyper tangent kernel functions in this experiment, with $\sigma = 2$; the experiment is repeated 10 times, each run giving the same result. It can be seen from Figure 1 that FCM misclassifies 8 data points, while KFCM classifies all the data correctly. Table I gives the clustering results on incomplete Data Set A. Both Gaussian and RBF kernels are used in KFCM, with $\sigma = 0.2$. Results are averaged over a total of 1000 trials. Clearly, KFCM performs much better than FCM on Data Set A, and the best result is obtained using the RBF function with $a = 1.5$, $b = 1.2$, as shown in Table I.
The second artificial dataset (Data Set B) consists of two clusters of 100 points each in $R^5$ [12]. The points are drawn from Gaussian distributions with means $(1, 1, 1, 1, 1)$ and $(-1, -1, -1, -1, -1)$, respectively, each with identity covariance. Figure 2 shows a 2-D plot of Data Set B obtained with the standard PCA technique [13].

Figure 1. Clustering result of the FCM and KFCM algorithms on Data Set A. +: cluster 1; ·: cluster 2; *: data points misclassified by FCM.

Figure 2. Two-dimensional plot of the distribution of Data Set B. +: cluster 1; *: cluster 2.
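A 2-D PCA projection such as that of Figure 2 can be produced along the following lines; this is a generic sketch of the standard technique [13], not the authors' code.

```python
import numpy as np

def pca_2d(X):
    """Project data onto its first two principal components for plotting."""
    Xc = X - X.mean(axis=0)                       # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                          # (n, 2) plot coordinates
```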

For Data Set B with Gaussian distributions, both FCM and KFCM classify the complete data correctly. Next, we perform clustering on incomplete Data Set B; Table II gives the results. The kernel functions used in KFCM are the Gaussian and hyper tangent functions, both with $\sigma = 2$. Results are averaged over a total of 1000 trials. The Gaussian function has the smallest average number of misclassifications in all cases, as shown in Table II.
The last dataset is the well-known Iris dataset. It consists of 150 four-dimensional feature vectors, 50 for each of three physically labeled classes. To achieve better clustering performance, each vector is normalized. The clustering results on the incomplete Iris data are shown in Table III.

Table I. Averaged number of misclassifications on incomplete Data Set A.

% Missing   PDS     WSP     NPS     Gaussian   RBF (a = 1.5, b = 1.2)
10          21.84   11.92   17.95   7.62       7.27
20          33.56   16.12   31.45   16.08      14.93

Table II. Averaged number of misclassifications on incomplete Data Set B.

% Missing   PDS     WSP     NPS     Gaussian   Hyper tangent
20          2.57    2.54    2.61    2.43       2.51
40          6.39    6.33    6.71    6.07       6.10
60          15.70   14.66   30.50   14.32      14.39

Table III. Averaged number of misclassifications on incomplete Iris dataset.

% Missing   PDS     WSP     NPS     Gaussian   RBF (a = 0.5, b = 2)
25          63.96   16.33   29.14   13.57      12.73
50          77.79   37.21   50.75   37.66      31.26

The kernel functions used are the Gaussian and RBF functions, both with $\sigma = 1$. Results are averaged over a total of 1000 trials. The RBF function with $a = 0.5$, $b = 2$ has the smallest average number of misclassifications in all cases, as shown in Table III.
From our experiments, we found that different kernels with different parameters lead to different clustering results, so a key point is choosing an appropriate kernel parameter. However, there is no general theory to guide the selection of the best parameter in most kernel-based algorithms; this remains an open problem. In our experiments, we adopted an approach similar to that in [6], i.e., running a 5-fold cross-validation procedure on only a few realizations of the dataset. Each time, this is done in two stages: first taking a large interval on an exponential scale to find a good initial guess of the parameter, and then gradually shortening the interval to refine the parameter. We use the median of the five estimates throughout the remaining trials on the dataset. Typically, several tens of trials on the dataset are needed to obtain an appropriate parameter; considering that a total of 1000 trials are performed in our experiments, the computational cost of choosing an appropriate parameter remains comparatively small (a sketch of this two-stage search is given below).
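The two-stage search described above might be sketched as follows, where the cross-validated scoring function is left abstract; its definition, like the grid bounds and zoom factors, is our own assumption.

```python
import numpy as np

def select_sigma(score, lo=1e-2, hi=1e2, n_grid=9, n_refine=3):
    """Two-stage search for the kernel width sigma.
    `score(sigma)` is a user-supplied validation criterion (higher is better),
    e.g. a cross-validated misclassification-based score."""
    best = None
    for _ in range(1 + n_refine):
        grid = np.geomspace(lo, hi, n_grid)   # large interval, exponential scale
        best = grid[int(np.argmax([score(s) for s in grid]))]
        lo, hi = best / 2.0, best * 2.0       # gradually shorten the interval
    return best
```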

5. Conclusions
In this paper, we proposed a new kernel-induced metric to replace the Euclidean norm of the fuzzy c-means algorithm in the original space, and then derived the alternative kernel-based fuzzy c-means algorithm. Unlike the common way of using the 'kernel method' to represent a variable in dual form, as in SVMs [7], kernel PCA [6], kernel Fisher discriminant analysis [6] and kernel clustering algorithms [3–5], we adopted a new kernel-induced metric as in Equation (2). It was shown that the proposed kernel clustering algorithm is robust to noise and outliers and also tolerates unequal-sized clusters. This property was further utilized to cluster incomplete data, yielding better performance than the classical counterparts.

Appendix I. Proof that d(x,y) defined in Equation (9) is a metric


Proof. To prove that $d(x, y)$ is a metric, the necessary and sufficient condition is that $d(x, y)$ satisfies the following three conditions [14]:
(i) $d(x, y) > 0$, $\forall x \neq y$, and $d(x, x) = 0$;
(ii) $d(x, y) = d(y, x)$;
(iii) $d(x, y) \le d(x, z) + d(z, y)$, $\forall z$.

It is easy to verify that for the Gaussian, RBF and hyper tangent kernel functions, $d(x, y)$ defined in Equation (9) satisfies $d(x, y) = d(y, x) > 0$ for all $x \neq y$ and $d(x, x) = 0$, so conditions (i) and (ii) hold. From Equation (9), we have

$$d(x, y) = \|\Phi(x) - \Phi(y)\| \le \|\Phi(x) - \Phi(z)\| + \|\Phi(z) - \Phi(y)\| = d(x, z) + d(z, y).$$

Thus condition (iii) is also satisfied, by the triangle inequality of the norm. So $d(x, y)$ is a metric.

Acknowledgements
The authors are grateful to the anonymous reviewers for their comments and suggestions to improve the presentation of this paper. This work was supported in part by the National Science Foundations of China and of Jiangsu under Grant Nos. 60271017 and BK2002092, the 'QingLan Project' Foundation of Jiangsu, and the Returnee Foundation of the Ministry of Education.

References
1. Bezdek, J. C.: Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
2. Wu, K. L. and Yang, M. S.: Alternative c-means clustering algorithms, Pattern Recognition 35 (2002), 2267–2278.
3. Girolami, M.: Mercer kernel-based clustering in feature space, IEEE Trans. Neural Networks 13(3) (2002), 780–784.
4. Zhang, L., Zhou, W. D. and Jiao, L. C.: Kernel clustering algorithm, Chinese J. Computers 25(6) (2002), 587–590 (in Chinese).
5. Zhang, D. Q. and Chen, S. C.: Fuzzy clustering using kernel methods, In: Proceedings of the International Conference on Control and Automation (ICCA'02), June 16–19, Xiamen, China, 2002, pp. 123–128.
6. Müller, K. R., Mika, S. et al.: An introduction to kernel-based learning algorithms, IEEE Trans. Neural Networks 12(2) (2001), 181–202.
7. Vapnik, V. N.: Statistical Learning Theory, Wiley, New York, 1998.
8. Hathaway, R. J. and Bezdek, J. C.: Clustering incomplete relational data using the non-Euclidean relational fuzzy c-means algorithm, Pattern Recognition Letters 23 (2002), 151–160.
9. Jain, A. K. and Dubes, R. C.: Algorithms for Clustering Data, Prentice-Hall, Englewood Cliffs, NJ, 1988.
10. Gaul, W. and Schader, M.: Pyramidal classification based on incomplete data, J. Classification 11 (1994), 171–193.
11. Schafer, J. L.: Analysis of Incomplete Multivariate Data, Chapman & Hall, London, 1997.
12. Hathaway, R. J. and Bezdek, J. C.: Fuzzy c-means clustering of incomplete data, IEEE Trans. Systems, Man, and Cybernetics 31(5) (2001), 735–744.
13. Jolliffe, I. T.: Principal Component Analysis, Springer-Verlag, New York, 1986.
14. Rudin, W.: Principles of Mathematical Analysis, McGraw-Hill, New York, 1976.
