A Novel Kernelized Fuzzy Clustering Algorithm For Data Classification
Neha Mehra, International Journal of Emerging Trends in Engineering Research, 9(8), August 2021, 1073 – 1078
The cluster membership matrix $u_{ij}$, giving the membership of each data point $x_i$ in each cluster $v_j$, is computed from the membership update equation. A kernel is a function $K$ that, for all $x, z$ from the original input space $X$, satisfies $K(x, z) = \phi(x)^{T}\phi(z)$.
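The kernel definition above (a function equal to an inner product of feature maps) can be illustrated with a short sketch. The quadratic feature map below is an illustrative choice for demonstration, not taken from the paper; the RBF kernel is the one the paper actually uses:

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    """RBF kernel: K(x, z) = exp(-||x - z||^2 / sigma^2).
    Note K(x, x) = 1 for any x, the property used in equation (8)."""
    return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)

# For a kernel with an explicit feature map, K(x, z) = <phi(x), phi(z)>.
# Example: the homogeneous quadratic kernel K(x, z) = (x . z)^2 on R^2
# corresponds to phi(x) = (x1^2, sqrt(2) x1 x2, x2^2).
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])
assert np.isclose(np.dot(x, z) ** 2, np.dot(phi(x), phi(z)))  # both equal 25.0
assert np.isclose(rbf_kernel(x, x), 1.0)
```

This inner-product view is what allows the feature-space distance in equation (7) to be evaluated without ever computing $\phi$ explicitly.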
The kernel-based FCM (KFCM) objective, using the mapping $\phi$, can be generally defined as the constrained minimization of the following function:

J(U, V) = \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ij}^{m} \, \lVert \phi(x_i) - \phi(v_j) \rVert^{2}    (6)

where $\lVert \phi(x_i) - \phi(v_j) \rVert^{2}$ is the squared Euclidean distance between $\phi(x_i)$ and $\phi(v_j)$, the images of $x_i$ and $v_j$ in the kernel feature space. $N$ is the total number of training samples, $c$ is the number of clusters, and $u_{ij}$ is the fuzzy membership of training sample $x_i$ in cluster $v_j$.

The feature-space distance can be expanded using the kernel trick:

\lVert \phi(x_i) - \phi(v_j) \rVert^{2}
  = (\phi(x_i) - \phi(v_j))^{T} (\phi(x_i) - \phi(v_j))
  = \phi(x_i)^{T}\phi(x_i) - \phi(v_j)^{T}\phi(x_i) - \phi(x_i)^{T}\phi(v_j) + \phi(v_j)^{T}\phi(v_j)
  = K(x_i, x_i) + K(v_j, v_j) - 2 K(x_i, v_j)    (7)

In the case of the RBF kernel, $K(x_i, x_i) = 1$ and $K(v_j, v_j) = 1$, so with equation (7) the objective can be rewritten as

J(U, V) = 2 \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ij}^{m} \left( 1 - K(x_i, v_j) \right)    (8)

Here $d = \lVert x_i - x_k \rVert$ and $\bar{d}$ is the average of all such pairwise distances $d$.

To minimize the objective function given in equation (9), the proposed kernel-based fuzzy clustering algorithm proceeds as described in Algorithm 2.

Algorithm 2: Kernel-based Fuzzy Clustering Algorithm

Input: X, c, m, \sigma
Output: U, V'
Step 1: Randomly initialize the cluster centers V = {v_1, v_2, ..., v_c}.
Step 2: Compute the cluster memberships using equation (9).
Step 3: Compute the cluster centers using equation (10).
Step 4: If \lVert V' - V \rVert < \epsilon then stop; otherwise continue with Step 2.

Equations (9) and (10) are applied alternately until the predefined convergence threshold is met, at which point the algorithm terminates.

5. EXPERIMENTS AND RESULTS

In this section, experimental studies of the proposed approach are conducted on five real-world datasets. First, c objects are chosen at random from each dataset as the initial cluster centers, and the membership matrix U is initialized from them. We set the fuzzification parameter m to 1.75 and the convergence threshold to 10^{-3}. In the experiments, we compare the performance of FCM and KFCM to show the effectiveness of the proposed algorithm.
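The exact forms of equations (9) and (10) referenced by Algorithm 2 are not reproduced in this excerpt. As a hedged sketch (an assumption about the paper's exact update rules), the standard RBF-kernel FCM updates — memberships proportional to $(1 - K(x_i, v_j))^{-1/(m-1)}$ and centers as kernel-weighted means — minimize objective (8) and could be implemented as:

```python
import numpy as np

def rbf(X, V, sigma):
    """Pairwise RBF kernel values K(x_i, v_j), shape (N, c)."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)

def kfcm(X, c, m=1.75, sigma=1.0, eps=1e-3, max_iter=100, seed=0):
    """Kernel-based fuzzy c-means, following the steps of Algorithm 2.
    The update rules below are assumed standard KFCM forms, not
    necessarily identical to the paper's equations (9) and (10)."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize centers with c randomly chosen objects.
    V = X[rng.choice(len(X), size=c, replace=False)].copy()
    U = None
    for _ in range(max_iter):
        K = rbf(X, V, sigma)              # K(x_i, v_j)
        dist = np.maximum(1.0 - K, 1e-12)  # feature-space distance (up to factor 2)
        # Step 2: membership update (assumed form of equation (9)).
        inv = dist ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Step 3: center update (assumed form of equation (10)):
        # kernel-weighted mean with weights u_ij^m * K(x_i, v_j).
        W = (U ** m) * K
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 4: stop once the centers move less than the threshold.
        if np.linalg.norm(V_new - V) < eps:
            V = V_new
            break
        V = V_new
    J = 2.0 * ((U ** m) * (1.0 - rbf(X, V, sigma))).sum()  # objective (8)
    return U, V, J

# Toy usage: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
U, V, J = kfcm(X, c=2, sigma=1.0)
```

The defaults `m=1.75` and `eps=1e-3` match the experimental settings reported in Section 5.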
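The tables that follow report F-Measure, NMI, and ARI for FCM and KFCM. As a hedged sketch of how such external-validity scores are obtained (the `harden` helper and the pairwise variant of F-Measure are illustrative assumptions; NMI and ARI are also available in libraries such as scikit-learn):

```python
import numpy as np

def harden(U):
    """Convert a fuzzy membership matrix (N x c) to crisp cluster labels."""
    return np.argmax(U, axis=1)

def f_measure(true, pred):
    """Pairwise F-Measure: harmonic mean of precision and recall over
    pairs of points grouped together in pred versus in true."""
    true, pred = np.asarray(true), np.asarray(pred)
    same_true = true[:, None] == true[None, :]
    same_pred = pred[:, None] == pred[None, :]
    iu = np.triu_indices(len(true), k=1)      # each unordered pair once
    tp = np.sum(same_true[iu] & same_pred[iu])
    fp = np.sum(~same_true[iu] & same_pred[iu])
    fn = np.sum(same_true[iu] & ~same_pred[iu])
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# A relabeled but identical partition scores a perfect 1.0.
print(f_measure([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Because cluster labels are arbitrary, pair-counting (and information-theoretic) measures are preferred over raw label accuracy for comparing clusterings.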
Table 2. F-Measure for FCM and KFCM on various datasets

Datasets                  FCM       KFCM
Seed                      0.43517   0.60976
Wine                      0.34721   0.64912
Breast Cancer Wisconsin   0.47674   0.78793
Balance Scale             0.21968   0.63089
Ecoli                     0.37596   0.68654

Table 3. NMI for FCM and KFCM on various datasets

Datasets                  FCM       KFCM
Seed                      0.69483   0.70976
Wine                      0.46952   0.87422
Breast Cancer Wisconsin   0.73001   0.89603
Balance Scale             0.24539   0.79285
Ecoli                     0.55712   0.8274

Table 4. ARI for FCM and KFCM on various datasets

Datasets                  FCM       KFCM
Seed                      0.3381    0.6234
Wine                      0.48052   0.92468
Breast Cancer Wisconsin   0.95608   0.97281
Balance Scale             0.3024    0.6752
Ecoli                     0.28571   0.71905

Table 5. Objective function for FCM and KFCM on various datasets

Datasets                  FCM       KFCM
Seed                      1182.73   838.9414
Wine                      2788.39   777.5576
Breast Cancer Wisconsin   30147.6   832.4609
Balance Scale             1250.44   891.3369
Ecoli                     959.84    853.4041

The results above show that the proposed algorithm increases the F-Measure on all datasets, with the largest improvement on the Balance Scale dataset; the gain in F-Measure is roughly 10% to 20%. The NMI measure likewise improves under the proposed algorithm, again most strongly on the Balance Scale dataset. For the ARI measure, the Ecoli dataset shows the largest improvement, with gains of roughly 10% to 20% overall. Since the main aim of the fuzzy clustering approach is to minimize the objective function, it is also worth noting that the proposed algorithm achieves its largest reductions of the objective on the Wine and Breast Cancer Wisconsin datasets. This discussion shows that the proposed kernel-based clustering algorithm compares favorably with the Fuzzy C-Means algorithm.

6. CONCLUSION

Clustering is one of the most widely used techniques in data analysis research, and the main contribution of this work is to bring kernel methods into it. In this paper, a kernel-based fuzzy clustering algorithm is proposed to handle non-linear data, addressing a key limitation of the Fuzzy C-Means algorithm; it uses the RBF kernel as the kernel function for fuzzy clustering. The effectiveness of the proposed method is compared with Fuzzy C-Means, and the comparison shows that the proposed approach produces better clustering results, with a 10% to 20% improvement in all of the measures discussed. In future work, the approach can be extended to process big data.