TABLE I
COMPARISON OF THE HYPER-SPHERES-BASED MODELS
approximation of A (k ≤ n) and is defined by

A_k^nys = C A_11^+ C^T = [A_11; A_21] A_11^+ [A_11  A_21^T] ≈ A    (4)

where A_11^+ denotes the generalized pseudo-inverse of A_11. There exists an eigendecomposition A_11^+ = V Λ^{-1} V^T, such that each element A_k^nys(i, j) of A_k^nys can be decomposed as

A_k^nys(i, j) = C_i^T V Λ^{-1} V^T C_j
             = (Λ^{-1/2} V^T C_i)^T (Λ^{-1/2} V^T C_j)
             = [Λ^{-1/2} V^T (κ(x_i, x_1), ..., κ(x_i, x_m))^T]^T · [Λ^{-1/2} V^T (κ(x_j, x_1), ..., κ(x_j, x_m))^T]    (5)

where κ(x_i, x_j) is the base kernel function and x_1, x_2, ..., x_m are representative data points that can be obtained by uniform sampling or by clustering methods such as K-means and SOFM. Let

φ~_m(x) = Λ^{-1/2} V^T (κ(x, x_1), ..., κ(x, x_m))^T    (6)

such that

A_k^nys(i, j) = φ~_m(x_i)^T φ~_m(x_j) = κ~_m(x_i, x_j).    (7)

With the Nyström method, we can therefore obtain an explicit approximation of the nonlinear projection φ(x), which is

x → φ~_m(x).    (8)

To justify why we use kernel methods for our model, we first used the Nyström method to raise the dimension of dataset 3 to 403, and then used singular value decomposition (SVD) to reduce the dimension to 2 for the purpose of visualization. Fig. 6 illustrates the transformed dataset 3 from Fig. 4(c).

Fig. 6. Artificial dataset 3 after Nyström and SVD transformation.

Compared with Fig. 4(c), the data in Fig. 6 can be covered with fewer hyper-spheres, or, equivalently, each hyper-sphere can enclose more data points. Because the sampling points of the Nyström method can be obtained dynamically, the projection in (8) can be computed for every single instance in competitive learning and can therefore be applied directly to our incremental model.

Without loss of generality, we use φ(x) to denote a potential projection of x in the remainder of this paper. If the model works in the original space, the projection of x is x itself.
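As a concrete illustration of (6), the following sketch precomputes Λ^{-1/2} V^T from the kernel matrix over the landmark points and then maps any instance to its Nyström features. It is only a minimal sketch under stated assumptions (an RBF base kernel, the common-math3 linear algebra classes mentioned in Section IV, and illustrative names such as NystromMap and gamma); it is not the paper's implementation.

import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.EigenDecomposition;
import org.apache.commons.math3.linear.RealMatrix;

/** Minimal sketch of the Nystrom feature map in (6); names are illustrative. */
class NystromMap {
    private final double[][] landmarks;   // x_1 ... x_m (representative points)
    private final double gamma;           // RBF kernel width parameter
    private final RealMatrix projection;  // Lambda^{-1/2} V^T, precomputed once

    NystromMap(double[][] landmarks, double gamma) {
        this.landmarks = landmarks;
        this.gamma = gamma;
        int m = landmarks.length;
        // A_11: kernel matrix over the landmark points
        double[][] a11 = new double[m][m];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < m; j++)
                a11[i][j] = rbf(landmarks[i], landmarks[j]);
        // Eigendecomposition A_11 = V Lambda V^T, so A_11^+ = V Lambda^{-1} V^T
        EigenDecomposition eig = new EigenDecomposition(new Array2DRowRealMatrix(a11));
        double[] lambda = eig.getRealEigenvalues();
        RealMatrix vT = eig.getVT();
        // Build Lambda^{-1/2} V^T, dropping near-zero eigenvalues (pseudo-inverse)
        double[][] proj = new double[m][m];
        for (int i = 0; i < m; i++) {
            double s = lambda[i] > 1e-12 ? 1.0 / Math.sqrt(lambda[i]) : 0.0;
            for (int j = 0; j < m; j++)
                proj[i][j] = s * vT.getEntry(i, j);
        }
        this.projection = new Array2DRowRealMatrix(proj);
    }

    /** phi~_m(x) = Lambda^{-1/2} V^T (k(x, x_1), ..., k(x, x_m))^T */
    double[] transform(double[] x) {
        double[] kx = new double[landmarks.length];
        for (int i = 0; i < landmarks.length; i++)
            kx[i] = rbf(x, landmarks[i]);
        return projection.operate(kx);
    }

    private double rbf(double[] a, double[] b) {
        double d2 = 0.0;
        for (int i = 0; i < a.length; i++) d2 += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.exp(-gamma * d2);
    }
}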
A. Training Stages

Our algorithms are trained in three stages, which are described below.

Stage 1 (Forming Hyper-Spheres and Adjusting Centroids and Radii):
1) Forming Hyper-Spheres and Adjusting Centroids: Given that instances are read dynamically, there is no hyper-sphere at the beginning. The first input instance forms a hyper-sphere whose centroid is the instance itself and whose initial radius is set to a large value. When a new instance is input and does not fall into any existing hyper-sphere, a new hyper-sphere is formed in the same way. If a new instance falls into one or more existing hyper-spheres, the winner is the one whose centroid is closest to the new instance. The winning cluster's centroid is recalculated as

c_i(t + 1) = c_i(t) + α[φ(x) − c_i(t)]    (9)

where x is the new input instance, c_i(t) is the original centroid of the hyper-sphere, c_i(t + 1) is the new centroid, and α is the learning rate. As the number of instances that fall within a particular hyper-sphere grows, its centroid tends to move toward the densest zone. In order to speed up the search for the winner, we build simple k-dimension trees for all hyper-spheres. With knowledge of the radius, it is easy to figure out the upper and lower bounds of the selected k dimensions. In this way, the extensive computation of Euclidean distances for all instance and hyper-sphere pairs is avoided.
2) Building the Decision Border Zone (DMZ): The goal of this step is to find the DMZ's median points, which approximate the shape of the DMZ. We find the points using the following technique. The first time a labeled instance falls into a hyper-sphere, the hyper-sphere is labeled with the label of this instance. If another instance with a conflicting label falls into the same hyper-sphere, it indicates that the hyper-sphere has entered the DMZ. We identify the data point in the hyper-sphere nearest to the newly input conflicting instance, and let p_i represent the median point as follows:

p_i = [φ(x_conflicting) + c_i] / 2    (10)

where φ(x_conflicting), p_i ∈ c_i, and p_i is recorded and used in the posterior clustering process.
3) Adjusting the Radii of Hyper-Spheres: Once a DMZ point is found in a hyper-sphere, the radius of the hyper-sphere should be updated so that it does not enter the DMZ. The new radius of hyper-sphere c_i should therefore be set as

r_i = d(p_i, c_i) − d_safe    (11)

The three update rules (9)-(11) are illustrated by the sketch below.
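The following is a minimal Java sketch of the update rules (9)-(11), assuming projected instances are plain double[] vectors. The class and field names (HyperSphere, alpha, dSafe) are illustrative, not taken from the AdaHS code.

/** Minimal sketch of update rules (9)-(11); names are illustrative, not from the paper. */
class HyperSphere {
    double[] centroid;
    double radius;
    String label;              // set by the first labeled instance falling inside
    java.util.List<double[]> dmzPoints = new java.util.ArrayList<>();

    HyperSphere(double[] firstInstance, double initialRadius) {
        this.centroid = firstInstance.clone();   // the centroid is the instance itself
        this.radius = initialRadius;             // initial radius set to a large value
    }

    boolean contains(double[] phiX) {
        return euclidean(phiX, centroid) <= radius;
    }

    /** (9): c_i(t+1) = c_i(t) + alpha * (phi(x) - c_i(t)) */
    void moveCentroid(double[] phiX, double alpha) {
        for (int d = 0; d < centroid.length; d++)
            centroid[d] += alpha * (phiX[d] - centroid[d]);
    }

    /** (10): p_i = (phi(x_conflicting) + c_i) / 2, recorded as a DMZ median point. */
    double[] recordDmzPoint(double[] phiConflicting) {
        double[] p = new double[centroid.length];
        for (int d = 0; d < centroid.length; d++)
            p[d] = 0.5 * (phiConflicting[d] + centroid[d]);
        dmzPoints.add(p);
        return p;
    }

    /** (11): r_i = d(p_i, c_i) - d_safe */
    void shrinkRadius(double[] dmzPoint, double dSafe) {
        radius = euclidean(dmzPoint, centroid) - dSafe;
    }

    static double euclidean(double[] a, double[] b) {
        double s = 0.0;
        for (int d = 0; d < a.length; d++) s += (a[d] - b[d]) * (a[d] - b[d]);
        return Math.sqrt(s);
    }
}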
Fig. 8. Convergence test on all of the datasets. (a) Dataset 1. (b) Dataset 2. (c) Dataset 3. (d) Iris. (e) Seeds. (f) Segment. (g) Wholesale. (h) Glass.
(i) Diabetes. (j) Wine. (k) Credit-g. (l) Credit-rating. (m) Phishing websites. (n) Credit card. (o) Pendigits. (p) Shuttle. (q) Occupancy. (r) HAPT. (s) Loans.
(t) URLs.
In each mapper task, the operations are based on instances. For every instance, the mapper queries the local cache to find out into which hyper-spheres the instance falls, marks the winning hyper-sphere and the conflicting ones, and sends the hyper-spheres, along with a description of the needed operations, in the form of key-value <id, hyper-sphere> pairs.
In each reducer task, the operations are based on a single hyper-sphere, whose instances are aggregated according to the hyper-sphere id emitted from the mapper tasks. The competitive learning can be conducted collectively with the aggregated instances. The tuning of a radius needs to be performed only once, using the closest conflicting instance, and the reducer should identify the orphan points and return the tuned hyper-sphere at the end.
After one turn of the MapReduce tasks, the merging and selection of the hyper-spheres should be performed. After all of the operations, the tuned hyper-spheres should be saved to the cache. The orphan points should be retrained in the next turn. In the whole MapReduce process, subtasks do not coordinate with each other. Thus, the hyper-spheres and the DMZ are not updated in real time within a mini-batch turn; they are updated collectively after all reducer tasks return.
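The following single-JVM sketch mimics one mini-batch turn of the mapper/reducer split described above using plain Java collections. It assumes the HyperSphere sketch shown earlier and omits the conflict marking and radius tuning; it is an illustration of the key-value grouping, not the actual Hazelcast-based implementation.

import java.util.*;

/** Single-JVM sketch of one mini-batch turn; illustrative names, not the Hazelcast code. */
class MiniBatchTurn {
    /** "Mapper" side: group instances by the id of the winning hyper-sphere. */
    static Map<Integer, List<double[]>> mapPhase(List<double[]> instances,
                                                 List<HyperSphere> cachedSpheres,
                                                 List<double[]> orphans) {
        Map<Integer, List<double[]>> bySphereId = new HashMap<>();
        for (double[] phiX : instances) {
            int winnerId = -1;
            double best = Double.MAX_VALUE;
            for (int id = 0; id < cachedSpheres.size(); id++) {
                HyperSphere s = cachedSpheres.get(id);
                double d = HyperSphere.euclidean(phiX, s.centroid);
                if (s.contains(phiX) && d < best) { best = d; winnerId = id; }
            }
            if (winnerId >= 0)
                bySphereId.computeIfAbsent(winnerId, key -> new ArrayList<>()).add(phiX);
            else
                orphans.add(phiX);   // orphan points are retrained in the next turn
        }
        return bySphereId;
    }

    /** "Reducer" side: competitive learning on the instances aggregated per sphere. */
    static void reducePhase(Map<Integer, List<double[]>> bySphereId,
                            List<HyperSphere> cachedSpheres, double alpha) {
        for (Map.Entry<Integer, List<double[]>> entry : bySphereId.entrySet()) {
            HyperSphere sphere = cachedSpheres.get(entry.getKey());
            for (double[] phiX : entry.getValue())
                sphere.moveCentroid(phiX, alpha);   // (9), applied collectively
            // radius tuning with the closest conflicting instance would go here
        }
    }
}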
C. Predicting Labels

Just like other supervised competitive neural networks, AdaHS must determine the winning hyper-sphere in the hidden layer to predict the label of a new instance. There are two situations. In the first situation, the new instance falls into an existing hyper-sphere, and the label of the instance is determined by the label of that hyper-sphere. In the second situation, the new instance does not fall into any existing hyper-sphere, and the label of the new instance is coordinated by the labels of the k nearest hyper-spheres

y = arg max_{l_j} Σ_{c_i ∈ N_k(x)} w_j I(y_i = l_j)    (12)

where w_j = exp(−d_E(φ(x), c_j)² / (2 r_j²)); i = 1, 2, ..., L; j = 1, 2, ..., k; N_k(x) is the set of the k nearest hyper-spheres; and I is the indicator function. The default value of k is set to 3.
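A sketch of this two-situation prediction logic, again assuming the HyperSphere sketch shown earlier, is given below; situation 1 returns the label of the winning hyper-sphere, and situation 2 applies the weighted vote of (12) with the default k = 3.

import java.util.*;

/** Illustrative sketch of label prediction; not the actual AdaHS code. */
class LabelPredictor {
    static String predict(double[] phiX, List<HyperSphere> spheres, int k) {
        // Situation 1: the instance falls into an existing hyper-sphere;
        // take the closest containing sphere as the winner.
        HyperSphere winner = null;
        double best = Double.MAX_VALUE;
        for (HyperSphere s : spheres) {
            double d = HyperSphere.euclidean(phiX, s.centroid);
            if (s.contains(phiX) && d < best) { best = d; winner = s; }
        }
        if (winner != null) return winner.label;

        // Situation 2: weighted vote over the k nearest hyper-spheres,
        // with w_j = exp(-d_E(phi(x), c_j)^2 / (2 r_j^2)) as in (12).
        List<HyperSphere> sorted = new ArrayList<>(spheres);
        sorted.sort(Comparator.comparingDouble(
                (HyperSphere s) -> HyperSphere.euclidean(phiX, s.centroid)));
        Map<String, Double> votes = new HashMap<>();
        for (HyperSphere s : sorted.subList(0, Math.min(k, sorted.size()))) {
            double d = HyperSphere.euclidean(phiX, s.centroid);
            double w = Math.exp(-(d * d) / (2.0 * s.radius * s.radius));
            votes.merge(s.label, w, Double::sum);
        }
        return Collections.max(votes.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }
}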
IV. EXPERIMENTS

We implemented our classifier in Java, with the help of third-party Jars including common-math3, weka, and joptimizer, and a local caching framework. The distributed MapReduce implementation of AdaHS was built upon Hazelcast [41], which also provides a distributed caching system. Most experiments were conducted on a computer with an i7-4560U (4 CPUs, 2 GHz), 8 GB RAM, and Ubuntu OS. The distributed deployment of AdaHS was conducted on clusters of two and four machines, respectively, using the same configuration.

A. Benchmark Datasets

To evaluate AdaHS, we used 20 datasets as the benchmarks. Among them, three were the 2-D artificial datasets
TABLE IV
NUMBER OF HYPER-SPHERES

TABLE V
DETAILS OF THE ACCURACY (%) IN SITUATIONS I AND II

be easily hyper-spherically clustered, such as dataset 2, distortion error values converge to the minimum after relatively more iterations of training. As shown in Fig. 4(b), the borders are relatively complex, so it took more hyper-spheres to enclose the entire set of instances and more time to converge. On some datasets, such as iris, segment, wholesale, credit-rating, and HAPT, the classifier converged after several small oscillations.
Results of the convergence tests on all 20 datasets showed that, given enough hyper-spheres and under the constraint that all instances in the same cluster have the same label, the competitive learning was able to provide a clustering solution no matter how complex and irregular the decision border is and regardless of the data distribution.

E. Performance Evaluation

Tenfold cross-validation was used to test the accuracy of AdaHS. To examine the details of the resulting predictions, performance on the two types of prediction was studied separately.
1) Two Types of Prediction: As discussed in Section III, our algorithm may be confronted with two situations in prediction, i.e., either there is an explicit winning hyper-sphere or the instance does not fall into any existing hyper-sphere. Experiments on the 20 datasets showed that the prediction accuracies of the two situations varied. Accuracies in the first situation were much higher than in the second situation, as shown in Table V.
2) Accuracy and Time Cost Comparison: To evaluate the relative performance of AdaHS, we selected several other well-known algorithms for comparison, including naïve Bayes, LDA, SVM, C4.5, RBFN, and other incremental learning algorithms. Both accuracy and time cost were recorded. The comparative results are shown in Tables VI and VII.
Indices in Tables VI and VII show that C4.5 performed best on the datasets phishing_sites and loans, whose attributes are mostly nominal. LDA and L-SVM performed well on datasets with a globally consistent pattern, such as iris, seeds, wine, and wholesale, but performed poorly on datasets 1–3, segment, glass, and URLs. k-NN, k-SVM, and LWL were slow on large-scale datasets such as loans and URLs. Kernel methods improved the performance on most datasets, both in Nys-AdaHS and in k-SVM.
AdaHS fit quite well on specific datasets, such as datasets 1–3, shuttle, occupancy, loans, and URLs, while maintaining an acceptable accuracy on the other datasets. As a local model, AdaHS works well on datasets which are linearly inseparable. In addition, because of the built-in clustering mechanism, its accuracy is comparable to k-NN and even to SVM with kernel methods, but it is free from slow searching speed and excessive memory consumption.
We observed slightly lower performance of AdaHS and k-NN on diabetes, wine, credit-g, and credit-rating. This is due to the "bad distance metrics" noted in [31], which are crucial to distance-based models and imply that the features are not selected or scaled properly. Besides, one assumption of AdaHS is that instances in local areas can be clustered well. If this assumption is violated, such as in occupancy and loans, AdaHS will retrogress toward k-NN: there will be too many clusters and DMZ points in memory, and the searching speed will drop accordingly.

F. Discussion

1) Time Complexity and Space Complexity: It is obvious that the time costs of Algorithms 1–3 are n × m, m², and m, where n is the number of data points and m is the number of clusters. The original form of the time complexity is O(nm + m² + m). If the number of clusters m is constant, the total computational cost is O(n). That means that if the assumption of "clustering" holds, AdaHS runs in linear time. The data kept in memory are the clusters and the DMZ information, so the space cost is O(m + l), where l is the data size of the DMZ.
In Nys-AdaHS, the time cost of SOFM is O(nk), where k is the cluster number of SOFM and the target dimension of the Nyström method. The time cost of SVD on A_11^+ in (4) is O(rk²), where r is the rank of A_11^+ [40], and the multiplication with the vector in (6) also takes O(nk). So the total time cost is O(rk² + 2nk + nm + m² + m). With the cluster centers of SOFM kept in memory, the space cost of Nys-AdaHS is O(k + m + l).
It can be observed from Table VII that the time cost of Nys-AdaHS is far less than that of k-SVM, especially on the last seven datasets, because the computation of the kernel matrix in k-SVM takes O(n²), which makes it hardly feasible in real applications. For example, on URLs, k-SVM took 6.085E5 s (about seven days), which is not realistic in practice, while Nys-AdaHS took only 894 s.
2) Significance of the Nyström Method for AdaHS: Our motivation for applying kernel methods to AdaHS is not exactly the same as SVM's. Based on the data in Tables IV, VI, and VII, the benefits of the Nyström method for AdaHS are summarized as follows.
TABLE VI
ACCURACY (%) COMPARISON WITH OTHER ALGORITHMS

TABLE VII
TIME COST (SECONDS) COMPARISON WITH OTHER ALGORITHMS
a) Improving classification accuracy: On datasets 1–3, segment, wholesale, glass, and loans, kernel methods improved the accuracy of SVM dramatically. That is because the RBF kernel brings a local learning ability to SVM and improves the accuracy of SVM on datasets that are not linearly separable. AdaHS does not rely as heavily on kernel methods as SVM does in terms of accuracy, because AdaHS is a local model. Nevertheless, if the kernel method can pull dissimilar points apart and make them linearly separable in the new space [16], it makes it easier for the hyper-spheres to enclose the points and classify them. The effect of this benefit can be observed on most of the datasets in Table VI, for most accuracies were improved slightly.
b) Increasing hyper-spheres' ability for data definition: It can be observed from Table IV that the number of clusters was reduced significantly on all datasets with the Nyström method. That means each hyper-sphere in the new space can enclose more data points. In SVDD, this phenomenon was stated as "kernel methods increase the hyper-sphere's ability for data definition" [18]. The reason is that AdaHS is a clustering-based method and each cluster now contains more information. This is especially useful when we analyze the evolving trends or the similar instances contained in the same cluster.
c) Improving the training speed: The Nyström method does add a time cost of O(2nk + rk²). However, by projecting data points to a new space with a simpler distribution, the number of clusters can be reduced significantly, and the time cost saved from this benefit is O((m1 − m2)(n + m1 + m2 + 1)), where m1 and m2 refer to the number of clusters in AdaHS and Nys-AdaHS, respectively. So there is a tradeoff between the two terms. On large-scale datasets with a complex data distribution (i.e., where the original number of clusters is very large, as on credit card, occupancy, loans, and URLs), the total learning time of Nys-AdaHS could be reduced.
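To make this tradeoff concrete, the following back-of-the-envelope comparison uses hypothetical values (n = 10^6, k = r = 100, m1 = 2000, m2 = 200; none of these are taken from Table IV):

% Hypothetical values only; not from the paper's experiments.
\[
\text{extra cost} \approx 2nk + rk^{2} = 2\times10^{8} + 10^{6} \approx 2\times10^{8},
\]
\[
\text{saved cost} \approx (m_{1}-m_{2})(n + m_{1} + m_{2} + 1) \approx 1800 \times 10^{6} = 1.8\times10^{9}.
\]

Under these assumed values, the saved clustering cost exceeds the added projection cost by roughly an order of magnitude, which is the regime in which the total learning time of Nys-AdaHS decreases.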
V. CONCLUSION

To deal with dynamic data and changing patterns, this paper proposed a new algorithm, AdaHS, which incorporates the adaptivity of competitive neural networks and the idea of building a border zone. It has a strong capability for local learning. By keeping only cluster and DMZ information in memory, it avoids the problem of excessive memory consumption and improves the searching speed dramatically. The experiments using 20 datasets showed that AdaHS is especially suitable for datasets whose patterns are changing, whose decision borders are complex, and whose instances with the same label can be spherically clustered. AdaHS has great potential in fields such as anti-fraud analysis, network intrusion detection, the stock market, and credit scoring.

AdaHS is proposed as a classifier to deal with changing patterns, which is a subtopic of system uncertainties [44]–[47]. System uncertainties theory has great significance for many important applications such as "actuator dynamics" [46], "multiagent-based systems" [47], and "nonlinear systems" [47]. One of our future works will take these problems into consideration and explore the potential applications to those areas.

REFERENCES

[1] A. Bouchachia, "Adaptation in classification systems," in Foundations of Computational Intelligence, vol. 2. Heidelberg, Germany: Springer, 2009, pp. 237–258.
[2] G. Kou, Y. Peng, and G. Wang, "Evaluation of clustering algorithms for financial risk analysis using MCDM methods," Inf. Sci., vol. 275, pp. 1–12, Aug. 2014.
[3] G. Kou, Y. Peng, Y. Shi, Z. Chen, and X. Chen, "A multiple-criteria quadratic programming approach to network intrusion detection," in Data Mining and Knowledge Management, vol. 3327. Heidelberg, Germany: Springer, 2005, pp. 145–153. [Online]. Available: https://link.springer.com/chapter/10.1007%2F978-3-540-30537-8_16#citeas
[4] Y. Huang and G. Kou, "A kernel entropy manifold learning approach for financial data analysis," Decis. Support Syst., vol. 64, pp. 31–42, Aug. 2014.
[5] Z.-H. Zhou and Z.-Q. Chen, "Hybrid decision tree," Knowl. Based Syst., vol. 15, no. 8, pp. 515–528, 2002.
[6] C. Alippi, D. Liu, D. Zhao, and L. Bu, "Detecting and reacting to changes in sensing units: The active classifier case," IEEE Trans. Syst., Man, Cybern., Syst., vol. 44, no. 3, pp. 353–362, Mar. 2014.
[7] V. Bruni and D. Vitulano, "An improvement of kernel-based object tracking based on human perception," IEEE Trans. Syst., Man, Cybern., Syst., vol. 44, no. 11, pp. 1474–1485, Nov. 2014.
[8] H. He, S. Chen, K. Li, and X. Xu, "Incremental learning from stream data," IEEE Trans. Neural Netw., vol. 22, no. 12, pp. 1901–1914, Dec. 2011.
[9] M. Pratama, S. G. Anavatti, P. P. Angelov, and E. Lughofer, "PANFIS: A novel incremental learning machine," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 1, pp. 55–68, Jan. 2014.
[10] L. L. Minku, A. P. White, and X. Yao, "The impact of diversity on online ensemble learning in the presence of concept drift," IEEE Trans. Knowl. Data Eng., vol. 22, no. 5, pp. 730–742, May 2010.
[11] Y. Guo, W. Zhou, C. Luo, C. Liu, and H. Xiong, "Instance-based credit risk assessment for investment decisions in P2P lending," Eur. J. Oper. Res., vol. 249, no. 2, pp. 417–426, 2016.
[12] C.-H. Chen, "Feature selection for clustering using instance-based learning by exploring the nearest and farthest neighbors," Inf. Sci., vol. 318, pp. 14–27, Oct. 2015.
[13] R. Spring and A. Shrivastava, "Scalable and sustainable deep learning via randomized hashing," in Proc. ACM SIGKDD, Halifax, NS, Canada, 2017, pp. 445–454.
[14] A. Andoni and P. Indyk, "Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions," Commun. ACM, vol. 51, no. 1, pp. 117–122, 2008.
[15] P. Laskov, C. Gehl, S. Krüger, and K.-R. Müller, "Incremental support vector learning: Analysis, implementation and applications," J. Mach. Learn. Res., vol. 7, pp. 1909–1936, Sep. 2006.
[16] S. Agarwal, V. V. Saradhi, and H. Karnick, "Kernel-based online machine learning and support vector reduction," Neurocomputing, vol. 71, nos. 7–9, pp. 1230–1237, 2008.
[17] B. Li, M. Chi, J. Fan, and X. Xue, "Support cluster machine," in Proc. ICML, Corvallis, OR, USA, 2007, pp. 505–512.
[18] G. Chen, X. Zhang, Z. J. Wang, and F. Li, "Robust support vector data description for outlier detection with noise or uncertain data," Knowl. Based Syst., vol. 90, pp. 129–137, Dec. 2015.
[19] G. Huang, H. Chen, Z. Zhou, F. Yin, and K. Guo, "Two-class support vector data description," Pattern Recognit., vol. 44, no. 2, pp. 320–329, 2011.
[20] Z. Uykan, C. Guzelis, M. E. Celebi, and H. N. Koivo, "Analysis of input–output clustering for determining centers of RBFN," IEEE Trans. Neural Netw., vol. 11, no. 4, pp. 851–858, Jul. 2000.
[21] B. Chandra and M. Gupta, "A novel approach for distance-based semi-supervised clustering using functional link neural network," Soft Comput., vol. 17, no. 3, pp. 369–379, 2013.
[22] S. Grossberg, "Adaptive resonance theory: How a brain learns to consciously attend, learn, and recognize a changing world," Neural Netw., vol. 37, pp. 1–47, Jan. 2013.
[23] G. A. Carpenter, S. Grossberg, and J. H. Reynolds, "ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network," Neural Netw., vol. 4, no. 5, pp. 565–588, 1991.
[24] J. R. Williamson, "Gaussian ARTMAP: A neural network for fast incremental learning of noisy multidimensional maps," Neural Netw., vol. 9, no. 5, pp. 881–897, 1996.
[25] B. Vigdor and B. Lerner, "The Bayesian ARTMAP," IEEE Trans. Neural Netw., vol. 18, no. 6, pp. 1628–1644, Nov. 2007.
[26] R. Hecht-Nielsen, "Counter propagation networks," Appl. Opt., vol. 26, no. 23, pp. 4979–4983, 1987.
[27] Y. Dong, M. Shao, and X. Tai, "An adaptive counter propagation network based on soft competition," Pattern Recognit. Lett., vol. 29, no. 7, pp. 938–949, 2008.
[28] P. Schneider, M. Biehl, and B. Hammer, "Adaptive relevance matrices in learning vector quantization," Neural Comput., vol. 21, no. 12, pp. 3532–3561, 2009.
[29] T. Kohonen, "Self-organized formation of topologically correct feature maps," Biol. Cybern., vol. 43, no. 1, pp. 59–69, 1982.
[30] C. Xiao and W. A. Chaovalitwongse, "Optimization models for feature selection of decomposed nearest neighbor," IEEE Trans. Syst., Man, Cybern., Syst., vol. 46, no. 2, pp. 177–184, Feb. 2016.
[31] R. A. Valente and T. Abrão, "MIMO transmit scheme based on morphological perceptron with competitive learning," Neural Netw., vol. 80, pp. 9–18, Apr. 2016.
[32] Q. Dai and G. Song, "A novel supervised competitive learning algorithm," Neurocomputing, vol. 191, pp. 356–362, May 2016.
[33] N. N. Schraudolph, J. Yu, and S. Günter, "A stochastic quasi-Newton method for online convex optimization," J. Mach. Learn. Res., pp. 436–443, 2007.
[34] W. Bian and D. Tao, "Constrained empirical risk minimization framework for distance metric learning," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 8, pp. 1194–1205, Aug. 2012.
[35] K. Chen and S. H. Wang, "Semi-supervised learning via regularized boosting working on multiple semi-supervised assumptions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 1, pp. 129–143, Jan. 2011.
[36] M. Zia-ur Rehman, T. Li, Y. Yang, and H. Wang, "Hyper-ellipsoidal clustering technique for evolving data stream," Knowl. Based Syst., vol. 70, pp. 3–14, Nov. 2014.
[37] J. Lai and C. Wang, "Kernel and graph: Two approaches for nonlinear competitive learning clustering," Front. Elect. Electron. Eng., vol. 7, no. 1, pp. 134–146, 2012.
[38] J.-S. Wu, W.-S. Zheng, and J.-H. Lai, "Approximate kernel competitive learning," Neural Netw., vol. 63, pp. 117–132, Mar. 2015.
[39] S. Kumar, M. Mohri, and A. Talwalkar, "Sampling methods for the Nyström method," J. Mach. Learn. Res., vol. 13, pp. 981–1006, Apr. 2012.
[40] J. Lu, S. C. H. Hoi, J. Wang, P. Zhao, and Z.-Y. Liu, "Large scale online kernel learning," J. Mach. Learn. Res., vol. 17, no. 47, pp. 1–43, 2016.
[41] Hazelcast. Hazelcast: The Leading In-Memory Data Grid. Accessed: Apr. 5, 2016. [Online]. Available: http://hazelcast.com
[42] UC Irvine Machine Learning Repository. Accessed: Dec. 13, 2015. [Online]. Available: http://archive.ics.uci.edu/ml/index.php
[43] LibSVM. LIBSVM Data: Classification, Regression, and Multi-Label. Accessed: Dec. 16, 2015. [Online]. Available: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
[44] C. Chen et al., "Adaptive fuzzy asymptotic control of MIMO systems with unknown input coefficients via a robust Nussbaum gain-based approach," IEEE Trans. Fuzzy Syst., vol. 25, no. 5, pp. 1252–1263, Oct. 2017.
[45] Z. Liu, G. Lai, Y. Zhang, X. Chen, and C. L. P. Chen, "Adaptive neural control for a class of nonlinear time-varying delay systems with unknown hysteresis," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 12, pp. 2129–2140, Dec. 2014.
[46] C. Chen, Z. Liu, Y. Zhang, C. L. P. Chen, and S. L. Xie, "Saturated Nussbaum function based approach for robotic systems with unknown actuator dynamics," IEEE Trans. Cybern., vol. 46, no. 10, pp. 2311–2322, Oct. 2016.
[47] C. Chen et al., "Adaptive consensus of nonlinear multi-agent systems with non-identical partially unknown control directions and bounded modelling errors," IEEE Trans. Autom. Control, vol. 62, no. 9, pp. 4654–4659, Sep. 2017.

Gang Kou received the B.S. degree in physics from Tsinghua University, Beijing, China, and the M.S. degree in computer science and the Ph.D. degree in information technology from the University of Nebraska at Omaha, Omaha, NE, USA.
He is a Distinguished Professor of the Chang Jiang Scholars Program and the Executive Dean of the School of Business Administration, Southwestern University of Finance and Economics, Chengdu, China.
Dr. Kou is the Managing Editor of the International Journal of Information Technology and Decision Making and the Editor-in-Chief of the Springer book series on Quantitative Management.

Yi Peng received the B.S. degree in management information systems from Sichuan University, Chengdu, China, in 1997, and the M.S. degree in management information systems and the Ph.D. degree in information technology from the University of Nebraska at Omaha, Omaha, NE, USA, in 2007.
From 2007 to 2011, she was an Assistant Professor with the School of Management and Economics, University of Electronic Science and Technology of China, Chengdu, China, where she has been a Professor since 2011. She has authored three books and over 100 articles. Her current research interests include data mining, multiple criteria decision making, and data mining applications.