Broad Learning System: An Effective and Efficient Incremental Learning System Without the Need for Deep Architecture
where f(w) = ‖Zw − x‖_2^2 and g(w) = λ‖w‖_1. In ADMM form, the above problem can be rewritten as

    arg min_w : f(w) + g(o),  s.t.  w − o = 0.    (6)

Therefore, the proximal problem can be solved by the following iterative steps:

    w_{k+1} := (Z^T Z + ρI)^{-1} (Z^T x + ρ(o_k − u_k))
    o_{k+1} := S_{λ/ρ}(w_{k+1} + u_k)    (7)
    u_{k+1} := u_k + (w_{k+1} − o_{k+1})

where ρ > 0 and S is the soft-thresholding operator, which is defined as

    S_κ(a) = a − κ,   if a > κ
             0,       if |a| ≤ κ    (8)
             a + κ,   if a < −κ.
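For readers who want to experiment with this fine-tuning step, the following is a minimal NumPy sketch of the iteration in (7) together with the soft-thresholding operator (8). The function names, the default values of λ and ρ, and the fixed iteration count are illustrative choices of ours, not values taken from the paper.

    import numpy as np

    def soft_threshold(a, kappa):
        # Elementwise soft-thresholding operator S_kappa from (8).
        return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)

    def admm_sparse_solve(Z, x, lam=1e-3, rho=1.0, n_iter=50):
        # Solve argmin_w ||Z w - x||_2^2 + lam * ||w||_1 by the steps in (7).
        # Z: m x n matrix, x: m x t matrix (use a one-column x for a single target).
        m, n = Z.shape
        w = np.zeros((n, x.shape[1]))
        o = np.zeros_like(w)
        u = np.zeros_like(w)
        # Precompute the pieces reused in every w-update.
        G = Z.T @ Z + rho * np.eye(n)
        Ztx = Z.T @ x
        for _ in range(n_iter):
            w = np.linalg.solve(G, Ztx + rho * (o - u))   # w-update of (7)
            o = soft_threshold(w + u, lam / rho)          # o-update of (7) via (8)
            u = u + (w - o)                               # u-update of (7)
        return o                                          # sparse solution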
D. Singular Value Decomposition

Now comes a highlight of linear algebra. Any real m × n matrix A can be factored as

    A = UΣV^T

where U is an m × m orthogonal matrix whose columns are the eigenvectors of AA^T, V is an n × n orthogonal matrix whose columns are the eigenvectors of A^T A, and Σ is an m × n diagonal matrix of the form

    Σ = diag{σ_1, ..., σ_r, 0, ..., 0}

with σ_1 ≥ σ_2 ≥ ··· ≥ σ_r > 0 and r = rank(A). Moreover, σ_1, ..., σ_r are the square roots of the eigenvalues of A^T A; they are called the singular values of A. Therefore, we achieve a decomposition of the matrix A, which is one of a number of effective numerical analysis tools used to analyze matrices. In our algorithms, two different ways of reducing the size of a matrix are involved. In the first, the threshold parameter η is set with 0 < η ≤ 1, which means that the components associated with singular values σ_i ≥ ησ_1 are kept. In the second, a fixed number l of singular values is selected, where l is smaller than n. Define a threshold value ε, which is η for case 1 and l for case 2. In practice, either case may occur, depending on the requirements. The SVD technique is well known for its advantages in feature selection.
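As a small illustration of the two reduction modes, the helper below (our own sketch, not code from the paper) truncates an SVD either by the threshold η or by a fixed number l of singular values.

    import numpy as np

    def truncated_svd(A, eta=None, l=None):
        # Keep either the singular values with sigma_i >= eta * sigma_1 (case 1)
        # or a fixed number l of leading singular values (case 2).
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        if eta is not None:
            r = int(np.sum(s >= eta * s[0]))
        elif l is not None:
            r = l
        else:
            raise ValueError("pass exactly one of eta or l")
        return U[:, :r], s[:r], Vt[:r, :]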
III. BROAD LEARNING SYSTEM

In this section, the details of the proposed BLS are given. First, the model construction that is suitable for broad expansion is introduced; the second part is the incremental learning for the dynamic expansion of the model. The two characteristics result in a complete system. Finally, model simplification using SVD is presented.

A. Broad Learning Model

The proposed BLS is constructed based on the traditional RVFLNN. However, unlike the traditional RVFLNN, which takes the input directly and establishes the enhancement nodes, we first map the inputs to construct a set of mapped features. In addition, we also develop incremental learning algorithms that can update the system dynamically.

Assume that we present the input data X and project the data, using φ_i(X W_{ei} + β_{ei}), to become the ith mapped features, Z_i, where W_{ei} are the random weights with the proper dimensions. Denote Z^i ≡ [Z_1, ..., Z_i], which is the concatenation of the first i groups of mapped features. Similarly, the jth group of enhancement nodes, ξ_j(Z^i W_{hj} + β_{hj}), is denoted as H_j, and the concatenation of the first j groups of enhancement nodes is denoted as H^j ≡ [H_1, ..., H_j]. In practice, i and j can be selected differently depending upon the complexity of the modeling tasks. Furthermore, φ_i and φ_k can be different functions for i ≠ k. Similarly, ξ_j and ξ_r can be different functions for j ≠ r. Without loss of generality, the subscripts of the ith random mapping φ_i and the jth random mapping ξ_j are omitted in this paper.

In our BLS, to take advantage of the sparse autoencoder characteristics, we apply the linear inverse problem in (7) and fine-tune the initial W_{ei} to obtain better features. Next, the details of the algorithm are given below.

Assume the input data set X has N samples, each with M dimensions, and Y ∈ R^{N×C} is the output matrix. For n feature mappings, each generating k nodes, the mapped features can be represented as

    Z_i = φ(X W_{ei} + β_{ei}),  i = 1, ..., n    (9)

where W_{ei} and β_{ei} are randomly generated. Denote all the feature nodes as Z^n ≡ [Z_1, ..., Z_n], and denote the mth group of enhancement nodes as

    H_m ≡ ξ(Z^n W_{hm} + β_{hm}).    (10)

Hence, the broad model can be represented as

    Y = [Z_1, ..., Z_n | ξ(Z^n W_{h1} + β_{h1}), ..., ξ(Z^n W_{hm} + β_{hm})] W^m
      = [Z_1, ..., Z_n | H_1, ..., H_m] W^m
      = [Z^n | H^m] W^m

where W^m = [Z^n | H^m]^+ Y are the connecting weights of the broad structure; they can be computed easily through the ridge regression approximation of [Z^n | H^m]^+ using (3). Fig. 4(a) shows the above broad learning network.
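To make the construction concrete, here is a rough NumPy sketch of (9), (10), and the output-weight computation. For illustration only, φ and ξ are both taken as tanh, the sparse-autoencoder fine-tuning of W_{ei} is omitted, a ridge-regularized solve stands in for the pseudoinverse approximation that the paper computes via (3), and all function and parameter names are ours.

    import numpy as np

    rng = np.random.default_rng(0)

    def bls_train(X, Y, n=10, k=10, m=1, q=100, lam=1e-8):
        # Minimal sketch of the broad model: n groups of k mapped feature
        # nodes and m groups of q enhancement nodes; Y is the N x C output matrix.
        N, M = X.shape
        We = [rng.standard_normal((M, k)) for _ in range(n)]
        be = [rng.standard_normal((1, k)) for _ in range(n)]
        Z = np.hstack([np.tanh(X @ We[i] + be[i]) for i in range(n)])   # Z^n, phi = tanh here
        Wh = [rng.standard_normal((n * k, q)) for _ in range(m)]
        bh = [rng.standard_normal((1, q)) for _ in range(m)]
        H = np.hstack([np.tanh(Z @ Wh[j] + bh[j]) for j in range(m)])   # H^m, xi = tanh here
        A = np.hstack([Z, H])                                           # A = [Z^n | H^m]
        # Ridge-regularized least squares for W^m = A^+ Y.
        W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
        return We, be, Wh, bh, W

    def bls_predict(X, We, be, Wh, bh, W):
        Z = np.hstack([np.tanh(X @ w + b) for w, b in zip(We, be)])
        H = np.hstack([np.tanh(Z @ w + b) for w, b in zip(Wh, bh)])
        return np.hstack([Z, H]) @ W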
B. Alternative Enhancement Nodes Establishment

In the previous section, the broad expansion of enhancement nodes is added synchronously with the connections from the mapped features. Here, a different construction can be done by connecting each group of mapped features to a group of enhancement nodes. Details are described below.
Fig. 4. Illustration of BLS. (a) BLS. (b) BLS with an alternative enhancement nodes establishment.
For the input data set X, for n mapped features, and for n enhancement groups, the new construction is

    Y = [Z_1, ξ(Z_1 W_{h1} + β_{h1}) | ... | Z_n, ξ(Z_n W_{hn} + β_{hn})] W_n    (11)
      ≡ [Z_1, ..., Z_n | ξ(Z_1 W_{h1} + β_{h1}), ..., ξ(Z_n W_{hn} + β_{hn})] W_n    (12)

where Z_i, i = 1, ..., n, are the N × α dimensional mapped features achieved by (9) and W_{hj} ∈ R^{α×γ}. This model structure is illustrated in Fig. 4(b).
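For comparison with the previous sketch, a minimal version of the alternative construction (11)-(12) could look as follows; again φ = ξ = tanh for illustration, and every name is an assumption of ours rather than the paper's.

    import numpy as np

    rng = np.random.default_rng(0)

    def bls_alt_train(X, Y, n=10, k=10, gamma=100, lam=1e-8):
        # Each group of mapped features Z_i feeds its own group of enhancement nodes.
        N, M = X.shape
        blocks, params = [], []
        for _ in range(n):
            We, be = rng.standard_normal((M, k)), rng.standard_normal((1, k))
            Wh, bh = rng.standard_normal((k, gamma)), rng.standard_normal((1, gamma))
            Z = np.tanh(X @ We + be)
            H = np.tanh(Z @ Wh + bh)
            blocks.append(np.hstack([Z, H]))      # [Z_i, xi(Z_i Wh_i + bh_i)]
            params.append((We, be, Wh, bh))
        A = np.hstack(blocks)                     # the state matrix of (11)-(12)
        W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
        return params, W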
It is obvious that the main difference between the two constructions, in Fig. 4(a) and (b), is in the connections of the enhancement nodes. The following theorem proves that the above two different connections of the enhancement nodes are actually equivalent.

Theorem 1: For the model in Section III-A [or Fig. 4(a)], the feature dimension of Z_i^{(a)} is k, for i = 1, ..., n, and the dimension of H_j^{(a)} is q, for j = 1, ..., m. Respectively, for the model in Section III-B [or Fig. 4(b)], the feature dimension of Z_i^{(b)} is k, for i = 1, ..., n, and the dimension of H_j^{(b)} is γ, for j = 1, ..., n. Then, if mq = nγ and H^{(a)} and H^{(b)} are normalized, the two networks are exactly equivalent.

Consequently, no matter which kind of establishment of enhancement nodes is adopted, the network is essentially the same, as long as the total numbers of feature nodes and enhancement nodes are equal. Hereby, only the model in Section III-A [or Fig. 4(a)] will be considered in the rest of this paper. The proof is as follows.

Proof: Suppose that the elements of W_{ei}, W_{hj}, β_{ei}, and β_{hj} are randomly drawn independently from the same distribution ρ(w). For the first model, treat the nk feature mapping nodes together as Z^{(a)} ∈ R^{N×nk}, whose weights are W_e^{(a)}; separately, the mq enhancement nodes are denoted together as H^{(a)} ∈ R^{N×mq}, whose weights are W_h^{(a)}. Respectively, for the second model, treat the nk feature mapping nodes together as Z^{(b)} ∈ R^{N×nk}, whose weights are W_e^{(b)}; separately, the nγ enhancement nodes are denoted together as H^{(b)} ∈ R^{N×nγ}, whose weights are W_h^{(b)}. Obviously, W_e^{(b)} and W_e^{(a)}, which are of the same dimension, are exactly equivalent, because the entries of the two matrices are generated from the same distribution.

As for the enhancement nodes part, first we have that H^{(a)} and H^{(b)} are of the same size if mq = nγ. Therefore, we need to prove that their elements are equivalent. For any sample x_l chosen from the data set, denote the columns of W_e^{(a)} as w^a_{ei} ∈ R^M, i = 1, ..., nk, and the columns of W_h^{(a)} as w^a_{hj} ∈ R^{nk}, j = 1, ..., mq.

Hence, the jth enhancement node associated with the sample x_l should be

    H^{(a)}_{lj} = ξ( [φ(X W_e^{(a)} + β_e^{(a)}) W_h^{(a)} + β_h^{(a)}]_{lj} )
                 = ξ( [φ(x_l w^a_{e1} + β^a_{e1}), ..., φ(x_l w^a_{e,nk} + β^a_{e,nk})] w^a_{hj} + β^a_{hj} )
                 = ξ( Σ_{i=1}^{nk} φ(x_l w^a_{ei} + β^a_{ei}) w^a_{h,ij} + β^a_{hj} )
                 ≈ nk E[ ξ( φ(x_l w^a_e + β^a_e) w^a_h + β^a_h ) ]
                 = nk E[ ξ( φ(x_l w_e + β_e) w_h + β_h ) ].

Here, E stands for the expectation over the distribution, w_e is an M-dimensional random vector drawn from the distribution density ρ(w), and w_h is a scalar sampled from ρ(w).
Similarly, denoting the columns of W_e^{(b)} as w^b_{ei} ∈ R^M, i = 1, ..., nk, and the columns of W_h^{(b)} as w^b_{hj} ∈ R^k, j = 1, ..., nγ, it can be deduced that for the second model we have

    H^{(b)}_{lj} = ξ( [φ(X W_e^{(b)} + β_e^{(b)}) W_h^{(b)} + β_h^{(b)}]_{lj} )
                 = ξ( [φ(x_l w^b_{e1} + β^b_{e1}), ..., φ(x_l w^b_{ek} + β^b_{ek})] w^b_{hj} + β^b_{hj} )
                 = ξ( Σ_{i=1}^{k} φ(x_l w^b_{ei} + β^b_{ei}) w^b_{h,ij} + β^b_{hj} )
                 ≈ k E[ ξ( φ(x_l w^b_e + β^b_e) w^b_h + β^b_h ) ]
                 = k E[ ξ( φ(x_l w_e + β_e) w_h + β_h ) ].

Since all the w_e, w_h and β_e, β_h are drawn from the same distribution, the expectations of the above two composite distributions are obviously the same. Hence, it is clear that

    H^{(b)}_{lj} ≈ (1/n) H^{(a)}_{lj}.

Therefore, we can conclude that, under the given assumption, H^{(a)} and H^{(b)} are also equivalent if the normalization operator is applied.

Algorithm 1 Broad Learning: Increment of p Additional Enhancement Nodes
Input: training samples X;
Output: W
1  for i = 0; i ≤ n do
2    Random W_{ei}, β_{ei};
3    Calculate Z_i = [φ(X W_{ei} + β_{ei})];
4  end
5  Set the feature mapping group Z^n = [Z_1, ..., Z_n];
6  for j = 1; j ≤ m do
7    Random W_{hj}, β_{hj};
8    Calculate H_j = [ξ(Z^n W_{hj} + β_{hj})];
9  end
10 Set the enhancement nodes group H^m = [H_1, ..., H_m];
11 Set A^m and calculate (A^m)^+ with Eq. (3);
12 while the training error threshold is not satisfied do
13   Random W_{h,m+1}, β_{h,m+1};
14   Calculate H_{m+1} = [ξ(Z^n W_{h,m+1} + β_{h,m+1})];
15   Set A^{m+1} = [A^m | H_{m+1}];
16   Calculate (A^{m+1})^+ and W^{m+1} by Eqs. (13)-(15);
17   m = m + 1;
18 end
19 Set W = W^{m+1};
upgraded pseudoinverse matrix should be achieved as follows:

    (A^m_{n+1})^+ = [ (A^m_n)^+ − D B^T
                      B^T              ]    (18)

where D = (A^m_n)^+ [Z_{n+1} | H_{ex_m}],

    B^T = (C)^+,                            if C ≠ 0
    B^T = (1 + D^T D)^{-1} D^T (A^m_n)^+,   if C = 0    (19)

and C = [Z_{n+1} | H_{ex_m}] − A^m_n D. Again, the new weights are

    W^m_{n+1} = [ W^m_n − D B^T Y
                  B^T Y            ].    (20)

Specifically, this algorithm only needs to compute the pseudoinverse of the additional mapped features instead of that of the entire A^m_{n+1}, thus resulting in fast incremental learning.

The incremental algorithm for the increment of feature mappings is shown in Algorithm 2, and the incremental network for an additional (n + 1)th feature mapping as well as p enhancement nodes is shown in Fig. 6.

Algorithm 2 Broad Learning: Increment of n + 1 Mapped Features
Input: training samples X;
Output: W
1  for i = 0; i ≤ n do
2    Random W_{ei}, β_{ei};
3    Calculate Z_i = [φ(X W_{ei} + β_{ei})];
4  end
5  Set the feature mapping group Z^n = [Z_1, ..., Z_n];
6  for j = 1; j ≤ m do
7    Random W_{hj}, β_{hj};
8    Calculate H_j = [ξ(Z^n W_{hj} + β_{hj})];
9  end
10 Set the enhancement nodes group H^m = [H_1, ..., H_m];
11 Set A^m_n and calculate (A^m_n)^+ with Eq. (3);
12 while the training error threshold is not satisfied do
13   Random W_{e,n+1}, β_{e,n+1};
14   Calculate Z_{n+1} = [φ(X W_{e,n+1} + β_{e,n+1})];
15   Random W_{ex_i}, β_{ex_i}, i = 1, ..., m;
16   Calculate H_{ex_m} = [ξ(Z_{n+1} W_{ex_1} + β_{ex_1}), ..., ξ(Z_{n+1} W_{ex_m} + β_{ex_m})];
17   Update A^m_{n+1};
18   Update (A^m_{n+1})^+ and W^m_{n+1} by Eqs. (18)-(20);
19   n = n + 1;
20 end
21 Set W = W^m_{n+1};
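A compact NumPy sketch of the pseudoinverse and weight updates (18)-(20) is given below; the function name, the tolerance used to decide the C = 0 branch, and the use of numpy.linalg.pinv for C^+ are our own choices.

    import numpy as np

    def add_columns(A, A_pinv, W, Y, A_add, tol=1e-10):
        # Update A^+ and W when a block of new columns, e.g. [Z_{n+1} | H_ex_m],
        # is appended to A, following (18)-(20).
        D = A_pinv @ A_add                       # D = (A^m_n)^+ [Z_{n+1} | H_ex_m]
        C = A_add - A @ D                        # C = [Z_{n+1} | H_ex_m] - A^m_n D
        if np.linalg.norm(C) > tol:              # C != 0 branch of (19)
            B_T = np.linalg.pinv(C)
        else:                                    # C == 0 branch of (19)
            s = D.shape[1]
            B_T = np.linalg.solve(np.eye(s) + D.T @ D, D.T @ A_pinv)
        A_pinv_new = np.vstack([A_pinv - D @ B_T, B_T])      # (18)
        BtY = B_T @ Y
        W_new = np.vstack([W - D @ BtY, BtY])                # (20)
        return np.hstack([A, A_add]), A_pinv_new, W_new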
E. Incremental Learning for the Increment of Input Data

Now let us come to the case in which the input training samples keep entering. Often, once the system modeling is completed, if a new input with a corresponding output enters the model, the model should be updated to reflect the additional samples. The algorithm in this section is designed to update the weights easily, without an entire training cycle.

Denote X_a as the new inputs added to the neural network, and denote A^m_n as the n groups of feature mapping nodes and m groups of enhancement nodes of the initial network. The respective increments of the mapped feature nodes and the enhancement nodes are formulated as follows:

    A_x^T = [ φ(X_a W_{e1} + β_{e1}), ..., φ(X_a W_{en} + β_{en}) |    (21)
              ξ(Z^n_x W_{h1} + β_{h1}), ..., ξ(Z^n_x W_{hm} + β_{hm}) ]    (22)

where Z^n_x = [φ(X_a W_{e1} + β_{e1}), ..., φ(X_a W_{en} + β_{en})] is the group of incremental features generated from X_a. The W_{ei}, W_{hj} and β_{ei}, β_{hj} are randomly generated during the initialization of the network. Hence, we have the updating matrix

    ^x A^m_n = [ A^m_n
                 A_x^T ].

The associated pseudoinverse updating algorithm can be deduced as follows:

    (^x A^m_n)^+ = [ (A^m_n)^+ − B D^T | B ]    (23)

where D^T = A_x^T (A^m_n)^+,

    B^T = (C)^+,                           if C ≠ 0
    B^T = (1 + D^T D)^{-1} (A^m_n)^+ D,    if C = 0    (24)

and C = A_x^T − D^T A^m_n.
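Correspondingly, the input-increment update (23)-(24) can be sketched in NumPy as follows. The C = 0 branch is written in the dimension-explicit form B = (A^m_n)^+ D (I + D^T D)^(-1), and the final weights are simply refreshed by applying the updated pseudoinverse to the stacked outputs rather than by the paper's dedicated weight-update formula; all names are ours.

    import numpy as np

    def add_rows(A, A_pinv, Y, A_x_T, Y_a, tol=1e-10):
        # Update A^+ when new rows A_x^T, built from X_a as in (21)-(22),
        # are appended below A, following (23)-(24).  Y_a holds the new outputs.
        D_T = A_x_T @ A_pinv                      # D^T = A_x^T (A^m_n)^+
        C = A_x_T - D_T @ A                       # C = A_x^T - D^T A^m_n
        if np.linalg.norm(C) > tol:               # C != 0 branch of (24)
            B = np.linalg.pinv(C)
        else:                                     # C == 0 branch of (24)
            na = D_T.shape[0]
            B = A_pinv @ D_T.T @ np.linalg.inv(np.eye(na) + D_T @ D_T.T)
        A_pinv_new = np.hstack([A_pinv - B @ D_T, B])        # (23)
        A_new = np.vstack([A, A_x_T])
        W_new = A_pinv_new @ np.vstack([Y, Y_a])             # refreshed output weights
        return A_new, A_pinv_new, W_new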
Algorithm 3 Broad Learning: Increment of Feature-Mapping Nodes, Enhancement Nodes, and New Inputs
Input: training samples X;
Output: W
1  for i = 0; i ≤ n do
2    Random W_{ei}, β_{ei};
3    Calculate Z_i = [φ(X W_{ei} + β_{ei})];
4  end
5  Set the feature mapping group Z^n = [Z_1, ..., Z_n];
6  for j = 1; j ≤ m do
7    Random W_{hj}, β_{hj};
8    Calculate H_j = [ξ(Z^n W_{hj} + β_{hj})];
9  end
10 Set the enhancement nodes group H^m = [H_1, ..., H_m];
11 Set A^m_n and calculate (A^m_n)^+ with Eq. (3);
12 while the training error threshold is not satisfied do
13   if p enhancement nodes are added then
14     Random W_{h,m+1}, β_{h,m+1};
15     Calculate H_{m+1} = [ξ(Z^n W_{h,m+1} + β_{h,m+1})]; update A^{m+1}_n;
16     Calculate (A^{m+1}_n)^+ and W^{m+1}_n by Eqs. (13)-(15);
17     m = m + 1;
18   else
19     if the (n + 1)th feature mapping is added then
20       Random W_{e,n+1}, β_{e,n+1};
21       Calculate Z_{n+1} = [φ(X W_{e,n+1} + β_{e,n+1})];
22       Random W_{ex_i}, β_{ex_i}, i = 1, ..., m;
23       Calculate H_{ex_m} = [ξ(Z_{n+1} W_{ex_1} + β_{ex_1}), ..., ξ(Z_{n+1} W_{ex_m} + β_{ex_m})];
24       Update A^m_{n+1};
25       Update (A^m_{n+1})^+ and W^m_{n+1} by Eqs. (18)-(20);
26       n = n + 1;
27     else
28       New inputs are added as X_a;
29       Calculate A_x by (21), (22); update ^x A^m_n;
30       Update (^x A^m_n)^+ and ^x W^m_n by Eqs. (23)-(25);
31     end
32   end
33 end
34 Set W;
where Σ^P and Σ^Q are divided according to the order of the singular values under the parameter ε.

Remember that our motivation is to achieve a satisfactory reduction of the number of nodes. The idea is to compress Z_i by its principal portion, Z_i^P. The relation between Z_i and Z_i^P is derived as follows:

    Z_i^P V^P_{Z_i} = U_{Z_i} Σ^P_{Z_i} V^{P T}_{Z_i} V^P_{Z_i}
                    = U_{Z_i} Σ^P_{Z_i} V^{P T}_{Z_i} V^P_{Z_i} + U_{Z_i} Σ^Q_{Z_i} V^{Q T}_{Z_i} V^P_{Z_i}
                    = U_{Z_i} · [Σ^P_{Z_i} | Σ^Q_{Z_i}] · [V^P_{Z_i} | V^Q_{Z_i}]^T V^P_{Z_i}
                    = Z_i V^P_{Z_i}.

As for the original model, define

    W^0_n ≡ [ W^{{0,n}}_{Z_1} | ... | W^{{0,n}}_{Z_n} ]^T

we have that

    Y = A^0_n W^0_n
      = [Z_1, ..., Z_n] W^0_n
      = Z_1 W^{{0,n}}_{Z_1} + ··· + Z_n W^{{0,n}}_{Z_n}
      = Z_1 V^P_{Z_1} V^{P T}_{Z_1} W^{{0,n}}_{Z_1} + ··· + Z_n V^P_{Z_n} V^{P T}_{Z_n} W^{{0,n}}_{Z_n}
      = [Z_1 V^P_{Z_1}, ..., Z_n V^P_{Z_n}] [ V^{P T}_{Z_1} W^{{0,n}}_{Z_1} ; ··· ; V^{P T}_{Z_n} W^{{0,n}}_{Z_n} ]
      = A^{{0,n}}_F W^{{0,n}}_F

where

    W^{{0,n}}_F = [ V^{P T}_{Z_1} W^{{0,n}}_{Z_1} ; ··· ; V^{P T}_{Z_n} W^{{0,n}}_{Z_n} ].

Finally, by solving a least squares linear equation, the model is refined to

    Y = A^{{0,n}}_F W^{{0,n}}_F    (29)

where

    W^{{0,n}}_F = ( A^{{0,n}}_F )^+ Y.    (30)

Here, (A^{{0,n}}_F)^+ is the pseudoinverse of A^{{0,n}}_F. In this way, the original A^0_n is simplified to A^{{0,n}}_F.

2) SVD Simplification of Enhancement Nodes: We are able to simplify the structure after adding a new group of enhancement nodes to the network. Suppose that the n groups of feature mapping nodes and m groups of enhancement nodes have been added, and the network is

    Y = A^{{m,n}}_F W^{{m,n}}_F

where

    A^{{m,n}}_F = [ Z_1 V^P_{Z_1}, ..., Z_n V^P_{Z_n} | H_1 V^P_{H_1}, ..., H_m V^P_{H_m} ]

and

    H_j = ξ( [Z_1 V^P_{Z_1}, ..., Z_n V^P_{Z_n}] W_{hj} + β_{hj} ).

In the above equations, the H^P_j, j = 1, ..., m, are obtained in the same way as Z^P_i, which means

    H_j = U_{H_j} Σ_{H_j} V^T_{H_j}    (31)
        = U_{H_j} · [ Σ^P_{H_j} | Σ^Q_{H_j} ] · [ V^P_{H_j} | V^Q_{H_j} ]^T    (32)
        = H^P_j + H^Q_j.    (33)

Similarly, the simplified structure is obtained by substituting H_j V^P_{H_j} for H_j.
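As an illustration of this simplification, the sketch below compresses each node group by its principal right singular vectors (kept under the threshold η) and re-solves the least squares problem of (29)-(30); the small ridge term and the function names are our own additions.

    import numpy as np

    def principal_projection(G, eta=0.9):
        # SVD of one node group (a Z_i or an H_j), as in (31)-(33); keep the right
        # singular vectors whose singular values satisfy sigma_i >= eta * sigma_1.
        U, s, Vt = np.linalg.svd(G, full_matrices=False)
        r = int(np.sum(s >= eta * s[0]))
        return G @ Vt[:r, :].T                    # compressed group G V^P

    def svd_simplify(groups, Y, eta=0.9, lam=1e-8):
        # Build A_F from the compressed groups and re-solve for the refined
        # weights W_F as in (29)-(30); lam adds a small ridge for stability.
        A_F = np.hstack([principal_projection(G, eta) for G in groups])
        W_F = np.linalg.solve(A_F.T @ A_F + lam * np.eye(A_F.shape[1]), A_F.T @ Y)
        return A_F, W_F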
TABLE I
Classification Accuracy on MNIST Data Set

TABLE II
Classification Accuracy on MNIST Data Set With Different Numbers of Enhancement Nodes
TABLE III
Classification Accuracy on MNIST Data Set Using Incremental Learning

TABLE IV
Snapshot Results of MNIST Classification Using Incremental Learning
TABLE V
Snapshot Results of MNIST Classification Using Incremental Learning: Increment of Input Patterns

TABLE VI
Snapshot Results of MNIST Classification Using Incremental Learning: Increment of Input Patterns and Enhancement Nodes
TABLE VIII
Network Compression Result Using SVD Broad Learning Algorithm
C. L. Philip Chen (S'88–M'88–SM'94–F'07) received the M.S. degree in electrical engineering from the University of Michigan, Ann Arbor, MI, USA, in 1985, and the Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, USA, in 1988.

He was a tenured professor in the United States for 23 years. He is currently the Dean of the Faculty of Science and Technology, University of Macau, Macau, China, where he is the Chair Professor of the Department of Computer and Information Science. He is also the Department Head and an Associate Dean with two different universities. His current research interests include systems, cybernetics, and computational intelligence.

Dr. Chen is a Fellow of AAAS, the Chinese Association of Automation, and HKIE. He was the President of the IEEE Systems, Man, and Cybernetics Society from 2012 to 2013. He has been the Editor-in-Chief of the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS since 2014 and an Associate Editor of several IEEE Transactions. He is also the Chair of TC 9.1 Economic and Business Systems of the International Federation of Automatic Control, and an Accreditation Board for Engineering and Technology Education Program Evaluator for computer engineering, electrical engineering, and software engineering programs.

Zhulin Liu received the bachelor's degree in mathematics from Shandong University, Shandong, China, in 2005, and the M.S. degree in mathematics from the University of Macau, Macau, China, in 2009, where she is currently pursuing the Ph.D. degree with the Faculty of Science and Technology.

Her current research interests include computational intelligence, machine learning, and function approximation.