Hindawi Publishing Corporation

Mathematical Problems in Engineering


Volume 2016, Article ID 6795352, 14 pages
http://dx.doi.org/10.1155/2016/6795352

Research Article
Adaptive Deep Supervised Autoencoder Based Image
Reconstruction for Face Recognition

Rongbing Huang,1,2 Chang Liu,1 Guoqi Li,3 and Jiliu Zhou2


1 Key Laboratory of Pattern Recognition and Intelligent Information Processing, Institutions of Higher Education of Sichuan Province, Chengdu University, Chengdu, Sichuan 610106, China
2 School of Computer and Software, Sichuan University, Chengdu, Sichuan 610065, China
3 School of Reliability and System Engineering, Beihang University, Beijing 100191, China

Correspondence should be addressed to Rongbing Huang; [email protected]

Received 3 June 2016; Revised 30 July 2016; Accepted 28 September 2016

Academic Editor: Simone Bianco

Copyright © 2016 Rongbing Huang et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.

Based on a special type of denoising autoencoder (DAE) and image reconstruction, we present a novel supervised deep learning framework for face recognition (FR). Unlike existing deep autoencoders, which are unsupervised face recognition methods, the proposed method takes the class label information of the training samples into account in the deep learning procedure and can automatically discover the underlying nonlinear manifold structures. Specifically, we define an Adaptive Deep Supervised Network Template (ADSNT) with a supervised autoencoder that is trained to extract characteristic features from corrupted/clean facial images and to reconstruct the corresponding similar facial images. The reconstruction is realized by a so-called "bottleneck" neural network that learns to map face images into low-dimensional vectors and to reconstruct the corresponding face images from those vectors. Once the ADSNT has been trained, a new face image can be recognized by comparing its reconstructed image with the individual gallery images. Extensive experiments on three databases, AR, PubFig, and Extended Yale B, demonstrate that the proposed method can significantly improve the accuracy of face recognition under large illumination and pose changes and partial occlusion.

1. Introduction

Over the last couple of decades, face recognition has gained a great deal of attention in the academic and industrial communities on account of its challenging nature and its widespread applications. The study of face recognition has great theoretical value, since it involves image processing, artificial intelligence, machine learning, computer vision, and so on, and it is also closely related to other biometrics such as fingerprints, speech recognition, and iris scans. In the field of pattern recognition, as a classic problem, face recognition mainly covers two issues: feature extraction and classifier design. Currently, most existing works focus on these two aspects to improve the performance of face recognition systems.

In most real-world applications, face recognition is actually a multiclass classification problem. Researchers have proposed many classification methods. Among them, the nearest neighbor classifier (NNC) and its variants such as nearest subspace [1] are the most popular methods in pattern classification [2]. In [3], the problem of face recognition was transformed into a binary classification problem by constructing intra- and interfacial image spaces. The intraspace represents the differences between images of the same person and the interspace represents the differences between images of different people. Then, many binary classifiers such as Support Vector Machine (SVM) [4], Bayesian, and Adaboost [5] can be used.

Besides classifier design, the other important issue is feature representation. In the real world, face images are usually influenced by variations such as illumination, posture, occlusion, and expression. Additionally, the differences between images of the same person can be much larger than those between images of different people. Therefore, it is crucial to obtain efficient and discriminant features that make the intraspace compact and enlarge the margin between different people. Until now, various feature extraction methods
Figure 1: Network architectures. (a) DAE and (b) SDAE.

have been explored, including classical subspace-based dimension reduction approaches like principal component analysis (PCA), Fisher linear discriminant analysis (FLDA), independent component analysis (ICA), and so on [6]. In addition, there are some local appearance feature extraction methods like the Gabor wavelet transform, local binary patterns (LBP), and their variants [7], which are stable to local facial variations such as expressions, occlusions, and poses. Currently, deep learning, including deep neural networks, has shown great success in image representation [8, 9], and its basic idea is to train a nonlinear feature extractor in each layer [10, 11]. After greedy layer-wise training of a deep network architecture, the output of the network is used as the image feature for the subsequent classification task. Among deep network architectures, as a representative building block, the denoising autoencoder (DAE) [12] learns features that are robust to noise by a nonlinear deterministic mapping. Image features derived from DAE have demonstrated good performance in many tasks such as object detection and digit recognition. Inspired by the great success of DAE based deep network architectures, a supervised autoencoder (SAE) [9] was also proposed as a building block; it treats facial images with variations such as illumination, expression, and pose as images corrupted by noise. A face image without these variations can be recovered through an SAE; meanwhile, robust features for image representation are also extracted.

Motivated by the great success of DAE and SAE based deep learning and by the problem of face recognition in complex environments, in this article we present a novel deep learning method based on SAE for face recognition. Unlike the existing deep stacked autoencoder (AE), which is an unsupervised feature learning approach, our proposed method takes full advantage of the class label information of the training samples in the deep learning procedure and tries to discover the underlying nonlinear manifold structures in the data.

The rest of this paper is organized as follows. In Section 2, we give a brief review of DAE and of state-of-the-art face recognition based on deep learning. In Section 3, we focus on the proposed face recognition approach. The experimental results obtained on three public databases are given in Section 4. Finally, we draw a conclusion in Section 5.

2. Related Work

In this section, we briefly review work related to DAE and deep learning based face recognition systems.

2.1. Work Related to DAE. DAE is a one-layer neural network, which is a recent variant of the conventional autoencoder (AE). It learns to recover the clean input data sample from its corrupted version. The architecture of DAE is illustrated in Figure 1(a). Let there be a total of $k$ training samples and let $x$ denote the original input data. In DAE, the input data $x$ is first contaminated with some predefined noise, such as Gaussian white noise or Poisson noise, to obtain a corrupted version $\tilde{x}$, which is input into an encoder $h = f(\tilde{x}) = u_f(W\tilde{x} + b_f)$. Then the output $h$ of the encoder is used as the input of a decoder $\hat{x} = g(h) = u_g(W'h + b_g)$. Here $u_f$ and $u_g$ are the predefined activation functions of the encoder and decoder, respectively, such as the sigmoid function, the hyperbolic tangent function, or the rectifier function [13]. $W \in \mathbb{R}^{d_h \times d_x}$ and $W' \in \mathbb{R}^{d_x \times d_h}$ are the network parameters that denote the weights of the encoder and decoder, respectively. $b_f \in \mathbb{R}^{d_h}$ and $b_g \in \mathbb{R}^{d_x}$ are the bias terms. $d_x$ and $d_h$ denote the dimensionality of the original data and the number of hidden neurons, respectively. On the basis of the above definitions, a DAE learns by solving the following regularized optimization problem:

\[
\min_{W, W', b_f, b_g} \; \sum_{i=1}^{k} \| x - \hat{x} \|_2^2 + \frac{\lambda}{2} \left( \sum_{j} \| W \|_F^2 + \sum_{l} \| W' \|_F^2 \right). \tag{1}
\]
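As a rough numerical illustration of objective (1), the following NumPy sketch evaluates the reconstruction error and the weight penalty for a one-layer DAE with sigmoid activations. The function name `dae_objective`, the Gaussian noise level, the value of `lam`, and the toy dimensions are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dae_objective(X, W, W_prime, b_f, b_g, lam=1e-3, noise_std=0.1, rng=None):
    """Evaluate objective (1) for a one-layer DAE; the rows of X are samples.

    lam and noise_std are illustrative values, not the paper's settings.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Corrupt the input with Gaussian white noise to obtain x~.
    X_tilde = X + noise_std * rng.standard_normal(X.shape)
    H = sigmoid(X_tilde @ W.T + b_f)        # encoder: h = u_f(W x~ + b_f)
    X_hat = sigmoid(H @ W_prime.T + b_g)    # decoder: x^ = u_g(W' h + b_g)
    recon = np.sum((X - X_hat) ** 2)        # sum of squared reconstruction errors
    penalty = 0.5 * lam * (np.sum(W ** 2) + np.sum(W_prime ** 2))
    return recon + penalty, X_hat

# Toy problem: k = 5 samples, d_x = 8 inputs, d_h = 3 hidden neurons.
rng = np.random.default_rng(1)
X = rng.random((5, 8))
W = 0.1 * rng.standard_normal((3, 8))        # W  in R^{d_h x d_x}
W_prime = 0.1 * rng.standard_normal((8, 3))  # W' in R^{d_x x d_h}
loss, X_hat = dae_objective(X, W, W_prime, np.zeros(3), np.zeros(8))
```

In an actual DAE the four parameter arrays would be updated by a gradient-based optimizer so that this value decreases; the sketch only evaluates the objective once.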
Figure 2: Flowchart of the proposed ADSNT image reconstruction for face recognition. (Training: the gallery and probe images of the face dataset are preprocessed, for example, by histogram equalization, and used to train the ADSNT, which yields $\theta_{\mathrm{ADSNT}}$. Testing: the test image is preprocessed in the same way and mapped through the trained network; the output is the label $I = \arg\min_g \|\hat{x}(t) - x_g\|$, $\forall g \in \{1, 2, \ldots, c\}$.)
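The two generic steps of the flowchart in Figure 2, histogram equalization and nearest-neighbor labeling of a reconstructed image, can be sketched in NumPy as follows. The helper names `equalize_histogram` and `classify` and the toy arrays are hypothetical; the sketch assumes 8-bit grayscale images and one flattened gallery image per row, and it does not include the ADSNT mapping itself.

```python
import numpy as np

def equalize_histogram(img):
    """Histogram-equalize an 8-bit grayscale image given as a 2-D uint8 array."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[np.nonzero(cdf)[0][0]]   # CDF value at the darkest occurring level
    if cdf[-1] == cdf_min:                 # constant image: nothing to spread
        return img.copy()
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]                        # apply the lookup table pixel by pixel

def classify(x_hat, gallery):
    """Return the index g that minimizes ||x_hat - x_g|| over the gallery rows."""
    distances = np.linalg.norm(gallery - x_hat, axis=1)
    return int(np.argmin(distances))

# Toy data: a 2 x 2 "image" and a gallery of three 2-D "faces".
img = np.array([[50, 50], [100, 200]], dtype=np.uint8)
equalized = equalize_histogram(img)

gallery = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
x_hat = np.array([0.9, 0.8])               # stands in for a reconstructed test image
label = classify(x_hat, gallery)
```

In the pipeline of Figure 2, `x_hat` would be the output of the trained network for a preprocessed test image rather than a hand-written vector.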

Here $\|\cdot\|_2^2$ is the reconstruction error, $\|\cdot\|_F$ denotes the Frobenius norm, and $\lambda$ is a parameter that balances the reconstruction loss and the weight penalty terms. By reconstructing the clean input data from a corrupted version of it, a DAE can explore more robust features than a conventional AE, which simply learns the identity mapping.

To further promote the learning of meaningful features, sparsity constraints [14] are imposed on the hidden neurons when the number of hidden neurons is large. The constraint is defined in terms of the Kullback-Leibler (KL) divergence as

\[
\sum_{j} \mathrm{KL}(\rho \,\|\, \rho_j) = \sum_{j=1}^{m} \left[ \rho \log \frac{\rho}{\rho_j} + (1 - \rho) \log \left( \frac{1 - \rho}{1 - \rho_j} \right) \right], \tag{2}
\]

where $m$ is the number of neurons in one hidden layer, $\rho_j$ is the average activation of hidden unit $j$ (over the whole training set), and $\rho$ is a sparsity parameter (typically a small value).

After $f$ and $g$ have been learned, the output of the encoder $h$ is input to the next layer. By training such DAEs layer by layer, stacked denoising autoencoders (SDAE) are built. Their structure is illustrated in Figure 1(b).

In real-world applications such as face recognition, faces are usually influenced by all kinds of variations such as expression, illumination, pose, and occlusion. To overcome the effect of these variations, Gao et al. [9] proposed a supervised autoencoder based on the principle of DAE. They treated the training sample (gallery image) of each person with frontal pose, uniform illumination, neutral expression, and no occlusion as clean data and the test faces (probe images) accompanied by variations (expression, illumination, occlusion, etc.) as corrupted data. A mapping that captures the discriminant structure of the facial images of different people is learned, while remaining robust to the variations in these faces. Robust features are then extracted for image representation, and the performance of face recognition is greatly enhanced.

2.2. Deep Learning Based Face Recognition System. In early face recognition, various face representation methods were used, including hand-crafted or "shallow" learning approaches [6, 7]. In recent years, with the development of big data and computer hardware, feature learning based on deep structures has been greatly successful in the image representation field [8, 12, 15, 16]. By means of deep structure learning, the representation ability of the model is greatly enhanced and complicated (nonlinear) information can be learned effectively from the original data. In [16], a deep Fisher network was designed by stacking all the Fisher vectors, which greatly outperformed the conventional Fisher vector representation. Chen et al. [17] proposed the marginalized SDAE to learn the optimal closed-form solution, which reduced the computational complexity and improved the scalability to high-dimensional descriptive features. Taigman et al. [18] presented a face verification system based on Convolutional Neural Networks (CNNs), which also obtained high verification accuracy on the LFW dataset. Zhu et al. [19] designed a network structure composed of a facial identity-preserving layer and an image reconstruction layer, which can reduce intravariance and preserve discriminant information. In [20], Hayat et al. proposed a deep learning framework based on AE with application to image set classification and face recognition, which obtained the best performance compared with existing state-of-the-art methods. Gao et al. [9] further proposed an SAE that can be used to build a deep architecture and can extract facial features that are robust to variations. Sun et al. [21] learned multiple convolutional networks (ConvNets) by predicting 10,000 subjects, which generalized well to the face verification problem. Furthermore, they improved the ConvNets by incorporating identification and verification tasks and enhanced the recognition performance [22]. Cai et al. [23] stacked several sparse independent subspace analyses (sISA) to construct a deep network structure for learning identity representations.

3. Proposed Method

This section presents our proposed approach, whose block diagram is illustrated in Figure 2. Firstly, inspired by stacked DAE and SAE [9], we define the Adaptive Deep Supervised Network Template (ADSNT), which can learn an underlying nonlinear manifold structure from the facial images. The basic architecture of ADSNT is illustrated in Figure 3(c) and the corresponding details are given in Section 3.1. To make
Figure 3: Architecture of SSAE and ADSNT. (a) Supervised autoencoder (SAE), which comprises a clean/"corrupted" datum, one hidden layer, and one reconstruction layer that uses the "corrupted" datum; (b) stacked supervised autoencoder (SSAE); (c) architecture of the Adaptive Deep Supervised Network Template (ADSNT).

the deep network perform well, similar to [20], we need to give it initial weights. Then, the preinitialized ADSNT is trained to reconstruct invariant faces that are insensitive to illumination, pose, and occlusion. Finally, having trained the ADSNT, we use the nearest neighbor classifier to recognize a new face image by comparing its reconstructed image with the individual gallery images.

3.1. Adaptive Deep Supervised Network Template (ADSNT). As presented in Figure 3(c), our ADSNT is a deep supervised autoencoder (DSAE) that consists of two parts: an encoder (EC) and a decoder (DC). Each of them has three hidden layers and they share the third layer, that is, the central hidden layer. The features learned from the hidden layer and the reconstructed clean face are obtained by using the "corrupted" data to train the SSAE. In the process of pretraining, we learn a stack of SAEs, each having only one hidden layer of feature detectors. Then, the learned activation features of one SAE are used as "data" for training the next SAE in the stack. Such training is repeated until we get the desired number of layers. Although we use the basic SAE structure shown in Figure 3(a) [9] to construct the stacked supervised autoencoder (SSAE), Gao et al.'s stacked supervised autoencoder only used two hidden layers and one reconstruction layer. In this paper, we use three hidden layers to compose the encoder and the decoder, respectively, whose structures are shown in Figures 3(b) and 3(c). The encoder part seeks a compact, low-dimensional, meaningful representation of the clean/"corrupted" data. Following the work [20], the encoder can be formulated as a combination of several layers that are connected with a nonlinear activation function $u_f(\cdot)$. We can use a sigmoid function or a rectified linear unit as the nonlinear activation to map the clean/"corrupted" data $x$/$\tilde{x}$ to a representation $h$ as follows:

\[
\begin{aligned}
h &= f(h_2) = u_f\left(W_e^{(3)} h_2 + b_e^{(3)}\right), \\
h_2 &= f(h_1) = u_f\left(W_e^{(2)} h_1 + b_e^{(2)}\right), \\
h_1 &= f(x) = u_f\left(W_e^{(1)} x + b_e^{(1)}\right)
\end{aligned}
\]

for the clean data and

\[
\begin{aligned}
h &= f(h_2) = u_f\left(W_e^{(3)} h_2 + b_e^{(3)}\right), \\
h_2 &= f(h_1) = u_f\left(W_e^{(2)} h_1 + b_e^{(2)}\right), \\
h_1 &= f(\tilde{x}) = u_f\left(W_e^{(1)} \tilde{x} + b_e^{(1)}\right)
\end{aligned}
\tag{3}
\]

for the "corrupted" data,
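A minimal NumPy sketch of the three-layer encoder in (3) is given below. It applies the same weights to a clean input and to a corrupted copy of it. The layer sizes (a 780-dimensional input and hidden layers of 1024, 500, and 120 nodes) and the hyperbolic tangent activation follow the settings reported in Section 4.2; the random weights, the noise level, and the function name `encode` are assumptions for illustration only, since the paper initializes the weights by pretraining.

```python
import numpy as np

def encode(x, weights, biases, u_f=np.tanh):
    """Apply h1, h2, h of (3) in turn: each layer computes u_f(W h_prev + b)."""
    h = x
    for W, b in zip(weights, biases):
        h = u_f(W @ h + b)
    return h

rng = np.random.default_rng(0)
sizes = [780, 1024, 500, 120]   # input dimension and hidden layer sizes (Section 4.2)

# Random placeholder weights; W_e^(i) maps layer i-1 (columns) to layer i (rows).
weights = [0.01 * rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(3)]
biases = [np.zeros(sizes[i + 1]) for i in range(3)]

x = rng.random(780)                                  # stands in for a clean face vector
x_tilde = x + 0.1 * rng.standard_normal(780)         # stands in for a "corrupted" face
h = encode(x, weights, biases)                       # representation of the clean input
h_tilde = encode(x_tilde, weights, biases)           # representation of the corrupted input
```

Because both inputs pass through the same weights, the distance between `h` and `h_tilde` is the quantity that a similarity-preserving term in the training objective can penalize.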
where $W_e^{(i)} \in \mathbb{R}^{d_i \times d_{i-1}}$ is the weight matrix of the $i$th encoder layer with $d_i$ neurons and $b_e^{(i)} \in \mathbb{R}^{d_i}$ is the bias vector. The encoder parameters are learned by jointly training the encoder-decoder structure to reconstruct the "corrupted" data by minimizing a cost function (see Section 3.2). Therefore, the decoder can be defined as a combination of several layers with a nonlinear activation function $u_g(\cdot)$, which reconstructs the "corrupted" data $\tilde{x}$ from the encoder output $h$. The reconstructed output $\hat{x}$ of the decoder is given by

\[
\begin{aligned}
\hat{x} &= g(z_2) = u_g\left(W_d^{(3)} z_2 + b_d^{(3)}\right), \\
z_2 &= g(z_1) = u_g\left(W_d^{(2)} z_1 + b_d^{(2)}\right), \\
z_1 &= g(h) = u_g\left(W_d^{(1)} h + b_d^{(1)}\right),
\end{aligned}
\tag{4}
\]

where $z_1$ and $z_2$ denote the outputs of the first and second decoder layers. So, we can describe the complete ADSNT by its parameters $\theta_{\mathrm{ADSNT}} = \{\theta_W, \theta_b\}$, where $\theta_W = \{W_e^{(i)}, W_d^{(i)}\}$ and $\theta_b = \{b_e^{(i)}, b_d^{(i)}\}$, $i = 1, 2, 3$.

3.2. Formulation of Image Reconstruction Based on ADSNT. Now, we are ready to describe image reconstruction based on ADSNT. The details are presented as follows.

Given a set of training images from $k$ classes that includes gallery images (called clean data) and probe images (called "corrupted" data), together with their corresponding class labels $y_c = [1, 2, \ldots, k]$, the dataset is used to train ADSNT for feature learning. Let $\tilde{x}_i$ denote a probe image and let $x_i$ ($i = 1, 2, \ldots, M$) denote the gallery image corresponding to $\tilde{x}_i$. It is desirable that $x_i$ and $\tilde{x}_i$ be similar. Therefore, following the work [9, 22], we obtain the following formulation:

\[
\arg\min_{\theta_{\mathrm{ADSNT}}} J = \frac{1}{M} \sum_{i} \| x_i - \hat{x}_i \|^2 + \frac{\lambda_{\theta_W}}{M} \sum_{i} \| f(x_i) - f(\tilde{x}_i) \|^2 + \frac{\varphi}{2} \left( \sum_{i=1}^{3} \| W_e^{(i)} \|_F^2 + \sum_{i=1}^{3} \| W_d^{(i)} \|_F^2 \right), \tag{5}
\]

where $\theta_{\mathrm{ADSNT}} = \{\theta_W, \theta_b\}$ (see Section 3.1) are the parameters of ADSNT, which are fine-tuned by learning. In this paper, we only explore tied weights; that is, $W_d^{(3)} = W_e^{(1)T}$, $W_d^{(2)} = W_e^{(2)T}$, and $W_d^{(1)} = W_e^{(3)T}$ (see Figure 3(c)). $\hat{x}_i$ is the reconstruction of the corrupted image $\tilde{x}_i$. Like a regularization parameter, $\lambda_{\theta_W}$ (denoted simply by $\lambda$ in Algorithm 1 and Section 4.2) balances the similarity term for the same person, which keeps $f(x_i)$ and $f(\tilde{x}_i)$ as similar as possible. $f(\cdot)$ is a nonlinear activation function. $\varphi$ is a parameter that balances the weight penalty terms and the reconstruction loss. $\|\cdot\|_F$ denotes the Frobenius norm, and $\sum_{i=1}^{3} \| W_e^{(i)} \|_F^2 + \sum_{i=1}^{3} \| W_d^{(i)} \|_F^2$ ensures small weight values for all the hidden neurons. Furthermore, following the work [9, 14], we impose a sparsity constraint on the hidden layer to enhance the learning of meaningful features. Then, we can further modify the cost function and obtain the following objective:

\[
\arg\min_{\theta_{\mathrm{ADSNT}}} J_{\mathrm{reg}} = J + \gamma \left( \sum_{i}^{3} \mathrm{KL}(\rho_x \,\|\, \rho_0) + \sum_{i}^{5} \mathrm{KL}(\rho_{\tilde{x}} \,\|\, \rho_0) \right), \tag{6}
\]

where

\[
\begin{aligned}
\rho_x &= \frac{1}{M} \sum_{i} \frac{1}{2} \left( f(x_i) + 1 \right), \\
\rho_{\tilde{x}} &= \frac{1}{M} \sum_{i} \frac{1}{2} \left( f(\tilde{x}_i) + 1 \right), \\
\mathrm{KL}(\rho \,\|\, \rho_0) &= \sum_{j} \left( \rho_0 \log \left( \frac{\rho_0}{\rho_j} \right) + (1 - \rho_0) \log \left( \frac{1 - \rho_0}{1 - \rho_j} \right) \right).
\end{aligned}
\tag{7}
\]

Here the KL divergence is calculated between two distributions, that is, $\rho_0$ and $\rho_j$, where $\rho_j$ stands for $\rho_x$ or $\rho_{\tilde{x}}$. The sparsity target $\rho_0$ is usually a small constant (following the work [9, 24], it is set to 0.05 in our experiments), whereas $\rho_x$ and $\rho_{\tilde{x}}$ are the mean activation values obtained from the clean data and the corrupted data, respectively.

3.3. Optimization of ADSNT. To obtain the optimal parameters $\theta_{\mathrm{ADSNT}} = \{\theta_W, \theta_b\}$, it is important to initialize the weights and to select an optimization algorithm for training. Training will fail if the initial weights are inappropriate. That is to say, if we give the network initial weights that are too large, the ADSNT will be trapped in a local minimum. If the initial weights are too small, the ADSNT will encounter the vanishing gradient problem during backpropagation. Therefore, following the work [20, 24], Gaussian Restricted Boltzmann Machines (GRBMs), which have already been widely applied, are adopted to initialize the weight parameters by pretraining. For more details, we refer the reader to the original paper [24]. After obtaining the initial weights, the limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization algorithm is used to learn the parameters, as it has better performance and faster convergence than stochastic gradient descent (SGD) and conjugate gradient descent (CGD) [25]. Algorithm 1 depicts the optimization procedure of ADSNT.

Algorithm 1. (learning adaptive deep supervised network template)

Input. Training images $\Omega$: $k$ classes, where each class is composed of the face with neutral expression, frontal pose, and normal illumination condition (clean data) and a random number of variant faces (corrupted data). Number of network layers $L$. Number of iterations $R$, balancing parameters $\lambda$, $\varphi$, and $\gamma$, and convergence error $\varepsilon$.

Output. Weight parameters $\theta_{\mathrm{ADSNT}} = \{\theta_W, \theta_b\}$
(1) Preprocess all images, namely, perform histogram equalization
(2) $X$: Randomly select a small subset for each individual from $\Omega$
(3) Initialize: Train GRBMs by using $X$ to initialize $\theta_{\mathrm{ADSNT}} = \{\theta_W, \theta_b\}$
(4) (Optimization by L-BFGS)
    For $r = 1, 2, \ldots, R$ do
        Calculate $J_{\mathrm{reg}}$ using (6)
        If $r > 1$ and $|J_r - J_{r-1}| < \varepsilon$, go to Return

Return. $\theta_W$ and $\theta_b$.

Since training the ADSNT model aims to reconstruct clean data, namely, gallery images, from corrupted data, it may learn an underlying structure from the corrupted data and produce a very useful representation. Furthermore, we can learn an overcomplete sparse representation from the corrupted data by mapping them into a high-dimensional feature space, since the first hidden layer has more neurons than the dimensionality of the original data. The high-dimensional representation is then followed by a so-called "bottleneck"; that is, the data is further mapped to an abstract, compact, and low-dimensional representation in the subsequent layers of the encoder. Through such a mapping, redundant information such as illumination, pose, and partial occlusion in the corrupted faces is removed and only the useful information content is kept. In addition, we know that if we use an AE with only one hidden layer and linear activation functions, the learned weights would be analogous to a PCA subspace [20]. However, AE is an unsupervised algorithm. In our work, we make use of the class label information to train the SAE, so if we also use only one hidden layer with a linear activation function, the weights learned by the SAE are thought to be similar to an "LDA" subspace. However, in our structure, we apply nonlinear activation functions and stack several hidden layers together, so the ADSNT can adapt to very complicated nonlinear manifold structures. Some of the images reconstructed by ADSNT from the AR database are shown in Figure 4(b). One can see that ADSNT can remove the illumination. For face images with partial occlusion, ADSNT can also imitate the clean faces. These results are not surprising, because a human being has the capability of inferring unknown faces from known face images via experience (for a deep network structure, the learned experience derives from the generic set) [9].

3.4. Face Classification Based on ADSNT Image Reconstruction. To better train ADSNT, all images need to be preprocessed. This is a very important step for object recognition, including face recognition. The common approaches include histogram equalization, geometry normalization, and image smoothing. In this paper, for the sake of simplicity, we only perform histogram equalization on all the facial images to minimize illumination variations. That is, we use histogram equalization to normalize the histograms of the facial images and make them more compact. For details about histogram equalization, the reader is referred to [26].

After the ADSNT has been completely trained with a certain number of individuals, we can apply it to unseen face images in order to recognize them.

Given a test facial image $x(t)$, which is preprocessed with histogram equalization in the same way as the training images and presented to the ADSNT network, we reconstruct (using (3) and (4)) the image $\hat{x}(t)$ from ADSNT, which is similar to the clean face. For the sake of simplicity, nearest neighbor classification based on the Euclidean distance between the reconstruction and all the gallery images identifies the class. The classification formula is defined as

\[
I_k(x(t)) = \arg\min_{g} \| \hat{x}(t) - x_g \|, \quad \forall g \in \{1, 2, \ldots, c\}, \tag{8}
\]

where $I_k(x(t))$ is the resulting identity and $x_g$ is the clean facial image in the gallery images of individual $g$.

4. Experimental Results and Discussion

In this section, extensive experiments are conducted to present and compare the performance of different methods with the proposed approach. The experiments are carried out on three widely used face databases, that is, AR [27], Extended Yale B [28], and PubFig [29]. The details of these three databases and the performance evaluation of the different approaches are presented as follows.

4.1. Dataset Description. The AR database contains over 4000 color face images of 126 people (56 women and 70 men). The images were taken in two sessions (two weeks apart) and each session contained 13 pictures of each person. These images contain frontal view faces with different facial expressions, illuminations, and occlusions (sunglasses and scarf). Some sample face images from AR are illustrated in Figure 5(a). In our experiments, for each person, we choose the facial images with neutral expression, frontal pose, and normal illumination condition as gallery images and randomly select half of the remaining images of each person as probe images. The remaining images compose the testing set.

The Extended Yale B database consists of 16128 images of 38 people under 64 illumination conditions and 9 poses. Some sample face images from Extended Yale B are illustrated in Figure 5(b). For each person, we select the faces that have normal light condition and frontal pose as gallery images and randomly choose face images of 6 poses and 16 illuminations to compose the probe images. The remaining images compose the testing set.

The PubFig database is composed of 58,797 images of 200 subjects taken from the internet. The images of the database were taken in completely uncontrolled conditions with noncooperative people. These images have a very large degree of variability in facial expression, pose, illumination, and so forth. Some sample images from PubFig are illustrated in Figure 5(c). In our experiments, for each individual, we select the faces with neutral expression, the frontal or
Figure 4: Some original images from AR database and the reconstructed ones. (a) Original face images (corrupted faces) $\tilde{x}$. (b) Reconstructed faces $\hat{x}$.

near frontal pose, and normal illumination as galleries and randomly choose half of the remaining images of each person as probes. The remaining images compose the testing set.

4.2. Experimental Settings. In all the experiments, the facial images from the AR, PubFig, and Extended Yale B databases are automatically detected using the OpenCV face detector [30]. After that, we normalize the detected facial images (in orientation and scale) such that the two eyes are aligned at the same location. Then, the face areas are cropped and converted to images with 256 gray levels. The size of each cropped image is 26 × 30 pixels. Thus, the dimensionality of the input vector is 780. Figure 6 presents an example from the AR database and the corresponding cropped image. Each cropped facial image is further preprocessed with histogram equalization to minimize illumination variations. We train our ADSNT model with 3 hidden layers, where the number of hidden nodes for these layers is empirically set as [1024 → 500 → 120], because our experiments show that three hidden layers give a sufficiently good performance (see Section 4.3.3). In order to show the whole experimental process of parameter setting, we initially use the hyperbolic tangent function as the nonlinear activation function and run ADSNT on AR. We again choose the face images with neutral expression, frontal pose, and normal illumination as galleries and randomly select half of the remaining images of each person as probe images. The remaining images compose the testing set. The mean identification rates are recorded.

Firstly, we empirically set the parameter $\varepsilon = 0.001$ and the sparsity target $\rho_0 = 0.05$ and fix the parameters $\lambda = 0.5$ and $\varphi = 0.1$ in ADSNT to check the effect of $\gamma$ on the identification rate. As illustrated in Figure 6(a), the ADSNT recognition method achieves the best performance when $\gamma = 0.08$.
Figure 5: A fraction of samples from AR, PubFig, and Extended Yale B face databases. (a) AR, (b) Extended Yale B, and (c) PubFig.

Then, according to Figure 6(a), we fix the parameters $\gamma = 0.08$ and $\lambda = 0.5$ in ADSNT to check the influence of $\varphi$. As shown in Figure 6(b), when $\varphi = 0.6$, our method achieves the best recognition rate. At last, we fix $\gamma = 0.08$ and $\varphi = 0.6$, and the recognition rates for different values of $\lambda$ are illustrated in Figure 6(c). When $\lambda = 3$, the recognition rate is the highest. From the plots in Figure 6, one can observe that the parameters $\lambda$, $\varphi$, and $\gamma$ cannot be too large or too small. If $\lambda$ is too large, the ADSNT becomes less discriminative between different subjects because the similarity preservation term is too strong. But if $\lambda$ is too small, the similarity preservation term loses its significance and the recognition performance degrades. Similarly, $\gamma$ cannot be too large, or the hidden neurons will not be activated for a given input and a low recognition rate will be achieved. If $\gamma$ is too small, the performance is also poor. For the weight decay $\varphi$, if it is too small, the values of the weights for all hidden units will change very slightly. On the contrary, the values of the weights will change greatly.

From these experiments, we obtain the optimal parameter values for ADSNT on the AR database as $\lambda = 3$, $\varphi = 0.6$, and $\gamma = 0.08$. Similar experiments have also been performed on the Extended Yale B and PubFig databases. They give the parameter settings $\lambda = 2.6$, $\varphi = 0.5$, and $\gamma = 0.06$ on the Extended Yale B database and $\lambda = 2.8$, $\varphi = 0.52$, and $\gamma = 0.09$ on the PubFig database.

In the experiments, we use two measures including the mean identification accuracy $\mu$ with standard deviation
Figure 6: Parameters setting. (a) Identification rate (%) versus the parameter $\gamma$, with $\lambda = 0.5$ and $\varphi = 0.1$; (b) identification rate (%) versus the parameter $\varphi$, with $\gamma = 0.08$ and $\lambda = 0.5$; (c) identification rate (%) versus the parameter $\lambda$, with $\varphi = 0.6$ and $\gamma = 0.08$.

$\nu$ ($\mu \pm \nu$) and the receiver operating characteristic (ROC) curves to validate the effectiveness of our method as well as the other methods.

4.3. Experimental Results and Analysis

4.3.1. Comparison with Different Methods. In the following experiments on the three databases, we compare the proposed approach with several recently proposed methods. The compared methods include DAE with 10% random mask noise [12], marginalized DAE (MDAE) [17], Contractive Autoencoders (CAE) [15], Deep Lambertian Networks (DLN) [31], stacked supervised autoencoder (SSAE) [9], ICA-Reconstruction (RICA) [32], and Template Deep Reconstruction Model (TDRM) [20]. We use the implementations of these algorithms that are provided by the respective authors. For all the compared approaches, we use the default parameters that are recommended in the corresponding papers.

Table 1: Comparisons of the average identification accuracy and standard deviation (%) of different approaches on different databases.

Method        AR             Extended Yale B   PubFig
DAE [12]      57.56 ± 0.2    63.45 ± 1.3       61.33 ± 1.5
MDAE [17]     67.80 ± 1.3    71.56 ± 1.6       70.55 ± 2.5
CAE [15]      49.50 ± 2.1    55.72 ± 0.8       68.56 ± 1.6
DLN [31]      NA             81.50 ± 1.4       77.60 ± 1.4
SSAE [9]      85.21 ± 0.7    82.22 ± 0.3       84.04 ± 1.2
RICA [32]     76.33 ± 1.7    70.44 ± 1.3       72.35 ± 1.5
TDRM [20]     87.70 ± 0.6    86.42 ± 1.2       89.90 ± 0.9
Our method    92.32 ± 0.7    93.66 ± 0.4       91.26 ± 1.6

The mean identification accuracy with standard deviations of the different approaches on the three databases is shown in Table 1. The ROC curves of the different approaches are illustrated in Figure 7. The results imply that our approach significantly outperforms the other methods and achieves the best mean recognition rates for the same setting of training and testing sets. Compared to unsupervised deep learning methods such as DAE, MDAE, CAE, DLN, and TDRM, the improvement of our method is over 30% on the Extended Yale B and AR databases, where there is little pose variance. On the PubFig database, our approach can also achieve the mean identification rate of 91.26 ± 1.6%
Figure 7: Comparisons of ROC curves (true positive rate versus false positive rate) between our method and other methods on different databases. (a) AR, (b) Extended Yale B, and (c) PubFig.
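The two measures behind Table 1 and Figure 7 can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names `mean_and_std` and `roc_points` are hypothetical, the use of the population standard deviation is an assumption (the text does not say which form is used), and the ROC sweep assumes similarity scores where a higher value means a more likely genuine match.

```python
import statistics


def mean_and_std(accuracies):
    """Return (mu, nu): the mean accuracy over runs and its standard deviation."""
    mu = statistics.mean(accuracies)
    # Assumption: population standard deviation; the text does not say
    # whether the population or the sample form is used.
    nu = statistics.pstdev(accuracies)
    return mu, nu


def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs for an ROC curve.

    scores: similarity scores, where a higher score means a more likely match.
    labels: 1 for a genuine (same-subject) pair, 0 for an impostor pair.
    Both classes must be present, otherwise a rate would divide by zero.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the decision threshold step by step and count the accepted pairs.
    for threshold in sorted(set(scores), reverse=True):
        accepted = [y for s, y in zip(scores, labels) if s >= threshold]
        true_pos = sum(accepted)
        false_pos = len(accepted) - true_pos
        points.append((false_pos / negatives, true_pos / positives))
    return points
```

Calling `mean_and_std` on the accuracies of several random train/test splits gives the 𝜇 ± 𝜈 style of entry used in Table 1, and plotting the pairs returned by `roc_points` gives a curve of the kind shown in Figure 7.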

Our approach outperforms all compared methods. The reason is that our method can extract discriminative information that is robust to variations (expression, illumination, pose, etc.) in the learned deep networks. Compared with a supervised method like RICA, the proposed method improves by about 16%, 19%, and 23% on the AR, PubFig, and Extended Yale B databases, respectively. Our method is a deep learning method that addresses the nonlinear classification problem by learning a nonlinear mapping, so that more nonlinear, discriminant information can be exploited to enhance the identification performance. Compared with the SSAE method, which is designed to remove variations such as illumination, pose, and partial occlusion, our method is still better by over 6% because it uses the weight penalty terms, GRBM to initialize the weights, and the three layers' similarity preservation term.

4.3.2. Convergence Analysis. In this subsection, we evaluate the convergence of our ADSNT versus the number of iterations. Figure 8 illustrates the value of the objective function of ADSNT versus the number of iterations on the AR, PubFig, and Extended Yale B databases. From Figure 8(a), one can observe that ADSNT converges in about 55, 28, and 70 iterations on the three databases, respectively. We also measure the identification accuracy of ADSNT versus the number of iterations on the AR, PubFig, and Extended Yale B databases. Figure 8(b) plots the mean
Figure 8: Convergence analysis. (a) Convergence curves (objective function value versus iteration number) of ADSNT on AR, PubFig, and Extended Yale B. (b) Mean identification rate (%) versus iterations of ADSNT on AR, PubFig, and Extended Yale B.

identification rate of ADSNT. From Figure 8(b), one can also observe that ADSNT achieves stable performance after about 55, 70, and 28 iterations on the AR, PubFig, and Extended Yale B databases, respectively.

4.3.3. The Effect of Network Depth. In this subsection, we conduct experiments on the three face datasets with different numbers of hidden layers in our proposed ADSNT network. The proposed method achieves identification rates of 92.3 ± 0.6%, 93.3 ± 1.2%, and 91.22 ± 0.8% with the three-hidden-layer ADSNT network, that is, 1024 → 500 → 120, on the AR, Extended Yale B, and PubFig datasets, respectively. Figure 9 illustrates the performance of ADSNT with different numbers of layers. One can observe that the 3-layer network outperforms the 2-layer network, and the results of the 3-layer ADSNT network are very nearly equal to those of the 4-layer network on the AR and Extended Yale B databases. We also observe that the performance of the 4-layer network is slightly lower than that of the 3-layer network on the PubFig database. In addition, the deeper the ADSNT network is, the higher its computational complexity becomes. Therefore, the 3-layer network depth is a good trade-off between performance and computational complexity.

Figure 9: The results (identification accuracy, %) of ADSNT with different network depth (Layer 1 to Layer 4) on the different datasets (AR, Extended Yale B, and PubFig).

4.3.4. Activation Function. Following the work in [9], we also evaluate the performance of ADSNT with different activation functions such as sigmoid, hyperbolic tangent, and rectified linear unit (ReLU) [33], which is defined as 𝑓(𝑥) = max(0, 𝑥). When the sigmoid 𝑓(𝑥) = 1/(1 + 𝑒^(−𝑥)) is used as the activation function, the objective function (see (6)) is rewritten as follows:

$$
\begin{aligned}
\arg\min_{\theta_{\mathrm{ADSNT}}} J ={}& \frac{1}{M}\sum_{i}\left\|x_i-\hat{x}_i\right\|^{2}
+\frac{\lambda\,\theta_{W}}{M}\sum_{i}\left\|f(x_i)-f(\tilde{x}_i)\right\|^{2}\\
&+\frac{\varphi}{2}\left(\sum_{j}^{3}\left\|W_e^{(j)}\right\|_F^{2}+\sum_{j}^{3}\left\|W_d^{(j)}\right\|_F^{2}\right)\\
&+\gamma\left(\sum_{i}^{3}\mathrm{KL}\left(\rho_x\,\|\,\rho_0\right)+\sum_{i}^{5}\mathrm{KL}\left(\rho_{\tilde{x}}\,\|\,\rho_0\right)\right),
\end{aligned}
\tag{9}
$$

where $\rho_x = (1/M)\sum_i f(x_i)$ and $\rho_{\tilde{x}} = (1/M)\sum_i f(\tilde{x}_i)$.
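To make the structure of the sigmoid objective in (9) concrete, the following sketch evaluates its four terms (reconstruction, similarity preservation, weight decay, and KL sparsity) for one batch. It is a minimal illustration under stated assumptions, not the authors' implementation: all function and argument names are hypothetical, the factor 𝜃_W is treated as a plain scalar (`theta_w`, default 1.0) because its definition lies outside this section, and KL(𝜌 ‖ 𝜌0) is taken as the usual sum over hidden units of the Bernoulli KL divergence used in sparse autoencoders.

```python
import math


def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kl_sparsity(activations, rho0, eps=1e-8):
    """Sum over hidden units of KL(rho || rho0), with rho the mean activation.

    activations: one row of sigmoid outputs (values in (0, 1)) per sample.
    """
    m = len(activations)
    total = 0.0
    for unit in zip(*activations):
        # rho = (1/M) * sum_i f(x_i) for this hidden unit, clipped for safety.
        rho = min(max(sum(unit) / m, eps), 1.0 - eps)
        total += rho * math.log(rho / rho0)
        total += (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho0))
    return total


def sigmoid_objective(x, x_hat, code_clean, code_corrupt, weights,
                      layers_clean, layers_corrupt,
                      lam, phi, gamma, rho0, theta_w=1.0):
    """Evaluate the four terms of the sigmoid objective for one batch.

    x, x_hat: clean inputs and their reconstructions (one vector per sample).
    code_clean, code_corrupt: codes f(x_i) and f(x~_i) of clean/corrupted inputs.
    weights: all encoder and decoder weight matrices as nested lists.
    layers_clean, layers_corrupt: per-layer activation batches that enter the
        sparsity sums (assumed: 3 layers for x, 5 layers for x~).
    theta_w: assumed scalar factor; its definition is not given in this section.
    """
    m = len(x)
    reconstruction = sum(squared_l2(a, b) for a, b in zip(x, x_hat)) / m
    similarity = (lam * theta_w / m) * sum(
        squared_l2(a, b) for a, b in zip(code_clean, code_corrupt))
    # Squared Frobenius norm of every encoder and decoder weight matrix.
    decay = (phi / 2.0) * sum(w * w for mat in weights for row in mat for w in row)
    sparsity = gamma * (
        sum(kl_sparsity(layer, rho0) for layer in layers_clean)
        + sum(kl_sparsity(layer, rho0) for layer in layers_corrupt))
    return reconstruction + similarity + decay + sparsity
```

Keeping the four terms as separate variables makes it easy to see how 𝜆, 𝜑, and 𝛾 trade off against the reconstruction error, which is the balance explored in the parameter study of Figure 6.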
If ReLU is adopted as the activation function, (6) is formulated as

$$
\begin{aligned}
\arg\min_{\theta_{\mathrm{ADSNT}}} J ={}& \frac{1}{M}\sum_{i}\left\|x_i-\hat{x}_i\right\|^{2}
+\frac{\lambda\,\theta_{W}}{M}\sum_{i}\left\|f(x_i)-f(\tilde{x}_i)\right\|^{2}\\
&+\frac{\varphi}{2}\left(\sum_{j}^{3}\left\|W_e^{(j)}\right\|_F^{2}+\sum_{j}^{3}\left\|W_d^{(j)}\right\|_F^{2}\right)\\
&+\gamma\left(\sum_{i}^{3}\left\|f(x_i)\right\|_{1}+\sum_{i}^{5}\left\|f(\tilde{x}_i)\right\|_{1}\right).
\end{aligned}
\tag{10}
$$

Table 2 shows the performance of the proposed ADSNT based on different activation functions on the three databases. From Table 2, one can see that ReLU achieves the best performance. The key reason is that we use the weight decay term 𝜑 to optimize the objective function.

Table 2: Comparisons of the ADSNT algorithm with different activation functions on the AR, PubFig, and Extended Yale B databases.

Dataset           Sigmoid        Tanh           ReLU
AR                88.66 ± 1.4    92.32 ± 0.7    93.22 ± 1.5
Extended Yale B   90.55 ± 0.6    93.66 ± 0.4    94.54 ± 0.3
PubFig            87.40 ± 1.2    91.26 ± 1.6    92.44 ± 1.1

4.3.5. Timing Consumption Analysis. In this subsection, we use an HP Z620 workstation with an Intel Xeon E5-2609 2.4 GHz CPU and 8 GB RAM, and conduct a series of experiments on the AR database to compare the time consumption of different methods, which is tabulated in Table 3. The training time (seconds) is shown in Table 3(a), while the time (seconds) needed to recognize a face from the testing set is shown in Table 3(b). From Table 3, one can see that the proposed method requires comparatively more time for training because of the initialization of ADSNT and the image reconstruction. However, the training procedure is offline. When we identify an image from the testing set, our method requires less time than the other methods.

Table 3: (a) Training time (seconds) for different methods. (b) Testing time (seconds) for different methods. The proposed method costs the least amount of testing time compared with other methods.

Methods       (a) Training time (s)   (b) Testing time (s)
DAE [12]      8.61                    0.27
MDAE [17]     10.54                   0.3
CAE [15]      7.86                    0.26
DLN [31]      23.43                   0.35
SSAE [9]      53.51                   0.22
RICA [32]     13.44                   0.19
TDRM [20]     110.2                   0.18
Our method    122.32                  0.13

5. Conclusions

In this article, we present an adaptive deep supervised autoencoder based image reconstruction method for face recognition. Unlike conventional deep autoencoder based face recognition methods, our method considers the class label information from training samples in the deep learning procedure and can automatically discover the underlying nonlinear manifold structures. Specifically, a multilayer supervised adaptive network structure is presented, which is trained to extract characteristic features from corrupted/clean facial images and reconstruct the corresponding similar facial images. The reconstruction is realized by a so-called “bottleneck” neural network that learns to map face images into a low-dimensional vector and to reconstruct the respective corresponding face images from the mapping vectors. Having trained the ADSNT, a new face image can then be recognized by comparing its reconstruction image with individual gallery images during testing. The proposed method has been evaluated on the widely used AR, PubFig, and Extended Yale B databases, and the experimental results have shown its effectiveness. For future work, we are focusing on applying our proposed method to other application fields, such as pattern classification based on image sets and action recognition based on video, to further demonstrate its validity.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is partially supported by the research grant for the Natural Science Foundation from Sichuan Provincial Department of Education (Grant no. 13ZB0336) and the National Natural Science Foundation of China (Grant no. 61502059).

References

[1] J.-T. Chien and C.-C. Wu, “Discriminant waveletfaces and nearest feature classifiers for face recognition,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1644–1649, 2002.
[2] Z. Lei, M. Pietikäinen, and S. Z. Li, “Learning discriminant face descriptor,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 2, pp. 289–302, 2014.
[3] B. Moghaddam, T. Jebara, and A. Pentland, “Bayesian face recognition,” Pattern Recognition, vol. 33, no. 11, pp. 1771–1782, 2000.
[4] N. Cristianini and J. S. Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press, New York, NY, USA, 2004.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, New York, NY, USA, 2nd edition, 2001.
[6] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, “Face recognition: a literature survey,” ACM Computing Surveys, vol. 35, no. 4, pp. 399–458, 2003.
[7] B. Zhang, S. Shan, X. Chen, and W. Gao, “Histogram of Gabor phase patterns (HGPP): a novel object representation approach for face recognition,” IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 57–68, 2007.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Neural Information Processing Systems, pp. 1527–1554, 2012.
[9] S. Gao, Y. Zhang, K. Jia, J. Lu, and Y. Zhang, “Single sample face recognition via learning deep supervised autoencoders,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 10, pp. 2108–2118, 2015.
[10] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” American Association for the Advancement of Science. Science, vol. 313, no. 5786, pp. 504–507, 2006.
[11] Y. Bengio, “Practical recommendations for gradient-based training of deep architectures,” in Neural Networks: Tricks of the Trade, pp. 437–478, Springer, Berlin, Germany, 2012.
[12] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research (JMLR), vol. 11, no. 5, pp. 3371–3408, 2010.
[13] V. Nair and G. E. Hinton, “Rectified linear units improve Restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML ’10), pp. 807–814, Haifa, Israel, June 2010.
[14] A. Coates, H. Lee, and A. Y. Ng, “An analysis of single-layer networks in unsupervised feature learning,” in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS ’11), pp. 215–223, Sardinia, Italy, 2010.
[15] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, “Contractive auto-encoders: explicit invariance during feature extraction,” in Proceedings of the 28th International Conference on Machine Learning (ICML ’11), pp. 833–840, Bellevue, Wash, USA, July 2011.
[16] K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep fisher networks for large-scale image classification,” in Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS ’13), pp. 163–171, Lake Tahoe, Nev, USA, December 2013.
[17] M. Chen, Z. Xu, K. Q. Weinberger, and F. Sha, “Marginalized denoising autoencoders for domain adaptation,” in Proceedings of the 29th International Conference on Machine Learning (ICML ’12), pp. 767–774, Edinburgh, UK, July 2012.
[18] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, “DeepFace: closing the gap to human-level performance in face verification,” in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’14), pp. 1701–1708, June 2014.
[19] Z. Zhu, P. Luo, X. Wang, and X. Tang, “Deep learning identity-preserving face space,” in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV ’13), pp. 113–120, Sydney, Australia, December 2013.
[20] M. Hayat, M. Bennamoun, and S. An, “Deep reconstruction models for image set classification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 4, pp. 713–727, 2015.
[21] Y. Sun, X. Wang, and X. Tang, “Deep learning face representation from predicting 10,000 classes,” in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’14), pp. 1891–1898, Columbus, Ohio, USA, June 2014.
[22] Y. Sun, X. Wang, and X. Tang, “Deep learning face representation by joint identification-verification,” Tech. Rep., https://arxiv.org/abs/1406.4773.
[23] X. Cai, C. Wang, B. Xiao, X. Chen, and J. Zhou, “Deep nonlinear metric learning with independent subspace analysis for face verification,” in Proceedings of the 20th ACM International Conference on Multimedia (MM ’12), pp. 749–752, November 2012.
[24] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[25] Q. V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, and A. Y. Ng, “On optimization methods for deep learning,” in Proceedings of the 28th International Conference on Machine Learning (ICML ’11), pp. 265–272, Bellevue, Wash, USA, July 2011.
[26] C. Zhou, X. Wei, Q. Zhang, and X. Fang, “Fisher’s linear discriminant (FLD) and support vector machine (SVM) in non-negative matrix factorization (NMF) residual space for face recognition,” Optica Applicata, vol. 40, no. 3, pp. 693–704, 2010.
[27] A. Martinez and R. Benavente, “The AR face database,” CVC Tech. Rep. #24, 1998.
[28] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, “From few to many: illumination cone models for face recognition under variable lighting and pose,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643–660, 2001.
[29] N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, “Attribute and simile classifiers for face verification,” in Proceedings of the 12th International Conference on Computer Vision (ICCV ’09), pp. 365–372, Kyoto, Japan, October 2009.
[30] M. Rezaei and R. Klette, “Novel adaptive eye detection and tracking for challenging lighting conditions,” in Computer Vision - ACCV 2012 Workshops, J.-I. Park and J. Kim, Eds., vol. 7729 of Lecture Notes in Computer Science, pp. 427–440, Springer, Berlin, Germany, 2013.
[31] Y. Tang, R. Salakhutdinov, and G. H. Hinton, “Deep Lambertian networks,” in Proceedings of the 29th International Conference on Machine Learning (ICML 2012), pp. 1623–1630, Edinburgh, UK, July 2012.
[32] Q. V. Le, A. Karpenko, J. Ngiam, and A. Y. Ng, “ICA with reconstruction cost for efficient overcomplete feature learning,” in Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS ’11), pp. 1017–1025, Granada, Spain, December 2011.
[33] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, pp. 315–323, 2011.