Efficient Facial Expression Recognition Algorithm Based On Hierarchical Deep Neural Network Structure
ABSTRACT With the continued development of artificial intelligence (AI) technology, research on interaction technology has become more popular. Facial expressions are an important type of visual information that can be used to understand a human's emotional state. In particular, the importance of facial expression recognition (FER) has recently increased with advances in AI systems applied to AI robots. In this paper, we propose a new scheme for an FER system based on hierarchical deep learning. The feature extracted from the appearance feature-based network is fused with the geometric feature in a hierarchical structure. The appearance feature-based network extracts holistic features of the face using a preprocessed LBP image, whereas the geometric feature-based network learns the coordinate changes of the action unit (AU) landmarks, the facial muscle positions that mainly move when making facial expressions. The proposed method combines the softmax results of the two features by considering the error associated with the second-highest emotion (Top-2) prediction result. In addition, we propose a technique to generate facial images with a neutral emotion using an autoencoder. With this technique, we can extract the dynamic facial features between the neutral and emotional images without sequence data. We compare the proposed algorithm with other recent algorithms on the CK+ and JAFFE datasets, which are widely used benchmark datasets in facial expression recognition. The ten-fold cross-validation results show an accuracy of 96.46% on the CK+ dataset and 91.27% on the JAFFE dataset. Compared with other methods, the proposed hierarchical deep network structure improves accuracy by up to about 3%, with an average improvement of 1.3%, on the CK+ dataset. On the JAFFE dataset, the accuracy is improved by up to about 7%, with an average improvement of about 1.5%.
INDEX TERMS Artificial intelligence (AI), facial expression recognition (FER), emotion recognition, deep
learning, LBP feature, geometric feature, convolutional neural network (CNN).
I. INTRODUCTION
Technologies for communication have traditionally been developed based on the senses that play a major role in human interaction [1]. In particular, artificial intelligence voice recognition technology using the sense of hearing, as in AI speakers, has been commercialized because of improvements in artificial intelligence (AI) technology [2]. Through the use of such technologies that recognize voice and language, there are artificial intelligence robots that can interact closely with real life, in such ways as managing the daily schedules of people and playing their favorite music. However, sensory acceptance is required for interactions that more precisely mirror those of humans. Therefore, the most necessary technology is a vision sensor, as vision is a large part of human perception in most interactions. In artificial intelligence robots using interactions between a human and a machine, human faces provide important information as a
clue to understand the current state of the user. Therefore, the field of facial expression recognition has been studied extensively over the last ten years.

Recently, with the increase in relevant data and the continued development of deep learning, facial expression recognition systems which accurately recognize facial expressions in various environments have come to be actively studied. Facial expression recognition systems (FERs) are fundamentally based on an ergonomic and evolutionary approach. Based on universality, similarity, physiological, and evolutionary properties, emotions in FER studies can be classified into six categories: happiness, sadness, fear, disgust, surprise, and anger. In addition, emotions can be classified into seven categories with the addition of a neutral emotion [1], [3].

An FER system for recognizing facial expressions requires four steps. First, we need a face detection step that localizes the human face. Representative algorithms include Adaboost [4], the Haar cascade [5], and the histogram of oriented gradients (HOG) [6]. The second step involves face registration, with which to obtain the main feature points in order to recognize face rotations or muscle movements. The faces obtained after the detection step are inclined to be degraded in terms of recognition accuracy due to the potential for various illuminations and rotations. Therefore, it is necessary to improve the image by obtaining landmarks, which are the positions of the main muscle movements when one is making a facial expression. The positions that define the contraction of the facial muscles are called action units (AUs), and the main positions include the eyebrows, eyes, nose, and mouth [7]. A typical algorithm is active appearance models (AAM) [8]. Third, features that can recognize facial expressions are extracted by acquiring the motion or position information of the feature points in the feature extraction step.

To this end, the approach can be divided into appearance feature-based and geometric feature-based methods. The appearance feature-based method is a feature extraction method for the entire facial image. It involves dimension reduction through a fusion with binary feature extraction, which is widely applied in the field of facial studies. Principal component analysis (PCA) and linear discriminant analysis (LDA) are typical dimension reduction methods. The local binary pattern (LBP) and local directional pattern (LDP) techniques are binary feature extraction methods [9], [10] for representing facial expressions. The geometric feature-based method extracts the geometric position of the face or the value of the change in facial muscle movement. Finally, based on the obtained features, a classification step is needed to classify the defined emotions using a support vector machine (SVM) or the hidden Markov model (HMM) [11], [12].

Despite the fact that many algorithms have been studied, some problems still remain, such as illumination changes, rotations, occlusions, and accessories [3]. These are not only classical problems involved in image processing, but also factors that cause hardship for capturing the action units of facial recognition. Aside from environmental changes, there is a problem with the lack of appropriate datasets.

In this paper, we propose an efficient algorithm to improve the recognition accuracy by a hierarchical deep neural network structure which can re-classify the result (the Top-2 error emotion). The feature extracted from the appearance feature-based network is fused with the geometric feature in a hierarchical structure. The proposed scheme combines the two features to obtain a more accurate result by considering the error associated with the second-highest emotion (Top-2) prediction result.

The rest of this paper is organized as follows: In Section II, we discuss various existing algorithms for FER. Section III presents the proposed FER algorithm using deep learning based on the appearance feature and the geometric feature. The experimental results are reported in Section IV. Finally, the concluding remarks are presented in Section V.

II. RELATED WORKS
We describe related works on facial expression recognition systems that have been studied to date. These algorithms can largely be divided into the classical feature extraction method and the deep learning-based method. The classical feature extraction methods can be roughly classified further into the appearance feature extraction method, which extracts the features of the entire facial region, and the geometric feature extraction method, which extracts geometric elements of the facial structure and the motion of the facial muscles. In the following sub-sections, we will describe some of the recent algorithms involving the appearance feature extraction method, the geometric feature extraction method, and deep learning-based facial expression recognition algorithms.

A. CLASSICAL FEATURE FER APPROACHES
Liu et al. [13] and Happy and Routray [14] reported representative FER algorithms using LBP. In Liu et al. [13], active patches were defined around 68 landmarks extracted by the active appearance model (AAM), and the features were extracted using LBP for the patches. In this way, they eliminated unnecessary parts of the face and reduced the effect of environmental changes. This improved the accuracy by using the more robust features obtained from the main facial muscles with patch centering. In Happy and Routray [14], rather than using other existing algorithms that extract facial landmarks, they detected the points of the eyebrows, eyes, nose, and mouth corners by applying Sobel edge detection, the Otsu algorithm, morphological operations, etc. By defining the active facial patches and extracting the LBP histogram feature of each patch, the features of 19 patches were extracted from the main facial muscles, such as the forehead, nose, and mouth, which move when facial expressions are generated. The local directional ternary pattern (LDTP) feature extraction method, based on the two largest directions after performing a matrix operation with the Robinson compass masks, was used by Ryu et al. [15]. This algorithm formed a 17 × 17 block around 42 landmarks selected by the active patterns
through LDTP, and extracted an LDTP histogram from it. Through this process, robust features were extracted, and more accurate emotion recognition was achieved by extracting more information from the strong response regions.
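For illustration, the patch-based LBP histogram idea shared by [13], [14] can be sketched as follows. This is our illustrative code, not the authors' implementation; the patch size, the LBP parameters, and the source of the patch centers are assumptions.

```python
# Sketch: patch-based LBP histogram features in the spirit of [13], [14].
# Patch centers would come from a landmark detector (AAM, dlib, etc.).
import numpy as np
from skimage.feature import local_binary_pattern

def patch_lbp_histograms(gray, centers, patch=32, P=8, R=1):
    """Extract an LBP histogram around each (x, y) patch center."""
    feats = []
    half = patch // 2
    for (x, y) in centers:
        top, left = max(y - half, 0), max(x - half, 0)
        region = gray[top:top + patch, left:left + patch]
        lbp = local_binary_pattern(region, P, R, method="uniform")
        # "uniform" LBP with P neighbors yields P + 2 distinct codes.
        hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
        feats.append(hist)
    return np.concatenate(feats)  # one fixed-length vector per face
```

The concatenated histograms then serve as the appearance feature vector fed to a classifier such as an SVM.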
When emotions are expressed, the formation of patches around the mainly moving facial muscles and the application of this information to appearance-based feature extraction techniques have recently contributed to the development of various algorithms applying deep learning.

A typical algorithm that uses geometric features extracts the temporal or dynamic changes of the landmarks of the face. Kotsia and Pitas [16] used the Candide wireframe model to predict facial emotions by extracting geometric features around face landmarks. In this algorithm, the grid was traced in the sequential dataset. The geometric and dynamic information regarding the emotion changes was extracted from the difference between the first neutral face grid and the peak emotion grid of the last frame. Finally, the emotion was classified using a support vector machine (SVM) by combining the values with the facial action units (FAUs), including the facial change according to the emotions.

A dynamic texture-based approach to classifying emotions using a free-form deformation (FFD) technique for tracking the motion direction of AUs in image sequences was proposed by Koelstra et al. [17]. The extracted representation based on motion history was used to derive the motion direction histogram descriptor in the spatial and temporal domains. The extracted features were finally combined with the GentleBoost algorithm and an HMM in order to classify the emotions.

These geometric feature extraction methods can reduce the degradation of accuracy due to illumination or external changes by tracking the movement of geometric coordinates extracted from the main AUs. Therefore, such geometric feature-based algorithms have been studied by many researchers in order to improve accuracy by the fusion of feature extraction methods.
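The core computation shared by these geometric methods, the displacement of landmarks between a neutral frame and a peak-expression frame, can be sketched as follows. This is illustrative only; the landmark detector and the normalization choice are assumptions.

```python
# Sketch: geometric displacement features in the spirit of [16], [17].
# `neutral` and `peak` are (N, 2) arrays of landmark (x, y) coordinates.
import numpy as np

def displacement_features(neutral, peak):
    """Normalized landmark displacements between neutral and peak frames."""
    delta = peak.astype(np.float64) - neutral.astype(np.float64)
    # Normalize by the inter-ocular distance to reduce scale sensitivity.
    # Indices 36 and 45 are the outer eye corners in the common 68-point
    # layout; this is an assumption, adjust for other layouts.
    iod = np.linalg.norm(neutral[45] - neutral[36])
    return (delta / iod).ravel()  # shape (2N,), input to an SVM or HMM
```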
B. CNN-BASED FER APPROACHES
Recently, due to the development of big data and the improvement of hardware technology, many algorithms based on deep learning have been researched. Since the field of FER is being influenced by these advancements as well, more robust and efficient feature recognition has been achieved through the automatic learning of the extracted facial features. In this section, we introduce the CNN-based FER algorithms.

Lopes et al. [18] suggested a representative facial expression algorithm applying deep learning based on CNN. It uses a data augmentation process to resolve the scarcity of FER datasets and to make facial emotion recognition robust to changes such as rotation and translation. In this algorithm, excluding the parts with unnecessary elements around the face, the AUs are cropped into blocks centered on each action unit, and the emotions are classified into six to seven emotions through a CNN. In such algorithms, the lack of datasets required for deep learning is solved by data augmentation methods, and FER research based on CNN is being actively studied.

FER approaches using dual networks, which fuse both holistic features of the face and partial features focused on facial landmarks, have been studied by Jung et al. [19], Xie and Hu [20], and Yang et al. [21]. In these approaches, one CNN extracts features using facial gray-scale images, while the other network extracts features using image patches or landmark changes. Finally, these features are fused by a weighting function or fully connected learning.

A method of extracting temporal features and spatial features, combining their softmax outputs to predict emotions, was used by Zhang et al. [22]. The temporal features were extracted so as to learn temporal landmarks using the part-based hierarchical bidirectional recurrent neural network (PHRNN), while the multi-signal convolutional neural network (MSCNN) extracted holistic features covering the overall facial appearance. The PHRNN classifies the facial landmarks into each AU and hierarchically learns the movements of AUs, which change with time. The MSCNN learns the facial gray-scale images in order to extract the entire appearance features. Using these two networks, more accurate facial expression recognition could be made possible by considering both temporal and spatial features.

Despite many studies, the recognition rate is still not high enough, due to the influence of various environmental changes, such as lighting and accessories, as well as differences in the characteristics of individual people. Therefore, in this paper, we propose an efficient FER scheme with robust features combining two types of features using deep learning.

III. PROPOSED ALGORITHM
The proposed algorithm is shown in Figure 1. In this study, we propose an efficient algorithm to improve the recognition accuracy by a hierarchical deep neural network structure which can re-classify the result (the Top-2 error emotion), which is the most frequent error. The first network is a convolutional neural network which focuses on AUs using the LBP feature, a typical feature extraction technique in the field of facial studies [9]. The second network extracts the geometric changes of the landmarks of each AU and learns all pairs of the six emotions. Based on the two features, the proposed algorithm combines them using an adaptive weighting function to give the final result.

A. OBSERVATION: ERROR RATIO OF TOP-2 SELECTION
The ratio of the correct answer was measured by using the 6-length softmax results in order to determine the cause and ratio of the facial recognition errors when only using a single network. The experiment was performed on two datasets with 150 images per emotion using the appearance feature-based CNN, and it was conducted with a 10-fold cross-validation method.

The entire dataset was divided into 10 sets, 9 of which were employed for training while the other one was used for
verification. The softmax results for the six emotions were sorted in descending order, and the rank at which the true answer appeared was counted. The number of counts was divided by the total number of validation data points in the fold, and the ratio was measured for each fold. The results are shown in Tables 1 and 2.

TABLE 1. Top-2 error rate in the CK+ dataset.

TABLE 2. Top-2 error rate in the JAFFE dataset.

As shown in Tables 1 and 2, with Top-1 taken as the correctly predicted result and Top-2 ∼ Top-6 counted as errors, the case in which the correct answer was the second-highest label (Top-2) was observed at 4.2% in the CK+ dataset, which covers 90.5% of the 4.7% total error. In addition, the JAFFE dataset had the largest Top-2 error rate, at 8.0%, which accounts for 75.3% of the 10.6% total error.

As a result, we can see that the error is biased toward the Top-2 label: on average across the datasets, Top-2 errors occurred at a rate of more than 82% of the total error. From this viewpoint, we can expect a structure that reduces the Top-2 error rate by a refined classification to improve the recognition accuracy.
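A sketch of how such Top-k ratios can be measured from softmax outputs is given below; the code and variable names are ours, for illustration only.

```python
# Sketch: measuring where the true label ranks in the sorted softmax
# output, as summarized in Tables 1 and 2 (illustrative, not the authors').
import numpy as np

def top_k_ratios(probs, labels, n_classes=6):
    """probs: (M, n_classes) softmax outputs; labels: (M,) true indices.
    Returns the fraction of samples whose true label ranks k-th."""
    order = np.argsort(-probs, axis=1)                   # high -> low
    ranks = np.argmax(order == labels[:, None], axis=1)  # 0 means Top-1
    return np.bincount(ranks, minlength=n_classes) / len(labels)

# ratios[1] is the Top-2 rate: the share of samples whose correct
# emotion was the network's second choice.
```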
To design an efficient scheme, robust features are extracted using hierarchical fusion of the two types of networks, as shown in Figure 1. The facial expression is predicted as one of six emotions: anger, disgust, fear, happy, sad, and surprise. First, the appearance feature-based network uses the LBP image in order to learn the holistic characteristics of the face in one frame. Secondly, the geometric feature-based network learns the changes of eighteen x and y coordinates among the facial landmarks, which mainly move according to emotional changes. In the predicted result of the appearance feature-based network, the two highest softmax values among the emotions are weighted with the results of the geometric feature-based network, respectively. Then, robust features are generated by fusing the different types of features. Finally, we can obtain the predicted final emotion. In the following sub-sections, each module is explained in detail.
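A simplified sketch of this hierarchical Top-2 fusion flow is given below. The fixed weight w is an illustrative stand-in for the adaptive weighting function considered in this paper, and the ordering of the pair-model output is an assumption.

```python
# Sketch of the Top-2 fusion flow (w is a fixed illustrative weight).
import numpy as np

def fuse_top2(app_softmax, geo_pair_softmax, w=0.5):
    """app_softmax: (6,) appearance-network probabilities.
    geo_pair_softmax: (2,) pair-model probabilities, ordered to match
    the two selected labels."""
    top2 = np.argsort(-app_softmax)[:2]        # two most likely emotions
    fused = w * app_softmax[top2] + (1.0 - w) * geo_pair_softmax
    return int(top2[np.argmax(fused)])         # final emotion index
```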
B. PREPROCESSING
Before the main process of facial expression recognition, it is necessary to identify a face and recognize the facial area. Therefore, the face and non-face parts must be separated through the face detection process. Preserving only the parts important for emotion recognition prevents accuracy degradation due to changes in the environment surrounding the face. In this paper, we used the face detector model of P. F. Felzenszwalb et al. [23]: a detector that uses the HOG algorithm to determine the face boundary coordinates. We cropped the facial area using this detector. In this algorithm, a linear SVM is used to identify the facial region by training HOG features from positive (containing an object) and negative (not containing an object) samples composed with a sliding window. The algorithm can be used as get_frontal_face_detector, defined in dlib [24], in order to identify and crop the facial area coordinates: left-top (x, y) and right-bottom (x, y). The cropped facial images usually extend from the middle of the forehead to the chin, and from the leftmost to the rightmost part of the face.
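A minimal sketch of this detection and cropping step is given below, using the dlib detector named above; the Gaussian kernel size and the LBP parameters (used in the blurring and LBP-image steps described next) are our assumptions.

```python
# Sketch: HOG-based face detection, cropping, blurring, and LBP image.
import cv2
import dlib
from skimage.feature import local_binary_pattern

detector = dlib.get_frontal_face_detector()

def preprocess(bgr_image):
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    rects = detector(gray, 1)                    # upsample once
    if not rects:
        return None
    r = rects[0]                                 # left-top / right-bottom box
    face = gray[max(r.top(), 0):r.bottom(), max(r.left(), 0):r.right()]
    face = cv2.GaussianBlur(face, (3, 3), 0)     # denoise before LBP
    return local_binary_pattern(face, P=8, R=1)  # input to appearance CNN
```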
After the facial region has been cropped, a blurring process is performed before creating the LBP image in order to remove noise, since the LBP image is the input of the appearance feature-based network. If the features are extracted from unfiltered facial images, it may lead to a
In a similar way, the convolutional and pooling operations are repeated three times. Finally, 256 maps of 16 × 16 size are derived as the result of the last pooling. After the convolutional and pooling layers are finished, these values are flattened and passed through the fully connected layers, which are the hidden layers. The first fully connected layer has 1024 nodes, while the second has 500 nodes.

In the proposed network, we use the dropout operation between the fully connected layers. When the network is learning, dropout turns off neurons at random, thus disturbing the learning. This can prevent an over-fitting that is biased toward the learning data [29]. We use the rectified linear unit (ReLU) as the activation function in the convolutional and fully connected layers. This activation function converts the quantitative value of the feature map produced by the convolution operation into a nonlinear value. At the end of the network, six emotions are extracted as continuous values using the softmax function. The softmax result for the six emotions is computed as follows:

$$ s_i = \frac{e^{a_i}}{\sum_{k=0}^{n-1} e^{a_k}}, \qquad (4) $$

where n corresponds to the number of emotions requiring classification and s_i is the softmax score of the i-th class. This value is the exponential of the i-th emotion score divided by the sum of the exponential values of the a_k, the scores of all categories. The error is computed by the network through this process, and the error is reduced by using the cross-entropy loss function. The loss is calculated as follows:

$$ L = -\sum_{j=0}^{n-1} y_j \log(s_j), \qquad (5) $$

where y_j is the j-th element of the correct answer vector and s_j is the output value of the softmax function. Using the cross-entropy function, it is possible to flexibly respond to various probability distributions of the model by obtaining the cross entropy through a negative log-likelihood. In addition, the process of finding the gradient is relatively simple as well [30]. To classify six categories (n = 6), if the first class is the correct one, then y = [1, 0, 0, 0, 0, 0]: the first element is one and the others are zero. In addition, we use steepest gradient descent (SGD) as an optimizer along with the calculated cross-entropy loss.
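A sketch consistent with the dimensions quoted above is given below, assuming a 128 × 128 single-channel LBP input so that three 2 × 2 poolings yield 16 × 16 maps; the kernel sizes and the channel counts of the earlier stages are our assumptions.

```python
# Sketch of the appearance network: three conv+pool stages ending in
# 256 maps of 16x16, fully connected layers of 1024 and 500 nodes with
# dropout, and a 6-way output trained with cross entropy and SGD.
import torch
import torch.nn as nn

class AppearanceCNN(nn.Module):
    def __init__(self, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # 64x64
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32x32
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # 256@16x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 16 * 16, 1024), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(1024, 500), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(500, n_classes),  # softmax is applied inside the loss
        )

    def forward(self, x):               # x: (B, 1, 128, 128) LBP images
        return self.classifier(self.features(x))

model = AppearanceCNN()
criterion = nn.CrossEntropyLoss()       # combines Eqs. (4) and (5)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```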
The results of the two emotions with the highest values among the softmax results extracted using this learned model are later used for more accurate emotion prediction by fusing them with the result of the geometric feature. Thus, the label information for these two high emotions is transmitted to the geometric feature-based network.

D. THE GEOMETRIC FEATURE-BASED NETWORK
We consider both the appearance-based feature and the geometric feature in order to reduce recognition errors by using more robust features. In the case of using only one network, the recognition accuracy is inclined to be low because of various factors, such as rotations, illuminations, and peripheral accessories. Further, in the case of fine emotional changes, it is difficult to recognize the emotion using only the holistic features of the face.

In this paper, we additionally use a geometric feature-based CNN that captures the movements of the landmarks of emotion. The feature of the partial elements, obtained by detecting the movement of the landmarks, is added to the overall features so that more robust features can be extracted. Furthermore, we detected and demonstrated that the facial expression recognition error most frequently occurs in the emotion of the second-highest probability when only using the appearance network. Based on this observation, the Top-2 values with the highest values among the six-class probabilities composed of the last softmax layer are selected. Finally, the emotion is classified through the max result of the two-network fusion by a weighted-sum calculation.
For extracting geometric features which contain the dynamic features of a face, a neutral facial image of the person depicted in the reference facial image is required. However, in a real system, there are not enough neutral facial images, and there are also some FER datasets which do not have enough neutral image data. In this case, we need to obtain enough neutral image data to learn the dynamic features of the facial expression. We suggest an autoencoder to generate neutral image data. This network can be used to train the geometric feature-based network by obtaining the difference of coordinates between the generated neutral facial image and the emotional image, and thereby to create dynamic features. The proposed autoencoder technique is presented in Figure 5.
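A minimal sketch of such an encoder-decoder is given below. The generator proposed in this paper is built on the VGG19 structure, as described next; the small network here only illustrates the encode/decode idea and is not the authors' architecture.

```python
# Sketch: an autoencoder trained to map an emotional face image to the
# same person's neutral face (illustrative architecture).
import torch.nn as nn

class NeutralFaceAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(   # compress the expressive face
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(   # reconstruct a neutral face
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training pairs: (emotional image, neutral image of the same subject).
# Landmarks of the generated neutral face then provide the coordinate
# differences used by the geometric feature-based network.
```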
This network is constructed from the VGG19 network structure [28]. The images of neutral faces can be generated using this structure. It can be divided into the encoding and
TABLE 3. Fifteen pairs of six emotions for the geometric feature-based model (AN: Angry, DI: Disgust, FE: Fear, HA: Happy, SA: Sad, SU: Surprise).

Each pair-wise model is learned in the same way as the appearance feature-based network, and dropout is used to overcome the overfitting problem. The loss function for learning is calculated using the cross-entropy loss function, in the same way as in the appearance feature-based network. The model for each pair is stored, and the model corresponding to the Top-2 pair from the result of the softmax of the previously obtained appearance model is selected. In this way, we consider a weighting function to determine the final emotion.
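To make the data flow concrete, a sketch of building the geometric input and selecting the stored pair-wise model follows; the landmark indices and the model container are illustrative assumptions.

```python
# Sketch: geometric input from the generated neutral face, and selection
# of the pair-wise model matching the appearance network's Top-2 labels.
import numpy as np

AU_IDX = np.arange(18)  # placeholder indices of the 18 AU landmarks

def geometric_input(neutral_pts, emotion_pts):
    """Coordinate differences of the AU landmarks, flattened to (36,)."""
    d = emotion_pts[AU_IDX] - neutral_pts[AU_IDX]
    return d.astype(np.float32).ravel()

def select_pair_model(app_softmax, pair_models):
    """Pick the stored model for the appearance network's Top-2 pair."""
    pair = tuple(sorted(np.argsort(-app_softmax)[:2]))  # one of 15 pairs
    return pair, pair_models[pair]                      # cf. Table 3
```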
TABLE 7. Confusion matrix of the appearance feature-based CNN in the JAFFE dataset (AN: Angry, DI: Disgust, FE: Fear, HA: Happy, SA: Sad, SU: Surprise).

TABLE 8. Confusion matrix of the proposed method in the JAFFE dataset (AN: Angry, DI: Disgust, FE: Fear, HA: Happy, SA: Sad, SU: Surprise).
[31] C. Sagonas, E. Antonakos, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, "300 faces in-the-wild challenge: Database and results," Image Vis. Comput., vol. 47, pp. 3–18, Mar. 2016.
[32] V. Kazemi and J. Sullivan, "One millisecond face alignment with an ensemble of regression trees," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Columbus, OH, USA, Jun. 2014, pp. 1867–1874.
[33] K. Simonyan and A. Zisserman. (2014). "Very deep convolutional networks for large-scale image recognition." [Online]. Available: https://arxiv.org/abs/1409.1556
[34] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), San Francisco, CA, USA, Jun. 2010, pp. 94–101.
[35] M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, "Coding facial expressions with Gabor wavelets," in Proc. 3rd IEEE Int. Conf. Autom. Face Gesture Recognit., Nara, Japan, Apr. 1998, pp. 200–205.
[36] S. Elaiwat, M. Bennamoun, and F. Boussaid, "A spatio-temporal RBM-based model for facial expression recognition," Pattern Recognit., vol. 49, pp. 152–161, Jan. 2016.
[37] Y. Liu, Y. Li, X. Ma, and R. Song, "Facial expression recognition with fusion features extracted from salient facial areas," Sensors, vol. 17, no. 4, p. 712, 2017.
[38] M. Goyani and N. Patel, "Multi-level Haar wavelet based facial expression recognition using logistic regression," Indian J. Sci. Technol., vol. 10, no. 9, Mar. 2017.

JI-HAE KIM was born in Seoul, South Korea, in 1994. She received the B.S. and M.S. degrees from the College of Engineering, Sookmyung Women's University, Seoul, South Korea, in 2017 and 2019, respectively, where she was a Researcher with the Intelligent Vision Processing Laboratory (IVPL), from 2017 to 2019. Her research interests include the development of computer vision processing and facial expression recognition techniques using deep learning and machine learning, fundamental study of image feature extraction, and image classification.

In 2004, he joined the Real-Time Multimedia Research Team, Electronics and Telecommunications Research Institute (ETRI), South Korea, where he was a Senior Researcher. In ETRI, he developed many real-time video signal processing algorithms and patents and received the Best Paper Award, in 2007. From 2009 to 2016, he was an Associate Professor with the Division of Computer Science and Engineering, Sun Moon University, South Korea. In 2016, he joined the Department of Information Technology (IT) Engineering, Sookmyung Women's University, South Korea, where he is currently an Associate Professor. He has published over 200 international journal and conference papers, and holds patents in his field. His research interests include image and video signal processing for content-based image coding, video coding techniques, 3-D video signal processing, deep/reinforcement learning algorithms, embedded multimedia systems, and intelligent information systems for image signal processing. Dr. Kim is a Professional Member of the ACM and IEICE. He received the Special Merit Award for Outstanding Paper from the IEEE Consumer Electronics Society, at IEEE ICCE 2012, the Certification Appreciation Award from SPIE Optical Engineering, in 2013, and the Best Academic Award from the CIS, in 2014. He also served or serves as an Organizing Committee Member of CSIP 2011, a Co-Organizer of CICCAT 2016/2017, and a Program Committee Member of many international conferences. He is serving as a professional reviewer for many academic journals, including those of the IEEE, ACM, Elsevier, Springer, Oxford, SPIE, IET, and MDPI. In 2007, he served as an Editorial Board Member for the International Journal of Soft Computing, Recent Patents on Signal Processing, the Research Journal of Information Technology, the Journal of Convergence Information Technology, and the Journal of Engineering and Applied Sciences. Since 2018, he has been the Editor-in-Chief of The Journal of Multimedia Information System and an Associate Editor of IEEE ACCESS. He is serving as an Associate Editor for Circuits, Systems and Signal Processing (Springer), The Journal of Supercomputing (Springer), the Journal of Real-Time Image Processing (Springer), and the International Journal of Image Processing and Visual Communication (IJIPVC).

PARTHA PRATIM ROY (M'87) received the Ph.D. degree in computer science from the Universitat Autònoma de Barcelona, Spain, in 2010. He was a Postdoctoral Research Fellow with the Computer Science Laboratory (LI, RFAI Group), France, and with the Synchromedia Laboratory, Canada. He was also a Visiting Scientist with the Indian Statistical Institute, Kolkata, India, more than six times. He gathered industrial experience while working as an Assistant System Engineer at TATA Consultancy Services, India, from 2003 to 2005, and as a Chief Engineer at Samsung, Noida, from 2013 to 2014. He is currently an Assistant Professor with the Department of Computer Science and Engineering, IIT Roorkee, Roorkee. He has participated in several national and international projects funded by the Spanish and French governments. He has published more than 160 research papers in various international journals and conference proceedings. His main research interest includes pattern recognition. In 2009, he received the Best Student Paper Award from the International Conference on Document Analysis and Recognition (ICDAR).