
Pattern Recognition 46 (2013) 2202–2219

journal homepage: www.elsevier.com/locate/pr

A robust static hand gesture recognition system using geometry based


normalizations and Krawtchouk moments
S. Padam Priyal n, Prabin Kumar Bora
Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati 781039, Assam, India

Article info

Article history:
Received 5 August 2011
Received in revised form 28 December 2012
Accepted 30 January 2013
Available online 8 February 2013

Keywords:
Hand extraction
Hand gesture
Krawtchouk moments
Minimum distance classifier
Rotation normalization
Skin color detection
View and user-independent recognition

Abstract

Static hand gesture recognition involves interpretation of hand shapes by a computer. This work addresses three main issues in developing a gesture interpretation system. They are (i) the separation of the hand from the forearm region, (ii) rotation normalization using the geometry of gestures and (iii) user and view independent gesture recognition. The gesture image comprising the hand and the forearm is detected through skin color detection and segmented to obtain a binary silhouette. A novel method based on the anthropometric measures of the hand is proposed for extracting the regions constituting the hand and the forearm. An efficient rotation normalization method that depends on the gesture geometry is devised for aligning the extracted hand. These normalized binary silhouettes are represented using the Krawtchouk moment features and classified using a minimum distance classifier. The Krawtchouk features are found to be robust to viewpoint changes and capable of achieving good recognition for a small number of training samples. Hence, these features exhibit user independence. The developed gesture recognition system is robust to similarity transformations and perspective distortions. It can be well realized for real-time implementation of gesture based applications.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Human–computer interaction (HCI) is an important activity that forms an elementary unit of intelligence based automation. The most common HCI is based on the use of simple mechanical devices such as the mouse and the keyboard. Despite familiarity, these devices inherently limit the speed and naturalness of the interaction between man and the machine. The ultimate goal of HCI is to develop interactive computer systems that are non-obtrusive and emulate the 'natural' way of interaction among humans. The futuristic technologies in intelligent automation attempt to incorporate communication modalities like speech, handwriting and hand gestures with HCI. The development of hand gesture interfaces finds successful applications in sign-to-text translation systems, robotics, video/dance annotations, assistive systems, sign language communication, virtual reality and video based surveillance.

The hand gesture interfaces are based on the hand shape (static gesture) or the movement of the hand (dynamic gesture). The HCI interpretation of these gestures requires proper means by which the dynamic and/or static configurations of the hand can be properly defined to the machine. Hence, computer vision techniques in which one or more cameras are used to capture the hand images have evolved. The methods based on these techniques are called vision based methods. The availability of fast computing and the advances in computer vision algorithms have led to the rapid growth in the development of vision based gestural interfaces. Many reported works on static hand gesture recognition have also focused on incorporating the dynamic characteristics of the hand. However, the level of complexity in recognizing the hand posture is comparatively high, and recovering the hand shape is difficult due to variation in size, rotation of the hand and the variation of the viewpoint with respect to the camera.

The approaches to hand shape recognition are based on 3D modeling of the hand or on 2D image models like the image contour and the silhouette. The computational cost in fully recovering the 3D hand/arm state is very high for real-time recognition, and slight variations in the model parameters greatly affect the system performance [1]. By contrast, the processing of 2D image models involves low computational cost and high accuracy for a modest gesture vocabulary [1]. Thus, the 2D approaches are well suited for real time processing. The general approach to vision based hand shape recognition is to extract a unique set of visual features and match them to a pre-defined representation of the hand gesture. Therefore, the important factor in developing the gesture recognition system is the accurate representation of the hand shapes. This step is usually known as feature extraction in pattern recognition.

* Corresponding author. Tel.: +91 361 258 2502; fax: +91 361 258 2542. E-mail addresses: [email protected], [email protected] (S. Padam Priyal), [email protected] (P.K. Bora).

0031-3203/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.patcog.2013.01.033

The features are derived either from the spatial domain or from the transform domain representation of the hand shapes. The extracted features describing the hand gestures can be classified into two groups: contour based features and region based features. The contour based features correspond to the information derived only from the shape boundary. Common contour based methods that are used for shape description are Fourier descriptors, shape signatures, curvature scale space and chain code representations [2]. The Hausdorff distance [3] and shape context [2] are correspondence based matching techniques in which the boundary points are the features representing the shape. The region based features are global descriptors derived by considering all the pixels constituting the shape region. The common region based methods include moment functions and moment invariants, shape matrices, convex hulls and medial axis transforms [2]. Similarly, the spatial-domain measures like the Euclidean distance, the city-block distance and the image correlation are used for region based matching in which the pixels within the shape region are considered as features.

The efficiency of these features is generally evaluated based on the compactness of the representation, the robustness to spatial transformations, the sensitivity to noise, the accuracy in classification, the computational complexity and the storage requirements [2]. In this context, the moment based representations are preferred mainly due to their compact representation, invariance properties and robustness to noise [4]. The moments also offer the advantages of reduced computational load and database storage requirements. Hence, the moment functions are among the robust features that are widely used for shape representation and find successful applications in the field of pattern recognition involving archiving and fast retrieval of images [5].

Recently, discrete orthogonal moments like the Tchebichef moments and the Krawtchouk moments were introduced for image analysis [6,7]. It is shown that these moments provide higher approximation accuracy than the existing moment based representations and are potential features for pattern classification. Hence, in this work we propose the classification of static hand gestures using Krawtchouk moments as features. The objective of this work is to study the potential of the Krawtchouk moments in uniquely representing the hand shapes for gesture classification. Hence, the experiments are performed on a database consisting of gesture images that are normalized for similarity variations like scaling, translation and rotation. The performance of the Krawtchouk moments is compared with the geometric and the Zernike moments based recognition methods. The other main issues considered in developing the gesture recognition system are the (i) identification of the hand region and (ii) normalization of the rotation changes.

The identification of the hand region involves separating the hand from the forearm. The lack of gesture information in the forearm makes it redundant and its presence increases the data size. In most of the previous works, the forearm region is excluded by either making the gesturers wear full-arm clothing or by limiting the forearm region in the scene during acquisition. However, such restrictions are not suitable in real-time applications. The orientation of the acquired gesture changes due to the angle made by the gesturer with respect to the camera and vice-versa.

This work concentrates on vision based static hand gesture recognition considering the afore-mentioned problems. In [8], the Krawtchouk moments are introduced as features for gesture classification. The performance of Krawtchouk moments is compared with that of a few other moments like the geometric, the Zernike and the Tchebichef moments. It is shown that the Krawtchouk moments based representation of hand shapes gives high recognition rates. The analysis is performed on hand regions that are manually extracted and corrected for rotation changes.

This work presents a detailed gesture recognition system that evaluates the performance of the Krawtchouk moment features on a database that consists of 4230 gesture samples. We propose novel methods based on the anthropometric measures to automatically identify the hand and its constituent regions. The geometry of the gesture is characterized in terms of the abducted fingers. This gesture geometry is used to normalize for the orientation changes. The proposed normalization techniques are robust to similarity and perspective distortions. The main contributions in this work are:

1. A rule based technique using the anthropometric measures of the hand is devised to identify the forearm and the hand regions.
2. A rotation normalization method based on the protruded/abducted fingers and the longest axis of the hand is devised.
3. A static hand gesture database consisting of 10 gesture classes and 4230 samples is constructed.
4. A study on the Krawtchouk moment features in comparison to the geometric and the Zernike moments for viewpoint and user invariant hand gesture recognition is performed.

The rest of the paper is organized as follows: Section 2 presents a summary of the related works in static hand gesture recognition. Section 3 gives the formulation for the Krawtchouk and the other considered moments. Section 4 describes the proposed gesture analysis system in detail. Experimental results are discussed in Section 5. Section 6 concludes the paper mentioning the scope for future work.

2. Summary of the related works

The primary issues in hand gesture recognition are: (i) hand localization, (ii) scale and rotational invariance and (iii) viewpoint and person/user independence. Ong and Ranganath [9] presented a thorough review on hand gesture analysis along with an insight into the problems associated with it.

Earlier works assumed the gestures to be performed against a uniform background. This required a simple thresholding technique to obtain the hand silhouette. For a non-uniform background, skin color detection is the most popular and general method for hand localization [1,10-14]. The skin color cues are combined with the motion cues [10,12,13] and the edge information [14] for improving the efficiency of hand detection. Segmented hand images are usually normalized for size, orientation and illumination variations [15-17]. The features can be extracted directly from the intensity images, the binary silhouettes or the contours.

In [18,19], the orientation histograms are derived from the intensity images. These histograms represent summarized information on the orientations of the hand image and are shown to be illumination invariant. They are, however, rotation variant. Triesch et al. [20] have classified hand postures using elastic graph matching. The system is designed to efficiently identify gestures in a complex background. It is sensitive to geometric distortions that arise due to variations in the hand anatomy and the viewpoint. The advantages of the matching procedure are its user independence and its robustness to complex environments. In [1,15], local linear embedding (LLE) is introduced to map the high dimensional data to a low dimensional space in such a way as to preserve the relationship between neighboring points. Each point in the low dimensional space is approximated by a linear combination of its neighbors. The approach is invariant to scale and translation but sensitive to rotation.

Just et al. [21] used the modified census transform to derive the hand features in both complex and uniform backgrounds. The system does not require a hand segmentation procedure. However, it has to be trained also with the gestures in a complex background, including small variations in scale and rotation. This increases the complexity of the training procedure and demands a larger training set. Amin and Yan [17] derived Gabor features from the intensity hand images. The feature set is reduced in dimensionality using principal component analysis (PCA) and the classification is performed with fuzzy C-means clustering. The images are initially scale normalized by resizing them to a fixed size. The rotation correction is achieved by aligning the major axis of the forearm at 90° with respect to the horizontal axis of the image. Similarly, Huang et al. [22] have also employed Gabor-PCA features for representing the hand gestures. They estimate the orientation of the hand gestures using the Gabor filter responses. The estimated angle is used to correct the hand pose into an upright orientation [22]. The Gabor-PCA features are classified using the support vector machine (SVM) classifier.

The geometric moment invariants derived from the binary hand silhouettes form the feature set in [23,24]. Direct hand features such as the number of protruded fingers, and the distance and angle between the fingers and the palm region, are used for gesture representation in [25-28]. Hu's moment invariants are used as a feature set in [29]. These methods are sensitive to variations in the hand anatomy and to viewpoint distortions. Chang et al. [30] compute the Zernike and the pseudo-Zernike moment features for representing the hand postures. They decompose the binary hand silhouette into the finger and the palm parts. The decomposition allows the features to be localized with respect to the palm and the finger regions. The Zernike and the pseudo-Zernike moment features are derived separately for both regions, with higher weights given to the finger features during recognition. Gu and Su [31] employed a multivariate piecewise linear decision algorithm to classify the Zernike moment features obtained from the hand postures. The system is trained to be user and viewpoint independent.

Boundary-based representations include invariant features derived from the Fourier descriptors (FD), the localized contour sequences and the curvature scale space (CSS). The Fourier descriptors in terms of the discrete Fourier transform (DFT) are obtained from the complex representation of the closed boundary points. These descriptors are one of the efficient representations and are invariant to rotation. However, the matching algorithm is sensitive to starting point variations in contour extraction [32,33]. In order to compensate for starting point variations, the contour is traced from a fixed point in [33]. The distance between the DFT coefficients of two different curves is computed using the modified Fourier descriptor (MFD) method. The recognition efficiency depends on the choice of the number of Fourier descriptors. Gupta and Ma [34] derive localized contour sequences for representing the hand contours. The contour representation is sensitive to shifts in the starting point. Hence, the invariance is incorporated into the classification stage by determining the position of best match using the circular shift operation. The CSS proposed in [35] is an efficient technique for extracting curvature features from an input contour at multiple scales. The CSS consists of large peaks that represent the concavities in the image contour and the method is easily made invariant to translation, scaling and rotation. Kopf et al. [36] and Chang [37] employed the CSS to capture the local features of the hand gesture. Since the human hand is highly flexible, the location of the largest peak in the CSS image is unstable for the same hand postures, thus affecting the recognition. Also, similar hand contours are not well discriminated. Liang et al. [38] have proposed hand gesture recognition using the radiant projection transform and the Fourier transform. The method is normalized for rotation, scale and translation variations. However, it is not suitable for gestures with almost the same boundary profiles. In [39,40], the static hand gestures are classified through Hausdorff distance matching, which involves computing the point-wise correspondence between the boundary pixels of the images to be compared. Their experiments show that Hausdorff distance based matching provides good recognition efficiency, but its major drawback is its computational complexity.

Dias et al. [41] present a system known as the open gestures recognition engine (O.G.R.E) for recognizing the hand postures in the Portuguese Sign Language. The histogram of the distances and the angles between the contour edges is used to derive a contour signature known as the pair-wise geometrical histogram. The classification is performed by comparing the pair-wise geometrical histograms representing the gestures. Kelly et al. [42] have derived features from the binary silhouette and the one dimensional boundary profile to represent the hand postures. The binary silhouette is represented using the Hu moments. The size functions are derived from the boundary profile to describe the hand shape. The dimensionality of the size functions is reduced using PCA. They combine the Hu moments and the eigenspace size functions to achieve user independent gesture recognition.

From the above study, we infer that the boundary based representations fail in discriminating gestures with almost the same boundary profiles and are sensitive to boundary distortions. Therefore, for a visually distinct gesture vocabulary, the moment based approaches can be well explored. Also, processing intensity image sequences is complex and increases the computational load. Hence, binary silhouettes of the hand gestures are used for processing instead of the intensity images.

3. Theory of moments

The non-orthogonal and the orthogonal moments have been used to represent images in different applications including shape analysis and object recognition. The geometric moments are the most widely employed features for object recognition [9,43]. However, these moments are non-orthogonal and so reconstructing the image from the moment features is very intricate. It is also not possible to decipher the accuracy of such representations.

In image analysis, Teague [44] introduced and derived the orthogonal moments with the orthogonal polynomials as the basis functions. The set of orthogonal moments has minimal information redundancy [44,45]. In this class, the Zernike moments defined in the polar domain are based on the continuous orthogonal polynomials and are rotation invariant [43]. For computation, the Zernike moments have to be approximated in the discrete domain, and the discretization error increases for higher orders. Hence, the moments based on discrete orthogonal polynomials like the Tchebichef and the Krawtchouk polynomials have been proposed [44]. These moments are defined in the image coordinate space and do not involve any numerical approximation.

A few studies have been reported on the accuracy of the Krawtchouk moments in image analysis [7,8,46,45]. Yap et al. [7] introduced the Krawtchouk moments for image representation and verified their performance on character recognition. From the experiments, they conclude that Krawtchouk moments perform better than geometric moments and the other orthogonal moments like the Zernike, the Legendre and the Tchebichef moments.

To derive the moments, consider a 2D image f(x, y) defined over a rectangular grid of size (N+1) x (M+1) with (x, y) in {0, 1, ..., N} x {0, 1, ..., M}. In this work, we consider the following moments to represent f.

3.1. Geometric moments

The geometric moment of order (n+m) is defined as [43]

G_{nm} = \sum_{x=0}^{N} \sum_{y=0}^{M} f(x,y)\, x^{n} y^{m}    (1)

where n in {0, 1, ..., N} and m in {0, 1, ..., M}. The geometric moments G_{nm} are the projections of the image f(x, y) on the 2D polynomial bases x^n y^m.
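As a simple illustration of Eq. (1), the moments of a binary silhouette can be evaluated directly; the following minimal Python sketch (not part of the original paper) assumes the image is stored as a NumPy array indexed as img[x, y]:

import numpy as np

def geometric_moments(img, n_max, m_max):
    # G_nm of Eq. (1): projection of f(x, y) on the monomial bases x^n * y^m
    N, M = img.shape[0] - 1, img.shape[1] - 1
    x = np.arange(N + 1, dtype=float)
    y = np.arange(M + 1, dtype=float)
    G = np.empty((n_max + 1, m_max + 1))
    for n in range(n_max + 1):
        for m in range(m_max + 1):
            G[n, m] = np.sum(img * np.outer(x ** n, y ** m))
    return G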
3.2. Zernike moments

These moments are defined on the polar coordinates (rho, theta), such that 0 <= rho <= 1 and 0 <= theta <= 2*pi. The complex Zernike polynomial V_{nm}(rho, theta) of order n >= 0 and repetition m is defined as [43]

V_{nm}(\rho, \theta) = R_{nm}(\rho)\, \exp(jm\theta)    (2)

For even values of n - |m| and |m| <= n, R_{nm}(rho) is the real-valued radial polynomial given by

R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} \frac{(-1)^{s}\,(n-s)!\,\rho^{\,n-2s}}{s!\,\left(\frac{n+|m|}{2}-s\right)!\,\left(\frac{n-|m|}{2}-s\right)!}

The plots of the radial polynomials R_{nm}(rho) for different orders n and repetitions m are given in Fig. 1(a). The 2D complex Zernike polynomials V_{nm}(rho, theta) obtained for different values of n and m are shown in Fig. 1(b). The complex Zernike polynomials satisfy the orthogonality property

\int_{0}^{2\pi}\!\int_{0}^{1} V^{*}_{nm}(\rho,\theta)\, V_{lk}(\rho,\theta)\, \rho\, d\rho\, d\theta = \frac{\pi}{n+1}\, \delta[n-l]\, \delta[m-k]

where delta[.] is the Kronecker delta function.

3.2.1. Image representation by Zernike polynomials

The Zernike polynomials are defined in the polar domain and hence, the image coordinate system (x, y) needs to be converted to (rho, theta) coordinates. Let f(rho, theta) define the image in the polar domain. Using the Zernike polynomials of the form V_{nm}(rho, theta), the image f(rho, theta) is approximated as [43]

f(\rho,\theta) \approx \sum_{n=0}^{n_{\max}} \sum_{\substack{m=0 \\ n-|m|\ \mathrm{even}}}^{n} Z_{nm}\, V_{nm}(\rho,\theta)    (3)

The Zernike moment Z_{nm} of order n is given by

Z_{nm} = \frac{n+1}{\pi} \int_{0}^{2\pi}\!\int_{0}^{1} V^{*}_{nm}(\rho,\theta)\, f(\rho,\theta)\, \rho\, d\rho\, d\theta    (4)

The integration in (4) needs to be computed numerically. The magnitude of the Zernike moments Z_{nm} is invariant to rotation and hence, they are commonly used for rotation invariant gesture representation [30,31].

3.3. Krawtchouk moments

The Krawtchouk moments are discrete orthogonal moments derived from the Krawtchouk polynomials. The nth order Krawtchouk polynomial at a discrete point x, with 0 < p < 1 and q = 1 - p, is defined in terms of the hypergeometric function as [47]

K_{n}(x; p, N) = {}_{2}F_{1}\!\left(-n, -x; -N; \frac{1}{p}\right)    (5)

By definition

{}_{2}F_{1}(a, b; c; z) = \sum_{v=0}^{n} \frac{(a)_{v}(b)_{v}}{(c)_{v}} \frac{z^{v}}{v!} \quad \text{and} \quad (a)_{v} = a(a+1)\cdots(a+v-1)

Thus

K_{0}(x; p, N) = 1, \quad K_{1}(x; p, N) = 1 - \frac{x}{Np}, \quad K_{2}(x; p, N) = 1 - \frac{2x}{Np} + \frac{x(x-1)}{N(N-1)p^{2}}

and so on. The set of (N+1) Krawtchouk polynomials forms a complete orthogonal basis with the binomial weight function

w(x; p, N) = \binom{N}{x} p^{x} (1-p)^{N-x}    (6)

The orthogonality property is given by [7,47]

\sum_{x=0}^{N} w(x; p, N)\, K_{n}(x; p, N)\, K_{m}(x; p, N) = (-1)^{n} \left(\frac{1-p}{p}\right)^{n} \frac{n!}{(-N)_{n}}\, \delta[n-m]    (7)

where delta[.] is the Kronecker delta function. Assuming rho(n; p, N) = (-1)^n ((1-p)/p)^n n!/(-N)_n, the weighted Krawtchouk polynomial for order n = 0, 1, ..., N is defined as

\bar{K}_{n}(x; p, N) = K_{n}(x; p, N) \sqrt{\frac{w(x; p, N)}{\rho(n; p, N)}}    (8)
Fig. 1. (a) 1D Zernike radial polynomials R_{nm}(rho) for different orders n and repetitions m. (b) 2D complex Zernike polynomials V_{nm}(rho, theta) (real part).

and hence, the orthogonality condition in (7) becomes

\sum_{x=0}^{N} \bar{K}_{n}(x; p, N)\, \bar{K}_{m}(x; p, N) = \delta[n-m]

Thus the weighted Krawtchouk polynomials form an orthonormal basis. The constant p can be considered as a translation parameter that shifts the support of the polynomial over the range of x. For p = 0.5 + Delta p, the support of the weighted Krawtchouk polynomial is approximately shifted by N Delta p [7]. The direction of shifting depends on the sign of Delta p. The polynomial is shifted in the +x direction when Delta p is positive and vice versa. The plots of 1D weighted Krawtchouk polynomials of order n = 0, 1 and 2 for N = 64 and three values of p are shown in Fig. 2(a). The Krawtchouk polynomials are calculated recursively using the relations [47]

p(n - N - 1)\, \bar{K}_{n}(x) = (x + 1 - 2p + 2pn - n - Np)\, \bar{K}_{n-1}(x) - (p-1)(n-1)\, \bar{K}_{n-2}(x)    (9)

where

\bar{K}_{0}(x; p, N) = \sqrt{w(x; p, N)} \quad \text{and} \quad \bar{K}_{1}(x; p, N) = \left(1 - \frac{x}{pN}\right) \sqrt{\frac{w(x; p, N)}{\rho(1; p, N)}}

3.3.1. Image representation by Krawtchouk polynomials

The separability property can be used to obtain the 2D Krawtchouk bases, and the image f(x, y) is approximated by the sum of weighted Krawtchouk polynomials as

f(x, y) \approx \sum_{n=0}^{n_{\max}} \sum_{m=0}^{m_{\max}} Q_{nm}\, \bar{K}_{n}(x; p_{1}, N)\, \bar{K}_{m}(y; p_{2}, M)    (10)

where \bar{K}_{n}(x; p_{1}, N)\bar{K}_{m}(y; p_{2}, M) is the (n+m)th order 2D weighted Krawtchouk basis with 0 < p_1 < 1, 0 < p_2 < 1, and the coefficient Q_{nm} is called the (n+m)th order Krawtchouk moment. Fig. 2(b) shows the plots of 2D Krawtchouk bases for N = M = 104, p_1 = p_2 = 0.5 and different values of n and m.

Using the orthogonality property, the Krawtchouk moments of order (n+m) are obtained as

Q_{nm} = \sum_{x=0}^{N} \sum_{y=0}^{M} \bar{K}_{n}(x; p_{1}, N)\, \bar{K}_{m}(y; p_{2}, M)\, f(x, y)    (11)

The appropriate selection of p_1 and p_2 enables the extraction of local features of an image at a region-of-interest (ROI). The parameter p_1 can be tuned to shift the ROI horizontally and p_2 shifts the ROI vertically. As in the 1D case, the direction of the shift depends on the signs of Delta p_1 and Delta p_2. Therefore, with a proper choice of p_1 and p_2, a subimage corresponding to the desired ROI can be represented by the Krawtchouk moments. This wavelet-like property gives the Krawtchouk moments the capability of ROI feature extraction.

By comparing the plots of the 2D polynomial functions in Figs. 1(b) and 2(b), we can infer that the Zernike polynomials have wider supports. Therefore, the Zernike moments characterize the global shape features. The support of the Krawtchouk polynomials varies with the order: the lower order polynomials have compact supports and the higher orders have wider supports. Therefore, the lower order Krawtchouk moments capture the local features and the higher order moments represent the global characteristics.

It can also be noted that the lower order Krawtchouk polynomials have relatively high spatial frequency characteristics and hence, the local variations corresponding to the edges are well defined at the lower orders. In the case of Zernike moments, the spatial frequency increases only with the order. Hence, higher orders are required to represent the edges while the lower orders characterize the average shape information.

The Krawtchouk moments obtained in (11) are used as features in the proposed feature-based hand gesture classification system. The geometric moments and the Zernike moments as given in (1) and (4) respectively are also used as features for comparative analysis.

4. Gesture recognition using Krawtchouk moments

The proposed gesture recognition system is developed by broadly dividing the procedure into three phases: (1) hand detection and segmentation, (2) normalization and (3) feature extraction and classification. A description of these tasks is presented below. Fig. 3 shows a schematic representation of the proposed gesture recognition system.

4.1. Hand detection and segmentation

This phase detects and segments the hand data from the captured image. The hand regions are detected using the skin color pixels. The background is restricted such that the hand is the largest object with respect to the skin color.

Fig. 2. (a) 1D weighted Krawtchouk polynomials for different values of p and N = 64. (b) 2D weighted Krawtchouk polynomials for p_1 = p_2 = 0.5 and N = M = 104.
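The weighted Krawtchouk polynomials and the moments of Eqs. (5)-(8), (10) and (11) can be prototyped directly from the definitions. The sketch below is only illustrative: it evaluates the polynomials from the terminating hypergeometric series rather than with the recursion in (9) (simpler but slower), and it assumes the gesture is stored as a NumPy array indexed as img[x, y]:

import numpy as np
from math import comb, factorial

def poch(a, v):
    # Pochhammer symbol (a)_v = a(a+1)...(a+v-1), with (a)_0 = 1
    out = 1.0
    for k in range(v):
        out *= a + k
    return out

def krawtchouk(n, x, p, N):
    # K_n(x; p, N) via the terminating hypergeometric series of Eq. (5)
    return sum(poch(-n, v) * poch(-x, v) / (poch(-N, v) * factorial(v)) * (1.0 / p) ** v
               for v in range(n + 1))

def weighted_krawtchouk(n, p, N):
    # weighted polynomial of Eq. (8), evaluated at x = 0, ..., N
    xs = np.arange(N + 1)
    w = np.array([comb(N, int(x)) * p ** x * (1 - p) ** (N - x) for x in xs])   # Eq. (6)
    rho = (-1) ** n * ((1 - p) / p) ** n * factorial(n) / poch(-N, n)           # norm from Eq. (7)
    K = np.array([krawtchouk(n, int(x), p, N) for x in xs])
    return K * np.sqrt(w / rho)

def krawtchouk_moments(img, order, p1=0.5, p2=0.5):
    # Q_nm of Eq. (11) for all n, m up to 'order'
    N, M = img.shape[0] - 1, img.shape[1] - 1
    Kx = np.stack([weighted_krawtchouk(n, p1, N) for n in range(order + 1)])
    Ky = np.stack([weighted_krawtchouk(m, p2, M) for m in range(order + 1)])
    return Kx @ img @ Ky.T

def reconstruct(Q, N, M, p1=0.5, p2=0.5):
    # image approximation of Eq. (10) from the moment matrix Q[n, m]
    order = Q.shape[0] - 1
    Kx = np.stack([weighted_krawtchouk(n, p1, N) for n in range(order + 1)])
    Ky = np.stack([weighted_krawtchouk(m, p2, M) for m in range(order + 1)])
    return Kx.T @ Q @ Ky

The reconstruct helper mirrors the reconstruction experiments of Section 5.3: thresholding its output gives the binarised approximation whose MSE/SSIM against the original can then be measured.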

Fig. 3. Schematic representation of the gesture recognition system.

Fig. 4. Results of hand segmentation using skin color detection. (a) Acquired images. (b) Skin color regions. (c) Segmented gesture images.

Teng et al. [1] have given a simple and effective method to detect skin color pixels by combining the features obtained from the YCbCr and the YIQ color spaces. The hue value theta is estimated from the Cb-Cr chromatic components by

\theta = \tan^{-1}\left(\frac{C_r}{C_b}\right)    (12)

The in-phase color component I is calculated from the RGB components as

I = 0.596R - 0.274G - 0.322B    (13)

Their experiments establish the range of theta and of the in-phase color component I for Asian and European skin tones. The pixels are grouped as skin color pixels if 105° <= theta <= 150° and 30 <= I <= 100. Fig. 4(b) illustrates the skin color detection using this method for the hand gesture images shown in Fig. 4(a). The detection results in a binary image which may also contain other objects not belonging to the hand. Since the hand is assumed to be the largest skin color object, the other components are filtered by comparing the areas of the detected binary objects. The resultant is subjected to a morphological closing operation with a disk-shaped structuring element in order to obtain a well defined segmented gesture image.
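A rough Python sketch of the skin-color rule in Eqs. (12) and (13), followed by the largest-object filtering and morphological closing just described, is given below. The Cb-Cr conversion coefficients (BT.601-style) and the 5x5 structuring element are assumptions of this illustration, not values taken from [1]:

import numpy as np
from scipy import ndimage

def skin_mask(rgb):
    # rgb: H x W x 3 array with channels R, G, B in [0, 255]
    R, G, B = rgb[..., 0].astype(float), rgb[..., 1].astype(float), rgb[..., 2].astype(float)
    Cb = -0.169 * R - 0.331 * G + 0.500 * B      # assumed BT.601-style chroma
    Cr = 0.500 * R - 0.419 * G - 0.081 * B
    theta = np.mod(np.degrees(np.arctan2(Cr, Cb)), 360.0)   # hue angle of Eq. (12)
    I = 0.596 * R - 0.274 * G - 0.322 * B                    # in-phase component of Eq. (13)
    return (theta >= 105) & (theta <= 150) & (I >= 30) & (I <= 100)

def largest_skin_object(mask):
    # keep only the largest connected skin-colored component, then close small gaps
    labels, num = ndimage.label(mask)
    if num == 0:
        return mask
    sizes = ndimage.sum(mask, labels, range(1, num + 1))
    largest = labels == (np.argmax(sizes) + 1)
    return ndimage.binary_closing(largest, structure=np.ones((5, 5)))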
4.2. Normalization techniques

This is an essential phase in which the segmented image is normalized for any geometrical variations in order to obtain the desired hand gesture. The important factors to be compensated in this step are

1. The presence of the forearm region.
2. The orientation of the object.

The recognition efficiency can be improved through proper normalization of the gesture image. Hence, a robust normalization method based on the gesture geometry is proposed for extracting the hand region and correcting the orientation.

4.2.1. Proposed method for rule based hand extraction

Consider a binary image f defined over a grid B of size (N+1) x (M+1). B is composed of two complementary regions R and R-bar representing the gesture (object) and the background respectively. Thus

R = \{(x, y) \mid (x, y) \in B \text{ and } f(x, y) = 1\}    (14)

and the complementary region R-bar is given by

\bar{R} = B \setminus R    (15)

The boundary dR of the gesture region is defined by the set of pixels in R that are adjacent to at least one pixel in the region R-bar. It is represented as

dR = \{(x, y) \mid (x, y) \in R \text{ and } (x, y) \text{ is adjacent to a pixel in } \bar{R}\}    (16)

The gesture region R can be partitioned into three subregions. They are (a) R_fingers (fingers), (b) R_palm (palm) and (c) R_forearm (forearm). Hence

R = R_{\mathrm{fingers}} \cup R_{\mathrm{palm}} \cup R_{\mathrm{forearm}}    (17)

such that

R_{\mathrm{fingers}} \cap R_{\mathrm{palm}} = \emptyset, \quad R_{\mathrm{fingers}} \cap R_{\mathrm{forearm}} = \emptyset, \quad R_{\mathrm{palm}} \cap R_{\mathrm{forearm}} = \emptyset    (18)

Fig. 5 illustrates these elementary regions comprising the gesture object R. Based on the anatomy, the palm and the forearm can be considered as continuous smooth regions. The forearm extends outside the palm and its width is less than that of the palm region. Conversely, the region containing the fingers is discontinuous under abduction. Also, the width of a finger is much smaller than that of the palm and the forearm. Therefore, the geometrical variations in the width and the continuity of these subregions in the gesture image are used as cues for detection.

(a) Computation of width
(1) The variation in the width along the longest axis of the gesture image is calculated from the distance map obtained using the Euclidean distance transform (EDT). The EDT gives the minimum distance of an object pixel to any pixel on the boundary set dR. The Euclidean distance between a boundary pixel (x_b, y_b) in dR and an object pixel (x, y) in R is defined as

d_{(x_b, y_b), (x, y)} = \sqrt{(x - x_b)^2 + (y - y_b)^2}    (19)

The value of the EDT, D(x, y), for the object pixel (x, y) is computed as

D(x, y) = \min_{(x_b, y_b) \in dR} d_{(x_b, y_b), (x, y)}    (20)

The values of D(x, y) at different (x, y) are used to detect the subregions of R.
(2) The straightforward implementation of the EDT defined through (19) and (20) is computationally expensive. Therefore, the conventional approach to fast EDT based on the Voronoi decomposition of the image proposed in [48] is employed (see the Python sketch further below). A study of the several other algorithms proposed for reducing the computational complexity of the EDT is given in [49].

(b) Verification of region continuity
(3) The continuity of the subregions after detection is verified through connected component labeling preceded by morphological erosion. The erosion operation with a small structuring element is performed to disconnect the weakly connected object pixels. The structuring element considered is a disk operator with radius 3. The resultant is verified to be a continuous region if there is only one connected component. If there is more than one connected component, the detected region is verified as discontinuous.

Fig. 5. Pictorial representation of the regions composing the binary image f. R denotes the gesture region and R-bar denotes the background region.

The geometrical measurements along the finger regions vary with the users and they get altered due to geometric distortions. However, the measures across the palm and the forearm can be generalized and their ratios are robust to geometric distortions. The palm is an intactly acquired part that connects the fingers and the forearm. Since R_palm lies as an interface between R_fingers and R_forearm, the separation of the palm facilitates the straightforward detection of the other two regions. Hence, the anthropometry of the palm is utilized for detecting the regions in the gesture image.

4.2.1.1. Anthropometry based palm detection. The parameters of the hand considered for palm detection are the hand length, the palm length and the palm width, as illustrated in Fig. 6(a). The anthropometric studies in [50-52] present the statistics of the above mentioned hand parameters. From these studies, we infer that the minimum value of the ratio of palm length (L_palm) to palm width (W_palm) is approximately 1.322 and its maximum value is 1.43. Similar observations were made from our photometric experiments. Fig. 6(b) gives the histogram of the palm length to palm width ratios obtained through our experimentation. This ratio is utilized to approximate the palm region as an ellipse. Considering all the variations of this ratio, we take

L_{\mathrm{palm}} = 1.5 \times W_{\mathrm{palm}}    (21)

Based on the geometry, we approximate the palm region R_palm as an elliptical region with

\text{Major axis length} = 1.5 \times \text{Minor axis length}    (22)

Assuming a_palm as the semi-major axis length and b_palm as the semi-minor axis length, we can write

a_{\mathrm{palm}} = \frac{L_{\mathrm{palm}}}{2}    (23)

b_{\mathrm{palm}} = \frac{W_{\mathrm{palm}}}{2}    (24)

Therefore

a_{\mathrm{palm}} = 1.5 \times b_{\mathrm{palm}}    (25)

Using (25), it can be inferred that all the pixels constituting R_palm will lie within the ellipse of semi-major axis length a_palm. Therefore, the palm center and the value of a_palm have to be estimated for detecting the palm region.
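As an illustration of the distance map in Eqs. (19) and (20), the sketch below uses SciPy's fast EDT (distance to the nearest background pixel) as a stand-in for the Voronoi-based implementation of [48]; the row-wise width profile is only a simple illustrative cue and assumes the longest axis of the gesture is roughly vertical:

import numpy as np
from scipy import ndimage

def edt_map(silhouette):
    # D(x, y) of Eq. (20): distance from each object pixel to the nearest
    # background pixel, the usual fast substitute for the brute-force
    # boundary search of Eqs. (19)-(20)
    return ndimage.distance_transform_edt(silhouette.astype(bool))

def width_profile(silhouette):
    # rough width cue: twice the maximum EDT value in each row, illustrating
    # how the width varies along the (roughly vertical) longest axis
    D = edt_map(silhouette)
    return 2.0 * D.max(axis=1)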

Fig. 6. (a) Hand geometry. (b) Histogram of the experimental values of the palm length (L_palm) to palm width (W_palm) ratio calculated for 140 image samples taken from 23 persons.

Computing the palm center. Given that the boundary of R_palm is an ellipse, its center is known to have the maximum distance to the nearest boundary. Therefore, the center of R_palm is computed using the EDT in (20). The pixels (x, y) with EDT values D(x, y) greater than a threshold zeta are the points belonging to the neighborhood of the center of the palm. This neighborhood is defined as

C = \{(x, y) \in R \mid D(x, y) > \zeta\}    (26)

The center (x_c, y_c) is defined as the palm centroid and given by

(x_c, y_c) = (\lfloor \bar{X} \rceil, \lfloor \bar{Y} \rceil)    (27)

where

\bar{X} = \frac{1}{|C|} \sum_{(x_i, y_i) \in C} x_i, \qquad \bar{Y} = \frac{1}{|C|} \sum_{(x_i, y_i) \in C} y_i

|C| is the cardinality of C and \lfloor \cdot \rceil denotes rounding off to the nearest integer.

The threshold zeta is selected as max(D(x, y)) - t. The offset t is considered to compensate for the inaccuracies due to viewing angles. For small values of t, the centroid may not correspond to the exact palm center, and large values of t tend to deviate the centroid from the palm region. The optimal value of t is experimentally chosen as 2.

Computing the semi-major axis length. From the geometry, it can be understood that the nearest boundary points from the palm centroid correspond to the end points of the minor axis. Hence, the EDT value at (x_c, y_c) is the length of the semi-minor axis and therefore

b_{\mathrm{palm}} = D(x_c, y_c)    (28)

From (25), it follows that the length of the semi-major axis can be given as

a_{\mathrm{palm}} = 1.5 \times D(x_c, y_c)    (29)

Detecting the palm. In order to ensure proper detection of the palm, the finger regions (R_fingers) are sheared from the segmented object through the morphological opening operation. The structuring element is a disk with radius dr empirically chosen as

dr = \frac{b_{\mathrm{palm}}}{1.5}    (30)

The resultant is considered as the residual and will be referred to as the oddment. The oddment is generally composed of the palm region and may or may not contain the forearm. This implies A \subseteq R. Therefore, the oddment A can be defined as

A = R_{\mathrm{palm}} \cup R_{\mathrm{forearm}}

For R with no forearm region, R_forearm = \emptyset and A = R_palm. R_palm is a part of A that is approximated as an elliptic region. Thus

R_{\mathrm{palm}} = \left\{ (x_o, y_o) \;\middle|\; (x_o, y_o) \in A \ \text{and}\ \left(\frac{x_o - x_c}{a_{\mathrm{palm}}}\right)^{2} + \left(\frac{y_o - y_c}{b_{\mathrm{palm}}}\right)^{2} \le 1 \right\}    (31)

4.2.1.2. Detection of forearm. The forearm is detected through the abstraction of the palm region R_palm from the gesture image R. The abstraction separates the forearm and the finger regions, such that R is modified as

\hat{R} = R \setminus R_{\mathrm{palm}} = R_{\mathrm{fingers}} \cup R_{\mathrm{forearm}}    (32)

As in the case of palm detection, the finger region is removed from R-hat through the morphological opening operation. The structuring element is a disk with its radius calculated from (30). The resultant is a forearm region and has the following characteristics:

1. The resultant R_forearm \subseteq A and the region enclosing R_forearm is continuous.
2. The width of the wrist crease is considered as the minimum width of the forearm region. From the anthropometric measures in [51], the minimum value of the ratio of the palm width to the wrist breadth is obtained as 1.29 and the maximum value is computed as 1.55. Using these statistics, the empirical value for the width of the forearm is chosen as

W_{\mathrm{forearm}} > \frac{2\, b_{\mathrm{palm}}}{1.29}    (33)

4.2.1.3. Identifying the finger region. Having detected the palm and the forearm, the remaining section of the gesture image R will contain the finger region if it satisfies the following conditions:

- R_fingers is not a subset of A.
- The region enclosing R_fingers is marked by an irregular boundary if more than one finger is abducted.
- The width of a finger (the maximum EDT value in this section) is much less than that of the palm and the forearm.

Fig. 7. Illustration of the rule based region detection and separation of the hand from the gesture image f. The intensity of the background pixels is assigned as 0 and the
object pixels are assigned the maximum intensity value 1.
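The rule-based estimates of Eqs. (26)-(31) just described can be sketched in Python as follows; the disk structuring element, the axis-aligned ellipse of Eq. (31) and the border handling are simplifications of this illustration, not details specified in the paper:

import numpy as np
from scipy import ndimage

def detect_palm_and_oddment(silhouette, t=2.0):
    # palm centroid from the near-maximal EDT pixels, elliptical palm model,
    # and the "oddment" (palm + possible forearm) obtained by opening away
    # the fingers, following Eqs. (26)-(31)
    D = ndimage.distance_transform_edt(silhouette.astype(bool))
    zeta = D.max() - t                                        # Eq. (26)
    ys, xs = np.nonzero(D > zeta)
    xc, yc = int(round(xs.mean())), int(round(ys.mean()))     # Eq. (27)
    b_palm = D[yc, xc]                                        # Eq. (28)
    a_palm = 1.5 * b_palm                                     # Eq. (29)

    # disk structuring element of radius dr = b_palm / 1.5 (Eq. (30))
    dr = max(1, int(round(b_palm / 1.5)))
    yy, xx = np.ogrid[-dr:dr + 1, -dr:dr + 1]
    disk = (xx ** 2 + yy ** 2) <= dr ** 2
    oddment = ndimage.binary_opening(silhouette.astype(bool), structure=disk)

    # elliptical palm region of Eq. (31), restricted to the oddment
    H, W = silhouette.shape
    Y, X = np.ogrid[:H, :W]
    ellipse = ((X - xc) / a_palm) ** 2 + ((Y - yc) / b_palm) ** 2 <= 1.0
    palm = oddment & ellipse
    return (xc, yc), a_palm, b_palm, oddment, palm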

Experimentally,

W_{\mathrm{finger}} \le \frac{b_{\mathrm{palm}}}{2}    (34)

A procedural illustration of the proposed rule-based method for detecting the hand region from the input image is shown in Fig. 7. After detecting the hand region, the pixels belonging to the forearm R_forearm are removed from the gesture image.

4.2.2. Proposed approach to orientation correction

The orientation of the hand can be assumed to imply the orientation of the gesture. In the case of static gestures, the information is conveyed through the finger configurations. Since the human hand is highly flexible, it is natural that the orientation of the oddment might not be the orientation of the fingers. Hence, the major axis of the gesture is not sufficient to estimate the angular deviation that is caused by the fingers. Therefore, in order to align a gesture class uniformly, the orientation with respect to the abducted fingers is utilized. If the number of abducted fingers is less than 2, the orientation correction is achieved using the oddment.

4.2.2.1. Orientation correction using finger configuration. The normalization of rotation changes based on the finger configuration is achieved by detecting the tips of the abducted fingers. For this purpose, the boundary points (x_b, y_b) are ordered as a contiguous chain of coordinates using 8-connectivity. Any one of the boundary pixels that is not enclosed within the region containing the fingers is used as the starting point and the ordering is performed in the clockwise direction.

Suppose that z is the length of the boundary measured by the number of pixels; then a distance curve g(z) is generated by computing the Euclidean distances between the palm centroid (x_c, y_c) and the boundary pixel (x_b, y_b) at z using (19).

The curve g is smoothed using cubic-spline smoothing [53]. The resultant is a smooth curve consisting of peaks that correspond to the finger tips of the hand gesture. These peaks are detected by computing the first and the second order derivatives of g using finite difference approximations. Thus, g(z) is considered to be a peak if

|g'(z)| < \xi \quad \text{and} \quad g''(z) < 0    (35)

where xi is the user defined minimum permissible difference. The finite difference approximations

g'(z) \approx \frac{g(z+1) - g(z-1)}{2}    (36)

and

g''(z) \approx \frac{g(z+1) + g(z-1) - 2g(z)}{4}    (37)

are used to implement (35). In some cases, a few peaks may correspond to the palm region. These points are easily eliminated by verifying their presence in the oddment A. The 2D coordinate positions of the detected peaks are utilized to find a representative peak for each abducted finger. The distance curve corresponding to a gesture and the detected finger tips are shown in Fig. 8.

Let L be the total number of detected peaks and let gamma_i, i = 1, ..., L, define the position vectors of the detected points with respect to (x_c, y_c), indexed from left to right. These vectors gamma_i are referred to as the finger vectors, and the central finger vector gamma-hat is computed from

\hat{\gamma} = \begin{cases} \gamma_{(L+1)/2} & \text{if } L \text{ is odd} \\[4pt] \dfrac{\gamma_{L/2} + \gamma_{L/2+1}}{2} & \text{otherwise} \end{cases}    (38)

The gestures are assumed to be perfectly aligned if the vector gamma-hat is at 90° with respect to the horizontal axis of the image. Otherwise, the segmented gesture image is rotated by 90° minus the angle of gamma-hat.

4.2.2.2. Orientation correction using the oddment. The geometry of the oddment A is utilized to correct the orientation of the gestures with only one abducted finger and of gestures like the fist. The shape of the oddment can be well approximated by an ellipse and hence, the orientation of its major axis with respect to the horizontal axis of the image gives the approximate rotation angle of the hand gesture.

4.2.3. Normalization of scale and spatial translation

The scale of the rotation corrected gesture region is normalized and fixed to a pre-defined size through the nearest neighbor interpolation/down sampling technique. The spatial translation is corrected by shifting the palm centroid (x_c, y_c) to the center of the image. Therefore, the resultant is the segmented gesture image that is normalized for transformations due to rotation, scaling and translation.

4.3. Feature extraction and classification

The normalized hand gesture data obtained are represented using the Krawtchouk moments calculated using (11). For comparative studies, the geometric and the Zernike moments defined through (1) and (4) respectively are employed as features for representing the normalized hand gestures. The features representing the gestures are classified using the minimum distance classifier.

Consider z_s and z_t as the feature vectors of the test image and the target image (in the trained set) respectively. Then, the classification of z_s is done using the minimum distance classifier defined as

d_t(z_s, z_t) = \sum_{j=1}^{T} (z_{sj} - z_{tj})^{2}    (39)

\text{Match} = \arg\min_{\{t\}} (d_t)

where t is the index of signs in the trained set and T is the dimension of the feature vectors.

5. Experimental results and discussion

The gestures in the database are captured using an RGB Frontech e-cam. The camera has a resolution of 1280 x 960 and is connected to an Intel Core 2 Duo processor with 2 GB RAM.

In our experiment, the segmentation overhead is simplified by capturing the images under a uniform background. However, the foreground is cluttered with other objects and the hand is ensured to be the largest skin color object within the field-of-view (FOV). Except for the size, there were no restrictions imposed on the color and texture of the irrelevant cluttered objects. Also, the FOV was sufficiently large, enabling the users to perform gestures naturally without interfering with their gesturing styles. The gestures are captured by varying the viewpoint.

5.1. Determination of viewpoint

In the field of imaging, the viewpoint refers to the position of the camera with respect to the object of focus [54]. Therefore, in the context of our experiment, we define the viewing angle as the angle made by the camera with the longest axis of the hand. The optimal choice of viewing angle or viewpoint is determined by the amount of perspective distortion. The distortion is caused if the focal plane is not parallel to the object's surface and/or not in level with the center of the object; that is, the camera is not equidistant from all parts of the object [55]. Hence, the viewpoint is assumed to be optimum if the camera is placed parallel to the surface of the hand. For our experimental setup, the optimum viewing angle is determined to be 90°. Fig. 9(a) illustrates the setup for gesture acquisition from the optimal viewpoint and Fig. 9(b) illustrates the variation of the viewing angles with respect to the hand.

5.2. About the database
Fig. 8. Finger tip detection using the peaks in the distance curve g(z); the marked central vector denotes gamma-hat.
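The minimum distance classifier of Eq. (39) in Section 4.3 amounts to a nearest-template search over the stored feature vectors. A minimal sketch, assuming the trained features are stacked in an array of shape (number of templates, T) with a label per row:

import numpy as np

def classify(z_s, trained_features, trained_labels):
    # squared Euclidean distance of Eq. (39) to every trained feature vector,
    # followed by the arg-min matching rule
    d = np.sum((np.asarray(trained_features, float) - np.asarray(z_s, float)) ** 2, axis=1)
    match = int(np.argmin(d))
    return trained_labels[match], d[match]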

Fig. 9. (a) A schematic representation of the experimental setup at the optimal view angle. (b) Illustrations of the view angle variation between the camera and the object. The object is assumed to lie on the x-y plane and the camera is mounted along the z-axis. The view angle is measured with respect to the x-y plane.

Fig. 10. Gesture signs in the database.

The database consists of two sets of gesture data. The first dataset is composed of gestures collected from a perfect viewpoint, which means that the angle of view is perpendicular to the imaging surface. The second dataset contains gestures captured by varying the view angles. The testing is performed in real time on the gestures collected under a controlled environment.

The database is constructed by capturing 4230 gestures from 23 users. The data contains 10 gesture signs with 423 samples for each sign. The gesture signs taken for evaluation are shown in Fig. 10. The images are collected under three different scales, seven orientations and view angles of 45°, 90°, 135°, 225° and 315°.

Of the 4230 images, 2260 gestures are taken at 90° and the remaining 1970 at varying view angles. We refer to the dataset taken at 90° as Dataset 1 and the remaining data as Dataset 2. Dataset 1 consists of gestures that vary due to the similarity transformations of rotation and scaling. Dataset 2 consists of gestures that are taken at different view angles and scales. Due to the viewing angles, the gestures undergo perspective distortions, and the view angle variation also imposes orientation changes. Thus, the gestures in Dataset 2 account for both perspective (view angle) and similarity (orientation and scale) distortions. Also, the gestures in Dataset 1 are collected cautiously such that there is no self-occlusion between the fingers, whereas while collecting the samples in Dataset 2 no precautions were taken to control self-occlusion, which might occur due to either the user's flexibility or the view angle variation. The partitioning of the database allows a detailed study of the efficiency of user and view independent gesture classification.

5.3. Gesture representation using orthogonal moments

The potential of the orthogonal moments as features for gesture representation is compared on the basis of their accuracy in reconstruction. Let f denote a normalized binary gesture image for which the Zernike and the Krawtchouk moments are derived using (4) and (11) respectively. Accordingly, let f-hat be the image reconstructed from the corresponding moments through (3) and (10). The image reconstructed from the moments is binarised through thresholding. The accuracy in reconstruction is measured using the mean square error (MSE) and the structural similarity (SSIM) index.

The MSE between the images f and f-hat is computed as

\mathrm{MSE} = \frac{1}{(N+1)(M+1)} \sum_{x=0}^{N} \sum_{y=0}^{M} \left( f(x, y) - \hat{f}(x, y) \right)^{2}    (40)

The SSIM index between the images f and f-hat is computed block-wise by dividing the images into L blocks of size 11 x 11. For l in {1, 2, ..., L}, the SSIM between the lth blocks of f and f-hat is evaluated as [56]

\mathrm{SSIM}(f, \hat{f})_{l} = \frac{(2\mu_{f}\mu_{\hat{f}} + c_1)(2\sigma_{f\hat{f}} + c_2)}{(\mu_{f}^{2} + \mu_{\hat{f}}^{2} + c_1)(\sigma_{f}^{2} + \sigma_{\hat{f}}^{2} + c_2)}    (41)

where mu_f and mu_f-hat denote the mean intensities, sigma_f^2 and sigma_f-hat^2 denote the variances and sigma_{f f-hat} denotes the covariance. The constants c_1 and c_2 are included to avoid unstable results when the means and the variances are very close to zero. For our experiments, the constants are chosen as c_1 = 0.01 and c_2 = 0.03. The average of the block-wise similarity gives the SSIM index representing the overall image quality. The value of the SSIM index lies in [-1, 1] and a larger value means higher similarity between the compared images.

The analysis with respect to a gesture image is presented in Fig. 11 for illustrating the performance of the orthogonal moments in gesture representation. The normalized gesture image considered for analysis is shown in Fig. 11(a). A few examples of the corresponding images obtained through reconstruction from different numbers of Zernike and Krawtchouk moments are shown in Fig. 11(b) and (c) respectively. From these reconstructed images, it is observed that the perceptual similarity between the original and the reconstructed images is greater for the Krawtchouk moment based representation.

The comparative plots of the MSE and the SSIM index values obtained for image reconstruction by varying the number of moments are shown in Fig. 11(d) and (e) respectively. The MSE value obtained for the Krawtchouk moment based representation is less than that of the Zernike moment based representation, and the corresponding SSIM index values show that the Krawtchouk moments have comparatively high similarity to the original image. It is also noted that the computation of the Zernike moments becomes numerically unstable for higher orders. Hence, the reconstruction error increases with the increase in the order. In the case of Krawtchouk moments, the reconstruction error decreases with the increase in the order.

The Krawtchouk moments closely approximate the original image and the edges are better defined in the Krawtchouk based approach. This is expected, because the Krawtchouk polynomials (as shown in Fig. 2) are localized and have relatively high spatial frequency components. On the other hand, the Zernike polynomials (as shown in Fig. 1) are global functions with relatively less spatial frequency content. As a result, the Zernike moments are not capable of representing the local changes as efficiently as the Krawtchouk moments. From the results in Fig. 11 it can be inferred that the Krawtchouk moments offer better accuracy than the Zernike moments in gesture representation. Therefore, the Krawtchouk moments are potential features for better gesture representation.
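Eqs. (40) and (41) can be prototyped as below; the handling of partial border blocks and the binarisation of the reconstructed image are omitted in this sketch:

import numpy as np

def mse(f, f_hat):
    # Eq. (40)
    f, f_hat = np.asarray(f, float), np.asarray(f_hat, float)
    return np.mean((f - f_hat) ** 2)

def ssim_index(f, f_hat, block=11, c1=0.01, c2=0.03):
    # block-wise SSIM of Eq. (41), averaged over all full 11 x 11 blocks
    f, f_hat = np.asarray(f, float), np.asarray(f_hat, float)
    scores = []
    for i in range(0, f.shape[0] - block + 1, block):
        for j in range(0, f.shape[1] - block + 1, block):
            a = f[i:i + block, j:j + block]
            b = f_hat[i:i + block, j:j + block]
            mu_a, mu_b = a.mean(), b.mean()
            var_a, var_b = a.var(), b.var()
            cov = ((a - mu_a) * (b - mu_b)).mean()
            scores.append(((2 * mu_a * mu_b + c1) * (2 * cov + c2)) /
                          ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)))
    return float(np.mean(scores))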

Fig. 11. Illustration of performance evaluation of gesture representation using Zernike and Krawtchouk moments at various orders. (a) Original image. (b) Examples of
images reconstructed from Zernike moments. N.M. denotes number of moments. (c) Examples of images reconstructed from Krawtchouk moments. N.M. denotes number
of moments. (d) Comparative plot of MSE vs number of moments. (e) Comparative plot of SSIM index vs number of moments.

5.4. Gesture classification consists of 230 gestures with 23 training samples per gesture
sign. Some examples of the gestures contained in the training set
The orders of the orthogonal moments are selected experi- are shown in Fig. 12. The classification is performed on 2030
mentally based on the accuracy in reconstruction and the orders testing samples that are collected from 23 users. The detailed
of the geometric moments are chosen based on the recognition scores of the gesture classification results in Table 1 with respect
performance. The maximum orders of the geometric moments, to the samples from Dataset 1 are given in Tables 2–4.
the Zernike moments and the Krawtchouk moments were fixed at In the case of geometric moments, the rate of misclassification
14 ðn ¼ 7 and m ¼ 7Þ, 30 and 80 ðn ¼ 40 and m ¼ 40Þ respectively. increases vastly as the number of users in the training set
The parameters p1 and p2 of the Krawtchouk polynomials are decreases. Further, the decline in the classification accuracies
fixed at 0.5 each to ensure that the moments are emphasized with indicates that the geometric moments provide the least user
respect to the centroid of the object. The resolution of the image is independence. The gesture-wise classification results of the geo-
fixed at 104  104 with the scale of the hand object normalized to metric moments obtained for varying number of users in the
64  64 through the nearest neighbor interpolation/down sam- training set are tabulated in Table 2(a)–(c). From these results, it
pling method. The experiments for analyzing the user indepen- is observed that most of the mismatch has occurred between the
dence and view invariance are as follows. gestures that are geometrically close. For example, gesture 3 is
5.4.1. Verification of user independence
In gesture classification, user independence refers to the robustness to user variations, which include variations in the hand geometry and the flexibility of the fingers. In order to perform the experiment, the training and the testing samples are taken from Dataset 1. As stated earlier, the gestures in Dataset 1 are collected at the optimum view angle such that the gestures do not undergo perspective distortion. Therefore, the variations among the gesture samples in Dataset 1 are only due to the user variations. For this reason, the experiments for verifying the user invariance are performed using only the samples in Dataset 1.

The user independence of the features is verified by varying the number of users considered while training. The number of users considered in forming the training set for experimentation is varied as 23, 15 and 7. Thus, the largest training dataset consists of 230 gestures with 23 training samples per gesture sign. Some examples of the gestures contained in the training set are shown in Fig. 12. The classification is performed on 2030 testing samples that are collected from 23 users. The detailed scores of the gesture classification results in Table 1 with respect to the samples from Dataset 1 are given in Tables 2–4.
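The classification stage can be sketched as follows, assuming that each gesture is described by the moment feature vector discussed above and that the minimum distance classifier assigns the label of the nearest training vector in the Euclidean sense; the class and variable names are illustrative.

import numpy as np

class MinimumDistanceClassifier:
    """Assign a test feature vector the label of its nearest training vector (Euclidean distance)."""

    def fit(self, train_features, train_labels):
        self.train_features = np.asarray(train_features, dtype=float)  # shape: (num_train, dim)
        self.train_labels = np.asarray(train_labels)
        return self

    def predict(self, test_feature):
        dists = np.linalg.norm(self.train_features - np.asarray(test_feature, dtype=float), axis=1)
        return self.train_labels[int(np.argmin(dists))]

# Example usage: 230 training silhouettes (23 users x 10 signs), 2030 test silhouettes.
# clf = MinimumDistanceClassifier().fit(train_feats, train_labels)
# predictions = [clf.predict(x) for x in test_feats]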

Fig. 12. Examples of the training gestures taken from Dataset 1.

Table 1
Verification of user independence. Comparison of classification results obtained for varying number of users in the training set. The number of testing samples in Dataset 1 is 2030. % CC denotes the percentage of correct classification of the testing samples from Dataset 1.

No. of users in the training set | No. of training samples/gesture | % CC, geometric moments | % CC, Zernike moments | % CC, Krawtchouk moments
23 | 23 | 82.07 | 90.89 | 95.42
15 | 15 | 78.28 | 89.75 | 95.12
7 | 7 | 72.66 | 86.65 | 93.15

In the case of geometric moments, the rate of misclassification increases vastly as the number of users in the training set decreases. Further, the decline in the classification accuracies indicates that the geometric moments provide the least user independence. The gesture-wise classification results of the geometric moments obtained for varying number of users in the training set are tabulated in Table 2(a)–(c). From these results, it is observed that most of the mismatch has occurred between the gestures that are geometrically close. For example, gesture 3 is mostly misclassified as gesture 2, gesture 1 is misidentified as gesture 7, gesture 2 is recognized as either gesture 1 or gesture 3, and gesture 7 is matched as gesture 2. It is also observed that in the case of geometric moments there is poor perceptual correspondence between the mismatched gestures. This is because the geometric moments are global features and they only represent the statistical attributes of a shape.

Table 2
Comprehensive scores of the overall classification results in Table 1 for geometric moments with different number of users in the training set and 203 testing samples/gesture taken from Dataset 1. Rows are the input gestures (I/P) and columns are the output classes (O/P) 0–9.

(a) Confusion matrix for 23 training samples/gesture
0: 175 0 1 8 1 0 0 1 8 9
1: 0 184 0 0 0 0 0 19 0 0
2: 0 18 163 13 4 2 0 1 1 1
3: 4 1 53 126 12 0 0 1 0 6
4: 1 1 12 18 160 5 2 0 3 1
5: 0 0 0 0 13 179 7 0 0 4
6: 0 1 0 1 10 5 156 27 0 3
7: 0 1 17 2 3 0 3 172 5 0
8: 9 3 3 5 13 0 2 1 167 0
9: 0 0 1 4 0 0 2 12 0 184

(b) Confusion matrix for 15 training samples/gesture
0: 171 5 1 12 3 0 0 1 7 3
1: 0 185 0 0 0 0 1 17 0 0
2: 0 18 157 19 6 0 0 1 1 1
3: 5 1 65 108 19 0 0 0 0 5
4: 1 1 15 15 158 5 3 0 2 3
5: 0 0 0 0 22 165 13 0 0 3
6: 0 5 0 0 16 6 150 23 1 2
7: 0 1 10 2 3 0 5 175 7 0
8: 11 2 4 2 22 0 3 1 158 0
9: 3 2 1 8 0 0 2 25 0 162

(c) Confusion matrix for 7 training samples/gesture
0: 168 4 1 14 4 0 0 1 8 3
1: 0 177 0 0 0 0 1 25 0 0
2: 0 26 135 29 8 0 0 2 1 2
3: 19 0 54 100 20 0 1 1 0 8
4: 1 2 2 19 159 5 10 0 2 3
5: 0 2 0 0 22 155 21 0 0 3
6: 0 1 0 2 30 1 129 38 1 1
7: 0 1 15 2 4 0 4 171 6 0
8: 13 4 3 2 34 0 4 0 143 0
9: 2 2 2 9 4 0 5 41 0 138

The Zernike moments offer a better classification rate than the geometric moments even as the number of users considered for training decreases. From the comprehensive scores of the classification results given in Table 3(a)–(c), it is understood that the accuracy of the Zernike moments is mainly reduced due to the confusion among the gestures 1, 8 and 9. Since the Zernike polynomials are defined in the polar domain, the magnitudes of the Zernike moments for shapes with almost similar boundary profiles will also be approximately the same. Hence, the misclassification in the case of Zernike moments occurred between the gestures that have almost similar boundary profile. From the samples shown in Fig. 12, it can be noted that the gestures 1, 8 and 9 have almost the same boundary profile and hence, are frequently mismatched.

Table 3
Comprehensive scores of the overall classification results in Table 1 for Zernike moments with different number of users in the training set and 203 testing samples/gesture taken from Dataset 1. Rows are the input gestures (I/P) and columns are the output classes (O/P) 0–9.

(a) Confusion matrix for 23 training samples/gesture
0: 182 3 0 0 0 0 0 0 14 4
1: 0 177 0 0 0 0 2 17 3 4
2: 0 3 198 2 0 0 0 0 0 0
3: 0 0 0 201 0 0 0 1 0 1
4: 1 0 0 0 198 1 3 0 0 0
5: 0 1 0 0 0 197 3 0 0 2
6: 0 0 0 0 0 0 180 22 0 1
7: 0 1 2 0 0 0 1 196 3 0
8: 1 12 1 0 0 0 1 15 162 11
9: 2 11 0 0 0 0 1 0 35 154

(b) Confusion matrix for 15 training samples/gesture
0: 184 2 0 0 0 0 0 0 12 5
1: 0 176 0 0 0 0 2 18 5 2
2: 0 3 197 2 0 0 1 0 0 0
3: 0 0 0 201 0 0 0 1 0 1
4: 1 1 0 0 197 1 3 0 0 0
5: 0 1 0 0 0 197 3 0 0 2
6: 0 0 0 0 0 0 190 12 0 1
7: 0 3 2 0 0 0 4 188 5 1
8: 3 12 0 0 0 0 1 15 160 12
9: 3 19 0 0 0 0 0 4 44 133

(c) Confusion matrix for 7 training samples/gesture
0: 181 3 0 0 0 0 1 0 11 7
1: 0 179 1 0 0 0 0 16 3 4
2: 0 5 189 3 0 0 1 0 1 4
3: 0 0 0 200 1 0 0 1 0 1
4: 1 0 0 1 196 0 5 0 0 0
5: 0 2 0 0 3 189 7 0 0 2
6: 0 0 0 0 0 0 186 17 0 0
7: 0 0 2 0 0 0 2 192 3 4
8: 7 18 0 0 0 0 1 29 135 13
9: 4 14 0 1 0 0 1 30 41 112

In the case of Krawtchouk moments, the mismatch has occurred between the gestures with coinciding regions. With respect to the shape, some gesture signs in the dataset can be considered as subsets of other signs in the context of the spatial distribution of their pixels. To show that the confusion in the Krawtchouk moment based representation has occurred between gestures with almost the same spatial distribution of pixels, a simple analysis is performed by comparing the spatial distribution of the boundary pixels. If the boundary pixels exhibit higher correspondence, so will the regions within the boundaries. Fig. 13 illustrates a few examples from the misclassifications in gesture classes 1 and 9. It can be verified that the spatial distribution of the pixels in the test gestures coincides highly with the matches obtained through Krawtchouk moments based classification. As per the results in Table 4(a)–(c), the gestures 1, 6 and 9 are frequently misclassified as gesture 7. Some examples of these misclassifications along with the corresponding training gestures are shown in Fig. 14. Similarly, gesture 0 is a subset of all the gesture signs contained in the database. As a result, many gestures are mismatched with gesture 0. From the results in Table 1, it is confirmed that the Krawtchouk moment features represent the gestures more accurately than each of the geometric and the Zernike moment based features. Particularly, the performance of the Krawtchouk moments is consistent even for a small number of users in the training set. In the case of the geometric and the Zernike moments, the classification rate with respect to each gesture has varied with the varying number of samples while training. Therefore, it is concluded that among the moment based representations the Krawtchouk moments are more robust features for user independent gesture recognition.

Table 4
Comprehensive scores of the overall classification results in Table 1 for Krawtchouk moments with different number of users in the training set and 203 testing samples/gesture taken from Dataset 1. Rows are the input gestures (I/P) and columns are the output classes (O/P) 0–9.

(a) Confusion matrix for 23 training samples/gesture
0: 198 2 0 0 0 0 0 1 2 0
1: 0 186 0 0 0 0 0 16 0 1
2: 0 2 201 0 0 0 0 0 0 0
3: 2 0 0 200 1 0 0 0 0 0
4: 4 0 0 1 198 0 0 0 0 0
5: 0 0 0 0 0 194 3 0 0 6
6: 0 0 0 0 1 0 175 25 0 2
7: 0 2 1 0 0 0 1 193 6 0
8: 2 2 1 0 0 0 0 1 197 0
9: 5 0 0 0 0 0 0 3 0 195

(b) Confusion matrix for 15 training samples/gesture
0: 197 2 0 0 0 0 0 1 3 0
1: 0 185 2 0 0 0 0 15 0 1
2: 0 1 202 0 0 0 0 0 0 0
3: 2 0 0 200 0 0 1 0 0 0
4: 4 0 0 1 198 0 0 0 0 0
5: 0 0 0 0 0 193 4 0 0 6
6: 0 0 0 0 1 0 178 22 0 2
7: 0 4 1 0 0 0 3 189 6 0
8: 3 2 1 0 1 0 0 1 195 0
9: 4 0 0 2 0 0 0 3 0 194

(c) Confusion matrix for 7 training samples/gesture
0: 194 4 0 0 0 0 0 0 5 0
1: 0 186 2 0 0 0 0 15 0 0
2: 0 5 198 0 0 0 0 0 0 0
3: 3 1 0 188 2 0 1 3 5 0
4: 4 0 0 0 199 0 0 0 0 0
5: 0 0 0 0 0 194 5 0 0 4
6: 0 0 0 0 2 0 158 42 0 1
7: 0 2 2 0 0 0 3 195 1 0
8: 4 0 5 0 1 0 2 2 189 0
9: 2 0 0 5 0 0 0 6 0 190

Fig. 13. Examples of results from Krawtchouk moment based classification. The illustration is presented to show that the Krawtchouk moments depend on the similarity between the spatial distribution of the pixels within the gesture regions. The spatial correspondence between the gestures is analyzed based on the shape boundary. It can be observed that the maximum number of boundary pixels from the test sample coincide more with the obtained match rather than the actual match.

Fig. 14. Results from the experiment on user invariance. Examples of the testing samples that are misclassified in the Krawtchouk moments based method. The correspondence of the test gesture can be observed to be high with respect to the mismatched gesture rather than the trained gestures within the same class.
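The boundary-correspondence analysis illustrated in Fig. 13 could be prototyped along the lines below. The tolerance value and the scoring rule are assumptions made for illustration and are not taken from the paper; only the general idea, measuring how closely the test boundary pixels fall on the boundary of a candidate gesture, follows the analysis described above.

import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def boundary(silhouette):
    """Boundary pixels of a binary silhouette (object minus its erosion)."""
    sil = silhouette.astype(bool)
    return sil & ~binary_erosion(sil)

def boundary_correspondence(test_sil, ref_sil, tol=2.0):
    """Fraction of test boundary pixels lying within `tol` pixels of the reference boundary."""
    ref_b = boundary(ref_sil)
    dist_to_ref = distance_transform_edt(~ref_b)   # distance of every pixel to the nearest reference boundary pixel
    test_pts = np.argwhere(boundary(test_sil))
    return float(np.mean(dist_to_ref[test_pts[:, 0], test_pts[:, 1]] <= tol))

# A higher score for the obtained match than for the actual class would mirror the behaviour shown in Fig. 13.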

5.4.2. Verification of view invariance
The view angle variations during gesture acquisition lead to perspective distortions and may sometimes cause self-occlusion. The self-occlusion can also be due to the poor flexibility of the gestures. The study on view invariance verifies the robustness of the methods towards the effects of viewpoint changes.

In order to study the view invariance property of the considered methods, the initial experiment is performed by considering the training set taken from Dataset 1. We refer to the training samples from Dataset 1 as Training set-I. Considering that the gestures in Dataset 1 are taken at a view angle of 90°, the experiments also include the samples from Dataset 1 while testing. Therefore, the testing set consists of 3600 samples that include 2030 samples from Dataset 1 and 1570 samples from Dataset 2. The classification results obtained using Training set-I are tabulated in Table 5. The comprehensive gesture-wise classification scores are given in Table 6(a)–(c).

From the results in Table 5, it is evident that among the moment based representations, the Krawtchouk moments offer better classification accuracy. It is known that the perspective distortion affects the boundary profile and the geometric attributes of a shape. Hence, the geometric moments are insufficient for recognizing the gestures under view angle variation. Similarly, the Zernike moments are sensitive to boundary distortions and as a result the performance of the Zernike moments is low for the gesture samples from Dataset 2. From the detailed scores in Table 6(b), it is observed that the maximum misclassification in the Zernike moments based method is again due to the confusion among the gestures 1, 8 and 9. Similarly, gesture 7 is confused with gesture 2 and gesture 6 is misclassified as gesture 7. Among the moment based representations, the Krawtchouk moments have the higher recognition rate for the testing samples from Datasets 1 and 2. Particularly, in the case of Dataset 2 the improvement is almost 11% for Training set-I and it indicates that the Krawtchouk moments are robust to the view angle variations.

Table 5
Experimental validation of view invariance. Comparison of classification results obtained for Training set-I and Training set-II. The training sets include gestures collected from 23 users. The number of testing samples in Dataset 1 and Dataset 2 is 2030 and 1570 respectively. % CC denotes the percentage of correct classification.

Moment based representation | Training set-I: % CC, Dataset 1 | % CC, Dataset 2 | Overall % CC | Training set-II: % CC, Dataset 1 | % CC, Dataset 2 | Overall % CC
Geometric moments | 82.07 | 71.40 | 77.42 | 87.39 | 80.57 | 84.42
Zernike moments | 90.89 | 75.48 | 84.17 | 94.83 | 90.32 | 92.86
Krawtchouk moments | 95.42 | 86.88 | 91.69 | 97.73 | 95.92 | 96.94

By comparing the gesture-wise classification results given in Table 6(a)–(c), it is observed that the number of misclassifications is notably higher for almost all the gestures in Dataset 2. Samples of some of the gestures from Dataset 2 with a higher misclassification rate are shown in Fig. 15. It can be understood that the recognition efficiency is reduced mainly due to the self-occlusion between the fingers and the boundary deviations. From Table 5 it should be noted that the classification accuracy is better for the testing samples from Dataset 1. This is because Training set-I is constructed using the samples taken from Dataset 1. This indicates that the performance for Dataset 2 can be improved if the training set also includes samples taken at varied view angles.

Table 6
Confusion matrices for the classification results given in Table 5 for Training set-I with 23 training samples/gesture sign and 360 testing samples/gesture sign. Each row lists the input gesture (I/P) followed by its non-zero classification counts over the output classes; every row sums to the 360 testing samples.

(a) Detailed score of geometric moments
0: 306 3 1 14 1 1 17 17
1: 1 309 4 45 1
2: 39 277 22 10 2 1 2 7
3: 24 3 90 203 21 1 2 16
4: 6 2 23 35 272 10 2 6 4
5: 1 39 274 25 1 20
6: 2 5 37 6 259 39 4 8
7: 10 38 13 3 7 276 11 2
8: 21 4 7 7 28 4 1 285 3
9: 1 1 13 1 2 16 326

(b) Detailed score of Zernike moments
0: 326 4 24 6
1: 290 2 36 17 15
2: 2 36 309 3 1 1 8
3: 9 1 9 325 4 1 1 4 6
4: 8 3 1 10 328 1 5 4
5: 1 1 4 327 12 15
6: 2 299 44 13 2
7: 1 8 22 4 309 15 1
8: 8 23 1 1 23 285 19
9: 7 38 2 2 79 232

(c) Detailed score of Krawtchouk moments
0: 354 3 1 2
1: 319 1 38 2
2: 7 22 331
3: 22 1 8 315 2 7 5
4: 15 12 330 1 1 1
5: 2 321 18 19
6: 1 303 45 6 5
7: 1 7 3 2 335 12
8: 7 3 1 1 2 346
9: 8 5 347

5.4.3. Improving view invariant recognition
In order to improve the view invariant classification rate, the experiments are repeated by including the gestures taken at different view angles in the training set. The extended training set consists of 630 gesture samples that are collected from 23 users. Among those, 230 samples are taken from Dataset 1 and 400 samples from Dataset 2. We refer to the extended training set as Training set-II. The classification results are obtained for 3600 samples that contain 2030 samples from Dataset 1 and 1570 samples from Dataset 2. The results are consolidated in Table 5. As expected, the improvement in recognition accuracies for Dataset 2 is desirably higher for Training set-II. The performance of the geometric and the Zernike moments has also improved notably. The performance of the Krawtchouk moment features is consistently superior to that of the other considered moments for both the training sets.

The comprehensive scores for the results obtained using Training set-I and Training set-II are given in Tables 6 and 7 respectively. From the gesture-wise classification results obtained for Training set-I, it is difficult to perceptually relate the misclassified gestures. However, including more samples from different view points in the training set has improved the distinctness of the gestures. From the results for geometric moments in Table 7(a), it is observed that more misclassification has occurred for gestures 2, 3, 4 and 6. In the case of Zernike moments, the results in Table 7(b) show that gestures 6, 8 and 9 have lower classification rates. As tabulated in Table 7(c), the maximum cases of misclassification for the Krawtchouk moments are due to gesture 6. It is noted that gesture 6 is more prevalently misclassified as gesture 7.

As stated earlier, the Zernike polynomials are global functions and hence, the misclassifications in the case of Zernike moments have occurred for the gestures with almost similar boundary profile. The Krawtchouk moments are region based features and are local functions whose support increases with the order of the polynomial. Hence, as explained before, the confusion occurred between the gestures with almost similar spatial distribution of pixels.

The plots in Fig. 16 illustrate the classification accuracies of each gesture sign at different view angles. From these plots, it is inferred that the maximum cases of misclassification have occurred at a view angle of 135°. Further, the results in Fig. 16 corroborate the observations in Table 7.

Table 7
Confusion matrices for the classification results given in Table 5 for Training set-II with 40 training samples/gesture sign and 360 testing samples/gesture sign. Each row lists the input gesture (I/P) followed by its non-zero classification counts over the output classes; every row sums to the 360 testing samples.

(a) Detailed score of geometric moments
0: 323 5 2 12 5 6 7
1: 328 4 2 3 1 1 19 1 1
2: 19 305 24 6 1 4 1
3: 14 1 60 246 27 1 1 2 8
4: 1 2 11 62 264 11 2 2 4 1
5: 1 2 14 330 4 9
6: 4 9 6 286 42 8 5
7: 13 10 7 5 322 2 1
8: 4 3 13 20 3 5 310 2
9: 1 3 12 2 5 8 4 325

(b) Detailed score of Zernike moments
0: 341 3 6 10
1: 327 4 13 16
2: 356 2 1 1
3: 1 4 345 2 1 2 5
4: 11 343 1 3 1 1
5: 1 1 356 2
6: 3 327 28 2
7: 3 10 12 330 1 4
8: 30 4 7 297 22
9: 1 8 30 321

(c) Detailed score of Krawtchouk moments
0: 353 4 1 2
1: 350 1 5 4
2: 360
3: 4 1 4 343 1 1 1 5
4: 10 348 1 1
5: 1 358 1
6: 1 329 25 3 2
7: 7 7 345 1
8: 2 1 4 1 2 350
9: 3 1 2 354

Fig. 15. Samples of the test gestures from Dataset 2 that have lower recognition accuracy with respect to all the methods.

Fig. 16. Illustration of gesture-wise classification results obtained at different view angles. The training is performed using Training set-II and the results are shown for 290
testing samples (29 samples/gesture sign) at each view angle. (a) Classification results of geometric moments at different view angles. (b) Classification results of Zernike
moments at different view angles. (c) Classification results of Krawtchouk moments at different view angles.
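The gesture-wise accuracies plotted in Fig. 16 can be tabulated with a short routine such as the one below, assuming the predicted labels, the true labels and the view angle of each test sample are available; the function name and data layout are illustrative.

import numpy as np
from collections import defaultdict

def accuracy_by_view_angle(true_labels, predicted_labels, view_angles):
    """Per-gesture classification accuracy (%) for every view angle present in the test data."""
    buckets = defaultdict(lambda: [0, 0])            # (angle, gesture) -> [correct, total]
    for truth, pred, angle in zip(true_labels, predicted_labels, view_angles):
        key = (angle, truth)
        buckets[key][1] += 1
        buckets[key][0] += int(truth == pred)
    return {key: 100.0 * correct / total for key, (correct, total) in buckets.items()}

# Example: scores = accuracy_by_view_angle(y_true, y_pred, angles)
# scores[(135, 6)] would give the % CC of gesture 6 at a 135 degree view angle.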

6. Conclusion

This paper has presented a gesture recognition system using geometry based normalizations and Krawtchouk moment features for classifying static hand gestures. The proposed system is robust to similarity transformations and projective variations. A rule based normalization method utilizing the anthropometry of the hand is formulated for separating the hand region from the forearm. The method also identifies the finger and the palm regions of the hand. An adaptive rotation normalization procedure based on the abducted fingers and the major axes of the hand is proposed. The 2D Krawtchouk moments are used to represent the segmented binary gesture image. The classification is performed using a minimum distance classifier. The experiments are aimed towards analyzing the accuracy of the Krawtchouk moments as features in user and view invariant static hand gesture classification. The experiments are conducted on a large database consisting of 10 gesture classes and 4230 gesture samples. A detailed study of the Krawtchouk moments based classification is conducted in comparison with the geometric moments and the Zernike moments. Based on the results, we conclude that the Krawtchouk moments are robust features for achieving viewpoint invariant and user independent recognition of static hand gestures.

Future work based on this paper may involve utilizing the proposed anthropometry based extraction methods for normalizing the user variations in the hand gesture. Despite scale normalization, user-dependency occurs due to the variation in the aspect ratio of the constituent regions of the hand. The anthropometry based extraction method is capable of separating the finger and the palm regions and hence, it can be used to achieve user independence by normalizing the aspect ratio of the hand regions. Subsequent research can be extended to studying the efficiency of Krawtchouk moment features in recognizing occluded hand shapes and complex hand gestures as in sign-language communication.

Conflict of interest statement

None declared.
S. Padam Priyal received the B.Eng. degree in Electronics and Communication Engineering from Karunya Institute of Technology, Coimbatore, India, in 2002 and the M.Eng.
degree in Communication Systems from the Mepco Schlenk Engineering College, Sivakasi, India, in 2004. Currently, she is a Research Scholar in the Department of
Electronics and Electrical Engineering, Indian Institute of Technology, Guwahati. Her research interests include computer vision, pattern recognition and image analysis.

Prabin Kumar Bora received the B.Eng. degree in Electrical Engineering from Assam Engineering College, Guwahati, India, in 1984 and the M.Eng. and Ph.D. degrees in Electrical Engineering from the Indian Institute of Science, Bangalore, in 1990 and 1993 respectively. Currently, he is a Professor in the Department of Electronics and Electrical Engineering, Indian Institute of Technology, Guwahati. Previously, he was a Faculty Member with Assam Engineering College, Guwahati; Jorhat Engineering College, Jorhat, India; and Gauhati University, Guwahati. His research interests include computer vision, pattern recognition, video coding, image and video watermarking and perceptual video hashing.
