
Neural Comput & Applic
DOI 10.1007/s00521-016-2525-z

ORIGINAL ARTICLE

Dynamic hand gesture recognition using vision-based approach for human–computer interaction

Joyeeta Singha · Amarjit Roy · Rabul Hussain Laskar

Received: 28 January 2016 / Accepted: 8 August 2016
© The Natural Computing Applications Forum 2016

Abstract In this work, a vision-based approach is used to build a dynamic hand gesture recognition system. Various challenges such as a complicated background, changes in illumination and occlusion make the detection and tracking of the hand difficult in any vision-based approach. To overcome these challenges, a hand detection technique is developed by combining three-frame differencing and skin filtering. The three-frame differencing is performed for both colored and grayscale frames. The hand is then tracked using a modified Kanade–Lucas–Tomasi feature tracker in which the features are selected using a compact criterion. Velocity and orientation information are added to remove redundant feature points. Finally, color cue information is used to locate the final hand region within the tracked region. During feature extraction, 44 features were selected from the existing literature. Using all the features could lead to overfitting, information redundancy and the curse of dimensionality. Thus, an optimal feature set was selected using analysis of variance (ANOVA) combined with incremental feature selection (IFS). The selected features were then fed as input to ANN, SVM and kNN models. These individual classifiers were combined to produce a classifier fusion model. Fivefold cross-validation has been used to evaluate the performance of the proposed model. Based on the experimental results, it may be concluded that classifier fusion provides satisfactory results (92.23 %) compared to the individual classifiers. A one-way analysis of variance test, Friedman's test and the Kruskal–Wallis test have also been conducted to validate the statistical significance of the results.

Keywords Human–computer interaction · Hand gesture recognition · KLT · ANOVA · IFS

Correspondence: Joyeeta Singha, [email protected]
1 Department of ECE, NIT Silchar, Silchar, India

1 Introduction

Nonverbal communication, which includes communication through body postures, hand gestures and facial expressions, makes up most of all communication among humans. Hand gestures are one of the most common forms of communication, both from human to human and from human to machine. Hand gestures carry specific linguistic content, whereas other forms of communication convey a general emotional state. Due to their speed, simplicity and naturalness, hand gestures have been widely used in sign languages and human–computer interaction systems [1]. Hand gesture recognition enables humans to interact with computers in a more natural and effective way. The hand gesture recognition systems available in the literature have found successful applications in computer games, sign-to-text translation systems, sign language communication [2, 3], robotics [4] and video-based surveillance.

Hand gestures may be static or dynamic. In static gesture recognition, the hand shape, the size of the palm, and the length and width of the fingers need to be considered [5]. Dynamic hand gestures need spatiotemporal information to track the hand [6]. Hand detection and tracking is the initial step in any hand gesture recognition system. Comaniciu et al. proposed a model to track the hand using a color histogram [7]. They used the color histogram of the detected hand as the mean shift input to locate and track the hand in the video sequence. The drawback of the model was that it was unable to detect the hand when the background had a color similar to that of the object. Similarly, Chai et al. [8] and Wang et al. [9] used skin-color information to detect the hand; the YCbCr color space model was used for segmentation.
Guo et al. [10] proposed a hand tracking system using skin filtering, pixel-based hierarchical features for AdaBoost, and codebook background cancelation, but the background has to be known a priori. CamShift, an improved version of the mean shift algorithm, has been widely used for tracking objects [11]. This algorithm has been found to track the hand efficiently in a simple background scene, but it cannot give the same result when the target is occluded by other skin-colored objects. Shi and Tomasi [12] selected corner points with high intensity variation as the features to track the target object. Though good tracking results have been observed, the number of feature points keeps decreasing over succeeding video frames. This happens due to changes in illumination or in the appearance of the hand. Asaari et al. [13] integrated an adaptive Kalman filter and eigenhands to track the hand in different challenging environments, but the algorithm fails in the presence of large-scale variations and pose changes. Kolsch and Turk [14] introduced a KLT tracker-based hand tracking algorithm; this tracker fails when there is a shape transformation of the hand. Nowadays, many depth-based hand detection methods [15] have been reported, but 3D gesture interaction is not very user friendly.

The contributions of our paper are as follows. Firstly, a database has been developed using the bare hand, namely the 'NITS hand gesture database IV,' for 40 classes of gestures (10 numerals, 26 alphabets and 4 arithmetic operators). This database has also been made publicly available. Secondly, a new hand detection scheme has been developed by combining three-frame differencing and skin filtering. The hand is then tracked using a modified KLT algorithm, which adds a compact criterion and velocity and direction information to the traditional KLT algorithm. Finally, color cues are used to detect the final hand region in every frame. Thirdly, ANOVA and IFS techniques are used to select the optimal features from the 44 existing features, and the effect of the different feature combinations produced by IFS on the individual classifiers ANN, SVM and kNN is studied. Lastly, a classifier fusion technique has been developed by combining the results of the individual classifiers ANN, SVM and kNN. Moreover, a one-way analysis of variance test, Friedman's test and the Kruskal–Wallis test were conducted to validate the statistical significance of the results.

The paper is organized as follows. The architecture of the proposed gesture recognition system, with details about each subsystem, is presented in Sect. 2. The different experimental results obtained during hand detection, hand tracking, feature selection and classification are discussed in Sect. 3. Finally, the paper is concluded in Sect. 4.

2 Proposed system

There are five phases in our proposed system: hand detection, hand tracking, feature extraction, feature selection and classification. Figure 1 shows the block diagram of the proposed hand gesture recognition system. The details of each phase are provided in the following subsections.

Fig. 1 Proposed system (input video → hand detection → hand tracking → feature extraction → feature selection → classification → output)
2.1 Hand detection

The first step in any hand gesture recognition system is the segmentation of the hand from the background. For this, the first three frames of the video sequence are considered. The system architecture for the detection of the hand is shown in Fig. 2, and the algorithm is presented as 'Algorithm 1.' This process includes three steps whose results are combined to obtain the desired hand:

• Face detection followed by skin filtering
• Three-frame differencing for colored frames
• Three-frame differencing for grayscale frames

Initially, the face of the gesticulating user (if present) is detected and removed from the second frame of the video using the Viola–Jones algorithm [16, 17]. After the face is removed, skin filtering [8] is performed to obtain the skin-colored objects in the frame. In parallel, three-frame differencing is performed with the first three frames; it is computed for both colored and grayscale frames. Morphological operations are carried out as shown in Fig. 2 in order to achieve the desired results. The results of the skin filtering and the three-frame differencing are then combined to extract the desired hand from the background. It has been observed, however, that along with the desired hand, other small skin-colored objects remain in the surroundings. Thus, the largest connected binary object is separated from the other objects and is considered to be the desired hand.

Fig. 2 System architecture of the hand detection
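To make this stage concrete, the following is a minimal sketch of the detection pipeline in Python with OpenCV. The paper's implementation was in MATLAB, so the library calls, the differencing thresholds, the morphology kernel size and the YCbCr skin bounds below are illustrative assumptions rather than the authors' exact settings:

```python
import cv2
import numpy as np

def detect_hand(f1, f2, f3):
    """Sketch of Sect. 2.1: skin filtering combined with three-frame
    differencing on the first three BGR frames f1, f2, f3."""
    # Face detection and removal (Viola-Jones) on the second frame
    gray2 = cv2.cvtColor(f2, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    frame = f2.copy()
    for (x, y, w, h) in cascade.detectMultiScale(gray2, 1.3, 5):
        frame[y:y + h, x:x + w] = 0                 # blank out the face

    # Skin filtering in YCbCr (typical Cb/Cr bounds, an assumption)
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))

    # Three-frame differencing on grayscale frames
    g1, g2, g3 = (cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in (f1, f2, f3))
    motion_gray = cv2.bitwise_and(
        cv2.threshold(cv2.absdiff(g2, g1), 15, 255, cv2.THRESH_BINARY)[1],
        cv2.threshold(cv2.absdiff(g3, g2), 15, 255, cv2.THRESH_BINARY)[1])

    # Three-frame differencing on colored frames (max over channels)
    c1 = cv2.absdiff(f2, f1).max(axis=2)
    c2 = cv2.absdiff(f3, f2).max(axis=2)
    motion_color = cv2.bitwise_and(
        cv2.threshold(c1, 15, 255, cv2.THRESH_BINARY)[1],
        cv2.threshold(c2, 15, 255, cv2.THRESH_BINARY)[1])

    # Combine the cues (one plausible combination) and clean up
    mask = skin & (motion_gray | motion_color)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))

    # Keep only the largest connected component as the hand
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n <= 1:
        return np.zeros_like(mask)
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return np.where(labels == largest, 255, 0).astype(np.uint8)
```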

2.2 Hand tracking

The detected hand is tracked in three steps: initialization of the tracking region, extraction of features from the initialized tracking region, and refinement of the tracking region, as shown in Fig. 3. The steps are presented in 'Algorithm 2.' The details of each step are discussed in the following subsections.

Fig. 3 System architecture of proposed hand tracking

2.2.1 Initialization of tracking region

In the traditional CamShift algorithm or KLT tracker, the initial tracking region needs to be selected manually. To make the system robust, however, automatic selection of the tracking region is necessary. In the proposed system, the initialization of the first tracking window has been made automatic by taking the detected hand as the initial tracking window.

2.2.2 Extraction of features from initialized tracking region

Selecting good features from the initial tracking window is very important. The feature points should satisfy three rules:

• They lie in the tracking region.
• They should not be spread far from each other over the hand.
• They should not be concentrated on a small part of the hand.

With the KLT feature tracker, the number of feature points keeps decreasing as the frames progress, because points are lost in the succeeding frames of the video. A time comes when there are no more features to be tracked, and thus tracking is lost. Moreover, if the hand is occluded by other skin-colored objects, the tracker gets confused: some of the features are lost, or features which do not belong to the hand are wrongly tracked. To minimize such difficulties, we use a compact criterion to select the optimal feature points so that the features are not too sparsely spread over the hand. This compact criterion is based on the centroid of the feature points; the traditional way of computing the centroid directly by averaging the positions of the feature points is very sensitive to outliers. The detailed steps of the compact criterion are provided below (a code sketch follows the list):

• The distance between each point and the remaining points is computed, and each point x_i is assigned the weight shown in Eq. (1):

  w_i = 1 / \sum_{k \neq i} \lVert x_k - x_i \rVert^2    (1)

• According to Eq. (1), if a feature point x_i is far from the other feature points, it receives a small weight; conversely, if x_i is near most of the other feature points, it receives a larger weight. After normalizing the weights so that \sum_i w_i = 1, the centroid is calculated as

  x_c = \sum_i w_i x_i    (2)

• In the final step of the compact criterion, a feature point x_i is retained only if \lVert x_c - x_i \rVert^2 < r_o, where r_o is a distance threshold. This criterion helps to eliminate redundant feature points and ensures that the feature points are concentrated around the centroid.
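Eqs. (1) and (2) translate directly into a few lines of NumPy. The sketch below illustrates the compact criterion; the threshold r_o is left as a caller-chosen tuning parameter, since its value is not specified in the text:

```python
import numpy as np

def compact_filter(points, r_o):
    """Compact criterion of Sect. 2.2.2 (Eqs. 1 and 2), as a sketch.
    points: (N, 2) array of KLT feature positions."""
    # Pairwise squared distances ||x_k - x_i||^2
    diff = points[:, None, :] - points[None, :, :]
    d2 = (diff ** 2).sum(axis=2)

    # Eq. (1): w_i = 1 / sum_{k != i} ||x_k - x_i||^2
    w = 1.0 / d2.sum(axis=1)            # diagonal terms are zero
    w /= w.sum()                        # normalize so that sum_i w_i = 1

    # Eq. (2): outlier-robust centroid x_c = sum_i w_i x_i
    xc = (w[:, None] * points).sum(axis=0)

    # Keep only the points concentrated around the centroid
    keep = ((points - xc) ** 2).sum(axis=1) < r_o
    return points[keep], xc
```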

In the next step, the system is checked for the presence of skin-colored static objects in the background, which could be confused with the hand. Thus, velocity information is added to remove the feature points corresponding to static objects: feature points whose velocity is less than a threshold value, (2 * Vavg)/3, are rejected, where Vavg is the average velocity. After this, orientation information is used to detect the direction along which the feature points are moving in consecutive frames. Figure 4a shows an example where the direction of orientation is shown with yellow lines; the red points correspond to the feature points in the previous frame, and the yellow points to the feature points tracked in the current frame. Firstly, the 360° range is split into 8 bins, as shown in Fig. 4b. Then, the velocity orientations of all feature movements are calculated. Figure 4c shows the histogram of the directions of orientation. Finally, the bin containing the most feature points is considered the main orientation. Feature points moving in a direction different from that of the majority are rejected for tracking in the next frame. The above steps help to remove the redundant feature points. The red bounding box in Fig. 4a shows the final tracked hand region, whereas the green bounding box shows the hand region detected without the orientation feature.

Fig. 4 a Tracking result, b 8-bin segments of orientation and c result histogram
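A compact sketch of the velocity and orientation pruning described above, assuming matched point sets from a KLT-style tracker; the (2 * Vavg)/3 threshold comes from the text, while the handling of edge cases is an implementation assumption:

```python
import numpy as np

def prune_by_motion(prev_pts, curr_pts):
    """Sketch of the velocity/orientation pruning of Sect. 2.2.2.
    prev_pts, curr_pts: (N, 2) matched feature positions in the
    previous and the current frame."""
    v = curr_pts - prev_pts
    speed = np.hypot(v[:, 0], v[:, 1])

    # Velocity rule: reject near-static points, threshold (2 * Vavg) / 3
    moving = speed >= (2.0 * speed.mean()) / 3.0

    # Orientation rule: split 360 degrees into 8 bins of 45 degrees and
    # keep the points whose motion direction lies in the dominant bin
    ang = np.degrees(np.arctan2(v[:, 1], v[:, 0])) % 360.0
    bins = (ang // 45).astype(int)
    main = np.bincount(bins[moving], minlength=8).argmax()
    return curr_pts[moving & (bins == main)]
```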

2.2.3 Refining tracking region

The bounding box calculated in the former step cannot represent the hand region precisely. Although the compact criterion selects appropriate feature points, sometimes the features are not uniformly distributed over the hand region, and occasional tracking failures make the tracking result unreliable. The CamShift algorithm can maximize the probability of the skin region within the tracking window in a few iterations, which makes the tracking process more stable. Finally, the features are regenerated every 30 frames and the above process is repeated, so as to avoid loss of information from the feature points.

2.3 Trajectory smoothening

The trajectory of the gesture is obtained by joining the centroid points of the tracked region at every video frame. This gesture trajectory is generally noisy because of factors such as hand movement, so it must be smoothened before further processing. The Douglas–Peucker algorithm [18] is applied to smoothen the gesture trajectory. The trajectory smoothened using the Douglas–Peucker algorithm showed better results compared with the smoothening processes used by Bhuyan et al. [19] and Singha et al. [20, 21].

The self co-articulated strokes were detected after the smoothening of the gesture trajectory and removed from the gesture using the same steps as in our previous paper [20]. These strokes are removed at this stage because these hand movements are not part of the gesture.
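The Douglas–Peucker smoothing used above admits a compact recursive implementation: a point is kept only if it deviates from the chord of its segment by more than a tolerance. A self-contained sketch (the tolerance eps is a tuning parameter whose value the paper does not report):

```python
import numpy as np

def douglas_peucker(traj, eps):
    """Simplify a gesture trajectory with the Douglas-Peucker
    algorithm [18]. traj: (N, 2) array of tracked centroid points;
    returns the smoothed polyline."""
    if len(traj) < 3:
        return traj
    start, end = traj[0], traj[-1]
    chord = end - start
    length = np.hypot(chord[0], chord[1])
    rel = traj - start
    if length == 0.0:
        dists = np.hypot(rel[:, 0], rel[:, 1])   # degenerate chord
    else:
        # Perpendicular distance of each point from the chord
        dists = np.abs(chord[0] * rel[:, 1] - chord[1] * rel[:, 0]) / length
    idx = int(dists.argmax())
    if dists[idx] <= eps:
        return np.vstack([start, end])           # segment is nearly straight
    # Split at the farthest point and recurse on both halves
    left = douglas_peucker(traj[: idx + 1], eps)
    right = douglas_peucker(traj[idx:], eps)
    return np.vstack([left[:-1], right])
```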
2.4 Extraction of features

For matching the trajectory, 44 features were considered in this paper, drawn from different works in the literature [19, 20, 22–27]. The total set of features used in our system is presented in Table 1.

Table 1 Description of features used

Existing features                        Feature description
Rubine's features [25]                   Cosine and sine of the initial angle with respect to the x-axis; cosine and sine of the angle between the first and last point; length of the bounding box diagonal; angle of the bounding box; distance between the first and last point; total gesture length; total traversed angle; maximum speed squared; stroke duration
E-Rubine features [24]                   Number of stop points; distance from the start to the end point in relation to the diagonal; distance from the start to the center point in relation to the diagonal; direction of the first half of the stroke; direction of the second half of the stroke; angle between the first and second half of the stroke; total number of strokes; straightness; total distance between strokes; angle between strokes; stroke distance in relation to each other
Location feature [19, 26, 27]            Average distance from the center of the gesture trajectory to each trajectory point
Orientation feature [22, 26]             Motion chain code; orientation of the start hand; orientation of the end hand; number of significant curves in a gesture
Velocity feature [26]                    Average velocity
Acceleration feature [19]                Acceleration between consecutive trajectory points
Velocity profile [19]                    Number of maxima and number of minima in the velocity profile of a gesture
Position feature [20]                    Position of the start hand and the end hand using the 6 quadrants
Self co-articulated features [20]        Number of self co-articulations; orientation of self co-articulated strokes; position of start and end of self co-articulated strokes
Ratio feature [20]                       Ratio of the longest to the shortest distance from the center of the gesture to the trajectory points
Distance feature [20]                    Average distance from the start to the end of the gesture
Ellipse fitted orientation feature [23]  Orientation calculated for an ellipse fitted to every 6 consecutive trajectory points
Length of major axis [23]                Length of the major axis of each fitted ellipse
Position feature [23]                    Position of the start hand and the end hand using the 3 quadrants

2.5 Selection of optimal features

From the 44 features extracted in our system, the best set of features was selected using a feature selection technique based on ANOVA and IFS. This reduces overfitting and information redundancy in the model. The two-level feature selection technique is developed as explained in our previous paper [34] to obtain the optimal features; a sketch of this two-level procedure follows.
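The sketch below uses scikit-learn's ANOVA F-test as an assumed stand-in for the paper's MATLAB implementation; the kNN classifier and cross-validation settings here are illustrative only:

```python
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def anova_ifs(X, y, clf=None, alpha=0.05):
    """Sketch of Sect. 2.5: ANOVA F-value ranking followed by
    incremental feature selection (IFS). X: (samples, 44) feature
    matrix; y: gesture labels."""
    clf = clf or KNeighborsClassifier(n_neighbors=5)

    # Level 1: keep features significant at p < alpha, rank by F value
    F, p = f_classif(X, y)
    significant = np.where(p < alpha)[0]
    ranked = significant[np.argsort(F[significant])[::-1]]

    # Level 2 (IFS): grow the feature set in rank order and keep the
    # prefix maximizing fivefold cross-validation accuracy
    scores = [cross_val_score(clf, X[:, ranked[:k]], y, cv=5).mean()
              for k in range(1, len(ranked) + 1)]
    best_k = int(np.argmax(scores)) + 1
    return ranked[:best_k], scores
```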

2.6 Classification

2.6.1 ANN-based classification

The ANN architecture used for the proposed system has three layers: one input, one hidden and one output layer. The input layer consists of 40 neurons representing the 40 features, and the output layer has 40 neurons representing the 40 gesture classes. The network was trained with different numbers of hidden units: 50, 52, 54, 56 and 58. This helped to identify the network structure with the highest training accuracy. The weights were adjusted by the back-propagation algorithm. The optimum network achieved for our system was 40L-54N-40L. Testing was then performed using the fivefold cross-validation process.

2.6.2 SVM-based classification

Here, the dataset used in our system was trained with different kernel functions: linear, quadratic, polynomial and radial basis function. The kernel function which provided the best results during training was used for testing. After the training phase, testing was performed using the fivefold cross-validation process.

2.6.3 kNN-based classification

The system was trained with different values of k: 3, 5, 7 and 9. Odd values of k were selected so as to avoid tied votes. After the training phase, testing was performed using the fivefold cross-validation process.

2.6.4 Classifier fusion

The results of the individual classifiers ANN, SVM and kNN were combined to obtain the classifier fusion result, using a majority voting technique. It has been observed that combining the results of the individual classifiers gives results that are more desirable than the individual ones [28]. After the classifier fusion was performed on the training set, fivefold cross-validation was carried out. A sketch of the voting rule is given below.
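In this minimal sketch, the tie-break (falling back to the strongest individual classifier) is an assumption, since the paper does not state how three-way disagreements are resolved:

```python
import numpy as np

def fuse_predictions(pred_ann, pred_svm, pred_knn):
    """Majority-voting classifier fusion (Sect. 2.6.4), as a sketch.
    Each argument is a (N,) array of predicted labels for the same
    N test gestures."""
    votes = np.vstack([pred_ann, pred_svm, pred_knn])
    fused = pred_ann.copy()              # default for three-way ties:
    for j in range(votes.shape[1]):      # ANN, the strongest classifier
        vals, counts = np.unique(votes[:, j], return_counts=True)
        if counts.max() > 1:             # at least two classifiers agree
            fused[j] = vals[counts.argmax()]
    return fused
```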
3 Experimental results

For the experiments, we developed the 'NITS hand gesture database IV.' It consists of 40 gesture classes (10 numerals, 26 alphabets and 4 arithmetic operators) gesticulated by 20 users. A total of 9600 gestures were used as the training dataset and 2000 samples as the testing dataset. Some samples from the database are available at http://www.joyeetasingha26.wix.com/nits-database. The proposed system was tested on a Windows 8-based Intel Core i7 processor with 4 GB RAM, and all the experiments were performed using MATLAB R2013a. The users were asked to gesticulate keeping the following conditions in mind:

• The background should not contain moving skin-colored objects at the start of the video recording.
• The hand should already be at the start position of the gesture before the video recording starts.
• The palm of the hand should move in the same place for a few seconds so that the presence of the hand can be detected.
• The hand is then moved smoothly and slowly through the most prominent gesture positions.
• The hand is kept at the final gesture position for a few seconds to complete the gesture.

The performance of hand detection and tracking is presented in Sect. 3.1. The experimental results of optimal feature selection using ANOVA and IFS are provided in Sect. 3.2. Section 3.3 includes the results observed for the individual classifiers ANN, SVM and kNN, along with the results of the classifier fusion. Three statistical tests — the one-way analysis of variance, Friedman's test and the Kruskal–Wallis test — were performed to test the statistical significance of the classifier fusion. The stability of the algorithm under Gaussian noise is examined in Sect. 3.4. Finally, a comparison is provided in Sect. 3.5.

3.1 Performance of the hand detection and tracking

The objective analysis of the results obtained using the proposed algorithms for hand detection and tracking is provided in Figs. 5 and 6, respectively. The proposed hand detection technique is compared with existing techniques such as skin filtering [2] and two-frame differencing [13]; Fig. 5 shows this comparative analysis. The result obtained using the proposed hand tracking algorithm is compared with three other state-of-the-art object tracking algorithms: CamShift [29], KLT [14] and particle filter [30].

From Fig. 6, it can be seen that the CamShift algorithm fails if there is any overlap between the hand and other skin-colored objects in the surroundings. Feature points are lost at certain intervals of time when the KLT algorithm is used, whereas in the case of the particle filter, as the number of particles increases, the complexity of the algorithm increases, resulting in delay. The proposed tracking algorithm is able to handle all of the above problems during tracking.

Fig. 5 Comparison of proposed hand detection algorithm with existing techniques (green: skin filtering, yellow: two-frame differencing, red: proposed detector) (color figure online)

Fig. 6 Comparison of proposed tracking algorithm with existing techniques for gesture 'Five' (green: KLT, blue: CamShift, yellow: particle filter, red: proposed tracker) (color figure online)

3.2 Performance of the feature selection

The feature selection was performed as described in Sect. 2.5. ANOVA was used to find the F value of each feature in order to check its statistical significance. A total of 40 features were found to be statistically significant; these are marked with a star in Table 2. The 40 significant features are ranked in decreasing order of 'F' value in Table 3. The performances of the combinations of the 40 ranked features were then examined using the IFS technique. The accuracies obtained by combining the different feature sets are shown in Fig. 7. It can be observed from the IFS curve that combinations of 21, 18 and 26 features provide the maximum accuracy for ANN, SVM and kNN, respectively.

Fig. 7 IFS curve of the combination of features

Table 2 Total set of features used in the system

Feature set   Feature
f1–f13*       Rubine's features
f14–f24*      E-Rubine features
f25*          Location
f26–f29*      Orientation feature
f31–f32*      Position
f33–f36*      Self co-articulated features
f37*          Ratio feature
f40*          Length of major axis of the fitted ellipse
f41*          Position
f43–f44*      Velocity profile
f30           Velocity
f38           Distance feature
f39           Ellipse fitted orientation feature
f42           Acceleration

* p < 0.05 indicates statistical significance

Table 3 Features ranked in decreasing order of 'F' value

Rank  Feature   Rank  Feature   Rank  Feature   Rank  Feature
1     f34       11    f31       21    f7        31    f19
2     f35       12    f41       22    f13       32    f22
3     f36       13    f32       23    f17       33    f3
4     f33       14    f27       24    f14       34    f9
5     f40       15    f28       25    f23       35    f11
6     f20       16    f29       26    f24       36    f10
7     f25       17    f43       27    f1        37    f16
8     f6        18    f2        28    f4        38    f18
9     f8        19    f21       29    f12       39    f44
10    f26       20    f5        30    f15       40    f37
3.3 Performance of the classifiers

The results of the ANN are presented in Table 4. The parameters of the ANN, such as the number of hidden units and iterations, were varied, and the corresponding train and test accuracies were calculated. The highest train and test accuracies were observed for the network structure 40L-54N-40L (L and N correspond to linear and nonlinear layers) with 500 iterations. These network parameters were used for fivefold cross-validation, which was performed to determine the validity of the proposed model. The results obtained during the cross-validation process are presented in Table 5.

Table 4 Results using ANN

Expt#  Hidden units  Network structure  Iterations  Train accuracy (%)  Test accuracy (%)
1      50            40-50-40           100         84                  80
                                        500         86                  80
2      52            40-52-40           100         86                  80
                                        500         90                  82
3      54            40-54-40           100         86                  82
                                        500         90                  86
4      56            40-56-40           100         90                  84
                                        500         88                  80
5      58            40-58-40           100         86                  80
                                        500         90                  80

Table 5 Fivefold cross-validation result using ANN classifier

Expt#             Accuracy (%)
Subset 1          90.67
Subset 2          89.11
Subset 3          86.23
Subset 4          93.60
Subset 5          93.29
Overall accuracy  90.58

The SVM classifier was tested with different kernel functions. The results of the evaluation are provided in Table 6. The highest train and test accuracies were observed for the radial basis function (rbf) kernel. The fivefold cross-validation process was performed using the 'rbf' kernel function, and the results are provided in Table 7.

Table 6 Results using SVM

Expt#  Kernel function        Train accuracy (%)  Test accuracy (%)
1      Linear                 86                  82
2      Quadratic              86                  80
3      Polynomial             88                  84
4      Radial basis function  90                  86

Table 7 Cross-validation results using SVM with 'rbf' kernel function

Expt#             Accuracy (%)
Subset 1          88.21
Subset 2          87.26
Subset 3          88.53
Subset 4          89.01
Subset 5          88.54
Overall accuracy  88.31

The system was also evaluated using the kNN classifier with different values of k. The results of the evaluation are provided in Table 8. The highest train and test accuracies were obtained for k = 5. The fivefold cross-validation results are listed in Table 9.

Table 8 Results using kNN

Expt#  k  Train accuracy (%)  Test accuracy (%)
1      3  76                  70
2      5  84                  80
3      7  78                  72
4      9  80                  74

Table 9 Cross-validation results using kNN with k = 5

Expt#             Accuracy (%)
Subset 1          88.92
Subset 2          85.00
Subset 3          86.20
Subset 4          90.31
Subset 5          88.67
Overall accuracy  87.82

The classifiers, with the suitable parameters obtained from the experiments above (Tables 4, 5, 6, 7, 8, 9), were used to generate a classifier fusion model. The fivefold cross-validation result obtained for this classifier fusion is provided in Table 10. Table 11 provides the summary of the final cross-validation results of the different classifiers.

Table 10 Cross-validation results using classifier fusion

Expt#             Accuracy (%)
Subset 1          94.27
Subset 2          92.00
Subset 3          92.30
Subset 4          91.10
Subset 5          91.48
Overall accuracy  92.23

Table 11 Comparison of success rate by different classifiers using fivefold cross-validation

Classifier         Success rate (%)  Computational time (s)
ANN                90.58             0.560
SVM                88.31             0.420
kNN                87.82             0.496
Classifier fusion  92.23             1.570

It can be concluded that classifier fusion provides an improvement in success rate of 1.65, 3.92 and 4.41 % compared with the baseline models ANN, SVM and kNN, respectively.

To test the statistical significance of the differences between the individual classifiers, we performed a one-way analysis of variance test [31], Friedman's test and the Kruskal–Wallis test on the results obtained from the fivefold cross-validation process. The results of the one-way analysis of variance test are given in Table 12. The null hypothesis H0 for this experiment is that the mean accuracies of all the classifiers are the same. It can be observed that F > F-crit and P < α, where α = 0.05. We may therefore conclude that there is a statistically significant difference between the accuracies of the classifiers. Similarly, the results of Friedman's test and the Kruskal–Wallis test are shown in Tables 13 and 14, respectively. In both tests, the chi-square statistic computed from our data exceeds the tabulated chi-square value, and the probability p shows that the result is statistically significant at the 0.1 significance level. Thus, we can say that the mean and column effects of the different classifiers differ, and not all the classifiers come from the same distribution. These tests can be reproduced from the per-fold accuracies, as sketched after Tables 12, 13 and 14 below.

Table 12 One-way analysis of variance test details

Source of variation  SS       df  MS       F     P value  F-crit
Columns              63.184   3   21.0615  5.26  0.0102   3.2389
Error                64.028   16  4.0017
Total                127.212  19

SS sum of squares, df degrees of freedom, MS mean squared error, F F-statistic

Table 13 Friedman's test details

Sources  SS    df  MS     Chi-sq  Prob > Chi-sq
Columns  16.6  3   5.533  9.96    0.0189
Error    8.4   12  0.70
Total    25    19

Chi-sq Chi square

Table 14 Kruskal–Wallis test details

Sources  SS     df  MS       Chi-sq  Prob > Chi-sq
Columns  368.6  3   122.867  10.53   0.0145
Error    296.4  16  18.525
Total    665    19
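A minimal sketch with scipy.stats, using the per-fold accuracies reported in Tables 5, 7, 9 and 10 (an assumed equivalent of the authors' MATLAB toolchain; it reproduces, e.g., F = 5.26 and chi-square = 9.96 above up to rounding):

```python
from scipy import stats

# Per-fold accuracies from Tables 5, 7, 9 and 10
ann = [90.67, 89.11, 86.23, 93.60, 93.29]
svm = [88.21, 87.26, 88.53, 89.01, 88.54]
knn = [88.92, 85.00, 86.20, 90.31, 88.67]
fusion = [94.27, 92.00, 92.30, 91.10, 91.48]

# One-way ANOVA (Table 12), Friedman's test (Table 13) and the
# Kruskal-Wallis test (Table 14) across the four classifiers
f_stat, p_anova = stats.f_oneway(ann, svm, knn, fusion)
chi_fr, p_fr = stats.friedmanchisquare(ann, svm, knn, fusion)
chi_kw, p_kw = stats.kruskal(ann, svm, knn, fusion)

print(f"ANOVA:          F = {f_stat:.2f}, p = {p_anova:.4f}")
print(f"Friedman:       chi2 = {chi_fr:.2f}, p = {p_fr:.4f}")
print(f"Kruskal-Wallis: chi2 = {chi_kw:.2f}, p = {p_kw:.4f}")
```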

Post hoc analysis was performed using Tukey's HSD test for the three tests (the one-way analysis of variance test, Friedman's test and the Kruskal–Wallis test) to identify the most significantly different classifier. Figure 8 shows that the classifier fusion is significantly different from the other classifiers.

Fig. 8 Post hoc analysis using Tukey's HSD test for a one-way analysis of variance test, b Friedman's test and c Kruskal–Wallis test (1-SVM, 2-kNN, 3-ANN, 4-fusion)

3.4 Performance of the system with noisy data

A set of experiments was conducted using noisy data. The gesture trajectories were made noisy by applying Gaussian white noise with signal-to-noise ratios of 30 and 40; the noise-injection step is sketched after the figure captions below. A few examples of the noisy data are shown in Fig. 9. For a moderate noise level (SNR = 40), the average accuracy over the 40 gestures was 90.5 %, which is comparable to the result with no noise (92.23 %), while for high noise (SNR = 30), the accuracy degraded to 83.26 %. The accuracy for the different sets of gestures at the different SNR values is shown in Fig. 10. The low accuracy is due to a large number of misclassifications of the gestures 'One,' 'Two,' 'Seven,' 'S,' 'Z' and 'Divide.' This may be attributed to the simplicity of these gestures, whereas the accuracy for the other gestures remains high, as they show a large number of variations when gesticulated by different users at different instants of time.

Fig. 9 Few samples of noisy data (each gesture shown with no noise, SNR = 40 and SNR = 30)

Fig. 10 Performance of the system for various gesture sets with varying SNR (30, 40, no noise)
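The noise-injection step itself is straightforward. A minimal sketch, assuming the standard power-based SNR definition (the paper does not spell out its exact procedure):

```python
import numpy as np

def add_awgn(traj, snr_db, rng=None):
    """Corrupt a gesture trajectory with Gaussian white noise at a
    target SNR in dB; Sect. 3.4 uses SNR = 40 (moderate) and
    SNR = 30 (high). traj: (N, 2) trajectory; returns a noisy copy."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(np.asarray(traj, dtype=float) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=np.shape(traj))
    return traj + noise
```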


3.5 Comparative analysis

A comparison has been made between the performance of the system with the proposed optimal feature set and with the features available in the literature [19, 20, 23]. The comparison is shown graphically in Fig. 11. Figure 11a–f corresponds to the accuracy of the different feature sets using different classifier models, namely CRF [23, 32], HCRF [33], ANN [20, 34], SVM [35], kNN [36] and classifier fusion, for numerals, alphabets, arithmetic operators, self co-articulated, non-self co-articulated and the total set of gestures, respectively. The performance of the proposed system with the optimal set of features has been observed to outperform the feature sets from the other literature, using classifier fusion, for every set of gestures. It has also been observed that the classifier fusion performs better than the individual classifiers for the different feature sets examined. Thus, we conclude from the comparative analysis that the combination of optimal features and classifier fusion yields the highest overall accuracy of 92.23 %, as shown in Fig. 11f.

Fig. 11 Comparison of proposed features with existing features [19, 20, 23] using CRF, HCRF, ANN, SVM, kNN and classifier fusion models for gesture sets: a numerals, b alphabets, c arithmetic operators, d self co-articulated gestures, e non-self co-articulated gestures and f total set of gestures
4 Conclusion and future work

In this paper, we have developed a hand gesture recognition system in which 40 classes of gestures were considered. Since a database for 40 gesture classes was not available in the literature, we developed the 'NITS hand gesture database IV.' The proposed system can be used for developing a gesture-controlled hexadecimal keyboard, making human–computer interaction easier. A total of 44 features were selected from the existing literature. An ANOVA test was performed to check the statistical significance of the 44 features. The 40 significant features were then arranged in decreasing order of F-statistic value and fed to the IFS to select the optimal features. The optimal number of features was observed to be 21, 18 and 26 for ANN, SVM and kNN, respectively. The results of the three individual classifiers were combined to provide the classifier fusion results. After this, fivefold cross-validation was used to obtain the overall accuracy of the system. The overall accuracy was observed to be 90.58, 88.31, 87.82 and 92.23 % using ANN, SVM, kNN and classifier fusion, respectively. To test the statistical significance of the differences between the individual classifiers, we performed a one-way analysis of variance test, Friedman's test and the Kruskal–Wallis test on the results obtained from the fivefold cross-validation process. It was observed from these tests that the classifier fusion is significantly different from the other classifiers.

Moreover, our system was tested on noisy data, where Gaussian noise with SNR = 30 and SNR = 40 was added to the gesture trajectories. We can observe from the results that low noise does not affect the system performance greatly, but when the system was subjected to high noise (SNR = 30), the performance degraded from 92.23 to 83.26 %. Similarly, the proposed model was compared with the existing features in the literature, and it was observed that our system provided better performance than the existing ones. However, the system was proposed for isolated gestures; it can be extended to continuous gestures in the future. Moreover, new features may be added in future work to those used in this paper to further enhance the performance of the system.

Acknowledgments The authors acknowledge the Speech and Image Processing Lab under the Department of ECE at National Institute of Technology Silchar, India, for providing all necessary facilities to carry out the research work.

References

1. Hasan H, Abdul-Kareem S (2014) Human–computer interaction using vision-based hand gesture recognition systems: a survey. Neural Comput Appl 25(2):251–261
2. Singha J, Das K (2013) Indian sign language recognition using eigen value weighted Euclidean distance based classification technique. Int J Adv Comput Sci Appl 4(2):188–195
3. Singha J, Das K (2013) Recognition of Indian sign language in live video. Int J Comput Appl 70(19):17–22
4. Badi HS, Hussein S (2014) Hand posture and gesture recognition technology. Neural Comput Appl 25(3–4):871–878
5. Badi H, HasanHussein S, Kareem SA (2014) Feature extraction and ML techniques for static gesture recognition. Neural Comput Appl 25(3–4):733–741
6. El-Baz AH, Tolba AS (2013) An efficient algorithm for 3D hand gesture recognition using combined neural classifiers. Neural Comput Appl 22(7–8):1477–1484
7. Comaniciu D, Ramesh V, Meer P (2003) Kernel-based object tracking. IEEE Trans Pattern Anal Mach Intell 25(5):234–240
8. Chai D, Ngan KN (1999) Face segmentation using skin-color map in videophone applications. IEEE Trans Circuits Syst Video Technol 9:551–564
9. Wang H, Chang S-F (1997) A highly efficient system for automatic face region detection in MPEG video. IEEE Trans Circuits Syst Video Technol 7:615–628
10. Guo JM, Liu YF, Chang CH (2012) Improved hand tracking system. IEEE Trans Circuits Syst Video Technol 22(5)
11. Bradski GR (1998) Computer vision face tracking as a component of a perceptual user interface. In: The workshop on applications of computer vision, Princeton, NJ, pp 214–219
12. Shi J, Tomasi C (1994) Good features to track. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 593–600
13. Asaari MSM, Rosdi BA, Suandi SA (2014) Adaptive Kalman filter incorporated eigenhand (AKFIE) for real-time hand tracking system. Multimed Tools Appl 70(3):1869–1898
14. Kolsch M, Turk M (2004) Fast 2D hand tracking with flocks of features and multi-cue integration. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshop, p 158
15. Yao Y, Fu Y (2014) Contour model-based hand-gesture recognition using the Kinect sensor. IEEE Trans Circuits Syst Video Technol 24(11):1935–1944
16. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 511–518
17. Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vis 57(2):137–154
18. Geetha M, Menon R, Jayan S, James R, Janardhan GVV (2011) Gesture recognition for American Sign Language with polygon approximation. In: IEEE international conference on technology for education, Tamil Nadu, India, July 2011, pp 241–245
19. Bhuyan MK, Ghosh D, Bora PK (2006) Feature extraction from 2D gesture trajectory in dynamic hand gesture recognition. In: Proceedings of the IEEE conference on cybernetics and intelligent systems, pp 1–6
20. Singha J, Laskar RH (2016) Self co-articulation detection and trajectory guided recognition for dynamic hand gestures. IET Comput Vis 10(2):143–152
21. Singha J, Laskar RH (2015) ANN-based hand gesture recognition using self co-articulated set of features. IETE J Res 61(6):597–608
22. Kao CY, Fahn CS (2011) A human-machine interaction technique: hand gesture recognition based on hidden Markov models with trajectory of hand motion. Proc Eng 15:3739–3743
23. Bhuyan MK, Kumar DA, MacDorman KF, Iwahori Y (2014) A novel set of features for continuous hand gesture recognition. J Multimodal User Interfaces 8(4):333–343
24. Signer B, Norrie MC, Kurmann U (2007) iGesture: a Java framework for the development and deployment of stroke-based online gesture recognition algorithms. Technical report TR561, ETH Zurich
25. Rubine D (1991) Specifying gestures by example. In: Proceedings of ACM SIGGRAPH '91, 18th international conference on computer graphics and interactive techniques, USA, pp 329–337
26. Xu D, Wu X, Chen YL, Xu Y (2014) Online dynamic gesture recognition for human robot interaction. J Intell Rob Syst 77(3–4):583–596
27. Lin J, Ding Y (2013) A temporal hand gesture recognition system based on hog and motion trajectory. Optik Int J Light Electron Opt 124(24):6795–6798
28. Sharkey AJC (1999) Combining artificial neural nets: ensemble and modular multi-net systems. Springer, London
29. Nadgeri SM, Sawarkar SD, Gawande AD (2010) Hand gesture recognition using Camshift algorithm. In: Proceedings of the third IEEE international conference on emerging trends in engineering and technology, Goa, pp 37–41
30. Shan C, Tan T, Wei Y (2007) Real-time hand tracking using a mean shift embedded particle filter. Pattern Recogn 40(7):1958–1970
31. Semwal VB, Mondal K, Nandi GC (2015) Robust and accurate feature selection for humanoid push recovery and classification: deep learning approach. Neural Comput Appl. doi:10.1007/s00521-015-2089-3
32. Yang HD, Sclaroff S, Lee SW (2009) Sign language spotting with a threshold model based on conditional random fields. IEEE Trans Pattern Anal Mach Intell 31(7):1264–1277
33. Quattoni A, Wang S, Morency LP, Collins M, Darrell T (2007) Hidden conditional random fields. IEEE Trans Pattern Anal Mach Intell 29(10):1848–1852
34. Bouchrika T, Zaied M, Jemai O, Amar CB (2014) Neural solutions to interact with computers by hand gesture recognition. Multimed Tools Appl 72(3):2949–2975
35. Dardas NH, Georganas ND (2011) Real-time hand gesture detection and recognition using bag-of-features and support vector machine techniques. IEEE Trans Instrum Meas 60(11):3592–3607
36. Dasarathy BV (1990) Nearest neighbor (NN) norms: NN pattern classification techniques. IEEE Computer Society Press, Los Alamitos, CA
