
Using Complex-Valued Levenberg-Marquardt Algorithm for Learning and Recognizing Various Hand Gestures

Abdul Rahman Hafiz, Md Faijul Amin and Kazuyuki Murase
Department of Human and Artificial Intelligence System,
Graduate School of Engineering,
University of Fukui, JAPAN
Email: [email protected]

Abstract—With the advancement of technology, complex-valued data arise in many practical applications, especially in signal and image processing. In this paper, we introduce a new application by generating a complex-valued dataset that represents various hand gestures in the complex domain. The system consists of three components: real-time hand tracking, hand-skeleton construction, and hand gesture recognition. A complex-valued neural network (CVNN) having one hidden layer and trained with the Complex Levenberg-Marquardt (CLM) algorithm has been used to recognize 26 different gestures that represent the English alphabet. The results show that the CLM provides reasonable recognition performance. In addition, a comparison among different activation functions is presented.

I. INTRODUCTION

Recent technological advancements in the human-computer interaction field have shown that conventional tools, such as the keyboard, mouse, and light pen, do not provide a natural form of interaction. Even though those tools were the standard forms of input for many decades, the ubiquity of digital systems has revealed the urgent need for a more accessible interaction method that can be used by anyone, regardless of their educational background. Since the hand has always been the natural means of interaction among humans, a recent resurgence in developing new hand modeling techniques has been observed. Regardless of the technique used, the main goals have always been the same: using descriptive gestures while keeping the computer processing and modeling as simple as possible.

Fig. 1. The left image is a human hand after applying an edge detection algorithm, while the image on the right shows the branches produced after applying a thinning algorithm to the image.

The system can be applied with various backgrounds, changeable environment lighting, and different human skin colors. To achieve that, we construct a simple representation of the human hand (hand-skeleton) after applying edge detection and a thinning algorithm (Fig. 1) to the input image; we then define gestures for each English character [1].

Recently, complex-valued data have been used in many applications, such as array signal processing [2], radar and magnetic resonance data processing [3], [4], communication systems [5], signal representation in the complex baseband [6], and processing data in the frequency domain [3].

In our approach, the complex-valued data that represent a hand gesture are obtained by applying a sequence of filters to the image captured by the Kinect camera [7]. We used a three-layer complex-valued neural network (CVNN) and the Complex Levenberg-Marquardt (CLM) algorithm [8] for training, due to the nature of the data that we can collect from the generated hand-skeleton representation. We investigate the recognition performance with respect to various activation functions in the hidden layer. The output layer, however, uses a recently proposed activation function [11] that helps an output neuron behave like a discriminative function.

The remainder of the paper is organized as follows. Section II discusses the procedures to generate the hand-skeleton structure. Section III introduces the CLM algorithm and various activation functions. Computer simulation results are discussed in Section IV. Finally, concluding remarks are given in Section V.

II. PROCEDURES

In this research, we utilized the camera of a Microsoft Kinect motion-sensing input device [7], accompanied by the OpenCV platform, which has the computational capabilities required for real-time image acquisition and handling.
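As a minimal illustration of the acquisition stage (this is not the authors' code; the function names below are hypothetical, and the Kinect/OpenCV frame source is abstracted behind a plain callable so the control flow is self-contained):

```python
from typing import Callable, List, Optional

Frame = List[List[int]]  # a grayscale frame as rows of pixel intensities

def acquisition_loop(grab: Callable[[], Optional[Frame]],
                     process: Callable[[Frame], object],
                     max_frames: int) -> list:
    """Grab frames until the source is exhausted, passing each to `process`.

    In the real system `grab` would wrap the Kinect video input (e.g. an
    OpenCV capture object) and `process` would feed the hand-location and
    image-processing modules; both are injected here as plain callables.
    """
    results = []
    for _ in range(max_frames):
        frame = grab()
        if frame is None:          # source closed or no frame available
            break
        results.append(process(frame))
    return results

# Usage with a fake two-frame source standing in for the camera:
fake_frames = iter([[[0, 255]], [[255, 0]]])
out = acquisition_loop(lambda: next(fake_frames, None),
                       lambda f: len(f[0]), max_frames=10)
```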
Fig. 2 presents a modular view of the final system. The image acquired by the Video Input Module is passed to both the Hand Location Module and the Image Processing Module. While the Hand Location Module is responsible for detecting the location of the hand within the image, the Image Processing Module processes the area that has been previously detected by the Hand Location Module. The output of the Image Processing Module is then passed to the Hand-Skeleton Construction Module, which is responsible for creating a skeleton model from the image of the hand. This module has two outputs: the skeleton model of the hand, and supplementary data passed back to the Hand Location Module to increase the location detection accuracy. The skeleton model is then passed to the CLM, where the actual recognition takes place. The following subsections describe each stage of the system in detail.

Fig. 2. Human hand gesture recognition system, showing its modules (Input Frames, Hand Location, Image Processing, Hand-Skeleton Construction, and Recognition) and the connections between them.

A. Tracking and Detection

The first step involves separating the image of the hand from the rest of the image. To do that, we used the Kinect's depth map to wipe out the background of the image. As we can see in Fig. 3, only the silhouette of the human body was extracted from the image, disposing of any other unneeded objects.

Fig. 3. Simple illustration of background deletion: the image on the left is the source image, the image in the middle is the depth map that the Kinect camera provides, and the image on the right is the result after deleting the background.

Next, we removed the regions not having the color of human skin. The resulting image contains the locations of the human hand and face. We then used the HSL representation of color to identify human skin, since the HSL representation is known to identify the color of human skin more accurately than the RGB representation [12]. The last step involved separating the image of the hand from the image of the face (Fig. 4).

Fig. 4. The result after skin detection, showing the human hand in two states: on the left, with the fingers together; on the right, with the fingers apart.

Although the hues of the hand's color and the face's color are different, the difference is too small to be considered reliable by itself. Accordingly, we had to support it with another source of information. When the system is initialized, it depends on the motion of the hand to distinguish it from the face; afterwards, the system tracks the location of the hand using feedback from the hand-skeleton construction part of the system.

From Fig. 4, we can notice that when the fingers are close to each other, we might lose some information about the state of each finger. To compensate for that problem, we used a sequence of image processing algorithms to aid the correct recognition of the fingers' states, as described in the next section.

B. Image Processing

After locating the human hand in the image, the system filters the region where the hand is located, as shown in Fig. 5.

Fig. 5. The image filters applied in real time to the hand region (edge detection, thresholding, dilation, and thinning), producing connected branches of lines that represent the fingers of the human hand.

First, the system applies the Sobel edge detection algorithm [10] to get a contour of the hand. This filter scans the image for sharp contrast differences and assigns a white color shade equivalent to the contrast in that region.
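To make the edge detection step concrete, here is a minimal pure-Python sketch of the Sobel operator followed by thresholding. The actual system uses the OpenCV implementation [9], [10]; this toy version operates on a small grayscale image given as nested lists and is for illustration only:

```python
import math

# 3x3 Sobel kernels for horizontal and vertical gradients [10].
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(img):
    """Gradient magnitude of a grayscale image (list of rows).

    Border pixels are left at zero; each interior pixel gets the
    Euclidean norm of the horizontal and vertical Sobel responses.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(GX[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(GY[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out

def threshold(mag, t):
    """Binarize the magnitude image: 255 where the edge response exceeds t."""
    return [[255 if v > t else 0 for v in row] for row in mag]

# A tiny image with a vertical contrast boundary down the middle:
img = [[0, 0, 255, 255]] * 4
edges = threshold(sobel_magnitude(img), t=100)
```

In the full pipeline the binary edge image would then be dilated and thinned, as described next.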

Next, by restricting the whiteness to a specific threshold, the system deletes any noisy edges, effectively creating a sharper edge representation. However, that step can produce disconnected regions in the edges of the hand, affecting the outcome of the thinning algorithm. To avoid that drawback, we used a dilation algorithm, resulting in a fully connected figure.

The dilated image is then passed to a thinning algorithm [8]. This algorithm generates one-pixel-wide lines representing the branching of the structure. The outcome of that step is a representation of the hand as interconnected lines meeting at multiple nodes.

The final step involves reading that representation. The system creates pairs of data for each branch by tracing the line that connects the nodes, calculating the length and the angle of these lines. The calculation method for the length of the line connecting two nodes and its relative angle is shown in Fig. 6.

Fig. 6. The left side of the figure is the hand-skeleton branches and nodes generated from the result of the thinning algorithm, for hands showing two, three, four, and five fingers. The right side is the pairs of data (branch lengths and angles θ1–θ8) that can be abstracted from these branches.

Fig. 6 shows the method used for converting a hand-skeleton model to a set of amplitude-phase pairs that can be processed by the CVNN. The conversion process involves the recursive measurement of the branches relative to the root branches. The complete conversion process consists of the following steps:
1) Locate the lower-most root branch.
2) Measure the angle between branches from the node at the end of the root branch and the extension of the root branch (θ1), and the length of that branch (r1).
3) Repeat step 2 until reaching branches that do not branch at their terminal nodes.
4) Input the parameters of the non-branching branches (terminal branches) to the CVNN.
5) When the number of terminal branches is not equal to 5 (that is, less than the number of human fingers), the remaining terminal branches are assigned zeroes for both phase and amplitude. For simplicity, the inputs follow a predefined pattern, as can be seen in Fig. 6.

III. COMPLEX-VALUED LEVENBERG-MARQUARDT (CLM) ALGORITHM FOR HAND GESTURE RECOGNITION

The image processing steps discussed above produce a complex vector x = [x1, x2, ..., x8]^T consisting of eight elements. Each element xi, 1 ≤ i ≤ 8, is a Cartesian representation of a segment of the hand skeleton, i.e., xi = ri e^{j θi} = ri cos(θi) + j ri sin(θi), where j = √−1. To classify the patterns represented by complex-valued feature vectors, we apply a feedforward complex-valued neural network (CVNN) with one hidden layer. The output layer uses an activation function proposed in [11] which can act as a discriminating function, giving a discrimination score. We call the function here discrim. The function has the following form:

f_{C→R}(z) = (f_R(u) − f_R(v))^2    (1)

where z = u + jv denotes the weighted sum of input signals along with the bias (called the net-input), and f_R(·) is a real-valued log-sigmoid function. The hidden layer, however, may take any activation function found in the literature of CVNNs. Recently, the CLM has been proposed in [8] as a fast learning algorithm for feedforward complex-valued neural networks. The CVNN in this study is trained with the CLM algorithm because of its faster convergence. Since we have a total of 26 different gestures, the output layer has 26 neurons, each representing one gesture.

In order to see the effect of hidden-layer activation functions on the hand gesture recognition problem, we investigate a number of complex activation functions listed below:

splitTanh [14]: f(z) = tanh(u) + j tanh(v)    (2)
splitSigm [14]: f(z) = 1/(1 + e^{−u}) + j/(1 + e^{−v})    (3)
linear: f(z) = z    (4)
George [13]: f(z) = z/(c + |z|/r), with c, r constants    (5)
tan [15]: f(z) = (e^{jz} − e^{−jz}) / (j(e^{jz} + e^{−jz}))    (6)
sin [15]: f(z) = (e^{jz} − e^{−jz}) / (2j)    (7)
atan [15]: f(z) = arctan z = ∫_0^z dt/(1 + t^2)    (8)
asin [15]: f(z) = arcsin z = ∫_0^z dt/(1 − t^2)^{1/2}    (9)
acos [15]: f(z) = arccos z = π/2 − ∫_0^z dt/(1 − t^2)^{1/2}    (10)
sinh [15]: f(z) = (e^z − e^{−z})/2    (11)
tanh [15]: f(z) = (e^z − e^{−z})/(e^z + e^{−z})    (12)
atanh [15]: f(z) = arctanh z = ∫_0^z dt/(1 − t^2)    (13)
asinh [15]: f(z) = arcsinh z = ∫_0^z dt/(1 + t^2)^{1/2}    (14)

IV. RESULTS

A pattern set with 26 hand gestures was collected. The data set comprises 520 patterns and was divided into a training set (50%), a validation set (25%), and a testing set (25%).
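Before turning to the results, the feature construction and activations of Section III can be sketched in a few lines of Python. This is a toy illustration under our own naming, not the authors' implementation; the weights below are arbitrary placeholders:

```python
import cmath
import math

def skeleton_to_features(segments, n=8):
    """Convert (length r_i, angle theta_i) branch pairs into the complex
    feature vector x_i = r_i * e^{j*theta_i}, zero-padded to n elements
    when fewer than n terminal branches were found (step 5 above)."""
    x = [r * cmath.exp(1j * theta) for r, theta in segments]
    return x + [0j] * (n - len(x))

def sigmoid(t):
    """Real-valued log-sigmoid f_R used by the output activation."""
    return 1.0 / (1.0 + math.exp(-t))

def split_tanh(z):
    """Eq. (2): split hyperbolic tangent applied to Re and Im parts."""
    return math.tanh(z.real) + 1j * math.tanh(z.imag)

def discrim(z):
    """Eq. (1): real-valued output activation (f_R(u) - f_R(v))^2,
    where z = u + jv is the net-input of an output neuron."""
    return (sigmoid(z.real) - sigmoid(z.imag)) ** 2

# Tiny forward pass: 8 complex inputs -> 1 hidden neuron -> 1 output score.
x = skeleton_to_features([(1.0, 0.0), (0.5, math.pi / 4)])
w_hidden = [0.1 + 0.1j] * 8            # arbitrary illustrative weights
h = split_tanh(sum(wi * xi for wi, xi in zip(w_hidden, x)))
score = discrim((0.5 - 0.2j) * h)      # one output neuron's score
```

In the full network there would be 26 such output neurons (one per gesture), and the complex weights would be fitted by the CLM algorithm rather than fixed by hand.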
We tested the system's robustness in simple data collection and noise deletion tasks; the system could work at 10 frames per second (fps) and could read the hand state 90% of the time [1].

For recognizing the English characters, we defined distinguishable hand gestures to represent each character. These gestures were chosen so that it would be easier for the system to recognize them. Consideration was also given to humans' natural ability to move from one gesture to another. Our algorithm detects the edges between the fingers even when the fingers are stuck together. This allowed us to design a simple representation for each character, as shown in Fig. 7.

Fig. 7. Hand gestures for the English characters a–z; the gestures differ in the number of fingers and the angles they make with each other.

We presented all the input data to the CLM and computed the outputs and the validation error. From Fig. 8 we can notice that the optimal number of neurons in the hidden layer was 4.

Fig. 8. Validation error (percentage) for different activation functions (splitTanh, sin, asin, George, discrim, linear) versus the number of neurons in the hidden layer (1 to 6).

The learning process was terminated when stopping criteria were met, such as the validation error increasing rather than decreasing.

Fig. 9. Mean squared training error over 50 iterations for different activation functions (splitTanh, sin, asin, George, discrim, linear).

Table I shows the classification error for the different activation functions, sorted from the smallest value. From the table we can notice that the split-type activation functions performed better for this problem.

TABLE I
CLASSIFICATION ERROR FOR DIFFERENT ACTIVATION FUNCTIONS

Activation Function | Classification Error (%)
splitTanh [14]      |  8.46
splitSigm [14]      | 11.29
linear [15]         | 13.59
asin [15]           | 15.65
tan [15]            | 16.41
discrim [11]        | 17.18
acos [15]           | 17.69
tanh [15]           | 17.95
George [13]         | 19.49
sin [15]            | 20.00
asinh [15]          | 20.26
sinh [15]           | 21.54
atan [15]           | 23.85
atanh [15]          | 28.97

V. CONCLUSION

In this paper, the CLM algorithm has been used in a hand gesture recognition system to distinguish 26 different gestures (the English alphabet). By using the Kinect depth map and human skin color, we could isolate the human hand from the rest of the image; we then used a sequence of image filters to generate a descriptive representation of the human hand, which we call the "Hand-Skeleton". This representation allows us to use the CLM algorithm for the learning and recognition stage. The results show that the CLM algorithm with the split-type activation functions achieves the highest recognition performance.

ACKNOWLEDGMENT

This study was supported by grants to K.M. from the Japanese Society for the Promotion of Sciences and Technology, and the University of Fukui.
REFERENCES
[1] A. Hafiz, Md.F. Amin, and K. Murase, Real-Time Hand Gesture Recognition Using Complex-Valued Neural Network (CVNN), 2011 International Conference on Neural Information Processing (ICONIP 2011), Shanghai, China, Nov. 2011.
[2] H.L. van Trees, Optimum Array Processing, Wiley Interscience, New York, 2002.
[3] A. Hirose, Complex-Valued Neural Networks, Springer, Heidelberg, 2006.
[4] V.D. Calhoun, T. Adali, G.D. Pearlson, P.C.M. van Zijl, and J.J. Pekar, Independent component analysis of fMRI data in the complex domain, Magnetic Resonance in Medicine, vol. 48, no. 1, pp. 180-192, 2002.
[5] G.L. Stuber, Principles of Mobile Communication, Kluwer, Boston, 2001.
[6] C.W. Helstrom, Elements of Signal Detection and Estimation, Prentice Hall, New Jersey, 1995.
[7] J. Shotton, T. Sharp, et al., Real-Time Human Pose Recognition in Parts from Single Depth Images, in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
[8] Md.F. Amin, I.A. Muhammad, Y.A.N. Ahmed, and K. Murase, Wirtinger Calculus Based Gradient Descent and Levenberg-Marquardt Learning Algorithms in Complex-Valued Neural Networks, 2011 International Conference on Neural Information Processing (ICONIP 2011), Shanghai, China, Nov. 2011.
[9] Intel Corporation, Open Source Computer Vision Library, Reference Manual, 1999-2001. Available: www.developer.intel.com
[10] I. Sobel and G. Feldman, A 3x3 Isotropic Gradient Operator for Image Processing, presented at the Stanford Artificial Intelligence Project, 1968 (unpublished but often cited).
[11] Md.F. Amin and K. Murase, Single-Layered Complex-Valued Neural Network for Real-Valued Classification Problems, Neurocomputing, vol. 72, pp. 945-955, 2009.
[12] J.B. Martinkauppi, M.N. Soriano, and M.H. Laaksonen, Behavior of skin color under varying illumination seen by different cameras at different color spaces, in Proc. SPIE, Machine Vision Applications in Industrial Inspection IX, San Jose, CA, Jan. 2001, pp. 102-113.
[13] G.M. Georgiou and C. Koutsougeras, Complex domain backpropagation, IEEE Transactions on Circuits and Systems II, vol. 39, no. 5, pp. 330-334, 1992.
[14] A. Hirose, Complex-Valued Neural Networks, Springer, pp. 1-160, 2006.
[15] T. Kim and T. Adali, Approximation by fully complex multilayer perceptrons, Neural Computation, vol. 15, no. 7, pp. 1641-1666, 2003.
