
Improved Estimation of Hand Postures Using

Depth Images

Dennis Hamester, Doreen Jirak and Stefan Wermter

University of Hamburg, Department of Informatics, Knowledge Technology


Vogt-Kölln-Straße 30, D - 22527 Hamburg, Germany
{7hameste,jirak,wermter}@informatik.uni-hamburg.de
http://www.informatik.uni-hamburg.de/WTM/

Abstract—Hand pose estimation is the task of deriving a hand's articulation from sensory input, here depth images in particular. A novel approach states pose estimation as an optimization problem: a high-dimensional hypothesis space is constructed from a hand model, in which particle swarms search for the best pose hypothesis. We propose several additions to this approach. Our extended hand model incorporates anatomical constraints of hand motion by applying principal component analysis (PCA). This allows us to treat pose estimation as a problem with variable dimensionality. The most important benefit becomes visible once our PCA-enhanced model is combined with biased particle swarms. Several experiments show that both the accuracy and the performance of pose estimation improve significantly.

I. INTRODUCTION

The human hand is highly articulated. Humans use their hands to manipulate objects in their surroundings and to communicate with other people. Capturing exact hand postures is an important step for human-robot interaction and the development of natural interfaces. Computer vision (CV) can provide cheap and unobtrusive solutions to this problem, especially compared to data gloves.

Solving CV-based hand pose estimation without markers in single-camera setups is very challenging, because hands can take on vastly different shapes in images. The number of degrees of freedom (DOFs) makes for a high-dimensional problem, which is further complicated by self-occlusions of the hand that inevitably occur during projection onto 2D images.

Following the taxonomy of Erol et al. [1], the approach discussed here belongs to the class of model-based tracking methods that follow a single hypothesis over time. In this context, single hypothesis means that only one satisfying solution is searched for and kept for the initialization of the next frame.

Significant progress in this area was made by Oikonomidis et al. [2]. They formulate pose estimation as an optimization problem. An internal hand model defines the parameters (DOFs) that make up a hand pose. This high-dimensional space is searched by a particle swarm for a suitable solution. As a particle moves, it renders an artificial depth image of its current hand pose hypothesis, which is compared to the actual observation from the Kinect. A target function measures the discrepancy between rendered image and observation.

The method of Oikonomidis et al. [2] deals very well with the high dimensionality and self-occlusions of the human hand. However, it is still computationally demanding: they report that their algorithm runs at about 15 FPS on a high-end PC, only half the rate at which the Kinect provides images. Our goal was to improve performance, possibly to the point of running in real time, without sacrificing any accuracy. We addressed this by exploiting biases in certain variants of particle swarms. We will show that the optimization behavior of these variants can be aligned with a priori knowledge about how humans perform hand motions. The result is an overall improved convergence behavior, leading to better pose estimates in less time.

The idea of using a priori information has already been applied successfully to hand pose estimation by Bianchi et al. [3]. They determined statistical properties of hand motion and used these to improve the noisy measurements of a low-cost data glove. Our method differs in the way a priori knowledge is used: we use it to transform the search space of all hand postures, such that certain variants of particle swarm optimization (PSO) perform better due to biases in their behavior. We also do not require an existing pose estimate.

This paper is organized as follows: Section II covers our image preprocessing, whose purpose is to segment images into hand and non-hand parts. Section III introduces our new hand model and how its parameter space is altered through principal component analysis. Particle swarm optimization and the target function mentioned above are covered in Section IV, where we also explain our motivation for using a PSO variant with certain biases. In Section V we detail our experiments with the new method and provide an evaluation of the data. A final discussion and an outlook on future research are given in Section VI.

II. HAND DETECTION & TRACKING

Detecting hands in images is a necessary step, because pose estimation cannot perform this segmentation by itself. We separate the task into two steps: first, an initial one-time detection of hands based on depth images and shape recognition; second, subsequent tracking of the hand region with an adaptive skin color model.

For the first step, we restrict detection to a specific hand posture with a distinctive shape: the hand has to be open and face the sensor, with the fingers spread out a little. We perform foreground segmentation on depth images to reduce the region of interest. After that, edge detection in the foreground depth image provides a set of candidate contours. To support classification and the ability to generalize, we represent contours with Fourier descriptors using 12 complex-valued coefficients¹. These provide desirable invariance properties against common affine transformations (e.g. scale or rotation), and the contour information is condensed into these 12 coefficients. Finally, soft-margin support vector machines are used to separate hands from non-hands.

Fig. 1. Hand model in different configurations. It consists of two types of objects: ellipsoids (in blue) and elliptical cylinders (in green, red, orange and yellow).

Based on the color enclosed by a hand contour, we learn the parameters of an elliptical boundary model (EBM) [4] of the skin color distribution. In all subsequent frames after successful detection by shape, this model is used to retrieve the hand.

This two-step scheme properly distinguishes between hands and other skin-colored objects in the scene. It comes at the cost of requiring a specific hand posture for detection, but this restriction is alleviated as soon as the distribution parameters of the skin color are learned.
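The invariance properties of the Fourier descriptors can be illustrated with a short sketch. This is not the authors' implementation; the contour sampling, the normalization scheme, and the test shape are our own assumptions.

```python
import numpy as np

def fourier_descriptors(contour, n_coeffs=12):
    """Illustrative Fourier descriptors of a closed 2D contour.

    contour: (N, 2) array of boundary points, ordered along the contour.
    Returns n_coeffs magnitudes that are invariant to translation,
    rotation, scale and the choice of starting point.
    """
    # Treat each boundary point (x, y) as the complex number x + iy.
    z = contour[:, 0] + 1j * contour[:, 1]
    coeffs = np.fft.fft(z)
    # Dropping the DC coefficient removes the translation component.
    mags = np.abs(coeffs[1:n_coeffs + 1])
    # Magnitudes discard phase (rotation / start point); dividing by the
    # first harmonic's magnitude removes scale.
    return mags / mags[0]

# A non-circular test shape: a unit circle perturbed by a third harmonic.
t = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
shape = np.stack([np.cos(t) + 0.3 * np.cos(3 * t),
                  np.sin(t) + 0.3 * np.sin(3 * t)], axis=1)

# The same shape scaled, rotated and translated.
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
moved = 3.0 * shape @ rot.T + np.array([5.0, -2.0])

d1 = fourier_descriptors(shape)
d2 = fourier_descriptors(moved)
print(np.allclose(d1, d2))  # True: descriptors match despite the transform
```

Because the descriptor vector is unchanged under scale, rotation and translation, a classifier such as the soft-margin SVM above only has to learn the shape itself.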

III. HAND MODEL

The hand model serves two purposes: first, it defines the parameters that make up the state of the hand; each parameter is one DOF of the model. Since our method requires synthetic hand images, the second purpose of the model is to define what a hand looks like. Apart from these two, constraints of human hand articulation will be discussed as a third property.

A. Shape

The geometrical detail of our hand model must be kept low, while still ensuring resemblance to a real human hand. The algorithm repeatedly renders depth images, which are then compared to real depth images from the Kinect. Even though rendering is accelerated with OpenGL, the complexity of the hand model has a huge impact on runtime performance.

The model is shown in Fig. 1. It is composed of two primitive objects: elliptical cylinders and ellipsoids. The main object of the palm, shown in green in Fig. 1, is an elliptical cylinder whose major semi-axis is significantly larger than its minor semi-axis. Two ellipsoids (blue) are placed on both ends of the cylinder to provide a smooth surface. Each finger consists of five objects: three cylinders and two spheres. The cylinders are shown in red, orange and yellow to emphasize the different phalanges. The spheres are placed between two adjacent phalanges to handle the discontinuities that occur when bending a finger. The thumb is modelled similarly, but has a large ellipsoid (blue) as its first object instead of a cylinder. We found this very effective at reproducing the skin deformation that occurs when the thumb is moved. Our model consists of 27 individual objects: 15 elliptical cylinders and 12 ellipsoids.

B. Degrees of Freedom

A set of joints is placed into the above model, allowing it to take on essentially any articulation of a human hand. Joints come in two variants: joints with two DOFs and those with just one DOF.

Fig. 2. Schematic joint model of a human hand. Joints are shown as blue dots. The CMC1 and MCP2-5 are modelled as 2-DOF joints. The other joints (MCP1, IP1, PIP2-5, DIP2-5) have only one DOF.

The joints used here describe a rotation in either one or two dimensions. For a 2-DOF joint, both axes of rotation are orthogonal to each other, and the reference point of the rotation is the same in both dimensions. This also implies that both axes must intersect, which is not necessarily true for real joints [5].

A schematic joint model of the human hand is depicted in Fig. 2. All DOFs together form a 20-dimensional parameter space, in which each point describes one particular posture. A special 6-DOF joint is placed at the center of mass of the palm, because we neither want to restrict hands to one specific location in space nor assume a fixed orientation. This joint represents the global position and orientation of the hand relative to the sensor. With its addition, the final parameter space has 26 dimensions.

C. Constraints

One particular goal was to take inter-dependencies between DOFs into consideration. Although each joint has as many DOFs as stated above, we are not able to control all of them independently. Lin et al. [6] and Wu et al. [7] state that 95% of the variance in hand articulations can be reduced to just 7 dimensions.

Lin et al. [6] further classify hand motion constraints into three types. The first type refers to static constraints, called range-of-motion values [5], for each individual DOF; they are usually expressed by two boundary angles that must not be exceeded. The second type refers to dynamic constraints caused by the anatomy of human hands. These can be intra- and inter-finger constraints. One example of an intra-finger constraint is the fact that the distal interphalangeal (DIP) and proximal interphalangeal (PIP) joints (refer to Fig. 2) can only be bent together [1], [6]. The third type comprises factors other than the anatomy, e.g. the smoothness of hand motion.

We do not use closed formulas to model constraints, except for a few common type 2 constraints. These are the ones mentioned above, concerning the relationship between PIP and DIP angles. By using the equation

    θ_DIP,i = (2/3) · θ_PIP,i,   2 ≤ i ≤ 5        (1)

the dimensionality is reduced by 4, down to 22.

We use PCA to further remove dimensions based on their significance. This also allows us to treat the dimensionality as a variable parameter instead of a predefined value.

Several videos were recorded with the Kinect to generate the necessary data for the PCA. All videos together contained just above 9200 frames, corresponding to about 5 minutes in total. Each video contained a sequence of mostly random finger motions, to capture as many hand postures as possible. Even though such motions are random in their articulation, they are still natural, in the sense that they are anatomically plausible but lack semantic meaning. The hand was neither moved nor rotated in any of the videos; the palm always faced the sensor. In none of the videos were external physical forces applied to the hand or fingers.

We used our hand model with 22 DOFs and estimated hand poses for each frame. The global position and orientation estimates were stripped from the resulting dataset and not considered for the PCA, since no meaningful correlation between global hand pose and finger articulation was expected. This left about 9200 estimates for the remaining 16 joint angles: the CMC1, MCP1-5, IP1 and PIP2-5 (refer to Fig. 2). The data had not been smoothed or filtered in any other way prior to the PCA.

Fig. 3. Covariance matrix of random hand poses. The DOFs are: 1,2=CMC1; 3,4=MCP2; 5,6=MCP3; 7,8=MCP4; 9,10=MCP5; 11=MCP1; 12=PIP2; 13=PIP3; 14=PIP4; 15=PIP5; 16=IP. Position, orientation and DIP joints are not included.

The covariance matrix of the dataset is shown in Fig. 3. It shows some relatively high covariance values outside the diagonal, which indicate the inter-dependencies between the DOFs. Interestingly, the DOFs that correspond to the abduction angles of the MCP joints (DOFs 4, 6, 8 and 10) do not show significant covariance values.

TABLE I. VARIANCES AND (CUMULATIVE) RATIOS OF RANDOM HAND POSES AFTER PCA.

    Component     1      2      3      4      5      6      7      8
    Variance    1.324  0.466  0.325  0.259  0.178  0.130  0.118  0.090
    Ratio       42.7%  15.0%  10.5%   8.3%   5.7%   4.2%   3.8%   2.9%
    Cum. ratio  42.7%  57.7%  68.2%  76.5%  82.2%  86.5%  90.3%  93.2%

    Component     9     10     11     12     13     14     15     16
    Variance    0.063  0.057  0.035  0.022  0.017  0.012  0.004  0.003
    Ratio        2.0%   1.8%   1.1%   0.7%   0.5%   0.4%   0.1%  0.08%
    Cum. ratio  95.2%  97.0%  98.1%  98.9%  99.4%  99.8%  99.9% 100.0%

The variances in Table I clearly indicate that much of the hand motion happens in only a few dimensions. The data is similar to Lin et al. [6] and Wu et al. [7], in that 95% of the variance is concentrated in the first 9 dimensions (in both publications only 7 dimensions were required). The first two dimensions account for more than 50% of the variance. Based on this data, we assume that some of the least significant dimensions are essentially just noise.

¹The specific descriptor length was chosen on the basis of experiments.

IV. PARTICLE SWARM OPTIMIZATION

Particle swarms were developed in 1995 by Kennedy and Eberhart [8] and have received great attention since then [9], [10]. The method originates from simulations of human social behavior, in which agents were placed in a two-dimensional space and moved through it in discrete time steps. The direction of movement was based on an attraction point, which Kennedy and Eberhart [8] called the cornfield vector, in analogy to bird flocks searching for food. The authors observed that all agents settled quickly on the attraction point, despite their random initialization. The result was the formulation of the original particle swarm optimization algorithm.

In a swarm, n simple entities, called particles, exist in a d-dimensional space. Each particle has a position p ∈ R^d and a velocity v ∈ R^d. As the particles move in discrete steps over time, they evaluate their own position p with a target function f (which will be detailed shortly). The goal is to minimize this function. After all particles have moved, their velocities are updated. The new velocity comprises two distinct components: a cognitive and a social component [11]. The cognitive component is the position p_c,i with the best target value that particle i has seen in the past; as such, it potentially differs between particles. The social component, on the other hand, is the globally best known position p_s, which is shared between all particles in the swarm. These two vectors take on the role of attraction points in the following formula for the velocity update:

    v_i ← χ [ v_i + U(0, φ_c) ⊗ (p_c,i − p_i) + U(0, φ_s) ⊗ (p_s − p_i) ]        (2)

U(a, b) is a vector of d random numbers, each uniformly distributed in the range given by its parameters, and ⊗ denotes component-wise multiplication. The parameters φ_c and φ_s control the influence of the cognitive and social component on the new velocity. The parameter χ is used to limit the velocities and avoid swarm explosion. With φ = φ_c + φ_s > 4, it can be computed as follows [12]:

    χ = 2 / (φ − 2 + √(φ² − 4φ))        (3)

For each particle i, the position is then updated by simply adding the velocity:

    p_i ← p_i + v_i        (4)

The target function we use here determines how closely a given hand pose hypothesis h ∈ R^26 matches the observed depth image d_o. Let x × y be the image size and d_h the depth image rendered from h using our hand model. Then the function

    f(h) = Σ_{v=1..y} Σ_{u=1..x} min(|d_o(u, v) − d_h(u, v)|, t)        (5)

iterates over both images and computes the sum of pixel-wise differences, thresholded at some value t. Areas in either image that do not contain hand pixels are marked with a value of 0. If we omitted the threshold, possibly few pixels could cause very high values. We experimentally determined t = 5 cm to work well.

This function was designed similarly to the one proposed by Oikonomidis et al. [2], but is radically simplified. Originally, two more components were included. The first was a term that tested whether a pixel is skin-colored or not. This would be redundant in our case, because all other pixels in d_o have already been filtered out during the tracking phase (Section II). The second term penalized physically implausible hand postures; more specifically, it considered the differences in abduction angles of the three adjacent finger pairs. In our experiments we observed that such a term actually hinders proper optimization. Most of the cases in which this happened had relative abductions close to zero between adjacent fingers (as in a stop gesture or a fist). We therefore removed this penalizing term.

Spears et al. [13] showed that drawing random numbers dimension by dimension in equation (2) causes several biases. They found that the bias is made up of two components: skew and spread. When a particle moves primarily parallel to an axis, the skew bias pushes it towards a diagonal of two or more axes. A particle that moves along a diagonal, on the other hand, is highly unstable and gets pushed back to a trajectory parallel to an axis by the spread bias. The biases appear regardless of PSO parameters such as swarm size, number of iterations and dimensionality. A particle swarm that updates velocities dimension by dimension is biased towards movement along axis parallels, even when the problem is rotationally symmetric. As a direct result, new PSO versions (like SPSO 2011 [11]) were developed to overcome the bias.

We deliberately propose to use a PSO with these biases. In our application, particles move through the parameter space of our hand model, in which each axis corresponds to one DOF. However, as detailed in Section III, this space is altered by a PCA. The PCA rotates the hand model's parameter space in such a way that the eigenvectors become the new coordinate axes. These new axes no longer represent just one DOF, but many. Specifically, each axis models a particular (linear) finger motion, possibly involving many DOFs, that was (with decreasing significance) noticeable in the sample data. Consider, for example, the motion between an open hand and a fist. In the original space, this requires changing many DOFs at the same time. If we further assume that this motion corresponds roughly to one of the principal components, it might involve changes in only a very limited subset of dimensions after the space has been rotated by the PCA. In this regard, the new parameter space is aligned with the way particles move in a biased PSO: the most significant changes in hand posture happen along parallels to coordinate axes, and less likely along a mix of many axes (diagonals).

The switch to a biased PSO in combination with PCA revealed another positive side effect during our experiments, besides the increase in accuracy. Oikonomidis et al. [2] were forced to randomly disturb the particles every few generations due to premature convergence ("swarm collapse"). At first, we observed the same behavior. However, after making the discussed changes, the swarm collapses disappeared. With our method, the swarm does not converge prematurely and is able to find satisfying solutions on its own.

V. EXPERIMENTS & EVALUATIONS

Our goal for the experiments and evaluations was to assess the differences between our pose estimation and that of Oikonomidis et al. [2]. We were primarily interested in measuring the possible accuracy gains through quantitative evaluation. We will also present some qualitative results at the end of this section.

Evaluating a hand pose estimation method is in itself not a trivial task, because ground-truth information is not available when working with real videos. To deal with this problem, we generated a test video, in part synthetically. The images that make up the video were completely rendered with our hand model, but the actual movement of the hand was authentic. We first recorded a real video of the desired hand motions and then ran the hand pose estimation on it. Gaussian filters were used to eliminate high-frequency noise in the hand pose sequence. This filtered sequence was then used in turn to render the synthetic video of the hand motions. As a last step, we applied noise and a discretization step to the video in order to mimic depth images from the Kinect.

The video contained random motions of the fingers and the thumb and lasted approximately 25 seconds. We tried to capture movement of all DOFs and to cover many possible articulations. The video did not contain notable movement of the whole hand in space, and the hand itself was not rotated; for the whole duration of the video, the palm faced the sensor. The mean distance between sensor and hand was roughly one meter.

Let π_i(x) be the projection of the vector x onto its i-th component and x_1, x_2 ∈ R^26 be hand poses. Then

    e_a(x_1, x_2) = (1/23) · Σ_{i=4..26} |π_i(x_1 − x_2)|        (6)

measures the discrepancy of all angles (given in degrees [°]) as the mean absolute difference. The first three components correspond to the hand location in 3D space, which was not considered for the evaluation. In contrast to Oikonomidis et al. [2], who derived the locations of phalanx endpoints and used them to measure accuracy, we chose to stay close to the actual representation of hand poses as a vector of mostly angles.

This evaluation did not consider PSO parameters other than the number of particles and generations. In particular, the effect of the cognitive and social factors in the particle velocity equation (2) was not analyzed. To conduct the experiments, we set the values to φ_c = 2.8 and φ_s = 1.3 [2], i.e. the constriction factor χ was 0.73 (equation 3). Most combinations of φ_c and φ_s perform well, as long as φ_c + φ_s = 4.1 holds true [14].

Fig. 4. Comparison of our method (blue, magenta) with Oikonomidis et al. [2] (red, green), each with 64 and 32 particles. A single measurement indicates the angle error (equation 6) averaged over all frames in the test video, plotted against the number of PSO generations.

Fig. 5. Dependency between the number of DOFs after PCA and the mean error (32 particles, 20 generations).

A. Direct Comparison

The vertical axis in Fig. 4 shows the mean absolute angle error e_a. A single measurement is the mean error over all frames of the entire video for the given PSO parameters. The original method shows a strong dependency on the number of generations: to keep the error below an average of 9°, at least 64 particles and 22 generations had to be used. For our method, on the other hand, 32 particles and 14 generations were already sufficient. In general, we observed much faster convergence after enabling the PCA and the biased PSO; the curves for our method are less steep in Fig. 4. This directly translates into improved performance, because less effort is required to stay below a certain maximum error. Using 64 particles and 25 generations has been suggested before [2]. We reached the same error at 32/18, which is roughly 2.8 times faster.

B. Dimensionality Reduction

The experiments above used our hand model with 22 DOFs; we applied the PCA but did not remove any dimensions afterwards. Figure 5 depicts the same experiment (32 particles, 20 generations) but with a varying number of dimensions. The lowest possible number of dimensions is 7, which includes 6 for the global pose and just one dimension for all joint angles. The data indicate that there was no benefit in removing dimensions: starting from the right, the mean error first stagnates and then starts rising. Thus, our method performs optimally when all dimensions are left in place. The biased PSO did not seem to be influenced negatively when insignificant dimensions were present.

C. Qualitative Results

When it comes to the visually perceived accuracy of our pose estimation, there were only minor discrepancies compared to the real hand posture. Figure 6 shows eight different postures alongside the model, articulated according to the estimation. Most errors stem from the thumb estimation. Particularly in Fig. 6(d), the thumb does not point away from the hand; it is actually inside the other fingers, which is also the case in Fig. 6(f). This happened quite often, because we did not perform collision detection or model any physical constraints. Figures 6(e) and (g) show postures with severe out-of-plane rotations that still resulted in proper estimates. We have found these kinds of postures to be especially problematic, because the hand occludes large parts of itself.

VI. DISCUSSION & FUTURE WORK

In this paper we presented an improved method for the problem of full-DOF hand pose estimation, based on the method by Oikonomidis et al. [2], which we extended to take a priori knowledge about hand motion into consideration. We achieved this by first applying a common relationship between the DIP and PIP joints, followed by a change of basis to eigenvectors. This way, biases in particle swarms can be exploited, leading to much improved convergence behavior. We performed several experiments with partially synthetic data to provide evidence for this claim. Other experiments revealed that our method retains its optimal accuracy when all dimensions are left in place.

We discussed our hand model from three different perspectives: shape, DOFs and constraints. Several similar works [2], [15], [16] focus almost exclusively on the shape of the model, while we put more emphasis on its DOFs. In the terminology of Lin et al. [6], only type 1 constraints have been imposed on joint angles in the relevant literature [2], [15], [16]. In this paper, constraints of types 2 and 3 were considered as well; some have been modelled with closed formulas, while the majority is included through the PCA. This also turned the dimensionality of the hand model into a parameter instead of a fixed value. The PCA played a major role in the improved properties of our method.
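To make the interplay between equations (2)–(4) and the PCA rotation concrete, the following sketch runs the biased (dimension-wise) velocity update on a toy quadratic target whose valley is not axis-aligned, and searches in its eigenbasis instead. Apart from φ_c = 2.8 and φ_s = 1.3, the toy target, swarm sizes and seed are our own assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Constriction coefficient, equation (3), with phi = phi_c + phi_s > 4.
phi_c, phi_s = 2.8, 1.3
phi = phi_c + phi_s
chi = 2.0 / (phi - 2.0 + np.sqrt(phi**2 - 4.0 * phi))  # about 0.73

def pso_minimize(f, dim, n_particles=32, n_generations=60):
    """Constricted PSO. Drawing U(0, phi) separately per dimension is
    exactly the biased variant analyzed by Spears et al. [13]."""
    pos = rng.uniform(-1.0, 1.0, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.apply_along_axis(f, 1, pos)
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(n_generations):
        u_c = rng.uniform(0.0, phi_c, pos.shape)  # per-dimension weights
        u_s = rng.uniform(0.0, phi_s, pos.shape)
        vel = chi * (vel + u_c * (pbest - pos) + u_s * (gbest - pos))  # (2)
        pos = pos + vel                                                # (4)
        vals = np.apply_along_axis(f, 1, pos)
        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# Toy stand-in for the target function (5): a quadratic bowl whose
# principal axes are rotated against the coordinate axes.
A = np.array([[2.0, 1.5],
              [1.5, 2.0]])
optimum = np.array([0.4, -0.3])
f = lambda h: (h - optimum) @ A @ (h - optimum)

# Optimizing in the eigenbasis of A mimics the PCA rotation: the
# valley becomes axis-aligned, matching the swarm's axis-parallel bias.
_, eigvecs = np.linalg.eigh(A)
best_q, best_val = pso_minimize(lambda q: f(eigvecs @ q), dim=2)
print(best_val)
```

With the bowl aligned to the axes, the dimension-wise updates of equation (2) work with, rather than against, the geometry of the problem, which is the same effect the PCA-rotated hand parameter space produces for the swarm.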
Fig. 6. Eight hand postures, (a) to (h), and their estimation.

Biased particle swarms were the other key component in gaining accuracy. We recognized that these biases push particles onto trajectories parallel to coordinate axes, and how this relates to PCA. The particle swarm in our method does not converge prematurely. We were thus not forced to apply additional randomness to keep the swarm alive, as Oikonomidis et al. [2] did.

A. Contributions

Our method maintains the same level of accuracy as before [2], but is about 2.8 times faster. If a 16% increase in estimation errors is acceptable, our method is able to run five times faster, exceeding the 30 Hz framerate of the Kinect. We expect that even more performance is achievable with our method when the set of possible hand postures is constrained by specific applications. Our idea of combining a biased PSO with PCA provides a very flexible way of incorporating a priori knowledge.

We also gave a working example of how biased PSO algorithms can be exploited and showed that the results can be significant. This might prove useful to many more applications of PSO, because it is not specific to pose estimation.

B. Future Work

Despite our improvements, the approach is still computationally demanding. Most of the time is spent rendering depth images. For future work, we would like to explore ways of shifting some of the effort to an offline learning phase. This might be done by pre-rendering a subset of hand postures and using a suitable interpolation method.

We also plan to conduct additional experiments to identify circumstances under which the algorithm fails. First experiments indicate that out-of-plane rotations of the palm require significantly more effort for a proper pose estimation. These rotations are characterized by the palm not being aligned with the sensor image plane, as in Figs. 6(e) and (g).

ACKNOWLEDGMENT

The authors would like to thank Johannes Bauer and Sven Magg for providing valuable suggestions when writing this paper. We would also like to thank the anonymous reviewers for their valuable comments on our paper.

REFERENCES

[1] A. Erol, G. Bebis, M. Nicolescu, R. D. Boyle, and X. Twombly, "Vision-based hand pose estimation: A review," Computer Vision and Image Understanding, vol. 108, pp. 52–73, 2007.
[2] I. Oikonomidis, N. Kyriazis, and A. A. Argyros, "Efficient model-based 3D tracking of hand articulations using Kinect," in Proc. British Machine Vision Conference, 2011, pp. 101.1–101.11.
[3] M. Bianchi, P. Salaris, and A. Bicchi, "Synergy-based hand pose sensing: Reconstruction enhancement," The Int. Journal of Robotics Research, vol. 32, no. 4, pp. 396–406, 2013.
[4] J. Y. Lee and S. I. Yoo, "An elliptical boundary model for skin color detection," in Int. Conf. on Imaging Science, Systems, and Technology, 2002.
[5] G. Stillfried and P. van der Smagt, "Movement model of a human hand based on magnetic resonance imaging (MRI)," in Int. Conf. on Applied Bionics and Biomechanics, 2010.
[6] J. Lin, Y. Wu, and T. S. Huang, "Modeling the constraints of human hand motion," in Proc. Workshop on Human Motion, 2000, pp. 121–126.
[7] Y. Wu, J. Y. Lin, and T. S. Huang, "Capturing natural hand articulation," in IEEE Int. Conf. on Computer Vision, vol. 2, 2001, pp. 426–432.
[8] J. Kennedy and R. Eberhart, "Particle swarm optimization," in IEEE Int. Conf. on Neural Networks, vol. 4, 1995, pp. 1942–1948.
[9] R. Poli, J. Kennedy, and T. Blackwell, "Particle swarm optimization - an overview," Swarm Intelligence, vol. 1, pp. 33–57, 2007.
[10] R. Poli, "Analysis of the publications on the applications of particle swarm optimisation," Journal of Artificial Evolution and Applications, vol. 2008, pp. 3:1–3:10, 2008.
[11] S. S. Pace, A. Cain, and C. J. Woodward, "A consolidated model of particle swarm optimisation variants," in IEEE Congress on Evolutionary Computation, 2012, pp. 1–8.
[12] M. Clerc and J. Kennedy, "The particle swarm - explosion, stability, and convergence in a multidimensional complex space," IEEE Trans. Evol. Comput., vol. 6, no. 1, pp. 58–73, 2002.
[13] W. M. Spears, D. Green, and D. F. Spears, "Biases in particle swarm optimization," Int. Journal of Swarm Intelligence Research, vol. 1, no. 2, pp. 34–57, 2010.
[14] I. Oikonomidis, N. Kyriazis, and A. A. Argyros, "Tracking the articulated motion of two strongly interacting hands," in IEEE Conf. on Computer Vision and Pattern Recognition, 2012, pp. 1862–1869.
[15] H. Hamer, K. Schindler, E. Koller-Meier, and L. Van Gool, "Tracking a hand manipulating an object," in IEEE Int. Conf. on Computer Vision, 2009, pp. 1475–1482.
[16] C. Keskin, F. Kıraç, Y. E. Kara, and L. Akarun, "Real time hand pose estimation using depth sensors," in Consumer Depth Cameras for Computer Vision, ser. Advances in Computer Vision and Pattern Recognition. Springer London, 2013, pp. 119–137.
