
Real-Time Recognition of the User’s Arm Gestures in 2D Space with a Smart Camera

Naji Guedri and Rached Gharbi

Gesture control technology is one of the most important technologies introduced today to facilitate human-machine communication. In this article, we propose a Smart Camera One (SCO) that remotely monitors a user’s arm movements in order to control a machine. Its main role is to act as an intermediary between the user and the machine and to facilitate communication. This instrument can measure and track the different arm positions in real time with an accuracy of up to 81.5%. The technique consists of learning to perform tasks live: the SCO is able to execute a user’s task immediately, without going through a machine learning phase based on demonstrations. During testing, the demonstrator is set up visually by the SCO, which can recognize users by their skin color and distinguish them from colorful backgrounds. This is based on image processing with an intelligent algorithm implemented in the SCO using the Python programming language and the OpenCV library, as described in this article. Although initially a specialized application, the SCO could be useful in several areas, from remote control of mobile robots to gaming, education, and even marketing. Extensive experiments demonstrate the effectiveness of the developed model, as shown in [1].

Current Trends and Applications

Most people can accomplish a task simply by watching another person perform it in front of them only once. In the world of human-machine communication, the situation remains complicated until a machine can imitate human behaviors and then effectively reproduce them. Gesture recognition technology is considered one of the most important and widely used technologies, because it can provide users with convenient information while they complete a task, facilitating communication with, and remote control of, any device [2]. Gesture recognition can be considered a way for computers to understand human body language, building a richer bridge between machines and humans than primitive text user interfaces or even graphical user interfaces, which still limit most input to keyboard and mouse interactions and do not allow natural interaction without mechanical devices [3].

There are two types of gestures in computer interfaces. The first type, offline gestures, are processed after the user interacts with the object. Robots can be taught how to perform tasks through self-learning mechanisms that allow them to simulate human behavior, but these learning mechanisms require expensive real-world data that are difficult to collect [4]. In the method proposed by Bonardi et al. [5], robots were successfully trained in simulation using algorithms known as Task-Embedded Control Networks (TecNets), without the need for real human demonstrations on the robot, so the method does not require any interaction with a human operator. The technique is to learn to accomplish a task from one or more demonstrations and then to produce actions for a variation of the same task, e.g., different motor speeds. With TecNets, tasks are hard to forget, and many tasks can be performed after being learned. During testing, the configuration of the demonstrator is captured visually by a camera, so the learning is directly applicable from a human demonstration. The camera acts like an eye, observing various actions and then recording the data needed for rehearsal. However, TecNets are unable to immediately perform a user’s task without going through the robot learning phase based on demonstrations.

In the second type of gestures, named direct action gestures, the concept of recognition and direct manipulation is used. Simple gestures allow users to control devices without physically touching them. Most often, it is facial or hand movements that make up gestures, more than any other physical movement [6]. This avoided having to touch interfaces such as smartphones, laptops, games, TVs, and music equipment during the COVID-19 pandemic [7],[8]. However, most body-based systems or handheld sensors use built-in sensors to capture moving positions; e.g., the disadvantage of data-glove-based systems is that the user must wear gloves to operate the system. Alrubayi et al. propose a pattern recognition model for static gestures [9],[10]. The dataset was captured using a DataGlove device, and the finger and wrist movements can be measured as described in [11], in which electromyography (EMG) signals from the human upper-limb muscles are the control interface between the user and the robotic arm.

Several methods have been developed to interpret sign language using computer vision algorithms and cameras; e.g., Lazarou et al. propose a novel shape-matching method for real-time hand gesture recognition using classical computer vision techniques [12]. Clarke et al. use a cup of tea as a remote control to select a favorite channel: the user moves the cup left or right until reaching the desired station, and sliders appear on the screen to adjust the volume or select a specific channel [13]. Wang et al. propose a gesture recognition model that uses a convolutional neural network with an infrared camera [14]. Furthermore, in the work of Yeduri et al., various hand movements were recognized using a thermal imaging camera [15]. Work in computer vision therefore continues to capture gestures, poses, and general human motion through cameras connected to computers [16],[17].

In this research, the idea is different from existing ones because we rely on the camera to detect the movement of the user’s limbs and to instantly recognize and repeat these gestures. If we install the SCO on a machine, the machine can immediately track these actions without learning them through demonstrations. In particular, the system can identify right-hand, left-hand, right-leg, and left-leg movements. We rely on these movements in three different planes (down, parallel, and up) to control things remotely. Therefore, through the SCO proposed in this work, it is possible to recognize gestures and monitor the movements of the user’s arms and legs in different directions at the same time. This is done through data analysis and computer vision techniques with the algorithm implemented in the proposed SCO. The camera has a user-friendly design, is very portable and easy to carry, can be used outdoors without a power adapter, is small, and is very lightweight, so the SCO can be used almost anywhere. In fact, the power of this idea lies in the smart and fast algorithm, refined through many experiments, that allows the camera to detect the arm without adding secondary sensors.

In this paper, we present the principle of a live user-arm detection system based on a smart camera with an embedded system that uses a Python program to calculate distances and process images. Shirmohammadi et al. [18] use camera vision to perform measurement and/or calculation processes. The SCO proposed in this paper can be applied to many projects that rely on remote monitoring of the position of the user’s arms and legs, such as the medical field [19], video games [20], industrial robot control [21], and fire brigade robot control [22].

Structure of Model

Using our prototype SCO (Fig. 1a), arm gesture recognition is obtained through the following three steps: the user stands in front of the camera and complies with the basic rules for moving the arms; the SCO records a video; and the arm gesture is displayed on the monitor. The SCO is mounted on an adjustable stand, measures 25 mm x 23 mm x 9 mm, and weighs just over 3 g. The computing power of the SCO is that of an RPi 4 with 8 GB of RAM, which requires relatively little power compared with a standard desktop PC. The SCO is used to identify arm gestures and then send the action information in real time.

Basic Rules for Moving Arms

The shoulder joint is the most mobile joint in the body: it is a free-moving ball-and-socket joint that allows the arm to move in a circular motion and to move upward. The shoulder is a group of joints, tendons, and muscles that work together to allow ample room for arm movement, described here by three different positions (above, middle, or under). These three positions are distinguished by the opening angle θ between the limb and the torso; e.g., a normal person can raise the right hand upright at an angle θ of approximately 135° [23].

Fig. 1b shows the arm movements in both directions, with the angle θ1 representing the movement of the right arm in circle C1 and θ2 representing the movement of the left arm in circle C2. Fig. 1c also shows leg movement: angle θ3 represents the right-leg movement in circle C3, and angle θ4 represents the left-leg movement in circle C4. Smith et al. [24] used a 3D infrared scanning method to calculate body segment parameters. In our work, all these actions are performed in front of the SCO and detected by our algorithm. All θ angles are calculated with the following equations: (1) gives the angle θ1 or θ2 of the upper limb, with 0° < θ1,2 < 135°, and (2) gives the angle θ3 or θ4 of the lower limb, with 0° < θ3,4 < 90°:

θ1,2 = cos⁻¹[ ((tbL,R)² + (mlL,R)² − (zaL,R)²) / (2 × tbL,R × mlL,R) ]   (1)

θ3,4 = cos⁻¹[ ((gllL,R)² + (grlL,R)² − (zlL,R)²) / (2 × gllL,R × grlL,R) ]   (2)

The average person cannot raise the arm upward to an angle of 180° or more, except in exceptional cases in which the user is a sports practitioner, or in some rare congenital cases. This movement area was divided into three equal ranges (arms up, arms at shoulder height, or arms down), where each range relates to a position of the arm. Therefore, there are a total of nine possible static gestures that can be distinguished by the SCO, as shown in Table 1. The angles θ3 and θ4 lie in the range of 0° to 90° for the lower limb. Users can perform three kinds of movement: the first is to move only the right arm while keeping the left arm in the ‘arms at sides’ range, at an angle θ2 = 0°. This gives 134 gestures, divided into 3 static gestures and 131 dynamic gestures; when both arms move together, there are 17170 gestures, divided into 9 static gestures and 17161 dynamic gestures, as listed in Table 2.

We can observe that dynamic gestures are linked to static gestures, because arm movement cannot be captured accurately by adopting only static gestures. This behavior was considered in our developed algorithms.
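To make the angle computation concrete, the following is a minimal sketch that evaluates (1) and (2) with the law of cosines and maps an upper-limb angle onto the three equal ranges described above; the segment lengths in the example are made-up values, and the 45°/90° band edges simply follow the “three equal ranges” statement rather than any calibrated setting of the authors.

import math

def segment_angle(a, b, c):
    """Law-of-cosines angle (in degrees) between two segments of lengths a and b,
    where c is the length of the side opposite the angle, as in (1) and (2)."""
    cos_theta = (a ** 2 + b ** 2 - c ** 2) / (2 * a * b)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding noise
    return math.degrees(math.acos(cos_theta))

def arm_range(theta):
    """Map an upper-limb angle (0°..135°) onto the three equal ranges."""
    if theta <= 45:
        return "arms down / at sides"
    if theta <= 90:
        return "arms at shoulder height"
    return "arms up"

# Example with illustrative segment lengths (meters) for the right arm:
# triceps brachii (tb), midaxillary line (ml), and the opposite side (za).
tb, ml, za = 0.33, 0.45, 0.52
theta_1 = segment_angle(tb, ml, za)  # equation (1)
print(f"theta_1 = {theta_1:.1f} deg -> {arm_range(theta_1)}")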

[Figure 1: experimental setup (monitor, SCaMO, user at distance dsc) and the joint-angle circles C1–C4, with the arm ranges labeled arms up, arms at shoulder height, and arms down/at sides, and the landmarks armpit (ar), triceps brachii (tb), midaxillary line (ml), pisiform (p), gluteus medius (gm), hip (hj), gracilis (gll, grl), ankle (aj), za, zl, and ground (g).]

Fig. 1. (a) Schematic of the experiment after placing the SCO (SCaMO) to test user gestures. (b) The normal alignment of the lower limb: the toes and knees face forward, and the upper limb moves at the sides in straight movements without any bending. (c) Visual representation of the joint angles of the upper limb and lower limb.

Table 1 – Nine different static gestures that the SCO can detect
Each static gesture SG1–SG9 corresponds to one combination of the right-arm position and the left-arm position, where each arm is either up, at shoulder height, or down/at the sides.

Table 2 – Number of arm gestures as a function of the opening angle
Case | Arm moves | Static gestures | Angle θ (static) | Dynamic gestures | Angle θ (dynamic) | Total gestures
N°1 | Only right arm | 3 | θ1 = 0°, 90°, 135° | 131 | θ1 ∈ [1°, 89°] ∪ [91°, 134°] | 134
N°2 | Only left arm | 3 | θ2 = 0°, 90°, 135° | 131 | θ2 ∈ [1°, 89°] ∪ [91°, 134°] | 134
N°3 | Right and left arms together | 9 | (θ1, θ2) = (0°, 0°), (90°, 90°), (135°, 135°), (0°, 90°), (90°, 0°), (0°, 135°), (135°, 0°), (90°, 135°), (135°, 90°) | 17161 | (θ1, θ2) ∈ [1°, 89°] ∪ [91°, 134°] | 17170
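As a small illustration of case N°3 in Table 2, the nine static gestures are just the Cartesian product of the three static angles of each arm; the sketch below enumerates them (the numbering reflects only the enumeration order, not necessarily the SG1–SG9 labels of Table 1).

from itertools import product

STATIC_ANGLES = (0, 90, 135)  # arms at sides, at shoulder height, arms up

# Nine static gestures of Table 2, case N°3: every pair (theta1, theta2).
static_gestures = list(product(STATIC_ANGLES, STATIC_ANGLES))
for i, (theta1, theta2) in enumerate(static_gestures, start=1):
    print(f"gesture {i}: right arm {theta1} deg, left arm {theta2} deg")
print(len(static_gestures), "static gestures")  # -> 9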

[Figure 2: ten panels (a)–(j) showing the user with the right and left hands in different positions, each annotated with the point (xe, ye).]

Fig. 2. Some of the dynamic gestures and static gestures of the user detected by the SCO under different circumstances.

Fig. 2 shows some of the many gestures a user can perform in front of the SCO. Among these upper-limb movements, there are three possibilities: only static gestures in the picture, as shown in Fig. 2a, Fig. 2b, and Fig. 2i; only dynamic gestures in the image, as shown in Fig. 2c, Fig. 2d, Fig. 2g, and Fig. 2h; and dynamic and static gestures in one picture, as shown in Fig. 2e, Fig. 2f, and Fig. 2j. The algorithm converts dynamic gestures to static ones: once the movements are recognized, they are mapped onto the static gestures.

Concept of the Algorithm

Our algorithm applies the same approach to the measurement of the angles θ: the principle is to determine the coordinates of a reflection point that corresponds to a pixel in the image. Fig. 3 shows the five basic steps through which the algorithm passes inside the camera. The algorithm is divided into two main parts that complement each other: the first part performs ‘Face detection,’ and the second part performs ‘Fixing intervals’ on the image. In the first part, while recording the video, the algorithm tracks the movement of the user’s head in all directions. Lu et al. [25] proposed a computer vision-based algorithm derived from face detection technology. In our algorithm, part of the real-time face detection algorithm from the OpenCV library is also used, but with some modifications, including a change in the size of the green frame that surrounds the user’s head during movement. We also extract the center point (xcp, ycp) and work with it instead of the (x, y) returned by the library, as shown in Fig. 4a.
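As a concrete illustration of this first stage, here is a minimal sketch of how the face box and its center point (xcp, ycp) could be obtained with OpenCV’s stock Haar cascade; the cascade file, the detection parameters, and the camera index are assumptions for this sketch, not the authors’ exact settings.

import cv2

# Assumed detector: OpenCV's bundled frontal-face Haar cascade.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_center(frame):
    """Return the center point (xcp, ycp) of the first detected face, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]          # (x, y, w, h) as returned by the library
    return x + w // 2, y + h // 2  # center point used instead of (x, y)

cap = cv2.VideoCapture(0)          # assumed camera index
ok, frame = cap.read()
if ok:
    print("face center:", face_center(frame))
cap.release()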

The following equations show how to obtain the new small green frame, where W is the original width of the green frame, H is its original height, wp is the width of the small green frame, hp is its height, and D is the divider:

wp = W / D   (3)
hp = H / D   (4)

The purpose is to shrink the frame to a very small size, almost a green point (Gpt), drawn on the user’s nose tip and visible in the video. It plays an important role for the user, because it shows whether the SCaMO has detected the gesture of the arms or not. It appears automatically when the user is in front of the camera, and only while the algorithm inside the camera is running. The user can adjust software parameters, such as the HSV calibration parameters of the video as mentioned by Chen et al. [26], according to external factors and changes in the position of the user. The distance dsc (Fig. 1a) that separates the user from the camera in order to capture the movement of the arms is set to a few meters. This distance varies with the size of the user’s body, the resolution of the camera, and external factors such as lighting and shadows, which affect image quality through the appearance of image noise. The criterion we adopt to obtain a good distance dsc, and hence good results, is the Gpt indicator appearing on the video when the user is in the correct position. On average, this distance can reach approximately 3 m with a regular web camera, as is the case in this article. The distance dsc can be increased many times over when a zoom or a more advanced camera is used. Another purpose of the point (xcp, ycp) in the algorithm is to divide the image into two parts (the right and the left side of the user). Equation (5) gives the centroid of a set of n distinct points x1 to xn; in the context of image processing, (6) and (7) are used to obtain the coordinates of the centroid, where xcp is the abscissa of the centroid, ycp is the ordinate of the centroid, and M denotes the image moment:

C = (1/n) Σ i=1..n xi   (5)
xcp = M10 / M00   (6)
ycp = M01 / M00   (7)

[Figure 3: schematic pipeline with the steps video capture; video processing (conversion from RGB to gray, Gaussian filter, thresholding); face detection (regulating the user face detection parameters (x, y, w, l)); fixing intervals (fixed image parametric values according to the face detection settings); detection of the coordinates of the first white pixels to identify the arm position; action.]

Fig. 3. The schematic diagram of arms gestures recognition using SCO.

Fig. 4a shows that the right slot is equal to the left slot because the user is in the middle of the video. In other, more probable cases, the user is not centered in the video, and therefore one side is larger than the other; the reason is that the camera is installed on a support that is fixed in all directions. In our algorithm, the camera can determine the position of the arm even when the user is not placed in the center of the video, thanks to the programming intelligence of the proposed algorithm. This advantage also makes it possible to change the working equipment; for example, it is possible to run our algorithm on a computer with an integrated camera instead of the RPi with a web camera, as mentioned by N. Lalithamani [27]. Through the point xcp, the algorithm divides the image into two parts, each carrying half of the body. The parameter ycp plays an important role in determining the position of the hand; that is, it indicates the direction of the hand, specifically whether the hand is raised or not. For the nose-tip coordinates (xcp, ycp), the point xcp marks the separation between the ‘Right side of user’ and the ‘Left side of user’, and the point ycp sets the level at which the position of a raised hand is determined. According to the user’s physical height and the distance between the user and the camera, the distance dsc is related to the size of the user in the image: if dsc is large, then the user appears small in the image, and therefore Y1 is large, and vice versa. In our example, the image size is 640 x 480 pixels, and the sum of the three regions is less than or equal to the image height of 480 pixels. Y1 represents the range ‘Arms up’ and spans [y0, ycp = y1]. Y2 represents the range ‘Arms at shoulder height’ and spans [y1, y2].

In the next phase, the algorithm works mainly on the fixing intervals, where it divides the right-hand slot of the user into three equal parts (X1r = X2r = X3r). The same is done for the left-hand slot (X1l = X2l = X3l) on the abscissa axis, as shown in Fig. 4.
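To make this region bookkeeping concrete, the sketch below computes the centroid of (6) and (7) from OpenCV image moments and splits a frame at the nose-tip abscissa xcp into the two user sides and the three vertical bands Y1, Y2, and Y3; the band limits y1 and y2 passed in are illustrative placeholders, not the authors’ calibrated values.

import cv2

def centroid_from_mask(mask):
    """Centroid (xcp, ycp) of a binary mask via image moments, as in (6) and (7)."""
    m = cv2.moments(mask, binaryImage=True)
    if m["m00"] == 0:
        return None
    xcp = int(m["m10"] / m["m00"])  # xcp = M10 / M00
    ycp = int(m["m01"] / m["m00"])  # ycp = M01 / M00
    return xcp, ycp

def split_regions(frame, xcp, y1, y2):
    """Split the frame at xcp into the two sides of the user and define the
    three vertical bands: Y1 (arms up), Y2 (shoulder height), Y3 (arms down)."""
    h = frame.shape[0]
    side_a = frame[:, :xcp]  # one half of the image (one side of the user)
    side_b = frame[:, xcp:]  # the other half
    bands = {"Y1": (0, y1), "Y2": (y1, y2), "Y3": (y2, h)}
    return side_a, side_b, bands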

[Figure 4: the image divided at the nose-tip point (xcp, ycp) into the right and left slots, each split into three equal parts (X1r, X2r, X3r and X1l, X2l, X3l) along the abscissa and into the three bands Y1, Y2, and Y3 along the ordinate, with the first white pixel coordinates (xbmr, ybmr) and (xbml, ybml) and the point (xbpl, ybpl).]

Fig. 4. (a) The method of recognizing the user’s arm movement through the camera’s internal algorithm. (b) The threshold area X2r. (c) The threshold area X2l.

The purpose of these splits is to reduce the time required to process the video image in order to recognize the result. The two parts X1l and X3r contain the body, head, torso, and other parts of the user’s body that are not important to process and that the algorithm cannot distinguish. Therefore, only the X2l and X2r parts of the image have to be analyzed. They represent 2/6 of the image size, which corresponds to reducing the time required to obtain the result by more than 80%; this is especially important because the input is a video stream and immediate results are required. Fig. 4b and Fig. 4c are, respectively, the parts X2r and X2l. These two parts are two small RGB images. The algorithm converts them to two binary images and then searches for the first white pixel coordinates, (xbml, ybml) in Fig. 4c and (xbmr, ybmr) in Fig. 4b. In Fig. 4c, the search runs from right to left, e.g., from point x2l to point x1l, to find the first white pixel; in Fig. 4b, by contrast, the search runs from left to right, e.g., from point x1r to point x2r. The point xbml then belongs to [x2l, x1l], which is the interval X2l, and the point xbmr belongs to [x1r, x2r], which is the interval X2r. From the point ybml, we can deduce the gesture of the right arm.
In addition, from the point ybmr, we can deduce the left arm gesture. If ybmr belongs to Y1, the hand is up, since that point is in the ‘Arms up’ range. If ybmr belongs to Y2, the hand is in the middle, since that point is in the ‘Arms at shoulder height’ range.
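The per-strip decision just described can be sketched as follows; this is an illustrative reconstruction rather than the authors’ code, and the threshold value, the blur kernel, and the convention that white pixels mark the arm after thresholding are assumptions.

import cv2

def classify_arm(strip_bgr, y1, y2, scan_left_to_right=True):
    """Classify one arm from its X2 strip (a small RGB image, as in Fig. 4b/4c):
    convert to gray, smooth, threshold, then scan column by column for the
    first white pixel and use its row to pick the range Y1, Y2, or Y3."""
    gray = cv2.cvtColor(strip_bgr, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    _, binary = cv2.threshold(blur, 127, 255, cv2.THRESH_BINARY)

    h, w = binary.shape
    cols = range(w) if scan_left_to_right else range(w - 1, -1, -1)
    for x in cols:                   # scan direction depends on the slot
        for y in range(h):
            if binary[y, x] == 255:  # first white pixel found
                if y < y1:
                    return "hand up"
                if y < y2:
                    return "hand in the middle"
                return "hand down"
    return "hand down"               # no white pixel: arm at the side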

Fig. 5. The monitor displays the user present in front of the SCO and shows the result in the small black window at the bottom: (a) hand down; (b) hand in the middle; (c) hand up.

If ybmr belongs to Y3, the hand points down, since that point is in the ‘Arms down’ range. If no white pixel is found in Fig. 4b or Fig. 4c, the point xbml will usually lie between [x0l, x1l] and the point ybml between [y2, y3]. With a normal person, it is impossible to find the point ybml between [y0, ycp], because the opening angle does not exceed 135°. The same principle holds for the right side of the user. Therefore, the algorithm indicates that the user’s arm is down, in the ‘arms down’ range. After identifying the position of the arm, the algorithm proceeds to display the results on the screen.

[Figure 6: eight panels with the following angle ranges: (a) θ1, θ2 ∈ ]0°, 135°], θ3 = θ4 = 0°; (b) θ1 ∈ ]0°, 135°], θ2 = 0°, θ3 ∈ ]0°, 90°], θ4 = 0°; (c) θ1 ∈ ]0°, 135°], θ2 = 0°, θ3 = 0°, θ4 ∈ ]0°, 90°]; (d) θ1, θ2 ∈ ]0°, 135°], θ3 = 0°, θ4 ∈ ]0°, 90°]; (e) θ1 = θ2 = θ3 = θ4 = 0°; (f) θ1 = 0°, θ2 ∈ ]0°, 135°], θ3 = 0°, θ4 ∈ ]0°, 90°]; (g) θ1 = 0°, θ2 ∈ ]0°, 135°], θ3 ∈ ]0°, 90°], θ4 = 0°; (h) θ1, θ2 ∈ ]0°, 135°], θ3 ∈ ]0°, 90°], θ4 = 0°.]

Fig. 6. Arms and leg gestures of the user in eight different positions.

Results and Discussion

Some experiments were carried out in which we determine the position of the user’s right hand. The movement of this hand includes three positions (down, middle, or up) while the legs are kept closed. The experiment was performed in a workshop in our lab with a colored background behind the user, including colored objects, light reflections, and illusions on panels behind him (Fig. 5). It is necessary to install a dedicated halo light near the smart camera to avoid interference from shadows and other light-related obstacles [28]. Through the Raspbian operating system on the RPi, we can access the Python program files and use a monitor for display, on which a written word indicates the hand gesture (hand down, hand in the middle, hand up) according to the movement of the user’s hand. With small adjustments to our algorithm, this data could be sent to a robotic arm by the RPi [29]. The movements differ depending on the work required, e.g., controlling a robot-arm kit remotely, counting the number of movements in a sports exercise, etc. The gestures cannot be random in front of the SCO; however, some users do not stand properly, so the head direction of the user may need a bit of correction in front of the camera.

F. Ficuciello et al. developed reliable control algorithms to reduce mechanical complexity on the control side [30]. In our research, Fig. 6 shows all the possible cases handled by our algorithm, which are eight cases (from Fig. 6a to Fig. 6h). Namely, the angles θ3 and θ4 represent the gesture of a lower limb, as shown in Fig. 6b. In Fig. 6a and Fig. 6e, the user stands straight and places her feet together so that her legs touch, taking into account her thighs, knees, calves, and ankles.

Therefore, both angles θ3 and θ4 are 0°. Some experiments have shown that the movement of the legs can affect the result for the movement of the arms. If the user’s leg moves, as shown in Fig. 6b, Fig. 6c, Fig. 6d, Fig. 6f, and Fig. 6h, the algorithm will be unable to identify the arm gestures with 100% accuracy, because there is a difference from one gesture to another.

The total error rate is 12.5%, as displayed in Fig. 7. This is because the white pixel coordinate (xbpl, ybpl) in Fig. 4 is not taken into account by our algorithm, and the result will still be correct (the arm is up): the algorithm has recognized the first point, with coordinates (xbml, ybml), so the remaining white pixels, including the coordinates (xbpl, ybpl), are not processed.

In some other cases, the foot is raised, so the first white pixel is the coordinate (xbpl, ybpl) and the result will be incorrect. Expressed in terms of the opening angles, this works as follows: when θ3 and θ4 both equal 0°, there is no error in determining the hand motion, and the result is 100% correct, as shown in Fig. 6a. When only θ3 or θ4 equals 0°, there is no error in determining the movement of the corresponding right or left hand, so the result is correct at 50%; for the other hand an error can occur, determined to be 25% out of 50%, so the end result is 75% correct out of 100%, as illustrated for Fig. 6b, Fig. 6d, Fig. 6f, and Fig. 6h. In the case of an arm at the bottom, the final result will be 100% correct regardless of the position of the leg, because θ3 or θ4 is less than 90°, and the result is therefore always ‘arm at the bottom’, which matches the actual position of the arm. In Fig. 6g and Fig. 6c, when θ3 or θ4 does not equal 0°, the legs are apart and the feet are on the floor; the error rate of the right part of the user was determined to be 25% out of 50%, the same error holds for the left part, and the result is correct at 50% out of 100%, as shown in Fig. 6e.

[Figure 7: bar chart of the correct-result and error-rate percentages (0%, 25%, 50%, 75%, 100%) for the eight positions Fig. 6a–Fig. 6h.]

Fig. 7. Percentage of error rate and correct result for eight different positions detected when using SCO.

Conclusion

We proposed a new method to monitor the gesture of a user’s arm. The SCO prototype presented in this work is based on real-time video processing, and its smart algorithm detects different arm gestures. We have carried out many experiments that have given impressive results. Whether the arm is raised, raised to the middle, or down, our method relies only on the camera of the SCO to reveal the arm gestures; it is not necessary to place a sensor on the arms or to use a flexible wearable data band or data glove to achieve the result. Moreover, the SCO can be used in applications ranging from the human-machine interface and human-computer interaction to human-robot interaction. This will contribute to the simplification of routines and ease of development: anyone can stand in front of the SCO and move their arms to control things remotely, without needing a specialist to create a user interface.

References
[1] N. Guedri and R. Gharbi, “Real time recognition of the user’s arm gestures in 2D space with a smart camera,” [Video]. [Online]. Available: https://sites.google.com/view/smart-camera-scamo/accueil?read_current=1.
[2] M. Hoetjes and L. V. Maastricht, “Using gesture to facilitate L2 phoneme acquisition: the importance of gesture and phoneme complexity,” Frontiers in Psychology, vol. 11, Nov. 2020.
[3] L. A. M. Zaina, R. P. M. Fortes, V. Casadei, L. S. Nosaki, and D. M. B. Paiva, “Preventing accessibility barriers: guidelines for using user interface design patterns in mobile applications,” J. Syst. and Software, vol. 186, 111213, Apr. 2022.
[4] K. Sadeddine, F. Z. Chelali, R. Djeradi, A. Djeradi, and S. B. Abderrahmane, “Recognition of user-dependent and independent static hand gestures: application to sign language,” J. Visual Commun. Image Representation, vol. 79, 103193, Aug. 2021.
[5] A. Bonardi, S. James, and A. J. Davison, “Learning one-shot imitation from humans without humans,” IEEE Robotics and Automation Lett., vol. 5, no. 2, pp. 3533–3539, Mar. 2020.
[6] V. V. Edwards, “60 hand gestures you should be using and their meaning,” Science of People. [Online]. Available: https://www.scienceofpeople.com/hand-gestures/.
[7] J. Katti, A. Kulkarni, A. Pachange, A. Jadhav, and P. Nikam, “Contactless elevator based on hand gestures during COVID-19 like pandemics,” in Proc. Int. Conf. Advanced Computing and Communication Systems (ICACCS), Jun. 2021.
[8] S. Shriram, B. Nagaraj, J. Jaya, S. Shankar, and P. Ajay, “Deep learning-based real-time AI virtual mouse system using computer vision to avoid COVID-19 spread,” J. Healthcare Engin., 8133076, 2021.

[9] A. H. Alrubayi et al., “A pattern recognition model for static gestures in Malaysian sign language based on machine learning techniques,” Computers and Electrical Engin., vol. 95, 107383, Oct. 2021.
[10] Y. Zhang et al., “Static and dynamic human arm/hand gesture capturing and recognition via multiinformation fusion of flexible strain sensors,” IEEE Sensors J., vol. 20, no. 12, pp. 6450–6459, Jan. 2020.
[11] P. K. Artemiadis and K. J. Kyriakopoulos, “Assessment of muscle fatigue using a probabilistic framework for an EMG-based robot control scenario,” in Proc. IEEE Int. Conf. BioInformatics BioEngin., Oct. 2008.
[12] M. Lazarou, B. Li, and T. Stathaki, “A novel shape matching descriptor for real-time static hand gesture recognition,” Computer Vision and Image Understanding, vol. 210, 103241, Sep. 2021.
[13] C. Clarke and H. Gellersen, “MatchPoint: spontaneous spatial coupling of body movement for touchless pointing,” in Proc. 30th Ann. ACM Symp. User Interface Software and Technol. (UIST ’17), pp. 179–192, Oct. 2017.
[14] J. Wang, T. Liu, and X. Wang, “Human hand gesture recognition with convolutional neural networks for K-12 double-teachers instruction mode classroom,” Infrared Physics and Technol., vol. 111, 103464, Dec. 2020.
[15] S. R. Yeduri, D. S. Brelanda, O. J. Pandey, and L. R. Cenkeramaddi, “Updating thermal imaging dataset of hand gestures with unique labels,” Data in Brief, vol. 42, 108037, Jun. 2022.
[16] T. Song, H. Zhao, Z. Liu, H. Liu, Y. Hu, and D. Sun, “Intelligent human hand gesture recognition by local–global fusing quality-aware features,” Future Generation Computer Syst., vol. 115, pp. 298–303, Feb. 2021.
[17] A. Shore, “Talking about facial recognition technology: how framing and context influence privacy concerns and support for prohibitive policy,” Telematics and Informatics, vol. 70, 101815, May 2022.
[18] S. Shirmohammadi and A. Ferrero, “Camera as the instrument: the rising trend of vision based measurement,” IEEE Instrum. Meas. Mag., vol. 17, no. 3, pp. 41–47, Jun. 2014.
[19] R. Gauer, “The Intera project at the Hospital Evangélico de Londrina is helping surgeons to control vital on-screen imagery with gestures to save precious time in the operating room,” Innovation that Matters, Springwise.com. [Online]. Available: https://www.springwise.com/in-brazil-surgeons-kinect-control-x-ray-displays/.
[20] M. Orange, “MYO is a piece of wearable tech that detects muscle configuration to enable users to remotely control any device with arm movements,” Innovation that Matters, Springwise.com. [Online]. Available: https://www.springwise.com/armband-enables-remote-control-device-gesture/.
[21] “Myo robot control – Intuitive manipulation with a 6 DOF robotic arm and anthropomorphic hand.” [Online]. Available: https://www.youtube.com/watch?v=EnY56VFmAYY.
[22] A. Dhiman et al., “Fire fighter robot with deep learning and machine vision,” SSRN, 2020. [Online]. Available: https://doi.org/10.2139/ssrn.3633609.
[23] D. Haering, M. Raison, and M. Begon, “A method for computing 3D shoulder range of motion limits considering interactions between degrees of freedom,” J. Biomechanical Engin., vol. 136, no. 8, 084502, May 2014.
[24] S. H. L. Smith and A. M. J. Bull, “Rapid calculation of bespoke body segment parameters using 3D infra-red scanning,” Med. Engin. Physics, vol. 62, pp. 36–45, Oct. 2018.
[25] D. Lu and L. Yan, “Face detection and recognition algorithm in digital image based on computer vision sensor,” Image Analysis of Vision Sensors, article 4796768, Sep. 2021.
[26] Z.-H. Chen, J.-T. Kim, J. Liang, J. Zhang, and Y.-B. Yuan, “Real-time hand gesture recognition using finger segmentation,” Machine Learning in Intelligent Video and Automated Monitoring, 267872, Jun. 2014.
[27] N. Lalithamani, “Gesture control using single camera for PC,” Procedia Computer Science, vol. 78, pp. 146–152, 2016.
[28] L. Ma et al., “A fast LED calibration method under near field lighting based on photometric stereo,” Optics and Lasers in Engin., vol. 147, Dec. 2021.
[29] M. Dobiš, M. Dekan, P. Beňo, F. Duchoň, and A. Babinec, “Evaluation criteria for trajectories of robotic arms,” Robotics (MDPI), vol. 11, no. 1, p. 29, Feb. 2022.
[30] F. Ficuciello, A. Villani, T. L. Baldi, and D. Prattichizzo, “A human gesture mapping method to control a multi-functional hand for robot-assisted laparoscopic surgery: the MUSHA case,” Frontiers in Robotics and AI, Dec. 2021.

Naji Guedri ([email protected]) is a Teacher-Researcher in electrical engineering, specializing in artificial intelligence. His research focuses on computer vision, measurement, and digital systems for industrial inspection, integrated electronics, nanotechnology, and microelectronics. He obtained his engineering degree in microelectronics in 2016, a master’s degree in electronics and digital systems in 2018, and a doctoral degree in electrical engineering in 2023 at the National Higher School of Engineers of Tunis (ENSIT), Tunisia. He has also carried out research internships at EPFL in Switzerland in September 2019 and at the University of Siena in Italy in December 2020.

Rached Gharbi ([email protected]) has been a Full Professor since 2015 at the National Higher School of Engineers of Tunis (ENSIT)-University of Tunis, Tunisia, where he has served as Director since 2017. He previously held the position of Head of the Department of Electrical Engineering from 2011 to 2017. His research focuses on electronic devices, semiconductor materials, solar cells, photodetectors, DSSCs, and intelligent systems. He obtained his Ph.D. degree in 1999 and his HDR in 2008 in Industrial Engineering from the National Engineers School of Tunis (ENIT)-University of Tunis-ElManar.

