0% found this document useful (0 votes)

30 views10 pages

Paper 5

Uploaded by

717821f223

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

30 views10 pages

Paper 5

Uploaded by

717821f223

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 10

Journal of Electrical Engineering & Technology (2022) 17:1961–1970

https://doi.org/10.1007/s42835-021-00972-6

ORIGINAL ARTICLE

Hand Gesture Control for Human–Computer Interaction with Deep

Learning
S. N. David Chua1 · K. Y. Richard Chin1 · S. F. Lim1 · Pushpdant Jain2

Received: 14 September 2021 / Revised: 21 November 2021 / Accepted: 23 November 2021 / Published online: 21 January 2022
© The Korean Institute of Electrical Engineers 2021

Abstract
The use of gesture control has numerous advantages compared to the use of physical hardware. However, it has yet to gain
popularity as most gesture control systems require extra sensors or depth cameras to detect or capture the movement of
gestures before a meaningful signal can be triggered for corresponding course of action. This research proposes a method
for a hand gesture control system with the use of an object detection algorithm, YOLOv3, combined with handcrafted rules
to achieve dynamic gesture control on the computer. This project utilizes a single RGB camera for hand gesture recognition
and localization. The dataset of all gestures used for training and its corresponding commands, are custom designed by the
authors due to the lack of standard gestures specifically for human–computer interaction. Algorithms to integrate gesture
commands with virtual mouse and keyboard input through the Pynput library in Python, were developed to handle commands
such as mouse control, media control, and others. The mAP result of the YOLOv3 model obtained 96.68% accuracy based
on testing result. The use of rule-based algorithms for gesture interpretation was successfully implemented to transform
static gesture recognition into dynamic gesture.

Keywords Hand gesture · Human computer interaction · Deep learning · Object detection

1 Introduction surpass the long-established WIMP (windows, icons, menus,

pointer) graphical user interface and permit inexperienced
The input techniques for human–computer interaction (HCI) users to have effortless human–computer interaction without
are mostly limited to mouse and keyboard. These interactions going through extensive training [5, 19].
require users to have contact on physical hardware which However, most gesture recognition systems require
may be undesirable in some situation. With the advance- users to have wearable devices and consequently, they have
ment of technologies, there has been increasing innovation failed to gain wider acceptance among user [1]. Therefore,
on how we interact with computers and computing devices, vision-based gesture recognition systems stand out as the
such as voice commands and gesture control. While voice most practical ones as they only require the user to have
command has gained huge popularity among users, it has a webcam [21]. Furthermore, vision-based gesture recog-
its limitation due to its nature and may not be suitable for nition has become more prominent in these few years due
navigating through complex menus [29]. Meanwhile, the use to the advancement in artificial intelligence [9], enabling
of gestures allows for intuitive and natural interaction with computers to learn, identify objects and information from
the computer [2]. Hand gestures are widely explored as an complex background. The advancement of object detection
interaction medium for a variety of applications since it is algorithms has enabled real-time object detection [12], and
highly intuitive and natural for a human. Intuition may even in some situation, may even surpass the human’s ability to
recognize objects [15].
* S. N. David Chua The vision-based recognition method usually utilizes one
[email protected]; [email protected] or more cameras to aid in gesture interpretation through
sequences in a video. A common method for hand gesture
1
Faculty of Engineering, Universiti Malaysia Sarawak, recognition is separated into three different phases, i.e. the
94300 Kota Samarahan, Malaysia
detection of hand, hand tracking, and gesture or posture
2
School of Mechanical Engineering, VIT Bhopal University, recognition [2]. However, with the advancement of deep
Bhopal Indore Bypass, Sehore 466114, India

13
Vol.:(0123456789)
1962 Journal of Electrical Engineering & Technology (2022) 17:1961–1970

convolutional neural networks (CNNs), many researchers on some aspect in HCI, such as mouse cursor control, where
began adopting CNNs that can be self-trained to extract continuous input of gesture class and its location is needed
important features, effectively removing the need for feature instead of single time command activation. Therefore, this
engineering while increasing the accuracy and recognition work proposed to use static gesture recognition that can out-
speed. put gesture class and its location continuously.
Most of the modern prominent researches on vision- For static gesture recognition, [17] used CNN to recog-
based gesture recognition are done based on video classi- nize gestures and achieved 97.12% accuracy. They claimed
fication [10, 21]. However, the video classification method that data augmentation on the dataset such as rescaling,
is restrained to one-off classification and each gesture can zooming, shearing, rotation, width, and height shifting had
only execute a one-time command rather than continuous increased their accuracy for 4%. A similar approach was
command input. For example, users cannot drag and drop a undertaken by [14] to recognize 24 static hand gestures
file or folder in a computer’s GUI since the user do not have from the alphabet sign language of PERU. Kim et al. [20]
full control over the command in terms of time and distance. proposed the use of You-Only-Look-Once (YOLO) object
Therefore, this paper proposed a ruled based algorithm that detection network and concluded that using YOLO with
allows gesture recognition using only normal RGB camera ROI segmentation achieved higher accuracy while accel-
with YOLOv3 object detector to recognize gestures and erating training speed. Meanwhile, Ni et al. [24] proposed
interpret sequences and movement of static gestures into Light YOLO model that improved the accuracy, speed, and
dynamic gestures to give computer commands through vir- model size of YOLOv2 model for gesture recognition. In Bai
tual keyboard and mouse input into the computer. In this et al. [4], a modified Single Shot Multibox Detector (SSD)
paper, the YOLOv3 object detector is chosen to classify and network is adopted for skeleton-based gesture recognition.
localize gestures. Several techniques are proposed in this In this work, the well-established YOLOv3 object detector
paper for gesture control using gesture class and its location. was chosen, as the gesture detector, as it is one of the fastest
state-of-the-art algorithms with high accuracy.
There are several hand gesture recognition datasets that
2 Related Work are publicly available. For example, one of the largest data-
sets is the Chalearn dataset that has both isolated and con-
Before CNNs were heavily used in vision-based gesture rec- tinuous gesture. However, the gestures in the dataset are
ognition, there was numerous research in gesture recogni- derived from Italian sign language and may not be univer-
tion using feature engineering [18, 31]. However, researchers sally suitable for interaction with computers. Meanwhile,
began to shift their interest in CNNs to recognize gestures the nvGesture dataset [23] is designed for in-car automotive
due to its superior performance. Furthermore. with the com- devices, and the EgoGesture dataset [32] is in the egocentric
mercialization of depth sensors (RGB-D) such as Microsoft view. Given their limitation to the objective of this project,
Kinect, many kinds of research on gesture or action recog- custom gesture dataset is designed for the proposed gesture
nition began to take advantage of it due to its robustness control system.
against illumination variations and abundant 3D structure
information.
Most of the research in hand gesture recognition focused 3 Methodology
on video classification problem to classify dynamic gestures.
Wan et al. [30] summarized current methods on utilizing The proposed system for gesture recognition and control
RGB-D sensors, which can be categorized into two main cat- system involves two main steps, which are gesture detection
egories: isolated and continuous gesture recognition. Unlike using YOLOv3 object detector, followed by a rule-based
isolated gesture recognition, continuous gesture recognition gesture interpreter. In this project, the camera that is used
is harder as the system needs to recognize more than one to capture and recognize gestures was the Logitech C922
gesture in a video. This is challenging as the detector should webcam. The computer used was ASUS GL552VW laptop
recognize the start and end of the gesture in the video by with Nvidia GTX960m graphics card which has 5.0 com-
itself [21]. There are several strategies for this problem. For puter capability and 2048 GDDR5 memory. The system was
example, Chai et al. [10] assume all gestures starts and ends designed for the Windows 10 operating system.
with the performers’ hands down, so that the system knows
when the gesture is performed. Meanwhile, Camgoz et al. 3.1 Defining Gestures and Corresponding
[8] treat the segmentation process as a learning volume, Commands
while Köpüklü et al. [21] proposed a hierarchical structure
to ensure single time activation for each performed gesture. The system utilizes image classification and localiza-
However, using video classification may have its limitation tion, combined with handcrafted rules to activate control

13
Journal of Electrical Engineering & Technology (2022) 17:1961–1970 1963

commands for the computer. For this study, custom gestures

are used instead of referring to gestures from other studies.
10 hand postures were proposed in this study as shown in
Fig. 1. Both the user’s hands are used for the system, where
the first row in the figure represent gestures for the right
hand, while the bottom row represents gestures for the left
hand. The images shown are mirrored images such that the
input images are flipped horizontally so that the system can
act as a mirror to avoid confusion during operation.
There are seven gestures for the right hand and three for Fig. 2 Hand gestures used for this project
the left hand. For the right hand, starting from the left of the
top row in Fig. 2, the postures are palm, two-fingers, pre-
pinch, post-pinch, fist, pre-flick, and post-flick. For the left on user requirements. The gesture control is predominantly
hand, the postures are palm, fist, and two-fingers. controlled with the right hand, with the left hand as a modi-
The system uses hand posture sequences and movement fier gesture, which is applied easily using if/else statement.
to activate certain control command. For example, the flick- The implementation part is discussed in more detailed in
ing command is registered when the system detects the “pre- Sect. 3.5.
flick” posture followed by “post-flick” posture. Likewise,
hand closing gesture is recognized when the system detects 3.2 Image Dataset Preparation
“palm” followed by “fist”. From the ten postures, nine types
of controls are defined as shown in Table 1. It is worth to The training and testing dataset is commonly prepared by
mention that the number of control is not limited to this table splitting them from a common dataset. However, in this
in practice and can have a higher control variation based study, the testing dataset was created separately to assess the
performance of the YOLOv3 object detector under six dif-
ferent categories to study the performance of the detector in
different conditions. The details are discussed in Sect. 3.2.2.

3.2.1 Preparation of Training Dataset

In this project, 200 training images per gesture were pre-

pared, which corresponds to a total of 2000 training images.
All the training dataset were prepared by the authors. Each
image was captured individually with a webcam with var-
ied camera height and hand position relative to the camera
to diversify the image dataset to reduce overfitting during
training. The training dataset was prepared systematically
as shown in Table 2, where the images were captured in
batches, and hand position and angle were varied in each
image. Next, the LabelImg labelling tool [28] was used to
label images in YOLO format. During labelling, the bound-
ing box is labelled accurately and precisely as some of the
gesture commands rely on the output coordinate to function
properly, such as controlling the cursor position.

3.2.2 Preparation of Testing Dataset

In this study, the performance of the YOLO object detec-

tor is assessed under six different categories. The first
three categories test the model’s performance on three
different subjects that were not present in the training
dataset to determine how the detector would perform on
different people with a limited dataset. All the detections
Fig. 1 Proposed gesture control system were done based on 10 gesture classes at 70 cm away from

13
1964 Journal of Electrical Engineering & Technology (2022) 17:1961–1970

Table 1 Defined gestures and No. Control Left hand Right hand
corresponding commands
1 Close active application window None Sequential gesture
Gesture 1: Pre-flick
Gesture 2: Post-flick
2 Scroll up/down None Circular movement
Gesture: Two-fingers
3 Scroll horizontally Static movement Horizontal movement
Gesture: Fist Gesture: Two-fingers
4 Zoom in/out Static movement Circular movement
Gesture: Two-fingers Gesture: Two-fingers
5 Mouse control Static movement Sequential gesture
Gesture: Gesture 1: Pre-pinch
1. Palm: Absolute cursor Gesture 2: Post-pinch
movement
2. Fist: Relational cursor
movement
6 Mute None Sequential gesture
Gesture 1: Palm
Gesture 2: Fist
7 Media play or pause Sequential gesture None
Gesture 1: Palm
Gesture 2: Fist
8 Adjust volume Static movement Vertical movement
Gesture: Palm Gesture: Two-fingers
9 Show desktop Static movement Sequential gesture
Gesture: Fist Gesture 1: Palm
Gesture 2: Fist

Table 2 Structure of dataset preparation

Batch no. Camera height Distance (cm) Batch size

(cm)

1 100 70 15
2 100 100 15
3 100 130 15
4 115 70 15
5 115 100 15
6 115 130 15
7 130 70 15
8 130 100 15
9 130 130 15
10 100 30 15
11 130 30 15
12 120 80 35

the camera, but with a unique gesture position in each

Fig. 3 Structure of prepared dataset
image. Then, the remaining three categories are used to
evaluate the model performance for its recognition at dif-
ferent hand gesture distance away from the camera, which per each gesture class, hence, 100 images per category.
is 70 cm, 120 cm, and 180 cm. These three categories The overview of the structure of the dataset preparation
were all captured with the same subject as the training is shown in Fig. 3.
images. All test categories were tested with 10 images

13
Journal of Electrical Engineering & Technology (2022) 17:1961–1970 1965

3.3 Training YOLO Object Detector

The configuration file for YOLOv3 was configured accord-

ing to the requirement of the project and the limitation of the
hardware. One important configuration for this project is set-
ting flip = 0 so that the model can distinguish between left and
right hand. Next, batch number and subdivision number were
adjusted according to the capability of the GPU. A batch rep-
resents the number of images that would be processed before
updating the network weights, and subdivision is the number
of divisions for the batch. Both the batch number and subdivi-
sion number were set at 64 due to the hardware limitation. It is
anticipated that the performance of the model may improve if a Fig. 4 Representation of the computer screen, output frame, and cur-
sor control frame in term of pixels
smaller subdivision number is used since larger minibatch may
give better optimization, where minibatch = batch/subdivision.
Furthermore, the random parameter for image size augmenta- Average precision (AP) summarizes the shape of the pre-
tion was also set to 0 to accommodate the low GPU memory. cision/recall curve.
The performance of the model shall improve if these limita-
1∑
tions of having high subdivision and no size randomization can AP = p (r) (4)
11 r𝜖{0,0.1,…,1} interp
be eliminated. The width and height of input resolution are left
default at 416 while the learning rate is 0.001. where pinterp (r) = maxr̃∶̃r≥r p(̃r).
Finally, the mean average precision provides the overall
3.4 Evaluating YOLOv3 Performance AP of all detection classes
1 ∑N
To evaluate the performance of the model, object detection mAP = AP . (5)
N i=1 i
evaluation following the PascalVOC standard is used [13].
The model is evaluated with 0.5 IoU (intersection over union),
which is calculated with
3.5 Integrating Gestures with Windows Commands
Area of Overlap
IoU = . (1) For the implementation of gesture commands, a Python
Area of Union
YOLOv3 implementation approach is used, which is based
Testing image was fed to the model for detection and a on the GitHub repository by [3]. The code from this reposi-
predicted bounding box was drawn. The accuracy was meas- tory is fully written in Python and implemented in Pytorch.
ured by computing the area of overlap, also called Intersec- The main library used to develop the system in this project
tion over Union, IoU between ground truth bounding box and is the Pynput module, which is used to execute Windows
predicted bounding box. If the calculated IoU is more than 0.5, commands based on the recognized gestures.
the detection would be considered as True Positive (TP). The
detection is a False Positive (FP) if the detected object does not
match the ground truth object (IoU < 0.5). The detection is a 3.5.1 Cursor Positioning
False Negative (FN) if the model failed to detect any object in
an image that possesses an object. The precision and recall are Two modes for cursor positioning are proposed for this
calculated based on the formulae as follows. Precision meas- study, which are absolute positioning and relative position-
ures how many predictions are correctly predicted in the total ing. The absolute position takes in the coordinate output
amount of prediction. Recall measures how good the model is from the gesture detection and translates them directly to the
at predicting the positives from all cases. cursor position on the computer screen. However, this type
of control may have difficulty to make a precise movement.
TP Hence, relative positioning helps to make slower, more accu-
Precision = (2)
TP + FP rate positioning by using the distance travelled by the hand
to move the cursor.
TP
Recall = . (3)
TP + FN 3.5.1.1 Absolute Positioning The absolute cursor posi-
tioning algorithm can be represented visually as shown
in Fig. 4. The grey frame in the figure represents a

13
1966 Journal of Electrical Engineering & Technology (2022) 17:1961–1970

1920 × 1080 pixels computer screen. The blue frame rep- 3.5.2 Scrolling Algorithm and Related Commands
resents the output video (640 × 840) of the YOLO detec-
tor, while the red control frame (240 × 135) is a virtual One simple technique for scrolling that is proposed is by
frame where the position of the hand gesture controls the using a dragging-like concept as if a touchscreen device on
position of the cursor. The control frame is kept signifi- midair. This technique is used for scrolling left and right.
cantly smaller so that the ends of the computer screen can That is, when “right two-fingers” is detected, the distance
be reached easily with smaller hand movement. of the hand movement is translated to the amount of scroll-
The YOLO network outputs two coordinates to form a ing horizontally. The horizontal distance, S is calculated by
bounding box around a hand gesture, which is the top left subtracting previous hand horizontal coordinate, Cp from the
and bottom right vertices of the box. To ease the smooth- current coordinate, Cc.
ness of the control, a centre coordinate is used because
using the corner coordinates may cause undesired over
S = Cc − Cp (9)
fluctuations. The average of both coordinates is calculated The S variable is stored in the system, and when a thresh-
using old is reached, the scrolling command is activated, and S is
(
Cx1 + Cx2 Cy1 + Cy2
) reset to 0 again to repeat the process. The chosen threshold is
Cx,y = , . (6) 80 for scrolling left, and −80 for scrolling right. This method
2 2
is also used in volume adjustment commands, where mov-
Then, with cursor position coordinate as P, ing gesture up and down controls the volume adjustment.
This approach is straightforward and is very user friendly.
P = (C − Cl) × i (7) However, when a large amount of scrolling is needed, it
where Cl is the top and left clearance distance of the con- becomes inefficient as the system does not recognize which
trol frame to the screen and i is the scale coefficient for the hand movement should registered as scrolling, and which
hand coordinate to the cursor position. For this project, the movement is not when the user intends to return the hand
clearance is set with Clx = 300 and Cly = 180 as shown in position for more scrolling. Therefore, another method is
Fig. 4. Therefore, with 1080p screen, i = 8. The result of proposed for vertical scrolling.
these configurations is that the control frame slights to the The vertical scrolling command is activated when the
right of the camera frame because only the right hand is used user uses the two-finger gesture on the right hand with a
to control the cursor position. |In addition, this function is circular movement. However, an algorithm is needed since
used with other gesture commands that require hand position the computer cannot detect the circular movement easily.
input, such as scrolling and volume adjusting. A method is proposed to detect circular rotation indirectly
by registering directional changes of the hand position. The
3.5.1.2 Relative Positioning Cursor positioning based on method works by comparing the latest hand gesture coor-
the absolute coordinate of the hand is not precise and can dinate and previous coordinate to determine the movement
be hard to control due to inconsistent bounding box. To direction of the hand. Each directional change corresponds
solve the problem, a relational cursor movement approach to one circular movement. Therefore, the scrolling command
is proposed to reduce the cursor movement. This is acti- can be outputted. For example, gesture movement coming
vated by changing the left-hand modifier gesture to fist from left and down and changes into left and up corresponds
(see Table 1). The relational approach moves the position to a clockwise movement (Fig. 5). Therefore, scroll down
of the cursor according to the distance of the movement of command is activated.
the hand. By using hand coordinate output from the previ- The location of the predicted hand location can fluctu-
ous frame, the vector distance to move the cursor, M can ate over time. That may register unintended movement. A
be calculated as
Cc − Cp
M= (8)
s
where Cc is the current hand coordinate output, Cp is the pre-
vious hand coordinate output, and s is the sensitivity index.
The selected s in this project is 3. As a comparison by com-
paring Eqs. (7) and (8), the cursor moves 24 times slower
using relational cursor positioning. Hence, the cursor can be
controlled more precisely. Fig. 5 Illustration of scrolling command activation for clockwise
rotation

13
Journal of Electrical Engineering & Technology (2022) 17:1961–1970 1967

threshold of movement distance can easily be set for the be able to perform well in recognizing gestures of different
scrolling command to be activated to eliminate the fluctuat- people. Further, hand gestures with higher feature similarity
ing hand coordinate. The threshold used for this project is can increase the rate of false detection, such as the ‘right pre-
80. flick’ and ‘right post-flick’ gestures, or the ‘fist’ and ‘post-
pinch’ gestures which have very similar features. Therefore,
3.5.3 Sequential Gestures the use of gestures with distinctive features should improve
the recognition rate.
Gesture commands that require sequential gestures are easily The chart in Fig. 7 shows a significant decrease in mAP
implemented by storing the gesture that is performed in the when the IoU is increased to 75%. The combined mAP series
previous video capture frame and using if/else statement for indicates the evaluation of the entire testing dataset com-
execution. For example, to close active Windows application bined and is shown to decrease by 25.26% with the increased
using the flick gesture, if the current gesture is post-flick and IoU. The extent of the reduction indicates that the model
the previous gesture is pre-flick, the virtual ‘CTRL + F4’ is not very good at predicting the accurate location of the
would be pressed to close the active window. This is done bounding box. This is because, with a higher IoU threshold,
by assigning ‘previous gesture’ = ‘current gesture’ at the end detections that do not meet the threshold is categorized as
of each execution loop. false positives. A similar trend is also observed in the origi-
nal YOLOv3 article [27], where the drop of AP is relatively
higher compared to other state-of-the-art object detectors.
4 Experiment result and Discussion Table 3 shows the comparison results of hand recognition
with other similar studies of gesture recognition by image
4.1 Model Evaluation classification. The highest mAP at 50% IoU for this project
is 96.68% and is very close to other studies. All the studies
The model’s performance during testing using the testing reported rather high recognition rate due to the fact that ges-
dataset is shown in Fig. 6. The three categories that involve tures are generally distinctive and consistent in its features.
foreign subjects performed significantly worse than those
with training subject. Subject 2 of the three categories had
the lowest mAP at 45.90%. The test category of 120 cm
showed the highest mAP at 96.68%. One conclusion that
can be drawn from this experiment is that feature similarities
of the gestures affect the detection rate greatly. For exam-
ple, subject 2 and 3 showed lower average precision which
may be due to gender and age differences from the training
subject. Meanwhile, subject 1 may perform better due to the
same gender with the training subject. Although the model
was trained with a very limited dataset, it still showed rela-
tively good performance on other test subjects. Therefore, if
more training dataset is provided to train the model, it should Fig. 7 Graph of mAP with different IoUs

Fig. 6 Recognition performance at 50% IoU

13
1968 Journal of Electrical Engineering & Technology (2022) 17:1961–1970

Table 3 Comparison with other studies more superior than the method using feature engineering
Author (year) Method Accuracy (%) with color segmentation, such as the works by Huang et al.
[16] and Rahmat et al. [26], where gestures appeared in
Maqueda et al. [22] VS-LBP + SVM (h–h) 97.3 front of the face could not be recognized accurately due
Chen et al. [11] GMM + GMF 98.7 to similar skin color. Furthermore, the detector does not
Oyedotun and Khashman CNN and SDAE 91.33 and 92.83 detect any gesture in non-gesture frames when an appro-
[25]
priate confidence level threshold is applied.
Ni et al. [24] Light YOLO 98.06
The system is very responsive and easy to use when
Bush et al. [7] SSD + CNN 98.97
outputting commands that involved sequential gesture.
This project YOLOv3 96.68
However, there is a limitation for cursor control where
the control may experience fluctuation in position due to
inconsistent bounding box from the YOLOv3 output. This
is undeniable considering that YOLOv3 has lower AP at
higher IoU which indicates that it has a lower capability to
output accurate bounding box compared to another state-
of-the-art object detector [27]. Some smoothing algorithm
may be done to improve the control of cursor position,
such as applying the Bezier curves. Aside from that, other
object detectors may be used to improve the system. For
example, the newer YOLOv4 [6] is reported to have bet-
ter accuracy and detection speed. Also, the use of class-
agnostic non-maximal suppression with low IoU threshold
(0.1 to 0) can improve the system by eliminating repeated
posture detection on a single hand, thereby lowering the
false positive rate.

Fig. 8 Demonstration of hand gesture recognition on controlling the

cursor with both hands (fist on the left hand and pinch for the right 5 Conclusion
hand)
This paper presents a method based on an object detector
for recognizing dynamic gestures through static gesture rec-
Their recognition problems have significantly fewer classes ognition without using video classification. Cursor control,
compared to a standard dataset such as the COCO Dataset, virtual scrolling, and sequential gesture transformation tech-
where YOLOv3 is reported to have 57.9% AP at 50% IoU niques were proposed to interpret outputted gesture class
[27]. Furthermore, the comparison is only illustrative, as and location from the object detector into dynamic gestures.
all of the studies shown were tested using their own custom The use of rule-based algorithms for gesture interpretation
dataset. was demonstrated in the paper and successfully transformed
static gesture recognition into dynamic gesture and treat-
4.2 Performance in Real‑Time Detection ing the hand as an air mouse. The gestures were able to be
recognized accurately, and good HCI interactivity response
The object detector showed rather good performance when was shown during the execution of gesture commands. The
it was applied in real-time detection, where gestures could limitation of this method is that it cannot precisely output the
be recognized most of the time. The recognition speed can location of the hand posture. Furthermore, current gesture
vary with different input resolution into the network. How- functions are limited due to the small amount of posture
ever, faster recognition with lower resolution can decrease classes. Therefore, for the future direction of this research,
the recognition accuracy. The chosen input resolution is we propose to incorporate landmark detection to output a
320. Despite using a mid-range graphic card, the detec- more precise posture position. The functionality of the sys-
tion reaches 10FPS during gesture detection. It is found tem can also be improved by establishing a large-scale hand
that hand gestures can be recognized accurately even with posture dataset for HCI that contains more posture varia-
a clustered background and with background of similar tions so that the object detector can recognize more postures.
color with the hand, such as other body parts, as shown Finally, using a better object detector algorithm may improve
in Fig. 8. This method of gesture recognition proved to be the performance of the system.

13
Journal of Electrical Engineering & Technology (2022) 17:1961–1970 1969

Acknowledgements This research was funded by Universiti Malaysia 15. Geirhos R, Schütt HH, Medina Temme CR, Bethge M, Rauber
Sarawak under the UNIMAS publication support fee fund. J, Wichmann FA (2018) Generalisation in humans and deep
neural networks. In: Advances in neural information processing
systems
Declarations 16. Huang H, Chong Y, Nie C, Pan S (2019) Hand gesture recognition
with skin detection and deep learning method. J Phys Conf Ser.
Conflict of interest On behalf of all authors, the corresponding author https://doi.org/10.1088/1742-6596/1213/2/022001
states that there is no conflict of interest. 17. Islam MZ, Hossain MS, Ul Islam R, Andersson K (2019) Static
hand gesture recognition using convolutional neural network with
data augmentation. In: 2019 Joint 8th international conference on
informatics, electronics and vision, ICIEV 2019 and 3rd inter-
References national conference on imaging, vision and pattern recognition,
IcIVPR 2019 with international conference on activity and behav-
1. Al-Shamayleh AS, Ahmad R, Abushariah MAM, Alam KA, ior computing, ABC 2019. https://doi.org/10.1109/ICIEV.2019.
Jomhari N (2018) A systematic literature review on vision based 8858563
gesture recognition techniques. Multimed Tools Appl. https://doi. 18. Ji S, Xu W, Yang M, Yu K (2013) 3D Convolutional neural net-
org/10.1007/s11042-018-5971-z works for human action recognition. IEEE Trans Pattern Anal
2. Anwar S, Sinha SK, Vivek S, Ashank V (2019). Hand gesture rec- Mach Intell. https://doi.org/10.1109/TPAMI.2012.59
ognition: a survey. Lecture notes in electrical engineering. https:// 19. Kim H, Albuquerque G, Havemann S, Fellner DW (2005) Tangi-
doi.org/10.1007/978-981-13-0776-8_33 ble 3D: hand gesture interaction for immersive 3D modeling. In:
3. Ayooshkathuria (2018) pytorch-yolo-v3. Github 9th international workshop on immersive projection technology—
4. Bai Y, Zhang L, Wang T, Zhou X (2019) A skeleton object detec- 11th Eurographics symposium on virtual environments, IPT/
tion-based dynamic gesture recognition method. In: Proceedings EGVE 2005
of the 2019 IEEE 16th international conference on networking, 20. Kim S, Ji Y, Lee KB (2018) An effective sign language learning
sensing and control, ICNSC 2019. https://d oi.o rg/1 0.1 109/I CNSC. with object detection based ROI segmentation. In: Proceedings—
2019.8743166 2nd IEEE international conference on robotic computing, IRC
5. Beyer G, Meier M (2011) Music interfaces for novice users: com- 2018. https://doi.org/10.1109/IRC.2018.00069
posing music on a public display with hand gestures. In: Proceed- 21. Köpüklü O, Gunduz A, Kose N, Rigoll G (2019) Real-time hand
ings of the international conference on new interfaces for musical gesture detection and classification using convolutional neural
expression networks. In: Proceedings—14th IEEE international conference
6. Bochkovskiy A, Wang CY, Liao M (2020) YOLOv4: optimal on automatic face and gesture recognition, FG 2019. https://doi.
speed and accuracy of object detection. https://a rxiv.o rg/p df/2 004. org/10.1109/FG.2019.8756576
10934v1.pdf 22. Maqueda AI, Del-Blanco CR, Jaureguizar F, García N (2015)
7. Bush IJ, Abiyev R, Arslan M (2019) Impact of machine learn- Human-computer interaction based on visual hand-gesture rec-
ing techniques on hand gesture recognition. J Intell Fuzzy Syst. ognition using volumetric spatiograms of local binary patterns.
https://doi.org/10.3233/JIFS-190353 Comput Vis Image Underst. https://doi.org/10.1016/j.cviu.2015.
8. Camgoz NC, Hadfield S, Bowden R (2017) Particle filter based 07.009
probabilistic forced alignment for continuous gesture recognition. 23. Molchanov P, Yang X, Gupta S, Kim K, Tyree S, Kautz J (2016)
In: Proceedings—2017 IEEE international conference on com- Online detection and classification of dynamic hand gestures with
puter vision workshops, ICCVW 2017. https://doi.org/10.1109/ recurrent 3D convolutional neural networks. In: Proceedings of
ICCVW.2017.364 the IEEE computer society conference on computer vision and
9. Chandrasekaran G, Periyasamy S, Panjappagounder Rajaman- pattern recognition. https://doi.org/10.1109/CVPR.2016.456
ickam K (2020) Minimization of test time in system on chip using 24. Ni Z, Chen J, Sang N, Gao C, Liu L (2018) Light YOLO for
artificial intelligence-based test scheduling techniques. Neural high-speed gesture recognition. In: Proceedings—international
Comput Appl. https://doi.org/10.1007/s00521-019-04039-6 conference on image processing, ICIP. https://doi.org/10.1109/
10. Chai X, Liu Z, Yin F, Liu Z, Chen X (2016) Two streams recurrent ICIP.2018.8451766
neural networks for large-scale continuous gesture recognition. 25. Oyedotun OK, Khashman A (2017) Deep learning in vision-based
In: Proceedings—international conference on pattern recognition. static hand gesture recognition. Neural Comput Appl. https://doi.
https://doi.org/10.1109/ICPR.2016.7899603 org/10.1007/s00521-016-2294-8
11. Chen D, Li G, Sun Y, Kong J, Jiang G, Tang H, Ju Z, Yu H, Liu H 26. Rahmat RF, Chairunnisa T, Gunawan D, Pasha MF, Budiarto R
(2017) An interactive image segmentation method in hand gesture (2019) Hand gestures recognition with improved skin color seg-
recognition. Sensors (Switzerland). https://d oi.o rg/1 0.3 390/s 1702 mentation in human-computer interaction applications. J Theor
0253 Appl Inf Technol 97(3):727-739
12. Chua SND, Lim SF, Lai SN et al (2019) Development of a child 27. Redmon J, Farhadi A (2018) YOLO v.3. Tech Report
detection system with artificial intelligence using object detection 28. Tzutalin (2015) LabelImg. LabelImg. https://g ithub.c om/t zutal in/
method. J Electr Eng Technol 14:2523–2529. https://doi.org/10. labelImg
1007/s42835-019-00255-1 29. Walker A (2013) Voice commands or gesture recognition: how
13. Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman will we control the computers of the future? https://www.indep
A (2010) The pascal visual object classes (VOC) challenge. Int J endent.co.uk/life-style/gadgets-and-tech/voice-commands-or-
Comput Vis. https://doi.org/10.1007/s11263-009-0275-4 gesture-recognition-how-will-we-control-the-computers-of-the-
14. Flores CJL, Cutipa AEG, Enciso RL (2017) Application of convo- future-8899614.html
lutional neural networks for static hand gestures recognition under 30. Wan J, Li SZ, Zhao Y, Zhou S, Guyon I, Escalera S (2016)
different invariant features. In: Proceedings of the 2017 IEEE 24th ChaLearn looking at people RGB-D isolated and continuous data-
international congress on electronics, electrical engineering and sets for gesture recognition. In: IEEE computer society conference
computing, INTERCON 2017. https://doi.org/10.1109/INTER on computer vision and pattern recognition workshops. https://d oi.
CON.2017.8079727 org/10.1109/CVPRW.2016.100

13
1970 Journal of Electrical Engineering & Technology (2022) 17:1961–1970

31. Yang X, Tian Y (2014) Super normal vector for activity recogni- K. Y. Richard Chin received his B.Eng. from Universiti Malaysia
tion using depth sequences. In: Proceedings of the IEEE computer Sarawak, Malaysia. His research interests include computer vision and
society conference on computer vision and pattern recognition. mechanical processes.
https://doi.org/10.1109/CVPR.2014.108
32. Zhang Y, Cao C, Cheng J, Lu H (2018) EgoGesture: a new data- S. F. Lim received her Ph.D. degree from National University of Sin-
set and benchmark for egocentric hand gesture recognition. IEEE gapore, Singapore. Her research interests include adsorption process
Trans Multimed. https://doi.org/10.1109/TMM.2018.2808769 and AI.Pushpdant Jain received his Ph.D. from National Institute of
Technology, Rourkela (Odisha). His research interests include new
Publisher's Note Springer Nature remains neutral with regard to product development and finite element analysis.
jurisdictional claims in published maps and institutional affiliations.
Pushpdant Jain received his Ph.D. from National Institute of Tech-
nology, Rourkela (Odisha). His research interests include new product
development and finite element analysis.
S. N. David Chua received his Ph.D. degree from Dublin City Univer-
sity, Ireland. His research interests include finite element modelling,
simulation, biomechanical and AI.

Space Laser Communications For Beyond 5G 6G 1
No ratings yet
Space Laser Communications For Beyond 5G 6G 1
13 pages
BMW HUD Factory Schematic
100% (1)
BMW HUD Factory Schematic
4 pages
Gesture Volume Control Using Otsu Method
No ratings yet
Gesture Volume Control Using Otsu Method
12 pages
Sensors 24 06262
No ratings yet
Sensors 24 06262
16 pages
Verifying Flowmeter Accuracy
100% (1)
Verifying Flowmeter Accuracy
8 pages
1 s2.0 S0952197622002871 Main
No ratings yet
1 s2.0 S0952197622002871 Main
21 pages
A Review of The Hand Gesture Recognition System Cu
No ratings yet
A Review of The Hand Gesture Recognition System Cu
19 pages
First Meeting Hasanv3
No ratings yet
First Meeting Hasanv3
20 pages
Hand Gesture Recognition Using Diff Models
No ratings yet
Hand Gesture Recognition Using Diff Models
22 pages
Paper Sprigner Final
No ratings yet
Paper Sprigner Final
13 pages
Ict 7 - Q1 Exam
No ratings yet
Ict 7 - Q1 Exam
3 pages
Computer Vision Based Hand Gesture Recognition System: February 2023
No ratings yet
Computer Vision Based Hand Gesture Recognition System: February 2023
9 pages
Hand Gestures Report
No ratings yet
Hand Gestures Report
24 pages
Hand Gesture Recognition Thesis
100% (2)
Hand Gesture Recognition Thesis
5 pages
IACV - Mini Project Report - 214
No ratings yet
IACV - Mini Project Report - 214
17 pages
Shsconf Stehf2022 03011
No ratings yet
Shsconf Stehf2022 03011
5 pages
Smart Meter Manual
No ratings yet
Smart Meter Manual
3 pages
(Base - Paper) Gesture Recognisation For Computer
No ratings yet
(Base - Paper) Gesture Recognisation For Computer
8 pages
A Review of The Hand Gesture Recognition IEEE 2021
No ratings yet
A Review of The Hand Gesture Recognition IEEE 2021
15 pages
Duplichecker Plagiarism Report
No ratings yet
Duplichecker Plagiarism Report
2 pages
Major Project PPT Format (1) Hand Gesture Recognition
No ratings yet
Major Project PPT Format (1) Hand Gesture Recognition
20 pages
Hand-Gesture Recognition Using Computer-Vision Techniques: David J. Rios-Soria Satu E. Schaeffer Sara E. Garza-Villarreal
No ratings yet
Hand-Gesture Recognition Using Computer-Vision Techniques: David J. Rios-Soria Satu E. Schaeffer Sara E. Garza-Villarreal
8 pages
Ijprems30400021488 April
No ratings yet
Ijprems30400021488 April
6 pages
Paper 3
No ratings yet
Paper 3
10 pages
Irjet V11i4321
No ratings yet
Irjet V11i4321
4 pages
Paper Survey
No ratings yet
Paper Survey
10 pages
IJRPR4107
No ratings yet
IJRPR4107
6 pages
ZYX-S2 User Manual
No ratings yet
ZYX-S2 User Manual
7 pages
Virtually Controlling Computers Using Hand Gesture and Voice Commands
No ratings yet
Virtually Controlling Computers Using Hand Gesture and Voice Commands
9 pages
Controlled Hand Gestures Using Python and OpenCV
No ratings yet
Controlled Hand Gestures Using Python and OpenCV
7 pages
Ms Word Using Hand Gessturexnxxt
No ratings yet
Ms Word Using Hand Gessturexnxxt
7 pages
A Deep Learning-Based Multimodal Depth-Aware
No ratings yet
A Deep Learning-Based Multimodal Depth-Aware
9 pages
Ijiie 2019dec1003vol1
No ratings yet
Ijiie 2019dec1003vol1
6 pages
Pisharady 2015
No ratings yet
Pisharady 2015
14 pages
Computer Vision Based Hand Gesture Recognition For Human-Robot Interaction
No ratings yet
Computer Vision Based Hand Gesture Recognition For Human-Robot Interaction
26 pages
Turban Bi2e PP Ch02
No ratings yet
Turban Bi2e PP Ch02
48 pages
References
No ratings yet
References
1 page
Centrifuga
No ratings yet
Centrifuga
21 pages
0478 s18 QP 11 PDF
No ratings yet
0478 s18 QP 11 PDF
11 pages
Applying Hand Gesture Recognition For User Guide Application Using Mediapipe
No ratings yet
Applying Hand Gesture Recognition For User Guide Application Using Mediapipe
8 pages
1376 3343 1 PB
No ratings yet
1376 3343 1 PB
10 pages
myML RESEARCH
No ratings yet
myML RESEARCH
7 pages
Paper 9004
No ratings yet
Paper 9004
4 pages
Electromagnetic Brake Project
No ratings yet
Electromagnetic Brake Project
3 pages
Survey On 3D Hand Gesture Recognition
No ratings yet
Survey On 3D Hand Gesture Recognition
15 pages
Responsibilities: Bosch
No ratings yet
Responsibilities: Bosch
4 pages
PDF
No ratings yet
PDF
9 pages
840D SL ADI4 Equip Man 0721 en-US
No ratings yet
840D SL ADI4 Equip Man 0721 en-US
90 pages
Lundahl: Tube Amplifier Interstage Transformer / Line Output Transformer LL1692A
No ratings yet
Lundahl: Tube Amplifier Interstage Transformer / Line Output Transformer LL1692A
2 pages
Automated Digital Hand Gesture and Speech Recognition Based Presentations
No ratings yet
Automated Digital Hand Gesture and Speech Recognition Based Presentations
12 pages
Hand Gesture Recognition and Conversion To Speech For Speech Impaired
No ratings yet
Hand Gesture Recognition and Conversion To Speech For Speech Impaired
8 pages
Gestop: Customizable Gesture Control of Computer Systems: Sriram S K Nishant Sinha
No ratings yet
Gestop: Customizable Gesture Control of Computer Systems: Sriram S K Nishant Sinha
5 pages
Soumen Dikpati C.V
No ratings yet
Soumen Dikpati C.V
2 pages
Software Testing Engineer: QC Exposure
No ratings yet
Software Testing Engineer: QC Exposure
3 pages
A Real-Time Hand Gesture Recognition and Human-Computer Interaction System
No ratings yet
A Real-Time Hand Gesture Recognition and Human-Computer Interaction System
8 pages
Gesture Recognition For Human Computer Interaction
No ratings yet
Gesture Recognition For Human Computer Interaction
3 pages
ICSCSS 177 Final Manuscript
No ratings yet
ICSCSS 177 Final Manuscript
6 pages
Ejemplo de Bill of Lading
No ratings yet
Ejemplo de Bill of Lading
2 pages
Fast and Robust Dynamic Hand Gesture Recognition Via Key Frames Extraction and Feature Fusion
No ratings yet
Fast and Robust Dynamic Hand Gesture Recognition Via Key Frames Extraction and Feature Fusion
11 pages
A Robust Static Hand Gesture Recognition
No ratings yet
A Robust Static Hand Gesture Recognition
18 pages
Rs 4.18 Rs 7.17 Rs 14.15 Rs 14.34 Rs 38.24: Telenor
No ratings yet
Rs 4.18 Rs 7.17 Rs 14.15 Rs 14.34 Rs 38.24: Telenor
2 pages
Concur Expense EXP - SG - Workflow - AuthAppr
No ratings yet
Concur Expense EXP - SG - Workflow - AuthAppr
38 pages
A Human Hand Gesture Based TV Fan Control System Using Open CV
No ratings yet
A Human Hand Gesture Based TV Fan Control System Using Open CV
99 pages
Wieldy Finger and Hand Motion Detection For Human Computer Interaction
No ratings yet
Wieldy Finger and Hand Motion Detection For Human Computer Interaction
5 pages
Gesture Volume Control Project Synopsis
No ratings yet
Gesture Volume Control Project Synopsis
5 pages
Zerone IT Fest 2K24
No ratings yet
Zerone IT Fest 2K24
2 pages
Hand Gesture Paper
No ratings yet
Hand Gesture Paper
5 pages
Esp8266 Commands
No ratings yet
Esp8266 Commands
12 pages
B F M G R S: Rightness Actor Atching For Esture Ecognition Ystem Using Scaled Normalization
No ratings yet
B F M G R S: Rightness Actor Atching For Esture Ecognition Ystem Using Scaled Normalization
12 pages
Naukri Pawankumar (2y 2m)
No ratings yet
Naukri Pawankumar (2y 2m)
2 pages
Volume Control Using Gestures
100% (1)
Volume Control Using Gestures
4 pages
Hand Gesture Recognition: Jay Prakash, Uma Kant Gautam
No ratings yet
Hand Gesture Recognition: Jay Prakash, Uma Kant Gautam
6 pages
Boiler Fotovoltaic Solar Kerberos Lite 110.B
No ratings yet
Boiler Fotovoltaic Solar Kerberos Lite 110.B
2 pages
Vision Based Hand Gesture Recognition: Pragati Garg, Naveen Aggarwal and Sanjeev Sofat
No ratings yet
Vision Based Hand Gesture Recognition: Pragati Garg, Naveen Aggarwal and Sanjeev Sofat
6 pages
Chanhan 2015
No ratings yet
Chanhan 2015
7 pages
Marketing and Sales Management
No ratings yet
Marketing and Sales Management
13 pages
Vision Based Hand Gesture Recognition
No ratings yet
Vision Based Hand Gesture Recognition
6 pages
Ijecet: International Journal of Electronics and Communication Engineering & Technology (Ijecet)
No ratings yet
Ijecet: International Journal of Electronics and Communication Engineering & Technology (Ijecet)
9 pages
Timers On The ATmega168 - 328 - QEEWiki
No ratings yet
Timers On The ATmega168 - 328 - QEEWiki
9 pages
Real Time Vision-Based Hand Gesture Recognition For Robotic Application
No ratings yet
Real Time Vision-Based Hand Gesture Recognition For Robotic Application
10 pages
117 2074 Salesforce-AI-Associate
No ratings yet
117 2074 Salesforce-AI-Associate
7 pages
La Gard Combogard Pro 39e Electronic Lock Software Installation Instructions 730 018 Rev D Web PDF
No ratings yet
La Gard Combogard Pro 39e Electronic Lock Software Installation Instructions 730 018 Rev D Web PDF
12 pages
Internal Verification of Assessment Decisions - BTEC (RQF) : Higher Nationals
No ratings yet
Internal Verification of Assessment Decisions - BTEC (RQF) : Higher Nationals
57 pages
BI Brochure
No ratings yet
BI Brochure
2 pages
48 A Albert Prasanna Kumar Simulation
No ratings yet
48 A Albert Prasanna Kumar Simulation
6 pages
Data Sheet Cruzer Fit Usb 2 0
No ratings yet
Data Sheet Cruzer Fit Usb 2 0
2 pages
Hand Gesture Recognition Using Neural Networks: G.R.S. Murthy R.S. Jadon
No ratings yet
Hand Gesture Recognition Using Neural Networks: G.R.S. Murthy R.S. Jadon
5 pages
Gesture Technology: A Review: Aarti Malik, Ruchika
No ratings yet
Gesture Technology: A Review: Aarti Malik, Ruchika
4 pages
Gesture Recognition: Unlocking the Language of Motion
From Everand
Gesture Recognition: Unlocking the Language of Motion
Fouad Sabry
No ratings yet
Visual Sensor Network: Exploring the Power of Visual Sensor Networks in Computer Vision
From Everand
Visual Sensor Network: Exploring the Power of Visual Sensor Networks in Computer Vision
Fouad Sabry
No ratings yet

Paper 5

Uploaded by

Paper 5

Uploaded by

Journal of Electrical Engineering & Technology (2022) 17:1961–1970

Hand Gesture Control for Human–Computer Interaction with Deep

1 Introduction surpass the long-established WIMP (windows, icons, menus,

commands for the computer. For this study, custom gestures

3.2.1 Preparation of Training Dataset

In this project, 200 training images per gesture were pre-

3.2.2 Preparation of Testing Dataset

In this study, the performance of the YOLO object detec-

Table 2 Structure of dataset preparation

Batch no. Camera height Distance (cm) Batch size

the camera, but with a unique gesture position in each

3.3 Training YOLO Object Detector

The configuration file for YOLOv3 was configured accord-

Fig. 6 Recognition performance at 50% IoU

Fig. 8 Demonstration of hand gesture recognition on controlling the

You might also like

3.2.1 Preparation of Training Dataset

3.2.2 Preparation of Testing Dataset

3.3 Training YOLO Object Detector