
2023 IEEE Global Conference on Artificial Intelligence and Internet of Things (GCAIoT)
DOI: 10.1109/GCAIoT61060.2023.10385100

Towards Smart Interaction: Hand Gesture Recognition Using Machine Learning in IoT Scenarios

Harit Ahuja, Kaavya Varadarajan, Mannar Jammal
Information System and Technology, York University, Toronto, Canada
[email protected], [email protected], [email protected]

Abstract—Hand gesture recognition is becoming popular in computer vision applications like sign language interpretation, IoT robotic controls, and home automation. In this paper, we propose a gesture recognition system using two machine learning models, VGG16 and MobileNet, and compare their performance in detecting hand gestures. Our dataset consists of over 20,000 images of ten different hand gestures, which allows us to implement transfer learning and adapt the pre-trained models by customizing the fully connected layers. Our findings indicate that the MobileNet model outperforms the VGG16 model in recognizing hand gestures. The paper focuses on creating a baseline that can combine different models to produce a more efficient result.

Index Terms—Hand Gesture, Machine learning, IoT, Computer vision, VGG16, CNN, MobileNet.

I. INTRODUCTION

Computer vision is a collaborative field that uses digital images or videos to allow computers to understand the data in digital pictures. Computer vision has sub-domains, including object detection, object recognition, motion estimation, 3D pose modeling, etc. The field has produced striking advances: in less than a decade, the accuracy rate for identifying and analyzing objects has increased from 50% to 90% [1]. In recent times, computers have been able to identify and respond to visual data rapidly, with greater accuracy than humans.

The most spontaneous and straightforward way to communicate with a computer is through the non-verbal interaction technique called the hand gesture [2]. Hand gesture recognition (HGR) is becoming popular in computer vision due to its extensive use in applications like sign language, clinical and health settings, Internet of Things (IoT) robotic controls, and home automation. Home automation includes actions such as switching appliances on and off and increasing or decreasing the room temperature.

When a photo is taken with a still hand showing a gesture against a plain background, it is considered to be in good condition. In practice, however, the situation is different: while capturing a photo, it is hard to find a plain, solid background when showing the gesture. Such problems include lag, gesture diversity, combinations of movements, and non-standard environments [3]. Machine learning models are used to overcome these technical issues and correctly identify the image gesture. Vision-based recognition systems achieve an average accuracy of 78% [4], which creates problems when developing a reliable system requiring hand gestures as input. However, by using deep learning algorithms, the accuracy of such a hand gesture system can be improved to more than 90%. A practical benefit of computer vision is providing new interaction techniques between humans and computer systems: input devices can leverage deep learning algorithms to improve human-computer interaction [5].

The Convolutional Neural Network (CNN) is a popular deep learning model in the computer vision domain. Deep learning methods such as neural networks learn complex data by extracting hidden features from input to output, making detection and pattern recognition easier [6]. Feature extraction and data training are performed automatically by combining deep learning techniques with images gathered from computer vision. One convolutional network designed specifically for classification and localization is the Visual Geometry Group (VGG) network, which is used for hand gesture recognition.

The VGG network supports image recognition, image embedding vectors, and image detection and localization, and is used in the medical field for X-ray and MRI scans. VGG stands out from other top models because of the 3x3 filter (small receptive field) used in its convolutional layers [7]. MobileNet is a lightweight deep convolutional neural network that uses depth-wise separable convolutions to produce a streamlined architecture; it targets embedded vision and mobile applications [8] and reduces the number of parameters and the computation cost compared to VGG16. In this paper, we aim to do the following:

1. Study a gesture recognition system based on two pre-trained deep learning models, VGG16 and MobileNet.
2. Compare the performance of the two models (VGG16 and MobileNet) in detecting hand gestures.
3. Provide a comparative analysis based on evaluation metrics such as accuracy and F1 score.


Based on our study, we found that MobileNet outperforms VGG16, achieving an accuracy and F1 score of 93 percent.

The outline of this paper is as follows. Section 2 discusses the literature on hand gesture recognition using CNNs. Section 3 defines the problem statement and methodology of this research in detail. Section 4 discusses the methodology evaluation, while Section 5 concludes the paper and discusses some potential future work.

II. LITERATURE STUDY

A. Literature Study for Hand Gesture Recognition using CNN

The CNN is one of the most popular deep learning methods used in hand gesture recognition. Most research develops models by taking a CNN as the base and combining it with other techniques to establish the most efficient performance in hand gesture recognition. The network designed by Vivek Bheda and N. Dianna Radpour [9] is based on the typical CNN architecture and consists of multiple convolutional and dense layers. The model is tested with alphabet gestures, NZ ASL, and self-generated datasets. In [10], Chen et al. designed a compact CNN model that improves classification accuracy and reduces the number of parameters in the model. Al-Hammadi et al. [11] proposed a CNN system for automatic hand gesture recognition based on deep learning; the proposed system utilizes transfer learning of a 3D CNN for hand gesture recognition. In [12], Pinto et al. propose a deep learning framework that recognizes hand gestures robustly. They use a CNN to identify hand postures despite variations in the spatial location of the hand in the image, clutter in the background, and hand sizes. Their experimental results demonstrate the superior performance of the proposed algorithm on state-of-the-art datasets.

Núñez et al. [1] propose an architecture addressing hand gesture recognition from 3D data sequences extracted from the full-body skeleton. They use both an LSTM and a CNN to address temporal 3D pose recognition. Their proposed two-stage training method uses the CNN for initial training, and then the whole network is trained using the CNN and LSTM together. They also used data augmentation to reduce overfitting. Their results show that the combination of both methods for training (CNN + LSTM) performs better than either single method. Oyedotun and Khashman implement deep learning methods on Thomas Moeslund's gesture dataset containing 24 hand gestures. The authors found that learning difficulties could be overcome by using ReLU (Rectified Linear Unit) activations. The architectures comprise a CNN and a stacked denoising autoencoder (SDAE), which achieve 91.3% and 92.8% accuracy, respectively [13].

B. Other Literature Studies for Hand Gesture Recognition

To tackle various hand gesture recognition difficulties and to improve model efficiency, many studies have proposed state-of-the-art techniques. Such techniques play a significant role in enhancing recognition and paving the way for future developments. In [14], the research aims at automatically interpreting gestures based on computer vision. The authors propose a technique that commands a computer using six static and eight dynamic hand gestures, following three steps: tracing the detected hand, recognizing the hand shape, and converting the data into the required command. The experiment shows 93.09% accuracy. Côté-Allard et al. [15] implemented a model that uses transfer learning on data obtained from users, applying deep learning techniques to learn discriminant features from the datasets. In [2], Rui Ma et al. proposed a model that uses a deep convolutional generative adversarial network (DCGAN) consisting of two components, a discriminator and a generator, whose structure is based on the hourglass network. Images capturing depth are passed as input through the DCGAN framework, and the output is the position of human body parts. Static and dynamic gestures are the hand gestures used in sign language: hands and fingers held in space without moving are known as static gestures, whereas continuous movement of the hands is known as dynamic gestures. Sakshi Sharma and her team [16] created a model to recognize static hand gestures in a vision-based (images captured using a camera) sign language translation system built on a convolutional neural network.

C. Theoretical Findings from Other Research on Hand Gesture Recognition

Many articles have presented detailed findings on the benefits, limitations, and use of various features to develop better hand gesture recognition techniques. Cheok et al. [17] provide a detailed review of state-of-the-art hand gesture and sign language techniques. Moreover, the authors discuss the limitations of gesture recognition and sign language. The most common methods used for preprocessing the data were Gaussian and median filters, and feature extraction methods include Principal Component Analysis (PCA), Scale-Invariant Feature Transform (SIFT), and Speeded Up Robust Features (SURF). In their article, Varun et al. [18] discuss how the development of gesture recognition or detection will enable futuristic methods that help people who have difficulty controlling or operating systems or devices. After developing their models, they concluded that the system could handle hand gestures provided by any person, which helps identify the motion. Munir Oudah et al. [3] reviewed the hand gesture literature and illustrated the benefits and shortcomings of the techniques. Hand gesture research papers have adopted many methods based on computer vision and instrumented sensor technology. They describe the vital camera vision-based sensing process: communication between computers and humans is contactless and uses cameras of different configurations, such as fisheye, monocular, and IR. Yet these methods face difficulties with background complexion, variation in lighting, background clutter, processing time versus resolution and frame rate, and background and foreground objects having different skin tones.


While all these studies leverage machine learning models, particularly VGG16, our approach derives its uniqueness from the application of a larger dataset and a direct performance comparison of MobileNet and VGG16. In contrast to Alnuaim et al.'s work [20], which primarily focuses on Arabic sign language without a similar model comparison, our study establishes a broader application baseline. Furthermore, compared to Sahoo et al.'s work [21], our research retains its distinctive stance due to its comprehensive dataset and comparative evaluation of two distinct models. This breadth allows us to cater to a wider scope of applications, thus offering a unique contribution to the field.

D. Enhancements in Hand Gesture Recognition Models

Our paper introduces significant enhancements in the domain of CNN-based hand gesture recognition, setting it apart from previous works through refined model architectures and preprocessing methods. The approach of [24] adjusted the VGG16 model by reducing its complexity to manage memory constraints. Our work, on the other hand, preserves the full depth of the pre-trained models, including VGG16. By freezing layers in the convolutional networks and revising the fully connected layers, we bolster the functional capacity of these models without significantly diminishing their complexity or depth. This strategy maintains a robust capacity for feature representation while ensuring a custom fit for the nuanced requirements of hand gesture recognition tasks. Advancing past the methods utilized in [21], which relied on detailed depth map analysis for image preprocessing, our study introduces a shift towards more efficient grayscale image conversion. This refinement reduces the computational footprint while preserving recognition precision; it not only increases processing efficiency but also extends the versatility of this technology across various platforms.

III. PROBLEM STATEMENT AND METHODOLOGY

This study proposes a gesture recognition system using two pre-trained models, VGG16 and MobileNet. In this section, we discuss dataset preparation, pre-processing, and model implementation. First, the dataset, which contains over 20,000 images of various hand gestures, is pre-processed and divided into training and testing sets [19]. Next, the two pre-trained models are used with transfer learning, and custom layers are added to tailor each model to detect the gestures efficiently. Finally, we evaluate and compare the models using performance metrics like F1 score and accuracy.

A. Dataset Collection

The dataset consists of ten different hand gestures. These include, but are not limited to, gestures like 'Fist', 'Stop', 'Index Finger', 'Thumb' (Figure 1), and 'OK' (Figure 2). Each gesture has its dedicated folder, making a total of 10 folders, where each folder contains images of the respective hand gesture from one individual. To divide the dataset, the data of the first eight individuals are assigned to the training set, and the last two individuals are dedicated to the testing set. In an IoT scenario, this dataset could be invaluable for gesture-controlled smart devices, enabling more intuitive human-device interaction such as controlling a thermostat with simple hand gestures.
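To make this subject-wise split concrete, the following is a minimal sketch of how it could be implemented. It assumes the Kaggle LeapGestRecog folder layout [19] (one directory per subject, numbered 00 to 09, each holding one sub-directory per gesture); the DATA_DIR path is an illustrative assumption, not the authors' actual code.

```python
import os

# Illustrative path; the Kaggle LeapGestRecog archive [19] unpacks to one
# folder per subject ("00".."09"), each holding one sub-folder per gesture.
DATA_DIR = "leapGestRecog"  # hypothetical local path, adjust as needed

train_paths, test_paths = [], []
for subject in sorted(os.listdir(DATA_DIR)):
    subject_dir = os.path.join(DATA_DIR, subject)
    if not os.path.isdir(subject_dir):
        continue
    # First eight subjects go to the training set, the last two to testing.
    target = train_paths if int(subject) < 8 else test_paths
    for gesture in sorted(os.listdir(subject_dir)):      # e.g. "05_thumb"
        gesture_dir = os.path.join(subject_dir, gesture)
        for fname in os.listdir(gesture_dir):
            target.append((os.path.join(gesture_dir, fname), gesture))

print(len(train_paths), "training images;", len(test_paths), "test images")
```

Splitting by subject, rather than by random image, keeps every image of a given person on one side of the split, which is what makes the reported test accuracy meaningful for unseen users.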
Fig. 1: Representation of the 'Thumb' hand gesture from the dataset.

Fig. 2: Illustration of the 'OK' hand gesture within the dataset.

B. Pre-Processing Data

The images in the dataset are converted to grayscale and resized to fit the models. Converting the images to grayscale reduces the dimensionality of the input data while retaining the important features for gesture recognition; it also reduces memory requirements and computational complexity. Further, the single-channel grayscale images are replicated three times to match the input format of the pre-trained models. Finally, the gesture labels are one-hot encoded, transforming the labels into a binary matrix suitable for multi-class classification tasks.
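A sketch of this pre-processing pipeline, continuing the loading sketch above, might look as follows. The 224x224 input size and the [0, 1] scaling are assumptions (the paper does not state its exact resolution or normalization), and OpenCV is used purely for illustration.

```python
import cv2
import numpy as np
from tensorflow.keras.utils import to_categorical

IMG_SIZE = 224     # assumed input resolution; the VGG16/MobileNet default
NUM_CLASSES = 10

def preprocess(path):
    # Grayscale conversion keeps the gesture shape while shrinking the input.
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (IMG_SIZE, IMG_SIZE)).astype("float32") / 255.0
    # Replicate the single channel three times so the image matches the
    # 3-channel input format of the ImageNet pre-trained models.
    return np.stack([img, img, img], axis=-1)

classes = sorted({gesture for _, gesture in train_paths})
X_train = np.array([preprocess(path) for path, _ in train_paths])
# One-hot encode the gesture labels into a binary matrix.
y_train = to_categorical([classes.index(g) for _, g in train_paths],
                         num_classes=NUM_CLASSES)
```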
C. ML Models

In this work, we adapt two pre-trained deep learning models based on convolutional neural network (CNN) algorithms to classify the images in the dataset. The models are VGG16 and MobileNetV2, whose architectures are illustrated in Figure 3 and Figure 4, respectively. Both models have been pre-trained on the ImageNet dataset.

In order to adapt these models to our task, we first froze the convolutional layers, a process detailed by Equation 1. This equation shows how the output feature map F_conv is derived from the convolution of the input image I with the weights of the convolutional layer W_conv, added to the bias of the convolutional layer b_conv:

F_conv = Σ(W_conv ∗ I) + b_conv    (1)

After freezing the convolutional layers, we customized the fully connected layers of both models, following Equation 2, where the output of the fully connected layer F_FC is obtained by multiplying the weights of the fully connected layer W_FC with the input to this layer x and adding the bias b_FC:

F_FC = W_FC ∗ x + b_FC    (2)


Fig. 3: Architecture of VGG16, adapted from [22]

Fig. 4: MobileNetV2 Architecture, adapted from [23]

Our work uses a single dense layer with 10 output units and employs the SoftMax function as the activation function, represented by Equation 3, where the sum runs over all units j in the output layer:

Softmax(x_i) = e^{x_i} / Σ_j e^{x_j}    (3)
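Under Equations 1 to 3, the adaptation amounts to freezing the pre-trained convolutional base and attaching the customized fully connected head. Below is a minimal Keras sketch, assuming 224x224x3 inputs and a Flatten step before the dense head (the paper does not specify the exact flattening or pooling layer).

```python
from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16, MobileNetV2
from tensorflow.keras.layers import Dense, Flatten

def build_transfer_model(backbone_cls):
    base = backbone_cls(weights="imagenet", include_top=False,
                        input_shape=(224, 224, 3))
    base.trainable = False  # freeze the convolutional layers (Equation 1)
    # Customized fully connected head (Equation 2): a single dense layer
    # of 10 output units with SoftMax activation (Equation 3).
    x = Flatten()(base.output)
    outputs = Dense(10, activation="softmax")(x)
    return Model(base.input, outputs)

vgg_model = build_transfer_model(VGG16)
mobilenet_model = build_transfer_model(MobileNetV2)
```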
Finally, the models are compiled and trained using the Adam optimizer, governed by Equation 4:

θ_{t+1} = θ_t − η ∗ m_t / (√v_t + ϵ)    (4)

Each model was trained for ten epochs with a batch size of 64. The Adam optimizer update rule describes how the weights at time t + 1 (θ_{t+1}) are updated based on the weights at time t (θ_t), the learning rate η, the estimates of the first and second moments of the gradients m_t and v_t, and a small constant ϵ for maintaining numerical stability.
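The compile-and-train step described here could be sketched as follows. The learning rate and the validation split are assumptions, since the paper reports only the optimizer, the ten epochs, and the batch size of 64.

```python
from tensorflow.keras.optimizers import Adam

for model in (vgg_model, mobilenet_model):
    # Adam (Equation 4); the 1e-3 learning rate is the Keras default and
    # an assumption here, as the paper does not report it.
    model.compile(optimizer=Adam(learning_rate=1e-3),
                  loss="categorical_crossentropy",  # matches one-hot labels
                  metrics=["accuracy"])
    model.fit(X_train, y_train,
              epochs=10,             # ten epochs, as stated in the paper
              batch_size=64,         # batch size of 64, as stated
              validation_split=0.1)  # held-out fraction is an assumption
```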

IV. EVALUATION

In this study, we have employed a comprehensive evaluation setting that utilizes a hand gesture dataset with over 20,000 images representing ten different gestures. This setting is designed to rigorously assess and compare the performance of the two deep learning models, VGG16 and MobileNet. The evaluation leveraged the transfer learning technique, fine-tuning the pre-trained models by tailoring their fully connected layers to the specialized task of hand gesture recognition. Consistent testing conditions were maintained to ensure the comparability of results across the models.

The empirical results show that the VGG16 model achieved an overall accuracy of 90 percent and an F1 score of 89 percent. The classification report of VGG16 in Table I shows that the model performed well across the different hand gesture classes, with precision scores ranging from 0.79 to 1 and recall scores from 0.74 to 1. On the other hand, MobileNet achieved a higher overall accuracy of 93 percent and an F1 score of 93 percent. The classification report for MobileNet in Table II shows that the model performed well for all classes in the dataset, with precision scores ranging from 0.78 to 1 and recall scores from 0.71 to 1. Comparing the classification metrics for both models shows that MobileNet outperforms the VGG16 model in recognizing hand gestures in the dataset. These high-accuracy results have significant implications for IoT applications, specifically in fields like healthcare and home automation, where nuanced hand gesture recognition can facilitate efficient human-machine interaction. In summary, the ability to recognize and interpret hand gestures has the potential to transform our interactions with intelligent devices, taking user engagement to a new level.

TABLE I: Classification metrics for VGG16

Gesture       Precision   Recall   F1-score
Palm
I             0.86        0.97     0.91
Fist          1.00        0.76     0.86
Fist Moved    0.93        0.74     0.83
Thumb         0.90        0.81     0.85
Index         1.00        0.87     0.93
Ok            0.92        1.00     0.96
Palm Moved    0.79        0.84     0.82
C             0.86        1.00     0.93
Down          0.96        1.00     0.98
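The per-class precision, recall, and F1 scores reported in Tables I and II can be produced with a standard classification report. The sketch below assumes X_test and y_test come from the same pre-processing pipeline applied to the two held-out subjects, and the weighted F1 averaging is an assumption (the paper does not state which average it uses).

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, f1_score

# X_test / y_test are assumed to be produced by the same preprocessing
# pipeline applied to the two held-out subjects.
for name, model in (("VGG16", vgg_model), ("MobileNet", mobilenet_model)):
    y_pred = np.argmax(model.predict(X_test), axis=1)
    y_true = np.argmax(y_test, axis=1)
    print(name, "accuracy:", accuracy_score(y_true, y_pred),
          "F1:", f1_score(y_true, y_pred, average="weighted"))
    # Per-class precision, recall, and F1, as in Tables I and II.
    print(classification_report(y_true, y_pred, target_names=classes))
```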


TABLE II: Classification metrics for MobileNet

Gesture       Precision   Recall   F1-score
Palm
I             1.00        1.00     1.00
Fist          0.86        0.83     0.84
Fist Moved    1.00        0.71     0.83
Thumb         0.87        0.96     0.91
Index         0.78        1.00     0.88
Ok            0.96        0.94     0.95
Palm Moved    1.00        0.82     0.90
C             0.97        1.00     0.99
Down          1.00        1.00     1.00

V. CONCLUSION AND FUTURE WORK

In conclusion, this work demonstrates the effectiveness of pre-trained deep learning models for detecting hand gestures. Using a hand gesture dataset containing over 20,000 images, our evaluation shows that MobileNet outperforms VGG16 in terms of accuracy and F1 score, achieving an overall accuracy and F1 score of 93 percent. While both models performed well across the different hand gesture classes, the MobileNet model achieved higher recall and precision scores.

The results of this study demonstrate the benefits and potential of deep learning models for hand gesture recognition tasks. However, there is room for further improvement in the performance of these models in real-world scenarios. First, our work uses a dataset containing 20,100 images, which could be expanded with more diverse gestures for the recognition task, including different hand angles and lighting conditions; this would help the models generalize better and improve their accuracy. Moreover, other state-of-the-art deep learning models, like EfficientNet and ResNet, can be used to compare and evaluate the performance metrics. In the future, the training process can be integrated with user feedback and interactions, since successive user feedback helps improve a model's performance.
feedback helps improve the model’s performance. ture Recognition Using ResNet and MobileNet,” Computational Intel-
ligence and Neuroscience, vol. 2022, p. e8777355, Mar. 2022, doi:
R EFERENCES https://doi.org/10.1155/2022/8777355.
[1] J. C. Núñez, R. Cabido, J. J. Pantrigo, A. S. Montemayor, and [21] J. P. Sahoo, A. J. Prakash, P. Pławiak, and S. Samantray, “Real-
J. F. Vélez, “Convolutional Neural Networks and Long Short-Term Time Hand Gesture Recognition Using Fine-Tuned Convolutional
Memory for skeleton-based human activity and hand gesture recog- Neural Network,” Sensors, vol. 22, no. 3, p. 706, Jan. 2022, doi:
nition,” Pattern Recognition, vol. 76, pp. 80–94, Apr. 2018, doi: https://doi.org/10.3390/s22030706.
https://doi.org/10.1016/j.patcog.2017.10.033. [22] M. Ferguson, R. ak, Y.-T. Lee, and K. Law, “Automatic localization of
[2] R. Ma, Z. Zhang, and E. Chen, “Human Motion Gesture Recognition casting defects with convolutional neural networks,” in Proceedings of
Based on Computer Vision,” Complexity, vol. 2021, pp. 1–11, Feb. 2021, the IEEE International Conference on Big Data, pp. 1726-1735, 2017,
doi: https://doi.org/10.1155/2021/6679746. doi: 10.1109/BigData.2017.8258115.
[3] M. Oudah, A. Al-Naji, and J. Chahl, “Hand Gesture Recognition Based [23] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mo-
on Computer Vision: A Review of Techniques,” Journal of Imaging, vol. bileNetV2: Inverted Residuals and Linear Bottlenecks,” arXiv preprint
6, no. 8, p. 73, Jul. 2020, doi: https://doi.org/10.3390/jimaging6080073. arXiv:1801.04381, 2019.
[24] E. L. R. Ewe, C. P. Lee, L. C. Kwek, and K. M. Lim, “Hand Gesture
[4] N. Mohamed, M. B. Mustafa, and N. Jomhari, “A Review of Recognition via Lightweight VGG16 and Ensemble Classifier,” Applied
the Hand Gesture Recognition System: Current Progress and Fu- Sciences, vol. 12, no. 15, p. 7643, Jul. 2022, doi: 10.3390/app12157643.
ture Directions,” IEEE Access, vol. 9, pp. 157422–157436, 2021,
doi:https://doi.org/10.1109/access.2021.3129650.
[5] H. Huang, Y. Chong, C. Nie, and S. Pan, “Hand Gesture Recog-
nition with Skin Detection and Deep Learning Method,” Journal of
Physics: Conference Series, vol. 1213, p. 022001, Jun. 2019, doi:
https://doi.org/10.1088/1742-6596/1213/2/022001.
[6] T. R. Gadekallu et al., “Hand gesture classification using a novel CNN-
crow search algorithm,” Complex and Intelligent Systems, vol. 7, no. 4,
pp. 1855–1868, Mar. 2021, doi: 10.1007/s40747-021-00324-x.
[7] “VGG-16 — CNN model,” GeeksforGeeks, Feb. 26, 2020.
https://www.geeksforgeeks.org/vgg-16-cnn-model/
[8] W. Wang, Y. Li, T. Zou, X. Wang, J. You, and Y. Luo, “A
Novel Image Classification Approach via Dense-MobileNet Models,”
Mobile Information Systems, vol. 2020, pp. 1–8, Jan. 2020, doi:
https://doi.org/10.1155/2020/7602384.

