Smart Control of Home Appliances Using Hand Gesture Recognition in An IoT-Enabled System
Cheng-Ying Yang, Yi-Nan Lin, Sheng-Kuan Wang, Victor R.L. Shen, Yi-Chih
Tung, Frank H.C. Shen & Chun-Hsiang Huang
To cite this article: Cheng-Ying Yang, Yi-Nan Lin, Sheng-Kuan Wang, Victor R.L. Shen, Yi-Chih
Tung, Frank H.C. Shen & Chun-Hsiang Huang (2023) Smart Control of Home Appliances Using
Hand Gesture Recognition in an IoT-Enabled System, Applied Artificial Intelligence, 37:1,
2176607, DOI: 10.1080/08839514.2023.2176607
Introduction
Nowadays, with the rapid development of the Internet and technological products, people have ushered in an era in which individuals are closely connected with each other. Moreover, many identification systems have been developed, such as sign language recognition, face recognition, and license plate recognition (Riedel, Brehm, and Pfeifroth 2021). However, there are still many flaws in hand gesture recognition. Because existing technology cannot recognize users' hand gestures quickly and accurately, it fails to solve users' problems promptly
(Sharma et al. 2022). For this reason, we are motivated to resolve the difficulties that seniors and people with disabilities may encounter at home, such as turning the lights on and off, locking the door, and making phone calls.
This study is based on the concept of a virtual touch system with assisted gestures using deep learning and MediaPipe, which employed dynamic gestures to control a computer without using a mouse (Yu 2021). Moreover, the paper by Shih (2019) was also cited. Regarding the complementary hardware and software requirements of these two works, instant gesture tracking on wearable gloves was achieved by using the MediaPipe hand tracking module and calculating the angle of the intersection lines based on four key points. However, their methods still need to improve the accuracy and efficiency of recognition before being applied to more innovative cases.
How people can easily communicate with machines is now a new trend. Many researchers have tried to find reliable and humanized methods through the recognition of hand gestures, facial expressions, and body language, among which the hand gesture is the most flexible and convenient one. Nevertheless, hand tracking and recognition remain challenging due to the high flexibility of the hand (Riedel, Brehm, and Pfeifroth 2021).
This study aims to design a gesture recognition module incorporated into smart home systems so that both the elderly and people with disabilities can control home appliances more comfortably and conveniently. Currently, nearly everyone has a mobile device with a camera lens, which can identify the current gestures anytime and anywhere to control the home appliances (Gogineni et al. 2020) when connected to Wi-Fi equipment. Therefore, it is a novel system that all people can enjoy using (Alemuda 2017).
Literature Review
This section introduces the building blocks of an IoT-enabled system for realizing smart control of home appliances by hand gesture recognition, which consists of MediaPipe and various development software tools, such as Thonny, Android Studio, and WoPeD.
Petri Net
Petri net theory was developed by the German mathematician Dr. Carl Adam Petri. A Petri net (PN) is basically a directed mathematical graph of discrete parallel systems, suitable for modeling asynchronous and concurrent systems (Chen et al. 2021). It can be used to perform qualitative and quantitative analysis of a system, as well as to represent systematic synchronization and mutual exclusion. Therefore, Petri nets are widely used in different fields for system simulation, analysis, and modular construction (Hamroun et al. 2020; Kloetzer and Mahulea 2020; Zhu et al. 2019).
A basic PN model contains four elements, namely, a Place, denoted as a circle; a Transition, a long bar or square; an Arc, a line with an arrow; and a Token, a solid dot, as listed in Table 1.
Formally, a Petri net is a triple PN = (P, T, F), where
P = {p1, p2, . . . , pm} denotes a finite set of places;
T = {t1, t2, . . . , tn} denotes a finite set of transitions;
F ⊆ (P×T) ∪ (T×P) denotes a set of directed arcs (i.e., the flow relation).
In addition, M = {m0, m1, m2, . . .} denotes a set of markings, where mi is a vector representing the state of the token distribution after the Petri net has been triggered i times. Each value in the vector is a non-negative integer indicating the number of tokens in the corresponding place.
Table 1. Basic elements of a Petri net model.
Element      Symbol
Place        Circle
Transition   Long bar or square
Arc          Line with arrow
Token        Solid dot
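To make the firing semantics concrete, the following is a minimal Python sketch (ours, not part of the original system) of the PN = (P, T, F) definition above: a transition is enabled when each of its input places holds a token, and firing it moves tokens from input places to output places, producing the next marking.

```python
from typing import Dict, List

def enabled(marking: Dict[str, int], inputs: List[str]) -> bool:
    """A transition is enabled iff every input place holds at least one token."""
    return all(marking.get(p, 0) > 0 for p in inputs)

def fire(marking: Dict[str, int], inputs: List[str], outputs: List[str]) -> Dict[str, int]:
    """Fire an enabled transition: consume one token per input place and
    produce one token per output place, yielding the next marking m_{i+1}."""
    nxt = dict(marking)
    for p in inputs:
        nxt[p] -= 1
    for p in outputs:
        nxt[p] = nxt.get(p, 0) + 1
    return nxt

# Toy net: p1 --t1--> p2 --t2--> p3, with the initial marking m0.
m0 = {"p1": 1, "p2": 0, "p3": 0}
if enabled(m0, ["p1"]):
    m1 = fire(m0, ["p1"], ["p2"])
    print(m1)  # {'p1': 0, 'p2': 1, 'p3': 0}
```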
WoPeD Software
Workflow Petri Net Designer (WoPeD 2022; GitHub/WoPeD 2022) is an open-source software tool developed by Cooperative State University Karlsruhe under the GNU Lesser General Public License (LGPL) that provides modeling, simulation, and analysis of processes described by workflow nets. WoPeD is currently hosted on SourceForge (a web-based open-source development platform), and the current development progress can be found on the home page of the WoPeD project at SourceForge. The verification of the system design process is carried out with this tool: the Petri net model is used to analyze the design process and to ensure the feasibility and soundness of the system.
Related Works
For some people with mobility problems who are unable to take care of themselves and need the help of others, or for presenters who cannot use a mouse at close distance, C.R. Yu and F. Alemuda (Alemuda 2017; Yu 2021) proposed methods that use gesture recognition to control actions such as scrolling up and down and zooming slides in and out. However, the integrity and soundness of their systems have not yet been formally verified to ensure that the pre-development model of the system is feasible. A wearable glove for controlling home appliances based on IoT technology was proposed by W.-H. Shih (Shih 2019), but it was found to be inconvenient and costly. Shih's experimental results indicate that his study employed depth images to recognize the user's gestures, which is similar to the method proposed by X. Shen (Shen et al. 2022). However, the experiments revealed that the body-sensing devices were expensive, and their proposed methods were unable to control home appliances. Moreover, the precision of American Sign Language (ASL) recognition was compared. The method of gesture recognition for letters A-Z proposed by S. Padhy (Padhy 2021) was tested, but numbers were not yet recognized. Amazon has been a trendsetter through its Alexa-powered devices. Alexa is an intelligent personal assistant (IPA) that performs tasks such as playing music, providing news and information, and controlling smart home appliances. A relationship between Alexa and consumers with special needs has been established, as it helps them regain their independence and freedom (Ramadan, Farah, and El Essrawi 2020). Recent improvements in IoT technology are giving rise to an explosion of interconnected devices, empowering many smart applications. Promising future directions for deep learning (DL)-based IoT in smart city environments have been proposed; the overall idea is to utilize the few available resources more smartly by incorporating DL into IoT (Rajyalakshmi and Lakshmanna 2022).
Proposed Approach
In this section, hardware/software configurations, system structure, gesture
recognition, and hand gesture definitions are presented.
An ESP8266 microcontroller and a relay module are selected. The hardware configuration can send a high or low voltage signal to the D5 pin of the microcontroller through Wi-Fi and then use the base voltage of a transistor to control the relay. In the control mode, a high voltage signal at the base energizes the relay; conversely, a low voltage signal disconnects the relay, so that the home appliance can be switched off. In addition, the ESP8266 microcontroller is combined with an RGB LED light bar: the red line is connected to 5 V, the brown line to GND, and the white line to the D2 pin.
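As an illustration of this control path, the following is a minimal MicroPython sketch (ours, not the authors' firmware): the board joins the local Wi-Fi and drives the relay pin according to simple HTTP commands. The pin mapping (D5 = GPIO14 on NodeMCU-style ESP8266 boards), the endpoint paths, and the credentials are illustrative assumptions.

```python
# Minimal MicroPython sketch for the ESP8266 (illustrative, not the paper's code).
import network
import socket
from machine import Pin

relay = Pin(14, Pin.OUT)  # D5 maps to GPIO14 on NodeMCU-style boards (assumption)

# Join the local Wi-Fi network (placeholder credentials).
wlan = network.WLAN(network.STA_IF)
wlan.active(True)
wlan.connect("HOME_SSID", "PASSWORD")
while not wlan.isconnected():
    pass

server = socket.socket()
server.bind(("0.0.0.0", 80))
server.listen(1)

while True:
    conn, _ = server.accept()
    request = conn.recv(512).decode()
    # A high signal on D5 energizes the relay (appliance on);
    # a low signal de-energizes it (appliance off).
    if "GET /on" in request:
        relay.value(1)
    elif "GET /off" in request:
        relay.value(0)
    conn.send(b"HTTP/1.1 200 OK\r\n\r\n")
    conn.close()
```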
System Structure
Gesture Recognition
When the camera is turned on, the user makes a gesture so that the system can capture 21 key points of the hand. For example, the gesture of number 1 uses four key points, 0, 6, 7, and 8, on the finger and palm, from which two lines are formed, as shown in Figure 3. The bending angle of a finger is calculated with the following formulas (Reference, 2022). Let Xa = x2 − x1, Ya = y2 − y1, Xb = x4 − x3, and Yb = y4 − y3; these transform into the vectors L1 = <Xa, Ya> and L2 = <Xb, Yb>, whose inner product is L1 · L2 = Xa × Xb + Ya × Yb, as listed in Table 2. Based on Eq. (1) for the inner product of two vectors:
$$\cos A = \frac{L_1 \cdot L_2}{|L_1|\,|L_2|} \tag{1}$$

where A denotes the angle between the two vectors. Expanding the vectors in coordinates gives Eq. (2):

$$\cos A = \frac{X_a X_b + Y_a Y_b}{\sqrt{X_a^2 + Y_a^2}\,\sqrt{X_b^2 + Y_b^2}} \tag{2}$$

and the inverse trigonometric function is used to find the angle, as in Eq. (3):

$$A = \cos^{-1}\frac{X_a X_b + Y_a Y_b}{\sqrt{X_a^2 + Y_a^2}\,\sqrt{X_b^2 + Y_b^2}} \tag{3}$$
From the computed angles, the system determines the state of each finger in order (thumb, index finger, middle finger, ring finger, and little finger), as listed in Table 3. For the gesture of number 1, the index finger is straight and the other four fingers are bent; therefore, it is determined that the user is making the gesture of number 1. In this study, ten gestures are thus defined (a code sketch follows Table 3).
Table 3. Left-hand finger states for each number gesture (1 = straight, 0 = bent).
Number   (thumb, index finger, middle finger, ring finger, little finger)
1        (0, 1, 0, 0, 0)
2        (0, 1, 1, 0, 0)
3        (0, 1, 1, 1, 0)
4        (0, 1, 1, 1, 1)
5        (1, 1, 1, 1, 1)
6        (1, 0, 0, 0, 1)
7        (1, 1, 0, 0, 0)
8        (1, 1, 1, 0, 0)
9        (1, 1, 1, 1, 0)
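The following Python sketch (ours; the helper names and the clamping of cos A are our additions) illustrates Eqs. (1)-(3) and Table 3: it computes the bending angle from four (x, y) key-point coordinates, such as MediaPipe's 21 hand landmarks, applies the 40-degree bend threshold used later in the system design process, and maps the five finger states to a number gesture.

```python
import math

def bend_angle(p0, p1, p2, p3):
    """Angle A (degrees) between vectors p0->p1 and p2->p3, per Eqs. (1)-(3).
    Assumes non-degenerate (distinct) key points."""
    xa, ya = p1[0] - p0[0], p1[1] - p0[1]
    xb, yb = p3[0] - p2[0], p3[1] - p2[1]
    cos_a = (xa * xb + ya * yb) / (math.hypot(xa, ya) * math.hypot(xb, yb))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def finger_state(angle_deg, threshold=40.0):
    """1 = straight (angle below the threshold), 0 = bent (angle above it)."""
    return 1 if angle_deg < threshold else 0

# Left-hand finger-state patterns from Table 3.
GESTURES = {
    (0, 1, 0, 0, 0): 1, (0, 1, 1, 0, 0): 2, (0, 1, 1, 1, 0): 3,
    (0, 1, 1, 1, 1): 4, (1, 1, 1, 1, 1): 5, (1, 0, 0, 0, 1): 6,
    (1, 1, 0, 0, 0): 7, (1, 1, 1, 0, 0): 8, (1, 1, 1, 1, 0): 9,
}

def classify(states):
    """Map a 5-tuple of finger states to a number gesture, or None."""
    return GESTURES.get(tuple(states))
```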
The flow then goes to t15 (output 21 hand key point images) through p12 (take the intersection lines of four key points to calculate the angle). If the calculated finger angle is greater than 40 degrees, t16 fires and the flow goes through p13 (calculated finger angle > 40 degrees) to t18 (finger bent). On the contrary, t17 fires and the flow goes through p14 (calculated finger angle < 40 degrees) to t19 (finger straightened). It then passes through p15 (confirm finger angle) to fire t20 (display gesture recognition result) through p16 (compare gesture definition sources). If t22 (gesture has no corresponding command) fires, the flow goes back to p2; otherwise, t21 (gesture has the corresponding command) fires and the flow enters p17 (matching with the corresponding command). If t24 (no Wi-Fi connection) fires, the flow returns to p17; otherwise, t23 (Wi-Fi connection) fires and the flow goes through p18 (matching with Wi-Fi successfully) to t25 (sending the command to ESP8266). Then t26 (control electrical appliances) fires through p19 (compare gesture to command data). Through p20 (confirm the status of electrical appliances), t27 (end) fires, and the flow finally reaches p21 (the end of the system design process).
To verify the correctness of the system design process, including hardware and software components, the workflow diagram is loaded into this program. Furthermore, this study has used 21 places, as listed in Table 4, and 27 transitions, as listed in Table 5.
System Verification
Figure 5 depicts the net statistics and the structural analysis of the PN model,
which displays the total number of elements in the model and the soundness of
the system design process. Consequently, there are no conflicts or deficiencies
in the operational process, and the feasibility of the system is fully verified.
Experimental Results
Once the mobile device is connected to the ESP8266 microcontroller, it sends the command to the home appliances after completing hand gesture recognition, and the execution time is sent back to the mobile device. After testing, it takes 0.62 seconds from making a correct gesture to turning on the LED light. When the LED light is initially on and the finger makes the gesture of number 1, the command turns the LED light off, as shown in Figure 6. When the LED light is initially off and the finger makes the gesture of number 2, the command turns the LED light on, as shown in Figure 7. Therefore, this test shows that it is possible to use gesture recognition to control the home appliances.
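On the client side, dispatching a recognized gesture could look like the following Python sketch (ours; the board address and the endpoint paths are assumptions consistent with the firmware sketch given earlier):

```python
# Hypothetical client-side dispatch: after a gesture is classified,
# send the matching command to the ESP8266 over Wi-Fi.
import urllib.request

ESP8266_URL = "http://192.168.1.50"  # placeholder board address

# Per the experiment: gesture 1 turns the LED off, gesture 2 turns it on.
COMMANDS = {1: "/off", 2: "/on"}

def dispatch(gesture: int) -> None:
    path = COMMANDS.get(gesture)
    if path is not None:
        urllib.request.urlopen(ESP8266_URL + path, timeout=2)

dispatch(2)  # e.g., the gesture of number 2 switches the LED light on
```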
As shown in Figure 8, there are 15 controllable light beads on the RGB light bar, and gestures can be used to make the light beads display the desired brightness and color. When the finger makes the gesture of number 4, the command turns the RGB light bar red.
As shown in Figure 9, when the finger makes the gesture of number 5, a command is sent to turn the RGB light bar blue.
As shown in Figures 10-12, when the finger makes the gesture of number 6, the command makes the red RGB light bar change its degree of brightness in three steps, namely, normal, slightly dim, and slightly bright.
In this study, ten participants were asked to make number gestures that could be judged by the naked eye in bright light against a simple background. They performed the ten defined gestures at different angles, with the palms facing outward or inward, in a total of five movement patterns.
The recognition results varied slightly among participants. This might be due to differences in gesture habits and in the finger skeletal and muscular structures of each participant, resulting in differences in gesture movements. The precision value of each type of gesture in the MediaPipe model is listed in Table 6, and the recall value is listed in Table 7. The precision and recall are calculated with Eqs. (4) and (5), shown as follows:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{5}$$
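As a quick worked example of Eqs. (4) and (5) with illustrative counts (not the paper's measured data):

```python
# Hypothetical counts for one gesture class: 9 true positives,
# 1 false positive, 0 false negatives.
tp, fp, fn = 9, 1, 0
precision = tp / (tp + fp)  # 9 / 10 = 0.90
recall = tp / (tp + fn)     # 9 / 9  = 1.00
print(f"precision={precision:.2f}, recall={recall:.2f}")
```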
Functional Comparison
Table 8. Comparison of the precision with different methods for gestures in ASL.
Methods
Gestures Proposed (%) Shen, X. et al. (%) Chen, Y.-C. (%)
0 100 95.27 100
1 100 91.75 98
2 100 89.25 95
3 90 90 98
4 100 100 95
5 100 100 100
6 90 77 91
7 90 84.25 91
8 90 76.25 83
9 100 74 96
Average 96 86.94 94.4
Table 9. Comparison of this study with other methods.

Proposed
  Hardware: smartphone, ESP8266 microcontroller chip
  Software: MediaPipe Hand, Android Studio, Thonny 3.7.5
  Number of gestures: 10
  Controlled actions: (1) all home appliances switched on/off; (2) control of light color
  Distance: 2.5 m
  Success rate at 2.95 lux illumination: 32%
  Response time: 0.62 s
  Precision: 98.80%
  Recall: 97.67%

Yu, C.-R.
  Hardware: USB 3.0 bus; at least 4 GB RAM; webcam or smartphone
  Software: Windows 10 operating system, Python 3.6
  Number of gestures: 7
  Controlled actions: (1) PPT files zoom in and zoom out; (2) slides playing, next page, and close
  Distance: 1.9 m
  Success rate at 2.95 lux illumination: 22%
  Response time: 0.87 s
  Precision: 90.65%
  Recall: 89.11%

Shih, W.-H.
  Hardware: wearable smart gloves, Arduino YUN
  Software: Arduino operating system
  Number of gestures: 4
  Controlled actions: (1) power switch of light bulb, color and brightness setting; (2) power setting of air conditioner, temperature, and wind speed
  Distance: 2.0 m
  Success rate at 2.95 lux illumination: 25%
  Response time: 0.91 s
  Precision: 91.71%
  Recall: 90.16%

Padhy, S.
  Hardware: surface electromyography (sEMG), multilinear singular value decomposition (MLSVD)
  Software: tensor-based approach, dictionary learning (DL)
  Number of gestures: 10
  Applications: (1) upper limb motion classification; (2) biomedical engineering systems
  Distance: 0.0 m
  Success rate at 2.95 lux illumination: 28%
  Response time: 0.90 s
  Precision: 92.12%
  Recall: 91.23%

Chen, X., et al.
  Hardware: surface electromyography (sEMG)-based, CNN+LSTM (long short-term memory)
  Software: transfer learning (TL) strategy, CNN-based source network
  Number of gestures: 20
  Applications: (1) myoelectric control systems; (2) biomedical engineering systems
  Distance: 0.0 m
  Success rate at 2.95 lux illumination: 29%
  Response time: 0.89 s
  Precision: 93.32%
  Recall: 92.77%
Conclusion
A low-cost ESP8266 microcontroller chip is used to enable hand gesture recognition for smart control of home appliances. This study uses MediaPipe hand tracking to track the fingers extending from the palm and adds the vector angle formulas to calculate the finger angles. Ten hand gestures were defined. In the end, the built system can control the home appliances via hand gestures with promising precision and recognition speed. This study has made the following contributions:
(1) The system design framework is modeled and analyzed using the Petri net tool, WoPeD, to ensure its integrity and soundness. If the system has no errors, system production is accelerated.
(2) The system is easy and fast to operate, and its manufacturing cost is low; it takes only about 0.62 seconds to control home appliances.
(3) The vector formulas are used to determine the bending angle of fingers, effectively improving the recognition of numbers 0-9, with precision and recall values as high as 98.80% and 97.67%, respectively. The precision value in ASL reaches 96% when compared with other methods.
This system can help users operate home appliances in a comfortable and convenient way. For example, there is no need for users to get up to switch the home appliances on and off; all they need to do is connect a mobile device to the Wi-Fi equipment to switch the electric power on and off.
In addition to controlling home appliances, the results of this study are expected to be applied to medical or automotive-related products. It is also anticipated that the results of this study will inspire more researchers to delve into the development of gesture recognition systems and create more innovative ideas. In this way, the public may enjoy the convenience brought by hand gesture recognition in the future.
Acknowledgements
The authors are grateful to the anonymous reviewers for their constructive comments which
have improved the quality of this paper.
Disclosure Statement
No potential conflict of interest was reported by the author(s).
Funding
This work was supported by the Ministry of Science and Technology, Taiwan, under grants
MOST 110-2637-E-131-005-, MOST 110-2221-E-845-002- and MOST 111-2221-E-845-003-.
References
Relay. [Online] Available: https://tutorials.webduino.io/zh-tw/docs/basic/component/relay.
html. May 2022a.
Alemuda, F. 2017. Gesture-based control in a smart home environment, Master Thesis, International Graduate Program of Electrical Engineering and Computer Science, National Chiao Tung University.
Chen, X., Y. Li, R. Hu, X. Zhang, and X. Chen. 2021. Hand gesture recognition based on surface
electromyography using convolutional neural network with transfer learning method. IEEE
Journal of Biomedical and Health Informatics 25 (4):1292–304. doi:10.1109/JBHI.2020.
3009383.
Gan, L., Y. Liu, Y. Li, R. Zhang, L. Huang, and C. Shi. 2022. Gesture recognition system using
24 GHz FMCW radar sensor realized on real-time edge computing platform. IEEE Sensors
Journal 22 (9):8904–14. doi:10.1109/JSEN.2022.3163449.
Get to know android studio. [Online] Available: https://developer.android.com/studio/intro?
hl=zh-tw. May 2022b.
GitHub android studio. [Online] Available: https://github.com/android. May 2022c.
GitHub Google/mediapipe. [Online] Available: https://github.com/google/mediapipe. May
2022d.
GitHub tfreytag/WoPeD. [Online] Available: https://github.com/tfreytag/WoPeD. May 2022.
GitHub Thonny. [Online] Available: https://github.com/thonny/thonny. May 2022e.
Gogineni, K., A. Chitreddy, A. Vattikuti, and N. Palaniappan. 2020. Gesture and speech
recognizing helper bot. Applied Artificial Intelligence 34 (7):585–95. doi:10.1080/08839514.
2020.1740473.
Hamroun, A., K. Labadi, M. Lazri, S. B. Sanap, V. K. Bhojwani, and M. V K. 2020. Modelling
and performance analysis of electric car-sharing systems using Petri nets. E3S Web of
Conferences 170 (3001):1–6. doi:10.1051/e3sconf/202017003001.
Kloetzer, M., and C. Mahulea. 2020. Path planning for robotic teams based on LTL specifica
tions and petri net models. Discrete Event Dynamic Systems 30 (1):55–79. doi:10.1007/
s10626-019-00300-1.
Lee, F. N. 2019. A real-time gesture recognition system based on image processing, Master
Thesis, Department of Communication Engineering, National Taipei University.
Lin, C. H. 2018. ESP8266-based IoTtalk device application: Implementation and performance
evaluation, Master Thesis, Graduate Institute of Network Engineering, National Chiao Tung
University.
Ling, Y., X. Chen, Y. Ruan, X. Zhang, and X. Chen. 2021. Comparative study of gesture
recognition based on accelerometer and photoplethysmography sensor for gesture interac
tions in wearable devices. IEEE Sensors Journal 21 (15):17107–17. doi:10.1109/JSEN.2021.
3081714.
Lugaresi, C., J. Tang, H. Nash, C. McClanahan, E. Uboweja, M. Hays, F. Zhang, C. L. Chang, M. G. Yong, J. Lee, et al. 2019. MediaPipe: A framework for building perception pipelines. arXiv preprint arXiv:1906.08172. doi:10.48550/arXiv.1906.08172.
Lu, X., S. Sun, K. Liu, J. Sun, and L. Xu. 2022. Development of a wearable gesture recognition
system based on two-terminal electrical impedance tomography. IEEE Journal of Biomedical
and Health Informatics 26 (6):2515–23. doi:10.1109/JBHI.2021.3130374.
OkHttp Internet Connection. [Online] Available: https://ithelp.ithome.com.tw/articles/
10188600. May 2022.
OkHttp : Use Report for Android Third-party. [Online] Available: https://bng86.gitbooks.io/
android-third-party-/content/okhttp.html. May 2022.
Padhy, S. 2021. A tensor-based approach using multilinear SVD for hand gesture recognition
from sEMG signals. IEEE Sensors Journal 21 (5):6634–42. doi:10.1109/JSEN.2020.3042540.
Rajyalakshmi, V., and K. Lakshmanna. 2022. A review on smart city - IoT and deep learning
algorithms, challenges. International Journal of Engineering Systems Modelling and
Simulation 13 (1):3–26. doi:10.1504/IJESMS.2022.122733.
Ramadan, Z., M. F. Farah, and L. El Essrawi. 2020. Amazon.Love: How Alexa is redefining
companionship and interdependence for people with special needs. Psychology & Marketing
10 (1):1–12.
Reference program for static gesture-image 2D method GitCode. [Online] Available: https://
gitcode.net/EricLee/handpose_x/-/issues/3?from_codechina=yes. May 2022f.
Riedel, A., N. Brehm, and T. Pfeifroth. 2021. Hand gesture recognition of methods-time
measurement-1 motions in manual assembly tasks using graph convolutional networks.
Applied Artificial Intelligence 36 (1):1–12. doi:10.1080/08839514.2021.2014191.
Sharma, V., M. Gupta, A. K. Pandey, D. Mishra, and A. Kumar. 2022. A review of deep
learning-based human activity recognition on benchmark video datasets. Applied Artificial
Intelligence 36 (1):1–11. doi:10.1080/08839514.2022.2093705.
Shen, X., H. Zheng, X. Feng, and J. Hu. 2022. ML-HGR-Net: A meta-learning network for
fmcw radar based hand gesture recognition. IEEE Sensors Journal 22 (11):10808–17. doi:10.
1109/JSEN.2022.3169231.
Shih, W. H. 2019. Applying IoT and gesture control technology to build a friendly smart home
environment, Master Thesis, Department of Computer Science & Information Engineering,
Chung Hua University.
Thonny, Python IDE for beginners. [Online] Available: https://thonny.org/. May 2022.