Published by: International Journal of Engineering Research & Technology (IJERT)
http://www.ijert.org  ISSN: 2278-0181
Vol. 12 Issue 04, April-2023

Gesture Controlled Virtual Mouse with Voice Automation

Prithvi J, S Shree Lakshmi, Suraj Nair and Sohan R Kumar
Department of Computer Science and Engineering
B.M.S. College of Engineering, Bengaluru, Karnataka, India

Ms. Sunayana S
Department of Computer Science and Engineering
Visvesvaraya Technological University, Belgaum
Bengaluru, Karnataka, India

Abstract— This research paper proposes a Gesture Controlled Virtual Mouse system that enables human-computer interaction using hand gestures and voice commands. The system requires no direct contact with the computer and allows for virtual control of all input/output operations. The system employs state-of-the-art Machine Learning and Computer Vision algorithms to recognize static and dynamic hand gestures and voice commands, without the need for additional hardware. The system comprises two modules, one that works directly on hands using MediaPipe Hand detection and another that uses gloves of any uniform color. The system leverages models such as Convolutional Neural Networks implemented by MediaPipe running on top of pybind11. The paper discusses the system's architecture, algorithmic approach to gesture recognition, and implementation of both modules in detail. The proposed system presents a natural and user-friendly alternative to traditional input methods and can have potential applications in healthcare and education. The paper's findings will be of interest to researchers and practitioners in the field of Human-Computer Interaction.

Index Terms— Gesture Control, Virtual Mouse, Human-Computer Interaction, Hand Gestures, Voice Commands, Machine Learning, Computer Vision, MediaPipe, Convolutional Neural Networks, Pybind11, Healthcare, Education.

I. INTRODUCTION

The field of Human-Computer Interaction has seen significant advancements with the introduction of innovative technologies. Traditional input methods such as keyboards, mice, and touchscreens have become more sophisticated, but they still require direct contact with the computer, limiting the scope of interaction. Gesture-based interaction has emerged as an alternative approach, and the Gesture Controlled Virtual Mouse is an innovative technology that enables intuitive interaction between humans and computers. This research paper presents a comprehensive study of the Gesture Controlled Virtual Mouse, which leverages state-of-the-art Machine Learning and Computer Vision algorithms to enable users to control input/output operations using hand gestures and voice commands without the need for direct contact.

The Gesture Controlled Virtual Mouse is designed using the latest technology and is capable of recognizing both static and dynamic hand gestures in addition to voice commands, making the interaction more natural and user-friendly. The system does not require any additional hardware, and its implementation is based on models such as the Convolutional Neural Network (CNN) implemented by MediaPipe running on top of pybind11. The system comprises two modules, one of which operates directly on hands using MediaPipe hand detection, while the other uses gloves of any uniform color. The system currently supports the Windows platform.

This paper presents a detailed analysis of the Gesture Controlled Virtual Mouse, covering the system's architecture, the algorithmic approach to gesture recognition, and the implementation of both modules. The paper also discusses the advantages of the Gesture Controlled Virtual Mouse over traditional input methods, such as the increased naturalness and user-friendliness of the interaction. The findings presented here will contribute to the growing field of Human-Computer Interaction and will be useful for researchers, developers, and anyone interested in the latest advances in gesture-based interaction technology.

II. PROBLEM STATEMENT

With the emergence of ubiquitous computing, traditional methods of user interaction involving the keyboard, mouse, and pen are no longer adequate. The limitations of these devices restrict the range of instructions that can be executed. Direct use of hand gestures and voice commands has the potential to serve as an input method for more natural and intuitive interaction, enabling users to perform everyday tasks with ease. Such methods can offer a more extensive instruction set and eliminate the need for direct physical contact with the computer, further enhancing the user's experience.

III. LITERATURE SURVEY

A. Background

Gesture-based mouse control using computer vision has been a topic of interest for researchers for a long time. Various methods have been proposed for gesture recognition; the system surveyed here proposes a method based on color detection and masking. It is implemented in the Python programming language using OpenCV, a popular computer vision library. The proposed system is a virtual mouse that works purely on webcam-captured frames by tracking colored fingertips.
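To make the color-detection-and-masking idea concrete, the sketch below isolates one colored fingertip cap in live webcam frames and draws its tracking bounding box. It is a minimal sketch, not the surveyed system's code: the HSV bounds assume a red cap and would need tuning for other colors and lighting conditions.

import cv2
import numpy as np

cap = cv2.VideoCapture(0)                      # default webcam
lower = np.array([0, 120, 70])                 # assumed HSV lower bound (red cap)
upper = np.array([10, 255, 255])               # assumed HSV upper bound

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower, upper)      # binary mask of the cap color
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        largest = max(contours, key=cv2.contourArea)   # assume largest blob is the cap
        x, y, w, h = cv2.boundingRect(largest)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("color cap tracking", frame)
    if cv2.waitKey(1) & 0xFF == 27:            # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()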


The objective of this work is to develop and implement an alternative system for controlling the mouse cursor. The alternative method is hand gesture recognition using a webcam and a color detection method. The ultimate outcome is a system that recognizes hand gestures and controls the mouse cursor of any computer using color detection.

The system works on the frames captured by the webcam of the computer or the built-in camera of a laptop. By creating a video capture object, the system captures video from the webcam in real time. The camera should be positioned so that it can see the user's hands in the right positions.

B. Literature Survey

In the system proposed in Kabid Hassan Shibly's "Design and Development of Hand Gesture Based Virtual Mouse", published at ICASERT (2019), color detection is done by detecting the color pixels of fingertip caps in the frames captured by the webcam. This is the initial and fundamental step of that system. Its outcome is a grayscale image in which the pixel intensity of the color cap differs from the rest of the frame, so the color-cap area is highlighted. Rectangular bounding boxes (masks) are then created around each color cap, and the caps are tracked; gestures are detected from this tracking.

First, the center of each of the two detected color objects is calculated from the coordinates of the center of its detected rectangle. The built-in OpenCV function is used to draw a line between the two centers, and the midpoint is obtained from the midpoint formula. This midpoint is the tracker for the mouse pointer, and the mouse pointer follows it. Coordinates in the resolution of the camera-captured frames are converted to the screen resolution. A predefined location for the mouse is set, so that when the mouse pointer reaches that position the mouse starts to work; this may be called an open gesture. This allows the user to control the mouse pointer.
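The midpoint tracking and coordinate conversion just described reduce to a few lines. The sketch below is a minimal illustration, not the surveyed system's code: the capture resolution (640 x 480) is assumed, and pyautogui stands in for whatever cursor-injection mechanism the original system used.

import numpy as np
import pyautogui

SCREEN_W, SCREEN_H = pyautogui.size()
CAM_W, CAM_H = 640, 480            # assumed webcam frame resolution

def track_midpoint(c1, c2):
    # c1, c2: (x, y) centers of the two color-cap bounding rectangles.
    mx = (c1[0] + c2[0]) / 2       # midpoint formula
    my = (c1[1] + c2[1]) / 2
    # Convert from camera-frame coordinates to screen coordinates.
    sx = np.interp(mx, (0, CAM_W), (0, SCREEN_W))
    sy = np.interp(my, (0, CAM_H), (0, SCREEN_H))
    pyautogui.moveTo(sx, sy)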
The previous system uses close gestures for clicking events. When one rectangular bounding box comes close to another, a new bounding box is created from the edges of the two tracked boxes. When this newly created bounding box shrinks to 20 percent of its initial size, the system performs a left-button click. By holding this position for more than 5 seconds, the user can perform a double-click. For the right-button click, the open gesture is used again: a single finger is enough, and when the system detects one fingertip color cap it performs a right-button click.

To scroll with this system, the user makes the open-gesture movement with three color-capped fingers. If the user moves the three fingers together downwards, the system scrolls down; similarly, if their position changes upwards, it scrolls up. When the three fingers move up or down, the color caps take new positions and coordinates. Once all three caps have new coordinates, the system scrolls: if their y-coordinate values decrease it scrolls down, and if the values increase it scrolls up.
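The three-finger scroll rule above amounts to comparing cap coordinates between frames. The following minimal sketch makes stated assumptions: the 5-pixel threshold and scroll step are invented for illustration, pyautogui.scroll stands in for the actual scroll injection, and the direction mapping follows the paper's stated rule, which depends on the coordinate convention in use (in standard image coordinates, y grows downward).

import pyautogui

prev_y = None

def update_scroll(cap_centres):
    # cap_centres: list of three (x, y) centers of the color-cap boxes.
    global prev_y
    y = sum(c[1] for c in cap_centres) / 3.0   # mean y of the three caps
    if prev_y is not None:
        if prev_y - y > 5:                     # y values decreased
            pyautogui.scroll(-120)             # scroll down, per the rule above
        elif y - prev_y > 5:                   # y values increased
            pyautogui.scroll(120)              # scroll up
    prev_y = y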
In summary, the surveyed system demonstrates a method for gesture-based mouse control using computer vision, relying on color detection and masking to recognize hand gestures and drive the mouse cursor.

IV. PROPOSED SYSTEM

A. Overview

The proposed Gesture Controlled Virtual Mouse system also includes a third module that leverages voice automation for wireless mouse assistance. This module allows users to perform mouse operations such as clicking, scrolling, and dragging simply by giving voice commands. This feature is especially helpful for users who are unable to use hand gestures due to physical limitations.

The voice automation module is implemented using state-of-the-art speech recognition algorithms that enable the system to accurately recognize the user's voice commands. The module is designed to work seamlessly with the other two modules of the system, allowing users to switch between hand gestures and voice commands effortlessly.

This module also adds a layer of convenience by allowing users to perform mouse operations from a distance, without any direct contact with the computer. This makes it a useful tool for presentations, demonstrations, and other scenarios where the user needs to interact with the computer without being physically close to it.

Overall, the Gesture Controlled Virtual Mouse system is an innovative and user-friendly solution that simplifies human-computer interaction. With its advanced machine learning and computer vision algorithms, it offers a reliable and efficient way for users to control their computers using hand gestures, voice commands, or a combination of both.
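The paper does not name the speech engine behind this module. As one plausible realization, the sketch below uses the speech_recognition package with the free Google Web Speech backend to capture a single voice command; listen_for_command is a hypothetical helper name, reused in the dispatch sketch in Section V.

import speech_recognition as sr

def listen_for_command() -> str:
    # Capture one utterance from the default microphone and transcribe it.
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)   # calibrate to room noise
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio).lower()
    except sr.UnknownValueError:
        return ""                                     # utterance not understood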
B. Convolutional Neural Networks (MediaPipe running on top of pybind11)

The convolutional neural network (CNN) implemented by MediaPipe is based on deep learning algorithms that use a series of convolutional layers to extract features from images. The basic algorithm for CNNs can be summarized as follows:

1. Input layer: Accepts the input image and performs preprocessing such as normalization.
2. Convolution layer: Applies a convolution operation to the input image using multiple filters to extract relevant features. The output of this layer is called a feature map.
3. Activation function: Introduces non-linearity into the feature maps.
4. Pooling layer: Reduces the spatial dimensions of the feature maps to lower the computational complexity.
5. Repeat steps 2-4 for multiple layers.
6. Flatten layer: Converts the feature maps into a vector to feed them into the fully connected layer.

7. Fully connected layer: Performs the classification task by applying weights and biases to the input vector.
8. Output layer: Produces the final output.

Here is a pseudocode implementation of a simple CNN convolution step:

Algorithm 1: Convolutional Neural Network Algorithm
Input: input image I
Output: output feature map O
1:  Initialize: set stride S and filter size K; compute the output size O_s = (I_s - K)/S + 1
2:  for each filter F_i do
3:    for each output channel c do
4:      for each pixel p in O_c do
5:        compute the starting pixel p_s = p * S and the ending pixel p_e = p_s + K
6:        extract the K x K region R of I_c starting at p_s
7:        element-wise multiply R with F_i and sum all elements of the result
8:        assign the sum to the corresponding pixel of O_c
9:      end for
10:   end for
11: end for
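As a concrete companion to Algorithm 1, the NumPy sketch below implements the same valid convolution for a single-channel square image and one filter; the example image and averaging filter are illustrative only.

import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    K = kernel.shape[0]                        # filter size (K x K)
    out = (image.shape[0] - K) // stride + 1   # output size O_s = (I_s - K)/S + 1
    fmap = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            ps_r, ps_c = i * stride, j * stride            # starting pixel
            region = image[ps_r:ps_r + K, ps_c:ps_c + K]   # K x K region
            fmap[i, j] = np.sum(region * kernel)           # multiply and sum
    return fmap

# Example: 5x5 image, 3x3 averaging filter, stride 1 -> 3x3 feature map.
img = np.arange(25, dtype=float).reshape(5, 5)
print(conv2d(img, np.ones((3, 3)) / 9.0))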
V. WORK DONE AND RESULTS ANALYSIS

A. Gesture-Controlled Mouse

1) Neutral Gesture: Used to halt/stop execution of the current gesture.
2) Move Cursor: The cursor is assigned to the midpoint of the index and middle fingertips. This gesture moves the cursor to the desired location; the speed of the cursor movement is proportional to the speed of the hand (see the sketch after this list).
3) Left Click: Gesture for a single left click.
4) Right Click: Gesture for a single right click.
5) Double Click: Gesture for a double click.
6) Scrolling: Dynamic gestures for horizontal and vertical scroll. The scroll speed is proportional to the distance moved by the pinch gesture from its start point; vertical and horizontal scrolls are controlled by vertical and horizontal pinch movements respectively.
7) Drag and Drop: Gesture for drag-and-drop functionality. Can be used to move/transfer files from one directory to another.
8) Multiple Item Selection: Gesture to select multiple items.
9) Volume Control: Dynamic gestures for volume control. The rate of increase/decrease of the volume is proportional to the distance moved by the pinch gesture from its start point.
10) Brightness Control: Dynamic gestures for brightness control. The rate of increase/decrease of the brightness is proportional to the distance moved by the pinch gesture from its start point.
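The sketch below illustrates the Move Cursor gesture with MediaPipe's hand-landmark model: the cursor follows the midpoint of the index (landmark 8) and middle (landmark 12) fingertips. It is a minimal sketch rather than the system's implementation; pyautogui is an assumed choice for cursor injection, and no click or scroll logic is included.

import cv2
import mediapipe as mp
import numpy as np
import pyautogui

pyautogui.FAILSAFE = False          # demo only: reaching a corner would otherwise abort
screen_w, screen_h = pyautogui.size()
hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)      # mirror so the cursor follows the hand naturally
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        # Midpoint of index (8) and middle (12) fingertips, normalized to [0, 1].
        mx = (lm[8].x + lm[12].x) / 2
        my = (lm[8].y + lm[12].y) / 2
        pyautogui.moveTo(np.interp(mx, (0, 1), (0, screen_w)),
                         np.interp(my, (0, 1), (0, screen_h)))
    cv2.imshow("gesture mouse", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()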
B. Voice Automated Mouse

1) Launch / Stop Gesture Recognition: "Echo launch gesture recognition" turns on the webcam for hand gesture recognition; "Echo stop gesture recognition" turns off the webcam and stops gesture recognition. (The gesture controller can also be terminated by pressing the Enter key in the webcam window.)
2) Google Search: "Echo search (text you wish to search)" opens a new tab in the Chrome browser if it is running, otherwise opens a new window, and searches for the given text on Google.
3) Find a Location on Google Maps: "Echo find a location" asks the user for the location to be searched, then finds the required location on Google Maps in a new Chrome tab.
4) File Navigation: "Echo list files" / "Echo list" lists the files and their file numbers in the current directory (by default C:). "Echo open (file number)" opens the file or directory corresponding to the specified file number. "Echo go back" / "Echo back" changes the current directory to the parent directory and lists its files.
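A command set like this (the remaining commands continue below Fig. 1) can be routed with a small dispatcher. The sketch below is illustrative only: it wires a few of this section's phrases to simplified stand-in actions, reuses the hypothetical listen_for_command helper from Section IV, and uses the current working directory instead of the system's C: default.

import datetime
import os
import webbrowser

def dispatch(command: str) -> None:
    # Route one transcribed utterance to an action (simplified stand-ins).
    if command.startswith("echo search "):
        query = command[len("echo search "):]
        webbrowser.open("https://www.google.com/search?q=" + query)
    elif command in ("echo list", "echo list files"):
        for number, name in enumerate(os.listdir(os.getcwd())):
            print(number, name)                 # file numbers for "echo open"
    elif command in ("echo date", "echo what is today's date"):
        print(datetime.date.today())
    elif command in ("echo time", "echo what is the time"):
        print(datetime.datetime.now().strftime("%H:%M"))

# Example usage: dispatch(listen_for_command())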

Fig. 1. Virtual Mouse


Fig. 2. Voice Assistant - ECHO

5) Current Date and Time: "Echo what is today's date" / "Echo date" returns the current date; "Echo what is the time" / "Echo time" returns the current time.
6) Copy and Paste: "Echo copy" copies the selected text to the clipboard; "Echo paste" pastes the copied text.
7) Sleep / Wake up Echo: "Echo sleep" / "Echo bye" pauses voice command execution until the assistant is woken up; "Echo wake up" resumes voice command execution.
8) Exit: "Echo exit" terminates the voice assistant thread. The GUI window needs to be closed manually.

VI. CONCLUSIONS

In conclusion, the Gesture Controlled Virtual Mouse is an innovative system that revolutionizes the way humans interact with computers. The use of hand gestures and voice commands provides a new level of convenience and ease to users, allowing them to control all I/O operations without any direct contact with the computer. The system utilizes state-of-the-art Machine Learning and Computer Vision algorithms, such as the CNN implemented by MediaPipe running on top of pybind11, to recognize hand gestures and voice commands accurately and efficiently. The two modules - one for direct hand detection and the other for gloves of any uniform color - cater to different user preferences and provide flexibility in usage. Additionally, the system incorporates a voice automation feature that serves various tasks with great efficiency, accuracy, and ease. With the current implementation on the Windows platform, the Gesture Controlled Virtual Mouse presents an exciting prospect for the future of human-computer interaction. It is expected to increase productivity and convenience for users and could potentially have numerous practical applications in industries such as healthcare, gaming, and manufacturing.

ACKNOWLEDGMENT

We would like to thank Ms. Sunayana S for her valuable comments and suggestions to improve the quality of the paper and for helping us review our work regularly. We would also like to thank the Department of Computer Science and Engineering, B.M.S. College of Engineering, for providing us with the opportunity and encouragement to write this paper.

REFERENCES

[1] Tsang, W.-W. M., and Kong-Pang Pun (2005). A finger-tracking virtual mouse realized in an embedded system. 2005 International Symposium on Intelligent Signal Processing and Communication Systems. doi:10.1109/ispacs.2005.1595526.
[2] Tsai, T.-H., Huang, C.-C., and Zhang, K.-L. (2015). Embedded virtual mouse system by using hand gesture recognition. 2015 IEEE International Conference on Consumer Electronics - Taiwan. doi:10.1109/icce-tw.2015.7216939.
[3] Roh, M.-C., Huh, S.-J., and Lee, S.-W. (2009). A virtual mouse interface based on two-layered Bayesian network. 2009 Workshop on Applications of Computer Vision (WACV). doi:10.1109/wacv.2009.5403082.
[4] Li Wensheng, Deng Chunjian, and Lv Yi (2010). Implementation of virtual mouse based on machine vision. 2010 International Conference on Apperceiving Computing and Intelligence Analysis Proceeding. doi:10.1109/icacia.2010.5709921.
[5] Choi, O., Son, Y.-J., Lim, H., and Ahn, S. C. (2018). Co-recognition of multiple fingertips for tabletop human-projector interaction. IEEE Transactions on Multimedia. doi:10.1109/tmm.2018.2880608.
[6] Jyothilakshmi, P., Rekha, K. R., and Nataraj, K. R. (2015). A framework for human-machine interaction using depth map and compactness. 2015 International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT). doi:10.1109/erect.2015.7499060.
[7] Vasanthagokul, S., Vijaya Guru Kamakshi, K., Mudbhari, G., and Chithrakumar, T. (2022). Virtual mouse to enhance user experience and increase accessibility. 2022 4th International Conference on Inventive Research in Computing Applications (ICIRCA), pp. 1266-1271. doi:10.1109/ICIRCA54612.2022.9985625.
[8] Shajideen, S. M. S., and Preetha, V. H. (2018). Hand gestures - virtual mouse for human computer interaction. 2018 International Conference on Smart Systems and Inventive Technology (ICSSIT). doi:10.1109/icssit.2018.8748401.
[9] Henzen, A., and Nohama, P. (2016). Adaptable virtual keyboard and mouse for people with special needs. 2016 Future Technologies Conference (FTC). doi:10.1109/ftc.2016.7821782.
[10] Reddy, V. V., Dhyanchand, T., Krishna, G. V., and Maheshwaram, S. (2020). Virtual mouse control using colored finger tips and hand gesture recognition. 2020 IEEE-HYDCON. doi:10.1109/hydcon48903.2020.9242677.
[11] Shetty, M., Daniel, C. A., Bhatkar, M. K., and Lopes, O. P. (2020). Virtual mouse using object tracking. 2020 5th International Conference on Communication and Electronics Systems (ICCES). doi:10.1109/icces48766.2020.9137854.
[12] Xu, G., Wang, Y., and Feng, X. (2009). A robust low cost virtual mouse based on face tracking. 2009 Chinese Conference on Pattern Recognition. doi:10.1109/ccpr.2009.5344072.

