AI Virtual Mouse and Keyboard Using Python and OpenCV to Abate the Spread of COVID-19
BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
Submitted by
G. Sirisha - 318126512077
M. Chandra Sekhar - 318126512088
U. Mahesh - 318126512110
A. Trivedh - 318126512061
(2021-2022)
ACKNOWLEDGEMENT
We would like to express our deep gratitude to our project guide Mrs. Ch. Padma Sree, M.Tech, (Ph.D), Department of Electronics and Communication Engineering, ANITS, for her guidance with unsurpassed knowledge and immense encouragement.
We are grateful to Dr. V. Rajyalakshmi, Head of the Department, Electronics and
Communication Engineering, for providing us with the required facilities for the
completion of the project work.
We express our thanks to all the teaching faculty of the Department of ECE, whose suggestions during reviews helped us in the accomplishment of our project. We would also like to thank the non-teaching staff of the Department of ECE, ANITS, for their great assistance in the accomplishment of our project.
We would like to thank our parents, friends, and classmates for their encouragement throughout our project period. Last but not least, we thank everyone who supported us directly or indirectly in completing this project successfully.
PROJECT STUDENTS
G. Sirisha (318126512077)
U. Mahesh (318126512110)
M. Chandra Sekhar (318126512088)
A. Trivedh (318126512061)
ABSTRACT
CONTENTS
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1 INTRODUCTION
CHAPTER 2 LITERATURE REVIEW
CHAPTER 3 THEORETICAL ASPECTS
CHAPTER 4 SYSTEM DESIGN
4.1 Introduction
4.2 Algorithm Illustration
4.2.1 Landmarking algorithm
4.2.2 MediaPipe
4.2.3 OpenCV
4.3 System Requirements
4.3.1 Software Requirements
CHAPTER 5 RESULTS AND DISCUSSIONS
CHAPTER 6 CONCLUSION
FUTURE SCOPE
REFERENCES
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
AI Artificial Intelligence
ML Machine Learning
UI User Interface
CV Computer Vision
CT Computed Tomography
OS Operating System
RGB Red-Green-Blue
CHAPTER 1
INTRODUCTION
In general, devices are becoming compact, taking the form of Bluetooth or wireless technologies. This project proposes an AI virtual mouse system that makes use of hand gestures and hand-tip detection to perform mouse functions on the computer using computer vision.
Project Objective
The objective of this project is to provide an alternative to the routine physical mouse so that there is less physical contact with the device. All mouse operations and a few keyboard operations can be performed simply by showing different hand gestures to the webcam.
Project Outline
The outline of the project is as follows. In today's world there is a lot of development happening in the field of technology, and much of it is combined with the technique called Artificial Intelligence. This project is also based on a small part of AI. It presents finger-movement gesture detection on the computer's window using a camera, so the whole system can be handled by moving a single finger. Using finger-detection methods for instant camera access, together with a user-friendly interface, makes the system easily accessible. It reduces the use of a physical mouse, which saves time and effort. The AI virtual mouse and keyboard is developed using Python and OpenCV, a computer vision library. The proposed model utilizes the MediaPipe package for recognizing the hands and the tips of the fingers, as well as the PyAutoGUI and AutoPy packages for controlling the system by performing mouse operations such as right click, left click, scroll up, and scroll down, and keyboard operations such as escape, volume up, and volume down. The outcome of this model demonstrates high accuracy, and it can function extremely well in real-time applications using only a CPU and no GPU. This system can also help in controlling robots.
CHAPTER 2
LITERATURE REVIEW
There are traditional approaches to virtual keyboard and mouse systems which are usually based on hand gestures, while a few use deep learning and a few use other algorithms. Our literature review focuses on previously published research on virtual keyboards and virtual mice.
Shindhe et al. developed a method for mouse-free cursor control in which mouse cursor operations are controlled by hand fingers. They collected hand gestures via webcam using colour-detection principles. This approach used the built-in functions of the image processing toolbox in MATLAB and a mouse driver written in Java. The pointer was not very efficient in the air, as the cursor was very sensitive to motion.
In 2011, S. Hernanto et al. built a method for a virtual keyboard using a webcam. In this approach, two functions are used for finger detection and location. The system used two different webcams to detect skin and location separately. The average time per character of this virtual keyboard is 2.92 milliseconds and its average accuracy is 88.61%.
In 2016, Hubert Cecotti developed a system for disabled people: a multimodal gaze-controlled virtual keyboard. The virtual keyboard has 8 main commands for menu selection to spell 30 different characters, plus a delete button to recover from errors. The performance of the system was evaluated using speed and information transfer rate at both the command and application levels.
CHAPTER 3
THEORETICAL ASPECTS
Human-Computer Interaction Technology
HCI practitioners find the optimal combination that fits the purpose of the product. For example, for a mobile app this might be a combination of a visual UI and an auditory UI. The mouse and keyboard are among the notable developments of HCI.
Applications of HCI
For example, sensory perception and interactive input devices include speech recognition, keyboards, and touch-sensitive screens; output devices include printers and visual displays; wireless devices include applications of the wireless internet; and there are also virtual reality devices.
A few devices related to HCI are:
a. Mouse
A computer mouse is a handheld hardware input device that controls a cursor in a GUI for pointing, moving, and selecting text, icons, files, and folders on your computer. In addition to these functions, a mouse can also be used to drag and drop objects and to access the right-click menu. For desktop computers, the mouse is placed on a flat surface (e.g., a mouse pad or desk) in front of the computer. The full form of mouse is sometimes given as Manually Operated User Selection Equipment or Mechanically Operated User Signal Engine.
Types of mouse:
Wired Mouse
Bluetooth Mouse
Trackball Mouse
Optical Mouse
Laser Mouse
Magic Mouse
USB Mouse
Vertical Mouse
Functions of mouse:
Point.
Select.
Hover.
Scroll.
Drag-and-drop.
b. Keyboard
Qwerty Keyboards
Wired Keyboards
Numeric Keypads
Ergonomic Keyboards
Wireless Keyboards
USB Keyboards
Bluetooth Keyboards
Computer Vision
One of the most powerful and compelling types of AI is computer vision, which you have almost surely experienced in any number of ways without even knowing it. Computer vision is the field of computer science that focuses on replicating parts of the complexity of the human vision system, enabling computers to identify and process objects in images and videos in the way humans do, albeit so far only in a limited capacity. Thanks to advances in artificial intelligence and innovations in deep learning and neural networks, the field has taken great leaps in recent years and has surpassed humans in some tasks related to detecting and labeling objects.
One of the driving factors behind the growth of computer vision is the amount of
data we generate today that is then used to train and make computer vision
better.
Before the advent of deep learning, the tasks that computer vision could perform
were very limited and required a lot of manual coding and effort by developers
and human operators. For instance, if you wanted to perform facial recognition,
you would have to perform the following steps:
Create a database: You had to capture individual images of all the subjects you
wanted to track in a specific format.
Annotate images: Then, for every individual image, you would have to enter several key data points, such as the distance between the eyes, the width of the nose bridge, the distance between upper lip and nose, and dozens of other measurements that define each person's unique characteristics.
Capture new images: Next, you would have to capture new images, whether
from photographs or video content. And then you had to go through the
measurement process again, marking the key points on the image.
After all this manual work, the application would finally be able to compare the
measurements in the new image with the ones stored in its database and tell you
whether it corresponded with any of the profiles it was tracking.
Machine learning provided a different approach to solving computer vision
problems.
Image Processing
Image processing is a method to convert an image into digital form and perform some operations on it, in order to get an enhanced image or to extract some useful information from it. It is a form of signal processing in which the input is an image, such as a photograph or video frame, and the output may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible. The acquisition of images (producing the input image in the first place) is referred to as imaging.
An image may be considered to contain sub-images, sometimes referred to as regions of interest (ROIs) or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In this project, image processing is used to isolate the region of interest, namely the user's hand, in each video frame.
Some inputs, such as satellite photographs, are not even intended for human eyes. Output is the last stage, in which the result can be an altered image or a report based on image analysis.
Basic terms in Image Processing
A pixel is the smallest element of an image. The value of a pixel at any point corresponds to the intensity of the light photons striking that point; each pixel stores a value proportional to the light intensity at that particular location.
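As a brief illustrative sketch of this idea (the file name here is a hypothetical example), pixel values can be inspected directly with OpenCV, which stores colour pixels in BGR order:

import cv2

# Load an image and inspect one pixel (file name is illustrative)
img = cv2.imread('frame.png')
b, g, r = img[100, 50]  # pixel at row 100, column 50
print(b, g, r)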
Resolution
The term resolution refers to the total number of pixels in a digital image. For example, if an image has M rows and N columns, then its resolution can be defined as M x N. If we define resolution as the total number of pixels, then pixel resolution can be given as a set of two numbers: the first is the number of pixel columns and the second is the number of pixel rows. The higher the pixel resolution, the higher the quality of the image. Size of an image = (pixel resolution) x (bits per pixel).
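As a hedged worked example of this formula (the image dimensions are assumed for illustration and are not figures from this project):

# Size of a 1024 x 768 image at 24 bits per pixel
rows, cols, bits_per_pixel = 768, 1024, 24
size_bits = rows * cols * bits_per_pixel          # pixel resolution x bits per pixel
size_megabytes = size_bits / 8 / (1024 * 1024)    # bits -> bytes -> megabytes
print(size_megabytes)                             # 2.25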
The two types of image processing are analog image processing and digital image processing.
In digital image processing, digital computers are used to process the image. The image is converted into digital form using a scanner-digitizer and then processed. Digital image processing is defined as subjecting the numerical representation of objects to a series of operations in order to obtain a desired result. It starts with one image and produces a modified version of it; it is therefore a process that takes one image into another.
Fig. 3.2 Image resizing
2. Image filtering
Uncertainties such as random image noise, partial volume effects, and the intensity non-uniformity (INU) artifact are introduced into the image, for example by movement of the camera. These produce smooth, slowly varying changes in image pixel values and lead to information loss, loss of SNR, and degradation of the edges and finer details of the image. Spatial filters are used for noise reduction; these filters may be linear or non-linear.
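A minimal sketch of such spatial filters with OpenCV follows (the file name and kernel sizes are illustrative assumptions):

import cv2

img = cv2.imread('frame.png')

# Linear spatial filter: Gaussian blur with a 5x5 kernel
gaussian = cv2.GaussianBlur(img, (5, 5), 0)

# Non-linear spatial filter: median blur, effective against salt-and-pepper noise
median = cv2.medianBlur(img, 5)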
3. Image segmentation
Fig. 3.4 Image segmentation
Machine learning
Machine learning (ML) is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so; ML algorithms use historical data as input to predict new output values. It is the study of computer algorithms that can improve automatically through experience and by the use of data, and it is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. They are used in a wide variety of applications, such as medicine, email filtering, speech recognition, and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks.
Machine learning gives computer systems the ability to learn automatically without being explicitly programmed, and it can be described using the machine learning life cycle. The machine learning life cycle is a cyclic process for building an efficient machine learning project; its main purpose is to find a solution to the problem at hand.
The machine learning life cycle involves seven major steps, which are given below:
Gathering data
Data preparation
Data wrangling
Data analysis
Train the model
Test the model
Deployment
Machine learning is a buzzword in today's technology, and it is growing very rapidly day by day. We use machine learning in our daily life, often without knowing it, in tools such as Google Maps, Google Assistant, and Alexa.
Fig 3.6: Applications of Machine Learning
In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces. When data are unlabeled, supervised learning is not possible, and an unsupervised learning approach is required, which attempts to find natural clustering of the data into groups and then map new data to these formed groups.
H1 does not separate the classes. H2 does, but only with a small margin.
H3 separates them with the maximal margin.
A good separation is achieved by the hyperplane that has the largest distance to the nearest training-data point of any class (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier. For this reason, it was proposed that the original finite-dimensional space be mapped into a much higher-dimensional space, presumably making the separation easier in that space. To keep the computational load reasonable, the mappings used by SVM schemes are designed to ensure that dot products of pairs of input data vectors may be computed easily in terms of the variables in the original space, by defining them in terms of a kernel function selected to suit the problem. The hyperplanes in the higher-dimensional space are defined as the set of points whose dot product with a vector in that space is constant, where such a set of vectors is an orthogonal (and thus minimal) set of vectors that defines a hyperplane. The vectors defining the hyperplanes can be chosen to be linear combinations, with parameters alpha, of images of feature vectors that occur in the database. With this choice of a hyperplane, the points in the feature space that are mapped into the hyperplane are defined by this relation.
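In standard SVM notation (added here for clarity; the symbols are not taken from this report), a hyperplane is the set of points $\mathbf{x}$ satisfying

$$\mathbf{w} \cdot \mathbf{x} - b = 0,$$

and the kernel trick replaces every dot product $\mathbf{x}_i \cdot \mathbf{x}_j$ by a kernel value $k(\mathbf{x}_i, \mathbf{x}_j)$, for example the polynomial kernel

$$k(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i \cdot \mathbf{x}_j + 1)^d.$$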
Working of SVM
For example, consider the following figure, in which the data points fall into two
different categories.
The two categories can be separated with a curve, as shown in the following
figure.
After the transformation, the boundary between the two categories can be
defined by a hyperplane, as shown in the following figure.
The mathematical function used for the transformation is known as the kernel function. SVM in IBM SPSS Modeler supports the following kernel types:
Linear
Polynomial
Sigmoid
Support Vectors - The data points that are closest to the hyperplane are called support vectors. The separating line is defined with the help of these data points.
Margin - The margin is the gap between the two lines drawn through the closest data points of different classes. It can be calculated as the perpendicular distance from the line to the support vectors. A large margin is considered a good margin, and a small margin is considered a bad margin.
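As a minimal hedged sketch of these ideas (scikit-learn is used purely for illustration and is not part of this project's toolchain; the toy data are invented):

from sklearn import svm

# Toy 2-D data: two linearly separable classes
X = [[0, 0], [1, 1], [2, 2], [8, 8], [9, 9], [10, 10]]
y = [0, 0, 0, 1, 1, 1]

# Train a linear-kernel SVM; 'poly' and 'sigmoid' kernels are also available
clf = svm.SVC(kernel='linear')
clf.fit(X, y)

print(clf.support_vectors_)   # the training points closest to the hyperplane
print(clf.predict([[3, 3]]))  # classify a new point -> class 0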
Applications of SVM:
SVMs can be used to solve various real-world problems. SVMs are helpful in
text and hypertext categorization, as their application can significantly reduce the
need for labeled training instances in both the standard inductive and
transductive settings. Some methods for shallow semantic parsing are based on
support vector machines. Classification of images can also be performed using
SVMs. Experimental results show that SVMs achieve significantly higher search
accuracy than traditional query refinement schemes after just three to four
rounds of relevance feedback. This is also true for image segmentation systems, including those using a modified version of SVM that uses the privileged approach suggested by Vapnik.
SVMs are also used for the classification of satellite data, such as SAR data, using supervised SVM, and hand-written characters can be recognized using SVM. The SVM algorithm has been widely applied in the biological and other sciences; SVMs have been used to classify proteins with up to 90% of the compounds classified correctly. Permutation tests based on SVM weights have been suggested as a mechanism for the interpretation of SVM models, and support-vector machine weights have also been used to interpret SVM models in the past. Post-hoc interpretation of support-vector machine models, in order to identify the features used by the model to make predictions, is a relatively new area of research with special significance in the biological sciences.
CHAPTER 4
SYSTEM DESIGN
Introduction
With the development of technologies in the areas of augmented reality and in the devices that we use in our daily life, these devices are becoming compact in the form of Bluetooth or wireless technologies. This project proposes an AI virtual mouse system that makes use of hand gestures and hand-tip detection to perform mouse functions on the computer using computer vision.
The main objective of the proposed system is to perform computer mouse cursor
functions and scroll function using a web camera or a built-in camera in the
computer instead of using a traditional mouse device. Hand gesture and hand tip
detection using computer vision serves as the HCI with the computer. With the AI virtual mouse system, we can track the fingertip of a hand gesture using a built-in camera or web camera, perform mouse cursor operations and the scrolling function, and move the cursor with it. A wireless or Bluetooth mouse needs extra hardware: the mouse itself, a dongle to connect it to the PC, and a battery to power it. In the proposed system, the user instead uses his or her built-in camera or webcam and controls the computer mouse operations with hand gestures. The web camera captures the frames, processes them, recognizes the various hand gestures and hand-tip gestures, and then performs the particular mouse function. The Python programming language is used for developing the AI virtual mouse system, together with OpenCV, the computer vision library.
In the proposed AI virtual mouse system, the model makes use of the MediaPipe package for tracking the hands and the tips of the fingers, and the AutoPy and PyAutoGUI packages for moving around the window screen of the computer and performing functions such as left click, right click, and scrolling. The results of the proposed model show a very high accuracy level, and the model works very well in real-world applications using only a CPU, without a GPU.
Algorithm Illustration
In this project we have mainly used the landmarking algorithm, which consists of a palm model and a hand landmark model. The algorithm uses machine learning and is provided by the MediaPipe package. The palm model and the hand landmark model are described below.
a) Palm Detection Model:
Our method addresses the above challenges using different strategies. First, we
train a palm detector instead of a hand detector, since estimating bounding boxes
of rigid objects like palms and fists is significantly simpler than detecting hands
with articulated fingers. In addition, as palms are smaller objects, the non-
maximum suppression algorithm works well even for two-hand self-occlusion
cases, like handshakes. Moreover, palms can be modeled using square bounding
boxes (anchors in ML terminology) ignoring other aspect ratios, and therefore
reducing the number of anchors by a factor of 3-5. Second, an encoder-decoder feature extractor is used for bigger scene-context awareness even for small objects (similar to the RetinaNet approach). Lastly, we minimize the focal loss during training to support the large number of anchors resulting from the high scale variance.
With these techniques, an average precision of 95.7% is achieved in palm detection, whereas using a regular cross-entropy loss and no decoder gives a baseline of just 86.22%.
b) Hand Landmark Model:
After palm detection over the whole image, the subsequent hand landmark model performs precise keypoint localization of 21 3D hand-knuckle coordinates inside the detected hand regions via regression, that is, direct coordinate prediction. The model learns a consistent internal hand pose representation and is robust even to partially visible hands and self-occlusions.
Fig 4.1: Landmarks of a hand
MediaPipe
The ability to perceive the shape and motion of hands can be a vital component
in improving the user experience across a variety of technological domains and
platforms. For example, it can form the basis for sign language understanding
and hand gesture control, and can also enable the overlay of digital content and
information on top of the physical world in augmented reality. While coming naturally to people, robust real-time hand perception is a decidedly challenging computer vision task, as hands often occlude themselves or each other (e.g., finger/palm occlusions and handshakes) and lack high-contrast patterns.
a) About the ML (Machine Learning) Pipeline
The pipeline consists of two models working together: a palm detection model that operates on the full image and returns an oriented hand bounding box, and a hand landmark model that operates on the cropped image region defined by the palm detector and returns high-fidelity 3D hand keypoints.
This strategy is similar to the one employed in the MediaPipe Face Mesh solution, which uses a face detector together with a face landmark model.
Providing the accurately cropped hand image to the hand landmark model drastically reduces the need for data augmentation (e.g., rotations, translation, and scale) and instead allows the network to dedicate most of its capacity to coordinate prediction accuracy. In addition, in this pipeline the crops can also be generated based on the hand landmarks identified in the previous frame; only when the landmark model can no longer identify hand presence is palm detection invoked to re-localize the hand. A sketch of using this pipeline from Python is given below.
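As a minimal hedged sketch of driving this pipeline from Python (the parameter values are illustrative assumptions, not the project's tuned settings):

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
hands = mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7)

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # MediaPipe expects RGB input, while OpenCV captures frames in BGR
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = hands.process(rgb)
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            # 21 landmarks per hand, with x and y normalized to [0, 1]
            tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
            print(tip.x, tip.y)
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()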
Fig 4.2: Flowchart of MediaPipe
OpenCV
OpenCV is a huge open-source library for computer vision, machine learning, and image processing, and it now plays a major role in the real-time operation that is so important in today's systems. Using it, one can process images and videos to identify objects, faces, or even human handwriting. When integrated with libraries such as NumPy, Python is capable of processing the OpenCV array structure for analysis. To identify an image pattern and its various features, we use vector space and perform mathematical operations on these features.
The first OpenCV version was 1.0. OpenCV is released under a BSD license and
hence it’s free for both academic and commercial use. It has C++, C, Python and
Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. When
OpenCV was designed, the main focus was real-time applications and computational efficiency. Everything is written in optimized C/C++ to take advantage of multi-core processing.
Applications of OpenCV:
There are many applications that are solved using OpenCV; some of them are listed below:
Face recognition
Object recognition
OpenCV Functionality:
System Requirements
Software Requirements:
Applications of Python
As mentioned before, Python is one of the most widely used languages over the web. Here are a few applications of Python:
Portable − Python can run on a wide variety of hardware platforms and has
the same interface on all platforms.
GUI Programming − Python supports GUI applications that can be
created and ported to many system calls, libraries and windows systems,
such as Windows MFC, Macintosh, and the X Window system of Unix.
2. OPENCV:
import cv2

# Create a video capture object for the default webcam (device 0)
vid = cv2.VideoCapture(0)

while True:
    # Read one frame from the webcam
    ret, frame = vid.read()
    if not ret:
        break
    cv2.imshow('frame', frame)
    # Quit when the user presses 'q'
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

vid.release()
cv2.destroyAllWindows()
3. MEDIAPIPE:
MediaPipe is the open-source, cross-platform framework that provides the palm detection and hand landmark models described in Section 4.2; a usage sketch was given there.

5. PYAUTOGUI:
PyAutoGUI is a Python automation library used to click, drag, scroll, move the cursor, and press keys. It can be used to click at an exact position.

import pyautogui

# Get the size of the primary screen
screenWidth, screenHeight = pyautogui.size()
# Release the shift key
pyautogui.keyUp('shift')
# Press the Ctrl+C keyboard shortcut
pyautogui.hotkey('ctrl', 'c')
6. PYCHARM:
PyCharm provides an API so that developers can write their own plugins to
extend PyCharm features. Several plugins from other JetBrains IDE also work
with PyCharm. There are more than 1000 plugins which are compatible with
PyCharm.
Hardware Requirements:
1. WEBCAM:
A webcam is a digital video device commonly built into a computer. Its main function is to transmit pictures over the Internet. It is popularly used with instant messaging services and for recording images. A webcam is a video camera that feeds or streams an image or video in real time to or through a computer network, such as the internet.
Webcams are typically small cameras that sit on a desk, attach to a user's
monitor, or are built into the hardware. Webcams can be used during a video
chat session involving two or more people, with conversations that include live
audio and video.
Webcam software enables users to record a video or stream the video on the
Internet. As video streaming over the Internet requires much bandwidth, such
streams usually use compressed formats. The maximum resolution of a webcam
is also lower than most handheld video cameras, as higher resolutions would be
reduced during transmission. The lower resolution enables webcams to be
relatively inexpensive compared to most video cameras, but the effect is
adequate for video chat sessions.
The term "webcam" (a clipped compound) may also be used in its original sense
of a video camera connected to the Web continuously for an indefinite time,
rather than for a particular session, generally supplying a view for anyone who
visits its web page over the Internet. Some of them, for example, those used as
online traffic cameras, are expensive, rugged professional video cameras.
METHODOLOGY
The various functions and conditions used in the system are explained in the
flowchart of the real-time AI virtual mouse system shown in Fig 4.3.
The proposed AI virtual mouse system is based on the frames that have been
captured by the webcam in a laptop or PC. By using the Python computer vision
library OpenCV, the video capture object is created and the web camera will
start capturing video. The web camera captures and passes the frames to the AI
virtual system.
The AI virtual mouse system uses the webcam, capturing each frame until the program terminates. The video frames are converted from the BGR to the RGB color space so that the hands can be found in the video, frame by frame.
The AI virtual mouse system makes use of a coordinate transformation: it converts the coordinates of the fingertip from the webcam frame to the full screen of the computer window for controlling the mouse, as sketched below. When the hands are detected and we find which finger is up for performing the specific mouse function, the webcam captures that particular frame and the further operation is processed.
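A minimal sketch of this coordinate conversion (the resolutions are illustrative assumptions):

import numpy as np

cam_w, cam_h = 640, 480          # webcam frame size (assumed)
screen_w, screen_h = 1920, 1080  # screen size, e.g. from autopy or pyautogui (assumed)

def to_screen(x, y):
    # Linearly interpolate webcam coordinates to full-screen coordinates
    sx = np.interp(x, (0, cam_w), (0, screen_w))
    sy = np.interp(y, (0, cam_h), (0, screen_h))
    return sx, sy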
In this stage, we detect which finger is up using the tip id of the respective finger found with MediaPipe, together with the coordinates of the fingers that are up; according to that, the particular mouse function is performed, as sketched below.
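A hedged sketch of the tip-id test follows. Note that the report numbers the fingers 0-4 (see Table 5.1), while MediaPipe's own landmark indices for the five fingertips are 4, 8, 12, 16, and 20; lm_list is assumed to hold the 21 (x, y) landmark positions of one hand:

TIP_IDS = [4, 8, 12, 16, 20]  # MediaPipe landmark indices of the five fingertips

def fingers_up(lm_list):
    fingers = []
    # Thumb: compare x coordinates, since the thumb folds sideways
    fingers.append(1 if lm_list[4][0] > lm_list[3][0] else 0)
    # Other fingers: a finger is "up" when its tip lies above its middle joint
    # (image y grows downwards, so "above" means a smaller y value)
    for tip in TIP_IDS[1:]:
        fingers.append(1 if lm_list[tip][1] < lm_list[tip - 2][1] else 0)
    return fingers  # e.g. [0, 1, 0, 0, 0]: only the index finger is up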
1. If the index finger with tip id 1 is up, then the mouse is moved around the window of the computer by using the AutoPy package.
3. If both the index finger with tip id 1 and the middle finger with tip id 2 are up, then the right-click operation is performed by using the PyAutoGUI package.
4. If both the thumb with tip id 0 and the index finger with tip id 1 are up, then the scroll-up operation is performed by using the PyAutoGUI package.
5. If the middle finger with tip id 2, the ring finger with tip id 3, and the little finger with tip id 4 are up, then the scroll-down operation is performed by using the PyAutoGUI package.
6. If the thumb with tip id 0, the index finger with tip id 1, and the middle finger with tip id 2 are up, then the volume-up operation is performed by using the PyAutoGUI package.
7. If the ring finger with tip id 3 and the little finger with tip id 4 are up, then the volume-down operation is performed by using the PyAutoGUI package.
A sketch of dispatching these actions from the fingers-up pattern follows.
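As a hedged sketch of this dispatch (the gesture-to-action mapping follows the list above; the scroll amounts and the use of pyautogui.press for the volume keys are assumptions):

import pyautogui

def perform_action(fingers):
    # fingers = [thumb, index, middle, ring, little], 1 = up, 0 = down
    # (moving the cursor with only the index finger up is handled separately via autopy)
    thumb, index, middle, ring, little = fingers
    if index and middle and not (thumb or ring or little):
        pyautogui.click(button='right')  # right click
    elif thumb and index and not (middle or ring or little):
        pyautogui.scroll(40)             # scroll up
    elif middle and ring and little and not (thumb or index):
        pyautogui.scroll(-40)            # scroll down
    elif thumb and index and middle and not (ring or little):
        pyautogui.press('volumeup')      # volume up
    elif ring and little and not (thumb or index or middle):
        pyautogui.press('volumedown')    # volume down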
Fig 4.3: Flow-chart of real-time AI virtual mouse and keyboard system
CHAPTER 5
RESULTS AND DISCUSSIONS
SIMULATED RESULTS
This AI virtual mouse and virtual keyboard system is created entirely from open-source software, so anyone with a computer can use it anywhere; no particular training is required to operate the framework. Users only need to know the hand gestures for each operation. The project advances HCI using computer vision, and the proposed system has no difficulty in detecting hands of different skin colours.
The proposed system uses the following tools: Python 3.8 and above, OpenCV, MediaPipe, NumPy, AutoPy, PyAutoGUI, and time. The complete process is implemented on the PyCharm platform. Once the program is running, the camera of the device is accessed automatically and you can start operating the system with different hand gestures. The different hand gestures with which the computer performs mouse operations are given below:
If all tip ids are up, then the hand is recognized, as can be observed in Fig 5.1.
Fig 5.2: Gesture for mouse movement
If the index finger with tip id 1 is up, the mouse is moved around the window of the computer by using the AutoPy package, as shown in Fig 5.2.
Fig 5.4: Gesture for right click function
If both the index finger with tip id 1 and the middle finger with tip id 2 are up, the right-click operation is performed, as shown in Fig 5.4.
If both the thumb with tip id 0 and the index finger with tip id 1 are up, the scroll-up operation is performed.
Fig 5.6: Gesture for scroll down function
If the middle finger with tip id 2, the ring finger with tip id 3, and the little finger with tip id 4 are up, the scroll-down operation is performed, as shown in Fig 5.6.
If the thumb with tip id 0, the index finger with tip id 1, and the middle finger with tip id 2 are up, the volume-up operation is performed.
Fig 5.8: Gesture for volume down function
If the ring finger with tip id 3 and the little finger with tip id 4 are up, the volume-down operation is performed, as shown in Fig 5.8.
Fig 5.10: Gesture for typing or clicking a letter on screen from the virtual keyboard
If the index finger with tip id 1 and the middle finger with tip id 2 are up and the distance between them is small, the letter under the fingertips is clicked, as shown in Fig 5.10.
The prototype of the virtual keyboard is shown in Fig 5.11, where typing is possible only in the specified field given on the screen when the virtual keyboard is displayed.
TABLE 5.1: RESPECTIVE TIP IDS FOR FINGERS
TABLE 5.2: TESTED RESULTS
[Bar graph: scroll, volume, and escape operations; accuracy axis from 50 to 100]
Fig 5.12: Graph determining accuracy level of each operation
From Table 5.2, our proposed AI virtual mouse system is 99.8% accurate and the AI virtual keyboard is 97% accurate, which shows that our system performed well. The accuracy is slightly lower for the scroll-up operation, since we have configured fewer clicks per scroll gesture; because the system is open source, you can edit the code and set how much scrolling you need. Compared to previous models of AI virtual mouse and keyboard, our model worked very well, and its accuracy level can be observed in Table 5.2.
Performance Analysis
[Fig 5.13: comparison bar graph; accuracy axis from 50 to 100]
We can observe in Table 3 that the accuracy level of our proposed model is higher than that of previous models. The graph comparing the models is shown in Fig 5.13.
1. The proposed model has a greater accuracy of 99.7%, which is far higher than that of other proposed models for a virtual mouse and keyboard, and it has many applications.
2. Amidst the COVID-19 situation, it is not safe to use devices by touching them, because this may result in a possible spread of the virus; the proposed AI virtual mouse can be used to control the PC mouse functions without using a physical mouse.
3. The system can be used to control robots and automation systems without
the usage of devices.
4. 2D and 3D images can be drawn with the AI virtual system by using hand gestures.
5. AI virtual mouse can be used to play virtual reality and augmented reality-
based games without the wireless or wired mouse devices.
6. In the field of robotics, the proposed HCI-style system can be used for controlling robots.
CHAPTER 6
CONCLUSION
This system proposes a framework that recognizes hand motions and removes the need for a physical mouse and keyboard. The framework is based on computer vision algorithms and can perform all mouse tasks. Past systems had lower accuracy and faced a few challenges in clicking and in dragging to select content. From the results of the demonstrated system, we conclude that the proposed AI virtual mouse framework has performed well and features greater accuracy than existing models. The AI virtual mouse and keyboard are useful for many applications, such as design and architecture, controlling robots, and automation systems, and this model can also be used to reduce physical contact with shared devices and thereby abate the spread of COVID-19.
FUTURE SCOPE
In the proposed system only a prototype of the keyboard is presented, but it can be developed further so that typing is possible anywhere on the screen rather than only in a specified field of the virtual keyboard.
Paper Publication details
REFERENCES
[13] D.-S. Tran, N.-H. Ho, H.-J. Yang, S.-H. Kim, and G. S. Lee, "Real-time virtual mouse system using RGB-D images and fingertip detection," Multimedia Tools and Applications, vol. 80, no. 7, pp. 10473-10490, 2021.
[14] "Virtual mouse application," International Research Journal of Engineering and Technology (IRJET), vol. 08, no. 07, July 2021.
[15] H. Shibly, S. Kumar Dey, M. A. Islam, and S. Iftekhar Showrav, "Design and development of hand gesture based virtual mouse," in Proceedings of the 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), pp. 1-5, Dhaka, Bangladesh, May 2019.
[16] C. Lugaresi et al., "MediaPipe: a framework for building perception pipelines," 2019.
[17] D.-H. Liou, D. Lee, and C.-C. Hsieh, "A real time hand gesture recognition system using motion history image," in Proceedings of the 2010 2nd International Conference on Signal Processing Systems, July 2010.
[18] A. Haria, A. Subramanian, N. Asokkumar, S. Poddar, and J. S. Nayak, "Hand gesture recognition for human computer interaction," Procedia Computer Science, vol. 115, pp. 367-374, 2017.
[19] Y. Adajania, J. Gosalia, A. Kanade, H. Mehta, and N. Shekokar, "Virtual keyboard using shadow analysis," in Proceedings of the 2010 3rd International Conference on Emerging Trends in Engineering and Technology, pp. 163-165, IEEE, 2010.
[20] P. Krejov and R. Bowden, "Multi-touchless: real-time fingertip detection and tracking using geodesic maxima," in Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, 2013.
[21] S. Shriram, B. Nagaraj, J. Jaya, S. Shankar, and P. Ajay, "Deep learning-based real-time AI virtual mouse system using computer vision to avoid COVID-19 spread," Journal of Healthcare Engineering, vol. 2021, Article ID 8133076, 2021.