
2021 2nd International Conference for Emerging Technology (INCET)
Belgaum, India. May 21-23, 2021

Hand Gesture Recognition Based Virtual Mouse Events

Manav Ranawat
Department of Information Technology
Sardar Patel Institute of Technology
Mumbai, India
[email protected]

Madhur Rajadhyaksha
Department of Information Technology
Sardar Patel Institute of Technology
Mumbai, India
[email protected]

Neha Lakhani
Department of Information Technology
Sardar Patel Institute of Technology
Mumbai, India
[email protected]

Dr Radha Shankarmani
Department of Information Technology
Sardar Patel Institute of Technology
Mumbai, India
[email protected]

DOI: 10.1109/INCET51464.2021.9456388

Abstract—This paper proposes a virtual mouse application based on the tracking of different hand gestures. The system eliminates the dependency on any external hardware required to perform mouse actions. A built-in camera tracks the user's hands, predefined gestures are recognized and the corresponding mouse events are executed. This system has been implemented in Python using OpenCV and PyAutoGUI. Researchers have studied background conditions and the effects of differences in illuminance and skin colour individually. However, the proposed system aims to take all the above factors into account to build an application most suitable for the real world.

Keywords— Image Processing, Computer Vision, Gesture Recognition, Color Detection, Virtual Mouse, Human-Computer Interaction

I. INTRODUCTION

The most commonly employed method of cursor control in computer systems is the use of a mouse or a touchpad. Although it provides comfort and ease of use, this technology is not free of hardware. In the proposed system, an effort is made to completely replace physical hardware devices for cursor control with a gesture recognition-based system.

Gesture recognition, a subdomain of computer vision, takes as input a set of images, which may or may not be a video sequence. The predefined gestures, if any, are detected as the output [5]. This system consists of components such as image segmentation, pattern detection, statistical modelling, etc. [12].

Skin detection is a primary subcategory of image segmentation. This feature helps in determining hand regions in a reasonable amount of time. Identification of skin-like pixels in an image is a binary classification problem formulated in [8]. Different colour spaces like RGB and HSV have been tested and various threshold values have been offered to discriminate skin pixels from non-skin ones [9].

Once the skin pigments are extracted, face detection is performed. There are mainly two ways to build skin colour models: nonparametric and parametric skin modelling techniques [10]. This paper implements the Haar feature-based classification system, where models are trained from positive and negative images [11]. Once the faces are identified, we subtract them from the image so that the system can focus solely on hand gesture recognition.

The first step towards this is detecting the hand contour, i.e., the continuous parts of the hand along the boundary with the same colour and intensity [2]. This is followed by creating a convex hull and identifying the convexity defects. The resultant image is passed into a model trained by convolutional neural networks to identify the gesture. Centroid identification is done to track the hand. Lastly, PyAutoGUI libraries are used to trigger the corresponding mouse clicks and cursor movement.

This paper is organized as follows: Section II describes, in brief, the literature survey; Section III explains the proposed system and the methodology in detail; Section IV shows the results of the model implementation; Section V concludes the paper.

II. LITERATURE SURVEY

Human-Computer Interaction consists of two main approaches for Hand Gesture Recognition: hardware-based and vision-based. One of the first hardware-based approaches, proposed by Quam in 1990, used data gloves to recognize gestures.

The method required the user to wear a bulky data glove which was inconvenient and made it difficult to perform certain gestures. Vision-based HCI is further classified into marker-based and marker-less approaches. The former requires the user to wear colour markers or colour caps, while the latter works on the principle of skin detection and hand segmentation [4].

Various approaches have been proposed for the use of colour caps for the detection of fingertips [1] [6]. The intensity of these pixels in the grayscale image distinguishes the fingertips from the rest of the frame. However, these systems do not perform skin pixel detection and hand segmentation from stored frames. Secondly, the accuracy obtained for a noisy background is much less than that obtained for a plain background.

Siam et al. have applied a similar procedure involving two basic steps: marker detection and marker tracking. Coloured markers have resulted in lower computational time and increased the accuracy of gesture detection. They have implemented the sliding window algorithm which, although optimized, requires high-powered processors to deliver better results. Tracking accuracy has also been observed to reduce in the case of hand movement at a greater speed [4].

978-1-7281-7029-9/21/$31.00 ©2021 IEEE


Authorized licensed use limited to: Zhejiang University. Downloaded on September 12,2024 at 22:37:46 UTC from IEEE Xplore. Restrictions apply.
Yadav et al. have achieved cursor control by skin detection, hand contour extraction, hand tracking and gesture recognition. A 32-bin histogram model is used for skin detection, while a border-finding edge detection algorithm is used to extract hand contours. The centroid of the hand is determined for each consecutive frame for hand tracking and cursor control [3]. This approach is not suitable for real-time applications as the model is based on the assumption that only the hand will be captured in the frame.

A similar approach towards cursor control can be observed in [5]. Face detection is used to track the face region and obtain threshold values for the skin colour. Hand contours are extracted and a convex hull is generated around the hand region. The endpoints of all the fingers are obtained and gesture classification is based on the number of fingers detected. Although this method is efficient for gesture recognition, it does not focus on hand tracking, which is crucial for cursor control.

Grif et al. take a slightly different approach to recognizing hand gestures. After initial preprocessing, hand angle features are considered for gesture recognition. The extreme left, extreme right and highest pixels are detected in the frame and the angle between them is calculated. Various hand postures are mapped to specific intervals of these angles, which in turn are mapped to certain mouse actions [7]. This approach is also unsuitable for real-world applications as testing was performed exclusively in daylight conditions while the background colour was purposely kept blue to easily distinguish between the background and skin pixels.

This paper performs marker-less vision-based gesture recognition where the accuracy remains unaffected by background noise or changes in light intensity. The output has been tested under varying conditions and the results have been discussed.

III. PROPOSED SYSTEM

A. System Overview

The scope of this paper is to perform mouse events by classifying the hand gesture performed by the system user. The events emulated in the model include cursor movement, left-click, right-click, double-click, scroll-up, scroll-down, and opening and closing applications. Different gestures will be classified and mapped to these events. Figure 1 demonstrates the system architecture of the proposed method. The paper has been implemented in four major stages, namely, Image Capturing and Preprocessing, Hand Detection, Gesture Recognition and Event Triggering. These stages are further broken down into more steps. They are explained in detail in the methodology section.

B. Methodology

1) Image Capturing and Preprocessing

A video capturing object is created using Python OpenCV, which is used to read the frames in the video. To obtain the colour information regardless of the image intensity, the RGB colour space has been converted to HSV. This has shown stable results under varying light intensities [13]. Range-based skin detection is then performed on the image to distinguish the skin-coloured pixels from the non-skin-coloured ones. Erosion, dilation and blur operations are performed on the image to remove any noise in the background and to accentuate the useful portions of the image [17]. Erosion is performed on the frame for the skin-coloured pixels. This process helps eliminate any background noise near the skin pixels. After erosion, the image in the frame becomes thinner as compared to the original. Hence, dilation is applied to the frame to recover the pixels lost around the boundaries of the skin region. This is followed by blurring the image with a 2x2 filter to remove sharp edges from the frame and smooth the colour transitions. Lastly, binary thresholding is performed on the grayscale image to segment the skin from the rest of the frame.

Fig. 1. System architecture

2) Hand Detection

After skin detection is done, the next step is to detect the face in the image. This has been done using the Haar Cascade Classifier [11] [14]. It is a cascade function trained on positive and negative images to accurately detect the features of a face. Once the face is detected, a rectangle is constructed around it. The pixels inside this rectangle are set to zero, and thus the face is eliminated. After subtracting the face, we can assume that any region in the frame densely populated by skin-coloured pixels is the hand of the user. Hand contours are traced along the boundary of these pixels and the rest of the frame. A convex hull is generated to enclose all the points in the contour. Convexity defects are used to determine the deviations of the object from the hull. The last step of the process is the detection of the centroid, which is the arithmetic mean of all points in the hull, determined by taking the weighted average of these points. The centroid will be tracked to perform cursor movements in the following stages.

3) Gesture Recognition

A dataset containing 6 classes of hand gestures, with 1200 images each, has been created. This is then accumulated and labelled into a CSV (comma-separated values) file. Using a Convolutional Neural Network (CNN) architecture as mentioned in Figure 2, a deep learning model has been trained [16]. The model consists of two pairs of convolution layers followed by max-pooling layers. Dropout and fully connected layers are added to the model. A ReLU activation function is used for the initial and hidden layers and a softmax activation function for the final output layer. The optimizer used is Adam and the loss function is categorical cross-entropy. As stated above, the hand region is detected using the endpoints that were obtained from contours. This region is preprocessed and passed as an input to the model to classify the hand gestures for performing mouse actions.

TABLE I. GESTURES AND THEIR CORRESPONDING MOUSE EVENTS

Gesture   | Mouse Event
L         | Cursor movement
Paper     | Left Click, Double Click
Three     | Right Click
Scissor   | Scroll Down
Rock On   | Scroll Up
Rock      | Launch Notepad

The CNN model has achieved a testing accuracy of 98.47%. The gestures recognized are correctly converted to the corresponding mouse actions. It has also been observed that the gesture 'Paper', when held for long, emulates the double-click operation.

Fig. 2. Model summary
Fig. 3. 'L' gesture for cursor movement
Fig. 4. 'Paper' gesture for left click
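A Keras sketch of such an architecture is given below. The input size (64x64 grayscale) and layer widths are assumptions for illustration; the paper's exact layer configuration is in its Fig. 2, which is not reproduced here.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_shape=(64, 64, 1), num_classes=6):
    """Two conv + max-pool pairs, dropout, fully connected layers,
    softmax output; Adam optimizer, categorical cross-entropy loss."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```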
4) Event Triggering
The frames obtained from the video are preprocessed and
passed as input to the model to classify a hand gesture. The
gestures are predicted and mapped to mouse events using
PyAutoGUI [15]. The mouse actions that can be performed
are: Cursor Movement, Left Click, Right Click, Scroll Up,
Scroll Down and Double Click. A gesture has also been
mapped for opening and closing an application. For each
frame, the centroid position of the hand is stored in a deque
and the two most recent points are used for cursor movement
[18].
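The deque-based tracking and gesture-to-event mapping described above can be sketched as follows. The class and event names are illustrative, and `dispatch` assumes a desktop session in which PyAutoGUI can control the mouse (it raises in headless environments).

```python
from collections import deque

# Gesture-to-event mapping per Table I; event names are dispatched
# to PyAutoGUI below.
EVENT_MAP = {
    "Paper": "left_click",
    "Three": "right_click",
    "Scissor": "scroll_down",
    "Rock On": "scroll_up",
    "Rock": "launch_notepad",
}

class CursorTracker:
    """Keeps the two most recent hand centroids and yields the cursor delta."""
    def __init__(self):
        self.points = deque(maxlen=2)

    def update(self, centroid):
        self.points.append(centroid)
        if len(self.points) < 2:
            return (0, 0)
        (x0, y0), (x1, y1) = self.points
        return (x1 - x0, y1 - y0)  # relative cursor movement

def dispatch(event):
    # Imported lazily: PyAutoGUI needs a display to import successfully.
    import pyautogui
    if event == "left_click":
        pyautogui.click()
    elif event == "right_click":
        pyautogui.click(button="right")
    elif event == "scroll_down":
        pyautogui.scroll(-120)
    elif event == "scroll_up":
        pyautogui.scroll(120)
```

In use, the 'L' gesture feeds the centroid into `CursorTracker` and the delta is passed to `pyautogui.moveRel`, while the other gestures are looked up in `EVENT_MAP` and dispatched.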
Fig. 5. 'Scissor' gesture for scrolling down
Fig. 6. 'Rock On' gesture for scrolling up

IV. RESULTS AND EVALUATION

The hand region has been successfully detected and extracted by eliminating the background. This is evident in the threshold images shown in figures 3 to 8, where the segmented hand region is observed with negligible background noise. The system has been tested for different skin colours and the performance remains unaffected. Six different classes of gestures have been predefined and used for training the model, after which they are translated into mouse events. Table I depicts the mapping of the gestures to the corresponding mouse actions.

Fig. 7. 'Three' gesture for right-click
Fig. 8. 'Rock' gesture for opening the Notepad application

V. CONCLUSION

The virtual mouse based on hand tracking and gesture recognition has been successfully implemented. Conversion of colour space from RGB to HSV has provided greater accuracy in low-light conditions. The presence of the user's face in the background does not affect the results, since face detection and subsequent face subtraction have been executed. The hand contour and convex hull that are constructed help in hand detection, and centroid identification has provided ease of tracking hand movements. The creation of a custom dataset with predefined gestures has significantly increased the accuracy of the CNN model in identifying them. Since factors such as differences in illuminance and skin colour have a negligible effect on the performance of the model, the proposed system can replace the existing hardware used for cursor control. Thus, the paper is a significant advancement in the field of Human-Computer Interaction. Although this system is currently employed for basic cursor control, it can be extended into various real-world applications such as Digital Art, Gaming, and Virtual and Augmented Reality.

ACKNOWLEDGMENT

The authors are thankful to Dr Radha Shankarmani for mentoring us throughout the research and development of the Virtual Mouse based on Gesture Recognition. It is owing to her help that the paper has reached its goal of serving the desired purpose. The authors would also like to take this opportunity to thank Sardar Patel Institute of Technology for providing all the necessary resources which fueled the progress of this research.

REFERENCES

[1] K. H. Shibly, S. Kumar Dey, M. A. Islam and S. Iftekhar Showrav, "Design and Development of Hand Gesture Based Virtual Mouse," 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), Dhaka, Bangladesh, 2019, pp. 1-5, doi: 10.1109/ICASERT.2019.8934612.
[2] P. Pareek and R. P., "Event Triggering Using Hand Gesture Using Open CV," International Journal of Engineering and Computer Science, vol. 5, no. 2, 2017. [Online]. Available: https://www.ijecs.in/index.php/ijecs/article/view/377
[3] O. Yadav, S. Makwana, P. Yadav and L. Raut, "Cursor Movement By Hand Gesture," International Journal of Engineering Sciences & Research Technology, vol. , no. 3, pp. 243-247, 2017.
[4] S. Siam, J. Sakel and M. Kabir, "Human Computer Interaction Using Marker Based Hand Gesture Recognition," 2016.
[5] O. Ozturk, A. Aksac, T. Ozyer et al., "Boosting real-time recognition of hand posture and gesture for virtual mouse operations with segmentation," Applied Intelligence, vol. 43, pp. 786-801, 2015, doi: 10.1007/s10489-015-0680-z.
[6] Abhilash S S, L. Thomas, N. Wilson and Chaithanya C, "Virtual Mouse Using Hand Gesture," International Research Journal of Engineering and Technology (IRJET), vol. 5, no. 4, pp. 3903-3906, 2018.
[7] H. Grif and T. Turc, "Human hand gesture based system for mouse cursor control," Procedia Manufacturing, vol. 22, pp. 1038-1042, 2018, doi: 10.1016/j.promfg.2018.03.147.
[8] I. Cohen, N. Sebe, A. Garg and L. Chen, "Facial expression recognition from video sequences: Temporal and static modeling," Computer Vision and Image Understanding, vol. 91, pp. 160-187, 2003, doi: 10.1016/S1077-3142(03)00081-X.
[9] P. Kakumanu, S. Makrogiannis and N. Bourbakis, "A survey of skin-color modeling and detection methods," Pattern Recognition, vol. 40, pp. 1106-1122, 2007, doi: 10.1016/j.patcog.2006.06.010.
[10] Q. Liu and G. Peng, "A robust skin color based face detection algorithm," 2010 2nd International Asia Conference on Informatics in Control, Automation and Robotics (CAR 2010), Wuhan, China, 2010, pp. 525-528, doi: 10.1109/CAR.2010.5456614.
[11] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, HI, USA, 2001, pp. I-I, doi: 10.1109/CVPR.2001.990517.
[12] S. Mitra and T. Acharya, "Gesture Recognition: A Survey," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 37, no. 3, pp. 311-324, May 2007, doi: 10.1109/TSMCC.2007.893280.
[13] Z. Liu, W. Chen, Y. Zou and C. Hu, "Regions of interest extraction based on HSV color space," IEEE 10th International Conference on Industrial Informatics, Beijing, China, 2012, pp. 481-485, doi: 10.1109/INDIN.2012.6301214.
[14] R. Padilla, C. Filho and M. Costa, "Evaluation of Haar Cascade Classifiers for Face Detection," 2012.
[15] F. Khan, "Computer Vision Based Mouse Control Using Object Detection and Marker Motion Tracking," International Journal of Computer Science and Mobile Computing, vol. 9, no. 5, pp. 35-45, May 2020.
[16] R. Pinto, C. Braga Borges, A. Almeida and I. Paula Jr, "Static Hand Gesture Recognition Based on Convolutional Neural Networks," Journal of Electrical and Computer Engineering, vol. 2019, pp. 1-12, 2019, doi: 10.1155/2019/4167890.
[17] J. Lee, R. Haralick and L. Shapiro, "Morphologic edge detection," IEEE Journal on Robotics and Automation, vol. 3, no. 2, pp. 142-156, April 1987, doi: 10.1109/JRA.1987.1087088.
[18] R. Verma, "A Review of Object Detection and Tracking Methods," International Journal of Advance Engineering and Research Development, vol. 4, pp. 569-578, 2017.
