Hand Gesture Recognition
Submitted in partial fulfillment of the requirements of the degree of
BACHELOR OF ENGINEERING
In
COMPUTER ENGINEERING
By
Group No: 34
Guide:
Certificate
This is to certify that the project entitled “Hand Gesture Recognition” is a bonafide work of ____________________, submitted to the University of Mumbai in partial fulfillment of the requirement for the award of the degree of Bachelor of Engineering in “COMPUTER ENGINEERING”.
Guide
Examiners
1.---------------------------------------------
2.---------------------------------------------
Date:
Place:
Declaration
1) ____________________
(Signature)
____________________
(Name of student and Roll No.)
2) ____________________
(Signature)
____________________
(Name of student and Roll No.)
3) ____________________
(Signature)
____________________
(Name of student and Roll No.)
Date:
Abstract
We use our hands constantly to interact with things: pick them up, move them, transform their
shape, or activate them in some way. In the same unconscious way, we gesticulate in
communicating fundamental ideas: ‘stop’, ‘come closer’, ‘over there’, ‘no’, ‘agreed’, and so on.
Gestures are thus a natural and intuitive form of both interaction and communication. Gestures
and gesture recognition are terms increasingly encountered in discussions of human-computer
interaction. We present a tool created for human-computer interaction based on hand gestures.
The underlying algorithm utilizes only computer-vision techniques. The tool is able to recognize
in real time six different hand gestures, captured using a webcam. Experiments conducted to
evaluate the system performance are reported. Hand gesture recognition could also help in video gaming by allowing players to interact with the game using gestures instead of a controller. However, such an algorithm would need to be more robust, to account for the myriad possible hand positions in three-dimensional space, and it would need to work on continuous video rather than static images; that is beyond the scope of our project.
TABLE OF CONTENTS
List of Figures
List of Tables
1. Introduction
1.1 Introduction
1.2 Aims & Objective
1.3 Scope
2. Review of Literature
2.1 Domain Explanation
2.2 Existing Solution
2.3 H/W & S/W Requirements
3. Analysis
3.1 Functional Requirements
3.2 Non-Functional Requirements
3.3 Proposed System
4. Design
4.1 Data Flow Diagram
4.2 Sequence Diagram
4.3 Use Case Diagram
5. Implementation
5.1 Plan for Implementation
6. Conclusion
Appendix
References
Acknowledgement
List of Figures
Figure 2.1 Six static hand gestures: Open, Close, Cut, Paste, Maximize and Minimize
Figure 2.2
Figure 2.3 Hand gesture images before and after segmentation
Figure 2.4
Figure 2.5 Sobel edge detection for Open, Close and Cut
Figure 4.1
Figure 4.2
Figure 4.3 Sequence diagram
Figure 4.4
Chapter 1
Introduction
1.1 Introduction
Gesture recognition has been an interesting problem in the computer vision community for a long time, particularly because segmenting a foreground object from a cluttered background in real time is challenging. The most obvious reason is the semantic gap between how a human and a computer interpret the same image. Humans can easily figure out what is in an image, but for a computer, images are just three-dimensional matrices of numbers. This is why computer vision problems remain a challenge. To recognize gestures from a live video sequence, we first need to isolate the hand region, removing all the unwanted portions of the video sequence. After segmenting the hand region, we then count the fingers shown in the video sequence to instruct a robot based on the finger count.
1.2 Aims & Objective
The objective of this project is to make an application that controls specific functionalities of a computer using hand gestures captured via an integrated webcam.
Our project has been divided into four modules:
Module 1 – Taking input from the webcam and converting it into a form that can be processed easily (a minimal capture sketch follows this list).
Module 2 – Detecting the gesture in the webcam input.
Module 3 – Recognizing the gesture against a database of gestures.
Module 4 – Issuing the commands corresponding to the recognized gesture.
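As a minimal sketch of Module 1 (assuming Python with OpenCV, consistent with the implementation chapter; window and variable names are illustrative), the following loop reads webcam frames and converts them to a grayscale, smoothed form that is easy to process:

    import cv2

    cap = cv2.VideoCapture(0)                       # open the integrated webcam
    while True:
        ok, frame = cap.read()                      # grab one BGR frame
        if not ok:
            break
        frame = cv2.flip(frame, 1)                  # mirror for natural interaction
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (7, 7), 0)    # smooth before segmentation
        cv2.imshow("input", gray)
        if cv2.waitKey(1) & 0xFF == ord("q"):       # quit on 'q'
            break
    cap.release()
    cv2.destroyAllWindows()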
1.3 Scope
The hand gesture recognition system can be extended to control the operation of other system applications such as Explorer and Media Player. It could also drive a website that operates through hand gestures, with JavaScript combined dynamically with the gesture recognition logic. The same logic could further be used in sensitive areas of work, such as hospitals and nuclear power plants, where sterility between machines and humans is vital, and could eventually enable battery-free operation of mobile devices through hand gestures.
Chapter 2
Review of Literature
OpenCV (Open Source Computer Vision) is a library of programming functions aimed mainly at real-time computer vision. In simple terms, it is a library for image processing, used for most operations related to images. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the code.
The library has more than 2500 optimized algorithms, including a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high-resolution image of an entire scene, find similar images in an image database, remove red eyes from images taken using flash, follow eye movements, and recognize scenery and establish markers to overlay it with augmented reality. OpenCV has a user community of more than 47 thousand people and an estimated number of downloads exceeding 14 million. The library is used extensively in companies, research groups, and by governmental bodies.
The sense of sight is arguably the most important of man's five senses. It provides a huge amount of information about the world that is rich in detail and delivered at the speed of light. However, human vision is not without its limitations, both physical and psychological. Through digital imaging technology and computers, man has transcended many visual limitations. He can see into far galaxies, the microscopic world, and the subatomic world, and even “observe” infra-red, x-ray, ultraviolet, and other spectra for medical diagnosis, meteorology, surveillance, and military uses, all with great success. While computers have been central to this success, for the most part man remains the sole interpreter of all the digital data. For a long time, the central question has been whether computers can be designed to analyze and acquire information from images autonomously in the same natural way humans can. According to Gonzales and Woods, this is the province of computer vision, the branch of artificial intelligence that ultimately aims to “use computers to emulate human vision, including learning and being able to make inferences and take actions based on visual inputs.”
The use of images has an obvious drawback. Humans perceive the world in 3D, but current visual sensors, such as cameras, capture the world in 2D images. The result is the natural loss of a good deal of information in the captured images. Without a proper paradigm to explain the mystery of human vision and perception, the recovery of lost information (reconstruction of the world) from 2D images represents a difficult hurdle for machine vision. However, despite this limitation, computer vision has progressed, riding mainly on the remarkable advancement of decades-old digital image processing techniques, using the science and methods contributed by other disciplines such as optics, neurobiology, psychology, physics, mathematics, electronics, computer science, artificial intelligence, and others. Computer vision techniques and digital image processing methods both draw the proverbial water from the same pool, which is the digital image, and therefore necessarily overlap. Image processing takes a digital image and subjects it to processes, such as noise reduction, detail enhancement, or filtering, for the purpose of producing another desired image as the end result. For example, the blurred image of a car registration plate might be enhanced by imaging techniques to produce a clear photo so the police might identify the owner of the car. Computer vision, on the other hand, takes a digital image and subjects it to the same digital imaging techniques, but for the purpose of analyzing and understanding what the image depicts. For example, the image of a building can be fed to a computer and thereafter be identified by the computer as a residential house, a stadium, a high-rise office tower, a shopping mall, or a farm barn. [5]
Russell and Norvig [6] identified three broad approaches used in computer vision to distill useful information from the raw data provided by images. The first is the feature extraction approach, which focuses on simple computations applied directly to digital images to measure some usable characteristic, such as size. It relies on generally known image processing algorithms for noise reduction, filtering, object detection, edge detection, texture analysis, computation of optical flow, and segmentation, techniques that are commonly used to pre-process images for subsequent image analysis. This is also considered an “uninformed” approach. The second is the recognition approach, where the focus is on distinguishing and labelling objects based on knowledge of characteristics that sets of similar objects have in common, such as shape, appearance, or patterns of elements, sufficient to form classes. Here computer vision uses the techniques of artificial intelligence in knowledge representation to enable a “classifier” to match classes to objects based on the pattern of their features or structural descriptions. A classifier has to “learn” the patterns by being fed a training set of objects and their classes, minimizing mistakes and maximizing successes through a step-by-step process of improvement. There are many techniques in artificial intelligence that can be used for object or pattern recognition, including statistical pattern recognition, neural nets, genetic algorithms, and fuzzy systems. The third is the reconstruction approach, where the focus is on building a geometric model of the world suggested by the image or images, which is then used as a basis for action. This corresponds to the stage of image understanding, which represents the highest and most complex level of computer vision processing. Here the emphasis is on enabling the computer vision system to construct internal models based on the data supplied by the images, and to discard or update these internal models as they are verified against the real world or some other criteria. If the internal model is consistent with the real world, then image understanding takes place. Thus, image understanding requires the construction, manipulation, and control of models, and at the moment relies heavily upon the science and technology of artificial intelligence.
The construction of a database of hand gestures (i.e., the selection of specific hand gestures) generally depends on the intended application. A vocabulary of six static hand gestures was made for HCI, as shown in Fig. 2.1.
Fig. 2.1 Six static hand gestures: Open, Close, Cut, Paste, Maximize and Minimize
Two types of lighting were used: original and artificial. The database consists of 30 images for the training set (five samples per gesture) and 56 images for testing, with scaling, translation, and rotation effects. Employing relatively few training images facilitates measuring the robustness of the proposed methods, given that algorithms requiring relatively modest resources, whether in training data or computation, are desirable. In addition, Guodong and Dyer considered that using a small data set to represent each class is of practical value, especially in problems where it is difficult to obtain many examples for each class.
Fig. 2.2
The primary goal of the pre-processing stage is to ensure a uniform input to the classification
network. This stage includes hand segmentation to isolate the foreground (hand gesture) from
the background and the use of special filters to remove any noise caused by the segmentation
process. This stage also includes edge detection to find the final shape of the hand.
The hand image is segmented from the background. The segmentation process should be fast, reliable, consistent, and able to achieve optimal image quality suitable for recognition of the hand gesture. Gesture recognition requires accurate segmentation. A thresholding algorithm is used in this study to segment the gesture image (see Fig. 2.3). Segmentation is accomplished by scanning the image pixel by pixel and labeling each pixel as object or background, depending on whether the gray level of that pixel is greater or less than the value of a threshold T.
Fig. 2.3 Hand gesture images before and after segmentation
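As a minimal sketch of this thresholding step (assuming Python with OpenCV; the input file name and the value of T are illustrative, not the values used in this study):

    import cv2

    gray = cv2.imread("gesture.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
    T = 120                                                  # illustrative threshold
    # Pixels with gray level above T are labeled object (255), the rest background (0);
    # OpenCV scans the image pixel by pixel internally, as described above.
    _, mask = cv2.threshold(gray, T, 255, cv2.THRESH_BINARY)
    cv2.imwrite("segmented.png", mask)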
Once the hand gesture image has been segmented, a special filter is applied to remove noise by eliminating all the single white pixels on a black background and all the single black pixels on a white foreground. To accomplish this goal, a median filter is applied to the segmented image, as shown in Fig. 2.4.
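A minimal sketch of this filtering step (assuming Python with OpenCV; file names and the kernel size are illustrative):

    import cv2

    mask = cv2.imread("segmented.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
    # A 3x3 median filter removes isolated white pixels on a black background
    # and isolated black pixels on a white foreground, as described above.
    denoised = cv2.medianBlur(mask, 3)
    cv2.imwrite("denoised.png", denoised)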
To recognize static gestures, the model parameters derived from the description of the shape and the boundary of the hand are extracted for further processing. The Sobel operator was chosen for edge detection. Fig. 2.5 shows some gesture images before and after the edge detection operation using the Sobel method.
Fig. 2.5 Sobel edge detection for Open, Close and Cut
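A minimal sketch of this edge-detection step (assuming Python with OpenCV; file names are illustrative):

    import cv2

    img = cv2.imread("denoised.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
    gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)          # horizontal gradient
    gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)          # vertical gradient
    edges = cv2.convertScaleAbs(cv2.magnitude(gx, gy))      # gradient magnitude, 8-bit
    cv2.imwrite("edges.png", edges)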
Chapter 4
Design
4.1 Data Flow Diagram
A data flow diagram (DFD) is a graphical representation that shows the flow of information and the transformations applied as the information moves from one stage to another. A data flow diagram can represent a system, or the software, at different levels of abstraction.
The data flow diagram can be divided into different levels so as to represent the functional details and the flow of the information; the mechanisms the DFD provides are functional modeling and modeling of the information flow. The level 0 DFD, also known as the context model or fundamental system model, represents the complete software as one bubble, with the input and the output shown by arrows. Extra processes (bubbles) and information flow paths are shown as the level 0 DFD is divided further to expose more detail.
In the level 0 DFD, the user gives an input video, which is stored on disk, to the application with the help of the webcam. The application performs the required operations on the video by means of hand gestures and then gives the recovered video back to the user.
In the level 1 DFD, hand gestures are captured through the webcam. The captured gestures are processed to detect hand events; for this, the histogram back-projection method and hand gesture event detection methods are used. With the help of these events, the degraded video is selected. Frames are extracted from that video, and the Pix-Mix algorithm is applied to those frames for video inpainting operations such as extracting an object, replacing an object, etc. The final recovered video is then generated.
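As a minimal sketch of the histogram back-projection step mentioned above (assuming Python with OpenCV; the skin patch location and file names are illustrative), a hand region can be highlighted from a skin-color histogram as follows:

    import cv2

    frame = cv2.imread("frame.png")                 # hypothetical webcam frame
    roi = frame[100:200, 100:200]                   # illustrative patch containing skin
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # Hue-Saturation histogram of the skin patch, normalized to [0, 255]
    hist = cv2.calcHist([hsv_roi], [0, 1], None, [180, 256], [0, 180, 0, 256])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

    # Probability map: bright where pixels match the skin-color histogram
    backproj = cv2.calcBackProject([hsv], [0, 1], hist, [0, 180, 0, 256], 1)
    cv2.imwrite("backproj.png", backproj)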
4.2 Sequence Diagram
A sequence diagram is an interaction diagram that emphasizes the time ordering of messages. A sequence diagram shows information about a set of objects and the messages sent and received by those objects. The objects are typically named or anonymous instances of classes, but may also represent instances of other things, such as collaborations, components, and nodes.
Sequence diagrams are used to represent the dynamic view of the system. A sequence diagram is a kind of interaction diagram in UML that shows how processes operate with one another and in what order. It shows, as parallel vertical lines (lifelines), the different processes or objects that live at the same time, and, as horizontal arrows, the messages exchanged between them in the order in which they occur. This allows the specification of simple runtime scenarios in a graphical style.
Fig. 4.3 Sequence diagram
4.3 Use Case Diagram
Use case diagrams are one of the five diagrams in the Unified Modeling Language (UML) for modeling the dynamic aspects of systems (activity diagrams, statechart diagrams, sequence diagrams, and collaboration diagrams are the four other kinds of diagrams in the UML for modeling the dynamic aspects of systems). Use case diagrams are central to modeling the behavior of a system, a subsystem, or a class. Each one shows a set of use cases and actors and their relationships.
We apply use case diagrams to model the use case view of a system. For the most part, this involves modeling the context of a system, subsystem, or class, or modeling the requirements of the behavior of these elements. Use case diagrams are important for visualizing, specifying, and documenting the behavior of an element. They make systems, subsystems, and classes approachable and understandable by presenting an outside view of how those elements may be used in context.
Chapter 5
Implementation
5.1 Plan for Implementation
We are going to implement a system that recognizes gesture input using a webcam and performs the specified operation. The application can be made to run in the background while the user runs other programs and applications, which makes it very useful for a hands-free approach.
This project has a vast arena of development, notably the Sixth Sense project, which completely revolutionizes the digital world. The code can be extended to incorporate mouse movements as well as still gestures in 3D. Further tweaks can be incorporated to increase the efficiency of the gesture recognition process. The code can be improved for better interpretation and recognition of the gestures, and newer gestures may be incorporated for more functionalities. The user interface for adding and checking gestures, as well as for running the program, can be improved greatly, e.g. by providing an interactive GUI rather than using terminal commands.
1. Creating a binary mask of the hand
First of all, we will create a binary mask of the hand in order to compute the hand contour. To keep it simple, we will segment the images based on the hand's skin color using the inRange operation, though one can of course come up with more sophisticated approaches to build a more stable algorithm. Furthermore, we will convert our frames, which are in BGR format by default as you read them from a file or capture them in OpenCV, to the HLS (Hue, Lightness, Saturation) color space. The Hue channel encodes the actual color information, so we only have to figure out the proper Hue value range of the skin and then adjust the values for Saturation and Lightness.
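A minimal sketch of this step (assuming Python with OpenCV; the HLS bounds are illustrative and must be tuned to the actual skin tone and lighting):

    import cv2
    import numpy as np

    frame = cv2.imread("frame.png")                    # hypothetical input frame
    hls = cv2.cvtColor(frame, cv2.COLOR_BGR2HLS)       # BGR -> HLS color space
    lower = np.array([0, 40, 30], dtype=np.uint8)      # illustrative lower bound
    upper = np.array([25, 200, 255], dtype=np.uint8)   # illustrative upper bound
    mask = cv2.inRange(hls, lower, upper)              # 255 where the pixel is "skin"
    cv2.imwrite("mask.png", mask)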
2. Computing the contour and its convex hull
Next we will tell OpenCV to find all contours in the mask. We will keep only the largest contour, in case the segmentation did not work out perfectly and the mask still contains noise.
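A minimal sketch of this step (assuming Python with OpenCV 4.x; file names are illustrative):

    import cv2

    mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)   # hypothetical binary mask
    # OpenCV 4.x returns (contours, hierarchy); 3.x prepends the image.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        hand = max(contours, key=cv2.contourArea)          # keep the largest contour
        hull = cv2.convexHull(hand)                        # convex hull of the hand
        out = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
        cv2.drawContours(out, [hand], -1, (0, 255, 0), 2)  # hand contour in green
        cv2.drawContours(out, [hull], -1, (0, 0, 255), 2)  # convex hull in red
        cv2.imwrite("contour.png", out)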
References
[1] W. Wang and J. Pan, “Hand Segmentation Using Skin Color and Background Information,” International Conference on Machine Learning and Cybernetics, pp. 1487-1492, July 2012.
[2] A. Y. Dawod, J. Abdullah, and Md. J. Alam, “Adaptive Skin Color Model for Hand Segmentation,” International Conference on Computer Applications and Industrial Electronics (ICCAIE 2010), pp. 486-489, December 2010.
[3] A. Y. Dawod, J. Abdullah, and Md. J. Alam, “A New Method for Hand Segmentation Using Free-Form Skin Color Model,” International Conference on Advanced Computer Theory and Engineering (ICACTE), pp. V2-562-V2-566, 2010.
[4] G. Bebis, M. Nicolescu, R. D. Boyle, and X. Twombly, “A Review on Vision-Based Full DOF Hand Motion Estimation,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), 2005.
[5] A. Erol, G. Bebis, M. Nicolescu, R. D. Boyle, and X. Twombly, “Vision-Based Hand Pose Estimation: A Review,” Computer Vision and Image Understanding, vol. 108, pp. 52-73, 2007.
[6] X. Wu, C. Yang, Y. Wang, H. Li, and S. Xu, “An Intelligent Interactive System Based on Hand Gesture Recognition Algorithm and Kinect,” Fifth International Symposium on Computational Intelligence and Design, pp. 294-298, 2012.
[7] Z. Ren, J. Yuan, J. Meng, and Z. Zhang, “Robust Part-Based Hand Gesture Recognition Using Kinect Sensor,” IEEE Transactions on Multimedia, vol. 15, pp. 1110-1120, August 2013.
[8] M. Panwar, “Hand Gesture Recognition Based on Shape Parameters,” International Conference on Computing, Communication and Applications (ICCCA), pp. 1-6, 2012.