
Hand Gesture Recognition

Submitted in partial fulfillment of the requirements

of the degree of

BACHELOR OF ENGINEERING
In
COMPUTER ENGINEERING
By

Group No: 34

Roll No. Name

1604013 Mayur Chablani

1604089 Mangesh Phadse

1604106 Tejas Shenoy

Guide:

Prof. Jayant Gadge

(Assistant Professor, Department of Computer Engineering, TSEC)

Department of Computer Engineering
Thadomal Shahani Engineering College
University of Mumbai
2019-2020
 
CERTIFICATE

This is to certify that the project entitled “Hand Gesture Recognition” is a bonafide work of

Roll No. Name

1604013 Mayur Chablani

1604089 Mangesh Phadse

1604106 Tejas Shenoy

Submitted to the University of Mumbai in partial fulfillment of the requirement for the award

of the degree of “BACHELOR OF ENGINEERING” in “COMPUTER ENGINEERING”.

Prof. Jayant Gadge

Guide

Dr. Tanuja Sarode
Head of Department

Dr. G. T. Thampi
Principal


 

Project Report Approval for B.E.

Project report entitled “Hand Gesture Recognition” by

Roll No. Name

1604013 Mayur Chablani

1604089 Mangesh Phadse

1604106 Tejas Shenoy

is approved for the degree of “BACHELOR OF ENGINEERING” in “COMPUTER ENGINEERING”.

Examiners

1.---------------------------------------------

2.---------------------------------------------

Date:
Place:
 
Declaration

We declare that this written submission represents our ideas in our own
words, and where others' ideas or words have been included, we have adequately
cited and referenced the original sources. We also declare that we have adhered
to all principles of academic honesty and integrity and have not misrepresented
or fabricated or falsified any idea/data/fact/source in our submission. We
understand that any violation of the above will be cause for disciplinary action by
the Institute and can also evoke penal action from the sources which have thus
not been properly cited or from whom proper permission has not been taken
when needed.

1) ____________________
(Signature)
____________________
(Name of student and Roll No.)

2) ____________________

(Signature)
____________________
(Name of student and Roll No.)

3) ____________________
(Signature)

____________________
(Name of student and Roll No.)

Date:
 
Abstract
We use our hands constantly to interact with things: pick them up, move them, transform their
shape, or activate them in some way. In the same unconscious way, we gesticulate in
communicating fundamental ideas: ‘stop’, ‘come closer’, ‘over there’, ‘no’, ‘agreed’, and so on.
Gestures are thus a natural and intuitive form of both interaction and communication. Gestures
and gesture recognition are terms increasingly encountered in discussions of human-computer
interaction. We present a tool created for human-computer interaction based on hand gestures.
The underlying algorithm utilizes only computer-vision techniques. The tool is able to recognize
in real time six different hand gestures, captured using a webcam. Experiments conducted to
evaluate the system performance are reported. Hand gesture recognition could help in video gaming
by allowing players to interact with the game using gestures instead of using a controller. However, such
an algorithm needs to be more robust to account for the myriad of possible hand positions in
three-dimensional space. It also needs to work with video rather than static images. That is beyond the
scope of our project.
 
TABLE OF CONTENTS

Sr. No. Topic Page No.

List of Figures i
List of Tables ii
1. Introduction 1
1.1 Introduction 1
1.2 Aim & Objective 2
1.3 Scope 2
2. Review of Literature 3
2.1 Domain Explanation 3
2.2 Existing Solution 3
2.3 H/W & S/W requirement 4
3. Analysis 5
3.1 Functional Requirement 5
3.2 Non-Functional Requirement 5
3.3 Proposed System 6
4. Design 13
4.1 Data Flow Diagram 13
4.2 Sequence Diagram 15
4.3 Use Case Diagram 17
 
5. Implementation 22
5.1 Plan for Implementation 22
6. Conclusion 28
Appendix 29
References 30
Acknowledgement 31
 
List of Figures

Figure No. Page No.

Figure 2.1 6

Figure 2.2 15

Figure 2.3 16

Figure 2.4 17

Figure 2.5 18

Figure 4.1 18

Figure 4.2 19

Figure 4.3 19

Figure 4.4 19
 
Chapter 1

Introduction

1.1 Introduction

Gesture recognition has been a very interesting problem in the computer vision community for a
long time, particularly because segmenting a foreground object from a cluttered background in
real time is challenging. The most obvious reason is the semantic gap between a human looking
at an image and a computer looking at the same image: humans can easily figure out what is in
an image, but to a computer an image is just a three-dimensional array of numbers. It is because
of this that computer vision problems remain a challenge. To recognize gestures from a live
video sequence, we first need to extract the hand region alone, removing all the unwanted
portions of the video sequence. After segmenting the hand region, we count the fingers shown in
the video sequence in order to instruct a robot based on the finger count.

1.2 Aim & Objective

The objective of this project is to build an application that controls specific computer
functionalities using hand gestures captured via an integrated webcam.
The project is divided into four modules:
Module 1 – Taking input from the webcam and converting it into a form that can be
processed easily.
Module 2 – Detecting the gesture in the webcam input.
Module 3 – Recognizing the gesture against a database of gestures.
Module 4 – Issuing the commands that correspond to the recognized gesture.

1.3 Scope

The hand gesture recognition system can be extended to:
● Control the operation of other system applications such as Explorer and Media Player.
● Create a website that operates using hand gestures; JavaScript can be combined
dynamically with the gesture recognition logic for this purpose.
● Apply the gesture recognition logic in sensitive areas of work such as hospitals and
nuclear power plants, where sterile, touch-free interaction between humans and machines is vital.
● Enable battery-free operation of mobile devices through hand gestures.
 
Chapter 2
Review of Literature

2.1 Domain Explanation


2.1.1 OpenCV

OpenCV (Open Source Computer Vision) is a library of programming functions mainly aimed
at real-time computer vision. In simple language, it is a library used for image processing, and it
is used to perform essentially all operations related to images. OpenCV was built to provide a
common infrastructure for computer vision applications and to accelerate the use of machine
perception in commercial products. Being a BSD-licensed product, OpenCV makes it easy for
businesses to utilize and modify the code.
The library has more than 2500 optimized algorithms, including a comprehensive set of
both classic and state-of-the-art computer vision and machine learning algorithms. These
algorithms can be used to detect and recognize faces, identify objects, classify human actions in
videos, track camera movements, track moving objects, extract 3D models of objects, produce
3D point clouds from stereo cameras, stitch images together to produce a high-resolution image
of an entire scene, find similar images in an image database, remove red eyes from images
taken using flash, follow eye movements, and recognize scenery and establish markers to overlay
it with augmented reality. OpenCV has a user community of more than 47 thousand people, and
its estimated number of downloads exceeds 14 million. The library is used extensively by
companies, research groups and governmental bodies.
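
As a flavor of how the library is used, the following minimal sketch captures a single webcam frame and converts it to grayscale. The camera index and output file name are illustrative assumptions, not values taken from this report:

```python
import cv2

# Minimal OpenCV usage sketch: grab one frame from the default webcam
# and convert it to grayscale. Camera index 0 is an assumption.
cap = cv2.VideoCapture(0)
ok, frame = cap.read()          # frame is a BGR image (a NumPy array)
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cv2.imwrite("frame_gray.png", gray)
cap.release()
```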

2.1.2 Computer Vision and Digital Image Processing

The sense of sight is arguably the most important of man's five senses. It provides a
huge amount of information about the world that is rich in detail and delivered at the
speed of light. However, human vision is not without its limitations, both physical and
psychological. Through digital imaging technology and computers, man has
transcended many visual limitations. He can see into far galaxies, the microscopic
world, the subatomic world, and even “observe” infra-red, x-ray, ultraviolet and other
spectra for medical diagnosis, meteorology, surveillance, and military uses, all with
great success. While computers have been central to this success, for the most part
man remains the sole interpreter of all the digital data. For a long time, the central
question has been whether computers can be designed to analyze and acquire
information from images autonomously in the same natural way humans can. According
to Gonzalez and Woods, this is the province of computer vision, the branch of artificial
intelligence that ultimately aims to “use computers to emulate human vision, including
learning and being able to make inferences and take actions based on visual inputs.”

The use of images has an obvious drawback. Humans perceive the world in 3D, but
current visual sensors such as cameras capture the world in 2D images. The result is the
natural loss of a good deal of information in the captured images. Without a proper
paradigm to explain the mystery of human vision and perception, the recovery of lost
information (reconstruction of the world) from 2D images represents a difficult hurdle
for machine vision. Despite this limitation, computer vision has progressed, riding
mainly on the remarkable advancement of decades-old digital image processing
techniques and using the science and methods contributed by other disciplines such as
optics, neurobiology, psychology, physics, mathematics, electronics, computer science,
artificial intelligence and others.

Computer vision techniques and digital image processing methods both draw the
proverbial water from the same pool, which is the digital image, and therefore
necessarily overlap. Image processing takes a digital image and subjects it to processes,
such as noise reduction, detail enhancement, or filtering, for the purpose of producing
another desired image as the end result. For example, the blurred image of a car
registration plate might be enhanced by imaging techniques to produce a clear photo
so the police might identify the owner of the car. Computer vision, on the other hand,
takes a digital image and subjects it to the same digital imaging techniques, but for the
purpose of analyzing and understanding what the image depicts. For example, the
image of a building can be fed to a computer and thereafter be identified by the
computer as a residential house, a stadium, a high-rise office tower, a shopping mall, or
a farm barn. [5]

Russell and Norvig [6] identified three broad approaches used in computer vision to
distill useful information from the raw data provided by images. The first is the feature
extraction approach, which focuses on simple computations applied directly to digital
images to measure some usable characteristic, such as size. This relies on generally
known image processing algorithms for noise reduction, filtering, object detection, edge
detection, texture analysis, computation of optical flow, and segmentation, techniques
commonly used to pre-process images for subsequent image analysis. This is also
considered an “uninformed” approach. The second is the recognition approach, where
the focus is on distinguishing and labelling objects based on knowledge of
characteristics that sets of similar objects have in common, such as shape, appearance,
or patterns of elements, sufficient to form classes. Here computer vision uses the
techniques of artificial intelligence in knowledge representation to enable a “classifier”
to match classes to objects based on the pattern of their features or structural
descriptions. A classifier has to “learn” the patterns by being fed a training set of
objects and their classes, achieving the goal of minimizing mistakes and maximizing
successes through a step-by-step process of improvement. Many techniques in artificial
intelligence can be used for object or pattern recognition, including statistical pattern
recognition, neural nets, genetic algorithms and fuzzy systems. The third is the
reconstruction approach, where the focus is on building a geometric model of the world
suggested by the image or images, which is used as a basis for action. This corresponds
to the stage of image understanding, which represents the highest and most complex
level of computer vision processing. Here the emphasis is on enabling the computer
vision system to construct internal models based on the data supplied by the images
and to discard or update these internal models as they are verified against the real
world or some other criteria. If the internal model is consistent with the real world, then
image understanding takes place. Thus, image understanding requires the construction,
manipulation and control of models and at the moment relies heavily upon the science
and technology of artificial intelligence.

2.2 Existing Solution

2.2.1 Hand gesture image capture

The construction of a database for hand gestures (i.e., the selection of specific hand
gestures) generally depends on the intended application. A vocabulary of six static hand
gestures was created for HCI, as shown in Fig. 2.1.

Fig. 2.1 Six static hand gestures: Open, Close, Cut, Paste, Maximize and Minimize

The images were captured under two lighting conditions: original and artificial. The database
consists of 30 images for the training set (five samples for each gesture) and 56 images for
testing with scaling, translation, and rotation effects. Employing relatively few training images
facilitates measuring the robustness of the proposed methods, since algorithms that require only
modest resources, whether training data or computation, are desirable. In addition, Guodong
and Dyer considered that using a small data set to represent each class is of practical value,
especially in problems where it is difficult to get many examples for each class.
Fig. 2.2

2.2.2 Pre-processing stage

The primary goal of the pre-processing stage is to ensure a uniform input to the classification
network. This stage includes hand segmentation to isolate the foreground (hand gesture) from
the background and the use of special filters to remove any noise caused by the segmentation
process. This stage also includes edge detection to find the final shape of the hand.

2.2.3 Hand segmentation

The hand image is segmented from the background. The segmentation process should be
fast, reliable, consistent, and able to achieve image quality suitable for recognizing the
hand gesture. Gesture recognition requires accurate segmentation. A thresholding
algorithm is used in this study to segment the gesture image (see Fig. 2.3).
Segmentation is accomplished by scanning the image pixel by pixel and labeling each
pixel as object or background depending on whether the gray level of that pixel is
greater or less than the value of the threshold T.

Fig. 2.3 Hand gesture images before and after segmentation
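
In OpenCV, this pixel-by-pixel labeling corresponds to a global threshold. The sketch below is a minimal illustration; the input file name and the value of T are placeholder assumptions, since the report does not state how T is chosen:

```python
import cv2

# Global thresholding: label each pixel as object (255) or background (0)
# by comparing its gray level with T. 'hand.png' and T=120 are assumed
# placeholders, not values from this report.
gray = cv2.imread("hand.png", cv2.IMREAD_GRAYSCALE)
T = 120
_, binary = cv2.threshold(gray, T, 255, cv2.THRESH_BINARY)
cv2.imwrite("hand_binary.png", binary)
```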

2.2.4 Noise reduction

Once the hand gesture image has been segmented, a special filter is applied to remove
noise by eliminating all the single white pixels on a black background and all the single
black pixels on a white foreground. To accomplish this, a median filter is applied to
the segmented image, as shown in Fig. 2.4.

Fig. 2.4 Median filter effect
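
A median filter replaces each pixel with the median of its neighborhood, which removes isolated salt-and-pepper pixels without blurring edges as much as averaging would. A minimal sketch follows; the 5x5 kernel size is an assumption, as the report does not state one:

```python
import cv2

# Median filtering of the segmented mask to remove isolated
# white/black pixels. The kernel size (5) is an assumed value.
binary = cv2.imread("hand_binary.png", cv2.IMREAD_GRAYSCALE)
denoised = cv2.medianBlur(binary, 5)
cv2.imwrite("hand_denoised.png", denoised)
```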

2.2.5 Edge detection

To recognize static gestures, the model parameters derived from the description of the
shape and boundary of the hand are extracted for further processing. The Sobel operator
was chosen for edge detection. Fig. 2.5 shows some gesture images before and after the
edge detection operation using the Sobel method.

Fig. 2.5 Sobel edge detection for Open, Close and Cut
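
The Sobel operator estimates the image gradient in the x and y directions; combining the two gradients gives an edge-magnitude image. A minimal sketch, under the same placeholder file-name assumptions as above:

```python
import cv2

# Sobel gradients in x and y, combined into an edge-magnitude image.
img = cv2.imread("hand_denoised.png", cv2.IMREAD_GRAYSCALE)
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)   # horizontal gradient
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)   # vertical gradient
edges = cv2.convertScaleAbs(cv2.magnitude(gx, gy))
cv2.imwrite("hand_edges.png", edges)
```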

2.3 H/W & S/W requirement

2.3.1 Software Requirement


A set of instructions or programs required to make a hardware platform perform a
desired task is known as software. Software can also be defined as the utility
programs required to drive the hardware of a computer.
● Operating system: Microsoft Windows 7 SP1 or above
● Microsoft Visual Studio 2010
● Python
● OpenCV libraries
● Supporting webcam drivers

2.3.2 Hardware Requirement


All the physical equipment of a computer, i.e., input devices, processor, output devices,
and interconnecting components, is called hardware.
● Hard disk: minimum 40 GB
● RAM: minimum 2 GB
● Dual-core processor or better, 15" monitor
● Integrated or external webcam (15-20 fps)
Chapter 3
Analysis
3.1 Functional Requirements
● Skin Detection Module: The software shall perform skin colour detection and filter out all
objects that do not contain the colour of skin. By filtering out non-skin-coloured objects, the
system can devote its remaining resources to hand detection and gesture recognition. This
also allows the system to pinpoint possible locations of the user's hands.
● Filtered Object Detection: Once the program has filtered out most of the unwanted parts of
the picture using the skin detection module, the software shall read and recognize clusters of
skin-coloured pixels, also known as “blobs”.
● Object Location: Upon detection, the system shall be able to compute the location of the
object using simple trigonometry (one possible approach is sketched below).
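
One simple way to compute such a location is to take the centroid of the detected blob via image moments, as sketched below. This is a plausible illustration only; the report does not detail the exact trigonometric method it has in mind, and the input file name is a placeholder:

```python
import cv2

# Locate a detected blob by computing the centroid of its binary mask.
# 'hand_denoised.png' is a placeholder input file name.
mask = cv2.imread("hand_denoised.png", cv2.IMREAD_GRAYSCALE)
m = cv2.moments(mask, binaryImage=True)
if m["m00"] > 0:
    cx = int(m["m10"] / m["m00"])   # x coordinate of the centroid
    cy = int(m["m01"] / m["m00"])   # y coordinate of the centroid
    print(f"Blob centroid: ({cx}, {cy})")
```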

3.2 Non-Functional Requirements

Non-functional requirements specify criteria for the operation and architecture of the system.
● Efficiency in Computation: The software shall minimize its use of Central Processing Unit
(CPU) and memory resources. When HGR is executing, the software shall utilize less than
80% of the system's CPU resources and less than 100 megabytes of system memory.
● Extensibility: The software shall be extensible to support future developments and add-ons
to the HGR software. The gesture control module of HGR shall be at least 50% extensible to
allow new gesture recognition features to be added to the system.
● Portability: The HGR software shall be 100% portable to all operating platforms that support
the Java Runtime Environment (JRE); the software should therefore not depend on a
particular operating system.
● Performance: The software shall minimize the number of calculations needed to perform
image processing and hand gesture detection. Each captured video frame shall be processed
within 350 milliseconds, sustaining roughly 3 frames per second.
Chapter4
Design
4.1 Data Flow Diagram

A data flow diagram (DFD) is a graphical representation that shows the flow of
information and the transformations applied as data moves from one stage to another.
It can represent a system or software at different levels of abstraction.

The data flow diagram can be divided into levels to represent functional detail and
information flow; the mechanism the DFD provides is functional modeling and
information-flow modeling. The level-0 DFD, also known as the context model or
fundamental system model, represents the complete software as a single bubble, with
inputs and outputs shown by arrows. Additional processes (bubbles) and information
flow paths are exposed as the level-0 DFD is refined into more detailed levels.

Fig. 4.1 Data Flow Diagram

In DFD level 0, the user provides an input video, stored on disk, to the application with the help
of a webcam. The application performs the required operations on the video using hand gestures
and then returns the recovered video to the user.

In DFD level 1, hand gestures are captured through the webcam. The gestures are processed to
detect hand events, using histogram back-projection and hand gesture event detection methods.
With the help of these events, the degraded video is selected. Frames are extracted from that
video, and the Pix-Mix algorithm is applied to those frames for video inpainting operations such
as extracting and replacing objects. The final recovered video is then generated.
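
Histogram back projection, mentioned above, scores every pixel of a frame by how well its color matches a reference histogram (for example, of a sampled hand region), yielding a probability map of skin-like pixels. A minimal sketch, with placeholder file names:

```python
import cv2

# Back-project the hue histogram of a sample hand region onto a frame.
# 'hand_roi.png' and 'frame.png' are placeholder file names.
roi = cv2.cvtColor(cv2.imread("hand_roi.png"), cv2.COLOR_BGR2HSV)
frame = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2HSV)
hist = cv2.calcHist([roi], [0], None, [180], [0, 180])  # hue histogram
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
backproj = cv2.calcBackProject([frame], [0], hist, [0, 180], 1)
cv2.imwrite("backproj.png", backproj)   # bright pixels = likely skin
```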

Fig. 4.2 Data Flow Diagram (Level 1)

4.2 Sequence Diagram

A sequence diagram is an interaction diagram that emphasizes the time ordering of messages. A
sequence diagram gives information about a set of objects and the messages sent and received by
those objects. The objects are typically named or anonymous instances of classes, but may also
represent instances of other things, such as collaborations, components, and nodes.
Sequence diagrams are used to represent the dynamic view of a system. A sequence diagram is a
kind of interaction diagram in UML that shows how processes operate with one another and in
what order. It shows, as parallel vertical lines (lifelines), the different processes or objects that
exist at the same time and, as horizontal arrows, the messages exchanged between them, in the
order in which they occur. This allows simple runtime scenarios to be specified in a graphical style.
Fig. 4.3 Sequence Diagram

4.3 Use Case Diagram

Use case diagrams are one of the five diagrams in the Unified Modeling Language (UML)
for modeling the dynamic aspects of systems (activity diagrams, statechart diagrams, sequence
diagrams, and collaboration diagrams are the four other kinds of diagrams in the UML for
modeling the dynamic aspects of systems). Use case diagrams are central to modeling the
behavior of a system, a subsystem, or a class. Each one shows a set of use cases and actors
and their relationships.
We apply use case diagrams to model the use case view of a system. For the most part, this
involves modeling the context of a system, subsystem, or class, or modeling the requirements
of the behavior of these elements. Use case diagrams are important for visualizing, specifying,
and documenting the behavior of an element. They make systems, subsystems, and classes
approachable and understandable by presenting an outside view of how those elements may
be used in context.

Fig. 4.4 Use Case Diagram

Chapter 5
Implementation
5.1 Plan for Implementation
We are going to implement a system that recognizes gesture input using a webcam and performs
the specified operation. The application can run in the background while the user runs other
programs and applications, which makes it well suited to a hands-free approach.

This project has a vast arena for development, notably the Sixth Sense project, which completely
revolutionizes the digital world. The code can be extended to incorporate mouse movements as
well as still gestures in 3-D. Further tweaks can be incorporated to increase the efficiency of the
gesture recognition process. The code can be improved for better interpretation and recognition
of gestures, and newer gestures may be incorporated for more functionality. The user interface
for adding and checking gestures, as well as running the program, can be improved greatly, e.g.,
by providing an interactive GUI rather than using terminal commands.

1. Preparing the binary mask

First of all, we create a binary mask of the hand in order to compute the hand contour.
To keep it simple, we segment the images based on the hand's skin color using the
inRange operation, though one could of course adopt more sophisticated approaches to
build a more stable algorithm. Furthermore, we convert our frames, which are in BGR
format by default when read from a file or captured in OpenCV, to the HLS (Hue,
Lightness, Saturation) color space. The Hue channel encodes the actual color
information, so we only have to figure out the proper Hue value range of the skin
and then adjust the values for Saturation and Lightness.
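
A minimal sketch of this masking step; the Hue, Lightness and Saturation bounds below are placeholder values that would need tuning for actual skin tones:

```python
import cv2
import numpy as np

# Build a binary skin mask in HLS space. The bounds are assumed
# placeholders, not values taken from this report.
frame = cv2.imread("frame.png")                # BGR, as OpenCV loads it
hls = cv2.cvtColor(frame, cv2.COLOR_BGR2HLS)
lower = np.array([0, 40, 50])                  # assumed lower H, L, S bounds
upper = np.array([25, 200, 255])               # assumed upper H, L, S bounds
mask = cv2.inRange(hls, lower, upper)
cv2.imwrite("mask.png", mask)
```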
2. Computing the contour and its convex hull

Next, we tell OpenCV to find all contours in the mask. We return the largest contour,
in case the segmentation did not work perfectly and the mask still contains noise.
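
A sketch of this step, continuing from the mask above:

```python
import cv2

# Find all contours in the binary mask, keep the largest one (assumed
# to be the hand), and compute its convex hull.
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    hand = max(contours, key=cv2.contourArea)  # largest blob = the hand
    hull = cv2.convexHull(hand)                # hull points, e.g. for drawing
```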

3. Detecting the fingertips

Now we let OpenCV compute the convexity defects of the hand contour with respect to
its convex hull polygon. This gives us the defect regions of the contour, each described
by a starting point p0, an ending point p1, and a defect point p2. The defect points
(green circles) are located in the “valley” positions between two points of the convex
hull of the contour.
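
A sketch of the defect computation, continuing from the previous snippet. The finger-counting heuristic at the end is an assumption for illustration, not necessarily the exact rule used in this project:

```python
import cv2

# Recompute the hand contour, then its convexity defects. Note that
# convexityDefects needs hull *indices*, not hull points.
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
hand = max(contours, key=cv2.contourArea)
hull_idx = cv2.convexHull(hand, returnPoints=False)
defects = cv2.convexityDefects(hand, hull_idx)

fingers = 0
if defects is not None:
    for s, e, f, depth in defects[:, 0]:
        p0, p1, p2 = hand[s][0], hand[e][0], hand[f][0]  # start, end, valley
        # Assumed heuristic: a sufficiently deep defect separates two
        # extended fingers. Depth is in 1/256-pixel fixed-point units.
        if depth > 10000:
            fingers += 1
print("approximate finger count:", fingers + 1 if fingers else 0)
```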

 
Chapter 6
References

[1] W. Wang and J. Pan, “Hand Segmentation Using Skin Color and Background Information,”
International Conference on Machine Learning and Cybernetics, pp. 1487-1492, July 2012.
[2] A. Y. Dawod, J. Abdullah, and Md. J. Alam, “Adaptive Skin Color Model for Hand
Segmentation,” International Conference on Computer Applications and Industrial Electronics
(ICCAIE 2010), pp. 486-489, December 2010.
[3] A. Y. Dawod, J. Abdullah, and Md. J. Alam, “A New Method for Hand Segmentation Using
Free-Form Skin Color Model,” International Conference on Advanced Computer Theory and
Engineering (ICACTE), pp. V2-562-V2-566, 2010.
[4] G. Bebis, M. Nicolescu, R. D. Boyle, and X. Twombly, “A Review on Vision-Based Full DOF
Hand Motion Estimation,” IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR '05), 2005.
[5] A. Erol, G. Bebis, M. Nicolescu, R. D. Boyle, and X. Twombly, “Vision-based Hand Pose
Estimation: A Review,” Computer Vision and Image Understanding, vol. 108, pp. 52-73, 2007.
[6] X. Wu, C. Yang, Y. Wang, H. Li, and S. Xu, “An Intelligent Interactive System Based on Hand
Gesture Recognition Algorithm and Kinect,” Fifth International Symposium on Computational
Intelligence and Design, pp. 294-298, 2012.
[7] Z. Ren, J. Yuan, J. Meng, and Z. Zhang, “Robust Part-Based Hand Gesture Recognition Using
Kinect Sensor,” IEEE Transactions on Multimedia, vol. 15, pp. 1110-1120, August 2013.
[8] M. Panwar, “Hand Gesture Recognition Based on Shape Parameters,” International Conference
on Computing, Communication and Applications (ICCCA), pp. 1-6, 2012.
