Hand Gesture Recognition

This document summarizes a project submitted by three students to fulfill the requirements for a Bachelor of Engineering degree in Computer Engineering. The project aims to create a tool for human-computer interaction using hand gesture recognition. The tool utilizes computer vision techniques to recognize six different hand gestures in real-time from a webcam feed. Experiments were conducted to evaluate the system's performance. However, the algorithm would need to be more robust to work with video input in three-dimensional space.


Hand Gesture Recognition

Submitted in partial fulfillment of the requirements

of the degree of

BACHELOR OF ENGINEERING
in
COMPUTER ENGINEERING
By

Group No: 34

Roll No. Name

1604013 Mayur Chablani

1604089 Mangesh Phadse

1604106 Tejas Shenoy

Guide:

Prof. Jayant Gadge

(Assistant Professor, Department of Computer Engineering, TSEC)

Department of Computer Engineering
Thadomal Shahani Engineering College
University of Mumbai
2019-2020
CERTIFICATE

This is to certify that the project entitled “Hand Gesture Recognition” is a bonafide
work of

Roll No. Name

1604013 Mayur Chablani

1604089 Mangesh Phadse

1604106 Tejas Shenoy

Submitted to the University of Mumbai in partial fulfillment of the requirement for the award

of the degree of “BACHELOR OF ENGINEERING” in “COMPUTER ENGINEERING”.

Prof. Jayant Gadge

Guide

Dr. Tanuja Sarode
Head of Department

Dr. G. T. Thampi
Principal

Project Report Approval for B.E.

Project report entitled “Hand Gesture Recognition” by

Roll No. Name

1604013 Mayur Chablani

1604089 Mangesh Phadse

1604106 Tejas Shenoy

is approved for the degree of “BACHELOR OF ENGINEERING” in

“COMPUTER ENGINEERING”.

Examiners

1.---------------------------------------------

2.---------------------------------------------

Date:
Place:
Declaration

We declare that this written submission represents our ideas in our own
words, and where others' ideas or words have been included, we have adequately
cited and referenced the original sources. We also declare that we have adhered to
all principles of academic honesty and integrity and have not misrepresented or
fabricated or falsified any idea/data/fact/source in our submission. We understand
that any violation of the above will be cause for disciplinary action by the Institute
and can also evoke penal action from the sources which have thus not been
properly cited or from whom proper permission has not been taken when needed.

1) ____________________
(Signature)
____________________
(Name of student and Roll No.)

2) ____________________
(Signature)
____________________
(Name of student and Roll No.)

3) ____________________
(Signature)
____________________
(Name of student and Roll No.)

Date:
Abstract
We use our hands constantly to interact with things: pick them up, move them, transform their
shape, or activate them in some way. In the same unconscious way, we gesticulate in
communicating fundamental ideas: ‘stop’, ‘come closer’, ‘over there’, ‘no’, ‘agreed’, and so on.
Gestures are thus a natural and intuitive form of both interaction and communication. Gestures
and gesture recognition are terms increasingly encountered in discussions of human-computer
interaction. We present a tool created for human-computer interaction based on hand gestures.
The underlying algorithm utilizes only computer-vision techniques. The tool is able to recognize
six different hand gestures in real time, captured using a webcam. Experiments conducted to
evaluate the system's performance are reported. Hand gesture recognition could help in video gaming
by allowing players to interact with the game using gestures instead of a controller. However, such
an algorithm needs to be more robust to account for the myriad possible hand positions in three-
dimensional space. It also needs to work with video rather than static images. That is beyond the scope of
our project.

TABLE OF CONTENTS

Sr. No. Topic Page No.

List of Figures 7

1. Introduction
1.1 Introduction 8
1.2 Aim & Objective 8
1.3 Scope 8
2. Review of Literature
2.1 Domain Explanation 9
2.2 Existing Solution 11
2.3 H/W & S/W Requirement 14
3. Analysis
3.1 Functional Requirements 16
3.2 Non-Functional Requirements 16
4. Design
4.1 Data Flow Diagram 17
4.2 Sequence Diagram 18
4.3 Use Case Diagram 19
5. Implementation 22
5.1 Plan for Implementation 22
6. Conclusion 28
Appendix 29
References 30
Acknowledgement 31

List of Figures

Figure No. Page No.

Figure 2.1 6

Figure 2.2 15

Figure 2.3 16

Figure 2.4 17

Figure 2.5 18

Figure 4.1 18

Figure 4.2 19

Figure 4.3 19

Figure 4.4 19

Chapter 1

Introduction

1.1 Introduction

Gesture recognition has long been an interesting problem for the Computer Vision community,
largely because segmenting a foreground object from a cluttered background in real time is
challenging. The most obvious reason is the semantic gap between a human looking at an image
and a computer looking at the same image. Humans can easily figure out what is in an image, but
to a computer an image is just a three-dimensional matrix of pixel values. This is why computer
vision problems remain a challenge. To recognize gestures from a live video sequence, we first
need to isolate the hand region by removing all the unwanted portions of the frame. After
segmenting the hand region, we count the fingers shown in the video sequence to instruct a robot
based on the finger count.

1.2 Aim & Objective

The objective of this project is to build an application that controls specific functionalities of the
computer using hand gestures via the integrated webcam.
Our project is divided into four modules:
Module 1 – Taking input from the webcam and converting it into a form that can be processed
easily.
Module 2 – Intercepting the gesture from the webcam input.
Module 3 – Recognizing the gesture against a database of gestures.
Module 4 – Issuing the commands corresponding to the intercepted gesture.

1.3 Scope

The hand gesture recognition system can be used further to control the operation of other
system applications like Explorer, Media Player etc. To create a website which operates using
hand gestures. JavaScript can be dynamically combined with the gesture recognition logic for
the same. To use the gesture recognition logic in sensitive areas of work like hospitals and
nuclear power plants where sterility between machines and human is vital. To create a battery
free technology that enables the operation of mobile devices with hand gesture

Chapter 2
Review of Literature

2.1 Domain Explanation


2.1.1 OpenCV

OpenCV (Open Source Computer Vision) is a library of programming functions mainly aimed at
real-time computer vision. In simple language, it is a library used for image processing, mainly
used to perform all the operations related to images. OpenCV was built to provide a common
infrastructure for computer vision applications and to accelerate the use of machine perception in
commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to
utilize and modify the code.

The library has more than 2500 optimized algorithms, which include a comprehensive set of
both classic and state-of-the-art computer vision and machine learning algorithms. These
algorithms can be used to detect and recognize faces, identify objects, classify human actions in
videos, track camera movements, track moving objects, extract 3D models of objects, produce
3D point clouds from stereo cameras, stitch images together to produce a high-resolution image
of an entire scene, find similar images in an image database, remove red eyes from images
taken using flash, follow eye movements, recognize scenery and establish markers to overlay it
with augmented reality, and so on. OpenCV has a user community of more than 47 thousand
people and an estimated number of downloads exceeding 14 million. The library is used
extensively in companies, research groups and by governmental bodies.

2.1.2 Computer vision and Digital Image Processing

The sense of sight is arguably the most important of man's five senses. It provides a
huge amount of information about the world that is rich in detail and delivered at the
speed of light. However, human vision is not without its limitations, both physical and
psychological. Through digital imaging technology and computers, man has
transcended many visual limitations. He can see into far galaxies, the microscopic
world, the subatomic world, and even “observe” infra-red, x-ray, ultraviolet and other
spectra for medical diagnosis, meteorology, surveillance, and military uses, all with
great success. While computers have been central to this success, for the most part
man is the sole interpreter of all the digital data. For a long time, the central question
has been whether computers can be designed to analyze and acquire information from
images autonomously in the same natural way humans can. According to Gonzales and
Woods, this is the province of computer vision, which is that branch of artificial
intelligence that ultimately aims to “use computers to emulate human vision, including
learning and being able to make inferences and take actions based on visual inputs.”

The use of images has an obvious drawback. Humans perceive the world in 3D, but
current visual sensors such as cameras capture the world in 2D images. The result is the
natural loss of a good deal of information in the captured images. Without a proper
paradigm to explain the mystery of human vision and perception, the recovery of lost
information (reconstruction of the world) from 2D images represents a difficult hurdle for
machine vision. However, despite this limitation, computer vision has progressed, riding
mainly on the remarkable advancement of decades-old digital image processing
techniques, using the science and methods contributed by other disciplines such as
optics, neurobiology, psychology, physics, mathematics, electronics, computer science,
artificial intelligence and others.

Computer vision techniques and digital image processing methods both draw from the
same source, the digital image, and therefore necessarily overlap. Image processing
takes a digital image and subjects it to processes, such as noise reduction, detail
enhancement, or filtering, for the purpose of producing another desired image as the
end result. For example, the blurred image of a car registration plate might be enhanced
by imaging techniques to produce a clear photo of the same so the police might identify
the owner of the car. On the other hand, computer vision takes a digital image and
subjects it to the same digital imaging techniques, but for the purpose of analyzing and
understanding what the image depicts. For example, the image of a building can be fed
to a computer and thereafter be identified by the computer as a residential house, a
stadium, a high-rise office tower, a shopping mall, or a farm barn.

Russell and Norvig identified three broad approaches used in computer vision to distill
useful information from the raw data provided by images. The first is the feature
extraction approach, which focuses on simple computations applied directly to digital
images to measure some usable characteristic, such as size. This relies on generally
known image processing algorithms for noise reduction, filtering, object detection, edge
detection, texture analysis, computation of optical flow, and segmentation, techniques
which are commonly used to pre-process images for subsequent image analysis. This is
also considered an “uninformed” approach.

The second is the recognition approach, where the focus is on distinguishing and
labelling objects based on knowledge of characteristics that sets of similar objects have
in common, such as shape, appearance, or patterns of elements, sufficient to form
classes. Here computer vision uses the techniques of artificial intelligence in knowledge
representation to enable a “classifier” to match classes to objects based on the pattern
of their features or structural descriptions. A classifier has to “learn” the patterns by
being fed a training set of objects and their classes, achieving the goal of minimizing
mistakes and maximizing successes through a step-by-step process of improvement.
There are many techniques in artificial intelligence that can be used for object or pattern
recognition, including statistical pattern recognition, neural nets, genetic algorithms and
fuzzy systems.

The third is the reconstruction approach, where the focus is on building a geometric
model of the world suggested by the image or images, which is used as a basis for
action. This corresponds to the stage of image understanding, which represents the
highest and most complex level of computer vision processing. Here the emphasis is on
enabling the computer vision system to construct internal models based on the data
supplied by the images, and to discard or update these internal models as they are
verified against the real world or some other criteria. If the internal model is consistent
with the real world, then image understanding takes place. Thus, image understanding
requires the construction, manipulation and control of models, and at the moment relies
heavily upon the science and technology of artificial intelligence.

2.2 Existing Solution

2.2.1 Hand gesture image capture

The construction of a database for hand gestures (i.e., the selection of specific hand
gestures) generally depends on the intended application. A vocabulary of six static hand
gestures was made for HCI, as shown in Fig. 2.1.

Fig. 2.1

Six static hand gestures: Open, Close, Cut, Paste, Maximize and Minimize

The images were captured under two lighting conditions: natural and artificial. The
database consists of 30 images for the training set (five samples for each gesture) and
56 images for testing, with scaling, translation, and rotation effects. Employing relatively
few training images facilitates the measurement of the robustness of the proposed
methods, given that the use of algorithms that require relatively modest resources, either
in terms of training data or computation, is desirable. In addition, Guodong and Dyer
considered that using a small data set to represent each class is of practical value,
especially in problems where it is difficult to get many examples for each class.

Fig. 2.2

2.2.2 Pre-processing stage

The primary goal of the pre-processing stage is to ensure a uniform input to the classification
network. This stage includes hand segmentation to isolate the foreground (hand gesture) from the
background and the use of special filters to remove any noise caused by the segmentation
process. This stage also includes edge detection to find the final shape of the hand.

2.2.3 Hand segmentation

The hand image is segmented from the background. The segmentation process should be
fast, reliable, consistent, and able to achieve optimal image quality suitable for the
recognition of the hand gesture. Gesture recognition requires accurate segmentation. A
thresholding algorithm is used in this study to segment the gesture image (see Fig. 2.3).
Segmentation is accomplished by scanning the image pixel by pixel and labeling each
pixel as object or background depending on whether the gray level of that pixel is
greater or less than the value of the threshold T.

Fig. 2.3 Hand gesture images before and after segmentation

2.2.4 Noise reduction

Once the hand gesture image has been segmented, a special filter is applied to remove
noise by eliminating all single white pixels on the black background and all single
black pixels on the white foreground. To accomplish this, a median filter is applied to
the segmented image, as shown in Fig. 2.4.

Fig. 2.4 Median filter effect

2.2.5 Edge detection

To recognize static gestures, the model parameters derived from the description of the
shape and the boundary of the hand are extracted for further processing. The Sobel
operator was chosen for edge detection. Figure 2.5 shows some gesture images before
and after the edge detection operation using the Sobel method.

Fig. 2.5 Sobel edge detection for Open, Close and Cut

2.3 H/W & S/W requirement


2.3.1 Software Requirement
A set of instructions or programs required to make a hardware platform suitable for a
desired task is known as software. Software can also be defined as the utility
programs that are required to drive the hardware of a computer.
● Operating system: Microsoft Windows 7 SP1 or above
● Microsoft Visual Studio 2010
● Python
● OpenCV libraries
● Supporting webcam drivers

2.3.2 Hardware Requirement

All the physical equipment, i.e., the input devices, processor, output devices and
interconnecting components of the computer, is called hardware.
● Hard disk: minimum of 40 GB.
● RAM: minimum of 2 GB.
● Processor: Dual Core and up; 15” monitor.
● Integrated or external webcam (15-20 fps).

Chapter 3
Analysis
3.1 Functional Requirements
● Skin Detection Module: This software shall perform skin colour detection and filter out all
objects that do not contain the colour of skin. By filtering out non-skin-coloured objects, the
system can focus its remaining resources on hand detection and gesture recognition. This
also allows the system to pinpoint possible locations of the user's hands.
● Filtered Object Detection: Once the program has filtered out most of the unwanted parts of
the picture using the skin detection module, the software shall read and recognize
“clusters” of skin-coloured objects, also known as “blobs”.
● Object Location: Upon detection, the system shall be able to compute the location of the
object using simple trigonometry.

3.2 Non-Functional Requirements


Non-functional requirements specify the criteria for the operation and the architecture of the system.
● Efficiency in Computation: This software shall minimize the use of Central Processing Unit
(CPU) and memory resources on the operating system. When HGR is executing, the
software shall utilize less than 80% of the system's CPU resources and less than 100
megabytes of system memory.
● Extensibility: The software shall be extensible to support future developments and add-ons
to the HGR software. The gesture control module of HGR shall be at least 50% extensible to
allow new gesture recognition features to be added to the system.
● Portability: The HGR software shall be 100% portable to all operating platforms that
support the Java Runtime Environment (JRE). Therefore, this software should not depend on
the underlying operating system.
● Performance: This software shall minimize the number of calculations needed to perform
image processing and hand gesture detection. Each captured video frame shall be processed
within 350 milliseconds to achieve approximately 3 frames per second.

Chapter 4
Design
4.1 Data Flow Diagram

A data flow diagram (DFD) is a graphical representation that shows the flow of information
and the transformations applied as information moves from one stage to another. A system
or piece of software can be represented by a data flow diagram at different levels of
abstraction.

The data flow diagram can be divided into different levels so as to represent the functional
details and the flow of information. The mechanisms provided by the DFD are functional
modeling and modeling of the information flow. The level 0 DFD, also known as the context
model or fundamental system model, represents the complete software as a single bubble,
with the input and output shown by arrows. Extra processes (bubbles) and information flow
paths can be shown as the level 0 DFD is divided further to expose more detail.

fig 4.1 Data Flow diagram

In the level 0 DFD, the user provides an input video, stored on disk, to the application with the
help of the webcam. The application performs the required operations on the video using hand
gestures and then returns the recovered video to the user.

In the level 1 DFD, hand gestures are captured through the webcam and processed to detect hand
events, using histogram back-projection and hand gesture event detection methods. With the help
of these events, the degraded video is selected. Frames are extracted from that video and the
Pix-Mix algorithm is applied to those frames for video inpainting operations such as extracting
and replacing objects. The final recovered video is then generated.

fig 4.2 Data Flow diagram (level 1)

4.2 Sequence diagram

An interface chart is the grouping graph which highlights on the time request of the messages. A
grouping chart give the data around a set of items and the messages sent and got by those articles.
The articles are regularly named or unacknowledged examples of classes, yet might likewise speak
to occasions of different things, for example, coordinated efforts, segments, and hubs.
The arrangement graphs are utilized to speak to the element perspective of the framework. A
grouping chart is a kind of correspondence outline in UML that indicates how courses of action
work one with an alternate and in what sort. A succession graph exhibit, as parallel vertical lines,
unique techniques or substance that inhabit the same time, and as straight shafts, the
correspondence exchanged between them, in the request in which they happen. This allows the state
of simple runt

18
fig 4.3 Sequence diagram

4.3 UseCase Diagram

Use (Utilization) case diagrams are one of the five graphs in the Unified Modeling Language
(UML) for displaying the element parts of frameworks (movement outlines, state graph graphs,
arrangement outlines, and cooperation outlines are four different sorts of charts in the UML for
demonstrating the element parts of frameworks). Utilization case outlines are fundamental to
displaying the conduct of a framework, a subsystem, or a class. Every one demonstrates a set of
utilization cases and performing artists and their connections.
We apply utilize case charts to model the utilization case perspective of a framework. Generally,
this includes demonstrating the connection of a framework, subsystem, or class, or displaying the
necessities of the conduct of these components. Utilization case graphs are imperative for
imagining, determining, and archiving the conduct of a component. They make frameworks,
subsystems, and classes agreeable and justifiable by exhibiting an outside perspective of how those
components may be utilized as a part of connection.

fig 4.4 Use case Diagram

Chapter 5
Implementation

5.1 Plan for Implementation

We are going to implement a system that recognizes gesture input using a webcam and performs
the specified operation. The application can run in the background while the user runs other
programs and applications, which makes it very useful for a hands-free approach.

This project has a vast arena for development, notably the SixthSense project, which completely
revolutionizes the digital world. The code can be extended to incorporate mouse movements as
well as still gestures in 3-D. Further tweaks can be incorporated in the code to increase the
efficiency of the gesture recognition process. The code can be improved for better interpretation
and recognition of gestures, and newer gestures may be incorporated for more functionalities.
The user interface for adding and checking gestures, as well as running the program, can be
improved greatly, e.g. by providing an interactive GUI rather than using terminal commands.

1. Preparing the binary mask

First of all, we will create a binary mask of the hand in order to compute the hand contour. To
keep it simple, we will segment the images based on the hand's skin color using the inRange
operation, though one could of course come up with more sophisticated approaches to build a
more stable algorithm. Furthermore, we will convert our frames, which are in BGR format by
default when read from a file or captured in OpenCV, to the HLS (Hue, Lightness, Saturation)
color space. The Hue channel encodes the actual color information, so we only have to figure
out the proper Hue range for skin and then adjust the values for Saturation and Lightness.

2. Computing the contour and its convex hull

Next we will tell OpenCV to find all contours in the mask. We keep only the largest contour, in
case the segmentation did not work perfectly and the mask still contains noise.

3. Detecting the fingertips

Now we let OpenCV compute the convexity defects of the hand contour relative to its convex
hull. This gives us the defect regions of the contour, each described by a start point p0, an end
point p1, and a defect point p2. The defect points (green circles) are located in the “valley”
positions between two points of the convex hull of the contour.

Chapter 6
REFERENCES
[1] W. Wang and J. Pan, “Hand Segmentation Using Skin Color and Background Information,”
International Conference on Machine Learning and Cybernetics, pp. 1487-1492, July 2012.
[2] A. Y. Dawod, J. Abodullah, and Md. J. Alam, “Adaptive Skin Color Model for Hand
Segmentation,” International Conference on Computer Applications and Industrial Electronics
(ICCAIE 2010), pp. 486-489, December 2010.
[3] A. Y. Dawod, J. Abodullah, and Md. J. Alam, “A New Method for Hand Segmentation Using
Free-Form Skin Color Model,” International Conference on Advanced Computer Theory and
Engineering (ICACTE), pp. V2-562-V2-566, 2010.
[4] G. Bebis, M. Nicolescu, R. D. Boyle, and X. Twombly, “A Review on Vision-Based Full DOF
Hand Motion Estimation,” IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR '05), 2005.
[5] A. Erol, G. Bebis, M. Nicolescu, R. D. Boyle, and X. Twombly, “Vision-Based Hand Pose
Estimation: A Review,” Computer Vision and Image Understanding, vol. 108, pp. 52-73, 2007.
[6] X. Wu, C. Yang, Y. Wang, H. Li, and S. Xu, “An Intelligent Interactive System Based on Hand
Gesture Recognition Algorithm and Kinect,” Fifth International Symposium on Computational
Intelligence and Design, pp. 294-298, 2012.
[7] Z. Ren, J. Yuan, J. Meng, and Z. Zhang, “Robust Part-Based Hand Gesture Recognition Using
Kinect Sensor,” IEEE Transactions on Multimedia, vol. 15, pp. 1110-1120, August 2013.
[8] M. Panpar, “Hand Gesture Recognition Based on Shape Parameters,” International Conference
on Computing, Communication and Applications (ICCCA), pp. 1-6, 2012.

ACKNOWLEDGEMENT

It gives us great pleasure to present this project report titled “HAND GESTURE
RECOGNITION”, and we wish to express our immense gratitude to the people
who provided invaluable knowledge and support in the completion of this project. Their
guidance and motivation have helped in making this project a great success. We express our
gratitude to our project guide, Prof. Jayant Gadge, who provided us with all the
guidance and encouragement throughout the project development. We would also like to
express our sincere gratitude to the respective project coordinators, and we are thankful to
them for providing the needed assistance, detailed suggestions and encouragement to
complete the project. We would like to express our sincere gratitude to our respected
principal, Dr. G. T. Thampi, and the management of Thadomal Shahani Engineering College
for providing such an ideal atmosphere to build this project, with a well-equipped library
containing all the necessary reference materials and up-to-date IT laboratories. We are
extremely thankful to all the staff and the management of the college for providing us with
all the facilities and resources required.
