Final Review Report
On
“Gesture Recognition Using ML”
Submitted By
S.No Admission No. Student Name Degree/Branch Semester
CANDIDATE’S DECLARATION
We hereby certify that the work presented in this project, entitled “Gesture Recognition
Using ML”, in partial fulfillment of the requirements for the award of the B.Tech-CSE, submitted in the
School of Computing Science and Engineering of Galgotias University, Greater Noida, is an original
work carried out over a period of 6 months in 2022, under the supervision of Mr. Gautam Kumar (Assistant
Professor), Department of Computer Science and Engineering / Computer Applications and Information
Science, School of Computing Science and Engineering, Galgotias University, Greater Noida.
The matter presented in this project has not been submitted by us for the award of
any other degree of this or any other institution.
Priyabrath Tripathi (20SCSE1010267)
Prakhar Tripathi (20SCSE1010045)
This is to certify that the above statement made by the candidates is correct to the
best of my knowledge.
Mr. Gautam Kumar
Assistant Professor
CERTIFICATE
Date: December 2023
List of Tables
List of Figures
Table of Contents
Abstract
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Problem Statement
1.2 Tool and Technology Used
1.2.1 Data Collection
1.2.2 Image Processing
1.2.3 Pattern recognition
1.2.4 Tools Used
1.3 Challenges in Gesture Recognition
1.4 Types of Approaches
1.5 Hand gesture recognition application domain
Chapter 2 Literature Survey/Project Design
2.1 Existing Literature
2.2 Project Requirements
2.2.1 Domain Analysis
2.3 System Design
2.3.1 UML diagram
2.3.2 DFD Diagram
2.3.3 Flowchart Diagram
2.3.4 ER-Diagram
Chapter 3 Module Description
3.1 Data Collection
3.2 Data Processing
CHAPTER-1
INTRODUCTION
Regrettably, in today's rapidly changing culture, people with speech and hearing
impairments are frequently neglected and excluded. Normal people face difficulty
in understanding their language. Hence, there is a need for a system that
recognizes the different signs and gestures and conveys the information to normal
people. It bridges the gap between physically challenged people and normal
people.
It will be a fantastic tool for persons with hearing impairments to convey their
thoughts, as well as a great way for non-sign-language users to grasp what they are
saying. Many countries have their own sets of sign motions and interpretations.
An alphabet in Korean sign language, for example, will not be the same as an
alphabet in Indian sign language. While this emphasizes the diversity of sign
languages, it also emphasizes their complexity. A deep learning model must be
trained on a wide variety of gestures in order to achieve a reasonable level of accuracy.
Hence, a system that recognizes gestures and conveys their meaning to normal
people is required. It connects persons who are physically handicapped with others
who are not.
• Output: the result can be an altered image or a report based on image
analysis.
The prerequisite software and libraries for the sign language project are:
Python
IDE (Visual Studio Code)
NumPy
OpenCV (cv2)
Keras
TensorFlow
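A quick, minimal sketch for checking that the required libraries are available in the environment (the import and package names below are the standard ones and are an assumption about the exact setup):

import numpy
import cv2
import keras
import tensorflow

# If any import fails, the package can be installed with pip,
# e.g. pip install numpy opencv-python keras tensorflow
print(numpy.__version__, cv2.__version__, tensorflow.__version__)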
Hardware tools required:
Monitor
Keyboard and Mouse
Digital Camera (Webcam)
2. Robotics and tele-robotics—Actuators and the motions of robotic arms, legs, and other parts
can be controlled by simulating a human’s actions.
3. Games and virtual reality—Virtual reality enables realistic interaction between the user and the
virtual environment. It simulates the user's movements and translates them into the 3D world.
CHAPTER-2
LITERATURE SURVEY / PROJECT DESIGN
A survey of the literature for our proposed system reveals that many
attempts have been made to solve sign identification in videos and photos using
various methodologies and algorithms.
In [5], Pigou used CLAP14 as his dataset [6]. It consists of 20 Italian sign
gestures. After preprocessing the images, he used a convolutional neural network
model with 6 layers for training. It is to be noted that his model is not a 3D CNN;
all the kernels are 2D. He used Rectified Linear Units (ReLU) as activation
functions. Feature extraction is performed by the CNN, while classification uses
an ANN (fully connected) layer. His work achieved an accuracy of 91.70% with
an error rate of 8.30%.
2.2 Project Requirements
2.2.1 Domain Analysis
The domain analysis that we did for the project mainly involved
understanding convolutional neural networks (CNNs).
The first (or bottom) layer of the CNN usually detects basic features such as
horizontal, vertical, and diagonal edges. The output of the first layer is fed as input
to the next layer, which extracts more complex features, such as corners and
combinations of edges. As you move deeper into the convolutional neural network,
the layers start detecting higher-level features such as objects, faces, and more.
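As an illustration of what such a first-layer filter does, the following minimal sketch applies a hand-crafted vertical-edge kernel to an image with OpenCV; the input file name is hypothetical, and a trained CNN would learn comparable kernels automatically:

import cv2
import numpy as np

# A hand-crafted vertical-edge kernel, similar in spirit to the filters
# a first convolutional layer learns during training.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=np.float32)

img = cv2.imread("gesture.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input image
edges = cv2.filter2D(img, cv2.CV_32F, kernel)            # convolve the kernel over the image
cv2.imwrite("edges.jpg", cv2.convertScaleAbs(edges))     # visualize the detected edges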
Convolutional Layer- The convolutional layer is the core building block of a
CNN, and it is where the majority of computation occurs. It requires a few
components, which are input data, a filter, and a feature map. Let’s assume that
the input will be a color image, which is made up of a matrix of pixels in 3D. This
means that the input will have three dimensions—a height, width, and depth—
which correspond to RGB in an image. We also have a feature detector, also known
as a kernel or a filter, which will move across the receptive fields of the image,
checking if the feature is present. This process is known as a convolution.
Fully-Connected Layer- This layer performs the task of classification based on the features extracted
through the previous layers and their different filters. While convolutional and pooling
layers tend to use ReLU functions, FC layers usually leverage a softmax activation
function to classify inputs appropriately, producing a probability from 0 to 1.
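A minimal Keras sketch of a small CNN in the spirit described here and in Chapter 3 (the layer sizes, the 64x64 single-channel input, and the number of gesture classes are illustrative assumptions, not the exact trained model):

import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 6  # assumption: number of gesture classes in the dataset

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(64, 64, 1)),          # 64x64 binary gesture image
    layers.MaxPooling2D((2, 2)),                     # pooling
    layers.Conv2D(64, (3, 3), activation="relu"),    # convolution + ReLU
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(num_classes, activation="softmax"), # probabilities from 0 to 1
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()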
2.3 System Design
Use Case Diagram - Use case diagrams are used during requirement elicitation and analysis to
represent the functionality of the system. A use case describes a function provided by
the system that yields a visible result for an actor. The identification of
actors and use cases results in the definition of the boundary of the system,
i.e., differentiating the tasks accomplished by the system from the tasks
accomplished by its environment. The actors are outside the boundary of the
system, whereas the use cases are inside the boundary of the system. A use
case describes the behavior of the system as seen from the actor’s point of
view. It describes the function provided by the system as a set of events that
yield a visible result for the actor.
Sequence Diagram - A sequence diagram displays the time sequence of the
objects participating in the interaction. It consists of a vertical
dimension (time) and a horizontal dimension (the different objects).
Objects: An object can be viewed as an entity at a particular point in time with
a specific value and as a holder of identity. A sequence diagram shows object
interactions arranged in time sequence. It depicts the objects and classes
involved in the scenario and the sequence of messages exchanged between the
objects needed to carry out the functionality of the scenario. Sequence
diagrams are typically associated with use case realizations in the Logical
View of the system under development. Sequence diagrams are sometimes
called event diagrams or event scenarios.
2.3.2 DATA FLOW DIAGRAM (DFD)
The DFD is also known as a bubble chart. It is a simple graphical formalism
that can be used to represent a system in terms of the input data to the system, the various
processing carried out on these data, and the output data generated by the system.
It maps out the flow of information for any process or system: how data is
processed in terms of inputs and outputs. It uses defined symbols like rectangles,
circles, and arrows to show data inputs, outputs, storage points, and the routes
between each destination. DFDs can be used to analyze an existing system or to
model a new one.
A DFD can often visually “say” things that would be hard to explain in words,
and it works for both technical and non-technical audiences. There are four
components in a DFD:
1. External Entity
2. Process
3. Data Flow
4. Data Store
2.3.3 Flowchart
A flowchart is a sort of diagram that depicts an algorithm or process by
showing the stages as various types of boxes and linking them with arrows to
illustrate their sequence. The boxes represent the individual operations, and the arrows
indicate the order in which they are performed. Flowcharts are used in a variety
of areas to analyze, develop, record, and manage a process or programme. In a
flowchart, the two most frequent sorts of boxes are processing steps and decisions.
Fig-5 Flowchart
2.3.4 Entity Relationship Diagram
An Entity Relationship Diagram (ERD) is a diagram that shows the
relationships between the entity sets recorded in a database. In other words, ER
diagrams help explain the logical structure of databases. Entities, attributes,
and relationships are the three core ideas that ER diagrams are built on.
An ER diagram appears quite similar to a flowchart at first glance. The ER
diagram, however, has numerous specific symbols, and the meanings of
these symbols distinguish this model. The ER diagram represents the entity
framework architecture.
Fig-6 ER-Diagram
CHAPTER-3
MODULE DESCRIPTION
Data Collection: Images are captured with a web camera (a minimal capture sketch follows this list).
Image Processing: Backgrounds are detected and eliminated with HSV, then
morphological operations are performed and masks are applied.
The hand gesture is segmented, after which the image is resized.
Feature Extraction: Binary pixels.
Classification: Using a CNN with 3 layers.
Evaluation: The precision, recall, and F-measure for each class are
determined.
Prediction: The system predicts the input gesture of the user and displays the result.
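A minimal sketch of the data-collection step, capturing frames from the web camera with OpenCV (the device index 0 and the quit key are assumptions):

import cv2

cap = cv2.VideoCapture(0)                    # open the default webcam
while cap.isOpened():
    ret, frame = cap.read()                  # grab one frame
    if not ret:
        break
    cv2.imshow("Capture", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):    # press 'q' to stop capturing
        break
cap.release()
cv2.destroyAllWindows()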
Because the photos are in the RGB color space, segmenting the hand motion only on
the basis of skin color becomes more challenging. As a result, we convert the
photos to the HSV color space. It is a model that divides an image's color into three
components: hue, saturation, and value. HSV is a useful technique for improving
image stability by separating brightness from chromaticity. Because the hue
component is largely unaffected by light, shadows, or shading, it may be used to
remove backgrounds. To detect the hand motion and set the backdrop to black, a
track-bar with H values ranging from 0 to 179, S values ranging from 0 to 255, and V
values ranging from 0 to 255 is utilized. With an elliptical kernel, dilation and erosion
operations are performed on the hand gesture region.
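A minimal OpenCV sketch of this masking step, assuming an already captured frame and hypothetical HSV bounds (in the project the bounds are tuned interactively with track-bars):

import cv2
import numpy as np

frame = cv2.imread("frame.jpg")                    # a frame captured from the webcam (hypothetical file)
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)       # separate chromaticity from brightness

lower = np.array([0, 40, 60])                      # hypothetical lower H, S, V bounds
upper = np.array([20, 255, 255])                   # hypothetical upper H, S, V bounds
mask = cv2.inRange(hsv, lower, upper)              # skin-colored pixels -> white, rest -> black

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))   # elliptical kernel
mask = cv2.dilate(mask, kernel, iterations=2)
mask = cv2.erode(mask, kernel, iterations=2)

hand = cv2.bitwise_and(frame, frame, mask=mask)    # backdrop set to black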
Fig-8 (a) Image captured from web-camera. (b) Image after background is set to black using HSV (first image).
B. Segmentation
After that, the first image is converted to grayscale. While this technique may
result in a loss of color in the skin gesture region, it also improves our system's
resilience to changes in lighting or illumination. The converted image's non-black
pixels are binarized, while the others are left intact, resulting in a black background.
The hand gesture is segmented in two steps: first, by finding all of the image's
connected components, and then by keeping only the component that actually
corresponds to the hand gesture. The frame is then resized to 64 by 64 pixels. After
the segmentation process, binary pictures of 64 by 64 pixels are created, with the
white region representing the hand gesture and the black area representing the
remainder.
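A minimal sketch of this segmentation step, continuing from the masked image in the previous sketch (keeping the largest connected component is an assumption about how the relevant region is selected):

import cv2
import numpy as np

gray = cv2.cvtColor(hand, cv2.COLOR_BGR2GRAY)               # 'hand' from the masking sketch
_, binary = cv2.threshold(gray, 1, 255, cv2.THRESH_BINARY)  # non-black pixels -> white

num, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
if num > 1:
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])    # skip label 0 (background)
    binary = np.where(labels == largest, 255, 0).astype(np.uint8)

segmented = cv2.resize(binary, (64, 64))                    # final 64x64 binary gesture image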
Fig-9 (a) Image after binarization. (b) Image after segmentation and resizing.
C. Feature Extraction
The ability to identify and extract relevant elements from a picture is one of the
most significant aspects of image processing. Images, when collected and saved as
a dataset, typically take up a lot of space since they contain a lot of data. Feature
extraction helps us solve this challenge by automatically reducing the data
once the key characteristics have been extracted. It also helps to preserve the
classifier's accuracy while simultaneously reducing its complexity. The binary pixels
of the photographs were determined to be the critical features in our scenario. We were
able to gather enough characteristics by scaling the photos to 64 by 64 pixels to properly
categorize the sign language motions.
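A minimal sketch of collecting these binary-pixel features into training arrays (the directory layout and label names are assumptions made purely for illustration):

import os
import cv2
import numpy as np

def load_dataset(root="dataset"):
    # Assumed layout: dataset/<gesture_label>/<image>.png
    features, labels = [], []
    for label in sorted(os.listdir(root)):
        for name in os.listdir(os.path.join(root, label)):
            img = cv2.imread(os.path.join(root, label, name), cv2.IMREAD_GRAYSCALE)
            img = cv2.resize(img, (64, 64))           # ensure a 64x64 frame
            img = (img > 0).astype(np.float32)        # binary pixels as features
            features.append(img.reshape(64, 64, 1))
            labels.append(label)
    return np.array(features), np.array(labels)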
3.3 Classification
Machine learning algorithms for classification can be categorized as
supervised or unsupervised. Supervised machine learning is a method of teaching a
computer to detect patterns in input data that can subsequently be used to predict
future data. Supervised machine learning uses a collection of labelled training data
to infer a function that maps inputs to outputs. Unsupervised machine
learning is used to draw conclusions from datasets that have no labelled responses.
There is no reward or penalty weighting for the classes the data is intended
to belong to, because no labelled response is supplied to the classifier.
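A minimal sketch of the supervised training and evaluation steps (precision, recall, and F-measure per class), reusing the load_dataset helper and the CNN model sketched earlier; scikit-learn is used here for the split and the report, which is an assumption since it is not listed among the prerequisites, and the split ratio and epoch count are illustrative:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tensorflow.keras.utils import to_categorical

X, y = load_dataset()                                   # 64x64 binary images and string labels
classes = sorted(set(y))
y_idx = np.array([classes.index(c) for c in y])

X_train, X_test, y_train, y_test = train_test_split(
    X, y_idx, test_size=0.2, stratify=y_idx)

# Supervised learning: the labelled gestures drive the weight updates.
model.fit(X_train, to_categorical(y_train, len(classes)), epochs=10, batch_size=32)

pred = np.argmax(model.predict(X_test), axis=1)
print(classification_report(y_test, pred, target_names=classes))   # precision, recall, F-measure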
CHAPTER-4
CODE
1. Generated Gesture
# Utility functions for converting Pascal-VOC .xml gesture annotations into a
# Pandas DataFrame and TensorFlow examples.
import os
import glob
import io
import argparse                       # used by the command-line entry point, not shown here
import xml.etree.ElementTree as ET
from collections import namedtuple

import pandas as pd

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'   # Suppress TensorFlow logging
import tensorflow.compat.v1 as tf
from PIL import Image


def xml_to_csv(path):
    """Collect the .xml annotations found in `path` into a single DataFrame.

    path : str
        The path containing the .xml files.
    Returns
    -------
    Pandas DataFrame
        The produced dataframe.
    """
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),    # width
                     int(root.find('size')[1].text),    # height
                     member[0].text,                     # class label
                     int(member[4][0].text),             # xmin
                     int(member[4][1].text),             # ymin
                     int(member[4][2].text),             # xmax
                     int(member[4][3].text))             # ymax
            xml_list.append(value)
    column_name = ['filename', 'width', 'height',
                   'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def class_text_to_int(row_label):
    # Map a gesture label to its integer id (label-map details omitted in the report).
    return None


def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x))
            for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size
    filename = group.filename.encode('utf8')

    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    # ... (the remainder of this function, which assembles the tf.train.Example,
    # is not reproduced in the report)
Many breakthroughs have been made in the fields of artificial intelligence, machine
learning, and computer vision. They have immensely contributed to how we
perceive things around us and improved the way in which we apply their techniques
in our everyday lives. Much research has been conducted on sign gesture
recognition using different techniques like ANN, LSTM, and 3D CNN. However,
most of them require extra computing power. On the other hand, our approach
requires low computing power and gives a remarkable accuracy of above 90%. In
our research, we proposed to normalize and rescale our images to 64 pixels in order
to extract features (binary pixels) and make the system more robust.
The movements, body language, and facial expressions used in sign languages
vary greatly from nation to nation. The syntax and structure of a sentence might
also differ significantly. Learning and recording motions was a difficult task in our
study, since hand movements had to be accurate and on point. Certain movements
are difficult to duplicate, and keeping our hands in the same position while
compiling our dataset was difficult.
We hope to expand our datasets with other alphabets and refine the model so that it
can recognize more alphabetical characteristics while maintaining high accuracy.
We would also like to improve the system further by including voice recognition so that
blind individuals may benefit as well.
Models trained in this way can respond to human gestures accurately. In conclusion, this
technology has demonstrated significant advancements in real-time gesture analysis,
enhancing human-computer interaction and fostering accessibility in numerous fields.
One of the key strengths of machine learning-based gesture recognition lies in its
ability to continuously improve accuracy and efficiency with more extensive
datasets. Further enhancements could involve refining existing models through
increased data diversity, encompassing various gestures across different
demographics and cultural backgrounds. Additionally, integrating multimodal
approaches by combining gesture recognition with other sensory inputs like voice
or facial expressions could enhance the overall system performance.
REFERENCES
[1] https://peda.net/id/08f8c4a8511
[2] K. Bantupalli and Y. Xie, "American Sign Language Recognition using Deep Learning
and Computer Vision," 2018 IEEE International Conference on Big Data (Big Data),
Seattle, WA, USA, 2018, pp. 4896-4899, doi:10.1109/BigData.2018.8622141.
[3] M. Geetha and U. C. Manjusha, “A Vision Based Recognition of Indian Sign Language
Alphabets and Numerals Using B-Spline Approximation”, International Journal on Computer
Science and Engineering (IJCSE), vol. 4, no. 3, pp. 406-415, 2012.
[4] S. He, "Research of a Sign Language Translation System Based on
Deep Learning," 2019, pp. 392-396, doi:10.1109/AIAM48774.2019.00083.
[5] Pigou L., Dieleman S., Kindermans PJ., Schrauwen B. (2015) Sign Language
Recognition Using Convolutional Neural Networks. In: Agapito L., Bronstein
M., Rother C. (eds) Computer Vision - ECCV 2014 Workshops. ECCV 2014. Lecture Notes in
Computer Science, vol 8925. Springer, Cham.
[6] Escalera, S., Baró, X., Gonzàlez, J., Bautista, M., Madadi, M., Reyes, M., . . . Guyon,
I. (2014). ChaLearn Looking at People Challenge 2014: Dataset and
Results. Workshop at the European Conference on Computer Vision (pp. 459-473). Springer,
Cham.