
BHARATI VIDYAPEETH DEEMED TO BE UNIVERSITY
DEPARTMENT OF ENGINEERING AND TECHNOLOGY, NAVI MUMBAI CAMPUS
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

HAND GESTURE RECOGNITION

Project Members:
Rohit Majumder
Bhaskar Chaurasia
Aditya Tyagi
Sanika Khankale

Guide:
Prof. Datta Deshmukh
Introduction

In the past few years, huge advancements have been made in science and technology. Technology has also become much cheaper, and its availability has widened: it is now accessible to the common man. It is therefore the duty of our generation to use this accessibility to technology to contribute to the progress and improvement of society at large.

Human beings have, since the beginning of time, been described as social animals. As social beings, one of the principal aspects of our lives is communication. Communication has always been regarded as essential to living a happy life; it is necessary for a normal lifestyle and is required for almost all of our daily tasks. But there is a less fortunate segment of society that faces hearing and vocal disabilities. A hearing-impaired individual is one who either cannot hear at all or can only hear sounds above a certain frequency, or what we would colloquially describe as 'can only hear when spoken to loudly'. An individual who is unable to speak, for whatever reason, is considered mute.
Indian Sign Language

History
● ISL uses both hands, like British Sign Language, and is similar to International Sign Language.
● The ISL alphabets are derived from the British Sign Language and French Sign Language alphabets.
● Unlike its American counterpart, which uses one hand, ISL uses both hands to represent alphabets.
Indian Sign Language

[Figure: side-by-side comparison of the American Sign Language and Indian Sign Language alphabets]
Literature Review

Sr. No. | Paper Title [Year of Publication]        | Technology Used        | Advantages               | Limitations
1       | HSRS using KNN [2011]                    | KNN                    | High accuracy            | Very inefficient
2       | ISLRS using CNN [2021]                   | CNN                    | Accuracy and efficiency  | Requires huge training data
3       | ISLRS using SURF with SVM and CNN [2022] | SURF-based SVM and CNN | Very robust              | Requires huge training data
4       | ISLRS using Customized CNN [2023]        | Customized CNN         | Customisable             | Requires huge training data
Problem Statement
Despite the richness and effectiveness of sign language as a mode of communication for
the hearing-impaired community, the lack of widespread understanding and interpretation
by non-signers poses significant barriers to effective communication. Presently, there is a
pronounced absence of reliable, real-time hand sign language recognition systems that can
accurately interpret diverse sign gestures, adapt to varying signing styles, and facilitate
seamless communication between signers and non-signers.

Existing systems often face challenges in accurately recognizing intricate hand movements,
handling variations in sign gestures, and ensuring real-time responsiveness. The absence of
a robust, adaptable, and universally applicable hand sign language recognition system limits
the ability of individuals using sign language to interact inclusively with the broader community.

The primary problem addressed by this project revolves around the development of an
accurate, adaptable, and real-time hand sign language recognition system capable of overcoming
the challenges posed by variations in gestures, environmental factors, and different signing styles.
The system aims to bridge the communication gap between signers and non-signers, thereby
fostering inclusivity, enhancing accessibility, and enabling effective communication across
diverse settings.
Objectives
The primary objective of this project is to explore the efficacy and potential advantages of
employing Graph Convolutional Networks (GCNs) in contrast to traditional Convolutional Neural
Networks (CNNs) for hand sign language recognition.

Hand gesture recognition technology uses sensors or a web camera to read and interpret hand movements as commands.

The objective of hand gesture recognition for mute and deaf people is to provide an alternative means of communication for those who are unable to speak or hear.

In automotive applications, the same technology allows drivers and passengers to interact with the vehicle, typically to control the infotainment system without touching any buttons or screens.
Existing System

Indian Sign Language Recognition System using SURF (Speeded-Up Robust Features) with SVM and CNN [2022]
By Uma Shanker Tiwary, Shagun Katoch, Varsha Singh

Sign Language Recognition: High Performance Deep Learning Approach Applied to Multiple Sign Languages [2022]
By Nabil Benaya, El Zaar, El Allati

Pipeline:
Frame Extraction → Skin Segmentation → Feature Extraction → Training and Testing
Skin Segmentation

Initial Approaches
● Training on a skin segmentation dataset
We tried machine learning models such as SVM and random forests on the skin segmentation dataset from https://archive.ics.uci.edu/ml/datasets/Skin+Segmentation.
The dataset turned out to be a poor fit: after training on around 200,000 points, skin segmentation of hand images returned an almost entirely black image (i.e., almost no skin was detected).

● HSV model with constraints on the values of H and S
Convert the image from RGB to the HSV model and retain the pixels satisfying 25 < H < 230 and 25 < S < 230.
This implementation was not very effective on its own; the authors of the original report used it together with motion segmentation, which made their segmentation slightly better. A sketch of this thresholding is shown after this list.
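A minimal sketch of the HSV thresholding above, assuming OpenCV and NumPy. OpenCV's HSV_FULL conversion scales both H and S to 0–255, which matches the 25–230 constraints; the plain HSV conversion scales H to 0–179 and would need rescaled thresholds.

```python
import cv2
import numpy as np

def hsv_skin_mask(image_bgr):
    """Retain pixels with 25 < H < 230 and 25 < S < 230 (the slide's constraints)."""
    # HSV_FULL maps H to 0-255 (the plain HSV conversion maps H to 0-179).
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV_FULL)
    h, s = hsv[..., 0], hsv[..., 1]
    mask = (h > 25) & (h < 230) & (s > 25) & (s < 230)
    return mask.astype(np.uint8) * 255  # white = candidate skin pixel
```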
Skin Segmentation

Final Approach
In this approach, we transform the image from RGB space to the YIQ and YUV spaces. From U and V, we get θ = tan⁻¹(V/U). In the original approach, the author classified skin pixels as those with 30 < I < 100 and 105° < θ < 150°.
Since those parameters were not working well for us, we tweaked them, and the result performed much better than the previous two approaches. A sketch of the transform is shown below.
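A minimal sketch of this transform, assuming the standard NTSC/BT.601 weights for the I, U, and V channels and RGB values in [0, 255]; the thresholds shown are the original author's values, before our tweaks.

```python
import numpy as np

def yiq_yuv_skin_mask(rgb):
    """Boolean skin mask from 30 < I < 100 and 105° < θ < 150°.

    rgb: float array of shape (H, W, 3), channels in [0, 255].
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i_chan = 0.596 * r - 0.274 * g - 0.322 * b   # I channel of YIQ (NTSC weights)
    u = -0.147 * r - 0.289 * g + 0.436 * b       # U channel of YUV (BT.601 weights)
    v = 0.615 * r - 0.515 * g - 0.100 * b        # V channel of YUV
    theta = np.degrees(np.arctan2(v, u))         # θ = tan⁻¹(V/U)
    return (i_chan > 30) & (i_chan < 100) & (theta > 105) & (theta < 150)
```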
Bag of Visual Words

Bag of Words approach
In the BoW approach to text classification, a document is represented as a bag (multiset) of its words.
In Bag of Visual Words, we apply the BoW approach to image classification, whereby every image is treated as a document. So "words" now need to be defined for images as well; a sketch of the pipeline follows.
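A minimal sketch of one common way to define visual words, assuming OpenCV and scikit-learn: SIFT descriptors are clustered with k-means, and each image becomes a histogram over the resulting cluster centres (the "visual words"). The vocabulary size k is an arbitrary choice here.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(train_images, k=100):
    """Cluster SIFT descriptors from the training images into k visual words."""
    sift = cv2.SIFT_create()
    all_descriptors = []
    for img in train_images:                      # img: grayscale uint8 array
        _, des = sift.detectAndCompute(img, None)
        if des is not None:
            all_descriptors.append(des)
    return KMeans(n_clusters=k, n_init=10).fit(np.vstack(all_descriptors))

def bovw_histogram(img, vocabulary):
    """Represent one image as a normalized histogram over the visual words."""
    sift = cv2.SIFT_create()
    _, des = sift.detectAndCompute(img, None)
    k = vocabulary.n_clusters
    if des is None:
        return np.zeros(k)
    words = vocabulary.predict(des.astype(np.float32))
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / max(hist.sum(), 1.0)
```

These histograms can then be fed to any classifier, such as an SVM, for the training and testing stages.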
Result Obtained for Bag of Visual Words

We took 25 images per alphabet from each of 3 persons for training and 25 images per alphabet from another person for testing. So, training over 1950 images and testing on 650 images, we obtained the following results:

Train Set Size | Test Set Size | Correctly Classified | Accuracy
1950           | 650           | 220                  | 33.84%

Result Obtained for Bag of Visual Words

Observations
● Similar-looking alphabets were misclassified amongst each other.
● One of the 3 persons was left-handed and gave laterally inverted images for many alphabets.
Proposed System

Indian Sign Language Recognition System using GCNs (Graph Convolutional Networks)

An end-to-end deep neural network, the hand gesture graph convolutional network, is presented, in which the convolution is conducted only on linked skeleton joints. Since the training dataset is relatively small, this work proposes expanding the coordinate dimensionality so as to let models learn more semantic features. Furthermore, relative coordinates are employed to help the hand gesture graph convolutional network learn a feature representation independent of the random starting positions of actions. The proposed method is validated on two challenging datasets, and the experimental results show that it outperforms the state-of-the-art methods. Furthermore, it is relatively lightweight in practice for hand skeleton-based gesture recognition.
Convolutional networks perform well on images and on skeleton joint sequences, which can be seen as special images. However, arranging the joints as pixels in an image destroys the original human body topology. Once this defect was noticed, the ST-GCN model was proposed to limit the convolutional operations to the linked joints. Inspired by that work, we propose HG-GCN for hand gesture recognition. To put the proposed model into context, a brief overview of the ST-GCN structure is first provided. Then, we describe our HG-GCN for hand gesture recognition. We also detail two strategies to process the embedded data, sketched below.
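A minimal sketch of the two embedding strategies, assuming a (frames × joints × 3) skeleton array and the wrist as joint 0; the paper's exact expansion scheme may differ.

```python
import numpy as np

def embed_skeleton(seq, ref_joint=0):
    """Apply the two data-processing strategies described above.

    seq: array of shape (T, V, 3) -- T frames, V hand joints, (x, y, z).

    1) Relative coordinates: subtract a reference joint (here the wrist,
       an assumption) per frame, so the features do not depend on the
       random starting position of the action.
    2) Expanded coordinate dimensionality: concatenate absolute and
       relative coordinates to give the model more semantic channels.
    """
    relative = seq - seq[:, ref_joint:ref_joint + 1, :]
    return np.concatenate([seq, relative], axis=-1)   # shape (T, V, 6)
```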
Overview of Spatial Temporal Graph ConvNet

This section analyzes the structure of ST-GCN and its propagation. A normal convolutional network takes as input a four-dimensional matrix of shape [N, H, W, C], where N denotes the batch size, C denotes the channels, and H×W denotes the area of the image. In order to use convolutional networks for skeleton-based action recognition, an embedded skeleton joint sequence is reshaped to [N, T, V, C], where N denotes the batch size, T denotes the number of frames, V denotes the number of joints per frame, and C denotes the coordinate dimensions of the joints. Although skeleton joints can be presented as an image in this way, doing so ignores the relationship between different parts of the skeleton and hence propagates irrelevant information from one joint to another, which introduces noise between them.

To address this problem, ST-GCN multiplies an adjacency matrix A of shape [V, V] with the feature maps after t×1 convolutional operations. The elements of this matrix are decided by the relationship between each pair of joints: column vectors denote the joints themselves and row vectors denote the joints linked to them. The weights add to 1 for every joint and are the same for all linked joints; e.g., A[M, M] and A[N, M] are both 0.5 if joint V_M is only linked to joint V_N.
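A minimal sketch of such a layer, assuming PyTorch and channel-first [N, C, T, V] feature maps; the real ST-GCN block additionally uses residual connections, normalization, and learned edge weights.

```python
import torch
import torch.nn as nn

class GraphConvBlock(nn.Module):
    """t×1 temporal convolution followed by multiplication with a fixed,
    normalized adjacency matrix A of shape [V, V], as described above."""

    def __init__(self, in_channels, out_channels, adjacency, t=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels,
                              kernel_size=(t, 1), padding=(t // 2, 0))
        # Add self-connections, then normalize each column to sum to 1:
        # if V_M is linked only to V_N, A[M, M] and A[N, M] are both 0.5.
        A = adjacency + torch.eye(adjacency.size(0))
        A = A / A.sum(dim=0, keepdim=True)
        self.register_buffer("A", A)

    def forward(self, x):                    # x: [N, C, T, V]
        x = self.conv(x)                     # t×1 convolution over time
        return torch.einsum("nctv,vw->nctw", x, self.A)
```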
Hand Gesture Graph ConvNet
Conclusion
The exploration of Graph Convolutional Networks (GCNs) and Convolutional Neural
Networks (CNNs) for hand sign recognition has provided valuable insights into their
respective strengths and suitability for this specific task.

Throughout this study, both architectures were evaluated in the context of their
applicability to hand sign recognition, considering factors such as model performance,
data representation, and computational efficiency.

CNNs, known for their efficacy in extracting spatial features from images, have
traditionally been the go-to architecture for image-based tasks like sign language
recognition. Their ability to capture spatial hierarchies within pixel data remains a
robust approach, especially when working with image-centric datasets.

On the other hand, Graph Convolutional Networks (GCNs) show promising potential in scenarios where the data naturally lends itself to a graph structure. For hand sign recognition, if hand poses can be effectively modeled as graphs, GCNs have shown the ability to capture intricate spatial relationships between key points, potentially offering advantages in understanding complex hand gestures and their spatial dependencies.
References
1. http://mi.eng.cam.ac.uk/~cipolla/lectures/PartIB/old/IB-visualcodebook.pdf
2. https://github.com/shackenberg/Minimal-Bag-of-Visual-Words-Image-Classifier/blob/master/sift.py
3. http://en.wikipedia.org/wiki/YIQ
4. http://en.wikipedia.org/wiki/YUV
5. http://cs229.stanford.edu/proj2011/ChenSenguptaSundaram-SignLanguageGestureRecognitionWithUnsupervisedFeatureLearning.pdf
6. http://en.wikipedia.org/wiki/Bag-of-words_model_in_computer_vision
7. Neha V. Tavari, P. A. V. D., Indian sign language recognition based on histograms of oriented gradient, International Journal of Computer Science and Information Technologies 5, 3 (2014), 3657-3660.
Thank You
