Project Presentation
Project Members:
Rohit Majumder, Bhaskar Chaurasia
Aditya Tyagi, Sanika Khankale
Guide:
Prof. Datta Deshmukh
Introduction
In the past few years, huge advancements have been made in the fields of science and
technology. Technology has also become much cheaper and more widely available, bringing it
within reach of the common man. It is therefore the duty of our generation to use this
accessibility to contribute to the progress and improvement of society at large.
Human beings have, since the beginning of time, been described as social animals. As social
beings, one of the principal aspects of our lives is communication. Social interaction has
always been regarded as one of the major requirements for living a happy life, and
communication is necessary for almost all of our daily tasks. However, a less fortunate
segment of society faces hearing and speech disabilities. A hearing-impaired individual is one
who either cannot hear at all or can only hear sounds above a certain loudness, i.e., someone
who 'can only hear when spoken to loudly'. An individual who is unable to speak, for whatever
reason, is considered mute.
Indian Sign Language
History
● ISL uses both hands, similar to British Sign Language, and is close to International Sign Language.
● The ISL alphabet is derived from the British Sign Language and French Sign Language alphabets.
● Unlike its American counterpart, which uses one hand, ISL uses both hands to represent alphabets.
Indian Sign Language
Existing systems often face challenges in accurately recognizing intricate hand movements,
handling variations in sign gestures, and ensuring real-time responsiveness. The absence of
a robust, adaptable, and universally applicable hand sign language recognition system limits
the ability of individuals using sign language to interact inclusively with the broader community.
The primary problem addressed by this project is the development of an accurate, adaptable,
real-time hand sign language recognition system that can overcome the challenges posed by
variations in gestures, environmental factors, and different signing styles.
The system aims to bridge the communication gap between signers and non-signers, thereby
fostering inclusivity, enhancing accessibility, and enabling effective communication across
diverse settings.
Objectives
● The primary objective of this project is to explore the efficacy and potential advantages of
employing Graph Convolutional Networks (GCNs), in contrast to traditional Convolutional Neural
Networks (CNNs), for hand sign language recognition.
● Hand gesture recognition technology uses sensors or a webcam to read and interpret hand
movements as commands.
● For mute and deaf people, the objective of hand gesture recognition is to provide an
alternative means of communication for those who are unable to speak or hear.
● In automotive settings, the same technology allows drivers and passengers to interact with
the vehicle, usually to control the infotainment system, without touching any buttons or screens.
Existing System
Indian Sign Language Recognition System using SURF (Speeded-Up Robust Features) with SVM and CNN [2022]
By Uma Shanker Tiwary, Shagun Katoch, Varsha Singh
Pipeline:
● Skin Segmentation
● Feature Extraction
● Training and Testing
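As a rough illustration of the feature-extraction stage in this pipeline, the sketch below detects SURF keypoints and descriptors on a single hand image. The file path and Hessian threshold are placeholder values, and SURF requires an OpenCV build with the non-free contrib modules enabled; this is only an illustrative sketch, not the authors' implementation.

# Hedged sketch of SURF feature extraction (requires opencv-contrib-python
# with non-free modules; the image path and threshold are placeholders).
import cv2

def extract_surf_descriptors(image_path, hessian_threshold=400):
    """Detect SURF keypoints and compute 64-dimensional descriptors."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=hessian_threshold)
    keypoints, descriptors = surf.detectAndCompute(gray, None)
    # descriptors has shape (num_keypoints, 64); it is None if nothing was found
    return keypoints, descriptors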
Skin Segmentation
Initial Approaches
● Training on a skin segmentation dataset
We tried machine learning models such as SVM and random forests on the skin segmentation
dataset from https://archive.ics.uci.edu/ml/datasets/Skin+Segmentation.
The dataset turned out to be a poor fit: after training on around 200,000 points, skin
segmentation of hand images returned an almost entirely black image, i.e., almost no skin was detected.
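For reference, a minimal sketch of such a baseline is shown below. The file name, column layout, and hyperparameters are assumptions about the UCI dataset, not details taken from the project.

# Hedged baseline sketch: random forest on the UCI Skin Segmentation data,
# assuming it was downloaded locally as "Skin_NonSkin.txt" with whitespace-
# separated columns B, G, R, label (1 = skin, 2 = non-skin).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = np.loadtxt("Skin_NonSkin.txt")              # shape (n_samples, 4)
X, y = data[:, :3], data[:, 3].astype(int)

# Train on roughly 200,000 points, mirroring the experiment described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=200_000, random_state=0)

clf = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))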
Final Approach
In this approach, we transform the image from RGB into the YIQ and YUV colour spaces.
From the U and V channels we compute θ = tan⁻¹(V/U). In the original approach, the author
classified skin pixels as those satisfying 30 < I < 100 and 105° < θ < 150°.
Since those parameters did not work well for us, we tweaked them slightly, and the result
performed much better than the previous two approaches.
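A minimal sketch of this colour-space thresholding is given below. The conversion coefficients are the standard YIQ/YUV ones, and the thresholds shown are the quoted starting values rather than the tuned parameters, so treat it as illustrative only.

# Hedged sketch of the YIQ/YUV skin-segmentation rule described above.
import cv2
import numpy as np

def skin_mask(bgr_image):
    rgb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB).astype(np.float32)
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    # Standard (approximate) YIQ and YUV conversion coefficients.
    I = 0.596 * R - 0.274 * G - 0.322 * B          # I channel of YIQ
    U = -0.147 * R - 0.289 * G + 0.436 * B         # U channel of YUV
    V = 0.615 * R - 0.515 * G - 0.100 * B          # V channel of YUV

    theta = np.degrees(np.arctan2(V, U))           # angle of the (U, V) vector
    mask = (I > 30) & (I < 100) & (theta > 105) & (theta < 150)
    return mask.astype(np.uint8) * 255             # white = skin, black = background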
Bag of Visual Words
Observations
● Similar-looking alphabets were misclassified amongst each other.
● One of the three signers was left-handed and produced laterally inverted images for many alphabets.
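The heading above refers to a bag-of-visual-words representation. A hedged sketch of such a pipeline is given below: it clusters SURF descriptors into a k-means vocabulary, turns each image into a histogram of visual words, and classifies the histograms with an SVM. The vocabulary size, kernel, and the extract_surf_descriptors helper (sketched earlier) are assumptions, not the project's exact settings.

# Hedged bag-of-visual-words sketch over SURF descriptors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def build_vocabulary(descriptor_list, k=100):
    """Cluster all training descriptors into k visual words."""
    stacked = np.vstack([d for d in descriptor_list if d is not None])
    return KMeans(n_clusters=k, random_state=0).fit(stacked)

def bovw_histogram(descriptors, vocabulary):
    """Represent one image as a normalised histogram of visual-word counts."""
    k = vocabulary.n_clusters
    if descriptors is None:
        return np.zeros(k, dtype=np.float32)
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=k).astype(np.float32)
    return hist / (hist.sum() + 1e-8)

# Usage (train_descriptors and labels assumed to be available):
# vocab = build_vocabulary(train_descriptors, k=100)
# X = np.vstack([bovw_histogram(d, vocab) for d in train_descriptors])
# clf = SVC(kernel="rbf").fit(X, labels)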
Proposed System
This section analyzes the structure of ST-GCN and its propagation. A normal convolutional
network takes as input a four-dimensional tensor of shape [N,H,W,C], where N denotes the batch
size, C denotes the number of channels, and H×W denotes the spatial size of the image. To use
convolutional networks for skeleton-based action recognition, an embedded skeleton-joint
sequence is instead reshaped to [N,T,V,C], where N denotes the batch size, T denotes the number
of frames, V denotes the number of joints in each frame, and C denotes the coordinate dimensions
of the joints. Although skeleton joints can be presented as an image in this way, doing so
ignores the relationships between different parts of the skeleton and hence propagates
irrelevant information from one joint to another, which introduces noise between them.
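As a concrete illustration of this reshaping (the batch size, frame count, joint count, and coordinate dimensions below are placeholder values, not the project's settings):

# Hedged shape illustration: a skeleton-joint sequence laid out like an image.
import torch

N, T, V, C = 8, 32, 21, 3                     # batch, frames, joints, (x, y, z)
skeleton_seq = torch.randn(N, T, V, C)        # embedded joint sequence

# Treating the sequence like an image: T plays the role of height, V of width.
as_image = skeleton_seq.permute(0, 3, 1, 2)   # [N, C, T, V], ready for Conv2d
print(as_image.shape)                         # torch.Size([8, 3, 32, 21])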
To address this problem, the proposed ST-GCN multiplies an adjacency matrix A of shape [V,V]
with the feature maps after each t×1 convolutional operation. The elements of this matrix are
determined by the relationship between each pair of joints: column vectors correspond to the
joints themselves, and row vectors correspond to the joints linked to them. The weights in each
column sum to 1 and are the same for all linked joints; e.g., the entries A[M,M] and A[N,M] are
both 0.5 if joint V_M is linked only to joint V_N.
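A hedged sketch of this graph-convolution step in PyTorch is shown below. The normalised-adjacency construction, layer sizes, and temporal kernel length are illustrative assumptions, not the project's exact implementation.

# Hedged ST-GCN-style block: t x 1 temporal convolution followed by
# multiplication with a column-normalised adjacency matrix A.
import torch
import torch.nn as nn

def normalized_adjacency(edges, num_joints):
    """Adjacency with self-loops, each column normalised to sum to 1
    (matching the 0.5 / 0.5 example above for a joint with one neighbour)."""
    A = torch.eye(num_joints)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A / A.sum(dim=0, keepdim=True)

class STGCNBlock(nn.Module):
    def __init__(self, in_channels, out_channels, A, t=9):
        super().__init__()
        self.register_buffer("A", A)                       # [V, V]
        self.temporal = nn.Conv2d(in_channels, out_channels,
                                  kernel_size=(t, 1), padding=(t // 2, 0))

    def forward(self, x):                                  # x: [N, C, T, V]
        x = self.temporal(x)                               # t x 1 convolution
        return torch.einsum("nctv,vw->nctw", x, self.A)    # mix features along edges

# Example: three joints in a chain 0-1-2; every column of A sums to 1.
# A = normalized_adjacency([(0, 1), (1, 2)], num_joints=3)
# block = STGCNBlock(3, 64, A)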
Hand Gesture Graph ConvNet
Conclusion
The exploration of Graph Convolutional Networks (GCNs) and Convolutional Neural
Networks (CNNs) for hand sign recognition has provided valuable insights into their
respective strengths and suitability for this specific task.
Throughout this study, both architectures were evaluated in the context of their
applicability to hand sign recognition, considering factors such as model performance,
data representation, and computational efficiency.
CNNs, known for their efficacy in extracting spatial features from images, have
traditionally been the go-to architecture for image-based tasks like sign language
recognition. Their ability to capture spatial hierarchies within pixel data remains a
robust approach, especially when working with image-centric datasets.