
A Real-Time Project Report on

SIGN LANGUAGE DETECTION


Submitted in partial fulfillment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY
In

CSE (DATA SCIENCE)


By

Mk. Revan (22AG1A6739)

M. Rithvik (22AG1A6735)
N. Harikrishna (22AG1A6746)

Under the guidance of:


A. Sarala Devi
Assistant Professor

DEPARTMENT OF CSE (DATA SCIENCE)


ACE Engineering College
Ankushapur(V), Ghatkesar(M), Medchal Dist - 501 301
(An Autonomous Institution, Affiliated to JNTUH, Hyderabad)
www.aceec.ac.in
2023-2024

DEPARTMENT OF CSE (DATA SCIENCE)

CERTIFICATE
This is to certify that the real-time project report entitled "Sign Language Detection" is a bonafide work done by Mk. Revan, M. Rithvik, and N. Harikrishna, bearing roll numbers 22AG1A6739, 22AG1A6735, and 22AG1A6746, in partial fulfillment of the requirements for the award of the Degree of BACHELOR OF TECHNOLOGY in CSE (Data Science) from JNTUH University, Hyderabad, during the academic year 2023-2024. This is a record of bonafide work carried out by them under our guidance and supervision.

The results embodied in this report have not been submitted by the students to any other University or Institution for the award of any degree or diploma.

A. Sarala Devi                              Dr. P. Chiranjeevi
Associate Professor                         Associate Professor
Supervisor                                  H.O.D., CSE-DS

ACKNOWLEDGEMENT

We would like to express our gratitude to all the people behind the screen who helped us transform an idea into a real-time application.
We would like to express our heartfelt gratitude to our parents, without whom we would not have been privileged to achieve and fulfill our dreams.
A special thanks to our General Secretary, Prof. Y. V. Gopala Krishna Murthy, for having founded such an esteemed institution. Sincere thanks to our Joint Secretary, Mrs. M. Padmavathi, for her support in our project work. We are also grateful to our beloved Principal, Dr. B. L. Raju, for permitting us to carry out this project.
We profoundly thank Dr. P. Chiranjeevi, Associate Professor and Head of the Department of Computer Science and Engineering (Data Science), who has been an excellent guide and a great source of inspiration for our work.
We sincerely thank Mrs. B. Saritha and Mr. M. Hari Krishna, Assistant Professors and Project Coordinators, who helped us in every way in fulfilling all aspects of the completion of our mini-project.
We are very thankful to our internal guide, Mrs. A. Sarala Devi, who has been an excellent mentor and has given continuous support for the completion of our project work.
The satisfaction and euphoria that accompany the successful completion of a task would be incomplete without mentioning the people who made it possible, whose constant guidance and encouragement crown all efforts with success. In this context, we would like to thank all the other staff members, both teaching and non-teaching, who extended their timely help and eased our task.

Mk. Revan (22AG1A6739)
M. Rithvik (22AG1A6735)
N. Harikrishna (22AG1A6746)

SIGN LANGUAGE DETECTION

ABSTRACT
Sign language is mainly used by deaf (hard of hearing) and dumb people to exchange information within their own community and with other people. It is a language in which people use hand gestures to communicate, as they cannot speak or hear. Sign Language Recognition (SLR) deals with recognizing hand gestures, from acquisition until text or speech is generated for the corresponding gestures. Hand gestures for sign language can be classified as static or dynamic. Static hand gesture recognition is simpler than dynamic hand gesture recognition, but both are important to the human community. We can use deep learning and computer vision to recognize hand gestures by building deep neural network architectures (convolutional neural network architectures), where the model learns to recognize hand gesture images over several epochs. Once the model successfully recognizes a gesture, the corresponding English text is generated, and that text can then be converted to speech. Such a model will make communication easier for deaf (hard of hearing) and dumb people. In this report, we discuss how sign language recognition is done using deep learning.

CONTENTS

1. INTRODUCTION
2. EXISTING SYSTEM
3. EXISTING SYSTEM DRAWBACKS
4. LITERATURE REVIEW
5. PROPOSED MODEL / SYSTEM
6. SOFTWARE AND HARDWARE REQUIREMENTS
7. SYSTEM ANALYSIS
   7.1 MODULE DESCRIPTION
8. METHODOLOGY
   8.1 DATA COLLECTION
   8.2 DATA PREPROCESSING
9. OVERALL STRUCTURE
10. CONCLUSION

1. INTRODUCTION

Deaf (hard of hearing) and dumb people use Sign Language (SL) [1] as their primary means of expressing their ideas and thoughts to their own community and to other people, using hand and body gestures. It has its own vocabulary, meaning, and syntax, which differ from those of spoken or written language. Spoken language is produced by articulate sounds mapped to specific words and grammatical combinations to convey meaningful messages; sign language instead uses visual hand and body gestures to convey such messages. There are somewhere between 138 and 300 different sign languages in use around the world today. In India, there are only about 250 certified sign language interpreters for a deaf population of around 7 million. This makes it difficult to teach sign language to deaf and dumb people, as only a limited number of sign language interpreters exist today. Sign Language Recognition is an attempt to recognize these hand gestures and convert them to the corresponding text or speech. Today, computer vision and deep learning have gained a lot of popularity, and many state-of-the-art (SOTA) models can be built. Using deep learning algorithms and image processing, we can classify these hand gestures and produce the corresponding text; for example, the sign for the letter "A" is mapped to the English text or speech "A".

In deep learning, the Convolutional Neural Network (CNN) is the most popular neural network algorithm and is widely used for image and video tasks. For CNNs, we have advanced architectures such as LeNet-5 [2] and MobileNetV2 [3], which can be used to achieve state-of-the-art results, and these architectures can be combined using neural network ensemble techniques [4]. By doing so, we can achieve an almost 100% accurate model that recognizes hand gestures. This model can be deployed in a web framework such as Django, in a standalone application, or on embedded devices, where hand gestures are recognized from a live camera and then converted to text. This system will help deaf and dumb people communicate easily.
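As an illustration of how one of these architectures could be adapted to the task, the following is a minimal sketch assuming PyTorch/torchvision (version 0.13 or later) and a 29-class output (26 letters plus space, delete, and nothing, as in the dataset described later); the architecture choice, class count, and input size here are illustrative assumptions, not the project's fixed configuration.

```python
# Hedged sketch: adapting a pretrained MobileNetV2 to sign-language letters.
# The 29-class output and 224x224 input are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 29  # 26 letters + space, delete, nothing (assumed)

model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
# Replace the final classifier layer so the network predicts gesture classes.
model.classifier[1] = nn.Linear(model.last_channel, NUM_CLASSES)

# A single forward pass on a dummy batch of 3-channel, 224x224 images.
dummy = torch.randn(1, 3, 224, 224)
logits = model(dummy)
print(logits.shape)  # torch.Size([1, 29])
```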

2. EXISTING SYSTEM

Sign language detection systems are technology-driven solutions designed to interpret and understand sign language gestures, enabling communication between individuals who are deaf or hard of hearing and those who do not understand sign language. These systems typically employ various technologies, including computer vision, machine learning, and depth sensing, to recognize and interpret sign language gestures accurately.

Here's an overview of existing sign language detection systems:

1. Computer Vision: Many sign language detection systems use computer vision
techniques to analyze video input and identify hand gestures and movements.
They often rely on techniques like image segmentation, feature extraction, and
object detection to locate and track the signer's hands and other relevant features.

2. Depth Sensing: Depth sensing technology, such as Microsoft's Kinect sensor or similar devices, provides depth information in addition to colour imagery. This allows for more accurate tracking of hand movements and gestures in three-dimensional space, enhancing the precision of sign language interpretation.

3. Machine Learning: Machine learning algorithms play a crucial role in sign language detection systems. These algorithms are trained on large datasets of sign language gestures to recognize patterns and associations between hand movements and corresponding signs. Techniques like convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are commonly used for this purpose.

4. Gesture Recognition: Sign language detection systems analyze the movements and configurations of the signer's hands and fingers to recognize specific gestures and translate them into text or speech. This involves identifying hand shapes (signs), movements, orientations, and facial expressions, as these elements contribute to the grammar and semantics of sign language.

5. Real-Time Processing: Many systems are designed to process sign language gestures in real time, allowing for immediate translation and communication between signers and non-signers. Low-latency processing is essential to facilitate smooth and natural interactions between users.

6. User Interfaces: Sign language detection systems often include user-friendly interfaces that display the interpreted sign language in real time, either as text, synthesized speech, or animated avatars. These interfaces may also support additional features such as translation into different spoken languages and customization options for users with different communication preferences.

7. Accessibility Integration: Some sign language detection systems are integrated into existing communication devices and platforms to make them more accessible to individuals who use sign language. For example, they may be incorporated into video conferencing software, mobile apps, or wearable devices to enable seamless communication between signers and non-signers.

3. EXISTING SYSTEM DRAWBACKS

Existing systems for sign language detection have made significant progress, but
they still face several drawbacks:

1. Limited vocabulary: Many existing systems focus on recognizing a limited set of predefined gestures or signs. This restricts their utility for real-world applications, where sign language involves a vast vocabulary and nuanced expressions.

2. Data variability: Sign language involves a wide range of variations in hand shapes, movements, and orientations. Existing systems may struggle to generalize well across different users, environments, and signing styles.

3. Data annotation: Building accurate sign language recognition models requires large annotated datasets. However, manually annotating sign language data can be time-consuming and expensive, leading to limited availability of high-quality datasets.

4. Real-time performance: Some systems may face challenges in achieving real-time performance, especially when dealing with complex hand movements or processing high-resolution video streams.

5. Hardware limitations: Implementing sign language recognition on resource-constrained devices like smartphones or embedded systems can be challenging due to limitations in processing power and memory.

4. LITERATURE REVIEW

Real-time sign language fingerspelling recognition using convolutional neural networks from depth maps.

This work focuses on static fingerspelling in American Sign Language. It presents a method for implementing a sign-language-to-text/voice conversion system without using handheld gloves and sensors, by capturing gestures continuously and converting them to voice. In this method, only a few images were captured for recognition.

Design of a communication aid for the physically challenged.

The system was developed in the MATLAB environment. It consists mainly of two phases: a training phase and a testing phase. In the training phase, the authors used feed-forward neural networks. The problem here is that MATLAB is not very efficient, and integrating the concurrent attributes as a whole is difficult.

American Sign Language Interpreter System for Deaf and Dumb Individuals.

The discussed procedures could recognize 20 out of 24 static ASL alphabets. The alphabets A, M, N, and S could not be recognized due to the occlusion problem. Only a limited number of images were used.

5. PROPOSED MODEL / SYSTEM

In machine learning we have ensemble techniques, where we train multiple sub-models and average their predictions. The Random Forest algorithm is an example: it combines multiple decision trees. Similarly, we can build ensembles of neural networks. There are many ensemble techniques for neural networks, such as stacked generalization, ensemble learning via negative correlation, and probabilistic modelling with neural networks. We have implemented the Horizontal Voting Ensemble method to improve the performance of the neural network: snapshots of the model saved over a contiguous block of the final training epochs vote by averaging their predictions.
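The following is a minimal sketch of horizontal voting under assumed names and snapshot paths (not the project's actual files): several snapshots of the same network, saved at the end of the last few training epochs, vote by averaging their softmax outputs.

```python
# Hedged sketch of a horizontal voting ensemble in PyTorch.
# Snapshot file names and the epoch range are illustrative assumptions.
import torch
import torch.nn.functional as F

def horizontal_vote(snapshots, images):
    """Average the softmax predictions of several snapshots of one network."""
    with torch.no_grad():
        probs = [F.softmax(model(images), dim=1) for model in snapshots]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)

# Example usage: load the models saved at the end of epochs 16-20 of one run.
# snapshots = [torch.load(f"model_epoch_{e}.pt").eval() for e in range(16, 21)]
# predicted_classes = horizontal_vote(snapshots, batch_of_images)
```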

Features of the proposed system:

1. Expanded vocabulary: The proposed system could aim to recognize a wider range
of signs and gestures, including both common and less frequently used signs. This
would increase its usefulness and applicability in real-world scenarios.

2. Robustness to variability: The system would be designed to handle variations in hand shapes, movements, orientations, and signing styles across different users and environments.

3. Efficient data annotation: To overcome the challenge of limited annotated data, the proposed system might employ techniques such as data augmentation, semi-supervised learning, or active learning to effectively leverage available data and reduce the need for manual annotation.

4. Real-time performance: Emphasis would be placed on optimizing the system for real-time performance, ensuring timely and responsive sign language recognition even in dynamic environments or when processing high-resolution video streams.

5. Hardware compatibility: The proposed system could be designed to run efficiently on a variety of hardware platforms, including smartphones, tablets, wearable devices, and embedded systems, making sign language detection accessible in diverse contexts.

6. SOFTWARE AND HARDWARE REQUIREMENTS

1. Operating System: Windows

2. Programming Language: Python

3. Libraries: MediaPipe, OpenCV, scikit-learn, NumPy, Matplotlib (a short sketch using MediaPipe and OpenCV appears after this list).

4. Development Environment: Any code editor or integrated development environment (IDE) can be used.

5. Package Management: A package manager is needed to install and manage Python libraries. `pip` is the standard package manager for Python and comes pre-installed with Python versions 3.4 and above. It can be used to install libraries such as `mediapipe`, `opencv-python`, `matplotlib`, `scikit-learn`, and `numpy`.

6. CPU: A modern CPU with multiple cores (e.g., Intel Core i5 or AMD Ryzen 5) provides adequate processing power.

7. GPU (optional): For training deep learning models or other computationally intensive tasks, a dedicated GPU can significantly accelerate training and inference.

8. RAM: Adequate RAM is essential, especially when working with large datasets or complex models (at least 8 GB).

9. Storage: Sufficient storage space is required for datasets, code, and model checkpoints. SSDs are preferred over HDDs for faster data access and reduced loading times.

10. Webcam (optional): Most modern laptops come with built-in webcams; external webcams can be used with desktop computers.

11. Other Peripherals: Standard peripherals such as a keyboard, mouse, and display are necessary for interacting with the development environment.
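As a quick check that the listed libraries work together, the following is a minimal sketch assuming the standard MediaPipe Hands and OpenCV APIs; the window name and confidence threshold are illustrative defaults, not project-specific settings.

```python
# Hedged sketch: detecting hand landmarks from the webcam with MediaPipe and
# OpenCV. Thresholds and window name are illustrative defaults.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB images; OpenCV captures BGR frames.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for landmarks in results.multi_hand_landmarks:
                mp_draw.draw_landmarks(frame, landmarks, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("Hand tracking", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()
```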

7. SYSTEM ANALYSIS

7.1 Module Description

As shown in the block diagram, the project is structured into three distinct functional blocks: Data Processing, Training, and Classify Gesture. The block diagram is simplified to abstract away some of the minutiae:

• Data Processing: The load data.py script contains functions to load the raw image data and save the image data as NumPy arrays in file storage. The process data.py script loads the image data from data.npy and preprocesses the images by resizing/rescaling them and applying filters and ZCA whitening to enhance features. During training, the processed image data is split into training, validation, and testing sets and written to storage. Training also involves a load dataset.py script that loads the relevant data split into a Dataset class. For use of the trained model in classifying gestures, an individual image is loaded and processed from the filesystem.

• Training: The training loop for the model is contained in train model.py. The model is trained with hyperparameters obtained from a config file that lists the learning rate, batch size, image filtering, and number of epochs. The configuration used to train the model is saved along with the model architecture for future evaluation and tweaking for improved results. Within the training loop, the training and validation datasets are loaded as DataLoaders and the model is trained using the Adam optimizer with cross-entropy loss. The model is evaluated every epoch on the validation set, and the model with the best validation accuracy is saved to storage for further evaluation and use. Upon finishing training, the training and validation error and loss are saved to disk, along with a plot of error and loss over training. (A minimal sketch of such a training loop appears after this module list.)

• Classify Gesture: After a model has been trained, it can be used to classify a new ASL gesture that is available as a file on the filesystem. The user inputs the filepath of the gesture image, and the test data.py script passes the filepath to process data.py to load and preprocess the file in the same way as the training data.
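The following is a minimal sketch of the kind of training loop described above (Adam optimizer, cross-entropy loss, keeping the best-validation checkpoint); the hyperparameters, checkpoint file name, and loader construction are assumptions for illustration, not the project's actual configuration.

```python
# Hedged sketch of a PyTorch training loop with Adam + cross-entropy and
# best-validation checkpointing. Hyperparameters and paths are illustrative.
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=20, lr=1e-3, device="cpu"):
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    best_val_acc = 0.0

    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

        # Evaluate on the validation set after every epoch.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                preds = model(images).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.size(0)
        val_acc = correct / total

        # Keep the checkpoint with the best validation accuracy.
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            torch.save(model.state_dict(), "best_model.pt")
        print(f"epoch {epoch + 1}: val_acc={val_acc:.3f}")
```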

8. METHODOLOGY

1. Data Collection:

- Gather a diverse dataset of sign language videos, covering various sign languages, gestures, and signers.

- Ensure that the dataset includes annotations specifying the signs performed in each video frame.

- Consider factors such as lighting conditions, camera angles, and signer characteristics to ensure dataset diversity.

2. Preprocessing:

- Preprocess the videos to extract relevant features, such as hand positions, movements, and facial expressions.

- Normalize the data to account for differences in scale, rotation, and perspective.

- Augment the dataset to increase its size and variability, for example by applying transformations such as rotation, scaling, and flipping.

3. Model Selection:

- Choose an appropriate model architecture for sign language detection, considering factors such as complexity, computational efficiency, and performance.

- Common choices include convolutional neural networks (CNNs) for image-based tasks and recurrent neural networks (RNNs) for sequential data like sign language sequences.

4. Training:

- Split the dataset into training, validation, and test sets to evaluate model performance.

- Train the selected model using the training data, optimizing the model parameters to minimize a chosen loss function (e.g., cross-entropy loss).

- Monitor the model's performance on the validation set and adjust hyperparameters as needed to prevent overfitting.

5. Evaluation:

- Evaluate the trained model on the test set to assess its performance in real-world scenarios.

- Measure metrics such as accuracy, precision, recall, and F1-score to quantify the model's effectiveness in detecting sign language gestures (see the evaluation sketch after this list).

- Analyze the model's performance across different sign languages, signer demographics, and environmental conditions.

6. Fine-Tuning and Optimization:

- Fine-tune the model based on the evaluation results, addressing any weaknesses or
areas for improvement identified during testing.

- Optimize the model for deployment, considering factors such as inference speed,
memory usage, and energy efficiency, especially for real-time applications.
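As referenced in the Evaluation step above, the following is a minimal sketch of computing those metrics with scikit-learn; the label and prediction arrays are placeholders for the test-set ground truth and the model's outputs.

```python
# Hedged sketch: standard classification metrics with scikit-learn.
# y_true and y_pred are placeholders for test labels and model predictions.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1]   # illustrative ground-truth class indices
y_pred = [0, 1, 2, 1, 1]   # illustrative predicted class indices

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```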

8.1 Data collection

The primary source of data for this project was a compiled dataset of American Sign Language (ASL) called the ASL Alphabet, from Kaggle user Akash [3]. The dataset comprises 87,000 images of 200x200 pixels. There are 29 classes in total, each with 3,000 images: 26 for the letters A-Z and 3 for space, delete, and nothing. The data is solely of the user Akash gesturing in ASL, with the images taken from his laptop's webcam. These photos were then cropped, rescaled, and labelled for use.

Figure 2: Examples of images from the Kaggle dataset used for training. Note the difficulty of distinguishing fingers in the letter E.

A self-generated test set was created in order to investigate the neural network's ability to generalize. Five different test sets of images were taken with a webcam under different lighting conditions, backgrounds, and use of the dominant/non-dominant hand. These images were then cropped and preprocessed.
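A minimal sketch of loading a dataset organised like the ASL Alphabet (one folder per class) is shown below, assuming torchvision's ImageFolder; the directory path, resize target, batch size, and split ratio are illustrative assumptions.

```python
# Hedged sketch: loading a folder-per-class image dataset with torchvision.
# The path, image size, batch size, and split ratio are assumptions.
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((200, 200)),   # dataset images are 200x200 pixels
    transforms.ToTensor(),           # scales pixel values to [0, 1]
])

dataset = datasets.ImageFolder("asl_alphabet_train", transform=transform)
print(len(dataset.classes))          # expected: 29 classes

# Simple random split into training and validation subsets.
n_val = int(0.2 * len(dataset))
train_set, val_set = torch.utils.data.random_split(
    dataset, [len(dataset) - n_val, n_val]
)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=32)
```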

8.2 Data Pre-processing

The data preprocessing was done using the Pillow library, an image processing library, and the sklearn.decomposition module, which is useful for its matrix optimization and decomposition functionality.

Figure 3: Examples of the signed letters A and T from test sets with differing lighting and background: (a) a sample from the nonuniform background test set, (b) a sample from the plain white background test set, (c) a sample from a darker test set, (d) a sample from the plain white background test set.

Image Enhancement: A combination of brightness, contrast, sharpness, and color enhancement was applied to the images. For example, the contrast and brightness were adjusted so that fingers could be distinguished even when the image was very dark.

Edge Enhancement: Edge enhancement is an image filtering technique that makes edges more defined. This is achieved by increasing the contrast in local regions of the image that are detected as edges. This makes the border between the hand and fingers and the background much clearer and more distinct, which can help the neural network identify the hand and its boundaries.

Image Whitening: ZCA, or image whitening, is a technique that uses the singular value decomposition of a matrix. The algorithm decorrelates the data and removes redundant, or obvious, information. This allows the neural network to look for more complex and sophisticated relationships and to uncover the underlying structure of the patterns it is trained on. The covariance matrix of the image is set to the identity, and the mean to zero.
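The following is a minimal NumPy sketch of ZCA whitening as described above: the data are mean-centred, the covariance matrix is decomposed, and the transform rescales its spectrum toward the identity. The epsilon value is an assumed regularisation constant.

```python
# Hedged sketch of ZCA (image) whitening with NumPy. `epsilon` is an assumed
# small constant that keeps the inverse square root numerically stable.
import numpy as np

def zca_whiten(X, epsilon=1e-5):
    """X: array of shape (n_samples, n_features), e.g. flattened images."""
    X = X - X.mean(axis=0)                       # zero-mean the data
    cov = np.cov(X, rowvar=False)                # feature covariance matrix
    U, S, _ = np.linalg.svd(cov)                 # decompose the covariance
    W = U @ np.diag(1.0 / np.sqrt(S + epsilon)) @ U.T   # ZCA transform
    return X @ W                                 # whitened data

# Example: whiten a small batch of 32x32 grayscale images flattened to vectors.
images = np.random.rand(100, 32 * 32).astype(np.float32)
whitened = zca_whiten(images)
```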

9. OVERALL STRUCTURE

The model used in this classification task is a fairly basic implementation of a Convolutional Neural Network (CNN). As the project requires classification of images, a CNN is the go-to architecture. The basis for our model design came from the paper Using Deep Convolutional Networks for Gesture Recognition in American Sign Language, which accomplished a similar ASL gesture classification task [4]. This model consists of convolutional blocks containing two 2D convolutional layers with ReLU activation, followed by max pooling and dropout layers. These convolutional blocks are repeated three times and followed by fully connected layers that eventually classify into the required categories. The kernel sizes are maintained at 3x3 throughout the model. Our originally proposed model is identical to the one from the aforementioned paper; this model is shown in Figure 5. We omitted the dropout layers on the fully connected layers at first to allow for faster training and to establish a baseline without dropout.

Figure 5: Model architecture as implemented in Using Deep Convolutional Networks for Gesture Recognition in American Sign Language [4].

We also decided to design a separate model to compare with the model in the paper. This model was designed to train faster and to establish a baseline for problem complexity. This smaller model was built with only one "block" of convolutional layers, consisting of two convolutional layers with kernel sizes progressing from 5x5 to 10x10, ReLU activation, and the usual max pooling and dropout. This fed into three fully connected layers which output into the 29 classes of letters. The variation of the kernel sizes was motivated by our dataset including the background, whereas the paper preprocessed their data to remove the background. The design followed the thinking that the first layer, with the smaller kernel, would capture smaller features such as the hand outline, finger edges, and shadows, while the larger kernel would capture combinations of the smaller features such as finger crossing, angles, and hand location.
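The following is a minimal PyTorch sketch of the block structure described above (two 3x3 convolutions with ReLU, max pooling, and dropout per block, repeated three times, followed by fully connected layers into 29 classes); the channel widths, input resolution, and dropout rates are assumptions for illustration rather than the exact values used.

```python
# Hedged sketch of the described CNN: three blocks of two 3x3 conv layers with
# ReLU, each followed by max pooling and dropout, then fully connected layers
# into 29 classes. Channel widths, 64x64 RGB input, and dropout rates are
# illustrative assumptions.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Dropout(0.25),
    )

class SignCNN(nn.Module):
    def __init__(self, num_classes=29):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 32),    # 64x64 -> 32x32
            conv_block(32, 64),   # 32x32 -> 16x16
            conv_block(64, 128),  # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Example: classify one preprocessed 64x64 RGB image.
model = SignCNN()
logits = model(torch.randn(1, 3, 64, 64))
print(logits.argmax(dim=1))
```

A trained checkpoint (for example the best_model.pt saved by the earlier training sketch) could then be loaded with model.load_state_dict(torch.load("best_model.pt")) before classifying new gesture images, as in the Classify Gesture module described in Section 7.1.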

10. CONCLUSION

In conclusion, we were able to develop a practical and meaningful system that can understand sign language and translate it to the corresponding text. Our system still has shortcomings: it can detect hand gestures for the digits 0-9 and the alphabets A-Z, but it does not cover body gestures or other dynamic gestures. We are confident that it can be improved and optimized in the future.
