C12 Final Report
Submitted by:
K. BABITHA (208T1A05E5)
A. CHAITANYA KUMAR (208T1A05D2)
V. DNV SRAVANTHI (208T1A05I6)
M. PRAVALIKA (218T5A0518)
D. AKSHAYA (208T1A05E1)
Mr. K. SRIKANTH
ASSISTANT PROFESSOR
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
at
DHANEKULA INSTITUTE OF ENGINEERING & TECHNOLOGY
DECLARATION
We hereby declare that the major project report entitled “VISUAL GESTURES AS A
LANGUAGE: ENABLING SPEECH THROUGH IMAGES” submitted for the
B.Tech. (CSE) degree is our original work and the project has not formed the basis for
the award of any other degree, diploma, fellowship, or any other similar titles.
K. BABITHA 208T1A05E5
A. CHAITANYA KUMAR 208T1A05D2
V. DNV SRAVANTHI 208T1A05I6
M. PRAVALIKA 218T5A0518
D. AKSHAYA 208T1A05E1
Place:
Date:
DHANEKULA INSTITUTE OF ENGINEERING &
TECHNOLOGY
(Affiliated to JNTU: Kakinada, Approved by AICTE – New Delhi)
CERTIFICATE
VISION – MISSION – PEOs
Institute Vision: Pioneering Professional Education through Quality
POs/PSOs
Program Outcomes (POs)
1. Engineering Knowledge: Apply the knowledge of mathematics, science, engineering fundamentals and an engineering specialization to the solution of complex engineering problems.
2. Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues, and the consequent responsibilities relevant to the professional engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.
Program Specific Outcome Statements (PSOs):
1. Have expertise in algorithms, networking, web applications and software engineering for efficient design of computer-based systems of varying complexity.
2. Qualify in national and international level competitive examinations for successful higher studies and employment.
PROJECT MAPPINGS
Batch No: C12
Project Title VISUAL GESTURES AS A LANGUAGE:
ENABLING SPEECH THROUGH IMAGES
Project Domain Deep Learning
Type of the Project Application
Guide Name K. SRIKANTH
Student Roll No Student Name
208T1A05E5 K. BABITHA
208T1A05D2 A. CHAITANYA KUMAR
208T1A05I6 V. DNV SRAVANTHI
218T5A0518 M. PRAVALIKA
208T1A05E1 D. AKSHAYA
COURSE OUTCOMES: At the end of the Course/Subject, the students will be
able to
CO. No, Course Outcomes (COs), POs, PSOs, and Blooms Taxonomy Level:

R20C501.1: Identify the real-world problem with a set of requirements to design a solution. [POs: 1,2,3,4,6,8,9,10,11; PSOs: 1,2; Blooms Level: Applying (L3)]
R20C501.2: Implement, Test and Validate the solution against the requirements for a given problem. [POs: 1,2,3,4,5,8,9,10,11; PSOs: 1,2; Blooms Level: Analyzing (L4)]
R20C501.3: Lead a team as a responsible member in developing software solutions for real world problems and societal issues with ethics. [POs: 1,2,4,5,6,8,9,10,11; PSOs: 1,2; Blooms Level: Analyzing (L4)]
R20C501.4: Participate in discussions to bring technical and behavioral ideas for good solutions. [POs: 1,2,4,6,7,8,9,10,11; PSOs: 1,2; Blooms Level: Evaluating (L5)]
R20C501.5: Express ideas with good communication skills during presentations. [POs: 1,2,7,8,9,10; PSOs: 1,2; Blooms Level: Creating (L6)]
R20C501.6: Learn new technologies to contribute in the software industry for optimal solutions. [POs: 1,2,4,5,8,9,11,12; PSOs: 1,2; Blooms Level: Creating (L6)]
Course Outcomes vs POs Mapping:

Course Outcomes   PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12
R20C501.1          3    3    3    3    -    3    -    3    3    3     3     -
R20C501.2          3    3    3    3    3    -    -    3    3    3     3     -
R20C501.3          3    3    -    3    3    3    -    3    3    3     3     -
R20C501.4          3    3    -    3    -    3    3    3    3    3     3     -
R20C501.5          3    3    -    -    -    -    3    3    3    3     -     -
R20C501.6          3    3    -    3    3    -    -    3    3    -     3     3
Total             18   18    6   15    9    9    6   18   18   15    15     3
Average            3    3    3    3    3    3    3    3    3    3     3     3
6. R20C501.6 is strongly mapped with PO1, PO2, PO4, PO5, PO9, PO11, PO12 as
we use engineering concepts and scientific solutions to communicate with deaf
and dumb people. In addition, we utilize modern technology, commit to ethics,
and have good communication skills.
Course Outcomes vs PSOs Mapping:
Course Outcomes   PSO1   PSO2
R20C501.1 3 3
R20C501.2 3 3
R20C501.3 3 3
R20C501.4 3 3
R20C501.5 3 3
R20C501.6 3 3
Total 18 18
Average 3 3
208T1A05E5 K. Babitha
208T1A05D2 A. Chaitanya Kumar
208T1A05I6 V. DNV Sravanthi
218T5A0518 M. Pravalika
208T1A05E1 D. Akshaya

Project Guide:
K. Srikanth (Assistant Professor, CSE)
ACKNOWLEDGEMENT
Behind every achievement lies an unfathomable sea of gratitude to those who made it
possible, without whom it would never have come into existence. To them we offer these
words of gratitude.
We would like to thank our respected Principal, Dr. RAVI KADIYALA, and Dr. K.
SOWMYA, Head of the Department, Computer Science and Engineering, for their
support throughout our major project.
We also extend our thanks to all the faculty members of the Computer Science &
Engineering department for their valuable contributions in this project.
We would like to extend our warm appreciation to all our friends for sharing their
knowledge and valuable contributions in this project.
Finally, we express our deep sense of gratitude to our parents for their continuous support
throughout our academic career and their encouragement in the completion of this project
successfully.
K. BABITHA 208T1A05E5
A. CHAITANYA KUMAR 208T1A05D2
V. DNV SRAVANTHI 208T1A05I6
M. PRAVALIKA 218T5A0518
D. AKSHAYA 208T1A05E1
ABSTRACT
List of Figures
Figure No Name of the Figure Page No
Figure 1.1 Gestures 4
Figure 1.2 Layers of CNN 10
Figure 3.1 System Architecture 52
Figure 3.2 Use Case Diagram for Gesture Language translation 56
Figure 3.3 Class Diagram for Gesture Language translation 58
Figure 3.4 Sequence Diagram for Gesture Language translation 60
List of Tables
Table No: Table Name Page No:
Table 1 Testing Table 92
TABLE OF CONTENTS
Title Page I
Declaration of the student II
Certificate of the Guide III
Vision-Mission-PEO’s IV
PO’s-PSO’s V
Project Mappings VII
Acknowledgement X
Abstract XI
List of Figures XII
List of Tables XII
1 INTRODUCTION 1
1.1 Problem Statement 4
1.2 Objective 5
1.3 Basic Concepts 6
2 LITERATURE SURVEY 22
2.1 Literature Study 23
2.2 Existing System 31
2.3 Proposed System 31
2.4 Feasibility Study 32
2.4.1 ECONOMICAL FEASIBILITY 33
2.4.2 TECHNICAL FEASIBILITY 34
2.4.3 SOCIAL FEASIBILITY 35
2.5 Need for Feasibility Study 36
3 ANALYSIS AND DESIGN 37
3.1 Requirements 38
3.1.1 Functional Requirements 47
3.1.2 Non-Functional Requirements 49
3.2 System Specifications 51
3.3 System Architecture 52
3.4 UML Diagrams 53
3.4.1 Use case Diagram 55
3.4.2 Class Diagram 57
3.4.3 Sequence Diagram 58
3.4.4 Collaboration Diagram 60
3.4.5 Activity Diagram 62
3.4.6 Component Diagram 64
3.4.7 Deployment Diagram 66
3.4.8 State Chart Diagram 68
4 IMPLEMENTATION 71
4.1 Algorithms 72
4.2 Algorithms Steps 77
4.3 Software Installation 80
4.4 Software Environment 81
4.5 Steps for Executing the Project 82
4.6 Pseudo code 83
5 TESTING 87
5.1 Testing 88
5.1.1 Types of Tests 88
5.1.2 White Box Testing 90
5.1.3 Black Box Testing 90
5.1.4 Levels of Testing 90
5.1.4.1 Unit Testing 90
5.1.4.2 Integration Testing 91
5.1.4.3 Acceptance Testing 91
6 RESULTS 93
6.1 Output Screens 94
6.2 Results Outputs 96
7 CONCLUSION 99
8 FUTURE SCOPE 101
9 REFERENCES 103
10 PUBLISHED PAPER 106
1. INTRODUCTION
Sign language, a visual-gestural language, offers a means for individuals with hearing
impairments to express themselves and engage with others. Yet, the lack of widespread
knowledge and understanding of sign language among the general population poses a
considerable obstacle to effective communication. Without proficient interpretation or
alternative communication methods, individuals with speech and hearing disabilities
may find themselves isolated from meaningful interaction.
To address these challenges and promote inclusivity, there is a pressing need to develop
innovative solutions that facilitate sign language communication and make it accessible
to a broader audience. Sign language recognition systems represent a promising avenue
for bridging the communication gap between individuals who use sign language and
those who do not.
Sign language relies on hand gestures, facial expressions, and body movements to
convey complex information and emotions. However, interpreting these gestures
accurately requires specialized knowledge and training. Moreover, sign language varies
across different regions and communities, further complicating the process of
communication and interpretation.
Despite significant research efforts in recent years, sign language recognition remains
a challenging problem. Traditional methods, such as using hand gloves equipped with
sensors for human-computer interaction, have limitations. These methods often require
users to wear cumbersome equipment and manage complex cables connecting to a
computer, hindering natural and spontaneous communication.
To overcome these limitations and enhance accessibility, researchers are exploring
alternative approaches to sign language recognition that do not rely on external
wearable hardware. By leveraging advancements in computer vision, machine learning,
and artificial intelligence, these systems aim to recognize sign language gestures using
bare hands, eliminating the need for specialized equipment, and streamlining user
interaction.
Furthermore, automatic sign language recognition systems have the potential to reduce
reliance on costly and often inaccessible human-based translation services. By
automating the interpretation process, these systems offer a more efficient and scalable
solution for bridging communication gaps in diverse settings, including education,
healthcare, and social interaction.
In summary, sign language recognition systems hold immense promise for promoting
inclusivity and breaking down communication barriers for individuals with speech and
hearing disabilities. Through ongoing research and innovation, we can continue to
advance the field of sign language recognition and create a more inclusive society
where communication is accessible to all.
Figure 1.1 Gestures
1.1 PROBLEM STATEMENT
Sign language translation, the process of converting gestures into written language, has
seen advancements, yet a gap remains in converting this written language into spoken
form. To address this gap, we propose the development of a system capable of
translating sign language gestures into speech using deep learning algorithms,
specifically Convolutional Neural Networks (CNN) and Artificial Neural Networks
(ANN).
CNNs are well-suited for capturing intricate hand movements, as they excel at
extracting spatial features from images. ANNs, in turn, are adept at learning temporal
relationships, making them suitable for understanding the sequential nature of sign
language gestures. By leveraging these deep learning algorithms, our system aims to
accurately interpret sign language gestures and generate corresponding written text.
Once the sign language gestures are translated into written language, the next challenge
is converting this text into speech. To achieve this, we plan to integrate a Text-To-
Speech (TTS) API into our system. This will enable us to seamlessly convert the
translated text into spoken language, providing a complete communication solution for
deaf and mute individuals.
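As a small illustration of this step, converting a piece of recognized text into audible speech with the gTTS tool (introduced in the objective below) might look roughly like this; the text and the output file name are placeholders, not the project's actual values:

    from gtts import gTTS

    recognized_text = "hello"                 # text produced by the gesture recognizer (placeholder)
    tts = gTTS(text=recognized_text, lang="en")
    tts.save("speech.mp3")                    # the saved audio can then be played back to the listener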
1.2 OBJECTIVE
The core objective of achieving real-time operation is critical to ensuring that the
system can provide immediate feedback and responses, facilitating seamless
communication between users. To enhance usability and accessibility, we plan to
integrate the Google Text-to-Speech (gTTS) tool, which will enable the system to
convert the recognized ISL finger spelling gestures into spoken words. This auditory
output feature will significantly benefit individuals with hearing impairments, as it will
provide them with an additional mode of communication beyond visual cues.
Our project aims to go beyond mere gesture recognition and focus on converting these
gestures into meaningful sentences. By implementing algorithms to parse the sequence
of recognized signs and arrange them into grammatically correct sentences, we intend
to create a comprehensive communication solution for users of ISL. This holistic
approach not only enables users to convey individual words but also facilitates the
construction of complete sentences, thereby enhancing the richness and effectiveness
of communication.
Ultimately, our overarching objective is to foster inclusivity and break down barriers to
communication for individuals with disabilities, particularly those who rely on sign
language as their primary mode of communication. By developing a user-friendly, real-
time system that seamlessly translates ISL gestures into spoken words, we aim to
empower users to communicate more effectively and engage more fully in everyday
interactions. Through this project, we aspire to contribute towards creating a more
accessible and inclusive society where communication barriers are minimized, and all
individuals have equal opportunities to express themselves and connect with others.
c.) Gesture Detection:
Gesture detection involves the process of identifying and categorizing specific hand
movements or gestures that represent words, letters, or other meaningful units in
sign language. By analyzing the spatial and temporal characteristics of hand
movements captured in images or videos, computers can determine which gestures
correspond to which linguistic elements, facilitating effective communication
between signers and non-signers.
Deep Learning:
Deep learning represents a sophisticated subset of machine learning methodologies,
characterized by the construction and training of neural networks with multiple
layers. Unlike traditional machine learning algorithms, which often rely on feature
engineering and manual extraction of relevant patterns from data, deep learning
models can automatically learn intricate features and patterns directly from raw data.
This capability is particularly advantageous when dealing with large and complex
datasets, where manually defining features may be impractical or infeasible.
The foundation of deep learning lies in artificial neural networks, which are
computational models inspired by the structure and function of the human brain.
These networks consist of interconnected nodes, or neurons, organized into layers.
Data is fed into the input layer, processed through intermediate hidden layers, and
finally, the output layer produces the desired predictions or classifications.
One of the key strengths of deep learning is its ability to learn hierarchical
representations of data. Each layer of the neural network extracts increasingly
abstract and complex features from the input data, allowing the model to capture
intricate patterns and relationships. This hierarchical feature learning enables deep
learning models to excel in tasks such as image and speech recognition, where
understanding high-dimensional and nuanced data is essential.
Deep learning has found widespread application across various domains, including
computer vision, natural language processing, and robotics. In computer vision, deep
learning models have achieved remarkable success in tasks such as object detection,
image classification, and facial recognition. Similarly, in natural language
processing, deep learning techniques have revolutionized the field, enabling
advancements in machine translation, sentiment analysis, and speech synthesis.
Moreover, deep learning has played a pivotal role in the development of autonomous
systems, including self-driving cars, drones, and robotic agents. By leveraging deep
learning algorithms, these systems can perceive and interpret their environments,
make informed decisions, and adapt to changing conditions in real-time.
Overall, the versatility and power of deep learning make it a cornerstone of modern
artificial intelligence research and application. As the volume and complexity of data
continue to grow, deep learning techniques are poised to drive further innovation
and breakthroughs across diverse fields, ultimately shaping the future of technology
and society.
sliding across the input image, identifying patterns and edges. This feature extraction
capability is pivotal in enabling CNNs to discern complex visual information and
make accurate predictions about the contents of images.
The architecture of CNNs typically comprises several layers, each serving a specific
function in the feature extraction and classification process. The fundamental
building blocks of CNNs include:
2. Max Pooling Layer (MaxPool2D): Max pooling layers reduce the dimensionality
of feature maps generated by convolutional layers by retaining the most
significant information while discarding irrelevant details. This process helps in
reducing computational complexity and controlling overfitting, ultimately
improving the efficiency of the network.
3. Flatten Layer: The flatten layer serves to reshape the multi-dimensional feature
maps produced by previous layers into a one-dimensional vector. This flattened
representation is then fed into dense layers for further processing and
classification.
4. Dense Layer (Fully Connected): Dense layers, also known as fully connected
layers, are traditional neural network layers where each neuron is connected to
every neuron in the previous and next layers. These layers are responsible for
learning non-linear relationships in the data and performing classification tasks
based on the extracted features.
5. Softmax Activation: Typically used in the output layer for multi-class classification
problems, the softmax function converts raw scores into probabilities, ensuring that
the output probabilities sum up to 1.
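To make the layer stack concrete, the following is a minimal Keras sketch of a CNN built from the layers described above; the input size, filter counts, and number of gesture classes are illustrative assumptions rather than the project's exact configuration:

    from tensorflow.keras import layers, models

    NUM_CLASSES = 26           # assumed number of gesture classes (illustrative)
    INPUT_SHAPE = (64, 64, 1)  # assumed image size; the actual project may differ

    model = models.Sequential([
        # convolutional layers slide filters over the image to extract spatial features
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=INPUT_SHAPE),
        # max pooling reduces the dimensionality of the feature maps
        layers.MaxPool2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPool2D((2, 2)),
        # flatten converts the 2-D feature maps into a one-dimensional vector
        layers.Flatten(),
        # dense (fully connected) layers learn non-linear combinations of the features
        layers.Dense(128, activation='relu'),
        # softmax turns raw scores into class probabilities that sum to 1
        layers.Dense(NUM_CLASSES, activation='softmax'),
    ])
    model.summary()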
In an ANN, inputs are fed into the first layer of neurons, known as the input layer.
Each neuron in the input layer corresponds to a feature or attribute of the input data.
The input layer processes the incoming data and passes it on to the next layer of
neurons, known as the hidden layers. The hidden layers are responsible for learning
and extracting complex patterns and relationships from the input data through a
series of nonlinear transformations. Each neuron in the hidden layers computes a
weighted sum of its inputs, applies an activation function to this sum, and passes the
result to the neurons in the next layer.
Finally, the processed information is passed to the output layer, where the network
generates its final predictions or outputs based on the learned features. The output
layer typically consists of one or more neurons, depending on the nature of the task
(e.g., binary classification, multi-class classification, regression). Each neuron in the
output layer represents a possible outcome or class label, and the neuron with the
highest activation value indicates the network's prediction.
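Purely as an illustration of this forward pass, the sketch below uses NumPy with made-up layer sizes and random weights to compute a weighted sum at each layer, apply an activation function, and pick the most probable class:

    import numpy as np

    def relu(x):
        return np.maximum(0, x)

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / e.sum()

    rng = np.random.default_rng(0)
    x = rng.random(10)                            # input layer: 10 illustrative features

    # hidden layer: weighted sum of the inputs followed by a non-linear activation
    W1, b1 = rng.standard_normal((16, 10)), np.zeros(16)
    h = relu(W1 @ x + b1)

    # output layer: one neuron per class; the highest probability is the prediction
    W2, b2 = rng.standard_normal((5, 16)), np.zeros(5)
    probs = softmax(W2 @ h + b2)
    print("predicted class:", int(np.argmax(probs)))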
OpenCV:
OpenCV (Open-Source Computer Vision Library) is a widely-used open-source
library for computer vision and image processing tasks. It provides a comprehensive
set of functions and algorithms that facilitate various operations on images and
videos, including reading, writing, manipulation, analysis, and feature extraction.
In the context of the provided code, OpenCV is utilized for a range of computer
vision tasks, which may include:
1. Image Loading and Display: OpenCV provides functions to load images from
files in various formats (e.g., JPEG, PNG) and display them on the screen. This
functionality allows developers to visualize images and inspect them during the
development process.
6. Video Processing: OpenCV allows developers to process video streams in real-
time, enabling tasks such as video capture, frame manipulation, object tracking,
and motion analysis. It also supports video compression, encoding, and decoding
for efficient video processing.
Overall, OpenCV serves as a versatile and powerful tool for a wide range of
computer vision and image processing tasks, making it a popular choice for
developers working on applications ranging from robotics and automation to
healthcare and entertainment. Its extensive documentation, active community, and
cross-platform support further contribute to its widespread adoption in both
academic and industrial settings.
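For illustration, the image-display and video-processing capabilities mentioned above might be exercised as follows; the image file name is a placeholder:

    import cv2

    # image loading and display
    img = cv2.imread("sample_gesture.jpg")        # placeholder file name
    if img is not None:
        cv2.imshow("Gesture", img)
        cv2.waitKey(0)

    # real-time video processing from the default webcam
    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        cv2.imshow("Live feed", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):     # press 'q' to stop
            break
    cap.release()
    cv2.destroyAllWindows()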
Background Subtraction:
The `cal_accum_avg () ` function serves a crucial role in computer vision
applications, particularly in scenarios where it's essential to extract the background
from input frames. This function calculates the accumulated weighted average of the
background frames over time, allowing for the continuous updating and refinement
of the background model.
In practical terms, the function iterates through a series of input frames, gradually
incorporating each frame into the accumulated average background model. The
accumulation process involves assigning weights to each pixel in the background
model based on its historical values and the new information provided by the current
input frame. By adjusting these weights over time, the function ensures that the
background model remains adaptive and robust to changes in the environment.
updating and refining the background model. Its ability to adapt to changes in the
scene over time makes it a valuable tool for various computer vision applications,
including surveillance, object tracking, and scene analysis.
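A minimal sketch of how such an accumulation function might be implemented with OpenCV's accumulateWeighted; the weight value and the module-level background variable are assumptions based on the description above, not the project's exact code:

    import cv2

    background = None

    def cal_accum_avg(frame, accumulated_weight=0.5):
        """Update the running weighted average that models the background."""
        global background
        if background is None:
            background = frame.copy().astype("float")
            return
        # each new frame is blended into the background model with the given weight
        cv2.accumulateWeighted(frame, background, accumulated_weight)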
Hand Segmentation:
The `segment_hand () ` function serves a pivotal role in computer vision tasks,
particularly in applications where isolating the hand from the background is
necessary. This function utilizes background subtraction techniques to segment the
hand region from the rest of the scene.
To achieve this, the function first calculates the absolute difference between the
current frame and the accumulated background model obtained from
`cal_accum_avg () ` function. This difference represents the changes in pixel values
between the current frame and the background, effectively highlighting regions
where motion or variations have occurred.
Next, the function applies a threshold to the absolute difference image to convert it
into a binary image. This thresholding operation distinguishes between pixels
representing the hand (foreground) and those representing the background. By
setting an appropriate threshold value, the function can effectively separate the hand
region from the background, creating a binary mask where hand pixels are
represented by white and background pixels by black.
Finally, the function identifies contours within the binary image using contour
detection algorithms such as the one provided by OpenCV. These contours represent
continuous regions of white pixels in the binary mask, which correspond to the hand
region. By extracting and analyzing these contours, the function can accurately
delineate the boundaries of the hand and obtain its shape and position within the
frame.
Overall, the `segment_hand () ` function plays a crucial role in hand detection and
tracking applications by utilizing background subtraction techniques to isolate the
hand from the background. Its ability to accurately segment the hand region allows
for further processing and analysis, such as hand gesture recognition, hand pose
estimation, and interaction in human-computer interaction systems.
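Following the description above, a possible sketch of the segmentation step using absolute difference, thresholding, and contour detection; the threshold value is an assumed constant, and the background variable comes from the accumulation sketch earlier:

    import cv2

    def segment_hand(frame, threshold=25):
        """Return the thresholded mask and the largest contour, taken as the hand."""
        # difference between the current frame and the accumulated background model
        diff = cv2.absdiff(background.astype("uint8"), frame)
        # binary mask: hand pixels become white, background pixels black
        _, thresholded = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
        # contours of the white regions; the largest one corresponds to the hand
        contours, _ = cv2.findContours(thresholded.copy(),
                                       cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return None
        hand_segment = max(contours, key=cv2.contourArea)
        return thresholded, hand_segment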
Preprocessing:
The `segment_hand () ` function is a crucial preprocessing step in computer vision
applications focused on hand detection and tracking. It begins by computing the
absolute difference between the current frame and the background model,
emphasizing regions where notable changes have occurred.
Following this, a thresholding operation is applied to the difference image,
classifying pixels as foreground (hand) or background based on their intensity
values. This creates a binary image where hand pixels are represented by white and
background pixels by black. Subsequently, contour detection algorithms are utilized
to identify continuous regions of white pixels in the binary image, which correspond
to the hand region. By detecting and extracting these contours, the function
accurately delineates the boundaries of the hand and identifies its spatial extent
within the frame. `segment_hand () ` effectively isolates the hand region from the
background in input frames, laying the groundwork for subsequent analysis and
interaction tasks such as hand gesture recognition and human-computer interaction.
Contour Detection:
The `cv2.findContours() ` function is a fundamental tool in computer vision for
identifying and extracting contours from binary images, particularly after
thresholding operations. Contours represent the boundaries of objects within an
image and play a crucial role in tasks such as shape analysis, object detection, and
segmentation. In the context of hand detection, this function is employed to locate
the contours outlining the hand region within the binary image obtained from the
preprocessing step.
Since the hand region is typically the largest connected component in the binary
image, it corresponds to the largest contour detected by the function. By identifying
this contour, the `cv2.findContours() ` function effectively delineates the boundaries
of the hand region, enabling subsequent analysis and processing. This could include
extracting features such as the centroid, area, and convex hull of the hand, facilitating
tasks like hand gesture recognition, hand tracking, or human-computer interaction.
The function serves as a key component in the pipeline for hand detection and
enables accurate localization of the hand within the input image.
TensorFlow and Keras:
TensorFlow stands as a prominent open-source machine learning library that Google
developed. It is widely recognized for its versatility, efficiency, and extensive
support for building and deploying machine learning models across a variety of
platforms and devices. TensorFlow offers a comprehensive ecosystem of tools and
resources, making it a popular choice among researchers and developers alike for
tasks ranging from traditional machine learning to deep learning and beyond.
Keras, on the other hand, is an open-source neural network library that operates on
top of TensorFlow. It serves as a high-level neural networks API, providing a user-
friendly interface for building, training, and deploying neural network models. Keras
prioritizes simplicity and ease of use, making it particularly well-suited for rapid
prototyping and experimentation with neural network architectures. By abstracting
away the complexities of low-level TensorFlow operations, Keras enables
developers to focus on model design and experimentation without getting bogged
down in implementation details.
Collaboration between TensorFlow and Keras brings together the best of both
worlds: TensorFlow's power and scalability, coupled with Keras's simplicity and
flexibility. This integration has significantly contributed to the widespread adoption
of both frameworks in the machine learning community, fostering innovation and
advancements in deep learning research and applications.
Matplotlib:
Matplotlib stands as a cornerstone in the Python ecosystem, offering a
comprehensive plotting library that facilitates the creation of static, interactive, and
animated visualizations. Its versatility and ease of use make it a go-to choice for
data scientists, researchers, and developers across various domains.
One of Matplotlib's key strengths lies in its ability to generate high-quality static
visualizations with minimal code. With Matplotlib, users can create a wide range of
plots, including line plots, scatter plots, bar plots, histograms, and more, allowing
for effective exploration and communication of data insights. Its intuitive interface
and extensive customization options enable users to tailor visualizations to their
specific needs, adjusting parameters such as colors, labels, axes, and annotations.
Data Augmentation:
In the provided code, data augmentation techniques such as rotation, zooming, and
horizontal flipping are applied to increase the diversity of the training dataset,
thereby enhancing the robustness of the model to variations in the input data. This is
achieved using the `ImageDataGenerator` class provided by Keras.
1. Rotation:
The `rotation_range` parameter is set to 40, which allows for random rotations of
the input images within the range of -40 to +40 degrees. This introduces variations
in the orientation of the hand gestures, helping the model generalize better to unseen
angles.
2. Zooming:
The `zoom_range` parameter is set to 0.2, enabling random zooming of the input
images by a factor of up to 20%. This augmentation simulates variations in the scale
of the hand gestures, allowing the model to learn from different zoom levels.
3. Horizontal Flipping:
The `horizontal_flip` parameter is set to True, enabling random horizontal flipping
of the input images. This augmentation mirrors the hand gestures horizontally,
effectively doubling the size of the training dataset and exposing the model to
additional variations in hand orientation.
By applying these data augmentation techniques during the training process, the
model becomes more robust and generalizes better to unseen variations in hand
gestures. This helps prevent overfitting and improves the model's performance on
real-world data.
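Based on the parameters described above, the augmentation setup could be sketched as follows; the rescaling factor is an added assumption for normalizing pixel values:

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    train_datagen = ImageDataGenerator(
        rescale=1.0 / 255,      # assumed normalization of pixel values to the 0-1 range
        rotation_range=40,      # random rotations between -40 and +40 degrees
        zoom_range=0.2,         # random zoom of up to 20%
        horizontal_flip=True,   # random horizontal mirroring of the gestures
    )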
programming languages, making it a go-to choice for many. Some key advantages
and features of VS Code include:
2. Intuitive User Interface: One of the standout features of VS Code is its clean
and intuitive user interface, designed to streamline the coding experience. The
editor provides a clutter-free workspace with customizable layouts, a rich set
of editing tools, and a robust set of keyboard shortcuts for efficient navigation
and coding.
including extensions, plugins, and documentation. This active ecosystem
fosters collaboration, innovation, and knowledge sharing among users.
Visual Studio Code stands out as a powerful, flexible, and user-friendly code editor
that caters to the diverse needs of developers worldwide. Its robust features,
intuitive interface, and extensive ecosystem make it an indispensable tool for
modern software development projects.
2. LITERATURE SURVEY
A literature review surveys prior research published in books, scholarly articles, and
any other sources relevant to a particular issue, area of research, or theory, and by
so doing, provides a description, summary, and critical evaluation of these works in
relation to the research problem being investigated.
The journal provides insights into the nature and significance of gestures,
particularly within the context of sign language. It defines gesture as a form of non-
verbal communication characterized by bodily motions that convey information,
which can be either static (unchanging) or dynamic (changing over time). Sign
language, as described, encompasses visual gestures and signs used by individuals
who are deaf or mute to communicate. It emphasizes that sign language is a
structured code, where each sign carries a specific meaning assigned to it. These
signs go beyond representing just alphabets or numbers; they also convey common
expressions, greetings, and full sentences, allowing for rich and nuanced
communication.
Overall, the journal underscores the importance of gestures and sign language as
fundamental modes of communication for individuals who are deaf or mute. It
highlights the structured nature of sign language and acknowledges the
complexities associated with representing meaning through gestures, particularly
in ISL. Additionally, it suggests a need for more research and development efforts
to further understand and advance the field of sign language, particularly in the
context of ISL [6].
Summary:
This work captures the essence of gestures, particularly within the context of sign
language, and sheds light on the complexities associated with Indian Sign Language
(ISL) compared to
American Sign Language (ASL). Gestures, as emphasized, are a means of
conveying information through bodily motions, with sign language serving as a
prime example utilized by individuals who are deaf and mute. Sign language
comprises visually represented signs, each carrying specific meanings, enabling
rich and nuanced communication.
It highlights the importance of developing an interactive, real-time video-based
sign language translation system, particularly tailored for individuals who are deaf
or mute and face challenges in communicating with others. Such a system, powered
by efficient machine learning algorithms, holds significant potential to bridge the
communication gap between individuals with hearing and speech impairments and
those who can hear and speak.
These domains, gesture recognition, and human activity recognition, are rapidly
advancing areas of research and development. They not only contribute to the
creation of sign language translation systems but also find applications in various
other fields, including automation in households and industries. The integration of
efficient machine learning algorithms into these systems enables higher levels of
automation and efficiency, facilitating seamless communication and interaction
between individuals with hearing and speech impairments and their counterparts
in both personal and professional settings.
Summary:
The development of a real-time video-based sign language translation system,
propelled by efficient machine learning algorithms, represents a significant step
forward in improving communication accessibility for individuals who are deaf or
mute. By harnessing the power of machine learning, this system endeavors to
bridge the communication gap between those with hearing and speech impairments
and the rest of society.
Central to the functionality of this system is the ability to recognize gestures and
human activity. Gesture recognition plays a pivotal role in interpreting sign
language, as it involves identifying and analyzing hand movements, facial
expressions, and body postures—the primary components of sign language
communication. Additionally, the system must also recognize human activity to
understand the context in which gestures are made and interpret individual
behavior accurately.
2.1.3 ML Based Sign Language Recognition System
The development of the model centres around vision-based isolated hand gesture
detection and recognition, aiming to provide a solution for individuals with speech
and hearing impairments to effectively communicate through sign language. By
segmenting sign language into region-wise divisions, the model offers a
straightforward method for users to convey information, enhancing accessibility
and understanding. This approach is particularly valuable considering that a
significant portion of society does not comprehend sign language, leaving speech
and hearing-impaired individuals reliant on human translators for communication.
However, the availability and affordability of human interpreters may be limited,
presenting challenges in ensuring consistent and accessible communication.
Summary:
The model emphasizes vision-based isolated hand gesture detection and
recognition, which plays a pivotal role in enabling individuals to convey
information effectively through sign language. By focusing on this aspect, the
model aims to provide a user-friendly and efficient method for communication,
particularly for those with speech and hearing impairments. Sign language, with
its intricate gestures and expressions, serves as a rich and nuanced form of
communication, and the model's emphasis on isolated hand gesture detection and
recognition ensures that these subtleties are accurately captured and understood.
One of the primary motivations behind the development of such a model is the
limited availability and affordability of human translators. Many individuals who
are speech and hearing impaired rely on human interpreters to facilitate
communication with others. However, the scarcity of trained interpreters, coupled
with the associated costs, can often hinder access to effective communication. In
this context, an automated system emerges as a valuable substitute, offering a
reliable and accessible solution for interpreting sign language.
2.1.4 Sign Language Recognition System Using Deep-Learning for Deaf and
Dumb
Aashir Hafeez, Suryansh Singh, Ujjwal Singh, Priyanshu Agarwal, Anant Kumar
Jayswal.
Amity School of Engineering and Technology Amity University, Noida Uttar
Pradesh, India.
The journal highlights the prevalent use of sign language among the majority of
deaf individuals as their primary mode of communication. It underscores the
challenge faced by those who do not understand sign language in effectively
interacting with individuals who rely on it for communication. In response to this
challenge, researchers have developed a device known as a sign language
recognition system (SLR).
The journal delves into the multiple stages involved in the development of an
automated SLR system. These stages typically include data collection,
preprocessing, feature extraction, model training, evaluation, and deployment.
Data collection involves gathering a comprehensive dataset of ASL gestures,
while preprocessing involves tasks such as image or video cleaning,
normalization, and segmentation. Feature extraction aims to extract relevant
features from the data, such as hand shapes, movements, and orientations.
deployment involves integrating the trained model into a real-world application
or device, such as a mobile app or a wearable device, to enable real-time ASL
gesture recognition.
The study described in the journal provides valuable insights into the
development of automated systems for recognizing sign language. By comparing
different machine learning techniques and outlining the various stages of an SLR
system, the study contributes to advancing research in this field and improving
communication accessibility for individuals who rely on sign language as their
primary means of communication [16].
Summary:
Deaf individuals rely predominantly on sign language as their primary mode of
communication. However, this poses a challenge for those who do not understand
sign language, as it creates barriers to effective interaction. To address this issue,
a sign language recognition system has been developed. This system serves as a
technological solution to facilitate communication between individuals who use
sign language and those who do not.
The primary objective of the study is to automate the process of sign language
recognition. By leveraging machine learning techniques, researchers seek to
develop algorithms capable of accurately identifying and understanding ASL
gestures in real-time. This automation aims to enhance accessibility and
inclusivity by enabling individuals who do not understand sign language to
communicate effectively with those who rely on it.
The development of a sign language recognition system represents a significant
advancement in improving communication accessibility for individuals who are
deaf or hard of hearing. By comparing different machine learning techniques and
focusing on automating sign language recognition, the study contributes to
advancing research in this field and ultimately fostering greater understanding
and connection among diverse populations.
The translated text is then converted to speech using a Text-To-Speech (TTS)
API. This allows the system to provide a complete communication solution for
deaf and mute individuals.
potential benefits. It also helps in securing support and funding for the project by
demonstrating its feasibility and potential return on investment.
Overall, the feasibility analysis phase is a critical step in the project lifecycle,
helping to ensure that the proposed system is both technically and economically
feasible, and aligns with the company's operational needs and objectives.
Three key considerations involved in the feasibility analysis are:
• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY
2.4.1 ECONOMICAL FEASIBILITY
To achieve economic feasibility, the developed system must remain within the allocated budget.
This goal has been successfully accomplished, primarily because most of the
technologies utilized in the project are freely available. Leveraging open-source
technologies and tools helps minimize costs associated with software licenses and
subscriptions. Additionally, it allows the organization to benefit from the
collective efforts of the open-source community and access a wide range of
resources and support.
This approach not only ensures cost-effectiveness but also maximizes the
organization's return on investment by minimizing unnecessary expenditures. By
strategically leveraging freely available technologies and only investing in
customized solutions when essential, the organization can optimize its resources
and achieve its objectives without overspending. This demonstrates prudent
financial management and underscores the importance of economic feasibility in
project planning and execution.
2.4.2 TECHNICAL FEASIBILITY
The study is conducted to assess the technical feasibility of the proposed system,
emphasizing the importance of ensuring that the system's technical requirements
are well-aligned with the available resources. It is crucial that the system does not
place excessive demands on the organization's technical infrastructure, as this
could lead to issues such as performance bottlenecks, system failures, or
increased maintenance costs.
High demands on technical resources can also impact the client, potentially
resulting in delays, disruptions, or additional expenses. Therefore, it's imperative
that the developed system has modest technical requirements, requiring minimal
or no changes to the existing technical environment for implementation.
To achieve this, the system's technical architecture and design must be carefully
considered to optimize resource utilization and minimize dependencies on
specialized hardware or software. Utilizing scalable and efficient technologies,
such as cloud computing or virtualization, can help ensure that the system remains
adaptable to changes in demand and can accommodate future growth without
significant investments in additional infrastructure.
This approach not only minimizes potential disruptions for the client but also
enhances the overall sustainability and long-term viability of the system.
2.4.3 SOCIAL FEASIBILITY
This aspect of the study focuses on assessing the level of acceptance of the system
by its users, which is crucial for the successful implementation and adoption of
the system. This process encompasses various elements, including training the
users to use the system efficiently and effectively. It's essential that users do not
perceive the system as a threat but rather as a valuable tool that enhances their
productivity and efficiency.
User acceptance depends heavily on the methods employed to educate users about
the system and make them familiar with its features and functionalities. Effective
training programs and user-friendly interfaces play a significant role in building
user confidence and fostering acceptance. Users should feel empowered and
comfortable using the system, knowing that it supports their tasks and enhances
their abilities.
Ultimately, the goal is to raise the user’s level of confidence in the system so that
they embrace it as a necessary and valuable tool in their workflow. When users
feel confident and comfortable using the system and are empowered to provide
feedback, they are more likely to accept and adopt it wholeheartedly, leading to
successful implementation and long-term usage.
2.5 NEED FOR FEASIBILITY STUDY
During the feasibility study, various aspects of the proposed project are carefully
analyzed to assess its feasibility. This includes evaluating technical feasibility to
determine if the project can be successfully developed using available technology
and resources. Economic feasibility assesses the financial viability of the project,
considering factors such as development costs, potential return on investment,
and long-term sustainability. Operational feasibility evaluates whether the
proposed system aligns with the organization's operational goals and can be
effectively integrated into existing workflows and processes.
3. ANALYSIS AND DESIGN
3.1. REQUIREMENTS
SOFTWARE REQUIREMENTS
Software requirements describe the features and functionalities expected from the
target system, encompassing both obvious and hidden, known and unknown, as
well as expected and unexpected requirements from the client's perspective. The
process of gathering, analyzing, and documenting software requirements is
collectively referred to as software requirement analysis, which is essential for
understanding the scope of the project and defining the system's objectives.
In the context of the provided code, software requirements refer to the specific
features and functionalities that the code aims to implement. These requirements
serve as guidelines for development and provide clarity on the system's intended
behavior and capabilities. By documenting and understanding these requirements,
developers can ensure that the software system meets the needs and expectations of
its stakeholders.
TensorFlow:
TensorFlow stands as a cornerstone in the realm of deep learning frameworks,
crafted and maintained by the tech juggernaut Google. This open-source framework
offers developers an extensive toolkit comprising tools and libraries essential for
crafting and honing a diverse array of machine learning models, prominently
featuring neural networks. Within the context of the provided code, TensorFlow
assumes a pivotal role, steering the entire journey from model inception to
evaluation, with a specific focus on convolutional neural networks (CNNs) tailored
explicitly for image classification tasks.
At its core, TensorFlow serves as a robust foundation for defining the architecture
and specifications of CNN models. Developers harness its flexibility to articulate
the intricate layers, activation functions, and other pivotal parameters essential for
effective image classification. TensorFlow's scalability ensures that developers can
seamlessly configure and adapt CNN architectures to address the unique
requirements of their projects.
recall, and other key performance indicators, providing a comprehensive
understanding of the model's effectiveness in image classification tasks.
OpenCV:
OpenCV (Open-Source Computer Vision Library) stands as a cornerstone in the
domain of computer vision and machine learning, renowned for its widespread
adoption and versatility. As an open-source software library, OpenCV offers a rich
assortment of tools and algorithms designed to facilitate a multitude of image and
video processing tasks.
These tasks encompass a broad spectrum, ranging from fundamental operations like
reading and writing images to more sophisticated functionalities such as object
detection and tracking.
In the context of the provided code, OpenCV assumes a central role in performing
various image processing tasks essential for the project's objectives. One of its
primary functions involves reading images from external sources, enabling the code
to access and manipulate visual data. Additionally, OpenCV offers a plethora of
image processing techniques, including filtering operations, edge detection, and
contour finding, all of which contribute to the extraction of meaningful information
from images.
One notable capability of OpenCV utilized in the code is contour finding, a crucial
operation in object detection and shape analysis tasks. By identifying contours in
images, the code can isolate and extract regions of interest, facilitating subsequent
processing steps. Moreover, OpenCV provides functionalities for displaying
images, enabling developers to visualize intermediate results and validate the
effectiveness of their algorithms.
NumPy:
NumPy serves as a cornerstone in the realm of numerical computing within the
Python ecosystem, offering a robust foundation for handling multi-dimensional
arrays and matrices. As a fundamental package, NumPy provides developers with
a comprehensive suite of tools and functions tailored for efficient numerical
operations and array manipulations.
One of the key features of NumPy is its support for multi-dimensional arrays, which
enables developers to represent and manipulate data in a structured and efficient
manner. These arrays serve as the building blocks for various mathematical
computations, data processing tasks, and scientific simulations.
Within the context of the provided code, NumPy is utilized for a variety of purposes,
including array manipulations, mathematical operations, and handling image data.
For instance, NumPy's array manipulation functions enable developers to reshape,
concatenate, and transpose arrays as needed. Its mathematical functions facilitate
computations such as matrix multiplication, element-wise operations, and statistical
analysis.
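A few of these operations, sketched on a dummy image array purely for illustration; the array shapes are assumptions, not the project's actual data:

    import numpy as np

    # a grayscale image is simply a 2-D array of pixel values
    img = np.zeros((64, 64), dtype=np.uint8)

    batch = img.reshape(1, 64, 64, 1)            # reshape into a (batch, height, width, channel) tensor
    pair = np.concatenate([batch, batch])        # stack two samples along the batch axis
    scaled = batch.astype("float32") / 255.0     # element-wise normalization to the 0-1 range
    print(pair.shape, scaled.mean())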
Matplotlib:
Matplotlib stands as a versatile and powerful plotting library for Python, offering
developers a comprehensive toolkit for creating a diverse range of visualizations.
With its extensive collection of functions and capabilities, Matplotlib is a go-to
choice for data visualization tasks across various domains.
One of the key strengths of Matplotlib lies in its ability to generate a wide array of
visualizations, including line plots, histograms, scatter plots, bar charts, and more.
These visualizations serve as powerful tools for exploring and communicating data
insights effectively.
In the context of the provided code, Matplotlib plays a crucial role in visualizing
images and plotting training and validation accuracy and loss curves. For image
display, Matplotlib provides functions to visualize images stored as arrays, enabling
developers to inspect and analyze image data efficiently.
OS:
A key element for working with the operating system's file system in Python is the
`os` module. With its many features, developers may easily explore file paths,
work with directories, and execute different file actions inside of their Python
scripts.
Giving developers easy access to file paths so they can interact with files and
directories on their system is one of the main goals of the `os` module. Developers
can create file paths, check whether files or directories exist, and verify whether a given
path refers to a directory with functions like `os.path.join()`, `os.path.exists()`, and
`os.path.isdir()`, respectively.
The code provided makes use of the `os` module, which is frequently used for
retrieving folder paths.
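For example, the path-related helpers mentioned above can be combined as follows; the directory names are placeholders:

    import os

    dataset_dir = os.path.join("data", "train")   # build a path (names are placeholders)
    print(os.path.exists(dataset_dir))            # does the path exist?
    print(os.path.isdir(dataset_dir))             # is it a directory?

    # list the class sub-folders, one per gesture, if the directory is present
    if os.path.isdir(dataset_dir):
        print(os.listdir(dataset_dir))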
Python 3
Python, conceptualized in the late 1980s by Guido van Rossum at Centrum
Wiskunde & Informatica (CWI), has emerged as a versatile and powerful
programming language. One of Python's key strengths lies in its rich ecosystem of
modules, which play a crucial role in organizing and structuring Python code.
developers with ready-made solutions to common programming tasks, accelerating
development and reducing the need to reinvent the wheel.
Visual Studio Code:
At its core, Visual Studio Code is engineered to provide developers with a highly
customizable and efficient coding environment. Its lightweight nature ensures fast
startup times and responsive performance, even when handling large codebases.
Despite its streamlined design, VS Code packs a punch with a plethora of features
aimed at enhancing productivity and code quality.
One of the standout features of Visual Studio Code is its extensive ecosystem of
extensions. With a rich marketplace of extensions developed by both Microsoft and
the community, developers can tailor their editing experience to suit their specific
workflow and requirements. From language support and syntax highlighting to
debugging tools and version control integrations, VS Code extensions empower
developers to personalize their development environment with ease.
In addition to its editing capabilities, Visual Studio Code offers robust support for
debugging, enabling developers to identify and resolve issues efficiently. With
built-in debugging tools and seamless integration with various debugging
extensions, developers can debug their code directly within the editor, streamlining
the development process.
Furthermore, Visual Studio Code boasts robust support for version control systems
such as Git, allowing developers to manage their code repositories seamlessly.
Integration with Git features such as version history, branching, and merging
empowers developers to collaborate effectively and track changes to their codebase
with confidence.
USER REQUIREMENTS:
1. Image Dataset:
Users require a dataset comprising images organized into training and testing
directories, with subdirectories representing different classes, such as various hand
gestures. This dataset serves as the foundation for training and evaluating
convolutional neural network (CNN) models for image classification tasks.
2. Data Preprocessing:
Effective data preprocessing tools are essential for preparing the image dataset for
model training. This involves resizing images to a specified size, normalizing pixel
values to a common scale, and applying data augmentation techniques like rotation,
zoom, and horizontal flipping. These preprocessing steps help enhance the
robustness and generalization ability of the CNN model.
3. Model Training:
Users need to train a CNN model using the preprocessed image dataset. This entails
defining the model architecture, which includes specifying the number and types of
convolutional and pooling layers, as well as fully connected layers. Additionally,
users must compile the model with appropriate loss and optimization functions, and
set training parameters such as batch size and number of epochs.
4. Model Evaluation:
After training the CNN model, users need to assess its performance on a separate
test dataset. This involves evaluating metrics such as accuracy and loss on the test
data to gauge the model's ability to generalize to unseen examples. Model
evaluation helps identify potential issues such as overfitting or underfitting and
guides further optimization efforts.
5. Visualization:
Tools for visualizing training and validation metrics, such as accuracy and loss
curves, are essential for analyzing the performance of the CNN model over epochs.
Visualizing these metrics helps users track the model's training progress, detect
patterns, and make informed decisions regarding model optimization strategies.
6. Model Saving:
Once the CNN model is trained and evaluated, users need the capability to save the
model to a file for future use. Saving the model allows users to reuse it for inference
tasks, deploy it in production environments, or share it with others without the need
for retraining from scratch.
3.1.1 FUNCTIONAL REQUIREMENTS
The major functional requirements for our work encompass a comprehensive set
of specifications detailing the expected behavior and capabilities of the system
being developed. These requirements serve as the foundation for the entire
development process, guiding the design, implementation, and validation phases
to ensure the resulting system meets the intended objectives and effectively
addresses the needs of its users.
1. Data Preprocessing:
Data preprocessing involves preparing the input data for model training by
performing various transformations and augmentations. Firstly, the images are
resized to a specified size to ensure uniformity in input dimensions for the model.
Next, pixel values are normalized to a common scale, typically ranging from 0 to
1, to facilitate convergence during training. Additionally, data augmentation
techniques such as rotation, zoom, and horizontal flipping are applied to increase
the variability of the dataset, thereby enhancing the model's ability to generalize to
unseen data.
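As an illustration, this kind of preprocessing is commonly set up with Keras' ImageDataGenerator; the directory layout, image size and augmentation ranges below are assumptions rather than the project's exact settings.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (200, 200)  # assumed target size; all images are resized to one resolution

# Rescale pixel values to the 0-1 range and apply rotation, zoom and horizontal-flip augmentation
train_datagen = ImageDataGenerator(rescale=1.0 / 255,
                                   rotation_range=15,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1.0 / 255)

# Assumed directory layout: one subfolder per gesture class under train/ and test/
train_generator = train_datagen.flow_from_directory("dataset/train", target_size=IMG_SIZE,
                                                     class_mode="categorical", batch_size=32)
test_generator = test_datagen.flow_from_directory("dataset/test", target_size=IMG_SIZE,
                                                  class_mode="categorical", batch_size=32)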
3. Model Training:
The compiled model is trained using the training data generated from the
ImageDataGenerator, which automatically generates batches of augmented images
during training. Parameters such as the number of epochs (iterations over the entire
dataset) and batch size (number of samples processed in each iteration) are specified
for training. Throughout the training process, metrics such as accuracy and loss are
monitored and recorded to assess the model's performance and track its learning
progress.
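A minimal sketch of this training step, assuming a Keras model named model and the generators from the preprocessing sketch above; the Adam optimizer and categorical cross-entropy loss are common choices for this kind of multi-class task, not confirmed project settings.

# Compile with a loss function, an optimizer and the metrics to monitor
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# The batch size is set on the generators; epochs controls the number of passes over the data
history = model.fit(train_generator,
                    validation_data=test_generator,
                    epochs=20)   # the report's pseudocode sets epochs to 20
# history.history now holds the accuracy and loss recorded for every epoch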
4. Model Evaluation:
After training, the performance of the trained model is evaluated on a separate test
dataset that was not used during training. Evaluation metrics such as accuracy and
loss are calculated to quantify the model's ability to correctly classify unseen data
instances. This evaluation provides insights into the model's generalization
performance and helps identify potential areas for improvement.
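With the Keras objects assumed in the earlier sketches, this evaluation step reduces to a single call, for example:

# Compute loss and accuracy on the held-out test data
test_loss, test_accuracy = model.evaluate(test_generator)
print(f"Test accuracy: {test_accuracy:.3f}, test loss: {test_loss:.3f}")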
5. Visualization:
Visualizing training and validation metrics, such as accuracy and loss curves,
allows for a qualitative assessment of the model's performance over epochs. Plots
of these metrics provide insights into the model's convergence behavior, indicating
whether it is learning effectively or suffering from issues such as overfitting or
underfitting.
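A possible plotting sketch, assuming the history object returned by model.fit in the training sketch above:

import matplotlib.pyplot as plt

# Accuracy and loss per epoch, for both the training and the validation data
plt.plot(history.history["accuracy"], label="train accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("Epoch")
plt.legend()
plt.show()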
6. Model Saving:
Once the model is trained and evaluated, it is saved to a file in the HDF5 format
using the `save` method provided by Keras. This enables the trained model to be
reused for inference tasks or deployed in production environments without the need
for retraining.
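For instance (the file name is only illustrative):

# Save the trained model in HDF5 format and reload it later without retraining
model.save("gesture_cnn.h5")

from tensorflow.keras.models import load_model
reloaded_model = load_model("gesture_cnn.h5")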
In the context of the provided code, the non-functional requirements are crucial for
ensuring the effectiveness, efficiency, and reliability of the system. Here's an
elaboration on each of the mentioned non-functional requirements:
3. Usability: Well-organized and comprehensible code is essential for facilitating
collaboration among developers and ensuring the maintainability of the system.
Clear comments, documentation, and coding conventions should be employed
to enhance code readability and ease of understanding.
3.2 SYSTEM SPECIFICATIONS
H/W CONFIGURATION:
S/W CONFIGURATION:
• Matplotlib
• NumPy
• OpenCV-Python
• TensorFlow
• gTTS
• Tkinter
• SpeechRecognition
• PyAudio
3.3 SYSTEM ARCHITECTURE
3.4 UML DIAGRAMS
Unified Modeling Language (UML) serves as a standardized method for visualizing the
design and architecture of a system, akin to blueprints in traditional engineering
disciplines. It is closely associated with object-oriented design and analysis
methodologies, providing a comprehensive set of graphical notations to represent
various aspects of a system's structure and behavior.
By utilizing these UML diagrams, software engineers can effectively communicate and
document the design and behavior of complex systems, facilitating better
understanding, analysis, and collaboration among stakeholders throughout the software
development lifecycle.
GOALS:
The goals outlined in the design of the Unified Modeling Language (UML) serve to
establish a robust and versatile modeling language that addresses the diverse needs of
software developers and stakeholders throughout the software development process.
5. Encourage the growth of OO tools market: UML serves as a catalyst for the
development of a rich ecosystem of object-oriented (OO) modeling tools and software
engineering frameworks. By providing a standardized modeling language, UML fosters
innovation and competition in the market for OO development tools, ultimately
benefiting users with a wide range of options and solutions.
The user initiates the interaction by activating the webcam, which provides a real-time
video feed for gesture recognition. As the user performs hand gestures in front of the
webcam, the system captures and analyzes various features such as hand shape,
movement trajectory, and finger positions.
These extracted features are then compared against predefined patterns stored within
the system using matching algorithms. The system determines the closest match
between the captured gestures and the stored patterns, enabling gesture recognition.
Upon successful recognition, the system provides feedback to the user in two forms:
visual and auditory. The identified gesture is displayed as text output on the screen,
offering visual confirmation. Additionally, the system utilizes text-to-speech
technology to convey the identified gesture audibly, providing auditory feedback to the
user.
Overall, this use case facilitates seamless interaction with the system, allowing users to
communicate through gestures effectively. The automated recognition and feedback
mechanisms enhance user experience by providing both visual and auditory cues,
ensuring efficient communication through gesture-based interactions.
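Since gTTS is listed in the software configuration, the auditory feedback step could be sketched as follows; the output file name and the playback approach are assumptions.

from gtts import gTTS

def speak(text):
    # Convert the recognised gesture text to speech and store it as an MP3 file
    tts = gTTS(text=text, lang="en")
    tts.save("prediction.mp3")
    # The saved file can then be played back with any audio player available on the system

speak("HELLO")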
3.4.2 CLASS DIAGRAM
Complementing the CNN class, the "Artificial Neural Network (ANN)" class
extends the system's capabilities by addressing broader machine learning tasks.
For instance, it can recognize facial expressions or body language, providing
additional context to the sign language translation process.
Figure 3.3: Class Diagram for Gesture Language translation
Following the capture of hand movements, the system proceeds to create
representations of the signs conveyed by the user. This involves extracting
relevant features such as hand position, orientation, and motion from the video
data obtained. These representations serve as input for further processing and
analysis.
Through this sequential process, the system accurately interprets and translates
the sign language gestures into textual form. The sequence diagram captures the
iterative and branching nature of the process, demonstrating how the system
systematically processes each step to achieve the outcome of translating sign
language into text.
Figure 3.4: Sequence Diagram for Gesture Language translation
the dynamic behavior of a system and understanding the sequence of events
during runtime.
While both diagrams convey similar information about the interactions between
objects, they offer different perspectives on the system. Sequence diagrams
emphasize the temporal aspects of object interactions, showing the sequence of
messages over time. In contrast, collaboration diagrams provide a static view of
the system's architecture, highlighting the relationships between objects and
how they collaborate to achieve system functionality. Together, these diagrams
complement each other in providing a comprehensive understanding of the
system's behavior and structure.
NOTATIONS:
1. Objects
2. Actors
3. Links
4. Messages
Figure 3.5: Collaboration Diagram for Gesture Language translation
In UML, the activity diagram serves as a powerful tool for modeling the flow of
control within a system, focusing on the sequential and concurrent activities that
occur during its operation. Unlike other diagrams that emphasize implementation
details, the activity diagram provides a high-level overview of the system's
behavior, making it suitable for visualizing complex processes and interactions.
For our project, we employed the swim lane concept to construct an activity
diagram that illustrates the sequential steps involved in translating sign language
gestures into text. The diagram begins with the user initiating the process by
executing a sign language gesture, which serves as the starting point. Following
this, the system captures the gesture and inputs it into the translation system for
further analysis.
Once the gesture is received, the translation system evaluates the feasibility of
successfully translating it into text and speech. If the translation is deemed
feasible, the system proceeds to perform the translation process, converting the
gesture into textual form. The resulting text and speech output are then displayed
to the user, providing both visual and auditory feedback.
However, if the translation is not feasible for any reason, the process may
terminate without displaying any result, or it may indicate to the user that the
translation was unsuccessful. This systematic approach ensures efficient handling
of sign language gestures and facilitates their translation into textual form,
thereby enhancing communication for individuals with hearing impairments.
By depicting the sequential flow of activities and decision points in the translation
process, the activity diagram offers valuable insights into the system's behavior,
enabling stakeholders to understand and analyze its functionality more
effectively.
Figure 3.6: Activity Diagram for Gesture Language translation
The component diagram in our project delineates the physical view of the system,
breaking down the sign language translation process into smaller, manageable
components. Each component represents a distinct element of the system, such as
executables, files, or libraries, and illustrates their relationships and organization
within the overall architecture.
In the context of translating a sign language gesture into text, several key
components play essential roles in the process. Firstly, the interaction is initiated
by the user, who signs a gesture, thereby triggering the translation process. This
user interaction component serves as the starting point for the system's operation.
Next, the system captures the signed gesture using a camera component, which
acts as the input device. The camera component captures image or video data
representing the signed gesture, which is then transmitted to the processing
component for analysis.
Once the sign language gesture has been interpreted and translated into text, the
resulting text is outputted to the user through a display component. This display
component may present the translated text on a screen or output it through another
medium, such as a speaker for auditory feedback.
Figure 3.7: Component Diagram for Gesture Language translation
In essence, the deployment diagram maps the software architecture, designed in
the component diagram, to the physical system architecture, illustrating where
and how each software component will be deployed and executed. This mapping
is achieved through the depiction of nodes, which represent individual hardware
devices or computing resources, and their relationships, which delineate how
these nodes interact and communicate with each other.
The deployment diagram and the component diagram are closely interrelated, as
they both contribute to understanding the overall system architecture from
different perspectives. While the component diagram describes the internal
structure and organization of software components within the system, the
deployment diagram extends this perspective to encompass the physical hardware
infrastructure on which these components reside and operate. Together, these
diagrams provide a comprehensive overview of the system's architecture,
covering both its software and hardware aspects.
Figure 3.8: Deployment Diagram for Gesture Language translation
The state chart diagram for translating a sign language gesture into text provides
a visual representation of the various states and transitions involved in the
translation process. Each state represents a distinct phase of the system's
operation, while transitions depict the flow of control between states based on
certain conditions or events.
1. Idle State:
• In the idle state, the system remains passive, awaiting user input in the
form of a sign language gesture.
2. Gesture Detection State:
• Upon detection of a gesture by the camera, the system transitions to the
gesture detection state.
• Here, the system captures and prepares to process the detected gesture,
initializing the subsequent analysis phase.
3. Processing State:
• Upon entering the processing state, the system begins analyzing the
captured image or video data.
• This analysis involves processing the gesture using the CNN component to
interpret its meaning.
4. Translation State:
5. Display State:
• In the display state, the translated text is presented to the user on a screen or
through another output medium.
• This phase provides feedback to the user, conveying the meaning of the sign
language gesture in a comprehensible format.
6. Error State:
The state chart diagram effectively illustrates the sequential flow of the system
as it progresses through different phases of gesture capture, analysis, translation,
and display. It also accounts for potential errors or exceptions in the process,
ensuring that the system can handle unexpected situations and provide
appropriate feedback to the user.
4. IMPLEMENTATION
4.1 ALGORITHMS
CNN
In machine learning, ANNs, particularly CNNs, are powerful tools for various
classification tasks such as image, audio, and text recognition.
ANNs are inspired by the structure and functioning of the human brain, consisting of
interconnected nodes (neurons) organized into layers.
CNNs are particularly effective for image classification tasks due to their ability to
capture spatial hierarchies of features in images.
They are designed to learn spatial hierarchies of features automatically and adaptively
from raw pixel images.
CNNs excel at capturing local patterns such as edges, textures, and shapes, and
combining them to form higher-level representations.
Different types of neural networks are used for different tasks; each type of
neural network architecture is tailored to handle specific data types and capture
the relevant patterns effectively.
Basic Building Block for CNN:
CNNs consist of several layers, including convolutional layers, pooling layers, and
fully connected layers.
The basic building block of a CNN is the convolutional layer, which performs
convolutions on the input image to extract features.
Convolutional layers are typically followed by pooling layers, which down sample the
feature maps to reduce computational complexity and increase translation invariance.
The final layers of a CNN typically consist of fully connected layers, which process
the extracted features and make predictions based on them.
CNNs are versatile and powerful tools in machine learning, particularly for image-
related tasks, and understanding their architecture and basic building blocks is crucial
for effective utilization in various applications.
In the context of building a convolutional neural network (CNN), three key layers
constitute the building blocks of the architecture:
1. 1st Convolution Layer:
• The input image, with a resolution of 200x200 pixels, undergoes processing
in the first convolutional layer.
• In this layer, 64 filter weights are applied to the input image to extract
features.
• Each filter performs convolution operations on different parts of the input
image to detect patterns or features.
6. 3rd Pooling Layer:
• Following the third convolutional layer, another round of max pooling with
a filter size of 3x3 is performed.
• This pooling operation further reduces the spatial dimensions of the feature
maps, facilitating hierarchical feature extraction.
7. Flatten Layer:
• The output from the third pooling layer is flattened into a linear form to
prepare it for input into the subsequent dense layers.
• This flattening process converts the 2D pixel array into a one-dimensional
vector, enabling further processing by fully connected layers.
8. Final Layer:
• The output of the third pooling layer serves as input for the final dense layer.
• This dense layer consists of neurons equal to the number of classes being
classified (e.g., 27 classes of hand signs, including alphabets and a blank
symbol).
• Each neuron in this layer corresponds to a class, and the network's output
represents the likelihood or probability of each class being present in the
input image.
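Read together with the layer descriptions above, the stack could be sketched in Keras roughly as follows; the kernel size of the first convolution, the grayscale input channel and the omission of intermediate dense layers are assumptions, while the filter counts, the 3x3 pooling and the 27-way softmax output follow the text.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

NUM_CLASSES = 27  # 26 hand signs plus a blank symbol

model = Sequential([
    Conv2D(64, (3, 3), activation="relu", input_shape=(200, 200, 1)),  # 1st convolution layer
    MaxPooling2D(pool_size=(3, 3)),                                    # 1st pooling layer
    Conv2D(128, (2, 2), activation="relu"),                            # 2nd convolution layer
    MaxPooling2D(pool_size=(3, 3)),                                    # 2nd pooling layer
    Conv2D(256, (2, 2), activation="relu"),                            # 3rd convolution layer
    MaxPooling2D(pool_size=(3, 3)),                                    # 3rd pooling layer
    Flatten(),                                                         # flatten to a 1-D vector
    Dense(NUM_CLASSES, activation="softmax"),                          # one neuron per class
])
model.summary()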
Figure 4.2 Steps and the layers included in CNN
Parameter Sharing:
Translation Invariance:
• CNNs can recognize patterns regardless of their location in the input image.
• This property improves the model's robustness, particularly in tasks like
gesture recognition.
Scale Invariance:
Input:
Dataset of Preprocessed Images: This dataset contains images that have been
preprocessed and categorized into different classes. Each image represents a
specific class or category.
Image Size for Resizing: The images in the dataset need to be resized to a specific
size before being used for training the CNN model. This size is typically
determined based on the input requirements of the CNN architecture.
Steps:
2. Define Dataset Paths and Labels: Specify the paths to the directories
containing the preprocessed images and define the corresponding class labels.
4. Convert Data and Target to NumPy Arrays: Convert the preprocessed image
data and their corresponding labels into NumPy arrays, which are compatible with
the input requirements of the CNN model.
5. Split Data into Training and Testing Sets: Divide the dataset into separate
training and testing sets to evaluate the performance of the trained model on unseen
data.
6. Build the CNN Model: Define the architecture of the CNN model using layers
such as convolutional layers, pooling layers, and fully connected layers. Configure
the model's parameters and structure based on the specific requirements of the
classification task.
7. Compile the Model: Compile the CNN model by specifying the loss function,
optimizer, and evaluation metrics to be used during the training process.
8. Train the Model: Train the compiled CNN model using the training data.
Adjust the model's parameters iteratively to minimize the training loss and improve
performance on the training dataset.
9. Evaluate the Model: Evaluate the trained CNN model's performance on the
testing dataset to assess its accuracy, precision, recall, and other relevant metrics.
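As a small illustration of steps 4 and 5 above, assuming data and target are the lists of preprocessed images and labels (the use of scikit-learn and the 80/20 split ratio are assumptions):

import numpy as np
from sklearn.model_selection import train_test_split

# Step 4: convert the image data and labels into NumPy arrays
data = np.array(data, dtype="float32") / 255.0   # pixel values scaled to the 0-1 range
target = np.array(target)

# Step 5: hold out a portion of the samples as an unseen test set
x_train, x_test, y_train, y_test = train_test_split(data, target,
                                                    test_size=0.2, random_state=42)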
ARTIFICIAL NEURAL NETWORKS(ANN):
Information Processing: Input data is initially fed into the input layer of the neural
network. Each neuron in the input layer processes a specific feature of the input
data. The processed information is then transmitted to neurons in the subsequent
hidden layers.
Hidden Layers: Hidden layers are intermediary layers between the input and
output layers. They perform complex transformations and computations on the
input data, extracting relevant features and patterns. The number of hidden layers
and the number of neurons in each layer can vary based on the complexity of the
task and the architecture of the network.
Output Layer: The output layer receives the processed information from the
hidden layers and produces the final output of the neural network. The output can
be in various forms, such as classification labels, numerical values, or probability
scores, depending on the nature of the task being performed.
ANN mimics the information processing capabilities of the human brain, allowing
it to learn from data, extract meaningful patterns, and make predictions or
classifications based on the learned knowledge.
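A minimal Keras sketch of such a fully connected network; the layer sizes are illustrative only.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

ann = Sequential([
    Dense(128, activation="relu", input_shape=(200 * 200,)),  # input layer over flattened pixels
    Dense(64, activation="relu"),                              # hidden layer
    Dense(27, activation="softmax"),                           # output layer: one score per class
])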
2. Run the Installer: Once the download is complete, run the installer executable
file. Follow the on-screen instructions to proceed with the installation.
4. Choose Installation Options: You may have the option to customize the
installation by selecting components and features you want to include. For most
users, the default installation options are sufficient.
5. Select Installation Location: Choose the directory where you want to install
Visual Studio Code. The default location is usually in the Program Files folder
on Windows.
6. Complete Installation: Once you have selected the installation options and
location, proceed with the installation. The installer will copy the necessary files
and set up VS Code on your system.
7. Launch VS Code: After the installation is complete, you can launch Visual
Studio Code from the Start menu (on Windows), the Applications folder (on
macOS), or by running the `code` command in a terminal (on Linux).
8. Optional: Install Extensions: Visual Studio Code supports extensions that add
functionality and language support. You can install extensions from the
Extensions view within VS Code by searching for the ones you need and clicking
Install.
Operating System: The code can be executed on various operating systems such
as Windows, macOS, or Linux.
Python: Python is the primary programming language used in the code. Ensure
Python is installed on your system. The code appears to be compatible with
Python 3.x.
Visual Studio Code: Install Visual Studio Code on your system. VS Code is a
lightweight and versatile code editor that supports various programming
languages and provides features for code debugging, version control, and
extensions.
Required Python Libraries: The code relies on several Python libraries such as
TensorFlow, Keras, OpenCV, Matplotlib, NumPy, and others. Ensure these
libraries are installed in your Python environment. You can install them using
pip, the Python package manager, by running `pip install <library-name>` in the
terminal.
1. Install Visual Studio Code (VS Code): If you have not already installed VS
Code, you can download it from the official website
(https://code.visualstudio.com/) and follow the installation instructions for your
operating system.
3.Open the Project Folder: Use the "File" menu in VS Code to open the folder
containing the Python script and related files for the project.
4. Set Up Python Environment: Make sure you have Python installed on your
system. You can check this by opening a terminal within VS Code and running
the command `python --version`. If Python is not installed, you can download and
install it from the official Python website (https://www.python.org/).
5. Install Required Python Packages: Open a terminal in VS Code and use pip to
install the required Python packages. You can do this by running the following
command:
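For example, a command covering the libraries listed in the software configuration (exact package names may vary between environments):

pip install tensorflow opencv-python matplotlib numpy gtts SpeechRecognition pyaudio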
6. Open the Python Script: In the Explorer pane of VS Code, navigate to the
Python script file (usually named something like `main.py` or `project.py`) that
you want to execute.
7. Run the Script: There are several ways to run the Python script in VS Code:
Press F5 to run the script in debug mode.
Use the "Run Python File in Terminal" option from the context menu (right-
click on the script file).
Open a terminal in VS Code and run the script manually using the `python`
command:
python script_name.py
9. Review Output: After the script has finished executing, review the output in
the terminal or any other output channels specified in the script.
# Function to accumulate the background
If background is None: initialise the background from the current frame
Else: update the running average of the background with the current frame

# Function to segment the hand region
If background is None: return nothing
If no contours found:
    Return None
Else:
    Return the thresholded difference image and the largest contour
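A minimal OpenCV sketch of how these two helpers are commonly written, assuming grayscale input frames, a running-average weight of 0.5 and a fixed threshold (all of these are assumptions rather than the project's exact values):

import cv2

background = None

def run_avg(gray_frame, accum_weight=0.5):
    # Accumulate a running weighted average of the background
    global background
    if background is None:
        background = gray_frame.copy().astype("float")
        return
    cv2.accumulateWeighted(gray_frame, background, accum_weight)

def segment_hand(gray_frame, threshold=25):
    # Difference the current frame against the background and threshold it
    diff = cv2.absdiff(background.astype("uint8"), gray_frame)
    _, thresholded = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(thresholded.copy(), cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if len(contours) == 0:
        return None                      # no hand found in the frame
    hand_contour = max(contours, key=cv2.contourArea)
    return thresholded, hand_contour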
# Create data generators for training and testing images
# Model creation
# Training: set epochs to 20
# Evaluation: get the next batch of images and labels from test_batches
# Plotting
# Save model
5. TESTING
5.1 TESTING
• UNIT TEST
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly, and that program inputs produce
valid outputs. All decision branches and internal code flow should be
validated. It is the testing of individual software units of the application
.it is done after the completion of an individual unit before integration.
This is a structural testing, that relies on knowledge of its construction and
is invasive. Unit tests perform basic tests at component level and test a
specific business process, application, and/or system configuration. Unit
tests ensure that each unique path of a business process performs
accurately to the documented specifications and contains clearly defined
inputs and expected results.
• INTEGRATION TEST
Integration tests are designed to test integrated software components to
determine if they run as one program. Testing is event driven and is more
concerned with the basic outcome of screens or fields. Integration tests
demonstrate that although the components were individually satisfactory, as
shown by successful unit testing, the combination of components is
correct and consistent. Integration testing is specifically aimed at
exposing the problems that arise from the combination of components.
• FUNCTIONAL TEST
Functional tests provide systematic demonstrations that functions tested
are available as specified by the business and technical requirements,
system documentation, and user manuals. Functional testing is centered
on the following items:
• SYSTEM TEST
System testing ensures that the entire integrated software system meets
requirements. It evaluates a configuration to ensure known and
predictable results. An example of system testing is the configuration-
oriented system integration test. System testing is based on process
descriptions and flows, emphasizing pre-driven process links and
integration points.
5.1.2 WHITE BOX TESTING
White Box Testing is a test in which the software tester has knowledge of the
inner workings, structure, and language of the software, or at least its purpose.
It is used to evaluate areas that cannot be reached from a black box level.
Black Box Testing is testing the software without any knowledge of the inner
workings, structure, or language of the module being tested. Black box tests, like
most other kinds of tests, must be written from a definitive source document,
such as a specification or requirements document. It is a test in which the
software under test is treated as a black box: you cannot “see” into it. The test
provides inputs and responds to outputs without considering how the software
works.
Unit testing is usually conducted as part of a combined code and unit test phase
of the software lifecycle, although it is common for coding and unit testing to
be conducted as two distinct phases.
• Field testing will be performed manually, and functional tests will be written
in detail.
Test objectives
Features to be evaluated
5.1.4.2 INTEGRATION TESTING
Test Results: All the test cases mentioned above passed successfully. No defects
encountered.
Test Case No.: 1
Test Case: Verifying clear and distinct sign input from the webcam
Input: Images containing clear and distinct ISL signs for each letter of the alphabet
Expected Output: The model should accurately classify each image corresponding to the gesture
Pass/Fail: Pass
Test Results: All the test cases mentioned above passed successfully. No defects
encountered.
6. RESULTS
Figure 6.2 Hand Gesture given as input
6.2 RESULT OUTPUTS
Figure 6.4 Live Prediction of the hand gesture
Figure 6.5 Conversion of hand gestures
7. CONCLUSION
The future scope of the project involving Artificial Neural Networks (ANNs) is vast
and promising, with numerous avenues for advancement and innovation. Potential
directions include exploring more complex neural network architectures such as deep
neural networks (DNNs) and recurrent neural networks (RNNs) to improve
performance in tasks like image classification, speech recognition, and natural language
processing. Additionally, integrating ANNs with emerging technologies like
augmented reality (AR) and virtual reality (VR) could lead to novel applications in
fields such as education, healthcare, and entertainment. Further enhancements could be
achieved through multimodal learning approaches, real-time interaction capabilities,
and a focus on accessibility and inclusivity for diverse user groups. Ethical
considerations and responsible AI deployment will also play a crucial role in shaping
the future development and deployment of ANN-based systems, ensuring fairness,
transparency, and accountability in their implementation. Overall, the future of the
project holds immense potential for addressing complex real-world challenges and
making meaningful contributions to various industries and domains.
2. LITERATURE REVIEW

Shravani K. et al. [1], “Indian Sign Language Character Recognition”, clarifies that a gesture is a pattern which may be static, dynamic or both, and is a form of non-verbal communication in which bodily motions convey information. Sign language is composed of visual gestures and signs, which are used by the deaf and mute for their talking. It is a well-structured code gesture where every sign has a specific meaning allotted to it. These signs are used not only for alphabets and numerals but also for common expressions, for example greetings and sentences. ISL uses both hands for gesture representation and is complex compared to ASL. For this reason, there is less research and development in this field.

Babita Sonare, Aditya Padgal, Yash Gaikwad and Aniket Patil [2], “Video-Based Sign Language Translation System Using Machine Learning”, describes the development of an interactive real-time video-based sign language translation system powered by efficient machine learning algorithms, developed for deaf and mute people who are not able to hear or speak and who find it difficult to communicate among themselves or with normal people. Gesture recognition and human activity recognition are both crucial for detecting sign language as well as the behavior of an individual. These components are rapidly growing domains, enabling higher automation in households as well as in industries.

Amrutha K and Prabu P [3], “ML Based Sign Language Recognition System”, base the development of the model on vision-based isolated hand gesture detection and recognition. The region-wise division of the sign language helps the users to have a facile method to convey information. As the larger population of society does not understand sign language, the speech- and hearing-impaired usually rely on a human translator. The availability and affordability of a human interpreter might not be possible all the time. The best substitute would be an automated translator system that can read and interpret sign language and convert it into an understandable form. This translator would reduce the communication gap that exists among people in society.

Aashir Hafeez, Suryansh Singh, Ujjwal Singh, Priyanshu Agarwal and Anant Kumar Jayswal [4], “Sign Language Recognition System Using Deep-Learning for Deaf and Dumb”, states that the majority of deaf persons utilise sign language as their primary means of communication. They are different from us in that we are unable to understand their sign language, thus in order to interact with them a device called a sign language recognition system was developed. This study compares different machine learning techniques using the dataset for American Sign Language and goes over the many stages of an automated system for recognising sign language (SLR).

3. EXISTING SYSTEM

Sign language translation is one of the challenging topics as it is in a rudimentary stage of its development, unlike other sign languages. The project has shown the classification of sign languages using machine learning models. There are very limited standard data sets, and those that exist have variations and noise. This leads to occlusion of features and is a major barrier to development in this field. The existing project aims at helping further research in this field by providing a data set of sign language translation. A data set of sign language was created by us for alphabets and numerals. Later, the features will be extracted from the collected segmented data using image pre-processing and a Bag of Words model.

4. PROPOSED SYSTEM

Communication is an important aspect when it comes to sharing or expressing information and feelings, and it brings people closer to each other with better understanding. Sign language, a full-fledged natural language that conveys meaning through gestures, is the chief means of communication among the deaf and mute. The sign language used in America is different from the Indian Sign Language of India. Looking to the ease of understanding Indian Sign Language, we standardized our work on Indian Sign Language gestures. We need to convert Indian Sign Language so that it is understood by others and also help its users communicate without any barriers.

Sign language recognition is still a challenging problem in spite of many research efforts during the last many years. One of the methods of hand gesture recognition is to use hand gloves for human-computer interaction. But this method is cumbersome, as it requires the user to wear a glove and carry a load of cables connecting the device to a computer. Therefore, to eliminate this complication and to make user interaction with the computer easy and natural, we proposed to work on sign recognition using bare hands, i.e., without any external wearable hardware. Sign language recognition processes also depend highly on human-based translation services, and the involvement of human expertise is difficult and expensive. Our proposed automatic sign language recognition system leads to understanding the meaning of different signs without any aid from an expert.

In common, any sign language recognition system contains several modules like object tracking, skin segmentation, feature extraction, and recognition. The first two modules are basically used to extract and locate hands in the video frames, and the next modules are used for feature extraction, classification and recognition of the gesture. For an image-based gesture recognition system, where the image space variables are very large, it is crucial to extract the essential features of the image. In our project we basically focus on producing a model which can recognise fingerspelling-based hand gestures in order to form a complete word by combining each gesture.

A language translator is extensively utilized by mute people for converting and giving shape to their thoughts, and a system is in urgent need of recognizing and translating sign language. The lack of an efficient gesture detection system designed specifically for the differently abled motivates us as a team to do something great in this field. The proposed work aims at converting such sign gestures into speech that can be understood by normal people. The entire model pipeline is developed with a CNN architecture for the classification of 26 alphabets and one extra class for the null character. Our model is capable of predicting gestures from sign language in real time with high efficiency. These predicted alphabets are combined to form words and hence sentences, and these sentences are converted into voice modules by incorporating Google Text to Speech (gTTS API). This system can therefore be used in real-time applications which aim at bridging the gap in the process of communication between deaf and mute people and the rest of the world.

Data Acquisition:
We tried to obtain our own dataset, but due to the lack of resources we opted to perform our pre-processing method directly on the existing dataset.

Pre-processing:
Training the model requires a very large amount of data in order for it to work effectively. Since we have a limited number of images in our dataset, we augmented our images in order to increase the dataset, making minor alterations such as flips, shifts or rotations. Data augmentation can also help in reducing the chances of overfitting the model. Here we resized and rescaled our images to treat all images in the same manner.

ALGORITHMS

Artificial Neural Network (ANN):
An Artificial Neural Network is a connection of neurons, replicating the structure of the human brain. Each connection of neurons transfers information to another neuron. Inputs are fed into the first layer of neurons, which processes them and transfers them to further layers of neurons called hidden layers. After the information is processed through multiple hidden layers, it is passed to the final output layer.

CNN Model:
1st Convolution Layer: The input picture has a resolution of 200x200 pixels. It is first processed in the first convolutional layer using 64 filter weights.
1st Pooling Layer: The pictures are downsampled using max pooling of 3x3, i.e., we keep the highest value in each 3x3 square of the array. Therefore, our picture is downsampled.
2nd Convolution Layer: The output of the first pooling layer is served as input to the second convolutional layer, where it is processed using 128 filter weights (2x2 pixels each).
2nd Pooling Layer: The resulting images are downsampled again using max pooling of 3x3 and reduced to an even lower resolution.
3rd Convolution Layer: A convolutional layer using 256 filter weights (2x2 pixels each).
3rd Pooling Layer: The resulting images are downsampled again using max pooling of 3x3 and reduced to an even lower resolution.
Flatten Layer: It is used to convert the 2D pixel array into linear form in order to feed it to the classification over the 27 classes of hand signs.
Final Layer: The output of the third densely connected layer serves as input for the final layer, which has as many neurons as the number of classes we are classifying (alphabets + blank symbol).
Activation Function: We have used ReLU (Rectified Linear Unit) in each of the layers (convolutional as well as fully connected neurons). ReLU calculates max(x, 0) for each input pixel. This adds nonlinearity to the formula and helps the network learn more complicated features. It helps in removing the vanishing gradient problem and speeds up training by reducing the computation time. For the last activation function, we used the SOFTMAX function. It is used as the activation function in the output layer of neural network models that predict a multinomial probability distribution; that is, SoftMax is used for multi-class classification problems where class membership is required over more than two class labels.
Pooling Layer: We apply max pooling to the input image with a pool size of (3, 3) with a ReLU activation function. This reduces the number of parameters, thus lessening the computation cost and reducing overfitting.
Dropout Layers: These address the problem of overfitting, where after training the weights of the network are so tuned to the training examples that the network does not perform well when given new examples. A dropout layer “drops out” a random set of activations in that layer by setting them to zero. The network should be able to provide the right classification or output for a specific example even if some of the activations are dropped out.
Optimizer: We have used the Adam optimizer for updating the model in response to the output of the loss function. Adam combines the advantages of two extensions of stochastic gradient descent, namely the adaptive gradient algorithm (AdaGrad) and root mean square propagation (RMSProp).

TensorFlow:
TensorFlow is an open-source software library for numerical computation. First, we define the nodes of the computation graph, then inside a session the actual computation takes place. TensorFlow is widely used in Machine Learning.

Keras:
Keras is a high-level neural networks library written in Python that works as a wrapper to TensorFlow. It is used in cases where we want to quickly build and test a neural network with minimal lines of code. It contains implementations of commonly used neural network elements like layers, objectives, activation functions and optimizers, along with tools to make working with image and text data easier.

OpenCV:
OpenCV (Open-Source Computer Vision) is an open-source library of programming functions used for real-time computer vision. It is mainly used for image processing, video capture and analysis, for features like face and object recognition. It is written in C++, which is its primary interface; however, bindings are available for Python, Java and MATLAB/OCTAVE.

Training and Testing:
We convert our input images (RGB) into grayscale and apply Gaussian blur to remove unnecessary noise. We feed the input images, after pre-processing, to our model for training and testing after applying all the operations mentioned above. The prediction layer estimates how likely the image is to fall under one of the classes.

Data Set:
The system trained CNNs for the classification of numbers, alphabets and other daily used words using 17,113 images. Our method provides 96% accuracy for the 27 letters of the alphabet. The results also show that increasing the number of images in the dataset (which can also be pre-processed images) increases the accuracy of the system.

Data Preprocessing:
Background Subtraction: If applicable, this technique removes background elements from the image, isolating the primary object of interest.
Grayscale Conversion: Images are converted to grayscale to simplify processing.

5. CONCLUSION

Communication between deaf-mute and normal people has always been a challenging task. The goal of our project is to reduce the barrier between them, and we have made our effort by contributing to the field of sign language recognition. In this project, we developed a CNN-based human hand gesture recognition system. The salient feature of our system is that there is no need to build a model for every gesture using hand features such as fingertips and contours. Here we have constructed a CNN classifier which is capable of recognizing sign language gestures. The proposed system has shown satisfactory results on the transitive gestures. In this report, a functional real-time vision-based sign language recognition system for deaf and mute people has been developed. We achieved a final accuracy of 98.0% on our dataset. We were able to improve our prediction after implementing two layers of algorithms, and we also verified our results for similar-looking gestures which were more prone to misclassification. In this way we are able to detect almost all the symbols, provided that they are shown properly, there is no noise in the background and the lighting is adequate.

REFERENCES

[1] Tiku, K., Maloo, J., Ramesh, A., & R, I., “Real-time Conversion of Sign Language to Text and Speech,” 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), 2020.
[2] S. Y. Heera et al., “Talking Hands – An Indian Sign Language to Speech Translating Gloves,” International Conference on Innovative Mechanisms for Industry Applications (ICIMIA 2017), 2017.
[3] Hunter Phillips, Steven Lasch & Mahesh Maddumala, “American Sign Language Translation Using Transfer Learning”.
[4] M. Rajmohan, C. Srinivasan, Orsu Ranga Babu, Subbiah Murugan, Badam Sai Kumar Reddy, “Efficient Indian Sign Language Interpreter For Hearing Impaired”.
[5] Mahmudul Haque, Syma Afsha, Tareque Bashar Ovi, Hussain Nyeem, “Improving Automatic Sign Language Translation with Image Binarisation and Deep Learning”.
[6] K. Bhanu Prathap, G. Divya Swaroop, B. Praveen Kumar, Vipin Kamble, Mayur Parate, “ISLR: Indian Sign Language Recognition”.
[7] Pavleen Kaur, Payel Ganguly, Saumya Verma, Neha Bansal, “Bridging the Communication Gap: With Real Time Sign Language Translation”.
[8] Hao Zhou, Wengang Zhou, Weizhen Qi, Junfu Pu, Houqiang Li, “Improving Sign Language Translation with Monolingual Data by Sign Back-Translation”.
[9] Wanbo Li, Hang Pu, Ruijuan Wang, “Sign Language Recognition Based on Computer Vision”.
[10] Neeraj Kumar Pandey, Aakanchha Dwivedi, Mukul Sharma, Arpit Bansal, Amit Kumar Mishra, “An Improved Sign Language Translation approach using KNN in Deep Learning Environment”.
[11] R Vijaya Prakash, Akshay R, A Ashwitha Reddy, R Harshitha, K Himansee, S.K Abdul Sattar, “Sign Language Recognition Using CNN”.
[12] Sakshi Sharma, Sukhwinder Singh, “Vision-based sign language recognition system: A Comprehensive Review”.