CONVERSION OF SIGN LANGUAGE INTO SPEECH OR TEXT USING CNN
By
Jebakani C. (38110215)
Rishitha S.P. (38110461)
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC
JEPPIAAR NAGAR, RAJIV GANDHI SALAI,
CHENNAI - 600 119
MARCH-2022
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of Jebakani C (REG
NO: 38110215) and Rishitha S.P. (REG NO: 38110461), who as a team carried out
the project entitled "CONVERSION OF SIGN LANGUAGE INTO SPEECH OR TEXT
USING CNN" under my supervision from November 2021 to April 2022.
Internal Guide
Ms. AISHWARYA R M.E.,
We Jebakani C (REG NO: 38110215) and Rishitha S.P. (REG NO: 38110461)
DATE:
I convey my thanks to Dr. T. Sasikala M.E., Ph.D., Dean, School of Computing, and to
Dr. L. Lakshmanan M.E., Ph.D. and Dr. S. Vigneshwari M.E., Ph.D., Heads of the
Department of Computer Science and Engineering, for providing me the necessary
support and details at the right time during the progressive reviews.
LIST OF FIGURES
TABLE OF CONTENTS
2 LITERATURE SURVEY
3 METHODOLOGY AND IMPLEMENTATION
3.1 Training Module
3.1.1 Pre-Processing
3.2 Algorithm
3.3 Segmentation
3.4 Convolution Neural Networks
3.10.1 Precision
3.10.2 Recall
3.10.3 Support
3.10.4 F1 Score
5 CONCLUSION AND FUTURE WORK
7 APPENDIX
a) Sample code
b) Screenshots
CHAPTER 1
INTRODUCTION
• image analysis.
There are two types of methods used for image processing, namely
analogue and digital image processing. Analogue image processing
can be used for hard copies like printouts and photographs. Image
analysts use various fundamentals of interpretation while using these
visual techniques. Digital image processing techniques help in the
manipulation of digital images by using computers. The three
general phases that all types of data have to undergo while using the
digital technique are pre-processing, enhancement and display, and
information extraction.
Fig 1.1: Phases of pattern recognition
The first phase includes image segmentation and object separation.
In this phase, different objects are detected and separated from the
background. The second phase is feature extraction. In this phase,
objects are measured: the measurement step quantitatively estimates
some important features of the objects, and a group of these features
is combined to make up a feature vector. The third phase is
classification. In this phase, the output is a decision that determines
which category every object belongs to. Therefore, in pattern
recognition the inputs are images and the outputs are object types
and a structural analysis of the images. The structural analysis is a
description of the images that allows the important information in
them to be correctly understood and judged.
Sign language is a language that includes gestures made with the
hands and other body parts, including facial expressions and postures
of the body. It is used primarily by people who are deaf and dumb.
There are many different sign languages, such as British, Indian and
American sign languages. British Sign Language (BSL) is not easily
intelligible to users of American Sign Language (ASL), and vice versa.
A functioning sign language recognition system could give deaf
people a chance to communicate with non-signing people without the
need for an interpreter. It could be used to generate speech or text,
making the deaf more independent. Unfortunately, no system with
these capabilities exists so far. In this project our aim is to develop a
system which can classify sign language accurately.
American Sign Language (ASL) is a complete, natural language that
has the same linguistic properties as spoken languages, with grammar
that differs from English. ASL is expressed by movements of the
hands and face. It is the primary language of many North Americans
who are deaf and hard of hearing, and is used by many hearing
people as well.
The process of converting the signs and gestures shown by the user
into text is called sign language recognition. It bridges the
communication gap between people who cannot speak and the
general public. Image processing algorithms along with neural
networks are used to map the gestures to the appropriate text in the
training data, and hence raw images/videos are converted into the
corresponding text that can be read and understood.
or the deaf community. The importance of sign language is
emphasized by the growing public approval of, and funding for,
international projects. In this age of technology, the demand for a
computer-based system is very high for the dumb community.
Researchers have been attacking the problem for quite some time
now and the results are showing some promise. Interesting
technologies are being developed for speech recognition, but no real
commercial product for sign recognition is actually there in the current
market. The idea is to make computers understand human language
and to develop user-friendly human computer interfaces (HCI).
Making a computer understand speech, facial expressions and human
gestures are some steps towards it. Gestures are non-verbally
exchanged information. A person can perform innumerable gestures
at a time. Since human gestures are perceived through vision, they
are a subject of great interest for computer vision researchers. The
project aims to determine human gestures by creating an HCI. Coding
these gestures into machine language demands a complex
programming algorithm. In our project we are focusing on image
processing and template matching for better output generation.
1.4 MOTIVATION
The 2011 Indian census cites roughly 1.3 million people with
"hearing impairment". In contrast, numbers from India's National
Association of the Deaf estimate that 18 million people, roughly 1
per cent of the Indian population, are deaf. These statistics formed the
motivation for our project. As these speech-impaired and deaf
people need a proper channel to communicate with normal people,
there is a need for such a system. Not all normal people can understand
the sign language of impaired people. Our project is hence aimed at
converting sign language gestures into text that is readable for
normal people.
Normal people face difficulty in understanding their language. Hence
there is a need for a system which recognizes the different signs and
gestures and conveys the information to normal people. It bridges
the gap between physically challenged people and normal people.
Part 1: The various technologies that are studied are introduced, and
the problem statement is stated along with the motivation for our
project.
Part 2: The Literature survey is put forth which explains the various
other works and their technologies that are used for Sign Language
Recognition.
Part 5: Provides the experimental analysis, the code involved and the results
obtained.
Part 6: Concludes the project and provides the scope to which the
project can be extended.
CHAPTER 2
LITERATURE SURVEY
The domain analysis that we did for the project mainly involved
understanding neural networks and the supporting libraries described below.
2.1.1 TensorFlow:
2.1.2 OpenCV:
then Itseez (which was later acquired by Intel[2]). The library is cross-
platform and free for use under the open-source BSD license.
Boosting
Decision tree learning
Gradient boosting trees
Expectation-maximization algorithm
k-nearest neighbor algorithm
Naive Bayes classifier
Artificial neural networks
Random forest
Support vector machine (SVM)
Deep neural networks (DNN)
AForge.NET, a computer vision library for the Common Language
Runtime (.NET Framework and Mono).
Integrating Vision Toolkit (IVT), a fast and easy-to-use C++ library with
an optional interface to OpenCV.
software packages
OpenCV Functionality
Image/video I/O, processing, display (core, imgproc, highgui)
Object/feature detection (objdetect, features2d, nonfree)
Geometry-based monocular or stereo computer vision
(calib3d, stitching, videostab)
Computational photography (photo, video, superres)
Machine learning & clustering (ml, flann)
CUDA acceleration (gpu)
Image Processing:
Digital Image:
Output in which the result can be an altered image or a report that is based
on image analysis.
Robotics Application
Localization − Determine robot location automatically
Navigation
Obstacles avoidance
Assembly (peg-in-hole, welding, painting)
Manipulation (e.g. PUMA robot manipulator)
Human Robot Interaction (HRI) − Intelligent robotics to interact
with and serve people
Medicine Application
Classification and detection (e.g. lesion or cells classification
and tumor detection)
2D/3D segmentation
3D human organ reconstruction (MRI or ultrasound)
Vision-guided robotics surgery
Industrial Automation Application
Industrial inspection (defect detection)
Assembly
Barcode and package label reading
Object sorting
Document understanding (e.g. OCR)
Security Application
Biometrics (iris, finger print, face recognition)
Surveillance − Detecting certain suspicious activities or behaviors
Transportation Application
Autonomous vehicle
Safety, e.g., driver vigilance monitoring
2.1.3 Keras:
coding necessary for writing deep neural network code. The code is
hosted on GitHub, and community support forums include the GitHub
issues page, and a Slack channel.
extraction and fine tuning. This section explains Keras
applications in detail.
Pre-trained models
ResNet
VGG16
MobileNet
InceptionResNetV2
InceptionV3
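As a rough illustration of how such a pre-trained model could be loaded through the Keras applications module for feature extraction, the following sketch uses MobileNet; the choice of MobileNet, the ImageNet weights and the 224x224 input size are assumptions for illustration, not details of our implementation.

# Hedged sketch: loading a pre-trained MobileNet as a frozen feature extractor.
# MobileNet, the ImageNet weights and the 224x224 input size are illustrative choices.
import tensorflow as tf

base = tf.keras.applications.MobileNet(
    weights="imagenet",        # reuse ImageNet weights
    include_top=False,         # drop the original classifier head
    input_shape=(224, 224, 3))
base.trainable = False         # freeze the base for pure feature extraction

features = base(tf.zeros((1, 224, 224, 3)))   # example forward pass on a dummy image
print(features.shape)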
2.1.4 NumPy:
code, mostly inner loops using NumPy.
this limitation.
into an activation function that may be nonlinear.
Areas of Application
Speech Recognition
Great progress has been made in this field; however, such systems
still face the problem of limited vocabulary or grammar, along with the
issue of retraining the system for different speakers in different
conditions. ANNs are playing a major role in this area. The following
ANNs have been used for speech recognition:
Multilayer networks
Multilayer networks with recurrent connections
Character Recognition
For this application, the first approach is to extract the features, or rather
the geometrical feature set, representing the signature. With these
feature sets, we have to train the neural networks using an efficient
neural network algorithm. This trained neural network will classify the
signature as genuine or forged during the verification stage.
output) qualifies as "deep" learning. So deep is not just a buzzword to
make algorithms seem like they read Sartre and listen to bands you
haven't heard of yet. It is a strictly defined term that means more than
one hidden layer.
For example, deep learning can take a million images, and cluster
them according to their similarities: cats in one corner, ice breakers in
another, and in a third all the photos of your grandmother. This is the
basis of so-called smart photo albums.
of small data science teams, which by their nature do not scale.
To solve this problem the computer looks for characteristics at the
base level. In human understanding such characteristics are, for example,
the trunk or large ears. For the computer, these characteristics are
boundaries or curvatures. Then, through groups of convolutional
layers, the computer constructs more abstract concepts. In more detail:
the image is passed through a series of convolutional, nonlinear,
pooling and fully connected layers, and then the output is generated.
Fig 2.1: Layers involved in CNN
2.1.6 EXISTING SYSTEM
analyze and compare the methods employed in SLR systems and the
classification methods that have been used, and suggests the most
reliable method for future research. Due to recent advancements in
classification methods, many of the recently proposed works mainly
contribute to the classification method, such as hybrid methods and
Deep Learning. Based on our review, HMM-based approaches have
been explored extensively in prior research, including their
modifications. Hybrid CNN-HMM and fully Deep Learning approaches
have shown promising results and offer opportunities for further
exploration.
In this paper we proposed some methods through which the
recognition of the signs becomes easy for people while
communicating, and the result of those sign symbols will be
converted into text. In this project, we capture hand gestures
through a webcam and convert the image into a grayscale image. The
segmentation of the grayscale image of a hand gesture is performed
using the Otsu thresholding algorithm. The total image is divided into two
classes: one is the hand and the other is the background. The optimal
threshold value is determined by computing the ratio between the class
variance and the total class variance. To find the boundary of the hand
gesture in the image, the Canny edge detection technique is used. In
Canny edge detection we used edge-based segmentation and threshold-based
segmentation. Otsu's algorithm is used because of its simple
calculation and stability. This algorithm fails when the global
distribution of the target and background vary widely.
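As a rough illustration of the pre-processing described above (grayscale conversion, Otsu thresholding and Canny edge detection), the OpenCV sketch below performs the same steps; the file name and the Canny thresholds are placeholder assumptions, not values taken from the cited work.

# Illustrative sketch of the described pipeline: grayscale -> Otsu -> Canny.
# 'hand.jpg' and the Canny thresholds (100, 200) are assumed placeholders.
import cv2

img = cv2.imread('hand.jpg')                     # captured hand-gesture frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)     # convert to grayscale

# Otsu's method picks the threshold that best separates the two classes
# (hand and background) based on between-class variance.
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Canny edge detection traces the boundary of the hand gesture.
edges = cv2.Canny(gray, 100, 200)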
the spoken language dynamically and can make the communication
between people with hearing impairment and normal people both
effective and efficient. The system we are implementing is for binary
sign language, but it can detect any sign language with prior image
processing.
One of the major drawbacks of our society is the barrier that is created
between disabled or handicapped persons and normal persons.
Communication is the only medium by which we can share our
thoughts or convey a message, but a person with a disability (deaf
and dumb) faces difficulty in communicating with normal persons. For
many deaf and dumb people, sign language is the basic means of
communication. Sign language recognition (SLR) aims to interpret sign
languages automatically by a computer in order to help the deaf
communicate with hearing society conveniently. Our aim is to design a
system to help the person who trains the hearing impaired to
communicate with the rest of the world using sign language or hand
gesture recognition techniques. In this system, feature detection and
feature extraction of hand gestures are done with the help of the SURF
algorithm using image processing. All this work is done using
MATLAB software. With the help of this algorithm, a person can easily
train a deaf and dumb person.
language. The application acquires image data using the webcam of
the computer, then it is preprocessed using a combinational algorithm
and recognition is done using template matching. The translation in
the form of text is then converted to audio. The database used for this
system includes 6000 images of English alphabets. We used 4800
images for training and 1200 images for testing. The system produces
88% accuracy.
represents some message or data. Gestures are a necessity for the
hearing and speech impaired; they convey their message to
others with the help of gestures alone. A gesture recognition
system is the ability of the computer interface to capture, track and
recognize gestures and produce output based on the captured signals.
It enables users to interact with machines (HMI) without any
need for mechanical devices. There are two kinds of sign recognition
methods: image-based and sensor-based strategies. An image-based
approach is utilized in this project, which deals with sign language
gestures to identify and track the signs and convert them into the
corresponding speech and text.
Fig 2.2: Architecture of the Sign Language Recognition System
CHAPTER 3
METHODOLOGY
1. model construction
2. model training
3. model testing
4. model evaluation
Before model training it is important to scale the data for their further use.
Model training:
After model construction it is time for model training. In this
phase, the model is trained using training data and the expected output for
this data. It looks this way: model.fit(training_data, expected_output).
Progress is visible on the console when the script runs. At the end it
will report the final accuracy of the model.
Model Testing:
During this phase a second set of data is loaded. This data set
has never been seen by the model and therefore its true accuracy will
be verified. After the model training is complete, and it is understood
that the model shows the right result, it can be used for model
evaluation. This means that the model can be used to evaluate new
data.
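The four phases above can be summarized in a short Keras sketch; the layer sizes, number of classes, epochs and optimizer below are illustrative assumptions, not the exact configuration of our model.

# Minimal sketch of the four phases: construction, training, testing, evaluation.
# Shapes, class count, epochs and the optimizer are illustrative assumptions.
import numpy as np
import tensorflow as tf

# 1. Model construction
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 1)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(26, activation='softmax')])    # e.g. 26 gesture classes
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# 2. Model training: scale the data, then fit on training data and expected output.
x_train = np.random.rand(100, 64, 64, 1)   # stand-in, already scaled to [0, 1]
y_train = np.random.randint(0, 26, 100)    # stand-in integer labels
model.fit(x_train, y_train, epochs=5)

# 3-4. Testing on data the model has never seen, then evaluating new inputs.
x_test = np.random.rand(20, 64, 64, 1)
y_test = np.random.randint(0, 26, 20)
model.evaluate(x_test, y_test)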
500px.
• Don't scale up the longer side; this can make your image blurry.
Image scaling:
• In computer graphics and digital imaging, image scaling refers to the
resizing of a digital image. In video technology, the magnification of digital
material is known as upscaling or resolution enhancement.
• When scaling a vector graphic image, the graphic primitives that make up
the image can be scaled using geometric transformations with no loss of
image quality. When scaling a raster graphics image, a new image with
a higher or lower number of pixels must be generated. In the case of
decreasing the pixel number (scaling down) this usually results in
a visible quality loss. From the standpoint of digital signal processing,
the scaling of raster graphics is a two-dimensional example of
sample-rate conversion, the conversion of a discrete signal from one
sampling rate (in this case the local sampling rate) to another.
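A small OpenCV sketch of the down-scaling described here follows; the 500 px target, the file name and the INTER_AREA interpolation are assumed choices for illustration.

# Scale the longer side of an image down to roughly 500 px while keeping the
# aspect ratio. The 500 px target and INTER_AREA interpolation are assumptions.
import cv2

img = cv2.imread('sample.jpg')          # placeholder file name
h, w = img.shape[:2]
scale = 500.0 / max(h, w)               # only scale down, never up (avoids blur)
if scale < 1.0:
    img = cv2.resize(img, (int(w * scale), int(h * scale)),
                     interpolation=cv2.INTER_AREA)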
Fig 3.1: Sample dataset from the train set
3.2 ALGORITHM
HISTOGRAM CALCULATION:
Histograms are collected counts of data organized into a set of predefined bins.
What happens if we want to count this data in an organized way?
Since we know that the range of information values for this case is 256
values, we can segment our range into subparts (called bins) like:

[0, 255] = [0, 15] ∪ [16, 31] ∪ ... ∪ [240, 255]
range = bin_1 ∪ bin_2 ∪ ... ∪ bin_n    (n = 16 bins, each 16 values wide)

and we can keep count of the number of pixels that fall in the range of each
bin_i.
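In OpenCV the same count can be obtained with calcHist, using 16 bins of width 16 over the [0, 255] range as described above; the input file name is a placeholder.

# Counting pixel intensities in 16 bins ([0,15], [16,31], ..., [240,255]).
# 'hand.jpg' is an assumed placeholder image.
import cv2

gray = cv2.imread('hand.jpg', cv2.IMREAD_GRAYSCALE)
# arguments: images, channels, mask, histSize (number of bins), ranges
hist = cv2.calcHist([gray], [0], None, [16], [0, 256])
print(hist.ravel())   # number of pixels falling into each bin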
Note. The block before the Target block must use the activation function
Softmax.
3.3 SEGMENTATION
Image segmentation is the process of partitioning a digital image into
multiple segments (sets of pixels, also known as image objects). The
goal of segmentation is to simplify and/or change the representation of
an image into something that is more meaningful and easier to
analyse. Modern image segmentation techniques are powered by deep
learning technology, and several deep learning architectures are
used for segmentation.
Why does Image Segmentation even matter?
picture) and outputting its class or probability that the input is a
particular class. Neural networks are applied in the following steps:
1) One hot encode the data: A one-hot encoding can be applied to the
integer representation. This is where the integer encoded variable is
removed and a new binary variable is added for each unique integer
value.
2) Define the model: A model, said in a very simplified form, is nothing
but a function that is used to take in certain input, perform certain
operations on the given input to the best of its ability (learning and then
predicting/classifying) and produce the suitable output.
3) Compile the model: The optimizer controls the learning rate. We will
be using 'adam' as our optimizer. Adam is generally a good optimizer
to use for many cases. The Adam optimizer adjusts the learning rate
throughout training. The learning rate determines how fast the optimal
weights for the model are calculated. A smaller learning rate may lead
to more accurate weights (up to a certain point), but the time it takes to
compute the weights will be longer. A short sketch of these three steps
is given below.
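The sketch below illustrates the three steps just listed: one-hot encoding the integer labels, defining a model and compiling it with the Adam optimizer. The class count and layer sizes are assumptions for illustration.

# Sketch of steps 1-3: one-hot encoding, defining a model, compiling with Adam.
# The 3 classes, the 100-feature input and the layer sizes are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras.utils import to_categorical

labels = np.array([0, 2, 1, 2])                    # integer-encoded example labels
one_hot = to_categorical(labels, num_classes=3)    # step 1: one hot encode the data

model = tf.keras.Sequential([                      # step 2: define the model
    tf.keras.layers.Dense(64, activation='relu', input_shape=(100,)),
    tf.keras.layers.Dense(3, activation='softmax')])

model.compile(optimizer='adam',                    # step 3: compile with Adam, which
              loss='categorical_crossentropy',     # adapts the learning rate during
              metrics=['accuracy'])                # training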
Here are the three elements that enter into the convolution operation:
• Input image
• Feature detector
• Feature map
Steps to apply convolution layer:
• You place it over the input image beginning from the top-left
corner within the borders you see demarcated above, and then you
count the number of cells in which the feature detector matches the
input image.
• The number of matching cells is then inserted in the top-left
cell of the feature map
• You then move the feature detector one cell to the right and
do the same thing. This movement is called a stride, and since we are
moving the feature detector one cell at a time, that would be called a
stride of one pixel.
• What you will find in this example is that the feature detector's
middle-left cell with the number 1 inside it matches the cell that it is
standing over inside the input image. That's the only matching cell, and
so you write "1" in the next cell in the feature map, and so on and so
forth.
• After you have gone through the whole first row, you can then
move it over to the next row and go through the same process.
There are several benefits that we gain from deriving a feature map.
The most important of them is reducing the size of the input
image; you should know that the larger your strides (the
movements across pixels), the smaller your feature map.
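The cell-counting procedure described above can be written out directly; the 5x5 binary input image and 3x3 feature detector below are made-up patterns used only to show how the feature map is filled with a stride of one pixel.

# Slide a 3x3 feature detector over a binary image with a stride of one pixel
# and count matching 1-cells at each position, producing the feature map.
# The input image and detector values are made-up examples.
import numpy as np

image = np.array([[0, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
detector = np.array([[0, 1, 0],
                     [1, 1, 0],
                     [0, 1, 1]])

feature_map = np.zeros((3, 3), dtype=int)
for i in range(3):                     # move down one row after finishing a row
    for j in range(3):                 # stride of one pixel to the right
        window = image[i:i+3, j:j+3]
        # count cells where the detector's 1s line up with the image's 1s
        feature_map[i, j] = int(np.sum((window == 1) & (detector == 1)))
print(feature_map)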
Relu Layer:
A rectified linear unit is used to clamp the parameters to non-negative
values. We get negative pixel values too; in this layer we set them to 0.
The purpose of applying the rectifier function is to increase the
non-linearity in our images. The reason we want to do that is that
images are naturally non-linear. The rectifier serves to break up the
linearity even further in order to make up for the linearity that we might
impose on an image when we put it through the convolution operation.
What the rectifier function does to an image like this is remove all the
black elements from it, keeping only those carrying a positive value
(the grey and white colors). The essential difference between the
non-rectified version of the image and the rectified one is the
progression of colors. After we rectify the image, you will find the
colors changing more abruptly. The gradual change is no longer there.
That indicates that the linearity has been disposed of.
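In code the rectifier is simply max(0, x); the small array below uses made-up values to show negative entries being clipped to zero.

# The rectifier sets every negative value to 0 and leaves positive values unchanged.
import numpy as np

feature_map = np.array([[-3.0, 2.0], [0.5, -1.5]])   # made-up values
rectified = np.maximum(0, feature_map)
print(rectified)    # [[0.  2. ]
                    #  [0.5 0. ]]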
Pooling Layer:
The pooling (POOL) layer reduces the height and width of the input. It
helps reduce computation, and it helps make feature detectors more
invariant to their position in the input. This process is what provides
the convolutional neural network with its "spatial variance" capability.
In addition to that, pooling serves to minimize the size of the images as
well as the number of parameters, which, in turn, prevents an issue of
"overfitting" from coming up. Overfitting, in a nutshell, is when you create
an excessively complex model in order to account for the
idiosyncrasies we just mentioned. The result of using a pooling layer
and creating down-sampled or pooled feature maps is a summarized
version of the features detected in the input. They are useful because small
changes in the location of a feature in the input detected by the
convolutional layer will result in a pooled feature map with the
feature in the same location. This capability added by pooling is called
the model's invariance to local translation.
on weight values. For now, all you need to know is that the loss
function informs us of how accurate our network is, which we then use
in optimizing our network in order to increase its effectiveness. That
requires certain things to be altered in our network. These include the
weights (the blue lines connecting the neurons, which are basically the
synapses) and the feature detector, since the network often turns out
to be looking for the wrong features and has to be reviewed multiple
times for the sake of optimization. This full connection process
practically works as follows:
• The neuron in the fully-connected layer detects a certain feature; say, a
nose.
• It preserves its value.
• It communicates this value to the classes of the trained images.
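Putting the convolution, ReLU, pooling and fully connected stages together, a stack like the one described in this section could be written as the sketch below; the filter counts, kernel sizes, 64x64 input and 26 output classes are illustrative assumptions rather than our exact architecture.

# Illustrative CNN stack mirroring the stages described above:
# convolution + ReLU -> pooling -> flatten -> fully connected -> softmax classes.
# Filter counts, kernel sizes, the 64x64 input and 26 classes are assumptions.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                           input_shape=(64, 64, 1)),   # feature detectors + rectifier
    tf.keras.layers.MaxPool2D((2, 2)),                 # pooled feature maps
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),     # fully connected layer
    tf.keras.layers.Dense(26, activation='softmax')])  # one output per sign class
model.summary()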
3.5 TESTING
Testing Objectives:
There are several rules that can serve as testing objectives; they are:
Testing is a process of executing a program with the intent of finding an
error.
A good test case is one that has a high probability of finding an
undiscovered error.
Types of Testing:
In order to make sure that the system does not have errors,
the different levels of testing strategies that are applied at different
phases of software development are :
Unit Testing:
Unit testing is done on individual modules as they are
completed and become executable. It is confined only to the
designer's requirements. Unit testing is different from, and should be
preceded by, other techniques, including:
Informal Debugging
Code Inspection
It has been used to generate the test cases in the following cases:
Guarantee that all independent paths have been executed
Execute all loops at their boundaries and within their operational
bounds.
Execute internal data structures to ensure their validity.
Integration Testing
Integration testing ensures that software and subsystems
work together as a whole. It tests the interfaces of all the modules to make
sure that the modules behave properly when integrated together. It is
typically performed by developers, especially at the lower, module-to-module
level. Testers become involved at higher levels.
System Testing
Inclusion of changes/fixes.
Test data to use
Acceptance Testing
User Acceptance Test (UAT)
Requirements traceability:
Test case 1: Loading the trained model. Action: initialize the model and load it.
Expected output: model loaded without errors. Result: pass.
Test case 2: Converting video to frames. Action: capture the video stream and
convert it into frames. Expected output: image frames of the captured video
stream. Result: pass.
3.6 DESIGN
Dataflow Diagram
The DFD is also known as a bubble chart. It is a simple graphical
formalism that can be used to represent a system in terms of the input
data to the system, the various processing carried out on these data, and
the output data generated by the system. It maps out the flow of
information for any process or system: how data is processed in terms
of inputs and outputs. It uses defined symbols like rectangles, circles
and arrows to show data inputs, outputs, storage points and the routes
between each destination. They can be used to analyse an existing
system or to model a new one. A DFD can often visually "say" things that
would be hard to explain in words, and they work for both technical
and non-technical audiences. There are four components in a DFD:
1. External Entity
2. Process
3. Data Flow
4. Data Store
1) External Entity:
It is an outside system that sends or receives data, communicating with
the system. They are the sources and destinations of information
entering and leaving the system. They might be an outside
organization or person, a computer system or a business system. They
are known as terminators, sources and sinks, or actors. They are
typically drawn on the edges of the diagram. These are sources and
destinations of the system's input and output.
Representation:
2) Process:
It is just like a function that changes the data, producing an output. It might
perform computations, sort data based on logic, or direct the
data flow based on business rules.
Representation:
3) Data Flow:
A dataflow represents a package of information flowing between two
objects in the data-flow diagram, Data flows are used to model the
flow of information into the system, out of the system and between the
elements within the system.
Representation:
4) Data Store:
These are the files or repositories that hold information for later use,
such as a database table or a membership form. Each data store
receives a simple label.
Representation:
3.6.1 UML DIAGRAMS
UML stands for Unified Modeling Language. Taking the SRS
document of the analysis phase as input to the design phase, UML
diagrams are drawn. The UML is only a language and so is just one part
of the software development method. The UML is process independent,
although optimally it should be used in a process that is use-case
driven, architecture-centric, iterative, and incremental. The UML is a
language for visualizing, specifying, constructing and documenting the
artifacts of a software-intensive system. It is based on diagrammatic
representations of software components.
function provided by the system as a set of events that yield a visible
result for the actor.
When the initial task is complete, use case diagrams are modelled to
present the outside view.
system.
Use case diagrams are considered for high level requirement analysis
of a system. When the requirements of a system are analyzed, the
functionalities are captured in use cases.
We can say that use cases are nothing but the system functionalities
written in an organized manner. The second thing which is relevant to
use cases is the actors. Actors can be defined as something that
interacts with the system.
Functionalities to be
Actors
Fig 3.6.2: Use case diagram of the sign language recognition system
Table 3.6.3: Use case scenario for the sign language recognition system
of three things: name, attributes, and operations. Class diagrams also
display relationships such as containment, inheritance, association,
etc. The association relationship is the most common relationship in a
class diagram. The association shows the relationship between
instances of classes.
Class diagrams are the most popular UML diagrams used for the
construction of software applications. It is very important to learn the
drawing procedure of the class diagram.
Finally, before making the final version, the diagram should be drawn
on plain paper and reworked as many times as possible to make it
correct.
Fig 3.6.4: Class diagram of the sign language recognition system
3.6.4 Sequence Diagram
Logical View of the system under development. Sequence diagrams
are sometimes called event diagrams or event scenarios.
UML has introduced significant improvements to the capabilities of
sequence diagrams. Most of these improvements are based on the
idea of interaction fragments, which represent smaller pieces of an
enclosing interaction. Multiple interaction fragments are combined to
create a variety of combined fragments, which are then used to model
interactions that include parallelism, conditional branches and optional
interactions.
4.1.1 State Chart
Fig 3.6.7: State chart diagram of the sign language recognition system
There are several software requirements that must be met for software to function
on a computer, including resource requirements and prerequisites. The minimal
requirements are as follows:
Raspbian OS
Anaconda with Spyder
3.7.2 Hardware Requirements
The most common set of requirements defined by any operating system or
software application is the physical computer resources, also known as hardware.
The minimal hardware requirements are as follows,
Raspberry Pi B+
Camera Module
8 GB SD Card
3.8 PROCESSING MODULE
Raspberry Pi is a small single-board computer developed in the United Kingdom
by the Raspberry Pi Foundation. The organization behind the Raspberry Pi
consists of two arms. The first two models were developed by the Raspberry Pi
Foundation. The Raspberry Pi hardware has evolved through several versions that
feature variations in memory capacity and peripheral-device support.
The Raspberry Pi device looks like a motherboard, with the mounted chips and
ports exposed (something you'd expect to see only if you opened up your
computer and looked at its internal boards), but it has all the components you need
to connect input, output, and storage devices and start computing.
Raspberry Pi is a low-cost, basic computer that was originally intended to help
spur interest in computing among school-aged children. The Raspberry Pi is
contained on a single circuit board and features ports for:
HDMI
USB 2.0
Composite video
Analog audio
Internet
SD Card
The computer runs entirely on open-source software and gives students the ability
to mix and match software according to the work they wish to do.
The Raspberry Pi debuted in February 2012. The group behind the computer's
development - the Raspberry Pi Foundation - started the project to make
computing fun for students, while also creating interest in how computers work at a
basic level. Unlike using an encased computer from a manufacturer, the
Raspberry Pi shows the essential guts behind the plastic. Even the software, by
virtue of being open-source, offers an opportunity for students to explore the
underlying code, if they wish. The Raspberry Pi is believed to be an ideal learning
tool, in that it is cheap to make, easy to replace and needs only a keyboard and a
TV to run. These same strengths also make it an ideal product to jumpstart
computing in the developing world. The quad-core Raspberry Pi 3 is both faster
and more capable than its predecessor, the Raspberry Pi 2. For those interested in
benchmarks, the Pi 3's CPU--the board's main processor--has roughly 50-60
percent better performance in 32-bit mode than that of the Pi 2, and is 10x faster
than the original single-core Raspberry Pi.
Compared to the original PI, real-world applications will see performance increase
of between 2.5x for single threaded applications and more than 20x when video
playback is accelerated by the chip's NEON engine. Unlike its predecessor, the
new board is capable of playing 1080p MP4 video at 60 frames per second (with a
bit rate of about 5400Kbps), boosting the Pi's media centre credentials. That's not
to say, however, that all video will playback this smoothly, with performance
dependent on the source video, the player used and bitrate. The Pi 3 also supports
wireless internet out of the box, with built-in Wi-Fi and Bluetooth. The latest board
can also boot directly from a USB-attached hard drive or pen drive, as well as
supporting booting from a network-attached file system, using PXE, which is useful
for remotely updating a Pi and for sharing an operating system image between
multiple machines.
to a remote location, the video stream may be saved, viewed or sent on from there.
Unlike an IP camera (which connects using Ethernet or Wi-Fi), a webcam is
generally connected by a USB cable, or similar cable, or built into computer
hardware, such as laptops.
The term "USB camera" (a clipped compound) may also be used in its original
sense of a video camera connected to the USB port continuously for an
indefinite time, rather than for a particular session, generally supplying a view for
anyone who visits its web page over the Internet. Some of them, for example
those used as online cameras, are expensive, rugged professional video cameras.
RECALL
Recall may be defined as the proportion of actual positives that are returned by our
ML model. We can easily calculate it from the confusion matrix with the help of the
following formula:
Recall = TP / (TP + FN)
SUPPORT
Support may be defined as the number of samples of the true response that lie
in each class of target values.
F1 SCORE
This score will give us the harmonic mean of precision and recall. Mathematically,
F1 score is the weighted average of the precision and recall. The best value of F1
would be 1 and worst would be 0. We can calculate F1 score with the help of
following formula
F1 = 2 * (precision * recall) / (precision + recall)
F1 score is having equal relative contribution of precision and recall.
We can use classification_report function of sklearn.metrics to get the
classification report of our classification model.
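A brief example of producing that report with scikit-learn follows; the label vectors are made up purely to show the output format, not results from our system.

# Example of classification_report on made-up predictions; the arrays below are
# placeholders, not results from this project.
from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]
print(classification_report(y_true, y_pred))
# The report lists precision, recall, f1-score and support for each class.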
CHAPTER 4
RESULTS
4.1 RESULTS
Our proposed methodology's execution has been examined on test data
which was distinct from the training data set. The testing process involves
43,200 image samples of different hand signals. All these images were fed
into our proposed model to come up with accurate results. Our expected
result is to obtain text which is the translation of the sign language given as
input. Our model will anticipate all the hand gestures of Indian Sign
Language in order to achieve an efficient result. The estimated accuracy of
our proposed system is more than 95%, even in a multiplex lighting
environment, which is considered an adequate result as of now for real-time
interpretation.
CHAPTER 5
CONCLUSIONS AND FUTURE WORK
5.1 CONCLUSIONS AND FUTURE WORK
The proposed prototype model can recognize and classify Indian Sign
Language using the deep structured learning technique called CNN. We
observe that the CNN model gives the highest accuracy due to its advanced
techniques. From the process, we can conclude that CNN is an efficient
technique to categorize hand gestures with a high degree of accuracy. In
future work, we would like to expand to a few more sign language dialects,
and our response time can also be improved.
REFERENCES
[1] Vijayalakshmi, P., & Aarthi, M. (2016, April). Sign language to speech conversion. In 2016 International Conference on
Recent Trends in Information Technology (ICRTIT) (pp. 1-6). IEEE.
[2] NB, M. K. (2018). Conversion of sign language into text. International Journal of Applied Engineering Research, 13(9),
7154-7161.
[3] Masood, S., Srivastava, A., Thuwal, H. C., & Ahmad, M. (2018). Real-time sign language gesture (word) recognition from
video sequences using CNN and RNN. In Intelligent Engineering Informatics (pp. 623-632). Springer, Singapore.
[4] Apoorv, S., Bhowmick, S. K., & Prabha, R. S. (2020, June). Indian sign language interpreter using image processing and
machine learning. In IOP Conference Series: Materials Science and Engineering (Vol. 872, No. 1, p. 012026). IOP
Publishing.
[5] Kaushik, N., Rahul, V., & Kumar, K. S. (2020). A Survey of Approaches for Sign Language Recognition
System. International Journal of Psychosocial Rehabilitation, 24(01).
[6] Kishore, P. V. V., & Kumar, P. R. (2012). A video-based Indian sign language recognition system (INSLR) using wavelet
transform and fuzzy logic. International Journal of Engineering and Technology, 4(5), 537.
[7] Dixit, K., & Jalal, A. S. (2013, February). Automatic Indian sign language recognition system. In 2013 3rd IEEE
International Advance Computing Conference (IACC) (pp. 883-887). IEEE.
[8] Das, A., Yadav, L., Singhal, M., Sachan, R., Goyal, H., Taparia, K., ... & Trivedi, G. (2016, December). Smart glove for
Sign Language communications. In 2016 International Conference on Accessibility to Digital World (ICADW) (pp. 27-31).
IEEE.
[9] Sruthi, R., Rao, B. V., Nagapravallika, P., Harikrishna, G., & Babu, K. N. (2018). Vision-Based Sign Language by Using
MATLAB. International Research Journal of Engineering and Technology (IRJET), 5(3).
[10] Kumar, A., & Kumar, R. (2021). A novel approach for ISL alphabet recognition using Extreme Learning
Machine. International Journal of Information Technology, 13(1), 349-357.
[11] Maraqa, M., & Abu-Zaiter, R. (2008, August). Recognition of Arabic Sign Language (Arce) using recurrent neural
networks. In 2008 First International Conference on the Applications of Digital Information and Web Technologies
(ICADIWT) (pp. 478-481). IEEE.
[12] Masood, S., Srivastava, A., Thuwal, H. C., & Ahmad, M. (2018). Real-time sign language gesture (word) recognition from
video sequences using CNN and RNN. In Intelligent Engineering Informatics (pp. 623-632). Springer, Singapore.
[13] Camgoz, N. C., Hadfield, S., Koller, O., Ney, H., & Bowden, R. (2018). Neural sign language translation. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7784-7793).
[14] Cheng, K. L., Yang, Z., Chen, Q., & Tai, Y. W. (2020, August). Fully convolutional networks for continuous sign language
recognition. In European Conference on Computer Vision (pp. 697-714). Springer, Cham.
[15] Dabre, K., & Dholay, S. (2014, April). The machine learning model for sign language interpretation using webcam images.
In 2014 International Conference on Circuits, Systems, Communication and Information Technology Applications
(CSCITA) (pp. 317-321). IEEE.
[16] Taskiran, M., Killioglu, M., & Kahraman, N. (2018, July). A real-time system for recognition of American sign language
by using deep learning. In 2018 41st International Conference on Telecommunications and Signal Processing (TSP) (pp. 1-
5). IEEE.
[17] Khan, S. A., Joy, A. D., Asaduzzaman, S. M., & Hossain, M. (2019, April). An efficient sign language translator device
using convolutional neural network and customized ROI segmentation. In 2019 2nd International Conference on
Communication Engineering and Technology (ICCET) (pp. 152-156). IEEE.
[18] Nair, A. V., & Bindu, V. (2013). A review on Indian sign language recognition. International journal of computer
applications, 73(22).
[19] Kumar, D. M., Bavanraj, K., Thavananthan, S., Bastiansz, G. M. A. S., Harshanath, S. M. B., & Alosious, J. (2020,
December). EasyTalk: A Translator for Sri Lankan Sign Language using Machine Learning and Artificial Intelligence.
In 2020 2nd International Conference on Advancements in Computing (ICAC) (Vol. 1, pp. 506-511). IEEE.
[20] Kumar, A., Madaan, M., Kumar, S., Saha, A., & Yadav, S. (2021, August). Indian Sign Language Gesture Recognition in
Real-Time using Convolutional Neural Networks. In 2021 8th International Conference on Signal Processing and
Integrated Networks (SPIN) (pp. 562-568). IEEE.
[21] Manikandan, K., Patidar, A., Walia, P., & Roy, A. B. (2018). Hand gesture detection and conversion to speech and text.
arXiv preprint arXiv:1811.11997.
[22] Misra, S., Singha, J., & Laskar, R. H. (2018). Vision-based hand gesture recognition of alphabets, numbers, arithmetic
operators and ASCII characters to develop a virtual text-entry interface system. Neural Computing and Applications, 29(8),
117-135.
[23] Hoste, L., Dumas, B., & Signer, B. (2012, May). SpeeG: a multimodal speech-and gesture-based text input solution. In
Proceedings of the International working conference on advanced visual interfaces (pp. 156-163).
[24] Buxton, W., Fiume, E., Hill, R., Lee, A., & Woo, C. (1983). Continuous hand-gesture driven input. In Graphics Interface
(Vol. 83, pp. 191-195).
[25] Kunjumon, J., & Megalingam, R. K. (2019, November). Hand gesture recognition system for translating indian sign
language into text and speech. In 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT)
(pp. 14-18). IEEE.
[26] Dardas, N. H., & Georganas, N. D. (2011). Real-time hand gesture detection and recognition using bag-of-features and
support vector machine techniques. IEEE Transactions on Instrumentation and measurement, 60(11), 3592-3607.
[27] Köpüklü, O., Gunduz, A., Kose, N., & Rigoll, G. (2019, May). Real-time hand gesture detection and classification using
convolutional neural networks. In 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition
(FG 2019) (pp. 1-8). IEEE.
[28] Francke, H., Ruiz-del-Solar, J., & Verschae, R. (2007, December). Real-time hand gesture detection and recognition using
boosted classifiers and active learning. In Pacific-Rim Symposium on Image and Video Technology (pp. 533-547).
Springer, Berlin, Heidelberg.
[29] Zhang, Q., Chen, F., & Liu, X. (2008, July). Hand gesture detection and segmentation based on difference background
image with complex background. In 2008 International Conference on Embedded Software and Systems (pp. 338-343).
IEEE.
[30] Mazhar, O., Navarro, B., Ramdani, S., Passama, R., & Cherubini, A. (2019). A real-time humanrobot interaction
framework with robust background invariant hand gesture detection. Robotics and Computer-Integrated Manufacturing, 60,
34-48.
[31] Liu, W., Li, X., Jia, Z., Yan, H., & Ma, X. (2017). A three-dimensional triangular vision-based contouring error detection
system and method for machine tools. Precision Engineering, 50, 85-98.
[32] Cohen, C. J., Beach, G., & Foulk, G. (2001, October). A basic hand gesture control system for PC applications. In
Proceedings 30th Applied Imagery Pattern Recognition Workshop (AIPR 2001). Analysis and Understanding of Time
Varying Imagery (pp. 74-79). IEEE.
[33] Reifinger, S., Wallhoff, F., Ablassmeier, M., Poitschke, T., & Rigoll, G. (2007, July). Static and dynamic hand-gesture
recognition for augmented reality applications. In International Conference on Human-Computer Interaction (pp. 728-737).
Springer, Berlin, Heidelberg.
[34] Kurakin, A., Zhang, Z., & Liu, Z. (2012, August). A real time system for dynamic hand gesture recognition with a depth
sensor. In 2012 Proceedings of the 20th European signal processing conference (EUSIPCO) (pp. 1975-1979). IEEE.
[35] Plouffe, G., & Cretu, A. M. (2015). Static and dynamic hand gesture recognition in depth data using dynamic time
warping. IEEE transactions on instrumentation and measurement, 65(2), 305- 316.
[36] Ghotkar, A. S., Khatal, R., Khupase, S., Asati, S., & Hadap, M. (2012, January). Hand gesture recognition for indian sign
language. In 2012 International Conference on Computer Communication and Informatics (pp. 1-4). IEEE.
[37] Dutta, K. K., & GS, A. K. (2015, December). Double handed Indian Sign Language to speech and text. In 2015 Third
International Conference on Image Information Processing (ICIIP) (pp. 374-377). IEEE.
[38] Dixit, K., & Jalal, A. S. (2013, February). Automatic Indian sign language recognition system. In 2013 3rd IEEE
International Advance Computing Conference (IACC) (pp. 883-887). IEEE.
[39] Nair, A. V., & Bindu, V. (2013). A review on Indian sign language recognition. International journal of computer
applications, 73(22).
[40] Rajam, P. S., & Balakrishnan, G. (2011, September). Real time Indian sign language recognition system to aid deaf-dumb
people. In 2011 IEEE 13th international conference on communication technology (pp. 737-742). IEEE.
APPENDICES
(a) SAMPLE CODE
Training and validation
import pandas as pd
import numpy as np
import tensorflow as tf
import os
import cv2
import pydot
import matplotlib.pyplot as plt                       # needed for the plots below
from sklearn.model_selection import train_test_split  # assumed for the data split below
from tensorflow.keras.utils import to_categorical     # needed for one-hot labels
def load_dataset(directory):
    # Parts of this function were lost in extraction; the directory walk and the
    # 64x64 resize below are assumed reconstructions.
    images = []
    labels = []
    for idx, label in enumerate(sorted(os.listdir(directory))):
        for file in os.listdir(os.path.join(directory, label)):
            img = cv2.imread(os.path.join(directory, label, file))
            img = cv2.resize(img, (64, 64))
            images.append(img)
            labels.append(idx)
    images = np.asarray(images)
    labels = np.asarray(labels)
    return images, labels
def display_images(x_data, y_data, title, display_label=True):
    # The function header and figure creation were lost in extraction;
    # the function name and the 4x4 grid below are assumed reconstructions.
    x, y = x_data, y_data
    fig, axes = plt.subplots(4, 4, figsize=(10, 10))
    fig.suptitle(title, fontsize=18)
    for i, ax in enumerate(axes.flat):
        ax.imshow(cv2.cvtColor(x[i], cv2.COLOR_BGR2RGB))
        if display_label:
            ax.set_xlabel(uniq_labels[y[i]])
        ax.set_xticks([])
        ax.set_yticks([])
    plt.show()
uniq_labels = sorted(os.listdir(data_dir))   # the definition of data_dir was lost in extraction
X_pre, Y_pre = load_dataset(data_dir)        # assumed reconstruction of the lost call
print(X_pre.shape, Y_pre.shape)
# splitting dataset into 80% train, 10% validation and 10% test data
# (assumed reconstruction of the lost split, using train_test_split twice)
X_train, X_rest, Y_train, Y_rest = train_test_split(X_pre, Y_pre, test_size=0.2)
X_test, X_eval, Y_test, Y_eval = train_test_split(X_rest, Y_rest, test_size=0.5)
Y_train = to_categorical(Y_train)
Y_test = to_categorical(Y_test)
Y_eval = to_categorical(Y_eval)
X_train = X_train/255        # scale pixel values to [0, 1]
X_test = X_test/255
X_eval = X_eval/255
model = tf.keras.Sequential([
    # The Conv2D layers below are assumed reconstructions of lines lost in extraction.
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=X_train.shape[1:]),
    tf.keras.layers.MaxPool2D((2,2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPool2D((2,2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    # The output layer needs one unit per class; Dense(1, ...) in the extracted
    # text cannot represent all the sign classes.
    tf.keras.layers.Dense(len(uniq_labels), activation='softmax')
])
model.summary()
# compile and train (assumed reconstruction of lines lost in extraction)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, Y_train, validation_data=(X_eval, Y_eval), epochs=10)
#testing
model.evaluate(X_test, Y_test)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.show()
main.py
import cv2
import numpy as np
import tensorflow as tf
import os
model = tf.keras.models.load_model(r'D:\final year main project\1Indian sign
Language\test_train.h5')
model.summary()
labels = sorted(os.listdir(data_dir))
labels[-1] = 'Nothing'
print(labels)
cap = cv2.VideoCapture(0)
while(True):
    _ , frame = cap.read()
    # The ROI coordinates, resize size, prediction call and text overlay below are
    # assumed reconstructions of lines lost in extraction.
    roi = frame[100:300, 100:300]                  # region of interest containing the hand
    cv2.imshow('Output', roi)
    img = cv2.resize(roi, (64, 64))
    img = img/255
    prediction = model.predict(np.expand_dims(img, axis=0))
    char_index = np.argmax(prediction)
    confidence = round(prediction[0][char_index] * 100, 1)
    predicted_char = labels[char_index]
    font = cv2.FONT_HERSHEY_TRIPLEX
    fontScale = 1
    thickness = 2
    msg = predicted_char + ', Conf: ' + str(confidence) + ' %'
    cv2.putText(frame, msg, (30, 80), font, fontScale, (0, 255, 0), thickness)
    print(predicted_char)
    cv2.imshow('Output1', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):          # press 'q' to stop
        break
cap.release()
cv2.destroyAllWindows()
(b) OUTPUT SCREENSHOTS
(b) Training accuracy
(d) Output in windows
PLAGIARISM REPORT