Conversion of Sign Language Into Speech or Text Using CNN
LIST OF FIGURES

TABLE OF CONTENTS

1 INTRODUCTION
2 LITERATURE SURVEY
3 METHODOLOGY AND IMPLEMENTATION
   3.1 Training Module
      3.1.1 Pre-Processing
   3.2 Algorithm
   3.3 Segmentation
   3.4 Convolution Neural Networks
   3.10.1 Precision
   3.10.2 Recall
   3.10.3 Support
   3.10.4 F1 score
5 CONCLUSION AND FUTURE WORK
7 APPENDIX
   a) Sample code
   b) Screenshots
CHAPTER 1
INTRODUCTION
Image processing involves importing an image, analysing and manipulating it, and producing an output, which may be an altered image or a report based on image analysis.

There are two types of methods used for image processing, namely analogue and digital image processing. Analogue image processing is used for hard copies such as printouts and photographs; image analysts use various fundamentals of interpretation when applying these visual techniques. Digital image processing techniques help in the manipulation of digital images by computers. The three general phases that all types of data undergo in the digital technique are pre-processing, enhancement and display, and information extraction.
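As a rough illustration of these phases, the short sketch below assumes OpenCV is available; the file name, the 128x128 resolution and the Canny thresholds are illustrative assumptions, not values taken from this project.

# Sketch of the digital image processing phases: pre-processing,
# enhancement and display, and information extraction (assumed values).
import cv2

# Pre-processing: load, convert to grayscale and resize to a fixed size.
image = cv2.imread('sign.jpg')                      # hypothetical input file
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.resize(gray, (128, 128))

# Enhancement and display: improve contrast and show the result.
enhanced = cv2.equalizeHist(gray)
cv2.imshow('enhanced', enhanced)
cv2.waitKey(0)

# Information extraction: e.g. edges that later stages can analyse.
edges = cv2.Canny(enhanced, 100, 200)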
Fig 1.1: Phases of pattern recognition
The first phase includes image segmentation and object separation. In this phase, the different objects are detected and separated from the background. The second phase is feature extraction, in which the objects are measured. Measurement quantitatively estimates some important attributes of the objects, and a group of these features is combined into a feature vector during feature extraction. The third phase is classification, in which the output is a decision that determines the category to which each object belongs. For pattern recognition, therefore, the inputs are images and the outputs are object types and a structural analysis of the images. The structural analysis is a description of the images that allows their important information to be correctly understood and judged.
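As a minimal sketch of these three phases, the snippet below assumes OpenCV, NumPy and scikit-learn; Otsu thresholding, Hu-moment features and the k-nearest-neighbour classifier are illustrative choices only, not the pipeline actually used later in this report.

# Phases of pattern recognition: segmentation, feature extraction,
# classification (all parameter choices here are assumptions).
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def segment_hand(gray):
    # Phase 1: segmentation -- separate the object from the background
    # with a simple global (Otsu) threshold.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask

def extract_features(mask):
    # Phase 2: feature extraction -- measure the segmented object and
    # combine the measurements into a feature vector.
    moments = cv2.moments(mask)
    hu = cv2.HuMoments(moments).flatten()           # 7 shape descriptors
    area = float(np.count_nonzero(mask)) / mask.size
    return np.append(hu, area)                      # feature vector

# Phase 3: classification -- decide which category the object belongs to.
# X_train / y_train would come from labelled gesture images.
# clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
# label = clf.predict([extract_features(segment_hand(gray_image))])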
Sign language is a language that includes gestures made with the hands and other body parts, including facial expressions and postures of the body. It is used primarily by people who are deaf or speech-impaired. There are many different sign languages, such as British, Indian and American Sign Language. British Sign Language (BSL) is not easily intelligible to users of American Sign Language (ASL), and vice versa.
A functioning sign language recognition system could give the deaf an opportunity to communicate with non-signing people without the need for an interpreter. It could be used to generate speech or text, making the deaf more independent. Unfortunately, no system with these capabilities exists so far. In this project our aim is to develop a system which can classify sign language accurately.
American Sign Language (ASL) is a complete, natural language that
has the same linguistic properties as spoken languages, with grammar
that differs from English. ASL is expressed by movements of the
hands and face. It is the primary language of many North Americans
who are deaf and hard of hearing, and is used by many hearing
people as well.
The process of converting the signs and gestures shown by the user into text is called sign language recognition. It bridges the communication gap between people who cannot speak and the general public. Image processing algorithms along with neural networks are used to map each gesture to the appropriate text in the training data, so that raw images and videos are converted into corresponding text that can be read and understood.
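To make this mapping from gesture image to text concrete, the sketch below assumes TensorFlow/Keras; the 64x64 grayscale input, the layer sizes and the 26-letter label set are illustrative assumptions, not the project's exact architecture.

# Sketch of a CNN that maps a gesture image to a text label (assumed sizes).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 26                      # e.g. the letters A-Z (assumption)
LABELS = [chr(ord('A') + i) for i in range(NUM_CLASSES)]

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),                 # grayscale gesture image
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(NUM_CLASSES, activation='softmax'), # one score per sign
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# After model.fit(...) on labelled gesture images, a raw frame is converted
# to text by taking the most probable class:
# frame = preprocessed_image.reshape(1, 64, 64, 1) / 255.0
# predicted_text = LABELS[int(np.argmax(model.predict(frame)))]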
Sign language is an important means of communication for the deaf community. Its importance is emphasized by growing public approval of, and funding for, international projects. In this age of technology, the demand for a computer-based system is high among the speech-impaired community. Researchers have been attacking the problem for quite some time now, and the results are showing some promise. Interesting technologies are being developed for speech recognition, but no real commercial product for sign recognition is available in the current market. The idea is to make computers understand human language and to develop user-friendly human-computer interfaces (HCI). Making a computer understand speech, facial expressions and human gestures are some steps towards it. Gestures are non-verbally exchanged information, and a person can perform innumerable gestures at a time. Since human gestures are perceived through vision, they are a subject of great interest for computer vision researchers. The project aims to recognise human gestures by creating an HCI. Coding these gestures into machine language demands a complex programming algorithm. In our project we focus on image processing and template matching for better output generation, as sketched below.
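As an illustration of template matching, the short sketch below assumes OpenCV; the image file names and the 0.8 score threshold are hypothetical.

# Sketch of template matching a gesture frame against a stored template.
import cv2

frame = cv2.imread('gesture_frame.png', cv2.IMREAD_GRAYSCALE)   # hypothetical file
template = cv2.imread('template_A.png', cv2.IMREAD_GRAYSCALE)   # hypothetical file

# Slide the template over the frame and record similarity at each position.
result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

# A high normalised correlation score suggests the gesture matches the template.
if max_val > 0.8:
    print('Gesture matched at', max_loc, 'with score', max_val)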
1.4 MOTIVATION
The 2011 Indian census cites roughly 1.3 million people with "hearing impairment". In contrast, numbers from India's National Association of the Deaf estimate that 18 million people, roughly 1 per cent of the Indian population, are deaf. These statistics formed the motivation for our project. As speech-impaired and deaf people need a proper channel to communicate with others, there is a need for such a system, since not everyone can understand their sign language. Our project is therefore aimed at converting sign language gestures into text that is readable by everyone.