
UNIVERSITY INSTITUTE OF TECHNOLOGY,

BARKATULLAH UNIVERSITY, BHOPAL (M.P.)


Department of Computer Science & Engineering

MAJOR PROJECT
ON
"SIGN LANGUAGE DETECTION USING MACHINE LEARNING”

Submitted for the partial fulfilment of the Requirement for the


Award of degree of Bachelor of Technology (B.TECH)
Year 2024
University Institute of Technology, Barkatullah University, Bhopal
Submitted By:
Nupur Bopche          Chhavi Dhote

Under the Guidance Of

Dr. Kamini Maheshwar          Mrs. Kavita Chourasia
Project Coordinator           Project Coordinator
CSE, UIT-BU, Bhopal           CSE, UIT-BU, Bhopal

Dr. Divakar Singh          Prof. N.K. Gaur
Guide & HOD                Director
CSE, UIT-BU, Bhopal        UIT-BU, Bhopal
UNIVERSITY INSTITUTE OF TECHNOLOGY
BARKATULLAH UNIVERSITY, BHOPAL (M.P.)
Department of Computer Science & Engineering

CERTIFICATE
Year 2023-2024

This is to certify that Miss Nupur Bopche and Miss Chhavi Dhote have successfully
completed the project work entitled "Sign Language Detection Using Machine Learning"
in partial fulfilment of the requirements for the award of the degree of Bachelor of
Technology (B.Tech.), 8th Semester, in Computer Science & Engineering, in the year 2024
at University Institute of Technology, Barkatullah University, Bhopal.

Dr. Kamini Maheshwar          Mrs. Kavita Chourasia
Project Coordinator           Project Coordinator
CSE, UIT-BU, Bhopal           CSE, UIT-BU, Bhopal

Dr. Divakar Singh          Prof. N.K. Gaur
Guide & HOD                Director
CSE, UIT-BU, Bhopal        UIT-BU, Bhopal
UNIVERSITY INSTITUTE OF TECHNOLOGY,
BARKATULLAH UNIVERSITY, BHOPAL (M.P.)
Department of Computer Science & Engineering

DECLARATION
YEAR 2020-24

We hereby declare that the Major Project work presented in this report, entitled "Sign
Language Detection Using Machine Learning" and submitted to the Department of Computer
Science & Engineering, Faculty of Technology, University Institute of Technology,
Barkatullah University, Bhopal, is authentic work carried out by us under the guidance of
Mrs. Kavita Chourasia and Dr. Kamini Maheshwar, Assistant Professors, and Dr. Divakar Singh,
HOD, Department of Computer Science & Engineering, Barkatullah University, Bhopal.
This is our original work and has not been submitted earlier for the award of any other
degree, diploma or certificate.

DATE:__\__\____

Nupur Bopche
Chhavi Dhote
ACKNOWLEDGEMENT

We would like to express our deepest appreciation to all those who made it possible for us
to complete this project.
Before we get into the thick of things, we would like to add a few heartfelt words for the
teachers who contributed to this project in numerous ways and who supported us right from
the time the project idea was conceived.
First of all, it gives us immense pleasure to express our deepest sense of gratitude and
sincere thanks to our highly respected and esteemed guides Dr. Kamini Maheshwar and
Mrs. Kavita Chourasia for their valuable guidance, encouragement and help in completing
this work.
We would also like to extend our special thanks and gratitude to HOD Dr. Divakar Singh
for providing an effective platform and support in the development of this project. Finally,
we would like to render our thanks to Director Dr. N.K. Gaur for his guidance in this major
project titled "Sign Language Detection Using Machine Learning".
Last but not the least, we would like to thank our parents and friends for their support and
coordination. Regardless of the source, we wish to express our gratitude to all who may
have contributed to this work, even anonymously.

Submitted By:-
Chhavi Dhote(R218237200067)
Nupur Bopche(R218237200020)
ABSTRACT

Sign language stands as both a fundamental and ancient form of communication, yet its
widespread accessibility remains constrained by the scarcity of proficient sign language
interpreters. This effort closes this crucial gap by presenting a brand-new real-time
technique for fingerspelling-based American Sign Language (ASL) interpretation that
makes use of neural networks. The suggested method involves a number of steps, the first
of which is pre-processing hand movements with a specific filter in order to improve
uniformity and clarity. Then, for classification, a well-trained neural network model is
used, which predicts gestures from the learned patterns and characteristics. Notably, the
system recognizes all 26 ASL alphabet letters with an accuracy of 85%.
This breakthrough has significant implications for improving communication accessibility
for those who use sign language, even beyond its technical accomplishments. This study
offers a promising way to get past the obstacles that the deaf and hard of hearing
community faces in successful communication: using machine learning. Because the
interpretation process is real-time, users can participate in a smooth and fluid manner,
gaining unprecedented autonomy and inclusivity in the process. Furthermore, the
suggested method's versatility and scalability allow for its integration into a variety of
platforms and devices, expanding its influence and reach even further.
To sum up, this study shows how artificial intelligence can revolutionize communication
for a variety of populations, marking a significant advancement in the field of assistive
technology. This project promotes a more inclusive and equitable society where
communication obstacles are removed and people are given the freedom to express
themselves freely and effectively by bridging the gap between sign language users and the
larger community.
LIST OF FIGURES

S. No. Figure No Figure Name Page No


1 1.1 Component 2
2 1.2 ASL Gestures 3
3 4.1 ANN Architecture 11
4 4.2 CNN Architecture 12
5 4.3 Pooling 13
6 4.4 Fully Connected Layer 14
7 5.1 Region of Interest (ROI) 18
8 5.2 Gaussian Blur 19
9 5.3 Two Layer Algorithm 19
10 5.4 Application 23
11 7.1 Confusion Matrix I 40
12 7.2 Confusion Matrix II 40
LIST OF ABBREVIATIONS

S.No. Abbreviation Full Form


1 D&M Deaf and/or Mute
2 ASL American Sign Language
3 BSL British Sign Language
4 HMM Hidden Markov Model
5 ANN Artificial Neural Network
6 CNN Convolutional Neural Network
7 ROI Region of Interest
8 ReLU Rectified Linear Unit
9 RGB Red Green Blue
10 NLP Natural Language Processing
TABLE OF CONTENTS

ABSTRACT....................................................................................................I

List of Figures.................................................................................................II
List of Abbreviations......................................................................................III

CHAPTER 1: INTRODUCTION..............................................................01

CHAPTER 2: MOTIVATION...................................................................04

CHAPTER 3: LITERATURE SURVEY..................................................06


Data Acquisition..............................................................................07
Data Pre-Processing.........................................................................08
Feature Extraction............................................................08
Gesture Classification........................................................08

CHAPTER 4: TECHNIQUES AND REQUIREMENTS….…..............10


Feature Extraction and Representation .............................................11
Artificial Neural Network (ANN) .....................................................11
Convolutional Neural Network (CNN)..............................................12
TensorFlow ........................................................................................14
Keras...................................................................................................14
OpenCV .............................................................................................15
NumPy ...............................................................................................15
MediaPipe...........................................................................................15
CHAPTER 5: METHODOLOGY..........................................................17
Data Set Generation............................................................................18
Gesture Classification ........................................................................19
Finger Spelling Sentence Formation Implementation.......................22
Auto-correct Feature............................................................................23
Training and Testing ............................................................................23

CHAPTER 6: CODING............................................................................25
Data Collection..................................................................................26
Functions...........................................................................................29
Data ..................................................................................................30
Training.............................................................................................32
Application........................................................................................33

CHAPTER 7: OUTPUT AND RESULT ...............................................36


Output...............................................................................................37
Result................................................................................................39

CHAPTER 8: CHALLENGES FACED ...............................................41

CHAPTER 9: CONCLUSION ...............................................................43


CHAPTER 10: FUTURE SCOPE..........................................................45

REFERENCE............................................................................................47
CHAPTER 1

INTRODUCTION

1
1. Introduction
American Sign Language (ASL) is a predominant sign language. Since the only disability Deaf
and/or Mute (hereafter referred to as D&M) people have is communication-related, and since
they cannot use spoken languages, the only way for them to communicate is through sign
language. Communication is the process of exchanging thoughts and messages in various ways,
such as speech, signals, behaviour and visuals. D&M people use hand gestures to express their
ideas to other people. Gestures are non-verbally exchanged messages, and these gestures are
understood with vision. This non-verbal communication of D&M people is called sign language.
A sign language is a language which uses gestures instead of sound to convey meaning,
combining hand shapes, orientation and movement of the hands, arms or body, facial
expressions and lip patterns. Contrary to popular belief, sign language is not international;
it varies from region to region.

Sign language is a visual language and consists of 3 major components [6]:

Fig1.1- Component

Minimizing the communication gap between D&M and non-D&M people becomes a necessity to
ensure effective communication for all. Sign language translation is one of the fastest
growing lines of research, and it enables the most natural manner of communication for those
with hearing impairments. A hand gesture recognition system offers an opportunity for deaf
people to communicate with hearing people without the need for an interpreter. The system is
built for the automated conversion of ASL into text and speech.

2
In our project we primarily focus on producing a model which can recognize fingerspelling-
based hand gestures in order to form a complete word by combining each gesture. The
gestures we aim to train are shown in the image below.

Fig 1.2-ASL Gestures

3
CHAPTER 2

MOTIVATION

4
2.Motivation
Communication between individuals who are deaf and/or mute (D&M) and those who are not
often encounters a significant language barrier due to the structural differences between sign
language and spoken or written language. Consequently, individuals who rely on sign
language for communication are dependent on visual-based methods to interact effectively
with others.

The existence of a common interface capable of converting sign language into text would
greatly facilitate understanding for non-D&M individuals, allowing them to comprehend
gestures more easily. Thus, research efforts have been directed towards the development of a
vision-based interface system, enabling seamless communication for D&M individuals without
requiring proficiency in each other's respective languages.

The primary objective is to design and implement a user-friendly Human-Computer Interface
(HCI) that can effectively interpret human sign language. Sign languages vary across different
regions and cultures worldwide, including American Sign Language (ASL), French Sign
Language, British Sign Language (BSL), Indian Sign Language, Japanese Sign Language,
among others. Extensive work has been conducted on various sign languages globally,
highlighting the importance of creating a versatile and inclusive HCI solution that caters to
diverse linguistic needs.

5
CHAPTER 3
LITERATURE SURVEY

6
3. Literature Survey:
In recent years, tremendous research has been done on hand gesture recognition. With the
help of a literature survey, we realized that the basic steps in hand gesture recognition
are:

 Data acquisition
 Data pre-processing
 Feature extraction
 Gesture classification

3.1 Data Acquisition:
The different approaches to acquiring data about the hand gesture are as follows:

1. Use of sensory devices:

Electromechanical devices are used to provide the exact hand configuration and position.
Different glove-based approaches can be used to extract this information, but they are
expensive and not user-friendly.

2. Vision based approach:


In vision-based methods, the computer webcam is the input device for observing the
information of hands and/or fingers. The Vision Based methods require only a camera,
thus realizing a natural interaction between humans and computers without the use of
any extra devices, thereby reducing cost. These systems tend to complement biological
vision by describing artificial vision systems that are implemented in software and/or
hardware. The main challenge of vision-based hand detection ranges from coping with
the large variability of the human hand’s appearance due to a huge number of hand
movements, to different skin-color possibilities as well as to the variations in
viewpoints, scales, and speed of the camera capturing the scene.

7
3.2 Data Pre-Processing and 3.3 Feature Extraction for the Vision-Based
Approach:

● In [1] the approach for hand detection combines threshold-based colour detection with
background subtraction. An AdaBoost face detector can be used to differentiate between
faces and hands, as both have a similar skin colour.

● We can also extract the image to be trained by applying a filter called Gaussian blur
(also known as Gaussian smoothing). The filter can be easily applied using Open Computer
Vision (OpenCV) and is described in [3].

● Instrumented gloves, as mentioned in [4], can also be used to extract the images to be
trained. This reduces the computation time for pre-processing and gives more concise and
accurate data compared to applying filters to data received from video extraction.

● We tried doing hand segmentation of an image using colour segmentation techniques,
but skin colour and tone are highly dependent on the lighting conditions, due to which the
segmentation results we obtained were not very good. Moreover, we have a huge number of
symbols to be trained for our project, many of which look similar to each other, like the
gesture for the symbol 'V' and the digit '2'. Hence we decided that, in order to produce
better accuracy for our large number of symbols, rather than segmenting the hand out of a
random background we would keep the background of the hand a stable single colour, so
that we do not need to segment it on the basis of skin colour. This helps us get better
results.

3.4 Gesture Classification:

 In [1] Hidden Markov Models (HMMs) are used for the classification of gestures. This
model deals with the dynamic aspects of gestures. Gestures are extracted from a sequence of
video images by tracking the skin-colour blobs corresponding to the hand in a body-face
space centred on the face of the user.

 The goal is to recognize two classes of gestures: deictic and symbolic. The image is filtered
using a fast look–up indexing table. After filtering, skin colour pixels are gathered into
blobs. Blobs are statistical objects based on the location (x, y) and the colorimetry (Y, U,
V) of the skin color pixels in order to determine homogeneous areas.

 In [2] Naïve Bayes Classifier is used which is an effective and fast method for static hand
gesture recognition. It is based on classifying the different gestures according to geometric
based invariants which are obtained from image data after segmentation.

 Thus, unlike many other recognition methods, this method is not dependent on skin colour.
The gestures are extracted from each frame of the video, with a static background. The first
step is to segment and label the objects of interest and to extract geometric invariants from
them. The next step is the classification of gestures using a K-nearest-neighbour algorithm
aided with a distance weighting algorithm (KNNDW) to provide suitable data for a locally
weighted Naïve Bayes classifier.

 According to the paper on “Human Hand Gesture Recognition Using a Convolution Neural
Network” by Hsien-I Lin, Ming-Hsiang Hsu, and Wei-Kai Chen (graduates of Institute of
Automation Technology National Taipei University of Technology Taipei, Taiwan), they
have constructed a skin model to extract the hands out of an image and then apply binary
threshold to the whole image. After obtaining the threshold image they calibrate it about the
principal axis in order to centre the image about the axis. They input this image to a
convolutional neural network model in order to train and predict the outputs. They have
trained their model over 7 hand gestures and using this model they produced an accuracy of
around 95% for those 7 gestures.

9
CHAPTER 4
TECHNIQUES AND REQUIREMENT

10
4. Techniques and Requirement

4.1 Feature Extraction and Representation:

An image is represented as a 3D matrix whose first two dimensions are the height and width
of the image and whose depth is the number of values per pixel (1 in the case of grayscale
and 3 in the case of RGB). These pixel values are then used for extracting useful features
using a CNN.
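
For illustration, the short snippet below loads an image with OpenCV and inspects the array
shapes described above (the file name 'sample.png' is only a placeholder):

import cv2

# Load a hypothetical sample image in colour and in grayscale
img = cv2.imread('sample.png')                          # shape: (height, width, 3), BGR order
gray = cv2.imread('sample.png', cv2.IMREAD_GRAYSCALE)   # shape: (height, width)

print(img.shape, gray.shape)
print(img[0, 0])    # three channel values of the top-left pixel
print(gray[0, 0])   # a single intensity value for the same pixel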

4.2 Artificial Neural Network (ANN):


An Artificial Neural Network is a connection of neurons, replicating the structure of the
human brain. Each neuron connection transfers information to another neuron. Inputs are fed
into the first layer of neurons, which processes them and transfers the result to further
layers of neurons called hidden layers. After the information has been processed through
multiple hidden layers, it is passed to the final output layer.

Fig 4.1- ANN Architecture

These are capable of learning and have to be trained. There are different learning strategies:

1. Unsupervised Learning

2. Supervised Learning

3. Reinforcement Learning

4.3 Convolutional Neural Network (CNN):

Unlike regular Neural Networks, in the layers of CNN, the neurons are arranged in 3
dimensions: width, height, depth. The neurons in a layer will only be connected to a small
region of the layer (window size) before it, instead of all of the neurons in a fully-connected
manner. Moreover, the final output layer has a dimension equal to the number of classes,
because by the end of the CNN architecture we reduce the full image into a single
vector of class scores.

Fig 4.2- CNN Architecture

1. Convolution Layer:

In the convolution layer we take a small window [typically 5x5] that extends through
the depth of the input matrix. The layer consists of learnable filters of that window size.
During every iteration we slide the window by the stride size [typically 1] and compute the
dot product of the filter entries and the input values at the given position.

As we continue this process we create a 2-dimensional activation matrix that gives the
response of the filter at every spatial position. That is, the network will learn
filters that activate when they see some type of visual feature such as an edge of some
orientation or a blotch of some colour.

2. Pooling Layer:

We use a pooling layer to decrease the size of the activation matrix and ultimately reduce
the number of learnable parameters. There are two types of pooling:

a. Max Pooling: In max pooling we take a window [for example, a window of size 2x2]
and keep only the maximum of its 4 values. We slide this window across the matrix and
continue this process, so we finally get an activation matrix half of its original size
(a small worked example is given after this list).
b. Average Pooling: In average pooling, we take the average of all values in a
window.

Fig 4.3-Pooling

3. Fully Connected Layer:

In a convolution layer, neurons are connected only to a local region, while in a fully
connected layer we connect all of the inputs to every neuron.

13
Fig 4.4-Fully Connected Layer

4. Final Output Layer:


After getting the values from the fully connected layer, we connect them to the final layer
of neurons [whose count equals the total number of classes], which predicts the probability
of the image belonging to each class.
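
As a small worked example of the max-pooling step described above, the NumPy snippet below
reduces a 4x4 activation matrix to a 2x2 matrix by keeping the largest value in each
non-overlapping 2x2 window:

import numpy as np

a = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 0, 8]])

# Reshape into non-overlapping 2x2 blocks and take the maximum of each block
pooled = a.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[6 4]
                #  [7 9]]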

4.4 TensorFlow:
TensorFlow is an end-to-end open-source platform for Machine Learning. It has a
comprehensive, flexible ecosystem of tools, libraries and community resources that lets
researchers push the state-of-the-art in Machine Learning and developers easily build and
deploy Machine Learning powered applications.

TensorFlow offers multiple levels of abstraction so you can choose the right one for your
needs. Build and train models by using the high-level Keras API, which makes getting
started with TensorFlow and machine learning easy.

If you need more flexibility, eager execution allows for immediate iteration and intuitive
debugging. For large ML training tasks, use the Distribution Strategy API for distributed
training on different hardware configurations without changing the model definition.

4.5 Keras:
Keras is a high-level neural networks library written in Python that works as a wrapper to
TensorFlow. It is used in cases where we want to quickly build and test a neural network
with minimal lines of code. It contains implementations of commonly used neural network
elements like layers, objective, activation functions, optimizers, and tools to make working
with images and text data easier.

4.6 OpenCV:
OpenCV (Open-Source Computer Vision) is an open-source library of programming
functions used for real-time computer-vision.
It is mainly used for image processing, video capture and analysis for features like face and
object recognition. It is written in C++, which is its primary interface; however, bindings
are available for Python, Java and MATLAB/Octave.

4.7 NumPy:
NumPy serves as the backbone of many scientific computing projects, offering a wide range
of functionalities crucial for data manipulation and analysis. Its primary data structure, the
ndarray (N-dimensional array), enables efficient storage and manipulation of large datasets,
making it indispensable for tasks such as image processing, signal processing, and machine
learning.
In our major project, NumPy plays a pivotal role in handling image data and processing. We
leverage NumPy arrays to represent images, enabling us to perform various operations such
as resizing, cropping, and filtering. Additionally, NumPy's extensive collection of
mathematical functions allows us to perform complex computations on image data with
ease, facilitating tasks such as feature extraction, transformation, and normalization.
Furthermore, NumPy seamlessly integrates with other libraries commonly used in computer
vision and machine learning projects, such as OpenCV and scikit-learn. This
interoperability enables us to leverage the full power of NumPy alongside specialized tools
for tasks such as image segmentation, object detection, and classification.

4.8 MediaPipe:
MediaPipe empowers developers to build robust and efficient pipelines for real-time
perception tasks, including hand tracking, pose estimation, facial recognition, and
augmented reality. Its modular design and pre-trained models simplify the development
process, allowing developers to focus on building applications rather than reinventing the
wheel.
In our major project, we harness the capabilities of MediaPipe for tasks related to hand
tracking and gesture recognition. By leveraging MediaPipe's pre-trained models and
efficient inference pipelines, we can accurately detect and track hand movements in real-
time, enabling seamless interaction with our system.
Moreover, MediaPipe offers a high level of customization, allowing us to fine-tune models
and adapt them to specific use cases. This flexibility enables us to tailor our gesture
recognition system to the unique requirements of our project, ensuring optimal performance
and accuracy.
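
As a brief illustration of the kind of pipeline described above, the sketch below runs
MediaPipe Hands on a single image and prints the 21 normalized hand landmarks (the file name
'hand.png' is only a placeholder, and the parameters shown are typical defaults rather than
tuned values):

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

with mp_hands.Hands(static_image_mode=True, max_num_hands=1,
                    min_detection_confidence=0.5) as hands:
    image = cv2.imread('hand.png')                                   # placeholder file name
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))  # MediaPipe expects RGB
    if results.multi_hand_landmarks:
        for lm in results.multi_hand_landmarks[0].landmark:
            print(lm.x, lm.y, lm.z)   # 21 landmarks, normalized to the image size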

16
CHAPTER 5
METHODOLOGY

17
5.Methodology
The system is a vision-based approach. All signs are represented with bare hands and so it
eliminates the problem of using any artificial devices for interaction.

5.1 Data Set Generation:


For the project we tried to find ready-made datasets, but we could not find a dataset in
the form of raw images that matched our requirements. All we could find were datasets in
the form of RGB values. Hence, we decided to create our own dataset. The steps we
followed to create our dataset are as follows.

We used the Open Computer Vision (OpenCV) library to produce our dataset.

Firstly, we captured around 630 images of each symbol in ASL (American Sign
Language) for training purposes and around 160 images per symbol for testing purposes.

We capture each frame shown by the webcam of our machine. In each frame we
define a Region of Interest (ROI), which is denoted by a blue bounded square as shown in
the image below:

Fig 5.1-Region of Interest(ROI)

Then, we apply Gaussian Blur Filter to our image which helps us extract various features of
our image. The image, after applying Gaussian Blur, looks as follows:

18
Fig 5.2-Gaussian Blur

5.2 Gesture Classification:
Our approach uses two layers of algorithms to predict the final symbol shown by the user.

Fig 5.3-Two Layer Algorithm

19
Algorithm Layer 1:

1. Apply the Gaussian blur filter and threshold to the frame captured with OpenCV to get
the processed image after feature extraction.
2. This processed image is passed to the CNN model for prediction and if a letter is
detected for more than 50 frames then the letter is printed and taken into consideration
for forming the word.
3. Space between the words is considered using the blank symbol.

Algorithm Layer 2:

1. We detect the various sets of symbols which give similar results when detected.
2. We then classify between those sets using classifiers made for those sets only.

Layer 1:

 CNN Model:

1. 1st Convolution Layer: The input picture has a resolution of 128x128 pixels. It is first
processed in the first convolutional layer using 32 filter weights (3x3 pixels each). This
results in a 126x126 pixel image, one for each of the filters.
2. 1st Pooling Layer: The pictures are downsampled using max pooling of 2x2, i.e. we keep
the highest value in each 2x2 square of the array. Therefore, our picture is downsampled to
63x63 pixels.
3. 2nd Convolution Layer: These 63x63 images from the output of the first pooling layer are
fed as input to the second convolutional layer. They are processed in the second
convolutional layer using 32 filter weights (3x3 pixels each). This results in a 60x60
pixel image.
4. 2nd Pooling Layer: The resulting images are downsampled again using max pooling of
2x2 and are reduced to a 30x30 resolution.
5. 1st Densely Connected Layer: These images are used as input to a fully connected layer
with 128 neurons; the output of the second pooling layer is reshaped to an array of
30x30x32 = 28800 values, so the input to this layer is an array of 28800 values. The output
of this layer is fed to the 2nd Densely Connected Layer. We use a dropout layer with a rate
of 0.5 to avoid overfitting.
6. 2nd Densely Connected Layer: The output from the 1st Densely Connected Layer is used as
input to a fully connected layer with 96 neurons.
7. Final Layer: The output of the 2nd Densely Connected Layer serves as input for the
final layer, which has as many neurons as the number of classes we are classifying
(alphabets + blank symbol).

 Activation Function:
We have used ReLU (Rectified Linear Unit) in each of the layers (convolutional as
well as fully connected neurons).
ReLU calculates max(x, 0) for each input pixel. This adds nonlinearity to the model
and helps it learn more complicated features. It helps in reducing the vanishing
gradient problem and speeds up training by reducing the computation time.

 Pooling Layer:
We apply max pooling to the input image with a pool size of (2, 2) with the ReLU
activation function. This reduces the number of parameters, thus lessening the
computation cost and reducing overfitting.

 Dropout Layers:
Dropout addresses the problem of overfitting, where after training the weights of the
network are so tuned to the training examples that the network does not perform well
when given new examples. This layer "drops out" a random set of activations in that
layer by setting them to zero. The network should still be able to provide the right
classification or output for a specific example even if some of the activations are
dropped out [5].

 Optimizer:
We have used Adam optimizer for updating the model in response to the output of
the loss function.
The Adam optimizer combines the advantages of two extensions of stochastic gradient
descent, namely the adaptive gradient algorithm (AdaGrad) and root mean square
propagation (RMSProp).
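
The following is a minimal Keras sketch of the Layer 1 network described above. The layer
sizes follow the text (32 filters of 3x3, 2x2 max pooling, 128- and 96-neuron dense layers,
a dropout of 0.5 and a softmax output over the alphabet plus blank classes); details such as
padding and the learning rate are left at their defaults and should be treated as
assumptions rather than the exact configuration used.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

num_classes = 27  # 26 letters + blank symbol

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 1)),  # 1st convolution layer
    MaxPooling2D(pool_size=(2, 2)),                                    # 1st pooling layer
    Conv2D(32, (3, 3), activation='relu'),                             # 2nd convolution layer
    MaxPooling2D(pool_size=(2, 2)),                                    # 2nd pooling layer
    Flatten(),                                  # flatten the feature maps (~28800 values)
    Dense(128, activation='relu'),              # 1st densely connected layer
    Dropout(0.5),                               # dropout to avoid overfitting
    Dense(96, activation='relu'),               # 2nd densely connected layer
    Dense(num_classes, activation='softmax')    # final layer: one probability per class
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])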

21
Layer 2:

We use two layers of algorithms to verify and predict symbols that are more similar to
each other, so that we can get as close as possible to detecting the symbol actually shown.
In our testing we found that the following symbols were not detected reliably and were
being confused with other symbols:

1. For D : R and U
2. For U : D and R
3. For I : T, D, K and I
4. For S : M and N
So, to handle the above cases we made three different classifiers for classifying these sets:
1. {D, R, U}
2. {T, K, D, I}
3. {S, M, N}
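
A rough sketch of this two-layer idea is given below. The set-specific classifier objects
referred to in the comments are hypothetical placeholders standing in for models trained
only on the corresponding set of symbols.

def refine_prediction(letter, processed_frame, set_classifiers):
    # Layer 2: re-check an ambiguous Layer 1 prediction with a set-specific classifier.
    # set_classifiers maps a frozenset of confusable letters to a classifier whose
    # predict() was trained only on that set (hypothetical objects supplied by the caller).
    for symbols, clf in set_classifiers.items():
        if letter in symbols:
            return clf.predict(processed_frame)
    return letter

# Example wiring (the three classifier objects are placeholders for models trained
# only on {D, R, U}, {T, K, D, I} and {S, M, N} respectively):
# classifiers = {
#     frozenset('DRU'): dru_classifier,
#     frozenset('TKDI'): tkdi_classifier,
#     frozenset('SMN'): smn_classifier,
# }
# final_letter = refine_prediction(layer1_letter, frame, classifiers)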

5.3 Finger Spelling Sentence Formation Implementation:


1. Whenever the count of a detected letter exceeds a specific value and no other letter is
within a threshold of it, we print the letter and add it to the current string (in our code
we kept the count value as 50 and the difference threshold as 20).
2. Otherwise, we clear the current dictionary, which holds the detection counts of the
present symbol, to avoid the probability of a wrong letter getting predicted.
3. Whenever the count of detected blanks (plain background) exceeds a specific value and
the current buffer is empty, no space is added.
4. Otherwise, it predicts the end of a word by printing a space, and the current word gets
appended to the sentence below.

22
Fig 5.4- Application
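
The counting logic of steps 1 to 4 above can be sketched as follows (a simplified
illustration; the constants mirror the values mentioned above, and the per-frame
predictions are assumed to come from the Layer 1 classifier):

from collections import Counter

MIN_COUNT = 50        # frames a letter must dominate before it is accepted
DIFF_THRESHOLD = 20   # required margin over the second most frequent letter

counts = Counter()    # per-letter detection counts for the symbol currently shown
word = ""
sentence = ""

def update(predicted_letter):
    # Feed one per-frame prediction ('A'-'Z' or 'blank') into the word buffer
    global word, sentence
    counts[predicted_letter] += 1
    ranked = counts.most_common(2)
    best, best_count = ranked[0]
    second_count = ranked[1][1] if len(ranked) > 1 else 0

    if best_count > MIN_COUNT and best_count - second_count > DIFF_THRESHOLD:
        counts.clear()
        if best == 'blank':
            if word:                       # a confirmed blank ends the current word
                sentence += word + " "
                word = ""
        else:
            word += best                   # a confirmed letter joins the current word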

5.4 Auto-correct Feature:

A Python library, Hunspell_suggest, is used to suggest correct alternatives for each
(incorrectly spelled) input word. We display a set of words matching the current word, from
which the user can select a word to append to the current sentence. This helps in reducing
spelling mistakes and assists in predicting complex words.
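
A minimal sketch of this suggestion step, assuming the pyhunspell binding and standard
English dictionary files (the dictionary paths are system-dependent placeholders):

import hunspell

# Dictionary file paths vary by system; these are placeholders
spell = hunspell.HunSpell('/usr/share/hunspell/en_US.dic',
                          '/usr/share/hunspell/en_US.aff')

word = "helo"                      # an example misspelled word from the spelled-out buffer
if not spell.spell(word):
    print(spell.suggest(word))     # candidate corrections, e.g. ['hello', 'help', ...]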

5.5 Training and Testing:

We convert our input images (RGB) into grayscale and apply Gaussian blur to remove
unnecessary noise. We then apply an adaptive threshold to extract the hand from the
background and resize our images to 128 x 128.
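
A minimal OpenCV sketch of this preprocessing chain is shown below (the kernel size and
threshold parameters are illustrative assumptions rather than the exact values used in
training):

import cv2

def preprocess(frame):
    # Grayscale -> Gaussian blur -> adaptive threshold -> resize to 128x128
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 2)
    thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 11, 2)
    return cv2.resize(thresh, (128, 128))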

We feed the input images after pre-processing to our model for training and testing after
applying all the operations mentioned above.

23
The prediction layer estimates how likely the image is to fall under each of the classes.
The output is normalized between 0 and 1 such that the values across all classes sum to 1.
We achieve this using the SoftMax function.

At first the output of the prediction layer will be somewhat far from the actual value. To
make it better we train the network using labelled data. Cross-entropy is a performance
measure used in classification. It is a continuous function which is positive when the
prediction is not the same as the labelled value and is zero exactly when it equals the
labelled value. Therefore, we optimize the cross-entropy by minimizing it as close to zero
as possible. To do this, we adjust the weights of the neural network in its layers.
TensorFlow has an inbuilt function to calculate the cross-entropy.
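
As a small numeric illustration of these two steps (the scores are chosen arbitrarily for
the example):

import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])        # raw scores for three hypothetical classes
probs = softmax(scores)                   # ~[0.66, 0.24, 0.10], sums to 1

label = np.array([1, 0, 0])               # one-hot label: the true class is the first one
cross_entropy = -np.sum(label * np.log(probs))
print(probs, cross_entropy)               # ~0.42 here; approaches 0 as probs[0] approaches 1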

Having computed the cross-entropy, we optimize it using gradient descent; specifically, we
use one of the best-performing gradient descent optimizers, the Adam optimizer.

24
CHAPTER 6

CODING

25
6.Coding

1) DATA COLLECTION:-

import os
import string
import cv2

cap = cv2.VideoCapture(0)
directory = 'Image/'

while True:
    _, frame = cap.read()

    # Count how many samples have already been saved for each letter A-Z
    count = {letter: len(os.listdir(directory + letter.upper()))
             for letter in string.ascii_lowercase}

    # (Optional) the per-letter counts can be overlaid on the frame with cv2.putText

    # Draw the region of interest (ROI) and show the live feed and the cropped ROI
    cv2.rectangle(frame, (0, 40), (300, 400), (255, 255, 255), 2)
    cv2.imshow("data", frame)
    cv2.imshow("ROI", frame[40:400, 0:300])
    frame = frame[40:400, 0:300]

    # Pressing a letter key saves the current ROI into that letter's folder
    interrupt = cv2.waitKey(10)
    for letter in string.ascii_lowercase:
        if interrupt & 0xFF == ord(letter):
            cv2.imwrite(directory + letter.upper() + '/' + str(count[letter]) + '.png', frame)

cap.release()
cv2.destroyAllWindows()

2) FUNCTIONS:-

# Import dependencies
import cv2
import numpy as np
import os
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands

def mediapipe_detection(image, model):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # colour conversion BGR -> RGB
    image.flags.writeable = False                   # image is no longer writeable
    results = model.process(image)                  # make prediction
    image.flags.writeable = True                    # image is now writeable
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)  # colour conversion RGB -> BGR
    return image, results

def draw_styled_landmarks(image, results):
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(
                image,
                hand_landmarks,
                mp_hands.HAND_CONNECTIONS,
                mp_drawing_styles.get_default_hand_landmarks_style(),
                mp_drawing_styles.get_default_hand_connections_style())

def extract_keypoints(results):
    # Flatten the 21 (x, y, z) hand landmarks into a 63-value vector
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            rh = np.array([[res.x, res.y, res.z]
                           for res in hand_landmarks.landmark]).flatten()
            return np.concatenate([rh])
    return np.zeros(21 * 3)   # no hand detected in this frame

# Path for exported data (numpy arrays)
DATA_PATH = os.path.join('MP_Data')

actions = np.array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
                    'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'])

no_sequences = 100

sequence_length = 50

3) DATA:-

from function import *
from time import sleep

# Create a folder for every action / sequence combination
for action in actions:
    for sequence in range(no_sequences):
        try:
            os.makedirs(os.path.join(DATA_PATH, action, str(sequence)))
        except:
            pass

# Set mediapipe model
with mp_hands.Hands(
        model_complexity=0,
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5) as hands:

    # Loop through actions
    for action in actions:
        # Loop through sequences aka videos
        for sequence in range(no_sequences):
            # Loop through video length aka sequence length
            for frame_num in range(sequence_length):

                # Read the saved image for this action/sequence
                # (instead of reading live frames with cap.read())
                frame = cv2.imread('Image/{}/{}.png'.format(action, sequence))

                # Make detections
                image, results = mediapipe_detection(frame, hands)

                # Draw landmarks
                draw_styled_landmarks(image, results)

                # Apply wait logic
                if frame_num == 0:
                    cv2.putText(image, 'STARTING COLLECTION', (120, 200),
                                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 4, cv2.LINE_AA)
                    cv2.putText(image, 'Collecting frames for {} Video Number {}'.format(action, sequence),
                                (15, 12), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1, cv2.LINE_AA)
                    # Show to screen
                    cv2.imshow('OpenCV Feed', image)
                    cv2.waitKey(200)
                else:
                    cv2.putText(image, 'Collecting frames for {} Video Number {}'.format(action, sequence),
                                (15, 12), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1, cv2.LINE_AA)
                    # Show to screen
                    cv2.imshow('OpenCV Feed', image)

                # Export keypoints
                keypoints = extract_keypoints(results)
                npy_path = os.path.join(DATA_PATH, action, str(sequence), str(frame_num))
                np.save(npy_path, keypoints)

                # Break gracefully
                if cv2.waitKey(10) & 0xFF == ord('q'):
                    break

cv2.destroyAllWindows()

31
4) TRAINING:-

from function import *
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.callbacks import TensorBoard

# Map each action label to an integer class index
label_map = {label: num for num, label in enumerate(actions)}

# Load the saved keypoint sequences and build the training arrays
sequences, labels = [], []
for action in actions:
    for sequence in range(no_sequences):
        window = []
        for frame_num in range(sequence_length):
            res = np.load(os.path.join(DATA_PATH, action, str(sequence),
                                       "{}.npy".format(frame_num)))
            window.append(res)
        sequences.append(window)
        labels.append(label_map[action])

X = np.array(sequences)
y = to_categorical(labels).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05)

log_dir = os.path.join('Logs')
tb_callback = TensorBoard(log_dir=log_dir)

# LSTM model over sequences of 50 frames x 63 keypoint values
model = Sequential()
model.add(LSTM(64, return_sequences=True, activation='relu', input_shape=(50, 63)))
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(actions.shape[0], activation='softmax'))

model.compile(optimizer='Adam', loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
model.fit(X_train, y_train, epochs=100, callbacks=[tb_callback])
model.summary()

# Save the architecture and the trained weights
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
model.save('model.h5')

32
5) APPLICATION:-

from function import *
from keras.utils import to_categorical
from keras.models import model_from_json
from keras.layers import LSTM, Dense
from keras.callbacks import TensorBoard

# Load the trained model architecture and weights
json_file = open("model.json", "r")
model_json = json_file.read()
json_file.close()
model = model_from_json(model_json)
model.load_weights("model.h5")

colors = []
for i in range(0, 20):
    colors.append((245, 117, 16))
print(len(colors))

def prob_viz(res, actions, input_frame, colors, threshold):
    # Draw one bar per class whose length is proportional to its predicted probability
    output_frame = input_frame.copy()
    for num, prob in enumerate(res):
        cv2.rectangle(output_frame, (0, 60 + num * 40), (int(prob * 100), 90 + num * 40),
                      colors[num], -1)
        cv2.putText(output_frame, actions[num], (0, 85 + num * 40),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)
    return output_frame

# 1. New detection variables
sequence = []
sentence = []
accuracy = []
predictions = []
threshold = 0.8

cap = cv2.VideoCapture(0)
# cap = cv2.VideoCapture("https://192.168.43.41:8080/video")
# Set mediapipe model
with mp_hands.Hands(
        model_complexity=0,
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5) as hands:
    while cap.isOpened():

        # Read feed
        ret, frame = cap.read()

        # Crop the active region and run hand detection on it
        cropframe = frame[40:400, 0:300]
        frame = cv2.rectangle(frame, (0, 40), (300, 400), 255, 2)
        # frame = cv2.putText(frame, "Active Region", (75, 25),
        #                     cv2.FONT_HERSHEY_COMPLEX_SMALL, 2, 255, 2)
        image, results = mediapipe_detection(cropframe, hands)

        # 2. Prediction logic: keep a rolling window of the last 50 keypoint frames
        keypoints = extract_keypoints(results)
        sequence.append(keypoints)
        sequence = sequence[-50:]

        try:
            if len(sequence) == 50:
                res = model.predict(np.expand_dims(sequence, axis=0))[0]
                print(actions[np.argmax(res)])
                predictions.append(np.argmax(res))

                # 3. Viz logic: accept a letter only if the last 10 predictions agree
                # and the predicted probability exceeds the threshold
                if np.unique(predictions[-10:])[0] == np.argmax(res):
                    if res[np.argmax(res)] > threshold:
                        if len(sentence) > 0:
                            if actions[np.argmax(res)] != sentence[-1]:
                                sentence.append(actions[np.argmax(res)])
                                accuracy.append(str(res[np.argmax(res)] * 100))
                        else:
                            sentence.append(actions[np.argmax(res)])
                            accuracy.append(str(res[np.argmax(res)] * 100))

                if len(sentence) > 1:
                    sentence = sentence[-1:]
                    accuracy = accuracy[-1:]

                # Viz probabilities
                # frame = prob_viz(res, actions, frame, colors, threshold)
        except Exception as e:
            pass

        # Overlay the current output on the frame
        cv2.rectangle(frame, (0, 0), (300, 40), (245, 117, 16), -1)
        cv2.putText(frame, "Output: -" + ' '.join(sentence) + ''.join(accuracy), (3, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)

        # Show to screen
        cv2.imshow('OpenCV Feed', frame)

        # Break gracefully
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()

35
CHAPTER 7

OUTPUT AND RESULT

36
7. OUTPUT AND RESULT

OUTPUT:-

37
38
39
RESULT:-

We have achieved an accuracy of 85% in our model using only Layer 1 of our algorithm,
and using the combination of Layer 1 and Layer 2 we achieve an accuracy of 85%, which is
better than that reported in most of the current research papers on American Sign Language.

Most of the research papers focus on using devices like Kinect for hand detection.

In [7] they build a recognition system for Flemish sign language using convolutional neural
networks and Kinect and achieve an error rate of 2.5%.

In [8] a recognition model is built using hidden Markov model classifier and a vocabulary
of 30 words and they achieve an error rate of 10.90%.

In [9] they achieve an average accuracy of 86% for 41 static gestures in Japanese sign
language.

Using depth map sensors, [10] achieved an accuracy of 90.99% for observed signers and
83.58% and 85.49% for new signers.

They also used a CNN for their recognition system. It should be noted that our model
does not use any background subtraction algorithm, while some of the models discussed
above do.

So, once we try to implement background subtraction in our project, the accuracies may
vary. On the other hand, most of the above projects use Kinect devices, but our main aim
was to create a project which can be used with readily available resources. A sensor like
Kinect is not only less readily available but also expensive for most of the audience to
buy, whereas our model uses the normal webcam of a laptop, which is a great plus point.

40
Below are the confusion matrices for our results.

Fig 7.1-Confusion Matrix I

Fig 7.2-Confusion Matrix II

41
CHAPTER 8

CHALLENGES FACED

42
8. Challenges Faced
There were many challenges faced during the project. The very first issue we faced
concerned the dataset. We wanted to deal with raw images, and square images at that, since
a CNN in Keras is much more convenient to work with when using only square images.

We could not find any existing dataset matching our requirements, and hence we decided to
make our own dataset. The second issue was to select a filter which we could apply to our
images so that the proper features of the images could be obtained, and that image could
then be provided as input to the CNN model.

We tried various filters including binary threshold, Canny edge detection, Gaussian blur,
etc., but finally settled on the Gaussian blur filter.

More issues were faced relating to the accuracy of the model we had trained in the earlier
phases. This problem was eventually improved by increasing the input image size and also by
improving the dataset.

43
CHAPTER 9

CONCLUSION

9. Conclusion

44
In this report, a functional real-time vision-based American Sign Language recognition
system for D&M people has been developed for the ASL alphabets.

We achieved a final accuracy of 85% on our dataset. We improved our prediction after
implementing two layers of algorithms, wherein we verify and predict symbols which are
more similar to each other.

This gives us the ability to detect almost all the symbols, provided that they are shown
properly, there is no noise in the background and the lighting is adequate.

45
CHAPTER 10
FUTURE SCOPE

10. Future Scope

46
We are planning to achieve higher accuracy even in the case of complex backgrounds by
trying out various background subtraction algorithms.

We are also thinking of improving the pre-processing to predict gestures in low-light
conditions with higher accuracy.

This project can be enhanced by being built as a web/mobile application so that users can
conveniently access it. Also, the existing project only works for ASL; it can be extended
to work for other native sign languages given the right amount of data and training. This
project implements a fingerspelling translator; however, sign languages are also used in a
contextual manner where each gesture can represent an object or verb. Identifying this kind
of contextual signing would require a higher degree of processing and natural language
processing (NLP).

References

47
[1] T. Yang and Y. Xu, "Hidden Markov Model for Gesture Recognition", CMU-RI-TR-94-10,
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, May 1994.

[2] Pujan Ziaie, Thomas Müller, Mary Ellen Foster, and Alois Knoll, "A Naïve Bayes Classifier
with Distance Weighting for Hand-Gesture Recognition", Technische Universität München, Dept.
of Informatics VI, Robotics and Embedded Systems, Boltzmannstr. 3, DE-85748 Garching, Germany.

[3] https://docs.opencv.org/2.4/doc/tutorials/imgproc/gausian_median_blur_bilateral_filter/gausian_median_blur_bilateral_filter.html

[4] Mohammed Waleed Kadous, "Machine Recognition of Auslan Signs Using PowerGloves:
Towards Large-Lexicon Recognition of Sign Language".

[5] https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2/

[6] http://www-i6.informatik.rwth-aachen.de/~dreuw/database.php

[7] Pigou L., Dieleman S., Kindermans PJ., Schrauwen B. (2015) Sign Language Recognition
Using Convolutional Neural Networks. In: Agapito L., Bronstein M., Rother C. (eds)
Computer Vision - ECCV 2014 Workshops. ECCV 2014. Lecture Notes in Computer
Science, vol 8925. Springer, Cham

[8] Zaki, M.M., Shaheen, S.I.: Sign language recognition using a combination of new vision-
based features. Pattern Recognition Letters 32(4), 572–577 (2011).

[9] N. Mukai, N. Harada and Y. Chang, "Japanese Fingerspelling Recognition Based on
Classification Tree and Machine Learning," 2017 Nicograph International (NicoInt), Kyoto,
Japan, 2017, pp. 19-24. doi:10.1109/NICOInt.2017.9

[10] Byeongkeun Kang, Subarna Tripathi and Truong Q. Nguyen, "Real-time Sign Language
Fingerspelling Recognition Using Convolutional Neural Networks from Depth Map", 2015 3rd
IAPR Asian Conference on Pattern Recognition (ACPR).

[11] Number System Recognition (https://github.com/chasinginfinity/number-sign-recognition)

[12] https://opencv.org/

[13] https://en.wikipedia.org/wiki/TensorFlow

48
[14] https://en.wikipedia.org/wiki/Convolutional_neural_network

[15] http://hunspell.github.io/

49
