
Bridging Communication Gaps Converting Sign Language to Text

A MINI PROJECT REPORT

18CSE422T - INTRODUCTION TO MACHINE LEARNING

Submitted by

AAYUSH KUMAR SINGH (RA2111003011699)

ARHAM CHOWDARY(RA2111003011655)

Under the guidance of

Dr.T.Veeramakali

Assistant Professor, Department of Computer Science and Engineering

in partial fulfillment for the award of the degree of

BACHELOR OF TECHNOLOGY

in

COMPUTER SCIENCE & ENGINEERING

of

FACULTY OF ENGINEERING AND TECHNOLOGY

S.R.M. Nagar, Kattankulathur, Chengalpattu

SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Under Section 3 of UGC Act, 1956)

BONAFIDE CERTIFICATE

Certified that the mini project report titled “BRIDGING COMMUNICATION GAPS: CONVERTING SIGN LANGUAGE TO TEXT” is the bona
fide work of ARHAM CHOWDARY (RA2111003011655), who carried out the mini project
under my supervision. Certified further, that to the best of my knowledge, the work
reported herein does not form any other project report or dissertation on the basis of which
a degree or award was conferred on an earlier occasion on this or any other candidate.

SIGNATURE
Dr. T. Veeramakali
Associate Professor
Department of Computing Technologies

SIGNATURE
Dr. M. Pushpalatha
Professor and Head of the Department
Department of Computing Technologies

ABSTRACT
This report introduces an innovative AI/ML-driven solution designed to eliminate communication barriers
between sign language users and non-signers. The proposed system offers real-time conversion of sign
language gestures into text, thereby facilitating seamless communication accessibility. By harnessing the
capabilities of computer vision and machine learning, the system interprets gestures captured through video
input and generates corresponding text output. The core components of the system include advanced gesture
recognition algorithms and deep learning models trained on extensive sign language datasets. Through
meticulous testing and refinement, the system aims to achieve exceptional accuracy and responsiveness in
sign language interpretation. By deploying this cutting-edge technology, our project addresses the critical need
for inclusivity and accessibility across diverse domains. This AI initiative significantly contributes to societal
progress by fostering linguistic equality and promoting social integration for individuals reliant on sign
language communication.

Table Of Contents

TITLE PAGE i

BONAFIDE CERTIFICATE ii

ABSTRACT iii

1. INTRODUCTION 1

2. MOTIVATION 3

3. LITERATURE SURVEY 4
3.1 Data Acquisition 4

3.2 Data Pre-Processing 5

3.3 Feature Extraction 5

3.4 Gesture Classification 5


4. KEYWORDS AND DEFINITIONS 7
4.1 Feature Extraction and Representation 7

4.2 Artificial Neural Network (ANN) 7

4.3 Convolutional Neural Network (CNN) 8

4.4 TensorFlow 10

4.5 Keras 11
4.6 OpenCV 11

5. METHODOLOGY 12
5.1 Data Set Generation 12

5.2 Gesture Classification 13


5.3 Finger Spelling Sentence Formation Implementation 16

5.4 Auto-correct Feature 17

5.5 Training and Testing 17

6. CHALLENGES FACED 19

7. RESULTS 20

8. CONCLUSION 22

9. FUTURE SCOPE 23

10. REFERENCES 24

11. APPENDIX 26

1. Introduction:
American Sign Language is a predominant sign language. Since the only disability Deaf
and Dumb (hereby referred to as D&M) people have is communication related, and since
they cannot use spoken languages, the only way for them to communicate is through sign
language. Communication is the process of exchanging thoughts and messages in various
ways such as speech, signals, behaviour and visuals. D&M people use hand gestures to
express their ideas to other people. Gestures are non-verbally exchanged messages, and
these gestures are understood with vision. This non-verbal communication of deaf and
dumb people is called sign language. A sign language is a language which uses gestures
instead of sound to convey meaning, combining hand shapes, orientation and movement
of the hands, arms or body, facial expressions and lip patterns. Contrary to popular belief,
sign language is not international; it varies from region to region.

Sign language is a visual language and consists of 3 major components [6]:

Figure - 1

Minimizing the communication gap between D&M and non-D&M people becomes a
necessity to ensure effective communication for all. Sign language translation is one of
the fastest growing lines of research, and it enables the most natural form of
communication for those with hearing impairments. A hand gesture recognition system
gives deaf people the opportunity to communicate with hearing people without the need
of an interpreter. The system is built for the automated conversion of ASL into text and
speech.

In our project we primarily focus on producing a model which can recognize
fingerspelling-based hand gestures in order to form complete words by combining
each gesture. The gestures we aim to train are given in the image below.

Figure - 2

2. Motivation:
For interaction between normal people and D&M people, a language barrier is
created, as the structure of sign language is different from normal text. So,
they depend on vision-based communication for interaction.

If there is a common interface that converts sign language to text, the
gestures can be easily understood by non-D&M people. So, research has been
carried out on vision-based interface systems where D&M people can enjoy
communication without really knowing each other's language.

The aim is to develop a user-friendly Human Computer Interface (HCI) where
the computer understands human sign language.

There are various sign languages all over the world, namely American Sign
Language (ASL), French Sign Language, British Sign Language (BSL), Indian
Sign Language and Japanese Sign Language, and work has been done on other
languages all around the world.

3. Literature Survey:
In recent years there has been tremendous research on hand gesture recognition.
With the help of the literature survey, we realized that the basic steps in hand gesture
recognition are:

• Data acquisition
• Data pre-processing
• Feature extraction
• Gesture classification

3.1 Data acquisition:


Data about hand gestures can be acquired in the following ways:

1. Use of sensory devices:


This approach uses electromechanical devices to provide exact hand configuration
and position. Different glove-based approaches can be used to extract information,
but they are expensive and not user friendly.

2. Vision based approach:


In vision-based methods, the computer webcam is the input device for observing
the information of hands and/or fingers. The Vision Based methods require only
a camera, thus realizing a natural interaction between humans and computers
without the use of any extra devices, thereby reducing cost. These systems tend
to complement biological vision by describing artificial vision systems that are
implemented in software and/or hardware. The main challenge of vision-based
hand detection ranges from coping with the large variability of the human hand’s
appearance due to a huge number of hand movements, to different skin-color
possibilities as well as to the variations in viewpoints, scales, and speed of the
camera capturing the scene.

3.2 Data Pre-Processing and 3.3 Feature Extraction for the vision-based approach:

● In [1] the approach for hand detection combines threshold-based colour detection
with background subtraction. We can use the AdaBoost face detector to differentiate
between faces and hands, as both have similar skin colour.

● We can also extract the image to be trained by applying a filter called Gaussian
blur (also known as Gaussian smoothing). The filter can be easily applied using
Open Computer Vision (OpenCV) and is described in [3].

● Instrumented gloves, as mentioned in [4], can also be used to extract the data to be
trained. This helps reduce the computation time for pre-processing and gives more
concise and accurate data compared to applying filters on data received from video
extraction.

● We tried doing hand segmentation of an image using colour segmentation
techniques, but skin colour and tone are highly dependent on the lighting conditions,
and the segmentation results we obtained were not good. Moreover, we have a large
number of symbols to be trained for our project, many of which look similar to each
other, such as the gesture for the symbol ‘V’ and the digit ‘2’. Hence, we decided
that, in order to produce better accuracy for our large number of symbols, rather
than segmenting the hand out of a random background, we would keep the
background of the hand a stable single colour, so that we do not need to segment it
on the basis of skin colour. This helps us get better results.

3.4 Gesture Classification:

• In [1], Hidden Markov Models (HMMs) are used for the classification of gestures.
This model deals with dynamic aspects of gestures. Gestures are extracted from a
sequence of video images by tracking the skin-colour blobs corresponding to the hand
in a body-face space centred on the face of the user.

• The goal is to recognize two classes of gestures: deictic and symbolic. The image is
filtered using a fast look-up indexing table. After filtering, skin-colour pixels are
gathered into blobs. Blobs are statistical objects based on the location (x, y) and the
colorimetry (Y, U, V) of the skin-colour pixels, used to determine homogeneous
areas.

• In [2], a Naïve Bayes classifier is used, which is an effective and fast method for static
hand gesture recognition. It is based on classifying the different gestures according to
geometric-based invariants which are obtained from image data after segmentation.

• Thus, unlike many other recognition methods, this method is not dependent on skin
colour. The gestures are extracted from each frame of the video, with a static
background. The first step is to segment and label the objects of interest and to extract
geometric invariants from them. The next step is the classification of gestures using a
K-nearest-neighbour algorithm aided with a distance weighting algorithm (KNNDW) to
provide suitable data for a locally weighted Naïve Bayes classifier.

• According to the paper “Human Hand Gesture Recognition Using a Convolution
Neural Network” by Hsien-I Lin, Ming-Hsiang Hsu, and Wei-Kai Chen (Institute of
Automation Technology, National Taipei University of Technology, Taipei, Taiwan),
they construct a skin model to extract the hand out of an image and then apply a
binary threshold to the whole image. After obtaining the thresholded image, they
calibrate it about the principal axis in order to centre the image about the axis. This
image is fed to a convolutional neural network model to train and predict the outputs.
They trained their model on 7 hand gestures and, using this model, achieved an
accuracy of around 95% for those 7 gestures.

4. Keywords and Definitions:


4.1 Feature Extraction and Representation:

An image is represented as a 3D matrix whose dimensions are the height and width of
the image, with the number of colour channels as depth (1 in the case of grayscale and
3 in the case of RGB). These pixel values are then used for extracting useful features
using a CNN.

4.2 Artificial Neural Network (ANN):


An Artificial Neural Network is a network of connected neurons, replicating the structure
of the human brain. Each connection transfers information to another neuron. Inputs are
fed into the first layer of neurons, which processes them and passes them on to further
layers of neurons called hidden layers. After the information has been processed through
multiple hidden layers, it is passed to the final output layer.

Figure - 3

These are capable of learning and have to be trained. There are different learning
strategies:

1. Unsupervised Learning

2. Supervised Learning

3. Reinforcement Learning

4.3 Convolutional Neural Network (CNN):

Unlike regular Neural Networks, in the layers of CNN, the neurons are arranged in 3
dimensions: width, height, depth. The neurons in a layer will only be connected to a
small region of the layer (window size) before it, instead of all of the neurons in a
fully-connected manner. Moreover, the final output layer would have dimensions
(number of classes), because by the end of the CNN architecture we will reduce the
full image into a single vector of class scores.

Figure – 4

1. Convolution Layer:

In the convolution layer we take a small window [typically 5x5] that extends
through the depth of the input matrix. The layer consists of learnable filters of
this window size. During every iteration we slide the window by the stride size
[typically 1] and compute the dot product of the filter entries and the input
values at the given position.

As we continue this process we create a 2-dimensional activation map that gives
the response of that filter at every spatial position. That is, the network learns
filters that activate when they see some type of visual feature such as an edge of
some orientation or a blotch of some colour.

2. Pooling Layer:

We use a pooling layer to decrease the size of the activation matrix and ultimately
reduce the number of learnable parameters. There are two types of pooling:

a. Max Pooling: In max pooling we take a window [for example of size 2x2] and
keep only the maximum of the 4 values. We slide this window across the matrix
and continue this process, so we finally get an activation matrix half of its
original size.
b. Average Pooling: In average pooling, we take the average of all the values
in a window.

Figure - 5
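As a small illustration of 2x2 max pooling (a sketch using NumPy; the input values are arbitrary):

import numpy as np

a = np.array([[1, 3, 2, 0],
              [4, 6, 1, 2],
              [7, 2, 9, 5],
              [3, 1, 4, 8]])

# Take the maximum of each non-overlapping 2x2 window: a 4x4 map becomes 2x2.
pooled = a.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 2]
               #  [7 9]]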

3. Fully Connected Layer:

In a convolution layer, neurons are connected only to a local region, while in a fully
connected layer, all the inputs are connected to every neuron.

Figure - 6
4. Final Output Layer:
After getting values from the fully connected layer, we connect them to the final
layer of neurons [whose count equals the total number of classes], which predicts
the probability of the image belonging to each class.

4.4 TensorFlow:
TensorFlow is an end-to-end open-source platform for Machine Learning. It has a
comprehensive, flexible ecosystem of tools, libraries and community resources that
lets researchers push the state-of-the-art in Machine Learning and developers easily
build and deploy Machine Learning powered applications.

TensorFlow offers multiple levels of abstraction so you can choose the right one for
your needs. Build and train models by using the high-level Keras API, which makes
getting started with TensorFlow and machine learning easy.

If you need more flexibility, eager execution allows for immediate iteration and
intuitive debugging. For large ML training tasks, use the Distribution Strategy API for
distributed training on different hardware configurations without changing the model
definition.

4.5 Keras:
Keras is a high-level neural networks library, written in Python, that works as a wrapper
over TensorFlow. It is used in cases where we want to quickly build and test a neural
network with minimal lines of code. It contains implementations of commonly used
neural network elements like layers, objectives, activation functions and optimizers, as
well as tools to make working with image and text data easier.

4.6 OpenCV:
OpenCV (Open-Source Computer Vision) is an open-source library of programming
functions used for real-time computer vision.
It is mainly used for image processing, video capture and analysis, for features like
face and object recognition. It is written in C++, which is its primary interface;
however, bindings are available for Python, Java and MATLAB/Octave.

5. Methodology:
The system is a vision-based approach. All signs are represented with bare hands and
so it eliminates the problem of using any artificial devices for interaction.

5.1 Data Set Generation:
For the project we tried to find ready-made datasets, but we could not find a dataset
of raw images that matched our requirements. All we could find were datasets in the
form of RGB values. Hence, we decided to create our own data set. The steps we
followed to create our data set are as follows.

We used the Open Computer Vision (OpenCV) library to produce our dataset.

We captured around 800 images of each symbol in ASL (American Sign
Language) for training purposes and around 200 images per symbol for testing
purposes.

First, we capture each frame shown by the webcam of our machine. In each frame
we define a Region of Interest (ROI), which is denoted by a blue bounded square
as shown in the image below:

Figure - 7

Then we apply a Gaussian blur filter to our image, which helps us extract various
features of the image. The image, after applying the Gaussian blur, looks as follows:

Figure – 8
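A minimal OpenCV sketch of this capture step (an illustration under assumed ROI coordinates and blur parameters, not the exact values used in the project):

import cv2

cap = cv2.VideoCapture(0)  # default webcam

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Region of Interest: an assumed 300x300 square in the upper part of the frame.
    x1, y1, x2, y2 = 320, 10, 620, 310
    cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)  # blue bounding square
    roi = frame[y1:y2, x1:x2]

    # Grayscale + Gaussian blur before thresholding / feature extraction.
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 2)  # kernel size and sigma are illustrative

    cv2.imshow("frame", frame)
    cv2.imshow("blurred ROI", blurred)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to stop capturing
        break

cap.release()
cv2.destroyAllWindows()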

5.2 Gesture Classification:


Our approach uses two layers of algorithms to predict the final symbol shown by the user.

Figure - 9

Algorithm Layer 1:

1. Apply the Gaussian blur filter and threshold to the frame taken with OpenCV to get
the processed image after feature extraction.
2. This processed image is passed to the CNN model for prediction, and if a letter
is detected for more than 50 frames then the letter is printed and taken into
consideration for forming the word.
3. Space between words is considered using the blank symbol.

Algorithm Layer 2:

1. We detect various sets of symbols which show similar results on getting detected.
2. We then classify between those sets using classifiers made for those sets only.

Layer 1:

⚫ CNN Model:

1. 1st Convolution Layer: The input picture has a resolution of 128x128 pixels. It is
first processed in the first convolutional layer using 32 filter weights (3x3 pixels
each). This results in a 126x126 pixel image, one for each filter.
2. 1st Pooling Layer: The pictures are downsampled using max pooling of 2x2, i.e.
we keep the highest value in each 2x2 square of the array. Therefore, our picture is
downsampled to 63x63 pixels.
3. 2nd Convolution Layer: The 63x63 output of the first pooling layer serves as input
to the second convolutional layer. It is processed using 32 filter weights (3x3 pixels
each). This results in a 60x60 pixel image.
4. 2nd Pooling Layer: The resulting images are downsampled again using max pooling
of 2x2 and are reduced to 30x30 resolution.
5. 1st Densely Connected Layer: These images are used as input to a fully connected
layer with 128 neurons; the output of the second pooling layer is reshaped into an
array of 30x30x32 = 28800 values, which forms the input to this layer. The output
of this layer is fed to the 2nd Densely Connected Layer. We use a dropout layer with
rate 0.5 to avoid overfitting.
6. 2nd Densely Connected Layer: The output of the 1st Densely Connected Layer is
used as input to a fully connected layer with 96 neurons.
7. Final Layer: The output of the 2nd Densely Connected Layer serves as input to the
final layer, which has as many neurons as the number of classes we are classifying
(alphabets + blank symbol).
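A Keras sketch approximating the layer stack described above (a minimal reconstruction, assuming 26 alphabets plus a blank symbol; exact intermediate sizes depend on padding choices and may differ slightly from the figures quoted):

from tensorflow.keras import layers, models

num_classes = 27  # assumed: 26 alphabets + 1 blank symbol

model = models.Sequential([
    # 1st convolution + pooling block (input: 128x128 grayscale image).
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 1)),
    layers.MaxPooling2D((2, 2)),
    # 2nd convolution + pooling block.
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # Flatten feature maps and pass through the densely connected layers.
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                               # dropout to reduce overfitting
    layers.Dense(96, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),   # final class probabilities
])
model.summary()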

• Activation Function:
We have used ReLU (Rectified Linear Unit) in each of the layers
(convolutional as well as fully connected).
ReLU computes max(x, 0) for each input value. This adds non-linearity and
helps the network learn more complicated features. It helps avoid the
vanishing gradient problem and speeds up training by reducing computation
time.

• Pooling Layer:
We apply max pooling to the input image with a pool size of (2, 2), together
with the ReLU activation function. This reduces the number of parameters,
thus lessening the computation cost and reducing overfitting.

• Dropout Layers:
Overfitting occurs when, after training, the weights of the network are so
tuned to the training examples that the network does not perform well when
given new examples. A dropout layer “drops out” a random set of activations
in that layer by setting them to zero. The network should still be able to
provide the right classification or output for a specific example even if some
of the activations are dropped out [5].

• Optimizer:
We have used the Adam optimizer for updating the model in response to the
output of the loss function.
The Adam optimizer combines the advantages of two extensions of stochastic
gradient descent, namely the adaptive gradient algorithm (AdaGrad) and root
mean square propagation (RMSProp).

Layer 2:

We use a second layer of classifiers to verify and predict symbols which are more
similar to each other, so that we can get as close as possible to correctly detecting the
symbol shown. In our testing we found that the following symbols were not being
detected properly and were also giving other symbols:

1. For D : R and U
2. For U : D and R
3. For I : T, D and K
4. For S : M and N
So, to handle the above cases, we made three different classifiers for classifying these sets:
1. {D, R, U}
2. {T, K, D, I}
3. {S, M, N}
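A minimal sketch of how this second layer could route an ambiguous layer-1 prediction to the matching sub-classifier (illustrative structure only; the helper functions, model objects and label lists are assumptions, not taken from the project code):

import numpy as np

def predict_symbol(model, image, labels):
    """Run a Keras model on a preprocessed 128x128 grayscale image and
    return the predicted symbol (hypothetical helper)."""
    probs = model.predict(image[None, ..., None], verbose=0)[0]
    return labels[int(np.argmax(probs))]

def classify(image, layer1_model, layer1_labels, sub_classifiers):
    """Two-layer prediction: the general CNN runs first; if the result falls
    in one of the confusable sets {D, R, U}, {T, K, D, I} or {S, M, N}, the
    dedicated classifier for that set makes the final call.

    sub_classifiers: list of (set_of_symbols, model, labels) triples.
    """
    symbol = predict_symbol(layer1_model, image, layer1_labels)
    for symbols, sub_model, sub_labels in sub_classifiers:
        if symbol in symbols:
            return predict_symbol(sub_model, image, sub_labels)
    return symbol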

5.3 Finger Spelling Sentence Formation Implementation:


1. Whenever the count of a detected letter exceeds a specific value, and no other
letter is within a threshold of that count, we print the letter and add it to the current
string (in our code we kept the value as 50 and the difference threshold as 20).
2. Otherwise, we clear the current dictionary, which holds the counts of detections of
the present symbol, to reduce the chance of a wrong letter getting predicted.
3. Whenever the count of the detected blank (plain background) exceeds a specific
value and the current buffer is empty, no space is added.
4. Otherwise, it predicts the end of the word by printing a space, and the current word
gets appended to the sentence below.
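A minimal sketch of this buffering logic (the thresholds of 50 and 20 follow the text above; the variable names and structure are assumptions, not the project code):

DETECTION_COUNT = 50   # frames a letter must dominate before it is printed
DIFF_THRESHOLD = 20    # minimum lead over the next most frequent letter

counts = {}            # per-symbol detection counts for the current gesture
current_word = ""
sentence = ""

def handle_prediction(symbol):
    """Update the word/sentence buffers with one per-frame prediction."""
    global current_word, sentence, counts
    counts[symbol] = counts.get(symbol, 0) + 1

    best, best_count = max(counts.items(), key=lambda kv: kv[1])
    runner_up = max((c for s, c in counts.items() if s != best), default=0)

    if best_count > DETECTION_COUNT and best_count - runner_up > DIFF_THRESHOLD:
        counts = {}                       # reset counts for the next gesture
        if best == "blank":
            if current_word:              # end of word: append it to the sentence
                sentence += current_word + " "
                current_word = ""
        else:
            current_word += best          # confident letter: add it to the current word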

Figure - 10

5.4 AutoCorrect Feature:

A Python library, Hunspell_suggest, is used to suggest correct alternatives for each
(incorrect) input word. We display a set of words matching the current word, from which
the user can select a word to append to the current sentence. This helps in reducing
spelling mistakes and assists in predicting complex words.
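A hedged sketch of such a suggestion step (assuming the pyhunspell binding and standard en_US dictionary paths, which may differ on other systems):

import hunspell  # pyhunspell binding; dictionary paths below are assumptions

checker = hunspell.HunSpell("/usr/share/hunspell/en_US.dic",
                            "/usr/share/hunspell/en_US.aff")

word = "helo"  # example of a (possibly misspelled) finger-spelled word
if not checker.spell(word):
    suggestions = checker.suggest(word)   # e.g. ["hello", "help", ...]
    print("Did you mean:", suggestions[:5])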

5.5 Training and Testing:

We convert our input images (RGB) into grayscale and apply Gaussian blur to remove
unnecessary noise. We apply an adaptive threshold to extract the hand from the
background and resize our images to 128 x 128.
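A minimal OpenCV sketch of this pre-processing step (the blur and threshold parameters here are illustrative assumptions, not the exact values used in the project):

import cv2

def preprocess(bgr_frame):
    """Pre-process one captured frame as described above."""
    gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)           # RGB/BGR -> grayscale
    blurred = cv2.GaussianBlur(gray, (5, 5), 2)                   # remove noise
    thresh = cv2.adaptiveThreshold(blurred, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 11, 2)  # separate hand from background
    return cv2.resize(thresh, (128, 128))                         # network input size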

We feed the input images after pre-processing to our model for training and testing after
applying all the operations mentioned above.

The prediction layer estimates how likely the image is to fall under each of the classes.
The output is normalized between 0 and 1 such that the values across the classes sum
to 1. We achieve this using the softmax function.

At first the output of the prediction layer will be somewhat far from the actual value.
To make it better, we train the network using labelled data. Cross-entropy is a
performance measure used in classification. It is a continuous function which is positive
whenever the prediction differs from the labelled value and is zero exactly when it equals
the labelled value. Therefore, we optimize the cross-entropy by minimizing it as close to
zero as possible. To do this, we adjust the weights of the network. TensorFlow has an
inbuilt function to calculate the cross-entropy.

Having defined the cross-entropy loss, we optimize it using gradient descent, specifically
the Adam optimizer, one of the best gradient descent optimizers.
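A hedged Keras sketch of this training setup (categorical cross-entropy on the softmax outputs, optimized with Adam; the `model`, the training/testing arrays and the hyperparameters are assumptions):

# `model` is the CNN defined earlier; x_train/y_train and x_test/y_test are
# assumed to be the preprocessed 128x128 images and their one-hot labels.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # cross-entropy on softmax outputs
              metrics=["accuracy"])

model.fit(x_train, y_train,
          validation_data=(x_test, y_test),
          epochs=10, batch_size=32)             # illustrative hyperparameters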

6. Challenges Faced:
There were many challenges faced during the project. The very first issue we faced
concerned the data set. We wanted to deal with raw images, and square images in
particular, since it is much more convenient to work with square images for a CNN
in Keras.

We could not find any existing data set matching our requirements, so we decided to
make our own. The second issue was selecting a filter to apply to our images so that
proper features of the images could be obtained, and that image could then be provided
as input to the CNN model.

We tried various filters including binary threshold, Canny edge detection and Gaussian
blur, but finally settled on the Gaussian blur filter.

More issues were faced relating to the accuracy of the model we had trained in the
earlier phases. This problem was eventually improved by increasing the input image
size and also by improving the data set.

7. Results:
We have achieved an accuracy of 95.8% in our model using only layer 1 of our
algorithm, and using the combination of layer 1 and layer 2 we achieve an accuracy
of 98.0%, which is better than most of the current research papers on American
Sign Language.

Most of the research papers focus on using devices like Kinect for hand detection.

In [7] they build a recognition system for Flemish sign language using convolutional
neural networks and Kinect and achieve an error rate of 2.5%.

In [8] a recognition model is built using hidden Markov model classifier and a
vocabulary of 30 words and they achieve an error rate of 10.90%.

In [9] they achieve an average accuracy of 86% for 41 static gestures in Japanese sign
language.

Using depth sensor maps, [10] achieved an accuracy of 99.99% for observed signers
and 83.58% and 85.49% for new signers.

They also used a CNN for their recognition system. It should be noted that our model
does not use any background subtraction algorithm, while some of the models
mentioned above do. So, once we implement background subtraction in our project,
the accuracies may vary.

On the other hand, most of the above projects use Kinect devices, whereas our main
aim was to create a project that can be used with readily available resources. A sensor
like Kinect is not only not readily available but is also expensive for most of the
audience to buy, whereas our model uses the normal webcam of a laptop, which is a
great plus point.

Below are the confusion matrices for our results.

Figure - 11

Figure – 12

8. Conclusion:
In this report, a functional real-time vision-based American Sign Language recognition
system for D&M people has been developed for the ASL alphabet.

We achieved a final accuracy of 98.0% on our data set. We improved our
prediction by implementing two layers of algorithms in which we verify and
predict symbols that are more similar to each other.

This gives us the ability to detect almost all the symbols provided that they are shown
properly, there is no noise in the background and lighting is adequate.

9. Future Scope:
We are planning to achieve higher accuracy even in case of complex backgrounds by
trying out various background subtraction algorithms.

We are also thinking of improving the pre-processing to predict gestures in low-light
conditions with higher accuracy.

This project can be enhanced by being built as a web or mobile application so that users
can conveniently access it. Also, the existing project only works for ASL; it can be
extended to work for other native sign languages with enough data and training. This
project implements a fingerspelling translator; however, sign languages are also used
on a contextual basis, where each gesture could represent an object or a verb.
Identifying this kind of contextual signing would require a higher degree of processing
and natural language processing (NLP).

10. References:
[1] T. Yang and Y. Xu, “Hidden Markov Model for Gesture Recognition”, CMU-RI-TR-94-10,
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, May 1994.

[2] Pujan Ziaie, Thomas Müller, Mary Ellen Foster, and Alois Knoll, “A Naïve Bayes
Classifier with Distance Weighting for Hand-Gesture Recognition”, Technische Universität
München, Dept. of Informatics VI, Robotics and Embedded Systems, Boltzmannstr. 3,
DE-85748 Garching, Germany.

[3] https://docs.opencv.org/2.4/doc/tutorials/imgproc/gausian_median_blur_bilateral_filter/gausian_median_blur_bilateral_filter.html

[4] Mohammed Waleed Kadous, “Machine Recognition of Auslan Signs Using PowerGloves:
Towards Large-Lexicon Recognition of Sign Language”.

[5] https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2/

[6] http://www-i6.informatik.rwth-aachen.de/~dreuw/database.php

[7] Pigou L., Dieleman S., Kindermans PJ., Schrauwen B. (2015) Sign Language
Recognition Using Convolutional Neural Networks. In: Agapito L., Bronstein M.,
Rother C. (eds) Computer Vision - ECCV 2014 Workshops. ECCV 2014. Lecture
Notes in Computer Science, vol 8925. Springer, Cham

[8] Zaki, M.M., Shaheen, S.I.: Sign language recognition using a combination of new
vision-based features. Pattern Recognition Letters 32(4), 572–577 (2011).

[9] N. Mukai, N. Harada and Y. Chang, “Japanese Fingerspelling Recognition Based on
Classification Tree and Machine Learning,” 2017 Nicograph International (NicoInt),
Kyoto, Japan, 2017, pp. 19-24. doi: 10.1109/NICOInt.2017.9

[10] Byeongkeun Kang, Subarna Tripathi, and Truong Q. Nguyen, “Real-time sign language
fingerspelling recognition using convolutional neural networks from depth map”, 2015
3rd IAPR Asian Conference on Pattern Recognition (ACPR).

[11] Number System Recognition (https://github.com/chasinginfinity/number-sign-recognition)

[12] https://opencv.org/

[13] https://en.wikipedia.org/wiki/TensorFlow

[14] https://en.wikipedia.org/wiki/Convolutional_neural_network

[15] http://hunspell.github.io/

11. Appendix:

1. OpenCV:

OpenCV (Open-Source Computer Vision Library) is released under a BSD license and
hence it is free for both academic and commercial use.

It has C++, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS
and Android. OpenCV was designed for computational efficiency and with a strong
focus on real-time applications.

Written in optimized C/C++, the library can take advantage of multi-core processing.
Enabled with OpenCL, it can take advantage of the hardware acceleration of the
underlying heterogeneous compute platform.

Adopted all around the world, OpenCV has a user community of more than 47 thousand
people and an estimated number of downloads exceeding 14 million. Usage ranges from
interactive art and mine inspection to stitching maps on the web and advanced robotics.

2. Convolutional Neural Network:

CNNs use a variation of multilayer perceptrons designed to require minimal pre-processing.
They are also known as shift-invariant or space-invariant artificial neural networks
(SIANN), based on their shared-weights architecture and translation-invariance
characteristics.

Convolutional networks were inspired by biological processes in that the connectivity


pattern between neurons resembles the organization of the animal visual cortex.
Individual cortical neurons respond to stimuli only in a restricted region of the visual
field known as the receptive field. The receptive fields of different neurons partially
overlap such that they cover the entire visual field.

CNNs use relatively little pre-processing compared to other image classification
algorithms. This means that the network learns the filters that in traditional algorithms
were hand-engineered. This independence from prior knowledge and human effort in
feature design is a major advantage.

They have applications in image and video recognition, recommender systems, image
classification, medical image analysis, and natural language processing.

3. TensorFlow:

TensorFlow is an open-source software library for dataflow programming across a


range of tasks. It is a symbolic math library, and is also used for machine learning
applications such as neural networks.

It is used for both research and production at Google.

TensorFlow was developed by the Google Brain team for internal Google use. It was
released under the Apache 2.0 open-source license on November 9, 2015.

TensorFlow is Google Brain's second-generation system. Version 1.0.0 was released


on February 11, 2017. While the reference implementation runs on single devices,
TensorFlow can run on multiple CPUs and GPUs (with optional CUDA and SYCL
extensions for general-purpose computing on graphics processing units).

TensorFlow is available on 64-bit Linux, macOS, Windows, and mobile computing


platforms including Android and iOS.

Its flexible architecture allows for the easy deployment of computation across a variety
of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile
and edge devices.
