
Alphabet Recognition of Sign Language using Machine Learning

Avinash Kumar Sharma, Computer Science and Engineering, ABES Institute of Technology, Ghaziabad, India ([email protected])
Abhyudaya Mittal, Computer Science and Engineering, ABES Institute of Technology, Ghaziabad, India ([email protected])
Aashna Kapoor, Computer Science and Engineering, ABES Institute of Technology, Ghaziabad, India ([email protected])
Aditi Tiwari, Computer Science and Engineering, ABES Institute of Technology, Ghaziabad, India ([email protected])

Abstract - One of the major issues that our society is dealing with is the difficulty that people with disabilities have in sharing their feelings with normal people. People with disabilities can communicate through sign (gesture) languages. This project aims to design a model which can recognize sign language alphabets (hand gestures) and convert them into text and sound using a machine learning approach. The main goal of this project is to break down barriers to communication between people with disabilities and the rest of society. The performance of this method is evaluated on a publicly available ISL dataset. In our project, which is based on Convolutional Neural Networks, we used the Inception V3 deep learning model for image classification. Hand gestures are captured as images by webcams in this project, and the defined model aids in the recognition of the corresponding alphabet for the hand gestures. We have tried to overcome the existing limitations in Sign Language Recognition and to increase the efficiency.

Keywords — Sign Language, Hand Gesture, Gesture Recognition, Human Computer Interaction, Sign Language Recognition.

I. INTRODUCTION

Communication is an indispensable tool for human existence and a basic and effective way to share thoughts, feelings and opinions, but a significant portion of the world's population lacks this ability. Hearing loss, speech disability, or both affect a large number of people. Hearing loss is defined as a partial or total inability to hear in one or both ears. Mutism, on the other hand, is a disability that prevents people from speaking and makes them unable to communicate. If a child becomes deaf-mute during childhood, their capacity to learn languages is hampered, resulting in language impairment, also known as hearing mutism. We discovered that those who are unable to communicate verbally and have hearing impairments have difficulty in ordinary communication, and that this hearing or speech disability results in a shortage of equal opportunity for them [1] [2].

People who are deaf or deaf-blind use sign language as a means of communication. A sign language is made up of a variety of gestures composed of diverse hand shapes, motions, and orientations, as well as facial expressions. Hearing loss affects roughly 466 million people globally, with 34 million of them being children. Individuals who are labelled as "deaf" have very limited or no hearing capabilities.

Only a small percentage of the population is aware of sign language. It is also not an international language, contrary to the common perception. Obviously, this makes communication between the Deaf population and the hearing majority even more difficult. Because the Deaf community is generally less adept in writing a spoken language, the option of written communication is inconvenient [3]. Hearing or speech problems affect about 0.05 percent of the world's population, according to the United Nations Statistics Division. The disability was present in 63 percent of these patients at birth, whereas the rest acquired it as a result of an accident. According to JICA disability statistics, hearing impairments account for 8.36 percent of all disabilities in India, while speech difficulties account for 5.06 percent. For a deaf population of around 7 million individuals in India, there are only about 250 competent sign language interpreters [4]. A Department of Persons with Disabilities Empowerment is part of the Ministry of Social Justice and Empowerment, and it deals with policies for people with disabilities. The ISLRTC of the ministry is in charge of deaf and dumb schools [5].

It has ties to a number of Indian schools. Aside from that, many groups, such as the Indian Deaf and Dumb Society, try to give various forms of assistance on their own. Schools for the deaf and dumb rely primarily on verbal communication due to a dearth of skilled teachers. This is the state of large centers, and when it comes to rural areas, there are no institutions or assistance for the deaf and dumb. As a result, residents in these locations experience severe psychological distress and feel utterly cut off from the rest of the world. Even when they reach adulthood, they remain reliant on their relatives, or they struggle to make ends meet because they are unable to find suitable work.

The biggest problem is that fit people are either unwilling to learn sign languages or find them difficult to remember. Researchers have tried a variety of ways for recognizing diverse hand gestures in order to allow normal people to comprehend sign languages and to eliminate barriers in our society for individuals with disabilities. With the advancement of modern technology, we can find a variety of ways to integrate these people into society. The availability of sensors, cameras, and AI technologies such as deep learning, CNN, ANN, and speech-to-voice as well as speech-to-text programs has opened the way for the development of useful gadgets. We can undoubtedly make significant progress in engaging with these people with the help of these new technologies.

A. Sign Language and Gestures

To visually transmit sign patterns that convey meaning in sign language, a sequence of facial expressions, orientations, hand shapes, and hand and body movements is used. Hand gestures are very crucial for deaf and mute people who use Sign Language to communicate with the outside world. Sign language has been shown to be useful in communicating a wide range of needs, from basic necessities to complex concepts. There are three types of sign languages, which are as follows:

i) Fingerspelling: one letter at a time
ii) Sign vocabulary at the word level: commonly used communication words
iii) Non-manual characteristics: whole body movements, including facial expressions and body position

There are a variety of sign languages used in different countries. The most popular and frequently used among them is American Sign Language.

B. American Sign Language (ASL)

American Sign Language (ASL) is a complete, natural language with linguistic features similar to spoken languages and a grammar distinct from English. Hand and face movements are used to express ASL, as shown in Figure 1. It is the predominant language of many deaf and hard-of-hearing North Americans, as well as some hearing persons. It contains its own rules for pronunciation, word creation, and word order, as well as all of the other basic characteristics of a language.

Although the precise roots of ASL are uncertain, some suggest that it developed more than 200 years ago from the blending of local sign languages and French Sign Language [6].

ASL, like all languages, is a living language; it evolves with time. Many high schools, colleges, and universities in the United States accept it as a modern and "foreign" language requirement for academic degrees.

Figure 1: American Sign Language

C. Indian Sign Language (ISL)

Indian Sign Language (ISL) is India's most commonly used sign language; it is referred to as the mother tongue in some metropolitan regions due to its widespread use. ISL is a collection of authentic sign languages that have grown over time and are widely used, as shown in Figure 2 [7].

India's sign language is very scientific, with its own grammar.

Figure 2: Indian Sign Language

ISL is divided into two categories, manual and non-manual, as Figure 2 depicts:

i) Manual: it is possible to accomplish it with one or both hands.
ii) Non-manual: facial expressions can be used.

II. SYSTEM OVERVIEW

The goal is to create a system that can recognise and classify sign language motions from datasets that have been recorded, as shown in Figure 3. The suggested framework is based on the Inception v3 model, a widely used image recognition model that has been reported to attain an accuracy of 98.99 percent for American Sign Language.


Figure 3: System Overview

III. LITERARY SURVEY

An extensive examination of the literature related to the proposed framework reveals numerous efforts and research endeavors aimed at addressing sign recognition in videos and images through a variety of methods and algorithms. The ability to effectively communicate thoughts and ideas poses a major challenge for individuals with hearing and speaking disabilities. This challenge is compounded by the fact that most people are not inclined to learn sign languages, which creates a pressing need for the development of systems that can bridge this communication gap. Consequently, numerous methods and devices have been proposed and developed to facilitate communication with deaf or hard-of-hearing individuals.

In this paper, we review several of these innovative solutions. Each approach focuses on translating sign language gestures into readable text and audible voice outputs. These solutions range from advanced algorithms capable of interpreting complex gestures to devices designed to capture and translate sign language in real time. By examining the strengths and limitations of these various methods, our review aims to highlight the progress made in this field and the potential for future advancements in creating more inclusive communication systems for those with hearing and speaking disabilities.

The proposed solutions for gesture recognition are divided into three stages: input, processing, and output. These stages can be implemented using two different methodologies, as highlighted in various research papers. The first methodology employs image processing and machine learning techniques, while the second relies on sensors and microcontrollers. Each approach has its own set of advantages and disadvantages, but both are highly effective in facilitating the communication process.

Based on the reviewed papers, we can conclude that the image-based method is both inexpensive and portable, making it accessible for use at any time. However, it faces challenges with image processing under varying light conditions and backgrounds. While this method is less efficient than the glove-based system, it is more cost-effective.

The sensory-based approach, on the other hand, offers the significant advantage of directly acquiring data such as the degree of bend, wrist orientation, and hand motion through glove sensors, which translate into voltage values for the computing device. This eliminates the need for processing raw data into meaningful values and makes environmental factors like lighting and background irrelevant. However, the major drawbacks of this approach are its high cost and the necessity for the user to wear the glove continuously.

Both methodologies yield accurate results and, despite having areas for improvement, they significantly contribute to the goal of facilitating communication for individuals with hearing and speech disabilities. The approaches proposed in this project have the potential to deliver positive outcomes, aiming to address the challenges associated with different gesture recognition techniques. Generally, the results achieved are highly accurate, boasting over 98 percent accuracy in most cases.

IV. RELATED WORK

A. Deep Learning Model

In recent years, deep learning has achieved remarkable success in the field of computer vision, demonstrating several key advantages, such as rich feature extraction, powerful modeling capacity, and intuitive training. Deep learning, a subset of machine learning, involves artificial neural networks that are inspired by the structure and function of the human brain. This technology is crucial for self-driving cars, enabling them to recognize stop signs and distinguish between pedestrians and lampposts. Additionally, deep learning powers voice control in consumer electronics, including phones, tablets, televisions, and hands-free speakers. Its recent surge in popularity is well-deserved, as it accomplishes feats that were previously considered impossible.

In deep learning, a computer model learns to perform categorization tasks directly from images, text, or sound. One area where deep learning is expected to have a significant impact in the coming years is natural language understanding. We anticipate that systems utilizing Recurrent Neural Networks (RNNs) will greatly enhance their ability to comprehend words or entire texts by developing strategies for selectively focusing on specific parts. Deep learning models can achieve state-of-the-art accuracy, sometimes even surpassing human performance. These models are trained using extensive labeled datasets and various neural network architectures.

B. Convolutional Neural Network (CNN)

Most state-of-the-art computer vision solutions for a wide range of problems use convolutional networks [22]. The model we have used is based on the Convolutional Neural Network (CNN), one of the most often used deep neural networks. Convolutional Neural Networks had their origin in the neocognitron, which had a similar architecture but no end-to-end supervised-learning mechanism like backpropagation. For the recognition of phonemes and simple phrases, a primitive 1D CNN dubbed a time-delay neural net was utilized [29][30]. Convolution is a mathematical linear operation between matrices that gives the network its name. The convolutional layer, non-linearity layer, pooling layer, and fully-connected layer are some of the layers of a CNN. The pooling and non-linearity layers do not have parameters, but the convolutional and fully-connected layers have. In machine learning problems, the CNN performs admirably [13]. As a result, CNNs can have a wide range of designs, which are mostly determined by the task. The task could include picture classification, multiple-class segmentation, or the localization of particular objects within a scene [26]. CNNs are typically employed to solve tough image-driven pattern recognition tasks, and their exact yet simple architecture makes getting started with ANNs a lot easier [14].
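To make this layer vocabulary concrete, the toy classifier below wires up exactly these four layer types. It is an illustrative Keras sketch only, not the network used in this project; the input size, filter count and class count are arbitrary choices.

import tensorflow as tf

# Convolutional and fully-connected (Dense) layers carry trainable parameters;
# the ReLU non-linearity and the pooling layer do not.
toy_cnn = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(16, kernel_size=3),        # convolutional layer (has weights)
    tf.keras.layers.Activation("relu"),               # non-linearity layer (no weights)
    tf.keras.layers.MaxPooling2D(pool_size=2),        # pooling layer (no weights)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(26, activation="softmax"),  # fully-connected layer (has weights)
])
toy_cnn.summary()  # the parameter column is 0 for the activation and pooling layers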
C. Transfer Learning

Transfer learning has surged in popularity because it significantly reduces training time and requires far less data to enhance performance. It aims to improve the performance of target learners in target domains by transferring information from different but related source domains. Transfer learning is particularly advantageous in many knowledge engineering scenarios, such as web document classification.

Recently, transfer learning techniques have been successfully applied in various real-world applications. For instance, Raina et al. and Dai et al. proposed using transfer learning to learn text data across domains. In our project, we used the Inception v3 model, which incorporates transfer learning. This model was pre-trained on millions of images using extremely high computational power, which would be challenging to achieve from scratch.

Our approach involved leveraging this pre-trained model and further training it with specific datasets relevant to our requirements and objectives. The results were outstanding.
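In practice, "leveraging this pre-trained model" means reusing the ImageNet-trained Inception v3 convolutional base and attaching a new classifier head for the sign-language classes. The paper does not state which framework was used, so the following TensorFlow/Keras code is only one possible sketch; the class count of 26 and the pooling/dropout head are assumptions.

import tensorflow as tf

NUM_CLASSES = 26  # one class per alphabet (assumed)

# ImageNet pre-trained Inception v3 without its original 1000-class head.
base = tf.keras.applications.InceptionV3(weights="imagenet",
                                         include_top=False,
                                         input_shape=(299, 299, 3))
base.trainable = False  # transfer the learned features; train only the new head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])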

D. Inception Model

There are two types of Inception models:

- Inception V1: When numerous deep convolutional layers were used in a model, overfitting often occurred. To prevent this, the Inception V1 model uses multiple filters of varying sizes at the same level. This approach results in parallel layers rather than deep layers, making the model broader instead of deeper.

- Inception V3: The Inception V3 model surpasses GoogLeNet (Inception V1) in object recognition performance. It comprises three components: a basic convolutional block, an upgraded Inception module, and a classifier. Inception V3 is a convolutional neural network with 48 layers. A pre-trained version of this network, trained on more than a million images from the ImageNet database, is accessible. The main objective of factorizing convolutions in Inception V3 is to minimize the number of connections and parameters while preserving network efficiency.

Features of Inception V3:
- It is more efficient.
- It has a broader network than the Inception V1 and V2 models, but its speed remains unaffected.
- It is less computationally expensive.
- It uses auxiliary classifiers as regularizers.

We chose the Inception V3 model for our project because it is an improved version of the Inception V1 model. To enhance model adaptation, the Inception V3 model employs several techniques to optimize the network.

V. PROPOSED SYSTEM

The proposed system's design consists of the six phases listed below; a flow diagram depicting the required steps is shown in Figure 4.

- Database Collection
- Training of Model
- Pre-Processing and Hand Segmentation
- Feature Extraction
- Classification
- Text to Speech Conversion

Figure 4: System Flowchart

In the following sub-sections, each of these steps is detailed.

A. Database Collection

Database collection is an important part of every project, as the database is what we use to train our model, and to ensure the accuracy of the model it is very important to collect the database from a reliable source. In this project, as we have worked on Indian Sign Language and American Sign Language, we have collected the dataset from Kaggle, the largest data science community on the planet, with a wealth of tools and services to assist in achieving data science objectives [8]. We have used two types of data sets, one for the Indian Sign Language and the other for the American Sign Language. For each alphabet we have used around 1200 images to train our model in order to achieve better accuracy and efficiency. So, if we consider that there are 26 alphabets in total and each alphabet has 1200 images, we can see that 26 x 1200 is a massive dataset that can aid us in training our model in a more efficient manner, allowing us to attain higher accuracy; and because the signer's hands are in different positions and orientations for the same sign, the system is more flexible.

B. Training of Model

The model we have used in order to achieve our objective is the Inception V3 model. Inception V3 is a deep learning model for image classification that uses Convolutional Neural Networks. It is a more advanced version of the fundamental model Inception V1, which was first released in 2014 as GoogLeNet. It was created by a Google team, as the name implies [11].

The algorithm for training the model:
Step 1. Load the dataset through the URL in the working folder.
Step 2. Read the dataset images into a variable.
Step 3. Set the training data size to 80 percent and the testing data size to 20 percent.
Step 4. Divide the images randomly to create training and testing samples.
Step 5. Perform image transformation by cropping images, random rotation and normalization.
Step 6. Check the valid images after the transformation.
Step 7. Iterate through the training images and check the various label classes.
Step 8. Set the class names as labels to be predicted as output.
Step 9. Set the number of epochs (training cycles) to 25.
Step 10. Perform the training of the model in each cycle to increase the accuracy at each step.
Step 11. Calculate the accuracy and training loss by testing the model.
Step 12. Save the generated model in local storage.
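One possible realisation of these steps, again assuming TensorFlow/Keras, a hypothetical dataset/ directory laid out as one sub-folder per alphabet, and 299 x 299 inputs, is sketched below; apart from the 80/20 split and the 25 epochs, the hyperparameters are placeholders.

import tensorflow as tf

DATA_DIR = "dataset/"   # assumed layout: dataset/<alphabet>/<image>.jpg
IMG_SIZE = (299, 299)   # Inception v3 input resolution

# Steps 1-4: read the images and split them randomly, 80 percent training / 20 percent testing.
train_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, validation_split=0.2, subset="training", seed=42,
    image_size=IMG_SIZE, batch_size=32)
test_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, validation_split=0.2, subset="validation", seed=42,
    image_size=IMG_SIZE, batch_size=32)
class_names = train_ds.class_names  # Steps 7-8: the alphabet labels to be predicted

# Step 5: image transformation (random rotation) and normalisation to the [0, 1] range.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.05),
    tf.keras.layers.Rescaling(1.0 / 255),
])
train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))
test_ds = test_ds.map(lambda x, y: (x / 255.0, y))

# The transfer-learning model from the sketch under C. Transfer Learning
# (frozen Inception v3 base plus a new classification head).
base = tf.keras.applications.InceptionV3(weights="imagenet", include_top=False,
                                         input_shape=IMG_SIZE + (3,))
base.trainable = False
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(len(class_names), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Steps 9-11: train for 25 epochs and measure accuracy and loss on the held-out split.
model.fit(train_ds, validation_data=test_ds, epochs=25)
model.evaluate(test_ds)

# Step 12: save the generated model to local storage.
model.save("sign_alphabet_inceptionv3.h5")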

C. Pre-processing and Hand Segmentation

This is the first step and covers the image sensing process. To remove noise, each picture frame is pre-processed. OpenCV is used to start the camera module in this project. When the camera opens to capture the image of the hand gesture, a rectangular box is shown which helps to detect the hand gesture without any background noise.

- OpenCV
Gary Bradski founded OpenCV at Intel in 1999 with the goal of speeding up research and commercial applications of computer vision around the world, while also driving a demand for ever more powerful computers for Intel [20]. OpenCV is a large open-source computer vision, machine learning, and image processing library. Python can process the OpenCV array structure for analysis when it is combined with various libraries such as NumPy. We use vector space and perform mathematical operations on these features to identify image patterns and their various features [16].

- Segmentation
The process of converting an image into small segments in order to extract more accurate image attributes is known as segmentation. If the segments are properly autonomous (two segments of an image should not have any identical information), then the image's representation and description will be accurate, whereas the result of rugged segmentation will be inaccurate [19].

Before extracting the features from the input image, a series of operations are performed on it to ensure that high-quality features are extracted. Threshold-based segmentation is used for hand segmentation [18].
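The capture-and-segmentation step described above could be sketched with OpenCV roughly as follows; the ROI coordinates, the Otsu thresholding choice and the window names are illustrative assumptions rather than the project's exact settings.

import cv2

cap = cv2.VideoCapture(0)                       # start the camera module
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Rectangular box inside which the hand gesture is expected (assumed coordinates).
    x1, y1, x2, y2 = 100, 100, 350, 350
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    roi = frame[y1:y2, x1:x2]

    # Pre-processing and threshold-based hand segmentation.
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)     # remove noise
    _, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    cv2.imshow("camera", frame)
    cv2.imshow("hand mask", mask)
    if cv2.waitKey(1) & 0xFF == ord("q"):        # press 'q' to stop capturing
        break
cap.release()
cv2.destroyAllWindows()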
D. Feature Extraction

In image classification, feature extraction is a crucial step. It provides the most accurate representation of the image content [21]. Feature extraction divides and organises a large collection of raw data as part of the dimensionality reduction process. Classes are thereby reduced to smaller, easier-to-manage groups, so processing becomes more straightforward.

Figure 5: Processing of Images

Various image pre-processing techniques, including binarization, thresholding, scaling, and normalisation, are applied to the sampled image before features are extracted, as shown in Figure 5. Following that, feature extraction techniques are used to extract features that can be used to classify
and recognise images [25]. The large number of
variables is the most crucial feature of these massive
data sets. Processing these variables necessitates a
significant amount of computing power. As a result,
by selecting and combining variables into features, feature extraction aids in the extraction of the best feature from large data sets. These features are straightforward to use while minimising the amount of data and still accurately and uniquely describing the original data collection.
In this project feature extraction enables the model
to detect the hand gestures without any background
noise. Form, contour, geometrical feature (position,
angle, distance, etc.), colour feature, histogram, and
other predefined features are extracted from pre-
processed images and used later for sign
classification or recognition [17].
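As a hedged illustration of such predefined features (not necessarily the exact feature set used in this project), contour-based shape descriptors and a colour histogram can be computed with OpenCV from a segmented hand image; the input file name and the histogram bin counts are placeholders.

import cv2
import numpy as np

img = cv2.imread("hand_sample.jpg")              # placeholder pre-processed image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Form / contour features: area, perimeter and Hu moments of the largest contour.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
assert contours, "no hand region found in the mask"
hand = max(contours, key=cv2.contourArea)
area = cv2.contourArea(hand)
perimeter = cv2.arcLength(hand, True)
hu_moments = cv2.HuMoments(cv2.moments(hand)).flatten()

# Colour / histogram feature computed over the segmented region only.
hist = cv2.calcHist([img], [0, 1, 2], mask, [8, 8, 8],
                    [0, 256, 0, 256, 0, 256]).flatten()

feature_vector = np.concatenate([[area, perimeter], hu_moments, hist])
print(feature_vector.shape)   # one fixed-length descriptor per image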

E. Classification
Many picture categorization models have been
developed to aid in the resolution of the most
pressing issue of identification accuracy. Image
categorization is a key subject in the field of
computer vision, having a wide range of practical
applications [28]. We have used a transfer learning mechanism to train our model. The Inception V3 model is used in this project; it is an image classifier that works on a CNN (Convolutional Neural Network) and is pre-trained on a very large dataset. So, by transfer learning we mean that we have trained the existing Inception V3 model on our target dataset of sign languages. We have then used this alphabet recognition model to predict the various labels of sign languages. The predict function takes a user image as input and maps it to the correct label according to the trained model. Finally, the correct label is returned as output, as shown in Figure 6.

Figure 6: Output Labels (sample predictions for the alphabets R, O, C, W and B)
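A minimal prediction routine matching this description, assuming the model saved by the training sketch above and an A-Z label ordering, might look like this; the file names are placeholders.

import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("sign_alphabet_inceptionv3.h5")
class_names = [chr(c) for c in range(ord("A"), ord("Z") + 1)]  # assumed A-Z ordering

def predict_label(image_path):
    # Load the user image, resize it to the network input size and normalise it.
    img = tf.keras.utils.load_img(image_path, target_size=(299, 299))
    x = tf.keras.utils.img_to_array(img) / 255.0
    x = np.expand_dims(x, axis=0)               # add the batch dimension
    probs = model.predict(x)[0]
    return class_names[int(np.argmax(probs))]   # map the image to its most likely label

print(predict_label("user_gesture.jpg"))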

F. Text to Speech Conversion

Speech is one of the most ancient and natural ways for humans to share information, and it has remained so throughout the years [23]. The process of turning words into a vocal
audio form is known as text-to-speech (TTS). The
programme, tool, or software takes a user's input text
and, using natural language processing methods,
deduces the linguistics of the language and does
logical inference on it. This processed text is then
sent to the next block, which performs digital signal
processing on it. This processed text is then
translated into a voice format using a variety of
techniques and transformations. Speech is
synthesised throughout the entire procedure.
In this project, for converting the text into speech, we
have used gTTS module. Google Text-to-Speech
(gTTS) is a Python library and command-line utility
for interacting with the Google Translate text-to-
speech API [10]. The gTTS library, which can be
used for voice translation, will be imported from the
gTTS module [9].
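For reference, the gTTS call needed for this conversion is only a few lines; the text, language code and output file name below are placeholder values.

from gtts import gTTS

predicted_text = "B"                         # e.g. the label returned by the classifier
tts = gTTS(text=predicted_text, lang="en")   # hand the text to the Google Translate TTS API
tts.save("predicted_letter.mp3")             # write the synthesised speech to an audio file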

The text-to-speech (TTS) synthesis process is divided into two stages. The first is text analysis, in
which the input text is converted into a phonetic or
other linguistic representation, and the second is
speech waveform generation, in which the output is
generated using this phonetic and prosodic
information. The terms "high-level synthesis" and
"low-level synthesis" are commonly used to
describe these two phases [24].

Other languages, such as French, German, and Hindi, can also benefit from the gTTS module.
When there is a communication barrier and the user
is unable to communicate his messages to others,
this is highly useful. Text-to-speech is a wonderful
benefit to those who are visually impaired or have
other disabilities since it may assist them with text-
to-speech translation. The gTTS module can also be used for other languages, which opens up a lot of possibilities.

i) Features of gTTS
A customizable speech-specific sentence tokenizer that can read any length of text while maintaining proper intonation, abbreviations, decimals, and other features, and text pre-processors that can be customised to provide features such as pronunciation, using the gTTS library as shown in Fig 7.

Figure 7: Text to Speech part

VI. RESULTS

The Inception v3 model gave excellent results in classifying the sign language gestures, and the accuracy we have achieved for the American Sign Language is 98.99%, with a training loss of 1.46%, as shown in Figure 8.

Figure 8: Accuracy and Test Loss Graph

Also, the drawback reported in earlier work, that including letters such as {C, L, M, N, R, U, Y} did not give good accuracy, is solved in our project. We have achieved the above-mentioned accuracy including these 7 letters {C, L, M, N, R, U, Y}, that is, a total of 26 letters. Including the letters C, L, M, N, R, U, and Y, we tested our method on all 26 alphabets and achieved 99.99% accuracy with American Sign Language.

VII. CONCLUSION & FUTURE SCOPE

In this project, we have observed that using transfer learning is a very efficient approach. We have used the pre-trained Inception V3 model, which is based on CNN and deep neural network algorithms, and trained it on a sign language dataset with 3000 images per alphabet. This large dataset helped us to achieve greater accuracy for sign language recognition.

The problem encountered in previous work, namely lower accuracy on some selected single-hand alphabets, is also solved, as we have achieved similar accuracy for every alphabet, as shown in Fig 9.

Figure 9: Select Alphabet Accuracy comparison

The future scope of this project is to achieve the same accuracy while recognizing words and sentences, along with the development of a mobile-based application to be installed on portable devices like smart watches or mobile phones, so that people can freely use it in their daily lives.

VIII. REFERENCES
[1] Liang R-H, Ouhyoung M (1998) A real-time continuous gesture recognition system for sign language. In: IEEE International Conference on Automatic Face and Gesture Recognition, 1998. Proceedings. Third. IEEE, pp 558–567.
[2] Liang R-H (1997) Continuous gesture recognition system for Taiwanese sign language. National Taiwan University.
[3] Pigou L., Dieleman S., Kindermans P. J., Schrauwen B. (2015) Sign Language Recognition Using Convolutional Neural Networks. In: Agapito L., Bronstein M., Rother C. (eds) Computer Vision - ECCV 2014 Workshops.
[4] Starner T, Weaver J, Pentland A (1998) Real-time American Sign Language recognition using desk and wearable computer based video. IEEE Trans Pattern Anal Mach Intell 20(12):1371–1375.
[5] Vogler C, Metaxas D (1997) Adapting hidden Markov models for ASL recognition by using three-dimensional computer vision methods. In: IEEE International Conference on Systems, Man and Cybernetics, vol 1. IEEE, pp 156–161.
[6] Huang XD, Ariki Y, Jack MA (1990) Hidden Markov models for speech recognition.
[7] Lichtenauer JF, Hendriks EA, Reinders MJT (2008) Sign language recognition by combining statistical DTW and independent classification. IEEE Transactions on Pattern Analysis & Machine Intelligence 30(11):2040–2046.
[8] https://www.kaggle.com/docs/datasets
[9] https://pypi.org/project/gTTS/
[10] https://readthedocs.org/projects/gtts/downloads/pdf/latest/
[11] https://iq.opengenus.org/inception-v3-model-architecture/
[12] https://machinelearningmastery.com/what-is-deep-learning/
[13] S. Albawi, T. A. Mohammed and S. Al-Zawi, "Understanding of a convolutional neural network," 2017 International Conference on Engineering and Technology (ICET), 2017, pp. 1-6, doi: 10.1109/ICEngTechnol.2017.8308186.
[14] O'Shea, Keiron & Nash, Ryan (2015) An Introduction to Convolutional Neural Networks. ArXiv e-prints.
[15] https://medium.com/analytics-vidhya/transfer-learning-using-inception-v3-for-image-classification-86700411251b
[16] https://www.geeksforgeeks.org/opencv-overview/
[17] https://www.itmconferences.org/articles/itmconf/pdf/2021/05/itmconf_icacc2021_03004.pdf
[18] Kumar A., Kumar R. (2020) A novel approach for ISL alphabet recognition using Extreme Learning Machine. Bharati Vidyapeeth's Institute of Computer Applications and Management.
[19] https://www.ripublication.com/ijaer18/ijaerv13n9_90.pdf
[20] Emami, Shervin & Suciu, Valentin (2012). Facial Recognition using OpenCV. Journal of Mobile, Embedded and Distributed Systems.
[21] Medjahed, Seyyid Ahmed (2015) A Comparative Study of Feature Extraction Methods in Images Classification. International Journal of Image, Graphics and Signal Processing.
[22] Szegedy, Christian & Vanhoucke, Vincent & Ioffe, Sergey & Shlens, Jon & Wojna, Z. B. (2016). Rethinking the Inception Architecture for Computer Vision. doi: 10.1109/CVPR.2016.308.
[23] Nwakanma, Ifeanyi & Oluigbo, Ikenna & Izunna, Okpala (2014). Text-To-Speech Synthesis (TTS). 2. 154-163.
[24] Lemmetty, S. (1999). Review of Speech Synthesis Technology. Masters Dissertation, Helsinki University of Technology.
[25] Kumar, Gaurav & Bhatia, Pradeep (2014). A Detailed Review of Feature Extraction in Image Processing Systems. doi: 10.1109/ACCT.2014.74.
[26] Teja Kattenborn, Jens Leitloff, Felix Schiefer, Stefan Hinz. Review on Convolutional Neural Networks (CNN) in vegetation remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, Volume 173, 2021, Pages 24-49, ISSN 0924-2716, https://doi.org/10.1016/j.isprsjprs.2020.12.010.
[27] F. Zhuang et al., "A Comprehensive Survey on Transfer Learning," Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555.
[28] Hussain, Mahbub & Bird, Jordan & Faria, Diego (2018). A Study on CNN Transfer Learning for Image Classification.
[29] Waibel, A., Hanazawa, T., Hinton, G. E., Shikano, K. & Lang, K. Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics Speech Signal Process. 37, 328–339 (1989).
[30] Bottou, L., Fogelman-Soulié, F., Blanchet, P. & Lienard, J. Experiments with time delay networks and dynamic time warping for speaker independent isolated digit recognition. In Proc. EuroSpeech 89, 537–540 (1989).
[31] Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. In Proc. International Conference on Learning Representations, http://arxiv.org/abs/1409.0473 (2015).
[32] Xu, K. et al. Show, attend and tell: Neural image caption generation with visual attention. In Proc. International Conference on Learning Representations, http://arxiv.org/abs/1502.03044 (2015).
[33] G. P. C. Fung, J. X. Yu, H. Lu, and P. S. Yu, "Text classification without negative examples revisit," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, pp. 6–20, 2006.
[34] H. Al-Mubaid and S. A. Umair, "A new text categorization technique using distributional clustering and learning logic," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 9, pp. 1156–1165, 2006.
[35] R. Raina, A. Y. Ng, and D. Koller, "Constructing informative priors using transfer learning," in Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, Pennsylvania, USA, June 2006, pp. 713–720.
[36] W. Dai, G. Xue, Q. Yang, and Y. Yu, "Co-clustering based classification for out-of-domain documents," in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 2007.
[37] W. Dai, G. Xue, Q. Yang, and Y. Yu, "Transferring naive Bayes classifiers for text classification," in Proceedings of the 22nd AAAI Conference on Artificial Intelligence, Vancouver, British Columbia, Canada, July 2007, pp. 540–545.
[38] C. Lin, L. Li, W. Luo, K. C. P. Wang, and J. Guo, "Transfer Learning Based Traffic Sign Recognition Using Inception-v3 Model," Period. Polytech. Transp. Eng., vol. 47, no. 3, pp. 242–250, 2019.
[39] S. He, "Research of a Sign Language Translation System Based on Deep Learning," 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), 2019, pp. 392-396, doi: 10.1109/AIAM48774.2019.00083.
[40] Li Y., Zhang T., "Deep neural mapping support vector machines," Neural Networks, Vol. 93, pp. 185-194, 2017.
