Major Report
CHAPTER I
INTRODUCTION
1.1. General
Like spoken languages, sign language has a long and complex history. Hand gestures were
used as a means of communication in Greece as early as the fifth century B.C. However, the
first recorded accounts of sign language as a mode of communication in Western countries
date back to the 17th century.
Monastic sign languages were used by a number of religious orders in Europe from the tenth
century onward. These were elaborate gestural communication systems rather than true sign
languages. Long before 1492, deaf individuals in Native American communities made
extensive use of Plains Indian Sign Language for commerce, ceremonies, storytelling, and
everyday communication.
Our system must perform three critical tasks in real time:
1. Capturing video of the user as they sign
2. Assigning an ISL sign to each frame of the video
3. Combining the per-frame classification scores into the most probable word and displaying
it as the output.
The main challenges in developing a computer-vision-based solution to this problem are:
• Environmental factors (such as illumination, background, and camera placement)
• Occlusion, i.e. complete or partial obstruction of the hands or fingers
• Detection of sign boundaries (determining where a sign begins and ends)
• Co-articulation, i.e. the influence of preceding and succeeding signs on the current sign.
Although previous research has used neural networks to recognise ISL signs with accuracies
above 90 percent, most of these approaches require additional hardware such as
motion-tracking gloves or 3D cameras. These requirements significantly limit the feasibility
and scalability of such systems.
Our system's pipeline processes a video of a user signing a word, captured through a web
application. Individual frames are extracted from the video, and a convolutional neural
network (CNN) computes sign probabilities for each frame over the full Indian Sign Language
(ISL) sign repertoire. A set of heuristics then groups the frames according to the sign each
frame is hypothesised to belong to. Finally, a language model is used to present the user with
the most likely word, enabling near-instantaneous, uninterrupted communication for the deaf
community in India. A minimal sketch of this pipeline is given below.
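For illustration only, a skeleton of such a pipeline might look as follows; extract_frames,
classify_frame, and the language_model object are hypothetical placeholders for the
project's own components, not the exact implementation used in this work.

# Hypothetical end-to-end sketch of the recognition pipeline described above.
import cv2
import numpy as np

def extract_frames(video_path, step=5):
    # read every `step`-th frame from the input video
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames

def classify_frame(model, frame, size=(224, 224)):
    # return per-class sign probabilities for one frame
    x = cv2.resize(frame, size).astype(np.float32)[None, ...]
    return model.predict(x)[0]

def recognise_word(model, video_path, labels, language_model):
    probs = [classify_frame(model, f) for f in extract_frames(video_path)]
    letters = [labels[int(np.argmax(p))] for p in probs]   # per-frame best sign
    return language_model.most_likely_word(letters)        # grouping heuristics + LM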
CHAPTER II
LITERATURE REVIEW
The following research papers were studied to understand how real-time sign language
recognition systems work and how the existing technologies could be improved.
1. In ASL recognition systems, the three predominant classifier types are neural networks,
Bayesian networks, and linear classifiers.
- Linear classifiers are straightforward to configure, but they work best when provided with
carefully engineered features.
- Singha and Das achieved an accuracy of 96% on one-handed gestures using the
Karhunen-Loeve Transform.
2. Real-Time ASL Recognition Using Neural Networks, by Sigberto Alarcon Viesca, Barbara
Garcia, and Theodore "Brandon" Garcia.
- The Karhunen-Loeve Transform translates and rotates the axes to establish a new coordinate
system according to the variance in the data.
- Linear classifiers have been applied to the identification of simple hand gestures, such as
raising the forefinger or pointing at an object.
- Sharma et al. achieved an accuracy of 62.3% using piece-wise classifiers (SVM and k-NN)
after eliminating noise and background.
3. Bayesian networks such as Hidden Markov Models (HMMs) require precisely defined
models but are effective at capturing temporal patterns.
- Using a Hidden Markov Model together with a three-dimensional glove, Starner and
Pentland achieved a success rate of 99.2%.
4. Dynamic Bayesian Networks: using a DBN model, Suk et al. recognised hand gestures in
live video streams.
- Although not restricted to American Sign Language, their hand-gesture classification
achieved a 99% success rate.
5. Neural networks for ASL: neural networks learn the categorization features relevant to
ASL translation.
- Mekala et al. translated American Sign Language footage into text with a three-layer neural
network, considering the position and movement of the hands.
- A more compact representation of the hand position can be obtained through Fourier
transforms.
a. Transfer Learning
Building a classifier requires a model that can assign input data to distinct classes or
categories according to specified traits or features. The goal is to accurately predict the
category of previously unseen data using patterns learnt from a training dataset. A wide
variety of algorithms can be used for this purpose, including Bayesian networks, neural
networks, support vector machines, and decision trees.
Equation (1) gives the full softmax loss as the mean of the per-example loss over the training
examples x_i:

L = (1/N) * sum_i [ -log( e^{f_{y_i}} / sum_j e^{f_j} ) ]     (1)

where f_j is the score the model assigns to class j and y_i is the correct class of example x_i.
Transfer learning is a machine learning method in which a model trained on one task is
reused on a related task, so that the knowledge gained from the first task improves learning
and performance on the second. This strategy is particularly helpful when only a limited
amount of data is available for the new task, or when training a model from scratch would
demand substantial time or resources.
Transferring the knowledge of a pre-trained model to a new task can speed up training,
improve generalisation, and raise the overall performance of the classifier on the new task.
An illustrative sketch of this idea is given below.
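As a minimal sketch of this idea in Keras (the framework used in the Appendix), assuming a
pre-trained VGG16 backbone as a stand-in for GoogleNet, which Keras does not bundle, and
24 static ISL classes:

# Transfer-learning sketch (illustrative only): freeze a pre-trained ImageNet
# backbone and train a new classification head for 24 ISL signs.
from keras.applications import VGG16
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False          # keep the pre-trained features fixed

x = GlobalAveragePooling2D()(base.output)
x = Dense(128, activation="relu")(x)
out = Dense(24, activation="softmax")(x)  # one unit per static ISL sign

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])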
Caffe is a deep learning framework developed by the Berkeley Vision and Learning Centre
(BVLC). It is used extensively for a wide range of machine learning tasks, particularly in
computer vision. Caffe is well known for its speed and efficiency in training deep neural
networks and for its expressive architecture, which makes it simple to experiment with a
variety of network configurations.
GoogleNet, also known as Inception-v1, is a deep convolutional neural network architecture
developed by Google. It is designed to be computationally efficient while still achieving high
accuracy on image classification tasks. GoogleNet introduced the notion of inception
modules: building blocks that process several filter sizes in parallel within the same layer,
which results in improved performance.
Using Caffe with GoogleNet allows researchers and practitioners to combine the strengths of
the framework and the model architecture: GoogleNet's design is optimised for image
recognition, while Caffe provides efficient training and deployment. Caffe's adaptability and
effectiveness make it a popular choice for implementing and training complex networks such
as GoogleNet, and together the two make it possible to build state-of-the-art models that
achieve high accuracy on difficult image classification, object recognition, and other
computer vision tasks.
Augmenting the training data with different orientations is a technique used in machine
learning to improve the robustness and generalisation of models, particularly for image
recognition and object detection. By supplementing the training data with variations in
orientation, such as rotations, flips, and translations, the model learns features that are
invariant to these transformations, becomes more resistant to differences in how objects are
depicted in images, is less prone to overfitting, and performs better on data it has not
encountered before.
d. Fine-tuning the data
Fine-tuning the data by adding orientations is especially effective when the training data is
limited or lacks diversity in object orientations. Including diverse orientations during training
allows the model to recognise objects from a variety of angles, which leads to improved
performance on real-world data.
Data augmentation techniques such as adding orientations can be applied with tools and
frameworks like TensorFlow, PyTorch, and Keras, which provide functions for rotating,
flipping, and translating images, as shown in the sketch below.
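For illustration, a minimal Keras sketch of this kind of orientation-based augmentation (the
parameter values are arbitrary examples, not the settings used in this project):

# Orientation-based augmentation sketch: random rotations, shifts, and
# horizontal flips applied on the fly while training.
from keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,        # rotate up to +/- 15 degrees
    width_shift_range=0.1,    # translate horizontally by up to 10%
    height_shift_range=0.1,   # translate vertically by up to 10%
    horizontal_flip=True,     # signs may be made with either hand
    rescale=1.0 / 255,
)

# Example usage with an image folder laid out as one sub-folder per class:
# train_flow = augmenter.flow_from_directory("gestures/", target_size=(50, 50),
#                                            color_mode="grayscale", batch_size=32)
# model.fit_generator(train_flow, epochs=20)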
The Indian Sign Language (ISL) finger-spelling dataset, compiled by the Centre for Vision,
Speech, and Signal Processing at the University of Surrey, includes colour images (A) and
depth images (B). Because the system is intended to be used through a web application and a
laptop camera, we use only the colour images. These images are close-ups of hands that
occupy most of the frame. The dataset consists of twenty-four static ISL signs, collected from
five users in separate sessions under similarly controlled lighting and backgrounds, and
contains over 65,000 colour images in total. The height-to-width ratios of the images vary,
but their average size is roughly 150x150 pixels.
The heights and widths of the images in both collections are not uniform. To conform to the
input expected by GoogleNet, we therefore resize them to 256x256 pixels and take random
crops of 224x224 pixels. We also zero-centre the data by subtracting the mean image from
ILSVRC 2012. Normalising the image tensors is unnecessary because their values are already
limited to the range 0-255. In addition, we flip the images horizontally, since signs can be
produced with either the left or the right hand and our datasets contain examples of both.
Because the differences between any two classes in our datasets are small compared with the
differences between ILSVRC classes, we padded the images with black pixels so that they
keep their aspect ratio when resized. This padding also means that fewer relevant pixels are
discarded when random crops are taken. A sketch of this preprocessing is given below.
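A minimal sketch of the preprocessing just described, using OpenCV and NumPy;
mean_image stands in for the ILSVRC 2012 mean image, which would be loaded from the
pre-trained model's files:

# Preprocessing sketch: pad to a square, resize to 256x256, random-crop 224x224,
# subtract the dataset mean, and optionally mirror the image.
import cv2
import numpy as np
import random

def preprocess(img, mean_image, crop=224, size=256):
    h, w = img.shape[:2]
    # pad with black pixels so the aspect ratio survives the resize
    side = max(h, w)
    top, left = (side - h) // 2, (side - w) // 2
    img = cv2.copyMakeBorder(img, top, side - h - top, left, side - w - left,
                             cv2.BORDER_CONSTANT, value=(0, 0, 0))
    img = cv2.resize(img, (size, size)).astype(np.float32)
    # take a random 224x224 crop
    x = random.randint(0, size - crop)
    y = random.randint(0, size - crop)
    img = img[y:y + crop, x:x + crop]
    # zero-centre with the mean image and randomly flip horizontally
    img -= cv2.resize(mean_image, (crop, crop)).astype(np.float32)
    if random.random() < 0.5:
        img = cv2.flip(img, 1)
    return img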
3.1.3 Experiments, Results and Analysis
We use two measures to compare our findings with those of other studies. The first, and the
one used most frequently in the literature, is top-1 accuracy on the validation set, i.e. the
percentage of examples that are classified correctly. The second is top-5 accuracy, which
measures the percentage of classifications for which the correct label appears among the five
highest-scoring classes.
In addition, we use a confusion matrix, a table layout that gives a visual depiction of the
classification model's performance for each class. Analysing the letters that were incorrectly
classified provides valuable insights for improving future performance. A small sketch of
how these metrics can be computed is given below.
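For illustration, top-1 accuracy, top-5 accuracy, and a confusion matrix can be computed from
per-class scores as follows (a sketch assuming scores is an N x C array of class scores and
y_true holds the integer labels):

# Metrics sketch: top-1 accuracy, top-5 accuracy, and a confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

def top_k_accuracy(scores, y_true, k=5):
    # fraction of examples whose true label is among the k highest-scoring classes
    top_k = np.argsort(scores, axis=1)[:, -k:]
    return float(np.mean([y in row for y, row in zip(y_true, top_k)]))

def evaluate(scores, y_true):
    y_pred = np.argmax(scores, axis=1)
    top1 = float(np.mean(y_pred == y_true))
    top5 = top_k_accuracy(scores, y_true, k=5)
    cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted class
    return top1, top5, cm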
For each of the experiments listed below, our model was trained on letters a-y (excluding j).
Preliminary testing showed that a base learning rate of 1e-6 fitted the training data well:
accuracy improved consistently and the optimisation appeared to converge. Once the loss
stopped improving, we halted training manually and then lowered the learning rate to further
optimise the loss, decreasing it by factors ranging from 2 to 100.
We also used the training routine that gave the most favourable results in tests with actual
users on our web application ('2_init'). In addition, we trained models to classify letters a-k
(excluding j) and a-e, to assess whether reducing the number of classes improved the results.
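In Keras terms (the experiments themselves were run in Caffe), the same "lower the learning
rate once the loss stops improving" behaviour can be expressed as a callback, for example:

# Learning-rate decay sketch: reduce the rate when the loss plateaus.
from keras.callbacks import ReduceLROnPlateau

lr_decay = ReduceLROnPlateau(monitor="loss", factor=0.5, patience=3,
                             min_lr=1e-8, verbose=1)
# model.fit(..., callbacks=[lr_decay])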
Table 1. Optimal accuracy ranges for all models trained on each letter subset.
Fig. 2: Epochs vs. validation accuracy for all models trained on letters a-y
(excluding j)
Fig. 3: Epochs vs. training loss for all models trained on letters a-y (excluding j)
Fig. 4: Epochs vs. validation accuracy for the 2_init models trained on each
letter subset (excluding j)
Fig. 5: Epochs vs. training loss for the 2_init models trained on each letter
subset (excluding j)
Fig. 6: Confusion matrix for the 2_init model trained on letters a-y (excluding j)
Fig. 7: Confusion matrix for the 2_init model trained on letters a-k (excluding j)
CHAPTER IV
RESULTS
4.1. SYSTEM ARCHITECTURE
The architecture of a real-time sign language recognition system typically consists of several
critical components that work together to process video input, extract features, classify
gestures, and provide real-time feedback. A high-level overview of a representative
architecture is presented below:
1. Video Input: The system receives live video from a camera or webcam, capturing the
user's hand gestures and motions as they sign.
2. Data Preprocessing and Transformation: The captured frames are processed to improve
image quality and prepare them for feature extraction. This may involve techniques such as
scaling, normalization, and noise reduction.
3. Feature Extraction: The preprocessed video frames are fed into a feature extraction
module, which extracts the relevant features representing the hand shapes and movements.
Techniques such as Convolutional Neural Networks (CNNs) or Recurrent Neural Networks
(RNNs) can be used to extract spatial and temporal information from the frames.
4. Model Prediction: The extracted features are passed to a gesture classification model,
which predicts the sign language gesture that corresponds to them. This is typically a deep
learning model trained on a large dataset of sign language gestures so that it can identify and
recognize the various signs.
5. Post-processing: Once the gesture has been classified, post-processing techniques can be
applied to refine the output and improve accuracy. This may involve smoothing the predicted
gestures over time, integrating contextual information, or applying language models.
6. Translation and Display: The recognized sign language gestures are shown on a user
interface, typically a graphical interface that displays them in real time. Giving the user
feedback about the recognized signs improves communication and interaction. A sketch of
this loop appears after the list.
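As an illustration of how these components fit together, a minimal webcam loop might look
as follows; preprocess and the label list are placeholders for the project's own preprocessing
and class names, and the model file name is taken from the Appendix listing:

# Real-time loop sketch: capture a frame, preprocess it, classify it,
# and overlay the predicted sign on the video feed.
import cv2
import numpy as np
from keras.models import load_model

model = load_model("cnn_model_keras2.h5")                  # trained gesture classifier
labels = [chr(c) for c in range(ord("a"), ord("z") + 1)]   # placeholder label list

def preprocess(frame, size=(50, 50)):
    # placeholder preprocessing: grayscale, resize, scale to [0, 1]
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, size).astype(np.float32) / 255.0
    return gray.reshape(1, size[0], size[1], 1)

cam = cv2.VideoCapture(0)
while True:
    ok, frame = cam.read()
    if not ok:
        break
    probs = model.predict(preprocess(frame))[0]
    text = "%s (%.2f)" % (labels[int(np.argmax(probs))], float(np.max(probs)))
    cv2.putText(frame, text, (30, 60), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("Sign recognition", frame)
    if cv2.waitKey(1) == ord("q"):
        break
cam.release()
cv2.destroyAllWindows()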
4.2. USE CASE DIAGRAM
A use case diagram is a visual representation of the interactions between actors (users or
external systems) and a system. It shows the different ways the system can be used to achieve
particular goals. Use case diagrams are a common tool in software development for capturing
and conveying the functional requirements of a system, and they form part of the Unified
Modelling Language (UML).
The main components of a use case diagram are actors, use cases, and the relationships
between them. Actors are entities that interact with the system, while use cases are specific
functions or tasks that the system can carry out. The relationships between actors and use
cases illustrate how the actors use the system to accomplish their goals.
Use case diagrams are advantageous for several reasons. They offer a high-level overview of
how the system works, help identify the external entities that interact with it, and serve as a
basis for subsequent analysis and design. Because they present the behaviour of the system
from the user's perspective, they also make it easier for stakeholders such as developers,
designers, and customers to communicate with one another.
In a use case diagram, use cases are drawn as ovals and actors as stick figures, with lines
connecting actors to the use cases they participate in. Several types of relationships can be
shown, including associations, generalisations, and include/extend links, each of which serves
a specific purpose in defining the behaviour and requirements of the system.
Use case diagrams are particularly useful in the early stages of system development, when
they help define the essential features and interactions of the system. They are also useful for
verifying requirements and can serve as a reference for testing and validation activities later
in the development process.
4.3. Loss and Accuracy
Both the '1_init' and '2_init' models produced very noisy losses, as seen in Figure 3. Because
we were limited in both space and time, we initially had to use a batch size of 4, which was
less than optimal and led to the noisy loss. After observing these results, we trained the
network using a Lightning Memory-Mapped Database (LMDB) and were able to increase the
batch size to twenty. This let us lower the loss in a smoother, more monotonic manner and
converge more quickly to a good validation accuracy.
The 'full_train' model, by contrast, used the same learning rate across all layers of the
network, which allowed it to learn more rapidly, probably because the pre-trained GoogleNet
weights can then be adjusted more freely to account for the major disparities between the
datasets. Although this fits the training data more closely, it did not appear to hurt validation
accuracy relative to the other models. After analysing a large number of the images in our
dataset, we concluded that they were most likely captured as video frames of individuals
making the signs in the same room and in a single sitting. This lack of variation in the data is
why the 'full_train' model achieves a validation accuracy comparable to that of the other
models.
It is interesting to note that modifying our re-initialisation strategy and learning rates had only
a minimal impact on the final top-1 and top-5 accuracies: the largest difference between
models was less than 7% in top-1 accuracy and just over 1% in top-5 accuracy. We did
observe, however, that the models which re-initialised only the classification layer performed
better than the model with two re-initialised layers. Given the quality of the features extracted
by GoogleNet, this is not surprising, even though the separation between classes in our
dataset is very different from that in ILSVRC 2012; the pre-training was performed on far
more images than our dataset contains.
Compared side by side, the '1_init' and '2_init' models differed very little. Since GoogleNet
has 22 layers, intuition suggests that re-initialising two layers (rather than re-initialising one
layer and raising the learning-rate multiplier on another) would not be especially useful for
fine-tuning the model to our validation set, and our experiments confirmed this.
Using our web application, we ran qualitative tests of the four models with actual users (see
below). Because we expected a smaller number of classes to be easier to distinguish, we took
the '2_init' model and trained classifiers for the letters a-e and a-k (excluding j). Figure 4
shows a clear inverse relationship between the validation accuracy of the '2_init' model and
the number of letters being classified, which is not surprising: with five letters we reached a
validation accuracy of approximately 98%, whereas with ten letters we achieved only 74%.
4.4. Confusion Matrix
The confusion matrices suggest that the primary cause of our accuracy problems is the
misclassification of particular characters (for example, the letters k and d in Figure 7). In
many cases the classifier cannot differentiate between two or three very similar letters, or it
strongly prefers one letter of a pair (for example, g/h in Figure 7 and m/n/s/t in Figure 6).
Examining the confusion matrix for the ten-letter model, we found that it performed very
satisfactorily except for correctly identifying the letter k. We attribute this to two main
causes. First, the dataset contains k signed from a number of different viewpoints, ranging
from the front to the back of the hand facing the camera, as well as rotations in which the
fingers point up or to the sides; the only element consistent across all of these images is the
centre of mass of the hand. Second, the letter k shares characteristics with the letters d, g, h,
and i within the a-k range, and can essentially be built by combining components of those
letters. Consequently, if the classifier relied too heavily on any one of these features alone, it
could easily misclassify k as one of those letters, which also makes the latter harder to learn.
Real-time sign language recognition uses sophisticated computer vision techniques to
perform its translation task. These algorithms analyse and interpret the gestures and motions
made by the user, identify a wide range of signs, and translate them into written or spoken
language in real time. As a result, sign language users and people who are not familiar with
sign language can communicate with one another seamlessly and quickly.
The translation function can have a particularly large impact for Indian languages, since
India is home to a great many spoken languages. By enabling translation into a number of
different Indian languages, this technology can help bridge communication barriers not only
between people who use sign language and those who do not, but also between speakers of
different regional languages in India. Better accessibility and participation opportunities
allow individuals to communicate more effectively and engage fully in many aspects of
society.
Real-time translation of sign language into English and Indian languages also has a wide
range of potential applications. In educational settings, it could make it easier for students
who are deaf or hard of hearing to communicate with their instructors, leading to better
comprehension and greater participation in the classroom. In healthcare settings, it could
improve communication between medical workers and patients who use sign language,
ensuring that patients receive care that is both correct and effective.
The translation function of real-time sign language recognition is not limited to individual
interactions; it can also be incorporated into a wide range of technologies and platforms. For
example, it could be combined with video-conferencing tools to make it easier for sign
language users and hearing people to communicate during virtual meetings, or included in
mobile applications to provide translation assistance for everyday conversations on the move.
Overall, the translation capability of real-time sign language recognition holds great promise
for improving accessibility, inclusivity, and effective communication for people who use sign
language. By removing communication barriers and making real-time translation into
English and Indian languages possible, this technology could transform the way we interact
and communicate within a society that is both diverse and welcoming to people of all
backgrounds.
CHAPTER V
Challenges and Future Scope
Developing a real-time sign language recognition system with OpenCV and TensorFlow
involves a number of issues that must be solved to ensure successful implementation.
A major obstacle is the intricacy and diversity of sign language gestures. Sign language
involves a wide range of hand movements, facial expressions, and body postures, all of which
can vary significantly from one sign language to another and even from one signer to
another. Capturing and interpreting these nuances in real time requires powerful computer
vision and machine learning algorithms that can handle this diversity.
Another obstacle is the requirement for real-time processing and inference. To support
effective communication, a sign language recognition system must process video data
quickly and deliver correct predictions immediately. Achieving minimal latency while
maintaining high accuracy is critical to the system's usability and practicality.
It is also essential that the system be reliable and generalise well. The model is expected to
recognise signs consistently across a variety of lighting conditions, backgrounds, and camera
angles. For real-world applications, where environmental variables cannot always be
controlled, the system must be robust to noise, occlusions, and variations in hand shape and
movement.
Data collection and annotation present a further hurdle. Constructing a comprehensive
training dataset of sign language gestures that covers a wide variety of variations, and
ensuring accurate labelling, can be time-consuming and resource-intensive. Annotating sign
language data also requires expertise in sign language interpretation to guarantee the accuracy
of the labels, adding another layer of complexity to data preparation.
Deployment and integration with real-world applications pose additional challenges.
Optimising the model for deployment on resource-constrained devices while retaining
real-time performance requires attention to model size, inference speed, and memory usage.
Integration with existing systems or applications may require further development work to
guarantee smooth operation and a good user experience.
Future Scope
Incorporating additional technologies, approaches, or datasets into the real-time sign
language recognition system built with OpenCV and TensorFlow could dramatically improve
its performance and broaden its applicability.
1. Transfer learning with large-scale sign language datasets: Transfer learning on large-scale
sign language datasets, such as American Sign Language (ASL) or other sign languages,
could improve the model's performance and generalisation. Pre-training the model on a
diverse, comprehensive dataset would help it learn the complicated patterns and variations in
sign language gestures.
2. Data synthesis and generation: Generating synthetic data with techniques such as
Generative Adversarial Networks (GANs) could increase the size of the training dataset and
introduce variations in hand positions, backdrops, and lighting conditions. Synthetic data can
improve the model's generalisation and its robustness to a variety of new scenarios.
Alongside our core focus on optimising the GoogleNet model for image classification, it
would be worthwhile to investigate other neural network architectures that have shown
effectiveness in this domain, such as VGG or ResNet. Experimenting with a variety of
models would give us insight into their relative performance and possibly reveal new tactics
for improving our sign language recognition system.
We are also aware of the potential impact of more comprehensive image preprocessing on the
classification process. This could include enhancing contrast, removing the background, and
cropping the image. A more advanced strategy would be to use an additional convolutional
neural network (CNN) to localise and crop the hand region in the images, which should
further increase the accuracy of sign identification.
Incorporating bigram and trigram language models into our language model would
considerably increase the system's capacity to handle whole sentences rather than individual
words. This advancement would require improvements in letter segmentation and a more
effective technique for capturing images from users at a higher frequency. With these
enhancements, our objective is a more streamlined, comprehensive system for translating
sign language into written or spoken language. A small sketch of the idea is given below.
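Purely as an illustration of the idea, a letter-level bigram model can be used to pick the
dictionary word that best explains a noisy sequence of per-frame letter predictions; the tiny
vocabulary and transition probabilities below are made up for the example:

# Bigram language-model sketch: score candidate words by how well they match
# the per-frame letter predictions and how plausible their letter transitions are.
import math

bigram = {("h", "e"): 0.4, ("e", "l"): 0.3, ("l", "l"): 0.2, ("l", "o"): 0.3}  # toy probabilities
vocabulary = ["hello", "help", "hold"]                                          # toy lexicon

def word_score(word, frame_probs):
    # frame_probs: list of dicts mapping letters to per-frame probabilities
    score = 0.0
    for i, letter in enumerate(word):
        if i < len(frame_probs):
            score += math.log(frame_probs[i].get(letter, 1e-6))          # observation term
        if i > 0:
            score += math.log(bigram.get((word[i - 1], letter), 1e-6))   # transition term
    return score

def most_likely_word(frame_probs):
    return max(vocabulary, key=lambda w: word_score(w, frame_probs))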
Furthermore, one of our goals is to continuously improve the capabilities and performance of
our sign language recognition system. In order to achieve this goal, we are studying
alternative neural network topologies and using complex image preprocessing techniques.
We hope that by taking into consideration these tactics, we will be able to improve the
accuracy, efficiency, and overall user experience of our technology, which will ultimately
lead to improved communication and inclusion for people who use sign language.
CHAPTER VI
References
[1] Mitchell, Ross; Young, Travas; Bachleda, Bellamie; Karchmer, Michael (2006). "How
Many People Use ASL in the United States?: Why Estimates Need Updating" (PDF). Sign
Language Studies (Gallaudet University Press.) 6 (3). ISSN 0302-1475. Retrieved November
27, 2012.
[2] Singha, J. and Das, K. “Hand Gesture Recognition Based on Karhunen-Loeve
Transform”, Mobile and Embedded 232 Technology International Conference (MECON),
January 17-18, 2013, India. 365-371.
[3] D. Aryanie, Y. Heryadi. American Sign Language-Based Finger-spelling Recognition
using k-Nearest Neighbors Classifier. 3rd International Conference on Information and
Communication Technology (2015) 533-536.
[4] R. Sharma et al. Recognition of Single Handed Sign Language Gestures using Contour
Tracing descriptor. Proceedings of the World Congress on Engineering 2013 Vol. II, WCE
2013, July 3 - 5, 2013, London, U.K.
[5] T. Starner and A. Pentland. Real-Time American Sign Language Recognition from Video
Using Hidden Markov Models. Computational Imaging and Vision, 9(1); 227-243, 1997.
[6] M. Jeballi et al. Extension of Hidden Markov Model for Recognizing Large Vocabulary
of Sign Language. International Journal of Artificial Intelligence & Applications 4(2); 35-42,
2013
[7] H. Suk et al. Hand gesture recognition based on dynamic Bayesian network framework.
Pattern Recognition 43 (9); 3059-3072, 2010.
[8] P. Mekala et al. Real-time Sign Language Recognition based on Neural Network
Architecture. System Theory (SSST), 2011 IEEE 43rd Southeastern Symposium 14-16
March 2011.
[9] Y.F. Admasu, and K. Raimond, Ethiopian Sign Language Recognition Using Artificial
Neural Network. 10th International Conference on Intelligent Systems Design and
Applications, 2010. 995-1000.
[10] J. Atwood, M. Eicholtz, and J. Farrell. American Sign Language Recognition System.
Artificial Intelligence and Machine Learning for Engineering Design. Dept. of Mechanical
Engineering, Carnegie Mellon University, 2012.
[11] L. Pigou et al. Sign Language Recognition Using Convolutional Neural Networks.
European Conference on Computer Vision 6-12 September 2014
[12] Y. Jia. Caffe: An open source convolutional architecture for fast feature embedding.
http://caffe.berkeleyvision.org/, 2014.
[13] Lifeprint.com. American Sign Language (ASL) Manual Alphabet (fingerspelling) 2007.
CHAPTER VII
Appendix
Code:
Module: cnn_model-train.py
import numpy as np
import pickle
import cv2, os
from glob import glob
from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
from keras.callbacks import ModelCheckpoint
from keras import backend as K
K.set_image_dim_ordering('tf')
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
def get_image_size():
    # infer the input image size from one of the stored gesture images
    img = cv2.imread('gestures/1/100.jpg', 0)
    return img.shape

def get_num_of_classes():
    return len(glob('gestures/*'))

image_x, image_y = get_image_size()

def cnn_model():
    num_of_classes = get_num_of_classes()
    model = Sequential()
    model.add(Conv2D(16, (2,2), input_shape=(image_x, image_y, 1), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'))
    model.add(Conv2D(32, (3,3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(3, 3), padding='same'))
    model.add(Conv2D(64, (5,5), activation='relu'))
    model.add(MaxPooling2D(pool_size=(5, 5), strides=(5, 5), padding='same'))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(num_of_classes, activation='softmax'))
    sgd = optimizers.SGD(lr=1e-2)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    filepath = "cnn_model_keras2.h5"
    # keep the weights with the best validation accuracy seen so far
    checkpoint1 = ModelCheckpoint(filepath, monitor='val_acc', verbose=1,
                                  save_best_only=True, mode='max')
    callbacks_list = [checkpoint1]
    #from keras.utils import plot_model
    #plot_model(model, to_file='model.png', show_shapes=True)
    return model, callbacks_list

def train():
    # load the pickled splits produced by load_images.py
    with open("train_images", "rb") as f:
        train_images = np.array(pickle.load(f))
    with open("train_labels", "rb") as f:
        train_labels = np.array(pickle.load(f), dtype=np.int32)
    with open("val_images", "rb") as f:
        val_images = np.array(pickle.load(f))
    with open("val_labels", "rb") as f:
        val_labels = np.array(pickle.load(f), dtype=np.int32)
    print(val_labels.shape)
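    # --- minimal completion (the listing stops after loading the data): reshape to
    # (N, H, W, 1), one-hot encode the labels, then build and fit the CNN so the
    # ModelCheckpoint callback keeps the best weights on disk ---
    train_images = np.reshape(train_images, (train_images.shape[0], image_x, image_y, 1))
    val_images = np.reshape(val_images, (val_images.shape[0], image_x, image_y, 1))
    train_labels = np_utils.to_categorical(train_labels)
    val_labels = np_utils.to_categorical(val_labels)
    model, callbacks_list = cnn_model()
    model.fit(train_images, train_labels, validation_data=(val_images, val_labels),
              epochs=20, batch_size=64, callbacks=callbacks_list)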
train()
K.clear_session()
Module: create_gestures.py
import cv2
import numpy as np
import pickle, os, sqlite3, random
def init_create_folder_database():
    # create the image folder and the sqlite database if they do not exist
    if not os.path.exists("gestures"):
        os.mkdir("gestures")
    if not os.path.exists("gesture_db.db"):
        conn = sqlite3.connect("gesture_db.db")
        create_table_cmd = ("CREATE TABLE gesture ( g_id INTEGER NOT NULL "
                            "PRIMARY KEY AUTOINCREMENT UNIQUE, g_name TEXT NOT NULL )")
        conn.execute(create_table_cmd)
        conn.commit()

def create_folder(folder_name):
    if not os.path.exists(folder_name):
        os.mkdir(folder_name)
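
# --- minimal helpers used by store_images() below but not shown in the excerpt;
# they are assumed reconstructions, not the project's exact code ---
image_x, image_y = 50, 50  # assumed output size for the saved gesture crops

def get_hand_hist():
    # load the skin-colour histogram pickled by Set_hand_histogram.py
    with open("hist", "rb") as f:
        hist = pickle.load(f)
    return hist

def store_in_db(g_id, g_name):
    # record the gesture id/name pair in the sqlite database created above
    conn = sqlite3.connect("gesture_db.db")
    conn.execute("INSERT INTO gesture (g_id, g_name) VALUES (?, ?)", (g_id, g_name))
    conn.commit()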
def store_images(g_id):
    total_pics = 1200
    hist = get_hand_hist()
    cam = cv2.VideoCapture(1)
    if cam.read()[0] == False:
        cam = cv2.VideoCapture(0)
    x, y, w, h = 300, 100, 300, 300
    create_folder("gestures/" + str(g_id))
    pic_no = 0
    flag_start_capturing = False
    frames = 0
    while True:
        img = cam.read()[1]
        img = cv2.flip(img, 1)
        imgHSV = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        # back-project the hand histogram to segment skin-coloured pixels
        dst = cv2.calcBackProject([imgHSV], [0, 1], hist, [0, 180, 0, 256], 1)
        disc = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (10, 10))
        cv2.filter2D(dst, -1, disc, dst)
        blur = cv2.GaussianBlur(dst, (11, 11), 0)
        blur = cv2.medianBlur(blur, 15)
        thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
        thresh = cv2.merge((thresh, thresh, thresh))
        thresh = cv2.cvtColor(thresh, cv2.COLOR_BGR2GRAY)
        thresh = thresh[y:y+h, x:x+w]
        # index [1] assumes the OpenCV 3.x return signature of findContours
        contours = cv2.findContours(thresh.copy(), cv2.RETR_TREE,
                                    cv2.CHAIN_APPROX_NONE)[1]
        if len(contours) > 0:
            contour = max(contours, key=cv2.contourArea)
            if cv2.contourArea(contour) > 10000 and frames > 50:
                x1, y1, w1, h1 = cv2.boundingRect(contour)
                pic_no += 1
                save_img = thresh[y1:y1+h1, x1:x1+w1]
                # pad the crop to a square before resizing
                if w1 > h1:
                    save_img = cv2.copyMakeBorder(save_img, int((w1-h1)/2), int((w1-h1)/2),
                                                  0, 0, cv2.BORDER_CONSTANT, (0, 0, 0))
                elif h1 > w1:
                    save_img = cv2.copyMakeBorder(save_img, 0, 0, int((h1-w1)/2),
                                                  int((h1-w1)/2), cv2.BORDER_CONSTANT, (0, 0, 0))
                save_img = cv2.resize(save_img, (image_x, image_y))
                rand = random.randint(0, 10)
                if rand % 2 == 0:
                    save_img = cv2.flip(save_img, 1)
                cv2.putText(img, "Capturing...", (30, 60), cv2.FONT_HERSHEY_TRIPLEX, 2,
                            (127, 255, 255))
                cv2.imwrite("gestures/" + str(g_id) + "/" + str(pic_no) + ".jpg", save_img)
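        # --- minimal loop-closing lines (the excerpt cuts off above): show the live
        # feed and the thresholded hand, toggle capture with 'c', and stop once the
        # requested number of images has been saved ---
        cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
        cv2.imshow("Capturing gesture", img)
        cv2.imshow("thresh", thresh)
        keypress = cv2.waitKey(1)
        if keypress == ord('c'):
            flag_start_capturing = not flag_start_capturing
            frames = 0
        if flag_start_capturing:
            frames += 1
        if pic_no == total_pics:
            break
    cam.release()
    cv2.destroyAllWindows()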
init_create_folder_database()
g_id = input("Enter gesture no.: ")
g_name = input("Enter gesture name/text: ")
store_in_db(g_id, g_name)
store_images(g_id)
Module: display_gestures.py
import cv2, os, random
import numpy as np
def get_image_size():
    img = cv2.imread('gestures/0/100.jpg', 0)
    return img.shape

gestures = os.listdir('gestures/')
gestures.sort(key=int)
begin_index = 0
end_index = 5
image_x, image_y = get_image_size()

# arrange one random thumbnail per gesture in a grid of 5 columns
if len(gestures) % 5 != 0:
    rows = int(len(gestures)/5)+1
else:
    rows = int(len(gestures)/5)

full_img = None
for i in range(rows):
    col_img = None
    for j in range(begin_index, end_index):
        img_path = "gestures/%s/%d.jpg" % (j, random.randint(1, 1200))
        img = cv2.imread(img_path, 0)
        if img is None:
            img = np.zeros((image_y, image_x), dtype=np.uint8)
        if col_img is None:
            col_img = img
        else:
            col_img = np.hstack((col_img, img))
    begin_index += 5
    end_index += 5
    if full_img is None:
        full_img = col_img
    else:
        full_img = np.vstack((full_img, col_img))
cv2.imshow("gestures", full_img)
cv2.imwrite('full_img.jpg', full_img)
cv2.waitKey(0)
Module: load_images.py
import cv2
from glob import glob
import numpy as np
import random
from sklearn.utils import shuffle
import pickle
import os
def pickle_images_labels():
    images_labels = []
    images = glob("gestures/*/*.jpg")
    images.sort()
    for image in images:
        print(image)
        # the folder name between the path separators is the class label
        label = image[image.find(os.sep)+1: image.rfind(os.sep)]
        img = cv2.imread(image, 0)
        images_labels.append((np.array(img, dtype=np.uint8), int(label)))
    return images_labels

images_labels = pickle_images_labels()
images_labels = shuffle(shuffle(shuffle(shuffle(images_labels))))
images, labels = zip(*images_labels)
print("Length of images_labels", len(images_labels))

# 5/6 of the data for training, 1/12 for testing, 1/12 for validation
train_images = images[:int(5/6*len(images))]
print("Length of train_images", len(train_images))
with open("train_images", "wb") as f:
    pickle.dump(train_images, f)
del train_images

train_labels = labels[:int(5/6*len(labels))]
print("Length of train_labels", len(train_labels))
with open("train_labels", "wb") as f:
    pickle.dump(train_labels, f)
del train_labels

test_images = images[int(5/6*len(images)):int(11/12*len(images))]
print("Length of test_images", len(test_images))
with open("test_images", "wb") as f:
    pickle.dump(test_images, f)
del test_images

test_labels = labels[int(5/6*len(labels)):int(11/12*len(labels))]
print("Length of test_labels", len(test_labels))
with open("test_labels", "wb") as f:
    pickle.dump(test_labels, f)
del test_labels

val_images = images[int(11/12*len(images)):]
print("Length of val_images", len(val_images))
with open("val_images", "wb") as f:
    pickle.dump(val_images, f)
del val_images

val_labels = labels[int(11/12*len(labels)):]
print("Length of val_labels", len(val_labels))
with open("val_labels", "wb") as f:
    pickle.dump(val_labels, f)
del val_labels
Module: Rotate_imag.py
import cv2, os
def flip_images():
    # create a horizontally flipped copy of every captured gesture image
    gest_folder = "gestures"
    for g_id in os.listdir(gest_folder):
        for i in range(1200):
            path = gest_folder + "/" + g_id + "/" + str(i+1) + ".jpg"
            new_path = gest_folder + "/" + g_id + "/" + str(i+1+1200) + ".jpg"
            print(path)
            img = cv2.imread(path, 0)
            img = cv2.flip(img, 1)
            cv2.imwrite(new_path, img)
flip_images()
Module: Set_hand_histogram.py
import cv2
import numpy as np
import pickle
def build_squares(img):
    # draw a grid of small squares and return the stacked pixels inside them;
    # the user places their hand over the grid so a skin-colour sample can be taken
    x, y, w, h = 420, 140, 10, 10
    d = 10
    imgCrop = None
    crop = None
    for i in range(10):
        for j in range(5):
            if imgCrop is None:
                imgCrop = img[y:y+h, x:x+w]
            else:
                imgCrop = np.hstack((imgCrop, img[y:y+h, x:x+w]))
            #print(imgCrop.shape)
            cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 1)
            x += w + d
        if crop is None:
            crop = imgCrop
        else:
            crop = np.vstack((crop, imgCrop))
        imgCrop = None
        x = 420
        y += h + d
    return crop

def get_hand_hist():
    cam = cv2.VideoCapture(1)
    if cam.read()[0] == False:
        cam = cv2.VideoCapture(0)
    x, y, w, h = 300, 100, 300, 300
    flagPressedC, flagPressedS = False, False
    imgCrop = None
    while True:
        img = cam.read()[1]
        img = cv2.flip(img, 1)
        img = cv2.resize(img, (640, 480))
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        keypress = cv2.waitKey(1)
        if keypress == ord('c'):
            # 'c': compute the hue/saturation histogram from the sampled squares
            hsvCrop = cv2.cvtColor(imgCrop, cv2.COLOR_BGR2HSV)
            flagPressedC = True
            hist = cv2.calcHist([hsvCrop], [0, 1], None, [180, 256], [0, 180, 0, 256])
            cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
        elif keypress == ord('s'):
            # 's': save the histogram and stop
            flagPressedS = True
            break
        if flagPressedC:
            # back-project the histogram to preview the thresholded hand region
            dst = cv2.calcBackProject([hsv], [0, 1], hist, [0, 180, 0, 256], 1)
            dst1 = dst.copy()
            disc = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (10, 10))
            cv2.filter2D(dst, -1, disc, dst)
            blur = cv2.GaussianBlur(dst, (11, 11), 0)
            blur = cv2.medianBlur(blur, 15)
            ret, thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
            thresh = cv2.merge((thresh, thresh, thresh))
            #cv2.imshow("res", res)
            cv2.imshow("Thresh", thresh)
        if not flagPressedS:
            imgCrop = build_squares(img)
        #cv2.rectangle(img, (x,y), (x+w, y+h), (0,255,0), 2)
        cv2.imshow("Set hand histogram", img)
    cam.release()
    cv2.destroyAllWindows()
    with open("hist", "wb") as f:
        pickle.dump(hist, f)
get_hand_hist()