
DEVELOPING A SYSTEM FOR CONVERTING SIGN

LANGUAGE TO TEXT

A Thesis submitted in partial fulfillment of the


requirements for the award of the degree of

BACHELOR OF TECHNOLOGY

In

COMPUTER SCIENCE AND ENGINEERING (ARTIFICIAL INTELLIGENCE AND


MACHINE LEARNING)
By
Ishu Singh 2000681530027
Raman Baliyan 2000681530039
Ritik Chauhan 2000681530040
Harsh Tyagi 2000681530024

Under the Supervision of:

Mr. Umesh Kumar (Assistant Professor, Department of CSE (AI))

MEERUT INSTITUTE OF ENGINEERING AND

TECHNOLOGY, MEERUT 250005

Dr. A.P.J. Abdul Kalam Technical University, U.P., Lucknow


MAY, 2024
DECLARATION

We hereby declare that this submission is our work and that, to the best of our
knowledge and belief, it contains no material previously published or written
by another person nor material which to a substantial extent has been accepted
for the award of any other degree or diploma of the university or other institute
of higher learning, except where due acknowledgment has been made in the
text.

Signature :
Name: Ishu Singh

Roll No.: 2000681530027
Date:
CERTIFICATE

This is to certify that the Project Report entitled “Developing a system for converting sign
language to text using Machine Learning” which is submitted by Ishu Singh
(2000681530027), Raman Baliyan (2000681530039), Ritik Chauhan (2000681530040) and
Harsh Tyagi (2000681530024) in partial fulfillment of the requirement for the award of
degree B. Tech. in Department of Computer Science and Engineering (Artificial Intelligence
& Machine Learning) of Dr. A.P.J. Abdul Kalam Technical University, U.P., Lucknow, is a
record of the candidates' own work carried out by them under my supervision. The matter
embodied in this Project report is original and has not been submitted for the award of any
other degree.

Date: Supervisor : Mr. Umesh Kumar

ACKNOWLEDGEMENT

It gives us a great sense of pleasure to present the report of the B. Tech Project
undertaken during the B. Tech. Final Year. We owe a special debt of gratitude to
our guide Mr. Umesh Kumar, Assistant Professor, Department of Computer
Science and Engineering (Artificial Intelligence), Meerut Institute of
Engineering and Technology, Meerut, for his constant support and guidance
throughout the course of our work. His sincerity, thoroughness, and
perseverance have been a constant source of inspiration for us. It is only his
cognizant efforts that have helped our endeavors see the light of day.

We would also not like to miss the opportunity to acknowledge the contribution of
all faculty members of the department for their kind assistance and cooperation
during the development of our project. Last but not least, we acknowledge
our friends for their contribution to the completion of the project.

Signature : Signature :

Name : Ishu Singh Name : Raman Baliyan

Roll No : 2000681530027 Roll No : 2000681530039

Date : Date :

Signature : Signature :

Name : Ritik Chauhan Name : Harsh Tyagi

Roll No : 2000681530040 Roll No : 2000681530024

Date : Date :

ABSTRACT

To train the proposed Sign Language Detection System, we collected a large-scale
dataset consisting of diverse sign language gestures performed by multiple signers.
Extensive experiments demonstrate the effectiveness and robustness of the proposed
system in accurately recognizing various sign language gestures under different
environmental conditions, including varying lighting conditions, background clutter,
and signer-specific variations.

The proposed Sign Language Detection System holds significant promise in
enhancing accessibility for the Deaf and Hard of Hearing community by providing
real-time interpretation of sign language gestures. Future work includes extending
the system to support multi-person sign language interactions and integrating it into
wearable devices for on-the-go accessibility.


LIST OF TABLES

S.NO. DESCRIPTION PAGE NO.

3.1 Libraries 20

5.1 Hardware detail 23

5.2 Software details 23

LIST OF FIGURES

S.No. Description Page No.


1 American Sign Language 1
3.1 Flow Chart 10
3.2 No. of classes and images 11
3.3 Dataset 11
3.4 Labelled Dataset 12
3.5 Grayscaled Dataset 13
3.6 Train test Split 13
3.7 CNN Model 16
3.8 CNN Architecture 17
3.9 Training Loss and Accuracy 18
4.1 Model Accuracy 21
4.5 Confusion Matrix 22

TABLE OF CONTENTS

Page No.

DECLARATION............................................................................................................i

CERTIFICATE..............................................................................................................ii

ACKNOWLEDGEMENT.............................................................................................iii

ABSTRACT....................................................................................................................iv

LIST OF TABLES..........................................................................................................v

LIST OF FIGURES........................................................................................................vi

CHAPTER 1 INTRODUCTION..................................................................................1

1.1 INTRODUCTION..........................................................................................1

1.2 NEED OF THE PROJECT............................................................................3

1.3 UTILITY IN MARKET..................................................................................5

1.4 OBJECTIVES.................................................................................................6

CHAPTER 2 LITERATURE........................................................................................7

CHAPTER 3 METHODOLOGY AND IMPLEMENTATION.................................9

3.1 DATA COLLECTION.................................................................................11

3.2 DATA PREPROCESSING..........................................................................12

3.3 MODEL TRAINING……………………………………………………….13

3.3.1 CONVOLUTIONAL NEURAL NETWORK ARCHITECTURE. 15

3.4 MODEL TESTING AND EVALUATION....................................................18

3.5 TECHNOLOGY USED..................................................................................19

3.6 LIBRARIES USED.........................................................................................20

CHAPTER 4 RESULTS.................................................................................................21

RESULTS.........................................................................................................................21
CHAPTER 5 SYSTEM REQUIREMENTS..................................................................23

5.1 HARDWARE REQUIREMENTS...................................................................23

5.2 SOFTWARE REQUIREMENTS.....................................................................23

CHAPTER 6 CONCLUSION AND FUTURE SCOPE ...............................................24

REFERENCES........................................................................................................26

APPENDIX……………………………………………………………………………………….27
CHAPTER-1 INTRODUCTION

1.1 Introduction

In the contemporary era of rapid technological advancements, the quest for innovative
solutions that foster seamless communication for individuals with diverse linguistic abilities
remains a pivotal focal point. Within this context, the development of a Hand Sign Language
to Text and Speech Conversion system using advanced Convolutional Neural Networks (CNN)
represents a significant stride towards inclusivity and accessibility. This groundbreaking
system stands as a testament to the fusion of state-of-the-art image processing, machine
learning methodologies, and intuitive user interfaces, all converging to bridge the gap between
conventional spoken language and the intricate nuances of sign language.
Amidst its multifaceted capabilities, one of the primary objectives of this system is the accurate
detection and interpretation of an extensive range of hand signs, encompassing not only the 26
letters of the English alphabet but also the recognition of the backslash symbol, a crucial
component for seamless textual communication. By harnessing the power of CNN, the system
demonstrates an unprecedented accuracy rate exceeding 99%, enabling the precise translation
of intricate hand gestures into their corresponding textual representations.
The core architecture of the system integrates the robust OpenCV library for intricate image
processing and gesture recognition, coupled with the flexible Keras library, serving as the
backbone for the streamlined implementation of the CNN model.
The comprehensive workflow of the system encompasses real-time video input capturing,
sophisticated image preprocessing, and informed predictions based on the robust CNN model,
reflecting a harmonious blend of cutting-edge technology and user-centric design.
Furthermore, the system is equipped with a highly intuitive Graphical User Interface (GUI)
that showcases the captured video feed and the recognized hand sign, providing users with a
seamless experience to interact with the system effortlessly. Users are presented with an array
of options, including the ability to select suggested words or effortlessly clear the recognized
sentence, fostering an environment of interactive and dynamic communication. Additionally,
the integration of text-to-speech functionality empowers users to not only visualize but also
audibly comprehend the recognized sentence, enhancing the overall accessibility and user
experience.

Through rigorous and extensive testing, the efficacy and precision of the proposed system have

been extensively validated, underscoring its immense potential for real-world applications
across a diverse spectrum of contexts. By facilitating the seamless conversion of intricate hand
gestures into coherent textual and auditory output, this system paves the way for enhanced
communication and inclusivity,
catering to the diverse needs of individuals with varying linguistic abilities and promoting a
more connected and accessible society.

Fig. 1 – American Sign Language

1.2 Need of the Project

Creating a project for sign language to text conversion and speech synthesis could
greatly benefit the Deaf and Hard of Hearing community by providing them with a
more accessible means of communication. Here's an outline of how such a project
could be structured:

Research and Data Collection: Study various sign languages, their gestures, and
common vocabulary. Gather a diverse dataset of sign language gestures, possibly
including videos or images, along with their corresponding text translations.

Machine Learning Model Development: Train a machine learning model, possibly


using deep learning techniques, to recognize sign language gestures from input
images or video streams. Explore techniques such as convolutional neural networks
(CNNs) for image recognition and sequence-to-sequence models for translating
sequences of gestures into text.

Text-to-Speech (TTS) Integration: Implement a text-to-speech module that
converts the recognized text into spoken language. Choose a TTS engine that
supports multiple languages and accents to accommodate the diverse user base
(a brief sketch of this step is given after this outline).

User Interface Design: Develop a user-friendly interface that allows users to input
sign language gestures, either through live video feeds from cameras or pre-recorded
videos/images. Display the recognized text translations in real time, providing
immediate feedback to users. Integrate speech synthesis to audibly output the
translated text.

Testing and Evaluation: Conduct extensive testing with individuals proficient in sign
language to ensure the accuracy and reliability of gesture recognition. Gather
feedback from users to improve the usability and accessibility of the system.

Accessibility Considerations: Ensure that the application is compatible with assistive
technologies and adheres to accessibility standards. Provide customization options
for font size, color schemes, and other visual preferences.

Localization and Language Support: Support multiple sign languages and spoken
languages to cater to a global audience.
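
As a concrete illustration of the text-to-speech step above, the following is a minimal sketch using pyttsx3, the offline TTS engine that appears in the appendix code; the speaking rate and voice index shown here are illustrative defaults, not project settings.

# Minimal text-to-speech sketch using pyttsx3 (the engine used in the appendix code).
# The speaking rate and voice index below are illustrative defaults, not project settings.
import pyttsx3

def speak(text):
    engine = pyttsx3.init()
    engine.setProperty("rate", 100)            # speaking rate (words per minute)
    voices = engine.getProperty("voices")
    engine.setProperty("voice", voices[0].id)  # first installed voice
    engine.say(text)
    engine.runAndWait()

speak("HELLO")  # speaks the recognized sentence aloud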

1.3 Utility in Market

A reliable system for detecting sign language has a large and varied market potential.
Such a system can be used in education, healthcare, customer service, entertainment,
and other fields in addition to meeting the requirements of those with hearing
impairments.

The use of sign language detection technology in the classroom can support
inclusive learning settings and enable students who are hard of hearing or deaf to
engage fully in class activities. Educational institutions can also use this
technology to offer specialized instruction in communication and interpretation in
sign language.

Accurate sign language recognition in healthcare settings can facilitate better


communication between medical professionals and deaf patients, improving care
quality and guaranteeing that vital medical information is communicated.

Additionally, the use of sign language recognition technology in the retail and
customer service sectors can help companies better serve clients who have hearing
loss, creating an environment that is more welcoming and inclusive.

Sign language recognition can create new opportunities for accessibility and content
production in the entertainment sector, giving hard of hearing and deaf people access
to a greater variety of multimedia content, such as movies, TV series, and
internet videos.

Overall, a machine learning-based system for detecting sign language is useful in a


variety of fields, providing advantages for inclusion, accessibility, and
communication. The goal of this project is to realize this potential by creating a
scalable and dependable solution that caters to the requirements of various user
groups.

1.4 Objectives

1. Statistics indicate that more than 80% of individuals with disabilities are unable to
read or write. In response, our system endeavors to narrow the communication
divide between individuals with different abilities, including those who are
hearing-impaired or visually impaired. It achieves this by converting a significant
portion of sign language into text and speech, thereby facilitating effective
communication among diverse groups.

2. Individuals who are deaf or hard of hearing can utilize understandable hand
movements to express their messages.

3. Those without visual impairments can utilize the software to understand sign
language and communicate proficiently with individuals who are deaf or hard of
hearing. Similarly, individuals who are visually impaired can also engage in
communication when the predicted text from sign language is converted into
speech.

4. This project aims to close the existing gap in understanding sign language, making
it more accessible and comprehensible to a wider audience.

5. The aim of this project is to recognize symbolic expressions depicted in images,


thereby facilitating seamless communication between individuals who are hearing
impaired and those who are not.
6. Generating data with respect to American Sign Language and preparing it using
preprocessing techniques.
7. Utilizing Deep Learning models to train on the preprocessed data for real-time sign
language recognition and speech conversion, and testing the model in real-world
scenarios.
8. Evaluating the model in real-world conditions.

CHAPTER-2 LITERATURE SURVEY
In the domain of sign language recognition and translation, Convolutional Neural Networks
(CNNs) have emerged as a prominent technique, particularly for American Sign Language
(ASL) recognition. Researchers like Hsien-I Lin et al. have utilized image segmentation to
extract hand gestures and achieved high accuracy levels, around 95%, using CNN models
trained on specific hand motions. Similarly, Garcia et al. developed a real-time ASL translator
using pre-trained models like GoogLeNet, achieving accurate letter classification.

Das et al. [1] developed a sign language recognition (SLR) system utilizing deep learning techniques,
specifically training an Inception V3 CNN on a dataset comprising static images of ASL
motions. Their dataset consisted of 24 classes representing alphabets from A to Z, except for J.
Achieving an average accuracy rate of over 90%, with the best validation accuracy reaching
98%, their model demonstrated the effectiveness of the Inception V3 architecture for static sign
language detection.

Sahoo et al. [2] focused on identifying Indian Sign Language (ISL) gestures related to numbers
0 to 9. They employed machine learning methods such as Naive Bayes and k-Nearest Neighbor
on a dataset captured using a digital RGB sensor. Their models achieved impressive average
accuracy rates of 98.36% and 97.79%, respectively, with k-Nearest Neighbor slightly
outperforming Naive Bayes.

Ansari et al. [3] investigated ISL static gestures using both 3D depth data and 2D images
captured with Microsoft Kinect. They utilized K-means clustering for classification and
achieved an average accuracy rate of 90.68% for recognizing 16 alphabets, demonstrating the
efficacy of incorporating depth information into the classification process.

Rekha et al. [4] analyzed a dataset containing static and dynamic signs in ISL, employing skin
color segmentation techniques for hand detection. They trained a multiclass Support Vector
Machine (SVM) using features such as edge orientation and texture, achieving a success rate of
86.3%. However, due to its slow processing speed, this method was deemed unsuitable for
real-time gesture detection.

Bhuyan et al. [5] utilized a dataset of ISL gestures and employed a skin color-based
segmentation approach for hand detection. They achieved a recognition rate of over 90% using

the nearest neighbor classification method, showcasing the effectiveness of simple yet robust
techniques.

Pugeault et al. [6] developed a real-time ASL recognition system utilizing a large dataset of 3D
depth photos collected through a Kinect sensor. Their system achieved highly accurate
classification rates by incorporating Gabor filters and multi-class random forests,
demonstrating the effectiveness of integrating advanced feature extraction techniques.

Keskin et al. [7] focused on recognizing ASL numerals using an object identification technique
based on components. With a dataset comprising 30,000 observations categorized into ten
classes, their approach demonstrated strong performance in numeral recognition.

Sundar B et al. [8] presented a vision-based approach for recognizing ASL alphabets using the
MediaPipe framework. Their system achieved an impressive 99% accuracy in recognizing 26
ASL alphabets through hand gesture recognition using LSTM. The proposed approach offers
valuable applications in human-computer interaction (HCI) by converting hand gestures into
text, highlighting its potential for enhancing accessibility and communication.

Jyotishman Bora et al. [9] developed a machine learning approach for recognizing Assamese
Sign Language gestures. They utilized a combination of 2D and 3D images along with the
MediaPipe hand tracking solution, training a feed-forward neural network. Their model
achieved 99% accuracy in recognizing Assamese gestures, demonstrating the effectiveness of
their method and suggesting its applicability to other local Indian languages. The lightweight
nature of the MediaPipe solution allows for implementation on various devices without
compromising speed and accuracy.

In terms of continuous sign language recognition, systems have been developed to automate
training sets and identify compound sign gestures using noisy text supervision. Statistical
models have also been explored to convert speech data into sign language, with evaluations
based on metrics like Word Error Rate (WER), BLEU, and NIST scores.

Overall, research in sign language recognition and translation spans various techniques and
languages, aiming to improve communication accessibility for individuals with hearing
impairments.

CHAPTER-3 METHODOLOGY AND IMPLEMENTATION

The proposed system aims to develop a robust and efficient Hand Sign Language to Text and
Speech Conversion system using advanced Convolutional Neural Networks (CNN). With a
primary focus on recognizing hand signs, including the 26 alphabets and the backslash
character, the system integrates cutting-edge technologies to ensure accurate translation and
interpretation. Leveraging the OpenCV library for streamlined image processing and gesture
recognition, and the Keras library for the implementation of the CNN model, the system
guarantees high precision in sign language interpretation.
The system involves the real-time capture of video input showcasing hand gestures, which are
then pre-processed to enhance the quality of the images. These pre-processed images are then
fed into the trained CNN model, enabling precise predictions and accurate translation of the
gestures into corresponding text characters. The integration of a user-friendly Graphical User
Interface (GUI) provides an intuitive display of the captured video and the recognized hand
sign, empowering users with the option to choose suggested words or clear the recognized
sentence effortlessly.
Furthermore, the system is equipped with text-to-speech functionality, allowing users to listen
to the recognized sentence, thereby enhancing the overall accessibility and usability of the
system. The proposed system is designed with a focus on real-world applications, ensuring its
effectiveness and accuracy through extensive testing and validation. The system's robust
architecture and accurate translation capabilities position it as a promising solution for bridging
communication gaps and facilitating seamless interaction for individuals using sign language.
The camera is used to capture the image gestures in the vision-based method. The vision-based
method avoids the difficulties encountered in the glove-based method. The paper “Hand talk: a
sign language recognition system based on accelerometer and sEMG data” introduces American Sign
Language conventions. ASL is part of “Deaf culture” and includes its own system of puns,
inside jokes, etc. Just as it is very difficult for an English speaker to understand someone
speaking Japanese, the sign language of Sweden is very difficult for a speaker of ASL to
understand.
A computer vision system is implemented to select whether to differentiate objects using
colour or black and white and, if colour, to decide what colour space to use (red, green, blue or
hue, saturation, luminosity).

Fig. 3.1 - FlowChart

3.1 Data Collection
For the project, we tried to find ready-made datasets, but we could not find a
dataset in the form of raw images that matched our requirements. All we could
find were datasets in the form of RGB values. Hence, we decided to create
our own dataset. The steps we followed to create our dataset are as follows.

We used the Open Computer Vision (OpenCV) library in order to produce our
dataset. Firstly, we captured around 800 images of each of the symbols in ASL (American
Sign Language) for training purposes and around 200 images per symbol for
testing purposes.
First, we capture each frame shown by the webcam of our machine. In each
frame we define a Region Of Interest (ROI), which is denoted by a blue bounded
square, as shown in the image below (a sketch of this capture loop is given at the
end of this section):

Fig. 3.2 – No. of Classes and Images

Fig. 3.3 - Dataset

Fig. 3.4- Labeled Dataset
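
A minimal sketch of the capture loop described above is given below; the ROI coordinates, key bindings, and output folder are illustrative assumptions rather than the exact values used when building the dataset.

# Sketch of the webcam capture loop: draw a Region Of Interest (ROI) on each frame
# and save the cropped ROI when a key is pressed. ROI coordinates, key bindings and
# the output folder are assumptions for illustration only.
import os
import cv2

SAVE_DIR = "dataset/A"                 # one folder per ASL symbol (hypothetical layout)
os.makedirs(SAVE_DIR, exist_ok=True)

cap = cv2.VideoCapture(0)
count = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)
    x1, y1, x2, y2 = 350, 100, 650, 400                       # assumed ROI coordinates
    cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)  # blue bounded square
    cv2.imshow("Data Collection", frame)

    key = cv2.waitKey(1) & 0xFF
    if key == ord("c"):                                       # 'c' saves the current ROI
        roi = frame[y1:y2, x1:x2]
        cv2.imwrite(os.path.join(SAVE_DIR, f"{count}.jpg"), roi)
        count += 1
    elif key == ord("q"):                                     # 'q' quits
        break

cap.release()
cv2.destroyAllWindows()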

3.2 Data Preprocessing

TensorFlow/Keras’ Image Data Generator augments data by applying diverse


transformations to original images, creating variations that boost the model’s
generalization and performance on unseen data.

The rescale parameter plays a crucial role in normalizing pixel values by dividing them
by 255, a common practice that confines the pixel range to [0, 1]. Another important
parameter, rotation_range, defines the extent of random rotations applied to the images,
with a maximum rotation angle of 15 degrees. The width_shift_range and
height_shift_range parameters dictate the allowable horizontal and vertical shifts,
proportionate to the image dimensions. For instance, setting width_shift_range to 0.1
permits horizontal shifts up to 10% of the image width. shear_range controls shear
transformations, which slant images along the x or y axis. The zoom_range parameter
manages zooming effects, allowing images to be zoomed in (up to 20%) or out (up to
20%). Lastly, horizontal_flip enables horizontal flipping of images, a valuable technique
for enhancing the model’s capacity to recognize features from diverse orientations. We used
OpenCV to convert images to grayscale, apply Gaussian blur, and create binary training
images for CNN model comparison. MediaPipe was also employed to extract hand
landmarks, enhancing the analysis.

Fig. 3.5 – Grayscaled DataSet
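
The augmentation and OpenCV preprocessing described above can be sketched as follows; the rescale, rotation, shift, zoom, and flip settings follow the values stated in the text, while the shear factor, blur kernel, and thresholding method are assumptions.

# Sketch of the augmentation settings described above and of the OpenCV
# grayscale/blur/binary conversion. The shear factor, blur kernel and threshold
# method are assumptions; the other values follow the text.
import cv2
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # normalize pixel values to [0, 1]
    rotation_range=15,       # random rotations up to 15 degrees
    width_shift_range=0.1,   # horizontal shifts up to 10% of the width
    height_shift_range=0.1,  # vertical shifts up to 10% of the height
    shear_range=0.1,         # assumed shear factor
    zoom_range=0.2,          # zoom in/out up to 20%
    horizontal_flip=True,    # random horizontal flips
)

def to_binary(image_path):
    """Grayscale -> Gaussian blur -> binary image, as used for the CNN comparison."""
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 2)   # assumed kernel size and sigma
    _, binary = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return binary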

3.3 Model Training

For model training, we began by splitting the preprocessed dataset into training,
validation, and testing sets. This division ensured that the model could learn from a
diverse range of examples during training while also allowing for unbiased
evaluation of its performance on unseen data.

Fig. 3.6 – train test split

Next, we defined the architecture of the Convolutional Neural Network (CNN) to be


used for sign language detection. CNNs are well-suited for image classification tasks
due to their ability to learn hierarchical representations of visual features.

The architecture comprised multiple convolutional layers followed by max-pooling
layers to extract and downsample features from the input images. Batch
normalization layers were incorporated to accelerate training and improve model
stability. Dropout regularization was also applied to mitigate overfitting by randomly
deactivating a fraction of neurons during training. The activation function used
throughout the network was Rectified Linear Unit (ReLU), known for its simplicity
and effectiveness in promoting nonlinear transformations.

After defining the CNN architecture, we compiled the model using the TensorFlow
and Keras frameworks. TensorFlow provides a comprehensive suite of tools for
building and training deep learning models, while Keras offers a high-level API for
quickly prototyping neural networks.

We configured the model with appropriate loss function, optimizer, and evaluation
metrics. Categorical cross-entropy loss was chosen as the loss function for multi-
class classification tasks, while the Adam optimizer was selected for its efficiency
and adaptability. Accuracy was used as the evaluation metric to assess the model's
performance on both training and validation data.

Finally, we initiated the training process using TensorFlow and Keras, feeding the
preprocessed training data into the model and iteratively adjusting its parameters to
minimize the defined loss function. The training progress was monitored using the
validation set to prevent overfitting and ensure generalization to unseen data.
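
The split, compilation, and training steps described in this section can be summarized in the sketch below; build_model() stands in for the CNN defined in Section 3.3.1, X and y for the preprocessed images and one-hot labels, and the split ratios, epoch count, and batch size are illustrative assumptions.

# Sketch of the training workflow: split the data, compile with the Adam optimizer
# and categorical cross-entropy, and fit while monitoring the validation set.
# build_model(), X, y, the split ratios, epochs and batch size are placeholders/assumptions.
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import ModelCheckpoint

# X: preprocessed images, y: one-hot encoded labels (prepared in Sections 3.1 and 3.2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=42)

model = build_model()                 # the CNN architecture from Section 3.3.1
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

checkpoint = ModelCheckpoint("model.h5", monitor="val_accuracy", save_best_only=True)
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=25, batch_size=32,
                    callbacks=[checkpoint])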

3.3.1 Convolutional Neural Network Architecture (CNN)

Convolutional Neural Networks are a type of deep learning model specifically designed
for processing and analysing visual data, such as images and videos. They are inspired by
the visual processing mechanism of the human brain and have proven to be highly
effective in tasks such as image recognition, object detection, and image classification.
CNNs are composed of multiple layers, including convolutional layers, pooling layers,
and fully connected layers. These layers work together to extract relevant features from
the input data and make accurate predictions. The key operations within a CNN include
convolution, pooling, and fully connected layers.

 Convolutional layers: These layers perform feature extraction by applying a set of


learnable filters to the input data. These filters detect various patterns and features at
different levels of abstraction. The application of these filters helps the network
recognize edges, textures, and higher-level features within the input images.

 Pooling layers: Pooling layers work to reduce the spatial dimensions of the data. They
do this by down sampling the feature maps, thereby decreasing the computational load
and controlling overfitting. Common types of pooling include max pooling and average
pooling, which retain the most significant features while discarding the less relevant
information.

 Fully connected layers: The fully connected layers interpret the features extracted by
the convolutional and pooling layers. They use this information to make predictions
based on the learned representations. These layers provide the final output of the CNN,
enabling the network to classify the input data into different categories based on the
learned features.

One of the primary advantages of CNNs is their ability to automatically learn hierarchical
representations from raw pixel data. Unlike traditional machine learning models, which
require manual feature extraction, CNNs can learn complex patterns and relationships
between pixels on their own. This capability makes them well-suited for tasks that involve
complex visual patterns and intricate spatial relationships.

To improve their performance, CNNs utilize techniques such as backpropagation and
gradient descent during the training process. These techniques allow the network to adjust
its parameters, optimizing its ability to recognize and classify images accurately.

In recent years, CNNs have become a foundational component in various computer vision
applications. Their capability to capture spatial hierarchies and local patterns within images
has significantly contributed to advancements in image understanding and pattern
recognition. Researchers continue to explore ways to enhance CNN architectures, such as by
incorporating residual connections, batch normalization, and attention mechanisms, to
further improve their performance and generalizability across different visual tasks.

Fig. 3.7 – CNN Model

Fig. 3.8 – CNN Architecture

The model was compiled using the Adam optimizer once the necessary
hyperparameters were determined, employing categorical cross-entropy as the loss
function. Adam optimizer offers a straightforward implementation, computational
efficiency, and requires less memory. This optimization technique estimates the first
and second moments of the gradient to adaptively adjust the learning rate for each
parameter, contributing to efficient parameter updates.

Now, let's delve deeper into the analysis of the layers within the model:

 Input Layer : The input layer plays a crucial role in processing and
representing the raw input data, typically images, before passing them
through subsequent layers for feature extraction and classification in a CNN
architecture.

 Convolutional Layer (Conv2D) : Convolutional layers apply a kernel to the
input data, performing operations such as blurring, edge detection, and
sharpening. The output size of the convolutional layer is determined by parameters
such as kernel size, stride, and padding. It is basically used to extract features
from the dataset, which can be seen as a form of feature-level preprocessing.

 Pooling Layer : Pooling layers downsample the output of the convolutional


layer, reducing its spatial dimensions while retaining important features. This
CNN utilizes max pooling, which selects the maximum value from specific
regions of the input.

 Flattening Layer : The flattening layer converts the multidimensional output


from the convolutional and pooling layers into a one-dimensional array,
preparing it for input into the fully connected layers.

 Fully Connected or Dense Layers : Fully connected layers learn features


based on combinations of features from preceding layers. They consist of
densely connected neurons, each receiving input from every neuron in the
previous layer.

 Output Layer : The output layer comprises neurons activated by the softmax
function. Softmax generates a probability distribution over the classes,
indicating the likelihood of the input image belonging to each class.
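
Putting the layers described above together, a minimal Keras sketch of such a stack could look like the following; the filter counts, kernel sizes, dropout rate, and number of convolutional blocks are illustrative assumptions, while the 400x400x3 input shape follows the image size used in the appendix code.

# Illustrative Keras Sequential stack mirroring the layer roles described above.
# Filter counts, kernel sizes and the dropout rate are assumptions; the input shape
# follows the 400x400x3 images used in the appendix code.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, BatchNormalization,
                                     Dropout, Flatten, Dense)

NUM_CLASSES = 26  # one output per ASL letter (A-Z)

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(400, 400, 3)),  # feature extraction
    BatchNormalization(),
    MaxPooling2D((2, 2)),                                              # downsample feature maps

    Conv2D(64, (3, 3), activation="relu"),
    BatchNormalization(),
    MaxPooling2D((2, 2)),

    Flatten(),                                   # multi-dimensional features -> 1D vector
    Dense(128, activation="relu"),               # fully connected layer
    Dropout(0.3),                                # mitigate overfitting
    Dense(NUM_CLASSES, activation="softmax"),    # probability distribution over classes
])
model.summary()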

3.4 Model Testing and Evaluation

The model training and optimization module involves training the Convolutional Neural
Network (CNN) using the pre-processed dataset and optimizing the network's
architecture and parameters to achieve superior performance. This module includes
procedures such as model configuration, hyperparameter tuning, and cross-validation to
enhance the CNN's learning capabilities and generalization to various hand sign gestures.
By conducting comprehensive model training and optimization, the module ensures the
CNN's ability to accurately recognize and classify a wide range of hand sign language
gestures with high precision and reliability.

Fig 3.9- Training loss and accuracy
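
As a sketch of how the held-out evaluation and the curves in Fig. 3.9 can be produced, assuming model, history, X_test, and y_test come from the training step in Section 3.3:

# Sketch of test-set evaluation and of plotting the training loss/accuracy curves
# shown in Fig. 3.9. Assumes `model`, `history`, `X_test` and `y_test` from Section 3.3.
import matplotlib.pyplot as plt

test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")

plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("Epoch")
plt.legend()
plt.title("Training Loss and Accuracy")
plt.show()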

3.5 Technology Used

 Python : To develop and run the model, we use Python due to its rich
libraries and its efficiency to run the machine learning programs.

 Deep Learning : We used Deep Learning in order to classify the data as per
the project’s requirement.

 Google Colab IDE : To run and implement our code, we use the Google Colab
IDE to train and develop our model.

3.6 Libraries Used

Table 3.1 - Libraries

Library Name: Usage

Tensorflow: Used for building and defining the model.
os: Used to read and write data in a folder using the folder’s directory.
matplotlib: Used to plot all the graphs that we need to determine from the model.
Cv2: Used to read and write the image data from the folder.
Keras: Used for building, training and deploying neural networks.
Sequential: A linear stack of layers where each layer has one input tensor and one output tensor.
Conv2D: Referred to as the convolutional layer in the CNN model; used for image processing.
Flatten: Used to convert multi-dimensional input data into a one-dimensional array.
Dense: A feed-forward (fully connected) layer, i.e. each neuron in a layer receives input from every neuron of the previous layer.
MaxPooling2D: Used to downsample feature maps while retaining the most important features.
Mediapipe: An open-source framework developed by Google for building multimodal (e.g., video, audio, and sensor data) machine learning pipelines.
Dropout: Used to prevent our model from overfitting.
ImageDataGenerator: Used in data preprocessing, for example for resizing the images.
Sklearn: Used for various purposes, such as the train-test split and training the model.
ModelCheckpoint: Used to save the model’s weights so that training can be resumed if it gets interrupted.

CHAPTER 4 RESULTS

4.1 Results

We cleaned the ASL dataset before using 4500 photos per class to train our model. There
were 166K photos in the original collection. An 80% training set and a 20% test set were
created from the dataset. In order to train the model, we used a range of hyperparameters,
including learning rate, batch size, and the number of epochs.
Our test set evaluation metrics demonstrate the trained model's remarkable performance.
It properly identified every sample in the test set, earning a high accuracy score of 100%.
The classification report's precision, recall, and F1-score values are all 100%, showing
that the model properly identified each class's samples without making any errors.

Fig. 4.1 Model Accuracy

Classes Precision Recall F1-score Support
A 1.00 1.00 1.00 912
B 1.00 1.00 1.00 940
C 1.00 1.00 1.00 921
D 1.00 0.99 1.00 927
E 1.00 1.00 1.00 900
F 1.00 0.99 1.00 923
G 1.00 1.00 1.00 910
H 1.00 1.00 1.00 895
I 1.00 1.00 1.00 884
J 1.00 1.00 1.00 874
K 1.00 0.99 1.00 868
L 1.00 1.00 1.00 893
M 0.99 1.00 0.99 884
N 1.00 0.99 1.00 935
O 1.00 1.00 1.00 887
P 1.00 1.00 1.00 898
Q 0.99 1.00 1.00 837

R 1.00 1.00 1.00 912
S 1.00 1.00 1.00 861
T 1.00 1.00 1.00 895
U 1.00 1.00 1.00 878
V 1.00 1.00 1.00 901
W 1.00 1.00 1.00 917
X 1.00 1.00 1.00 952
Y 1.00 1.00 1.00 897
Z 1.00 1.00 1.00 904
Accuracy 1.00 23400
Macro avg 1.00 1.00 1.00 23400
Weighted avg 1.00 1.00 1.00 23400

TABLE I: CLASSIFICATION REPORT FOR ASL-MODEL

The confusion matrix provides a summary of the performance of a classification model.
Each row in the matrix represents the instances in the actual class, while each column
represents the instances in the predicted class. Fig. 4.5 represents the confusion matrix plotted
between the 26 classes representing the alphabets (A-Z).

Fig. 4.5 – Confusion Matrix
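
The per-class metrics in the classification report and the confusion matrix in Fig. 4.5 can be computed along the following lines; model, X_test, and y_test are assumed to come from the training pipeline in Chapter 3.

# Sketch of producing the classification report and the confusion matrix, assuming
# `model`, `X_test` and one-hot `y_test` from Chapter 3 and class labels A-Z.
import string
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

class_names = list(string.ascii_uppercase)           # 'A' ... 'Z'

y_true = np.argmax(y_test, axis=1)                   # one-hot labels -> class indices
y_pred = np.argmax(model.predict(X_test), axis=1)    # softmax outputs -> predicted class

print(classification_report(y_true, y_pred, target_names=class_names))
cm = confusion_matrix(y_true, y_pred)                # rows: actual class, columns: predicted class
print(cm)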

CHAPTER 5 SYSTEM REQUIREMENTS

5.1 Hardware Requirements

 Camera
 SD Card

Table 5.1 – Hardware Details

Processor 2GHz Intel

HDD 180 GB

RAM 4 GB

5.2 Software Requirements

 Google Colab or Python IDE


 Including resource requirements and prerequisites

Table 5.2 – Software Details

Operating System Windows 10

Programming Language Python

Database Labelled Image

Platform Google Colab, VS Code

CHAPTER 6 CONCLUSION AND FUTURE SCOPE

6.1 Conclusion and future scope

In summary, our ASL recognition model stands out with an extraordinary accuracy rate of
99.50% in real-time Sign Language Recognition (SLR). This achievement is primarily
attributed to the sophisticated combination of Mediapipe for feature extraction and
Convolutional Neural Networks (CNN) for classification. By leveraging these advanced
techniques, our model offers a robust and precise solution for interpreting ASL hand
gestures.

Central to the success of our model is the meticulous curation and preprocessing of the
dataset. From an initial collection of 13,000 photos, we carefully selected 500
representative images per class, ensuring a balanced and diverse training corpus. This
meticulous approach enabled our model to generalize effectively, recognizing a broad
spectrum of ASL gestures with remarkable accuracy.

Furthermore, we employed data augmentation techniques to enrich the training data,


thereby enhancing the model's ability to handle variations in hand gestures, lighting
conditions, and backgrounds. This augmentation strategy played a pivotal role in
bolstering the model's robustness and performance in real-world scenarios. Looking
ahead, our future objectives are ambitious yet promising. We aim to explore the
integration of additional deep learning architectures and methodologies to further elevate
the precision and speed of our model. By harnessing the latest advancements in AI
research, we aspire to push the boundaries of SLR technology, empowering our model to
excel in diverse environments and contexts.

Moreover, we envision expanding the scope of our model to encompass a wider range of
sign languages and gestures, fostering inclusivity and accessibility on a global scale. This
expansion will involve extensive research and development efforts, including the
integration of machine learning algorithms capable of comprehending entire sign
language sentences and phrases.

Ultimately, our goal is to realize a comprehensive suite of SLR systems that transcend
linguistic barriers, enabling seamless communication between sign language users and the

broader community. With continued innovation and collaboration, we believe that this
technology has the potential to revolutionize communication and accessibility for
individuals with hearing impairments, facilitating greater integration and participation in
society.

In addition, community engagement efforts that seek feedback and input from the deaf and
hard of hearing community help guarantee that system design is culturally sensitive and
inclusive. Finally, the project's future orientation lies in ongoing research and
collaboration, with the objective of making sign language detection technology more
accessible, intuitive, and powerful for all users.

REFERENCES

1. Das, S. Gawde, K. Suratwala and D. Kalbande. (2018) "Sign language recognition using
deep learning on custom processed static gesture images," in International Conference on
Smart City and Emerging Technology (ICSCET).

2. A. K. Sahoo. (2021) "Indian sign language recognition using machine learning


techniques," in Macromolecular Symposia.

3. Z. A. Ansari and G. Harit. (2016) "Nearest neighbour classification of Indian sign


language gestures using kinect camera," Sadhana, vol. 41, p. 161–182.

4. Rekha, J. Bhattacharya, and S. Majumder. (2011) "Shape, texture and local movement


hand gesture features for Indian sign language recognition," in 3rd international
conference on trendz in information sciences & computing (TISC2011).

5. M. K. Bhuyan, M. K. Kar, and D. R. Neog. (2011) "Hand pose identification from


monocular image for sign language recognition," in 2011 IEEE International Conference
on Signal and Image Processing Applications (ICSIPA).

6. N. Pugeault and R. Bowden. (2011) "Spelling it out: Real-time ASL fingerspelling


recognition," in 2011 IEEE International Conference on Computer Vision Workshops (ICCV
workshops).

7. C. Keskin, F. Kıraç, Y. E. Kara and L. Akarun. (2013) "Real time hand pose estimation
using depth sensors," in Consumer Depth Cameras for Computer Vision, Springer, pp. 119–137.

8. Sundar, B., & Bagyammal, T. (2022). American Sign Language Recognition for
Alphabets Using MediaPipe and LSTM. Procedia Computer Science, 215, 642–651.
https://doi.org/10.1016/j.procs.2022.12.066

9. Bora, J., Dehingia, S., Boruah, A., Chetia, A. A., & Gogoi, D. (2023). Real-time Assamese
Sign Language Recognition using MediaPipe and Deep Learning. Procedia Computer
Science, 218, 1384–1393. https://doi.org/10.1016/j.procs.2023.01.117

APPENDIX

Github link: https://github.com/ISHU14singh/final-year-project


Code:
# Importing Libraries
import numpy as np
import math
import cv2
import os, sys
import traceback
import pyttsx3
from keras.models import load_model
from cvzone.HandTrackingModule import HandDetector
from string import ascii_uppercase
import enchant
import tkinter as tk
from PIL import Image, ImageTk

offset=29

# os.environ["THEANO_FLAGS"] = "device=cuda, assert_no_cpu_op=True"

hs=enchant.Dict("en-US")
hd = HandDetector(maxHands=1)
hd2 = HandDetector(maxHands=1)

class Application:

def __init__(self):
self.vs = cv2.VideoCapture(0)
self.current_image = None
self.model = load_model('model.h5')
self.speak_engine=pyttsx3.init()

self.speak_engine.setProperty("rate",100)
voices=self.speak_engine.getProperty("voices")
self.speak_engine.setProperty("voice",voices[0].id)

self.ct = {}
self.ct['blank'] = 0
self.blank_flag = 0
self.space_flag=False
self.next_flag=True
self.prev_char=""
self.count=-1
self.ten_prev_char=[]
for i in range(10):
self.ten_prev_char.append(" ")

for i in ascii_uppercase:
self.ct[i] = 0

# print("Loaded model from disk")

self.root = tk.Tk()
self.root.title("Sign Language To Text Conversion")
self.root.protocol('WM_DELETE_WINDOW', self.destructor)
self.root.geometry("1300x700")

self.panel = tk.Label(self.root)
self.panel.place(x=100, y=3, width=480, height=640)

self.panel2 = tk.Label(self.root) # initialize image panel


self.panel2.place(x=700, y=115, width=400, height=400)

self.T = tk.Label(self.root)
self.T.place(x=60, y=5)

self.T.config(text="Sign Language To Text Conversion", font=("Courier", 30, "bold"))

self.panel3 = tk.Label(self.root) # Current Symbol


self.panel3.place(x=280, y=585)

self.T1 = tk.Label(self.root)
self.T1.place(x=10, y=580)
self.T1.config(text="Character :", font=("Courier", 30, "bold"))

self.panel5 = tk.Label(self.root) # Sentence


self.panel5.place(x=260, y=632)

self.T3 = tk.Label(self.root)
self.T3.place(x=10, y=632)
self.T3.config(text="Sentence :", font=("Courier", 30, "bold"))

self.T4 = tk.Label(self.root)
self.T4.place(x=10, y=700)
self.T4.config(text="Suggestions :", fg="red", font=("Courier", 30, "bold"))

self.b1=tk.Button(self.root)
self.b1.place(x=390,y=700)

self.b2 = tk.Button(self.root)
self.b2.place(x=590, y=700)

self.b3 = tk.Button(self.root)
self.b3.place(x=790, y=700)

self.b4 = tk.Button(self.root)
self.b4.place(x=990, y=700)

self.speak = tk.Button(self.root)

self.speak.place(x=1305, y=630)
self.speak.config(text="Speak", font=("Courier", 20), wraplength=100,
command=self.speak_fun)

self.clear = tk.Button(self.root)
self.clear.place(x=1205, y=630)
self.clear.config(text="Clear", font=("Courier", 20), wraplength=100,
command=self.clear_fun)

self.str = " "


self.ccc=0
self.word = " "
self.current_symbol = "C"
self.photo = "Empty"

self.word1=" "
self.word2 = " "
self.word3 = " "
self.word4 = " "

self.video_loop()

def video_loop(self):
try:
ok, frame = self.vs.read()
cv2image = cv2.flip(frame, 1)
hands = hd.findHands(cv2image, draw=False, flipType=True)
cv2image_copy=np.array(cv2image)
cv2image = cv2.cvtColor(cv2image, cv2.COLOR_BGR2RGB)
self.current_image = Image.fromarray(cv2image)

imgtk = ImageTk.PhotoImage(image=self.current_image)
self.panel.imgtk = imgtk
self.panel.config(image=imgtk)

if hands:
# #print(" --------- lmlist=",hands[1])
hand = hands[0]

lmList = hand['lmList']
bbox = hand['bbox']

x, y, w, h = bbox[0]+300, bbox[1]+300, bbox[2]+350, bbox[3]+100


# x, y, w, h = hand['bbox']
image = cv2image_copy[y - offset:y + h + offset, x - offset:x + w + offset]

white = cv2.imread("C:\\Users\\MY LENOVO\\Desktop\\signLanguage\\white.jpg")

handz = hd2.findHands(image, draw=False, flipType=True)


# print(" ", self.ccc)
self.ccc += 1
if handz:
hand = handz[0]
self.pts = lmList
# self.pts = hand['lmList']

os = ((400 - w) // 2) - 15
os1 = ((400 - h) // 2) - 15
for t in range(0, 4, 1):
    cv2.line(white, (self.pts[t][0] + os, self.pts[t][1] + os1), (self.pts[t + 1][0] + os, self.pts[t + 1][1] + os1), (0, 255, 0), 3)
for t in range(5, 8, 1):
    cv2.line(white, (self.pts[t][0] + os, self.pts[t][1] + os1), (self.pts[t + 1][0] + os, self.pts[t + 1][1] + os1), (0, 255, 0), 3)
for t in range(9, 12, 1):
    cv2.line(white, (self.pts[t][0] + os, self.pts[t][1] + os1), (self.pts[t + 1][0] + os, self.pts[t + 1][1] + os1), (0, 255, 0), 3)
for t in range(13, 16, 1):
    cv2.line(white, (self.pts[t][0] + os, self.pts[t][1] + os1), (self.pts[t + 1][0] + os, self.pts[t + 1][1] + os1), (0, 255, 0), 3)
for t in range(17, 20, 1):
    cv2.line(white, (self.pts[t][0] + os, self.pts[t][1] + os1), (self.pts[t + 1][0] + os, self.pts[t + 1][1] + os1), (0, 255, 0), 3)
cv2.line(white, (self.pts[5][0] + os, self.pts[5][1] + os1), (self.pts[9][0] + os, self.pts[9][1] + os1), (0, 255, 0), 3)
cv2.line(white, (self.pts[9][0] + os, self.pts[9][1] + os1), (self.pts[13][0] + os, self.pts[13][1] + os1), (0, 255, 0), 3)
cv2.line(white, (self.pts[13][0] + os, self.pts[13][1] + os1), (self.pts[17][0] + os, self.pts[17][1] + os1), (0, 255, 0), 3)
cv2.line(white, (self.pts[0][0] + os, self.pts[0][1] + os1), (self.pts[5][0] + os, self.pts[5][1] + os1), (0, 255, 0), 3)
cv2.line(white, (self.pts[0][0] + os, self.pts[0][1] + os1), (self.pts[17][0] + os, self.pts[17][1] + os1), (0, 255, 0), 3)

for i in range(21):
    cv2.circle(white, (self.pts[i][0] + os, self.pts[i][1] + os1), 2, (0, 0, 255), 1)

res=white
self.predict(res)

self.current_image2 = Image.fromarray(res)

imgtk = ImageTk.PhotoImage(image=self.current_image2)

self.panel2.imgtk = imgtk
self.panel2.config(image=imgtk)

self.panel3.config(text=self.current_symbol, font=("Courier", 30))

#self.panel4.config(text=self.word, font=("Courier", 30))

self.b1.config(text=self.word1, font=("Courier", 20), wraplength=825, command=self.action1)


self.b2.config(text=self.word2, font=("Courier", 20), wraplength=825, command=self.action2)
self.b3.config(text=self.word3, font=("Courier", 20), wraplength=825, command=self.action3)
self.b4.config(text=self.word4, font=("Courier", 20), wraplength=825, command=self.action4)

self.panel5.config(text=self.str, font=("Courier", 30), wraplength=1025)


except Exception:
print("========")
finally:
self.root.after(1, self.video_loop)

def distance(self,x,y):
return math.sqrt(((x[0] - y[0]) ** 2) + ((x[1] - y[1]) ** 2))

def action1(self):
idx_space = self.str.rfind(" ")
idx_word = self.str.find(self.word, idx_space)

last_idx = len(self.str)
self.str = self.str[:idx_word]
self.str = self.str + self.word1.upper()

def action2(self):
idx_space = self.str.rfind(" ")
idx_word = self.str.find(self.word, idx_space)
last_idx = len(self.str)
self.str=self.str[:idx_word]
self.str=self.str+self.word2.upper()
#self.str[idx_word:last_idx] = self.word2

def action3(self):
idx_space = self.str.rfind(" ")
idx_word = self.str.find(self.word, idx_space)
last_idx = len(self.str)
self.str = self.str[:idx_word]
self.str = self.str + self.word3.upper()

def action4(self):
idx_space = self.str.rfind(" ")
idx_word = self.str.find(self.word, idx_space)
last_idx = len(self.str)
self.str = self.str[:idx_word]
self.str = self.str + self.word4.upper()

def speak_fun(self):
self.speak_engine.say(self.str)
self.speak_engine.runAndWait()

def clear_fun(self):
self.str=" "
self.word1 = " "
self.word2 = " "
self.word3 = " "
self.word4 = " "

def predict(self, test_image):


white=test_image
white = white.reshape(1, 400, 400, 3)
prob = np.array(self.model.predict(white)[0], dtype='float32')
ch1 = np.argmax(prob, axis=0)
prob[ch1] = 0
ch2 = np.argmax(prob, axis=0)
prob[ch2] = 0
ch3 = np.argmax(prob, axis=0)
prob[ch3] = 0

pl = [ch1, ch2]

# condition for [Aemnst]


l = [[5, 2], [5, 3], [3, 5], [3, 6], [3, 0], [3, 2], [6, 4], [6, 1], [6, 2], [6, 6], [6, 7], [6, 0], [6, 5],
[4, 1], [1, 0], [1, 1], [6, 3], [1, 6], [5, 6], [5, 1], [4, 5], [1, 4], [1, 5], [2, 0], [2, 6], [4, 6],
[1, 0], [5, 7], [1, 6], [6, 1], [7, 6], [2, 5], [7, 1], [5, 4], [7, 0], [7, 5], [7, 2]]
if pl in l:
if (self.pts[6][1] < self.pts[8][1] and self.pts[10][1] < self.pts[12][1] and self.pts[14][1] <
self.pts[16][1] and self.pts[18][1] < self.pts[20][
1]):
ch1 = 0
# print("00000")

# condition for [o][s]


l = [[2, 2], [2, 1]]
if pl in l:

if (self.pts[5][0] < self.pts[4][0]):
ch1 = 0
print("++++++++++++++++++")
# print("00000")

# condition for [c0][aemnst]


l = [[0, 0], [0, 6], [0, 2], [0, 5], [0, 1], [0, 7], [5, 2], [7, 6], [7, 1]]
pl = [ch1, ch2]
if pl in l:
if (self.pts[0][0] > self.pts[8][0] and self.pts[0][0] > self.pts[4][0] and self.pts[0][0] >
self.pts[12][0] and self.pts[0][0] > self.pts[16][
0] and self.pts[0][0] > self.pts[20][0]) and self.pts[5][0] > self.pts[4][0]:
ch1 = 2
# print("22222")

# condition for [c0][aemnst]


l = [[6, 0], [6, 6], [6, 2]]
pl = [ch1, ch2]
if pl in l:
if self.distance(self.pts[8], self.pts[16]) < 52:
ch1 = 2
# print("22222")

# condition for [gh][bdfikruvw]


l = [[1, 4], [1, 5], [1, 6], [1, 3], [1, 0]]
pl = [ch1, ch2]

if pl in l:
if self.pts[6][1] > self.pts[8][1] and self.pts[14][1] < self.pts[16][1] and self.pts[18][1] <
self.pts[20][1] and self.pts[0][0] < self.pts[8][
0] and self.pts[0][0] < self.pts[12][0] and self.pts[0][0] < self.pts[16][0] and self.pts[0][0] <
self.pts[20][0]:
ch1 = 3

print("33333c")

# con for [gh][l]


l = [[4, 6], [4, 1], [4, 5], [4, 3], [4, 7]]
pl = [ch1, ch2]
if pl in l:
if self.pts[4][0] > self.pts[0][0]:
ch1 = 3
print("33333b")

# con for [gh][pqz]


l = [[5, 3], [5, 0], [5, 7], [5, 4], [5, 2], [5, 1], [5, 5]]
pl = [ch1, ch2]
if pl in l:
if self.pts[2][1] + 15 < self.pts[16][1]:
ch1 = 3
print("33333a")

# con for [l][x]


l = [[6, 4], [6, 1], [6, 2]]
pl = [ch1, ch2]
if pl in l:
if self.distance(self.pts[4], self.pts[11]) > 55:
ch1 = 4
# print("44444")

# con for [l][d]


l = [[1, 4], [1, 6], [1, 1]]
pl = [ch1, ch2]
if pl in l:
if (self.distance(self.pts[4], self.pts[11]) > 50) and (

self.pts[6][1] > self.pts[8][1] and self.pts[10][1] < self.pts[12][1] and self.pts[14][1] <
self.pts[16][1] and self.pts[18][1] <
self.pts[20][1]):
ch1 = 4
# print("44444")

# con for [l][gh]


l = [[3, 6], [3, 4]]
pl = [ch1, ch2]
if pl in l:
if (self.pts[4][0] < self.pts[0][0]):
ch1 = 4
# print("44444")

# con for [l][c0]


l = [[2, 2], [2, 5], [2, 4]]
pl = [ch1, ch2]
if pl in l:
if (self.pts[1][0] < self.pts[12][0]):
ch1 = 4
# print("44444")

# con for [l][c0]


l = [[2, 2], [2, 5], [2, 4]]
pl = [ch1, ch2]
if pl in l:
if (self.pts[1][0] < self.pts[12][0]):
ch1 = 4
# print("44444")

# con for [gh][z]


l = [[3, 6], [3, 5], [3, 4]]
pl = [ch1, ch2]
if pl in l:

if (self.pts[6][1] > self.pts[8][1] and self.pts[10][1] < self.pts[12][1] and self.pts[14][1] <
self.pts[16][1] and self.pts[18][1] < self.pts[20][
1]) and self.pts[4][1] > self.pts[10][1]:
ch1 = 5
print("55555b")

# con for [gh][pq]


l = [[3, 2], [3, 1], [3, 6]]
pl = [ch1, ch2]
if pl in l:
if self.pts[4][1] + 17 > self.pts[8][1] and self.pts[4][1] + 17 > self.pts[12][1] and self.pts[4][1]
+ 17 > self.pts[16][1] and self.pts[4][
1] + 17 > self.pts[20][1]:
ch1 = 5
print("55555a")

# con for [l][pqz]


l = [[4, 4], [4, 5], [4, 2], [7, 5], [7, 6], [7, 0]]
pl = [ch1, ch2]
if pl in l:
if self.pts[4][0] > self.pts[0][0]:
ch1 = 5
# print("55555")

# con for [pqz][aemnst]


l = [[0, 2], [0, 6], [0, 1], [0, 5], [0, 0], [0, 7], [0, 4], [0, 3], [2, 7]]
pl = [ch1, ch2]
if pl in l:
if self.pts[0][0] < self.pts[8][0] and self.pts[0][0] < self.pts[12][0] and self.pts[0][0] <
self.pts[16][0] and self.pts[0][0] < self.pts[20][0]:
ch1 = 5
# print("55555")

# con for [pqz][yj]

l = [[5, 7], [5, 2], [5, 6]]
pl = [ch1, ch2]
if pl in l:
if self.pts[3][0] < self.pts[0][0]:
ch1 = 7
# print("77777")

# con for [l][yj]


l = [[4, 6], [4, 2], [4, 4], [4, 1], [4, 5], [4, 7]]
pl = [ch1, ch2]
if pl in l:
if self.pts[6][1] < self.pts[8][1]:
ch1 = 7
# print("77777")

# con for [x][yj]


l = [[6, 7], [0, 7], [0, 1], [0, 0], [6, 4], [6, 6], [6, 5], [6, 1]]
pl = [ch1, ch2]
if pl in l:
if self.pts[18][1] > self.pts[20][1]:
ch1 = 7
# print("77777")

# condition for [x][aemnst]


l = [[0, 4], [0, 2], [0, 3], [0, 1], [0, 6]]
pl = [ch1, ch2]
if pl in l:
if self.pts[5][0] > self.pts[16][0]:
ch1 = 6
# print("666661")

# condition for [yj][x]
print("2222 ch1=+++++++++++++++++", ch1, ",", ch2)
l = [[7, 2]]
pl = [ch1, ch2]
if pl in l:
    if self.pts[18][1] < self.pts[20][1] and self.pts[8][1] < self.pts[10][1]:
        ch1 = 6
        # print("666662")

# condition for [c0][x]
l = [[2, 1], [2, 2], [2, 6], [2, 7], [2, 0]]
pl = [ch1, ch2]
if pl in l:
    if self.distance(self.pts[8], self.pts[16]) > 50:
        ch1 = 6
        # print("666663")

# con for [l][x]
l = [[4, 6], [4, 2], [4, 1], [4, 4]]
pl = [ch1, ch2]
if pl in l:
    if self.distance(self.pts[4], self.pts[11]) < 60:
        ch1 = 6
        # print("666664")

# con for [x][d]
l = [[1, 4], [1, 6], [1, 0], [1, 2]]
pl = [ch1, ch2]
if pl in l:
    if self.pts[5][0] - self.pts[4][0] - 15 > 0:
        ch1 = 6
        # print("666665")

# con for [b][pqz]
l = [[5, 0], [5, 1], [5, 4], [5, 5], [5, 6], [6, 1], [7, 6], [0, 2], [7, 1], [7, 4],
     [6, 6], [7, 2], [5, 0], [6, 3], [6, 4], [7, 5], [7, 2]]
pl = [ch1, ch2]
if pl in l:
    if (self.pts[6][1] > self.pts[8][1] and self.pts[10][1] > self.pts[12][1]
            and self.pts[14][1] > self.pts[16][1] and self.pts[18][1] > self.pts[20][1]):
        ch1 = 1
        # print("111111")

# con for [f][pqz]
l = [[6, 1], [6, 0], [0, 3], [6, 4], [2, 2], [0, 6], [6, 2], [7, 6], [4, 6], [4, 1],
     [4, 2], [0, 2], [7, 1], [7, 4], [6, 6], [7, 2], [7, 5], [7, 2]]
pl = [ch1, ch2]
if pl in l:
    if (self.pts[6][1] < self.pts[8][1] and self.pts[10][1] > self.pts[12][1]
            and self.pts[14][1] > self.pts[16][1] and self.pts[18][1] > self.pts[20][1]):
        ch1 = 1
        # print("111112")

l = [[6, 1], [6, 0], [4, 2], [4, 1], [4, 6], [4, 4]]
pl = [ch1, ch2]
if pl in l:
    if (self.pts[10][1] > self.pts[12][1] and self.pts[14][1] > self.pts[16][1]
            and self.pts[18][1] > self.pts[20][1]):
        ch1 = 1
        # print("111112")

# con for [d][pqz]
fg = 19
# print("_________________ch1=",ch1," ch2=",ch2)
l = [[5, 0], [3, 4], [3, 0], [3, 1], [3, 5], [5, 5], [5, 4], [5, 1], [7, 6]]
pl = [ch1, ch2]
if pl in l:
    if ((self.pts[6][1] > self.pts[8][1] and self.pts[10][1] < self.pts[12][1]
            and self.pts[14][1] < self.pts[16][1] and self.pts[18][1] < self.pts[20][1])
            and (self.pts[2][0] < self.pts[0][0]) and self.pts[4][1] > self.pts[14][1]):
        ch1 = 1
        # print("111113")

l = [[4, 1], [4, 2], [4, 4]]
pl = [ch1, ch2]
if pl in l:
    if (self.distance(self.pts[4], self.pts[11]) < 50
            and self.pts[6][1] > self.pts[8][1] and self.pts[10][1] < self.pts[12][1]
            and self.pts[14][1] < self.pts[16][1] and self.pts[18][1] < self.pts[20][1]):
        ch1 = 1
        # print("1111993")

l = [[3, 4], [3, 0], [3, 1], [3, 5], [3, 6]]
pl = [ch1, ch2]
if pl in l:
    if ((self.pts[6][1] > self.pts[8][1] and self.pts[10][1] < self.pts[12][1]
            and self.pts[14][1] < self.pts[16][1] and self.pts[18][1] < self.pts[20][1])
            and (self.pts[2][0] < self.pts[0][0]) and self.pts[14][1] < self.pts[4][1]):
        ch1 = 1
        # print("1111mmm3")

l = [[6, 6], [6, 4], [6, 1], [6, 2]]
pl = [ch1, ch2]
if pl in l:
    if self.pts[5][0] - self.pts[4][0] - 15 < 0:
        ch1 = 1
        # print("1111140")

# con for [i][pqz]
l = [[5, 4], [5, 5], [5, 1], [0, 3], [0, 7], [5, 0], [0, 2], [6, 2], [7, 5], [7, 1], [7, 6], [7, 7]]
pl = [ch1, ch2]
if pl in l:
    if (self.pts[6][1] < self.pts[8][1] and self.pts[10][1] < self.pts[12][1]
            and self.pts[14][1] < self.pts[16][1] and self.pts[18][1] > self.pts[20][1]):
        ch1 = 1

# con for [yj][bfdi]
l = [[1, 5], [1, 7], [1, 1], [1, 6], [1, 3], [1, 0]]
pl = [ch1, ch2]
if pl in l:
    if (self.pts[4][0] < self.pts[5][0] + 15
            and self.pts[6][1] < self.pts[8][1] and self.pts[10][1] < self.pts[12][1]
            and self.pts[14][1] < self.pts[16][1] and self.pts[18][1] > self.pts[20][1]):
        ch1 = 7
        # print("111114lll;;p")

# con for [uvr]
l = [[5, 5], [5, 0], [5, 4], [5, 1], [4, 6], [4, 1], [7, 6], [3, 0], [3, 5]]
pl = [ch1, ch2]
if pl in l:
    if ((self.pts[6][1] > self.pts[8][1] and self.pts[10][1] > self.pts[12][1]
            and self.pts[14][1] < self.pts[16][1] and self.pts[18][1] < self.pts[20][1])
            and self.pts[4][1] > self.pts[14][1]):
        ch1 = 1
        # print("111115")

# con for [w]
fg = 13
l = [[3, 5], [3, 0], [3, 6], [5, 1], [4, 1], [2, 0], [5, 0], [5, 5]]
pl = [ch1, ch2]
if pl in l:
    if (not (self.pts[0][0] + fg < self.pts[8][0] and self.pts[0][0] + fg < self.pts[12][0]
             and self.pts[0][0] + fg < self.pts[16][0] and self.pts[0][0] + fg < self.pts[20][0])
            and not (self.pts[0][0] > self.pts[8][0] and self.pts[0][0] > self.pts[12][0]
                     and self.pts[0][0] > self.pts[16][0] and self.pts[0][0] > self.pts[20][0])
            and self.distance(self.pts[4], self.pts[11]) < 50):
        ch1 = 1
        # print("111116")

# con for [w]
l = [[5, 0], [5, 5], [0, 1]]
pl = [ch1, ch2]
if pl in l:
    if (self.pts[6][1] > self.pts[8][1] and self.pts[10][1] > self.pts[12][1]
            and self.pts[14][1] > self.pts[16][1]):
        ch1 = 1
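
# -------------------------------------------------------------------
# Once the group code in ch1 has been settled above, the blocks below
# map it to a concrete letter: 0 -> {S, A, T, E, M, N}, 2 -> {C, O},
# 3 -> {G, H}, 7 -> {Y, J}, 4 -> L, 6 -> X, 5 -> {P, Q, Z} and
# 1 -> {B, D, F, I, W, K, U, V, R}, using further landmark checks to
# pick the letter inside each group.
# -------------------------------------------------------------------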
if ch1 == 0:
    ch1 = 'S'
    if (self.pts[4][0] < self.pts[6][0] and self.pts[4][0] < self.pts[10][0]
            and self.pts[4][0] < self.pts[14][0] and self.pts[4][0] < self.pts[18][0]):
        ch1 = 'A'
    if (self.pts[4][0] > self.pts[6][0] and self.pts[4][0] < self.pts[10][0]
            and self.pts[4][0] < self.pts[14][0] and self.pts[4][0] < self.pts[18][0]
            and self.pts[4][1] < self.pts[14][1] and self.pts[4][1] < self.pts[18][1]):
        ch1 = 'T'
    if (self.pts[4][1] > self.pts[8][1] and self.pts[4][1] > self.pts[12][1]
            and self.pts[4][1] > self.pts[16][1] and self.pts[4][1] > self.pts[20][1]):
        ch1 = 'E'
    if (self.pts[4][0] > self.pts[6][0] and self.pts[4][0] > self.pts[10][0]
            and self.pts[4][0] > self.pts[14][0] and self.pts[4][1] < self.pts[18][1]):
        ch1 = 'M'
    if (self.pts[4][0] > self.pts[6][0] and self.pts[4][0] > self.pts[10][0]
            and self.pts[4][1] < self.pts[18][1] and self.pts[4][1] < self.pts[14][1]):
        ch1 = 'N'

if ch1 == 2:
    if self.distance(self.pts[12], self.pts[4]) > 42:
        ch1 = 'C'
    else:
        ch1 = 'O'

if ch1 == 3:
    if self.distance(self.pts[8], self.pts[12]) > 72:
        ch1 = 'G'
    else:
        ch1 = 'H'

if ch1 == 7:
    if self.distance(self.pts[8], self.pts[4]) > 42:
        ch1 = 'Y'
    else:
        ch1 = 'J'

if ch1 == 4:
    ch1 = 'L'

if ch1 == 6:
    ch1 = 'X'

if ch1 == 5:
    if (self.pts[4][0] > self.pts[12][0] and self.pts[4][0] > self.pts[16][0]
            and self.pts[4][0] > self.pts[20][0]):
        if self.pts[8][1] < self.pts[5][1]:
            ch1 = 'Z'
        else:
            ch1 = 'Q'
    else:
        ch1 = 'P'

if ch1 == 1:
    if (self.pts[6][1] > self.pts[8][1] and self.pts[10][1] > self.pts[12][1]
            and self.pts[14][1] > self.pts[16][1] and self.pts[18][1] > self.pts[20][1]):
        ch1 = 'B'
    if (self.pts[6][1] > self.pts[8][1] and self.pts[10][1] < self.pts[12][1]
            and self.pts[14][1] < self.pts[16][1] and self.pts[18][1] < self.pts[20][1]):
        ch1 = 'D'
    if (self.pts[6][1] < self.pts[8][1] and self.pts[10][1] > self.pts[12][1]
            and self.pts[14][1] > self.pts[16][1] and self.pts[18][1] > self.pts[20][1]):
        ch1 = 'F'
    if (self.pts[6][1] < self.pts[8][1] and self.pts[10][1] < self.pts[12][1]
            and self.pts[14][1] < self.pts[16][1] and self.pts[18][1] > self.pts[20][1]):
        ch1 = 'I'
    if (self.pts[6][1] > self.pts[8][1] and self.pts[10][1] > self.pts[12][1]
            and self.pts[14][1] > self.pts[16][1] and self.pts[18][1] < self.pts[20][1]):
        ch1 = 'W'
    if (self.pts[6][1] > self.pts[8][1] and self.pts[10][1] > self.pts[12][1]
            and self.pts[14][1] < self.pts[16][1] and self.pts[18][1] < self.pts[20][1]
            and self.pts[4][1] < self.pts[9][1]):
        ch1 = 'K'
    if ((self.distance(self.pts[8], self.pts[12]) - self.distance(self.pts[6], self.pts[10])) < 8
            and self.pts[6][1] > self.pts[8][1] and self.pts[10][1] > self.pts[12][1]
            and self.pts[14][1] < self.pts[16][1] and self.pts[18][1] < self.pts[20][1]):
        ch1 = 'U'
    if ((self.distance(self.pts[8], self.pts[12]) - self.distance(self.pts[6], self.pts[10])) >= 8
            and self.pts[6][1] > self.pts[8][1] and self.pts[10][1] > self.pts[12][1]
            and self.pts[14][1] < self.pts[16][1] and self.pts[18][1] < self.pts[20][1]
            and self.pts[4][1] > self.pts[9][1]):
        ch1 = 'V'
    if (self.pts[8][0] > self.pts[12][0]
            and self.pts[6][1] > self.pts[8][1] and self.pts[10][1] > self.pts[12][1]
            and self.pts[14][1] < self.pts[16][1] and self.pts[18][1] < self.pts[20][1]):
        ch1 = 'R'
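
# In these checks the y-axis grows downward in image coordinates, so
# "pts[6][1] > pts[8][1]" means the index fingertip (landmark 8) lies above
# a lower joint of the same finger (landmark 6), i.e. the index finger is
# extended; the analogous comparisons on landmarks 10/12, 14/16 and 18/20
# test the middle, ring and little fingers.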

if ch1 == 1 or ch1 == 'E' or ch1 == 'S' or ch1 == 'X' or ch1 == 'Y' or ch1 == 'B':
    if (self.pts[6][1] > self.pts[8][1] and self.pts[10][1] < self.pts[12][1]
            and self.pts[14][1] < self.pts[16][1] and self.pts[18][1] > self.pts[20][1]):
        ch1 = " "

print(self.pts[4][0] < self.pts[5][0])


if ch1 == 'E' or ch1 == 'Y' or ch1 == 'B':
    if (self.pts[4][0] < self.pts[5][0]
            and self.pts[6][1] > self.pts[8][1] and self.pts[10][1] > self.pts[12][1]
            and self.pts[14][1] > self.pts[16][1] and self.pts[18][1] > self.pts[20][1]):
        ch1 = "next"

if ch1 in ("next", 'B', 'C', 'H', 'F', 'X'):
    if ((self.pts[0][0] > self.pts[8][0] and self.pts[0][0] > self.pts[12][0]
            and self.pts[0][0] > self.pts[16][0] and self.pts[0][0] > self.pts[20][0])
            and (self.pts[4][1] < self.pts[8][1] and self.pts[4][1] < self.pts[12][1]
                 and self.pts[4][1] < self.pts[16][1] and self.pts[4][1] < self.pts[20][1])
            and (self.pts[4][1] < self.pts[6][1] and self.pts[4][1] < self.pts[10][1]
                 and self.pts[4][1] < self.pts[14][1] and self.pts[4][1] < self.pts[18][1])):
        ch1 = 'space'

if ch1 == "next" and self.prev_char != "next":

Page48of 51
if self.ten_prev_char[(self.count - 2) % 10] != "next":
char_to_append = self.ten_prev_char[(self.count - 2) % 10]
# Replace "Backspace" with a space character
if char_to_append == "space":
char_to_append = " "
self.str += char_to_append
else:
char_to_append = self.ten_prev_char[(self.count - 0) % 10]
# Replace "Backspace" with a space character
if char_to_append == "space":
char_to_append = " "
self.str += char_to_append

if ch1==" " and self.prev_char!=" ":


self.str = self.str + " "

self.prev_char=ch1
self.current_symbol=ch1
self.count += 1
self.ten_prev_char[self.count%10]=ch1
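
# self.ten_prev_char acts as a small ring buffer of the last ten recognised
# symbols; a "next" gesture commits the symbol recorded two updates earlier
# to the sentence string, which smooths over single-frame misclassifications.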

if len(self.str.strip()) != 0:
    st = self.str.rfind(" ")
    ed = len(self.str)
    word = self.str[st + 1:ed]
    self.word = word
    print("----------word = ", word)
    if len(word.strip()) != 0:
        hs.check(word)
        lenn = len(hs.suggest(word))
        if lenn >= 4:
            self.word4 = hs.suggest(word)[3]
        if lenn >= 3:
            self.word3 = hs.suggest(word)[2]
        if lenn >= 2:
            self.word2 = hs.suggest(word)[1]
        if lenn >= 1:
            self.word1 = hs.suggest(word)[0]
    else:
        self.word1 = " "
        self.word2 = " "
        self.word3 = " "
        self.word4 = " "

def destructor(self):
    print("Closing Application...")
    # print(self.ten_prev_char)
    self.root.destroy()
    self.vs.release()
    cv2.destroyAllWindows()
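
# The destructor is typically wired to the Tk window-close event elsewhere in
# the class, e.g. self.root.protocol("WM_DELETE_WINDOW", self.destructor)
# (an assumption about setup code not shown in this listing), so that the
# webcam and OpenCV windows are released when the GUI is closed.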

print("Starting Application...")

(Application()).root.mainloop()

OUTPUT:

