
SIGN LANGUAGE RECOGNITION USING

MACHINE LEARNING

A Minor Project Report


Submitted in partial fulfillment of requirement of the
Degree of
BACHELOR OF TECHNOLOGY in COMPUTER SCIENCE &
ENGINEERING
BY
VIVEK PARIHAR (EN21CS301874)
YAMAN KUMAR SAHOO (EN21CS301878)
YOGESH SEPTA (EN21CS301890)

Under the Guidance of


Prof. Rashmi Choudhary

Department of Computer Science & Engineering


Faculty of Engineering
MEDI-CAPS UNIVERSITY, INDORE- 453331

APRIL-2024
Report Approval

The project work “SIGN LANGUAGE RECOGNITION USING MACHINE LEARNING” is hereby approved as a creditable study of an engineering/computer application subject, carried out and presented in a manner satisfactory to warrant its acceptance as a prerequisite for the Degree for which it has been submitted.

It is to be understood that by this approval the undersigned do not endorse or approve any statement made, opinion expressed, or conclusion drawn therein; they approve the “Project Report” only for the purpose for which it has been submitted.

Internal Examiner
Name:
Designation
Affiliation

External Examiner
Name:
Designation
Affiliation
Declaration

I/We hereby declare that the project entitled “SIGN LANGUAGE RECOGNITION USING MACHINE LEARNING”, submitted in partial fulfillment for the award of the degree of Bachelor of Technology/Master of Computer Applications in ‘COMPUTER SCIENCE & ENGINEERING’ and completed under the supervision of Prof. Rashmi Choudhary, Professor, Department of Computer Science and Engineering, Faculty of Engineering, Medi-Caps University, Indore, is an authentic work.

Further, I/we declare that the content of this project work, in full or in parts, has neither been taken from any other source nor been submitted to any other Institute or University for the award of any degree or diploma.

Signature and name of the student(s) with date

VIVEK PARIHAR (EN21CS301874) ______________________

YAMAN KUMAR SAHOO (EN21CS301878) _______________

YOGESH SEPTA (EN21CS301890)______________________


Certificate

I/We, Prof. Rashmi Choudhary, certify that the project entitled “SIGN LANGUAGE RECOGNITION USING MACHINE LEARNING”, submitted in partial fulfillment for the award of the degree of Bachelor of Technology/Master of Computer Applications by VIVEK PARIHAR, YAMAN KUMAR SAHOO, and YOGESH SEPTA, is a record of the work carried out by them under my/our guidance, and that the work has not formed the basis for the award of any other degree elsewhere.

________________________________ __________________________

Prof. Rashmi Choudhary External Guide


Computer Science & Engineering NAME: __________________

Medi-Caps University, Indore

_____________________

Dr. Ratnesh Litoriya

Head of the Department


Computer Science & Engineering

Medi-Caps University, Indore


Acknowledgements

I would like to express my deepest gratitude to the Honorable Chancellor, Shri R C Mittal, who
has provided me with every facility to successfully carry out this project, and my profound
indebtedness to Prof. (Dr.) D. K. Patnaik, Vice Chancellor, Medi-Caps University, whose unfailing
support and enthusiasm have always boosted my morale. I also thank Prof. (Dr.) Pramod S. Nair,
Dean, Faculty of Engineering, Medi-Caps University, for giving me a chance to work on this project.
I would also like to thank my Head of the Department Dr. Ratnesh Litoriya for his continuous
encouragement for the betterment of the project.

I express my heartfelt gratitude to my Internal Guide, Prof. Rashmi Choudhary, without whose continuous help and support this project would never have reached completion.

It is their help and support due to which we were able to complete the design and technical report.

Without their support this report would not have been possible.

VIVEK PARIHAR
(EN21CS301874)
YAMAN KUMAR SAHOO
(EN21CS301878)
YOGESH SEPTA
(EN21CS301890)
B.Tech. III Year
Department of Computer Science & Engineering
Faculty of Engineering
Medi-Caps University, Indore
Abstract
The Sign Language Recognition System is a technology designed to understand and interpret
sign language gestures. It involves the collection of diverse sign language datasets using devices such as sensor-equipped gloves or cameras. Preprocessing techniques clean and enhance the captured
data, and feature extraction identifies key aspects of the gestures. Machine learning models, such
as Convolutional Neural Networks or Recurrent Neural Networks, are trained on the data to
associate hand movements with specific meanings. The system is validated and tested for
accuracy, and a user interface is implemented for communication. Real-time processing enables
immediate recognition, and continuous improvement is achieved through updates and user
feedback. This technology facilitates communication between individuals using sign language
and those who may not understand it, contributing to inclusivity and accessibility.

The primary objective of this system is to alleviate communication challenges faced by the
hearing-impaired community by automating the recognition of sign language gestures in real-
time. By utilizing advanced machine learning algorithms, the system can interpret and translate
sign language gestures into meaningful and accessible information. This technological
innovation not only promotes inclusivity but also fosters independence for individuals with
hearing impairments, allowing them to communicate effectively and seamlessly in various
contexts. The integration of these powerful technologies showcases a holistic approach to
bridging communication gaps and creating a more inclusive environment for the hearing-
impaired population.

Keywords:

• Machine Learning
• User Interface
• CNN
• Hearing Impaired
• Innovation
• Accessibility
• Gesture Recognition.
Table of Contents

Report Approval
Declaration
Certificate
Acknowledgements
Abstract
Table of Contents
List of Figures
Abbreviations
Notations & Symbols

Chapter 1 Introduction
1.1 Introduction
1.2 Literature Review
1.3 Objectives
1.4 Significance
1.5 Research Design
1.6 Source of Data

Chapter 2 REQUIREMENTS SPECIFICATION
2.1 User Characteristics
2.2 Functional Requirements
2.3 Dependencies
2.4 Performance Requirements
2.5 Hardware Requirements
2.6 Constraints & Assumptions

Chapter 3 DESIGN
3.1 Algorithm
3.2 Finger Spelling Sentence Formation Implementation
3.3 System Design
3.3.1 Data Flow Diagrams (Level 0, Level 1)
3.3.2 Activity Diagram
3.3.3 Flow Chart
3.3.4 Class Diagram
3.3.5 ER Diagram
3.3.6 Sequence Diagram
3.3.7 Use-Case Diagram

Chapter 4 Implementation, Testing, and Maintenance
4.1 Introduction to Languages, IDEs, Tools and Technologies used for Implementation
4.2 Testing Techniques and Test Plans
4.3 End User Instructions

Chapter 5 Results and Discussions
5.1 User Interface Representation
5.2 Snapshots of system with brief detail of each
5.3 Brief Description of Various Modules of the system

Chapter 6 Summary and Conclusions
Chapter 7 Future Scope
Appendix
Bibliography
List of Figures

Fig. 1 Data Flow Diagrams (3.3.1.1 and 3.3.1.2)

Fig. 2 Activity Diagram (3.3.2)

Fig. 3 Flow Chart (3.3.3)

Fig. 4 Class Diagram (3.3.4)

Fig. 5 ER Diagram (3.3.5)

Fig. 6 Sequence Diagram (3.3.6)

Fig. 7 Use-Case Diagram (3.3.7)

Fig. 8 CNN (4.1.2.1)

Fig. 9 Dataset Generation (4.2.1.1 and 4.2.1.2)

Fig. 10 Gesture Classification (4.2.2.1)

Fig. 11 UI Representation (5.1)

Fig. 12 Training Data Collection (5.2.1.1 and 5.2.1.2)

Fig. 13 Testing Data Collection (5.2.2.1)

Fig. 14 Final Application (5.2.3)


Abbreviations
1. CNN - Convolutional Neural Network

2. ML - Machine Learning

3. ASL - American Sign Language

4. RNN - Recurrent Neural Network

5. CRNN - Convolutional Recurrent Neural Network

6. HOG - Histogram of Oriented Gradients

7. LBP - Local Binary Patterns

8. SVM - Support Vector Machine

9. PCA - Principal Component Analysis

10. ROI - Region of Interest

11. FPS - Frames Per Second

12. API - Application Programming Interface

13. GUI - Graphical User Interface

14. JSON - JavaScript Object Notation

15. CSV - Comma-Separated Values

16. RGB - Red, Green, Blue (color model)

17. HSV - Hue, Saturation, Value (color model)

18. CUDA - Compute Unified Device Architecture

19. GPU - Graphics Processing Unit


Notations & Symbols

CHAPTER-1

Introduction

1.1 Introduction
Sign Language Recognition is a technology designed to bridge communication gaps between
individuals who use sign language and those who may not understand it. Sign language is a visual-
gestural language used by the deaf and hard of hearing community for communication. Sign
Language Recognition systems utilize advanced technologies such as sensors, machine learning,
and computer vision to interpret and translate sign language gestures into written or spoken
language.
The primary goal of Sign Language Recognition is to enable effective communication between
individuals who use sign language as their primary means of expression and those who rely on
spoken or written language. These systems play a crucial role in fostering inclusivity and
accessibility, breaking down barriers that may exist in everyday communication for individuals
with hearing impairments.
Validation and testing phases ensure the accuracy and reliability of the system, and a user
interface is implemented to convey the recognized gestures, making communication accessible
to a wider audience. Real-time processing capabilities enable immediate recognition, making the
technology practical for various applications.
Continuous improvement is a fundamental aspect of Sign Language Recognition systems,
allowing for updates, refinement, and adaptation over time. This iterative process ensures that the
system remains effective, accommodating different sign language variations and user needs.

1.2 Literature Review

• Scholars investigate the linguistic properties of sign languages, treating them as complete and unique languages with their own grammar, syntax, and semantics.
• Research explores how the brain processes sign language, delving into cognitive and neurological aspects to understand how sign language is perceived, produced, and represented in the brain.

• Studies in sign language education focus on effective teaching methods, curriculum design, and the outcomes of sign language learning, particularly among deaf individuals.
• Researchers explore the application of technology, including computer vision and machine learning, to develop sign language recognition systems, aiming to enhance communication and accessibility.
• Research takes a global perspective, considering regional variations in sign languages, efforts to standardize or preserve them, and the recognition of sign languages at the national and international levels.
• A comprehensive literature review explores existing research on sign language recognition, machine learning, and related technologies.

1.3 Objectives

Facilitating Communication: Enable effective communication between individuals who use sign language and those who may not understand sign language, promoting inclusivity and bridging communication gaps.

Accessibility: Improve accessibility for deaf and hard-of-hearing individuals in various settings, such as education, employment, healthcare, and public services.

Education Support: Assist in educational settings by providing tools for sign language instruction, communication aids, and resources for deaf students, educators, and learners.

Enhancing Daily Life: Enhance the overall quality of life for individuals who rely on sign language as their primary mode of communication in daily interactions.

Educational and Professional Opportunities: Open up educational and professional opportunities for individuals who use sign language by enabling effective communication with a broader audience.

Cultural Preservation: Contribute to the preservation and promotion of deaf culture by recognizing the importance of sign language as a linguistic and cultural expression.

Real-time Communication: Enable real-time recognition and interpretation of sign language gestures to facilitate immediate and seamless communication.


Continuous Improvement: Evolve and improve over time by incorporating user feedback,
updating datasets, and refining algorithms to enhance the accuracy and effectiveness of sign
language recognition systems.

1.4 Significance

Sign language recognition is pivotal for accessibility, empowering deaf individuals to communicate
effectively with the broader community. It fosters inclusivity, breaking down communication
barriers and promoting equal participation. This technology enhances educational opportunities,
supports cultural preservation, and drives technological advancement. By acknowledging sign
language, societies affirm the rights and identities of deaf communities, ensuring legal recognition
and equal access to services. Overall, sign language recognition transforms lives, enabling
individuals to navigate the world more independently and fostering a more inclusive and
understanding society.

1.5 Research Design


The research design for sign language recognition typically involves data collection of sign language
gestures, preprocessing for noise reduction, feature extraction to capture relevant information, and
machine learning algorithms for classification. Researchers may utilize datasets of sign language
videos or motion capture data. Evaluation metrics such as accuracy, precision, and recall are
employed to assess the performance of the recognition system. Cross-validation techniques are often
used to validate the model's generalization capability. Additionally, user studies may be conducted
to evaluate the usability and effectiveness of the recognition system in real-world scenarios, ensuring
its practical applicability for the deaf community.

1.6 Source of Data


The main source of data is sign language gestures that we collected ourselves using OpenCV, which enables the camera to capture images; we then save those images in the dataset folders (the trainingData folder).
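A minimal sketch of this collection step is shown below. The folder layout (dataSet/trainingData/&lt;label&gt;), the ROI coordinates, and the 'c'/'q' key bindings are illustrative assumptions, not the exact script used in the project.

```python
# Sketch: capture ROI frames from the webcam and save them as training images.
import os
import cv2

label = "A"                                     # symbol currently being recorded
out_dir = os.path.join("dataSet", "trainingData", label)
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(0)
count = len(os.listdir(out_dir))                # continue numbering from existing files

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)
    x1, y1, x2, y2 = 320, 10, 620, 310           # region of interest for the hand
    roi = frame[y1:y2, x1:x2].copy()             # crop before drawing the rectangle
    cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
    cv2.imshow("Frame", frame)

    key = cv2.waitKey(10) & 0xFF
    if key == ord("c"):                          # press 'c' to capture one sample
        cv2.imwrite(os.path.join(out_dir, f"{count}.jpg"), roi)
        count += 1
    elif key == ord("q"):                        # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```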


CHAPTER-2
REQUIREMENTS SPECIFICATION

2.1 User Characteristics


User characteristics in sign language recognition refer to the individuals or groups who
interact with or benefit from the technology. They include:

1. Deaf and Hard of Hearing Individuals:


Primary users of sign language recognition systems, who rely on them for communication
and interaction in various contexts.

2. Sign Language Interpreters:


Professionals who facilitate communication between deaf individuals and others, who may
use sign language recognition tools to aid their work.

3. Educators and Researchers:


Individuals involved in teaching sign language, studying its linguistics, or conducting
research on sign language recognition technology.

4. Developers and Engineers:


Those responsible for creating and improving sign language recognition systems, including
software developers, engineers, and researchers in the field of computer vision and machine
learning.

5. Healthcare Professionals:
Medical personnel who may use sign language recognition technology to communicate with
deaf patients or provide healthcare information in sign language.

6. Students and Language Learners:


Individuals learning sign language who may use recognition systems as educational tools to
improve their proficiency.


2.2 Functional Requirements


Functional requirements for sign language recognition systems involve features and capabilities
necessary for effective communication and interaction with deaf individuals. These include:

1. Gesture Detection: Accurately identifying and distinguishing individual signs and gestures in real-time or recorded video.

2. Vocabulary Expansion: Supporting a wide range of signs and gestures to accommodate different sign languages and dialects.

3. Continuous Recognition: Capable of recognizing continuous signing sequences rather than just isolated signs, allowing for more natural communication.

4. Multi-person Recognition: Ability to recognize signs from multiple individuals simultaneously, facilitating group interactions.

5. Adaptability: The system should be adaptable to various lighting conditions, backgrounds, and signing speeds to ensure robust performance in different environments.

6. Feedback Mechanism: Providing immediate feedback to users, such as text or audio translations, to enhance communication effectiveness.

7. Customization Options: Allowing users to customize preferences, such as sign language dialect or signing speed, to accommodate individual communication styles.

8. Integration: Seamless integration with other communication technologies and platforms, such as video conferencing software or mobile applications, for widespread accessibility.

9. Accuracy and Reliability: Ensuring high accuracy and reliability in gesture recognition to
minimize misinterpretations and errors in communication.

10. User-friendly Interface: Providing an intuitive and easy-to-use interface for both deaf
users and communication partners to facilitate smooth interactions.

2.3 Dependencies
Dependencies of sign language recognition requirements include:

1. Data Quality: Accurate recognition depends on high-quality training data, including diverse sign language gestures captured in different contexts and lighting conditions.

2. Algorithm Development: Advanced machine learning algorithms are needed to process and analyze sign language gestures effectively, requiring expertise in computer vision and artificial intelligence.


3. Hardware Compatibility: The system's performance may depend on the hardware used,
such as cameras or motion sensors, requiring compatibility and optimization for specific
devices.

4. User Feedback: Continuous user feedback is essential for refining and improving
recognition accuracy and usability based on real-world usage scenarios.

5. Ethical Considerations: Ensuring ethical data collection and usage practices, including
user consent and privacy protection, are crucial dependencies for responsible sign language
recognition development.

2.4 Performance Requirements

Performance requirements for sign language recognition systems aim to ensure efficient and
accurate communication between users. These requirements include:

1. Accuracy: The system must accurately recognize sign language gestures to facilitate
effective communication, with high precision and recall rates.

2. Real-time Processing: The system should process sign language gestures in real-time to
enable fluid and natural interactions without significant delays.

3. Scalability: It should be capable of handling a large vocabulary of signs and gestures to accommodate various communication needs and linguistic diversity.

4. Robustness: The system should perform reliably across different environmental conditions, such as varying lighting, backgrounds, and signing speeds.

5. Adaptability: It should adapt to different users' signing styles and preferences, ensuring
accurate recognition for individuals with diverse communication styles.

6. Latency: Minimal latency in gesture recognition is essential for smooth and natural
communication, especially in interactive settings like video conferencing.

7. Resource Efficiency: The system should operate efficiently, minimizing computational resources and energy consumption, particularly for mobile or embedded applications.

8. User Experience: It should provide a seamless and intuitive user experience, with clear feedback mechanisms and minimal user effort required for interaction.


2.5 Hardware Requirements


Hardware requirements for sign language recognition systems depend on the specific
implementation and deployment scenarios. However, some common hardware components and
considerations include:

1. Cameras: High-resolution cameras capable of capturing clear video footage of sign language gestures are essential. The number and placement of cameras may vary depending on the application, such as desktop computers, smartphones, or wearable devices.

2. Processing Units: Powerful processors or dedicated hardware accelerators are needed for
real-time processing of video data and running machine learning algorithms for gesture
recognition. This may include CPUs, GPUs, or specialized chips like TPUs or FPGAs.

3. Memory: Sufficient RAM is necessary to store and process video frames, intermediate
data, and model parameters during gesture recognition tasks.

4. Storage: Adequate storage space may be required for storing training data, pre-trained
models, and application data, depending on the system's requirements.

5. Sensors: Additional sensors, such as depth sensors or accelerometers, may enhance gesture
recognition accuracy or provide contextual information about the user's movements.

6. Connectivity: Reliable network connectivity, such as Wi-Fi or mobile data, may be necessary for accessing cloud-based services, performing updates, or transmitting data in remote deployment scenarios.

7. Power Supply: Depending on the deployment environment, power considerations are essential to ensure uninterrupted operation. This includes battery life for portable devices or power backup solutions for stationary setups.

8. Integration: Hardware components should be seamlessly integrated with the software components of the sign language recognition system, ensuring compatibility and optimal performance.

2.6 Constraints & Assumptions


Constraints and assumptions in sign language recognition systems include:

Constraints:

1. Hardware Limitations: The system's performance may be limited by the processing power, memory, and storage capacity of the hardware used, especially in resource-constrained environments.


2. Data Availability: Limited availability of diverse and high-quality sign language datasets
for training and testing may constrain the system's accuracy and robustness.

3. Environmental Factors: Factors such as varying lighting conditions, background clutter, and occlusions can affect the system's performance, especially in real-world settings.

4. Computation Time: Real-time processing requirements impose constraints on the computational complexity of algorithms, impacting system responsiveness.

5. User Variability: Variability in signing styles, gestures, and hand shapes among different
individuals poses challenges for accurate recognition.

Assumptions:
1. Standardized Gestures: Assumes a standardized set of sign language gestures and
vocabulary for recognition, which may not fully capture the diversity of sign languages and
dialects.

2. Clear Line of Sight: Assumes an unobstructed view of the signer's hands for accurate
gesture detection, which may not always be feasible in practical scenarios.

3. Stable Environment: Assumes a stable and controlled environment during system operation to minimize the impact of environmental factors on recognition accuracy.

4. Limited Vocabulary: Assumes a limited vocabulary of signs and gestures for recognition,
which may not cover all possible communication needs.

5. User Cooperation: Assumes user cooperation and willingness to adapt signing behavior or
provide feedback for system improvement, which may vary among individuals.


CHAPTER-3

DESIGN

3.1 Algorithm

Algorithm Layer 1:

1. Apply a Gaussian Blur filter and threshold to the frame captured with OpenCV to get the processed image after feature extraction.
2. This processed image is passed to the CNN model for prediction, and if a letter is detected for more than 50 frames, the letter is printed and taken into consideration for forming the word.
3. Space between the words is considered using the blank symbol.

Algorithm Layer 2:

1. We detect various sets of symbols which show similar results on getting detected.
2. We then classify between those sets using classifiers made for those sets only.

Layer 1:

• CNN Model:

1. 1st Convolution Layer: The input picture has a resolution of 128x128 pixels. It is first processed in the first convolutional layer using 32 filter weights (3x3 pixels each). This results in a 126x126 pixel image, one for each filter weight.
2. 1st Pooling Layer: The pictures are downsampled using max pooling of 2x2, i.e., we keep the highest value in each 2x2 square of the array. Therefore, our picture is downsampled to 63x63 pixels.
3. 2nd Convolution Layer: The 63x63 output of the first pooling layer serves as the input to the second convolutional layer. It is processed in the second convolutional layer using 32 filter weights (3x3 pixels each). This results in a 60x60 pixel image.
4. 2nd Pooling Layer: The resulting images are downsampled again using max pooling of 2x2 and reduced to a resolution of 30x30.


5. 1st Densely Connected Layer: These images are used as the input to a fully connected layer with 128 neurons; the output from the second pooling layer is reshaped into an array of 30x30x32 = 28800 values, so the input to this layer is an array of 28800 values. The output of this layer is fed to the 2nd Densely Connected Layer. We use a dropout layer with a rate of 0.5 to avoid overfitting.
6. 2nd Densely Connected Layer: The output from the 1st Densely Connected Layer is used as the input to a fully connected layer with 96 neurons.
7. Final Layer: The output of the 2nd Densely Connected Layer serves as the input for the final layer, which has as many neurons as the number of classes we are classifying (alphabets + blank symbol).
• Activation Function:
We have used ReLU (Rectified Linear Unit) in each of the layers (convolutional as well as fully connected).
ReLU calculates max(x, 0) for each input pixel. This adds nonlinearity to the formula and helps the network learn more complicated features. It also helps in mitigating the vanishing gradient problem and speeds up training by reducing the computation time.

• Pooling Layer:
We apply max pooling to the input image with a pool size of (2, 2) along with the ReLU activation function. This reduces the number of parameters, thus lessening the computation cost and reducing overfitting.

• Dropout Layers:
Overfitting occurs when, after training, the weights of the network are so tuned to the training examples that the network does not perform well when given new examples. A dropout layer “drops out” a random set of activations in that layer by setting them to zero. The network should then be able to provide the right classification or output for a specific example even if some of the activations are dropped out [5].

• Optimizer:
We have used Adam optimizer for updating the model in response to the output of the loss
function.


The Adam optimizer combines the advantages of two extensions of stochastic gradient descent, namely the adaptive gradient algorithm (AdaGrad) and root mean square propagation (RMSProp).
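A minimal Keras sketch of the network described above is given below; the single-channel 128x128 input and the 27-class output (26 alphabets + blank) are assumptions based on this description, not the exact training script.

```python
# Sketch of the two-convolution CNN described in this section.
from tensorflow.keras import layers, models

num_classes = 27  # assumed: 26 alphabets + blank symbol

model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),                 # 128x128 grayscale frame
    layers.Conv2D(32, (3, 3), activation="relu"),      # 1st convolution layer
    layers.MaxPooling2D((2, 2)),                       # 1st pooling layer
    layers.Conv2D(32, (3, 3), activation="relu"),      # 2nd convolution layer
    layers.MaxPooling2D((2, 2)),                       # 2nd pooling layer
    layers.Flatten(),                                  # ~30x30x32 = 28800 values
    layers.Dense(128, activation="relu"),              # 1st densely connected layer
    layers.Dropout(0.5),                               # dropout to reduce overfitting
    layers.Dense(96, activation="relu"),               # 2nd densely connected layer
    layers.Dense(num_classes, activation="softmax"),   # final classification layer
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```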

Layer 2:

We are using two layers of algorithms to verify and predict symbols that are more similar to each other, so that we can get as close as possible to detecting the symbol actually shown. In our testing we found that the following symbols were not being recognized reliably and were often confused with other symbols:

1. For D: R and U
2. For U: D and R
3. For I: T, D, K and I
4. For S: M and N

So, to handle the above cases, we made three different classifiers for classifying these sets:
1. {D, R, U}
2. {T, K, D, I}
3. {S, M, N}

3.2 Finger Spelling Sentence Formation Implementation:

1. Whenever the count of a detected letter exceeds a specific value, and no other letter is close to it within a threshold, we print the letter and add it to the current string (in our code we kept the value as 50 and the difference threshold as 20).
2. Otherwise, we clear the current dictionary, which holds the detection counts of the present symbol, to avoid the probability of a wrong letter getting predicted.
3. Whenever the count of detected blanks (plain background) exceeds a specific value and the current buffer is empty, no space is detected.
4. Otherwise, it predicts the end of the word by printing a space, and the current word gets appended to the sentence below.
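A small Python sketch of this counting logic follows; the dictionary-based bookkeeping and variable names are illustrative assumptions.

```python
# Sketch of the letter/word accumulation logic described above.
DETECTION_THRESHOLD = 50   # frames a symbol must be seen before it is accepted
DIFF_THRESHOLD = 20        # runner-up must trail by at least this many frames

counts = {}                # per-symbol detection counts for the current letter
current_word = ""
sentence = ""

def on_prediction(symbol):
    """Consume one per-frame prediction ('blank' for plain background)."""
    global counts, current_word, sentence
    counts[symbol] = counts.get(symbol, 0) + 1

    if symbol == "blank":
        if counts[symbol] > DETECTION_THRESHOLD and current_word:
            sentence += current_word + " "       # end of word: add a space
            current_word = ""
            counts = {}
        return

    if counts[symbol] > DETECTION_THRESHOLD:
        others = [c for s, c in counts.items() if s != symbol]
        runner_up = max(others) if others else 0
        if counts[symbol] - runner_up > DIFF_THRESHOLD:
            current_word += symbol               # accept the letter
        counts = {}                              # reset counts either way
```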


3.3 System Design:

3.3.1 Data Flow Diagram (Level 0):


3.3.1 Data Flow Diagram (Level 1):


3.3.2 Activity Diagram:


3.3.3 Flow Chart:


3.3.4 Class Diagram:


3.3.5 ER Diagram:


3.3.6 Sequence Diagram:


3.3.7 Use-Case Diagram:


CHAPTER-4
Implementation, Testing, and Maintenance

4.1 Introduction to Languages, IDEs, Tools, and Technologies used for Implementation

4.1.1 Python:
Python is a high-level, interpreted programming language celebrated for its simplicity, readability,
and versatility. It was conceived by Guido van Rossum and introduced in 1991, emphasizing clean
syntax and ease of use for developers across skill levels. Noteworthy attributes of Python include its
straightforward and comprehensible syntax, which favors readability and clear code organization
through indentation rather than complex symbols. Being an interpreted language, Python executes
code line by line via an interpreter, enabling swift development and experimentation. Additionally,
Python offers an array of high-level data types like lists, dictionaries, tuples, sets, and strings,
simplifying data manipulation tasks. Its extensive standard library encompasses modules for various
functionalities such as file handling, networking, web development, and more, reducing reliance on
external dependencies. Python's dynamic typing determines variable types during runtime,
complemented by strong typing to catch type errors during execution. Furthermore, Python boasts
cross-platform compatibility, running seamlessly on diverse operating systems like Windows,
macOS, and Linux. Its vast and active community contributes to ongoing development, creates
libraries and frameworks, and offers support through various channels, solidifying Python's position
as a leading programming language. Widely utilized in web development, data analysis, artificial
intelligence, scientific computing, automation, and scripting, Python has gained immense popularity
for its simplicity, flexibility, and extensive ecosystem.

4.1.2 CNN:
Convolutional Neural Networks (CNNs) are a class of deep learning models specifically designed
for processing structured grid-like data, such as images. Introduced in the 1980s, CNNs have
revolutionized the field of computer vision and have become the cornerstone of various applications,
including image classification, object detection, facial recognition, and more.

CNNs are characterized by their hierarchical structure, consisting of multiple layers, including
convolutional layers, pooling layers, and fully connected layers. In a CNN, convolutional layers
apply convolutional operations to extract features from input images. These layers use learnable
filters or kernels to convolve over the input data, capturing local patterns and spatial dependencies.
Pooling layers then downsample the feature maps obtained from the convolutional layers, reducing
their spatial dimensions while retaining important information.


Through repeated application of convolutional and pooling layers, CNNs learn to hierarchically
extract increasingly abstract features from the input images. The final layers of a CNN typically
consist of one or more fully connected layers, which perform classification or regression tasks based
on the extracted features.

CNNs are trained using large datasets through the process of supervised learning, where input
images are labeled with corresponding classes or attributes. During training, the network learns to
optimize its parameters (such as filter weights and biases) to minimize the discrepancy between
predicted and actual labels, typically using backpropagation and gradient descent optimization
algorithms.

The success of CNNs can be attributed to their ability to automatically learn hierarchical
representations directly from raw data, without the need for handcrafted features. This makes CNNs
highly effective in a wide range of visual recognition tasks, leading to their widespread adoption in
both academic research and industrial applications.

Fig 4.1.2.1

4.1.3 VS Code:
Visual Studio Code (VS Code) is a free and open-source code editor developed by Microsoft.
Launched in 2015, it quickly gained popularity among developers for its lightweight yet powerful
features. Built on top of the Electron framework, VS Code is highly customizable, allowing
developers to tailor it to their preferences with extensions, themes, and settings.

VS Code supports a wide range of programming languages and features built-in support for syntax
highlighting, code completion, and debugging. It offers an integrated terminal, version control
through Git, and seamless integration with various tools and services, making it suitable for a diverse
range of development workflows.

One of the key strengths of VS Code is its extensive extension ecosystem, with thousands of
extensions available for enhancing functionality, adding new features, and supporting additional

languages and frameworks. These extensions are contributed by both Microsoft and the community,
further extending the capabilities of the editor.

Overall, VS Code provides developers with a highly productive and efficient environment for
writing code, debugging, and collaborating on projects. Its popularity continues to grow, making it
a top choice for developers across different platforms and programming languages.

4.1.4 Tensorflow:
TensorFlow is an end-to-end open-source platform for Machine Learning. It has a
comprehensive, flexible ecosystem of tools, libraries and community resources that lets

researchers push the state-of-the-art in Machine Learning and developers easily build and deploy
Machine Learning powered applications.

TensorFlow offers multiple levels of abstraction so you can choose the right one for your needs.
Build and train models by using the high-level Keras API, which makes getting started with
TensorFlow and machine learning easy.

If you need more flexibility, eager execution allows for immediate iteration and intuitive debugging.
For large ML training tasks, use the Distribution Strategy API for distributed training on different
hardware configurations without changing the model definition.

4.1.5 Keras:
Keras is a user-friendly, high-level deep learning library for Python. It simplifies the creation and
training of neural networks through an intuitive API, enabling rapid prototyping and
experimentation. With seamless integration with TensorFlow and other backends, Keras allows for
efficient execution of neural network computations, including GPU acceleration. Its modular design
and extensibility make it adaptable to diverse research needs and project requirements, contributing
to its widespread adoption in academia and industry for developing and training neural network
models.

4.1.6 OpenCV:
OpenCV (Open Source Computer Vision Library) is a powerful open-source library for computer
vision and image processing tasks in Python, C++, and other programming languages. It provides a
wide range of functionalities for tasks such as image and video processing, object detection and
tracking, feature extraction, and more. With its extensive collection of algorithms and tools,
OpenCV simplifies the development of computer vision applications, making it popular among

researchers, developers, and hobbyists alike. Its versatility, ease of use, and robustness have made
it a go-to choice for a wide range of projects, from simple image filtering to complex computer
vision applications in various domains like robotics, healthcare, automotive, and surveillance.

4.1.7 Tkinter:
Tkinter is a Python library for creating graphical user interfaces (GUIs). It simplifies the
development of desktop applications by providing widgets and event-driven programming. It's
widely used for building interactive interfaces due to its ease of use and integration with Python's
standard library.

4.1.8 Hunspell:
Hunspell is a spell checking and morphological analysis library used in various applications for
language processing and correction.

4.1.9 Pyttsx3:
Pyttsx3 is a Python library for text-to-speech (TTS) conversion. It provides a simple interface to
convert text strings into spoken audio using different speech engines, such as the Microsoft Speech
API (SAPI5) on Windows. Pyttsx3 supports various features like changing voice characteristics,
adjusting speech rate, and more, making it useful for creating speech-enabled applications, assistive
technologies, and automated systems.
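A minimal usage sketch (the voice rate value is an illustrative assumption):

```python
# Speak a recognized sentence aloud with pyttsx3.
import pyttsx3

engine = pyttsx3.init()            # picks an available TTS backend (e.g. SAPI5 on Windows)
engine.setProperty("rate", 150)    # speaking rate in words per minute
engine.say("Hello, how are you")
engine.runAndWait()
```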

4.1.10 NumPy:
NumPy is a powerful Python library for numerical computing that provides support for large, multi-
dimensional arrays and matrices, along with a collection of mathematical functions to operate on
these arrays efficiently. It is widely used in scientific computing, data analysis, and machine learning
due to its speed and ease of use. NumPy's array operations are implemented in C, making them fast
and suitable for handling large datasets and complex mathematical computations in Python.

4.1.11 Machine Learning:


ML, or Machine Learning, is a branch of artificial intelligence (AI) that focuses on developing
algorithms and techniques that enable computers to learn from and make predictions or decisions
based on data. Instead of being explicitly programmed to perform a specific task, ML algorithms
learn from example data to improve their performance over time. ML encompasses various
approaches, including supervised learning, unsupervised learning, and reinforcement learning, and
is widely used in diverse applications such as predictive analytics, pattern recognition, natural
language processing, and computer vision.

4.1.12 Supervised Learning:


Supervised learning is a machine learning approach where the algorithm learns from labeled data,
consisting of input-output pairs. The goal is to learn a mapping from input to output, so that given
new input data, the algorithm can accurately predict the corresponding output. During training, the
algorithm adjusts its parameters based on the difference between its predictions and the true labels
in the training data. Supervised learning is used for tasks such as classification, where the output is
a category label, and regression, where the output is a continuous value.

4.2 Testing Techniques and Test Plans


The system is a vision-based approach. All signs are represented with bare hands and so it
eliminates the problem of using any artificial devices for interaction.
4.2.1 Data Set Generation:
For the project we tried to find ready-made datasets, but we could not find a dataset in the form of raw images that matched our requirements. All we could find were datasets in the form of RGB values. Hence, we decided to create our own dataset. The steps we followed to create our dataset are as follows.

We used Open computer vision (OpenCV) library in order to produce our dataset.

Firstly, we captured around 800 images of each of the symbols in ASL (American Sign Language) for training purposes and around 200 images per symbol for testing purposes.
First, we capture each frame shown by the webcam of our machine. In each frame we define a
Region Of Interest (ROI) which is denoted by a blue bounded square as shown in the image
below:

Figure -4.2.1.1


Then, we apply Gaussian Blur Filter to our image which helps us extract various features of our
image. The image, after applying Gaussian Blur, looks as follows:

Figure -4.2.1.2

4.2.2 Gesture Classification:


Our approach uses two layers of algorithm to predict the final symbol of the user.

Figure -4.2.2.1


Algorithm Layer 1:

1. Apply a Gaussian Blur filter and threshold to the frame captured with OpenCV to get the processed image after feature extraction.
2. This processed image is passed to the CNN model for prediction, and if a letter is detected for more than 50 frames, the letter is printed and taken into consideration for forming the word.
3. Space between the words is considered using the blank symbol.

Algorithm Layer 2:

1. We detect various sets of symbols which show similar results on getting detected.
2. We then classify between those sets using classifiers made for those sets only.

Layer 1:

• CNN Model:

1. 1st Convolution Layer: The input picture has a resolution of 128x128 pixels. It is first processed in the first convolutional layer using 32 filter weights (3x3 pixels each). This results in a 126x126 pixel image, one for each filter weight.
2. 1st Pooling Layer: The pictures are downsampled using max pooling of 2x2, i.e., we keep the highest value in each 2x2 square of the array. Therefore, our picture is downsampled to 63x63 pixels.
3. 2nd Convolution Layer: The 63x63 output of the first pooling layer serves as the input to the second convolutional layer. It is processed in the second convolutional layer using 32 filter weights (3x3 pixels each). This results in a 60x60 pixel image.
4. 2nd Pooling Layer: The resulting images are downsampled again using max pooling of 2x2 and reduced to a resolution of 30x30.
5. 1st Densely Connected Layer: These images are used as the input to a fully connected layer with 128 neurons; the output from the second pooling layer is reshaped into an array of 30x30x32 = 28800 values, so the input to this layer is an array of 28800 values. The output of this layer is fed to the 2nd Densely Connected Layer. We use a dropout layer with a rate of 0.5 to avoid overfitting.


6. 2nd Densely Connected Layer: The output from the 1st Densely Connected Layer is used as the input to a fully connected layer with 96 neurons.
7. Final Layer: The output of the 2nd Densely Connected Layer serves as the input for the final layer, which has as many neurons as the number of classes we are classifying (alphabets + blank symbol).

• Activation Function:
We have used ReLU (Rectified Linear Unit) in each of the layers (convolutional as well as fully connected).
ReLU calculates max(x, 0) for each input pixel. This adds nonlinearity to the formula and helps the network learn more complicated features. It also helps in mitigating the vanishing gradient problem and speeds up training by reducing the computation time.

• Pooling Layer:
We apply max pooling to the input image with a pool size of (2, 2) along with the ReLU activation function. This reduces the number of parameters, thus lessening the computation cost and reducing overfitting.

• Dropout Layers:
Overfitting occurs when, after training, the weights of the network are so tuned to the training examples that the network does not perform well when given new examples. A dropout layer “drops out” a random set of activations in that layer by setting them to zero. The network should then be able to provide the right classification or output for a specific example even if some of the activations are dropped out [5].

• Optimizer:
We have used Adam optimizer for updating the model in response to the output of the loss
function.
The Adam optimizer combines the advantages of two extensions of stochastic gradient descent, namely the adaptive gradient algorithm (AdaGrad) and root mean square propagation (RMSProp).


Layer 2:

We are using two layers of algorithms to verify and predict symbols that are more similar to each other, so that we can get as close as possible to detecting the symbol actually shown. In our testing we found that the following symbols were not being recognized reliably and were often confused with other symbols:

a) For D: R and U
b) For U: D and R
c) For I: T, D, K and I
d) For S: M and N

So, to handle the above cases, we made three different classifiers for classifying these sets:
a) {D, R, U}
b) {T, K, D, I}
c) {S, M, N}

4.2.3 Finger Spelling Sentence Formation Implementation:

1. Whenever the count of a detected letter exceeds a specific value, and no other letter is close to it within a threshold, we print the letter and add it to the current string (in our code we kept the value as 50 and the difference threshold as 20).
2. Otherwise, we clear the current dictionary, which holds the detection counts of the present symbol, to avoid the probability of a wrong letter getting predicted.
3. Whenever the count of detected blanks (plain background) exceeds a specific value and the current buffer is empty, no space is detected.
4. Otherwise, it predicts the end of the word by printing a space, and the current word gets appended to the sentence below.

4.2.4 AutoCorrect Feature:

A python library Hunspell_suggest is used to suggest correct alternatives for each (incorrect) input
word and we display a set of words matching the current word in which the user can select a word

to append it to the current sentence. This helps in reducing mistakes committed in spellings and
assists in predicting complex words.
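A sketch of this suggestion step using the pyhunspell bindings is shown below; the dictionary paths are system-dependent assumptions, and the wrapper actually used in the project (Hunspell_suggest) may expose a slightly different interface.

```python
# Suggest spelling corrections for the word currently being fingerspelled.
import hunspell

hs = hunspell.HunSpell("/usr/share/hunspell/en_US.dic",
                       "/usr/share/hunspell/en_US.aff")

word = "helo"                        # current (possibly misspelled) word
if not hs.spell(word):
    suggestions = hs.suggest(word)   # e.g. ['hello', 'help', ...]
    print(suggestions)               # the user picks one to append to the sentence
```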

4.2.5 Training and Testing:

We convert our input images (RGB) into grayscale and apply Gaussian blur to remove unnecessary noise. We then apply adaptive thresholding to extract the hand from the background and resize our images to 128x128.
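A sketch of this pre-processing chain with OpenCV is shown below; the blur kernel size and adaptive-threshold parameters are illustrative assumptions.

```python
# Convert a captured ROI into the 128x128 image fed to the CNN.
import cv2

def preprocess(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)           # colour -> grayscale
    blurred = cv2.GaussianBlur(gray, (5, 5), 2)                   # remove noise
    thresh = cv2.adaptiveThreshold(blurred, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 11, 2)  # separate hand from background
    return cv2.resize(thresh, (128, 128))                         # resize for the model
```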

We feed the input images after pre-processing to our model for training and testing after applying
all the operations mentioned above.

The prediction layer estimates how likely the image is to fall under each of the classes. The output is normalized between 0 and 1 such that the values across all classes sum to 1. We achieve this using the softmax function.

At first, the output of the prediction layer will be somewhat far from the actual value. To improve it, we train the network using labelled data and a loss function: a continuous function that is positive whenever the prediction is not the same as the labelled value and zero exactly when it is equal to the labelled value.
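For illustration, the softmax normalization can be written as follows (a NumPy version; the example scores are arbitrary):

```python
# Softmax turns raw prediction scores into values in (0, 1) that sum to 1.
import numpy as np

def softmax(logits):
    exp = np.exp(logits - np.max(logits))   # subtract the max for numerical stability
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))                      # approx. [0.66, 0.24, 0.10]
```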

4.2.6 Challenges Faced:

There were many challenges faced during the project. The very first issue we faced concerned the data set. We wanted to work with raw images, and square images in particular, since a CNN in Keras is much more convenient to work with when the inputs are square.

We could not find any existing data set matching our requirements, and hence we decided to make our own. The second issue was selecting a filter that we could apply to our images so that proper features of the images could be obtained, and that image could then be provided as input to the CNN model.

We tried various filters including binary threshold, Canny edge detection, Gaussian blur, etc., but finally settled on the Gaussian Blur filter.

More issues were faced relating to the accuracy of the model trained in the earlier phases. This was eventually improved by increasing the input image size and by improving the data set.


4.3 End User Instructions


The following end-user instructions describe how to use the sign language recognition system effectively:

1. Installation:
- Ensure you have Python installed on your system.
- Install the required libraries by running `pip install -r requirements.txt` in your terminal or
command prompt.

2. Running the Application:


- Navigate to the directory where the application files are located.
- Run the `application.py` script using Python. You can do this by running `python application.py` in your terminal, command prompt, or the VS Code integrated terminal.

3. Operating the Application:


- When the application starts, it will activate your camera. Ensure your camera is correctly
positioned to capture your hand gestures.
- Hold your hand in front of the camera with the desired sign gesture.
- The application will recognize the sign gesture and display the corresponding text on the screen.

4. Interacting with the Application:


- Press the 'q' key to quit the application and close the window.

5. Using the Recognition System Effectively:


- Ensure proper lighting conditions for optimal recognition accuracy.
- Make clear and distinct hand gestures to improve recognition performance.
- Experiment with different distances from the camera to find the optimal range for gesture
recognition.
- If the recognition accuracy is low, try adjusting the camera angle or hand positioning.

6. Troubleshooting:
- If the application crashes or freezes, try restarting it.
- Ensure your system meets the minimum requirements for running the application.
- Check for any error messages displayed in the terminal or command prompt for troubleshooting
purposes.


CHAPTER-5

Results and Discussions


5.1 User Interface Representation

We used Python to create this UI, in which the processed image is passed to the CNN model for prediction; if a letter is detected for more than 50 frames, the letter is printed and taken into consideration for forming the word.

After a word is formed, it is transferred to the sentence area, and clicking on the Speak button lets you listen to the sentence formed.


5.2 Snapshots of system with brief detail of each


5.2.1 Data collection for training:
We used Open computer vision (OpenCV) library in order to produce our dataset.

Firstly, we captured around 800 images of each of the symbols in ASL (American Sign Language) for training purposes and around 200 images per symbol for testing purposes.
First, we capture each frame shown by the webcam of our machine. In each frame we define a
Region Of Interest (ROI) which is denoted by a blue bounded square as shown in the image below:

Fig. 5.2.1.1
Then, we apply Gaussian Blur Filter to our image which helps us extract various features of our
image. The image, after applying Gaussian Blur, looks as follows:

Fig. 5.2.1.2

5.2.2 Data collection for testing:


If the testing and training images are the same, it can lead to inaccurate evaluation of your sign
language recognition model. Here's why:

1. Overfitting: When you train a model using the same data that you test it on, the model may
simply memorize the training data without truly learning the underlying patterns. This can lead to
overfitting, where the model performs well on the training data but poorly on new, unseen data.

2. Misleading Evaluation: If the model has memorized the training data, it may perform
unrealistically well during testing, giving you a false sense of the model's performance. However,
this performance won't generalize to new data, and the model may fail to recognize sign language
gestures accurately in real-world scenarios.

Fig. 5.2.2.1
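To keep the two sets separate, the training and testing folders can be loaded independently; a sketch with tf.keras follows, where the folder names, image size, and batch size are assumptions based on the project layout.

```python
# Load separate training and testing directories so evaluation never sees training images.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataSet/trainingData",
    image_size=(128, 128),
    color_mode="grayscale",
    label_mode="categorical",
    batch_size=32)

test_ds = tf.keras.utils.image_dataset_from_directory(
    "dataSet/testingData",
    image_size=(128, 128),
    color_mode="grayscale",
    label_mode="categorical",
    batch_size=32)
```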


5.2.3 Final Application:

5.3 Brief Description of Various Modules of the system


Here's a brief description of various modules of the sign language recognition system:
1. Data Collection Module:
- Responsible for gathering a diverse dataset of American Sign Language (ASL) sign language
images or videos.
- Involves capturing images or recording videos of hand gestures representing different ASL
signs.
- May utilize tools like webcams, smartphones, or specialized sensors for data acquisition.


2. Data Preprocessing Module:


- Focuses on preparing the collected data for training the machine learning model.
- Tasks include resizing images, converting videos to frames, and labeling samples with
corresponding ASL sign labels.
- Ensures consistency and quality of the dataset to improve the model's performance during
training.

3. Feature Extraction Module:


- Extracts meaningful features from preprocessed data to represent ASL signs.
- Utilizes techniques such as Histogram of Oriented Gradients (HOG), Local Binary Patterns
(LBP), or deep learning-based feature extraction using Convolutional Neural Networks (CNNs).
- Aims to capture distinctive characteristics of hand gestures that aid in accurate recognition.

4. Model Training Module:


- Trains the chosen machine learning model using the preprocessed data.
- Tasks include feeding data into the model, adjusting parameters through backpropagation, and
optimizing the model to minimize loss.
- Aims to enhance the model's ability to accurately classify ASL signs during inference.


CHAPTER-6

Summary and Conclusions


The sign language recognition system using machine learning (ML) is a comprehensive solution
designed to facilitate communication between individuals using American Sign Language (ASL)
and those who may not be proficient in it. This system encompasses various modules, starting from
data collection to deployment and feedback loop integration. Initially, the system focuses on
collecting a diverse dataset of ASL sign language images or videos, a task that requires meticulous
attention to ensure dataset diversity and quality. Subsequently, the collected data undergoes
preprocessing to standardize and enhance its consistency, followed by feature extraction to derive
meaningful features for classification. The choice of ML models, including CNNs, RNNs, or
CRNNs, is crucial in accurately recognizing ASL signs, and the selected models undergo rigorous
training on preprocessed data to optimize their performance.

Evaluation metrics such as accuracy, precision, recall, and F1 score are employed to assess the
trained models' efficacy, guiding the optimization process. Once trained, the models are deployed
in production environments, enabling real-time ASL sign recognition. This deployment facilitates
integration into various applications and devices, thereby extending the system's accessibility and
usability. Furthermore, a feedback loop mechanism ensures continuous improvement by soliciting
user feedback and monitoring system performance in real-world scenarios. This iterative process is
fundamental in refining the system's accuracy, responsiveness, and user experience over time.
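
These metrics are available directly in scikit-learn; the sketch below is a minimal illustration on
stand-in label lists, where y_true would be the test-set labels and y_pred the model's predictions.

    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

    # Minimal sketch: evaluation metrics on stand-in labels; in practice y_true are the
    # test-set labels and y_pred the model's predicted classes.
    y_true = [0, 1, 2, 2, 1, 0]
    y_pred = [0, 1, 2, 1, 1, 0]

    print("Accuracy :", accuracy_score(y_true, y_pred))
    print("Precision:", precision_score(y_true, y_pred, average="macro"))
    print("Recall   :", recall_score(y_true, y_pred, average="macro"))
    print("F1 score :", f1_score(y_true, y_pred, average="macro"))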

In conclusion, the sign language recognition system represents a significant advancement in
leveraging ML technology to bridge communication barriers for individuals using ASL. Despite
challenges such as data collection complexity and model optimization, the system demonstrates
remarkable potential in enhancing accessibility and fostering inclusivity. By combining robust data
collection, accurate feature extraction, and well-trained ML models with ongoing iteration guided
by user feedback, the system embodies a promising solution for facilitating meaningful
communication and promoting inclusivity for individuals with hearing impairments.


CHAPTER-7

7.1 Future scope


Looking ahead, the sign language recognition system using machine learning shows promise for
improving accuracy and expanding its vocabulary. Future developments may involve enhancing
ML algorithms, incorporating multi-modal approaches, and integrating with emerging
technologies like AR and wearables. Collaboration with the deaf community is crucial for cultural
sensitivity and inclusivity. In conclusion, ongoing refinement and collaboration can ensure the
system's effectiveness in promoting communication accessibility for individuals using sign
language.

7.2 Appendix
1. Accuracy: In the context of classification models, accuracy represents the proportion of
correct predictions made by a model compared to the total number of predictions.

2. Artificial Intelligence (AI): AI refers to the development of computer systems capable of
performing tasks that typically require human intelligence, such as understanding natural
language, recognizing patterns, and making decisions.

3. Cloud-ML: Cloud-ML is a platform that offers tools and services to developers, allowing them
to build and deploy custom machine learning models in cloud environments. It simplifies the
process of developing machine learning solutions by providing pre-built algorithms and
infrastructure.

4. Framework: In machine learning, a framework is a software tool or library that provides a set
of functionalities and tools for developing machine learning models. Frameworks like
TensorFlow, PyTorch, and scikit-learn offer APIs and tools for tasks such as data
preprocessing, model training, and deployment.

5. Gesture: A gesture refers to a physical movement or action, often made with hands or other
body parts, used to convey meaning or communicate information. In the context of sign
language recognition, gestures are the hand movements and expressions used to represent
words or concepts.

6. Machine Learning (ML): ML is a subset of artificial intelligence that focuses on developing
algorithms and models that can learn from data and make predictions or decisions without
being explicitly programmed. ML algorithms enable computers to improve their performance
on a task through experience or exposure to data.


7. Model: A model in machine learning refers to a mathematical representation or algorithm
that has been trained on data to make predictions or decisions. Models learn patterns and
relationships from data during the training process and can be used to make predictions on
new, unseen data.

8. NumPy: NumPy is a Python library used for numerical computing, particularly for working
with arrays and matrices. It provides support for mathematical functions, linear algebra
operations, and random number generation, making it a fundamental library for scientific
computing in Python.

9. OpenCV: OpenCV (Open Source Computer Vision Library) is an open-source library for
computer vision and image processing tasks. It offers a wide range of functionalities for tasks
such as image manipulation, feature detection, object recognition, and video analysis.

10. Optimal Approach: An optimal approach refers to a decision or strategy that leads to the
best possible outcome among all available options. In machine learning, finding an optimal
approach often involves optimizing model parameters, choosing appropriate algorithms, and
selecting relevant features to maximize performance.

11. Pandas: Pandas is a Python library used for data manipulation and analysis. It provides data
structures and functions for working with structured data, such as tabular data or time series,
making it a powerful tool for data preprocessing and analysis tasks.

12. Deep Learning: Deep learning is a subset of machine learning that focuses on developing
artificial neural networks with multiple layers (deep neural networks) to learn from data and
make predictions. Deep learning algorithms can automatically learn features from data,
enabling them to perform complex tasks such as image recognition, speech recognition, and
natural language processing.

13. Computer Vision: Computer vision is a multidisciplinary field that focuses on enabling
computers to gain high-level understanding from digital images or videos. It involves tasks
such as image recognition, object detection, scene understanding, and image generation
using techniques from machine learning, image processing, and computer graphics.

14. System: In the context of software engineering, a system refers to a collection of
interconnected components or modules that work together to achieve a common goal or
perform a specific function. The Sign Language to Text Converter is an example of a system
designed to convert sign language gestures into text.


15. TensorFlow: TensorFlow is an open-source machine learning framework developed by
Google. It provides a comprehensive ecosystem of tools, libraries, and resources for building
and deploying machine learning models, including support for deep learning algorithms,
distributed training, and production deployment.
