
Janardan Bhagat Shikshan Prasarak Sanstha’s

CHANGU KANA THAKUR


ARTS, COMMERCE, SCIENCE COLLEGE
NEW PANVEL (Autonomous)

PROJECT REPORT ON
“SIGN LANGUAGE DETECTION”

DEVELOPED BY
Ms. Abhishek Vivek Deshmukh

UNDER THE GUIDANCE OF


Prof. S. S. Vyavahare

ACADEMIC YEAR
2023-24
CERTIFICATE
This is to certify that the Project Report entitled

“SIGN LANGUAGE DETECTION”

is successfully completed by Abhishek Vivek Deshmukh, Examination Seat Number PCS23302, under the guidance of Mr. S. S. Vyavahare, during the academic year 2023-24, as per the syllabus and in fulfilment of the requirements for the completion of M.Sc.-II (Semester-IV) in Computer Science of the University of Mumbai. It is also certified that this is the original work of the candidate, done during the academic year 2023-2024.

Place: New Panvel

Date:

Internal Examiner Head of Department

External Examiner Principal


ACKNOWLEDGEMENT

I would like to express my sincere gratitude to our Head of Department, Prof. Mrs. Pratibha Jadhav, and to my guide, Prof. Mr. S. S. Vyavahare, for their invaluable support and guidance during the course of the project.

I would like to thank you for your valuable guidance and appreciation in giving form and substance to this report. It is due to your enduring efforts, patience and enthusiasm that this project gained a sense of direction and purposefulness, and was ultimately made a success.

A sincere note of thanks for providing me with all the information that I needed for this project, without which its successful completion may not have been possible.

I also appreciate the outstanding co-operation of the non-teaching staff and the extended timings that I could avail.

I am also very thankful to Changu Kana Thakur Arts, Commerce & Science College for including the project work as a part of the syllabus, without which I would not have gained the experience of using analytical software such as this.
DECLARATION

I hereby declare that the project entitled “SIGN LANGUAGE DETECTION”, done at Changu Kana Thakur College of Arts, Commerce and Science, New Panvel, has not been duplicated or submitted to any other university for the award of any degree. To the best of my knowledge, no one other than me has submitted it to any other university.

The project is done in partial fulfillment of the requirements for the award of MASTER OF SCIENCE (Computer Science) and is to be submitted as a final semester project as part of our curriculum.

Abhishek Vivek Deshmukh


ABSTRACT

Sign language is the means of communication within the deaf and mute community. It emerges and evolves naturally within the hearing-impaired community. Sign language communication involves manual and non-manual signals: manual signs involve the fingers, hands and arms, while non-manual signs involve the face, head, eyes and body. Sign language is a well-structured language with its own phonology, morphology, syntax and grammar, and it is a complete natural language that uses different modes of expression for communication in everyday life. A sign language recognition system shifts communication from human-human to human-computer interaction. The aim of the sign language recognition system is to provide an efficient and accurate mechanism to transcribe signs into text or speech, so that the “dialog communication” between a deaf person and a hearing person becomes smooth. There is no standardized sign language for all deaf people across the world; sign languages are not universal and, as with spoken languages, they differ from region to region. A person who can talk and hear properly (a normal person) cannot communicate with a deaf and dumb person unless he or she is familiar with sign language. The same applies when a deaf and dumb person wants to communicate with a normal or blind person.

This approach takes an image from the camera as the gesture data. The vision-based method concentrates on the captured image of the gesture, extracts the main features and recognizes it. Colour bands were used at the start of the vision-based approach; the main disadvantage of this method was that a standard colour had to be worn on the fingertips. The use of bare hands is therefore now preferred over colour bands.
TABLE OF CONTENTS

Sr. No.   Topic

Chapter 1
1     Introduction

Chapter 2
2     Literature Review
2.1   Research Gap
2.2   Scope of the Project
2.3   Problem Statement

Chapter 3
3     Proposed Methodology

Chapter 4
4     Code Implementation and Screenshots

Chapter 5
5     Result Analysis

Chapter 6
6.1   Conclusion
6.2   References
6.3   Bibliography
Chapter 1
Introduction

Sign language is a vital mode of communication for individuals who are deaf or hard of hearing. With advancements in technology, particularly in computer vision and machine learning, sign language detection has gained significance in facilitating communication accessibility and inclusivity for the deaf community. Sign language detection involves recognizing and interpreting the gestures and movements made by individuals using sign language and translating them into text or spoken language.

Sign language detection strives to develop algorithms and methods for accurately identifying the sequences of signs produced and understanding their meaning. Many sign language recognition (SLR) methods treat the problem simply as gesture recognition (GR). Research has therefore focused so far on identifying distinctive characteristics and methods of differentiation in order to correctly label a given sign from a set of potential signs. However, sign language is more than just a collection of well-articulated gestures.

Gesture detection is a topic in computer science and language technology whose purpose is to interpret human gestures through mathematical algorithms; it is a sub-discipline of computer vision. Gestures can originate from any body movement or position, but usually come from the face or hands. The current focus in the field includes emotion recognition from the face and hand gesture recognition.

Sign language serves this part of the community and enables smooth communication among people who have trouble talking and hearing (deaf and dumb). They use hand signals along with facial expressions and body movements to interact.

Chapter 2
2 Literature Survey

Sign language detection is a multidisciplinary field that involves computer vision, machine
learning, and sign language linguistics. Researchers have conducted numerous studies on sign
language detection using various techniques, including OpenCV and MediaPipe, which are
popular libraries for computer vision and hand tracking.

Here are some key points that may be covered in a literature survey on sign language detection
using OpenCV and MediaPipe:

1. Overview of Sign Language: A literature survey may start with an overview of sign
language, its importance, and the challenges associated with sign language processing.

2. Sign Language Recognition Techniques: The literature survey may cover different
techniques used for sign language recognition, including computer vision, machine learning,
and deep learning approaches. It may discuss the advantages and limitations of each technique.

3. OpenCV and MediaPipe: The literature survey may provide an overview of OpenCV and
MediaPipe, including their features, functionalities, and applications in sign language detection.

4. Sign Language Datasets: The literature survey may discuss existing sign language datasets
used for training and evaluating sign language detection models, including their size, diversity,
and limitations.

5. Sign Language Detection Algorithms: The literature survey may review different sign
language detection algorithms that utilize OpenCV and MediaPipe, including hand tracking,
hand landmark detection, and gesture recognition techniques. It may discuss their performance,
accuracy, and limitations.

6. Applications of Sign Language Detection: The literature survey may explore the
applications of sign language detection in various domains, such as accessibility, education,
communication, healthcare, and research.

7. Evaluation Metrics: The literature survey may discuss the evaluation metrics used to assess
the performance of sign language detection models, such as accuracy, precision, recall, F1-
score, and others.

8. Challenges and Future Directions: The literature survey may highlight the challenges and
limitations of sign language detection using OpenCV and MediaPipe, and identify future
research directions and opportunities for improvement.

9. Existing Research and Publications: The literature survey may summarize the existing
research and publications related to sign language detection using OpenCV and MediaPipe,
including their findings, methodologies, and contributions to the field.

It's important to note that a literature survey should be comprehensive, thorough, and based on relevant and credible sources, such as peer-reviewed journals, conference proceedings, and reputable research publications. It should also provide critical analysis and synthesis of the existing literature to identify research gaps and potential areas for further investigation.

2.1 Research Gap

Research in sign language detection has made significant strides in recent years, driven by
advancements in computer vision and machine learning techniques. However, several research
gaps still exist in this field, presenting opportunities for further investigation and innovation. Here
are some key research gaps in sign language detection:

Data Availability and Standardization: One of the primary challenges in sign language
detection research is the limited availability of standardized datasets. Existing datasets often
focus on specific sign languages or lack diversity in terms of signers, signing styles, and
environmental conditions. There is a need for large-scale, diverse datasets that cover
multiple sign languages, dialects, and variations in signing speed, complexity, and context.
Standardization of data collection protocols and annotation schemes would facilitate
benchmarking and comparison of different sign language detection approaches.

Cross-Linguistic Generalization: Most existing sign language detection models are trained and
evaluated on a single sign language or a limited set of sign languages. Generalizing models
across different sign languages poses a significant research challenge due to variations in
vocabulary, grammar, phonology, and cultural conventions. Research efforts are needed to
develop cross-linguistically robust models that can recognize common gestures and patterns
across different sign languages while accommodating language-specific variations.

Continuous Sign Language Detection: Many current sign language detection systems focus on
recognizing isolated signs or short sequences of signs. However, natural sign language
communication involves continuous, fluent motion with transitions between signs and non-
manual components such as facial expressions and body movements. Research is needed to
develop algorithms for continuous sign language recognition that can effectively handle
temporal dependencies, segmentation, and recognition of fluent signing gestures in real-time.

Non-Manual Signals and Contextual Information: Sign language communication involves not
only hand gestures but also non-manual signals such as facial expressions, head movements, and
body posture, which convey grammatical and semantic information. Incorporating non-manual
signals and contextual information into sign language detection models remains an open research
challenge. There is a need for multi-modal approaches that integrate visual, spatial, temporal,
and linguistic cues to improve the accuracy and robustness of sign language detection systems.
2.2 SCOPE OF THE PROJECT

The scope of sign language detection using OpenCV and MediaPipe is vast and can be
applied in various areas. Here are some potential areas where sign language detection using
OpenCV and MediaPipe can be valuable:

1. Sign Language Translation: Sign language detection can be used to automatically


translate sign language gestures into spoken or written language, enabling
communication between sign language users and non-sign language users.

2. Education and Learning: Sign language detection can be used in educational settings
to create interactive learning materials, tutorials, or games that facilitate sign language
learning for individuals who are learning sign language as their primary or secondary
language.

3. Communication and Interaction: Sign language detection can be incorporated into


communication and interaction tools, such as video conferencing platforms, messaging
apps, or social media, to enable sign language users to communicate and interact more
easily with others.

4. Human-Computer Interaction: Sign language detection can be utilized in human-


computer interaction scenarios, such as gesture-based interfaces or virtual reality
applications, to enable users to interact with computers or virtual environments using
sign language gestures.

5. Healthcare and Rehabilitation: Sign language detection can be integrated into


rehabilitation programs for individuals with hearing impairment or speech-language
disorders, to enhance their communication and rehabilitation process.

6. Research and Development: Sign language detection using OpenCV and MediaPipe
can also be used in research and development of sign language recognition algorithms,
machine learning models, or other related technologies to advance the field of sign
language processing and improve the accuracy and performance of sign language recognition systems.
2.3 Problem Statement

Sign language uses many gestures, so it looks like a language of movement consisting of a series of hand and arm motions. Different countries have different sign languages and hand gestures. It is also noted that some unknown words are translated by simply showing the gesture for each alphabet in the word. In addition, sign language includes a specific gesture for each alphabet of the English dictionary and for each number between 0 and 9. Based on this, sign languages are made up of two groups, namely static gestures and dynamic gestures. The static gesture is used for alphabet and number representation, whereas the dynamic gesture is used for specific concepts; dynamic gestures also include words, sentences, etc. The static gesture consists of hand poses, whereas the latter includes motion of the hands, the head, or both. Sign language is a visual language and consists of three major components: finger-spelling, word-level sign vocabulary, and non-manual features. Finger-spelling is used to spell words letter by letter and convey the message, whereas the latter is keyword-based. However, the design of a sign language translator is quite challenging despite many research efforts during the last few decades; even the same signs can have significantly different appearances for different signers and different viewpoints. This work focuses on the creation of a static sign language translator using a Convolutional Neural Network. We created a lightweight network that can be used with embedded devices, standalone applications, or web applications having fewer resources.

Chapter 3
Proposed Methodology

The proposed idea is a working model for sign language that covers all the alphabets and digits. It also includes a dual mode of output, i.e., sign language converted into text/speech and speech converted into sign.

ARCHITECTURAL DESIGN :-

MACHINE LEARNING CONCEPTS

CNN ALGORITHM:
Image classification is the process of taking an input (such as a picture) and outputting its class, or the probability that the input belongs to a particular class. A neural network is applied in the following steps:

1) One-hot encode the data: a one-hot encoding can be applied to the integer representation. The integer-encoded variable is removed and a new binary variable is added for each unique integer value.

2) Define the model: a model, in a very simplified form, is nothing but a function that takes a certain input, performs certain operations on it to the best of its ability (learning and then predicting/classifying), and produces a suitable output.

3) Compile the model: the optimizer controls the learning rate; we use 'adam' as our optimizer. Adam is generally a good optimizer for many cases and adjusts the learning rate throughout training. The learning rate determines how fast the optimal weights for the model are calculated. A smaller learning rate may lead to more accurate weights (up to a certain point), but the time it takes to compute the weights will be longer.
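The three steps above, in miniature, might look as follows in Keras; the layer sizes and class count are illustrative and not necessarily the exact configuration used in this project.

import numpy as np
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

num_classes = 8                          # one class per gesture (illustrative)

# 1) One-hot encode the integer labels, e.g. 3 -> [0, 0, 0, 1, 0, 0, 0, 0]
y = np.array([0, 1, 2, 3])               # placeholder integer labels
y_onehot = to_categorical(y, num_classes=num_classes)

# 2) Define the model: a small CNN mapping a 300x300 RGB image to class scores
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(300, 300, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(num_classes, activation='softmax'),
])

# 3) Compile the model: Adam adapts the learning rate throughout training
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])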

Libraries for Data Analysis

The models are implemented using Python 3.10 with listed libraries:

⮚ Pandas
Pandas is a Python package for working with structured and time-series data. Data from various file formats such as CSV, JSON, SQL, etc. can be imported using Pandas. It is a powerful open-source tool used for data analysis and for data manipulation operations such as cleaning, merging, selecting, and wrangling.

⮚ Sklearn

This Python library is helpful for building machine learning and statistical models such as clustering, classification and regression. Although it can also be used for reading, manipulating and summarizing data, there are better libraries for those tasks.

⮚ Matplotlib.pyplot
Matplotlib.pyplot is a collection of functions that make matplotlib work like MATLAB. Each
pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a
figure, plots some lines in a plotting area, decorates the plot with labels, etc.
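A small sketch of these libraries working together; the feature table below is a made-up placeholder standing in for the project's actual data, not the dataset used in this work.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Tiny illustrative table; in practice this could come from a CSV of extracted
# features, e.g. pd.read_csv('sign_features.csv') (hypothetical file name)
df = pd.DataFrame({
    'feature_1': [0.1, 0.4, 0.3, 0.9, 0.7, 0.2],
    'feature_2': [0.5, 0.8, 0.2, 0.1, 0.6, 0.9],
    'label':     ['Hello', 'Yes', 'Hello', 'No', 'Yes', 'No'],
})

X = df.drop(columns=['label'])
y = df['label']

# Hold out a portion of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Quick look at how many samples each gesture class has
y.value_counts().plot(kind='bar')
plt.xlabel('Gesture class')
plt.ylabel('Number of samples')
plt.show()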
Hardware Requirements:

Memory space: minimum 32 MB, recommended 64 MB
HDD: at least 10 GB to install the software
Processor: Intel Pentium IV, 1 GHz or above
RAM: 4 GB or above

Software Requirements:

Python version 3.10.11
opencv-python==4.7.0.68
mediapipe==0.9.0.1 (to be installed on Python version 3.8.0)
scikit-learn==1.2.0
VS Code IDE

Chapter 4
4 CODE IMPLEMENTATION

Importing Required Libraries

Loading and Preprocessing the Dataset
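A minimal sketch of this step, assuming the training images are stored in per-gesture folders under Data/ (as produced by Datacollection.py later in this chapter) and resized to 300x300; the project's actual loading code may differ.

import os
import cv2
import numpy as np

IMG_SIZE = 300
DATA_DIR = 'Data'                       # folders like Data/Hello, Data/Please, ...

images, labels = [], []
class_names = sorted(os.listdir(DATA_DIR)) if os.path.isdir(DATA_DIR) else []
for idx, name in enumerate(class_names):
    class_dir = os.path.join(DATA_DIR, name)
    if not os.path.isdir(class_dir):
        continue
    for fname in os.listdir(class_dir):
        img = cv2.imread(os.path.join(class_dir, fname))
        if img is None:
            continue                    # skip unreadable files
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
        images.append(img)
        labels.append(idx)

X = np.array(images, dtype='float32') / 255.0   # scale pixel values to [0, 1]
y = np.array(labels)
print(X.shape, y.shape)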

One Hot Encoding :-
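As an indicative sketch, the gesture label strings can be one-hot encoded with scikit-learn's LabelBinarizer (the label list mirrors the classes used in Test.py); the project's own code may instead use Keras' to_categorical as described in Chapter 3.

from sklearn.preprocessing import LabelBinarizer

# Gesture classes as used elsewhere in this project
labels = ["Hello", "Good Bye", "No", "Please", "Sorry", "Thank You", "Welcome", "Yes"]

lb = LabelBinarizer()
lb.fit(labels)

# Each label becomes a binary vector with a single 1 in the position of its class
y_onehot = lb.transform(["Please", "Yes"])
print(lb.classes_)
print(y_onehot)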

Data Augmentation :-
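An illustrative sketch of augmentation with Keras' ImageDataGenerator; the transformation ranges are assumptions, kept mild so that the gestures remain recognisable, and the random arrays merely stand in for the preprocessed training images.

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Placeholder batch standing in for the preprocessed training images
X_train = np.random.rand(8, 300, 300, 3).astype('float32')
y_train = np.arange(8)

# Mild random transformations applied on the fly during training
datagen = ImageDataGenerator(
    rotation_range=10,        # small random rotations
    width_shift_range=0.1,    # small horizontal shifts
    height_shift_range=0.1,   # small vertical shifts
    zoom_range=0.1,           # slight zoom in/out
)

# Each call to the generator yields a batch of randomly transformed copies
batch_x, batch_y = next(datagen.flow(X_train, y_train, batch_size=4))
print(batch_x.shape)          # (4, 300, 300, 3)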

Model Building :-

Confusion Matrix :-
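A hedged sketch of how a confusion matrix for the eight gesture classes could be produced with scikit-learn and Matplotlib; y_true and y_pred below are random placeholders standing in for the actual test labels and model predictions.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

class_names = ["Hello", "Good Bye", "No", "Please", "Sorry", "Thank You", "Welcome", "Yes"]

# Placeholder labels; in the project these come from the held-out test images
y_true = np.random.randint(0, len(class_names), size=100)
y_pred = np.random.randint(0, len(class_names), size=100)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=range(len(class_names)))
ConfusionMatrixDisplay(cm, display_labels=class_names).plot(xticks_rotation=45)
plt.tight_layout()
plt.show()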

Model Performance :-
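Similarly, overall performance metrics such as accuracy, precision, recall and F1-score can be summarised with scikit-learn; again, y_true and y_pred are placeholders rather than the project's recorded results.

import numpy as np
from sklearn.metrics import accuracy_score, classification_report

class_names = ["Hello", "Good Bye", "No", "Please", "Sorry", "Thank You", "Welcome", "Yes"]

# Placeholder labels standing in for the test set and the model's predictions
y_true = np.random.randint(0, len(class_names), size=100)
y_pred = np.random.randint(0, len(class_names), size=100)

print("Accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred,
                            labels=range(len(class_names)),
                            target_names=class_names,
                            zero_division=0))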

Datacollection.py

import cv2
from cvzone.HandTrackingModule import HandDetector
import numpy as np
import math
import time

cap = cv2.VideoCapture(0)                 # open the default webcam
detector = HandDetector(maxHands=1)       # track at most one hand
offset = 20                               # padding around the detected hand
imgSize = 300                             # size of the square training image
counter = 0

folder = "Data/Please"                    # gesture class being collected

while True:
    success, img = cap.read()
    hands, img = detector.findHands(img)
    if hands:
        hand = hands[0]
        x, y, w, h = hand['bbox']

        # White square canvas on which the cropped hand is centred
        imgWhite = np.ones((imgSize, imgSize, 3), np.uint8) * 255
        imgCrop = img[y - offset:y + h + offset, x - offset:x + w + offset]
        imgCropShape = imgCrop.shape

        aspectRatio = h / w

        if aspectRatio > 1:
            # Tall hand: scale height to imgSize and centre horizontally
            k = imgSize / h
            wCal = math.ceil(k * w)
            imgResize = cv2.resize(imgCrop, (wCal, imgSize))
            imgResizeShape = imgResize.shape
            wGap = math.ceil((imgSize - wCal) / 2)
            imgWhite[:, wGap:wCal + wGap] = imgResize
        else:
            # Wide hand: scale width to imgSize and centre vertically
            k = imgSize / w
            hCal = math.ceil(k * h)
            imgResize = cv2.resize(imgCrop, (imgSize, hCal))
            imgResizeShape = imgResize.shape
            hGap = math.ceil((imgSize - hCal) / 2)
            imgWhite[hGap:hCal + hGap, :] = imgResize

        cv2.imshow('ImageCrop', imgCrop)
        cv2.imshow('ImageWhite', imgWhite)

    cv2.imshow('Image', img)
    key = cv2.waitKey(1)
    if key == ord("s"):
        # Press 's' to save the current normalised image into the class folder
        counter += 1
        cv2.imwrite(f'{folder}/Image_{time.time()}.jpg', imgWhite)
        print(counter)

Test.py

import cv2
import numpy as np
import os
from cvzone.ClassificationModule import Classifier
from matplotlib import pyplot as plt
import mediapipe as mp
from cvzone.HandTrackingModule import HandDetector
import math
import time

cap = cv2.VideoCapture(0)
detector = HandDetector(maxHands=1)
classifier = Classifier("Model/keras_model.h5", "Model/labels.txt")   # trained model + labels

offset = 20
imgSize = 300

folder = "Data/C"
counter = 0

labels = ["Hello", "Good Bye", "No", "Please", "Sorry", "Thank You", "Welcome", "Yes"]

while True:
    success, img = cap.read()
    imgOutput = img.copy()
    hands, img = detector.findHands(img)
    if hands:
        hand = hands[0]
        x, y, w, h = hand['bbox']

        # Centre the cropped hand on a white square, exactly as during data collection
        imgWhite = np.ones((imgSize, imgSize, 3), np.uint8) * 255
        imgCrop = img[y - offset:y + h + offset, x - offset:x + w + offset]
        imgCropShape = imgCrop.shape

        aspectRatio = h / w

        if aspectRatio > 1:
            k = imgSize / h
            wCal = math.ceil(k * w)
            imgResize = cv2.resize(imgCrop, (wCal, imgSize))
            imgCropShape = imgResize.shape
            wGap = math.ceil((imgSize - wCal) / 2)
            imgWhite[:, wGap:wCal + wGap] = imgResize
            prediction, index = classifier.getPrediction(imgWhite, draw=False)
            print(prediction)
        else:
            k = imgSize / w
            hCal = math.ceil(k * h)
            imgResize = cv2.resize(imgCrop, (imgSize, hCal))
            imgCropShape = imgResize.shape
            hGap = math.ceil((imgSize - hCal) / 2)
            imgWhite[hGap:hCal + hGap, :] = imgResize
            prediction, index = classifier.getPrediction(imgWhite, draw=False)

        # Draw the predicted label and the hand bounding box on the output frame
        cv2.rectangle(imgOutput, (x - offset, y - offset - 50),
                      (x - offset + 90, y - offset - 50 + 50), (255, 0, 255), cv2.FILLED)
        cv2.putText(imgOutput, labels[index], (x, y - 26),
                    cv2.FONT_HERSHEY_COMPLEX, 1.7, (255, 255, 255), 2)
        cv2.rectangle(imgOutput, (x - offset, y - offset),
                      (x + w + offset, y + h + offset), (255, 0, 255), 4)

        cv2.imshow("ImageCrop", imgCrop)
        cv2.imshow("ImageWhite", imgWhite)

    cv2.imshow("Image", imgOutput)
    cv2.waitKey(1)

labels.txt

0 Class 1
1 Class 2
2 Class 3
3 Class 4
4 Class 5
5 Class 6
6 Class 7
7 Class 8

Output Screenshots :-

Chapter 5
Result Analysis

In human action recognition tasks, sign language has an extra advantage, as it can be used to communicate efficiently. Many techniques have been developed using image processing, sensor data processing, and motion detection, applying different dynamic algorithms and methods such as machine learning and deep learning. Depending on the methodology, researchers have proposed their own ways of classifying sign languages. As technologies develop, we can explore the limitations of previous works and improve accuracy. This paper proposes a technique for recognizing hand gestures, which are an essential part of sign language vocabulary, based on an efficient deep convolutional neural network (CNN) architecture.

The proposed CNN design removes the need for detection and segmentation of hands from the captured images, decreasing the computational burden faced during hand pose recognition with classical approaches. In our method, we used two input channels, the images and the hand landmarks, to obtain more robust data, making the process more efficient with dynamic learning-rate adjustment. In addition, the presented results were acquired by retraining and testing the sign language gestures dataset on a convolutional neural network model utilizing Inception v3. The model comprises multiple convolution filter inputs that are trained on parts of the same data. A capsule-based deep neural network sign posture translator for American Sign Language (ASL) fingerspelling (posture) has also been introduced, where the concepts of capsules and pooling are used simultaneously in the network.

This research confirms that using pooling and capsule routing in the same network can improve the network's accuracy and convergence speed. In our method, we have used Google's pre-trained model to extract the hand landmarks, much like transfer learning. We have shown that utilizing two input channels can also improve accuracy.
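As an illustration of the landmark-extraction step mentioned above, a minimal MediaPipe Hands sketch is shown below; the capture source and parameters are assumptions, not the project's exact settings.

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

# Grab a single frame from the webcam as an example input
cap = cv2.VideoCapture(0)
success, frame = cap.read()
cap.release()

if success:
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1,
                        min_detection_confidence=0.5) as hands:
        # MediaPipe expects RGB images, while OpenCV captures BGR
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            landmarks = results.multi_hand_landmarks[0].landmark
            coords = [(lm.x, lm.y, lm.z) for lm in landmarks]   # normalised coordinates
            print(len(coords))   # 21 landmarks per detected hand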

Chapter 6
Conclusion

In conclusion, sign language detection holds immense potential to bridge the communication gap between sign language users and non-signers, thereby promoting inclusivity, accessibility, and empowerment for the deaf and hard of hearing community. Through the development of robust and scalable sign language detection systems, we can enable real-time interpretation and translation of sign language gestures into spoken language or text, facilitating communication in various contexts such as education, employment, healthcare, and social interactions.

Detecting sign language has become an important research field for improving communication with deaf and dumb people. It is also important to understand that different sign languages have developed in different language communities, and research on sign language detection is also language-specific. Even though English is a mainstream language with a large deaf and dumb community, very little research has been conducted on sign language detection in English. In this paper, we propose a new English sign language detection scheme that relies on fingertip position as a training tool for a CNN classifier. Several methods have been tested and compared against a large dataset of images. Based on test-set accuracy, the proposed method outperforms all existing methods. In addition, the proposed scheme appears to be compact and efficient in terms of computation and size.

References

[1] www.ijtrs/Literature%20Survey%20on%20Hand%20Gesture%201.pdf

[2] Jin CM, Omar Z, Jaward MH. A mobile application of American sign
language translation via image processing algorithms. Proceedings - 2016 IEEE
Region 10 Symposium, TENSYMP 2016. 2016;: p. 104-109.

[3] Shahriar S, Siddiquee A, Islam T, Ghosh A, Chakraborty R, Khan AI, et al.


Real- Time American Sign Language Recognition Using Skin Segmentation and
Image Category Classification with Convolutional Neural Network and Deep
Learning. IEEE Region 10 Annual International Conference, Proceedings/TENCON. 2019; 2018-October: p. 1168-1171.

[4] Hore S, Chatterjee S, Santhi V, Dey N, Ashour AS, Balas VE, et al.
Optimized Neural Networks. 2017;: p. 139-151

[5] Ahmed W, Chanda K, Mitra S. Vision based Hand Gesture Recognition


using Dynamic Time Warping for Indian Sign Language. In 2016 International
Conference on Information Science (ICIS); 2016: IEEE. p. 120-125.

[6] Flores CJL, Cutipa AEG, Enciso RL. Application of convolutional neural networks for static hand gestures recognition under different invariant features.
Proceedings of the 2017 IEEE 24th International Congress on Electronics,
Electrical Engineering and Computing, INTERCON 2017. 2017;: p. 5-8

[7] Kumar P, Gauba H, Roy PP, Dogra DP. Coupled HMM-based multi-sensor
data fusion for sign language recognition. Pattern Recognition Letters. 2017; 86:
p. 1-8

[8] Huang J, Zhou W, Zhang Q, Li H, Li W. Video-based sign language


recognition without temporal segmentation. 32nd AAAI Conference on Artificial
Intelligence, AAAI 2018. 2018;: p. 2257-2264

[9] Jalal MA, Chen R, Moore RK, Mihaylova L. American Sign Language
Posture Understanding with Deep Neural Networks. 2018 21st International
Conference on Information Fusion, FUSION 2018. 2018;: p. 573-579.

[10] Aly W, Aly S, Almotairi S. User-independent American sign language


alphabet recognition based on depth image and features. IEEE Access. 2019; 7:
p. 123138-123150

BIBLIOGRAPHY

[1] https://iopscience.iop.org/article/10.1088/1757-899X/1022/1/012072/meta

[2] https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-021-01527-5

[3] https://www.mdpi.com/1999-4893/16/2/88

[4] https://www.researchgate.net/publication/349140147_Heart_Disease_Prediction

[5] https://www.sciencedirect.com/science/article/pii/S1877050920315210

[6] https://www.scirp.org/journal/paperinformation.aspx?paperid=122494

[7] https://arxiv.org/vc/arxiv/papers/2105/2105.10816v1.pdf

[8] https://jocc.journals.ekb.eg/article_282098_4b9e9c103330a9a045517d04f3a0a14a.pdf

[9] https://ieeexplore.ieee.org/document/8740989

[10] https://ieeexplore.ieee.org/iel7/6287639/8600701/08740989.pdf
