PROJECT REPORT
ON
"SIGN LANGUAGE DETECTION"
DEVELOPED BY
Ms. Abhishek Vivek Deshmukh
ACADEMIC YEAR
2023-24
CERTIFICATE
This is to certify that the Project Report entitled "SIGN LANGUAGE DETECTION" is done in partial fulfillment of the requirement for the award of MASTER OF SCIENCE (Computer Science) and is submitted as a final semester project as part of our curriculum.
Place: New Panvel
Date:
This approach takes images from the camera as gesture data. The vision-based method concentrates on the captured image of the gesture, extracts its main features, and recognizes it. Colour bands were used at the start of the vision-based approach; the main disadvantage of this method was that standard colours had to be worn on the fingertips. The use of bare hands was later preferred over colour bands.
TABLE OF CONTENTS
Chapter 1
1 Introduction 1
Chapter 2
2 Literature Review 2
Chapter 3
3 Proposed Methodology 7
Chapter 4
4 Code Implementation 10
Chapter 5
5 Result Analysis 30
Chapter 6
6.1 Conclusion 31
6.2 References 32
6.3 Bibliography 33
Chapter 1
Introduction
Sign language is a vital mode of communication for individuals who are
deaf or hard of hearing. With advancements in technology, particularly in computer
vision and machine learning, sign language detection has gained significance in
facilitating communication accessibility and inclusivity for the deaf community.
Sign language detection involves recognizing and interpreting the gestures and
movements made by individuals using sign language and translating them into text
or spoken language.
Sign language recognition (SLR) strives to develop algorithms and methods for accurately
identifying the sequences of signs produced and understanding their meaning.
Many SLR methods treat the problem merely as Gesture Recognition (GR). Research
has therefore focused on identifying discriminative features and methods
of differentiation in order to correctly label a given sign from a set of potential
candidates. However, sign language is more than just a collection of well-articulated
gestures.
Sign language serves this part of the community and enables smooth
communication among people who have difficulty speaking or hearing.
Signers use hand signals along with facial expressions and body
movements to communicate.
Chapter 2
2 Literature Survey
Sign language detection is a multidisciplinary field that involves computer vision, machine
learning, and sign language linguistics. Researchers have conducted numerous studies on sign
language detection using various techniques, including OpenCV and MediaPipe, which are
popular libraries for computer vision and hand tracking.
Here are some key points that may be covered in a literature survey on sign language detection
using OpenCV and MediaPipe:
1. Overview of Sign Language: A literature survey may start with an overview of sign
language, its importance, and the challenges associated with sign language processing.
2. Sign Language Recognition Techniques: The literature survey may cover different
techniques used for sign language recognition, including computer vision, machine learning,
and deep learning approaches. It may discuss the advantages and limitations of each technique.
3. OpenCV and MediaPipe: The literature survey may provide an overview of OpenCV and
MediaPipe, including their features, functionalities, and applications in sign language detection.
4. Sign Language Datasets: The literature survey may discuss existing sign language datasets
used for training and evaluating sign language detection models, including their size, diversity,
and limitations.
5. Sign Language Detection Algorithms: The literature survey may review different sign
language detection algorithms that utilize OpenCV and MediaPipe, including hand tracking,
hand landmark detection, and gesture recognition techniques. It may discuss their performance,
accuracy, and limitations.
6. Applications of Sign Language Detection: The literature survey may explore the
applications of sign language detection in various domains, such as accessibility, education,
communication, healthcare, and research.
7. Evaluation Metrics: The literature survey may discuss the evaluation metrics used to assess
the performance of sign language detection models, such as accuracy, precision, recall, F1-
score, and others.
8. Challenges and Future Directions: The literature survey may highlight the challenges and
limitations of sign language detection using OpenCV and MediaPipe, and identify future
research directions and opportunities for improvement.
9. Existing Research and Publications: The literature survey may summarize the existing
research and publications related to sign language detection using OpenCV and MediaPipe,
including their findings, methodologies, and contributions to the field.
It's important to note that a literature survey should be comprehensive, thorough, and based on
relevant and credible sources, such as peer-reviewed journals, conference proceedings, and
reputable research publications. It should also provide critical analysis and synthesis of the
existing literature to identify research gaps and potential areas for further investigation.
2.1 Research Gap
Research in sign language detection has made significant strides in recent years, driven by
advancements in computer vision and machine learning techniques. However, several research
gaps still exist in this field, presenting opportunities for further investigation and innovation. Here
are some key research gaps in sign language detection:
Data Availability and Standardization: One of the primary challenges in sign language
detection research is the limited availability of standardized datasets. Existing datasets often
focus on specific sign languages or lack diversity in terms of signers, signing styles, and
environmental conditions. There is a need for large-scale, diverse datasets that cover
multiple sign languages, dialects, and variations in signing speed, complexity, and context.
Standardization of data collection protocols and annotation schemes would facilitate
benchmarking and comparison of different sign language detection approaches.
Cross-Linguistic Generalization: Most existing sign language detection models are trained and
evaluated on a single sign language or a limited set of sign languages. Generalizing models
across different sign languages poses a significant research challenge due to variations in
vocabulary, grammar, phonology, and cultural conventions. Research efforts are needed to
develop cross-linguistically robust models that can recognize common gestures and patterns
across different sign languages while accommodating language-specific variations.
Continuous Sign Language Detection: Many current sign language detection systems focus on
recognizing isolated signs or short sequences of signs. However, natural sign language
communication involves continuous, fluent motion with transitions between signs and non-
manual components such as facial expressions and body movements. Research is needed to
develop algorithms for continuous sign language recognition that can effectively handle
temporal dependencies, segmentation, and recognition of fluent signing gestures in real-time.
Non-Manual Signals and Contextual Information: Sign language communication involves not
only hand gestures but also non-manual signals such as facial expressions, head movements, and
body posture, which convey grammatical and semantic information. Incorporating non-manual
signals and contextual information into sign language detection models remains an open research
challenge. There is a need for multi-modal approaches that integrate visual, spatial, temporal,
and linguistic cues to improve the accuracy and robustness of sign language detection systems.
2.2 SCOPE OF THE PROJECT
The scope of sign language detection using OpenCV and MediaPipe is vast and can be
applied in various areas. Here are some potential areas where sign language detection using
OpenCV and MediaPipe can be valuable:
2. Education and Learning: Sign language detection can be used in educational settings
to create interactive learning materials, tutorials, or games that facilitate sign language
learning for individuals who are learning sign language as their primary or secondary
language.
6. Research and Development: Sign language detection using OpenCV and MediaPipe
can also be used in the research and development of sign language recognition algorithms,
machine learning models, and related technologies to advance the field of sign
language processing and improve the accuracy and performance of sign language recognition systems.
2.3 Problem Statement
Sign language uses many gestures, so it resembles a language of movement
consisting of a series of hand and arm motions. Different countries have
different sign languages and hand gestures. It is also noted that some unknown
words are translated by simply showing the gesture for each letter in the
word. In addition, sign language includes a specific gesture for each letter of
the English alphabet and for each number between 0 and 9. On this basis,
sign languages are made up of two groups, namely static gestures
and dynamic gestures. Static gestures are used for alphabet and number
representation, whereas dynamic gestures are used for specific concepts;
the dynamic group also includes words, sentences, etc. Static gestures consist of hand
poses only, whereas dynamic gestures include motion of the hands, the head, or both.
Sign language is a visual language and consists of three major components: finger-spelling,
word-level sign vocabulary, and non-manual features. Finger-spelling is used to
spell words letter by letter and convey the message, whereas word-level signing is
keyword-based. Despite many research efforts during the last few decades, the design
of a sign language translator remains quite challenging. Moreover, even the same sign can
appear significantly different for different signers and from different viewpoints.
This work focuses on the creation of a static sign language translator by using a
Convolutional Neural Network. We created a lightweight network that can be
used in embedded devices, standalone applications, or web applications with
limited resources.
Chapter 3
Proposed Methodology
The proposed model covers sign language detection for all the alphabets and digits.
It also includes a dual mode of output: sign language converted into text/speech, and
speech converted into sign.
ARCHITECTURAL DESIGN :-
CNN ALGORITHM :
Image classification is the process of taking an input (such as a picture) and outputting its class, or the
probability that the input belongs to a particular class. Neural networks are applied in the following steps:
1) One-hot encode the data: A one-hot encoding can be applied to the integer representation. The
integer-encoded variable is removed and a new binary variable is added for each unique
integer value.
2) Define the model: A model, put in very simplified terms, is a function that takes a certain
input, performs certain operations on it to the best of its ability (learning and then
predicting/classifying), and produces a suitable output.
3) Compile the model: The optimizer controls the learning rate; we use 'adam' as our optimizer.
Adam is generally a good optimizer for many cases, and it adjusts the learning rate
throughout training. The learning rate determines how fast the optimal weights for the model
are calculated. A smaller learning rate may lead to more accurate weights (up to a certain
point), but the time it takes to compute the weights will be longer.
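As a concrete illustration of step 3 (not taken from the report's own code), a minimal Keras sketch of choosing the Adam optimizer and its learning rate might look like the following; `model` is assumed to be the CNN defined in step 2:

import tensorflow as tf

# Adam adapts its step size during training; a smaller initial learning rate
# can give more precise weights but makes training take longer.
default_opt = tf.keras.optimizers.Adam(learning_rate=1e-3)   # Keras default rate
careful_opt = tf.keras.optimizers.Adam(learning_rate=1e-4)   # slower, finer updates

# A compiled model would pick one of these, for example:
# model.compile(optimizer=careful_opt,
#               loss="categorical_crossentropy",   # matches one-hot labels from step 1
#               metrics=["accuracy"])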
The models are implemented using Python 3.10 with the following libraries:
⮚ Pandas
Pandas is a Python package for working with structured and time-series data. Data from various file
formats such as CSV, JSON, and SQL can be imported using Pandas. It is a powerful open-source tool
used for data analysis and for data manipulation operations such as data cleaning, merging, selecting, as
well as wrangling.
⮚ Sklearn
This Python library is helpful for building machine learning and statistical models such as
clustering, classification, and regression. Though it can also be used for reading, manipulating,
and summarizing data, there are better libraries for those tasks.
⮚ Matplotlib.pyplot
Matplotlib.pyplot is a collection of functions that make matplotlib work like MATLAB. Each
pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a
figure, plots some lines in a plotting area, decorates the plot with labels, etc.
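A small illustrative snippet (not part of the project code; the file gestures.csv and its label column are hypothetical) showing the typical roles these three libraries play in such a project:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Pandas: load a (hypothetical) table of extracted hand features plus a label column.
df = pd.read_csv("gestures.csv")
X = df.drop(columns=["label"])
y = df["label"]

# scikit-learn: split the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Matplotlib: quick bar chart of how many samples each gesture class has.
y.value_counts().plot(kind="bar")
plt.xlabel("Gesture class")
plt.ylabel("Number of samples")
plt.show()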
Hardware Requirements :-
⮚ Minimum 32 MB, recommended 64 MB
⮚ 4 GB or above
Software Requirements :-
⮚ opencv-python==4.7.0.68
⮚ scikit-learn==1.2.0
⮚ VS Code IDE
Chapter 4
Code Implementation
One Hot Encoding :-
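The original screenshot is not reproduced here; the following is a minimal sketch (assuming Keras' to_categorical utility) of how integer-encoded gesture labels can be one-hot encoded. The class names mirror those used later in Test.py:

import numpy as np
from tensorflow.keras.utils import to_categorical

class_names = ["Hello", "Good Bye", "No", "Please", "Sorry", "Thank You", "Welcome", "Yes"]

# Integer-encode a few sample class names, then expand each integer into
# a binary vector with a single 1 at the position of its class.
y_int = np.array([class_names.index(name) for name in ["Hello", "Yes", "No"]])
y_onehot = to_categorical(y_int, num_classes=len(class_names))
print(y_onehot)   # "Hello" -> [1. 0. 0. 0. 0. 0. 0. 0.], and so on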
Data Augmentation :-
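In place of the screenshot, a hedged sketch of image augmentation with Keras' ImageDataGenerator, assuming the Data/ folder layout (one sub-folder per sign) and the 300x300 images produced by the Datacollection.py script listed later in this chapter:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Small random rotations, shifts and zooms make the classifier less sensitive
# to the exact position and scale of the hand.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
)

train_gen = datagen.flow_from_directory(
    "Data/", target_size=(300, 300), batch_size=32, class_mode="categorical")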
Model Building :-
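The trained network itself is loaded from Model/keras_model.h5 in Test.py; a lightweight CNN of the kind described in Chapter 3 could be defined as in the sketch below (layer sizes are illustrative, not the report's exact architecture):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(300, 300, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(8, activation="softmax"),   # one output per gesture class
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_gen, epochs=10)   # train_gen from the augmentation sketch above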
Confusion Matrix :-
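In place of the screenshot, a self-contained sketch of computing and plotting a confusion matrix with scikit-learn; the random y_true/y_pred arrays are placeholders for the test labels and the model's predictions:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

class_names = ["Hello", "Good Bye", "No", "Please", "Sorry", "Thank You", "Welcome", "Yes"]

# Placeholder data; in the project these come from the test split and model.predict().
rng = np.random.default_rng(0)
y_true = rng.integers(0, len(class_names), size=200)
y_pred = rng.integers(0, len(class_names), size=200)

cm = confusion_matrix(y_true, y_pred, labels=list(range(len(class_names))))
ConfusionMatrixDisplay(cm, display_labels=class_names).plot(xticks_rotation=45)
plt.tight_layout()
plt.show()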
Model Performance :-
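Likewise, a sketch of summarising per-class performance with scikit-learn's classification report; placeholder predictions again stand in for the real test results:

import numpy as np
from sklearn.metrics import accuracy_score, classification_report

class_names = ["Hello", "Good Bye", "No", "Please", "Sorry", "Thank You", "Welcome", "Yes"]

rng = np.random.default_rng(1)
y_true = rng.integers(0, len(class_names), size=200)
y_pred = rng.integers(0, len(class_names), size=200)

# Overall accuracy plus per-class precision, recall and F1-score.
print("Accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred,
                            labels=list(range(len(class_names))),
                            target_names=class_names))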
Datacollection.py
import cv2
from cvzone.HandTrackingModule import HandDetector
import numpy as np
import math
import time

cap = cv2.VideoCapture(0)
detector = HandDetector(maxHands=1)

offset = 20            # margin (in pixels) kept around the detected hand
imgSize = 300          # side length of the square white canvas
counter = 0
folder = "Data/Please"   # one sub-folder per sign

while True:
    success, img = cap.read()
    hands, img = detector.findHands(img)
    if hands:
        hand = hands[0]
        x, y, w, h = hand['bbox']

        # White square canvas and the cropped hand region (with margin).
        imgWhite = np.ones((imgSize, imgSize, 3), np.uint8) * 255
        imgCrop = img[y - offset:y + h + offset, x - offset:x + w + offset]

        # Resize the crop so its longer side fits the canvas, then centre it.
        aspectRatio = h / w
        if aspectRatio > 1:
            k = imgSize / h
            wCal = math.ceil(k * w)
            imgResize = cv2.resize(imgCrop, (wCal, imgSize))
            wGap = math.ceil((imgSize - wCal) / 2)
            imgWhite[:, wGap:wCal + wGap] = imgResize
        else:
            k = imgSize / w
            hCal = math.ceil(k * h)
            imgResize = cv2.resize(imgCrop, (imgSize, hCal))
            hGap = math.ceil((imgSize - hCal) / 2)
            imgWhite[hGap:hCal + hGap, :] = imgResize

        cv2.imshow('ImageCrop', imgCrop)
        cv2.imshow('ImageWhite', imgWhite)

    cv2.imshow('Image', img)
    key = cv2.waitKey(1)
    if key == ord("s") and hands:
        # Press 's' to save the current normalised image into the dataset folder.
        counter += 1
        cv2.imwrite(f'{folder}/Image_{time.time()}.jpg', imgWhite)
        print(counter)
Test.py
import cv2
import numpy as np
from cvzone.ClassificationModule import Classifier
from cvzone.HandTrackingModule import HandDetector
import math

cap = cv2.VideoCapture(0)
detector = HandDetector(maxHands=1)
classifier = Classifier("Model/keras_model.h5", "Model/labels.txt")

offset = 20
imgSize = 300
labels = ["Hello", "Good Bye", "No", "Please", "Sorry", "Thank You", "Welcome", "Yes"]

while True:
    success, img = cap.read()
    imgOutput = img.copy()
    hands, img = detector.findHands(img)
    if hands:
        hand = hands[0]
        x, y, w, h = hand['bbox']

        # White canvas and cropped hand region, as in Datacollection.py.
        imgWhite = np.ones((imgSize, imgSize, 3), np.uint8) * 255
        imgCrop = img[y - offset:y + h + offset, x - offset:x + w + offset]

        aspectRatio = h / w
        if aspectRatio > 1:
            k = imgSize / h
            wCal = math.ceil(k * w)
            imgResize = cv2.resize(imgCrop, (wCal, imgSize))
            wGap = math.ceil((imgSize - wCal) / 2)
            imgWhite[:, wGap:wCal + wGap] = imgResize
        else:
            k = imgSize / w
            hCal = math.ceil(k * h)
            imgResize = cv2.resize(imgCrop, (imgSize, hCal))
            hGap = math.ceil((imgSize - hCal) / 2)
            imgWhite[hGap:hCal + hGap, :] = imgResize

        # Classify the normalised image and read off the predicted class index.
        prediction, index = classifier.getPrediction(imgWhite, draw=False)
        print(prediction)

        # Draw the predicted label and a bounding box on the output frame.
        cv2.rectangle(imgOutput, (x - offset, y - offset - 50),
                      (x - offset + 90, y - offset), (255, 0, 255), cv2.FILLED)
        cv2.putText(imgOutput, labels[index], (x, y - 26),
                    cv2.FONT_HERSHEY_COMPLEX, 1.7, (255, 255, 255), 2)
        cv2.rectangle(imgOutput, (x - offset, y - offset),
                      (x + w + offset, y + h + offset), (255, 0, 255), 4)

        cv2.imshow("ImageCrop", imgCrop)
        cv2.imshow("ImageWhite", imgWhite)

    cv2.imshow("Image", imgOutput)
    cv2.waitKey(1)
labels.txt
0 Class 1
1 Class 2
2 Class 3
3 Class 4
4 Class 5
5 Class 6
6 Class 7
7 Class 8
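Each line of labels.txt pairs a class index with a class name as expected by the cvzone Classifier; Test.py maps the predicted index back to a human-readable sign using its own labels list.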
Output Screenshots :-
Chapter 5
Result Analysis
Chapter 6
Conclusion
References
[1] www.ijtrs/Literature%20Survey%20on%20Hand%20Gesture%201.pdf
[2] Jin CM, Omar Z, Jaward MH. A mobile application of American Sign Language translation via image processing algorithms. Proceedings of the 2016 IEEE Region 10 Symposium (TENSYMP 2016); 2016. p. 104-109.
[4] Hore S, Chatterjee S, Santhi V, Dey N, Ashour AS, Balas VE, et al. Optimized Neural Networks. 2017. p. 139-151.
[6] Flores CJL, Cutipa AEG, Enciso RL. Application of convolutional neural networks for static hand gestures recognition under different invariant features. Proceedings of the 2017 IEEE 24th International Congress on Electronics, Electrical Engineering and Computing (INTERCON 2017); 2017. p. 5-8.
[7] Kumar P, Gauba H, Roy PP, Dogra DP. Coupled HMM-based multi-sensor data fusion for sign language recognition. Pattern Recognition Letters. 2017;86: p. 1-8.
[9] Jalal MA, Chen R, Moore RK, Mihaylova L. American Sign Language posture understanding with deep neural networks. 2018 21st International Conference on Information Fusion (FUSION 2018); 2018. p. 573-579.
BIBLIOGRAPHY
[1] https://iopscience.iop.org/article/10.1088/1757-899X/1022/1/012072/meta
[2] https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-021-01527-5
[3] https://www.mdpi.com/1999-4893/16/2/88
[4] https://www.researchgate.net/publication/349140147_Heart_Disease_Prediction
[5] https://www.sciencedirect.com/science/article/pii/S1877050920315210
[6] https://www.scirp.org/journal/paperinformation.aspx?paperid=122494
[7] https://arxiv.org/vc/arxiv/papers/2105/2105.10816v1.pdf
[8] https://jocc.journals.ekb.eg/article_282098_4b9e9c103330a9a045517d04f3a0a14a.pdf
[9] https://ieeexplore.ieee.org/document/8740989
[10] https://ieeexplore.ieee.org/iel7/6287639/8600701/08740989.pdf