American Sign Language Detection Using YOLOv5 and YOLOv8
Shobhit Tyagi
Department of Computer Science & Engineering, School of Engineering & Technology, Sharda University,
India
Prashant Upadhyay ( [email protected] )
Department of Computer Science & Engineering, School of Engineering & Technology, Sharda University,
India
Hoor Fatima
Department of Computer Science & Engineering, School of Engineering & Technology, Sharda University,
India
Sachin Jain
Department of Computer Science & Engineering, Ajay Kumar Garg Engineering College, Ghaziabad, India
Avinash Kumar Sharma
Department of Computer Science & Engineering, ABES Institute of Technology Ghaziabad
Research Article
Keywords: CNNs, Sign Language Recognition, YOLOv8, Transfer Learning, Direct Location Prediction
DOI: https://doi.org/10.21203/rs.3.rs-3126918/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
In the modern world, culture and religion are diverse and widespread. Sign language culture has grown
since its emergence at the American School for the Deaf (ASD) in 1817. In a world where computers now
solve real-time problems using deep learning, sign language (SL) recognition is one such application.
YOLO is an object detection and classification algorithm that uses a convolutional neural network (CNN) to
achieve high performance and accuracy. This paper aims to detect American Sign Language using YOLO
models and to compare different YOLO algorithms by implementing a custom model for recognizing sign
language. The experiments show that the latest YOLOv8 gave better results than other YOLO versions in
terms of precision and mAP, while YOLOv7 had a higher recall value during testing than YOLOv8. The
proposed model is lightweight and fast and uses the American Sign Language Letters dataset for training and
testing. The custom model achieved 95% precision, 97% recall, and 96% mAP@0.5, showing the model's
capabilities in real-time hand gesture recognition.
1. Introduction
According to the WHO [1], over 1.5 billion people suffer from hearing loss around the globe. Additionally,
about one billion teenagers are at risk of developing hearing loss due to the improper use of earbuds and
headphones. Such disabilities severely affect children's physical and mental health, education, and
employment opportunities. Older adults often experience hearing loss, social isolation, loneliness, and
frustration. Children with hearing loss may face delayed language development and communication
difficulties. Unfortunately, hearing loss is often not adequately accommodated in private and government
settings, which negatively impacts employment opportunities and academic achievement. Children with
hearing impairment have a hard time understanding others and receive little to no education. There is no
universal sign language that is used by all deaf individuals. Sign language detection poses several
challenges due to its unique characteristics and the complexity of capturing and interpreting sign
language. Although there exist some noticeable similarities, each country has its own unique way of
using sign language. Some of the key challenges in sign language detection include:
1. Variations in sign languages: Sign languages vary across different regions and countries. Each sign language has
its own vocabulary, grammar, and syntax. Detecting and understanding different sign languages
require language-specific models and datasets, making developing a universal sign language
detection system challenging.
2. Gesture recognition: Sign languages involve a combination of simultaneous hand movements, facial
expressions, body postures, and other non-manual markers. Capturing and recognizing these subtle
and dynamic gestures accurately is a complex task.
3. Data scarcity: SL data is relatively scarce compared to spoken languages. Building accurate sign
language detection models requires large amounts of annotated data, which is often limited,
especially for certain sign languages or specific sign variations.
4. Background noise and occlusion: Sign language is often performed in real-world environments,
which can introduce background noise, occlusions, and cluttered backgrounds. These factors can
interfere with the visibility of the signer's hands and facial expressions, making it difficult to detect
and interpret signs accurately.
5. Ambiguity and context dependency: Sign languages rely on facial expressions, context, and body
movements to express meaning. Isolated signs may have multiple interpretations depending on the
surrounding signs or the speaker's intentions.
6. Real-time detection and latency: Low latency is crucial in applications where sign language detection
needs to be performed in real time, such as sign language interpretation systems or assistive
technologies.
Another major issue faced by the deaf community is that some languages are officially recognized while
others are not recognized by the government or the institutions. The National Association of the Deaf [2]
reports that there are 18 million individuals with hearing impairments in India. Innovative Artificial
Intelligence approaches [3] could be useful for a sign language interpretation application. The field of
artificial intelligence (AI) involves developing smart computers that can learn from raw data. These
machines are capable of making decisions even in unfamiliar situations. There are many systems and
methods that have been developed to tackle such problems. This research aims to compare two
algorithms of the same object detection family and also discuss the characteristics and advantages of
each algorithm in their own respective ways [4].
It is observed that many institutions tend to give priority to American Sign Language (ASL) over Indian
Sign Language (ISL). Nevertheless, ASL is chosen in this project as it is suitable for training the models
using alphabets and numbers. This paper introduces a custom CNN model to predict the gestures
performed in real time with high accuracy. The experiment-based functional model is capable of
recognizing hand gestures and can be deployed in any real-life situation.
2. Related Work
Sign language is unique due to its nature of using different kinds of physical gestures instead of using
specific sounds. A substantial amount of work has been done in facial and gesture recognition. Based on
our research, the existing methods can be divided into two categories.
In the first method, an external device is used for sign recognition, while the second method detects and
recognizes hand gestures [10] using deep learning. Several researchers have developed novel
techniques for SL recognition with solutions to different aspects such as cost, latency, performance and
portability. A. Bhattacharya [5] talks about training a classifier on a dataset of 24 gestures that can be
easily spelt on fingers using the "bag of visual words approach".
In order to describe each image as a graph (specifically a histogram) indicating the frequency of
observed gestures in that image, this method first identifies the features in the images, which are then
clustered to construct a codebook containing those gestures and their frequencies. T. Starner [6] showed two
real-time hidden Markov model-based systems that only require a camera to track the user's hands in order to
recognize intermediate-level continuous ASL.
The original system had a desk-mounted camera to watch the user and had a 92% accuracy rate. The
second device, which reaches 98% accuracy, mounts the camera in the user's hat. L. Aziz [7] surveyed the
most recent developments in visual object detection with deep learning, covering around 300 methods,
including region-based object detection methods such as SPPnet [30] and Faster R-CNN [32], and classification-
and regression-based object detection methods such as YOLO [15, 16], etc. The authors also researched
and analyzed publicly available benchmark datasets based on their origin, usage, advantages and
limitations along with their evaluation metrics. D. Naglot [8] recognized several signs using a Multi-layer
Perceptron neural network on 520 samples contained in a dataset. Table 1 shows recent state-of-the-art
methods for sign language detection.
Table 1
Related work

AUTHOR | METHOD | ALGORITHM USED | DATASET | ACCURACY
 | BRISK features | SVM; KNN | | SVM − 91.15; KNN − 87.38
C. Chuan et al. [9] | SL recognition using Leap Motion Sensor | K-NN & SVM (Leap Motion sensors and a webcam have a lot of potential to advance SL learning techniques) | Four data sets were collected from the two signers, with two sets from each individual | KNN − 7.78%; SVM − 79.83%
3. Methodology
The most commonly used datasets for object detection [20, 21] and segmentation are Pascal VOC 2007
[28] and Microsoft COCO [17, 29]. This research focuses on YOLO versions 5 and 8, as these models offer
better performance in recognizing sign language with high accuracy. The proposed methodology is
shown in Fig. 1.
3.1 YOLOv5
YOLOv5 is the deep learning-based architecture used to conduct this experiment. YOLOv5 [18] is
lightweight and fast and needs less computational power than other current state-of-the-art architectures
while keeping its accuracy close to that of current state-of-the-art detection models. Released by Glenn Jocher
in June 2020, YOLOv5 was the first model in the family to be published without a supporting paper and was
marked as being under "ongoing development" in its repository. Glenn Jocher is a researcher at Ultralytics LLC.
The GitHub repository for YOLOv5 is available at https://github.com/ultralytics/yolov5. YOLOv5 was built in
Python, which simplifies installation and integration on IoT devices [26], as opposed to C as in past versions.
Also, the PyTorch community is larger than the Darknet community, giving it more potential for growth and
expansion in the future [19]. It is much faster than the other YOLO models. YOLOv5 uses CSPNet [30] as the
backbone to extract the feature map from the image. It also uses a Path Aggregation Network (PANet) [31] to
boost the information flow. Figure 3 shows the architecture of YOLOv5. We are using YOLOv5 for the
following reasons:
1) YOLOv5 offers state-of-the-art features such as modern activation functions, tuned hyperparameters, data
augmentation techniques and easy-to-use documentation.
2) The model has a simple architecture, which makes it computationally easy to train even with small
resources.
3) The small and lightweight nature of YOLOv5 makes it useful for mobile devices and embedded
applications.
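As an illustration of how lightweight this workflow is, the sketch below loads a pretrained YOLOv5 checkpoint through PyTorch Hub and runs it on a single image. The `yolov5s` variant and the image path are placeholders chosen for the example, not the exact configuration used in this work.

```python
import torch

# Load a small pretrained YOLOv5 model from the official Ultralytics repository.
# A model fine-tuned on the ASL letters dataset would be loaded the same way
# by pointing to its custom weights.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Run inference on an example image (the path is a placeholder).
results = model("hand_gesture.jpg")

# Print detected classes, confidences and bounding boxes, and save an annotated copy.
results.print()
results.save()
```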
3.2 YOLOv8
Also authored by Glenn Jocher and launched on January 23rd, 2023, YOLOv8 is the latest in the family of
algorithms and is still in development, adding many new features such as anchor-free detection and mosaic
augmentation. A CLI included with YOLOv8 makes training a model easier to understand. Moreover, there is a
Python package [27] that offers a smoother development experience than the previous model. The GitHub
repository for YOLOv8 is available at https://github.com/ultralytics/ultralytics.
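A minimal sketch of that Python package is shown below, assuming the `ultralytics` pip package is installed; the `yolov8n.pt` checkpoint and the image path are illustrative placeholders rather than the settings used in these experiments.

```python
from ultralytics import YOLO

# Load a pretrained nano-sized YOLOv8 checkpoint (downloaded automatically on first use).
model = YOLO("yolov8n.pt")

# Run a single prediction; the image path is a placeholder.
results = model.predict("hand_gesture.jpg", conf=0.25)

# Each result exposes the predicted boxes, class ids and confidence scores.
for r in results:
    print(r.boxes.xyxy, r.boxes.cls, r.boxes.conf)
```

The bundled CLI exposes the same functionality, for example `yolo predict model=yolov8n.pt source=hand_gesture.jpg`.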
Predictions for both the bounding boxes and the classes are produced after the input image has been
evaluated only once; the algorithm is fast because the bounding-box and classification predictions are
performed simultaneously. The provided image is initially divided into a grid of equal cells (S x S). Next, a
confidence score is defined for each of the b bounding boxes of every grid cell, as shown in equation (i) [22].
Confidence is the probability that an object exists in every bounding box.
$$\text{Confidence}\ (C) = P(\text{object}) \times IOU_{pred}^{target}$$ —(i)
IOU [23] (Intersection over Union) is a fractional value between 0 and 1. The union is the total area covered by
the predicted and the target boxes, whereas the intersection is the overlap between the predicted bounding box
and the target area. The ideal value is close to 1, which denotes that the estimated bounding box is close to the
target region. Along with this, every grid cell also predicts the conditional class probability, as shown in
equations (ii) and (iii).
$$C = P(\text{Class}_i \mid \text{object}) \times P(\text{object}) \times IOU_{pred}^{target}$$ —(ii)

$$C = P(\text{Class}_i) \times IOU_{pred}^{target}$$ —(iii)
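The short sketch below works through equations (i) to (iii) numerically: a plain-Python IoU between a predicted box and a target box in (x1, y1, x2, y2) format, combined with assumed objectness and class probabilities. All numeric values here are invented for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Illustrative values only.
predicted_box = (50, 50, 150, 160)
target_box = (60, 55, 155, 150)
p_object = 0.9               # P(object): probability that the box contains an object
p_class_given_object = 0.8   # P(Class_i | object), e.g. for the letter "A"

overlap = iou(predicted_box, target_box)
confidence = p_object * overlap                        # equation (i)
class_confidence = p_class_given_object * confidence   # equations (ii)/(iii)
print(f"IoU={overlap:.3f}  C={confidence:.3f}  class-specific C={class_confidence:.3f}")
```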
Now coming to the loss function: it is calculated by summing the loss terms of all the bounding box
parameters, as shown in equation (iv).
$$
\begin{aligned}
&\lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{b} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{b} \mathbb{1}_{ij}^{obj} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{b} \mathbb{1}_{ij}^{obj} \left(C_i - \hat{C}_i\right)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{b} \mathbb{1}_{ij}^{noobj} \left(C_i - \hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
$$ —(iv)
The given equation carries important terms, defined as follows [23]:

$(w_i, h_i)$ – dimensions of the target bounding box
$(\hat{w}_i, \hat{h}_i)$ – dimensions of the predicted bounding box
The equation's first part computes the loss associated with the bounding box coordinates $(x_i, y_i)$. If an
object is present inside the $j$th predicted bounding box of the $i$th cell, $\mathbb{1}_{ij}^{obj}$ is defined as 1,
and 0 otherwise, as shown in equation (v). The predicted bounding box with the highest current IOU with the
target region is considered "responsible" for predicting the object [19].
$$\lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{b} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]$$ —(v)
The next portion is responsible for calculating the error in the prediction of the dimensions of the
bounding box. The same scale of inaccuracy, however, should affect the loss less for large boxes than for
small ones. The square roots of width and height, which are both normalized to the range of 0 to 1, make
discrepancies between smaller values larger than those between bigger ones. As a result, rather than using
the dimension values directly, the square roots of the bounding box dimensions are used, as shown in
equation (vi).
$$\lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{b} \mathbb{1}_{ij}^{obj} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right]$$ —(vi)
The loss value of confidence is calculated in the next part for both circumstances, regardless of whether
an object is present inside the bounding box or not. However, the loss function only penalizes the
object-confidence error if that predictor is responsible for the target box, as shown in equation (vii);
$\mathbb{1}_{ij}^{obj}$ equals 1 if there is an object in the cell, and 0 otherwise.
$$\sum_{i=0}^{S^2} \sum_{j=0}^{b} \mathbb{1}_{ij}^{obj} \left(C_i - \hat{C}_i\right)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{b} \mathbb{1}_{ij}^{noobj} \left(C_i - \hat{C}_i\right)^2$$ —(vii)
With the exception of $\mathbb{1}_{i}^{obj}$, which is needed because the algorithm does not penalize
classification errors if there are no objects present in the cell, the last portion computes the loss of the class
probabilities [19], as shown in equation (viii).
$$\sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2$$ —(viii)
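To make the composition of equations (iv) to (viii) concrete, the sketch below sums the four terms for a toy S x S grid with b boxes per cell using NumPy. The tensor layout, the lambda values and all input numbers are assumptions made for illustration; YOLOv5 and YOLOv8 use more elaborate internal loss formulations, so this only mirrors the equations stated above.

```python
import numpy as np

def yolo_style_loss(pred_boxes, target_boxes, pred_cls, target_cls,
                    obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum of the terms in equations (v)-(viii).

    pred_boxes, target_boxes: (S, S, b, 5) arrays of (x, y, w, h, confidence).
    pred_cls, target_cls:     (S, S, C) per-cell class probabilities.
    obj_mask:                 (S, S, b) boolean; True where box j of cell i is
                              responsible for a ground-truth object.
    """
    noobj_mask = ~obj_mask
    m = obj_mask[..., None]  # broadcastable over the coordinate axis

    # Equation (v): localisation loss on the box centres (x, y).
    xy_loss = lambda_coord * np.sum(m * (pred_boxes[..., 0:2] - target_boxes[..., 0:2]) ** 2)

    # Equation (vi): square roots of width/height so small boxes weigh relatively more.
    wh_loss = lambda_coord * np.sum(
        m * (np.sqrt(pred_boxes[..., 2:4]) - np.sqrt(target_boxes[..., 2:4])) ** 2)

    # Equation (vii): confidence loss, weighted differently for object / no-object boxes.
    conf_err = (pred_boxes[..., 4] - target_boxes[..., 4]) ** 2
    conf_loss = np.sum(obj_mask * conf_err) + lambda_noobj * np.sum(noobj_mask * conf_err)

    # Equation (viii): per-cell class-probability loss, only where the cell holds an object.
    cell_has_obj = obj_mask.any(axis=-1)  # shape (S, S)
    cls_loss = np.sum(cell_has_obj[..., None] * (pred_cls - target_cls) ** 2)

    # Equation (iv): the total loss is the sum of all four terms.
    return xy_loss + wh_loss + conf_loss + cls_loss

# Toy example: a 7x7 grid, 2 boxes per cell and 26 classes (letters A-Z).
S, B, C = 7, 2, 26
rng = np.random.default_rng(0)
pred_boxes, target_boxes = rng.random((S, S, B, 5)), rng.random((S, S, B, 5))
pred_cls, target_cls = rng.random((S, S, C)), rng.random((S, S, C))
obj_mask = rng.random((S, S, B)) > 0.9  # a few "responsible" predictors

print("total loss:", yolo_style_loss(pred_boxes, target_boxes, pred_cls, target_cls, obj_mask))
```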
The first step is to install and initialize both algorithms, YOLOv5 and YOLOv8. Both algorithms run
simultaneously on different devices with similar specifications, which makes it easier to compare the results
and is also less time-consuming. The dataset chosen for this work is "American Sign Language Letters" [23]
from the publicly available datasets on Roboflow. Along with that, PyTorch is used, which is based on the
well-known Torch library and is a Python-based framework frequently used for computer vision and natural
language processing. When downloading a dataset from Roboflow, several methods are provided for
integrating that dataset into a model. One of these is to apply the PyTorch export code, which installs the
roboflow package and downloads the dataset directly into the program's working directory. Another major
package that is installed is Ultralytics, which provides all the versions of the YOLO algorithm, making it
well suited for this particular program.
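A sketch of this setup is given below. It assumes a Roboflow account; the API key and the workspace/project/version identifiers are placeholders, since Roboflow generates the exact export snippet on the dataset's download page.

```python
# Typically run once in a notebook cell:
#   pip install ultralytics roboflow

from roboflow import Roboflow

# Download the "American Sign Language Letters" dataset in a YOLO-compatible format.
rf = Roboflow(api_key="YOUR_API_KEY")                       # placeholder key
project = rf.workspace("YOUR_WORKSPACE").project("american-sign-language-letters")
dataset = project.version(1).download("yolov8")             # a YOLOv5 export format is also offered

print("Dataset downloaded to:", dataset.location)
```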
4. Results
The dataset used contains images for testing, training and validation in the ratio 1:21:2
(72 for testing, 1512 for training, 144 for validation), and the new model was trained for 80 epochs.
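A training run of this kind can be launched with the ultralytics Python API roughly as follows; the data.yaml path comes from the Roboflow download described above, and the nano checkpoint and image size are illustrative defaults rather than the paper's exact settings.

```python
from ultralytics import YOLO

# Train a YOLOv8 model on the ASL letters dataset for 80 epochs.
model = YOLO("yolov8n.pt")
model.train(data="American-Sign-Language-Letters-1/data.yaml",  # placeholder path
            epochs=80, imgsz=640)

# Evaluate the best checkpoint on the validation split.
metrics = model.val()
print("mAP@0.5:", metrics.box.map50)
```

YOLOv5 is trained analogously through its repository's train.py script, for example `python train.py --data data.yaml --weights yolov5s.pt --epochs 80`.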
Figure 2 shows the confusion matrices of v5 and v8 on the ASL dataset. It is worth noting that, compared
to YOLOv5, YOLOv8's confusion matrix shows more discrepancies, which makes v8 more vulnerable
to unseen images and real-life applications. In YOLOv5 there are 12 occurrences, other than background,
where the predicted value does not match the true value, while in YOLOv8's case there are 13. In hindsight
this comparison may seem redundant, but after observing the two matrices it becomes clear that the
background predictions of the two models differ considerably. One difference between v5 and v8 is the
number of iterations needed to reach the minimum: v8 requires fewer iterations than v5. This may be
because YOLOv8 is still in its first few stages of development compared to its predecessor.
The next noticeable difference is the significance of each wrong prediction; the other model presents 7 of
those specific cases. Figure 3 compares the loss reduction of both YOLO versions; v5's loss initially descends
faster than v8's, but for both the bounding box and classification loss YOLOv8 reaches its minimum earlier,
around the 40th epoch, whereas YOLOv5 reaches that point only after the 50th epoch, presumably around the
60th epoch or later, as shown in Table 2. The custom model achieved 95% precision, 97% recall, and 96%
mAP@0.5, showing the model's capabilities in real-time hand gesture recognition. The bounding box loss of
the v5 and v8 base models is 0.326 and 0.312, while the classification loss of the v5 and v8 models is 0.191
and 0.175. This shows that YOLOv8 is better at detecting objects and also has higher classification accuracy.
Overall, v8 is more accurate and faster at detecting objects and recognizing hand gestures. YOLOv5
shows more crowded points and also larger jumps between the initial few points. After all the results of
the training were obtained, a small batch from the validation subset was run and tested. Comparing both
results on the surface indicates that both models perform well when the gesture is distinct and easily
identifiable with the naked eye. However, the same cannot be said for gestures that are less distinct and can
easily be mistaken for another letter entirely. YOLOv8 gives more accurate and correct results and also
provides predictions for certain gestures that were not properly predicted by the YOLOv5 model.
Table 2
Initial and final loss (bounding box loss, classification loss and mAP) for YOLOv5 and YOLOv8
Both neural networks were trained for 80 epochs on the web IDE Google Colab with a Tesla T4 at P8 and
CUDA version 12.0. Training took approximately 6 hours, excluding another 1–2 hours for preparing the
dataset and testing. Validation results of both models are shown in Fig. 4a and Fig. 4b, and the precision
curves of YOLOv5 and YOLOv8 are shown in Fig. 5a and Fig. 5b. Both performances are excellent and do not
show any major difference between themselves. The difference between their final confidence levels is
observed to be around 0.01, so no firm conclusion can be drawn on whether one model is better than the
other. However, the situation is different during practical testing: while images provided from the dataset [24]
show no difference between the two models, YOLOv5 performs better for the majority of the time when
testing an intermediate-quality recorded video. Correct and incorrect predictions from both versions are shown
in Fig. 6a and Fig. 6b.
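When reproducing this setup on Colab, the assigned GPU and the CUDA build visible to PyTorch can be checked with a couple of lines such as the following; this is a convenience sketch and not part of the reported experiments.

```python
import torch

# Confirm that a CUDA-capable GPU (e.g. the Tesla T4 assigned by Colab) is visible.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("CUDA version used by PyTorch:", torch.version.cuda)
```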
5. Conclusion
YOLOv8 performs noticeably better during training and validation, and to some extent even in testing.
However, the model is not ready for practical applications that involve constant frame movement, which
makes YOLOv8 a more conceptual model at its current stage. On the other hand, YOLOv5 is easily able to
predict the right gesture and even fills in predictions that were left blank by the other model.
Declarations
On behalf of all authors, I state that there is no conflict of interest.
Ethical Considerations I declare that all ethical guidelines and principles have been followed during the
course of this research.
Contributors: Authors 1 and 2 wrote the manuscript, and authors 3, 4 and 5 designed the experiments of the proposed model.
Availability of Data: All the links for the external sources are included in the manuscript.
14. S. Teja Mangamuri, L. Jain and A. Sharmay, "Two Hand Indian Sign Language dataset for
benchmarking classification models of Machine Learning," 2019 International Conference on Issues
and Challenges in Intelligent Computing Techniques (ICICT), 2019, pp. 1-5, doi:
10.1109/ICICT46931.2019.8977713.
15. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object
Detection”, in proceedings of the IEEE conference on computer vision and pattern recognition, pp.
779–788, 2016
16. Du, Juan. "Understanding object detection based on CNN family and YOLO." In Journal of Physics:
Conference Series, vol. 1004, p. 012029. IOP Publishing, 2018
17. Redmon, J., & Farhadi, A. (2017). YOLO9000: better, faster, stronger. In Proceedings of the IEEE
conference on computer vision and pattern recognition (pp. 7263-7271)
18. Thuan, D. (2021). Evolution of Yolo algorithm and Yolov5: The State-of-the-Art object detection
algorithm.
19. D. T. Yung, W. K. Wong, F. H. Juwono and Z. A. Sim, "Safety Helmet Detection Using Deep Learning:
Implementation and Comparative Study Using YOLOv5, YOLOv6, and YOLOv7," 2022 International
Conference on Green Energy, Computing and Sustainable Technology (GECOST), Miri Sarawak,
Malaysia, 2022, pp. 164-170, doi: 10.1109/GECOST55694.2022.10010490.
20. R. Huang, J. Pedoeem and C. Chen, "YOLO-LITE: A Real-Time Object Detection Algorithm Optimized for
Non-GPU Computers," 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA,
2018, pp. 2503-2510, doi: 10.1109/BigData.2018.8621865
21. Diwan, T., Anirudh, G., & Tembhurne, J. V. (2022). Object detection using YOLO: Challenges,
architectural successors, datasets and applications. Multimedia Tools and Applications, 1-33.
22. Thuan, D. (2021). Evolution of Yolo algorithm and Yolov5: The State-of-the-Art object detection
algorithm.
23. Lee, 2020, American Sign Language Letters Dataset v1, https://public.roboflow.com/object-detection/american-sign-language-letters
24. Noman, V. Stankovic and A. Tawfik, "Object Detection Techniques: Overview and Performance
Comparison," 2019 IEEE International Symposium on Signal Processing and Information Technology
(ISSPIT), 2019, pp. 1-5, doi: 10.1109/ISSPIT47144.2019.9001879.
25. Rastogi, A., & Ryuh, B. S. (2019). Teat detection algorithm: YOLO vs. Haar-cascade. Journal of
Mechanical Science and Technology, 33, 1869-1874.
26. Wu, S., Li, Z., Li, S., Liu, Q., & Wu, W. (2023). Static Gesture Recognition Algorithm Based on Improved
YOLOv5s. Electronics, 12(3), 596.
27. Karaman, A., Pacal, I., Basturk, A., Akay, B., Nalbantoglu, U., Coskun, S., ... & Karaboga, D. (2023).
Robust real-time polyp detection system design based on YOLO algorithms by optimizing activation
functions and hyper-parameters with artificial bee colony (ABC). Expert Systems with Applications,
119741.
28. Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. 2010. The
Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vision 88, 2 (June 2010), 303–338.
https://doi.org/10.1007/s11263-009-0275-4.
29. Lin, Tsung-Yi, et al. "Microsoft coco: Common objects in context." Computer Vision–ECCV 2014: 13th
European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer
International Publishing, 2014.
30. Wang CY, Liao HY, Wu YH, Chen PY, Hsieh JW, Yeh IH. CSPNet: A new backbone that can enhance
learning capability. InProceedings of the IEEE/CVF conference on computer vision and pattern
recognition workshops 2020 (pp. 390-391).
31. Liu S, Qi L, Qin H, Shi J, Jia J. Path aggregation network for instance segmentation. InProceedings of
the IEEE conference on computer vision and pattern recognition 2018 (pp. 8759-8768).
32. Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks."
Advances in neural information processing systems 28 (2015).
Figures

Figure 1. Methodology

Figure 2. Confusion matrices of YOLOv5 and YOLOv8 on the ASL dataset

Figure 3. Comparison of the loss reduction of YOLOv5 and YOLOv8

Figure 4. Validation results of YOLOv5 and YOLOv8

Figure 5. Precision curves of YOLOv5 and YOLOv8

Figure 6. Correct and incorrect predictions using both versions