A PROJECT REPORT
Submitted by
SHREYAS [710120205303]
MADHAN KUMAR [710120205026]
ASHOKAN [710120205006]
BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY
ADITHYA INSTITUTE OF TECHNOLOGY
MAY 2023
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
Dr. Mishmala Sushith, M.E., Ph.D. Ms. Teitsana Devi, B.E., M.E.
HEAD OF THE DEPARTMENT SUPERVISOR
Department of Information Technology Assistant Professor - IT
Adithya Institute of Technology, Adithya Institute of Technology,
Kurumbapalayam, Kurumbapalayam,
Coimbatore-641107. Coimbatore-641107.
Held on
ABSTRACT
The proposed system is a real-time sign language detector that uses computer vision techniques and machine learning algorithms to recognize American Sign Language (ASL) gestures and translate them into text or speech. The system consists of a camera that captures video footage of the user's hand gestures, which is then fed into a deep-learning model built on MediaPipe for analysis. The proposed system is designed to be user-friendly and accessible, with a simple interface that allows users to easily input their ASL gestures and receive real-time feedback. In the future, the system may be extended with a real-time video-call sign translation feature.
LIST OF FIGURES
FIG. NO FIGURE NAME
1 American Sign Language Symbols
2 Example of MediaPipe Holistic
3 MediaPipe Landmarks for Hands
4 Model Architecture Diagram
5 LSTM Computation Cell
6 Model Testing Accuracy & Loss
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
1. INTRODUCTION
2. LITERATURE SURVEY
3. EXISTING SYSTEM
4. PROPOSED SYSTEM
5. MODEL ARCHITECTURE
6. SYSTEM SPECIFICATION
7. IMPLEMENTATION AND OUTPUT RESULT
8. CONCLUSION
CHAPTER 1
INTRODUCTION
Earlier approaches deal with feature extraction, for example boundary modelling, contour extraction, segmentation of gestures and estimation of hand shapes. However, these solutions are not lightweight enough to run on real-time devices like mobile phone applications and are therefore restricted to platforms equipped with robust processors. Moreover, the challenge of
hand-tracking remained persistent in all these techniques. To address this drawback,
our proposed methodology uses Google's innovative, rapidly growing, open-source project MediaPipe, with a machine learning algorithm on top of this framework, to obtain a faster, simpler, cost-effective, portable and easy-to-deploy pipeline that can be used as a sign language recognition system.
1.1 SCOPE OF THE PROJECT
The main scope of this project is to recognize sign language from human gestures and translate it. The system improves communication between hearing people and deaf people, acting as a bridge between the two. The project can also be adapted to the sign languages of different regions.
1.2 OBJECTIVE OF THE PROJECT
The objective of this project is to create an automatic sign language translator
that can recognize and interpret sign language gestures and movements accurately
and generate spoken language output in real time. The project aims to use state-of-the-art computer vision and machine learning techniques to develop a reliable and
efficient system that can recognize a broad range of sign language gestures and
dialects.
The system's primary objective is to facilitate communication between
individuals who use sign language and those who do not, improving inclusivity and
accessibility for people with hearing impairments.
The project's secondary objective is to develop a user-friendly and accessible
interface that can be easily used by people with varying levels of technical knowledge.
Ultimately, the project's success will be measured by its ability to accurately translate
sign language into spoken language and its usefulness in promoting communication
and inclusivity in society.
CHAPTER 2
LITERATURE SURVEY
Hand gesture recognition is a relatively difficult problem to address in the field of machine learning. Classification methods can be divided into supervised and unsupervised methods. Based on these methods, an SLR system can recognize static or dynamic hand sign gestures.
2.1 Neural Network for the First Time in Sign Language Recognition
In 1991, Murakami and Taguchi published a research article using a neural network for the first time in sign language recognition. With developments in the field of computer vision, numerous researchers came up with novel approaches to help the physically challenged community. Using coloured gloves, a real-time hand tracking application was developed by Wang and Popovic. The colour pattern of the gloves was recognized by the K-Nearest Neighbours (KNN) technique, but the system requires a continuous feed of hand streams.
2.2 Isolated Sign Recognition
Support Vector Machine (SVM) approaches outperformed this algorithm in the research findings of Rekha et al., Kurdyumov et al., Tharwat et al. and Baranwal and Nandi.
There are two types of Sign Language Recognition: Isolated sign recognition and
continuous sentence recognition. Likewise, whole sign level modelling and subunit
sign level modelling exist in the SLR system. Visual-descriptive and linguistic-
oriented are two approaches that lead to subunit level sign modelling.
2.3 Pre-Processed Images for a Hand-Detection System
R. Sharma et al. used 80,000 individual numeric signs, with more than 500 pictures per sign, to train a machine learning model. Their system methodology
comprises a training database of pre-processed images for a hand-detection system and
a gesture recognition system. Image pre-processing included feature extraction to
normalize the input information before training the machine learning model. The
images are converted into grayscale for better object contours while maintaining a standardized resolution, and then flattened into a smaller number of one-dimensional
components. The feature extraction technique helps to extract certain features about
the pixel data from images and feed them to CNN for easier training and more
accurate prediction. Hand tracking in 2D and 3D space has been performed by W. Liu et al. They used skin saliency, where skin tones within a specific range were extracted for better feature extraction, and achieved a classification accuracy of around 98%.
2.4 Action Recognition Using CNN
Similar to action recognition, some recent works use CNNs to extract the
holistic features from image frames and then use the extracted features for
classification. Several approaches first extract body keypoints and then concatenate
their locations as a feature vector. The extracted features are then fed into a stacked
GRU for recognizing signs. These methods demonstrate the effectiveness of using
human poses in the word-level sign recognition task. Instead of encoding the spatial
and temporal information separately, recent works also employ 3D CNNs to capture
spatial-temporal features together. However, these methods are only tested on small-scale datasets. Thus, the generalization ability of those methods remains unknown.
Moreover, due to the lack of a standard word-level large-scale sign language dataset,
the results of different methods evaluated on different small-scale datasets are not
comparable and might not reflect the practical usefulness of models.
The most important obstacle to vision based SLT research has been the
availability of suitable datasets. Curating and annotating continuous sign language
videos with spoken language translations is a laborious task. There are datasets
available from linguistic sources and sign language interpretations from broadcasts.
However, the available annotations are either weak (subtitles) or too few to build
models which would work on a large domain of discourse. In addition, such datasets
lack the human pose information which legacy Sign Language Recognition (SLR)
methods heavily relied on.
The relationship between sign sentences and their spoken language translations is non-monotonic, as they have different orderings. Also, sign glosses and linguistic
constructs do not necessarily have a one-to-one mapping with their spoken language
counterparts. This made the use of available CSLR methods (that were designed to
learn from weakly annotated data) infeasible, as they are built on the assumption that
sign language videos and corresponding annotations share the same temporal order.
It is evident from all these previous methods that, to recognize hand gestures precisely with high accuracy, models require a large dataset and a complicated methodology with complex mathematical processing. Pre-processing of images plays a vital role in the gesture tracking process. Therefore, for our project, we used an open-source framework from Google known as MediaPipe, which is capable of detecting human body parts accurately.
CHAPTER 3
EXISTING SYSTEM
3.1 Sign Language Recognition
Early approaches for SLR rely on hand-crafted features (Tharwat et al., 2014;
Yang, 2010) and use Hidden Markov Models (Forster et al., 2013) or Dynamic Time
Warping (Lichtenauer et al., 2008) to model sequential dependencies. More recently,
2D convolutional neural networks (2D-CNN) and 3D convolutional neural networks
(3D-CNN) effectively model spatio-temporal representations from sign language
videos (Cui et al., 2017; Molchanov et al., 2016). Most existing work on CSLR
divides the task into three sub-tasks: alignment learning, single-gloss SLR, and
sequence construction (Koller et al., 2017; Zhang et al., 2014) while others perform
the task in an end-to-end fashion using deep learning (Huang et al., 2015; Camgoz et
al., 2017).
hidden states. Vaswani et al. (2017) introduce the Transformer, a seq2seq model
relying on self-attention that obtains state-of-the-art results in NMT.
CHAPTER 4
PROPOSED SYSTEM
The deaf-mute community has undeniable communication problems in daily life. Recent developments in artificial intelligence help tear down this communication barrier. The main purpose of this project is to demonstrate a methodology that simplifies Sign Language Recognition using MediaPipe's open-source framework and a machine learning algorithm. The predictive model is lightweight and adaptable to smart devices. Multiple sign language datasets, such as American, Indian, Italian and Turkish, are used for training purposes to analyze the capability of the framework. With an average accuracy of 99%, the proposed model is efficient, precise and robust. Real-time accurate detection using the Long Short-Term Memory (LSTM) algorithm without any wearable sensors makes this technology more comfortable and easier to use.
Sign Languages Created Among Small Communities: The NIDCD is also funding
research on sign languages created among small communities of people with little to
no outside influence.
4.2 MediaPipe
keypoints and is available on-device for mobile and desktop.
Figure 3 - MediaPipe Landmarks for Hands
CHAPTER 5
MODEL ARCHITECTURE
MediaPipe has a large collection of human body detection and tracking models, which are trained on Google's massive and highly diverse datasets. They track keypoints on different parts of the body as a skeleton of nodes and edges, or landmarks. All coordinate points are normalized in three dimensions. Models built by Google developers using TensorFlow Lite make the flow of information easily adaptable and modifiable via graphs. MediaPipe pipelines are composed of nodes on a graph, which are generally specified in a .pbtxt file. These nodes are connected to C++ files.
The base calculator class in MediaPipe expands upon these files. Just like a video stream, this class receives contracts of media streams from other nodes in the graph and ensures that it is connected. Once the rest of the pipeline's nodes are connected, the class generates its own processed output data. Packet objects encapsulating many different types of information are used to send each stream of information to each calculator. Side packets can also be injected into a graph, so that a calculator node can be supplied with auxiliary data such as constants or static properties. This simplified dataflow structure in the pipeline enables additions or modifications with ease, and the flow of data becomes more precisely controllable. The hand tracking solution has an ML pipeline at its backend consisting of two models working together: a) the Palm Detection Model and b) the Hand Landmark Model. The Palm Detection Model provides an accurately cropped palm image, which is then passed on to the landmark model. This process diminishes the need for the data augmentation (i.e. rotations, flipping, scaling) used in deep learning models and dedicates most of its power to landmark localization.
The traditional way is to detect the hand in the frame and then perform landmark localization over the current frame. This palm detector's ML pipeline, however, tackles the challenge with a different strategy. Detecting hands is a complex procedure, as it requires image processing and thresholding and must cope with a variety of hand sizes, which is time-consuming. Instead of directly detecting the hand in the current frame, a palm detector is first trained to estimate bounding boxes around rigid objects like palms and fists, which is simpler than detecting hands with articulated fingers. Secondly, an encoder-decoder is used as an extractor for bigger scene context.
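To make this pipeline concrete, the following is a minimal sketch of running MediaPipe's hand tracking solution on webcam frames and drawing the 21 landmarks of each detected hand. The camera index and confidence thresholds are illustrative assumptions, and the appendix code uses the Holistic solution rather than Hands.

import cv2
import mediapipe as mp

# Minimal sketch: run MediaPipe Hands on webcam frames and draw the landmarks.
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_hands.Hands(min_detection_confidence=0.75,
                    min_tracking_confidence=0.75) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV captures BGR frames
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                # Each hand has 21 landmarks with normalized x, y, z coordinates
                mp_drawing.draw_landmarks(frame, hand_landmarks,
                                          mp_hands.HAND_CONNECTIONS)
        cv2.imshow('MediaPipe Hands', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()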
file is then prepared for splitting into training and validation sets. 80% of the data is retained for training our model with various optimization and loss functions, whereas 20% of the data is reserved for validating the model.
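As a rough sketch of that 80/20 split using scikit-learn (the arrays below are random stand-ins for the real keypoint sequences and labels):

import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the real data: 60 sequences of 10 frames x 126 keypoint values
X = np.random.rand(60, 10, 126)
Y = np.eye(2)[np.random.randint(0, 2, 60)]   # one-hot labels for two example signs

# Hold back 20% of the sequences for validation, 80% for training
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)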
Hidden State (ht): The hidden state is the output of the LSTM at each time step. It
can be considered as the LSTM's "memory" of the previous time steps and contains
information that is relevant for prediction or classification tasks.
Input Gate (it), Forget Gate (ft), and Output Gate (ot): These gates regulate the
flow of information through the LSTM cell. The input gate determines how much new
information should be stored in the cell state, the forget gate controls what information
to discard from the cell state, and the output gate determines how much of the cell
state should be exposed as the hidden state.
Gate Activation Functions: Sigmoid activation functions are typically used for the
input, forget, and output gates to squash the gate values between 0 and 1. This allows
the gates to control the flow of information effectively.
Cell Activation Function: The cell state uses a hyperbolic tangent (tanh) activation
function to squish the values between -1 and 1, which helps in capturing the
relationships and dependencies between different time steps.
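For reference, the standard LSTM update equations corresponding to these gates and activations are (with $W$, $U$ and $b$ denoting learned weights and biases, $\sigma$ the sigmoid function, and $\odot$ element-wise multiplication):

\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), &
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), &
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), &
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, &
h_t &= o_t \odot \tanh(c_t).
\end{aligned}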
During the forward pass of training or inference, the LSTM takes a sequence of
inputs (e.g., word embeddings or image features) and updates its cell state and hidden
state at each time step. The updated hidden state can be used for tasks like sequence
prediction, sentiment analysis, machine translation, and more.
LSTMs have been widely used in various domains, including natural language
processing (NLP), speech recognition, time series analysis, and computer vision, due
to their ability to model and understand complex sequential patterns and dependencies.
They have proven to be effective in capturing long-range dependencies and have
become a popular choice for tasks involving sequential data.
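As a rough illustration of how such an LSTM classifier over hand-keypoint sequences could be defined with Keras, consider the sketch below; the layer sizes and the 10-frame, 126-value input shape are assumptions for illustration, not the exact architecture of Figure 4.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Assumed input: sequences of 10 frames, each holding 126 hand-keypoint values
model = Sequential([
    LSTM(32, return_sequences=True, activation='tanh', input_shape=(10, 126)),
    LSTM(64, return_sequences=False, activation='tanh'),
    Dense(32, activation='relu'),
    Dense(2, activation='softmax'),   # e.g. two example signs 'a' and 'b'
])
model.compile(optimizer='Adam', loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
model.summary()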
CHAPTER 6
SYSTEM SPECIFICATION
6.1 Hardware Requirement
RAM: 16 GB
5 OpenCV 4.7.0 https://opencv.org/
Anaconda:
It includes over 1,500 data science packages and libraries, making it easy to get
started with data analysis and machine learning.
Anaconda is available for Windows, macOS, and Linux, and can be installed
through a graphical installer or command-line interface.
TensorFlow:
It was developed by Google and is widely used in industry and academia for a
variety of tasks, including image and speech recognition, natural language
processing, and more.
TensorFlow provides a flexible and scalable framework for building deep neural
networks, including support for distributed training across multiple GPUs and
machines.
It includes a range of pre-built models and tools for visualizing and analyzing
model performance.
TensorFlow can be used with Python, C++, Java, and other programming
languages.
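As a small, self-contained illustration of the Keras API that ships with TensorFlow (the layer sizes and shapes below are arbitrary examples, not the project's model):

import tensorflow as tf

# Tiny illustrative Keras model: two dense layers for a 10-class problem
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(100,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()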
Pandas:
It offers data structures like DataFrames and Series, which enable easy
manipulation and analysis of structured data.
Pandas provides powerful features for data cleaning, filtering, and aggregation.
Pandas supports a wide range of data formats, including CSV, Excel, SQL
databases, and more.
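For instance, a few lines of pandas are enough to filter and aggregate a small table (the data below is made up for illustration):

import pandas as pd

# Small in-memory example: a DataFrame of per-sign recognition accuracy
df = pd.DataFrame({'sign': ['a', 'b', 'a', 'b'],
                   'accuracy': [0.91, 0.88, 0.95, 0.90]})
print(df[df['accuracy'] > 0.9])                # boolean filtering
print(df.groupby('sign')['accuracy'].mean())   # simple aggregation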
NumPy:
It offers efficient data structures like arrays and matrices, which enable fast
computations.
NumPy enables efficient data storage and retrieval using disk or memory-
mapped files.
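A short example of the kind of array manipulation NumPy provides, mirroring how the project flattens hand landmarks into feature vectors (values here are random):

import numpy as np

# Flatten 21 (x, y, z) landmark coordinates into a single feature vector
landmarks = np.random.rand(21, 3)
features = landmarks.flatten()        # shape (63,)
print(features.shape, features.mean())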
OpenCV:
OpenCV is a popular library for computer vision and image processing tasks.
It offers a wide range of functions and algorithms for image and video I/O, image manipulation, feature detection, object recognition, and 3D reconstruction.
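A minimal OpenCV example of the grayscale conversion and resizing mentioned earlier in the report (the image path is hypothetical):

import cv2

# Hypothetical input image of a hand gesture
image = cv2.imread('gesture.jpg')
if image is not None:
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # grayscale for simpler contours
    resized = cv2.resize(gray, (64, 64))             # standardized resolution
    cv2.imwrite('gesture_gray.jpg', resized)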
Scikit-learn:
It also offers functionality for evaluating models and handling missing data.
Scikit-learn is compatible with other Python libraries like NumPy, Pandas, and
Matplotlib.
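A brief sketch of the scikit-learn fit/predict/evaluate workflow (the features and labels are randomly generated for illustration):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Random stand-in features and labels, just to show the fit/predict/score pattern
X = np.random.rand(100, 63)
y = np.random.randint(0, 2, 100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = KNeighborsClassifier().fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))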
MediaPipe:
It provides pre-built components and algorithms for tasks like hand tracking,
face detection, pose estimation, and object detection.
Matplotlib:
It provides a wide range of functions for creating line plots, scatter plots, bar
plots, histograms, and more.
Matplotlib offers extensive customization options for labels, titles, axes, colors,
and styles.
Matplotlib is compatible with other Python libraries like NumPy, Pandas, and
scikit-learn.
6.3 Installation Procedure
6.3.1 Anaconda Navigator - Installing on Windows
1. Download the Anaconda installer.
2. Go to your Downloads folder and double-click the installer to launch. To
prevent permission errors, do not launch the installer from the Favorites folder.
Notes : If you encounter issues during installation, temporarily disable your
anti-virus software during install, then re-enable it after the installation
concludes. If you installed for all users, uninstall Anaconda and re-install it for
your user only.
3. Click Next.
4. Read the licensing terms and click I Agree.
5. It is recommended that you install for Just Me, which will install Anaconda
Distribution to just the current user account. Only select an install for All
Users if you need to install for all users’ accounts on the computer (which
requires Windows Administrator privileges).
6. Click Next.
7. Select a destination folder to install Anaconda and click Next. Install Anaconda
to a directory path that does not contain spaces or unicode characters.
8. Do not install as Administrator unless admin privileges are required.
9. Choose whether to add Anaconda to your PATH environment variable or
register Anaconda as your default Python. We don’t recommend adding
Anaconda to your PATH environment variable, since this can interfere with
other software. Unless you plan on installing and running multiple versions of
Anaconda or multiple versions of Python, accept the default and leave this box
checked. Instead, use Anaconda software by opening Anaconda Navigator or
the Anaconda Prompt from the Start Menu.
10. As of Anaconda Distribution 2022.05, the option to add Anaconda to the PATH
environment variable during an All Users installation has been disabled. This
was done to address a security exploit. You can still add Anaconda to the PATH
environment variable during a Just Me installation.
11. Click Install. If you want to watch the packages Anaconda is installing, click Show Details.
12. Click Next.
13. Optional: To install DataSpell for Anaconda, click https://www.anaconda.com/dataspell.
14. Or to continue without DataSpell, click Next.
1. After a successful installation you will see the “Thanks for installing Anaconda”
dialog box:
2. If you wish to read more about Anaconda.org and how to get started with
Anaconda, check the boxes “Anaconda Distribution Tutorial” and “Learn more
about Anaconda”. Click the Finish button.
3. Verify your installation.
# Include the bash command regardless of whether or not you are using the Bash shell
bash ~/Downloads/Anaconda3-2020.05-Linux-x86_64.sh
# Replace ~/Downloads with your actual path
# Replace the .sh file name with the name of the file you downloaded
For Python 2.7, enter the following:
# Include the bash command regardless of whether or not you are using the Bash shell
bash ~/Downloads/Anaconda2-2019.10-MacOSX-x86_64.sh
# Replace ~/Downloads with your actual path
# Replace the .sh file name with the name of the file you downloaded
Press Enter to review the license agreement. Then press and hold Enter to scroll.
Enter “yes” to agree to the license agreement.
Use Enter to accept the default install location, use CTRL+C to cancel the
installation, or enter another file path to specify an alternate installation directory.
If you accept the default install location, the installer
displays PREFIX=/home/<USER>/anaconda<2/3> and continues the installation.
It may take a few minutes to complete.
Notes: Anaconda recommends you accept the default install location. Do not choose
the path as /usr for the Anaconda/Miniconda installation.
The installer prompts you to choose whether to initialize Anaconda Distribution
by running conda init. Anaconda recommends entering “yes”.
If you enter “no”, then conda will not modify your shell scripts at all. In
order to initialize after the installation process is done, first
run source [PATH TO CONDA]/bin/activate and then run conda init.
See FAQ.
The installer finishes and displays, “Thank you for installing Anaconda<2/3>!”
Optional: The installer describes the partnership between Anaconda and
JetBrains and provides a link to install Dataspell for Anaconda
at https://www.anaconda.com/dataspell.
Close and re-open your terminal window for the installation to take effect, or
enter the command source ~/.bashrc to refresh the terminal.
You can also control whether or not your shell has the base environment
activated each time it opens.
# The base environment is activated by default
conda config --set auto_activate_base True
# The base environment is not activated by default
conda config --set auto_activate_base False
# The above commands only work if conda init has been run first
# conda init is available in conda versions 4.6.12 and later
Verify your installation.
Notes: If you install multiple versions of Anaconda, the system defaults to the most
current version, as long as you haven’t altered the default install path.
Verifying Your Installation
Confirm that Anaconda is installed and working with Anaconda Navigator or
conda with the following instructions.
Anaconda Navigator
1. Anaconda Navigator is a graphical user interface (GUI) that is automatically
installed with Anaconda. Navigator will open if the installation was successful. If
Navigator does not open, review our help resources.
2. Windows: Click Start, search for Anaconda Navigator, and click to open.
3. macOS: Click Launchpad and select Anaconda Navigator. Or use Cmd+Space to
open Spotlight Search and type “Navigator” to open the program.
Conda
1. If you prefer using a command line interface (CLI), use conda to verify the
installation using Anaconda Prompt on Windows or the terminal on Linux and
macOS.
2. To open Anaconda Prompt:
3. Windows: Click Start, search for Anaconda Prompt, and click to open.
6. Linux–Ubuntu: Open the Dash by clicking the Ubuntu icon, then type
“terminal”.
7. After opening Anaconda Prompt or the terminal, choose any of the following
methods to verify:
8. Enter conda list. If Anaconda is installed and working, this will display a list
of installed packages and their versions.
9. Enter the command python. This command runs the Python shell, also known
as the REPL. If Anaconda is installed and working, the version information it
displays when it starts up will include “Anaconda”. To exit the Python shell,
enter the command quit().
Tensorflow:
pip install tensorflow
Pandas:
pip install pandas
NumPy:
pip install numpy
OpenCV:
pip install opencv-python
Scikit-learn:
pip install scikit-learn
MediaPipe:
pip install mediapipe
Matplotlib:
pip install matplotlib
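After installing the packages, a quick way to confirm that they import correctly is to print their versions from a Python shell, for example:

# Quick sanity check that the key packages import and report their versions
import tensorflow as tf
import cv2
import mediapipe as mp
import numpy as np
import pandas as pd
print(tf.__version__, cv2.__version__, mp.__version__, np.__version__, pd.__version__)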
CHAPTER 7
IMPLEMENTATION AND OUTPUT RESULT
The sign language recognition and translation model was trained over 100 epochs using a sample training dataset extracted from sign symbols captured in real time. The resulting accuracy of the trained model was 83.33%, with a loss of 0.17%.
The history variable is assigned the output of the fit method to store the training
history.
The code for plotting training accuracy and loss is added after the model
training loop.
The plt.show() command is used to display the generated plots.
The following figures show the loss and accuracy results obtained while training the model for each epoch.
CHAPTER 8
CONCLUSION
With an average accuracy of 83% on most of the sign language datasets, using MediaPipe's technology and machine learning, our proposed methodology shows that MediaPipe can be used efficiently as a tool to detect complex hand gestures precisely. Although sign language modelling using image processing techniques has evolved over the past few years, the methods remain complex and require high computational power; training such models is also time-consuming. From that perspective, this work provides new insights into the problem. Low computing requirements and adaptability to smart devices make the model robust and cost-effective. Training and testing with various sign language datasets show that this framework can be adapted effectively to any regional sign language dataset, and maximum accuracy can be obtained. Faster real-time detection demonstrates the model's efficiency relative to the present state of the art. In the future, the work can be extended by introducing word-level detection of sign language from videos using MediaPipe's state-of-the-art tools and the best possible classification algorithms.
APPENDIX – I
SOURCE CODE
Data Collection
import os
import numpy as np
import cv2
import mediapipe as mp
import keyboard
from itertools import product
from my_functions import *

actions = np.array(['a', 'b'])
sequences = 30
frames = 10
PATH = os.path.join('data')

# Create the folder structure data/<action>/<sequence> if it does not exist
# (added so that np.save below has a valid destination)
for action, sequence in product(actions, range(sequences)):
    os.makedirs(os.path.join(PATH, action, str(sequence)), exist_ok=True)

cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Cannot access camera.")
    exit()

with mp.solutions.holistic.Holistic(min_detection_confidence=0.75,
                                    min_tracking_confidence=0.75) as holistic:
    for action, sequence, frame in product(actions, range(sequences), range(frames)):
        if frame == 0:
            # Wait for the space key before recording each new sequence
            while True:
                if keyboard.is_pressed(' '):
                    break
                _, image = cap.read()
                cv2.putText(image, 'Recording data for the "{}". Sequence number {}.'.format(action, sequence),
                            (20, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1, cv2.LINE_AA)
                cv2.putText(image, 'Pause.', (20, 400), cv2.FONT_HERSHEY_SIMPLEX,
                            1, (0, 0, 255), 2, cv2.LINE_AA)
                cv2.putText(image, 'Press "Space" when you are ready.', (20, 450),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2, cv2.LINE_AA)
                cv2.imshow('Camera', image)
                cv2.waitKey(1)
                if cv2.getWindowProperty('Camera', cv2.WND_PROP_VISIBLE) < 1:
                    break
            # Process the frame captured when recording starts (added so that
            # `results` is always defined before keypoint extraction)
            _, image = cap.read()
            results = image_process(image, holistic)
            draw_landmarks(image, results)
        else:
            _, image = cap.read()
            results = image_process(image, holistic)
            draw_landmarks(image, results)
            if cv2.getWindowProperty('Camera', cv2.WND_PROP_VISIBLE) < 1:
                break

        # Save the extracted keypoints for this frame to data/<action>/<sequence>/<frame>.npy
        keypoints = keypoint_extraction(results)
        frame_path = os.path.join(PATH, action, str(sequence), str(frame))
        np.save(frame_path, keypoints)

cap.release()
cv2.destroyAllWindows()
MY FUNCTION
import mediapipe as mp
import cv2
import numpy as np

def keypoint_extraction(results):
    # Flatten the 21 (x, y, z) landmarks of each hand into a 63-value vector,
    # or use zeros when that hand is not detected
    lh = np.array([[res.x, res.y, res.z] for res in
                   results.left_hand_landmarks.landmark]).flatten() \
        if results.left_hand_landmarks else np.zeros(63)
    rh = np.array([[res.x, res.y, res.z] for res in
                   results.right_hand_landmarks.landmark]).flatten() \
        if results.right_hand_landmarks else np.zeros(63)
    return np.concatenate([lh, rh])
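The data collection and real-time recognition scripts also import image_process and draw_landmarks from my_functions; their definitions are not present in this copy of the report, so the following is a hedged sketch of what such helpers might look like, built on MediaPipe's drawing utilities:

def image_process(image, model):
    # Convert the BGR frame to RGB, run the MediaPipe model and return its results
    # (reconstruction: the original definition is missing from this copy)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image.flags.writeable = False
    results = model.process(image)
    return results

def draw_landmarks(image, results):
    # Draw the detected left- and right-hand landmarks onto the frame
    mp.solutions.drawing_utils.draw_landmarks(
        image, results.left_hand_landmarks, mp.solutions.holistic.HAND_CONNECTIONS)
    mp.solutions.drawing_utils.draw_landmarks(
        image, results.right_hand_landmarks, mp.solutions.holistic.HAND_CONNECTIONS)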
MODEL
import os
import numpy as np
from itertools import product
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

PATH = os.path.join('data')
actions = np.array(os.listdir(PATH))
sequences = 30
frames = 10

# Map each action label to an integer class index
label_map = {label: num for num, label in enumerate(actions)}

# Load the saved keypoint sequences (list setup and outer loop reconstructed)
landmarks, labels = [], []
for action, sequence in product(actions, range(sequences)):
    temp = []
    for frame in range(frames):
        npy = np.load(os.path.join(PATH, action, str(sequence), str(frame) + '.npy'))
        temp.append(npy)
    landmarks.append(temp)
    labels.append(label_map[action])

X, Y = np.array(landmarks), to_categorical(labels).astype(int)
# 80% of the data for training, 20% reserved for validation
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

# Assumed LSTM architecture: the original layer definition is missing from this
# copy, so the sizes follow the illustrative sketch in Chapter 5
model = Sequential([
    LSTM(32, return_sequences=True, activation='tanh', input_shape=(frames, 126)),
    LSTM(64, return_sequences=False, activation='tanh'),
    Dense(32, activation='relu'),
    Dense(actions.shape[0], activation='softmax'),
])
model.compile(optimizer='Adam', loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
model.fit(X_train, Y_train, epochs=100)
model.save('my_model')
import matplotlib.pyplot as plt

history = model.fit(X_train, Y_train, epochs=100, validation_data=(X_test, Y_test))
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.plot(history.history['categorical_accuracy'])
plt.plot(history.history['val_categorical_accuracy'])
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Test'], loc='upper left')
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(['Train', 'Test'], loc='upper right')
plt.tight_layout()
plt.show()
31
import numpy as np
import os
import mediapipe as
mp import cv2
from my_functions import *
import tensorflow as tf
from tensorflow import keras
from keras.models import load_model
import keyboard
PATH = os.path.join('data')
actions = np.array(os.listdir(PATH))
model = load_model('my_model')
cap = cv2.VideoCapture(0)
if not cap.isOpened():
print("Cannot access camera.")
exit()
while cap.isOpened():
_, image = cap.read()
32
results = image_process(image, holistic)
draw_landmarks(image, results)
keypoints.append(keypoint_extraction(results))
if len(keypoints) == 10:
keypoints = np.array(keypoints)
prediction = model.predict(keypoints[np.newaxis, :, :])
keypoints = []
if len(sentence) > 7:
sentence = sentence[-
7:]
if keyboard.is_pressed('
'): sentence = [' ']
cv2.imshow('Camera', image)
33
cv2.waitKey(1)
if cv2.getWindowProperty('Camera',cv2.WND_PROP_VISIBLE) < 1:
break
cap.release()
cv2.destroyAllWindows()
APPENDIX – II
REFERENCES
1. The world's simplest facial recognition API for Python and the command line. https://github.com/ageitgey/face_recognition
3. Necati Cihan Camgoz, Oscar Koller, Simon Hadfield, and Richard Bowden. 2020. Sign language transformers: Joint end-to-end sign language recognition and translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June.
4. Pradyumna Narayana, J. Ross Beveridge, and Bruce A. Draper. Gesture recognition: Focus on the hands. In CVPR, 2018.