Seven Sem Project Report
A Project Report
On
“SignaLink:
ASL Translation Using Feedforward Neural Network and Convolutional
Neural Network with Analysis”
Submitted to:
Submitted By:
Abhijeet Yadav (23813/076)
Bibek Thakuri (23818/076)
Prajwal Shrestha (Malla) (23838/076)
TRIBHUVAN UNIVERSITY
SUPERVISOR’S RECOMMENDATION
I hereby recommend that this project, prepared under my supervision, entitled "SignaLink: ASL Translation Using Feedforward Neural Network and Convolutional Neural Network with Analysis", a platform that detects American Sign Language and translates it into the corresponding speech and vice versa, and which also provides functionality for the user to analyze the two different models, be processed for evaluation in partial fulfilment of the requirements for the degree of B.Sc. in Computer Science and Information Technology.
……………………………….
ACKNOWLEDGEMENT
The successful realization of our final year project owes much to the invaluable support extended by our project supervisor. Mr. Chhetra Bahadur Chhetri, our designated supervisor, deserves our profound appreciation and sincere gratitude. We would also like to thank the National College of Computer Studies for providing us with an exceptional platform to pursue and develop this project.
Under the guidance of Mr. Chhetra Bahadur Chhetri and the NCCS team, our team has
significantly deepened its understanding of AI and ML algorithms, as well as various
components associated with the development of this system. This experience has equipped us
with a comprehensive knowledge of the intricate workings behind complex AI systems,
positioning us well for future real-life projects in this domain.
We extend our immense gratitude to the NCCS team for their thorough review, approval, and
guidance throughout this transformative journey. Special acknowledgment is also due to the
supportive online communities, as well as our friends and families, whose assistance was
instrumental in the proper design, construction, and formation of this application.
In conclusion, we are truly honored by the collaborative efforts and support that have
contributed to the success of our project.
Abhijeet Yadav
Bibek Thakuri
Date: 26/08/2023
ABSTRACT
The sign language translation system aims to bridge the communication gap between individuals who use sign language and those who do not. This project presents a way to develop a system capable of translating sign language into text and vice versa. The project uses two algorithms: the Feedforward Neural Network (FNN) and the Convolutional Neural Network (CNN). The dataset used for both algorithms is a custom dataset that mimics the MNIST dataset, where each pixel of a 28x28 grayscale image is represented by a value between 0 and 255. Both algorithms use the dataset to train on and recognize hand signs according to their respective labels. After opening the camera, when the user performs a hand sign, the system classifies it, displays the corresponding text, and plays the corresponding audio. The test accuracy of the CNN was found to be 99% and that of the FNN was 91.02%. The system also has a dashboard where the accuracy, precision, recall, and F1 score can be visualized using different graphs and charts.
Keywords: Sign language translation system, FNN, CNN, MNIST dataset, grayscale image
TABLE OF CONTENTS
SUPERVISOR’S RECOMMENDATION ........................................................................... II
ACKNOWLEDGEMENT .................................................................................................... IV
ABSTRACT ............................................................................................................................ V
CHAPTER 1: INTRODUCTION....................................................................................... 1
3.1.3 Analysis................................................................................................................. 9
4.2 Algorithm Details.................................................................................................... 19
REFERENCES ...................................................................................................................... 47
LIST OF ABBREVIATIONS
AI: Artificial Intelligence
LIST OF FIGURES
Figure 1.1 Waterfall Model ....................................................................................................... 3
Figure 3.1: Use Case Diagram .................................................................................................. 6
Figure 3.2 ER Diagram ............................................................................................................. 9
Figure 3.3: DFD LEVEL 0 ..................................................................................................... 10
Figure 3.4 DFD LEVEL 1........................................................................................................11
Figure 4.1: System Flow ......................................................................................................... 12
Figure 4.2: Sign-to-Speech Flow Diagram ............................................................................. 13
Figure 4.3: Speech-to-Sign Flow Diagram ............................................................................. 14
Figure 4.4: Analysis Module Flow Diagram........................................................................... 15
Figure 4.5: High Level Design of Model ................................................................................ 16
Figure 4.6: Interface for Homepage ........................................................................................ 17
Figure 4.7: Interface for Sign to Speech Page ........................................................................ 18
Figure 4.8: Interface for Speech to Sign Page ........................................................................ 18
Figure 4.9: Interface for Analysis Page................................................................................... 19
Figure 4.10: Feedforward Neural Network ............................................................................. 20
Figure 4.11: Convolutional Neural Network .......................................................................... 22
Figure 5.1: Test for detecting sign language using FNN ........................................................ 33
Figure 5.2: Test for playing sign language according to the speech ....................................... 35
Figure 5.3: Loading the Home Page ....................................................................................... 36
Figure 5.4: Loading of Sign-To-Speech Window using FNN ................................................ 37
Figure 5.5: Loading of Sign To Speech Window using CNN. ................................................ 37
Figure 5.6: Loading of Speech-To-Sign Window ................................................................... 38
Figure 5.7: Loading of Analysis Window ............................................................................... 38
Figure 5.8: Confusion Matrix of FNN .................................................................................... 40
Figure 5.9: Training and Validation Loss of FNN Model ....................................................... 41
Figure 5.9: Training and Validation Accuracy of FNN Model ............................................... 42
Figure 5.10 Confusion Matrix of CNN ................................................................................... 42
Figure 5.11: Training and Validation Loss of CNN Model ..................................................... 43
Figure 5.12: Comparison of Model Performance ................................................................... 44
Figure 5.13: Comparison of Accuracy between Two Models ................................................. 45
LIST OF TABLES
Table 3.1: Functional Requirements ......................................................................................... 7
Table 3.2: Non-Functional Requirements ................................................................................. 7
Table 4.1: Overview of the Dataset ......................................................................................... 16
Table 5.1: Tools Used .............................................................................................................. 24
Table 5.2: Test for Detecting Sign Language .......................................................................... 32
Table 5.3: Test for playing sign language according to the speech ......................................... 35
Table 5.4: Test for loading the application .............................................................................. 36
Table 5.5: Classification Report for FNN Model .................................................................... 41
Table 5.6: Classification Report for CNN Model ................................................................... 43
Chapter 1: Introduction
1.1 Introduction
According to the WHO, about 5% of the world's total population belongs to the deaf and hard-of-hearing (DHH) community and has a hard time communicating with speech users [1]. SignaLink is a desktop-based application that classifies and translates American Sign Language into text and speech and vice versa. It uses machine learning algorithms to classify the hand signs and then translates them by displaying the corresponding text and playing the corresponding speech. For the classification, two different algorithms, CNN and FNN, were used.
The project also provides a dashboard where the two algorithms can be compared and visualized using different charts. The batch size, epochs, and learning rate can be set manually by the user, and the comparison can be viewed in real time through the visual charts.
The purpose of the project is to recognize sign language and translate it into text and audio.
1.3 Objective
The main objective of this project is:
To create a system that translates sign language into text or spoken language and vice versa.
The camera may not work properly in low-light conditions.
Requirement gathering
In this initial phase, the goal is to gather and document detailed requirements for the ASL
translation system, such as the range of signs to be recognized, desired output formats (text or
speech), performance expectations, and user interface preferences.
System design
Based on the gathered requirements, we will design the overall architecture and components
of the ASL translation system. This phase will involve designing the Feedforward Neural
Network model architecture, selecting appropriate computer vision techniques for hand
tracking and feature extraction, and defining the translation and user interface components.
Detailed design documents, including data flow diagrams and an ER diagram, will be produced.
Implementation
In this phase, the code for the entire system is written based on the design specifications. The computer vision algorithms and the CNN and FNN models are implemented, and the real-time processing logic for sign language translation is developed.
Testing
Unit testing is performed on the individual components and modules. Similarly, integration testing is performed to ensure that the system components work together as expected.
Figure 1.1 Waterfall Model
Chapter 1: It includes the introduction section, the problem we attempted to solve, the objectives and scope, and the development methodology for the project.
Chapter 2: It includes the background study of fundamental theories, general concepts and
literature review of similar projects, theories and results by other researchers.
Chapter 3: It provides an overview of all the requirements along with system analysis and
feasibility analysis of the system.
Chapter 4: It includes a detailed description of how the system was designed. It also includes
the details of the algorithm used.
Chapter 5: It includes the tools we used to build the system and how the testing process was
done.
Chapter 6: It includes the conclusion of the project and how we are further planning to make
the system work sustainably.
Chapter 2: Background Study and Literature Review
2.1 Background Study
Sign language is a visual-gestural language used by deaf and hard of hearing individuals for
communication. It relies on hand shapes, facial expressions, and body movements to convey
meaning. However, not everyone is proficient in sign language, leading to communication
barriers and social isolation for deaf individuals.
According to the WHO, 5% of the world's total population belongs to the DHH community. Despite technological advances, there still seems to be a barrier between speech users and sign users.
Recognition and classification of sign language using machine learning models involve several
steps. Initially, data collection and proper labeling are essential. Subsequently, the data
undergoes processing, and a suitable machine learning algorithm is selected. For our project,
we have opted for two machine learning models: Feedforward Neural Network (FNN) and
Convolutional Neural Network (CNN). Following the selection of the algorithm, the model is
trained using the labeled data. Nodes, the fundamental units within a Feedforward Neural
Network, receive input, undergo mathematical operations, and produce output. These nodes
are organized into layers, including input, hidden, and output layers. Activation functions play
a crucial role in determining the network's output by introducing non-linearities, facilitating
the learning of complex patterns within the data. After training the model, performance
evaluation is conducted using various evaluation metrics.
Machine learning enables the recognition and interpretation of sign language gestures,
allowing for real-time translation into spoken language and vice versa. By analyzing video
streams, machine learning algorithms can accurately interpret sign language gestures,
enhancing accessibility. Additionally, it enables the development of gesture-to-speech
interfaces, facilitating seamless communication. Through continuous learning and refinement,
these systems can improve their accuracy and effectiveness over time. Ultimately, machine
learning plays a pivotal role in breaking down communication barriers for deaf or hard-of-
hearing individuals, promoting inclusivity and accessibility in society.
There have been several studies that have used CNNs and FNNs for image processing. In one study [2], the authors developed a multi-layer fully connected neural network with a single hidden layer to recognize handwritten digits. Testing was carried out using the publicly accessible MNIST handwritten database, consisting of 28,000 digit images for training and 14,000 images for testing. Their artificial neural network achieved an impressive test accuracy of 99.60%.
In another study [3], the authors propose a novel deep learning approach for detecting sign language, aiming to bridge this communication gap. Their methodology involves the creation of a dataset comprising 11 sign words, which they use to train a customized Convolutional Neural Network (CNN) model for real-time sign language detection. Preprocessing steps were applied to the dataset before training the CNN model. Their results demonstrate that the customized CNN model achieves impressive performance metrics, including 98.6% accuracy, 99% precision, 99% recall, and a 99% F1-score on the test dataset.
The study [4] implements a Feedforward Neural Network (FNN) for image classification, aiming to enhance its structure by integrating the dropout method to prevent overfitting. The FNN is initialized with random uniform weights and zero biases, incorporating ReLU and Softmax activation functions. The MNIST handwritten-digits dataset is used for evaluation. Dropout is chosen as it efficiently prevents overfitting by randomly dropping neurons during training. The FNN with dropout achieves an average accuracy of 99.86% and a loss of 0.47%, outperforming the standard FNN's 98.13% accuracy and 9.15% loss. Comparatively, a CNN achieves 99.26% accuracy and 2.39% loss. Dropout not only enhances accuracy but also reduces training time due to its iterative nature.
In conclusion, FNN and CNN have been successful in image classification tasks in several
studies.
Chapter 3: Requirement Analysis and Feasibility Study
3.1 System Analysis
3.1.1 Requirement Analysis
For this project, the requirement analysis process aimed to identify the key features and
functionality of the system, as well as any constraints or limitations that needed to be
considered during development.
3.1.1.1 Functional Requirement
FR1 Sign Language Recognition: The system should be able to recognize and interpret signs.
FR2 Speech Synthesis: The system should convert sign language input into spoken language using speech synthesis.
FR3 Text Output: The system should be able to convert sign language input into written text.
FR4 User Interface: The system should provide an intuitive and user-friendly interface for both input and output.
FR5 Real-time Processing: The system should ensure real-time processing of sign language input to provide immediate feedback.
FR6 Speech Recognition: The system should be able to recognize speech.
NFR3 Usability: The system is simple to use, which makes accessing the desired features a lot easier and faster. Every UI component is arranged properly for easy navigation and effective usage.
NFR4 Maintainability: The application will be very simple to maintain because detailed documentation describing all of the system's components will be prepared. This guarantees that future software developers and engineers will have no trouble ensuring the quality of our application.
The technical feasibility of a sign language translator is boosted by the prevalence of cameras
in today's mobile phones and computers, making it accessible to a broad audience. Analyzing
sign language videos doesn't demand high-end computers, reducing costs. This makes
developing and maintaining such a translator economically viable, as it utilizes existing
hardware and minimizes the need for specialized equipment. Leveraging readily available technology allows for the implementation of sign language translation systems at a relatively low cost, enhancing their overall technical feasibility.
The sign language translator is operationally feasible as it utilizes data and algorithms to
accurately translate sign language, making it practical and sustainable over time. The system
is designed to be user-friendly and easily integrated into existing communication platforms,
ensuring its ease of use and operational effectiveness.
A crucial aspect in the project's successful completion was teamwork. The project's objectives
were attained in part because of the team members' efficient work distribution and smooth
coordination. Because of this, the project was finished on time and according to a planned
schedule, demonstrating that scheduling was feasible.
3.1.3 Analysis
3.1.3.1 Data Modeling
DFD Diagram
DFD Level 0
The Level 0 DFD illustrates the user inputting hyperparameters (epoch, batch size, learning
rate) into the system. The system processes hand signs and voice inputs, utilizing machine
learning algorithms to generate sign language translations. It outputs predicted labels, voice
translations, and corresponding videos, facilitating user interaction and comprehension. This
high-level view outlines the flow of data, emphasizing the system's core functionalities and the
exchange of information between users and the translation system.
At Level 1 of the DFD, the Sign Language Translation System comprises three main modules:
Sign to Speech, Speech to Sign, and Analysis, all initiated by user input of hyperparameters
for optimal system configuration. The Sign to Speech module preprocesses hand sign data,
extracting features and normalizing them before feeding them into dedicated machine learning
models designed to translate sign language into speech. Simultaneously, the Speech to Sign
module processes voice input by preprocessing it and converting it into textual form. This text
data undergoes analysis and translation using specialized machine learning algorithms to
generate corresponding sign language representations. The analysis module is used to compare the FNN and the CNN. Following the translation processes, the system integrates the predicted speech outputs, sign language representations, and analysis results. These are then presented to users via the system interface, facilitating interaction and review.
Figure 3.4 DFD LEVEL 1
Chapter 4: System Design
4.1 Design
The application is layered on top of a hand-sign detection and speech detection mechanism, which either checks for a hand sign to translate it into text and audio, or checks for audio input to recognize the speech and translate it into the corresponding hand sign.
Figure 4.2: Sign-to-Speech Flow Diagram
In this workflow, the camera is first opened to convert sign to speech. Once the camera is open, it captures the hand sign. If the hand sign matches the label of the corresponding text, the system proceeds to the next step, which is to display the corresponding text. If the hand sign does not match, the system goes back to the capture step. After the corresponding text is displayed, the next step is to play the corresponding speech.
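The loop below is a minimal sketch of this flow, assuming OpenCV for the camera and the pyttsx3 text-to-speech library; the classify_sign() helper is a hypothetical placeholder for the trained FNN/CNN model, so this is an illustration of the workflow rather than the project's exact code.

import cv2
import pyttsx3

def classify_sign(frame):
    # Placeholder: in the real system this would preprocess the frame to a
    # 28x28 grayscale image and return a label from the trained model, or None.
    return None

engine = pyttsx3.init()
cap = cv2.VideoCapture(0)                # Open the camera

while True:
    ret, frame = cap.read()              # Capture a frame containing the hand sign
    if not ret:
        break
    label = classify_sign(frame)         # Try to match the hand sign to a label
    if label is not None:
        cv2.putText(frame, label, (10, 40), cv2.FONT_HERSHEY_SIMPLEX,
                    1.2, (0, 255, 0), 2)     # Display the corresponding text
        engine.say(label)                    # Play the corresponding speech
        engine.runAndWait()
    cv2.imshow("Sign to Speech", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):    # Quit on 'q'
        break

cap.release()
cv2.destroyAllWindows()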
Figure 4.4: Analysis Module Flow Diagram
In the analysis workflow, the first step is to load the data on which we want to perform the analysis. After that, we input the hyperparameters, such as epochs, batch size, and learning rate, so that the model can be trained according to our needs. Then we select the figure we would like to display, such as a bar graph, confusion matrix, box plot, or loss and accuracy plot.
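As a rough illustration only, the sketch below wires these steps together using the FNN helpers defined in Chapter 5 (load_data, preprocess_data, gradient_descent, forward_prop, get_predictions, get_accuracy) and matplotlib; the project's dashboard may be implemented differently.

import matplotlib.pyplot as plt

def run_analysis(train_csv, epochs=10, batch_size=32, learning_rate=0.01):
    # Step 1: load the data on which the analysis is performed
    data = load_data(train_csv)
    X, Y = preprocess_data(data)

    # Step 2: train with the user-supplied hyperparameters
    # (the simple FNN shown later is full-batch, so batch_size is illustrative only)
    W1, b1, W2, b2 = gradient_descent(X, Y, learning_rate, epochs)

    # Step 3: display a selected figure, e.g. a bar chart of the accuracy
    _, _, _, A2 = forward_prop(W1, b1, W2, b2, X)
    accuracy = get_accuracy(get_predictions(A2), Y)
    plt.bar(["FNN accuracy"], [accuracy])
    plt.title("Analysis dashboard (sketch)")
    plt.show()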
Figure 4.5: High Level Design of Model
Data Collection
The dataset for our project was collected by ourselves. The data formatting followed the structure of the MNIST dataset, which includes labels and image pixels. For training, data was collected for three labels, with 600 samples available for each label. For testing, data was collected for three labels, with 200 samples available for each label. Each row consists of one label and 784 pixels, as the image size is 28x28 pixels. A custom capture module is used to capture images, initially at a size of 200x200 pixels, which are subsequently resized to 28x28 pixels. These images are then converted to grayscale and transformed into CSV format. In the CSV file, each pixel is represented by a value between 0 and 255, where 0 denotes the darkest black and 255 represents the brightest white pixel.
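A minimal sketch of such a capture-and-convert pipeline is shown below, assuming OpenCV and a fixed 200x200 capture region; the project's actual capture module is not reproduced in this report, so the function name and region are illustrative.

import csv
import cv2

def capture_samples(label, num_samples, out_csv):
    # Capture hand-sign images, crop a 200x200 region, resize to 28x28,
    # convert to grayscale, and append one CSV row per image
    # (label followed by 784 pixel values in the range 0-255).
    cap = cv2.VideoCapture(0)
    with open(out_csv, "a", newline="") as f:
        writer = csv.writer(f)
        count = 0
        while count < num_samples:
            ret, frame = cap.read()
            if not ret:
                break
            roi = frame[0:200, 0:200]                     # 200x200 capture region
            gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)  # Convert to grayscale
            small = cv2.resize(gray, (28, 28))            # Resize to 28x28
            writer.writerow([label] + small.flatten().tolist())
            count += 1
    cap.release()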
4.1.1 Interface Design
Making an interface design before starting front-end development is crucial. The project interface was made using Figma, and the interface was designed according to the requirements of the project. The project consists of mainly four pages: the Home Page, the Sign to Speech Page, the Speech to Sign Page, and the Analysis Page.
Figure 4.7: Interface for Sign to Speech Page
Figure 4.9: Interface for Analysis Page
4.2 Algorithm Details
Feedforward Neural Network (FNN):
The Feedforward Neural Network architecture consists of interconnected nodes organized into
layers, including input, hidden, and output layers. Each node receives input signals, processes
them through weighted connections, and applies an activation function to produce an output.
During training, the network adjusts its weights based on the difference between predicted and
actual outputs, minimizing the loss function through techniques such as gradient descent and
backpropagation. In our ASL translation system, the FNN is trained on labeled sign language
data to learn the mappings between input gestures and their corresponding meanings. [5]
Figure 4.10: Feedforward Neural Network
1. Data Preparation: First, we need a labeled dataset consisting of the pixel values of a grayscale image, each ranging from 0 to 255.
2. Forward Pass: In forward propagation, the input layer has one neuron per pixel of our dataset. The second layer is the hidden layer, which is obtained by applying a weight and bias term to each neuron and then applying an activation function so that the layer does not remain a linear function. The activation function used to add this non-linearity in the hidden layer is the ReLU function. The output layer is likewise obtained by applying a weight and bias term followed by an activation function. The activation function applied in the output layer is the softmax function, which gives a probability for each label.
3. Backward Pass: In backpropagation, we first take the prediction and measure how much it deviates from the actual label to obtain an error. We then find how much each of the preceding weight and bias terms contributed to that error, and finally update the parameters. This process is propagated backwards through the network and repeated. For the output layer, the gradients are:
dZ[2] = A[2] - Y
dW[2] = (1/m) * dZ[2] * A[1]^T
db[2] = (1/m) * sum(dZ[2])
4. Update Parameters: Updating the parameters involves adjusting the weights and biases of the neurons to minimize the error between the predicted output and the actual target values during training. This process, known as backpropagation, fine-tunes the network's parameters to improve its ability to make accurate predictions or classifications.
In the Feedforward Neural Network used in our system, there are 784 neurons in the input layer, 10 in the hidden layer, and 3 in the output layer.
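The parameter initialization itself is not shown in the report; a minimal sketch consistent with this 784-10-3 architecture, and with the init_params(input_size, num_classes) call used in the training code of Chapter 5, could look as follows. The uniform weight range and zero biases are assumptions for illustration.

import numpy as np

def init_params(input_size=784, num_classes=3, hidden_size=10):
    # Small random weights and zero biases for the 784-10-3 network (assumed scheme)
    W1 = np.random.rand(hidden_size, input_size) - 0.5    # shape (10, 784)
    b1 = np.zeros((hidden_size, 1))                        # shape (10, 1)
    W2 = np.random.rand(num_classes, hidden_size) - 0.5    # shape (3, 10)
    b2 = np.zeros((num_classes, 1))                        # shape (3, 1)
    return W1, b1, W2, b2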
Convolutional Neural Network (CNN):
Convolutional layers extract features from input images through convolution operations, while
pooling layers reduce spatial dimensions, enhancing computational efficiency. Fully connected
layers integrate the extracted features to make predictions. CNNs excel at capturing spatial
hierarchies and patterns within images, making them ideal for hand gesture recognition in our
project. [6]
Input Image: The input consists of images of hand gestures representing different signs of the sign language. Each image is preprocessed and then represented as a matrix of pixel values.
Convolutional Layer: The convolutional layer applies a set of filters(kernels) to the input
data. These filters slide over the input image, performing element-wise multiplication and
summation to produce feature maps. The filters act as feature detectors, identifying patterns
and features at different spatial locations in the input images. As the network trains, the filters
learn to detect low-level features such as edges, corners, shapes, curves etc. present in the
MNIST-style dataset that we collected. MaxPooling is used to downsample the feature maps obtained from the convolutional layers and reduce their spatial dimensions. It takes the maximum value within each pooling window, making the representation more invariant to small translations and distortions in the input data from our CSV file.
I_ReLU(i, j) = max(0, I(i, j))    (ReLU applied element-wise to the feature map I)
Y = max(W)                        (max pooling: the maximum value within the pooling window W)
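As a small illustration of these two operations (not code from the project), the NumPy snippet below applies ReLU element-wise to a 2x2 feature-map patch and then takes the maximum over that pooling window.

import numpy as np

patch = np.array([[-1.0,  2.0],
                  [ 3.0, -4.0]])       # A 2x2 region of a feature map

relu_patch = np.maximum(0, patch)      # I_ReLU(i, j) = max(0, I(i, j))
pooled = relu_patch.max()              # Y = max over the 2x2 pooling window

print(relu_patch)   # [[0. 2.] [3. 0.]]
print(pooled)       # 3.0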
Fully Connected Layer: After several convolutional and pooling layers, the high-level
features are fed into fully connected layers. These layers consolidate the features learned by
the convolutional layers and map them to the appropriate output classes. In the project, the
fully connected layers help in recognizing complex patterns and relationships between
different hand gestures.
Output: The output layer represents the predictions made by the network. Each neuron in this
layer corresponds to a sign language gesture. The network predicts the sign language gesture
corresponding to the input image based on the activations of the neurons in the output layer.
Chapter 5: Implementation and Testing
5.1 Implementation
During this study, the Waterfall model was used because it offers a straightforward and systematic approach to software development. With this model, we can break the project down into clear and distinct phases, allowing us to focus on one aspect at a time. In this model, each stage of the software development life cycle must be finished before transitioning to the subsequent one, with minimal to no overlap between phases. This helps to ensure that each phase, such as gathering requirements, designing the system, implementing features, testing for accuracy, and deploying the final product, is completed properly before the next begins.
relationships and interactions between different components of the sign language translation system, providing a visual guide for developers and stakeholders.
Language: Python. Python was used as the main programming language, as we were already familiar with the language and it is popular for machine learning.
Code Editors: Jupyter Notebook, Visual Studio Code. Visual Studio Code was used as the main text editor, and the code that required data visualization was written in Jupyter.
UI/UX: Figma. Figma was employed for making the wireframe and the main user interface of the system.
Documentation: Microsoft Office Package. MS Word was employed for documentation purposes. It provides a familiar desktop-based environment for creating detailed project documentation, including specifications, user manuals, and other essential documents.
3D Model: DeepMotion, Blender. DeepMotion is a web application that tracks a video using AI and transfers the motion to a 3D model. We used DeepMotion to convert our video into a 3D model, and Blender to refine the resulting 3D model.
a) Implementation of FNN
Step 1: First, the dataset is loaded from the CSV file, shuffled, and normalized.

import numpy as np
import pandas as pd

def load_data(file_path):
    # Load the CSV dataset and shuffle the rows
    data = pd.read_csv(file_path)
    data = np.array(data)
    np.random.shuffle(data)
    return data

def preprocess_data(data):
    m, n = data.shape
    X = data[:, 1:].T / 255.0   # Normalize pixel values to [0, 1]
    Y = data[:, 0]              # The first column holds the label
    return X, Y
Step 2: Then the activation function for the hidden layer, the ReLU function, is applied, and the activation function for the output layer, which outputs a probability, is defined. The activation functions help to make the neurons non-linear.

def ReLU(Z):
    return np.maximum(Z, 0)

def softmax(Z):
    exp_Z = np.exp(Z - np.max(Z))  # Subtracting max(Z) for numerical stability
    return exp_Z / np.sum(exp_Z, axis=0)
Step 3: After defining the activation functions, the forward pass is performed, which multiplies the activations by the weights, adds a bias term, and applies the activation function before passing the result to the next layer.

def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1    # Hidden layer pre-activation
    A1 = ReLU(Z1)          # Hidden layer activation
    Z2 = W2.dot(A1) + b2   # Output layer pre-activation
    A2 = softmax(Z2)       # Output layer probabilities
    return Z1, A1, Z2, A2
Step 4: After the forward pass, the backward pass is performed, in which we first take the prediction and compare it with the label to find the error. We also find how much the weight and bias terms of each layer contributed to that error, working backwards towards the input layer.

def backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y):
    num_classes = len(np.unique(Y))
    m = X.shape[1]
    one_hot_Y = one_hot(Y, num_classes)
    dZ2 = A2 - one_hot_Y                              # Output layer error
    dW2 = 1 / m * dZ2.dot(A1.T)                       # Gradient of the output weights
    db2 = 1 / m * np.sum(dZ2, axis=1, keepdims=True)  # Gradient of the output biases
    dZ1 = W2.T.dot(dZ2) * (Z1 > 0)                    # ReLU derivative
    dW1 = 1 / m * dZ1.dot(X.T)                        # Gradient of the hidden weights
    db1 = 1 / m * np.sum(dZ1, axis=1, keepdims=True)  # Gradient of the hidden biases
    return dW1, db1, dW2, db2
Step 5: Once we know how much the bias and the weight terms contributed to the error, we update the parameters accordingly.

def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    W1 -= alpha * dW1
    b1 -= alpha * db1
    W2 -= alpha * dW2
    b2 -= alpha * db2
    return W1, b1, W2, b2
Step 6: After that the one_hot function, the prediction function, and the accuracy function are defined, as sketched below.
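The bodies of these helpers are not reproduced in the report; a minimal sketch consistent with how they are called in the surrounding code (and relying on the numpy import above) could be:

def one_hot(Y, num_classes):
    # Convert integer labels into a one-hot matrix of shape (num_classes, m)
    one_hot_Y = np.zeros((num_classes, Y.size))
    one_hot_Y[Y.astype(int), np.arange(Y.size)] = 1
    return one_hot_Y

def get_predictions(A2):
    # The predicted label is the index of the largest softmax output
    return np.argmax(A2, axis=0)

def get_accuracy(predictions, Y):
    # Fraction of predictions that match the true labels
    return np.mean(predictions == Y)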
Step 7: Then a main function is defined which uses all of these functions to train the model and also computes the loss in each iteration or epoch.
def gradient_descent(X, Y, alpha, iterations):
    W1, b1, W2, b2 = init_params(X.shape[0], len(np.unique(Y)))
    for i in range(iterations):
        Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
        dW1, db1, dW2, db2 = backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y)
        W1, b1, W2, b2 = update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha)
        # Calculate loss
        loss = compute_loss(A2, Y)
        if i % 10 == 0:
            predictions = get_predictions(A2)
            accuracy = get_accuracy(predictions, Y)
            print(f"Iteration {i}: Loss = {loss:.4f}, Accuracy = {accuracy:.2f}")
    return W1, b1, W2, b2
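The compute_loss function called above is not shown in the report; a minimal categorical cross-entropy sketch that matches the softmax output A2 and the one_hot helper would be:

def compute_loss(A2, Y):
    # Average categorical cross-entropy between the softmax outputs and the true labels
    m = Y.size
    one_hot_Y = one_hot(Y, A2.shape[0])
    eps = 1e-8   # Avoid log(0)
    return -np.sum(one_hot_Y * np.log(A2 + eps)) / m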
Step 8: After that, predictions are made and tested to check whether each prediction is correct by comparing the predicted label with the actual label.
Step 9: The code saves model parameters (W1, b1, W2, b2) as 'mode6.pkl'. Subsequently, it
evaluates predictions four times using test_prediction. For each evaluation, the function utilizes
training data (X_train, Y_train) and the saved model parameters.
# Test predictions
test_prediction(0, X_train, Y_train, W1, b1, W2, b2)
test_prediction(1, X_train, Y_train, W1, b1, W2, b2)
test_prediction(2, X_train, Y_train, W1, b1, W2, b2)
test_prediction(3, X_train, Y_train, W1, b1, W2, b2)
b) Implementation of CNN
Step 1: The train and test data are loaded from the CSV file.
Step 2: Then the dataset is split into training and testing sets, allocating 30% of the data for testing while keeping 70% for training. A sketch of these two steps is shown below.
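This sketch assumes pandas for the CSV file, scikit-learn's train_test_split, and Keras's to_categorical; the file name and split parameters are illustrative rather than taken from the project.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical

data = pd.read_csv("train.csv")                              # assumed file name
labels = data.iloc[:, 0].values                              # first column: label
pixels = data.iloc[:, 1:].values.astype("float32") / 255.0   # normalize pixel values
images = pixels.reshape(-1, 28, 28, 1)                       # 28x28 grayscale images

num_classes = len(np.unique(labels))
x_train, x_test, y_train, y_test = train_test_split(
    images, to_categorical(labels, num_classes), test_size=0.30, random_state=42)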
# Keras imports (assumed; the tf.keras equivalents also work)
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.optimizers import Adam

model = Sequential()
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.20))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
Step 5: In this step the model is trained using the training data (x_train and y_train). It specifies
parameters such as batch size, number of epochs, and verbosity level for monitoring progress.
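A hedged sketch of this training call is shown below; the validation data argument and exact values are illustrative (the report states that a batch size of 32 and 10 epochs were ultimately chosen for the CNN).

history = model.fit(x_train, y_train,
                    batch_size=32,                       # batch size
                    epochs=10,                           # number of epochs
                    verbose=1,                           # progress output
                    validation_data=(x_test, y_test))    # used for the validation curves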
Step 6: Then the model is evaluated and the prediction is made on the image.
# Make predictions
y_pred = model.predict(test_images)
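Continuing from the split sketched in Step 2, the evaluation and per-class metrics could be obtained along the following lines (a sketch using scikit-learn, not the project's exact code):

from sklearn.metrics import classification_report, confusion_matrix

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")

y_pred = model.predict(x_test)
pred_labels = np.argmax(y_pred, axis=1)   # predicted class per image
true_labels = np.argmax(y_test, axis=1)   # true class (undo the one-hot encoding)

print(confusion_matrix(true_labels, pred_labels))
print(classification_report(true_labels, pred_labels))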
5.2 Testing
Testing involves assessing and confirming the functionality of the developed sign language
application to ensure it accurately interprets and translates sign language gestures and
movements. It aims to determine if the actual sign language recognition and translation outputs
align with the expected results for various sign language inputs.
Figure 5.1: Test for detecting sign language using FNN
Figure 5.3: Test for detecting sign language using FNN
Table 5.3: Test for playing sign language according to the speech
Figure 5.5: Test for playing sign language according to the speech
5.2.2 System Testing
System testing in a sign language recognition project involves evaluating the entire system as
a whole to ensure it meets its specified requirements and functions correctly in its intended
environment. In a sign language recognition system, testing involves capturing sign language
gestures through input devices like cameras, processing the data, recognizing the signs
accurately, and providing appropriate output or responses.
Table 5.4: Test for loading the application
Figure 5.7: Loading of Sign-To-Speech Window using FNN
Figure 5.9: Loading of Speech-To-Sign Window
5.3 Result Analysis
The system was tested through unit testing and proved to be effective in executing its intended
functions. The results showed that the project was able to meet its goals, but there is still room
for improvement in terms of expanding the system's capabilities and increasing community
involvement.
• Precision: The precision measures the proportion of correctly identified sign language gestures or movements among all the gestures or movements that the model predicted as belonging to a particular sign.
• Recall: The recall measures the proportion of correctly identified sign language gestures or movements among all the actual sign language gestures or movements present in the dataset.
• F1 score: The F1 score provides a single metric that balances the trade-off between precision (accurately identifying sign language gestures) and recall (detecting most of the actual sign language gestures present). The formula for the F1 score is:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
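For illustration, these metrics can be computed directly from predicted and true labels, for example with scikit-learn (a small hedged example, not the project's dashboard code):

from sklearn.metrics import precision_score, recall_score, f1_score

# A small illustrative example of true and predicted labels
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

precision = precision_score(y_true, y_pred, average="macro")
recall = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")   # 2PR / (P + R), averaged over classes

print(precision, recall, f1)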
The best hyperparameters for the Feedforward Neural Network model were found to be 10 epochs with a learning rate of 0.01 and a batch size of 23. The accuracy was 90.63% for the training dataset and 91.02% for the testing dataset. In the confusion matrix, label 0 was misclassified as label 1 and label 2, 16 and 20 times respectively; label 1 was misclassified as label 0 and label 2, 1 and 9 times respectively; and label 2 was misclassified as label 0 and label 1, 16 and 4 times respectively.
Table 5.5: Classification Report for FNN Model
Figure 5.13: Training and Validation Accuracy of FNN Model
Evaluating Accuracy for Convolutional Neural Network
The best hyperparameters for the Convolutional Neural Network model were found to be 10 epochs with a learning rate of 0.01 and a batch size of 32. The accuracy was 96% for the training dataset and 99% for the testing dataset. In the confusion matrix, label 0 was misclassified as label 1 and label 2, 9 and 1 times respectively; label 1 was misclassified as label 0 and label 2, 7 and 2 times respectively; while label 2 was always classified accurately.
Table 5.6: Classification Report for CNN Model
Figure 5.12: Training and Validation Accuracy of CNN Model
Comparison of Models
Figure 5.17: Comparison of Accuracy between Two Models
Chapter 6: Conclusion and Future Improvements
6.1 Conclusion
In conclusion, the project classifies and translates hand signs into text and speech using the Feedforward Neural Network and Convolutional Neural Network algorithms. The results show that the models effectively classify and translate the sign language. The CNN proves to be the more effective tool for classifying hand signs because of its convolutional layers, and it is also more robust to overfitting. Moreover, the models were trained on the custom data that we collected, which mimics the MNIST dataset.
The project is also able to visualize the accuracy, precision, recall, and F1 score through different charts such as bar graphs and line graphs. It also lets the user explore and visualize the data through user-supplied epochs, batch size, and other hyperparameters.
Overall, the project classifies and translates sign language to text and speech and also speech
to sign.
6.2 Future Improvements
a. Enhanced Data Collection: Improved methods for collecting and analyzing data can enhance the system's accuracy, enabling it to recognize a broader range of hand signs.
b. Enhanced User Interfaces: Upgrades to user interfaces can simplify system navigation,
making it more user-friendly and enabling easier access to necessary information.
c. Phrase Recognition: The system can be developed to recognize phrases or longer sentences,
enhancing communication and making it even more seamless and straightforward.
References
[2] K. T. Islam, G. Mujtaba, R. G. Raj and H. F. Nweke, "Handwritten digits recognition with
artificial neural network," 2017 International Conference on Engineering Technology and
Technopreneurship (ICE2T), pp. 1-4, 2017.
[3] M. Saiful, A. Isam, H. Moon, R. Tammana, M. Das, M. Alam and A. Rahman, "Real-Time
Sign Language Detection Using CNN," 2022.
[5] M. Sazli, "A brief review of feed-forward neural networks," Communications Faculty Of
Science University of Ankara, vol. 50, pp. 11-17, 2006.