NNFL Assignment 1128
Assignment Name: Deep learning-based model for real-time sign language recognition
Submitted By
The problem of recognizing sign language using computer vision is a significant one due
to the need for high accuracy and adaptability in real-world scenarios. Traditional
methods require manual feature extraction, which can be complex and error-prone. With
deep learning, particularly convolutional neural networks (CNNs), we can automate
feature extraction and classification, leading to more efficient and accurate models for
recognizing American Sign Language (ASL) gestures.
This project focuses on building a CNN-based model to recognize ASL alphabets, aiming
for real-time performance in sign language interpretation and contributing to the
development of assistive technologies for the hearing impaired.
Data Preprocessing:
The dataset used for training the model consists of labeled images of ASL signs, where
each image represents a particular sign in the ASL alphabet. The preprocessing steps
undertaken are essential for standardizing the images and enhancing model performance.
1. Image Augmentation:
Although image augmentation is typically useful for preventing overfitting, the
code does not include explicit augmentation steps such as rotation, flipping, or
zooming. It could, however, be added to make the model more robust to varied
inputs (see the preprocessing sketch after this list).
2. Normalization:
In this setup, the images are resized to a uniform size of (224, 224), ensuring
consistent input dimensions. This helps the model learn more effectively, since
every image it processes has the same shape.
4. Batching:
A batch size of 32 is used, enabling efficient training by processing multiple
images in parallel, which speeds up the learning process while maintaining
accuracy.
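The preprocessing steps above are described only in prose, so the following is a minimal sketch of how they could look in TensorFlow/Keras. The dataset path "asl_dataset/train", the use of image_dataset_from_directory, the pixel rescaling, and the specific augmentation layers are assumptions for illustration, not taken from the original code.

import tensorflow as tf

IMG_SIZE = (224, 224)   # uniform input dimensions described above
BATCH_SIZE = 32         # batch size used during training

# Load labeled ASL images from class subfolders, resize to 224x224, batch by 32.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "asl_dataset/train",        # hypothetical path; adjust to the actual dataset
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
)
class_names = train_ds.class_names  # one name per ASL sign, used later for plots

# Scale pixel values from [0, 255] to [0, 1].
rescale = tf.keras.layers.Rescaling(1.0 / 255)
train_ds = train_ds.map(lambda x, y: (rescale(x), y))

# Optional augmentation (not in the original code) to improve robustness.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])
train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))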
Model Architecture:
1. Layers:
○ Flatten Layer (Flatten): After the feature extraction, the output of the last
convolutional layer is flattened into a one-dimensional array, which is then
passed to a dense layer.
○ Fully Connected Layer (Dense): A dense layer with 128 units and ReLU
activation is used to learn high-level representations of the features (see the
architecture sketch below).
○ ReLU: ReLU activation functions are used in the convolutional and dense
layers to introduce non-linearity, enabling the network to learn complex
patterns in the data.
3. Optimizer:
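The exact convolutional stack and the optimizer are not spelled out in this report, so the block below is only a minimal Keras sketch of the described architecture: the filter counts, the Adam optimizer, and NUM_CLASSES are assumptions, while the Flatten layer, the 128-unit ReLU dense layer, the softmax output, and the 10 training epochs come from the text.

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 26  # hypothetical: one class per ASL alphabet sign; adjust to the dataset

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    # Convolutional feature extractor (filter counts are illustrative assumptions).
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    # Flatten the last convolutional feature map into a one-dimensional vector.
    layers.Flatten(),
    # Dense layer with 128 units and ReLU activation, as described above.
    layers.Dense(128, activation="relu"),
    # Softmax output over the ASL classes.
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# The optimizer is not specified in the report; Adam is assumed here.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train for 10 epochs, matching the results reported below.
model.fit(train_ds, epochs=10)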
Results:
● Training Accuracy: The model reached 100% training accuracy by the end of the
10th epoch, meaning it fits the training data completely; this also hints at the
overfitting risk discussed below.
● Precision: 95.98%
● Recall: 95.22%
● F1 Score: 95.25%
These metrics highlight the model's overall good performance in classifying ASL images,
with precision and recall well balanced. The high F1 score confirms that the model
classifies examples correctly across classes rather than favoring only a few of them.
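As a rough illustration, the reported precision, recall, and F1 could be computed on a held-out test set roughly as follows; the test_ds dataset and the macro averaging over ASL classes are assumptions, since the report does not state how the metrics were obtained.

import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

y_true, y_pred = [], []
for images, labels in test_ds:                 # test_ds: a batched test set, as in the sketches above
    probs = model.predict(images, verbose=0)   # class probabilities from the softmax output
    y_pred.extend(np.argmax(probs, axis=1))    # predicted class indices
    y_true.extend(labels.numpy())              # ground-truth class indices

print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall:   ", recall_score(y_true, y_pred, average="macro"))
print("F1 Score: ", f1_score(y_true, y_pred, average="macro"))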
Confusion Matrix:
The confusion matrix is plotted to visualize the model's performance across different
ASL classes. It helps identify which classes the model struggles with, guiding future
improvements such as data augmentation or model refinement.
Fig: Confusion Matrix
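A minimal sketch of how such a plot could be produced with scikit-learn and matplotlib, assuming the y_true and y_pred lists from the metrics sketch above and the class_names captured when loading the data:

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cm = confusion_matrix(y_true, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=class_names)
disp.plot(cmap="Blues", xticks_rotation="vertical")  # one row/column per ASL class
plt.title("ASL Confusion Matrix")
plt.tight_layout()
plt.show()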
4. Overfitting:
Given the relatively small dataset, there is a risk of overfitting the model to the
training data. Techniques such as dropout, data augmentation, or increasing the
dataset size would help improve generalization (a minimal dropout sketch follows).
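As one illustrative option (dropout is not present in the original model), a dropout layer could be inserted into the classifier head, for example:

from tensorflow.keras import layers

# NUM_CLASSES as in the architecture sketch above (hypothetical).
classifier_head = [
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),   # randomly zero half of the activations during training
    layers.Dense(NUM_CLASSES, activation="softmax"),
]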
Code Screenshots:
● Does the problem involve conflicting technical requirements?
Yes. One major conflict lies between accuracy and real-time performance. While
deep and complex models like CNNs can achieve high accuracy, they may not be
fast enough for real-time deployment without optimization. Another challenge is
balancing generalization (performing well on new users or varying
lighting/background conditions) with overfitting on the training data. Optimizing
models for performance while keeping them lightweight is a significant
engineering dilemma (an illustrative conversion sketch follows).
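As an illustrative example of such optimization (an assumption, not something this report states was done), a trained Keras model can be converted to TensorFlow Lite to reduce its size and latency for real-time use:

import tensorflow as tf

# Convert the trained Keras model (from the architecture sketch) to TensorFlow Lite.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # default size/latency optimizations
tflite_model = converter.convert()

with open("asl_model.tflite", "wb") as f:   # hypothetical output filename
    f.write(tflite_model)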
● Does the solution require abstract thinking and originality in analysis?
Absolutely. The project requires abstract thinking to design a neural network that
can interpret complex spatial patterns of hand gestures. It also involves thinking
creatively about data preprocessing, handling diverse gesture styles, and
considering future scalability to real-time systems. Using techniques like CNNs,
softmax activation, and confusion matrices to evaluate performance demonstrates a
high level of abstract reasoning.
● Are the issues encountered infrequent in standard engineering practice?
Yes. Real-time sign language recognition involves challenges that are not typically
encountered in traditional software or hardware engineering. These include hand-gesture
ambiguity, variation in human physiology, dynamic backgrounds, and adapting AI systems
to human motion, all of which are specialized and complex problems in the AI and
computer vision domain.