
Premier University

Department of Computer Science & Engineering

Course Code : CSE 451

Course Title : Neural Networks and Fuzzy Logic

Assignment Name : Deep learning-based model for real-time sign language recognition

Date of Submission: 08 / 04 / 2025

Submitted By

Name : Rahat Imroze Ahmed
ID : 0222220005101128
Department : CSE
Semester : 7th
Section : E
Introduction & Problem Statement:
Sign language recognition has gained significant attention in the fields of computer vision
and human-computer interaction. It is a crucial step in bridging communication gaps for
the hearing impaired and fostering a more inclusive society. One of the primary challenges in
sign language recognition is developing a model that can accurately interpret hand
gestures in real time while also handling factors such as variations in lighting,
hand position, and background clutter.

The problem of recognizing sign language using computer vision is a significant one due
to the need for high accuracy and adaptability in real-world scenarios. Traditional
methods require manual feature extraction, which can be complex and error-prone. With
deep learning, particularly convolutional neural networks (CNNs), we can automate
feature extraction and classification, leading to more efficient and accurate models for
recognizing American Sign Language (ASL) gestures.

This project focuses on building a CNN-based model to recognize ASL alphabets, aiming
for real-time performance in sign language interpretation and contributing to the
development of assistive technologies for the hearing impaired.


Data Preprocessing:
The dataset used for training the model consists of labeled images of ASL signs, where
each image represents a particular sign in the ASL alphabet. The preprocessing steps
undertaken are essential for standardizing the images and enhancing model performance.
1. Image Augmentation:
Although image augmentation is typically useful for preventing overfitting, the code does not include explicit augmentation steps such as rotation, flipping, or zooming. These can be added to improve the model's robustness to varied inputs (an example appears in the sketch after this list).

2. Resizing and Normalization:
In this setup, the images are resized to a uniform size of (224, 224), ensuring consistency in input dimensions. This helps the model learn better, as it processes images of the same size; pixel values can also be rescaled (e.g., to the [0, 1] range) for more stable training.

3. Train-Validation Split:
The dataset is split into two parts, 80% for training and 20% for validation, using the validation_split argument of the image_dataset_from_directory method. This ensures the model has a dedicated dataset for evaluating its generalization ability.

4. Batching:
A batch size of 32 is used, enabling efficient training by processing multiple images in parallel, which speeds up the learning process while maintaining accuracy.
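A minimal sketch of the loading and splitting steps described above, assuming a directory of class-labeled ASL images (the dataset path and seed are illustrative; the augmentation block is the optional addition mentioned in point 1):

import tensorflow as tf

IMG_SIZE = (224, 224)
BATCH_SIZE = 32

# Load labeled images from class subdirectories and split 80/20.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "asl_alphabet/",  # hypothetical dataset path
    validation_split=0.2,
    subset="training",
    seed=42,  # using the same seed on both calls keeps the split consistent
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "asl_alphabet/",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
)

# Optional augmentation (not in the original code) to improve robustness.
augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
    tf.keras.layers.RandomBrightness(0.2),
])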

Neural Network Architecture:


The architecture of the neural network used in this project is a simple yet effective
Convolutional Neural Network (CNN), which is suitable for image classification tasks.

1. Layers:

○ Convolutional Layers (Conv2D): These layers are responsible for learning spatial features from the input images. The network uses three convolutional layers with 32, 64, and 128 filters, respectively. Each convolutional layer is followed by a ReLU (Rectified Linear Unit) activation function, which introduces non-linearity and enables the model to learn complex patterns.

○ MaxPooling Layers (MaxPooling2D): After each convolutional layer, a max-pooling operation is applied to reduce the spatial dimensions of the feature maps. This decreases the number of parameters and computations in the model while preserving the important features.

○ Flatten Layer (Flatten): After feature extraction, the output of the last pooling stage is flattened into a one-dimensional array, which is then passed to a dense layer.

○ Fully Connected Layer (Dense): A dense layer with 128 units and ReLU activation is used to learn high-level representations of the features.

○ Output Layer (Dense): The output layer uses a softmax activation function to classify the input image into one of the ASL alphabet classes, with one neuron per class.

2. Activation Functions:

○ ReLU: Used in the convolutional and dense layers to introduce non-linearity, enabling the network to learn complex patterns in the data.

○ Softmax: Used in the output layer to generate a probability distribution over the possible classes.

3. Optimizer:

○ The Adam optimizer is used; it is popular for its efficiency in handling sparse gradients and its adaptive learning rate.
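A minimal Keras sketch of the architecture described above, using the (224, 224) input size from preprocessing and assuming 26 ASL alphabet classes and 3x3 convolution kernels (neither is stated in the report):

import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 26  # assumption: one class per ASL alphabet letter

model = tf.keras.Sequential([
    # Optional pixel normalization (see the preprocessing notes).
    layers.Rescaling(1.0 / 255, input_shape=(224, 224, 3)),
    # Three Conv2D blocks with 32, 64, and 128 filters, each followed by max pooling.
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D(),
    # Flatten the feature maps and classify with dense layers.
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # integer labels from image_dataset_from_directory
    metrics=["accuracy"],
)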
Fig: CNN Model Details Diagram
Training and Evaluation Results:
The model was trained for 10 epochs, and the training process was monitored using both
accuracy and loss metrics.
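A minimal sketch of the training call implied here, assuming the model, train_ds, and val_ds objects from the earlier sketches:

# Train for 10 epochs, tracking accuracy and loss on both splits.
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=10,
)

The results obtained are as follows: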

● Training Accuracy: The model reached a training accuracy of 100% by the end of the 10th epoch, indicating that it fit the training data completely; as discussed under the challenges below, this can also be an early sign of overfitting.

● Validation Accuracy: The validation accuracy was 95.22%, demonstrating that the model generalizes well to unseen data, though further improvement could be achieved in real-world scenarios with more diverse data.

Precision, Recall, and F1 Score:

● Precision: 95.98%
● Recall: 95.22%
● F1 Score: 95.25%

These metrics highlight the model's overall good performance in classifying ASL images, with a balanced trade-off between precision and recall. The high F1 score confirms that the model classifies consistently across the ASL classes rather than favoring precision at the expense of recall, or vice versa.
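A minimal sketch of how these aggregate metrics can be computed with scikit-learn, assuming the model and validation set from the earlier sketches (the weighted averaging is an assumption, since the report does not state how per-class scores were combined):

import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Gather labels and predictions batch by batch to keep them aligned.
y_true, y_pred = [], []
for images, labels in val_ds:
    preds = model.predict(images, verbose=0)
    y_true.extend(labels.numpy())
    y_pred.extend(np.argmax(preds, axis=1))

print("Precision:", precision_score(y_true, y_pred, average="weighted"))
print("Recall:", recall_score(y_true, y_pred, average="weighted"))
print("F1 Score:", f1_score(y_true, y_pred, average="weighted"))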

Fig: Testing the model

Confusion Matrix:
The confusion matrix is plotted to visualize the model's performance across different
ASL classes. It helps identify which classes the model struggles with, guiding future
improvements such as data augmentation or model refinement.
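A minimal sketch of the visualization step, assuming the y_true and y_pred lists gathered above (ConfusionMatrixDisplay is an assumed plotting choice; the original code appears only as a screenshot):

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cm = confusion_matrix(y_true, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=val_ds.class_names)
disp.plot(cmap="Blues", xticks_rotation="vertical")
plt.title("ASL Confusion Matrix")
plt.tight_layout()
plt.show()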
Fig: Confusion Matrix

Challenges in Real-Time Sign Language Recognition:


While the model demonstrates promising results in a controlled validation setting,
real-time sign language recognition presents several challenges:

1. Lighting and Background Variability:
In real-world applications, changes in lighting conditions or complex backgrounds can negatively affect model performance. To address this, additional data augmentation techniques such as random brightness or contrast adjustments can be implemented.

2. Hand Gestures and Variations:
Hand gestures may vary depending on the person performing them, including the size, speed, and orientation of the hand. These variations introduce noise and can lower accuracy. More diverse data and more complex models (e.g., incorporating pose estimation) could help mitigate this issue.

3. Real-Time Processing:
Real-time recognition requires the model to process images quickly, which can be a challenge for deep learning models, particularly on edge devices. Optimizing the model for faster inference (e.g., using model quantization or lightweight architectures like MobileNet) would be crucial for real-time use.

4. Overfitting:
Given the relatively small dataset, there is a risk of overfitting the model to the training data. Using techniques like dropout, data augmentation, or increasing the dataset size would help improve generalization.

5. Model Deployment:
Deploying the trained model in real-world applications such as mobile or embedded systems would require model optimization for memory and computation efficiency. Edge AI frameworks like TensorFlow Lite can be considered for deployment in resource-constrained environments (a conversion sketch follows this list).
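A minimal sketch of such a conversion with TensorFlow Lite, assuming the trained model from the earlier sketches (the output filename is illustrative):

import tensorflow as tf

# Convert the trained Keras model to the TensorFlow Lite format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Post-training quantization shrinks the model and speeds up inference.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Save the converted model for a mobile or embedded deployment.
with open("asl_model.tflite", "wb") as f:
    f.write(tflite_model)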

Conclusion and Future Work:


The model performs well in classifying ASL alphabets with high accuracy, precision,
recall, and F1 score. However, to achieve robust real-time sign language recognition,
further enhancements are necessary, such as incorporating hand tracking, leveraging more
diverse datasets, and optimizing the model for edge deployment.

Future work may include:

● Expanding the dataset to cover a wider range of gestures.
● Experimenting with more complex models (e.g., using Recurrent Neural Networks for sequential gestures).
● Enhancing the model with data augmentation techniques to improve generalization.
● Exploring lightweight models for real-time deployment on mobile devices.

Code Screenshots:

Figure 1: Dataset split into train and test data

Figure 2: CNN Model Layers & Activation Function

Figure 3: Training the model and displaying its accuracy

Figure 4: Testing the model

Figure 5: Code for visualizing the confusion matrix

Link of the code: https://www.kaggle.com/code/rahatimroze/nnflassignment


Reflection on Complex Problem-Solving Aspects:
● Does the solution require in-depth engineering knowledge?

Yes. Developing a real-time sign language recognition system involves a solid understanding of computer vision, machine learning, and deep learning architectures (such as CNNs). Additionally, it requires knowledge of preprocessing techniques like normalization and of handling image data effectively, which are crucial for building a robust model. Understanding metrics like precision, recall, and F1 score is also essential for evaluating performance.

● Are there conflicting technical and engineering challenges?

Yes. One major conflict lies between accuracy and real-time performance. While deep and complex models like CNNs can achieve high accuracy, they may not be fast enough for real-time deployment without optimization. Another challenge is balancing generalization (performing well for new users or under varying lighting and background conditions) against overfitting to the training data. Optimizing models for performance while keeping them lightweight is a significant engineering dilemma.

● Does it require abstract thinking and novel problem-solving techniques?
Absolutely. The project requires abstract thinking to design a neural network that
can interpret complex spatial patterns of hand gestures. It also involves thinking
creatively about data preprocessing, handling diverse gesture styles, and
considering future scalability to real-time systems. Using techniques like CNNs,
softmax activation, and confusion matrices to evaluate performance demonstrates a
high level of abstract reasoning.
● Are the issues encountered infrequent in standard engineering practice?

Yes. Real-time sign language recognition involves challenges that are not typically encountered in traditional software or hardware engineering. These include hand gesture ambiguity, variation in human physiology, dynamic backgrounds, and adapting AI systems to human motion, all of which are specialized and complex problems in the AI and computer vision domain.

● Does the solution involve adherence to specific standards (e.g., real-time AI processing)?

Yes. If this solution is to be used in real-world applications, it must meet standards for real-time inference, low-latency processing, and potentially accessibility or assistive-technology standards. Ensuring consistent performance across diverse users and environments is essential, as is optimizing the model for deployment using frameworks like TensorFlow Lite.

● Are there multiple stakeholders with different needs?

Yes. The primary stakeholders include:

● Hearing-impaired individuals, who need accurate and reliable sign recognition.
● Developers/engineers, aiming to optimize the system for real-time use.
● End-users, who might integrate the system into educational tools or translation services.
● Researchers, who may want to build on the system for gesture-based interaction.

These stakeholders have varying priorities: accuracy, speed, ease of integration, and reliability.

● How does this problem involve interdependence between AI, vision, and human-computer interaction?

This project is a perfect example of the interdisciplinary nature of modern problem-solving.

● AI (deep learning) is used for learning from image data.
● Computer vision handles preprocessing, interpreting pixel-level data, and extracting features.
● Human-computer interaction (HCI) is at the core, as the system's success depends on how intuitively and accurately it interprets user gestures.
