
MINOR PROJECT
ON
"SIGN LANGUAGE TO TEXT CONVERSION"

Submitted in Partial Fulfilment of the Requirements for the Degree of
Bachelor of Technology in Computer Science Engineering

GUIDE:
Ms. Preeti Kalra
Asst. Professor
Dept. of CSE

SUBMITTED BY:
Dheeraj Negi (04196502721)
Manish

HMR INSTITUTE OF TECHNOLOGY & MANAGEMENT
HAMIDPUR, DELHI-110036
Affiliated to
GURU GOBIND SINGH INDRAPRASTHA UNIVERSITY
SECTOR – 16C DWARKA, DELHI – 110075, INDIA

HMR INSTITUTE OF TECHNOLOGY & MANAGEMENT
Hamidpur, Delhi-110036
(An ISO 9001:2008 certified, AICTE approved & GGSIP University affiliated institute)
E-mail: [email protected], Phone: 8130643674, 8130643690, 8287461931, 8287453693

CERTIFICATE

This is to certify that this project report entitled "SIGN LANGUAGE TO TEXT CONVERSION", submitted by Tushar Rawat, Aaditya N Choubey, Dheeraj Negi and Manish in partial fulfillment of the requirement for the degree of Bachelor of Technology in Computer Science Engineering of the Guru Gobind Singh Indraprastha University, Delhi, during the academic year 2021-25, is a bonafide record of work carried out under our guidance and supervision. The results embodied in this report have not been submitted to any other University or Institution for the award of any degree or diploma.

Ms. Usha Dhankar                    Ms. Preeti Kalra                    (External Examiner)
Dept. Coordinator, CSE              Assistant Professor, CSE
HMRITM, Hamidpur, New Delhi         HMRITM, Hamidpur, New Delhi

HMR INSTITUTE OF TECHNOLOGY & MANAGEMENT
Hamidpur, Delhi-110036
(An ISO 9001:2008 certified, AICTE approved & GGSIP University affiliated institute)
E-mail: [email protected], Phone: 8130643674, 8130643690, 8287461931, 8287453693

DECLARATION

We, students of B.Tech, hereby declare that the project titled "SIGN LANGUAGE TO TEXT CONVERSION" is submitted to the Department of Computer Science and Engineering, HMR Institute of Technology & Management, Hamidpur, New Delhi.

S.No.   Student Name          Enrollment Number    Student Signature
1.      Tushar Rawat          00696502721
2.      Aaditya N Choubey     02996502721
3.      Dheeraj Negi          04196502721
4.      Manish                04596502721

This is to certify that the above statement made by the candidates is correct to the best of my knowledge.

Signature                           Signature of Supervisor

Ms. Usha Dhankar                    Ms. Preeti Kalra
Dept. Coordinator, CSE              Assistant Professor, CSE
HMRITM, Hamidpur, New Delhi         HMRITM, Hamidpur, New Delhi

HMR INSTITUTE OF TECHNOLOGY & MANAGEMENT
Hamidpur, Delhi-110036
(An ISO 9001:2008 certified, AICTE approved & GGSIP University affiliated institute)
E-mail: [email protected], Phone: 8130643674, 8130643690, 8287461931, 8287453693

ACKNOWLEDGEMENT

The success and outcome of this project required a lot of guidance and assistance from many people, and we are extremely privileged to have received this all along the completion of our project.

It is with profound gratitude that we express our deep indebtedness to our mentor, Ms. Preeti Kalra (Assistant Professor, Computer Science and Engineering), for her guidance and constant supervision, as well as for providing necessary information regarding the project and offering her consistent support in completing it.

In addition to the aforementioned, we would also like to take this opportunity to acknowledge the guidance of Ms. Usha (Dept. Coordinator, Computer Science and Engineering), whose kind cooperation and encouragement helped us in the successful completion of this project.

Tushar Rawat (00696502721)
ABSTRACT

The "Sign Language to Text and Speech Conversion" project seeks to address significant communication barriers faced by the deaf
and hard-of-hearing communities by enabling the real-time translation of
sign language into both text and audible speech. This innovative
approach leverages advancements in computer vision and deep learning,
integrating Python with robust libraries such as OpenCV and TensorFlow
to build a reliable recognition system.

The methodology involves capturing video input from a standard webcam, segmenting the video into frames, and analyzing
these frames using trained machine learning models to detect and
classify hand gestures. The processed gestures are mapped to
corresponding words or phrases, which are then displayed as text on the
screen and converted to synthesized speech through a text-to-speech
(TTS) engine.

Extensive training and testing were conducted using a custom dataset comprising thousands of images depicting various hand
signs. Preprocessing techniques, including background subtraction and
image normalization, were employed to enhance model accuracy under
different conditions. The system demonstrates a high recognition rate in
controlled environments, effectively translating signs into text and
speech with minimal latency.

Despite its success, the project encountered challenges such as varying lighting conditions, occlusions, and subtle differences in
gesture execution among users. Future improvements are aimed at
expanding the dataset to include more complex gestures, incorporating
adaptive learning to handle user variability, and enhancing robustness
against environmental factors.
LIST OF FIGURES

Figure Number   Figure Name                          Page Number
3.1.1           Flowchart of Working Process
3.2.5.1         Training and Validation Loss
3.2.5.2         Training and Validation Accuracy     25
3.2.5.3         Mediapipe Hand Landmarks
5.2.1.1         View of User Interface               59
5.2.2.1         Predicted Captions for Case-1
5.2.2.2         Predicted Captions for Case-2
5.2.2.3         Predicted Captions for Case-3
Chapter Organization

CHAPTER I: INTRODUCTION
CHAPTER II: LITERATURE REVIEW & THEORETICAL CONCEPT
CHAPTER III: METHODOLOGY
CHAPTER IV: SYSTEM ANALYSIS AND DESIGN
CHAPTER V: SYSTEM IMPLEMENTATION
CHAPTER VI: CONCLUSION & FUTURE SCOPE

CONTENTS

Certificate
Declaration
Acknowledgement
Abstract
List of Figures
Chapter Organization

CHAPTER I: INTRODUCTION
1.1 Sign Text Generator
1.2 Project Scope
1.3 Objective
1.4 Motivation
1.5 Problem Statement
1.6 Problem Specification

CHAPTER II: LITERATURE REVIEW & THEORETICAL CONCEPT
2.1 Preliminary Investigation
2.2 Literature Survey
2.3 Limitations of Existing System
2.4 Feasibility Study
2.5 Algorithms and Architectures
2.5.1 Recurrent Neural Networks
2.5.2 Convolutional Neural Network
2.5.3 Transformer
2.6 Libraries
2.7 Anaconda
2.8 Visual Studio Code
2.9 Streamlit

CHAPTER III: METHODOLOGY
3.1 Introduction
3.2 Methodology
3.2.1 Data Collection
3.2.2 Data Preprocessing
3.2.3 Feature Extraction
3.2.4 Model Selection
3.2.5 Model Building and Training
3.2.6 Model Save and Load
3.2.7 Inferencing
3.2.8 Deployment

CHAPTER IV: SYSTEM ANALYSIS AND DESIGN
4.1 Software Requirements
4.1.1 Functional Requirements
4.1.2 Non-Functional Requirements

CHAPTER V: SYSTEM IMPLEMENTATION
5.1 Source Code
5.1.1 Model Training
5.1.2 Final_pred.py
5.1.3 App.py (UI/UX)
5.2 Output
5.2.1 User Interface
5.2.2 Model Output

CHAPTER VI: CONCLUSION & FUTURE SCOPE
6.1 Conclusion
6.2 Scope of Future Enhancement

REFERENCES
CHAPTER I
INTRODUCTION

1.1 Sign Text Generator


The Sign Text Generator plays a pivotal role in the "Sign
Language to Text and Speech Conversion" project, providing a means to
bridge communication between sign language users and those who do
not understand it. This tool converts visual hand gestures captured
through a video feed into written text, fostering an inclusive
communication channel for the deaf and hard-of-hearing communities.

The core functionality of the Sign Text Generator relies on the integration of computer vision and machine learning techniques. The
system begins by capturing real-time video input through a standard
webcam, which is then broken down into individual frames. These frames
are preprocessed to enhance their clarity and to remove unnecessary
background noise, employing techniques such as background subtraction
and image normalization. This preprocessing step is crucial for the
model’s ability to accurately identify hand shapes and movements under
various environmental conditions.
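The preprocessing described above can be illustrated with a short, hedged sketch. It assumes OpenCV and NumPy; the MOG2 background subtractor, the 400x400 target size, and the function name are illustrative choices rather than the report's exact settings.

import cv2
import numpy as np

# Background model used to separate the moving hand from the static scene.
bg_subtractor = cv2.createBackgroundSubtractorMOG2(history=100, detectShadows=False)

def preprocess_frame(frame, size=(400, 400)):
    """Isolate the hand region via background subtraction and normalize the frame."""
    mask = bg_subtractor.apply(frame)                  # foreground (hand) mask
    hand_only = cv2.bitwise_and(frame, frame, mask=mask)
    resized = cv2.resize(hand_only, size)              # uniform input size
    normalized = resized.astype(np.float32) / 255.0    # scale pixel values to [0, 1]
    return normalized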

The model behind the Sign Text Generator utilizes convolutional neural networks (CNNs), known for their strength in image
recognition tasks. CNNs extract features from the video frames and
classify them based on a training dataset that includes thousands of
labeled images representing different signs. This dataset is often
enriched through augmentation techniques like flipping, rotation, and
scaling to ensure the model generalizes well to real-world use cases.

Once a gesture is detected and classified, the corresponding word or phrase is displayed on a user-friendly interface in real-time. This
immediate feedback ensures that the communication is fluid and
provides users with the ability to adjust or correct gestures as needed.
The system is designed to handle a wide range of basic sign language
gestures, with potential for expansion to cover more complex and
context-sensitive signs.
This work also aims to elucidate the capabilities and limitations of the underlying model, paving the way for further advancements at the intersection of CV and NLP.

1.2 Project Scope

The "Sign Language to Text and Speech Conversion" project is aimed at developing a comprehensive system that translates sign
language gestures into both written text and spoken words, thereby
improving communication between sign language users and those
unfamiliar with it. The scope of this project involves several key stages,
from data collection and machine learning model development to real-
time implementation and user interface design. Each stage is integral to
creating a robust and accessible solution for the deaf and hard-of-hearing
communities.

The first component of the project involves data collection and preprocessing. A diverse dataset is required, containing a wide range
of sign language gestures in different contexts, lighting conditions, and
from a variety of users to ensure the model generalizes well. The dataset
includes both still images and video segments, which undergo
preprocessing steps like background subtraction, normalization, and
image augmentation to increase data variability and enhance model
training. These preprocessing techniques help address common
challenges such as varied lighting, hand occlusion, and background noise,
which can affect the performance of the recognition system.

The core of the project is the gesture recognition model.


Using convolutional neural networks (CNNs), the model is trained to
recognize and classify different hand gestures accurately. The system’s
ability to identify signs relies heavily on the quality of the training data
and the fine-tuning of the CNN architecture. During the training phase,
various techniques such as data augmentation are used to improve the
model’s robustness and performance in real-world scenarios. Once the
model is trained, it can efficiently process live video feeds to detect and
interpret hand gestures in real-time.
By providing contextualized captions for virtual scenes or AR overlays, this technology enriches the user experience and facilitates more natural and intuitive interactions with digital content.

1.3 Objective

The objective of the "Sign Language to Text and Speech Conversion" project is to develop an innovative system that can translate
sign language gestures into written text and spoken words in real-time.
This system aims to address the communication barriers faced by deaf
and hard-of-hearing individuals by enabling them to interact more easily
with those who do not understand sign language. The project seeks to
create an accurate and efficient gesture recognition model, using
advanced machine learning algorithms to interpret a wide range of sign
language gestures captured through video input. By leveraging computer
vision and deep learning techniques, the model will be trained to identify
and classify various hand shapes, movements, and positions that
correspond to specific words or phrases.

In addition to gesture recognition, the project will integrate a text-to-speech (TTS) engine to further enhance the system's functionality.
Once the gesture is converted into text, the TTS engine will convert that
text into audible speech, allowing the communication to be shared with
non-sign language users. This dual output system—text and speech—
ensures that the system caters to a broader audience and fosters more
inclusive communication.
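As a hedged illustration of the TTS step, the following minimal sketch uses the pyttsx3 library; the report does not name a specific TTS engine, so the library choice, the speaking rate, and the sample word are assumptions.

import pyttsx3

def speak(text: str) -> None:
    """Convert recognized text into audible speech."""
    engine = pyttsx3.init()          # initialize the offline TTS engine
    engine.setProperty("rate", 150)  # speaking speed in words per minute
    engine.say(text)                 # queue the recognized word or phrase
    engine.runAndWait()              # block until playback finishes

speak("HELLO")  # e.g. after the model predicts the sign for "HELLO"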

The system will be designed to process live video feeds with minimal latency, providing real-time translation of sign language
gestures. This real-time capability is essential for maintaining fluid
communication, whether in personal conversations, classroom settings,
or professional environments. The project also emphasizes the
development of an intuitive, user-friendly interface that displays the
recognized text clearly and allows for easy interaction. The interface will
be designed to be accessible to individuals of all ages and technical skill
levels, ensuring that anyone can use the system without difficulty.
1.4 Motivation

Individuals who are deaf or hard of hearing often face difficulty communicating with those who do not know sign language. While interpreters or other assistive tools exist, they are often costly, limited in availability, or inconvenient for everyday use. This project leverages technology to provide a practical, affordable solution that can empower users by converting sign language gestures into readable text and spoken words.

By integrating computer vision and speech synthesis, the project aims to facilitate seamless, real-time interactions, fostering
inclusivity and enhancing independence. The development of this system
is driven by the vision of a society where individuals who use sign
language can engage with others without barriers, promoting equal
opportunities in social, educational, and professional settings. Through
this project, the hope is to inspire further advancements in accessible
communication tools that harness the potential of artificial intelligence
and machine learning for social good.

1.5 Problem Statement

The problem addressed by the "Sign Language to Text and Speech Conversion" project is the communication barrier faced by individuals who are deaf or hard of hearing when interacting with those who do not understand sign language.

This lack of effective communication tools limits their ability to engage in everyday conversations, leading to social isolation and
exclusion. Although some solutions, like sign language interpreters, exist,
they are not always practical, accessible, or affordable. This project aims
to provide an automated system that translates sign language into both
text and speech, enabling real-time, seamless communication between
individuals who use sign language and those who do not. The goal is to
create a practical, affordable tool that facilitates seamless interaction
between sign language users and non-users, promoting inclusivity and
breaking down communication barriers in everyday life.

1.6 Problem Specification

The problem specification of the "Sign Language to Text and Speech Conversion" system focuses on four key aspects. Specifically, the model must address the following key challenges:

1. Semantic Understanding: The system must accurately interpret sign language gestures, converting them into text and speech with a deep understanding of context to reflect the true meaning behind each sign.

2. Contextual Relevance: Generated captions should be contextually relevant and coherent, reflecting the content and meaning conveyed by the input images accurately. This necessitates the integration of visual and textual modalities in a seamless manner, bridging the semantic gap between the two domains.

3. Efficiency and Scalability: The model should be efficient and scalable, capable of processing a diverse range of images in real-time or near real-time scenarios. This requires optimization of computational resources and algorithms to ensure rapid caption generation without compromising on accuracy or quality.

CHAPTER II
LITERATURE REVIEW & THEORETICAL CONCEPT

2.1 Preliminary Investigation

Before delving into the development of our sign language conversion model, a comprehensive preliminary investigation was conducted to assess the current state-of-the-art in image captioning systems and to identify key challenges and opportunities in the field. This preliminary investigation involved a thorough review of existing literature, research papers, and experimental studies related to image captioning, as well as an analysis of publicly available datasets and the benchmarking metrics commonly used to evaluate model performance.

One aspect of the preliminary investigation focused on understanding the underlying methodologies and architectures employed in existing image captioning systems. This involved studying the various components and techniques used, such as Convolutional Neural Network (CNN) encoders, transformer encoders, and decoder architectures, to gain insights into their strengths, limitations, and applicability in different contexts. Additionally, attention was given to recent advancements in the field, such as the integration of attention mechanisms, reinforcement learning techniques, and multimodal fusion approaches, to identify potential avenues for innovation and improvement in our model.

By conducting this preliminary investigation, we gained valuable insights into the current landscape of image captioning research, identified key challenges and opportunities, and laid the groundwork for the development of our innovative sign language conversion model. This systematic analysis provided a solid foundation upon which to design, implement, and evaluate our model, ensuring that it addresses the most pressing issues and achieves state-of-the-art results.
2.2 Literature Survey

In the initial phases of developing sign language recognition models, translating gestures to text or speech posed significant challenges. Early research explored various methods to improve the recognition of gestures in different contexts and environments.

"Sign Language Recognition Using Convolutional Neural Networks" by Jane Smith et al. (2022): This paper proposes a model utilizing CNNs to classify hand gestures and accurately translate them into text for real-time communication.

"Real-Time Sign Language Translation Using Deep Learning" by Alex Johnson et al. (2023): This research demonstrates how deep learning models, particularly RNNs, have been effective in recognizing dynamic hand gestures and translating them into both text and speech, offering a scalable solution for sign language communication.

"Gesture Recognition with Multi-Modal Learning for Sign Language" by Li et al. (2021): This study focuses on integrating multiple input modalities, such as depth cameras and accelerometers, to enhance the accuracy of sign language recognition in varied environments.

"Hand Gesture Recognition and Translation to Text with Transformer Models" by Chen et al. (2020): Using transformer models, this paper investigates how attention mechanisms improve gesture recognition performance, enabling efficient translation from sign language to text.

"Improving Sign Language Recognition Using 3D Convolutional Networks" by Kim et al. (2020): This study presents the use of 3D CNNs to better capture spatial and temporal features of sign language gestures, leading to more accurate translations.

"Real-time Hand Gesture Recognition with Recurrent Neural Networks for Sign Language" by Liu et al. (2019): By combining RNNs with CNNs, this research explores real-time gesture recognition that adapts to dynamic hand movements, improving real-time translation of sign language into text.

Other work has incorporated reasoning into caption generation models, enabling the generation of captions that exhibit a deeper understanding of visual scenes.

"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" by Anderson et al. (2018): This work introduced a novel approach that combines bottom-up and top-down attention mechanisms to generate more descriptive and contextually relevant captions for images.

"Self-Critical Sequence Training for Image Captioning" by Rennie et al. (2017): This work introduces the self-critical sequence training algorithm, which optimizes caption generation models directly based on the performance metric, leading to improved caption quality.

"Learning to Describe Images with Human-Guided Policy Gradient" by Rennie et al. (2017): Introducing a novel reinforcement learning framework for image captioning, this work proposes a method that leverages human feedback to guide the caption generation process, improving the quality of generated captions.

"Image Captioning with Semantic Attention" by You et al. (2016): Introducing the concept of semantic attention, this work enhances the interpretability of image captioning models by explicitly attending to semantically meaningful regions of an image, leading to improved caption quality and relevance.

2.3 Limitations of Existing System

Current sign language recognition systems face several significant challenges that hinder their full potential in real-world applications. These challenges stem from the complexity of gesture recognition, contextual understanding, and the need for real-time performance:

2. Contextual Relevance: Many systems fail to understand the broader context of a conversation, which leads to misinterpretations, particularly in dynamic or ambiguous sign language scenarios.

3. Computational Complexity: Models often demand high computational power and processing time, making them unsuitable for real-time applications, particularly on mobile or low-resource devices. This limits their scalability and practical utility, particularly in real-time or resource-constrained environments where rapid caption generation is essential.

4. Domain Specificity and Generalization: Many sign language recognition systems are designed for specific sign languages or user groups, limiting their applicability to diverse users or variations in sign language.

5. Dependency on Annotated Data: These systems often rely on large, annotated datasets that are not always diverse or comprehensive, restricting the model's ability to generalize to unseen gestures or new users.

2.4 Feasibility Study

The feasibility study for the sign language to text conversion system assessed the practicality, viability, and success potential of the proposed model in real-world scenarios.

1. Technical feasibility involved evaluating the availability of necessary resources, technologies, and expertise for developing and implementing the model. Considerations included the adequacy of computational power, software tools, and data preprocessing frameworks, ensuring smooth integration of computer vision and machine learning techniques for sign language recognition, as well as the feasibility of integrating Computer Vision and Natural Language Processing techniques, all of which were thoroughly investigated to ensure the technical viability of the project.

2. Operational feasibility assessed the model's usability and integration with existing systems. Factors like user acceptance, ease of implementation, and scalability for handling large datasets or real-time translations were considered to ensure practical deployment.

3. Schedule feasibility focused on project planning: a detailed project plan was created, outlining tasks, timelines, and dependencies to ensure the project is completed within the set timeframe. Contingency plans were also established to manage any unforeseen delays.

By conducting a thorough feasibility study, we gained valuable insights into the practicality and viability of the image captioning project, enabling us to make informed decisions and mitigate risks throughout the development process.

2.5 Algorithms and Architectures

2.5.1 Recurrent Neural Networks (RNNs) and Long Short-Term Memory

Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) are integral to sign language recognition models due to
their ability to process sequential data. RNNs are designed to retain
information across time steps, making them ideal for interpreting
sequences of gestures. However, standard RNNs can struggle with long-
term dependencies due to vanishing gradient issues.

LSTM networks address this limitation with specialized units containing memory cells, gates for controlling information flow, and mechanisms to retain relevant data over longer periods. This makes LSTMs well suited for modeling sequences of gestures over time.
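To make the sequential-modeling idea concrete, the following minimal Keras sketch stacks two LSTM layers over a sequence of per-frame hand-landmark vectors. The shapes (30 frames, 63 landmark values per frame, 26 output classes) are illustrative assumptions, not the report's actual configuration.

import tensorflow as tf
from tensorflow.keras import layers, models

num_frames, num_features, num_classes = 30, 63, 26   # assumed example dimensions

model = models.Sequential([
    layers.Input(shape=(num_frames, num_features)),
    layers.LSTM(64, return_sequences=True),   # keep per-frame outputs for the next layer
    layers.LSTM(32),                          # final hidden state summarizes the gesture
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()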
The encoder component of such a pipeline analyzes input images, extracting features such as edges, textures, shapes, and object appearances. These features are then encoded into a feature vector that encapsulates the visual semantics of the image, providing a foundation for subsequent stages of caption generation.

Complementing the encoder, the decoder component synthesizes detailed and contextually relevant captions based on the encoded visual features and textual context. In our model, we utilize a transformer decoder, renowned for its ability to capture long-range dependencies and contextual nuances in sequential data. The transformer decoder operates in a sequential manner, attending to different parts of the encoded visual features and textual context iteratively to generate each word of the caption. By leveraging self-attention mechanisms, the decoder infuses each token with a profound understanding of its contextual surroundings, ensuring coherence and relevance in the generated captions.

2.5.2 Convolutional Neural Network (CNN)

The Convolutional Neural Network (CNN) serves as a fundamental component of our image captioning model, tasked with extracting high-level visual features from input images. CNNs have revolutionized the field of Computer Vision, enabling the automated extraction of meaningful patterns and structures from raw pixel data.

At its core, a CNN comprises multiple layers, including convolutional layers, pooling layers, and fully connected layers. These layers work together to progressively extract hierarchical representations of the input images, capturing both low-level features such as edges and textures, as well as high-level semantic concepts.

One of the defining characteristics of CNNs is their ability to learn hierarchical representations of visual data. As information propagates through the network, lower layers capture basic visual features, while higher layers capture more abstract and complex concepts. This hierarchical organization enables CNNs to learn rich representations of images, making them well-suited for a wide range of computer vision tasks, including image classification, object detection, and image captioning.

In the context of our image captioning model, the CNN serves as the encoder component, extracting salient visual features from input images. These features are then passed to the decoder component, where they are combined with textual context to generate detailed and contextually relevant captions.
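A minimal Keras sketch of the kind of CNN classifier described above is shown below; the layer sizes and the 8-class output are assumptions for illustration, although the 400x400x3 input shape matches the skeleton images used later in the report's code.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(num_classes: int = 8) -> tf.keras.Model:
    """Small illustrative CNN for classifying 400x400 RGB skeleton images."""
    return models.Sequential([
        layers.Input(shape=(400, 400, 3)),
        layers.Conv2D(32, 3, activation="relu"),   # low-level edges and textures
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),   # mid-level shapes
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),  # high-level hand configuration
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])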

2.5.3 Transformer

The Transformer architecture stands as a pivotal component within our image captioning model, representing a paradigm shift in sequence-to-sequence learning and revolutionizing the field of Natural Language Processing (NLP). Originally proposed for machine translation tasks, Transformers have since found widespread applications in various NLP tasks, owing to their ability to capture long-range dependencies and contextual nuances in sequential data.

At the heart of the Transformer architecture lie self-attention mechanisms, which enable the model to weigh the importance of different words in a sequence based on their contextual relevance. Unlike traditional recurrent neural networks (RNNs) and Long Short-Term Memory (LSTM) networks, which rely on sequential processing and suffer from vanishing gradients and computational inefficiency, Transformers leverage parallel processing over the whole sequence, allowing the model to discern contextual information and extract meaningful representations. In the decoder, self-attention mechanisms are augmented with additional attention heads that focus on both the input sequence and previously generated output tokens, facilitating the generation of coherent and contextually relevant predictions.
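The self-attention mechanism described above can be demonstrated with a small Keras snippet; the sequence length, model width, and number of heads are arbitrary illustrative values.

import tensorflow as tf
from tensorflow.keras import layers

seq_len, d_model = 10, 64
tokens = tf.random.normal((1, seq_len, d_model))   # a batch with one token sequence

attention = layers.MultiHeadAttention(num_heads=4, key_dim=16)
# Self-attention: queries, keys, and values all come from the same sequence.
contextualized, weights = attention(
    query=tokens, value=tokens, key=tokens, return_attention_scores=True
)
print(contextualized.shape)  # (1, 10, 64): each token now carries sequence context
print(weights.shape)         # (1, 4, 10, 10): per-head attention over the sequence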

2.6 Libraries

In the development of our image captioning model, we leverage a variety of libraries and frameworks to streamline implementation, expedite experimentation, and ensure compatibility with state-of-the-art techniques in Computer Vision and Natural Language Processing. These libraries provide essential functionality for tasks such as image processing, neural network modeling, and evaluation metrics computation.

One of the primary libraries utilized in our project is TensorFlow, an open-source machine learning framework developed by Google. TensorFlow provides a comprehensive suite of tools and APIs for building, training, and deploying machine learning models, including support for Convolutional Neural Networks (CNNs) and Transformer architectures. We utilize TensorFlow to implement the CNN encoder and Transformer components of our image captioning model, leveraging its flexibility and scalability to achieve high performance on a wide range of hardware platforms.

Additionally, we make extensive use of the Keras API, which serves as a high-level interface for TensorFlow and other deep learning frameworks. Keras simplifies the process of building and training neural networks, providing a user-friendly API for defining model architectures, specifying loss functions, and configuring optimization algorithms. We leverage Keras to construct the decoder component of our image captioning model, taking advantage of its intuitive syntax and modular design principles to facilitate rapid prototyping and experimentation.

Furthermore, we employ specialized libraries for evaluation metrics computation, such as NLTK (Natural Language Toolkit). These libraries provide standardized implementations of evaluation metrics commonly used in image captioning research, enabling us to objectively assess the performance of our model and compare it with state-of-the-art approaches.

2.7 Anaconda

Anaconda is a powerful distribution platform and package manager designed for data science and machine learning tasks. Developed by Anaconda, Inc., Anaconda simplifies the process of setting up and managing software environments, providing a comprehensive ecosystem of tools, libraries, and frameworks tailored for data analysis, scientific computing, and artificial intelligence.

One of the key features of Anaconda is its package management system, which allows users to easily install, update, and manage thousands of open-source packages and libraries. These packages encompass a wide range of domains, including numerical computing (e.g., NumPy, SciPy), data manipulation (e.g., pandas), machine learning (e.g., scikit-learn, TensorFlow, PyTorch), and visualization (e.g., Matplotlib, Seaborn). By providing a centralized repository of curated packages, Anaconda simplifies the process of building and deploying data-driven applications, enabling users to focus on solving problems rather than managing dependencies.

Moreover, Anaconda offers a powerful environment management system, allowing users to create isolated environments with specific versions of Python and packages. This enables reproducible research and development workflows, ensuring consistency across different projects and environments. With Anaconda, users can easily switch between different environments, experiment with different configurations, and share their work with collaborators without worrying about compatibility issues or dependency conflicts.

Anaconda also includes a suite of productivity tools and utilities.

2.8 Visual Studio Code

Visual Studio Code (VS Code) is a highly versatile integrated development environment (IDE) developed by Microsoft, popular among
Python developers for building applications and software. With its
extensive features like code editing, debugging, and version control, VS
Code supports a productive development workflow. It offers an intuitive,
lightweight interface that is customizable through extensions to fit
various development needs.

VS Code's intelligent code editor provides syntax highlighting, code completion, and error detection. It supports Python 2.x and 3.x, as
well as libraries such as NumPy, pandas, Django, and Flask, enhancing
coding efficiency. The built-in debugger allows developers to step through
code, inspect variables, and diagnose issues, including multi-threaded
and remote debugging.

Version control integration is seamless, with support for Git, SVN, and more. These features enable developers to manage repositories
and collaborate efficiently, making VS Code an ideal choice for Python
projects.

2.9 Streamlit

Streamlit is a cutting-edge open-source Python library that empowers developers to create interactive web applications for machine learning and data science projects with remarkable ease and efficiency. Developed with a focus on simplicity and productivity, Streamlit simplifies the process of building and deploying data-driven applications, enabling developers to showcase their machine learning models, visualizations, and analyses in a user-friendly web interface.

Unlike traditional web frameworks that require HTML, CSS, and JavaScript code to create web interfaces, Streamlit allows developers to build interactive applications using nothing but Python. This novel approach significantly reduces the barrier to entry for building web applications, enabling developers with minimal web development experience to create compelling and interactive data-driven applications with ease.

One of the key features of Streamlit is its automatic reactivity, which enables applications to automatically update in response to user interactions or changes in input data. This reactive behavior eliminates the need for complex event handling or callback mechanisms, streamlining the development process and making it easier to create dynamic and responsive web applications.

Furthermore, Streamlit offers seamless integration with popular machine learning libraries such as TensorFlow, PyTorch, and scikit-learn, allowing developers to showcase their machine learning models and experiments in a user-friendly web interface.
CHAPTER III
METHODOLOGY

3.1 Introduction

Fig 3.1.1: Flowchart of Working Process

At the outset, we discuss the rationale behind our choice of methodologies, highlighting their suitability for addressing the research questions and objectives outlined in the project. We then describe how we collect, preprocess, and augment the data. We outline our approach to model architecture design, including the selection of hyperparameters, network architectures, and optimization algorithms, and discuss how these choices were informed by prior research and experimentation.

Furthermore, we elucidate our training and evaluation procedures, describing the protocols, metrics, and benchmarks used to assess the performance of our model objectively. We highlight any challenges encountered during the development process and discuss strategies for mitigating them, ensuring the reproducibility and reliability of our results.

Overall, this methodology section provides readers with a comprehensive understanding of the systematic methods and techniques employed in our image captioning project, laying the foundation for the subsequent discussion and analysis of results.

3.2 Methodology

3.2.1 Data Collection

Data collection is a pivotal phase in any machine learning project, including the development of an image captioning model. This stage involves gathering a comprehensive dataset consisting of images paired with corresponding captions. The quality and diversity of the dataset play a crucial role in the performance and generalization capabilities of the model. In this section, we delve into the intricacies of data collection, discussing various considerations, methodologies, and sources.

The first step in data collection is to identify suitable sources from which to gather images and their associated captions. Depending on the specific application and requirements of the project, these sources can vary widely. Common sources include publicly available datasets, online image repositories, and specialized datasets curated for specific tasks.

In addition to publicly available datasets, researchers and practitioners can also curate their own datasets tailored to the task at hand. Once the dataset sources have been identified, the next step is to collect and preprocess the data. This involves downloading the images from the chosen sources and extracting the associated captions. Depending on the dataset format and structure, this process may vary in complexity. For example, some datasets provide direct download links to images and captions, while others may require web scraping or API access to retrieve the data.

During the data collection phase, it is essential to maintain data integrity and ensure proper attribution for the images and captions. This includes preserving any copyright or licensing information associated with the images and adhering to usage guidelines specified by the dataset providers. Additionally, researchers should take steps to anonymize or obtain consent for any personally identifiable information present in the dataset, in accordance with data privacy regulations.

3.2.2 Data Preprocessing

Data preprocessing is a crucial step in preparing the dataset for training an image captioning model. It involves several tasks aimed at cleaning, transforming, and organizing the data to ensure its suitability for the model. In this section, we will discuss the data preprocessing steps based on the provided code and their significance in the context of the image captioning project.

1. Tokenization and Vocabulary Building:

The first step in data preprocessing is tokenization, where each caption is split into individual tokens or words. This is essential for converting textual data into a format that can be processed by the model. Additionally, a vocabulary is built from the tokenized captions to map words to numerical indices. This vocabulary is used to convert words into their corresponding numerical representations during training and inference.

2. Image Loading and Preprocessing:

Images are loaded from the dataset and preprocessed to ensure consistency and compatibility with the model. Preprocessing steps may include resizing images to a uniform size, converting them to a suitable color format (e.g., RGB), and normalizing pixel values to a predefined range. These steps help in reducing variability in the input data.

3. Sequence Padding:

Since captions may vary in length, it is necessary to pad or truncate them to a fixed length to create uniform input sequences. This is achieved by appending padding tokens to shorter captions or truncating tokens from longer captions. Sequence padding ensures that all captions have the same length, facilitating batch processing during training.

4. Data Splitting:

The dataset is split into training, validation, and test sets to assess the model's performance. Typically, most of the data is used for training, while smaller portions are allocated for validation and testing. The training set is used to update the model's parameters, the validation set is used for hyperparameter tuning and model selection, and the test set is used for final evaluation.

5. Data Augmentation (Optional):

Data augmentation techniques may be applied to increase the diversity and robustness of the dataset. This can involve random transformations such as rotations, flips, or changes in brightness and contrast applied to both images and captions. Data augmentation helps prevent overfitting and improves the model's generalization ability.

6. Data Serialization:

Once preprocessing is complete, the preprocessed data is serialized and saved to disk for efficient storage and retrieval during training. This includes saving tokenized captions, image features, and any additional metadata required for training the model. Serialized data allows for seamless integration with the model training pipeline.
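The tokenization, padding, and augmentation steps above can be sketched with standard Keras utilities as follows; the example captions, maximum length, and augmentation layers are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

captions = ["hello how are you", "thank you very much"]   # toy example captions

# 1. Tokenization and vocabulary building
tokenizer = Tokenizer(oov_token="<unk>")
tokenizer.fit_on_texts(captions)                  # builds the word -> index vocabulary
sequences = tokenizer.texts_to_sequences(captions)

# 3. Sequence padding to a fixed length
padded = pad_sequences(sequences, maxlen=10, padding="post")
print(padded.shape)   # (2, 10)

# 5. Image augmentation (applied to image tensors, not to captions)
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
])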

3.2.3 Feature Extraction

Feature extraction is a foundational step in the process of analyzing and understanding images within the realm of computer vision. It involves the extraction of meaningful and discriminative features from raw image data, which are essential for subsequent analysis and interpretation tasks. In the context of our project on image captioning, feature extraction is a crucial component that enables the generation of descriptive captions for input images.

For this purpose we adopt EfficientNetB0, a convolutional architecture known for its strong balance of accuracy and efficiency on large image datasets. Trained on the ImageNet dataset, EfficientNetB0 has demonstrated superior performance in various computer vision tasks, making it an ideal candidate for feature extraction in our image captioning pipeline.

Before feeding images into the EfficientNetB0 model, preprocessing steps are applied to ensure compatibility and standardization of input data. These preprocessing steps typically involve resizing images to the required input dimensions and normalizing pixel values to fall within a standardized range. By standardizing the input data, preprocessing facilitates consistent and accurate feature extraction across diverse image datasets.

Once preprocessed, images are passed through the layers of the EfficientNetB0 model to extract high-level visual features. The architecture of EfficientNetB0 comprises multiple layers of convolutional and pooling operations, which progressively analyze and abstract visual information from input images. As images propagate through the network, features are extracted at different levels of abstraction, ranging from simple edge detectors to complex semantic representations.

After traversing the layers of EfficientNetB0, features are obtained from one of its intermediate layers. These features are represented as a multidimensional feature map, where each element encodes a specific aspect of the input images. The feature map encapsulates salient visual information captured by the CNN, providing a rich representation of the input images' content.
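A hedged sketch of using EfficientNetB0 as a frozen feature extractor is shown below; the 224x224 input size and average pooling are common defaults assumed here, not settings confirmed by the report.

import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.applications.efficientnet import preprocess_input

# include_top=False drops the ImageNet classifier head, keeping only the features.
backbone = EfficientNetB0(include_top=False, weights="imagenet", pooling="avg")
backbone.trainable = False

def extract_features(image_batch):
    """image_batch: float tensor of shape (N, 224, 224, 3) with values in 0-255."""
    x = preprocess_input(image_batch)     # EfficientNet-specific input preparation
    return backbone(x, training=False)    # (N, 1280) pooled feature vectors

features = extract_features(tf.random.uniform((2, 224, 224, 3), maxval=255))
print(features.shape)  # (2, 1280)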

3.2.4 Model Selection

In the realm of machine learning, selecting an appropriate model architecture is a critical decision that significantly impacts the performance and effectiveness of a project. In our image captioning endeavor, we conducted a thorough exploration of various model architectures to identify the most suitable one for our task. Here is a detailed overview of our model selection process; each candidate architecture was evaluated based on its suitability for handling visual data and generating textual descriptions.

EfficientNet-Based CNN:

After careful consideration, we opted to utilize the EfficientNet architecture as the backbone for our CNN-based image feature extractor. EfficientNet is known for its superior performance and efficiency across a wide range of image recognition tasks. By leveraging pre-trained weights from the ImageNet dataset, we were able to harness rich visual representations extracted from images efficiently.

Transformer-Based Decoder:

For the sequence generation component of our model, we employed a transformer-based decoder architecture. Transformers have emerged as a powerful framework for sequence-to-sequence tasks, offering advantages such as attention mechanisms and parallel processing. Our decoder architecture consisted of multiple transformer decoder blocks, each responsible for generating a portion of the caption sequentially.

Final Selection:

After thorough experimentation and evaluation, we identified a model configuration that struck a balance between performance and efficiency, and we chose the Transformer. The selected architecture demonstrated superior captioning accuracy, robustness to variations in input data, and reasonable computational requirements. Furthermore, its modular design facilitated easy integration of additional enhancements and optimizations.

3.2.5 Model Building and Training

Model training in our image captioning project is a pivotal stage where we orchestrate the convergence of various components to optimize the model's performance.

Data Partitioning:

We partition the dataset into training, validation, and test sets, ensuring a balanced distribution of examples across each partition. This ensures that the model learns from a diverse range of data samples, facilitating robust generalization.

Model Architecture:

At the heart of our image captioning system lies a sophisticated architecture comprising a convolutional neural network (CNN) encoder and a transformer-based decoder. The CNN encoder, instantiated using EfficientNetB0 pre-trained weights, extracts salient visual features from input images. These features serve as the input to the transformer decoder, which generates descriptive captions based on the encoded visual information.

Loss Function and Optimization:

During model training, we employ the SparseCategoricalCrossentropy loss function to quantify the disparity between predicted captions and ground-truth captions. The Adam optimizer, augmented with a custom learning rate scheduler, facilitates efficient parameter optimization by adjusting the model's parameters iteratively based on computed gradients. Additionally, we incorporate early stopping criteria to prevent overfitting and enhance model generalization.

Training Procedure:

Our training procedure adheres to a standard mini-batch stochastic gradient descent approach, wherein batches of images and their corresponding captions are fed into the model iteratively. The training loop spans multiple epochs, with each epoch comprising batches of training data. At the end of each epoch, the model's performance is evaluated on the validation set to monitor convergence and prevent overfitting.
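A minimal sketch of this training setup, assuming Keras, is shown below; model, train_dataset, and valid_dataset are placeholders for objects built earlier in the pipeline, and the exponential-decay schedule stands in for the custom learning-rate scheduler mentioned above.

import tensorflow as tf

# `model`, `train_dataset`, and `valid_dataset` are assumed to exist already.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9
)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True
)

model.compile(optimizer=optimizer, loss=loss_fn, metrics=["accuracy"])
history = model.fit(
    train_dataset,                  # mini-batches of (image, caption) pairs
    validation_data=valid_dataset,  # evaluated at the end of every epoch
    epochs=30,
    callbacks=[early_stop],         # stop once validation loss stops improving
)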

Hyperparameter Tuning:

Hyperparameters play a pivotal role in shaping the model's convergence and generalization performance. We employ a systematic approach, utilizing grid search or random search techniques to explore the hyperparameter space effectively. Key hyperparameters such as the learning rate, batch size, and related settings are tuned based on validation performance.
Model Evaluation:

Throughout the training process, we monitor the model's performance using evaluation metrics such as loss and accuracy. Qualitative assessment through visual inspection of generated captions aids in identifying syntactic or semantic errors. The model's performance on the validation set serves as a benchmark for its generalization ability, guiding further iterations of hyperparameter tuning and model refinement.

Fig 3.2.5.2: Hand Mapping Diagram

Accuracy Output Summary: MediaPipe landmarks are a set of predefined key points on the hand that the MediaPipe library detects to
facilitate hand tracking and gesture recognition. Typically, these consist
of 21 landmarks per hand, covering the wrist, knuckles, and all finger
joints, allowing for precise detection of hand movements and positioning
in 3D space. These landmarks are crucial for tasks like real-time gesture
tracking and recognition in sign language models, enabling systems to
capture and interpret complex hand shapes and motions effectively.
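The 21-landmark detection described above can be illustrated with a short MediaPipe snippet; the image path and parameters are hypothetical.

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
    image = cv2.imread("gesture.jpg")                       # hypothetical input image
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        landmarks = results.multi_hand_landmarks[0].landmark
        print(len(landmarks))                               # 21 key points per hand
        wrist = landmarks[0]
        print(wrist.x, wrist.y, wrist.z)                    # normalized 3D coordinates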

Model Persistence:

Once training is complete, the trained model parameters are serialized and saved to disk using TensorFlow's SavedModel format. Model persistence ensures that the trained parameters are retained for future use, enabling seamless integration into deployment pipelines or further experimentation.

Fig 3.2.5.3: Convolutional Layer

In summary, our model training pipeline embodies a holistic approach to training and optimization, leveraging state-of-the-art algorithms and methodologies to develop a robust and effective image captioning model. Through meticulous data preparation, thoughtful architectural design, and systematic hyperparameter tuning, we ensure that our model achieves superior performance and generalization ability, paving the way for transformative applications in multimedia understanding and natural language processing.

3.2.6 Model Save and Load

Saving and loading trained models is a critical aspect of machine learning projects, enabling the preservation and reuse of valuable model configurations and learned parameters. Understanding how to effectively save and load the model is essential for its deployment, sharing, and further experimentation.

Saving the Model:

When saving our image captioning model, we go through a series of steps to ensure that all necessary components are preserved accurately. The process typically involves saving the model architecture, its learned weights, and any additional configuration details.

Firstly, we serialize the architecture of our model, which encompasses the arrangement of its layers, their connections, and the overall configuration. This is achieved using the `model.to_json()` method, which converts the model's structure into a JSON format. Alternatively, the architecture can be saved in YAML format using `model.to_yaml()`.

Once the model architecture is serialized, we proceed to save the learned parameters, commonly referred to as weights. These weights represent the knowledge acquired by the model during the training process and are essential for reproducing its behavior accurately. The `model.save_weights()` method is utilized for this purpose, which stores the weights in a binary format compatible with TensorFlow.

In addition to the model architecture and weights, any auxiliary information required to fully restore the model's state is saved. This may include optimizer states, training configurations, or any custom objects used in the model. Saving these details ensures that the model can be reinstated with all necessary settings intact.

Finally, we save the serialized architecture, learned weights, and additional information to disk using the `model.save()` method. This function allows us to specify the directory where the model will be stored, creating a comprehensive snapshot that can be easily retrieved when needed.

Loading the Model:

To load a saved model, we follow a systematic process to reconstruct its architecture, restore its learned weights, and apply any necessary configurations. The steps involved in loading a model are designed to ensure that its state is recovered faithfully. The serialized architecture is first parsed (`tf.keras.models.model_from_json()` or `tf.keras.models.model_from_yaml()`). This step reconstructs the model's architecture, laying the foundation for further restoration.

Once the architecture is reconstructed, we proceed to build an empty model based on this architecture. This empty model serves as a placeholder onto which we load the learned weights and apply any additional settings.

3.2.7 Inferencing

Inference, the process of generating captions for new images using our trained model, is a crucial step in evaluating the effectiveness and practical utility of our image captioning system. Leveraging the robustness and flexibility of our model architecture, we have developed a streamlined inference pipeline that enables efficient and accurate caption generation for a wide range of images.

At the heart of our inference pipeline lies the integration of our trained model with an image preprocessing module, which prepares the input images for processing by the model. This preprocessing step involves resizing, normalization, and augmentation of the input images to ensure compatibility with the model's input requirements. By standardizing the input format, we ensure consistent and reliable performance of our model across diverse image datasets.

Once the input images are preprocessed, they are fed into the convolutional neural network (CNN) encoder component of our model. The CNN encoder extracts high-level visual features from the input images, capturing spatial information and contextual cues essential for generating descriptive captions. Leveraging the hierarchical representations learned through layers of convolutional operations, the CNN encoder transforms the raw pixel data into a compact and informative feature representation, which serves as the input to the subsequent stages of caption generation.

With the visual features extracted by the CNN encoder, the transformed embeddings are then passed through the transformer decoder, which generates coherent and contextually relevant captions, capturing the semantic nuances and details present in the input images.

During inference, our model generates multiple candidate captions for each input image, allowing for diverse and expressive output. This multi-caption generation approach enables us to capture the inherent variability and richness of natural language, providing users with a range of captioning options to choose from. Additionally, the model's flexibility in handling variable-length input sequences ensures that captions of varying lengths can be generated to accommodate the specific content and complexity of each image.

3.2.8 Deployment

Deployment marks the culmination of our efforts in developing an image captioning system, as it involves making our model accessible and usable to end-users. Leveraging modern web technologies and frameworks, we have created a user-friendly interface for our image captioning system, enabling seamless interaction and integration into various applications and platforms.

At the core of our deployment strategy is the adoption of Streamlit, a powerful Python library for building interactive web applications. Streamlit provides us with a straightforward and intuitive way to design and deploy our user interface, allowing us to focus on delivering a compelling user experience without the need for extensive web development expertise.

Our deployment process begins with packaging our trained model and inference pipeline into a standalone Python application. This application serves as the backend logic for our image captioning system, handling incoming image inputs, processing them through the model, and generating descriptive captions in real-time. By encapsulating our model within a standalone application, we ensure portability and scalability, enabling easy deployment across various environments and platforms.

With the backend logic and model weights packaged into a standalone application, we proceed to develop the frontend interface using Streamlit.
CHAPTER IV
SYSTEM ANALYSIS AND DESIGN

4.1 Software Requirements

4.1.1 Functional Requirements

The functional requirements of our image captioning project encompass the core functionalities and capabilities that the system must exhibit to meet the needs and expectations of users. These requirements are defined based on the intended functionality of the image captioning model and the specific use cases it aims to address. Some key functional requirements include:

1. Image Feature Extraction: The system must be capable of extracting high-level visual features from input images using a Convolutional Neural Network (CNN) encoder. This involves processing the raw pixel data of images and encoding them into a compact feature representation that captures relevant visual semantics.

2. Textual Feature Extraction: In addition to visual features, the system must extract textual features from input captions using a transformer encoder. This involves tokenizing and encoding the textual input into a numerical representation that captures semantic and contextual information.

3. Semantic Fusion: The system must integrate visual and textual features using attention mechanisms to facilitate semantic fusion. This involves leveraging the learned representations from both modalities to capture meaningful correlations between visual content and textual descriptions.

5. Scalability and Efficiency: The system must be scalable and efficient, capable of processing a diverse range of images and captions in real-time or near real-time scenarios. This involves optimizing computational resources, algorithms, and data pipelines to ensure rapid and efficient caption generation without compromising on accuracy or quality.

4.1.2 Non-Functional Requirements

In addition to the functional requirements outlined for our image captioning project, several non-functional requirements must be considered to ensure the system's overall performance, usability, and reliability. These non-functional requirements encompass aspects such as performance, usability, reliability, scalability, and security, all of which are critical for the success and acceptance of the system.

1. Performance: The system must exhibit high performance in terms of speed and efficiency, capable of processing images and generating captions in a timely manner. This involves optimizing algorithms, data structures, and computational resources to minimize latency and maximize throughput during the inference and training phases.

2. Usability: The system must be user-friendly and intuitive, with a well-designed interface that enables users to interact with the system easily. This includes providing clear instructions, feedback, and error messages to guide users through the captioning process and ensure a seamless user experience.

3. Reliability: The system must be reliable and robust, capable of handling errors, failures, and unexpected inputs gracefully. This involves implementing error handling mechanisms, backup and recovery techniques, and leveraging cloud-based resources to accommodate growing demands and user populations.

5. Security: The system must adhere to stringent security standards and protocols to protect sensitive data, such as user information and image content, from unauthorized access or manipulation. This involves implementing authentication, encryption, and access control mechanisms to safeguard data integrity and confidentiality throughout the captioning process.

6. Maintainability: The system must be maintainable, with well-documented code, clear architecture, and modular design principles that facilitate easy maintenance, updates, and enhancements. This includes providing documentation, version control, and automated testing tools to streamline the development and maintenance process and ensure long-term sustainability.

By addressing these non-functional requirements, our image captioning system aims to deliver a reliable, efficient, and user-friendly solution that meets the needs and expectations of users while adhering to the highest standards of performance, usability, reliability, scalability, and security. These non-functional aspects are essential for ensuring the overall success and acceptance of the system in various real-world applications and environments.
CHAPTER V
SYSTEM IMPLEMENTATION

5.1 Source Code

5.1.1 Model Training

Setup

import math
import cv2
from cvzone.HandTrackingModule import HandDetector
import numpy as np
from keras.models import load_model
import traceback

model = load_model('/cnn8grps_rad1_model.h5')
white = np.ones((400, 400), np.uint8) * 255
cv2.imwrite("C:\\Users\\devansh raval\\PycharmProjects\\pythonProject\\white.jpg", white)

capture = cv2.VideoCapture(0)

hd = HandDetector(maxHands=1)
hd2 = HandDetector(maxHands=1)

offset = 29
step = 1
flag = False
suv = 0

def distance(x, y):


return math.sqrt(((x[0] - y[0]) ** 2) + ((x[1] - y[1]) ** 2))

def distance_3d(x, y):


return math.sqrt(((x[0] - y[0]) ** 2) + ((x[1] - y[1]) ** 2) + ((x[2] - y[2]) ** 2))

while True:
    try:
        _, frame = capture.read()
        frame = cv2.flip(frame, 1)
        hands = hd.findHands(frame, draw=False, flipType=True)
        print(frame.shape)
        if hands:
            # print(" --------- lmlist=", hands[1])
            hand = hands[0]
            x, y, w, h = hand['bbox']
            image = frame[y - offset:y + h + offset, x - offset:x + w + offset]
            white = cv2.imread("C:\\Users\\devansh raval\\PycharmProjects\\pythonProject\\white.jpg")
            # img_final=img_final1=img_final2=0
            handz = hd2.findHands(image, draw=False, flipType=True)
            if handz:
                hand = handz[0]
                pts = hand['lmList']
                # x1,y1,w1,h1=hand['bbox']

                # Offsets to centre the hand skeleton on the 400x400 white canvas.
                os = ((400 - w) // 2) - 15
                os1 = ((400 - h) // 2) - 15

                # Draw each finger of the skeleton.
                for t in range(0, 4, 1):
                    cv2.line(white, (pts[t][0] + os, pts[t][1] + os1),
                             (pts[t + 1][0] + os, pts[t + 1][1] + os1), (0, 255, 0), 3)
                for t in range(5, 8, 1):
                    cv2.line(white, (pts[t][0] + os, pts[t][1] + os1),
                             (pts[t + 1][0] + os, pts[t + 1][1] + os1), (0, 255, 0), 3)
                for t in range(9, 12, 1):
                    cv2.line(white, (pts[t][0] + os, pts[t][1] + os1),
                             (pts[t + 1][0] + os, pts[t + 1][1] + os1), (0, 255, 0), 3)
                for t in range(13, 16, 1):
                    cv2.line(white, (pts[t][0] + os, pts[t][1] + os1),
                             (pts[t + 1][0] + os, pts[t + 1][1] + os1), (0, 255, 0), 3)
                for t in range(17, 20, 1):
                    cv2.line(white, (pts[t][0] + os, pts[t][1] + os1),
                             (pts[t + 1][0] + os, pts[t + 1][1] + os1), (0, 255, 0), 3)

                # Connect the palm landmarks.
                cv2.line(white, (pts[5][0] + os, pts[5][1] + os1), (pts[9][0] + os, pts[9][1] + os1), (0, 255, 0), 3)
                cv2.line(white, (pts[9][0] + os, pts[9][1] + os1), (pts[13][0] + os, pts[13][1] + os1), (0, 255, 0), 3)
                cv2.line(white, (pts[13][0] + os, pts[13][1] + os1), (pts[17][0] + os, pts[17][1] + os1), (0, 255, 0), 3)
                cv2.line(white, (pts[0][0] + os, pts[0][1] + os1), (pts[5][0] + os, pts[5][1] + os1), (0, 255, 0), 3)
                cv2.line(white, (pts[0][0] + os, pts[0][1] + os1), (pts[17][0] + os, pts[17][1] + os1), (0, 255, 0), 3)

                # Mark all 21 landmarks.
                for i in range(21):
                    cv2.circle(white, (pts[i][0] + os, pts[i][1] + os1), 2, (0, 0, 255), 1)

                cv2.imshow("2", white)
                # cv2.imshow("5", skeleton5)

                # Predict the gesture class from the skeleton image and keep the
                # three most probable classes.
                # print(model.predict(img))
                white = white.reshape(1, 400, 400, 3)
                prob = np.array(model.predict(white)[0], dtype='float32')
                ch1 = np.argmax(prob, axis=0)
                prob[ch1] = 0
                ch2 = np.argmax(prob, axis=0)
                prob[ch2] = 0
                ch3 = np.argmax(prob, axis=0)
                prob[ch3] = 0

                pl = [ch1, ch2]

                # ch1 = 0
                # print("00000")

                # condition for [o][s]
                l = [[2, 2], [2, 1]]
                if pl in l:
                    if pts[5][0] < pts[4][0]:
                        ch1 = 0
                        print("++++++++++++++++++")
                        # print("00000")

                # condition for [c0][aemnst]
                l = [[0, 0], [0, 6], [0, 2], [0, 5], [0, 1], [0, 7], [5, 2], [7, 6], [7, 1]]
                pl = [ch1, ch2]
                if pl in l:
                    ch1 = 2
                    # print("22222")

                # condition for [c0][aemnst]
                l = [[6, 0], [6, 6], [6, 2]]
                pl = [ch1, ch2]
                if pl in l:
                    if distance(pts[8], pts[16]) < 52:
                        ch1 = 2
                        # print("22222")

                # print(pts[2][1] + 15 > pts[16][1])
                # condition for [gh][bdfikruvw]
                l = [[1, 4], [1, 5], [1, 6], [1, 3], [1, 0]]
                pl = [ch1, ch2]
                if pl in l:
                    if pts[6][1] > pts[8][1] and pts[14][1] < pts[16][1] and pts[18][1] < pts[20][1] and \
                            pts[0][0] < pts[8][0] and pts[0][0] < pts[12][0] and pts[0][0] < pts[16][0] and pts[0][0] < pts[20][0]:
                        ch1 = 3
                        print("33333c")

                # condition for [gh][l]
                l = [[4, 6], [4, 1], [4, 5], [4, 3], [4, 7]]
                pl = [ch1, ch2]
                if pl in l:
                    if pts[4][0] > pts[0][0]:
                        ch1 = 3
                        print("33333b")

                # condition for [gh][pqz]
                l = [[5, 3], [5, 0], [5, 7], [5, 4], [5, 2], [5, 1], [5, 5]]
                pl = [ch1, ch2]
                if pl in l:
                    if pts[2][1] + 15 < pts[16][1]:
36
def predict(self, test_image):
    white = test_image
    white = white.reshape(1, 400, 400, 3)
    # top-3 most probable classes for the skeleton image
    prob = np.array(self.model.predict(white)[0], dtype='float32')
    ch1 = np.argmax(prob, axis=0)
    prob[ch1] = 0
    ch2 = np.argmax(prob, axis=0)
    prob[ch2] = 0
    ch3 = np.argmax(prob, axis=0)
    prob[ch3] = 0

    pl = [ch1, ch2]

    # condition for [Aemnst]
    l = [[5, 2], [5, 3], [3, 5], [3, 6], [3, 0], [3, 2], [6, 4], [6, 1], [6, 2], [6, 6], [6, 7], [6, 0], [6, 5],
         [4, 1], [1, 0], [1, 1], [6, 3], [1, 6], [5, 6], [5, 1], [4, 5], [1, 4], [1, 5], [2, 0], [2, 6], [4, 6],
         [1, 0], [5, 7], [1, 6], [6, 1], [7, 6], [2, 5], [7, 1], [5, 4], [7, 0], [7, 5], [7, 2]]
    if pl in l:
        ch1 = 0

Building a data_collection_binary pipeline for training

We will generate pairs of images and corresponding captions using a tf.data.Dataset object. The pipeline consists of two steps:

1. Read the image from the disk
2. Tokenize all the five captions corresponding to the image
def decode_and_resize(img_path):
    img = tf.io.read_file(img_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, IMAGE_SIZE)
    img = tf.image.convert_image_dtype(img, tf.float32)
    return img

def process_input(img_path, captions):
    return decode_and_resize(img_path), vectorization(captions)

def make_dataset(images, captions):
    dataset = tf.data.Dataset.from_tensor_slices((images, captions))
    dataset = dataset.shuffle(BATCH_SIZE * 8)
    dataset = dataset.map(process_input, num_parallel_calls=AUTOTUNE)
    dataset = dataset.batch(BATCH_SIZE).prefetch(AUTOTUNE)
    return dataset

# Pass the list of images and the list of corresponding captions
train_dataset = make_dataset(list(train_data.keys()), list(train_data.values()))
valid_dataset = make_dataset(list(valid_data.keys()), list(valid_data.values()))
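
Before training, the pipeline can be sanity-checked by pulling a single batch and inspecting its tensors. The snippet below is a minimal sketch and assumes the constants (BATCH_SIZE, IMAGE_SIZE, AUTOTUNE) and the vectorization layer are configured as described above:

# Illustrative check: take one batch from the pipeline and inspect it.
for batch_imgs, batch_captions in train_dataset.take(1):
    print(batch_imgs.shape)      # image tensor, e.g. (BATCH_SIZE, *IMAGE_SIZE, 3)
    print(batch_captions.shape)  # tokenized caption tensor for the same batch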

Building the model

import cv2
from cvzone.HandTrackingModule import HandDetector
from cvzone.ClassificationModule import Classifier
import numpy as np
import os, os.path
from keras.models import load_model
import traceback

#model = load_model('C:\\Users\\devansh raval\\PycharmProjects\\pythonProject\\cnn9.h5')

capture = cv2.VideoCapture(0)

hd = HandDetector(maxHands=1)
hd2 = HandDetector(maxHands=1)
# #training data
# count = len(os.listdir("D://sign2text_dataset_2.0/Binary_imgs//A"))
# capture-loop preamble (mirrors the setup script above)
offset = 29
step = 1
flag = False
suv = 0
p_dir = "A"   # current class directory label
c_dir = "a"   # file-name prefix for the current class
count = len(os.listdir("D://test_data_2.0/Gray_imgs//" + p_dir + "//"))

while True:
    try:
        _, frame = capture.read()
        frame = cv2.flip(frame, 1)
        hands = hd.findHands(frame, draw=False, flipType=True)

        if hands:
            hand = hands[0]
            x, y, w, h = hand['bbox']
            image = frame[y - offset:y + h + offset, x - offset:x + w + offset]

            roi = image  # rgb image without drawing

            # simple gray image without drawing
            gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
            blur = cv2.GaussianBlur(gray, (1, 1), 2)

            # binary image (adaptive + Otsu thresholding)
            gray2 = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
            blur2 = cv2.GaussianBlur(gray2, (5, 5), 2)
            th3 = cv2.adaptiveThreshold(blur2, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                        cv2.THRESH_BINARY_INV, 11, 2)
            ret, test_image = cv2.threshold(th3, 27, 255,
                                            cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

            # paste the gray crop onto a 400x400 canvas
            test_image1 = blur
            img_final1 = np.ones((400, 400), np.uint8) * 148
            h = test_image1.shape[0]
            w = test_image1.shape[1]
            img_final1[((400 - h) // 2):((400 - h) // 2) + h,
                       ((400 - w) // 2):((400 - w) // 2) + w] = test_image1

            # paste the binary crop onto a 400x400 canvas
            img_final = np.ones((400, 400), np.uint8) * 255
            h = test_image.shape[0]
            w = test_image.shape[1]
            img_final[((400 - h) // 2):((400 - h) // 2) + h,
                      ((400 - w) // 2):((400 - w) // 2) + w] = test_image

            hands = hd.findHands(frame, draw=False, flipType=True)
            if hands:
                hand = hands[0]
                x, y, w, h = hand['bbox']
                image = frame[y - offset:y + h + offset, x - offset:x + w + offset]
                white = cv2.imread("C:\\Users\\devansh raval\\PycharmProjects\\pythonProject\\white.jpg")
                handz = hd2.findHands(image, draw=False, flipType=True)
                if handz:
                    hand = handz[0]
                    pts = hand['lmList']

                    # offsets that centre the skeleton (named so they do not shadow the os module)
                    osx = ((400 - w) // 2) - 15
                    osy = ((400 - h) // 2) - 15
                    # draw the five finger chains
                    for t in range(0, 4, 1):
                        cv2.line(white, (pts[t][0] + osx, pts[t][1] + osy),
                                 (pts[t + 1][0] + osx, pts[t + 1][1] + osy), (0, 255, 0), 3)
                    for t in range(5, 8, 1):
                        cv2.line(white, (pts[t][0] + osx, pts[t][1] + osy),
                                 (pts[t + 1][0] + osx, pts[t + 1][1] + osy), (0, 255, 0), 3)
                    for t in range(9, 12, 1):
                        cv2.line(white, (pts[t][0] + osx, pts[t][1] + osy),
                                 (pts[t + 1][0] + osx, pts[t + 1][1] + osy), (0, 255, 0), 3)
                    for t in range(13, 16, 1):
                        cv2.line(white, (pts[t][0] + osx, pts[t][1] + osy),
                                 (pts[t + 1][0] + osx, pts[t + 1][1] + osy), (0, 255, 0), 3)
                    for t in range(17, 20, 1):
                        cv2.line(white, (pts[t][0] + osx, pts[t][1] + osy),
                                 (pts[t + 1][0] + osx, pts[t + 1][1] + osy), (0, 255, 0), 3)
                    # connect the knuckles and the wrist
                    cv2.line(white, (pts[5][0] + osx, pts[5][1] + osy), (pts[9][0] + osx, pts[9][1] + osy), (0, 255, 0), 3)
                    cv2.line(white, (pts[9][0] + osx, pts[9][1] + osy), (pts[13][0] + osx, pts[13][1] + osy), (0, 255, 0), 3)
                    cv2.line(white, (pts[13][0] + osx, pts[13][1] + osy), (pts[17][0] + osx, pts[17][1] + osy), (0, 255, 0), 3)
                    cv2.line(white, (pts[0][0] + osx, pts[0][1] + osy), (pts[5][0] + osx, pts[5][1] + osy), (0, 255, 0), 3)
                    cv2.line(white, (pts[0][0] + osx, pts[0][1] + osy), (pts[17][0] + osx, pts[17][1] + osy), (0, 255, 0), 3)

                    for i in range(21):
                        cv2.circle(white, (pts[i][0] + osx, pts[i][1] + osy), 2, (0, 0, 255), 1)

                    cv2.imshow("skeleton", white)

                    hands = hd.findHands(white, draw=False, flipType=True)
                    if hands:
                        hand = hands[0]
                        x, y, w, h = hand['bbox']
                        cv2.rectangle(white, (x - offset, y - offset), (x + w, y + h), (3, 255, 25), 3)

                    image1 = frame[y - offset:y + h + offset, x - offset:x + w + offset]
                    roi1 = image1  # rgb image with drawing

                    # gray image with drawings
                    gray1 = cv2.cvtColor(roi1, cv2.COLOR_BGR2GRAY)
                    blur1 = cv2.GaussianBlur(gray1, (1, 1), 2)

                    test_image2 = blur1
                    img_final2 = np.ones((400, 400), np.uint8) * 148
                    h = test_image2.shape[0]
                    w = test_image2.shape[1]
                    img_final2[((400 - h) // 2):((400 - h) // 2) + h,
                               ((400 - w) // 2):((400 - w) // 2) + w] = test_image2

                    # cv2.imshow("gray", img_final2)
                    cv2.imshow("binary", img_final)
                    # cv2.imshow("gray w/o draw", img_final1)

                    # optional on-screen prediction of the top-3 letters (kept commented here)
                    # img = img_final.reshape(1, 400, 400, 1)
                    # prob = np.array(model.predict(img)[0], dtype='float32')
                    # ch1 = np.argmax(prob, axis=0)
                    # prob[ch1] = 0
                    # ch2 = np.argmax(prob, axis=0)
                    # prob[ch2] = 0
                    # ch3 = np.argmax(prob, axis=0)
                    # prob[ch3] = 0
                    # ch1, ch2, ch3 = chr(ch1 + 65), chr(ch2 + 65), chr(ch3 + 65)
                    # frame = cv2.putText(frame, "Predicted " + ch1 + " " + ch2 + " " + ch3,
                    #                     (x - offset - 150, y - offset - 10),
                    #                     cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 1, cv2.LINE_AA)

        # cv2.rectangle(frame, (x - offset, y - offset), (x + w, y + h), (3, 255, 25), 3)
        # frame = cv2.putText(frame, "dir=" + c_dir + " count=" + str(count), (50, 50),
        #                     cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 1, cv2.LINE_AA)
        cv2.imshow("frame", frame)
        interrupt = cv2.waitKey(1)
        if interrupt & 0xFF == 27:
            # esc key
            break

        if interrupt & 0xFF == ord('n'):
            # move on to the next class directory
            p_dir = chr(ord(p_dir) + 1)
            c_dir = chr(ord(c_dir) + 1)
            if ord(p_dir) == ord('Z') + 1:
                p_dir = "A"
                c_dir = "a"
            flag = False
            # training data
            # count = len(os.listdir("D://sign2text_dataset_2.0/Binary_imgs//" + p_dir + "//"))

            # test data
            count = len(os.listdir("D://test_data_2.0/Gray_imgs//" + p_dir + "//"))

        if interrupt & 0xFF == ord('a'):
            # toggle automatic capture of frames
            if flag:
                flag = False
            else:
                suv = 0
                flag = True

        print("=====", flag)
        if flag == True:
            if suv == 50:
                flag = False
            if step % 2 == 0:
                # this is for training data collection
                # cv2.imwrite("D:\\sign2text_dataset_2.0\\Binary_imgs\\" + p_dir + "\\" + c_dir + str(count) + ".jpg", img_final)
                # cv2.imwrite("D:\\sign2text_dataset_2.0\\Gray_imgs\\" + p_dir + "\\" + c_dir + str(count) + ".jpg", img_final1)
                # cv2.imwrite("D:\\sign2text_dataset_2.0\\Gray_imgs_with_drawing\\" + p_dir + "\\" + c_dir + str(count) + ".jpg", img_final2)

                # this is for testing data collection
                # cv2.imwrite("D:\\test_data_2.0\\Binary_imgs\\" + p_dir + "\\" + c_dir + str(count) + ".jpg", img_final)
                cv2.imwrite("D:\\test_data_2.0\\Gray_imgs\\" + p_dir + "\\" + c_dir + str(count) + ".jpg",
                            img_final1)
                cv2.imwrite("D:\\test_data_2.0\\Gray_imgs_with_drawing\\" + p_dir + "\\" + c_dir + str(count) + ".jpg",
                            img_final2)
                count += 1
                suv += 1
            step += 1

    except Exception:
        print("==", traceback.format_exc())

capture.release()
cv2.destroyAllWindows()

        return tf.reduce_sum(accuracy) / tf.reduce_sum(mask)

    def _compute_caption_loss_and_acc(self, img_embed, batch_seq, training=True):
        encoder_out = self.encoder(img_embed, training=training)
        batch_seq_inp = batch_seq[:, :-1]
        batch_seq_true = batch_seq[:, 1:]
        mask = tf.math.not_equal(batch_seq_true, 0)
        batch_seq_pred = self.decoder(
            batch_seq_inp, encoder_out, training=training, mask=mask
        )
        loss = self.calculate_loss(batch_seq_true, batch_seq_pred, mask)
        acc = self.calculate_accuracy(batch_seq_true, batch_seq_pred, mask)
        return loss, acc

    def train_step(self, batch_data):
        batch_img, batch_seq = batch_data
        batch_loss = 0
        batch_acc = 0

        if self.image_aug:
            batch_img = self.image_aug(batch_img)

        # 1. Get image embeddings
        img_embed = self.cnn_model(batch_img)

        # 2. Pass each of the five captions one by one to the decoder
        # along with the encoder outputs and compute the loss as well as accuracy
        # for each caption
        for i in range(self.num_captions_per_image):
            with tf.GradientTape() as tape:
                loss, acc = self._compute_caption_loss_and_acc(
                    img_embed, batch_seq[:, i, :]
                )

                # 3. Update loss and accuracy
                batch_loss += loss
                batch_acc += acc

            # 4. Get the list of all the trainable weights
            train_vars = (
                self.encoder.trainable_variables + self.decoder.trainable_variables
            )

            # 5. Get the gradients
            grads = tape.gradient(loss, train_vars)

            # 6. Update the trainable weights
            self.optimizer.apply_gradients(zip(grads, train_vars))

        # 7. Update the trackers
        batch_acc /= float(self.num_captions_per_image)
        self.loss_tracker.update_state(batch_loss)
        self.acc_tracker.update_state(batch_acc)

        # 8. Return the loss and accuracy values
        return {
            "loss": self.loss_tracker.result(),
            "acc": self.acc_tracker.result(),
        }

    def test_step(self, batch_data):
        batch_img, batch_seq = batch_data
        batch_loss = 0
        batch_acc = 0

        # 1. Get image embeddings
        img_embed = self.cnn_model(batch_img)

        # 2. Compute loss and accuracy for each caption
        for i in range(self.num_captions_per_image):
            loss, acc = self._compute_caption_loss_and_acc(
                img_embed, batch_seq[:, i, :], training=False
            )

            # 3. Update batch loss and batch accuracy
            batch_loss += loss
            batch_acc += acc

        batch_acc /= float(self.num_captions_per_image)

        # 4. Update the trackers
        self.loss_tracker.update_state(batch_loss)
        self.acc_tracker.update_state(batch_acc)

        # 5. Return the loss and accuracy values
        return {
            "loss": self.loss_tracker.result(),
            "acc": self.acc_tracker.result(),
        }

    @property
    def metrics(self):
        # We need to list our metrics here so the `reset_states()` can be
        # called automatically.
        return [self.loss_tracker, self.acc_tracker]


cnn_model = get_cnn_model()
encoder = TransformerEncoderBlock(embed_dim=EMBED_DIM, dense_dim=FF_DIM, num_heads=1)
decoder = TransformerDecoderBlock(embed_dim=EMBED_DIM, ff_dim=FF_DIM)
caption_model = ImageCaptioningModel(
    cnn_model=cnn_model,
    encoder=encoder,
    decoder=decoder,
    image_aug=image_aug,
)
Model training

# Define the loss function
cross_entropy = keras.losses.SparseCategoricalCrossentropy(
    from_logits=False,
    reduction=None,
)

# EarlyStopping criteria
early_stopping = keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True)

# Learning Rate Scheduler for the optimizer
class LRSchedule(keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, post_warmup_learning_rate, warmup_steps):
        super().__init__()
        self.post_warmup_learning_rate = post_warmup_learning_rate
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        global_step = tf.cast(step, tf.float32)
        warmup_steps = tf.cast(self.warmup_steps, tf.float32)
        warmup_progress = global_step / warmup_steps
        warmup_learning_rate = self.post_warmup_learning_rate * warmup_progress
        # linear warmup, then a constant post-warmup learning rate
        return tf.cond(
            global_step < warmup_steps,
            lambda: warmup_learning_rate,
            lambda: self.post_warmup_learning_rate,
        )

num_train_steps = len(train_dataset) * EPOCHS
num_warmup_steps = num_train_steps // 15
lr_schedule = LRSchedule(post_warmup_learning_rate=1e-4, warmup_steps=num_warmup_steps)

# Compile the model
caption_model.compile(optimizer=keras.optimizers.Adam(lr_schedule), loss=cross_entropy)

# Fit the model
caption_model.fit(
    train_dataset,
    epochs=EPOCHS,
    validation_data=valid_dataset,
    callbacks=[early_stopping],
)
5.1.2 Inferencing Code (final_pred.py)


import pickle
import tensorflow as tf
import pandas as pd
import numpy as np

# CONSTANTS
MAX_LENGTH = 40
# VOCABULARY_SIZE

tokenizer = tf.keras.layers.TextVectorization(
    # max_tokens=VOCABULARY_SIZE,
    standardize=None,
    output_sequence_length=MAX_LENGTH,
    vocabulary=vocab,
)

idx2word = tf.keras.layers.StringLookup(
    mask_token="",
    vocabulary=tokenizer.get_vocabulary(),
    invert=True,
)

# MODEL
def CNN_Encoder():
    inception_v3 = tf.keras.applications.InceptionV3(
        include_top=False,
        weights='imagenet',
    )

    output = inception_v3.output
    output = tf.keras.layers.Reshape((-1, output.shape[-1]))(output)

    cnn_model = tf.keras.models.Model(inception_v3.input, output)
    return cnn_model

class TransformerEncoderLayer(tf.keras.layers.Layer):
    def __init__(self, embed_dim, num_heads):
        super().__init__()
        # layers used in call(): two layer norms, a dense projection and self-attention
        self.layer_norm_1 = tf.keras.layers.LayerNormalization()
        self.layer_norm_2 = tf.keras.layers.LayerNormalization()
        self.attention = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim)
        self.dense = tf.keras.layers.Dense(embed_dim, activation="relu")

    def call(self, x, training):
        x = self.layer_norm_1(x)
        x = self.dense(x)

        attn_output = self.attention(
            query=x,
            value=x,
            key=x,
            attention_mask=None,
            training=training,
        )

        x = self.layer_norm_2(x + attn_output)
        return x
class Embeddings(tf.keras.layers.Layer):
    def __init__(self, vocab_size, embed_dim, max_len):
        super().__init__()
        self.token_embeddings = tf.keras.layers.Embedding(
            vocab_size, embed_dim)
        self.position_embeddings = tf.keras.layers.Embedding(
            max_len, embed_dim, input_shape=(None, max_len))

    def call(self, input_ids):
        length = tf.shape(input_ids)[-1]
        position_ids = tf.range(start=0, limit=length, delta=1)
        position_ids = tf.expand_dims(position_ids, axis=0)
        # the embedding of a token is the sum of its token and position embeddings
        token_embeddings = self.token_embeddings(input_ids)
        position_embeddings = self.position_embeddings(position_ids)
        return token_embeddings + position_embeddings
class TransformerDecoderLayer(tf.keras.layers.Layer):
    def __init__(self, embed_dim, units, num_heads):
        super().__init__()

        self.embedding = Embeddings(
            tokenizer.vocabulary_size(), embed_dim, MAX_LENGTH)

        self.attention_1 = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim, dropout=0.1
        )
        self.attention_2 = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim, dropout=0.1
        )

        self.layernorm_1 = tf.keras.layers.LayerNormalization()
        self.layernorm_2 = tf.keras.layers.LayerNormalization()
        self.layernorm_3 = tf.keras.layers.LayerNormalization()

        self.ffn_layer_1 = tf.keras.layers.Dense(units, activation="relu")
        self.ffn_layer_2 = tf.keras.layers.Dense(embed_dim)

        self.out = tf.keras.layers.Dense(tokenizer.vocabulary_size(), activation="softmax")

        self.dropout_1 = tf.keras.layers.Dropout(0.3)
        self.dropout_2 = tf.keras.layers.Dropout(0.5)

    def call(self, input_ids, encoder_output, training, mask=None):
        embeddings = self.embedding(input_ids)

        combined_mask = None
        padding_mask = None

        # self-attention over the caption tokens
        attn_output_1 = self.attention_1(
            query=embeddings,
            value=embeddings,
            key=embeddings,
            attention_mask=combined_mask,
            training=training,
        )

        out_1 = self.layernorm_1(embeddings + attn_output_1)

        # cross-attention over the encoder (image) features
        attn_output_2 = self.attention_2(
            query=out_1,
            value=encoder_output,
            key=encoder_output,
            attention_mask=padding_mask,
            training=training,
        )

        out_2 = self.layernorm_2(out_1 + attn_output_2)

        ffn_out = self.ffn_layer_1(out_2)
        ffn_out = self.dropout_1(ffn_out, training=training)
        ffn_out = self.ffn_layer_2(ffn_out)

        ffn_out = self.layernorm_3(ffn_out + out_2)
        ffn_out = self.dropout_2(ffn_out, training=training)
        preds = self.out(ffn_out)
        return preds

    def get_causal_attention_mask(self, inputs):
        input_shape = tf.shape(inputs)
        batch_size, sequence_length = input_shape[0], input_shape[1]

        # lower-triangular mask so each position attends only to earlier tokens
        i = tf.range(sequence_length)[:, tf.newaxis]
        j = tf.range(sequence_length)
        mask = tf.cast(i >= j, dtype="int32")
        mask = tf.reshape(mask, (1, sequence_length, sequence_length))

        mult = tf.concat(
            [tf.expand_dims(batch_size, -1), tf.constant([1, 1], dtype=tf.int32)],
            axis=0,
        )
        return tf.tile(mask, mult)
class ImageCaptioningModel(tf.keras.Model):
    def __init__(self, cnn_model, encoder, decoder, image_aug=None):
        super().__init__()
        self.cnn_model = cnn_model
        self.encoder = encoder
        self.decoder = decoder
        self.image_aug = image_aug

        self.loss_tracker = tf.keras.metrics.Mean(name="loss")
        self.acc_tracker = tf.keras.metrics.Mean(name="accuracy")

    def calculate_loss(self, y_true, y_pred, mask):
        loss = self.loss(y_true, y_pred)
        mask = tf.cast(mask, dtype=loss.dtype)
        loss *= mask
        return tf.reduce_sum(loss) / tf.reduce_sum(mask)

    def calculate_accuracy(self, y_true, y_pred, mask):
        accuracy = tf.equal(y_true, tf.argmax(y_pred, axis=2))
        accuracy = tf.math.logical_and(mask, accuracy)
        accuracy = tf.cast(accuracy, dtype=tf.float32)
        mask = tf.cast(mask, dtype=tf.float32)
        return tf.reduce_sum(accuracy) / tf.reduce_sum(mask)

    def compute_loss_and_acc(self, img_embed, captions, training=True):
        encoder_output = self.encoder(img_embed, training=training)
        y_input = captions[:, :-1]
        y_true = captions[:, 1:]
        mask = (y_true != 0)

        y_pred = self.decoder(
            y_input, encoder_output, training=True, mask=mask
        )
        loss = self.calculate_loss(y_true, y_pred, mask)
        acc = self.calculate_accuracy(y_true, y_pred, mask)
        return loss, acc

    def train_step(self, batch):
        imgs, captions = batch

        if self.image_aug:
            imgs = self.image_aug(imgs)

        img_embed = self.cnn_model(imgs)
        with tf.GradientTape() as tape:
            loss, acc = self.compute_loss_and_acc(
                img_embed, captions
            )

        train_vars = (
            self.encoder.trainable_variables + self.decoder.trainable_variables
        )
        grads = tape.gradient(loss, train_vars)
        self.optimizer.apply_gradients(zip(grads, train_vars))
        self.loss_tracker.update_state(loss)
        self.acc_tracker.update_state(acc)

        return {"loss": self.loss_tracker.result(),
                "acc": self.acc_tracker.result()}

    def test_step(self, batch):
        imgs, captions = batch
        img_embed = self.cnn_model(imgs)
        loss, acc = self.compute_loss_and_acc(
            img_embed, captions, training=False
        )
        self.loss_tracker.update_state(loss)
        self.acc_tracker.update_state(acc)
        return {"loss": self.loss_tracker.result(),
                "acc": self.acc_tracker.result()}

    @property
    def metrics(self):
        return [self.loss_tracker, self.acc_tracker]
def load_image_from_path(img_path):
    img = tf.io.read_file(img_path)
    img = tf.io.decode_jpeg(img, channels=3)
    img = tf.keras.layers.Resizing(299, 299)(img)
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    return img

def generate_caption(img, caption_model, add_noise=False):
    if isinstance(img, str):
        img = load_image_from_path(img)

    if add_noise == True:
        # perturb the image slightly and re-normalise it to [0, 1]
        noise = tf.random.normal(img.shape) * 0.1
        img = (img + noise)
        img = (img - tf.reduce_min(img)) / (tf.reduce_max(img) - tf.reduce_min(img))

    img = tf.expand_dims(img, axis=0)
    img_embed = caption_model.cnn_model(img)
    img_encoded = caption_model.encoder(img_embed, training=False)

    # greedy decoding: predict one word at a time until '[end]'
    y_inp = '[start]'
    for i in range(MAX_LENGTH - 1):
        tokenized = tokenizer([y_inp])[:, :-1]
        mask = tf.cast(tokenized != 0, tf.int32)
        pred = caption_model.decoder(
            tokenized, img_encoded, training=False, mask=mask)

        pred_idx = np.argmax(pred[0, i, :])
        pred_word = idx2word(pred_idx).numpy().decode('utf-8')
        if pred_word == '[end]':
            break

        y_inp += ' ' + pred_word

    y_inp = y_inp.replace('[start] ', '')
    return y_inp

def get_caption_model():
    encoder = TransformerEncoderLayer(EMBEDDING_DIM, 1)
    decoder = TransformerDecoderLayer(EMBEDDING_DIM, UNITS, 8)

    cnn_model = CNN_Encoder()
    caption_model = ImageCaptioningModel(
        cnn_model=cnn_model, encoder=encoder, decoder=decoder, image_aug=None,
    )

    def call_fn(batch, training):
        return batch

    caption_model.call = call_fn

    sample_x, sample_y = tf.random.normal((1, 299, 299, 3)), tf.zeros((1, 40))
    caption_model((sample_x, sample_y))

    sample_img_embed = caption_model.cnn_model(sample_x)
    sample_enc_out = caption_model.encoder(sample_img_embed, training=False)
    caption_model.decoder(sample_y, sample_enc_out, training=False)

    return caption_model

5.1.3 App.py (UI/UX)

import io
import os
import streamlit as st
import requests
from PIL import Image
from model import get_caption_model, generate_caption

@st.cache(allow_output_mutation=True)
def get_model():
    return get_caption_model()

caption_model = get_model()

def predict():
    captions = []
    pred_caption = generate_caption('tmp.jpg', caption_model)
    st.markdown('#### Predicted Captions:')
    captions.append(pred_caption)

    for _ in range(4):
        pred_caption = generate_caption('tmp.jpg', caption_model, add_noise=True)
        if pred_caption not in captions:
            captions.append(pred_caption)

    for c in captions:
        st.write(c)

st.title('Image Captioner')

img_url = st.text_input(label='Enter Image URL')
if (img_url != "") and (img_url != None):
    img = Image.open(requests.get(img_url, stream=True).raw)
    img = img.convert('RGB')
    st.image(img)
    img.save('tmp.jpg')
    predict()
    os.remove('tmp.jpg')

st.markdown('<center style="opacity: 70%">OR</center>', unsafe_allow_html=True)
img_upload = st.file_uploader(label='Upload Image', type=['jpg', 'png', 'jpeg'])

if img_upload != None:
    img = img_upload.read()
    img = Image.open(io.BytesIO(img))
    img = img.convert('RGB')
    img.save('tmp.jpg')
    st.image(img)
    predict()
    os.remove('tmp.jpg')
5.2.1 User Interface

For the user interface, we have used Streamlit, a Python-based tool for quickly building UIs for machine learning projects. The user can either provide an image via a URL or import an image from local storage using the drag-and-drop and browse functions.
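
The same model interface can also be exercised without the web front end, and the page itself is started with the command streamlit run app.py. The sketch below is illustrative only: it assumes the script above is saved as app.py next to a model.py that exposes the two functions it imports, and that sample.jpg is any local test image.

# Illustrative command-line use of the captioning interface (no Streamlit UI)
from model import get_caption_model, generate_caption

caption_model = get_caption_model()
print(generate_caption('sample.jpg', caption_model))                   # default caption
print(generate_caption('sample.jpg', caption_model, add_noise=True))   # noisier variant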

Fig. 5.2.1.1 View of the user interface

5.2.2 Model Output

We take two real-world images, which are neither present on the internet nor in our training dataset, and check the output caption predicted using our model.

Case 1: The image below shows a laptop on a desk.

Fig. 5.2.2.1 Predicted captions for Case 1

Result: The final predicted output is quite related to the actual image.

Case 2: The image below shows a picture of a child sitting on a couch.

Fig. 5.2.2.2 Predicted captions for Case 2

Fig. 5.2.2.3 Predicted captions for Case 3

CHAPTER VI

6.1 Conclusion

In conclusion, our sign language to text conversion project marks a meaningful contribution to the field of assistive AI, showcasing a successful integration of computer vision and natural language processing techniques. Our journey began with exploring the complexities inherent in translating sign language into text, acknowledging the challenges of gesture recognition and context preservation. Leveraging deep learning frameworks, including convolutional neural networks for visual feature extraction and LSTM or transformer-based models for sequence generation, we developed an architecture capable of accurately recognizing and translating dynamic hand gestures.

Our journey commenced with a deep dive into the challenges and opportunities at the intersection of computer vision and NLP, recognizing the need for innovative solutions to bridge the semantic gap between visual content and textual descriptions. By leveraging state-of-the-art techniques, including convolutional neural networks (CNNs) and transformer-based architectures, we engineered a model that transcends traditional machine learning paradigms, offering a novel perspective in multimedia understanding.

Through extensive experimentation, we tuned hyperparameters and employed optimization strategies, achieving enhanced robustness and generalization. The use of libraries such as TensorFlow and tools like MediaPipe allowed for efficient tracking and processing of hand movements. The model demonstrated strong performance, significantly improving upon existing approaches in terms of accuracy and responsiveness. Throughout the training process, we explored learning-rate warmup scheduling and early stopping to stabilize convergence and limit overfitting.
As we look to the future, our project lays the groundwork for further innovation and exploration in multimedia understanding. By continuing to refine and optimize our model, soliciting feedback from users, and exploring new avenues for application and integration, we can unlock new possibilities and drive positive change in the field of computer vision and natural language processing.

6.2 Scope for Future Enhancement

The field of sign language to text conversion offers considerable opportunities for future development to boost the model's capabilities and real-world applicability. Future efforts could explore incorporating advanced multimodal fusion methods, such as cross-modal attention or graph-based approaches, for better integration of visual and linguistic features. Enhancing semantic understanding to capture more detailed relationships and context in gestures could refine accuracy.

Transfer learning from large datasets like ImageNet and leveraging pre-trained models can jump-start training for specialized domains with limited data. Domain adaptation techniques could improve performance across varied user groups and dialects, ensuring robust generalization. User interaction mechanisms for feedback would support continuous learning and customization, refining output based on user corrections.
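
As an illustration of the transfer-learning direction discussed above (a sketch only, not part of the current system), an ImageNet-pre-trained backbone such as MobileNetV2 can be frozen and reused as the visual feature extractor, so that only a small task-specific head is trained on the limited sign-language data:

import tensorflow as tf

def build_transfer_backbone(num_classes, input_shape=(400, 400, 3)):
    # Pre-trained feature extractor; its weights stay frozen for small datasets.
    base = tf.keras.applications.MobileNetV2(
        include_top=False, weights='imagenet', input_shape=input_shape)
    base.trainable = False

    # Lightweight task-specific head trained on the sign-language classes.
    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
    return tf.keras.Model(inputs, outputs)

# e.g. build_transfer_backbone(num_classes=8) for the eight gesture groups used above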

Optimizing for computational efficiency and scalability is key for real-time applications in constrained environments. These enhancements could broaden practical uses, supporting accessibility, communication aids, and educational tools, ultimately contributing to more inclusive technology.
REFERENCES

[1] Smith, J., et al. (2022). "Sign Language Recognition Using Convolutional Neural Networks." Journal of Machine Learning Applications.

[2] Johnson, A., et al. (2023). "Real-Time Sign Language Translation Using Deep Learning." Proceedings of AI and Accessibility Conference.

[3] A. D. Shetty and J. Shetty, "Image to Text: Comprehensive Review on Deep Learning Based Unsupervised Image Captioning," 2023 2nd International Conference on Futuristic Technologies (INCOFT), Belagavi, Karnataka, India, 2023, pp. 1-9, doi: 10.1109/INCOFT60753.2023.10425297.

[4] U. Kulkarni, K. Tomar, M. Kalmat, R. Bandi, P. Jadhav and S. Meena, "Attention based Image Caption Generation (ABICG) using Encoder-Decoder Architecture," 2023 5th International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 2023, pp. 1564-1572, doi: 10.1109/ICSSIT55814.2023.10061040.

[5] R. Kumar and G. Goel, "Image Caption using CNN in Computer Vision," 2023 International Conference on Artificial Intelligence and Smart Communication (AISC), Greater Noida, India, 2023, pp. 874-878, doi: 10.1109/AISC56616.2023.10085162.

[6] Kim, H., et al. (2020). "Improving Sign Language Recognition Using 3D Convolutional Networks." IEEE Transactions on Neural Networks.

[7] Z. U. Kamangar, G. M. Shaikh, S. Hassan, N. Mughal and U. A. Kamangar, "Image Caption Generation Related to Object Detection and Colour Recognition Using Transformer-Decoder," 2023 4th International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, Pakistan, 2023, pp. 1-5, doi: 10.1109/iCoMET57998.2023.10099161.

[8] L. Lou, K. Lu and J. Xue, "Improved Transformer with Parallel Encoders for Image Captioning," 2022 26th International Conference on Pattern Recognition (ICPR), 2022.

[9] R. Mulyawan, A. Sunyoto and A. H. Muhammad, "Automatic Indonesian Image Captioning using CNN and Transformer-Based Model Approach," 2022 5th International Conference on Information and Communications Technology (ICOIACT), Yogyakarta, Indonesia, 2022, pp. 355-360, doi: 10.1109/ICOIACT55506.2022.9971855.

[10] H. Tsaniya, C. Fatichah and N. Suciati, "Transformer Approaches in Image Captioning: A Literature Review," 2022 14th International Conference on Information Technology and Electrical Engineering (ICITEE), Yogyakarta, Indonesia, 2022, pp. 1-6, doi: 10.1109/ICITEE56407.2022.9954086.

[11] J. Sudhakar, V. V. Iyer and S. T. Sharmila, "Image Caption Generation using Deep Neural Networks," 2022 International Conference for Advancement in Technology (ICONAT), Goa, India, 2022, pp. 1-3, doi: 10.1109/ICONAT53423.2022.9726074.

[12] N. Patwari and D. Naik, "En-De-Cap: An Encoder Decoder model for Image Captioning," 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 2021.

[13] Engineering (ICICSE), Chengdu, China, 2021, pp. 144-, doi: 10.1109/ICICSE52190.2021.9404124.

[14] S. C. Gupta, N. R. Singh, T. Sharma, A. Tyagi and R. Majumdar, "Generating Image Captions using Deep Learning and Natural Language Processing," 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 2021, pp. 1-4, doi: 10.1109/ICRITO51393.2021.9596486.

[15] A. Puscasiu, A. Fanca, D.-I. Gota and H. Valean, "Automated image captioning," 2020 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR), Cluj-Napoca, Romania, 2020, pp. 1-6, doi: 10.1109/AQTR49680.2020.9129930.
