Natural Language Processing-Section
Section 6
Arabic - Image Caption Generator
Main idea
An image caption generator produces a caption for a given image by understanding its content. The challenging part of caption generation is understanding the image and its context well enough to produce an English description, which can then be translated into any other language.
Our Plan
VGG16
VGG16 is a convolutional neural network model used for image recognition. It is distinctive in having only 16 layers with weights, relying on a simple, uniform architecture rather than a large number of hyper-parameters. It is considered one of the best classic vision model architectures.
More Information
About VGG_16
English description to Arabic description
gTTS
gTTS (Google Text-to-Speech) is a Python library and CLI tool that interfaces with the Google Translate text-to-speech API. We will import gTTS from the gtts module and use it to turn text into speech.
Install libraries
We need to install 3 libraries:
!pip install pydotplus
!pip install googletrans
!pip install gTTS
Load the model
model = VGG16()
Then restructure the model to drop the final classification layer, since we only need the extracted features, not the classification:
model = Model(inputs=model.inputs, outputs=model.layers[-2].output)
We can load the image as pixel data and prepare it to be presented to the network.
Keras provides some tools to help with this step.
First, we can use the load_img() function to load the image and resize it to the required
size of 224×224 pixels.
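Keras's load_img relies on Pillow under the hood; as a minimal sketch of the resize step, using an in-memory Pillow image as a stand-in for a real photo:

```python
from PIL import Image

# stand-in for a real photo; load_img performs this resize internally
img = Image.new("RGB", (640, 480))
img = img.resize((224, 224))
print(img.size)
```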
Convert from pixel to NumPy array
Next, we can convert the pixels to a NumPy array so that we can work with it in Keras. We
can use the img_to_array() function for this.
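The img_to_array() step is essentially a NumPy conversion; a sketch assuming a Pillow image as input:

```python
import numpy as np
from PIL import Image

img = Image.new("RGB", (224, 224))
# img_to_array performs roughly this conversion to a float32 array
pixels = np.asarray(img, dtype="float32")
print(pixels.shape)
```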
Reshape data for the model
The network expects one or more images as input; that means the input array will
need to be 4-dimensional: samples, rows, columns, and channels.
We only have one sample (one image). We can reshape the array by calling reshape()
and adding the extra dimension.
Note:
The input data is reshaped so that it matches the 4-D input format the network expects.
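Adding the batch dimension can be sketched with NumPy alone:

```python
import numpy as np

pixels = np.zeros((224, 224, 3), dtype="float32")  # one image
# add the leading "samples" dimension the network expects
batch = pixels.reshape((1,) + pixels.shape)
print(batch.shape)
```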
Prepare the image for the VGG
model
Keras provides a function called preprocess_input() to prepare new input for the
network.
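For VGG16, preprocess_input (in its default "caffe" mode) converts RGB to BGR and subtracts the per-channel ImageNet means; a NumPy sketch of that transformation, using the mean values Keras documents:

```python
import numpy as np

IMAGENET_MEANS_BGR = np.array([103.939, 116.779, 123.68], dtype="float32")

def vgg_preprocess(batch):
    """Rough sketch of preprocess_input for VGG16:
    RGB -> BGR, then subtract per-channel ImageNet means."""
    batch = batch[..., ::-1].astype("float32")
    return batch - IMAGENET_MEANS_BGR

batch = np.full((1, 224, 224, 3), 128, dtype="float32")
out = vgg_preprocess(batch)
print(out.shape)
```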
Get features
We can call the predict() function on the model to extract features for the image. With the original VGG16, predict() would return the probability of the image belonging to each of the 1,000 known object classes; since we removed the final layer, it instead returns the feature vector from the last fully connected layer.
Note:
Setting verbose to 0, 1, or 2 controls how Keras displays progress:
verbose=0 shows nothing (silent)
verbose=1 shows an animated progress bar
verbose=2 prints one line per epoch
Map an integer to a word
The word_for_id function looks up the word for a given index in the tokenizer's dictionary.
After extracting features from each photo in the directory and mapping each integer to a word, we can generate a description for the image.
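The word_for_id helper simply inverts the tokenizer's word-to-index mapping; a sketch with a stand-in tokenizer (the word_index attribute name matches Keras's Tokenizer):

```python
from types import SimpleNamespace

def word_for_id(integer, tokenizer):
    # search the tokenizer's word -> index mapping for the matching index
    for word, index in tokenizer.word_index.items():
        if index == integer:
            return word
    return None

# stand-in for a fitted Keras Tokenizer
tokenizer = SimpleNamespace(word_index={"startseq": 1, "dog": 2, "endseq": 3})
print(word_for_id(2, tokenizer))
```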
Generate Description Function
• We should handle the case where a predicted index cannot be mapped to a word:
# stop if we cannot map the word
if word is None:
break
# append as input for generating the next word
in_text += ' ' + word
# stop if we predict the end of the sequence
if word == 'endseq':
break
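The snippet above is the tail of the greedy decoding loop. A self-contained sketch of the whole function is below; pad_seq stands in for Keras's pad_sequences, and StubModel stands in for the trained captioning model, so the names and the toy vocabulary here are assumptions for illustration:

```python
import numpy as np
from types import SimpleNamespace

def pad_seq(seq, maxlen):
    # minimal stand-in for keras pad_sequences: pre-pad with zeros
    return np.array([[0] * (maxlen - len(seq)) + seq[-maxlen:]])

def word_for_id(integer, tokenizer):
    for word, index in tokenizer.word_index.items():
        if index == integer:
            return word
    return None

def generate_desc(model, tokenizer, photo, max_length):
    # greedy decoding: grow the caption one word at a time
    in_text = 'startseq'
    for _ in range(max_length):
        seq = [tokenizer.word_index[w] for w in in_text.split()
               if w in tokenizer.word_index]
        yhat = model.predict([photo, pad_seq(seq, max_length)], verbose=0)
        word = word_for_id(int(np.argmax(yhat)), tokenizer)
        if word is None:        # stop if we cannot map the word
            break
        in_text += ' ' + word   # append as input for the next word
        if word == 'endseq':    # stop at the end-of-sequence token
            break
    return in_text

# toy run with stand-ins for the tokenizer and the trained model
tok = SimpleNamespace(word_index={'startseq': 1, 'a': 2, 'dog': 3, 'endseq': 4})

class StubModel:
    """Predicts 'a', 'dog', 'endseq' in order, as a trained model might."""
    def __init__(self):
        self._next = iter([2, 3, 4])
    def predict(self, inputs, verbose=0):
        y = np.zeros(5)
        y[next(self._next)] = 1.0
        return y

print(generate_desc(StubModel(), tok, photo=None, max_length=10))
```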
Display Image
To display the image in the notebook (using IPython.display):
display(Image(filename=path))
Translate
translator = googletrans.Translator()
arabic_text = translator.translate(english_text, dest='ar').text
print(arabic_text)
Translation Audio
Generate the Arabic speech with gTTS, save it to an MP3 file, and play it in the notebook:
tts = gTTS(arabic_text, lang='ar')
audio_path = "test.mp3"
tts.save(audio_path)
ipd.Audio(audio_path, autoplay=True)
Try it yourself
Dataset :
https://www.kaggle.com/datasets/ming666/flicker8k-dataset
Code:
https://colab.research.google.com/drive/1BlNUBbSxi0HanGsAkz7L1q9YEvtKu7_B?
usp=sharing#scrollTo=xs_0ccfbTNSN
Thank you for your attention!