
Natural language processing

Section 6
Arabic - Image Caption Generator
1
Main idea
An image caption generator produces a caption for a given image by
understanding the image. The challenging part of caption generation is to
understand the image and its context, produce an English description of the
image, and then translate that description into any other language.

2
Our Plan

We will follow these steps to generate the image caption.

3
VGG16
VGG16 is a convolutional neural network model used for image recognition.
It is notable for having only 16 layers with weights, built from small 3×3
convolution filters, rather than relying on a large number of
hyper-parameters. It is considered one of the best vision model architectures.

4
More Information
About VGG_16

• VGG_16 consists of 16 layers: 13 convolutional layers and 3 fully connected layers.
• It is relatively deep compared to earlier CNN architectures such as LeNet and AlexNet.
• The depth of the network allows it to learn more complex features and capture fine details in the input images.
• It has a large number of parameters, about 138 million trainable parameters, making it more computationally expensive to train than shallower networks (see the sketch below).
• Models pretrained on large-scale image classification tasks, such as the ImageNet dataset, are widely available.
• It has been shown to generalize well to other computer vision tasks, such as object detection and semantic segmentation, by utilizing its feature extraction capabilities.
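
As a quick sanity check on these numbers, here is a minimal sketch (assuming the ImageNet weights can be downloaded) that loads the pretrained VGG16 and counts its layers and parameters:

# Sketch: load the pretrained VGG16 and verify the layer/parameter counts above.
from keras.applications.vgg16 import VGG16

model = VGG16(weights='imagenet')
model.summary()  # the summary ends with the total parameter count (~138 million)

conv_layers = [l for l in model.layers if 'conv' in l.name]
fc_layers = [l for l in model.layers if l.name in ('fc1', 'fc2', 'predictions')]
print(len(conv_layers), 'convolutional layers,', len(fc_layers), 'fully connected layers')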
5
How to Implement VGG16 in Keras

8 STEPS FOR IMPLEMENTING VGG16 IN KERAS (a minimal sketch follows the list)
1. Import the libraries for VGG16.
2. Create an object for training and testing data.
3. Initialize the model.
4. Pass the data to the dense layer.
5. Compile the model.
6. Import libraries to monitor and control training.
7. Visualize the training/validation data.
8. Test your model.
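
A minimal sketch of these eight steps for a generic image-classification task; the dataset directories, class count, number of epochs, and callback settings are placeholder assumptions, not taken from the slides:

# 1. Import the libraries for VGG16
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Flatten, Dense
from keras.models import Model
# 6. libraries to monitor and control training
from keras.callbacks import EarlyStopping, ModelCheckpoint
import matplotlib.pyplot as plt

# 2. Create objects for the training and testing data (placeholder directories)
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
train_data = datagen.flow_from_directory('data/train', target_size=(224, 224))
test_data = datagen.flow_from_directory('data/test', target_size=(224, 224))

# 3. Initialize the model: the frozen VGG16 convolutional base
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False

# 4. Pass the data to the dense layer
x = Flatten()(base.output)
outputs = Dense(train_data.num_classes, activation='softmax')(x)
model = Model(inputs=base.input, outputs=outputs)

# 5. Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# 6. Callbacks that monitor and control training
callbacks = [EarlyStopping(patience=3), ModelCheckpoint('vgg16_best.h5', save_best_only=True)]

# 7. Train and visualize the training/validation accuracy
history = model.fit(train_data, validation_data=test_data, epochs=10, callbacks=callbacks)
plt.plot(history.history['accuracy'], label='train')
plt.plot(history.history['val_accuracy'], label='validation')
plt.legend()
plt.show()

# 8. Test your model
model.evaluate(test_data)

Freezing the convolutional base and training only the new dense layer is the usual way to reuse VGG16's pretrained features on a new dataset.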

6
English description to Arabic description

You can use the googletrans library to translate the English description into
Arabic, and then use the gTTS library to read the Arabic caption aloud.

7
gTTS
gTTS (Google Text-to-Speech) is a Python library and CLI tool to interface
with Google Translate's text-to-speech API. We will import the gTTS class
from the gtts module and use it to turn the Arabic caption into speech.

8
Install libraries
We need to install 3 libraries:
!pip install pydotplus
!pip install googletrans
!pip install gTTS

9
Load our plan

Use the pydotplus library to draw a graph of our plan from DOT data.

import pydotplus
from IPython.display import Image, display

myplan = """digraph {
Load_VGG16_Model_Restructure ->
Load_Pretrained_Model ->
Show_image ->
Input_preprocess ->
Generate_English_Description ->
Translate_English_Description_To_Arabic ->
Read_Arabic_Description_by_gTTS
}"""
mygraph = pydotplus.graph_from_dot_data(myplan)
mygraph.write_png("myplan.png")
display(Image(filename='./myplan.png'))
10
Preprocessing and cleaning data

Extract features from photos

To extract features from the photos, we first load the model:

from keras.applications.vgg16 import VGG16
from keras.models import Model

model = VGG16()

Then we re-structure the model to remove the last layer, because we only need the features, not the classification:

model = Model(inputs=model.inputs, outputs=model.layers[-2].output)

11
Load and Prepare Image

• We can load the image as pixel data and prepare it to be presented to the network.
• Keras provides some tools to help with this step.
• First, we can use the load_img() function to load the image and resize it to the required size of 224×224 pixels.

from keras.preprocessing.image import load_img


image = load_img(filename, target_size=(224, 224))

12
Convert from pixel to NumPy array

Next, we can convert the pixels to a NumPy array so that we can work with it in Keras. We
can use the img_to_array() function for this.

from keras.preprocessing.image import img_to_array


# convert the image pixels to a numpy array
image = img_to_array(image)

13
Reshape data for the model

• The network expects one or more images as input; that means the input array needs to be 4-dimensional: samples, rows, columns, and channels.
• We only have one sample (one image). We can reshape the array by calling reshape() and adding the extra dimension.
• Note:
• The input data is reshaped so that it is in a format the network can understand and use.

# reshape data for the model


image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))

14
Prepare the image for the VGG model

• Keras provides a function called preprocess_input() to prepare new input for the network.

from keras.applications.vgg16 import preprocess_input


# prepare the image for the VGG model
image = preprocess_input(image)

15
Get features

• We can call the predict() function on the re-structured model to get the feature vector for the image. (With the original, unmodified model, predict() would instead return the probability of the image belonging to each of the 1,000 known object classes.)
• Note:
• By setting verbose to 0, 1 or 2 you choose how you want to 'see' the progress of the call:
• verbose=0 will show you nothing (silent)
• verbose=1 will show you an animated progress bar
• verbose=2 will just mention the number of the epoch

feature = model.predict(image, verbose=0)
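
A later slide calls an extract_features() helper on an image path. As a sketch, the steps above can be assembled into that helper, rebuilding the re-structured VGG16 inside the function to keep the example self-contained:

# Sketch of the extract_features() helper used later in the slides,
# assembled from the preceding steps.
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing.image import load_img, img_to_array
from keras.models import Model

def extract_features(filename):
    # load VGG16 and drop the final classification layer
    model = VGG16()
    model = Model(inputs=model.inputs, outputs=model.layers[-2].output)
    # load the photo and resize it to the expected 224x224 input
    image = load_img(filename, target_size=(224, 224))
    # convert the pixels to a NumPy array and add the batch dimension
    image = img_to_array(image)
    image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
    # apply VGG-specific preprocessing and extract the feature vector
    image = preprocess_input(image)
    feature = model.predict(image, verbose=0)
    return feature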

16
Map an integer to a word

• The tokenizer's word_index is a dictionary mapping each word to its integer index; the word_for_id() function searches it to recover the word for a given integer.

def word_for_id(integer, tokenizer):
    for word, index in tokenizer.word_index.items():
        if index == integer:
            return word
    return None
17
Defining the generate_desc Function

After extracting features from each photo in the directory and mapping each
integer to a word, we can now generate a description for the image.

We define a function called generate_desc().

18
The generate_desc Function

def generate_desc(model, tokenizer, photo, max_length):
    # seed the generation process
    in_text = 'startseq'

19
The generate_desc Function (continued)

Next step: iterate over the whole length of the sequence.

for i in range(max_length):
    # integer encode the input sequence
    sequence = tokenizer.texts_to_sequences([in_text])[0]
    # pad the input
    sequence = pad_sequences([sequence], maxlen=max_length)
    # predict the next word
    yhat = model.predict([photo, sequence], verbose=0)

20
The generate_desc Function (continued)

Now we need to convert the probability vector to an integer, then map that integer to a word.

# convert probability to integer
yhat = argmax(yhat)
# map integer to word
word = word_for_id(yhat, tokenizer)

21
The generate_desc Function (continued)
• We should handle the case where we cannot map the word.

# stop if we cannot map the word
if word is None:
    break
# append as input for generating the next word
in_text += ' ' + word
# stop if we predict the end of the sequence
if word == 'endseq':
    break

• Finally, return the generated text from generate_desc:

return in_text

22
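
Assembled from the fragments above, the complete generate_desc() function reads as follows, assuming pad_sequences and argmax are imported and word_for_id() is defined as on the earlier slide:

from keras.preprocessing.sequence import pad_sequences
from numpy import argmax

def generate_desc(model, tokenizer, photo, max_length):
    # seed the generation process
    in_text = 'startseq'
    # iterate over the whole length of the sequence
    for i in range(max_length):
        # integer encode the input sequence
        sequence = tokenizer.texts_to_sequences([in_text])[0]
        # pad the input
        sequence = pad_sequences([sequence], maxlen=max_length)
        # predict the next word
        yhat = model.predict([photo, sequence], verbose=0)
        # convert probability to integer
        yhat = argmax(yhat)
        # map integer to word
        word = word_for_id(yhat, tokenizer)
        # stop if we cannot map the word
        if word is None:
            break
        # append as input for generating the next word
        in_text += ' ' + word
        # stop if we predict the end of the sequence
        if word == 'endseq':
            break
    return in_text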
Loading Tokenizer and Model
Last step: load the tokenizer and the model, and define max_length.

from pickle import load
from keras.models import load_model

# load the tokenizer
tokenizer = load(open('/content/drive/MyDrive/imagecaptiongenerator/tokenizer.pkl', 'rb'))
# pre-define the max sequence length (from training)
max_length = 34
# load the model (alternative checkpoint paths from the notebook:
#   "models/{epoch:03d}-{val_loss:.2f}.h5", "training_1/cp.ckpt")
model = load_model('/content/drive/MyDrive/test/VGGmodels/{epoch:03d}-{val_accuracy:.2f}')
# example image from the Flickr8k dataset
path = '/content/drive/MyDrive/flicker8k-dataset/Flickr8k_Dataset/Flicker8k_Dataset/1019077836_6fc9b15408.jpg'
# load and prepare the photograph
photo = extract_features(path)
23
Calling the generate_desc Function

Call generate_desc() to generate the description:

english_text = generate_desc(model, tokenizer, photo, max_length)

24
Display Image

To display the image:

display(Image(filename=path))

Then strip the startseq/endseq tokens from the generated text:

english_text = english_text.replace("startseq", "").replace("endseq", "")
print(english_text)
25
Translation

Translate the English description to Arabic:

import googletrans

translator = googletrans.Translator()
arabic_text = translator.translate(english_text, dest='ar').text

print(arabic_text)

26
Translation Audio

Play the translation as audio:

from gtts import gTTS
import IPython.display as ipd

tts = gTTS(arabic_text, lang='ar')
tts.save('test.mp3')

audio_path = "test.mp3"
ipd.Audio(audio_path, autoplay=True)

27
Try it yourself

Dataset :
https://www.kaggle.com/datasets/ming666/flicker8k-dataset
Code:
https://colab.research.google.com/drive/1BlNUBbSxi0HanGsAkz7L1q9YEvtKu7_B?
usp=sharing#scrollTo=xs_0ccfbTNSN

28
Thank you for your attention!

29
