B.TECH MINOR PROJECT PRESENTATION 2024-25
Image Caption Generator Using Deep Learning
National Institute of Science & Technology
(202110331 Ashish Kumar Panda) (202110339 B.Sourav) (202110342 Bishnu Prasad Maharana) (202110348 Sushovit Badatya)
CONTENTS:
• INTRODUCTION
• PROBLEM STATEMENT
• OBJECTIVES
• CHALLENGES
• METHODOLOGY
• FLOWCHART AND ALGORITHM
• CODE & SIMULATION
• POSSIBLE OUTCOMES
• FUTURE ADD-ONS
• REFERENCE
• CONCLUSION
INTRODUCTION:
PROBLEM STATEMENT:
OBJECTIVES:
CHALLENGES:
• Data Dependency:
– High-quality, labeled datasets are required for training. Datasets need to have a wide variety of images
with accurate captions.
• Computational Complexity:
– Training deep neural networks, especially those involving both visual and language models, requires
significant computational power.
• Contextual and Semantic Understanding:
– Generating relevant captions goes beyond identifying objects: it requires understanding the
relationships and context within an image to produce human-like descriptions.
METHODOLOGY:
• Dataset Preparation:
– Datasets like Flickr8k/30k provide thousands of labeled images with human-generated captions.
– Preprocess images and captions, including resizing, normalizing, and tokenizing text.
• Feature Extraction:
– Use Convolutional Neural Networks (CNNs), such as ResNet or InceptionV3, to extract features from images.
– CNNs transform images into feature vectors that capture essential visual information.
• Text Generation:
– Use Recurrent Neural Networks (RNNs) or Transformers to generate text based on image features.
– LSTMs are commonly used in RNNs to handle sequential data and maintain context over time.
• Integrating CNN and RNN Models:
– Combine CNN feature extraction with RNN/Transformer-based language generation to create an end-to-end
captioning model (a minimal sketch follows this list).
– Image features guide the initial generation, while the RNN/Transformer structure completes the sentence.
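The integration step can be made concrete with a minimal Keras sketch of a merge-style encoder-decoder. This is our illustration rather than code from the slides: the 2048-dim feature input, the 256-unit layer sizes, and the two sample captions are assumptions.

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.layers import Input, Dense, LSTM, Embedding, Dropout, add
from tensorflow.keras.models import Model

# Tokenize the captions; 'startseq'/'endseq' mark sentence boundaries
# (the two captions here are illustrative placeholders)
captions = ['startseq a dog runs on the grass endseq',
            'startseq two children play football endseq']
tokenizer = Tokenizer()
tokenizer.fit_on_texts(captions)
vocab_size = len(tokenizer.word_index) + 1
max_length = max(len(c.split()) for c in captions)

# Image branch: a 2048-dim CNN feature vector (e.g. pooled ResNet50 output)
inputs1 = Input(shape=(2048,))
fe1 = Dropout(0.4)(inputs1)
fe2 = Dense(256, activation='relu')(fe1)

# Text branch: partial caption -> word embeddings -> LSTM state
inputs2 = Input(shape=(max_length,))
se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
se2 = Dropout(0.4)(se1)
se3 = LSTM(256)(se2)

# Merge both branches and predict the next word of the caption
decoder1 = add([fe2, se3])
decoder2 = Dense(256, activation='relu')(decoder1)
outputs = Dense(vocab_size, activation='softmax')(decoder2)

model = Model(inputs=[inputs1, inputs2], outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')

At inference time the decoder runs word by word: feed the image feature plus the words generated so far, take the most probable next word, and stop at 'endseq'.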
FLOWCHART:
[Flowchart] Start → initialize the node in advertisement mode → is energy more than the threshold? (No: update the threshold value / broadcast as common node; Yes: select as cluster head and broadcast) → any node have more energy? (Yes/No) → create cluster → Stop.
ALGORITHM:
SIMULATION 1:
import os
import pickle

from tqdm import tqdm
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# Paths and the CNN backbone are not shown on this slide; the values below are
# assumptions (ResNet50 matches the methodology slide and the 224x224 input).
BASE_DIR = 'flickr8k'
WORKING_DIR = 'working'
model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

features = {}
directory = os.path.join(BASE_DIR, 'Images')

for img_name in tqdm(os.listdir(directory)):
    # Load the image and preprocess it
    img_path = os.path.join(directory, img_name)
    image = load_img(img_path, target_size=(224, 224))
    image = img_to_array(image)
    image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
    image = preprocess_input(image)

    # Extract features
    feature = model.predict(image, verbose=0)

    # Store features with the image ID (filename without extension) as key
    image_id = img_name.split('.')[0]
    features[image_id] = feature

if not os.path.exists(WORKING_DIR):
    os.makedirs(WORKING_DIR)
    print(f"Created directory: {WORKING_DIR}")

# Save extracted features for later use
with open(os.path.join(WORKING_DIR, 'image_features.pkl'), 'wb') as f:
    pickle.dump(features, f)

print("Feature extraction complete and saved to 'image_features.pkl'.")
SIMULATION 2:
POSSIBLE OUTCOMES:
• High-Quality Captions: The model should generate captions that are accurate, descriptive, and
contextually appropriate.
• Improved Accessibility: Enhances accessibility for visually impaired users by providing descriptions
of visual content.
• Enhanced Image Indexing: Automatically generated captions can aid in search engine indexing for
better image retrieval.
FUTURE ADD-ONS:
• Real-Time Captioning: Enable the model to generate captions for images in real-time applications.
• Improved Context and Semantics: Fine-tune the model to capture deeper relationships and context
within images for more nuanced captions.
• Encryption and Decryption of Images and Captions: Encrypt the images and the generated captions
to enhance security in real-time use (a possible sketch follows this list).
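Since the encryption add-on is only planned, here is one possible sketch using symmetric Fernet keys from the cryptography package; this is an assumption on our part, not a committed design.

from cryptography.fernet import Fernet

# Generate a symmetric key (in real use, store and share it securely)
key = Fernet.generate_key()
cipher = Fernet(key)

# Encrypt a generated caption and an image file's raw bytes
caption_token = cipher.encrypt('a dog runs on the grass'.encode())
with open('example.jpg', 'rb') as f:   # illustrative file name
    image_token = cipher.encrypt(f.read())

# Decrypt on the receiving side with the same key
original_caption = cipher.decrypt(caption_token).decode()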
REFERENCE:
• Krizhevsky, A., Sutskever, I., and Hinton, G. E. "ImageNet Classification with Deep Convolutional Neural Networks." Advances in Neural Information Processing Systems (NIPS), 1097-1105 (2012).
• Liu, S., Bai, L., Hu, Y., and Wang, H. "Image Captioning Based on Deep Neural Networks." College of Systems Engineering, National University of Defense Technology, Changsha, China.
• Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R. S., and Bengio, Y. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention." University of Toronto / Courant Institute of Mathematical Sciences, New York University (2016).
• Rennie, S. J., Marcheret, E., Mroueh, Y., Ross, J., and Goel, V. "Self-Critical Sequence Training for Image Captioning." IBM Research AI (2017).
• Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., and Zhang, L. "Bottom-Up and Top-Down Attention for Image Captioning." Australian National University / Microsoft Research / University of Edinburgh (2018).
• Yanık, H., and Sahillioglu, Y. "Transformer-Based Image Captioning." Middle East Technical University, Turkey (2020).
CONCLUSION:
• In conclusion, developing an image caption generator using deep learning merges the complex
fields of computer vision and natural language processing to create a model capable of
automatically generating human-like captions for images. By integrating Convolutional Neural
Networks (CNNs) for visual feature extraction with Recurrent Neural Networks (RNNs) or
Transformers for language generation, the model interprets the visual content of an image and
produces descriptive, contextually relevant captions. This technology holds tremendous
potential for applications in accessibility, content indexing, and human-computer interaction,
bridging the gap between images and text-based understanding. While challenges remain, such
as ensuring nuanced contextual understanding and optimizing computational demands,
continued advancements in model architecture and data processing promise to make image
captioning more accurate, efficient, and widely applicable across diverse real-world scenarios.
THANK YOU