Application of Deep Learning Part1
Explain Images with Multimodal Recurrent Neural Networks, Mao et al.
Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei
Show and Tell: A Neural Image Caption Generator, Vinyals et al.
Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.
Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 10 - April 29, 2021
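test image
before:
h = tanh(Wxh * x + Whh * h)
now:
h = tanh(Wxh * x + Whh * h + Wih * v)
[Figure: the image vector v enters the first hidden state h0 through Wih; the first input x0 is the <START> token and y0 is the first output.]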
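The change from "before" to "now" can be sketched as one recurrence step with an optional image term. This is a minimal illustration and not the lecture's code: the helper names (matvec, rnn_step, random_matrix), the layer sizes, and the random weights are all assumptions made for the example, and plain Python lists stand in for a tensor library.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def rnn_step(x, h, Wxh, Whh, Wih=None, v=None):
    """One step of h = tanh(Wxh * x + Whh * h [+ Wih * v])."""
    pre = [a + b for a, b in zip(matvec(Wxh, x), matvec(Whh, h))]
    if Wih is not None and v is not None:
        # "now": add the image term Wih * v to the pre-activation
        pre = [p + c for p, c in zip(pre, matvec(Wih, v))]
    return [math.tanh(p) for p in pre]

def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would supply these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
H, D, F = 4, 3, 5  # hidden, input, and image-feature sizes (arbitrary)
Wxh = random_matrix(H, D, rng)
Whh = random_matrix(H, H, rng)
Wih = random_matrix(H, F, rng)

x0 = [1.0, 0.0, 0.0]                            # stand-in vector for <START>
h_prev = [0.0] * H                              # initial hidden state
v = [rng.uniform(0.0, 1.0) for _ in range(F)]   # stand-in image features

h_before = rnn_step(x0, h_prev, Wxh, Whh)           # before: no image term
h_now = rnn_step(x0, h_prev, Wxh, Whh, Wih, v)      # now: image term included
print(h_before)
print(h_now)
```

The only difference between the two calls is the extra Wih * v term, so passing an all-zero v reproduces the "before" result.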
test image
[Figure: generating a caption one word per time step. Sample a word from y0 ("straw") and feed it back as the next input; sample a word from y1 ("hat") and feed it back; sampling the <END> token from y2 finishes the caption. Inputs: <START>, straw, hat. Hidden states: h0, h1, h2. Outputs: y0, y1, y2.]
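The sample-and-feed-back loop can be sketched as follows. This is an illustrative toy under stated assumptions, not the lecture's implementation: the four-word vocabulary, the output matrix Why, the helper names, and the random weights are hypothetical, and the image term is applied at the first step only, which is one possible design. With untrained weights the sampled words are meaningless; the point is the control flow.

```python
import math
import random

VOCAB = ["<START>", "straw", "hat", "<END>"]  # toy vocabulary for the example

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_caption(v, Wxh, Whh, Wih, Why, rng, max_len=10):
    """Sample one word per step until <END> appears or max_len is reached."""
    h = [0.0] * len(Whh)
    word = VOCAB.index("<START>")
    caption = []
    for t in range(max_len):
        # One-hot encoding of the previous word (or <START> at t = 0)
        x = [1.0 if i == word else 0.0 for i in range(len(VOCAB))]
        pre = [a + b for a, b in zip(matvec(Wxh, x), matvec(Whh, h))]
        if t == 0:
            # Assumption: the image term Wih * v is added at the first step only
            pre = [p + c for p, c in zip(pre, matvec(Wih, v))]
        h = [math.tanh(p) for p in pre]
        y = softmax(matvec(Why, h))  # distribution over the vocabulary
        word = rng.choices(range(len(VOCAB)), weights=y)[0]  # sample!
        if VOCAB[word] == "<END>":
            break  # <END> token => finish
        caption.append(VOCAB[word])
    return caption

def random_matrix(rows, cols, rng):
    """Random weights; a trained model would supply these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
H, V, F = 4, len(VOCAB), 5  # hidden, vocabulary, image-feature sizes
Wxh = random_matrix(H, V, rng)
Whh = random_matrix(H, H, rng)
Wih = random_matrix(H, F, rng)
Why = random_matrix(V, H, rng)
v = [rng.uniform(0.0, 1.0) for _ in range(F)]  # stand-in image features

caption = sample_caption(v, Wxh, Whh, Wih, Why, rng)
print(" ".join(caption))
```

The max_len cap is a practical guard added for the sketch, since nothing forces an untrained model to ever emit <END>.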
A cat sitting on a suitcase on the floor
A cat is sitting on a tree branch
A dog is running in the grass with a frisbee
A white teddy bear sitting in the grass
A bird is perched on a tree branch
A man in a baseball uniform throwing a ball
Agrawal et al, “Visual 7W: Grounded Question Answering in Images”, CVPR 2015
Figures from Agrawal et al, copyright IEEE 2015. Reproduced for educational purposes.
[Figure: visual question answering. Labels: Image; Question ("What is the dog playing with?"); Answer ("Frisbee"); Model; Yes or No.]
[Figure: axes labeled depth and time.]