RNN + RL
Shusen Wang
http://wangshusen.github.io/
Prerequisites
Reference:
• Zoph & Le. Neural architecture search with reinforcement learning. In ICLR, 2017.
The controller is a vanilla RNN. Given the initial state h_0 and the initial input x_0, it computes the new state

h_1 = tanh(W ⋅ [h_0; x_0] + b).
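A minimal NumPy sketch of one such RNN step; the vector sizes here are illustrative assumptions, not values from the slides.

```python
import numpy as np

def rnn_step(h, x, W, b):
    """One vanilla-RNN update: concatenate state and input, apply tanh."""
    return np.tanh(W @ np.concatenate([h, x]) + b)

rng = np.random.default_rng(0)
hidden, inp = 8, 4                       # illustrative sizes
W = rng.normal(size=(hidden, hidden + inp))
b = np.zeros(hidden)
h0 = np.zeros(hidden)                    # initial state h_0
x0 = rng.normal(size=inp)                # initial input x_0
h1 = rnn_step(h0, x0, W, b)              # new state h_1; tanh keeps entries in (-1, 1)
```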
Softmax classifier: dense layer + softmax activation.

Applied to the state h_1, it outputs a probability distribution, e.g. p_1 = [0.15, 0.6, 0.2, 0.05].
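The classifier head can be sketched as a dense layer followed by softmax; the weights V and bias c below are hypothetical parameters, introduced only for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())              # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(1)
h1 = rng.normal(size=8)                  # RNN state (illustrative size)
V = rng.normal(size=(4, 8))              # dense-layer weights, 4 candidate classes
c = np.zeros(4)                          # dense-layer bias
p1 = softmax(V @ h1 + c)                 # a distribution over the 4 candidates
```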
Predict Number of Filters

The classifier output p_1 = [0.15, 0.6, 0.2, 0.05] is a distribution over the candidate filter numbers 24, 36, 48, and 64. An action is sampled from p_1, e.g. a_1 = [0, 1, 0, 0], the one-hot vector selecting 36 filters.
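Sampling the one-hot action from the predicted distribution can be sketched as follows, using the numbers shown on the slide.

```python
import numpy as np

p1 = np.array([0.15, 0.6, 0.2, 0.05])    # predicted distribution from the slide
filters = [24, 36, 48, 64]               # candidate filter numbers
rng = np.random.default_rng(42)
idx = rng.choice(len(p1), p=p1)          # sample an index according to p1
a1 = np.eye(len(p1), dtype=int)[idx]     # one-hot action vector, e.g. [0, 1, 0, 0]
chosen = filters[idx]                    # the selected number of filters
```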
Embedding: map a one-hot vector to a dense vector.

The sampled action a_1 is embedded into a dense vector x_1, which becomes the RNN's next input.
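The embedding step amounts to selecting one row of an embedding matrix; the matrix E below is a hypothetical stand-in for the learned embedding parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
E = rng.normal(size=(4, 4))              # hypothetical embedding matrix: 4 actions -> 4-dim vectors
a1 = np.array([0, 1, 0, 0])              # one-hot action from the sampling step
x1 = a1 @ E                              # dense vector = the row of E selected by a1
```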
The RNN reads x_1 and updates its state:

h_2 = tanh(W ⋅ [h_1; x_1] + b).

The prediction made from h_1 is the number of filters; the prediction made from h_2 will be the size of filters.
Predict Size of Filters

From h_2, the classifier outputs p_2 = [0.5, 0.1, 0.4], a distribution over the candidate filter sizes 3×3, 5×5, and 7×7. An action is sampled from p_2, e.g. a_2 = [1, 0, 0], selecting filter size 3×3.
The action a_2 is likewise embedded into a dense vector x_2, the next input to the RNN.
The RNN reads x_2 and computes the next state h_3.
Predict Stride

From h_3, the classifier outputs p_3 = [0.3, 0.7], a distribution over the candidate strides 2 and 3. An action is sampled from p_3, e.g. a_3 = [0, 1], selecting stride 3.
The three sampled actions a_1, a_2, a_3 specify the first convolutional layer:
• Filter number = 36
• Filter size = 3×3
• Stride = 3
The process repeats: from states h_1, h_2, h_3, …, h_60 the controller computes distributions p_1, p_2, p_3, …, p_60 and samples actions a_1, a_2, a_3, …, a_60, feeding the embedded inputs x_0, x_1, …, x_59 back into the RNN. Every group of three actions (number of filters, size of filters, stride) specifies one convolutional layer: a_1, a_2, a_3 give the 1st conv layer; a_4, a_5, a_6 give the 2nd conv layer; and so on, for 20 conv layers in total.
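The whole generation loop can be sketched end to end. All parameter shapes, the number of layers, and the candidate values are assumptions for illustration; a real controller would use trained weights.

```python
import numpy as np

rng = np.random.default_rng(3)
choices = {"filters": [24, 36, 48, 64],  # candidate values per decision type
           "size": ["3x3", "5x5", "7x7"],
           "stride": [2, 3]}
order = ["filters", "size", "stride"]    # three decisions per conv layer

hidden, emb = 8, 4                       # illustrative sizes
W = rng.normal(size=(hidden, hidden + emb)) * 0.1
b = np.zeros(hidden)
heads = {k: rng.normal(size=(len(v), hidden)) for k, v in choices.items()}
embeds = {k: rng.normal(size=(len(v), emb)) for k, v in choices.items()}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h, x = np.zeros(hidden), np.zeros(emb)   # h_0 and x_0
arch = []
for layer in range(20):                  # 20 conv layers -> 60 sampled actions
    spec = {}
    for key in order:
        h = np.tanh(W @ np.concatenate([h, x]) + b)  # RNN step
        p = softmax(heads[key] @ h)      # predicted distribution
        idx = rng.choice(len(p), p=p)    # sample an action
        spec[key] = choices[key][idx]
        x = embeds[key][idx]             # embed the action as the next input
    arch.append(spec)
```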
Training Controller RNN

How to train the controller RNN? There is no training set of labeled architectures to supervise the sampled actions, so the usual supervised update does not apply. The challenge is to train the controller with reinforcement learning instead.
Policy Function

View the controller RNN as a policy in reinforcement learning:
• State: the RNN's hidden state h_t and input x_t.
• Predicted distribution: p_{t+1}, the classifier output at step t.
• Action: a_{t+1}, sampled from p_{t+1}.
The policy function is π(a_{t+1} | h_t, x_t; θ): the probability of taking action a_{t+1} given the current state. With the predicted distribution [0.5, 0.1, 0.4] over filter sizes:
• π("3×3" | h_t, x_t; θ) = 0.5.
• π("5×5" | h_t, x_t; θ) = 0.1.
• π("7×7" | h_t, x_t; θ) = 0.4.
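Concretely, the policy is nothing more than a lookup into the softmax output; a tiny sketch with the slide's numbers:

```python
import numpy as np

sizes = ["3x3", "5x5", "7x7"]            # candidate filter sizes
p = np.array([0.5, 0.1, 0.4])            # predicted distribution from the slide

def policy(action):
    """pi(a | h_t, x_t; theta): probability the controller assigns to action a."""
    return p[sizes.index(action)]
```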