
ECSE 6965

Programming Assignment 3
Sergei Bugrov

Backpropagation in convolutional neural networks

1. $\nabla \hat{y} = \hat{y} - y$, when the loss function is cross entropy.

2. $\nabla W_o = \dfrac{\partial \hat{y}}{\partial W_o}\,\nabla\hat{y}$, $\quad \nabla b_o = \dfrac{\partial \hat{y}}{\partial b_o}\,\nabla\hat{y}$, $\quad \nabla FC = \dfrac{\partial \hat{y}}{\partial FC}\,\nabla\hat{y}$
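A minimal NumPy sketch of steps 1–2, assuming a softmax output layer over logits $W_o \cdot FC + b_o$ and a one-hot label $y$ (the function and variable names are illustrative, not taken from the assignment code):

```python
import numpy as np

def output_layer_backward(FC, W_o, b_o, y):
    """Backward pass through the softmax output layer (steps 1-2).

    FC  : (d,)   activations of the fully connected layer
    W_o : (10, d) output weights;  b_o : (10,) output biases
    y   : (10,)  one-hot label
    """
    logits = W_o @ FC + b_o
    y_hat = np.exp(logits - logits.max())
    y_hat /= y_hat.sum()                  # softmax probabilities

    grad_logits = y_hat - y               # step 1: gradient at the output, y_hat - y
    grad_W_o = np.outer(grad_logits, FC)  # step 2: gradient w.r.t. W_o
    grad_b_o = grad_logits                # step 2: gradient w.r.t. b_o
    grad_FC = W_o.T @ grad_logits         # step 2: gradient w.r.t. FC, passed further back
    return grad_W_o, grad_b_o, grad_FC
```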

3. $\nabla P[r][c] = \nabla FC[(r-1)\,N_c^{P} + c]$, i.e. the FC-layer gradient is un-flattened (row by row) back into the shape of the pooled map $P$.

4. $\nabla A = \dfrac{\partial P}{\partial A}\,\nabla P = \displaystyle\sum_{r=1}^{N_r^P} \dfrac{\partial P[r]}{\partial A}\,\nabla P[r] = \sum_{r=1}^{N_r^P}\sum_{c=1}^{N_c^P} \dfrac{\partial P[r][c]}{\partial A}\,\nabla P[r][c]$, where

$\dfrac{\partial P[r][c]}{\partial A} = \begin{bmatrix} \dfrac{\partial P[r][c]}{\partial A[1][1]} & \cdots & \dfrac{\partial P[r][c]}{\partial A[1][N_c^A]} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial P[r][c]}{\partial A[N_r^A][1]} & \cdots & \dfrac{\partial P[r][c]}{\partial A[N_r^A][N_c^A]} \end{bmatrix}, \qquad \dfrac{\partial P[r][c]}{\partial A[k][l]} = \begin{cases} 1 & \text{if } k = i^* \text{ and } l = j^* \\ 0 & \text{otherwise} \end{cases}$

where $i^*, j^* = \operatorname*{argmax}_{\substack{r \le k \le r+d-1 \\ c \le l \le c+d-1}} A[k][l]$ ($d$ is the pooling kernel size) and stride = 1.
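A NumPy sketch of steps 3–4 for a single feature map, assuming $P$ was flattened row by row into the FC input and max pooling uses a $d \times d$ window with stride 1 (names are illustrative):

```python
import numpy as np

def maxpool_backward(A, grad_FC, d):
    """Backward pass through max pooling (steps 3-4), stride = 1.

    A       : (H, W) pre-pooling activations of one feature map
    grad_FC : flattened gradient w.r.t. the pooled map P of this feature map
    d       : pooling window size
    """
    Hp, Wp = A.shape[0] - d + 1, A.shape[1] - d + 1
    grad_P = grad_FC.reshape(Hp, Wp)      # step 3: un-flatten dL/dFC into dL/dP
    grad_A = np.zeros_like(A)
    for r in range(Hp):
        for c in range(Wp):
            window = A[r:r + d, c:c + d]
            i, j = np.unravel_index(np.argmax(window), window.shape)
            grad_A[r + i, c + j] += grad_P[r, c]  # step 4: gradient flows only to the argmax
    return grad_A
```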

5. $\nabla C = \dfrac{\partial A}{\partial C}\,\nabla A = \displaystyle\sum_{r=1}^{N_r^A}\sum_{c=1}^{N_c^A} \dfrac{\partial A[r][c]}{\partial C}\,\nabla A[r][c] = \sum_{r=1}^{N_r^A}\sum_{c=1}^{N_c^A} \begin{bmatrix} \dfrac{\partial A[r][c]}{\partial C[1][1]} & \cdots & \dfrac{\partial A[r][c]}{\partial C[1][N_c^A]} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial A[r][c]}{\partial C[N_r^A][1]} & \cdots & \dfrac{\partial A[r][c]}{\partial C[N_r^A][N_c^A]} \end{bmatrix} \nabla A[r][c]$, where

$\dfrac{\partial A[r][c]}{\partial C[i][j]} = \begin{cases} 1 & \text{if } i = r,\ j = c,\ \text{and } C[i][j] > 0 \\ 0 & \text{otherwise} \end{cases}$
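Step 5 reduces to an element-wise ReLU mask, since $A = \mathrm{ReLU}(C)$. A minimal NumPy sketch (names are illustrative):

```python
import numpy as np

def relu_backward(C, grad_A):
    """Step 5: A = ReLU(C), so the gradient passes only where C > 0."""
    return grad_A * (C > 0)
```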

6. $\nabla W_x = \dfrac{\partial C}{\partial W_x}\,\nabla C = \displaystyle\sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \dfrac{\partial C[r][c]}{\partial W_x}\,\nabla C[r][c] = \sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \begin{bmatrix} \dfrac{\partial C[r][c]}{\partial W_x[1][1]} & \cdots & \dfrac{\partial C[r][c]}{\partial W_x[1][K]} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial C[r][c]}{\partial W_x[K][1]} & \cdots & \dfrac{\partial C[r][c]}{\partial W_x[K][K]} \end{bmatrix} \nabla C[r][c]$, where

$\dfrac{\partial C[r][c]}{\partial W_x[i][j]} = \begin{bmatrix} \dfrac{\partial C[r][c]}{\partial W_x[i][j][1]} \\ \dfrac{\partial C[r][c]}{\partial W_x[i][j][2]} \\ \vdots \\ \dfrac{\partial C[r][c]}{\partial W_x[i][j][D]} \end{bmatrix}, \qquad \dfrac{\partial C[r][c]}{\partial W_x[i][j][l]} = X[r+i-1][c+j-1][l]$

Similarly for the biases,

$\nabla b_x = \dfrac{\partial C}{\partial b_x}\,\nabla C = \displaystyle\sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \dfrac{\partial C[r][c]}{\partial b_x}\,\nabla C[r][c]$, where $\dfrac{\partial C[r][c]}{\partial b_x[i][j]} = \begin{cases} 1 & \text{if } i = r,\ j = c \\ 0 & \text{otherwise} \end{cases}$

7. $\nabla X = \dfrac{\partial C}{\partial X}\,\nabla C = \displaystyle\sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \dfrac{\partial C[r][c]}{\partial X}\,\nabla C[r][c] = \sum_{r=1}^{N_r^C}\sum_{c=1}^{N_c^C} \begin{bmatrix} \dfrac{\partial C[r][c]}{\partial X[1][1]} & \cdots & \dfrac{\partial C[r][c]}{\partial X[1][N_c^X]} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial C[r][c]}{\partial X[N_r^X][1]} & \cdots & \dfrac{\partial C[r][c]}{\partial X[N_r^X][N_c^X]} \end{bmatrix} \nabla C[r][c]$, where

$\dfrac{\partial C[r][c]}{\partial X[i][j]} = \begin{bmatrix} \dfrac{\partial C[r][c]}{\partial X[i][j][1]} \\ \dfrac{\partial C[r][c]}{\partial X[i][j][2]} \\ \vdots \\ \dfrac{\partial C[r][c]}{\partial X[i][j][D]} \end{bmatrix}$ and

$\dfrac{\partial C[r][c]}{\partial X[i][j][l]} = \begin{cases} W_x[i-r+1][j-c+1][l] & \text{if } r \le i \le r+K-1 \text{ and } c \le j \le c+K-1 \\ 0 & \text{otherwise} \end{cases}$ when stride = 1.
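A NumPy sketch of steps 6–7 for one feature map with stride 1 and no padding; here I treat $b_x$ as a single scalar bias shared across the feature map (a common choice), so its gradient is the sum of $\nabla C$ over all output positions (names are illustrative):

```python
import numpy as np

def conv_backward(X, W_x, grad_C):
    """Backward pass through one conv feature map (steps 6-7), stride = 1, no padding.

    X      : (H, W, D) input tensor
    W_x    : (K, K, D) kernel of this feature map
    grad_C : (H-K+1, W-K+1) gradient w.r.t. this feature map's pre-activation C
    """
    K = W_x.shape[0]
    grad_W = np.zeros_like(W_x)
    grad_b = 0.0
    grad_X = np.zeros_like(X)
    for r in range(grad_C.shape[0]):
        for c in range(grad_C.shape[1]):
            patch = X[r:r + K, c:c + K, :]
            grad_W += patch * grad_C[r, c]   # step 6: dC[r][c]/dW_x[i][j][l] = X[r+i][c+j][l]
            grad_b += grad_C[r, c]           # scalar bias: every output position contributes
            grad_X[r:r + K, c:c + K, :] += W_x * grad_C[r, c]  # step 7: scatter W_x back onto X
    return grad_W, grad_b, grad_X
```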

Model architecture.

• Input: tensor X ∈ R^(32x32x3);
• W1, tensor of weights of shape [5, 5, 3, 32]; vector of biases b1 ∈ R^32;
• 1st convolution layer with 32 kernels of size 5x5, stride = 1;
• 1st pooling layer with a kernel of size 2x2, stride = 1;
• W2, tensor of weights of shape [5, 5, 32, 32]; vector of biases b2 ∈ R^32;
• 2nd convolution layer with 32 kernels of size 5x5, stride = 1;
• 2nd pooling layer with a kernel of size 2x2, stride = 1;
• W3, tensor of weights of shape [3, 3, 32, 64]; vector of biases b3 ∈ R^64;
• 3rd convolution layer with 64 kernels of size 3x3, stride = 1;
• W4, matrix of weights of shape [192, 65536]; vector of biases b4 ∈ R^192;
• FC, fully connected layer ∈ R^192;
• W5, matrix of weights of shape [10, 192]; vector of biases b5 ∈ R^10;
• Output layer ∈ R^10;
• Loss function – cross-entropy error: loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=output, labels=y))
• Optimizer: tf.train.RMSPropOptimizer(learning_rate=1e-3).minimize(loss) (a TensorFlow sketch of the full model follows below)
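A minimal TensorFlow 1.x sketch of the architecture above. SAME padding, truncated-normal initialization, and a ReLU on the FC layer are my assumptions (the write-up does not specify them), and variable names are illustrative:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 32, 32, 3])
y = tf.placeholder(tf.float32, [None, 10])

def conv(inp, shape):
    """Conv layer: weights of the given shape, zero biases, ReLU activation."""
    W = tf.Variable(tf.truncated_normal(shape, stddev=0.1))
    b = tf.Variable(tf.zeros([shape[-1]]))
    return tf.nn.relu(tf.nn.conv2d(inp, W, strides=[1, 1, 1, 1], padding='SAME') + b)

def pool(inp):
    """2x2 max pooling with stride 1, as in the architecture list."""
    return tf.nn.max_pool(inp, ksize=[1, 2, 2, 1], strides=[1, 1, 1, 1], padding='SAME')

h = pool(conv(x, [5, 5, 3, 32]))    # 1st conv + pool
h = pool(conv(h, [5, 5, 32, 32]))   # 2nd conv + pool
h = conv(h, [3, 3, 32, 64])         # 3rd conv -> 32x32x64 = 65536 features with SAME padding
flat = tf.reshape(h, [-1, 32 * 32 * 64])

W4 = tf.Variable(tf.truncated_normal([32 * 32 * 64, 192], stddev=0.1))
b4 = tf.Variable(tf.zeros([192]))
fc = tf.nn.relu(tf.matmul(flat, W4) + b4)   # ReLU on FC is an assumption

W5 = tf.Variable(tf.truncated_normal([192, 10], stddev=0.1))
b5 = tf.Variable(tf.zeros([10]))
output = tf.matmul(fc, W5) + b5

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=output, labels=y))
train_op = tf.train.RMSPropOptimizer(learning_rate=1e-3).minimize(loss)
```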

Hyperparameters.
Batch size = 250
Number of epochs = 200

Discussion. Stride > 1 and/or a pooling kernel larger than 2x2 would speed up training with roughly the same performance. A deeper model and/or dropout layers would improve generalization. Training took over 2 hours, and I ran it for only 200 epochs instead of the 6000 epochs in the instructions; hence the relative underperformance.

1st Convolution Layer Filters:
[Figure: visualization of the learned first-layer filters]

Loss
[Figure: training (Train_Loss) and validation (Val_Loss) loss vs. epoch]

Terrible overfitting. I would blame the gigantic fully connected layer; well-tuned dropout would improve the situation.
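Dropout would only take a couple of lines on top of the TensorFlow sketch above (keep_prob = 0.5 is an assumed value; `fc`, `W5`, and `b5` refer to that sketch):

```python
keep_prob = tf.placeholder(tf.float32)   # feed 0.5 during training, 1.0 at evaluation time
fc_drop = tf.nn.dropout(fc, keep_prob)   # randomly zeroes FC activations and rescales the rest
output = tf.matmul(fc_drop, W5) + b5     # the output layer now reads from the dropped-out FC
```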
Accuracy
[Figure: training (Train_Accu) and validation (Val_Accu) accuracy vs. epoch]

Test accuracy (per class)


Class Accuracy
0 0.6782786885245902
1 0.7465346534653465
2 0.51953125
3 0.386317907444668
4 0.5641025641025641
5 0.5594262295081968
6 0.7535641547861507
7 0.6565656565656566
8 0.7857142857142857
9 0.7309941520467836

Class 3 is far below the average accuracy (only 0.386), and classes 2, 4, and 5 are also noticeably below it (average train_accu: 99.50%, average valid_accu: 63.80%).
