Lecture 10 Basic CNN

The document provides an overview of convolutional neural networks using PyTorch. It begins by revising a fully connected neural network, then introduces convolutional networks and their basic building blocks, convolution and subsampling layers. It explains the convolution operation for a single input channel, demonstrating how a kernel is slid across an input to produce each value of the output feature map, and builds from there to multi-channel convolution, padding, stride, pooling, and a complete CNN trained on a GPU.


PyTorch Tutorial

10. Basic CNN

Lecturer: Hongpu Liu, PyTorch Tutorial @ SLAM Research Group
Revision: Fully Connected Neural Network

Shapes: the input (N, 1, 28, 28) is flattened to (N, 784), then passed through
linear layers 784 -> 512 -> 256 -> 128 -> 64 -> 10, with a ReLU after every
layer except the output layer.

import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = torch.nn.Linear(784, 512)
        self.l2 = torch.nn.Linear(512, 256)
        self.l3 = torch.nn.Linear(256, 128)
        self.l4 = torch.nn.Linear(128, 64)
        self.l5 = torch.nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 784)      # (N, 1, 28, 28) -> (N, 784)
        x = F.relu(self.l1(x))   # -> (N, 512)
        x = F.relu(self.l2(x))   # -> (N, 256)
        x = F.relu(self.l3(x))   # -> (N, 128)
        x = F.relu(self.l4(x))   # -> (N, 64)
        return self.l5(x)        # -> (N, 10)

model = Net()
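As an aside (this check is an addition, not part of the slides), the model's
parameter count can be verified directly: 784*512+512 + 512*256+256 + 256*128+128
+ 128*64+64 + 64*10+10 = 575,050.

print(sum(p.numel() for p in model.parameters()))  # 575050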

Convolutional Neural Network

Input:            1 x 28 x 28
C1 feature maps:  4 x 24 x 24   (5 x 5 convolution)
S1 feature maps:  4 x 12 x 12   (2 x 2 subsampling)
C2 feature maps:  8 x 8 x 8     (5 x 5 convolution)
S2 feature maps:  8 x 4 x 4     (2 x 2 subsampling)
n1, n2:           two fully connected layers
Output:           10 classes (digits 0 to 9)

The convolution and subsampling stages perform feature extraction; the fully
connected layers at the end perform classification.
Convolution

A convolution operates on a patch of the input channels. For an image, the input
channels are Red, Green, and Blue, each with a height and a width; each patch is
mapped to one value in each output channel.
Convolution – Single Input Channel

Input                Kernel            Output

3 4 6 5 7
2 4 6 8 2            1 2 3             211 295 262
1 6 7 8 4      ⨀     4 5 6      =      259 282 214
9 7 4 6 2            7 8 9             251 253 169
3 7 5 4 1

The kernel is laid over each 3 x 3 patch of the input, and the elementwise
products are summed to produce one output value. For the top-left patch:

3∙1 + 4∙2 + 6∙3 + 2∙4 + 4∙5 + 6∙6 + 1∙7 + 6∙8 + 7∙9
  = 3 + 8 + 18 + 8 + 20 + 36 + 7 + 48 + 63 = 211

Sliding the kernel one column to the right gives the next output value:

4∙1 + 6∙2 + 5∙3 + 4∙4 + 6∙5 + 8∙6 + 6∙7 + 7∙8 + 8∙9
  = 4 + 12 + 15 + 16 + 30 + 48 + 42 + 56 + 72 = 295

Repeating this at every position, left to right and top to bottom, fills in the
whole output. A 5 x 5 input convolved with a 3 x 3 kernel, with no padding and
stride 1, therefore yields a (5 - 3 + 1) x (5 - 3 + 1) = 3 x 3 output.
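As a quick check (an addition to the slides, not part of them),
torch.nn.functional.conv2d implements exactly this sliding sum of elementwise
products, so it reproduces the table above:

import torch
import torch.nn.functional as F

# The 5x5 input and 3x3 kernel from the example, shaped (batch, channel, H, W)
x = torch.tensor([[3., 4., 6., 5., 7.],
                  [2., 4., 6., 8., 2.],
                  [1., 6., 7., 8., 4.],
                  [9., 7., 4., 6., 2.],
                  [3., 7., 5., 4., 1.]]).view(1, 1, 5, 5)
k = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]]).view(1, 1, 3, 3)

print(F.conv2d(x, k))
# tensor([[[[211., 295., 262.],
#           [259., 282., 214.],
#           [251., 253., 169.]]]])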
Convolution – 3 Input Channels

Each input channel is convolved with its own 3 x 3 kernel, and the three
resulting 3 x 3 maps are summed elementwise into a single output channel. In this
example all three input channels contain the same 5 x 5 matrix as before:

3 4 6 5 7
2 4 6 8 2
1 6 7 8 4
9 7 4 6 2
3 7 5 4 1

Kernel for channel 1        Per-channel output

1 2 3                       211 295 262
4 5 6                       259 282 214
7 8 9                       251 253 169

Kernel for channel 2

9 8 7                       179 245 268
6 5 4                       201 278 256
3 2 1                       239 287 241

Kernel for channel 3

1 4 7                       235 297 248
2 5 8                       253 294 204
3 6 9                       255 259 169

Summing the three per-channel outputs gives the final single output channel:

211 295 262     179 245 268     235 297 248     625 837 778
259 282 214  +  201 278 256  +  253 294 204  =  713 854 674
251 253 169     239 287 241     255 259 169     745 799 579

In shapes: the input is (channel = 3, height = 5, width = 5), the kernel is a
(3, 3, 3) tensor holding one 3 x 3 kernel per input channel, and the output is
(channel = 1, height = 3, width = 3).
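The summed result can be checked the same way (again an added sanity check): one
filter over 3 input channels is a weight tensor of shape (1, 3, 3, 3).

import torch
import torch.nn.functional as F

ch = torch.tensor([[3., 4., 6., 5., 7.],
                   [2., 4., 6., 8., 2.],
                   [1., 6., 7., 8., 4.],
                   [9., 7., 4., 6., 2.],
                   [3., 7., 5., 4., 1.]])
x = ch.repeat(3, 1, 1).unsqueeze(0)   # (1, 3, 5, 5): three identical channels

# One filter = three 3x3 kernels, one per input channel
w = torch.tensor([[[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]],
                  [[9., 8., 7.], [6., 5., 4.], [3., 2., 1.]],
                  [[1., 4., 7.], [2., 5., 8.], [3., 6., 9.]]]).view(1, 3, 3, 3)

print(F.conv2d(x, w))
# tensor([[[[625., 837., 778.],
#           [713., 854., 674.],
#           [745., 799., 579.]]]])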
Convolution – N Input Channels

For n input channels, every channel gets its own 3 x 3 kernel, so the kernel
tensor has shape (n, 3, 3). Convolving the (channel = n, height = 5, width = 5)
input with it and summing over the channels again yields a single output channel
of shape (channel = 1, height = 3, width = 3).
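In other words (an added illustration), convolving an n-channel input with an
(n, 3, 3) kernel tensor is the same as convolving each channel with its own
3 x 3 kernel and summing the n results:

import torch
import torch.nn.functional as F

n = 4
x = torch.randn(1, n, 5, 5)   # n-channel input
w = torch.randn(1, n, 3, 3)   # one (n, 3, 3) kernel tensor -> one output channel

full = F.conv2d(x, w)
per_channel = sum(F.conv2d(x[:, i:i+1], w[:, i:i+1]) for i in range(n))
print(torch.allclose(full, per_channel))  # True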
Convolution – N Input Channels and M Output Channels

To get m output channels, the layer applies m such filters to the n-channel
input image: filter_1 produces Feature map_1, filter_2 produces Feature map_2,
..., filter_m produces Feature map_m. Stacking the m feature maps along the
channel dimension turns an input of shape (n, width_in, height_in) into an
output of shape (m, width_out, height_out).
Convolutional Layer

This layer needs m filters, each of size

    n × kernel_size_width × kernel_size_height

so its weight is a 4-dimensional tensor of shape

    m × n × kernel_size_width × kernel_size_height

mapping (n, width_in, height_in) inputs to (m, width_out, height_out) outputs.
Convolutional Layer

import torch

in_channels, out_channels = 5, 10
width, height = 100, 100
kernel_size = 3
batch_size = 1

input = torch.randn(batch_size,
                    in_channels,
                    width,
                    height)

conv_layer = torch.nn.Conv2d(in_channels,
                             out_channels,
                             kernel_size=kernel_size)

output = conv_layer(input)

print(input.shape)              # torch.Size([1, 5, 100, 100])
print(output.shape)             # torch.Size([1, 10, 98, 98])
print(conv_layer.weight.shape)  # torch.Size([10, 5, 3, 3])
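The spatial size 98 follows from the general Conv2d output formula (noted here
for reference; it matches the PyTorch documentation):

    size_out = floor((size_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)

With size_in = 100, padding = 0, dilation = 1, kernel_size = 3, and stride = 1,
this gives floor((100 - 2 - 1) / 1 + 1) = 98 in each spatial dimension.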
Convolutional Layer – padding

Without padding, the 3 x 3 kernel on the 5 x 5 input produces the 3 x 3 output
computed earlier:

Input                Kernel            Output

3 4 6 5 7
2 4 6 8 2            1 2 3             211 295 262
1 6 7 8 4      ⨀     4 5 6      =      259 282 214
9 7 4 6 2            7 8 9             251 253 169
3 7 5 4 1

With padding=1, a one-cell border of zeros is added around the input, so the
kernel can be centered on every original cell and the output stays 5 x 5:

Input                      Kernel            Output

0 0 0 0 0 0 0                                 91 168 224 215 127
0 3 4 6 5 7 0              1 2 3             114 211 295 262 149
0 2 4 6 8 2 0        ⨀     4 5 6      =      192 259 282 214 122
0 1 6 7 8 4 0              7 8 9             194 251 253 169  86
0 9 7 4 6 2 0                                 96 112 110  68  31
0 3 7 5 4 1 0
0 0 0 0 0 0 0

The original 3 x 3 result reappears in the center. For example, the new top-left
value comes from a patch that is mostly zeros:

0∙1 + 0∙2 + 0∙3 + 0∙4 + 3∙5 + 4∙6 + 0∙7 + 2∙8 + 4∙9 = 15 + 24 + 16 + 36 = 91
In code:

import torch

input = [3,4,6,5,7,
         2,4,6,8,2,
         1,6,7,8,4,
         9,7,4,6,2,
         3,7,5,4,1]
input = torch.Tensor(input).view(1, 1, 5, 5)  # (batch, channel, height, width)

conv_layer = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

# Use the 3x3 kernel from the example above as the layer's weight
kernel = torch.Tensor([1,2,3,4,5,6,7,8,9]).view(1, 1, 3, 3)
conv_layer.weight.data = kernel.data

output = conv_layer(input)
print(output)  # the padded 5x5 result shown above
Convolutional Layer – stride=2

With stride=2 the kernel moves two cells at a time, horizontally and vertically,
so only four patch positions fit and the output shrinks to 2 x 2:

Input                Kernel            Output

3 4 6 5 7
2 4 6 8 2            1 2 3             211 262
1 6 7 8 4      ⨀     4 5 6      =      251 169
9 7 4 6 2            7 8 9
3 7 5 4 1

These four values are the corners of the stride-1 output computed earlier.
In code:

import torch

input = [3,4,6,5,7,
         2,4,6,8,2,
         1,6,7,8,4,
         9,7,4,6,2,
         3,7,5,4,1]
input = torch.Tensor(input).view(1, 1, 5, 5)  # (batch, channel, height, width)

conv_layer = torch.nn.Conv2d(1, 1, kernel_size=3, stride=2, bias=False)

kernel = torch.Tensor([1,2,3,4,5,6,7,8,9]).view(1, 1, 3, 3)
conv_layer.weight.data = kernel.data

output = conv_layer(input)
print(output)  # [[211., 262.], [251., 169.]]
Max Pooling Layer

2 x 2 max pooling keeps the maximum of each non-overlapping 2 x 2 block, halving
the width and height:

Input                              Output

3 4 6 5
2 4 6 8      2 x 2 MaxPooling      4 8
1 6 7 8          ->                9 8
9 7 4 6
import torch

input = [3,4,6,5,
2,4,6,8,
1,6,7,8,
9,7,4,6,
]
input = torch.Tensor(input).view(1, 1, 4, 4)  # (batch, channel, height, width)

maxpooling_layer = torch.nn.MaxPool2d(kernel_size=2)  # stride defaults to kernel_size

output = maxpooling_layer(input)
print(output)  # [[4., 8.], [9., 8.]]
A Simple Convolutional Neural Network

(batch, 1, 28, 28)
  -> Conv2d Layer,  filter 5 x 5, C_in: 1,  C_out: 10  -> (batch, 10, 24, 24)
  -> Pooling Layer, filter 2 x 2                       -> (batch, 10, 12, 12)
  -> Conv2d Layer,  filter 5 x 5, C_in: 10, C_out: 20  -> (batch, 20, 8, 8)
  -> Pooling Layer, filter 2 x 2                       -> (batch, 20, 4, 4)
  -> Linear Layer,  C_in: 320,    C_out: 10            -> (batch, 10)
A Simple Convolutional Neural Network – Code

Layer by layer: Conv2d with C_in = 1, C_out = 10, kernel = 5 maps
(batch, 1, 28, 28) to (batch, 10, 24, 24); 2 x 2 max pooling halves that to
(batch, 10, 12, 12); Conv2d with C_in = 10, C_out = 20, kernel = 5 gives
(batch, 20, 8, 8); pooling again gives (batch, 20, 4, 4), which is flattened to
(batch, 320); finally a linear layer with f_in = 320, f_out = 10 produces
(batch, 10).

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
        self.pooling = torch.nn.MaxPool2d(2)
        self.fc = torch.nn.Linear(320, 10)  # 320 = 20 * 4 * 4

    def forward(self, x):
        batch_size = x.size(0)
        x = F.relu(self.pooling(self.conv1(x)))  # -> (batch, 10, 12, 12)
        x = F.relu(self.pooling(self.conv2(x)))  # -> (batch, 20, 4, 4)
        x = x.view(batch_size, -1)               # flatten to (batch, 320)
        x = self.fc(x)
        return x

model = Net()
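To confirm where the 320 comes from (a check added here, not in the slides),
push a dummy batch through the convolutional part and print the shapes:

x = torch.randn(1, 1, 28, 28)                 # one fake MNIST image
h1 = F.relu(model.pooling(model.conv1(x)))
h2 = F.relu(model.pooling(model.conv2(h1)))
print(h1.shape)   # torch.Size([1, 10, 12, 12])
print(h2.shape)   # torch.Size([1, 20, 4, 4]); 20 * 4 * 4 = 320 after flattening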
How to use GPU – 1. Move Model to GPU

Define the device as the first visible CUDA device if CUDA is available, then
convert the parameters and buffers of all modules to CUDA tensors by moving the
model (defined above) to that device:

model = Net()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)
How to use GPU – 2. Move Tensors to GPU

In the training loop, send the inputs and targets to the same device as the
model at every step:

def train(epoch):
    running_loss = 0.0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        inputs, target = inputs.to(device), target.to(device)
        optimizer.zero_grad()

        # forward + backward + update
        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if batch_idx % 300 == 299:
            print('[%d, %5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 2000))
            running_loss = 0.0
The same applies in the test loop:

def test():
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            inputs, target = data
            inputs, target = inputs.to(device), target.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, dim=1)
            total += target.size(0)
            correct += (predicted == target).sum().item()
    print('Accuracy on test set: %d %% [%d/%d]' % (100 * correct / total, correct, total))
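These functions rely on train_loader, test_loader, criterion, and optimizer
defined elsewhere in the course. A minimal sketch of that setup, where the batch
size, normalization constants, learning rate, and momentum are illustrative
assumptions rather than the lecture's exact values:

import torch
from torchvision import datasets, transforms

# Assumed MNIST setup; hyperparameters are placeholders, not the lecture's.
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))])
train_dataset = datasets.MNIST(root='../dataset/mnist/', train=True,
                               download=True, transform=transform)
test_dataset = datasets.MNIST(root='../dataset/mnist/', train=False,
                              download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

for epoch in range(10):
    train(epoch)
    test()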
Results

Accuracy on test set: 6 % [637/10000]


[1, 300] loss: 0.098
[1, 600] loss: 0.035
[1, 900] loss: 0.025
Accuracy on test set: 96 % [9605/10000]
[2, 300] loss: 0.021
[2, 600] loss: 0.017
[2, 900] loss: 0.015
Accuracy on test set: 97 % [9709/10000]
……
[9, 300] loss: 0.006
[9, 600] loss: 0.006
[9, 900] loss: 0.007
Accuracy on test set: 98 % [9857/10000]
[10, 300] loss: 0.006
[10, 600] loss: 0.006
[10, 900] loss: 0.006
Accuracy on test set: 98 % [9869/10000]

Exercise 10-1

• Try a more complex CNN, going from (batch, 1, 28, 28) to (batch, 10) with:
  • Conv2d Layer * 3
  • ReLU Layer * 3
  • MaxPooling Layer * 3
  • Linear Layer * 3
• Try different configurations of this CNN and compare their performance
  (one possible starting point is sketched below).
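A hedged skeleton for the exercise; the layer widths, kernel sizes, and padding
choices here are my assumptions, not a prescribed answer:

import torch
import torch.nn.functional as F

class DeeperNet(torch.nn.Module):
    def __init__(self):
        super(DeeperNet, self).__init__()
        # Three conv blocks; padding keeps sizes even before each 2x2 pool.
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5, padding=2)   # 28x28 -> 28x28
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=3, padding=1)  # 14x14 -> 14x14
        self.conv3 = torch.nn.Conv2d(20, 40, kernel_size=3, padding=1)  # 7x7 -> 7x7
        self.pooling = torch.nn.MaxPool2d(2)
        self.fc1 = torch.nn.Linear(40 * 3 * 3, 120)
        self.fc2 = torch.nn.Linear(120, 60)
        self.fc3 = torch.nn.Linear(60, 10)

    def forward(self, x):
        batch_size = x.size(0)
        x = self.pooling(F.relu(self.conv1(x)))  # -> (batch, 10, 14, 14)
        x = self.pooling(F.relu(self.conv2(x)))  # -> (batch, 20, 7, 7)
        x = self.pooling(F.relu(self.conv3(x)))  # -> (batch, 40, 3, 3)
        x = x.view(batch_size, -1)               # -> (batch, 360)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

model = DeeperNet()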