Chap8 CNN

Artificial Neural Network

Dr. Trần Vũ Hoàng


Smaller Network: CNN
• We know it is good to learn a small model.
• From this fully connected model, do we really need all the edges?
• Can some of these be shared?
Consider learning an image:
• Some patterns are much smaller than the whole image, so a small region
  can be represented with fewer parameters (e.g., a "beak" detector).
• The same pattern appears in different places. Instead of training many
  such "small" detectors, each of which must "move around" (an "upper-left
  beak" detector, a "middle beak" detector, ...), they can be compressed
  to the same parameters.
A convolutional layer

A CNN is a neural network with some convolutional layers (and possibly other layers).
A convolutional layer has a number of filters that perform the convolution operation.

Each filter acts as a small pattern detector (e.g., a beak detector).
Convolution
These are the network parameters to be learned.

6 x 6 image:       Filter 1:       Filter 2:
1 0 0 0 0 1         1 -1 -1        -1  1 -1
0 1 0 0 1 0        -1  1 -1        -1  1 -1
0 0 1 1 0 0        -1 -1  1        -1  1 -1
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Each filter detects a small pattern (3 x 3).
Convolution with Filter 1, stride = 1:
Slide the 3 x 3 filter over the 6 x 6 image and take the dot product at
each position. The first position gives 3; the next gives -1.
Convolution with Filter 1, stride = 2:
Moving the filter 2 pixels at a time gives 3 and -3 for the first two
positions of the top row.
Convolution with Filter 1, stride = 1 (all positions):

 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1

The 6 x 6 image yields a 4 x 4 output:
output size = (input size - kernel size) / stride + 1
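The arithmetic above can be reproduced with a short NumPy sketch (the `conv2d` helper name is illustrative, not from the slides); it also checks the output-size formula for stride 1 and stride 2:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid cross-correlation as on the slides: slide the kernel over
    the image and take the dot product at each position."""
    H, _ = image.shape
    k = kernel.shape[0]
    out = (H - k) // stride + 1          # output = (size - kernel) / stride + 1
    result = np.zeros((out, out), dtype=image.dtype)
    for i in range(out):
        for j in range(out):
            patch = image[i*stride:i*stride+k, j*stride:j*stride+k]
            result[i, j] = np.sum(patch * kernel)
    return result

image = np.array([[1,0,0,0,0,1],
                  [0,1,0,0,1,0],
                  [0,0,1,1,0,0],
                  [1,0,0,0,1,0],
                  [0,1,0,0,1,0],
                  [0,0,1,0,1,0]])
filter1 = np.array([[ 1,-1,-1],
                    [-1, 1,-1],
                    [-1,-1, 1]])

print(conv2d(image, filter1, stride=1))   # 4 x 4 map; top row is 3 -1 -3 -1
print(conv2d(image, filter1, stride=2))   # 2 x 2 map; top row is 3 -3
```

Note that the filter responds most strongly (value 3) exactly where the diagonal pattern it encodes appears in the image.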
Convolution with Filter 2, stride = 1 (repeat this for each filter):

-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3

Each filter produces a 4 x 4 feature map; the two 4 x 4 maps together
form a 2 x 4 x 4 matrix.
Color image: RGB, 3 channels

For a color image, each filter also has 3 channels: Filter 1 and Filter 2
each become a 3 x 3 x 3 cube of weights, one 3 x 3 slice per color
channel. The convolution sums the responses over the channels, so each
filter still produces a single feature map.
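Multi-channel convolution can be sketched as follows (toy data and an illustrative helper name, not from the slides): each filter carries one slice per channel, and the per-channel dot products are summed into one number per position.

```python
import numpy as np

def conv2d_multichannel(image, kernel, stride=1):
    """image: (C, H, W); kernel: (C, k, k). Per-channel dot products are
    summed, so each filter still yields a single feature map."""
    C, H, W = image.shape
    k = kernel.shape[-1]
    out = (H - k) // stride + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[:, i*stride:i*stride+k, j*stride:j*stride+k]
            result[i, j] = np.sum(patch * kernel)
    return result

# Toy 3-channel image whose channels are identical: the response is
# simply 3x the single-channel response.
chan = np.array([[1.,0.,0.], [0.,1.,0.], [0.,0.,1.]])
image = np.stack([chan, chan, chan])     # shape (3, 3, 3)
kern  = np.stack([np.eye(3)] * 3)        # diagonal filter, one slice per channel
print(conv2d_multichannel(image, kern))  # [[9.]]  (3 per channel, summed)
```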
Convolution v.s. Fully Connected

Convolution can be seen as a fully connected layer with most edges
removed: flatten the 6 x 6 image into a 36-dimensional vector
x1, ..., x36 and feed it to the layer. Each convolution output connects
to only some of the inputs.
With Filter 1, the first output value, 3, connects to only 9 of the 36
inputs (x1, x2, x3, x7, x8, x9, x13, x14, x15), not all of them:
fewer parameters!

The next output value, -1, connects to a different set of 9 inputs but
uses the same 9 weights: shared weights, so even fewer parameters.
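The sparse, shared-weight view above can be made concrete. The sketch below (helper name is illustrative) builds, for each output position, the 36-weight row of the equivalent fully connected layer: all zeros except the 9 shared filter weights, shifted to the patch's location.

```python
import numpy as np

image = np.array([[1,0,0,0,0,1],
                  [0,1,0,0,1,0],
                  [0,0,1,1,0,0],
                  [1,0,0,0,1,0],
                  [0,1,0,0,1,0],
                  [0,0,1,0,1,0]])
filter1 = np.array([[ 1,-1,-1],
                    [-1, 1,-1],
                    [-1,-1, 1]])

x = image.flatten()                    # x1 ... x36, as on the slide

def row_for_output(i, j):
    """Weight row of the equivalent fully connected layer for output
    (i, j): 36 weights, zero except the 9 covering the 3 x 3 patch."""
    w = np.zeros((6, 6), dtype=int)
    w[i:i+3, j:j+3] = filter1          # the same 9 shared weights, shifted
    return w.flatten()

print(row_for_output(0, 0) @ x)        # 3  -- only 9 non-zero weights
print(row_for_output(0, 1) @ x)        # -1 -- same 9 values, new positions
```

Every output position reuses the same 9 numbers, which is exactly why the convolutional layer has so many fewer parameters than a dense one.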


The whole CNN

Image -> Convolution -> Max Pooling -> Convolution -> Max Pooling -> ...
(convolution + max pooling can repeat many times)
-> Flattened -> Fully Connected Feedforward network -> cat, dog, ...
Max Pooling

Filter 1 (4 x 4 output):     Filter 2 (4 x 4 output):
 3 -1 -3 -1                  -1 -1 -1 -1
-3  1  0 -3                  -1 -1 -2  1
-3 -3  0  1                  -1 -1 -2  1
 3 -2 -2 -1                  -1  0 -4  3
Why Pooling?
• Subsampling pixels does not change the object (a subsampled bird is
  still a bird).
• We can subsample the pixels to make the image smaller:
  fewer parameters to characterize the image.


A CNN compresses a fully connected network in two ways:
• Reducing the number of connections
• Sharing weights on the edges
Max pooling further reduces the complexity.
Max Pooling

Applying 2 x 2 max pooling to each 4 x 4 feature map gives a new image,
but smaller:

Filter 1:  3 0     Filter 2:  -1 1
           3 1                 0 3

6 x 6 image -> Conv -> 4 x 4 -> Max Pooling -> 2 x 2 image.
Each filter is a channel.
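A minimal NumPy sketch of the 2 x 2 pooling step over Filter 1's 4 x 4 map (`max_pool` is an illustrative helper name):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling."""
    H, W = feature_map.shape
    out = np.zeros((H // size, W // size), dtype=feature_map.dtype)
    for i in range(0, H, size):
        for j in range(0, W, size):
            out[i // size, j // size] = feature_map[i:i+size, j:j+size].max()
    return out

fmap = np.array([[ 3,-1,-3,-1],
                 [-3, 1, 0,-3],
                 [-3,-3, 0, 1],
                 [ 3,-2,-2,-1]])   # Filter 1's 4 x 4 output from earlier
print(max_pool(fmap))              # [[3 0], [3 1]]
```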
The whole CNN

Convolution -> Max Pooling produces a new image, smaller than the
original; the number of channels equals the number of filters.
Convolution and max pooling can repeat many times.
The whole CNN

Each convolution + max pooling stage produces a new image; the final one
is flattened and fed to a fully connected feedforward network that
outputs the class (cat, dog, ...).

Flattening

The 2 x 2 x 2 feature maps (3 0 / 3 1 and -1 1 / 0 3) are flattened into
a single vector, which is fed to the fully connected feedforward network.
CNN in Keras
Only the network structure and input format change (vector -> 3-D tensor).

Input_shape = (28, 28, 1)
28 x 28 pixels; 1: black/white, 3: RGB

Convolution: there are 25 3 x 3 filters.
-> Max Pooling -> Convolution -> Max Pooling
CNN in Keras
Only the network structure and input format change (vector -> 3-D array).

Input: 1 x 28 x 28
Convolution: 25 x 26 x 26 (9 parameters for each filter)
Max Pooling: 25 x 13 x 13
Convolution: 50 x 11 x 11 (225 = 25 x 9 parameters for each filter)
Max Pooling: 50 x 5 x 5
CNN in Keras
Only the network structure and input format change (vector -> 3-D array).

Input: 1 x 28 x 28
Convolution: 25 x 26 x 26
Max Pooling: 25 x 13 x 13
Convolution: 50 x 11 x 11
Max Pooling: 50 x 5 x 5
Flattened: 1250 -> Fully connected feedforward network -> Output
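The shapes and parameter counts on these slides can be checked with the output-size formula; this plain-Python walk mirrors the Keras stack (the layer names in the comments are the usual Keras ones):

```python
# Mirrors: Conv2D(25, (3,3)) -> MaxPooling2D((2,2))
#          -> Conv2D(50, (3,3)) -> MaxPooling2D((2,2)) -> Flatten
def conv_out(size, kernel=3, stride=1):
    return (size - kernel) // stride + 1

channels, size = 1, 28                     # input: 1 x 28 x 28
params_per_filter1 = channels * 3 * 3      # 9 weights per first-layer filter
channels, size = 25, conv_out(size)        # Conv2D -> 25 x 26 x 26
size //= 2                                 # MaxPooling2D -> 25 x 13 x 13
params_per_filter2 = channels * 3 * 3      # 225 = 25 x 9 per second-layer filter
channels, size = 50, conv_out(size)        # Conv2D -> 50 x 11 x 11
size //= 2                                 # MaxPooling2D -> 50 x 5 x 5
flattened = channels * size * size         # Flatten -> 1250
print(params_per_filter1, params_per_filter2, flattened)  # 9 225 1250
```

The second-layer count shows why filters deeper in the network are more expensive: each one must span all 25 input channels.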
AlphaGo

Neural network input: the board as a 19 x 19 matrix
(black: 1, white: -1, none: 0)
Output: the next move (19 x 19 positions)

A fully-connected feedforward network can be used,
but a CNN performs much better.

AlphaGo's policy network
(The following is a quotation from their Nature article.)

Note: AlphaGo does not use Max Pooling.
CNN in speech recognition

The input "image" is a spectrogram (time x frequency).
The filters move in the frequency direction only.
CNN in text classification

Source of image:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.703.6858&rep=rep1&type=pdf
Convolutional Neural Networks: 1998.
Input 32*32. CPU

LeNet: a layered model composed of convolution and subsampling operations
followed by a holistic representation and ultimately a classifier for
handwritten digits. [LeNet]
Convolutional Neural Networks: 2012.
Input 224*224*3. GPU.

AlexNet: a layered model composed of convolution, subsampling, and
further operations followed by a holistic representation and all-in-all
a landmark classifier on ILSVRC12. [AlexNet]
+ data
+ gpu
+ non-saturating nonlinearity
+ regularization
VGGNet

• 16 layers
• Only 3*3 convolutions
• 138 million parameters
ResNet

• 152 layers
• ResNet50
The popular CNN

• LeNet, 1998
• AlexNet, 2012
• VGGNet, 2014
• ResNet, 2015
Computational complexity
• Memory is the bottleneck
• A GPU has only a few GB of memory
CNN applications
• Transfer learning
• Fine-tuning the CNN
• Keep some early layers
• Early layers contain more generic features, edges, color blobs
• Common to many visual tasks
• Fine-tune the later layers
• More specific to the details of the class
• CNN as feature extractor
• Remove the last fully connected layer
• A kind of descriptor, or "CNN codes", for the image
• AlexNet gives a 4096-dimensional descriptor
CNN classification/recognition nets
• CNN layers and fully-connected classification layers
• From ResNet to DenseNet
• Densely connected
• Feature concatenation
Fully convolutional nets: semantic segmentation
• Classification/recognition nets produce ‘non-spatial’ outputs
• the last fully connected layer has a fixed dimension (the number of
classes) and throws away spatial coordinates

• Fully convolutional nets output maps as well


Semantic segmentation
Using sliding windows for semantic segmentation
Fully convolutional
Detection and segmentation nets:
The Mask Region-based CNN (R-CNN):
• Class-independent region (bounding box) proposals
• From selective search to region proposal net with objectness
• Use a CNN to classify each region
• Regression on the bounding box or contour segmentation
Using sliding windows for object detection
as classification
Detection and segmentation nets:
The Mask Region-based CNN (R-CNN):
• Mask R-CNN: end-to-end
• Use a CNN to make object/non-object proposals in parallel
Excellent results
Exercise 11

The file ex7data.mat contains data stored as a dict with:
X: 5000x400 -- 5000 binary images of handwritten digits, each 20x20
y: 5000x1 -- the corresponding labels
Do the following:
- Reshape X to 5000x1x20x20
- Split the data into 70% train / 30% test (train_test_split), randomly
  and stratified by label.
- Split the train set into 90% train / 10% validation (train_test_split),
  randomly and stratified by label.
- Build a CNN suited to this data to achieve the best performance.
- Plot the loss curve during training.
- Show the accuracy on the test set.
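The splitting steps can be sketched as follows. This is only a sketch of the data preparation, not a full solution: random arrays of the right shape stand in for ex7data.mat (which would normally be loaded with scipy.io.loadmat), and the stand-in labels are assumed balanced.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((5000, 400))              # stand-in for 5000 flattened 20x20 images
y = np.repeat(np.arange(10), 500)        # stand-in balanced labels, 500 per digit

X = X.reshape(5000, 1, 20, 20)           # N x C x H x W, as the exercise asks

# 70% train / 30% test, shuffled and stratified by label
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# 90% train / 10% validation of the remaining train set, again stratified
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.1, stratify=y_train, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 3150 350 1500
```

From here, a small CNN in the style of the Keras slides (two Conv + MaxPooling stages, then Flatten and Dense) would be trained on X_train/y_train, tuned on the validation set, and finally evaluated on X_test.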
