08 Classification II
Lecture 8
Neural Network
CNN architecture
Neural Networks
The basic building block for composition is the perceptron (Rosenblatt, c. 1960).
Linear classifier: a vector of weights w and a 'bias' b
[Figure: a perceptron with three inputs 𝑥1, 𝑥2, 𝑥3, weights 𝒘 = (𝑤1, 𝑤2, 𝑤3), bias 𝒃 = 0.3, and a binary output]
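A minimal NumPy sketch of this perceptron; the bias 𝒃 = 0.3 comes from the figure, while the weight and input values are assumed for illustration.

```python
import numpy as np

def perceptron(x, w, b):
    """Rosenblatt perceptron: weighted sum plus bias, then a hard threshold."""
    return 1 if np.dot(w, x) + b > 0 else 0

w = np.array([0.5, -0.2, 0.1])   # weight values assumed for illustration
b = 0.3                          # bias from the figure
x = np.array([1.0, 0.0, 1.0])    # inputs x1, x2, x3 (assumed)
print(perceptron(x, w, b))       # binary output: 1
```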
Neural Network
• Multiple classes
• Composition
[Figure: two layers of units, Layer 1 → Layer 2. Credit: Nielsen]
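As a sketch of composition (layer sizes and sigmoid units assumed for illustration): each layer applies a linear map followed by a non-linearity, and layers feed into one another, with one output unit per class.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(x, W, b):
    """One layer: linear map followed by a non-linearity."""
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=3)                                 # 3 inputs
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1: 3 -> 4 units
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2: 4 -> 2 classes
y = layer(layer(x, W1, b1), W2, b2)                    # composed network output
print(y)                                               # one score per class
```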
CNN architecture
AlexNet: Network Size
• Input 227×227×3
• 5 convolution layers
• 3 dense layers
• Output 1000-D vector

Layer stack:
CONV1 → MAX POOL1 → NORM1 → CONV2 → MAX POOL2 → NORM2 → CONV3 → CONV4 → CONV5 → MAX POOL3 → FC6 → FC7 → FC8
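A PyTorch sketch of this stack; filter counts and kernel sizes follow the original AlexNet paper (Krizhevsky et al., 2012), with NORM as local response normalization.

```python
import torch
import torch.nn as nn

alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),    # CONV1: 227x227x3 -> 55x55x96
    nn.MaxPool2d(3, stride=2),                                 # MAX POOL1 -> 27x27x96
    nn.LocalResponseNorm(5),                                   # NORM1
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),   # CONV2 -> 27x27x256
    nn.MaxPool2d(3, stride=2),                                 # MAX POOL2 -> 13x13x256
    nn.LocalResponseNorm(5),                                   # NORM2
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),  # CONV3
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),  # CONV4
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),  # CONV5
    nn.MaxPool2d(3, stride=2),                                 # MAX POOL3 -> 6x6x256
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),                   # FC6
    nn.Linear(4096, 4096), nn.ReLU(),                          # FC7
    nn.Linear(4096, 1000),                                     # FC8: 1000-D output
)

x = torch.randn(1, 3, 227, 227)
print(alexnet(x).shape)   # torch.Size([1, 1000])
```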
Fully convolutional network
• No feature flattening
• No fully connected layers
• Trick:
• Reduce the spatial dimension to 1×1
• Use 1×1 convolution for prediction
• In effect, a 1×1 convolution acts as a fully connected layer (see the check below)
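A quick check of that last point (sizes are illustrative): a 1×1 convolution applied to a 1×1×400 map, loaded with the weights of a 400→4 fully connected layer, produces identical outputs.

```python
import torch
import torch.nn as nn

fc = nn.Linear(400, 4)
conv = nn.Conv2d(400, 4, kernel_size=1)

# A 1x1 conv kernel over a 1x1 map is the FC weight matrix, reshaped.
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(4, 400, 1, 1))
    conv.bias.copy_(fc.bias)

x = torch.randn(1, 400)          # flattened features
x_map = x.view(1, 400, 1, 1)     # the same features as a 1x1x400 map
print(torch.allclose(fc(x), conv(x_map).flatten(1)))   # True
```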
Converting FC into conv

FC version:
14×14×3 → CONV 5×5 → 10×10×16 → MAX POOL 2×2 → 5×5×16 → FC → 400 → FC → 400 → y (output layer): softmax (for 4 classes)

CONV version:
14×14×3 → CONV 5×5 → 10×10×16 → MAX POOL 2×2 → 5×5×16 → CONV 5×5 → 1×1×400 → CONV 1×1 → 1×1×400 → CONV 1×1 → 1×1×4
Question: Which one has fewer parameters? A: FC, B: CONV, C: Similar (Credit: Sedat Ozer)
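One way to answer empirically (a sketch using the sizes from the figure): count the parameters of each head. The counts come out identical, because each conv layer stores exactly the corresponding FC weight matrix.

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# FC head from the figure: flatten(5x5x16) -> FC 400 -> FC 400 -> 4
fc_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(400, 400), nn.Linear(400, 400), nn.Linear(400, 4))

# CONV head: 5x5 conv to 1x1x400, then two 1x1 convs
conv_head = nn.Sequential(
    nn.Conv2d(16, 400, kernel_size=5),
    nn.Conv2d(400, 400, kernel_size=1),
    nn.Conv2d(400, 4, kernel_size=1))

print(n_params(fc_head), n_params(conv_head))   # 322404 322404
```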
Let’s introduce non-linearities
We’re going to introduce non-linear functions to transform the features.
Credit: Nielsen
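One way to see why: without a non-linearity, any stack of linear layers collapses to a single linear map, so extra depth adds nothing. A quick NumPy check (random sizes assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

two_linear_layers = W2 @ (W1 @ x)   # two linear layers, no activation
one_linear_layer = (W2 @ W1) @ x    # a single equivalent linear layer
print(np.allclose(two_linear_layers, one_linear_layer))   # True
```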
Activation Functions
• Used in the final prediction
• Match the target range:
• Sigmoid: (0, 1)
• Tanh: (-1, 1)
• Example: a normalized image target may lie in
• (-1, 1)
• (0, 1)
• (0, ∞)
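A NumPy sketch of these activations and their output ranges (pairing ReLU with the (0, ∞) case is an assumption beyond the slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # output range (0, 1)

def tanh(z):
    return np.tanh(z)                 # output range (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # output range [0, inf)

z = np.linspace(-3.0, 3.0, 7)
print(sigmoid(z))
print(tanh(z))
print(relu(z))
```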
Binary classification
• Target class present or not?
• Single output (sigmoid)
• Two outputs (softmax over two classes)
Multi-class
• One prediction for each class
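A sketch of these three output heads in PyTorch (feature size and class count assumed):

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 128)   # features from the network (size assumed)

# Binary, single output: sigmoid gives P(target class present)
p_present = torch.sigmoid(nn.Linear(128, 1)(feat))

# Binary, two outputs: softmax gives [P(absent), P(present)]
p_two = torch.softmax(nn.Linear(128, 2)(feat), dim=1)

# Multi-class: one prediction per class (here 4 classes)
p_multi = torch.softmax(nn.Linear(128, 4)(feat), dim=1)
print(p_present, p_two, p_multi)
```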
Softmax activation
Scores are the unnormalized log probabilities of the classes:

P(Y = k | X = x_i) = exp(s_k) / Σ_j exp(s_j),  where s = f(x_i; W)

Example scores:
cat   3.2
car   5.1
frog  -1.7
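Computing the softmax probabilities for these example scores in NumPy (the max is subtracted for numerical stability; it does not change the result):

```python
import numpy as np

scores = np.array([3.2, 5.1, -1.7])    # cat, car, frog
exp = np.exp(scores - scores.max())    # subtract the max for stability
probs = exp / exp.sum()
print(probs.round(2))                  # [0.13 0.87 0.  ]
```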
• Loss functions are an area of ongoing research
• You can define your own loss function
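As a sketch of a user-defined loss in PyTorch, here is a hand-written cross-entropy over the softmax scores above (the function name is made up):

```python
import torch

def my_cross_entropy(scores, target):
    """Hand-written cross-entropy: minus the log softmax probability
    of the true class, averaged over the batch."""
    log_probs = scores - torch.logsumexp(scores, dim=1, keepdim=True)
    return -log_probs[torch.arange(len(target)), target].mean()

scores = torch.tensor([[3.2, 5.1, -1.7]])   # cat, car, frog
target = torch.tensor([0])                  # true class: cat
print(my_cross_entropy(scores, target))     # ~2.04, same as nn.CrossEntropyLoss
```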
Network training
Visualizing Convolution