
Classification - II

Lecture 8
Neural Network
CNN architecture
Neural Networks
The basic building block for composition is the perceptron (Rosenblatt, c. 1960): a linear classifier defined by a vector of weights w and a bias b.
[Diagram: perceptron with inputs x1, x2, x3, weights w = (w1, w2, w3), bias b = 0.3, producing a binary output]
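A minimal NumPy sketch of the perceptron above; only the bias b = 0.3 comes from the slide, the weight and input values are illustrative:

```python
import numpy as np

def perceptron(x, w, b):
    """Linear classifier: output 1 if w.x + b > 0, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

w = np.array([0.5, -0.2, 0.1])   # hypothetical weights w1, w2, w3
b = 0.3                           # bias from the slide
x = np.array([1.0, 0.0, 1.0])    # hypothetical inputs x1, x2, x3
print(perceptron(x, w, b))       # -> 1, since 0.5 + 0.1 + 0.3 > 0
```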
Neural Network
• Multiple classes
Composition
[Diagram: perceptrons composed into two layers, Layer 1 and Layer 2]
Sets of layers and the connections (weights) between them define the network architecture (a minimal sketch follows below).
Credit: Nielsen
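A minimal NumPy sketch of composing perceptron-like layers into a small two-layer network; the layer sizes are illustrative:

```python
import numpy as np

def layer(x, W, b, activation):
    """One fully connected layer: affine transform followed by a non-linearity."""
    return activation(W @ x + b)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)   # Layer 1: 3 inputs -> 4 units
W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)   # Layer 2: 4 units -> 2 outputs (two classes)

x = np.array([1.0, 0.5, -0.3])
h = layer(x, W1, b1, sigmoid)    # hidden activations
y = layer(h, W2, b2, sigmoid)    # one output per class
print(y)
```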
CNN architecture
AlexNet: Network Size
Layers: CONV1 → MAX POOL1 → NORM1 → CONV2 → MAX POOL2 → NORM2 → CONV3 → CONV4 → CONV5 → MAX POOL3 → FC6 → FC7 → FC8
• Input: 227×227×3
• 5 convolution layers
• 3 dense layers
• Output: 1000-D vector
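A minimal sketch, assuming PyTorch and torchvision are installed, that instantiates torchvision's AlexNet and pushes a 227×227×3 input through it to confirm the 1000-D output. Note that torchvision's implementation differs in minor details from the original (e.g. it omits the NORM layers), and older torchvision versions use `pretrained=False` instead of `weights=None`:

```python
import torch
import torchvision.models as models

net = models.alexnet(weights=None)   # untrained AlexNet from torchvision
x = torch.randn(1, 3, 227, 227)      # one RGB image, 227x227
feat = net.features(x)               # output of the convolutional part
print(feat.shape)                    # spatial feature map fed to the dense layers
print(net(x).shape)                  # torch.Size([1, 1000]) -- the 1000-D output
```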
Fully convolutional network
• No feature flattening
• No fully connected layers

• Trick
• Reduce the spatial dimension to 1x1
• Use 1x1 convolution for prediction
• In effect, a 1×1 convolution is a fully connected layer applied at each spatial position (see the sketch below)
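A minimal PyTorch sketch (channel counts are illustrative) showing that, on a 1×1 spatial map, a 1×1 convolution computes exactly what a fully connected layer computes:

```python
import torch
import torch.nn as nn

# A 1x1 feature map with 400 channels (e.g. after reducing the spatial dimension to 1x1)
x = torch.randn(1, 400, 1, 1)

fc   = nn.Linear(400, 4)                   # fully connected: 400 -> 4 classes
conv = nn.Conv2d(400, 4, kernel_size=1)    # 1x1 convolution: 400 -> 4 channels

# Copy the FC weights into the 1x1 convolution so both compute the same function
conv.weight.data = fc.weight.data.view(4, 400, 1, 1).clone()
conv.bias.data = fc.bias.data.clone()

out_fc = fc(x.flatten(1))                  # shape (1, 4)
out_conv = conv(x).flatten(1)              # shape (1, 4)
print(torch.allclose(out_fc, out_conv))    # True -- identical predictions
```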
Converting FC into conv
Network with fully connected layers:
14×14×3 → CONV 5×5 → 10×10×16 → MAX POOL 2×2 → 5×5×16 → FC (400) → FC (400) → y (output layer), softmax for 4 classes
Equivalent fully convolutional network:
14×14×3 → CONV 5×5 → 10×10×16 → MAX POOL 2×2 → 5×5×16 → CONV 5×5 → 1×1×400 → CONV 1×1 → 1×1×400 → CONV 1×1 → 1×1×4
Credit: Sedat Ozer
Question: Which one has fewer parameters? A: FC, B: CONV, C: Similar
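A minimal PyTorch sketch of the two heads in the figure (layer shapes read off the diagram), counting the parameters in each:

```python
import torch.nn as nn

# Shared backbone from the figure: 14x14x3 -> CONV 5x5 (16 filters) -> 10x10x16
# -> MAX POOL 2x2 -> 5x5x16
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5),
    nn.MaxPool2d(2),
)

# Head 1: flatten + fully connected layers (5*5*16 = 400 flattened inputs)
fc_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(400, 400),
    nn.Linear(400, 400),
    nn.Linear(400, 4),
)

# Head 2: the same computation expressed with convolutions
conv_head = nn.Sequential(
    nn.Conv2d(16, 400, kernel_size=5),   # 5x5x16 -> 1x1x400
    nn.Conv2d(400, 400, kernel_size=1),  # 1x1x400 -> 1x1x400
    nn.Conv2d(400, 4, kernel_size=1),    # 1x1x400 -> 1x1x4
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(fc_head), count(conv_head))  # 322404 322404 -- identical
```

With these shapes the two heads contain exactly the same number of weights: the 5×5 convolution over the whole 5×5×16 volume connects to the same 400 inputs as the flattened fully connected layer.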
Let’s introduce non-linearities
We’re going to introduce non-linear functions to transform the features.

Credit: Nielsen
Activation Functions
• Used in the final prediction
• What is the target range?
  • Sigmoid: (0, 1)
  • Tanh: (-1, 1)
• Example: a normalized image target could lie in
  • (-1, 1)
  • (0, 1)
  • (0, ∞)
The sketch below illustrates these output ranges.
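A tiny NumPy sketch (input values are illustrative) showing how sigmoid and tanh squash their inputs into the ranges above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))    # every output lies in (0, 1)
print(np.tanh(z))    # every output lies in (-1, 1)
```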
Binary classification
• Is the target class present or not?
• Option 1: a single output (e.g. a sigmoid giving the probability the class is present)
• Option 2: two outputs, one score per outcome (see the sketch below)
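A minimal NumPy sketch of both options; the scores are illustrative:

```python
import numpy as np

# Option 1: single output, squashed with a sigmoid
score = 1.2                                   # hypothetical network output
p_present = 1.0 / (1.0 + np.exp(-score))      # probability the class is present

# Option 2: two outputs ("present" vs "absent"), normalized with a softmax
scores = np.array([1.2, -0.4])                # hypothetical scores for the two outcomes
exp_scores = np.exp(scores)
p_two = exp_scores / exp_scores.sum()

print(p_present, p_two)
```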
Multi-class
• One prediction for each class
Softmax activation
scores = unnormalized log probabilities of the classes:

$P(Y = k \mid X = x_i) = \frac{e^{s_k}}{\sum_j e^{s_j}}$, where $s = f(x_i; W)$

Exponentiate the scores, then normalize them so they sum to 1:

class   score    exp      normalize
cat      3.2     24.5     0.13
car      5.1    164.0     0.87
frog    -1.7      0.18    0.00
                          (probabilities)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
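A minimal NumPy sketch reproducing the numbers on the slide:

```python
import numpy as np

scores = np.array([3.2, 5.1, -1.7])      # cat, car, frog scores from the slide
exp_scores = np.exp(scores)              # [24.5, 164.0, 0.18]
probs = exp_scores / exp_scores.sum()    # normalize so the values sum to 1
print(probs.round(2))                    # [0.13 0.87 0.  ]
```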


Multi-label
• Multiple classes can be active
• Softmax will not work
• Use a sigmoid activation on each output instead (see the sketch below)
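A minimal NumPy sketch of multi-label prediction with an independent sigmoid per class; the scores are illustrative:

```python
import numpy as np

scores = np.array([2.0, -1.0, 0.5])       # hypothetical per-class scores
probs = 1.0 / (1.0 + np.exp(-scores))     # independent sigmoid per class
predicted = probs > 0.5                   # any subset of classes can be active
print(probs.round(2), predicted)          # [0.88 0.27 0.62] [ True False  True]
```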
Loss Function
• A way to measure how well the network is performing
  • In terms of its predictions
• Network training (optimization)
  • Find the network parameters that minimize the loss
Loss Function
• Cross entropy
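A minimal NumPy sketch of the cross-entropy loss for a single example, reusing the scores from the softmax slide:

```python
import numpy as np

def cross_entropy(scores, true_class):
    """Negative log of the softmax probability assigned to the true class."""
    exp_scores = np.exp(scores - scores.max())   # subtract max for numerical stability
    probs = exp_scores / exp_scores.sum()
    return -np.log(probs[true_class])

scores = np.array([3.2, 5.1, -1.7])              # cat, car, frog
print(cross_entropy(scores, true_class=0))       # loss if the true class is "cat" (~2.04)
```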
Train with Gradient Descent
[Figure: the loss function (evaluating the CNN on the training data) plotted against the model parameters (network weights); gradient descent takes steps a1, a2, a3, …, a_stop downhill toward a minimum]
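A minimal sketch of the idea on a toy one-dimensional loss; in a real CNN the parameters number in the millions and the gradient comes from backpropagation:

```python
# Toy loss: loss(a) = (a - 3)^2, with gradient 2 * (a - 3)
loss = lambda a: (a - 3.0) ** 2
grad = lambda a: 2.0 * (a - 3.0)

a = 0.0                 # initial parameter value (a1 in the figure)
learning_rate = 0.1
for step in range(50):
    a = a - learning_rate * grad(a)   # step downhill along the negative gradient
print(a, loss(a))       # a approaches 3 and the loss approaches 0
```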
Optimization
• SGD
• Adam
• AdaDelta
• …

• Ongoing research
• You can define your own loss function
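A minimal training-step sketch, assuming PyTorch; the model and data are toy stand-ins, but the optimizers shown are the ones listed above:

```python
import torch
import torch.nn as nn

model = nn.Linear(400, 4)                  # stand-in for a CNN
loss_fn = nn.CrossEntropyLoss()

# Any of these optimizers can drive the training loop
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# optimizer = torch.optim.Adadelta(model.parameters())

x = torch.randn(8, 400)                    # toy batch of 8 examples
y = torch.randint(0, 4, (8,))              # toy labels

optimizer.zero_grad()
loss = loss_fn(model(x), y)                # evaluate the loss on the batch
loss.backward()                            # backpropagate gradients
optimizer.step()                           # update the network weights
```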
Network training
Visualizing Convolution
