Domnic Object Detection Basics
Neural Networks: Basics
What is a perceptron?
In the perceptron diagram
● Input vector: The feature vector that is fed
to the neuron. It is usually denoted with an
uppercase x to represent a vector of inputs
(x1, x2, . . ., xn).
● Weights vector: Each input xi is assigned a weight
value wi that represents its importance in
distinguishing between different input
datapoints.
● Neuron function: The calculations performed
within the neuron to modulate the input
signals: the weighted sum and step activation
function.
● Output: Controlled by the type of activation
function you choose for your network.
How does the perceptron learn?
The perceptron uses trial and error to learn from its mistakes. It uses
the weights as knobs by tuning their values up and down until the
network is trained.
The perceptron’s learning logic goes like this:
● The neuron calculates the weighted sum and applies the activation
function to make a prediction ŷ. This is called the feedforward
process:
○ ŷ = activation(Σxi · wi + b)
● It compares the output prediction with the correct label to
calculate the error:
○ error = y - ŷ
● It then updates the weights. If the prediction is too high, it
adjusts the weights to make a lower prediction the next time, and
vice versa.
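The three steps above can be sketched in a few lines of code. The dataset (logical AND), learning rate, and epoch count below are illustrative choices, not from the slides:

```python
def step(z):
    """Step activation: fires 1 if the weighted sum is non-negative."""
    return 1 if z >= 0 else 0

def train_perceptron(data, epochs=20, lr=1.0):
    w = [0.0, 0.0]  # weight "knobs", tuned up and down during training
    b = 0.0         # bias
    for _ in range(epochs):
        for x, y in data:
            # Feedforward: y_hat = activation(sum(xi * wi) + b)
            y_hat = step(sum(xi * wi for xi, wi in zip(x, w)) + b)
            error = y - y_hat  # error = y - y_hat
            # Nudge each weight in the direction that reduces the error
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# Logical AND is linearly separable, so a single perceptron can learn it
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
```

After training, the learned weights classify all four points correctly.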
Multilayer perceptrons
● Linear datasets--The data can be split with a
single straight line.
● Nonlinear datasets--The data cannot be split
with a single straight line. We need more
than one line to form a shape that splits the
data.
Multilayer perceptrons
● A single perceptron works great with simple datasets that can be
separated by a line.
● To split a nonlinear dataset, we need more than one line. This means
we need an architecture that uses tens or hundreds of neurons in our
neural network.
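As a sketch of why extra neurons help, here is a hand-wired 2-2-1 multilayer perceptron that solves XOR, a classic nonlinear dataset no single line can separate. Each hidden neuron draws one line; the output neuron combines them. The specific weights are illustrative, not from the slides:

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_mlp(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # line 1: fires when at least one input is on
    h2 = step(x1 + x2 - 1.5)    # line 2: fires only when both inputs are on
    return step(h1 - h2 - 0.5)  # output: "line 1 but not line 2" = XOR
```

The two hidden neurons carve the plane into the band between the two lines, which is exactly the XOR region.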
Multilayer perceptron
architecture
The main components of the neural network architecture are:
● Hidden layers --The neurons are stacked on top of each other in hidden layers. They are
called “hidden” layers because they sit between the input and the output layer, so their
values are not directly observed
● Weight connections (edges) --Weights are assigned to each connection between the nodes to
reflect the importance of their influence on the final output prediction
● Output layer -- Depending on the setup of the neural network, the final output may be a
real-valued output (regression problem) or a set of probabilities (classification problem).
This is determined by the type of activation function we use in the neurons in the output
layer
How many layers, and how many nodes in each layer?
Cross-entropy: Cross-entropy is
commonly used in classification
problems because it quantifies the
difference between two probability
distributions:
○ E = -Σ yi · log(pi), i = 1, . . ., m
where (y) is the target probability, (p) is the
predicted probability, and (m) is the number
of classes
error = |ŷ - y| = |(w · x) - y|
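Cross-entropy can be computed directly from its definition; the target and predicted distributions below are made-up examples:

```python
import math

def cross_entropy(y, p):
    """E = -sum of yi * log(pi) over the m classes."""
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

y = [0, 1, 0]        # one-hot target: the true class is class 1
p = [0.1, 0.8, 0.1]  # predicted probability distribution
loss = cross_entropy(y, p)  # -log(0.8) ≈ 0.223
```

The loss shrinks as the predicted distribution concentrates on the true class, and grows without bound as the probability assigned to it approaches zero.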
Why do we need an optimization algorithm?
Convolutions!
Just weighted sums of small areas in the image
Convolution:
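A minimal sketch of this idea -- slide a small kernel over the image and take a weighted sum at each position. No padding, stride 1; the image and kernel values are illustrative:

```python
def conv2d(image, kernel):
    """Valid convolution: weighted sum of each kernel-sized patch."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            # Weighted sum of the kh x kw patch anchored at (i, j)
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
edge = [[1, -1]]  # horizontal difference kernel
out = conv2d(image, edge)  # → [[-1, -1], [-1, -1], [-1, -1]]
```

Each output value depends only on a small neighborhood, and the same kernel weights are reused at every position -- the weight sharing that fully connected layers lack.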
Hyperparameters:
What kind of pooling? Average, max, min
How big a stride? Controls downsampling
How big a region? Usually not much bigger than the stride
Maxpooling Layer, 2x2 stride 2

Input (8x8):
-7   6  -1   3   9   9   6  -9
 3  -8   0   7  10   8  -3  10
-4   2  -6   4  -7   5   5   7
-3  -9   1   8  -8   9  -1  -5
-7  10  -9  -5   9  -8  -7  10
-5   5   9   4  10  -8   7   6
-3   8   0   2   2  -3  -2   5
 4  -6   7  -3   1   4  10   0

Output (4x4) -- the maximum of each 2x2 window:
 6   7  10  10
 2   8   9   7
10   9  10  10
 8   7   4  10
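The window-by-window walk-through above can be reproduced in a few lines:

```python
def maxpool(grid, size=2, stride=2):
    """Keep only the maximum of each size x size window."""
    out = []
    for i in range(0, len(grid) - size + 1, stride):
        row = []
        for j in range(0, len(grid[0]) - size + 1, stride):
            row.append(max(grid[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out

# The 8x8 input grid from the slides
grid = [
    [-7,  6, -1,  3,  9,  9,  6, -9],
    [ 3, -8,  0,  7, 10,  8, -3, 10],
    [-4,  2, -6,  4, -7,  5,  5,  7],
    [-3, -9,  1,  8, -8,  9, -1, -5],
    [-7, 10, -9, -5,  9, -8, -7, 10],
    [-5,  5,  9,  4, 10, -8,  7,  6],
    [-3,  8,  0,  2,  2, -3, -2,  5],
    [ 4, -6,  7, -3,  1,  4, 10,  0],
]
pooled = maxpool(grid)
# → [[6, 7, 10, 10], [2, 8, 9, 7], [10, 9, 10, 10], [8, 7, 4, 10]]
```

Note that a 2x2 window with stride 2 halves each spatial dimension, turning the 8x8 map into a 4x4 one.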
(Fully) Connected Layer
The standard neural network layer where every
input neuron connects to every output neuron
Pooling layers:
Used to downsample feature maps, make processing more
efficient
Most common: maxpool; avgpool is sometimes used at the end
Connected layers:
Often used as last layer, to map image features -> prediction
No spatial information
Inefficient: lots of weights, no weight sharing
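A quick parameter count illustrates the inefficiency; the layer sizes below are illustrative, not from the slides:

```python
# A fully connected layer: every flattened input connects to every output
h, w, c = 32, 32, 3        # a small 32x32 RGB feature map
inputs = h * w * c         # flattened: 3072 values
outputs = 256              # a modest fully connected layer
fc_weights = inputs * outputs  # 786,432 weights, none shared

# Compare: a 3x3 convolution with 64 filters over the same 3 channels
conv_weights = 3 * 3 * c * 64  # 1,728 weights, reused at every position
```

The convolution uses hundreds of times fewer weights because the same small kernel is shared across all spatial positions.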
Convnet for Image Classification
Object Localization
● But for localization, to get the bounding
box, we need 4 additional outputs per class.
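The slides only state that 4 outputs are needed; one common (assumed) convention is to predict the box center plus width and height, then convert to corner coordinates for drawing or evaluation:

```python
def box_to_corners(x, y, w, h):
    """Convert a center-based box (x, y, w, h) to corner coordinates
    (x_min, y_min, x_max, y_max)."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)

# box_to_corners(5, 5, 4, 2) → (3.0, 4.0, 7.0, 6.0)
```

Other parameterizations (e.g. predicting the corners directly, or offsets relative to a grid cell) are also used; the network simply learns whichever 4 numbers the training targets encode.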