Neural network intro lecture 4
[Figure: a labeled dataset - each sample carries a label, e.g. Cat or Dog]
Neural network
We can have millions of neurons to analyse and learn the patterns (features) of given data, and to memorize those patterns.
NN Basics and Concepts
Neural Network components
1) Layers
2) Input and output
3) Loss function
4) Optimizer
Layers
[Diagram: input layer (features) → hidden layers → output layer (classes/labels)]
Input & Output
[Diagram: an m x n feature matrix X (m rows of samples, n columns of features) feeding the input layer; the output layer holds the classes, e.g. Class 1 (Setosa) and Class 2 (others)]
The four features of the first row are input to the blue input layer at once.
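As an illustration (my own sketch, not from the slides; the values are made up), here is a 4-feature dataset and how one row enters the input layer:

```python
import numpy as np

# Hypothetical 4-feature dataset (m = 3 samples, n = 4 features),
# Iris-style measurements; the values are invented for illustration.
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.3, 2.5, 5.0, 1.9],
])
y = np.array([0, 0, 1])   # labels: 0 = Setosa, 1 = others

first_row = X[0]          # all four features enter the input layer at once
print(first_row.shape)    # (4,)
```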
Loss Function
4 main components:
1) Neuron
2) Weights and biases
3) Activation function
4) Feedforward
Neuron
[Diagram: an input with weight w1 and bias b1 entering a node that computes a summation S followed by an activation F]
An artificial neuron is referred to as a perceptron.
Weight & Bias
s = w*x + b
Activation function
s = w*x + b
F(s) = 1 / (1 + e^(-s))
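A minimal numpy sketch of this single neuron (my own illustration; the w, x, b values reuse the example from the next slides):

```python
import numpy as np

def neuron(x, w, b):
    """Single neuron: weighted sum followed by sigmoid activation."""
    s = w * x + b                     # summation: s = w*x + b
    return 1.0 / (1.0 + np.exp(-s))   # sigmoid: F(s) = 1 / (1 + e^(-s))

print(neuron(x=1.0, w=0.3, b=-0.3))   # s = 0, so F(s) = 0.5
```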
Feed forward
Forward direction to reach F(s):
s = w*x + b
F(s) = 1 / (1 + e^(-s))
Loss
[Diagram: forward direction through the neuron to reach F(s), then compare F(s) with the target value (label)]
x = 1, w = 0.3, b = -0.3
s = w*x + b = 0.3*1 + (-0.3) = 0
Loss = target value - F(s)
Loss is the difference between the target value and F(s). It is also called the error.
MSE
Loss
[Diagram: forward direction to reach F(s); x = 1, w = 0.3, b = -0.3; compare with the target value (label)]
Let x = 1, 2, 3, 4 with w = 0.3 and b = -0.3. Loss = target value - F(s); the loss column below is the squared error (MSE):

s = w*x + b             target value    loss
0.3*1 + (-0.3) = 0      0               0
0.3*2 + (-0.3) = 0.3    -1              1.69
0.3*3 + (-0.3) = 0.6    -2              6.76
0.3*4 + (-0.3) = 0.9    -3              15.21
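A small sketch reproducing this table (my own code; it follows the slide's numbers, computing the squared error on s):

```python
import numpy as np

w, b = 0.3, -0.3
xs = np.array([1.0, 2.0, 3.0, 4.0])
targets = np.array([0.0, -1.0, -2.0, -3.0])

s = w * xs + b              # forward pass: s = w*x + b
loss = (targets - s) ** 2   # per-sample squared error
print(s)                    # [0.  0.3 0.6 0.9]
print(loss)                 # [ 0.    1.69  6.76 15.21]
print(loss.mean())          # MSE over the four samples
```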
Optimizer
4 main components
1) Backpropagation
2) Optimizer
3) Learning rate
4) Epoch & Accuracy
Back propagation
[Diagram: backward direction from F(s) and the loss, through the neuron, back toward x]
We go backward in an effort to minimize the loss.

We go backward, changing the values of w and b, in an effort to minimize the loss. This is the optimizer: a function that changes w and b so that the loss is zero.

What are w and b so that the loss is zero? w = -1, b = 1:

s = w*x + b             target value    loss
-1*1 + 1 = 0            0               0
-1*2 + 1 = -1           -1              0
-1*3 + 1 = -2           -2              0
-1*4 + 1 = -3           -3              0

Loss = target value - F(s) = 0
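A gradient-descent sketch (my own illustration, not the slide's code) that recovers w = -1, b = 1 for this data:

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0, 4.0])
targets = np.array([0.0, -1.0, -2.0, -3.0])

w, b = 0.3, -0.3   # initial values from the earlier slide
lr = 0.02          # learning rate (assumed value)

for step in range(5000):
    s = w * xs + b                       # forward pass
    error = s - targets                  # backward: residual of squared loss
    w -= lr * (2 * error * xs).mean()    # dLoss/dw
    b -= lr * (2 * error).mean()         # dLoss/db

print(round(w, 3), round(b, 3))          # approaches w = -1.0, b = 1.0
```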
Learning rate
[Diagram: backward direction; the optimizer changes the values of w and b in s = (x*w) + b, driven by the loss against the target value (label)]
The optimizer updates the weights and biases toward zero loss. The learning rate is the rate at which the optimizer changes the weights and biases.
Loss = target value - F(s) = 0
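The update rule behind this, as a generic gradient-descent sketch (my own code, not from the slides): each step moves w and b against the gradient, scaled by the learning rate.

```python
def gd_step(w, b, grad_w, grad_b, learning_rate):
    """One optimizer step: move w and b against the gradient,
    scaled by the learning rate."""
    return w - learning_rate * grad_w, b - learning_rate * grad_b

# Same gradient, two learning rates: only the step size differs.
print(gd_step(0.3, -0.3, grad_w=2.0, grad_b=1.0, learning_rate=0.1))   # (0.1, -0.4)
print(gd_step(0.3, -0.3, grad_w=2.0, grad_b=1.0, learning_rate=0.01))  # (0.28, -0.31)
```

A large learning rate takes big steps and may overshoot the minimum; a small one takes small steps and converges slowly.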
Epoch
[Diagram: forward direction from the input layer (dataset x1...x6) through the neuron to the loss and target value (label), then backward direction through the optimizer]
One epoch consists of one forward pass and then one backward pass, with the optimizer executed once for each sample in the dataset.
Feeding all 6 inputs into the neuron in one cycle is one epoch.
You need to run several epochs until the loss reaches zero.
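An illustrative training loop (my own sketch; the 6 input values and targets are assumptions) with one optimizer update per sample and several epochs:

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # 6 inputs (assumed values)
targets = -xs + 1.0                              # targets consistent with w=-1, b=1
w, b, lr = 0.3, -0.3, 0.01

for epoch in range(2000):        # one epoch = one pass over all 6 samples
    for x, t in zip(xs, targets):
        s = w * x + b            # forward
        grad = 2 * (s - t)       # backward (squared-error gradient)
        w -= lr * grad * x       # optimizer update, once per sample
        b -= lr * grad

print(round(w, 2), round(b, 2))  # approaches w = -1.0, b = 1.0
```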
Now let's add more layers (multilayer)
Adding more neurons
[Diagram: a second neuron computing a summation S2 followed by an activation F]
F(S2) = 1 / (1 + e^(-S2))
Every node has the function F(s).
Adding more layers
[Diagram: input layer (x1, x2) → layer 1 (2 nodes, biases b1, b2) → layer 2 (3 nodes, biases b3, b4, b5) → output layer (2 nodes, target value 1 and target value 2), fully connected with weights w1...w16]
2 inputs. The model has 2 hidden layers: layer 1 has 2 nodes, layer 2 has 3 nodes, and the output layer has two labels.
Every input has a path to every node in the next layer
Every node has a path to every node in the next layer
Every path has a weight, and the values are different
Every node has a different bias (b), except the output nodes
Every node has the same activation function (F)
Every node has a different output F(s)
Every node has to do a summation (s)
F(s) is the activation function
(A numpy sketch of this network follows.)
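A minimal numpy forward pass for this 2 → 2 → 3 → 2 network (my own sketch; the weight and bias values are random placeholders, and per the slide the output nodes carry no bias):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

x = np.array([1.0, 2.0])      # 2 inputs: x1, x2

W1 = np.random.randn(2, 2)    # w1..w4: input -> layer 1 (2 nodes)
b1 = np.random.randn(2)       # biases b1, b2
W2 = np.random.randn(2, 3)    # w5..w10: layer 1 -> layer 2 (3 nodes)
b2 = np.random.randn(3)       # biases b3, b4, b5
W3 = np.random.randn(3, 2)    # w11..w16: layer 2 -> output (2 nodes)

h1 = sigmoid(x @ W1 + b1)     # every node: summation s, then F(s)
h2 = sigmoid(h1 @ W2 + b2)
out = sigmoid(h2 @ W3)        # output nodes have no bias (per the slide)
print(out)                    # two values, one per label
```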
Layer 1
The final value F(s) that comes out of a node is determined by the activation function. In this example we use the sigmoid function as the activation function:
F(s) = 1 / (1 + e^(-s))
Sigmoid vs. Tanh vs. ReLU vs. Leaky ReLU
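Numpy versions of these activations for comparison (my own sketch; the leaky-ReLU slope 0.01 is a common default, assumed here):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))       # squashes to (0, 1)

def tanh(s):
    return np.tanh(s)                     # squashes to (-1, 1)

def relu(s):
    return np.maximum(0.0, s)             # zero for negative inputs

def leaky_relu(s, alpha=0.01):
    return np.where(s > 0, s, alpha * s)  # small slope for negative inputs

s = np.array([-2.0, 0.0, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(s))
```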
Feed forward
[Diagram: forward direction from x1 and x2 through all the S-F nodes to reach the outputs F1(S) and F2(S)]
Loss
[Diagram: forward direction to reach F(s); the outputs are compared with the target values (labels)]
Back propagation
[Diagram: backward direction from F(s) back toward x]
Optimizer
[Diagram: the same 2-2-3-2 network (input layer x1, x2; weights w1...w16; biases b1...b5), traversed backward from the losses at target value 1 and target value 2]
We go backward in an effort to minimize the loss. The optimizer is a function that changes the weights and biases so that the loss is zero.
Optimizer
[Diagram: backward direction; the optimizer changes the values of w1, w2, w3 and b1, b2, b3, driven by the loss against the target value (label)]
Common optimizers:
GradientDescentOptimizer
AdadeltaOptimizer
MomentumOptimizer
AdamOptimizer
FtrlOptimizer
RMSPropOptimizer
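These names match TensorFlow 1.x's tf.train classes; in modern Keras the equivalents live in tf.keras.optimizers. A hedged sketch (assumes TensorFlow is installed; the model definition is purely illustrative):

```python
import tensorflow as tf

# Keras counterparts of the optimizers listed above (assuming the
# lecture refers to TensorFlow's optimizer classes):
optimizers = {
    "GradientDescent": tf.keras.optimizers.SGD(learning_rate=0.01),
    "Adadelta": tf.keras.optimizers.Adadelta(),
    "Momentum": tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    "Adam": tf.keras.optimizers.Adam(),
    "Ftrl": tf.keras.optimizers.Ftrl(),
    "RMSProp": tf.keras.optimizers.RMSprop(),
}

# Illustrative 2-input network like the one on the earlier slide.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(3, activation="sigmoid"),
    tf.keras.layers.Dense(2, activation="sigmoid"),
])
model.compile(optimizer=optimizers["Adam"], loss="mse")
```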
Learning rate
[Diagram: the multilayer network (input layer, layer 1, layer 2, output); forward direction to reach the target value (label), then backward direction through the optimizer; Loss = target value - F]
One epoch consists of one forward pass and then one backward pass, and the optimizer is executed once for all samples in the dataset.
Epoch, batch & iterations
Epoch
Batch
Iteration
Epoch, batch & iterations
Epoch = 40
Num_of_batch (iterations) = 5
Batch_size = 20
(5 batches x 20 samples = a 100-sample dataset per epoch)

Dataset is 1 sample:
Epoch = 4
Num_of_batch = 1
Batch_size = 1
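The relationship between these quantities, as a small sketch (my own code):

```python
def num_batches(dataset_size, batch_size):
    """Iterations per epoch = dataset size / batch size (rounded up)."""
    return -(-dataset_size // batch_size)  # ceiling division

print(num_batches(100, 20))   # 5 iterations per epoch
print(num_batches(1, 1))      # 1 iteration per epoch

epochs, batches = 40, num_batches(100, 20)
print(epochs * batches)       # 200 optimizer iterations in total
```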
[Diagram: the example 2-2-3-2 network again (inputs x1, x2; weights w1...w16; biases b1...b5)]
Layers
How many layers?
How many nodes?
How many inputs?
How many activation functions?
How many classes?
How many weights?
How many biases?
How many optimizers?
How many parameters?
(A worked count for the example network follows.)
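For the example 2 → 2 → 3 → 2 network, a worked count (my own arithmetic, following the slide's convention that the output nodes have no bias):

```python
# Weights: every path between consecutive layers carries one weight.
weights = 2*2 + 2*3 + 3*2   # 4 + 6 + 6 = 16 (w1..w16)

# Biases: one per hidden node, none on the output nodes (per the slides).
biases = 2 + 3              # b1..b5 = 5

print(weights, biases, weights + biases)  # 16 5 21 trainable parameters
```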
Assessing the performance
Dataset is 100 samples: train data (80%) and test data (20%).
Validation phase:
Epoch = 40
Num_of_batch = 5
Batch_size = 20
Each single epoch → we run the train data.
At the end of each single epoch → we run the test data.
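A typical way to make this split, sketched with scikit-learn (assuming it is available; the 80/20 ratio comes from the slide, the data itself is made up):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(100, 4)        # 100 samples, 4 features (illustrative)
y = np.random.randint(0, 2, 100)   # binary labels (illustrative)

# 80% train / 20% test, as on the slide.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print(len(X_train), len(X_test))   # 80 20
```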
Dropout