Neural network intro lecture 4

The document provides an overview of deep learning and neural networks, detailing their components such as layers, input/output, loss functions, and optimizers. It explains the processes of feedforward, backpropagation, and the importance of parameters and hyperparameters in training models. Additionally, it discusses performance assessment, including concepts like overfitting and dropout techniques to improve model accuracy.

Neural Network

What is Deep Learning

[Figure: a dataset of labelled images, e.g. images labelled Cat and Dog]

It is called deep learning when we use a neural network as the model in supervised learning.


What is Deep Learning

Neural network

We can have millions of neurons to analyse and learn the patterns (features) of a given dataset and memorize those patterns.
NN Basics and Concepts
Neural Network components

A neural network is composed of 4 main components:

1) Layers
2) Input and output
3) Loss function
4) Optimizer
Layer
Layers

A node is also known as a neuron (e.g. a layer with 4 neurons).

A layer in which every node connects to every node of the next layer is called a fully connected (dense) layer.


Input & Output

Input layer: features
Output layer: classes/labels
Input & Output
[Figure: an m×n image (pixels up to Xmn) fed into the network, with output classes Class 1 and Class 2]

The whole image is input to the first layer at once.
E.g. if you have 100 images, each image will be inserted into the NN one by one.
Input & Output

[Figure: iris dataset example with features x1 … x4 and output classes Setosa and others]

The four features of the first row are input to the first (blue) layer at once.
Loss Function

4 main components

1) Neuron
2) Weights and biases
3) Activation function
4) Feedforward
Neuron

An artificial neuron is referred to as a perceptron.

[Figure: a single neuron with weight w1, bias b1, summation S and activation F]
Weight & Bias

Input layer → Layer 1

x : input
w : weight
b : bias
F(s) : output

s = w*x + b
Activation function

Input layer → Layer 1

x : input
w : weight
b : bias
F(s) : output

s = w*x + b
F(s) = 1 / (1 + e^(-s))
Feed forward

Input layer → Layer 1

x : input, w : weight, b : bias, F(s) : output

x = 1
w = 0.3
b = -0.3

s = w*x + b = 0.3*1 + (-0.3) = 0
F(s) = 1 / (1 + e^(-s)) = ?
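As a quick check of this feedforward step, here is a minimal Python sketch (the values follow the slide; using the sigmoid from the activation-function slide, s = 0 gives F(s) = 0.5):

    import math

    def sigmoid(s):
        # Activation function F(s) = 1 / (1 + e^(-s))
        return 1.0 / (1.0 + math.exp(-s))

    x, w, b = 1.0, 0.3, -0.3   # values from the slide
    s = w * x + b              # s = w*x + b = 0
    print(s, sigmoid(s))       # 0.0 0.5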
Loss
Forward direction to reach F(s)

x = 1
w = 0.3
b = -0.3
s = w*x + b
F(s) : output
target value : label

Loss = target value – F(s)

Loss is the difference between the target value and F(s). It is also called the error.
MSE (mean squared error) is one common way to measure it.
Loss
Forward direction to reach F(s)

w = 0.3, b = -0.3, s = w*x + b
Let x = 1, 2, 3, 4

w*x + b             s      target value   loss
0.3*1 + (-0.3)      0       0             0
0.3*2 + (-0.3)      0.3    -1             1.68
0.3*3 + (-0.3)      0.6    -2             6.75
0.3*4 + (-0.3)      0.9    -3             15.21

(The loss values here appear to be the squared difference between the target value and s, consistent with MSE.)
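A small sketch that reproduces this table, assuming (as the numbers suggest) that the per-sample loss is the squared difference between the target value and s; the printed losses match the slide's table up to small rounding differences:

    # Reproduce the loss table for w = 0.3, b = -0.3
    w, b = 0.3, -0.3
    xs      = [1, 2, 3, 4]
    targets = [0, -1, -2, -3]

    for x, t in zip(xs, targets):
        s = w * x + b            # weighted sum
        loss = (t - s) ** 2      # squared-error assumption
        print(f"x={x}  s={s:.1f}  target={t}  loss={loss:.2f}")
    # prints losses 0.00, 1.69, 6.76, 15.21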
Optimizer

4 main components

1) Backpropagation
2) Optimizer
3) Learning rate
4) Epoch & Accuracy
Back propagation
Backward direction

We go backward in an effort to minimize the loss.

[Figure: the single neuron (x, w, b, s, F) with the loss computed from F(s) and the target value (label)]

Loss = target value – F(s)

Optimizer (Reducing the loss)
Backward direction (change in values)

We go backward in an effort to minimize the loss using an optimizer. The optimizer is a function that changes w and b so that the loss is zero.

What are w and b so that the loss is zero?  Answer: w = -1, b = 1

s = w*x + b
w*x + b             s      target value   loss
-1*1 + 1            0       0             0
-1*2 + 1           -1      -1             0
-1*3 + 1           -2      -2             0
-1*4 + 1           -3      -3             0

Loss = target value – F(s) = 0
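As a toy illustration of what the optimizer does (not the lecture's exact algorithm), the sketch below repeatedly nudges w and b in the direction that lowers a squared loss on the same four samples; it converges to the w = -1, b = 1 solution shown in the table:

    # Toy gradient-descent optimizer for s = w*x + b on the slide's data
    xs      = [1, 2, 3, 4]
    targets = [0, -1, -2, -3]
    w, b, lr = 0.3, -0.3, 0.05       # start from the earlier values; lr = learning rate

    for _ in range(2000):
        dw = db = 0.0
        for x, t in zip(xs, targets):
            err = (w * x + b) - t    # gradient of 0.5*(s - t)^2 with respect to s
            dw += err * x
            db += err
        w -= lr * dw / len(xs)       # move w toward lower loss
        b -= lr * db / len(xs)       # move b toward lower loss

    print(round(w, 3), round(b, 3))  # approximately -1.0 and 1.0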
Learning rate
Backward direction (change in values)

The optimizer updates the weight and bias toward zero loss. The learning rate is the rate at which the optimizer changes the weights and biases.

s = w*x + b
Loss = target value – F(s) = 0
Epoch
Forward direction to reach the target value, then backward direction

One epoch consists of one forward pass and then one backward pass, and the optimizer is executed once for each sample in the dataset.

[Figure: a dataset of 6 inputs (x1 … x6) fed into the neuron, with the loss, target value (label) and optimizer in the loop]

Fetching all 6 inputs into the neuron in one cycle is one epoch.
You need to run several epochs until the loss is zero.
Now let's add more layers (multilayer)
Adding more neurons

[Figure: two inputs x1, x2 connected to two neurons, with weights w1 … w4, biases b1, b2, sums S1, S2 and activations F(S1), F(S2)]

w1, w2, …, w4 are weights (every path has a weight).
Every input must have a path to every neuron (node).
b1 and b2 are biases (every node has a bias).
Every node has an activation function F(s):
F(S1) = 1 / (1 + e^(-S1))
F(S2) = 1 / (1 + e^(-S2))
Adding more layers
Input → layer 1 → layer 2 → output

[Figure: a fully connected network with 2 inputs (x1, x2), 2 hidden layers and an output layer, with weights w1 … w16 and biases b1 … b5]

The model has 2 inputs and 2 hidden layers.
Layer 1 has 2 nodes, layer 2 has 3 nodes, and the output layer has two labels (target value 1 and target value 2).

Every input has a path to every node.
Every node has a path to every node of the next layer.
Every path has a weight, and the values are different.
Every node has a different bias (b), except the output nodes.
Every node has the same activation function (F).
Every node has a different output F(s).
Every node has to do the summation (s).
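A minimal sketch of the forward pass through this 2 → 2 → 3 → 2 model (the weight and bias values here are random, purely for illustration; the slide's convention that the output nodes have no bias is kept):

    import math, random

    def sigmoid(s):
        return 1.0 / (1.0 + math.exp(-s))

    def dense(inputs, weights, biases):
        # One fully connected layer: every input has a weighted path to every node,
        # each node sums its inputs, adds its bias and applies F(s)
        return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
                for ws, b in zip(weights, biases)]

    def random_layer(n_in, n_out, use_bias=True):
        ws = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        bs = [random.uniform(-1, 1) if use_bias else 0.0 for _ in range(n_out)]
        return ws, bs

    random.seed(0)
    w1, b1 = random_layer(2, 2)                  # layer 1: 2 nodes
    w2, b2 = random_layer(2, 3)                  # layer 2: 3 nodes
    w3, b3 = random_layer(3, 2, use_bias=False)  # output layer: 2 nodes, no bias

    x = [1.0, 0.5]                               # two inputs x1, x2
    out = dense(dense(dense(x, w1, b1), w2, b2), w3, b3)
    print(out)                                   # two outputs to compare with target values 1 and 2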
F(s) is the activation function

The final value F(s) that comes out of a node is determined by the activation function. In this example we use the sigmoid function as the activation function.

Sigmoid function: F(s) = 1 / (1 + e^(-s))

Sigmoid vs. Tanh vs. ReLU vs. Leaky ReLU
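For comparison, minimal sketches of these four activation functions (the 0.01 negative slope for Leaky ReLU is a commonly used default assumed here, not taken from the slide):

    import math

    def sigmoid(s):
        return 1.0 / (1.0 + math.exp(-s))    # squashes s into (0, 1)

    def tanh(s):
        return math.tanh(s)                  # squashes s into (-1, 1)

    def relu(s):
        return max(0.0, s)                   # keeps positives, zeroes out negatives

    def leaky_relu(s, slope=0.01):
        return s if s > 0 else slope * s     # small negative slope instead of zero

    for s in (-2.0, 0.0, 2.0):
        print(s, sigmoid(s), tanh(s), relu(s), leaky_relu(s))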
Feed forward
From x, forward direction to reach F(s)

[Figure: the dataset inputs x1, x2 flow through layer 1 and layer 2 to produce the two outputs F1(S) and F2(S)]
Loss
Forward direction to reach F(s)

[Figure: the same network; each output F is compared with its target value (label)]

Loss = target value – F
Every path has a loss; we take the average of the losses.
Back propagation
From F(s), backward direction to reach x

We go backward in an effort to minimize the loss by changing the values of the weights and biases.

[Figure: the same network, with weights w1 … w16 and biases b1 … b5, traversed backward from the outputs (target value 1 and 2) toward the inputs]
Optimizer
Backward direction

We go backward in an effort to minimize the loss. The optimizer is a function that changes the weights and biases so that the loss is zero.

[Figure: the same network, with weights w1 … w16 and biases b1 … b5]
Optimizer
Backward direction (change in values)

[Figure: a smaller network with input x1, weights w1 … w3 and biases b1 … b3, whose output is compared with the target value (label) to give the loss]

The optimizer works at every path.
Objective: Loss = 0
Types of optimizer

GradientDescentOptimizer

AdadeltaOptimizer

MomentumOptimizer

AdamOptimizer

FtrlOptimizer

RMSPropOptimizer
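These names follow the TensorFlow 1.x tf.train API. As a rough, hedged sketch of how the equivalents are chosen in TensorFlow 2.x / Keras (assuming TensorFlow 2.x is installed; the counterparts live under tf.keras.optimizers):

    import tensorflow as tf

    # Rough Keras counterparts of the optimizers listed above
    optimizers = {
        "gradient_descent": tf.keras.optimizers.SGD(learning_rate=0.01),
        "momentum":         tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
        "adadelta":         tf.keras.optimizers.Adadelta(),
        "adam":             tf.keras.optimizers.Adam(),
        "ftrl":             tf.keras.optimizers.Ftrl(),
        "rmsprop":          tf.keras.optimizers.RMSprop(),
    }
    # An optimizer is then attached to a model, e.g.:
    # model.compile(optimizer=optimizers["adam"], loss="mse")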
Learning rate
Backward direction

The optimizer updates the weights and biases toward zero loss, changing the values according to the learning rate.

[Figure: the small network with input x1, weights w1 … w3 and biases b1 … b3]

Loss = target value – F
Learning rate: the rule that the optimizer has to follow in changing w and b.
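A one-weight sketch of the role the learning rate plays in the update (the gradient value here is just an example number, not from the slide):

    # The optimizer moves a weight against its gradient;
    # the learning rate controls how big that step is.
    def update(w, grad, learning_rate):
        return w - learning_rate * grad

    w, grad = 0.3, 1.3                   # example weight and gradient of the loss w.r.t. w
    for lr in (0.01, 0.1, 1.0):
        print(lr, update(w, grad, lr))   # a bigger learning rate gives a bigger change in w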
Epoch
Forward direction to reach the target value, then backward direction

One epoch consists of one forward pass and then one backward pass, and the optimizer is executed once for all samples in the dataset.

[Figure: the dataset (x1, x2) flows through layer 1, layer 2 and the output layer; the loss against the target value (label) drives the backward pass]

Loss = target value – F
Epoch, batch & iterations
Epoch

Batch

Iteration
Epoch, batch & iterations

This approach is called "mini-batch gradient descent".


Epoch, batch & iterations

Dataset is 100 samples

Epoch = 40
Num_of_batch (iterations) = 5
Batch_size = 20

for i = 1 to Epoch:
    for j = 1 to Num_of_batch:
        compute the loss and optimize over one batch of Batch_size samples
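A minimal Python sketch of this loop (train_step is a hypothetical placeholder standing in for "compute the loss and optimize"):

    # Mini-batch loop: 100 samples, 40 epochs, 5 batches of 20 samples
    dataset    = list(range(100))                 # stand-in for 100 samples
    epochs     = 40
    batch_size = 20
    num_of_batch = len(dataset) // batch_size     # 5 iterations per epoch

    def train_step(batch):
        # Hypothetical: compute the loss on this batch and let the
        # optimizer update the weights and biases.
        pass

    for i in range(epochs):
        for j in range(num_of_batch):
            train_step(dataset[j * batch_size:(j + 1) * batch_size])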
Epoch, batch & iterations

What is happening during an epoch?

Dataset is 1 sample

Epoch = 4
Num_of_batch = 1
Batch_size = 1

for i = 1 to Epoch:
    for j = 1 to Num_of_batch:
        compute the loss and optimize over one batch of Batch_size samples

After 4 epochs the optimizer achieves 0 error.
Parameter and hyperparameter

Parameter: any value that is changed by the computer.
These are the weights and biases, which are automatically updated by the optimizer.

Hyperparameter: any value that is changed by a human.
These are the learning rate, epochs, batch size, number of layers, number of nodes and dropout rate.
Tutorial
How many parameters in this model
[Figure: the two-hidden-layer model from the earlier slides, with inputs x1, x2, weights w1 … w16 and biases b1 … b5]

How many layers?
How many nodes?
How many inputs?
How many activation functions?
How many classes?
How many weights?
How many biases?
How many optimizers?
How many parameters?
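One way to check such counts is a small sketch that tallies weights and biases for any fully connected model of this shape (it follows the slide's convention that the output nodes have no bias; this model is 2 inputs → 2 nodes → 3 nodes → 2 outputs):

    # Count weights and biases for a fully connected network
    def count_parameters(layer_sizes, output_bias=False):
        # one weight per path between consecutive layers
        weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
        biases = sum(layer_sizes[1:-1])           # one bias per hidden node
        if output_bias:
            biases += layer_sizes[-1]             # output biases, if the model uses them
        return weights, biases

    w, b = count_parameters([2, 2, 3, 2])
    print(w, b, w + b)                            # weights, biases, total parameters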
Assessing performance
Assessing the performance
Train data (80%)   |   Test data (20%)

Dataset is 100 samples
Epoch = 40
Num_of_batch = 5
Batch_size = 20

Validation phase
Each single epoch → we run the train data.
At the end of each single epoch → we run the test data.

for i = 1 to Epoch:
    for j = 1 to Num_of_batch:
        compute the loss and optimize over one batch of Batch_size samples

Accuracy is the percentage of right predictions over the number of samples in the test data. It is used during validation (at the end of every epoch) or during the testing phase (at the end of all epochs).

Loss is measured during each single epoch (during training).
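A sketch of this split and loop (80/20 split; predict and train_step are hypothetical placeholders for the model's forward pass and the optimizer step):

    # 80/20 split, mini-batch training, validation at the end of every epoch
    dataset = [(i, i % 2) for i in range(100)]          # 100 (sample, label) stand-ins
    train_data, test_data = dataset[:80], dataset[80:]
    epochs, batch_size = 40, 20

    def predict(x):         # hypothetical forward pass through the network
        return x % 2

    def train_step(batch):  # hypothetical: compute the loss and run the optimizer
        pass

    for epoch in range(epochs):
        for j in range(0, len(train_data), batch_size):
            train_step(train_data[j:j + batch_size])
        # validation: accuracy = right predictions / number of test samples
        correct = sum(predict(x) == y for x, y in test_data)
        print(epoch, correct / len(test_data))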


Assessing the performance
Train data (80%)   |   Test data (20%)

(Same setup as the previous slide: 100 samples, Epoch = 40, Num_of_batch = 5, Batch_size = 20; each epoch trains on the train data and is validated on the test data.)

Overfitting is when the loss in the validation phase is much bigger than in the training phase.

Underfitting is simply when the loss is much bigger during the training phase.
Assessing the performance
Overfitting is when training results are very good but the validation/testing results are a bit worse.

Dropout

Randomly pick nodes and disable them.

We give every node a probability of being alive.

E.g. say the probability is 0.5. Then every node will be 50% alive or 50% dead.

Dropout is always related to overcoming overfitting.
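A minimal sketch of the idea: during training, a random keep/drop decision is applied to each node's output (the 0.5 probability follows the slide's example):

    import random

    def dropout(outputs, keep_prob=0.5):
        # Each node's output is kept with probability keep_prob, otherwise zeroed (dropped)
        return [o if random.random() < keep_prob else 0.0 for o in outputs]

    layer_outputs = [0.8, 0.1, 0.6, 0.3]
    print(dropout(layer_outputs))   # on average, half the nodes are disabled on each pass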


Thank you
