Deep Learning with Keras and TensorFlow

Artificial Neural Network
Learning Objectives

By the end of this lesson, you will be able to:

Explore neural networks

Perform weight updates using different activation functions

Deduce and implement the backpropagation algorithm in Python

Optimize the performance of your neural network using L2 regularization and dropout layers
Biological Neuron vs. Artificial Neuron
Biological Neurons

(Figure: a biological neuron, showing dendrites and the cell nucleus on the input-signal side, and the axon, myelin sheath, and axon terminals on the output-signal side.)

▪ Neurons are interconnected nerve cells that build the nervous system and transmit information throughout the body.
▪ Dendrites are extensions of a nerve cell that receive impulses from other neurons.
▪ The cell nucleus stores the cell's hereditary material and coordinates the cell's activities.
▪ The axon is a nerve fiber used by neurons to transmit impulses.
▪ A synapse is the connection between two nerve cells.
Rise of Artificial Neurons

(Figure: a biological neuron alongside its artificial counterpart.)

▪ Researchers Warren McCulloch and Walter Pitts published the first concept of a simplified brain cell in 1943.
▪ The nerve cell was treated as a simple logic gate with binary outputs.
▪ The dendrites can be thought of as processing the input signal against a threshold: if the signal exceeds the threshold, an output signal is generated.
Definition of Artificial Neuron

An artificial neuron is analogous to a biological neuron: it takes inputs, weights each of them separately, sums them up, and passes the sum through a transfer function to produce a nonlinear output.
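To make the definition concrete, here is a minimal sketch in Python (the sigmoid is chosen here as the transfer function, and the input and weight values are purely illustrative):

```python
import math

def artificial_neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid transfer function."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias   # summation
    return 1.0 / (1.0 + math.exp(-s))                        # nonlinear output in (0, 1)

# Example: two inputs with illustrative weights and bias
print(artificial_neuron([0.5, 0.8], [0.4, -0.2], bias=0.1))  # approximately 0.53
```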
Biological Neurons and Artificial Neurons: A Comparison

Biological Neuron      Artificial Neuron
Cell nucleus           Node
Dendrites              Input
Synapse                Weights or interconnections
Axon                   Output
Neural Networks
Perceptron

▪ A single-layer neural network
▪ Consists of weights, a summation processor, and an activation function

(Figure: a perceptron. The inputs 1, X1, X2, …, Xm are multiplied by the weights W0, W1, W2, …, Wn, summed into a net input, and passed through an activation function to produce the output.)
Perceptron: The Main Processing Unit

(Figure: the inputs are weighted, combined by the summation function into S, and passed through the activation function f to give the output f(S).)

Note: The inputs X and weights W are real values.


Weights and Biases in a Perceptron

While the weights determine the slope of the decision boundary, the bias shifts the output line to the left or right.

(Figure: the same perceptron structure as above, with a bias term added to the weighted sum S before the activation function f.)
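As a rough sketch (the weight, bias, and input values below are made up for illustration), the perceptron computes the weighted sum plus the bias and passes it through a step activation; changing the bias shifts where the output flips:

```python
def perceptron(x, w, b):
    """Perceptron: weighted sum of inputs plus bias, followed by a step activation."""
    s = sum(xi * wi for xi, wi in zip(x, w)) + b  # net input S
    return 1 if s > 0 else 0                      # step activation f(S)

# The bias shifts the decision boundary: with w = [1.0], the output flips
# from 0 to 1 at x = 0.5 when b = -0.5, but only at x = 2.0 when b = -2.0.
print(perceptron([1.0], [1.0], b=-0.5))  # 1
print(perceptron([1.0], [1.0], b=-2.0))  # 0
```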
Activation Functions

Activation functions squash a neuron's inputs into an output, for example by applying a threshold value or mapping the net input into a fixed range.


Feedforward Nets

▪ Information flow is unidirectional


▪ Information is distributed
▪ Information processing is parallel
The XOR Problem

A perceptron can learn anything that it can represent, i.e., anything separable with a hyperplane.
However, it cannot represent the exclusive OR (XOR) function, since XOR is not linearly separable.

(Figure: the four XOR input points plotted in the x1-x2 plane with coordinates of -1 and 1; no single straight line separates the two output classes.)
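To see why (a standard argument, stated here with 0/1 inputs for simplicity), suppose a single perceptron computed XOR with weights $w_1, w_2$ and bias $b$. The four input cases would require:

$$
\begin{aligned}
w_1 \cdot 0 + w_2 \cdot 0 + b &\le 0 && (0 \oplus 0 = 0)\\
w_1 \cdot 1 + w_2 \cdot 0 + b &> 0 && (1 \oplus 0 = 1)\\
w_1 \cdot 0 + w_2 \cdot 1 + b &> 0 && (0 \oplus 1 = 1)\\
w_1 \cdot 1 + w_2 \cdot 1 + b &\le 0 && (1 \oplus 1 = 0)
\end{aligned}
$$

Adding the two middle inequalities gives $w_1 + w_2 + 2b > 0$, while adding the first and last gives $w_1 + w_2 + 2b \le 0$, a contradiction. Hence no single hyperplane separates the XOR classes.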
Multilayer Perceptrons

(Figure: a multilayer perceptron with input nodes, one or more layers of hidden units (hidden layers), and output neurons. The most common output function is the sigmoid Ψ(a), which rises smoothly from 0 to 1 as a increases.)
Perceptron

Problem Scenario: You have been hired by one of the major AI giants, which plans to build the best image classifier model available to date. In the first phase of model development, the input is drawn from the MNIST dataset. MNIST is one of the most common datasets used for image classification and is accessible from many different sources. It is a subset of a larger set available from NIST and contains 60,000 training images and 10,000 testing images of handwritten digits written by American Census Bureau employees and American high school students.
Objective:
Build a perceptron-based classification model to:
▪ Classify the handwritten digits properly
▪ Make predictions
▪ Evaluate model efficiency

Access: Click the Practice Labs tab on the left panel. Now, click on the START LAB button and wait
while the lab prepares itself. Then, click on the LAUNCH LAB button. A full-fledged Jupyter Lab
opens, which you can use for your hands-on practice and projects.
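A minimal sketch of what such a perceptron-style classifier could look like in Keras is given below (this assumes TensorFlow 2.x; the lab's actual notebook, layer choices, and hyperparameters may differ):

```python
import tensorflow as tf

# Load MNIST: 60,000 training and 10,000 test images of 28x28 handwritten digits.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixel values to [0, 1]

# A single dense softmax layer acts as a multi-class perceptron over the flattened pixels.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, batch_size=32)   # classify the handwritten digits
model.evaluate(x_test, y_test)                          # evaluate model efficiency
predictions = model.predict(x_test[:5])                 # make predictions
```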
Backpropagation
Learning Networks

▪ Learning networks learn from the input data or from training examples and generalize from the learned data.
▪ By training the network, it tries to find a line, plane, or hyperplane that can correctly separate two classes by adjusting its weights and biases.
▪ In effect, the network configures itself to solve a problem.
The Backpropagation Algorithm

1. Provide the input: let X be the input and Y be the output.
2. Initialize the weights and the threshold: let Wᵢ be the initial weight.
3. Calculate the output.
4. Update the weights: Wᵢ(t+1) = Wᵢ(t) + n(d − y)X, where d is the desired output and y is the actual output.
5. Repeat the previous steps: they are iterated continuously, changing the value of n (the learning rate), until an acceptable output is obtained.
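The update rule above can be sketched in a few lines of Python (the training data, here the AND function, and the learning rate n = 0.1 are illustrative assumptions):

```python
def step(s):
    return 1 if s > 0 else 0

# Training data for the AND function: ([x1, x2], desired output d)
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

w = [0.0, 0.0]   # initial weights W_i
b = 0.0          # threshold / bias
n = 0.1          # learning rate

for epoch in range(20):
    for x, d in data:
        y = step(w[0] * x[0] + w[1] * x[1] + b)              # actual output
        # W_i(t+1) = W_i(t) + n * (d - y) * X_i
        w = [wi + n * (d - y) * xi for wi, xi in zip(w, x)]
        b = b + n * (d - y)

print([step(w[0] * x[0] + w[1] * x[1] + b) for x, _ in data])  # [0, 0, 0, 1]
```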
The Error Landscape

The objective is to minimize the SSE (sum of squared errors):

SSE = Σᵢ (tᵢ − zᵢ)², where tᵢ is the target output and zᵢ is the network's output.

(Figure: the error landscape, plotting the SSE against the weight values.)
Deriving a Gradient Descent or Ascent Algorithm

The idea of the algorithm is to decrease the overall error (or other objective function) each time a weight is changed.

Look at the local gradient: the direction of largest change.

Take a step in that direction, with the step size proportional to the gradient.

Following the gradient tends to yield much faster convergence: gradient ascent climbs toward a maximum, and gradient descent descends toward a minimum.
Gradient Ascent: Step 01
Initialize a random starting point.

Gradient Ascent: Step 02
Take a step in the direction of the largest increase.

Gradient Ascent: Step 03
Repeat.

Gradient Ascent: Step 04
The next step is lower, so stop.

Gradient Ascent: Step 05
Reduce the step size to "hone in".

Gradient Ascent: Step 06
Converge to a (local) maximum.

(Each step is illustrated by a plot of the function value against x.)
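A minimal sketch of these steps in Python follows (the objective function, starting point, and step-size schedule are illustrative assumptions):

```python
def f(x):
    return -(x - 2.0) ** 2 + 3.0     # example function with a maximum at x = 2

def grad(x):
    return -2.0 * (x - 2.0)          # its derivative

x = -1.0        # Step 01: initialize a (random) starting point
step = 0.4      # initial step-size factor

for _ in range(50):
    new_x = x + step * grad(x)       # Step 02: move in the direction of largest increase
    if f(new_x) < f(x):              # Step 04: the next step would be lower...
        step *= 0.5                  # Step 05: ...so reduce the step size to "hone in"
    else:
        x = new_x                    # Step 03: otherwise accept the step and repeat

print(x, f(x))                       # Step 06: converges near the (local) maximum at x = 2
```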
The Learning Rate

▪ The learning rate controls the size of the changes made to the weights and biases in order to reduce the error.
▪ It scales how strongly the weights and biases are adjusted in response to how the error changes when they are changed by a unit (i.e., the gradient).

Note: Generally, a learning rate of 0.01 is a safe bet.


Epoch

▪ One epoch consists of one full pass over the training data.
▪ A commonly preferred value for the number of training epochs is 1,000.
Backpropagation: Example
A Feed Forward Network

Consider a feedforward network with two inputs, two hidden neurons (h1, h2), and two output neurons (o1, o2):

▪ Inputs: i1 = 0.05, i2 = 0.10
▪ Input-to-hidden weights: w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30, with hidden-layer bias b1 = 0.35
▪ Hidden-to-output weights: w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55, with output-layer bias b2 = 0.60
▪ Target outputs: o1 = 0.01, o2 = 0.99
Forward Pass

Compute the net input to h1, then squash it using the logistic function to get the output of h1.


Forward Pass

Carrying out the same process for o2, we get its output.


Calculating Total Error
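As a sketch of the forward pass and the total-error calculation just described (using the network values listed earlier and the 0.5·(target − output)² error convention, which matches the error value quoted later in this lesson):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))   # logistic "squashing" function

i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
t1, t2 = 0.01, 0.99                      # target outputs

# Hidden layer
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)            # approx. 0.5933
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)            # approx. 0.5969

# Output layer
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)    # approx. 0.7514
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)    # approx. 0.7729

# Total error: half the sum of squared differences from the targets
E_total = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2
print(E_total)                                      # approx. 0.298371
```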
The Backward Pass

Let us consider 𝑊5:


Weight Updation

Update the weight: to decrease the error, we subtract this value from the current weight.
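As a sketch of this calculation for w5 (continuing the snippet above; the learning rate of 0.5 is an assumption, since its value is not stated in this section):

```python
# Chain rule for w5: how the total error changes as w5 changes.
# dE/dw5 = dE/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5
dE_dout_o1 = out_o1 - t1                      # derivative of 0.5 * (t1 - out_o1)^2
dout_o1_dnet_o1 = out_o1 * (1 - out_o1)       # derivative of the sigmoid
dnet_o1_dw5 = out_h1                          # since net_o1 = w5*out_h1 + w6*out_h2 + b2
dE_dw5 = dE_dout_o1 * dout_o1_dnet_o1 * dnet_o1_dw5   # approx. 0.0822

lr = 0.5                                      # assumed learning rate
w5_new = w5 - lr * dE_dw5                     # subtract from the current weight: approx. 0.3589
```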
Weight Updation

(Figure: the network diagram repeated with its original weight values.)

Note: While carrying out the backpropagation mechanism for the remaining weights, we use the original weights, not the updated weights.
Hidden Layer Weight Assignment

(Figure: the portion of the network around weight w1, showing inputs i1 and i2 feeding hidden neurons h1 and h2, with h1's net input and output highlighted.)
Hidden Layer Weight Assignment

𝑊1 can now be updated. The same needs to be repeated for 𝑊2, 𝑊3, and 𝑊4.

Initially the error was 0.298371109. After the first iteration of the backpropagation algorithm, the error dropped to 0.291027924. The error can be reduced significantly by repeating this process 10,000 times or more.
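For completeness, here is a compact sketch that repeats the full forward and backward pass on this network (again assuming a learning rate of 0.5, fixed biases, and the 0.5·(target − output)² error; under those assumptions it reproduces the error values quoted above):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i = [0.05, 0.10]                       # inputs
t = [0.01, 0.99]                       # targets
w_h = [[0.15, 0.20], [0.25, 0.30]]     # w1..w4: one row of weights per hidden neuron
w_o = [[0.40, 0.45], [0.50, 0.55]]     # w5..w8: one row of weights per output neuron
b1, b2, lr = 0.35, 0.60, 0.5

for it in range(10000):
    # Forward pass
    h = [sigmoid(w[0] * i[0] + w[1] * i[1] + b1) for w in w_h]
    o = [sigmoid(w[0] * h[0] + w[1] * h[1] + b2) for w in w_o]
    error = sum(0.5 * (t[k] - o[k]) ** 2 for k in range(2))
    if it <= 1:
        print(it, error)               # iteration 0: ~0.298371, iteration 1: ~0.291028

    # Backward pass (computed from this iteration's original weights)
    delta_o = [(o[k] - t[k]) * o[k] * (1 - o[k]) for k in range(2)]
    delta_h = [sum(delta_o[k] * w_o[k][j] for k in range(2)) * h[j] * (1 - h[j])
               for j in range(2)]

    # Weight updates
    w_o = [[w_o[k][j] - lr * delta_o[k] * h[j] for j in range(2)] for k in range(2)]
    w_h = [[w_h[j][m] - lr * delta_h[j] * i[m] for m in range(2)] for j in range(2)]

print(error)                           # after 10,000 iterations the error is very small
```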
Activation Functions
Activation Functions

Nonlinearities are needed to learn complex (nonlinear) representations of the data; otherwise, the neural network would just compute a linear function of its inputs.
Activation Functions

Sigmoid

▪ Takes a real-valued number and squashes it into the range 0 to 1
▪ Sigmoid neurons saturate and kill gradients, so the network will barely learn
Activation Functions

Tanh

▪ Takes a real-valued number and squashes it into the range −1 to 1
▪ Like the sigmoid, tanh neurons saturate
▪ Unlike the sigmoid, the output is zero-centered
Activation Functions

ReLU

▪ Takes a real-valued number and thresholds it at zero
▪ Most deep networks use ReLU nowadays
▪ Trains much faster
▪ Helps prevent the vanishing gradient problem
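A small sketch of these three activations and their derivatives in Python (NumPy is assumed; this is also useful background for the lab that follows, which updates weights using tanh gradients):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # output in (0, 1)

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)                   # vanishes for large |x| (saturation)

def tanh(x):
    return np.tanh(x)                    # output in (-1, 1), zero-centered

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2         # also saturates for large |x|

def relu(x):
    return np.maximum(0.0, x)            # thresholds the input at zero

def relu_grad(x):
    return (x > 0).astype(float)         # gradient is 1 for positive inputs, 0 otherwise

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```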
Backpropagation

Problem Scenario: The backpropagation algorithm plays a key role in training a feedforward artificial neural network. It models a given function by modifying the network's internal weights so that the output neurons produce the expected values.

Objective:
Build a neural network which takes tanh as the activation function and updates weights with
respect to the tanh gradients.

Access: Click the Practice Labs tab on the left panel. Now, click on the START LAB button and wait
while the lab prepares itself. Then, click on the LAUNCH LAB button. A full-fledged Jupyter Lab
opens, which you can use for your hands-on practice and projects.
Activation Function

Problem Scenario: Neural networks are the crux of deep learning, a field which has practical applications in many different areas. They become more accurate and effective with multiple layers. Building a multilayered neural network from scratch here is not feasible; however, developing the source code for a shallow neural network will help you understand the functioning of deep neural networks much better.

Objective: Write a simple neural network in Python using the sigmoid activation function.

Access: Click the Practice Labs tab on the left panel. Now, click on the START LAB button and wait
while the lab prepares itself. Then, click on the LAUNCH LAB button. A full-fledged Jupyter Lab
opens, which you can use for your hands-on practice and projects.
Defining Elements

Import the necessary libraries and define a class named NeuralNetwork
Initialize Weights and Assign Activation Functions
Weight Adjustment

Note: The above functions are defined within the class named NeuralNetwork.
Initialize the think Function
Initialize the Neural Network
Train the Neural Network
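The following is a minimal sketch consistent with the slide headings above: a NeuralNetwork class with a sigmoid activation, a weight-adjustment step, a think function, and a training loop. The exact class layout, method names other than think, and the toy training data are assumptions.

```python
import numpy as np

class NeuralNetwork:
    def __init__(self):
        np.random.seed(1)
        # Initialize the weights of a single neuron with 3 inputs, in the range [-1, 1]
        self.weights = 2 * np.random.random((3, 1)) - 1

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))          # activation function

    def sigmoid_derivative(self, x):
        return x * (1 - x)                   # gradient, given the sigmoid output

    def think(self, inputs):
        # Forward pass: weighted sum passed through the activation function
        return self.sigmoid(np.dot(inputs, self.weights))

    def train(self, inputs, outputs, iterations):
        for _ in range(iterations):
            prediction = self.think(inputs)
            error = outputs - prediction
            # Weight adjustment: error scaled by the sigmoid gradient, projected onto the inputs
            adjustment = np.dot(inputs.T, error * self.sigmoid_derivative(prediction))
            self.weights += adjustment

# Initialize the neural network and train it on a tiny illustrative dataset
nn = NeuralNetwork()
X = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
y = np.array([[0, 1, 1, 0]]).T
nn.train(X, y, 10000)
print(nn.think(np.array([1, 0, 0])))         # output close to 1
```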
Regularization
Deep Neural Networks

When a neural network contains more than one hidden layer, it becomes a deep neural network.

(Figure: a network with an input layer, multiple hidden layers, and an output layer.)

The Overfitting Problem

(Figure: training error and validation error plotted against epochs; the validation error begins to rise where the model starts to overfit, which is the point at which an early stopping algorithm halts training.)

A learned hypothesis may fit the training data, including its outliers (noise), very well but fail to generalize to test data.
Dealing with the Overfitting Problem

L2 Regularization

▪ Regularization adds a penalty on big weights to the overall cost function
▪ The weight decay value determines how dominant the regularization is during gradient computation
▪ A big weight decay coefficient implies a big penalty for big weights

(Equation: total cost = original cost function + regularization term, where the regularization term is the weight decay coefficient times the sum of the squared weights.)
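In Keras (TensorFlow 2.x), this penalty can be attached to a layer through its kernel_regularizer argument; the layer sizes and the weight decay coefficient of 0.01 below are illustrative assumptions:

```python
import tensorflow as tf

# Each Dense layer adds 0.01 * sum(w^2) over its weights to the overall loss.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,),
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(10, activation="softmax",
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
])
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
```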


Dealing with the Overfitting Problem

L2 Regularization

▪ A complex nonlinear decision function is reduced toward a linear function after L2 regularization is applied, thus reducing the complexity due to the hidden layers
Dealing with the Overfitting Problem

Dropout Regularization

▪ Randomly drops units (along with their connections) during training
▪ Each unit is retained with a fixed probability p, independent of the other units, where 0 < p < 1
▪ The hyperparameter p has to be chosen (tuned)
▪ At test time the entire network is active, and the weights are scaled by a factor of p
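A minimal Keras sketch of dropout layers follows (note that the rate argument of Keras's Dropout layer is the fraction of units to drop, i.e. 1 − p, so a retention probability of p = 0.8 corresponds to rate=0.2; the layer sizes here are illustrative):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dropout(0.2),   # drop 20% of units, i.e. retain each with p = 0.8
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
# Dropout is applied only during training; Keras uses inverted dropout internally,
# so at test time the entire network is used without any extra scaling step.
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
```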
Dealing with the Overfitting Problem

Dropout Regularization

(Figure: dropout layers applied to a network.)
Dropout Experiment

An architecture of 784-2048-2048-2048-10 is used on the MNIST dataset. The retention probability p was varied from small values (most units dropped out) to 1.0 (no dropout).

High rate of dropout (p < 0.3):
▪ Very few units are turned on during training
▪ Underfitting

No dropout (p = 1.0):
▪ Training error is low
▪ Test error is high

Best dropout rate (p = 0.5):
▪ Training error is low
▪ Test error is low
Key Takeaways

Now, you are able to:

Explore neural networks

Perform weight updates using different activation functions

Deduce and implement the backpropagation algorithm in Python

Optimize the performance of your neural network using L2 regularization and dropout layers
Knowledge Check
Knowledge Check 1

After the perceptron algorithm finishes training, how can the learned weights be expressed in terms of the initial weight vector and the input vectors?

a. It requires one bit per data point

b. It requires one integer per data point

c. It requires one real number per data point

d. It is impossible
Knowledge Check 1

After the perceptron algorithm finishes training, how can the learned weights be expressed in terms of the initial weight vector and the input vectors?

a. It requires one bit per data point

b. It requires one integer per data point

c. It requires one real number per data point

d. It is impossible

The correct answer is b

During the perceptron training algorithm, the weights are updated by adding or subtracting an input vector, and this can happen multiple times for the same input vector. Therefore, weight vector = initial vector + c1 × (data point 1) + c2 × (data point 2) + … + cn × (data point n), where ci is the number of times data point i was added minus the number of times it was subtracted, i.e., one integer per data point.
Knowledge Check 2

Which of the following techniques performs similar operations as dropout in a neural network?

a. Bagging

b. Boosting

c. Stacking

d. None of these
Knowledge Check 2

Which of the following techniques performs similar operations as dropout in a neural network?

a. Bagging

b. Boosting

c. Stacking

d. None of these

The correct answer is a

Dropout can be seen as an extreme form of bagging in which each model is trained on a single case and each parameter of the model is very strongly regularized by being shared with the corresponding parameter in all the other models.
MNIST Image Classification

Problem Scenario: The MNIST dataset is widely used for image classification. However, while validating it, researchers found that the classification model was overfitting, as it was not giving acceptable accuracy on the testing data.
Use mnist_test.csv and mnist_train.csv for model optimization (using dropout layers). You will also have to use one-hot encoding for the training and testing labels.

Objective:
Optimize a neural network based classification model using dropout regularization such that the retention probability p is 0.70 for the input and hidden layers.

Access: Click the Practice Labs tab on the left panel. Now, click on the START
LAB button and wait while the lab prepares itself. Then, click on the LAUNCH
LAB button. A full-fledged Jupyter Lab opens, which you can use for your
hands-on practice and projects.
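A hedged sketch of the label preparation and dropout model for this project is shown below (assuming TensorFlow 2.x Keras, and assuming each CSV row holds a label followed by 784 pixel values; if p = 0.70 is the retention probability, the corresponding Keras dropout rate is 1 − 0.70 = 0.30):

```python
import pandas as pd
import tensorflow as tf

# Assumed CSV layout: first column is the label, remaining 784 columns are pixels.
train = pd.read_csv("mnist_train.csv")
test = pd.read_csv("mnist_test.csv")

x_train = train.iloc[:, 1:].values / 255.0
x_test = test.iloc[:, 1:].values / 255.0

# One-hot encode the training and testing labels.
y_train = tf.keras.utils.to_categorical(train.iloc[:, 0].values, num_classes=10)
y_test = tf.keras.utils.to_categorical(test.iloc[:, 0].values, num_classes=10)

model = tf.keras.Sequential([
    tf.keras.layers.Dropout(0.30, input_shape=(784,)),   # input layer: retain with p = 0.70
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dropout(0.30),                       # hidden layer: retain with p = 0.70
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
```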
Thank You
