5 Basic Neural Networks
Beginner’s Brain Map
Brain: a computational machine?
Information processing: brains vs. computers
– Brains are better at perception and cognition
– Brains are slower at numerical calculations
– Parallel and distributed processing
– Associative memory
Evolutionarily, the brain has developed the algorithms most suitable for survival.
Those algorithms are unknown: the search is on.
The brain is astonishing in the amount of information it processes:
– Typical computers: 10^9 operations/sec
– Housefly brain: 10^11 operations/sec
Brain facts & figures
Neuron – “classical”
• Dendrites
– Receiving stations of the neuron
– Do not generate action potentials
• Cell body
– Site at which the received information is integrated
• Axon
– Generates and relays the action potential
– Terminal
• Relays information to the next neuron in the pathway
How do artificial neural nets model the brain?
Table 1.1 Analogy between biological and artificial neural networks
ARTIFICIAL NEURAL NETWORKS
Historical Review
The Concept of Artificial Neural Network
1. A set of synapses, or connections, each of which is characterized by its own weight. In particular, the signal x_j at the input of synapse j, connected to neuron k, is multiplied by the weight w_kj.
2. An adder that sums the input signals, weighted by the respective synapses of the neuron.
3. An activation function f(s) that limits the amplitude of the neuron's output. Usually the normalized amplitude range of the output is [0, 1] or [−1, 1]. Different types of activation functions are shown in the figures.
4. A threshold (bias) element, labeled b0. This value increases or decreases the induced local field that is fed to the activation function.
The current state of the neuron is defined as the weighted sum of its inputs plus the bias, s_k = Σj w_kj·x_j + b0, and the output is y_k = f(s_k).
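To make the four elements above concrete, the following is a minimal sketch in Python/NumPy of a single neuron k computing s_k = Σj w_kj·x_j + b0 and passing it through a sigmoid activation. The input, weight, and bias values are illustrative assumptions, not taken from the text.

```python
import numpy as np

def sigmoid(s):
    """Logistic activation: squashes the induced local field into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-s))

# Illustrative values (not from the text): three synapses feeding neuron k.
x = np.array([0.5, -1.0, 2.0])      # input signals x_j
w_k = np.array([0.8, 0.2, -0.4])    # synaptic weights w_kj
b0 = 0.1                            # bias / threshold element

s_k = np.dot(w_k, x) + b0           # adder: weighted sum of inputs plus bias
y_k = sigmoid(s_k)                  # activation function limits the output amplitude
print(f"induced local field s_k = {s_k:.3f}, output y_k = {y_k:.3f}")
```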
1.2 Classification of Activation Functions
The following functions can be used as the activation function: step, sign, sigmoid, hyperbolic tangent (tanh), ReLU (rectified linear unit), and softmax, as shown in the figures below.
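As a reference alongside the figures, these six functions can be written in a few lines of NumPy; the exact conventions (for example, placing the step threshold at 0) are the usual ones and may differ slightly from the figures.

```python
import numpy as np

def step(s):            # binary hard limit: 1 if s >= 0, else 0
    return np.where(s >= 0, 1.0, 0.0)

def sign(s):            # bipolar hard limit: +1 / -1 (0 mapped to +1 here)
    return np.where(s >= 0, 1.0, -1.0)

def sigmoid(s):         # logistic function, output in (0, 1)
    return 1.0 / (1.0 + np.exp(-s))

def tanh(s):            # hyperbolic tangent, output in (-1, 1)
    return np.tanh(s)

def relu(s):            # rectified linear unit: max(0, s)
    return np.maximum(0.0, s)

def softmax(s):         # turns a vector into a probability distribution
    e = np.exp(s - np.max(s))   # shift for numerical stability
    return e / e.sum()
```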
Currently the sigmoid function has lost its former popularity and is used rarely. It has several major drawbacks:
1. Saturation of the sigmoid function attenuates gradients (saturated neurons kill the gradient). For example, if the initial weight values are too large, most neurons enter saturation and the network learns poorly.
2. With n layers, n small derivatives are multiplied together, so the gradient decreases exponentially as it propagates backwards through the layers. Such a small gradient means that the weights and biases of the earlier layers are barely updated.
3. The sigmoid output is not zero-centered. This can lead to unwanted zig-zag dynamics during training, although the consequences are less serious than those of saturation.
4. exp() is somewhat expensive to compute.
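The saturation and vanishing-gradient points can be seen numerically: the sigmoid derivative peaks at 0.25 and collapses for large |s|, so a product of such derivatives over many layers shrinks exponentially. This is a rough sketch, not tied to any particular network.

```python
import numpy as np

sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))
dsigmoid = lambda s: sigmoid(s) * (1.0 - sigmoid(s))   # derivative of the sigmoid

for s in [0.0, 2.0, 5.0, 10.0]:
    print(f"s = {s:4.1f}  ->  sigmoid'(s) = {dsigmoid(s):.6f}")
# s = 0.0 gives 0.25 (the maximum); s = 10.0 gives about 4.5e-05 (saturated neuron)

# With n layers, roughly n such factors are multiplied during backpropagation:
n = 10
print("best case (0.25)**n =", 0.25 ** n)   # about 1e-6: the gradient almost vanishes
```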
The main appeal of the ReLU activation function lies partly in its simplicity. As one can see, all it does is replace negative values with 0 and keep positive values as they are.
This avoids the saturation problem of killing the gradients for large input values, while also being much faster computationally, since it involves simpler mathematical operations.
Also, in practice, neural networks using ReLU tend to converge about six times faster than those with sigmoid and tanh. It is very computationally efficient.
However, ReLU still has some problems. First, it is not zero-centered, which can cause problems during training.
Most importantly, it does not deal with negative inputs in a particularly meaningful way. During backpropagation the neural network updates the weights with the gradients, and neurons whose input values are negative have zero gradient and are not updated.
There is the possibility that some neurons will never be updated during the whole training process. Those neurons are called "dead" neurons.
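A minimal sketch of the "dead neuron" effect: when the pre-activation is negative, ReLU outputs 0 and its gradient is 0, so the incoming weights receive no update for that example. The weight and input values below are illustrative assumptions.

```python
import numpy as np

relu = lambda s: np.maximum(0.0, s)
relu_grad = lambda s: (s > 0).astype(float)   # derivative: 1 for s > 0, else 0

x = np.array([1.0, 2.0])
w = np.array([-0.9, -0.5])        # weights that drive the neuron negative
b = -0.1

s = np.dot(w, x) + b              # pre-activation (negative here)
upstream = 1.0                    # gradient arriving from the next layer
grad_w = upstream * relu_grad(s) * x   # gradient with respect to the weights

print("pre-activation:", s, "output:", relu(s), "weight gradient:", grad_w)
# The weight gradient is [0, 0]: these weights are not updated. A neuron becomes
# "dead" if this happens for every training example.
```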
The Parametric ReLU function
The parametric ReLU activation function builds on ReLU by trying to handle negative values in a more meaningful way. More specifically, instead of replacing negative values with 0, it multiplies them by a user-defined number between 0 and 1, which lets some of the information contained in the negative inputs be used during training of the neural network model.
The disadvantage of this activation function is that the parameter is not learnable, so the user should choose it very carefully when defining the neural network architecture, as the results can vary depending on that parameter.
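A minimal sketch of the parametric ReLU as described above, with the slope a as a user-defined constant. The choice a = 0.25 below is purely illustrative; a = 0.01 gives the leaky ReLU discussed next.

```python
import numpy as np

def prelu(s, a=0.25):
    """Parametric ReLU: keep positive values, scale negative ones by a (0 < a < 1)."""
    return np.where(s > 0, s, a * s)

s = np.array([-2.0, -0.5, 0.0, 1.5])
print(prelu(s))            # [-0.5   -0.125  0.     1.5  ]
print(prelu(s, a=0.01))    # leaky ReLU: [-0.02  -0.005  0.     1.5  ]
```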
Leaky ReLU function
The leaky ReLU activation function is the special case of the parametric ReLU activation function with a = 0.01. Because this parameter is very small, leaky ReLU can cause problems such as slower weight updates.
It is therefore often preferable to use the parametric ReLU function in the neural network model.
Exponential linear units (ELU) function
The exponential linear unit (ELU) is yet another non-linear activation function, proposed as an alternative to the ReLU function.
For positive inputs the two functions produce the same output, but negative values are handled more smoothly by the ELU thanks to the exponential term. The disadvantage is that the exponential makes this activation function more computationally costly.
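A minimal sketch of the ELU, using the usual parameterization α(e^s − 1) for negative inputs; α = 1.0 here is an assumed default, not a value given in the text.

```python
import numpy as np

def elu(s, alpha=1.0):
    """ELU: identity for positive inputs, smooth exponential curve for negative ones."""
    return np.where(s > 0, s, alpha * (np.exp(s) - 1.0))

s = np.array([-3.0, -1.0, 0.0, 2.0])
print(elu(s))   # negative values saturate smoothly towards -alpha; positives are unchanged
```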
Note that, in practice, a few guidelines help when selecting the type of activation function:
– When using ReLU, be careful with your learning rate.
– Try out leaky ReLU, ELU, and softmax.
– Try out tanh, but don't expect too much from it.
– Don't use sigmoid (except perhaps in the output layer for binary classification).
How to choose an activation function?
How do you choose the right activation function when training a neural network from scratch? Different activation functions have different advantages and disadvantages, and depending on the type of artificial neural network the outcome may differ.
A good starting point is one of the ReLU-based activation functions (including ReLU itself), since they have empirically proven to be very effective for almost any task. After that, try other activation functions for the hidden layers, possibly different functions for different layers, and observe how the performance changes.
The neural network architecture, the machine learning task, and many other factors influence the choice of activation function. For example, if the task is binary classification, the sigmoid activation function is a good choice, but for multi-class classification the softmax function is better, as it outputs a probability for each class.
In convolutional neural networks, ReLU-based activation functions can be used to increase the convergence speed.
However, some architectures require specific activation functions. For example, recurrent neural network architectures use the sigmoid and tanh functions, and their logic-gate-like structure would not work with ReLU.
As a rule of thumb, when training a neural network from scratch, one can simply use ReLU, leaky ReLU, or ELU and expect decent results.
Example 1.1:
Consider the following network consisting of four inputs with the weights as shown.
With a binary (hard-limit) activation function and with a sigmoid activation function, the outputs of the neuron are, respectively, as follows:
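Since the figure with the weights is not reproduced here, the sketch below uses assumed values for the four inputs and weights, purely for illustration; it shows how the same induced local field gives different outputs under a binary hard-limit activation and under a sigmoid.

```python
import numpy as np

# Assumed illustrative values (the actual figure's inputs and weights are not reproduced here).
x = np.array([1.0, 0.5, -1.0, 2.0])   # four inputs
w = np.array([0.3, -0.6, 0.4, 0.1])   # corresponding weights

s = np.dot(w, x)                      # induced local field

y_binary = 1.0 if s >= 0 else 0.0            # binary (hard-limit) activation
y_sigmoid = 1.0 / (1.0 + np.exp(-s))         # sigmoid activation

print(f"s = {s:.2f}, binary output = {y_binary}, sigmoid output = {y_sigmoid:.3f}")
```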
1.5 SINGLE LAYER PERCEPTRON
1.5.1 General Architecture
The original idea of the perceptron was developed by Rosenblatt in the late 1950s, along with a convergence procedure to adjust the weights.
In Rosenblatt's perceptron, the inputs were binary and no bias was included. It was based on the McCulloch-Pitts model of the neuron with the hard-limit activation function.
Connection weights and the threshold in a perceptron can be fixed or adapted using a number of different algorithms.
First, the connection weights W1, W2, …, Wn and the threshold value W0 are initialized to small non-zero values. Then a new input set with N values is received through the sensory units (measurement devices) and the output of the perceptron is computed. Connection weights are adapted only when an error occurs. This procedure is repeated until the classification of all inputs is completed.
To clarify the above concept, consider two input pattern classes C1 and C2. The weight adaptation at the kth training phase can be formulated as follows:
1. If the kth member of the training set, x(k), is correctly classified by the current weight vector W(k), no correction is made: W(k+1) = W(k).
2. Otherwise, the weight vector is updated according to the following rule:
W(k+1) = W(k) + η·x(k) if x(k) belongs to C1 and W^T(k)·x(k) ≤ 0
W(k+1) = W(k) − η·x(k) if x(k) belongs to C2 and W^T(k)·x(k) ≥ 0
where η is the learning rate parameter, which should be selected between 0 and 1.
Example 1.2:
Let us consider pattern classes C1 and C2, where C1: {(0,2), (0,1)} and C2: {(1,0), (1,1)}.
The objective is to obtain a decision surface based on perceptron learning. The 2-D graph
for the above data is shown in Figure below.
The perceptron structure is simply as follows:
For simplicity, let us assume η = 1 and the initial weight vector W(1) = [0 0]. The iteration weights are as follows:
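The iteration equations accompany the figure and are not reproduced here. The sketch below re-runs the perceptron updates for Example 1.2 under the rule stated above, with η = 1, W(1) = [0 0], no bias, and C1 taken as the positive class; the resulting weight sequence is indicative rather than a verbatim copy of the lecture's numbers.

```python
import numpy as np

# Training patterns for Example 1.2.
C1 = [np.array([0.0, 2.0]), np.array([0.0, 1.0])]   # positive class
C2 = [np.array([1.0, 0.0]), np.array([1.0, 1.0])]   # negative class
patterns = [(x, +1) for x in C1] + [(x, -1) for x in C2]

eta = 1.0
W = np.array([0.0, 0.0])                 # initial weight vector W(1)

for k in range(8):                       # a few sweeps over the four patterns
    x, cls = patterns[k % len(patterns)]
    s = np.dot(W, x)
    if cls == +1 and s <= 0:             # C1 pattern misclassified: add the input
        W = W + eta * x
    elif cls == -1 and s >= 0:           # C2 pattern misclassified: subtract the input
        W = W - eta * x
    print(f"iteration {k + 1}: W = {W}")
```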
Now, if we continue the procedure, the perceptron classifies the two classes correctly at each instance. For example, for the fifth and sixth iterations:
In a similar fashion, the classification results for the seventh and eighth iterations are indeed correct.
Therefore, the algorithm converges, and the decision surface for the above perceptron is as follows:
Now, let us consider the input data (1, 2), which is not in the training set. If we calculate the output:
1.5.3 Perceptron Algorithm
The weights are adapted according to
w_i(k+1) = w_i(k) + η·[d_k − y_k]·x_i(k)
where η is a positive gain fraction less than 1 and d_k is the desired output. Note that the weights remain the same if the network makes the correct decision.
Step 5: Repeat the procedures in Steps 2-4 until the classification task is completed.
Unlike learning in the ADALINE, the perceptron learning rule has been shown to be capable of separating any linearly separable set of training patterns.
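A compact sketch of the procedure, using the error-correction form w_i(k+1) = w_i(k) + η·[d_k − y_k]·x_i(k) stated above. The data set (a logical AND), the gain η = 0.5, and the number of epochs are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def train_perceptron(X, d, eta=0.5, epochs=20):
    """Single-layer perceptron with a hard-limit activation and a bias weight w0."""
    X = np.hstack([np.ones((len(X), 1)), X])     # prepend 1 so w[0] acts as the bias
    w = np.zeros(X.shape[1])                     # initialize the weights (zeros here)
    for _ in range(epochs):
        for x_k, d_k in zip(X, d):               # present an input and its desired output
            y_k = 1.0 if np.dot(w, x_k) >= 0 else 0.0   # compute the actual output
            w += eta * (d_k - y_k) * x_k         # weights change only when an error occurs
    return w

# Illustrative linearly separable data (assumed, not from the lecture): a logical AND.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 0, 0, 1], dtype=float)
print("learned weights [w0, w1, w2]:", train_perceptron(X, d))
```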
1.6 MULTI-LAYER PERCEPTRON
1.6.1 General Architecture
Multi-layer perceptrons represent a generalization of the single-layer perceptron described in the previous section.
A single-layer perceptron forms a half-plane decision region. Multi-layer perceptrons, on the other hand, can form arbitrarily complex decision regions and can separate various input patterns.
The capability of the multi-layer perceptron stems from the non-linearities used within the nodes. If the nodes were linear elements, a single-layer network with appropriately chosen weights could be used instead of a two- or three-layer perceptron.
The Figure below shows a typical multi-layer perceptron neural network structure.
As observed, it consists of the following layers:
Input Layer: A layer of neurons that receives information from external sources and passes this information to the network for processing. These may be either sensory inputs or signals from other systems outside the one being modeled.
Hidden Layer: A layer of neurons that receives information from the input layer and processes it internally. It has no direct connections to the outside world (inputs or outputs); all of its connections are to other layers within the system.
Output Layer: A layer of neurons that receives the processed information and sends output signals out of the system.
Bias: Acts on a neuron like an offset. The function of the bias is to provide a threshold for the activation of neurons. The bias input is connected to each of the hidden and output neurons in the network.
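The following is a minimal forward-pass sketch of the structure just described (input layer, one hidden layer, output layer, with bias terms). The layer sizes, random weights, and the sigmoid non-linearity are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

n_in, n_hidden, n_out = 3, 4, 2            # assumed layer sizes
W1 = rng.normal(size=(n_hidden, n_in))     # input -> hidden weights
b1 = rng.normal(size=n_hidden)             # hidden-layer biases
W2 = rng.normal(size=(n_out, n_hidden))    # hidden -> output weights
b2 = rng.normal(size=n_out)                # output-layer biases

x = np.array([0.5, -1.0, 2.0])             # external (sensory) inputs
h = sigmoid(W1 @ x + b1)                   # hidden layer: no direct contact with the outside
y = sigmoid(W2 @ h + b2)                   # output layer: signals sent out of the system
print("network output:", y)
```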
Example 1.3:
Suppose the weights and biases are selected as shown in the Figure below, and each neuron is represented by the McCulloch-Pitts model (binary hard-limit activation function). Show that the network solves the XOR problem. In addition, draw the decision boundaries constructed by the network.
In the Figure below, let the outputs of the neurons (before the activation function) be denoted O1, O2, and O3. The outputs of the summing points at the first layer are as follows:
With the binary hard-limit functions, the outputs y1 and y2 are shown in the Figures below. The outputs of the summing points at the second layer follow in the same way; a numerical check of the whole network is sketched below.
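Since the figure's weights are not reproduced here, the check below uses one standard McCulloch-Pitts weight assignment for XOR (an OR-like hidden neuron, an AND-like hidden neuron, and an output neuron that fires for OR-but-not-AND). It is an assumed sketch of the same idea, not necessarily the exact weights of the lecture's figure.

```python
step = lambda s: 1 if s >= 0 else 0        # binary hard-limit activation

def xor_net(x1, x2):
    # Assumed weights and biases (a standard choice, not necessarily the figure's values):
    o1 = x1 + x2 - 0.5        # first hidden summing point, acts like logical OR
    o2 = x1 + x2 - 1.5        # second hidden summing point, acts like logical AND
    y1, y2 = step(o1), step(o2)
    o3 = y1 - y2 - 0.5        # output summing point: fires when OR is true but AND is not
    return step(o3)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(f"x1={x1}, x2={x2} -> y={xor_net(x1, x2)}")   # reproduces the XOR truth table
```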
The perceptron is the simplest form of neural network, used for the classification of linearly separable patterns. Multi-layer perceptrons overcome many limitations of the single-layer perceptron.
1.7 Neural Network Classifications
1.7.1 Feedforward and Feedback Networks
In feedforward networks, signals propagate in one direction only, from the inputs through the hidden layers to the outputs; no feedback connections are present.
For feedback networks, the inputs of each layer can be affected by the outputs of previous layers; in addition, self-feedback is allowed. A simple single-layer feedback neural network is shown in the Figure below.
As observed, the inputs of the network consist of both the external inputs and the network output with some delays. Examples of feedback networks include the Hopfield network.