
Lecture Five

Basics of Neural Networks


The human brain

 Center of awareness and cognition
 Perhaps the most complex information-processing machine in nature
 Historically, considered as a monolithic information-processing machine

Beginner’s Brain Map

Forebrain (Cerebral Cortex): Language, maths, sensation, movement, cognition, emotion
Midbrain: Information routing; involuntary controls
Cerebellum: Motor control
Hindbrain: Control of breathing, heartbeat, blood circulation
Spinal cord: Reflexes, information highways between body & brain

Brain: a computational machine?
Information processing: brains vs computers
 brains better at perception / cognition
 slower at numerical calculations
 parallel and distributed processing
 associative memory
 Evolutionarily, the brain has developed algorithms most suitable for survival
 Algorithms unknown: the search is on
 Brain astonishing in the amount of information it processes
– Typical computers: 10^9 operations/sec
– Housefly brain: 10^11 operations/sec
Brain facts & figures

• Basic building block of the nervous system: the nerve cell (neuron)
• ~10^12 neurons in the brain
• ~10^15 connections between them
• Connections are made at “synapses”
• Speed: events on a millisecond scale in neurons, a nanosecond scale in silicon chips

Neuron - “classical”
• Dendrites
– Receiving stations of the neuron
– Do not generate action potentials
• Cell body
– Site at which the received information is integrated
• Axon
– Generates and relays the action potential
– Terminal: relays information to the next neuron in the pathway

How do artificial neural nets model the brain?

 An artificial neural network consists of a number of very simple and highly interconnected processors, also called neurons, which are analogous to the biological neurons in the brain.
 The neurons are connected by weighted links passing signals from one neuron to
another.
 Each neuron receives a number of input signals through its connections; however, it
never produces more than a single output signal.
 The output signal is transmitted through the neuron’s outgoing connection
(corresponding to the biological axon).
 The outgoing connection, in turn, splits into a number of branches that transmit the
same signal (the signal is not divided among these branches in any way).
 The outgoing branches terminate at the incoming connections of other neurons in the
network.
 Table 1.1 shows the analogy between biological and artificial neural networks.

Table 1.1 Analogy between biological and artificial neural networks

Biological neural network      Artificial neural network
Soma                           Neuron
Dendrite                       Input
Axon                           Output
Synapse                        Weight

ARTIFICIAL NEURAL NETWORKS
Historical Review

• The human brain has many remarkable characteristics, such as massive parallelism, distributed representation and computation, learning ability, generalization ability, and adaptivity; these seem simple but are in fact very complicated.
• It has always been a dream of computer scientists to create a computer that could solve complex perceptual problems this fast.
• ANN models are an effort to apply the same methods the human brain uses to solve perceptual problems.
• Three periods of development for ANNs:

- 1940s: McCulloch and Pitts: initial works
- 1960s: Rosenblatt: perceptron convergence theorem; Minsky and Papert: work showing the limitations of a simple perceptron
- 1980s: Hopfield/Werbos and Rumelhart: Hopfield's energy approach / the backpropagation learning algorithm

The Concept of Artificial Neural Network

A neural network (NN) is a distributed parallel processor consisting of elementary information-processing units that accumulate experiential knowledge and make it available for subsequent processing.
An artificial neuron is the unit of information processing in a neural network. The model neuron that underlies neural networks is shown in Fig. 1.1. In this model, there are four basic elements.

1. A set of synapses, or connections, each of which is characterized by its weight. In particular, the signal x_j at the input of synapse j, connected to neuron k, is multiplied by the weight w_kj.
2. An adder, which sums the input signals weighted by the respective synapses of the neuron.
3. An activation function f(s), which limits the amplitude of the neuron's output. Usually the normalized amplitude range of the output is [0, 1] or [−1, 1]. Different types of activation functions are shown in the figures below.
4. A threshold (bias) element, labeled b_0. This value increases or decreases the induced local field that is fed to the activation function.

The current state of the neuron is defined as the weighted sum of its inputs,

    s_k = Σ_j w_kj · x_j + b_0,

and the output of the neuron is a function of its state,

    y_k = f(s_k).

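To make the model concrete, here is a minimal Python sketch of such a neuron; the input, weight, and bias values are arbitrary illustrations, not values from the lecture:

```python
import numpy as np

def neuron_output(x, w, b, activation):
    """Compute one neuron's output: state s_k = w·x + b, output y_k = f(s_k)."""
    s = np.dot(w, x) + b          # weighted sum of inputs plus bias (the neuron's state)
    return activation(s)          # activation function limits the output amplitude

# Purely illustrative numbers
x = np.array([0.5, -1.0, 2.0])    # input signals x_j
w = np.array([0.4, 0.3, -0.2])    # synaptic weights w_kj
b = 0.1                           # bias / threshold term b_0
y = neuron_output(x, w, b, lambda s: 1.0 / (1.0 + np.exp(-s)))  # sigmoid activation
print(y)
```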
1.2 Classification of Activation Functions
The following functions can be used as activation functions: the step, sign, sigmoid, hyperbolic tangent (tanh), ReLU (rectified linear unit), and softmax functions, as shown in the figures below.

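Since the figures are not reproduced here, the following sketch gives the usual NumPy formulations of these activation functions (standard definitions, which may differ in minor details from the ones plotted in the lecture figures):

```python
import numpy as np

def step(s):    return np.where(s >= 0, 1.0, 0.0)     # binary step / hard limiter
def sign(s):    return np.where(s >= 0, 1.0, -1.0)    # sign (bipolar hard limiter)
def sigmoid(s): return 1.0 / (1.0 + np.exp(-s))       # squashes values into (0, 1)
def tanh(s):    return np.tanh(s)                     # squashes into (-1, 1), zero-centered
def relu(s):    return np.maximum(0.0, s)             # rectified linear unit

def softmax(s):
    e = np.exp(s - np.max(s))                         # shift by the max for numerical stability
    return e / e.sum()                                # outputs sum to 1 (class probabilities)
```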
 Currently the sigmoid function has lost its former popularity and is used very rarely. It has several major drawbacks:
1. Saturation of the sigmoid leads to attenuation of gradients (saturated neurons kill the gradient). For example, if the initial weight values are too large, most neurons pass into a state of saturation, and the network will learn poorly.
2. With n layers, n small derivatives are multiplied together, so the gradient decreases exponentially as it propagates backwards through the layers. Such a small gradient means that the weights and biases will hardly be updated.
3. The sigmoid output is not zero-centered, which can lead to unwanted zig-zag dynamics in the weight updates. This has less serious consequences than the saturation problem.
4. exp() is somewhat expensive to compute.

 The hyperbolic tangent (tanh) still kills the gradient when saturated.

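A quick numerical illustration of drawbacks 1 and 2 (the layer count and input values are arbitrary): the sigmoid derivative never exceeds 0.25, so multiplying n such factors during backpropagation shrinks the gradient exponentially.

```python
import numpy as np

sigmoid  = lambda s: 1.0 / (1.0 + np.exp(-s))
dsigmoid = lambda s: sigmoid(s) * (1.0 - sigmoid(s))   # derivative, at most 0.25 (at s = 0)

print(dsigmoid(0.0))    # 0.25   -- the best case
print(dsigmoid(10.0))   # ~4.5e-5 -- a saturated neuron almost kills the gradient

# Backpropagating through 10 layers multiplies 10 such factors together:
print(0.25 ** 10)       # ~9.5e-7, so the earliest layers barely get updated
```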
 The main feature of the ReLU activation function lies partly in its simplicity. As
one can see, all it does is replace negative values with 0 and keep positive ones
as is.
 This avoids the problem of “killing” the gradient for large positive values, since ReLU does not saturate there, while also being much faster computationally, as it involves simpler mathematical operations.
 Also, in practice neural networks using ReLU tend to converge about six times
faster than those with sigmoid and tanh.
 Very computationally efficient.

 However, ReLU still has some problems. First, it is not zero-centered, which can cause problems during training.
 Most importantly, it does not deal with negative inputs in a particularly meaningful way. During backpropagation the network updates its weights using the gradients, and neurons that receive negative input values have zero gradient, so their weights are not updated.
 Some neurons may therefore never be updated during the whole training process; these are called "dead" neurons.
The Parametric ReLU function
 The Parametric ReLU activation function builds on top of ReLU by trying to
handle negative values in a more meaningful way. More specifically, instead of
replacing negative values with 0, it multiplies them by some user-defined
number between 0 and 1, which essentially lets some of the information
contained in the negative inputs be used in the neural network model training.
 The disadvantage of this activation function is that the parameter is not
learnable, and the user should define it very carefully in the neural network
architecture as results can vary depending on that parameter.

Leaky ReLU function
 The leaky ReLU activation function is the special case of the parametric ReLU activation function where a = 0.01. Because this parameter is very small, leaky ReLU can cause problems such as slower weight updates.
 It is therefore preferable to use the parametric ReLU function in the neural network model.

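A small sketch of both variants as described above; the value a = 0.01 for leaky ReLU comes from the text, while the slope 0.2 in the example call is just an illustration:

```python
import numpy as np

def parametric_relu(s, a):
    """Keep positive values; multiply negative values by a user-defined slope a in (0, 1)."""
    return np.where(s > 0, s, a * s)

def leaky_relu(s):
    """Special case of the parametric ReLU with the fixed small slope a = 0.01."""
    return parametric_relu(s, 0.01)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(parametric_relu(x, 0.2))   # [-0.4   -0.1    0.     1.5 ]
print(leaky_relu(x))             # [-0.02  -0.005  0.     1.5 ]
```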
Exponential linear units (ELU) function
 The exponential linear unit (ELU) is yet another non-linear activation function, proposed as an alternative to the ReLU function.
 Positive inputs produce the same output for both, but negative values are handled more smoothly by ELU because of the exponential term; the disadvantage is that the exponential makes this activation function computationally costly.

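A sketch of ELU in its common formulation; the scale parameter alpha = 1 is an assumption, since the lecture does not give a specific value:

```python
import numpy as np

def elu(s, alpha=1.0):
    """Positive inputs pass through unchanged; negative inputs decay smoothly towards -alpha."""
    return np.where(s > 0, s, alpha * (np.exp(s) - 1.0))

print(elu(np.array([-3.0, -1.0, 0.0, 2.0])))   # smooth negative branch, linear positive branch
```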
 All of the above-mentioned activation functions (ReLU, leaky ReLU, parametric ReLU, and ELU) share one common problem: they are the same linear function for positive input data, so the gradient is constant and equal to 1 for all positive values. If the weights of the hidden layers are large, they can start to multiply together and grow bigger and bigger, which causes the exploding-gradient problem.
 The Softmax is used in multi-class models where it returns probabilities of each
class, with the target class having the highest probability.
 The main difference between the sigmoid and the softmax is that the sigmoid is used in binary classification while the softmax is used for multi-class classification tasks.

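For instance, a softmax layer turns raw class scores into a probability distribution; the scores below are made-up numbers for illustration:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))   # shift by the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])        # raw outputs for three classes (illustrative)
probs = softmax(scores)
print(probs)                              # approx. [0.659 0.242 0.099] -- sums to 1
print(np.argmax(probs))                   # index of the predicted (most probable) class
```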
 Note that, in practice, there are some important guidelines for selecting the type of activation function:
 When using ReLU, be careful with your learning rate.
 Try out leaky ReLU, ELU, and softmax.
 Try out tanh, but don't expect too much from it.
 Don't use sigmoid (except perhaps in the output layer for binary classification).

 Training with gradients (differentiation) creates other problems, especially in deep learning, such as "vanishing" and "exploding" gradients.
 In the "vanishing" gradient case, from one hidden layer to the next the values of the gradient get smaller and smaller and eventually become zero.
 The "exploding" gradient is the other side of the problem, where from one hidden layer to the next the values get bigger and bigger and approach infinity.

How to choose an activation function?

How should one choose the right activation function when training a neural network from scratch? Different activation functions have different advantages and disadvantages, and depending on the type of artificial neural network, the outcome may differ.
 A good starting point is to choose one of the ReLU-based activation functions (including ReLU itself), since they have empirically proven to be very effective for almost any task. After that, try other activation functions for the hidden layers, possibly different activation functions for different layers, to see how the performance changes.
 The neural network architecture, the machine learning task, and many other factors have an impact on activation function selection. For example, if the task is binary classification, then the sigmoid activation function is a good choice, but for multi-class classification the softmax function is better, as it outputs a probability for each class.
 In convolutional neural networks, ReLU-based activation functions can be used to increase the convergence speed.
 However, some architectures require specific activation functions. For example, recurrent neural network architectures use the sigmoid and tanh functions, and their logic-gate-like architecture wouldn't work with ReLU.
 As a rule of thumb, when training a neural network from scratch, one can simply use ReLU, leaky ReLU, or ELU and expect decent results.

Example 1.1:
Consider the following network, which consists of four inputs with the weights as shown.

The output R of the network, prior to the activation function stage, is calculated as follows:

With a binary activation function and a sigmoid activation function, the outputs of the neuron are, respectively, as follows:

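Because the figure with the actual inputs and weights is not reproduced here, the sketch below uses made-up values purely to show the two computations (the weighted sum R, then the binary and sigmoid outputs):

```python
import numpy as np

# Hypothetical values; the lecture's figure defines the actual inputs and weights.
x = np.array([1.0, 0.5, -1.0, 2.0])     # four inputs
w = np.array([0.2, -0.4, 0.1, 0.3])     # corresponding weights

R = np.dot(w, x)                         # output prior to the activation stage
y_binary  = 1.0 if R >= 0 else 0.0       # binary (hard-limit) activation
y_sigmoid = 1.0 / (1.0 + np.exp(-R))     # sigmoid activation
print(R, y_binary, y_sigmoid)
```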
1.5 SINGLE LAYER PERCEPTRON
1.5.1 General Architecture

 The original idea of the perceptron was developed by Rosenblatt in the late 1950s, along with a convergence procedure to adjust the weights.
 In Rosenblatt's perceptron the inputs were binary and no bias was included. It was based on the McCulloch-Pitts model of the neuron with the hard-limiter activation function.

 The connection weights and threshold in a perceptron can be fixed or adapted using a number of different algorithms.
 First, the connection weights W1, W2, …, Wn and the threshold value W0 are initialized to small non-zero values. Then, a new input set with N values is received through the sensory units (measurement devices) and the output is computed. Connection weights are adapted only when an error occurs. This procedure is repeated until the classification of all inputs is completed.

1.5.2 Linear Classification

 To clarify the above concept, consider two input pattern classes C1 and C2. The weight adaptation at the kth training step can be formulated as follows:

1. If the kth member of the training set, x(k), is correctly classified, no correction of the weight vector is needed. Since the activation function is a hard limiter, the following conditions will hold:

2. Otherwise, the weight vector should be updated in accordance with the following rule:

where η is the learning rate parameter, which should be selected between 0 and 1.

Example 1.2:
Let us consider pattern classes C1 and C2, where C1: {(0,2), (0,1)} and C2: {(1,0), (1,1)}.
The objective is to obtain a decision surface based on perceptron learning. The 2-D graph
for the above data is shown in Figure below.

The perceptron structure is simply as follows:

For simplicity, let us assume η = 1 and an initial weight vector W(1) = [0 0]. The iteration weights are as follows:

If we continue the procedure, the perceptron classifies the two classes correctly at each instance. For example, for the fifth and sixth iterations:

In a similar fashion for the seventh and eighth iterations, the classification results are
indeed correct.

Therefore, the algorithm converges and the decision surface for the above perceptron is as
follows:

Now, let us consider the input data {1,2}, which is not in the training set. If we calculate
the output:

The output Y belongs to class C2, as expected.

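A rough re-creation of this example in code. The update convention assumed here (misclassified C1 patterns add ηx, misclassified C2 patterns subtract ηx, and a point with W·x = 0 assigned to C2) is one common choice; under it the run converges and assigns the test point (1, 2) to C2, in agreement with the conclusion above, although the intermediate weights may differ from the lecture's iterations.

```python
import numpy as np

# Training patterns in the order given in the example
patterns = [
    (np.array([0.0, 2.0]), 'C1'),
    (np.array([0.0, 1.0]), 'C1'),
    (np.array([1.0, 0.0]), 'C2'),
    (np.array([1.0, 1.0]), 'C2'),
]

eta = 1.0                        # learning rate, as in the example
w = np.array([0.0, 0.0])         # initial weight vector W(1) = [0 0]

for epoch in range(10):          # repeat passes until no corrections are made
    corrections = 0
    for x, cls in patterns:
        s = np.dot(w, x)
        if cls == 'C1' and s <= 0:       # C1 pattern on the wrong side: add the input
            w = w + eta * x
            corrections += 1
        elif cls == 'C2' and s >= 0:     # C2 pattern on the wrong side: subtract the input
            w = w - eta * x
            corrections += 1
    if corrections == 0:
        break

print("learned weights:", w)             # -> [-2.  1.] under these conventions
test = np.array([1.0, 2.0])              # point not in the training set
print("test point (1,2) ->", 'C1' if np.dot(w, test) > 0 else 'C2')   # -> C2
```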
1.5.3 Perceptron Algorithm

 The perceptron learning algorithm (the delta rule) can be summarized as follows:

Step 1: Initialize the weights W1, W2, …, Wn and the threshold to small random values.
Step 2: Present a new input X1, X2, …, Xn and the desired output dk.
Step 3: Calculate the actual output based on the following formula:

Step 4: Adapt the weights according to the following equation:

where η is a positive gain fraction less than 1 and dk is the desired output.
Note that the weights remain the same if the network makes the correct decision.
Step 5: Repeat the procedures in steps 2-4 until the classification task is completed.

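Since the formulas from the slide are not reproduced here, the sketch below shows one standard reading of steps 3 and 4 (a hard-limit output with a subtracted threshold, and the update ΔWi = η(dk − yk)Xi); treat the exact notation as an assumption rather than the slide's own:

```python
import numpy as np

def perceptron_step(w, theta, x, d, eta=0.1):
    """One pass of steps 2-4: compute the actual output, then adapt the weights."""
    y = 1.0 if np.dot(w, x) - theta >= 0 else 0.0    # step 3: hard-limit (McCulloch-Pitts) output
    w = w + eta * (d - y) * x                        # step 4: delta rule; no change when d == y
    return w, y
```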
 Unlike learning in the ADALINE, the perceptron learning rule has been shown to be capable of separating any linearly separable set of training patterns.

1.6 MULTI-LAYER PERCEPTRON
1.6.1 General Architecture
 Multi-layer perceptrons represent a generalization of the single-layer perceptron as
described in the previous section.
 A single-layer perceptron forms a half-plane decision region, whereas multi-layer perceptrons can form arbitrarily complex decision regions and can separate various input patterns.
 The capability of the multi-layer perceptron stems from the non-linearities used within its nodes. If the nodes were linear elements, a single-layer network with appropriately chosen weights could be used instead of a two- or three-layer perceptron.
 The Figure below shows a typical multi-layer perceptron neural network structure.

 As observed it consists of the following layers:

Input Layer: A layer of neurons that receives information from external sources, and
passes this information to the network for processing. These may be either sensory inputs
or signals from other systems outside the one being modeled.

Hidden Layer: A layer of neurons that receives information from the input layer and processes it internally. It has no direct connections to the outside world (inputs or outputs). All connections from the hidden layer are to other layers within the system.

Output Layer: A layer of neurons that receives processed information and sends output
signals out of the system.

Bias: Acts on a neuron like an offset. The function of the bias is to provide a threshold for
the activation of neurons. The bias input is connected to each of the hidden and output
neurons in a network.

Example 1.3:
Suppose the weights and biases are selected as shown in the figure below. Each neuron is represented by the McCulloch-Pitts model (binary hard-limit activation function). Show that the network solves the XOR problem. In addition, draw the decision boundaries constructed by the network.

In the figure below, let the outputs of the neurons (before the activation function) be denoted O1, O2, and O3. The outputs of the summing points at the first layer are as follows:

With the binary hard-limit functions, the outputs y1 and y2 are shown in the figures below. The outputs of the summing points at the second layer are:

The decision boundaries of the network are:

The perceptron is the simplest form of neural network, used for the classification of linearly separable patterns. Multi-layer perceptrons overcome many limitations of the single-layer perceptron.

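The figure with the actual weights and biases is not reproduced here, so the sketch below uses one common choice of values that solves XOR with binary hard-limit (McCulloch-Pitts) neurons; the lecture's figure may use different numbers, but the structure (two hidden neurons feeding one output neuron) is the same. With these particular values the two hidden neurons implement the decision boundaries x1 + x2 = 0.5 and x1 + x2 = 1.5, and inputs falling between the two lines are classified as 1.

```python
step = lambda s: 1 if s >= 0 else 0        # binary hard-limit activation

def xor_net(x1, x2):
    # Hidden layer (weights/biases are an illustrative choice, not the lecture's figure):
    y1 = step(1.0 * x1 + 1.0 * x2 - 0.5)   # fires when x1 OR x2 is 1
    y2 = step(1.0 * x1 + 1.0 * x2 - 1.5)   # fires only when x1 AND x2 are 1
    # Output layer: OR minus AND gives XOR
    return step(1.0 * y1 - 2.0 * y2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))   # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0
```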
1.7 Neural Network Classifications
1.7.1 Feedforward and Feedback Networks

 In a feedforward neural network structure, the only permissible connections are between the outputs of each layer and the inputs of the next layer. Therefore, no connections exist between the outputs of a layer and the inputs of either the same layer or previous layers. The figure below shows a two-layer feedforward network.
 In this topology, the inputs of each neuron are the weighted sum of the outputs from the previous layer. There are weighted connections between the outputs of each layer and the inputs of the next layer. If the weight of a branch is assigned a zero, it is equivalent to having no connection between the corresponding nodes.
 The inputs are connected to each neuron in the hidden layer via their corresponding weights. The outputs of the last layer are considered the outputs of the network.

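A minimal sketch of a two-layer feedforward pass as described above; the layer sizes, random weights, and sigmoid activation are arbitrary choices for illustration:

```python
import numpy as np

def layer(inputs, W, b, f):
    """Each neuron's input is the weighted sum of the previous layer's outputs plus a bias."""
    return f(W @ inputs + b)

sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(0)
x = np.array([0.5, -0.2, 1.0])                   # external inputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # input -> hidden (4 hidden neurons)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)    # hidden -> output (2 output neurons)

h = layer(x, W1, b1, sigmoid)                    # hidden-layer outputs
y = layer(h, W2, b2, sigmoid)                    # outputs of the last layer = network outputs
print(y)
```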
 In feedback networks, the inputs of each layer can be affected by the outputs of previous layers. In addition, self-feedback is allowed. A simple single-layer feedback neural network is shown in the figure below.

 As observed, the inputs of the network consist of both the external inputs and the network output with some delays. Examples of feedback networks include the Hopfield network.
