
Artificial neural networks (III)

Can we train a perceptron to perform basic logical operations such as AND, OR or Exclusive-OR?

The truth tables for the operations AND, OR and Exclusive-OR are shown in the table below. The table presents all possible combinations of values for two variables, 𝑥1 and 𝑥2, and the results of the operations.
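
x1  x2 | x1 AND x2 | x1 OR x2 | x1 XOR x2
-----------------------------------------
 0   0 |     0     |    0     |     0
 0   1 |     0     |    1     |     1
 1   0 |     0     |    1     |     1
 1   1 |     1     |    1     |     0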

The figure below represents the AND, OR and Exclusive-OR
functions as two-dimensional plots based on the values of the
two inputs. Points in the input space where the function output
is 1 are indicated by black dots, and points where the output is
0 are indicated by white dots.

In (𝑎) and (𝑏), we can draw a line so that black dots are on one
side and white dots on the other, but dots shown in (𝑐) are not
separable by a single line.

A perceptron is able to represent a function only if there is some line that separates all the black dots from all the white dots.

Such functions are called linearly separable. Therefore, a perceptron can learn the operations AND and OR, but not Exclusive-OR.
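
To make this concrete, the sketch below (not from the original slides; the learning rate, initial weights and epoch limit are illustrative assumptions) applies the perceptron learning rule to the AND and Exclusive-OR truth tables. Training settles on separating weights for AND but never does for Exclusive-OR:

```python
def train_perceptron(samples, epochs=50, lr=0.1):
    """Perceptron with a hard-limit (step) activation, trained with the
    perceptron learning rule. Returns (w1, w2, theta) once every pattern
    is classified correctly, or None if it never converges."""
    w1, w2, theta = 0.3, -0.1, 0.2              # arbitrary small initial values
    for _ in range(epochs):
        errors = 0
        for x1, x2, target in samples:
            y = 1 if x1 * w1 + x2 * w2 - theta >= 0 else 0   # step activation
            e = target - y
            if e != 0:
                errors += 1
                w1 += lr * x1 * e               # adjust weights in proportion
                w2 += lr * x2 * e               # to the input and the error
                theta -= lr * e                 # threshold moves the opposite way
        if errors == 0:
            return w1, w2, theta
    return None

AND = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
XOR = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

print("AND:", train_perceptron(AND))    # converges: AND is linearly separable
print("XOR:", train_perceptron(XOR))    # None: XOR is not linearly separable
```
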
Can we do better by using a sigmoidal or linear element
in place of the hard limiter?

Single-layer perceptrons make decisions in the same way, regardless of the activation function used by the perceptron.

This means that a single-layer perceptron can classify only linearly separable patterns, regardless of whether we use a hard-limit or soft-limit activation function.

How do we cope with problems which are not linearly
separable?

To cope with such problems we need multilayer neural networks.

In fact, history has proved that the limitations of Rosenblatt’s perceptron can be overcome by advanced forms of neural networks, for example multilayer perceptrons trained with the back-propagation algorithm.

Multilayer neural networks

A multilayer perceptron is a feedforward neural network with one or more hidden layers. Typically, the network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and an output layer of computational neurons.

The input signals are propagated in a forward direction on a layer-by-layer basis.
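
To make the layer-by-layer propagation concrete, here is a minimal sketch in Python (not from the original slides; the layer sizes, random weights and sigmoid activation are illustrative assumptions, with the sigmoid itself discussed later in these notes):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_output(inputs, weights, thresholds):
    # Each neuron computes sigmoid(sum(x_i * w_i) - theta).
    return [sigmoid(sum(x * w for x, w in zip(inputs, ws)) - th)
            for ws, th in zip(weights, thresholds)]

def make_layer(n_in, n_out):
    # Random weights and thresholds for one fully connected layer.
    weights = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    thresholds = [random.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, thresholds

random.seed(0)
layers = [make_layer(2, 3), make_layer(3, 3), make_layer(3, 1)]  # two hidden layers

signal = [1.0, 0.0]                     # input pattern from the input layer
for weights, thresholds in layers:      # propagate forward, layer by layer
    signal = layer_output(signal, weights, thresholds)
print(signal)                           # output pattern of the whole network
```
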

A multilayer perceptron with two hidden layers is shown below.

But why do we need a hidden layer?

Each layer in a multilayer neural network has its own specific function. The input layer accepts input signals from the outside world and redistributes these signals to all neurons in the hidden layer. Actually, the input layer rarely includes computing neurons, and thus does not process input patterns.

The output layer accepts output signals, or in other words a stimulus pattern, from the hidden layer and establishes the output pattern of the entire network.

Neurons in the hidden layer detect the features; the weights of
the neurons represent the features hidden in the input patterns.

These features are then used by the output layer in determining the output pattern.

With one hidden layer, we can represent any continuous function of the input signals, and with two hidden layers even discontinuous functions can be represented.

Why is a middle layer in a multilayer network called a
‘hidden’ layer? What does this layer hide?

A hidden layer ‘hides’ its desired output. Neurons in the hidden layer cannot be observed through the input/output behaviour of the network.

There is no obvious way to know what the desired output of the hidden layer should be. In other words, the desired output of the hidden layer is determined by the layer itself.

Can a neural network include more than two hidden
layers?
Commercial ANNs incorporate three and sometimes four
layers, including one or two hidden layers. Each layer can
contain from 10 to 1000 neurons.

Experimental neural networks may have five or even six layers, including three or four hidden layers, and utilise millions of neurons, but most practical applications use only three layers, because each additional layer increases the computational burden exponentially.

How do multilayer neural networks learn?

More than a hundred different learning algorithms are available, but the most popular method is back-propagation.

This method was first proposed in 1969, but was ignored because of its demanding computations. Only in the mid-1980s was the back-propagation learning algorithm rediscovered.

Learning in a multilayer network proceeds the same way as for
a perceptron. A training set of input patterns is presented to the
network. The network computes its output pattern, and if there
is an error – or in other words a difference between actual and
desired output patterns – the weights are adjusted to reduce
this error.

In a perceptron, there is only one weight for each input and only one output. But in a multilayer network, there are many weights, each of which contributes to more than one output.

How can we assess the blame for an error and divide it
among the contributing weights?

In a back-propagation neural network, the learning algorithm has two phases.

First, a training input pattern is presented to the network input layer. The network then propagates the input pattern from layer to layer until the output pattern is generated by the output layer.

If this pattern is different from the desired output, an error is
calculated and then propagated backwards through the network
from the output layer to the input layer. The weights are
modified as the error is propagated.
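
A minimal sketch of these two phases in Python is shown below (an illustration, not the slides’ own code; the tiny layer sizes, learning rate, sigmoid activation and squared-error measure are assumptions, with the sigmoid introduced formally a little later in these notes):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(1)
# 2 inputs -> 2 hidden neurons -> 1 output neuron; random weights and thresholds
w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
t_h = [random.uniform(-1, 1) for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(2)]
t_o = random.uniform(-1, 1)
lr = 0.5

def train_step(x, desired):
    """One iteration of the two-phase algorithm for a single training pattern."""
    global t_o
    # Phase 1: propagate the input forward, layer by layer
    h = [sigmoid(sum(xi * wij for xi, wij in zip(x, w_h[j])) - t_h[j]) for j in range(2)]
    y = sigmoid(sum(hj * wj for hj, wj in zip(h, w_o)) - t_o)
    # Phase 2: propagate the error backwards, adjusting weights along the way
    grad_o = (desired - y) * y * (1 - y)
    for j in range(2):
        grad_h = grad_o * w_o[j] * h[j] * (1 - h[j])
        w_o[j] += lr * h[j] * grad_o
        for i in range(2):
            w_h[j][i] += lr * x[i] * grad_h
        t_h[j] += lr * (-1) * grad_h
    t_o += lr * (-1) * grad_o
    return (desired - y) ** 2

# Exclusive-OR patterns, which a single-layer perceptron cannot learn
patterns = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
for _ in range(5000):
    sse = sum(train_step(x, d) for x, d in patterns)
print("final sum of squared errors:", round(sse, 4))
# With these settings the error usually falls close to zero; if training stalls
# in a local minimum, re-running with a different random seed typically helps.
```
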

As with any other neural network, a back-propagation one is determined by the connections between neurons (the network’s architecture), the activation function used by the neurons, and the learning algorithm (or the learning law) that specifies the procedure for adjusting weights.

Typically, a back-propagation network is a multilayer network
that has three or four layers. The layers are fully connected,
that is, every neuron in each layer is connected to every other
neuron in the adjacent forward layer.

A neuron determines its output in a manner similar to Rosenblatt’s perceptron. First, it computes the net weighted input as before:
X = \sum_{i=1}^{n} x_i w_i - \theta

where 𝑛 is the number of inputs, and 𝜃 is the threshold applied to the neuron.

Next, this input value is passed through the activation function. However, unlike a perceptron, neurons in the back-propagation network use a sigmoid activation function:

Y^{sigmoid} = \frac{1}{1 + e^{-X}}

The derivative of this function is easy to compute: dY/dX = Y(1 − Y). The sigmoid also guarantees that the neuron output is bounded between 0 and 1.
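
A quick numerical check of both properties (an illustrative sketch, not part of the original notes):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    y = sigmoid(x)
    return y * (1.0 - y)          # dY/dX written in terms of the output Y

for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"X = {x:6.1f}   Y = {sigmoid(x):.4f}   dY/dX = {sigmoid_derivative(x):.4f}")
# Y never leaves the interval (0, 1); the derivative is largest (0.25) at X = 0.
```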

However, it's worth noting that while the sigmoid activation
function has been historically popular, modern deep learning
architectures often prefer other activation functions like
ReLU (Rectified Linear Unit) due to some of the limitations
associated with sigmoid, such as the vanishing gradient
problem.

Vanishing Gradient Problem
• Neural Network Training: Neural networks are trained by
adjusting weights. This is done by backpropagation,
which uses gradients from the loss function.

• Role of Gradients: Gradients indicate the direction and magnitude in which to adjust the weights for better accuracy.

• Gradient: In calculus, the gradient is a vector that points
in the direction of the steepest increase of a function. In
the context of neural networks, the gradient tells us how
much the loss will change if we change the weights by a
small amount. If we know the direction in which the loss
increases the most, we can adjust the weights in the
opposite direction to reduce the loss.
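
In symbols, gradient descent moves each weight a small step against the gradient of the loss L (the learning-rate symbol η below is an assumed notation, not introduced on these slides):

w \leftarrow w - \eta \, \frac{\partial L}{\partial w}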

• Vanishing Gradients Issue: In deep networks, especially
with certain activation functions, gradients can become
extremely small. This is like multiplying many small
numbers together.

• Implication: When gradients are tiny, weight updates are negligible. This means the early layers of the network (those closest to the input, whose gradients pass through the most multiplications) learn very slowly or not at all, hindering overall training.
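
A small numerical illustration of the effect (the layer counts below are assumed for illustration):

```python
# The sigmoid derivative never exceeds 0.25 (its value at X = 0), so a gradient
# propagated back through many sigmoid layers shrinks by at least that factor per layer.
max_sigmoid_grad = 0.25
for depth in (2, 5, 10, 20):
    print(f"{depth:2d} layers: gradient scaled by at most {max_sigmoid_grad ** depth:.1e}")
# e.g. after 20 layers at most ~9.1e-13 of the gradient reaches the earliest weights.
```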

Rectified Linear Units (ReLU)
The Rectified Linear Unit is the most commonly used
activation function in deep learning models. The function
returns 0 if it receives any negative input, but for any
positive value x it returns that value back. So, it can be
written as f(x)=max(0,x).
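
A direct translation of that definition into code (an illustrative sketch):

```python
def relu(x):
    """Rectified Linear Unit: returns 0 for negative input, the input itself otherwise."""
    return max(0.0, x)

print([relu(x) for x in (-2.0, -0.5, 0.0, 0.5, 2.0)])   # [0.0, 0.0, 0.0, 0.5, 2.0]
# The derivative is 0 for x < 0 and 1 for x > 0, so positive activations pass
# gradients back without shrinking them, easing the vanishing gradient problem.
```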

It's surprising that such a simple function (and one
composed of two linear pieces) can allow your model to
account for non-linearities and interactions so well. But the
ReLU function works great in most applications, and it is
very widely used as a result.
