Artificial Neural Networks (III)
The figure below represents the AND, OR and Exclusive-OR
functions as two-dimensional plots based on the values of the
two inputs. Points in the input space where the function output
is 1 are indicated by black dots, and points where the output is
0 are indicated by white dots.
In (𝑎) and (𝑏), we can draw a line so that black dots are on one
side and white dots on the other, but dots shown in (𝑐) are not
separable by a single line.
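To make this concrete, here is a small sketch (not part of the original slides) in Python: it brute-forces a grid of candidate weights w1, w2 and threshold values for a single threshold neuron and reports whether any setting reproduces each truth table. The grid range and step size are arbitrary choices made for illustration.

# Brute-force search for a single-neuron (linear) solution to AND, OR and XOR.
# A solution exists for AND and OR but not for XOR, illustrating that XOR
# is not linearly separable.
import itertools

def solvable(target):
    """Return True if some (w1, w2, theta) reproduces the target truth table."""
    grid = [v / 2 for v in range(-8, 9)]          # candidate values -4.0 .. 4.0
    for w1, w2, theta in itertools.product(grid, repeat=3):
        ok = all(int(w1 * x1 + w2 * x2 - theta >= 0) == y
                 for (x1, x2), y in target.items())
        if ok:
            return True
    return False

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
OR  = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

for name, table in [("AND", AND), ("OR", OR), ("XOR", XOR)]:
    print(name, "linearly separable:", solvable(table))
# Expected: AND True, OR True, XOR False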
How do we cope with problems which are not linearly
separable?
Multilayer neural networks
A multilayer perceptron with two hidden layers is shown below.
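Since the figure itself is not reproduced here, the short sketch below spells out the structure in code. The layer sizes (2 inputs, two hidden layers of 3 neurons each, 1 output) are assumptions made for illustration and need not match the figure.

# A minimal sketch of a multilayer perceptron with two hidden layers.
# The layer sizes are illustrative assumptions, not taken from the figure.
layer_sizes = [2, 3, 3, 1]

# Each pair of adjacent layers is fully connected, so the weights between
# layer k and layer k+1 form a matrix of shape (neurons in k, neurons in k+1).
for k in range(len(layer_sizes) - 1):
    rows, cols = layer_sizes[k], layer_sizes[k + 1]
    print(f"weights from layer {k} to layer {k + 1}: "
          f"{rows} x {cols} ({rows * cols} connections)")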
But why do we need a hidden layer?
Why is a middle layer in a multilayer network called a
‘hidden’ layer? What does this layer hide?
Can a neural network include more than two hidden
layers?
Yes, in principle a network can have any number of hidden layers. In practice, commercial ANNs incorporate three and sometimes four layers, including one or two hidden layers. Each layer can contain from 10 to 1,000 neurons.
Learning in a multilayer network proceeds the same way as for
a perceptron. A training set of input patterns is presented to the
network. The network computes its output pattern, and if there
is an error – or in other words a difference between actual and
desired output patterns – the weights are adjusted to reduce
this error.
How can we assess the blame for an error and divide it
among the contributing weights?
If the network's output pattern is different from the desired output, an error is calculated and then propagated backwards through the network from the output layer to the input layer. The weights are modified as the error is propagated.
Typically, a back-propagation network is a multilayer network that has three or four layers. The layers are fully connected, that is, every neuron in each layer is connected to every neuron in the adjacent forward layer.
As in a single-layer perceptron, each neuron first computes its net weighted input:

X = \sum_{i=1}^{n} x_i w_i - \theta

where 𝑛 is the number of inputs, and 𝜃 is the threshold applied to the neuron.
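As a quick worked example with made-up input, weight and threshold values, the net weighted input can be computed directly from this formula:

# Net weighted input of a single neuron: X = sum(x_i * w_i) - theta.
# The inputs, weights and threshold below are arbitrary example values.
inputs  = [1.0, 0.0, 1.0]
weights = [0.5, -0.3, 0.8]
theta   = 0.2

X = sum(x * w for x, w in zip(inputs, weights)) - theta
print(X)   # 0.5 + 0.0 + 0.8 - 0.2, i.e. approximately 1.1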
Next, this input value is passed through the activation function. However, unlike a perceptron, neurons in the back-propagation network use a sigmoid activation function:

Y^{sigmoid} = \frac{1}{1 + e^{-X}}
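Putting the pieces together, the following is a minimal sketch of a back-propagation network learning the Exclusive-OR function, using the net weighted input and the sigmoid activation given above. The network size (2 inputs, 2 hidden neurons, 1 output), the learning rate and the number of epochs are illustrative choices, and the update rule is the standard gradient-descent form rather than any particular textbook's exact algorithm.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR training set: input patterns and desired outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.uniform(-1, 1, size=(2, 2))   # input -> hidden weights
t1 = rng.uniform(-1, 1, size=(1, 2))   # hidden-layer thresholds
W2 = rng.uniform(-1, 1, size=(2, 1))   # hidden -> output weights
t2 = rng.uniform(-1, 1, size=(1, 1))   # output-layer threshold
lr = 0.5                               # learning rate (arbitrary choice)

for epoch in range(20000):
    # Forward pass: net weighted input, then sigmoid, at each layer.
    h = sigmoid(X @ W1 - t1)
    out = sigmoid(h @ W2 - t2)

    # Error at the output layer (desired minus actual).
    err = y - out

    # Backward pass: propagate the error from the output layer towards
    # the input layer, using the sigmoid derivative out * (1 - out).
    delta_out = err * out * (1 - out)
    delta_h = (delta_out @ W2.T) * h * (1 - h)

    # Adjust weights and thresholds as the error is propagated.
    W2 += lr * h.T @ delta_out
    t2 -= lr * delta_out.sum(axis=0, keepdims=True)
    W1 += lr * X.T @ delta_h
    t1 -= lr * delta_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))
# After training, the outputs should be close to [[0], [1], [1], [0]];
# with only two hidden neurons, convergence can depend on the random start.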
It is worth noting, however, that while the sigmoid activation function has historically been popular, modern deep learning architectures often prefer other activation functions, such as ReLU (Rectified Linear Unit), because of limitations of the sigmoid such as the vanishing gradient problem.
Vanishing Gradient Problem
• Neural Network Training: Neural networks are trained by adjusting their weights. This is done by backpropagation, which uses gradients of the loss function with respect to the weights.
• Gradient: In calculus, the gradient is a vector that points
in the direction of the steepest increase of a function. In
the context of neural networks, the gradient tells us how
much the loss will change if we change the weights by a
small amount. If we know the direction in which the loss
increases the most, we can adjust the weights in the
opposite direction to reduce the loss.
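In code, that "opposite direction" adjustment is a single subtraction. The one-parameter loss and the learning rate below are toy stand-ins chosen for illustration:

# Gradient descent on a toy one-parameter loss L(w) = (w - 3)^2.
# The gradient dL/dw = 2 * (w - 3) points towards increasing loss,
# so we step in the opposite direction.
w = 0.0
learning_rate = 0.1
for step in range(50):
    grad = 2 * (w - 3)            # direction of steepest increase
    w = w - learning_rate * grad  # move the opposite way to reduce the loss
print(round(w, 3))   # close to 3.0, the minimum of the loss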
• Vanishing Gradients Issue: In deep networks, especially with certain activation functions such as the sigmoid, gradients can become extremely small as they are propagated backwards through the layers. Because each layer contributes a multiplicative factor, this is like multiplying many small numbers together: the product shrinks towards zero, and the weights in the early layers barely change.
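A quick numerical illustration (the layer counts are arbitrary, and the factor 0.25 is the sigmoid's maximum derivative, used here as a best case):

# The derivative of the sigmoid, s(x) * (1 - s(x)), is at most 0.25 (at x = 0).
# Multiplying even this best-case factor across many layers shrinks the
# gradient towards zero: the vanishing gradient problem.
max_sigmoid_derivative = 0.25
for depth in (2, 5, 10, 20):
    print(depth, max_sigmoid_derivative ** depth)
# 2  -> 0.0625
# 5  -> 0.0009765625
# 10 -> roughly 9.5e-07
# 20 -> roughly 9.1e-13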
Rectified Linear Units (ReLU)
The Rectified Linear Unit is the most commonly used activation function in deep learning models. The function returns 0 if it receives any negative input, but for any positive value x it returns that value unchanged, so it can be written as f(x) = max(0, x).
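A direct translation into code, with a few arbitrary sample inputs:

# ReLU returns 0 for negative inputs and the input itself otherwise: f(x) = max(0, x).
def relu(x):
    return max(0.0, x)

for x in (-2.0, -0.5, 0.0, 1.5, 3.0):
    print(x, relu(x))
# -2.0 and -0.5 map to 0.0; 0.0, 1.5 and 3.0 are returned unchanged.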
It's surprising that such a simple function (and one
composed of two linear pieces) can allow your model to
account for non-linearities and interactions so well. But the
ReLU function works great in most applications, and it is
very widely used as a result.
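One toy way (not from the slides) to see how two linear pieces can produce non-linearity: the absolute-value function, which is clearly not linear, can be built from just two ReLU units.

def relu(x):
    return max(0.0, x)

# |x| = relu(x) + relu(-x): a non-linear function built from two ReLU units.
def abs_from_relus(x):
    return relu(x) + relu(-x)

for x in (-3.0, -1.0, 0.0, 2.0):
    print(x, abs_from_relus(x))   # matches abs(x) in every case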