ANN Multi Layer Perceptron Assignment
Input Layer
This is the first layer of the network. It receives the input data that the rest of the
network will transform into an output.
Hidden Layer(s)
The network needs to have at least one hidden layer. The hidden layer(s) perform
computations on the input data to produce something meaningful for the output layer.
Output Layer
The final layer is the output layer, and it is responsible for producing a value, or a
vector of values, in the format required by the problem (for example, a single value for a
regression problem, or one value per class for classification).
Neuron Weights
You may be familiar with linear regression, in which case the weights on the inputs are very
much like the coefficients used in a regression equation.
As in linear regression, each neuron also has a bias, which can be thought of as an input
that always has the value 1.0; this input too must be weighted.
For example, a neuron with two inputs requires three weights: one for each input and one
for the bias.
Weights are often initialised to small random values, such as values in the range 0 to 0.3,
although more complex initialisation schemes can be used.
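As a concrete sketch (in Python, with hypothetical variable names), a neuron with two inputs would hold three weights, with the bias treated as an extra input fixed at 1.0 and all weights initialised to small random values in the range 0 to 0.3:

    import random

    # Three weights for a two-input neuron: one per input, plus one for the bias.
    weights = [random.uniform(0.0, 0.3) for _ in range(2)]
    bias_weight = random.uniform(0.0, 0.3)  # weight on the constant 1.0 bias input

    def weighted_sum(inputs):
        # Each input times its weight, plus the bias input (always 1.0) times its weight.
        return sum(w * x for w, x in zip(weights, inputs)) + bias_weight * 1.0

    print(weighted_sum([0.5, -1.2]))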
As in linear regression, larger weights indicate increased complexity and fragility, so it
is desirable to keep the weights in the network small; regularisation techniques can be
used to achieve this.
Activation
The weighted inputs are summed and passed through an activation function, sometimes
called a transfer function.
An activation function is a simple mapping of summed weighted input to the output of the
neuron. It is called an activation function because it governs the threshold at which the
neuron is activated and the strength of the output signal.
Historically, simple step activation functions were used: if the summed input was above a
threshold, for example 0.5, the neuron would output a value of 1.0; otherwise it would
output 0.0.
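A minimal sketch of such a step activation in Python (the threshold of 0.5 follows the example above):

    def step_activation(summed_input, threshold=0.5):
        # Fire (output 1.0) only when the summed weighted input exceeds the threshold.
        return 1.0 if summed_input > threshold else 0.0

    print(step_activation(0.7))  # 1.0
    print(step_activation(0.2))  # 0.0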
Traditionally, non-linear activation functions are used. They allow the network to combine
the inputs in more complex ways and, in turn, model a richer set of functions. Common
choices are the logistic function, also called the sigmoid, which outputs a value between
0 and 1 with an S-shaped curve, and the hyperbolic tangent function, also called tanh,
which outputs the same shape over the range -1 to +1.
More recently, the rectifier activation function (ReLU) has been shown to provide better
results.
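For illustration, the three non-linear activations mentioned above can be written in a few lines of Python; evaluating each one at the same points makes their output ranges easy to compare:

    import math

    def sigmoid(x):
        # Logistic (sigmoid) function: S-shaped, output in (0, 1).
        return 1.0 / (1.0 + math.exp(-x))

    def tanh(x):
        # Hyperbolic tangent: S-shaped, output in (-1, +1).
        return math.tanh(x)

    def relu(x):
        # Rectifier (ReLU): zero for negative inputs, identity otherwise.
        return max(0.0, x)

    for x in (-2.0, 0.0, 2.0):
        print(x, sigmoid(x), tanh(x), relu(x))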