27 Multilayer Perceptron (MLP)
H. Taud and J.F. Mas
Abstract Artificial neural networks have been found to be outstanding tools able
to generate generalizable models in many disciplines. In this technical note, we
present the multi-layer perceptron (MLP), the most common neural network.
Artificial Neural Networks (ANNs) are structures inspired by the functioning of the
brain. These networks can perform model function estimation and handle
linear and nonlinear functions by learning from data relationships and generalizing to
unseen situations. One of the most popular ANNs is the
multi-layer perceptron (MLP). It is a powerful modeling tool that applies a
supervised training procedure using examples of data with known outputs (Bishop
1995). This procedure generates a nonlinear function model that enables the prediction
of output data from given input data.
H. Taud (✉)
Centro de Innovación y Desarrollo Tecnológico en Cómputo,
Instituto Politécnico Nacional, Mexico City, Mexico
e-mail: [email protected]
J.F. Mas
Centro de Investigaciones en Geografía Ambiental, Universidad Nacional
Autónoma de México (UNAM), Morelia, Michoacán, Mexico
e-mail: [email protected]
2 Technical Details
In order to understand the MLP, a brief introduction to the one-neuron perceptron
and the single layer perceptron is provided. The former represents the simplest neural
network and has only one output, to which all inputs are connected. Given
$i = 0, 1, \ldots, n$, where $n$ is the number of inputs, the quantities $\{w_i\}$ are the weights of the
neuron. The inputs $\{x_i\}$ correspond to features or variables, and the output $y$ to their
predicted binary class. Figure 1 describes the three steps forming the perceptron
model, and Fig. 2 shows its simplified representation. The weighting step involves
the multiplication of each input feature value by its weight, $\{x_i w_i\}$; in the second
step these products are added together, $x_0 w_0 + x_1 w_1 + \cdots + x_n w_n$. The third is the transfer
step, where an activation function $f$ (also called a transfer function) is applied to the
sum, producing an output $y$:
$$y = f(z), \qquad z = \sum_{i=0}^{n} w_i x_i \qquad (1)$$
Fig. 1 Perceptron steps: from left to right, weighting, sum and transfer steps
Fig. 2 Perceptron model, from left to right: (a) steps model; (b) simplified model
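As a concrete illustration of these three steps and of Eq. (1), the following Python sketch implements a one-neuron perceptron with a step transfer function; the input and weight values are arbitrary examples, not taken from the chapter.

```python
import numpy as np

def perceptron_forward(x, w, f):
    """One-neuron perceptron: the weighting, sum, and transfer steps of Eq. (1)."""
    z = np.dot(w, x)   # weighting and sum: z = sum_{i=0..n} w_i * x_i
    return f(z)        # transfer: y = f(z)

def step(z):
    """Step activation, a common transfer function for a binary perceptron."""
    return 1 if z >= 0 else 0

# Illustrative values only; x[0] = 1 is the bias input paired with weight w[0]
x = np.array([1.0, 0.5, -0.2])
w = np.array([-0.1, 0.8, 0.3])
y = perceptron_forward(x, w, step)   # -> 1, since z = -0.1 + 0.4 - 0.06 = 0.24 >= 0
```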
In the simplest case the activation is linear, $f(z) = z$, and the perceptron acts as a
linear classifier: the decision boundary separating the two classes is the hyperplane

$$\sum_{i=0}^{n} w_i x_i = 0 \qquad (2)$$
Equation (2) can be expressed as the dot product between the weight vector
$W$ and the input vector $X$:

$$W \cdot X = 0 \qquad (3)$$
Given the known responses of the input training data, the learning step (also known
as the training step) can be carried out. The purpose of learning is to optimize the
weights by minimizing a cost function, usually the squared error between the
known response and the estimated one. Iterative optimization techniques such as gradient
descent determine the optimum weight vector, and the algorithm converges to a
solution, reaching an operational network configuration. The model is validated
on new data in order to show how well the configuration generalizes
to new situations.
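A minimal sketch of this learning step, assuming a linear activation and stochastic gradient descent on the squared error (the LMS/delta rule discussed below); the toy data and hyperparameters are illustrative only.

```python
import numpy as np

def train_lms(X, d, lr=0.05, epochs=200):
    """Stochastic gradient descent on the squared error (the LMS/delta rule),
    assuming a linear activation f(z) = z."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, d):
            y = np.dot(w, x)              # estimated response
            w += lr * (target - y) * x    # step along the negative gradient
    return w

# Toy data: the first column is the constant bias input x_0 = 1
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
d = np.array([0.0, 0.0, 0.0, 1.0])   # AND-like known responses
w = train_lms(X, d)                  # np.dot(X, w) now gives the least-squares fit to d
```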
The parallel connection of many perceptrons produces a single layer perceptron
(SLP) architecture, which is used when there are several outputs. Figure 4a shows an
example with an input and an output layer handling a linearly separable multiclass case.
The perceptron and the single layer perceptron cannot solve nonlinearly
separable problems (Fig. 3b). In this case, a solution can be found by stacking
additional layers in succession, creating an MLP architecture
(Fig. 4b). The output of one layer becomes the input of the next, and so on. The first
and the last layers are called input and output layers respectively, while the others
are the hidden layers of the neural network.
The MLP is a layered feedforward neural network in which information
flows unidirectionally from the input layer to the output layer, passing through the
hidden layers (Bishop 1995). Each connection between neurons has its own weight.
Neurons within the same layer share the same activation function, which is generally a
sigmoid for the hidden layers. Depending on the application, the output layer can
use a sigmoid or a linear function.
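A minimal NumPy sketch of this feedforward flow, assuming a sigmoid activation for every layer, including the output; the network shape and random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, weights):
    """Feedforward pass: each layer's output becomes the next layer's input.
    `weights` holds one matrix per layer; a constant 1 is appended to every
    layer input to play the role of the bias term."""
    a = x
    for W in weights:
        a = sigmoid(W @ np.append(a, 1.0))   # same sigmoid activation per layer
    return a

# Illustrative 2-input, 3-hidden-unit, 1-output network with random weights
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 3)),   # hidden layer: (2 inputs + bias) -> 3 units
           rng.normal(size=(1, 4))]   # output layer: (3 hidden + bias) -> 1 unit
y = mlp_forward(np.array([0.2, -0.7]), weights)
```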
Among many alternatives, the most widely used MLP learning algorithm is
backpropagation, a generalization of the least mean squares (LMS) rule (Du and
Swamy 2014). Weights are corrected by propagating the errors from layer to
layer, starting at the output layer and working backwards, hence the name
backpropagation.
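The following sketch shows one backpropagation weight correction for a single-hidden-layer network with sigmoid units and a squared-error cost; bias terms are omitted for brevity, and the variable names are ours, not the chapter's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, d, W1, W2, lr=0.1):
    """One weight correction for a single-hidden-layer MLP with sigmoid units
    and squared-error cost; the error is propagated from the output layer
    back to the hidden layer."""
    # Forward pass
    h = sigmoid(W1 @ x)                             # hidden activations
    y = sigmoid(W2 @ h)                             # output activations
    # Backward pass: output-layer error first, then the hidden-layer error
    delta_out = (y - d) * y * (1.0 - y)             # dE/dz at the output layer
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)  # error propagated backwards
    # Gradient-descent corrections, layer by layer
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hid, x)
    return W1, W2
```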
MLP model performance depends not only on the choice of variables,
the number of hidden layers and nodes, and the training data, but also on training
parameters such as the learning rate, the momentum controlling the weight change, and
the number of iterations. An MLP with a single hidden layer may capture the nonlinear
function only with lower accuracy, whereas networks with many hidden layers are more
likely to overfit the training data. The learning rate and the momentum control the
speed and effectiveness of the learning process.
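As an illustration of how the momentum term enters the weight change, one common formulation (an assumption for illustration, not a prescription from this chapter) blends the previous update into the current one:

```python
def momentum_update(W, grad, velocity, lr=0.1, momentum=0.9):
    """Gradient-descent step with momentum: part of the previous weight
    change (velocity) is carried forward, smoothing and speeding learning."""
    velocity = momentum * velocity - lr * grad   # blend old change with new gradient
    return W + velocity, velocity

# Usage with scalars for clarity; the same code works on NumPy weight arrays
W, v = 0.5, 0.0
W, v = momentum_update(W, grad=0.2, velocity=v)   # first step: v = -0.02, W = 0.48
```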
In land change modeling, analyzing the complex relationships between land
transitions and the large number of variables acting as drivers requires advanced
empirical techniques to find a nonlinear function that describes such a complex
relationship (Mas et al. 2014). Variables such as distance, slope, soil type, and land
tenure are presented at the input nodes of the network. Each output node
represents a different land transition (e.g., forest to pasture, forest to cropland,
forest to urban) for which the explanatory variable values are known, as well as
the land transition observed in the past. After the training step, the MLP is able to
predict the change potential of each transition when new input data are presented to
the network (Pijanowski et al. 2002; Mas et al. 2004).
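A hedged sketch of this workflow using scikit-learn's MLPClassifier as a stand-in for the land change modeling packages cited above; all feature values, class labels, and hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical explanatory variables per location: distance to roads,
# slope, soil type (encoded), land tenure (encoded). Values are made up.
X_train = np.array([[120.0,  3.5, 1, 0],
                    [450.0, 12.0, 2, 1],
                    [ 80.0,  1.2, 1, 1],
                    [300.0,  8.0, 2, 0]])
# Observed past transitions as class labels, e.g. 0 = forest->pasture,
# 1 = forest->cropland, 2 = forest->urban (labels are illustrative)
y_train = np.array([0, 1, 2, 1])

model = MLPClassifier(solver='sgd', hidden_layer_sizes=(8,),
                      learning_rate_init=0.01, momentum=0.9, max_iter=2000)
model.fit(X_train, y_train)

# Change potential of each transition for new input data
X_new = np.array([[200.0, 5.0, 2, 0]])
potentials = model.predict_proba(X_new)
```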
References
Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, Oxford
Du K-L, Swamy MNS (2014) Neural networks and statistical learning. Springer, Berlin
Mas JF, Puig H, Palacio JL, Sosa AA (2004) Modelling deforestation using GIS and artificial neural networks. Environ Model Softw 19(5):461–471
Mas JF, Kolb M, Paegelow M, Camacho Olmedo MT, Houet T (2014) Inductive pattern-based land use/cover change models: a comparison of four software packages. Environ Model Softw 51:94–111
Pijanowski BC, Brown DG, Shellito BA, Manik GA (2002) Using neural nets and GIS to forecast land use changes: a land transformation model. Comput Environ Urban Syst 26(6):553–575