Neural Network Implementation in Java
Abstract: Artificial neural networks are a powerful tool that engineers can use for a variety of purposes. The most common tasks are classification and regression, as well as control, modeling and prediction.
modeling and prediction. For more than three decades, the field of artificial neural networks
has been the center of intensive research. A large number of software tools have been
developed to train these types of networks, but there is still interest in implementing neural
networks in different programming languages. This paper aims to present the implementation
of an arbitrary neural network in the Java programming language.
Keywords: Artificial neural networks, Java, Neural network training, Neural networks
prediction
1. Introduction
Artificial neural networks are systems inspired by biological neural networks. Neural
networks learn to perform tasks by considering examples, without being programmed with task-specific rules. For example, in image recognition, they can learn to identify
images by analyzing examples of images that have been manually tagged. They
automatically generate identification characteristics from the examples they process.
The class of multilayer perceptron (MLP) networks (Bishop, 1995; Demuth et al., 2014) is one of the most frequently studied models of neural networks. The multilayer perceptron is a nonlinear model that maps inputs to outputs and organizes its processing units, the neurons, into layers. Communication is possible only between neurons from different layers (Maca et al., 2014). Standard multilayer neural networks are capable of approximating any measurable function to any degree of accuracy (Hornik et al., 1989).
Despite the positive research results and the large number of papers, there is still a need for a clear presentation of, and methodological recommendations for, the implementation of neural networks in the Java programming language.
This paper presents the implementation of a neural network with an arbitrary number of layers and an arbitrary number of neurons in those layers.
2. Network architecture
A neural network with $l$ layers is considered (Figure 1). The inputs $x_1, x_2, \dots, x_{n_1}$ are connected to the neurons of the first layer, and each connection carries a weight $w_{i,j}$. Each neuron is a processing unit; the net input of the first neuron in the first layer is calculated as:
$$z_1^1 = w_{1,1}^1 x_1 + w_{1,2}^1 x_2 + \dots + w_{1,n_1}^1 x_{n_1} + b_1^1 \qquad (1)$$

In general, for the $i$-th neuron of the first layer:

$$z_i^1 = \sum_{j=1}^{n_1} w_{i,j}^1 x_j + b_i^1 \qquad (2)$$

If the bias is treated as the weight $w_{i,0}^1$ of an additional constant input $x_0 = 1$, this can be written as:

$$z_i^1 = \sum_{j=0}^{n_1} w_{i,j}^1 x_j \qquad (3)$$
The output of the $i$-th neuron in the first layer, which serves as an input to the neurons of the second layer, is:

$$a_i^1 = f(z_i^1) \qquad (4)$$
The output of the neural network can be calculated in a for-loop. For-loops are often slow when processing large data sets, so the solution is to vectorize these equations.
For the architecture shown, the matrix equation can be generalized as follows:
$$Z = W X^T + b \qquad (5)$$
$$A = f(Z) \qquad (6)$$
The function f is called the transfer function. It should provide a nonlinear mapping between the inputs and the desired target outputs.
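As a preview of the implementation in Section 4, a hedged sketch of this vectorized step for one layer, written against the Layer and Matrix classes described there, could look as follows (Matrix.add is an assumption; Matrix.mul, Matrix.T and the Layer methods appear later in the paper):

// One vectorized layer step, following equations (5) and (6)
double[][] Z = Matrix.add(Matrix.mul(layer.getWeights(), Matrix.T(X)),   // W X^T
                          layer.getBias());                              // + b
double[][] A = layer.func(Z);                                            // A = f(Z)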
The characteristic output of a linear neuron with a single bias input is shown in Figure 2 (a).
The sigmoid function is shown in Figure 2 (b). This transfer function takes an input,
which can have any value between plus and minus infinity, and gives an output in the
range of 0 to 1, according to the expression:
$$f(x) = \frac{1}{1 + e^{-x}} \qquad (7)$$
The hyperbolic tangent function is shown in Figure 2 (c).
$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (8)$$
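A hedged sketch of how these two transfer functions might be implemented behind the ActivationFunc type named in Figure 3 is given below. The scalar methods apply and derivative are assumptions introduced here for illustration, not names taken from the paper:

public interface ActivationFunc {
    double apply(double x);
    double derivative(double x);
}

// Sigmoid, equation (7); its derivative is f'(x) = f(x)(1 - f(x))
public class Sigmoid implements ActivationFunc {
    public double apply(double x) { return 1.0 / (1.0 + Math.exp(-x)); }
    public double derivative(double x) { double a = apply(x); return a * (1.0 - a); }
}

// Hyperbolic tangent, equation (8); its derivative is f'(x) = 1 - f(x)^2
public class Tanh implements ActivationFunc {
    public double apply(double x) { return Math.tanh(x); }
    public double derivative(double x) { double a = Math.tanh(x); return 1.0 - a * a; }
}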
Duch and Jankowski (2001) present several possibilities for using transfer functions of different types in neural networks, as well as the regularization of large networks with heterogeneous nodes and constructive approaches.
Maca et al. (2014) analyze the influence of the transfer function selection and of the training algorithm on neural network flood runoff forecasts. The sigmoid function used in this paper is described by Yonaba et al. (2010).
3. The backpropagation algorithm
Rummelhart (1995) proposed an algorithm inspired by the gradient method, called backpropagation. Following the gradient method, the output error is propagated back to the previous layers in order to find the influence of the individual weights on the obtained error and to determine the weight corrections in the individual layers.
The goal is to minimize the overall error. We can now calculate the error for each
output neuron using the error function and sum them to get the total error:
$$J = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{k=1}^{n_L} y_k^i \log\big(h_w(x^i)\big)_k + \big(1 - y_k^i\big)\log\big(1 - (h_w(x^i))_k\big)\right] \qquad (9)$$
where $y_k^i$ is the desired output for the $i$-th training example and the $k$-th class, and $h_w(x^i)$ is the network output for the $i$-th training example, i.e. the actual result of the network.
$$\delta^L = \frac{\partial J}{\partial z^L} = \frac{\partial J}{\partial a^L}\cdot\frac{\partial a^L}{\partial z^L} = a^L - y \qquad (10)$$
Backpropagation allows the calculation of $\delta^l$ for each layer; with the help of these errors, the values of real interest, $\frac{\partial J}{\partial w_{ij}^l}$, are then calculated, that is, the influence of the weights of each layer on the total error.
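Although the paper does not write them out explicitly, the backward relations that the implementation in Section 4 computes can be summarized, in the notation of equations (5)-(10), as:

$$\delta^l = \big((W^{l+1})^T \delta^{l+1}\big) \odot f'(z^l), \qquad \frac{\partial J}{\partial W^l} = \frac{1}{m}\,\delta^l\,(a^{l-1})^T, \qquad \frac{\partial J}{\partial b^l} = \frac{1}{m}\sum_{i=1}^{m}\delta^{l,(i)}$$

where $\odot$ denotes the element-wise (Hadamard) product and $a^0 = x$.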
The parameter $\alpha$ represents the learning rate. The learning rate is a small positive value that controls the magnitude of the parameter change at each iteration and thus how quickly the neural network learns a problem.
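For completeness, the gradient descent update applied at the end of each iteration, and implemented per layer in Section 4, is:

$$W^l \leftarrow W^l - \alpha\,\frac{\partial J}{\partial W^l}, \qquad b^l \leftarrow b^l - \alpha\,\frac{\partial J}{\partial b^l}$$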
4. Implementation
The base classes in the system are NeuralNetwork and Layer, which implement a general neural network model. An auxiliary class Matrix is also used, which implements the basic matrix operations. The Layer class is presented in Figure 3.
Layer
# double[][] inputs
# double[][] W
# double[][] B
# double[][] Z
# int numNeuron
# ActivationFunc activationFunc
+ Layer(int numNeuron, int inputNum)
+ setActivationFunc
+ getActivationFunc
+ setWeights
+ setInput
+ setBias
+ getWeights
+ getInput
+ getBias
+ output
+ func
+ derivative
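As an orientation only, a minimal Java skeleton corresponding to Figure 3 could look as follows. The field and method names come from the figure; the constructor body and the use of the scalar ActivationFunc interface sketched in Section 2 are assumptions:

public class Layer {
    protected double[][] inputs;            // layer inputs, one column per training example
    protected double[][] W;                 // weight matrix
    protected double[][] B;                 // bias column vector
    protected double[][] Z;                 // net input of the layer
    protected int numNeuron;
    protected ActivationFunc activationFunc;

    public Layer(int numNeuron, int inputNum) {
        this.numNeuron = numNeuron;
        this.W = new double[numNeuron][inputNum];   // weights, initialized elsewhere (e.g. randomly)
        this.B = new double[numNeuron][1];
    }

    // element-wise transfer function f(Z)
    public double[][] func(double[][] z) {
        double[][] a = new double[z.length][z[0].length];
        for (int i = 0; i < z.length; i++)
            for (int j = 0; j < z[0].length; j++)
                a[i][j] = activationFunc.apply(z[i][j]);
        return a;
    }

    // element-wise derivative f'(Z), used by backpropagation
    public double[][] derivative(double[][] z) {
        double[][] d = new double[z.length][z[0].length];
        for (int i = 0; i < z.length; i++)
            for (int j = 0; j < z[0].length; j++)
                d[i][j] = activationFunc.derivative(z[i][j]);
        return d;
    }
    // setters, getters and the output method are omitted here
}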
The matrix desiredInputs defines the inputs to the neural network, and the matrix desiredOutputs defines the desired outputs.
int NUMBER_LAYERS = 2;
int numNeuron[] = {15,10};
int inputNum[] = {25,15};
private double[][] desiredInputs;
private double[][] desiredOutputs;
The lists layers, W and B represent the neural network elements (Figure 1), and they are initialized inside the constructor.
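A hedged sketch of how the constructor might fill these lists is shown below. The index-0 placeholder, which keeps the layer numbering starting at 1, is an assumption inferred from how layers.get(layer) is indexed in the fragments that follow:

layers = new ArrayList<Layer>();
W = new ArrayList<double[][]>();
B = new ArrayList<double[][]>();
layers.add(null); W.add(null); B.add(null);               // index 0 unused, layers numbered from 1
for (int layer = 1; layer <= NUMBER_LAYERS; layer++) {
    Layer l = new Layer(numNeuron[layer - 1], inputNum[layer - 1]);
    layers.add(l);
    W.add(l.getWeights());
    B.add(l.getBias());
}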
For the output layer (layer = NUMBER_LAYERS) the gradients are computed first. The matrix dApom, the derivative of the cost with respect to the output activations, is computed just before this excerpt and is omitted here:

double[][] dZpom = Matrix.hadamardProduct(dApom,
        layers.get(NUMBER_LAYERS).derivative(layers.get(NUMBER_LAYERS).getZ()));
double[][] dWpom = Matrix.div(Matrix.mul(dZpom,
        Matrix.T(layers.get(NUMBER_LAYERS).getInput())), m);
// bias gradient summed over the training examples, as for the hidden layers below
double[][] dBpom = Matrix.div(Matrix.sumByRows(dZpom), m);
dA.set(NUMBER_LAYERS, dApom);
dZ.set(NUMBER_LAYERS, dZpom);
dW.set(NUMBER_LAYERS, dWpom);
dB.set(NUMBER_LAYERS, dBpom);
The influence of the weights on the neurons' error in each hidden layer (layer = NUMBER_LAYERS-1; layer > 0; layer--) can be expressed as follows:

double[][] dA2 = Matrix.mul(Matrix.T(layers.get(layer + 1).getWeights()),
        dZ.get(layer + 1));
double[][] dZ2 = Matrix.hadamardProduct(dA2,
        layers.get(layer).derivative(layers.get(layer).getZ()));
double[][] dW2;
if (layer == 1) {
    dW2 = Matrix.div(Matrix.mul(dZ2, desiredInputs), m);
} else {
    dW2 = Matrix.div(Matrix.mul(dZ2, Matrix.T(layers.get(layer).getInput())), m);
}
double[][] dZ2pom = Matrix.sumByRows(dZ2);
double[][] dB2 = Matrix.div(dZ2pom, m);
dA.set(layer, dA2);
dZ.set(layer, dZ2);
dW.set(layer, dW2);
dB.set(layer, dB2);
At the end of each iteration, new weights are determined for each layer, taking into account the learning rate alfa. The weight corrections and the new weight and bias values per layer are obtained as:

for (int layer = 1; layer < NUMBER_LAYERS + 1; layer++) {
    double[][] W1 = Matrix.sub(layers.get(layer).getWeights(),
            Matrix.hadamardProduct(alfa, dW.get(layer)));
    double[][] B1 = Matrix.sub(layers.get(layer).getBias(),
            Matrix.hadamardProduct(alfa, dB.get(layer)));
    W.set(layer, W1);
    B.set(layer, B1);
}
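Putting the fragments above together, one training run might be driven by a loop of the following shape. The wrapper methods forward, backward and update, as well as maxIterations, are hypothetical names used only for illustration, not names from the paper:

for (int iteration = 0; iteration < maxIterations; iteration++) {
    network.forward(desiredInputs);      // equations (5)-(6) for every layer
    network.backward(desiredOutputs);    // gradients dW, dB as computed above
    network.update(alfa);                // gradient descent step for each layer
}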
5. Results
A set of ten characters is used to test the network. Each character is represented by a 5x5 matrix, so the neural network has 25 inputs and ten outputs. The network also contains one hidden layer with 45 neurons.
For the network training process, the training data shown in Figure 4 are used. Figure 4 also shows the desired outputs for the given training data.
We see that each letter is represented by a 5x5 matrix (Figure 5). Matrix elements have
a value of 0.0 or 1.0, depending on the appearance of a given character.
A B C D E
00100 11111 01111 11110 11111
01010 10001 10000 10001 10000
01010 11110 10000 10001 11110
11111 10001 10000 10001 10000
10001 11111 01111 11110 11111
F G H I J
11111 01111 10001 00100 11111
10000 10000 10001 00100 00001
11110 10011 11111 00100 00001
10000 10001 10001 00100 10001
10000 01110 10001 00100 01110
Figure 5. Characters
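For illustration, the row of desiredInputs and desiredOutputs corresponding to the letter A from Figure 5 could be built as follows. This is a sketch that assumes one-hot desired outputs, which the result for the letter B in Figure 7 suggests:

// Letter A from Figure 5, flattened row by row into 25 input values
double[] letterA = {
    0, 0, 1, 0, 0,
    0, 1, 0, 1, 0,
    0, 1, 0, 1, 0,
    1, 1, 1, 1, 1,
    1, 0, 0, 0, 1
};
// Desired output for the letter A: the first of the ten outputs is 1.0
double[] targetA = {1, 0, 0, 0, 0, 0, 0, 0, 0, 0};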
After the training, the network was tested with input data representing the letter B, with intentionally introduced noise (Figure 6).
1 0.7 1   1   1
1 0.3 0   0   1
1 1   1   0.9 0
1 0   0   0   1
1 1   1   1   1
Figure 6. The letter B with introduced noise
The results in Figure 7 show that the network correctly performed the classification and recognized the letter B, since only the second element of the output vector has a value of 1.0, while all other elements are approximately equal to zero.
Figure 7. Results
In the case of a neural network that has two hidden layers and uses the hyperbolic tangent transfer function, the results are shown in Figures 9 and 10.
Figure 10. Error for a neural network with two hidden layers and Hyperbolic
tangent transfer function
6. Conclusion
This paper presents the implementation of a neural network in the Java programming language. The implementation is performed in general form, for an arbitrary network with L layers and a given number of neurons in each layer. The sigmoid function was used as the transfer function in the first example and the hyperbolic tangent in the second.
A 10x25 matrix was used to train the network. Each row of the matrix represents one letter as a flattened 5x5 pattern. A matrix of desired outputs is also given.
Network testing shows that the network correctly classifies the data and minimizes the error after about 300 iterations (Figure 10) in the case of the network with two hidden layers. In the case of the network with one hidden layer, the convergence is somewhat slower.
In the example, regularization was not used, so if the number of selected parameters is too large, the neural network may begin to fit the noise in the data, which can result in parameter adjustments that do not generalize.
References