Neural Networks in MATLAB
Hugo Badillo (A01196525)
2. Neural networks
    2.1. Learning: Back-propagation algorithm
    2.2. Easy problems
    2.3. XOR
5. Mistakes
6. Codes
    6.1. Discriminant MATLAB code with back-propagation
    6.2. XOR MATLAB code with back-propagation
2. Neural networks
Neural networks are a set of interconnected neural units. The structure of a multilayer neural network is the following:
[Diagram: input layer \(x_0, x_1, x_2\); hidden layer 1 \(h_0^{(1)}, h_1^{(1)}, h_2^{(1)}\); hidden layer 2 \(h_0^{(2)}, h_1^{(2)}, h_2^{(2)}\); output layer \(y_1\).]
Different structures of neural networks exist, but this diagram shows the general dynamics of the network. The input layer consists of a set of \(n\) input values, which can be represented as a column vector \(\mathbf{X} = [x_0\; x_1\; \ldots\; x_n]^T\). The first hidden layer is the result of a linear combination of the input values used as the argument of a nonlinear function. In matrix form, the argument of the nonlinear function for each hidden layer is given by:
\[ s_i = \mathbf{W}_i\,\mathbf{X}_{i-1} = \begin{bmatrix} w_{i,1} & w_{i,2} & \cdots & w_{i,m} \end{bmatrix} \begin{bmatrix} x_{i,1} \\ x_{i,2} \\ \vdots \\ x_{i,m} \end{bmatrix} \tag{3} \]
Where \(\mathbf{W}_i\) is a row vector with the \(m\) weights that connect each output variable of the previous layer to the following layer, and \(\mathbf{X}_{i-1}\) is a column vector with the \(m\) outputs of layer \(i-1\). If \(s_i\) is greater than a threshold value the output of the nonlinear function is 1, which means that the neuron is activated; if the output of the nonlinear function is 0, the neuron is not active. The most common nonlinear function is the sigmoid:
\[ \sigma(s_i) = \frac{1}{1 + \exp(-s_i)} \tag{4} \]
The inputs of the network travel through the whole network activating different neurons until they reach the output
layer, where an output is produced.
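As a minimal sketch (added for illustration, not taken from the report's own code), a single forward pass through one hidden layer in MATLAB could look as follows; the input values and weights are arbitrary placeholders:

sigma = @(s) 1./(1 + exp(-s));  % sigmoid nonlinearity of eq. (4)
x  = [1; 0.7; -0.3];            % example input column vector; x0 = 1 acts as a bias
W1 = rand(3,3) - 0.5;           % input-to-hidden weights (placeholder values)
W2 = rand(1,3) - 0.5;           % hidden-to-output weights
h  = sigma(W1*x);               % hidden-layer activations, eqs. (3)-(4)
y  = sigma(W2*h)                % network output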
2.1. Learning: Back-propagation algorithm
Analogous to humans, neural networks learn from mistakes. In this case the mistake is a measure of the difference between the output and the target value, also known as the error. When the error is large the network must change the inter-layer weights to adjust the output \(y\) towards the desired output \(d\). The mean squared error of the network is:
\[ E_{ms}(\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_n) = \frac{1}{T}\sum_{t=1}^{T}(d_t - y_t)^2 \tag{5} \]
Where \(T\) is the number of tests. The set of weights that minimizes \(E_{ms}\) gives the best match between the output of the network and the target output. For convenience, the error of the neural network will be defined as:
\[ E = \frac{1}{2}\sum_{t=1}^{T}(d_t - y_t)^2 \tag{6} \]
At a minimum of the error, its gradient with respect to the weights vanishes:
\[ \nabla E = 0 \tag{7} \]
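In MATLAB this error is a single expression (the same line appears in the code of section 6.1); assuming d and y are vectors of targets and outputs with the same orientation:

E = 0.5*sum((d - y).^2);  % error of eq. (6), summed over all T tests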
Now the partial derivatives of the error function must be calculated. Remembering that the output is a function of the weights, \(y_k = f_k\!\left(\sum_{b=0}^{M} w_{kb} h_b\right)\), the gradient of the error with respect to the output-layer weights is:
\[ \frac{\partial E_\chi}{\partial w_{kj}} = \frac{\partial E_\chi}{\partial y_k}\,\frac{\partial y_k}{\partial w_{kj}} = \frac{\partial}{\partial y_k}\,\frac{1}{2}\sum_{t=1}^{T}(d_t - y_t)^2 \;\frac{\partial}{\partial w_{kj}} f_k\!\left(\sum_{b=0}^{M} w_{kb} h_b\right) \tag{8} \]
\[ \frac{d\sigma(x)}{dx} = \sigma(x)\bigl(1 - \sigma(x)\bigr) \tag{9} \]
\[ \frac{\partial E_\chi}{\partial w_{kj}} = (y_k - d_k)\,y_k(1 - y_k)\,h_j \tag{10} \]
Defining \(\delta_{y_k} \equiv (y_k - d_k)\,y_k(1 - y_k)\), equation (10) becomes:
\[ \frac{\partial E_\chi}{\partial w_{kj}} = \delta_{y_k} h_j \tag{11} \]
Applying the chain rule one layer further back, the gradient of the error with respect to the hidden-layer weights \(w_{ji}\) is:
\[ \frac{\partial E_\chi}{\partial w_{ji}} = \left[\sum_{a=1}^{K}(y_a - d_a)\, f_a'\!\left(\sum_{b=0}^{M} u_{ab} h_b\right) u_{aj}\right] f_j'\!\left(\sum_{b=0}^{p} w_{jb} x_b\right) x_i \tag{13} \]
\[ \frac{\partial E_\chi}{\partial w_{ji}} = \sum_{a=1}^{K}\bigl[(y_a - d_a)\,y_a(1 - y_a)\,u_{aj}\bigr]\, h_j(1 - h_j)\, x_i = \delta_{h_j} x_i \tag{14} \]
Where \(\delta_{h_j} \equiv \sum_{a=1}^{K}\bigl[(y_a - d_a)\,y_a(1 - y_a)\,u_{aj}\bigr]\, h_j(1 - h_j)\). Equation (14) represents the back-propagation of the error from the output layer to the hidden layer. The network is initialized with a random set of weights. The direction of maximum increase of the error is parallel to its gradient, so each weight must be adjusted in the opposite direction. As in the gradient-descent method, the weights are updated with
\[ w_{new} = w_{old} - \alpha\,\frac{\partial E}{\partial w} \tag{15} \]
where \(\alpha\) is a positive scalar called the learning rate. When the gradient of the error is zero the weight has reached a critical point. The updates of the weights in the output and hidden layers are derived from equations (10) and (14), respectively. For the output layer,
\[ u_{kj} = u_{kj}^{old} - \eta\,\delta_{y_k} h_j \tag{16} \]
and for the hidden layers,
\[ w_{ji} = w_{ji}^{old} - \mu\,\delta_{h_j} x_i \tag{17} \]
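As a concrete illustration with hypothetical numbers (not taken from the report): for a single output neuron with \(y_k = 0.8\), target \(d_k = 1\), hidden activation \(h_j = 0.6\) and learning rate \(\eta = 0.5\), equation (10) gives \(\delta_{y_k} = (0.8 - 1)(0.8)(1 - 0.8) = -0.032\), so equation (16) updates the weight to \(u_{kj} = u_{kj}^{old} - 0.5\,(-0.032)(0.6) = u_{kj}^{old} + 0.0096\); the weight grows slightly, pushing the output towards the target.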
2.2. Easy problems
The first test problem is a linear discriminant: a single neuron is trained to classify points as lying above or below a straight line
\[ y = mx + b \tag{18} \]
The discriminant line chosen here is \(y_0 = x_0\), i.e. the line \(y = x\). A given point \((x, y)\) lies above the line if \(y > x\), below it if \(y < x\), and on it if \(y = x\).
Figure 2: Mean squared error per iteration
2.3. XOR
To solve the XOR problem, at least one hidden layer is required. The structure of the neural network is the following:
[Diagram: input layer \(x_0, x_1, x_2\); one hidden layer \(h_0^{(1)}, h_1^{(1)}, h_2^{(1)}\); a single output \(y\).]
Where 𝑥0 = 1 is the bias input, 𝑥1 and 𝑥2 are the real input variables of the XOR. The truth table of the XOR is:
𝑥1 𝑥2 Q
0 0 0
0 1 1
1 0 1
1 1 0
To train the neural network with all the possible combinations of inputs, a matrix of inputs is built in which each column is a different set of inputs ("test"). For the forward propagation of the network, a matrix of six weights is needed for the hidden layer:
\[ \mathbf{W} = \begin{bmatrix} w_{10} & w_{11} & w_{12} \\ w_{20} & w_{21} & w_{22} \end{bmatrix} \tag{25} \]
The argument of the sigmoid function of the hidden layer is given by:
\[ \mathbf{h} = \mathbf{W}\mathbf{X} = \begin{bmatrix} w_{10} & w_{11} & w_{12} \\ w_{20} & w_{21} & w_{22} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix} = \begin{bmatrix} h_1 & h_1 & h_1 & h_1 \\ h_2 & h_2 & h_2 & h_2 \end{bmatrix} \tag{26} \]
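The following is a compact sketch (a simplified stand-in, not the exact listing of section 6.2) of how the network above can be trained in MATLAB with the forward pass of equation (26) and the updates of equations (16) and (17); the learning rates and iteration count are illustrative:

X = [1 1 1 1; 0 1 0 1; 0 0 1 1];      % inputs, one test per column; the first row is the bias x0 = 1
d = [0 1 1 0];                        % XOR targets
W = rand(2,3) - 0.5;                  % hidden-layer weights, eq. (25)
U = rand(1,3) - 0.5;                  % output-layer weights (the first entry acts on the hidden bias)
eta = 2; mu = 2;                      % learning rates (illustrative)
E = zeros(1,20000);
for t = 1:20000
    H = [ones(1,4); 1./(1+exp(-W*X))];                  % hidden activations with bias unit h0 = 1, eq. (26)
    y = 1./(1+exp(-U*H));                               % network output
    E(t) = 0.5*sum((d - y).^2);                         % error of eq. (6)
    dy = (y - d).*y.*(1 - y);                           % output deltas, eq. (10)
    dh = (U(2:end)'*dy).*H(2:end,:).*(1 - H(2:end,:));  % hidden deltas, eq. (14)
    U = U - eta*(dy*H');                                % output-layer update, eq. (16)
    W = W - mu*(dh*X');                                 % hidden-layer update, eq. (17)
end
round(y)                              % should reproduce the Q column of the truth table: 0 1 1 0

Depending on the random initialization the loop can settle in a local minimum, in which case it simply has to be rerun with new random weights.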
The back-propagation was carried out with equations (11) and (14), since there are now two layers of weights. The results were:
Figure 4: Mean squared error per iteration
Figure 6: Recurrent neural network flow diagram
Due to the complexity of programming a recurrent neural network, MATLAB's Neural Net Time Series app was used. First, the type of recurrent neural network must be selected. There are two types: Nonlinear Autoregressive with External (Exogenous) Input (NARX) and Nonlinear Autoregressive (NAR).
Figure 7
The NAR is a type of recurrent neural network that only uses the time series of interest. Therefore the only input data for this network is the closing price of the dollar over 294 days.
The NARX uses the data of multiple time series to forecast the time series of interest. In this case the input data are the closing price of the dollar and the corresponding opening price over the same 294 days.
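In code (a sketch assuming the Deep Learning Toolbox functions behind the app are called directly), the two architectures correspond to the narnet and narxnet constructors; the delay and neuron counts here are placeholders:

netNAR  = narnet(1:2, 10);       % NAR: fed only the closing-price series (feedback delays 1..2, 10 hidden neurons)
netNARX = narxnet(1:2, 1:2, 10); % NARX: also receives the opening-price series as an exogenous input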
The following step is to select the structure of the neural network. Since the choice of the number of hidden neurons and delays is essentially arbitrary, a small experiment was made to determine how many hidden neurons and delays to use for each type of recurrent neural network, using the testing mean squared error (MSE) as the measure of performance. The experiment varied the number of hidden neurons, \(h \in \{10, 50, 100\}\), and the number of delays, \(d \in \{2, 5, 15, 30\}\). The testing MSE for each type of recurrent neural network was:
NAR (testing MSE)

h \ d |     2      |     5      |     15     |     30
  10  | 0.0252928  | 0.0219844  | 0.0186507  | 0.0542187
  50  | 0.0166787  | 0.019934   | 0.0227276  | 0.0213922
 100  | 0.0809912  | 0.0237858  | 0.0201315  | 0.036909

NARX (testing MSE)

h \ d |     2      |     5      |     15     |     30
  10  | 0.00073653 | 0.00078194 | 0.00430746 | 0.00153671
  50  | 0.00052377 | 0.00048762 | 0.00323918 | 0.00164592
 100  | 0.00150496 | 0.00155932 | 0.00323086 | 0.00352952
Based on this experiment it is evident that the best type of neural network is the Nonlinear Autoregressive with External (Exogenous) Input (NARX): the network performed roughly an order of magnitude better when the opening-price data were also considered. The experiment also shows that the performance does not depend strongly on the number of hidden neurons or on the number of delay days. The number of hidden neurons was therefore set to 50 and the number of delays to 5; among the tested combinations this also gave the lowest testing MSE.
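A sketch of training and scoring the chosen configuration outside the app; the price series are replaced by synthetic stand-ins, since the real 294-day data are not reproduced in this report:

closePrice = 18 + cumsum(0.05*randn(1,294));  % synthetic stand-in for the daily closing price
openPrice  = closePrice + 0.02*randn(1,294);  % synthetic stand-in for the daily opening price
T = num2cell(closePrice);                     % series to forecast, as a row cell array
X = num2cell(openPrice);                      % exogenous input series
net = narxnet(1:5, 1:5, 50);                  % NARX with d = 5 delays and h = 50 hidden neurons
net.trainFcn = 'trainbr';                     % Bayesian Regularization, as in Figure 8
[Xs, Xi, Ai, Ts] = preparets(net, X, {}, T);  % shift the series to fill the delay taps
net = train(net, Xs, Ts, Xi, Ai);
Y = net(Xs, Xi, Ai);                          % network response
perform(net, Ts, Y)                           % mean squared error of the response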
Figure 8: Response of the neural network trained with Bayesian Regularization (h = 50, d = 5, MSE = 8.38749 × 10⁻⁴).
The behaviour of the dollar-versus-peso time series is described with very good accuracy by the neural network. Zooming into the plot, we can observe that the deviation between the test outputs and the test targets is very small and that every increase or decrease of the dollar-versus-peso series is correctly forecast.
Figure 9: Zoomed-in view of the time series
This last result is the most important, because the neural network indicates with very low uncertainty whether the price of the currency will increase or decrease. The forecast of the neural network is therefore more trustworthy, and actions can be taken in advance to maximize profits.
5. Mistakes
Mistakes were made when trying to use a non-recurrent neural network to fit the dollar-versus-peso time series. Initially a genetic algorithm was going to be the method for finding the weights of the neural network, but after several implementations the network never converged to a good mean squared error. The method for finding the weights was then changed to back-propagation, but again no convergence was achieved. The reason behind these failures is the architecture of the neural network: the lack of feedback of past outputs did not allow the network to find the right weights to fit the training data.
6. Codes
6.1. Discriminant MATLAB code with back-propagation
%% Setup (reconstructed sketch: the original listing begins at its line 12, so this data generation is an assumption)
x=linspace(0,1,100); y=x;              % discriminant line y0 = x0
xrand=rand(1,100); yrand=rand(1,100);  % random points to classify
p=[xrand' yrand'];                     % one point per row
d=double(yrand>xrand)';                % targets: 1 above the line, 0 below
Q0=d;                                  % colour code for the scatter plot
figure(1)
plot(x,y)
hold on
scatter(p(:,1),p(:,2),[],Q0)
title('Points up and down the line')
xlabel('x')
ylabel('y')
%% Training
%% Forward propagation
X=[xrand;yrand]; %input matrix, one test per column
W=rand(1,2);     %weights of the single neuron
for t=1:100      %iterations
    Y=1./(1+exp(-W*X));   %output of the neuron for every test
    E=0.5*sum((Y-d').^2); %total squared error over all tests, eq. (6)
    MSE(t)=mean(E);
    %% Back propagation of error
    for i=1:length(d)
        dY=(Y(i)-d(i))*Y(i)*(1-Y(i)); %error signal for test i, cf. eq. (10)
        dW(:,i)=dY*X(:,i);            %weight gradient for test i, cf. eq. (11)
    end
    %% Mean change in weights
    dWm=(1/length(d))*sum(dW,2);
    W=W-dWm'; %gradient-descent update on the mean gradient, cf. eq. (16)
end
figure(2)
plot(MSE)
title('Mean squared error of each iteration')
ylabel('E_{ms}')
xlabel('Iterations (t)')
%% Output of the trained neuron
%% Forward propagation
ytrained=1./(1+exp(-W*X));
figure(3)
plot(x,y)
hold on
scatter(xrand,yrand,[],round(ytrained'))
title('Neural network classifying points')
xlabel('x')
ylabel('y')
6.2. XOR MATLAB code with back-propagation
% (Only the back-propagation part of this listing survives; the data setup and
% forward propagation that precede it are not reproduced.)
        dW(:,:,i)=m*dH.*X(:,i)'; %Change in weights of the hidden layer for each test
        dU(i,:)=n*dy*H(:,i)';    %Change in weights of the output layer for each test
    end
    %% Mean change in weights
    dWm=sum(dW,3)/i; %Average change in the hidden-layer weights
    dUm=sum(dU,1)/i; %Average change in the output-layer weights
    %% Weight update
    W=W-dWm(2:end,:); %hidden-layer update; the first row of dWm belongs to the bias unit and is discarded
    U=U-dUm;          %output-layer update
end
toc
y %output of the trained network
plot(E)
title('Mean squared error between target and network outputs')
xlabel('Iterations (t)')
ylabel('E(t)')