XOR Problem Demonstration Using MATLAB
ABSTRACT
The neural network is a foundational building block of machine learning: it provides much of the basis for
a machine to act like a human. A machine must be able to take in many categories of data from the analog
world, but most analog-world data are not linearly separable. This non-linearity poses a problem, because a
simple neural network classifies a dataset linearly; that is, it can only handle problems that are linearly
separable. A neural network therefore needs a way to deal with non-linearity. In this work, we test the
linearity characteristic of a neural network using the OR and AND operation datasets, which are linearly
separable. We then discuss the non-linearity problem using the XOR dataset. Finally, we solve this
non-linearity problem and demonstrate the solution using MATLAB.
KEYWORDS
Neural Network, Linearity, Perceptron, Back propagation algorithm, XOR, MATLAB
1. Introduction
An artificial neural network tries to mimic the neural network of the human brain. The brain's neural
network consists of many neurons; similarly, an artificial neural network consists of many artificial
neurons. As a result, it can produce results similar to those of the brain's neural network.
The basic model of a neuron consists of a number of synaptic inputs, each associated with a synaptic
weight; a summing junction that produces the sum of the products of the synaptic weights and inputs; and an
activation function that limits the output of the neuron. A basic model of a neuron is shown below:
A neural network takes a problem and tries to generalize it into classes. This generalization into classes
is the linear approach of a neural network: it tries to draw one or more straight boundary lines that
separate the dataset into classes based on similar features. For example, the AND and OR operations have
the datasets below:
A B   A AND B
0 0   0
0 1   0
1 0   0
1 1   1
Fig: AND operation
A B   A OR B
0 0   0
0 1   1
1 0   1
1 1   1
Fig: OR operation
With these datasets, a neural network will try to classify the outputs into two classes by producing a
linear boundary line. One side of the boundary contains all the zeros for AND (all the ones for OR); the
other side contains only the single 1 for AND (the single 0 for OR). For the classification of such
datasets, a single-layer perceptron is sufficient.
But in the case of XOR, where the classes are not linearly separable, a single perceptron cannot produce a
linear classification. The dataset for XOR is shown below:
A B   A XOR B
0 0   0
0 1   1
1 0   1
1 1   0
In this case, a multilayer perceptron is needed. We will see how a multilayer perceptron can solve
this problem in later sections.
2. Perceptron
A perceptron is the simplest form of a neural network, used for the classification of a special type of
dataset said to be linearly separable. A perceptron is shown below:
Fig: perceptron
In the case of an elementary perceptron, there are two decision regions separated by a hyperplane defined
by the equation below:

$$\sum_{i=1}^{l} w_{ki} x_i - \theta_k = 0$$

where $w_{ki}$ are the synaptic weights, $x_i$ are the inputs, and $\theta_k$ is the threshold value. For
example, a single-layer perceptron can classify the OR and AND datasets linearly, because these datasets
are linearly separable.
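As an illustration, the following MATLAB sketch applies the decision rule above to the AND dataset using hand-picked weights and threshold (assumed values, not learned by any training procedure):

X = [0 0; 0 1; 1 0; 1 1];     % input patterns, one per row
w = [1; 1];                   % synaptic weights (assumed)
theta = 1.5;                  % threshold (assumed)
y = (X * w - theta) >= 0;     % points on one side of the hyperplane belong to class 1
disp([X double(y)])           % last column reproduces A AND B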
Fig: OR dataset in the (x1, x2) plane
Fig: AND dataset in the (x1, x2) plane
But it cannot classify problems that are not linearly separable, such as the XOR dataset.
Fig: XOR dataset in the (x1, x2) plane
As we can see, the XOR dataset is not linearly separable. To solve this problem we need a multilayer
perceptron. In the next section we discuss the multilayer perceptron and how it solves this problem using
the back-propagation algorithm.
3. Multilayer perceptron
A multilayer perceptron has one input layer, one or more hidden layers, and one output layer. A multilayer
perceptron is shown below:
Fig: XOR
In the forward pass, an activity pattern is applied to the sensory nodes of the network and its effect
propagates through the network layer by layer. Finally, a set of outputs is produced as the actual response
of the network. During the forward pass, the synaptic weights of the network are all fixed.
During the backward pass, on the other hand, the synaptic weights are all adjusted in accordance with an
error-correction rule. Specifically, the actual response of the network is subtracted from a desired
response to produce an error signal. This error signal is then propagated backward through the network,
against the direction of the synaptic connections; hence the name error back-propagation. The synaptic
weights are adjusted to make the actual response of the network move closer to the desired response in a
statistical sense.
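A minimal sketch of one forward pass and the resulting error signal for a small 2-2-1 network is given below; all numeric values (weights, biases, the input pattern, and the desired response) are illustrative assumptions:

x  = [1; 0];                        % input pattern
d  = 1;                             % desired response
W1 = [0.5 -0.3; 0.2 0.8];           % input-to-hidden weights (assumed)
b1 = [0.1; -0.2];                   % hidden biases (assumed)
w2 = [0.7; -0.4];                   % hidden-to-output weights (assumed)
b2 = 0.05;                          % output bias (assumed)
sig = @(v) 1 ./ (1 + exp(-v));      % sigmoid activation
y1 = sig(W1 * x + b1);              % hidden-layer outputs (forward pass)
y2 = sig(w2' * y1 + b2);            % actual response of the network
e  = d - y2;                        % error signal used during the backward pass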
We define the instantaneous value of the error energy for neuron j as $\frac{1}{2} e_j^2(n)$.
Correspondingly, the instantaneous value E(n) of the total error energy is obtained by summing
$\frac{1}{2} e_j^2(n)$ over all neurons in the output layer; these are the only visible neurons for which
error signals can be calculated directly. We may thus write

$$E(n) = \frac{1}{2} \sum_{j \in C} e_j^2(n)$$

where C is the set of all neurons in the output layer.
The instantaneous error energy E(n), and therefore the average error energy $E_{av}$, is a function of all
the free parameters of the network. For a given training set, $E_{av}$ represents the cost function as a
measure of learning performance; the objective of the learning process is to adjust the free parameters of
the network so as to minimize $E_{av}$. For this we consider a simple method of training in which the
weights are updated on a pattern-by-pattern basis until one epoch, that is, one complete presentation of
the entire training set, has been dealt with. The adjustments to the weights are made in accordance with
the respective errors computed for each pattern presented to the network.
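For reference, assuming the standard definition over a training set consisting of N examples, the average error energy is the arithmetic mean of the instantaneous error energy over the whole set:

$$E_{av} = \frac{1}{N} \sum_{n=1}^{N} E(n)$$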
The induced local field $v_j(n)$ produced at the input of the activation function associated with neuron j
is therefore

$$v_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n)$$

where m is the total number of inputs applied to neuron j. The synaptic weight $w_{j0}$ equals the bias
$b_j$ applied to neuron j. Hence the function signal $y_j(n)$ appearing at the output of neuron j at
iteration n is

$$y_j(n) = \varphi_j(v_j(n))$$
The back-propagation algorithm applies a correction $\Delta w_{ji}(n)$ to the synaptic weight $w_{ji}(n)$,
which is proportional to the partial derivative $\partial E(n) / \partial w_{ji}(n)$. According to the
chain rule of calculus, we may express this gradient as

$$\frac{\partial E(n)}{\partial w_{ji}(n)} = \frac{\partial E(n)}{\partial e_j(n)}
\frac{\partial e_j(n)}{\partial y_j(n)} \frac{\partial y_j(n)}{\partial v_j(n)}
\frac{\partial v_j(n)}{\partial w_{ji}(n)}$$

The partial derivative $\partial E(n) / \partial w_{ji}(n)$ represents a sensitivity factor, determining
the direction of search in weight space for the synaptic weight $w_{ji}$.

Now, let us calculate the factors of the partial derivative $\partial E(n) / \partial w_{ji}(n)$ in the
equation above:

$$\frac{\partial E(n)}{\partial e_j(n)} = e_j(n), \qquad
\frac{\partial e_j(n)}{\partial y_j(n)} = -1, \qquad
\frac{\partial y_j(n)}{\partial v_j(n)} = \varphi_j'(v_j(n)), \qquad
\frac{\partial v_j(n)}{\partial w_{ji}(n)} = y_i(n)$$

Thus, the partial derivative $\partial E(n) / \partial w_{ji}(n)$ becomes

$$\frac{\partial E(n)}{\partial w_{ji}(n)} = -e_j(n)\,\varphi_j'(v_j(n))\, y_i(n)$$

The correction $\Delta w_{ji}(n)$ applied to $w_{ji}(n)$ is defined by the delta rule

$$\Delta w_{ji}(n) = -\eta \frac{\partial E(n)}{\partial w_{ji}(n)}$$

where $\eta$ is the learning-rate parameter. Accordingly,

$$\Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n)$$

where the local gradient $\delta_j(n)$ is

$$\delta_j(n) = -\frac{\partial E(n)}{\partial v_j(n)} = e_j(n)\,\varphi_j'(v_j(n))$$

The local gradient points to the required changes in the synaptic weights.
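The delta rule above translates directly into a few lines of MATLAB. The sketch below assumes a single output neuron with a logistic sigmoid activation (so that $\varphi'(v) = y(1-y)$) and uses illustrative values for the inputs, weights, desired response, and learning rate:

eta  = 0.5;                          % learning-rate parameter (assumed)
y_i  = [1; 0.6; 0.3];                % inputs to the neuron; y_i(1) = +1 carries the bias
w_ji = [0.1; 0.4; -0.2];             % current synaptic weights (assumed)
d_j  = 1;                            % desired response
v_j  = w_ji' * y_i;                  % induced local field
y_j  = 1 / (1 + exp(-v_j));          % neuron output (logistic sigmoid)
e_j  = d_j - y_j;                    % error signal
delta_j = e_j * y_j * (1 - y_j);     % local gradient: e_j * phi'(v_j)
w_ji = w_ji + eta * delta_j * y_i;   % delta-rule weight correction

The complete back-propagation procedure is summarized in the following steps.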
1. Initialization: Assuming that no prior information is available, pick the synaptic weights
and thresholds from a uniform distribution whose mean is zero and whose variance is
chosen to make the standard deviation of the induced local fields of the neurons lie at the
transition between the linear and saturated parts of the sigmoid activation function.
2. Presentation of training examples: Present the network with an epoch of training examples. For each
example in the set, ordered in some fashion, perform the sequence of forward and backward computations
described under points 3 and 4, respectively.
3. Forward computation: The induced local field $v_j^{(l)}(n)$ for neuron j in layer l is

$$v_j^{(l)}(n) = \sum_i w_{ji}^{(l)}(n)\, y_i^{(l-1)}(n)$$

where $y_i^{(l-1)}(n)$ is the output signal of neuron i in the previous layer $(l-1)$ at iteration n, and
$w_{ji}^{(l)}(n)$ is the synaptic weight of neuron j in layer l that is fed from neuron i in layer
$(l-1)$. For $i = 0$, we have $y_0^{(l-1)}(n) = +1$, and $w_{j0}^{(l)}(n) = b_j^{(l)}(n)$ is the bias
applied to neuron j in layer l. Assuming the use of a sigmoid function, the output signal of neuron j in
layer l is

$$y_j^{(l)}(n) = \varphi_j(v_j^{(l)}(n))$$

If neuron j is in the first hidden layer, set

$$y_j^{(0)}(n) = x_j(n)$$

where $x_j(n)$ is the jth element of the input vector x(n). If neuron j is in the output layer, set

$$y_j^{(L)}(n) = o_j(n)$$

and compute the error signal

$$e_j(n) = d_j(n) - o_j(n)$$

where $d_j(n)$ is the jth element of the desired response vector d(n).

4. Backward computation: Adjust the synaptic weights of the network in layer l according to the
generalized delta rule

$$w_{ji}^{(l)}(n+1) = w_{ji}^{(l)}(n) + \alpha\left[w_{ji}^{(l)}(n-1)\right]
+ \eta\, \delta_j^{(l)}(n)\, y_i^{(l-1)}(n)$$

where $\eta$ is the learning-rate parameter and $\alpha$ is the momentum constant.
5. Iteration: Iterate the forward and backward computations under points 3 and 4 by
presenting new epochs of training examples to the network until the stopping criterion is
met.
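Putting the five steps together, a from-scratch training loop for the XOR dataset might look like the sketch below. It assumes a 2-2-1 architecture with logistic sigmoid neurons, an arbitrarily chosen learning rate and epoch count, and pattern-by-pattern updates without momentum; it is an illustration of the procedure only, not the code used in the MATLAB demonstration later on.

X   = [0 0; 0 1; 1 0; 1 1]';          % inputs, one pattern per column
D   = [0 1 1 0];                      % desired XOR responses
eta = 0.5;                            % learning rate (assumed)
rng(1);                               % reproducible random initialization
W1 = rand(2,2) - 0.5;  b1 = rand(2,1) - 0.5;   % hidden layer: 2 neurons
W2 = rand(1,2) - 0.5;  b2 = rand(1,1) - 0.5;   % output layer: 1 neuron
sig = @(v) 1 ./ (1 + exp(-v));        % logistic sigmoid
for epoch = 1:10000
    for n = 1:4
        x = X(:,n);  d = D(n);
        y1 = sig(W1*x + b1);          % forward pass: hidden layer
        y2 = sig(W2*y1 + b2);         % forward pass: output layer
        e  = d - y2;                  % error signal
        delta2 = e .* y2 .* (1 - y2);              % output local gradient
        delta1 = (W2' * delta2) .* y1 .* (1 - y1); % hidden local gradients
        W2 = W2 + eta * delta2 * y1';  b2 = b2 + eta * delta2;
        W1 = W1 + eta * delta1 * x';   b1 = b1 + eta * delta1;
    end
end
disp(sig(W2*sig(W1*X + b1) + b2))     % outputs should approach [0 1 1 0] if training converges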
Fig: Signal flow graph of the network for solving XOR problem
The first hidden neuron has the synaptic weights

$$w_{11} = w_{12} = +1$$

and bias

$$b_1 = -\frac{3}{2}$$

The second hidden neuron has bias

$$b_2 = -\frac{1}{2}$$

The output neuron has the synaptic weights

$$w_{31} = -2, \qquad w_{32} = +1$$

and bias

$$b_3 = -\frac{1}{2}$$
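These fixed weights can be checked directly in MATLAB. The sketch below assumes hard-limit (McCulloch-Pitts) neurons that output 1 when the induced local field is non-negative, and assumes the second hidden neuron shares the same input weights (+1, +1) as the first, as in the standard construction:

X = [0 0; 0 1; 1 0; 1 1]';                % inputs, one pattern per column
hlim = @(v) double(v >= 0);               % hard-limit activation
y1 = hlim([1 1] * X - 3/2);               % hidden neuron 1: fires only for (1,1), i.e. AND
y2 = hlim([1 1] * X - 1/2);               % hidden neuron 2: fires when either input is 1, i.e. OR
y3 = hlim(-2*y1 + 1*y2 - 1/2);            % output neuron combines them into XOR
disp([X; y3])                             % last row is A XOR B: 0 1 1 0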
3.3 MATLAB Demonstration
In the MATLAB demonstration we will test linearity for the AND and OR datasets with a perceptron. We will
also test non-linearity for the XOR dataset with a perceptron. Later, we will see how a multilayer
perceptron can solve this non-linearity problem for the XOR dataset. We will use confusion plots for all of
these purposes.
3.3.1 AND dataset test for a single perceptron with no hidden layer
MATLAB code for the AND dataset is given below:
clc;
close all;
x = [0 0; 1 1; 0 1; 1 0];      % input patterns, one per row
i = x';                        % inputs as columns: (0,0) (1,1) (0,1) (1,0)
t = [0 1 0 0];                 % AND targets for the column ordering above
net = perceptron;
view(net);
net = train(net, i, t);
y = net(i);
plotconfusion(t, y);
The confusion plot is shown below:
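The corresponding OR test follows exactly the same pattern; only the target vector changes. A minimal sketch (whose confusion plot is not shown here) would be:

clc;
close all;
x = [0 0; 1 1; 0 1; 1 0];
i = x';                       % inputs as columns: (0,0) (1,1) (0,1) (1,0)
t = [0 1 1 1];                % OR targets for the column ordering above
net = perceptron;
net = train(net, i, t);
y = net(i);
plotconfusion(t, y);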
3.3.2 XOR dataset test for a single perceptron with no hidden layer
MATLAB code for the XOR dataset is given below:
clc;
close all;
x = [0 0; 1 1; 0 1; 1 0];      % input patterns, one per row
i = x';                        % inputs as columns: (0,0) (1,1) (0,1) (1,0)
t = [0 0 1 1];                 % XOR targets for the column ordering above
net = perceptron;
view(net);
net = train(net, i, t);
y = net(i);
plotconfusion(t, y);
The confusion plot is shown below:
As we can see from the confusion plot, the perceptron fails to classify the XOR dataset correctly for all
of the target outputs. So a single perceptron with no hidden layer cannot solve the XOR problem. Now let us
see whether a perceptron with one hidden layer can solve this problem.
3.3.3 XOR dataset test for a perceptron with a hidden layer and back-propagation training algorithm
MATLAB code for the XOR dataset is given below:
clc;
close all;
x = [0 0; 1 1; 0 1; 1 0];             % input patterns, one per row
i = x';                               % inputs as columns: (0,0) (1,1) (0,1) (1,0)
t = [0 0 1 1];                        % XOR targets for the column ordering above
net = feedforwardnet(2, 'trainrp');   % one hidden layer with two neurons, resilient back-propagation
view(net);
net = train(net, i, t);
y = net(i);
plot(y, t);                           % network outputs against targets
plotconfusion(t, y);
The confusion plot is shown below:
As we can see, we obtain a correct classification of the XOR dataset, thus solving the problem that a
perceptron with no hidden layer could not.
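As a quick sanity check (a sketch; the exact numeric outputs depend on the random initialization of the network), the continuous outputs of the trained network can be thresholded and compared with the XOR targets:

y = net(i);                   % continuous outputs of the trained network
disp([t; y]);                 % targets above the raw outputs
disp(round(y));               % thresholded outputs; should match t = [0 0 1 1] when training succeeds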
4. Conclusion
We have shown that a single perceptron with no hidden layer cannot classify the XOR dataset linearly. We
have also shown that this problem can be solved by a perceptron with a single hidden layer trained with the
back-propagation algorithm.