Intelligent Information Processing With Matlab - Xiu Zhang
Xin Zhang
Tianjin Normal University, Tianjin, China
Wei Wang
Tianjin Normal University, Tianjin, China
This work is subject to copyright. All rights are solely and exclusively
licensed by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in
any other physical way, and transmission or information storage and
retrieval, electronic adaptation, computer software, or by similar or
dissimilar methodology now known or hereafter developed.
The publisher, the authors, and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
Xiu Zhang
Xin Zhang
Wei Wang
Contents
1 Artificial Neural Network
1.1 Artificial Neuron
1.2 Overview of Artificial Neural Network
1.3 Backpropagation Neural Network
1.4 Hopfield Neural Network
1.5 Competitive Neural Network
1.6 Deep Neural Network
References
2 Convolutional Neural Network
2.1 Overview of Convolutional Neural Network
2.2 Neural Network Performance Evaluation
2.3 Transfer Learning with Convolutional Neural Network
2.4 Research Progress of Neural Network
References
3 Fuzzy Computing
3.1 Overview of Fuzzy Computing
3.2 Fuzzy Sets
3.3 Fuzzy Pattern Recognition
3.4 Fuzzy Clustering
3.5 Fuzzy Inference
3.6 Fuzzy Control System
3.7 Fuzzy Logic Designer
References
4 Fuzzy Neural Network
4.1 Overview of Fuzzy Neural Network
4.2 Adaptive Fuzzy Neural Inference System
4.3 Time Series Prediction
4.4 Interval Type-2 Fuzzy Logic
4.5 Fuzzy C-means Clustering
4.6 Suburban Commuting Prediction Problem
4.7 Research Progress of Fuzzy Computing
References
5 Evolutionary Computing
5.1 Overview of Evolutionary Computing
5.2 Simple Genetic Algorithm
5.3 Genetic Algorithm for Travelling Salesman Problem
5.4 Ant Colony Optimization Algorithm
5.5 Particle Swarm Optimization Algorithm
5.6 Differential Evolution Algorithm
References
6 Testing and Evaluation of Evolutionary Computing
6.1 Test Set of Traveling Salesman Problem
6.2 Test Set of Continuous Optimization Problem
6.3 Evaluation of Continuous Optimization Problems
6.4 Artificial Bee Colony Algorithm
6.5 Fireworks Algorithm
6.6 Research Progress of Evolutionary Computing
References
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
X. Zhang et al., Intelligent Information Processing with Matlab
https://doi.org/10.1007/978-981-99-6449-9_1
Xin Zhang
Wei Wang
Abstract
Artificial neural networks are the core of deep learning algorithms and at the forefront of artificial intelligence. They are inspired by the neurons of the human brain and mimic the way biological neurons transmit signals to each other, which enables them to learn from experience. This chapter introduces the artificial neuron, the perceptron and the basic model of the artificial neural network. It then introduces the backpropagation neural network, the Hopfield neural network and the competitive neural network, and finally the deep neural network. Five examples are given to show the working principle of artificial neural networks, and the programs implementing them are provided to help readers better understand the models.
(2) An artificial neuron generally combines all of its inputs by weighted summation, as shown in Fig. 1.2, where each input xi is assigned a weight wi.
(3) An artificial neuron typically also has a bias input, as shown in Fig. 1.3. The bias is modeled as an extra input x0 fed to the neuron with weight w0, so that the bias term is w0x0; in general, x0 = −1. The bias is also called a deviation, and it is sometimes denoted by the symbol b, i.e., b = w0x0. Here w0x0 is used to denote the bias because it makes the neuron easy to express in matrix form.
Fig. 1.3 The input bias of artificial neuron
After the above steps, the net input of the artificial neuron can be computed by:
$$\mathrm{net} = \sum_{i=1}^{n} w_i x_i + w_0 x_0 = \sum_{i=0}^{n} w_i x_i \qquad (1.1)$$
As can be seen from Fig. 1.5, the output of the first artificial neuron is:
(1.10)
(1.11)
It can be seen from (1.10) and (1.11) that the output formulas of all neurons have the same form. For convenience, the subscripts are often omitted and the output is expressed in vector form. The output of the j-th neuron is:
(1.12)
where X is the input vector transmitted to all neurons, Wj is the weight vector from the inputs to the j-th artificial neuron, and yj is the output of the j-th artificial neuron.
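A minimal Matlab sketch of the weighted sum with bias and the vector form of the neuron output. It assumes a hard-limit (step) activation that outputs 1 when the net input is non-negative; the inputs and weights are made-up values for illustration. The bias enters as an extra input x0 = −1, so each weight vector Wj has n + 1 entries.

```matlab
% Inputs x1..x3 of one sample (illustrative values)
x = [0.5; 1.0; -0.2];
X = [-1; x];                 % prepend the bias input x0 = -1

% Each row is the weight vector Wj = [w0j, w1j, w2j, w3j] of neuron j
W = [0.2,  0.4, 0.1, -0.3;   % neuron 1
     1.0, -0.5, 0.3,  0.8];  % neuron 2

netInput = W * X;            % net input of every neuron: sum of wij*xi for i = 0..n
f = @(v) double(v >= 0);     % assumed hard-limit activation
y = f(netInput)              % output yj of each neuron
```

With these numbers the net inputs are 0.16 and −1.11, so the first neuron fires and the second does not.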
Generally, an artificial neural network also contains a hidden layer, which is likewise composed of artificial neurons, as shown in Fig. 1.6.
Only the winning neuron is allowed to adjust its weight vector, which is updated by:
(1.18)
For normalized vectors, a larger dot product indicates that two vectors are closer to each other. The adjustment therefore moves the weight vector of the winning neuron closer to the current input X, so that the next time an input pattern similar to X appears, the neuron that won last time is more likely to win again. In this way, the weight vector of each neuron in the competitive layer is gradually moved to a cluster center of the input sample space.
Sometimes a winning neighborhood centered on the winning neuron is defined. In that case, besides the winning neuron, the other neurons in the neighborhood also adjust their weights to varying degrees. The weights are generally initialized to arbitrary values and then normalized.
Next, we introduce the perceptron, which is a kind of feedforward neural network. The perceptron is a layered neural network that simulates how the environmental information received by human vision is transmitted by nerve impulses. Some common feedforward neural networks, such as adaptive linear neural networks, backpropagation neural networks and radial basis function neural networks, have the structure of a perceptron. The single-layer perceptron is simple in structure and function, but it has inherent limitations; these were overcome by the multi-layer perceptron network and its corresponding learning rules.
A single-layer perceptron is a feedforward network with one layer of neurons and a threshold activation function. It has no feedback connections or intra-layer connections and only one output node. The single-layer perceptron network model is shown in Fig. 1.4.
In the single-layer perceptron, the net input can be obtained as:
(1.19)
(1.20)
Example 1.1 Suppose the input data has 4 sample points, namely (0, 0), (0, 1), (1, 0) and (1, 1), whose target outputs are 0, 1, 1 and 1, respectively (the logical OR function). Since the output takes only the values 0 and 1, this is a binary classification problem. The problem is simulated in Matlab.
The specific programs are as follows:
x = [0 0 1 1; 0 1 0 1];               % four input samples (columns)
t = [0 1 1 1];                        % target outputs
net = perceptron('hardlim', 'learnp');
net = configure(net, x, t);
net.IW{1,1} = [-1.5 -0.5];            % arbitrary initial weights
net.b{1} = 1;                         % arbitrary initial bias
figure(1);                            % decision boundary before training
plotpv(x, t);
hold on; box on; grid on;
plotpc(net.IW{1,1}, net.b{1});
xlabel('x1'); ylabel('x2'); title('');
hold off;
net = train(net, x, t);
view(net)
y = net(x);                           % outputs of the trained perceptron
figure(2);                            % decision boundary after training
plotpv(x, t);
hold on; box on; grid on;
plotpc(net.IW{1,1}, net.b{1})
xlabel('x1'); ylabel('x2'); title('');
hold off;
As can be seen from Fig. 1.9, with the arbitrarily set initial weights the single-layer perceptron cannot correctly classify the input samples. As can be seen from Fig. 1.10, after training the single-layer perceptron classifies all input samples correctly.
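The training that `train` performs with the `learnp` rule can be sketched without the toolbox. This is a sketch of the same idea, not a copy of the toolbox internals: it assumes the standard perceptron rule (ΔW = e·xᵀ, Δb = e, with error e = t − y) and the `hardlim` convention (output 1 when the net input is ≥ 0), and starts from the same initial weights as Example 1.1.

```matlab
x = [0 0 1 1; 0 1 0 1];      % four sample points (columns)
t = [0 1 1 1];               % target outputs (logical OR)
W = [-1.5 -0.5];             % initial weights from Example 1.1
b = 1;                       % initial bias from Example 1.1
hardlimFcn = @(n) double(n >= 0);

maxEpochs = 100;
for epoch = 1:maxEpochs
    nErrors = 0;
    for k = 1:size(x, 2)
        y = hardlimFcn(W * x(:, k) + b);  % output for sample k
        e = t(k) - y;                     % error
        if e ~= 0
            W = W + e * x(:, k)';         % perceptron weight update
            b = b + e;                    % bias update
            nErrors = nErrors + 1;
        end
    end
    if nErrors == 0                       % every sample classified correctly
        break;
    end
end
y = hardlimFcn(W * x + b)                 % outputs after training
```

Because the OR data are linearly separable, the perceptron convergence theorem guarantees that the loop stops with every sample classified correctly.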
The model of the multi-layer perceptron (MLP) network is shown in Fig. 1.11. Besides the output layer, a multi-layer perceptron also has one or more intermediate layers, called hidden layers.
Fig. 1.11 Multi-layer perceptron network model
(1.23)
where y is calculated from the weights W of the neural network and the samples X of the training set. X is known, while the weights W are unknown and must be determined by learning. This means that the independent variable of (1.23) is actually the weights W, so training amounts to solving an optimization problem. As mentioned before, machine learning suffers from overfitting, and regularization can alleviate this problem to some extent. Regularization adds a penalty on the weights to the loss function:
(1.24)
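To see what a weight penalty does, the sketch below uses a tiny linear model and assumes a squared-error loss plus an L2 penalty λ‖W‖². This illustrates the idea and is not necessarily the exact form of (1.24), which is not reproduced here. Setting the gradient to zero gives the closed-form minimizer W = (XᵀX + λI)⁻¹Xᵀy, and a larger λ shrinks the weights.

```matlab
% Tiny least-squares problem: 3 samples (rows), 2 weights (illustrative data)
X = [1 0; 0 1; 1 1];
y = [1; 2; 3];

% Minimizer of ||X*W - y||^2 + lambda*||W||^2 (assumed L2-regularized loss)
ridge = @(lambda) (X' * X + lambda * eye(2)) \ (X' * y);

W0 = ridge(0);     % no regularization
W1 = ridge(1);     % with penalty lambda = 1
fprintf('lambda = 0: W = [%.4f %.4f], ||W|| = %.4f\n', W0, norm(W0));
fprintf('lambda = 1: W = [%.4f %.4f], ||W|| = %.4f\n', W1, norm(W1));
```

With these numbers the penalty shrinks W from [1, 2] to [0.875, 1.375], trading a slightly larger training error for smaller weights.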
Example 1.2 The data set for this example comes from the UCI machine learning repository and concerns monitoring of coronavirus disease (COVID-19). It is a supervised learning problem. The data set includes 14 samples, each with 7 attributes, and has 3 class labels. A backpropagation (BP) neural network is built in Matlab; after training, all samples are predicted and the prediction results are output.
This problem is simulated in Matlab, and the programs are as
follows:
P = [ 1 1 1 1 1 -1 -1
1 1 -1 1 1 -1 -1
1 1 1 1 -1 1 -1
1 1 -1 1 -1 1 -1
1 -1 -1 -1 -1 -1 1
1 1 1 -1 -1 -1 1
1 1 -1 -1 -1 -1 1
1 1 1 1 -1 -1 -1
1 -1 -1 1 1 -1 -1
-1 1 -1 1 1 -1 -1
1 -1 -1 1 -1 1 -1
-1 1 -1 1 -1 1 -1
-1 1 -1 -1 -1 -1 1
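-1 -1 -1 -1 -1 -1 1]';        % transpose: 7 attributes x 14 samples
T = [1 1 1 1 1 1 1 1 2 2 2 2 2 3];    % class label of each sample
hiddenLayerSize = [10, 10];           % two hidden layers with 10 neurons each
net = feedforwardnet(hiddenLayerSize);
net.numLayers                         % displays 3: two hidden layers and the output layer
net.layers{1}.transferFcn = 'tansig';
net.layers{2}.transferFcn = 'logsig';
net.trainFcn = 'traingd';             % gradient descent backpropagation
net.trainParam.goal = 0.01;           % performance (MSE) goal
net.trainParam.lr = 0.1;              % learning rate
net.trainParam.showWindow = false;    % do not open the training window
[net, tr] = train(net, P, T);
o = sim(net, P);
o = round(o);                         % round the outputs to the nearest label
[T; o]                                % true labels (top) and predictions (bottom)
figure1 = figure(1);
axes1 = axes('Parent',figure1);
hold(axes1,'on');
box(axes1,'on');
grid(axes1,'on');
plot(T, 'd', 'MarkerSize',10,'LineWidth',2,'LineStyle','none');
plot(o, '*', 'MarkerSize',10,'LineWidth',2,'LineStyle','none');
hold(axes1,'off');
set(axes1,'FontSize',14);
print('Fig', '-dpng', '-r600')        % save the figure as Fig.png at 600 dpi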
The running result of this example is shown in Fig. 1.13, where the diamonds are the true outputs and the asterisks are the outputs predicted by the BP neural network. As can be seen from the figure, the trained BP neural network correctly predicts the outputs of all 14 samples in the data set. Because the weights are initialized randomly, independent runs of the program may not give the same results, and sometimes the predictions are inaccurate.
Fig. 1.13 Problem of COVID-19 prediction solved by BP neural network
The input of a discrete Hopfield neural network (DHNN) is the initial state of the network, denoted as X(0) = [x1(0), x2(0), …, xn(0)]T. The output of DHNN is the set of states of all neurons, denoted by X = [x1, x2, …, xn]T. The connection weight from artificial neuron xi to xj is denoted as wij, that is, the output of xi is fed back to neuron xj as input. Each neuron has a threshold bj. Under external excitation, DHNN starts a dynamic evolution process from the initial state, and the state of each neuron keeps changing. DHNN usually uses the sign function as the activation function, and the net input of artificial neuron xj is:
(1.25)
(1.26)
In general, DHNN has wii = 0 and wij = wji. When DHNN reaches stability, the state of each neuron no longer changes, and this steady state is the output of DHNN. If the network output of DHNN at moment t is denoted as X(t), the output of DHNN in the steady state is $\lim_{t \to \infty} X(t)$.
DHNN works in two modes: asynchronous and synchronous. The asynchronous mode is a serial mode in which only one neuron at a time adjusts its state according to (1.26), while the states of the other neurons remain unchanged. The neurons can be adjusted in a prescribed order or selected randomly. The synchronous mode is a parallel mode in which all neurons adjust their states simultaneously while the DHNN is running.
A DHNN can store a number of predetermined stable states (the patterns to be memorized). When it runs, an initial state X(0) is applied to the network, and the network feeds its output back as the next input. After several iterations, under certain preconditions, DHNN finally settles at one of the preset stable points. X(0) is called the initial activation vector of DHNN; it only drives the network at the start. In the subsequent iterations the whole network is self-excited, and X(0) is replaced by the fed-back vector as the next input.
DHNN can be regarded as a discrete nonlinear dynamic system, which may have a stable state, a finite ring state or a chaotic state. First, starting from the initial state X(0), if after a finite number of recursions the state no longer changes, i.e., X(t + 1) = X(t), the network is said to be stable, or to have a stable state. If DHNN is stable, it can converge from any initial state to a stable state. Second, if DHNN is unstable, then, since the state of each node is binary (only 1 or −1), the network cannot diverge to infinity; it can only oscillate in a self-sustained way between 1 and −1. Such a network is called a finite ring network, or is said to have a finite ring state (a limit cycle). Finally, if the state of a network changes within a definite range but neither repeats nor stops, i.e., it takes infinitely many different values while its trajectory does not diverge to infinity, this phenomenon is called chaos. For DHNN, the state of each node is binary, so the number of possible network states is finite and chaos cannot occur. In other words, DHNN has no chaotic state.
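The difference between a stable state and a finite ring state can be seen on a two-neuron DHNN. The sketch assumes zero thresholds, symmetric weights w12 = w21 = 1 with w11 = w22 = 0, and a sign activation that maps a net input of 0 to +1; these values are chosen only for illustration. In synchronous mode the state oscillates between two states, while asynchronous updating from the same initial state reaches a stable state.

```matlab
W = [0 1; 1 0];                 % symmetric weights, zero self-connections
sgn = @(v) 2 * (v >= 0) - 1;    % sign activation (0 mapped to +1)
x0 = [1; -1];                   % initial state X(0)

% Synchronous mode: all neurons update at once
x = x0;
syncStates = x;
for t = 1:4
    x = sgn(W * x);
    syncStates(:, end + 1) = x;
end
disp(syncStates)                % alternates between [1;-1] and [-1;1]

% Asynchronous mode: neurons update one at a time in a fixed order
x = x0;
for sweep = 1:10
    changed = false;
    for j = 1:2
        newState = sgn(W(j, :) * x);
        if newState ~= x(j)
            x(j) = newState;
            changed = true;
        end
    end
    if ~changed
        break;                  % no neuron changed: stable state reached
    end
end
disp(x')                        % stable state [-1 -1]
```

This weight matrix has eigenvalues +1 and −1, so it is not non-negative definite, which is why the synchronous run is free to oscillate.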
If DHNN has a stable state, it can realize the associative memory function. When the topology and weight matrix of the network are given, the Hopfield neural network can store several preset stable states, and which one the network reaches after running depends on the initial state. If the stable states are used to represent memory patterns, the convergence from the initial state to a stable state can be regarded as the network searching for a memory pattern. The initial state contains part of the information of a memory pattern, and the subsequent evolution of the network recalls the complete information from this partial information, thus realizing the associative memory function.
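Associative recall can be sketched as follows. This part of the section does not cover how the weights are designed, so the sketch assumes the common outer-product (Hebbian) rule W = Σ p pᵀ with the diagonal set to zero, zero thresholds, and asynchronous updating. The two stored patterns are illustrative. Starting from a copy of the first pattern with one bit flipped, the network evolves back to the stored pattern.

```matlab
p1 = [ 1  1  1  1 -1 -1 -1 -1]';   % stored pattern 1
p2 = [ 1 -1  1 -1  1 -1  1 -1]';   % stored pattern 2 (orthogonal to p1)
n = numel(p1);

W = p1 * p1' + p2 * p2';           % assumed outer-product (Hebbian) weights
W(1:n+1:end) = 0;                  % wii = 0 (no self-feedback); W is symmetric
sgn = @(v) 2 * (v >= 0) - 1;

x = p1;
x(1) = -x(1);                      % initial state X(0): p1 with one bit flipped

for sweep = 1:20                   % asynchronous updating until stable
    changed = false;
    for j = 1:n
        newState = sgn(W(j, :) * x);
        if newState ~= x(j)
            x(j) = newState;
            changed = true;
        end
    end
    if ~changed
        break;
    end
end
recalled = isequal(x, p1)          % true: the stored pattern is recovered
```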
The concepts of attractor and energy function are introduced next. If X is the state of a network when it reaches stability, X is called an attractor of the network, also known as an equilibrium point. If the attractor is regarded as the solution of an optimization problem, the evolution from the initial state to the attractor is the computational process of finding the optimal solution.
Definition 1.3 If some states X are weakly attracted to Xa, the set of such X is called the weak attraction domain of Xa; if some states X are strongly attracted to Xa, the set of such X is called the strong attraction domain of Xa.
Starting from any state in the attraction domain of an attractor, a network always evolves into that attractor. Therefore, when designing the network, the attraction domains should be made as large as possible to enhance the associative memory function.
Theorems 1.1 and 1.2 show that, whichever mode is used to adjust the state of the network, DHNN converges to an attractor as long as certain conditions are satisfied; that is, DHNN is stable.
If a DHNN is stable, how can its steady state, which is a rather general concept, be quantified? The energy function solves this problem. The more stable a system is, the lower its energy, that is, the smaller the value of its energy function. The minimum of the energy function corresponds to the stable state of the system, so the energy function turns the problem of finding an attractor into the problem of finding the minimum of a function. Generally speaking, the energy function of a network is defined as follows:
(1.27)
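The energy argument can be checked numerically. Because (1.27) is not reproduced here, the sketch assumes the commonly used form E = −½ XᵀWX + Xᵀb for a net input Σi wij xi − bj, together with symmetric weights, wii = 0 and asynchronous updating. The random weights and thresholds are illustrative. Under these assumptions a single-neuron update changes the energy by −Δxj · netj, which is never positive, so the recorded energies never increase.

```matlab
n = 10;
A = randn(n);
W = (A + A') / 2;                  % symmetric weights, wij = wji
W(1:n+1:end) = 0;                  % wii = 0
b = randn(n, 1);                   % thresholds bj
sgn = @(v) 2 * (v >= 0) - 1;
energy = @(x) -0.5 * (x' * W * x) + x' * b;   % assumed energy function

x = 2 * (rand(n, 1) > 0.5) - 1;    % random initial state of +1/-1 values
E = energy(x);
for sweep = 1:50                   % asynchronous updating
    changed = false;
    for j = 1:n
        newState = sgn(W(j, :) * x - b(j));   % net input minus threshold
        if newState ~= x(j)
            x(j) = newState;
            changed = true;
        end
        E(end + 1) = energy(x);    % record the energy after each step
    end
    if ~changed
        break;                     % stable state: an attractor is reached
    end
end
fprintf('energy: %.3f -> %.3f after %d steps\n', E(1), E(end), numel(E) - 1);
```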
As can be seen from Fig. 1.15, if the initial state of DHNN is near the
upper left, it converges to the upper left attractor. If DHNN starts near
the bottom right, it converges to the bottom right attractor. This is the
associative memory function of DHNN.
(1.28)
By expanding the distance in Eq. (1.28) above and using the properties of unit vectors, it can be simplified as:
(1.29)
(1.30)
As can be seen from (1.29) and (1.30), to minimize the Euclidean distance between the two unit vectors, it suffices to maximize their dot product:
(1.31)
Note that the dot product of the weight vector and the input vector is exactly the net input of a competitive layer neuron. In other words, the winning neuron is the one with the largest net input.
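For unit vectors, ‖X − Wj‖² = ‖X‖² + ‖Wj‖² − 2WjᵀX = 2 − 2WjᵀX, so the neuron with the largest net input is also the one closest to the input. The sketch below checks this numerically; the random input and the three weight vectors are illustrative.

```matlab
X = randn(2, 1);  X = X / norm(X);          % normalized input
W = randn(3, 2);                            % rows: weight vectors of 3 neurons
W = W ./ sqrt(sum(W.^2, 2));                % normalize each row

netInput = W * X;                           % dot products = net inputs
dist = sqrt(sum((W - X').^2, 2));           % Euclidean distances to X

[~, byDot] = max(netInput);
[~, byDist] = min(dist);
fprintf('winner by max dot product: %d, by min distance: %d\n', byDot, byDist);
```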
Step (3) Output and weight adjustment. In this learning rule, the
output of the winning neuron is 1, and the output of the remaining
neurons is 0, as follows:
(1.32)
It can be seen that only the winning neuron adjusts its weight vector, and the adjusted weight vector is:
(1.33)
The weights of the losing neurons are not adjusted, which is equivalent to the "victor" neuron j* applying lateral inhibition to them and preventing them from being excited.
The adjusted weight vector is not necessarily a unit vector, so it must be normalized again. In other words, after Step (3) (output and weight adjustment) is completed, training returns to Step (1) (vector normalization) and continues until the learning rate μ(t) decays to 0.
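Putting the steps together gives a compact winner-take-all training loop. The sketch assumes the update W_{j*} ← W_{j*} + μ(t)(X − W_{j*}) for the winner, a learning rate that decays linearly toward 0, and synthetic 2-D inputs scattered around three directions on the unit circle. It illustrates the procedure described above and is not the implementation of the toolbox function `competlayer`.

```matlab
% Synthetic 2-D inputs around three directions (illustrative data)
angles = [0.3 + 0.1*randn(1, 30), 2.4 + 0.1*randn(1, 30), 4.4 + 0.1*randn(1, 30)];
X = [cos(angles); sin(angles)];             % columns are unit-length inputs

W = randn(3, 2);                            % random initial weight vectors (rows)
W = W ./ sqrt(sum(W.^2, 2));                % Step (1): normalize

nEpochs = 50;
mu0 = 0.5;
for epoch = 1:nEpochs
    mu = mu0 * (1 - (epoch - 1) / nEpochs); % learning rate decaying toward 0
    for k = randperm(size(X, 2))
        x = X(:, k);                        % inputs are already normalized
        [~, jStar] = max(W * x);            % Step (2): winner = largest net input
        W(jStar, :) = W(jStar, :) + mu * (x' - W(jStar, :));  % Step (3): move winner toward x
        W(jStar, :) = W(jStar, :) / norm(W(jStar, :));        % re-normalize (back to Step (1))
    end
end
disp(atan2(W(:, 2), W(:, 1))')             % directions (radians) learned by the 3 neurons
```

With an unlucky initialization, one neuron may never win and another may sit between two clusters. Such "dead" neurons are a known weakness of plain competitive learning.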
Next, we introduce the principle of competitive learning. As shown in Fig. 1.17, assume that the input patterns of a problem are two-dimensional vectors; after normalization they can be regarded as points on the unit circle, represented by "O". Assume that the competitive learning neural network has three neurons, whose three instar (inner star) weight vectors are also distributed on the unit circle after normalization and are represented by gray squares. Fig. 1.17 shows that the input pattern points can be grouped into three clusters, that is, divided into three categories. In the initial state, the instar vectors of the neurons in the competitive layer are randomly distributed, so how does the competitive learning neural network classify the input patterns?
Example 1.4 The data set of this example comes from the UCI machine learning repository and concerns the classification of iris flowers. The data set includes 150 samples, each with 4 attributes, divided into 3 categories with 50 samples each. The three types of iris are Setosa, Versicolour and Virginica, and the four attributes are sepal length, sepal width, petal length and petal width. The data set is labeled (a supervised learning problem), but the competitive learning neural network written in Matlab below is trained without the labels, which are used only to check the results. After training, all samples are predicted and the type of each sample is output.
This problem is simulated in Matlab, and the programs are as
follows:
[inputs, outputs] = iris_dataset;     % 4x150 attributes, 3x150 one-hot labels
outputs = vec2ind(outputs);           % true class index of each sample
net = competlayer(3);                 % competitive layer with 3 neurons
net = configure(net, inputs);
net.trainParam.epochs = 50;
net = train(net, inputs);             % unsupervised training: no targets
y = net(inputs);
y = vec2ind(y);                       % index of the winning neuron
figure(2); hold on;
plot(outputs, 'd', 'MarkerSize',10,'LineWidth',2,'LineStyle','none');
plot(y, '*', 'MarkerSize',10,'LineWidth',2,'LineStyle','none');
box on; grid on; hold off;
The running results of the above program are shown in Fig. 1.20, where the true category of each sample is represented by a diamond and the predicted category by an asterisk. As can be seen from the figure, the competitive learning neural network assigns most sample points to the correct categories, while about a dozen samples are assigned incorrectly. Note that the neuron indices produced by a competitive layer are arbitrary, so in general they match the class labels only up to a permutation.
Fig. 1.20 Competitive learning: clustering results of iris data
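$$f(x) = \max(0, x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases} \qquad (1.34)$$
As can be seen from (1.34), when x > 0, the value of the ReLU activation function is x; when x ≤ 0, its value is 0. Hence the value of the activation function is always non-negative. The derivative of the ReLU activation function is (its value at x = 0 is taken as 0 by convention):
$$f'(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases} \qquad (1.35)$$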
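A small sketch of ReLU and its derivative as anonymous functions. ReLU is not differentiable at x = 0, so taking the derivative there as 0 is a convention assumed here.

```matlab
relu  = @(x) max(0, x);        % ReLU activation
drelu = @(x) double(x > 0);    % derivative, with the value at x = 0 taken as 0

x = [-2 -0.5 0 0.5 2];
disp([x; relu(x); drelu(x)])   % rows: x, ReLU(x), derivative
```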
The expression of the cross-entropy loss function is:
(1.36)
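The symbols of (1.36) are not reproduced here. The sketch assumes the usual categorical form L = −Σk tk log yk, where t is a one-hot target vector and y is a softmax output, averaged over the samples. The raw scores are illustrative.

```matlab
z = [2.0 1.0 0.1;                   % raw scores, one row per sample (illustrative)
     0.5 2.5 0.3];
T = [1 0 0;                         % one-hot targets
     0 1 0];

Y = exp(z - max(z, [], 2));         % softmax, shifted for numerical stability
Y = Y ./ sum(Y, 2);

lossPerSample = -sum(T .* log(Y), 2);   % cross-entropy of each sample
loss = mean(lossPerSample)
```

Only the log-probability of the true class contributes, so the loss is small when the network assigns that class a probability close to 1.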
The sample size of this example is small; in fact, a deep neural network is not needed, and a shallow neural network can also solve the problem. This is just a teaching case showing the difference between shallow and deep neural networks; with too many samples, training would take a long time, which is inconvenient for presentation.
The problem is simulated in Matlab, and the shallow neural network
is used to solve the problem. The programs are as follows:
rng(0);                                         % fix the random seed
filename = "transmissionCasingData.csv";
tbl = readtable(filename,'TextType','string');
labelName = "GearToothCondition";
tbl = convertvars(tbl,labelName,'categorical');
classNames = categories(tbl{:,labelName});
categoricalInputNames = ["SensorCondition" "ShaftCondition"];
tbl = convertvars(tbl,categoricalInputNames,'categorical');
% Replace each categorical predictor by its one-hot encoded columns
for i = 1:numel(categoricalInputNames)
    name = categoricalInputNames(i);
    oh = onehotencode(tbl(:,name));
    tbl = addvars(tbl,oh,'After',name);
    tbl(:,name) = [];
end
tbl = splitvars(tbl);
inputs = (table2array(tbl(:, 1:(end-1))))';    % predictors, one column per sample
outputs = (double(tbl{:,labelName}))';         % class index 1 or 2
hiddenLayerSize = 20;
net = feedforwardnet(hiddenLayerSize);
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
net.trainFcn = 'traingdm';                     % gradient descent with momentum
net.trainParam.epochs = 1000;
[net, tr] = train(net, inputs, outputs);
tstInd = tr.testInd;                           % indices of the test samples
YPred = net(inputs(:, tstInd));
YPred(YPred<1.5) = 1;                          % threshold the regression output
YPred(YPred>=1.5) = 2;
tstOutputs = outputs(tstInd);
accuracy = sum(YPred == tstOutputs)/numel(tstOutputs);
figure(1);
confusionchart(tstOutputs,YPred);
After running the above program, the accuracy of the shallow neural network on the test set is 90.32%. The program also draws the confusion matrix, which is omitted here. The training process of the shallow neural network is shown in Fig. 1.21. As can be seen from the figure, there are 20 neurons in the hidden layer, and the network passes over the training data 1000 times. Each complete pass over the training data is called an epoch, also known as a generation; that is, each training sample has been used 1000 times. As shown in Fig. 1.21, the training time of this network is very short, only 1 s.
Fig. 1.21 Training process of shallow neural network
Next, the problem is simulated in Matlab and the deep neural
network is used to solve the problem. The programs are as follows:
rng(0);                                         % fix the random seed
filename = "transmissionCasingData.csv";
tbl = readtable(filename,'TextType','string');
labelName = "GearToothCondition";
tbl = convertvars(tbl,labelName,'categorical');
classNames = categories(tbl{:,labelName});
categoricalInputNames = ["SensorCondition" "ShaftCondition"];
tbl = convertvars(tbl,categoricalInputNames,'categorical');
% Replace each categorical predictor by its one-hot encoded columns
for i = 1:numel(categoricalInputNames)
    name = categoricalInputNames(i);
    oh = onehotencode(tbl(:,name));
    tbl = addvars(tbl,oh,'After',name);
    tbl(:,name) = [];                           % remove the original categorical column
end
tbl = splitvars(tbl);
% Split the data into 70% training, 15% validation and 15% test samples
numObservations = size(tbl,1);
numObservationsTrain = floor(0.7*numObservations);
numObservationsValidation = floor(0.15*numObservations);
numObservationsTest = numObservations - numObservationsTrain - ...
    numObservationsValidation;
idx = randperm(numObservations);
idxTrain = idx(1:numObservationsTrain);
idxValidation = idx(numObservationsTrain + 1: ...
    numObservationsTrain + numObservationsValidation);
idxTest = idx(numObservationsTrain + numObservationsValidation + 1:end);
tblTrain = tbl(idxTrain,:);
tblValidation = tbl(idxValidation,:);
tblTest = tbl(idxTest,:);
numFeatures = size(tbl,2) - 1;                  % all columns except the label
numClasses = numel(classNames);
layers = [featureInputLayer(numFeatures,'Normalization','zscore')
    fullyConnectedLayer(20)
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
miniBatchSize = 8;
options = trainingOptions('adam', ...
    'MaxEpochs',30, ...
    'MiniBatchSize',miniBatchSize, ...
    'Shuffle','every-epoch', ...
    'ValidationData',tblValidation, ...
    'Plots','training-progress', ...
    'Verbose',false);
net = trainNetwork(tblTrain,labelName,layers,options);
YPred = classify(net,tblTest(:,1:end-1),'MiniBatchSize',miniBatchSize);
YTest = tblTest{:,labelName};
accuracy = sum(YPred == YTest)/numel(YTest);
figure(1);
cm = confusionchart(YTest,YPred);
After running the above program, the accuracy of the deep neural network on the test set is 93.75%. The program also draws the confusion matrix, which is omitted here. The training process of this deep neural network is shown in Fig. 1.22. As can be seen from the figure, the deep neural network is trained on a single CPU, and the accuracy on the validation set is 93.55%, which is close to that on the test set, indicating that there is no overfitting. The network is trained for 30 epochs on the training data, far fewer than the 1000 epochs of the shallow neural network. As shown in Fig. 1.22, the training time of this network is 14 s, indicating that the deep neural network takes longer to train than the shallow neural network. Moreover, each epoch of this deep neural network consists of 18 iterations. This is because deep neural networks generally use mini-batch training: the training set is divided into mini-batches (here of size 8), and each mini-batch gives one iteration.
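The split sizes and the number of iterations per epoch follow from the program above. The sketch assumes that the CSV file contains 208 observations (check this with `height(tbl)` on your copy). It also assumes that, as described for the 'MiniBatchSize' option of `trainingOptions`, training data that does not fill a complete final mini-batch is skipped in each epoch.

```matlab
numObservations = 208;       % assumed number of rows in transmissionCasingData.csv
miniBatchSize = 8;

numObservationsTrain = floor(0.7 * numObservations);
numObservationsValidation = floor(0.15 * numObservations);
numObservationsTest = numObservations - numObservationsTrain - numObservationsValidation;

% Only complete mini-batches are used in each epoch
iterationsPerEpoch = floor(numObservationsTrain / miniBatchSize);
fprintf('train %d, validation %d, test %d, iterations per epoch %d\n', ...
    numObservationsTrain, numObservationsValidation, numObservationsTest, iterationsPerEpoch);
```

Under this assumption the test set has 32 samples, so 93.75% corresponds to 30 correct predictions. Likewise, the 31 validation samples are consistent with 93.55% ≈ 29/31.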
Fig. 1.22 Training process of deep neural network
After designing the network, you can import an existing data set in the "Data" panel and train the deep neural network in the "Training" panel; these operations are not described in detail here. Note that some features of the Deep Network Designer App are still incomplete and new features are added every year, so the available functionality depends on the Matlab version.
Exercises
(1) Try to write three neuron activation functions and draw the
corresponding curves.
(2) Try to draw a schematic diagram of the neuron model, write the mathematical model of the neuron, and explain the meaning of each variable in the model.
(3) Try to write the process of the backpropagation neural network algorithm and explain its advantages and disadvantages.
References
1. Zhou ZH (2021) Machine learning. Springer, Singapore. https://doi.org/10.1007/978-981-
15-1967-3
3. Chen K, Zhang X, Zhang X (2022) Identifying important attributes for secondary school
student performance prediction. In: Liang Q, Wang W, Mu J, Liu X, Na Z (eds) 3rd Artificial
intelligence in China, ChangBaiShan, July 2021. Lecture Notes in Electrical Engineering, vol
854. Springer, Singapore, pp 151–158. https://doi.org/10.1007/978-981-16-9423-3_19
5. Aggarwal CC (2018) Neural networks and deep learning—a textbook. Springer, Cham.
https://doi.org/10.1007/978-3-319-94463-0
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
X. Zhang et al., Intelligent Information Processing with Matlab
https://doi.org/10.1007/978-981-99-6449-9_2
Xin Zhang
Wei Wang
Abstract
Convolutional neural network is one of the most important networks in deep learning. Unlike common artificial neural networks, its main characteristic is the convolution operation. It has made remarkable achievements in computer vision and natural language processing and has received extensive attention from industry and academia. This chapter first introduces the convolution operation of the convolutional neural network, followed by performance evaluation metrics. Then, based on two typical convolutional neural networks, transfer learning is demonstrated, i.e., using a trained convolutional neural network to solve new computer vision problems. Finally, state-of-the-art research progress in artificial neural networks is presented.
Pooling 2 \ Conv 3 |  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
                 1 |  1  0  0  0  1  1  1  0  0  1  1  1  1  0  1  1
                 2 |  1  1  0  0  0  1  1  1  0  0  1  1  1  1  0  1
                 3 |  1  1  1  0  0  0  1  1  1  0  0  1  0  1  1  1
                 4 |  0  1  1  1  0  0  1  1  1  1  0  0  1  0  1  1
                 5 |  0  0  1  1  1  0  0  1  1  1  1  0  1  1  0  1
                 6 |  0  0  0  1  1  1  0  0  1  1  1  1  0  1  1  1
In Table 2.1, the numbers in the first row index the 16 convolutional kernels of convolutional layer 3, while the numbers in the first column index the 6 convolutional kernels (feature maps) of pooling layer 2. An entry of 1 indicates that the corresponding kernel of pooling layer 2 is connected to the corresponding kernel of convolutional layer 3, while an entry of 0 indicates that there is no connection between the two convolutional kernels.
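The connection table can be turned into parameter counts. The sketch below encodes Table 2.1 as a 0/1 matrix. It assumes that each connection uses one 5 × 5 kernel and that each feature map of convolutional layer 3 has one bias, then counts the trainable parameters and the connections to the 10 × 10 output maps.

```matlab
C = [1 0 0 0 1 1 1 0 0 1 1 1 1 0 1 1
     1 1 0 0 0 1 1 1 0 0 1 1 1 1 0 1
     1 1 1 0 0 0 1 1 1 0 0 1 0 1 1 1
     0 1 1 1 0 0 1 1 1 1 0 0 1 0 1 1
     0 0 1 1 1 0 0 1 1 1 1 0 1 1 0 1
     0 0 0 1 1 1 0 0 1 1 1 1 0 1 1 1];   % rows: pooling layer 2, columns: conv layer 3

inputsPerMap = sum(C, 1);                 % pooling-layer-2 maps feeding each conv-3 map
params = sum(inputsPerMap * 5 * 5 + 1);   % kernel weights plus one bias per map
connections = params * 10 * 10;           % every parameter is used at each output pixel
fprintf('trainable parameters: %d, connections: %d\n', params, connections);
```

Six maps see 3 inputs, nine see 4 and one sees all 6. That gives 60 × 25 + 16 = 1516 parameters, far fewer than the 16 × (6 × 25 + 1) = 2416 a full connection would need.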
The fourth layer of LeNet is a pooling layer, denoted as pooling layer 4. The feature maps input to pooling layer 4 are 10 × 10 pixels. It samples 2 × 2 pixel regions, the same setup as pooling layer 2, so each output feature map is 5 × 5 pixels. Applying this sampling to all 16 feature maps gives 16 feature maps of 5 × 5 pixels, i.e., 5 × 5 × 16 = 400 neurons. The number of connections from convolutional layer 3 to pooling layer 4 is (2 × 2 + 1) × 400 = 2000.
The fifth layer of LeNet is a convolutional layer, called convolutional layer 5. It uses 120 convolutional kernels, each of size 5 × 5 pixels with a stride of 1. Therefore, each kernel produces an image of (5 − 5 + 1) × (5 − 5 + 1) = 1 × 1 pixel, and the 120 kernels produce 120 feature maps of 1 pixel. Convolutional layer 5 thus has 120 neurons, each connected to all 16 feature maps of pooling layer 4. The number of connections from pooling layer 4 to convolutional layer 5 is (5 × 5 × 16 + 1) × 120 = 48,120. Since the kernels of convolutional layer 5 have the same size as the feature maps of pooling layer 4 and are connected to all of them, convolutional layer 5 is equivalent to a fully connected layer.
The sixth layer of LeNet is a fully connected layer, denoted as fully connected layer 6. It has 84 neurons, and the number of connections from the 120 neurons of convolutional layer 5 to the 84 neurons of this layer is (120 + 1) × 84 = 10,164. This layer uses the Sigmoid activation function.
The seventh layer of LeNet is the output layer, denoted as output layer 7, which is also a fully connected layer. Output layer 7 has 10 neurons; using one-hot encoding, these 10 neurons represent the digit categories 0 to 9. This layer uses radial basis functions as activation functions.
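The feature-map sizes and connection counts above follow from two formulas. A k × k window with stride s on an m × m input gives an output of side (m − k)/s + 1. Each output neuron has (number of inputs + 1) connections, counting the bias. The sketch reproduces the numbers for layers 3 to 6, assuming the 14 × 14 output of pooling layer 2 as the input of convolutional layer 3.

```matlab
outSize = @(m, k, s) (m - k) / s + 1;     % side length of the output feature map

s3 = outSize(14, 5, 1);                   % conv layer 3: 14 -> 10
s4 = outSize(s3, 2, 2);                   % pooling layer 4: 10 -> 5
s5 = outSize(s4, 5, 1);                   % conv layer 5: 5 -> 1

conn4 = (2*2 + 1) * (s4 * s4 * 16);       % pooling layer 4: 400 neurons
conn5 = (5*5*16 + 1) * 120;               % conv layer 5: connected to all 16 maps
conn6 = (120 + 1) * 84;                   % fully connected layer 6
fprintf('sizes: %d, %d, %d; connections: %d, %d, %d\n', s3, s4, s5, conn4, conn5, conn6);
```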
In Matlab, you can use the Deep Network Designer App to design
LeNet, as shown in Fig. 2.3. Export the designed network and select
“Generate Code” to generate the corresponding programs, as follows:
Fig. 2.3 Designing LeNet neural network
layers = [
    imageInputLayer([32 32 1],"Name","imageinput")
    convolution2dLayer([5 5],6,"Name","conv1")
    maxPooling2dLayer([2 2],"Name","maxpool2","Padding","same","Stride",[2 2])
    convolution2dLayer([5 5],16,"Name","conv3")
    maxPooling2dLayer([2 2],"Name","maxpool4","Padding","same","Stride",[2 2])
    convolution2dLayer([5 5],120,"Name","conv5")
    fullyConnectedLayer(84,"Name","fc6")
    fullyConnectedLayer(10,"Name","fc7")
    softmaxLayer("Name","softmax")
    classificationLayer("Name","classoutput")];
plot(layerGraph(layers));
For multi-class classification problems, “softmaxLayer” and
“classificationLayer” are generally required, but they are usually not
counted as layers of the network. As mentioned above, the input of
LeNet is a 32 × 32 pixel grayscale image, whereas MNIST images are
28 × 28 pixel grayscale images. There are two ways to resolve this
mismatch. One is to adjust the network structure of LeNet, such as the
size of the convolutional kernels and the way layers are connected; the
other is to resize the images. Here we use the second method, enlarging
the images from 28 × 28 pixels to 32 × 32 pixels.
After resizing, LeNet can be used to solve the MNIST classification
problem. The following layer can be used to resize the images:
resize3dLayer("Name","resize3d-output-size", ...
    "GeometricTransformMode","half-pixel","Method","nearest", ...
    "NearestRoundingMode","round","OutputSize",[32 32 1])
Place this layer between the input layer and convolutional layer 1.
The programs to train and test LeNet model are as follows:
% XTrain (28x28x1xN images), YTrain (categorical labels), XTest and YTest
% are assumed to hold the MNIST data, loaded beforehand.
layers = [
    imageInputLayer([28 28 1],"Name","imageinput")
    resize3dLayer("Name","resize3d-output-size", ...
        "GeometricTransformMode","half-pixel","Method","nearest", ...
        "NearestRoundingMode","round","OutputSize",[32 32 1])
    convolution2dLayer([5 5],6,"Name","conv1")
    maxPooling2dLayer([2 2],"Name","maxpool2","Padding","same","Stride",[2 2])
    convolution2dLayer([5 5],16,"Name","conv3")
    maxPooling2dLayer([2 2],"Name","maxpool4","Padding","same","Stride",[2 2])
    convolution2dLayer([5 5],120,"Name","conv5")
    fullyConnectedLayer(84,"Name","fc6")
    fullyConnectedLayer(10,"Name","fc7")
    softmaxLayer("Name","softmax")
    classificationLayer("Name","classoutput")];
options = trainingOptions('sgdm', ...
    'MaxEpochs',10, ...
    'MiniBatchSize',128, ...
    'Plots','training-progress');
trainNet = trainNetwork(XTrain,YTrain,layers,options);
save('MNIST_LeNet5.mat','trainNet');
YPred = classify(trainNet, XTest);
accuracy = sum(YPred == YTest)/numel(YPred);
To save space, the training process is not shown. The accuracy on the
test set is 98.42%, which shows that LeNet is able to solve the MNIST
classification problem.
Note that when Matlab is used to solve classification problems, the
classify function is generally used to make predictions, whereas for
regression problems the predict function is generally used.
We continue to use the LeNet model from the previous section; its
network structure can be analyzed with the following program:
analyzeNetwork(layers);
Analyzing the network structure yields Fig. 2.5, which shows the size
of the output feature map of each layer and the number of weight
parameters to be learned in each layer. For LeNet, the input layer, the
resize layer, the pooling layers, the Softmax layer and the output layer
have no weight parameters, so they are marked with “-” in the figure.
Fig. 2.5 Analyzing the LeNet network
The training and testing programs for the LeNet network are as
follows:
% imdsTrain, imdsValid and imdsTest are assumed to be imageDatastore
% objects holding the training, validation and test images of Digits.
layers = [
    imageInputLayer([28 28 1],"Name","imageinput")
    resize3dLayer("Name","resize3d-output-size", ...
        "GeometricTransformMode","half-pixel","Method","nearest", ...
        "NearestRoundingMode","round","OutputSize",[32 32 1])
    batchNormalizationLayer
    convolution2dLayer([5 5],6,"Name","conv1")
    maxPooling2dLayer([2 2],"Name","maxpool2","Padding","same","Stride",[2 2])
    convolution2dLayer([5 5],16,"Name","conv3")
    maxPooling2dLayer([2 2],"Name","maxpool4","Padding","same","Stride",[2 2])
    convolution2dLayer([5 5],120,"Name","conv5")
    fullyConnectedLayer(84,"Name","fc6")
    fullyConnectedLayer(10,"Name","fc7")
    softmaxLayer("Name","softmax")
    classificationLayer("Name","classoutput")];
options = trainingOptions('sgdm', ...
    'MaxEpochs',10, ...
    'Shuffle','every-epoch', ...
    'MiniBatchSize',128, ...
    'ValidationData',imdsValid, ...
    'ValidationFrequency',10, ...
    'Verbose',false, ...
    'Plots','training-progress');
trainNet = trainNetwork(imdsTrain,layers,options);
save('Digits_LeNet5.mat','trainNet');
YPred = classify(trainNet, imdsTest);
YTest = imdsTest.Labels;
accuracy = sum(YPred == YTest)/numel(YPred);
The results of the above program are shown in Fig. 2.6. Training was
completed in 78 s on a CPU, which is relatively fast. The accuracy on the
validation set is 95.85% and the accuracy on the test set is 96.65%,
which shows that the LeNet network is able to solve the classification
problem of the Digits dataset.
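$$\mathrm{ACC}=\frac{TP+TN}{TP+TN+FP+FN} \tag{2.1}$$
$$P=\frac{TP}{TP+FP} \tag{2.2}$$
$$R=\frac{TP}{TP+FN} \tag{2.3}$$
$$F_{\beta}=\frac{(1+\beta^{2})\,P\,R}{\beta^{2}P+R} \tag{2.4}$$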
where β is the parameter that balances the precision rate and the recall
rate. When β = 2, recall is weighted more heavily than precision, and
the F-score focuses on recall; when β = 0.5, precision is weighted more
heavily than recall, and the F-score focuses on precision; and when
β = 1, precision and recall are weighted equally, and the F-score is
called the F1-score. The F1-score is also a common metric for
evaluating the performance of a model.
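The effect of β can be seen numerically. The sketch below assumes the standard form F_β = (1 + β²)PR/(β²P + R) and uses hypothetical values P = 1 and R = 0.5, so that precision is high and recall is low.

```matlab
% F-beta score from precision P and recall R (standard definition)
fbeta = @(P, R, beta) (1 + beta^2) .* P .* R ./ (beta^2 .* P + R);

P = 1.0;                      % hypothetical precision
R = 0.5;                      % hypothetical recall
F05 = fbeta(P, R, 0.5);       % emphasizes precision
F1  = fbeta(P, R, 1);         % harmonic mean of P and R
F2  = fbeta(P, R, 2);         % emphasizes recall
fprintf('F0.5 = %.4f, F1 = %.4f, F2 = %.4f\n', F05, F1, F2);
```

Because precision is the larger of the two here, F0.5 is the highest of the three scores and F2 the lowest.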
We can also calculate the true positive rate, whose expression is:
$$\mathrm{TPR}=\frac{TP}{TP+FN} \tag{2.5}$$
where TPR stands for True Positive Rate. Clearly, TPR is identical to the
recall rate. Correspondingly, the false positive rate is:
$$\mathrm{FPR}=\frac{FP}{FP+TN} \tag{2.6}$$
where FPR denotes the False Positive Rate. TPR and FPR can be
calculated from the first and second rows in Fig. 2.7. By plotting TPR
on the vertical axis against FPR on the horizontal axis as the decision
threshold of the classifier varies, we obtain a curve called the Receiver
Operating Characteristic (ROC) curve. Since both TPR and FPR lie
between 0 and 1, the ROC curve lies in [0, 1] × [0, 1]. The area enclosed
by the ROC curve and the horizontal axis is called the Area Under
Curve (AUC). Since AUC is a single number, it can quantitatively
describe the performance of a classification model.
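To make the construction concrete, the following sketch lowers the decision threshold one sample at a time over six hypothetical scores, records TPR = TP/(TP + FN) and FPR = FP/(FP + TN) at each step, and integrates the resulting ROC curve with trapz. It assumes there are no tied scores (ties would have to be grouped into a single step). The programs in this chapter use perfcurve instead, which also returns the AUC.

```matlab
% Toy binary problem: true labels (1 = positive) and classifier scores, no ties
labels = [1 1 0 1 0 0];
scores = [0.9 0.8 0.7 0.6 0.55 0.4];

[~, order] = sort(scores, 'descend');   % threshold passes one sample at a time
sortedLabels = labels(order);
numPos = sum(labels == 1);
numNeg = sum(labels == 0);

% ROC points, starting from (0, 0) when no sample is predicted positive
TPR = [0, cumsum(double(sortedLabels == 1)) / numPos];
FPR = [0, cumsum(double(sortedLabels == 0)) / numNeg];
AUC = trapz(FPR, TPR);                  % trapezoidal area under the ROC curve

figure;
plot(FPR, TPR, '-o');
xlabel('FPR'); ylabel('TPR');
title(sprintf('ROC curve (AUC = %.4f)', AUC));
```

For these scores, 8 of the 9 positive/negative pairs are ranked correctly, so the area equals 8/9.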
For binary classification problems, ACC, P, R, F1-score and AUC are
available evaluation metrics. For multi-class classification problems,
the above formulae can be generalized by treating each class in turn as
the positive class, computing P, R and AUC for it, and averaging over
the classes. These evaluation metrics lie between 0 and 1, and the closer
they are to 1, the better the classification model performs. The values
of ACC, P, R, F1-score and AUC can also be expressed as percentages.
Let us take the LeNet network as an example to solve the handwritten
postal code recognition problem, which is a ten-class classification
problem. The programs to calculate the evaluation metrics on the test
set are as follows:
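% Class scores are needed for the AUC, so classify is called with two outputs
[YPred, YPredScores] = classify(trainNet, imdsTest);
YTest = imdsTest.Labels;
M = confusionmat(YTest, YPred);        % rows: true class, columns: predicted class
ACC = sum(diag(M)) / sum(M(:));
P1 = diag(M)./(sum(M,1) + 0.0001)';    % precision of each class
R1 = diag(M)./(sum(M,2) + 0.0001);     % recall of each class
P = mean(P1);
R = mean(R1);
F1score = 2*P*R/(P + R);
fig = figure;
cm = confusionchart(YTest,YPred,'RowSummary', ...
    'row-normalized','ColumnSummary','column-normalized');
CLASSES = unique(YTest);
AUCRF = zeros(1, numel(CLASSES));
for i1 = 1:length(CLASSES)
    % compute AUC for class i1 (one class versus the rest)
    [XRF,YRF,TRF,AUCRF(i1)] = perfcurve(YTest, ...
        YPredScores(:, i1),CLASSES(i1));
end
AUC = mean(AUCRF);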
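The program relies on the confusionmat layout, in which rows are true classes and columns are predicted classes, so precision divides by column sums and recall by row sums. The sketch below checks this on a small hypothetical 3-class confusion matrix. The 0.0001 guard against empty rows or columns is omitted here because no row or column sums to zero.

```matlab
% Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class
M = [5 1 0;
     2 6 2;
     0 1 8];

ACC = sum(diag(M)) / sum(M(:));      % correctly classified / all samples
P1 = diag(M) ./ sum(M, 1)';          % precision: correct / predicted as that class
R1 = diag(M) ./ sum(M, 2);           % recall: correct / actually in that class
P = mean(P1);                        % macro-averaged precision
R = mean(R1);                        % macro-averaged recall
F1score = 2 * P * R / (P + R);
fprintf('ACC = %.4f, P = %.4f, R = %.4f, F1 = %.4f\n', ACC, P, R, F1score);
```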
After running the above program, the result in Fig. 2.8 is obtained,
with all metric values expressed as percentages. The overall ACC on the
ten categories is 97.6%, the average precision over the ten categories is
97.65%, the average recall is 97.6%, and the average AUC is 99.99%. All
four metrics are close to 100%, which shows that LeNet performs very
well on the Digits dataset.
Fig. 2.8 Confusion matrix of the LeNet network on Digits dataset
Fig. 2.8 also gives the precision and recall of each category. The
precision for digit 9 is 93.5%, and the precision for every other digit is
above 95%; the recall for digits 3 and 8 is 92.5% and 94.5%,
respectively, and the recall for every other digit is above 95%. Thus,
although the average precision and recall of LeNet on the test set are
above 95%, its precision or recall for some digits can be below 95%,
which indicates that there is still room for improvement in recognizing
specific digits.
Label Amount
caesar_salad 26
caprese_salad 15
french_fries 181
greek_salad 24
hamburger 238
hot_dog 31
pizza 299
sashimi 40
sushi 124
The food example image dataset is a small dataset, and the number of
images differs from category to category. From Table 2.2, the category
with the fewest samples is “caprese_salad” with only 15 images, and
the category with the most samples is “pizza” with 299 images. The
sample ratio of these two categories is about 1:20, so this dataset can be
considered an imbalanced dataset.
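The imbalance ratio quoted above follows directly from the counts in Table 2.2; a short check:

```matlab
% Image counts per category, taken from Table 2.2
labels = ["caesar_salad" "caprese_salad" "french_fries" "greek_salad" ...
          "hamburger" "hot_dog" "pizza" "sashimi" "sushi"];
amount = [26 15 181 24 238 31 299 40 124];

total = sum(amount);                 % total number of images
[minCount, iMin] = min(amount);      % smallest category
[maxCount, iMax] = max(amount);      % largest category
ratio = maxCount / minCount;         % about 19.9, i.e. roughly 1:20
fprintf('%s: %d, %s: %d, ratio 1:%.1f, total %d\n', ...
    labels(iMin), minCount, labels(iMax), maxCount, ratio, total);
```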
AlexNet is a convolutional neural network proposed in 2012 by Alex
Krizhevsky, Ilya Sutskever and Geoffrey Hinton. It won first place in the
2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
AlexNet has 8 layers: the first 5 are convolutional layers and the last 3
are fully connected layers. Without going into the details of its exact
structure and computational process, the programs to solve the
problem using the pre-trained AlexNet are as follows:
% downloadExampleFoodImagesData is not built into Matlab; it is assumed to be
% the helper function supplied with the MathWorks food-image example code.
dataDir = fullfile("ExampleFoodImageDataset");
url = "https://www.mathworks.com/supportfiles/nnet/data/ExampleFoodImageDataset.zip";
if ~exist(dataDir, "dir")
    mkdir(dataDir);
    downloadExampleFoodImagesData(url,dataDir);
end
imds = imageDatastore('ExampleFoodImageDataset', ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
labelCount = countEachLabel(imds);
img1 = readimage(imds,1);
size(img1)
% 20% of each class for testing; 25% of the rest (20% overall) for validation
[imdsTest,imdsTrain] = splitEachLabel(imds,0.2,'randomize');
[imdsValid,imdsTrain] = splitEachLabel(imdsTrain,0.25,'randomize');
labelCountTest = countEachLabel(imdsTest);
labelCountTrain = countEachLabel(imdsTrain);
labelCountValid = countEachLabel(imdsValid);
numTrainImages = numel(imdsTrain.Labels);
idx = randperm(numTrainImages,9);
I = imtile(imdsTrain, 'Frames', idx);   % show 9 random training images
figure;
imshow(I);
net = alexnet;                          % requires the AlexNet support package
inputSize = net.Layers(1).InputSize;
analyzeNetwork(net);
numClasses = numel(categories(imdsTrain.Labels));
layersTransfer = net.Layers(1:end-3);   % drop the last fully connected, softmax and output layers
layers = [
    layersTransfer
    fullyConnectedLayer(numClasses,'WeightLearnRateFactor', ...
        10,'BiasLearnRateFactor',10)
    softmaxLayer
    classificationLayer];
lgraph = layerGraph(layers);
aug = imageDataAugmenter("RandXReflection", true, ...
    "RandYReflection", true, ...
    "RandXScale", [0.8 1.2], ...
    "RandYScale", [0.8 1.2]);
augImdsTrain = augmentedImageDatastore(inputSize(1:2), imdsTrain, ...
    'DataAugmentation', aug);
augImdsVal = augmentedImageDatastore(inputSize(1:2), imdsValid);
opts = trainingOptions("adam", ...
    "InitialLearnRate", 1e-4, ...
    "MaxEpochs", 10, ...
    "ValidationData", augImdsVal, ...
    "Verbose", false, ...
    "Plots", "training-progress", ...
    "ExecutionEnvironment","cpu", ...
    "MiniBatchSize",128);
netTransfer = trainNetwork(augImdsTrain, lgraph, opts);
save('Food_AlexNet.mat','netTransfer', ...
    'imdsTrain','imdsValid','imdsTest');
augImdsTest = augmentedImageDatastore(inputSize(1:2), imdsTest);
[YPred, YPredScores] = classify(netTransfer, augImdsTest);
YTest = imdsTest.Labels;
M = confusionmat(YTest, YPred);
ACC = sum(diag(M)) / sum(M(:));
P1 = diag(M)./(sum(M,1) + 0.0001)';
R1 = diag(M)./(sum(M,2) + 0.0001);
P = mean(P1); % mean precision of all classes
R = mean(R1); % mean recall of all classes
F1score = 2*P*R/(P + R);
fig = figure;
cm = confusionchart(YTest,YPred,'RowSummary', ...
    'row-normalized','ColumnSummary','column-normalized');
CLASSES = unique(YTest);
AUCRF = zeros(1, numel(CLASSES));
for i1 = 1:length(CLASSES)
    % compute AUC for class i1
    [XRF,YRF,TRF,AUCRF(i1)] = perfcurve(YTest, ...
        YPredScores(:, i1),CLASSES(i1));
end
AUC = mean(AUCRF);
The results of the above program are shown in Figs. 2.9, 2.10 and 2.11.
Figure 2.9 shows the structural analysis of AlexNet: the model contains
more than 60 million parameters to be learned. Note that the figure
shows AlexNet as consisting of 25 layers. This is because Matlab treats
each step from input to output (for example, activation, normalization
and pooling) as a separate layer, so the number of layers shown is larger
than the 8 layers introduced earlier. Training this model from scratch,
without pre-training, would be much more time-consuming.
Fig. 2.9 Analysis of the AlexNet network
Fig. 2.10 Training process of the AlexNet on food example image dataset
Fig. 2.11 Confusion matrix of the AlexNet on food example image dataset
Figure 2.10 shows the training process of AlexNet. Starting from the
pre-trained model, the network was retrained on a CPU in less than
6 min. The accuracy on the validation set was 85.71%, and the accuracy
on the test set was 83.67%. Thus, AlexNet is able to solve the
classification problem of the food example image dataset.
Figure 2.11 shows the confusion matrix of AlexNet. Although the
model has high average precision and recall over the nine categories, it
has shortcomings. For example, in the “sashimi” category, the precision
of AlexNet is only 44.4% and its recall only 50.0%; in the “hot_dog”
category, the precision is 66.7% and the recall only 33.3%.
SqueezeNet is a convolutional neural network proposed by Iandola
et al. [3]. In their experiments, SqueezeNet achieves the same accuracy
as AlexNet with only one-fiftieth of AlexNet's parameters. SqueezeNet
also opened up a research direction in artificial intelligence: maximizing
computational speed without decreasing the accuracy of the model. A
version pre-trained on the ImageNet dataset, which contains more than
1 million images, is available in Matlab. The pre-trained network can
classify images into 1000 object categories, such as keyboard, mouse,
pencil and many animals. Thus, SqueezeNet has learned rich feature
representations for a wide range of images. Its input image size is
227 × 227.
We use the food example image dataset as the research problem.
The programs to solve the problem using pre-trained SqueezeNet are as
follows:
% downloadExampleFoodImagesData is not built into Matlab; it is assumed to be
% the helper function supplied with the MathWorks food-image example code.
dataDir = fullfile("ExampleFoodImageDataset");
url = "https://www.mathworks.com/supportfiles/nnet/data/ExampleFoodImageDataset.zip";
if ~exist(dataDir, "dir")
    mkdir(dataDir);
    downloadExampleFoodImagesData(url,dataDir);
end
imds = imageDatastore('ExampleFoodImageDataset', ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
labelCount = countEachLabel(imds);
img1 = readimage(imds,1);
size(img1)
% 20% of each class for testing; 25% of the rest (20% overall) for validation
[imdsTest,imdsTrain] = splitEachLabel(imds,0.2,'randomize');
[imdsValid,imdsTrain] = splitEachLabel(imdsTrain,0.25,'randomize');
labelCountTest = countEachLabel(imdsTest);
labelCountTrain = countEachLabel(imdsTrain);
labelCountValid = countEachLabel(imdsValid);
numTrainImages = numel(imdsTrain.Labels);
idx = randperm(numTrainImages,9);
I = imtile(imdsTrain, 'Frames', idx);   % show 9 random training images
figure;
imshow(I);
net = squeezenet;
inputSize = net.Layers(1).InputSize;
analyzeNetwork(net);
lgraph = layerGraph(net);
numClasses = numel(categories(imdsTrain.Labels));
% Replace the last learnable layer (conv10) and the classification layer
newConvLayer = convolution2dLayer([1,1],numClasses, ...
    'WeightLearnRateFactor',10,'BiasLearnRateFactor', ...
    10,"Name",'new_conv');
lgraph = replaceLayer(lgraph,'conv10',newConvLayer);
newClassificationLayer = ...
    classificationLayer('Name','new_classoutput');
lgraph = replaceLayer(lgraph, ...
    'ClassificationLayer_predictions',newClassificationLayer);
aug = imageDataAugmenter("RandXReflection", true, ...
    "RandYReflection", true, ...
    "RandXScale", [0.8 1.2], ...
    "RandYScale", [0.8 1.2]);
augImdsTrain = augmentedImageDatastore(inputSize(1:2), imdsTrain, ...
    'DataAugmentation', aug);
augImdsVal = augmentedImageDatastore(inputSize(1:2), imdsValid);
opts = trainingOptions("adam", ...
    "InitialLearnRate", 1e-4, ...
    "MaxEpochs", 10, ...
    "ValidationData", augImdsVal, ...
    "Verbose", false, ...
    "Plots", "training-progress", ...
    "ExecutionEnvironment","cpu", ...
    "MiniBatchSize",128);
netTransfer = trainNetwork(augImdsTrain, lgraph, opts);
save('Food_SqueezeNet.mat','netTransfer', ...
    'imdsTrain','imdsValid','imdsTest');
augImdsTest = augmentedImageDatastore(inputSize(1:2), imdsTest);
[YPred, YPredScores] = classify(netTransfer, augImdsTest);
YTest = imdsTest.Labels;
M = confusionmat(YTest, YPred);
ACC = sum(diag(M)) / sum(M(:));
P1 = diag(M)./(sum(M,1) + 0.0001)';
R1 = diag(M)./(sum(M,2) + 0.0001);
P = mean(P1); % mean precision of all classes
R = mean(R1); % mean recall of all classes
F1score = 2*P*R/(P + R);
fig = figure;
cm = confusionchart(YTest,YPred,'RowSummary', ...
    'row-normalized','ColumnSummary','column-normalized');
CLASSES = unique(YTest);
AUCRF = zeros(1, numel(CLASSES));
for i1 = 1:length(CLASSES)
    % compute AUC for class i1
    [XRF,YRF,TRF,AUCRF(i1)] = perfcurve(YTest, ...
        YPredScores(:, i1),CLASSES(i1));
end
AUC = mean(AUCRF);
The results of the above program are shown in Figs. 2.12, 2.13 and 2.14.
Figure 2.12 gives the structural analysis of SqueezeNet, which contains
about 1.2 million parameters to be learned. Note that the figure shows
SqueezeNet as consisting of 68 layers, because in Matlab each step from
input to output is considered a layer; hence the number of layers shown
is larger than the 18 layers introduced earlier.
Fig. 2.14 Confusion matrix of the SqueezeNet on food example image dataset
Training this model from scratch, without pre-training, would also be
more time-consuming. Figure 2.13 shows the training process of
SqueezeNet: starting from the pre-trained model, it was retrained on a
CPU in less than eight minutes and reached an accuracy of 75.51% on the
validation set. Its accuracy on the test set is 76.02%. Thus, SqueezeNet
is able to solve the classification problem of the food example image
dataset.
Figure 2.14 shows the confusion matrix of SqueezeNet. Its precision
does not exceed 50.0% on the “greek_salad” and “sashimi” categories.
Moreover, SqueezeNet predicts all samples of the “caesar_salad”
category incorrectly, and its precision for this category cannot be
calculated. Except for the “french_fries”, “greek_salad”, “sashimi” and
“sushi” categories, the recall of SqueezeNet is less than 33.3% on every
category.
Finally, Table 2.3 compares the performance of AlexNet and
SqueezeNet on the food example image dataset.
Table 2.3 Performance of AlexNet and SqueezeNet on food example image dataset
Table 2.3 gives the values of the five metrics ACC, P, R, F1-score and
AUC introduced in the previous section, where F1-score is abbreviated
as F1. AlexNet outperforms SqueezeNet on all metrics, which is
consistent with the experimental results reported by other recent
researchers. SqueezeNet reduces the total number of parameters by a
factor of about 50. Although its accuracy is lower than that of AlexNet,
the much smaller number of parameters makes SqueezeNet suitable for
real-time image classification problems. Moreover, SqueezeNet can be
deployed on small chips such as field-programmable gate arrays
(FPGAs).
Besides MNIST and ImageNet, readers can use other datasets. For
example, coronavirus disease 2019 (COVID-19) is an infectious disease
that has caused a worldwide outbreak, and there are publicly available
datasets for the COVID-19 classification problem [4]. Readers can easily
apply AlexNet or SqueezeNet to a COVID-19 dataset.
Exercises
(1) Analyze how the convolutional layer and the pooling layer of a
convolutional neural network work, and give examples.
(2) Analyze the differences between a convolutional neural network
and a fully connected backpropagation neural network.
(3) Coronavirus disease 2019 (COVID-19) is an infectious disease that
has caused a worldwide outbreak. There are publicly available datasets
for the COVID-19 classification problem. For example, COVID-CT is a
CT image dataset (https://github.com/UCSD-AI4H/COVID-CT) that
includes 349 images with COVID-19 and 463 images without COVID-19.
Use Matlab to create a convolutional neural network to solve this
classification problem and analyze its performance.
(4) Download the COVID-19 dataset, solve the problem by transfer
learning with pre-trained neural network models in Matlab, e.g.,
AlexNet and SqueezeNet, and compare the performance of the
different models.
References
1. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
5. Borisov V, Leemann T, Seßler K et al (2022) Deep neural networks and tabular data: a survey. IEEE Trans Neural Netw Learn Syst
6. Liu M, Chen L, Du X et al (2022) Activated gradients for deep neural network. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2022.3229161
7. Kabir HMD, Abdar M, Khosravi A et al (2022) Spinalnet: deep neural network with gradual input. IEEE Trans Artif Intell. https://doi.org/10.1109/TAI.2022.3185179
9. Song H, Kim M, Park D et al (2022) Learning from noisy labels with deep neural networks: a survey. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2022.3152527
12. Galván E, Mooney P (2021) Neuroevolution in deep neural networks: current trends and future challenges. IEEE Trans Artif Intell 2(6):476–493
14. Wu X, Hong D, Chanussot J (2021) Convolutional neural networks for multimodal remote sensing data classification. IEEE Trans Geosci Remote Sens 60:1–10
15. Kumar A, Vashishtha G, Gandhi CP et al (2021) Novel convolutional neural network (NCNN) for the diagnosis of bearing defects in rotary machinery. IEEE Trans Instrum Meas 70:1–10
16. Bessadok A, Mahjoub MA, Rekik I (2022) Graph neural networks in network neuroscience. IEEE Trans Pattern Anal Mach Intell 45(5):5833–5848
17. Grattarola D, Alippi C (2021) Graph neural networks in tensorflow and keras with spektral [application notes]. IEEE Comput Intell Mag 16(1):99–106
19. Skarding J, Gabrys B, Musial K (2021) Foundations and modeling of dynamic networks using dynamic graph neural networks: a survey. IEEE Access 9:79143–79168
20. Huang Q, Yamada M, Tian Y et al (2023) Graphlime: local interpretable model explanations for graph neural networks. IEEE Trans Knowl Data Eng 35(7):6968–6972
21. Bianchi FM, Grattarola D, Livi L et al (2021) Graph neural networks with convolutional ARMA filters. IEEE Trans Pattern Anal Mach Intell 44(7):3496–3507
23. Liu M, Wang Z, Ji S (2021) Non-local graph neural networks. IEEE Trans Pattern Anal Mach Intell 44(12):10270–10276
24. Ruiz L, Gama F, Ribeiro A (2021) Graph neural networks: architectures, stability, and transferability. Proc IEEE 109(5):660–682
26. Chowdhury A, Verma G, Rao C et al (2021) Unfolding WMMSE using graph neural networks for efficient power allocation. IEEE Trans Wireless Commun 20(9):6004–6017
28. Liu Z, Qian P, Wang X et al (2023) Combining graph neural networks with expert knowledge for smart contract vulnerability detection. IEEE Trans Knowl Data Eng 35(2):1296–1310
29. Isufi E, Gama F, Ribeiro A (2021) EdgeNets: edge varying graph neural networks. IEEE Trans Pattern Anal Mach Intell 44(11):7457–7473
30. Chen C, Li K, Wei W et al (2021) Hierarchical graph neural networks for few-shot learning. IEEE Trans Circuits Syst Video Technol 32(1):240–252
31. Han Y, Huang G, Song S et al (2021) Dynamic neural networks: a survey. IEEE Trans Pattern Anal Mach Intell 44(11):7436–7456
32. Zhang Y, Tiňo P, Leonardis A et al (2021) A survey on neural network interpretability. IEEE Trans Emerg Top Comput Intell 5(5):726–742
34. Jospin LV, Laga H, Boussaid F et al (2022) Hands-on Bayesian neural networks—a tutorial for deep learning users. IEEE Comput Intell Mag 17(2):29–48
35. Fan FL, Xiong J, Li M et al (2021) On interpretability of artificial neural networks: a survey. IEEE Trans Radiat Plasma Med Sci 5(6):741–760
37. Darestani MZ, Heckel R (2021) Accelerated MRI with un-trained neural networks. IEEE Trans Comput Imaging 7:724–733
38. Gurrola-Ramos J, Dalmau O, Alarcón TE (2021) A residual dense u-net neural network for image denoising. IEEE Access 9:31742–31754
39. Kauffmann J, Esders M, Ruff L et al (2022) From clustering to cluster explanations via neural networks. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2022.3185901
40. Fei J, Wang H, Fang Y (2021) Novel neural network fractional-order sliding-mode control with application to active power filter. IEEE Trans Syst, Man, Cybern: Syst 52(6):3508–3518
41. Fei J, Wang Z, Liang X et al (2021) Fractional sliding-mode control for microgyroscope based on multilayer recurrent fuzzy neural network. IEEE Trans Fuzzy Syst 30(6):1712–1721
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
X. Zhang et al., Intelligent Information Processing with Matlab
https://doi.org/10.1007/978-981-99-6449-9_3
3. Fuzzy Computing
Xiu Zhang1, Xin Zhang1 and Wei Wang1
(1) Tianjin Normal University, Tianjin, China
Abstract
Fuzzy computing is a computational intelligence technology based on
fuzzy theory. It can solve various problems such as identification and
clustering, while automatic control remains its main application area.
This chapter first introduces the basics of fuzzy computing, including
fuzzy sets and fuzzy membership functions. It then covers three kinds
of problems that fuzzy computing can solve: fuzzy pattern recognition,
fuzzy clustering and fuzzy inference. Next, it introduces the Mamdani
fuzzy control system, one of the most important applications of fuzzy
computing. Finally, it introduces the Fuzzy Logic Designer for building
a fuzzy controller for a control problem.
(3.1)
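Operations on ordinary sets A, B and C on a theoretical domain U obey
the following laws, where $A^c$ denotes the complement of A and
$\varnothing$ the empty set:
(1) Exchange (commutative) law. $A\cup B=B\cup A$, $A\cap B=B\cap A$.
(2) Combination (associative) law. $(A\cup B)\cup C=A\cup(B\cup C)$,
$(A\cap B)\cap C=A\cap(B\cap C)$.
(3) Absorption law. $A\cup(A\cap B)=A$, $A\cap(A\cup B)=A$.
(4) Idempotence law. $A\cup A=A$, $A\cap A=A$.
(5) Distributive law. $A\cup(B\cap C)=(A\cup B)\cap(A\cup C)$,
$A\cap(B\cup C)=(A\cap B)\cup(A\cap C)$.
(6) Law of restitution (involution). $(A^c)^c=A$.
(7) Complementary law. $A\cup A^c=U$, $A\cap A^c=\varnothing$.
(8) 0–1 law. $A\cup U=U$, $A\cap U=A$, $A\cup\varnothing=A$, $A\cap\varnothing=\varnothing$.
(9) Inversion law, also known as De Morgan's law. $(A\cup B)^c=A^c\cap B^c$,
$(A\cap B)^c=A^c\cup B^c$.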
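These laws can be checked numerically on small finite sets with Matlab's union, intersect and setdiff functions, taking the complement relative to U with setdiff. The universe and the three sets below are arbitrary choices for illustration; a check on one example illustrates the laws but does not prove them.

```matlab
% Small finite universe and three ordinary sets, chosen for illustration
U = 1:8;
A = [1 2 3 4];
B = [3 4 5 6];
C = [2 4 6 8];
comp = @(S) setdiff(U, S);           % complement with respect to U

commutative   = isequal(union(A, B), union(B, A)) && ...
                isequal(intersect(A, B), intersect(B, A));
absorption    = isequal(union(A, intersect(A, B)), A) && ...
                isequal(intersect(A, union(A, B)), A);
distributive  = isequal(union(A, intersect(B, C)), ...
                        intersect(union(A, B), union(A, C)));
involution    = isequal(comp(comp(A)), A);
complementary = isequal(union(A, comp(A)), U) && isempty(intersect(A, comp(A)));
deMorgan      = isequal(comp(union(A, B)), intersect(comp(A), comp(B))) && ...
                isequal(comp(intersect(A, B)), union(comp(A), comp(B)));

disp([commutative absorption distributive involution complementary deMorgan]);
```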
Based on the concept of classical sets, the concept of fuzzy sets is
introduced next.
$$\mu_A: U \to [0,1],\quad x \mapsto \mu_A(x) \tag{3.2}$$
The above mapping determines a fuzzy set A on U, and $\mu_A$ is called
the membership function of the fuzzy set A.
The membership function reflects the degree to which the elements
belong to the fuzzy set A. If the elements of U are denoted by x, then
$\mu_A(x)$ is called the degree of membership of the element x in the
fuzzy set A. From Eq. (3.2), $\mu_A(x)$ takes values in the closed
interval [0, 1]. If $\mu_A(x)$ is close to 0, the degree to which x belongs
to A is low; conversely, if $\mu_A(x)$ is close to 1, the degree to which x
belongs to A is high.
The way a fuzzy set is represented depends on its theoretical domain.
When the theoretical domain is a discrete finite set
$U=\{x_1,x_2,\ldots,x_n\}$, the methods usually used are the Zadeh
representation, the ordered pair representation and the vector
representation.
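(1) Zadeh representation method. Each element $x_i$ of the theoretical
domain U is written together with its membership degree $\mu_A(x_i)$
($i=1,2,\ldots,n$) as follows:
$$A=\frac{\mu_A(x_1)}{x_1}+\frac{\mu_A(x_2)}{x_2}+\cdots+\frac{\mu_A(x_n)}{x_n} \tag{3.3}$$
where A is the fuzzy set; the fraction in the expression is only a
notation, not a division, and the “+” sign does not denote addition.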
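The three representations can all be produced from the same membership vector, as in the following sketch. The five-element domain and its membership degrees are hypothetical; following the usual convention, terms with zero membership are omitted from the Zadeh form.

```matlab
% Hypothetical discrete domain U = {x1,...,x5} and membership degrees mu_A(x_i)
mu = [0 0.3 0.7 1 0.5];

% Zadeh representation: "mu/x" terms joined by "+", zero-membership terms omitted
idx = find(mu > 0);
terms = arrayfun(@(i) sprintf('%g/x%d', mu(i), i), idx, 'UniformOutput', false);
zadehStr = strjoin(terms, ' + ');

% Ordered pair representation: one row (i, mu_A(x_i)) per element
pairs = [(1:numel(mu))', mu'];

% Vector representation: (mu_A(x1), ..., mu_A(x5))
vec = mu;

disp(['A = ' zadehStr]);             % A = 0.3/x2 + 0.7/x3 + 1/x4 + 0.5/x5
disp(pairs);
disp(vec);
```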
(3.4)
(3.5)
(3.7)
(3.8)
(3.10)
(3.11)
(3.12)
Then, using the Zadeh representation method, the “young” fuzzy set
Y can be expressed as:
(3.13)
(3.15)
(3.16)
[18, 25] [17, 30] [17, 28] [18, 25] [16, 35] [14, 25] [18, 30] [18, 35]
[18, 35] [16, 25] [15, 30] [18, 35] [17, 30] [18, 25] [18, 35] [20, 30]
[18, 30] [16, 30] [20, 35] [18, 30] [18, 25] [18, 35] [15, 25] [18, 30]
[15, 28] [16, 28] [18, 30] [18, 30] [16, 30] [18, 35] [18, 25] [18, 30]
[16, 28] [18, 30] [16, 30] [16, 28] [18, 35] [18, 35] [17, 27] [16, 28]
[15, 28] [18, 25] [19, 28] [15, 30] [15, 26] [17, 25] [15, 36] [18, 30]
[17, 30] [18, 35] [16, 35] [16, 30] [15, 25] [18, 28] [16, 30] [15, 28]
[18, 35] [18, 30] [17, 28] [18, 35] [15, 28] [15, 25] [15, 25] [15, 25]
[18, 30] [16, 24] [15, 25] [16, 32] [15, 27] [18, 35] [16, 25] [18, 30]
[16, 28] [18, 30] [18, 35] [18, 30] [18, 30] [18, 30] [17, 30] [18, 30]
[18, 35] [16, 30] [18, 28] [17, 25] [15, 30] [18, 25] [17, 30] [14, 25]
[18, 26] [18, 29] [18, 35] [18, 28] [18, 35] [18, 25] [16, 35] [17, 29]
[18, 25] [17, 30] [16, 28] [18, 30] [16, 28] [15, 30] [18, 30] [16, 30]
[20, 30] [20, 30] [16, 25] [17, 30] [15, 30] [18, 30] [16, 30] [18, 28]
[15, 35] [16, 30] [15, 30] [18, 35] [18, 35] [18, 30] [17, 30] [16, 35]
[17, 30] [15, 25] [18, 35] [15, 30] [15, 25] [15, 30] [18, 30] [17, 25]
[18, 29] [18, 28] [18, 35] [18, 25] [18, 30] [15, 30] [17, 30] [18, 30]
From Table 3.1, the minimum age is 14 years and the maximum age is
36 years. For each integer age in the interval [14, 36], we count how
many intervals in Table 3.1 contain it; the results are shown in
column 2 of Table 3.2.
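The counting step can be sketched in Matlab as follows. The sketch assumes, as in set-valued statistics, that an age is counted once for every interval of Table 3.1 that contains it, endpoints included. The interval data are transcribed from Table 3.1; the variable names are illustrative.

```matlab
% Age intervals from Table 3.1: each matrix row is one table row of 8 intervals,
% stored as a1 b1 a2 b2 ... a8 b8
D = [18 25 17 30 17 28 18 25 16 35 14 25 18 30 18 35;
     18 35 16 25 15 30 18 35 17 30 18 25 18 35 20 30;
     18 30 16 30 20 35 18 30 18 25 18 35 15 25 18 30;
     15 28 16 28 18 30 18 30 16 30 18 35 18 25 18 30;
     16 28 18 30 16 30 16 28 18 35 18 35 17 27 16 28;
     15 28 18 25 19 28 15 30 15 26 17 25 15 36 18 30;
     17 30 18 35 16 35 16 30 15 25 18 28 16 30 15 28;
     18 35 18 30 17 28 18 35 15 28 15 25 15 25 15 25;
     18 30 16 24 15 25 16 32 15 27 18 35 16 25 18 30;
     16 28 18 30 18 35 18 30 18 30 18 30 17 30 18 30;
     18 35 16 30 18 28 17 25 15 30 18 25 17 30 14 25;
     18 26 18 29 18 35 18 28 18 35 18 25 16 35 17 29;
     18 25 17 30 16 28 18 30 16 28 15 30 18 30 16 30;
     20 30 20 30 16 25 17 30 15 30 18 30 16 30 18 28;
     15 35 16 30 15 30 18 35 18 35 18 30 17 30 16 35;
     17 30 15 25 18 35 15 30 15 25 15 30 18 30 17 25;
     18 29 18 28 18 35 18 25 18 30 15 30 17 30 18 30];
D = reshape(D', 2, [])';              % 136-by-2: one interval [a, b] per row

ages = 14:36;                         % integer ages from the minimum to the maximum
counts = arrayfun(@(x) sum(D(:,1) <= x & x <= D(:,2)), ages);
freq = counts / size(D, 1);           % covering frequency of each age

disp([ages' counts' freq']);
```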
Table 3.2 Frequency statistics of age
$$\mu_A(x)=\begin{cases}1, & a\le x\le b\\ 0, & \text{otherwise}\end{cases} \tag{3.17}$$
In the above equation, a and b are two parameters with a < b. When the
element x lies between a and b, its membership degree is 1; otherwise, it
is 0. In addition, a one-sided distribution can be defined as follows:
(3.18)
(3.19)
(3.20)
(3.21)
(3.22)
where a, b, c and d are four parameters satisfying a < b < c < d.
The small, intermediate and large forms of the k-th order parabolic
fuzzy distribution are:
(3.23)
(3.24)
(3.25)
(3.26)
(3.27)
(3.28)
(3.29)
(3.30)
(3.31)
(3.32)
(3.33)
(3.34)
Union of fuzzy sets. The union of fuzzy sets is also known as the
maximal operator of fuzzy sets, or the sum operator of fuzzy sets. If A,
B and C are fuzzy sets on U with C = A ∪ B, then for all x ∈ U:
$$\mu_C(x)=\mu_A(x)\vee\mu_B(x)=\max\{\mu_A(x),\mu_B(x)\} \tag{3.35}$$
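On a finite theoretical domain, the union is computed elementwise with max. The sketch below also shows the intersection (elementwise min) and the complement (1 − μ), which are assumed here to be the standard Zadeh operators; the membership vectors are hypothetical. The last line shows that, unlike for ordinary sets, A ∪ A^c need not equal U for a fuzzy set.

```matlab
% Two fuzzy sets on U = {x1,...,x4}, given by hypothetical membership vectors
muA = [0.2 0.6 1.0 0.4];
muB = [0.5 0.3 0.8 0.4];

muUnion       = max(muA, muB);       % union: pointwise maximum
muIntersect   = min(muA, muB);       % intersection: pointwise minimum
muComplementA = 1 - muA;             % complement of A

% For fuzzy sets, A union (complement of A) is generally not the whole domain
excludedMiddle = max(muA, muComplementA);

disp([muUnion; muIntersect; muComplementA; excludedMiddle]);
```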
(3.39)
(3.40)
Try to find the union and intersection of the two fuzzy sets.
(3.41)
(3.42)
(3.43)
that there is a small dot directly below in the definition of the strong
cut set.
(3.44)
Take λ = 1, λ = 0.7 and λ = 0.3, respectively, and find the
corresponding λ-cut sets.
(3.45)
(3.46)
(3.47)
From Example 3.3, we can see that a λ-cut set is an ordinary set: it
consists of the elements whose membership degree in the fuzzy set
reaches the level λ. The λ-cut set therefore provides a way to convert
between fuzzy sets and classical sets, which is very useful when dealing
with practical problems.
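For a discrete fuzzy set, a λ-cut reduces to a threshold test on the membership vector (≥ λ for the cut, > λ for the strong cut). In this sketch the symbol λ, the five-element domain and the membership degrees are assumptions for illustration, not the data of Example 3.3.

```matlab
% Hypothetical fuzzy set on U = {x1,...,x5}
mu = [0.3 1.0 0.7 0.5 0.9];
lambdas = [1 0.7 0.3];

cuts = cell(size(lambdas));
strongCuts = cell(size(lambdas));
for k = 1:numel(lambdas)
    cuts{k} = find(mu >= lambdas(k));       % lambda-cut: mu_A(x) >= lambda
    strongCuts{k} = find(mu > lambdas(k));  % strong lambda-cut: mu_A(x) > lambda
    fprintf('lambda = %.1f: cut = {%s}, strong cut = {%s}\n', lambdas(k), ...
        strtrim(sprintf('x%d ', cuts{k})), strtrim(sprintf('x%d ', strongCuts{k})));
end
```

Each cut is an ordinary set of element indices, and lowering λ can only enlarge it.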
(3.49)
(3.50)
(3.51)
(2)
Nearest principle.
We first introduce the concepts of inner product, outer product and
nearness degree of fuzzy sets.
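Definition 3.5 Suppose there are two fuzzy sets A and B on the
theoretical domain U. Then:
$$A\bullet B=\bigvee_{x\in U}\left(\mu_A(x)\wedge\mu_B(x)\right) \tag{3.54}$$
is called the inner product of A and B.
Definition 3.6 Suppose there are two fuzzy sets A and B on the
theoretical domain U. Then:
$$A\odot B=\bigwedge_{x\in U}\left(\mu_A(x)\vee\mu_B(x)\right) \tag{3.55}$$
is called the outer product of A and B.
When the theoretical domain is $U=\{x_1,x_2,\ldots,x_n\}$, according to
the above definitions, we obtain:
$$A\bullet B=\bigvee_{i=1}^{n}\left(\mu_A(x_i)\wedge\mu_B(x_i)\right) \tag{3.56}$$
$$A\odot B=\bigwedge_{i=1}^{n}\left(\mu_A(x_i)\vee\mu_B(x_i)\right) \tag{3.57}$$
where ∨ denotes taking the maximum and ∧ taking the minimum.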
The nearness degree of two fuzzy sets measures their similarity.
Commonly used nearness degrees include the lattice nearness degree,
the average nearness degree and the max–min nearness degree.
The formula for lattice nearness degree is as follows:
(3.58)
The formula for the average nearness degree is as follows:
(3.59)
The formula for the max–min nearness degree is as follows:
(3.60)
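On a finite domain, the inner product is the maximum over x_i of min(μA(x_i), μB(x_i)), and the outer product is the minimum over x_i of max(μA(x_i), μB(x_i)). The sketch below computes both for two hypothetical fuzzy sets. It also evaluates two nearness degrees using common textbook forms that are assumptions here and may differ in detail from the formulas above: the lattice nearness ½[A•B + (1 − A⊙B)] and the max–min nearness Σ(μA ∧ μB)/Σ(μA ∨ μB).

```matlab
% Two hypothetical fuzzy sets on U = {x1,...,x4}
muA = [0.2 0.6 1.0 0.4];
muB = [0.3 0.8 0.7 0.1];

innerProd = max(min(muA, muB));      % inner product: max of pointwise minima
outerProd = min(max(muA, muB));      % outer product: min of pointwise maxima

% Assumed forms of two nearness degrees
latticeNear = 0.5 * (innerProd + (1 - outerProd));
maxMinNear  = sum(min(muA, muB)) / sum(max(muA, muB));

fprintf('inner = %.2f, outer = %.2f, lattice = %.2f, max-min = %.2f\n', ...
    innerProd, outerProd, latticeNear, maxMinNear);
```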
(3.63)
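When the theoretical domain is a finite set, the fuzzy relation between
two sets A and B can be represented by a fuzzy matrix. Specifically,
suppose A and B are two non-empty sets; a fuzzy set R on the direct
product A × B is called a binary fuzzy relation from A to B, or fuzzy
relation for short.
Suppose $A=\{a_1,a_2,\ldots,a_m\}$, $B=\{b_1,b_2,\ldots,b_n\}$, and R
denotes the fuzzy relation defined on A × B. Then R is represented by
the matrix:
$$R=\begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n}\\ r_{21} & r_{22} & \cdots & r_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ r_{m1} & r_{m2} & \cdots & r_{mn}\end{pmatrix} \tag{3.64}$$
where $r_{ij}=\mu_R(a_i,b_j)\in[0,1]$.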
(3.68)
(3.69)
(3.70)
(3.71)
(3.72)
(3.73)
Since the values of the data vary greatly, we use the maximum value
method to standardize the data: each element of the matrix is divided
by the maximum value of the column it is in, i.e.:
$$x'_{ij}=\frac{x_{ij}}{\max_{i} x_{ij}} \tag{3.74}$$
(3.75)
(3.76)
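Below is a minimal Matlab sketch of the maximum value method. It assumes non-negative data with a positive maximum in every column. The matrix is hypothetical.

```matlab
% Hypothetical raw data: rows are samples, columns are indicators.
X = [80, 10, 0.5;
     60, 30, 0.2;
     40, 20, 0.8];

% Maximum value method: divide each element by the maximum of its column.
colMax = max(X, [], 1);
Xn = bsxfun(@rdivide, X, colMax);
```

After normalization every column has maximum 1, so indicators measured on very different scales contribute comparably to the fuzzy relation matrix.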
From this, the fuzzy relation matrix can be obtained as:
(3.77)
(3.78)
(3.79)
(3.80)
Thus, we have .
Then, by choosing an appropriate confidence level , we take the cut
set based on this level. We sort the elements in in
descending order: 1 > 0.70 > 0.63 > 0.62 > 0.53. Hence, we can choose
and the clustering matrix is:
(3.81)
From the above equation, it can be seen that when , the data
are clustered into 5 classes.
(3.82)
(3.83)
(3.84)
From the above equation, it can be seen that when , the
data are clustered into 2 classes, where , , and belong to the
same class.
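The clustering procedure above has two steps:

1. Compute the transitive closure of the fuzzy similarity matrix by repeated max–min composition.
2. Take a cut at a chosen level.

It can be sketched as follows. The matrix `R` and the level `lambda` are hypothetical. The loop composes the matrix with itself until it no longer changes. Elements related to each other in the cut matrix are then put into the same class.

```matlab
% Fuzzy similarity matrix: reflexive and symmetric (hypothetical values).
R = [1.0, 0.8, 0.4, 0.5;
     0.8, 1.0, 0.4, 0.3;
     0.4, 0.4, 1.0, 0.6;
     0.5, 0.3, 0.6, 1.0];
n = size(R, 1);

% Transitive closure t(R): square with max-min composition until T o T == T.
T = R;
while true
    T2 = zeros(n);
    for i = 1:n
        for j = 1:n
            T2(i, j) = max(min(T(i, :), T(:, j)'));   % (T o T)(i,j)
        end
    end
    if isequal(T2, T)
        break;
    end
    T = T2;
end

% Cut at level lambda and label the equivalence classes.
lambda = 0.6;
Tc = T >= lambda;
labels = zeros(1, n);
numClasses = 0;
for i = 1:n
    if labels(i) == 0
        numClasses = numClasses + 1;
        labels(Tc(i, :)) = numClasses;   % everything equivalent to element i
    end
end
```

Lowering `lambda` merges classes and raising it splits them. This matches the behaviour described above, where different confidence levels yield different numbers of classes.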
(3.85)
(3.86)
(3.87)
(3.88)
(3.89)
(3.90)
(5)
Distance method. This method uses the distance between vectors
to define the degree of similarity. The closer two vectors are, the
greater their degree of similarity; conversely, the farther apart two
vectors are, the less similar they are. Commonly used distances include
the Euclidean, Chebyshev, Hamming and Minkowski distances; their
definitions are not given here.
(6)
Absolute value reciprocal method. This method uses the reciprocal
of the absolute value of the difference of two vectors to define the
degree of similarity, and its expression is:
(3.91)
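The sketch below computes the four distances named in method (5) and builds a similarity matrix with the absolute value reciprocal method (6). It assumes the following common forms:

- the Hamming distance is the sum of absolute differences;
- the distance method uses the similarity 1 − c·d with c = 1/sqrt(m), where m is the number of features, which keeps the value in [0, 1] when features lie in [0, 1];
- the reciprocal method uses r_ij = M / Σ_k |x_ik − x_jk| for i ≠ j and r_ii = 1, where M is taken as the smallest off-diagonal sum so that every r_ij ≤ 1.

The data are hypothetical. The code also assumes that no two samples are identical, since an identical pair would make a sum zero.

```matlab
% Samples as rows, features scaled to [0, 1] (hypothetical values).
X = [0.8, 0.2, 0.5;
     0.6, 0.3, 0.4;
     0.1, 0.9, 0.7];
n = size(X, 1);

% (5) Distance method for samples 1 and 2.
d = X(1, :) - X(2, :);
dEuclid    = sqrt(sum(d.^2));          % Euclidean distance
dChebyshev = max(abs(d));              % Chebyshev distance
dHamming   = sum(abs(d));              % Hamming distance (sum of |differences|)
p = 3;
dMinkowski = sum(abs(d).^p)^(1/p);     % Minkowski distance of order p
c = 1 / sqrt(size(X, 2));              % assumed scaling so that 1 - c*d lies in [0, 1]
sim12 = 1 - c * dEuclid;               % similarity of samples 1 and 2

% (6) Absolute value reciprocal method for all pairs.
S = zeros(n);                          % S(i,j) = sum_k |x_ik - x_jk|
for i = 1:n
    for j = 1:n
        S(i, j) = sum(abs(X(i, :) - X(j, :)));
    end
end
offDiag = ~eye(n);
M = min(S(offDiag));                   % largest M that keeps every r_ij <= 1
Rsim = eye(n);
Rsim(offDiag) = M ./ S(offDiag);
```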
As can be seen from Fig. 3.5, the grammatical rule G converts the
linguistic variable x into a linguistic variable value, while the
semantic rule M maps the linguistic variable value into the
theoretical domain. A fuzzy linguistic variable is equivalent to a fuzzy
set. The grammatical rule acts on the linguistic variable by using tone
operators to add “very”, “relatively”, “slightly”, etc., which yields
different values of the linguistic variable. These values must then be
mapped to different membership values, which is achieved by the
semantic rule.
Denote a fuzzy linguistic variable by A, with membership function
. Adding the hedges “very”, “fairly”, “comparatively” and “slightly” to
A, the membership functions of the resulting linguistic values can be
transformed into , , , . The complement of the fuzzy set can be
. Note that this is only an example illustrating how grammatical
rules correspond to changes in the membership function; the
correspondence is not fixed and must be analyzed for each specific
problem.
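The sketch below shows hedges acting on a membership function. The exponents follow one commonly cited convention: a power above 1 concentrates the set and a power below 1 dilates it. They are assumptions, not values taken from this book, in line with the remark above that the correspondence is not fixed. The domain and the membership function of A are also hypothetical.

```matlab
u   = 0:10:100;                          % hypothetical domain
muA = 1 ./ (1 + ((u - 50) / 20).^2);     % hypothetical membership function of A

% Hedges as powers of the membership function (assumed exponents).
muVery          = muA .^ 2;      % "very": concentration
muFairly        = muA .^ 1.25;   % "fairly"
muComparatively = muA .^ 0.75;   % "comparatively"
muSlightly      = muA .^ 0.5;    % "slightly": dilation
muNotA          = 1 - muA;       % complement
```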
In fuzzy computing, a fuzzy logic rule is a fuzzy implication relation,
which is essentially a form of reasoning or inference. One of the most
commonly used fuzzy implication relations is: if x is A, then y is B,
denoted as A → B. In ordinary logic, A → B has a strict definition. In
fuzzy logic, A → B is not a simple generalization of ordinary logic and
can be defined in many ways. The commonly used operations for fuzzy
implication relations are:
(1)
Fuzzy implication minimum operation. This operation was proposed by
Mamdani. Its expression is:
$$R_c = A \times B = \int_{X \times Y} \mu_A(x) \wedge \mu_B(y) \,/\, (x, y) \qquad (3.93)$$
(3.94)
(3.95)
From the above equation, it can be seen that this fuzzy implication
relation is defined through the Cartesian product and union of sets,
which in turn reduce to the intersection, complement and union of
fuzzy sets.
Among these three fuzzy implication relations, the most commonly
used are fuzzy implication minimum operation and fuzzy implication
arithmetic operation.
Example 3.9 Suppose we have two fuzzy sets A = [1, 0.8, 0.7, 0.4, 0.1]
and B = [1, 0.7, 0.3, 0], try to compute the fuzzy implication relation
using the fuzzy implication minimum operation.
(3.96)
The example can also be calculated in Matlab with the following
program:
A = [1, 0.8, 0.7, 0.4, 0.1];
B = [1, 0.7, 0.3, 0];
m = length(A);
n = length(B);
Rc = zeros(m, n);                          % preallocate the relation matrix
for i1 = 1:m
    for j1 = 1:n
        Rc(i1,j1) = min(A(i1), B(j1));     % Rc(i,j) = min(A(i), B(j))
    end
end
After running the above program, the variable Rc is the matrix of the
fuzzy implication relation.
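Matlab's element-wise functions can replace the double loop above. The sketch below builds the same `Rc` with `bsxfun`. For comparison, it also builds two other implication operators often attributed to Zadeh:

- the arithmetic rule min(1, 1 − A(i) + B(j));
- the max–min rule max(min(A(i), B(j)), 1 − A(i)).

These two forms are assumed standard definitions, not transcriptions of (3.94) and (3.95).

```matlab
A = [1, 0.8, 0.7, 0.4, 0.1];
B = [1, 0.7, 0.3, 0];

% Mamdani minimum implication: the same matrix as the loop above.
Rc = bsxfun(@min, A', B);

% Two further operators (assumed forms):
Ra = min(1, bsxfun(@plus, 1 - A', B));            % arithmetic: min(1, 1 - A(i) + B(j))
Rm = bsxfun(@max, bsxfun(@min, A', B), 1 - A');   % max-min: max(min(A(i), B(j)), 1 - A(i))
```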
Fuzzy inference is the process of determining the mapping from
input to output using fuzzy logic. After determining the mapping from
input to output, fuzzy identification or fuzzy decision making can be
performed. We introduce fuzzy inference in terms of both simple fuzzy
conditional statements and multiple fuzzy conditional statements.
(1)
Simple fuzzy conditional statements.
Suppose the existing precondition is: if x is A, then y is B. For the input
x is A', the output is y is B'. This is the simple fuzzy conditional
statement, where the conclusion B' is obtained by composing the fuzzy
set A' with the fuzzy implication relation, i.e.:
$$B' = A' \circ R \qquad (3.97)$$
where R is the fuzzy implication relation and ∘ is the composition operation.
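The composition in (3.97) is assumed here to be the max–min composition, the most common choice: B'(j) = max over i of min(A'(i), R(i, j)). Below is a minimal sketch that reuses the sets of Example 3.9 to build R. The input A' is hypothetical.

```matlab
A = [1, 0.8, 0.7, 0.4, 0.1];
B = [1, 0.7, 0.3, 0];
R = bsxfun(@min, A', B);            % Mamdani relation from Example 3.9

Aapo = [0.4, 1, 0.6, 0.2, 0];       % hypothetical input fuzzy set A'

% Max-min composition: B'(j) = max over i of min(A'(i), R(i,j)).
Bapo = max(bsxfun(@min, Aapo', R), [], 1);
```

With the minimum implication, the result is simply B clipped at the height max_i min(A'(i), A(i)), which is 0.8 for these data.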
(3.98)
(3.101)
(3.102)
Solution The Matlab program for Example 3.11 is as follows:
A = [1, 0.4, 0.2];
B = [0.1, 0.6, 1];
C = [0.3, 0.7, 1];
% Step 1: A x B, with RAB(i,j) = min(A(i), B(j))
m = length(A);
n = length(B);
RAB = zeros(m, n);
for i1 = 1:m
    for j1 = 1:n
        RAB(i1,j1) = min(A(i1), B(j1));
    end
end
% Step 2: flatten A x B row by row, then form R = (A x B) x C
RABLaShen = reshape(RAB', 1, size(RAB, 1) * size(RAB, 2));
m = length(RABLaShen);
n = length(C);
RABC = zeros(m, n);
for i1 = 1:m
    for j1 = 1:n
        RABC(i1,j1) = min(RABLaShen(i1), C(j1));
    end
end
% Step 3: inputs A' and B', and A' x B' flattened in the same order
Aapo = [0.3, 0.5, 0.7];
Bapo = [0.4, 0.5, 0.9];
m = length(Aapo);
n = length(Bapo);
RAapoBapo = zeros(m, n);
for i1 = 1:m
    for j1 = 1:n
        RAapoBapo(i1,j1) = min(Aapo(i1), Bapo(j1));
    end
end
m = size(RAapoBapo, 1);
n = size(RAapoBapo, 2);
RAapoBapoLaShen = reshape(RAapoBapo', 1, m*n);
% Step 4: max-min composition C' = (A' x B') o R
n = size(RAapoBapoLaShen, 1);
l = size(RABC, 2);
Capo = zeros(n, l);
for i1 = 1:n
    for j1 = 1:l
        Capo(i1, j1) = max(min([RAapoBapoLaShen(i1,:); RABC(:, j1)']));
    end
end
After running the above program, the variable Capo holds the
conclusion C'. We have C' = [0.3, 0.4, 0.4].
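The nested loops of the program above can be condensed with `bsxfun` and `reshape`. The sketch below repeats the same four steps for the data of Example 3.11, with variable names following the program above, and produces the same `Capo`.

```matlab
A = [1, 0.4, 0.2];   B = [0.1, 0.6, 1];   C = [0.3, 0.7, 1];
Aapo = [0.3, 0.5, 0.7];   Bapo = [0.4, 0.5, 0.9];

RAB  = bsxfun(@min, A', B);                         % A x B (minimum)
rAB  = reshape(RAB', [], 1);                        % flattened row by row (column vector)
RABC = bsxfun(@min, rAB, C);                        % R = (A x B) x C, size 9-by-3

sAB  = reshape(bsxfun(@min, Aapo', Bapo)', 1, []);  % A' x B', flattened the same way
Capo = max(bsxfun(@min, sAB', RABC), [], 1);        % C' = (A' x B') o R
```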
Next, we introduce the multiple fuzzy conditional statements using
the “also” conjunction. Suppose the existing preconditions are: if x is
A_1 and y is B_1, then z is C_1, also if x is A_2 and y is B_2, then z is
C_2, …, also if x is A_n and y is B_n, then z is C_n. For the input x is A'
and y is B', the output is z is C'. Compared with the fuzzy conditional
statement connected by “and”, the precondition here consists of several
such rules connected by “also”.
If the fuzzy implication relation for the i-th conditional rule, i.e., if x is
A_i and y is B_i, then z is C_i, is written as:
$$R_i = (A_i \times B_i) \times C_i \qquad (3.103)$$
then the total fuzzy implication relation for all n precondition rules
is:
$$R = \bigcup_{i=1}^{n} R_i \qquad (3.104)$$
Finally, the output z is obtained as:
$$C' = (A' \times B') \circ R \qquad (3.105)$$
From the above expressions, we can see that the multiple fuzzy
conditional statements connected with “also” are defined on the basis of
the fuzzy conditional statements connected with “and”. The final output
can be obtained by following the formula step by step.
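Because max–min composition distributes over union, composing with the union relation gives the same result as composing with each rule separately and taking the element-wise maximum of the rule conclusions. This is the approach sketched below.

For the min-based relation used here, max_k min(s_k, r_k, C(l)) equals min(C(l), max_k min(s_k, r_k)). Each rule's conclusion is therefore its output set clipped at the rule's matching degree. Rule 1 is the rule of Example 3.11. Rule 2 and the helper names `fire` and `ruleOut` are hypothetical.

```matlab
% Inputs A' and B' (from Example 3.11).
Aapo = [0.3, 0.5, 0.7];
Bapo = [0.4, 0.5, 0.9];

% Matching degree of the inputs with the premise "x is Ai and y is Bi":
% max over (i, j) of min(Ai(i), Bi(j), A'(i), B'(j)).
fire = @(Ai, Bi) max(max(min(bsxfun(@min, Ai', Bi), bsxfun(@min, Aapo', Bapo))));
% Conclusion of one rule: its output set Ci clipped at the matching degree.
ruleOut = @(Ai, Bi, Ci) min(Ci, fire(Ai, Bi));

% Rule 1 is the rule of Example 3.11; rule 2 is hypothetical.
C1apo = ruleOut([1, 0.4, 0.2], [0.1, 0.6, 1], [0.3, 0.7, 1]);
C2apo = ruleOut([0.2, 0.6, 1], [1, 0.5, 0.1], [1, 0.5, 0.2]);

% "also": union (element-wise max) of the rule conclusions.
Capo = max(C1apo, C2apo);
```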
The part in the dashed box in Fig. 3.6 is the fuzzy control system. The
fuzzy control system regulates the control object and solves real-life
problems by controlling changes in the object. Fuzzification refers to the
conversion of the input deterministic variable into a fuzzy variable. The
deterministic variable is also called the clear variable. Fuzzy inference
refers to the use of fuzzy implication relations and inference rules in
fuzzy logic to make decisions. Defuzzification refers to the conversion of
fuzzy variables obtained by fuzzy inference into definite variables for
control purposes. Knowledge base refers to the knowledge of the real-
life problem and the object to be controlled, which usually includes a
database and a fuzzy control rule base.
(1)
Fuzzification. The fuzzification operation is to map the input
observations into a fuzzy set over the theoretical domain.
First, the input observations are processed so that they are converted
into input variables suitable for the fuzzy controller. For example, if the
input observation is denoted as r and the output is denoted as y, we
generally need to calculate the error e = r − y and the rate of change of
the error.
Second, the input variables obtained from the processing are
scaled so that they are mapped into their respective theoretical domain
ranges.
Finally, the input variables transformed into the range of the
theoretical domain are fuzzified to obtain the corresponding fuzzy
sets [6].
The scale transformation of the input variables can be linear or
nonlinear, while the theoretical domain can be discrete or continuous.
If the theoretical domain is restricted to be discrete, the continuous
domain needs to be discretized, which is also known as quantization [7].
The quantization of the theoretical domain can be uniform or
non-uniform.
For example, assuming that the range of the continuous domain is
[−3, 3], Table 3.3 gives the method of uniform quantization:
Table 3.3 Uniform quantization of continuous theoretical domains
Range [−3, −2.5) [−2.5, −1.5) [−1.5, −0.5) [−0.5, 0.5) [0.5, 1.5) [1.5, 2.5) [2.5, 3]
Level −3 −2 −1 0 1 2 3
A non-uniform quantization of the same domain, with finer intervals near zero, is:
Range [−3, −1.4) [−1.4, −0.8) [−0.8, −0.4) [−0.4, 0.4) [0.4, 0.8) [0.8, 1.4) [1.4, 3]
Level −3 −2 −1 0 1 2 3
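The sketch below chains the scaling and uniform quantization steps. It assumes a linear scale factor k = 3 / e_max that maps a physical error range [−e_max, e_max] onto [−3, 3], and then applies the uniform intervals of Table 3.3. The range e_max and the sample errors are hypothetical.

```matlab
emax  = 5;                          % hypothetical physical range of the error: [-5, 5]
k     = 3 / emax;                   % assumed linear scale factor onto [-3, 3]
ePhys = [-5, -4.3, -0.6, 0.6, 2.4, 4.8, 5];
e     = k * ePhys;                  % scaled into the theoretical domain [-3, 3]

% Uniform quantization of Table 3.3: [-0.5, 0.5) -> 0, [0.5, 1.5) -> 1, ...,
% with the end intervals [-3, -2.5) -> -3 and [2.5, 3] -> 3.
quantize = @(v) min(max(floor(v + 0.5), -3), 3);
levels = quantize(e);
```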
Fuzzy set −3 −2 −1 0 1 2 3
NB 1.0 0.5 0.0 0.0 0.0 0.0 0.0
NS 0.0 0.5 1.0 0.5 0.0 0.0 0.0
ZE 0.0 0.0 0.5 1.0 0.5 0.0 0.0
PS 0.0 0.0 0.0 0.5 1.0 0.5 0.0
PB 0.0 0.0 0.0 0.0 0.0 0.5 1.0
In Table 3.5, the fuzzy sets NB, NS, ZE, PS, and PB denote Negative
Big, Negative Small, Zero, Positive Small, and Positive Big, respectively.
ZE can be expressed as:
(3.106)
On a continuous domain, the membership function is generally given in
functional form, that is, the membership is expressed as a function of
the element. The most common forms are the Gaussian, triangular and
trapezoidal functions. The expression of the Gaussian membership
function is:
(3.107)
(3.108)
It is easy to see that the single-point fuzzy set only formally converts
the clear variable into a fuzzy variable; in substance it is still a precise
quantity. For example, if the theoretical domain is {−3, −2, −1, 0, 1, 2, 3}
and the input variable equals −2, then its corresponding single-point
fuzzy set is A = (0, 1, 0, 0, 0, 0, 0); while for , its corresponding
single-point fuzzy set is .
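The sketch below writes the three common membership shapes as anonymous functions and performs single-point fuzzification on the domain {−3, ..., 3}. The parameterizations are assumptions, for example the Gaussian as exp(−(x − c)²/(2σ²)), and may use different symbols from (3.107) and (3.108).

```matlab
% Assumed parameterizations of the common membership functions.
gaussMF = @(x, sigma, c) exp(-(x - c).^2 ./ (2 * sigma^2));
triMF   = @(x, a, b, c) max(min((x - a) ./ (b - a), (c - x) ./ (c - b)), 0);   % a < b < c
trapMF  = @(x, a, b, c, d) max(min(min((x - a) ./ (b - a), 1), (d - x) ./ (d - c)), 0);  % a < b <= c < d

% Single-point fuzzification on the discrete domain {-3, ..., 3}.
U  = -3:3;
x0 = -2;                      % crisp input
A  = double(U == x0);         % membership 1 at x0 and 0 elsewhere
```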
(2)
Fuzzy inference. Fuzzy inference was introduced in the previous
section and is omitted here.
(3) Defuzzification. Fuzzy inference yields a fuzzy variable, whereas
an actual control problem requires a clear variable. Converting the
fuzzy variable into a clear variable is the task of defuzzification. The
common defuzzification methods include the average maximum
membership method, the weighted average method, the maximum
membership taking the minimum method, the maximum membership
taking the maximum method, and the median method.
The average maximum membership method is also known as the “mom”
method. If the membership function of the fuzzy set of the output
variable is single-peaked, that is, it has only one peak, the element at
which the membership function reaches its maximum is selected as the
clear value, i.e.:
$$\mu(z_0) = \max_{z} \mu(z) \qquad (3.109)$$
where z_0 denotes the determined value after defuzzification. If the
membership function of the fuzzy set of the output variable is not
single-peaked, i.e., there are multiple peaks, the average of the
elements corresponding to these peaks is selected as the clear value.
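The weighted average method is also known as the area center of gravity
method, sometimes abbreviated as the centroid method. This method
averages the elements of the domain, weighted by their membership
degrees in the fuzzy set, to obtain the clear value. For a discrete
membership function, the expression of the weighted average method is:
$$z_0 = \frac{\sum_{i=1}^{n} \mu(z_i)\, z_i}{\sum_{i=1}^{n} \mu(z_i)} \qquad (3.110)$$
For a continuous membership function, the expression of the weighted
average method is:
$$z_0 = \frac{\int z\, \mu(z)\, dz}{\int \mu(z)\, dz} \qquad (3.111)$$
where z_0 is the clear value and μ(z) is the membership function of the
output fuzzy set.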
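The defuzzification methods above can be sketched for a discrete output set as follows. The output sets are hypothetical. The weighted average line is the same formula used later in the fuzzy control program (`sum(zi.*U)/sum(zi)`). The two-peaked set shows how mom and the "taking the minimum/maximum" variants differ.

```matlab
z  = -6:6;                                               % discrete output domain
mu = [0, 0, 0, 0, 0, 0.3, 0.7, 1, 0.7, 0.3, 0, 0, 0];    % hypothetical output set

% Weighted average (centroid) method, discrete form.
zCentroid = sum(mu .* z) / sum(mu);

% Average maximum membership (mom) method.
zMom = mean(z(mu == max(mu)));

% A two-peaked set: mom averages the peak locations, while the
% "taking the minimum/maximum" variants keep the smallest/largest one.
z2    = 1:5;
mu2   = [0, 1, 0.5, 1, 0];
peaks = z2(mu2 == max(mu2));
zMom2 = mean(peaks);
zSom2 = min(peaks);
zLom2 = max(peaks);
```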
Example 3.13 Suppose the fuzzy set of the known output variable is
, try to use the weighted average
method to find the corresponding clear variable of .
(3.112)
−6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6
NB 1.0 0.8 0.7 0.4 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
NM 0.2 0.7 1.0 0.1 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
NS 0.1 0.1 0.3 0.7 1.0 0.7 0.2 0.0 0.0 0.0 0.0 0.0 0.0
NZ 0.0 0.0 0.0 0.0 0.1 0.6 1.0 0.0 0.0 0.0 0.0 0.0 0.0
PZ 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.6 0.1 0.0 0.0 0.0 0.0
PS 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.7 1.0 0.7 0.3 0.1 0.0
PM 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.7 1.0 0.7 0.3
PB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.4 0.7 0.8 1.0
−6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6
NB 1.0 0.7 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
NM 0.3 0.7 1.0 0.7 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
NS 0.0 0.0 0.3 0.7 1.0 0.7 0.3 0.0 0.0 0.0 0.0 0.0 0.0
ZE 0.0 0.0 0.0 0.0 0.3 0.7 1.0 0.7 0.3 0.0 0.0 0.0 0.0
PS 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.7 1.0 0.7 0.3 0.0 0.0
PM 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.7 1.0 0.7 0.3
PB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.7 1.0
In this fuzzy control system, the knowledge base is the fuzzy control
rules for the linguistic variables x, y and z, as shown in Table 3.8.
Table 3.8 Fuzzy control rules for linguistic variables x, y and z
x \ y NB NM NS ZE PS PM PB
NB NB NB NB NB NM ZE ZE
NM NB NB NB NB NM ZE ZE
NS NM NM NM NM ZE PS PS
NZ NM NM NS ZE PS PM PM
PZ NM NM NS ZE PS PM PM
PS NS NS ZE PM PM PM PM
PM ZE ZE PM PB PB PB PB
PB ZE ZE PM PB PB PB PB
−6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6
−6 −5.4 −5.2 −5.4 −5.2 −5.4 −5.2 −4.7 −4.3 −2.7 −2.0 −1.3 0.0 0.0
−5 −5.0 −5.0 −5.0 −5.0 −5.0 −5.0 −3.9 −3.7 −2.4 −1.8 −1.1 0.2 0.2
−4 −4.7 −4.5 −4.7 −4.5 −4.7 −4.5 −3.1 −2.9 −1.9 −1.4 −0.7 0.6 0.6
−3 −4.3 −4.3 −4.3 −4.3 −4.3 −4.3 −2.9 −2.3 −1.4 −0.9 −0.3 1.0 1.0
−2 −4.0 −4.0 −3.8 −3.8 −3.5 −3.4 −2.4 −1.8 −0.4 0.0 0.2 1.6 1.6
−1 −4.0 −4.0 −3.4 −3.1 −2.5 −2.1 −1.5 −1.1 0.3 1.9 2.3 2.9 2.9
0 −3.6 −3.6 −2.9 −2.6 −1.0 −0.5 0.0 0.5 1.0 2.6 2.9 3.6 3.6
1 −2.9 −2.9 −2.3 −1.9 −0.3 1.1 1.5 2.1 2.5 3.1 3.4 4.0 4.0
2 −1.8 −1.8 −0.6 −0.3 0.4 1.8 2.4 3.4 3.5 3.8 3.8 4.0 4.0
3 −1.0 −1.0 0.3 0.9 1.4 2.3 2.9 4.3 4.3 4.3 4.3 4.3 4.3
4 −0.6 −0.6 0.7 1.4 1.9 2.9 3.1 4.5 4.7 4.5 4.7 4.5 4.7
5 −0.2 −0.2 1.1 1.8 2.4 3.7 3.9 5.0 5.0 5.0 5.0 5.0 5.0
6 0.0 0.0 1.3 2.0 2.7 4.3 4.7 5.2 5.4 5.2 5.4 5.2 5.4
In Table 3.9, the output values are rounded to one decimal place to
save space.
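The above fuzzy control system example can also be implemented in
Matlab with the following program:
A = xlsread('fuzzycon.xlsx', 'x');    % fuzzy sets of x, one set per row
B = xlsread('fuzzycon.xlsx', 'yz');   % fuzzy sets of y, one set per row
C = B;                                % z uses the same fuzzy sets as y
R = xlsread('fuzzycon.xlsx', 'r');    % rule table: R(i,j) = index of the z set
U = -6:1:6;                           % common discrete domain
n = length(U);
X = eye(n);                           % row i: single-point fuzzy set for U(i)
Y = X;
Z = zeros(n);
for i = 1:n
    for j = 1:n
        x0 = X(i,:);
        y0 = Y(j,:);
        zi = defuzzyAlsoAnd(A, B, C, R, x0, y0);   % aggregated fuzzy output
        zi = sum(zi.*U)/sum(zi);                   % weighted average defuzzification
        Z(i,j) = round(zi*100)/100;                % keep two decimal places
    end
end
xlswrite('fuzzycon.xlsx', Z, 'result');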
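In the above program, the fuzzy set matrices and the fuzzy rule matrix
are stored in a file with the suffix “xlsx”, and the calculated output
results are also written to this file. The function defuzzyAlsoAnd.m
combines the conclusions of all rules connected by “also” by taking
their maximum; the weighted average line in the main program then
performs the defuzzification. The function is as follows: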
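function zi = defuzzyAlsoAnd(A,B,C,R,x0,y0)
% Combine all rules "if x is A(i,:) and y is B(j,:) then z is C(R(i,j),:)"
% with "also" (max) for the inputs given as single-point sets x0 and y0.
m = size(A,1);
n = size(B,1);
Rtmp = zeros(m*n, size(C,2));
for i = 1:m
    for j = 1:n
        k = R(i,j);   % index of the output fuzzy set of rule (i,j)
        Rtmp((i-1)*n+j,:) = fuzzyInference(A(i,:),B(j,:),C(k,:),x0,y0);
    end
end
zi = max(Rtmp);   % "also": union of the rule conclusions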
The fuzzyInference.m function is used for fuzzy inference:
function Capo=fuzzyInference(Ai,Bi,Ci,Aapo,Bapo)
% Conclusion C' of one rule "if x is Ai and y is Bi then z is Ci"
% for the inputs A' (Aapo) and B' (Bapo); same steps as Example 3.11.
m = length(Ai);
n = length(Bi);
RAB = zeros(m, n);
for i1 = 1:m
    for j1 = 1:n
        RAB(i1,j1) = min(Ai(i1), Bi(j1));
    end
end
RABLaShen = reshape(RAB', 1, size(RAB, 1) * size(RAB, 2));
m = length(RABLaShen);
n = length(Ci);
RABC = zeros(m, n);
for i1 = 1:m
    for j1 = 1:n
        RABC(i1,j1) = min(RABLaShen(i1), Ci(j1));
    end
end
m = length(Aapo);
n = length(Bapo);
RAapoBapo = zeros(m, n);
for i1 = 1:m
    for j1 = 1:n
        RAapoBapo(i1,j1) = min(Aapo(i1), Bapo(j1));
    end
end
m = size(RAapoBapo, 1);
n = size(RAapoBapo, 2);
RAapoBapoLaShen = reshape(RAapoBapo', 1, m*n);
n = size(RAapoBapoLaShen, 1);
l = size(RABC, 2);
Capo = zeros(n, l);
for i1 = 1:n
    for j1 = 1:l
        % max-min composition: C'(j1) = max_k min((A' x B')_k, R(k, j1))
        Capo(i1, j1) = max(min([RAapoBapoLaShen(i1,:); RABC(:, j1)']));
    end
end
It can be seen that the fuzzy control system is a comprehensive use
of fuzzy logic theory.
(3.115)
(3.116)
(3.118)
(3.119)
SG MG LG
SD VS M L
MD S M L
LD M L VL
For the input variables x = 60, y = 70, try to find the output quantity t.
Solution We solve the above problem with the help of Matlab's Fuzzy
Logic Designer, which can be opened by clicking Fuzzy Logic Designer
on the Apps tab in Matlab.
The interface after opening the fuzzy logic designer is shown in Fig. 3.7.
Under this interface, there is one input variable (located in the top left of
the figure) and one output variable (located in the top right of the
figure) by default. Users can also modify the name of the current
variable, which is changed to sludge in Fig. 3.7. Double-clicking the icon
of sludge with the mouse will bring up the dialog box shown in Fig. 3.8,
in which you can modify the theoretical domain and membership
function of sludge. We name the membership functions of sludge as SD,
MD and LD, respectively.
From the bottom left of Fig. 3.9, we can see that the fuzzy
conditional statement connected with “and” uses “min”, i.e., the
intersection operation, and the fuzzy conditional statement connected
with “also” uses “max”. These are the same as in the fuzzy control
system example in the previous section. When solving the fuzzy control
problem of the washing machine, these default parameters are kept,
although the user can choose other settings as needed. The default
defuzzification method is “centroid”, i.e., the weighted average method;
we change it to “mom”, i.e., the average maximum membership method.
Then we rename the output variable as washing time. The membership
function dialog box can be opened by double-clicking the icon of the
output variable. We set the domain of washing time to [0, 60], add 5
membership functions according to the requirements of the example,
and then close the membership function dialog box to get the result
shown in Fig. 3.10.
Next, we add the rules of fuzzy control. Under the fuzzy controller
dialog box, double click the mamdani icon in the middle to open the rule
editing dialog box. We can add the 9 fuzzy control rules in the example.
This completes the design of the washing machine fuzzy control system.
Click the File menu, select Export, and save the file as
“washingMachineConrol.fis”.
To calculate the output variable t for the input variables x = 60,
y = 70, click Rules under the View menu of the Fuzzy Logic Designer to
open the Rules view, as shown in Fig. 3.11. In this dialog box, enter
[60; 70] at the bottom left, and the washing time, about 24.9, is shown
at the top right.
The third figure plotted by the above program is the mapping surface
of the input and output variables, as shown in Fig. 3.14. It can be
seen that the washing time is a function of sludge and grease and
becomes longer as sludge and grease increase. Because fuzzy logic is
used for control, the washing time changes gradually rather than
abruptly, which makes the washing machine control system more stable.
Fig. 3.14 Function mapping curve of input and output
Example 3.15 shows how to use both the Fuzzy Logic Designer and
programs to design a controller for a washing machine. Users can
choose either way to solve their control problems.
Exercises
(1) Suppose the domain U is the age of a person with the range (0,
100]. There are 3 classes of patterns young , middle-aged
and old on this domain. For a certain variable , please
use the principle of maximum membership to determine the class
to which belongs, where:
(2)
Suppose the domain is the quality of
tea leaves, where the elements are stripe, color, clarity, soup color,
aroma and taste. There are five categories of patterns , , ,
and on this domain. For a certain tea B to be identified,
please use the nearest principle to determine the category to
which B belongs, where: ,
, ,
, ,
.
(3) Suppose we want to use the fuzzy logic method to study the effect
of queuing time on passenger satisfaction at railroad stations. We
can define “time” as the input variable and “satisfaction” as the
output variable. The domain of the input time is set to U = [5, 60],
and three fuzzy sets are defined on it: short time (ST), medium time
(MT) and long time (LT). The membership functions of ST, MT and LT
are:
The theoretical domain of the output variable satisfaction is set to
U = [0, 10]. The output variable satisfaction has three fuzzy sets:
higher satisfaction (HS), general satisfaction (GS) and poor
satisfaction (LS), whose membership functions are:
References
1. Zadeh LA (2012) Fuzzy logic. In: Meyers R (ed) Computational complexity. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1800-9_73
2. Deng F, Chen W (2020) Intelligent computing and information processing. Beijing Institute of Technology Press, Beijing
4. Chiu SL (1994) Fuzzy model identification based on cluster estimation. J Intell Fuzzy Syst 2:267–278
7. Zadeh LA (1997) The roles of fuzzy logic and soft computing in the conception, design and deployment of intelligent systems. In: Nwana HS, Azarmi N (eds) Software agents and soft computing towards enhancing machine intelligence. Lecture Notes in Computer Science, vol 1198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62560-7_45
8. Yager RR, Zadeh LA (2012) An introduction to fuzzy logic applications in intelligent systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4615-3640-6
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
X. Zhang et al., Intelligent Information Processing with Matlab
https://doi.org/10.1007/978-981-99-6449-9_4
Xin Zhang
Wei Wang
Abstract
Fuzzy neural network combines fuzzy computing and artificial neural
network. Fuzzy neural network inherits the characteristics of both
fuzzy logic and neural network such as logical reasoning ability,
adaptive ability and learning ability. This chapter first gives an
overview of fuzzy neural network including Takagi-Sugeno fuzzy
system and expert system. Adaptive network-based fuzzy inference
system is introduced to illustrate the usage of fuzzy neural network.
Then fuzzy neural network is used to solve time series prediction
problem. Interval type-2 fuzzy logic is presented and its performance is
studied through time series prediction problem. Fuzzy neural network
is then applied to solve clustering problem and suburban commuting
prediction problem. Finally, the state-of-the-art research progress of
fuzzy computing is presented.
(4.2)
where represents the total output of the system. The expression for
the weighted average method is:
(4.3)
The above T-S fuzzy system consists of one input layer, three hidden
layers and one output layer. Since one hidden layer is skipped from the
weight calculation, the T-S fuzzy system can also be said to have two
hidden layers. There are four neurons from the input layer to the first
hidden layer. The first neuron computes , the
second neuron computes , the third neuron
computes , and the fourth neuron computes
. The second hidden layer has three neurons. The
first neuron computes , the second neuron computes , and
the third neuron computes . Then is computed,
and finally the output layer computes . It can be seen that the weights
from the second hidden layer to the output layer in the T-S type fuzzy
neural network are all 1, so it is a simplified neural network model.
By properly constructing the topology and activation functions of the
neural network, we can obtain a neural network model equivalent to
the T-S fuzzy system, and then use neural network theory to analyze
the T-S fuzzy controller. For example, for a certain control problem,
data are first collected experimentally to form a training set. The T-S
fuzzy controller is then trained on this dataset with a neural network
learning method, and the trained controller can be applied to the
associated control problem.
Next, we introduce the neural network model of the fuzzy expert
system. The rule form of the fuzzy expert system is as follows:
(4.4)
where denotes the i-th rule, denotes the error, denotes the
change rate of the error, is the given fuzzy set, and is the
triangular membership function. It can be seen that the output of the
fuzzy expert system is a specific control action. Since the control
action is a clear value, the system needs no defuzzification module.
Assuming that the fuzzy expert system contains n rules and the weight
of the i-th rule is noted as , the output can be calculated using the
weighted average method.
The fuzzy expert system corresponding to the FNN is shown in
Fig. 4.2. From the figure, it can be seen that the fuzzy expert system is
represented as a neural network structure. In the system, the weights
are calculated from the fuzzy set and the operations used are
. The fuzzy expert system consists of an input layer, two
hidden layers and an output layer. There are two neurons from the
input layer to the first hidden layer. The first neuron is to calculate the
weights and the second neuron is to calculate
. The second hidden layer has two neurons. The first
neuron is to compute . The second neuron is to compute
, and the weight of the second neuron is 1. Finally, the output
layer computes . It can be seen that the weights between the second
hidden layer to the output layer in the fuzzy expert neural network are
all 1. It is also a simplified neural network model. The fuzzy expert
system has a more concise neural network structure compared to the
T-S type fuzzy system.
(4.6)
(4.7)
(4.8)
Try to build a T-S type fuzzy system satisfying the above rules, and
compute the value of z for given x and y.
Solution The example can be solved in two ways. The first way is to
use the Fuzzy Logic Designer, with which a T-S type fuzzy system can
be designed interactively. The usage of the Fuzzy Logic Designer was
introduced in the last chapter, so it is not described here.
The second way is using the following programs to solve the
example:
fis = sugfis('Name', 'SugenoExample');
% Input X on [0, 10] with three membership functions
var1 = fisvar([0, 10], 'Name', 'X');
var1 = addMF(var1, 'trimf',  [0, 0, 4],    'Name', 'x1');
var1 = addMF(var1, 'trapmf', [2, 4, 6, 8], 'Name', 'x2');
var1 = addMF(var1, 'trimf',  [6, 10, 10],  'Name', 'x3');
% Input Y on [0, 10] with two membership functions
var2 = fisvar([0, 10], 'Name', 'Y');
var2 = addMF(var2, 'trimf', [0, 0, 7],   'Name', 'y1');
var2 = addMF(var2, 'trimf', [3, 10, 10], 'Name', 'y2');   % trimf requires three parameters
fis.Inputs = [var1, var2];
% Output Z: linear functions z = a*x + b*y + c with parameters [a, b, c]
var3 = fisvar([0, 120], 'Name', 'Z');
var3 = addMF(var3, 'linear', [-1, 2, 0], 'Name', 'z1');
var3 = addMF(var3, 'linear', [8, -4, 1], 'Name', 'z2');
var3 = addMF(var3, 'linear', [1, 3, 9],  'Name', 'z3');
var3 = addMF(var3, 'linear', [5, 0, 1],  'Name', 'z4');
fis.Outputs = var3;
% Each row: [X mf, Y mf (0 = not used), Z mf, weight, connection (1 = AND)]
rulelist = [1, 1, 1, 1, 1;
            2, 1, 2, 1, 1;
            2, 2, 3, 1, 1;
            3, 0, 4, 1, 1];
fuzzyRules = fisrule(rulelist, 2);
fuzzyRules = update(fuzzyRules, fis);
fis.Rules = fuzzyRules;
x = 6;
y = 7;
z = evalfis(fis, [x, y]);
showrule(fis, 1:4, 'verbose');   % display all four rules
figure(1);
plotfis(fis);
figure(2);
gensurf(fis)
The running results of the above program are shown in Figs. 4.3, 4.4
and 4.5.
Fig. 4.3 T-S type fuzzy system for Example 4.1
As can be seen in Fig. 4.3, the names of the two input variables are
defined as X and Y, where X has three fuzzy membership functions and
Y has two fuzzy membership functions. The name of this T-S type fuzzy
system is “SugenoExample” with four fuzzy rules, while the name of the
output variable is defined as Z. The output variable is composed of four
functions, each of which is a linear function of the input variables.
The fuzzy rules of the Example 4.1 can be seen in Fig. 4.4, from which
the process of fuzzy reasoning by the T-S model can be observed. When
the input , , when the output .
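The T-S computation behind `evalfis` can also be checked by hand. The sketch below needs no toolbox and evaluates the four rules of Example 4.1 at x = 6, y = 7. It assumes:

- the product AND and weighted-average output, which are the defaults of `sugfis`;
- y2 = trimf [3 10 10].

At this input only rule 3 fires, so the output equals that rule's linear function: z = 6 + 3·7 + 9 = 36.

```matlab
x = 6;
y = 7;

% Membership functions of Example 4.1 written out by hand (domains [0, 10]).
muX1 = @(v) max(0, min(1, (4 - v) / 4));                 % trimf [0 0 4]
muX2 = @(v) max(0, min([(v - 2) / 2, 1, (8 - v) / 2]));  % trapmf [2 4 6 8]
muX3 = @(v) max(0, min(1, (v - 6) / 4));                 % trimf [6 10 10]
muY1 = @(v) max(0, min(1, (7 - v) / 7));                 % trimf [0 0 7]
muY2 = @(v) max(0, min(1, (v - 3) / 7));                 % assumed trimf [3 10 10]

% Firing strengths with the product AND; rule 4 does not use Y.
w = [muX1(x) * muY1(y), muX2(x) * muY1(y), muX2(x) * muY2(y), muX3(x)];

% Linear rule outputs z = a*x + b*y + c with [a b c] taken from z1..z4.
zr = [-1*x + 2*y + 0, 8*x - 4*y + 1, 1*x + 3*y + 9, 5*x + 0*y + 1];

% Weighted average of the rule outputs.
z = sum(w .* zr) / sum(w);
```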
Figure 4.5 gives the surface plot of the relationship between two
input and output variables. It can be seen from the figure that the
mapping curve is not very smooth. There are large fluctuations and
drastic changes in some places, which reflects that the system is not
perfect. In this case, it is generally necessary to add more fuzzy rules to
accumulate more empirical knowledge, so that the unsmooth areas can
be eliminated and the whole mapping relationship tends to be
continuous and smooth. However, with the increase of fuzzy rules, it
makes the maintenance of fuzzy systems more and more complicated
and reduces the interpretability of the system. In practical applications,
we need to consider the modeling of fuzzy systems from several aspects
to achieve a satisfactory balance.
(4.10)
where , denotes the i-th rule, x and y denote two
linguistic variables, and are the given fuzzy sets, and , and
are the parameters of the fuzzy system.
This ANFIS fuzzy system is shown in Fig. 4.6. It has a five-layer neural
network structure (not counting the input layer), consisting of four
hidden layers and one output layer. In this figure, the first and fourth
hidden layers are drawn as square blocks because these two layers
contain adjustable parameters, while the second hidden layer, the third
hidden layer and the output layer are drawn as circular blocks because
they contain no adjustable parameters. The adjustable parameters of
the square-block nodes can be trained with a neural network learning
algorithm, which determines the final fuzzy system.
Fig. 4.6 ANFIS fuzzy neural network
In the above ANFIS fuzzy system, the first hidden layer is to fuzzify
the input variables. The first two neurons in this layer are to calculate
the fuzzy set of variable . The output value of the first layer is
obtained after calculating:
(4.11)
The last two neurons in the first layer are to compute the fuzzy set
of variable , which are computed to obtain the output value of the
layer:
(4.12)
where indicates the result obtained by a certain neuron calculation,
the number in the right superscript indicates that it is the first hidden
layer, and the right subscript indicates the i-th fuzzy rule. It should be
noted that there are several forms of membership functions to choose.
Different values will be obtained by using different membership
functions. The parameters in the chosen membership function are
generally called conditional parameters. For example, if a Gaussian-type
membership function is used:
(4.13)
(4.17)
It can be seen that the output layer yields result equivalent to (4.9)
and (4.10), which indicates that the ANFIS fuzzy system can be
represented as a fuzzy neural network equivalent to it.
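The layer-by-layer computation can be traced with a small forward pass. The sketch below uses a first-order Sugeno ANFIS with two inputs and two rules. It assumes:

- Gaussian memberships exp(−((v − c)/σ)²), which may differ from (4.13);
- the product for the firing strengths in the second hidden layer;
- rule outputs p_i x + q_i y + r_i.

All parameter values are hypothetical.

```matlab
% Assumed Gaussian membership function with center c and width s.
gauss = @(v, c, s) exp(-((v - c) ./ s).^2);

cA = [-1, 1];  sA = [1, 1];             % conditional parameters of A1, A2
cB = [-1, 1];  sB = [1, 1];             % conditional parameters of B1, B2
p = [1, 2];  q = [3, 4];  r = [5, 7];   % conclusion parameters of rules 1, 2

x = 0;  y = 0;
O1A = gauss(x, cA, sA);          % 1st hidden layer: membership degrees of x
O1B = gauss(y, cB, sB);          % 1st hidden layer: membership degrees of y
w   = O1A .* O1B;                % 2nd hidden layer: firing strengths
wn  = w / sum(w);                % 3rd hidden layer: normalized firing strengths
f   = p * x + q * y + r;         % rule outputs f_i = p_i*x + q_i*y + r_i
O4  = wn .* f;                   % 4th hidden layer
z   = sum(O4);                   % output layer
```

Only `cA`, `sA`, `cB`, `sB` (first hidden layer) and `p`, `q`, `r` (fourth hidden layer) are trainable. This matches the square blocks in Fig. 4.6.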
From the above introduction, it can be seen that the ANFIS fuzzy
system includes conditional and conclusion parameters, which must be
determined before the system can be used. We can use the BP neural
network learning algorithm to learn these parameters, or combine the
BP learning algorithm with the least squares estimation method.
Researchers have found that this hybrid of BP learning and least
squares estimation is more effective for learning the parameters. The
learning algorithms for the conditional and conclusion parameters are
not described in detail here; interested readers can refer to the related
materials.
We compare the Mamdani fuzzy system and the ANFIS fuzzy system in
terms of interpretability and accuracy. In general, the fewer the
parameters and rules of a fuzzy system, the more interpretable it is;
conversely, the more parameters and rules it has, the higher its
accuracy tends to be. The Mamdani fuzzy system describes the problem
in fuzzy linguistic terms, so it is highly interpretable, while the output
of the ANFIS fuzzy system is a clear value computed from trainable
functions, so it tends to be highly accurate. Therefore, constructing a
fuzzy system requires considering both interpretability and accuracy,
which constrain each other, and a suitable balance between the two
should be sought.
From Fig. 4.7, we can see that the points predicted by ANFIS differ
greatly from the points in the training set. We can adjust the
parameters of ANFIS to improve its performance. For example, by
default the initial fuzzy system of ANFIS has 2 membership functions
per input; we set the number of membership functions to 4, which
increases the number of fuzzy rules and parameters in the fuzzy
system. We then set the number of training epochs to 50. The required
program is as follows:
load('fuzex1trnData.dat');     % input in column 1, output in column 2
opt = anfisOptions('InitialFIS',4,'EpochNumber',50);   % 4 MFs, 50 epochs
opt.DisplayANFISInformation = 0;   % suppress command-window output
opt.DisplayErrorValues = 0;
opt.DisplayStepSize = 0;
[fis,trainError] = anfis(fuzex1trnData,opt);
fisRMSE = min(trainError);         % smallest training RMSE
x = fuzex1trnData(:,1);
anfisOutput = evalfis(fis,x);
% Plot training data (red asterisks) and ANFIS output (blue circles)
figure1 = figure(1);
axes1 = axes('Parent',figure1);
hold(axes1,'on');
plot1 = plot(x,fuzex1trnData(:,2),'MarkerSize',8,'LineWidth',2,'LineStyle','none');
set(plot1,'DisplayName','Training Data','Marker','*','Color',[1 0 0]);
plot2 = plot(x,anfisOutput,'MarkerSize',8,'LineWidth',2,'LineStyle','none');
set(plot2,'DisplayName','ANFIS Output','Marker','o','Color',[0 0 1]);
xlabel('x');
ylabel('z');
box(axes1,'on');
set(axes1,'FontSize',14);
legend1 = legend(axes1,'show');
set(legend1,'Position',[0.15 0.79 0.25 0.10]);
hold(axes1,'off');
After the above program is run, the variable “fisRMSE” stores the
smallest root mean square error (RMSE) of ANFIS on the training set,
which is 0.0823. This is a considerable reduction in error compared with
ANFIS using the default parameters. The predicted results are shown in
Fig. 4.8, in which the samples in the training set are indicated by
asterisks and the points predicted by ANFIS are indicated by circles. It
can be seen from the figure that the difference between the training
samples and the output of ANFIS has narrowed.
Fig. 4.8 Results of ANFIS prediction after adjusting parameters
Based on the above program, we can also add a validation set and then
analyze the RMSE on the training and validation sets with the following
program:
load('fuzex1trnData.dat');
load('fuzex1chkData.dat');
opt = anfisOptions('InitialFIS',4,'EpochNumber',50);
opt.DisplayANFISInformation = 0;
opt.DisplayErrorValues = 0;
opt.DisplayStepSize = 0;
opt.ValidationData = fuzex1chkData;
% chkFIS is the FIS with the smallest validation error over all epochs
[fis,trainError,stepSize,chkFIS,chkError] = anfis(fuzex1trnData,opt);
fisRMSE = min(trainError);
x = fuzex1trnData(:,1);
anfisOutput = evalfis(fis,x);
figure1 = figure(1);
axes1 = axes('Parent',figure1);
hold(axes1,'on');
plot1 = plot(x,fuzex1trnData(:,2),'MarkerSize',8,'LineWidth',2,'LineStyle','none');
set(plot1,'DisplayName','Training Data','Marker','*','Color',[1 0 0]);
plot2 = plot(x,anfisOutput,'MarkerSize',8,'LineWidth',2,'LineStyle','none');
set(plot2,'DisplayName','ANFIS Output','Marker','o','Color',[0 0 1]);
xlabel('x');
ylabel('z');
box(axes1,'on');
set(axes1,'FontSize',14);
legend1 = legend(axes1,'show');
set(legend1,'Position',[0.15 0.79 0.25 0.10]);
hold(axes1,'off');
% Plot training and validation RMSE for every epoch
figure2 = figure(2);
axes1 = axes('Parent',figure2);
hold(axes1,'on');
epoch = 1:opt.EpochNumber;
[minval,minidx] = min(chkError);   % epoch with the smallest validation RMSE
plot1 = plot(epoch,trainError,'LineWidth',2,'LineStyle','none');
set(plot1,'DisplayName','Train','Marker','o','Color',[0 0 1]);
plot2 = plot(epoch,chkError,'LineWidth',2,'LineStyle','none');
set(plot2,'DisplayName','Validation','MarkerSize',8,...
    'Marker','*','Color',[1 0 0]);
plot(minidx,minval,'DisplayName','Best','MarkerSize',25,...
    'Marker','.','LineWidth',3,'LineStyle','none',...
    'Color',[0 0 0]);
ylabel('RMSE');
xlabel('epoch');
box(axes1,'on');
hold(axes1,'off');
set(axes1,'FontSize',14,'XGrid','on','YGrid','on');
legend1 = legend(axes1,'show');
set(legend1,'Position',[0.7 0.55 0.2 0.15]);
After the above program is run, the RMSE of ANFIS on the training and
validation sets is shown in Fig. 4.9. In Fig. 4.9, the results on the
training set are represented by circles, while the results on the
validation set are represented by asterisks. The point with the smallest
RMSE on the validation set is represented by a large solid dot.
Fig. 4.9 Root mean square error of ANFIS on the training and validation sets
From Fig. 4.9, it can be seen that the RMSE of ANFIS on the training set
decreases rapidly as the number of training epochs increases. After 30
epochs, the decrease slows down, which indicates that training tends to
converge, although the RMSE still fluctuates slightly. The RMSE of
ANFIS on the validation set first decreases and reaches its minimum,
i.e., the “Best” point in the figure, at the 17th epoch. In the subsequent
epochs, the RMSE on the validation set gradually increases. Therefore,
the model with the smallest validation error, i.e., the ANFIS model at
the 17th epoch, is returned in the variable “chkFIS”.
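Choosing the epoch with the smallest validation error, which is what chkFIS represents, amounts to an arg-min over the per-epoch validation RMSE. A minimal Python sketch follows; the error values are made up for illustration, and `rmse` mirrors the MATLAB expression `sqrt(mean(del.^2))` used later in this chapter.

```python
import math

def rmse(pred, target):
    """Root mean square error, i.e. sqrt(mean((pred - target).^2)) in MATLAB."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))

def best_epoch(validation_errors):
    """Return (1-based epoch, error) with the smallest validation error.

    As with MATLAB's [minval, minidx] = min(chkError), ties go to the
    earliest epoch.
    """
    idx = min(range(len(validation_errors)), key=lambda i: validation_errors[i])
    return idx + 1, validation_errors[idx]

# Hypothetical per-epoch validation RMSE: falls, bottoms out, then rises.
chk_error = [0.30, 0.21, 0.15, 0.12, 0.13, 0.14, 0.16]
print(best_epoch(chk_error))   # (4, 0.12)
```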
(4.18)
(4.20)
(4.21)
Under the above initial conditions, we use the fourth-order Runge-Kutta
method to calculate the numerical solution of this problem, which
yields a dataset of 1200 points. Try to make predictions based on this
dataset and analyze the results.
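Since the defining equations are not reproduced here, the following Python sketch assumes the commonly used Mackey-Glass form dx/dt = 0.2 x(t - τ)/(1 + x^10(t - τ)) - 0.1 x(t) with τ = 17, x(0) = 1.2 and x(t) = 0 for t < 0, and integrates it with a fixed-step fourth-order Runge-Kutta scheme (step h = 0.1, also an assumption). Holding the delayed term constant within each step is a simplification, so the values are not guaranteed to match the book's dataset point for point.

```python
def mackey_glass(n_steps=12000, h=0.1, tau=17.0, x0=1.2,
                 beta=0.2, gamma=0.1, power=10):
    """Approximate a Mackey-Glass series with a fixed-step RK4 scheme.

    Assumed model:
        dx/dt = beta * x(t - tau) / (1 + x(t - tau)**power) - gamma * x(t),
    with x(0) = x0 and x(t) = 0 for t < 0. Simplification: the delayed
    term is read from the stored history and held constant within a step.
    """
    delay = int(round(tau / h))          # delay expressed in steps
    x = [x0]
    for k in range(n_steps):
        xd = x[k - delay] if k >= delay else 0.0    # x(t_k - tau)
        forcing = beta * xd / (1.0 + xd ** power)

        def rate(v):
            return forcing - gamma * v

        k1 = rate(x[k])
        k2 = rate(x[k] + 0.5 * h * k1)
        k3 = rate(x[k] + 0.5 * h * k2)
        k4 = rate(x[k] + h * k3)
        x.append(x[k] + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0)
    return x

series = mackey_glass()
samples = series[::10]       # one value per unit of time, t = 0, 1, 2, ...
print(len(samples), min(samples), max(samples))
```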
Solution The time series prediction problem has one independent
variable (time) and one dependent variable. Usually we need to
construct a dataset and then make predictions based on it. Assume that
data are available up to the current moment and that we need to
predict a future moment. Usually we start from the C-th point in the
existing data and take one point every D steps, i.e.:
(4.22)
If we take four points spaced six time steps apart, a sample with 4
components can be obtained:
(4.23)
and the moment to predict is six steps after the last component.
Starting from t = 118 and ending at t = 1117, we can construct 1000
such samples. We use the first 500 of the 1000 samples as the training
dataset and the last 500 as the validation dataset.
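The sample construction can be sketched in Python as follows, mirroring the MATLAB loop `for t = 118:1117` used with this dataset (lags 18, 12, 6 and 0, prediction horizon 6). A stand-in series x(t) = t replaces the Mackey-Glass data so that each sample can be checked by eye; `build_samples` is a hypothetical helper.

```python
def build_samples(x, start=118, stop=1117, lags=(18, 12, 6, 0), horizon=6):
    """Build inputs [x(t-18), x(t-12), x(t-6), x(t)] and targets x(t+6).

    Mirrors the MATLAB loop "for t = 118:1117": t is a 1-based index, so
    MATLAB's x(t) corresponds to x[t - 1] in Python.
    """
    inputs, targets = [], []
    for t in range(start, stop + 1):
        inputs.append([x[t - 1 - lag] for lag in lags])
        targets.append(x[t - 1 + horizon])
    return inputs, targets

# Stand-in series with x(t) = t, so each sample is easy to check by eye.
series = list(range(1, 1201))
inputs, targets = build_samples(series)
trn_x, trn_y = inputs[:500], targets[:500]      # first 500 samples
vld_x, vld_y = inputs[500:], targets[500:]      # last 500 samples
print(len(inputs), inputs[0], targets[0])       # 1000 [100, 106, 112, 118] 124
```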
Fig. 4.14 Representations of tall building, left: binary logic, middle: type-1 fuzzy set, right:
interval type-2 fuzzy set
Zadeh proposed the type-2 fuzzy set in 1975. His motivation was that
different people understand the same linguistic concept differently,
which is known as inter-individual uncertainty. Take the concept of “tall
building” as an example. Suppose a type-1 fuzzy set describes this
concept and the membership degree of an 18-floor building in “tall
building” is 0.8. But must this membership degree be 0.8? Someone may
think it should be 0.7. A type-2 fuzzy set expresses the different views
of different individuals. Because general type-2 fuzzy sets are complex
to represent and compute, researchers nowadays usually use interval
type-2 fuzzy sets. In an interval type-2 fuzzy set, the membership degree
is no longer a single value but an interval. For example, the membership
interval of an 18-floor building in “tall building” is [0.7, 0.8], as shown
in the right graph of Fig. 4.14. If everyone agrees that the membership
degree of the 18-floor building in “tall building” is 0.8, the interval
shrinks to a single value, i.e., the interval type-2 fuzzy set becomes a
type-1 fuzzy set. Thus, the interval type-2 fuzzy set is a generalization
of the type-1 fuzzy set.
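To make the membership interval concrete, here is a small Python sketch of an interval type-2 set whose upper MF is a Gaussian and whose lower MF is the same Gaussian scaled by a factor `lower_scale`. This construction is an illustrative assumption, only loosely analogous to the LowerScale property used in the MATLAB program later in this section; when `lower_scale = 1` the interval collapses to a single value, i.e., a type-1 membership degree.

```python
import math

def gaussian(x, center, sigma):
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def it2_membership(x, center, sigma, lower_scale):
    """Membership interval [lower, upper] of an interval type-2 fuzzy set.

    Illustrative assumption: the upper MF is a Gaussian and the lower MF is
    the same Gaussian scaled by lower_scale, with 0 < lower_scale <= 1.
    """
    upper = gaussian(x, center, sigma)
    lower = lower_scale * upper
    return lower, upper

print(it2_membership(1.0, 0.0, 1.0, 0.8))  # an interval of width 0.2 * upper
print(it2_membership(1.0, 0.0, 1.0, 1.0))  # lower == upper: a type-1 value
```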
Fuzzy logic based on type-1 fuzzy sets is called type-1 fuzzy logic.
Similarly, fuzzy logic based on type-2 fuzzy sets is called type-2 fuzzy
logic [3]. Sometimes, type-1 fuzzy logic is called type-I fuzzy logic, and
type-2 fuzzy logic is called type-II fuzzy logic. In light of the concept of
type-2 fuzzy sets, the fuzzy computing theories introduced earlier,
including type-1 fuzzy sets and type-1 fuzzy systems, are all of type-1.
In 2000, Mendel and his student Liang promoted the study of interval
type-2 fuzzy sets, which led to the development of interval type-2
fuzzy computing [4]. Type-2 fuzzy computing has since been applied to
control and decision-making problems.
Solution We build a type-2 T-S fuzzy system using the same training
and validation sets as in the previous section, with the following
programs:
load('mgdata.dat');
time = mgdata(:,1);
x = mgdata(:,2);
figure;
plot(time,x)
title('Mackey-Glass Chaotic Time Series')
xlabel('Time (sec)')
ylabel('x(t)')
% Build samples [x(t-18) x(t-12) x(t-6) x(t)] -> x(t+6)
C = 4;
Data = zeros(1000,C + 1);
for t = 118:1117
    Data(t - 117,:) = [x(t - 18) x(t - 12) x(t - 6) x(t) x(t + 6)];
end
trnX = Data(1:500,1:C);
trnY = Data(1:500,C + 1);
vldX = Data(501:end,1:C);
vldY = Data(501:end,C + 1);
% Initial type-2 Sugeno (T-S) system: 3 MFs per input, no rules yet
fisin = sugfistype2;
numInputs = C;
numInputMFs = 3;
range = [min(x) max(x)];
for i = 1:numInputs
    fisin = addInput(fisin,range,'NumMFs',numInputMFs);
    for j = 1:numInputMFs
        % initial parameters of the lower membership function
        fisin.Inputs(i).MembershipFunctions(j).LowerScale = 1;
        fisin.Inputs(i).MembershipFunctions(j).LowerLag = 0;
    end
end
numOutputMFs = numInputMFs^numInputs;   % 3^4 = 81
fisin = addOutput(fisin,range,'NumMFs',numOutputMFs);
figure;
plotfis(fisin)
% Learn the rule base with particle swarm optimization
options = tunefisOptions;
options.Method = 'particleswarm';
options.OptimizationType = 'learning';
options.NumMaxRules = numInputMFs^numInputs;
options.UseParallel = false;
options.MethodOptions.MaxIterations = 10;
fisout1 = tunefis(fisin,[],trnX,trnY,options);
figure;
plotfis(fisout1)
figure;
gensurf(fisout1,gensurfOptions('InputIndex',1))
% Evaluate on the validation set without warning messages
evalOptions = evalfisOptions("EmptyOutputFuzzySetMessage","none", ...
    "NoRuleFiredMessage","none","OutOfRangeInputValueMessage","none");
predY = evalfis(fisout1,vldX,evalOptions);
del = predY - vldY;
rmse = sqrt(mean(del.^2));
figure;
plot([predY vldY])
axis([0 length(vldY) min(vldY) - 0.01 max(vldY) + 0.13])
xlabel('t')
ylabel('x(t)')
legend(["predicted value" "true value"],'Location',"northeast")
After the above program is run, five figures are drawn. Three of them
are shown in Figs. 4.15, 4.16 and 4.17.
Fig. 4.15 Type-2 T-S fuzzy system at the beginning
Fig. 4.16 Type-2 T-S fuzzy system after training
Fig. 4.17 Performance of type-2 T-S fuzzy system on the validation set
Fig. 4.15 shows the initial type-2 T-S fuzzy system. It can be seen that
the number of fuzzy rules is 0, i.e., there are no fuzzy rules yet.
Fig. 4.16 shows the type-2 T-S fuzzy system after training on the
training set. It can be seen that the system now has 68 fuzzy rules,
fewer than the maximum of 3^4 = 81 rules allowed by NumMaxRules.
Fig. 4.17 shows the performance of the type-2 T-S fuzzy system on the
validation set. The RMSE on the validation set is about 0.071, which
shows that the obtained fuzzy system model is able to solve the
Mackey-Glass (MG) time series prediction problem.
Recall from the previous section that the RMSE of the type-1 T-S fuzzy
system on the validation set is 0.003, while that of the type-2 T-S fuzzy
system is 0.071. In terms of the RMSE metric, the type-2 T-S fuzzy
system performs worse than the type-1 T-S fuzzy system. Note,
however, that the type-2 model has not been deliberately optimized
here (for example, particle swarm optimization runs for only 10
iterations). Interested readers can make further comparisons.
4.5 Fuzzy C-means Clustering
Clustering is the basis of many classification and system modeling
methods. The purpose of clustering is to identify natural groupings in a
large amount of data so that system behavior can be described
concisely. The best-known fuzzy clustering method is fuzzy c-means
(FCM) clustering. FCM is a data clustering technique that uses
membership degrees to indicate how strongly each data point belongs
to each cluster. The FCM method was originally proposed by Bezdek in
1981 [5]. It groups data points that populate a multi-dimensional space
into a specified number of distinct clusters.
Initially, the FCM method randomly selects the locations of the cluster
centers; these initial centers are likely to be wrong. The FCM method
then assigns each data point a membership degree for each cluster. By
iteratively updating the cluster centers and the membership degrees of
each data point, the FCM method gradually moves the cluster centers
toward dense regions of the dataset. The iteration minimizes an
objective function that represents the distance from each data point to
each cluster center, weighted by the membership degree of that data
point. Finally, the FCM method outputs the cluster centers it has found.
Because it uses fuzzy membership functions, the FCM method allows
each sample point to belong to more than one category, with the
degree of belonging determined by the membership function. As a
result, the boundaries of the categories overlap each other. The
membership degrees are generally collected in a fuzzy partition matrix,
which gives the membership degree of each sample point in each
category.
The objective function used in the FCM method is:
(4.24)
(4.25)
(4.26)
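The standard FCM iteration (Bezdek's formulation) can be sketched in plain Python as follows. The update formulas are written out in the docstring; the random initial partition matrix, the toy data and all names are assumptions for illustration, and this is not MATLAB's `fcm` implementation.

```python
import math
import random

def fcm(data, c, m=2.0, max_iter=100, min_impr=1e-6, seed=0):
    """Minimal fuzzy c-means sketch (standard Bezdek updates).

    Repeats
        centers: c_j  = sum_i u_ij^m x_i / sum_i u_ij^m
        members: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
    and stops once J = sum_i sum_j u_ij^m d_ij^2 improves by less than
    min_impr, analogous to the [m maxIter minImpr] options of MATLAB's fcm.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    u = []                                   # random fuzzy partition matrix
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    prev_obj = None
    for _ in range(max_iter):
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * data[i][d] for i in range(n)) / sw
                            for d in range(dim)])
        # distances, floored to avoid division by zero at a center
        dist = [[max(math.dist(p, cj), 1e-12) for cj in centers] for p in data]
        obj = sum(u[i][j] ** m * dist[i][j] ** 2
                  for i in range(n) for j in range(c))
        expo = 2.0 / (m - 1.0)
        u = [[1.0 / sum((dist[i][j] / dist[i][k]) ** expo for k in range(c))
              for j in range(c)] for i in range(n)]
        if prev_obj is not None and abs(prev_obj - obj) < min_impr:
            break
        prev_obj = obj
    return centers, u, obj

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, u, obj = fcm(points, 2)
print(sorted(centers))   # two centers, near (0.5, 0.5) and (10.5, 10.5)
```

Each row of `u` sums to 1, which is the fuzzy partition matrix described above.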
Example 4.5 Let us take the iris dataset as an example. The dataset
contains three species of iris, namely Iris setosa, Iris versicolor and Iris
virginica, with 50 samples of each species. Each sample has 4
attributes, namely sepal length, sepal width, petal length and petal
width. Please use the FCM method to perform clustering analysis on
this dataset.
Solution The programs for solving this problem using the FCM
method are as follows:
load('iris.dat');
% Column 5 holds the class label (1 = setosa, 2 = versicolor, 3 = virginica)
setosaIndex = iris(:,5) == 1;
versicolorIndex = iris(:,5) == 2;
virginicaIndex = iris(:,5) == 3;
setosa = iris(setosaIndex,:);
versicolor = iris(versicolorIndex,:);
virginica = iris(virginicaIndex,:);
Characteristics = {'sepal length','sepal width','petal length','petal width'};
pairs = [1 2; 1 3; 1 4; 2 3; 2 4; 3 4];   % attribute pairs for the 6 panels
figure1 = figure;
for i = 1:6
    x = pairs(i,1);
    y = pairs(i,2);
    subplot1 = subplot(2,3,i,'Parent',figure1);
    hold(subplot1,'on');
    plot(setosa(:,x),setosa(:,y),'Parent',subplot1, ...
        'MarkerSize',8,'Marker','.','LineStyle','none');
    plot(versicolor(:,x),versicolor(:,y),'Parent',subplot1, ...
        'MarkerSize',8,'Marker','x','LineStyle','none');
    plot(virginica(:,x),virginica(:,y),'Parent',subplot1, ...
        'MarkerSize',8,'Marker','square','LineStyle','none');
    xlabel(Characteristics{x});
    ylabel(Characteristics{y});
    box(subplot1,'on');
    hold(subplot1,'off');
    set(subplot1,'FontSize',12);
end
% FCM clustering
M = 3;                            % number of clusters
m = 2.0;                          % fuzzy partition exponent
maxIter = 100;                    % maximum number of iterations
minImpr = 1e-6;                   % minimum improvement of the objective
opt = [m maxIter minImpr true];   % true: display the objective per iteration
[centers,U,objFun] = fcm(iris,M,opt);
figure1 = figure;
for i = 1:6
    subplot1 = subplot(2,3,i,'Parent',figure1);
    x = pairs(i,1);
    y = pairs(i,2);
    hold(subplot1,'on');
    plot(setosa(:,x),setosa(:,y),'Parent',subplot1, ...
        'MarkerSize',8,'Marker','.','LineStyle','none');
    plot(versicolor(:,x),versicolor(:,y),'Parent',subplot1, ...
        'MarkerSize',8,'Marker','x','LineStyle','none');
    plot(virginica(:,x),virginica(:,y),'Parent',subplot1, ...
        'MarkerSize',8,'Marker','square','LineStyle','none');
    for j = 1:M
        % mark cluster center j with its number
        text(centers(j,x),centers(j,y),int2str(j), ...
            'FontSize',12,'FontWeight','bold');
    end
    xlabel(Characteristics{x});
    ylabel(Characteristics{y});
    box(subplot1,'on');
    hold(subplot1,'off');
    set(subplot1,'FontSize',12);
end
After the above program is run, the results are shown in Figs. 4.18
and 4.19.
Fig. 4.18 Visualizing the Iris dataset
Fig. 4.19 Clustering results of the FCM method on the Iris dataset
As shown in Fig. 4.18, the iris dataset has four attributes, and each
panel plots the samples against one pair of attributes, giving six
two-dimensional scatter plots. It can be seen that the sample categories
overlap considerably for the sepal width and sepal length attributes,
whereas they overlap much less for the sepal width and petal width
attributes.
The clustering results of the FCM method are given in Fig. 4.19. The
centers of the clusters are represented by numbers. Numbers 1, 2 and 3
indicate category 1, category 2 and category 3, respectively. It can be
seen from the figure that the FCM method solves this clustering
problem.
The above program also displays the objective function value at each
iteration. After 22 iterations, the improvement of the objective function
falls below the minimum threshold (minImpr = 1e-6), so the method
terminates; the final objective function value is 6058.69.
The FCM method requires the number of categories to be specified in
advance, which is one of its shortcomings. Chiu proposed the
subtractive clustering (SC) method in 1994 [5]. The motivation of
subtractive clustering is that it does not require a predetermined
number of categories; it also quickly estimates the number of
categories and the cluster centers. The steps of the SC method are:
Step (1) Regard each sample point as a possible cluster center and
calculate the likelihood that it is a cluster center, based on the density
of the other sample points around it;
Step (2) Select the sample point most likely to be a cluster center as a
temporary cluster center;
Step (3) Remove the sample points in the neighborhood of the
temporary cluster center. The size of the neighborhood is determined
by a parameter called the category influence range;
Step (4) Among the remaining sample points, select the one most
likely to be a cluster center as the next temporary cluster center;
Step (5) Repeat Steps (3) and (4) until some termination condition is
met.
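Steps (1) to (5) can be sketched in Python as follows. This is a simplified version in the spirit of Chiu's method, not MATLAB's `subclust`: it assumes pre-scaled data, potentials of the form exp(-4||x - y||^2 / ra^2), a suppression radius of 1.5·ra, and a plain stopping threshold `eps` instead of Chiu's full accept/reject rule. The toy points are made up.

```python
import math

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def subtractive_clustering(points, ra=1.0, eps=0.15):
    """Simplified subtractive clustering in the spirit of Chiu (1994).

    Assumptions: the data are already scaled to comparable ranges, the
    suppression radius is rb = 1.5 * ra, and the search stops when the best
    remaining potential falls below eps times the first one (Chiu's full
    accept/reject test is omitted).
    """
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.5 * ra) ** 2
    # Step (1): potential of each point from the density of its neighbours
    pot = [sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points]
    centers, first = [], None
    while True:
        k = max(range(len(points)), key=lambda i: pot[i])   # Steps (2)/(4)
        if first is None:
            first = pot[k]
        elif pot[k] < eps * first:
            break                                          # Step (5)
        center, pk = points[k], pot[k]
        centers.append(center)
        # Step (3): suppress the potential of points near the new center
        pot = [pv - pk * math.exp(-beta * sq_dist(p, center))
               for pv, p in zip(pot, points)]
    return centers

pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
print(subtractive_clustering(pts))               # two centers, one per group
print(len(subtractive_clustering(pts, ra=0.05)))  # 6
```

With the much smaller radius ra = 0.05, every point becomes its own center, which illustrates that a smaller influence range produces more categories.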
Solution The programs for solving this problem using the SC method
are as follows:
load('iris.dat');
setosaIndex = iris(:,5) == 1;
versicolorIndex = iris(:,5) == 2;
virginicaIndex = iris(:,5) == 3;
setosa = iris(setosaIndex,:);
versicolor = iris(versicolorIndex,:);
virginica = iris(virginicaIndex,:);
clusterInfluenceRange = 1;        % category influence range of SC
[centers,sigma] = subclust(iris,clusterInfluenceRange);
Characteristics = {'sepal length','sepal width','petal length','petal width'};
pairs = [1 2; 1 3; 1 4; 2 3; 2 4; 3 4];
figure1 = figure;
for i = 1:6
    subplot1 = subplot(2,3,i,'Parent',figure1);
    x = pairs(i,1);
    y = pairs(i,2);
    hold(subplot1,'on');
    plot(setosa(:,x),setosa(:,y),'Parent',subplot1, ...
        'MarkerSize',8,'Marker','.','LineStyle','none');
    plot(versicolor(:,x),versicolor(:,y),'Parent',subplot1, ...
        'MarkerSize',8,'Marker','x','LineStyle','none');
    plot(virginica(:,x),virginica(:,y),'Parent',subplot1, ...
        'MarkerSize',8,'Marker','square','LineStyle','none');
    for j = 1:size(centers,1)     % one label per cluster found by subclust
        text(centers(j,x),centers(j,y),int2str(j), ...
            'FontSize',12,'FontWeight','bold');
    end
    xlabel(Characteristics{x});
    ylabel(Characteristics{y});
    box(subplot1,'on');
    hold(subplot1,'off');
    set(subplot1,'FontSize',12);
end
After the above program is run, the result is shown in Fig. 4.20.
Fig. 4.20 Clustering results of the SC method on the Iris dataset
The clustering results of the SC method are given in Fig. 4.20. The
centers of the clusters are represented by numbers. Numbers 1, 2 and 3
indicate category 1, category 2 and category 3, respectively. It can be
seen from the figure that the SC method solves this clustering problem.
The advantage of the SC method is that it does not require a
predetermined number of categories; instead, it introduces a category
influence range parameter, whose value usually lies between 0 and 1.
The smaller this value, the more categories the method produces;
conversely, the closer the value is to 1, the fewer categories the
method produces.
The SC method can find both the number of categories and the cluster
centers of a dataset. These can then be used to initialize the FCM
method, which may lead to more suitable clustering results. Moreover,
the FCM and SC methods can be used to construct fuzzy inference
systems.
In the previous sections, the fuzzy system uses grid partitioning by
default. Grid partitioning divides the range of each input variable
uniformly and generates the membership functions from this partition.
If the FCM method is used to build the fuzzy system, the membership
functions and fuzzy rules are generated from the cluster centers
obtained by the FCM method. Likewise, if the SC method is used, they
are generated from the cluster centers obtained by the SC method.
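Grid partitioning can be sketched in Python as evenly spaced MF centers on each input plus one rule per combination of MFs. Placing the centers evenly with the endpoints of the range included is an assumption about the default behaviour, and `grid_mf_centers` and `grid_rules` are hypothetical helpers.

```python
import itertools

def grid_mf_centers(lo, hi, num_mfs):
    """Evenly spaced MF centers over [lo, hi], endpoints included."""
    if num_mfs == 1:
        return [(lo + hi) / 2.0]
    step = (hi - lo) / (num_mfs - 1)
    return [lo + k * step for k in range(num_mfs)]

def grid_rules(num_inputs, num_mfs):
    """Each rule antecedent picks one MF index per input (all combinations)."""
    return list(itertools.product(range(num_mfs), repeat=num_inputs))

print(grid_mf_centers(0.0, 1.0, 3))   # [0.0, 0.5, 1.0]
print(len(grid_rules(4, 3)))          # 81
```

By contrast, a clustering-based fuzzy system typically creates one rule per cluster center, so its rule count does not grow exponentially with the number of inputs.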
Example 4.7 Please use the FCM and SC methods to build fuzzy
systems to solve the MG time series prediction problem.
From Example 4.7, it can be seen that the fuzzy system based on grid
partitioning has the smallest RMSE on the validation set, followed by
the fuzzy system based on the FCM method, while the fuzzy system
based on the SC method has the largest RMSE. Note that in this
example we did not tune the parameters of the methods, so we cannot
determine which method is superior from this result alone. When
solving a specific problem, readers are advised to try all of them and
tune their parameters to obtain the best performance.
The clustering results of the SC method are given in Fig. 4.21, where the
raw data are represented by circles and the cluster centers by stars. In
this figure, the horizontal axis is the input variable, total employment,
and the vertical axis is the output variable, the number of car trips.
The RMSE of the fuzzy system using the SC method is 0.5276 on the
training set, while the RMSE on the validation set is 0.6179.
To obtain a better fuzzy system, we can use the obtained fuzzy system
to initialize ANFIS. The results of ANFIS on the validation set using the
SC method are given in Fig. 4.22, where the original data are
represented by circles, the results predicted by the fuzzy system by
stars, and the results predicted by ANFIS by crosses. As can be seen
from the figure, the ANFIS model performs better. The RMSE of ANFIS
is 0.3393 on the training set and 0.5834 on the validation set.
Fig. 4.22 Results of ANFIS on the validation set using the SC method
Figure 4.23 shows the error of the model on the validation set at each
epoch. It can be seen that the error gradually decreases as the number
of epochs increases and reaches its lowest value at the 52nd epoch,
marked by the asterisk in the figure. The error then increases again in
the subsequent epochs: after the 52nd epoch, the error on the training
set continues to decrease, but the error on the validation set increases.
This indicates that ANFIS overfits when the number of epochs exceeds
52. Thus, we use the model obtained at the 52nd epoch as the final
model.
Fig. 4.23 Error curve of ANFIS on the validation set
Exercises
(1)
Try to construct a T-S type fuzzy system using the Fuzzy Logic
Designer and observe the system model, the fuzzy rules, and the
surface relating the input variables to the output variables.
(2)
Try to briefly explain the differences and connections between
type-1 fuzzy systems and interval type-2 fuzzy systems.
(3)
Choose a proportional-integral-derivative (PID) control problem, try
to construct an ANFIS fuzzy system to solve this control problem, and
analyze its performance.
References
1. de Campos Souza PV (2020) Fuzzy neural networks and neuro-fuzzy networks: a review the main techniques and applications used in the literature. Appl Soft Comput 92:106275. https://doi.org/10.1016/j.asoc.2020.106275
2. Jang JSR (1993) ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern 23(3):665–685
3. Wu D, Zeng Z, Mo H, Wang F-Y (2020) Interval type-2 fuzzy sets and systems: overview and outlook. Acta Automatica Sinica 46(8):1539–1556
4. Mendel JM (2017) Uncertain rule-based fuzzy systems: introduction and new directions, 2nd edn. Springer, Cham, pp 229–234
5. Chiu S (1994) Fuzzy model identification based on cluster estimation. J Intell Fuzzy Syst 2(3):267–278
6. Sun Z, Cao Y, Wen Z et al (2023) A grey wolf optimizer algorithm based fuzzy logic power system stabilizer for single machine infinite bus system. Energy Rep 9:847–853. https://doi.org/10.1016/j.egyr.2023.04.365
7. Tarafdar A, Majumder P, Deb M, Bera UK (2023) Diagnosis and prognosis of incipient faults and insulation status for asset management of power transformer using fuzzy logic controller & fuzzy clustering means. Electr Power Syst Res 220:109256. https://doi.org/10.1016/j.epsr.2023.109256
9. Sierra-Garcia JE, Santos M (2022) Deep learning and fuzzy logic to implement a hybrid wind turbine pitch control. Neural Comput Applic 34:10503–10517. https://doi.org/10.1007/s00521-021-06323-w
10. Raja K, Ramathilagam S (2021) Washing machine using fuzzy logic controller to provide wash quality. Soft Comput 25:9957–9965. https://doi.org/10.1007/s00500-020-05477-4
11. Thakur K, Maji S, Maity S et al (2023) Multiroute fresh produce green routing models with driver fatigue using Type-2 fuzzy logic-based DFWA. Expert Syst Appl 229:120300. https://doi.org/10.1016/j.eswa.2023.120300
12. Kumar A, Raj R, Kumar A, Verma B (2023) Design of a novel mixed interval type-2 fuzzy logic controller for 2-DOF robot manipulator with payload. Eng Appl Artif Intell 123:106329. https://doi.org/10.1016/j.engappai.2023.106329
14. Luo G, Wang Z, Ma B et al (2021) Observer-based interval type-2 fuzzy friction modeling and compensation control for steer-by-wire system. Neural Comput Applic 33:10429–10448. https://doi.org/10.1007/s00521-021-05801-5
16. Karthika R, Deborah JL, Vijayakumar P (2020) Intelligent e-learning system based on fuzzy logic. Neural Comput Applic 32:7661–7670. https://doi.org/10.1007/s00521-019-04087-y
18. Zahra SR, Chishti MA (2022) A generic and lightweight security mechanism for detecting malicious behavior in the uncertain Internet of Things using fuzzy logic- and fog-based approach. Neural Comput Applic 34:6927–6952. https://doi.org/10.1007/s00521-021-06823-9
19. Talpur N, Abdulkadir SJ, Alhussian H et al (2022) A comprehensive review of deep neuro-fuzzy system architectures and their optimization methods. Neural Comput Applic 34:1837–1875. https://doi.org/10.1007/s00521-021-06807-9
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
X. Zhang et al., Intelligent Information Processing with Matlab
https://doi.org/10.1007/978-981-99-6449-9_5
5. Evolutionary Computing
Xiu Zhang1 , Xin Zhang1 and Wei Wang1
(1) Tianjin Normal University, Tianjin, China
Xin Zhang
Wei Wang
Abstract
Evolutionary computing mimics the laws of biological evolution. It
solves optimization problems through the reproduction of individuals
and the competition between individuals. Evolutionary computing is a
collection of evolutionary algorithms that follow the law of survival of
the fittest. Evolutionary algorithms are global probabilistic search
algorithms based on natural selection, genetic mutation and other
mechanisms of biological evolution. Evolutionary computing has been
used in various fields such as pattern recognition, image processing,
economic management, mechanical engineering, electrical engineering
and wireless communication. This chapter first gives an overview of
evolutionary computing and introduces the simple genetic algorithm,
which is then used to solve the travelling salesman problem. Next, this
chapter introduces the ant colony optimization, particle swarm
optimization and differential evolution algorithms, and uses them to
solve both the travelling salesman problem and continuous
optimization problems.
5.1 Overview of Evolutionary Computing
Evolutionary computing is an intelligent computing technology that
mimics the laws of biological evolution and solves optimization
problems through the reproduction of individuals and the competition
between individuals. Just as biological evolution leads to the “survival
of the fittest” among species, evolutionary computing aims to reach the
optimal solution of an optimization problem. Evolutionary computing is
also called evolutionary computation. Evolutionary computing (EC) is
not a specific algorithm but a collective term for many algorithms. For
example, the genetic algorithm is a specific algorithm, and it predates
the term evolutionary computation. In general, the genetic algorithm is
considered the earliest evolutionary computing method.
In the 1970s, the genetic algorithm (GA) was first proposed by Holland
in the United States [1]. In 1975, Holland published the monograph
“Adaptation in Natural and Artificial Systems”, in which he introduced
GA and showed that it could obtain good results on NP-hard
(Nondeterministic Polynomial-Hard) problems. Since then, many
scholars have studied GA and derived more effective versions, so the
GA originally proposed by Holland is often referred to as the simple
genetic algorithm.
The genetic algorithm simulates the mechanism of biological evolution
in nature [2]. Based on Darwin's theory of biological evolution, GA
translates the law of survival of the fittest into a strategy for finding the
optimal solution. In scientific and practical problems, the role of GA is
to search, among all possible solutions that satisfy the constraints, for
the one that best fits the problem. GA can provide an optimal or
near-optimal solution to an optimization problem, although, as a
stochastic search method, it does not guarantee that the global
optimum is found.
In the 1960s, Fogel in the United States proposed evolutionary
programming (EP). In the same period, Rechenberg and Schwefel in
Germany proposed evolution strategies (ES). They applied ES to
complex engineering problems and achieved good results, which
gained ES wide recognition. Methods such as GA, EP and ES developed
independently for more than a decade. Until the 1980s, these methods
did not attract much attention, partly because they were not yet mature
and partly because the limited performance of computers prevented
them from being applied to practical problems.
In the 1990s, Koza in the United States proposed genetic
programming (GP) in his monograph. GP uses hierarchical tree
structures to represent problems. After GP was proposed, evolutionary
computing began to emerge as a discipline. The four methods GA, EP,
ES and GP influence and learn from each other and have gradually
given rise to new evolutionary methods, which has promoted the rapid
development of EC.
The GA mentioned earlier is able to solve NP-hard problems, many of
which are combinatorial optimization problems. When the learning
rules of neural networks adjust the weights, the gradient descent
method is used to approach the optimal weights iteratively. Variants
such as stochastic gradient descent and batch gradient descent were
subsequently derived. All these methods need to calculate the gradient
of the loss function, which generally requires the loss function to be
continuous and differentiable. In combinatorial optimization problems,
the independent variables often take discrete values, so gradient-based
methods are no longer applicable. Gradient-based methods are
sometimes referred to as traditional optimization methods, while EC
methods such as GA are called modern optimization methods. The
distinction is historical: the gradient descent method was proposed by
Cauchy as early as 1847.
The travelling salesman problem (TSP) is a typical combinatorial
optimization problem. Given a set of cities and the distances between
each pair of them, the TSP is to find the shortest route that visits each
city exactly once and returns to the starting point. If the number of
cities is N and the starting city is fixed, there are (N − 1)! possible
routes, where the exclamation mark denotes the factorial. The factorial
grows extremely fast (for N = 20 cities, 19! ≈ 1.2 × 10^17), so finding
the optimal route among all possible routes is very difficult. Many
variants of the TSP have been derived, such as the multiple traveler
problem, in which several travelers together traverse a set of cities;
every city must be visited exactly once, each traveler returns to his or
her own starting point, and the goal is to find the shortest total path. The
vehicle routing problem (VRP) in real life is such a multiple traveler
problem, but it has more constraints, such as the demand for goods, the
arrival time of vehicles, the capacity of vehicles and the distance
traveled.
With the development of EC, researchers have created many function
optimization problems to test the performance of algorithms. These
benchmark functions are synthetic: they are built from arithmetic
combinations and compositions of basic elementary functions. For
example, the Schwefel function is a composite of power and sine
functions of N variables, typically with N = 20, and each independent
variable takes values in the range [− 500, 500]. The function is very
deceptive: besides its global minimum, it has a good local minimum
located far away from it. If an optimization method is trapped in a local
minimum, it is difficult for the method to escape from the local region,
and it cannot find the global minimum.
The Schwefel function is:
(5.1)
Fig. 5.1 shows the Schwefel function for N = 2; even in this case it has
many minima. As the number of independent variables increases,
finding the global minimum becomes even more difficult.
Fig. 5.1 Top view of the Schwefel function
For a better view of the minima, the front view of the Schwefel
function is given in Fig. 5.2. The figure shows that the global minimum
is located near x1 = 400, while the second-best minimum is located near
x1 = − 300. The two points are far apart, which again shows that the
Schwefel function is very deceptive. If an EC method is able to find the
global minimum of this function, it is reasonable to assume that the
method performs well.
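A minimal Matlab sketch of the Schwefel function, assuming the widely used form f(x) = 418.9829N − Σ xi sin(√|xi|) on [− 500, 500]^N; Eq. (5.1) may differ, for example in the constant offset. It compares the value at the global minimizer (xi ≈ 420.9687) with a point whose first coordinate sits at the distant local minimum near − 302.5, which matches the deceptive structure described above.

```matlab
% Schwefel function in its common form (an assumption; Eq. (5.1) may use
% a different offset). Each row of x is one candidate solution.
schwefel = @(x) 418.9829 * size(x, 2) - sum(x .* sin(sqrt(abs(x))), 2);
xGlobal = 420.9687 * [1 1];       % global minimizer for N = 2
xLocal  = [-302.5249 420.9687];   % first coordinate at the distant local minimum
fprintf('f(global) = %.4f\n', schwefel(xGlobal));
fprintf('f(local)  = %.4f\n', schwefel(xLocal));
```

The local point is about 720 units away from the global one along x1, yet its value is only moderately worse, which is why a search that settles there rarely finds its way back.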
(5.2)
From Eq. (5.2), it can be seen that the global minimum of the
objective function is (x1, x2) = (1, 1) and is the only minimum. The steps
required for the simple genetic algorithm (SGA) to solve the model in
(5.2) include: individual coding, initial population generation, fitness
calculation, selection operation, crossover operation, and mutation
operation.
(1)
Individual coding. In EC, each independent variable is assigned a
possible value, and the combination of these values constitutes a
solution to the problem, called an individual. For example,
(x1, x2) = (3, 4) is a solution to model (5.2); it is not the minimum,
only a candidate solution. The SGA does not use the values of the
independent variables directly, but encodes them in binary, thus
mimicking the genes of an organism. Since each independent variable
takes a positive integer value between 1 and 7, it can be represented
by 3 binary bits. For example, 001 represents 1, 010 represents 2, and
so on, up to 111, which represents 7. Thus, an individual with two
independent variables is represented by a 6-bit binary string.
(2)
Initial population generation. A population is a group of individuals,
and the SGA uses a population to mimic a population of organisms.
Suppose the population size is 4, i.e., the population consists of 4
individuals. We can randomly generate 0s and 1s from a uniform
distribution as binary symbols to form the individuals. Suppose the
population produced is: 011101, 101011, 011100 and 111001.
(3)
Fitness calculation. According to the law of survival of the fittest,
individuals compete with each other, so their merits must be
compared. A fitness function measures each individual by assigning it
a value, and individuals are then compared by their fitness values. In
the SGA, the fitness function is non-negative and is to be maximized,
which requires a mapping from the objective function to the fitness
function. In (5.2), the objective is to find the minimum point, and the
objective function takes only positive values. Thus, the fitness function
can be taken as the reciprocal of the objective function, so that smaller
objective values give larger fitness values. The fitness values of the
individuals in the population are shown in Table 5.1.
Table 5.1 Fitness values of individuals in the population
(5)
Crossover operation. The crossover operation mimics the genetic
crossover of chromosomes in an organism. The SGA uses a single-point
crossover operator, which requires two individuals to participate. For
example, we pair the first and second selected individuals and then
randomly select the crossover position. An individual has 6 binary bits,
so there are 5 possible crossover positions. We again use a uniform
distribution to generate the random number; suppose the crossover
position is 2. Swapping the binary bits after the crossover position gives
the individuals after the crossover operation, as shown in Fig. 5.3.
Fig. 5.3 Single-point crossover operation
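The SGA steps above can be sketched in Matlab as follows. The decoding and the single-point crossover follow the text; the objective objfun is a hypothetical positive stand-in (x1² + x2²), not necessarily the model of Eq. (5.2), and the crossover is applied to the first two individuals of the initial population for illustration, whereas the text pairs individuals produced by selection.

```matlab
% Sketch of SGA coding, population, fitness and single-point crossover.
% objfun is a HYPOTHETICAL positive objective, not necessarily Eq. (5.2).
objfun = @(x) x(:, 1).^2 + x(:, 2).^2;

% (1) Coding: two 3-bit genes per individual, values 1..7
encode = @(x) [dec2bin(x(:, 1), 3), dec2bin(x(:, 2), 3)];
decode = @(bits) [bin2dec(bits(:, 1:3)), bin2dec(bits(:, 4:6))];

% (2) Population: the four individuals from the text
% (a random alternative is char('0' + randi([0 1], 4, 6)); a gene 000
% would decode to 0, outside 1..7, and would need to be repaired)
popu = ['011101'; '101011'; '011100'; '111001'];
x = decode(popu);                 % one row (x1, x2) per individual

% (3) Fitness: reciprocal of the positive objective
fitness = 1 ./ objfun(x);

% (5) Single-point crossover of individuals 1 and 2 after bit 2
pos = 2;
child1 = [popu(1, 1:pos), popu(2, pos+1:end)];
child2 = [popu(2, 1:pos), popu(1, pos+1:end)];
disp([child1; child2]);
```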
(5.3)
From the above equation, f(x) is the objective function, i.e., the
traversal cost of a route. The first constraint indicates that each city
must be left exactly once; similarly, the second constraint indicates that
each city must be entered exactly once. Together, these two constraints
mean that each city is passed through once and only once. The third
constraint eliminates subtours (subloops) in the route. The last
constraint gives the range of the independent variable xij, which
indicates whether the route from city i to city j is selected: if the
traveler goes from city i to city j, then xij = 1; otherwise xij = 0.
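To connect model (5.3) with a concrete route, the Matlab sketch below builds the 0/1 decision matrix xij from a visiting order, checks the two degree constraints and evaluates the objective. The 4-city distance matrix is hypothetical. Note that the degree checks alone would also accept two disjoint subtours (for example 1→2→1 and 3→4→3), which is exactly what the third, subtour-elimination constraint rules out.

```matlab
% Decision matrix x_ij of model (5.3) built from a route.
% d is a small HYPOTHETICAL symmetric distance matrix.
d = [0 2 9 10; 2 0 6 4; 9 6 0 3; 10 4 3 0];
tour = [1 2 4 3];                     % visiting order; returns to city 1
n = numel(tour);
x = zeros(n);
for k = 1:n
    i = tour(k);
    j = tour(mod(k, n) + 1);          % next city, wrapping to the start
    x(i, j) = 1;                      % route from city i to city j is used
end
outOnce = all(sum(x, 2) == 1);        % each city is left exactly once
inOnce  = all(sum(x, 1) == 1);        % each city is entered exactly once
cost = sum(sum(d .* x));              % objective f(x): total route cost
fprintf('cost = %g, out ok: %d, in ok: %d\n', cost, outOnce, inOnce);
```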
Suppose the traveler wants to visit some cities in the United States;
this example comes with Matlab. Note that the map used here is a
simplified outline rather than a complete map of the United States,
which reduces the difficulty of the problem.
Example 5.1 Suppose there are n = 40 cities selected within the map
boundary. The traveler needs to traverse all cities once and only once
and return to the initial city location. The geographic locations of these
40 cities are known and the distances between the cities have been
given. Please use the genetic algorithm solver to solve the TSP and draw
a graph to analyze the results.
Solution First, we configure the basic data for the TSP using the
following program:
load('usborder.mat','x','y','xx','yy');  % border (x,y) and simplified outline (xx,yy)
cities = 40;
locations = zeros(cities,2);
rng(1);                                  % fix the random seed
n = 1;
while (n <= cities)                      % sample random points inside the outline
    xp = rand*1.5;
    yp = rand;
    if inpolygon(xp,yp,xx,yy)
        locations(n,1) = xp;
        locations(n,2) = yp;
        n = n + 1;
    end
end
distances = zeros(cities);               % symmetric Euclidean distance matrix
for count1 = 1:cities
    for count2 = 1:count1
        x1 = locations(count1,1);
        y1 = locations(count1,2);
        x2 = locations(count2,1);
        y2 = locations(count2,2);
        distances(count1,count2) = sqrt((x1 - x2)^2 + (y1 - y2)^2);
        distances(count2,count1) = distances(count1,count2);
    end
end
figure1 = figure(1);                     % plot the border and the cities
axes1 = axes('Parent',figure1);
hold(axes1,'on');
box(axes1,'on');
grid(axes1,'on');
plot(x, y, 'Color','black', 'LineWidth',2);
plot(locations(:,1),locations(:,2),'bo','LineWidth',2,'LineStyle','none');
hold(axes1,'off');
set(axes1,'FontSize',14);
The result of running the above program is shown in Fig. 5.5, where
the circles indicate the locations of the cities.
The careful reader may notice that the GA presented in this section
differs in many ways from the simple genetic algorithm in the previous
section, and that the selection operation and the fitness calculation
have not yet been presented. The selection operation uses the roulette
wheel method, implemented in Matlab as follows:
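function parents = selectionroulette(expectation,nParents,options)
% Roulette wheel selection: each individual occupies a slice of the
% wheel proportional to its expectation (scaled fitness) value.
expectation = expectation(:,1);
wheel = cumsum(expectation) / nParents;  % cumulative wheel, ends at 1
parents = zeros(1,nParents);
for i = 1:nParents
    r = rand;                            % spin the wheel
    for j = 1:length(wheel)
        if(r < wheel(j))
            parents(i) = j;              % first slice that contains r
            break;
        end
    end
end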
In the above program, expectation refers to the scaled fitness values of
all individuals, not their objective function values. The fitness function
is based on the objective function, but the selection operation does not
use the raw values directly; they must first be transformed to suit the
selection operation. In Matlab, the default mapping is the rank-based
fitness scaling method, implemented as follows:
function expectation = fitscalingrank(scores,nParents)
scores = scores(:);
[~,i] = sort(scores);                  % i(1) is the individual with the lowest score
expectation = zeros(size(scores));
expectation(i) = 1 ./ ((1:length(scores)) .^ 0.5);        % rank r -> 1/sqrt(r)
expectation = nParents * expectation ./ sum(expectation); % values sum to nParents
In the above function, scores are the objective function values, while
expectation holds the scaled values after conversion. The method used
to map objective function values to fitness values can affect the
performance of GA. If the scaled values vary too much, individuals with
high scaled values are likely to reproduce much faster than those with
low scaled values, i.e., they have a much higher probability of being
selected, which can narrow the search range of GA. On the other hand,
if the scaled values vary too little, all individuals tend to have the same
probability of being selected, which slows the convergence of GA and
leads to more iterations and longer computation time.
The rank-based fitness scaling method scales each individual
according to its rank in the population rather than its raw objective
function value. For example, for a minimization problem, the individual
with the lowest objective function value has rank 1, the individual with
the second lowest value has rank 2, and so on; the individual with the
largest objective function value has rank popsize. Consistent with the
fitscalingrank program above, the scaled fitness value of the individual
with rank i is:
$$fit(i) = \frac{nParents}{\sum_{k=1}^{popsize} 1/\sqrt{k}} \cdot \frac{1}{\sqrt{i}} \qquad (5.4)$$
where fit(i) denotes the fitness value of the individual ranked i. From
(5.4), the fitness value is a multiple of 1/√i. The rank-based scaling
method therefore avoids the effect of an uneven spread of the objective
function values. The individual ranked 1 has the largest scaled value,
and the scaled values of the other individuals decrease gradually with
their rank.
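The two functions above work together: rank-based scaling produces the expectation values, and the roulette wheel draws parents from them. The Matlab script below mirrors that logic on a hypothetical population of five objective values (lower is better); the guard wheel(end) = 1 is an addition that protects against round-off leaving the last slot slightly below 1.

```matlab
% Rank-based scaling followed by roulette-wheel selection, mirroring
% fitscalingrank and selectionroulette. Scores are HYPOTHETICAL.
scores = [12; 3; 7; 25; 5];            % objective values (minimization)
nParents = 4;
[~, order] = sort(scores);             % order(1) is the best individual
expectation = zeros(size(scores));
expectation(order) = 1 ./ sqrt(1:numel(scores));    % rank r -> 1/sqrt(r)
expectation = nParents * expectation / sum(expectation);
wheel = cumsum(expectation) / nParents;
wheel(end) = 1;                        % guard against round-off
parents = zeros(1, nParents);
for k = 1:nParents
    parents(k) = find(rand < wheel, 1);   % first slice containing the draw
end
disp(expectation.');
disp(parents);
```

The best individual (score 3) receives the largest expectation, but even the worst one keeps a non-zero chance of being selected.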
Besides the above operations, there are other selection, crossover and
mutation operations; they are not described here.
Example 5.2 Suppose there are n = 40 cities within the selected map
boundary, the traveler needs to traverse all cities once and only once
and return to the initial city location. The geographic locations of these
40 cities are known and the distances between the cities are given.
Write a program for the ACO algorithm to solve the problem and draw a
graph to analyze the results.
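$$p_{ij}^{k} = \begin{cases} \dfrac{[\tau_{ij}]^{\alpha}[\eta_{ij}]^{\beta}}{\sum_{s \in allowed_k}[\tau_{is}]^{\alpha}[\eta_{is}]^{\beta}}, & j \in allowed_k \\ 0, & \text{otherwise} \end{cases} \qquad (5.5)$$
where p_ij^k is the probability that the k-th ant moves from city i to
city j, k denotes the k-th ant, i and j denote cities, τij denotes the
pheromone on the route from city i to city j, ηij denotes the heuristic
information from city i to city j, α denotes the importance of the
pheromone, β denotes the importance of the heuristic information,
and allowed_k denotes the set of cities that the k-th ant is allowed to
visit. Equation (5.5) shows that the state transition of the ant depends
both on the pheromone on each route and on the heuristic information
of the cities it can still visit.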
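A minimal Matlab sketch of one ant's next-city choice, assuming the classical Ant System form of Eq. (5.5) with ηij = 1/dij as the heuristic information. The distances, pheromone levels, α and β are hypothetical values.

```matlab
% Next-city choice of ant k at city i (classical Ant System form).
% Distances, pheromone, alpha and beta are HYPOTHETICAL values.
d = [0 2 9 10; 2 0 6 4; 9 6 0 3; 10 4 3 0];
tau = ones(4);                        % pheromone on each route
eta = 1 ./ d;                         % heuristic information (visibility)
alpha = 1; beta = 2;
i = 1;                                % current city
visited = 1;                          % cities already on the ant's tour
allowed = setdiff(1:4, visited);      % cities the ant may still visit
w = tau(i, allowed).^alpha .* eta(i, allowed).^beta;
p = zeros(1, 4);
p(allowed) = w / sum(w);              % forbidden cities keep probability 0
cdf = cumsum(p(allowed));
cdf(end) = 1;                         % guard against round-off
next = allowed(find(rand <= cdf, 1)); % roulette choice of the next city
```

With equal pheromone everywhere, the nearest city (city 2, distance 2) gets most of the probability; as pheromone accumulates on good routes, τij increasingly overrides pure distance.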
There are also some tricks for updating the pheromone. The ants
update the pheromone after completing their trips. The existing
pheromone evaporates after each iteration. Each ant then deposits
pheromone on the route it has walked, and the amount deposited is
based on the total cost of the ant's whole trip. The mathematical
expression of the pheromone update is:
(5.6)
(5.7)
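Since Eqs. (5.6) and (5.7) are not reproduced here, the Matlab sketch below assumes the classical Ant System update: all pheromone first evaporates by the factor (1 − ρ), then each ant k deposits Q/Lk on every route of its tour, where Lk is its tour length, so shorter trips deposit more. ρ, Q, the tours and the distances are hypothetical.

```matlab
% Pheromone update in the classical Ant System form (an assumption):
%   tau <- (1 - rho)*tau + sum_k delta_k,  delta_k(i,j) = Q / L_k
% on the routes used by ant k. All values are HYPOTHETICAL.
d = [0 2 9 10; 2 0 6 4; 9 6 0 3; 10 4 3 0];
tours = [1 2 4 3; 1 3 2 4];            % tours of two ants
rho = 0.5; Q = 1;                      % evaporation rate, deposit constant
tau = ones(4);
n = size(d, 1);
delta = zeros(n);
for k = 1:size(tours, 1)
    t = tours(k, :);
    nxt = t([2:n 1]);                  % successor of each city on the tour
    L = sum(d(sub2ind([n n], t, nxt)));    % tour length of ant k
    for m = 1:n
        delta(t(m), nxt(m)) = delta(t(m), nxt(m)) + Q / L;
        delta(nxt(m), t(m)) = delta(nxt(m), t(m)) + Q / L;  % symmetric TSP
    end
end
tau = (1 - rho) * tau + delta;         % evaporation plus new deposits
```

The first tour has length 18 and the second 29, so routes on the shorter tour gain more pheromone; route 2-4, used by both ants, gains the most.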
$$\omega(t) = \omega_{\max} - \frac{(\omega_{\max} - \omega_{\min})\, t}{t_{\max}} \qquad (5.11)$$
where ωmin and ωmax are the minimum and maximum values of the
inertia factor, t is the current generation, and tmax is the maximum
number of iterations of the PSO algorithm.
Owing to the inertia factor, the PSO algorithm using (5.10) and (5.11)
achieves better results and has gained such wide recognition that (5.10)
and (5.11) are often regarded as the standard PSO algorithm.
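The linearly decreasing inertia weight can be tabulated with a few lines of Matlab. This sketch assumes the common form ω(t) = ωmax − (ωmax − ωmin)t/tmax, with ωmax = 0.9 and ωmin = 0.4 as in the custom PSO program below; tmax = 100 is an arbitrary example.

```matlab
% Linearly decreasing inertia weight (common form, assumed here).
omegaMax = 0.9; omegaMin = 0.4; tMax = 100;
t = 0:tMax;
omega = omegaMax - (omegaMax - omegaMin) * t / tMax;
fprintf('omega(0) = %.2f, omega(tMax/2) = %.2f, omega(tMax) = %.2f\n', ...
        omega(1), omega(51), omega(end));
```

A large early weight keeps particles moving and exploring; the smaller late weight damps their velocities so the swarm can refine the best region.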
(5.12)
Besides using the Matlab toolbox to solve the problem, we can also
write our own program for the PSO algorithm:
function [xbest, fbest, cvgef] = PSO(fhd, xdim, xlb, xub, maxFEs)
popsize = 40;
omiga = 1/(2*log(2)) * ones(1, popsize);
maxomiga = 0.9;
minomiga = 0.4;
c1 = 0.5 + log(2);
K = 3;
popu = zeros(popsize, xdim); fpopu = zeros(popsize, 1);
tfun = 0;
for inp = 1:popsize
popu(inp, :) = xlb + (xub - xlb).*rand(1, xdim);
t1 = tic;
fpopu(inp, 1) = feval(fhd, popu(inp, :));
tfun = tfun + toc(t1);
end
xbestidx = find(fpopu == min(fpopu));
if ~isempty(xbestidx)
xbestidx = xbestidx(end);
fbest = fpopu(xbestidx);
xbest = popu(xbestidx, :);
else
fbest = inf;
xbest = inf * ones(1, xdim);
end
vmin = -2*(xub-xlb); vmax = 2*(xub-xlb);   % velocity limits
vel = repmat(vmin, popsize, 1) + rand(popsize, xdim).* ...
    (repmat(vmax-vmin, popsize, 1));       % random initial velocities
pbest = popu; fpbest = fpopu;
neighbor = neighborSelection(popu, fpopu, K);
cvgef = nan * ones(1, maxFEs);
ieval = popsize;
igen = 1; imprFlag = 1;
idxf1 = 1; idxf2 = ieval; idxstat = 1; isidxstatupdated = -1;
while ieval < maxFEs
cvgef(idxf1:idxf2) = fbest;
isidxstatupdated = -1;
if imprFlag < 0
neighbor = neighborSelection(popu, fpopu, K);
imprFlag = 1;
end
[nbest, fnbest] = update_nbest(pbest, fpbest, neighbor);
popunew = nan * ones(popsize, xdim); fpopunew = nan * ones(popsize, 1);
velnew = nan * ones(popsize, xdim);
omiga = inertiaWeightAdjustment(omiga, minomiga, maxomiga, ...
    ieval, maxFEs);   % linearly decreasing inertia weight
for inp = 1:popsize
veltmp = c1 * rand(1,xdim).*(pbest(inp, :) - popu(inp, :)) + ...
    c1 * rand(1,xdim).*(nbest(inp, :) - popu(inp, :));   % cognitive + social terms
velnew(inp, :) = omiga(1,inp) * vel(inp, :) + veltmp;
for ix = 1:xdim
if velnew(inp, ix) < vmin(1, ix) || velnew(inp, ix) > vmax(1, ix)
% out-of-range velocity: reinitialize it uniformly in [vmin, vmax]
velnew(inp, ix) = vmin(1, ix) + rand(1) * (vmax(1,ix) - vmin(1,ix));
end
end
popunew(inp, :) = popu(inp, :) + velnew(inp, :);
for ix = 1:xdim
if popunew(inp, ix) < xlb(ix) || popunew(inp, ix) > xub(ix)
% out-of-bounds position: reinitialize it uniformly in [xlb, xub]
popunew(inp, ix) = xlb(ix) + rand(1)*(xub(ix) - xlb(ix));
end
end
end % for inp
t1 = tic;
for inp = 1:popsize
fpopunew(inp, 1) = feval(fhd, popunew(inp, :));
ieval = ieval + 1;
end
tfun = tfun + toc(t1);
popu = popunew; fpopu = fpopunew;
vel = velnew;
[pbest, fpbest] = update_pbest(popu, fpopu, pbest, fpbest);
igen = igen + 1;
[fbesttmp, idx] = min(fpopu);
if fbesttmp(1) < fbest
fbest = fbesttmp(1);
xbest = popu(idx(1), :);
imprFlag = 1;
else
imprFlag = -1;
end
idxf1 = idxf2 + 1;
idxf2 = ieval;
idxstat = idxstat + 1;
isidxstatupdated = 1;
end
if isidxstatupdated < 0
idxf1 = idxf2 + 1;
idxf2 = maxFEs;
idxstat = idxstat + 1;
end
if idxf2 > maxFEs
idxf2 = maxFEs;
end
cvgef(idxf1:idxf2) = fbest;
end
The above program is the main program of the PSO algorithm. The
best particle within each particle's neighborhood (the local best) is
computed by the following function:
function [nbest, fnbest] = update_nbest(popu, fpopu, neighbor)
[np, xdim] = size(popu);
nbest = nan * ones(np, xdim); fnbest = nan * ones(np, 1);
for irow = 1:np
nidx = neighbor(irow, :) > 0.5;
ftmp = fpopu(nidx); xtmp = popu(nidx, :);
[fnbest(irow), bestidx] = min(ftmp);
nbest(irow, :) = xtmp(bestidx, :);
end
end
The neighborhood of each particle is generated by the following
function, which links each particle to itself and to K other randomly
chosen particles:
function neighbor = neighborSelection(popu, fpopu, K)
np = size(popu, 1);
neighbor = eye(np,np);
for irow = 1:np
nidx = randperm(np);
nidx(nidx == irow) = [];
neighbor(irow, nidx(1:K)) = 1;
end
end
After moving, each particle has to record its personal best position.
The function is:
function [pbest, fpbest] = update_pbest(popu, fpopu, pbest, fpbest)
for irow = 1:length(fpbest)
if fpopu(irow) < fpbest(irow)
pbest(irow, :) = popu(irow, :);
fpbest(irow) = fpopu(irow);
end
end
end
The linearly decreasing strategy of the inertia factor is implemented by:
function omiga = inertiaWeightAdjustment(omiga, minomiga, ...
    maxomiga, ieval, maxFEs)
for irow = 1:length(omiga)
% decreases linearly from maxomiga (ieval = 1) to minomiga (ieval = maxFEs)
omiga(irow) = ((maxFEs - ieval) * (maxomiga - minomiga)) / ...
    (maxFEs - 1) + minomiga;
end
end
If we want to use the custom PSO algorithm to solve the sphere
function problem, the program is as follows:
rng(1);                       % fix the random seed
fun = @functionSphere;        % objective function (sphere)
nvars = 2;
lb = [-100, -100];
ub = [100, 100];
maxFEs = 3000;                % budget of function evaluations
[xbest, fbest, cvgef] = PSO(fun, nvars, lb, ub, maxFEs);
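The script relies on a function named functionSphere, whose definition does not appear in this section. A minimal sketch, assuming the usual sphere function f(x) = Σ xi² with minimum 0 at the origin, is written below as an anonymous function so that it runs as a script; saved as functionSphere.m it would read `function f = functionSphere(x), f = sum(x.^2); end`. With the anonymous form, pass `fun = functionSphere;` instead of `@functionSphere`.

```matlab
% HYPOTHETICAL definition of the sphere function used above.
% Each row of x is one candidate solution; PSO passes a single row.
functionSphere = @(x) sum(x.^2, 2);
f0 = functionSphere([0 0]);           % global minimum value
f1 = functionSphere([3 -4; 1 1]);     % two candidates evaluated at once
```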
After running the above program, the best solution output by the
algorithm is approximately (0, 0), and the corresponding objective
function value is close to 0. This agrees with the result obtained using
the Matlab toolbox.
The advantage of a custom program is its flexibility. For example, the
variable cvgef returned by the above program records the best
objective function value found so far after each function evaluation,
and it can be used to plot the convergence curve of the algorithm, as
shown in Fig. 5.11. In this figure, the vertical axis uses a logarithmic
scale, which makes the convergence process easier to observe.
Fig. 5.11 Convergence curve of the PSO algorithm on Example 5.3
Example 5.4 Suppose there are n = 40 cities within the map boundary,
the traveler needs to traverse all cities once and only once and return to
the initial city location. The geographic locations of these 40 cities are
known and the distances between the cities are given. Write a program
for the PSO algorithm to solve the TSP problem, and draw a graph to
analyze the results.
It can be seen from Fig. 5.12 that the best route output by the PSO
algorithm is poor: the route contains long straight-line segments. In
fact, the total length of the route output by the PSO algorithm is about
7.43, which is larger than the results given by the GA and the ACO
algorithms.
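The text does not say how the continuous PSO positions are turned into routes. One common option, assumed here for illustration, is random-key decoding: sort the position vector and read the sorted indices as the visiting order. The Matlab sketch below decodes a hypothetical 5-city position and computes the closed-route length from hypothetical coordinates.

```matlab
% Random-key decoding (an assumed encoding, not necessarily the one
% used for Fig. 5.12): the city with the k-th smallest key is visited k-th.
position = [0.42 -1.3 2.7 0.05 0.9];    % HYPOTHETICAL particle position
[~, route] = sort(position);            % visiting order of the 5 cities
xy = [0 0; 1 0; 1 1; 0 1; 0.5 0.5];     % HYPOTHETICAL city coordinates
nxt = route([2:end 1]);                 % successor of each city, closing the loop
len = sum(sqrt(sum((xy(route, :) - xy(nxt, :)).^2, 2)));   % route length
fprintf('route = %s, length = %.4f\n', mat2str(route), len);
```

Under this kind of encoding, many positions decode to the same route, while a small move can reorder several cities at once; the resulting landscape of plateaus and jumps is one plausible reason continuous optimizers such as PSO fall behind GA and ACO on the TSP.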
The convergence curve of the PSO algorithm is given in Fig. 5.13.
The solid line in the figure indicates the shortest route distance in the
particle swarm at each generation, while the dashed line indicates the
average route distance of the swarm at each generation. The shortest
distance decreases gradually and levels off after about 200 generations,
indicating that the PSO algorithm is close to convergence. However,
compared with the GA and ACO algorithms, the output of the PSO
algorithm is poor, which to some extent suggests that the PSO
algorithm, originally designed for continuous spaces, is not well suited
to discrete optimization problems. The performance of the PSO
algorithm on such problems needs further improvement.
Fig. 5.13 Convergence curve of the PSO algorithm for solving the TSP
(5.16)
After all the trial vectors have been generated, their function values
are calculated and then the survivor selection operation is performed.
The survivor selection operation generates a new population of Np
individuals, denoted as $G_{g+1} = \{x_{1,g+1}, x_{2,g+1}, \ldots, x_{N_p,g+1}\}$.
To generate $x_{i,g+1}$, greedy selection is performed between the trial
vector $u_{i,g}$ and the target vector $x_{i,g}$. Mathematically, the
selection is given by:
$$x_{i,g+1} = \begin{cases} u_{i,g}, & f(u_{i,g}) \le f(x_{i,g}) \\ x_{i,g}, & \text{otherwise} \end{cases} \qquad (5.17)$$
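A minimal Matlab sketch of one DE generation, assuming the classic DE/rand/1/bin mutation and binomial crossover of Storn and Price for Eq. (5.16), followed by the greedy survivor selection of Eq. (5.17). The sphere objective and the settings Np = 20, F = 0.5 and CR = 0.9 are hypothetical, and bound handling is omitted for brevity.

```matlab
% One generation of DE/rand/1/bin with greedy selection (assumed scheme).
rng(1);
f = @(x) sum(x.^2, 2);                % HYPOTHETICAL objective (minimize)
Np = 20; D = 5; F = 0.5; CR = 0.9;
lb = -100; ub = 100;
X = lb + (ub - lb) * rand(Np, D);     % current population G_g
fX = f(X);
U = X;                                % trial vectors u_{i,g}
for i = 1:Np
    idx = setdiff(1:Np, i);
    r = idx(randperm(Np - 1, 3));     % three distinct indices, all ~= i
    v = X(r(1), :) + F * (X(r(2), :) - X(r(3), :));   % mutant vector
    mask = rand(1, D) <= CR;          % binomial crossover mask
    mask(randi(D)) = true;            % take at least one gene from v
    U(i, mask) = v(mask);
end
fU = f(U);
keep = fU <= fX;                      % greedy selection, Eq. (5.17)
Xnext = X;
Xnext(keep, :) = U(keep, :);          % new population G_{g+1}
fNext = min(fU, fX);
```

Because each target vector can only be replaced by a trial vector that is at least as good, the best objective value in the population never gets worse from one generation to the next.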
Example 5.6 Suppose there are n = 40 cities within the map boundary,
the traveler needs to traverse all cities once and only once and return to
the initial city location. Write a program for the DE algorithm to solve
the TSP problem, and draw a graph to analyze the results.
After the above program is run, the optimal value and optimal solution
of the DE algorithm are output, as shown in Fig. 5.16.
Fig. 5.16 Optimal route of the DE algorithm for Example 5.6
It can be seen from Fig. 5.16 that the best route output by the DE
algorithm is poor: the route contains long straight-line segments. In
fact, the total length of the best route output by the DE algorithm is
about 9.92, which is greater than the results given by the GA, ACO and
PSO algorithms.
The convergence curve of the DE algorithm for Example 5.6 is given
in Fig. 5.17. The solid line in the figure indicates the shortest route
distance of the population at each generation, while the dashed line
indicates the average route distance of the population at each
generation. In Fig. 5.17, the number of iterations is the number of
generations. The shortest route distance decreases gradually and
levels off after about 200 generations, indicating that the DE algorithm
is close to convergence. However, compared with the GA, ACO and PSO
algorithms, the output of the DE algorithm is poor, which to some
extent indicates that the DE algorithm is not well suited to discrete
optimization problems. The DE algorithm needs further improvement
to better solve the TSP.
Fig. 5.17 Convergence curve of the DE algorithm for Example 5.6
Exercises
(1)
Choose a set of optimization test functions and at least two
evolutionary computing methods. Write programs, tune the
parameters of the chosen methods to solve the test functions, and
compare their performance in terms of the number of iterations
needed to converge, the best solution found, and the stability of the
methods over multiple runs.
(2) Suppose there is a traveler who wants to visit 31 provincial
capitals across the country, and the traveler needs to choose the
route to be taken. The coordinates of the 34 cities in the country
are given in Table 5.5 as follows:
Table 5.5 The coordinates of the 34 cities
References
1. Katoch S, Chauhan SS, Kumar V (2021) A review on genetic algorithm: past, present, and
future. Multimedia Tools Appl 80:8091–8126. https://doi.org/10.1007/s11042-020-10139-
6
[Crossref]
3. Deneubourg JL, Aron S, Goss S, Pasteels JM (1990) The self-organizing exploratory pattern of
the argentine ant. J Insect Behav 3:159–168
[Crossref]
6. Storn R, Price K (1997) Differential evolution—a simple and efficient heuristic for global
optimization over continuous spaces. J Global Optim 11(4):341–359
[MathSciNet][Crossref][zbMATH]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
X. Zhang et al., Intelligent Information Processing with Matlab
https://doi.org/10.1007/978-981-99-6449-9_6
Xin Zhang
Wei Wang
Abstract
Evolutionary computing is a collection of evolutionary algorithms, and
different algorithms have different properties. For example, the
genetic algorithm is suitable for discrete optimization problems, while
the differential evolution algorithm is suitable for continuous
optimization problems. The strengths and weaknesses of such
algorithms have to be studied and tested on well-known optimization
problems. This chapter presents a test set of traveling salesman
problems and a test set of continuous optimization problems, and
introduces evaluation metrics for comparing and analyzing
evolutionary algorithms. Then, this chapter introduces two recent
evolutionary computing methods: the artificial bee colony algorithm
and the fireworks algorithm. Finally, the state-of-the-art research
progress of evolutionary computing is presented.
6.1 Test Set of Traveling Salesman Problem
The traveling salesman problem (TSP) is a typical discrete optimization
problem that has been studied extensively by researchers in recent
years. In the previous chapter, we gave a case study of using
evolutionary computing (EC) methods to solve the TSP. In this section
we will give a test set of traveling salesman problems (TSPs). The test
set contains seven problems with increasing difficulty. The seven
problems are well suited for testing the performance of optimization
methods.
Figure 6.1 shows the city distribution of the 1st instance of the test
set. In Fig. 6.1, the horizontal axis is east longitude and the vertical axis
is north latitude. As can be seen from the figure, the cities are
scattered: some are far apart and others are close together.
Fig. 6.1 City distribution map of TSPCNProblem1
Figure 6.2 shows the city distribution of the 7th instance in the test
set. As can be seen from the figure, the distribution of cities is uneven:
some are far apart and others are close together. In particular, the
cities in the lower right are more densely distributed, while those in the
upper left are more scattered.
Fig. 6.2 City distribution map of TSPCNProblem7
Figures 6.1 and 6.2 show the city distribution maps for the 1st and
7th instances. The maps for the remaining instances are not given
because their distribution characteristics lie between those of these two
instances. The cities of TSPCNProblem1 are a subset of the cities of
TSPCNProblem7, which contains more cities.
We stored the north latitude and east longitude coordinates as “csv”
files with the corresponding names. The above 7 TSP instances can be
solved using EC methods. Because EC methods are stochastic, each
method is run independently 31 times, and the best route and the
shortest route length of each run are saved as “csv” files, with the data
in each file separated by commas.
As can be seen in Fig. 6.3, the route found by the GA for the 1st
instance has crossing (overlapping) segments. This indicates that the
GA did not find the optimal solution for this instance. Interested
readers can try to find better routes by adjusting the hyperparameters
of the GA.
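For Euclidean distances with cities in general position, an optimal route never crosses itself: replacing two crossing edges by the uncrossed pair (a 2-opt move) always shortens the route, so a crossing proves that a route is not optimal. The Matlab sketch below tests every pair of non-adjacent route segments with the standard orientation test; the coordinates and routes are hypothetical, and touching or collinear segments are not counted as crossings.

```matlab
% Detect crossing segments in closed routes. Coordinates and routes
% are HYPOTHETICAL; touching/collinear segments are not counted.
xy = [0 0; 1 0; 1 1; 0 1];                 % city coordinates
orient = @(a, b, c) sign((b(1)-a(1))*(c(2)-a(2)) - (b(2)-a(2))*(c(1)-a(1)));
crosses = @(p1, p2, q1, q2) orient(p1,p2,q1) * orient(p1,p2,q2) < 0 && ...
                            orient(q1,q2,p1) * orient(q1,q2,p2) < 0;
routes = {[1 2 3 4], [1 3 2 4]};           % a square route and a crossed one
hasCrossing = false(1, numel(routes));
for r = 1:numel(routes)
    t = routes{r}; n = numel(t);
    for a = 1:n
        for b = a+2:n
            if a == 1 && b == n, continue; end   % adjacent via wrap-around
            p1 = xy(t(a), :); p2 = xy(t(mod(a, n) + 1), :);
            q1 = xy(t(b), :); q2 = xy(t(mod(b, n) + 1), :);
            if crosses(p1, p2, q1, q2)
                hasCrossing(r) = true;
            end
        end
    end
end
disp(hasCrossing);
```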
The results of the GA on all instances of the TSPCN test set are given
in Table 6.2. The table lists the length of the best route found in each of
31 independent runs of the GA. In Table 6.2, columns 2 through 8 give
the results for the 1st through 7th instances, and each row from row 2
onward gives the length of the best route obtained in one independent
run.
Table 6.2 Route length found by GA on TSPCN test set
(6.1)
(6.2)
(6.3)
(6.5)
The fourth base function is also known as the HGBat function. The
fifth base function is:
(6.6)
(6.7)
The sixth base function is also known as the Griewank function. The
seventh base function is:
(6.8)
(6.9)
(6.11)
(6.13)
(6.14)
(6.15)
(6.16)
(6.17)
The tenth base function is also known as the Lunacek bi-Rastrigin
function. The eleventh base function is:
(6.18)
where:
(6.19)
(6.20)
The eleventh base function is also known as the modified Schwefel
function. The twelfth base function is:
(6.21)
where:
(6.22)
(6.24)
Example 6.2 Suppose we use the PSO algorithm to solve the functions
in the CEC2020 test set with D = 10, 15 and 20. Please analyze the
performance of the PSO algorithm.
Solution The CEC2020 test set has been introduced in Sect. 6.2. The
PSO algorithm has been introduced in Sect. 5.5. In this example, we use
the PSO algorithm on the CEC2020 test set, and the results for D = 10
are given in Table 6.6.
Table 6.6 Results of the PSO algorithm on the CEC2020 test set with D = 10
Table 6.6 gives the error between the best solution found by the PSO
algorithm and the optimal value of the corresponding function. In Table
6.6, the second column shows the minimum error (min) for each
function; the third column shows the maximum error (max); the fourth
column shows the median error (med); the fifth column shows the
mean error; and the sixth column shows the standard deviation (std) of
the error. These results show that the PSO algorithm is able to find
near-optimal solutions for problems f4, f6, f7 and f8. However, the
PSO algorithm does not find the optimal solution in every run. The
standard deviation of the method is particularly large for problems f1
and f5, which indicates either that the method is not suitable for such
problems or that its performance on them is unstable.
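The five statistics in Tables 6.6 to 6.8 can be computed from the per-run errors as in the Matlab sketch below. The optimum and the per-run best values are hypothetical, and only five runs are used instead of 31. Note that Matlab's std normalizes by n − 1 by default.

```matlab
% Summary statistics of the error over independent runs.
% The optimum and the per-run best values are HYPOTHETICAL.
fopt = 100;                                    % known optimal value
fbestRuns = [100.5 101.0 100.2 103.3 100.0];   % best value of each run
err = fbestRuns - fopt;                        % error = best found - optimum
stats = [min(err), max(err), median(err), mean(err), std(err)];
fprintf('min %.3g  max %.3g  med %.3g  mean %.3g  std %.3g\n', stats);
```

Reporting the median next to the mean helps here: a single bad run (error 3.3) pulls the mean up to 1.0, while the median stays at 0.5.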
Similarly, Tables 6.7 and 6.8 give the results of the PSO algorithm on
the CEC2020 test set for D = 15 and D = 20, respectively. As the tables
show, for function f1 the PSO algorithm finds a better value with
D = 20 than with D = 15, whereas for functions f5 and f7 it finds better
values with D = 15 than with D = 20. Hence, no consistent trend can be
derived from the results for these two dimensions, and each function
must be analyzed separately for different values of D.
Table 6.7 Results of the PSO on the CEC2020 test set with D = 15
Table 6.8 Results of the PSO algorithm on the CEC2020 test set with D = 20
In Table 6.9, the second column shows the running time T0, which is
the time consumed to run the following programs:
t0 = tic;
x = 0.55;
for i = 1:1000000
x = x + x;
x = x / 2;
x = x * x;
x = sqrt(x);
x = log(x);
x = exp(x);
x = x / (x + 2);
end
T0 = toc(t0);
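T0 thus measures the time needed for a fixed number of basic
arithmetic operations and elementary function evaluations, which
serves as a baseline for the speed of the computer. The running time T1
in the third column of Table 6.9 is the time spent running the following
program: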
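t0 = tic;
D = 20;
rng('shuffle');
xlb = -100;
xub = 100;
fhd = str2func('cec20_func');        % CEC2020 benchmark function
fid = 7;                             % function f7
x = xlb + rand(200000,D)*(xub-xlb);  % 200,000 random candidate solutions
fx = feval(fhd, x', fid);            % evaluate all candidates at once
T1 = toc(t0);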
The third-to-last and second-to-last lines of the above program
evaluate the function f7. They are written in a matrix fashion: a
number of candidate solutions are first created to form a matrix, each
row of which is a candidate solution, and all the candidate solutions are
then evaluated at once. An EC method may instead evaluate candidate
solutions one at a time, using vector or scalar code, which is also
valid. Note, however, that the matrix approach runs faster, so readers
may obtain different complexity values with different programming
approaches.
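To see the effect of the programming approach, the Matlab sketch below evaluates the same candidate solutions once row by row in a loop and once as a whole matrix. The objective is a hypothetical shifted sphere function rather than cec20_func, and the measured times depend on the machine, so no particular speed-up is claimed.

```matlab
% Loop-based versus matrix-based evaluation of the same candidates.
% The objective is a HYPOTHETICAL stand-in, not CEC2020 f7.
rng(1);
N = 20000; D = 20;
X = -100 + 200 * rand(N, D);          % one candidate solution per row
f = @(X) sum((X - 1).^2, 2);          % evaluates every row at once
tic;
fLoop = zeros(N, 1);
for i = 1:N
    fLoop(i) = f(X(i, :));            % one candidate per call
end
tLoop = toc;
tic;
fVec = f(X);                          % all candidates in a single call
tVec = toc;
fprintf('loop: %.4f s, matrix: %.4f s\n', tLoop, tVec);
```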
The fourth column shows the running time T2, which is computed by the
following program:
T2 = zeros(1,5);
for jrun = 1:length(T2)
t0 = tic;
FuncOpt = [100,1100,700,1900,1700,1600,2100,2200,2400,2500];  % optima of f1-f10
VTR = 1e-8;                          % value-to-reach threshold
D = 20;
MFE = 200000;                        % maximum number of function evaluations
rng('shuffle');
xlb = -100;
xub = 100;
popsize = 100;
fhd = str2func('cec20_func');
fid = 7;
[tmp1,tmp2,tmp3,tmp4] = PSO(fhd,D,popsize,MFE,xlb,xub, ...
    FuncOpt(fid),VTR,fid);
T2(jrun) = toc(t0);
end
T2 = mean(T2);
The above program runs the PSO algorithm independently 5 times on
function f7, and the 5 running times are averaged to obtain T2.
The last column in Table 6.9 shows the computational complexity of
the PSO method, which is computed by:
$$\frac{T_2 - T_1}{T_0} \qquad (6.25)$$
where the ratio measures the computational complexity of the PSO
algorithm relative to the speed of the computer. If other EC methods
are used, it is only necessary to replace the PSO algorithm with the
other method. Thus, the complexity of any EC method can be computed
using (6.25).
In addition to the above measurement of computational complexity,
the complexity of an EC method can also be analyzed theoretically. Take
the PSO algorithm as an example. Suppose that the cost of the random
number generator is ignored and only the additions, subtractions and
multiplications of the PSO algorithm are counted. Let the population
size of the PSO algorithm be Np, the dimension be D, and the number of
generations be Ng. From Eq. (5.8), the velocity update of each particle
requires 4D additions or subtractions and 4D multiplications. From Eq.
(5.9), the position update of each particle requires D additions.
Therefore, each particle update requires about 9D arithmetic
operations. Since the PSO algorithm updates every particle in each
generation, each generation requires about 9DNp arithmetic
operations, and all generations together require about 9DNpNg
arithmetic operations. Therefore, the theoretical computational
complexity of the PSO algorithm is O(DNpNg).
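As a worked example of the estimate above, the Matlab lines below count the operations for the settings of the T2 program (D = 20, population size 100, MFE = 200,000), assuming one function evaluation per particle per generation so that Ng = MFE/Np.

```matlab
% Operation count of PSO using the settings of the T2 program.
D = 20; Np = 100;
maxFEs = 200000;                  % function-evaluation budget (MFE)
Ng = maxFEs / Np;                 % one evaluation per particle per generation
opsPerParticle = 4*D + 4*D + D;   % velocity add/sub + velocity mult + position add
opsTotal = opsPerParticle * Np * Ng;   % about 9*D*Np*Ng
fprintf('Ng = %d, total operations = %.3g\n', Ng, opsTotal);
```

This gives Ng = 2000 and about 3.6 × 10^7 arithmetic operations, which is small next to the cost of the function evaluations themselves for a benchmark such as f7.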
Readers can carry out their own theoretical analysis of the
computational complexity of other EC methods such as GA and ACO.
Most EC methods have a theoretical computational complexity of at
least O(DNpNg). This conclusion may be debatable, but it at least
provides a scheme for analyzing the complexity of EC methods.
Solution The CEC2020 test set was introduced in Sect. 6.2, and the DE
algorithm was introduced in Sect. 5.6. The classical DE algorithm is
designed for continuous optimization problems, so in this example we
apply it to the CEC2020 test set. The results are given in Tables 6.10,
6.11 and 6.12.
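For readers who want to try this kind of experiment outside MATLAB, the following is a minimal Python sketch of the classical DE/rand/1/bin scheme. It is an illustration, not the code behind Tables 6.10 to 6.12: F = 0.5, CR = 0.9, the population size, clipping to the bounds and the in-place (asynchronous) population update are assumptions, and Sect. 5.6 may use a different variant or parameters. The sphere function in the demo stands in for a CEC2020 function.

```python
import random


def de_rand_1_bin(f, dim, lb, ub, pop_size=30, F=0.5, CR=0.9,
                  max_evals=20_000, seed=None):
    """Classical DE/rand/1/bin for minimization within [lb, ub]^dim (sketch)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    evals = pop_size
    while evals < max_evals:
        for i in range(pop_size):
            if evals >= max_evals:
                break
            # Mutation: three distinct random vectors, all different from x_i.
            r1, r2, r3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for j in range(dim):
                if j == j_rand or rng.random() < CR:  # binomial crossover
                    v = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                    trial.append(min(max(v, lb), ub))  # clip to the search range
                else:
                    trial.append(pop[i][j])
            f_trial = f(trial)
            evals += 1
            # Greedy selection between the trial vector and the target vector.
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


if __name__ == "__main__":
    def sphere(x):  # stand-in for a CEC2020 function
        return sum(v * v for v in x)

    x_best, f_best = de_rand_1_bin(sphere, dim=10, lb=-100.0, ub=100.0, seed=1)
    print(f"best objective value: {f_best:.3e}")
```

The index j_rand ensures that every trial vector differs from its target in at least one coordinate, even when CR is small.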
Table 6.10 Results of the DE algorithm on the CEC2020 test set with D = 10
Table 6.11 Results of the DE algorithm on the CEC2020 test set with D = 15
Table 6.12 Results of the DE algorithm on the CEC2020 test set with D = 20
Tables 6.10, 6.11 and 6.12 show the results on the CEC2020 test set
with D = 10, 15 and 20, respectively. As the tables show, the DE
algorithm finds the optimal solution of function f1 in all three
dimensional settings. In addition, although it cannot find the optimal
solution of function f7, the DE algorithm performs better than the PSO
algorithm on this function.
Tables 6.10, 6.11 and 6.12 also show that the standard deviation of
the DE algorithm on each function is small. This indicates that the DE
algorithm is stable: the best solutions found in many independent runs
differ little from one another. For some problems, a single run of the
method may therefore be enough to find an acceptable solution, without
repeating it many times.
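The stability argument rests on the spread of the final errors over independent runs. A minimal Python sketch of that summary follows. `summarize_runs` is a hypothetical helper; it assumes each run returns its final error f(x_best) - f(x*) and uses the sample standard deviation, whereas the tables may follow a different convention. The demo optimizer is plain random search, not DE.

```python
import statistics


def summarize_runs(run_once, n_runs=5):
    """Call run_once(seed) for seeds 0..n_runs-1; return the mean and sample std of the errors."""
    errors = [run_once(seed) for seed in range(n_runs)]
    return statistics.mean(errors), statistics.stdev(errors)


if __name__ == "__main__":
    import random

    def random_search_error(seed, evals=2_000, dim=5):
        # Hypothetical optimizer: best sphere value among random samples (optimum 0).
        rng = random.Random(seed)
        return min(sum(rng.uniform(-5, 5) ** 2 for _ in range(dim))
                   for _ in range(evals))

    mean_err, std_err = summarize_runs(random_search_error, n_runs=5)
    print(f"mean = {mean_err:.3e}, std = {std_err:.3e}")
```

A small standard deviation relative to the mean is what the text above interprets as good stability.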
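$$v_{i,j} = \begin{cases} x_{i,j} + \varphi\,(x_{i,j} - x_{r_1,j}), & j = j_1 \\ x_{i,j}, & \text{otherwise} \end{cases} \tag{6.26}$$

where $v_{i,j}$, $x_{i,j}$ and $x_{r_1,j}$ are the j-th elements of $v_i$, $x_i$ and
$x_{r_1}$, respectively; $\varphi \in [-1, 1]$ is a random number; and $j_1 \in [1, D]$ and
$r_1 \in [1, N_p]$ are random integers, with $r_1 \neq i$. $v_i$ is the newly generated
candidate solution. After $v_i$ is evaluated, a greedy selection is performed
between $v_i$ and $x_i$, and the winner is stored as the new $x_i$.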
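Eq. (6.26) changes only one coordinate of x_i. The following minimal Python sketch of that step and of the greedy selection assumes minimization and 0-based indices; `abc_candidate` and `greedy_select` are hypothetical names.

```python
import random


def abc_candidate(pop, i, rng):
    """Eq. (6.26): perturb one random dimension j1 of x_i using a random partner x_r1 (r1 != i)."""
    dim, n = len(pop[i]), len(pop)
    j1 = rng.randrange(dim)
    r1 = rng.randrange(n)
    while r1 == i:  # the partner must differ from x_i
        r1 = rng.randrange(n)
    phi = rng.uniform(-1.0, 1.0)
    v = list(pop[i])  # all other coordinates are copied unchanged
    v[j1] = pop[i][j1] + phi * (pop[i][j1] - pop[r1][j1])
    return v


def greedy_select(x, fx, v, fv):
    """Keep the better of x_i and v_i (smaller objective value wins)."""
    return (v, fv) if fv < fx else (x, fx)
```

Because only one coordinate moves per trial, each candidate stays close to its parent, which is why ABC relies on many trials per food source.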
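For i = 1 to Np
    For j = 1 to D
        x_{i,j} = xlb + rand(0, 1) · (xub − xlb);
    Evaluate x_i, compute its fitness, set trial_i = 0;
Set iter = Np;
While iter < MFE
    /* employed bee phase */
    For i = 1 to Np
        j1 = randInt(1, D);
        Do r1 = randInt(1, Np); while (r1 == i);
        φ = rand(−1, 1);
        For j = 1 to D
            If j == j1, v_{i,j} = x_{i,j} + φ · (x_{i,j} − x_{r1,j});
            Else v_{i,j} = x_{i,j};
        Evaluate v_i, compute its fitness, set iter = iter + 1;
        If the fitness of v_i is better than that of x_i
            Replace x_i by v_i;
            Replace f(x_i) by f(v_i), replace fit(x_i) by fit(v_i);
            trial_i = 0;
        Else trial_i = trial_i + 1;
    /* onlooker bee phase */
    Compute probability of food sources;
    For i = 1 to Np
        Select a food source x_s according to the probabilities;
        j1 = randInt(1, D);
        Do r1 = randInt(1, Np); while (r1 == s);
        φ = rand(−1, 1);
        For j = 1 to D
            If j == j1, v_{s,j} = x_{s,j} + φ · (x_{s,j} − x_{r1,j});
            Else v_{s,j} = x_{s,j};
        Evaluate v_s, compute its fitness, set iter = iter + 1;
        If the fitness of v_s is better than that of x_s
            Replace x_s by v_s;
            Replace f(x_s) by f(v_s), replace fit(x_s) by fit(v_s);
            trial_s = 0;
        Else trial_s = trial_s + 1;
    /* scout bee phase */
    For i = 1 to Np
        If trial_i > limit
            For j = 1 to D
                v_{i,j} = xlb + rand(0, 1) · (xub − xlb);
            Evaluate v_i, compute its fitness, set iter = iter + 1;
            Replace x_i by v_i;
            Replace f(x_i) by f(v_i), replace fit(x_i) by fit(v_i);
            trial_i = 0;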
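The three phases of the ABC listing can be combined into a compact, runnable Python sketch. It follows the employed, onlooker and scout structure, but several details are assumptions rather than the book's exact settings. It uses the common ABC fitness transform 1/(1+f) for f ≥ 0 and 1+|f| otherwise for the onlooker probabilities, and onlookers choose food sources by roulette-wheel selection. Candidate coordinates are clipped to the bounds, the best solution found is memorized separately, and `limit`, the number of food sources and the evaluation budget are illustrative values.

```python
import random


def abc_minimize(f, dim, lb, ub, n_food=20, limit=100, max_evals=20_000, seed=None):
    """Minimal artificial bee colony sketch (minimization)."""
    rng = random.Random(seed)

    def random_solution():
        return [rng.uniform(lb, ub) for _ in range(dim)]

    def fitness(value):
        # Common ABC fitness transform for minimization (assumed here).
        return 1.0 / (1.0 + value) if value >= 0 else 1.0 + abs(value)

    foods = [random_solution() for _ in range(n_food)]
    vals = [f(x) for x in foods]
    trials = [0] * n_food
    evals = n_food
    best_i = min(range(n_food), key=vals.__getitem__)
    best_x, best_f = list(foods[best_i]), vals[best_i]

    def record(x, fx):
        nonlocal best_x, best_f
        if fx < best_f:
            best_x, best_f = list(x), fx

    def try_improve(i):
        """Neighbour search as in Eq. (6.26), then greedy selection."""
        nonlocal evals
        j1 = rng.randrange(dim)
        r1 = rng.choice([k for k in range(n_food) if k != i])
        phi = rng.uniform(-1.0, 1.0)
        v = list(foods[i])
        v[j1] = min(max(v[j1] + phi * (v[j1] - foods[r1][j1]), lb), ub)
        fv = f(v)
        evals += 1
        record(v, fv)
        # Comparing objective values is equivalent to comparing fitness,
        # since the transform decreases monotonically, and avoids saturation near 0.
        if fv < vals[i]:
            foods[i], vals[i], trials[i] = v, fv, 0
        else:
            trials[i] += 1

    while evals < max_evals:  # the last cycle may overshoot the budget slightly
        for i in range(n_food):  # employed bee phase
            try_improve(i)
        fits = [fitness(val) for val in vals]  # onlooker bee phase
        for _ in range(n_food):
            try_improve(rng.choices(range(n_food), weights=fits)[0])
        for i in range(n_food):  # scout bee phase
            if trials[i] > limit:
                foods[i] = random_solution()
                vals[i] = f(foods[i])
                evals += 1
                record(foods[i], vals[i])
                trials[i] = 0
    return best_x, best_f
```

Memorizing the best solution separately matters because the scout phase may abandon a food source that has merely stopped improving.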
(6.27)
(6.28)
Solution The CEC2020 test set was introduced in Sect. 6.2. The ABC
algorithm can solve continuous optimization problems, so we use it to
solve the functions in the CEC2020 test set. The results are shown in
Tables 6.13, 6.14 and 6.15 for D = 10, 15 and 20, respectively.
Table 6.13 Results of the ABC algorithm on CEC2020 test set with D = 10
Table 6.14 Results of the ABC algorithm on CEC2020 test set with D = 15
Table 6.15 Results of the ABC algorithm on CEC2020 test set with D = 20
From Tables 6.13, 6.14 and 6.15, it can be seen that the ABC
algorithm performs worse than both the DE and PSO algorithms on
functions f5 and f7. On the other hand, it performs better than the PSO
algorithm on function f1, and better than both the DE and PSO
algorithms on functions f8, f9 and f10. This indicates that each of the
three EC methods has its own strengths and weaknesses. From this
perspective, readers can try to combine the three algorithms, for
example, with an ensemble approach, to obtain an improved algorithm
that performs better on the CEC2020 test set.
6.5 Fireworks Algorithm
The Fireworks Algorithm (FWA) is an SI optimization method [2]
inspired by fireworks exploding in the night sky. Just as the sparks of a
firework illuminate the area around it, the FWA searches the region
around each candidate solution by simulating explosions that generate
sparks. The FWA was proposed by Tan of Peking University in 2010 [3].
The flow of the FWA is shown in Fig. 6.5. The algorithm first
generates N fireworks at random locations in the search space; each
firework corresponds to a feasible solution of the problem. Based on
fitness, resources are assigned to each firework to control its
explosion behavior: each firework is assigned an explosion amplitude
(radius) and the number of sparks it can produce. Each firework then
explodes, producing the corresponding number of sparks, and a Gaussian
mutation operation is applied to the generated sparks to improve the
search. Finally, a selection operation chooses N new firework locations
from three sets of feasible solutions: the fireworks, the explosion
sparks, and the Gaussian mutation sparks. These steps are repeated
until the algorithm terminates. We next describe in detail how each
step is computed.
Fig. 6.5 Flow chart of the fireworks algorithm
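The flow in Fig. 6.5 can be sketched in Python as follows. This is a simplified sketch, not the book's exact formulas. It assumes the spark-number and amplitude rules of the original FWA: better fireworks receive more sparks and a smaller amplitude, with about m sparks in total and maximum amplitude a_hat, and without the upper and lower bounds on the spark counts. The Gaussian mutation multiplies randomly chosen coordinates of an explosion spark by g ~ N(1, 1). The selection is simplified to keeping the best candidate and drawing the remaining N - 1 fireworks at random, instead of the distance-based selection. All parameter values are illustrative.

```python
import random


def fwa_minimize(f, dim, lb, ub, n_fireworks=5, m=50, a_hat=40.0,
                 n_gauss=5, max_evals=20_000, seed=None):
    """Simplified fireworks algorithm sketch (minimization)."""
    rng = random.Random(seed)
    eps = 1e-12  # avoids division by zero

    def clip(v):
        return min(max(v, lb), ub)

    def random_dims():
        return rng.sample(range(dim), rng.randint(1, dim))

    fireworks = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_fireworks)]
    values = [f(x) for x in fireworks]
    evals = n_fireworks
    while evals < max_evals:  # the last generation may overshoot the budget
        y_max, y_min = max(values), min(values)
        sum_worse = sum(y_max - y for y in values)
        sum_better = sum(y - y_min for y in values)
        explosion = []
        # Explosion: better fireworks get more sparks and a smaller amplitude.
        for x, y in zip(fireworks, values):
            n_sparks = max(1, round(m * (y_max - y + eps) / (sum_worse + eps)))
            amp = a_hat * (y - y_min + eps) / (sum_better + eps)
            for _ in range(n_sparks):
                s = list(x)
                h = amp * rng.uniform(-1.0, 1.0)  # one displacement per spark
                for j in random_dims():
                    s[j] = clip(s[j] + h)
                explosion.append((s, f(s)))
        evals += len(explosion)
        # Gaussian mutation: scale random coordinates of random explosion sparks.
        gaussian = []
        for _ in range(n_gauss):
            s = list(rng.choice(explosion)[0])
            g = rng.gauss(1.0, 1.0)
            for j in random_dims():
                s[j] = clip(s[j] * g)
            gaussian.append((s, f(s)))
        evals += n_gauss
        # Selection (simplified): keep the best, draw the rest at random.
        candidates = list(zip(fireworks, values)) + explosion + gaussian
        candidates.sort(key=lambda c: c[1])
        chosen = [candidates[0]] + rng.sample(candidates[1:], n_fireworks - 1)
        fireworks = [c[0] for c in chosen]
        values = [c[1] for c in chosen]
    best = min(range(n_fireworks), key=values.__getitem__)
    return fireworks[best], values[best]
```

Keeping the best candidate in every generation makes the best objective value non-increasing, while the random draw for the other fireworks preserves some diversity.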
(6.30)
(6.31)
(6.35)
Exercises
(1) The TSPCN test set includes seven TSP problems and was
introduced in Sect. 6.1. Try to use GA, PSO, DE, ABC or other EC
methods to solve the TSPCN test set, and compare the solutions
obtained by the different EC methods.
(2) Section 6.5 introduces the FWA. Try to use the FWA to solve the
CEC2020 test set, and analyze its performance.
References
1. Yue CT, Price KV, Suganthan PN, Liang JJ, Ali MZ, Qu BY, Awad NH, Biswas PP (2020)
Problem definitions and evaluation criteria for the CEC 2020 special session and
competition on single objective bound constrained numerical optimization. Technical
Report 201911, Computational Intelligence Laboratory, Zhengzhou University, Zhengzhou,
China, and Nanyang Technological University, Singapore
3. Li J, Tan Y (2020) A comprehensive review of the fireworks algorithm. ACM Comput Surv
52:1–28
4. Shi Y (2011) Brain storm optimization algorithm. In: Tan Y, Shi Y, Chai Y, Wang G (eds)
Advances in swarm intelligence. ICSI 2011. Lecture notes in computer science, vol 6728.
Springer, Berlin, Heidelberg, pp 303–309
6. Cheng S, Qin Q, Chen J et al (2016) Brain storm optimization algorithm: a review. Artif Intell
Rev 46:445–458
9. Liang Z, Qin Q, Zhou C (2022) An image encryption algorithm based on Fibonacci Q-matrix
and genetic algorithm. Neural Comput Applic 34:19313–19341.
https://doi.org/10.1007/s00521-022-07493-x
10. Nguyen TPQ, Kuo RJ, Le MD et al (2022) Local search genetic algorithm-based possibilistic
weighted fuzzy c-means for clustering mixed numerical and categorical data. Neural
Comput Applic 34:18059–18074. https://doi.org/10.1007/s00521-022-07411-1
11. Abbasi S, Rahmani AM, Balador A, Sahafi A (2023) A fault-tolerant adaptive genetic
algorithm for service scheduling in internet of vehicles. Appl Soft Comput 143:110413.
https://doi.org/10.1016/j.asoc.2023.110413
12. Luo Q, Wang H, Zheng Y et al (2020) Research on path planning of mobile robot based on
improved ant colony algorithm. Neural Comput Applic 32:1555–1566.
https://doi.org/10.1007/s00521-019-04172-2
13. Wu Z, Wu J, Zhao M et al (2021) Two-layered ant colony system to improve engraving
robot’s efficiency based on a large-scale TSP model. Neural Comput Applic 33:6939–6949.
https://doi.org/10.1007/s00521-020-05468-4
14. Yu J, You X, Liu S (2022) Dynamically induced clustering ant colony algorithm based on a
coevolutionary chain. Knowl-Based Syst 251:109231.
https://doi.org/10.1016/j.knosys.2022.109231
15. Shami TM, Mirjalili S, Al-Eryani Y et al (2023) Velocity pausing particle swarm
optimization: a novel variant for global optimization. Neural Comput Applic 35:9193–9223.
https://doi.org/10.1007/s00521-022-08179-0
17. Zhang X, Zhang X, Wu Z (2019) Spectrum allocation by wave based adaptive differential
evolution algorithm. Ad Hoc Netw 94:101969
18. Kumar R, Kumar P, Kumar Y (2022) Three stage fusion for effective time series forecasting
using Bi-LSTM-ARIMA and improved DE-ABC algorithm. Neural Comput Applic
34:18421–18437. https://doi.org/10.1007/s00521-022-07431-x
19. Zhang X, Zhang X, Han L (2019) An energy efficient internet of things network using restart
artificial bee colony and wireless power transfer. IEEE Access 7:12686–12695
20. Stephan P, Stephan T, Kannan R et al (2021) A hybrid artificial bee colony with whale
optimization algorithm for improved breast cancer diagnosis. Neural Comput Applic
33:13667–13691. https://doi.org/10.1007/s00521-021-05997-6
22. Satoh T, Nishizawa S, Nagase J et al (2023) Artificial bee colony algorithm-based design of
discrete-time stable unknown input estimator. Inf Sci 634:621–649.
https://doi.org/10.1016/j.ins.2023.03.130
23. Luo H, He C, Zhou J, Zhang L (2021) Rolling bearing sub-health recognition via extreme
learning machine based on deep belief network optimized by improved fireworks. IEEE
Access 9:42013–42026
24. Han S, Zhu K, Zhou M et al (2022) A novel multiobjective fireworks algorithm and its
applications to imbalanced distance minimization problems. IEEE/CAA J Automatica Sinica
9(8):1476–1489
25. Ma L, Cheng S, Shi Y (2021) Enhancing learning efficiency of brain storm optimization via
orthogonal learning design. IEEE Trans Syst Man Cybern Syst 51(1):6723–6742
26. Xue Y, Zhao Y, Slowik A (2021) Classification based on brain storm optimization with
feature selection. IEEE Access 9:16582–16590
27. Duan H, Zhao J, Deng Y, Shi Y, Ding X (2021) Dynamic discrete pigeon-inspired optimization
for multi-UAV cooperative search-attack mission planning. IEEE Trans Aerosp Electron Syst
57(1):706–720