
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2021.3110342, IEEE Access.

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000. Digital Object Identifier 10.1109/ACCESS.2017.Doi Number

The Direction Analysis on Trajectory of Fast Neural Network Learning Robot
Xiaohong Li 1,2,a,*, Maolin Li 1,b
1 School of Information Engineering, Shaoyang University, Shaoyang 422000, Hunan, China
2 Graduate School, Adamson University, Manila 1000, Philippines
*Corresponding author. a Email: [email protected]; b Email: [email protected]
Abstract: The aim of this study is to apply artificial neural networks (ANN) efficiently to robotics and to provide an experimental basis for mobile robots to learn optimal trajectory planning strategies. An algorithm model based on the back propagation neural network (BPNN) and reinforcement learning (Q-Learning) is innovatively proposed by combining the designs of the motion space, the selective strategy, and the reward function. A simulation experiment environment is set up, and a ROS mobile robot is adopted for the simulation experiments. The proposed algorithm is compared with other neural network algorithms in terms of accuracy, precision, recall, and F1 value. The accuracy of the proposed algorithm is at least 5.47% higher than that of the model algorithms proposed by other scholars, and its precision, recall, and F1 values are at least 5.5% higher. The results show that the mobile robot can find the shortest and best trajectory in a discrete obstacle environment, regardless of whether the discrete obstacles are many or few and whether the space is large or small. Therefore, compared with the advanced model algorithms proposed by other scholars in related fields, the robot trajectory planning based on the improved BPNN combined with Q-Learning constructed in this study achieves better results, can be used in practical applications of robot trajectory planning, and provides practical value for the field of machine vision.

Keywords: artificial neural network; back propagation neural network; Q-Learning; mobile robot

1. INTRODUCTION

Many disciplines have made great progress under the rapid development of science and technology. Driven by academic and industrial needs, the integration of multidisciplinary technologies is developing very quickly, and many excellent results have been obtained both in academia and in production practice [1, 2]. Robot-related technology is a representative of multidisciplinary technology integration. It integrates the research results of computers, sensors, and artificial intelligence (AI); it is the pinnacle of mechatronics achievements and can represent the high-tech level of a country [3]. The Stanford Research Institute first began to study the autonomous path planning capabilities of autonomous mobile robots in complex environments in the 1960s [4]. At present, the research and development of robots is an important development direction for countries around the world. China has clearly pointed out in its future development plan that robots, especially robots with autonomous mobility, will be included in the field of advanced manufacturing technology [5, 6]. Germany pointed out in 2013 that the future development of frontier industries should give priority to the combination of intelligent robots and man-machine systems. In 2018, Japan also proposed to include smart manufacturing and mobile robots among its five key development areas. The United States has likewise pointed out that research on robots and autonomous systems will be included in the next 20 technological trends. This means that, in the future, robotics will be a main direction of scientific and technological competition among countries in the world.

The movement trajectory planning of mobile robots is the core research point in this field. The trajectory planning of a mobile robot is mainly concerned with what kind of trajectory the robot follows, with or without a map [7]. The local movement trajectory planning of a mobile robot is a dynamic planning method, and its most important feature is that the mobile robot can realize real-time movement trajectory planning based on local environment information. However, the environment for local movement trajectory planning of mobile robots is unpredictable, and it is difficult to deal with all situations through experience alone. Therefore, it is necessary to introduce a self-learning function, so that the robot can navigate autonomously and avoid obstacles after a period of training [8, 9]. Reinforcement learning has been widely used in the local trajectory planning of mobile robots, and it has been proved through practice to be an effective algorithm for improving intelligent systems, including clustering algorithms [10]. Reinforcement learning takes "trial and error" behavior as its basis and uses the delayed return method to find the optimal action and obtain the best decision-making ability. The core feature of reinforcement learning is that it can learn online and update itself, which makes it one of the core technologies of path planning. Reinforcement learning has become more and more mature in algorithm theory and application by combining with algorithms and disciplines such as neural networks, intelligent control, and game theory [11, 12].


Q-Learning is the most commonly used reinforcement learning algorithm, but its convergence rate is relatively low [13]. The back propagation neural network (BPNN), a form of artificial neural network (ANN), shows excellent perception and computing capabilities, is good at nonlinear prediction and fitting, and can adjust the connection signal strength among neurons to learn knowledge of the external environment, showing a strong generalization ability. Therefore, BPNN has become an important calculation model for mobile robot motion behavior control [14]. Some researchers have combined the potential field method with ANN to improve the effect of movement trajectory planning of mobile robots in a dynamic environment [15], but this cannot solve a series of problems such as the slow convergence speed of ANNs represented by BPNN.

Based on the above, the additional momentum method is combined with the adaptive learning rate method to optimize the BPNN, and a trajectory planning algorithm based on the BPNN and Q-Learning is innovatively proposed, so as to provide an effective experimental basis for robots to learn the best trajectory planning strategy under various obstacle conditions and to support the further development of robotics.

2. PREVIOUS WORKS

2.1 Analysis on ANN
With the rapid development of science and technology, deep learning has been extensively studied in various fields. As the key content of deep learning, ANN currently occupies an important position in the field of robotics. Al-Qurashi and Ziebart (2020) [16] studied the use of Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) to optimize the trajectory of a robot, which is superior to other neural networks in position and direction. Peng et al. (2019) [17] proposed the use of the Radial Basis Function network (RBF network) to solve the uncertainty of the robot control model, and verified the effectiveness of the method.

2.2 Analysis on robot movement trajectory
In recent years, robots in human-like or animal-like form have attracted more and more attention from researchers. Prasetyo et al. (2019) [18] researched and discussed gait planning for a quadruped robot; the trajectory planning used in that case is linear translation and a sinusoidal gait trajectory, with no obstacles, just walking on flat terrain. Liu et al. (2019) [19] proposed a local trajectory planning method for ground service robots, which generates a feasible and comfortable trajectory while considering multiple stationary obstacles and path curvature constraints.

2.3 Literature review
Robot trajectory planning faces delays in avoiding obstacles, inaccurate path planning, and low model optimization efficiency. Existing ANN approaches are very weak when used in robot trajectory planning, so they are unable to accurately obtain the data set and unable to accurately establish a mathematical model suitable for robot trajectory planning. In view of the above shortcomings, most current research uses a single artificial neural network for robot trajectory planning. However, the algorithms for robot trajectory planning remain simple due to the lack of complete data sets and limited model training capabilities, so few researchers optimize and recombine the neural network algorithms.

3. THE TRAJECTORY PLANNING ALGORITHM BASED ON THE BPNN AND Q-LEARNING

3.1 Q-Learning
The mobile robot can optimize the task result by continuously interacting with the environment. When interacting with the environment with a certain action, the mobile robot will generate a new state under the effect of the motion and the environment, and will be rewarded immediately by the environment. During constant repetition, the mobile robot continuously interacts with the environment, generating a large amount of data. The Q-Learning algorithm optimizes its own action strategy through the generated data, and then interacts with the environment to generate new data, which can be adopted to further improve its own movement strategy. After many iterations of learning, the mobile robot finally learns the best action sequence to complete the corresponding task. Therefore, the theory of reinforcement learning is studied first in this work. Q-Learning is reward-guided behavior obtained by the agent through "trial and error" learning and interaction with the environment, aiming to maximize the reward for the agent. Q-Learning differs from supervised learning in connectionist learning mainly in the reinforcement signal. The reinforcement signal provided by the environment in reinforcement learning is used to judge the quality of the action; it does not tell the Q-Learning system how to generate the correct action, but gives an evaluation, usually a scalar signal. Because the external environment provides little information, the reinforcement learning system (RLS) must learn from its own experience. In this way, the RLS gains knowledge in the action-evaluation environment and improves the action plan to adapt to the environment.

Q-Learning [20] is a model-independent reinforcement learning algorithm, which directly optimizes a Q function that can be calculated iteratively. Its target strategy is the greedy strategy, and its action strategy is ε-greedy.
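The update rule itself is not written out here; purely as an illustration of the ε-greedy action strategy and greedy target strategy just described, the following minimal tabular Q-Learning sketch in Python assumes a hypothetical environment object `env` with `reset()` and `step()` methods — this interface and all hyper-parameter values are assumptions rather than details from the paper.

```python
import random

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.3, gamma=0.9, epsilon=0.1):
    """Minimal tabular Q-Learning sketch (illustrative only).

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done); this interface is an
    assumption, not part of the paper.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy action strategy
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[state][a])
            next_state, reward, done = env.step(action)
            # greedy target strategy: bootstrap from the best next action
            best_next = max(Q[next_state])
            Q[state][action] += alpha * (reward + gamma * best_next
                                         - Q[state][action])
            state = next_state
    return Q
```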


The algorithm steps of Q-Learning are shown in Figure 1 below:

FIGURE 1. The algorithm steps of Q-Learning (flowchart: initialization, parameters given, the initial state is given, the action in the current state is selected according to the greedy strategy, convergence of the algorithm, output of the final strategy).

After the algorithm is initialized and the parameters are set, actions are selected according to the strategy in the initial state, the rewards and the next state are received, the end state is reached after convergence, and the final strategy is output.

3.2 Q value function prediction model based on ANN
During Q-Learning, mobile robots show a slow convergence speed, but ANN shows a very fast perception and calculation speed and good nonlinear prediction and fitting, can adjust the connection signal strength among neurons, and can learn about the external environment. Therefore, the movement trajectory of the mobile robot is optimized based on the ANN.

A neural network is a complex non-linear network composed of a large number of simple non-linear units. It is a non-linear model that simulates the function of the human brain. Essentially, it is a model-independent adaptive function estimator. When the given input is not from the original training samples, the neural network can still give an appropriate output, that is, it has generalization ability. In the neural network, knowledge is distributed in the storage network through learning examples, so the neural network is fault-tolerant: when a single processing unit is damaged, it has little effect on the overall behavior of the neural network and does not affect the normal operation of the entire system. Because of its strong learning ability and nonlinear mapping ability, the neural network has been widely used in robot kinematics, dynamics, and control. In mobile robot navigation, it is mainly used for environment model representation, local planning, global planning, sensor information fusion, robot control systems, etc. The ANN is a nonlinear adaptive information processing system composed of a large number of interconnected processing units. It is proposed on the basis of the results of modern neuroscience research, trying to process information by simulating the processing and memory of the brain's neural network [21]. There are four basic characteristics of ANN, as shown in Table 1 below:

TABLE 1. BASIC CHARACTERISTICS OF ANN.
Non-linear: Artificial neurons are in two different states, activation or inhibition, which is a non-linear relationship in mathematics. Neural networks with thresholds have better performance and can improve fault tolerance and storage capacity.
Unrestricted: Neural networks are usually composed of many neurons. The overall behavior of a system depends not only on the characteristics of a single neuron, but also on the interaction and interconnection among the various units. The infinite functions of the brain can be simulated through a large number of connections among the units.
Non-constancy: ANN is capable of self-adaptation, self-organization, and self-learning. Not only will the information processed by the neural network undergo various changes, but the nonlinear dynamic system itself is also constantly changing. The iterative process is often used to describe the evolution of a dynamic system.
Non-convexity: The function has multiple extreme values, so the system has multiple stable equilibrium states, which leads to the diversity of system evolution.

ANN can realize an abstract analysis of the neural network of the human brain from the perspective of information processing and establish a simple model, forming different networks using different connection methods. It is a running model composed of many interconnected nodes (neurons). Each node (neuron) represents a specific output function, which is called the activation function [22]. Each connection between two nodes represents a weighted value for the signal passing through the connection, which is called the weight, being equivalent to the memory of the ANN. The output of the network varies with the connection mode, weights, and excitation function of the network. In essence, the network itself is usually an approximation of a specific algorithm or function, or an expression of a logic strategy. It is assumed that the input weight of neuron n is φ, the activation function is f, and the accumulation unit is m; then the output G of the neuron can be expressed as equation (1):

G_n = f( ∑_n φ_n x_n + m ),   (1)

The input weights in the above equation act on the sample outputs x (or sample inputs) of the upper layer of the network, so as to obtain the cumulative result by summation. Then, the nonlinear activation function f is adopted to obtain the response value. The activation function is a threshold analysis mechanism, which is activated and produces output only when the input exceeds a certain value, thus forming a neuron in the neural network [23].
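As a small worked illustration of equation (1) — not taken from the paper; the weights, inputs, bias, and threshold activation below are arbitrary example values — the output of a single neuron can be computed as follows.

```python
import numpy as np

def neuron_output(phi, x, m, f):
    """Compute G = f(sum_n(phi_n * x_n) + m) as in equation (1)."""
    return f(np.dot(phi, x) + m)

# Example with arbitrary illustrative values
phi = np.array([0.4, -0.2, 0.7])   # input weights
x = np.array([1.0, 0.5, 0.3])      # inputs from the previous layer
m = 0.1                            # accumulation (bias) term
step = lambda s: 1.0 if s > 0 else 0.0   # simple threshold activation
print(neuron_output(phi, x, m, step))    # -> 1.0, since 0.4 - 0.1 + 0.21 + 0.1 > 0
```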


The structure of the neuron activation function is shown in Figure 2 below:

FIGURE 2. The structure diagram of the neuron activation function (the weighted inputs φ0x0, φ1x1, and φ2x2 are accumulated and passed through the activation function to produce the output results).

The back propagation neural network (BPNN) is a type of ANN. The basic BPNN algorithm is composed of the forward propagation of signals and the backward propagation of errors. In other words, the output error is calculated in the direction from input to output, and the weights and thresholds are adjusted in the direction from output to input. In the forward propagation, the input signal acts on the output nodes through the hidden layer, and the output signal is generated through a nonlinear transformation. If the actual output is inconsistent with the expected output, the process switches to error back propagation. Error retransmission propagates the output errors back to the input layer through the hidden layer and distributes the errors to all units in each layer; the error signal of each layer is taken as the basis for adjusting the weight of each unit. The error is reduced along the gradient direction by adjusting the connection strength between the input nodes and the hidden nodes, the connection strength between the hidden nodes and the output nodes, and the thresholds. After repeated learning and training, the network parameters (weights and thresholds) corresponding to the minimum error are determined, and the training is stopped. At this time, the trained neural network can process the input information of similar samples on its own, as well as the nonlinear conversion information with the smallest output error [25, 26]. BPNN is capable of non-linear mapping and can approximate any continuous function. Based on this mapping ability, the collected state-actions are used to evaluate the Q-value function, and BPNN training is performed at the same time. Finally, a Q-value function prediction model is obtained from the multiple state-action data produced by the intensive Q-Learning, and the model is applied to predict new Q values. BPNN shows the disadvantages of a long learning time, slow convergence, and network training easily falling into local minima. In view of the above shortcomings, the additional momentum method is combined with the adaptive learning rate method [27] in this study, applying the advantages of both to improve the BPNN algorithm. The learning rate of the network is set to δ. If the deviation of the system feedback is gradually reduced, the next learning rate will increase; otherwise, the learning rate will decrease. If the network system is trained into the saturation area of the error surface, the variation of the error is very small; then the additional momentum term method can be expressed as equation (2) below:

δ = αQ / (1 − λκ),   (2)

In the above equation, α represents the learning rate, λ represents the momentum factor in the network system, Q is the network error, and κ refers to the allowable rebound error coefficient.

At this time, the adjustment process of the network connection weights of the system can be expressed as follows:

ω(t + 1) = −δ ∂Q/∂R + λ ω(t),   (3)

In the equation above, t represents time. Adjusting the online learning rate according to the unevenness of the deviation surface can improve the convergence of the BPNN algorithm and effectively suppress drastic changes in the error curve.
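The authors do not provide their optimizer code; the sketch below is only one plausible way to combine an additional momentum term with an adaptive learning rate in a single weight update, in the spirit of equations (2) and (3). The function name, the growth and decay factors, and the rebound coefficient value are illustrative assumptions, not the authors' exact rule.

```python
import numpy as np

def momentum_adaptive_update(w, grad, prev_delta, lr, prev_error, error,
                             momentum=0.9, lr_up=1.05, lr_down=0.7,
                             kappa=1.04):
    """One weight update combining momentum with an adaptive learning rate.

    Generic sketch: the learning rate grows when the error keeps falling
    and shrinks when the error rebounds by more than the allowed factor kappa.
    """
    if error < prev_error:
        lr *= lr_up            # deviation decreasing -> increase the rate
    elif error > kappa * prev_error:
        lr *= lr_down          # error rebounded too much -> decrease the rate
    delta = -lr * grad + momentum * prev_delta   # additional momentum term
    return w + delta, delta, lr

# Illustrative call with arbitrary values
w = np.zeros(3)
delta = np.zeros(3)
lr = 0.3
w, delta, lr = momentum_adaptive_update(w, np.array([0.2, -0.1, 0.05]),
                                        delta, lr, prev_error=1.0, error=0.8)
print(w, lr)
```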


In ANN, RNN and LSTM are also often used in robot trajectory planning. The difference among RNN, LSTM, and BPNN is that LSTM can only avoid the vanishing gradient of RNN but cannot combat gradient explosion, while BPNN shows strong self-learning and adaptive capabilities, generalization capabilities, nonlinear mapping capabilities, and fault tolerance, so it is more suitable for computer vision.

First of all, the prediction model of the BPNN combined with the Q-value function designed in this study is also composed of an input layer, a hidden layer, and an output layer, but the hidden layer here is structured as a single layer. The input of the neural network model is the environmental state variables A, B, C, and D perceived by the mobile robot, the state vector dimension u of the input layer is 5, and the output layer v is the 4 Q values corresponding to each action: Q(St,x1), Q(St,x2), Q(St,x3), and Q(St,x4). The number of neurons in the hidden layer can be calculated according to the numbers of neurons in the input layer and output layer, as follows:

N = √(u + v) + k,   (4)

In the above equation (4), u is the dimension of the state vector of the input layer, v refers to the output layer, and k represents a constant within [1, 10]. Therefore, the number of neurons in the hidden layer is in the range of [4, 13]. In order to maximize the training effect of the model, the number of neurons is selected as 13 in this study.

The structure of the Q value prediction model based on the BPNN is given as follows (Figure 3).

FIGURE 3. The structure of the Q value prediction model based on the BPNN (the inputs A, B, C, and D feed a single hidden layer, whose outputs are the four Q values Q(St,x1), Q(St,x2), Q(St,x3), and Q(St,x4)).

The input layer in the above model receives the environmental variables perceived by the robot, the output layer outputs the Q value of each action of the robot through the feature conversion of the hidden layer, and the number of neurons in the hidden layer can then be calculated.

The neural network layer contains many neurons, and the neurons are related to each other through weighting, forming an interconnected neural network structure. The most basic ANN consists of an input layer, a hidden layer, and an output layer [28]. The functional characteristics of each layer are shown in Table 2 below:

TABLE 2. THE FUNCTIONAL CHARACTERISTICS OF EACH LAYER OF THE ANN.
Input layer: It only receives information from the external environment and is composed of input units that can receive the various types of characteristic information in the sample. Each neuron in this layer is equivalent to an independent variable, which only transmits the information to the next layer without any calculation.
Hidden layer: Located between the input layer and the output layer, the hidden layer performs the data weighting calculation and links the input layer and the output layer through its function. Its calculation result is the input value of the output layer.
Output layer: The output layer obtains the calculation result, and each output unit corresponds to a specific classification or predicted value.

The linear function purelin is used as the transfer function of the neurons in the output layer, and the S-type differentiable tangent function tansig is selected as the transfer function of the hidden-layer neurons of the BPNN-based Q value prediction model, as shown in the following equation:

f(x) = 2 / (1 + e^(−2x)) − 1,   (5)

The backpropagation error of the BPNN is denoted as en; the error is the actual Q value Q(St, Xn) obtained by the Q-Learning algorithm minus the Q value Q̂(St, Xn) predicted by the BPNN under the same set of samples, as shown in the following equation (6):

en = Q(St, Xn) − Q̂(St, Xn),  n = 1, 2, 3,   (6)

The training steps of the BPNN-based Q neural network are given in Table 3.

TABLE 3. THE TRAINING STEPS OF THE BPNN-BASED Q NEURAL NETWORK.
Step 1: Acquisition of input data: the mobile robot uses the Q-Learning algorithm to obtain the perception feature vector in the map environment and normalizes the data in the vector into the input values of the BPNN combined with the Q value prediction model. According to the state-behavior relationship of the robot and the obstacle avoidance rule after vibration, the movement behavior of the mobile robot is restricted, and the current state behavior after the robot performs a step is used to evaluate the actual Q value. The feature quantity and the expected Q value are saved, and a sample of training data is added to the input data set.
Step 2: Adjustment of the parameters of BPNN: the training data is input into the input layer of the BPNN, and the weights and thresholds among the three layers are adjusted to minimize the error between the expected Q value and the predicted value.
Step 3: Judgment of the convergence of BPNN: the sum of squares of multiple inspection errors is used as the evaluation function; the BPNN converges when the value of the evaluation function is smaller than the given precision value.
Step 4: The weights and thresholds of the BPNN connection layers are saved.
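Section 3.5 states that Python's Keras library is used to train the BPNN, but the network code itself is not given. The following sketch is therefore only one plausible reading of the structure described above — a 5-dimensional state input, a single hidden layer of 13 tansig-like neurons from equation (4), and 4 linear Q-value outputs; the optimizer, loss, and training call are assumptions.

```python
import numpy as np
from tensorflow import keras

u, v, k = 5, 4, 10                    # input dim, output dim, constant in [1, 10]
hidden = int(np.sqrt(u + v)) + k      # equation (4): sqrt(9) + 10 = 13 neurons

model = keras.Sequential([
    # tanh is mathematically identical to the tansig function in equation (5)
    keras.layers.Dense(hidden, activation="tanh", input_shape=(u,)),
    keras.layers.Dense(v, activation="linear"),   # purelin output: 4 Q values
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.3, momentum=0.9),
              loss="mse")             # squared error between target and predicted Q

# states: (n, 5) perceived feature vectors; q_targets: (n, 4) Q values from Q-Learning
# model.fit(states, q_targets, epochs=..., batch_size=...)   # hypothetical call
```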


3.3 Designs of motion space, selective strategy, and reward function of robots

First, four actions are designed in this study, forward R1, back R2, left turn R3, and right turn R4, to allow the mobile robot to walk freely in the map environment. These actions constitute the action set R = {R1, R2, R3, R4}. The left and right turning angles are set to 30° and can be adjusted according to actual needs during operation, and the back movement stipulates that the mobile robot rotates 180° in place before moving forward.

Secondly, a mobile robot sensor model is designed based on the widely used ROS robot sensors [29], including three sonar sensors on the left, center, and right in front of the ROS robot, with an angle of 22.5° between adjacent sensors. The position of the ROS robot body is taken as the coordinate origin (0, 0), the forward direction of the robot is the y-axis, and the direction perpendicular to the y-axis is set as the x-axis, so that a two-dimensional planar rectangular coordinate system is established as shown in Figure 4:

FIGURE 4. Schematic diagram of the robot sensor (the ROS robot sits at the origin of the x-y coordinate system; its perception range of 30°-150° is divided into the regions V1, V2, and V3, with 22.5° between adjacent sensors).

In the above coordinate system, the range of the area that the robot can perceive and detect is 30°-150°. This rectangular plane area is divided into three parts: V1, V2, and V3. The left, middle, and right sensors of the robot detect the obstacles in V1, V2, and V3, respectively.

If the mobile robot selects the forward motion R1 from the motion space R, it will travel into the divided area V2 of the robot coordinate system in Figure 4. If the left turn action R3 is selected, it will travel into the area V3 of the robot coordinate system. If the right turn action R4 is selected, it will travel into the V1 area of the robot coordinate system; and if the action R2 is selected, the robot will move back.

Finally, the reward function is designed. The design criterion of the reward function is as follows: the farther away the obstacle is, the greater the positive reward will be; and the closer the obstacle is, the greater the negative reward. The movement behavior of the mobile robot is continuously evaluated. Suppose the maximum distance that the ROS robot can perceive is H, the distance between the sensor and the obstacle is h, and the distance relation function between the obstacle and the robot is the logarithmic function with base H and argument h + 0.01, so that when h = 0 the obtained logarithmic value is not infinite. Then the distance relation function U is expressed as equation (7) below:

U(h) = log_H(h + 0.01),   (7)

The reward function E of the robot for avoiding obstacles can be written as follows:

E = { 0,                       h ≥ H
      U_{t+1}(h) − U_t(h),     h < H and U_{t+1}(h) < U_t(h)
      U_{t+1}(h) − U_t(h),     h < H and U_{t+1}(h) ≥ U_t(h) },   (8)

In the above equation, U_{t+1}(h) and U_t(h) represent the reward values of the distance at time t+1 and t, respectively.

It is supposed that the distance function between the mobile robot and the target is H', which can be expressed as follows:

H' = √((x1 − x2)² + (y1 − y2)²),   (9)

In equation (9) above, x1 represents the coordinate of the robot on the x axis, x2 represents the coordinate of the target on the x axis, y1 represents the coordinate of the robot on the y axis, and y2 represents the coordinate of the target on the y axis. According to the above equation, the reward function for the robot approaching the target is designed in the form of a discrete piecewise function, which is expressed as follows:

E' = { H'(t+1) − H'(t),   H'(t+1) > H'(t)
       0,                 H'(t+1) = H'(t)
       H'(t+1) − H'(t),   H'(t+1) < H'(t) },   (10)

In the equation above, H'(t+1) and H'(t) represent the distance between the robot and the target at t+1 and t, respectively. Therefore, the total reward function E0 is calculated with equation (11) below:

E0 = (E + E') / 2,   (11)

The smaller the value obtained by the above calculation, the more beneficial it is for increasing the calculation speed.
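To make the reward design concrete, the following sketch evaluates U(h), H', and the total reward E0 from equations (7)-(11). It is illustrative only: the piecewise branches of equations (8) and (10) are implemented in their collapsed form, the maximum sensing distance is taken from the simulation settings given later in Table 5 (5 cm), and none of this code appears in the paper.

```python
import math

H = 5.0  # assumed maximum distance the robot's sonar can perceive (5 cm, per Table 5)

def U(h):
    """Equation (7): distance relation function, base-H logarithm of h + 0.01."""
    return math.log(h + 0.01, H)

def obstacle_reward(h_prev, h_now):
    """Equation (8): obstacle-avoidance reward from successive sensor distances."""
    if h_now >= H:
        return 0.0
    # U is increasing in h, so this is positive when moving away from the obstacle
    return U(h_now) - U(h_prev)

def target_distance(robot, target):
    """Equation (9): Euclidean distance H' between the robot and the target."""
    (x1, y1), (x2, y2) = robot, target
    return math.hypot(x1 - x2, y1 - y2)

def target_reward(hp_prev, hp_now):
    """Equation (10): reward term from successive robot-target distances."""
    if hp_now == hp_prev:
        return 0.0
    return hp_now - hp_prev

def total_reward(E, E_prime):
    """Equation (11): total reward E0 = (E + E') / 2."""
    return (E + E_prime) / 2.0
```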


3.4 The algorithm flow of local path planning based on BPNN-Q-Learning

The mobile robot obtains state variables according to the environmental state information sensed by its own sensors, then selects actions and obtains the converged Q value function table. The optimal states are used as the sample data of the BPNN to obtain a BPNN-Q value function prediction model with generalization ability. The model predicts the Q values and selects actions based on the sensor information of the mobile robot on a given unknown environment map, and realizes the local trajectory planning of the mobile robot. The local trajectory planning algorithm of BPNN combined with Q-Learning is shown in Figure 5 below:

FIGURE 5. The local trajectory planning algorithm of BPNN combined with Q-Learning (flowchart: start, state initialization, select and execute an action, check for collision or deadlock, check whether the target and the maximum learning count are reached, input the BPNN and train to obtain the Q value prediction model, output the trajectory strategy, end).

After the initialization operation (Q table, environment information, and BPNN parameter symbols), the motion strategy of the robot is selected to obtain the result of the reward function and the next state. After the action is executed, collision avoidance detection is performed. If there is a collision, the robot returns to the previous step to re-adjust; if there is no collision, it gets the obstacle avoidance reward and then checks whether it is approaching the target. If it is approaching the target, the target reward is applied and the total return is calculated; if it is far away from the target, the total return is calculated directly, the Q value function is updated, and the state is saved. If the current state position is the best, the best sample data is input into the BPNN, the parameters are adjusted up to the maximum number of iterations, the BPNN-Q value function prediction model is obtained, and the best strategy is generated; if it is not the best target position, the process returns to the initialization state to recalculate.

3.5 Experimental environment and parameter settings of the simulation

In this study, Python's Keras library is adopted to train the BPNN, and the Robomaster2019 data set is applied for robot trajectory training.

The simulation experiment is performed on the ROS mobile robot [30], and the hardware and software components are shown in Table 4 below:

TABLE 4. THE HARDWARE AND SOFTWARE COMPONENTS OF THE SIMULATION.
Hardware: Windows 10 operating system (64-bit), 8 GB running memory, Intel(R) Core(TM) i5-2450 2.5 GHz central processing unit (CPU)
Software: MATLAB simulation software

The simulation environment is a two-dimensional coordinate system, the size of the map is 50 × 50, and the mobile robot can move freely in the barrier-free area with randomly given steps and directions. The starting point, ending point, and obstacles are randomly set in the environment map. The parameter settings are shown in Table 5 below:

TABLE 5. THE PARAMETER SETTINGS OF THE SIMULATION.
Movement step length of the robot: 1 cm
Movement radius of the robot: 0.5 cm
Movement velocity of the robot: 1 cm/s
Minimum and maximum measuring distance of the sensor: 1 cm and 5 cm
Learning rate of the algorithm: 0.3
Maximum learning times of the robot: 2,800 times

4. RESULTS AND DISCUSSION

4.1 Comparison on prediction performances of different algorithms

To study the performance of the robot trajectory planning based on the improved BPNN combined with Q-Learning constructed in this study, the algorithm is analyzed against DDPG combined with the ML algorithm, BPNN, DQN combined with the ML algorithm, RNN, and LSTM in terms of accuracy, precision, recall, and the F1 value, and the results are shown in Figure 6 below:

FIGURE 6. Influences on robot trajectory planning accuracy as the number of iterations increases under different algorithms (Figures a, b, c, and d show the comparisons on accuracy, precision, recall, and F1 value, respectively).

As shown in the figure above, the algorithm used in this study is compared with other neural network algorithms from the perspectives of accuracy, precision, recall, and F1 value. The recognition accuracy of the proposed algorithm reaches 92.53%, which is at least 5.47% higher than the model algorithms proposed by other scholars. In addition, the precision, recall, and F1 of the model algorithm in this study are 91.25%, 75.5%, and 63.51%, respectively; compared with the other algorithms, they are at least 5.5% higher. Thus, compared with the advanced model algorithms proposed by other scholars in related fields, the robot trajectory planning based on the improved BPNN combined with Q-Learning constructed in this study is better.
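The paper does not show how accuracy, precision, recall, and F1 are computed for Figure 6; a standard multi-class computation with scikit-learn (an assumed tool choice, since the evaluation script is not given) would look like the following.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate_predictions(y_true, y_pred):
    """Report the four metrics used in Figure 6 for multi-class action labels."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # macro averaging treats the four movement actions equally
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    }

# Illustrative call with made-up labels for the four actions R1-R4
print(evaluate_predictions([0, 1, 2, 3, 1, 0], [0, 1, 2, 2, 1, 0]))
```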


4.2 Effects of various algorithms

The BPNN + Q-Learning algorithm is compared with the DDPG combined with ML algorithm and the DQN combined with ML algorithm to highlight the superiority of the proposed algorithm, and the results are shown in Figure 7 below:

FIGURE 7. Comparison on performances of different algorithms (Figure A shows the comparison results of running time. 1: BPNN; 2: Q-Learning; 3: BPNN + Q-Learning; 4: DDPG combined with ML algorithm; 5: DQN combined with ML algorithm. Figure B illustrates the change trend of the loss function.)

Among the five algorithms, BPNN + Q-Learning shows the shortest average calculation time per round (16365 s); the calculation times of BPNN, Q-Learning, the DDPG combined ML algorithm, and the DQN combined with ML algorithm are 19906 s, 20078 s, 18784 s, and 18997 s, respectively (as shown in Figure 7A above). The DDPG combined ML algorithm and the DQN combined with ML algorithm take less calculation time than BPNN and Q-Learning, but they require more time than BPNN + Q-Learning. The loss function of BPNN + Q-Learning has stabilized after the sixth variable, and its loss is significantly smaller than that of the other four algorithms.

This means that the selection and optimization of the algorithm is a key step in robot path planning, which determines the running time of the algorithm and whether the trajectory length of the robot path planning is the best.

4.3 Experimental results of the mobile robot under a common obstacle environment

The mobile robot executes the corresponding movements according to the maximum Q value of the four outputs of the BPNN, and performs movement trajectory planning in a discrete obstacle environment (as shown in Figure 8 below).

FIGURE 8. The trajectory planning of the mobile robot in a discrete obstacle environment.

There are fewer discrete obstacles in map A, and the gap is larger. The mobile robot can find the shortest movement trajectory, which is the best one; in this case, the number of steps of the mobile robot after completing the planned trajectory is 100. The discrete obstacles in map B are significantly increased, and the gap interval is reduced. The mobile robot can still find the best movement trajectory and make the movement trajectory the shortest; in this case, the number of steps used by the mobile robot after completing the planned trajectory is 112. This suggests that, under the robot path planning model of this study, the optimal path planning of the mobile robot is not affected by the density of the discrete obstacles or the size of the gap interval, realizing stable performance. It reveals that the algorithm used in this study can greatly promote trajectory planning efficiency, enabling the robot to accurately find the best movement trajectory in environments with either a small or a large number of discrete obstacles. Then, the movement trajectory planning is tested in the continuous obstacle environment (as shown in Figure 9).

FIGURE 9. The trajectory planning of the mobile robot in a continuous obstacle environment.

The continuous obstacles in the above maps A and B constitute different environments similar to indoor scenes. The mobile robot can perform the best trajectory planning in the two different environments. The numbers of steps of the mobile robot in map A and map B are 111 and 123, respectively. This suggests that the mobile robot can perform optimal movement trajectory planning between any starting point and ending point, and learn a good trajectory without any collision.

4.4 Experimental results of the mobile robot under a U-shaped obstacle environment

The mobile robot is placed in the U-shaped obstacle environment for the movement trajectory planning experiment, and the results are shown in Figure 10.

FIGURE 10. The trajectory planning of the mobile robot in a U-shaped obstacle environment (maps A and B).

Figures 10A and 10B indicate that the mobile robot can complete the obstacle avoidance and path planning tasks in U-shaped obstacle environments of different sizes by using the BPNN + Q-Learning algorithm. The numbers of steps in the two maps are 132 and 125, respectively. The BPNN + Q-Learning algorithm not only operates in the optimal learning state, but also has generalized self-learning capabilities, which can generalize to states that it has not encountered. Therefore, the mobile robot based on the BPNN + Q-Learning algorithm can smoothly avoid the U-shaped obstacles and can plan the shortest path from the starting point to the ending point without collision, which meets the requirements, and the effect is very satisfactory.

4.5 Comparison of experimental results of various algorithms in different obstacle environments

In the discrete obstacle, continuous obstacle, and U-shaped obstacle environments, the experimental results of BPNN, Q-Learning, and BPNN + Q-Learning are compared. The results are shown in Figures 11A, 11B, and 11C.


FIGURE 11. Experimental results of the three algorithms under different obstacle environments (panels A, B, and C). (Note: Figure 11A: discrete obstacle environment; Figure 11B: continuous obstacle environment; Figure 11C: U-shaped obstacle environment. The red dotted line shows the movement trajectory under the BPNN algorithm; the blue dotted line shows the movement trajectory under the Q-Learning algorithm; and the solid line marks the movement trajectory under the BPNN + Q-Learning algorithm.)

Figure 11A reveals that the lengths of the movement trajectory of the mobile robot are 132, 130, and 112 under the three algorithms (BPNN, Q-Learning, and BPNN + Q-Learning) in the discrete obstacle environment, respectively. Figure 11B discloses that the lengths of the movement trajectory under the three algorithms are 155, 160, and 123, respectively, in the continuous obstacle environment. In the U-shaped obstacle environment, the lengths of the movement trajectory of the mobile robot are 167, 135, and 125 under BPNN, Q-Learning, and BPNN + Q-Learning, respectively. These results suggest that the movement trajectory of the mobile robot under the BPNN + Q-Learning algorithm is shorter than that under the BPNN and Q-Learning algorithms in every obstacle environment.

5. CONCLUSION

This study innovatively proposes a prediction model of BPNN + Q-Learning by combining the BPNN and the Q-Learning algorithm to analyze the local movement trajectory of the robot. The results show that in different obstacle environments (discrete obstacles, continuous obstacles, or U-shaped obstacles), the mobile robot can plan the best moving trajectory. This indicates that the mobile robot not only shows better performance in dynamic and complex environments, but also can use the smallest number of steps to find the best planned trajectory, and it suggests that the algorithm proposed in this study can be applied in robotics. However, there are some shortcomings in this study. The number of training samples is limited, and the samples contain some non-optimal state actions, which have an impact on the training effect of the BPNN. Therefore, the algorithm will be debugged in future research to obtain more optimal training data, improve the training effect of the BPNN, and enable the mobile robot to find a more complete local trajectory planning strategy.

COMPLIANCE WITH ETHICAL STANDARDS

Conflict of Interest: All authors declare that they have no conflict of interest.
Ethical approval: This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent was obtained from all individual participants included in the study.

AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

ACKNOWLEDGEMENTS

This work was supported by the General Project of the Department of Education of Hunan Province 20C1656, Research on the Prediction Method of College Students' Achievement Based on Machine Learning.
This work was supported by the General Project of the Shaoyang Science and Technology Bureau 2018NS26, Research on the Application of Collaborative Filtering Recommendation Algorithm in Personalized Agricultural Information Recommendation Service.
This work was supported by the subject of Educational Science Planning in Hunan Province ND206628, Research on the Teaching Innovation of Programming Courses in Colleges and Universities Based on the Demand of Deep Learning.

REFERENCES

1. Semke L M, Tiberius V. Corporate foresight and dynamic capabilities: An exploratory study. Forecasting, 2020, 2(2), pp. 180-193.
2. Diagne C, Catford J A, Essl F, et al. What are the economic costs of biological invasions? A complex topic requiring international and interdisciplinary expertise. NeoBiota, 2020, 63, pp. 25.
3. Porpiglia F, Checcucci E, Amparore D, et al. Three-dimensional elastic augmented-reality robot-assisted radical prostatectomy using hyperaccuracy three-dimensional reconstruction technology: a step further in the identification of capsular involvement. European Urology, 2019, 76(4), pp. 505-514.
4. Pan J, Mai X, Wang C, et al. A Searching Space Constrained Partial to Full Registration Approach With Applications in Airport Trolley Deployment Robot. IEEE Sensors Journal, 2020, 21(10), pp. 11946-11960.


5. Gladence L M, Vakula C K, Selvan M P, et al. A research on application of human-robot interaction using artificial intelligence. Int J Innov Technol Explor Eng, 2019, 8(9S2), pp. 2278-3075.
6. Gladence L M, Karthi M, Ravi T. A novel technique for multi-class ordinal regression-APDC. Indian Journal of Science and Technology, 2016, 9(10), pp. 1-5.
7. Ajeil F H, Ibraheem I K, Azar A T, et al. Grid-based mobile robot path planning using aging-based ant colony optimization algorithm in static and dynamic environments. Sensors, 2020, 20(7), pp. 1880.
8. Humaidi A J, Ibraheem I K, Azar A T, et al. A new adaptive synergetic control design for single link robot arm actuated by pneumatic muscles. Entropy, 2020, 22(7), pp. 723.
9. Ajeil F H, Ibraheem I K, Sahib M A, et al. Multi-objective path planning of an autonomous mobile robot using hybrid PSO-MFB optimization algorithm. Applied Soft Computing, 2020, 89, pp. 106076.
10. Jamalullah S R, Gladence L M. Implementing Clustering Methodology by Obtaining Centroids of Sensor Nodes for Human Brain Functionality[C]//2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS). IEEE, 2020, pp. 1107-1110.
11. Mohanty P K. An intelligent navigational strategy for mobile robots in uncertain environments using smart cuckoo search algorithm. Journal of Ambient Intelligence and Humanized Computing, 2020, 11(12), pp. 6387-6402.
12. Marchukov Y, Montano L. Multi-robot coordination for connectivity recovery after unpredictable environment changes. IFAC-PapersOnLine, 2019, 52(8), pp. 446-451.
13. Giorgi I, Cangelosi A, Masala G L. Learning Actions From Natural Language Instructions Using an ON-World Embodied Cognitive Architecture. Frontiers in Neurorobotics, 2021, 15, pp. 48.
14. Zheng J, Gao L, Wang H, et al. Smart Edge Caching-Aided Partial Opportunistic Interference Alignment in HetNets. Mobile Networks and Applications, 2020, 25, pp. 1842-1850.
15. Singh M T, Chakrabarty A, Sarma B, et al. An Improved On-Policy Reinforcement Learning Algorithm[M]//Soft Computing Techniques and Applications. Springer, Singapore, 2021, pp. 321-330.
16. Al-Qurashi Z, Ziebart B D. Recurrent Neural Networks for Hierarchically Mapping Human-Robot Poses[C]//2020 Fourth IEEE International Conference on Robotic Computing (IRC). IEEE, 2020, pp. 63-70.
17. Peng G, Yang C, He W, et al. Force sensorless admittance control with neural learning for robots with actuator saturation. IEEE Transactions on Industrial Electronics, 2019, 67(4), pp. 3138-3148.
18. Prasetyo G A, Suparman A F I, Nasution Z, et al. Development of the Gait Planning for Stability Movement on Quadruped Robot[C]//2019 International Electronics Symposium (IES). IEEE, 2019, pp. 376-381.
19. Liu Z, Wang Y. Trajectory Planning for Ground Service Robot[C]//2019 Chinese Control And Decision Conference (CCDC). IEEE, 2019, pp. 1511-1515.
20. Lim S H, Autef A. Kernel-based reinforcement learning in robust Markov decision processes[C]//International Conference on Machine Learning. PMLR, 2019, pp. 3973-3981.
21. Marcjasz G, Uniejewski B, Weron R. On the importance of the long-term seasonal component in day-ahead electricity price forecasting with NARX neural networks. International Journal of Forecasting, 2019, 35(4), pp. 1520-1532.
22. Pan W, Zhang L, Shen C. Data-driven time series prediction based on multiplicative neuron model artificial neuron network. Applied Soft Computing, 2021, 104, pp. 107179.
23. Park D W, Park S H, Hwang S K. Serial measurement of S100B and NSE in pediatric traumatic brain injury. Child's Nervous System, 2019, 35(2), pp. 343-348.
24. Zhou F, Lu G, Wen M, et al. Dynamic spectrum management via machine learning: State of the art, taxonomy, challenges, and open research issues. IEEE Network, 2019, 33(4), pp. 54-62.
25. Jin Y, Guo J, Ye H, et al. Extraction of Arecanut Planting Distribution Based on the Feature Space Optimization of PlanetScope Imagery. Agriculture, 2021, 11(4), pp. 371.
26. Xie H, Wang Z. Study of cutting forces using FE, ANOVA, and BPNN in elliptical vibration cutting of titanium alloy Ti-6Al-4V. International Journal of Advanced Manufacturing Technology, 2019, 105(1), pp. 1-16.
27. Khater A A, El-Nagar A M, El-Bardini M, et al. Online learning based on adaptive learning rate for a class of recurrent fuzzy neural network. Neural Computing and Applications, 2020, 32(12), pp. 8691-8710.


28. Mojid M A, Hossain A B M Z, Ashraf M A. Artificial neural network model to predict transport parameters of reactive solutes from basic soil properties. Environmental Pollution, 2019, 255(Pt 2), pp. 113355.
29. Karalekas G, Vologiannidis S, Kalomiros J. Europa: A case study for teaching sensors, data acquisition and robotics via a ROS-based educational robot. Sensors, 2020, 20(9), pp. 2469.
30. Krul S, Pantos C, Frangulea M, et al. Visual SLAM for Indoor Livestock and Farming Using a Small Drone with a Monocular Camera: A Feasibility Study. Drones, 2021, 5(2), pp. 41.


This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
