The Direction Analysis On Trajectory of Fast Neural Network Learning Robot
Keywords: artificial neural network; back propagation neural network; Q-Learning; mobile robot
but its convergence rate is relatively low [13]. The back propagation neural network (BPNN), a form of artificial neural network (ANN), shows excellent perception and computing capabilities, is good at nonlinear prediction and fitting, and can adjust the connection strength among neurons to learn knowledge of the external environment, showing strong generalization ability. Therefore, the BPNN has become an important computational model for controlling the motion behavior of mobile robots [14]. Some researchers have combined the potential field method with ANN to improve the movement trajectory planning of mobile robots in dynamic environments [15], but this cannot solve problems such as the slow convergence of ANNs represented by the BPNN.
Based on the above, the additional momentum method is combined with the adaptive learning rate method to optimize the BPNN, and a trajectory planning method based on the BPNN and Q-Learning is innovatively proposed, so as to provide an effective experimental basis for robots to learn the best trajectory planning strategy under various obstacle conditions and for the further development of robotics.

2. Previous Works

2.1 Analysis on ANN
With the rapid development of science and technology, deep learning has been extensively studied in various fields. As a key component of deep learning, ANN currently occupies an important position in the field of robotics. Al-Qurashi and Ziebart (2020) [16] studied the use of Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) to optimize the trajectory of a robot, which is superior to other neural networks in position and direction. Peng et al. (2019) [17] proposed the use of the Radial Basis Function (RBF) network to address the uncertainty of the robot control model, and verified the effectiveness of the method.

2.2 Analysis on robot movement trajectory
In recent years, robots in human-like or animal-like form have attracted more and more attention from researchers. Prasetyo et al. (2019) [18] researched and discussed gait planning for a quadruped robot; the trajectory planning used in that case is linear translation and a sinusoidal gait trajectory, with no obstacles, just walking on flat terrain. Liu et al. (2019) [19] proposed a local trajectory planning method for ground service robots, which generates a feasible and comfortable trajectory while considering multiple stationary obstacles and path curvature constraints. However, existing algorithms for robot trajectory planning remain simple due to the lack of complete data sets and limited model training capabilities, so few researchers have optimized and recombined neural network algorithms.

3. THE TRAJECTORY PLANNING ALGORITHM BASED ON THE BPNN AND Q-LEARNING

3.1 Q-Learning
The mobile robot can optimize the task result through continuous interaction with the environment. When it interacts with the environment through a certain action, the mobile robot generates a new state under the joint effect of the motion and the environment, and is immediately rewarded by the environment. Through constant repetition, the mobile robot continuously interacts with the environment and generates a large amount of data. The Q-Learning algorithm optimizes its own action strategy through the generated data, and then interacts with the environment to generate new data, which can be adopted to further improve its movement strategy. After many iterations of learning, the mobile robot finally learns the best action sequence to complete the corresponding task. Therefore, the theoretical foundations of reinforcement learning are studied first in this work. Q-Learning is a reward-guided behavior obtained by the agent through "trial and error" learning and interaction with the environment, aiming to maximize the reward for the agent. Q-Learning differs from supervised learning in connectionist learning, which is mainly reflected in the reinforcement signal. The reinforcement signal provided by the environment in reinforcement learning is used to judge the quality of the action; it does not tell the Q-Learning system how to generate the correct action, but provides an evaluation, usually a scalar signal. Because the external environment provides little information, the reinforcement learning system (RLS) must learn from its own experience. In this way, the RLS gains knowledge in the action-evaluation environment and improves its action plan to adapt to the environment.
Q-Learning [20] is a model-free reinforcement learning algorithm that directly optimizes a Q function which can be calculated iteratively. Its target strategy is the greedy strategy, and its action strategy is ε-greedy. The algorithm steps of Q-Learning are shown in Figure 1 below:
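To make the procedure described above concrete, the following Python sketch implements tabular Q-Learning with an ε-greedy action strategy and a greedy target strategy. The environment interface (reset() returning a state, step(action) returning the next state, reward, and a done flag), the number of actions, and the hyperparameter values are assumptions made for illustration rather than details taken from this paper.

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, n_actions=4):
    """Tabular Q-Learning: epsilon-greedy behaviour policy, greedy target policy."""
    Q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            if random.random() < epsilon:
                action = random.randrange(n_actions)                        # explore
            else:
                action = max(range(n_actions), key=lambda a: Q[state][a])   # exploit
            next_state, reward, done = env.step(action)     # assumed environment interface
            td_target = reward + gamma * max(Q[next_state]) # greedy target strategy
            Q[state][action] += alpha * (td_target - Q[state][action])
            state = next_state
    return Q

The update inside the loop uses the standard Q-Learning target r + γ max_a' Q(s', a'), which corresponds to the greedy target strategy mentioned above, while the ε-greedy branch implements the action strategy.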
The number of neurons in the hidden layer is calculated as follows:

N = \sqrt{u + v} + k, (4)

In the above equation (4), u is the dimension of the state vector of the input layer, v refers to the number of neurons in the output layer, and k represents a constant within [1, 10]. Therefore, the number of neurons in the hidden layer lies in the range [4, 13]. In order to maximize the training effect of the model, the number of hidden neurons is selected as 13 in this study.
The structure of the Q value prediction model based on the BPNN is given in Figure 3.

FIGURE 3. The structure of the Q value prediction model based on the BPNN (input layer nodes A, B, C, D; hidden layer; output layer Q(St,x1), Q(St,x2), Q(St,x3), Q(St,x4)).

The input layer of the above model receives the environmental variables perceived by the robot, the output layer outputs the Q value of each action of the robot through the feature conversion of the hidden layer, and the number of neurons in the hidden layer can then be calculated.
The neural network layer contains many neurons, and the neurons are related to each other through weighting, forming an interconnected neural network structure. The most basic ANN consists of an input layer, a hidden layer, and an output layer [28]. The functional characteristics of each layer are shown in Table 2.

TABLE 2. THE FUNCTIONAL CHARACTERISTICS OF EACH LAYER OF THE ANN.

A corresponding transfer function is selected for the neurons in the output layer, and the S-type differentiable tangent function tansig is selected as the transfer function of the hidden layer neurons of the BPNN-based Q value prediction model, as shown in the following equation:

f(x) = \frac{2}{1 + e^{-2x}} - 1, (5)

The back-propagation error of the BPNN is denoted as e_n; the error is the actual Q value Q(S_t, X_n) obtained by the Q-Learning algorithm minus the Q value \hat{Q}(S_t, X_n) predicted by the BPNN for the same set of samples, as shown in the following equation (6):

e_n = Q(S_t, X_n) - \hat{Q}(S_t, X_n), n = 1, 2, 3, (6)

The training steps of the BPNN-based Q neural network are given in Table 3.

TABLE 3. THE TRAINING STEPS OF THE BPNN-BASED Q NEURAL NETWORK.
Step 1 - Acquisition of input data: the mobile robot uses the Q-Learning algorithm to obtain the perception feature vector in the map environment, and normalizes the data in the vector into the input values of the BPNN-based Q value prediction model. According to the state-behavior relationship of the robot and the obstacle avoidance rules, the movement behavior of the mobile robot is restricted, and the current state behavior after the robot performs a step is used to evaluate the actual Q value. The feature quantity and the expected Q value are saved, and a training sample is added to the input data set.
Step 2 - Adjustment of the parameters of the BPNN: the training data are input into the input layer of the BPNN, and the weights and thresholds among the three layers are adjusted to minimize the error between the expected and predicted Q values.
Step 3 - Judgment of the convergence of the BPNN: the sum of squares of multiple inspection errors is taken as the evaluation function; the BPNN converges when the value of the evaluation function is smaller than the given precision value.
Step 4 - The weights and thresholds of the BPNN connection layers are saved.
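As an illustration of equations (4)-(6) and the training steps in Table 3, the following Python/NumPy sketch builds a three-layer network with the hidden-layer size from equation (4), the tansig activation from equation (5), and a gradient step that drives the error of equation (6) toward zero. The class name, learning rate, weight initialization, and training-loop details are assumptions for this sketch, not the paper's exact implementation.

import numpy as np

def hidden_size(u, v, k=10):
    """Equation (4): N = sqrt(u + v) + k, with k a constant in [1, 10]."""
    return int(round(np.sqrt(u + v) + k))

def tansig(x):
    """Equation (5): f(x) = 2 / (1 + exp(-2x)) - 1 (hyperbolic tangent sigmoid)."""
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

class QNet:
    """Three-layer BPNN mapping a state feature vector to the Q values of the actions."""
    def __init__(self, n_in, n_actions, lr=0.01):
        n_hid = hidden_size(n_in, n_actions)   # 13 when n_in = n_actions = 4 (assumed sizes)
        self.W1 = 0.1 * np.random.randn(n_in, n_hid)
        self.b1 = np.zeros(n_hid)
        self.W2 = 0.1 * np.random.randn(n_hid, n_actions)
        self.b2 = np.zeros(n_actions)
        self.lr = lr

    def forward(self, s):
        self.s = np.asarray(s, dtype=float)
        self.h = tansig(self.s @ self.W1 + self.b1)   # hidden layer with tansig activation
        return self.h @ self.W2 + self.b2             # linear output: predicted Q values

    def train_step(self, s, q_target):
        q_pred = self.forward(s)
        e = np.asarray(q_target, dtype=float) - q_pred        # equation (6): e_n = Q - Q_hat
        grad_out = -e                                          # gradient of 0.5 * sum(e^2)
        grad_W2 = np.outer(self.h, grad_out)
        grad_h = (grad_out @ self.W2.T) * (1.0 - self.h ** 2)  # tansig'(x) = 1 - f(x)^2
        grad_W1 = np.outer(self.s, grad_h)
        self.W2 -= self.lr * grad_W2
        self.b2 -= self.lr * grad_out
        self.W1 -= self.lr * grad_W1
        self.b1 -= self.lr * grad_h
        return float(np.sum(e ** 2))   # sum of squared errors, cf. Step 3 of Table 3

In line with Step 3 of Table 3, train_step returns the sum of squared errors, which can be compared against a precision threshold to decide convergence.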
The position of the robot is taken as the origin (0, 0), the forward direction of the robot is taken as the y-axis, the direction perpendicular to the y-axis is set as the x-axis, and a two-dimensional planar rectangular coordinate system is established, as shown in Figure 4.

FIGURE 4. The two-dimensional planar rectangular coordinate system of the mobile robot (movement directions V1, V2, and V3 at 30°, 22.5°, and 150°).

The distance between the robot and the target is H', which can be expressed as follows:

H' = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}, (9)

In equation (9) above, x_1 represents the coordinate of the robot on the x-axis, x_2 represents the coordinate of the target on the x-axis, y_1 represents the coordinate of the robot on the y-axis, and y_2 represents the coordinate of the target on the y-axis. According to the above equation, the reward function for the robot approaching the target is designed in the form of a discrete piecewise function, which is expressed as follows:

E' = \begin{cases} H'(t+1) - H'(t), & H'(t+1) > H'(t) \\ 0, & H'(t+1) = H'(t) \\ H'(t+1) - H'(t), & H'(t+1) < H'(t) \end{cases} (10)

In the above equation, H'(t+1) and H'(t) represent the distances between the robot and the target at time t+1 and t, respectively.

FIGURE 5. The local trajectory planning algorithm of the BPNN combined with Q-Learning.
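A small Python sketch of equations (9) and (10) is given below. The function names are illustrative, and the sign convention of the piecewise reward follows the reconstruction of equation (10) above (the reward magnitude equals the change in distance, and it is zero when the distance is unchanged).

import math

def distance_to_target(robot_xy, target_xy):
    """Equation (9): Euclidean distance H' between the robot (x1, y1) and the target (x2, y2)."""
    (x1, y1), (x2, y2) = robot_xy, target_xy
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

def approach_reward(h_t, h_t1):
    """Equation (10): discrete piecewise reward E' from the distances H'(t) and H'(t+1).
    Returns 0 when the distance is unchanged, otherwise the signed distance change;
    the sign convention is assumed as reconstructed above."""
    if h_t1 == h_t:
        return 0.0
    return h_t1 - h_t   # H'(t+1) - H'(t)

For example, distance_to_target((0, 0), (3, 4)) evaluates to 5.0, and a step that reduces this distance to 4.0 gives approach_reward(5.0, 4.0) = -1.0 under this convention.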
The action strategy of the robot is selected to obtain the result of the reward function and the next state. After the action is executed, collision avoidance detection is performed. If there is a collision, the robot returns to the previous step to re-adjust; if there is no collision, it receives the obstacle avoidance reward and then checks whether it is approaching the target. If it is approaching the target, the target reward is applied and the total return is calculated; if it is moving away from the target, the total return is calculated directly, the Q value function is updated, and the state is saved. If the current state position is the best, the best sample data are input into the BPNN, the parameters are adjusted up to the maximum number of iterations, the BPNN-based Q value function prediction is obtained, and the best strategy is generated; if it is not the best target position, the algorithm returns to the initialization state and recalculates.
The prediction model is evaluated in terms of precision, recall, and the F1 value, and the results are shown in Figure 6 (panels a-d).
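The decision flow described above can be sketched as the following Python loop. The environment interface (reset, propose, is_collision, closer_to_target, at_target, features) and the reward attributes are assumptions made for illustration, and qnet is assumed to expose a train_step(features, q_target) method such as the QNet sketched earlier; the paper's actual implementation may differ in its details.

import random

def plan_with_bpnn_q(env, qnet, episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1, n_actions=4):
    """Sketch of the combined BPNN + Q-Learning local trajectory planning loop."""
    Q = {}            # tabular Q values gathered during exploration
    samples = []      # (state features, Q row) pairs used to train the BPNN afterwards
    for _ in range(episodes):
        state = env.reset()
        while not env.at_target(state):
            q_row = Q.setdefault(state, [0.0] * n_actions)
            # epsilon-greedy selection of the next movement
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q_row[a])
            next_state = env.propose(state, action)
            if env.is_collision(next_state):
                continue                              # collision: stay and re-select
            reward = env.avoid_reward                 # obstacle avoidance reward
            if env.closer_to_target(state, next_state):
                reward += env.target_reward           # additional reward for approaching
            # Q-Learning update of the value function
            next_row = Q.setdefault(next_state, [0.0] * n_actions)
            q_row[action] += alpha * (reward + gamma * max(next_row) - q_row[action])
            samples.append((env.features(state), list(q_row)))
            state = next_state
    # fit the BPNN-based Q value prediction model to the collected samples
    for features, q_target in samples:
        qnet.train_step(features, q_target)
    return Q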
The loss function of BPNN + Q-Learning has stabilized after the sixth variable, and the loss is significantly smaller than that of the other four functions. This means that the selection and optimization of the algorithm is a key step in robot path planning, which determines the running time of the algorithm and whether the trajectory length of the planned path is the best.

4.2 Experimental results of mobile robot under common obstacle environment
The mobile robot executes the corresponding movements according to the maximum Q value among the four outputs of the BPNN, and performs movement trajectory planning in a discrete obstacle environment (as shown in Figure 8 below).

FIGURE 8. The trajectory planning of mobile robot in discrete obstacle environment (maps A and B).

There are fewer discrete obstacles in map A, and the gaps are larger. The mobile robot can find the shortest movement trajectory, which is the best; in this case, the number of steps taken by the mobile robot to complete the planned trajectory is 100. The discrete obstacles in map B are significantly increased and the gap interval is reduced. The mobile robot can still find the best movement trajectory and make it the shortest; here, the number of steps used by the mobile robot to complete the planned trajectory is 112. This suggests that, under the path planning model of this study, the optimal path planning of the mobile robot is not affected by the density of discrete obstacles or the size of the gap interval, realizing stable performance. It reveals that the algorithm used in this study can greatly improve trajectory planning efficiency, enabling the robot to accurately find the best movement trajectory in environments with either few or many discrete obstacles. Then, the movement trajectory planning is tested in the continuous obstacle environment (as shown in Figure 9).

FIGURE 9. The trajectory planning of mobile robot in continuous obstacle environment.

The continuous obstacles in the above maps A and B constitute different indoor-like environments. The mobile robot can perform the best trajectory planning in the two different environments. The numbers of steps of the mobile robot in map A and map B are 111 and 123, respectively. This suggests that the mobile robot can perform optimal movement trajectory planning between any starting point and ending point, and learn a good trajectory without any collision.

4.3 Experimental results of mobile robot under U-shaped obstacle environment
The mobile robot is placed in the U-shaped obstacle environment for the movement trajectory planning experiment, and the results are shown in Figure 10.

FIGURE 10. The trajectory planning of mobile robot in U-shaped obstacle environment.

Figures 10A and 10B indicate that the mobile robot can complete obstacle avoidance and path planning tasks in U-shaped obstacle environments of different sizes by using the BPNN + Q-Learning algorithm. The numbers of steps in the two maps are 132 and 125, respectively. The BPNN + Q-Learning algorithm is not only in the optimal learning state but also possesses generalized self-learning capabilities, which allow it to generalize to states it has not encountered. Therefore, the mobile robot based on the BPNN + Q-Learning algorithm can smoothly avoid U-shaped obstacles and can plan the shortest path from the starting point to the ending point without collision, which meets the requirements, and the effect is very satisfactory.

4.4 Comparison of experimental results of various algorithms in different obstacle environments
In the discrete obstacle, continuous obstacle, and U-shaped obstacle environments, the experimental results of BPNN, Q-Learning, and BPNN + Q-Learning are compared. The results are shown in Figures 11A, 11B, and 11C.
FIGURE 11. Experimental results of the three algorithms under different obstacle environments. (Note: Figure 11A: discrete obstacle environment; Figure 11B: continuous obstacle environment; Figure 11C: U-shaped obstacle environment. The red dotted line shows the movement trajectory under the BPNN algorithm; the blue dotted line shows the movement trajectory under the Q-Learning algorithm; and the solid line marks the movement trajectory under the BPNN + Q-Learning algorithm.)

Figure 11A reveals that the lengths of the movement trajectory of the mobile robot are 132, 130, and 112 under the three algorithms (BPNN, Q-Learning, and BPNN + Q-Learning) in the discrete obstacle environment, respectively. Figure 11B discloses that the lengths of the movement trajectory of the mobile robot under the three algorithms are 155, 160, and 123, respectively, in the continuous obstacle environment. In the U-shaped obstacle environment, the lengths of the movement trajectory of the mobile robot are 167, 135, and 125 under BPNN, Q-Learning, and BPNN + Q-Learning, respectively. These results suggest that the movement trajectory of the mobile robot under the BPNN + Q-Learning algorithm is shorter than that under the BPNN and Q-Learning algorithms in every obstacle environment.

5. CONCLUSION
This study innovatively proposes a prediction model of BPNN + Q-Learning by combining the BPNN and the Q-Learning algorithm to analyze the local movement trajectory of the robot. The results show that in different obstacle environments (discrete obstacles, continuous obstacles, or U-shaped obstacles), the mobile robot can plan the best moving trajectory. This indicates that the mobile robot not only shows better performance in dynamic and complex environments, but also can use the smallest number of steps to find the best planned trajectory. It suggests that the algorithm proposed in this study can be applied in robotics. However, there are some shortcomings in this study. The number of training samples is limited, and the samples contain some non-optimal state actions, which have an impact on the experimental results.

AUTHOR CONTRIBUTIONS
All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

ACKNOWLEDGEMENTS
This work was supported by the General Project of the Department of Education of Hunan Province 20C1656, Research on the Prediction Method of College Students' Achievement Based on Machine Learning.
This work was supported by the General Project of the Shaoyang Science and Technology Bureau 2018NS26, Research on the Application of Collaborative Filtering Recommendation Algorithm in Personalized Agricultural Information Recommendation Service.
This work was supported by the Subject of Educational Science Planning in Hunan Province ND206628, Research on the Teaching Innovation of Programming Courses in Colleges and Universities Based on the Demand of Deep Learning.

REFERENCES
1. Semke L M, Tiberius V. Corporate foresight and dynamic capabilities: An exploratory study. Forecasting, 2020, 2(2), pp. 180-193.
2. Diagne C, Catford J A, Essl F, et al. What are the economic costs of biological invasions? A complex topic requiring international and interdisciplinary expertise. NeoBiota, 2020, 63, pp. 25.
3. Porpiglia F, Checcucci E, Amparore D, et al. Three-dimensional elastic augmented-reality robot-assisted radical prostatectomy using hyperaccuracy three-dimensional reconstruction technology: a step further in the identification of capsular involvement. European Urology, 2019, 76(4), pp. 505-514.
4. Pan J, Mai X, Wang C, et al. A Searching Space Constrained Partial to Full Registration Approach With Applications in Airport Trolley Deployment Robot. IEEE Sensors Journal, 2020, 21(10), pp. 11946-11960.
5. Gladence L M, Vakula C K, Selvan M P, et al. A research on application of human-robot interaction using artificial intelligence. Int J Innov Technol Explor Eng, 2019, 8(9S2), pp. 2278-3075.
6. Gladence L M, Karthi M, Ravi T. A novel technique for multi-class ordinal regression-APDC. Indian Journal of Science and Technology, 2016, 9(10), pp. 1-5.
7. Ajeil F H, Ibraheem I K, Azar A T, et al. Grid-based mobile robot path planning using aging-based ant colony optimization algorithm in static and dynamic environments. Sensors, 2020, 20(7), pp. 1880.
8. Humaidi A J, Ibraheem I K, Azar A T, et al. A new adaptive synergetic control design for single link robot arm actuated by pneumatic muscles. Entropy, 2020, 22(7), pp. 723.
9. Ajeil F H, Ibraheem I K, Sahib M A, et al. Multi-objective path planning of an autonomous mobile robot using hybrid PSO-MFB optimization algorithm. Applied Soft Computing, 2020, 89, pp. 106076.
10. Jamalullah S R, Gladence L M. Implementing Clustering Methodology by Obtaining Centroids of Sensor Nodes for Human Brain Functionality[C]//2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS). IEEE, 2020, pp. 1107-1110.
11. Mohanty P K. An intelligent navigational strategy for mobile robots in uncertain environments using smart cuckoo search algorithm. Journal of Ambient Intelligence and Humanized Computing, 2020, 11(12), pp. 6387-6402.
12. Marchukov Y, Montano L. Multi-robot coordination for connectivity recovery after unpredictable environment changes. IFAC-PapersOnLine, 2019, 52(8), pp. 446-451.
13. Giorgi I, Cangelosi A, Masala G L. Learning Actions From Natural Language Instructions Using an ON-World Embodied Cognitive Architecture. Frontiers in Neurorobotics, 2021, 15, pp. 48.
14. Zheng J, Gao L, Wang H, et al. Smart Edge Caching-Aided Partial Opportunistic Interference Alignment in HetNets. Mobile Networks and Applications, 2020, 25, pp. 1842-1850.
15. Singh M T, Chakrabarty A, Sarma B, et al. An Improved On-Policy Reinforcement Learning Algorithm[M]//Soft Computing Techniques and Applications. Springer, Singapore, 2021, pp. 321-330.
16. Al-Qurashi Z, Ziebart B D. Recurrent Neural Networks for Hierarchically Mapping Human-Robot Poses[C]//2020 Fourth IEEE International Conference on Robotic Computing (IRC). IEEE, 2020, pp. 63-70.
17. Peng G, Yang C, He W, et al. Force sensorless admittance control with neural learning for robots with actuator saturation. IEEE Transactions on Industrial Electronics, 2019, 67(4), pp. 3138-3148.
18. Prasetyo G A, Suparman A F I, Nasution Z, et al. Development of the Gait Planning for Stability Movement on Quadruped Robot[C]//2019 International Electronics Symposium (IES). IEEE, 2019, pp. 376-381.
19. Liu Z, Wang Y. Trajectory Planning for Ground Service Robot[C]//2019 Chinese Control And Decision Conference (CCDC). IEEE, 2019, pp. 1511-1515.
20. Lim S H, Autef A. Kernel-based reinforcement learning in robust Markov decision processes[C]//International Conference on Machine Learning. PMLR, 2019, pp. 3973-3981.
21. Marcjasz G, Uniejewski B, Weron R. On the importance of the long-term seasonal component in day-ahead electricity price forecasting with NARX neural networks. International Journal of Forecasting, 2019, 35(4), pp. 1520-1532.
22. Pan W, Zhang L, Shen C. Data-driven time series prediction based on multiplicative neuron model artificial neuron network. Applied Soft Computing, 2021, 104, pp. 107179.
23. Park D W, Park S H, Hwang S K. Serial measurement of S100B and NSE in pediatric traumatic brain injury. Child's Nervous System, 2019, 35(2), pp. 343-348.
24. Zhou F, Lu G, Wen M, et al. Dynamic spectrum management via machine learning: State of the art, taxonomy, challenges, and open research issues. IEEE Network, 2019, 33(4), pp. 54-62.
25. Jin Y, Guo J, Ye H, et al. Extraction of Arecanut Planting Distribution Based on the Feature Space Optimization of PlanetScope Imagery. Agriculture, 2021, 11(4), pp. 371.
26. Xie H, Wang Z. Study of cutting forces using FE, ANOVA, and BPNN in elliptical vibration cutting of titanium alloy Ti-6Al-4V. International Journal of Advanced Manufacturing Technology, 2019, 105(1), pp. 1-16.
27. Khater A A, El-Nagar A M, El-Bardini M, et al. Online learning based on adaptive learning rate for a class of recurrent fuzzy neural network. Neural Computing and Applications, 2020, 32(12), pp. 8691-8710.